qemu virtio issues after upgrade to 9

Thank you for your responses. Now I’m a bit wiser. In my opinion, this change came unexpectedly, was not properly tested, or was not sufficiently warned about. In my case, the bridge has an MTU of 9000, since the storage is also connected through it. Simply setting the bridge’s MTU to 1500 would cut off the storage. I therefore had to shut down, migrate, and restart each VM, with the result that all VMs now have an MTU of 9000.

Do I really have to manually set each VM to an MTU of 1500, damn?! The changes are understandable, but the implementation has caused major issues. Overall, this is very unfortunate.

I understand your frustration with the situation. The migration issue in particular is awkward, and @fiona is currently working on improving it.

There is a check for this included in pve8to9, though, as well as a note in the release notes [1]. We should probably include it in the upgrade guide as well; I couldn't find it there.

[1] https://pve.proxmox.com/wiki/Roadmap#Known_Issues_&_Breaking_Changes
 
Hi,

your link is not working :).

Do you know when the solution will be released?

I have now upgraded all PVE hosts to version 9, which resulted in all NICs in the VMs having an MTU of 9000. I emptied one host and changed its bridge MTU to 1500. Now I have the problem in the opposite direction: I want to migrate a VM with an MTU of 9000 to a host whose bridge MTU is 1500, but the migration fails here as well. The only solution would be to manually adjust all VMs… I had hoped that a successful migration would automatically change the MTU from 9000 to 1500 when “MTU: Same as bridge” was set.
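For anyone facing the same bulk edit, changing a vNIC's MTU boils down to rewriting the `mtu=...` key in the VM's `netX` option string (the value `qm config <vmid>` prints and `qm set <vmid> --net0 ...` accepts). A minimal, hypothetical sketch of that rewrite - the `set_mtu` helper is an assumption for illustration, not part of any Proxmox tooling, so verify the option string against your own config before scripting anything:

```python
import re

def set_mtu(net_opts: str, mtu: int) -> str:
    """Rewrite (or append) the mtu=... key in a netX option string.

    In Proxmox VE, mtu=1 means "same as bridge".
    """
    if re.search(r"\bmtu=\d+", net_opts):
        return re.sub(r"\bmtu=\d+", f"mtu={mtu}", net_opts)
    return f"{net_opts},mtu={mtu}"

# Example netX value as printed by `qm config <vmid>`:
line = "virtio=BC:24:11:00:00:01,bridge=vmbr0,mtu=9000"
print(set_mtu(line, 1))  # virtio=BC:24:11:00:00:01,bridge=vmbr0,mtu=1
```

Feeding the rewritten string back via `qm set` per VM would avoid clicking through every NIC by hand, at the cost of trusting the regex against your actual config lines.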
 
Fixed the link in my previous post. The respective patch series was merged yesterday and should land in our testing / no-subscription repositories soon - I can post here as soon as it lands.
 
A quick note on this.

...The pve8to9 checklist scripts will detect vNICs where the MTU would change after upgrade...

It is common practice to migrate all VMs away before performing an upgrade. This message only appears if the checks are executed while VMs are still running on the host. I guess I was being a bit too thorough and didn’t see this warning, since no guests were running on the hosts anymore ^^.
 
FYI, a fix for the MTU migration issue is included in qemu-server >= 9.0.20, which is currently available in the pve-test repository.
 
Chiming in on this, as I've just experienced some weird behavior on a PVE 8 cluster:

I just did a regular package update containing the most recent 6.8 kernel. To complete that update, I triggered a machine reboot, causing all VMs to be migrated according to my HA settings. So far so good: most VMs migrated without issue, but some got stuck, complaining that the nets-host-mtu setting is unknown on the target host. All affected VMs had their MTU set to "1", and most migrated just fine, so that baffled me. Obviously I was unable to update the target host as requested, as I'd have to reboot it as well to finalize the kernel update, which could have led to the same issue in the other direction.

One of the error outputs was as follows:
Code:
task started by HA resource agent
2025-09-18 11:12:54 use dedicated network address for sending migration traffic (10.19.69.11)
2025-09-18 11:12:54 starting migration of VM 110 to node 'pve02' (10.19.69.11)
2025-09-18 11:12:54 starting VM 110 on remote node 'pve02'
2025-09-18 11:12:55 [pve02] Unknown option: nets-host-mtu
2025-09-18 11:12:55 [pve02] 400 unable to parse option
2025-09-18 11:12:55 [pve02] qm start <vmid> [OPTIONS]
2025-09-18 11:12:55 ERROR: online migrate failure - target node pve02 is too old for preserving VirtIO-net MTU, please upgrade
2025-09-18 11:12:55 aborting phase 2 - cleanup resources
2025-09-18 11:12:55 migrate_cancel
2025-09-18 11:12:56 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems

I thought these changes would only appear in conjunction with an upgrade to PVE 9, but we're still on latest 8 and didn't try to do that upgrade yet.

I got some VMs to migrate by removing the MTU = 1 setting from them and re-adding it after they moved, but that was quite tedious, as we have a couple dozen VMs. Then the issue mostly solved itself: still-open, failing migrations were aborted after some time, the host rebooted, and the VMs got respawned on the target host in the meantime (which caused an unforeseen, but luckily short, outage of our services). Updating the second host then went through fine without hiccups (with VMs getting migrated to the first updated host).

If the default behavior for the MTU setting has now also changed for PVE 8, please do not forget to reflect that in the docs. I'd be happy to be able to remove the explicit setting from every VM, but it would be great to be notified about such a change being required.
 
Hi,
Chiming in on this, as I've just experienced some weird behavior on a PVE 8 cluster:

I just did a regular package update containing the most recent 6.8 kernel. To complete that update, I triggered a machine reboot, causing all VMs to be migrated according to my HA settings. So far so good: most VMs migrated without issue, but some got stuck, complaining that the nets-host-mtu setting is unknown on the target host. All affected VMs had their MTU set to "1", and most migrated just fine, so that baffled me. Obviously I was unable to update the target host as requested, as I'd have to reboot it as well to finalize the kernel update, which could have led to the same issue in the other direction.

...

I thought these changes would only appear in conjunction with an upgrade to PVE 9, but we're still on latest 8 and didn't try to do that upgrade yet.
With MTU=1, i.e. inherit from bridge, it is necessary to pass along the new setting so that the QEMU instance on the target will be started with the same MTU settings as the source instance. Otherwise, there will be a mismatch between the migrated internal guest state and the settings of the new instance. This could lead to a broken network if the MTUs differed, and that bug also affected Proxmox VE 8.
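In other words, the MTU a vNIC actually runs with depends on the source bridge whenever mtu=1 is configured, so the source node has to hand the target the resolved value rather than the raw setting. A hedged Python sketch of that resolution logic (the function and parameter names are illustrative, not the actual qemu-server code):

```python
def effective_mtu(configured_mtu: int, bridge_mtu: int) -> int:
    """Resolve the MTU a vNIC is actually started with.

    In Proxmox VE, mtu=1 means "inherit from the bridge";
    any other configured value is used as-is.
    """
    return bridge_mtu if configured_mtu == 1 else configured_mtu

# Source bridge MTU 9000, VM set to mtu=1: the guest runs with MTU 9000.
# If the target only saw "mtu=1" and its bridge had MTU 1500, the restarted
# instance would not match the migrated guest state - hence a parameter
# like nets-host-mtu carrying the resolved value to the target.
print(effective_mtu(1, 9000))     # 9000
print(effective_mtu(1500, 9000))  # 1500
```

This also illustrates why an old target that does not understand the new parameter has to reject the migration outright instead of silently starting with a possibly different MTU.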

We do try to support migrations from newer versions to older versions when possible, but in this case it was not possible in combination with fixing that bug. It is recommended to migrate VMs away before the upgrade (e.g. using HA maintenance mode), or to another already-upgraded node before rebooting.

I got some VMs to migrate by removing the MTU = 1 setting from them and re-adding it after they moved, but that was quite tedious, as we have a couple dozen VMs. Then the issue mostly solved itself: still-open, failing migrations were aborted after some time, the host rebooted, and the VMs got respawned on the target host in the meantime (which caused an unforeseen, but luckily short, outage of our services). Updating the second host then went through fine without hiccups (with VMs getting migrated to the first updated host).
If the default behavior for the MTU setting has now also changed for PVE 8, please do not forget to reflect that in the docs. I'd be happy to be able to remove the explicit setting from every VM, but it would be great to be notified about such a change being required.
No, the default behavior for MTU did not change in Proxmox VE 8; only how MTU=1 is handled during migration did.
 
Hi,

With MTU=1, i.e. inherit from bridge, it is necessary to pass along the new setting so that the QEMU instance on the target will be started with the same MTU settings as the source instance. Otherwise, there will be a mismatch between the migrated internal guest state and the settings of the new instance. This could lead to a broken network if the MTUs differed, and that bug also affected Proxmox VE 8.

...

No, the default behavior for MTU did not change in Proxmox VE 8; only how MTU=1 is handled during migration did.
I see, so this was basically expected behavior for the recent PVE 8 -> PVE 8 update?

While I can somewhat understand that it was my responsibility to check for any prerequisites of this update, I'm still a bit surprised no "red flag" showed up, as this was a patch-to-patch upgrade only. Especially with a subscription, it would have been a blessing to get some "pay attention to this" mail or similar.

In general, if this is deemed a helpful idea, I'd ask for the implementation of some "pre-upgrade notes" that show up whenever something breaking could be going on between patch-level upgrades. Even having such a thing hidden in the cluster/host options, turned off by default, would be a blessing. This idea specifically refers to patch-to-patch (or UI-started) upgrades in general, not to major version upgrades, for which the procedure is well documented.
 