ext4_multi_mount_protect Delaying Container Migration

trystan

With NFS shared storage, LXC containers built from templates of various operating systems all hang for roughly 40 seconds on the PVE 5.1 target node when performing a restart migration, with the following message:

kernel: EXT4-fs warning (device loop0): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.

It finally resumes with:

kernel: EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)

and starts normally.

Some digging led me to the tune2fs man page (https://www.systutorials.com/docs/linux/man/8-tune2fs/), but the options for setting the MMP interval cannot be applied to the loop device the container uses for its root filesystem.
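
For reference, the relevant tune2fs knobs are the mmp feature flag and the mmp_update_interval extended option. A rough sketch of what one would normally run directly against the unmounted raw image (the path is a placeholder and the container must be stopped first):

Code:
# show whether MMP is enabled and what interval the filesystem carries
tune2fs -l /path/to/vm-disk.raw | grep -i mmp
# shorten the MMP update interval (in seconds)
tune2fs -E mmp_update_interval=5 /path/to/vm-disk.raw
# or drop the feature entirely -- note this also removes the protection
# against the same image being mounted from two nodes at once
tune2fs -O ^mmp /path/to/vm-disk.raw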

The migration/restart should be nearly immediate, but the multi-mount check adds close to a full minute of delay before it times out.
 
It looks like it's specifically an NFS problem: migration time on shared iSCSI as well as Ceph is under 4 seconds and does not show the multi-mount protection warnings. Any thoughts on how I can troubleshoot further, or a way to work around the MMP interval?
 
Hi,

I can't reproduce it with CentOS 7, Debian 9, or Ubuntu 16.04 here. Please send more information about your setup (a few commands to collect it are sketched after this list):
pveversion -v
CT config
Container running software
NFS server
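
(For reference, something along these lines gathers most of that; the CTID is a placeholder:)

Code:
pveversion -v                 # package versions on source and target node
pct config <CTID>             # container configuration
cat /etc/pve/storage.cfg      # storage definitions, including the NFS entry
cat /etc/exports              # export options on the NFS server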
 
Source and destination nodes (identical output on both):

Code:
proxmox-ve: 5.1-43 (running kernel: 4.15.15-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.15: 5.1-3
pve-kernel-4.15.15-1-pve: 4.15.15-6
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-19
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-26
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9


NFS server (Proxmox, same software versions as source and destination):
exports: /rpool/data 172.16.8.54(rw,async,no_root_squash,no_subtree_check)

Container OS: default CentOS 7 (also reproduced with Ubuntu 16.04), fully updated, with no additional repos or software added beyond the base template.

Code:
arch: amd64
cores: 4
hostname: migrate-test
memory: 4096
net0: name=eth0,bridge=vmbr0,hwaddr=0E:F1:AF:CE:98:36,ip=dhcp,type=veth
ostype: centos
rootfs: vm:103/vm-103-disk-1.raw,size=8G
swap: 512

Code:
nfs: vm
        export /rpool/data
        path /mnt/pve/vm
        server sionis-nfs
        content images,rootdir
        maxfiles 8
        nodes cyrus,lucius
        options vers=4.2,async,hard,tcp,noatime
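
One way to confirm that the async write-back is the culprit would be to temporarily switch the client mount to synchronous writes and re-test. Only the options line of the storage entry above changes (expect a noticeable write-performance hit, so this is only meant for testing):

Code:
        options vers=4.2,sync,hard,tcp,noatime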
 
Interestingly enough, if I manually run a sync after shutdown on the source node, the delay is gone:

Code:
pct shutdown 103 && pct migrate 103 lucius && sync && ssh lucius 'pct start 103'

That results in a migration time of under 3 seconds.

I'm using async NFS with ZFS on the backend and a UPS (async can get messy fast without one) for performance reasons, but I'm wondering if it would be possible to include a 'sync' in your migration script.

Thanks
 
I will test tomorrow, but I think it has something to do with async and ZFS.
 
Thanks for looking into it. My suspicion is that the .raw file still *thinks* it's mounted because of the delayed write-back on shutdown with async.

I'd have assumed that any shutdown task would issue an fsync or fdatasync on the .raw file and its parent directory, but apparently not.
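
If that's what is happening, a targeted flush of just the image file (rather than a global sync) should be enough; a reasonably recent coreutils sync accepts file arguments and fsyncs them. A minimal sketch, with the path assumed from the 'vm' storage and CT 103 above:

Code:
# fsync only the container's root image
sync /mnt/pve/vm/images/103/vm-103-disk-1.raw
# or flush the whole filesystem containing it
sync -f /mnt/pve/vm/images/103/vm-103-disk-1.raw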

So far I haven't noticed any issues using a custom .sh that just adds a 'sync' call after migrating and before starting on the destination.

Worth noting that the sync originates from the NFS client (the source node).
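
A minimal sketch of what such a wrapper might look like, reusing the CT ID and target node from the one-liner above (the script name and argument handling are just illustrative):

Code:
#!/bin/bash
# Restart-migrate a container and flush pending async NFS writes before starting it.
# Usage: migrate-ct.sh <ctid> <target-node>
set -e
CTID="$1"
TARGET="$2"

pct shutdown "$CTID"                # stop the container on the source node
pct migrate "$CTID" "$TARGET"       # offline migration of the root image
sync                                # flush the NFS client's delayed writes (the actual workaround)
ssh "$TARGET" "pct start $CTID"     # start the container on the target node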
 
I see the same effect with containers on GlusterFS (directory storage on a mounted GlusterFS mountpoint). I also see it when starting containers: after some time the start fails, gets retried, and eventually works, but delayed. And yes, this is with Proxmox 6.
 
