ext4_multi_mount_protect Delaying Container Migration

trystan

With NFS shared storage, LXC containers built from templates of various operating systems all hang for roughly 40 seconds on the PVE 5.1 target node when performing a restart migration, with the following message:

kernel: EXT4-fs warning (device loop0): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.

It finally resumes with:

kernel: EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)

and starts normally.

Some digging led me to the tune2fs man page (https://www.systutorials.com/docs/linux/man/8-tune2fs/), but the options for setting the MMP interval cannot be applied to the loop device the container uses for its root filesystem.
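
For reference, the relevant tune2fs knobs are the mmp feature flag and the mmp_update_interval extended option. A rough sketch of what one would normally run directly against the unmounted raw image (the path is a placeholder and the container must be stopped first):

Code:
# show whether MMP is enabled and what interval the filesystem carries
tune2fs -l /path/to/vm-disk.raw | grep -i mmp
# shorten the MMP update interval (in seconds)
tune2fs -E mmp_update_interval=5 /path/to/vm-disk.raw
# or drop the feature entirely -- note this also removes the protection
# against the same image being mounted from two nodes at once
tune2fs -O ^mmp /path/to/vm-disk.raw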

The migration/restart should be nearly immediate, but the multi-mount check adds close to a full minute of delay before it times out.
 
It looks like it's specifically an NFS problem: migration time on shared iSCSI as well as Ceph is under 4 seconds and does not show the multi-mount protection warnings. Any thoughts on how I can troubleshoot further, or a way to work around the MMP interval?
 
Hi,

I can't reproduce it with CentOS 7, Debian 9, or Ubuntu 16.04 here. Please send more information about your setup (a few commands to collect it are sketched after this list):
pveversion -v
CT config
Container running software
NFS server
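
(For reference, something along these lines gathers most of that; the CTID is a placeholder:)

Code:
pveversion -v                 # package versions on source and target node
pct config <CTID>             # container configuration
cat /etc/pve/storage.cfg      # storage definitions, including the NFS entry
cat /etc/exports              # export options on the NFS server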
 
Source and destination nodes (identical output on both):

Code:
proxmox-ve: 5.1-43 (running kernel: 4.15.15-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.15: 5.1-3
pve-kernel-4.15.15-1-pve: 4.15.15-6
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-19
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-26
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9


NFS server (Proxmox, same software versions as source and destination):
exports: /rpool/data 172.16.8.54(rw,async,no_root_squash,no_subtree_check)

Container OS: default CentOS 7 (also reproduced with Ubuntu 16.04), fully updated, with no additional repos or software added beyond the base template.

Code:
arch: amd64
cores: 4
hostname: migrate-test
memory: 4096
net0: name=eth0,bridge=vmbr0,hwaddr=0E:F1:AF:CE:98:36,ip=dhcp,type=veth
ostype: centos
rootfs: vm:103/vm-103-disk-1.raw,size=8G
swap: 512

Code:
nfs: vm
        export /rpool/data
        path /mnt/pve/vm
        server sionis-nfs
        content images,rootdir
        maxfiles 8
        nodes cyrus,lucius
        options vers=4.2,async,hard,tcp,noatime
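
One way to confirm that the async write-back is the culprit would be to temporarily switch the client mount to synchronous writes and re-test. Only the options line of the storage entry above changes (expect a noticeable write-performance hit, so this is only meant for testing):

Code:
        options vers=4.2,sync,hard,tcp,noatime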
 
Interestingly enough, if I manually run a sync after shutdown on the source node, the delay is gone:

Code:
pct shutdown 103 && pct migrate 103 lucius && sync && ssh lucius 'pct start 103'

That results in a migration time of under 3 seconds.

I'm using async NFS with ZFS on the backend and a UPS (async can get messy fast without one) for performance reasons, but I'm wondering if it would be possible to include a 'sync' in your migration script.

Thanks
 
I will test tomorrow, but I think it has something to do with async and ZFS.
 
Thanks for looking into it. My suspicion is that the .raw file still *thinks* it's mounted because of the delayed write-back on shutdown with async.

I'd have assumed that any shutdown task would issue an fsync or fdatasync on the .raw file and its parent directory, but apparently not.
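
If that's what is happening, a targeted flush of just the image file (rather than a global sync) should be enough; a reasonably recent coreutils sync accepts file arguments and fsyncs them. A minimal sketch, with the path assumed from the 'vm' storage and CT 103 above:

Code:
# fsync only the container's root image
sync /mnt/pve/vm/images/103/vm-103-disk-1.raw
# or flush the whole filesystem containing it
sync -f /mnt/pve/vm/images/103/vm-103-disk-1.raw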

So far I haven't noticed any issues using a custom .sh that just adds a 'sync' call after migrating and before starting on the destination.

Worth noting that the sync originates from the NFS client (the source node).
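
A minimal sketch of what such a wrapper might look like, reusing the CT ID and target node from the one-liner above (the script name and argument handling are just illustrative):

Code:
#!/bin/bash
# Restart-migrate a container and flush pending async NFS writes before starting it.
# Usage: migrate-ct.sh <ctid> <target-node>
set -e
CTID="$1"
TARGET="$2"

pct shutdown "$CTID"                # stop the container on the source node
pct migrate "$CTID" "$TARGET"       # offline migration of the root image
sync                                # flush the NFS client's delayed writes (the actual workaround)
ssh "$TARGET" "pct start $CTID"     # start the container on the target node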
 
I see the same effect with containers on GlusterFS (directory storage on a mounted GlusterFS mountpoint). I also see it when starting containers: after some time the start fails, gets retried, and eventually works, but delayed. And yes, this is with Proxmox 6.
 
