Storage replication regulary hangs after upgrade

Denis Kulikov · Jul 10, 2020

Hello, all.

After upgrade one node of our cluster (3 nodes, 2 of them is pve 6.2, replication hangs from 6.2 to 6.2: pve2->pve1) - we seen that replication regulary hangs on different containers and only reboot whole node is helps.
Before upgrade - all working fine in this configuration.

pvesr status showing that container in SYNCING state and other in pending state (for hours and days):

Code:

1029-0     Yes        local/pve1            2020-07-10_21:01:54              pending   5.302523          0 OK
1031-0     Yes        local/pve1            2020-07-10_20:51:55              pending   5.658419          0 SYNCING
1033-0     Yes        local/pve1            2020-07-10_20:52:01              pending   5.484414          0 OK

in replication logs ans sysctemctl -u pvesr - nothing:

Code:

2020-07-10 21:02:00 1031-0: start replication job
2020-07-10 21:02:00 1031-0: guest => CT 1031, running => 1
2020-07-10 21:02:00 1031-0: volumes => local-zfs:subvol-1031-disk-1

root@pve2:~# journalctl -u pvesr --since "2 hours ago"
-- Logs begin at Fri 2020-06-26 10:44:19 +10, end at Fri 2020-07-10 22:51:10 +10. --
июл 10 20:57:25 pve2 systemd[1]: pvesr.service: Succeeded.
июл 10 20:57:25 pve2 systemd[1]: Started Proxmox VE replication runner.
июл 10 20:57:25 pve2 systemd[1]: Starting Proxmox VE replication runner...
июл 10 20:57:27 pve2 systemd[1]: pvesr.service: Succeeded.
июл 10 20:57:27 pve2 systemd[1]: Started Proxmox VE replication runner.
июл 10 20:58:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
июл 10 20:58:07 pve2 systemd[1]: pvesr.service: Succeeded.
июл 10 20:58:07 pve2 systemd[1]: Started Proxmox VE replication runner.
июл 10 20:59:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
июл 10 20:59:01 pve2 systemd[1]: pvesr.service: Succeeded.
июл 10 20:59:01 pve2 systemd[1]: Started Proxmox VE replication runner.
июл 10 21:00:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
июл 10 21:01:16 pve2 pvesr[95858]: trying to acquire cfs lock 'file-replication_cfg' ...

root@pve2:~# ps axf | grep pvesr
95858 ? Rs 67:21 /usr/bin/perl -T /usr/bin/pvesr run --mail 1

And strace (attached to pvesr pid) showing that problem with container freezing (i think):

openat(AT_FDCWD, "/sys/fs/cgroup/freezer///lxc/1031/freezer.state", O_RDONLY|O_CLOEXEC) = 8
ioctl(8, TCGETS, 0x7ffc48c39de0) = -1 ENOTTY (Not applicable to this ioctl device.|Неприменимый к данному устройству ioctl)
lseek(8, 0, SEEK_CUR) = 0
fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(8, "FREEZING\n", 8192) = 9
read(8, "", 8192) = 0
close(8) = 0
openat(AT_FDCWD, "/sys/fs/cgroup/freezer///lxc/1031/freezer.state", O_RDONLY|O_CLOEXEC) = 8
ioctl(8, TCGETS, 0x7ffc48c39de0) = -1 ENOTTY (Not applicable to this ioctl device.|Неприменимый к данному устройству ioctl)
lseek(8, 0, SEEK_CUR) = 0
fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(8, "FREEZING\n", 8192) = 9
read(8, "", 8192) = 0
close(8) = 0
openat(AT_FDCWD, "/sys/fs/cgroup/freezer///lxc/1031/freezer.state", O_RDONLY|O_CLOEXEC) = 8
ioctl(8, TCGETS, 0x7ffc48c39de0) = -1 ENOTTY (Not applicable to this ioctl device.|Неприменимый к данному устройству ioctl)
lseek(8, 0, SEEK_CUR) = 0
fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(8, "FREEZING\n", 8192) = 9
read(8, "", 8192) = 0
close(8) = 0
openat(AT_FDCWD, "/sys/fs/cgroup/freezer///lxc/1031/freezer.state", O_RDONLY|O_CLOEXEC) = 8
ioctl(8, TCGETS, 0x7ffc48c39de0) = -1 ENOTTY (Not applicable to this ioctl device.|Неприменимый к данному устройству ioctl)
lseek(8, 0, SEEK_CUR) = 0
fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(8, "FREEZING\n", 8192) = 9
read(8, "", 8192) = 0
close(8) = 0

lxc-freeze -n 1031 - is hangs forever (as other commands: pct stop, pct stop with force keyword, pct enter, etc).

lxc-unfreeze showing that container in FREEZING state:

Code:

lxc-unfreeze 1031 20200710131309.947 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:162 - Response data length for command "get_init_pid" is 0
lxc-unfreeze 1031 20200710131309.947 DEBUG    commands - commands.c:lxc_cmd_rsp_recv:162 - Response data length for command "get_state" is 0
lxc-unfreeze 1031 20200710131309.947 DEBUG    commands - commands.c:lxc_cmd_get_state:713 - Container "1031" is in "FREEZING" state

We can kill -KILL pid of lxc-start, but after that - we cannot start container again (because all structures in /sys/fs/cgroup for this container not correctly removed in kernel) and only one way - is reboot (or disable replication job zf-> restart pvesr -> delete all snapshots and volumes on replication target -> create replication job -> full replication of container to backup node -> start container on other node - hard and incorrect way).

proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-27-pve: 4.15.18-55
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-23-pve: 4.15.18-51
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-19-pve: 4.15.18-45
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.10.17-2-pve: 4.10.17-20
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
pve-zsync: 2.0-3
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

source:

root@pve2:~# zfs list -t snapshot | grep 1031
rpool/data/subvol-1031-disk-1@__replicate_1031-0_1594378315__ 1,38M - 2,13G -

target:

root@pve1:~# zfs list -t snapshot | grep 1031
rpool/data/subvol-1031-disk-1@__replicate_1031-0_1594378315__ 192K - 2,13G -

Any help would be appreciated

MisterY · Jul 11, 2020

my pvesr.service keeps restarting all the time after the last upgrade.

fabian · Jul 13, 2020

MisterY said:
my pvesr.service keeps restarting all the time after the last upgrade.

could you show the output of pveversion -v, systemctl list-timers and journalctl -u pvesr.service -u pvesr.timer --since "-60min"?

Denis Kulikov · Jul 16, 2020

We investigate this issue and find by:
ls -lh /proc/PROC_ID/fd

that some processes on container (that stuck in 'FREEZING') accessing to folder on NFS share.

NFS share is mounted to container as folder:

lxc.mount.entry: /mnt/pve/NAS_records/1013 var/spool/XYZ none bind,create=dir 0 0

NFS share mounted to PVE:

X.Y.Z.W:/volume2/Records on /mnt/pve/NAS_records type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=X.Y.Z.W,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=X.Y.Z.W)

All folders on this share are accessible from host, but host is hung (as container) then trying to access to /mnt/pve/NAS_records/1013/XYZ.

From another hosts on cluster (at the same time) - all folders (and /mnt/pve/NAS_records/1013/XYZ) on NFS share is accessible without any problems.

I think that issue occuring when files is writen by container to var/spool/XYZ and pvesr (lxc-freeze) is trying to FREEZE container for replication.
All process in container stuck in D-state.

Before update (6.1 to 6.2) - all worked fine for moths, after update - broken.

Today we plan to try the old kernel 5.3.18-3.

Can change lxc.mount.entry to mp0: with replicate=0 helps?

Any help would be appreciated.

Denis Kulikov · Jul 27, 2020

We try to use kernel 5.3.18-3 and issue is persist.
Then process in container read or write file to nfs share (via bind mount) and pvesr enabled for container replication - sometimes container is stuck (freeze) forever.
Soft mount of NFS share to host - not helps.

Stack trace and ps faxo showing the same thing - rwsem_down_write_slowpath and all process exept one is 'refrigerated':

[Пн июл 27 16:33:17 2020] INFO: task lame:85035 blocked for more than 120 seconds.
[Пн июл 27 16:33:17 2020] Tainted: P W IO 5.3.18-3-pve #1
[Пн июл 27 16:33:17 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Пн июл 27 16:33:17 2020] lame D 0 85035 84981 0x20020324
[Пн июл 27 16:33:17 2020] Call Trace:
[Пн июл 27 16:33:17 2020] __schedule+0x2bb/0x660
[Пн июл 27 16:33:17 2020] schedule+0x33/0xa0
[Пн июл 27 16:33:17 2020] rwsem_down_write_slowpath+0x2e3/0x480
[Пн июл 27 16:33:17 2020] down_write+0x3d/0x40
[Пн июл 27 16:33:17 2020] path_openat+0x450/0x1570
[Пн июл 27 16:33:17 2020] ? generic_file_read_iter+0x9f8/0xd60
[Пн июл 27 16:33:17 2020] ? nfs_check_cache_invalid+0x38/0xa0 [nfs]
[Пн июл 27 16:33:17 2020] do_filp_open+0x93/0x100
[Пн июл 27 16:33:17 2020] ? __alloc_fd+0x46/0x150
[Пн июл 27 16:33:17 2020] do_sys_open+0x177/0x280
[Пн июл 27 16:33:17 2020] __ia32_compat_sys_open+0x1e/0x20
[Пн июл 27 16:33:17 2020] do_fast_syscall_32+0x9d/0x260
[Пн июл 27 16:33:17 2020] entry_SYSENTER_compat+0x7f/0x91
[Пн июл 27 16:33:17 2020] RIP: 0023:0xf7fb1b19
[Пн июл 27 16:33:17 2020] Code: Bad RIP value.
[Пн июл 27 16:33:17 2020] RSP: 002b:00000000ffc3dd4c EFLAGS: 00000282 ORIG_RAX: 0000000000000005
[Пн июл 27 16:33:17 2020] RAX: ffffffffffffffda RBX: 00000000ffd0868a RCX: 0000000000008242
[Пн июл 27 16:33:17 2020] RDX: 00000000000001b6 RSI: 0000000008054e89 RDI: 0000000000000000
[Пн июл 27 16:33:17 2020] RBP: 00000000ffc3ddac R08: 0000000000000000 R09: 0000000000000000
[Пн июл 27 16:33:17 2020] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[Пн июл 27 16:33:17 2020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

4 0 79346 78434 19 0 2956 1548 refrigerator Ds ? 00:00:00 \_ init
5 0 81088 79346 23 -4 2528 1028 refrigerator D<s ? 00:03:12 \_ udevd
5 0 50745 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 50748 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 64349 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 64382 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 64416 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 64429 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 64445 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 68127 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 68128 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 68129 81088 21 -2 2524 1196 refrigerator D< ? 00:00:00 | \_ udevd
5 0 83943 79346 19 0 29852 2636 refrigerator Dl ? 00:00:02 \_ rsyslogd
5 81 84370 79346 19 0 3056 1304 refrigerator Ds ? 00:00:00 \_ dbus-daemon
5 68 99506 79346 19 0 15376 4924 refrigerator Dsl ? 00:03:29 \_ hald
0 0 99507 99506 19 0 4108 1828 refrigerator D ? 00:00:00 | \_ hald-runner
0 0 100588 99507 19 0 4208 2008 refrigerator D ? 00:00:00 | \_ hald-addon-inpu
0 0 100753 99507 19 0 4176 780 refrigerator D ? 00:00:00 | \_ hald-addon-leds
4 68 101129 99507 19 0 3760 1468 refrigerator D ? 00:00:06 | \_ hald-addon-acpi
5 99 101238 79346 19 0 2696 1256 refrigerator D ? 00:00:00 \_ dnsmasq
5 0 102865 79346 19 0 8676 1776 refrigerator Ds ? 00:00:00 \_ sshd
5 0 102955 79346 19 0 2952 1364 refrigerator Ds ? 00:00:00 \_ xinetd
4 0 103752 79346 19 0 6328 1864 refrigerator D ? 00:00:00 \_ mysqld_safe
4 27 104269 103752 19 0 121348 26528 refrigerator Dl ? 02:08:16 | \_ mysqld
4 0 109087 79346 19 0 12700 3272 refrigerator Ds ? 00:00:01 \_ master
4 89 109283 109087 19 0 12936 3404 refrigerator D ? 00:00:01 | \_ qmgr
4 89 33106 109087 19 0 12776 3672 refrigerator D ? 00:00:00 | \_ pickup
5 0 113848 79346 19 0 38176 14788 refrigerator Ds ? 00:00:12 \_ httpd
5 498 76847 113848 19 0 53664 28228 refrigerator D ? 00:01:51 | \_ httpd
5 498 118593 113848 19 0 51604 25988 refrigerator D ? 00:01:54 | \_ httpd
5 498 27399 113848 19 0 52132 26676 refrigerator D ? 00:01:45 | \_ httpd
5 498 77177 113848 19 0 52132 26460 refrigerator D ? 00:01:44 | \_ httpd
5 498 126057 113848 19 0 52364 26724 refrigerator D ? 00:01:02 | \_ httpd
5 498 117219 113848 19 0 52384 26928 refrigerator D ? 00:00:42 | \_ httpd
5 498 13514 113848 19 0 52384 26672 refrigerator D ? 00:00:48 | \_ httpd
5 498 33803 113848 19 0 52128 26448 refrigerator D ? 00:00:48 | \_ httpd
5 498 34259 113848 19 0 52108 26452 refrigerator D ? 00:00:40 | \_ httpd
5 498 34260 113848 19 0 49256 23800 refrigerator D ? 00:00:46 | \_ httpd
5 0 114005 79346 19 0 6796 1736 refrigerator Ds ? 00:00:07 \_ crond
0 0 124580 79346 19 0 3640 1828 refrigerator D ? 00:02:25 \_ gam_server
1 0 126381 79346 19 0 3452 1688 refrigerator Ds ? 00:00:00 \_ incrond
1 0 5836 79346 19 0 6328 1584 refrigerator D ? 00:00:00 \_ safe_asterisk
4 498 5838 5836 50 - 76008 30768 refrigerator Dl ? 00:41:06 | \_ asterisk
0 498 5840 5838 19 0 3268 688 refrigerator D ? 00:00:06 | \_ astcanary
0 498 84979 5838 19 0 35604 11404 refrigerator D ? 00:00:00 | \_ convert_to_mp3.
0 498 85032 84979 0 19 3864 2044 refrigerator DN ? 00:00:00 | | \_ lame
0 498 84981 5838 19 0 35604 11324 refrigerator D ? 00:00:00 | \_ convert_to_mp3.
0 498 85035 84981 0 19 3868 2044 rwsem_down_write_slo DN ? 00:00:00 | \_ lame
5 0 11709 79346 19 0 148456 8308 refrigerator Dl ? 00:10:18 \_ fail2ban-server
0 498 44248 79346 19 0 94468 20888 refrigerator Dsl ? 00:00:00 \_ node
0 498 44529 44248 19 0 119240 43264 refrigerator Dl ? 00:01:45 | \_ node
4 0 44323 79346 19 0 2064 948 refrigerator Ds+ pts/0 00:00:00 \_ mingetty
4 0 44337 79346 19 0 2064 984 refrigerator Ds+ pts/1 00:00:00 \_ mingetty

GDB not showing anything interesting.

root@pve2:~# ls /sys/fs/cgroup/freezer/lxc/1012
cgroup.clone_children cgroup.procs freezer.parent_freezing freezer.self_freezing freezer.state notify_on_release ns tasks

root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/tasks
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/cgroup.procs
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/cgroup.clone_children
0
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/cgroup.clone_children freezer.parent_freezing
0
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/freezer.parent_freezing
0
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/freezer.self_freezing
1
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/freezer.state
FREEZING
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/notify_on_release
0
root@pve2:~# cat /sys/fs/cgroup/freezer/lxc/1012/tasks

Denis Kulikov · Nov 6, 2020

Small updates:
https://github.com/lxc/lxc/issues/3578
https://bugzilla.proxmox.com/show_bug.cgi?id=3117

Denis Kulikov · Mar 1, 2021

Small update (forgot to inform about this, it may be useful for community):

Code:

/sys/fs/cgroup/freezer/lxc/XYZ# echo THAWED > freezer.state

help to unfreeze pvesr and container (if it stuck in FREEZING state) and kill (or stop) them.
It`s a bug, bug, bug.

niziak · Apr 8, 2021

I got the same issue.
With weekly backup set of LXCs on one node, this issue breaks all LXCs on this node (remains freezed). It starts happening after adding one LXC with snapd installed inside. This LXC cannot be freezed (Proxmox waits for freezing, but snapd keeps hands on own cgroup and remains unfrezed).

rholighaus · Jul 21, 2021

Thanks Denis Kulikov your fix is a life saver!

Has anybody tested the behaviour with PVE7 and LXC 4.0?

Denis Kulikov · Jan 11, 2022

And addional info for community:

Sometimes pvesr stuck (for a one day or more) in SYNCING state:

Code:

2420-0     Yes        local/pve1            2022-01-10_16:07:26              pending   7.537101          0 SYNCING

but container is working normally.

On lsof -p pvesr_pid you can see (/sys/fs/cgroup/freezer/lxc/2420/freezer.state):

Code:

pvesr   3962 root    6wW     REG               0,28        0       63 /run/lock/pve-manager/pve-migrate-2420
pvesr   3962 root    7w      REG               0,26      177   268997 /var/log/pve/replicate/2420-0
pvesr   3962 root    8   unknown               0,43                   /sys/fs/cgroup/freezer/lxc/2420/freezer.state
pvesr   3962 root   12r      REG               0,26    22016    27381 /usr/share/perl5/Net/LDAP/Constant.pm

Workaround for this is:

Code:

echo 'FROZEN' > /sys/fs/cgroup/freezer/lxc/2420/freezer.state

I hope this helps somebody.

Search

Search

Storage replication regulary hangs after upgrade

Denis Kulikov

Active Member

MisterY

Renowned Member

fabian

Proxmox Staff Member

Denis Kulikov

Active Member

Denis Kulikov

Active Member

Denis Kulikov

Active Member

Denis Kulikov

Active Member

niziak

Member

rholighaus

Well-Known Member

Denis Kulikov

Active Member