LXC containers lag for no apparent reason after upgrade to PVE 6.1-8

Linyu

I don't know why, but the LXC CTs became laggy after upgrading to 6.1-8. Even executing the "top" command inside a CT can hang for 5 minutes. I've checked the load, CPU usage, I/O wait and network usage, and everything looks fine, but it just lags.
BTW, all the CTs on that node have this problem.

Where should I start to search for the reason?
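One way to start narrowing this down (a minimal sketch, assuming the lag comes from the lxcfs-served proc files and that lxcfs uses its default mount at /var/lib/lxcfs) is to time reads of the same entries inside a CT and directly on the host:
Code:
# inside an affected CT: these proc files are served by lxcfs
time cat /proc/uptime
time cat /proc/cpuinfo > /dev/null

# on the host: read the same entry straight from the lxcfs mount
# (default path /var/lib/lxcfs; adjust if yours differs)
time cat /var/lib/lxcfs/proc/uptime
If only the reads going through lxcfs are slow, the host-side metrics (load, CPU, I/O wait) can look fine while the CTs still feel unresponsive.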

pveversion -v:
Code:
root@node-cn-hk2:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

top on host:
Code:
top - 12:36:36 up 3 days, 30 min,  5 users,  load average: 8.88, 9.75, 10.25
Tasks: 6662 total,   5 running, 6657 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9.0 us, 11.0 sy,  0.0 ni, 79.0 id,  0.6 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem :  64366.4 total,   1116.3 free,  46961.8 used,  16288.3 buff/cache
MiB Swap: 131021.0 total,  69415.2 free,  61605.7 used.  11364.8 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                
18599 root      20   0  297612  35996   6904 R  52.0   0.1 385:14.50 pvestatd                                                                
1051 root      20   0 8867012 727600   1720 S  42.5   1.1   1532:03 lxcfs                                                                  
1627 root      20   0 2816620   1.4g   3596 S  39.4   2.3 241:40.33 kvm                                                                    
5437 root      20   0 4869048   3.7g   5468 S  31.4   5.8 150:21.54 kvm                                                                    
  215 root      25   5       0      0      0 S  28.9   0.0 824:06.78 ksmd                                                                    
17480 root      20   0   21.0g   5.0g   3900 S  22.8   7.9 923:07.87 kvm                                                                    
18514 root      20   0 2757064   1.9g   3764 S  21.5   3.0 903:51.53 kvm                                                                    
11706 rootroot  20   0  241256  30444   7708 S  14.5   0.0   9:54.76 php-fpm                                                                
13893 root      20   0 2681112 678840   3688 S  14.2   1.0 395:05.69 kvm                                                                    
17684 root      20   0 1618164 605552   3636 S  11.7   0.9 394:08.22 kvm                                                                    
3165 root      20   0 1628468 530880   3520 R  10.8   0.8 481:56.02 kvm                                                                    
22129 root      20   0 2691352   1.3g   3548 S   9.5   2.0 472:26.89 kvm                                                                    
9505 root      20   0 1634044 635636   4632 S   9.2   1.0  89:26.30 kvm                                                                    
1099 root      20   0 2678552 371660   3492 S   8.6   0.6 383:14.70 kvm                                                                    
12775 root      20   0 2728400 490204   3540 S   8.6   0.7 370:07.75 kvm                                                                    
10661 root      20   0 2865532 271748   3428 S   8.3   0.4 302:00.97 kvm                                                                    
30698 root      20   0 1698252 597880   4080 S   8.3   0.9 312:39.28 kvm                                                                    
10306 root      20   0   18964  11280   3268 R   7.7   0.0   0:01.77 top                                                                    
27588 rootroot  20   0 3135672  52872   5764 S   5.5   0.1   0:06.30 httpd                                                                  
21602 rootroot  20   0  109276   9652   3460 S   5.2   0.0   2:50.29 php-fpm                                                                
10950 root      20   0 2680004 226332   3312 S   4.9   0.3 132:53.61 kvm                                                                    
17927 rootroot  20   0 2407632  18864   3044 S   4.6   0.0   4:55.85 httpd                                                                  
11320 root      20   0 1603276 294864   3412 S   4.3   0.4  90:14.62 kvm                                                                    
24477 root      20   0 1615080 961360   3616 S   3.7   1.5 153:07.09 kvm                                                                    
30796 root      20   0 1064152 226608   3568 S   3.7   0.3  83:03.06 kvm                                                                    
11379 rootroot  20   0 3062604  19900   3232 S   3.4   0.0   6:32.54 httpd                                                                  
21720 rootroot  20   0  109272   9912   3844 S   3.4   0.0   2:46.42 php-fpm                                                                
8110 rootroot  20   0 4242252  39432   3768 S   3.1   0.1  40:22.18 httpd                                                                  
8753 rootroot  20   0  239268  27192   6484 S   3.1   0.0  10:13.84 php-fpm                                                                
2023 root      20   0       0      0      0 S   2.8   0.0   5:24.54 vhost-1627                                                              
14166 rootroot  20   0 2407632  16332   2980 S   2.8   0.0   1:03.27 httpd                                                                  
21437 root      20   0 2695956   1.0g   3656 S   2.8   1.7 145:54.62 kvm                                                                    
18195 rootroot  20   0 2473168  19600   2864 S   2.5   0.0   2:48.16 httpd                                                                  
17103 root      20   0 1051800 140356   3264 S   2.2   0.2  61:27.95 kvm                                                                    
17368 1001      20   0 5403128 334884   3116 S   2.2   0.5  28:49.41 mysqld                                                                  
18217 root      20   0 1061052 238700   3392 S   2.2   0.4  69:34.81 kvm

vmstat:
Code:
root@node-cn-hk2:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 5  0 63083264 1395168 639904 16248600   13   26   348   195    3    3  7  9 82  1  0
 3  0 63083008 1406664 640224 16259136  388    0 10876  3465 37993 72899  7 10 83  1  0
 6  0 63083008 1233488 640400 16260948  140    0   648  2153 36870 67179  8 12 80  0  0
 6  0 63083008 1236560 640624 16261924   20    0   840  2653 35263 61302  7 11 80  1  0
13  0 63083008 1259372 640820 16262836   20    0   140  2193 36688 76831  7 11 82  0  0
 8  0 63083008 1287556 641216 16266528  308    0  3752  3651 37057 64575  7 13 79  1  0
 
Ok, so I'm not going crazy!

I've been seeing some very similar issues.

I would log in to a CT, type "w", and wait.
Sometimes I would get a result after a while (seconds to minutes); sometimes it would just get stuck (D state), and I would need to reboot the whole host (power off/on).

I did an strace and I can see it trying to read (and getting stuck on):
/sys/devices/system/cpu/online
/proc/uptime

I also noticed other lxcfs mount points getting stuck:
/proc/cpuinfo
/proc/meminfo
......
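For reference, a minimal sketch of how such a stall can be timed per syscall, assuming strace is available inside the CT (the -T flag prints the time spent in each call):
Code:
# run the hanging command under strace; -f follows children, -T times each syscall
strace -f -T -e trace=openat,read,close w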

This would not happen immediately, only after the host had been running for a day or so...

I just manually downgraded lxcfs to 4.0.0 (apt install lxcfs=4.0.0-pve1) on the Proxmox host, restarted the CTs, and the delay is gone.
Let's see if it comes back after a few days...
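For reference, the downgrade sequence looks roughly like this (a sketch; 101 is a placeholder CT ID, and holding the package is optional):
Code:
# on the Proxmox host: install the older lxcfs build and keep apt from upgrading it again
apt install lxcfs=4.0.0-pve1
apt-mark hold lxcfs

# restart each affected container so it picks up the new lxcfs files
pct stop 101 && pct start 101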

Cheers,
Liviu
 

Thank you, I just did the same (downgraded lxcfs), and it works like magic.
 
It would be nice to know where exactly lxcfs is stuck there. Could you try to get a backtrace while the hangs are happening (you'll need gdb and lxcfs-dbgsym installed)? Note that if you've been upgrading or downgrading it on the running machine, you may need to restart lxcfs (including the containers, since restarting instead of reloading will destroy the lxcfs files) to get proper information.
To get the backtrace, find lxcfs' pid and run the following (replacing `PIDofLXCFS` with lxcfs' pid number):
Code:
# gdb -batch -n -ex 'set pagination off' -ex 'attach PIDofLXCFS' -ex 'thread apply all bt full'
There'll also be lxcfs-4.0.2 available soon; maybe that helps. Otherwise we'd need more information about what kind of containers you're running so we can try to reproduce this.
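If it helps, the basic container details can be collected with the standard PVE tooling (a sketch; 101 is a placeholder CT ID):
Code:
# on the host: list containers and dump one container's configuration
pct list
pct config 101

# installed package versions
pveversion -v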
 
Thank you for your answer Wolfgang.

I kept one proxmox node on lxcfs-4.0.1.

If/When it starts hanging again I'll try and get a trace.

I ended up restarting all containers after downgrading lxcfs :)

 
Hi Wolfgang,

It happened again, with lxcfs-4.0.0.

It does not get completely stuck, but it takes 2-3 seconds:

#time cat /sys/devices/system/cpu/online
0-7

real 0m2.568s
user 0m0.002s
sys 0m0.000s


I've run gdb a few times while running 'w' in a loop.

# ps axf | grep lxcf
1835 ? Ssl 303:01 /usr/bin/lxcfs /var/lib/lxcfs

root@pvebig:~# gdb -batch -n -ex 'set pagination off' -ex 'attach 1835' -ex 'thread apply all bt full' > 1
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
root@pvebig:~# gdb -batch -n -ex 'set pagination off' -ex 'attach 1835' -ex 'thread apply all bt full' > 2
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
root@pvebig:~# gdb -batch -n -ex 'set pagination off' -ex 'attach 1835' -ex 'thread apply all bt full' > 3
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
root@pvebig:~# gdb -batch -n -ex 'set pagination off' -ex 'attach 1835' -ex 'thread apply all bt full' > 4
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.


Hope you can find something...
 

Attachments

  • gdb_lxcfs.zip (58.2 KB)
Today lxcfs started getting stuck again. I took another trace. Attached.
 

Attachments

  • stuck2.zip (27.6 KB)
The traces show a lot of processes waiting for a single mutex in lxcfs. Perhaps this can be improved, but at first glance there's not really a "bug" visible. It's just that lxcfs seems to be doing quite a bit of work in a mutually-exclusive code path. However, this isn't new. Are you sure this does not happen with older lxcfs versions? Which version did you initially upgrade from? The `top` output does show a lot of load and also a lot of swapping and little free memory.
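For reference, memory and swap pressure on the host can be double-checked with standard tools (a minimal sketch):
Code:
# current memory/swap usage and active swap devices
free -m
swapon --show

# sample paging activity for a few seconds; watch the si/so columns
vmstat 1 5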
 
Hi Wolfgang,

This is my "home/dev-cluster", so I usually update all packages almost every week, which means I would have gone through all the lxcfs versions.

I can't really pin down when it started happening, but it has certainly been happening more often in the last 2 weeks.
It feels like it's been happening more since lxcfs was upgraded to 4.0.1, but it could nevertheless be caused by another package.

After I got the issue with lxcfs-4.0.0 this weekend, I upgraded to 4.0.1 again, and I started getting the issue pretty much every day.

So I downgraded back to 4.0.0, which felt much more stable; we'll see whether that holds up.
If I see the issue on 4.0.0, I'll downgrade to 3.x.

Another thing I have noticed recently: more and more often I see lxc-freeze getting stuck when run by pvesr (yes, I'm using ZFS replication).
Sometimes the CTs end up stuck in "freezing", and it's impossible to unfreeze them without restarting the whole node.
When that happens, the lxcfs issue usually pops up in all the other CTs.
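For reference, a sketch of how such a stuck freeze can be inspected from the host, assuming the cgroup v1 layout that PVE 6 uses by default (123 is a placeholder CT ID):
Code:
# container state as LXC sees it
lxc-ls -f

# freezer cgroup state of the stuck CT (THAWED / FREEZING / FROZEN)
cat /sys/fs/cgroup/freezer/lxc/123/freezer.state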

Let me know if I can possibly give you any other information :)
 
So there does indeed seem to have been a change between lxcfs 3 and 4 that could potentially be the culprit (a cache for an expensive lookup was no longer being filled). There's a fix in the stable-4 branch and we'll be shipping it with lxcfs-4.0.2-pve3 soon. I'll let you know when it's ready in the pvetest repository.
 
There's a new lxcfs release in the pvetest repository, lxcfs-4.0.3-pve1; can you try it and see if it fixes your issue?
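For reference, a sketch of how the test package can be pulled in, assuming PVE 6.x on Debian buster and the usual pvetest repository line (123 is a placeholder CT ID):
Code:
# enable the pvetest repository
echo "deb http://download.proxmox.com/debian/pve buster pvetest" \
    > /etc/apt/sources.list.d/pvetest.list

apt update
apt install lxcfs    # should pull lxcfs-4.0.3-pve1 from pvetest

# restart the containers so they use the new lxcfs instance
pct stop 123 && pct start 123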
 
Thank you Wolfgang.

I just updated and rebooted 2 nodes.
Will see how it goes over the next week (it usually takes 2-4 days for the issue to pop up).

Cheers,
Liviu
 
Hi Wolfgang,

It seems that lxcfs-4.0.3 has fixed the lag or locking issue.

But I've noticed that LXC CTs still sometimes get stuck in the freezing state during backup or ZFS replication.
Today I had a brainwave: with one CT stuck in freezing, instead of rebooting the server I killed lxcfs ("killall -9 lxcfs" was the only thing that worked), and the freeze succeeded and the replication was successful.


See the replication log, and note how long the freeze took:

2020-04-27 22:33:00 123-0: start replication job
2020-04-27 22:33:00 123-0: guest => CT 123, running => 1
2020-04-27 22:33:00 123-0: volumes => local-zfs:subvol-123-disk-0
2020-04-27 22:33:01 123-0: freeze guest filesystem
2020-04-27 23:13:18 123-0: create snapshot '__replicate_123-0_1587983580__' on local-zfs:subvol-123-disk-0

2020-04-27 23:13:18 123-0: thaw guest filesystem
2020-04-27 23:13:18 123-0: incremental sync 'local-zfs:subvol-123-disk-0' (__replicate_123-0_1587905792__ => __replicate_123-0_1587983580__)
2020-04-27 23:13:19 123-0: send from @__replicate_123-0_1587905792__ to rpool/data/subvol-123-disk-0@__replicate_123-0_1587983580__ estimated size is 24.2M
2020-04-27 23:13:19 123-0: total estimated size is 24.2M
2020-04-27 23:13:19 123-0: TIME SENT SNAPSHOT rpool/data/subvol-123-disk-0@__replicate_123-0_1587983580__
2020-04-27 23:13:19 123-0: rpool/data/subvol-123-disk-0@__replicate_123-0_1587905792__ name rpool/data/subvol-123-disk-0@__replicate_123-0_1587905792__ -
2020-04-27 23:13:20 123-0: delete previous replication snapshot '__replicate_123-0_1587905792__' on local-zfs:subvol-123-disk-0
2020-04-27 23:13:20 123-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_123-0_1587905792__' on local-zfs:subvol-123-disk-0
2020-04-27 23:13:21 123-0: end replication job


root@pvefast:~# lxc-ls -f
NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED
123 FREEZING 0 - 192.168.86.71 - true

143337 ? Ds 0:00 \_ /sbin/init
143472 ? Ds 0:06 \_ /lib/systemd/systemd-journald
143494 ? Ds 0:00 \_ /lib/systemd/systemd-networkd
143522 ? Ds 0:00 \_ /lib/systemd/systemd-resolved
143524 ? Dsl 0:01 \_ /usr/lib/accountsservice/accounts-daemon
143525 ? Ds 0:00 \_ /lib/systemd/systemd-logind
143526 ? Ds 0:00 \_ /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
143528 ? Ds 0:00 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
143529 ? Ds 0:00 \_ /usr/sbin/cron -f
143532 ? Dsl 0:02 \_ /usr/sbin/rsyslogd -n -iNONE
143576 ? Ds 0:24 \_ /usr/sbin/snmpd -Lsd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux mteTrigger mteTriggerConf -f -p /run/snmpd.pid
143587 pts/1 Ds+ 0:00 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud tty2 115200,38400,9600 linux
143588 pts/0 Ds+ 0:00 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud tty1 115200,38400,9600 linux
143589 pts/3 Ds+ 0:00 \_ /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
143657 ? Ds 0:00 \_ /usr/sbin/sshd -D
143831 ? Ds 0:00 \_ /usr/lib/postfix/sbin/master -w
143834 ? D 0:00 \_ qmgr -l -t unix -u
810335 ? D 0:00 \_ pickup -l -t unix -u -c
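For reference, a sketch of how to see what those D-state processes are blocked on (the wchan column shows the kernel function they are sleeping in; the PID below is taken from the listing above):
Code:
# all uninterruptible (D-state) processes with their kernel wait channel
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

# kernel stack of one stuck process
cat /proc/143337/stack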
 
Are there any lxcfs-related syslog entries during that time?
 
This is the only thing I can find, but it doesn't match the timestamps of the freezes.

Apr 27 01:43:52 pvefast lxcfs[1746]: utils.c: 244: recv_creds: Timed out waiting for scm_cred: Success
Apr 27 03:32:22 pvefast lxcfs[1746]: utils.c: 244: recv_creds: Timed out waiting for scm_cred: No such file or directory
Apr 27 22:42:02 pvefast lxcfs[1746]: utils.c: 244: recv_creds: Timed out waiting for scm_cred: Success
 
I have downgraded to lxcfs-3.0.3-pve60 and it's been stable for over 5 days.
On 4.0.3 I would get a few LXC freezes a day.
 
Hi Wolfgang,

I still get the LXC freezes, even with the latest Proxmox 6.2-4 and the latest lxcfs. When I downgrade lxcfs to 3.0.3-pve60, I have no issues.

I usually get these frozen LXC containers during ZFS replication of a running container.

While the container is frozen, I can see two lxcfs processes:
2680191 ? Ssl 23:03 /usr/bin/lxcfs /var/lib/lxcfs
1942414 ? S 0:00 \_ /usr/bin/lxcfs /var/lib/lxcfs


I did a strace on the processes:

Parent:
# strace -p 2680191
strace: Process 2680191 attached
futex(0x7ffc5ab09d00, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY

Child:
# strace -p 1942414
strace: Process 1942414 attached
futex(0x7ff454d808d8, FUTEX_WAIT_PRIVATE, 2, NULL


I have also attached the gdb output for both processes.

Killing the child lxcfs unfreezes the container and allows the replication to proceed.
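For reference, a sketch of how only the forked child can be targeted so the main lxcfs daemon and its FUSE mount stay alive (pgrep -o picks the oldest lxcfs process, i.e. the parent; -P restricts the match to its children):
Code:
# pid of the main lxcfs daemon (the oldest lxcfs process)
PARENT=$(pgrep -o -x lxcfs)

# kill only its forked child, leaving the daemon itself running
pgrep -P "$PARENT" -x lxcfs | xargs -r kill -9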

Hope this helps.

Cheers,
Liviu
 

Attachments

  • lxc_stuck.zip (5.1 KB)
