Proxmox 6.1.8 LXC zfs replication issues

Paspao

Hello,

I have two clusters with the same hardware, the same configuration, and the same LXC containers.

One cluster is still on 6.0; the other I upgraded to 6.1.8.

Both use local ZFS.

After upgrading the cluster to 6.1, I had issues with LXC containers: one or more processes were stuck in D (uninterruptible sleep) state, and I was forced to reboot the nodes.

I noticed the containers were hanging during replication hours, and I found replication jobs stuck in SYNCING.

No errors in syslog in most cases.

In one case I found:
Code:
Apr 22 01:36:40 proxmox kernel: [837119.871166]  filp_close+0x36/0x70
Apr 22 01:38:41 proxmox kernel: [837240.702308]  fuse_flush+0x14d/0x190
Apr 22 01:40:42 proxmox kernel: [837361.533436] zabbix_agentd   D    0 39050  39048 0x00004324
Apr 22 01:40:42 proxmox kernel: [837361.533526] RDX: 0000558d0267d2b0 RSI: 0000000000000001 RDI: 0000000000000006
Apr 22 01:42:43 proxmox kernel: [837482.364556] Call Trace:
Apr 22 01:44:43 proxmox kernel: [837603.195770] RIP: 0033:0x7fdc0a5c32b0
Apr 22 01:46:44 proxmox kernel: [837724.026875]  __x64_sys_close+0x22/0x50
Apr 22 01:48:45 proxmox kernel: [837844.857981]  fuse_request_send+0x29/0x30
Apr 22 01:50:46 proxmox kernel: [837965.689128]  __x64_sys_close+0x22/0x50
Apr 22 01:52:47 proxmox kernel: [838086.520278]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Is there a known bug?

Any troubleshooting hint?

Thank you.
P.
 
After upgrading the cluster to 6.1, I had issues with LXC containers: one or more processes were stuck in D (uninterruptible sleep) state, and I was forced to reboot the nodes.
Which processes were in uninterruptible sleep?

Could you please provide the complete journal for that timeframe?
Code:
journalctl --since '2020-04-22'

The part of the stack trace you pasted does not look like a known problem (zabbix being blocked on a FUSE operation...).
Do you have zabbix agents running inside your LXC containers?
 
Hi Stoiko,

I could provide the log, but it really contains nothing interesting, only some lines like:

zed: eid=7528 class=history_event pool_guid=0xA49767BB4C1CD042



Which processes were in uninterruptible sleep?

In one case all processes under the LXC init were in D state (Ds/Dt).
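
For anyone wanting to produce the same kind of listing, here is a minimal sketch, assuming the procps version of ps is installed. filter_dstate is just a made-up helper name; the wchan column shows the kernel function a process is sleeping in, where the kernel exposes it.

```shell
# filter_dstate is a hypothetical helper, not a Proxmox tool.
# Input: ps output with columns PID PPID STAT WCHAN COMMAND.
# Keeps the header line plus every row whose STAT starts with D
# (uninterruptible sleep), e.g. D, Ds, D+.
filter_dstate() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# List all D-state processes on this host.
ps -eo pid,ppid,stat,wchan:32,args | filter_dstate
```

Run on the host, this should also show processes that belong to containers, since the host sees all PIDs.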

The part of the stack trace you pasted does not look like a known problem (zabbix being blocked on a FUSE operation...).
Do you have zabbix agents running inside your LXC containers?

In some cases only zabbix-agent hangs. Yes, we run it in all our LXC containers.


Thank you !
P.
 
zed: eid=7528 class=history_event pool_guid=0xA49767BB4C1CD042
That seems like a regular message that can appear when creating a volume or a snapshot; it is unlikely to be the culprit.

Is the part of the stack trace from your original post listed in the journal?
* If yes, please post the surrounding lines.
* If no, I guess you don't have persistent journaling enabled and rebooted the host; in that case, please check the /var/log/syslog* files for the relevant lines.

Without a complete log it is not really possible to narrow down the issue.
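
The two cases above can be told apart with a quick check. This is only a rough sketch: journal_is_persistent and grep_syslog are made-up helper names, and it assumes journald runs with its default Storage=auto setting, where logs survive a reboot only if /var/log/journal exists. The time window is taken from the trace in the first post.

```shell
# journal_is_persistent and grep_syslog are hypothetical helper names.
# With Storage=auto, journald keeps logs across reboots only when
# /var/log/journal exists. The directory can be overridden for testing.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Search current and rotated syslog files; zgrep also reads the .gz ones.
grep_syslog() {
    pattern=$1; shift
    zgrep -h "$pattern" "$@"
}

if journal_is_persistent; then
    echo "persistent journal: try journalctl --since '2020-04-22 01:30' --until '2020-04-22 02:00'"
else
    echo "no persistent journal: try grep_syslog 'Apr 22 01:[3-5]' /var/log/syslog*"
fi
```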
 
Hello,

Today it happened again, even though I had stopped storage replication.

17558 ? D 0:15 \_ /usr/sbin/zabbix_agentd: collector [processing data]

It seems it happened very close to log rotation time at 3:19 AM.

No errors in syslog (attached).

This is the output of cat /proc/17558/stack:
Code:
[<0>] request_wait_answer+0x133/0x210
[<0>] __fuse_request_send+0x69/0x90
[<0>] fuse_request_send+0x29/0x30
[<0>] fuse_direct_io+0x404/0x690
[<0>] fuse_file_read_iter+0x9a/0x130
[<0>] new_sync_read+0x122/0x1b0
[<0>] __vfs_read+0x29/0x40
[<0>] vfs_read+0x99/0x160
[<0>] ksys_read+0x61/0xe0
[<0>] __x64_sys_read+0x1a/0x20
[<0>] do_syscall_64+0x5a/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Code:
lsof -p 17558

COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
zabbix_ag 17558 _apt  cwd    DIR   0,62       22       34 /
zabbix_ag 17558 _apt  rtd    DIR   0,62       22       34 /
zabbix_ag 17558 _apt  txt    REG   0,62   287448   100655 /usr/sbin/zabbix_agentd
zabbix_ag 17558 _apt  DEL    REG    0,1                 1 /SYSV6c3e02ce
zabbix_ag 17558 _apt  mem    REG   0,62             40804 /lib/x86_64-linux-gnu/libnss_files-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40806 /lib/x86_64-linux-gnu/libnss_nis-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40647 /lib/x86_64-linux-gnu/libnsl-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40801 /lib/x86_64-linux-gnu/libnss_compat-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             98313 /lib/x86_64-linux-gnu/libkeyutils.so.1.4 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             54283 /usr/lib/x86_64-linux-gnu/libkrb5support.so.0.1 (path dev=0,25, inode=5667)
zabbix_ag 17558 _apt  mem    REG   0,62             54095 /lib/x86_64-linux-gnu/libcom_err.so.2.1 (path dev=0,25, inode=225731)
zabbix_ag 17558 _apt  mem    REG   0,62             54220 /usr/lib/x86_64-linux-gnu/libk5crypto.so.3.1 (path dev=0,25, inode=5659)
zabbix_ag 17558 _apt  mem    REG   0,62             54263 /usr/lib/x86_64-linux-gnu/libkrb5.so.3.3 (path dev=0,25, inode=5665)
zabbix_ag 17558 _apt  mem    REG   0,62               182 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             34866 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.0.0 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62            180292 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.16 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             34561 /usr/lib/x86_64-linux-gnu/librtmp.so.0 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62               149 /lib/x86_64-linux-gnu/libz.so.1.2.7 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             54240 /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2.2 (path dev=0,25, inode=1672)
zabbix_ag 17558 _apt  mem    REG   0,62             68992 /usr/lib/x86_64-linux-gnu/libssh2.so.1.0.1 (path dev=0,25, inode=30160)
zabbix_ag 17558 _apt  mem    REG   0,62             34537 /usr/lib/x86_64-linux-gnu/libidn.so.11.6.8 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40570 /lib/x86_64-linux-gnu/libpthread-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             54161 /lib/x86_64-linux-gnu/libgcrypt.so.11.7.0 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             54197 /usr/lib/x86_64-linux-gnu/libgnutls.so.26.22.4 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             34435 /usr/lib/x86_64-linux-gnu/libsasl2.so.2.0.25 (path dev=0,25, inode=69761)
zabbix_ag 17558 _apt  mem    REG   0,62             40574 /lib/x86_64-linux-gnu/libc-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40809 /lib/x86_64-linux-gnu/libresolv-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40815 /lib/x86_64-linux-gnu/librt-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40599 /lib/x86_64-linux-gnu/libdl-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             95339 /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4.2.0 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             98330 /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2.8.3 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             98329 /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2.8.3 (stat: No such file or directory)
zabbix_ag 17558 _apt  mem    REG   0,62             40571 /lib/x86_64-linux-gnu/ld-2.13.so (stat: No such file or directory)
zabbix_ag 17558 _apt    0r   CHR    1,3      0t0        5 /dev/null
zabbix_ag 17558 _apt    1w   REG   0,62     1937    70384 /var/log/zabbix-agent/zabbix_agentd.log.1 (deleted)
zabbix_ag 17558 _apt    2w   REG   0,62     1937    70384 /var/log/zabbix-agent/zabbix_agentd.log.1 (deleted)
zabbix_ag 17558 _apt    3w   REG  0,171        4       33 /run/zabbix/zabbix_agentd.pid (deleted)
zabbix_ag 17558 _apt    4u  sock    0,9      0t0 35768137 protocol: TCP
zabbix_ag 17558 _apt    5u  sock    0,9      0t0 35768138 protocol: TCPv6
zabbix_ag 17558 _apt    6r   REG  0,116        0        7 /proc/stat
zabbix_ag 17558 _apt   11r   REG    0,5        0    29336 /proc/30433/status

Can any other data from /proc help with troubleshooting?
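
One thing that might help narrow this down is finding out which mount the open /proc/stat (DEVICE 0,116 in the lsof output) belongs to. The sketch below looks a device up in a mountinfo file. mount_for_dev is a made-up helper name, and the guess that the device resolves to lxcfs (the FUSE filesystem that normally serves /proc/stat inside containers) is only an assumption that would fit the fuse_* frames in the stack.

```shell
# mount_for_dev is a hypothetical helper: print mount point, filesystem
# type and source for a given major:minor pair from a mountinfo file.
mount_for_dev() {
    dev=$1
    file=${2:-/proc/self/mountinfo}
    # Field 3 is major:minor, field 5 the mount point. The fs type and
    # source follow the "-" separator, whose position varies per line.
    awk -v dev="$dev" '$3 == dev {
        for (i = 7; i <= NF; i++) if ($i == "-") break
        print $5, $(i + 1), $(i + 2)
    }' "$file"
}

# lsof prints DEVICE as "0,116"; mountinfo writes it as "0:116".
mount_for_dev 0:116
```

To look at the mount namespace of the hung process itself, pass its mountinfo as the second argument, e.g. mount_for_dev 0:116 /proc/17558/mountinfo.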

Having to reboot the server for this is a real problem: the reboot command then hangs while killing processes, and I don't like power cycling.

Could you please help in troubleshooting?

Thank you
P.
 

This is the output of cat /proc/17558/stack:
Could you try to read the following via cat?
* `/proc/stat`
* `/proc/30433/status`
(i.e. the two files that zabbix is accessing when it hangs)

(of course if this happens again you'll need to run the lsof again)
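
Since the read itself may block, it could be worth wrapping it in a time limit. A minimal sketch, assuming coreutils timeout is available; try_read is a made-up helper name. Note that a read stuck in uninterruptible sleep may not react even to SIGKILL, so it is safer to run this in a spare shell.

```shell
# try_read is a hypothetical helper: read a file with a time limit so a
# blocked read does not tie up the interactive shell indefinitely.
# timeout sends SIGTERM after 5 seconds and SIGKILL 2 seconds later.
try_read() {
    if timeout -k 2 5 cat "$1" > /dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "failed or timed out: $1"
    fi
}

try_read /proc/stat
try_read /proc/30433/status
```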

Thanks!
 
Additionally, could I ask you for the output of:
* `pveversion -v`
* `dmesg`
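
A small sketch for saving both outputs to files that can be attached to a post. collect is a made-up helper name, not part of Proxmox, and the file names are arbitrary.

```shell
# collect <outfile> <command...>: run a command, save stdout and stderr.
# collect is a hypothetical helper name, not part of Proxmox.
collect() {
    out=$1; shift
    if command -v "$1" > /dev/null 2>&1; then
        "$@" > "$out" 2>&1 || echo "exit status: $?" >> "$out"
    else
        echo "$1: command not found" > "$out"
    fi
}

dir=$(mktemp -d)
collect "$dir/pveversion.txt" pveversion -v
collect "$dir/dmesg.txt" dmesg
echo "outputs saved in $dir"
```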