From time to time I have to restart the servers because I lose control of them. Sometimes I get messages like these in the log, and from that moment on I can't start or stop virtual servers:
Code:
Feb 17 02:38:29 kfn1-node4 pvedaemon[38846]: can't lock file '/var/lock/qemu-server/lock-141.conf' - got timeout
Feb 17 02:38:29 kfn1-node4 pvedaemon[8817]: <root@pam> end task UPID:kfn1-node4:000097BE:08EF85DA:5E49EE8B:qmstop:141:root@pam: can't lock file '/var/lock/qemu-server/lock-141.conf' - got timeout
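In case it is useful for diagnosis, here is a minimal sketch of how the lock can be inspected, assuming the lock path and VMID 141 from the messages above:
Code:
# List processes that still have the lock file open (if any)
lsof /var/lock/qemu-server/lock-141.conf
fuser -v /var/lock/qemu-server/lock-141.conf

# Look for tasks stuck in uninterruptible sleep (D state)
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
If the process holding the lock is stuck in D state, it cannot be killed, which would match the fact that only a full reboot helps.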
Before this problem appears, I always see messages like the following in /var/log/syslog:
Code:
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520706] CPU: 52 PID: 1649 Comm: z_wr_int_6 Tainted: P O 4.15.18-21-pve #1
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520736] Hardware name: Supermicro SYS-1029U-TR4T/X11DPU, BIOS 3.1a 07/19/2019
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520811] RIP: 0010:buf_hash_insert+0xbd/0x180 [zfs]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520828] RSP: 0018:ffffb07b60eebcc0 EFLAGS: 00010206
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520846] RAX: 1b0210fa010154fc RBX: ffff95e1569c91f0 RCX: 0000000000000080
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520866] RDX: 0000000000000001 RSI: ffff96324bf587b0 RDI: ffffb07b7736d558
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520886] RBP: ffffb07b60eebcd8 R08: ffff95e1569c9200 R09: 0000000000baf367
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520913] R10: ffffb07b60eebce0 R11: 0000000000000000 R12: 00000000028256ab
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520934] R13: 000000000005aac0 R14: ffff96335e5df690 R15: ffff9633b1dcbb50
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520958] FS: 0000000000000000(0000) GS:ffff9635bf300000(0000) knlGS:0000000000000000
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.520983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521000] CR2: 00007f79ef135000 CR3: 0000002adea0a003 CR4: 00000000007626e0
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521040] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521060] PKRU: 55555554
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521070] Call Trace:
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521123] arc_write_done+0x125/0x3f0 [zfs]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521164] zio_done+0x2d0/0xe60 [zfs]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521179] ? kfree+0x165/0x180
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521197] ? spl_kmem_free+0x33/0x40 [spl]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521237] zio_execute+0x95/0xf0 [zfs]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521253] taskq_thread+0x2ae/0x4d0 [spl]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.521268] ? wake_up_q+0x80/0x80
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.522089] ? zio_reexecute+0x390/0x390 [zfs]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.523021] kthread+0x105/0x140
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.523759] ? task_done+0xb0/0xb0 [spl]
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.524503] ? kthread_create_worker_on_cpu+0x70/0x70
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.525215] ret_from_fork+0x1f/0x40
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.525909] Code: 05 31 7c 18 00 4a 8d 3c e0 48 8b 37 48 85 f6 0f 84 c2 00 00 00 48 8b 0b 48 89 f0 31 d2 eb 0c 48 8b 40 20 83 c2 01 48 85 c0 74 2f <48> 39 08 75 ef 4c 8b 53 08 4c 39 50 08 75 e5 4c 8b 5b 10 4c 39
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.527427] RIP: buf_hash_insert+0xbd/0x180 [zfs] RSP: ffffb07b60eebcc0
Feb 17 01:49:32 kfn1-node4 kernel: [1496221.528218] ---[ end trace 1ec84add9901e42b ]---
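The trace points into ZFS (buf_hash_insert / arc_write_done). A sketch of what can be checked on the pool and the ARC when this appears, using only standard tools (nothing here is specific to my setup):
Code:
# Pool health: suspended I/O or checksum errors would show up here
zpool status -v

# Kernel messages about hung or blocked tasks after the trace
dmesg | grep -iE 'hung_task|blocked for more than'

# Current ARC state (size, hash table counters)
head -n 30 /proc/spl/kstat/zfs/arcstats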
This forum has threads describing problems with similar symptoms, but this looks like something else. Only a full server reboot helps. Does anyone have an idea how to diagnose such a problem?
pveversion -v
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-21-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15.18-21-pve: 4.15.18-48
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-41
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
pve-zsync: 1.7-4
qemu-server: 5.0-55
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
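If more data would help: the next time the node locks up, a blocked-task dump can be triggered before rebooting. A sketch, assuming the magic SysRq interface is available on this kernel:
Code:
# Enable the magic SysRq interface (if not already enabled)
echo 1 > /proc/sys/kernel/sysrq
# 'w' dumps all tasks in uninterruptible (blocked) state to the kernel log
echo w > /proc/sysrq-trigger
# The stacks of the stuck tasks then appear here
dmesg | tail -n 200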