Hello,
Today something strange happened: all of a sudden Icinga started pushing dozens of alerts. All the CTs were unreachable. Although they still responded to ping, I couldn't ssh into them or run pct enter from the host (the command just hung). The web UI showed nothing unusual; the CTs were listed as up and running.
It's a production server, so I freaked out and attempted a soft reboot, which got stuck (systemd said it had no time limit while waiting for a process, pvedaemon I think, to exit), so I did a hard reboot.
After the reboot, everything has been fine so far.
Here are the relevant parts of the kernel log from when this happened:
kernel: [172455.308889] [<ffffffff81306128>] fuse_direct_io+0x3a8/0x5b0
kernel: [172455.323219] [<ffffffff812fba47>] fuse_request_send+0x27/0x30
kernel: [172455.326029] [<ffffffff811fd2c6>] __vfs_read+0x26/0x40
kernel: [172455.327518] [<ffffffff8180aaf2>] entry_SYSCALL_64_fastpath+0x16/0x75
kernel: [172455.329815] ffff880d01637be8 0000000000000082 ffff881038532940 ffff880b1cfce040
kernel: [172455.332307] [<ffffffff810bdd30>] ? wait_woken+0x90/0x90
kernel: [172455.333529] [<ffffffff81306128>] fuse_direct_io+0x3a8/0x5b0
kernel: [172455.335099] [<ffffffff811fd2c6>] __vfs_read+0x26/0x40
kernel: [172455.337775] cat D ffff88103f456a00 0 22418 6479 0x00000104
kernel: [172455.340176] [<ffffffff812fb863>] request_wait_answer+0x163/0x280
kernel: [172455.344816] INFO: task check_mem:22442 blocked for more than 120 seconds.
kernel: [172455.345213] Tainted: P O 4.2.8-1-pve #1
kernel: [172455.346393] ffff880121ec3be8 0000000000000086 ffffffff81e14580 ffff880f650c3700
kernel: [172455.347618] Call Trace:
kernel: [172455.348823] [<ffffffff810bdd30>] ? wait_woken+0x90/0x90
kernel: [172455.349238] [<ffffffff812fba10>] __fuse_request_send+0x90/0xa0
kernel: [172455.351169] [<ffffffff811fd264>] new_sync_read+0x94/0xd0
kernel: [172455.379977] 0000000000000246 ffff880f0e374000 ffff880f0e373c38 ffff880ff6e8fcf0
kernel: [172455.384370] [<ffffffff811fd264>] new_sync_read+0x94/0xd0
And here are the messages from the hung process during the soft reboot (these entries were repeated many times before the hard reboot):
pvedaemon[11052]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/PVE/Tools.pm line 834.
pvedaemon[11052]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/PVE/Tools.pm line 834.
pvedaemon[11052]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/PVE/Tools.pm line 834.
pvedaemon[24466]: Argument "\n" isn't numeric in int at /usr/share/perl5/PVE/Tools.pm line 847, <GEN3306> line 1.
pvedaemon[24466]: Argument "\n" isn't numeric in int at /usr/share/perl5/PVE/Tools.pm line 848, <GEN3306> line 2.
pvedaemon[24466]: Argument "\n" isn't numeric in int at /usr/share/perl5/PVE/Tools.pm line 849, <GEN3306> line 3.
lxcfs[1510]: Timed out waiting for scm_cred: Success
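In case it helps anyone reading a similar log, here is a minimal sketch for pulling the blocked task names out of the hung-task messages and counting the FUSE-related frames in the traces. The function names are made up for illustration, and the regexes only assume the line format shown in the excerpts above; other kernels may format these lines differently.

```python
import re

# Hung-task detector line, e.g.
# "kernel: [172455.344816] INFO: task check_mem:22442 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Call-trace frame, e.g. "[<ffffffff81306128>] fuse_direct_io+0x3a8/0x5b0"
# (an optional "? " marks frames the kernel considers unreliable).
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\? )?(?P<sym>[A-Za-z0-9_.]+)\+0x")


def blocked_tasks(lines):
    """Return (timestamp, task name, pid, seconds) for each hung-task report."""
    found = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            found.append(
                (float(m.group("ts")), m.group("name"),
                 int(m.group("pid")), int(m.group("secs")))
            )
    return found


def fuse_frames(lines):
    """Count trace frames whose symbol starts with fuse_ (ignoring leading underscores)."""
    count = 0
    for line in lines:
        m = FRAME.search(line)
        if m and m.group("sym").lstrip("_").startswith("fuse_"):
            count += 1
    return count
```

Feeding it the kernel lines above (for example from a saved copy of the log) lists check_mem as a blocked task and shows that the traces pass through fuse_* functions, which fits with the lxcfs timeout message.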