Hello all,
Recently I started having issues, mainly during backups: Proxmox hangs, the management page becomes unreachable, the backup never completes and everything basically stops. A hard reboot is the only way out.
The machine is a Minisforum MS-01 with the i9-13900H.
From what I saw, I/O was high (45%) and then nothing, so I guess it is storage related. The boot drive is a cheap 1 TB NVMe from Lexar; it also hosts the storage for a few low-use LXCs (Pi-hole, cloudflared, WireGuard server, ...) and one low-use VM.
All other VMs are on a Samsung 990 Pro 4 TB NVMe: Docker, Plex, a Windows VM, Roon, Home Assistant, TrueNAS (for testing) and a Minecraft server.
All VMs/LXCs were backed up in snapshot mode.
Do you know where the issue may be, or can you point me in the right direction? Thanks!
Here is the relevant part of the log; there is nothing after it until the reboot (nvme1 is the Lexar drive):
May 11 02:00:06 pve pvescheduler[350646]: INFO: starting new backup job: vzdump 100 101 102 104 105 106 107 999 --quiet 1 --notes-template '{{guestname}}' --mailnotification failure --bwlimit 204800 --compress zstd --storage NAS --node pve --fleecing 0 --mode snapshot --prune-backups 'keep-daily=7,keep-monthly=2,keep-yearly=1'
May 11 02:00:06 pve pvescheduler[350646]: INFO: Starting Backup of VM 100 (qemu)
May 11 02:12:57 pve pvescheduler[350646]: INFO: Finished Backup of VM 100 (00:12:51)
May 11 02:12:57 pve pvescheduler[350646]: INFO: Starting Backup of VM 101 (qemu)
May 11 02:17:01 pve CRON[358679]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 11 02:17:01 pve CRON[358680]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 11 02:17:01 pve CRON[358679]: pam_unix(cron:session): session closed for user root
May 11 02:21:30 pve pvescheduler[350646]: INFO: Finished Backup of VM 101 (00:08:33)
May 11 02:21:30 pve pvescheduler[350646]: INFO: Starting Backup of VM 102 (qemu)
May 11 02:23:43 pve pve-firewall[1946]: firewall update time (7.329 seconds)
May 11 02:23:44 pve pvestatd[1954]: status update time (7.112 seconds)
May 11 02:24:25 pve pve-firewall[1946]: firewall update time (19.170 seconds)
May 11 02:24:26 pve pvestatd[1954]: status update time (19.386 seconds)
May 11 02:24:30 pve pve-ha-lrm[1990]: loop take too long (32 seconds)
May 11 02:24:44 pve pvescheduler[362733]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
May 11 02:24:44 pve pve-firewall[1946]: firewall update time (9.160 seconds)
May 11 02:24:45 pve pvestatd[1954]: status update time (8.960 seconds)
May 11 02:29:11 pve pvescheduler[350646]: INFO: Finished Backup of VM 102 (00:07:41)
May 11 02:29:12 pve pvescheduler[350646]: INFO: Starting Backup of VM 104 (qemu)
May 11 02:29:35 pve pve-firewall[1946]: firewall update time (9.897 seconds)
May 11 02:29:35 pve systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
May 11 02:30:03 pve pve-firewall[1946]: firewall update time (18.145 seconds)
May 11 02:30:06 pve pvestatd[1954]: status update time (39.916 seconds)
May 11 02:30:07 pve pveupdate[367530]: <root@pam> starting task UPID:pve:00059C41:004FA0CC:663EBC0F:aptupdate::root@pam:
May 11 02:30:21 pve pvestatd[1954]: status update time (5.479 seconds)
May 11 02:30:46 pve pveproxy[303815]: detected empty handle
May 11 02:30:46 pve pve-firewall[1946]: firewall update time (13.551 seconds)
May 11 02:30:52 pve kernel: nvme nvme1: I/O tag 109 (006d) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:524288
-- Reboot --
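To make the pattern in the log easier to see, here is a minimal Python sketch that pulls the slow "update time" / "loop take too long" lines and any kernel nvme timeouts out of syslog-style lines like the ones above. It assumes the previous boot's journal has been exported to plain text (for example with `journalctl -b -1`); the function name `find_stalls` and the 5-second `threshold` are hypothetical choices for illustration, not Proxmox settings.

```python
import re

# Matches syslog-style lines as in the excerpt above, e.g.
# "May 11 02:24:26 pve pvestatd[1954]: status update time (19.386 seconds)"
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) (\S+) ([\w\-.]+)(?:\[\d+\])?: (.*)$")
# Matches the duration reported by pvestatd/pve-firewall/pve-ha-lrm
DURATION_RE = re.compile(r"(?:update time|take too long) \((\d+(?:\.\d+)?) seconds?\)")


def find_stalls(lines, threshold=5.0):
    """Return (timestamp, process, message) tuples for slow updates and nvme timeouts.

    threshold is a hypothetical cut-off in seconds chosen for illustration.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip "-- Reboot --" and other non-log lines
        ts, _host, proc, msg = m.groups()
        d = DURATION_RE.search(msg)
        if d and float(d.group(1)) >= threshold:
            hits.append((ts, proc, msg))
        elif proc == "kernel" and "nvme" in msg and "timeout" in msg:
            hits.append((ts, proc, msg))
    return hits


# A few lines copied from the excerpt above
SAMPLE = [
    "May 11 02:23:44 pve pvestatd[1954]: status update time (7.112 seconds)",
    "May 11 02:24:30 pve pve-ha-lrm[1990]: loop take too long (32 seconds)",
    "May 11 02:17:01 pve CRON[358680]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)",
    "May 11 02:30:52 pve kernel: nvme nvme1: I/O tag 109 (006d) opcode 0x1 (I/O Cmd) QID 1 timeout, aborting req_op:WRITE(1) size:524288",
    "-- Reboot --",
]

if __name__ == "__main__":
    for hit in find_stalls(SAMPLE):
        print(hit)
```

Run against the full exported journal instead of `SAMPLE`, this lists the growing update times leading up to the nvme1 write timeout in one place.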