hey
My second PVE host (pve2) has issues that I haven't been able to pin down.
It's a Minisforum N100-based machine with a "primary" 500 GB WD Red NVMe and a backup SATA SSD (also a 500 GB WD Red).
Two VMs run on it: 201 (Jellyfin) and 210 (Pi-hole).
I passed the iGPU through to the Jellyfin VM following this guide: https://3os.org/infrastructure/proxmox/gpu-passthrough/igpu-passthrough-to-vm/
(device 00:02.0)
Twice now, around or during a backup to my NAS (I can't tell exactly when), pve/data has switched to read-only because of a metadata error, even though the metadata usage is low.
I reinstalled PVE once, but the issue happened again after two or three weeks.
After the second time, I simply "nuked" pve/data and recreated it from the command line.
I ran a local backup once as a test, and that worked fine.
But during tonight's scheduled local backup, I got an "err -5 - Input/output error".
The task starts at 03:30, which is why I pulled the log from around that time.
Can you help me see what the issue could be, or where I could investigate further?
Maybe it's a hint to my overall problem.
Code:
Jun 29 03:30:02 pve2 pvescheduler[1695605]: <root@pam> starting task UPID:pve2:0019DF76:03AA5924:667F639A:vzdump::root@pam:
Jun 29 03:30:02 pve2 pvescheduler[1695606]: INFO: starting new backup job: vzdump --notes-template '{{guestname}}' --fleecing 0 --mode stop --storage bu_ssd --node pve2 --prune-backups 'keep-monthly=1,keep-weekly=6' --notification-mode notification-system --quiet 1 --compress zstd --all 1
Jun 29 03:30:02 pve2 pvescheduler[1695606]: INFO: Starting Backup of VM 201 (qemu)
Jun 29 03:30:03 pve2 qm[1695614]: shutdown VM 201: UPID:pve2:0019DF7E:03AA5972:667F639B:qmshutdown:201:root@pam:
Jun 29 03:30:03 pve2 qm[1695613]: <root@pam> starting task UPID:pve2:0019DF7E:03AA5972:667F639B:qmshutdown:201:root@pam:
Jun 29 03:30:52 pve2 kernel: tap201i0: left allmulticast mode
Jun 29 03:30:52 pve2 kernel: fwbr201i0: port 2(tap201i0) entered disabled state
Jun 29 03:30:52 pve2 kernel: fwbr201i0: port 1(fwln201i0) entered disabled state
Jun 29 03:30:52 pve2 kernel: vmbr0: port 2(fwpr201p0) entered disabled state
Jun 29 03:30:52 pve2 kernel: fwln201i0 (unregistering): left allmulticast mode
Jun 29 03:30:52 pve2 kernel: fwln201i0 (unregistering): left promiscuous mode
Jun 29 03:30:52 pve2 kernel: fwbr201i0: port 1(fwln201i0) entered disabled state
Jun 29 03:30:52 pve2 kernel: fwpr201p0 (unregistering): left allmulticast mode
Jun 29 03:30:52 pve2 kernel: fwpr201p0 (unregistering): left promiscuous mode
Jun 29 03:30:52 pve2 kernel: vmbr0: port 2(fwpr201p0) entered disabled state
Jun 29 03:30:52 pve2 qmeventd[582]: read: Connection reset by peer
Jun 29 03:30:52 pve2 qm[1695613]: <root@pam> end task UPID:pve2:0019DF7E:03AA5972:667F639B:qmshutdown:201:root@pam: OK
Jun 29 03:30:52 pve2 systemd[1]: 201.scope: Deactivated successfully.
Jun 29 03:30:52 pve2 systemd[1]: 201.scope: Consumed 54min 46.216s CPU time.
Jun 29 03:30:53 pve2 systemd[1]: Started 201.scope.
Jun 29 03:30:53 pve2 qmeventd[1695770]: Starting cleanup for 201
Jun 29 03:30:53 pve2 qmeventd[1695770]: trying to acquire lock...
Jun 29 03:30:53 pve2 kernel: tap201i0: entered promiscuous mode
Jun 29 03:30:53 pve2 kernel: vmbr0: port 2(fwpr201p0) entered blocking state
Jun 29 03:30:53 pve2 kernel: vmbr0: port 2(fwpr201p0) entered disabled state
Jun 29 03:30:53 pve2 kernel: fwpr201p0: entered allmulticast mode
Jun 29 03:30:53 pve2 kernel: fwpr201p0: entered promiscuous mode
Jun 29 03:30:53 pve2 kernel: vmbr0: port 2(fwpr201p0) entered blocking state
Jun 29 03:30:53 pve2 kernel: vmbr0: port 2(fwpr201p0) entered forwarding state
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 1(fwln201i0) entered blocking state
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 1(fwln201i0) entered disabled state
Jun 29 03:30:53 pve2 kernel: fwln201i0: entered allmulticast mode
Jun 29 03:30:53 pve2 kernel: fwln201i0: entered promiscuous mode
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 1(fwln201i0) entered blocking state
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 1(fwln201i0) entered forwarding state
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 2(tap201i0) entered blocking state
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 2(tap201i0) entered disabled state
Jun 29 03:30:53 pve2 kernel: tap201i0: entered allmulticast mode
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 2(tap201i0) entered blocking state
Jun 29 03:30:53 pve2 kernel: fwbr201i0: port 2(tap201i0) entered forwarding state
Jun 29 03:30:54 pve2 kernel: vfio-pci 0000:00:02.0: enabling device (0000 -> 0003)
Jun 29 03:30:55 pve2 qmeventd[1695770]: OK
Jun 29 03:30:55 pve2 qmeventd[1695770]: vm still running
Jun 29 03:31:05 pve2 kernel: kvm: kvm [1695782]: ignored rdmsr: 0xc0011029 data 0x0
Jun 29 03:31:24 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:24 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x54f553c000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:31:24 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:24 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x1f04e6d000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:31:24 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:24 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x49f5631000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:31:24 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:29 pve2 kernel: dmar_fault: 851 callbacks suppressed
Jun 29 03:31:29 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:29 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x5d5d199000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:31:29 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:29 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x1ba1e18000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:31:29 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:29 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x662e6e7000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:31:29 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 387
Jun 29 03:31:31 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 387
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:32 pve2 pvescheduler[1695606]: ERROR: Backup of VM 201 failed - job failed with err -5 - Input/output error
Jun 29 03:31:32 pve2 pvescheduler[1695606]: INFO: Starting Backup of VM 210 (qemu)
Jun 29 03:31:33 pve2 qm[1696030]: shutdown VM 210: UPID:pve2:0019E11E:03AA7C8C:667F63F5:qmshutdown:210:root@pam:
Jun 29 03:31:33 pve2 qm[1696026]: <root@pam> starting task UPID:pve2:0019E11E:03AA7C8C:667F63F5:qmshutdown:210:root@pam:
Jun 29 03:31:35 pve2 kernel: tap210i0: left allmulticast mode
Jun 29 03:31:35 pve2 kernel: fwbr210i0: port 2(tap210i0) entered disabled state
Jun 29 03:31:35 pve2 kernel: fwbr210i0: port 1(fwln210i0) entered disabled state
Jun 29 03:31:35 pve2 kernel: vmbr0: port 3(fwpr210p0) entered disabled state
Jun 29 03:31:35 pve2 kernel: fwln210i0 (unregistering): left allmulticast mode
Jun 29 03:31:35 pve2 kernel: fwln210i0 (unregistering): left promiscuous mode
Jun 29 03:31:35 pve2 kernel: fwbr210i0: port 1(fwln210i0) entered disabled state
Jun 29 03:31:35 pve2 kernel: fwpr210p0 (unregistering): left allmulticast mode
Jun 29 03:31:35 pve2 kernel: fwpr210p0 (unregistering): left promiscuous mode
Jun 29 03:31:35 pve2 kernel: vmbr0: port 3(fwpr210p0) entered disabled state
Jun 29 03:31:35 pve2 qmeventd[582]: read: Connection reset by peer
Jun 29 03:31:36 pve2 qm[1696026]: <root@pam> end task UPID:pve2:0019E11E:03AA7C8C:667F63F5:qmshutdown:210:root@pam: OK
Jun 29 03:31:36 pve2 systemd[1]: 210.scope: Deactivated successfully.
Jun 29 03:31:36 pve2 systemd[1]: 210.scope: Consumed 22min 16.139s CPU time.
Jun 29 03:31:36 pve2 systemd[1]: Started 210.scope.
Jun 29 03:31:36 pve2 qmeventd[1696046]: Starting cleanup for 210
Jun 29 03:31:36 pve2 qmeventd[1696046]: trying to acquire lock...
Jun 29 03:31:36 pve2 kernel: tap210i0: entered promiscuous mode
Jun 29 03:31:36 pve2 kernel: vmbr0: port 3(fwpr210p0) entered blocking state
Jun 29 03:31:36 pve2 kernel: vmbr0: port 3(fwpr210p0) entered disabled state
Jun 29 03:31:36 pve2 kernel: fwpr210p0: entered allmulticast mode
Jun 29 03:31:36 pve2 kernel: fwpr210p0: entered promiscuous mode
Jun 29 03:31:36 pve2 kernel: vmbr0: port 3(fwpr210p0) entered blocking state
Jun 29 03:31:36 pve2 kernel: vmbr0: port 3(fwpr210p0) entered forwarding state
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 1(fwln210i0) entered blocking state
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 1(fwln210i0) entered disabled state
Jun 29 03:31:36 pve2 kernel: fwln210i0: entered allmulticast mode
Jun 29 03:31:36 pve2 kernel: fwln210i0: entered promiscuous mode
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 1(fwln210i0) entered blocking state
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 1(fwln210i0) entered forwarding state
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 2(tap210i0) entered blocking state
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 2(tap210i0) entered disabled state
Jun 29 03:31:36 pve2 kernel: tap210i0: entered allmulticast mode
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 2(tap210i0) entered blocking state
Jun 29 03:31:36 pve2 kernel: fwbr210i0: port 2(tap210i0) entered forwarding state
Jun 29 03:31:36 pve2 qmeventd[1696046]: OK
Jun 29 03:31:36 pve2 qmeventd[1696046]: vm still running
Jun 29 03:31:43 pve2 kernel: kvm: kvm [1696056]: ignored rdmsr: 0xc0011029 data 0x0
Jun 29 03:31:52 pve2 pvescheduler[1695606]: INFO: Finished Backup of VM 210 (00:00:20)
Jun 29 03:31:52 pve2 pvescheduler[1695606]: INFO: Backup job finished with errors
Jun 29 03:31:52 pve2 pvescheduler[1695606]: job errors
Jun 29 03:36:00 pve2 kernel: dmar_fault: 11864 callbacks suppressed
Jun 29 03:36:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:00 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x1b7dc2c000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:00 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x324857d000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:00 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x67193d0000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:18 pve2 kernel: dmar_fault: 2 callbacks suppressed
Jun 29 03:36:18 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:18 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x7cb023c000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:18 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:18 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x23be166000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:18 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:18 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x191fe19000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:19 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:23 pve2 kernel: dmar_fault: 104 callbacks suppressed
Jun 29 03:36:23 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:23 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x14e79c7000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:23 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:23 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x56208df000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:23 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:36:23 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x29bbf67000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:36:23 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:40:52 pve2 smartd[580]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 52 to 51
Jun 29 03:46:11 pve2 kernel: dmar_fault: 23 callbacks suppressed
Jun 29 03:46:11 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:46:11 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x99ac6d000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:46:11 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:46:11 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x2a89b75000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:46:11 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Jun 29 03:46:11 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x3d496aa000 [fault reason 0x05] PTE Write access is not set
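To line up the timing in the log above, a small script could count the DMAR faults and the thin-pool errors per minute. This is only a rough sketch: the regexes are based on the kernel lines in this log, while the function name `summarize` and the `SAMPLE` excerpt are made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the kernel lines in the log above (an assumption:
# other kernels may format these messages differently).
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}):\d{2} \S+ kernel: (.*)$")
DMAR_RE = re.compile(
    r"DMAR: \[DMA (?:Read|Write)[^\]]*\] Request device \[([0-9a-f:.]+)\]"
)
SUPPRESSED_RE = re.compile(r"dmar_fault: (\d+) callbacks suppressed")
THIN_RE = re.compile(r"device-mapper: thin: .* failed: error = (-?\d+)")


def summarize(lines):
    """Count kernel events per minute.

    Returns three Counters:
      dmar       -- (minute, device) -> number of logged DMAR faults
      suppressed -- minute -> number of rate-limited (suppressed) faults
      thin       -- (minute, error code) -> number of dm-thin errors
    """
    dmar, suppressed, thin = Counter(), Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a kernel message, skip it
        minute, message = match.groups()
        if (m := DMAR_RE.search(message)):
            dmar[(minute, m.group(1))] += 1
        elif (m := SUPPRESSED_RE.search(message)):
            suppressed[minute] += int(m.group(1))
        elif (m := THIN_RE.search(message)):
            thin[(minute, int(m.group(1)))] += 1
    return dmar, suppressed, thin


# Hypothetical excerpt; in practice, pass the lines of the saved journal.
SAMPLE = """\
Jun 29 03:31:24 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x54f553c000 [fault reason 0x05] PTE Write access is not set
Jun 29 03:31:29 pve2 kernel: dmar_fault: 851 callbacks suppressed
Jun 29 03:31:31 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Jun 29 03:31:32 pve2 pvescheduler[1695606]: ERROR: Backup of VM 201 failed - job failed with err -5 - Input/output error
"""

dmar, suppressed, thin = summarize(SAMPLE.splitlines())
for (minute, device), count in sorted(dmar.items()):
    print(f"{minute}  DMAR faults from {device}: {count} logged")
for minute, count in sorted(suppressed.items()):
    print(f"{minute}  DMAR faults suppressed: {count}")
for (minute, errno), count in sorted(thin.items()):
    print(f"{minute}  dm-thin errors ({errno}): {count}")
```

Run against the full log, this would show whether the DMAR faults from 00:02.0 always start before the first device-mapper error, which is the ordering seen in this one excerpt (faults at 03:31:24, thin errors at 03:31:31).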
thank you very much!