hey
I have two PVE machines (pve1 and pve2).
pve1 is rock solid, but pve2 has often had backup errors, usually because the disk switched into read-only mode for no apparent reason.
The backup failed again tonight, and one of my VMs seems to be corrupt.
I have never been able to run a SMART test because the currently installed SMART tool does not support NVMe drives (at least it didn't when I tried a month ago). I use a WD Red 500 GB NVMe SSD.
The log says:
Code:
Aug 24 03:35:00 pve2 kernel: kvm: kvm [3116309]: ignored rdmsr: 0xc0011029 data 0x0
Aug 24 03:35:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:00 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x7bdfc91000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:00 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x364565000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:00 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x5e4e2d3000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:00 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:05 pve2 kernel: dmar_fault: 2123 callbacks suppressed
Aug 24 03:35:05 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:05 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x6ddfc81000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:05 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:05 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x1dd810d000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:05 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:05 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xe1529000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:05 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:10 pve2 kernel: dmar_fault: 668 callbacks suppressed
Aug 24 03:35:10 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:10 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x19f579f000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:10 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:10 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x57a8245000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:10 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:10 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x40d2d69000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:10 pve2 kernel: DMAR: DRHD: handling fault status reg 3
Aug 24 03:35:11 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 2369
Aug 24 03:35:11 pve2 kernel: device-mapper: block manager: btree_node validator check failed for block 2369
Aug 24 03:35:11 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Aug 24 03:35:12 pve2 pvescheduler[3115269]: ERROR: Backup of VM 210 failed - job failed with err -5 - Input/output error
Aug 24 03:35:12 pve2 pvescheduler[3115269]: INFO: Backup job finished with errors
Aug 24 03:35:12 pve2 pvescheduler[3115269]: job errors
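In case it makes the excerpt easier to read, here is a rough Python sketch that tallies log lines by source and counts which PCI device the DMAR faults name. It is my own throwaway helper, not a Proxmox tool: the names `classify` and `summarize` are made up, the regex assumes the syslog-style line format shown above, and `SAMPLE` only holds a few lines copied from the excerpt.

```python
import re
from collections import Counter

# A few lines copied from the kernel log excerpt above.
SAMPLE = """\
Aug 24 03:35:10 pve2 kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0x40d2d69000 [fault reason 0x05] PTE Write access is not set
Aug 24 03:35:11 pve2 kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 2369
Aug 24 03:35:11 pve2 kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Aug 24 03:35:12 pve2 pvescheduler[3115269]: ERROR: Backup of VM 210 failed - job failed with err -5 - Input/output error
"""

# Assumed layout: "<Mon DD HH:MM:SS> <host> <source>[pid]: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ [\d:]{8}) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$")
# PCI address named in a DMAR fault line, e.g. "Request device [00:02.0]"
DEVICE_RE = re.compile(r"Request device \[([0-9a-fA-F:.]+)\]")


def classify(line):
    """Return a label such as 'kernel/DMAR' or 'pvescheduler', or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    source, message = match.group(3), match.group(4)
    if source == "kernel":
        # Kernel messages carry their own subsystem prefix before the first colon.
        return "kernel/" + message.split(":", 1)[0]
    return source


def summarize(text):
    """Count lines per source and DMAR faults per PCI device."""
    by_source = Counter()
    by_device = Counter()
    for line in text.splitlines():
        label = classify(line)
        if label is None:
            continue
        by_source[label] += 1
        device = DEVICE_RE.search(line)
        if device:
            by_device[device.group(1)] += 1
    return by_source, by_device


if __name__ == "__main__":
    sources, devices = summarize(SAMPLE)
    for label, count in sources.most_common():
        print(f"{count:5d}  {label}")
    for address, count in devices.most_common():
        print(f"DMAR faults from {address}: {count}")
```

Running the same function over the full journal text for the backup window should show whether the device-mapper errors only appear together with the DMAR faults or also on their own.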
The Gotify log says:
Code:
210: 2024-08-24 03:34:51 INFO: Starting Backup of VM 210 (qemu)
210: 2024-08-24 03:34:51 INFO: status = running
210: 2024-08-24 03:34:51 INFO: backup mode: stop
210: 2024-08-24 03:34:51 INFO: ionice priority: 7
210: 2024-08-24 03:34:51 INFO: VM Name: PiHole2
210: 2024-08-24 03:34:51 INFO: include disk 'scsi0' 'local-lvm:vm-210-disk-0' 10G
210: 2024-08-24 03:34:51 INFO: stopping virtual guest
210: 2024-08-24 03:34:53 INFO: creating vzdump archive '/mnt/bu_ssd/dump/vzdump-qemu-210-2024_08_24-03_34_51.vma.zst'
210: 2024-08-24 03:34:53 INFO: starting kvm to execute backup task
210: 2024-08-24 03:34:54 INFO: started backup task '40b95e10-168f-490c-a7e5-af29c443c27a'
210: 2024-08-24 03:34:54 INFO: resuming VM again after 3 seconds
210: 2024-08-24 03:34:57 INFO: 7% (783.0 MiB of 10.0 GiB) in 3s, read: 261.0 MiB/s, write: 205.9 MiB/s
210: 2024-08-24 03:35:00 INFO: 13% (1.4 GiB of 10.0 GiB) in 6s, read: 205.4 MiB/s, write: 181.1 MiB/s
210: 2024-08-24 03:35:03 INFO: 19% (2.0 GiB of 10.0 GiB) in 9s, read: 201.8 MiB/s, write: 174.2 MiB/s
210: 2024-08-24 03:35:06 INFO: 27% (2.7 GiB of 10.0 GiB) in 12s, read: 253.8 MiB/s, write: 213.0 MiB/s
210: 2024-08-24 03:35:09 INFO: 32% (3.3 GiB of 10.0 GiB) in 15s, read: 201.8 MiB/s, write: 175.6 MiB/s
210: 2024-08-24 03:35:12 INFO: 50% (5.0 GiB of 10.0 GiB) in 18s, read: 598.2 MiB/s, write: 79.5 MiB/s
210: 2024-08-24 03:35:12 ERROR: job failed with err -5 - Input/output error
210: 2024-08-24 03:35:12 INFO: aborting backup job
210: 2024-08-24 03:35:12 INFO: resuming VM again
210: 2024-08-24 03:35:12 ERROR: Backup of VM 210 failed - job failed with err -5 - Input/output error
I have no idea why this keeps happening, and I don't know how to debug further.
Can anyone help me, please?