Can I trust Proxmox's backup systems?

Dec 16, 2022
Hi everyone,
I manage a proxmox server hosted by OVH. This server is configured in raid 1. The log showed an error on a disk that was automatically put offline.

Code:
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md3: Disk failure on nvme0n1p3, disabling device.
md/raid1:md3: Operation continuing on 1 devices.
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: Disk failure on nvme0n1p5, disabling device.
md/raid1:md5: Operation continuing on 1 devices.
Feb 04 11:31:23 ns3220199 kernel: blk_update_request: I/O error, dev nvme0n1, sector 46634688 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md3: nvme0n1p3: rescheduling sector 9446600
Feb 04 11:31:23 ns3220199 kernel: Read-error on swap-device (259:1:46634696)
Feb 04 11:31:23 ns3220199 kernel: Read-error on swap-device (259:1:680216)
Feb 04 11:31:23 ns3220199 kernel: blk_update_request: I/O error, dev nvme0n1, sector 34900056 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md3: nvme0n1p3: rescheduling sector 31719512
Feb 04 11:31:23 ns3220199 kernel: blk_update_request: I/O error, dev nvme0n1, sector 602332672 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: nvme0n1p5: rescheduling sector 554882560
Feb 04 11:31:23 ns3220199 kernel: blk_update_request: I/O error, dev nvme0n1, sector 242041424 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: nvme0n1p5: rescheduling sector 194591312
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: nvme0n1p5: rescheduling sector 152205568
Feb 04 11:31:23 ns3220199 kernel: Read-error on swap-device (259:1:46634264)
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md3: redirecting sector 37751944 to other mirror: nvme1n1p3
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md3: redirecting sector 9446600 to other mirror: nvme1n1p3
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: redirecting sector 860666656 to other mirror: nvme1n1p5
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md3: redirecting sector 31719512 to other mirror: nvme1n1p3
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: redirecting sector 860666472 to other mirror: nvme1n1p5
Feb 04 11:31:23 ns3220199 systemd: systemd-journald.service: Main process exited, code=killed, status=6/ABRT
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: redirecting sector 554882560 to other mirror: nvme1n1p5
Feb 04 11:31:23 ns3220199 systemd: systemd-journald.service: Failed with result 'watchdog'.
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: redirecting sector 194591312 to other mirror: nvme1n1p5
Feb 04 11:31:23 ns3220199 systemd: systemd-journald.service: Consumed 19.459s CPU time.
Feb 04 11:31:23 ns3220199 kernel: md/raid1:md5: redirecting sector 152205568 to other mirror: nvme1n1p5
Feb 04 11:31:23 ns3220199 systemd: systemd-journald.service: Scheduled restart job, restart counter is at 1.
Feb 04 11:31:23 ns3220199 kernel: Read-error on swap-device (259:1:1564944)
Feb 04 11:31:23 ns3220199 kernel: Read-error on swap-device (259:1:993760)
Feb 04 11:31:23 ns3220199 kernel: Read-error on swap-device (259:1:1283472)
Feb 04 11:31:23 ns3220199 systemd: Stopping Flush Journal to Persistent Storage...
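
For anyone hitting a similar log: a quick way to confirm whether an md mirror is actually running degraded is the [UU]/[_U] status field in /proc/mdstat. The snippet below is only a sketch that parses a canned sample resembling this failure (array name and sizes are illustrative); on a live server you would read /proc/mdstat itself.

```shell
# Sketch only: detect a degraded RAID1 from mdstat-style output.
# The sample mimics the state after nvme0n1 was kicked out of md3;
# on a real host, replace it with: mdstat=$(cat /proc/mdstat)
mdstat='md3 : active raid1 nvme1n1p3[1]
      1020767232 blocks super 1.2 [2/1] [_U]'

# A healthy two-leg RAID1 shows [UU]; an underscore marks a failed leg.
if printf '%s\n' "$mdstat" | grep -q '\[[U_]*_[U_]*\]'; then
  state=DEGRADED
else
  state=OK
fi
echo "$state"
```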

OVH has yet to respond to my request. I tried restarting the server; the disk came back online, and a SMART test detects no faults. Has this ever happened to you with this provider? It is a dedicated server.

After the restart, one virtual machine was corrupted and wouldn't start, and some backups failed as well. I use three backup strategies:
  • A short-term strategy that keeps the last 3 full snapshots on OVH NFS cloud storage (3 out of 3 corrupted).
  • A long-term strategy that backs up with Proxmox Backup Server to a corporate NAS (restore still running; EDIT: IT WORKS).
  • A third strategy using IDrive Mirror (restore test in progress; EDIT: IT WORKS).

Fortunately, I had taken a snapshot of the corrupted machine and reverted to it. At this point, I wonder whether I can trust Proxmox's backup systems.
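
Since all three NFS copies only turned out to be corrupted at restore time, one cheap guard is to record a checksum right after each backup job and re-check it before you ever need the archive. A minimal sketch (the archive here is a stand-in file, and the name is made up, not from my setup; real vzdump archives live under the storage's dump/ directory):

```shell
# Sketch: store a digest at backup time, re-verify it later.
tmp=$(mktemp -d)
printf 'stand-in for a vzdump archive' > "$tmp/vzdump-qemu-100.vma.zst"

# Right after the backup job: write the checksum next to the archive.
(cd "$tmp" && sha256sum vzdump-qemu-100.vma.zst > SHA256SUMS)

# Periodically, and before any restore: -c recomputes and compares,
# so a flipped bit or truncated file fails here instead of mid-restore.
if (cd "$tmp" && sha256sum -c SHA256SUMS >/dev/null); then
  verify=OK
else
  verify=CORRUPT
fi
echo "$verify"
rm -rf "$tmp"
```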

When the corrupted machine started, Windows asked me to choose a keyboard layout and then tried to repair itself. From the recovery terminal I worked out that the storage driver was missing. I followed the steps below to install the correct driver in Windows Server 2019 and then retried the boot repair, without success.


EDIT: I found this topic: https://forum.proxmox.com/threads/corrupt-filesystem-after-snapshot.32232/
So an NFS mount used for backups may corrupt my VMs :eek:
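
If a flaky NFS mount is the suspect, one thing worth checking is that the backup share is mounted "hard" rather than "soft": with a soft mount, a network hiccup returns I/O errors to the writing process mid-backup instead of retrying, which is one way truncated or corrupt archives can appear. An illustrative /etc/fstab line (server address and export path are placeholders, not my actual setup):

```
# hard = retry indefinitely instead of failing the write;
# timeo/retrans tune how aggressively retries happen
10.0.0.5:/backups  /mnt/pve/nfs-backup  nfs  hard,tcp,timeo=600,retrans=3  0  0
```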



Any ideas?
Thanks!

  1. Attach the virtio-win driver ISO to the VM.
  2. Use Troubleshoot -> Advanced Options -> Command Prompt.
  3. Identify your drive letter mappings via wmic logicaldisk get deviceid, volumename, description
    • In my case the virtio-win install ISO (CD-ROM Disc) was assigned to E:
  4. Load the driver via the CLI, e.g. drvload e:\viostor\2k19\amd64\viostor.inf
    • After loading the driver, run wmic logicaldisk get deviceid, volumename, description again.
    • F: was where the Windows install was mounted in my case.
  5. Use DISM to inject the storage controller driver
    • E.g. dism /image:f:\ /add-driver /driver:e:\viostor\2k19\amd64\viostor.inf
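
Put together, the whole sequence from the recovery Command Prompt looks like this (drive letters E: and F: are what I saw; yours may differ, so check the wmic output each time):

```
rem From Troubleshoot -> Advanced Options -> Command Prompt
wmic logicaldisk get deviceid, volumename, description
rem virtio-win ISO showed up on E:, no Windows volume visible yet

drvload e:\viostor\2k19\amd64\viostor.inf
wmic logicaldisk get deviceid, volumename, description
rem Windows install now visible on F:

dism /image:f:\ /add-driver /driver:e:\viostor\2k19\amd64\viostor.inf
```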
 
