The IO operation at logical block address ..... was retried. vs PVE snapshot backup (LVM shared storage)

Gabor Szel · Dec 8, 2024

Dear All!

We have several systems like this:
- Dell VRTX
- 4x m640
- 10x 3.84 TB SAS SSD (RAID 6) with 2x PERC8 shared RAID controllers (all nodes are connected to this "storage")
- PVE 7.4-18 (we plan to update!)
- 4-node PVE cluster (non-HA)
- Shared LVM storage on iSCSI multipath (this shared storage does not support snapshots!)

Problem:
- Windows 2022 VM + Oracle SQL (300+ days uptime!)
- 3 virtual disks
- a VM snapshot backup is taken every night to PBS
- qemu guest agent installed
- PVE starts the backup at 01:20 and makes a "snapshot"
- after PVE sends the "fs-freeze" QEMU command, Windows starts logging IO errors:
"The IO operation at logical block address 0xd1c8 for Disk 1 (PDO name: \Device\0000004d) was retried."
only on disk1
- From then on, disk errors are logged continuously!
- disk1 ends up damaged and unusable (it could not be repaired)
- We restored the disk from backup and everything was fine! (from the very backup during which the disk got damaged!)
- The LVM storage has 3 TB of free space, so there is no free space problem.
- Other VMs are not damaged; cluster uptime is 400+ days.
- There are no error messages on the PVE host (nothing).

This is the second time we've experienced this.
We use PVE in many, many places, but we have only experienced this on shared LVM storage!

What can be done to prevent this from happening?
