The IO operation at logical block address ..... was retried. vs PVE snapshot backup (LVM shared storage)

Gabor Szel

Nov 8, 2018
Dear All!

We have several systems like this:
- Dell VRTX
- 4x m640
- 10x 3.84 SAS SSD (RAID 6) with 2x PERC8 shared RAID controllers (all nodes are connected to this "storage")
- PVE 7.4-18 (we plan to update!)
- 4-node PVE cluster (non-HA)
- Shared LVM storage over iSCSI multipath (this shared storage does not support snapshots!)

Problem:
- Windows 2022 VM + Oracle SQL (300+ days uptime!)
- 3 virtual disks
- a VM snapshot backup to PBS is taken every night
- QEMU guest agent is installed
- PVE starts the backup at 01:20 and makes a "snapshot"
- after PVE sends the "fs-freeze" QEMU command, Windows starts logging IO operation errors:
"The IO operation at logical block address 0xd1c8 for Disk 1 (PDO name: \Device\0000004d) was retried."
(only on disk 1)
- from then on, disk errors are logged continuously!
- disk 1 is damaged and unusable .... (it could not be repaired)
- We restored the disk from backup and everything was fine! (From the very backup during which the disk got damaged!)
- the LVM storage has 3 TB of free space, so running out of space is not the problem.
- other VMs are not damaged; cluster uptime is 400+ days.
- no error messages on the PVE host (nothing)
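
In case it helps with correlating the Windows events against the 01:20 backup window, here is a minimal Python sketch that pulls the block address and disk number out of the message quoted above. The function names (`parse_retry_message`, `byte_offset`) are made up for illustration, and the 512-byte sector size is an assumption, not something confirmed on this system.

```python
import re
from typing import NamedTuple, Optional

# Matches the Windows warning text quoted in this post.
RETRY_RE = re.compile(
    r"The IO operation at logical block address (0x[0-9a-fA-F]+) "
    r"for Disk (\d+) \(PDO name: ([^)]+)\) was retried\."
)


class RetryEvent(NamedTuple):
    lba: int        # logical block address, converted from hex
    disk: int       # Windows disk number
    pdo_name: str   # device object name as logged


def parse_retry_message(message: str) -> Optional[RetryEvent]:
    """Return the parsed fields, or None if the text is a different message."""
    m = RETRY_RE.search(message)
    if m is None:
        return None
    return RetryEvent(lba=int(m.group(1), 16), disk=int(m.group(2)), pdo_name=m.group(3))


def byte_offset(lba: int, sector_size: int = 512) -> int:
    """Byte offset of the block on the virtual disk (sector size is assumed)."""
    return lba * sector_size


if __name__ == "__main__":
    sample = (
        r"The IO operation at logical block address 0xd1c8 for Disk 1 "
        r"(PDO name: \Device\0000004d) was retried."
    )
    event = parse_retry_message(sample)
    print(event)
    print(byte_offset(event.lba))
```

Running this over an export of the System event log would show whether the retried addresses stay in one region of disk 1 or are spread across it, and whether the first event lines up with the fs-freeze.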

This is the second time we've experienced this.
We use PVE in many places, but we have only seen this on shared LVM storage!

What can be done to prevent this from happening?
 
