Oct 4, 2018
I have a small server with two 256 GB SSDs. Proxmox 4.4-24 is installed on sda along with 4 VMs, and sdb is used for backups (clones) of the VMs from sda. Last week one of the 4 VMs failed to clone. Unfortunately I had already deleted the known good clone from sdb (yes, you can slap me).

SMART stats indicate that sda is failing, although it is a quality drive with fewer than 1100 power-on hours and 1% wearout. The Proxmox host and all 4 VMs run fine, but the clone of VM100 fails with exit code 1. The reallocated sector count continues to rise (over 200 now).

My first thought was to clone the drive while ignoring errors, since the bad sectors are apparently associated with unused storage space or something non-critical. I won't go into all the reasons cloning didn't work, related to LVM and such.

Is my best bet really to recreate VM100 from scratch (using backup configs) on sdb, replace sda, reinstall Proxmox on the new sda, and re-establish the 4 VMs from sdb (making backups once operation of the VMs is verified)? I have not found a way to get fsck to repair the image or qemu-img to ignore errors on convert.
 
Hi,

A block error on its own does not necessarily mean anything. A good high-end SSD will report damaged blocks (cells), but every SSD has reserve cells for exactly this case.


Did you try "qemu-img check"?

SSD SMART info shows increasing numbers of remapped sectors.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       202
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1031
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       81
170 Unknown_Attribute       0x0033   099   099   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   010    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   010    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       47
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       1436
190 Airflow_Temperature_Cel 0x0032   041   077   000    Old_age   Always       -       41 (Min/Max 27/77)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       47
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       130338
226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0033   080   080   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   096   096   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       130338
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       54600
249 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2271
252 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       10

SMART Error Log Version: 1
No Errors Logged
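
For anyone who wants to track these counters over time rather than eyeball them, here is a rough Python sketch that parses attribute lines in the format shown above. The function names, the WATCH set (IDs 5 and 187) and the idea of treating VALUE minus THRESH as "headroom" are my own assumptions for illustration, not anything smartctl or Proxmox defines. The sample rows are copied from the table above.

```python
# Sketch only: parse smartctl-style attribute rows and flag a few counters.
# WATCH and the "headroom" notion are illustrative assumptions.

SAMPLE = """\
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 202
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 1436
232 Available_Reservd_Space 0x0033 080 080 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 096 096 000 Old_age Always - 0
"""

# Attribute IDs whose raw value should ideally stay at zero (assumption).
WATCH = {5, 187}


def parse_smart(text):
    """Return {attribute_id: dict} for every row that starts with a numeric ID."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip headers and blank lines
        rows[int(parts[0])] = {
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": int(parts[9]),  # first token of RAW_VALUE only
        }
    return rows


def flag_concerns(rows):
    """Names of watched attributes whose raw counter is non-zero."""
    return [rows[i]["name"] for i in sorted(WATCH) if i in rows and rows[i]["raw"] > 0]


def headroom(rows, attr_id):
    """Normalized VALUE minus THRESH for one attribute."""
    return rows[attr_id]["value"] - rows[attr_id]["thresh"]


if __name__ == "__main__":
    rows = parse_smart(SAMPLE)
    print("non-zero watched counters:", flag_concerns(rows))
    print("reserve space headroom:", headroom(rows, 232))
```

Running this periodically against fresh smartctl output and comparing the raw values would show whether the reallocated and uncorrectable counts are still climbing.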

Yes. "qemu-img check" reports "cannot check RAW image".
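
Since the check is not available for raw images, one possible way to get a copy "ignoring errors" is to read the source block by block and zero-fill whatever cannot be read, similar in spirit to dd with conv=noerror,sync. Below is a minimal Python sketch of that idea. The function name and block size are my own choices, it has not been tested against a failing drive, and it needs a Unix-like system for os.pread.

```python
import os


def copy_skipping_errors(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the byte offsets of blocks that could not be read in full.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)  # works for files and block devices
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    data = b""  # unreadable block
                if len(data) < want:
                    # Failed or short read: record it and pad with zeros so
                    # later data stays at the correct offset.
                    bad_offsets.append(offset)
                    data = data.ljust(want, b"\0")
                dst.write(data)
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

A call would look something like copy_skipping_errors("/dev/pve/vm-100-disk-1", "/mnt/sdb/vm-100-disk-1.raw"), where both paths are hypothetical and depend on how the storage is set up. The destination should be on the healthy drive, and the returned offsets tell you which parts of the image were zero-filled so you can judge whether anything important was hit.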
 