Move disk keeps failing

ferret_ferret

Apr 3, 2024
Hello, I am trying to move a VM's disk to a new SSD I got, but one VM keeps failing. I cannot figure out a way forward and would love some assistance.

Output of Move Disk through the UI:
Code:
()
create full clone of drive scsi0 (backup:109/vm-109-disk-0.qcow2)
Formatting '/mnt/pve/nvme1/images/109/vm-109-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=687194767360 lazy_refcounts=off refcount_bits=16
transferred 0.0 B of 640.0 GiB (0.00%)
transferred 6.4 GiB of 640.0 GiB (1.00%)
transferred 12.8 GiB of 640.0 GiB (2.00%)
transferred 19.2 GiB of 640.0 GiB (3.00%)
qemu-img: error while reading at byte 16480993280: Input/output error
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f qcow2 -O qcow2 /mnt/pve/backup/images/109/vm-109-disk-0.qcow2 zeroinit:/mnt/pve/nvme1/images/109/vm-109-disk-0.qcow2' failed: exit code 1
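To put the failing offset in context, a little arithmetic helps. This is a minimal sketch, assuming the byte offset in the `qemu-img` error refers to the source image's virtual disk (the same 640 GiB the progress lines are measured against); `offset_info` is a hypothetical helper name, and the two numbers are copied from the log above (the failing offset and the `size=` from the Formatting line).

```shell
# Hypothetical helper: express a byte offset as GiB and as a percentage
# of the virtual disk size (both arguments in bytes).
offset_info() {
  local offset=$1 size=$2
  awk -v o="$offset" -v s="$size" \
    'BEGIN { printf "%.2f GiB (%.2f%% of disk)\n", o / 1073741824, 100 * o / s }'
}

# Failing offset and image size taken from the task log above.
offset_info 16480993280 687194767360
# prints: 15.35 GiB (2.40% of disk)
```

So the unreadable data sits roughly 15.35 GiB into the virtual disk, which is a handy number to compare across repeated attempts.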

Bash:
qemu-img check -f qcow2 /mnt/pve/backup/images/109/vm-109-disk-0.qcow2
No errors were found on the image.
7263740/10485760 = 69.27% allocated, 10.16% fragmented, 0.00% compressed clusters
Image end offset: 544648200192
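A clean `qemu-img check` does not rule out bad sectors: as far as I know it validates the qcow2 metadata (L1/L2 tables, refcounts) rather than reading every data cluster. A plain sequential read of the image file on the host should surface the same I/O error if the underlying disk cannot return the data. Minimal sketch; `read_test` is a hypothetical helper name, and `status=progress` assumes GNU dd (as on a Debian-based Proxmox host).

```shell
# Hypothetical helper: read a file end to end and discard the data.
# Without conv=noerror, dd stops at the first unreadable block and exits
# non-zero; its "records in" count shows how many 1 MiB blocks were read first.
read_test() {
  dd if="$1" of=/dev/null bs=1M status=progress
}

# Example on the Proxmox host (reads the whole image; the check above
# reports an end offset of about 507 GiB):
# read_test /mnt/pve/backup/images/109/vm-109-disk-0.qcow2
```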

SMART output for the source drive:
Code:
smartctl --all /dev/sdh
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.13-3-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Elements / My Passport (USB, AF)
Device Model:     WDC WD50NDZW-11A8JS1
Serial Number:    WD-WXF2E214N6C1
LU WWN Device Id: 5 0014ee 269729388
Firmware Version: 01.01A01
User Capacity:    5,000,947,523,584 bytes [5.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 17 17:51:42 2024 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 247)    Self-test routine in progress...
                    70% of test remaining.
Total time to complete Offline
data collection:         ( 4560) seconds.
Offline data collection
capabilities:              (0x1b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 702) minutes.
SCT capabilities:            (0x30b5)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       22
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       3316
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       255
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       22900
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       145
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       129
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3568
194 Temperature_Celsius     0x0022   105   087   000    Old_age   Always       -       47
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
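The overall assessment says PASSED, but attribute 197 Current_Pending_Sector has a raw value of 2. That counter usually means sectors the drive could not read and is waiting to remap, which would fit a read error that keeps hitting the same spot. A sketch for pulling just the media-related attributes out of `smartctl -A` text; the choice of IDs (5, 196, 197, 198) and the helper name `smart_media_attrs` are my own assumptions.

```shell
# Hypothetical helper: read smartctl attribute output on stdin and print
# name and raw value for 5 (reallocated sectors), 196 (reallocation events),
# 197 (pending sectors) and 198 (offline uncorrectable).
# Assumes the column layout shown above, where RAW_VALUE is field 10.
smart_media_attrs() {
  awk '$1 ~ /^(5|196|197|198)$/ { print $2, $10 }'
}

# Example on the host:
# smartctl -A /dev/sdh | smart_media_attrs
```

The self-test shown as in progress (70% remaining) has not been logged yet; once it finishes, `smartctl -l selftest /dev/sdh` shows whether it completed or failed, including the LBA of the first error.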

I have tried fsck'ing both the drive itself and the VM's disk, and I am kind of at a loss at this point. The VM runs fine, albeit a bit slowly.

I was able to move other disk images from the source drive to the target drive without any issues; it appears to be only this one disk image.
 
Is it always the same offset? It does sound like a broken disk.
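One way to answer that from the host is the kernel log: when a read fails at the block layer, the kernel usually logs a line containing `dev sdh, sector N` (the exact wording varies by kernel version, for example `I/O error, dev sdh, sector ...` or `critical medium error, dev sdh, sector ...`; treat the format as an assumption). A sketch that lists the distinct failing sectors for one device so repeated move attempts can be compared; `failed_sectors` is a hypothetical name.

```shell
# Hypothetical helper: read kernel log text on stdin and print each distinct
# sector number reported for the given device (e.g. sdh), in ascending order.
# Assumes the messages contain "dev <name>, sector <N>".
failed_sectors() {
  local dev=$1
  grep -oE "dev ${dev}, sector [0-9]+" | awk '{ print $NF }' | sort -un
}

# Example on the host, after a failed move attempt:
# journalctl -k | failed_sectors sdh
# dmesg | failed_sectors sdh
```

These are 512-byte sector numbers on the whole device, not offsets inside the qcow2 image, so they are mainly useful for seeing whether the same few sectors fail every time.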
 
  • Like
Reactions: Kingneutron
In this case I would run a backup from inside the VM and try to save what you can. It sounds like the backing storage is dying, and depending on how critical this VM is, you might need to reinstall or recreate the guest OS after restoring to ensure a clean boot and consistent system files.
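A minimal sketch of that in-guest backup, assuming a Linux guest with rsync and SSH access to another machine; the helper name `salvage` and the source/target in the example are placeholders, not from the thread. rsync keeps going when individual files cannot be read and exits with code 23 (partial transfer), which suits copying off a failing disk.

```shell
# Hypothetical helper, run inside the guest: copy a directory tree to a
# target that is NOT on the failing disk and report how it went.
# rsync exit code 23 means some files could not be transferred (e.g. read errors).
salvage() {
  local src=$1 dest=$2 rc
  rsync -a "$src" "$dest"
  rc=$?
  case $rc in
    0)  echo "complete" ;;
    23) echo "partial: some files could not be read" ;;
    *)  echo "rsync failed with exit code $rc" ;;
  esac
  return "$rc"
}

# Placeholder paths: adjust to the data that matters.
# salvage /srv/data/ backup@nas.example.lan:/backups/vm109/
```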

If you're not worried about the guest OS and only need the data files, I would suggest using Midnight Commander to sftp individual files/dirs out to another drive on the host or to a NAS. Don't send them to the same bad disk. I would also order a new NVMe drive ASAP, preferably one with a high TBW (terabytes written) endurance rating.