[SOLVED] Can't backup/move VM due to LVM Input/output error

CvH

Member
Oct 16, 2020
Germany
I have a VM running on an NVMe drive with LVM. I can no longer move its disks to any other storage or create a backup.
The VM has 2 disks, and the problem appeared on both. I tried about 10 times with the smaller disk and it finally moved.
The bigger disk always fails with the same error, but at a different percentage each time.

The SMART values of the NVMe are okay, and every other disk on the NVMe is fine too. I first suspected a physical problem with the NVMe, but googling hinted at a more general problem elsewhere, for example https://forum.proxmox.com/threads/move-disk-einer-vm-schlägt-fehl.108693/ .
There are similar errors on Google that don't look like a broken disk.

The error when moving the disk:
create full clone of drive sata0 (NVMe-2TB:vm-880-disk-0)
Formatting '/mnt/pve/backup/images/880/vm-880-disk-1.raw', fmt=raw size=214748364800 preallocation=off
transferred 0.0 B of 200.0 GiB (0.00%)
transferred 2.0 GiB of 200.0 GiB (1.00%)
...
transferred 64.1 GiB of 200.0 GiB (32.03%)
qemu-img: error while reading at byte 69793202176: Input/output error
qemu-img: error while reading at byte 69791105024: Input/output error
qemu-img: error while reading at byte 69795299328: Input/output error
qemu-img: error while reading at byte 69789007872: Input/output error
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f raw -O raw /dev/NVMe-2TB/vm-880-disk-0 zeroinit:/mnt/pve/backup/images/880/vm-880-disk-1.raw' failed: exit code 1

dmesg shows:
[24737.727029] Buffer I/O error on dev dm-7, logical block 17039371, async page read
[24737.811641] blk_update_request: critical medium error, dev nvme0n1, sector 3397388376 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[24737.811708] Buffer I/O error on dev dm-7, logical block 17039371, async page read
[24737.902210] blk_update_request: critical medium error, dev nvme0n1, sector 3397387352 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[24737.902271] Buffer I/O error on dev dm-7, logical block 17039243, async page read
[24737.986880] blk_update_request: critical medium error, dev nvme0n1, sector 3397387352 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[24737.986942] Buffer I/O error on dev dm-7, logical block 17039243, async page read
[24738.158268] blk_update_request: critical medium error, dev nvme0n1, sector 3397393592 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[24738.160084] Buffer I/O error on dev dm-7, logical block 17040023, async page read
[24738.329600] blk_update_request: critical medium error, dev nvme0n1, sector 3397382264 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[24738.330505] Buffer I/O error on dev dm-7, logical block 17040023, async page read
[24738.331436] Buffer I/O error on dev dm-7, logical block 17038607, async page read
[24738.420383] Buffer I/O error on dev dm-7, logical block 17038607, async page read
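
For anyone lining up the two logs: the qemu-img byte offsets and the dmesg logical blocks appear to describe the same region. Below is a minimal sketch of the arithmetic. It assumes that dm-7 is the LV behind vm-880-disk-0, that a "logical block" in the dm-7 messages is 4096 bytes, and that each qemu-img error marks the start of a 2 MiB read (the reported offsets are exactly 2097152 bytes apart). None of this is confirmed by the logs, and the function names are made up for illustration.

```shell
# Assumed sizes, see the notes above.
BLOCK_SIZE=4096      # bytes per logical block in the dm-7 messages (assumption)
CHUNK_SIZE=2097152   # spacing between the qemu-img error offsets (2 MiB)
SECTOR_SIZE=512      # the kernel block layer counts sectors in 512-byte units

# Byte offset inside the LV -> logical block number.
byte_to_block() { echo $(( $1 / BLOCK_SIZE )); }

# Sector on nvme0n1 -> byte offset on the raw device.
sector_to_byte() { echo $(( $1 * SECTOR_SIZE )); }

# Succeeds if logical block $1 lies inside the read that starts at byte offset $2.
block_in_chunk() {
  _byte=$(( $1 * BLOCK_SIZE ))
  [ "$_byte" -ge "$2" ] && [ "$_byte" -lt "$(( $2 + CHUNK_SIZE ))" ]
}

# Pair each logical block from dmesg with the qemu-img offset that covers it.
for block in 17039371 17039243 17040023 17038607; do
  for offset in 69793202176 69791105024 69795299328 69789007872; do
    if block_in_chunk "$block" "$offset"; then
      echo "block $block is inside the read at byte $offset"
    fi
  done
done
```

Under these assumptions byte 69793202176 is logical block 17039356, and each of the four logical blocks in dmesg falls inside exactly one of the four reads that qemu-img reported, so both logs point at the same few megabytes of the LV.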

I am unsure whether this is a hardware problem or an LVM problem.
I restarted the node, just in case, but the problem is still the same.

Any idea how to proceed?
Or is there a way I can copy the disk manually?

I have a backup, but it's some days old, so I'd like to find a solution if possible.
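
On the manual copy question: one common approach is to read the LV with `dd` and let it continue past unreadable blocks, filling them with zeros. A minimal sketch, assuming GNU dd; the function name and the target file name are placeholders. The copy contains zeroed holes wherever a read fails, so the guest file system may still need a check afterwards.

```shell
# copy_skipping_errors SRC DST
# conv=noerror keeps going after a read error; conv=sync pads a short or
# failed read with zeros so the data after it stays at the right offset.
# bs=4096 keeps each zero-filled hole as small as one block per error.
copy_skipping_errors() {
  dd if="$1" of="$2" bs=4096 conv=noerror,sync status=progress
}

# Hypothetical usage; shut the VM down first so the source does not change:
# copy_skipping_errors /dev/NVMe-2TB/vm-880-disk-0 /mnt/pve/backup/images/880/vm-880-disk-0-rescue.raw
```

Because `conv=sync` pads the last block, the output is always a multiple of 4096 bytes, which matches an LV but not necessarily an arbitrary file. GNU ddrescue does the same job more thoroughly (it retries and keeps a map of bad areas) if it is installed.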
 
Are you sure that the NVMe has failed?
Not yet. The error is very sketchy: the NVMe works overall and only some parts are broken, which is rather strange if you ask me.

I need to put it into another host and test it there. The NVMe is also only ~6 months old, and SMART shows everything is fine.
 
I have moved all VMs and started testing. First results:
Code:
root@s6:~# badblocks -v /dev/nvme0n1 > ~/bad_sectors.txt
Checking blocks 0 to 976762583
Checking for bad blocks (read-only test): done
Pass completed, 96 bad blocks found. (96/0/0 errors)

Now I am retesting with -w (the write-mode test, which overwrites the device).

I found the errors by noticing that backups were failing. Tonight I will update to 7.2 (currently 7.1). Could this still be kernel related?
 
Here are the read-write test results. Fewer bad blocks.

Code:
root@s6:~# badblocks -vw /dev/nvme0n1 > ~/bad_sectors--.txt
Checking for bad blocks in read-write mode
From block 0 to 976762583
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 17 bad blocks found. (0/0/17 errors)
 
I ran another read-only test, and this time there were no bad blocks. Totally confused about whether to return it under warranty now :)

Code:
root@s6:~# badblocks -v /dev/nvme0n1 > ~/bad_sectors-2-.txt
Checking blocks 0 to 976762583
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)
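
Since the three runs disagree (96, 17, then 0 bad blocks), it may help to check whether the same block numbers show up in more than one run. A minimal sketch, assuming each output file holds one block number per line, which is how badblocks writes its list; the function name is made up.

```shell
# common_bad_blocks FILE_A FILE_B
# Prints the block numbers that appear in both lists. comm needs both inputs
# sorted the same way, so sort them as text first and numerically at the end.
common_bad_blocks() {
  _a=$(mktemp) || return 1
  _b=$(mktemp) || return 1
  sort -u "$1" > "$_a"
  sort -u "$2" > "$_b"
  comm -12 "$_a" "$_b" | sort -n
  rm -f "$_a" "$_b"
}

# Usage with the files from the first two runs:
# common_bad_blocks ~/bad_sectors.txt ~/bad_sectors--.txt
```

An empty result would not clear the drive, because blocks may have been remapped by the controller after the write test rewrote them.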
 
I am not sure if I overlooked it or if it increased a lot due to the r/w tests, but I am currently at these SMART values. Looks broken to me :)

Media and Data Integrity Errors: 1,546
Error Information Log Entries: 1,583

I used a spare Samsung SSD 970 EVO 2TB for non-critical stuff. Probably not a good idea to cheap out on storage :)
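
The two counters above look like the NVMe health section printed by `smartctl -a`. A minimal sketch for pulling them out so they can be compared before and after a test run, assuming that output format (label, colon, value with thousands separators); the function name is hypothetical.

```shell
# nvme_error_counters: reads smartctl text on stdin and prints "name=value"
# for the two error counters, with thousands separators removed.
nvme_error_counters() {
  awk -F': *' '
    /^Media and Data Integrity Errors:/ { gsub(/,/, "", $2); print "media_errors=" $2 }
    /^Error Information Log Entries:/   { gsub(/,/, "", $2); print "error_log_entries=" $2 }
  '
}

# Usage on the host (needs smartmontools and root):
# smartctl -a /dev/nvme0n1 | nvme_error_counters
```

Saving the output before and after a badblocks run would show whether the counters were already high or climbed during the test.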
 
My other Samsung has those values at 0. I am sending this one in under warranty; we will see what they do about it.
I chose Samsung and paid more precisely to avoid this situation. Next time I may go with Crucial.
 
