VM corruption on iSCSI multipath storage (ME4084)

José Roberto

Hello friends,
I have a Dell ME4084 storage array from which I'm exporting a LUN to my Proxmox 6.4-1 cluster with 3 nodes. I performed the configuration according to the Proxmox documentation, and apparently it is working normally, according to the output of multipath -ll below:

mpath_dell_vol_proxmox (3600c0ff00053643ce06cbf6001000000) dm-5 DellEMC,ME4
size=15T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 16:0:0:0 sdd 8:48 active ready running
|- 17:0:0:0 sdc 8:32 active ready running
|- 15:0:0:0 sdb 8:16 active ready running
`- 18:0:0:0 sde 8:64 active ready running

I can create the machines normally on the created volume, but while they are running the VMs get corrupted:

Jun 22 01:17:47 graylog kernel: [21078.577653] EXT4-fs error (device sda2): ext4_validate_block_bitmap:384: comm kworker/u16:1: bg 881: bad block bitmap checksum
Jun 22 01:17:47 graylog kernel: [21078.580090] EXT4-fs (sda2): Delayed block allocation failed for inode 1447687 at logical offset 0 with max blocks 6 with error 74
Jun 22 01:17:47 graylog kernel: [21078.580134] EXT4-fs (sda2): This should not happen!! Data will be lost

Has anyone experienced this problem? Could you help me?


My infrastructure is configured as follows on a 10 Gb network:
SAN1 Network 10.10.10.0/24 VLAN 5
SAN2 Network 11.11.11.0/24 VLAN 10
Local Network 192.168.1.0/24 VLAN 15
 

I am assuming that the messages you posted are from the hypervisor. You have found out what happens when you attempt to use a non-cluster-aware filesystem (ext4) in a cluster.
You must reconfigure your disk/storage to use LVM on the shared storage.

You can start learning about it here: https://pve.proxmox.com/wiki/Storage , with more links available at the bottom of that page
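For reference, a shared-LVM-over-iSCSI setup ends up looking roughly like the fragment below in /etc/pve/storage.cfg. The storage IDs, VG name, portal address, and target IQN here are placeholders, not values from this thread:

```
iscsi: me4084-portal
	portal 10.10.10.1
	target <target-iqn>
	content none

lvm: me4084-lvm
	vgname vg_me4084
	shared 1
	content images
```

The important part is `shared 1` on the LVM storage: Proxmox then coordinates LV activation across the cluster instead of letting every node write to the volume independently.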
 
 
Hello friend,

Actually, the message is from a VM installed on Proxmox.

I have already configured the Proxmox cluster with LVM over iSCSI; no error appears on the hypervisor.

I am starting to think it is the MTU, which I had applied only on the physical interface; now I have also set it on the vmbrs (MTU 8900 applied). I am waiting to see if it corrupts again.

Do you think that could be it?
 

Guys, after two hours the problem came back, so it was not the MTU on the interfaces:


Jun 22 16:20:12 graylog kernel: [7722.329015] EXT4-fs error (device sda2): ext4_validate_block_bitmap:384: comm kworker/u16:1: bg 873: bad block bitmap checksum
Jun 22 16:20:12 graylog kernel: [ 7722.333355] EXT4-fs (sda2): Delayed block allocation failed for inode 1447825 at logical offset 18432 with max blocks 2048 with error 74
Jun 22 16:20:12 graylog kernel: [ 7722.333592] EXT4-fs (sda2): This should not happen!! Data will be lost
Jun 22 16:20:12 graylog kernel: [7722.333592]
 
I suspected as much; an MTU mismatch would be unlikely to lead to actual data corruption. Packet loss and slow performance, sure.
I suggest you try to shut down 2 out of 3 nodes and see if a single node functions as expected.

In the meantime, you need to provide the community with more information, such as:
lsscsi (each node)
lsblk (each node)
multipath -ll (from each node)
pvs
vgs
lvs
pvesm status
pvesm list <storage-name>
qm config <vmid>
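A quick way to gather all of that is something like the sketch below, run on each node (the report path and the exact command list are just a suggestion; attach the resulting files to your post):

```shell
# Collect the requested diagnostics on one node into a single report file.
# Commands that are missing or that fail are recorded rather than aborting the run.
report=/tmp/pve-diag.txt   # in practice, one file per node
: > "$report"
for cmd in "lsscsi" "lsblk" "multipath -ll" "pvs" "vgs" "lvs" "pvesm status"; do
    echo "== $cmd ==" >> "$report"
    $cmd >> "$report" 2>&1 || true   # $cmd unquoted on purpose so "multipath -ll" splits
done
echo "wrote $report"
```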

Are there absolutely no errors on the hypervisor side? /var/log/messages? journalctl?
Are there any errors on network interfaces? (https://www.techrepublic.com/articl...thernet-interface-ubuntu-server-with-ethtool/)
 
I followed your tip and tested on only one node, but I still have the same problem.

"Jun 24 15:52:07 UBUTESTE kernel: [65400.310058] EXT4-fs warning (device sda2): ext4_trim_all_free:5198: Error -117 loading buddy information for 333
Jun 24 15:52:09 UBUTESTE kernel: [65401.549693] EXT4-fs warning (device sda2): ext4_trim_all_free:5198: Error -117 loading buddy information for 333
Jun 24 15:52:10 UBUTESTE kernel: [65402.565329] EXT4-fs warning (device sda2): ext4_trim_all_free:5198: Error -117 loading buddy information for 333
Jun 24 15:52:12 UBUTESTE kernel: [65404.711834] EXT4-fs warning (device sda2): ext4_trim_all_free:5198: Error -117 loading buddy information for 333
Jun 24 16:00:12 UBUTESTE kernel: [65885.317752] EXT4-fs (sda2): Delayed block allocation failed for inode 9961607 at logical offset 4096 with max blocks 2048 with error 117
Jun 24 16:00:12 UBUTESTE kernel: [65885.320466] EXT4-fs (sda2): This should not happen!! Data will be lost
Jun 24 16:00:12 UBUTESTE kernel: [65885.320466]
Jun 24 16:00:50 UBUTESTE kernel: [65922.911442] EXT4-fs (sda2): Delayed block allocation failed for inode 9961616 at logical offset 61440 with max blocks 2048 with error 117
Jun 24 16:00:50 UBUTESTE kernel: [65922.914905] EXT4-fs (sda2): This should not happen!! Data will be lost
Jun 24 16:00:50 UBUTESTE kernel: [65922.914905]
Jun 24 16:00:58 UBUTESTE kernel: [65930.762299] EXT4-fs (sda2): Delayed block allocation failed for inode 9961617 at logical offset 235520 with max blocks 2048 with error 117
Jun 24 16:00:58 UBUTESTE kernel: [65930.765952] EXT4-fs (sda2): This should not happen!! Data will be lost"


I'm including more information on the PVE settings; if someone can help me, I will be very grateful.


lsscsi (each node)

lsblk (each node)
Output from PVE01


sda 8:0 0 931G 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part /boot/efi
└─sda3 8:3 0 930.5G 0 part
  ├─pve-swap 253:0 0 8G 0 lvm
  ├─pve-root 253:1 0 96G 0 lvm /
  ├─pve-data_tmeta 253:2 0 8.1G 0 lvm
  │ └─pve-data 253:4 0 794.3G 0 lvm
  └─pve-data_tdata 253:3 0 794.3G 0 lvm
    └─pve-data 253:4 0 794.3G 0 lvm
sdb 8:16 0 15T 0 disk
└─dell-me-01 253:5 0 15T 0 mpath
  ├─mpath_dell_vol_proxmox-vm--1001--disk--0 253:6 0 300G 0 lvm
  └─mpath_dell_vol_proxmox-vm--10002--disk--0 253:7 0 300G 0 lvm
sdc 8:32 0 15T 0 disk
└─dell-me-01 253:5 0 15T 0 mpath
  ├─mpath_dell_vol_proxmox-vm--1001--disk--0 253:6 0 300G 0 lvm
  └─mpath_dell_vol_proxmox-vm--10002--disk--0 253:7 0 300G 0 lvm
sdd 8:48 0 15T 0 disk
└─dell-me-01 253:5 0 15T 0 mpath
  ├─mpath_dell_vol_proxmox-vm--1001--disk--0 253:6 0 300G 0 lvm
  └─mpath_dell_vol_proxmox-vm--10002--disk--0 253:7 0 300G 0 lvm
sde 8:64 0 15T 0 disk
└─dell-me-01 253:5 0 15T 0 mpath
  ├─mpath_dell_vol_proxmox-vm--1001--disk--0 253:6 0 300G 0 lvm
  └─mpath_dell_vol_proxmox-vm--10002--disk--0 253:7 0 300G 0 lvm


/etc/multipath.conf
defaults {
    polling_interval 2
    path_selector "round-robin 0"
    path_grouping_policy multibus
    getuid_callout "/lib/udev/scsi_id -g -u -d /dev/%n"
    rr_min_io 100
    failback immediate
    no_path_retry queue
}

blacklist {
    wwid .*
}

blacklist_exceptions {
    wwid 3600c0ff00053643c5e2ed26001000000
}

devices {
    device {
        vendor "DellEMC"
        product "ME4084"
        path_grouping_policy group_by_prio
        prio rdac
        #polling_interval 5
        path_checker rdac
        path_selector "round-robin 0"
        hardware_handler "1 rdac"
        failback immediate
        features "2 pg_init_retries 50"
        no_path_retry 30
        rr_min_io 100
    }
}

multipaths {
    multipath {
        wwid 3600c0ff00053643c5e2ed26001000000
        alias dell-me-01
    }
}



Output from PVE01
multipath -ll (from each node)

root@pve01:/etc# multipath -ll
dell-me-01 (3600c0ff00053643c5e2ed26001000000) dm-5 DellEMC,ME4
size=15T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
|- 15:0:0:0 sdb 8:16 active ready running
|- 17:0:0:0 sdd 8:48 active ready running
|- 16:0:0:0 sdc 8:32 active ready running
`- 18:0:0:0 sde 8:64 active ready running
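As a side note, a quick sanity check over that output can be scripted. The sketch below parses a captured `multipath -ll` sample (in practice you would pipe the live output in) and counts how many paths report healthy:

```shell
# Count paths reporting "active ready running" out of all listed paths.
# The heredoc is captured sample output; pipe `multipath -ll` in for live data.
result=$(awk '
    / [0-9]+:[0-9]+:[0-9]+:[0-9]+ / { total++; if (/active ready running/) ok++ }
    END { printf "%d/%d paths healthy", ok, total }
' <<'EOF'
dell-me-01 (3600c0ff00053643c5e2ed26001000000) dm-5 DellEMC,ME4
size=15T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=30 status=active
  |- 15:0:0:0 sdb 8:16 active ready running
  |- 17:0:0:0 sdd 8:48 active ready running
  |- 16:0:0:0 sdc 8:32 active ready running
  `- 18:0:0:0 sde 8:64 active ready running
EOF
)
echo "$result"
```

Anything other than "4/4 paths healthy" (for a 4-path LUN) would be worth investigating before blaming the filesystem.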

pvs
root@pve01:/etc# pvs
PV VG Fmt Attr PSize PFree
/dev/mapper/dell-me-01 mpath_dell_vol_proxmox lvm2 a-- <15.00t 14.41t
/dev/sda3 pve lvm2 a-- <930.50g 16.00g

vgs
root@pve01:/etc# vgs
VG #PV #LV #SN Attr VSize VFree
mpath_dell_vol_proxmox 1 2 0 wz--n- <15.00t 14.41t
pve 1 3 0 wz--n- <930.50g 16.00g

lvs
root@pve01:/etc# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
vm-10002-disk-0 mpath_dell_vol_proxmox -wi-ao---- 300.00g
vm-1001-disk-0 mpath_dell_vol_proxmox -wi-ao---- 300.00g
data pve twi-a-tz-- <794.29g 0.00 0.24
root pve -wi-ao---- 96.00g
swap pve -wi-a----- 8.00g

pvesm status
root@pve01:/etc# pvesm status
Name Type Status Total Used Available %
PATH_ISCSI_LEFT_A0 iscsi active 0 0 0 0.00%
PATH_ISCSI_LEFT_A1 iscsi active 0 0 0 0.00%
PATH_ISCSI_RIGHT_B0 iscsi active 0 0 0 0.00%
PATH_ISCSI_RIGHT_B1 iscsi active 0 0 0 0.00%
local dir active 98559220 80199012 13310660 81.37%
local-lvm lvmthin active 832868352 395112746 437755605 47.44%
mpath_dell_vol_proxmox lvm active 16106123264 1048576000 15057547264 6.51%


pvesm list <storage-name>
root@pve01:/etc# pvesm list mpath_dell_vol_proxmox
Volid Format Type Size VMID
mpath_dell_vol_proxmox:vm-10002-disk-0 raw images 322122547200 10002
mpath_dell_vol_proxmox:vm-1001-disk-0 raw images 322122547200 1001
mpath_dell_vol_proxmox:vm-102-disk-0 raw images 429496729600 102

qm config <vmid>
root@pve01:/etc# qm config 1001
balloon: 0
boot: order=scsi0;net0
cores: 2
memory: 6048
name: UBTESTE
net0: virtio=32:3A:CE:A2:84:B2,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: mpath_dell_vol_proxmox:vm-1001-disk-0,discard=on,size=300G
scsihw: virtio-scsi-single
smbios1: uuid=eec333d2-9856-4fbf-bd0c-1c06dcf0140c
sockets: 4
vga: qxl
vmgenid: 76c618ae-d102-41d5-a33e-583100cd04b7

Are there absolutely no errors on the hypervisor side? /var/log/messages? journalctl?
In my log, the only error messages are these:
Jun 24 10:03:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 73 to 80
Jun 24 10:03:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_01] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 82 to 83
Jun 24 10:14:00 pve01 rsyslogd: omfwd: TCPSendBuf error -2027, destruct TCP Connection to 10.1.1.16:1514 [v8.1901.0 try https://www.rsyslog.com/e/2027 ]
Jun 24 12:33:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 80 to 81
Jun 24 13:03:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 81 to 84
Jun 24 13:03:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_01] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 83 to 79
Jun 24 14:03:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 84 to 81
Jun 24 14:03:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_01] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 79 to 84
Jun 24 14:33:56 pve01 smartd[1286]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 81 to 82
 
I don't see anything obviously wrong. It's interesting that you have "vm-102-disk-0" listed in your pvesm output, but LVM does not have it.

I'd suggest creating a FIO test and running it directly on the hypervisor against an LVM slice on that disk. You can find plenty of write/read-intensive examples on the net.
The issue you are having could be a transceiver, cable, storage system, HBA, kernel, KVM, or even memory... Continue reducing the number of moving parts, i.e. drop it down to one path.
If you can repro it without KVM, directly on the hypervisor OS, I recommend opening a case with Dell.
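As a concrete starting point, a write-plus-verify fio job against a throwaway test LV could look like the fragment below. The LV path is hypothetical; never point this at an LV holding real VM data, because the job overwrites the device:

```ini
; Hypothetical fio job: write 1M blocks with CRC32C verification,
; bypassing the page cache so data actually hits the array.
[verify-lv]
filename=/dev/mpath_dell_vol_proxmox/vm-999-disk-0
rw=write
bs=1M
size=4G
direct=1
verify=crc32c
do_verify=1
```

If fio's verify pass fails on the hypervisor with KVM out of the picture, the corruption is below the VM layer.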
 
Hey guys,

After some tests, I disabled the vm discard and it's no longer crashing the machines.
Does anyone have any idea why this is?

With this configuration, it is running normally:
root@pve01:/var/lib/vz/template# qm config 101
boot: cdn
bootdisk: scsi0
cores: 4
ide2: none,media=cdrom
memory: 9216
name: Graylog
net0: virtio=BE:62:A4:57:DF:BA,bridge=vmbr0
numa: 0
ostype: l26
scsi0: mpath_dell_vol_proxmox:vm-101-disk-0,size=300G
scsihw: virtio-scsi-pci
smbios1: uuid=b5809f84-396a-404e-9606-8e2d1b062997
sockets: 2
vga: qxl
 
Good find.
https://www.oreilly.com/library/vie...05/03431488-8696-41e3-92e2-a60482b6e4e9.xhtml

Either there is a problem with the storage's support for TRIM/Discard, or there is an incompatibility between the KVM implementation and Dell. Since Proxmox uses KVM, you could post to or reach out to a dedicated KVM list and see if there are any known issues.

Another option, now that you have tracked down the culprit: open a case with Dell. They support Linux/LVM and TRIM/UNMAP, so perhaps they can help:
https://www.dell.com/support/manual...0c61f1-4598-4033-bc76-e98cf0866bde&lang=en-us
 
https://forum.proxmox.com/threads/sas-disk-array-and-discard-option.87901/
A similar problem occurred to me with a Dell ME4024 array, from the same series. It does indeed seem to be a problem between discard and the array. Contrary to what I said in the thread, I could in the end replicate the problem with a simple QEMU hypervisor and Ubuntu 20.04. Dell support didn't give me much help. The VMs get corrupted after some I/O. I noticed that when you give less RAM to the VMs, the corruption happens faster.
In the end I gave up on using a KVM/QEMU-based solution with this array.
 
@José Roberto : It shouldn't be a problem. Without Trim you will have to live with different reporting of used/available space between array and the OS. OS would be reporting more free space because it released those blocks, but array has not cleared them. However, it should be fine as OS knows they are really "free" and naturally will re-use them when time comes.
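The "reported free vs actually cleared" divergence can be seen locally with a sparse file, which behaves a bit like a discarded-but-not-cleared region (a loose analogy only, not the actual array mechanics):

```shell
# A sparse file has a logical size without allocated blocks, much like blocks
# the OS has released but the array has not cleared: two views of the same space.
truncate -s 8M /tmp/sparse_demo.img
apparent=$(stat -c %s /tmp/sparse_demo.img)   # logical size in bytes
blocks=$(stat -c %b /tmp/sparse_demo.img)     # 512-byte blocks actually allocated
echo "apparent=$apparent bytes, allocated=$blocks blocks"
rm -f /tmp/sparse_demo.img
```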
 
For me, it was a problem because without TRIM, the performance was much worse, it went from 1.5 GB/s to 500 MB/s. I don't know why.
 
Today I was doing some performance testing and I had these errors again, even with discard turned off:

Jun 29 15:39:45 teste4 kernel: [ 7114.998104] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410343: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7114.999495] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410350: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.000043] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410344: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.001378] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410349: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.001860] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410340: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.003071] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410351: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.003607] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410348: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.004631] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410345: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.005171] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410342: comm vim: iget: checksum invalid
Jun 29 15:39:45 teste4 kernel: [ 7115.006625] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410341: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.283611] EXT4-fs error: 28 callbacks suppressed
Jun 29 15:40:34 teste4 kernel: [ 7164.283613] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410343: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.284551] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410350: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.286066] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410344: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.286626] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410349: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.287871] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410340: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.289085] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410351: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.290299] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410348: comm vim: iget: checksum invalid
Jun 29 15:40:34 teste4 kernel: [ 7164.291265] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410345: comm vim: iget: checksum invalid
Jun 29 15:40:35 teste4 kernel: [ 7164.293524] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410342: comm vim: iget: checksum invalid
Jun 29 15:40:35 teste4 kernel: [ 7164.293989] EXT4-fs error (device sda2): ext4_lookup:1701: inode #11410341: comm vim: iget: checksum invalid
Jun 29 15:56:23 teste4 kernel: [ 8112.730953] SQUASHFS error: xz decompression failed, data probably corrupt
Jun 29 15:56:23 teste4 kernel: [ 8112.731007] SQUASHFS error: squashfs_read_data failed to read block 0x1b913cf
Jun 29 15:56:23 teste4 kernel: [ 8112.832557] SQUASHFS error: xz decompression failed, data probably corrupt
Jun 29 15:56:23 teste4 kernel: [ 8112.832607] SQUASHFS error: squashfs_read_data failed to read block 0x1b913cf
Jun 29 15:56:23 teste4 kernel: [ 8112.833886] SQUASHFS error: xz decompression failed, data probably corrupt
Jun 29 15:56:23 teste4 kernel: [ 8112.833934] SQUASHFS error: squashfs_read_data failed to read block 0x1b913cf
Jun 29 15:56:23 teste4 snapd[642]: fatal error: fault
Jun 29 15:56:23 teste4 snapd[642]: [signal SIGBUS: bus error code=0x2 addr=0x55b602a67f15 pc=0x55b601d53883]
Jun 29 15:56:25 teste4 snap-failure[13355]: fatal error: missing stackmap
Jun 29 15:57:44 teste4 kernel: [ 8194.151149] mv[13423]: segfault at 0 ip 000055d1b09f25a0 sp 00007fffd9578568 error 6 in mv[55d1b09e6000+18000]
Jun 29 15:59:11 teste4 kernel: [ 8281.126589] mv[13557]: segfault at 0 ip 000055c1fb2df5a0 sp 00007ffe276f9328 error 6 in mv[55c1fb2d3000+18000]
 
Hello Proxmox, could you provide the download link for version 6.0? I would like to test it, to see whether this problem is a version compatibility issue.
 
I tested it with Proxmox version 6.0-4 with discard enabled and had the same problem when running fstrim -v --all to release blocks back to the storage:

Jul 1 16:06:23 ubu1 kernel: [ 1113.189618] EXT4-fs error (device sda2): ext4_validate_block_bitmap:384: comm fstrim: bg 34: bad block bitmap checksum
Jul 1 16:06:23 ubu1 kernel: [ 1113.194861] EXT4-fs warning (device sda2): ext4_trim_all_free:5198: Error -74 loading buddy information for 34
 
