[BUG?] ZFS data corruption on Proxmox 4

gkovacs

Renowned Member
Dec 22, 2008
Budapest, Hungary
I have installed Proxmox 4 on a server using ZFS RAID10 in the installer. The disks are brand new (4x2TB, attached to the Intel motherboard SATA connectors), and there are no SMART errors / reallocated sectors on them. I have run a memtest for 30 minutes, and everything seems fine hardware-wise.

After restoring a few VMs (a hundred or so gigabytes), the system reported read errors in some files. Scrubbing the pool shows permanent errors in the recently restored guest files:

Code:
# zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h4m with 1 errors on Thu Nov  5 21:30:02 2015
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     1
          mirror-0  ONLINE       0     0     2
            sdc2    ONLINE       0     0     2
            sdf2    ONLINE       0     0     2
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //var/lib/vz/images/501/vm-501-disk-1.qcow2

If I delete the VMs and scrub the pool again, the errors are gone. If I restore new VMs, the errors are back.
Anybody have any idea what could be happening here?
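
For reference, the reproduce cycle I'm describing looks roughly like this; the backup archive name, VMID and target storage are only placeholders, not the exact values I used:

Code:
# restore a guest from a vzdump backup (archive name is a placeholder)
qmrestore /var/lib/vz/dump/vzdump-qemu-501-backup.vma.lzo 501 --storage local

# scrub and check for checksum errors (wait for the scrub to finish first)
zpool scrub rpool
zpool status -v rpool

# destroy the affected guest, clear the error counters, scrub again: errors are gone
qm destroy 501
zpool clear rpool
zpool scrub rpool
zpool status -v rpool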


Code:
# zdb -mcv rpool
Traversing all blocks to verify checksums and verify nothing leaked ...

loading space map for vdev 1 of 2, metaslab 30 of 116 ...
50.1G completed ( 143MB/s) estimated time remaining: 0hr 01min 09sec
zdb_blkptr_cb: Got error 52 reading <50, 61726, 0, 514eb>  -- skipping
59.8G completed ( 145MB/s) estimated time remaining: 0hr 00min 00sec
Error counts:

        errno  count
           52  1

        No leaks (block sum matches space maps exactly)

        bp count:          928688
        ganged count:           0
        bp logical:    115011845632      avg: 123843
        bp physical:   62866980352      avg:  67694     compression:   1.83
        bp allocated:  64258899968      avg:  69193     compression:   1.79
        bp deduped:             0    ref>1:      0   deduplication:   1.00
        SPA allocated: 64258899968     used:  1.61%

        additional, non-pointer bps of type 0:       4844
        Dittoed blocks on same vdev: 297

Code:
# pveversion -v
proxmox-ve: 4.0-20 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-20
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve4~jessie
 
can you test with latest zfs? (upgrade your packages from pvetest)
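
Something along these lines should do it on a default PVE 4 / Debian jessie install (double-check the repository line against the wiki before using it):

Code:
# enable the pvetest repository (PVE 4.x is based on Debian jessie)
echo "deb http://download.proxmox.com/debian jessie pvetest" > /etc/apt/sources.list.d/pvetest.list

# pull in the newer kernel and ZFS packages, then reboot into the new kernel
apt-get update
apt-get dist-upgrade
reboot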
 
can you test with latest zfs? (upgrade your packages from pvetest)

I have upgraded to the latest packages (-21 kernel and ZFS 0.6.5.3) from the pvetest repo, rebooted the system, and repeated the restores. Unfortunately the files are corrupted again after the first scrub.

Code:
Nov  6 13:14:28 proxmox3 kernel: [    5.358851] ZFS: Loaded module v0.6.5.3-1, ZFS pool version 5000, ZFS filesystem version 5

Code:
# pveversion -v
proxmox-ve: 4.0-21 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-21
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
 
Do you have ECC memory? SATA cables?

I don't have ECC memory, but as I clearly stated in the OP, I ran a memtest for 30 minutes yesterday and have extensively diagnosed the drives with smartmontools. I have also installed mcelog today; it has been running for a couple of hours now and has not reported any memory errors during today's testing.

This server has been in use for 2 years without any hardware problems, and the drives in the pool are brand new, so hardware errors are quite unlikely in this case. But let's say this is a hardware error: how do I find it?
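
For reference, this is roughly what those checks look like; /dev/sdc is just one example pool member, and the mcelog invocation assumes the daemon is running:

Code:
# run a long SMART self-test on each pool member, then check the counters that matter
smartctl -t long /dev/sdc
smartctl -a /dev/sdc | egrep -i 'reallocated|pending|uncorrect|crc'

# ask the running mcelog daemon for any recorded machine checks
mcelog --client

# look for hardware error reports in the kernel log as well
dmesg | grep -i -e 'machine check' -e 'hardware error'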
 
I doubt mcelog will show you memory issues (I really don't know).
You would be amazed: I run ECC memory and I still get bits flipped from time to time, even though a past memtest was fine.
I also have two hard drives that show no error counters (0), but they have shown ZFS errors a few times, and one of them is very slow at writing. I've thrown them out.
They are connected to a backplane, and the backplane is routed through a SAS switch using an SFF-8087 cable, so it is common to all 12 drives there; still, these are the only drives that have shown the errors.

Given that there are not many people reporting ZFS checksum errors, we can safely assume something is wrong in your specific scenario. The software being the same for everyone, hardware remains the prime suspect, correct? Try reseating the memory modules and, if possible, exchange the SATA cables.
Whatever can be a corruption source after ZFS is done with its checksum (memory, cables, drives) needs to be carefully checked.
 
if possible, exchange the SATA cables.
Whatever can be a corruption source after ZFS is done with its checksum (memory, cables, drives) needs to be carefully checked.

The drives were installed in trays before, connected to a backplane. I have removed the entire setup, and connected the drives directly to the mainboard with new SATA3 cables I got yesterday. After rebooting, restoring a few VMs, and scrubbing the pool again, the errors are back.

What I don't understand is this: if a cable were bad and only introduced an error occasionally, it still could not corrupt both copies of the actual data (we are running RAID10, so all data is mirrored). So how is it possible that after copying a single large file to the array there is suddenly a "permanent error" in it, meaning ZFS cannot correct it even though it has TWO COPIES of it? Thinking logically, this can't be caused by the cables or the drives, since the chance of two cables or two drives producing an error in exactly the same place is essentially zero.

I'm almost positive it's a software error, but still if you (or anyone) got any more ideas about what and how to test, I certainly welcome them.
 
If both drives are corrupted that means the source of error is upstream. Like you say, either software (I say this is unlikely because lots of people could have issues if ZFS borks the checksums) or... memory? Even CPUs and/or their caches can be broken sometimes.
 
If both drives are corrupted that means the source of error is upstream. Like you say, either software (I say this is unlikely because lots of people could have issues if ZFS borks the checksums) or... memory? Even CPUs and/or their caches can be broken sometimes.

Things are getting more interesting. I have installed a replacement motherboard (different, newer chipset), and lo and behold the corruption is still there! So it's not the motherboard / chipset then.

After that I did some RAM LEGO: with 1 or 2 DIMMs (it doesn't matter what size, speed or slot; I tried several different pairs) there is no corruption, but with 4 DIMMs the data gets corrupted. So it's not the RAM then.

What's left, really? The CPU? I started fiddling with BIOS settings, and when I disabled Turbo and EIST (Enhanced Intel SpeedStep Technology), the corruption became less likely! So it's either a kernel / ZFS regression that occurs when EIST scales the CPU frequency up and down AND all the RAM slots are in use, or my CPU is defective. I will test with another CPU this weekend.
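
If anyone wants to take frequency scaling out of the picture from the OS side as well (instead of only through the BIOS), pinning the cpufreq governor should do it, assuming the kernel exposes the usual cpufreq sysfs interface:

Code:
# check which scaling driver and governor are active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# pin every core to the performance governor for the duration of the test
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done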
 
Is your CPU a Haswell? I'm asking because lots of people have issues with Haswell C-States.

No, it's Sandy Bridge.

Still, I am more inclined to believe that there is some Linux kernel issue at play here, instead of a defective CPU. It sounds so weird.
 
I don't have ECC memory, but as I clearly stated in the OP, I ran a memtest for 30 minutes yesterday and have extensively diagnosed the drives with smartmontools. I have also installed mcelog today; it has been running for a couple of hours now and has not reported any memory errors during today's testing.

This server has been in use for 2 years without any hardware problems, and the drives in the pool are brand new, so hardware errors are quite unlikely in this case. But let's say this is a hardware error: how do I find it?

Is 30 min of memtest even enough for one full test cycle?

I had some problems with my desktop PC: http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023883.html
 
Is 30 min of memtest even enough for one full test cycle?

I had some problems with my desktop PC: http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023883.html

Since I wrote that, I have done a 4-hour (SMP) and a 9-hour (single-core) memtest run, with absolutely no errors.

Also please note that the data corruption happens with any kind and size of RAM, but only when there are 4 modules installed. I have put in several DIMMs of varying brands, sizes and speeds, and all of them produce the same issue when four DIMMs are installed.
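
For comparing DIMM configurations, something like this should list which slots are populated and the speed the modules actually run at (it reports much the same information as lshw):

Code:
# list populated DIMM slots, module sizes and configured clock speeds
dmidecode --type memory | egrep -i 'locator|size|speed'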
 
Nemesiz, it looks like we have the exact same problem!
- Is your home setup also a Proxmox installation?
- Kernel / ZFS version?
- Motherboard / SATA chipset / CPU?
- Do you also have 4 DIMMs in your MB? Does it produce the error with only 2 DIMMs?

Kernel : Linux ubuntu 3.13.0-70-generic #113-Ubuntu SMP Mon Nov 16 18:34:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
ZFS : 0.6.4-3~trusty

Motherboard : Asus P5E-VM HDMI
SATA chipset : Intel G35 / ICH9R
CPU : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
RAM : 4 slots, various manufacturers and sizes. Total 6 GB.

No errors with the RAM. I don't normally use HDDs in my desktop PC at all; the problem appeared when I wanted to keep files for a short time on a 2x 1TB ZFS mirror.
 
Code:
  *-memory
       description: System Memory
       physical id: 33
       slot: System board or motherboard
       size: 6GiB
     *-bank:0
          description: DIMM DDR2 Synchronous 800 MHz (1.2 ns)
          product: PartNum0
          vendor: Manufacturer0
          physical id: 0
          serial: SerNum0
          slot: DIMM0
          size: 2GiB
          width: 64 bits
          clock: 800MHz (1.2ns)
     *-bank:1
          description: DIMM DDR2 Synchronous 800 MHz (1.2 ns)
          product: PartNum1
          vendor: Manufacturer1
          physical id: 1
          serial: SerNum1
          slot: DIMM1
          size: 1GiB
          width: 64 bits
          clock: 800MHz (1.2ns)
     *-bank:2
          description: DIMM DDR2 Synchronous 800 MHz (1.2 ns)
          product: PartNum2
          vendor: Manufacturer2
          physical id: 2
          serial: SerNum2
          slot: DIMM2
          size: 2GiB
          width: 64 bits
          clock: 800MHz (1.2ns)
     *-bank:3
          description: DIMM DDR2 Synchronous 800 MHz (1.2 ns)
          product: PartNum3
          vendor: Manufacturer3
          physical id: 3
          serial: SerNum3
          slot: DIMM3
          size: 1GiB
          width: 64 bits
          clock: 800MHz (1.2ns)

Memtest86+ can show more information, but I don't have a photo.
 
Kernel : Linux ubuntu 3.13.0-70-generic #113-Ubuntu SMP Mon Nov 16 18:34:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
ZFS : 0.6.4-3~trusty

Motherboard : Asus P5E-VM HDMI
SATA chipset : Intel G35 / ICH9R
CPU : Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
RAM : 4 slots, various manufacturers and sizes. Total 6 GB.

No errors with the RAM. I don't normally use HDDs in my desktop PC at all; the problem appeared when I wanted to keep files for a short time on a 2x 1TB ZFS mirror.

Okay, but the interesting thing would be to try it out with only 2 DIMMs installed in the motherboard (taking out the 1GB sticks, leaving only the 2GB sticks). In my case, when I only had 2 modules installed, there was no checksum error. If you could validate that, it would get us closer to solving this.
 
My PC has not always had all 4 RAM slots filled, and I have never had any problem with the RAM.

If you have a problem when combining RAM modules, maybe:

1. One of the modules is damaged
2. They just can't work together

I once played with memtest on a PC where the damaged RAM was hard to detect: two sticks worked fine separately, but not together.
 
