Hello all,
I've been running Proxmox for quite some time with no problems. Lately I got a couple of new servers, so I've been moving things around. One of the things I did was enable lz4 compression and native ZFS encryption. I was using both before without trouble, but I remade the pools.
Previously I used znapzend for backups from the SSDs to local disks and pve-zsync for remote backups. Both worked fine for a long time.
Now I use znapzend with two destinations: one to local disks and the other to a server on the LAN connected over SSH. I still use pve-zsync for the remote backups.
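For context, the two-destination policy was created with znapzendzetup, roughly like this (the retention plans, the local backup pool name "tank/backup" and the hostname "backuphost" are illustrative placeholders, not my exact values):

znapzendzetup create --recursive \
  SRC '1d=>1h,7d=>1d' rpool/enc \
  DST:local '1d=>1h,7d=>1d' tank/backup/enc \
  DST:lan '1d=>1h,7d=>1d' root@backuphost:tank/backup/enc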
On one of my new servers, with a mirror of 1.6 TB HPE SATA SSDs, I got a ZFS error: "status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup."
What struck me was that the errors shown were only in snapshots, and the READ, WRITE and CKSUM counts were all 0 on every disk. Interestingly, listing the snapshots gives an I/O error:
root@infpmx01:~# zfs list -t snapshot rpool/enc/vm-110-disk-0
cannot iterate filesystems: I/O error
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
rpool/enc/vm-110-disk-0@rep_default_2022-06-06_23:45:33  1.20M      -  16.9G  -
rpool/enc/vm-110-disk-0@rep_default_2022-06-07_02:30:27  1.16M      -  16.9G  -
If you try to delete the affected snapshots, they aren't found.
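An attempt looks roughly like this (snapshot name abbreviated, error wording from memory; the point is that destroy reports no matching snapshot):

zfs destroy rpool/enc/vm-110-disk-0@rep_default_...
could not find any snapshots to destroy; check snapshot names.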
Anyway, I just assumed I had a bad disk or a failing drive backplane, but within a week the same thing happened to my original server, which I know is good. Exactly the same way:
root@infpmx01:~# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub canceled on Wed Jun 8 23:52:47 2022
config:

        NAME                        STATE     READ WRITE CKSUM
        rpool                       ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            wwn-0x55cd2e404b51d71e  ONLINE       0     0     0
            wwn-0x55cd2e404b55cea6  ONLINE       0     0     0
            wwn-0x55cd2e404b564309  ONLINE       0     0     0
            wwn-0x55cd2e404b57c399  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/enc/vm-110-disk-0@2022-06-08-220000:<0x0>
        rpool/enc/vm-104-disk-0@rep_default_2022-06-08_19:30:56:<0x0>
        rpool/enc/vm-110-disk-0@rep_default_2022-06-08_20:45:40:<0x0>
        rpool/enc/vm-111-disk-0@rep_default_2022-06-08_21:15:25:<0x0>
        rpool/enc/vm-103-disk-0@2022-06-08-220000:<0x0>
I didn't make the znapzend connection (two destinations) until I started writing this, but I now have a suspicion that znapzend is causing the corruption. Any other ideas? I remade the pool because I changed from raidz2 to raidz1 (same ashift=12), and the SMART stats look perfectly fine on all the disks. If the backplane were failing, the spinning-disk pool should be affected too, and if an individual drive were failing there would be checksum errors. I don't believe I have a failing drive; I think this is a bug.
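For reference, this is the kind of check I mean by SMART stats (device path copied from the pool layout above, adjust for your own disks), plus a fresh scrub to re-verify the pool:

smartctl -a /dev/disk/by-id/wwn-0x55cd2e404b51d71e
zpool scrub rpool
zpool status -v rpool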