ZFS zfs_send_corrupt_data parameter not working

davidindra

New Member
Oct 14, 2017
Hello,
I have a problem with my ZFS pool. My RAM recently flipped a bit, and as a result my pool now reports permanent errors (zpool status -vx):
Code:
errors: Permanent errors have been detected in the following files:

        rpool/data/vm-101-disk-1@experimenty:<0x1>
        rpool/data/vm-101-disk-1:<0x1>
        rpool/data/vm-101-disk-1@backup:<0x1>
My plan for resolving this problem (accepting that some data is corrupted) was to set /sys/module/zfs/parameters/zfs_send_corrupt_data to 1 and then zfs send | zfs receive. But that doesn't work: zfs send fails like this (the zfs_send_corrupt_data tunable seems to have no effect):
Code:
internal error: Invalid exchange
cannot receive incremental stream: checksum mismatch or incomplete stream.
The checksum error gets logged in zpool events, of course.
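For completeness, this is roughly the workflow I attempted (a sketch only; the snapshot name matches the one in the error list above, and the target dataset name is an example):

```shell
# Enable sending of corrupt data: unreadable blocks should be replaced
# with a fill pattern in the stream instead of aborting the send
echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data

# Verify the tunable took effect (should print 1)
cat /sys/module/zfs/parameters/zfs_send_corrupt_data

# Full send of a snapshot of the damaged dataset into a fresh dataset
zfs send rpool/data/vm-101-disk-1@experimenty | zfs receive rpool/data/offload-vm-101-disk-1
```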

Can you help me, please? How can I force zfs send to replace unreadable data with some static sequence (which is the expected behavior according to the latest ZFS on Linux sources)?

In case this problem is hard to solve, my second plan was to overwrite the invalid data with zeroes, using dd for example. But how do I locate the problematic range of data?
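A possible starting point, assuming the <0x1> in the zpool status output is the object number inside the dataset, might be zdb (a read-only sketch; I have not verified this works for this kind of damage):

```shell
# Dump metadata and block pointers for object 0x1 of the dataset;
# the reported block addresses could then narrow down what to overwrite
zdb -ddddd rpool/data/vm-101-disk-1 0x1
```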

Thank you for your help! :)
David
 
please post the full zfs send | zfs receive command line you used.
 
It was something like this (I used the resume token for continuing an interrupted send):
Code:
zfs send -t 1-117ac0b39b-d0-789c636064000310a500c4ec50360710e72765a5269740f80cd8e4d3d28a5381f20c07ea53a5a1f26c48f2499525a9c5407a454966800c16fd25f9e9a599290c0cc7421f7feffbfd9fc101499e132c9f97989bcac05054909f9fa39f925892a85f96ab6b6860a89b92599cad6be8a0ab9b919a98a2abcb8000006eb620f8  | pv | zfs receive -s rpool/data/offload-vm-101-disk-1
 
and the original, non-resuming one?
 
I believe it was just this (I can't find it in .bash_history):
Code:
zfs send rpool/data/vm-101-disk-1  | pv | zfs receive -s rpool/data/offload-vm-101-disk-1
 
Also, here is my pveversion -v output, in case it is of interest:
Code:
root@prox2:~# pveversion -v
proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.13-2-pve: 4.13.13-32
pve-kernel-4.13.8-3-pve: 4.13.8-30
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
 
Sending corrupt data is not forbidden. Receiving corrupt data is what triggers the error.

Suggestions:
1. Send to a file.
2. Send the snapshot to a pool with checksums turned off.
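A sketch of suggestion 2 (dataset names are examples only). Note that checksum=off affects on-disk block checksums of the target dataset; it does not disable the checksum embedded in the send stream itself:

```shell
# Create a target dataset with on-disk checksums disabled
zfs create -o checksum=off rpool/data/nochecksum

# Receive the (possibly corrupt) stream below it
zfs send rpool/data/vm-101-disk-1@experimenty | zfs receive rpool/data/nochecksum/vm-101-disk-1
```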
 
@Nemesiz I don't understand. As I read https://github.com/zfsonlinux/zfs/blob/master/module/zfs/dmu_send.c, when zfs send reaches data that produces an I/O error, it:
a) fails, when /sys/module/zfs/parameters/zfs_send_corrupt_data is set to 0;
b) otherwise replaces the invalid data with a static sequence.
So I don't understand why zfs receive should fail - it just receives different data.
Btw, I have already turned checksums off.
 
Can you create pool/sub with checksum off and send the corrupted snapshot to pool/sub/name?

Why can receive fail? It checks data integrity, and send/receive is no exception.
 
Relax. I'm not talking about send, I'm talking about receive. Anyway, it's not my issue.

zfs send snap > file
mount file
 
Sorry :) for being arrogant. I don't have a lot of experience with ZFS. I will try what you are suggesting, sending the zfs send output into a file, and report the result.
 
Unfortunately, I issued the following command:
Code:
zfs send rpool/data/vm-101-disk-1 > ./vm-191-disk-1-zfs-send
and it ended up with this:
Code:
internal error: Invalid exchange
Aborted
Any other tips?
 
I found this: "The 'Invalid exchange' error you're seeing is EBADE, which is what ZFS uses internally to report a checksum error." (here) - which doesn't make sense, because cat /sys/module/zfs/parameters/zfs_send_corrupt_data still gives 1. It again looks to me as if the code seen there doesn't correspond to what is built on my system. What do you think, @fabian?
 
your error message indicates that verifying the checksum fails when receiving (you can easily test this by sending the stream to a file, and then piping that file's content to zfs recv). I guess this is either because of the resume operation (which checks that the existing data matches what is expected) or because you are in fact sending an incremental stream (either explicitly, or implicitly via something like -R). sending with the corrupt_data flag set works as expected when doing a single full send, and receiving is possible as well.

edit: resuming a full stream works as well.
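For example (a sketch using your dataset names; the snapshot name is assumed):

```shell
# Step 1: write the full send stream to a file
zfs send rpool/data/vm-101-disk-1@actual > /tmp/stream

# Optionally inspect the stream and its embedded checksums first
zstreamdump < /tmp/stream

# Step 2: replay the saved stream into zfs recv
zfs receive rpool/data/offload-vm-101-disk-1 < /tmp/stream
```

If step 1 already fails, the problem is on the sending side; if only step 2 fails, the stream itself is inconsistent.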
 
I've tried it without extra parameters and it failed again:
Code:
root@prox2:~# zfs send rpool/data/vm-101-disk-1@actual | pv | zfs recv rpool/data/offload2-vm-101-disk-1
 113GiB 1:07:52 [28.5MiB/s] [                                    <=>                                         ]
internal error: Invalid exchange
cannot receive new filesystem stream: checksum mismatch or incomplete stream
 
