datastore corrupted after reboot (iscsi/lvm/ext4/multipath)

woops · Jul 7, 2023

Hello,

We regularly experience corruption of our PBS backup storage when rebooting the PBS server. We were running PBS version 7 and have since upgraded to version 8, but the problem persists.

Our datastore volume is hosted on an HP disk array that exports an iSCSI volume via multipath.

Our iSCSI volume is partitioned with LVM and formatted as ext4.

Our fstab mount options are as follows:

Code:

/dev/mapper/saveVM /mnt/backup ext4 defaults,relatime,_netdev,x-systemd.requires=iscsid.service 0 0
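To confirm which options are actually in effect once the volume is mounted (as opposed to what fstab requests), the kernel's view in /proc/mounts can be read. A minimal sketch, not from the original post; the function name `mount_options` is hypothetical, and the mount point /mnt/backup is taken from the fstab line above:

```python
def mount_options(mounts_text, mountpoint):
    """Return (device, fstype, [options]) for mountpoint, or None if not mounted.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields (device, mountpoint, fstype, options, ...).
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # If something is mounted over the same path, the last entry wins
            result = (fields[0], fields[2], fields[3].split(","))
    return result


# Usage on the PBS host (path taken from the fstab entry in this thread):
#   with open("/proc/mounts") as f:
#       print(mount_options(f.read(), "/mnt/backup"))
```

Comparing the reported options with the fstab entry shows whether the filesystem was mounted the way the entry intends.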

During a problematic restart, the errors are as follows:

Code:

...
error during snapshot file listing: 'unable to load blob '"/mnt/backup/vm/116/2023-07-05T20:00:01Z/index.json.blob"' - unable to parse raw blob  - wrong magic'
kernel: EXT4-fs warning (device dm-4): ext4_dirblock_csum_verify:404: inode #701072025: comm UPID:restaurix:: No space for directory leaf checksum. Please run e2fsck -D.
kernel: EXT4-fs error (device dm-4): __ext4_find_entry:1673: inode #701072025: comm UPID:xx:: checksumming directory block 0
....

The "index.json.blob" file is corrupted and contains only null bytes.
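To find out how many other files are affected in the same way, the datastore can be scanned for files that contain nothing but null bytes. A minimal sketch, not from the original post; the function names are hypothetical, and the `.blob` suffix filter is an assumption based on the one file named above:

```python
import os


def is_all_null(path, chunk_size=1024 * 1024):
    """Return True if the file is non-empty and contains only null bytes."""
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
            # Every byte in the chunk must be 0x00
            if chunk.count(0) != len(chunk):
                return False
    return size > 0


def find_null_files(root, suffix=".blob"):
    """Yield paths under root that end in suffix and contain only null bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                if is_all_null(path):
                    yield path
            except OSError as err:
                # A damaged inode may refuse to open; report it and keep going
                print(f"cannot read {path}: {err}")


# Usage (mount point taken from this thread):
#   for path in find_null_files("/mnt/backup"):
#       print(path)
```

This only detects zero-filled files. It does not validate intact-looking blobs, which is what a PBS verify job is for.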

Our "/etc/multipath.conf" file has the following options:

Code:

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3600c0ff000669d2185a3a66401000000"
        wwid "3600c0ff000669fbf86a3a66401000000"
}

multipaths {
        multipath {
                wwid "3600c0ff000669d2185a3a66401000000"
                alias mpath-saveVM-A
        }
        multipath {
                wwid "3600c0ff000669fbf86a3a66401000000"
                alias mpath-saveVM-B
        }
}

defaults {
        polling_interval 2
        path_selector "round-robin 0"
        path_grouping_policy multibus
        uid_attribute ID_SERIAL
        rr_min_io 100
        failback immediate
        no_path_retry queue
        user_friendly_names yes
}

On the HP array side, we have tested with both "write-back" and "write-through" cache writing modes.

The problem seems to be recurring if a reboot is triggered during a PBS backup.
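Since the corruption appears tied to reboots during a running backup, one precaution is to check for active PBS tasks before rebooting. A minimal sketch, not from the original post. It assumes that `proxmox-backup-manager task list --output-format json` prints a JSON array of task objects and that finished tasks carry an `endtime` field; both should be verified against the installed PBS version. The function names are hypothetical:

```python
import json
import subprocess


def running_tasks(task_json):
    """Return the tasks from a JSON task list that have no end time yet.

    Assumption: each task is a JSON object and only finished tasks
    contain an "endtime" key.
    """
    tasks = json.loads(task_json)
    return [t for t in tasks if "endtime" not in t]


def safe_to_reboot():
    """Return True if PBS reports no running tasks (assumed CLI output format)."""
    out = subprocess.run(
        ["proxmox-backup-manager", "task", "list", "--output-format", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return len(running_tasks(out)) == 0
```

This avoids interrupting a backup in progress, but it does not explain why an interrupted write leaves ext4 inconsistent in the first place; a journaled filesystem on storage that honours flushes should survive that.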

Sorry for the lengthy message. Do you have any ideas to solve this problem or any leads to investigate the source of the issue?

Thank you in advance!
Vincent

floh8 · Aug 1, 2023

The question is: why reboot your PBS? The error is a consequence of that.

woops · Aug 8, 2023

Hello floh8,

Thank you for your response, but I must admit I'm not sure whether you're being ironic or not.

Just for your information, I had to shut down my server to move it to a different data center, and I'll have to do it again because a power outage is scheduled for December.

Upon reboot, the system prompts me to run an fsck on the partition, which fails because of inode corruption, and the partition cannot be mounted. I must have misconfigured something, but I can't figure out what.

dcsapak · Aug 9, 2023

woops said:
The problem seems to be recurring if a reboot is triggered during a PBS backup.

woops said:
On the HP array side, we have tested with both "write-back" and "write-through" cache writing modes.

my guess is that the cache does not get written out completely, which then corrupts your filesystem?

in any case, i'd try to fix the corrupt filesystem. this does not look like a misconfiguration of the pbs, more likely a misconfiguration (or misbehaviour) of the raid

woops · Aug 9, 2023

Hello dcsapak and thank you!

The issue probably stems from a misconfiguration on my part (incorrect fstab mount options, HP storage array configuration, iSCSI volume setup, etc.).

Have a great day/evening/vacation.
