datastore corrupted after reboot (iscsi/lvm/ext4/multipath)

woops

New Member
Jun 15, 2023
Hello,

We regularly experience corruption of our PBS backup storage when rebooting the PBS server. We first saw this on PBS version 7, and the problem persists after upgrading to version 8.

Our datastore volume is hosted on an HP disk array that exports an iSCSI volume via multipath.

Our iSCSI volume is partitioned with LVM and formatted as ext4.

Our fstab mount options are as follows:
Code:
/dev/mapper/saveVM /mnt/backup ext4 defaults,relatime,_netdev,x-systemd.requires=iscsid.service 0 0
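
To make the individual fields of that entry easier to discuss, here is a minimal Python sketch that splits an fstab line into its six fields and its mount options. The names `FstabEntry` and `parse_fstab_line` are made up for this illustration and are not part of any Proxmox tooling.

```python
from typing import NamedTuple, Optional


class FstabEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: dict   # option name -> value, or None for flag-style options
    dump: int
    fsck_pass: int


def parse_fstab_line(line: str) -> FstabEntry:
    """Split one non-comment fstab line into its fields."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("an fstab entry needs at least four fields")
    device, mountpoint, fstype, raw_options = fields[:4]
    # The fifth and sixth fields are optional and default to 0.
    dump = int(fields[4]) if len(fields) > 4 else 0
    fsck_pass = int(fields[5]) if len(fields) > 5 else 0

    options: dict = {}
    for opt in raw_options.split(","):
        key, sep, value = opt.partition("=")
        options[key] = value if sep else None
    return FstabEntry(device, mountpoint, fstype, options, dump, fsck_pass)


entry = parse_fstab_line(
    "/dev/mapper/saveVM /mnt/backup ext4 "
    "defaults,relatime,_netdev,x-systemd.requires=iscsid.service 0 0"
)
for name, value in entry.options.items():
    print(name, value)
print("fsck pass:", entry.fsck_pass)
```

One detail this makes visible: the sixth field is 0, which according to fstab(5) means the filesystem is not checked by fsck at boot.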

During a problematic restart, the errors are as follows:
Code:
...
error during snapshot file listing: 'unable to load blob '"/mnt/backup/vm/116/2023-07-05T20:00:01Z/index.json.blob"' - unable to parse raw blob  - wrong magic'
kernel: EXT4-fs warning (device dm-4): ext4_dirblock_csum_verify:404: inode #701072025: comm UPID:restaurix:: No space for directory leaf checksum. Please run e2fsck -D.
kernel: EXT4-fs error (device dm-4): __ext4_find_entry:1673: inode #701072025: comm UPID:xx:: checksumming directory block 0
....
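
When there are many such kernel messages, it can help to group them by device and inode. The following is only a sketch: the regular expression is modelled on the two lines quoted above and may not match every EXT4 message format, and the function name `group_ext4_messages` is hypothetical.

```python
import re
from collections import defaultdict

# Pattern modelled on the two log lines quoted in this post (an assumption,
# not an exhaustive description of EXT4 kernel messages).
EXT4_RE = re.compile(
    r"EXT4-fs (?P<level>warning|error) \(device (?P<device>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): inode #(?P<inode>\d+): (?P<rest>.*)"
)


def group_ext4_messages(lines):
    """Return {(device, inode): [(level, function), ...]} for matching lines."""
    grouped = defaultdict(list)
    for line in lines:
        match = EXT4_RE.search(line)
        if match:
            key = (match["device"], int(match["inode"]))
            grouped[key].append((match["level"], match["func"]))
    return dict(grouped)


sample = [
    "kernel: EXT4-fs warning (device dm-4): ext4_dirblock_csum_verify:404: "
    "inode #701072025: comm UPID:restaurix:: No space for directory leaf "
    "checksum. Please run e2fsck -D.",
    "kernel: EXT4-fs error (device dm-4): __ext4_find_entry:1673: "
    "inode #701072025: comm UPID:xx:: checksumming directory block 0",
]
summary = group_ext4_messages(sample)
for (device, inode), events in summary.items():
    print(device, inode, events)
```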

The "index.json.blob" file is corrupted and contains only null bytes.
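
To see how many other files are affected in the same way, a scan along the following lines could be used. This is a minimal sketch: it only looks for non-empty files consisting entirely of null bytes and does not validate the PBS blob format; the `.blob` suffix filter and the function names are assumptions made for this example.

```python
import os


def is_all_zero(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and contains only null bytes."""
    saw_data = False
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # End of file: an empty file is not reported as zeroed.
                return saw_data
            saw_data = True
            if chunk.count(0) != len(chunk):
                return False


def find_zeroed_files(root, suffix=".blob"):
    """Walk root and return a sorted list of zero-filled files with the suffix."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                if is_all_zero(full):
                    hits.append(full)
    return sorted(hits)
```

For example, `find_zeroed_files("/mnt/backup/vm")` would list the zero-filled blob files below the VM backup directory, using the mount point from this post.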

Our "/etc/multipath.conf" file has the following options:
Code:
blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3600c0ff000669d2185a3a66401000000"
        wwid "3600c0ff000669fbf86a3a66401000000"
}

multipaths {
        multipath {
                wwid "3600c0ff000669d2185a3a66401000000"
                alias mpath-saveVM-A
        }
        multipath {
                wwid "3600c0ff000669fbf86a3a66401000000"
                alias mpath-saveVM-B
        }
}

defaults {
        polling_interval 2
        path_selector "round-robin 0"
        path_grouping_policy multibus
        uid_attribute ID_SERIAL
        rr_min_io 100
        failback immediate
        no_path_retry queue
        user_friendly_names yes
}
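
For reviewing the settings above side by side, here is a small Python sketch that reads the brace-delimited sections into nested lists. It is a simplified reader written for this post, not the parser multipath itself uses: it assumes one statement per line, `#` comments only at the start of a line, and the names `parse_sections` and `first_section` are hypothetical.

```python
def parse_sections(text):
    """Parse 'name {' ... '}' blocks into a list of (key, value) pairs.

    A section's value is a nested list; a plain setting's value is a string.
    Lists are used instead of dicts because keys such as 'wwid' repeat.
    """
    root = []
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            children = []
            stack[-1].append((line[:-1].strip(), children))
            stack.append(children)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unmatched closing brace")
            stack.pop()
        else:
            parts = line.split(None, 1)
            value = parts[1].strip().strip('"') if len(parts) > 1 else ""
            stack[-1].append((parts[0], value))
    if len(stack) != 1:
        raise ValueError("unclosed section")
    return root


def first_section(tree, name):
    """Return the first section called name as a dict (empty if absent)."""
    for key, value in tree:
        if key == name and isinstance(value, list):
            return dict(value)
    return {}


SAMPLE = """
blacklist_exceptions {
        wwid "3600c0ff000669d2185a3a66401000000"
        wwid "3600c0ff000669fbf86a3a66401000000"
}
defaults {
        path_selector "round-robin 0"
        no_path_retry queue
}
"""
tree = parse_sections(SAMPLE)
defaults = first_section(tree, "defaults")
print(defaults)
```

With `no_path_retry queue`, multipath keeps queueing I/O for as long as no path is available. Whether that plays any role in the corruption described here is not established.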


On the HP array side, we have tested with both "write-back" and "write-through" cache writing modes.


The problem seems to be recurring if a reboot is triggered during a PBS backup.


Sorry for the lengthy message. Do you have any ideas to solve this problem or any leads to investigate the source of the issue?


Thank you in advance!
Vincent
 
Hello floh8,

Thank you for your response, but I must admit I'm not sure whether you're being ironic or not. :)

Just for your information, I had to shut down my server to move it to a different data center, and I'll have to do it again because a power outage is scheduled for December.

Upon reboot, the system prompts me to run fsck on the partition. The fsck does not succeed because of inode corruption, and the partition cannot be mounted. I must have misconfigured something, but I can't figure out what.
 
The problem seems to be recurring if a reboot is triggered during a PBS backup.
On the HP array side, we have tested with both "write-back" and "write-through" cache writing modes.
my guess is that the cache is not completely written out before the reboot, which corrupts your filesystem?

in any case, i'd try to fix the corrupt filesystem. this does not look like a misconfiguration of PBS, more likely a misconfiguration (or misbehaviour) of the RAID
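
One way to probe the cache theory independently of PBS would be a write-then-verify test: write files, force them to disk with fsync, reboot, and check the contents afterwards. The sketch below is only an illustration under assumptions: it targets Linux, the file names and the `manifest.json` name are made up, and a clean result does not prove the array honours flushes in every situation.

```python
import hashlib
import json
import os


def write_durably(path, data):
    """Write data, fsync the file, then fsync its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)
    # Syncing the directory makes the new directory entry durable as well.
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def write_test_files(directory, count=8, size=4096):
    """Create random test files plus a manifest of their SHA-256 digests."""
    manifest = {}
    for index in range(count):
        name = f"probe-{index:03d}.bin"
        data = os.urandom(size)
        write_durably(os.path.join(directory, name), data)
        manifest[name] = hashlib.sha256(data).hexdigest()
    payload = json.dumps(manifest, indent=2).encode()
    write_durably(os.path.join(directory, "manifest.json"), payload)
    return manifest


def verify_test_files(directory):
    """Return the names of files that are missing or whose digest differs."""
    with open(os.path.join(directory, "manifest.json"), "rb") as handle:
        manifest = json.load(handle)
    damaged = []
    for name, expected in sorted(manifest.items()):
        try:
            with open(os.path.join(directory, name), "rb") as handle:
                actual = hashlib.sha256(handle.read()).hexdigest()
        except FileNotFoundError:
            damaged.append(name)
            continue
        if actual != expected:
            damaged.append(name)
    return damaged
```

The idea would be to call `write_test_files()` on a scratch directory on the affected volume, reboot right afterwards, and then run `verify_test_files()` on the same directory. Any name it returns was acknowledged as synced but did not survive intact.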
 
Hello dcsapak and thank you!

The issue probably stems from a misconfiguration on my part (incorrect fstab mount options, HP storage array configuration, or iSCSI volume setup).


Have a great day/evening/vacation.
 