MDRAID & O_DIRECT

yes, but while zfs gets a regular scrub, silent bitrot could indeed happen on partitions 1+2 of a system which is rarely touched/updated. such an issue will hit you when you don't expect it. chances are low indeed, but the sectors of partitions 1+2 don't get a regular check, and this is something which could perhaps be addressed by some check/patrol-read/checksum routine.

and, btw:

>, as else IO errors on setting up a new kernel there would get noticed

afaik, i/o errors typically happen on read, not on write.
 
silent bitrot could happen anyway, ESPs don't support filesystems with checksumming after all. if the disk has a hardware problem, it is very unlikely that it will affect all ESPs but not the main part of the disk used by ZFS. I don't think some over-engineered solution makes sense here. if you are truly worried about this, schedule a "proxmox-boot-tool refresh" every week - if the ESP part of the disk fails, it will tell you ;)
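for example (just a sketch - the schedule and install path are assumptions, adjust as needed), a weekly refresh could be dropped into /etc/cron.d:

# /etc/cron.d/pbt-refresh - re-sync all ESPs once a week (Sunday, 03:00)
# normal output is discarded; errors go to stderr and get mailed to root by cron
0 3 * * 0 root /usr/sbin/proxmox-boot-tool refresh >/dev/null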
 
yes, you are right, chances are low, but i'm not sure if we should really call it "over-engineered" to check the boot env for disk issues and to have a "zfs|btrfs scrub" equivalent for the boot env.

for all those who worry, here are some ideas how such a check could be done proactively:

1. patrol read of all blocks, checking the return value of sg_dd. maybe sg_dd is better for detecting errors than plain dd.
sg_dd if=/dev/sda1 of=/dev/null
sg_dd if=/dev/sdb1 of=/dev/null
sg_dd if=/dev/sda2 of=/dev/null
sg_dd if=/dev/sdb2 of=/dev/null

2. check filesystem on efi partitions
fsck.vfat -t /dev/sda2
fsck.vfat -t /dev/sdb2

3. compare md5 sums of all files on the filesystems in /dev/sda2 and /dev/sdb2 (rough sketch below)
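rough sketch for idea 3 (mountpoints, device names and temp files are just examples, adjust to your setup):

# mount both ESPs read-only and compare per-file md5 sums
mkdir -p /mnt/esp-a /mnt/esp-b
mount -o ro /dev/sda2 /mnt/esp-a
mount -o ro /dev/sdb2 /mnt/esp-b
( cd /mnt/esp-a && find . -type f -exec md5sum {} + | sort -k2 ) > /tmp/esp-a.md5
( cd /mnt/esp-b && find . -type f -exec md5sum {} + | sort -k2 ) > /tmp/esp-b.md5
diff -u /tmp/esp-a.md5 /tmp/esp-b.md5 && echo "ESPs identical" || echo "ESPs differ!"
umount /mnt/esp-a /mnt/esp-b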


whoever is worried about this issue may create some checker/patrol-read script for this with the ideas above.

it needs some work to be put into, e.g. automatically determining which disks/partitions are being used by evaluating /etc/kernel/proxmox-boot-uuids
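untested sketch of how such a script could look (assumes /etc/kernel/proxmox-boot-uuids contains one filesystem uuid per line, and uses plain dd instead of sg_dd for simplicity):

#!/bin/sh
# patrol-read + read-only fsck for every ESP listed in /etc/kernel/proxmox-boot-uuids
while read -r uuid; do
    dev=$(blkid -U "$uuid") || { echo "ESP with uuid $uuid not found!"; continue; }
    echo "checking $dev ($uuid)"
    # 1. patrol read of all blocks
    dd if="$dev" of=/dev/null bs=1M status=none || echo "read error on $dev"
    # 2. read-only filesystem check (-n = report only, change nothing)
    fsck.vfat -n "$dev" || echo "fsck reported problems on $dev"
done < /etc/kernel/proxmox-boot-uuids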

maybe we can have "proxmox-boot-tool check" for this one day....
 
>afaik, i/o errors typically happen on read, not on write.

an add-on note for this:
https://www.enterprisestorageforum.com/hardware/drive-reliability-studies/

"The authors found that final read errors (read errors after multiple retries) are about two orders of magnitude more frequent in terms of drive days than any other non-transparent (non-recoverable) error.
Given this, the authors wrote that write errors rarely turned into non-transparent (non-recoverable) errors"

i remember someone on the lvm mailing list saying the same thing, that most errors show up on read, not on write.

>if you are truly worried about this, schedule a "proxmox-boot-tool refresh" every week - if the ESP part of the disk fails, it will tell you

given the information above, i doubt that rewriting the boot env with "proxmox-boot-tool refresh" will sufficiently protect you from failure here.
 
In general, it is also good to note that the ESP itself is not really stateful and can be rebuilt from scratch relatively easily, e.g. using a PVE ISO's debug shell. After all, it mainly contains the kernels and initrds.
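For example, re-initializing a broken or replaced ESP roughly boils down to the following (sketch only, /dev/sdX2 is a placeholder for the actual ESP partition):

# re-create the FAT filesystem on the ESP and register it (--force may be needed if an old filesystem is still present)
proxmox-boot-tool format /dev/sdX2
proxmox-boot-tool init /dev/sdX2
# copy the current kernels/initrds and bootloader config to all configured ESPs
proxmox-boot-tool refresh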

And if one doesn't want to care about that, then doing a backup of the partition and saving it somewhere accessible might be more worthwhile than some elaborate checks that won't really help if something fails anyway. PBS and the client could be used for this. It might be worthwhile to create a mini how-to for such recovery scenarios though.
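As a rough illustration (the repository and archive name are made-up examples), a block-level backup of an ESP with the Proxmox Backup client could look like this:

# back up the raw ESP partition as an image archive to a PBS datastore
proxmox-backup-client backup esp-sda2.img:/dev/sda2 --repository backup@pbs@pbs.example.com:datastore1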
 
yes, raid won't make backup obsolete.

but from my admin perspective, it would be a logical consequence to regularly check/scrub 100% of your system disks and not only 99% (i.e. 100% minus boot partition minus EFI partition), even if it is possible to recreate them.

you could have that included in your monitoring, and every admin will be happy if monitoring provides early information on issues BEFORE things start getting worse.

i don't want to put pressure on getting this feature, but please don't belittle it because there is no time. every idea needs time to develop.

>PBS and the client could be used for this

yes, but /boot/efi needs to be mounted before backup and unmounted afterwards.
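i.e. something like this would be needed (sketch, the mountpoint, repository and archive name are just examples):

# mount the ESP, do a file-level backup, unmount again
mount /dev/sda2 /boot/efi
proxmox-backup-client backup esp-files.pxar:/boot/efi --repository backup@pbs@pbs.example.com:datastore1
umount /boot/efi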
 
>i don't want to put pressure on getting this feature, but please don't belittle it because there is no time. every idea needs time to develop.
No, we do not belittle this at all, and we do not delay it for time reasons. The current system, with an ESP on every mirror device in addition to being able to recreate it, is simply safe and redundant enough for the practical cases we want to hedge against, and that won't change.
>yes, but /boot/efi needs to be mounted before backup and unmounted afterwards.
You can create a backup of the block device itself, but sure, mounting it and creating a filesystem-level backup works too.
 