Power loss is not the only thing that can cause your system to drop dead. I'd only set sync to disabled if you don't care about losing the last few X of writes, where X depends on your workload.
sync=disabled?
Why do you even want to disable it?
Because I'm getting 10x the fsyncs with sync disabled, and the virtual machines are much, much faster.
No, I don't have a SLOG. Do I need one even when using SSD and NVMe drives?
If you have an SSD zpool, no, you don't need a SLOG unless the SLOG device is faster than the slowest SSD in your zpool.
Thanks.
Could you help me understand which events can cause the last writes to be lost?
- Power interruption without UPS
- Hard shutdown
- Kernel crash on Proxmox host
- Kernel crash on virtual machine?
- qm process or container killed from the Proxmox host?
- Snapshot?
- Other?
Are snapshots safe with sync=disabled?
It's interesting to read. Has anybody really experienced data corruption of any kind, or a corrupted snapshot, during a power loss with sync=disabled?
To my understanding, the consequences will be exactly the same as if the power loss had happened ~5 seconds earlier with sync=standard. Am I wrong?
I did lose a whole pool once because of a power loss while an SSD mirror was trimming (and this was with sync not disabled). I recently bought my first enterprise-ish SSD with PLP, and the sync writes are much faster (from 400 to 17500 in pveperf). They can be cached because of the power loss protection, and I don't have to worry about unsafe settings, power loss, or poorly timed reboots anymore. No wonder they are so often recommended on this forum.
Yeah, and they usually deliver on their specs or can handle even more, unlike many consumer SSDs, where you only get close to the specs under very specific circumstances.
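For anyone who wants a rough feel for the fsync numbers mentioned above, here is a minimal sketch in Python. It is not how pveperf measures anything, just an assumption-laden stand-in: the function name `fsyncs_per_second`, the 512-byte record size, and the one-second duration are all made up for illustration. It appends small records to a temporary file and calls `os.fsync` after each one, then reports how many completed per second.

```python
import os
import sys
import tempfile
import time


def fsyncs_per_second(directory: str, duration: float = 1.0) -> float:
    """Write small records to a temp file in `directory`, fsync after each.

    Hypothetical helper for illustration only; not pveperf's method.
    """
    count = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        while time.monotonic() - start < duration:
            os.write(fd, b"x" * 512)  # small record, size chosen arbitrarily
            os.fsync(fd)              # ask the OS to flush to stable storage
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    # Pass a directory on the dataset you want to test as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(f"{fsyncs_per_second(target):.0f} fsyncs/second in {target}")
```

Run it against a directory on the dataset in question. On a dataset with sync=disabled the number should look much better, because ZFS then acknowledges the fsync without waiting for stable storage, which is exactly the trade-off being discussed here.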
Yes, you are. You cannot just look at the disk state. The issue is this sequence:
- application writes to disk with sync, hands out a reply corresponding to the persisted state (or does something else that has side effects)
- crash
With sync, the on-disk state and the replied-with state are in agreement.
With sync=disabled, the on-disk state and the replied-with state are in disagreement.
A basic example: the reply contains some kind of (auto-incremented) ID. The system starts up again and assigns that ID again. The client that got it before the crash will be rather confused, since its local state and the server state are not referring to the same thing under that ID.
When you disable sync semantics, you basically break an invariant of the application logic, since you take away one mechanism that is there to ensure consistency. This might not matter, or it might matter a lot. It all depends on what is using those semantics.
Thanks for your answer, Fabian. Sorry, I didn't make it clear. I understand that when we are talking about a distributed system, it totally depends on the application. If the app can't handle that situation, then of course one node may be confused about the state of the other node. Totally agree.
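The auto-incremented ID example from Fabian's reply can be sketched as a toy simulation. Everything here is hypothetical (the `Server` class, its fields, and `id_reused_after_crash` are invented for illustration and do not model ZFS internals); it only shows how the replied-with state and the on-disk state drift apart once a "sync" write is acknowledged before it is persisted.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Toy server that hands out auto-incremented IDs (illustrative only)."""
    sync_enabled: bool
    on_disk_last_id: int = 0    # state that survives a crash
    in_memory_last_id: int = 0  # state that is lost on a crash

    def allocate_id(self) -> int:
        self.in_memory_last_id += 1
        if self.sync_enabled:
            # A sync write only returns once the data is persisted.
            self.on_disk_last_id = self.in_memory_last_id
        # With sync disabled, the write is acknowledged while still in RAM.
        return self.in_memory_last_id

    def crash_and_restart(self) -> None:
        # Everything that never reached the disk is gone.
        self.in_memory_last_id = self.on_disk_last_id


def id_reused_after_crash(sync_enabled: bool) -> bool:
    server = Server(sync_enabled=sync_enabled)
    client_id = server.allocate_id()  # the client receives this reply
    server.crash_and_restart()
    next_id = server.allocate_id()    # first ID issued after the restart
    return next_id == client_id


if __name__ == "__main__":
    print("sync=standard, ID reused:", id_reused_after_crash(True))
    print("sync=disabled, ID reused:", id_reused_after_crash(False))
```

With sync honored, the ID handed out after the restart differs from the one the client already holds. With sync disabled, the same ID is issued twice, which is the confusion described above. The disk itself is perfectly consistent in both cases; it is the application-level promise that got broken.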
Thanks for your info! Unfortunately, it's unclear what caused the corruption in this situation.
I burned my fingers disabling sync in a workstation setup where the pool only had a single disk. At some point, the data was corrupted, and I created the pool from scratch again, leaving sync enabled.
Thank you for the info. Did the disk survive, with no bad blocks? Or was it entirely a software fault?
Trimming is another thing that can go wrong if the drives have no power loss protection (which enterprise SSDs have, and which is why they can also cache sync writes). I managed for years without PLP, but the speed and peace of mind are amazing.
Switching off power at the wrong time was the main reason, AFAIR. The disk is still working, just with sync not disabled since the pool was recreated.
That shouldn't have happened, according to the documentation, but probably something went wrong. Thanks!