"Isn't RAID1 sort of a backup itself?"
No, a very common saying is "RAID is not a backup!" because so many people misinterpret a RAID1 as a form of backup. It is not. RAID is just for increasing performance and/or decreasing downtime. It's great because in case a single disk fails your server will continue operating normally, nothing will crash and you don't lose the data you are working on at the moment or since the last backup. A RAID won't help at all in case of:
- human errors like running a wrong command or just a typo that deletes data, wipes the disk, edits data that shouldn't be changed and so on. Because anything you do on the filesystem is instantaneously written to both disks, so you get the same deleted/corrupted file/filesystem on both disks.
- ransomware that encrypts everything it gets access to, so also the data on both disks. Here you want some backup disks that are rotated, so one is never powered on and ransomware can't destroy it.
- hardware failures like a dying PSU/motherboard which might fry both disks at the same time, as both are connected to the same PSU/mainboard. Or your CPU/RAM is slowly failing, silently corrupting all data on both disks because of bit flips or wrong calculation results (I lost hundreds of GBs 2 years ago because of this, and those disks were part of a raid5 array...luckily I had real backups to restore the data from).
- fire, lightning or water damage destroying the entire server, or just a thief breaking in and stealing it.
- software errors like a bad kernel/driver causing all disks to fail, or just a buggy service that deletes data without asking.
- a power outage, failing PSU or kernel crash, so you lose all data cached in RAM, which might corrupt the entire filesystem, so again data is lost on both disks.
- your server gets hacked
- ...
All of that can be prevented by making real backups, but not by RAID.
A good rule of thumb is the 3-2-1 rule. You want 3 copies of everything (RAID doesn't count as two copies!) on 2 different media, with 1 copy offsite.
Two examples of what a good backup strategy might look like:
1.) Cheap and manual option
Get an SSD for your PVE. This is your fast production storage. Optional but highly recommended:
- use a raid1 (so you lose no data in case a single SSD dies, and it will sooner or later...just lost one last week...again)
- use a UPS with HID/SNMP (so you don't lose data cached in RAM, corrupting your filesystems on a power outage)
- use enterprise SSDs (powerloss protection, way more durable, faster)
- use ECC RAM (so RAM errors won't silently corrupt data)
- use a filesystem with bit rot protection like ZFS (so data on the disks won't silently corrupt over time; see the sketch right after this list)
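To make the ZFS point a bit more concrete, here is a minimal sketch of a mirror plus a regular scrub. The disk names and the pool name "tank" are placeholders, so adapt them to your own hardware before running anything:
```
# Create a ZFS mirror (the ZFS equivalent of a RAID1) out of two disks,
# so a single dying SSD doesn't take the pool with it:
zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc

# A scrub reads every block and checks it against its checksum, which is how
# ZFS catches (and on a mirror usually repairs) silent bit rot:
zpool scrub tank

# CKSUM errors in the status output mean corrupted data was found:
zpool status tank
```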
Then get two USB HDDs that are a multiple of the size of your SSD so you can store multiple backups (at least 2) of each guest. Optionally, it would be good to encrypt them. One of these HDDs you keep plugged into the PVE server and store your daily/weekly vzdump backups on. The other one you store offsite at your work/family/friends. Every several weeks/months, when going to work or visiting family/friends, you unplug the HDD from the PVE server, take it with you, swap the disks and take the one home that you left at work/family/friends for the last weeks/months. Now you write your backups to that HDD. Rotate the HDDs every several weeks/months. That way you always have a recent backup at home for a fast restore, and in case your home burns down, your server gets stolen, ransomware destroys your data or something similar, you always have a not-so-recent backup somewhere else.
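Just as a rough idea of what that looks like on the CLI, here is a sketch with placeholder names (the device node, the storage name and the VMIDs are all made up; in practice you would rather schedule the backup job via Datacenter -> Backup in the GUI than run vzdump by hand):
```
# Mount the USB HDD (assuming it shows up as /dev/sdd1 - adjust!) and
# register it in PVE as a directory storage that may hold backups:
mkdir -p /mnt/usb-backup
mount /dev/sdd1 /mnt/usb-backup
pvesm add dir usb-backup --path /mnt/usb-backup --content backup

# Write a vzdump backup of the guests with VMID 100 and 101 to that storage:
vzdump 100 101 --storage usb-backup --mode snapshot --compress zstd
```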
2.) Expensive and automated
Get a PVE server as described in option 1.
Get another server (SSD-only storage recommended) and install Proxmox Backup Server (PBS) on it. A VM on your PVE server might work too, as long as you use dedicated backup SSDs for it and additionally back up the PBS VM itself as a vzdump backup to a NAS or external HDD.
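Connecting the PVE server to that PBS is basically a single storage definition. Here is a sketch with placeholder values (server, datastore, user, password and fingerprint all have to be replaced with your own; the same can be done in the GUI under Datacenter -> Storage -> Add -> Proxmox Backup Server):
```
# Add the PBS datastore as a backup storage on the PVE side.
# All values below are placeholders for your own PBS instance.
pvesm add pbs pbs-local \
    --server pbs.example.lan \
    --datastore backups \
    --username backup@pbs \
    --password 'secret' \
    --fingerprint 'AA:BB:CC:...:FF'
```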
Then your PVE will back up the guests and PVE configs to the PBS server over the LAN. Then get another PBS server somewhere offsite (it doesn't have to be that fast; HDDs with SSDs as a special device might be fine in case you don't need to verify your backups) that might be rented at a hoster or might be an old PC you can run at your family's/friends' home. Then you tell that offsite PBS server to sync with the local PBS server by pulling the latest backup snapshots over the internet (see the sketch below). PBS also supports client-side encryption if you can't trust the hoster.
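The pull sync on the offsite PBS looks roughly like this. Run it on the offsite PBS; all names and credentials are placeholders, the exact option names may differ a bit between PBS versions, and the same jobs can also be created in the PBS GUI:
```
# Register the local/home PBS as a remote on the offsite PBS:
proxmox-backup-manager remote create home-pbs \
    --host pbs.home.example \
    --auth-id 'sync@pbs' \
    --password 'secret' \
    --fingerprint 'AA:BB:CC:...:FF'

# Create a sync job that regularly PULLS new backup snapshots from that
# remote into the local datastore of the offsite PBS:
proxmox-backup-manager sync-job create pull-from-home \
    --store offsite-backups \
    --remote home-pbs \
    --remote-store backups \
    --schedule daily
```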
That way you also have a fast local backup, and in case of an accident you have the slow offsite backup you can restore from over the internet.
Ransomware also can't delete data on that offsite PBS, as it only pulls backups and you can set the permissions so that neither PVE nor the main PBS server can delete backups from the offsite PBS.
Big benefits would be that the offsite backups are more up to date, that you can access the offsite backups without needing physical access to the offsite location, that it saves a lot of space because of deduplication, and that everything runs automated, so you can't forget to do your backups.
Downsides would be the initial price, higher electricity bills, more setup, maybe a faster internet connection needed, and annoyed friends because the server heats the room in summer and is loud at night.
And for both options you should have a backup recovery plan. You should write down what you will do in case of a data loss, what you will do to fix it and how you will get your data back. And you should test everything to get practice, so you won't lose even more data while trying to restore it because of human errors. And backups should be restored from time to time, even if you haven't lost any data, to verify that the backups are still working and all backed-up data is valid (see the test restore sketched below).
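A test restore doesn't have to be complicated. Something like this is enough to check that a backup actually comes back up; the archive path, the VMID and the target storage are placeholders, and restoring to a new, unused VMID means the original guest is never touched:
```
# Restore a vzdump backup as a NEW guest with a free VMID (999 here),
# then boot it and check that the data inside is actually intact:
qmrestore /mnt/usb-backup/dump/vzdump-qemu-100-2024_01_01-00_00_00.vma.zst 999 \
    --storage local-lvm --unique
```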
"It's rare condition to have 2 SSDs failing at the same time."
It's not rare. Just search this forum for people who start a new thread titled something like "Power outage shut down server...now PVE won't boot anymore". I think I read such a thread every month, and often they used a ZFS mirror with consumer SSDs thinking this is enough, and not having proper backups. Then they panic and try to rescue some bits of data off the corrupted SSDs.
It's a common error to use consumer SSDs with RAID and without a UPS. People buy cheap consumer SSDs because the datacenter grade SSDs are so expensive, and then wonder why data is corrupted on both SSDs of the mirror at the same time when a power outage occurs. When you write something it is usually first written to the volatile RAM and later written to the disks. In case of a power outage, all data in the system's RAM and all data in the RAM write cache of the SSDs/HDDs is instantly lost. As data is written to both disks simultaneously and both disks lose power at the same time, the data on both disks will corrupt and the whole RAID array might be damaged. Enterprise SSDs got a powerloss protection, consumer SSDs don't, so an enterprise SSD can run for some seconds without power from the PSU to panic and quickly write the RAM cache down to the persistent NAND. That's also why enterprise SSDs are so much faster doing sync writes, with less write amplification doing it, because in contrast to consumer SSDs they can cache sync writes in the SSD's RAM, as that RAM effectively isn't volatile anymore because of the powerloss protection. And a UPS is also recommended so a power outage won't erase your system's RAM and you don't lose data because of that, as your server can then run for some minutes on battery for a correct shutdown, which the UPS should trigger before the UPS's battery runs out.
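If you want to see that difference yourself, a small sync-write benchmark with fio makes it very visible: consumer SSDs slow down dramatically on sync writes while enterprise SSDs with power-loss protection stay fast. A sketch (path, size and runtime are placeholders; only run it against a scratch file on the storage you want to test, never against a disk that holds data you care about):
```
# 4K sync writes: every single write has to be confirmed as safely on disk
# before the next one is issued, which is exactly what a consumer SSD
# without power-loss protection is bad at.
fio --name=syncwrite --filename=/mnt/test/fio-testfile --size=1G \
    --bs=4k --rw=write --sync=1 --iodepth=1 --numjobs=1 \
    --runtime=60 --time_based
```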
When I was young I also lost all my data that was on a HW RAID1. A disk failed, I tried to rebuild the array with a new disk, that rebuild failed and the whole array was lost.
"Could you also recommend enterprise level nvme m.2 SSD?"
There aren't many options, as M.2 is barely used in professional environments. There you get PCIe cards or U.2/U.3 NVMe SSDs.
I think the MTFDHBA480TDF-1AW1ZABYY was the only 500GB M.2 SSD with 2280 length and a DWPD of at least 1.