Proxmox crashing and killing NVMe SSDs

binbin2000 · Nov 25, 2020

I am using Proxmox on an Intel NUC8i5BEH as a home server running a couple of VMs. I'm new to Proxmox but decided to give it a try when I moved my main Docker host from a Synology DS916+ to the NUC. Please ask if you need additional information. This is my first forum post and I'm not sure what information is needed to help.

Proxmox is installed on an EXT4 volume running on a Kingston A2000 250GB NVMe drive with all the default installation options. There are a total of 3 VMs and one CT running. The load on the NUC is generally very low.

Ever since the start I've been having issues with the NVMe disk suddenly not responding. Proxmox hangs, and neither the VMs nor the Proxmox interface is reachable. On the attached monitor, the console is filled with errors about not being able to read from or write to the disk (see attached pictures). The only way to recover is to cycle the power. Proxmox then boots normally and everything works again.

After a couple of days the same crash happened again, and then it started happening more frequently: first the next day, and then again within a couple of hours. The last time I cycled the power, the disk had died completely and a SMART error showed up when trying to boot.

I first suspected a bad NVMe disk, so I got a replacement disk under warranty and gave it another try. But the issues came back in a similar fashion, and last night my second NVMe disk died.

While the second disk was new, I spent some time trying to force a crash, and I found that heavy disk usage (for example, rsyncing files to a network share) could cause one. Switching from a Samba share to an NFS share seemed to work better, and the nightly backup ran fine for a couple of weeks, up until last night.

Any help solving this issue would be much appreciated, as I'm now going to buy a disk for the third time. My first question is whether this could be some compatibility issue between the Kingston NVMe and Proxmox, and whether I should try a different brand?

Thanks!

bobmc · Nov 25, 2020

I'm not totally surprised you've been having issues, only that it appears to have happened very quickly - perhaps it's been overheating?

That Kingston drive is only intended for consumer use, not for the sort of intensive activity it would be subjected to in a Proxmox server.

Your best bet is to install Proxmox on a conventional hard drive and use the flash storage for hosting your containers or VMs. However, I would still be looking to buy something a little more 'professional', like a Samsung 970 EVO Plus.

binbin2000 · Nov 25, 2020

bobmc said:
I'm not totally surprised you've been having issues, only that it appears to have happened very quickly - perhaps it's been overheating?

That Kingston drive is only intended for consumer use, not for the sort of intensive activity it would be subjected to in a Proxmox server.

Your best bet is to install Proxmox on a conventional hard drive and use the flash storage for hosting your containers or VMs. However, I would still be looking to buy something a little more 'professional', like a Samsung 970 EVO Plus.

I can understand that it's not intended for server use, but then I would have expected a shortened lifetime rather than the drive dying completely after a couple of weeks (the first one even died within its first week of use). Overheating could be a problem; at least I can't say for sure that it hasn't been hot. The room is a couple of degrees warmer than regular room temperature, around 25 degrees Celsius I'd say.

Just so I can learn: why shouldn't Proxmox be installed on the flash storage?

But I'll definitely take your advice and buy more professional flash storage; the price isn't that much higher at such a small capacity.

bobmc · Nov 25, 2020

A couple of weeks is not what I'd expect, so that does imply something more is involved. A look at the SMART report would help in understanding what has gone on, but heat and airflow across the drive would be my next guess.

Unlike ESXi, which runs quite happily from an SD card, Proxmox writes to the host storage constantly for logs and journaling, so it is not suited to running from USB sticks or consumer SSDs.

Dunuin · Nov 26, 2020

A few weeks is too fast for a drive to die from wear. And ext4 shouldn't be as bad as a CoW file system like ZFS.
If it fails under heavy load, you should run smartctl and check the temperatures. My NVMe SSDs reached 85 °C, so I installed a 40 mm fan to actively cool them.

And make sure to run smartctl regularly to monitor the amount of data written to the NAND. It's not uncommon for an SSD to write 1 TB per day, so the SSD dies after some months or years if you don't buy enterprise-grade SSDs. Because of the several abstraction layers, the overhead, and the disabled caching (due to the missing power-loss protection), you get really high write amplification. For example, I got a write amplification factor of 20: my homelab server writes 30 GB of data inside the VMs per day, and the SSD writes 600 GB per day to store it on the physical drive.
You should always monitor that.
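To make that monitoring concrete, here is a minimal bash sketch that turns the NVMe "Data Units Written" counter from `smartctl -a` into gigabytes. It assumes the NVMe convention that one data unit is 1,000 × 512 bytes (512,000 bytes), and it adds a helper for the write amplification factor described above (GB written to NAND divided by GB written inside the VMs). The function names `nvme_gb_written` and `write_amplification` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they only parse text, so they are safe to source.

# Reads `smartctl -a /dev/nvmeX` output on stdin and prints decimal GB written.
# Assumes one NVMe data unit = 1000 * 512 bytes = 512,000 bytes.
nvme_gb_written() {
  awk -F: '/^Data Units Written/ {
    split($2, f, " ")      # f[1] is the raw counter, e.g. "435,589"
    gsub(/,/, "", f[1])    # drop the thousands separators
    printf "%.0f\n", f[1] * 512000 / 1e9
  }'
}

# Write amplification factor = GB written to NAND / GB written inside the VMs.
write_amplification() {
  awk -v nand="$1" -v guest="$2" 'BEGIN { printf "%.1f\n", nand / guest }'
}
```

For example, `smartctl -a /dev/nvme0 | nvme_gb_written` prints 223 for a counter of 435,589 units, and `write_amplification 600 30` prints 20.0 for the homelab numbers above.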

H4R0 · Nov 26, 2020

Is the disk connected directly? Or do you use an HBA / RAID controller / NVMe PCIe adapter?

I wouldn't really expect much lifetime from a $30 consumer NVMe either. Proxmox is write-heavy, and small writes come with high write amplification. If the drive is fairly full, it doesn't have enough free space for wear leveling and dies within months. In your case, though, it looks like swap could be the problem as well. To get more lifetime, use drives with more space, e.g. start at 1 TB.

Please post the output of "df -h", "sysctl vm.swappiness" and "smartctl -a /dev/nvme0" (adjust nvme0 if needed).

binbin2000 · Nov 26, 2020

I'm now trying again with Proxmox installed on a HDD and the VMs installed on the SSD (bought a Samsung 970 Evo Plus).

Is the disk connected directly? Or do you use an HBA / RAID controller / NVMe PCIe adapter?

It is connected directly to the motherboard as far as I know.

To get more lifetime, use drives with more space, e.g. start at 1 TB.

Is the actual free space in GB/TB important, or should a certain percentage be left unused? I don't need much storage capacity since most data is stored on my NAS; I only need a couple of GB for each application, so with a 1 TB disk 90% or so would be unused. Or does Proxmox need the space to be able to have really large swap files?
I have now allocated 68% of the disk space to the VMs; the rest is unallocated. Each VM also has plenty of unused space. Do you think that would be enough? The old SSD had somewhat higher disk utilization since Proxmox was installed on the same volume.

Code:

# df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                          7.8G     0  7.8G   0% /dev
tmpfs                         1.6G  9.2M  1.6G   1% /run
/dev/mapper/pve-root           57G  2.8G   52G   6% /
tmpfs                         7.8G   37M  7.8G   1% /dev/shm
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                         7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/sda2                     511M  312K  511M   1% /boot/efi
192.168.1.2:/volume1/proxmox   13T   11T  2.0T  85% /mnt/pve/NAS
/dev/fuse                      30M   16K   30M   1% /etc/pve
tmpfs                         1.6G     0  1.6G   0% /run/user/0

Code:

# sysctl vm.swappiness
vm.swappiness = 60

I'm not sure if smartctl is relevant now with the new SSD, but at least you can see that it's definitely not running hot at the moment.

Code:

# smartctl -a /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 250GB
Serial Number:                      S4EUNX0NA24155F
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 250,059,350,016 [250 GB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          250,059,350,016 [250 GB]
Namespace 1 Utilization:            171,798,843,392 [171 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5a01b0e0dd
Local Time is:                      Thu Nov 26 15:14:11 2020 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    352,886 [180 GB]
Data Units Written:                 435,589 [223 GB]
Host Read Commands:                 2,967,287
Host Write Commands:                605,171
Controller Busy Time:               3
Power Cycles:                       4
Power On Hours:                     6
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               30 Celsius
Temperature Sensor 2:               26 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

binbin2000 · Nov 26, 2020

I read a post about how high swap usage can wear out an SSD, so I disabled swap completely (with swapoff -a). It didn't turn out as expected, but I think I learned something along the way.

With swap turned off, I copied a folder inside a VM running on the SSD, and after a couple of seconds the VM froze and Proxmox stopped responding. After about 60 seconds I could access Proxmox again. From the syslog, it seems the host ran out of available memory and shut down the VM after a while, so everything worked as expected (except for the person behind the keyboard who made the crash happen).

Then it occurred to me: how much RAM is actually needed to feed an NVMe disk at full speed? Could the combination of too little RAM and a high swappiness value, leading to heavy swap usage, be what killed my consumer SSDs? It seems very plausible at this point!

I currently have 16GB of RAM installed. Is there a way to calculate (or any guidelines) how much RAM would be needed to reduce swap file usage to a minimum?

H4R0 · Nov 26, 2020

binbin2000 said:
I have now 68% of the disk space allocated to the VMs, the rest is unallocated. In each VM there is also plenty of unused space. Do you think that would be enough?

68% should be fine. I would upgrade once you hit 75-80% though.

binbin2000 said:
Code:

# sysctl vm.swappiness
vm.swappiness = 60

Your swappiness value is way too high for a server. You don't need that on the hypervisor, and it will reduce your drive's lifespan by a lot.

binbin2000 said:

I'm not sure if smartctl is relevant now with the new SSD, but at least you can see that it's definitely not running hot at the moment.

Code:

# smartctl -a /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 250GB
Total NVM Capacity:                 250,059,350,016 [250 GB]

Data Units Read:                    352,886 [180 GB]
Data Units Written:                 435,589 [223 GB]
Host Read Commands:                 2,967,287
Host Write Commands:                605,171
Power On Hours:                     6
Temperature Sensor 1:               30 Celsius
Temperature Sensor 2:               26 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

Your drive has written 223 GB in 6 hours. If it continues at that rate, it will reach its 150 TBW rating (its rated lifespan) after about 168 days.
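The 168-day figure is simple arithmetic. As a minimal sketch, assuming the average write rate seen so far stays constant (which, as the later posts show, it may not) and using decimal units (1 TB = 1,000 GB), a hypothetical `days_until_tbw` helper could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate days until a drive reaches its TBW rating,
# assuming the average write rate so far stays constant.
#   $1 = TBW rating in TB, $2 = GB written so far, $3 = power-on hours
days_until_tbw() {
  awk -v tbw="$1" -v gb="$2" -v hours="$3" 'BEGIN {
    rate = gb / hours            # average GB written per hour so far
    left = tbw * 1000 - gb       # GB remaining before the rating (1 TB = 1000 GB)
    printf "%.0f\n", left / rate / 24
  }'
}
```

`days_until_tbw 150 223 6` prints 168, in line with the estimate above.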

First reduce swap usage:

Code:

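# reduce swappiness (persisted in /etc/sysctl.conf)
echo "vm.swappiness = 5" >> /etc/sysctl.conf
# load the new value right away instead of waiting for a reboot
sysctl -p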

Then it's important to enable TRIM so the SSD can do garbage collection.

But it seems you use LUKS encryption? In that case you have to add discard as an option in your /etc/crypttab:

Code:

nano /etc/crypttab
...
# <name>   <device>                 <key file>  <options>
cryptroot  UUID=xxxxxxxxxxxxxxxxxx   none        luks,initramfs,discard

Then update the initramfs:

Code:

update-initramfs -u

And lastly enable the fstrim timer.

Code:

systemctl enable fstrim.timer

Reboot to apply the changes:

Code:

reboot
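After the reboot, one way to confirm that discard (TRIM) actually reaches each layer is to look at the DISC-MAX column of `lsblk`, where a value of 0B typically means that device does not pass discards down. A minimal sketch, assuming the output of `lsblk --list --noheadings --output NAME,DISC-MAX`; the `no_discard_devices` name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print devices whose DISC-MAX is 0B (no discard support).
# Expects "NAME DISC-MAX" lines on stdin, e.g. from:
#   lsblk --list --noheadings --output NAME,DISC-MAX
no_discard_devices() {
  awk '$2 == "0B" { print $1 }'
}
```

Run it as `lsblk --list --noheadings --output NAME,DISC-MAX | no_discard_devices`; an empty result suggests every device accepts discards. `systemctl list-timers fstrim.timer` shows when the trim timer will next run.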

Also make sure your VMs are not doing heavy writes; check the drive summary in Proxmox.

Post the smartctl output again in a week to verify the usage went down.

The value to watch is "Percentage Used: 0%"; it climbs toward 100% as the drive's rated endurance is used up.

You should monitor it with Nagios, Icinga, Munin, etc.
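As a starting point for such monitoring, here is a minimal sketch of a Nagios/Icinga-style check on "Percentage Used". It assumes the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the 80/90 thresholds are hypothetical choices, not values from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical Nagios/Icinga-style check on NVMe "Percentage Used".
# Reads `smartctl -a /dev/nvmeX` output on stdin.
#   $1 = warning threshold in % (default 80), $2 = critical threshold (default 90)
check_nvme_wear() {
  local warn="${1:-80}" crit="${2:-90}" used
  # Strip the spaces and the % sign, leaving just the number.
  used=$(awk -F: '/^Percentage Used/ { gsub(/[ %]/, "", $2); print $2 }')
  if [ -z "$used" ]; then
    echo "UNKNOWN - no 'Percentage Used' line found"; return 3
  elif [ "$used" -ge "$crit" ]; then
    echo "CRITICAL - ${used}% of rated endurance used"; return 2
  elif [ "$used" -ge "$warn" ]; then
    echo "WARNING - ${used}% of rated endurance used"; return 1
  fi
  echo "OK - ${used}% of rated endurance used"; return 0
}
```

With the 1 TB drive below, `smartctl -a /dev/nvme0 | check_nvme_wear` would print "OK - 17% of rated endurance used" and return 0.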

For example, this is the output of one of my 970 EVO Plus drives:

Code:

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    17%
Data Units Read:                    39,989,326 [20.4 TB]
Data Units Written:                 126,951,731 [64.9 TB]
Host Read Commands:                 3,032,987,375
Host Write Commands:                6,490,446,119
Controller Busy Time:               5,887
Temperature Sensor 1:               36 Celsius
Temperature Sensor 2:               38 Celsius

bobmc · Nov 26, 2020

Wow, that's a lot of data writes, no wonder the Kingston drives failed so quickly.

Swappiness = 60 is the default for Proxmox, and I have a couple of systems running with SSDs that haven't needed any real tuning to resolve wear issues. One system has been in production use for nearly 20,000 hours and I'm only seeing about 5% wearout.

I'm curious to know what's causing such high write activity.

H4R0 · Nov 26, 2020

bobmc said:
Wow, that's a lot of data writes, no wonder the Kingston drives failed so quickly.

Swappiness = 60 is the default for Proxmox, and I have a couple of systems running with SSDs that haven't needed any real tuning to resolve wear issues. One system has been in production use for nearly 20,000 hours and I'm only seeing about 5% wearout.

I'm curious to know what's causing such high write activity.

In fact, swap is disabled by default and has to be enabled manually, in which case 60 is the default, that's right. But that's Debian's default for desktop systems and not suitable for servers.

Most of Proxmox's writes are caused by the RRD graphs and the cluster FUSE database (/etc/pve), due to write amplification.

I did a lot of testing and reduced write usage by 80% with various tweaks.

binbin2000 · Nov 26, 2020

Wow! That's some really good input. Thanks a lot.
I will keep an eye on the number of writes during the next couple of days.

However, it might have been a combination of restoring from backup and high swap usage that caused the high initial number of writes. The number isn't increasing much now. But perhaps the Kingston still got killed by a large number of writes in a short period.
How come this isn't a problem on a desktop PC, though? Doesn't Windows produce as many writes as Proxmox does, or what's the main difference?

I haven't changed any of the default settings other than restoring my backed-up VMs. Swap was turned on by default as far as I know.

But it seems you use LUKS encryption? In that case you have to add discard as an option in your /etc/crypttab

I had never heard of LUKS encryption before, and I have never used it unless it is somehow enabled by default.

I started the scheduled daily backup job manually and the temperature went up to 55 degrees during the backup. Is that something I should be worried about?
Obviously the read data counter increased by the same amount as the size of the backup. Is reading data anything to worry about regarding wearout or is it only writing that is limiting the lifetime?

During backup:

Code:

# smartctl -a /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 250GB
Serial Number:                      S4EUNX0NA24155F
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 250,059,350,016 [250 GB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          250,059,350,016 [250 GB]
Namespace 1 Utilization:            171,798,843,392 [171 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5a01b0e0dd
Local Time is:                      Thu Nov 26 19:14:20 2020 CET
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    694,543 [355 GB]
Data Units Written:                 453,385 [232 GB]
Host Read Commands:                 5,658,820
Host Write Commands:                708,203
Controller Busy Time:               5
Power Cycles:                       4
Power On Hours:                     7
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               44 Celsius
Temperature Sensor 2:               55 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

H4R0 · Nov 26, 2020

binbin2000 said:
How come this isn't a problem on a desktop PC though? Doesn't Windows allow such a high number of writes as Proxmox does or what's the main difference?

Well, there is a big difference in memory usage between desktops and servers. Overall, swap is a good thing, but there is a break-even point where too many swap writes can slow down the system. You should leave swap enabled but reduce the swappiness value for better performance and less disk wear. It's always better to increase the available RAM or use KSM instead of abusing swap.
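To see how much swap the host is actually using before and after lowering swappiness, a minimal sketch (the function name is hypothetical) can read the SwapTotal and SwapFree fields of /proc/meminfo, which the kernel reports in kB:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print swap in use (MiB) from /proc/meminfo-style input.
# SwapTotal and SwapFree are reported by the kernel in kB.
swap_used_mib() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END { printf "%d\n", (total - free) / 1024 }'
}
```

Run it as `swap_used_mib < /proc/meminfo`; `sysctl vm.swappiness` shows the value currently in effect.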

binbin2000 said:
I haven't changed any of the default settings other than restoring my backed-up VMs. Swap was turned on by default as far as I know.

It seems Proxmox only offers swap for LVM installs. I'm a ZFS-only guy, so I never had the option in the installer.

binbin2000 said:
I have never heard of LUKS encryption before and have never used it if it isn't enabled by default somehow.

In your screenshots there were mentions of the block device /dev/dm1, which, if I remember correctly, is LUKS.

Can you please post the output of "lsblk".

binbin2000 said:
I started the scheduled daily backup job manually and the temperature went up to 55 degrees during the backup. Is that something I should be worried about?

55°C is fine. Above 60°C the lifespan will be shortened; stay below 70°C by all means. In a controlled environment, aim for 35-40°C.
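To apply these bands to real readings, here is a minimal sketch with hypothetical function names. It takes the highest "Temperature" or "Temperature Sensor N" value from `smartctl -a` output and maps it onto the rough bands above. The band edges come from this post, not from a vendor datasheet.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the temperature bands in the post above.

# Prints the highest "Temperature" / "Temperature Sensor N" value (Celsius)
# found in `smartctl -a /dev/nvmeX` output on stdin.
max_nvme_temp() {
  awk 'BEGIN { max = 0 }
       /^Temperature( Sensor [0-9]+)?:/ && $NF == "Celsius" {
         if ($(NF - 1) + 0 > max) max = $(NF - 1) + 0
       }
       END { print max }'
}

# Maps a temperature onto the bands: above 60 shortens lifespan, 70 or more
# should be avoided, anything else is fine.
temp_band() {
  if   [ "$1" -ge 70 ]; then echo "avoid"
  elif [ "$1" -gt 60 ]; then echo "hot"
  else                       echo "ok"
  fi
}
```

For the backup reading above (sensor 2 at 55 °C), `temp_band "$(smartctl -a /dev/nvme0 | max_nvme_temp)"` would print "ok".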

binbin2000 said:
Obviously the read data counter increased by the same amount as the size of the backup. Is reading data anything to worry about regarding wearout or is it only writing that is limiting the lifetime?

You can read as much as you want; it won't affect SSDs at all. Only writes affect the cell lifespan, because every cell has a limited number of write cycles.

binbin2000 · Nov 26, 2020

Great! Thanks a lot for your help everyone, I have learned a lot.

Crossing my fingers that I won't have another dead SSD within the next couple of weeks.

Dunuin · Nov 26, 2020

I started the scheduled daily backup job manually and the temperature went up to 55 degrees during the backup. Is that something I should be worried about?

A little bit of heat is fine; 55 degrees shouldn't be a problem. There was also a study suggesting that NAND flash is more durable if kept warm rather than at room temperature, so that shouldn't be a problem. What you could do is write a lot of data at once, run smartctl at the same time, and look at the temperature.
Looking at "Critical Comp. Temp. Threshold: 85 Celsius", it's possible that your SSD will throttle at 85 degrees. You should check that your SSD doesn't exceed 70 degrees or something in that range.

But I think the value "Warning Comp. Temperature Time" wouldn't be zero if your SSD had already reached 85 degrees in the past.

Obviously the read data counter increased by the same amount as the size of the backup. Is reading data anything to worry about regarding wearout or is it only writing that is limiting the lifetime?

Only writes. Look at the TBW rating of your SSD; that's the number of terabytes written that your SSD's warranty covers. But keep write amplification in mind. SSDs are bad at writing small blocks of data, especially synchronous writes, because they can only write data in big chunks. For example, if you want to write 1 MB of data as 100x 10 kB sync writes and the internal block size is 1 MB, your SSD may need to write 100x 1 MB, so 100 MB is written instead of 1 MB. That is write amplification. Scale that up to real workloads and the SSD may write hundreds of gigabytes per day. My SSDs are writing 600 GB per day at the moment, and 90% of the writes are just to store logs/metrics.
And the TBW covers the total amplified writes your SSD is doing, not the amount of real data you try to write to it.
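The 100x example can be reproduced with a small model. This is a minimal sketch under the simplifying assumption stated above, that each small synchronous write makes the SSD program whole internal blocks. Real controllers buffer and merge writes, and garbage collection adds its own overhead, so treat it as an illustration rather than a measurement. The function name is hypothetical, and 1 MB is taken as 1,000 kB.

```shell
#!/usr/bin/env bash
# Hypothetical helper modelling the scenario above: each small synchronous
# write makes the SSD program whole internal blocks. Illustration only.
#   $1 = number of writes, $2 = size of each write (kB), $3 = internal block size (kB)
worst_case_waf() {
  awk -v n="$1" -v io="$2" -v blk="$3" 'BEGIN {
    per  = int((io + blk - 1) / blk) * blk   # round each write up to whole blocks
    host = n * io                            # kB the host asked to write
    nand = n * per                           # kB actually programmed to NAND
    printf "host=%dkB nand=%dkB waf=%.0f\n", host, nand, nand / host
  }'
}
```

`worst_case_waf 100 10 1000` prints `host=1000kB nand=100000kB waf=100`, i.e. 1 MB requested and 100 MB written.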

binbin2000 · Nov 26, 2020

Can you please post the output of "lsblk".

It's a different installation running now compared to the one in the screenshots, but I can't remember activating anything called LUKS the last time either.
I can't see any mention of LUKS in the current installation below, can you?

Code:

NAME                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                     8:0    0 232.9G  0 disk
├─sda1                  8:1    0  1007K  0 part
├─sda2                  8:2    0   512M  0 part /boot/efi
└─sda3                  8:3    0 232.4G  0 part
  ├─pve-swap          253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root          253:1    0    58G  0 lvm  /
  ├─pve-data_tmeta    253:2    0   1.5G  0 lvm
  │ └─pve-data-tpool  253:4    0 147.4G  0 lvm
  │   └─pve-data      253:5    0 147.4G  0 lvm
  └─pve-data_tdata    253:3    0 147.4G  0 lvm
    └─pve-data-tpool  253:4    0 147.4G  0 lvm
      └─pve-data      253:5    0 147.4G  0 lvm
nvme0n1               259:0    0 232.9G  0 disk
├─vm-vm--101--disk--0 253:6    0    64G  0 lvm
├─vm-vm--100--disk--0 253:7    0    64G  0 lvm
├─vm-vm--103--disk--0 253:8    0    32G  0 lvm
└─vm-vm--102--disk--0 253:9    0    32G  0 lvm
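To check output like this for encryption, note that lsblk reports TYPE "crypt" for dm-crypt (LUKS) mappings. LVM logical volumes are also device-mapper (dm-N) devices but show TYPE "lvm". Here is a minimal sketch with a hypothetical function name, assuming the output of `lsblk --list --noheadings --output NAME,TYPE`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list dm-crypt mappings and exit non-zero if there are none.
# Expects "NAME TYPE" lines on stdin, e.g. from:
#   lsblk --list --noheadings --output NAME,TYPE
has_crypt() {
  awk '$2 == "crypt" { found = 1; print $1 } END { exit !found }'
}
```

Run it as `lsblk --list --noheadings --output NAME,TYPE | has_crypt || echo "no LUKS devices"`. In the listing above, every device-mapper entry is `lvm`, so it would report none.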
