Proxmox reboots when guest performs disk-intensive task

giovvv

In a system with a single running guest (Ubuntu 16), the whole host system (Proxmox) suddenly reboots when the guest performs a local backup with rsync.

The host is Proxmox with kernel 4.15.18-14-pve; it has two disks with ZFS RAID 1.

There is apparently nothing unusual in the host logs, so I set up remote kernel crash/trace logging; here are the results (omitting the initial boot messages):

Code:
[ 438.833555] perf: interrupt took too long (2516 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[ 596.055863] perf: interrupt took too long (3149 > 3145), lowering kernel.perf_event_max_sample_rate to 63500
[ 812.710491] softdog: Initiating system reboot

I am pretty confident that this problem is not related to the hardware of this machine (e.g. the CPU overheating), because I have tested this on three different systems, with different specs, different hardware and in different locations, and the result is basically the same. The reboot does not happen every single time, but almost always.

Suggestions?

Thanks,
Giovanni
 
hi!

do you have an HA cluster or HA enabled on a standalone node? you have `softdog` in your trace log, so i'm pretty sure you have some sort of HA enabled.

i will ask for the following logs/outputs:
* systemctl status watchdog-mux
* systemctl status pve-ha-crm
* systemctl status pve-ha-lrm
* dmesg | grep soft
* and maybe also relevant parts of the syslog.
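
a quick way to check whether any HA resources are configured at all would be something like this (just a sketch, `ha-manager` is the PVE command-line tool for HA):

Code:
# shows quorum state, LRM/CRM status and any configured HA resources
ha-manager status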
 
This is a single node, no cluster. It was originally installed from the 5.2-1 ISO image and gradually upgraded to the current version 5.4-5. It has basically no customizations.

Since it has swap on ZFS (the installer created it), I tried applying the tweaks described here, but the problem did not disappear; the softdog still reboots the system.

Setting vm.swappiness = 1 does not help either. However, disabling swap completely (swapoff /dev/zd0) seems to prevent the problem. Not sure whether that is a good idea, though.
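
For reference, the tweaks were roughly the usual swap-on-zvol settings, something along these lines (assuming the installer-created zvol is rpool/swap; adjust the name if yours differs):

Code:
# commonly suggested properties for a swap zvol (sketch)
zfs set primarycache=metadata rpool/swap
zfs set secondarycache=none rpool/swap
zfs set compression=zle rpool/swap
zfs set sync=always rpool/swap
zfs set logbias=throughput rpool/swap
zfs set com.sun:auto-snapshot=false rpool/swap
# plus the sysctl mentioned above
sysctl vm.swappiness=1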

Here are the outputs you requested, from one of the three (unrelated) machines I tested:

systemctl status watchdog-mux:

Code:
● watchdog-mux.service - Proxmox VE watchdog multiplexer
   Loaded: loaded (/lib/systemd/system/watchdog-mux.service; static; vendor preset: enabled)
   Active: active (running) since Thu 2019-05-23 16:46:32 CEST; 52min ago
 Main PID: 7046 (watchdog-mux)
    Tasks: 1 (limit: 4915)
   Memory: 436.0K
      CPU: 68ms
   CGroup: /system.slice/watchdog-mux.service
           └─7046 /usr/sbin/watchdog-mux

May 23 16:46:32 mox3 systemd[1]: Started Proxmox VE watchdog multiplexer.
May 23 16:46:33 mox3 watchdog-mux[7046]: Watchdog driver 'Software Watchdog', version 0

systemctl status pve-ha-crm:

Code:
● pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
   Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-05-23 16:46:42 CEST; 53min ago
  Process: 7919 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
 Main PID: 7942 (pve-ha-crm)
    Tasks: 1 (limit: 4915)
   Memory: 78.7M
      CPU: 1.278s
   CGroup: /system.slice/pve-ha-crm.service
           └─7942 pve-ha-crm

May 23 16:46:41 mox3 systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
May 23 16:46:42 mox3 pve-ha-crm[7942]: starting server
May 23 16:46:42 mox3 pve-ha-crm[7942]: status change startup => wait_for_quorum
May 23 16:46:42 mox3 systemd[1]: Started PVE Cluster Ressource Manager Daemon.

systemctl status pve-ha-lrm:

Code:
● pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
   Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-05-23 16:46:43 CEST; 54min ago
  Process: 7943 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
 Main PID: 7977 (pve-ha-lrm)
    Tasks: 1 (limit: 4915)
   Memory: 78.7M
      CPU: 1.469s
   CGroup: /system.slice/pve-ha-lrm.service
           └─7977 pve-ha-lrm

May 23 16:46:42 mox3 systemd[1]: Starting PVE Local HA Ressource Manager Daemon...
May 23 16:46:43 mox3 pve-ha-lrm[7977]: starting server
May 23 16:46:43 mox3 pve-ha-lrm[7977]: status change startup => wait_for_agent_lock
May 23 16:46:43 mox3 systemd[1]: Started PVE Local HA Ressource Manager Daemon.

dmesg | grep soft:

Code:
[    0.817216] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    0.817261] software IO TLB [mem 0xb7f90000-0xbbf90000] (64MB) mapped at [        (ptrval)-        (ptrval)]
[    5.352891] xor: measuring software checksum speed
[   45.429642] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
 
hi.

if you turn swap off, your ram will be used a lot more as a result.

maybe you can try:

Code:
sync; echo 3 > /proc/sys/vm/drop_caches

this should clear the pagecache, inodes and dentries. should clean up a sizeable chunk of ram.
maybe run this in a cronjob.

Here are the outputs you requested, from one of the three (unrelated) machines I tested:

not sure i understand this bit. are these outputs not from the affected machine?

EDIT:

how much ram do you have in total? zfs uses quite a bit of ram, and proxmox itself needs at least 2gb
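
to get an idea of how much ram the arc is holding right now, something like this should work (it just reads the kernel stats, no extra tools needed):

Code:
# current ZFS ARC size in MiB
awk '/^size / { printf "%.0f MiB\n", $3/1024/1024 }' /proc/spl/kstat/zfs/arcstats
# arc_summary gives a more detailed report, if it is installed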
 
hi.
maybe run this in a cronjob.

How often?

not sure i understand this bit. are these outputs not from the affected machine?

Yes, from one of the three affected machines (see below).

how much ram do you have in total? zfs uses quite a bit of ram, and proxmox itself needs at least 2gb

They have different specs:
machine1: 16 GB RAM, 8 GB swap, total pool size 4TB (mostly unused)
machine2: 32 GB RAM, 8 GB swap, total pool size 4TB (mostly unused)
machine3: 7 GB RAM, 6 GB swap, total pool size 250GB

The outputs above are all from machine3, but those from the other two are similar.

BTW, I also tried limiting zfs_arc_max, but it does not seem to change anything. Only swapoff prevents the machine from rebooting.
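
For completeness, this is roughly how I limited the ARC (the value here is just an example; as a module option it needs an initramfs update and a reboot, but it can also be changed at runtime):

Code:
# cap the ZFS ARC at e.g. 4 GiB (value in bytes)
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf
update-initramfs -u
# or apply it immediately, without rebooting:
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max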
 
How often?

maybe every hour. first run it normally to see if it helps in your case.
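
an hourly job could look something like this (sketch, as an /etc/cron.d entry):

Code:
# /etc/cron.d/drop-caches -- flush page cache, dentries and inodes every hour
0 * * * * root /bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches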

which machine is rebooted exactly? whole cluster or just one node?

Only swapoff prevents the machine from rebooting.

maybe you can pinpoint what uses so much memory / what causes the reboot if you check the syslog output
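
for example, something along these lines should show whether memory pressure or the oom killer showed up before the reset:

Code:
# kernel messages from the previous boot (needs a persistent journal)
journalctl -k -b -1 | tail -n 100
# any OOM-killer activity recorded by rsyslog
grep -iE 'oom|out of memory' /var/log/syslog /var/log/kern.log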
 
which machine is rebooted exactly? whole cluster or just one node?

This is not a cluster. They are three separate, almost identical machines carrying different VMs; their only relationship is that they use pve-zsync for backups (not related to this issue, the problem never manifests during that operation). I am showing you all three machines just to underline that they have different hardware, different amounts of RAM, different locations, etc., and the problem still manifests.

maybe you can pinpoint what uses so much memory / what causes the reboot if you check the syslog output

This is known: it is just an rsync process in the VM that copies a largish amount of data (12GB, 500k files) locally. It is in crontab, but launching it manually causes the same effect (boom!).

BTW, the guest filesystem is ext4 on LVM; ZFS is at the host level only.
 
try the command i sent you on the host and see if it helps
 
I've tried this, without disabling the swap:

1. rebooted the host
2. started the guest
3. "sync; echo 3 > /proc/sys/vm/drop_caches" on both the guest and the host
4. started the infamous rsync on the guest
5. after a while, the host rebooted:

Code:
[  823.461934] perf: interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[ 1258.014199] perf: interrupt took too long (3142 > 3130), lowering kernel.perf_event_max_sample_rate to 63500
[ 1382.335900] softdog: Initiating system reboot
 
Can you do this:
1] add dedicated HDDs/SSDs as data storage with ZFS -> test the job
2] add dedicated HDDs/SSDs as data storage without ZFS -> test the job
3] install PVE on ext4 -> test the job

You could even install netdata on the PVE host and in the VM and send the data to a third netdata instance... because you haven't presented any performance data.
 
You basically said that the system is unstable when swap is used.
Swap on ZFS is not stable and should be removed. The PVE installers released after the one you used do not set up swap on ZFS anymore.
Please browse this forum; this has all been discussed, along with possible alternatives like swap on MD RAID, zswap, etc. I personally use swap on MD RAID nowadays, if swap is really needed.
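
A rough sketch of the MD RAID variant, assuming you have a spare partition on each disk (/dev/sda4 and /dev/sdb4 here are just placeholders):

Code:
# mirror two spare partitions and use the result as swap
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
mkswap /dev/md0
swapon /dev/md0
echo '/dev/md0 none swap sw 0 0' >> /etc/fstab
# make the array known to the initramfs
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u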
 
@mailinglists: Okay, if swap on ZFS is known to be unstable, there is really no point in doing the in-depth tests @czechsys recommended. I'll get rid of that swap and consider an alternative. Thanks.
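
Getting rid of it would presumably be something like this (assuming the default rpool/swap zvol created by the installer):

Code:
swapoff -a
zfs destroy rpool/swap
# then remove or comment out the corresponding swap line in /etc/fstab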

btw:
1) If the current Proxmox installer no longer creates swap on ZFS, does it allow creating non-ZFS swap in a ZFS install?
2) At the moment, my servers (2 disks, ZFS RAID 1) have a full-disk ZFS installation created by the "legacy" Proxmox installer. Is it possible to carve out another partition on those disks for non-ZFS swap, without reinstalling everything?
 
1) No, but it lets you leave empty space for whatever. Creating SWAP is left up to you.
2) It is way easier to just do a reinstall, if you have to ask such a question. :) Even I would just back up and reinstall.
 
