Reboot after heavy I/O load

Tasslehoff

Hi, I have a lab environment with three PVE 8.1.4 hosts, each of which has an SSD mirror for PVE itself and a SATA ZFS mirror as the datastore for my VMs.
For backups I have another host running PBS 3.1-5, with an SSD mirror for the OS and a ZFS raidz datastore for backups.
Each of these hosts has a 1 Gbps NIC, and during backups or transfers between PVE hosts I usually saturate the 1 Gbps link between them.

So far so good, except when a host is under I/O-intensive load. Today, for example, I started a backup of a new VM (~500 GB) to PBS. The backup started with no problems, but after a few minutes some VMs became unresponsive and the PVE host running the new VM (the backup source) rebooted.

After checking my environment I started looking for the cause of this reboot/crash, and the only trace I found was the plain reboot entry in journald: no errors before it, no sign of instability, nothing.

Code:
Apr 23 10:17:01 drakaris02 CRON[198750]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 23 10:17:01 drakaris02 CRON[198751]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Apr 23 10:17:01 drakaris02 CRON[198750]: pam_unix(cron:session): session closed for user root
Apr 23 10:18:51 drakaris02 pmxcfs[1267]: [status] notice: received log
Apr 23 10:19:47 drakaris02 pvedaemon[3445903]: <root@pam> starting task UPID:drakaris02:00031189:204F0DBB:66276F23:vzdump:101:root@pam:
Apr 23 10:19:47 drakaris02 pvedaemon[201097]: INFO: starting new backup job: vzdump 101 --notification-mode auto --notes-template '{{guestname}}' --remove 0 --storage pbs-archive --mode snapshot --node drakaris02
Apr 23 10:19:47 drakaris02 pvedaemon[201097]: INFO: Starting Backup of VM 101 (qemu)
Apr 23 10:21:30 drakaris02 pvestatd[1394]: status update time (6.575 seconds)
-- Boot 42b0c380cd6f401bbb6445e19f5267be --
Apr 23 10:28:41 drakaris02 kernel: Linux version 6.5.11-8-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) ()
Apr 23 10:28:41 drakaris02 kernel: Command line: BOOT_IMAGE=/vmlinuz-6.5.11-8-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
Apr 23 10:28:41 drakaris02 kernel: KERNEL supported cpus:
Apr 23 10:28:41 drakaris02 kernel:   Intel GenuineIntel
Apr 23 10:28:41 drakaris02 kernel:   AMD AuthenticAMD
Apr 23 10:28:41 drakaris02 kernel:   Hygon HygonGenuine
Apr 23 10:28:41 drakaris02 kernel:   Centaur CentaurHauls
Apr 23 10:28:41 drakaris02 kernel:   zhaoxin   Shanghai
Apr 23 10:28:41 drakaris02 kernel: BIOS-provided physical RAM map:

Has anyone experienced this kind of behavior before?
Do you know how I can debug this kind of incident?
Besides journald or /var/log/syslog, is there any other PVE-specific log I can check to find the cause of this reboot/crash?
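
For anyone who wants to measure the hole in the log around such a reboot, here is a minimal Python sketch. It assumes the journal has been saved as plain text in the default short format (the same format as the excerpt above, with `-- Boot <id> --` markers between boots). The function names and the year are my own assumptions: short-format timestamps carry no year, so 2024 is only a guess from the kernel build date. It reports the last line logged before each boot marker and how long the host was silent. On the host itself, `journalctl --list-boots` and `journalctl -b -1 -e` show the same boundary directly. If the previous boot ends without any shutdown messages, as it does here, that usually points to a hard reset rather than an orderly reboot.

```python
from datetime import datetime

BOOT_MARKER = "-- Boot "


def parse_ts(line, year):
    """Parse the 'Apr 23 10:17:01' prefix of a short-format journal line."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # wrapped continuation line, marker, or blank line


def find_reboot_gaps(lines, year=2024):
    """Return (last_line_before_boot, first_ts_after_boot, gap) per boot marker.

    `year` is an assumption: the short journal format does not record it.
    """
    gaps = []
    last_line, last_ts = None, None
    pending = False  # True once a boot marker was seen and no line followed yet
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(BOOT_MARKER):
            pending = True
            continue
        ts = parse_ts(line, year)
        if ts is None:
            continue
        if pending and last_ts is not None:
            gaps.append((last_line, ts, ts - last_ts))
        pending = False
        last_line, last_ts = line, ts
    return gaps


# Three lines taken from the excerpt above.
sample = [
    "Apr 23 10:21:30 drakaris02 pvestatd[1394]: status update time (6.575 seconds)",
    "-- Boot 42b0c380cd6f401bbb6445e19f5267be --",
    "Apr 23 10:28:41 drakaris02 kernel: KERNEL supported cpus:",
]

for before, after_ts, gap in find_reboot_gaps(sample):
    print("last entry before reboot:", before)
    print("first entry after reboot:", after_ts, "| silent for", gap)
```

For the excerpt above the silent window is a little over seven minutes (10:21:30 to 10:28:41), and the last entry before it is the slow `pvestatd` status update.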
 
If you are running each node on a 1 Gbps connection and experience issues during a backup or other network congestion, you are likely interfering with the network traffic between nodes.

Set a bandwidth limit on the backups to avoid the congestion.
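
As a rough illustration of picking a limit, here is a small Python sketch that converts a share of the link into the KiB/s value that `vzdump --bwlimit` expects. The 50% share and the helper name are assumptions for illustration only; the VM ID and storage name are taken from the log in the first post. Check the `vzdump` man page for your version before relying on it.

```python
def bwlimit_kib_per_s(link_mbps, fraction):
    """Convert a share of a link (in Mbit/s) to KiB/s, the unit of vzdump --bwlimit."""
    bytes_per_s = link_mbps * 1_000_000 / 8  # Mbit/s -> bytes/s
    return int(bytes_per_s * fraction / 1024)  # bytes/s -> KiB/s, rounded down


# Assumption: leave half of the 1 Gbps link free for other traffic.
limit = bwlimit_kib_per_s(1000, 0.5)
print(f"vzdump 101 --storage pbs-archive --mode snapshot --bwlimit {limit}")
```

The same value can also be set as a node-wide default with a `bwlimit:` line in `/etc/vzdump.conf`, so scheduled jobs pick it up as well.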
 
It's actually preferable to run corosync on its own separate network; I believe the docs mention this.

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network

@Tasslehoff it may be worth setting up a separate 2.5 Gbit network, either with PCIe Intel-based adapters or with USB3 adapters (likely Realtek chipset), as your backups will complete faster and won't interfere with cluster comms.

2.5 Gbit is basically the last gasp for Cat5e cables. Just for consideration, I run 172.16.25.0/24 for 2.5 Gbit and 172.16.10.0/24 for 10 Gbit.

You can set up a small VM with IPFire or similar to provide DHCP and NTP/time service for that network.
 
