Experience Level: Beginner - Intermediate
Host System: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz, 8 GB RAM, three 500 GB SATA drives, one external 8 TB USB 3.0 drive
Last week, this box was running Ubuntu 18.04 independent of any VE. I migrated to Proxmox, and the only VM configured is a fresh Ubuntu 20.04 install.
The external USB drive is assigned as a USB device on the VM.
The three internal drives are split up: one is the host's primary system drive, and the other two form a ZFS mirror. The Ubuntu VM uses the entire mirror as a single LV.
Problem:
The host locks up at irregular intervals, multiple times per day. The cursor on the terminal attached to the box stops blinking, everything comes to a complete stop, and I have to force a power-off and restart the system.
There weren't any problems running Ubuntu 20.04 on the hardware without Proxmox, so I wonder whether Proxmox is tripping over some finicky detail that Ubuntu ignored.
Question:
Where do I start looking beyond syslog to diagnose this problem, and what should I be looking for?
As you can see, the dead spots on this usage graph are where the system has gone down (although I have no way of knowing at the time that it has gone down).
So, I checked the syslog at those exact times:
Sep 20 13:17:02 pve1 CRON[1251095]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 20 13:17:02 pve1 CRON[1251096]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Sep 20 13:17:02 pve1 CRON[1251095]: pam_unix(cron:session): session closed for user root
Sep 20 13:18:00 pve1 systemd[1]: Starting Proxmox VE replication runner...
Sep 20 13:18:01 pve1 systemd[1]: pvesr.service: Succeeded.
Sep 20 13:18:01 pve1 systemd[1]: Finished Proxmox VE replication runner.
----
Crash Here
----
Sep 20 15:26:00 pve1 systemd[1]: Starting Proxmox VE replication runner...
Sep 20 15:26:01 pve1 systemd[1]: pvesr.service: Succeeded.
Sep 20 15:26:01 pve1 systemd[1]: Finished Proxmox VE replication runner.
Sep 20 15:26:18 pve1 pvedaemon[1991]: <root@pam> successful auth for user 'root@pam'
Sep 20 15:27:00 pve1 systemd[1]: Starting Proxmox VE replication runner...
Sep 20 15:27:01 pve1 systemd[1]: pvesr.service: Succeeded.
----
Crash Here
----
Sep 21 00:52:18 pve1 pvedaemon[74904]: <root@pam> successful auth for user 'root@pam'
Sep 21 00:53:00 pve1 systemd[1]: Starting Proxmox VE replication runner...
Sep 21 00:53:00 pve1 systemd[1]: pvesr.service: Succeeded.
Sep 21 00:53:00 pve1 systemd[1]: Finished Proxmox VE replication runner.
Sep 21 00:54:00 pve1 systemd[1]: Starting Proxmox VE replication runner...
Sep 21 00:54:00 pve1 systemd[1]: pvesr.service: Succeeded.
Sep 21 00:54:00 pve1 systemd[1]: Finished Proxmox VE replication runner.
----
Crash Here
----
At this point, I have no idea what I'm looking at. I don't see a pattern, and I don't know exactly when the system goes down, so I'm sort of stuck.
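To at least narrow down when the box goes down, a small script could scan the syslog for silent stretches. In the excerpts above the replication runner appears to log about once a minute, so any gap of more than a few minutes probably brackets a lockup. This is only a rough sketch: the names `parse_timestamp` and `find_gaps` and the five-minute threshold are my own choices, and because these syslog timestamps carry no year, the year has to be supplied by hand.

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year):
    """Return the datetime at the start of a syslog line, or None.

    Assumes the traditional "Sep 20 13:18:01" prefix (15 characters).
    The year is not part of that format, so the caller supplies it.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        # Not a log entry (blank line, separator, hand-written note).
        return None


def find_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Return (last_seen, first_back) pairs where logging went quiet.

    A pair is reported whenever two consecutive timestamped lines are
    further apart than `threshold` (an arbitrary default, not a rule).
    """
    gaps = []
    previous = None
    for line in lines:
        current = parse_timestamp(line, year)
        if current is None:
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Called as `find_gaps(open("/var/log/syslog"), year=...)`, this returns one pair per quiet period. The first timestamp is the last line written before the freeze and the second is the first line after the reboot, so the actual lockup happened somewhere shortly after the first. A log that spans New Year would need extra handling, since the year is assumed to be constant.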