New ProxmoxVE install crashing in less than 24h with no VM or CT running

barnfast

I'm running a Lenovo ThinkStation P510 that was stable running Windows 10 Pro 64-bit. I'm installing Proxmox for the first time as a homelab and home server. Unfortunately, it freezes seemingly at random within 24 hours, so I've started keeping notes.

CPU: Intel Xeon E5-2695 v4 18c/36t @ 2.10GHz
RAM: 64GB DDR4-2400 RDIMM ECC (Hynix)
Mobo: Mikonos MB LGA-2011
NIC: onboard 1 Gbps Ethernet, Intel(R) PRO/1000 (e1000e driver)
GPU: Nvidia GTX 1060

Storage:
1 x SPCC 256GB as boot SSD
2 x Sandisk 512GB as LVM-Thin storage drives
1 x Liteon 512GB as LVM-Thin storage drive
1 x SMB/CIFS shared folder on a NAS

Everything except the 512GB SSDs, the GTX 1060, and an extra DVD-RW is certified for the P510.

OS: Proxmox PVE pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-11-pve)

I've recorded 4 runs so far. All crashed.

Run  Start             Freeze            Duration  Loads on Proxmox
1    23/06/2025 20:26  24/06/2025 10:38  14h 12m   2 VM, 1 CT
2    25/06/2025 12:13  25/06/2025 17:36  05h 23m   2 VM, 1 CT
3    25/06/2025 19:25  26/06/2025 00:32  05h 07m   2 VM, 1 CT (stopped mid-run)
4    26/06/2025 07:45  27/06/2025 05:48  22h 03m   nil


For the first two runs the VMs and CT were running. For run 3 I stopped them mid-run, and for run 4 nothing was running at all.

When it freezes, SSH refuses connections and the web UI is unreachable too. The monitor output still works, although there is no change or warning at the moment of the crash; the screen is simply frozen. When I plug in a keyboard it receives power and the backlight turns on, but no input is accepted and Caps Lock/Num Lock won't toggle their indicator lights. The system never recovers from this state; I have to force a reboot with the power button.

My post was too long, so the full dmesg and journalctl output are attached. I've extracted the end of each system log (just before it froze) into the spoiler tags below.

Run01 System Log (end only)
Code:
root@pve:~# journalctl --since "2025-06-23 20:26" --until "2025-06-24 10:39"
Jun 23 20:26:08 pve kernel: Linux version 6.8.12-11-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld >
Jun 23 20:26:08 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=/dev/mapper/pve-root ro quiet int>
...snip...
Jun 24 09:17:01 pve CRON[224884]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 24 09:17:01 pve CRON[224885]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 24 09:17:01 pve CRON[224884]: pam_unix(cron:session): session closed for user root
Jun 24 09:17:21 pve pvedaemon[1692]: <root@pam> successful auth for user 'root@pam'
Jun 24 09:32:21 pve pvedaemon[1690]: <root@pam> successful auth for user 'root@pam'
Jun 24 09:47:21 pve pvedaemon[1691]: <root@pam> successful auth for user 'root@pam'
Jun 24 09:56:10 pve smartd[1336]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 7>
Jun 24 10:02:21 pve pvedaemon[1692]: <root@pam> successful auth for user 'root@pam'
Jun 24 10:12:12 pve pveproxy[221135]: worker exit
Jun 24 10:12:12 pve pveproxy[1696]: worker 221135 finished
Jun 24 10:12:12 pve pveproxy[1696]: starting 1 worker(s)
Jun 24 10:12:12 pve pveproxy[1696]: worker 240535 started
Jun 24 10:17:01 pve CRON[241908]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 24 10:17:01 pve CRON[241909]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 24 10:17:01 pve CRON[241908]: pam_unix(cron:session): session closed for user root
Jun 24 10:17:21 pve pvedaemon[1690]: <root@pam> successful auth for user 'root@pam'
Jun 24 10:26:10 pve smartd[1336]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 7>
Jun 24 10:32:21 pve pvedaemon[1691]: <root@pam> successful auth for user 'root@pam'
Jun 24 10:38:42 pve pveproxy[177302]: worker exit
Jun 24 10:38:42 pve pveproxy[1696]: worker 177302 finished
Jun 24 10:38:42 pve pveproxy[1696]: starting 1 worker(s)
Jun 24 10:38:42 pve pveproxy[1696]: worker 248166 started

Run02 System Log
Code:
root@pve:~# journalctl --since "2025-06-25 12:13" --until "2025-06-25 17:37"
Jun 25 12:13:16 pve kernel: Linux version 6.8.12-11-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld >
Jun 25 12:13:16 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=/dev/mapper/pve-root ro quiet int>
...SNIP...
Jun 25 17:00:10 pve pveproxy[62116]: worker exit
Jun 25 17:00:10 pve pveproxy[1719]: worker 62116 finished
Jun 25 17:00:10 pve pveproxy[1719]: starting 1 worker(s)
Jun 25 17:00:10 pve pveproxy[1719]: worker 92421 started
Jun 25 17:07:11 pve systemd[1]: Starting man-db.service - Daily man-db regeneration...
Jun 25 17:07:11 pve systemd[1]: man-db.service: Deactivated successfully.
Jun 25 17:07:11 pve systemd[1]: Finished man-db.service - Daily man-db regeneration.
Jun 25 17:12:08 pve pvedaemon[1712]: worker exit
Jun 25 17:12:08 pve pvedaemon[1711]: worker 1712 finished
Jun 25 17:12:08 pve pvedaemon[1711]: starting 1 worker(s)
Jun 25 17:12:08 pve pvedaemon[1711]: worker 96112 started
Jun 25 17:13:20 pve smartd[1329]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 6>
Jun 25 17:13:24 pve pveproxy[72494]: worker exit
Jun 25 17:13:24 pve pveproxy[1719]: worker 72494 finished
Jun 25 17:13:24 pve pveproxy[1719]: starting 1 worker(s)
Jun 25 17:13:24 pve pveproxy[1719]: worker 96502 started
Jun 25 17:13:59 pve pvedaemon[84165]: <root@pam> successful auth for user 'root@pam'
Jun 25 17:17:01 pve CRON[97742]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 25 17:17:01 pve CRON[97743]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 25 17:17:01 pve CRON[97742]: pam_unix(cron:session): session closed for user root
Jun 25 17:28:59 pve pvedaemon[96112]: <root@pam> successful auth for user 'root@pam'
Jun 25 17:29:54 pve systemd[1]: Starting apt-daily.service - Daily apt download activities...
Jun 25 17:29:54 pve systemd[1]: apt-daily.service: Deactivated successfully.
Jun 25 17:29:54 pve systemd[1]: Finished apt-daily.service - Daily apt download activities.
Jun 25 17:36:07 pve pveproxy[71395]: worker exit
Jun 25 17:36:07 pve pveproxy[1719]: worker 71395 finished
Jun 25 17:36:07 pve pveproxy[1719]: starting 1 worker(s)
Jun 25 17:36:07 pve pveproxy[1719]: worker 103692 started

Run03 System Log
Code:
root@pve:~# journalctl --since "2025-06-25 19:25" --until "2025-06-26 00:33"
Jun 25 19:25:41 pve kernel: Linux version 6.8.12-11-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld >
Jun 25 19:25:41 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=/dev/mapper/pve-root ro quiet int>
...SNIP...
Jun 26 00:00:02 pve systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
Jun 26 00:00:02 pve systemd[1]: Starting logrotate.service - Rotate log files...
Jun 26 00:00:02 pve systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Jun 26 00:00:02 pve systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
Jun 26 00:00:02 pve systemd[1]: Reloading pveproxy.service - PVE API Proxy Server...
Jun 26 00:00:03 pve pveproxy[83218]: send HUP to 2646
Jun 26 00:00:03 pve pveproxy[2646]: received signal HUP
Jun 26 00:00:03 pve pveproxy[2646]: server closing
Jun 26 00:00:03 pve pveproxy[2646]: server shutdown (restart)
Jun 26 00:00:03 pve systemd[1]: Reloaded pveproxy.service - PVE API Proxy Server.
Jun 26 00:00:03 pve systemd[1]: Reloading spiceproxy.service - PVE SPICE Proxy Server...
Jun 26 00:00:03 pve spiceproxy[83221]: send HUP to 2689
Jun 26 00:00:03 pve spiceproxy[2689]: received signal HUP
Jun 26 00:00:03 pve spiceproxy[2689]: server closing
Jun 26 00:00:03 pve spiceproxy[2689]: server shutdown (restart)
Jun 26 00:00:03 pve systemd[1]: Reloaded spiceproxy.service - PVE SPICE Proxy Server.
Jun 26 00:00:03 pve pvefw-logger[1316]: received terminate request (signal)
Jun 26 00:00:03 pve pvefw-logger[1316]: stopping pvefw logger
Jun 26 00:00:03 pve systemd[1]: Stopping pvefw-logger.service - Proxmox VE firewall logger...
Jun 26 00:00:03 pve systemd[1]: pvefw-logger.service: Deactivated successfully.
Jun 26 00:00:03 pve systemd[1]: Stopped pvefw-logger.service - Proxmox VE firewall logger.
Jun 26 00:00:03 pve systemd[1]: pvefw-logger.service: Consumed 1.635s CPU time.
Jun 26 00:00:03 pve systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall logger...
Jun 26 00:00:03 pve pvefw-logger[83231]: starting pvefw logger
Jun 26 00:00:03 pve systemd[1]: Started pvefw-logger.service - Proxmox VE firewall logger.
Jun 26 00:00:03 pve systemd[1]: logrotate.service: Deactivated successfully.
Jun 26 00:00:03 pve systemd[1]: Finished logrotate.service - Rotate log files.
Jun 26 00:00:03 pve spiceproxy[2689]: restarting server
Jun 26 00:00:03 pve spiceproxy[2689]: starting 1 worker(s)
Jun 26 00:00:03 pve spiceproxy[2689]: worker 83235 started
Jun 26 00:00:04 pve pveproxy[2646]: restarting server
Jun 26 00:00:04 pve pveproxy[2646]: starting 3 worker(s)
Jun 26 00:00:04 pve pveproxy[2646]: worker 83237 started
Jun 26 00:00:04 pve pveproxy[2646]: worker 83238 started
Jun 26 00:00:04 pve pveproxy[2646]: worker 83239 started
Jun 26 00:00:08 pve spiceproxy[2690]: worker exit
Jun 26 00:00:08 pve spiceproxy[2689]: worker 2690 finished
Jun 26 00:00:09 pve pveproxy[69990]: worker exit
Jun 26 00:00:09 pve pveproxy[75218]: worker exit
Jun 26 00:00:09 pve pveproxy[2646]: worker 69990 finished
Jun 26 00:00:09 pve pveproxy[2646]: worker 75218 finished
Jun 26 00:00:09 pve pveproxy[2646]: worker 74967 finished
Jun 26 00:00:11 pve pveproxy[83271]: worker exit
Jun 26 00:02:52 pve pvedaemon[80467]: <root@pam> successful auth for user 'root@pam'
Jun 26 00:17:01 pve CRON[87514]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 26 00:17:01 pve CRON[87515]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 26 00:17:01 pve CRON[87514]: pam_unix(cron:session): session closed for user root
Jun 26 00:17:53 pve pvedaemon[45470]: <root@pam> successful auth for user 'root@pam'
Jun 26 00:23:08 pve pvedaemon[1947]: worker exit
Jun 26 00:23:08 pve pvedaemon[1944]: worker 1947 finished
Jun 26 00:23:08 pve pvedaemon[1944]: starting 1 worker(s)
Jun 26 00:23:08 pve pvedaemon[1944]: worker 89070 started
Jun 26 00:23:54 pve pvedaemon[45470]: worker exit
Jun 26 00:23:54 pve pvedaemon[1944]: worker 45470 finished
Jun 26 00:23:54 pve pvedaemon[1944]: starting 1 worker(s)
Jun 26 00:23:54 pve pvedaemon[1944]: worker 89354 started
Jun 26 00:25:44 pve smartd[1331]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 7>
Jun 26 00:26:52 pve systemd[1]: Starting apt-daily.service - Daily apt download activities...
Jun 26 00:26:52 pve systemd[1]: apt-daily.service: Deactivated successfully.
Jun 26 00:26:52 pve systemd[1]: Finished apt-daily.service - Daily apt download activities.
Jun 26 00:32:53 pve pvedaemon[89070]: <root@pam> successful auth for user 'root@pam'

Run04 System Log
Code:
root@pve:~# journalctl --since "2025-06-26 07:45" --until "2025-06-27 05:49"
Jun 26 07:46:18 pve kernel: Linux version 6.8.12-11-pve (build@proxmox) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld >
Jun 26 07:46:18 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=/dev/mapper/pve-root ro quiet int>
...SNIP...
Jun 27 05:03:09 pve pvedaemon[347620]: <root@pam> successful auth for user 'root@pam'
Jun 27 05:14:33 pve systemd[1]: Starting man-db.service - Daily man-db regeneration...
Jun 27 05:14:33 pve systemd[1]: man-db.service: Deactivated successfully.
Jun 27 05:14:33 pve systemd[1]: Finished man-db.service - Daily man-db regeneration.
Jun 27 05:17:01 pve CRON[404451]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 27 05:17:01 pve CRON[404452]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 27 05:17:01 pve CRON[404451]: pam_unix(cron:session): session closed for user root
Jun 27 05:17:17 pve pveproxy[373047]: worker exit
Jun 27 05:17:17 pve pveproxy[2656]: worker 373047 finished
Jun 27 05:17:17 pve pveproxy[2656]: starting 1 worker(s)
Jun 27 05:17:17 pve pveproxy[2656]: worker 404561 started
Jun 27 05:17:22 pve pveproxy[375928]: worker exit
Jun 27 05:17:22 pve pveproxy[2656]: worker 375928 finished
Jun 27 05:17:22 pve pveproxy[2656]: starting 1 worker(s)
Jun 27 05:17:22 pve pveproxy[2656]: worker 404614 started
Jun 27 05:18:09 pve pvedaemon[385298]: <root@pam> successful auth for user 'root@pam'
Jun 27 05:19:19 pve pveproxy[374251]: worker exit
Jun 27 05:19:19 pve pveproxy[2656]: worker 374251 finished
Jun 27 05:19:19 pve pveproxy[2656]: starting 1 worker(s)
Jun 27 05:19:19 pve pveproxy[2656]: worker 405553 started
Jun 27 05:33:10 pve pvedaemon[337554]: <root@pam> successful auth for user 'root@pam'
Jun 27 05:46:05 pve systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Jun 27 05:46:07 pve pveupdate[418248]: <root@pam> starting task UPID:pve:000661E1:0078D863:685DBF9F:aptupdate::root@pam:
Jun 27 05:46:09 pve pveupdate[418273]: update new package list: /var/lib/pve-manager/pkgupdates
Jun 27 05:46:11 pve pveupdate[418248]: <root@pam> end task UPID:pve:000661E1:0078D863:685DBF9F:aptupdate::root@pam: OK
Jun 27 05:46:11 pve systemd[1]: pve-daily-update.service: Deactivated successfully.
Jun 27 05:46:11 pve systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Jun 27 05:46:11 pve systemd[1]: pve-daily-update.service: Consumed 4.240s CPU time.
Jun 27 05:48:11 pve pvedaemon[385298]: <root@pam> successful auth for user 'root@pam'
 

When it freezes, SSH refuses connections and the web UI is unreachable too. The monitor output still works, although there is no change or warning at the moment of the crash; the screen is simply frozen. When I plug in a keyboard it receives power and the backlight turns on, but no input is accepted and Caps Lock/Num Lock won't toggle their indicator lights. The system never recovers from this state; I have to force a reboot with the power button.
The logfiles don't show anything. Could you please run the following command on the console, in a logged-in terminal:

Code:
watch "dmesg | tail -50"

You may need to change the -50 parameter to reflect the maximum number of lines of your console minus 4. Hopefully it'll show something on the next freeze.
 
See if there's any "e1000e" mentioned in your dmesg.

A lot of people, myself included, have been having issues with Intel NICs causing the nodes to reboot randomly when any bit of traffic is going through the NIC.

 
See if there's any "e1000e" mentioned in your dmesg.

A lot of people, myself included, have been having issues with Intel NICs causing the nodes to reboot randomly when any bit of traffic is going through the NIC.

I do have the e1000e NIC and made sure to include it in my system description. I'd skimmed those threads before, but hadn't dug into them because my symptoms don't match exactly and I don't have the "Detected Hardware Unit Hang" message in my logs. I also didn't want to start applying fixes for problems that may not even be present and complicate the efforts of the support folks here.

I do intend to apply whatever the recommended fix is at some point though.
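
For reference, the workaround I've seen cited most often in those e1000e threads is turning off hardware offloading on the interface. A rough sketch only (eno1 is my interface name from dmesg; the exact set of offload flags varies between reports):

Code:
# Temporary test, lost on reboot: disable TSO/GSO/GRO offloading on the NIC
ethtool -K eno1 tso off gso off gro off

# To make it persistent, people typically add a post-up hook to the bridge
# port in /etc/network/interfaces, e.g.:
#   iface eno1 inet manual
#       post-up /usr/sbin/ethtool -K eno1 tso off gso off gro off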

Right now I'm going to wait for this to crash again while dmesg | tail -50 is running in an SSH terminal. Hopefully it shows something useful. If it points to the NIC at all, then you can bet I'll be hitting those threads for their remedies sooner rather than later.
 
The logfiles don't show anything. Could you please run the following command on the console, in a logged-in terminal:

Code:
watch "dmesg | tail -50"

You may need to change the -50 parameter to reflect the maximum number of lines of your console minus 4. Hopefully it'll show something on the next freeze.
Unfortunately, it didn't show anything useful.

Run  Start             Freeze            Duration  Loads on Proxmox
5    27/06/2025 08:07  27/06/2025 23:18  15h 11m   nil

At 21:28 (about two hours before the crash), dmesg | tail -50 showed this:
Code:
root@pve:/var/log# watch "dmesg | tail -50"
Every 2.0s: dmesg | tail -50                                                               pve: Fri Jun 27 21:28:03 2025

[   10.464183] EDAC sbridge:  Ver: 1.1.2
[   10.473258] intel_rapl_common: Found RAPL domain package
[   10.473267] intel_rapl_common: Found RAPL domain dram
[   10.480728] ZFS: Loaded module v2.2.7-pve2, ZFS pool version 5000, ZFS filesystem version 5
[   10.776258] audit: type=1400 audit(1750982864.254:2): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="pve-container-mounthotplug" pid=1306 comm="apparmor_parser"
[   10.776280] audit: type=1400 audit(1750982864.254:3): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="/usr/bin/lxc-copy" pid=1308 comm="apparmor_parser"
[   10.776343] audit: type=1400 audit(1750982864.254:4): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="/usr/bin/lxc-start" pid=1309 comm="apparmor_parser"
[   10.777398] audit: type=1400 audit(1750982864.255:5): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="swtpm" pid=1311 comm="apparmor_parser"
[   10.778563] audit: type=1400 audit(1750982864.256:6): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="lsb_release" pid=1303 comm="apparmor_parser"
[   10.778583] audit: type=1400 audit(1750982864.256:7): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="nvidia_modprobe" pid=1305 comm="apparmor_parser"
[   10.778587] audit: type=1400 audit(1750982864.256:8): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="nvidia_modprobe//kmod" pid=1305 comm="apparmor_parser"
[   10.779530] audit: type=1400 audit(1750982864.257:9): apparmor="STATUS" operation="profile_load" profile="unconfined"
 name="/usr/bin/man" pid=1310 comm="apparmor_parser"
[   10.779536] audit: type=1400 audit(1750982864.257:10): apparmor="STATUS" operation="profile_load" profile="unconfined
" name="man_filter" pid=1310 comm="apparmor_parser"
[   10.779539] audit: type=1400 audit(1750982864.257:11): apparmor="STATUS" operation="profile_load" profile="unconfined
" name="man_groff" pid=1310 comm="apparmor_parser"
[   10.863728] RPC: Registered named UNIX socket transport module.
[   10.863733] RPC: Registered udp transport module.
[   10.863734] RPC: Registered tcp transport module.
[   10.863735] RPC: Registered tcp-with-tls transport module.
[   10.863736] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   11.708154] vmbr0: port 1(eno1) entered blocking state

But when I next checked the SSH terminal (15 minutes after the crash), the window was nearly blank. All it had was the watch header with the node name and timestamp: pve: Fri Jun 27 23:33:59 2025.

I finally understand what you meant about changing the tail -50 value — I was losing the lowest lines off the bottom of the screen. I've adjusted accordingly and am trying again.
 
The logfiles don't show anything. Could you please run the following command on the console, in a logged-in terminal:

Code:
watch "dmesg | tail -50"

You may need to change the -50 parameter to reflect the maximum number of lines of your console minus 4. Hopefully it'll show something on the next freeze.
There was little visible content remaining post-crash this time either.

(Attached screenshot of the post-crash screen: Screenshot 2025-06-28 192705.png)

There was plenty on the screen four hours earlier, pasted below:

Code:
Every 2.0s: dmesg | tail -70                                                                                                                                                                                                                                                                    pve: Sat Jun 28 15:19:30 2025

[    9.076889] input: HDA Intel PCH Front Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input7
[    9.077044] input: HDA Intel PCH Rear Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input8
[    9.077149] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input3
[    9.077199] input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1b.0/sound/card0/input9
[    9.077271] input: HDA Intel PCH Line Out as /devices/pci0000:00/0000:00:1b.0/sound/card0/input10
[    9.077313] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input4
[    9.077384] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input5
[    9.077467] input: HDA Intel PCH Front Headphone as /devices/pci0000:00/0000:00:1b.0/sound/card0/input11
[    9.077683] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.0/0000:01:00.1/sound/card1/input6
[    9.169946] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[    9.169961] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[    9.169970] EDAC sbridge: Seeking for: PCI ID 8086:6f60
[    9.169984] EDAC sbridge: Seeking for: PCI ID 8086:6f60
[    9.169988] EDAC sbridge: Seeking for: PCI ID 8086:6fa8
[    9.169992] EDAC sbridge: Seeking for: PCI ID 8086:6fa8
[    9.169996] EDAC sbridge: Seeking for: PCI ID 8086:6f71
[    9.170001] EDAC sbridge: Seeking for: PCI ID 8086:6f71
[    9.170005] EDAC sbridge: Seeking for: PCI ID 8086:6faa
[    9.170009] EDAC sbridge: Seeking for: PCI ID 8086:6faa
[    9.170012] EDAC sbridge: Seeking for: PCI ID 8086:6fab
[    9.170017] EDAC sbridge: Seeking for: PCI ID 8086:6fab
[    9.170021] EDAC sbridge: Seeking for: PCI ID 8086:6fac
[    9.170026] EDAC sbridge: Seeking for: PCI ID 8086:6fad
[    9.170032] EDAC sbridge: Seeking for: PCI ID 8086:6f68
[    9.170037] EDAC sbridge: Seeking for: PCI ID 8086:6f68
[    9.170040] EDAC sbridge: Seeking for: PCI ID 8086:6f79
[    9.170045] EDAC sbridge: Seeking for: PCI ID 8086:6f79
[    9.170049] EDAC sbridge: Seeking for: PCI ID 8086:6f6a
[    9.170053] EDAC sbridge: Seeking for: PCI ID 8086:6f6a
[    9.170057] EDAC sbridge: Seeking for: PCI ID 8086:6f6b
[    9.170062] EDAC sbridge: Seeking for: PCI ID 8086:6f6b
[    9.170065] EDAC sbridge: Seeking for: PCI ID 8086:6f6c
[    9.170071] EDAC sbridge: Seeking for: PCI ID 8086:6f6d
[    9.170076] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[    9.170080] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[    9.170085] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[    9.170088] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[    9.170093] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[    9.170097] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[    9.170171] EDAC MC0: Giving out device to module sb_edac controller Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0 (INTERRUPT)
[    9.170235] EDAC MC1: Giving out device to module sb_edac controller Broadwell SrcID#0_Ha#1: DEV 0000:ff:12.4 (INTERRUPT)
[    9.170237] EDAC sbridge:  Ver: 1.1.2
[    9.179092] intel_rapl_common: Found RAPL domain package
[    9.179100] intel_rapl_common: Found RAPL domain dram
[    9.185945] ZFS: Loaded module v2.2.7-pve2, ZFS pool version 5000, ZFS filesystem version 5
[    9.451782] audit: type=1400 audit(1751039500.924:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="pve-container-mounthotplug" pid=1313 comm="apparmor_parser"
[    9.451814] audit: type=1400 audit(1751039500.924:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-copy" pid=1315 comm="apparmor_parser"
[    9.451859] audit: type=1400 audit(1751039500.924:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=1316 comm="apparmor_parser"
[    9.453161] audit: type=1400 audit(1751039500.926:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="swtpm" pid=1318 comm="apparmor_parser"
[    9.454697] audit: type=1400 audit(1751039500.927:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=1310 comm="apparmor_parser"
[    9.454747] audit: type=1400 audit(1751039500.927:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1312 comm="apparmor_parser"
[    9.454750] audit: type=1400 audit(1751039500.927:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1312 comm="apparmor_parser"
[    9.455322] audit: type=1400 audit(1751039500.928:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=1317 comm="apparmor_parser"
[    9.455328] audit: type=1400 audit(1751039500.928:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=1317 comm="apparmor_parser"
[    9.455330] audit: type=1400 audit(1751039500.928:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=1317 comm="apparmor_parser"
[    9.525969] RPC: Registered named UNIX socket transport module.
[    9.525979] RPC: Registered udp transport module.
[    9.525980] RPC: Registered tcp transport module.
[    9.525981] RPC: Registered tcp-with-tls transport module.
[    9.525982] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   10.391341] vmbr0: port 1(eno1) entered blocking state
[   10.391347] vmbr0: port 1(eno1) entered disabled state
[   10.391363] e1000e 0000:00:19.0 eno1: entered allmulticast mode
[   10.391406] e1000e 0000:00:19.0 eno1: entered promiscuous mode
[   13.904155] e1000e 0000:00:19.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   13.904215] vmbr0: port 1(eno1) entered blocking state
[   13.904219] vmbr0: port 1(eno1) entered forwarding state
[   19.905946] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[   19.905950] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[   24.043025] kvm_intel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.

What's the next step in diagnosing this?

Any recommendations on preserving the watch "dmesg | tail -50" output in a terminal window? Is there another SSH client or setting that will do it? I'm using PuTTY on Windows 11, and I have a Linux Mint laptop available too.
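
In the meantime, I'm going to try streaming the kernel log to a file on the Mint laptop instead of relying on terminal scrollback, so the capture survives the freeze. A sketch of what I have in mind (assumes the host answers SSH as root@pve, and that dmesg supports --follow, which I believe it does on PVE 8):

Code:
# Run on the laptop, not on the PVE host, so the log file survives the crash.
# -T prints human-readable timestamps; --follow keeps streaming new messages.
ssh root@pve 'dmesg -T --follow' | tee -a pve-dmesg.log

PuTTY also has a session-logging option (Session > Logging > "All session output") that should capture the same thing.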
 
The logfiles don't show anything. Could you please run the following command on the console, in a logged-in terminal:

Code:
watch "dmesg | tail -50"

You may need to change the -50 parameter to reflect the maximum number of lines of your console minus 4. Hopefully it'll show something on the next freeze.

It was pure luck that I looked at the ssh session just as it crashed. I got a copy a few seconds before it disappeared.

Code:
root@pve:~# watch "dmesg | tail -70"
Every 2.0s: dmesg | tail -70                                                                                                                                                                                                                                                                    pve: Sun Jun 29 13:33:54 2025

[   10.376925] EDAC sbridge: Seeking for: PCI ID 8086:6f6c
[   10.376931] EDAC sbridge: Seeking for: PCI ID 8086:6f6d
[   10.376937] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[   10.376940] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[   10.376945] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[   10.376949] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[   10.376953] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[   10.376957] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[   10.377048] EDAC MC0: Giving out device to module sb_edac controller Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0 (INTERRUPT)
[   10.377111] EDAC MC1: Giving out device to module sb_edac controller Broadwell SrcID#0_Ha#1: DEV 0000:ff:12.4 (INTERRUPT)
[   10.377113] EDAC sbridge:  Ver: 1.1.2
[   10.384912] ZFS: Loaded module v2.2.7-pve2, ZFS pool version 5000, ZFS filesystem version 5
[   10.388013] intel_rapl_common: Found RAPL domain package
[   10.388020] intel_rapl_common: Found RAPL domain dram
[   10.725930] audit: type=1400 audit(1751121345.206:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-copy" pid=1304 comm="apparmor_parser"
[   10.725938] audit: type=1400 audit(1751121345.206:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="pve-container-mounthotplug" pid=1302 comm="apparmor_parser"
[   10.725961] audit: type=1400 audit(1751121345.206:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=1305 comm="apparmor_parser"
[   10.730154] audit: type=1400 audit(1751121345.211:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="swtpm" pid=1307 comm="apparmor_parser"
[   10.731740] audit: type=1400 audit(1751121345.212:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=1299 comm="apparmor_parser"
[   10.731780] audit: type=1400 audit(1751121345.212:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1301 comm="apparmor_parser"
[   10.731783] audit: type=1400 audit(1751121345.212:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1301 comm="apparmor_parser"
[   10.732903] audit: type=1400 audit(1751121345.213:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=1306 comm="apparmor_parser"
[   10.732909] audit: type=1400 audit(1751121345.213:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=1306 comm="apparmor_parser"
[   10.732911] audit: type=1400 audit(1751121345.213:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=1306 comm="apparmor_parser"
[   10.813503] RPC: Registered named UNIX socket transport module.
[   10.813507] RPC: Registered udp transport module.
[   10.813508] RPC: Registered tcp transport module.
[   10.813509] RPC: Registered tcp-with-tls transport module.
[   10.813509] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   11.734571] vmbr0: port 1(eno1) entered blocking state
[   11.734576] vmbr0: port 1(eno1) entered disabled state
[   11.734586] e1000e 0000:00:19.0 eno1: entered allmulticast mode
[   11.734630] e1000e 0000:00:19.0 eno1: entered promiscuous mode
[   15.359158] e1000e 0000:00:19.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   15.359217] vmbr0: port 1(eno1) entered blocking state
[   15.359222] vmbr0: port 1(eno1) entered forwarding state
[   21.506857] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[   21.506861] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[   25.574040] kvm_intel: L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[53854.432824] ata7.00: exception Emask 0x0 SAct 0xe0821000 SErr 0x0 action 0x6 frozen
[53854.432832] ata7.00: failed command: READ FPDMA QUEUED
[53854.432833] ata7.00: cmd 60/00:60:00:08:20/01:00:00:00:00/40 tag 12 ncq dma 131072 in
                        res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[53854.432838] ata7.00: status: { DRDY }
[53854.432840] ata7.00: failed command: WRITE FPDMA QUEUED
[53854.432841] ata7.00: cmd 61/18:88:08:07:0a/00:00:03:00:00/40 tag 17 ncq dma 12288 out
                        res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[53854.432845] ata7.00: status: { DRDY }
[53854.432846] ata7.00: failed command: WRITE FPDMA QUEUED
[53854.432847] ata7.00: cmd 61/18:b8:48:67:39/00:00:07:00:00/40 tag 23 ncq dma 12288 out
                        res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[53854.432850] ata7.00: status: { DRDY }
[53854.432852] ata7.00: failed command: WRITE FPDMA QUEUED
[53854.432853] ata7.00: cmd 61/10:e8:a8:8d:6a/00:00:05:00:00/40 tag 29 ncq dma 8192 out
                        res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[53854.432856] ata7.00: status: { DRDY }
[53854.432857] ata7.00: failed command: READ FPDMA QUEUED
[53854.432858] ata7.00: cmd 60/00:f0:00:08:20/01:00:00:00:00/40 tag 30 ncq dma 131072 in
                        res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[53854.432861] ata7.00: status: { DRDY }
[53854.432862] ata7.00: failed command: WRITE FPDMA QUEUED
[53854.432863] ata7.00: cmd 61/08:f8:48:74:08/00:00:03:00:00/40 tag 31 ncq dma 4096 out
                        res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[53854.432866] ata7.00: status: { DRDY }
[53854.432868] ata7: hard resetting link
[53859.782674] ata7: link is slow to respond, please be patient (ready=0)
[53864.462602] ata7: hard resetting link
[53869.814558] ata7: link is slow to respond, please be patient (ready=0)
[53874.494460] ata7: hard resetting link
[53879.846364] ata7: link is slow to respond, please be patient (ready=0)

So, the two errors are
Code:
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: failed command: WRITE FPDMA QUEUED

How do I figure out which device is ata7.00? I've got 4 x SSDs and 2 x DVD-RW drives plugged in.
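
Answering my own question after a bit of digging — these generic commands seem to do it (nothing here is specific to my box):

Code:
# Kernel messages for ATA port 7: model string, link speed, and the errors above
dmesg | grep -i 'ata7'

# Which /dev/sdX hangs off ata7: the symlink targets in /sys/block show the ATA port
ls -l /sys/block/ | grep ata7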

It's the Silicon Power boot drive: pve kernel: ata7.00: ATA-11: SPCC Solid State Disk, SBFM61.2, max UDMA/133. Fortunately, I only have one drive from that manufacturer.

Next step is to change the SATA data and power cables, and try another mobo port. If all else fails, I'll reinstall PVE on another boot drive.
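
Before swapping anything I'll also pull the drive's SMART data, since reallocated sectors would point at the drive itself while CRC errors tend to point at cabling. A sketch (sdX is a placeholder for whatever device the commands above identify):

Code:
# Full SMART report: attribute table, error log, and self-test history
smartctl -a /dev/sdX

# Kick off a short self-test, then check the result a few minutes later
smartctl -t short /dev/sdX
smartctl -l selftest /dev/sdX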
 
To discover which disk is using that port 7, try:
Code:
lshw -c storage -c disk
The bus info fields should give you the physical port used.

Based on your output and searching the web for similar issues, it would appear that your SATA bus cannot keep up with the traffic, either due to a HW fault or a power issue on your motherboard. As a suggested workaround, I would try adding the following to the kernel command line:
Code:
libata.force=7:3.0G
I believe this will limit SATA bus 7 to 3 Gb/s.
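
To actually get that onto the kernel command line on PVE (from memory, so double-check against the reference documentation): with GRUB you edit /etc/default/grub, with systemd-boot (ZFS-on-root UEFI installs) you edit /etc/kernel/cmdline instead:

Code:
# GRUB-booted installs: append the option to GRUB_CMDLINE_LINUX_DEFAULT, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet libata.force=7:3.0G"
nano /etc/default/grub
update-grub

# systemd-boot installs: append the option to the single line in
# /etc/kernel/cmdline, then refresh the boot entries
nano /etc/kernel/cmdline
proxmox-boot-tool refresh

# After a reboot, verify it took effect
cat /proc/cmdline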

If you want to limit all the SATA buses to 3 Gb/s, you could try:
Code:
libata.force=3.0G

Good luck. (I've never done this personally).
 
I've changed the power cable, the data cable, and the SATA port, and the same error reoccurred, pointing at the same Silicon Power boot drive on its new port... so the fault probably lies with the boot drive itself.

I'll reinstall PVE on a different SSD. Thankfully this is a new install, so there's no data to worry about migrating over.
 