Proxmox fails to respond

amerikiwi

New Member
Jan 28, 2024
The Proxmox server stops responding. Here is the log:
Code:
Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 122 to 121
Jun 12 07:06:43 pve-002 kernel: perf: interrupt took too long (4912 > 4911), lowering kernel.perf_event_max_sample_rate to 40000
Jun 12 07:07:12 pve-002 pvestatd[914]: auth key pair too old, rotating..
Jun 12 07:17:02 pve-002 CRON[90226]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 12 07:17:04 pve-002 CRON[90247]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 12 07:17:04 pve-002 CRON[90226]: pam_unix(cron:session): session closed for user root
Jun 12 07:17:23 pve-002 systemd[1]: Starting apt-daily.service - Daily apt download activities...
Jun 12 07:17:24 pve-002 systemd[1]: apt-daily.service: Deactivated successfully.
Jun 12 07:17:24 pve-002 systemd[1]: Finished apt-daily.service - Daily apt download activities.
Jun 12 07:22:59 pve-002 pvedaemon[934]: worker exit
Jun 12 07:22:59 pve-002 pvedaemon[933]: worker 934 finished
Jun 12 07:22:59 pve-002 pvedaemon[933]: starting 1 worker(s)
Jun 12 07:22:59 pve-002 pvedaemon[933]: worker 91666 started
Jun 12 07:25:40 pve-002 pvedaemon[91666]: <root@pam> successful auth for user 'root@pam'
Jun 12 07:28:39 pve-002 pveproxy[78979]: worker exit
Jun 12 07:28:39 pve-002 pveproxy[940]: worker 78979 finished
Jun 12 07:28:39 pve-002 pveproxy[940]: starting 1 worker(s)
Jun 12 07:28:39 pve-002 pveproxy[940]: worker 93000 started
Jun 12 07:34:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jun 12 07:34:03 pve-002 smartd[543]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Jun 12 07:34:03 pve-002 smartd[543]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Jun 12 07:34:03 pve-002 postfix/pickup[86076]: EAAD6203DB: uid=0 from=<root>
Jun 12 07:34:04 pve-002 postfix/cleanup[94186]: EAAD6203DB: message-id=<20240612113403.EAAD6203DB@pve-002.home.arpa>
Jun 12 07:34:04 pve-002 postfix/qmgr[885]: EAAD6203DB: from=<root@pve-002.home.arpa>, size=1048, nrcpt=1 (queue active)
Jun 12 07:34:04 pve-002 postfix/pickup[86076]: 7E4BF2040F: uid=65534 from=<root>
Jun 12 07:34:04 pve-002 postfix/cleanup[94186]: 7E4BF2040F: message-id=<20240612113403.EAAD6203DB@pve-002.home.arpa>
Jun 12 07:34:04 pve-002 proxmox-mail-fo[94190]: pve-002 proxmox-mail-forward[94190]: notified via target `mail-to-root`
Jun 12 07:34:04 pve-002 postfix/qmgr[885]: 7E4BF2040F: from=<root@pve-002.home.arpa>, size=1223, nrcpt=1 (queue active)
Jun 12 07:34:04 pve-002 postfix/local[94189]: EAAD6203DB: to=<root@pve-002.home.arpa>, orig_to=<root>, relay=local, delay=0.85, delays=0.38/0.06/0/0.41, dsn=2.0.0, status=sent (delivered to command: /usr/bin/proxmox-mail-forward)
Jun 12 07:34:04 pve-002 postfix/qmgr[885]: EAAD6203DB: removed
Jun 12 07:34:07 pve-002 postfix/smtp[94193]: 7E4BF2040F: to=<david.mcdowall@gmail.com>, relay=gmail-smtp-in.l.google.com[172.253.122.26]:25, delay=2.9, delays=0.07/0.09/1.8/0.88, dsn=5.7.1, status=bounced (host gmail-smtp-in.l.google.com[172.253.122.26] said: 550-5.7.1 [69.23.56.202] The IP you're using to send mail is not authorized to 550-5.7.1 send email directly to our servers. Please use the SMTP relay at your 550-5.7.1 service provider instead. For more information, go to 550 5.7.1  https://support.google.com/mail/?p=NotAuthorizedError af79cd13be357-79550268696si1082751185a.244 - gsmtp (in reply to end of DATA command))
Jun 12 07:34:07 pve-002 postfix/qmgr[885]: 7E4BF2040F: removed
Jun 12 07:34:07 pve-002 postfix/cleanup[94186]: 5DCFD20410: message-id=<20240612113407.5DCFD20410@pve-002.home.arpa>
Jun 12 08:04:03 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jun 12 08:17:02 pve-002 CRON[102477]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 12 08:17:02 pve-002 CRON[102478]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 12 08:17:02 pve-002 CRON[102477]: pam_unix(cron:session): session closed for user root
Jun 12 08:34:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jun 12 08:35:38 pve-002 chronyd[757]: Selected source 23.168.24.210 (2.debian.pool.ntp.org)
Jun 12 09:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jun 12 09:17:01 pve-002 CRON[112309]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 12 09:17:02 pve-002 CRON[112310]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 12 09:17:02 pve-002 CRON[112309]: pam_unix(cron:session): session closed for user root
Jun 12 09:34:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jun 12 10:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jun 12 10:17:02 pve-002 CRON[122734]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 12 10:17:02 pve-002 CRON[122735]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 12 10:17:02 pve-002 CRON[122734]: pam_unix(cron:session): session closed for user root
-- Reboot --
Jun 12 18:43:27 pve-002 kernel: Linux version 6.8.4-3-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) ()
Jun 12 18:43:27 pve-002 kernel: Command line: BOOT_IMAGE=/vmlinuz-6.8.4-3-pve root=ZFS=/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
Jun 12 18:43:27 pve-002 kernel: KERNEL supported cpus:
Jun 12 18:43:27 pve-002 kernel:   Intel GenuineIntel
Jun 12 18:43:27 pve-002 kernel:   AMD AuthenticAMD
Jun 12 18:43:27 pve-002 kernel:   Hygon HygonGenuine
Jun 12 18:43:27 pve-002 kernel:   Centaur CentaurHauls
Jun 12 18:43:27 pve-002 kernel:   zhaoxin   Shanghai

What could be causing this?
 
Hey,

Code:
Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 122 to 121
122 degrees seems pretty hot for a disk, and in general. Also:

Code:
Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors

Other than that, I can't really see anything in the logs. Is this the first time your host crashed?
 
Hey,

Code:
Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 122 to 121
122 degrees seems pretty hot for a disk, and in general.
This is not the actual temperature in Celsius (even though the attribute is named that). smartd logs the normalized attribute value here; the real temperature is in the raw value and is probably closer to 48C.
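To make the point about the 122 concrete, here is a minimal Python sketch that pulls the fields out of the smartd message quoted above. By default smartd tracks normalized attribute values, so the two numbers it extracts are normalized values, not degrees. The function name `parse_attribute_change` and the regular expression are illustrative assumptions of mine, not part of smartmontools.

```python
import re

# Matches smartd's attribute-change message as it appears in the log above.
# The pattern is an assumption based on that one line, not an official format spec.
CHANGE_RE = re.compile(
    r"SMART (?:Usage|Prefailure) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)

def parse_attribute_change(line):
    """Return the attribute id, name, and old/new logged values, or None."""
    m = CHANGE_RE.search(line)
    if m is None:
        return None
    return {
        "id": int(m.group("id")),
        "name": m.group("name"),
        "old": int(m.group("old")),  # normalized value, not degrees Celsius
        "new": int(m.group("new")),  # normalized value, not degrees Celsius
    }

line = (
    "Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], "
    "SMART Usage Attribute: 194 Temperature_Celsius changed from 122 to 121"
)
change = parse_attribute_change(line)
print(change)
```

To see the actual temperature, look at the RAW_VALUE column for attribute 194 in the output of `smartctl -A /dev/sda`.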
Code:
Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors

Other than that, I can't really see anything in the logs. Is this the first time your host crashed?
It does indeed appear to be a disk problem and maybe filesystem corruption. Maybe check your rpool and scrub it.
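As a rough way to see how the disk warning behaves over time, the sketch below collects the pending-sector messages from a few of the log lines posted above. The helper name `pending_history` and the regular expression are my own assumptions, written against the lines in this thread only.

```python
import re

# Pattern for smartd's pending-sector warning, modeled on the log lines in this thread.
PENDING_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ smartd\[\d+\]: "
    r"Device: (?P<dev>\S+) .*?(?P<count>\d+) Currently unreadable \(pending\) sectors"
)

def pending_history(lines):
    """Return (timestamp, device, pending_count) for each matching log line."""
    history = []
    for line in lines:
        m = PENDING_RE.search(line)
        if m:
            history.append((m.group("stamp"), m.group("dev"), int(m.group("count"))))
    return history

# A subset of the lines from the original post; the CRON line is skipped by the parser.
log_lines = [
    "Jun 12 07:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors",
    "Jun 12 07:17:04 pve-002 CRON[90247]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)",
    "Jun 12 07:34:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors",
    "Jun 12 08:04:03 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors",
    "Jun 12 10:04:02 pve-002 smartd[543]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors",
]

history = pending_history(log_lines)
for stamp, dev, count in history:
    print(stamp, dev, count)
```

In this excerpt the count stays at 2 until the last entry before the reboot. For the pool check itself, `zpool status -v rpool` shows any read or checksum errors and `zpool scrub rpool` starts a scrub, assuming the pool is named rpool as the kernel command line in the log suggests.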
 
That drive, /dev/sda, is probably about to fail completely. I would boot from live Linux rescue/recovery media, back up the data, and check the drive.
 
