[SOLVED] proxmox webgui freezes, file system read-only, exception frozen

skTom

Jan 4, 2024
Hi, I need help.

I reinstalled Proxmox a few days ago, and since then the system has been freezing. Every few hours, the web GUI and all VM/LXC instances stop responding, or the file system becomes read-only.

Code:
Jan 05 03:00:26 proxmox kernel: ata1.00: exception Emask 0x0 SAct 0x1003c SErr 0x0 action 0x6 frozen
Jan 05 03:01:27 proxmox kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 05 03:01:27 proxmox kernel: ata1.00: cmd 61/08:10:00:3e:b4/00:00:11:00:00/40 tag 2 ncq dma 4096 out
         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 05 03:01:27 proxmox kernel: ata1.00: status: { DRDY }
Jan 05 03:01:27 proxmox kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 05 03:01:27 proxmox kernel: ata1.00: cmd 61/08:18:b8:44:b4/00:00:11:00:00/40 tag 3 ncq dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 05 03:01:27 proxmox kernel: ata1.00: status: { DRDY }
Jan 05 03:01:27 proxmox kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 05 03:01:27 proxmox kernel: ata1.00: cmd 61/08:20:e0:f0:ef/00:00:0d:00:00/40 tag 4 ncq dma 4096 out
         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 05 03:01:27 proxmox kernel: ata1.00: status: { DRDY }
Jan 05 03:01:27 proxmox kernel: ata1.00: failed command: READ FPDMA QUEUED
Jan 05 03:01:27 proxmox kernel: ata1.00: cmd 60/00:28:00:08:20/01:00:00:00:00/40 tag 5 ncq dma 131072 in
         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 05 03:01:27 proxmox kernel: ata1.00: status: { DRDY }
Jan 05 03:01:27 proxmox kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 05 03:01:27 proxmox kernel: ata1.00: cmd 61/10:80:00:b2:28/00:00:07:00:00/40 tag 16 ncq dma 8192 out
         res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 05 03:01:27 proxmox kernel: ata1.00: status: { DRDY }
Jan 05 03:01:27 proxmox kernel: ata1: hard resetting link
Jan 05 03:01:27 proxmox kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 05 03:01:27 proxmox kernel: ata1: softreset failed (device not ready)
Jan 05 03:01:27 proxmox kernel: ata1: hard resetting link
Jan 05 03:01:27 proxmox kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 05 03:01:27 proxmox kernel: ata1: softreset failed (device not ready)
Jan 05 03:01:27 proxmox kernel: ata1: hard resetting link
Jan 05 03:01:27 proxmox kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 05 03:01:27 proxmox kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 05 03:01:27 proxmox kernel: ata1: softreset failed (device not ready)
Jan 05 03:01:27 proxmox kernel: ata1: limiting SATA link speed to 3.0 Gbps
Jan 05 03:01:27 proxmox kernel: ata1: hard resetting link
Jan 05 03:01:27 proxmox kernel: ata1: softreset failed (device not ready)
Jan 05 03:01:27 proxmox kernel: ata1: reset failed, giving up
Jan 05 03:01:27 proxmox kernel: ata1.00: disable device
Jan 05 03:01:27 proxmox kernel: ata1: EH complete
Jan 05 03:01:27 proxmox kernel: sd 0:0:0:0: [sda] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Jan 05 03:01:27 proxmox kernel: sd 0:0:0:0: [sda] tag#2 CDB: Read(10) 28 00 38 ca 37 98 00 00 08 00
Jan 05 03:01:27 proxmox kernel: I/O error, dev sda, sector 952776600 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Jan 05 03:01:27 proxmox kernel: sd 0:0:0:0: [sda] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Jan 05 03:01:27 proxmox kernel: sd 0:0:0:0: [sda] tag#24 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
Jan 05 03:01:27 proxmox kernel: sd 0:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=96s
Jan 05 03:01:27 proxmox kernel: sd 0:0:0:0: [sda] tag#3 CDB: Write(10) 2a 00 11 b4 3e 00 00 00 08 00
Jan 05 03:01:27 proxmox kernel: I/O error, dev sda, sector 297025024 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
Jan 05 03:01:27 proxmox kernel: Buffer I/O error on dev dm-7, logical block 355696, lost async page write
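
For anyone reading the log above: the `cmd` lines can be decoded to see which sectors were in flight when the timeout hit. Below is a minimal Python sketch, not an official tool. It assumes the libata field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and it assumes NCQ commands (0x60 read, 0x61 write), where the sector count sits in the feature fields and the tag in the upper bits of nsect. The function name `decode_ncq_cmd` is made up for illustration.

```python
import re

# Field order assumed from the libata "cmd" line:
# cmd CMD/FEATURE:NSECT:LBAL:LBAM:LBAH/HOB_FEATURE:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE
CMD_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})"
    r"/(?P<low>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/(?P<high>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/(?P<dev>[0-9a-f]{2})"
)


def decode_ncq_cmd(line):
    """Decode one 'cmd' log line; return None if the line does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m["low"].split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m["high"].split(":")
    )
    # 48-bit LBA: the "hob" (high order byte) fields carry bits 24..47.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": int(m["cmd"], 16),
        "lba": lba,
        # NCQ only: sector count lives in the feature fields.
        "sectors": (hob_feature << 8) | feature,
        # NCQ only: tag is stored in bits 7:3 of nsect.
        "tag": nsect >> 3,
    }


if __name__ == "__main__":
    sample = "ata1.00: cmd 61/08:10:00:3e:b4/00:00:11:00:00/40 tag 2 ncq dma 4096 out"
    print(decode_ncq_cmd(sample)["lba"])  # 297025024
```

The decoded LBA 297025024 is the same sector that shows up further down in `I/O error, dev sda, sector 297025024`, and 8 sectors of 512 bytes match the `dma 4096` in the same line. So the later I/O errors are the same writes that timed out earlier.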


pveversion -v
Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-7-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.5
pve-qemu-kvm: 8.1.2-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1

SMART

Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-7-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Device Model:     INTENSO
Serial Number:    AA000000000000003231
Firmware Version: V0621A0
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jan  6 22:59:24 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.
SCT capabilities:              (0x0001) SCT Status supported.


SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       4550
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       81
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       100
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       10
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       8998
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       129
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       2
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       41
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       74
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       40
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       100
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       78485
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       67568
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       190224

Is the problem with the SSD? I don't know how to interpret the SMART logs above.

thanks.
 
Hello,

The SMART output looks good to me.

I think the problem is either your SATA controller or cable, or the SSD itself (even though SMART reports no errors).
Can you try a different cable or SSD to find out which part is faulty?
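
As a rough illustration of what "looks good" means here, the attribute table can be checked mechanically. The following Python sketch parses the `smartctl` attribute rows and reports any non-zero raw value for a handful of commonly watched IDs. The choice of IDs (5, 197, 198, 199) and the helper names are assumptions for this example, not a smartmontools feature, and a clean result does not prove the drive is healthy.

```python
# IDs assumed worth watching: reallocated, pending, uncorrectable, CRC errors.
WATCHED_IDS = (5, 197, 198, 199)


def parse_attributes(text):
    """Return {id: (name, raw_value_string)} from a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9])
    return attrs


def nonzero_watched(attrs, watched=WATCHED_IDS):
    """Return the watched attributes whose raw value is not '0'."""
    return {i: attrs[i] for i in watched if i in attrs and attrs[i][1] != "0"}


if __name__ == "__main__":
    # Rows copied from the SMART output in the first post.
    sample = """
      5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
    199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
    """
    print(nonzero_watched(parse_attributes(sample)))  # {}
```

For the table posted above, all four watched counters are zero, which is why SMART alone does not point at the disk. Swapping the cable or the SSD is still the more reliable test.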
 
Hi, I noticed that the SMART values were not changing. I tested the drive and found that it was terribly slow and had a couple of bad sectors. Since I replaced the drive, everything has been fine so far; if the problem does not recur within a few days, I will mark this topic as resolved.


EDIT: Marked as solved. The problem has not recurred since I replaced the disk.
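
For others who run into a drive whose SMART values never change: one way to spot it is to save the attribute table twice, some hours apart, and compare counters that should normally advance on a drive in use. This is only a sketch under assumptions. The IDs 9, 241 and 242 are my guess at counters that ought to move, the units of the LBA counters are vendor-specific, and the function names are hypothetical.

```python
def raw_values(text):
    """Return {id: raw_value} for attribute rows with a plain integer raw value."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[int(fields[0])] = int(fields[9])
    return values


def stalled_counters(before, after, ids=(9, 241, 242)):
    """Return the IDs present in both snapshots whose raw value did not change."""
    old, new = raw_values(before), raw_values(after)
    return [i for i in ids if i in old and i in new and old[i] == new[i]]


# Usage (file names are placeholders for two saved attribute tables):
#   before = open("smart_before.txt").read()
#   after = open("smart_after.txt").read()
#   print(stalled_counters(before, after))
```

The two snapshots could come from `smartctl -A /dev/sda` run a few hours apart. If Power_On_Hours (ID 9) is in the returned list after the machine has been up for hours, the reported SMART data is probably not trustworthy.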
 