Proxmox hanging on various different operations since power cut

ShaunG

Member
Jul 12, 2022
Jul 19 15:19:42 proxmox pvestatd[1516]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5

Jul 19 15:21:31 proxmox kernel: blk_update_request: critical medium error, dev sdc, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0

I've been getting the above and some other errors that are causing everything to hang, all since a power cut earlier today. Any advice much appreciated!

"Disks" won't load in the UI either, and one disk on the left nav shows a grey question mark icon.
 
Sounds like drive /dev/sdc is corrupted or broken by the power failure. You can find out which drive it is via ls -l /dev/disk/by-id/ | grep sdc.
Does smartctl -a /dev/sdc show anything interesting? Does smartctl -t long /dev/sdc succeed (after some time)?
If it is an SSD, a secure format might wipe everything and get it back in working order. I hope you have backups.
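Roughly, the commands I have in mind (run as root, and double-check that sdc is still the right device name if you've rebooted since):

Code:
# map the kernel name to a model/serial so you know which physical disk it is
ls -l /dev/disk/by-id/ | grep sdc

# full SMART report: health, attributes, error log
smartctl -a /dev/sdc

# start an extended (long) offline self-test; it runs in the background on the drive
smartctl -t long /dev/sdc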
 
Hi @leesteken,

SMART overall-health self-assessment test result: PASSED

I'm not sure what else would be interesting? I'm running the long test; it says please wait 116 minutes for the test to complete, but it dropped back to the shell straight after, so how would I know it's complete?

Also, I can see which drive it is, in that it's a Samsung, but that doesn't really help as I cannot see what it's being used for, so I cannot shut it down/remove it etc.?

Sorry for all of the questions, but storage on Proxmox has always confused me!
 
Hi @leesteken,

SMART overall-health self-assessment test result: PASSED
Check the output under Vendor Specific SMART Attributes with Thresholds:. Maybe show them here.
I'm not sure what else would be interesting? I'm running the long test; it says please wait 116 minutes for the test to complete, but it dropped back to the shell straight after, so how would I know it's complete?
You count to 6960 slowly ;-) (116 minutes is 6,960 seconds). Just check after some time with smartctl -a /dev/sdc. There should be some output at the end about this test (once completed), something like: # Extended offline Completed without error 00%.
Also I can see what drive it is, in that it's a Samsung, but it doesn't really help as I cannot see what it's being used for, so I cannot shut down/remove it etc?
Does it not show a serial number in the name? I assumed you knew your hardware configuration. I don't know how to get a single overview of all your drives, ZFS and LVM partitions and their usage. What is the output of cat /etc/pve/storage.cfg? If it takes 116 minutes, it is probably a spinning HDD and not an SSD. What other drives do you have in the system?
If Proxmox itself is having issues and not just a VM or two, then maybe it's easiest to replace the drive and reinstall or restore from backups.
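That said, something along these lines might give you a quick check on the self-test and a rough overview of the disks and LVM layout (a sketch from memory, not specific to your setup):

Code:
# self-test progress/result only (instead of the full -a output)
smartctl -l selftest /dev/sdc

# rough overview of disks, partitions and what sits on top of them
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

# which physical disks belong to which LVM volume group, and the logical volumes
pvs
lvs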
 

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       22004
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   091   086   025    Pre-fail  Always       -       2856
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3517
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       45111
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       401
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1835
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       7102
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   047   000    Old_age   Always       -       34 (Min/Max 2/54)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       6701
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       401
225 Load_Cycle_Count        0x0032   091   091   000    Old_age   Always       -       94668

90% to go on the extended test, so likely an HDD as you say. I only use SSDs for local storage, so I believe this drive may be used for camera storage, but it's impossible to tell as the UI just shows "connection timeout" when trying to view the storage tab... not much help.

storage.cfg below; again, it doesn't show the drives or anything:

Code:
dir: local
    path /var/lib/vz
    content backup,iso,vztmpl
    prune-backups keep-last=6
    shared 0

lvmthin: local-lvm
    thinpool data
    vgname pve
    content images,rootdir

lvm: zm-lvm
    vgname zmdata
    content rootdir,images
    shared 0

lvm: zm2-lvm
    vgname zmdata2
    content images,rootdir
    shared 0

lvm: local-lvm2
    vgname pve2
    content rootdir,images
    shared 0

cifs: smb
    path /mnt/pve/smb
    server 192.168.1.x
    share proxmox
    content backup,iso,images
    nodes proxmox
    prune-backups keep-last=5
    username proxmox
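
I'm guessing something like the below might show which vgname above actually lives on /dev/sdc, if the LVM tools still respond? (Not sure if that's the right approach.)

Code:
# list physical volumes with their volume group and size, to match /dev/sdc
# (or a partition on it) against the vgname entries in storage.cfg above
pvs -o pv_name,vg_name,pv_size

# same idea from the block-device side, just for the suspect disk
lsblk -o NAME,SIZE,TYPE,FSTYPE /dev/sdc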