LVM storage is misbehaving after the latest update

Ramphex

Apr 30, 2023
So I'm running a Proxmox setup with 3 VMs. Since the latest update, I can't start any of the VMs without getting a flood of errors. If I boot with the VMs set not to auto-start, the storage is visible and all of the virtual drives are listed, but as soon as I try to start any of the VMs I get errors including "scsi error badly formed scsi parameters", and the storage status switches from Active: Yes to Active: No.

I'm not even sure what to do at this point or what could have caused this kind of issue. I did make a backup of the drive with ddrescue, just in case.

Also, on boot it now gets stuck for about 30 seconds on a screen displaying this:

[1.511026] ata2.00: failed to enable AA (error_mask=0x1)
[1.513811] ata2.00: failed to enable AA (error_mask=0x1)

Found volume group "VMs" using metadata type lvm2
Found volume group "pve" using metadata type lvm2

4 logical volume(s) in volume group "VMs" now active
3 logical volume(s) in volume group "pve" now active

/dev/mapper/pve-root: clean, 145398/3653632 files, 4753633/14614528 blocks

Let me know what information would be relevant to post, as there's a lot of different output I could copy and paste here.

Thanks in advance.
 
I tried booting a different kernel from GRUB; the problem persisted.

root@pve:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.107-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-2
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
Perhaps the problem really is a hardware failure; that does happen.
Search for the error message, e.g. https://askubuntu.com/questions/341502/how-to-fix-error-ata1-00-failed-to-enable-aa-0x1-error-mask


I did try that, without any luck. I swapped out the cables and tried different SATA ports. It also won't let me run fsck; it just shows this:

root@pve:~# fsck -y /dev/sdb
fsck from util-linux 2.36.1
root@pve:~#

Adding the noncq flag removes that ATA error, but boot still hangs on the /dev/mapper/pve-root: clean line for about 60 seconds.

Interestingly enough, none of these issues appeared until the latest update. There weren't any glitches or slowdowns before it: I updated, restarted, and all 3 VMs were down.

Any suggestions on troubleshooting the drive?

root@pve:~# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 500GB
Serial Number: S62ANZ0R418389E
LU WWN Device Id: 5 002538 fc140e553
Firmware Version: SVT02B6Q
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 2 18:35:29 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Read SMART Data failed: scsi error badly formed scsi parameters

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: scsi error badly formed scsi parameters
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

Read SMART Log Directory failed: scsi error badly formed scsi parameters

Read SMART Error Log failed: scsi error badly formed scsi parameters

Read SMART Self-test Log failed: scsi error badly formed scsi parameters

Selective Self-tests/Logging not supported

root@pve:~#
EDIT: Even with the VM storage drive unplugged, the system still gets stuck on the pve-root: clean message for at least 60 seconds at boot.

EDIT2: Some more data

root@pve:~# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
Model Number: Samsung SSD 870 EVO 500GB
Serial Number: S62ANZ0R418389E
Firmware Revision: SVT02B6Q
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x005e)
Supported: 11 8 7 6 5
Likely used: 11
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 976773168
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 476940 MBytes
device size with M = 1000*1000: 500107 MBytes (500 GB)
cache/buffer size = unknown
Form Factor: 2.5 inch
Nominal Media Rotation Rate: Solid State Device
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 1 Current = 1
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
Write-Read-Verify feature set
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Asynchronous notification (eg. media change)
* Software settings preservation
Device Sleep (DEVSLP)
unknown 78[10]
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
* Device encrypts all user data
* DOWNLOAD MICROCODE DMA command
* SET MAX SETPASSWORD/UNLOCK DMA commands
* WRITE BUFFER DMA command
* READ BUFFER DMA command
* Data Set Management TRIM supported (limit 8 blocks)
* Deterministic read ZEROs after TRIM
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
4min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538fc140e553
NAA : 5
IEEE OUI : 002538
Unique ID : fc140e553
Device Sleep:
DEVSLP Exit Timeout (DETO): 50 ms (drive)
Minimum DEVSLP Assertion Time (MDAT): 30 ms (drive)
Checksum: correct
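For what it's worth, the capacity figures above are self-consistent: the LBA48 sector count times the 512-byte logical sector size gives exactly the User Capacity smartctl printed (500,107,862,016 bytes), and both MByte figures hdparm shows follow from that by integer division. So the drive's identify data at least looks sane and reports its full nominal size. A quick Python check of that arithmetic, using only numbers copied from the hdparm output (the helper name capacity_bytes is just for illustration):

```python
# Figures copied from the "hdparm -I /dev/sdb" output above.
LOGICAL_SECTOR_BYTES = 512       # "Logical Sector size: 512 bytes"
LBA48_SECTORS = 976_773_168      # "LBA48 user addressable sectors"


def capacity_bytes(sectors: int, sector_bytes: int = LOGICAL_SECTOR_BYTES) -> int:
    """Total addressable capacity in bytes (hypothetical helper)."""
    return sectors * sector_bytes


total = capacity_bytes(LBA48_SECTORS)
mib = total // (1024 * 1024)  # hdparm's "M = 1024*1024" figure
mb = total // (1000 * 1000)   # hdparm's "M = 1000*1000" figure

print(f"{total:,} bytes")               # 500,107,862,016 bytes
print(f"{mib} MBytes (M = 1024*1024)")  # 476940 MBytes (M = 1024*1024)
print(f"{mb} MBytes (M = 1000*1000)")   # 500107 MBytes (M = 1000*1000)
```

If an HPA or similar limit had been set, the LBA48 count (and therefore all three numbers) would come out smaller than this.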
 
Okay. After restoring the ddrescue backup to a new SSD, my VMs are back up and running. The only remaining problem is this line during boot, which holds up startup for 60+ seconds:

/dev/mapper/pve-root: clean, 80742/3653631 files, 3739911/14614528 blocks
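For context, that line is e2fsck's one-line summary for the root filesystem: pve-root is marked clean (so no full check is run), followed by used/total inodes ("files") and used/total blocks. Since the pause comes after this line is printed, the check itself has presumably already finished by then. Decoded, the numbers say the root filesystem is roughly 2% used by inodes and 26% by blocks. A small Python sketch that pulls those figures out of such a line (FsckSummary and parse_fsck_summary are made-up names for this example, and the regex assumes exactly the "clean, X/Y files, A/B blocks" format shown in this thread):

```python
import re
from typing import NamedTuple

# Assumed format, taken from the boot messages in this thread:
# "/dev/mapper/pve-root: clean, 80742/3653631 files, 3739911/14614528 blocks"
_SUMMARY = re.compile(
    r"^(?P<device>\S+): clean, "
    r"(?P<inodes_used>\d+)/(?P<inodes_total>\d+) files, "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)


class FsckSummary(NamedTuple):
    device: str
    inodes_used: int
    inodes_total: int
    blocks_used: int
    blocks_total: int

    @property
    def inode_pct(self) -> float:
        return 100 * self.inodes_used / self.inodes_total

    @property
    def block_pct(self) -> float:
        return 100 * self.blocks_used / self.blocks_total


def parse_fsck_summary(line: str) -> FsckSummary:
    """Parse a 'clean' summary line; raise ValueError for anything else."""
    m = _SUMMARY.match(line.strip())
    if m is None:
        raise ValueError(f"not an e2fsck 'clean' summary: {line!r}")
    return FsckSummary(
        m["device"],
        int(m["inodes_used"]),
        int(m["inodes_total"]),
        int(m["blocks_used"]),
        int(m["blocks_total"]),
    )


s = parse_fsck_summary(
    "/dev/mapper/pve-root: clean, 80742/3653631 files, 3739911/14614528 blocks"
)
print(f"{s.device}: {s.inode_pct:.1f}% inodes, {s.block_pct:.1f}% blocks used")
# /dev/mapper/pve-root: 2.2% inodes, 25.6% blocks used
```

The block count is in filesystem blocks, not bytes, so turning it into a size would also need the block size (for example from tune2fs -l), which isn't shown in this thread.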
 
