LVM storage is misbehaving after the latest update

Ramphex

New Member
Apr 30, 2023
I'm running a Proxmox setup with 3 VMs. After the latest update, I can't start any of them without getting a ton of errors. If I boot with the VMs set to not auto-start, the storage is visible and all of the virtual drives are listed, but as soon as I try to start any VM, I get errors including "scsi error badly formed scsi parameters", and the storage status switches from Active: Yes to Active: No.

Not even sure what to do at this point or what could've caused this type of issue. I did create a backup of the drive with ddrescue just in case.

Also, on boot it now hangs for about 30 seconds on a screen displaying this:

[1.511026] ata2.00: failed to enable AA (error_mask=0x1)
[1.513811] ata2.00: failed to enable AA (error_mask=0x1)

Found volume group "VMs" using metadata type lvm2
Found volume group "pve" using metadata type lvm2

4 logical volume(s) in volume group "VMs" now active
3 logical volume(s) in volume group "pve" now active

/dev/mapper/pve-root: clean, 145398/3653632 files, 4753633/14614528 blocks
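A note on the error_mask=0x1 in the ata2.00 lines above: libata reports errors as a bitmask, so the value can be decoded bit by bit. Below is a minimal sketch, assuming the bit names are the ones I recall from enum ata_completion_errors in include/linux/libata.h (please verify against your own kernel source). The helper name decode_error_mask is made up for this example.

```python
# Bit positions and names as recalled from enum ata_completion_errors in
# include/linux/libata.h. This table is an assumption; check your kernel source.
AC_ERR_BITS = {
    0: "AC_ERR_DEV (device reported error)",
    1: "AC_ERR_HSM (host state machine violation)",
    2: "AC_ERR_TIMEOUT (timeout)",
    3: "AC_ERR_MEDIA (media error)",
    4: "AC_ERR_ATA_BUS (ATA bus error)",
    5: "AC_ERR_HOST_BUS (host bus error)",
    6: "AC_ERR_SYSTEM (system error)",
    7: "AC_ERR_INVALID (invalid argument)",
    8: "AC_ERR_OTHER (unknown)",
    9: "AC_ERR_NODEV_HINT (polling device detection hint)",
    10: "AC_ERR_NCQ (marker for offending NCQ command)",
}


def decode_error_mask(mask):
    """Return the names of all bits set in a libata error_mask value."""
    return [name for bit, name in AC_ERR_BITS.items() if mask & (1 << bit)]


if __name__ == "__main__":
    # 0x1 is the value from the boot messages quoted above.
    for name in decode_error_mask(0x1):
        print(name)
```

If that table is right, 0x1 means the drive itself rejected the "enable AA" (DMA Setup Auto-Activate) request, rather than the link timing out or the bus failing.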

Let me know what information would be relevant to post, as there's a lot of output I could copy and paste here.

Thanks in advance.
 
I tried booting with a different kernel from GRUB, but the problem persisted.

root@pve:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.107-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-2
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
Perhaps the problem really is a hardware failure; that does happen.
Search for the error message, e.g. https://askubuntu.com/questions/341502/how-to-fix-error-ata1-00-failed-to-enable-aa-0x1-error-mask


I did try that without any luck. I swapped out the cables and tried different SATA ports. It also won't let me run fsck; the command just returns without printing anything:

root@pve:~# fsck -y /dev/sdb
fsck from util-linux 2.36.1
root@pve:~#

Adding the noncq flag removes that ATA error, but boot still hangs on the /dev/mapper/pve-root: clean line for 60ish seconds.
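One way to narrow down a hang like this is to look for the largest jump between consecutive timestamps in the boot log, since the message printed after the jump usually points at whatever was being waited on. A rough sketch, assuming lines that start with a bracketed seconds-since-boot timestamp (as in dmesg output); the function name largest_gap and the third sample line are made up for illustration.

```python
import re

# Matches "[1.511026] text" as well as the space-padded "[    1.511026] text".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the longest
    pause between consecutive timestamped lines, or None if fewer than two
    timestamped lines are present."""
    best = None
    prev = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp
        current = (float(match.group(1)), match.group(2))
        if prev is not None:
            gap = current[0] - prev[0]
            if best is None or gap > best[0]:
                best = (gap, prev[1], current[1])
        prev = current
    return best


if __name__ == "__main__":
    sample = [
        "[1.511026] ata2.00: failed to enable AA (error_mask=0x1)",
        "[1.513811] ata2.00: failed to enable AA (error_mask=0x1)",
        "[64.000000] hypothetical later message",  # invented for the example
    ]
    print(largest_gap(sample))
```

The same function could be fed the saved output of dmesg for a real boot. If the pause is in userspace rather than the kernel, systemd-analyze blame may be more informative.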

Interestingly enough, there were none of these issues until the latest update. There hadn't been any glitches or slowdowns. I updated, restarted, and all 3 VMs were down.

Any suggestions on troubleshooting the drive?

root@pve:~# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 870 EVO 500GB
Serial Number: S62ANZ0R418389E
LU WWN Device Id: 5 002538 fc140e553
Firmware Version: SVT02B6Q
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 2 18:35:29 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Read SMART Data failed: scsi error badly formed scsi parameters

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: scsi error badly formed scsi parameters
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

Read SMART Log Directory failed: scsi error badly formed scsi parameters

Read SMART Error Log failed: scsi error badly formed scsi parameters

Read SMART Self-test Log failed: scsi error badly formed scsi parameters

Selective Self-tests/Logging not supported

root@pve:~#
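For comparing runs (different cable, port, or kernel), it can help to boil the smartctl output down to just the requests that failed and the reason given. A small sketch, assuming the "<request> failed: <reason>" line shape seen in the output above; the function name failed_reads is made up, and the sample lines are copied from that output.

```python
import re

# "<what> failed: <why>", with an optional trailing "command" before "failed".
FAILED = re.compile(r"^(?P<what>.+?) (?:command )?failed: (?P<why>.+)$")

SAMPLE = """\
Read SMART Data failed: scsi error badly formed scsi parameters
SMART Status command failed: scsi error badly formed scsi parameters
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.
Read SMART Log Directory failed: scsi error badly formed scsi parameters
Read SMART Error Log failed: scsi error badly formed scsi parameters
Read SMART Self-test Log failed: scsi error badly formed scsi parameters
"""


def failed_reads(smartctl_output):
    """Return a list of (request, reason) pairs for every 'failed:' line."""
    results = []
    for line in smartctl_output.splitlines():
        match = FAILED.match(line.strip())
        if match:
            results.append((match.group("what"), match.group("why")))
    return results


if __name__ == "__main__":
    for what, why in failed_reads(SAMPLE):
        print(f"{what}: {why}")
```

Here every SMART request fails with the same reason while the identity section still reads fine, which suggests the drive answers basic identification but rejects the SMART commands themselves.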
EDIT: Unplugging the VM storage drive still results in the system getting stuck on the pve-root: clean message for at least 60 seconds at boot.

EDIT2: Some more data

root@pve:~# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
Model Number: Samsung SSD 870 EVO 500GB
Serial Number: S62ANZ0R418389E
Firmware Revision: SVT02B6Q
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x005e)
Supported: 11 8 7 6 5
Likely used: 11
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 976773168
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 476940 MBytes
device size with M = 1000*1000: 500107 MBytes (500 GB)
cache/buffer size = unknown
Form Factor: 2.5 inch
Nominal Media Rotation Rate: Solid State Device
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 1 Current = 1
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
Write-Read-Verify feature set
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Asynchronous notification (eg. media change)
* Software settings preservation
Device Sleep (DEVSLP)
unknown 78[10]
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
* Device encrypts all user data
* DOWNLOAD MICROCODE DMA command
* SET MAX SETPASSWORD/UNLOCK DMA commands
* WRITE BUFFER DMA command
* READ BUFFER DMA command
* Data Set Management TRIM supported (limit 8 blocks)
* Deterministic read ZEROs after TRIM
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
4min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538fc140e553
NAA : 5
IEEE OUI : 002538
Unique ID : fc140e553
Device Sleep:
DEVSLP Exit Timeout (DETO): 50 ms (drive)
Minimum DEVSLP Assertion Time (MDAT): 30 ms (drive)
Checksum: correct
 
Okay. So after restoring the ddrescue backup to a new SSD, my VMs are back up and running. The only remaining problem is this line during boot, which holds up the startup for 60+ seconds.

/dev/mapper/pve-root: clean, 80742/3653631 files, 3739911/14614528 blocks
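For what it's worth, that line is the normal e2fsck summary for a filesystem that was marked clean, so it is probably just the last thing printed before whatever is actually taking the time. A quick sketch that parses the summary into usage figures, assuming the "device: clean, used/total files, used/total blocks" shape shown above; the function name parse_fsck_summary is made up for this example.

```python
import re

SUMMARY = re.compile(
    r"^(?P<dev>\S+): clean, "
    r"(?P<files_used>\d+)/(?P<files_total>\d+) files, "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks$"
)


def parse_fsck_summary(line):
    """Parse an e2fsck 'clean' summary line into a dict, or return None
    if the line does not have the expected shape."""
    match = SUMMARY.match(line.strip())
    if match is None:
        return None
    files_used = int(match.group("files_used"))
    files_total = int(match.group("files_total"))
    blocks_used = int(match.group("blocks_used"))
    blocks_total = int(match.group("blocks_total"))
    return {
        "device": match.group("dev"),
        "files_used": files_used,
        "files_total": files_total,
        "blocks_used": blocks_used,
        "blocks_total": blocks_total,
        "inode_use_pct": 100.0 * files_used / files_total,
        "block_use_pct": 100.0 * blocks_used / blocks_total,
    }


if __name__ == "__main__":
    line = "/dev/mapper/pve-root: clean, 80742/3653631 files, 3739911/14614528 blocks"
    info = parse_fsck_summary(line)
    print(f"{info['device']}: {info['inode_use_pct']:.1f}% inodes, "
          f"{info['block_use_pct']:.1f}% blocks in use")
```

Nothing in those numbers suggests the root filesystem is full or damaged, so the delay is more likely in a service or device wait that starts right after the check.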