Storage become unknown

tweety · Monday at 16:53

I have this server:

INTEL - NUC Wall Street NUC12WSHI3 Barebone L6 NO Cord
Crucial CT4G4SFS624A 32GB Memory
Crucial P3 1TB M.2 PCIe Gen3 NVMe Internal SSD, up to 3500MB/s - CT1000P3SSD8
Transcend 1TB SATA III 6Gb/s Internal 2.5 Inch SSD220Q 2.5 Inch

It runs Proxmox 8.4.1.

After a while the storage sata1 is no longer available to my VM and shows a question mark with the status "unknown".
It also disappears from the node's Disks/LVM view.
The only solution I have found is to restart the server.
But after a while the same problem comes back.

Can someone help me understand what's happening and what to do? (I'm new to Proxmox.)

Thanks.

l.leahu-vladucu · 2025-05-06T09:08:08+0200

Hello tweety! Could you please check the journal around the time the issue occurs? Please use:

Code:

journalctl --since <TIME> --until <TIME> > journal.txt

Please check if you have any storage/SATA-related warnings or errors. You can attach the journal to this thread.

Maybe this happens due to some power saving features, or maybe due to other issues. The journal should help us with debugging further.
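
To narrow a large journal down to the storage-related lines, something like the following sketch may help. The helper name sata_errors and the timestamps in the comments are assumptions for illustration only; the pattern keeps kernel messages that mention an ATA port, an sd device, or a block-layer I/O error.

```shell
# Hypothetical helper: print only the kernel lines of a saved journal that
# mention an ATA port (ataN / ataN.NN), an sd device, or an I/O error.
sata_errors() {
    grep -E 'kernel: (ata[0-9]+(\.[0-9]+)?:|sd [0-9:]+|I/O error)' "$1"
}

# Example usage on the Proxmox host (timestamps are placeholders):
#   journalctl --since "2025-05-05 20:20" --until "2025-05-06 09:00" > journal.txt
#   sata_errors journal.txt
```

The full journal is still worth attaching, since lines just before the first error often give useful context.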

tweety · 2025-05-06T11:02:26+0200

Hello. Thanks for your reply.
I rebooted the server yesterday at about 20:20.
And this morning the disk has unknown status again.
I don't know if it is related, but the entire disk is linked to an openmediavault vm.
Inside OMV the disk is formatted with btrfs.
See the attached log, which covers the time since yesterday's reboot.

l.leahu-vladucu · 2025-05-06T12:08:40+0200

Thanks for the journal. It confirms that there are multiple I/O errors. Here are the ones from the point where the issues start:

May 06 07:59:59 proxmox01 kernel: ata2.00: exception Emask 0x0 SAct 0x700 SErr 0x0 action 0x6 frozen
May 06 07:59:59 proxmox01 kernel: ata2.00: failed command: WRITE FPDMA QUEUED
May 06 07:59:59 proxmox01 kernel: ata2.00: cmd 61/20:40:80:30:21/00:00:00:00:00/40 tag 8 ncq dma 16384 out
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
May 06 07:59:59 proxmox01 kernel: ata2.00: status: { DRDY }
May 06 07:59:59 proxmox01 kernel: ata2.00: failed command: WRITE FPDMA QUEUED
May 06 07:59:59 proxmox01 kernel: ata2.00: cmd 61/20:48:a0:30:21/00:00:00:00:00/40 tag 9 ncq dma 16384 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 06 07:59:59 proxmox01 kernel: ata2.00: status: { DRDY }
May 06 07:59:59 proxmox01 kernel: ata2.00: failed command: WRITE FPDMA QUEUED
May 06 07:59:59 proxmox01 kernel: ata2.00: cmd 61/20:50:c0:30:21/00:00:00:00:00/40 tag 10 ncq dma 16384 out
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
May 06 07:59:59 proxmox01 kernel: ata2.00: status: { DRDY }
May 06 07:59:59 proxmox01 kernel: ata2: hard resetting link
May 06 08:00:04 proxmox01 kernel: ata2: link is slow to respond, please be patient (ready=0)
May 06 08:00:09 proxmox01 kernel: ata2: hard resetting link
May 06 08:00:14 proxmox01 kernel: ata2: link is slow to respond, please be patient (ready=0)
May 06 08:00:19 proxmox01 kernel: ata2: hard resetting link
May 06 08:00:24 proxmox01 kernel: ata2: link is slow to respond, please be patient (ready=0)
May 06 08:00:54 proxmox01 kernel: ata2: limiting SATA link speed to 3.0 Gbps
May 06 08:00:54 proxmox01 kernel: ata2: hard resetting link
May 06 08:00:59 proxmox01 kernel: ata2: hardreset failed
May 06 08:00:59 proxmox01 kernel: ata2: reset failed, giving up
May 06 08:00:59 proxmox01 kernel: ata2.00: disable device
May 06 08:00:59 proxmox01 kernel: ata2: EH complete
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#11 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=90s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#11 CDB: Write(10) 2a 00 00 21 30 80 00 00 20 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 2175104 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=90s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#12 CDB: Write(10) 2a 00 00 21 30 a0 00 00 20 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 2175136 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=90s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#13 CDB: Write(10) 2a 00 00 21 30 c0 00 00 20 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 2175168 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#16 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#26 CDB: Write(10) 2a 00 00 21 30 80 00 00 60 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 2175104 op 0x1:(WRITE) flags 0x8800 phys_seg 12 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#27 CDB: Write(10) 2a 00 00 21 30 80 00 00 60 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 2175104 op 0x1:(WRITE) flags 0x8800 phys_seg 12 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 9 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#5 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#5 CDB: Write(10) 2a 00 00 21 30 80 00 00 60 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 2175104 op 0x1:(WRITE) flags 0x8800 phys_seg 12 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#23 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#28 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
May 06 08:00:59 proxmox01 kernel: sd 1:0:0:0: [sda] tag#28 CDB: Write(10) 2a 00 00 21 30 80 00 00 60 00
May 06 08:00:59 proxmox01 kernel: I/O error, dev sda, sector 2175104 op 0x1:(WRITE) flags 0x8800 phys_seg 12 prio class 0

Furthermore, smartd reports the following:

May 06 08:22:11 proxmox01 smartd[760]: Device: /dev/sda [SAT], is in SLEEP mode, suspending checks

On its own, this message would suggest that the issue is related to a power-saving feature. However, the earlier errors point to something different, so this may not be the actual cause.
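
As a sanity check, the failed SCSI commands and the reported sectors in the log above can be matched up by decoding the CDB bytes. The sketch below is only an illustration (decode_cdb10 is a made-up helper name); it assumes a 10-byte Read(10)/Write(10) CDB, where bytes 2 to 5 hold the LBA and bytes 7 to 8 the transfer length, and 512-byte sectors.

```shell
# Hypothetical helper: decode LBA and transfer length from a 10-byte
# Read(10)/Write(10) CDB given as space-separated hex bytes.
decode_cdb10() {
    set -- $1                      # split the string into $1..$10
    lba=$(( 0x$3$4$5$6 ))          # CDB bytes 2-5: logical block address
    len=$(( 0x$8$9 ))              # CDB bytes 7-8: number of sectors
    printf 'lba=%d sectors=%d bytes=%d\n' "$lba" "$len" $(( len * 512 ))
}

decode_cdb10 "2a 00 00 21 30 80 00 00 20 00"
# prints: lba=2175104 sectors=32 bytes=16384
```

The decoded LBA matches the "sector 2175104" I/O error, and 16384 bytes matches the "dma 16384 out" of the timed-out WRITE FPDMA QUEUED command, so these are the same writes failing once the link was given up on.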

Unfortunately, I/O errors can happen due to many reasons, so please try the following:

Unplug and reconnect the SATA cable on both ends - make sure that it is seated properly
Try using a different SATA cable
Try using a different SATA port on the motherboard
Check whether there are any SATA-related power saving features in the BIOS and try to disable them - at least for now, in order to see if the issue still occurs.
Check the S.M.A.R.T. values of the disk using smartctl -a /dev/sda

If none of this helps, the drive might be dying and might need to be replaced. Please make sure that you have backups of your data to avoid data loss.
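
Besides the BIOS, the SATA link power management policy can also be inspected from the running system. The following is a minimal sketch, assuming the kernel exposes link_power_management_policy under /sys/class/scsi_host; the function name show_lpm_policy is made up for illustration, and the directory argument exists only so the function can be tried against a test tree.

```shell
# Hypothetical helper: print the link power management policy of each
# SCSI/SATA host found under the given sysfs directory.
show_lpm_policy() {
    root="${1:-/sys/class/scsi_host}"
    for f in "$root"/host*/link_power_management_policy; do
        [ -r "$f" ] || continue    # skip hosts without this attribute
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
    return 0
}

# Example usage on the Proxmox host:
#   show_lpm_policy
# To turn link power management off for one host until the next reboot (as root):
#   echo max_performance > /sys/class/scsi_host/host1/link_power_management_policy
```

Judging by the "sd 1:0:0:0" prefix in the journal, the affected disk appears to sit on host1, but please verify that on your system before changing anything. If the errors stop with max_performance, the setting would still need to be made persistent.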

tweety · 2025-05-06T16:00:31+0200

My server is an all-in-one Intel NUC.
The SATA cable is a special flat cable. It is not easy to replace.
And there is only one SATA port on the motherboard.

I'll look into the power-saving settings.

After rebooting again, here is the smartctl result:

Code:

root@proxmox01:~# smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-10-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     TS1TSSD220Q
Serial Number:    H783800372
LU WWN Device Id: 5 7c3548 20b8e2834
Firmware Version: VD0R0230
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-3, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May  6 15:54:01 2025 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    1) seconds.
Offline data collection
capabilities:                    (0x59) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       16531
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
160 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
161 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       48
163 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       51
164 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       185153
165 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1546
166 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       33
167 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       391
168 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       600
169 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       34
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       34
194 Temperature_Celsius     0x0022   100   100   030    Old_age   Always       -       51
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       645
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
200 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       52
201 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   000    Old_age   Always       -       99
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       288533
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       101385
245 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       13344323
250 Read_Error_Retry_Rate   0x0032   100   100   000    Old_age   Always       -       324596

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

gfngfn256 · 2025-05-06T16:58:43+0200

tweety said:
NUC12WSHI3

tweety said:
4 x Crucial CT4G4SFS624A 4GB Memory

Don't think that is possible on that NUC. It seems to only have two 260-pin DDR4 SO-DIMM sockets. Maybe that's a typo & should read "2 x"?

tweety said:
the entire disk is linked to an openmediavault vm

How did you do that?

tweety · 2025-05-06T18:08:43+0200

@gfngfn256 yes indeed a big typo. It is 1x 32gb. I corrected my initial post.
For the vm I just added the disk to the vm.

gfngfn256 · 2025-05-06T18:57:37+0200

tweety said:
yes indeed a big typo. It is 1x 32gb.

That cannot be considered a typo:

4 x Crucial CT4G4SFS624A 4GB = 16GB, NOT 32GB!
CT4G4SFS624A (still in your post) is AFAIK a 4GB SODIMM

Do you know which SODIMM is actually in there? Is it rated/compatible for the HW?
Anyway, whatever it is, I probably don't need to tell you that 2x 16GB would have been the correct way to populate those banks for 32GB.

tweety said:
For the vm I just added the disk to the vm.

I don't see what you have added to enhance my knowledge of how you "added the disk".
Could you provide output for:

Code:

qm config <vmid>   #replacing the <vmid> with the actual VMID

tweety · 2025-05-06T20:11:02+0200

Don't be so hard with me

I never said I was a hardware specialist. And I'm sure I'm definitely not one

Code:

agent: 1
boot: order=scsi0;ide2;net0
cores: 2
cpu: x86-64-v2-AES
ide2: local:iso/openmediavault_7.4.17-amd64.iso,media=cdrom,size=940M
memory: 4096
meta: creation-qemu=9.2.0,ctime=1746260556
name: omv
net0: virtio=BC:24:11:99:CA:E2,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-101-disk-0,iothread=1,size=64G
scsi1: sata1:vm-101-disk-0,iothread=1,size=931G
scsihw: virtio-scsi-single
smbios1: uuid=5a95eae1-81e5-4689-8d4a-156d820d926b
sockets: 4
vmgenid: 0433dc1f-f3a0-4c9c-80b7-0fbd80524a7d

gfngfn256 · 2025-05-06T20:35:29+0200

tweety said:
Don't be so hard with me

Sorry, was not trying to be, just trying to help get to the source of the problem.

tweety said:
scsi1: sata1:vm-101-disk-0,iothread=1,size=931G

I imagine this is the disk you refer to as:

tweety said:
the entire disk is linked to an openmediavault vm.

Well that is not exactly what you've done. You appear to have a Proxmox Storage called sata1 & on that a volume named vm-101-disk-0 that has been added to the VM 101.

In order to help, could you provide output (from the Proxmox host) for the following:

Code:

lsblk

cat /etc/pve/storage.cfg

tweety · 2025-05-07T08:53:13+0200

Code:

NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                            8:0    0 931.5G  0 disk
└─sata1-vm--101--disk--0     252:11   0   931G  0 lvm 
nvme0n1                      259:0    0 931.5G  0 disk
├─nvme0n1p1                  259:1    0  1007K  0 part
├─nvme0n1p2                  259:2    0     1G  0 part /boot/efi
└─nvme0n1p3                  259:3    0 930.5G  0 part
  ├─pve-swap                 252:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 252:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           252:2    0   8.1G  0 lvm 
  │ └─pve-data-tpool         252:4    0 794.3G  0 lvm 
  │   ├─pve-data             252:5    0 794.3G  1 lvm 
  │   ├─pve-vm--100--disk--0 252:6    0     4M  0 lvm 
  │   ├─pve-vm--100--disk--1 252:7    0    32G  0 lvm 
  │   ├─pve-vm--102--disk--0 252:8    0    32G  0 lvm 
  │   ├─pve-vm--106--disk--0 252:9    0    32G  0 lvm 
  │   └─pve-vm--101--disk--0 252:10   0    64G  0 lvm 
  └─pve-data_tdata           252:3    0 794.3G  0 lvm 
    └─pve-data-tpool         252:4    0 794.3G  0 lvm 
      ├─pve-data             252:5    0 794.3G  1 lvm 
      ├─pve-vm--100--disk--0 252:6    0     4M  0 lvm 
      ├─pve-vm--100--disk--1 252:7    0    32G  0 lvm 
      ├─pve-vm--102--disk--0 252:8    0    32G  0 lvm 
      ├─pve-vm--106--disk--0 252:9    0    32G  0 lvm 
      └─pve-vm--101--disk--0 252:10   0    64G  0 lvm

Code:

dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvm: sata1
        vgname sata1
        content images,rootdir
        nodes proxmox01
        shared 0

gfngfn256 · 2025-05-07T14:08:16+0200

As I thought, you have not given that disk directly to the OMV VM. Instead, you created an LVM Proxmox host storage (called sata1) & then used the entire volume group to create a single virtual disk (the LV shown as vm--101--disk--0). Using the entire volume group for one LV may have its own issues.

I must point out I have no experience either with your HW or with an OMV VM, but the way you've done this is probably not optimal. You should rather pass the disk through using the /dev/disk/by-id/ method. There are many online tutorials for this, but here is the official Proxmox wiki on the subject.

If you take this route, you should probably first remove the scsi1: sata1:vm-101-disk-0 entry from the VM (detach & then delete), then remove the Storage named sata1 (in the GUI: Datacenter, Storage, click on sata1 & Remove) & finally wipe that disk before passing it through to the VM as described above.
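
For the passthrough step itself, here is a rough sketch of how the stable name could be looked up. The helper by_id_links is hypothetical and the directory argument exists only so it can be tried on a test tree; the qm set line in the comments follows the form used in the Proxmox wiki, with VMID 101 and slot scsi1 taken from your config and the disk name left as a placeholder.

```shell
# Hypothetical helper: list the /dev/disk/by-id names that resolve to a
# given block device.
by_id_links() {
    dev="$(readlink -f "$1")"
    dir="${2:-/dev/disk/by-id}"
    for l in "$dir"/*; do
        [ -e "$l" ] || continue    # skip dangling links and empty globs
        [ "$(readlink -f "$l")" = "$dev" ] && printf '%s\n' "$l"
    done
    return 0
}

# Example usage on the Proxmox host, after the old volume and storage are gone:
#   by_id_links /dev/sda
#   qm set 101 -scsi1 /dev/disk/by-id/<one of the names printed above>
```

Using the by-id name rather than /dev/sda keeps the VM config valid even if device letters change between boots.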

You may want to start fresh, especially with that OMV VM.

It remains to be seen how well your HW holds up in a hypervisor scenario.

Good luck.
