Hi,
This morning one of my NVMe drives "disappeared" from the server, but the server's iLO still reports it as online and healthy.
The drive is a WDC Gold SN600, and its device name is /dev/nvme1n1. I'm using this drive in a Ceph pool.
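For reference, this is roughly how I'm checking which OSD maps to that drive (read-only commands, nothing changed on the OSD yet):
Code:
# list the LVM-backed OSDs and the devices behind them
ceph-volume lvm list
# check which OSDs the cluster currently sees as up/down
ceph osd tree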
Here is my pveversion -v:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
Here is the lsblk output (rbd devices omitted):
Code:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part /boot/efi
└─sda3 8:3 0 446.6G 0 part
├─pve-swap 253:1 0 8G 0 lvm [SWAP]
├─pve-root 253:2 0 96G 0 lvm /
├─pve-data_tmeta 253:3 0 3.3G 0 lvm
│ └─pve-data 253:5 0 320.1G 0 lvm
└─pve-data_tdata 253:4 0 320.1G 0 lvm
└─pve-data 253:5 0 320.1G 0 lvm
nvme0n1 259:0 0 1.8T 0 disk
└─nvme0n1p1 259:5 0 1.8T 0 part
├─vg--local3-lv--local3_tmeta 253:0 0 112M 0 lvm
│ └─vg--local3-lv--local3-tpool 253:7 0 1.8T 0 lvm
│ ├─vg--local3-lv--local3 253:12 0 1.8T 0 lvm
│ ├─vg--local3-vm--6000--disk--0 253:13 0 20G 0 lvm
│ ├─vg--local3-vm--6006--disk--0 253:14 0 20G 0 lvm
│ ├─vg--local3-vm--6007--disk--0 253:15 0 20G 0 lvm
│ ├─vg--local3-vm--6010--disk--0 253:16 0 20G 0 lvm
│ ├─vg--local3-vm--6011--disk--0 253:17 0 20G 0 lvm
│ ├─vg--local3-vm--6013--disk--0 253:18 0 20G 0 lvm
│ ├─vg--local3-vm--6026--disk--2 253:19 0 15G 0 lvm
│ ├─vg--local3-vm--6004--disk--0 253:20 0 20G 0 lvm
│ ├─vg--local3-vm--6018--disk--0 253:21 0 20G 0 lvm
│ ├─vg--local3-vm--6023--disk--0 253:22 0 20G 0 lvm
│ ├─vg--local3-vm--6025--disk--0 253:23 0 20G 0 lvm
│ ├─vg--local3-vm--6025--disk--1 253:24 0 25G 0 lvm
│ ├─vg--local3-vm--6026--disk--0 253:25 0 20G 0 lvm
│ ├─vg--local3-vm--6026--disk--1 253:26 0 60G 0 lvm
│ ├─vg--local3-vm--1003--disk--0 253:27 0 20G 0 lvm
│ └─vg--local3-vm--1003--disk--1 253:28 0 60G 0 lvm
└─vg--local3-lv--local3_tdata 253:6 0 1.8T 0 lvm
└─vg--local3-lv--local3-tpool 253:7 0 1.8T 0 lvm
├─vg--local3-lv--local3 253:12 0 1.8T 0 lvm
├─vg--local3-vm--6000--disk--0 253:13 0 20G 0 lvm
├─vg--local3-vm--6006--disk--0 253:14 0 20G 0 lvm
├─vg--local3-vm--6007--disk--0 253:15 0 20G 0 lvm
├─vg--local3-vm--6010--disk--0 253:16 0 20G 0 lvm
├─vg--local3-vm--6011--disk--0 253:17 0 20G 0 lvm
├─vg--local3-vm--6013--disk--0 253:18 0 20G 0 lvm
├─vg--local3-vm--6026--disk--2 253:19 0 15G 0 lvm
├─vg--local3-vm--6004--disk--0 253:20 0 20G 0 lvm
├─vg--local3-vm--6018--disk--0 253:21 0 20G 0 lvm
├─vg--local3-vm--6023--disk--0 253:22 0 20G 0 lvm
├─vg--local3-vm--6025--disk--0 253:23 0 20G 0 lvm
├─vg--local3-vm--6025--disk--1 253:24 0 25G 0 lvm
├─vg--local3-vm--6026--disk--0 253:25 0 20G 0 lvm
├─vg--local3-vm--6026--disk--1 253:26 0 60G 0 lvm
├─vg--local3-vm--1003--disk--0 253:27 0 20G 0 lvm
└─vg--local3-vm--1003--disk--1 253:28 0 60G 0 lvm
nvme1n1 259:1 0 1.8T 0 disk
└─ceph--f107a279--77ae--4003--8523--b62d356df5bd-osd--block--95f1cb37--6324--472a--9cd7--0ba92770f3b5 253:8 0 1.8T 0 lvm
nvme3n1 259:2 0 1.8T 0 disk
└─ceph--6ed9e93b--685a--4b3a--ae25--ca14e772d7ee-osd--block--0b46114b--f375--4ad7--9f80--914e4b802ea4 253:10 0 1.8T 0 lvm
nvme4n1 259:3 0 1.8T 0 disk
└─ceph--cd46f637--5533--49bb--aa42--09efc22d89dc-osd--block--f74e3f69--c6a9--49fb--af63--ebac9a98bc22 253:11 0 1.8T 0 lvm
And here is the output of dmesg | grep nvme:
Code:
[8599639.793030] nvme nvme2: I/O 677 QID 7 timeout, aborting
[8599639.793038] nvme nvme2: I/O 678 QID 7 timeout, aborting
[8599639.793040] nvme nvme2: I/O 679 QID 7 timeout, aborting
[8599639.793043] nvme nvme2: I/O 680 QID 7 timeout, aborting
[8599670.508517] nvme nvme2: I/O 677 QID 7 timeout, reset controller
[8599701.231967] nvme nvme2: I/O 0 QID 0 timeout, reset controller
[8599742.451266] nvme nvme2: Device not ready; aborting reset
[8599742.492142] nvme nvme2: Abort status: 0x371
[8599742.492144] nvme nvme2: Abort status: 0x371
[8599742.492145] nvme nvme2: Abort status: 0x371
[8599742.492146] nvme nvme2: Abort status: 0x371
[8599753.139132] nvme nvme2: Device not ready; aborting reset
[8599753.139528] nvme nvme2: Removing after probe failure status: -19
[8599763.758945] nvme nvme2: Device not ready; aborting reset
[8599763.759494] blk_update_request: I/O error, dev nvme2n1, sector 1802065328 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
[8599763.759497] blk_update_request: I/O error, dev nvme2n1, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[8599763.759501] blk_update_request: I/O error, dev nvme2n1, sector 2797415056 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
[8599763.759503] blk_update_request: I/O error, dev nvme2n1, sector 1802230728 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
[8599763.759506] blk_update_request: I/O error, dev nvme2n1, sector 2285580784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[8599763.759508] blk_update_request: I/O error, dev nvme2n1, sector 1802234456 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
[8599763.759514] blk_update_request: I/O error, dev nvme2n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
[8599763.759515] blk_update_request: I/O error, dev nvme2n1, sector 2797406496 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[8599763.759519] blk_update_request: I/O error, dev nvme2n1, sector 1802229048 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
[8599763.759520] blk_update_request: I/O error, dev nvme2n1, sector 2287970768 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Any ideas? I can't reboot the server right now.
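If it helps, this is what I was thinking of trying to bring the controller back without a full reboot. It's only a sketch: the PCI address below is a placeholder I would replace with the real one from lspci, and <id> stands for the OSD on that drive. Is this safe to do on a live node?
Code:
# find the PCI address of the NVMe controllers
lspci | grep -i 'non-volatile'
# stop the OSD sitting on the dead drive first (<id> is a placeholder)
systemctl stop ceph-osd@<id>
# drop the dead controller from the PCI bus, then rescan the bus
echo 1 > /sys/bus/pci/devices/0000:xx:00.0/remove
echo 1 > /sys/bus/pci/rescan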
Thanks