Hello!
I'm running a PVE 5.2 cluster with 4 nodes. The cluster is attached to a SAN, an HP P2000 G3 iSCSI. VMs are hosted on the SAN.
The first controller of the SAN failed. Everything is running on the second controller, but I can't manage PVE anymore.
Although the VMs are still running, Proxmox seems to be stuck on the failed controller: it tries to manage the virtual disks only through the first SAN IP address (which belongs to the failed controller), probably because that is the "portal" set during the initial configuration.
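If it helps, this is roughly how I'm checking which portal each node is actually logged in to versus what is configured in storage.cfg (just a sketch; names and IPs are placeholders):
Code:
# portal configured for the iSCSI storage definition
grep -A 3 '^iscsi:' /etc/pve/storage.cfg

# portals/targets the node is actually logged in to right now
iscsiadm -m session -P 1 | grep -E 'Target|Portal'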
In particular:
1) The web interface is locked (no login possible), even after restarting pvestatd and other services (as suggested in other threads). The lockup affects 3 of the 4 nodes and happened in sequence; I'm fairly sure that if I start using the 4th node now, it will lock up as well.
2) I can log in via SSH, see the VMs running and manage them via qm monitor, but I cannot back them up or migrate them, because I receive the message
storage '<name>' is not online
3) I found in an old thread that PVE checks the availability of iSCSI storage with the command
iscsiadm -m session --rescan
In my case the command runs successfully (see also the portal reachability sketch right after this list).
Ping also works with all 4 SAN IPs.
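From what I can tell from the storage plugin code, the "is not online" state seems to come from a connectivity test against the configured portal on TCP port 3260 rather than from iscsiadm itself, so I'm testing reachability of each controller like this (a sketch; the IPs are placeholders, with <SAN_IP1>/<SAN_IP2> on the failed controller):
Code:
# check whether each SAN controller still answers on the iSCSI port (3260)
for ip in <SAN_IP1> <SAN_IP2> <SAN_IP3> <SAN_IP4>; do
        nc -vz -w 2 "$ip" 3260
done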
Does anyone know of a way to unlock Proxmox (web interface, backup, migration) without shutting everything down and restarting? I'm going to replace the failed controller, but I'd like to take a backup first, and right now I can't...
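One idea I had (untested, so please correct me if this is a bad idea) is to repoint the storage definition at a portal IP of the surviving controller and then restart the GUI/status daemons, roughly like this:
Code:
# /etc/pve/storage.cfg is cluster-wide, so a single edit should be enough;
# the idea is to change the portal of the iscsi storage to an IP of the
# surviving controller, i.e.
#
#     iscsi: <name>
#             portal <SAN_IP3>     <-- instead of <SAN_IP1>
#             target iqn.1986-03.com.hp:storage.p2000g3.131819bad6
#             content none
#
# and then restart the daemons behind the web interface on each node:
systemctl restart pvestatd pvedaemon pveproxy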
Many thanks in advance!
Some command outputs follow.
Code:
#pvecm status
Quorum information
------------------
Date:             Tue Mar 16 11:25:37 2021
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000004
Ring ID:          1/3524
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 <IP1>
0x00000002          1 <IP2>
0x00000003          1 <IP3>
0x00000004          1 <IP4> (local)
Code:
#pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-1
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-3
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9
Code:
#pvesm status
storage '<name>' is not online
storage '<name>' is not online
storage '<name>' is not online
[Ctrl+C because it hangs]
Code:
#cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

iscsi: <name>
        portal <SAN_IP1>
        target iqn.1986-03.com.hp:storage.p2000g3.131819bad6
        content none

lvm: <lvm_name>
        vgname <lvm_group_name>
        base <name>:0.0.100.scsi-3600c0ff00019cb32b080f65b01000000
        content rootdir,images
        shared 1

lvm: <lvm2_name>
        vgname <lvm_group2_name>
        base <name>:0.0.101.scsi-3600c0ff00019cc68a192f65b01000000
        content images,rootdir
        shared 1

nfs: <NAS_name>
        export /vol_backup_vms_08
        path /mnt/pve/netapp-backup-nfs08
        server <NAS_IP>
        content backup,images
        maxfiles 1
        options vers=3
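For what it's worth, the LVM layer defined on top of these LUNs can still be queried directly with the LVM tools, bypassing the PVE storage layer; this is the kind of check I'm using (group names are the placeholders from the config above):
Code:
# the volume groups from storage.cfg sit on top of the multipath devices,
# so they can be inspected without triggering the PVE "online" check
pvs -o pv_name,vg_name,pv_size
vgs <lvm_group_name> <lvm_group2_name>
lvs <lvm_group_name>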
Code:
#multipath -ll [only the part related to the HP SAN]
3600c0ff00019cc68333d415d01000000 dm-6 HP,P2000 G3 iSCSI
size=931G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 3:0:0:101 sdg 8:96  active ready  running
| `- 5:0:0:101 sdi 8:128 active ready  running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 2:0:0:101 sdf 8:80  failed faulty running
  `- 4:0:0:101 sdh 8:112 failed faulty running
3600c0ff00019cb32b080f65b01000000 dm-5 HP,P2000 G3 iSCSI
size=931G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- 2:0:0:100 sdb 8:16  failed faulty running
| `- 4:0:0:100 sdd 8:48  failed faulty running
`-+- policy='service-time 0' prio=50 status=active
  |- 3:0:0:100 sdc 8:32  active ready  running
  `- 5:0:0:100 sde 8:64  active ready  running
There's no /etc/multipath.conf; this SAN seems to be covered by the built-in defaults of multipath-tools.
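Related to the failed paths above: I'm also wondering whether logging out of just the iSCSI sessions that point at the dead controller's portals would clean things up, something along these lines (untested; placeholder IPs):
Code:
# list node records and current sessions first
iscsiadm -m node
iscsiadm -m session

# log out only from the portals of the failed controller
iscsiadm -m node -p <SAN_IP1>:3260 --logout
iscsiadm -m node -p <SAN_IP2>:3260 --logout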
This is the config of one of the VMs hosted on the SAN, which I can no longer back up or migrate.
Code:
#qm config 121
balloon: 0
bootdisk: virtio0
cores: 6
memory: 24576
name: <VM 121 NAME>
net0: virtio=76:D6:C3:9F:34:F9,bridge=vmbr0
net1: virtio=36:56:53:14:78:D9,bridge=vmbr1
numa: 0
ostype: win7
scsihw: virtio-scsi-pci
smbios1: uuid=eaed5964-aa1a-4c8e-8557-1317712a9df7
sockets: 2
virtio0: <lvm_name>:vm-121-disk-1,size=100G
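For completeness, these are the backup and migration commands that currently fail for this VM with the "is not online" message (names are placeholders):
Code:
# backup of VM 121 to the NFS storage
vzdump 121 --storage <NAS_name> --mode snapshot

# online migration of VM 121 to another node
qm migrate 121 <other_node_name> --online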