[SOLVED] Proxmox connects to iSCSI targets after (failed) VM autostart?

RodinM

Hello,
I found out that VMs which have their disks on iSCSI storage and are set to start at boot fail to autostart, because their storage is not ready yet.
By the time I log in to the web GUI I can start the VMs by hand with no problem:

Bash:
...........
Aug 12 07:00:11 hv02 systemd[1]: Starting pve-guests.service - PVE guests...
Aug 12 07:00:13 hv02 pve-guests[1160]: <root@pam> starting task UPID:hv02:0000048B:000005A3:689ABC4D:startall::root@pam:
Aug 12 07:00:13 hv02 pvesh[1160]: Starting VM 102
Aug 12 07:00:13 hv02 pve-guests[1163]: <root@pam> starting task UPID:hv02:0000048C:000005A5:689ABC4D:qmstart:102:root@pam:
Aug 12 07:00:13 hv02 pve-guests[1164]: start VM 102: UPID:hv02:0000048C:000005A5:689ABC4D:qmstart:102:root@pam:
Aug 12 07:00:13 hv02 pve-guests[1164]: no such logical volume vg_iscsi/lv_thin_iscsi
Aug 12 07:00:14 hv02 pvesh[1160]: Starting VM 102 failed: no such logical volume vg_iscsi/lv_thin_iscsi
Aug 12 07:00:14 hv02 pvesh[1160]: Starting VM 100
Aug 12 07:00:14 hv02 pve-guests[1163]: <root@pam> starting task UPID:hv02:00000491:0000060A:689ABC4E:qmstart:100:root@pam:
Aug 12 07:00:14 hv02 pve-guests[1169]: start VM 100: UPID:hv02:00000491:0000060A:689ABC4E:qmstart:100:root@pam:
Aug 12 07:00:14 hv02 pve-guests[1169]: no such logical volume vg_iscsi/lv_thin_iscsi
Aug 12 07:00:15 hv02 pvesh[1160]: Starting VM 100 failed: no such logical volume vg_iscsi/lv_thin_iscsi
Aug 12 07:00:15 hv02 pve-guests[1160]: <root@pam> end task UPID:hv02:0000048B:000005A3:689ABC4D:startall::root@pam: OK
Aug 12 07:00:15 hv02 systemd[1]: Finished pve-guests.service - PVE guests.
...........
Aug 12 07:00:19 hv02 systemd[1]: Starting iscsid.service - iSCSI initiator daemon (iscsid)...
Aug 12 07:00:19 hv02 iscsid[1231]: iSCSI logger with pid=1233 started!
Aug 12 07:00:19 hv02 systemd[1]: Started iscsid.service - iSCSI initiator daemon (iscsid).
Aug 12 07:00:19 hv02 kernel: Loading iSCSI transport class v2.0-870.
Aug 12 07:00:19 hv02 kernel: iscsi: registered transport (tcp)
Aug 12 07:00:19 hv02 kernel: scsi host6: iSCSI Initiator over TCP/IP
Aug 12 07:00:19 hv02 kernel: scsi 6:0:0:0: RAID              IET      Controller       0001 PQ: 0 ANSI: 5
Aug 12 07:00:19 hv02 kernel: scsi 6:0:0:0: Attached scsi generic sg1 type 12
Aug 12 07:00:19 hv02 kernel: scsi 6:0:0:1: Direct-Access     IET      VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: Attached scsi generic sg2 type 0
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: Power-on or device reset occurred
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: [sdb] 2936012800 512-byte logical blocks: (1.50 TB/1.37 TiB)
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: [sdb] 4096-byte physical blocks
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: [sdb] Write Protect is off
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: [sdb] Mode Sense: 69 00 10 08
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
Aug 12 07:00:19 hv02 kernel:  sdb: sdb1
Aug 12 07:00:19 hv02 kernel: sd 6:0:0:1: [sdb] Attached SCSI disk
Aug 12 07:00:19 hv02 kernel: netfs: FS-Cache loaded
Aug 12 07:00:19 hv02 lvm[1270]: PV /dev/sdb1 online, VG vg_iscsi is complete.
Aug 12 07:00:19 hv02 systemd[1]: Started lvm-activate-vg_iscsi.service - [systemd-run] /usr/sbin/lvm vgchange -aay --autoactivation event vg_iscsi.
Aug 12 07:00:19 hv02 kernel: NFS: Registering the id_resolver key type
...........
Aug 12 07:03:39 hv02 systemd[1]: Started session-1.scope - Session 1 of User root.
Aug 12 07:04:44 hv02 pvedaemon[2295]: start VM 102: UPID:hv02:000008F7:00006FC0:689ABD5C:qmstart:102:root@pam:
Aug 12 07:04:44 hv02 pvedaemon[1134]: <root@pam> starting task UPID:hv02:000008F7:00006FC0:689ABD5C:qmstart:102:root@pam:
Aug 12 07:04:45 hv02 systemd[1]: Created slice qemu.slice - Slice /qemu.
Aug 12 07:04:45 hv02 systemd[1]: Started 102.scope.
Aug 12 07:04:46 hv02 kernel: tap102i0: entered promiscuous mode
Aug 12 07:04:46 hv02 kernel: vmbr0: port 2(fwpr102p0) entered blocking state
Aug 12 07:04:46 hv02 kernel: vmbr0: port 2(fwpr102p0) entered disabled state
Aug 12 07:04:46 hv02 kernel: fwpr102p0: entered allmulticast mode
Aug 12 07:04:46 hv02 kernel: fwpr102p0: entered promiscuous mode
Aug 12 07:04:46 hv02 kernel: vmbr0: port 2(fwpr102p0) entered blocking state
Aug 12 07:04:46 hv02 kernel: vmbr0: port 2(fwpr102p0) entered forwarding state
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 1(fwln102i0) entered disabled state
Aug 12 07:04:46 hv02 kernel: fwln102i0: entered allmulticast mode
Aug 12 07:04:46 hv02 kernel: fwln102i0: entered promiscuous mode
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 1(fwln102i0) entered forwarding state
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 2(tap102i0) entered disabled state
Aug 12 07:04:46 hv02 kernel: tap102i0: entered allmulticast mode
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Aug 12 07:04:46 hv02 kernel: fwbr102i0: port 2(tap102i0) entered forwarding state
Aug 12 07:04:46 hv02 pvedaemon[2295]: VM 102 started with PID 2317.
I set up the LVM-thin storage on iSCSI following the official guide, which otherwise seems to work quite well.
I can of course set a 200-second delay on the VM that should autostart first, which completely solves the problem.
But maybe there is something else I should correct in my installation?
 
The delay in the VM autostart settings does not help when it is the first VM to autostart.
 
Can you share your "/etc/pve/storage.cfg"?
Also, what is serving your iSCSI?
Finally, what version of PVE are you running? pveversion -v


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

my storage.cfg:

Bash:
~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir

iscsi: iscsi_block
        portal 10.10.1.10
        target iqn.2025-08.ru.rmaxv.nfs:disk0
        content none

lvmthin: iscsi_lvm_thin
        thinpool lv_thin_iscsi
        vgname vg_iscsi
        content rootdir,images

nfs: nfs_backup
        export /export/DATA
        path /mnt/pve/nfs_backup
        server 10.10.1.10
        content iso,backup
        prune-backups keep-all=1

iSCSI is served by an openmediavault server.

pveversion:
Code:
# pveversion -v
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.3 (running version: 9.0.3/025864202ebb6109)
proxmox-kernel-helper: 9.0.3
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.14: 6.14.8-2
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx9
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.9
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.6
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2
lxc-pve: 6.0.4-2
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.11-1
proxmox-backup-file-restore: 4.0.11-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.1
proxmox-kernel-helper: 9.0.3
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.0
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.9
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-3
pve-ha-manager: 5.0.4
pve-i18n: 3.5.2
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.16
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.3-pve1
 
iSCSI is served by an openmediavault server.
I presume this isn’t a VM running on the same PVE host?

From what you’ve described, you have two storage pools: iSCSI and LVM-Thin. One depends on the other, but PVE currently has no built-in way to declare that dependency. This creates a very likely race condition when bringing up storage. Could this be improved? Sure, but it’s not a setup typically seen in business environments.

For example, LVM-type storage pools have the following option:
base
Base volume. This volume is automatically activated before accessing the storage. This is mostly useful when the LVM volume group resides on a remote iSCSI server.


This tells PVE that one storage must be active before the other. However, I don’t believe this option applies to LVM-Thin.
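For reference, a plain lvm pool with that dependency declared would look roughly like this in storage.cfg (an illustrative sketch only; the volume name after the colon is a placeholder, use whatever pvesm list iscsi_block reports for the LUN):

Code:
iscsi: iscsi_block
        portal 10.10.1.10
        target iqn.2025-08.ru.rmaxv.nfs:disk0
        content none

lvm: iscsi_lvm
        vgname vg_iscsi
        base iscsi_block:0.0.1.scsi-PLACEHOLDER-LUN-ID
        content rootdir,images

With base set, PVE activates the iSCSI LUN before it touches the volume group, so the race goes away for plain LVM; as noted, the lvmthin plugin does not appear to offer this option.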

One potential approach would be to switch from using the PVE iSCSI storage pool to a Linux-native configuration, then implement systemd dependencies between PVE services and the iscsid service. You can see a similar approach described here (behavior B):
https://kb.blockbridge.com/technote...rage/#connect-all-hosts-to-the-shared-storage

And here’s another reference on systemd dependencies: https://forum.proxmox.com/threads/systemd-dependencies-for-mail-forwarding.114673/
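As a rough sketch of that approach (the unit names and the wait loop are assumptions and need testing on your setup), a drop-in for pve-guests.service could order guest startup after the iSCSI initiator and wait until the volume group is visible:

Code:
# /etc/systemd/system/pve-guests.service.d/wait-iscsi.conf
[Unit]
Wants=iscsid.service
After=iscsid.service

[Service]
# crude wait loop: give the vg_iscsi volume group up to 60 seconds to appear
ExecStartPre=/bin/sh -c 'for i in $(seq 1 60); do vgs vg_iscsi >/dev/null 2>&1 && exit 0; sleep 1; done; exit 1'

Run systemctl daemon-reload after creating the drop-in.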

You’ll need to do some research and testing; running a virtual PVE lab might be the easiest and safest way to experiment.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
From what you’ve described, you have two storage pools: iSCSI and LVM-Thin. One depends on the other, but PVE currently has no built-in way to declare that dependency. This creates a very likely race condition when bringing up storage. Could this be improved? Sure, but it’s not a setup typically seen in business environments.
I found the solution in an outdated article from the Proxmox wiki.
I changed the following parameters:
Bash:
node.startup: manual => automatic
node.session.timeo.replacement_timeout: 120 => 15
in /etc/iscsi/iscsid.conf and in /etc/iscsi/nodes/<TARGET>/<PORTAL>/default.
The problem is gone.
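The same changes can also be applied to the already-discovered node record with iscsiadm instead of editing the files by hand (target and portal taken from my storage.cfg above, adjust to yours):

Bash:
iscsiadm -m node -T iqn.2025-08.ru.rmaxv.nfs:disk0 -p 10.10.1.10 \
    -o update -n node.startup -v automatic
iscsiadm -m node -T iqn.2025-08.ru.rmaxv.nfs:disk0 -p 10.10.1.10 \
    -o update -n node.session.timeo.replacement_timeout -v 15
# verify the stored record
iscsiadm -m node -T iqn.2025-08.ru.rmaxv.nfs:disk0 -p 10.10.1.10 | grep -E 'node.startup|replacement_timeout'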

The latest article only mentions setting:
Bash:
node.session.timeo.replacement_timeout = 15

But I will leave the solution from the older article for now.