[SOLVED] "Volume XXX does not exist"

morilythari

New Member
Aug 2, 2022
I noticed a few backup jobs errored out last night. The log states:

Code:
volume 'NASP12:107/vm-107-disk-0.qcow2' does not exist

The volume most definitely does exist, as the VMs are running and the disks can be written to.
This is also preventing me from migrating VMs to other hosts so that I can initiate reboots to try to clear the error. Every VM is affected (50+ VMs across 4 hosts). A VM that has been shut down also will not start back up, again saying the volume does not exist.


So far I have tried:

Verified connectivity to the storage; I can create new disks on it.

Ran an NFS scan of the storage.

Restarted pve-cluster, pvedaemon, pvestatd and pveproxy (roughly the commands sketched below).
If there is no other option, then tonight I can shut down each VM and reboot to see if the issue resolves, but I'm wondering if there are any past instances of this. I've tried Google and searched the forums; there are some somewhat similar threads, but they only give steps that I've already tried.
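
Roughly, those last two steps were along these lines (the NFS server address is a placeholder):

Code:
# re-scan the exports on the NFS server backing NASP12
pvesm scan nfs <nfs-server-address>

# restart the PVE services
systemctl restart pve-cluster pvedaemon pvestatd pveproxy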

Full error from live migration attempt:

Code:
2022-08-02 15:28:51 starting migration of VM 107 to node 'putsproxp10' (192.168.39.120)
2022-08-02 15:28:51 starting VM 107 on remote node 'putsproxp10'
2022-08-02 15:28:52 [putsproxp10] volume 'NASP12:107/vm-107-disk-0.qcow2' does not exist
2022-08-02 15:28:52 ERROR: online migrate failure - remote command failed with exit code 255
2022-08-02 15:28:52 aborting phase 2 - cleanup resources
2022-08-02 15:28:52 migrate_cancel
2022-08-02 15:28:53 ERROR: migration finished with problems (duration 00:00:02)
TASK ERROR: migration problems
 
what does "pvesm status" show on source and target node?
what does "pvesm list NASP12" show on source and target node?
what does "qm config [vmid]" show?

what does journalctl show around the time of migration error on target node?
what does journalctl show around the time of backup error on source node?
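
For reference, those map to commands roughly like the following (VM 107 and the timestamps are taken from the migration error above; adjust as needed):

Code:
# storage status and content listing, on both source and target node
pvesm status
pvesm list NASP12

# VM configuration
qm config 107

# logs around the time of the failure
journalctl --since "2022-08-02 15:28" --until "2022-08-02 15:35"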


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
pvesm status

Code:
Name             Type     Status           Total            Used       Available        %
ISOs              nfs     active     64239381504       143040512     64096340992    0.22%
NASP09            nfs     active    133349567488       211349504    133138217984    0.16%
NASP12            nfs     active    113940043776     49843702784     64096340992   43.75%
backups           nfs     active     67993288704     14334651392     53658637312   21.08%
local             dir     active        98559220        14338620        79171052   14.55%
local-lvm     lvmthin     active       366276608               0       366276608    0.00%

pvesm list NASP12 lists nothing, which is worrying.

qm config 102

Code:
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 8192
meta: creation-qemu=6.1.1,ctime=1648733633
name: RockyTest
net0: virtio=D2:BC:78:8A:68:FC,bridge=vmbr1,tag=2
numa: 0
ostype: l26
scsi0: NASP12:102/vm-102-disk-0.qcow2,size=150G
scsihw: virtio-scsi-pci
smbios1: uuid=debef876-4bd3-4982-bfd9-436dc0a3d053
sockets: 1
vmgenid: 4b80e405-cd62-4e69-8ce8-4ed15cd35a1a

Syslog from start of migration to error:

Code:
pvedaemon[3074524]: <root@pam> starting task UPID:putsproxp09:002F2AE6:06C25379:62E98B59:qmigrate:109:root@pam:
Aug  2 16:38:50 putsproxp09 corosync[2411]:   [KNET  ] pmtud: Starting PMTUD for host: 9 link: 0
Aug  2 16:38:50 putsproxp09 corosync[2411]:   [KNET  ] udp: detected kernel MTU: 1500
Aug  2 16:38:50 putsproxp09 corosync[2411]:   [KNET  ] pmtud: PMTUD completed for host: 9 link: 0 current link mtu: 1397
Aug  2 16:38:50 putsproxp09 corosync[2411]:   [KNET  ] pmtud: Starting PMTUD for host: 9 link: 1
Aug  2 16:38:50 putsproxp09 corosync[2411]:   [KNET  ] udp: detected kernel MTU: 1500
Aug  2 16:38:50 putsproxp09 corosync[2411]:   [KNET  ] pmtud: PMTUD completed for host: 9 link: 1 current link mtu: 1397
Aug  2 16:38:50 putsproxp09 pmxcfs[3037025]: [status] notice: received log
Aug  2 16:38:50 putsproxp09 pmxcfs[3037025]: [status] notice: received log
Aug  2 16:38:50 putsproxp09 systemd[1]: Stopping User Manager for UID 0...
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Stopped target Main User Target.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Stopped target Basic System.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Stopped target Paths.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Stopped target Sockets.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Stopped target Timers.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: dirmngr.socket: Succeeded.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Closed GnuPG network certificate management daemon.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: gpg-agent-browser.socket: Succeeded.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Aug  2 16:38:50 putsproxp09 systemd[3089973]: gpg-agent-extra.socket: Succeeded.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Aug  2 16:38:50 putsproxp09 systemd[3089973]: gpg-agent-ssh.socket: Succeeded.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Aug  2 16:38:50 putsproxp09 systemd[3089973]: gpg-agent.socket: Succeeded.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Closed GnuPG cryptographic agent and passphrase cache.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Removed slice User Application Slice.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Reached target Shutdown.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: systemd-exit.service: Succeeded.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Finished Exit the Session.
Aug  2 16:38:50 putsproxp09 systemd[3089973]: Reached target Exit the Session.
Aug  2 16:38:50 putsproxp09 systemd[1]: user@0.service: Succeeded.
Aug  2 16:38:50 putsproxp09 systemd[1]: Stopped User Manager for UID 0.
Aug  2 16:38:51 putsproxp09 systemd[1]: Stopping User Runtime Directory /run/user/0...
Aug  2 16:38:51 putsproxp09 systemd[1]: run-user-0.mount: Succeeded.
Aug  2 16:38:51 putsproxp09 systemd[1]: user-runtime-dir@0.service: Succeeded.
Aug  2 16:38:51 putsproxp09 systemd[1]: Stopped User Runtime Directory /run/user/0.
Aug  2 16:38:51 putsproxp09 systemd[1]: Removed slice User Slice of UID 0.
Aug  2 16:38:51 putsproxp09 systemd[1]: user-0.slice: Consumed 5.727s CPU time.
Aug  2 16:38:51 putsproxp09 pmxcfs[3037025]: [status] notice: received log
Aug  2 16:38:51 putsproxp09 pmxcfs[3037025]: [status] notice: received log
Aug  2 16:38:51 putsproxp09 pvedaemon[3091174]: migration problems
Aug  2 16:38:51 putsproxp09 pvedaemon[3074524]: <root@pam> end task UPID:putsproxp09:002F2AE6:06C25379:62E98B59:qmigrate:109:root@pam: migration problems
 
I figured it out. After trying to manually move a disk, I got a permission denied error.
Somehow the ownership of the NAS share had been changed from nobody:nogroup to root:root.
After fixing that and verifying 755 permissions, it's functioning as it should.
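
For anyone hitting the same thing, a quick check looks something like this (assuming the share is mounted at the default /mnt/pve/NASP12 path on the node; the export path on the NAS is a placeholder):

Code:
# on the Proxmox node: check ownership and mode of the mounted share
ls -ld /mnt/pve/NASP12
# healthy output shows drwxr-xr-x (755) and nobody:nogroup ownership

# on the NAS itself: restore the expected ownership and mode on the export
chown nobody:nogroup /path/to/export
chmod 755 /path/to/export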
 
