Cannot select storage for VM: storage list keeps loading, then times out

weilbyte · New Member · Jan 19, 2022
Hi, when I try to create a VM or do anything else that requires selecting storage, the dialog loads for a while and then reports a communication failure. In the network tab, the request to "https://PROXMOX:8006/api2/json/nodes/NODE/storage?format=1&content=images" fails with status code 596.

My storage setup as reported by `pvesm status` (it took a really long time to produce output; not sure if this is normal):
(screenshot of the pvesm status output attached)

The storage also shows up fine under Datacenter -> Storage and Node -> Disks.
Help would be greatly appreciated as I have no clue what is going on.
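For reference, the same API call the UI makes can be reproduced and timed from the node's shell; a rough sketch (NODE is a placeholder, and the --content option is assumed to map onto the API's content parameter):

Bash:
# time the storage listing that the VM-creation dialog requests
time pvesh get /nodes/NODE/storage --content images --output-format json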
 
Hello,

Can you use the shell and see the contents of the node's storage?
Do you have a cluster or a single node?

Can you please run the journalctl -f command while creating the VM and see if there is any error message?
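A minimal sketch of that, assuming the standard pvedaemon/pveproxy/pvestatd unit names, so only the relevant services are followed while the issue is reproduced in the UI:

Bash:
# follow only the Proxmox API/status daemons while reproducing the problem
journalctl -f -u pvedaemon -u pveproxy -u pvestatd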
 
Hi, it is a single node. journalctl -f does not say anything useful as I cannot add storage to the VM because of the storage selection failing to load.

I am also unable to see the storage content from the dashboard, it is failing to load there as well:
1642672156061.png

However, running pvesh get nodes/NODE/storage/ssd-lvm/content --output-format json does work, for all storages.
 
Hi,

Sorry for the late answer. Can you renew the Proxmox VE certificates using the command below?


Bash:
pvecm updatecerts -f
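If that alone does not help, the web/API services may also need a restart to pick up the renewed certificates; a sketch, assuming the standard unit names:

Bash:
# restart the web UI / API daemons after renewing certificates
systemctl restart pveproxy pvedaemon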
 
I have run the command but it still does not work. I restarted a couple of times and the result is still the same.
 
Can you attach the syslog `/var/log/syslog`?
I have uploaded it.

I have also noticed that
Code:
pvesh get nodes/ether/storage --output-format json
(the same API call that the UI makes) takes around 1m25s to run. The UI gets error 596 on that request after 30 seconds.
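To narrow down which storage is responsible for the delay, a rough sketch like the following could help (it assumes the node is named ether, as in the commands above, and loops over the storage IDs that pvesm status reports; it will itself be slow if pvesm status hangs):

Bash:
# time the status call for each configured storage individually
for s in $(pvesm status | awk 'NR>1 {print $1}'); do
    echo "== $s =="
    time pvesh get nodes/ether/storage/$s/status --output-format json > /dev/null
done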

I assume this could have something to do with the SMART error in the attached syslog, but the RAID controller reports that there is nothing wrong.
 


Hi,

Thank you for the Syslog file!

There are many messages like this one:

Code:
Jan 31 19:07:02 ether pvestatd[4365]: status update time (66.443 seconds)
Jan 31 19:08:31 ether pvestatd[4365]: status update time (89.776 seconds)
Jan 31 19:09:41 ether pvestatd[4365]: status update time (69.685 seconds)
Jan 31 19:09:53 ether pvestatd[4365]: status update time (12.147 seconds)
Jan 31 19:10:32 ether pvestatd[4365]: status update time (38.787 seconds)
Jan 31 19:11:35 ether pvestatd[4365]: status update time (62.560 seconds)
Jan 31 19:13:06 ether pvestatd[4365]: status update time (91.687 seconds)
Jan 31 19:14:17 ether pvestatd[4365]: status update time (70.465 seconds)
Jan 31 19:14:31 ether pvestatd[4365]: status update time (14.243 seconds)
Jan 31 19:15:01 ether pvestatd[4365]: status update time (19.993 seconds)
Jan 31 19:16:26 ether pvestatd[4365]: status update time (84.586 seconds)

Can you please post the output of the following command as well?

Bash:
ps aux | grep '[p]vestatd'
 
The output of that command is:

Code:
root@ether:~# ps aux | grep '[p]vestatd'
root        4365  0.2  0.2 271268 107044 ?       Ss   Jan20  43:58 pvestatd
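Given that pvestatd has been running since Jan 20 with quite a bit of accumulated CPU time, a low-risk check at this point (sketch) is to restart it and watch its log while reopening the storage dialog:

Bash:
# restart the status daemon and watch its log for storage-related errors
systemctl restart pvestatd
journalctl -u pvestatd -f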
 
Any updates to this topic? I'm having what appears to be the same problem. Proxmox is just about useless, as I can't add any new machines or do anything requiring storage; it times out with the 596.

pvesm status takes a very long time to run 9 times out of 10; 1 time in 10 it's nearly instant.

I notice from the Storage panel in the Datacenter view that it's the CT items that time out.

Since New Year's I've been evaluating 6 different virtualization solutions, and I've had to rebuild 3 times previously because Proxmox VE seems to develop some kind of issue that all but halts use. Only one other solution required a rebuild: XCP-ng needed a reinstall because it was easier to start fresh with clusters that way. I haven't even gotten to Ceph or clusters yet with Proxmox.

Prior to this issue I thought everything was looking good, but shortly after adding TurnKey containers and the latest updates it went downhill again. I personally really like the way Proxmox VE works, and it seems to have the best core power features specifically for virtualization. But it seems to be quirky and have lots of "cosmetic errors". That's kind of a problem, because when a real problem shows up and you're looking at the logs and come across these other "cosmetic errors", you can easily start chasing a problem that doesn't exist. That's probably at least partially responsible for the previous rebuilds.

Other strange things happen when testing ZFS pools: it randomly drops drive IDs (still properly set up) for parts of the pool in vdev 0 & 1. Worse, the first drive in vdev 2 wasn't set up in that vdev but was a spare, so after being replaced it should have gone back to spare duty. I've only seen this happen with odd-numbered drives in a vdev. I've been testing all kinds of different pool setups, from 4-way mirrors stacked 30 high to raidz3 15 drives wide and stacked 7 high. ZFS run as part of Proxmox seems to hit a performance wall at about 30 to 40 spinning-rust drives, even with 3 SAS controllers in use, 40 cores from 2 Xeon Golds, and 128GB of memory available, with no containers or VMs running.
I don't know if it's kernel tuning or what, but the same pools imported into a Debian Bullseye install with ZFS continue to get faster as additional vdevs are added. I originally thought it was just me trying to adapt to VE, but I'm not having these issues elsewhere. Next week I'm going to try another lab machine, a SuperMicro FatTwin with 8 Xeons and 150 or so SAS drives, to see if maybe it just doesn't like the Dell servers.
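For comparing the Proxmox and plain Debian behaviour, one simple (hedged) check is to watch per-vdev throughput while the same test workload runs on each system; POOLNAME is a placeholder:

Bash:
# report per-vdev bandwidth and IOPS every 5 seconds while the benchmark runs
zpool iostat -v POOLNAME 5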

I finally decided it was time to post, give a very brief synopsis, and of course see if there is any movement on this issue, since it's a showstopper for me right now.

Thanks,
Carlo
 
Hi, I was able to solve it by rebooting the host several times.
 
Hello cayars,

I once had this issue over here, with the exact same error behaviour with regard to storage. I am not completely sure whether this is the same problem though, as you stated that you reinstalled Proxmox several times and that should have fixed the issue (but maybe not if you installed the same DE every time).

Anyway, the error I was facing was a faulty interaction between Proxmox and udisks2, which I had accidentally installed as a dependency of my desktop environment. The solution was to either uninstall udisks2 or, alternatively, mask the service (systemctl mask udisks2) so that your system thinks it is no longer installed.
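In shell terms that would be roughly (sketch; either option should do):

Bash:
# option 1: remove the package entirely
apt remove udisks2
# option 2: stop the service and mask it so nothing can start it again
systemctl mask --now udisks2.service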

If udisks2 is not the problem, you might still get a clearer picture of what is going on by looking at the output and the child processes that are running with ps auxwf while pvesm status is hanging.
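Roughly like this (sketch):

Bash:
# terminal 1: trigger the slow path
pvesm status
# terminal 2: while it hangs, inspect the process tree around the storage daemons
ps auxwf | grep -B2 -A5 '[p]vesm\|[p]vestatd'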
 
Most excellent advice, and you might be on to something.
I did a systemctl status udisks2 and it's running, as well as showing an error "Error performing housekeeping for drive" for 2 different drives, both 2.5" SATA drives.
I recently connected an HP 3PAR 24-bay enclosure inline that I was thinking of using for SATA SSDs. No drives in it are actually being used, but I did load it with about 8 drives I had lying around and they are online. I haven't done anything else with it since, but obviously 2 of those drives might have an issue. Easy enough to shut down and disconnect this storage chassis.

The first order of business will be disconnecting the 3PAR completely from the system to see if that changes things on its own. Then I'll continue with what you suggested. Everything else is SAS drives, except for 3 Samsung 980 Pros used for the ZFS raidz boot disk and 4 NVMe drives, only 2 of which are in use at present for L2ARC. I could remove the L2ARC and the PCI card with the 4 NVMe drives as well, if needed, to simplify things a bit more, but they've been installed since day one, whereas the HP 3PAR is semi-recent (a month, maybe).
 
For some reason, my trouble started when I created a container. The udisks2 suggestion was exactly what I needed. Thanks!
 
