after upgrading from 5.x to 6.x, iSCSI VMs can only be started via commandline

udotirol

Active Member
Mar 9, 2018
44
15
28
51
Hi,

I've upgraded one of our clusters from 5.4 to 6.2 only yesterday and unfortunately cannot start VMs using iSCSI storage using the UI.

The logs don't show much, all I see there is Error: start failed: QEMU exited with code -1

Researching the forum I read that errors like these are likely to be storage related and indeed, I can start VMs that only use local storage.

And even stranger (but luckily for me), I can start the iSCSI based VMs on the commandline via qm start $VMID

Any ideas what could be causing this and how I can fix it?
 

danielb

Well-Known Member
Jun 1, 2018
211
53
48
36
Bordeaux (france)
Exactly the same issue with ZFS over iSCSI storage. Probably a regression in a very recent 6.x update, haven't tracked it down yet
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
7,221
1,137
164
Exactly the same issue with ZFS over iSCSI storage. Probably a regression in a very recent 6.x update, haven't tracked it down yet
Will answer here (instead of the pve-user list) since it probably reaches more users.

* Are you able to start the VM when logging in as 'root@pam' into the GUI?

* does starting the VM via pvesh on the commandline of the node work?
Code:
 pvesh create /nodes/<nodename>/qemu/<vmid>/status/start


(The original thread on pve-user can be found at https://lists.proxmox.com/pipermail/pve-user/2020-October/172056.html)
 

danielb

Well-Known Member
Jun 1, 2018
211
53
48
36
Bordeaux (france)
Will answer here (instead of the pve-user list) since it probably reaches more users.

* Are you able to start the VM when logging in as 'root@pam' into the GUI?

No, it doesn't work from the GUI even when logged in with the root@pam account

* does starting the VM via pvesh on the commandline of the node work?
Code:
pvesh create /nodes/<nodename>/qemu/<vmid>/status/start

Yes, starting like this does work
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
7,221
1,137
164
No, it doesn't work from the GUI even when logged in with the root@pam account
ok this is weird (since the pvesh command (and qm start)) do the same action in the code path)...

Could you please:
* try using a completely fresh browser (if possible install a new one - else try a private browsing window)
* check the browser's developer tools for any errors from JavaScript (or the network)
* try POSTing the VM start via Curl (see https://pve.proxmox.com/wiki/Proxmox_VE_API)

Thanks!
 

udotirol

Active Member
Mar 9, 2018
44
15
28
51
glad to see I'm not the only one in this boat :)

As with daniel, using root@pam to start the VMs doesn't work here either but starting them via the pvesh works.

And, completely unrelated, the forum upgrade from a couple of minutes ago messed the posting icons in firefox:

utf8fun.png
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
7,221
1,137
164
And, completely unrelated, the forum upgrade from a couple of minutes ago messed the posting icons in firefox:
* do they still work in another browser?! (for us here the problem does not occur)
* does ctrl+f5 resolve the issue?
 

udotirol

Active Member
Mar 9, 2018
44
15
28
51
alright, I tried with chromium in private mode (I usually use FF). The only thing you see in the console is some complaints about missing files. The errors appear right after logging in, so they are not related to clicking on "start" or something.

console.png

Will try with the API in a second
 

udotirol

Active Member
Mar 9, 2018
44
15
28
51
starting via the API didn't work either:


Bash:
[udo@veteris ~]$ curl  --insecure --cookie "$(<cookie)" https://rambler:8006/api2/json/nodes/rambler/qemu/133/status/start | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    13  100    13    0     0     55      0 --:--:-- --:--:-- --:--:--    55
{
  "data": null
}

Just to be sure, I also tried to get the node status and that just returned nicely:

Bash:
[udo@veteris ~]$ curl  --insecure --cookie "$(<cookie)" https://rambler:8006/api2/json/nodes/rambler/status | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1180  100  1180    0     0   6020      0 --:--:-- --:--:-- --:--:--  6020
{
  "data": {
    "kversion": "Linux 5.4.65-1-pve #1 SMP PVE 5.4.65-1 (Mon, 21 Sep 2020 15:40:22 +0200)",
    "cpu": 0,
    "memory": {
      "total": 270437965824,

And the forum issue is gone after a reload, sorry for the noise :)
 

udotirol

Active Member
Mar 9, 2018
44
15
28
51
well,
btw this hast to be a "POST" call, not get

well, what kind of POST payload would I send to this endpoint? Anyhow, this is what happens when I do the same request using POST:

Bash:
$ curl  -X --insecure --cookie "$(<cookie)" https://rambler:8006/api2/json/nodes/rambler/qemu/133/status/start | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
$

... and subsequently the VM does not start just like before.
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
7,221
1,137
164
some information is also sent in the headers (i.e. the error message next to a 501 response) - and you'd need to provide the CSRF-prevention token as header to the post request:
Code:
curl -v -XPOST -k -H "$cephcsrf"  -b $cephtick https://buster-ceph01:8006/api2/json//nodes/localhost/qemu/101/status/start
(as exemplified in the wiki-page above)
 

udotirol

Active Member
Mar 9, 2018
44
15
28
51
some information is also sent in the headers (i.e. the error message next to a 501 response) - and you'd need to provide the CSRF-prevention token as header to the post request:
Code:
curl -v -XPOST -k -H "$cephcsrf"  -b $cephtick https://buster-ceph01:8006/api2/json//nodes/localhost/qemu/101/status/start
(as exemplified in the wiki-page above)
allright, so this is a little bit more complicated.

Bash:
[udo@veteris ~]$ curl -v -XPOST -k -H "$(<csrftoken)" -b "PVEAuthCookie=$(<csrftick)"  https://rambler:8006/api2/json//nodes/rambler/qemu/133/status/start | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.21.254.11:8006...
* Connected to rambler (172.21.254.11) port 8006 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [6 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [1735 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [520 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=AT; ST=Tirol; L=Innsbruck; O=BestSolution...
*  start date: Nov 13 13:52:00 2018 GMT
*  expire date: Nov 13 13:52:00 2023 GMT
*  issuer: C=AT; ST=Tyrol; L=Innsbruck; O=BestSolution...
*  SSL certificate verify ok.
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0} [5 bytes data]
> POST /api2/json//nodes/rambler/qemu/133/status/start HTTP/1.1
> Host: rambler:8006
> User-Agent: curl/7.72.0
> Accept: */*
> Cookie: PVEAuthCookie=PVE:udo.rader@example.com@example:5F7F4151::R5muW4IUBajIsi/2oY0PRnW5...==
> CSRFPreventionToken:5F7F3E1A:ixHX...
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [217 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [217 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Cache-Control: max-age=0
< Connection: close
< Connection: Keep-Alive
< Date: Thu, 08 Oct 2020 16:43:33 GMT
< Pragma: no-cache
< Server: pve-api-daemon/3.0
< Content-Length: 102
< Content-Type: application/json;charset=UTF-8
< Expires: Thu, 08 Oct 2020 16:43:33 GMT
< 
{ [102 bytes data]
100   102  100   102    0     0    326      0 --:--:-- --:--:-- --:--:--   325
* Closing connection 0
} [5 bytes data]
* TLSv1.3 (OUT), TLS alert, close notify (256):
} [2 bytes data]
{
  "data": "UPID:rambler:00002220:00F0C5E3:5F7F41B5:qmstart:133:udo.rader@example.com@example:"
}
$

I see HTTP/1.1 200 OK and I guess it means that everything should have worked. It hasn't however, the qm has not been started.

Just to be sure that I had all those nice litte parameters extracted correctly, I decided to manually start the iSCSI VM on the commandline and then shut it down via the API. This worked indeed:

first check on the PVE host that the VM is running:
Bash:
root@rambler:~# qm list | grep 133
       133 plinius12            running    512               16.00 45146     
root@rambler:~#

stop it from via API from my workstation:
Bash:
[udo@veteris ~]$ curl -v -XPOST -k -H "$(<csrftoken)" -b "PVEAuthCookie=$(<csrftick)"  https://rambler:8006/api2/json//nodes/rambler/qemu/133/status/stop | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.21.254.11:8006...
* Connected to rambler (172.21.254.11) port 8006 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [6 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [1735 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [520 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=AT; ST=Tirol; L=Innsbruck; O=...
*  start date: Nov 13 13:52:00 2018 GMT
*  expire date: Nov 13 13:52:00 2023 GMT
*  issuer: C=AT; ST=Tyrol; L=Innsbruck; O=...
*  SSL certificate verify ok.
} [5 bytes data]
> POST /api2/json//nodes/rambler/qemu/133/status/stop HTTP/1.1
> Host: rambler:8006
> User-Agent: curl/7.72.0
> Accept: */*
> Cookie: PVEAuthCookie=PVE:udo.rader@example.com@example:5F7F4151::R5mu...==
> CSRFPreventionToken:5F7F3E1A:ixHX...
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [217 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [217 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Cache-Control: max-age=0
< Connection: close
< Connection: Keep-Alive
< Date: Thu, 08 Oct 2020 16:48:57 GMT
< Pragma: no-cache
< Server: pve-api-daemon/3.0
< Content-Length: 101
< Content-Type: application/json;charset=UTF-8
< Expires: Thu, 08 Oct 2020 16:48:57 GMT
<
{ [101 bytes data]
100   101  100   101    0     0    337      0 --:--:-- --:--:-- --:--:--   337
* Closing connection 0
} [5 bytes data]
* TLSv1.3 (OUT), TLS alert, close notify (256):
} [2 bytes data]
{
  "data": "UPID:rambler:0000E237:00F14460:5F7F42F9:qmstop:133:udo.rader@example.com@example:"
}
[udo@veteris ~]$

Bash:
root@rambler:~# qm list | grep 133
       133 plinius12            stopped    512               16.00 0  
root@rambler:~#

Looks like only the "start" is misbehaving ...
 
  • Like
Reactions: fabian

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
7,221
1,137
164
I see HTTP/1.1 200 OK and I guess it means that everything should have worked. It hasn't however, the qm has not been started.
the start-task (which gets handed to a worker since it might take longer) got created successfully,
however this task failed...

In any case - managed to reproduce the issue here locally - will need a bit to find the cause

Thanks for reporting and testing!
 
  • Like
Reactions: fabian

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get your own in 60 seconds.

Buy now!