Multiple issues: accidental backup maxed out storage on new VM, and now I can't access the web interface

kirkyg

Where do I start...
Here is my current setup: I have a Windows 10 VM and an Unraid VM set up on a single node that I've been working on for the past week.

My next step was to install my RTX 3070 GPU and pass it through to the Windows 10 VM. In the process I realized that I had set up the VM with machine type i440fx instead of q35, and that the BIOS was set to the default (SeaBIOS) instead of OVMF (UEFI), which many people have pointed out is not easy to convert. I was actually successful at first in following someone's instructions for converting the disk to GPT, and I was able to get it to boot and follow the steps for GPU passthrough, but eventually I ran into more issues and decided to just destroy the VM and start over.

Next I set up a new Windows 10 VM from scratch. Just in case, I used a different VMID, although backups were the only thing I could find that might be an issue when reusing a VMID, and I don't have backups set up yet. After following the correct steps from the start with OVMF (UEFI) + q35, I was able to get Windows 10 going with the GPU passed through and could see it in Device Manager. But almost immediately my system stopped responding; connecting via RDP was super slow and almost unresponsive. After following more instructions I found on the forums for identifying the problem based on its behavior, I realized that the drive Proxmox is installed on was maxing out in space.

At this point I could not really do anything because the VM was locked, and I realized that I had forgotten to uncheck Backup on the Windows 10 VM's 100 GB storage drive, so I'm pretty sure that was the real issue. The only option was to forcefully shut down and restart, but when I did, the web GUI would not launch. I was so tired at that point that I just turned the box off and slept. This morning I thought, oh, let me try to SSH in, and I was able to access the Proxmox machine. I've been looking into why the web interface randomly became inaccessible.
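A minimal sketch for confirming from the SSH session that the root filesystem really is full, assuming a POSIX shell with the standard df and awk tools. The helper names df_pct and check_full and the 90% threshold are illustrative choices, not anything Proxmox provides.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Proxmox) for a quick "is the disk full?" check.

# Read `df -P` output on stdin and print the use percentage of the first
# filesystem listed, without the trailing % sign.
df_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare a percentage against a threshold (default 90, an arbitrary choice).
check_full() {
    pct=$1
    limit=${2:-90}
    case $pct in
        ''|*[!0-9]*) echo "UNKNOWN: could not read usage"; return 1 ;;
    esac
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: ${pct}% used"
    else
        echo "OK: ${pct}% used"
    fi
}

# Root filesystem, which holds the Proxmox installation.
check_full "$(df -P / | df_pct)"
```

If the check reports a warning, running du -xsh on a few likely directories (for example /var/log, or /var/lib/vz, where the default local storage lives) should narrow down what is taking the space.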

Here is my /etc/hosts:

Code:
root@pve:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.30 pve.server.local pve
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
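Proxmox needs the node name to resolve to a real (non-loopback) address, which is why /etc/hosts is worth checking. A small sketch of that check, assuming a POSIX shell and awk; hosts_lookup is a hypothetical helper name, and uname -n is used here only as a portable way to get the node name.

```shell
#!/bin/sh
# Hypothetical helper: print the first address that a hosts-format file
# maps the given name to (default file: /etc/hosts).
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == n) { print $1; exit }
            }
        }
    ' "$file"
}

node=$(uname -n)
ip=$(hosts_lookup "$node")
case $ip in
    "")        echo "no /etc/hosts entry for $node" ;;
    127.*|::1) echo "$node resolves to loopback ($ip)" ;;
    *)         echo "$node -> $ip" ;;
esac
```

With the file shown above, the name pve maps to 192.168.1.30, so the hosts file itself does not look like the cause here.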

Here is lsof -i output:

Code:
root@pve:~# lsof -i
COMMAND    PID  USER     FD  TYPE DEVICE SIZE/OFF NODE NAME
systemd    1    root     36u IPv4 26789  0t0      TCP  *:sunrpc (LISTEN)
systemd    1    root     37u IPv4 23130  0t0      UDP  *:sunrpc
systemd    1    root     38u IPv6 24805  0t0      TCP  *:sunrpc (LISTEN)
systemd    1    root     39u IPv6 26114  0t0      UDP  *:sunrpc
rpcbind    1278 _rpc     4u  IPv4 26789  0t0      TCP  *:sunrpc (LISTEN)
rpcbind    1278 _rpc     5u  IPv4 23130  0t0      UDP  *:sunrpc
rpcbind    1278 _rpc     6u  IPv6 24805  0t0      TCP  *:sunrpc (LISTEN)
rpcbind    1278 _rpc     7u  IPv6 26114  0t0      UDP  *:sunrpc
sshd       1449 root     3u  IPv4 33580  0t0      TCP  *:ssh (LISTEN)
sshd       1449 root     4u  IPv6 33582  0t0      TCP  *:ssh (LISTEN)
chronyd    1467 _chrony  5u  IPv4 21412  0t0      UDP  localhost.localdomain:323
chronyd    1467 _chrony  6u  IPv6 21413  0t0      UDP  ip6-localhost:323
pvedaemon  1513 root     6u  IPv4 41286  0t0      TCP  localhost.localdomain:85 (LISTEN)
pvedaemon  1514 root     6u  IPv4 41286  0t0      TCP  localhost.localdomain:85 (LISTEN)
pvedaemon  1515 root     6u  IPv4 41286  0t0      TCP  localhost.localdomain:85 (LISTEN)
pvedaemon  1516 root     6u  IPv4 41286  0t0      TCP  localhost.localdomain:85 (LISTEN)
spiceprox  1528 www-data 6u  IPv6 29286  0t0      TCP  *:3128 (LISTEN)
spiceprox  1529 www-data 6u  IPv6 29286  0t0      TCP  *:3128 (LISTEN)
sshd       1641 root     4u  IPv4 33620  0t0      TCP  pve.server.local:ssh->192.168.1.36:50362 (ESTABLISHED)
pveproxy   1918 www-data 6u  IPv6 40230  0t0      TCP  *:8006 (LISTEN)
pveproxy   2256 www-data 6u  IPv6 40230  0t0      TCP  *:8006 (LISTEN)
pveproxy   2257 www-data 6u  IPv6 40230  0t0      TCP  *:8006 (LISTEN)
pveproxy   2258 www-data 6u  IPv6 40230  0t0      TCP  *:8006 (LISTEN)

Here is the output from systemctl status pveproxy:

Code:
root@pve:~# systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-02-09 08:50:02 CST; 2min 55s ago
    Process: 1912 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=111)
    Process: 1914 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
   Main PID: 1918 (pveproxy)
      Tasks: 4 (limit: 76996)
     Memory: 132.9M
        CPU: 3.039s
     CGroup: /system.slice/pveproxy.service
             ├─1918 pveproxy
             ├─2042 pveproxy worker
             ├─2043 pveproxy worker
             └─2044 pveproxy worker

Feb 09 08:52:57 pve pveproxy[1918]: worker 2041 finished
Feb 09 08:52:57 pve pveproxy[1918]: worker 2039 finished
Feb 09 08:52:57 pve pveproxy[1918]: worker 2040 finished
Feb 09 08:52:57 pve pveproxy[1918]: starting 3 worker(s)
Feb 09 08:52:57 pve pveproxy[1918]: worker 2042 started
Feb 09 08:52:57 pve pveproxy[1918]: worker 2043 started
Feb 09 08:52:57 pve pveproxy[1918]: worker 2044 started
Feb 09 08:52:57 pve pveproxy[2042]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/pe>
Feb 09 08:52:57 pve pveproxy[2043]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/pe>
Feb 09 08:52:57 pve pveproxy[2044]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/pe>
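The workers are failing to load /etc/pve/local/pve-ssl.key. /etc/pve is not an ordinary directory: it is a FUSE mount provided by pmxcfs (the pve-cluster service), so the key will appear to be missing whenever that service is down. A rough sketch for checking this, assuming a POSIX shell on a Linux host; file_state is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: classify a path as missing, empty, or present.
file_state() {
    if [ ! -e "$1" ]; then
        echo missing
    elif [ ! -s "$1" ]; then
        echo empty
    else
        echo present
    fi
}

# /etc/pve is normally a FUSE mount provided by pmxcfs (pve-cluster.service).
if grep -qs ' /etc/pve ' /proc/mounts; then
    echo "/etc/pve is mounted"
else
    echo "/etc/pve is NOT mounted; pve-cluster is probably not running"
fi

# Certificate and key that pveproxy serves the web GUI with.
for f in /etc/pve/local/pve-ssl.key /etc/pve/local/pve-ssl.pem; do
    echo "$f: $(file_state "$f")"
done
```

If /etc/pve turns out not to be mounted, the key error is most likely a symptom, and systemctl status pve-cluster is the next thing to look at.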

  1. First task is to get the web GUI accessible again.
  2. Second task is to confirm that the disk filling up (due to the backup) is what caused the VM to lock up, and to see if I'm now able to set the Windows 10 VM to not back up and to clear the excess data that was created (or whatever logging was running away).
  3. Third task is to verify that GPU passthrough is indeed working and that the VM performs well and can handle GPU tasks (rendering, games, etc.).
Any help would be much appreciated!
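For the third task, one host-side check is which kernel driver currently holds the GPU; with passthrough active it is normally vfio-pci rather than nouveau or nvidia. A minimal sketch, assuming lspci (from pciutils) is installed on the host; drivers_for is a hypothetical helper name and NVIDIA is just the match pattern for this card.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nnk` output on stdin and print the kernel
# driver in use for every device whose description matches a pattern.
drivers_for() {
    awk -v pat="$1" '
        /^[0-9a-f]/ { dev = $0 }                            # device header line
        /Kernel driver in use:/ && dev ~ pat { print $NF }  # driver of that device
    '
}

if command -v lspci >/dev/null 2>&1; then
    lspci -nnk | drivers_for NVIDIA
fi
```

A GPU usually exposes both a VGA function and an audio function, so two lines of output are expected for a single card.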

Thanks,

kirkyg
 
Some more interesting things: I noticed the IPs for my VMs are not visible on my router, so I went to check the status of the VMs with qm status, and I get:

Code:
root@pve:~# qm status
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused

And journalctl -u pve* -b0 shows errors:

Code:
root@pve:~# journalctl -u pve* -b0
Journal file /var/log/journal/e253c15736b949ee8507aabbdfdf0ac1/system.journal is truncated, ignoring file.
-- Journal begins at Wed 2023-02-08 23:58:03 CST, ends at Thu 2023-02-09 10:39:03 CST. --
Feb 09 08:41:41 pve systemd[1]: Starting Proxmox VE Login Banner...
Feb 09 08:41:41 pve systemd[1]: Starting Proxmox VE firewall logger...
Feb 09 08:41:41 pve systemd[1]: Starting Commit Proxmox VE network changes...
Feb 09 08:41:41 pve systemd[1]: Finished Commit Proxmox VE network changes.
Feb 09 08:41:41 pve pvefw-logger[1289]: starting pvefw logger
Feb 09 08:41:41 pve systemd[1]: Started Proxmox VE firewall logger.
Feb 09 08:41:42 pve systemd[1]: Starting Proxmox VE LXC Syscall Daemon...
Feb 09 08:41:42 pve systemd[1]: Started Proxmox VE LXC Syscall Daemon.
Feb 09 08:41:42 pve systemd[1]: Finished Proxmox VE Login Banner.
Feb 09 08:41:43 pve systemd[1]: Reached target PVE Storage Target.
Feb 09 08:41:43 pve systemd[1]: Started Daily PVE download activities.
Feb 09 08:41:43 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 09 08:41:43 pve pmxcfs[1497]: [database] crit: unable to set WAL mode: disk I/O error#010
Feb 09 08:41:43 pve pmxcfs[1497]: [database] crit: unable to set WAL mode: disk I/O error#010
Feb 09 08:41:43 pve pmxcfs[1497]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Feb 09 08:41:43 pve pmxcfs[1497]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 09 08:41:43 pve pmxcfs[1497]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Feb 09 08:41:43 pve pmxcfs[1497]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 09 08:41:43 pve systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Feb 09 08:41:43 pve systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 09 08:41:43 pve systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Feb 09 08:41:44 pve systemd[1]: Starting Proxmox VE firewall...
Feb 09 08:41:44 pve systemd[1]: Starting PVE API Daemon...
Feb 09 08:41:44 pve systemd[1]: Starting PVE Status Daemon...
Feb 09 08:41:44 pve systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 1.
Feb 09 08:41:44 pve systemd[1]: Stopped The Proxmox VE cluster filesystem.
Feb 09 08:41:44 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 09 08:41:44 pve pmxcfs[1510]: [database] crit: unable to set WAL mode: disk I/O error#010
Feb 09 08:41:44 pve pmxcfs[1510]: [database] crit: unable to set WAL mode: disk I/O error#010
Feb 09 08:41:44 pve pmxcfs[1510]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Feb 09 08:41:44 pve pmxcfs[1510]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 09 08:41:44 pve pmxcfs[1510]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Feb 09 08:41:44 pve pmxcfs[1510]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 09 08:41:44 pve systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Feb 09 08:41:44 pve systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 09 08:41:44 pve systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Feb 09 08:41:44 pve systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 2.
Feb 09 08:41:44 pve systemd[1]: Stopped The Proxmox VE cluster filesystem.
Feb 09 08:41:44 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 09 08:41:44 pve pmxcfs[1511]: [database] crit: unable to set WAL mode: disk I/O error#010
Feb 09 08:41:44 pve pmxcfs[1511]: [database] crit: unable to set WAL mode: disk I/O error#010
Feb 09 08:41:44 pve pmxcfs[1511]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Feb 09 08:41:44 pve pmxcfs[1511]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 09 08:41:44 pve pmxcfs[1511]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
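The same two pmxcfs errors repeat on every restart attempt, so it can help to condense the journal into distinct messages. A "disk I/O error" while opening config.db would be consistent with a full root filesystem, though that is an assumption; a failing disk or a damaged database could look the same. The sketch below assumes a POSIX shell with grep, sed, sort and uniq; crit_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and print each distinct
# "crit:" message with the number of times it occurs, most frequent first.
crit_summary() {
    grep 'crit:' | sed 's/^.*\] crit: //' | sort | uniq -c | sort -rn
}

# Summarise the current boot's pve-cluster messages, if journalctl is available.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u pve-cluster -b 0 --no-pager 2>/dev/null | crit_summary
fi

# Free space on the filesystem that holds the pmxcfs database.
df -h /var/lib/pve-cluster 2>/dev/null || echo "/var/lib/pve-cluster not found"
```

If df shows that filesystem at 100%, freeing space and then restarting pve-cluster would be the obvious first thing to try before assuming the database itself is damaged.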
 
