VM unresponsive issue

eric.chan

New Member
Mar 20, 2024
Hello,

I am a new user of Proxmox VE. I have deployed Proxmox VE 8.1 on a Dell Rx530 server, whose CPU is an Intel E5-2603 v4. I am running into an issue with VMs becoming unresponsive, and I hope someone can help me troubleshoot.
Scenario 1:
I import an OVF as a VM on PVE with the command line: "qm importovf 103 /path/of/ovf local-lvm". When the process reaches about 50%, the physical server's CPU usage grows higher than 60% and all VMs on the same server become unresponsive. The VMs' CPU usage also grows higher than 60%, but their network usage and disk I/O drop to 0. After importovf succeeds, the VMs stay unresponsive until I kill the VM process in the console and restart it.

Scenario 2:
I migrate a VM from source server A to target server B. The same thing happens as in Scenario 1: the physical server's CPU usage grows higher than 60% and all VMs on the same server become unresponsive. The VMs' CPU usage also grows higher than 60%, but their network usage and disk I/O drop to 0. After the migration succeeds, the VMs stay unresponsive until I kill the VM process in the console and restart it.
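
For reference, the commands behind the two scenarios are roughly the following (the VM ID and path are the ones from scenario 1; the target node name serverB is a placeholder):

# Scenario 1: import the OVF as VM 103 onto local-lvm
qm importovf 103 /path/of/ovf local-lvm

# Scenario 2: migrate VM 103 to the target node (add --online if the VM is running)
qm migrate 103 serverB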
 
Dell Rx530 server, whose CPU is an Intel E5-2603 v4
I guess you mean an R530, which has 2 CPU sockets (I don't know of an Rx530).
How many CPUs have you got?
The Intel E5-2603 v4 is an extremely limited chip, with only 6 non-Hyper-Threading cores @ 1.70 GHz.

How much RAM have you got?
How many VMs and LXCs are you running?
 
Hey,

are your VM disks stored on a network share, e.g. NFS, SMB/CIFS, or on hyperconverged storage?
- If on a SAN/NAS, what is the connection type?
- If not, are HDDs in the game?

Please paste the output of pveversion -v in a CODE tag.

Best
 
I guess you mean an R530, which has 2 CPU sockets (I don't know of an Rx530).
How many CPUs have you got?
The Intel E5-2603 v4 is an extremely limited chip, with only 6 non-Hyper-Threading cores @ 1.70 GHz.

How much RAM have you got?
How many VMs and LXCs are you running?
Hi,

Thanks for the reply, and sorry for the typo. Yes, I mean a Dell R530 with 6 x Intel(R) Xeon(R) CPU E5-2603 v4.
RAM is 128 GB.
2 VMs are running, no LXCs.

Thank you.
 
Thanks for the reply, and sorry for the typo. Yes, I mean a Dell R530 with 6 x Intel(R) Xeon(R) CPU E5-2603 v4.
RAM is 128 GB.
2 VMs are running, no LXCs.
RAM looks OK, although I don't know your VMs' RAM allocation.
But you've only got 6 slow cores shared between the host and the 2 VMs. How many cores are allocated to the VMs?
Also, I don't know anything about your storage/network config.
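
If you're not sure, you can read the allocation straight from the VM configs, something like this (the VM ID is just an example):

qm list                                          # all VMs on the node and their status
qm config 103 | grep -E 'cores|sockets|memory'   # CPU/RAM settings of a single VM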
 
Hey,

are your VM disks stored on a network share, e.g. NFS, SMB/CIFS, or on hyperconverged storage?
- If on a SAN/NAS, what is the connection type?
- If not, are HDDs in the game?

Please paste the output of pveversion -v in a CODE tag.

Best

Hi,

Thanks for the reply. I have tried storing the VM disk on a local disk and on NFS.
- For NFS, the connection type is 1G Ethernet.
- For the local disk, HDDs are not in the game.

Please check the following output:
root@PVE03:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-8
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.4
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-3
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
 
RAM looks OK, although I don't know your VMs' RAM allocation.
But you've only got 6 slow cores shared between the host and the 2 VMs. How many cores are allocated to the VMs?
Also, I don't know anything about your storage/network config.
I allocate 16 GB RAM and 2 cores per VM.
For storage I keep the default settings.
The VM disks are located on local-lvm.
For the network, just a 1G local network.

Thank you.
 
RAM looks OK, although I don't know your VMs' RAM allocation.
But you've only got 6 slow cores shared between the host and the 2 VMs. How many cores are allocated to the VMs?
Also, I don't know anything about your storage/network config.
That sounds like the problem you're running into: when your host can't keep up with the sustained load, it will simply struggle to keep up with its current tasks. In your situation, it is importing/restoring a VM while the other VMs all freeze up together.

Can you reproduce the issue again, check dmesg for errors, take screenshots of htop, and paste your /etc/pve/storage.cfg config?
Is local-lvm your only datastore, and are you using HDDs or modern SSDs?
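
Something along these lines should gather what I'm asking for (run it on the PVE host right after reproducing the freeze):

dmesg -T | grep -iE 'error|fail|hung|blocked|timeout'   # kernel log, filtered for common trouble signs
cat /etc/pve/storage.cfg                                # current storage configuration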

Best
 
That sounds like the problem you're running into: when your host can't keep up with the sustained load, it will simply struggle to keep up with its current tasks. In your situation, it is importing/restoring a VM while the other VMs all freeze up together.

Can you reproduce the issue again, check dmesg for errors, take screenshots of htop, and paste your /etc/pve/storage.cfg config?
Is local-lvm your only datastore, and are you using HDDs or modern SSDs?

Best

Hello,

please check the attachments.

I have NAS storage mounted via NFS to store the vmk files.


root@STWPVE03:~# more /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content backup,iso,vztmpl

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

nfs: NAS04
export /volume1/PVE_Backup
path /mnt/pve/STWNAS04
server 10.xx.yy.139
content backup,images
options vers=4.1
prune-backups keep-all=1

nfs: NAS05
export /volume1/PVE_VMK
path /mnt/pve/STWNAS05
server 10.xx.yy.140
content images
prune-backups keep-all=1
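
If the NFS side is a suspect, the negotiated mount options can also be dumped on the host; as far as I know this should do it (the 10.xx.yy.* addresses above are masked on purpose):

# show NFS version, rsize/wsize and transport for each mounted share
nfsstat -m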
 

Attachments

  • image.png (492.9 KB)
  • dmesg.txt (115.1 KB)
What I meant was: do this as you reproduce scenario 1, while the job is running and your VMs are starting to freeze up.

Take a screenshot of htop so we can see if there is some odd behavior. You could also install sysstat and run mpstat 1 (to check CPU usage) and paste the output in a CODE tag. Check your disks' I/O with watch -n 1 iostat.

For scenario 2 you could additionally install nload and then run nload -u m while restoring from the network share.
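
In case it helps, the whole set of checks boils down to something like this (package names are the standard Debian ones):

apt install sysstat nload    # provides mpstat, iostat, sar and nload

mpstat 1                     # per-CPU usage, one sample per second
watch -n 1 iostat            # disk I/O statistics, refreshed every second
nload -u m                   # network throughput in Mbit/s (for the NFS/migration case)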

Best
 
What I meant was: do this as you reproduce scenario 1, while the job is running and your VMs are starting to freeze up.

Take a screenshot of htop so we can see if there is some odd behavior. You could also install sysstat and run mpstat 1 (to check CPU usage) and paste the output in a CODE tag. Check your disks' I/O with watch -n 1 iostat.

For scenario 2 you could additionally install nload and then run nload -u m while restoring from the network share.

Best
OK, I will test again and update with the results. Thanks for the help.
 
My bad! Please use sar -u 1 instead of mpstat 1, so that everything which utilizes the CPU will be collected.
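
For example, something like this; the columns to keep an eye on would mainly be %iowait and %idle:

sar -u 1    # overall CPU utilization, one sample per second
# typical columns: %user %nice %system %iowait %steal %idle
# a high %iowait while the VMs freeze would point at the storage path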