Proxmox freezes some KVM machines

therrynoprat

New Member
Mar 9, 2019
Hi everyone.
As a user, I would like to thank you for this great product you are sharing with the community.

Unfortunately, I'm not only here to thank you.

I recently started to have problems with some of the virtual machines in my installation. My hardware is a Ryzen 5 2600 with 16 GB of ECC RAM.

After a couple of weeks of uptime with no problems, TWO (router and nas) of my 4 VMs are now unusable, because they freeze after a couple of seconds of uptime.

The only thing left for me to do is a brutal

kill -9 $PID

When restarted, the machine freezes again a couple of seconds after login.
Once it is frozen, it is not possible to use the Proxmox console or to connect to the machine via SSH.
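
For reference, a less drastic route than a raw kill is usually available on the host. The sketch below assumes the usual Proxmox layout, where each running VM has a PID file at /var/run/qemu-server/<vmid>.pid. The helper name vm_pid and the PIDDIR variable are made up for illustration, and 105 is only a placeholder VM ID.

```shell
#!/bin/sh
# Sketch only: look up the QEMU process ID of a VM from its PID file.
# PIDDIR is a hypothetical override; the default path is an assumption
# about a standard Proxmox VE host.
PIDDIR="${PIDDIR:-/var/run/qemu-server}"

vm_pid() {
    # $1 is the VM ID, e.g. 105 (placeholder).
    pidfile="$PIDDIR/$1.pid"
    # Fail quietly if the VM is not running or the file is unreadable.
    [ -r "$pidfile" ] || return 1
    cat "$pidfile"
}

# Possible usage on the host, from gentle to brutal:
#   qm status 105 --verbose      # what state does Proxmox think the VM is in?
#   qm stop 105                  # ask Proxmox to stop the VM
#   kill -9 "$(vm_pid 105)"      # last resort, same effect as kill -9 $PID
```

Trying qm stop first keeps Proxmox's own bookkeeping consistent; the kill remains available if the stop request itself hangs.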

How can I understand what is going on with the two virtual machines?
If I am quick enough I can run

sudo dmesg -kw

but nothing is printed in the console before the machine freezes.


I cannot link an image because I am a new user on the forum, but in the menu on the left, in the list of all the virtual machines, the frozen VMs show a yellow triangle instead of a green checkmark, signaling a problem.


The freeze seems to be random. When I turn on the VM, I can follow the boot on the console with no errors, and sometimes I can even make it through the login! But after a couple of minutes the icon of the VM changes and the VM is unreachable.

This happens to just two of the machines in the network. All of them are Debian 9 machines installed from the same ISO. There are no major differences between them.


Thanks for your help.
 
I have also noticed the following.

The machine that I use as a NAS might keep working for a couple of minutes. It always freezes after I start a CPU-intensive task (for example a duplicity run between two disks). This is weird. :/
The other machines can stay online almost indefinitely, but they freeze as well as soon as I put a little bit of workload on them.
 
Please post the VM configs. Are there any errors in the logs on the host?
Code:
journalctl -r -p3
Was there a BIOS update or any other change? Which pveversion do you use?
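
If it helps, the requested information can be gathered in one go. This is only a sketch: the function names try and collect_report are made up, the VM IDs passed to it are placeholders, and each command is skipped with a note if it is not installed, so the script is harmless off the Proxmox host.

```shell
#!/bin/sh
# Sketch only: collect PVE version, VM configs and recent host errors.

try() {
    # Print a header, then run the command if it exists on this machine.
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "($1 not found; run this on the Proxmox host)"
    fi
}

collect_report() {
    # Arguments: the VM IDs whose configuration should be included.
    try pveversion -v
    for vmid in "$@"; do
        try qm config "$vmid"
    done
    # Same query as above, limited to the 50 most recent entries.
    try journalctl -r -p 3 -n 50 --no-pager
}

# Possible usage (VM IDs are placeholders):
#   collect_report 103 105 > report.txt
```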
 
Sorry for the late reply, I was afk and the server got a kernel panic :(

I am running
pve-manager/5.3-11/d4907f84 (running kernel: 4.15.18-11-pve)

Unfortunately I am not able to show any of the previous kernel panics, but this is the journalctl report.

root@tortuga:~# journalctl -r -p3
-- Logs begin at Mon 2019-03-18 20:13:55 CET, end at Mon 2019-03-18 20:42:01 CET. --
Mar 18 20:18:34 tortuga pvedaemon[3540]: <root@pam> end task UPID:tortuga:0000115A:00006C05:5C8FEF0A:vncproxy:105:root@pam: Failed to run vncproxy.
Mar 18 20:18:34 tortuga pvedaemon[4442]: Failed to run vncproxy.
Mar 18 20:18:34 tortuga qm[4444]: VM 105 qmp command failed - VM 105 not running
Mar 18 20:18:31 tortuga pvedaemon[3540]: <root@pam> end task UPID:tortuga:00001114:00006AC3:5C8FEF07:vncproxy:105:root@pam: Failed to run vncproxy.
Mar 18 20:18:31 tortuga pvedaemon[4372]: Failed to run vncproxy.
Mar 18 20:18:31 tortuga qm[4374]: VM 105 qmp command failed - VM 105 not running
Mar 18 20:18:08 tortuga pvedaemon[3541]: VM 105 qmp command failed - VM 105 qmp command 'guest-ping' failed - got timeout
Mar 18 20:18:04 tortuga pvedaemon[3541]: VM 103 qmp command failed - VM 103 qmp command 'guest-ping' failed - got timeout
Mar 18 20:17:58 tortuga pvedaemon[3539]: VM 105 qmp command failed - VM 105 qmp command 'guest-ping' failed - got timeout
Mar 18 20:14:10 tortuga pveupdate[4787]: <root@pam> end task UPID:tortuga:000013BA:0000080C:5C8FEE01:aptupdate::root@pam: command 'apt-get update' failed: exit code 100
Mar 18 20:14:10 tortuga pveupdate[5050]: command 'apt-get update' failed: exit code 100
Mar 18 20:14:04 tortuga iscsid[4331]: iSCSI daemon with pid=4333 started!
Mar 18 20:13:55 tortuga kernel: Error: Driver 'pcspkr' is already registered, aborting...
Mar 18 20:13:55 tortuga kernel: Couldn't get size: 0x800000000000000e
Mar 18 20:13:55 tortuga kernel: ACPI Exception: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20170831/nsinit-426)
Mar 18 20:13:55 tortuga kernel: ACPI Error: Needed [Integer/String/Buffer], found [Region] (ptrval) (20170831/exresop-424)
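
To see at a glance which VMs stopped answering, the guest-ping timeouts in a journal like the one above can be counted per VM. This is a rough sketch: the function name is hypothetical, and it assumes the message format shown above, where the VM ID directly follows the first "VM" token. As far as I know, guest-ping is the check sent to the QEMU guest agent, so a timeout only says that the agent inside the guest did not answer, not why.

```shell
#!/bin/sh
# Sketch only: count 'guest-ping' timeouts per VM ID from journal lines on stdin.
count_guest_ping_timeouts() {
    awk '/guest-ping/ && /got timeout/ {
             # The VM ID is the field right after the first "VM" token.
             for (i = 1; i < NF; i++)
                 if ($i == "VM") { n[$(i + 1)]++; break }
         }
         END { for (v in n) print v, n[v] }' | sort -n
}

# Possible usage on the host:
#   journalctl -r -p3 --no-pager | count_guest_ping_timeouts
```

Each output line is a VM ID followed by the number of timeouts seen for it.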
 
Also, I was able to set up a serial terminal, to see if I can get some logs from the VM before the crash.

The router VM is no longer usable because it crashes a few seconds after every reboot. The nas VM, on the other hand, lets me play for a while.

For instance, I can safely run the stress command with no problem (so it should not be a problem with the CPU), but the VM crashes when I try to run an apt upgrade.

Nevertheless, on the serial terminal that I attached to from the host (following this tutorial), there are NO error messages of any sort.
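
For anyone reproducing this, one common way to get such a serial terminal on Proxmox is sketched below. It is not necessarily what the tutorial does: the function names and the DRY_RUN switch are made up for illustration, and the VM ID is a placeholder. The guest also has to send its console to ttyS0, which is normally done on the kernel command line.

```shell
#!/bin/sh
# Sketch only: add a serial port to a VM and attach to it from the host.

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_serial_console() {
    # Adds a virtual serial port; the VM needs a full stop/start afterwards.
    run qm set "$1" -serial0 socket
}

open_serial_console() {
    # Attaches the current terminal to the VM's serial port.
    run qm terminal "$1"
}

# Inside the Debian guest (assumption: GRUB is the boot loader), add
#   console=tty0 console=ttyS0,115200
# to GRUB_CMDLINE_LINUX in /etc/default/grub, then run update-grub.
#
# Possible usage on the host (105 is a placeholder VM ID):
#   enable_serial_console 105
#   open_serial_console 105
```

Running with DRY_RUN=1 first shows exactly which qm commands would be executed.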
 
This is the tty0 console output of the router VM before it died.


Code:
[    0.218936] acpiphp: Slot [1-2] registered
[    0.219370] acpiphp: Slot [2-2] registered
[    0.220023] acpiphp: Slot [3-3] registered
[    0.220988] acpiphp: Slot [4-3] registered
[    0.221422] acpiphp: Slot [5-3] registered
[    0.221855] acpiphp: Slot [6-3] registered
[    0.222285] acpiphp: Slot [7-3] registered
[    0.222725] acpiphp: Slot [8-3] registered
[    0.223163] acpiphp: Slot [9-3] registered
[    0.223603] acpiphp: Slot [10-3] registered
[    0.224032] acpiphp: Slot [11-3] registered
[    0.224473] acpiphp: Slot [12-3] registered
[    0.224910] acpiphp: Slot [13-3] registered
[    0.225352] acpiphp: Slot [14-3] registered
[    0.225797] acpiphp: Slot [15-3] registered
[    0.226246] acpiphp: Slot [16-3] registered
[    0.226681] acpiphp: Slot [17-3] registered
[    0.227130] acpiphp: Slot [18-3] registered
[    0.227569] acpiphp: Slot [19-3] registered
[    0.228014] acpiphp: Slot [20-3] registered
[    0.228462] acpiphp: Slot [21-3] registered
[    0.228910] acpiphp: Slot [22-3] registered
[    0.229355] acpiphp: Slot [23-3] registered
[    0.229796] acpiphp: Slot [24-3] registered
[    0.230244] acpiphp: Slot [25-3] registered
[    0.230688] acpiphp: Slot [26-3] registered
[    0.231137] acpiphp: Slot [27-3] registered
[    0.231587] acpiphp: Slot [28-3] registered
[    0.232032] acpiphp: Slot [29-3] registered
[    0.232479] acpiphp: Slot [30-2] registered
[    0.232918] acpiphp: Slot [31-2] registered
[    0.233528] pci 0000:00:1f.0: PCI bridge to [bus 02]
[    0.235292] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.235979] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.236233] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.236904] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.237558] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    0.238415] ACPI: Enabled 3 GPEs in block 00 to 0F
[    0.240016] vgaarb: setting as boot device: PCI:0000:00:02.0
[    0.240580] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.240580] vgaarb: loaded
[    0.240580] vgaarb: bridge control possible 0000:00:02.0
[    0.244030] PCI: Using ACPI for IRQ routing
[    0.244730] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[    0.244802] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.245373] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    0.252046] clocksource: Switched to clocksource kvm-clock
[    0.257872] VFS: Disk quotas dquot_6.6.0
[    0.258346] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.259161] pnp: PnP ACPI init
[    0.259875] pnp: PnP ACPI: found 5 devices
[    0.267346] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.269280] pci 0000:00:1e.0: PCI bridge to [bus 01]
[    0.269831] pci 0000:00:1e.0:   bridge window [io  0xd000-0xdfff]
[    0.271095] pci 0000:00:1e.0:   bridge window [mem 0xfe800000-0xfe9fffff]
[    0.272234] pci 0000:00:1e.0:   bridge window [mem 0xfe200000-0xfe3fffff 64bit pref]
[    0.273855] pci 0000:00:1f.0: PCI bridge to [bus 02]
[    0.275144] pci 0000:00:1f.0:   bridge window [io  0xc000-0xcfff]
[    0.276464] pci 0000:00:1f.0:   bridge window [mem 0xfe600000-0xfe7fffff]
[    0.277615] pci 0000:00:1f.0:   bridge window [mem 0xfe000000-0xfe1fffff 64bit pref]
[    0.279443] NET: Registered protocol family 2
[    0.280116] TCP established hash table entries: 32768 (order: 6, 262144 bytes)
[    0.280926] TCP bind hash table entries: 32768 (order: 7, 524288 bytes)
[    0.281964] TCP: Hash tables configured (established 32768 bind 32768)
[    0.282678] UDP hash table entries: 2048 (order: 4, 65536 bytes)
[    0.283329] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
[    0.284086] NET: Registered protocol family 1
[    0.284556] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.285192] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.285817] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.303095] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
[    0.320641] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    0.321578] Unpacking initramfs...
[    0.545445] Freeing initrd memory: 17984K
[    0.545928] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    0.546607] software IO TLB [mem 0xbbfdb000-0xbffdb000] (64MB) mapped at [ffff98117bfdb000-ffff98117ffdafff]
[    0.547672] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30eac5b3ab4, max_idle_ns: 440795272524 ns
[    0.549152] audit: initializing netlink subsys (disabled)
[    0.549892] audit: type=2000 audit(1552948799.640:1): initialized
[    0.551096] workingset: timestamp_bits=40 max_order=20 bucket_order=0
[    0.552145] zbud: loaded
[    0.553454] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[    0.554319] io scheduler noop registered
[    0.554758] io scheduler deadline registered
[    0.555238] io scheduler cfq registered (default)
[    0.556123] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    0.556721] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    0.557575] GHES: HEST is not enabled!
[    0.558042] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.581085] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    0.582146] Linux agpgart interface v0.103
[    0.582630] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    0.583228] AMD IOMMUv2 functionality not available on this system
[    0.584319] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.586407] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.586946] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.587609] mousedev: PS/2 mouse device common for all mice
[    0.588397] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    0.589493] rtc_cmos 00:00: RTC can wake from S4
[    0.590222] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    0.590984] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
[    0.591938] ledtrig-cpu: registered to indicate activity on CPUs
[    0.592797] NET: Registered protocol family 10
[    0.593483] mip6: Mobile IPv6
[    0.593809] NET: Registered protocol family 17
[    0.594284] mpls_gso: MPLS GSO support
[    0.595010] registered taskstats version 1
[    0.595463] zswap: loaded using pool lzo/zbud
[    0.596220] ima: No TPM chip found, activating TPM-bypass!
[    0.596814] ima: Allocated hash algorithm: sha256
[    0.597780] rtc_cmos 00:00: setting system clock to 2019-03-18 22:39:58 UTC (1552948798)
[    0.599630] Freeing unused kernel memory: 1420K
[    0.600128] Write protecting the kernel read-only data: 12288k
[    0.601192] Freeing unused kernel memory: 1908K
[    0.603417] Freeing unused kernel memory: 1224K
[    0.607760] x86/mm: Checked W+X mappings: passed, no W+X pages found.
Loading, please wait...
starting version 232
[    0.617532] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    0.618340] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    0.618345] random: udevadm: uninitialized urandom read (16 bytes read)
[    0.635402] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0x700, revision 0
[    0.643435] SCSI subsystem initialized
[    0.644494] ACPI: bus type USB registered
[    0.645148] usbcore: registered new interface driver usbfs
[    0.645855] usbcore: registered new interface driver hub
[    0.646524] usbcore: registered new device driver usb
[    0.647879] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3
[    0.649140] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input2
[    0.650369] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.651852] uhci_hcd: USB Universal Host Controller Interface driver
[    0.652659] FDC 0 is a S82078B
[    0.656494] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
[    0.672353] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.703381] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.705581] scsi host0: ata_piix
[    0.706192] scsi host2: ata_piix
[    0.706737] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xe0e0 irq 14
[    0.707435] scsi host1: Virtio SCSI HBA
[    0.707944] scsi 1:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    0.709909] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xe0e8 irq 15
[    0.710971] virtio_net virtio3 ens18: renamed from eth0
[    0.726359] uhci_hcd 0000:00:01.2: UHCI Host Controller
[    0.726893] uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
[    0.727628] uhci_hcd 0000:00:01.2: detected 2 ports
[    0.728191] uhci_hcd 0000:00:01.2: irq 11, io base 0x0000e080
[    0.728821] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
[    0.729485] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.730199] usb usb1: Product: UHCI Host Controller
[    0.730686] usb usb1: Manufacturer: Linux 4.9.0-8-amd64 uhci_hcd
[    0.731286] usb usb1: SerialNumber: 0000:00:01.2
[    0.731849] hub 1-0:1.0: USB hub found
[    0.732271] hub 1-0:1.0: 2 ports detected
[    0.761665] random: fast init done
[    0.868724] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[    0.869728] ata2.00: configured for MWDMA2
[    0.870713] scsi 2:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
[    0.889596] sd 1:0:0:0: [sda] 67108864 512-byte logical blocks: (34.4 GB/32.0 GiB)
[    0.890797] sd 1:0:0:0: [sda] Write Protect is off
[    0.891495] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    0.894014]  sda: sda1 sda2 < sda5 >
[    0.894936] sd 1:0:0:0: [sda] Attached SCSI disk
[    0.904458] sr 2:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    0.905173] cdrom: Uniform CD-ROM driver Revision: 3.20
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... [    0.990490] PM: Starting manual resume from disk
done.
[    1.060024] usb 1-1: new full-speed USB device number 2 using uhci_hcd
Begin: Will now check root file system ... fsck from util-linux 2.29.2
[/sbin/fsck.ext4 (1) -- /dev/sda1] fsck.ext4 -a -C0 /dev/sda1
/dev/sda1: recovering journal
 
