VMs suddenly stopped and I have to start them again.

It might not be a crash in userspace then, but in the kernel.
Just wondering: does this have a major effect after the upgrade?

update-initramfs: Generating /boot/initrd.img-6.2.16-3-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
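(As an aside, a quick way to confirm whether this host uses proxmox-boot-tool at all, and therefore whether the "skipping ESP sync" line above matters, is the following; this assumes a standard PVE 8 install.)

Code:
# shows whether proxmox-boot-tool manages any ESPs on this host;
# if it complains that /etc/kernel/proxmox-boot-uuids does not exist,
# the host boots via plain GRUB and the message above is harmless
proxmox-boot-tool status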
 
1. First check which repos you have enabled (you can look in the GUI under Node > Updates > Repositories).
2. You must have one active PVE repo. If you have a subscription, that will be pve-enterprise; if you do not, it will be pve-no-subscription.
3. To update, either use the GUI (Node > Updates, press Refresh and then >_ Upgrade), or use the CLI with apt-get update followed by apt-get dist-upgrade (don't use anything else on the CLI!), as sketched below.
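For reference, the CLI path boils down to this minimal sketch (run as root; reboot afterwards if a new kernel is pulled in):

Code:
# refresh package indexes from all enabled repositories
apt-get update

# full upgrade that may add/remove packages, as PVE upgrades require
apt-get dist-upgrade

# reboot if a new kernel or QEMU version was installed
# reboot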

Here is what I got so far:
root@awpve:~# apt update
Get:1 http://security.debian.org bookworm-security InRelease [48.0 kB]
Hit:2 http://ftp.tr.debian.org/debian bookworm InRelease
Hit:3 http://ftp.tr.debian.org/debian bookworm-updates InRelease
Fetched 48.0 kB in 8s (5,732 B/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
root@awpve:~# apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@awpve:~# apt dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@awpve:~#
 
Code:
nano /etc/apt/sources.list.d/pve-install-repo.list

#enter the following

deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

#save & exit
(Look carefully: that is pve-no-subscription, NOT pbs-no-subscription.)
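If the enterprise repository also happens to be configured but you have no subscription, you may want to disable it as well (the file name below assumes a default installation):

Code:
nano /etc/apt/sources.list.d/pve-enterprise.list

# comment out the existing line by prefixing it with '#', e.g.:
# deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise

#save & exit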
 
Code:
nano /etc/apt/sources.list.d/pve-install-repo.list

#enter the following

deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

#save & exit
(Look carefully: that is pve-no-subscription, NOT pbs-no-subscription.)
Thx

I see, so you're opting for the stable ones.

Here is what I did:

root@awpve:~# nano /etc/apt/sources.list.d/pve-install-repo.list
root@awpve:~# apt update
Hit:1 http://security.debian.org bookworm-security InRelease
Hit:2 http://ftp.tr.debian.org/debian bookworm InRelease
Hit:3 http://ftp.tr.debian.org/debian bookworm-updates InRelease
Get:4 http://download.proxmox.com/debian/pve bookworm InRelease [2,768 B]
Get:5 http://download.proxmox.com/debian/pve bookworm/pve-no-subscription amd64 Packages [361 kB]
Fetched 364 kB in 9s (40.0 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
58 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@awpve:~# apt list --upgradable
Listing... Done
grub-common/stable 2.06-13+pmx2 amd64 [upgradable from: 2.06-13+deb12u1]
grub-efi-amd64-bin/stable 2.06-13+pmx2 amd64 [upgradable from: 2.06-13+deb12u1]
grub-pc-bin/stable 2.06-13+pmx2 amd64 [upgradable from: 2.06-13+deb12u1]
grub-pc/stable 2.06-13+pmx2 amd64 [upgradable from: 2.06-13+deb12u1]
grub2-common/stable 2.06-13+pmx2 amd64 [upgradable from: 2.06-13+deb12u1]
ifupdown2/stable 3.2.0-1+pmx9 all [upgradable from: 3.2.0-1+pmx2]
ksm-control-daemon/stable 1.5-1 all [upgradable from: 1.4-1]
libjs-extjs/stable 7.0.0-4 all [upgradable from: 7.0.0-3]
libknet1/stable 1.28-pve1 amd64 [upgradable from: 1.25-pve1]
libnozzle1/stable 1.28-pve1 amd64 [upgradable from: 1.25-pve1]
libnvpair3linux/stable 2.2.6-pve1 amd64 [upgradable from: 2.1.12-pve1]
libopeniscsiusr/stable 2.1.8-1.pve1 amd64 [upgradable from: 2.1.8-1]
libproxmox-acme-perl/stable 1.5.1 all [upgradable from: 1.4.6]
libproxmox-acme-plugins/stable 1.5.1 all [upgradable from: 1.4.6]
libproxmox-backup-qemu0/stable 1.4.1 amd64 [upgradable from: 1.4.0]
libproxmox-rs-perl/stable 0.3.4 amd64 [upgradable from: 0.3.0]
libpve-access-control/stable 8.1.4 all [upgradable from: 8.0.3]
libpve-apiclient-perl/stable 3.3.2 all [upgradable from: 3.3.1]
libpve-cluster-api-perl/stable 8.0.8 all [upgradable from: 8.0.1]
libpve-cluster-perl/stable 8.0.8 all [upgradable from: 8.0.1]
libpve-common-perl/stable 8.2.8 all [upgradable from: 8.0.5]
libpve-guest-common-perl/stable 5.1.4 all [upgradable from: 5.0.3]
libpve-http-server-perl/stable 5.1.2 all [upgradable from: 5.0.3]
libpve-rs-perl/stable 0.8.11 amd64 [upgradable from: 0.8.3]
libpve-storage-perl/stable 8.2.6 all [upgradable from: 8.0.1]
librados2-perl/stable 1.4.1 amd64 [upgradable from: 1.4.0]
libuutil3linux/stable 2.2.6-pve1 amd64 [upgradable from: 2.1.12-pve1]
libzfs4linux/stable 2.2.6-pve1 amd64 [upgradable from: 2.1.12-pve1]
libzpool5linux/stable 2.2.6-pve1 amd64 [upgradable from: 2.1.12-pve1]
lxc-pve/stable 6.0.0-1 amd64 [upgradable from: 5.0.2-4]
lxcfs/stable 6.0.0-pve2 amd64 [upgradable from: 5.0.3-pve3]
novnc-pve/stable 1.5.0-1 all [upgradable from: 1.4.0-2]
open-iscsi/stable 2.1.8-1.pve1 amd64 [upgradable from: 2.1.8-1]
proxmox-archive-keyring/stable 3.1 all [upgradable from: 3.0]
proxmox-backup-client/stable 3.2.9-1 amd64 [upgradable from: 2.99.0-1]
proxmox-backup-file-restore/stable 3.2.9-1 amd64 [upgradable from: 2.99.0-1]
proxmox-backup-restore-image/stable 0.6.1 amd64 [upgradable from: 0.5.1]
proxmox-kernel-helper/stable 8.1.0 all [upgradable from: 8.0.2]
proxmox-mail-forward/stable 0.2.3 amd64 [upgradable from: 0.1.1-1]
proxmox-ve/stable 8.2.0 all [upgradable from: 8.0.1]
proxmox-widget-toolkit/stable 4.3.0 all [upgradable from: 4.0.5]
pve-cluster/stable 8.0.8 amd64 [upgradable from: 8.0.1]
pve-container/stable 5.2.1 all [upgradable from: 5.0.3]
pve-docs/stable 8.2.4 all [upgradable from: 8.0.3]
pve-edk2-firmware/stable 4.2023.08-4 all [upgradable from: 3.20230228-4]
pve-firewall/stable 5.0.7 amd64 [upgradable from: 5.0.2]
pve-firmware/stable 3.14-1 all [upgradable from: 3.7-1]
pve-ha-manager/stable 4.0.5 amd64 [upgradable from: 4.0.2]
pve-i18n/stable 3.2.4 all [upgradable from: 3.0.4]
pve-kernel-6.2/stable 8.0.5 all [upgradable from: 8.0.2]
pve-manager/stable 8.2.8 amd64 [upgradable from: 8.0.3]
pve-qemu-kvm/stable 9.0.2-4 amd64 [upgradable from: 8.0.2-3]
pve-xtermjs/stable 5.3.0-3 amd64 [upgradable from: 4.16.0-3]
qemu-server/stable 8.2.6 amd64 [upgradable from: 8.0.6]
spl/stable 2.2.6-pve1 all [upgradable from: 2.1.12-pve1]
zfs-initramfs/stable 2.2.6-pve1 all [upgradable from: 2.1.12-pve1]
zfs-zed/stable 2.2.6-pve1 amd64 [upgradable from: 2.1.12-pve1]
zfsutils-linux/stable 2.2.6-pve1 amd64 [upgradable from: 2.1.12-pve1]
root@awpve:~#


Does this seem good to go?
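If so, I assume the next step is just the full upgrade and a reboot, roughly like this (correct me if I'm missing something):

Code:
apt dist-upgrade

# reboot so the new kernel and QEMU are actually used
reboot

# afterwards, verify the installed versions
pveversion -v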
 
See
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates
https://pve.proxmox.com/wiki/Package_Repositories

Are the latest BIOS updates installed?

Did you attempt getting a dump with systemd-coredump?
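If not, a rough sketch of that check (assuming systemd-coredump can simply be installed from the Debian repos on the node):

Code:
# install the collector; it also provides coredumpctl
apt install systemd-coredump

# after the next unexpected VM stop, list any captured crashes
coredumpctl list

# inspect a specific entry, e.g. a crashed QEMU process, by its PID
coredumpctl info <PID>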
I have done the following:
BIOS update
IMM firmware update
RAID firmware update
I've also done the apt dist-upgrade and I'm now on PVE 8.2.8, yet the same thing keeps happening.

Nov 18 22:27:24 awpve kernel: tap777i0: entered allmulticast mode
Nov 18 22:27:24 awpve kernel: vmbr0: port 2(tap777i0) entered blocking state
Nov 18 22:27:24 awpve kernel: vmbr0: port 2(tap777i0) entered forwarding state
Nov 18 22:27:25 awpve kernel: tap777i1: entered promiscuous mode
Nov 18 22:27:25 awpve kernel: vmbr1: port 2(tap777i1) entered blocking state
Nov 18 22:27:25 awpve kernel: vmbr1: port 2(tap777i1) entered disabled state
Nov 18 22:27:25 awpve kernel: tap777i1: entered allmulticast mode
Nov 18 22:27:25 awpve kernel: vmbr1: port 2(tap777i1) entered blocking state
Nov 18 22:27:25 awpve kernel: vmbr1: port 2(tap777i1) entered forwarding state
Nov 18 22:27:25 awpve pve-guests[1975]: VM 777 started with PID 1986.
Nov 18 22:27:29 awpve pve-guests[1973]: <root@pam> end task UPID:awpve:000007B6:00000BE8:673B9519:startall::root@pam: OK
Nov 18 22:27:29 awpve systemd[1]: Finished pve-guests.service - PVE guests.
Nov 18 22:27:29 awpve systemd[1]: Starting pvescheduler.service - Proxmox VE scheduler...
Nov 18 22:27:31 awpve pvescheduler[2103]: starting server
Nov 18 22:27:31 awpve systemd[1]: Started pvescheduler.service - Proxmox VE scheduler.
Nov 18 22:27:31 awpve systemd[1]: Reached target multi-user.target - Multi-User System.
Nov 18 22:27:31 awpve systemd[1]: Reached target graphical.target - Graphical Interface.
Nov 18 22:27:31 awpve systemd[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
Nov 18 22:27:31 awpve systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Nov 18 22:27:31 awpve systemd[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
Nov 18 22:27:31 awpve systemd[1]: Startup finished in 4.517s (kernel) + 35.538s (userspace) = 40.055s.
Nov 18 22:27:32 awpve systemd[1]: systemd-fsckd.service: Deactivated successfully.
Nov 18 22:27:59 awpve pvedaemon[1952]: <root@pam> successful auth for user 'root@pam'
Nov 18 22:42:16 awpve systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Nov 18 22:42:16 awpve systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Nov 18 22:42:16 awpve systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Nov 18 22:42:16 awpve systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Nov 18 22:43:00 awpve pvedaemon[1953]: <root@pam> successful auth for user 'root@pam'
Nov 18 22:58:01 awpve pvedaemon[1953]: <root@pam> successful auth for user 'root@pam'
Nov 18 23:13:02 awpve pvedaemon[1952]: <root@pam> successful auth for user 'root@pam'
Nov 18 23:17:01 awpve CRON[13230]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 18 23:17:01 awpve CRON[13231]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 18 23:17:01 awpve CRON[13230]: pam_unix(cron:session): session closed for user root
Nov 18 23:28:04 awpve pvedaemon[1953]: <root@pam> successful auth for user 'root@pam'
Nov 18 23:28:15 awpve kernel: zd32: p1 p2
Nov 18 23:28:15 awpve kernel: zd0: p1
Nov 18 23:28:16 awpve kernel: tap777i0: left allmulticast mode
Nov 18 23:28:16 awpve kernel: vmbr0: port 2(tap777i0) entered disabled state
Nov 18 23:28:17 awpve kernel: tap777i1: left allmulticast mode
Nov 18 23:28:17 awpve kernel: vmbr1: port 2(tap777i1) entered disabled state
Nov 18 23:28:17 awpve qmeventd[1482]: read: Connection reset by peer
Nov 18 23:28:17 awpve pvestatd[1935]: VM 777 qmp command failed - VM 777 not running
Nov 18 23:28:17 awpve systemd[1]: 777.scope: Deactivated successfully.
Nov 18 23:28:17 awpve systemd[1]: 777.scope: Consumed 9min 4.375s CPU time.
Nov 18 23:28:18 awpve qmeventd[15785]: Starting cleanup for 777
Nov 18 23:28:18 awpve qmeventd[15785]: Finished cleanup for 777
Nov 18 23:43:04 awpve pvedaemon[1954]: <root@pam> successful auth for user 'root@pam'
Nov 18 23:58:05 awpve pvedaemon[1953]: <root@pam> successful auth for user 'root@pam'


Am I missing something here? Any clue, please?
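For completeness, here is what I plan to check after the next stop (assuming an OOM kill or some other kernel message might show up around the time the VM disappears, 23:28 in the log above):

Code:
# kernel messages around the time the VM stopped
# (adjust the time window to match the journal above)
journalctl -k --since "23:20" --until "23:35"

# look specifically for out-of-memory kills
journalctl -k | grep -iE "out of memory|oom"

# check the VM's current state as PVE sees it
qm status 777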
 
It might not be a crash in userspace then, but in the kernel.
Just to add something: the ZFS pool is nearly full.
Can this cause a problem? It has been like that for 8 months, though.
97.42% (1.12 TB of 1.15 TB)
 
97.42% (1.12 TB of 1.15 TB)
I don't personally use ZFS at the moment, but as you well know, for optimum performance you should not have the pool that heavily used.
What I would do in your case is offload about 200 GB of data from the pool, bringing the usage down to roughly 80%, run a scrub, and test whether you regain stability.
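To see exactly how full and fragmented the pool is, something like this should do (a quick sketch; it lists all pools, so no pool name needs to be assumed):

Code:
# capacity and fragmentation overview of all pools
zpool list -o name,size,alloc,free,cap,frag,health

# per-dataset space usage, including snapshots and reservations
zfs list -o space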
 
I don't personally use ZFS at the moment, but as you well know, for optimum performance you should not have the pool that heavily used.
What I would do in your case is offload about 200 GB of data from the pool, bringing the usage down to roughly 80%, run a scrub, and test whether you regain stability.
The strange thing is that it was like that for 8 months; only when we replaced the motherboard did things start going funny.
Another note: I've noticed that our engineer inserted the SAS drives in the opposite sequence from how they were on the old motherboard.
I tend to think that with zfs1 software RAID this shouldn't be a big deal, but correct me if I'm wrong.
The reason I'm using ZFS is that I wanted to go with zfs1 software RAID. I don't mind switching to another approach, but keep in mind that this is a single-node server.
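ZFS identifies pool members by their on-disk labels rather than by physical bay position, so one way to confirm the pool survived the move intact is the following (a sketch; /dev/sda is just an example and needs adjusting to your disks):

Code:
# all vdevs should show ONLINE with zero read/write/checksum errors,
# regardless of which bay each disk ended up in
zpool status -v

# SMART health summary of one of the underlying disks (adjust device name)
smartctl -H /dev/sda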
 
It might not be a crash in userspace then, but in the kernel.
Hi, sorry to keep bothering you, any clue on this? Could it be that the drive bays were not populated in the same sequence when the pool was transferred to the new server?
 
Hi, sorry to keep bothering you, any clue on this? Could it be that the drive bays were not populated in the same sequence when the pool was transferred to the new server?
It sounds like it shouldn't matter, but it can't be ruled out, of course. The much more likely issue is that the ZFS pool is too full. ZFS uses copy-on-write and metadata trees, and as a rule of thumb should be at most 80-90% full to avoid issues.
 
It sounds like it shouldn't matter, but it can't be ruled out, of course. The much more likely issue is that the ZFS pool is too full. ZFS uses copy-on-write and metadata trees, and as a rule of thumb should be at most 80-90% full to avoid issues.
I've just gone ahead and done a disk analysis.
ZFS shows 97% full, but that is the space taken by the virtual drives themselves; inside each VM the disks are only about 20% used, no more. So I wonder whether this actually matters or not?
 
AFAIK (as above, I don't use ZFS) you need to run fstrim inside the VM and also have the discard/trim option set for the virtual disk in Proxmox.
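Roughly, the idea is the following (a sketch: VMID 777 is from this thread, the dataset name is an assumption based on default naming, and the Discard option is set on the VM's virtual disk under Hardware > Hard Disk):

Code:
# on the Proxmox host: see how much space the zvol really consumes
# compared to its nominal size (dataset name is assumed)
zfs get volsize,used,refreservation rpool/data/vm-777-disk-0

# inside the guest (Linux), after Discard is enabled on the virtual disk,
# release freed blocks back to the zvol
fstrim -av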