Search results

  1. PVE 7.0 BUG: kernel NULL pointer dereference, address: 00000000000000c0-PF:error_code(0x0000) - No web access no ssh

    I would really like to recommend that Proxmox bump the ABI whenever they release a new package, either to apply a hotfix to the package or when making use of an upstream update. The scenario we found ourselves in is that we were affected by the bug in this forum post, read that package 5.11.22-9...
  2. PVE 7.0 BUG: kernel NULL pointer dereference, address: 00000000000000c0-PF:error_code(0x0000) - No web access no ssh

    Hrm.... So how is it possible that we were running nodes on 5.11.22-4-pve where 'dpkg -l pve-kernel-5.11.22-4-pve' reports it as being 5.11.22-9 but the active kernel in memory is actually 5.11.22-8? [admin@kvm5f ~]# uname -r 5.11.22-4-pve [admin@kvm5f ~]# dpkg -l pve-kernel-5.11.22-4-pve...
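
    A minimal check along these lines (a sketch; the console output in the snippet above is truncated):

      # Compare the running kernel build against the installed pve-kernel package.
      running="$(uname -r)"                     # e.g. 5.11.22-4-pve
      pkgver="$(dpkg-query -W -f='${Version}' "pve-kernel-${running}" 2>/dev/null)"
      echo "running kernel  : ${running} ($(uname -v))"
      echo "package version : ${pkgver:-not installed}"
      # The package version changes on disk as soon as the updated package is
      # installed; the kernel in memory only changes after the next reboot.
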
  3. PVE 7.0 BUG: kernel NULL pointer dereference, address: 00000000000000c0-PF:error_code(0x0000) - No web access no ssh

    This appears to exclusively affect Intel Xeon Scalable 2nd generation systems (e.g. Intel Xeon Gold 6248). SSDs are proper data centre grade and Ceph is performing well. Heavy after-hours usage consists of VM backups (many run within the VM), cluster backups (rotating RBD snapshots) and deep scrubs...
  4. CEPH on PVE - how much space consumed?

    PS: At these OSD utilisation levels there isn't really spare capacity to re-distribute a failed OSD. This cluster is currently in a warning state due to 'noout' having been set, to avoid a cascading failure of other OSDs as they tried to replicate. [root@kvm1a ~]# ceph -s cluster: id...
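
    For reference, the 'noout' flag mentioned above is set and cleared with the standard Ceph commands (a sketch, not taken from the truncated output):

      ceph osd set noout      # stop Ceph from marking down OSDs 'out', which would
                              # trigger re-replication onto the remaining (full) OSDs
      ceph -s                 # the cluster reports HEALTH_WARN while the flag is set
      ceph osd unset noout    # clear the flag once capacity has been restored
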
  5. CEPH on PVE - how much space consumed?

    I do agree that Ceph storage utilisation should be reworked in the web UI; herewith an example of a cluster where there's a critical problem but the client simply wasn't aware of it: I understand that Storage here is simply a sum of all available storage on each node, perhaps exclude CephFS mounts then and...
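
    As an aside, the per-pool numbers Ceph itself reports are a better indicator than the summed per-node storage, for example:

      ceph df detail     # RAW STORAGE plus per-pool STORED / %USED / MAX AVAIL
      ceph osd df tree   # per-OSD utilisation and variance across the CRUSH tree
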
  6. PVE 7.0 BUG: kernel NULL pointer dereference, address: 00000000000000c0-PF:error_code(0x0000) - No web access no ssh

    We had another occurrence in the early hours of this morning, this was with the system running the latest kernel: pve-kernel-5.11.22-5-pve: 5.11.22-10 [admin@kvm5k ~]# uname -r 5.11.22-5-pve
  7. linux guest: weird NIC names "rename10", "rename11" ...

    PVE 7's kernel is no longer able to rename interfaces to keep 'eth0', 'eth1', 'eth2', etc. consistent. This is primarily due to systemd initialising things concurrently. If you want consistent names you now have to call them something other than what the kernel uses by default; the following...
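
    A minimal sketch of such a rename (the interface name 'lan0' and the MAC address are placeholders, not values from the post):

      # Pin a name outside the kernel's eth* namespace with a systemd .link file.
      printf '[Match]\nMACAddress=aa:bb:cc:dd:ee:ff\n\n[Link]\nName=lan0\n' \
        > /etc/systemd/network/10-lan0.link
      # Rebuilding the initramfs is commonly recommended so the rename applies early at boot.
      update-initramfs -u -k all
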
  8. Migration suggestion - from Proxmox to Proxmox

    Herewith some notes on creating an SSH public key pair and then syncing block devices by transferring only the differences, compressed via lzop. Great for inter-data-centre copies... Network based block replication: NB: Requires the 'lzop' package! PS: Requires SSH keys to be...
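
    The notes themselves are truncated; the general shape of such a pipeline looks roughly like this (device paths and the hostname are placeholders, and the differential part of the original method is not shown):

      # One-off key pair for passwordless replication
      ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ''
      ssh-copy-id -i ~/.ssh/id_ed25519.pub root@remote

      # Stream a block device through lzop and SSH to a matching device on the target
      SRC=/dev/zvol/rpool/vm-100-disk-0
      DST=/dev/zvol/rpool/vm-100-disk-0
      dd if="$SRC" bs=4M status=progress \
        | lzop -c \
        | ssh root@remote "lzop -dc | dd of=$DST bs=4M conv=sparse"
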
  9. Migration suggestion - from Proxmox to Proxmox

    Perhaps the following collection of commands is useful: Convert or copy images: Copy RBD Image between Ceph pools (e.g. rbd_hdd -> rbd_ssd): # This command copies the source image to the destination image, honouring parent clone references. # ie: It copies data from the delta...
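
    For example (pool and image names are illustrative; this is not necessarily the exact command from the truncated post):

      # Straight copy of an RBD image into another pool
      rbd cp rbd_hdd/vm-100-disk-0 rbd_ssd/vm-100-disk-0

      # Alternative via QEMU, which always produces a fully self-contained image
      qemu-img convert -p -f raw -O raw \
        rbd:rbd_hdd/vm-100-disk-0 rbd:rbd_ssd/vm-100-disk-0
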
  10. unexpected restart of all cluster nodes

    Also consider getting Corosync to automatically restart should it ever crash: mkdir /etc/systemd/system/corosync.service.d; echo -e '[Service]\nRestart=on-failure' > /etc/systemd/system/corosync.service.d/override.conf; systemctl daemon-reload; systemctl restart corosync; corosync-cfgtool -s...
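
    The same commands, split onto separate lines for readability (the final corosync-cfgtool invocation is truncated in the snippet):

      mkdir -p /etc/systemd/system/corosync.service.d
      echo -e '[Service]\nRestart=on-failure' \
        > /etc/systemd/system/corosync.service.d/override.conf
      systemctl daemon-reload
      systemctl restart corosync
      corosync-cfgtool -s     # verify ring/link status afterwards
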
  11. unexpected restart of all cluster nodes

    The disruption of fencing a node unnecessarily is massive, so we adjusted Corosync to simply be less sensitive and only fence when a node was really unavailable. We typically recommend 4 x 10G interfaces, basically comprising two LACP bonds where one is used for VM traffic and the second is used...
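
    The exact values used are not shown in the snippet; the relevant knob is the totem token timeout in /etc/pve/corosync.conf (remember to increment config_version when editing it), for example:

      totem {
        # ... existing cluster/interface settings unchanged ...
        token: 10000                               # ms before a node is declared lost (base default is 1000)
        token_retransmits_before_loss_const: 10    # retransmits before the token is declared lost
      }
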
  12. Script to get Snapshots from Ceph and Merge it with VM-Name

    Continued: Top wrqm (merged write requests per second). Columns: vm_name, ceph_rbd_image, followed by the extended iostat fields (r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util)...
  13. Script to get Snapshots from Ceph and Merge it with VM-Name

    The following sort of does this in reverse: it connects to each node in the cluster and collects 2 minutes' worth of 'iostat' results before parsing the output with reference to the VM name: #!/bin/bash time='60'; filter='rbd_'; function getstats() { for host in `ls -1A...
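
    A runnable sketch in the same spirit (the node list source /etc/pve/nodes and the device filter are assumptions; the original script is truncated):

      #!/bin/bash
      time='60'        # seconds per iostat sample
      filter='rbd'     # keep only Ceph RBD block devices

      getstats() {
        for host in $(ls -1A /etc/pve/nodes); do
          echo "== ${host} =="
          # -x extended stats, -k in kB/s; count 3 = boot average + two samples (~2 minutes)
          ssh "root@${host}" "iostat -xk ${time} 3" | grep -E "Device|${filter}"
        done
      }

      getstats
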
  14. Script to get Snapshots from Ceph and Merge it with VM-Name

    We do something similar, just not via the API, to get a list of Ceph RBD images we want to back up by searching for parts of the VM names: get_disk () { # Limit to first 25 lines to hopefully avoid including snapshot images # Convert template clone names to rbd names ie...
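
    A hedged sketch of the same idea (the pool name, pattern and image naming convention are assumptions; the original get_disk function is truncated):

      pool='rbd_ssd'       # Ceph pool holding the VM disks
      pattern='prod'       # substring to match against VM names
      # qm list prints: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
      qm list | awk -v pat="$pattern" 'NR > 1 && $2 ~ pat {print $1}' | while read -r vmid; do
        # PVE names RBD images vm-<vmid>-disk-N (base-<vmid>-disk-N for templates)
        rbd -p "$pool" ls | grep -E "^(vm|base)-${vmid}-disk"
      done
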
  15. PVE 7.0 BUG: kernel NULL pointer dereference, address: 00000000000000c0-PF:error_code(0x0000) - No web access no ssh

    We appear to have been bitten by the same bug this morning; we were, however, already running '5.11.22-9'. pve-kernel-5.11.22-4-pve: 5.11.22-9 Oct 27 07:52:02 kvm5k kernel: [1936007.710328] BUG: kernel NULL pointer dereference, address: 0000000000000000 Oct 27 07:52:02 kvm5k kernel...
  16. EFI and TPM removed from VM config when stopped, not when shutdown

    That was indeed the problem; restarting the node resolved the issue, so I presume a service should have been restarted as part of the package upgrade process... I thought that all PVE 7.0-13 cluster nodes had fenced and reset after network interfaces suddenly changed MTU (logs below); turns out I...
  17. EFI and TPM removed from VM config when stopped, not when shutdown

    PS: The copy & paste of the console commands above also confirms that the VM was running locally on the same node on which we issued 'qm start xx' and 'qm stop xx'...
  18. EFI and TPM removed from VM config when stopped, not when shutdown

    I can confirm that this is reproducible at will on a cluster of PVE 7 nodes which are subscribed to the enterprise repositories, where we temporarily added the no-subscription repository to prepare ourselves for vTPM and EFI state disks becoming available on our main production clusters that...
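
    For PVE 7 (Debian Bullseye) that repository line is:

      echo "deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription" \
        > /etc/apt/sources.list.d/pve-no-subscription.list
      apt update
      # Remove or comment the file out again once the newer packages are no longer needed.
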
  19. EFI and TPM removed from VM config when stopped, not when shutdown

    We have had good success with the Secure Boot capable EFI disks and TPM v2.0 emulation. Tested on the latest no-subscription packages with Ceph Pacific 16.2.6. Live migration works with Windows 11 with full disk encryption (BitLocker), and everything works perfectly as long as one selects the...
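
    The corresponding VM configuration, as a sketch (VMID 100 and the storage name are examples):

      qm set 100 --bios ovmf --machine q35
      qm set 100 --efidisk0 rbd_ssd:1,efitype=4m,pre-enrolled-keys=1   # Secure Boot capable EFI vars disk
      qm set 100 --tpmstate0 rbd_ssd:1,version=v2.0                    # TPM v2.0 state volume
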
  20. tpmstate0: property is not defined in schema and the schema does not allow additional properties

    Logical, I'll move the VM to the one node running the new components. There is a lot of interest around Windows 11 and Server 2022; I had wanted to start creating sysprep templates in preparation for vTPM becoming official... I had noticed swtpm having been installed on the enterprise subscription nodes...