3-node cluster messed up

VartKat

Hi,

Here is the story: a three-node cluster, three Mac Minis, each with an internal disk and an external disk. Cluster configured, Ceph configured with replication, HA configured.

After a few hours of running, I discovered that the local-lvm storage on Node2 and Node3 was offline. A short investigation revealed that the volume group 'pve' had been renamed to 'vgroup' on these two nodes. I renamed the volume group back, but I forgot to update fstab and to update GRUB and the initramfs (I didn't know I had to). Whatever... Node2 and Node3 got their local-lvm volumes back.
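
(For reference, here is roughly what has to happen after renaming a root volume group back, as I understand it now; a sketch assuming the default Debian/PVE layout, not something I ran at the time:)
Code:
# rename the VG back to the name the boot configuration expects
vgrename vgroup pve
vgscan --mknodes
# check that fstab still points at the right device names
grep pve /etc/fstab
# regenerate the GRUB config and the initramfs so they embed the correct VG name
update-grub
update-initramfs -u -k all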

Strangely, Node1 began to show the same behaviour, so I did the same renaming. Finally, all my nodes had their volume group named 'pve'.

I don't know why, or perhaps now I do know: because I didn't update fstab, GRUB and the initramfs, Node1 went totally offline after a simple apt upgrade and reboot.

Again, the volume group is named 'vgroup' while it is referenced as 'pve' in GRUB, fstab and so on! So no volume is mounted and the node is down.

I didn't see it for a few days, as HA had migrated the Node1 VM (which is a Home Assistant) to Node2.

I don't remember why, but I told my friend who is on site to reboot the HA machine; he misunderstood and rebooted the whole Node2.

The result is that the migrated VM no longer boots, saying that there's no bootable volume.

I tried moving the VM disk as raw onto the internal disk, no luck; I tried moving it as qcow2, no luck.

I'm stuck, and I don't know if removing Node1 from the cluster and doing a whole new install will get my VM disk back. I only need some data from the VM disk, so if I were able to mount it I would back up the data I need, but I didn't succeed.

I tried firing up a new VM and mounting the /dev/sdb disk, but lsblk and fdisk -l tell me there's no partition, so I can't mount anything.


Oh, and I forgot to say: as it was a 'safe' cluster... I don't have any backup :-(

If ever someone has an idea…


Thanks for your help
V.


Code:
root@pvemini2:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-8
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

Code:
cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,images,vztmpl,rootdir,snippets,backup
        prune-backups keep-all=1

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

rbd: ceph-replicate
        content rootdir,images
        krbd 0
        pool ceph-replicate

Code:
root@pvemini2:~# vgs
  VG                                        #PV #LV #SN Attr   VSize    VFree
  ceph-46a69907-e338-47a4-abb1-3b3a9214cd48   1   1   0 wz--n-  465.73g     0
  pve                                         1   4   0 wz--n- <931.01g 15.99g
root@pvemini2:~# pvs
  PV         VG                                        Fmt  Attr PSize    PFree
  /dev/sda3  pve                                       lvm2 a--  <931.01g 15.99g
  /dev/sdb   ceph-46a69907-e338-47a4-abb1-3b3a9214cd48 lvm2 a--   465.73g     0
 
Hi,
what is the current status of your cluster (pvecm status)? Also, please post the config for the VM in question (qm config <VMID>) as well as the output of lvs.

Why did you use local storage with HA? For HA to work you need shared storage.
 
Code:
# qm config 101
agent: 1
boot: order=ide2;scsi0
cores: 4
description: unused0%3A local%3A101/vm-101-disk-0.qcow2%0Aunused0%3A local-lvm%3Avm-101-disk-0.qcow2%0Aunused1%3A local-lvm%3Avm-101-disk-0%0Aunused2%3A local%3A101/vm-101-disk-0.qcow2
ide2: none,media=cdrom
memory: 8192
meta: creation-qemu=7.1.0,ctime=1674265658
name: HA
net0: virtio=42:AB:E1:8F:35:AA,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-101-disk-0,iothread=1,size=64G
scsihw: virtio-scsi-single
smbios1: uuid=4bf64e75-cabd-48dc-88d6-87fd3d361f65
sockets: 1
startup: order=1
tags: critical
unused0: ceph-replicate:vm-101-disk-0
unused1: local:101/vm-101-disk-0.qcow2
vmgenid: b2d57703-9cf0-4276-9962-86d31111195f

Note that the unused disks are my several attempts...

Code:
root@pvemini2:~# lvs
  LV                                             VG                                        Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-5b809eab-4d8d-474d-af54-9dbe4f25521d ceph-46a69907-e338-47a4-abb1-3b3a9214cd48 -wi-ao----  465.73g                                                 
  data                                           pve                                       twi-aotz-- <795.14g             7.21   0.42                         
  root                                           pve                                       -wi-ao----   96.00g                                                 
  swap                                           pve                                       -wi-ao----    7.64g                                                 
  vm-101-disk-0                                  pve                                       Vwi-a-tz--   64.00g data        89.58
Code:
root@pvemini3:~# lvs
  LV                                             VG                                        Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-41dd93cf-2f21-4bed-a620-8ce4fa7265b1 ceph-1e12dae8-afa9-4606-a58a-c90fdbdaada9 -wi-ao----  465.73g                                                 
  data                                           pve                                       twi-a-tz-- <342.27g             0.00   0.49                         
  root                                           pve                                       -wi-ao----   96.00g                                                 
  swap                                           pve                                       -wi-ao----    4.00g

I discovered that the disk we're talking about is, on mini1, part of the host's logical volumes. I made so many attempts that I don't know how I did that...


Code:
root@pvemini2:~# lvdisplay
  --- Logical volume ---
  LV Path                /dev/ceph-46a69907-e338-47a4-abb1-3b3a9214cd48/osd-block-5b809eab-4d8d-474d-af54-9dbe4f25521d
  LV Name                osd-block-5b809eab-4d8d-474d-af54-9dbe4f25521d
  VG Name                ceph-46a69907-e338-47a4-abb1-3b3a9214cd48
  LV UUID                25Y8oo-1sY9-yEcr-EoQL-jK1Q-Eiry-HJTzpP
  LV Write Access        read/write
  LV Creation host, time pvemini2, 2023-05-08 17:29:36 -0400
  LV Status              available
  # open                 25
  LV Size                465.73 GiB
  Current LE             119227
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     131064
  Block device           253:0
 
  --- Logical volume ---
  LV Path                /dev/pve/swap
  LV Name                swap
  VG Name                pve
  LV UUID                QdF3J0-NeB0-FB4j-OK0D-EziU-7vlV-bLbylF
  LV Write Access        read/write
  LV Creation host, time proxmox, 2023-01-20 14:50:07 -0500
  LV Status              available
  # open                 2
  LV Size                7.64 GiB
  Current LE             1957
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1
 
  --- Logical volume ---
  LV Path                /dev/pve/root
  LV Name                root
  VG Name                pve
  LV UUID                uzVyGu-eOQE-7xwr-FPxP-08gJ-6Plz-qP2IfK
  LV Write Access        read/write
  LV Creation host, time proxmox, 2023-01-20 14:50:07 -0500
  LV Status              available
  # open                 1
  LV Size                96.00 GiB
  Current LE             24576
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
 
  --- Logical volume ---
  LV Name                data
  VG Name                pve
  LV UUID                HzUC1Y-FT3w-Txfj-h9Tj-0Y9O-2pYF-mCYxpa
  LV Write Access        read/write (activated read only)
  LV Creation host, time proxmox, 2023-01-20 14:52:32 -0500
  LV Pool metadata       data_tmeta
  LV Pool data           data_tdata
  LV Status              available
  # open                 0
  LV Size                <795.14 GiB
  Allocated pool data    7.21%
  Allocated metadata     0.42%
  Current LE             203555
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:6
 
  --- Logical volume ---
  LV Path                /dev/pve/vm-101-disk-0
  LV Name                vm-101-disk-0
  VG Name                pve
  LV UUID                1Op8zW-YHSY-pWIx-RR5O-Dzv4-1j2n-qBLmQ7
  LV Write Access        read/write
  LV Creation host, time pvemini2, 2023-05-22 16:36:56 -0400
  LV Pool name           data
  LV Status              available
  # open                 0
  LV Size                64.00 GiB
  Mapped size            89.58%
  Current LE             16384
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:7

Code:
root@pvemini3:~# pvecm status
Cluster information
-------------------
Name:             KnowltonCluster
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed May 24 11:23:07 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000003
Ring ID:          2.bf
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 10.111.0.11
0x00000003          1 10.111.0.12 (local)
 
So according to your output you have vm-101-disk-0 which is located at pvemini2 and is configured to be the boot disk for VM 101.
Is this the disk you are referring to? Is VM 101 located at node pvemini2?
 
Thank you very much for looking into my case.

The disk I'm looking for is vm-101-disk-0, which was originally on ceph-replicate, a Ceph volume replicated over three disks (one per node).
Node1 came down, and as HA was set up, VM101 magically migrated to Node2 (mini2). It was working until my friend rebooted Mini2; I suppose it is a bad idea to reboot one of the remaining healthy nodes of a cluster while the cluster is unhealthy.

After the reboot, VM101 was no longer able to boot. It says 'no bootable device'.

I tried copying the disk as qcow2 onto the internal disk; that's what you see as
Code:
scsi0: local-lvm:vm-101-disk-0,iothread=1,size=64G
I also did the same as raw, so as to run rescue operations (fsck, fdisk, etc.) on the copies; in case I messed something up, I thought I would keep an untouched copy of the disk.

All the other VMs and disks are not so important and easily redoable, but the work on VM101 is months long: hundreds of devices and fine-tuned setups.
Is there any way of mounting such a disk on a live CD?
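
(One route that might work, which I have not tried yet: libguestfs can open qcow2 and raw images directly, assuming the guest filesystem inside is still intact. The image path below is the one my 'local' storage would use:)
Code:
apt install libguestfs-tools
# list the filesystems libguestfs can see inside the image
virt-filesystems -a /var/lib/vz/images/101/vm-101-disk-0.qcow2 --all --long
# mount read-only, letting libguestfs inspect the OS layout
mkdir -p /mnt/rescue
guestmount -a /var/lib/vz/images/101/vm-101-disk-0.qcow2 -i --ro /mnt/rescue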

In case I find a way to do so, I built a PVE install for Mini1 on an external disk. Mini1 is booted from this kind of LivePVE (which has no LVM; it is a Debian 11 + PVE install, because the Mac Mini wants WiFi drivers that the stock PVE ISO image doesn't have).
Code:
root@rescue:~# pvs
  PV         VG  Fmt  Attr PSize   PFree
  /dev/sda2  pve lvm2 a--  465.56g 8.26g
root@rescue:~# vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  pve   1   3   0 wz--n- 465.56g 8.26g
root@rescue:~# lvs
  LV   VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data pve twi-aotz--  310.00g             0.00   10.42                          
  root pve -wi-a----- <139.70g                                                  
  swap pve -wi-a-----   <7.45g                                                  
root@rescue:~# lsblk
NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                    8:0    0 465.8G  0 disk
├─sda1                 8:1    0   200M  0 part
└─sda2                 8:2    0 465.6G  0 part
  ├─pve-swap         253:0    0   7.4G  0 lvm
  ├─pve-root         253:1    0 139.7G  0 lvm
  ├─pve-data_tmeta   253:2    0    80M  0 lvm
  │ └─pve-data-tpool 253:4    0   310G  0 lvm
  │   └─pve-data     253:5    0   310G  1 lvm
  └─pve-data_tdata   253:3    0   310G  0 lvm
    └─pve-data-tpool 253:4    0   310G  0 lvm
      └─pve-data     253:5    0   310G  1 lvm
sdb                    8:16   1 468.8G  0 disk
├─sdb1                 8:17   1   512M  0 part /boot/efi
├─sdb2                 8:18   1 467.3G  0 part /
└─sdb3                 8:19   1   976M  0 part [SWAP]
root@rescue:~# blkid
/dev/sda1: LABEL_FATBOOT="EFI" LABEL="EFI" UUID="70D6-1701" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6ac9e036-69cd-4dd6-b4c0-925ceeeb294a"
/dev/sda2: UUID="S6XrZB-fBPT-PJd0-yTyL-BS8m-YKmE-u0P7Dn" TYPE="LVM2_member" PARTLABEL="pv" PARTUUID="55d57463-ff39-4a82-ba84-82e735e42302"
/dev/mapper/pve-swap: UUID="705c7e14-7a21-42b8-b9ab-b41fc517bbcb" TYPE="swap"
/dev/mapper/pve-root: UUID="e0e85b7f-6c4f-499c-83a2-bc370cf93740" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sdb1: UUID="6413-73E2" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="26b3e8ba-83a5-484e-844d-e2f8564b82b6"
/dev/sdb2: UUID="30adc320-84ab-46cb-a365-477ab605e2de" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="39c6d1af-9af0-4b64-a7d3-21f256a9411f"
/dev/sdb3: UUID="78e76e31-8181-4955-b676-923e66fce530" TYPE="swap" PARTUUID="c8334efc-eced-4167-9170-0cce7cb06f5b"

I tried two ways:

A- Getting Mini1 to boot again, as I'm quite sure nothing is really broken and that it's just a question of references, because the volume group changed its name from “pve“ to “vgroup“.
  1. mounted the internal disk on /mnt (mount /dev/mapper/pve-root /mnt)
  2. verified fstab had the right disk ref
     Code:
     root@rescue:~# cat /mnt/etc/fstab
    # /etc/fstab: static file system information.
    #
    # Use 'blkid' to print the universally unique identifier for a
    # device; this may be used with UUID= as a more robust way to name devices
    # that works even if disks are added and removed. See fstab(5).
    #
    # systemd generates mount units based on this file, see systemd.mount(5).
    # Please run 'systemctl daemon-reload' after making changes here.
    #
    # <file system> <mount point>   <type>  <options>       <dump>  <pass>
    /dev/mapper/pve-root /               ext4    errors=remount-ro 0       1
    # /boot/efi was on /dev/sda1 during installation
    UUID=70D6-1701  /boot/efi       vfat    umask=0077      0       1
    /dev/mapper/pve-data /data           ext4    defaults        0       2
    /dev/mapper/pve-swap none            swap    sw              0       0
    3. verified /etc/default/grub,
    where I didn't find many references to disks or volumes, even in the grub.d subdir
    4. bind-mounted some directories so that I could chroot into the internal disk and run commands as if I had booted from it
    Code:
    for i in /sys /proc /run /dev; do sudo mount --bind "$i" "/mnt$i"; done
    5. chroot /mnt
    6. update-grub
    7. mkinitramfs, to build a new initrd.img with the right disk references
    8. proxmox-boot-tool refresh
    which says:
    Code:
    root@rescue:/# proxmox-boot-tool refresh
    Running hook script 'proxmox-auto-removal'..
    Running hook script 'zz-proxmox-boot'..
    Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
    No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.

    Tried rebooting and no luck.
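
    (A note on step 8: the "No /etc/kernel/proxmox-boot-uuids found" message means proxmox-boot-tool is not managing the ESP on this install, so the boot path is plain GRUB. In that case the steps that matter inside the chroot are probably the following, with /dev/sda1 being the ESP according to blkid above; a sketch, not verified on this machine:)
    Code:
    # inside the chroot, with the ESP mounted
    mount /dev/sda1 /boot/efi
    update-initramfs -u -k all
    update-grub
    grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=proxmox
    # sanity check (replace <kernel-version> with the installed kernel):
    # the rebuilt initrd should contain the LVM tooling
    lsinitramfs /boot/initrd.img-<kernel-version> | grep -c lvm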

    B- As I'm not succeeding in getting Mini1 to boot again, I'm looking to mount the VM101 disk on my LivePVE so that I can back up my Home Assistant data and redo things from scratch, or at least mount one of the copies (raw or qcow2) which are on the internal disk, because mounting ceph-replicate on the LivePVE seems too complicated. I haven't found a way to mount the local-lvm partition either, as lvdisplay doesn't give a path for it:
    Code:
    root@rescue:~# lvdisplay
      --- Logical volume ---
      LV Path                /dev/pve/swap
      LV Name                swap
      VG Name                pve
      LV UUID                oigcr1-dEBS-dJ4K-ZjH6-GYgZ-OFrk-rN0K2K
      LV Write Access        read/write
      LV Creation host, time pvemini1, 2023-01-20 13:09:15 -0500
      LV Status              available
      # open                 0
      LV Size                <7.45 GiB
      Current LE             1907
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     256
      Block device           253:0
     
      --- Logical volume ---
      LV Path                /dev/pve/root
      LV Name                root
      VG Name                pve
      LV UUID                mA1s7N-eOpd-68o8-2sea-qWv2-FR82-7AjwMX
      LV Write Access        read/write
      LV Creation host, time pvemini1, 2023-01-20 13:09:34 -0500
      LV Status              available
      # open                 1
      LV Size                <139.70 GiB
      Current LE             35762
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     256
      Block device           253:1
     
      --- Logical volume ---
      LV Name                data
      VG Name                pve
      LV UUID                6E6Was-3DdO-fd79-twZE-ihhr-GLm7-ERFJk7
      LV Write Access        read/write (activated read only)
      LV Creation host, time pvemini1, 2023-01-21 09:23:42 -0500
      LV Pool metadata       data_tmeta
      LV Pool data           data_tdata
      LV Status              available
      # open                 0
      LV Size                310.00 GiB
      Allocated pool data    0.00%
      Allocated metadata     10.42%
      Current LE             79360
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     1024
      Block device           253:5

    Is there any means of mounting the data logical volume?
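
    (Side note: pve/data is an LVM thin pool, not a filesystem, so it cannot be mounted directly; only thin volumes carved out of it can hold filesystems. On this rescue disk, lvs shows the pool at 0.00% data and no thin volumes, so there is likely nothing in it to recover. If a thin volume did exist, the usual dance would be something like this; the LV name here is hypothetical:)
    Code:
    # activate the VG; thin volumes flagged "skip activation" need -K
    vgchange -ay pve
    lvchange -ay -K pve/vm-101-disk-0   # hypothetical LV, not present in this VG
    # map any partitions inside the volume and mount read-only
    kpartx -av /dev/pve/vm-101-disk-0
    mount -o ro /dev/mapper/pve-vm--101--disk--0p1 /mnt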

    The further plan is to follow the dead node procedure and rebuild Mini1 from scratch, but even if the Ceph disk osd.1 should be untouched, I'm not sure I'll get anything back.
 
If ever someone is kind enough to take a look at my desperate case: I've managed to get some of the Node1 log files....
 

Attachments

  • syslog.log (812 KB)
  • kern.log (53.4 KB)
  • messages.log (131.4 KB)
  • pveam.log (47.1 KB)
I do not quite understand: is the volume you are looking for on LVM or on Ceph?
Is your Ceph healthy now? Output of ceph -s?

Don't panic, your data is probably still there ;)

How did you get to that LV for 101?

Can you show the output of kpartx -l /dev/pve/vm-101-disk-0
 
Code:
root@pvemini2:~# ceph -s
  cluster:
    id:     13bc2c1c-fcac-45e2-8b4a-344e7f9c1705
    health: HEALTH_WARN
            mon pvemini2 is low on available space
            1/3 mons down, quorum pvemini2,pvemini3
            Degraded data redundancy: 28416/85248 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized
            33 pgs not deep-scrubbed in time
            33 pgs not scrubbed in time
            1 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum pvemini2,pvemini3 (age 11d), out of quorum: localhost
    mgr: pvemini2(active, since 11d)
    osd: 3 osds: 2 up (since 11d), 2 in (since 2w)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 28.42k objects, 110 GiB
    usage:   220 GiB used, 711 GiB / 931 GiB avail
    pgs:     28416/85248 objects degraded (33.333%)
             33 active+undersized+degraded
 
  io:
    client:   0 B/s rd, 23 KiB/s wr, 0 op/s rd, 2 op/s wr


Actually, here is the timeline:
t1- Node1, which has VM101 on it, goes down. HA is set up, so VM101 is magically migrated to Node2, and everything runs fine on the two remaining nodes.
t2- After a misunderstanding, Node2 is rebooted. The result is that VM101 doesn't boot.

So the vm-101-disk-0 that I'm looking for is on the Ceph volume.
And since I didn't want to run repair tools directly on this disk, I moved it as raw and as qcow2 onto the Node2 internal disk and then tried some repairs (fdisk, fsck, kpartx, ...). Nothing sees any partitions, and I didn't succeed.

I face two problems.
Problem 1- Node1 not booting, which I thought was because of the volume group getting renamed to 'vgroup' instead of 'pve'. So I tried to rename the volume group back, but it didn't work. After reading a bunch of info online, I found that it could be because, when a boot disk reference is changed, one also has to update the initramfs... (cf. the previous post to see my attempts: PVE live disk, chroot, update initramfs, ...)

Problem 2- As I had no clue how to get Node1 booting, my other solution could be to get vm-101-disk-0 mounted and access the files, so that I can back up and rebuild everything from scratch.

I'm on the verge of rebuilding Node1 from scratch, and this leads me to some questions:
Q1- Node1 is down, OK, but the problem is not on its Ceph disk. Is Ceph built for this case? i.e. a host rebuild with the Ceph disk untouched.
The dead node procedure seems to be made for a Ceph disk being down (or both host and disk), not the host machine only?
Can I follow a procedure which will recognize the Ceph disk as part of the 3-disk Ceph replication and not try to rebuild it from scratch?

Q2- Before going on with the dead node procedure, is there any means of mounting the Ceph volume while booted on the 'LivePVE', to see my vm-101-disk-0, mount it, and get back some files?
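
(For Q2, from what I have read, a standalone live system can talk to the existing cluster if it is given the cluster's config and admin keyring; a sketch, untested, with the source paths being the PVE defaults on a healthy node:)
Code:
apt install ceph-common
scp pvemini2:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
scp pvemini2:/etc/pve/priv/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
# map the image with the kernel RBD client, then mount read-only
rbd map ceph-replicate/vm-101-disk-0
mount -o ro /dev/rbd0p1 /mnt   # p1 assumes the image has a partition table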


As for the output of kpartx: the result is nothing, not a word.
Code:
root@pvemini2:~# ls -la /dev/pve
total 0
drwxr-xr-x  2 root root  140 May 27 06:48 .
drwxr-xr-x 22 root root 4840 May 27 06:48 ..
lrwxrwxrwx  1 root root    7 May 17 17:50 root -> ../dm-2
lrwxrwxrwx  1 root root    7 May 17 17:50 swap -> ../dm-1
lrwxrwxrwx  1 root root    7 May 22 17:36 vm-101-disk-0 -> ../dm-7
lrwxrwxrwx  1 root root    7 May 27 06:56 vm-108-disk-0 -> ../dm-8
lrwxrwxrwx  1 root root    7 May 27 06:48 vm-108-disk-1 -> ../dm-9


root@pvemini2:~# kpartx -l /dev/pve/vm-101-disk-0
root@pvemini2:~#
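
(If kpartx lists nothing, the volume apparently has no recognizable partition table. It may still carry a bare filesystem starting at sector 0, or the copy may simply be empty; a quick check, assuming the LV is active:)
Code:
# look for filesystem or partition-table signatures directly on the volume
file -s /dev/pve/vm-101-disk-0
blkid -p /dev/pve/vm-101-disk-0
# dump the first sector; all zeros would suggest the copy went wrong
dd if=/dev/pve/vm-101-disk-0 bs=512 count=1 2>/dev/null | od -A x -t x1z | head
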
Thanks for your help
 
My config for a VM looks like this:
Code:
root@prompt:~# cat /etc/pve/qemu-server/4019.conf
boot: order=scsi0;ide2;net0
cores: 2
ide2: cephfs:iso/debian-11.3.0-amd64-netinst.iso,media=cdrom,size=378M
memory: 2048
meta: creation-qemu=7.0.0,ctime=1667572740
name: testdc02
net0: virtio=B6:E7:FD:90:01:63,bridge=vmbr0
numa: 0
ostype: l26
scsi0: ceph:vm-4019-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=b90bddb3-5926-4d34-9948-ad0f7ccb1ff6
sockets: 1
vmgenid: 9f146b9b-add0-4d19-99b0-24aeec65ff02

so rados lspools shows me my (Ceph) pools:
Code:
ceph
ceph_ssd
cephfs

and rbd ls ceph -l shows my VM disks:
Code:
...
vm-4018-disk-0  10 GiB  2  excl
vm-4019-disk-0  32 GiB  2
vm-4020-disk-0  20 GiB  2  excl
vm-4022-disk-0   8 GiB  2  excl
...

rbd device map -t nbd ceph/vm-106-disk-0 maps the image as a local block device:
Code:
rbd device map -t nbd ceph/vm-106-disk-0
/dev/nbd2

fdisk -l /dev/nbd2
Disk /dev/nbd2: 90 GiB, 96636764160 bytes, 188743680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x36245222

Device       Boot     Start       End   Sectors  Size Id Type
/dev/nbd2p1  *         2048 187596650 187594603 89.5G  7 HPFS/NTFS/exFAT
/dev/nbd2p2       187596800 188737535   1140736  557M 27 Hidden NTFS WinRE

mkdir /tmp/mnt
mount -t auto /dev/nbd2p1 /tmp/mnt
mount | grep nbd2
/dev/nbd2p1 on /tmp/mnt type fuseblk (rw,relatime,user_id=0,group_id=0,allow_other,blksize=4096)

ls /tmp/mnt
'$Recycle.Bin'   '$Windows.~WS'   '$WinREAgent'   30272cdfff67241a4c   6da490d62b5f2a047bad199400a6
Boot   bootmgr   BOOTNXT   'Documents and Settings'   'Dokumente und Einstellungen'   DRIVERS   drv
EDI-bachor.zip   ESD   hiberfil.sys   inetpub   PerfLogs   pihan.ovpn   'Program Files'   ProgramData
PuTTYPortable   Recovery   sshd.pid   swapfile.sys   temp   tmp   Users   Windows
 
Thank you very much, here's what I get:
Code:
root@pvemini3:~# rados lspools
.mgr
ceph-replicate


root@pvemini3:~# rbd ls ceph-replicate
base-103-disk-0
vm-101-disk-0
vm-102-disk-0
vm-104-disk-0
vm-105-disk-0
vm-106-disk-0
vm-107-disk-0
vm-109-disk-0
root@pvemini3:~# lsblk
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0 465.8G  0 disk
├─sda1                                                                                                  8:1    0  1007K  0 part
├─sda2                                                                                                  8:2    0   512M  0 part /boot/efi
└─sda3                                                                                                  8:3    0 465.3G  0 part
  ├─pve-swap                                                                                          253:1    0     4G  0 lvm  [SWAP]
  ├─pve-root                                                                                          253:2    0    96G  0 lvm  /
  ├─pve-data_tmeta                                                                                    253:3    0   3.5G  0 lvm 
  │ └─pve-data                                                                                        253:5    0 342.3G  0 lvm 
  └─pve-data_tdata                                                                                    253:4    0 342.3G  0 lvm 
    └─pve-data                                                                                        253:5    0 342.3G  0 lvm 
sdb                                                                                                     8:16   0 465.8G  0 disk
└─ceph--1e12dae8--afa9--4606--a58a--c90fdbdaada9-osd--block--41dd93cf--2f21--4bed--a620--8ce4fa7265b1 253:0    0 465.7G  0 lvm 
rbd0                                                                                                  252:0    0     4G  0 disk


root@pvemini3:~# rbd device map -t nbd ceph-replicate/vm-101-disk-0
/usr/bin/rbd-nbd: exec failed: (2) No such file or directory
rbd: rbd-nbd failed with error: /usr/bin/rbd-nbd: exit status: 1


root@pvemini3:~# modprobe nbd max_part=8

root@pvemini3:~# rbd device map -t nbd ceph-replicate/vm-101-disk-0
/usr/bin/rbd-nbd: exec failed: (2) No such file or directory
rbd: rbd-nbd failed with error: /usr/bin/rbd-nbd: exit status: 1

root@pvemini3:~# apt install rbd-nbd
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  rbd-nbd
0 upgraded, 1 newly installed, 0 to remove and 22 not upgraded.
Need to get 163 kB of archives.
After this operation, 530 kB of additional disk space will be used.
Get:1 http://download.proxmox.com/debian/ceph-quincy bullseye/main amd64 rbd-nbd amd64 17.2.6-pve1 [163 kB]
Fetched 163 kB in 0s (723 kB/s)
Selecting previously unselected package rbd-nbd.
(Reading database ... 45903 files and directories currently installed.)
Preparing to unpack .../rbd-nbd_17.2.6-pve1_amd64.deb ...
Unpacking rbd-nbd (17.2.6-pve1) ...
Setting up rbd-nbd (17.2.6-pve1) ...
Processing triggers for man-db (2.9.4-2) ...


root@pvemini3:~# rbd device map -t nbd ceph-replicate/vm-101-disk-0
rbd: rbd-nbd failed with error: /usr/bin/rbd-nbd: got signal: 11
root@pvemini3:~#
 
Try mapping the volume via
Bash:
rbd device map ceph-replicate/vm-101-disk-0
If that works, you can inspect the mapped block device via `gdisk /dev/rbd0`.

Nevertheless, I would suggest tackling one problem at a time, since it gets very difficult to follow if you try too many different things at once. It also makes it nearly impossible to find the root cause of your issues.

What is the exact error you get when trying to boot the node which is acting up? Please share the exact error message.
 
About Node1 startup: it hangs at disk cleanup...

(screenshot attached)
Does it just stay there or do you get a timeout after that? It seems like your volume group is fine and an fsck is performed on your root filesystem. Have you tried booting into recovery mode by selecting the corresponding entry in the GRUB advanced section?
 
Via the GRUB recovery entry it then stays there, even after more than 5 minutes of waiting.

That's why I made a kind of LivePVE on a USB stick and booted from it. (And if that's useful, I can mount the original/internal PVE root partition and edit files on it.)
 
So the same behaviour if you try to boot via recovery mode?

Have you tried adding the nomodeset kernel parameter via GRUB? Can you ping or even SSH into the machine? It seems strange that you get no error message at all and are not dropped into an emergency shell.
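
(For the record, a one-off way to test nomodeset without editing any files: at the GRUB menu, highlight the boot entry, press 'e', append nomodeset to the line starting with "linux", then press Ctrl-X to boot. To make it permanent afterwards:)
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"
# then regenerate the config
update-grub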

Also, I assume the rbd device map did not work as expected, right?
 
Hi Chris, and sorry for the delay; I had to wait for my friend who is on site to try things and tell you what we get (as I'm doing all this over SSH from far away).

None of these worked: no ping, no SSH. I surrendered and began the node replacement procedure.

I shut down Node1 (the dead one) and followed Aaron's advice from this thread:
https://forum.proxmox.com/threads/r...-had-ceph-services-running.127207/post-555967

Now I'm wondering how to get my first Ceph disk, which was osd.0, back into the cluster without wiping it. As my problem was on the internal/system disk of the node and not on the Ceph (replicated) disk, I should be able to reintegrate it, no?

If so, how do I do that?

If that can help:

Code:
root@pvemini2:~# ceph tell mon.0 status
Error ENXIO: problem getting command descriptions from mon.0


root@pvemini2:~# ceph tell mon.1 status
no valid command found; 10 closest matches:
0
1
2
abort
add_bootstrap_peer_hint <addr>
add_bootstrap_peer_hintv <addrv>
assert
compact
config diff
config diff get <var>
Error EINVAL: invalid command


root@pvemini2:~# ceph tell mon.2 status
no valid command found; 10 closest matches:
0
1
2
abort
add_bootstrap_peer_hint <addr>
add_bootstrap_peer_hintv <addrv>
assert
compact
config diff
config diff get <var>
Error EINVAL: invalid command

Code:
root@pvemini2:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pvemini1
         2          1 pvemini2 (local)
         3          1 pvemini3

Code:
root@pvemini2:~# ceph health
HEALTH_WARN 1 OSD(s) have spurious read errors; mon pvemini2 is low on available space; 1/3 mons down, quorum pvemini2,pvemini3; Degraded data redundancy: 28429/85287 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized; 33 pgs not deep-scrubbed in time; 33 pgs not scrubbed in time; OSD count 2 < osd_pool_default_size 3

Code:
root@pvemini2:~# ceph health detail
HEALTH_WARN 1 OSD(s) have spurious read errors; mon pvemini2 is low on available space; 1/3 mons down, quorum pvemini2,pvemini3; Degraded data redundancy: 28429/85287 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized; 33 pgs not deep-scrubbed in time; 33 pgs not scrubbed in time; OSD count 2 < osd_pool_default_size 3
[WRN] BLUESTORE_SPURIOUS_READ_ERRORS: 1 OSD(s) have spurious read errors
     osd.2  reads with retries: 1
[WRN] MON_DISK_LOW: mon pvemini2 is low on available space
    mon.pvemini2 has 12% avail
[WRN] MON_DOWN: 1/3 mons down, quorum pvemini2,pvemini3
    mon.localhost (rank 0) addr [v2:10.111.0.10:3300/0,v1:10.111.0.10:6789/0] is down (out of quorum)
[WRN] PG_DEGRADED: Degraded data redundancy: 28429/85287 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized
    pg 1.0 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.0 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.1 is stuck undersized for 33h, current state active+undersized+degraded, last acting [1,2]
    pg 2.2 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.3 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.4 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.5 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.6 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.7 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.8 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.9 is stuck undersized for 33h, current state active+undersized+degraded, last acting [2,1]
    pg 2.a is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.b is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.c is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.d is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.e is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.f is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.10 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.11 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.12 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.13 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.14 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.15 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.16 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.17 is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.18 is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.19 is stuck undersized for 33h, current state active+undersized+degraded, last acting [2,1]
    pg 2.1a is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
    pg 2.1b is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.1c is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.1d is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.1e is stuck undersized for 7d, current state active+undersized+degraded, last acting [2,1]
    pg 2.1f is stuck undersized for 7d, current state active+undersized+degraded, last acting [1,2]
[WRN] PG_NOT_DEEP_SCRUBBED: 33 pgs not deep-scrubbed in time
    pg 2.1f not deep-scrubbed since 2023-05-12T02:57:00.569031-0400
    pg 2.1e not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.1d not deep-scrubbed since 2023-05-11T06:37:03.854472-0400
    pg 2.1c not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.b not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.a not deep-scrubbed since 2023-05-11T13:02:32.297222-0400
    pg 2.9 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.8 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.7 not deep-scrubbed since 2023-05-10T03:25:56.747862-0400
    pg 2.6 not deep-scrubbed since 2023-05-10T02:04:54.258205-0400
    pg 2.5 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.4 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.2 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.1 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.0 not deep-scrubbed since 2023-05-13T10:57:53.093123-0400
    pg 2.3 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 1.0 not deep-scrubbed since 2023-05-08T17:31:13.537368-0400
    pg 2.c not deep-scrubbed since 2023-05-09T20:08:35.223452-0400
    pg 2.d not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.e not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.f not deep-scrubbed since 2023-05-09T17:52:05.307163-0400
    pg 2.10 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.11 not deep-scrubbed since 2023-05-12T04:19:46.577533-0400
    pg 2.12 not deep-scrubbed since 2023-05-10T04:46:31.449991-0400
    pg 2.13 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.14 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.15 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.16 not deep-scrubbed since 2023-05-11T12:24:55.471638-0400
    pg 2.17 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.18 not deep-scrubbed since 2023-05-12T11:50:52.345860-0400
    pg 2.19 not deep-scrubbed since 2023-05-08T17:38:54.976918-0400
    pg 2.1a not deep-scrubbed since 2023-05-10T03:07:45.488210-0400
    pg 2.1b not deep-scrubbed since 2023-05-10T02:20:20.615383-0400
[WRN] PG_NOT_SCRUBBED: 33 pgs not scrubbed in time
    pg 2.1f not scrubbed since 2023-05-13T13:20:54.194509-0400
    pg 2.1e not scrubbed since 2023-05-12T04:20:29.694729-0400
    pg 2.1d not scrubbed since 2023-05-12T12:38:07.566666-0400
    pg 2.1c not scrubbed since 2023-05-13T08:43:18.605793-0400
    pg 2.b not scrubbed since 2023-05-12T14:00:53.519195-0400
    pg 2.a not scrubbed since 2023-05-13T00:13:38.006972-0400
    pg 2.9 not scrubbed since 2023-05-12T17:19:57.779059-0400
    pg 2.8 not scrubbed since 2023-05-13T07:25:01.050016-0400
    pg 2.7 not scrubbed since 2023-05-12T20:21:24.347531-0400
    pg 2.6 not scrubbed since 2023-05-12T14:18:12.985938-0400
    pg 2.5 not scrubbed since 2023-05-12T12:15:54.725456-0400
    pg 2.4 not scrubbed since 2023-05-12T17:13:24.163116-0400
    pg 2.2 not scrubbed since 2023-05-12T17:04:32.524334-0400
    pg 2.1 not scrubbed since 2023-05-12T18:26:54.059912-0400
    pg 2.0 not scrubbed since 2023-05-13T10:57:53.093123-0400
    pg 2.3 not scrubbed since 2023-05-13T11:17:59.636969-0400
    pg 1.0 not scrubbed since 2023-05-13T12:59:42.600026-0400
    pg 2.c not scrubbed since 2023-05-13T05:13:09.060223-0400
    pg 2.d not scrubbed since 2023-05-13T11:39:46.475086-0400
    pg 2.e not scrubbed since 2023-05-12T11:38:35.346843-0400
    pg 2.f not scrubbed since 2023-05-12T11:56:12.261176-0400
    pg 2.10 not scrubbed since 2023-05-12T13:02:46.879639-0400
    pg 2.11 not scrubbed since 2023-05-13T08:52:36.472487-0400
    pg 2.12 not scrubbed since 2023-05-12T19:51:31.418083-0400
    pg 2.13 not scrubbed since 2023-05-12T15:24:36.435177-0400
    pg 2.14 not scrubbed since 2023-05-12T06:41:07.198864-0400
    pg 2.15 not scrubbed since 2023-05-12T07:38:14.585238-0400
    pg 2.16 not scrubbed since 2023-05-12T12:47:16.997747-0400
    pg 2.17 not scrubbed since 2023-05-12T17:00:47.915481-0400
    pg 2.18 not scrubbed since 2023-05-13T12:21:38.334402-0400
    pg 2.19 not scrubbed since 2023-05-12T06:05:08.271338-0400
    pg 2.1a not scrubbed since 2023-05-12T15:32:27.947909-0400
    pg 2.1b not scrubbed since 2023-05-13T12:42:51.109792-0400
[WRN] TOO_FEW_OSDS: OSD count 2 < osd_pool_default_size 3
 
Did you reinstall the Ceph packages on the newly joined node? It seems like your monitor is still down. You should be able to re-import existing OSDs by running ceph-volume lvm activate --all.
 
Code:
root@pvemini1:~# ceph-volume lvm activate --all
--> Activating OSD ID 0 FSID 51ec4819-b28c-4119-b8ed-c42f1dfb449c
-->  ConfigurationError: Unable to load expected Ceph config at: /etc/ceph/ceph.conf

Note that, as stated by Aaron, I did:
# ceph mon remove 0
edited /etc/pve/ceph.conf to remove the monitor from the mon_host line and its section
# ceph osd purge 0 --yes-i-really-mean-it
# ceph osd crush remove pvemini1

removed the Mini1 keys from /etc/pve/priv/authorized_keys and /etc/pve/priv/known_hosts
moved the mini1 directory out of /etc/pve/nodes
all of this on mini2 and mini3, while Node1 was off.

Do I have to rewrite the deleted part of /etc/pve/ceph.conf and recreate a monitor?
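
(Regarding the ConfigurationError above: on PVE, /etc/ceph/ceph.conf is normally a symlink to the cluster-wide /etc/pve/ceph.conf, and that symlink is missing on a freshly reinstalled node. So what is likely needed there, to be verified against a healthy node, is something like:)
Code:
# restore the expected config symlink on the reinstalled node
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf
# re-import the untouched OSD
ceph-volume lvm activate --all
# recreate a monitor on this node if desired
pveceph mon create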
 