Current fence status: FENCE, with wrong HD space usage

Adi M

I have a cluster with four nodes and ZFS. Each node has 3 drives for the system and 5 drives for VMs/CTs.

Today one node went into the fence state.
What I see: HD space is shown in red at 99.97% (26.54 GiB of 26.55 GiB).
But something must be wrong, because the ZFS pool looks like this (rpool Size = 438 GB; Free = 13.6 GB):
[screenshot: rpool storage summary]

It is still possible to access the web portal without any restriction or slowdown.

I don't know if it's a good idea to just reboot the server, or whether that would cause more problems.
View from an OK node:
[screenshot: storage view from an OK node]

View from the NOK (affected) node:
[screenshot: storage view from the NOK node]

Please let me know what more specific info you need.
 
are these using ZFS as / ? please provide the output of "zfs list" and "df -h" on "pve-yps". thanks!
 
Yes, I'm using ZFS as /.
zfs list:
Code:
NAME                            USED  AVAIL     REFER  MOUNTPOINT
pbpool                          493G  44.6G      493G  /mnt/datastore/pbpool
rpool                           263G  9.41M      139K  /rpool
rpool/ROOT                     26.5G  9.41M      128K  /rpool/ROOT
rpool/ROOT/pve-1               26.5G  9.41M     26.5G  /
rpool/data                      237G  9.41M      128K  /rpool/data
rpool/data/vm-72250-disk-1     50.6G  9.41M     50.5G  -
rpool/data/vm-72250-disk-3     9.18G  9.41M     9.18G  -
rpool/data/vm-72252-disk-0     48.9G  9.41M     46.2G  -
rpool/data/vm-72252-disk-1      208K  9.41M      208K  -
rpool/data/vm-72252-disk-2      123K  9.41M      112K  -
rpool/data/vm-72252-state-tt1  8.71G  9.41M     8.71G  -
rpool/data/vm-72254-disk-0      119G  9.41M      119G  -

df -h:
Code:
Filesystem                Size  Used Avail Use% Mounted on
udev                       24G     0   24G   0% /dev
tmpfs                     4.8G   26M  4.7G   1% /run
rpool/ROOT/pve-1           27G   27G  9.4M 100% /
tmpfs                      24G   46M   24G   1% /dev/shm
tmpfs                     5.0M     0  5.0M   0% /run/lock
rpool                     9.7M  256K  9.4M   3% /rpool
pbpool                    538G  493G   45G  92% /mnt/datastore/pbpool
rpool/ROOT                9.5M  128K  9.4M   2% /rpool/ROOT
rpool/data                9.5M  128K  9.4M   2% /rpool/data
/dev/fuse                 128M   48K  128M   1% /etc/pve
192.168.0.2:/pve-storage  1.4T  872G  500G  64% /mnt/pve/nas-storage
192.168.0.2:/pve-lvm      1.4T  872G  500G  64% /mnt/pve/nas-lvm
tmpfs                     4.8G     0  4.8G   0% /run/user/0

Thank you
 
your rpool is only 263G, not 438G.. is it possible you underestimated the overhead of raidz? anyhow, you need to (re)move some data, e.g. by cleaning out old backups or similar things (snapshots?), and then re-evaluate your storage concept.
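
to see where the space is actually going, something like this should help (adjust the pool name if it differs on your side):

Code:
# per-dataset breakdown, including how much space is held by snapshots
zfs list -o space -r rpool

# list all snapshots on the pool and the space each one holds
zfs list -t snapshot -r rpool

# compare raw pool size (zpool) with usable space (zfs) to see the raidz overhead
zpool list rpool
zfs list rpool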
 
Hmm, yes, interesting:
[screenshot: rpool usage on pve-yps]

How can I get the space back? All VMs are now running on another node and I have turned off replication.
So can I remove all VMs on "pve-yps"?
When I remove a snapshot, I get "Permission denied".
 
I thought you moved all VMs? then no VM config can be on the "full" node, only volumes..

could you post the output of

- qm list
- pct list
- pvecm status

on pve-yps?
 
I thought you moved all VMs? then no VM config can be on the "full" node, only volumes..
Sorry, they were moved by replication and HA.

could you post the output of


- qm list
- pct list
- pvecm status

on pve-yps?
qm list:
Code:
VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
     72250 Windows10pro         stopped    8192              50.00 0
     72252 Windows11Pro         stopped    16384             50.00 0

pct list: -> empty

pvecm status:
Code:
Cluster information
-------------------
Name:             pve-apnw
Config Version:   6
Transport:        knet
Secure auth:      on

Cannot initialize CMAP service
 
okay, so your node is not part of the cluster at the moment and hasn't realized yet that those VMs were stolen by HA..
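
for example, from one of the quorate nodes something like this should show where the configs actually live now:

Code:
# on a quorate node: HA recovery moves the VM config files into the directory
# of whichever node took the VMs over
ls /etc/pve/nodes/*/qemu-server/

# on pve-yps the local, out-of-sync pmxcfs copy will still show them under its
# own node directory until it rejoins the cluster
ls /etc/pve/nodes/pve-yps/qemu-server/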

what about systemctl status pve-cluster corosync and journalctl -b -u pve-cluster | head -n 100 on pve-yps?
 
systemctl status pve-cluster corosync
Code:
pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-01-12 12:17:18 CET; 23h ago
    Process: 3695 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 3786 (pmxcfs)
      Tasks: 6 (limit: 57840)
     Memory: 60.1M
        CPU: 52.025s
     CGroup: /system.slice/pve-cluster.service
             └─3786 /usr/bin/pmxcfs

Jan 13 11:49:21 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 13 11:49:21 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2
Jan 13 11:49:27 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 13 11:49:27 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 13 11:49:27 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 13 11:49:27 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2
Jan 13 11:49:33 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 13 11:49:33 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 13 11:49:33 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 13 11:49:33 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2
Jan 13 11:49:39 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 13 11:49:39 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 13 11:49:39 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 13 11:49:39 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2

● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2023-01-12 12:17:19 CET; 23h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 3874 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=21)
   Main PID: 3874 (code=exited, status=21)
        CPU: 125ms

Jan 12 12:17:19 pve-yps corosync[3874]:   [KNET  ] host: host: 2 has no active links
Jan 12 12:17:19 pve-yps corosync[3874]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Jan 12 12:17:19 pve-yps corosync[3874]:   [KNET  ] host: host: 2 has no active links
Jan 12 12:17:19 pve-yps corosync[3874]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Jan 12 12:17:19 pve-yps corosync[3874]:   [KNET  ] host: host: 2 has no active links
Jan 12 12:17:19 pve-yps corosync[3874]:   [MAIN  ] Couldn't store new ring id 455 to stable storage: No space left on device (28)
Jan 12 12:17:19 pve-yps corosync[3874]:   [MAIN  ] Corosync Cluster Engine exiting with status 21 at main.c:707.
Jan 12 12:17:19 pve-yps systemd[1]: corosync.service: Main process exited, code=exited, status=21/n/a
Jan 12 12:17:19 pve-yps systemd[1]: corosync.service: Failed with result 'exit-code'.
Jan 12 12:17:19 pve-yps systemd[1]: Failed to start Corosync Cluster Engine.

journalctl -b -u pve-cluster | head -n 100
Code:
-- Journal begins at Wed 2022-05-11 19:54:01 CEST, ends at Fri 2023-01-13 11:51:43 CET. --
Jan 12 12:17:16 pve-yps systemd[1]: Starting The Proxmox VE cluster filesystem...
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [quorum] crit: can't initialize service
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [confdb] crit: can't initialize service
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [dcdb] crit: can't initialize service
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2
Jan 12 12:17:17 pve-yps pmxcfs[3786]: [status] crit: can't initialize service
Jan 12 12:17:18 pve-yps systemd[1]: Started The Proxmox VE cluster filesystem.
Jan 12 12:17:23 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 12 12:17:23 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 12 12:17:23 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 12 12:17:23 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2
Jan 12 12:17:29 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 12 12:17:29 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 12 12:17:29 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 12 12:17:29 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2
Jan 12 12:17:35 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 12 12:17:35 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 12 12:17:35 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
Jan 12 12:17:35 pve-yps pmxcfs[3786]: [status] crit: cpg_initialize failed: 2
Jan 12 12:17:41 pve-yps pmxcfs[3786]: [quorum] crit: quorum_initialize failed: 2
Jan 12 12:17:41 pve-yps pmxcfs[3786]: [confdb] crit: cmap_initialize failed: 2
Jan 12 12:17:41 pve-yps pmxcfs[3786]: [dcdb] crit: cpg_initialize failed: 2
...
 
okay. so one possible way forward (again, if you are sure you don't need those volumes anymore on pve-yps!) would be to delete the volumes using 'zfs destroy' and then reboot the node. hopefully it will be able to rejoin the cluster and re-sync the configuration.
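
roughly like this (the volume names below are just taken from your earlier zfs list output - double-check them before destroying anything, it cannot be undone):

Code:
# list the leftover zvols on pve-yps and verify none of them are still needed
zfs list -t volume -r rpool/data

# destroy a leftover replicated volume; -r also removes its replication snapshots
# (names taken from the earlier zfs list output -- adjust as needed)
zfs destroy -r rpool/data/vm-72250-disk-1
zfs destroy -r rpool/data/vm-72252-disk-0
zfs destroy -r rpool/data/vm-72252-state-tt1

# once enough space is free, reboot so pmxcfs and corosync can start cleanly
reboot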

what I meant by "reconsider" is explained in the link I gave you - basically, raidz with VMs has a high overhead, you might be better off using mirrors instead (or tuning your volblocksize and recreating the volumes, but that has other downsides)..
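
if you go the volblocksize route, it would look roughly like this ("local-zfs" is just an assumed storage name here - check /etc/pve/storage.cfg for yours; the setting only affects newly created volumes):

Code:
# check the volblocksize of an existing volume
zfs get volblocksize rpool/data/vm-72254-disk-0

# raise the blocksize used for new zvols on the PVE zfspool storage
# ("local-zfs" is an assumed storage name -- see /etc/pve/storage.cfg)
pvesm set local-zfs --blocksize 16k

# existing volumes keep their old volblocksize and have to be recreated,
# e.g. via "Move Disk" to another storage and back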
 
yes, you should only delete the volumes, and only if you don't need them!
 
OK, thank you for your help. Somehow the space came back, and after a reboot pve-yps is back in the cluster :)
 
