[edit: removed the cluster, rechecking.]
[edit 2: RESOLVED. It was a problem with the hard drives; removing their entries from /etc/fstab let Proxmox boot up normally.]
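For anyone hitting the same thing, the fstab change was roughly this (a sketch from memory; the mount points assume OMV's usual /srv/dev-disk-by-label-* layout and ext4, so adjust to your actual entries). Either comment the lines out, or keep them but add nofail so a missing disk can't block boot:

    # /etc/fstab -- the four OMV-managed drives
    # removed entirely:
    # /dev/disk/by-label/wdpurple4a  /srv/dev-disk-by-label-wdpurple4a  ext4  defaults  0 2
    # or kept, but made non-blocking:
    /dev/disk/by-label/wdpurple4b  /srv/dev-disk-by-label-wdpurple4b  ext4  defaults,nofail,x-systemd.device-timeout=10s  0 2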
I have two Proxmox servers (both on VE 6). I joined them in a cluster for testing, and everything worked fine for the past month. Today the servers could not see each other, and one of them now boots into emergency mode.
Running journalctl -xb shows two basic issues:
1. Four relatively new drives are not mounting. But I configured these in OMV, so that may be normal on bootup?
2. The cluster services (quorum, cmap, cpg) cannot initialize. Could this be due to the drives?
FYI, I did set the expected quorum votes to ONE, so that I could still get to the servers if one failed, and I can get to the one that is running. pvecm status output is below.
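For reference, this is how I set that on the surviving node (standard pvecm subcommand, run on the node you can still reach):

    pvecm expected 1    # tell votequorum to expect a single vote, so one node stays quorate

That is why the running server still shows Quorate: Yes in the output at the end.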
To address the mounting issue, I tried removing the four mounts from /etc/pve/storage.cfg, but I am not allowed to edit that file.
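If I understand it right, that is expected: /etc/pve is the pmxcfs FUSE filesystem, and it goes read-only when the node has no quorum. A sketch of the workaround (pmxcfs's documented local mode; use with care):

    systemctl stop pve-cluster    # stop the clustered config filesystem
    pmxcfs -l                     # remount /etc/pve in local mode, writable without quorum
    # ...edit /etc/pve/storage.cfg...
    killall pmxcfs                # stop the local instance when done
    systemctl start pve-cluster   # back to normal operation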
I am suspicious of the cluster because of the error messages (below), and I wonder whether removing the cluster would fix things.
I do need the downed server, but I do NOT need my two servers to be in a cluster, as I was going to remove that anyway.
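In case it helps anyone later, here is roughly the documented way to separate a node from a cluster without reinstalling (from memory of the Proxmox VE cluster docs, so double-check there, and back up /etc/pve first). On the node being kept:

    systemctl stop pve-cluster corosync   # stop cluster services
    pmxcfs -l                             # bring /etc/pve up writable in local mode
    rm /etc/pve/corosync.conf             # drop the cluster config from pmxcfs
    rm -r /etc/corosync/*                 # and corosync's own copy of it
    killall pmxcfs
    systemctl start pve-cluster           # pmxcfs now runs standalone, no corosync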
journalctl -xb reports the following errors:
Couldn't get size: 0x8000...e (after loading UEFI: db cert VMware items)
ACPI errors (no biggie, I don't think)
...
[note: the following errors happen for each of the four drives I added (a/b/c/d); these were set up/configured with OMV, so maybe this is normal?]
...
mox systemd[1]: Timed out waiting for device /dev/disk/by-label/wdpurple4a/b/c/d.
A start job for unit ...wdpurple4a/b/c/d has failed.
The job identifier is 63 and the job result is timeout.
mox systemd[1]: Dependency failed for Local File Systems.
...
mox pmxcfs[930]: [quorum] crit: quorum_initialize failed: 2
mox pmxcfs[930]: [quorum] crit: can't initialize service
mox pmxcfs[930]: [quorum] crit: cmap_initialize failed: 2
mox pmxcfs[930]: [quorum] crit: can't initialize service
mox pmxcfs[930]: [quorum] crit: cpg_initialize failed: 2
mox pmxcfs[930]: [quorum] crit: can't initialize service
mox pmxcfs[930]: [quorum] crit: cpg_initialize failed: 2
mox pmxcfs[930]: [quorum] crit: can't initialize service
...
mox systemd[776]: emergency.service: executable /bin/plymouth missing, skipping...
The process /bin/plymouth could not be executed and failed.
...
[note: lots more quorum, cmap, and cpg initialization errors]
...
[note: followed by messages about emergency.service]
...
[note: then it just loops through more quorum, cmap, and cpg initialization errors]
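For anyone stuck at the same point: from the emergency-mode shell, the repair was roughly this (a sketch; the wdpurple4* labels are my drives, and findmnt --verify needs a reasonably recent util-linux):

    mount -o remount,rw /       # root may be mounted read-only in emergency mode
    nano /etc/fstab             # comment out (or add nofail to) the four wdpurple4* lines
    findmnt --verify --fstab    # sanity-check the remaining entries
    systemctl daemon-reload     # regenerate mount units from the edited fstab
    reboot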
Here is the result of pvecm status on the running server:
Cluster information
-------------------
Name: pmox
Config Version: 2
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Fri Jan 10 19:57:32 2020
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.580
Quorate: Yes
Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.1.2 (local)
Here is the result of pvecm status on the down server:
cannot initialize cmap service
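If I read the boot sequence right, that error is expected on the downed node: pvecm talks to corosync through the cmap API, and corosync never starts because the failed mounts drop the boot into emergency mode first. Once the node boots normally, a quick sanity check would be:

    systemctl status corosync pve-cluster   # both should be active after a clean boot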