I had a power outage for about an hour this afternoon, which took down three nodes of my four-node cluster (creatively enough, the nodes are named pve1, pve2, and pve3; pve4 was on a different UPS and wasn't affected). After bringing them back up, pve1 and pve2 are in a semi-offline status:
I can SSH to each of these nodes, and I can log into the PVE web UI on each of these nodes (the screen shot above is while logged into pve1, but I see the same when logged into pve3). Both of them are up in Ceph--in fact, they're the only ones that are up in Ceph.
On a hunch, I checked the date on these nodes, and found it was way out--like over six weeks behind (IIRC, it was showing a date of 13 Jul 23). The hardware clock was even worse, reading somewhere in 2010. So I used chronyd to forcibly resync the time to my NTP server, then hwclock to set the hardware clock to the system clock. After a reboot, pve1 showed the green checkmark for a short time, then reverted to the gray question mark--pve2 stayed at the question mark. System and hardware date/time are correct on pve3 and pve4.
All four nodes are running 8.0.4 with all updates through about a week ago. Each node can ping any of the others. Scrubs of the boot pools find no errors. I'm kind of baffled here--what else should I be checking?
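For anyone wanting to reproduce the clock fix described above, it can be sketched roughly as follows. This is a sketch under assumptions, not a record of the exact commands: it assumes chrony is the time daemon and uses `chronyc makestep` to step the clock, and the `run` helper, the `resync_clock` function, and the `DRY_RUN` variable are hypothetical names added for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: the exact commands are assumptions, not taken from the post.
# DRY_RUN=1 (the default) just prints each command.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the resync steps.
resync_clock() {
    run chronyc makestep      # step the system clock to NTP time right away instead of slewing
    run hwclock --systohc     # write the system time to the hardware clock (RTC)
    run timedatectl status    # show system time, RTC time, and whether the clock is synchronized
}

resync_clock
```

Running it as-is only lists the three commands; running it with `DRY_RUN=0` as root on each affected node executes them.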