I don't think there was a power outage; I'm not sure whether you mean some other kind of outage.
This node has been up for 8 days, while the other two nodes have been up for 33 and 45 days respectively. However, that coincides with a system update I ran on this one node just before going on vacation (probably not the best timing...), so I guess that explains the different uptime.
The / filesystem is not full; it has plenty of free space. mount also doesn't show anything suspicious.
However, journalctl shows this:
Jan 05 19:18:43 tx1330m2-1 smartd[680]: Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors
Jan 05 19:40:46 tx1330m2-1 pmxcfs[908]: [dcdb] notice: data verification successful
Jan 05 19:47:05 tx1330m2-1 ceph-mon[1010]: [205B blob data]
Jan 05 19:47:05 tx1330m2-1 ceph-mon[1010]: PutCF( prefix = paxos key = '4859972' value size = 2440)
Jan 05 19:47:05 tx1330m2-1 ceph-mon[1010]: PutCF( prefix = paxos key = 'pending_v' value size = 8)
Jan 05 19:47:05 tx1330m2-1 ceph-mon[1010]: PutCF( prefix = paxos key = 'pending_pn' value size = 8)
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#20 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=40s
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#20 CDB: Write(10) 2a 00 01 2e 7f 28 00 00 08 00
Jan 05 19:47:10 tx1330m2-1 kernel: blk_update_request: I/O error, dev sda, sector 19824424 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs warning (device dm-2): ext4_end_bio:344: I/O error 10 writing to inode 786450 starting block 249317)
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=40s
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#13 CDB: Write(10) 2a 00 04 b2 f1 b8 00 00 10 00
Jan 05 19:47:10 tx1330m2-1 kernel: blk_update_request: I/O error, dev sda, sector 78836152 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs warning (device dm-2): ext4_end_bio:344: I/O error 10 writing to inode 134688 starting block 7625783)
Jan 05 19:47:10 tx1330m2-1 kernel: Buffer I/O error on device dm-2, logical block 7625783
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs warning (device dm-2): ext4_end_bio:344: I/O error 10 writing to inode 134688 starting block 7625784)
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=39s
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#14 CDB: Write(10) 2a 00 02 4c 6a 10 00 00 18 00
Jan 05 19:47:10 tx1330m2-1 kernel: blk_update_request: I/O error, dev sda, sector 38562320 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs warning (device dm-2): ext4_end_bio:344: I/O error 10 writing to inode 134661 starting block 2591554)
Jan 05 19:47:10 tx1330m2-1 kernel: Buffer I/O error on device dm-2, logical block 2591554
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs warning (device dm-2): ext4_end_bio:344: I/O error 10 writing to inode 134661 starting block 2591555)
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#15 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=36s
Jan 05 19:47:10 tx1330m2-1 kernel: sd 0:0:0:0: [sda] tag#15 CDB: Write(10) 2a 00 02 d5 d3 d8 00 00 80 00
Jan 05 19:47:10 tx1330m2-1 kernel: blk_update_request: I/O error, dev sda, sector 47567832 op 0x1:(WRITE) flags 0x800 phys_seg 16 prio class 0
Jan 05 19:47:10 tx1330m2-1 kernel: Buffer I/O error on device dm-2, logical block 249317
Jan 05 19:47:10 tx1330m2-1 kernel: Buffer I/O error on device dm-2, logical block 7625784
Jan 05 19:47:10 tx1330m2-1 kernel: Buffer I/O error on device dm-2, logical block 2591555
Jan 05 19:47:10 tx1330m2-1 kernel: Buffer I/O error on device dm-2, logical block 2591556
Jan 05 19:47:10 tx1330m2-1 kernel: Aborting journal on device dm-2-8.
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm cfs_loop: Detected aborted journal
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted journal
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm log: Detected aborted journal
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm safe_timer: Detected aborted journal
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm log: Detected aborted journal
Jan 05 19:47:10 tx1330m2-1 kernel: EXT4-fs (dm-2): Remounting filesystem read-only
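To make the kernel lines above a little less opaque: each failed write is logged with its SCSI command descriptor block (CDB). For WRITE(10) the first byte is the opcode 0x2A, bytes 2 to 5 hold the logical block address (big-endian), and bytes 7 and 8 hold the number of blocks written. The following is only a minimal sketch (the names `LOG`, `decode_write10` and `pair_errors` are hypothetical, and the log excerpt is copied from above with the timestamp prefix dropped) that decodes each CDB and compares it with the sector the block layer reports in the following `blk_update_request` line. It only decodes the addresses; it says nothing about why the drive timed out.

```python
from __future__ import annotations

import re
from typing import Iterator

# CDB / blk_update_request pairs copied from the journal above (timestamp prefix dropped).
LOG = """\
sd 0:0:0:0: [sda] tag#20 CDB: Write(10) 2a 00 01 2e 7f 28 00 00 08 00
blk_update_request: I/O error, dev sda, sector 19824424 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
sd 0:0:0:0: [sda] tag#13 CDB: Write(10) 2a 00 04 b2 f1 b8 00 00 10 00
blk_update_request: I/O error, dev sda, sector 78836152 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
sd 0:0:0:0: [sda] tag#14 CDB: Write(10) 2a 00 02 4c 6a 10 00 00 18 00
blk_update_request: I/O error, dev sda, sector 38562320 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0
sd 0:0:0:0: [sda] tag#15 CDB: Write(10) 2a 00 02 d5 d3 d8 00 00 80 00
blk_update_request: I/O error, dev sda, sector 47567832 op 0x1:(WRITE) flags 0x800 phys_seg 16 prio class 0
"""

# Ten space-separated hex bytes after "CDB: Write(10)".
CDB_RE = re.compile(r"CDB: Write\(10\) ((?:[0-9a-f]{2} ?){10})")
# Device name and 512-byte sector number reported by the block layer.
SECTOR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def decode_write10(cdb_hex: str) -> tuple[int, int]:
    """Return (LBA, transfer length in blocks) for a SCSI WRITE(10) CDB."""
    b = bytes.fromhex(cdb_hex)
    if len(b) != 10 or b[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(b[2:6], "big")     # bytes 2..5: logical block address
    length = int.from_bytes(b[7:9], "big")  # bytes 7..8: number of blocks
    return lba, length


def pair_errors(log: str) -> Iterator[tuple[int, int, int]]:
    """Yield (lba, blocks, reported_sector) for each CDB line and the error line after it."""
    pending = None
    for line in log.splitlines():
        m = CDB_RE.search(line)
        if m:
            pending = decode_write10(m.group(1))
            continue
        m = SECTOR_RE.search(line)
        if m and pending is not None:
            yield pending[0], pending[1], int(m.group(2))
            pending = None


if __name__ == "__main__":
    for lba, blocks, sector in pair_errors(LOG):
        status = "matches" if lba == sector else "differs"
        print(f"LBA {lba} ({blocks} blocks) vs reported sector {sector}: {status}")
```

For these four commands the decoded LBA equals the reported sector, which suggests the drive uses 512-byte logical blocks. In other words, every line pair describes the same write to /dev/sda that did not complete within the timeout.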
The two unreadable sectors are already reported at the very beginning of the journal, so I don't know how long they have been there. But on Jan 5 the filesystem on dm-2 was remounted read-only.
/dev/sda is a relatively new SSD (no wearout yet). It only holds the OS; VMs etc. reside on other disks.
So what is the best course of action? I can't interpret the error messages and don't know how bad this is.
Should I power down the node and reinstall PVE on a new drive, or can the node somehow be saved as is?
Thanks!