Hi everyone!
I currently have a 3-node cluster in which nodes 1 and 2 are physical servers and the 3rd one is a VM running on unRAID.
(Proxmox: v8.1.4 [nodes 1 and 2 each have an i3-4150, 16 GB of RAM and 2 HDDs in RAID 1 using ZFS], unRAID: 6.12.4 [the node3 VM gets 16 GB of the host's 48 GB, 4 cores of a 6-core i5-8400, and 2 VM disks on different HDDs in RAID 1 using ZFS])
For some reason node3 sometimes reboots between 2:00 am and 2:20 am. It does not happen every night, so it looks kind of random.
The node comes back online in 1-3 minutes, but because of the reboot all VMs get migrated every time.
Maybe it's related to performance issues during backups? There are two backup jobs: one starts at 1:30 am (1 VM) and the other at 2:00 am (1 VM). The 1:30 one finishes before the next one starts. Both VMs are on that 3rd node, so maybe there is some kind of performance bottleneck? The assigned resources are the same as on the physical servers, and the 2 VM disks are on different drives on the host server. I don't think it's out of memory, because as far as I know that would kill the VM outright instead of rebooting it, and both the node and the host server have plenty of memory.
In the Proxmox log I can't see anything related to it. 3 minutes before the reboot there is a replication timeout, which usually happens when a backup is running. The 2:00 am backup finished in 7 minutes and the node rebooted 5 minutes later without any warning, just a
Code:
-- Reboot --
in the log.
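To compare what was logged right before each reboot, something like the following rough Python sketch could help. It assumes the node's log has been exported to a plain text file in which each reboot shows up as a line containing only `-- Reboot --`, like above. The function name `lines_before_reboots` and the `context` parameter are made up for this example, and the log path is whatever file the export was saved to.

```python
from collections import deque
import sys

# Marker line as it appears in the exported log (assumption: one per reboot).
REBOOT_MARKER = "-- Reboot --"


def lines_before_reboots(lines, context=5):
    """Return, for each reboot marker, the last `context` lines logged before it."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == REBOOT_MARKER:
            results.append(list(recent))
            recent.clear()  # start a fresh window for the next boot
        else:
            recent.append(line)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py <exported log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for i, block in enumerate(lines_before_reboots(fh), 1):
            print(f"--- last lines before reboot #{i} ---")
            print("\n".join(block))
```

If the last lines before every reboot look the same (for example always the replication timeout), that would at least narrow down where to look.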
Plex on unRAID starts its scheduled tasks at 2:00 am, so I moved them to 3:00 am, but I don't think it's related. I'm planning to move the node to a physical server in the future, but I'm not there yet.
How can I troubleshoot this, and where should I start?
Thank you in advance!