[SOLVED] 6.8 kernel, vzdump to local storage results in node fenced

RobFantini

Hello,
we have a 5-node cluster.

A couple of months ago we had this issue; to work around it I pinned a 6.5 kernel.

Last night I unpinned it and rebooted the 5 nodes to use 6.8.4-2-pve.

At 2 AM, shortly after a vzdump backup to this storage:
Code:
dir: z-local-nvme
        path /nvme-ext4
        content images,backup,vztmpl,iso,snippets,rootdir
        prune-backups keep-last=1
        shared 0
one of the nodes got fenced.
HA VMs were migrated.

All 5 nodes run the backup at the same time.

I pinned 6.5.13-5-pve and rebooted all nodes.
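
For reference, the pinning was done with proxmox-boot-tool (assuming the bootloader is managed by it, as on a standard install):
Code:
# show kernels known to the bootloader
proxmox-boot-tool kernel list
# pin a specific kernel as the default boot entry, then reboot
proxmox-boot-tool kernel pin 6.5.13-5-pve
# later, to go back to booting the newest installed kernel
proxmox-boot-tool kernel unpin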
--------------------------------

So it looks like there is a bug or a bad configuration here.

Any suggestions to fix this issue?
Is more data needed?
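
One thing we could try, if the simultaneous backup I/O turns out to be the trigger, is throttling vzdump on each node. A rough sketch of /etc/vzdump.conf settings (example values, not tested on this cluster):
Code:
# /etc/vzdump.conf -- node-wide vzdump defaults (example values)
# limit backup I/O bandwidth, in KiB/s (here ~100 MiB/s)
bwlimit: 102400
# lower the backup worker's I/O priority (0-8, default 7; applies with the BFQ scheduler)
ionice: 8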
 
Hi,

RobFantini said:
one of the nodes got fenced.

When you say it got fenced, do you mean it was rebooted, or did it lose connection to the quorate network segment? This sounds more like a node-local issue to me: the backup is performed to a local disk, so the network should be fine.

RobFantini said:
Any suggestions to fix this issue?
Is more data needed?

Can you provide an excerpt of the systemd journal of all the nodes in the cluster around the time the issue appeared?
Code:
journalctl --since <DATETIME> --until <DATETIME> > $(hostname)-systemd-journal.log
This should give us more information.
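
If the full journal is too large to share, filtering to the cluster, HA, and watchdog services usually captures the relevant events; a sketch (unit names as on a standard Proxmox VE install, same <DATETIME> placeholders as above):
Code:
journalctl --since <DATETIME> --until <DATETIME> \
    -u corosync -u pve-cluster -u pve-ha-lrm -u pve-ha-crm -u watchdog-mux \
    > $(hostname)-cluster-journal.log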
 

Hi Chris,
the node got disconnected from the cluster network.

So there could be faulty hardware, where the local write to the NVMe disk triggers a network connectivity issue.
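
To check for that, I'll look at the NIC error counters and kernel messages on the affected node; something like this (eno1 is just an example interface name):
Code:
# error/drop counters on the cluster NIC
ip -s link show eno1
# driver-level statistics, if the driver supports it
ethtool -S eno1 | grep -iE 'err|drop|discard'
# kernel messages mentioning the NIC driver or NVMe during the backup window
dmesg -T | grep -iE 'eno1|nvme'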

I will try using the 6.8 kernel on another node and keep 6.5 on the rest.
 
It looks like the issue was caused by faulty hardware on one node.

I unpinned the older kernel from the other 4 nodes two weeks ago, and the issue has not occurred since.
 
