Two-node setup: grey question marks everywhere

Okay perfect, cheers. How could I miss the button? xD

So this is interesting. I suppose this log is from the node which you have to take down to get the GUI working again, i.e. the same node that is on most of the time. I can see that e.g. on Feb 27 at 4:30 the other node was shut down in an orderly fashion and checked out just fine; past 9am you then got into the GUI and shut this node down manually. I assume this is because you were experiencing the said issue; however, there is nothing in the log that would suggest anything wrong with quorum to the point where it would be e.g. congesting the network. This is excellent, because there is nothing wrong there. :D

I think I should have asked this before, but how about when you SSH in? Can you start/stop VMs, etc.? What exactly are your symptoms other than the "grey GUI"? Do you get the same issues in the terminal? (Just to be clear, I would prefer you literally SSH in, not use the GUI pty.)
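
Next time it happens, something along these lines over SSH would already tell us a lot (the guest IDs are placeholders, pick one of your actual guests):

    qm list        # do the VMs still respond to the CLI?
    pct list       # same for containers
    qm start 100   # can you start/stop a guest?
    pct stop 101
    systemctl status pvestatd pveproxy pvedaemon pve-cluster   # grey question marks usually mean pvestatd stopped reporting
    pvecm status   # quorum view from that node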

I know how to back up datasets. I do not use PBS and I am a fan of ZFS. To reframe my question: when I send my newest snapshot from node 1 to my backup node 2, the VM data now exists on both nodes.

You do just zfs send | receive? I will be honest, I never tried to do this across nodes as a means of replication, only in and out (from/to another place). I had not considered it before, but ...

However, if I want to migrate the VM from node 1 to node 2, Proxmox doesn't know that a replica of the VM already exists in its filesystem.

What are the error messages when you attempt that? Maybe I do not know enough about the internals of that mechanism (yet), but ...
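
(Just so we are talking about the same attempt, I mean the point where you run something like the following, or hit Migrate in the GUI; the IDs and node name here are made up:)

    qm migrate 100 node2 --with-local-disks    # VM with local (ZFS) disks
    pct migrate 101 node2 --restart            # container, restart-mode migration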

Now what else do I need to do so that Proxmox is aware of the VM copy on node 2? With replication from the web UI, Proxmox is aware of the copy and instantly migrates.

I am not suggesting it's the only way, but why not use the built-in tool? It's not about "doing it from the GUI"; that is just a front-end for the API, as is pvesr [1] if you prefer the CLI.
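
For completeness, from the CLI it is just a couple of commands; a rough sketch, with a made-up job ID 100-0 and a target node called "node2":

    pvesr create-local-job 100-0 node2 --schedule '*/15'   # replicate guest 100 to node2 every 15 minutes
    pvesr status                                           # last sync, duration, failures
    pvesr schedule-now 100-0                               # kick off a run immediately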

When I look at the pvesr source [2], it's not that exciting (sorry :D). It uses PVE::API2::Replication [3], which gets to use PVE::Replication [4], which in the end just calls PVE::Storage::storage_migrate [5], which essentially does pvesm import [6] on the other end and feeds it with pvesm export [7], with some accounting for the snapshot situation. That's interesting. I might be a bit off now, but there's some hocus pocus with activate_volumes [8] there too.

[1] https://pve.proxmox.com/pve-docs/pvesr.1.html
[2] https://github.com/proxmox/pve-manager/blob/master/PVE/CLI/pvesr.pm
[3] https://github.com/proxmox/pve-mana...f0590ddeb58fab1ad/PVE/API2/Replication.pm#L39
[4] https://github.com/proxmox/pve-gues...9b3172296c38058d0/src/PVE/Replication.pm#L220
[5] https://github.com/proxmox/pve-stor...3e84d0a18f95de0322168/src/PVE/Storage.pm#L702
[6] https://github.com/proxmox/pve-stor...3e84d0a18f95de0322168/src/PVE/Storage.pm#L820
[7] https://github.com/proxmox/pve-stor...3e84d0a18f95de0322168/src/PVE/Storage.pm#L743
[8] https://github.com/proxmox/pve-stor...de0322168/src/PVE/Storage.pm#L1200C5-L1200C21
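
Just to make that chain concrete, on a ZFS-backed storage the export/import pair ends up being roughly a pipeline like this (only a sketch; the storage/volume names and exact flags are assumptions, not taken from your setup):

    pvesm export local-zfs:vm-100-disk-0 zfs - --with-snapshots 1 \
      | ssh root@node2 pvesm import local-zfs:vm-100-disk-0 zfs - --with-snapshots 1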

I think I have digressed quite a bit by now (truth is, I have to go for now), but it's not doing zfs send | receive for sure. I would look at the resulting datasets and the snapshot names/attributes. But still, it's interesting that this would have something to do with your greyed-out GUI on the node you replicate from...
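
If you want to compare what the two mechanisms leave behind, something like this should show the difference (the dataset name is a placeholder); if I remember right, pvesr creates snapshots named __replicate_<jobid>_<timestamp>__ while syncoid uses its own syncoid_* names:

    zfs list -t snapshot -o name,creation -r rpool/data/vm-100-disk-0   # snapshot names on the replicated volume
    zfs get guid,origin,creation rpool/data/vm-100-disk-0               # attributes of the dataset itself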

Your link is also a cool project, cheers, but I would prefer to stick to sanoid.

So you want to be replicating to the other node and then send|recv out, correct?

Yes, works perfectly, also without the expected votes set.

Alright, I was not sure off the top of my head. The expected votes value is implied from the nodes list unless explicitly specified, but it is possible the tie-breaker has overridden it for you. The result should be that even one node with one vote is quorate. You set your node votes back to 1 each, correct?
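
To double-check what corosync actually ended up with, this is where I would look (just a sketch of the relevant places):

    pvecm status                 # look at "Expected votes", "Total votes" and "Quorum"
    cat /etc/pve/corosync.conf   # the quorum { } section and the per-node quorum_votes in nodelist { }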
 
So this is interesting. I suppose this log is from the node which you have to take down to get the GUI working again, i.e. the same node that is on most of the time. I can see that e.g. on Feb 27 at 4:30 the other node was shut down in an orderly fashion and checked out just fine; past 9am you then got into the GUI and shut this node down manually. I assume this is because you were experiencing the said issue; however, there is nothing in the log that would suggest anything wrong with quorum to the point where it would be e.g. congesting the network. This is excellent, because there is nothing wrong there.
Yes, that's correct. It's the log of my primary node.
I think I should have asked this before, but how about when you SSH in? Can you start/stop VMs, etc.? What exactly are your symptoms other than the "grey GUI"? Do you get the same issues in the terminal? (Just to be clear, I would prefer you literally SSH in, not use the GUI pty.)
I wouldn't even have noticed if it was just the greyed-out icons. On the morning of the 27th of February I noticed that in my Home Assistant panel some sensor data was out of date and had last been updated two days ago. I thought that was odd; why isn't it updating? Then on the train I tried to connect to my node via WireGuard VPN, or at least I thought I was connected, but it wasn't working at all, no connection whatsoever. Then I used my other WireGuard VPN to look into where the problem was, logged into the web UI and saw the greyed-out question marks. I had this before and I know that a reboot fixes it, so I initiated a reboot. After about 10 minutes everything was back to normal again. Last time I had the same issue, I noticed that I just could not access my services. It seems like the network is not working properly.

I always SSH in and almost never use the GUI pty, but I cannot remember if starting and stopping VMs and containers worked. I would need to try that out if the issue comes back.

You do just zfs send | receive? I will be honest, I never tried to do this across nodes as a means of replication, only in and out (from/to another place). I had not considered it before, but ...
Yes, I use syncoid; under the hood I think it is more sophisticated than just zfs send and receive, but the result is the same.
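
For reference, a syncoid run and its rough raw-zfs equivalent look like this (dataset names are placeholders, not my actual layout):

    syncoid rpool/data/vm-100-disk-0 root@node2:rpool/data/vm-100-disk-0
    # roughly equivalent to an incremental send of everything since the last common snapshot:
    zfs send -I rpool/data/vm-100-disk-0@older rpool/data/vm-100-disk-0@newest \
      | ssh root@node2 zfs receive rpool/data/vm-100-disk-0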

What are the error messages when you attempt that? Maybe I do not know enough about the internals of that mechanism (yet), but ...
So there is no error message, but the last time I tried that it just started copying the whole VM/container over the network to my second node, ignoring the already existing replicas. I do not understand why.
I am not suggesting it's the only way, but why not use the built-in tool? It's not about "doing it from the GUI"; that is just a front-end for the API, as is pvesr [1] if you prefer the CLI.

When I look at the pvesr source [2], it's not that exciting (sorry :D). It uses PVE::API2::Replication [3], which gets to use PVE::Replication [4], which in the end just calls PVE::Storage::storage_migrate [5], which essentially does pvesm import [6] on the other end and feeds it with pvesm export [7], with some accounting for the snapshot situation. That's interesting. I might be a bit off now, but there's some hocus pocus with activate_volumes [8] there too.

I think I have digressed quite a bit by now (truth is, I have to go for now), but it's not doing zfs send | receive for sure. I would look at the resulting datasets and the snapshot names/attributes. But still, it's interesting that this would have something to do with your greyed-out GUI on the node you replicate from...
That's a good point, thanks for providing the sources, I will have a look around! :)
So you want to be replicating to the other node and then send|recv out, correct?
Yes, so that migration just works for the VMs and containers.

Alright, I was not sure off the top of my head. The expected votes value is implied from the nodes list unless explicitly specified, but it is possible the tie-breaker has overridden it for you. The result should be that even one node with one vote is quorate. You set your node votes back to 1 each, correct?
Yes, that's exactly what I did. But the expected votes value wasn't there before :)...
 
(screenshot attached)

The problem is back. What kind of logs can I share to settle this issue?
 

Attachments

  • nas.log (71.9 KB)
