599 Too Many Redirections, 596 Broken Pipe

michael wagner · Nov 15, 2022

Hi,

We have a cluster with 3 nodes in our school environment. When students create LXC containers, the error "599 Too Many Redirections" or "596 Broken Pipe" occurs in about 33% of cases. In most cases the error appears when the storage or the installer image is to be selected.

We have checked DNS and added all nodes to the local hosts file on every node. We have three storages (local, local-lvm, and SSD-School), which have the same names on all nodes. No NFS or similar is configured, only local disks. I also checked corosync.cfg and found nothing strange in it. I switched to debug mode to see more information, but there are no error or warning messages in the log. I also checked the syslog file. Only in the pveproxy access log could I find some messages:

::ffff:172.17.216.200 - if200183@htl [15/11/2022:10:10:51 +0100] "GET /api2/extjs/version?_dc=1668503445997 HTTP/1.1" 200 77
::ffff:172.17.216.200 - if200183@htl [15/11/2022:10:10:52 +0100] "GET /api2/extjs/cluster/sdn?_dc=1668503445999 HTTP/1.1" 200 92
::ffff:172.17.214.234 - if200182@htl [15/11/2022:10:13:16 +0100] "GET /api2/json/nodes/pve02/storage/local/content?content=vztmpl HTTP/1.1" 599 -
::ffff:172.17.210.233 - if200205@htl [15/11/2022:10:13:37 +0100] "GET /api2/json/nodes/pve01/storage?format=1&content=rootdir HTTP/1.1" 599 -
::ffff:172.17.216.200 - if200183@htl [15/11/2022:10:13:56 +0100] "GET /api2/json/nodes/pve01/storage/local/content?content=vztmpl HTTP/1.1" 599 -
::ffff:172.17.218.70 - root@pam [15/11/2022:10:14:42 +0100] "GET /api2/json/nodes/pve03/status HTTP/1.1" 599 -
::ffff:172.17.213.212 - if200104@htl [15/11/2022:10:17:19 +0100] "GET /api2/json/nodes/pve01/network?type=any_bridge HTTP/1.1" 599 -
::ffff:172.17.212.224 - if200188@htl [15/11/2022:10:21:32 +0100] "GET /api2/json/nodes/pve01/storage/local/content?content=vztmpl HTTP/1.1" 599 -
::ffff:172.17.218.70 - root@pam [15/11/2022:10:38:23 +0100] "GET /api2/json/nodes/pve03/status HTTP/1.1" 599 -
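To see whether the 599 responses cluster on one node or one API path, the access log can be tallied per node. The following is only a rough sketch: the regular expressions are inferred from the lines quoted above, and the function name `count_status_by_node` is made up for illustration.

```python
import re
from collections import Counter

# Extracts the request path and the HTTP status from one access-log line.
# The pattern is inferred from the sample lines in this post (assumption).
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})(?:\s|$)')
# Picks the node name out of paths such as /api2/json/nodes/pve01/...
NODE_RE = re.compile(r"/nodes/([^/?]+)")


def count_status_by_node(lines, status="599"):
    """Count log lines with the given status, grouped by target node."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group("status") != status:
            continue
        node = NODE_RE.search(match.group("path"))
        counts[node.group(1) if node else "(no node)"] += 1
    return counts


# A few lines copied from the log excerpt above.
SAMPLE = [
    '::ffff:172.17.216.200 - if200183@htl [15/11/2022:10:10:51 +0100] "GET /api2/extjs/version?_dc=1668503445997 HTTP/1.1" 200 77',
    '::ffff:172.17.214.234 - if200182@htl [15/11/2022:10:13:16 +0100] "GET /api2/json/nodes/pve02/storage/local/content?content=vztmpl HTTP/1.1" 599 -',
    '::ffff:172.17.210.233 - if200205@htl [15/11/2022:10:13:37 +0100] "GET /api2/json/nodes/pve01/storage?format=1&content=rootdir HTTP/1.1" 599 -',
    '::ffff:172.17.218.70 - root@pam [15/11/2022:10:14:42 +0100] "GET /api2/json/nodes/pve03/status HTTP/1.1" 599 -',
]

if __name__ == "__main__":
    for node, hits in sorted(count_status_by_node(SAMPLE).items()):
        print(node, hits)
```

On a real node the same function could be fed the lines of the pveproxy access log file instead of `SAMPLE`.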

What should I do to solve this problem?

Regards
Michael

Moayad · Nov 15, 2022

Hi,

Can you please provide us with the syslog from the time the LXC is being created?

michael wagner · Nov 15, 2022

Hi Moayad,

Thanks for your quick response. I have uploaded the syslog files from all nodes. We tested creating about 30 LXCs on 15.11.2022 at around 10:00.

Regards
Michael

michael wagner · Nov 25, 2022

Hi Moayad,

Have you found any reason for this behavior in our cluster?
Regards

Michael

Dunuin · Nov 26, 2022

I also really hate this "error 599". As soon as my remote SMB/PBS storages can't be accessed, the webUI basically becomes unusable, because the connection times out all the time (so I need several tries to disable these storages) or I see the "error 599". As soon as I disable the SMB/PBS storages or they become available again, everything continues to work normally.
Something really needs to be done so that at least the webUI won't stop working. And those PVE servers aren't clustered, and it's all local storage except for the SMB/PBS storages where I just store my backups once per day.

@michael wagner:
Is the IO delay high when that happens? If it is the same as here, then maybe there is a problem with your storage and PVE somehow gets stuck while polling the state of your local storages.
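One way to compare IO wait with and without the 599 errors is to take two snapshots of the aggregate `cpu` line in `/proc/stat` and compute the iowait share in between. This is a minimal sketch, assuming the IO delay shown in the webUI roughly corresponds to the kernel's iowait counter; the helper name `iowait_percent` is hypothetical.

```python
def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines
    taken from /proc/stat (first line of that file)."""

    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # user, nice, system, idle, iowait, irq, softirq, steal
        return [int(value) for value in parts[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (Linux only)."""
    with open(path) as handle:
        return handle.readline()
```

Calling `read_cpu_line()` once before and once after a batch of container creations, then passing both lines to `iowait_percent`, gives one number per test run that can be compared against an idle baseline.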

michael wagner · Nov 26, 2022

Hi Dunuin,

Thanks for your response. I have now checked these values while the system was under load. The difference in IO wait between no load and load with 599 occurrences is very small. I will analyze that further, because your description of the behavior sounds very logical.

Regards
Michael

michael wagner · Nov 29, 2022

Good morning, I have installed 7.3 and added the new MAC address option to all bridges. Still the same. I also installed a new node just to join it to the cluster; still the same.

Stoiko Ivanov · Jan 16, 2023

* On a hunch - please check your DNS entries and your /etc/hosts - are all IPs correctly assigned to the PVE node names?
* Do you see any issues in the system journal when this occurs? (`journalctl -f`)
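The name check in the first point can be scripted so that it is easy to repeat on every node. The sketch below is only an illustration: the helper name `find_mismatches` is made up, and the IP addresses in the usage note are placeholders from the documentation range, not values from this cluster.

```python
import socket


def find_mismatches(expected, resolve=socket.gethostbyname):
    """Compare what each node name resolves to against the expected IP.

    expected: dict mapping node name -> expected IPv4 address.
    resolve:  lookup function; defaults to the system resolver, which
              also honours /etc/hosts.
    Returns a dict of node name -> problem description (empty if all OK).
    """
    problems = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"lookup failed: {exc}"
            continue
        if got != want:
            problems[name] = f"resolves to {got}, expected {want}"
    return problems
```

Run on each node with something like `find_mismatches({"pve01": "192.0.2.11", "pve02": "192.0.2.12", "pve03": "192.0.2.13"})`, replacing the placeholder addresses with the real ones; an empty result means every name resolved as expected on that node.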

I hope this helps!

Stoiko Ivanov · Mar 28, 2023

After a few unsuccessful attempts at reproducing this, it seems the root cause in this particular case was fixed by https://git.proxmox.com/?p=pve-acce...it;h=170cf17bf791b4373540102b1e58bcb61d716fd6
(which is included in PVE 7.4)

Thanks @michael wagner for providing feedback and their configuration for the analysis.
