[SOLVED] PVE 7.4-17 GUI "Too Many Redirections" error

edtrumbull

New Member
Jun 24, 2022
Our cluster has 7 nodes running PVE 7.4-17.
I am viewing the GUI in a Brave browser (1.63.138) on a laptop running Linux Mint 21 with the Cinnamon desktop.

Every so often a "too many redirections" error pops up on the screen (see screenshot). It stays up for a minute or so, then clears.

I'm not sure where this error message is coming from: the browser, the PVE cluster, or my laptop. Nor do I have any idea what it's trying to tell me or how to fix it. I'd appreciate any suggestions on where to look and/or what to change to prevent these errors from popping up.
 

Attachments

  • proxmox-redirections.png (197.9 KB)
Hi,

I would check the syslog around the time the "Too Many Redirections" message appears; that should tell us what is causing the issue.
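
For example, something like this would narrow the log down to the window around the error (the timestamps are placeholders, substitute the time the message actually appeared):

```shell
# Hypothetical time window; adjust --since/--until to the actual occurrence.
# pveproxy serves the GUI; pvedaemon and corosync are the other usual suspects.
journalctl --since "2024-01-31 07:50" --until "2024-01-31 08:10" \
    -u pveproxy -u pvedaemon -u corosync
```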
 
Thank you for your reply. The error recurred at roughly 08:00 EST (US).
At that time, the syslog for the server I was looking at shows 5,500-plus log entries.

About 4,900 of them are variations on "Jan 31 08:02:18 ceph-02 ceph-osd[2430]: 2024-01-31T08:02:18.796-0500 7f3a4ba29700 -1 osd.8 30673 heartbeat_check: no reply from 172.29.6.21:6828 osd.16 since back 2024-01-31T08:02:18.643637-0500 front 2024-01-31T08:01:47.109720-0500 (oldest deadline 2024-01-31T08:02:12.073673-0500)"

Another 180 are like "Jan 31 08:02:18 ceph-02 ceph-osd[2406]: 2024-01-31T08:02:18.844-0500 7fb8e6f52700 -1 osd.12 30673 get_health_metrics reporting 3 slow ops, oldest is osd_op(client.258494491.0:1269682 3.a1 3:853015dc:::rbd_data.760c617b626a33.0000000000000809:head [write 1159168~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected e30668)"

There are also several instances of messages like:
Jan 31 08:02:40 ceph-02 corosync[1587]: [KNET ] link: host: 6 link: 0 is down
Jan 31 08:02:40 ceph-02 corosync[1587]: [KNET ] host: host: 6 (passive) best link: 0 (pri: 1)
Jan 31 08:02:40 ceph-02 corosync[1587]: [KNET ] host: host: 6 has no active links


All of these suggest to me that there's a network error.
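
To get a quick per-peer tally of those heartbeat failures, I ran roughly the following over the syslog (a sketch; the heredoc just abbreviates the sample lines above, on the host itself you would feed in /var/log/syslog instead):

```shell
# Tally ceph-osd "heartbeat_check: no reply" lines by the peer OSD that
# failed to answer. The heredoc holds abbreviated sample lines from above;
# on the real host, pipe in /var/log/syslog instead.
tally=$(grep 'heartbeat_check: no reply' <<'EOF' | grep -oE 'osd\.[0-9]+ since' | awk '{n[$1]++} END {for (p in n) print p, n[p]}' | sort
Jan 31 08:02:18 ceph-02 ceph-osd[2430]: heartbeat_check: no reply from 172.29.6.21:6828 osd.16 since back 2024-01-31T08:02:18
Jan 31 08:02:19 ceph-02 ceph-osd[2430]: heartbeat_check: no reply from 172.29.6.21:6828 osd.16 since back 2024-01-31T08:02:19
Jan 31 08:02:19 ceph-02 ceph-osd[2431]: heartbeat_check: no reply from 172.29.6.20:6810 osd.3 since back 2024-01-31T08:02:19
EOF
)
# prints:
#   osd.16 2
#   osd.3 1
echo "$tally"
```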

My /etc/network/interfaces file looks like this:
Code:
auto lo
iface lo inet loopback

auto ens4f0np0
iface ens4f0np0 inet manual

auto ens4f1np1
iface ens4f1np1 inet manual

iface enxb03af2b6059f inet manual

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves ens4f0np0 ens4f1np1
    bond-miimon 100
    bond-mode balance-alb

auto vmbr0
iface vmbr0 inet static
    address 172.29.6.22/17
    gateway 172.29.0.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0

ens4f0np0 and ens4f1np1 are both 10 Gbit/s fiber links.

/proc/net/bonding/bond0 looks like:
Code:
root@ceph-02:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.15.131-2-pve

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: ens4f1np1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: ens4f0np0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 4
Permanent HW addr: e4:3d:1a:d6:c9:80
Slave queue ID: 0

Slave Interface: ens4f1np1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 4
Permanent HW addr: e4:3d:1a:d6:c9:81
Slave queue ID: 0

Neither of the two adapters has syslog entries near the times these events occurred. The last events the adapters logged were on 30 Jan, when the fiber switches had a firmware update applied.
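
One thing I'm wondering about myself: the bond runs balance-alb underneath a bridge. If that mode turns out to be the culprit, I believe the same stanza could be switched to active-backup, roughly like this (untested sketch, same interface names as above):

```
auto bond0
iface bond0 inet manual
    bond-slaves ens4f0np0 ens4f1np1
    bond-miimon 100
    bond-mode active-backup
    bond-primary ens4f0np0
```

That would trade the load balancing for a single active 10G link, but it avoids the ARP/MAC rewriting that balance-alb does.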
 
Thank you for the information and the network config. Could you please also post the corosync config `cat /etc/pve/corosync.conf`?
 
Here you go:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ceph-00
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.29.6.20
  }
  node {
    name: ceph-01
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.29.6.21
  }
  node {
    name: ceph-02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 172.29.6.22
  }
  node {
    name: kvm-00
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 172.29.6.10
  }
  node {
    name: kvm-01
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 172.29.6.11
  }
  node {
    name: kvm-02
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 172.29.6.12
  }
  node {
    name: kvm-03
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 172.29.6.13
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Durham
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
 
Thank you for your reply.
I have set up the corosync network for this cluster, and all seems to be well.

However, I ran into problems when I tried to update the corosync.conf file: the cluster never believed it had quorum, so I was unable to copy the edited corosync.conf into place. I ended up having to shut nodes down, make the change, and bring them back up, which caused a delay and interrupted access to some VMs.

I have a second cluster in the same state, on which I will need to perform the same operation.
Fortunately, I have spare 1 Gbit Ethernet interfaces I can use for the corosync network, and I have already set them up with appropriate static IP addresses.

I can edit a copy of the corosync.conf file easily enough, and have all in readiness.

But I don't know the best method to update the conf file without getting into quorum disputes.

I can manage a short downtime, including shutting down all VMs, before making the changes.
What I want to avoid, if I can, is a longer outage.
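
For reference, here is the rough sequence I'm planning to follow, pieced together from the Proxmox cluster-manager docs (please correct me if any step is wrong for this situation):

```shell
# Sketch only; the commands are standard PVE tooling, but I have not yet
# run this on the second cluster.

# 1. Edit a copy, never the live file: add the new network's ringX_addr
#    entries and increment config_version.
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new

# 2. If the node refuses the write because it thinks it has lost quorum,
#    temporarily lower the expected votes on ONE node:
pvecm expected 1

# 3. Move the edited file into place; pmxcfs propagates it cluster-wide
#    and corosync picks up the new config_version.
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
```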