Search results

  1. Network stops working (was OSDs on one node keeps crashing)

    Hi, took some time. Reinstalled the mystery host with a clean Proxmox 6.0 (the theory was that the other hosts didn't show this because they had already been upgraded). I ping another host once every 20 seconds and record that in a file with the date: `while true; do date >> ping.txt; ping -c 1 ip | grep`...
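
    A minimal sketch of that monitoring loop, filled out from the truncated one-liner above; the target address, the grep pattern, and the explicit 20-second sleep are assumptions, not verbatim from the post:

    ```bash
    #!/bin/bash
    # Log a timestamp and a one-packet ping result every 20 seconds.
    TARGET=10.0.0.1   # placeholder - the post only says "another host"
    while true; do
        date >> ping.txt
        ping -c 1 "$TARGET" | grep -E 'transmitted|loss' >> ping.txt
        sleep 20
    done
    ```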
  2. Proxmox 6.0 renames network card and can not use it

    As I wrote earlier, my resolution was to change the NamePolicy to path only. I actually added the 99-default.link to /etc/systemd/network in order for it to persist. Like you note, onboard is not unique and slot seems to be the same as onboard. I don't care to have the whole MAC, so path is the...
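
    A minimal sketch of that override, assuming a stock Proxmox 6 layout; the sed pattern and the initramfs rebuild are my additions, not steps the poster confirmed:

    ```bash
    # Copy the shipped default link policy to /etc/systemd/network, where it
    # takes precedence, and pin NamePolicy to path-based naming.
    cp /usr/lib/systemd/network/99-default.link /etc/systemd/network/99-default.link
    sed -i 's/^NamePolicy=.*/NamePolicy=path/' /etc/systemd/network/99-default.link
    update-initramfs -u   # so the override also applies during early boot
    ```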
  3. Network stops working (was OSDs on one node keeps crashing)

    It was a very simple test: pinging another host every 20 seconds and checking when the ping failed ("1 packets transmitted, 0 received, 100% packet loss, time 0ms"). Yes - this was a fresh install with the 6.2-1 ISO (pveversion: pve-manager/6.2-4/982457a (running kernel: 5.4.34-1-pve))
  4. Network stops working (was OSDs on one node keeps crashing)

    Hyper-V: nothing in the event log for Hyper-V-VmSwitch or Hyper-V-Compute inside the VM.
    dmesg -H - nothing from today
    syslog - Proxmox VE replicator every minute, nothing else around 01:00
    auth.log - sessions opened by cron for root every hour
    pveam.log - failed update at 3:41 but nothing...
  5. Network stops working (was OSDs on one node keeps crashing)

    Thanks - I am really scratching my head! I already changed the switch (but the machines are on the same network, of course) and the NIC. I haven't been able to find anything in the BIOS that seems relevant, and it's strange that it would affect a PVE installation running on bare metal and a VM under...
  6. Network stops working (was OSDs on one node keeps crashing)

    Another update in this mystery saga. I've installed PVE in a Hyper-V VM on one of these HPE machines. The machine was installed with the new 6.2-1 ISO, no updates. The PVE installation is clean: not connected to the cluster, no Ceph installed, only one virtual NIC. I started a job that prints...
  7. Network stops working (was OSDs on one node keeps crashing)

    I have now reinstalled Proxmox multiple times on the node (HP DL380 Gen10). I am using only the new NIC (HP with 4x e1000). My most recent installation was with 6.0.1, upgraded to the latest, running mon, mgr and mds plus two OSDs. I've also moved all connections to another switch (Cisco -> Netgear)...
  8. Network stops working (was OSDs on one node keeps crashing)

    So now I have received another NIC with 4 ports. It took some time to get it working properly, but now all ports have consistent device names. I've bonded a port on the old NIC with a port on the new NIC for both my public network and the Ceph cluster network. I still get my...
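
    A sketch of what a bond across two NICs can look like in /etc/network/interfaces, written as a shell heredoc; the interface names, the active-backup mode, and the use of ifupdown2's ifreload are assumptions, not the poster's actual settings:

    ```bash
    # Append a bond stanza pairing one port on each NIC.
    cat >> /etc/network/interfaces <<'EOF'
    auto bond0
    iface bond0 inet manual
            bond-slaves eno1 enp5s0f0
            bond-miimon 100
            bond-mode active-backup
    EOF
    ifreload -a   # apply without a reboot (requires ifupdown2)
    ```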
  9. Proxmox 6.0 renames network card and can not use it

    I just changed /usr/lib/systemd/network/99-default.link to NamePolicy=path; it seemed like the only way to get consistent and persistent NIC names. The currently implemented naming policy scheme is clearly incompatible with NICs that have multiple ports in the same PCI slot.
  10. Linux Kernel 5.4 for Proxmox VE

    I'm not well versed in kernel development; is there a list somewhere where I could see what new hardware is supported (like the RTL8125 mentioned above)?
  11. Network stops working (was OSDs on one node keeps crashing)

    I have corosync ring0 on its own physical NIC and physical switch. I'm mostly using the corosync syslog messages to detect the issue on the other NIC. However, would it be interfering to have ring1 on a shared system - i.e. is it better not to have a second ring if it is on a shared network...
  12. Network stops working (was OSDs on one node keeps crashing)

    corosync:
        ring 0 - eno4 (pve network for corosync)
        ring 1 - vmbr0 (public network for VMs and ceph public)
    ceph:
        cluster_network - vmbr1 (ceph internal traffic)
        public_network - vmbr0 (public network for VMs and ceph public)
    In the syslog corosync entries, only ring1 (vmbr0)...
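
    For reference, the Ceph side of this layout lives in two keys of the cluster-wide config; the subnets below are made-up examples, since the post masks the real addresses:

    ```bash
    # /etc/pve/ceph.conf would contain something like:
    #   cluster_network = 192.168.2.0/24   # reached via vmbr1, ceph internal
    #   public_network  = 192.168.1.0/24   # reached via vmbr0, VMs + ceph public
    grep -E 'cluster_network|public_network' /etc/pve/ceph.conf
    ```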
  13. Network stops working (was OSDs on one node keeps crashing)

    It's on a secure network so I can't copy paste.
        auto lo
        iface lo inet loopback
        iface eno1 inet manual
        iface eno2 inet manual
        iface eno3 inet manual
        auto eno4
        iface eno4 inet static
            address xx.XX.xx.6/24 #pve network
        iface eno5 inet manual
        iface eno6 inet manual
        auto bond1
        iface...
  14. Network stops working (was OSDs on one node keeps crashing)

    I see in syslog on the troubled node (pve006) that at 00:58:11 corosync reports that link 1 to all other hosts is down (link 1 is the public NIC); the best link is 0 (which is the pve/corosync link). I see the same in syslog on another node: at 00:58:11 the link 1 to pve006 goes down, link 0...
  15. Network stops working (was OSDs on one node keeps crashing)

    I switched off the backups - same issue. I thought it might be a lack of RAM (on the other machines in the cluster), so I've put more memory in them all - still the same issue. I'm pretty sure it's something wrong in the network configuration, but I've redone it to make sure it's correct - same...
  16. Network stops working (was OSDs on one node keeps crashing)

    Hi, I'm having an annoying problem with a 5-node cluster. All OSDs crash on one node just after midnight most nights. In syslog I can see that, starting at 23:58:53, all the OSDs on the node start having issues with heartbeat_check, getting no reply from other OSDs. A minute later all the...
  17. [SOLVED] 10GB NIC only works in 1GB mode

    Man - bringing the interface up with `ip link` switched it to 10GB! Sorry for bothering you with such a simple problem in the end; I just thought that the NIC and the switch would negotiate the physical link without needing the OS involved. `ip link` before: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state...
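
    A minimal sketch of the fix described here, with a placeholder interface name:

    ```bash
    ip link set dev eno1 up        # bring the link up administratively
    ip link show dev eno1          # state should now read UP, not DOWN
    ethtool eno1 | grep -i speed   # expect "Speed: 10000Mb/s" after negotiation
    ```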
  18. [SOLVED] 10GB NIC only works in 1GB mode

    The connections are all TP copper, no SFPs. ethtool confirms 10GB for eno1 and eno2 but reports no link, even though I have a cable in eno1 connected to the 10GB switch. This is how I feel right now!
  19. [SOLVED] 10GB NIC only works in 1GB mode

    Auto-config - I haven't been able to set a static 10GB speed; my options are auto-config, auto-config 5GB and auto-config 2.5GB. They all auto-configure to 1GB when connected to the X540. The purpose of the 10GB connection is to carry the Ceph cluster network for the recovery data. I have other...
  20. [SOLVED] 10GB NIC only works in 1GB mode

    Good point - I had a cable connected to a 1GB switch for my initial switch configuration. However, there is no difference with only 10GB connections. And it would be a very poor switch that couldn't negotiate 10GB for the X540 NIC just because a 1GB cable was connected. And it is even stranger that there is...