Search results

  1. R

    slow migrations

    2021-10-15 15:49:06 migration active, transferred 12.5 GiB of 32.0 GiB VM-state, 49.5 MiB/s
    2021-10-15 15:49:07 migration active, transferred 12.5 GiB of 32.0 GiB VM-state, 57.2 MiB/s
    2021-10-15 15:49:08 migration active, transferred 12.6 GiB of 32.0 GiB VM-state, 37.1 MiB/s
    ..
    2021-10-15...
  2. R

    slow migrations

    Hello, I am still seeing slow migrations. Let me know what I can do to debug. There could be some suggestions above, and I'll check later, as I have a big issue to deal with. Any suggestions for debugging the issue?
  3. R

    [SOLVED] lxc upgrade to bullseye trouble

    - I had a leftover eth1 entry from testing in interfaces. Removing that fixed it.
    - The cause of my other issue is probably that I removed /etc/apt/sources.list.d/turnkey.list some years ago and then stopped getting updates. Thank you.
  4. R

    [SOLVED] lxc upgrade to bullseye trouble

    interfaces has eth0 and eth1, while the pve networking config has just one device. Weird.
    # cat interfaces
    # UNCONFIGURED INTERFACES
    # remove the above line if you edit this file
    auto lo
    iface lo inet loopback
    auto eth0
    iface eth0 inet static
    address 172.30.24.17/24
    gateway...
  5. R

    [SOLVED] lxc upgrade to bullseye trouble

    I'll get back to this later on. Here is more info:
    # systemctl status inithooks.service
    ● inithooks.service - inithooks: firstboot and everyboot initialization scripts
    Loaded: loaded (/lib/systemd/system/inithooks.service; enabled; vendor preset: enabled)
    Active: failed (Result...
  6. R

    [SOLVED] lxc upgrade to bullseye trouble

    This lxc was originally created from the Turnkey Debian 8 template some years ago [when Debian 8 was stable]. Also, the lxc restored from backup does have some issues:
    # systemctl list-units --state=failed
    UNIT LOAD ACTIVE SUB DESCRIPTION...
  7. R

    [SOLVED] lxc upgrade to bullseye trouble

    Debug mode output:
    # cat start-debug-mode
    lxc-start -n $1 -F --logfile=lxc.log --logpriority=debug
    root@pve33:[/fbc/pve/lxc]:# ./start-debug-mode 22217
    systemd 247.3-6 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL...
  8. R

    [SOLVED] lxc upgrade to bullseye trouble

    After rebooting the pve host, the lxc got a network address [restarting a systemd unit would probably have done the same thing]. However, systemctl status still freezes, and the media server does not work. I'll try starting in debug mode later on.
  9. R

    [SOLVED] lxc upgrade to bullseye trouble

    Hello, while doing apt dist-upgrade to bullseye we have seen upgrades get stuck at this point:
    Installing new version of config file /etc/systemd/resolved.conf ...
    Installing new version of config file /etc/systemd/system.conf ...
    Installing new version of config file /etc/systemd/user.conf...
  10. R

    slow migrations

    Hi, I went back and checked the reasoning for our not using BGP. Using static routes works too. See https://docs.nvidia.com/networking-ethernet-software/cumulus-linux-43/Layer-3/Routing/Static-Routing/ "You can use static routing if you don’t require the complexity of a dynamic routing...
  11. R

    slow migrations

    Spirit, that link mentions VRR, not VRRP. Did you mean to use VRR?
  12. R

    slow migrations

    I just checked kernlog on one of the cluster nodes, and the dmesg info is very different. There are lines like these:
    Sep 12 07:39:41 pve2 kernel: [ 3385.014666] INFO: task jbd2/rbd0-8:4096 blocked for more than 120 seconds.
    Sep 12 07:39:41 pve2 kernel: [ 3385.014698] "echo 0 >...
  13. R

    slow migrations

    Hello, I added this to the /etc/pve/pve-local crontab and got a hit from a standalone pve system we use for off-site backups.
    cron:
    55 */4 * * * root dmesg -T | grep hung
    email from cron:
    [Sun Sep 12 09:45:13 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this...
  14. R

    slow migrations

    Hello Spirit, as far as I know (and I will recheck), we use VRR and do NOT have BGP and ECMP set up. Would you suggest we just use VRRP? My understanding of networking is not great, and I could use advice on which way to set things up. We have 5 pve hosts, 1 pbs host, pfsense and 4 netgear...
  15. R

    slow migrations

    - Yes, they are used as routers, per the following LLDP output in pfsense:
    -------------------------------------------------------------------------------
    LLDP neighbors:
    -------------------------------------------------------------------------------
    Interface: em0, via: LLDP, RID: 1, Time: 0...
  16. R

    slow migrations

    However, it is likely that I had or have something misconfigured in the software settings, or there is a bug at the switch, etc. Operator errors usually occur more often than bugs.
  17. R

    slow migrations

    - We use connect-x4 and -x5.
    - The mlag switches are mellanox 2500sn running cumulus. As for the loop, I am certain the cables go from the connect-x* cards to the switch. Cumulus has a command that shows the lldp of each connection. I ran this on both switches to verify that the port...
  18. R

    slow migrations

    So in our case, we think the network was the cause. I will leave the thread open, as someone else posted their issue, and will wait a few days to make sure there is no repeat.
  19. R

    slow migrations

    Also, we get emails when ceph -s shows warnings, and saw these:
    cluster:
      id: 220b9a53-4556-48e3-a73c-28deff665e45
      health: HEALTH_WARN
        1 slow ops, oldest one blocked for 82322 sec, mon.pve4 has slow ops
    services:
      mon: 3 daemons, quorum pve15,pve11,pve4 (age 22h)...
  20. R

    slow migrations

    1- I am fairly sure it is related to this, seen in dmesg on the pve hosts:
    # dmesg|grep hung
    [Sun Sep 12 07:39:32 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [Sun Sep 12 07:39:32 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [Sun...
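
Several of the "slow migrations" snippets above check for hung tasks with dmesg | grep hung, which only matches the "hung_task_timeout_secs" hint line and not the task that was blocked. As a rough sketch only, the same check could be done in Python so that the task name, PID and timeout are pulled out of the "blocked for more than" line. The names find_hung_tasks and HUNG_RE are made up for this illustration, and the pattern is an assumption based on the single kernel line quoted in result 12; other kernels may word the message differently.

```python
import re

# Assumed format, taken from the one kernel line quoted in the thread:
#   INFO: task jbd2/rbd0-8:4096 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task, pid, seconds) for each hung-task report found."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            hits.append(
                (match.group("task"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


# Sample input built from two lines quoted in the snippets above.
sample = (
    "Sep 12 07:39:41 pve2 kernel: [ 3385.014666] INFO: task jbd2/rbd0-8:4096 "
    "blocked for more than 120 seconds.\n"
    '[Sun Sep 12 07:39:32 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" '
    "disables this message.\n"
)

for task, pid, secs in find_hung_tasks(sample):
    print(f"task {task} (pid {pid}) blocked for more than {secs} s")
```

In practice the text would come from dmesg -T or the kernel log rather than a hard-coded sample; how it gets fed in (cron, a pipe, reading a file) is left open here.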