8.1 Update Hung/Failed

Quorum: 3 Activity blocked
Well, your case is different: You have a cluster and this node is not part of the quorum, therefore it is in read-only mode. You will have to fix your corosync network for this node first, then the rest should work as expected.
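For reference, a quick way to check whether the node is quorate and whether /etc/pve is still writable (just a sketch; the test file name is arbitrary):

Code:
pvecm status | grep -E 'Quorate|Flags'
# /etc/pve goes read-only without quorum; a write attempt shows it
touch /etc/pve/.rw-test && rm /etc/pve/.rw-test && echo writable || echo read-only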
 
Hi, I'm also able to reproduce this on one node.

The postinst script is hanging:

/bin/sh /var/lib/dpkg/info/pve-manager.postinst configure 8.1.3


Edit:

It's hanging here, at the restart of pvescheduler.service:

Code:
    if test ! -e /proxmox_install_mode; then
        # modeled after code generated by dh_start
        for unit in ${UNITS}; do
            if test -n "$2"; then
                dh_action="reload-or-restart";
            else
                dh_action="start"
            fi
            if systemctl -q is-enabled "$unit"; then
                deb-systemd-invoke $dh_action "$unit"
            fi
        done
    fi


Code:
systemctl status pvescheduler.service
○ pvescheduler.service - Proxmox VE scheduler
     Loaded: loaded (/lib/systemd/system/pvescheduler.service; enabled; preset: enabled)
     Active: inactive (dead)

Code:
 /usr/bin/pvescheduler start

---> hang
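To confirm that it is really the deb-systemd-invoke call for pvescheduler.service that blocks, one could trace the maintainer script by hand (a sketch; re-running a postinst manually is for debugging only, and the version argument is the one from above):

Code:
# the trace goes to a file; after interrupting the hang, its last lines
# show the command that was blocking
sh -x /var/lib/dpkg/info/pve-manager.postinst configure 8.1.3 2>/tmp/postinst.trace
# (Ctrl+C once it hangs)
tail -n 20 /tmp/postinst.trace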
 
Hi @spirit,
so you are saying that although the second parameter given to the postinst is the pve-manager version, and therefore the dh_action reload-or-restart should be executed according to the logic in the script, the dh_action start is triggered instead? Do you see this in the output of ps auxwf?

Edit: For me the reload-or-restart action is triggered when providing the second parameter, as expected.

P.S. Is this node part of the quorate corosync network segment?
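For example, filtering the relevant branch out of the process tree should show whether start or reload-or-restart was invoked (a minimal sketch):

Code:
ps auxwf | grep -E 'pve-manager\.postinst|deb-systemd-invoke|systemctl (start|reload-or-restart)'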
 
Yes, quorum is fine, /etc/pve is writable:
Code:
Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate


Code:
ps -aux:
  systemctl reload-or-restart pvescheduler.service

But currently pvescheduler is dead and not running:

Code:
○ pvescheduler.service - Proxmox VE scheduler
     Loaded: loaded (/lib/systemd/system/pvescheduler.service; enabled; preset: enabled)
     Active: inactive (dead)


pvescheduler is not starting; it hangs completely.

I'm currently debugging it:

Code:
/usr/share/perl5/PVE/Service/pvescheduler.pm

    for (my $count = 1000;;$count++) {
        return if $self->{got_hup_signal}; # keep workers running, PVE::Daemon re-execs us on return
        last if $self->{shutdown_request}; # exit main-run loop for shutdown

        $run_jobs->();
--->hang here



    my $run_jobs = sub {
        # TODO: actually integrate replication in PVE::Jobs and do not always fork here, we could
        # do the state lookup and check if there's new work scheduled before doing so, e.g., by
        # extending the PVE::Jobs interfacae e.g.;
        # my $scheduled_jobs = PVE::Jobs::get_pending() or return;
        # forked { PVE::Jobs::run_jobs($scheduled_jobs) }
        $fork->('replication', sub {
            PVE::API2::Replication::run_jobs(undef, sub {}, 0, 1);
        });

        $fork->('jobs', sub {
            PVE::Jobs::run_jobs($first_run);
        });
----> hang here
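One way to see where exactly the loop blocks is to attach strace to the already running, hung pvescheduler process (a sketch, assuming strace is installed):

Code:
# log syscalls of the hung pvescheduler (and its children) with timestamps
strace -ttf -p "$(pgrep -of pvescheduler)" -o /tmp/pvescheduler-attach.trace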
 
Can't help anymore, I got it working again after migrating the VM to another node.

I really don't know why it could impact the pvescheduler start.

I don't run any jobs (no backup job, no replication job); the jobs.cfg doesn't even exist.

Maybe it is system related, something with pids && fork, I really don't know.
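If a pid/fork problem on the host is suspected, a trivial check (just a sanity sketch, unrelated to the PVE code itself) can at least rule out that fork/waitpid is broken in general:

Code:
perl -e 'for (1..3) { my $pid = fork() // die "fork failed: $!"; if ($pid == 0) { exit 0 } waitpid($pid, 0); } print "fork/waitpid ok\n"'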
 
Too bad, thanks for your input anyway. Also, I am not sure you ran into the same issue as the others; the pvescheduler.service being dead is somewhat new. For future readers with the same issue, please include the output of systemctl status pvescheduler.service as well as the branch below it from the output of ps auxwf.
 
pvescheduler does not start; systemctl start just hangs indefinitely.
Code:
root@mosh:~# pvecm status
Cluster information
-------------------
Name:             proxcluster
Config Version:   4
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Dec  4 19:37:52 2023
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.150a
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      3
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.10.201 (local)
0x00000002          1 10.10.10.202
0x00000004          1 10.10.10.204
root@mosh:~# systemctl restart pvescheduler.service
^C
root@mosh:~# systemctl status pvescheduler.service
○ pvescheduler.service - Proxmox VE scheduler
     Loaded: loaded (/lib/systemd/system/pvescheduler.service; enabled; preset: enabled)
     Active: inactive (dead)
root@mosh:~#

ps while the restart hangs:
Code:
root@mosh:~# ps auxwf
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           2  0.0  0.0      0     0 ?        S    19:37   0:00 [kthreadd]
<...>
root         170  0.0  0.0      0     0 ?        I    19:37   0:00  \_ [kworker/u8:3-events_unbound]
root         171  0.0  0.0      0     0 ?        I    19:37   0:00  \_ [kworker/u8:4-events_unbound]
root         172  0.0  0.0      0     0 ?        I    19:37   0:00  \_ [kworker/u8:5-flush-252:1]
root         177  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [kdmflush/252:0]
root         178  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [kdmflush/252:1]
root         186  0.0  0.0      0     0 ?        I    19:37   0:00  \_ [kworker/3:2-cgroup_destroy]
root         191  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [dm_bufio_cache]
root         192  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [uas]
root         195  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [scsi_eh_6]
root         196  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [scsi_tmf_6]
root         199  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [kdmflush/252:2]
root         200  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [kdmflush/252:3]
root         216  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [kdmflush/252:4]
root         217  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [kcopyd]
root         218  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [dm-thin]
root         219  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [kdmflush/252:5]
root         231  0.0  0.0      0     0 ?        I    19:37   0:00  \_ [kworker/2:2-events]
root         259  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [jbd2/dm-1-8]
root         260  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [ext4-rsv-conver]
root         326  0.0  0.0      0     0 ?        S<   19:37   0:00  \_ [spl_system_task]
root         327  0.0  0.0      0     0 ?        S<   19:37   0:00  \_ [spl_delay_taskq]
root         328  0.0  0.0      0     0 ?        S<   19:37   0:00  \_ [spl_dynamic_tas]
root         329  0.0  0.0      0     0 ?        S<   19:37   0:00  \_ [spl_kmem_cache]
root         337  0.0  0.0      0     0 ?        S<   19:37   0:00  \_ [zvol]
root         338  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [arc_prune]
root         339  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [arc_evict]
root         340  0.0  0.0      0     0 ?        SN   19:37   0:00  \_ [arc_reap]
root         341  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [dbu_evict]
root         342  0.0  0.0      0     0 ?        SN   19:37   0:00  \_ [dbuf_evict]
root         395  0.0  0.0      0     0 ?        SN   19:37   0:00  \_ [z_vdev_file]
root         463  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [irq/133-mei_me]
root         505  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [cryptd]
root         506  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [ttm]
root         508  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [card0-crtc0]
root         509  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [card0-crtc1]
root         510  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [card0-crtc2]
root         511  0.0  0.0      0     0 ?        S    19:37   0:00  \_ [l2arc_feed]
root         617  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [rpciod]
root         621  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [xprtiod]
root         641  0.0  0.0      0     0 ?        I<   19:37   0:00  \_ [tls-strp]
root         800  0.0  0.0      0     0 ?        I    19:37   0:00  \_ [kworker/3:4-mm_percpu_wq]
root        1233  0.0  0.0   2460  1024 ?        S    19:37   0:00  \_ bpfilter_umh
root        2765  0.0  0.0      0     0 ?        I<   19:38   0:00  \_ [ceph-msgr]
root        2766  0.0  0.0      0     0 ?        I<   19:38   0:00  \_ [rbd]
root        2774  0.0  0.0      0     0 ?        I<   19:38   0:00  \_ [ceph-watch-noti]
root        2775  0.0  0.0      0     0 ?        I<   19:38   0:00  \_ [ceph-completion]
root        2776  0.0  0.0      0     0 ?        I<   19:38   0:00  \_ [rbd0-tasks]
root        3075  0.0  0.0      0     0 ?        I    19:42   0:00  \_ [kworker/1:0]
root           1  0.4  0.0 169712 13960 ?        Ss   19:37   0:01 /sbin/init
root         311  0.0  0.0  33220 12288 ?        Ss   19:37   0:00 /lib/systemd/systemd-journald
root         323  0.0  0.1  80580 24960 ?        SLsl 19:37   0:00 /sbin/dmeventd -f
root         333  0.0  0.0  27372  6980 ?        Ss   19:37   0:00 /lib/systemd/systemd-udevd
_rpc         585  0.0  0.0   7876  3968 ?        Ss   19:37   0:00 /sbin/rpcbind -f -w
ceph         593  0.0  0.0  20304 13056 ?        Ss   19:37   0:00 /usr/bin/python3 /usr/bin/ceph-crash
message+     594  0.0  0.0   9128  4480 ?        Ss   19:37   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         597  0.0  0.0 152748  2432 ?        Ssl  19:37   0:00 /usr/bin/lxcfs /var/lib/lxcfs
root         598  0.0  0.0 278156  3712 ?        Ssl  19:37   0:00 /usr/lib/x86_64-linux-gnu/pve-lxc-syscalld/pve-lxc-syscalld --system /run/pve/lxc-syscalld.sock
root         600  0.0  0.0  11908  6144 ?        Ss   19:37   0:00 /usr/sbin/smartd -n -q never
root         607  0.0  0.0   7064  1996 ?        S    19:37   0:00 /bin/bash /usr/sbin/ksmtuned
root        3060  0.0  0.0   5464  1664 ?        S    19:42   0:00  \_ sleep 60
root         608  0.0  0.0   5308  1280 ?        Ss   19:37   0:00 /usr/sbin/qmeventd /var/run/qmeventd.sock
root         611  0.0  0.0  25364  7808 ?        Ss   19:37   0:00 /lib/systemd/systemd-logind
root         614  0.0  0.0   2332  1280 ?        Ss   19:37   0:00 /usr/sbin/watchdog-mux
root         616  0.0  0.0 101388  5504 ?        Ssl  19:37   0:00 /usr/sbin/zed -F
root         740  0.0  0.0   5024  2304 ?        Ss   19:37   0:00 /usr/libexec/lxc/lxc-monitord --daemon
_chrony      783  0.0  0.0  18860  3104 ?        S    19:37   0:00 /usr/sbin/chronyd -F 1
_chrony      793  0.0  0.0  10532  2368 ?        S    19:37   0:00  \_ /usr/sbin/chronyd -F 1
root         798  0.0  0.0  15408  9344 ?        Ss   19:37   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root        1991  0.0  0.0  17972 11264 ?        Ss   19:37   0:00  \_ sshd: root@pts/0
root        2014  0.0  0.0   8100  4736 pts/0    Ss   19:37   0:00  |   \_ -bash
root        3076  0.0  0.0  11216  4736 pts/0    R+   19:42   0:00  |       \_ ps auxwf
root        2855  0.0  0.0  17972 11264 ?        Ss   19:38   0:00  \_ sshd: root@pts/1
root        2862  0.0  0.0   8100  4864 pts/1    Ss   19:38   0:00      \_ -bash
root        3067  0.0  0.0  16424  5760 pts/1    S+   19:42   0:00          \_ systemctl restart pvescheduler.service
root        3068  0.0  0.0  16296  6400 pts/1    S+   19:42   0:00              \_ /bin/systemd-tty-ask-password-agent --watch
root         817  0.0  0.0   5872  1792 tty1     Ss+  19:37   0:00 /sbin/agetty -o -p -- \u --noclear - linux
root         853  0.0  0.0 440632  3404 ?        Ssl  19:37   0:00 /usr/bin/rrdcached -B -b /var/lib/rrdcached/db/ -j /var/lib/rrdcached/journal/ -p /var/run/rrdcached.pid -l unix:/var/run/rrdcached.sock
root         869  0.0  0.2 540220 46360 ?        Ssl  19:37   0:00 /usr/bin/pmxcfs
root         982  0.0  0.0  42656  4500 ?        Ss   19:37   0:00 /usr/lib/postfix/sbin/master -w
postfix      985  0.0  0.0  43044  6784 ?        S    19:37   0:00  \_ pickup -l -t unix -u -c
postfix      986  0.0  0.0  43092  6784 ?        S    19:37   0:00  \_ qmgr -l -t unix -u
root         991  0.0  0.0  79184  2176 ?        Ssl  19:37   0:00 /usr/sbin/pvefw-logger
ceph         998  1.4  2.4 1295080 393856 ?      Ssl  19:37   0:04 /usr/bin/ceph-mgr -f --cluster ceph --id mosh --setuser ceph --setgroup ceph
ceph        1009  0.7  0.4 290104 66920 ?        Ssl  19:37   0:02 /usr/bin/ceph-mon -f --cluster ceph --id mosh --setuser ceph --setgroup ceph
root        1010  0.0  0.0   2576  1536 ?        Ss   19:37   0:00 /bin/sh -c timeout $CEPH_VOLUME_TIMEOUT /usr/sbin/ceph-volume-systemd lvm-0-5b983362-f095-40f1-bf83-92f4308ac840
root        1027  0.0  0.0   5472  1664 ?        S    19:37   0:00  \_ timeout 10000 /usr/sbin/ceph-volume-systemd lvm-0-5b983362-f095-40f1-bf83-92f4308ac840
root        1029  0.0  0.0  23100 15872 ?        S    19:37   0:00      \_ /usr/bin/python3 /usr/sbin/ceph-volume-systemd lvm-0-5b983362-f095-40f1-bf83-92f4308ac840
root        2895  0.0  0.1  33964 26880 ?        S    19:38   0:00          \_ /usr/bin/python3 /usr/sbin/ceph-volume lvm trigger 0-5b983362-f095-40f1-bf83-92f4308ac840
root        2901  0.0  0.0  26508  9600 ?        D    19:38   0:00              \_ lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=0,ceph.osd_fsid=5b983362-f095-40f1-bf83-92f4308ac840} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
root        1012  0.0  0.0   2576  1536 ?        Ss   19:37   0:00 /bin/sh -c timeout $CEPH_VOLUME_TIMEOUT /usr/sbin/ceph-volume-systemd lvm-4-1a2d901f-3121-4939-8630-a683775d8df8
root        1014  0.0  0.0   5472  1664 ?        S    19:37   0:00  \_ timeout 10000 /usr/sbin/ceph-volume-systemd lvm-4-1a2d901f-3121-4939-8630-a683775d8df8
root        1015  0.0  0.0  23100 16000 ?        S    19:37   0:00      \_ /usr/bin/python3 /usr/sbin/ceph-volume-systemd lvm-4-1a2d901f-3121-4939-8630-a683775d8df8
root        2893  0.0  0.1  33964 26752 ?        S    19:38   0:00          \_ /usr/bin/python3 /usr/sbin/ceph-volume lvm trigger 4-1a2d901f-3121-4939-8630-a683775d8df8
root        2897  0.0  0.0  26508  9472 ?        D    19:38   0:00              \_ lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=4,ceph.osd_fsid=1a2d901f-3121-4939-8630-a683775d8df8} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
root        1013  0.0  0.0   2576  1408 ?        Ss   19:37   0:00 /bin/sh -c timeout $CEPH_VOLUME_TIMEOUT /usr/sbin/ceph-volume-systemd lvm-4-67a21be9-f017-4886-92be-fcd5e0031f0d
root        1016  0.0  0.0   5472  1664 ?        S    19:37   0:00  \_ timeout 10000 /usr/sbin/ceph-volume-systemd lvm-4-67a21be9-f017-4886-92be-fcd5e0031f0d
root        1017  0.0  0.0  23100 15596 ?        S    19:37   0:00      \_ /usr/bin/python3 /usr/sbin/ceph-volume-systemd lvm-4-67a21be9-f017-4886-92be-fcd5e0031f0d
root        2894  0.0  0.1  33964 26880 ?        S    19:38   0:00          \_ /usr/bin/python3 /usr/sbin/ceph-volume lvm trigger 4-67a21be9-f017-4886-92be-fcd5e0031f0d
root        2899  0.0  0.0  26508  9600 ?        D    19:38   0:00              \_ lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=4,ceph.osd_fsid=67a21be9-f017-4886-92be-fcd5e0031f0d} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
root        1019  1.0  1.0 561512 168816 ?       SLsl 19:37   0:03 /usr/sbin/corosync -f
root        1020  0.0  0.0   6608  2560 ?        Ss   19:37   0:00 /usr/sbin/cron -f
root        1227  0.0  0.6 157320 98756 ?        Ss   19:37   0:00 pve-firewall
root        1228  0.0  0.5 152160 94848 ?        Ss   19:37   0:00 pvestatd
root        2133  0.0  0.0   5024  2816 ?        S    19:37   0:00  \_ lxc-info -n 102 -p
root        1316  0.0  0.8 233288 137264 ?       Ss   19:37   0:00 pvedaemon
root        1317  0.0  0.8 233552 138036 ?       S    19:37   0:00  \_ pvedaemon worker
root        1318  0.0  0.8 233552 137908 ?       S    19:37   0:00  \_ pvedaemon worker
root        1319  0.0  0.8 233552 138036 ?       S    19:37   0:00  \_ pvedaemon worker
ceph        1338  6.6  1.5 1079672 250880 ?      Ssl  19:37   0:20 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
root        1418  0.0  0.6 219280 112016 ?       Ss   19:37   0:00 pve-ha-crm
root        1995  0.0  0.0  19092 10752 ?        Ss   19:37   0:00 /lib/systemd/systemd --user
root        1996  0.0  0.0 170772  6804 ?        S    19:37   0:00  \_ (sd-pam)
www-data    2021  0.0  0.8 234548 138456 ?       Ss   19:37   0:00 pveproxy
www-data    2022  0.0  0.8 234824 142680 ?       S    19:37   0:00  \_ pveproxy worker
www-data    2023  0.0  0.8 234824 142680 ?       S    19:37   0:00  \_ pveproxy worker
www-data    2024  0.0  0.8 234824 142680 ?       S    19:37   0:00  \_ pveproxy worker
www-data    2029  0.0  0.3  80800 53384 ?        Ss   19:37   0:00 spiceproxy
www-data    2030  0.0  0.3  81008 54160 ?        S    19:37   0:00  \_ spiceproxy worker
root        2037  0.0  0.6 218860 111520 ?       Ss   19:37   0:00 pve-ha-lrm
root        2039  0.3  0.9 234328 157088 ?       Ss   19:37   0:00 /usr/bin/perl /usr/bin/pvesh --nooutput create /nodes/localhost/startall
root        2059  0.0  0.8 241560 132632 ?       Ss   19:37   0:00  \_ task UPID:mosh:0000080B:0000104A:656E1C7E:startall::root@pam:
root        2060  0.0  0.8 248816 132476 ?       Ss   19:37   0:00      \_ task UPID:mosh:0000080C:0000104D:656E1C7E:vzstart:102:root@pam:
root        2063  0.0  0.0   5024  3072 ?        Ss   19:37   0:00 /usr/bin/lxc-start -F -n 102
root        2064  0.1  0.6 132180 99832 ?        S    19:37   0:00  \_ /usr/bin/perl /usr/share/lxc/hooks/lxc-pve-prestart-hook 102 lxc pre-start
root        2741  0.0  0.2 778260 33908 ?        Sl   19:38   0:00      \_ /usr/bin/rbd -p RepSSDPool-01 -c /etc/pve/ceph.conf --auth_supported cephx -n client.admin --keyring /etc/pve/priv/ceph/ECHDDPool-03.keyring map vm-102-disk-0
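Side note: several of the lvs processes in the tree above are in D state (uninterruptible sleep). A quick way to list only those (a sketch):

Code:
ps -eo state,pid,ppid,cmd | awk '$1 ~ /^D/'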
 

With all nodes up:

Code:
root@mosh:~# pvecm status
Cluster information
-------------------
Name:             proxcluster
Config Version:   4
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Dec  4 21:38:29 2023
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1.152d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.10.10.201 (local)
0x00000002          1 10.10.10.202
0x00000003          1 10.10.10.203
0x00000004          1 10.10.10.204
root@mosh:~# systemctl status pvescheduler.service
○ pvescheduler.service - Proxmox VE scheduler
     Loaded: loaded (/lib/systemd/system/pvescheduler.service; enabled; preset: enabled)
     Active: inactive (dead)
root@mosh:~# dpkg-reconfigure pve-manager
root@mosh:~# systemctl status pvescheduler.service
○ pvescheduler.service - Proxmox VE scheduler
     Loaded: loaded (/lib/systemd/system/pvescheduler.service; enabled; preset: enabled)
     Active: inactive (dead)
root@mosh:~# systemctl start pvescheduler.service
^C
 
Hi,
please try to start the pvescheduler directly by calling /usr/bin/pvescheduler start from the cli. Does this hang as well?
If that is the case, please try to generate a syscall trace via strace by calling strace -ttyyf -s 512 -o /tmp/pvescheduler.trace /usr/bin/pvescheduler start and attach the generated pvescheduler.trace.
 
Attached strace file.
Thanks for the syscall traces. Can you please also share the output of fuser -v /var/run/pvescheduler.pid.lock, then run /usr/bin/pvescheduler stop and share the first output once again?
If the command states that the file is not present, then that is fine; otherwise remove it at this point.

After that, also please run a systemctl daemon-reload. Does this hang for you as well? If not, run /usr/bin/pvescheduler start once again, and check the output of fuser -v /var/run/pvescheduler.pid.lock once again.

Edit: Also, please post the content of /etc/pve/jobs.cfg and ls -l /etc/pve/priv/lock
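Roughly, the sequence of checks would look like this (a sketch of the commands above, not verbatim instructions):

Code:
fuser -v /var/run/pvescheduler.pid.lock    # who holds the lock, if the file exists
/usr/bin/pvescheduler stop
fuser -v /var/run/pvescheduler.pid.lock
systemctl daemon-reload
/usr/bin/pvescheduler start &              # backgrounded here, since it may hang
fuser -v /var/run/pvescheduler.pid.lock
cat /etc/pve/jobs.cfg
ls -l /etc/pve/priv/lock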
 
I'm facing the same issue on one of my nodes.
Code:
Last login: Wed Dec  6 18:21:35 2023
root@pve2:~# fuser -v /var/run/pvescheduler.pid.lock
Specified filename /var/run/pvescheduler.pid.lock does not exist.

root@pve2:~# /usr/bin/pvescheduler stop
root@pve2:~# fuser -v /var/run/pvescheduler.pid.lock
Specified filename /var/run/pvescheduler.pid.lock does not exist.

root@pve2:~# systemctl daemon-reload
root@pve2:~# fuser -v /var/run/pvescheduler.pid.lock
Specified filename /var/run/pvescheduler.pid.lock does not exist.

root@pve2:~# /usr/bin/pvescheduler status
stopped

root@pve2:~# service pvescheduler status
○ pvescheduler.service - Proxmox VE scheduler
     Loaded: loaded (/lib/systemd/system/pvescheduler.service; enabled; preset: en>
     Active: inactive (dead)
root@pve2:~#
root@pve2:~#

root@pve2:~# cat /etc/pve/jobs.cfg
vzdump: d4002806256179b179e84f7e2d0a5e71d52d7769:1
        schedule sat 01:00
        compress zstd
        enabled 1
        mailnotification failure
        mailto support@zim-service.ru
        mode snapshot
        quiet 1
        storage pve2_nfs_general
        vmid 301,302,303,304,401,402,404,403,244,248,250

root@pve2:~# ls -l /etc/pve/priv/lock
total 0
 
Thanks for your report. Just to double check and exclude that these are mixed issues: Is your node part of the quorate network segment (pvecm status)? Do you also see the hanging of systemctl start pvescheduler.service and get similar output using ps auxwf?

Edit: Also, please run /usr/bin/pvescheduler start before the fuser -v /var/run/pvescheduler.pid.lock. This would tell us if the scheduler daemon process gets started; in your case there is currently no process holding the lock.
 
The upgrade process gets stuck on setting up pve-manager 8.1.3. If I break it (CTRL+C) I see:

Code:
root@pve2:~# pvecm nodes
Cannot initialize CMAP service
root@pve2:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
     Active: active (running) since Wed 2023-12-06 18:18:11 MSK; 37min ago
   Main PID: 2005 (pmxcfs)
      Tasks: 6 (limit: 38363)
     Memory: 43.2M
        CPU: 779ms
     CGroup: /system.slice/pve-cluster.service
             └─2005 /usr/bin/pmxcfs

Dec 06 18:25:46 pve2 pmxcfs[2005]: [dcdb] crit: cpg_initialize failed: 2
Dec 06 18:25:46 pve2 pmxcfs[2005]: [status] crit: cpg_initialize failed: 2
Dec 06 18:25:52 pve2 pmxcfs[2005]: [quorum] crit: quorum_initialize failed: 2
Dec 06 18:25:52 pve2 pmxcfs[2005]: [confdb] crit: cmap_initialize failed: 2
Dec 06 18:25:52 pve2 pmxcfs[2005]: [dcdb] crit: cpg_initialize failed: 2
Dec 06 18:25:52 pve2 pmxcfs[2005]: [status] crit: cpg_initialize failed: 2
Dec 06 18:25:58 pve2 pmxcfs[2005]: [quorum] crit: quorum_initialize failed: 2
Dec 06 18:25:58 pve2 pmxcfs[2005]: [confdb] crit: cmap_initialize failed: 2
Dec 06 18:25:58 pve2 pmxcfs[2005]: [dcdb] crit: cpg_initialize failed: 2
Dec 06 18:25:58 pve2 pmxcfs[2005]: [status] crit: cpg_initialize failed: 2

root@pve2:~# systemctl start pvescheduler.service

gets stuck here
^C
 

Attachments

  • ps_auxwf.txt (22.2 KB)
Commenting out line 133 and removing pvescheduler.service from UNITS= on line 143 in /var/lib/dpkg/info/pve-manager.postinst made it so the upgrade no longer hung on Setting up pve-manager (8.1.3) ..., but I was still unable to restart pvescheduler.service. Once I rebooted, I was able to restart the service.

This helped
 
Dec 06 18:25:52 pve2 pmxcfs[2005]: [quorum] crit: quorum_initialize failed: 2
Well, these errors indicate that the node might have no quorum; please provide the requested output of pvecm status. Without quorum, the restart will fail because the service cannot acquire the locks on the pmxcfs.
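To verify the corosync side, something like the following could help (a sketch):

Code:
pvecm status                                         # quorum/membership as seen by this node
corosync-cfgtool -s                                  # link status of the local corosync rings
journalctl -u corosync -b --no-pager | tail -n 50    # recent corosync messages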
 
This helped
This simply results in not restarting the pvescheduler during upgrades; are you able to restart the service after the package configuration was successful? Please try a systemctl reload-or-restart pvescheduler.service
 
Well, I figured it out: somehow corosync.conf on this node became different (the versions differ) compared to the other nodes (I'm 100% sure the cluster was healthy and this node had been a member when I started the upgrade). After copying the corosync.conf file from another node and restarting pvedaemon, everything came back up.
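A quick way to spot such a mismatch is to compare the config_version (and the file as a whole) between the cluster-wide copy, the local copy, and the other nodes (a sketch; 'othernode' is a placeholder):

Code:
grep config_version /etc/pve/corosync.conf /etc/corosync/corosync.conf
ssh root@othernode cat /etc/corosync/corosync.conf | diff /etc/corosync/corosync.conf -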
 
This simply results in not restarting the pvescheduler during upgrades; are you able to restart the service after the package configuration was successful? Please try a systemctl reload-or-restart pvescheduler.service

This didn't help.
What I have done (roughly, see the sketch below):
1. Fixed the line in the postinst script (removed the pvescheduler restart), thanks to the advice from the previous page
2. Fixed the corosync.conf desynchronization (by copying the new version of corosync.conf to the buggy node)
3. Reconfigured via dpkg
4. Restarted the daemons, or rebooted
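A rough translation of steps 2-4 into commands (a sketch; 'goodnode' is a placeholder and the exact set of services to restart may differ):

Code:
# step 2: copy the up-to-date corosync.conf from a healthy node, then restart the cluster stack
scp root@goodnode:/etc/corosync/corosync.conf /etc/corosync/corosync.conf
systemctl restart corosync pve-cluster
# step 3: finish the interrupted package configuration
dpkg --configure -a
# step 4: restart the PVE daemons (or simply reboot)
systemctl restart pvedaemon pveproxy pvescheduler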
 

I have no pvescheduler.pid.lock file in /var/run.
systemctl daemon-reload works fine; afterwards /usr/bin/pvescheduler start still hangs.
There's no /etc/pve/jobs.cfg and the locks folder is empty.

I tried pvecm delnode but had no success. I am going to reinstall everything; this is too time consuming.
 
