PVE 8.4 pvescheduler cfs-lock timeout on all cluster nodes

jjooeeyy

Jun 3, 2025
Every node in my cluster shows the same issue below. Are there any steps I can take to find the root cause and fix it? Thanks.
Code:
root@pvenode-151:~# systemctl status pvescheduler.service
● pvescheduler.service - Proxmox VE scheduler
     Loaded: loaded (/lib/systemd/system/pvescheduler.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-06-02 16:45:45 CST; 18h ago
    Process: 74314 ExecStart=/usr/bin/pvescheduler start (code=exited, status=0/SUCCESS)
   Main PID: 74315 (pvescheduler)
      Tasks: 1 (limit: 154012)
     Memory: 116.0M
        CPU: 16.302s
     CGroup: /system.slice/pvescheduler.service
             └─74315 pvescheduler

Jun 03 09:55:09 pvenode-151 pvescheduler[735086]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jun 03 10:09:10 pvenode-151 pvescheduler[758102]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jun 03 10:11:09 pvenode-151 pvescheduler[761412]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jun 03 10:11:09 pvenode-151 pvescheduler[761410]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Jun 03 10:13:10 pvenode-151 pvescheduler[764706]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Jun 03 10:18:09 pvenode-151 pvescheduler[772946]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jun 03 10:20:09 pvenode-151 pvescheduler[776240]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jun 03 10:34:09 pvenode-151 pvescheduler[799228]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Jun 03 10:35:09 pvenode-151 pvescheduler[800859]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jun 03 10:48:09 pvenode-151 pvescheduler[822257]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
 
Hi,
what does pvecm status on each node say? Do you see any other errors/warnings in the system logs/journal?
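For reference, both of those can be gathered in one go on each node; a minimal sketch (adjust the `--since` window to taste):

```shell
# Cluster quorum and membership as this node sees it:
pvecm status
# Recent corosync / pmxcfs messages, filtered down to likely problems:
journalctl -u corosync -u pve-cluster --since yesterday --no-pager \
  | grep -Ei 'error|warn|fail|retransmit|link'
```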
 
It seems normal on each node:

Code:
Cluster information
-------------------
Name:             SeOOOOOOOOCluster
Config Version:   172
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jun  3 20:22:53 2025
Quorum provider:  corosync_votequorum
Nodes:            20
Node ID:          0x00000001
Ring ID:          1.12000
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   20
Highest expected: 20
Total votes:      20
Quorum:           11 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 OOO.OOO.OOO.151 (local)
0x00000002          1 OOO.OOO.OOO.152
0x00000003          1 OOO.OOO.OOO.153
0x00000004          1 OOO.OOO.OOO.154
0x00000005          1 OOO.OOO.OOO.155
0x00000006          1 OOO.OOO.OOO.156
0x00000007          1 OOO.OOO.OOO.157
0x00000008          1 OOO.OOO.OOO.158
0x00000009          1 OOO.OOO.OOO.159
0x0000000a          1 OOO.OOO.OOO.161
0x0000000b          1 OOO.OOO.OOO.160
0x0000000c          1 OOO.OOO.OOO.162
0x0000000d          1 OOO.OOO.OOO.163
0x0000000e          1 OOO.OOO.OOO.164
0x0000000f          1 OOO.OOO.OOO.165
0x00000010          1 OOO.OOO.OOO.166
0x00000011          1 OOO.OOO.OOO.167
0x00000012          1 OOO.OOO.OOO.168
0x00000013          1 OOO.OOO.OOO.169
0x00000014          1 OOO.OOO.OOO.170

There are no other errors, but there are some link up/down messages in the log on each node, such as:

Code:
Jun 03 17:50:25 pvenode-151 corosync[20556]:   [KNET  ] host: host: 7 (passive) best link: 0 (pri: 1)
Jun 03 17:50:25 pvenode-151 corosync[20556]:   [KNET  ] host: host: 7 has no active links
Jun 03 17:50:26 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 7 joined
Jun 03 17:50:26 pvenode-151 corosync[20556]:   [KNET  ] host: host: 7 (passive) best link: 0 (pri: 1)
Jun 03 17:50:26 pvenode-151 corosync[20556]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Jun 03 17:53:13 pvenode-151 pvescheduler[1460030]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Jun 03 17:53:25 pvenode-151 corosync[20556]:   [KNET  ] link: host: 11 link: 0 is down
Jun 03 17:53:25 pvenode-151 corosync[20556]:   [KNET  ] link: host: 4 link: 0 is down
Jun 03 17:53:25 pvenode-151 corosync[20556]:   [KNET  ] host: host: 11 (passive) best link: 0 (pri: 1)
Jun 03 17:53:25 pvenode-151 corosync[20556]:   [KNET  ] host: host: 11 has no active links
Jun 03 17:53:25 pvenode-151 corosync[20556]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Jun 03 17:53:25 pvenode-151 corosync[20556]:   [KNET  ] host: host: 4 has no active links
Jun 03 17:53:29 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 4 joined
Jun 03 17:53:29 pvenode-151 corosync[20556]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Jun 03 17:53:29 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 11 joined
Jun 03 17:53:29 pvenode-151 corosync[20556]:   [KNET  ] host: host: 11 (passive) best link: 0 (pri: 1)
Jun 03 17:53:29 pvenode-151 corosync[20556]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Jun 03 17:56:12 pvenode-151 pvescheduler[1465014]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jun 03 17:57:12 pvenode-151 pmxcfs[20560]: [dcdb] notice: data verification successful
Jun 03 17:59:21 pvenode-151 corosync[20556]:   [KNET  ] link: host: 16 link: 0 is down
Jun 03 17:59:21 pvenode-151 corosync[20556]:   [KNET  ] link: host: 9 link: 0 is down
Jun 03 17:59:21 pvenode-151 corosync[20556]:   [KNET  ] host: host: 16 (passive) best link: 0 (pri: 1)
Jun 03 17:59:21 pvenode-151 corosync[20556]:   [KNET  ] host: host: 16 has no active links
Jun 03 17:59:21 pvenode-151 corosync[20556]:   [KNET  ] host: host: 9 (passive) best link: 0 (pri: 1)
Jun 03 17:59:21 pvenode-151 corosync[20556]:   [KNET  ] host: host: 9 has no active links
Jun 03 17:59:23 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 16 joined
Jun 03 17:59:23 pvenode-151 corosync[20556]:   [KNET  ] host: host: 16 (passive) best link: 0 (pri: 1)
Jun 03 17:59:23 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 9 joined
Jun 03 17:59:23 pvenode-151 corosync[20556]:   [KNET  ] host: host: 9 (passive) best link: 0 (pri: 1)
Jun 03 17:59:24 pvenode-151 corosync[20556]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Jun 03 18:00:25 pvenode-151 pvedaemon[20630]: <root@pam> successful auth for user 'root@pam'
Jun 03 18:01:55 pvenode-151 pveproxy[1424709]: Clearing outdated entries from certificate cache
Jun 03 18:04:51 pvenode-151 pmxcfs[20560]: [status] notice: received log
Jun 03 18:05:10 pvenode-151 pvescheduler[1479988]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] link: host: 19 link: 0 is down
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] link: host: 15 link: 0 is down
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] link: host: 11 link: 0 is down
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] host: host: 19 (passive) best link: 0 (pri: 1)
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] host: host: 19 has no active links
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] host: host: 15 (passive) best link: 0 (pri: 1)
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] host: host: 15 has no active links
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] host: host: 11 (passive) best link: 0 (pri: 1)
Jun 03 18:09:22 pvenode-151 corosync[20556]:   [KNET  ] host: host: 11 has no active links
Jun 03 18:09:26 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 19 joined
Jun 03 18:09:26 pvenode-151 corosync[20556]:   [KNET  ] host: host: 19 (passive) best link: 0 (pri: 1)
Jun 03 18:09:26 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 15 joined
Jun 03 18:09:26 pvenode-151 corosync[20556]:   [KNET  ] host: host: 15 (passive) best link: 0 (pri: 1)
Jun 03 18:09:26 pvenode-151 corosync[20556]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Jun 03 18:09:28 pvenode-151 corosync[20556]:   [KNET  ] link: Resetting MTU for link 0 because host 11 joined
Jun 03 18:09:28 pvenode-151 corosync[20556]:   [KNET  ] host: host: 11 (passive) best link: 0 (pri: 1)
Jun 03 18:09:28 pvenode-151 corosync[20556]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Jun 03 18:13:15 pvenode-151 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_09] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 73 to 72
Jun 03 18:13:15 pvenode-151 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_11] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 76 to 75
Jun 03 18:13:15 pvenode-151 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_13] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 76 to 74
Jun 03 18:13:21 pvenode-151 pveproxy[1443188]: Clearing outdated entries from certificate cache
Jun 03 18:16:11 pvenode-151 pvescheduler[1498298]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
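The cfs-lock timeouts here line up with the KNET link flaps, so it is worth seeing which peers drop most often. A small pipeline sketch (it runs over three sample lines copied from the log above so it stays self-contained; on a node you would feed it `journalctl -u corosync --since today` instead):

```shell
# Count KNET link-down events per peer host, most frequent first.
# Stand-in input: three sample lines; on a real node, pipe the journal in.
printf '%s\n' \
  '[KNET  ] link: host: 11 link: 0 is down' \
  '[KNET  ] link: host: 4 link: 0 is down' \
  '[KNET  ] link: host: 11 link: 0 is down' \
  | grep -oP 'host: \K[0-9]+(?= link: 0 is down)' \
  | sort | uniq -c | sort -rn
# -> host 11 dropped twice, host 4 once (one "count host" pair per line)
```

A host that dominates this count points at its cabling, switch port, or NIC rather than a cluster-wide problem.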
 
What is the output of ps faxl | grep -C 2 pvescheduler? What about systemctl status pve-cluster.service?
 
Here is the output of
Code:
ps faxl | grep -C 2 pvescheduler
from some of the nodes, and the output of
Code:
systemctl status pve-cluster.service
from a single node (the result is the same on all nodes).

Code:
pvenode-151

4     0 3922520   47052  20   0  18000 10972 do_sys Ss   ?          0:00  \_ sshd: root@pts/0
4     0 3922808 3922520  20   0   9680  5120 do_wai Ss   pts/0      0:00  |   \_ -bash
0     0 3959144 3922808  20   0  14188  7680 do_sys S+   pts/0      0:00  |       \_ ssh OOO.OOO.11.151 hostname; ps faxl | grep -C 2 pvescheduler
4     0 3959145   47052  20   0  18012 10752 do_sys Ss   ?          0:00  \_ sshd: root@notty
4     0 3959158 3959145  20   0   6936  2560 do_wai Ss   ?          0:00      \_ bash -c hostname; ps faxl | grep -C 2 pvescheduler
4     0 3959160 3959158  20   0  11352  4608 -      R    ?          0:00          \_ ps faxl
0     0 3959161 3959158  20   0   6336  2048 pipe_r S    ?          0:00          \_ grep -C 2 pvescheduler
5   100   70012       1  20   0  18884  3072 do_sel S    ?          0:00 /usr/sbin/chronyd -F 1
1   100   70013   70012  20   0  10556  2560 skb_wa S    ?          0:00  \_ /usr/sbin/chronyd -F 1
1     0   74315       1  20   0 225272 121164 hrtime Ss  ?          0:12 pvescheduler
5     0 2068254       1  20   0  79192  2048 do_sys Ssl  ?          0:04 /usr/sbin/pvefw-logger

7     0 3185803       1  20   0 68574612 2051504 do_sys Sl ?        3:42 /usr/bin/kvm -id 60151 -name OHotNode151,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/60151.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/60151.pid -daemonize -smbios type=1,uuid=38fe9901-07c1-4fb9-ba8b-34fd9a79b7bc -drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd -drive if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/pve/vm-60151-disk-0,size=540672 -smp 12,sockets=2,cores=6,maxcpus=12 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/60151.vnc,password=on -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 65535 -object iothread,id=iothread-virtioscsi0 -object iothread,id=iothread-virtioscsi1 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=bc32e2ba-45ee-4556-b388-d41599c33f93 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:32cf02eba67 -drive if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/dev/O/vm-60151-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -device 
virtio-scsi-pci,id=virtioscsi1,bus=pci.3,addr=0x2,iothread=iothread-virtioscsi1 -drive file=/dev/O/vm-60151-disk-1,if=none,id=drive-scsi1,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1 -netdev type=tap,id=net0,ifname=tap60151i0,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=BC:24:11:86:12:CB,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256 -machine type=pc+pve1

pvenode-152

4     0    1631       1  20   0  15440  8704 do_sys Ss   ?          0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
4     0 2452334    1631  20   0  18012 10752 do_sys Ss   ?          0:00  \_ sshd: root@notty
4     0 2452340 2452334  20   0   6936  3072 do_wai Ss   ?          0:00      \_ bash -c hostname; ps faxl | grep -C 2 pvescheduler
4     0 2452342 2452340  20   0  11352  4608 -      R    ?          0:00          \_ ps faxl
0     0 2452343 2452340  20   0   6336  2048 pipe_r S    ?          0:00          \_ grep -C 2 pvescheduler
5   100    1633       1  20   0  18884  3076 do_sel S    ?          0:00 /usr/sbin/chronyd -F 1
1   100    1652    1633  20   0  10556  2560 skb_wa S    ?          0:00  \_ /usr/sbin/chronyd -F 1
--
4     0 2537880       1 -100  - 609736 214452 do_epo SLsl ?        23:53 /usr/sbin/corosync -f
5     0 2537883       1  20   0 826920 47536 futex_ Ssl  ?          5:38 /usr/bin/pmxcfs
1     0 2703616       1  20   0 225372 121168 hrtime Ss  ?          0:17 pvescheduler
5     0  891608       1  20   0  79192  2048 do_sys Ssl  ?          0:04 /usr/sbin/pvefw-logger

7     0 1943465       1  20   0 35132072 1726560 do_sys Sl ?        2:53 /usr/bin/kvm -id 61152 -name OWarmNode152,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/61152.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/61152.pid -daemonize -smbios type=1,uuid=cd3a3a35-c003-4482-ad0a-69e8e09a2a95 -drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd -drive if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/pve/vm-61152-disk-0,size=540672 -smp 12,sockets=2,cores=6,maxcpus=12 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/61152.vnc,password=on -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 32767 -object iothread,id=iothread-virtioscsi0 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=6006fed4-162f-4005-a7b7-67f44dc3942c -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:811436395337 -drive if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/mnt/pve/S34_O/images/61152/vm-61152-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev 
type=tap,id=net0,ifname=tap61152i0,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=BC:24:11:AA:E0:C8,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256 -machine type=pc+pve1

pvenode-153

4     0    1655       1  20   0  15440  9216 do_sys Ss   ?          0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
4     0 3116439    1655  20   0  18012 10240 do_sys Ss   ?          0:00  \_ sshd: root@notty
4     0 3116449 3116439  20   0   6936  3072 do_wai Ss   ?          0:00      \_ bash -c hostname; ps faxl | grep -C 2 pvescheduler
4     0 3116451 3116449  20   0  11352  4096 -      R    ?          0:00          \_ ps faxl
0     0 3116452 3116449  20   0   6336  2048 pipe_r S    ?          0:00          \_ grep -C 2 pvescheduler
5   100    1660       1  20   0  18884  3076 do_sel S    ?          0:00 /usr/sbin/chronyd -F 1
1   100    1677    1660  20   0  10556  2560 skb_wa S    ?          0:00  \_ /usr/sbin/chronyd -F 1
--
4     0 2792128       1 -100  - 609608 213824 do_epo SLsl ?        24:37 /usr/sbin/corosync -f
5     0 2792131       1  20   0 679320 53944 futex_ Ssl  ?          5:45 /usr/bin/pmxcfs
1     0 2951196       1  20   0 225396 121212 hrtime Ss  ?          0:17 pvescheduler
5     0 1473409       1  20   0  79192  2048 do_sys Ssl  ?          0:04 /usr/sbin/pvefw-logger
4     0 3116275       1  20   0  19124  9216 do_epo Ss   ?          0:00 /lib/systemd/systemd --user

pvenode-154

4     0    1669       1  20   0  15440  6144 do_sys Ss   ?          0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
4     0 2642819    1669  20   0  18012 10752 do_sys Ss   ?          0:00  \_ sshd: root@notty
4     0 2642831 2642819  20   0   6936  2560 do_wai Ss   ?          0:00      \_ bash -c hostname; ps faxl | grep -C 2 pvescheduler
4     0 2642833 2642831  20   0  11352  4608 -      R    ?          0:00          \_ ps faxl
0     0 2642834 2642831  20   0   6336  2048 pipe_r S    ?          0:00          \_ grep -C 2 pvescheduler
5   100    1694       1  20   0  18884  2564 do_sel S    ?          0:00 /usr/sbin/chronyd -F 1
1   100    1710    1694  20   0  10556  2048 skb_wa S    ?          0:00  \_ /usr/sbin/chronyd -F 1
--
4     0  581434       1 -100  - 609672 215012 do_epo SLsl ?        21:39 /usr/sbin/corosync -f
5     0  581437       1  20   0 675204 54848 futex_ Ssl  ?          4:52 /usr/bin/pmxcfs
1     0  625636       1  20   0 225300 44872 hrtime Ss   ?          0:12 pvescheduler
5     0 1724161       1  20   0  79192  4096 do_sys Ssl  ?          0:02 /usr/sbin/pvefw-logger

7     0 2494098       1  20   0 36785912 33328684 do_sys Sl ?     163:02 /usr/bin/kvm -id 61154 -name OWarmNode154,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/61154.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect-ms=5000 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/61154.pid -daemonize -smbios type=1,uuid=2ab7cfb7-d128-4f67-ab9b-c1de31154721 -drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd -drive if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/pve/vm-61154-disk-0,size=540672 -smp 12,sockets=2,cores=6,maxcpus=12 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/61154.vnc,password=on -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 32767 -object iothread,id=iothread-virtioscsi0 -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=3b821077-53b5-4a35-9703-cafbf3b52a1f -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:a7d91a4e9165 -drive if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/mnt/pve/S34_O/images/61154/vm-61154-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev 
type=tap,id=net0,ifname=tap61154i0,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=BC:24:11:AB:A1:71,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256 -machine type=pc+pve1



Code:
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
     Active: active (running) since Mon 2025-06-02 14:57:11 CST; 2 days ago
   Main PID: 20560 (pmxcfs)
      Tasks: 9 (limit: 154012)
     Memory: 64.5M
        CPU: 6min 35.386s
     CGroup: /system.slice/pve-cluster.service
             └─20560 /usr/bin/pmxcfs

Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/local: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/F34_HDD_VMStorage: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/local-lvm: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/S34_O: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/S34_PVE_ISOs: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/O: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/FS82_vol_sas_600_10k: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/FS82_vol_sata_2t_7k2: -1
Jun 04 20:23:42 pvenode-151 pmxcfs[20560]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/pvenode-168/FS82_vol_sata_2t_7k2: /var/lib/rrdcached/db/pve2-storage/pvenode-168/FS82_vol_sata_2t_7k2: illegal attempt to update using time 1749039821 when last update time is 1749039821 (minimum one second step)
Jun 04 20:24:53 pvenode-151 pmxcfs[20560]: [status] notice: received log
 
All my nodes have two 1GbE interfaces aggregated into one bond0.
bond0 peaks at around 40-60% throughput, and each 1GbE interface peaks at around 60-80%.
Is it okay to run corosync over this shared interface, or should I set up a dedicated network interface for corosync instead of the shared bond0?
 
What's important is the latency.
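Corosync can report link health and measured latency directly; a command sketch to run on each node (the exact stats key names vary by corosync version, so treat them as an assumption):

```shell
# Local view of each knet link (connected/down, per peer node):
corosync-cfgtool -s
# Runtime knet statistics; latency_ave values are in microseconds:
corosync-cmapctl -m stats | grep -i latency
```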
 
My odd and even nodes are located in two different server rooms connected by 20 Gbps dark fiber, with latency between 10-20 ms.
Is there any way to increase the tolerable latency in the corosync config?
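Corosync does expose this: the totem `token` timeout in /etc/pve/corosync.conf can be raised (remember to also bump `config_version` so the change propagates), and with a nodelist corosync already scales the effective timeout with cluster size. A sketch of that arithmetic, assuming the corosync 3.x defaults of token=1000 ms and token_coefficient=650 ms:

```shell
# corosync.conf(5): runtime token = token + (nodes - 2) * token_coefficient
token=1000; token_coefficient=650; nodes=20
echo "$(( token + (nodes - 2) * token_coefficient )) ms"
# -> 12700 ms effective token timeout for this 20-node cluster
```

Note that raising `token` only makes the cluster wait longer before declaring a node dead; it does not make pmxcfs locking any happier with a 10-20 ms inter-site link, so a dedicated low-latency corosync network is still the preferred fix.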