[SOLVED] pvescheduler defunct after HA replication

j5boot

Hello,
I have two nodes in a cluster running the latest PVE version (7.4-3), and since the last update the pvescheduler service has stopped working properly.
The service starts, but after it runs the first scheduled disk-replication task to the second node, the worker process is left defunct (zombie).

Code:
pve02 ~ # ps aux | grep pvescheduler
root     1309422  0.0  0.0 336696 109292 ?       Ss   08:36   0:00 pvescheduler
root     1309423  0.0  0.0      0     0 ?        Z    08:36   0:00 [pvescheduler] <defunct>
root     1313066  0.0  0.0      0     0 ?        Z    08:37   0:00 [pvescheduler] <defunct>

This happens on both nodes. I have already tried reinstalling the pve-manager package and rebooting both nodes, without success.
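
For reference, this is roughly what I ran on each node (from memory, so the exact invocation may have differed slightly):

Code:
pve02 ~ # apt install --reinstall pve-manager
pve02 ~ # systemctl restart pvescheduler.service
pve02 ~ # reboot

An strace of the main pvescheduler process just shows it sleeping in its 60-second loop: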

Code:
pve02 ~ # strace -yyttT -f -s 512 -p 1309422
strace: Process 1309422 attached
10:28:48.551399 restart_syscall(<... resuming interrupted read ...>) = 0 <11.599147>
10:29:00.150793 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=60, tv_nsec=0}, 0x7ffe7fcf8f90) = 0 <60.000169>
10:30:00.151229 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=60, tv_nsec=0}, 0x7ffe7fcf8f90) = 0 <60.000191>
10:31:00.151667 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=60, tv_nsec=0}, 0x7ffe7fcf8f90) = 0 <60.000171>
10:32:00.152079 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=60, tv_nsec=0}, 0x7ffe7fcf8f90) = 0 <60.000128>
10:33:00.152368 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=60, tv_nsec=0}, 0x7ffe7fcf8f90) = 0 <60.000178>
10:34:00.152781 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=60, tv_nsec=0},
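
In case more logs are needed, I can pull the service status and journal with something like this (happy to post the output if useful):

Code:
pve02 ~ # systemctl status pvescheduler.service
pve02 ~ # journalctl -u pvescheduler.service --since today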

More info:

Code:
pve02 ~ # pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u2.1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Code:
pve01 ~ # pvecm status
Cluster information
-------------------
Name:             PVECluster
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Mar 27 10:29:43 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.c7
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.101.111
0x00000002          1 192.168.101.112 (local)

The replication jobs stay in the 'pending' state, but if I run them manually with pvesr run they complete fine (see below).

Code:
pve02 ~ # pvesr status
JobID      Enabled    Target                           LastSync             NextSync   Duration  FailCount State
1002-0     Yes        local/pve01           2023-03-26_16:08:29              pending  11.099929          0 OK
2002-0     Yes        local/pve01           2023-03-27_08:36:53              pending  138.37906          0 OK
2102-0     Yes        local/pve01           2023-03-26_16:33:45              pending    6.50503          0 OK
2104-0     Yes        local/pve01           2023-03-26_16:10:57  2023-03-27_10:40:00   7.043028          0 OK
2106-0     Yes        local/pve01           2023-03-26_16:12:34  2023-03-27_11:00:00    7.13516          0 OK
2108-0     Yes        local/pve01           2023-03-26_16:15:32  2023-03-27_11:20:00    7.57505          0 OK
2112-0     Yes        local/pve01           2023-03-26_16:20:49  2023-03-27_12:00:00   7.913801          0 OK

Code:
pve02 ~ # pvesr run --id 2102-0 --verbose  
start replication job
guest => VM 2102, running => 180321
volumes => DiscoSSD:vm-2102-disk-0
create snapshot '__replicate_2102-0_1679905976__' on DiscoSSD:vm-2102-disk-0
using secure transmission, rate limit: none
incremental sync 'DiscoSSD:vm-2102-disk-0' (__replicate_2102-0_1679841225__ => __replicate_2102-0_1679905976__)
send from @__replicate_2102-0_1679841225__ to DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__ estimated size is 655M
total estimated size is 655M
TIME        SENT   SNAPSHOT DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
10:33:01   37.6M   DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
10:33:02    136M   DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
10:33:03    230M   DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
10:33:04    318M   DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
10:33:05    407M   DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
10:33:06    499M   DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
10:33:07    578M   DiscoSSD/vm-2102-disk-0@__replicate_2102-0_1679905976__
successfully imported 'DiscoSSD:vm-2102-disk-0'
delete previous replication snapshot '__replicate_2102-0_1679841225__' on DiscoSSD:vm-2102-disk-0
(remote_finalize_local_job) delete stale replication snapshot '__replicate_2102-0_1679841225__' on DiscoSSD:vm-2102-disk-0
end replication job
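
In the meantime, the only workaround I have found is to restart the scheduler service by hand when the defunct children appear, which clears them for a while but is obviously not a real fix:

Code:
pve02 ~ # systemctl restart pvescheduler.service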

Any ideas? Is this a bug?
 
