Proxmox datacenter backup job repeatedly attempting to back up VMs that literally have never existed at any point in history

Maeve · May 15, 2023

Hey all, I'm at my wit's end.
A few days ago my node stopped backing up two of my containers. I had to update and reboot it to get them to cooperate. Pasted below is a list of all my VMs, CTs, and their IDs.

After I rebooted and verified that a single backup cooperated, I ran the backup job and it failed, saying it could not find VM 200. This makes total sense as... there was no VM 200. There had never been a VM 200. It's theoretically possible that I tried to create a VM or CT called 200 at some point but didn't finish it, but there is absolutely no actual VM 200.

It was fine for two days and then today's job did it again, with VM 152. There is no VM 152. When it was VM 200 I looked in the Backup Job scheduler and found nothing. This time it had an entry for VM 152 but it just said unknown. ???

I'm running a SMART check just to be sure, but like, what the hell? Does anyone have any idea why this is happening? Is it from VMs I went to create but didn't? Why would they show up in the backup job? Is it a corruption? If so, how is it doing this specifically??

Chris · May 15, 2023

Maeve said:
Hey all, I'm at my wit's end.
A few days ago my node stopped backing up two of my containers. I had to update and reboot it to get them to cooperate. Pasted below is a list of all my VMs, CTs, and their IDs.

View attachment 50418

After I rebooted and verified that a single backup cooperated, I ran the backup job and it failed, saying it could not find VM 200. This makes total sense as... there was no VM 200. There had never been a VM 200. It's theoretically possible that I tried to create a VM or CT called 200 at some point but didn't finish it, but there is absolutely no actual VM 200.

It was fine for two days and then today's job did it again, with VM 152. There is no VM 152. When it was VM 200 I looked in the Backup Job scheduler and found nothing. This time it had an entry for VM 152 but it just said unknown. ???

I'm running a smart check just to be sure but like, what the hell? Does anyone have any idea why this is happening? Is it from VMs I went to create but didn't? Why would they show up in the backup job? Is it a corruption? If so, how is it doing this specifically??

Hi,
please post the output of the following:

Bash:

pveversion -v
cat /etc/pve/jobs.cfg
journalctl -b -u pvescheduler.service

UdoB · May 15, 2023

Interesting.

You did specify the backup jobs in the GUI via Datacenter --> Backup? You are sure those jobs do not mention 152 or 200?

Go to arachna-pve --> Shell and search for them:

Code:

~# grep 152 /etc/pve/jobs.cfg
~# grep 152 /etc/pve/vzdump.cron
~# grep 200 /etc/pve/jobs.cfg
~# grep 200 /etc/pve/vzdump.cron

"jobs" is the new mechanism, "vzdump" is legacy. Maybe there is something...
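
One caveat with a plain grep: 152 also matches longer numbers such as 1152 or 1520, and a missing file only produces an error message. A minimal sketch of a stricter search, assuming a POSIX shell and GNU grep; find_vmid is a hypothetical helper name, not a Proxmox command:

```shell
# Hypothetical helper: report whole-word matches of a guest ID in the given files.
# Usage: find_vmid <vmid> <file>...
find_vmid() (
    vmid="$1"
    shift
    for file in "$@"; do
        # The legacy vzdump.cron may not exist on newer installs, so a
        # missing file is reported instead of being treated as an error.
        [ -r "$file" ] || { echo "$file: not present"; continue; }
        # -n: line numbers, -w: whole words only (152 will not match 1520),
        # -H: always print the file name
        grep -nwH -- "$vmid" "$file" || echo "$file: no match for $vmid"
    done
)
```

For example, find_vmid 152 /etc/pve/jobs.cfg /etc/pve/vzdump.cron prints, per file, either the matching lines with their line numbers or a short "no match" note.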

Good luck

Maeve · May 15, 2023

Chris said:
Hi,
please post the output of the following:

Bash:

pveversion -v
cat /etc/pve/jobs.cfg
journalctl -b -u pvescheduler.service

pveversion -v

Bash:

proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

jobs.cfg:
View attachment 50439

journalctl command:

View attachment 50440

Not sure why 199 failed, but that's a real VM. It did back up, so idk what it's going on about there. 152 is the problem child.

Maeve · May 15, 2023

UdoB said:
Interesting.

You did specify the backup jobs in the GUI via Datacenter --> Backup? You are sure those jobs do not mention 152 or 200?

Go to arachna-pve --> Shell and search for them:

Code:

~# grep 152 /etc/pve/jobs.cfg
~# grep 152 /etc/pve/vzdump.cron
~# grep 200 /etc/pve/jobs.cfg
~# grep 200 /etc/pve/vzdump.cron

"jobs" is the new mechanism, "vzdump" is legacy. Maybe there is something...

Good luck

Confirmed, nothing shows up from those greps. Weird.

Chris · May 16, 2023

Maeve said:


jobs.cfg:
View attachment 50439

journalctl command:

View attachment 50440

Not sure why 199 failed, but that's a real VM. It did back up, so idk what it's going on about there. 152 is the problem child.

Try to remove the backup job from the config and recreate it. Does the issue persist? ID 152 is showing up in the task log while it is not in the config, so it seems that something is out of sync here.

fiona · May 16, 2023

Hi,
can you also check the output of ps aux | grep pvescheduler and debsums -s pve-manager (you likely need to install debsums first)?

Maeve · May 16, 2023

Chris said:
Try to remove the backup job from the config and recreate it. Does the issue persist? ID 152 is showing up in the task log while it is not in the config, so it seems that something is out of sync here.

I removed the job and recreated it after the first incident. It was fixed until eventually I had to power cycle my machine, and I think it started again a day after that. I recently power cycled it again and it hasn't done it since, so my fingers are crossed, but I just wanna get to the bottom of this bizarre behavior in case it's indicative of something else.

Maeve · May 16, 2023

fiona said:
Hi,
can you also check the output of ps aux | grep pvescheduler and debsums -s pve-manager (you likely need to install debsums first)?

debsums printed no output.

Chris · May 17, 2023

Maeve said:
I removed the job and recreated it after the first incident. It was fixed until eventually I had to power cycle my machine, and I think it started again a day after that. I recently power cycled it again and it hasn't done it since, so my fingers are crossed, but I just wanna get to the bottom of this bizarre behavior in case it's indicative of something else.

Okay, looking through the code, I could not find any hint of what might cause such an issue, as the job config and guest IDs are read directly from their corresponding locations under the mounted pmxcfs at /etc/pve.
Could you please check the logs of journalctl -u pve-cluster.service and the logs in general for any further hint?

Nuke Bloodaxe · May 17, 2023

I notice your mystery machine of ID 152 comes immediately after VM ID 150. Might I suggest what you are looking at is this:
10010110 -> Mysterious energetic bit flip -> 10011000

VM 199 to VM 200:
11000111 -> Mysterious energetic bit flip -> 11001000

Do you see my point? The last 4 bits [rightmost group] change in much the same way in each case.

So, the critical question, have you tested the memory with memtest86?
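
To make the bit patterns above easy to check, here is a minimal sketch in POSIX shell arithmetic that counts how many bit positions differ between two IDs. bits_differing is a hypothetical helper name; a single flipped bit would give a count of 1.

```shell
# Hypothetical helper: count the bit positions in which two
# non-negative integers differ (the popcount of their XOR).
# Usage: bits_differing <a> <b>
bits_differing() (
    x=$(( $1 ^ $2 ))        # XOR leaves a 1 wherever the inputs differ
    count=0
    while [ "$x" -gt 0 ]; do
        count=$(( count + (x & 1) ))   # add the lowest bit
        x=$(( x >> 1 ))                # move on to the next bit
    done
    echo "$count"
)
```

bits_differing 150 152 prints 3 and bits_differing 199 200 prints 4, so each pair differs in more than one bit position, although both results do end in the same low four bits, 1000.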

fiona · May 17, 2023

Nuke Bloodaxe said:
I notice your mystery machine of ID 152 comes immediately after VM ID 150. Might I suggest what you are looking at is this:
10010110 -> Mysterious energetic bit flip -> 10011000

VM 199 to VM 200:
11000111 -> Mysterious energetic bit flip -> 11001000

Do you see my point? The last 4 bits [rightmost group] change in much the same way in each case.

So, the critical question, have you tested the memory with memtest86?

Of course, checking memory never hurts, but if it were a bit flip, you wouldn't have both 150 and 152 in the job, only the flipped one. Note that all of the IDs, including the supposedly never-existing 152, are already present in the "starting new backup job" log line.

EDIT: fixed, because the 200 is not actually there in the log above.
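
The comparison made in this post can be sketched in shell: take the IDs from the job's log line and report any that have no guest config file. This assumes guest configs live at /etc/pve/nodes/<node>/qemu-server/<vmid>.conf and /etc/pve/nodes/<node>/lxc/<vmid>.conf; existing_vmids and missing_vmids are hypothetical helper names, not Proxmox commands.

```shell
# Hypothetical helper: list the guest IDs that have a config file
# under a pmxcfs-style tree (default: /etc/pve).
existing_vmids() (
    root="${1:-/etc/pve}"
    for conf in "$root"/nodes/*/qemu-server/*.conf "$root"/nodes/*/lxc/*.conf; do
        [ -e "$conf" ] || continue     # skip patterns that matched nothing
        basename "$conf" .conf         # 152.conf -> 152
    done | sort -n
)

# Hypothetical helper: print every ID given as an argument that has
# no config file under the tree.
# Usage: missing_vmids <root> <vmid>...
missing_vmids() (
    root="$1"
    shift
    known=$(existing_vmids "$root")
    for id in "$@"; do
        # -x: the whole line must equal the ID, -q: no output
        echo "$known" | grep -qx -- "$id" || echo "$id"
    done
)
```

For example, missing_vmids /etc/pve 150 152 199 would print only those IDs that have no config file, one per line.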

Nuke Bloodaxe · May 17, 2023

fiona said:
Of course, checking memory never hurts, but if it were a bit flip, you wouldn't have both 150 and 152 in the job, only the flipped one. Note that all of the IDs, including the supposedly never-existing 152, are already present in the "starting new backup job" log line.

EDIT: fixed, because the 200 is not actually there in the log above.

I only mentioned 200 because it's in OP's opening message. The similarity seemed interesting.

Maeve · May 18, 2023

Out of an abundance of caution I'll do a memtest in a little bit


