[SOLVED] I've LOST and CORRUPTED my data on Proxmox VE 7.3-4 after running out of space on PBS 3.2-2!

VGusev2007
Dear all!

I have been using Proxmox VE 7.3-4 with PBS 3.2-2 for a long time.

My PBS ran out of space, and after that my backup job got stuck with this error message:

Code:
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '5c30694c-25d4-4696-b113-75462f864802'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (560.0 MiB of 16.2 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 560.0 MiB dirty of 16.2 GiB total
INFO:  12% (72.0 MiB of 560.0 MiB) in 2s, read: 36.0 MiB/s, write: 34.0 MiB/s
ERROR: backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'storage01' failed for 88a1a151f283827462d8ab2f879f4e4683386fb5a2fe43a109363643584129f7 - fchmod "/srv/storage01/.chunks/88a1/88a1a151f283827462d8ab2f879f4e4683386fb5a2fe43a109363643584129f7.tmp_AAMSiH" failed: EDQUOT: Quota exceeded
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 8998 failed - backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'storage01' failed for 88a1a151f283827462d8ab2f879f4e4683386fb5a2fe43a109363643584129f7 - fchmod "/srv/storage01/.chunks/88a1/88a1a151f283827462d8ab2f879f4e4683386fb5a2fe43a109363643584129f7.tmp_AAMSiH" failed: EDQUOT: Quota exceeded
INFO: Failed at 2024-10-25 22:00:11
...

After that, several of my VMs were left with a corrupted filesystem inside!

Like this one:

Code:
[Mon Oct 28 15:06:45 2024] systemd-journald[362]: Failed to write entry (22 items, 753 bytes), ignoring: Read-only file system

dmesg also showed a lot of I/O errors.

fsck.ext4 showed a lot of inode errors and the like inside those VMs!

I use ZFS as the backend storage.
I never had a problem like this before I started using PBS!

Is it dangerous to use PBS in a production environment?
 
You need to read man zfsprops and look for "Quota". You get an overview of the space used with zfs list -o space; then you ask for details about the dataset used for the PBS datastore with zfs get all <yourpbsdataset>.

In the end - if you have unassigned space available in the pool for this - you can increase the quota with zfs set quota=<newsetting> <yourpbsdataset>. :)

If the complete pool is actually full, you need to delete some unused data... which may be a problem in itself...
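
A minimal sketch of that workflow, assuming the datastore lives on a hypothetical dataset rpool/pbs/storage01 (substitute your own pool and dataset names):

Code:
# Overview of used/available space per dataset
zfs list -o space

# Inspect quota and usage on the datastore dataset
zfs get quota,refquota,used,available rpool/pbs/storage01

# Raise the quota - only works if the pool itself still has free space
zfs set quota=2T rpool/pbs/storage01

If the pool itself is full, the usual way out is to prune old snapshots in PBS and then run garbage collection (e.g. proxmox-backup-manager garbage-collection start storage01); keep in mind that GC only reclaims chunks that have been unreferenced for a while, so the space does not come back immediately.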
 
No. I realized that the problem comes from the design of Proxmox itself. To try to mitigate the impact of this problem, I have to consider using PVE 8.2 with backup fleecing (an advanced feature), or switching to pve-zsync.
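
For reference, a sketch of how fleecing can be enabled on PVE 8.2, either for a single run or as a node-wide default; the storage IDs here (pbs-storage01, local-zfs) are placeholders:

Code:
# One-off backup with fleecing, staging fleecing data on a fast local storage
vzdump 106 --storage pbs-storage01 --fleecing enabled=1,storage=local-zfs

# Or set it as the node-wide default in /etc/vzdump.conf (same property-string syntax):
# fleecing: enabled=1,storage=local-zfs

With fleecing, the old contents of blocks the guest overwrites during a backup are staged on the fleecing storage, so guest writes do not have to wait on the (possibly slow or full) backup target.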
 

So I would just point out that if this is occurring in PVE 8, you should file a Bugzilla report [1]. Also, not too long ago it was made clear to me in no uncertain terms by Proxmox staff that users running PVE 7 are not in scope for anything anyhow, as it is EOL.

[1] https://bugzilla.proxmox.com/
 
That problem is common (and still present) and is described in the wiki: https://pve.proxmox.com/wiki/Backup_and_Restore - see the section "VM Backup Fleecing".

If you can answer my other topic, which is related to this problem, I would be glad: https://forum.proxmox.com/threads/what-will-happen-if-out-of-space-backup-fleecing.156581/
 

I do not see why fleecing should corrupt your guest disks. I intentionally left your solo post unanswered because usually with zero answers staff will eventually reply. It's odd to me as described above.
 
If the backup process hangs for any reason, there are a lot of problems afterwards... You can see that in the screenshots. This is my production environment. Thank you for not answering me in that topic. I hope that Proxmox staff will answer me.

[screenshots attached]
 

I just find it completely weird that filling up the target storage of a backup has any impact on your guest disks. I think staff eventually use some filter for threads with no answers and reply to such threads (if they feel they can reply something). Another trick you can employ is to post this to the PBS forum. That one is so slow that whatever you post there will stay on the first page for a long time. :)
 
But if you can make a reproducer with PVE 8 (with or without fleecing, doesn't matter), I would not even wait and would file a Bugzilla report right away. This is not documented in any way and does not even make sense to me - consider that in some scenarios the admins running the PVE cluster and PBS might be completely different people.
 
Yes. If I get the chance, I will test that, but I'm sure the problem is still here because of the design. You need a fast and robust backup server to do any kind of online backup. I have used Proxmox VE since v4.0 and the problem is still here. It's the design. I hoped that PBS used a different design, but it doesn't; I've googled it, and because of this I've closed this topic as solved, since there is no new useful info for other users.
 
Also, I suppose the problem comes not only from Proxmox but from QEMU itself.
 
Yeah, that really looks like two different, independent problems!
Sad, but no. It happened because my PBS server stopped writing any new data while trying to take a new backup. After several hours my Proxmox VMs just hung. I logged into them, and all of my VMs that had some writes during the backup had switched to read-only with a lot of errors. I could only fix that from a LiveCD. This is sad but true.
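
For anyone who ends up in the same read-only state, a rough sketch of the LiveCD repair; the device name /dev/sda1 is just an example, and the filesystem must not be mounted while fsck runs:

Code:
# Boot the VM from a live ISO, then identify the guest disk
lsblk

# Check and repair interactively
fsck.ext4 -f /dev/sda1

# Or auto-answer "yes" to all repair prompts (use with care)
fsck.ext4 -f -y /dev/sda1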

Backend storage for the VMs/LXCs (with ext4 inside) is on ZFS?
And the backend storage for PBS is on ZFS too, but both on different hosts?
Yes, to both questions.
 
Would you mind posting the configuration of a couple of the VMs that suffered the issue?
Hey! Yes, no problem.

The first one:

Code:
#Current state: production.
balloon: 0
boot: order=virtio0
cores: 4
ide2: none,media=cdrom
memory: 20480
name: mail
net0: virtio=52:54:00:3D:1D:10,bridge=vmbr0,firewall=1,tag=1
numa: 1
ostype: l26
parent: auto-hourly-31-10-2024-12-10-05
protection: 1
smbios1: uuid=5928ee84-1926-4ff4-82fa-f37dc443fcf3
sockets: 1
tablet: 0
virtio0: local-zfs:vm-106-disk-1,size=70G
virtio1: local-zfs:vm-106-disk-2,size=6066G

The second one:
Code:
boot: order=scsi0
cores: 4
ide2: none,media=cdrom
memory: 4096
name: antivirus
net0: virtio=00:0C:29:00:36:B2,bridge=vmbr0,tag=1
numa: 1
ostype: l26
parent: snap24072024
protection: 1
scsi0: local-zfs:vm-125-disk-0,size=25G
smbios1: uuid=0e87280d-abab-4842-bc82-484fc1737bd7
sockets: 1
vmgenid: cac2d8a0-d29f-4f1b-83d8-332dec70ea26

I had problems with my mail and antivirus VMs during that time. Both of them had some writes during the backup. They're critical for me.
 
Oh, thank you so much. No, I didn't post it to BZ because PBS, or online backup in general, is wrong by design, and there is some info about that in the Proxmox wiki. Just don't use it, or upgrade to PVE 8.2 and use a local SSD as temporary cache. But I don't know what will happen if the local SSD runs out of space during a backup. I think it will hit the same problem.
 
