Replication = error: out of space

ag-damien

Hello,
I have a replication problem on VM-100, with the following error message:
2023-04-04 09:35:02 100-0: end replication job with error: zfs error: cannot create snapshot 'rpool/data/vm-100-disk-1@__replicate_100-0_1680593700__': out of space

My VM is 550 GB and my storage is 900 GB, so it should take up a bit more than half of the storage. (I have another VM of 70 GB that has no replication problem.)

Looking into it, I see that VM-100 actually uses 739 GB. How is that possible? And how can I re-enable replication for VM-100?

Thanks for your help.


root@agpve-xxxx1:~# zfs get all rpool/data/vm-100-disk-1
NAME PROPERTY VALUE SOURCE
rpool/data/vm-100-disk-1 type volume -
rpool/data/vm-100-disk-1 creation Tue Jan 31 8:59 2023 -
rpool/data/vm-100-disk-1 used 739G -
rpool/data/vm-100-disk-1 available 541G -
rpool/data/vm-100-disk-1 referenced 177G -
rpool/data/vm-100-disk-1 compressratio 1.50x -
rpool/data/vm-100-disk-1 reservation none default
rpool/data/vm-100-disk-1 volsize 550G local
rpool/data/vm-100-disk-1 volblocksize 8K default
rpool/data/vm-100-disk-1 checksum on default
rpool/data/vm-100-disk-1 compression on inherited from rpool
rpool/data/vm-100-disk-1 readonly off default
rpool/data/vm-100-disk-1 createtxg 212 -
rpool/data/vm-100-disk-1 copies 1 default
rpool/data/vm-100-disk-1 refreservation 567G received
rpool/data/vm-100-disk-1 guid 997373996257919687 -
rpool/data/vm-100-disk-1 primarycache all default
rpool/data/vm-100-disk-1 secondarycache all default
rpool/data/vm-100-disk-1 usedbysnapshots 34.1G -
rpool/data/vm-100-disk-1 usedbydataset 177G -
rpool/data/vm-100-disk-1 usedbychildren 0B -
rpool/data/vm-100-disk-1 usedbyrefreservation 529G -
rpool/data/vm-100-disk-1 logbias latency default
rpool/data/vm-100-disk-1 objsetid 413 -
rpool/data/vm-100-disk-1 dedup off default
rpool/data/vm-100-disk-1 mlslabel none default
rpool/data/vm-100-disk-1 sync standard inherited from rpool
rpool/data/vm-100-disk-1 refcompressratio 1.44x -
rpool/data/vm-100-disk-1 written 38.5G -
rpool/data/vm-100-disk-1 logicalused 314G -
rpool/data/vm-100-disk-1 logicalreferenced 254G -
rpool/data/vm-100-disk-1 volmode default default
rpool/data/vm-100-disk-1 snapshot_limit none default
rpool/data/vm-100-disk-1 snapshot_count none default
rpool/data/vm-100-disk-1 snapdev hidden default
rpool/data/vm-100-disk-1 context none default
rpool/data/vm-100-disk-1 fscontext none default
rpool/data/vm-100-disk-1 defcontext none default
rpool/data/vm-100-disk-1 rootcontext none default
rpool/data/vm-100-disk-1 redundant_metadata all default
rpool/data/vm-100-disk-1 encryption off default
rpool/data/vm-100-disk-1 keylocation none default
rpool/data/vm-100-disk-1 keyformat none default
rpool/data/vm-100-disk-1 pbkdf2iters 0 default
 
please post in English!

creating a ZFS snapshot of a "thick" zvol requires extra free space: ZFS has to keep the data referenced by the snapshot, but also guarantee that the whole zvol can still be fully overwritten. you don't have enough free space for that, so creating the snapshot fails.
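
as for where the 739G comes from: `used` is roughly the sum of the four `usedby*` properties in the `zfs get all` output above, and most of it is the reservation, not data. a minimal sketch of that arithmetic (the `sum_used` helper is made up for illustration; the values are in G, copied from that output):

```shell
#!/bin/sh
# Values (in G) copied from the `zfs get all` output in the first post.
usedbydataset=177
usedbysnapshots=34.1
usedbyrefreservation=529
usedbychildren=0

# sum_used is a hypothetical helper: it adds the four usedby* components.
# awk is used because POSIX shell arithmetic is integer-only.
sum_used() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" \
        'BEGIN { printf "%.1f\n", a + b + c + d }'
}

sum_used "$usedbydataset" "$usedbysnapshots" "$usedbyrefreservation" "$usedbychildren"
```

this prints 740.1, which matches the reported 739G once you allow for zfs rounding each value for display.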
 
OK, sorry.
So what can I do to resolve my problem and restart my replication?
Thank you.
 
this is just how ZFS works - you need to reduce your space usage or increase the amount of raw storage you have (e.g., by adding more vdevs). you can switch to thin volumes (for existing volumes, removing the "refreservation" has the same effect), but that includes the risk of running out of space during regular operations with undefined results including potential data loss or corruption.
 
OK, so I can restart the replication, but I don't really understand the risk.

If I run the following commands, VM-100 will not crash?

zfs set refreservation=none rpool/data/vm-100-disk-0
and
zfs set refreservation=none rpool/data/vm-100-disk-1
 
yes, you can do that. but if the VM ever writes the full disk, you might not have enough space for that, and what happens then depends entirely on what is running the guest and how it is configured. you can think of it like one of those fake USB pen drives - the zvol basically says "you can store 550G on me", but that's a lie, and it might fail after 100G or after 150G or after 549G or never, it depends on circumstances.
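
if you want to review the exact commands before touching anything, a small sketch like the following prints them instead of running them (`thin_provision_cmds` is a made-up helper name; the dataset names are the ones from your post):

```shell
#!/bin/sh
# thin_provision_cmds is a hypothetical helper: it only PRINTS the commands
# that would drop the reservation, one per dataset, so they can be reviewed
# first. It does not call zfs itself.
thin_provision_cmds() {
    for ds in "$@"; do
        printf 'zfs set refreservation=none %s\n' "$ds"
    done
}

thin_provision_cmds rpool/data/vm-100-disk-0 rpool/data/vm-100-disk-1
```

afterwards, `zfs get refreservation,volsize <dataset>` shows the current state. on OpenZFS versions that support it, `zfs set refreservation=auto <dataset>` should make the volume thick again, provided the pool has enough free space for the reservation.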
 
OK, so if I understand correctly, once my VM grows to between 700 and 900 GB I won't be able to replicate any more, because there won't be enough space on the server for the snapshots.
Is that correct?
For now it's OK because it's small.
 
if you have a fully reserved zvol, creating a snapshot requires space for the reservation (even if there is no data there at the moment), but you cannot run out of space by writing to the zvol before it is full. if you don't fully reserve the space, snapshots are cheaper/take up less space, but you can run out of space just by doing regular writes to the zvol, even if the zvol is not full yet. in general, operating systems don't expect a disk to have less space than it says it has, so the failure modes are worse than the already bad regular "disk is full".
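
applied to the numbers in the first post, a rough estimate looks like this. it assumes that the pool's free space is about `available` minus `usedbyrefreservation`, and that a new snapshot needs about `written` of extra space to top the reservation back up. both are simplifications, and `snapshot_headroom` is a made-up helper:

```shell
#!/bin/sh
# snapshot_headroom is a hypothetical helper. Arguments, all in G:
#   $1 = available, $2 = usedbyrefreservation, $3 = written
# Assumption: free space outside the reservation is about (available - usedbyrefreservation),
# and the snapshot needs roughly `written` of it.
snapshot_headroom() {
    awk -v avail="$1" -v resv="$2" -v written="$3" 'BEGIN {
        free = avail - resv
        printf "free=%.1fG needed=%.1fG %s\n", free, written,
               (free >= written ? "ok" : "out of space")
    }'
}

# Values from the `zfs get all` output in the first post.
snapshot_headroom 541 529 38.5
```

with those inputs the estimate comes out at about 12G free against 38.5G needed, which is consistent with the "out of space" error.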
 
OK, but my VM cannot exceed 550 GB, since I only gave it a 550 GB disk. And the snapshots only store the difference for the replication, right?
 
if you have a single 550GB zvol, even if you only keep two snapshots (last replication snapshot, current replication snapshot), the total space usage might reach 3x550GB + zpool overhead if the following sequence of events takes place:
- write the whole zvol
- run replication (snapshot A is taken)
- write the whole zvol (now snapshot A references 550GB of old data, and the zvol itself references 550GB of *different*, new data)
- start replication (snapshot B is taken)
- write the whole zvol while the replication is still running (snapshot A references 550GB of very old data, snapshot B references a different 550GB of old data, and the zvol itself references yet another different 550GB of new data)
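
the sequence above boils down to one full copy for the live zvol plus one per retained snapshot. a minimal sketch of that worst case, ignoring zpool overhead and compression (`worst_case_gb` is a made-up helper):

```shell
#!/bin/sh
# worst_case_gb is a hypothetical helper.
#   $1 = zvol size in GB, $2 = number of snapshots kept
# Worst case: every snapshot and the live zvol each reference a completely
# different full copy of the data.
worst_case_gb() {
    volsize_gb=$1
    snapshots=$2
    echo $(( volsize_gb * (snapshots + 1) ))
}

worst_case_gb 550 2
```

for a 550GB zvol with two snapshots this prints 1650, well above a 900GB storage.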
 
What is a zvol? Is it my ZFS volume? On my server I have a ZFS storage of 900 GB.
 

see "man zfs" - a zvol is a kind of zfs dataset that can be used as a block device.
 
