Hi all,
I've got a problem migrating a VM in a 2-node cluster from pve02 to pve03. The VM is a PBS 4 with 2 disks (OS 30 GB & data 500 GB). Both disks live on a local 2 TB NVMe ZFS pool; the storage is called `nvme-2tb` on both nodes. The VM was configured in HA with a replication job every 30 minutes.
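For reference, the replication job was set up more or less like this (from memory, it may as well have been created via the GUI; the job ID `130-0` matches the snapshot names in the log below):
Code:
# every 30 minutes, VM 130 to pve03 (sketch, not copied from my shell history)
pvesr create-local-job 130-0 pve03 --schedule '*/30'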
While migrating both nodes to PVE 9 and the PBS to PBS 4, I noticed that the replication job has been failing for weeks. The log says that the replication failed because it ran out of space.
Code:
2025-12-11 23:12:42 use dedicated network address for sending migration traffic (172.16.10.3)
2025-12-11 23:12:42 starting migration of VM 130 to node 'pve03' (172.16.10.3)
2025-12-11 23:12:42 found local, replicated disk 'nvme-2tb:vm-130-disk-0' (attached)
2025-12-11 23:12:42 found local, replicated disk 'nvme-2tb:vm-130-disk-1' (attached)
2025-12-11 23:12:42 replicating disk images
2025-12-11 23:12:42 start replication job
2025-12-11 23:12:42 guest => VM 130, running => 0
2025-12-11 23:12:42 volumes => nvme-2tb:vm-130-disk-0,nvme-2tb:vm-130-disk-1
2025-12-11 23:12:44 create snapshot '__replicate_130-0_1765491162__' on nvme-2tb:vm-130-disk-0
2025-12-11 23:12:44 create snapshot '__replicate_130-0_1765491162__' on nvme-2tb:vm-130-disk-1
2025-12-11 23:12:44 delete previous replication snapshot '__replicate_130-0_1765491162__' on nvme-2tb:vm-130-disk-0
2025-12-11 23:12:44 end replication job with error: zfs error: cannot create snapshot 'nvme-2tb/vm-130-disk-1@__replicate_130-0_1765491162__': out of space
2025-12-11 23:12:44 ERROR: zfs error: cannot create snapshot 'nvme-2tb/vm-130-disk-1@__replicate_130-0_1765491162__': out of space
2025-12-11 23:12:44 aborting phase 1 - cleanup resources
2025-12-11 23:12:44 ERROR: migration aborted (duration 00:00:02): zfs error: cannot create snapshot 'nvme-2tb/vm-130-disk-1@__replicate_130-0_1765491162__': out of space
TASK ERROR: migration aborted
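In case it helps with reproducing: the job ID `130-0` from the snapshot names above can also be used to inspect and re-trigger the job by hand (commands listed for reference, not taken from my history):
Code:
pvesr status --guest 130     # state and last error of the replication jobs for VM 130
pvesr schedule-now 130-0     # run the job immediately instead of waiting for the schedule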
So I checked and found that the ZFS storage was about 90% full. As far as I know, a ZFS pool should not be filled beyond roughly 80%, but since the second node is only there for maintenance, that had not looked like a problem. So I started cleaning up the ZFS pool on node 2 (pve03) and also removed the replicated copy of the disks of the PBS VM (130). Right now the free / unused space is about 1.3 TB, more than enough for the 30 + 500 GB of data.
Code:
root@pve03:~# zfs list -r -o name,used,avail,refer,usedbysnapshots,usedbychildren,usedbyrefreservation nvme-2tb
NAME                                   USED  AVAIL  REFER  USEDSNAP  USEDCHILD  USEDREFRESERV
nvme-2tb                               391G  1.37T    96K        0B       391G             0B
nvme-2tb/subvol-110-disk-0            2.57G  23.1G  1.92G      666M         0B             0B
nvme-2tb/vm-101-disk-0                 117G  1.43T  44.6G     11.8G         0B          60.9G
nvme-2tb/vm-101-state-vor_Update      18.6G  1.39T  1.87G        0B         0B          16.7G
nvme-2tb/vm-111-disk-0                49.5G  1.41T  17.0G        0B         0B          32.5G
nvme-2tb/vm-211-disk-0                19.3G  1.39T  7.14G        0B         0B          12.2G
nvme-2tb/vm-211-disk-1                1.02G  1.38T    56K        0B         0B          1.02G
nvme-2tb/vm-212-disk-0                11.4G  1.38T  1.29G        0B         0B          10.2G
nvme-2tb/vm-221-disk-0                17.6G  1.39T  5.44G        0B         0B          12.2G
nvme-2tb/vm-221-disk-1                3.79G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-10               3.05G  1.38T    56K        0B         0B          3.05G
nvme-2tb/vm-221-disk-11               4.06G  1.38T    56K        0B         0B          4.06G
nvme-2tb/vm-221-disk-12               4.06G  1.38T    56K        0B         0B          4.06G
nvme-2tb/vm-221-disk-2                3.79G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-3                3.79G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-4                3.79G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-5                3.78G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-6                3.78G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-7                3.78G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-8                3.78G  1.38T  1.75G        0B         0B          2.03G
nvme-2tb/vm-221-disk-9                3.05G  1.38T    56K        0B         0B          3.05G
nvme-2tb/vm-222-disk-0                15.9G  1.39T  3.70G        0B         0B          12.2G
nvme-2tb/vm-223-disk-0                15.9G  1.39T  3.71G        0B         0B          12.2G
nvme-2tb/vm-301-disk-0                36.1G  1.41T  2.78G      837M         0B          32.5G
nvme-2tb/vm-301-state-preUpdate_25_1  8.62G  1.38T  1.07G        0B         0B          7.55G
nvme-2tb/vm-302-disk-0                32.5G  1.40T  2.80G        0B         0B          29.7G
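For completeness, the cleanup itself looked roughly like this (commands from memory; the disk names are the ones from the migration log above):
Code:
zpool list -o name,size,alloc,free,cap,frag nvme-2tb   # overall pool usage and fragmentation
zfs destroy -r nvme-2tb/vm-130-disk-0                  # leftover replicated OS disk incl. its snapshots
zfs destroy -r nvme-2tb/vm-130-disk-1                  # leftover replicated data disk incl. its snapshots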
Unfortunately, I am still not able to migrate the VM from node 1 (pve02) to node 2 (pve03). The error is still present.
I have run a `zpool trim nvme-2tb` and a `zpool scrub nvme-2tb`, and I have rebooted both nodes more than twice, without any change in the error.
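Concretely, these were the maintenance commands (status checks listed as well, in case anyone wants to see their output):
Code:
zpool trim nvme-2tb
zpool status -t nvme-2tb    # TRIM state per vdev
zpool scrub nvme-2tb
zpool status nvme-2tb       # scrub progress and errors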
So for now I am out of ideas, which is why I am starting my first thread here in the Proxmox forums. So please, be nice. ;-)
Does anyone have an idea?
Thanks in advance.