Replication between nodes

Benoit

Hello,

We have a 3-node cluster running Proxmox 5.2-10.

On each node there are two ZFS pools, VM-STOCKAGE and VM-STOCKAGE2, each a RAID 1 of 4 TB disks.

On one node, in pool VM-STOCKAGE (3.51 TB), there is a VM with ID 105 that uses two disks: disk-1 (2 TB) and disk-2 (1 TB), 3.12 TB in total.

When I try to activate replication of this VM from this node to another, I get an out of space error:

2018-11-08 14:07:01 105-0: start replication job
2018-11-08 14:07:01 105-0: guest => VM 105, running => 27648
2018-11-08 14:07:01 105-0: volumes => VM-STOCKAGE:vm-105-disk-1,VM-STOCKAGE:vm-105-disk-2
2018-11-08 14:07:02 105-0: create snapshot '__replicate_105-0_1541682421__' on VM-STOCKAGE:vm-105-disk-1
2018-11-08 14:07:02 105-0: end replication job with error: zfs error: cannot create snapshot 'VM-STOCKAGE/vm-105-disk-1@__replicate_105-0_1541682421__': out of space



I have the same setup on three other clusters at other sites, with no problem. I do not understand why it is not working on this cluster.

Thanks.
 
Hi,

The error message says you have no space left.

Check if you have snapshots that need this space.
 
Please check

zfs list -t all

on the nodes and send the output
 
Here are the outputs:

node svr-00-scq (destination of the failing replication)

root@SVR-00-SCQ:~# zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
VM-STOCKAGE 28.9M 3.84T 24K /VM-STOCKAGE
VM-STOCKAGE2 1.28T 127G 24K /VM-STOCKAGE2
VM-STOCKAGE2/subvol-117-disk-1 3.68G 96.3G 3.68G /VM-STOCKAGE2/subvol-117-disk-1
VM-STOCKAGE2/subvol-117-disk-1@__replicate_117-0_1541631601__ 0B - 3.68G -
VM-STOCKAGE2/vm-108-disk-1 906G 643G 391G -
VM-STOCKAGE2/vm-108-disk-1@__replicate_108-0_1541676601__ 0B - 391G -
VM-STOCKAGE2/vm-111-disk-1 111G 189G 49.1G -
VM-STOCKAGE2/vm-111-disk-1@__replicate_111-1_1541687401__ 1.17M - 49.1G -
VM-STOCKAGE2/vm-111-disk-1@__replicate_111-0_1541687404__ 0B - 49.1G -
VM-STOCKAGE2/vm-118-disk-1 104G 189G 41.8G -
VM-STOCKAGE2/vm-118-disk-1@__replicate_118-1_1541682001__ 398K - 41.8G -
VM-STOCKAGE2/vm-118-disk-1@__replicate_118-0_1541682005__ 0B - 41.8G -
VM-STOCKAGE2/vm-119-disk-1 128G 210G 45.3G -
VM-STOCKAGE2/vm-119-disk-1@__replicate_119-1_1541674811__ 25.5M - 45.3G -
VM-STOCKAGE2/vm-119-disk-1@__replicate_119-0_1541682010__ 0B - 45.3G -
VM-STOCKAGE2/vm-199-disk-1 61.9G 179G 9.95G -



node svr-07-scq (source of the failing replication)

root@svr-07-scq:~# zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
VM-STOCKAGE 3.12T 399G 24K /VM-STOCKAGE
VM-STOCKAGE/vm-105-disk-1 2.01T 1.59T 831G -
VM-STOCKAGE/vm-105-disk-2 1.11T 1.47T 27.7G -
VM-STOCKAGE2 343G 3.18T 24K /VM-STOCKAGE2
VM-STOCKAGE2/vm-111-disk-1 111G 3.24T 49.1G -
VM-STOCKAGE2/vm-111-disk-1@__replicate_111-0_1541686501__ 19.1M - 49.1G -
VM-STOCKAGE2/vm-111-disk-1@__replicate_111-1_1541687401__ 0B - 49.1G -
VM-STOCKAGE2/vm-118-disk-1 104G 3.24T 41.8G -
VM-STOCKAGE2/vm-118-disk-1@__replicate_118-0_1541674801__ 49.2M - 41.8G -
VM-STOCKAGE2/vm-118-disk-1@__replicate_118-1_1541682001__ 0B - 41.8G -
VM-STOCKAGE2/vm-119-disk-1 128G 3.26T 45.3G -
VM-STOCKAGE2/vm-119-disk-1@__replicate_119-0_1541682010__ 0B - 45.3G -
VM-STOCKAGE2/vm-119-disk-1@__replicate_119-1_1541682014__ 0B - 45.3G -


node svr-09-scq

root@SVR-09-SCQ:~# zfs list -t all
NAME USED AVAIL REFER MOUNTPOINT
VM-STOCKAGE 1.69T 1.82T 24K /VM-STOCKAGE
VM-STOCKAGE/vm-112-disk-1 1.69T 2.15T 1.36T -
VM-STOCKAGE2 1.22T 2.29T 24K /VM-STOCKAGE2
VM-STOCKAGE2/subvol-117-disk-1 3.87G 96.2G 3.79G /VM-STOCKAGE2/subvol-117-disk-1
VM-STOCKAGE2/subvol-117-disk-1@__replicate_117-0_1541631601__ 77.3M - 3.69G -
VM-STOCKAGE2/vm-108-disk-1 906G 2.79T 391G -
VM-STOCKAGE2/vm-108-disk-1@__replicate_108-0_1541676601__ 276M - 391G -
VM-STOCKAGE2/vm-111-disk-1 111G 2.35T 49.1G -
VM-STOCKAGE2/vm-111-disk-1@__replicate_111-1_1541687401__ 1.17M - 49.1G -
VM-STOCKAGE2/vm-111-disk-1@__replicate_111-0_1541687404__ 1.07M - 49.1G -
VM-STOCKAGE2/vm-118-disk-1 104G 2.35T 41.8G -
VM-STOCKAGE2/vm-118-disk-1@__replicate_118-1_1541682001__ 415K - 41.8G -
VM-STOCKAGE2/vm-118-disk-1@__replicate_118-0_1541682005__ 407K - 41.8G -
VM-STOCKAGE2/vm-119-disk-1 128G 2.37T 45.3G -
VM-STOCKAGE2/vm-119-disk-1@__replicate_119-0_1541682010__ 0B - 45.3G -
VM-STOCKAGE2/vm-119-disk-1@__replicate_119-1_1541682014__ 0B - 45.3G -
 
I need a bit more information about node svr-07-scq:

zpool status VM-STOCKAGE

zpool list VM-STOCKAGE
 
Here it is:

root@svr-07-scq:~# zpool status VM-STOCKAGE
pool: VM-STOCKAGE
state: ONLINE
scan: scrub repaired 0B in 5h45m with 0 errors on Sun Oct 14 06:09:32 2018
config:

NAME STATE READ WRITE CKSUM
VM-STOCKAGE ONLINE 0 0 0
sdb ONLINE 0 0 0

errors: No known data errors


root@svr-07-scq:~# zpool list VM-STOCKAGE
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
VM-STOCKAGE 3.62T 860G 2.78T - 20% 23% 1.00x ONLINE -
 
Normally you should have enough space for a snapshot.
Let's have a look at the block size:

Code:
zfs get -rH volsize  VM-STOCKAGE
zpool get ashift
 
Hello,

root@svr-07-scq:~# zfs get -rH volsize VM-STOCKAGE
VM-STOCKAGE volsize - -
VM-STOCKAGE/vm-105-disk-1 volsize 1.95T local
VM-STOCKAGE/vm-105-disk-2 volsize 1.07T local

root@svr-07-scq:~# zpool get ashift
NAME PROPERTY VALUE SOURCE
VM-STOCKAGE ashift 0 default
VM-STOCKAGE2 ashift 0 default


For my understanding:

When a node starts replication, does it create a snapshot and then send that snapshot to the other node?
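
From the log above, it looks like it does something roughly like this (my simplified guess):

Code:
# create the replication snapshot on the source (name taken from the log above)
zfs snapshot VM-STOCKAGE/vm-105-disk-1@__replicate_105-0_1541682421__
# then stream it to the same pool on the destination node
zfs send VM-STOCKAGE/vm-105-disk-1@__replicate_105-0_1541682421__ | ssh svr-00-scq zfs receive VM-STOCKAGE/vm-105-disk-1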

Is it possible to change the default location for snapshots to another pool? On this node, pool VM-STOCKAGE2 has almost 3.51 TB free.

Thanks a lot for your help again.
 
Hello,

Can I have some news?
This creates a problem for backups too; I can't use snapshot mode.

Regards.
 
Hi,

It seems to me that, as the error in your first message says, you do not have enough space on the destination pool. When you show your destination zpool, the free space looks OK. But you must take into account that this free space is correlated with the HDD block size. Let's put it in a simple way:

on the source you have 4 blocks of data (4 x 512 B) = 2 KB

on the destination you have 8 KB free (2 blocks x 4 KB)

So purely as a space count, 8 KB is enough for replication from source to destination. But ZFS replicates block by block, so the 4 blocks from the source become 4 x 4 KB = 16 KB on the destination. In this case you will see the 'not enough space' message!
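
A quick way to compare the block sizes ZFS actually works with on both sides (just a sketch; adjust the pool and dataset names to yours):

Code:
# pool-level sector size assumption: block size = 2^ashift bytes (0 = auto-detected)
zpool get ashift VM-STOCKAGE
# block size each zvol was created with, which is what replication sends
zfs get volblocksize VM-STOCKAGE/vm-105-disk-1 VM-STOCKAGE/vm-105-disk-2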

Good luck!
 
Hello,

Thanks for your reply!

On the source:

root@svr-07-scq:~# cat /sys/block/sdb/queue/hw_sector_size
512

root@svr-07-scq:~# blockdev --getss --getpbsz /dev/sdb
512
512

On the destination:

root@SVR-00-SCQ:~# cat /sys/block/sdb/queue/hw_sector_size
512

root@SVR-00-SCQ:~# blockdev --getss --getpbsz /dev/sdb
512
512


It seems the sector sizes are the same, no?
 
It seems to be the same. But you say that you use a RAID 1 ZFS pool, so you need to check whether both disks have the same value.
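
One quick way to see the logical and physical sector sizes of all disks at once (a sketch, run on each node):

Code:
lsblk -o NAME,PHY-SEC,LOG-SEC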

Anyway, after reading your post again, the 'not enough space' error is about the source. As you say, on the source you have 3.12 TB used out of 3.51 TB, so you are below the safe-guard free space (around 10%), because ZFS stores not only user data but also checksums, metadata and some internal data.
For this reason the snapshot cannot be created on the source. If you want to be on the safe side, do not run ZFS with less than 20% free space. Around that level any write becomes very difficult, because fragmentation gets higher and it takes longer to find free blocks, and IOPS will suffer.
If all your disks have a 512 B block size, this also increases the space used for metadata compared with disks of the same size using 4 KB.
In my opinion, using 512 B disks with ZFS these days is a very bad option. Imagine that one disk breaks: the probability of finding a new 512 B disk in the future is very low; most probably you will find a 4 KB one. And if you created your pool with ashift=9 (i.e. 512 B blocks), your pool will be very, very slow, because every 512 B write on the old disk becomes a 4 KB write on the new disk, and the same for reads. But you are lucky because you have a mirror; on raidz it is even worse ;)
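
To see exactly where the space on the source pool goes (data, snapshots, reservations), something like this should help (a sketch, run on the source node):

Code:
# per-dataset breakdown of used space: snapshots, data, refreservation, children
zfs list -o space -r VM-STOCKAGE
zfs get -r refreservation VM-STOCKAGE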



Can you post the output of zpool list -v for the source pool?
 
Thanks for your time answering.

My server is a brand new Dell R530 with 5 years of Pro Support, so I think I will find disks easily (I hope!).
I use RAID 1 with the PERC controller card.

I use ZFS because I want to use replication between nodes. How can I do that otherwise? Is it possible?

I think I can reduce the disks of my VM; do you think that would help?

Here is the output.

root@svr-07-scq:~# zpool list -v
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
VM-STOCKAGE 3.62T 960G 2.69T - 28% 25% 1.00x ONLINE -
sdb 3.62T 960G 2.69T - 28% 25%
VM-STOCKAGE2 3.62T 140G 3.49T - 6% 3% 1.00x ONLINE -
sdc 3.62T 140G 3.49T - 6% 3%
root@svr-07-scq:~#


Thanks again!
 
Hi Benoit,

So your VM-STOCKAGE has a lot of free space (only 25% is data); the rest of the space is reserved for your allocated vdisks (VM 105). I think reducing your vdisk size could solve the problem. Anyway, in my case I allocate only the space I need + 25%; I prefer to keep free space for snapshots! From time to time I increase the vdisk space as needed. Sometimes you will need to move a VM from one node to another, and for that you also need free space!
I do not understand why you use 2 different pools on each node! Maybe because you added the last 2 disks before you set up the first pool? It is more space efficient and better for performance to use only a single pool (with 2 pools you split the ARC in 2), so it is better to have a single pool (RAID 10?).
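
Just as an illustration of the single-pool idea, assuming the 4 disks were exposed directly to ZFS (JBOD/HBA mode). The device names here are made up; use your own /dev/disk/by-id paths:

Code:
# one RAID 10 pool = two mirror vdevs, with 4 KB sectors forced
zpool create -o ashift=12 VM-STOCKAGE mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB mirror /dev/disk/by-id/diskC /dev/disk/by-id/diskD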

Good luck!
 
Hello,

Two different pools because I made a mistake, I think... In my mind, if I split the two RAID 1 arrays at the PERC controller level instead of making a RAID 10, I gain some safety, because the second array will stay online even if both disks of the first array fail. For performance I lose a little, but it is not significant for my usage.
Also, I thought it would be clearer if I made two pools on the nodes, one for online and one for offline VMs.

Let me explain:

svr-07-scq: online VM 105 on the first pool, and replicas from the other node on the other pool
svr-09-scq: online VMs 112 / 114 / 118 / 116 / 111 on the second pool, and the replica of 105 on the first
svr-00-scq: old server used only for quorum, with some unimportant test VMs. On sites where I have some space, I use it to replicate all VMs of the other nodes.

I see that I have a lot to learn about Proxmox, and that self-learning has its limits. Working with someone like you would be a great way to learn.

I will try to reduce my VM disk (a Linux server with volume groups); I hope I will not make a mistake!
I will restore a backup of this server on the quorum node and do my tests on it.

I have never reduced a vdisk; is there a better way to do this?


Again, thanks a lot for your time and your help. I really appreciate it.
 
Hello again,

Two different pools because I made a mistake, I think... In my mind, if I split the two RAID 1 arrays at the PERC controller level instead of making a RAID 10, I gain some safety, because the second array will stay online even if both disks of the first array fail. For performance I lose a little, but it is not significant for my usage.
Also, I thought it would be clearer if I made two pools on the nodes, one for online and one for offline VMs.

What kind of PERC do you have? I also have an H330 and I do not use it in RAID mode (I use it in JBOD/HBA mode), because RAID mode is not so good for ZFS.
You could have one pool, but with different datasets, like data-online, data-offline, short-backups (short retention time), and so on. Each dataset can be configured with different ZFS properties (like a different volblocksize). In my case I do not give the pool directly to Proxmox; I use datasets instead.
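
A rough sketch of how that could look (dataset names as in the text above; the storage IDs are just examples):

Code:
# separate datasets on one pool, each with its own properties
zfs create VM-STOCKAGE/data-online
zfs create VM-STOCKAGE/data-offline
zfs create -o compression=lz4 VM-STOCKAGE/short-backups
# add them to Proxmox as separate ZFS storages
pvesm add zfspool data-online --pool VM-STOCKAGE/data-online
pvesm add zfspool data-offline --pool VM-STOCKAGE/data-offline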

I have never reduced a vdisk; is there a better way to do this?

gparted live-cd
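
A rough outline of the order of operations (the 1500G target below is only a made-up example; take a backup first, as you planned):

Code:
# 1. inside the guest (e.g. from the GParted live CD): shrink the filesystem/LVM and the partition first
# 2. only then shrink the zvol on the node; going below what the guest still uses destroys data
zfs set volsize=1500G VM-STOCKAGE/vm-105-disk-1
# 3. let Proxmox pick up the new size in the VM config
qm rescan --vmid 105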

Good luck!
 
I have a PERC H730 with 1 GB cache. I made two RAID 1 arrays on it and then created the ZFS pools with:

zpool create -f VM-STOCKAGE /dev/sdb
zpool create -f VM-STOCKAGE2 /dev/sdc

I have never used datasets. In fact, I didn't know they existed. I will ask our friend Google!

Thanks again! (and again...)
 
