Slow restore of backups to DRBD9 storage.

dg1000

Dear Proxmox friends.

I think restoring VMs from backup files to my DRBD storage is very slow. For example, my last restore of a VM backup with an 80 GB disk took more than 7 hours.
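
As a rough sanity check on those figures, the sketch below computes the average write rate, assuming the disk is 80 GiB and taking exactly 7 hours as the duration (the restore took longer, so the real average is lower). The name `avg_restore_rate` is made up for this illustration.

```shell
# Average restore rate in MiB/s for a disk of $1 GiB restored in $2 hours.
# avg_restore_rate is a hypothetical helper name, not part of any tool.
avg_restore_rate() {
    awk -v gib="$1" -v hours="$2" \
        'BEGIN { printf "%.2f MiB/s\n", gib * 1024 / (hours * 3600) }'
}

avg_restore_rate 80 7   # prints: 3.25 MiB/s
```

A few MiB/s is far below the buffered read rates hdparm reports for these disks further down, which suggests the restore is not a plain sequential workload.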

My Proxmox setup is a 3-node cluster. Each node has a 1 TB 7.2k rpm SATA disk and 32 GB RAM, and DRBD9 replicates the data across all 3 nodes.

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Code:
# cat /proc/drbd
version: 9.1.7 (api:2/proto:110-121)

Code:
# drbdadm -v
Version: 9.21.1 (api:2)
GIT-hash: 900bbac4cc81befb630692c1378ed59b345e087f build by @buildsystem, 2022-04-27 06:14:20

Code:
# linstor -v
linstor 1.13.1; GIT-hash: 18167ee4f931b1dbde13c84eaf9b7c31ae48c98b

Code:
# linstor n l
╭────────────────────────────────────────────────────────╮
┊ Node  ┊ NodeType ┊ Addresses                  ┊ State  ┊
╞════════════════════════════════════════════════════════╡
┊ node1 ┊ COMBINED ┊ 192.168.130.1:3366 (PLAIN) ┊ Online ┊
┊ node2 ┊ COMBINED ┊ 192.168.130.2:3366 (PLAIN) ┊ Online ┊
┊ node3 ┊ COMBINED ┊ 192.168.130.3:3366 (PLAIN) ┊ Online ┊
╰────────────────────────────────────────────────────────╯
LINSTOR ==> rg l
╭───────────────────────────────────────────────────────────────────╮
┊ ResourceGroup   ┊ SelectFilter             ┊ VlmNrs ┊ Description ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp      ┊ PlaceCount: 2            ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ drbdMypoolThree ┊ PlaceCount: 3            ┊ 0      ┊             ┊
┊                 ┊ StoragePool(s): drbdpool ┊        ┊             ┊
╰───────────────────────────────────────────────────────────────────╯

LINSTOR ==> sp l
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node  ┊ Driver   ┊ PoolName              ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ drbdpool             ┊ node1 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊   845.65 GiB ┊       930 GiB ┊ True         ┊ Ok    ┊            ┊
┊ drbdpool             ┊ node2 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊   849.93 GiB ┊       930 GiB ┊ True         ┊ Ok    ┊            ┊
┊ drbdpool             ┊ node3 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊   845.37 GiB ┊       930 GiB ┊ True         ┊ Ok    ┊            ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

LINSTOR ==> rd l
╭────────────────────────────────────────────────╮
┊ ResourceName  ┊ Port ┊ ResourceGroup   ┊ State ┊
╞════════════════════════════════════════════════╡
┊ linstor_db    ┊ 7000 ┊ DfltRscGrp      ┊ ok    ┊
┊ vm-100-disk-1 ┊ 7001 ┊ drbdMypoolThree ┊ ok    ┊
┊ vm-103-disk-1 ┊ 7002 ┊ drbdMypoolThree ┊ ok    ┊
╰────────────────────────────────────────────────╯

Configuration of the VM backup (.zst file stored on local disk: vzdump-qemu-101-2022_06_06_-12_34_13.vma.zst):
Code:
boot: order=virtio0;ide2;net0
cores: 2
ide2: local:iso/ubuntu-14.04.6-server-amd64.iso,media=cdrom
memory: 2048
name: RadiusManager
net0: virtio=3E:3F:C5:B9:93:30,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=50ec9c8a-5ba9-4fd2-b108-2e48a2a6d47d
sockets: 1
virtio0: local-lvm:vm-101-disk-0,size=80G
vmgenid: 915ea86d-5018-4970-a35c-c7e0b93cf37e
#qmdump#map:virtio0:drive-virtio0:local-lvm:raw:

atop output on all nodes while a restore started on node2 is writing to drbdpool (/dev/sdb):

node1
Code:
DSK |          sdb | busy     22% | read      38  | write   8058 | KiB/r    128 | KiB/w      8 | MBr/s    0.5 | MBw/s    6.4  | avq     8.42 | avio 0.27 ms |
DSK |          sda | busy      3% | read       6  | write     17 | KiB/r    128 | KiB/w     39 | MBr/s    0.1 | MBw/s    0.1  | avq     1.34 | avio 12.7 ms |
NET | transport    | tcpi    9510 | tcpo   11217  | udpi    1213 | udpo    1232 | tcpao      4 | tcppo      0 | tcprs      0  | tcpie      0 | udpie      0 |
NET | network      | ipi    10743 | ipo    12446  | ipfrw      0 | deliv  10743 |              |              |               | icmpi      0 | icmpo      0 |
NET | eno2    ---- | pcki   27427 | pcko   11154  | sp    0 Mbps | si   28 Mbps | so  889 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |
NET | eno1    ---- | pcki    1300 | pcko    1302  | sp    0 Mbps | si  226 Kbps | so  230 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |

node2 (sdb is busy 99%)
Code:
DSK |          sdb | busy     99% | read     124  | write   5238 | KiB/r    128 | KiB/w     12 | MBr/s    1.6 | MBw/s    6.2  | avq     5.59 | avio 1.85 ms |
DSK |          sda | busy      8% | read      19  | write    124 | KiB/r    128 | KiB/w      8 | MBr/s    0.2 | MBw/s    0.1  | avq     4.58 | avio 5.29 ms |
NET | transport    | tcpi   20574 | tcpo   56277  | udpi    1071 | udpo    1068 | tcpao      4 | tcppo      6 | tcprs      8  | tcpie      0 | udpie      0 |
NET | network      | ipi    21672 | ipo    12307  | ipfrw      0 | deliv  21672 |              |              |               | icmpi      0 | icmpo      0 |
NET | eno2    ---- | pcki   20623 | pcko   56198  | sp    0 Mbps | si 1906 Kbps | so   57 Mbps | erri       0 | erro       0  | drpi       0 | drpo       0 |
NET | eno1    ---- | pcki    1223 | pcko    1191  | sp    0 Mbps | si  200 Kbps | so  209 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |

node3
Code:
DSK |          sdb | busy     16% | read      22  | write   8055 | KiB/r    128 | KiB/w      8 | MBr/s    0.3 | MBw/s    6.4  | avq     9.08 | avio 0.20 ms |
DSK |          sda | busy      1% | read       0  | write     18 | KiB/r      0 | KiB/w      6 | MBr/s    0.0 | MBw/s    0.0  | avq     1.77 | avio 3.56 ms |
NET | transport    | tcpi    7250 | tcpo    9482  | udpi    1136 | udpo    1123 | tcpao      2 | tcppo      8 | tcprs      1  | tcpie      0 | udpie      0 |
NET | network      | ipi     8422 | ipo    10476  | ipfrw      0 | deliv   8413 |              |              |               | icmpi      0 | icmpo      0 |
NET | eno2    ---- | pcki   27852 | pcko    9373  | sp    0 Mbps | si   28 Mbps | so  967 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |
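
The DSK lines can be turned into request rates to see what kind of load sdb carries. The sketch below assumes the counters cover atop's default 10-second interval (the interval actually used is not stated here). `write_profile` is a hypothetical helper name.

```shell
# Convert an atop DSK write counter into requests/s and MiB/s.
# $1 = writes in the interval, $2 = average KiB per write, $3 = interval in s.
# write_profile is a hypothetical helper name used only for this sketch.
write_profile() {
    awk -v n="$1" -v kib="$2" -v t="$3" \
        'BEGIN { printf "%.0f writes/s, %.1f MiB/s\n", n / t, n * kib / 1024 / t }'
}

write_profile 5238 12 10   # node2 sdb, prints: 524 writes/s, 6.1 MiB/s
write_profile 8058 8 10    # node1 sdb, prints: 806 writes/s, 6.3 MiB/s
```

Both results land close to the MBw/s column (6.2 and 6.4), so the 10-second assumption looks plausible. Several hundred small writes per second may well be enough to keep a single 7.2k rpm disk busy, which would fit node2 showing 99% busy.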

Task Viewer restore log:
Code:
restore vma archive: lzop -d -c /var/lib/vz/dump/vzdump-qemu-103-2022_06_07-15_05_30.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp5363.fifo - /var/tmp/vzdumptmp5363
CFG: size: 397 name: qemu-server.conf
DEV: dev_id=1 size: 85901922304 devname: drive-virtio0
CTIME: Tue Jun  7 15:05:32 2022
new volume ID is 'drbdstorage:vm-114-disk-1'
map 'drive-virtio0' to '/dev/drbd/by-res/vm-114-disk-1/0' (write zeros = 1)
progress 1% (read 859045888 bytes, duration 6 sec)
progress 2% (read 1718091776 bytes, duration 13 sec)
progress 3% (read 2577072128 bytes, duration 19 sec)
progress 4% (read 3436118016 bytes, duration 24 sec)
progress 5% (read 4295098368 bytes, duration 30 sec)
progress 6% (read 5154144256 bytes, duration 35 sec)
progress 7% (read 6013190144 bytes, duration 184 sec)
progress 8% (read 6872170496 bytes, duration 469 sec)
.
.
.
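
To see where the restore slows down, the rate of each step can be derived from the progress lines. This is a minimal sketch that assumes the log format shown above and reads it from stdin. `step_rates` is a hypothetical helper name.

```shell
# Print the rate of each progress step of a vma restore log read from stdin.
# Expects lines like: progress 7% (read 6013190144 bytes, duration 184 sec)
# step_rates is a hypothetical helper name used only for this sketch.
step_rates() {
    awk '
        $1 == "progress" {
            bytes = $4 + 0          # cumulative bytes reported so far
            secs  = $7 + 0          # cumulative duration in seconds
            if (seen && secs > prev_secs)
                printf "%s %.1f MiB/s\n", $2,
                       (bytes - prev_bytes) / 1048576 / (secs - prev_secs)
            prev_bytes = bytes; prev_secs = secs; seen = 1
        }'
}

step_rates <<'EOF'
progress 6% (read 5154144256 bytes, duration 35 sec)
progress 7% (read 6013190144 bytes, duration 184 sec)
progress 8% (read 6872170496 bytes, duration 469 sec)
EOF
# prints:
# 7% 5.5 MiB/s
# 8% 2.9 MiB/s
```

Fed the full log, every step up to 6% comes out above 100 MiB/s, and then the rate drops to single digits. The byte counter appears to track the position in the disk image, so these are rates of progress through the image, not exact disk write rates.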

hdparm results for /dev/sdb on each node:
Code:
NODE1
hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   12700 MB in  1.99 seconds = 6375.54 MB/sec
 Timing buffered disk reads: 250 MB in  3.00 seconds =  83.26 MB/sec

NODE2
hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   13100 MB in  1.99 seconds = 6576.21 MB/sec
 Timing buffered disk reads: 590 MB in  3.00 seconds = 196.56 MB/sec


NODE3
hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   15770 MB in  1.99 seconds = 7911.57 MB/sec
 Timing buffered disk reads: 464 MB in  3.06 seconds = 151.60 MB/sec

The dedicated LOM for DRBD is connected at 1 Gb/s:
Code:
NODE1# ethtool eno2 | grep -i speed
        Speed: 1000Mb/s

NODE2# ethtool eno2 | grep -i speed
        Speed: 1000Mb/s

NODE3# ethtool eno2 | grep -i speed
        Speed: 1000Mb/s

I modified /etc/drbd.d/global_common.conf on all nodes with some parameters I found in forums, but I'm not sure whether global_common.conf takes effect under DRBD9.
Code:
global {
        usage-count yes;

        # Decide what kind of udev symlinks you want for "implicit" volumes
        # (those without explicit volume <vnr> {} block, implied vnr=0):
        # /dev/drbd/by-resource/<resource>/<vnr>   (explicit volumes)
        # /dev/drbd/by-resource/<resource>         (default for implict)
        udev-always-use-vnr; # treat implicit the same as explicit volumes

        # minor-count dialog-refresh disable-ip-verification
        # cmd-timeout-short 5; cmd-timeout-medium 121; cmd-timeout-long 600;
}

common {
        handlers {
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when choosing your poison.

                # IMPORTANT: most of the following scripts symlink to "notify.sh" which tries to send mail via "mail".
                # If you intend to use this notify.sh script make sure that "mail" is installed.
                #
                # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reb>
                # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reb>
                # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
                # quorum-lost "/usr/lib/drbd/notify-quorum-lost.sh root";
                # disconnected /bin/true;
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        options {
                # cpu-mask on-no-data-accessible

                # RECOMMENDED for three or more storage nodes with DRBD 9:
                # quorum majority;
                # on-no-quorum suspend-io | io-error;
        }
        disk {
                # size on-io-error fencing disk-barrier disk-flushes
                # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
                on-io-error detach;
                no-disk-flushes;
                no-disk-barrier;
                c-plan-ahead 0;
                c-fill-target 24M;
                c-min-rate 80M;
                c-max-rate 720M;
            }

        net {
                # protocol timeout max-epoch-size max-buffers
                # connect-int ping-int sndbuf-size rcvbuf-size ko-count
                # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
                # after-sb-1pri after-sb-2pri always-asbp rr-conflict
                # ping-timeout data-integrity-alg tcp-cork on-congestion
                # congestion-fill congestion-extents csums-alg verify-alg
                # use-rle
                max-buffers 36k;
                #max-epoch-size 20000;
                sndbuf-size 1024k;
                rcvbuf-size 2048k;
            }
}
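
One way to check whether these settings reached the kernel is to look at the running configuration, which `drbdsetup show <resource>` prints. The sketch below only filters such output for a list of option names. `listed_opts` is a hypothetical helper, and the assumption is that an option missing from the output is running with its default value.

```shell
# Report which option names occur in DRBD configuration text read from stdin,
# for example the output of: drbdsetup show vm-114-disk-1
# listed_opts is a hypothetical helper name used only for this sketch.
listed_opts() {
    conf=$(cat)
    for opt in "$@"; do
        if printf '%s\n' "$conf" | grep -Eq "^[[:space:]]*${opt}[[:space:]]"; then
            echo "$opt: listed"
        else
            echo "$opt: not listed"
        fi
    done
}

# Usage on a node, with the resource name taken from the restore log above:
# drbdsetup show vm-114-disk-1 | listed_opts max-buffers sndbuf-size rcvbuf-size c-max-rate
```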

What do you think about my config? Is this restore speed normal? Is it possible that the restore speed is limited by one of the hard disks, or am I missing some extra configuration for disk synchronization?

Thanks in advance.
 
VictorSTS

This won't help, but...
With two nodes I could understand the motivation for using DRBD, although it is unsupported. But why use DRBD with 3 nodes instead of CEPH?
 

dg1000

VictorSTS said: "This won't help, but... With two nodes I could understand the motivation for using DRBD, although it is unsupported. But why use DRBD with 3 nodes instead of CEPH?"

Hi VictorSTS.

I'm thinking about testing CEPH, but I'm not sure whether my hardware will be enough for it to work well.

I have 3 Dell servers, each with 32 GB RAM, a 1 TB 7.2k rpm SATA disk, and a 1 Gb NIC dedicated to data.

What do you think?
 
VictorSTS

With such hardware you will be limited by the 1 Gb NICs. At the very least, try to get 2x 1 Gb NICs for CEPH and set them up in an LACP bond (check that your switch supports LACP!). Then create two VLANs and split the CEPH public network from the CEPH cluster network. The official documentation for CEPH on Proxmox is quite detailed, and I suggest you read it carefully.
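
For a rough idea of that limit, the sketch below estimates the best-case write rate when every written block has to leave the writing node once per replica peer over a single link. It ignores protocol overhead, and `link_write_ceiling` is a hypothetical helper name.

```shell
# Best-case write rate in MB/s when one link of $1 Mbit/s carries a copy of
# every written block to each of $2 peers. Protocol overhead is ignored.
# link_write_ceiling is a hypothetical helper name used only for this sketch.
link_write_ceiling() {
    awk -v mbit="$1" -v peers="$2" \
        'BEGIN { printf "%.1f MB/s\n", mbit / 8 / peers }'
}

link_write_ceiling 1000 2   # one 1 Gb link, two peers, prints: 62.5 MB/s
link_write_ceiling 2000 2   # 2x 1 Gb bonded, best case, prints: 125.0 MB/s
```

For comparison, the atop output in the first post shows eno2 on node2 sending about 57 Mbps during the restore, so in that run the link was far from this ceiling.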

Another option would be to get dual-port 10 Gb NICs for each server and use a full mesh without a switch.

In any case, try to set up CEPH with your current hardware, as it might be enough for your requirements.
 

dg1000

Quote: "Spinning rust is a clear bottleneck. You may tweak a bit here and there, but do not expect any major improvement with current hardware."

I suspected this could be the problem, but I don't know how to confirm it.

Can you recommend any 3.5" disk model that gives good results for data replication using DRBD/CEPH?
 
VictorSTS

If you are planning on getting new disks, simply get enterprise/datacenter 2.5" SATA SSDs. There is no point in getting new spinning drives, as they will probably give you similar performance to the ones you already have. Do *not* buy consumer/prosumer grade SSDs: they do not perform properly under the workloads that CEPH (or ZFS) produces (mainly sync reads/writes and request coalescing) and will give you poor performance. There are many reports in this forum and elsewhere regarding this.
 
Mar 25, 2022
77
20
8
Another option would be to use DC grade NVMe drives. They are not substantially faster or more expensive than DC grade SATA SSDs, but may save you from trouble with SAS/SATA controllers. If your servers do not support U.2 NVMe, you can look for a PCIe AIC or an M.2 drive with a PCIe adapter.
 
