Slow backup restore to DRBD9 storage

dg1000

New Member
Oct 5, 2021
Dear Proxmox friends.

My problem is that restoring a VM from backup files to my DRBD storage seems very slow. For example, restoring my last VM backup, which has an 80 GB hard disk, took more than 7 hours.
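Just for scale: 80 GB in more than 7 hours averages out to roughly 80,000 MB / 25,200 s ≈ 3.2 MB/s, far below both the 1 Gb/s replication link (~112 MB/s theoretical) and what the disks should manage sequentially.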

My Proxmox configuration is a 3-node cluster, each node with a 1 TB 7.2k SATA disk and 32 GB RAM, running DRBD9 with 3-node redundancy.

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Code:
# cat /proc/drbd
version: 9.1.7 (api:2/proto:110-121)

Code:
# drbdadm -v
Version: 9.21.1 (api:2)
GIT-hash: 900bbac4cc81befb630692c1378ed59b345e087f build by @buildsystem, 2022-04-27 06:14:20

Code:
# linstor -v
linstor 1.13.1; GIT-hash: 18167ee4f931b1dbde13c84eaf9b7c31ae48c98b

Code:
# linstor n l
╭────────────────────────────────────────────────────────╮
┊ Node  ┊ NodeType ┊ Addresses                  ┊ State  ┊
╞════════════════════════════════════════════════════════╡
┊ node1 ┊ COMBINED ┊ 192.168.130.1:3366 (PLAIN) ┊ Online ┊
┊ node2 ┊ COMBINED ┊ 192.168.130.2:3366 (PLAIN) ┊ Online ┊
┊ node3 ┊ COMBINED ┊ 192.168.130.3:3366 (PLAIN) ┊ Online ┊
╰────────────────────────────────────────────────────────╯
LINSTOR ==> rg l
╭───────────────────────────────────────────────────────────────────╮
┊ ResourceGroup   ┊ SelectFilter             ┊ VlmNrs ┊ Description ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp      ┊ PlaceCount: 2            ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ drbdMypoolThree ┊ PlaceCount: 3            ┊ 0      ┊             ┊
┊                 ┊ StoragePool(s): drbdpool ┊        ┊             ┊
╰───────────────────────────────────────────────────────────────────╯

LINSTOR ==> sp l
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node  ┊ Driver   ┊ PoolName              ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊                       ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ drbdpool             ┊ node1 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊   845.65 GiB ┊       930 GiB ┊ True         ┊ Ok    ┊            ┊
┊ drbdpool             ┊ node2 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊   849.93 GiB ┊       930 GiB ┊ True         ┊ Ok    ┊            ┊
┊ drbdpool             ┊ node3 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊   845.37 GiB ┊       930 GiB ┊ True         ┊ Ok    ┊            ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

LINSTOR ==> rd l
╭────────────────────────────────────────────────╮
┊ ResourceName  ┊ Port ┊ ResourceGroup   ┊ State ┊
╞════════════════════════════════════════════════╡
┊ linstor_db    ┊ 7000 ┊ DfltRscGrp      ┊ ok    ┊
┊ vm-100-disk-1 ┊ 7001 ┊ drbdMypoolThree ┊ ok    ┊
┊ vm-103-disk-1 ┊ 7002 ┊ drbdMypoolThree ┊ ok    ┊
╰────────────────────────────────────────────────╯

Configuration of the backed-up VM, from the .zst file stored on the local disk (vzdump-qemu-101-2022_06_06_-12_34_13.vma.zst):
Code:
boot: order=virtio0;ide2;net0
cores: 2
ide2: local:iso/ubuntu-14.04.6-server-amd64.iso,media=cdrom
memory: 2048
name: RadiusManager
net0: virtio=3E:3F:C5:B9:93:30,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=50ec9c8a-5ba9-4fd2-b108-2e48a2a6d47d
sockets: 1
virtio0: local-lvm:vm-101-disk-0,size=80G
vmgenid: 915ea86d-5018-4970-a35c-c7e0b93cf37e
#qmdump#map:virtio0:drive-virtio0:local-lvm:raw:

atop on all nodes while the restore runs on node2, writing to drbdpool (/dev/sdb):

node1
Code:
DSK |          sdb | busy     22% | read      38  | write   8058 | KiB/r    128 | KiB/w      8 | MBr/s    0.5 | MBw/s    6.4  | avq     8.42 | avio 0.27 ms |
DSK |          sda | busy      3% | read       6  | write     17 | KiB/r    128 | KiB/w     39 | MBr/s    0.1 | MBw/s    0.1  | avq     1.34 | avio 12.7 ms |
NET | transport    | tcpi    9510 | tcpo   11217  | udpi    1213 | udpo    1232 | tcpao      4 | tcppo      0 | tcprs      0  | tcpie      0 | udpie      0 |
NET | network      | ipi    10743 | ipo    12446  | ipfrw      0 | deliv  10743 |              |              |               | icmpi      0 | icmpo      0 |
NET | eno2    ---- | pcki   27427 | pcko   11154  | sp    0 Mbps | si   28 Mbps | so  889 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |
NET | eno1    ---- | pcki    1300 | pcko    1302  | sp    0 Mbps | si  226 Kbps | so  230 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |

node2 (sdb is busy 99%)
Code:
DSK |          sdb | busy     99% | read     124  | write   5238 | KiB/r    128 | KiB/w     12 | MBr/s    1.6 | MBw/s    6.2  | avq     5.59 | avio 1.85 ms |
DSK |          sda | busy      8% | read      19  | write    124 | KiB/r    128 | KiB/w      8 | MBr/s    0.2 | MBw/s    0.1  | avq     4.58 | avio 5.29 ms |
NET | transport    | tcpi   20574 | tcpo   56277  | udpi    1071 | udpo    1068 | tcpao      4 | tcppo      6 | tcprs      8  | tcpie      0 | udpie      0 |
NET | network      | ipi    21672 | ipo    12307  | ipfrw      0 | deliv  21672 |              |              |               | icmpi      0 | icmpo      0 |
NET | eno2    ---- | pcki   20623 | pcko   56198  | sp    0 Mbps | si 1906 Kbps | so   57 Mbps | erri       0 | erro       0  | drpi       0 | drpo       0 |
NET | eno1    ---- | pcki    1223 | pcko    1191  | sp    0 Mbps | si  200 Kbps | so  209 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |

node3
Code:
DSK |          sdb | busy     16% | read      22  | write   8055 | KiB/r    128 | KiB/w      8 | MBr/s    0.3 | MBw/s    6.4  | avq     9.08 | avio 0.20 ms |
DSK |          sda | busy      1% | read       0  | write     18 | KiB/r      0 | KiB/w      6 | MBr/s    0.0 | MBw/s    0.0  | avq     1.77 | avio 3.56 ms |
NET | transport    | tcpi    7250 | tcpo    9482  | udpi    1136 | udpo    1123 | tcpao      2 | tcppo      8 | tcprs      1  | tcpie      0 | udpie      0 |
NET | network      | ipi     8422 | ipo    10476  | ipfrw      0 | deliv   8413 |              |              |               | icmpi      0 | icmpo      0 |
NET | eno2    ---- | pcki   27852 | pcko    9373  | sp    0 Mbps | si   28 Mbps | so  967 Kbps | erri       0 | erro       0  | drpi       0 | drpo       0 |

Task Viewer restore log:
Code:
restore vma archive: lzop -d -c /var/lib/vz/dump/vzdump-qemu-103-2022_06_07-15_05_30.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp5363.fifo - /var/tmp/vzdumptmp5363
CFG: size: 397 name: qemu-server.conf
DEV: dev_id=1 size: 85901922304 devname: drive-virtio0
CTIME: Tue Jun  7 15:05:32 2022
new volume ID is 'drbdstorage:vm-114-disk-1'
map 'drive-virtio0' to '/dev/drbd/by-res/vm-114-disk-1/0' (write zeros = 1)
progress 1% (read 859045888 bytes, duration 6 sec)
progress 2% (read 1718091776 bytes, duration 13 sec)
progress 3% (read 2577072128 bytes, duration 19 sec)
progress 4% (read 3436118016 bytes, duration 24 sec)
progress 5% (read 4295098368 bytes, duration 30 sec)
progress 6% (read 5154144256 bytes, duration 35 sec)
progress 7% (read 6013190144 bytes, duration 184 sec)
progress 8% (read 6872170496 bytes, duration 469 sec)
...
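
While the restore runs I could also watch the replication itself, to see whether it is the peers or the local disk that falls behind. A minimal sketch (vm-114-disk-1 is the resource name from this restore log):
Code:
# live view of the DRBD resource being restored, including
# send/receive and out-of-sync counters, refreshed every 2 seconds
watch -n 2 'drbdsetup status vm-114-disk-1 --verbose --statistics'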

hdparm on /dev/sdb on all nodes:
Code:
NODE1
hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   12700 MB in  1.99 seconds = 6375.54 MB/sec
 Timing buffered disk reads: 250 MB in  3.00 seconds =  83.26 MB/sec

NODE2
hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   13100 MB in  1.99 seconds = 6576.21 MB/sec
 Timing buffered disk reads: 590 MB in  3.00 seconds = 196.56 MB/sec


NODE3
hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   15770 MB in  1.99 seconds = 7911.57 MB/sec
 Timing buffered disk reads: 464 MB in  3.06 seconds = 151.60 MB/sec
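
hdparm only measures reads, and the restore is write-bound, so to rule out the raw write speed of sdb I could run a quick sequential write test against a throwaway thin LV on the same pool, without DRBD in the path. A rough sketch (the LV name writetest is just an example and assumes ~4 GiB of free space in the pool):
Code:
# create a small temporary thin LV on the pool backing drbdpool
lvcreate -V 4G -T drbdpool/drbdthinpool -n writetest
# sequential write, bypassing the page cache
dd if=/dev/zero of=/dev/drbdpool/writetest bs=1M count=4096 oflag=direct status=progress
# remove the test LV again
lvremove -y drbdpool/writetest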

The dedicated LOM for DRBD is connected at 1 Gb/s:
Code:
NODE1# ethtool eno2 | grep -i speed
        Speed: 1000Mb/s

NODE2# ethtool eno2 | grep -i speed
        Speed: 1000Mb/s

NODE3# ethtool eno2 | grep -i speed
        Speed: 1000Mb/s
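
To confirm the dedicated DRBD link actually delivers close to line rate, I could also run an iperf3 test over the 192.168.130.x network (assuming iperf3 is installed on the nodes, e.g. via apt install iperf3):
Code:
# on node2: start the server
iperf3 -s
# on node1: test towards node2's DRBD address, both directions
iperf3 -c 192.168.130.2 -t 30
iperf3 -c 192.168.130.2 -t 30 -R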

I modified /etc/drbd.d/global_common.conf on all nodes with some parameters I found in forums, but I'm not sure whether global_common.conf is actually honored under DRBD9.
Code:
global {
        usage-count yes;

        # Decide what kind of udev symlinks you want for "implicit" volumes
        # (those without explicit volume <vnr> {} block, implied vnr=0):
        # /dev/drbd/by-resource/<resource>/<vnr>   (explicit volumes)
        # /dev/drbd/by-resource/<resource>         (default for implicit)
        udev-always-use-vnr; # treat implicit the same as explicit volumes

        # minor-count dialog-refresh disable-ip-verification
        # cmd-timeout-short 5; cmd-timeout-medium 121; cmd-timeout-long 600;
}

common {
        handlers {
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when choosing your poison.

                # IMPORTANT: most of the following scripts symlink to "notify.sh" which tries to send mail via "mail".
                # If you intend to use this notify.sh script make sure that "mail" is installed.
                #
                # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reb>
                # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reb>
                # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
                # quorum-lost "/usr/lib/drbd/notify-quorum-lost.sh root";
                # disconnected /bin/true;
        }

        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }

        options {
                # cpu-mask on-no-data-accessible

                # RECOMMENDED for three or more storage nodes with DRBD 9:
                # quorum majority;
                # on-no-quorum suspend-io | io-error;
        }
        disk {
                # size on-io-error fencing disk-barrier disk-flushes
                # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
                on-io-error detach;
                no-disk-flushes;
                no-disk-barrier;
                c-plan-ahead 0;
                c-fill-target 24M;
                c-min-rate 80M;
                c-max-rate 720M;
            }

        net {
                # protocol timeout max-epoch-size max-buffers
                # connect-int ping-int sndbuf-size rcvbuf-size ko-count
                # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
                # after-sb-1pri after-sb-2pri always-asbp rr-conflict
                # ping-timeout data-integrity-alg tcp-cork on-congestion
                # congestion-fill congestion-extents csums-alg verify-alg
                # use-rle
                max-buffers 36k;
                #max-epoch-size 20000;
                sndbuf-size 1024k;
                rcvbuf-size 2048k;
            }
}
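
To check whether these settings are actually applied to the LINSTOR-managed resources, or overridden by the configuration LINSTOR generates, I suppose I can dump the runtime options of one resource, e.g.:
Code:
# show the configuration DRBD is actually using for one resource
drbdsetup show vm-103-disk-1
# the net {} and disk {} sections show whether max-buffers, sndbuf-size,
# c-max-rate, etc. from global_common.conf are in effect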

What do you think about my config? Is this restore speed normal? Is it possible that the restore speed is limited by one of the hard disks, or am I missing some extra configuration for disk synchronization?

Thanks in advance.
 