Dear Proxmox friends.
My problem is that restoring a VM from a backup file to my DRBD storage seems very slow. For example, restoring my last VM backup, which has an 80 GB hard disk, took more than 7 hours.
My Proxmox setup is a 3-node cluster (each node with a 1 TB 7.2k SATA disk and 32 GB RAM) running DRBD9 with 3-node redundancy.
Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
Code:
# cat /proc/drbd
version: 9.1.7 (api:2/proto:110-121)
Code:
# drbdadm -v
Version: 9.21.1 (api:2)
GIT-hash: 900bbac4cc81befb630692c1378ed59b345e087f build by @buildsystem, 2022-04-27 06:14:20
Code:
# linstor -v
linstor 1.13.1; GIT-hash: 18167ee4f931b1dbde13c84eaf9b7c31ae48c98b
Code:
# linstor n l
╭────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞════════════════════════════════════════════════════════╡
┊ node1 ┊ COMBINED ┊ 192.168.130.1:3366 (PLAIN) ┊ Online ┊
┊ node2 ┊ COMBINED ┊ 192.168.130.2:3366 (PLAIN) ┊ Online ┊
┊ node3 ┊ COMBINED ┊ 192.168.130.3:3366 (PLAIN) ┊ Online ┊
╰────────────────────────────────────────────────────────╯
LINSTOR ==> rg l
╭───────────────────────────────────────────────────────────────────╮
┊ ResourceGroup ┊ SelectFilter ┊ VlmNrs ┊ Description ┊
╞═══════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp ┊ PlaceCount: 2 ┊ ┊ ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ drbdMypoolThree ┊ PlaceCount: 3 ┊ 0 ┊ ┊
┊ ┊ StoragePool(s): drbdpool ┊ ┊ ┊
╰───────────────────────────────────────────────────────────────────╯
LINSTOR ==> sp l
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ node1 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ node2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ DfltDisklessStorPool ┊ node3 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ drbdpool ┊ node1 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊ 845.65 GiB ┊ 930 GiB ┊ True ┊ Ok ┊ ┊
┊ drbdpool ┊ node2 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊ 849.93 GiB ┊ 930 GiB ┊ True ┊ Ok ┊ ┊
┊ drbdpool ┊ node3 ┊ LVM_THIN ┊ drbdpool/drbdthinpool ┊ 845.37 GiB ┊ 930 GiB ┊ True ┊ Ok ┊ ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
LINSTOR ==> rd l
╭────────────────────────────────────────────────╮
┊ ResourceName ┊ Port ┊ ResourceGroup ┊ State ┊
╞════════════════════════════════════════════════╡
┊ linstor_db ┊ 7000 ┊ DfltRscGrp ┊ ok ┊
┊ vm-100-disk-1 ┊ 7001 ┊ drbdMypoolThree ┊ ok ┊
┊ vm-103-disk-1 ┊ 7002 ┊ drbdMypoolThree ┊ ok ┊
╰────────────────────────────────────────────────╯
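Since the LINSTOR pools are LVM thin, I can also look at the backing thin pool directly. A quick sketch (the VG/thin pool names drbdpool/drbdthinpool are the ones from the storage-pool list above):
Code:
# Data% / Meta% of the thin pool and its thin volumes on each node
lvs -a drbdpool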
VM configuration from the backup (the .zst backup file is stored on a local disk: vzdump-qemu-101-2022_06_06_-12_34_13.vma.zst):
Code:
boot: order=virtio0;ide2;net0
cores: 2
ide2: local:iso/ubuntu-14.04.6-server-amd64.iso,media=cdrom
memory: 2048
name: RadiusManager
net0: virtio=3E:3F:C5:B9:93:30,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=50ec9c8a-5ba9-4fd2-b108-2e48a2a6d47d
sockets: 1
virtio0: local-lvm:vm-101-disk-0,size=80G
vmgenid: 915ea86d-5018-4970-a35c-c7e0b93cf37e
#qmdump#map:virtio0:drive-virtio0:local-lvm:raw:
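For reference, this is roughly how the same restore can be started from the CLI instead of the GUI (just a sketch; target VMID 114 and the storage name drbdstorage are taken from the task log further down):
Code:
# restore the compressed vma archive onto the DRBD-backed storage
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2022_06_06_-12_34_13.vma.zst 114 --storage drbdstorage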
atop output on all nodes while a restore onto drbdpool (/dev/sdb) is running on node2:
node1
Code:
DSK |
DSK | sdb | busy 22% | read 38 | write 8058 | KiB/r 128 | KiB/w 8 | MBr/s 0.5 | MBw/s 6.4 | avq 8.42 | avio 0.27 ms |
DSK | sda | busy 3% | read 6 | write 17 | KiB/r 128 | KiB/w 39 | MBr/s 0.1 | MBw/s 0.1 | avq 1.34 | avio 12.7 ms |
NET | transport | tcpi 9510 | tcpo 11217 | udpi 1213 | udpo 1232 | tcpao 4 | tcppo 0 | tcprs 0 | tcpie 0 | udpie 0 |
NET | network | ipi 10743 | ipo 12446 | ipfrw 0 | deliv 10743 | | | | icmpi 0 | icmpo 0 |
NET | eno2 ---- | pcki 27427 | pcko 11154 | sp 0 Mbps | si 28 Mbps | so 889 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | eno1 ---- | pcki 1300 | pcko 1302 | sp 0 Mbps | si 226 Kbps | so 230 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
node2 (sdb is busy 99%)
Code:
DSK | sdb | busy 99% | read 124 | write 5238 | KiB/r 128 | KiB/w 12 | MBr/s 1.6 | MBw/s 6.2 | avq 5.59 | avio 1.85 ms |
DSK | sda | busy 8% | read 19 | write 124 | KiB/r 128 | KiB/w 8 | MBr/s 0.2 | MBw/s 0.1 | avq 4.58 | avio 5.29 ms |
NET | transport | tcpi 20574 | tcpo 56277 | udpi 1071 | udpo 1068 | tcpao 4 | tcppo 6 | tcprs 8 | tcpie 0 | udpie 0 |
NET | network | ipi 21672 | ipo 12307 | ipfrw 0 | deliv 21672 | | | | icmpi 0 | icmpo 0 |
NET | eno2 ---- | pcki 20623 | pcko 56198 | sp 0 Mbps | si 1906 Kbps | so 57 Mbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | eno1 ---- | pcki 1223 | pcko 1191 | sp 0 Mbps | si 200 Kbps | so 209 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
node3
Code:
DSK | sdb | busy 16% | read 22 | write 8055 | KiB/r 128 | KiB/w 8 | MBr/s 0.3 | MBw/s 6.4 | avq 9.08 | avio 0.20 ms |
DSK | sda | busy 1% | read 0 | write 18 | KiB/r 0 | KiB/w 6 | MBr/s 0.0 | MBw/s 0.0 | avq 1.77 | avio 3.56 ms |
NET | transport | tcpi 7250 | tcpo 9482 | udpi 1136 | udpo 1123 | tcpao 2 | tcppo 8 | tcprs 1 | tcpie 0 | udpie 0 |
NET | network | ipi 8422 | ipo 10476 | ipfrw 0 | deliv 8413 | | | | icmpi 0 | icmpo 0 |
NET | eno2 ---- | pcki 27852 | pcko 9373 | sp 0 Mbps | si 28 Mbps | so 967 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |
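The atop lines above are single snapshots; for a continuous view of utilization and latency on sdb while the restore runs, I could also use something like this on node2 (assuming the sysstat package is installed):
Code:
# extended per-device statistics in MB, refreshed every 2 seconds
iostat -xm sdb 2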
Task Viewer restore log:
Code:
restore vma archive: lzop -d -c /var/lib/vz/dump/vzdump-qemu-103-2022_06_07-15_05_30.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp5363.fifo - /var/tmp/vzdumptmp5363
CFG: size: 397 name: qemu-server.conf
DEV: dev_id=1 size: 85901922304 devname: drive-virtio0
CTIME: Tue Jun 7 15:05:32 2022
new volume ID is 'drbdstorage:vm-114-disk-1'
map 'drive-virtio0' to '/dev/drbd/by-res/vm-114-disk-1/0' (write zeros = 1)
progress 1% (read 859045888 bytes, duration 6 sec)
progress 2% (read 1718091776 bytes, duration 13 sec)
progress 3% (read 2577072128 bytes, duration 19 sec)
progress 4% (read 3436118016 bytes, duration 24 sec)
progress 5% (read 4295098368 bytes, duration 30 sec)
progress 6% (read 5154144256 bytes, duration 35 sec)
progress 7% (read 6013190144 bytes, duration 184 sec)
progress 8% (read 6872170496 bytes, duration 469 sec)
.
.
.
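While the restore is running I can also watch the DRBD replication state of the target resource (a sketch; the resource name vm-114-disk-1 is taken from the restore log above):
Code:
# live connection/replication state of the resource being restored
watch -n 2 drbdadm status vm-114-disk-1
# more detail, including network send/receive counters
drbdsetup status vm-114-disk-1 --verbose --statistics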
hdparm on all nodes against /dev/sdb:
Code:
NODE1
hdparm -tT /dev/sdb
/dev/sdb:
Timing cached reads: 12700 MB in 1.99 seconds = 6375.54 MB/sec
Timing buffered disk reads: 250 MB in 3.00 seconds = 83.26 MB/sec
NODE2
hdparm -tT /dev/sdb
/dev/sdb:
Timing cached reads: 13100 MB in 1.99 seconds = 6576.21 MB/sec
Timing buffered disk reads: 590 MB in 3.00 seconds = 196.56 MB/sec
NODE3
hdparm -tT /dev/sdb
/dev/sdb:
Timing cached reads: 15770 MB in 1.99 seconds = 7911.57 MB/sec
Timing buffered disk reads: 464 MB in 3.06 seconds = 151.60 MB/sec
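hdparm only measures reads. To rule out slow writes on the DRBD path itself, I was thinking of spawning a small scratch resource and writing to it directly (a sketch; the resource name writetest is made up, and the dd only overwrites that scratch volume):
Code:
# throw-away 3-way replicated test resource in the existing resource group
linstor resource-group spawn-resources drbdMypoolThree writetest 2G
# sequential direct write through the DRBD device (bypasses the page cache)
dd if=/dev/zero of=/dev/drbd/by-res/writetest/0 bs=1M count=1024 oflag=direct
# clean up afterwards
linstor resource-definition delete writetest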
The dedicated LOM for DRBD is connected at 1 Gb:
Code:
NODE1# ethtool eno2 | grep -i speed
Speed: 1000Mb/s
NODE2# ethtool eno2 | grep -i speed
Speed: 1000Mb/s
NODE3# ethtool eno2 | grep -i speed
Speed: 1000Mb/s
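To confirm the dedicated link really delivers 1 Gb line rate between the nodes, an iperf3 run over the DRBD addresses (the 192.168.130.x addresses from the node list above) should help; iperf3 may need to be installed first:
Code:
# on node2
iperf3 -s
# on node1: test towards node2 over the DRBD network
iperf3 -c 192.168.130.2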
I modified /etc/drbd.d/global_common.conf on all nodes with some parameters I found in the forums, but I'm not sure whether global_common.conf is actually applied under DRBD9 with LINSTOR.
Code:
global {
    usage-count yes;
    # Decide what kind of udev symlinks you want for "implicit" volumes
    # (those without explicit volume <vnr> {} block, implied vnr=0):
    # /dev/drbd/by-resource/<resource>/<vnr> (explicit volumes)
    # /dev/drbd/by-resource/<resource> (default for implicit)
    udev-always-use-vnr; # treat implicit the same as explicit volumes
    # minor-count dialog-refresh disable-ip-verification
    # cmd-timeout-short 5; cmd-timeout-medium 121; cmd-timeout-long 600;
}
common {
    handlers {
        # These are EXAMPLE handlers only.
        # They may have severe implications,
        # like hard resetting the node under certain circumstances.
        # Be careful when choosing your poison.
        # IMPORTANT: most of the following scripts symlink to "notify.sh" which tries to send mail via "mail".
        # If you intend to use this notify.sh script make sure that "mail" is installed.
        #
        # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reb>
        # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reb>
        # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        # quorum-lost "/usr/lib/drbd/notify-quorum-lost.sh root";
        # disconnected /bin/true;
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }
    options {
        # cpu-mask on-no-data-accessible
        # RECOMMENDED for three or more storage nodes with DRBD 9:
        # quorum majority;
        # on-no-quorum suspend-io | io-error;
    }
    disk {
        # size on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
        # c-plan-ahead c-delay-target c-fill-target c-max-rate
        # c-min-rate disk-timeout
        on-io-error detach;
        no-disk-flushes;
        no-disk-barrier;
        c-plan-ahead 0;
        c-fill-target 24M;
        c-min-rate 80M;
        c-max-rate 720M;
    }
    net {
        # protocol timeout max-epoch-size max-buffers
        # connect-int ping-int sndbuf-size rcvbuf-size ko-count
        # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
        # after-sb-1pri after-sb-2pri always-asbp rr-conflict
        # ping-timeout data-integrity-alg tcp-cork on-congestion
        # congestion-fill congestion-extents csums-alg verify-alg
        # use-rle
        max-buffers 36k;
        #max-epoch-size 20000;
        sndbuf-size 1024k;
        rcvbuf-size 2048k;
    }
}
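One way to check whether any of this is actually applied might be to ask the kernel which options are currently active for one of the resources (a sketch; vm-100-disk-1 is taken from the resource list above):
Code:
# net/disk options DRBD is currently using for this resource
drbdsetup show vm-100-disk-1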
What do you think about my configuration? Is this restore speed normal? Is it possible that the restore speed is limited by one of the hard disks, or am I missing some extra configuration for disk synchronization?
Thanks in advance.