I've never seen this issue before: I'm doing a restore from a local spinner to a local ZFS volume, and it has dragged the entire host node down to a crawl. I can ping the node, but I can't SSH into it. I've also been seeing some really slow ZFS read and write behavior in general. What used to restore in 5 minutes on ext4 is now taking 40-50 minutes, or longer. I should have more information once the restore is complete and I can get back into the machine, but right now it's killing me. The node and all of the VMs just show a '?' because the UI can't get any details. Thoughts?
What is your ZFS pool configuration? Under heavy load (depending on how slow the setup is) the pool can become almost unusable. To keep SSH working, put the server OS on a separate pool so it doesn't get stuck waiting on IO.
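For example, the usual way to gather that on a PVE node (just the standard commands, nothing specific to your setup):

zpool status               # pool layout and health
zpool list                 # size, fragmentation, capacity
zfs list                   # datasets/zvols and space usage
cat /etc/pve/storage.cfg   # how PVE maps its storages onto the pools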
Here's the output - am I doing something wrong? 36 minutes for ~100 gigs... Worse - it took the entire server down. This is restoring from a local spinner to a local SSD. Virtual Environment 5.2-5 Virtual Machine 100 (host6.x.com) on node 'mox2' Logs () restore vma archive: lzop -d -c /home/backups//dump/vzdump-qemu-100-2018_07_12-18_30_02.vma.lzo | vma extract -v -r /var/tmp/vzdumptmp1215315.fifo - /var/tmp/vzdumptmp1215315 CFG: size: 572 name: qemu-server.conf DEV: dev_id=1 size: 34359738368 devname: drive-virtio0 DEV: dev_id=2 size: 68719476736 devname: drive-virtio1 CTIME: Thu Jul 12 18:30:04 2018 new volume ID is 'Local-ZFS:vm-100-disk-1' map 'drive-virtio0' to '/dev/zvol/Local-ZFS/vm-100-disk-1' (write zeros = 0) new volume ID is 'Local-ZFS:vm-100-disk-2' map 'drive-virtio1' to '/dev/zvol/Local-ZFS/vm-100-disk-2' (write zeros = 0) progress 1% (read 1030815744 bytes, duration 6 sec) progress 2% (read 2061631488 bytes, duration 10 sec) progress 3% (read 3092381696 bytes, duration 15 sec) progress 4% (read 4123197440 bytes, duration 21 sec) progress 5% (read 5154013184 bytes, duration 26 sec) progress 6% (read 6184763392 bytes, duration 31 sec) progress 7% (read 7215579136 bytes, duration 35 sec) progress 8% (read 8246394880 bytes, duration 40 sec) progress 9% (read 9277145088 bytes, duration 47 sec) progress 10% (read 10307960832 bytes, duration 53 sec) progress 11% (read 11338776576 bytes, duration 59 sec) progress 12% (read 12369526784 bytes, duration 65 sec) progress 13% (read 13400342528 bytes, duration 71 sec) progress 14% (read 14431092736 bytes, duration 79 sec) progress 15% (read 15461908480 bytes, duration 85 sec) progress 16% (read 16492724224 bytes, duration 91 sec) progress 17% (read 17523474432 bytes, duration 97 sec) progress 18% (read 18554290176 bytes, duration 103 sec) progress 19% (read 19585105920 bytes, duration 109 sec) progress 20% (read 20615856128 bytes, duration 116 sec) progress 21% (read 21646671872 bytes, duration 123 sec) progress 22% (read 22677487616 bytes, duration 130 sec) progress 23% (read 23708237824 bytes, duration 136 sec) progress 24% (read 24739053568 bytes, duration 143 sec) progress 25% (read 25769803776 bytes, duration 169 sec) progress 26% (read 26800619520 bytes, duration 202 sec) progress 27% (read 27831435264 bytes, duration 235 sec) progress 28% (read 28862185472 bytes, duration 263 sec) progress 29% (read 29893001216 bytes, duration 291 sec) progress 30% (read 30923816960 bytes, duration 291 sec) progress 31% (read 31954567168 bytes, duration 291 sec) progress 32% (read 32985382912 bytes, duration 291 sec) progress 33% (read 34016198656 bytes, duration 291 sec) progress 34% (read 35046948864 bytes, duration 310 sec) progress 35% (read 36077764608 bytes, duration 356 sec) progress 36% (read 37108580352 bytes, duration 382 sec) progress 37% (read 38139330560 bytes, duration 423 sec) progress 38% (read 39170146304 bytes, duration 445 sec) progress 39% (read 40200896512 bytes, duration 474 sec) progress 40% (read 41231712256 bytes, duration 486 sec) progress 41% (read 42262528000 bytes, duration 521 sec) progress 42% (read 43293278208 bytes, duration 554 sec) progress 43% (read 44324093952 bytes, duration 583 sec) progress 44% (read 45354909696 bytes, duration 609 sec) progress 45% (read 46385659904 bytes, duration 653 sec) progress 46% (read 47416475648 bytes, duration 678 sec) progress 47% (read 48447291392 bytes, duration 703 sec) progress 48% (read 49478041600 bytes, duration 738 sec) progress 49% (read 50508857344 bytes, duration 761 
sec) progress 50% (read 51539607552 bytes, duration 792 sec) progress 51% (read 52570423296 bytes, duration 819 sec) progress 52% (read 53601239040 bytes, duration 868 sec) progress 53% (read 54631989248 bytes, duration 896 sec) progress 54% (read 55662804992 bytes, duration 922 sec) progress 55% (read 56693620736 bytes, duration 961 sec) progress 56% (read 57724370944 bytes, duration 986 sec) progress 57% (read 58755186688 bytes, duration 1022 sec) progress 58% (read 59786002432 bytes, duration 1046 sec) progress 59% (read 60816752640 bytes, duration 1072 sec) progress 60% (read 61847568384 bytes, duration 1106 sec) progress 61% (read 62878384128 bytes, duration 1129 sec) progress 62% (read 63909134336 bytes, duration 1165 sec) progress 63% (read 64939950080 bytes, duration 1187 sec) progress 64% (read 65970700288 bytes, duration 1213 sec) progress 65% (read 67001516032 bytes, duration 1250 sec) progress 66% (read 68032331776 bytes, duration 1276 sec) progress 67% (read 69063081984 bytes, duration 1302 sec) progress 68% (read 70093897728 bytes, duration 1325 sec) progress 69% (read 71124713472 bytes, duration 1364 sec) progress 70% (read 72155463680 bytes, duration 1376 sec) progress 71% (read 73186279424 bytes, duration 1405 sec) progress 72% (read 74217095168 bytes, duration 1469 sec) progress 73% (read 75247845376 bytes, duration 1479 sec) progress 74% (read 76278661120 bytes, duration 1512 sec) progress 75% (read 77309411328 bytes, duration 1543 sec) progress 76% (read 78340227072 bytes, duration 1575 sec) progress 77% (read 79371042816 bytes, duration 1596 sec) progress 78% (read 80401793024 bytes, duration 1629 sec) progress 79% (read 81432608768 bytes, duration 1651 sec) progress 80% (read 82463424512 bytes, duration 1689 sec) progress 81% (read 83494174720 bytes, duration 1717 sec) progress 82% (read 84524990464 bytes, duration 1743 sec) progress 83% (read 85555806208 bytes, duration 1778 sec) progress 84% (read 86586556416 bytes, duration 1805 sec) progress 85% (read 87617372160 bytes, duration 1836 sec) progress 86% (read 88648187904 bytes, duration 1859 sec) progress 87% (read 89678938112 bytes, duration 1882 sec) progress 88% (read 90709753856 bytes, duration 1914 sec) progress 89% (read 91740504064 bytes, duration 1938 sec) progress 90% (read 92771319808 bytes, duration 1959 sec) progress 91% (read 93802135552 bytes, duration 1988 sec) progress 92% (read 94832885760 bytes, duration 2022 sec) progress 93% (read 95863701504 bytes, duration 2044 sec) progress 94% (read 96894517248 bytes, duration 2075 sec) progress 95% (read 97925267456 bytes, duration 2101 sec) progress 96% (read 98956083200 bytes, duration 2124 sec) progress 97% (read 99986898944 bytes, duration 2151 sec) progress 98% (read 101017649152 bytes, duration 2151 sec) progress 99% (read 102048464896 bytes, duration 2162 sec) progress 100% (read 103079215104 bytes, duration 2162 sec) total bytes read 103079215104, sparse bytes 9391480832 (9.11%) space reduction due to 4K zero blocks 0.202% TASK OK
Hi Nemesiz - not sure if this is helpful or what you are looking for - and I appreciate you reaching out and trying to help! Just a single disk zpool RAID-0.

dir: local
        path /var/lib/vz
        content rootdir,images,vztmpl,iso
        maxfiles 0

dir: Local-Backups
        path /home/backups/
        content iso,images,rootdir,vztmpl,backup
        maxfiles 3

rbd: ILStore1
        content rootdir,images
        krbd 0
        monhost 172.16.0.46 172.16.0.47 172.16.0.48
        nodes mox3,mox0,mox1,mox2
        pool ILStore1
        username admin

zfspool: Local-ZFS
        pool Local-ZFS
        content rootdir,images
        nodes mox1,mox3,mox0,mox2
        sparse 1

dir: Local-SSD-ZFS
        disable
        path /Local-SSD-ZFS/storage
        content iso,vztmpl,images,rootdir
        nodes mox0
        shared 0

zfspool: rpool-SSD
        pool rpool
        content images,rootdir
        nodes mox0
        sparse 1

  pool: Local-ZFS
 state: ONLINE
  scan: scrub repaired 0B in 5h50m with 0 errors on Sun Jul 8 06:14:16 2018
config:
        NAME        STATE     READ WRITE CKSUM
        Local-ZFS   ONLINE       0     0     0
          sdb1      ONLINE       0     0     0
errors: No known data errors

  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0h4m with 0 errors on Sun Jul 8 00:28:28 2018
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda2      ONLINE       0     0     0
errors: No known data errors

NAME                      USED  AVAIL  REFER  MOUNTPOINT
Local-ZFS                 575G   324G   144K  /Local-ZFS
Local-ZFS/dump             96K   324G    96K  /Local-ZFS/dump
Local-ZFS/image            96K   324G    96K  /Local-ZFS/image
Local-ZFS/iso              96K   324G    96K  /Local-ZFS/iso
Local-ZFS/private          96K   324G    96K  /Local-ZFS/private
Local-ZFS/storage          96K   324G    96K  /Local-ZFS/storage
Local-ZFS/template         96K   324G    96K  /Local-ZFS/template
Local-ZFS/vm-100-disk-1   27.8G   324G  27.0G  -
Local-ZFS/vm-100-disk-2   61.0G   324G  61.0G  -
Local-ZFS/vm-109-disk-1   32.3G   324G  32.2G  -
Local-ZFS/vm-109-disk-2    406G   324G   404G  -
Local-ZFS/vm-110-disk-1   23.8G   324G  23.8G  -
Local-ZFS/vm-111-disk-1   23.8G   324G  23.8G  -
rpool                     38.8G   186G    96K  /rpool
rpool/ROOT                7.77G   186G    96K  /rpool/ROOT
rpool/ROOT/pve-1          7.77G   186G  7.77G  /
rpool/swap                30.8G   188G  29.1G  -

root@mox2:~# zpool status
  pool: Local-ZFS
 state: ONLINE
  scan: scrub repaired 0B in 5h50m with 0 errors on Sun Jul 8 06:14:16 2018
config:
        NAME        STATE     READ WRITE CKSUM
        Local-ZFS   ONLINE       0     0     0
          sdb1      ONLINE       0     0     0
errors: No known data errors

  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0h4m with 0 errors on Sun Jul 8 00:28:28 2018
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda2      ONLINE       0     0     0
errors: No known data errors

root@mox2:~# zpool iostat
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
Local-ZFS    573G   355G    518    103  12.0M  6.98M
rpool       37.0G   195G     31     44   212K   597K
----------  -----  -----  -----  -----  -----  -----

root@mox2:~# zpool list
NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Local-ZFS   928G   573G   355G         -    39%    61%  1.00x  ONLINE  -
rpool       232G  37.0G   195G         -    53%    15%  1.00x  ONLINE  -
NAME PROPERTY VALUE SOURCE rpool type filesystem - rpool creation Thu Mar 3 10:11 2016 - rpool used 38.8G - rpool available 186G - rpool referenced 96K - rpool compressratio 1.14x - rpool mounted yes - rpool quota none default rpool reservation none default rpool recordsize 128K default rpool mountpoint /rpool default rpool sharenfs off default rpool checksum on default rpool compression lz4 local rpool atime off local rpool devices on default rpool exec on default rpool setuid on default rpool readonly off default rpool zoned off default rpool snapdir hidden default rpool aclinherit restricted default rpool createtxg 1 - rpool canmount on default rpool xattr on default rpool copies 1 default rpool version 5 - rpool utf8only off - rpool normalization none - rpool casesensitivity sensitive - rpool vscan off default rpool nbmand off default rpool sharesmb off default rpool refquota none default rpool refreservation none default rpool guid 9946285014865877177 - rpool primarycache all default rpool secondarycache all default rpool usedbysnapshots 0B - rpool usedbydataset 96K - rpool usedbychildren 38.8G - rpool usedbyrefreservation 0B - rpool logbias latency default rpool dedup off default rpool mlslabel none default rpool sync standard local rpool dnodesize legacy default rpool refcompressratio 1.00x - rpool written 96K - rpool logicalused 41.6G - rpool logicalreferenced 40K - rpool volmode default default rpool filesystem_limit none default rpool snapshot_limit none default rpool filesystem_count none default rpool snapshot_count none default rpool snapdev hidden default rpool acltype off default rpool context none default rpool fscontext none default rpool defcontext none default rpool rootcontext none default rpool relatime off default rpool redundant_metadata all default rpool overlay off default
And then: Local-ZFS root@mox2:~# zfs get all Local-ZFS NAME PROPERTY VALUE SOURCE Local-ZFS type filesystem - Local-ZFS creation Tue Jun 19 23:55 2018 - Local-ZFS used 573G - Local-ZFS available 326G - Local-ZFS referenced 144K - Local-ZFS compressratio 1.00x - Local-ZFS mounted yes - Local-ZFS quota none default Local-ZFS reservation none default Local-ZFS recordsize 128K default Local-ZFS mountpoint /Local-ZFS default Local-ZFS sharenfs off default Local-ZFS checksum on default Local-ZFS compression off default Local-ZFS atime on default Local-ZFS devices on default Local-ZFS exec on default Local-ZFS setuid on default Local-ZFS readonly off default Local-ZFS zoned off default Local-ZFS snapdir hidden default Local-ZFS aclinherit restricted default Local-ZFS createtxg 1 - Local-ZFS canmount on default Local-ZFS xattr on default Local-ZFS copies 1 default Local-ZFS version 5 - Local-ZFS utf8only off - Local-ZFS normalization none - Local-ZFS casesensitivity sensitive - Local-ZFS vscan off default Local-ZFS nbmand off default Local-ZFS sharesmb off default Local-ZFS refquota none default Local-ZFS refreservation none default Local-ZFS guid 6229891795844742391 - Local-ZFS primarycache all default Local-ZFS secondarycache all default Local-ZFS usedbysnapshots 0B - Local-ZFS usedbydataset 144K - Local-ZFS usedbychildren 573G - Local-ZFS usedbyrefreservation 0B - Local-ZFS logbias latency default Local-ZFS dedup off default Local-ZFS mlslabel none default Local-ZFS sync standard default Local-ZFS dnodesize legacy default Local-ZFS refcompressratio 1.00x - Local-ZFS written 144K - Local-ZFS logicalused 570G - Local-ZFS logicalreferenced 60.5K - Local-ZFS volmode default default Local-ZFS filesystem_limit none default Local-ZFS snapshot_limit none default Local-ZFS filesystem_count none default Local-ZFS snapshot_count none default Local-ZFS snapdev hidden default Local-ZFS acltype off default Local-ZFS context none default Local-ZFS fscontext none default Local-ZFS defcontext none default Local-ZFS rootcontext none default Local-ZFS relatime off default Local-ZFS redundant_metadata all default Local-ZFS overlay off default
I suggest you set sync=disabled to avoid the double write. A single disk is a single disk, and ZFS doesn't prioritize IO between processes. What you have to know:
1. The data goes like this: program -> ZFS write cache (not the ZIL) -> disk.
2. ZFS flushes data from the write cache to disk roughly every 5 seconds.
3. When the write cache is full while a flush to disk is still in progress, programs end up in IO wait (they freeze).
Maybe your SSD is consumer grade and can't handle that much.
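A minimal sketch of how that could be checked and applied from the host shell (pool name Local-ZFS taken from your output, adjust as needed):

# what the pool currently uses
zfs get sync Local-ZFS

# disable synchronous writes for the whole pool
zfs set sync=disabled Local-ZFS

# the ~5 sec flush mentioned above is the transaction group timeout (ZFS on Linux module parameter)
cat /sys/module/zfs/parameters/zfs_txg_timeout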
Thanks Nemesiz... They are Samsung 850 Pros - but there are better/stronger/faster drives out there. I read these threads yesterday: https://forum.proxmox.com/threads/zfs-sync-disabled.37900/ https://forum.proxmox.com/threads/p...-ssd-drives-sync-parameter.31130/#post-155543 So basically, on the PVE host nodes I should enter: zfs set sync=disabled ? And no changes to the VMs and their caching, correct? I typically use writethrough.
It takes effect immediately. If you set it on Local-ZFS, it will affect Local-ZFS/vm-100-disk-1 and so on. And you can set it individually for the sub-filesystems.
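Roughly how that looks in practice (dataset names come from the zfs list output above; the per-disk override is just an example):

# after setting sync=disabled on Local-ZFS, the child zvols inherit it;
# a single guest disk can still be overridden individually:
zfs set sync=standard Local-ZFS/vm-109-disk-2

# check what each dataset ended up with and where the value comes from
zfs get -r -o name,value,source sync Local-ZFS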
Thanks again for your help. I'm going to let things run like this for a day or two and do some testing tonight to see how things behave. Fingers crossed.
Nemesiz - this made all the difference in the world. One last question: I have some latency on a 15TB Ceph volume, which is also set to writethrough cache. I use this particular Ceph mount for backup storage, which is fairly static. Other than the 5 second lag for caching, are there any dangers to the data in changing the cache back to the default of NoCache? (3-node Ceph cluster w/ 3 separate monitor nodes; Ceph drives are spinners w/ SSDs for caching.)
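If I do change it, I assume it would just be something like this from the host (the VM ID and drive spec below are placeholders for whichever VM holds the Ceph disk; I'd check qm config first, since re-specifying a drive rewrites its whole option line):

# see the current drive line, including the cache= option
qm config 100 | grep virtio1

# switch that drive to the default cache mode, keeping the same volume
qm set 100 --virtio1 ILStore1:vm-100-disk-1,cache=none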