Search results

  1. A

    Insane load avg, disk timeouts w/ZFS

    Typo. SSD. Used as both L2ARC and SLOG.
  2. A

    Insane load avg, disk timeouts w/ZFS

    Based on a whole lot of reading (including from Matt Ahrens, pretty much the horse's mouth himself), my fundamental problem appears to be that I chose a topology of 10 x HDD + 2 x SSD in RAIDZ3 (with the SSDs split between SLOG and L2ARC). Plus a whole lot of sub-optimal code in ZFS-on-Linux that... (a sketch of this pool layout follows the results list)
  3. A

    Insane load avg, disk timeouts w/ZFS

    Doesn't *seem* to be an issue. There are consistent reads from between 4 and 6 of the 12 devices, which correlates with "zpool iostat -v" output. SMART detects nothing wrong, either. (Both checks are sketched after the results list.)
  4. A

    Insane load avg, disk timeouts w/ZFS

    OK, seriously WTF... as I'm exporting the pool, the zd* devices are starting to show up in /sys/block and udev is trying to deal with them... and getting nowhere as they're vanishing out from under udev's feet. Wishing I'd picked the H700 RAID controller and just run qcow2 on ext4 or xfs now...
  5. A

    Insane load avg, disk timeouts w/ZFS

    Hmm... I'm also trying the 'zpool export tank' / 'zpool import tank' that many others in the ZoL community suggest; exporting this zpool is taking 15+ minutes. A zpool export should be nearly instantaneous, no?
  6. A

    Insane load avg, disk timeouts w/ZFS

    ARC is hard-limited via module options to 32GB (the option is sketched after the results list). ZIL on mirrored SLOG vdevs exists, 240GB in size (yes, vastly overprovisioned). L2ARC on striped vdevs exists, 240GB in size. Each of the VMs has between 1 and 4GB allocated to it, and nothing's actually running in the containers yet besides a bare...
  7. A

    Insane load avg, disk timeouts w/ZFS

    Ah, even worse, when the system finally boots, most of the entries are missing from /dev/zvol, so the VMs cannot start. Currently looking for workarounds... it seems renaming each zvol does the trick, but - ugh, not a nice workaround. Exporting the data pool and reimporting it works, too... (both workarounds are sketched after the results list)
  8. A

    Insane load avg, disk timeouts w/ZFS

    Rebooting again: the "import ZFS pools" step took 9 minutes, and "mount ZFS filesystems" didn't finish until the 12-minute mark (according to systemd's timing). I fear I have done something very wrong with ZFS, but I don't know what. arcstat.py shows that, immediately after boot, the ARC is only 2.1GB. I'm not even sure...
  9. A

    Insane load avg, disk timeouts w/ZFS

    The 17k load average was from the old cluster, running sheepdog. That number is irrelevant, my brain is just fixated on it because it was so astonishing. New system is what I'm having problems with, which is not clustered. Uh oh... now I can't even reboot it, zpool import takes too long...
  10. A

    Insane load avg, disk timeouts w/ZFS

    FYI, although not directly related to this thread (oops - see previous post): https://goo.gl/photos/EszhPMQQkrbWpw4x9
  11. A

    Insane load avg, disk timeouts w/ZFS

    That's the confusing part - the load average with ZFS running seems completely arbitrary and artificial. (Does it count kernel threads? Even then it's ridiculous. See the D-state check sketched after the results list.) Right now there are 4 containers and 9 VMs on this system. Not a very heavy load for the hardware. Also, I just realized the...
  12. A

    Insane load avg, disk timeouts w/ZFS

    I've set up a new PVE system (community license only at this time). It's reasonably beefy (given that it's slightly trailing-edge technology) using local ZFS storage. We are observing that during periods of heavy *write* activity, the system load average goes into orbit (I've got a screen capture...
  13. A

    Sheepdog 1.0

    Oh, and I can confirm that pve-sheepdog works well so far on one cluster. There's an anomaly where restarting any given node causes a disproportionate amount of recovery to occur even if cluster-rebuild is disabled while the node gets restarted, but that may be me not understanding the protocol...
  14. A

    Sheepdog 1.0

    Thank you! That makes it easier to plan and to provide meaningful status updates to stakeholders. (It doesn't help for the bugzilla entries that stay open for many months because they're hard to fix, but at least it gives me some guidance.) Sadly, we're running into a whole bunch of GUI...
  15. A

    Sheepdog 1.0

    It's obviously too early in the morning. That package is in the jessie repos, not the pve repos.
  16. A

    Sheepdog 1.0

    Perhaps I've misunderstood... do I not also need the "sheepdog" package, which is still at 0.8.3-2 (in no-subscription)? (Never mind - testing shows that I do not. That is what was confusing me.) However, going back to my previous question, is there any approximate guideline I can use for...
  17. A

    Sheepdog 1.0

    Dietmar, what sort of delay or timeline is typical between: 1. a commit like this going into the repo, 2. a package showing up in pvetest, 3. a package showing up in no-subscription, 4. a package showing up in pve-enterprise, and finally 5. the package being included in the latest ISO? I'm...
  18. A

    "Start at boot" and shared storage

    I would say the biggest problem is the CEPH re-re-balancing after the last couple of nodes finish booting, because that consumes nearly 100% of the IOPS each node can deliver. (Four of the nodes have strangely slow disk despite being 10k SAS drives with an SSD cache... something to do with the...
  19. A

    "Start at boot" and shared storage

    I assume the "small CT hack" means create a tiny (basically no-op) LXC (or whatever it's called in the current version) container on each host, set it to boot priority=1, and use that to determine the timings for the rest of the boot? (The relevant settings are sketched after the results list.) If so, I'm thinking I might be better off disabling...
  20. A

    Cluster cold start timing problems

    I've got an 8-node PVE 3 cluster where each node is also part of the CEPH storage pool. Anytime I cold-boot the entire cluster (e.g. after work on the building's power system), the fact that the nodes are not all identical means that they take varying amounts of time to come online. I have a...
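
Sketches for the results above

For the pool layout described in result 2 (ten HDDs in RAIDZ3, two SSDs split between SLOG and L2ARC), a minimal zpool sketch might look like the following. The pool name "tank" appears elsewhere in the thread; the device names and partitioning are assumptions for illustration only.

    # One wide RAIDZ3 vdev of ten HDDs (device names are placeholders)
    zpool create tank raidz3 sda sdb sdc sdd sde sdf sdg sdh sdi sdj
    # Each SSD is split in two: one partition per SSD forms a mirrored SLOG,
    # the other per-SSD partition goes into a striped L2ARC
    zpool add tank log mirror sdk1 sdl1
    zpool add tank cache sdk2 sdl2

As result 2 itself concludes, a single wide RAIDZ3 vdev delivers roughly the random IOPS of one member disk, which is a poor match for VM storage compared with a pool of mirrors.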
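
Result 3 cross-checks per-device activity against SMART. A hedged version of both checks (pool and device names assumed):

    # Per-vdev/per-device I/O statistics, refreshed every 5 seconds
    zpool iostat -v tank 5
    # SMART health and error counters for one member disk
    smartctl -a /dev/sda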
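
Result 6 mentions hard-limiting the ARC to 32GB via module options. On ZFS-on-Linux that is normally the zfs_arc_max parameter, given in bytes; the file path below is the conventional location, not something quoted from the thread.

    # /etc/modprobe.d/zfs.conf -- cap the ARC at 32 GiB (value is in bytes)
    options zfs zfs_arc_max=34359738368

The same value can also be changed at runtime by writing to /sys/module/zfs/parameters/zfs_arc_max.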
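
Result 7 reports two workarounds for the missing /dev/zvol entries: renaming each zvol (and renaming it back), or exporting and re-importing the data pool. A sketch of both, with "tank/vm-101-disk-1" as a purely hypothetical zvol name:

    # Workaround 1: rename the zvol away and back so its device node is recreated
    zfs rename tank/vm-101-disk-1 tank/vm-101-disk-1-tmp
    zfs rename tank/vm-101-disk-1-tmp tank/vm-101-disk-1
    # Workaround 2: export and re-import the whole data pool
    zpool export tank
    zpool import tank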
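
Result 11 asks whether the load average counts kernel threads. On Linux it counts runnable tasks plus tasks in uninterruptible sleep (D state), kernel threads included, which is how stalled ZFS I/O threads can push the number far beyond anything CPU usage would explain. One way to see what is contributing, with no assumptions about specific thread names:

    # Count threads currently in uninterruptible sleep, grouped by command name
    ps -eLo state,comm | awk '$1 == "D"' | sort | uniq -c | sort -rn | head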
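
Result 19 describes the "small CT hack": a tiny, essentially no-op container with boot priority 1 whose start paces everything else. On a current PVE release with LXC this maps onto the onboot/startup options; the VMIDs and the 300-second delay are arbitrary examples, and the thread itself runs PVE 3, where the container tooling differs.

    # Placeholder container starts first; the next guest waits 300 s after it
    pct set 9000 --onboot 1 --startup order=1,up=300
    # Real guests get later slots in the boot order
    qm set 101 --startup order=2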
