Okay,
basically it's exactly what's happening in the Bugzilla report:
I have a customer with a Proxmox cluster who uses a shared Open-E iSCSI storage cluster.
These storage clusters are somewhat strange and have some design issues, but until the latest patch there was a workaround for this...
So what does this tell me? I can't find a way to fix this. My next try would be to replace the Perl modules with the ones from the previous commit, but this is not an update-proof solution, and I can't say anything about possible other side effects.
I have the exact same effect on a 5-node EPYC Milan / 7313P (board: Supermicro H12SSW-NTR) cluster.
It ran stable for a year; since 8.2 / kernel 6.8 there are random lockups of single nodes after 1-3 days. The console freezes, there are no error messages anywhere, it's just frozen.
Rebooted all nodes today back to 6.5 - I will...
We're getting somewhere:
2024-05-07T16:56:14+02:00: Starting tape backup job 'zfs:cephfs-mai:lto9:cephfs'
2024-05-07T16:56:14+02:00: update media online status
2024-05-07T16:56:16+02:00: media set uuid: c3d64c07-c811-48f0-9845-086cead14e55
2024-05-07T16:56:16+02:00: found 9 groups (out of 9...
Okay, so we can already rule out the drive being unable to sustain the specified 300 MB/s, so the bottleneck must be either ZFS or the code that's pulling the data off the pool.
I notice one thing which is not entirely logical:
2024-05-07T13:32:18+02:00: wrote 1272 chunks (4299.95 MB at 256.08 MB/s)
2024-05-07T13:32:37+02:00: wrote 1211 chunks (4305.72 MB at 255.80 MB/s) => 4305.72 MB in 19 s ≈ 226.6 MB/s
2024-05-07T13:32:55+02:00: wrote 1490 chunks (4295.23 MB at...
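Just to make that explicit (my own throwaway calculation, nothing from PBS itself): recomputing the rate between the two log lines above from the MB written and the 19 s timestamp delta gives the lower figure, so the per-line MB/s value looks more like a running average than the instantaneous rate:

// throwaway sanity check, not PBS code: recompute the rate between two
// consecutive log lines from the MB written and the timestamp delta
fn main() {
    let mb_written = 4305.72; // from the 13:32:37 line
    let seconds = 19.0;       // 13:32:18 -> 13:32:37
    println!("effective rate: {:.1} MB/s", mb_written / seconds); // ~226.6 MB/s
}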
Another observation:
This is what it looks like in the datastore graph - only the tape backup job is running, but it constantly shows ~255 MB/s:
vs:
2024-05-06T20:01:25+02:00: backup snapshot "vm/11222/2024-05-05T20:35:05Z"
2024-05-06T20:02:05+02:00: wrote 7322 chunks (4295.75 MB at 183.45...
First, I would like to provide you with some specs:
AMD EPYC 7313 16-Core Processor
12x Seagate Exos 20 TB
4x 4 TB NVMe
It's configured as raid-z3 with a 4-way mirror special device on NVMe.
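Just to put that layout into numbers (my own back-of-the-envelope estimate, ignoring padding, metadata overhead and TB/TiB conversion): raid-z3 keeps three disks' worth of parity, so the 12-wide vdev leaves roughly nine data disks, plus the 4-way mirrored NVMe special device for metadata.

// back-of-the-envelope capacity of the pool described above
fn main() {
    let disks = 12.0;
    let disk_tb = 20.0;
    let parity = 3.0; // raid-z3 = triple parity
    let data_tb = (disks - parity) * disk_tb;
    println!("raid-z3 data capacity: ~{data_tb} TB"); // ~180 TB
    println!("special device (4-way mirror of 4 TB NVMe): ~4 TB usable");
}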
It seems that the drop comes in after some time, a few minutes. I started another job...
More updates on the topic:
Increased the thread count to 16 (which is the core count of the backup machine):
300 MB/s (was <=200 MB/s, volatile) when running a tape job only.
~160-200 MB/s (was <=60 MB/s, volatile) when the tape job was running alongside a verify job.
//What I am noticing now is some...
Okay, I was impatient, built PBS myself and can confirm significant improvements to the ZFS performance on spinners, especially when writing data to tape. Finally I can (at least if nothing else is running) saturate the write performance of my LTO9 drive. Before the patch it maxed out at...
Looking at the thread again: if the gains are this visible for a single spinner, the effect of having multiple IO threads should be even more interesting on ZFS pools with several disks...
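To sketch why that should scale with the number of disks (this is purely an illustration of the general producer/consumer pattern, not the actual PBS tape code; all names, counts and sizes are made up): multiple reader threads hide the per-chunk read latency of spinners by keeping a bounded queue filled for the single tape writer.

use std::sync::mpsc::sync_channel;
use std::thread;

// Sketch of the general multi-reader pattern (not the PBS implementation):
// several reader threads fetch chunks concurrently and feed one writer
// through a bounded queue, so the tape drive never waits on a single
// synchronous read from a spinner.
fn main() {
    let (tx, rx) = sync_channel::<Vec<u8>>(64); // bounded chunk buffer
    let readers = 16; // e.g. the core count, as tried above

    for _ in 0..readers {
        let tx = tx.clone();
        thread::spawn(move || {
            for _ in 0..100 {
                // stand-in for reading a ~4 MiB chunk from the datastore
                let chunk = vec![0u8; 4 * 1024 * 1024];
                if tx.send(chunk).is_err() {
                    break;
                }
            }
        });
    }
    drop(tx); // the writer loop below ends once all readers are done

    // stand-in for the single LTO writer
    let mut written = 0usize;
    for chunk in rx {
        written += chunk.len();
    }
    println!("wrote {} MiB", written / (1024 * 1024));
}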
So well, I investigated this a bit:
PBS doesn't use rsyslogd by default, just systemd-journald.
In theory there is the option to set LogFilterPatterns= in the systemd service definition for proxmox-backup-proxy.service, BUT: Bookworm ships systemd 252, and the option was only introduced with...
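For reference, and purely as an illustration (this only helps once a systemd release that knows the directive is available, which Bookworm's 252 is not; the patterns are made up to match the chunk lines quoted below), such a drop-in for proxmox-backup-proxy.service would look roughly like:

# /etc/systemd/system/proxmox-backup-proxy.service.d/log-filter.conf
# illustration only - needs a systemd release that supports LogFilterPatterns=
[Service]
LogFilterPatterns=~^GET /chunk
LogFilterPatterns=~^download chunk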
Hi,
I just noticed that PBS logging is very noisy:
Mar 31 23:07:29 pbs-ba1-2 proxmox-backup-proxy[2878]: GET /chunk
Mar 31 23:07:29 pbs-ba1-2 proxmox-backup-proxy[2878]: download chunk "/mnt/datastore/datastore/px11/.chunks/4583/4583ba2fb3e7c1086442e0>
Mar 31 23:07:29 pbs-ba1-2...