pvestatd Zeitüberschreitung bei Backup

aGro

Member
Feb 22, 2021
77
4
13
45
Hallo,


meine Backups der 3 Nodes dauern heute ungewöhnlich lange und als ich mir das Syslog anschaute fielen mir die vielen Zeitüberschreitungen des pvestatd auf, sowie außerdem immer wieder die Meldung, dass mein gemountetes NFS nicht gefunden wird.

Code:
Sep  7 08:12:23 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:12:33 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:12:33 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:12:53 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:13:00 mc-kvg-03 systemd[1]: Starting Proxmox VE replication runner...
Sep  7 08:13:01 mc-kvg-03 systemd[1]: Started Proxmox VE replication runner.
Sep  7 08:13:03 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:13:13 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:13:24 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:13:43 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:13:43 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:13:53 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:13:53 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:14:00 mc-kvg-03 systemd[1]: Starting Proxmox VE replication runner...
Sep  7 08:14:01 mc-kvg-03 systemd[1]: Started Proxmox VE replication runner.
Sep  7 08:14:03 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:14:03 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:14:13 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:14:13 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:14:43 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:15:00 mc-kvg-03 systemd[1]: Starting Proxmox VE replication runner...
Sep  7 08:15:01 mc-kvg-03 systemd[1]: Started Proxmox VE replication runner.
Sep  7 08:15:24 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:15:24 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:15:33 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:15:33 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:16:00 mc-kvg-03 systemd[1]: Starting Proxmox VE replication runner...
Sep  7 08:16:01 mc-kvg-03 systemd[1]: Started Proxmox VE replication runner.
Sep  7 08:16:13 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:16:33 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:16:43 mc-kvg-03 pvestatd[294487]: got timeout
Sep  7 08:17:00 mc-kvg-03 systemd[1]: Starting Proxmox VE replication runner...
.

Was kann man hier machen?
 
Hallo,
check mal die syslogs am NFS host, vielleicht gibt das ja Aufschluss weshalb die Verbindung scheinbar immer wieder wegbricht. Wie sieht es mit dem Netzwerk generell aus, kann der NFS host kontinuierlich gepingt werden? Wenn nicht handelt es sich vermutlich um ein Netzwerkproblem.
 
Code:
Sep  7 08:12:23 mc-kvg-03 pvestatd[294487]: unable to activate storage 'backup-vm' - directory '/mnt/pve/backup-vm' does not exist or is unreachable
Sep  7 08:12:33 mc-kvg-03 pvestatd[294487]: got timeout
Da steht doch alles, das Storage ist nicht erreichbar. Je nachdem was das für ein Storage ist musst du da mal schauen ob es erreichbar ist.
Das kann am Netzwerk liegen oder dein NAS hat ein Problem.
 
Meine Aussage war leicht fehlerhaft. Das NFS ist ein Proxmox-Backup-Server. Und das Syslogs auf diesem zeigt gar nichts in dieser Hinsicht an. Der Backup-Server ist auch durchgehend anpingbar von allen drei Nodes.
 
Meine Aussage war leicht fehlerhaft. Das NFS ist ein Proxmox-Backup-Server. Und das Syslogs auf diesem zeigt gar nichts in dieser Hinsicht an. Der Backup-Server ist auch durchgehend anpingbar von allen drei Nodes.
Handelt es sich bei backup-vm um einen PBS storage oder um einen NFS storage? Wird auf dieses storage target das Backup geschrieben oder auf das PBS storage target?

Bitte poste den output von cat /etc/pve/storage.cfg und den Backup task log, damit wir hier folgen können wie das Setup genau ausschaut.
 
backup-vm ist ein NFS-Storage

Code:
cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

rbd: ceph-pool1
        content rootdir,images
        krbd 1
        pool pool1

nfs: backup-vm
        export /tank1/backup-vm
        path /mnt/pve/backup-vm
        server 10.0.0.6
        content backup
        maxfiles 5

nfs: iso-store_neu
        export /volume1/iso-store
        path /mnt/pve/iso-store_neu
        server 10.2.0.51
        content iso

nfs: kvg-ds02_backup_vm
        disable
        export /volume1/backup_vm
        path /mnt/pve/kvg-ds02_backup_vm
        server 10.2.0.51
        content backup
        maxfiles 2

Wo genau finde ich das Backup Task Log?
 
backup-vm ist ein NFS-Storage

Code:
cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

rbd: ceph-pool1
        content rootdir,images
        krbd 1
        pool pool1

nfs: backup-vm
        export /tank1/backup-vm
        path /mnt/pve/backup-vm
        server 10.0.0.6
        content backup
        maxfiles 5

nfs: iso-store_neu
        export /volume1/iso-store
        path /mnt/pve/iso-store_neu
        server 10.2.0.51
        content iso

nfs: kvg-ds02_backup_vm
        disable
        export /volume1/backup_vm
        path /mnt/pve/kvg-ds02_backup_vm
        server 10.2.0.51
        content backup
        maxfiles 2

Wo genau finde ich das Backup Task Log?
Okay, also handelt es sich nicht um einen Proxmox Backup Server storage, es sind nur NFS storages und local storage als backup target konfiguriert. Der Tasklog kann durch Doppelklick auf den Backup task in der Task Liste unten in der WebUI geöffnet und via Download Button heruntergeladen werden.

Kann es sein, dass die Verbindung nur unter last wegbricht? Ping auch bei laufendem Backup durchgehend beobachten.
 
Ja, es sieht so aus, als würde die Verbindung nur unter Last wegbrechen. Der Ping ging auch während des Backups ohne Paketverluste durch.

INFO: starting new backup job: vzdump 106 104 105 101 --mailto --quiet 1 --compress lzo --storage backup-vm --mailnotification always --mode snapshot INFO: skip external VMs: 106, 104, 101 INFO: Starting Backup of VM 105 (qemu) INFO: Backup started at 2023-09-07 00:00:02 INFO: status = running INFO: update VM 105: -lock backup INFO: VM Name: srv-kvg-ad03 INFO: include disk 'scsi0' 'ceph-pool1:vm-105-disk-0' 200G INFO: include disk 'scsi1' 'ceph-pool1:vm-105-disk-2' 200G INFO: backup mode: snapshot INFO: ionice priority: 7 INFO: skip unused drive 'ceph-pool1:vm-105-disk-1' (not included into backup) INFO: creating archive '/mnt/pve/backup-vm/dump/vzdump-qemu-105-2023_09_07-00_00_02.vma.lzo' INFO: started backup task 'c57573b3-f23a-43f9-a028-68b54dca1aaf' INFO: status: 0% (75759616/429496729600), sparse 0% (7188480), duration 3, read/write 25/22 MB/s INFO: status: 1% (4304470016/429496729600), sparse 0% (167849984), duration 217, read/write 19/19 MB/s INFO: status: 2% (8598585344/429496729600), sparse 0% (232259584), duration 408, read/write 22/22 MB/s INFO: status: 3% (12885491712/429496729600), sparse 0% (273469440), duration 719, read/write 13/13 MB/s INFO: status: 4% (17198809088/429496729600), sparse 0% (318955520), duration 964, read/write 17/17 MB/s INFO: status: 5% (21481455616/429496729600), sparse 0% (352849920), duration 1200, read/write 18/18 MB/s INFO: status: 6% (25771573248/429496729600), sparse 0% (387072000), duration 1407, read/write 20/20 MB/s INFO: status: 7% (30092886016/429496729600), sparse 0% (422010880), duration 1606, read/write 21/21 MB/s INFO: status: 8% (34365636608/429496729600), sparse 0% (450736128), duration 1987, read/write 11/11 MB/s INFO: status: 9% (38658834432/429496729600), sparse 0% (455589888), duration 2354, read/write 11/11 MB/s INFO: status: 10% (42967040000/429496729600), sparse 0% (456769536), duration 2565, read/write 20/20 MB/s INFO: status: 11% (47249424384/429496729600), sparse 0% (458027008), duration 2811, read/write 17/17 MB/s INFO: status: 12% (51546750976/429496729600), sparse 0% (459608064), duration 3064, read/write 16/16 MB/s INFO: status: 13% (55841849344/429496729600), sparse 0% (461684736), duration 3286, read/write 19/19 MB/s INFO: status: 14% (60138127360/429496729600), sparse 0% (462614528), duration 3593, read/write 13/13 MB/s INFO: status: 15% (64442138624/429496729600), sparse 0% (501768192), duration 3772, read/write 24/23 MB/s INFO: status: 16% (68735533056/429496729600), sparse 0% (619044864), duration 3935, read/write 26/25 MB/s INFO: status: 17% (73014640640/429496729600), sparse 0% (719708160), duration 4113, read/write 24/23 MB/s INFO: status: 18% (77331496960/429496729600), sparse 0% (1473523712), duration 4302, read/write 22/18 MB/s INFO: status: 19% (81634918400/429496729600), sparse 0% (2875867136), duration 4410, read/write 39/26 MB/s INFO: status: 20% (85937946624/429496729600), sparse 0% (3244474368), duration 4534, read/write 34/31 MB/s INFO: status: 21% (90219020288/429496729600), sparse 0% (3244474368), duration 4640, read/write 40/40 MB/s INFO: status: 22% (94490001408/429496729600), sparse 0% (3266031616), duration 4785, read/write 29/29 MB/s INFO: status: 23% (98814656512/429496729600), sparse 0% (3272232960), duration 4968, read/write 23/23 MB/s INFO: status: 24% (103085178880/429496729600), sparse 0% (3277713408), duration 5181, read/write 20/20 MB/s INFO: status: 25% (107384995840/429496729600), sparse 0% (3280674816), duration 5397, read/write 19/19 MB/s INFO: status: 26% (111692349440/429496729600), sparse 0% (3283976192), duration 5603, read/write 20/20 MB/s INFO: status: 27% (115979386880/429496729600), sparse 0% (3284353024), duration 5796, read/write 22/22 MB/s INFO: status: 28% (120278548480/429496729600), sparse 0% (3284353024), duration 5986, read/write 22/22 MB/s INFO: status: 29% (124572336128/429496729600), sparse 0% (3284353024), duration 6160, read/write 24/24 MB/s INFO: status: 30% (128851509248/429496729600), sparse 0% (3284869120), duration 6341, read/write 23/23 MB/s INFO: status: 31% (133170266112/429496729600), sparse 0% (3285377024), duration 6527, read/write 23/23 MB/s INFO: status: 32% (137447342080/429496729600), sparse 0% (3285463040), duration 7322, read/write 5/5 MB/s INFO: status: 33% (141750173696/429496729600), sparse 0% (3287347200), duration 8210, read/write 4/4 MB/s INFO: status: 34% (146037800960/429496729600), sparse 0% (3287642112), duration 9493, read/write 3/3 MB/s INFO: status: 35% (150333030400/429496729600), sparse 0% (3287642112), duration 9967, read/write 9/9 MB/s INFO: status: 36% (154635862016/429496729600), sparse 0% (3287642112), duration 10291, read/write 13/13 MB/s INFO: status: 37% (158935547904/429496729600), sparse 0% (3287642112), duration 10555, read/write 16/16 MB/s INFO: status: 38% (163218456576/429496729600), sparse 0% (3288670208), duration 11032, read/write 8/8 MB/s INFO: status: 39% (167504183296/429496729600), sparse 0% (3296542720), duration 11672, read/write 6/6 MB/s INFO: status: 40% (171801575424/429496729600), sparse 0% (3296583680), duration 12420, read/write 5/5 MB/s INFO: status: 41% (176094576640/429496729600), sparse 0% (3296583680), duration 13407, read/write 4/4 MB/s INFO: status: 42% (180397670400/429496729600), sparse 0% (3296583680), duration 14131, read/write 5/5 MB/s INFO: status: 43% (184694538240/429496729600), sparse 0% (3300888576), duration 14923, read/write 5/5 MB/s INFO: status: 44% (188992782336/429496729600), sparse 0% (3305484288), duration 15740, read/write 5/5 MB/s INFO: status: 45% (193281720320/429496729600), sparse 0% (3312492544), duration 16319, read/write 7/7 MB/s INFO: status: 46% (197572886528/429496729600), sparse 0% (3314466816), duration 17070, read/write 5/5 MB/s INFO: status: 47% (201881092096/429496729600), sparse 0% (3315924992), duration 17783, read/write 6/6 MB/s INFO: status: 48% (206166949888/429496729600), sparse 0% (3317239808), duration 18489, read/write 6/6 MB/s INFO: status: 49% (210459426816/429496729600), sparse 0% (3319148544), duration 18959, read/write 9/9 MB/s INFO: status: 50% (214879305728/429496729600), sparse 0% (3321384960), duration 19308, read/write 12/12 MB/s INFO: status: 51% (219052638208/429496729600), sparse 0% (3588288512), duration 19886, read/write 7/6 MB/s INFO: status: 52% (223389941760/429496729600), sparse 0% (3588616192), duration 20609, read/write 5/5 MB/s INFO: status: 53% (227674750976/429496729600), sparse 0% (3588648960), duration 21445, read/write 5/5 MB/s INFO: status: 54% (231976599552/429496729600), sparse 0% (3591778304), duration 22127, read/write 6/6 MB/s INFO: status: 55% (236260425728/429496729600), sparse 0% (3595194368), duration 22897, read/write 5/5 MB/s INFO: status: 56% (240575447040/429496729600), sparse 0% (3595743232), duration 23731, read/write 5/5 MB/s INFO: status: 57% (244872445952/429496729600), sparse 0% (3595743232), duration 24625, read/write 4/4 MB/s INFO: status: 58% (249157451776/429496729600), sparse 0% (3595743232), duration 25475, read/write 5/5 MB/s INFO: status: 59% (253437345792/429496729600), sparse 0% (3595743232), duration 26311, read/write 5/5 MB/s INFO: status: 60% (257787101184/429496729600), sparse 0% (3596046336), duration 27064, read/write 5/5 MB/s INFO: status: 61% (261996412928/429496729600), sparse 0% (3596197888), duration 27840, read/write 5/5 MB/s INFO: status: 62% (266303766528/429496729600), sparse 0% (3597078528), duration 28514, read/write 6/6 MB/s INFO: status: 63% (270596112384/429496729600), sparse 0% (3597578240), duration 28771, read/write 16/16 MB/s INFO: status: 64% (275099549696/429496729600), sparse 1% (7172636672), duration 28836, read/write 69/14 MB/s INFO: status: 65% (279405592576/429496729600), sparse 2% (11478679552), duration 28855, read/write 226/0 MB/s INFO: status: 66% (283608875008/429496729600), sparse 3% (15681961984), duration 28870, read/write 280/0 MB/s INFO: status: 67% (288010534912/429496729600), sparse 4% (20083621888), duration 28885, read/write 293/0 MB/s INFO: status: 68% (292173250560/429496729600), sparse 5% (24246337536), duration 28900, read/write 277/0 MB/s INFO: status: 69% (296558395392/429496729600), sparse 6% (28631482368), duration 28915, read/write 292/0 MB/s INFO: status: 70% (300730417152/429496729600), sparse 7% (32803504128), duration 28929, read/write 298/0 MB/s INFO: status: 71% (305077420032/429496729600), sparse 8% (37150507008), duration 28945, read/write 271/0 MB/s INFO: status: 72% (309281357824/429496729600), sparse 9% (41354444800), duration 28961, read/write 262/0 MB/s INFO: status: 73% (313619775488/429496729600), sparse 10% (45692862464), duration 28977, read/write 271/0 MB/s INFO: status: 74% (317907664896/429496729600), sparse 11% (49980751872), duration 28993, read/write 267/0 MB/s INFO: status: 75% (322375843840/429496729600), sparse 12% (54448930816), duration 29010, read/write 262/0 MB/s INFO: status: 76% (326623952896/429496729600), sparse 13% (58697039872), duration 29027, read/write 249/0 MB/s INFO: status: 77% (330806263808/429496729600), sparse 14% (62879350784), duration 29043, read/write 261/0 MB/s INFO: status: 78% (335225749504/429496729600), sparse 15% (67298836480), duration 29060, read/write 259/0 MB/s INFO: status: 79% (339528253440/429496729600), sparse 16% (71601340416), duration 29078, read/write 239/0 MB/s INFO: status: 80% (343641882624/429496729600), sparse 17% (75714969600), duration 29094, read/write 257/0 MB/s INFO: status: 81% (348045180928/429496729600), sparse 18% (80118267904), duration 29111, read/write 259/0 MB/s INFO: status: 82% (352243482624/429496729600), sparse 19% (84316569600), duration 29128, read/write 246/0 MB/s INFO: status: 83% (356700061696/429496729600), sparse 20% (88773148672), duration 29145, read/write 262/0 MB/s INFO: status: 84% (360878702592/429496729600), sparse 21% (92951789568), duration 29161, read/write 261/0 MB/s INFO: status: 85% (365085261824/429496729600), sparse 22% (97158348800), duration 29177, read/write 262/0 MB/s INFO: status: 86% (369504681984/429496729600), sparse 23% (101577768960), duration 29194, read/write 259/0 MB/s INFO: status: 87% (373900836864/429496729600), sparse 24% (105973923840), duration 29211, read/write 258/0 MB/s INFO: status: 88% (378044219392/429496729600), sparse 25% (110117306368), duration 29227, read/write 258/0 MB/s INFO: status: 89% (382437031936/429496729600), sparse 26% (114510118912), duration 29244, read/write 258/0 MB/s INFO: status: 90% (386564751360/429496729600), sparse 27% (118637838336), duration 29260, read/write 257/0 MB/s INFO: status: 91% (390957694976/429496729600), sparse 28% (123030781952), duration 29277, read/write 258/0 MB/s INFO: status: 92% (395262820352/429496729600), sparse 29% (127335907328), duration 29294, read/write 253/0 MB/s INFO: status: 93% (399590817792/429496729600), sparse 30% (131663904768), duration 29311, read/write 254/0 MB/s INFO: status: 94% (403751960576/429496729600), sparse 31% (135825047552), duration 29327, read/write 260/0 MB/s INFO: status: 95% (408193335296/429496729600), sparse 32% (140266422272), duration 29347, read/write 222/0 MB/s INFO: status: 96% (412346220544/429496729600), sparse 33% (144419307520), duration 29363, read/write 259/0 MB/s INFO: status: 97% (416794279936/429496729600), sparse 34% (148867366912), duration 29380, read/write 261/0 MB/s INFO: status: 98% (420974034944/429496729600), sparse 35% (153047121920), duration 29396, read/write 261/0 MB/s INFO: status: 99% (425424191488/429496729600), sparse 36% (157497278464), duration 29414, read/write 247/0 MB/s INFO: status: 100% (429496729600/429496729600), sparse 37% (161569812480), duration 29430, read/write 254/0 MB/s INFO: transferred 429496 MB in 29430 seconds (14 MB/s) INFO: archive file size: 162.71GB INFO: delete old backup '/mnt/pve/backup-vm/dump/vzdump-qemu-105-2023_09_01-00_00_01.vma.lzo' INFO: Finished Backup of VM 105 (08:11:42) INFO: Backup finished at 2023-09-07 08:11:44 INFO: Backup job finished successfully TASK OK

Wie man sieht, hat er für diese eine VM mehr als 8h gebraucht. Was ich noch beobachten konnte (nachdem nun alle Backup`s durch sind), dass das Backup dieser Node wohl das Hindernis für die Anderen war. Nachdem dieses fertig war, kamen auch keine Zeitüberschreitungen mehr, obwohl eine Node noch das Backup durchführte.
 
Wie sieht es mit der last am NFS aus? Vermutlich kann die storage dort nicht mithalten, wenn mehrere nodes parallel schreiben/backups machen.
 
Das System arbeitet seit Jahren so. Und heute fiel es mir aber auf, weil die Backup's sonst normalerweise längst fertig waren, wenn ich um 7 Uhr ins Büro kam. Und heute war ebend nur 1 Node fertig. Ich habe den Backup-Server jetzt mal neugestartet (vielleicht hilft das ja schon) und werde es morgen nochmal kontrollieren, ob es wieder auftritt. Dann würde ich mich hier nochmal melden.

Danke trotzdem für die Hilfe.
 
Prüfe mal deine Disks auf dem Backupserver. Smart Werte kontrollieren.
Eventuell auch mal die Last auf den Disks monitoren während eines Backups.
Unter umständen ist eine Disk kurz vorm sterben und macht ganz schlechte Latenzen.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!