[SOLVED] Migration and replication "hang" at a few KB

herzkerl

Member
Mar 18, 2021
Good morning everyone!

Yesterday I expanded our cluster with a third server and rewired the migration network (previously a direct connection between the two nodes, now via a switch), and now I can neither migrate nor replicate. In both cases, sending the snapshot "hangs" at a few KB, see below.

The storage is local ZFS, and it shows up in the sidebar for every server. Between the two existing servers this worked flawlessly until now. So far I have rebooted the new server, checked the direct SSH connection from server to server, and killed all send/receive processes.
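For completeness, this is roughly how I checked for stuck processes and the SSH connection (process names may differ slightly; the target IP is just the migration address from my logs below):

Code:
# look for stuck zfs send/receive processes on both nodes
ps aux | grep -E 'zfs (send|recv)' | grep -v grep

# verify the direct SSH connection over the migration network
ssh -o BatchMode=yes root@10.10.62.44 'hostname'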

Thanks in advance!

Code:
2021-06-27 22:54:01 102-0: start replication job
2021-06-27 22:54:01 102-0: guest => VM 102, running => 7608
2021-06-27 22:54:01 102-0: volumes => local-zfs:vm-102-disk-0
2021-06-27 22:54:03 102-0: freeze guest filesystem
2021-06-27 22:54:03 102-0: create snapshot '__replicate_102-0_1624827241__' on local-zfs:vm-102-disk-0
2021-06-27 22:54:03 102-0: thaw guest filesystem
2021-06-27 22:54:03 102-0: using insecure transmission, rate limit: none
2021-06-27 22:54:03 102-0: full sync 'local-zfs:vm-102-disk-0' (__replicate_102-0_1624827241__)
2021-06-27 22:54:07 102-0: full send of rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__ estimated size is 33.8G
2021-06-27 22:54:07 102-0: send from @__replicate_102-0_1624795201__ to rpool/data/vm-102-disk-0@__replicate_102-0_1624827241__ estimated size is 446M
2021-06-27 22:54:07 102-0: total estimated size is 34.2G
2021-06-27 22:54:08 102-0: TIME        SENT   SNAPSHOT rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:08 102-0: 22:54:08    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:09 102-0: 22:54:09    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:10 102-0: 22:54:10    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:11 102-0: 22:54:11    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:12 102-0: 22:54:12    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:13 102-0: 22:54:13    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:14 102-0: 22:54:14    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:15 102-0: 22:54:15    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:16 102-0: 22:54:16    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__
2021-06-27 22:54:17 102-0: 22:54:17    325K   rpool/data/vm-102-disk-0@__replicate_102-0_1624795201__

Code:
2021-06-28 08:30:47 use dedicated network address for sending migration traffic (10.10.62.44)
2021-06-28 08:30:47 starting migration of VM 205 to node 'pve4' (10.10.62.44)
2021-06-28 08:30:48 found local disk 'local-zfs:vm-205-disk-0' (via storage)
2021-06-28 08:30:48 found local disk 'local-zfs:vm-205-disk-1' (in current VM config)
2021-06-28 08:30:48 copying local disk images
2021-06-28 08:30:51 full send of rpool/data/vm-205-disk-0@__migration__ estimated size is 1.28G
2021-06-28 08:30:51 total estimated size is 1.28G
2021-06-28 08:30:52 TIME        SENT   SNAPSHOT rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:52 08:30:52    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:53 08:30:53    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:54 08:30:54    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:55 08:30:55    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:56 08:30:56    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:57 08:30:57    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:58 08:30:58    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:30:59 08:30:59    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:00 08:31:00    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:01 08:31:01    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:02 08:31:02    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:03 08:31:03    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:04 08:31:04    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:05 08:31:05    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:06 08:31:06    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:07 08:31:07    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:08 08:31:08    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:09 08:31:09    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:10 08:31:10    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:11 08:31:11    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:12 08:31:12    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:13 08:31:13    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:14 08:31:14    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:15 08:31:15    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:16 08:31:16    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:17 08:31:17    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:18 08:31:18    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:19 08:31:19    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:20 08:31:20    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:21 08:31:21    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:22 08:31:22    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:23 08:31:23    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:24 08:31:24    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:25 08:31:25    317K   rpool/data/vm-205-disk-0@__migration__
2021-06-28 08:31:26 08:31:26    317K   rpool/data/vm-205-disk-0@__migration__
send/receive failed, cleaning up snapshot(s)..
2021-06-28 08:31:27 ERROR: storage migration for 'local-zfs:vm-205-disk-0' to storage 'local-zfs' failed - command 'set -o pipefail && pvesm export local-zfs:vm-205-disk-0 zfs - -with-snapshots 0 -snapshot __migration__' failed: interrupted by signal
2021-06-28 08:31:27 aborting phase 1 - cleanup resources
2021-06-28 08:31:27 ERROR: migration aborted (duration 00:00:40): storage migration for 'local-zfs:vm-205-disk-0' to storage 'local-zfs' failed - command 'set -o pipefail && pvesm export local-zfs:vm-205-disk-0 zfs - -with-snapshots 0 -snapshot __migration__' failed: interrupted by signal
TASK ERROR: migration aborted
 
What does the bandwidth between the two nodes look like with iperf3?
Which PVE version is installed? Please post the full output of `pveversion -v`.
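For example, run iperf3 as a server on one node and connect from the other over the migration network (the IP below is taken from your migration log as an example):

Code:
# on the target node (pve4)
iperf3 -s

# on the source node, test towards the migration address
iperf3 -c 10.10.62.44 -t 10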
 
I found the problem, and thanks for the tip about iperf; it was about time I installed and tried it out ;) The cause was that jumbo frames were not enabled on the switch, while both network interfaces still had MTU 9000 configured. It suddenly dawned on me: with MTU 1500 everything worked, so the cause was clear.
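In case someone else runs into this: such an MTU mismatch can be confirmed with a non-fragmenting ping just below the jumbo frame size (target IP as in my logs above). If the switch drops jumbo frames, the large ping fails while the standard-sized one still works:

Code:
# 8972 bytes payload + 28 bytes IP/ICMP headers = 9000 byte packet
ping -M do -s 8972 -c 3 10.10.62.44

# standard MTU for comparison (1472 + 28 = 1500)
ping -M do -s 1472 -c 3 10.10.62.44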
 
