Hi guys, I am still seeing Retransmit List messages on a two-node cluster with a quorum disk.
No matter which kernel I use, 2.6.32 or 3.10.0-8-pve, I have the same problem in both cases.
Sometimes I see that /etc/pve is locked for some time.
Any idea what causes it?
Should I be worried?
What can I do to fix this? I have tried everything I could find on Google...
It persists even when the two nodes are connected directly to each other, without a switch.
Also, an HA VM on one of the nodes sometimes has problems with internet access.
Any idea? It is very important for me to know this as soon as possible (I have 24h
Netstat says this (full output below):
152 fast retransmits
206 forward retransmits
12 other TCP timeouts
TCPLossProbes: 94
TCPLossProbeRecovery: 59
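The counters above can be pulled out of the full `netstat -s` dump with a simple filter (the grep pattern here is just my own shorthand, not anything official). The sketch below pipes a captured sample of the output through the filter instead of running netstat live, so the filter can be seen in isolation:

```shell
# Filter retransmit/loss-related counters out of `netstat -s` style output.
# A captured sample is used in place of a live `netstat -s` run, so only
# the lines matching the pattern come through:
printf '%s\n' \
  "152 fast retransmits" \
  "206 forward retransmits" \
  "12 other TCP timeouts" \
  "TCPLossProbes: 94" \
  "TCPLossProbeRecovery: 59" \
  "1472 delayed acks sent" \
| grep -Ei 'retrans|timeout|lossprobe'
```

On a live node the same filter would be `netstat -s | grep -Ei 'retrans|timeout|lossprobe'`, which is handy for watching whether the counters keep climbing.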
Code:
==> corosync.log <==
Apr 25 12:45:32 corosync [TOTEM ] Retransmit List: 3b8 3b9 3ba 3bb 3bc 3bd 3be 3bf
Apr 25 12:45:42 corosync [TOTEM ] Retransmit List: 3c6 3c7 3c8 3c9 3ca 3cb 3cc 3cd
Apr 25 12:45:42 corosync [TOTEM ] Retransmit List: 3cb 3cc 3cd
Apr 25 12:45:52 corosync [TOTEM ] Retransmit List: 3d4 3d5 3d6 3d7 3d8 3d9 3da 3db
Apr 25 12:46:03 corosync [TOTEM ] Retransmit List: 3e2 3e3 3e4 3e5 3e6 3e7 3e8 3e9
Apr 25 12:46:13 corosync [TOTEM ] Retransmit List: 3f0 3f1 3f2 3f3 3f4 3f5 3f6 3f7
Apr 25 12:46:23 corosync [TOTEM ] Retransmit List: 3fe 3ff 400 401 402 403 404 405
Apr 25 12:46:23 corosync [TOTEM ] Retransmit List: 403 404 405
Apr 25 12:46:33 corosync [TOTEM ] Retransmit List: 40c 40d 40e 40f 410 411 412 413
Apr 25 12:46:48 corosync [TOTEM ] Retransmit List: 41a 41b 41c 41d 41e 41f 420 421
Apr 25 12:47:02 corosync [TOTEM ] Retransmit List: 436 437 438 439 43a 43b 43c 43d
Apr 25 12:47:02 corosync [TOTEM ] Retransmit List: 43a 43b 43c 43d
Apr 25 12:47:12 corosync [TOTEM ] Retransmit List: 444 445 446 447 448 449 44a 44b
Apr 25 12:47:22 corosync [TOTEM ] Retransmit List: 453 454 455 456 457 458 459 45a
Apr 25 12:47:22 corosync [TOTEM ] Retransmit List: 45a
Apr 25 12:47:33 corosync [TOTEM ] Retransmit List: 461 462 463 464 465 466 467 468
Apr 25 12:47:33 corosync [TOTEM ] Retransmit List: 467 468
Apr 25 12:47:43 corosync [TOTEM ] Retransmit List: 46f 470 471 472 473 474 475 476
Apr 25 12:47:53 corosync [TOTEM ] Retransmit List: 47d 47e 47f 480 481 482 483 484
Apr 25 12:47:53 corosync [TOTEM ] Retransmit List: 480 481 482 483 484
Apr 25 12:48:08 corosync [TOTEM ] Retransmit List: 48b 48c 48d 48e 48f 490 491 492
Apr 25 12:48:08 corosync [TOTEM ] Retransmit List: 48d 48e 48f 490 491 492
Apr 25 12:48:22 corosync [TOTEM ] Retransmit List: 4a7 4a8 4a9 4aa 4ab 4ac 4ad 4ae
Apr 25 12:48:22 corosync [TOTEM ] Retransmit List: 4a9 4aa 4ab 4ac 4ad 4ae
Apr 25 12:48:52 corosync [TOTEM ] Retransmit List: 4b6 4b7 4b8 4b9 4ba 4bb 4bc 4bd
Apr 25 12:48:52 corosync [TOTEM ] Retransmit List: 4bd 4c3 4c4 4c5 4c6 4c7 4c8 4c9 4ca
Apr 25 12:48:52 corosync [TOTEM ] Retransmit List: 4c7 4c8 4c9 4ca
Apr 25 12:48:52 corosync [TOTEM ] Retransmit List: 4d2 4d3 4d4 4d5 4d6 4d7 4d8 4d9
Apr 25 12:49:03 corosync [TOTEM ] Retransmit List: 4e0 4e1 4e2 4e3 4e4 4e5 4e6 4e7
Apr 25 12:49:03 corosync [TOTEM ] Retransmit List: 4e2 4e3 4e4 4e5 4e6 4e7
Apr 25 12:49:18 corosync [TOTEM ] Retransmit List: 4ee 4ef 4f0 4f1 4f2 4f3 4f4 4f5
Apr 25 12:49:32 corosync [TOTEM ] Retransmit List: 50a 50b 50c 50d 50e 50f 510 511
Apr 25 12:49:42 corosync [TOTEM ] Retransmit List: 518 519 51a 51b 51c 51d 51e 51f
Apr 25 12:49:52 corosync [TOTEM ] Retransmit List: 526 527 528 529 52a 52b 52c 52d
Apr 25 12:49:52 corosync [TOTEM ] Retransmit List: 528 529 52a 52b 52c 52d
Apr 25 12:50:02 corosync [TOTEM ] Retransmit List: 534 535 536 537 538 539 53a 53b
Apr 25 12:50:02 corosync [TOTEM ] Retransmit List: 536 537 538 539 53a 53b
Apr 25 12:50:22 corosync [TOTEM ] Retransmit List: 542 543 544 545 546 547 548 549
Apr 25 12:50:22 corosync [TOTEM ] Retransmit List: 551 552 553 554 555 556 557 558
Apr 25 12:50:22 corosync [TOTEM ] Retransmit List: 558
Apr 25 12:50:33 corosync [TOTEM ] Retransmit List: 55f 560 561 562 563 564 565 566
Apr 25 12:50:43 corosync [TOTEM ] Retransmit List: 56d 56e 56f 570 571 572 573 574
Apr 25 12:50:52 corosync [TOTEM ] Retransmit List: 57b 57c 57d 57e 57f 580 581 582
Apr 25 12:50:52 corosync [TOTEM ] Retransmit List: 581 582
Apr 25 12:51:02 corosync [TOTEM ] Retransmit List: 589 58a 58b 58c 58d 58e 58f 590
Apr 25 12:51:02 corosync [TOTEM ] Retransmit List: 58b 58c 58d 58e 58f 590
Apr 25 12:51:12 corosync [TOTEM ] Retransmit List: 597 598 599 59a 59b 59c 59d 59e
Apr 25 12:51:12 corosync [TOTEM ] Retransmit List: 599 59a 59b 59c 59d 59e
Apr 25 12:51:22 corosync [TOTEM ] Retransmit List: 5a5 5a6 5a7 5a8 5a9 5aa 5ab 5ac
Apr 25 12:51:32 corosync [TOTEM ] Retransmit List: 5b3 5b4 5b5 5b6 5b7 5b8 5b9 5ba 5bb 5bc 5bd 5be 5bf 5c0
Apr 25 12:51:32 corosync [TOTEM ] Retransmit List: 5bf 5c0
Code:
Apr 25 13:05:09 node1 pvedaemon[5389]: <root@pam> successful auth for user 'root@pam'
Apr 25 13:05:33 node1 qdiskd[4624]: qdisk cycle took more than 1 second to complete (1.780000)
Apr 25 13:05:35 node1 qdiskd[4624]: qdisk cycle took more than 1 second to complete (1.290000)
Apr 25 13:05:46 node1 qdiskd[4624]: qdisk cycle took more than 1 second to complete (1.120000)
Apr 25 13:06:00 node1 qdiskd[4624]: qdisk cycle took more than 1 second to complete (1.090000)
Apr 25 13:06:01 node1 pveproxy[9387]: worker exit
Apr 25 13:06:01 node1 pveproxy[5684]: worker 9387 finished
Apr 25 13:06:01 node1 pveproxy[5684]: starting 1 worker(s)
Apr 25 13:06:01 node1 pveproxy[5684]: worker 15071 started
Apr 25 13:06:04 node1 qdiskd[4624]: qdisk cycle took more than 1 second to complete (1.680000)
Apr 25 13:06:11 node1 qdiskd[4624]: qdiskd: read (system call) has hung for 2 seconds
Apr 25 13:06:11 node1 qdiskd[4624]: In 3 more seconds, we will be evicted
Apr 25 13:06:16 node1 qdiskd[4624]: qdisk cycle took more than 1 second to complete (6.620000)
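For context on the eviction warning above: with the quorumd settings from the cluster.conf in this post (interval="1", tko="5"), qdiskd declares a node dead after tko consecutive missed cycles of interval seconds each, i.e. roughly interval × tko seconds. A minimal sketch of that arithmetic, assuming those two values:

```shell
# Rough qdiskd eviction window from the quorumd settings in cluster.conf:
# a node is evicted after tko missed cycles of interval seconds each.
interval=1   # <quorumd interval="1" ...>
tko=5        # <quorumd ... tko="5" ...>
echo "eviction after about $((interval * tko)) seconds of missed qdisk cycles"
```

That lines up with the log: the read had hung for 2 seconds and qdiskd warned it would be evicted in 3 more.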
Code:
root@node1:/etc/pve# netstat -s
Ip:
86025 total packets received
0 forwarded
0 incoming packets discarded
83066 incoming packets delivered
79341 requests sent out
3 fragments dropped after timeout
2111 reassemblies required
1054 packets reassembled ok
3 packet reassembles failed
657 fragments received ok
1314 fragments created
Icmp:
131 ICMP messages received
1 input ICMP message failed.
ICMP input histogram:
destination unreachable: 2
timeout in transit: 129
8 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 5
time exceeded: 3
IcmpMsg:
InType3: 2
InType11: 129
OutType3: 5
OutType11: 3
Tcp:
730 active connections openings
402 passive connection openings
381 failed connection attempts
6 connection resets received
31 connections established
70589 segments received
67902 segments send out
371 segments retransmited
29 bad segments received.
925 resets sent
Udp:
11347 packets received
3 packets to unknown port received.
0 packet receive errors
11399 packets sent
UdpLite:
TcpExt:
179 invalid SYN cookies received
290 TCP sockets finished time wait in fast timer
1472 delayed acks sent
Quick ack mode was activated 16 times
2 packets directly queued to recvmsg prequeue.
44928 packet headers predicted
11387 acknowledgments not containing data payload received
22308 predicted acknowledgments
124 times recovered from packet loss by selective acknowledgements
Detected reordering 1 times using SACK
Detected reordering 1 times using time stamp
12 congestion windows fully recovered without slow start
4 congestion windows partially recovered using Hoe heuristic
111 congestion windows recovered without slow start by DSACK
10 congestion windows recovered without slow start after partial ack
152 fast retransmits
206 forward retransmits
12 other TCP timeouts
TCPLossProbes: 94
TCPLossProbeRecovery: 59
18 DSACKs sent for old packets
4035 DSACKs received
6 connections aborted due to timeout
TCPDSACKIgnoredOld: 1
TCPDSACKIgnoredNoUndo: 3031
TCPSackMerged: 118
TCPSackShiftFallback: 1089
TCPRcvCoalesce: 167
TCPOFOQueue: 335
TCPChallengeACK: 29
TCPSYNChallenge: 29
TCPSpuriousRtxHostQueues: 1
IpExt:
InMcastPkts: 76
InBcastPkts: 920
InOctets: 30324948
OutOctets: 19347639
InMcastOctets: 2432
InBcastOctets: 202688
Code:
<?xml version="1.0"?>
<cluster config_version="14" name="tychy">
<cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>
<quorumd allow_kill="0" interval="1" label="proxmox_quorum1" tko="5" votes="1">
<heuristic interval="2" program="ping $GATEWAY -c1 -w1" score="1" tko="4"/>
<heuristic interval="2" program="ip addr | grep bond1 | grep -q UP" score="2" tko="3"/>
</quorumd>
<totem token="20000" window_size="40"/>
<fencedevices>
<fencedevice agent="fence_drac5" cmd_prompt="/admin1->" ipaddr="10.55.55.14" login="fence" name="node1-drac" passwd="xxxxxxxxxx" secure="1"/>
<fencedevice agent="fence_drac5" cmd_prompt="/admin1->" ipaddr="10.55.55.15" login="fence" name="node2-drac" passwd="xxxxxxxxxx" secure="1"/>
</fencedevices>
<clusternodes>
<clusternode name="node1" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="node1-drac"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="node2-drac"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<pvevm autostart="1" vmid="200"/>
</rm>
</cluster>
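A note on the second quorumd heuristic above: it only scores when `ip addr` prints a line containing both "bond1" and "UP". A sketch of that check in isolation, fed a captured sample line instead of live `ip addr` output:

```shell
# Simulate the quorumd heuristic `ip addr | grep bond1 | grep -q UP`
# against a sample interface line (a live run would pipe `ip addr` instead):
printf '%s\n' "4: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 state UP" \
| grep bond1 | grep -q UP && echo "heuristic passes"
```

Keep in mind this is a loose check: a plain `grep UP` also matches LOWER_UP, so it can pass on an interface that is administratively up but not actually carrying traffic.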
Code:
root@node1:/var/log/cluster# pveversion --v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 3.10.0-8-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-3.10.0-8-pve: 3.10.0-30
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
Code:
root@node2:~# pveversion --v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 3.10.0-8-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-3.10.0-8-pve: 3.10.0-30
pve-kernel-2.6.32-37-pve: 2.6.32-150
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1