Node gets evicted during backup task

pietrek

Hi,

We've recently configured a Proxmox cluster with two nodes and shared storage. All three servers are connected directly via 10Gbit links based on Intel 82599EB SFP+ cards. The storage server runs the latest OpenMediaVault and provides 32TB of RAID10 (mdadm-based) space for backups, ISOs and VMs over NFS; it also hosts a 1GB quorum disk shared via iSCSI (fileio). The entire configuration is pretty simple and looks like this: NODE1->STORAGE<-NODE2. Unfortunately we couldn't afford a decent 10Gbit switch, which is why the storage server contains a dual-port card and acts as a switch between NODE1 and NODE2 - it's a simple bridge configured with brctl. All three machines are Supermicro Xeon-based servers, with 16GB of RAM in the storage box and 64GB in each node.
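
For completeness, the bridge on the storage box is nothing fancy - roughly the following (interface names are illustrative, not copied from the actual config):
Code:
# on the storage server: bridge the two 10Gbit ports so the nodes can reach each other
brctl addbr br10g
brctl addif br10g eth2
brctl addif br10g eth3
ip link set eth2 up
ip link set eth3 up
ip link set br10g up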

Each machine is absolutely up to date, and there's only one Windows 2012 R2 VM, configured in HA mode. Everything works perfectly fine except for backups (we're using LZO compression, if that matters)... During the backup task something strange happens: the node currently hosting the VM gets evicted from the cluster, the job itself is interrupted, the VM gets stuck in a "locked" state and everything goes south.

Could anybody please take a look at the logs below and help us track the problem?

In this specific case sul-node0001 was hosting the VM and performing the backup while sul-node0002 was completely idle acting as a backup host (and part of the cluster, of course).


Logs for NODE1:

Code:
Nov  3 12:22:05 sul-node0001 pvedaemon[4016]: <root@pam> successful auth for user 'root@pam'
Nov  3 12:22:06 sul-node0001 rgmanager[11957]: [pvevm] VM 100 is running
Nov  3 12:22:41 sul-node0001 pvestatd[4037]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_BACKUP' failed: got timeout
Nov  3 12:22:43 sul-node0001 pvestatd[4037]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_MSSQL' failed: got timeout
Nov  3 12:22:45 sul-node0001 pvestatd[4037]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_STORAGE' failed: got timeout
Nov  3 12:22:45 sul-node0001 pvestatd[4037]: status update time (6.066 seconds)
Nov  3 12:22:46 sul-node0001 rgmanager[12045]: [pvevm] VM 100 is running
Nov  3 12:23:06 sul-node0001 rgmanager[12099]: [pvevm] VM 100 is running
Nov  3 12:23:26 sul-node0001 rgmanager[12152]: [pvevm] VM 100 is running
Nov  3 12:23:57 sul-node0001 rgmanager[12284]: [pvevm] VM 100 is running
Nov  3 12:24:06 sul-node0001 rgmanager[12320]: [pvevm] VM 100 is running
Nov  3 12:24:09 sul-node0001 pveproxy[4042]: worker 9158 finished
Nov  3 12:24:09 sul-node0001 pveproxy[4042]: starting 1 worker(s)
Nov  3 12:24:09 sul-node0001 pveproxy[4042]: worker 12353 started
Nov  3 12:24:36 sul-node0001 kernel: kvm: exiting hardware virtualization
Nov  3 12:24:36 sul-node0001 kernel: sd 0:0:1:0: [sdb] Synchronizing SCSI cache

Logs for NODE2:

Code:
Nov  3 12:18:51 sul-node0002 pmxcfs[3247]: [status] notice: received log
Nov  3 12:19:29 sul-node0002 qdiskd[3448]: qdisk cycle took more than 1 second to complete (1.340000)
Nov  3 12:22:05 sul-node0002 pmxcfs[3247]: [status] notice: received log
Nov  3 12:22:34 sul-node0002 pvestatd[4430]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_BACKUP' failed: got timeout
Nov  3 12:22:36 sul-node0002 pvestatd[4430]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_MSSQL' failed: got timeout
Nov  3 12:22:38 sul-node0002 pvestatd[4430]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_STORAGE' failed: got timeout
Nov  3 12:22:38 sul-node0002 pvestatd[4430]: status update time (6.056 seconds)
Nov  3 12:22:44 sul-node0002 pvestatd[4430]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_BACKUP' failed: got timeout
Nov  3 12:22:46 sul-node0002 pvestatd[4430]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_MSSQL' failed: got timeout
Nov  3 12:22:48 sul-node0002 pvestatd[4430]: WARNING: command 'df -P -B 1 /mnt/pve/NFS_STORAGE' failed: got timeout
Nov  3 12:22:48 sul-node0002 pvestatd[4430]: status update time (6.056 seconds)
Nov  3 12:24:30 sul-node0002 qdiskd[3448]: Assuming master role
Nov  3 12:24:31 sul-node0002 qdiskd[3448]: Writing eviction notice for node 1
Nov  3 12:24:32 sul-node0002 qdiskd[3448]: Node 1 evicted
Nov  3 12:25:29 sul-node0002 corosync[3397]:   [TOTEM ] A processor failed, forming new configuration.
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] CLM CONFIGURATION CHANGE
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] New Configuration:
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] #011r(0) ip(10.10.10.222)
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] Members Left:
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] #011r(0) ip(10.10.10.221)
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] Members Joined:
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [QUORUM] Members[1]: 2
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] CLM CONFIGURATION CHANGE
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] New Configuration:
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] #011r(0) ip(10.10.10.222)
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] Members Left:
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CLM   ] Members Joined:
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov  3 12:25:31 sul-node0002 rgmanager[3888]: State change: sul-node0001 DOWN
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [CPG   ] chosen downlist: sender r(0) ip(10.10.10.222) ; members(old:2 left:1)
Nov  3 12:25:31 sul-node0002 pmxcfs[3247]: [dcdb] notice: members: 2/3247
Nov  3 12:25:31 sul-node0002 pmxcfs[3247]: [dcdb] notice: members: 2/3247
Nov  3 12:25:31 sul-node0002 kernel: dlm: closing connection to node 1
Nov  3 12:25:31 sul-node0002 corosync[3397]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov  3 12:25:31 sul-node0002 fenced[3603]: fencing node sul-node0001
Nov  3 12:25:32 sul-node0002 fence_ipmilan: Parse error: Ignoring unknown option 'nodename=sul-node0001
Nov  3 12:25:47 sul-node0002 fenced[3603]: fence sul-node0001 success
Nov  3 12:25:48 sul-node0002 rgmanager[3888]: Starting stopped service pvevm:100
Nov  3 12:25:48 sul-node0002 rgmanager[9284]: [pvevm] Move config for VM 100 to local node
Nov  3 12:25:48 sul-node0002 pvevm: <root@pam> starting task UPID:sul-node0002:00002458:00050553:5457663C:qmstart:100:root@pam:
Nov  3 12:25:48 sul-node0002 task UPID:sul-node0002:00002458:00050553:5457663C:qmstart:100:root@pam:: start VM 100: UPID:sul-node0002:00002458:00050553:5457663C:qmstart:100:root@pam:
Nov  3 12:25:48 sul-node0002 task UPID:sul-node0002:00002458:00050553:5457663C:qmstart:100:root@pam:: VM is locked (backup)
Nov  3 12:25:48 sul-node0002 pvevm: <root@pam> end task UPID:sul-node0002:00002458:00050553:5457663C:qmstart:100:root@pam: VM is locked (backup)
Nov  3 12:25:48 sul-node0002 rgmanager[3888]: start on pvevm "100" returned 1 (generic error)
Nov  3 12:25:48 sul-node0002 rgmanager[3888]: #68: Failed to start pvevm:100; return value: 1
Nov  3 12:25:48 sul-node0002 rgmanager[3888]: Stopping service pvevm:100
Nov  3 12:25:49 sul-node0002 rgmanager[9306]: [pvevm] VM 100 is already stopped
Nov  3 12:25:49 sul-node0002 rgmanager[3888]: Service pvevm:100 is recovering
Nov  3 12:25:49 sul-node0002 rgmanager[3888]: #71: Relocating failed service pvevm:100
Nov  3 12:25:49 sul-node0002 rgmanager[3888]: Service pvevm:100 is stopped

There's nothing extraordinary in the storage server's log. nfsd doesn't get messed up; everything seems perfectly OK on that side. Could it be the 10Gbit link getting over-saturated and dropping some vital packets, which in turn leads node0002 to believe that node0001 has gone down? This seems both possible and highly unlikely to me, because peak HDD transfers on the storage server oscillate around 350MB/s, which shouldn't choke those Intel 82599s, right?
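
A quick way to verify this would be something like the following, run while a backup is in progress (assuming iperf is installed on both nodes; 10.10.10.222 is node0002's address from the logs, the interface name is just an example):
Code:
# on the idle node: start an iperf server
iperf -s
# on the node running the backup: push extra traffic across the 10Gbit path
iperf -c 10.10.10.222 -t 30
# then check the NIC counters for drops/errors on the interface carrying NFS traffic
ip -s link show eth2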

Could you please provide me with any tips?
 
Yes, it looks like there is something wrong with cluster communication. For HA, it is highly recommended to use a separate network for cluster communication.
 
Hello again,

We've added an additional 1Gbit network card to our storage server to create another bridge and connected NODE1 and NODE2 to it. The nodes are now connected to the storage via two links: 1Gbit for quorum and inter-node communication, plus 10Gbit for NFS access. We ran the manual backup task a few times to test whether everything was fine, and it was - up until tonight, when a scheduled vzdump task started... node0001 got evicted again after some time, node0002 was unable to take over because the VM was locked, and everything got stuck...
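
For reference, the split on each node now looks roughly like this in /etc/network/interfaces (interface names are illustrative; 10.10.11.x is the cluster subnet that pvecm reports below, the NFS subnet is shown only as an example):
Code:
# 1Gbit link: cluster / quorum traffic only
auto eth1
iface eth1 inet static
        address 10.10.11.221
        netmask 255.255.255.0

# 10Gbit link: NFS traffic only
auto eth2
iface eth2 inet static
        address 10.10.10.221
        netmask 255.255.255.0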

Here are the logs (pretty much the same situation as the original one):

Code:
Nov  7 00:15:01 sul-node0002 /USR/SBIN/CRON[192247]: (root) CMD (vzdump 100 --quiet 1 --mode snapshot --mailto our@mail.com --compress lzo --storage NFS_BACKUP)
Nov  7 00:15:01 sul-node0002 pmxcfs[3343]: [status] notice: received log
Nov  7 00:15:03 sul-node0002 pmxcfs[3343]: [status] notice: received log
Nov  7 00:17:01 sul-node0002 /USR/SBIN/CRON[192430]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Nov  7 00:22:24 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-node/sul-node0001: -1
Nov  7 00:22:24 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/100: -1
Nov  7 00:22:24 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_BACKUP: -1
Nov  7 00:22:24 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_MSSQL: -1
Nov  7 00:22:24 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/local: -1
Nov  7 00:22:24 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_STORAGE: -1
Nov  7 00:22:37 sul-node0002 rrdcached[3332]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_MSSQL) failed with status -1. (/var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_MSSQL: illegal attempt to update using time 1415315855 when last update time is 1415316144 (minimum one second step))
Nov  7 00:22:37 sul-node0002 rrdcached[3332]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_BACKUP) failed with status -1. (/var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_BACKUP: illegal attempt to update using time 1415315855 when last update time is 1415316144 (minimum one second step))
Nov  7 00:22:37 sul-node0002 rrdcached[3332]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_STORAGE) failed with status -1. (/var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_STORAGE: illegal attempt to update using time 1415315855 when last update time is 1415316144 (minimum one second step))
Nov  7 00:24:06 sul-node0002 pmxcfs[3343]: [status] notice: received log
Nov  7 00:24:54 sul-node0002 rrdcached[3332]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-node/sul-node0001) failed with status -1. (/var/lib/rrdcached/db/pve2-node/sul-node0001: illegal attempt to update using time 1415315996 when last update time is 1415316144 (minimum one second step))
Nov  7 00:25:34 sul-node0002 rrdcached[3332]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-vm/100) failed with status -1. (/var/lib/rrdcached/db/pve2-vm/100: illegal attempt to update using time 1415316035 when last update time is 1415316144 (minimum one second step))
Nov  7 00:25:34 sul-node0002 rrdcached[3332]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-storage/sul-node0001/local) failed with status -1. (/var/lib/rrdcached/db/pve2-storage/sul-node0001/local: illegal attempt to update using time 1415316036 when last update time is 1415316144 (minimum one second step))
Nov  7 00:33:14 sul-node0002 pmxcfs[3343]: [dcdb] notice: data verification successful
Nov  7 00:37:12 sul-node0002 rrdcached[3332]: flushing old values
Nov  7 00:37:12 sul-node0002 rrdcached[3332]: rotating journals
Nov  7 00:37:12 sul-node0002 rrdcached[3332]: started new journal /var/lib/rrdcached/journal/rrd.journal.1415317032.785746
Nov  7 00:37:12 sul-node0002 rrdcached[3332]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1415309832.785786
Nov  7 00:39:06 sul-node0002 pmxcfs[3343]: [status] notice: received log
Nov  7 00:42:47 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/100: -1
Nov  7 00:42:47 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_BACKUP: -1
Nov  7 00:42:47 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_MSSQL: -1
Nov  7 00:42:47 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/local: -1
Nov  7 00:42:47 sul-node0002 pmxcfs[3343]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sul-node0001/NFS_STORAGE: -1
Nov  7 00:42:47 sul-node0002 qdiskd[3546]: Node 1 is undead.
Nov  7 00:42:47 sul-node0002 qdiskd[3546]: Writing eviction notice (again) for node 1
Nov  7 00:42:48 sul-node0002 qdiskd[3546]: Node 1 evicted
Nov  7 00:42:49 sul-node0002 qdiskd[3546]: Assuming master role
Nov  7 00:43:41 sul-node0002 corosync[3495]:   [TOTEM ] A processor failed, forming new configuration.
Nov  7 00:43:43 sul-node0002 corosync[3495]:   [CLM   ] CLM CONFIGURATION CHANGE

Is there any way to slightly increase the quorum/heartbeat timeout so that node0002 doesn't fence node0001 (and vice versa) so quickly? Would that be a good idea?
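
The kind of knob we have in mind is something like the totem token timeout in cluster.conf - for example (the value below is purely illustrative, not something we have tested):
Code:
<totem token="54000"/>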

P.S.

Here are some more details about our setup:

NODE0001:
Code:
root@sul-node0001:~# pveversion --verbose
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.3-1 (running version: 3.3-1/a06c9f73)
pve-kernel-2.6.32-32-pve: 2.6.32-136
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-23
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Code:
root@sul-node0001:~# pvecm stat
Version: 6.2.0
Config Version: 20
Cluster Name: pvevm-cluster
Cluster Id: 39604
Cluster Member: Yes
Cluster Generation: 16
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 8
Flags:
Ports Bound: 0 177 178
Node name: sul-node0001
Node ID: 1
Multicast addresses: 239.192.154.79
Node addresses: 10.10.11.221

NODE0002
Code:
root@sul-node0002:~# pveversion --verbose
proxmox-ve-2.6.32: 3.2-136 (running kernel: 2.6.32-32-pve)
pve-manager: 3.3-1 (running version: 3.3-1/a06c9f73)
pve-kernel-2.6.32-32-pve: 2.6.32-136
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.1-34
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-23
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-5
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Code:
root@sul-node0002:~# pvecm status
Version: 6.2.0
Config Version: 20
Cluster Name: pvevm-cluster
Cluster Id: 39604
Cluster Member: Yes
Cluster Generation: 16
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Quorum device votes: 1
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 8
Flags:
Ports Bound: 0 177 178
Node name: sul-node0002
Node ID: 2
Multicast addresses: 239.192.154.79
Node addresses: 10.10.11.222

We decided to go ahead and change the quorumd interval from 1 to 3 using interval="3" in cluster.conf.
One backup task went smoothly, but another one bailed out :-/
The strange thing is that the qdisk cycle now takes even longer, for no apparent reason:

Code:
Nov 07 13:09:33 qdiskd qdisk cycle took more than 3 seconds to complete (6.590000)
Nov 07 13:13:07 qdiskd qdisk cycle took more than 3 seconds to complete (55.110000)
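
For reference, the quorumd line in our cluster.conf now looks roughly like this (tko, allow_kill and label are typical values written from memory, not copied verbatim; votes="1" matches what pvecm reports):
Code:
<quorumd allow_kill="0" interval="3" label="proxmox_qdisk" tko="10" votes="1"/>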

What is the catch here, guys? We're not using anything super-fancy: the nodes are connected directly using 1Gbit Intel cards, and the MTU on every Ethernet adapter is set to 1500.
 
