HA problem

bemar

Hello,

I've created an HA Cluster with 3 nodes.
Fencing is working with the APC7921 fencing device.

I've tried to start a VM and got the following error:

Code:
Executing HA start for CT 109
Member vmhost2 trying to enable pvevm:109...Could not connect to resource group manager
TASK ERROR: command 'clusvcadm -e pvevm:109 -m vmhost2' failed: exit code 1

Starting the rgmanager failed with:
Code:
root@vmhost2:~# /etc/init.d/rgmanager start
Starting Cluster Service Manager: [FAILED]
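
For reference, the overall cluster and quorum state can be checked with the standard redhat-cluster / Proxmox VE 2.x tools (command names only, output not from this cluster):

Code:
# cluster membership and quorum as seen by cman
cman_tool status
cman_tool nodes

# resource group manager view (only useful once rgmanager is up)
clustat

# Proxmox's own summary
pvecm status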

What could be the problem?

Thank you and best regards

Ben
 
That's what I get from "cat /var/log/syslog | grep dlm":

Code:
Apr 25 15:13:05 vmhost2 kernel: dlm: closing connection to node 3
Apr 25 15:48:07 vmhost2 dlm_controld[2186]: dlm_controld 1324544458 started
Apr 25 15:48:18 vmhost2 kernel: dlm: Using TCP for communications
Apr 25 15:48:19 vmhost2 dlm_controld[2186]: dlm_join_lockspace no fence domain
Apr 25 15:48:19 vmhost2 dlm_controld[2186]: process_uevent online@ error -1 errno 2
Apr 25 15:48:19 vmhost2 kernel: dlm: rgmanager: group join failed -1 -1
Apr 25 16:02:12 vmhost2 kernel: dlm: Using TCP for communications
Apr 25 16:02:12 vmhost2 dlm_controld[2186]: dlm_join_lockspace no fence domain
Apr 25 16:02:12 vmhost2 dlm_controld[2186]: process_uevent online@ error -1 errno 11
Apr 25 16:02:12 vmhost2 kernel: dlm: rgmanager: group join failed -1 -1
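
If I read the log right, "dlm_join_lockspace no fence domain" means the node never joined the fence domain, so rgmanager's DLM lockspace join fails. On Proxmox VE 2.x fencing has to be enabled explicitly on every node (this is from the Proxmox fencing docs as far as I remember, so please double-check):

Code:
# show the current members of the fence domain
fence_tool ls

# enable joining the fence domain at boot:
# set FENCE_JOIN="yes" in /etc/default/redhat-cluster-pve,
# then reboot, restart cman, or join manually:
fence_tool join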

That's my cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="10" name="FinawareCluster">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.61.14" login="apc" name="apc" passwd="apc"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="vmhost1" nodeid="1" votes="1">
      <fence>
        <method name="power">
          <device name="apc" port="1" secure="on"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vmhost2" nodeid="2" votes="1">
      <fence>
        <method name="power">
          <device name="apc" port="2" secure="on"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vmhost3" nodeid="3" votes="1">
      <fence>
        <method name="power">
          <device name="apc" port="3" secure="on"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="107"/>
    <pvevm autostart="1" vmid="109"/>
  </rm>
</cluster>
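
The config itself can be sanity-checked on a node before enabling more HA resources (these tools come with the redhat-cluster/rgmanager packages as far as I know; adjust the path if cluster.conf lives elsewhere):

Code:
# validate cluster.conf against the schema
ccs_config_validate

# let rgmanager parse the resource tree without starting anything
rg_test test /etc/pve/cluster.conf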
 
Got it.

The rgmanager has to be started on the master node (vmhost1) first. I had tried to start it on vmhost2 first, and that failed.
After starting rgmanager on the master node vmhost1, the startups on the other nodes succeeded.

Little hint: execute "update-rc.d rgmanager defaults" to make sure it is started again after a node reboot.
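
A quick way to verify the init links were created and that cman is ordered before rgmanager (plain Debian sysv-rc, nothing PVE-specific):

Code:
# create the default runlevel links for rgmanager
update-rc.d rgmanager defaults

# check the start order in the default runlevel
ls /etc/rc2.d/ | grep -E 'cman|rgmanager'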

Now I've earned a cigarette ;-)

Best regards

Ben
 
Hi,

I guess that was too early:

The rgmanager is running on vmhost1 (master).

On the two other nodes I get this error in syslog:
Code:
dlm: Using TCP for communications
dlm: rgmanager: group join failed -1 -1

rgmanager starts but stops again right after startup.

I have no clue what the problem is because there is no further info in the logs.

Any ideas?

Best regards

Ben
 
Hi,
On the two other nodes I get this error in syslog:
Code:
dlm: Using TCP for communications
dlm: rgmanager: group join failed -1 -1

rgmanager starts but stops again right after startup.

I have no clue what the problem is because there is no further info in the logs.

Any ideas?
Firewall issues, or a managed switch blocking communication between the ports involved (VLAN tagging)?
 
Neither of those. The PVE communication runs over the same route and NICs, and that works.

That's my network config:
Code:
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

iface eth2 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.61.10
        netmask 255.255.255.0
        gateway 192.168.61.1
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0

auto bond0
iface bond0 inet static
        slaves eth0 eth2
        address 172.60.23.6
        netmask 255.255.255.240
        network 172.60.23.0
        broadcast 172.60.23.15
        bond_miimon 100
        bond_mode balance-rr
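
Since cman/corosync relies on multicast, it is also worth verifying that multicast really works between all three nodes on the cluster network. omping (a separate Debian package) is the usual tool; it has to run on all nodes at the same time (adjust the hostnames if the cluster network uses different names):

Code:
apt-get install omping
omping -c 600 -i 1 -q vmhost1 vmhost2 vmhost3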
 
Neither of those. The PVE communication runs over the same route and NICs, and that works.

That's my network config:
Code:
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

iface eth2 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.61.10
        netmask 255.255.255.0
        gateway 192.168.61.1
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0

auto bond0
iface bond0 inet static
        slaves eth0 eth2
        address 172.60.23.6
        netmask 255.255.255.240
        network 172.60.23.0
        broadcast 172.60.23.15
        bond_miimon 100
        bond_mode balance-rr
Are the nodes assigned IPs on 172.60.23.0 or 192.168.61.0? Routing does not automatically take place between vmbr0 and bond0.
 
They talk to each other over the 172.60.23.0 network.
But your fence agent:
<fencedevices>
<fencedevice agent="fence_apc" ipaddr="192.168.61.14" login="apc" name="apc" passwd="apc"/>
</fencedevices>

How is the fence agent supposed to communicate with the nodes and the node manager?
 
They talk to each other over the 172.60.23.0 network.


Sounds like a little issue I'm having. Out of interest, try stopping cron and cman, restarting the PVE cluster manager (pve-cluster), then starting cron, cman and then rgmanager. It sounds very similar to part of the issue I'm having with HA not quite playing ball.
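
In other words, something like this on the affected node (assuming the standard init scripts of Proxmox VE 2.x):

Code:
/etc/init.d/cron stop
/etc/init.d/cman stop
/etc/init.d/pve-cluster restart
/etc/init.d/cron start
/etc/init.d/cman start
/etc/init.d/rgmanager start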

Dave
 
Through the other network.

When I execute the "fence_apc" command on the nodes, it works.

Code:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.60.23.0     *               255.255.255.240 U     0      0        0 bond0
192.168.61.0    *               255.255.255.0   U     0      0        0 vmbr0
default         192.168.61.1    0.0.0.0         UG    0      0        0 vmbr0
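
For completeness, this is roughly the manual test meant above, using the values from cluster.conf (option names as in the stock fence agents; check "fence_apc -h", and add -x if the PDU is reached via SSH, i.e. secure="on"):

Code:
# query the power state of outlet 2 (vmhost2) on the APC PDU
fence_apc -a 192.168.61.14 -l apc -p apc -n 2 -o status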
 
No luck. I got:

Code:
Apr 25 17:48:41 vmhost2 dlm_controld[2626]: dlm_controld 1324544458 started
Apr 25 17:48:49 vmhost2 kernel: dlm: Using TCP for communications
Apr 25 17:48:49 vmhost2 dlm_controld[2626]: dlm_join_lockspace no fence domain
Apr 25 17:48:49 vmhost2 dlm_controld[2626]: process_uevent online@ error -1 errno 2
Apr 25 17:48:49 vmhost2 kernel: dlm: rgmanager: group join failed -1 -1
 
Today at 5:01 a.m. a very serious error occurred on vmhost1 (master). When I came to the office today, the machine was marked red in the GUI and all VMs and containers were off.

Code:
Apr 26 05:01:10 vmhost1 kernel: connection1:0: detected conn error (1011)
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: Device offlined - not ready after error recovery
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: Device offlined - not ready after error recovery
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: Device offlined - not ready after error recovery
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: Device offlined - not ready after error recovery
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: Device offlined - not ready after error recovery
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Unhandled error code
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] CDB: Write(10): 2a 00 0f 88 32 08 00 00 08 00
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Unhandled error code
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] CDB: Write(10): 2a 00 0f 88 44 d0 00 00 08 00
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Unhandled error code
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] CDB: Write(10): 2a 00 0f 55 5f e8 00 00 40 00
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Unhandled error code
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] CDB: Write(10): 2a 00 0f 55 60 a0 00 00 08 00
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Unhandled error code
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
Apr 26 05:03:17 vmhost1 kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 10 80 e7 80 00 00 08 00
Apr 26 05:04:30 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:05:22 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:06:14 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:07:06 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:07:58 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:08:50 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:09:42 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:10:34 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:11:26 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:12:39 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:13:31 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:14:23 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:15:15 vmhost1 kernel: scsi 5:0:0:1: Device offlined - not ready after error recovery
Apr 26 05:17:56 vmhost1 kernel: iscsiadm      D ffff880f3a47c580     0 188738   3071    0 0x00000000
Apr 26 05:17:56 vmhost1 kernel: ffff880f03fd77f8 0000000000000082 0000000000000000 ffffffff8100984c
Apr 26 05:17:56 vmhost1 kernel: ffff8807bcfe1278 0000000000000000 0000000000fd77b8 ffff88002825bd80
Apr 26 05:17:56 vmhost1 kernel: ffff88002825e268 ffff880f3a47cb20 ffff880f03fd7fd8 ffff880f03fd7fd8
Apr 26 05:17:56 vmhost1 kernel: Call Trace:
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8100984c>] ? __switch_to+0x1ac/0x320
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8150b4a5>] schedule_timeout+0x215/0x2e0
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8150b113>] wait_for_common+0x123/0x190
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81059b50>] ? default_wake_function+0x0/0x20
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81240643>] ? __generic_unplug_device+0x33/0x40
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8150b23d>] wait_for_completion+0x1d/0x20
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81247eec>] blk_execute_rq+0x8c/0xf0
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81241970>] ? blk_rq_bio_prep+0x30/0xb0
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81247a66>] ? blk_rq_map_kern+0xd6/0x150
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8135d80c>] scsi_execute+0xfc/0x160
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8135da88>] scsi_execute_req+0xb8/0x190
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8135f20c>] scsi_probe_and_add_lun+0x2dc/0xef0
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81242659>] ? blk_put_request+0x49/0x60
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81260127>] ? kobject_put+0x27/0x60
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8136023c>] __scsi_scan_target+0x41c/0x750
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81360ca5>] scsi_scan_target+0xd5/0xf0
Apr 26 05:17:56 vmhost1 kernel: [<ffffffffa029bd69>] iscsi_user_scan_session+0x159/0x190 [scsi_transport_iscsi]
Apr 26 05:17:56 vmhost1 kernel: [<ffffffffa029bc10>] ? iscsi_user_scan_session+0x0/0x190 [scsi_transport_iscsi]
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8133ba8c>] device_for_each_child+0x4c/0x80
Apr 26 05:17:56 vmhost1 kernel: [<ffffffffa029a69d>] iscsi_user_scan+0x2d/0x30 [scsi_transport_iscsi]
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff81361854>] store_scan+0xe4/0x120
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8133a960>] dev_attr_store+0x20/0x30
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff812032e5>] sysfs_write_file+0xe5/0x170
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8118b058>] vfs_write+0xb8/0x1a0
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8118ba61>] sys_write+0x51/0x90
Apr 26 05:17:56 vmhost1 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Apr 26 05:19:56 vmhost1 kernel: iscsiadm      D ffff880f3a47c580     0 188738   3071    0 0x00000000
Apr 26 05:19:56 vmhost1 kernel: ffff880f03fd77f8 0000000000000082 0000000000000000 ffffffff8100984c
Apr 26 05:19:56 vmhost1 kernel: ffff8807bcfe1278 0000000000000000 0000000000fd77b8 ffff88002825bd80
Apr 26 05:19:56 vmhost1 kernel: ffff88002825e268 ffff880f3a47cb20 ffff880f03fd7fd8 ffff880f03fd7fd8
Apr 26 05:19:56 vmhost1 kernel: Call Trace:
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8100984c>] ? __switch_to+0x1ac/0x320
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8150b4a5>] schedule_timeout+0x215/0x2e0
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8150b113>] wait_for_common+0x123/0x190
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81059b50>] ? default_wake_function+0x0/0x20
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81240643>] ? __generic_unplug_device+0x33/0x40
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8150b23d>] wait_for_completion+0x1d/0x20
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81247eec>] blk_execute_rq+0x8c/0xf0
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81241970>] ? blk_rq_bio_prep+0x30/0xb0
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81247a66>] ? blk_rq_map_kern+0xd6/0x150
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8135d80c>] scsi_execute+0xfc/0x160
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8135da88>] scsi_execute_req+0xb8/0x190
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8135f20c>] scsi_probe_and_add_lun+0x2dc/0xef0
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81242659>] ? blk_put_request+0x49/0x60
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81260127>] ? kobject_put+0x27/0x60
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8136023c>] __scsi_scan_target+0x41c/0x750
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81360ca5>] scsi_scan_target+0xd5/0xf0
Apr 26 05:19:56 vmhost1 kernel: [<ffffffffa029bd69>] iscsi_user_scan_session+0x159/0x190 [scsi_transport_iscsi]
Apr 26 05:19:56 vmhost1 kernel: [<ffffffffa029bc10>] ? iscsi_user_scan_session+0x0/0x190 [scsi_transport_iscsi]
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8133ba8c>] device_for_each_child+0x4c/0x80
Apr 26 05:19:56 vmhost1 kernel: [<ffffffffa029a69d>] iscsi_user_scan+0x2d/0x30 [scsi_transport_iscsi]
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff81361854>] store_scan+0xe4/0x120
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8133a960>] dev_attr_store+0x20/0x30
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff812032e5>] sysfs_write_file+0xe5/0x170
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8118b058>] vfs_write+0xb8/0x1a0
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8118ba61>] sys_write+0x51/0x90
Apr 26 05:19:56 vmhost1 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Apr 26 05:21:56 vmhost1 kernel: iscsiadm      D ffff880f3a47c580     0 188738   3071    0 0x00000000
Apr 26 05:21:56 vmhost1 kernel: ffff880f03fd77f8 0000000000000082 0000000000000000 ffffffff8100984c
Apr 26 05:21:56 vmhost1 kernel: ffff8807bcfe1278 0000000000000000 0000000000fd77b8 ffff88002825bd80
Apr 26 05:21:56 vmhost1 kernel: ffff88002825e268 ffff880f3a47cb20 ffff880f03fd7fd8 ffff880f03fd7fd8
Apr 26 05:21:56 vmhost1 kernel: Call Trace:
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8100984c>] ? __switch_to+0x1ac/0x320
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8150b4a5>] schedule_timeout+0x215/0x2e0
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8150b113>] wait_for_common+0x123/0x190
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81059b50>] ? default_wake_function+0x0/0x20
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81240643>] ? __generic_unplug_device+0x33/0x40
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8150b23d>] wait_for_completion+0x1d/0x20
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81247eec>] blk_execute_rq+0x8c/0xf0
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81241970>] ? blk_rq_bio_prep+0x30/0xb0
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81247a66>] ? blk_rq_map_kern+0xd6/0x150
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8135d80c>] scsi_execute+0xfc/0x160
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8135da88>] scsi_execute_req+0xb8/0x190
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8135f20c>] scsi_probe_and_add_lun+0x2dc/0xef0
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81242659>] ? blk_put_request+0x49/0x60
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81260127>] ? kobject_put+0x27/0x60
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8136023c>] __scsi_scan_target+0x41c/0x750
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81360ca5>] scsi_scan_target+0xd5/0xf0
Apr 26 05:21:56 vmhost1 kernel: [<ffffffffa029bd69>] iscsi_user_scan_session+0x159/0x190 [scsi_transport_iscsi]
Apr 26 05:21:56 vmhost1 kernel: [<ffffffffa029bc10>] ? iscsi_user_scan_session+0x0/0x190 [scsi_transport_iscsi]
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8133ba8c>] device_for_each_child+0x4c/0x80
Apr 26 05:21:56 vmhost1 kernel: [<ffffffffa029a69d>] iscsi_user_scan+0x2d/0x30 [scsi_transport_iscsi]
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff81361854>] store_scan+0xe4/0x120
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8133a960>] dev_attr_store+0x20/0x30
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff812032e5>] sysfs_write_file+0xe5/0x170
Apr 26 05:21:56 vmhost1 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Apr 26 05:23:56 vmhost1 kernel: iscsiadm      D ffff880f3a47c580     0 188738   3071    0 0x00000000
Apr 26 05:23:56 vmhost1 kernel: ffff880f03fd77f8 0000000000000082 0000000000000000 ffffffff8100984c
Apr 26 05:23:56 vmhost1 kernel: ffff8807bcfe1278 0000000000000000 0000000000fd77b8 ffff88002825bd80
Apr 26 05:23:56 vmhost1 kernel: ffff88002825e268 ffff880f3a47cb20 ffff880f03fd7fd8 ffff880f03fd7fd8
Apr 26 05:23:56 vmhost1 kernel: Call Trace:
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8100984c>] ? __switch_to+0x1ac/0x320
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8150b4a5>] schedule_timeout+0x215/0x2e0
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8150b113>] wait_for_common+0x123/0x190
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81059b50>] ? default_wake_function+0x0/0x20
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81240643>] ? __generic_unplug_device+0x33/0x40
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8150b23d>] wait_for_completion+0x1d/0x20
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81247eec>] blk_execute_rq+0x8c/0xf0
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81241970>] ? blk_rq_bio_prep+0x30/0xb0
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81247a66>] ? blk_rq_map_kern+0xd6/0x150
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8135d80c>] scsi_execute+0xfc/0x160
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8135da88>] scsi_execute_req+0xb8/0x190
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8135f20c>] scsi_probe_and_add_lun+0x2dc/0xef0
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81242659>] ? blk_put_request+0x49/0x60
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81260127>] ? kobject_put+0x27/0x60
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8136023c>] __scsi_scan_target+0x41c/0x750
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81360ca5>] scsi_scan_target+0xd5/0xf0
Apr 26 05:23:56 vmhost1 kernel: [<ffffffffa029bd69>] iscsi_user_scan_session+0x159/0x190 [scsi_transport_iscsi]
Apr 26 05:23:56 vmhost1 kernel: [<ffffffffa029bc10>] ? iscsi_user_scan_session+0x0/0x190 [scsi_transport_iscsi]
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8133ba8c>] device_for_each_child+0x4c/0x80
Apr 26 05:23:56 vmhost1 kernel: [<ffffffffa029a69d>] iscsi_user_scan+0x2d/0x30 [scsi_transport_iscsi]
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff81361854>] store_scan+0xe4/0x120
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8133a960>] dev_attr_store+0x20/0x30
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff812032e5>] sysfs_write_file+0xe5/0x170
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8118b058>] vfs_write+0xb8/0x1a0
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8118ba61>] sys_write+0x51/0x90
Apr 26 05:23:56 vmhost1 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Apr 26 05:25:56 vmhost1 kernel: iscsiadm      D ffff880f3a47c580     0 188738   3071    0 0x00000000
Apr 26 05:25:56 vmhost1 kernel: ffff880f03fd77f8 0000000000000082 0000000000000000 ffffffff8100984c
Apr 26 05:25:56 vmhost1 kernel: ffff8807bcfe1278 0000000000000000 0000000000fd77b8 ffff88002825bd80
Apr 26 05:25:56 vmhost1 kernel: ffff88002825e268 ffff880f3a47cb20 ffff880f03fd7fd8 ffff880f03fd7fd8
Apr 26 05:25:56 vmhost1 kernel: Call Trace:
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8100984c>] ? __switch_to+0x1ac/0x320
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8150b4a5>] schedule_timeout+0x215/0x2e0
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8150b113>] wait_for_common+0x123/0x190
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81059b50>] ? default_wake_function+0x0/0x20
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81240643>] ? __generic_unplug_device+0x33/0x40
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8150b23d>] wait_for_completion+0x1d/0x20
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81247eec>] blk_execute_rq+0x8c/0xf0
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81241970>] ? blk_rq_bio_prep+0x30/0xb0
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81247a66>] ? blk_rq_map_kern+0xd6/0x150
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8135d80c>] scsi_execute+0xfc/0x160
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8135da88>] scsi_execute_req+0xb8/0x190
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8135f20c>] scsi_probe_and_add_lun+0x2dc/0xef0
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81242659>] ? blk_put_request+0x49/0x60
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81260127>] ? kobject_put+0x27/0x60
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8136023c>] __scsi_scan_target+0x41c/0x750
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81360ca5>] scsi_scan_target+0xd5/0xf0
Apr 26 05:25:56 vmhost1 kernel: [<ffffffffa029bd69>] iscsi_user_scan_session+0x159/0x190 [scsi_transport_iscsi]
Apr 26 05:25:56 vmhost1 kernel: [<ffffffffa029bc10>] ? iscsi_user_scan_session+0x0/0x190 [scsi_transport_iscsi]
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8133ba8c>] device_for_each_child+0x4c/0x80
Apr 26 05:25:56 vmhost1 kernel: [<ffffffffa029a69d>] iscsi_user_scan+0x2d/0x30 [scsi_transport_iscsi]
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff81361854>] store_scan+0xe4/0x120
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8133a960>] dev_attr_store+0x20/0x30
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff812032e5>] sysfs_write_file+0xe5/0x170
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8118b058>] vfs_write+0xb8/0x1a0
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8118ba61>] sys_write+0x51/0x90
Apr 26 05:25:56 vmhost1 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
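
The first line, "connection1:0: detected conn error (1011)", is an open-iscsi connection error, so it looks like the iSCSI session to the storage dropped and the later SCSI and iscsiadm errors are follow-up damage. The session state can be checked with the standard open-iscsi tool:

Code:
# list active iSCSI sessions including connection state
iscsiadm -m session -P 1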

Does anybody see what the problem was?

The strange thing is that the node wasn't fenced despite these errors in the log, and the VMs and containers (several of them managed by HA) weren't migrated to the other two nodes, which are running fine.
When does the fencing mechanism actually kick in?
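
As far as I understand it, fencing only kicks in when a node drops out of the cman/corosync membership, i.e. leaves the fence domain uncleanly; a broken storage connection alone does not get a node fenced as long as the cluster communication keeps working. A fence test can be run manually from another node (careful, this really power-cycles the target):

Code:
fence_node vmhost2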

Thank you

Ben
 
