Hi All,
I've set up a distributed-replicated Gluster cluster across my Proxmox nodes and I'm running a few KVM instances on top of it.
I'm hitting a problem whenever one of the nodes reboots: the KVM guests hang and die.
My guess is a possible bug in the block device layer used by KVM that prevents it from failing over to a working node.
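In case it matters, this is how I check how the guest disks are actually wired up on the Proxmox side (the VM ID 100 below is just an example):

# qm showcmd 100
# mount | grep gluster

The first shows the kvm command line for the guest (so how the disk is attached), the second shows the Gluster FUSE mount itself.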
The setup is simple: a 2x2 distributed-replicate volume exporting a Gluster share to Proxmox/KVM:
# gluster volume info kvmbackend
Volume Name: kvmbackend
Type: Distributed-Replicate
Volume ID: 8807ce9f-ee5f-43d4-9a28-9999999999
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: proxmox10g:/data/gluster/brick
Brick2: proxmox11g:/data/gluster/brick
Brick3: proxmox12g:/data/gluster/brick
Brick4: proxmox13g:/data/gluster/brick
Options Reconfigured:
cluster.server-quorum-type: server
diagnostics.client-log-level: WARNING
auth.allow: 10.0.0.*,127.*
performance.cache-size: 1GB
nfs.disable: on
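Before and after the reboot I check brick and peer health with the usual status commands, and everything reports online and connected:

# gluster volume status kvmbackend
# gluster peer status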
The Gluster volume is online and fully working. I have one Windows and one Linux KVM guest, both running on proxmox13g (Brick4).
The disk images for the Windows guest are located on Brick1 and Brick2 (proxmox10g and proxmox11g).
The images for the Linux guest happen to be located on the same replica pair.
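For what it's worth, this is how I determined which bricks hold each image: querying the pathinfo xattr through the FUSE mount (the mount point and image path below are placeholders for my actual ones):

# getfattr -n trusted.glusterfs.pathinfo /mnt/pve/kvmbackend/images/100/vm-100-disk-1.qcow2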
If I reboot Brick2 (proxmox11g), both KVM guests lock up hard. The Linux one dumps console errors about being unable to access its disk and tries to remount the root filesystem read-only. The Windows one either hangs hard or crashes to a blue screen, also showing disk errors.
This lock-up happens every single time, and I'm out of ideas.
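The only lead I have so far is the client ping timeout: Gluster's network.ping-timeout defaults to 42 seconds, which is longer than the disk timeouts inside a typical guest, so the guests may be giving up on their disks before the Gluster client gives up on the dead brick. I haven't confirmed this is the cause, but I'm considering something like:

# gluster volume set kvmbackend network.ping-timeout 10

and/or raising the disk timeout inside the Linux guest (path assumes a SCSI/virtio-scsi disk at sda):

# echo 180 > /sys/block/sda/device/timeout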
Any help will be greatly appreciated.
Jinger.