A while back I worked with people on this forum and we eventually got a version of qemu 2.1.0 compiled against glusterfs version 3.5.2. You can see the history here:
http://forum.proxmox.com/threads/19102-Updated-gluster-possible
So until recently I was running the pvetest 2.1.0 version of qemu so that gluster worked properly (including live migration). I was pleased to see in the Proxmox 3.3 roadmap that qemu 2.1.0 (now 2.1.2) is compiled against the glusterfs 3.5.2 libraries.
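For reference, this is roughly how we check on each node that the installed qemu is actually linked against the gluster 3.5.x libgfapi (the /usr/bin/kvm path is where pve-qemu-kvm puts the binary on our nodes; adjust for your setup):

# installed gluster library packages on the node
dpkg -l | grep -i gluster

# confirm the kvm binary is linked against libgfapi
ldd /usr/bin/kvm | grep -i gfapi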
We recently upgraded all our nodes to pve 3.3 (kernel 2.6.32-33-pve), which brought our qemu up to 2.1.2. Our setup is a 3-node pve cluster where each node also serves as gluster storage (3-way replica, one brick per pve node). The upgrade path we took, which made the issue clear, was as follows (a rough command sketch follows the list):
1) Live migrate all guests from host A to host B
2) Gracefully shut down gluster on host A
3) Upgrade host A (apt-get dist-upgrade) etc etc
4) Reboot host A (all brand new)
5) Wait for gluster to heal
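For completeness, the per-node sequence was roughly the following (VM ID, target node and volume name are placeholders, and how you stop gluster depends on how glusterd is run on your nodes):

# 1) live migrate each guest off host A
qm migrate 101 hostB --online

# 2) gracefully stop gluster on host A
service glusterfs-server stop

# 3) upgrade the node
apt-get update && apt-get dist-upgrade

# 4) reboot
reboot

# 5) after host A is back up, watch the self-heal until it is clean
gluster volume heal VOLNAME info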
Every upgrade where the migration source was qemu 2.1.2 produced disk access failures on guests (some guests are more tolerant than others, but all had IO issues). It looks like the libgfapi linked into the 3.3 qemu 2.1.2 behaves differently when a gluster brick goes offline. We have since tested a 2.1.2 to 2.1.2 migration to verify it was not a 2.1.0 to 2.1.2 migration issue.
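In case it helps anyone reproduce, this is roughly how we confirmed which qemu binary each guest was actually running before and after the 2.1.2 to 2.1.2 test (VM ID is a placeholder; the pid file path is the one qemu-server uses on our nodes):

# qemu package and binary version installed on the node
dpkg -l pve-qemu-kvm
kvm --version

# check whether a running guest is still executing the old, replaced binary
# (the symlink shows "(deleted)" if the binary on disk was upgraded underneath it)
ls -l /proc/$(cat /var/run/qemu-server/101.pid)/exe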
Does anyone have thoughts on this? At present we would only consider cold migration.
Thanks!