Distributed filesystem for HA cluster and SAS storage

Alex31

New Member
Oct 23, 2014
Hello,

I have 3 nodes connected to an IBM Storwize V3700 via SAS (Serial Attached SCSI).
The HA cluster works like a charm with shared LVM, but... snapshots seem to be disabled.
I can only make a backup.

After some investigation, it seems this option is not possible when the VM disk is in "raw" format.
If I move the VM's disk to NFS storage and convert it to qcow2, for example, I can snapshot the VM.
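(For reference, I did the conversion with something like this; the VG name, VM ID and NFS path below are only examples:)

# example only: convert a raw VM disk (here an LV) to qcow2 on the NFS storage
qemu-img convert -f raw -O qcow2 /dev/vg_san/vm-100-disk-1 /mnt/pve/nfs-store/images/100/vm-100-disk-1.qcow2
# then point the VM config at the new qcow2 file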

So, my question is:
How can I enable snapshots with LVM storage?

Alex
 
It's not possible on LVM (this is not an LVM snapshot); qm snapshot requires qcow2 on a filesystem, or a more advanced structure like ZFS.
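On qcow2 (or another snapshot-capable storage) the snapshot goes through qm, roughly like this (the VM ID and snapshot name are only examples):

# example only: snapshot a VM whose disks are on snapshot-capable storage
qm snapshot 100 before-upgrade
# list snapshots and roll back later if needed
qm listsnapshot 100
qm rollback 100 before-upgrade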

Luca
 
I use several Storwize units on clusters.
I use only LVM storage; it is more resilient and faster than a filesystem (fewer logical layers). But you cannot use QEMU snapshots.
 
Arf... thank you for your response Dea, but... I'm really not ready to give up snapshots. Life is so simple with them :-)
I'm continuing my quest for the grail.

Alex
 
I forgot to explain another thing... Maybe somebody will have THE idea :-p
I have a cluster of 3 ESX hosts linked to an IBM DS3524 via SAS (Serial Attached SCSI). The storage is shared between the 3 ESX hosts and everything works fine: HA + vMotion, etc. The filesystem is VMFS...

So, in the end, what I'm looking for is an equivalent on Proxmox...
 
Thanks Mitya. GFS2 seems to be the right solution.

I have installed it and clustered a volume group. I made a GFS2 filesystem on it and the 3 nodes mount the FS correctly.
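Roughly the steps I followed (the VG, LV, cluster and mount point names here are only examples, and the lock table name has to match the cluster name in cluster.conf):

# example only: clustered VG + GFS2 with one journal per node (3 nodes)
vgcreate -cy vg_san /dev/mapper/mpath0
lvcreate -L 500G -n gfs2data vg_san
mkfs.gfs2 -p lock_dlm -t mycluster:gfs2data -j 3 /dev/vg_san/gfs2data
# on each node:
mount -t gfs2 /dev/vg_san/gfs2data /mnt/gfs2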
I can create files, but it is impossible to delete them. I get a timeout and some kernel errors:

Does anybody have an idea of what's happening?

Oct 28 09:54:33 proxmox02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 09:54:33 proxmox02 kernel: glock_workque D ffff88087bb00d30 0 5678 2 0 0x00000000
Oct 28 09:54:33 proxmox02 kernel: ffff88087827bc70 0000000000000046 ffff88087827bbf0 ffffffff811df2b6
Oct 28 09:54:33 proxmox02 kernel: ffff8810749cbe98 ffff8808770df000 ffff881074894e30 000000000000fc83
Oct 28 09:54:33 proxmox02 kernel: ffff88087827bc20 ffffffffa072fc0f ffff88087827bfd8 ffff88087827bfd8
Oct 28 09:54:33 proxmox02 kernel: Call Trace:
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff811df2b6>] ? submit_bh+0x126/0x200
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa072fc0f>] ? gfs2_log_write_buf+0xaf/0xd0 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8155ac03>] io_schedule+0x73/0xc0
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa072ef39>] gfs2_log_flush+0x2b9/0x530 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8109fcd0>] ? autoremove_wake_function+0x0/0x40
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa072babb>] inode_go_sync+0x7b/0x160 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa0729b56>] do_xmote+0x136/0x280 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8155a3dc>] ? thread_return+0xbc/0x870
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa072a390>] ? glock_work_func+0x0/0x1e0 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa072a276>] run_queue+0x136/0x250 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa072a390>] ? glock_work_func+0x0/0x1e0 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa072a405>] glock_work_func+0x75/0x1e0 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff81099189>] worker_thread+0x179/0x2d0
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8109fcd0>] ? autoremove_wake_function+0x0/0x40
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff81099010>] ? worker_thread+0x0/0x2d0
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8109f738>] kthread+0x88/0x90
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff810096d2>] ? __switch_to+0xc2/0x2f0
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8100c3ca>] child_rip+0xa/0x20
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8109f6b0>] ? kthread+0x0/0x90
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8100c3c0>] ? child_rip+0x0/0x20
Oct 28 09:54:33 proxmox02 kernel: INFO: task task UPID:proxm:13272 blocked for more than 120 seconds.
Oct 28 09:54:33 proxmox02 kernel: Not tainted 2.6.32-32-pve #1
Oct 28 09:54:33 proxmox02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 09:54:33 proxmox02 kernel: task UPID:pro D ffff88087bb813f0 0 13272 3953 0 0x00000004
Oct 28 09:54:33 proxmox02 kernel: ffff88085bb2fb88 0000000000000082 0000000000000000 ffff8808a061ed00
Oct 28 09:54:33 proxmox02 kernel: ffff88087bb00d30 0000000000000001 000000000001ed00 ffff88107cfeac10
Oct 28 09:54:33 proxmox02 kernel: ffff88085bb2fb28 0000000100304c17 ffff88085bb2ffd8 ffff88085bb2ffd8
Oct 28 09:54:33 proxmox02 kernel: Call Trace:
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8155b824>] schedule_timeout+0x204/0x300
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff810521f3>] ? __wake_up+0x53/0x70
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8155b067>] wait_for_completion+0xd7/0x110
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff81067a70>] ? default_wake_function+0x0/0x20
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff81099bc6>] flush_work+0x76/0xc0
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff81097ee0>] ? wq_barrier_func+0x0/0x20
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff81099c62>] flush_delayed_work+0x52/0x70
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa0743452>] gfs2_clear_inode+0x62/0xc0 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff811c4cec>] clear_inode+0xac/0x140
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa0743514>] gfs2_delete_inode+0x64/0x530 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa07434b0>] ? gfs2_delete_inode+0x0/0x530 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff811c6706>] generic_delete_inode+0xa6/0x1c0
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff811c6875>] generic_drop_inode+0x55/0x70
Oct 28 09:54:33 proxmox02 kernel: [<ffffffffa0743297>] gfs2_drop_inode+0x37/0x40 [gfs2]
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff811c4b82>] iput+0x62/0x70
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff811b9686>] do_unlinkat+0x1d6/0x240
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8156028b>] ? do_page_fault+0x3b/0xa0
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff811b9706>] sys_unlink+0x16/0x20
Oct 28 09:54:33 proxmox02 kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
 
I have compiled gfs2-utils 3.1.7, but now I have this kind of problem:

[09:56] <AleX31> fatal: filesystem consistency error
[09:56] <AleX31> inode = 1 99395
[09:56] <AleX31> function = gfs2_dinode_dealloc, file = fs/gfs2/super.c, line = 1409
[09:56] <AleX31> about to withdraw this file system


Bouuuuh :(
Why was the 2.6.32-32-pve kernel built without OCFS2!!??
 
I have not used gfs2 nor ocfs2; I just know they should do what you want.
You can try a different kernel (3.10 from the pve-no-subscription repo).
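Something like this should do it (repo line for Proxmox 3.x on wheezy; the exact kernel package name may differ, this is only a sketch):

# example only: add the pve-no-subscription repo and install a 3.10 kernel
echo "deb http://download.proxmox.com/debian wheezy pve-no-subscription" >> /etc/apt/sources.list.d/pve-no-subscription.list
apt-get update
apt-get install pve-kernel-3.10.0-5-pve
# reboot into the new kernel afterwards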
 
In my experience, in the past I've tried to use gfs2 and ocfs2. My conclusion was... they are NOT extremely robust.
I use LVM on SAN (IBM Storwize); this is the best for me.

If I need to take a snapshot, I use an LVM snapshot (from the console)... you can only use a single level, but you can roll back or merge.
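For the record, something like this (the VG/LV names and the snapshot size are only examples):

# example only: create a snapshot of a VM's logical volume
lvcreate -s -L 10G -n vm-100-snap /dev/vg_san/vm-100-disk-1
# roll back by merging the snapshot into the origin (the merge completes when the origin is next activated if it is in use)
lvconvert --merge /dev/vg_san/vm-100-snap
# or simply drop the snapshot when it is no longer needed
lvremove /dev/vg_san/vm-100-snap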
 
I have upgraded the kernel to 3.10.0-5-pve as Mitya suggested (thank you!!!), and now GFS2 works correctly. I can create and remove files on each node without problems/hangs.
There is still a problem with the HA function on GFS2... it seems the journal is not written correctly and the VM appears to be blocked... I'm not really sure about that, but I'm working on it...

Dea, this way is interesting. I'm going to have a look at it.

Alex
 
Please run an I/O benchmark with and without GFS. In my tests the performance dropped by about 70%. If you need snapshots on a block SAN, use an LVM snapshot (lvcreate -s) and lvconvert --merge.
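For example, a crude sequential write test on both (sizes, paths and the test LV name are only examples):

# on the GFS2 mount
dd if=/dev/zero of=/mnt/gfs2/ddtest bs=1M count=4096 conv=fdatasync
# directly on a dedicated test LV on the same SAN (careful: this overwrites the LV)
dd if=/dev/zero of=/dev/vg_san/benchlv bs=1M count=4096 oflag=direct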
 
On a GFS2 filesystem mounted on each node: a dd writes around 700 MB/s.
Without it, directly on the Proxmox LVM, a dd test writes around 200 MB/s.

I have tried the solution with LVM snapshots; it works perfectly...
I'm going to keep my system like that until GFS2 support on Proxmox works better, because in my case a dd directly on the GFS2-mapped directory on each node creates a kernel panic... On the contrary, a dd inside a VM whose disk is physically stored on the GFS2-mapped directory seems to work nicely every time... but just in case, it's not safe to use this in a production environment for the moment.


Alex
 
Hello
I'm very glad I've seen your post, Alex, because a week ago we also bought a Storwize V3700 and two IBM x3550 hosts with SAS connectors. The goal is to fire up a Proxmox cluster on the new HW platform. I'm a SAS-connectivity newbie, but I'm very satisfied with the NAS storage (NFS, iSCSI) Proxmox concept so far.

So let me share the problems that appeared very soon:
1. I installed Proxmox 3.3 (latest) on both hosts (on local HDDs)
2. I fired up the Storwize without any iSCSI/SAS or volume configuration
3. After connecting the SAS cables, the hosts cannot start up (boot) anymore. After disconnecting them, everything was OK again (I connected every host to each canister/controller -> 4 SAS cables involved)

So what am I missing? Do I have to prepare the Proxmox hosts for the SAS storage connection? Do I have to load something before connecting the cables?

Is there a little cookbook regarding this issue?

Thank you very much in advance
and best regards

Tonci Stipicevic
 
Hello tonci,

Don't worry, it's "normal" (or it's a bug in the IBM server, depending on your point of view).
When you connect an IBM server with a SAS cable, I don't know why, but the server puts the SAS link first in the boot order.
So you have to play with the BIOS options, enabling/disabling disks, to force your server to boot from the local disk.

You are going to have the same problem each time you create a new LUN on the Storwize. So my workaround has been to create all the LUNs on the Storwize first, and afterwards configure the BIOS to boot from the local disk.

Keep calm; it's really tedious to configure this part (the boot time of the IBM server is very, very long), but after that it will work nicely.
 
Hello Alex
First of all, thank you for your quick response and encouragement.
But I noticed that GRUB was displayed, so I assumed the server knew where to boot from; I think it stopped just a little after it started booting...
Unfortunately, I must go on-site, and then I'll check once more what the screen says and post it again.
Till then,
BR
Tonci