Hello,
I am aware that software RAID has been discussed many times and that the
Proxmox team does not support it. However, I know it interests many
Proxmox users, so I would like to report my difficult experience with
md RAID 1 + LVM2 + snapshots. This may also be an LVM2-related issue.
I am running Proxmox 1.3:
2.6.24-7-pve #1 SMP PREEMPT Fri Aug 21 09:07:39 CEST 2009 x86_64 GNU/Linux
The system was installed a couple of weeks ago, and snapshots worked fine
until the problem occurred, despite no changes to the disk configuration.
In short, the main symptom is that the LVM volume group stalled
completely after reading about 1 GB while running a vzdump snapshot
backup.
My configuration:
md1: md RAID 1 + ext3, mounted as /
md0: md RAID 1 + LVM2, split into 2 ext3 volumes (vmdata and vmbackups),
mounted as /var/lib/vz and /backups.
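For reference, the md0 + LVM part is roughly equivalent to the setup
below. The partition names, sizes and the volume group name "vg0" are
only illustrative, not my exact values:

  # md0: RAID 1 mirror that backs the LVM volumes
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3

  # LVM2 on top of md0, two ext3 logical volumes
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -n vmdata    -L 200G vg0
  lvcreate -n vmbackups -L 100G vg0
  mkfs.ext3 /dev/vg0/vmdata
  mkfs.ext3 /dev/vg0/vmbackups

  # mounted where Proxmox / vzdump expect them
  mount /dev/vg0/vmdata /var/lib/vz
  mount /dev/vg0/vmbackups /backups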
Symptoms:
- snapshot vzsnap creation: OK
- backup onto vmbackups started: OK
- after backing up about 1 GB the snapshot stalled. By that I mean that
every request to read a file on any LVM volume hangs, including on
/backups, which is not involved in the snapshot. "ls" and "cd" still
work and I can get directory listings, but any command that reads file
contents (e.g. cat, cp, mv) hangs the SSH session. A simple
"cat /backups/phil.log" also hangs the session.
- smartctl does not report any problem (including the long test)
- "wa" in "top" is stuck at 99%, CPU is near zero
- the snapshot is visible in /dev/mapper
- the snapshot cannot be removed (lvremove -f); again no error is
reported, it just hangs with no output at all
- the system seems to work fine as long as nothing tries to read from
either of the 2 LVM2 volumes
- no errors are reported in messages or syslog
- it seems an md check started after the snapshot creation; this check
also stalled, at 29% (speed=0K/sec), again with no error reported
(rough commands for these observations are sketched after this list)
- a soft reboot did not work
- a hard reboot worked, but an md resync started and stalled at 0.1%,
leaving the system in the same state as before the hard reboot
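In case someone wants to compare notes, these are roughly the commands I
used to observe the hang (stock tools only; the volume group name "vg0"
is the same assumed name as above):

  # processes stuck in uninterruptible I/O sleep (state "D")
  ps -eo pid,stat,wchan,comm | awk '$2 ~ /D/'

  # the md check/resync that stalled (speed=0K/sec, no progress)
  cat /proc/mdstat

  # the vzsnap snapshot still listed by device-mapper and LVM
  dmsetup ls
  lvs

  # trying to drop it just hangs, no output, no error
  lvremove -f /dev/vg0/vzsnap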
To recover a working system I marked sdb3 as faulty, removed it from the
RAID 1 array, and hard rebooted. That worked: I could remove the snapshot
and access the data on both LVM volumes. Since then I have not tried to
create a snapshot, and the system seems to work fine.
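The recovery steps boiled down to the following (I am assuming md0 here
and using my device names, so adapt before copying):

  # mark the member on the second disk as failed and pull it from the mirror
  mdadm /dev/md0 --fail /dev/sdb3
  mdadm /dev/md0 --remove /dev/sdb3

  # after the hard reboot the stuck snapshot could finally be removed
  lvremove -f /dev/vg0/vzsnap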
Any comments or suggestions would be much appreciated.
Greetings,
Phil Ten