Thread: lvremove hangs on backup, uninterruptible sleep, two times in a row now

  1. #1
    Join Date
    Jul 2009
    Posts
    19

    Hello,

    I use Proxmox with the latest 2.6.32 kernel (detailed version information below). This machine hosts several KVM guests whose raw images live on an iSCSI-backed LVM volume.

    The machine has been running in this setup for over a month. On Saturday one KVM guest (a Zimbra server) suddenly stopped. On Monday I could not reach the guest over ssh, and Proxmox did not let me log in. I tried from another node but could not do anything with the node in question.

    After restarting pvedaemon I noticed stale vgs and vgscan processes lying around. I investigated further and found that the cron job doing the backups is hanging in lvremove, in uninterruptible sleep.

    There was nothing really relevant in the log files, and the output of lsscsi looks fine. However, all LVM tools (vgdisplay, pvdisplay) hang when they touch the affected volume group (vmstorage); all other PVs and VGs display fine (after a CTRL-C).

    The other KVM guests seem to be fine; I can still ssh into them, etc.

    I rebooted the node and everything worked fine until the nightly backup, when the same problem occurred again. Now I am worried about the cause, because *nothing* in the whole setup has changed.

    All KVM guests are running either Debian or Ubuntu 8.04 LTS.

    All logs inside the Zimbra guest simply stop at a specific time. Around that time I cannot find anything unusual in the logs of the host machine.


    Here are some more details gathered from the host machine:
    root 3064 0.0 0.0 19832 1040 ? Ss Mar15 0:00 /usr/sbin/cron
    root 8857 0.0 0.0 28372 992 ? S Mar15 0:00 \_ /USR/SBIN/CRON
    root 8859 0.0 0.1 47824 13096 ? Ss Mar15 0:00 \_ /usr/bin/perl -w /usr/sbin/vzdump --quiet --node 1 --snapshot --compress --storage backup-bagdad
    root 21200 0.0 0.0 38668 7892 ? S Mar15 0:00 \_ /usr/bin/perl -w /usr/sbin/pvesm lock KVM 60
    root 21201 0.0 0.1 25840 13568 ? D<L Mar15 0:00 \_ lvremove -f /dev/vmstorage/vzsnap-node-04-0

    Hanging pvedaemon children (but interruptible):
    root 21246 0.0 0.0 15492 1516 ? S Mar15 0:00 | \_ /sbin/vgs --separator : --noheadings --units k --unbuffered --nosuffix --options vg_name,vg_size
    root 4494 0.0 0.2 88116 24096 ? S Mar15 0:08 \_ pvedaemon worker
    root 21219 0.0 0.0 15492 1516 ? S Mar15 0:00 \_ /sbin/vgs --separator : --noheadings --units k --unbuffered --nosuffix --options vg_name,vg_size
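
    For anyone who wants to spot this state automatically: a process stuck like the lvremove above shows a STAT value starting with D (uninterruptible sleep) in the ps listing. The following is a minimal Python sketch, not part of Proxmox or vzdump, that picks such processes out of `ps aux`-style text. The function name find_uninterruptible and the SAMPLE constant are made up for illustration; the sample lines are copied from the listings above.

```python
# Sketch only: find processes in uninterruptible sleep ("D" state)
# in text produced by `ps aux` or `ps auxf`.
# Assumed column layout:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

SAMPLE = r"""
root 21200 0.0 0.0 38668 7892 ? S Mar15 0:00 \_ /usr/bin/perl -w /usr/sbin/pvesm lock KVM 60
root 21201 0.0 0.1 25840 13568 ? D<L Mar15 0:00 \_ lvremove -f /dev/vmstorage/vzsnap-node-04-0
root 21246 0.0 0.0 15492 1516 ? S Mar15 0:00 | \_ /sbin/vgs --separator : --noheadings
"""


def find_uninterruptible(ps_output):
    """Return a list of (pid, command) for every process whose STAT starts with 'D'."""
    hung = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header, blank line, or something unexpected
        if fields[7].startswith("D"):
            # Drop the tree-drawing characters that `ps f` puts before the command.
            command = fields[10].lstrip("\\_| ")
            hung.append((int(fields[1]), command))
    return hung


for pid, command in find_uninterruptible(SAMPLE):
    print(pid, command)
```

    Fed with the real output of `ps auxf` from a cron job, something like this could raise an alert as soon as a backup gets stuck. It only detects the hang; a process in D state cannot be killed, so it does not help to recover from it.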


    iSCSI Information:
    Loading iSCSI transport class v2.0-870.
    iscsi: registered transport (tcp)
    iscsi: registered transport (iser)
    scsi5 : iSCSI Initiator over TCP/IP
    scsi6 : iSCSI Initiator over TCP/IP
    scsi7 : iSCSI Initiator over TCP/IP
    scsi 5:0:0:0: Direct-Access OPNFILER VIRTUAL-DISK 0 PQ: 0 ANSI: 4
    sd 5:0:0:0: Attached scsi generic sg4 type 0
    scsi 6:0:0:0: Direct-Access OPNFILER VIRTUAL-DISK 0 PQ: 0 ANSI: 4
    sd 6:0:0:0: Attached scsi generic sg5 type 0
    sd 6:0:0:0: [sdc] 52822016 512-byte logical blocks: (27.0 GB/25.1 GiB)
    sd 5:0:0:0: [sdb] 246743040 512-byte logical blocks: (126 GB/117 GiB)
    sd 5:0:0:0: [sdb] Write Protect is off
    sd 5:0:0:0: [sdb] Mode Sense: 77 00 00 08
    sd 6:0:0:0: [sdc] Write Protect is off
    sd 6:0:0:0: [sdc] Mode Sense: 77 00 00 08
    sd 5:0:0:0: [sdb] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
    sd 6:0:0:0: [sdc] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
    sdb:
    sdc: sdc1
    sd 6:0:0:0: [sdc] Attached SCSI disk
    unknown partition table
    sd 5:0:0:0: [sdb] Attached SCSI disk

    Kernel:
    Linux node-04 2.6.32-1-pve #1 SMP Fri Jan 15 11:37:39 CET 2010 x86_64 GNU/Linux

    node-04:/var/log# dpkg -l | egrep "(lvm|devm)"
    ii libdevmapper1.02.1 2:1.02.27-4 The Linux Kernel Device Mapper userspace library
    ii lvm2 2.02.39-7 The Linux Logical Volume Manager


    node-04:/var/log# cat /etc/debian_version
    5.0.4


    lsscsi output:
    node-04:/etc# lsscsi --long
    [0:0:0:0] cd/dvd TSSTcorp CDDVDW TS-L633B IB03 /dev/sr0
    state=running queue_depth=1 scsi_level=6 type=5 device_blocked=0 timeout=30
    [4:0:0:0] disk ATA ST9320423AS SDM1 -
    state=running queue_depth=64 scsi_level=6 type=0 device_blocked=0 timeout=0
    [4:0:1:0] disk ATA ST9320423AS SDM1 -
    state=running queue_depth=64 scsi_level=6 type=0 device_blocked=0 timeout=0
    [4:1:2:0] disk LSILOGIC Logical Volume 3000 /dev/sda
    state=running queue_depth=64 scsi_level=3 type=0 device_blocked=0 timeout=30
    [5:0:0:0] disk OPNFILER VIRTUAL-DISK 0 /dev/sdb
    state=running queue_depth=32 scsi_level=5 type=0 device_blocked=0 timeout=30
    [6:0:0:0] disk OPNFILER VIRTUAL-DISK 0 /dev/sdc
    state=running queue_depth=32 scsi_level=5 type=0 device_blocked=0 timeout=30


    Is anyone else experiencing this? Any solutions? Everything I found on Google relates to older LVM bugs that should already be fixed in the releases installed on this node.

    best
    Ray

  2. #2
    Join Date
    Feb 2010
    Posts
    7

    I had the exact same issue this morning. I back up all my VMs between 1:00 and 4:00, with a one-hour delay between them. As of 3:00 that specific VM wasn't working anymore, while the other VMs worked just fine. The lvremove process was hanging and couldn't be killed, and running fdisk would stall the ssh connection.

    The only difference in my setup is that I'm not using iSCSI but LVM over MD RAID, with DRBD replicating to the slave server in the cluster, and my libdevmapper is newer:

    vm01:/var/log# dpkg -l | egrep "(lvm|devm)"
    ii libdevmapper1.02.1 2:1.02.38-2.1~bpo50+1 The Linux Kernel Device Mapper userspace lib
    ii lvm2 2.02.39-7 The Linux Logical Volume Manager

    Anyway, I ended up rebooting the server, which brings me to my next question:

    Why couldn't I start the crashed VM on the slave server? What's the use of clustering and using DRBD to keep the slave up to date if the VM can't be started there (I got a "write access denied" error)?

    If the master dies for some reason, I can't start the VMs on the slave.

  3. #3
    Join Date
    Jul 2009
    Posts
    19

    Hi, for me going back to the Proxmox 2.6.18 kernel seems to have stabilized things: the problem has not occurred once since the downgrade.

    So my best guess is that it's a kernel problem, and I am afraid of upgrading my production systems to 2.6.32. Staying on 2.6.18 is also problematic, because the new Lucid LTS will not work properly with that kernel.

    But my kernel problems are for another thread :-)

    best

    Update:
    I think it could be related to 2.6.32 and the way it handles LVM volumes. I guess the secondary cluster node accesses the same LVM volume as the primary? If so, then while lvremove hangs uninterruptibly the LVM volume presumably stays locked, which would explain the "write access denied" on the secondary ...

    I do not know whether it is possible to clear the lock on the LVM volume from the secondary node, or whether that would be safe. If only I had the time to play with these problems in a lab *sigh* ...

    Where should this problem be reported?
    Last edited by Ray; 04-12-2010 at 03:18 PM.

  4. #4
    Join Date
    Jul 2009
    Posts
    19

    I'll also post this to the LVM mailing list with a reference to this thread. Let's see if they have something on it; I'll keep this thread updated.

    best

  5. #5
    Join Date
    Feb 2010
    Posts
    7

    Cool, thanks Ray. I'm keeping track of this one and hope to see some replies soon.

  6. #6
    Join Date
    Apr 2011
    Posts
    9

    Hi,
    I also had the same issue.
    Does anyone have a solution or any advice?
    I can't downgrade my Proxmox for a long time because it's a production environment.
    What do you recommend to avoid these crashes? I couldn't keep logs of the crash, and I cannot say for sure whether two snapshots were running at the same time.
    I use Bacula; it runs a nightly backup job that creates a snapshot in a "run before job" script. LVM crashed during the post-backup script.
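
    One way to answer the "were two snapshots active at once?" question next time would be to have the pre-backup script log which snapshots already exist before it creates a new one. Below is a rough Python sketch under these assumptions: the text comes from `lvs --noheadings --separator : -o vg_name,lv_name,origin`, the function name list_snapshots is hypothetical, and the origin name vm-101-disk-1 in the sample is invented (only vmstorage and vzsnap-node-04-0 appear in this thread).

```python
# Sketch only: list snapshot LVs from the text output of
#   lvs --noheadings --separator : -o vg_name,lv_name,origin
# A snapshot is recognised by a non-empty "origin" field.

# Hypothetical sample; vm-101-disk-1 is an invented origin name.
SAMPLE = """\
  vmstorage:vm-101-disk-1:
  vmstorage:vzsnap-node-04-0:vm-101-disk-1
"""


def list_snapshots(lvs_output):
    """Return a list of (vg_name, lv_name, origin) for every snapshot LV."""
    snapshots = []
    for line in lvs_output.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 3:
            continue  # blank or unexpected line
        vg_name, lv_name, origin = parts
        if origin:  # plain volumes have an empty origin
            snapshots.append((vg_name, lv_name, origin))
    return snapshots


for vg_name, lv_name, origin in list_snapshots(SAMPLE):
    print("existing snapshot %s/%s of %s" % (vg_name, lv_name, origin))
```

    Writing that list to a log file before each snapshot is created would at least leave evidence behind. Note that lvs itself may hang once LVM is already stuck, so this is only useful as a record taken while things still work.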

    Thanks for your help

    Hugo

  7. #7
    Join Date
    Apr 2011
    Posts
    9

    Hi,
    I posted on the pve mailing list
    and ran some tests under various conditions.
    My script was doing a kpartx -d followed by an lvremove. It crashed many times with the same symptoms as yours.
    I was on kernel package 2.6.32-30.
    After upgrading to 2.6.32-32, the commands seem to run slowly, but it's stable.

    Did you try this kernel?
