Unable to stop container: operation timed out

chronos

Active Member
Apr 11, 2009
This error message is printed when an OpenVZ guest OS fails and I then try to stop the VM with vzctl stop <container_id>. From what I found on the web, the problem is an unkillable process in the D state, which can be seen with the ps aux command.
Code:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
500      18762  0.0  0.6 901540 50820 ?        D    06:35   0:28 krusader -caption Krusader --icon=krusader_user.png
The D state is probably an uninterruptible wait for some I/O operation to finish, so there must be a problem with local storage or the network. dmesg displays some errors:

Code:
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204807 offset 0
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204807 offset 0
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204807 offset 0
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204807 offset 0
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204807 offset 0
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204807 offset 0
EXT3-fs error (device dm-6): ext3_get_inode_loc: unable to read inode block - inode=101209555, block=202408112
EXT3-fs error (device dm-6): ext3_get_inode_loc: unable to read inode block - inode=101209550, block=202408112
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204954 offset 0
EXT3-fs error (device dm-6): ext3_find_entry: reading directory #101204954 offset 0
Buffer I/O error on device dm-6, logical block 1545
lost page write due to I/O error on dm-6
But dm-6 does not exist under /sys/block/ while the process is frozen, so it must be some kind of temporary virtual LVM2 device used by the running VM.
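
While a dm-N device still exists, its name can usually be recovered from the device nodes under /dev/mapper, because the N in dm-N is the device-mapper minor number. The following is a minimal sketch of that lookup, assuming the nodes live in /dev/mapper; the function names are invented for this example, and it cannot say anything about a device that has already been removed.

```python
import os
import stat

def dm_names(mapper_dir="/dev/mapper"):
    """Map device-mapper minor numbers to names found in mapper_dir."""
    names = {}
    try:
        entries = os.listdir(mapper_dir)
    except OSError:
        return names  # directory missing or unreadable
    for entry in entries:
        path = os.path.join(mapper_dir, entry)
        try:
            st = os.stat(path)  # follows symlinks, if the nodes are symlinks
        except OSError:
            continue
        # Skip /dev/mapper/control (a character device) and anything else
        # that is not a block device.
        if stat.S_ISBLK(st.st_mode):
            names[os.minor(st.st_rdev)] = entry
    return names

def describe(kernel_name, names):
    """Translate a kernel name such as 'dm-6' using the mapping above."""
    minor = int(kernel_name.split("-", 1)[1])
    return names.get(minor, "not present (already removed?)")

if __name__ == "__main__":
    mapping = dm_names()
    for name in ("dm-6", "dm-8"):
        print(name, "->", describe(name, mapping))
```

Running it at the moment the errors appear (for example during a backup) would show which volume the errors belong to. dmsetup ls lists the same names together with their major and minor numbers.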

But on my second hardware node dmesg shows a similar report.
Code:
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20693312 offset 0
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20693312 offset 0
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20693312 offset 0
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20693312 offset 0
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20693312 offset 0
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20529154 offset 0
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20694234 offset 0
EXT3-fs error (device dm-8): ext3_find_entry: reading directory #20694234 offset 0
printk: 9 messages suppressed.
Buffer I/O error on device dm-8, logical block 1545
lost page write due to I/O error on dm-8
The only simple way to get rid of a D state process is to reboot the whole node.
Is there any other way to remove these unkillable processes?
What could cause this kind of failure?

I have installed Proxmox VE 1.1, which contains kernel 2.6.24-2-pve. The problem occurred on different hosts with slightly different hardware configurations, so it probably does not depend directly on the hardware.
So far the problem has appeared only on VMs created from the template fedora-10-x86_64-default-20090318.tar.gz with Xorg and the GNOME environment installed. But that could simply be because these VMs are used more intensively, as remote desktops.

The error sometimes occurs randomly while using different applications, but in some situations it can be reproduced.
For example, it is possible to connect to the remote nxserver desktop and use some applications, but certain combinations, such as launching krusader and then playing an AVI movie with kmplayer, cause krusader to go into the D state, with nothing suspicious in any log.
Sometimes it is even enough to try to log in with the su command and the process gets stuck in the D state.

The only way I can currently reproduce this is by launching kmplayer from krusader by double-clicking a file.
Another process goes into the D state at the same time:
Code:
5055 ?        D      0:00 iceauth remove netid=local/george-virt-old:/tmp/.ICE-unix/dcop1524-1240851203
pstree displays the hierarchy as
Code:
  │   ├─dcopserver,500  
  │   │   └─dcopserver_shut --nokill
  │   │       └─iceauth remove netid=local/george-virt-old:/tmp/.ICE-unix/dcop1524-1240851203
Since krusader is a KDE application (the current version 2.0 is KDE4 compatible), there may be an issue with KDE. Launching kmplayer on the same file from Nautilus works fine, but from krusader it fails. Interesting.

Anyway, is there any way to stop D state processes or to find out which resource they are waiting for?
And a second question: what are dm-6 devices, and what can cause errors like "lost page write due to I/O error on dm-6"?
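
On the question of what the stuck processes are waiting for: on Linux, /proc/<pid>/stat contains the process state and /proc/<pid>/wchan names the kernel function the task is sleeping in. The sketch below only illustrates that lookup, assuming wchan is readable on this kernel; the function names are made up for the example. It reports on D state processes but cannot kill them, since a task in uninterruptible sleep does not act on signals until the wait ends.

```python
import os

def parse_stat(stat_text):
    """Return (command_name, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the last ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    state = stat_text[end + 1:].split()[0]
    return comm, state

def read_wchan(pid, proc_root="/proc"):
    """Kernel function the task is sleeping in, or '?' if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip() or "?"
    except OSError:
        return "?"

def d_state_processes(proc_root="/proc"):
    """List (pid, command, wchan) for every process in state D."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or malformed line
        if state == "D":
            found.append((int(entry), comm, read_wchan(entry, proc_root)))
    return sorted(found)

if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(f"{pid:>7}  {comm:<20}  {wchan}")
```

Run as root on the hardware node, it prints one line per D state process. ps -eo pid,stat,wchan,cmd should give roughly the same information without any script.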
 
software raid?
 
No RAID, simply a Western Digital 1TB SATA drive (WDC WD10EADS-00L5B1) with default partitioning, in cheap servers used by a small wireless ISP. The CPUs are an Athlon X2 2 GHz on one host and a Phenom II X3 2.8 GHz on the other, both on cheap motherboards.

Code:
virtual:/# lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: ASRock Incorporation Unknown device 9602
00:09.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 4)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3200 Graphics
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

Second machine
Code:
virtual-test:/# lspci
00:00.0 Host bridge: nVidia Corporation nForce3 250Gb Host Bridge (rev a1)
00:01.0 ISA bridge: nVidia Corporation nForce3 250Gb LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation nForce 250Gb PCI System Management (rev a1)
00:02.0 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation CK8S USB Controller (rev a1)
00:02.2 USB Controller: nVidia Corporation nForce3 EHCI USB 2.0 Controller (rev a2)
00:08.0 IDE interface: nVidia Corporation CK8S Parallel ATA Controller (v2.5) (rev a2)
00:0a.0 IDE interface: nVidia Corporation nForce3 Serial ATA Controller (rev a2)
00:0b.0 PCI bridge: nVidia Corporation nForce3 250Gb AGP Host to PCI Bridge (rev a2)
00:0e.0 PCI bridge: nVidia Corporation nForce3 250Gb PCI-to-PCI Bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 Pro Ultra TF
02:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
02:08.0 Multimedia controller: Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder (rev d1)
 
or find out which resource they are waiting for?
And a second question: what are dm-6 devices

'dm' is 'device mapper' (LVM2), so it's your hard disk. Do you use vzdump for backup?

and what can cause errors like "lost page write due to I/O error on dm-6"?

A damaged hard disk? You should use hardware RAID.
 
I don't think it could be a damaged hard disk. I upgraded two servers recently and added a new 1TB drive to both, and I can reproduce processes stuck in the D state on both servers even though they have different configurations.
It's interesting that the error
Code:
Buffer I/O error on device dm-6, logical block 1545
lost page write due to I/O error on dm-6
is related to some virtual LVM device, while neither pve-root nor pve-data has any problem at all.
Maybe it occurs only during the vzdump process, as the filesystem is probably mounted in snapshot mode.

So the error "lost page write due to I/O error" is probably independent of the processes sleeping in the D state, and these are separate errors. While I was repeatedly reproducing the D state, nothing appeared in the dmesg log, but the hardware node had to be rebooted to clear the processes. So I need to find out somehow what causes this and which resource the processes are waiting for.
 
I had a similar problem and had to reboot the hardware node.

Also, when rebooting, it could not stop the one problem container, so I had to kill -9 that process to let the reboot continue.

Before the problem with the container, I did see that the DCACHESIZE limit was being hit in the beancounters.

Ken
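
To check whether a beancounter limit such as dcachesize is being hit, the failcnt column of /proc/user_beancounters on the hardware node is the place to look. Here is a rough sketch of a parser for that file, assuming the usual layout of uid, resource, held, maxheld, barrier, limit and failcnt; the function name is invented and the numbers in SAMPLE are made up for illustration, not taken from any of the hosts in this thread.

```python
# Invented sample in the layout of /proc/user_beancounters (values are made up).
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2863835    3178496   11055923   11377049          0
            dcachesize       935160    1048576    1048576    1097728         12
            numproc              40         52        240        240          0
"""

def failed_beancounters(text):
    """Return [(ctid, resource, failcnt)] for every counter with failcnt > 0."""
    failures = []
    ctid = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # blank line or header
        if fields[0].endswith(":"):
            ctid = fields[0][:-1]  # a new container block starts here
            fields = fields[1:]
        if len(fields) != 6:
            continue  # not a resource line in the expected layout
        resource, failcnt = fields[0], fields[5]
        if failcnt.isdigit() and int(failcnt) > 0:
            failures.append((ctid, resource, int(failcnt)))
    return failures

if __name__ == "__main__":
    for ctid, resource, failcnt in failed_beancounters(SAMPLE):
        print(f"container {ctid}: {resource} failed {failcnt} times")
```

On a real node the text would come from reading /proc/user_beancounters as root instead of SAMPLE. A failcnt that keeps growing for a container suggests its limit for that resource is too low.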
 
