node would not restart, now has a /etc/pve issue

RobFantini

Hello

When I tried to stop a node, it got stuck here (from ps afx):
Code:
   6061 ?        SL     0:00  \_ startpar -p 4 -t 20 -T 3 -M stop -P 2 -R 6
   6534 ?        S      0:00      \_ /bin/sh /etc/init.d/vz stop
   6583 ?        R      0:48          \_ /sbin/modprobe -r ip_nat_ftp

There were no vz's on this node.

After 5 minutes I used kill -9 6534 to unstick it.

While the node was down I made a change to /etc/pve/cluster.conf. That should not be an issue when a node is off, should it? We have 4 nodes in this cluster.

Yet here is the output of fbc3 /etc/pve # ls -l /etc/pve/cluster*

bad node:
Code:
-r--r----- 1 root www-data 1393 Aug 24 11:57 /etc/pve/cluster.conf
-r--r----- 1 root www-data 1393 Aug 27 12:33 /etc/pve/cluster.conf.new


good nodes have this:
Code:
# node  fbc87
-rw-r----- 1 root www-data 1505 Aug 27 12:44 cluster.conf
-rw-r----- 1 root www-data 1432 Aug 23 07:56 cluster.conf.old

# node fbc241
s012  ~ # ls -l /etc/pve/clus*
-rw-r----- 1 root www-data 1505 Aug 27 12:44 /etc/pve/cluster.conf
-rw-r----- 1 root www-data 1432 Aug 23 07:56 /etc/pve/cluster.conf.old
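
The listings above only show that the bad node's cluster.conf (1393 bytes) differs in size from the one on the good nodes (1505 bytes). A minimal sketch for comparing the configuration version directly on each node follows. It assumes cluster.conf carries a config_version="N" attribute on its cluster element, as in the cman format; the function name conf_version is hypothetical.

```shell
#!/bin/sh
# conf_version FILE: print the first config_version="N" value found in FILE.
# Assumption: the file uses the cman-style <cluster ... config_version="N"> header.
conf_version() {
    sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Possible usage on each node (run manually, then compare the numbers):
#   conf_version /etc/pve/cluster.conf
#   conf_version /etc/pve/cluster.conf.new
```

If the bad node reports a lower number than the good nodes, it is still holding the configuration from before the change.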


more info:
Code:
fbc87  /etc/pve # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X   1112                        fbc3
   2   M    828   2013-08-03 14:00:13  fbc87
   3   M   1072   2013-08-26 15:35:21  s035
   5   M    972   2013-08-22 18:47:55  s012
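
For checking membership from a script, here is a small sketch that filters pvecm nodes output for rows whose status is not M. It assumes the column layout shown above (node id, status, incarnation, optional join time, name); the function name offline_nodes is hypothetical.

```shell
#!/bin/sh
# offline_nodes: read `pvecm nodes` output on stdin and print the name of
# every node whose status column is not "M" (member).
# Rows are recognised by a numeric node id in the first column, so the
# header line is skipped automatically.
offline_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $2 != "M" { print $NF }'
}

# Possible usage on a cluster node:
#   pvecm nodes | offline_nodes
```

With the output above this would print fbc3 only.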


Bad node pveversion at time of reboot:
Code:
fbc3  /var/lib/vz/private # pveversion -v
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-17-pve: 2.6.32-83
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2

I upgraded after reboot, restarted and have the same issue.


Any suggestions to fix?
 
I moved the network connection to a dumb switch and after two more reboots the node came back up.

Originally it was connected to a Netgear layer 3 switch.
 
Just tested a shutdown, which still hangs here:
Code:
  14406 ?        SL     0:00  \_ startpar -p 4 -t 20 -T 3 -M stop -P 2 -R 0
  14820 ?        S      0:00      \_ /bin/sh /etc/init.d/vz stop
  14869 ?        R      0:12          \_ /sbin/modprobe -r ip_nat_ftp
 
At least the node is part of the cluster.

If I've run into a bug that needs reporting, or if I can test a fix, just let me know.
 
What is the output of

# dpkg -l fuse-utils

Please remove that package if it is still installed:

# apt-get remove fuse-utils
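
To confirm afterwards that the package is really gone, a scripted check along these lines may help. It is only a sketch: it relies on dpkg-query reporting the status string "install ok installed" for installed packages, and the function name pkg_installed is hypothetical.

```shell
#!/bin/sh
# pkg_installed NAME: succeed only if dpkg reports NAME as fully installed.
# Unknown packages (or a missing dpkg-query) count as "not installed".
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Possible usage:
#   pkg_installed fuse-utils && echo "fuse-utils is still installed"
```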
 
fuse-utils was installed; I removed it.

The issue is still occurring. I rebooted two times and ps afx still shows:
Code:
 7946 ?        Ss     0:00 /bin/sh /etc/init.d/rc 6
   7952 ?        SL     0:00  \_ startpar -p 4 -t 20 -T 3 -M stop -P 2 -R 6
   8339 ?        S      0:00      \_ /bin/sh /etc/init.d/vz stop
   8419 ?        R      2:50          \_ /sbin/modprobe -r ip_nat_ftp

Note: this system was a pve + desktop system; now it is just a testing node.
Here is most of the ps afx output, besides the kthreadd stuff:
Code:
   4257 ?        S      0:00  \_ [nfsio]
    586 ?        Ss     0:00 udevd --daemon
   3731 ?        S      0:00  \_ udevd --daemon
   4226 ?        S      0:00  \_ udevd --daemon
   2401 ?        Ss     0:00 /sbin/rpcbind -w
   2419 ?        Ss     0:00 /sbin/rpc.statd
   2442 ?        Ss     0:00 /usr/sbin/rpc.idmapd
   2642 ?        Ss     0:00 /usr/sbin/iscsid
   2643 ?        S<Ls   0:00 /usr/sbin/iscsid
   2785 ?        Sl     0:00 /usr/sbin/rsyslogd -c5
   2870 ?        Ss     0:00 /usr/sbin/vzeventd
   3058 ?        Ss     0:00 /usr/sbin/acpid
   3092 ?        S      0:00 /usr/sbin/dnsmasq -x /var/run/dnsmasq/dnsmasq.pid -u dnsmasq -7 /etc/dnsmasq.d,.dpkg-dist,.dp
   3115 ?        Ss     0:00 /usr/bin/dbus-daemon --system
   3118 ?        Sl     0:00 /usr/sbin/console-kit-daemon --no-daemon
   3188 ?        Sl     0:00 /usr/lib/policykit-1/polkitd --no-debug
   3232 ?        Ss     0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 101:104
   3438 ?        Ss     0:00 /usr/sbin/sshd
   3459 ?        Ss     0:00  \_ sshd: root@pts/0 
   3467 pts/0    Ss     0:00      \_ -bash
   8485 pts/0    R+     0:00          \_ ps afx
   3453 ?        Ssl    0:00 /usr/bin/pmxcfs
   3680 ?        Ss     0:00 /usr/sbin/cron
   3745 ?        S      0:00  \_ /USR/SBIN/CRON
   3754 ?        Ss     0:00      \_ /bin/sh -c /fbc/bin/linux-server-reboot-cronjob
   3756 ?        S      0:00          \_ /bin/sh /fbc/bin/linux-server-reboot-cronjob
   8469 ?        S      0:00              \_ sleep 240
   3759 ?        Ss     0:00 /usr/sbin/cupsd -C /etc/cups/cupsd.conf
   3814 ?        Sl     0:00 /usr/lib/x86_64-linux-gnu/colord/colord
   3871 ?        S<Lsl   0:01 corosync -f
   3960 ?        Ssl    0:00 fenced
   3985 ?        Ssl    0:00 dlm_controld
   7917 ?        Sl     0:00 /usr/lib/packagekit/packagekitd
   7946 ?        Ss     0:00 /bin/sh /etc/init.d/rc 6
   7952 ?        SL     0:00  \_ startpar -p 4 -t 20 -T 3 -M stop -P 2 -R 6
   8339 ?        S      0:00      \_ /bin/sh /etc/init.d/vz stop
   8419 ?        R      3:56          \_ /sbin/modprobe -r ip_nat_ftp
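
Not a fix, but a possible diagnostic while modprobe -r is spinning: check whether the module is listed at all and how many references it has. The sketch below reads /proc/modules, whose lines start with the module name, size and reference count; the function name module_refcount is hypothetical, and whether ip_nat_ftp appears under that exact name on this kernel is an assumption.

```shell
#!/bin/sh
# module_refcount NAME [FILE]: print the reference count of module NAME,
# or return 1 if it is not listed. FILE defaults to /proc/modules, where
# field 1 is the module name and field 3 is the reference count.
module_refcount() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Possible usage:
#   module_refcount ip_nat_ftp || echo "module not loaded"
```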

This is a system that can be re-installed, or we can take the time to try to solve this. No hurry, but if someone has a suggestion, please respond.
 
