Node crashed, kernel panic

Melanxolik

Well-Known Member
Dec 18, 2013
Hi guys.
I'm continuing the topic of my problems with Proxmox. My previous problem was that the switch was rebooting.
Before the New Year we replaced the EX3300 switch with a QFX3500, and it no longer reboots, but now the node crashes while creating a backup.
As a reminder:
We have a Proxmox v4 cluster of 3 identical nodes.

proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

The storage is Ceph, built from 6 drives on each node.

Code:
root@cluster-2-1:~# qm list|grep running
root@cluster-2-1:~#

qm status 1001
status: stopped

qm config 1001
bootdisk: virtio0
cores: 2
ide2: local:iso/ru_windows_7_ultimate_with_sp1_x64_dvd_u_677391.iso,media=cdrom
memory: 4096
name: WinTest
net0: virtio=96:F0:C4:BF:7A:B5,bridge=vmbr159
numa: 0
ostype: win7
smbios1: uuid=ca75c9cd-9219-40bb-b9df-8997dd74a77b
sockets: 2
virtio0: CEPH01:vm-1001-disk-1,iops_rd=300,iops_wr=300,mbps_rd=50,mbps_wr=50,size=70G

Then I started a backup with this command:

Code:
pvesh create  /nodes/cluster-2-1/vzdump --vmid 1001 --storage backup12
INFO: started backup task 'c4771b5f-4a21-4df7-b703-d3948c511903'
INFO: status: 0% (60555264/75161927680), sparse 0% (60555264), duration 3, 20/0 MB/s
INFO: status: 1% (769327104/75161927680), sparse 1% (769327104), duration 39, 19/0 MB/s
INFO: status: 2% (1517027328/75161927680), sparse 2% (1517027328), duration 77, 19/0 MB/s
INFO: status: 3% (2265186304/75161927680), sparse 3% (2265186304), duration 115, 19/0 MB/s
INFO: status: 4% (3013345280/75161927680), sparse 4% (3013345280), duration 153, 19/0 MB/s
INFO: status: 5% (3761504256/75161927680), sparse 5% (3761504256), duration 191, 19/0 MB/s
INFO: status: 6% (4529324032/75161927680), sparse 6% (4529324032), duration 231, 19/0 MB/s
INFO: status: 7% (5277483008/75161927680), sparse 7% (5277483008), duration 269, 19/0 MB/s
INFO: status: 8% (6025707520/75161927680), sparse 8% (6025707520), duration 307, 19/0 MB/s
INFO: status: 9% (6773211136/75161927680), sparse 9% (6773211136), duration 345, 19/0 MB/s
INFO: status: 10% (7521370112/75161927680), sparse 10% (7521370112), duration 383, 19/0 MB/s
INFO: status: 11% (8269529088/75161927680), sparse 11% (8269529088), duration 421, 19/0 MB/s
INFO: status: 12% (9037414400/75161927680), sparse 12% (9037414400), duration 460, 19/0 MB/s
INFO: status: 13% (9785573376/75161927680), sparse 13% (9785573376), duration 498, 19/0 MB/s
INFO: status: 14% (10533732352/75161927680), sparse 14% (10533732352), duration 536, 19/0 MB/s
INFO: status: 15% (11281956864/75161927680), sparse 15% (11281956864), duration 574, 19/0 MB/s
INFO: status: 16% (12030115840/75161927680), sparse 16% (12030115840), duration 612, 19/0 MB/s
INFO: status: 17% (12778274816/75161927680), sparse 17% (12778274816), duration 650, 19/0 MB/s
INFO: status: 18% (13546160128/75161927680), sparse 18% (13546160128), duration 689, 19/0 MB/s
INFO: status: 19% (14296154112/75161927680), sparse 19% (14296154112), duration 730, 18/0 MB/s
Write failed: Broken pipe
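
For reference, the average read rate up to the point where the pipe broke can be worked out from the last status line. This is a minimal sketch, assuming the vzdump output above was saved to a file; the file name `vzdump.log` and the function name `vzdump_avg_rate` are hypothetical, and the result is in binary megabytes (MiB), which may or may not match the divisor vzdump uses for its own MB/s figure.

```shell
#!/bin/sh
# Average read rate from the last "INFO: status:" line of a saved vzdump log.
# Assumption: the log format matches the output shown in this post.
vzdump_avg_rate() {
    awk '
        /^INFO: status:/ {
            # Field 4 looks like "(14296154112/75161927680),"
            split($4, parts, "/")
            sub(/^\(/, "", parts[1])
            bytes = parts[1] + 0
            # The field after "duration" is elapsed seconds, e.g. "730,"
            for (i = 1; i < NF; i++)
                if ($i == "duration")
                    secs = $(i + 1) + 0
        }
        END {
            # Only the last status line seen is used for the average
            if (secs > 0)
                printf "%.1f MiB/s\n", bytes / secs / 1048576
        }
    ' "$1"
}
```

Usage would be something like `vzdump_avg_rate vzdump.log`.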

Code:
top - 21:57:02 up 33 min, 3 users, load average: 0.88, 1.23, 1.08
Tasks: 392 total, 1 running, 391 sleeping, 0 stopped, 0 zombie
%Cpu0 : 1.7 us, 1.7 sy, 0.0 ni, 95.0 id, 1.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 2.3 us, 0.7 sy, 0.0 ni, 96.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 1.3 us, 1.0 sy, 0.0 ni, 94.4 id, 3.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu3 : 1.3 us, 0.7 sy, 0.0 ni, 95.0 id, 3.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 2.0 us, 1.7 sy, 0.0 ni, 95.3 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 5.0 us, 2.0 sy, 0.0 ni, 92.4 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 2.0 us, 0.7 sy, 0.0 ni, 97.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 4.0 us, 1.0 sy, 0.0 ni, 95.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 3.0 us, 1.0 sy, 0.0 ni, 96.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.7 us, 1.3 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 3.0 us, 0.7 sy, 0.0 ni, 96.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.3 us, 0.7 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 13192827+total, 18144812 used, 11378345+free, 4988 buffers
KiB Swap: 14548988 total, 0 used, 14548988 free. 11895304 cached Mem

Code:
model name    : Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz

Code:
cat /proc/meminfo
MemTotal:       131928268 kB
MemFree:        113650048 kB
MemAvailable:   125590636 kB


The backup task crashed...
After rebooting the node I see a lot of error messages in kernel.log.
I have attached kernel.log to this post.

I know it all looks very strange, but I can't fix all these Proxmox problems by myself.
 

Attachments

  • kernel.txt
    32.3 KB · Views: 4
Images from the console server at the moment vzdump crashed.
 

Attachments

  • crash1.png
    862.2 KB · Views: 11
  • crash2.png
    851 KB · Views: 10
  • crash3.png
    838 KB · Views: 8
  • crash5.png
    973.8 KB · Views: 10
  • crash6.png
    623 KB · Views: 11
  • crash7.png
    812.7 KB · Views: 10
What type of storage is "backup12" in relation to the node that crashed during vzdump?
What kind of pool is CEPH01?
What does your syslog show regarding the crash?
Is it reproducible every time, or does it happen only sporadically, or even just once?
 

The backup storage is GlusterFS.
I don't have any messages in syslog, only a few messages in kernel.log.
I can reproduce it every time I run the backup task from the console or via an API query.
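
To pull the crash-related lines out of a large kernel log, something like the following can help. It is only a sketch: the function name `crash_lines` is hypothetical, the pattern list is illustrative rather than exhaustive, and `/var/log/kern.log` is the usual Debian location, which may differ on your setup.

```shell
#!/bin/sh
# Print (with line numbers) kernel log lines that commonly accompany a crash or hang.
# Assumption: this pattern list is a starting point, not a complete set.
crash_lines() {
    grep -n -E 'Kernel panic|BUG:|Oops|Call Trace|blocked for more than|general protection fault' "$1"
}
```

For example, `crash_lines /var/log/kern.log` lists the matching lines so they can be pasted into a reply instead of the whole file.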
 

Thanks for the tip about ZFS swap, it's interesting. I will try disabling swap on Monday.
 
