[SOLVED] Ceph Error

Dec 10, 2016
Hello,
Today all VMs on one of our servers went down. The following error occurred:

Does anyone have an idea?

Code:
Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fb7c7b97780 time 2017-03-21 23:41:48.132092
common/Thread.cc: 131: FAILED assert(ret == 0)
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0x5016d6]
2: /usr/bin/rbd() [0x4fe38f]
3: (CephContext::CephContext(unsigned int)+0x149) [0x505a99]
4: (common_preinit(CephInitParameters const&, code_environment_t, int)+0x32) [0x516832]
5: (global_pre_init(std::vector<char const*, std::allocator<char const*> >*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int)+0x9a) [0x59908a]
6: (global_init(std::vector<char const*, std::allocator<char const*> >*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int)+0x1c) [0x59998c]
7: (main()+0xad) [0x4b9bbd]
8: (__libc_start_main()+0xf5) [0x7fb7c0bfcb45]
9: /usr/bin/rbd() [0x4c2717]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
can't unmap rbd volume vm-111-disk-1: terminate called after throwing an instance of 'ceph::FailedAssertion'
TASK ERROR: start failed: command '/usr/bin/kvm -id 111 -chardev 'socket,id=qmp,path=/var/run/qemu-server/111.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/111.pid -daemonize -smbios 'type=1,uuid=fa4f7f02-7c2d-4c9d-a8d8-956864b10ff9' -name gameserver -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/111.vnc,x509,password -no-hpet -cpu 'kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,enforce' -m 2048 -k de -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:3681fcbb6821' -drive 'file=/mnt/iso-data/template/iso/virtio-win.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/rbd/vm-data/vm-111-disk-1,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap111i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=5E:E5:0F:50:D7:2F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -global 'kvm-pit.lost_tick_policy=discard'' failed: open3: fork failed: Die Ressource ist zur Zeit nicht verfügbar at /usr/share/perl5/PVE/Tools.pm line 411.
 
Hi,
did you change anything in the Ceph installation? For example, updates without restarting the Ceph services?

Can you post the output of the following commands?
Code:
ceph -s
ceph osd tree
dpkg -l | grep ceph
Udo
 
Hey, sorry, I was tied up for the last few days. Yes, I had run an apt-get upgrade, but it did not say anything about restarting.

Here is the data:
ceph -s
Code:
    cluster 96f86bf1-a42d-4267-8928-d685eee56605
     health HEALTH_OK
     monmap e3: 3 mons at {0=10.10.1.1:6789/0,1=10.10.1.2:6789/0,2=10.10.1.3:6789/0}
            election epoch 406, quorum 0,1,2 0,1,2
     osdmap e1524: 6 osds: 6 up, 6 in
      pgmap v5832177: 128 pgs, 2 pools, 1188 GB data, 303 kobjects
            2411 GB used, 8761 GB / 11172 GB avail
                 128 active+clean
  client io 40826 B/s wr, 12 op/s
ceph osd tree
Code:
ID WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 10.91995 root default
-2  3.63998     host pve01cp01
 0  1.81999         osd.0           up  1.00000          1.00000
 1  1.81999         osd.1           up  1.00000          1.00000
-3  3.63998     host pve02cp01
 2  1.81999         osd.2           up  1.00000          1.00000
 3  1.81999         osd.3           up  1.00000          1.00000
-4  3.63998     host pve03cp01
 4  1.81999         osd.4           up  1.00000          1.00000
 5  1.81999         osd.5           up  1.00000          1.00000
dpkg -l | grep ceph
Code:
ii  ceph                                 0.94.10-1~bpo80+1              amd64        distributed storage and file system
ii  ceph-common                          0.94.10-1~bpo80+1              amd64        common utilities to mount and interact with a ceph storage cluster
ii  libcephfs1                           0.94.10-1~bpo80+1              amd64        Ceph distributed file system client library
ii  python-ceph                          0.94.10-1~bpo80+1              amd64        Meta-package for python libraries for the Ceph libraries
ii  python-cephfs                        0.94.10-1~bpo80+1              amd64        Python libraries for the Ceph libcephfs library
 
Hi,
with Proxmox, always use dist-upgrade! If only "normal" packages are being updated, a plain upgrade is enough, but with PVE packages you can break your system because dependencies are not pulled in.

Whenever Ceph packages are updated, you always have to restart the Ceph services, ideally following the Ceph update instructions. Usually the monitors are restarted first and the OSDs after that.
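As a rough sketch of that order (monitors first, then OSDs), the following dry run only prints the commands for one node and does not execute anything. The restart command `service ceph restart <type>.<id>` is an assumption (sysvinit style), `restart_plan` is a hypothetical helper, and the IDs assume that monitor 0 runs on the same node as osd.0 and osd.1, which the output above does not confirm.

```shell
#!/bin/sh
# Dry-run sketch: print the upgrade and restart commands in the suggested order.
# Assumption: daemons are restarted with "service ceph restart <type>.<id>";
# override RESTART_CMD if your installation uses different service units.
RESTART_CMD="${RESTART_CMD:-service ceph restart}"

# Hypothetical helper: print one restart command per daemon,
# all monitors first, then all OSDs.
restart_plan() {
    mon_ids="$1"   # space-separated monitor IDs on this node
    osd_ids="$2"   # space-separated OSD IDs on this node
    for id in $mon_ids; do
        echo "$RESTART_CMD mon.$id"
    done
    for id in $osd_ids; do
        echo "$RESTART_CMD osd.$id"
    done
}

# Package step: dist-upgrade instead of upgrade.
echo "apt-get update && apt-get dist-upgrade"

# Example node (assumed layout): monitor 0 plus osd.0 and osd.1.
restart_plan "0" "0 1"
```

The script prints the plan so you can check it against the Ceph update instructions for your release and then run the commands by hand, one node at a time.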

I could imagine that your error will be gone after that.

Udo
 