
VM crash with memory hotplug

Discussion in 'Proxmox VE: Installation and configuration' started by hansm, Jul 25, 2017.

  1. hansm

    hansm Member

    We're having problems with all newer pve-qemu-kvm versions: every version newer than 2.5-19 causes unpredictable crashes of our VMs. I've set up some tests and reproduction details; can you please check your configuration? With this test you can crash your VM in a few minutes.
    My test VM details:
    Code:
    boot: dc
    bootdisk: scsi0
    cores: 12
    hotplug: disk,network,usb,memory,cpu
    ide2: none,media=cdrom
    memory: 2048
    name: testserver.mydomain.com
    net0: virtio=3A:A9:6A:0C:3E:2D,bridge=vmbr123
    numa: 1
    onboot: 1
    ostype: l26
    protection: 1
    scsi0: sata-datastore:995/vm-995-disk-1.qcow2,size=20G
    scsihw: virtio-scsi-single
    smbios1: uuid=18ee0633-f00c-4d40-b037-349da8e44ea4
    sockets: 1
    vcpus: 2
    
    So, NUMA enabled and memory hotplug selected.

    I tested with Debian Jessie (8.8) and Stretch (9.1); both have the same problem. A few months ago I also tested with CentOS, with the same results, but back then I didn't have time to test this thoroughly. All our Proxmox nodes are running pve-qemu-kvm_2.5-19_amd64.deb and the problem doesn't occur there.

    Enable hotplug support in your VM according to https://pve.proxmox.com/wiki/Hotplug_(qemu_disk,nic,cpu,memory). To make this easy you can copy this rule (for Jessie):
    Code:
    echo 'SUBSYSTEM=="memory", ACTION=="add", TEST=="state", ATTR{state}=="offline", ATTR{state}="online"' > /lib/udev/rules.d/80-hotplug-cpu-mem.rules
    
    If you installed Debian Stretch, add the following to /etc/default/grub instead of the above udev rule:
    Code:
    GRUB_CMDLINE_LINUX="memhp_default_state=online"
    
    Save the file and update Grub:
    Code:
    update-grub
    
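    After the guest reboots you can quickly check that hotplugged memory really comes online (a small sanity-check sketch using the standard Linux memory-hotplug sysfs interface; the last line only applies to the Stretch kernel-parameter variant):
    Code:
    # inside the guest, after hotplugging memory from the PVE GUI
    cat /sys/devices/system/memory/memory*/state | sort | uniq -c   # all blocks should be "online"
    free -m                                                         # total memory should have grown
    grep -o 'memhp_default_state=online' /proc/cmdline              # Stretch only: parameter applied?
    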
    Please test with a newer pve-qemu-kvm version and set up a default Debian Jessie or Stretch VM with details comparable to those listed above. Make sure your test VM can send email to your own address; on Jessie this can be done with dpkg-reconfigure exim4-config (choose "internet site" and keep all other options at their defaults). In /etc/aliases, specify your email address after root: to set up a forwarder for the root account. Now test with: echo test | mail -s test root
    If you don't receive the email, please fix this first; the test cronjob below will send you an email after a crash and a reboot.
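    Put together, the mail setup looks roughly like this (a sketch assuming the stock Debian exim4 packages; replace the example address with your own):
    Code:
    dpkg-reconfigure exim4-config          # choose "internet site", keep the other defaults
    # add (or edit) the root: line in /etc/aliases to forward root's mail, e.g.:
    echo 'root: you@example.com' >> /etc/aliases
    echo test | mail -s test root          # this message should arrive in your mailbox
    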

    Now set the following cronjob with crontab -e and reboot:
    Code:
    @reboot    touch /home/test_count && if [ -e /home/test_crashed_your_server ]; then echo "`/bin/hostname` crashed after `wc -c < /home/test_count` tries" | mail -s "`/bin/hostname` crashed" root; elif [ `wc -c < /home/test_count` -ge 50 ]; then exit; else sleep 10; touch /home/test_crashed_your_server; for i in `seq 1 5`; do SIZE=2048; echo 3 > /proc/sys/vm/drop_caches; dd if=/dev/zero of=/home/tempfile bs=1M count=$SIZE conv=fdatasync,notrunc > /dev/null 2>&1; echo 3 > /proc/sys/vm/drop_caches; dd if=/home/tempfile of=/dev/null bs=1M count=$SIZE > /dev/null 2>&1; rm -f /home/tempfile; echo -n . >> /home/test_count; done; rm -f /home/test_crashed_your_server; /sbin/reboot; fi
    
    Reboot your VM.
    This will create a 2 GB test file after every reboot, first waiting 10 seconds to give the VM time to boot. The test file is written and then read back, and caches are dropped before and after each dd run. The dd test repeats 5 times and then the VM reboots. Before the test starts, a file /home/test_crashed_your_server is created; after the 5 successful runs it is deleted. If your VM crashes during the dd tests the file isn't removed, and after the reboot the cron job sends you a warning. If you didn't enable HA on the VM, the VM will be stopped when it crashes; start it again and you will get the email. Tests aren't repeated until you clean up the files in /home (test_count, tempfile, test_crashed_your_server). If email can't be sent but you see the file test_crashed_your_server right after a reboot, your VM crashed.
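    For readability, this is the same logic as the cron one-liner above written out as a plain shell script (functionally identical; paths and counts are unchanged):
    Code:
    #!/bin/sh
    # the @reboot test from above, expanded for readability
    touch /home/test_count
    if [ -e /home/test_crashed_your_server ]; then
        # the marker from the previous boot is still there: that run never finished
        echo "`/bin/hostname` crashed after `wc -c < /home/test_count` tries" \
            | mail -s "`/bin/hostname` crashed" root
    elif [ `wc -c < /home/test_count` -ge 50 ]; then
        exit 0                                  # 50 dots in the counter (10 clean runs): stop testing
    else
        sleep 10                                # give the VM time to finish booting
        touch /home/test_crashed_your_server    # marker, removed only after a clean run
        for i in `seq 1 5`; do
            SIZE=2048
            echo 3 > /proc/sys/vm/drop_caches
            dd if=/dev/zero of=/home/tempfile bs=1M count=$SIZE conv=fdatasync,notrunc > /dev/null 2>&1
            echo 3 > /proc/sys/vm/drop_caches
            dd if=/home/tempfile of=/dev/null bs=1M count=$SIZE > /dev/null 2>&1
            rm -f /home/tempfile
            echo -n . >> /home/test_count       # one dot per completed write/read cycle
        done
        rm -f /home/test_crashed_your_server
        /sbin/reboot
    fi
    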

    Notes:
    - VMs also crash at unpredictable moments; this test only triggers it. Maybe someone can think of a better test, but this one worked for me.
    - It doesn't happen without memory hotplug. If you enable memory hotplug on the VM but don't activate it inside the guest (via the udev rule, or the Grub parameter on Stretch), the VM will not crash.
    - It happens on all storage types; tested on SATA RAID10 via NFS over 1 Gbit/s and on a full-SSD Ceph cluster over redundant 10 Gbit/s.

    Thanks for testing!
     
    #1 hansm, Jul 25, 2017
    Last edited: Sep 20, 2017 at 15:11
  2. hansm

    hansm Member

    Bump...
    I really think you need to take a few minutes to test this. It's crucial to have VMs that don't crash, and with recent pve-qemu-kvm versions VMs will crash when memory hotplug is enabled in Proxmox and in the Linux guest OS.
     
  3. hansm

    hansm Member

    I tested it with PVE 5.0 on a test server in our office: standalone, with only one SATA disk for the OS and the test VM on local-lvm.
    With the above procedure (memory hotplug and the test case), a clean Debian 9 install crashes directly after the first test run. I reinstalled the VM with CentOS 7, set up the cronjob and rebooted to start testing; after 20 test runs that VM crashed too.

    As I said before, this doesn't happen with pve-qemu-kvm 2.5-19 and earlier.

    I gathered strace output of the kvm process on the host while running the test case. These are the last lines strace outputs; it stays quiet after the last line. The VM isn't usable anymore, it just hangs: no real crash or kernel panic, but you can't log in anymore and you can't even attach the console to it. It's completely dead.

    For the Debian 9 VM:
    Code:
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=933041}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=934244}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=933368}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=935821}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=933838}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=934214}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=931164}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=935833}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=934361}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=932171}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=934129}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=935054}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=26, events=POLLIN}, {fd=31, events=POLLIN}], 8, {tv_sec=0, tv_nsec=937179}, NULL, 8) = 1 ([{fd=31, revents=POLLIN}], left {tv_sec=0, tv_nsec=216610})
    read(31, "\1\0\0\0\0\0\0\0", 512)       = 8
    
    CentOS 7 VM:
    Code:
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc82f8000, iov_len=122880}], offset=19363287040, resfd=26}]) = 1
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc825a000, iov_len=131072}], offset=19362639872, resfd=26}]) = 1
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc83db000, iov_len=131072}], offset=19364704256, resfd=26}]) = 1
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc827a000, iov_len=122880}], offset=19362770944, resfd=26}]) = 1
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc83fb000, iov_len=122880}], offset=19364835328, resfd=26}]) = 1
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc79fc000, iov_len=16384}, {iov_base=0x7febc8200000, iov_len=106496}], offset=19362254848, resfd=26}]) = 1
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8356000, iov_len=131072}], offset=19363672064, resfd=26}]) = 1
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    read(26, "@\0\0\0\0\0\0\0", 512)        = 8
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8376000, iov_len=32768}, {iov_base=0x7fec977fc000, iov_len=90112}], offset=19363803136, resfd=26}]) = 1
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=25, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=32, events=POLLIN}], 10, {tv_sec=0, tv_nsec=0}, NULL, 8) = 2 ([{fd=9, revents=POLLIN}, {fd=26, revents=POLLIN}], left {tv_sec=0, tv_nsec=0})
    write(24, "\1\0\0\0\0\0\0\0", 8)        = 8
    read(26, "\2\0\0\0\0\0\0\0", 512)       = 8
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    write(9, "\1\0\0\0\0\0\0\0", 8)         = 8
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=25, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=32, events=POLLIN}], 10, {tv_sec=0, tv_nsec=0}, NULL, 8) = 1 ([{fd=9, revents=POLLIN}], left {tv_sec=0, tv_nsec=0})
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=25, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=32, events=POLLIN}], 10, {tv_sec=0, tv_nsec=11543183}, NULL, 8) = 1 ([{fd=9, revents=POLLIN}], left {tv_sec=0, tv_nsec=11540890})
    read(9, "\33\0\0\0\0\0\0\0", 512)       = 8
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=25, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=32, events=POLLIN}], 10, {tv_sec=0, tv_nsec=11454465}, NULL, 8) = 1 ([{fd=25, revents=POLLIN}], left {tv_sec=0, tv_nsec=10338237})
    read(25, "\1\0\0\0\0\0\0\0", 512)       = 8
    io_submit(0x7feccd9cd000, 9, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8429000, iov_len=131072}], offset=19365023744, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7febc84a7000, iov_len=131072}], offset=19365539840, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7febc8525000, iov_len=131072}], offset=19366055936, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7fec978cb000, iov_len=126976}, {iov_base=0x7febc854b000, iov_len=4096}], offset=19366572032, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7febc85aa000, iov_len=131072}], offset=19367088128, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7febc8628000, iov_len=131072}], offset=19367604224, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7febc86a6000, iov_len=131072}], offset=19368120320, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7fec978f3000, iov_len=131072}], offset=19368636416, resfd=26}, {preadv, fildes=20, iovec=[{iov_base=0x7febc872b000, iov_len=65536}], offset=19369152512, resfd=26}]) = 9
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8449000, iov_len=131072}], offset=19365154816, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc84c7000, iov_len=131072}], offset=19365670912, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc854c000, iov_len=131072}], offset=19366703104, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc85ca000, iov_len=131072}], offset=19367219200, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8648000, iov_len=131072}], offset=19367735296, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc86c6000, iov_len=131072}], offset=19368251392, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7fec97913000, iov_len=131072}], offset=19368767488, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8469000, iov_len=131072}], offset=19365285888, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8545000, iov_len=24576}, {iov_base=0x7fec97873000, iov_len=106496}], offset=19366187008, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc85ea000, iov_len=131072}], offset=19367350272, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc86e6000, iov_len=131072}], offset=19368382464, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7fec97933000, iov_len=131072}], offset=19368898560, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc84e7000, iov_len=131072}], offset=19365801984, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7fec9788d000, iov_len=131072}], offset=19366318080, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8668000, iov_len=131072}], offset=19367866368, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7fec97953000, iov_len=57344}, {iov_base=0x7febc871b000, iov_len=65536}], offset=19369029632, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8489000, iov_len=122880}], offset=19365416960, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc856c000, iov_len=131072}], offset=19366834176, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8688000, iov_len=122880}], offset=19367997440, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8507000, iov_len=122880}], offset=19365933056, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc860a000, iov_len=122880}], offset=19367481344, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc858c000, iov_len=122880}], offset=19366965248, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7fec978ad000, iov_len=122880}], offset=19366449152, resfd=26}]) = 1
    io_submit(0x7feccd9cd000, 1, [{preadv, fildes=20, iovec=[{iov_base=0x7febc8706000, iov_len=86016}, {iov_base=0x7fec978ea000, iov_len=36864}], offset=19368513536, resfd=26}]) = 1
    ppoll([{fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=23, events=POLLIN}, {fd=25, events=POLLIN}, {fd=26, events=POLLIN}, {fd=27, events=POLLIN}, {fd=32, events=POLLIN}], 10, {tv_sec=0, tv_nsec=0}, NULL, 8) = 2 ([{fd=25, revents=POLLIN}, {fd=26, revents=POLLIN}], left {tv_sec=0, tv_nsec=0})
    write(24, "\1\0\0\0\0\0\0\0", 8)        = 8
    read(25, "\1\0\0\0\0\0\0\0", 512)       = 8
    write(2, "kvm:", 4)                     = 4
    write(2, " ", 1)                        = 1
    write(2, "Looped descriptor", 17)       = 17
    write(2, "\n", 1)                       = 1
    
    The kvm process keeps running on the host at 100% CPU usage. The VM itself shows 50% CPU usage (it's a 2-core VM), but it stays unusable.

    I'm sure everyone has this problem; please test it and report your results. I hope the Proxmox team will test it soon.
     
  4. hansm

    hansm Member

    Because no one responds here, and the Proxmox team also doesn't respond to our bug report at https://bugzilla.proxmox.com/show_bug.cgi?id=1107#c16, we doubt whether we can keep using PVE in the future.

    I'm still trying to solve this myself, but I would really appreciate some help with it.

    I've made some progress. A VM started with only 1 GB of memory doesn't crash; this is the command that PVE runs:
    Code:
    /usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=bd7f9680-73f0-428e-a421-2e3b6ac733d8' -name test.localdomain -smp '1,sockets=1,cores=8,maxcpus=8' -device 'kvm64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 'size=1024,slots=255,maxmem=4194304M' -object 'memory-backend-ram,id=ram-node0,size=1024M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc71019ca99' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
    
    Starting with more than 1 GB of memory makes the VM crash with the method described in my first post. This is the command for the same VM with 2 GB of memory:
    Code:
    /usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=bd7f9680-73f0-428e-a421-2e3b6ac733d8' -name test.localdomain -smp '1,sockets=1,cores=8,maxcpus=8' -device 'kvm64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 'size=1024,slots=255,maxmem=4194304M' -object 'memory-backend-ram,id=ram-node0,size=1024M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -object 'memory-backend-ram,id=mem-dimm0,size=512M' -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' -object 'memory-backend-ram,id=mem-dimm1,size=512M' -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=0' -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc71019ca99' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
    
    The only difference seems to be the two memory-backend-ram objects and pc-dimm devices. I tried changing the command slightly; the following also works:
    Code:
    /usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=bd7f9680-73f0-428e-a421-2e3b6ac733d8' -name test.localdomain -smp '1,sockets=1,cores=8,maxcpus=8' -device 'kvm64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 'size=1024,slots=255,maxmem=4194304M' -object 'memory-backend-ram,id=ram-node0,size=1024M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -object 'memory-backend-ram,id=mem-dimm0,size=1024M' -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc71019ca99' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
    
    There's only one additional memory-backend-ram object and pc-dimm device.

    It also works when memdev isn't used at startup and the VM is simply started with e.g. mem=2G, like this:
    Code:
    /usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=bd7f9680-73f0-428e-a421-2e3b6ac733d8' -name test.localdomain -smp '1,sockets=1,cores=8,maxcpus=8' -device 'kvm64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 'size=2G,slots=255,maxmem=4194304M' -numa 'node,nodeid=0,cpus=0-7,mem=2G' -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc71019ca99' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
    
    Hotplugging memory in the QEMU monitor works with:
    Code:
    object_add memory-backend-ram,id=mem1,size=1G
    device_add pc-dimm,id=dimm1,memdev=mem1
    
    The memory is added to the VM and the VM will not crash.
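    For reference, the same two commands can also be issued from the PVE host through qm monitor (a sketch; VMID 100 is the test VM from the commands above, and 'info memory-devices' just confirms the new DIMM shows up):
    Code:
    # on the PVE host
    qm monitor 100
    qm> object_add memory-backend-ram,id=mem1,size=1G
    qm> device_add pc-dimm,id=dimm1,memdev=mem1
    qm> info memory-devices    # dimm1 should be listed here
    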

    I also compiled QEMU 2.9 from source and started that binary instead of /usr/bin/kvm. Same behaviour, so it's not a bug specific to pve-qemu-kvm. Maybe it's just the way PVE adds memory.

    I hope the Proxmox team kicks in, tests this and fixes it :)
    You can ask me for help with testing or whatever else is needed. I've been working on this problem for days/weeks now and want it solved.

    Thank you!
     
  5. wbumiller

    wbumiller Proxmox Staff Member
    Staff Member

    Couldn't reproduce it. Ran through up to test_count 50 several times now, tried with pve4 with qemu 2.7.1 and pve5 with qemu 2.9.1.
    At this point my best suggestion is - since you mentioned you compiled 2.9 from source - to try a full git bisect, which is rather tedious, but for now I can't reproduce it, so I can't do that :-/.
    In the meantime: I've been playing with these options and could trigger a bit of weirdness with numa + memory hotplug + virtio-net + ovmf/uefi, and am wondering if you could try replacing virtio-net with e1000, or adding ',disable-modern=true' to the 'virtio-net-pci' part of the kvm command, to see if that makes a difference.
    Also, since you tested with the non-pve qemu source, you could also open a bug report with qemu directly (if you haven't done so already).
     
  6. hansm

    hansm Member

    Thank you for your reply and testing. I'm very surprised you couldn't reproduce it, strange. The only thing I can think of is that we use Dell hardware exclusively; I tested on Dell PowerEdge R310, R320, R420 and R610, in cluster setups with PVE 4.4 and standalone at our office on PVE 5. Possibly it's some incompatibility with the Dell hardware/BIOS or something...

    I tried with e1000: same behaviour. I also tried -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,disable-modern=true': same behaviour. Thanks for thinking about the problem and giving me some options to try, it's really appreciated.

    I tried with QEMU built from source, but I'm not sure whether it still picks up libraries or other parts of the default install. I downloaded the source to /usr/src and compiled it according to the documentation (I needed to install many additional Debian packages to make everything work). Then I could run:
    Code:
    /usr/src/qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=bd7f9680-73f0-428e-a421-2e3b6ac733d8' -name test.localdomain -smp '1,sockets=1,cores=8,maxcpus=8' -device 'kvm64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 'size=1024,slots=255,maxmem=4194304M' -object 'memory-backend-ram,id=ram-node0,size=1024M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -object 'memory-backend-ram,id=mem-dimm0,size=512M' -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' -object 'memory-backend-ram,id=mem-dimm1,size=512M' -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=0' -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc71019ca99' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
    
    This gave the same problem. And the following worked perfectly:
    Code:
    qemu-system-x86_64 -accel kvm -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=bd7f9680-73f0-428e-a421-2e3b6ac733d8' -name test.localdomain -smp '1,sockets=1,cores=8,maxcpus=8' -device 'kvm64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0' -nodefaults -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 'size=4G,slots=8,maxmem=10240M' -numa 'node,nodeid=0,cpus=0-7,mem=4G' -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc71019ca99' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=threads,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
    
    You're right, I could open a bug report with qemu directly, but I'm not sure, and couldn't find out either, whether Proxmox adds memory to qemu in the recommended way. My previous post describes how I can run qemu correctly with memory hotplug enabled, so it can work with qemu, just not with the options that Proxmox specifies. These two reasons (shared libs + qemu options) kept me from opening a bug report with qemu directly.

    You wrote that you could trigger 'a bit of weirdness with numa + memory hotplug + virtio-net + ovmf/uefi'. How could you trigger that and what did you see? Can I try the same?
    I have no problem with tedious, but I'm unfamiliar with this kind of task; it looks very developer-like :) I understand a bisect can be used to find the commit that caused a regression, but I really have no idea where to start; if you can point me in the right direction I will do this. I want this problem solved because it's very serious for us, and I'm shocked that you can't reproduce it. I can, every time, and with a Debian guest it's on the first run every time. I've spent so much time on this problem already, so a few hours or days more isn't a problem ;-)
     
  7. wbumiller

    wbumiller Proxmox Staff Member
    Staff Member

    Since it seems to take a few tries to trigger it, rather than there being a short, simple trigger command, it's very possible that the hardware in use at least strongly influences the likelihood of triggering the bug.

    OVMF looped endlessly right when booting the VM (during the splashscreen) as it seems to be incompatible with the 'modern' mode in virtio-pci (which was changed to default to on between 2.6 and 2.7, so with 2.7 and up 'disable-modern=true' is needed to fix this).
    I'll have to investigate this issue further and possibly report it upstream.

    First you need to clone qemu from git, then you can start a `git bisect` session, which takes a good and a bad revision and then basically does a binary search for the commit introducing the issue. For that it'll check out a commit halfway between the current good & bad versions; you then compile and test it, and say `git bisect good` or `git bisect bad` depending on whether the bug triggered or not. It'll then use this commit as the new good or bad starting point and go halfway between the new endpoints. Given the number of revisions between the versions, it'll take about 6-ish rounds (assuming v2.5.1 works and v2.6.0 fails).
    Here's an outline of the required commands:

    As root, install the required dev packages (the ./configure line below is equivalent to what we use to build the pve-qemu-kvm package, where some of the library dependencies are explicitly enabled; alternatively you can disable the ones you know you don't need).
    Code:
    # apt install autotools-dev libpci-dev quilt texinfo texi2html libgnutls28-dev libsdl1.2-dev check libaio-dev uuid-dev librbd-dev libiscsi-dev libspice-protocol-dev pve-libspice-server-dev libusbredirparser-dev glusterfs-common libusb-1.0-0-dev xfslibs-dev libnuma-dev libjemalloc-dev libjpeg-dev libacl1-dev libcap-dev
    
    Set up the git clone and prepare for building (assuming v2.6.0 already doesn't work):
    Code:
    $ git clone git://git.qemu.org/qemu.git
    $ cd qemu
    $ git bisect start v2.5.1 v2.6.0
    $ ./configure --with-confsuffix=/kvm --target-list=x86_64-softmmu --prefix=/usr --datadir=/usr/share --docdir=/usr/share/doc/pve-qemu-kvm --sysconfdir=/etc --localstatedir=/var --disable-xen --enable-gnutls --enable-sdl --enable-linux-aio --enable-rbd --enable-libiscsi --disable-smartcard --audio-drv-list=alsa --enable-spice --enable-usb-redir --enable-glusterfs --enable-libusb --disable-gtk --enable-xfsctl --enable-numa --disable-strip --enable-jemalloc --disable-libnfs --disable-fdt --enable-debug-info --enable-debug --disable-werror
    
    It should be enough to run configure once there (it'll rerun itself between revisions where it needs to).

    Iteration:
    1) Run `make` as user
    2) Run the qemu-system-x86_64 ... command which you use to trigger the issue (you should include `-accel kvm` - the 'kvm' binary from the pve-qemu-kvm package changes this to be on by default).
    3) If the bug was triggered:
    a) Run `git bisect bad`
    If it worked fine:
    b) Run `git bisect good`
    4) If the above command tells you the commit responsible you're done, otherwise repeat from step 1.
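    Put together, one iteration looks roughly like this (a sketch; the qemu command is abbreviated here, use the full crash-trigger command from your earlier posts):
    Code:
    $ make -j$(nproc)
    $ ./x86_64-softmmu/qemu-system-x86_64 -accel kvm ...    # full command from your earlier posts
    # run the dd test inside the guest, then record the result:
    $ git bisect bad     # if the VM hung/crashed
    $ git bisect good    # if it survived the test runs
    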
     
  8. hansm

    hansm Member

    Thanks for the explanation. I've done the bisect, but I don't think it gave usable information (so far).
    First I tried v2.5.1 v2.6.0 in the bisect like you wrote. Every build in between worked, so I approved each with 'git bisect good'; at the end it resulted in:
    Code:
    root@test:/usr/src/qemu# git bisect good
    a58047f7fbb055677e45c9a7d65ba40fbfad4b92 is the first bad commit
    commit a58047f7fbb055677e45c9a7d65ba40fbfad4b92
    Author: Michael Roth <mdroth@linux.vnet.ibm.com>
    Date:   Tue Mar 29 15:47:56 2016 -0500
    
        Update version for 2.5.1 release
    
        Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
    
    :100644 100644 437459cd94c9fa59d82c61c0bc8aa36e293b735e 73462a5a13445f66009e00988279d30e55aa8363 M      VERSION
    
    This commit only bumps the QEMU version number.

    Useless so far. I started over and did a 'git checkout tags/2.5.1.1', ran the configure command and make. I started my VM with the binary I had just built and everything works fine. Then I cleaned it again and started over for tags/v2.6.0: now my VM crashes. I started over again for tags/v2.6.0-rc0 and that also crashes my VM.

    With this information I started a bisect again with 'git bisect start v2.5.1.1 v2.6.0-rc0'; all revisions I built were good (no crashes), and at the end:
    Code:
    root@test:/usr/src/qemu# git bisect good
    db51dfc1fcaf0027a5f266b7def4317605848c6a is the first bad commit
    commit db51dfc1fcaf0027a5f266b7def4317605848c6a
    Author: Michael Roth <mdroth@linux.vnet.ibm.com>
    Date:   Mon May 9 11:10:47 2016 -0500
    
        Update version for 2.5.1.1 release
    
        Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com
    
    :100644 100644 73462a5a13445f66009e00988279d30e55aa8363 3a6d2147d6d583da05abf686c317817658ae6fbd M      VERSION
    
    I'm probably doing something wrong; can you help me again?
     
  9. wbumiller

    wbumiller Proxmox Staff Member
    Staff Member

    I must apologize. This is one of those times where I wish git revision IDs were comparable to see which direction one is going... I got the order of the 'start' command wrong. It's `git bisect start <bad> <good>`, so the two version parameters need to be swapped. (Doesn't help that the bisect terminology can also be changed in a checked-out repository to add to the confusion.)
     
  10. hansm

    hansm Member

    Hmm... that explains a lot ;-) I repeated the steps with this new knowledge and now we have the commit which causes it, I think.
    Code:
    root@test:/usr/src/qemu# git bisect good
    3b3b0628217e2726069990ff9942a5d6d9816bd7 is the first bad commit
    commit 3b3b0628217e2726069990ff9942a5d6d9816bd7
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Sun Jan 31 11:29:01 2016 +0100
    
        virtio: slim down allocation of VirtQueueElements
    
        Build the addresses and s/g lists on the stack, and then copy them
        to a VirtQueueElement that is just as big as required to contain this
        particular s/g list.  The cost of the copy is minimal compared to that
        of a large malloc.
    
        When virtqueue_map is used on the destination side of migration or on
        loadvm, the iovecs have already been split at memory region boundary,
        so we can just reuse the out_num/in_num we find in the file.
    
        Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
        Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
        Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
        Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    
    :040000 040000 42931b27fc2917c6031a5c487cbc2fe33490c9a0 198a86de8b06888629ec5e0f6b90b24f5ee506cf M      hw
    
    I think it makes sense somehow; I'm not a developer, but malloc I recognize :) Memory allocation: my test always fails after writing data and flushing the memory caches, maybe at the next step when the data is read back. It may be caused by emptying the memory, or by filling it when reading the data.

    I also tested the IDE bus instead of SCSI or VirtIO, as mentioned in https://forum.proxmox.com/threads/3000-msec-ping-and-packet-drops-with-virtio-under-load.36687/, and IDE works for me too! I really think both problems are related somehow. The results don't look much alike, but when I read that topic this weekend I had to try IDE; I saw some similarities, and IDE works.
     
  11. aderumier

    aderumier Member

    Do you have the same performance problem without "--enable-jemalloc"?

    We enable it mainly for ceph/librbd performance (since qemu 2.4); I just wonder whether this new commit could change the behaviour.

    This bugzilla
    https://bugzilla.redhat.com/show_bug.cgi?id=1251353

    talks about jemalloc and tcmalloc before this commit, which seems to fix performance with tcmalloc. I don't know the behaviour with jemalloc.
     
  12. hansm

    hansm Member

    I did some additional tests with a default PVE 5.0 install.
    The VM config has NUMA enabled and I use vCPUs (1 socket, 8 cores, 2 vCPUs). Under Options, Memory and CPU are added to Hotplug.
    I tested this VM config with a default Debian Stretch install (memory hotplug enabled in the guest via /etc/default/grub, see my earlier post) and tried all SCSI controller types and hard disk bus types. See the attached PDF for the results. The last column in the table shows results for the previously failed tests only; for those tests I disabled NUMA, set 1 socket and 2 cores, and disabled memory and CPU hotplug.

    The problem really seems related to virtio, but it also has something to do with NUMA and/or hotplug. I think it's NUMA, because of the memory-allocation connection, but NUMA is required for memory hotplug anyway.
     

  13. hansm

    hansm Member

    Thank you for the suggestion. I tested it with a new git clone and removed --enable-jemalloc from the configure command; my test still crashes the VM on the first run. I'm not having performance issues, by the way: with the test in my first post my VM crashes, on Debian usually after the first run, sometimes the second, so it doesn't take very long to know whether it works or not ;-) It takes at most 2 minutes to complete 3 runs of my test, which is enough to verify whether there's a problem.

    More suggestions are more than welcome :)
     
  14. aderumier

    aderumier Member

    Does it crash too if you start the VM with more than 4 GB?
     
  15. wbumiller

    wbumiller Proxmox Staff Member
    Staff Member

    If you remove the -daemonize flag from the qemu command line (in case you haven't already) and do the test directly at the bad commit 3b3b0628217, does it show any (error) output? Interestingly it seems to introduce a new error case (and at this revision simply reports it and exits - later commits change this to return instead of exiting directly, so I wonder if there's a difference in how the bug manifests there as well...).
     
    #15 wbumiller, Sep 19, 2017
    Last edited: Sep 19, 2017
  16. hansm

    hansm Member

    Yes, I just tried it with 6144 MB; the dd test still writes and reads 2 GB.
     
  17. hansm

    hansm Member

    The bisect had finished, and the only commit not yet applied was the one that introduces the error. I'm not familiar with git; I did this to apply the commit:
    Code:
    root@test:/usr/src/qemu# git cherry-pick 3b3b0628217
    [detached HEAD beb0fb61a2] virtio: slim down allocation of VirtQueueElements
     Author: Paolo Bonzini <pbonzini@redhat.com>
     Date: Sun Jan 31 11:29:01 2016 +0100
     Committer: root <root@test.localserver>
    Your name and email address were configured automatically based
    on your username and hostname. Please check that they are accurate.
    You can suppress this message by setting them explicitly. Run the
    following command and follow the instructions in your editor to edit
    your configuration file:
    
        git config --global --edit
    
    After doing this, you may fix the identity used for this commit with:
    
        git commit --amend --reset-author
    
     1 file changed, 51 insertions(+), 31 deletions(-)
    root@test:/usr/src/qemu# git status
    HEAD detached from 3724650db0
    You are currently bisecting, started from branch 'master'.
      (use "git bisect reset" to get back to the original branch)
    
    nothing to commit, working tree clean
    root@test:/usr/src/qemu# make
      CC    x86_64-softmmu/hw/virtio/virtio.o
      LINK  x86_64-softmmu/qemu-system-x86_64
    root@test:/usr/src/qemu#
    
    I think I did it correctly, so I started qemu without -daemonize, and you're right, it gives some output at the crash, though I'm not sure how much it helps...
    Code:
    root@test:/usr/src/qemu# /usr/src/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -smbios 'type=1,uuid=bd7f9680-73f0-428e-a421-2e3b6ac733d8' -name test.localdomain -smp '2,sockets=1,cores=8,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 'size=1024,slots=255,maxmem=4194304M' -object 'memory-backend-ram,id=ram-node0,size=1024M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -object 'memory-backend-ram,id=mem-dimm0,size=512M' -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' -object 'memory-backend-ram,id=mem-dimm1,size=512M' -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=0' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8bc71019ca99' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=02:74:F2:CE:A1:2D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
    
    qemu-system-x86_64: Looped descriptor
    root@test:/usr/src/qemu#
    
    That's all output.

    Is it possible somehow to build qemu 2.6 or newer without this commit? As I understand it, you can revert it and a new commit will be created that does the opposite of the original, but I couldn't get that to work, probably because of too many changes in virtio.c after this commit.
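    For reference, the kind of revert I tried looks roughly like this (a sketch; on the newer tags it runs into conflicts in hw/virtio/virtio.c, which is where I got stuck):
    Code:
    $ git checkout v2.9.0
    $ git revert 3b3b0628217    # stops with conflicts in hw/virtio/virtio.c
    $ git revert --abort        # give up and leave the working tree clean
    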
     
  18. wbumiller

    wbumiller Proxmox Staff Member
    Staff Member

    Building newer qemus with the above commit reverted will be difficult because there have been a couple more changes in there which would conflict.

    However, the output you posted is useful and helped me spot a not-so-obvious change introduced by the above commit which seems accidental.
    I have a patch I'd like you to test; I've pushed it to a branch on GitHub.

    https://github.com/Blub/qemu/commit/7bc9ce912373b571686db231dd97e08564303fa2

    You can check out the branch this way:
    Reset the bisect state first:
    Code:
    $ git bisect reset
    Add the repository and fetch its branches:
    Code:
    $ git remote add wbumiller https://github.com/Blub/qemu
    $ git fetch wbumiller
    
    Checkout the branch:
    Code:
    $ git checkout wbumiller/virtqueue-count-fix
    Then build & test.
    If this fixes the issue for you I'd forward the patch to the qemu developer list for them to review and apply. (Also let me know if I should include a `Reported-by` tag with your name in the message; see the various entries in `git log` for what that would look like (I'd need a name & email address).)
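    (For context, forwarding the patch upstream would be the usual git workflow, roughly; a sketch rather than the exact commands I'll end up running:)
    Code:
    $ git format-patch -1 7bc9ce91237
    $ git send-email --to=qemu-devel@nongnu.org 0001-*.patch
    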

    Since this is based on our current 2.9.1 branch it would also be useful to verify that it fails without the patch:
    Code:
    $ git checkout wbumiller/extra
    This one should fail.
     
  19. hansm

    hansm Member

    YES, it works!!! Thank you! I repeated the qemu build twice and repeated my tests to be sure. I am very grateful, thanks for your help and fix!

    I'm curious: now that you know the problem, the cause and the solution, can you think of a way to trigger the problem on your hardware? It still seems I'm the only one having this problem, which bothers me, because I can reproduce it every time on different hardware (all Dell) with fairly default settings.

    I also tested wbumiller/extra and that one indeed fails.

    Please add Reported-by tag:
    Code:
    Reported-by: Hans Middelhoek <h.middelhoek@ospito.nl>
    
    When will Proxmox apply the patch to the pve-qemu-kvm packages? Directly in the next build, or only once qemu approves it and releases a version with the patch applied?

    I also replied to the thread https://forum.proxmox.com/threads/3...ket-drops-with-virtio-under-load.36687/page-4 It doesn't seem very related, but their problems are also solved when they move away from virtio to IDE. I think it would be interesting to build a test package that can be installed with dpkg -i, so they can easily test whether your patch also solves their problem.
     
  20. wbumiller

    wbumiller Proxmox Staff Member
    Staff Member

    Qemu seems to be counting the buffers in the virtio device's queue wrong, in a way which somewhat depends on your hardware and on how the guest buffers requests, which in turn can depend on various components. It's probably possible to craft a failing request by directly manipulating the virtio-block or scsi driver (or writing a separate, independent virtio test driver), but the patch seems to make sense to me and works for you, so my preferred next step is to send it upstream to the people who wrote the code, who should be much faster at analyzing the situation ;-)

    We'll send it upstream and begin testing with a patched package internally simultaneously, so that if the patch is accepted upstream a package will already be on its way through the internal and afterwards external testing repositories.

    It's unlikely to be related. We'll first wait for feedback from upstream; after that it shouldn't take long for a package to be available in the pvetest repositories.
     
