Linux (Debian) VMs down after backup

romanpoe

New Member
Jun 18, 2015
22
0
1
Vienna
www.marketmind.at
Hello,
for almost two years my backup from within Proxmox has been running without problems; every night at 02:00 certain Linux VMs are backed up via the internal scheduler.
A few days ago I upgraded to Proxmox 5.x (from the last 4.x version). Since then the VMs are still shut down and backed up correctly (mode: shutdown), but unfortunately they no longer start afterwards, which they previously did without any problems...
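For reference, the job runs via the built-in scheduler; the corresponding entry in /etc/pve/vzdump.cron should look roughly like this (a sketch only, with the flags taken from the backup log further down and the VM ID illustrative):

Code:
# nightly stop-mode backup at 02:00 (illustrative)
0 2 * * *           root vzdump 555 --mode stop --compress lzo --storage BackUp_LNXStorage --quiet 1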

Code:
proxmox-ve: 5.0-19 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.13-2-pve: 4.4.13-58
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.10-1-pve: 4.4.10-54
pve-kernel-4.4.62-1-pve: 4.4.62-88
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.10.17-2-pve: 4.10.17-19
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.8-1-pve: 4.4.8-52
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-3
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90


Backup log:
Code:
INFO: starting new backup job: vzdump 555 --compress lzo --remove 0 --storage BackUp_LNXStorage --node pve4 --mode stop
INFO: Starting Backup of VM 555 (qemu)
INFO: status = running
INFO: update VM 555: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: xxxx.xxxxx..xxx
INFO: include disk 'virtio0' 'pve4ZFS:555/vm-555-disk-1.raw' 32G
INFO: stopping vm
INFO: creating archive '/mnt/pve/BackUp_LNXStorage/dump/vzdump-qemu-555-2017_08_16-09_03_20.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task '755bcedb-6234-435c-b276-3307b87bd8c5'
INFO: resume VM
ERROR: VM 555 not running
INFO: aborting backup job
ERROR: VM 555 not running
INFO: restarting vm
INFO: start failed: org.freedesktop.systemd1.UnitExists: Unit 555.scope already exists.
command 'qm start 555 --skiplock' failed: exit code 255
ERROR: Backup of VM 555 failed - VM 555 not running
INFO: Backup job finished with errors
TASK ERROR: job errors


Code:
#Ubuntu Server LTS 16.04.1
bootdisk: virtio0
cores: 2
ide2: none,media=cdrom
memory: 3072
name: xxxxxxxxxxxxxxxxx
net0: virtio=C6:BF:43:94:64:EC,bridge=vmbr1
numa: 1
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=c70be1ac-138a-4aaa-b692-191631507c53
sockets: 1
virtio0: pve4ZFS:555/vm-555-disk-1.raw,cache=writethrough,size=32G
 
please post pveversion -v, the backup log and the VM config
 
does the VM correctly shut down without manual intervention when you press the shutdown button in the GUI? how long does it take?
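for timing it, the CLI equivalent of the GUI shutdown button can be used, e.g. (VM ID taken from your log):

Code:
time qm shutdown 555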
 
I think it is possible that querying the backup status fails because your system is overloaded. could you check the load and the journal?
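for example (a minimal check, nothing node-specific assumed):

Code:
uptime              # load averages for the last 1/5/15 minutes
journalctl -n 200   # the last 200 journal lines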
 
could you post the complete journal from just before starting the backup until after it has failed?
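a time-bounded query also works if following the journal live is inconvenient (timestamps illustrative):

Code:
journalctl --since "2017-08-17 14:35" --until "2017-08-17 14:40"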
 
journalctl -f


Code:
Aug 17 14:37:26 pve4 pvedaemon[1098]: INFO: Starting Backup of VM 555 (qemu)
Aug 17 14:37:26 pve4 qm[1103]: <root@pam> update VM 555: -lock backup
Aug 17 14:37:27 pve4 qm[1112]: <root@pam> starting task UPID:pve4:0000045A:01B4D147:59958E07:qmshutdown:555:root@pam:
Aug 17 14:37:27 pve4 qm[1114]: shutdown VM 555: UPID:pve4:0000045A:01B4D147:59958E07:qmshutdown:555:root@pam:
Aug 17 14:37:33 pve4 kernel: vmbr1: port 8(tap555i0) entered disabled state
Aug 17 14:37:33 pve4 qm[1112]: <root@pam> end task UPID:pve4:0000045A:01B4D147:59958E07:qmshutdown:555:root@pam: OK
Aug 17 14:37:33 pve4 systemd[1]: Started 555.scope.
Aug 17 14:37:33 pve4 systemd-udevd[1167]: Could not generate persistent MAC address for tap555i0: No such file or directory
Aug 17 14:37:34 pve4 kernel: device tap555i0 entered promiscuous mode
Aug 17 14:37:34 pve4 kernel: vmbr1: port 8(tap555i0) entered blocking state
Aug 17 14:37:34 pve4 kernel: vmbr1: port 8(tap555i0) entered disabled state
Aug 17 14:37:34 pve4 kernel: vmbr1: port 8(tap555i0) entered blocking state
Aug 17 14:37:34 pve4 kernel: vmbr1: port 8(tap555i0) entered forwarding state
Aug 17 14:37:35 pve4 kernel: kvm[1207]: segfault at 10 ip 000055b513b62260 sp 00007f80585fc2b8 error 4 in kvm[55b51361e000+7b9000]
Aug 17 14:37:35 pve4 kernel: vmbr1: port 8(tap555i0) entered disabled state
Aug 17 14:37:35 pve4 kernel: vmbr1: port 8(tap555i0) entered disabled state
Aug 17 14:37:36 pve4 pvedaemon[1098]: VM 555 qmp command failed - VM 555 not running
Aug 17 14:37:36 pve4 pvedaemon[1098]: VM 555 qmp command failed - VM 555 not running
Aug 17 14:37:37 pve4 qm[1219]: <root@pam> starting task UPID:pve4:000004CC:01B4D513:59958E11:qmstart:555:root@pam:
Aug 17 14:37:37 pve4 qm[1228]: start VM 555: UPID:pve4:000004CC:01B4D513:59958E11:qmstart:555:root@pam:
Aug 17 14:37:37 pve4 systemd[1]: Stopped 555.scope.
Aug 17 14:37:37 pve4 qm[1228]: start failed: org.freedesktop.systemd1.UnitExists: Unit 555.scope already exists.
Aug 17 14:37:37 pve4 qm[1219]: <root@pam> end task UPID:pve4:000004CC:01B4D513:59958E11:qmstart:555:root@pam: start failed: org.freedesktop.systemd1.UnitExists: Unit 555.scope already exists.
Aug 17 14:37:37 pve4 pvedaemon[1098]: command 'qm start 555 --skiplock' failed: exit code 255
Aug 17 14:37:37 pve4 pvedaemon[1098]: ERROR: Backup of VM 555 failed - VM 555 not running
Aug 17 14:37:37 pve4 pvedaemon[1098]: INFO: Backup job finished with errors
Aug 17 14:37:37 pve4 pvedaemon[1098]: job errors
Aug 17 14:37:37 pve4 pvedaemon[30548]: <xxxx@xxxxx.at> end task UPID:pve4:0000044A:01B4D0AF:59958E06:vzdump::xxxx@xxxx.at: job errors
Aug 17 14:38:00 pve4 systemd[1]: Starting Proxmox VE replication runner...
Aug 17 14:38:01 pve4 systemd[1]: Started Proxmox VE replication runner.
 
the VM crashed while starting:

Code:
Aug 17 14:37:35 pve4 kernel: kvm[1207]: segfault at 10 ip 000055b513b62260 sp 00007f80585fc2b8 error 4 in kvm[55b51361e000+7b9000]

does this log message show up every time?
 
could you try upgrading to pve-qemu-kvm 2.9.0-4? if the problem persists, please install the pve-qemu-kvm-dbg package, install and enable systemd-coredump, and check whether it is able to collect a coredump from the crashed QEMU process.
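a sketch of those steps (assuming the updated package has already reached the repository you use; on Debian Stretch, installing systemd-coredump should register it as the kernel core handler automatically):

Code:
apt update
apt install pve-qemu-kvm pve-qemu-kvm-dbg systemd-coredump
coredumpctl list kvm    # after the next crash, to see whether a dump was collected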
 
in the meantime pve-qemu-kvm 2.9.0-4 has become available in the repo; last night I upgraded all nodes to

Code:
proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.13-2-pve: 4.4.13-58
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.10-1-pve: 4.4.10-54
pve-kernel-4.4.62-1-pve: 4.4.62-88
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.8-1-pve: 4.4.8-52
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90

and rebooted all nodes.
the first backup test was unsuccessful:

Code:
INFO: starting new backup job: vzdump 555 --mode stop --remove 0 --node pve4 --compress lzo --storage BackUp_LNXStorage
INFO: Starting Backup of VM 555 (qemu)
INFO: status = running
INFO: update VM 555: -lock backup
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: xxx.yyyy.at
INFO: include disk 'virtio0' 'pve4ZFS:555/vm-555-disk-1.raw' 32G
INFO: stopping vm
INFO: creating archive '/mnt/pve/BackUp_LNXStorage/dump/vzdump-qemu-555-2017_08_24-07_18_34.vma.lzo'
INFO: starting kvm to execute backup task
INFO: started backup task '3ea9b0a7-ab4e-4c72-afd9-0b122d009887'
INFO: resume VM
ERROR: VM 555 not running
INFO: aborting backup job
ERROR: VM 555 not running
INFO: restarting vm
INFO: start failed: org.freedesktop.systemd1.UnitExists: Unit 555.scope already exists.
command 'qm start 555 --skiplock' failed: exit code 255
ERROR: Backup of VM 555 failed - VM 555 not running
INFO: Backup job finished with errors
TASK ERROR: job errors

journalctl -f
Code:
Aug 24 07:26:23 pve4 pvedaemon[2842]: <my.address@world.com> starting task UPID:pve4:00001AEC:0000E2EA:599E637F:vzdump::my.address@world.com:
Aug 24 07:26:23 pve4 pvedaemon[6892]: INFO: starting new backup job: vzdump 555 --compress lzo --storage BackUp_LNXStorage --mode stop --node pve4 --remove 0
Aug 24 07:26:23 pve4 pvedaemon[6892]: INFO: Starting Backup of VM 555 (qemu)
Aug 24 07:26:24 pve4 qm[6897]: <root@pam> update VM 555: -lock backup
Aug 24 07:26:25 pve4 qm[6901]: <root@pam> starting task UPID:pve4:00001AFA:0000E393:599E6381:qmshutdown:555:root@pam:
Aug 24 07:26:25 pve4 qm[6906]: shutdown VM 555: UPID:pve4:00001AFA:0000E393:599E6381:qmshutdown:555:root@pam:
Aug 24 07:26:30 pve4 kernel: vmbr1: port 7(tap555i0) entered disabled state
Aug 24 07:26:31 pve4 qm[6901]: <root@pam> end task UPID:pve4:00001AFA:0000E393:599E6381:qmshutdown:555:root@pam: OK
Aug 24 07:26:31 pve4 systemd[1]: Started 555.scope.
Aug 24 07:26:31 pve4 systemd-udevd[6951]: Could not generate persistent MAC address for tap555i0: No such file or directory
Aug 24 07:26:31 pve4 kernel: device tap555i0 entered promiscuous mode
Aug 24 07:26:31 pve4 kernel: vmbr1: port 7(tap555i0) entered blocking state
Aug 24 07:26:31 pve4 kernel: vmbr1: port 7(tap555i0) entered disabled state
Aug 24 07:26:31 pve4 kernel: vmbr1: port 7(tap555i0) entered blocking state
Aug 24 07:26:31 pve4 kernel: vmbr1: port 7(tap555i0) entered forwarding state
Aug 24 07:26:32 pve4 kernel: kvm[6988]: segfault at 10 ip 0000558bb416b680 sp 00007fc834dfc2f8 error 4 in kvm[558bb3c28000+7b7000]
Aug 24 07:26:32 pve4 kernel: vmbr1: port 7(tap555i0) entered disabled state
Aug 24 07:26:32 pve4 kernel: vmbr1: port 7(tap555i0) entered disabled state
Aug 24 07:26:33 pve4 pvedaemon[6892]: VM 555 qmp command failed - VM 555 not running
Aug 24 07:26:33 pve4 pvedaemon[6892]: VM 555 qmp command failed - VM 555 not running
Aug 24 07:26:34 pve4 qm[7021]: <root@pam> starting task UPID:pve4:00001B6E:0000E76B:599E638A:qmstart:555:root@pam:
Aug 24 07:26:34 pve4 qm[7022]: start VM 555: UPID:pve4:00001B6E:0000E76B:599E638A:qmstart:555:root@pam:
Aug 24 07:26:34 pve4 systemd[1]: Stopped 555.scope.
Aug 24 07:26:34 pve4 qm[7022]: start failed: org.freedesktop.systemd1.UnitExists: Unit 555.scope already exists.
Aug 24 07:26:34 pve4 qm[7021]: <root@pam> end task UPID:pve4:00001B6E:0000E76B:599E638A:qmstart:555:root@pam: start failed: org.freedesktop.systemd1.UnitExists: Unit 555.scope already exists.
Aug 24 07:26:34 pve4 pvedaemon[6892]: command 'qm start 555 --skiplock' failed: exit code 255
Aug 24 07:26:34 pve4 pvedaemon[6892]: ERROR: Backup of VM 555 failed - VM 555 not running
Aug 24 07:26:35 pve4 pvedaemon[6892]: INFO: Backup job finished with errors
Aug 24 07:26:35 pve4 pvedaemon[6892]: job errors
Aug 24 07:26:35 pve4 pvedaemon[2842]: <my.address@world.com> end task UPID:pve4:00001AEC:0000E2EA:599E637F:vzdump::my.address@world.com: job errors
 
okay, so the next step would be to install the pve-qemu-kvm-dbg package, install and enable systemd-coredump, and check whether it is able to collect a coredump from the crashed QEMU process ("coredumpctl list kvm"). if it does, please attempt to get details with "coredumpctl info kvm"
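that is (both are standard coredumpctl calls):

Code:
coredumpctl list kvm    # one entry per collected kvm crash
coredumpctl info kvm    # metadata plus stack trace for the most recent one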
 
root@pve4:~# coredumpctl info kvm
Code:
           PID: 15543 (kvm)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Thu 2017-08-24 09:57:31 CEST (2min 8s ago)
  Command Line: /usr/bin/kvm -id 555 -chardev socket,id=qmp,path=/var/run/qemu-server/555.qmp,server,nowait -mon chardev=qmp,mode=control -pidfile /var/run/qemu-server/555.pid -daemonize -s
    Executable: /usr/bin/kvm
 Control Group: /
         Slice: -.slice
       Boot ID: aa574415f8c442168711a3759eb268ec
    Machine ID: bd94244c0da6419a82a383e62dc03b51
      Hostname: pve4
       Storage: /var/lib/systemd/coredump/core.kvm.0.aa574415f8c442168711a3759eb268ec.15543.1503561451000000000000.lz4
       Message: Process 15543 (kvm) of user 0 dumped core.

                Stack trace of thread 15582:
                #0  0x000055616edaa680 n/a (/usr/bin/kvm)
                #1  0x000055616ed728c8 n/a (/usr/bin/kvm)
                #2  0x000055616ed6ea8b n/a (/usr/bin/kvm)
                #3  0x000055616ed6eb29 n/a (/usr/bin/kvm)
                #4  0x000055616edaa6b6 n/a (/usr/bin/kvm)
                #5  0x000055616eaf1692 n/a (/usr/bin/kvm)
                #6  0x000055616ed0fb12 n/a (/usr/bin/kvm)
                #7  0x000055616ed0cc30 n/a (/usr/bin/kvm)
                #8  0x000055616ead6a28 n/a (/usr/bin/kvm)
                #9  0x000055616ead3ded n/a (/usr/bin/kvm)
                #10 0x000055616ead7dfc n/a (/usr/bin/kvm)
                #11 0x000055616ea8877f n/a (/usr/bin/kvm)
                #12 0x000055616ead2c58 n/a (/usr/bin/kvm)
                #13 0x000055616eabfb64 n/a (/usr/bin/kvm)
                #14 0x00007f8326d1d494 n/a (n/a)
 
did you install pve-qemu-kvm-dbg?
 
yes, installed with "apt install pve-qemu-kvm-dbg" and also "apt install systemd-coredump"

root@pve4:~# systemctl enable systemd-coredump@
Code:
The unit files have no installation config (WantedBy, RequiredBy, Also, Alias
settings in the [Install] section, and DefaultInstance for template units).
This means they are not meant to be enabled using systemctl.
Possible reasons for having this kind of units are:
1) A unit may be statically enabled by being symlinked from another unit's
.wants/ or .requires/ directory.
2) A unit's purpose may be to act as a helper for some other unit which has
a requirement dependency on it.
3) A unit may be started when needed via activation (socket, path, timer,
D-Bus, udev, scripted systemctl call, ...).
4) In case of template units, the unit is meant to be enabled with some
instance name specified.

a more precise explanation of how to enable coredump collection would be helpful...

Roman
 
