Hi,
I noticed that one backup fails during the backup of our VMs. It has happened twice in the last two weeks, only for this one VM, and both times at almost exactly the same time.
138: Dec 05 01:41:49 INFO: Starting Backup of VM 138 (qemu)
138: Dec 05 01:41:49 INFO: status = running
138: Dec 05 01:41:50 INFO: update VM 138: -lock backup
138: Dec 05 01:41:50 INFO: backup mode: snapshot
138: Dec 05 01:41:50 INFO: ionice priority: 7
138: Dec 05 01:41:50 INFO: snapshots found (not included into backup)
138: Dec 05 01:41:50 INFO: creating archive '/mnt/pve/backup/dump/vzdump-qemu-138-2014_12_05-01_41_49.vma.lzo'
138: Dec 05 01:41:50 INFO: started backup task '5af639e8-ae33-4d25-a98a-25a52a7c6ca3'
138: Dec 05 01:41:53 INFO: status: 0% (114753536/201863462912), sparse 0% (22028288), duration 3, 38/30 MB/s
138: Dec 05 01:42:26 INFO: status: 1% (2039087104/201863462912), sparse 0% (1409708032), duration 36, 58/16 MB/s
138: Dec 05 01:42:56 INFO: status: 2% (4072734720/201863462912), sparse 1% (3144876032), duration 66, 67/9 MB/s
138: Dec 05 01:43:25 INFO: status: 3% (6077087744/201863462912), sparse 2% (4887990272), duration 95, 69/9 MB/s
138: Dec 05 01:43:54 INFO: status: 4% (8156676096/201863462912), sparse 3% (6710759424), duration 124, 71/8 MB/s
138: Dec 05 01:44:19 INFO: status: 5% (10167058432/201863462912), sparse 4% (8718745600), duration 149, 80/0 MB/s
138: Dec 05 01:44:44 INFO: status: 6% (12178423808/201863462912), sparse 5% (10725748736), duration 174, 80/0 MB/s
138: Dec 05 01:45:18 INFO: status: 7% (14163968000/201863462912), sparse 6% (12200509440), duration 208, 58/15 MB/s
138: Dec 05 01:45:45 INFO: status: 8% (16198205440/201863462912), sparse 6% (14013587456), duration 235, 75/8 MB/s
138: Dec 05 01:46:26 INFO: status: 9% (18205179904/201863462912), sparse 7% (15217860608), duration 276, 48/19 MB/s
138: Dec 05 01:46:42 ERROR: VM 138 not running
138: Dec 05 01:46:42 INFO: aborting backup job
138: Dec 05 01:46:42 ERROR: VM 138 not running
138: Dec 05 01:46:44 ERROR: Backup of VM 138 failed - VM 138 not running
As it turns out, the VM died during the backup process. I am not sure whether it went down immediately or only after a freeze. However, if there was a freeze, it only lasted a few minutes - the only information I received was a Nagios alert that the system had gone down.
I checked all the logs on the compute node and on all the OSDs, and there were no errors there (no PG errors or anything similar). The only suspicious thing I found was inside the VM itself, around the time of the failed backup:
Dec 5 01:46:01 *-srv01 systemd: serial-getty@ttyS0.service holdoff time over, scheduling restart.
Dec 5 01:46:01 *-srv01 systemd: Stopping Serial Getty on ttyS0...
Dec 5 01:46:01 *-srv01 systemd: Starting Serial Getty on ttyS0...
Dec 5 01:46:01 *-srv01 systemd: Started Serial Getty on ttyS0.
Dec 5 01:46:11 *-srv01 systemd: serial-getty@ttyS0.service holdoff time over, scheduling restart.
Dec 5 01:46:11 *-srv01 systemd: Stopping Serial Getty on ttyS0...
Dec 5 01:46:11 *-srv01 systemd: Starting Serial Getty on ttyS0...
Dec 5 01:46:11 *-srv01 systemd: Started Serial Getty on ttyS0.
^@^@^@^@^@^@^@^@ ... [the log continues with a long run of NUL (^@) bytes] ...
Dec 5 03:59:36 *-srv01 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="759" x-info="http://www.rsyslog.com"] start
Dec 5 03:59:27 *-srv01 journal: Runtime journal is using 8.0M (max 399.2M, leaving 598.8M of free 3.8G, current limit 399.2M).
Dec 5 03:59:27 *-srv01 kernel: Initializing cgroup subsys cpuset
Dec 5 03:59:27 *-srv01 kernel: Initializing cgroup subsys cpu
Dec 5 03:59:27 *-srv01 kernel: Initializing cgroup subsys cpuacct
Dec 5 03:59:27 *-srv01 kernel: Linux version 3.17.4-2.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Sun Nov
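For what it is worth, on the Proxmox node I looked for crash evidence with checks roughly like the following (assuming the default Debian/PVE 3.x log locations), and nothing showed up around 01:46:

grep -iE 'kvm|qemu|segfault' /var/log/syslog /var/log/kern.log    # killed or segfaulted kvm process?
dmesg | grep -iE 'oom|out of memory'                              # kernel OOM killer involvement?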
It looks like there was some issue with the storage, but it is hard to find any information about it in the logs.
What could be causing this? Is there any way to log information from the running KVM process so that I could see why it crashed? Or is this a known issue and I should simply upgrade?
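One idea I had, though I am not sure whether it would actually capture the reason for a crash, is to let QEMU write its own log file for this VM by passing extra arguments through the VM config, e.g. something like this in /etc/pve/qemu-server/138.conf (the log path is just an example):

args: -D /var/log/qemu/vm-138.log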
I am using the following Proxmox and Ceph versions:
# pveversion
pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-31-pve)
# ceph -v
ceph version 0.80.5
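In case the exact package versions matter for pinning this down to a known qemu-kvm issue, I can also post the full output of the verbose version listing:

# pveversion -v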
Kind regards,
Piotr D