After Backup - boot failed: not a bootable disk

sebastiano

Hi everyone, I have a hard problem: after backing up my VM (snapshot mode, LZO (fast) compression, to an NFS disk), the VM doesn't start anymore and tells me "boot failed: not a bootable disk"... I don't know what to do. Please help me...

 
Hi!

If you used the CLI, what command did you use exactly to perform the backup?
What is the output of qm config VMID?
What happens when you try to restore the backup?
 
Can you try to back up your restored VM again and see if you still have this problem?
 
any solution on this? I am experiencing this same issue right now and couldn't find a solution to this.
 
any solution on this? I am experiencing this same issue right now and couldn't find a solution to this.
What is the exact log output of the backup? What does the configuration of your VM look like? Does restoring work?
 
Hello,
I have the same issue now and then on one of our 40 VMs too. When it occurs, it is always the same VM. I tried removing the VM completely from the system and restoring it from a working backup, but the issue came up again.

As far as I can see, the backup somehow destroys the MBR of the virtual disk. The partition table is still in the system cache while the VM is running, so everything looks fine until the next reboot of the VM. By then it can be too late, because the corrupted MBR is in the backup files as well, depending on how often you back up and restart the VM.

The first time I encountered this issue I had to boot the VM from a live disk and reconstruct the partition table of its disk. Luckily I had a clone of this VM and could copy the table from there. Since then I keep a backup of the partition table of every VM outside of Proxmox so I can recover from this state. On top of that I run a daily cronjob that restores the table inside the VM after the backup (see the sketch after the commands below). This doesn't solve the root of the problem, but it is the best workaround I came up with to sleep well.


You can check whether the table is missing by running this (assuming your disk is /dev/sda):
Bash:
# sfdisk -d /dev/sda
sfdisk: /dev/sda: does not contain a recognized partition table

The output should actually look something like this:
Bash:
# sfdisk -d /dev/sda
label: dos
label-id: 0xf1a2e21b
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=   725616640, type=83, bootable
/dev/sda2 : start=   725618688, size=     8384512, type=5
/dev/sda5 : start=   725620736, size=     8382464, type=82

To make a backup of your partition table, run:
Bash:
sfdisk -d /dev/sda > /etc/partitiontable.backup

To restore it:
Bash:
sfdisk -f /dev/sda < /etc/partitiontable.backup
Don't forget to reinstall the bootloader:
Bash:
grub-install /dev/sda
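
Putting the pieces together, here is a minimal sketch of the daily cronjob mentioned above. The schedule, the log path and the location of the saved table (/etc/partitiontable.backup inside the guest) are my assumptions, so adapt them to your setup:
Bash:
# /etc/cron.d/restore-partition-table  (inside the guest; hypothetical path and schedule)
# Re-apply the saved partition table shortly after the nightly backup window,
# then reinstall the bootloader, logging the result of both steps.
30 1 * * * root sfdisk -f /dev/sda < /etc/partitiontable.backup >> /var/log/ptable-restore.log 2>&1 && grub-install /dev/sda >> /var/log/ptable-restore.log 2>&1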

Changing the partition table of a running system is normally a very bad idea, so do it at your own risk and only if you know what you are doing.


Hope this helps to save someone's data.
 
Thank you for the detailed report!

I just opened bug 2874 for this problem. It would be really helpful if you could provide the output of the following commands from your host:

Code:
pveversion -v
qm config <vmid>
cat /etc/pve/storage.cfg
where <vmid> is the id of the corrupted VM. Should this happen again, then additionally the following would be great:
  • complete log output of the backup that corrupts the VM disk
  • syslog of the time around the backup
Also, is this VM different from other VMs? For example, is it on a different storage? What happens if you limit the bandwidth for the backup (--bwlimit parameter for vzdump)?
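
For reference, one way to collect the syslog for the time around a backup would be something like the following (a sketch assuming the systemd journal; the timestamps are only an example and need to be adjusted to the actual backup window):
Code:
# dump the journal for the window around the nightly backup into a file to attach
journalctl --since "2020-06-13 23:55" --until "2020-06-14 00:30" > /tmp/backup-window-syslog.txt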
 
Hi,

It has not happened again so far, but it's not regular at all, so no logs at the moment.

What happens if you limit the bandwidth for the backup (--bwlimit parameter for vzdump)?
I'm going to experiment with this once I have logs of the problem in the current state.


The VM is nothing special, same storage as the others.

Code:
# qm config 107
bootdisk: sata0
cores: 4
ide2: none,media=cdrom
memory: 12288
name: jira
net0: e1000=62:33:34:61:39:36,bridge=vmbr2
numa: 1
ostype: l26
parent: upgrade
sata0: local-crypt:107/vm-107-disk-0.qcow2,cache=writethrough,size=350G
smbios1: uuid=a5e1e7d6-ed87-4028-93db-0c741d6e4879
sockets: 1

Code:
# cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content vztmpl,iso,rootdir,images
    maxfiles 0

dir: local-crypt
    path /var/lib/vz-crypt
    content backup,rootdir,images
    maxfiles 1
    shared 0

dir: nfs-backup
    path /mnt/backup/
    content backup
    maxfiles 1
    shared 1

Code:
# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.3.18-3-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-2
pve-kernel-helper: 6.2-2
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-4.15: 5.4-16
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.15.18-27-pve: 4.15.18-55
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-6
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
 
What is the exact log output of the backup? What does the configuration of your VM look like? Does restoring work?
Sorry for the late reply. No, restoring also shows the same problem. We were able to fix the partition table of the VM.
 
Got a destroyed partition table over the weekend again. I don't know if it was Saturday or Sunday, but the log files look identical anyway. To me there is nothing wrong in them. I will try to lower the bandwidth now.

/var/log/syslog
Code:
Jul 25 00:03:43 terri vzdump[12329]: INFO: Starting Backup of VM 107 (qemu)
Jul 25 00:04:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:04:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:04:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:05:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:05:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:05:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:06:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:06:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:06:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:07:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:07:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:07:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:08:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:08:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:08:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:09:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:09:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:09:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:10:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:10:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:10:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:11:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:11:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:11:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:12:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:12:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:12:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:13:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:13:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:13:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:14:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:14:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:14:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:15:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:15:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:15:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:16:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:16:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:16:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:17:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:17:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:17:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:18:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:18:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:18:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:19:00 terri systemd[1]: Starting Proxmox VE replication runner...
Jul 25 00:19:00 terri systemd[1]: pvesr.service: Succeeded.
Jul 25 00:19:00 terri systemd[1]: Started Proxmox VE replication runner.
Jul 25 00:19:10 terri vzdump[12329]: INFO: Finished Backup of VM 107 (00:15:27)

No entries in kern.log in the timeframe of the backup.
 
Backup log:
Code:
INFO: Starting Backup of VM 107 (qemu)
INFO: Backup started at 2020-07-25 00:03:43
INFO: status = running
INFO: VM Name: jira
INFO: include disk 'sata0' 'local-crypt:107/vm-107-disk-0.qcow2' 350G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating archive '/backup/dump/vzdump-qemu-107-2020_07_25-00_03_43.vma.lzo'
INFO: started backup task '98cd8416-1f11-4245-bbf7-0e4ccef15130'
INFO: resuming VM again
INFO: status: 0% (3234725888/375809638400), sparse 0% (2702278656), duration 3, read/write 1078/177 MB/s
INFO: status: 1% (3983540224/375809638400), sparse 0% (2836807680), duration 6, read/write 249/204 MB/s
INFO: status: 2% (7689601024/375809638400), sparse 0% (3438153728), duration 21, read/write 247/206 MB/s
INFO: status: 3% (11441274880/375809638400), sparse 1% (4005117952), duration 71, read/write 75/63 MB/s
INFO: status: 4% (15333457920/375809638400), sparse 1% (5223546880), duration 117, read/write 84/58 MB/s
INFO: status: 5% (18940755968/375809638400), sparse 2% (7730028544), duration 135, read/write 200/61 MB/s
INFO: status: 6% (22574661632/375809638400), sparse 2% (10233151488), duration 148, read/write 279/86 MB/s
INFO: status: 7% (26558201856/375809638400), sparse 3% (11694641152), duration 160, read/write 331/210 MB/s
INFO: status: 8% (30085611520/375809638400), sparse 3% (12278251520), duration 206, read/write 76/63 MB/s
INFO: status: 9% (34536685568/375809638400), sparse 4% (15290085376), duration 213, read/write 635/205 MB/s
INFO: status: 10% (37729599488/375809638400), sparse 4% (17046466560), duration 222, read/write 354/159 MB/s
INFO: status: 11% (41469870080/375809638400), sparse 4% (17266765824), duration 276, read/write 69/65 MB/s
INFO: status: 12% (45282361344/375809638400), sparse 4% (17544867840), duration 308, read/write 119/110 MB/s
INFO: status: 15% (56828821504/375809638400), sparse 7% (27434426368), duration 342, read/write 339/48 MB/s
INFO: status: 21% (81489100800/375809638400), sparse 13% (52094566400), duration 345, read/write 8220/0 MB/s
INFO: status: 22% (84292927488/375809638400), sparse 14% (54441709568), duration 348, read/write 934/152 MB/s
INFO: status: 23% (86552215552/375809638400), sparse 14% (54571765760), duration 360, read/write 188/177 MB/s
INFO: status: 24% (90469695488/375809638400), sparse 14% (54837395456), duration 436, read/write 51/48 MB/s
INFO: status: 25% (96234569728/375809638400), sparse 15% (57908670464), duration 464, read/write 205/96 MB/s
INFO: status: 32% (120589320192/375809638400), sparse 21% (82263261184), duration 467, read/write 8118/0 MB/s
INFO: status: 37% (139747000320/375809638400), sparse 26% (101289701376), duration 470, read/write 6385/43 MB/s
INFO: status: 38% (142981726208/375809638400), sparse 27% (103370792960), duration 476, read/write 539/192 MB/s
INFO: status: 39% (148907622400/375809638400), sparse 28% (107854180352), duration 514, read/write 155/37 MB/s
INFO: status: 42% (160958840832/375809638400), sparse 31% (119587770368), duration 517, read/write 4017/105 MB/s
INFO: status: 47% (178487689216/375809638400), sparse 36% (136958136320), duration 520, read/write 5842/52 MB/s
INFO: status: 48% (180464254976/375809638400), sparse 36% (137301590016), duration 530, read/write 197/163 MB/s
INFO: status: 49% (184266326016/375809638400), sparse 36% (137407660032), duration 575, read/write 84/82 MB/s
INFO: status: 50% (187937193984/375809638400), sparse 36% (137487339520), duration 625, read/write 73/71 MB/s
INFO: status: 51% (191799099392/375809638400), sparse 36% (137676054528), duration 657, read/write 120/114 MB/s
INFO: status: 52% (195653402624/375809638400), sparse 37% (139235336192), duration 697, read/write 96/57 MB/s
INFO: status: 53% (199910621184/375809638400), sparse 37% (142804611072), duration 704, read/write 608/98 MB/s
INFO: status: 54% (203187159040/375809638400), sparse 38% (144192688128), duration 713, read/write 364/209 MB/s
INFO: status: 55% (206775058432/375809638400), sparse 39% (146785873920), duration 718, read/write 717/198 MB/s
INFO: status: 56% (210738151424/375809638400), sparse 39% (148582547456), duration 762, read/write 90/49 MB/s
INFO: status: 57% (217064931328/375809638400), sparse 41% (154767609856), duration 765, read/write 2108/47 MB/s
INFO: status: 58% (217977192448/375809638400), sparse 41% (154954199040), duration 769, read/write 228/181 MB/s
INFO: status: 59% (221740269568/375809638400), sparse 41% (157470150656), duration 775, read/write 627/207 MB/s
INFO: status: 60% (225810448384/375809638400), sparse 42% (160284737536), duration 781, read/write 678/209 MB/s
INFO: status: 61% (229257707520/375809638400), sparse 42% (160748589056), duration 842, read/write 56/48 MB/s
INFO: status: 62% (234319642624/375809638400), sparse 43% (165060612096), duration 853, read/write 460/68 MB/s
INFO: status: 63% (238529675264/375809638400), sparse 45% (169138917376), duration 856, read/write 1403/43 MB/s
INFO: status: 64% (240806526976/375809638400), sparse 45% (170999603200), duration 863, read/write 325/59 MB/s
INFO: status: 66% (251415363584/375809638400), sparse 48% (181311811584), duration 866, read/write 3536/98 MB/s
INFO: status: 71% (268911902720/375809638400), sparse 52% (198672842752), duration 869, read/write 5832/45 MB/s
INFO: status: 74% (281498746880/375809638400), sparse 56% (210995277824), duration 872, read/write 4195/88 MB/s
INFO: status: 81% (304507650048/375809638400), sparse 62% (233974747136), duration 875, read/write 7669/9 MB/s
INFO: status: 82% (309385756672/375809638400), sparse 63% (238467366912), duration 878, read/write 1626/128 MB/s
INFO: status: 86% (323550576640/375809638400), sparse 67% (252417388544), duration 881, read/write 4721/71 MB/s
INFO: status: 92% (347758133248/375809638400), sparse 73% (276620820480), duration 884, read/write 8069/1 MB/s
INFO: status: 97% (364908249088/375809638400), sparse 78% (293770838016), duration 887, read/write 5716/0 MB/s
INFO: status: 98% (371609567232/375809638400), sparse 79% (300391800832), duration 898, read/write 609/7 MB/s
INFO: status: 99% (372175929344/375809638400), sparse 79% (300455141376), duration 901, read/write 188/167 MB/s
INFO: status: 100% (375809638400/375809638400), sparse 80% (302152114176), duration 915, read/write 259/138 MB/s
INFO: transferred 375809 MB in 915 seconds (410 MB/s)
INFO: archive file size: 51.20GB
INFO: delete old backup '/backup/dump/vzdump-qemu-107-2020_07_24-00_03_50.vma.lzo'
INFO: Finished Backup of VM 107 (00:15:27)
 
There is no option for bandwidth in the backup GUI. Is it safe to add additional command line parameters to /etc/cron.d/vzdump, or will that break some parsing for the GUI?
 
Thanks, but that's for all backups in the entire cluster. I want to throttle only the backup of this specific VM, to see if it resolves the problem as suggested by Dominic.
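
For what it's worth, a one-off throttled backup of just that VM could be run manually from the CLI; the following is only a sketch, with the bandwidth value and the target storage name as assumptions to adapt:
Bash:
# manual snapshot backup of VM 107 only, limited to roughly 50 MiB/s (--bwlimit is in KiB/s)
vzdump 107 --mode snapshot --compress lzo --storage nfs-backup --bwlimit 51200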
 
I'm having a similar problem. After a reboot, one of my VMs won't start because the system drive is "not a bootable disk".

Since the VM was working as expected until the reboot, I didn't know there was a problem with it, and none of the backups I have of the VM fix the issue. I suspect they are all backups of the corrupted disk.

I just came here to include my system details in case they are helpful for resolving Bug #2874.

Output of pveversion -v

Code:
root@server:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-6
pve-cluster: 7.1-3
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

Output of qm config <vmid>

Code:
root@server:~# qm config 106
boot: order=sata0;ide2;net0
cores: 2
ide2: local:iso/ubuntu-20.04.3-live-server-amd64.iso,media=cdrom,size=1231808K
memory: 2048
meta: creation-qemu=6.1.0,ctime=1644723313
name: pihole
net0: virtio=42:36:24:4A:38:68,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
sata0: local-lvm:vm-106-disk-0,size=32G
smbios1: uuid=45ebcf00-603d-4d6e-94a7-7e203a3e6ffa
sockets: 1
vmgenid: eb9002dc-00e2-48e2-906e-6c762eb4713b

Output of cat /etc/pve/storage.cfg

Code:
root@server:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir
        nodes server

cifs: storage01
        path /mnt/pve/storage01
        server [REDACTED]
        share Storage
        content images,rootdir
        domain [REDACTED]
        prune-backups keep-all=1
        username [REDACTED]

dir: backup
        path /mnt/pve/storage01
        content backup
        prune-backups keep-all=1
        shared 0

zfspool: vm
        pool vm
        content images,rootdir
        mountpoint /vm
        nodes dl360

zfspool: nextcloud
        pool nextcloud
        content rootdir,images
        mountpoint /nextcloud
        nodes dl360
 
I have a similar problem. I can confirm it isn't an issue with the hardware, as I simply copied or cloned the VM to a brand new server with a fresh install of the latest Proxmox VE 7.1-2. Out of the 3 VMs I had, it only happened to this one specific VM.

It seems consistent that high I/O results in a corrupted partition table. I managed to recreate the problem several times, on separate occasions: forcing a restore while another VM was backing up, transferring one large file with sshfs to the other server while the VM was running, and so on. In each case the I/O got high enough that all processes froze, and I had to unlock the VMs in order to perform a proper shutdown. The guest OSes were Windows Server 2016, a Debian-based OpenMediaVault, and a Windows 10 VM; the Windows Server 2016 partition table seems to be the one that gets corrupted most often.

I was able to clone and make earlier backups, so my temporary solution is to restore a previous good backup and re-dump my files from my other backup VM. Out of the installs of my 30 clients, I have only experienced this on one (knocking on wood).
 
