Proxmox 5.2: unstable system after a freeze and an interrupted update

Alex456

Dec 7, 2020
Good day,
and sorry for my English.

My Proxmox host went into a deep freeze.

After a reboot, I get a "scanning for all disks" message, and after that:
Code:
INFO: task zpool:326 blocked for more than 120 seconds.
Tainted: P O 4.15.17-1-pve #1

After the system booted, I could not start any VMs. The error is:
Code:
kvm: -drive file=/dev/zvol/rpool/data/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on: Could not open '/dev/zvol/rpool/data/vm-100-disk-1': No such file or directory

TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -name energy01 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=6df2b0ee-7cb3-4c22-8493-6d852a396f7e' -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi -m 16000 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:5f48a1befe5b' -drive 'file=/mnt/storage01/template/iso/ubuntu-16.04.4-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-100-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=82:BB:E5:5D:AD:76,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1
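The key part of this error is "No such file or directory" for /dev/zvol/rpool/data/vm-100-disk-1. One way to tell whether the volume itself is gone from ZFS, or only its device node is missing, is to check both places. A minimal sketch, assuming the dataset name from the error above; check_zvol is a hypothetical helper, not a Proxmox command:

```shell
#!/bin/sh
# Hypothetical helper: does the zvol exist in ZFS, and does its
# /dev/zvol device node exist? Usage: check_zvol rpool/data/vm-100-disk-1
check_zvol() {
    vol="$1"
    # Ask ZFS directly whether the volume dataset exists.
    if zfs list -H -o name -t volume "$vol" >/dev/null 2>&1; then
        echo "zfs: $vol exists"
    else
        echo "zfs: $vol NOT found"
    fi
    # This is the path kvm failed to open.
    if [ -e "/dev/zvol/$vol" ]; then
        echo "dev: /dev/zvol/$vol present"
    else
        echo "dev: /dev/zvol/$vol missing"
    fi
}
```

If ZFS lists the volume but the device node is missing, the data is probably still there and only the /dev/zvol link (normally created by udev) is absent. zfs list -t volume -r rpool/data shows all VM disks at once.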

I searched for this error and concluded that I should run:
Code:
apt update
apt dist-upgrade

But the system froze in the middle of the update, and since then it has been unstable.
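An upgrade that stops part-way usually leaves dpkg with packages unpacked but not configured. Once the disk problem is sorted out, the usual Debian way to finish such an interrupted run is sketched below. This is general Debian practice rather than something suggested in this thread, and resume_interrupted_upgrade is a hypothetical wrapper name:

```shell
#!/bin/sh
# Sketch: resume a dist-upgrade that was interrupted part-way.
# Run as root, and only once the storage problem is fixed.
resume_interrupted_upgrade() {
    # Finish configuring packages that dpkg left unpacked but unconfigured.
    dpkg --configure -a || return 1
    # Repair dependencies that the interruption may have left broken.
    apt --fix-broken install || return 1
    # Re-run the upgrade itself.
    apt dist-upgrade
}
```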

I tried vzdump --all, but with no luck; it fails with this error:
Code:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before end of string) at /usr/share/perl5/PVE/Tools.pm line 949, <GEN 1693> chunk 1.

The syslog is in the attached file; the problem happened on Dec 6.

zpool status
Code:
  pool: rpool
state: ONLINE
  scan: resilvered 16.4M in 0h0m with 0 errors on Mon Dec  7 17:56:11 2020
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sda2    ONLINE       0     0     0
        sdb2    ONLINE       0     0     0

errors: No known data errors

fdisk -l
Code:
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 1905BFF7-2C27-4057-9005-D84CEDAA45B3

Device          Start        End    Sectors   Size Type
/dev/sda1          34       2047       2014  1007K BIOS boot
/dev/sda2        2048 1953508749 1953506702 931.5G Solaris /usr & Apple ZFS
/dev/sda9  1953508750 1953525134      16385     8M Solaris reserved 1

Partition 1 does not start on physical sector boundary.
Partition 9 does not start on physical sector boundary.


Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 7180DBB4-0C51-4F74-A0AD-AF5FDABA3C76

Device          Start        End    Sectors   Size Type
/dev/sdb1          34       2047       2014  1007K BIOS boot
/dev/sdb2        2048 1953508749 1953506702 931.5G Solaris /usr & Apple ZFS
/dev/sdb9  1953508750 1953525134      16385     8M Solaris reserved 1

Partition 1 does not start on physical sector boundary.
Partition 9 does not start on physical sector boundary.


Disk /dev/zd0: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zd16: 1.5 GiB, 1598029824 bytes, 3121152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd32: 30 GiB, 32212254720 bytes, 62914560 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: CEBD5680-EA1C-4FC5-BDE4-F0BE077E5342

Device      Start      End  Sectors Size Type
/dev/zd32p1  2048     4095     2048   1M BIOS boot
/dev/zd32p2  4096 62914526 62910431  30G Linux filesystem


Disk /dev/zd48: 2.5 GiB, 2671771648 bytes, 5218304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes


Disk /dev/zd64: 5 GiB, 5368709120 bytes, 10485760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: CEBD5680-EA1C-4FC5-BDE4-F0BE077E5342

Device      Start      End  Sectors Size Type
/dev/zd64p1  2048     4095     2048   1M BIOS boot
/dev/zd64p2  4096 10483711 10479616   5G Linux filesystem


Disk /dev/zd80: 15 GiB, 16106127360 bytes, 31457280 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: CEBD5680-EA1C-4FC5-BDE4-F0BE077E5342

Device      Start      End  Sectors Size Type
/dev/zd80p1  2048     4095     2048   1M BIOS boot
/dev/zd80p2  4096 31457246 31453151  15G Linux filesystem


Disk /dev/zd96: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0xf24d1cbb

Device      Boot   Start      End  Sectors  Size Id Type
/dev/zd96p1 *       2048   999423   997376  487M 83 Linux
/dev/zd96p2      1001470 67106815 66105346 31.5G  5 Extended
/dev/zd96p5      1001472 67106815 66105344 31.5G 8e Linux LVM

Partition 2 does not start on physical sector boundary.
The full log is in the attached files.

Please help: how can I save my VMs?
 

Please post the output of "smartctl -a /dev/sda" and "smartctl -a /dev/sdb".
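A small sketch that saves each requested report to a file (handy for attaching to the thread) and prints each disk's overall health verdict. It assumes smartmontools is installed and the disks are ATA/SATA, which print an "overall-health" line; collect_smart and the smart-<name>.txt file names are hypothetical:

```shell
#!/bin/sh
# Sketch: save "smartctl -a" for each disk to smart-<name>.txt and
# print each disk's overall health verdict.
# Usage: collect_smart /dev/sda /dev/sdb
collect_smart() {
    for disk in "$@"; do
        report="smart-$(basename "$disk").txt"
        # smartctl's exit status is a bit mask, so it is not treated as pass/fail.
        smartctl -a "$disk" > "$report"
        printf '%s: ' "$disk"
        # ATA disks report "SMART overall-health self-assessment test result: ...".
        grep -i 'overall-health' "$report" || echo "no health line found"
    done
}
```

A PASSED verdict alone does not clear a disk; the attribute table in the full report is what the reply is asking to see.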

Your zpool didn't get imported correctly; some vdevs seem to be missing.
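To see at a glance which pool members are affected, the config table of zpool status can be filtered for anything that is not ONLINE. A sketch, assuming the column layout shown in the zpool status output above; non_online_vdevs is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: print every pool/vdev line from "zpool status" whose STATE
# column is not ONLINE. Usage: zpool status rpool | non_online_vdevs
non_online_vdevs() {
    awk '
        /^[ \t]*NAME[ \t]+STATE/ { in_cfg = 1; next }   # config table header
        /^[ \t]*errors:/         { in_cfg = 0 }         # line after the table
        in_cfg && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}
```

Separately, zpool import with no arguments only lists pools that are available for import; it does not import anything.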

How are the disks connected? Through an HBA, a RAID controller, or directly?

You could try booting with one disk at a time.
 
Drive sda is broken; you need to replace it.

Its serial number is WCC1S0970251 (it should be printed somewhere on the disk).
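Before pulling a drive, it helps to confirm which /dev/sdX carries that serial. A sketch, assuming the disks are attached as plain SATA so udev creates ata-* links under /dev/disk/by-id; disk_by_serial is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: find which /dev/sdX carries a given serial number, via the
# /dev/disk/by-id symlinks. Usage: disk_by_serial WCC1S0970251
disk_by_serial() {
    serial="$1"
    dir="${2:-/dev/disk/by-id}"   # second argument only for testing
    for link in "$dir"/ata-*"$serial"; do
        # If nothing matched, the glob stays unexpanded; skip it.
        [ -e "$link" ] || continue
        # Whole-disk links end in the serial; "-partN" links do not match.
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}
```

lsblk -o NAME,MODEL,SERIAL shows the same mapping as a table.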
 
How do I replace it properly?

I found this tutorial:

Code:
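# /dev/sdb is the new disk, /dev/sda the healthy one

# copy sda's partition table onto sdb, then randomize sdb's GUIDs
sgdisk -R /dev/sdb /dev/sda
sgdisk -G /dev/sdb

# set up the boot partition (partition 2 is the ESP on disks set up by the PVE 5.4+ installer)
pve-efiboot-tool format /dev/sdb2
pve-efiboot-tool init /dev/sdb2
pve-efiboot-tool refresh

# attach the new disk's ZFS partition (part3) to the mirror next to the existing one
# (device IDs are from the tutorial's VirtualBox example)
zpool attach rpool /dev/disk/by-id/ata-VBOX_HARDDISK_VBfb65757e-ea936e4d-part3 /dev/disk/by-id/ata-VBOX_HARDDISK_VB9b8d476a-b387a510-part3

# wait for the resilver to finish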

Is it necessary to use pve-efiboot-tool?
Or do you have better instructions for this?
 
That's fine; it's also documented here: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_change_failed_dev
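The wiki page describes replacing a failed bootable device with sgdisk -R, sgdisk -G and zpool replace, rather than zpool attach. Adapted to this thread's layout (mirror rpool, ZFS on partition 2, no ESP), the sequence might look like the sketch below. All device names are assumptions to check against your own fdisk -l and zpool status; run and replace_failed_disk are hypothetical helpers, and DRY_RUN=1 only prints the commands. sgdisk -R overwrites the target's partition table, so swapping the first two arguments would destroy the healthy disk.

```shell
#!/bin/sh
# Sketch of the wiki's failed-device steps for a two-disk mirror "rpool"
# with ZFS on partition 2. Set DRY_RUN=1 to only print the commands.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

replace_failed_disk() {
    healthy="$1"   # surviving mirror member, e.g. /dev/sdb
    new="$2"       # blank replacement disk; its partition table is overwritten
    old_vdev="$3"  # failed member as shown by zpool status, e.g. sda2
    new_part="$4"  # ZFS partition on the new disk, e.g. /dev/sda2
    # Copy the healthy disk's partition table onto the new disk...
    run sgdisk "$healthy" -R "$new" || return 1
    # ...and randomize the copy's GUIDs so the two disks stay distinct.
    run sgdisk -G "$new" || return 1
    # Resilver the mirror onto the new partition in place of the failed one.
    run zpool replace -f rpool "$old_vdev" "$new_part"
}
```

For example, DRY_RUN=1 replace_failed_disk /dev/sdb /dev/sda sda2 /dev/sda2 only prints the three commands; using /dev/sda for the new disk assumes the replacement shows up under the failed disk's old name, which is not guaranteed.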

You need to use pve-efiboot-tool to sync the boot loader; otherwise the second disk won't be able to boot on its own.
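Which step syncs the boot loader depends on the partition layout. Per the Proxmox VE admin guide, pve-efiboot-tool works on the ESP, which the installer places on partition 2 only since version 5.4. On a layout like the fdisk output above (BIOS boot on partition 1, ZFS on partition 2, no ESP), partition 2 is the ZFS partition, so the tutorial's format step does not apply; for GRUB-booted systems the guide uses grub-install on the new disk instead. A hedged guard, assuming lsblk reports the GPT type GUID in its PARTTYPE column; is_esp and safe_efiboot_format are hypothetical names:

```shell
#!/bin/sh
# Sketch: only run pve-efiboot-tool on a partition whose GPT type is
# "EFI System", so a ZFS partition can't be formatted by mistake.
# Usage: safe_efiboot_format /dev/sdb2
ESP_GUID=c12a7328-f81f-11d2-ba4b-00a0c93ec93b

# Succeeds only if partition $1 carries the EFI System Partition type GUID.
is_esp() {
    parttype=$(lsblk -no PARTTYPE "$1" 2>/dev/null | tr 'A-F' 'a-f')
    [ "$parttype" = "$ESP_GUID" ]
}

safe_efiboot_format() {
    if is_esp "$1"; then
        pve-efiboot-tool format "$1" && pve-efiboot-tool init "$1"
    else
        echo "refusing: $1 is not an EFI System Partition" >&2
        return 1
    fi
}
```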

After the zpool attach command, run "watch zpool status" and wait for the resilver to finish.
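For a script or an SSH session that may drop, a non-interactive alternative to watch is to poll until the scan line no longer says "resilver in progress". A sketch; wait_for_resilver and its polling interval are hypothetical choices:

```shell
#!/bin/sh
# Sketch: block until the resilver on a pool has finished, then print
# the final status. Usage: wait_for_resilver rpool [seconds-between-checks]
wait_for_resilver() {
    pool="$1"
    interval="${2:-60}"
    # While resilvering, the scan: line reads "resilver in progress since ...".
    while zpool status "$pool" | grep -q 'resilver in progress'; do
        sleep "$interval"
    done
    # Show the final state so any errors are visible.
    zpool status "$pool"
}
```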
 
