My VM doesn't boot anymore. The VM disk is on a ZFS. SMART passed. Any ideas?

Darkbotic

Member
Jul 10, 2024
Hello there!

Thank you for reading my message.

I have six 1 TB HDDs and created a raidz2 pool from all of them, which gives a usable capacity of 3.87 TB.
They've been working fine for months but today the VM (Nextcloud) stopped working.

The SMART results on all disks are fine but the VM doesn't boot at all.

Proxmox 1.jpg

I checked the System log, and the messages below appear when I start the VM. They keep repeating until I stop it.

Any idea what's going on?

Code:
Jul 16 01:53:10 Proxmox kernel: buffer_io_error: 119 callbacks suppressed
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 65, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 68, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 70, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 69, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 73, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 72, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 75, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 71, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 74, lost async page write
Jul 16 01:53:10 Proxmox kernel: Buffer I/O error on dev zd16, logical block 80, lost async page write
Jul 16 01:53:15 Proxmox kernel: buffer_io_error: 54 callbacks suppressed
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 1, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 4, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 3, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 5, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 6, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 7, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 8, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 10, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 11, lost async page write
Jul 16 01:53:15 Proxmox kernel: Buffer I/O error on dev zd16, logical block 9, lost async page write
Jul 16 01:53:20 Proxmox kernel: buffer_io_error: 80 callbacks suppressed
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 9, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 10, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 11, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 12, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 13, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 16, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 17, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 18, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 19, lost async page write
Jul 16 01:53:20 Proxmox kernel: Buffer I/O error on dev zd16, logical block 20, lost async page write
Jul 16 01:53:25 Proxmox kernel: buffer_io_error: 24 callbacks suppressed
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 42, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 43, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 45, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 46, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 47, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 48, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 49, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 51, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 50, lost async page write
Jul 16 01:53:25 Proxmox kernel: Buffer I/O error on dev zd16, logical block 52, lost async page write
Jul 16 01:53:30 Proxmox kernel: buffer_io_error: 77 callbacks suppressed
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 85, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 86, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 87, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 83, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 88, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 91, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 92, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 94, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 95, lost async page write
Jul 16 01:53:30 Proxmox kernel: Buffer I/O error on dev zd16, logical block 79, lost async page write
Jul 16 01:53:35 Proxmox kernel: buffer_io_error: 22 callbacks suppressed
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 0, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 2, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 3, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 113, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 111, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 114, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 115, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 116, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 117, lost async page write
Jul 16 01:53:35 Proxmox kernel: Buffer I/O error on dev zd16, logical block 119, lost async page write
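
The messages above name the block device zd16 rather than a VM disk. With ZFS on Linux, the entries under /dev/zvol/<pool>/ are symlinks to the zdN devices, so the link targets show which zvol is affected. A minimal sketch, assuming the pool is named Nextcloud (substitute the real pool name); zvol_for_dev is a hypothetical helper name, not a ZFS or Proxmox command:

```shell
# Hypothetical helper: print the symlinks in a directory that point at a given zdN device.
zvol_for_dev() {
    dev=$1    # kernel device name, e.g. zd16
    dir=$2    # directory holding the zvol symlinks
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        # readlink prints the raw link target, e.g. ../../zd16
        if [ "$(basename "$(readlink "$link")")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# On the Proxmox host (the pool name is an assumption):
#   zvol_for_dev zd16 /dev/zvol/Nextcloud
```

If nothing is printed, the device belongs to a different pool or the zvol links are missing.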
 
Please provide the output of qm config {VMID} and pveversion -v.
Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1

Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 6
cpu: host
efidisk0: Nextcloud:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
memory: 8192
meta: creation-qemu=8.1.5,ctime=1713162680
name: Nextcloud
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: Nextcloud:vm-101-disk-1,cache=writeback,discard=on,iothread=1,size=3632795M
scsihw: virtio-scsi-single
smbios1: uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
sockets: 1
vmgenid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
 
Your setup looks ok.
They've been working fine for months but today the VM (Nextcloud) stopped working.
What changed recently? When did you update to the latest kernel? When did you last boot the VM successfully? Is everything else running properly on PVE (other VMs, LXCs)?
 
Your setup looks ok.

What changed recently? When did you update to the latest kernel? When did you last boot the VM successfully? Is everything else running properly on PVE (other VMs, LXCs)?
Nothing has changed recently. The VM was working fine on Monday, and on Tuesday I got a notification that Nextcloud was offline. I logged into Proxmox and saw that the resources on the VM's Summary screen were at zero. When I clicked Console, it said the display had not been started, even though the VM was "running". I tried to reboot it, but that failed with a message that the QEMU agent was not present. SSH and ping didn't work either, so I stopped the VM and tried to start it again. That didn't work, so I checked the System log and saw the errors.

I honestly don't remember when I updated the kernel. I usually run a manual update every day, so I'm sure I got the new kernel shortly after it was released. However, after a kernel update I always reboot the whole system and test all VMs, and this VM was up and running.

I have two other Windows VMs and they're working fine. However, they're not on the same drive: I have an SSD as the boot drive for Proxmox, and those two Windows VMs use the spare space on that SSD. This is the only VM I put on the ZFS pool. I wanted redundancy for it because it's the most important one: it's my personal Nextcloud.

Do you know what the errors in my first post mean?
 
I don't use ZFS, so I'm no expert, but it would appear something is up with that zpool. I find it interesting that it managed to complete a scrub in 6.5 minutes on a 6 TB, 6-drive raidz2. Maybe you ran one just before?

In your own estimate, how much data is actually being used in the Nextcloud VM? Which OS did you base it on? How much data have you added there?

What does zpool status show?
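
Alongside zpool status, it may be worth ruling out a full pool: the VM disk is configured at 3632795M on a pool with about 3.87 TB usable, and a zvol on a pool that has run out of space cannot accept writes. This is only a guess, and nothing posted so far confirms it. A minimal sketch that flags pools above a fill threshold; flag_full_pools is a hypothetical helper that reads the output of zpool list -H -o name,capacity on stdin:

```shell
# Hypothetical helper: read "name<TAB>capacity" lines and report pools at or above a threshold.
flag_full_pools() {
    threshold=${1:-80}   # percent; 80 is an arbitrary default, not a ZFS limit
    awk -v t="$threshold" '{
        cap = $2
        sub(/%/, "", cap)          # capacity is printed like "97%"
        if (cap + 0 >= t) print $1 " is " cap "% full"
    }'
}

# On the Proxmox host:
#   zpool status -v
#   zpool list -H -o name,capacity | flag_full_pools 80
#   zfs list -o space
```

The zfs list -o space output also shows how much space the zvol itself and any snapshots are holding.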
 
This can be a result of cache=writeback: if the host crashes at the wrong moment, the guest can lose important data.
Disable it.
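
One way to drop the cache setting is to re-apply the scsi0 line from the VM config without the cache option, which leaves the disk on the Proxmox default cache mode. A minimal sketch, assuming VMID 101 (inferred from the disk names) and with strip_cache as a hypothetical helper:

```shell
# Hypothetical helper: remove any cache=... option from a Proxmox disk option string.
strip_cache() {
    printf '%s\n' "$1" | sed -E 's/(^|,)cache=[^,]*//; s/^,//'
}

# Current value of scsi0, copied from the qm config output in this thread
scsi0='Nextcloud:vm-101-disk-1,cache=writeback,discard=on,iothread=1,size=3632795M'
strip_cache "$scsi0"
# prints: Nextcloud:vm-101-disk-1,discard=on,iothread=1,size=3632795M

# With the VM stopped, apply it on the Proxmox host (VMID 101 is an assumption):
#   qm set 101 --scsi0 "$(strip_cache "$scsi0")"
```

The same change can be made in the web interface by editing the hard disk and setting Cache back to the default.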
 
Are you able to mount the zvol directly (when the VM is off)? Something like ...

Code:
mkdir /mnt/testmp
mount /dev/zvol/Nextcloud/vm-101-disk-1 /mnt/testmp
ls /mnt/testmp
umount /mnt/testmp

Do you mind posting also:

Code:
zfs get all Nextcloud
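
A note on the mount test above: because the zvol backs a whole virtual disk that boots via OVMF, it probably carries a partition table, in which case mounting the whole device fails and one of its partitions has to be mounted instead. ZFS on Linux exposes those as -partN entries next to the zvol link. A minimal sketch, with list_zvol_parts as a hypothetical helper and the partition number left as a guess:

```shell
# Hypothetical helper: list the partition links that belong to one zvol.
list_zvol_parts() {
    dir=$1    # e.g. /dev/zvol/Nextcloud
    vol=$2    # e.g. vm-101-disk-1
    for p in "$dir/$vol"-part*; do
        # skip the unexpanded pattern when the zvol has no partitions
        [ -e "$p" ] && printf '%s\n' "$p"
    done
    return 0
}

# On the Proxmox host, with the VM stopped:
#   list_zvol_parts /dev/zvol/Nextcloud vm-101-disk-1
#   lsblk -f /dev/zvol/Nextcloud/vm-101-disk-1
#   mount -o ro /dev/zvol/Nextcloud/vm-101-disk-1-part2 /mnt/testmp   # part2 is a guess
```

Mounting read-only first avoids changing anything on the disk while it is still unclear what is wrong. If the guest uses LVM, the partition cannot be mounted directly this way.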
 
Do you know what the errors in my first post mean?

It simply can't write onto the zvol. If you try to mount it directly and e.g. create a file within the volume, it would be interesting to check dmesg -e at the time.
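
The write test described above can be sketched as follows. It assumes the volume is already mounted read-write at /mnt/testmp (the mount point from the earlier post); write_probe is a hypothetical helper name:

```shell
# Hypothetical helper: attempt one small, flushed write in a directory and report the result.
write_probe() {
    dir=$1
    probe="$dir/.write_probe.$$"
    # conv=fsync makes dd flush the data to the device before exiting,
    # so a failing write is reported here instead of later as a lost async write
    if dd if=/dev/zero of="$probe" bs=4096 count=1 conv=fsync 2>/dev/null; then
        rm -f "$probe"
        echo "write ok"
    else
        rm -f "$probe"
        echo "write FAILED"
        return 1
    fi
}

# On the Proxmox host, with the volume mounted at /mnt/testmp:
#   write_probe /mnt/testmp
#   dmesg -e | tail -n 20
```

Running dmesg right after the probe keeps the relevant kernel messages close to the end of the output.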
 
I don't use ZFS, so I'm no expert, but it would appear something is up with that zpool. I find it interesting that it managed to complete a scrub in 6.5 minutes on a 6 TB, 6-drive raidz2. Maybe you ran one just before?

In your own estimate, how much data is actually being used in the Nextcloud VM? Which OS did you base it on? How much data have you added there?

What does zpool status show?
This is the first time I've run a scrub manually. Maybe Proxmox runs one automatically?
It's definitely less than 100 GB of actual data in that VM.

Here is the output of zpool status:

Proxmox 4.jpg
 
