zfs raid1 kernel oops when disk full?

steev

New Member
Aug 15, 2025
Proxmox 9.0

Last night my backup job failed with the log:
Code:
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp417990_104 for temporary files
INFO: Starting Backup of VM 104 (lxc)
INFO: Backup started at 2025-08-14 21:04:48
INFO: status = running
INFO: CT Name: jellyfin
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-104-2025_08_14-21_04_48.tar.zst'

Looking at the system log, I got this:
Code:
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2882148, lost async page write
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2882082, lost async page write
Aug 14 21:54:40 proxmox1 kernel: ------------[ cut here ]------------
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2882243, lost async page write
Aug 14 21:54:40 proxmox1 kernel: WARNING: CPU: 10 PID: 157403 at mm/page-writeback.c:2541 writeback_iter+0x289/0x2f0
Aug 14 21:54:40 proxmox1 kernel: Modules linked in: xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2882183, lost async page write
Aug 14 21:54:40 proxmox1 kernel:  nf_conntrack
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2882143, lost async page write
Aug 14 21:54:40 proxmox1 kernel:  nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat overlay veth ebtable_filter ebtables ip_set ip6table_raw
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2881862, lost async page write
Aug 14 21:54:40 proxmox1 kernel:  iptable_raw
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2882236, lost async page write
Aug 14 21:54:40 proxmox1 kernel:  ip6table_filter
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 1266983, lost async page write
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2881840, lost async page write
Aug 14 21:54:40 proxmox1 kernel:  ip6_tables iptable_filter nf_tables sunrpc binfmt_misc bonding tls nfnetlink_log snd_sof_amd_acp70 snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt mt>
Aug 14 21:54:40 proxmox1 kernel: Buffer I/O error on dev zd0, logical block 2882242, lost async page write
Aug 14 21:54:40 proxmox1 kernel:  drm_ttm_helper snd_rpl_pci_acp6x btrtl snd_acp_pci ttm snd_hda_core btintel snd_acp_legacy_common drm_exec snd_pci_acp6x snd_hwdep drm_suballoc_helper btbcm kvm_>
Aug 14 21:54:40 proxmox1 kernel: CPU: 10 UID: 0 PID: 157403 Comm: kworker/u48:0 Tainted: P           O       6.14.8-2-pve #1
Aug 14 21:54:40 proxmox1 kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Aug 14 21:54:40 proxmox1 kernel: Hardware name: SYSTEM_MANUFACTURER SYSTEM_PRODUCT_NAME/Default string, BIOS PM6B0_08.28W 09/27/2024
Aug 14 21:54:40 proxmox1 kernel: Workqueue: writeback wb_workfn (flush-zfs-6)
Aug 14 21:54:40 proxmox1 kernel: RIP: 0010:writeback_iter+0x289/0x2f0
Aug 14 21:54:40 proxmox1 kernel: Code: a8 04 0f 84 f7 fd ff ff 48 8b 53 18 48 c1 fa 0c 4c 89 ef e8 49 dd ff ff e9 e2 fd ff ff 49 c7 45 60 00 00 00 00 e9 47 ff ff ff <0f> 0b 83 fe 01 0f 84 2c fe f>
Aug 14 21:54:40 proxmox1 kernel: RSP: 0018:ffffbbaa9a11b8e0 EFLAGS: 00010202
Aug 14 21:54:40 proxmox1 kernel: RAX: 00000000000003ff RBX: ffffbbaa9a11baf0 RCX: 000000000000001c
Aug 14 21:54:40 proxmox1 kernel: RDX: ffffedf6895bfd80 RSI: 0000000000000000 RDI: ffff9eeb4552ab10
Aug 14 21:54:40 proxmox1 kernel: RBP: ffffbbaa9a11b900 R08: 0000000000000000 R09: 0000000000000000
Aug 14 21:54:40 proxmox1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9eeb4552ab10
Aug 14 21:54:40 proxmox1 kernel: R13: ffff9eeb4552ab10 R14: ffffbbaa9a11b91c R15: ffffbbaa9a11b91c
Aug 14 21:54:40 proxmox1 kernel: FS:  0000000000000000(0000) GS:ffff9eedbe700000(0000) knlGS:0000000000000000
Aug 14 21:54:40 proxmox1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 14 21:54:40 proxmox1 kernel: CR2: 00007039ee08fdc8 CR3: 0000000068e38000 CR4: 0000000000f50ef0
Aug 14 21:54:40 proxmox1 kernel: PKRU: 55555554
Aug 14 21:54:40 proxmox1 kernel: Call Trace:
Aug 14 21:54:40 proxmox1 kernel:  <TASK>
Aug 14 21:54:40 proxmox1 kernel:  ? __pfx_zpl_putfolio+0x10/0x10 [zfs]
Aug 14 21:54:40 proxmox1 kernel:  write_cache_pages+0x51/0xc0
Aug 14 21:54:40 proxmox1 kernel:  zpl_writepages+0xb1/0x1c0 [zfs]
Aug 14 21:54:40 proxmox1 kernel:  do_writepages+0xe1/0x280
Aug 14 21:54:40 proxmox1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 21:54:40 proxmox1 kernel:  ? select_task_rq_fair+0x90c/0x22b0
Aug 14 21:54:40 proxmox1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 21:54:40 proxmox1 kernel:  ? sched_balance_find_src_group+0x53/0x620
Aug 14 21:54:40 proxmox1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 21:54:40 proxmox1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 21:54:40 proxmox1 kernel:  __writeback_single_inode+0x44/0x350
Aug 14 21:54:40 proxmox1 kernel:  writeback_sb_inodes+0x255/0x550
Aug 14 21:54:40 proxmox1 kernel:  __writeback_inodes_wb+0x54/0x100
Aug 14 21:54:40 proxmox1 kernel:  ? queue_io+0x113/0x120
Aug 14 21:54:40 proxmox1 kernel:  wb_writeback+0x1ac/0x330
Aug 14 21:54:40 proxmox1 kernel:  ? get_nr_inodes+0x41/0x70
Aug 14 21:54:40 proxmox1 kernel:  wb_workfn+0x351/0x410
Aug 14 21:54:40 proxmox1 kernel:  process_one_work+0x175/0x350
Aug 14 21:54:40 proxmox1 kernel:  worker_thread+0x34a/0x480
Aug 14 21:54:40 proxmox1 kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 14 21:54:40 proxmox1 kernel:  kthread+0xfc/0x230
Aug 14 21:54:40 proxmox1 kernel:  ? __pfx_kthread+0x10/0x10
Aug 14 21:54:40 proxmox1 kernel:  ret_from_fork+0x47/0x70
Aug 14 21:54:40 proxmox1 kernel:  ? __pfx_kthread+0x10/0x10
Aug 14 21:54:40 proxmox1 kernel:  ret_from_fork_asm+0x1a/0x30
Aug 14 21:54:40 proxmox1 kernel:  </TASK>
Aug 14 21:54:40 proxmox1 kernel: ---[ end trace 0000000000000000 ]---
Aug 14 21:54:40 proxmox1 pmxcfs[1241]: [database] crit: commit transaction failed: database or disk is full#010
Aug 14 21:54:40 proxmox1 pmxcfs[1241]: [database] crit: rollback transaction failed: cannot rollback - no transaction is active#010
Aug 14 21:54:40 proxmox1 pvescheduler[417990]: unable to delete old temp file: Input/output error
Aug 14 21:54:40 proxmox1 pvescheduler[417990]: snapshot 'vzdump' was not (fully) removed - CT is locked (backup)
Aug 14 21:54:40 proxmox1 pvescheduler[417990]: ERROR: Backup of VM 104 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file>
Aug 14 21:54:40 proxmox1 pvescheduler[417990]: ERROR: Backup of VM 105 failed - unable to create temporary directory '/var/tmp/vzdumptmp417990_105' at /usr/share/perl5/PVE/VZDump.pm line 1048.
Aug 14 21:54:40 proxmox1 pvescheduler[417990]: ERROR: Backup of VM 106 failed - unable to create temporary directory '/var/tmp/vzdumptmp417990_106' at /usr/share/perl5/PVE/VZDump.pm line 1048.
Aug 14 21:54:40 proxmox1 pvescheduler[417990]: ERROR: Backup of VM 107 failed - unable to create temporary directory '/var/tmp/vzdumptmp417990_107' at /usr/share/perl5/PVE/VZDump.pm line 1048.
Aug 14 21:54:40 proxmox1 pvescheduler[417990]: INFO: Backup job finished with errors
Aug 14 21:54:41 proxmox1 perl[417990]: notified via target `mail-external`
Aug 14 21:54:41 proxmox1 pvescheduler[417990]: job errors
Aug 14 21:54:48 proxmox1 pvestatd[1410]: unable to close file '/var/log/pve/tasks/index' - No space left on device
Aug 14 21:54:48 proxmox1 pvestatd[1410]: unable to open file '/var/log/pve/tasks/active.tmp.1410' - No space left on device
Aug 14 21:55:13 proxmox1 pvescheduler[503437]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Aug 14 21:55:13 proxmox1 pvescheduler[503436]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout

It looks like the disk filled when creating the zstd backup file. There are no errors in the zpool:
Code:
# zpool status -v
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:20 with 0 errors on Sun Aug 10 00:24:21 2025
config:

        NAME                                                                                                                   STATE     READ WRITE CKSUM
        rpool                                                                                                                  ONLINE       0     0     0
          mirror-0                                                                                                             ONLINE       0     0     0
            nvme-nvme.10ec-50333230574342423235303630353032313137-50617472696f74204d2e322050333230203531324742-00000001-part3  ONLINE       0     0     0
            ata-RS512GSSD310_EC120601A008972-part3                                                                             ONLINE       0     0     0

errors: No known data errors

And no SMART errors for either disk (they're both brand new).
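For reference, the space usage can be checked like this (the paths are the ones from the log above; the dataset names are whatever the installer created, so they may differ):
Code:
# per-dataset space accounting for the whole pool
zfs list -o space -r rpool

# where the dump directory and the vzdump temp directory actually live
df -h /var/lib/vz/dump /var/tmp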

If this is caused by running out of space, it still shouldn't cause a kernel oops, should it? I'd expect an ENOSPC error and a message to that effect reported to the user.
 
You ran out of space, and the errors are more or less expected because you are using ZFS. With a standard Linux filesystem the kernel would return ENOSPC when the filesystem is full, but ZFS has a built-in space limit that you cannot write past without adjusting it. The reason is that in a copy-on-write filesystem you need free space even to delete data, which would no longer be possible without that reserve, so the end of space arrives unexpectedly: the writes run into the ZFS limit, not a full device.
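You can see the difference on your own pool: zpool reports the raw device space, while zfs reports what the datasets are still allowed to use (only the pool name rpool is taken from your zpool status output):
Code:
# raw pool view - FREE can still show space even when the datasets are "full"
zpool list rpool

# dataset view - AVAIL is what writes are checked against, and it reaches 0 first
zfs list rpool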
 
Did you "tune" the ZFS pool?
And did you set a ZFS quota on the top-level (first) dataset of this pool?
zfs get quota <zfs-dataset>
zfs set quota=<quota-size> <zfs-dataset>  # approx. 80 % of the max space
 
zfs get quota <zfs-pool>
Sorry, I need to nitpick - just for the record: "<zfs-pool>" has to be "<filesystem|volume|snapshot>", not a pool. (See man zfs-set.)

Usually you get a pool named "rpool" and the top-level dataset is given the same name. I have found this confusing since the dawn of time :-(

I am sure you know that. This remark is just for newcomers...
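A quick way for newcomers to see both objects side by side (only the default name rpool is assumed):
Code:
# zpool(8) commands address the pool called rpool ...
zpool status rpool

# ... while zfs(8) commands address the top-level dataset of the same name and its children
zfs list -d 1 rpool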
 
@UdoB yes, you are right, it has to be a dataset.
And I really do mean the base ZFS dataset name; with Proxmox VE we get rpool.

Example from my desktop PC with ZFS as data storage:
Code:
$ zpool list rpool
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool   928G   221G   707G  

$ zfs get quota rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  quota     745G   local
 
I'm new to ZFS after using btrfs for years. In btrfs the kernel returns ENOSPC when it runs out of space for whatever reason, so seeing I/O errors and kernel warnings for a common and completely predictable scenario seems like a bug rather than intended design. Even if the "end of space" comes unexpectedly, the kernel still knows it has run out of space and should tell userspace that. I suppose it's a kernel thing rather than a Proxmox thing.

I didn't set any limits, just used the installer defaults.
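For anyone finding this thread later, applying the suggestion above on a default install would look roughly like this (400G is only a placeholder; take about 80 % of whatever zpool list reports for your pool):
Code:
# a default install sets no quota, so this should report "none"
zfs get quota rpool

# cap the top-level dataset at roughly 80 % of the usable pool size (placeholder value)
zfs set quota=400G rpool

# verify
zfs get quota rpool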