zfs rpool performance issue (deadman)

jhyland

New Member
Oct 1, 2024
I'm running Proxmox on bare metal. I currently have one LXC container (CasaOS) which runs about 15 Docker containers. The Proxmox root filesystem is on two 2 TB SSDs in a ZFS mirror named rpool. I have another ZFS pool named tank, about 15 TB across 6 disks. Everything works, but performance is very poor, and I think I've narrowed it down to rpool: I see a lot of deadman events whenever the system becomes less responsive. This is my first lab build, so I've tried some things, such as dedup, that I wish I hadn't. I'm inclined to reinstall Proxmox on a single 2 TB SSD using ext4 and hope that resolves the problem. I don't want to lose my Proxmox config or my one LXC (I have a vzdump of it on an external HDD), and I also don't want to lose the larger tank pool. Do you recommend continuing to troubleshoot this, or should I reinstall, and if so what would the steps be? Thanks!

Code:
Oct 03 17:46:40 pve zed[3365279]: eid=37239 class=deadman pool='rpool' vdev=ata-JAJS600M2TB_AA202200000031005220-part3 size=20480 offset=156177002496 priority=3 err=0 flags=0x180080 bookmark=0:20681:0:6193
Oct 03 17:46:40 pve pve-ha-lrm[3010]: loop take too long (33 seconds)
Oct 03 17:46:40 pve pve-firewall[2964]: firewall update time (29.296 seconds)
Oct 03 17:46:40 pve pvestatd[2980]: status update time (29.479 seconds)
Oct 03 17:46:40 pve pmxcfs[2819]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-node/pve: -1
Oct 03 17:46:40 pve pmxcfs[2819]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-node/pve: /var/lib/rrdcached/db/pve2-node/pve: illegal attempt to update using time 1727991991 when last update time is 1727991991 (minimum one second step)
Oct 03 17:46:40 pve pmxcfs[2819]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/102: -1
Oct 03 17:46:40 pve pmxcfs[2819]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve/nas-zfs: -1
Oct 03 17:46:40 pve pmxcfs[2819]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve/local-zfs: -1
Oct 03 17:46:40 pve pmxcfs[2819]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve/local: -1
Oct 03 17:46:40 pve pmxcfs[2819]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve/USB16tb: -1
Oct 03 17:47:25 pve zed[2544]: Missed 95 events
Oct 03 17:47:25 pve zed[3365791]: eid=37241 class=deadman pool='rpool' vdev=ata-JAJS600M2TB_AA202200000031005220-part3 size=16384 offset=1065359745024 priority=3 err=0 flags=0x180080 bookmark=0:20681:1:24
Oct 03 17:47:25 pve zed[3365789]: eid=37240 class=deadman pool='rpool' vdev=ata-JAJS600M2TB_AA202200000031005220-part3 size=16384 offset=155069849600 priority=3 err=0 flags=0x180080 bookmark=0:20681:1:24
Oct 03 17:47:25 pve zed[3365793]: eid=37242 class=deadman pool='rpool' vdev=ata-JAJS600M2TB_AA202200000031005220-part3 size=16384 offset=639922794496 priority=3 err=0 flags=0x180080 bookmark=0:20681:1:24
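
Whether dedup is still active on rpool, how large the dedup table has grown, and how the two SSDs behave during the stalls can be checked with something like this (device names taken from the log above; smartctl comes from the smartmontools package):

Code:
# Is dedup (still) enabled anywhere on rpool, and how big is the dedup table?
zfs get -r dedup rpool
zpool status -D rpool

# Per-vdev throughput and latency, refreshed every 5 seconds, while the system feels sluggish
zpool iostat -v -l rpool 5

# SMART health of the two mirror members
smartctl -a /dev/disk/by-id/ata-JAJS600M2TB_AA202200000031005220
smartctl -a /dev/disk/by-id/ata-JAJS600M2TB_AA202100000031003725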
 
hey,

from my experience, your problem can come from the dedup functionality.

WARNING: don't repeat my mistake:
don't deactivate dedup on the pool/dataset that is already deduplicated.

Create a new dataset, and disable dedup on that new dataset.

Use the PVE GUI to properly migrate your disks onto the new, non-deduplicated dataset.

When all of your disks have been migrated, IF the dedup was enabled on a dedicated dataset, you can safely remove that dataset.
BUT if you enabled dedup on the zpool itself, don't deactivate dedup directly on the pool, or you are going to lose your data.
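
A rough sketch of that approach, assuming the new dataset is called rpool/data-nodedup and is added as a PVE storage named "nodedup" (both names are just examples):

Code:
# New dataset with deduplication explicitly disabled
zfs create -o dedup=off rpool/data-nodedup

# Register it as a ZFS storage so PVE can put guest volumes on it
# (can also be done under Datacenter -> Storage in the GUI)
pvesm add zfspool nodedup --pool rpool/data-nodedup --content images,rootdir

# Then move each guest volume over, e.g. for a container's root disk:
pct move-volume <vmid> rootfs nodedup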
 
Since you didn't provide any information about the hardware used, I suspect that you used consumer or prosumer drives, which perform terribly with ZFS (and Ceph), so you may be better off with just ext4 or LVM.

This is my first lab build, so I've tried some things such as dedup that I wish I hadn't.
Yes, you need to recreate the pool to get rid of the dedup table in your I/O path.

I'm inclined to reinstall proxmox on a single 2T SSD using ext4.
Then do that.
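
If you do reinstall, the tank pool itself can survive it: export it cleanly before wiping the OS disks, install PVE with ext4/LVM on the single SSD, then import the pool again and re-add it as storage. A sketch of that, assuming the storage is simply named "tank" again afterwards:

Code:
# Before the reinstall: stop anything using tank, then export it cleanly
zpool export tank

# After the fresh install: import it back
# (add -f if ZFS complains the pool was last used by another system)
zpool import tank

# Re-register it as a PVE storage (or do this in the GUI)
pvesm add zfspool tank --pool tank --content images,rootdir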
 
Yes, the two 2 TB SSDs in the mirrored rpool are consumer SSDs. I guess I had to find out for myself that ZFS wasn't a good fit for them. I have a backup of my one LXC (it took over 24 hours for 75 GB, which seems excessive). Is there a guide for what to save from the Proxmox configs, so that once I do the reinstall I can get my old settings back and import my other ZFS pool? Once I have rpool working, I will create a new dataset for tank (without dedup) and copy the 6 TB of data over to it (roughly as in the sketch after the zpool output below). Sound reasonable?


Code:
root@pve:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 08:29:44 with 0 errors on Sun Sep 22 01:36:01 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-JAJS600M2TB_AA202200000031005220-part3  ONLINE       0     0     0
            ata-JAJS600M2TB_AA202100000031003725-part3  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 03:49:36 with 0 errors on Sun Sep  8 04:22:14 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            wwn-0x5000cca22ce7b7d3                ONLINE       0     0     0
            wwn-0x5000039ff4e58bde                ONLINE       0     0     0
            wwn-0x5000039fe6c02f0f                ONLINE       0     0     0
            wwn-0x5000cca22cd2fe02                ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA11L8JK  ONLINE       0     0     0
            wwn-0x50014ee214f4efbc                ONLINE       0     0     0
        logs
          wwn-0x5001b44a2a1096c5                  ONLINE       0     0     0
        cache
          wwn-0x50025388a00a05ed                  ONLINE       0     0     0

errors: No known data errors
root@pve:~#
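
For the tank migration mentioned above, a local send/receive into a fresh dataset created with dedup disabled would be one way to do it (the dataset names here are just examples; use whatever your data actually lives in):

Code:
# Snapshot the existing (deduplicated) dataset
zfs snapshot tank/data@migrate

# Replicate it into a new dataset that gets created with dedup turned off;
# received blocks go through the normal write path, so they are not deduplicated
zfs send tank/data@migrate | zfs receive -o dedup=off tank/data-nodedup

# Once the copy is verified, the migration snapshots can be removed
zfs destroy tank/data@migrate
zfs destroy tank/data-nodedup@migrate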
 
Is there a guide for what to save from the Proxmox configs, so that once I do the reinstall I can get my old settings back and import my other ZFS pool?
That's easy: The ones you changed ;) We don't know what you did where.

In general, just back up your machines to an external box (e.g. a CIFS or NFS mount from another machine). Most of the settings PVE changes are stored in the virtual filesystem under /etc/pve, so maybe copy that, too.
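
For example, something along these lines before the reinstall (the target /mnt/backup is just a placeholder for your external/CIFS/NFS mount; what you actually need back depends on what you changed):

Code:
# Guest configs, storage.cfg, firewall rules etc. live in the pmxcfs mount
tar czf /mnt/backup/pve-etc-pve.tar.gz /etc/pve

# A few plain files that are commonly customized
cp /etc/network/interfaces /etc/hosts /etc/fstab /mnt/backup/

# And a fresh backup of the container itself (102, as seen in the logs)
vzdump 102 --dumpdir /mnt/backup --mode snapshot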
 
