Random ZFS Replication failed exit code 4

dendi

Hello,

I receive many emails with this error:

Code:
command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=HOSTNAME' root@1.2.3.4 -- pvesr prepare-local-job 184-0 --scan kvm kvm:vm-184-disk-0 --last_sync 1633621331' failed: exit code 4

The 9-node cluster has worked very well for a year, but I have since added VMs (120 VMs now), and I think the error is caused by a timeout when listing ZFS datasets on the receiving node.

The backup node holds all the replicas and does nothing else (no running VMs):

Code:
zpool status -v
  pool: kvm
 state: ONLINE
  scan: scrub repaired 0B in 0 days 07:26:51 with 0 errors on Sun Sep 12 07:50:52 2021
config:

    NAME                                                     STATE     READ WRITE CKSUM
    kvm                                                      ONLINE       0     0     0
      mirror-0                                               ONLINE       0     0     0
        ata-HGST_HUH721008ALE600_2SG05MBF-part4              ONLINE       0     0     0
        ata-HGST_HUH721008ALE600_2SG08U8F-part4              ONLINE       0     0     0
    logs   
      mirror-1                                               ONLINE       0     0     0
        ata-Micron_5100_MTFDDAK240TCB_18271D4AB52F-part3     ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD240HAFV-00003_S16LNYAD904567-part3  ONLINE       0     0     0

errors: No known data errors

Code:
ii  pve-cluster                          6.2-1                           amd64        "pmxcfs" distributed cluster filesystem for Proxmox Virtual Environment.
ii  pve-container                        3.2-3                           all          Proxmox VE Container management tool
ii  pve-docs                             6.2-6                           all          Proxmox VE Documentation
ii  pve-edk2-firmware                    2.20200531-1                    all          edk2 based firmware modules for virtual machines
ii  pve-firewall                         4.1-3                           amd64        Proxmox VE Firewall
ii  pve-firmware                         3.1-3                           all          Binary firmware code for the pve-kernel
ii  pve-ha-manager                       3.1-1                           amd64        Proxmox VE HA Manager
ii  pve-i18n                             2.2-2                           all          Internationalization support for Proxmox VE
ii  pve-kernel-5.4                       6.3-1                           all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.4.73-1-pve              5.4.73-1                        amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper                    6.3-1                           all          Function for various kernel maintenance tasks.
ii  pve-lxc-syscalld                     0.9.1-1                         amd64        PVE LXC syscall daemon
ii  pve-manager                          6.2-15                          amd64        Proxmox Virtual Environment Management Tools
ii  pve-qemu-kvm                         5.1.0-6                         amd64        Full virtualization on x86 hardware
ii  pve-xtermjs                          4.7.0-2                         amd64        Binaries built from the Rust termproxy crate
ii  pve-zsync                            2.0-3                           all          Proxmox VE ZFS syncing tool
ii  smartmontools                        7.1-pve2                        amd64        control and monitor storage systems using S.M.A.R.T.
ii  zfs-zed                              0.8.5-pve1                      amd64        OpenZFS Event Daemon
ii  zfsutils-linux                       0.8.5-pve1                      amd64        command-line tools to manage OpenZFS filesystems

When I run the command
Code:
zfs list -o name,volsize,origin,type,refquota -t volume,filesystem -Hrp
sometimes the output is very fast, which I assume means it is cached, but sometimes it takes 5-10 seconds.
I think this causes the random replication failures.
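
To check whether the slow listings really are intermittent, the command can be timed over several runs. The following is only a rough sketch: the helper name time_runs is made up for illustration, and it assumes GNU date (for %N nanoseconds), which should be the case on a Debian-based node.

```shell
#!/bin/sh
# time_runs N command [args...]
# Hypothetical helper: runs the given command N times, discards its output,
# and prints the exit code and wall-clock duration (milliseconds) of each run.
time_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)        # nanoseconds since epoch (GNU date)
        "$@" >/dev/null 2>&1
        rc=$?
        end=$(date +%s%N)
        echo "run=$i rc=$rc ms=$(( (end - start) / 1000000 ))"
        i=$((i + 1))
    done
}
```

For example, `time_runs 20 zfs list -o name,volsize,origin,type,refquota -t volume,filesystem -Hrp` on the backup node would show how often a run lands in the multi-second range. Whether that actually lines up with the failed replication jobs is still a guess.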

Is there a way to improve caching? I currently have two SSDs set up as log devices, but I could use them for something else.
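
Before changing anything, it may help to look at how the ARC is handling metadata, since zfs list mostly reads metadata. A minimal sketch, assuming the usual ZFS-on-Linux kstat file at /proc/spl/kstat/zfs/arcstats and its name/type/data column layout; the function name arc_meta_stats is made up for illustration.

```shell
#!/bin/sh
# arc_meta_stats [file]
# Hypothetical helper: prints a few metadata-related ARC counters as name=value.
# Defaults to the ZFS-on-Linux kstat path; another file in the same
# "name type data" format can be passed instead.
arc_meta_stats() {
    file=${1:-/proc/spl/kstat/zfs/arcstats}
    awk '$1 == "demand_metadata_hits"   ||
         $1 == "demand_metadata_misses" ||
         $1 == "arc_meta_used"          ||
         $1 == "arc_meta_limit" { print $1 "=" $3 }' "$file"
}
```

Comparing demand_metadata_misses before and after a slow zfs list run, and arc_meta_used against arc_meta_limit, would give a hint whether metadata is being evicted from the cache. This is a diagnostic only and does not by itself confirm the timeout theory.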
 
