I just added a second Proxmox 8.3 node with a 14 TB HDD intended for use as network storage. I followed this guide to set up an unprivileged container running a Samba server that shares the HDD to my mixed-OS network.
This works fine when writing from other devices. However, I also mounted the share on the local node itself and tried backing up my local VMs to it, and this fails after a few GB of copying, locks up the fileserver LXC, and requires a forceful reset of the node. (I tried a variety of ways to recover gracefully; a hard reset is the only thing I've found that works.)
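For reference, the local mount is a plain CIFS mount on the Proxmox host, roughly like this (hostname, share, mountpoint, and credentials file are placeholders for my actual setup):

```
# Mount the Samba share on the Proxmox node itself (paths are placeholders)
mount -t cifs //fileserver/tank /mnt/fileserver \
    -o credentials=/root/.smbcredentials,vers=3.1.1,uid=root,gid=root
```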
I started testing with `dd` to take the backup job out of the equation, and found a few things:
- Limiting the transfer to ~60 MB/s succeeds. Anything above ~80 MB/s, `smbd` locks up after 8-12 GB (exact commands in the sketch after this list).
- Transfers from other devices (the other Proxmox node, a Mac, a PC) all succeed and reach 150-250 MB/s.
- If I have `dd` write compressible zero-filled data, there's no problem even at high speeds. The issue only shows up with random data.
- Various `smb.conf` changes found online made no improvement (examples after this list).
- Increasing the container's resources from 512 MB RAM/2 cores to 2048 MB RAM/4 cores gave a slight improvement (it gets a few more GB in before failing).
- In the failed state, the `smbd` process is pegged at the container's CPU limit (200% with 2 cores allocated) and is unkillable. Even when the container's terminal is still responsive, nothing I do in there kills the server or recovers it gracefully.
- I tried spinning up a different Samba container, this time from the TurnKey Linux file server template. It fails in exactly the same way.
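For reference, the `dd` tests were along these lines (paths and sizes are illustrative; the rate cap goes through `pv -L`, since `dd` has no built-in throttle):

```
# Zero-filled (compressible) data: succeeds even at full speed
dd if=/dev/zero of=/mnt/fileserver/test.bin bs=1M count=20480 status=progress

# Random data at full speed: smbd locks up after 8-12 GB
dd if=/dev/urandom of=/mnt/fileserver/test.bin bs=1M count=20480 status=progress

# Random data capped at ~60 MB/s: succeeds
dd if=/dev/urandom bs=1M count=20480 | pv -L 60m > /mnt/fileserver/test.bin
```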
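The `smb.conf` changes were the usual throughput-tuning suggestions from forum threads, variations on the following (none of them helped, so take these as examples of what was tried rather than a recommendation):

```
[global]
    # Commonly suggested tuning options; none made a difference here
    socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=131072 SO_SNDBUF=131072
    use sendfile = yes
    aio read size = 16384
    aio write size = 16384
    min receivefile size = 16384
```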
I could work around this by simply not using the Samba share on the local node, or by limiting the transfer speed, but I'd like a better understanding of what's going on, or a cleaner solution if possible.
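If I do end up just limiting the speed, the cleanest knob for the backup case is probably vzdump's bandwidth limit rather than throttling every client, e.g. in `/etc/vzdump.conf` (value is in KiB/s, so 61440 ≈ 60 MB/s):

```
# /etc/vzdump.conf - cap backup I/O; bwlimit is in KiB/s
bwlimit: 61440
```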