Hello,
I've got a server with 3x WD Red 3TB drives set up in RAIDZ1 and 32GB of ECC RAM. A separate ZFS mirror is used for root.
The 3-disk storage array is shared with a couple of users via ZFS's built-in NFS, and it had been working perfectly for the last couple of months, until one of the users decided to copy over their entire backup hard drive. It isn't much data, around 300GB, but it's spread over at least 4 million files. For the first few hours the copy ran as expected; then it started filling all of the RAM until the machine crashed.
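For context, the share uses ZFS's built-in NFS support, i.e. the sharenfs property on the dataset; it was set up roughly like the sketch below (the export options and network range are from memory and just illustrative, not copied from the server).
Code:
# enable the built-in NFS export on the dataset (options illustrative)
zfs set sharenfs='rw=@192.168.1.0/24' hdd
# verify what is currently being exported
zfs get sharenfs hdd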
Now, whenever I copy a file to either one of the ZFS volumes, RAM usage increases by roughly the size of the file being copied. The swell isn't exactly the same size each time, sometimes more than the file, sometimes less, but it continues until all of the RAM is used and the system fails. And it's not regular page cache: it shows up green in htop, not orange, and the cached column in the output of free stays low. For example:
Code:
root@rho:~# free -h
             total       used       free     shared    buffers     cached
Mem:           31G       4.4G        26G        49M       1.8M        91M
-/+ buffers/cache:       4.4G        26G
Swap:         8.0G         0B       8.0G
root@rho:~# free
             total       used       free     shared    buffers     cached
Mem:      32792720    4665476   28127244      50236       1828      93856
-/+ buffers/cache:    4569792   28222928
Swap:      8388604          0    8388604
root@rho:~# dd if=/dev/urandom of=testfile bs=1M count=500
500+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 42.6863 s, 12.3 MB/s
root@rho:~# free -h
             total       used       free     shared    buffers     cached
Mem:           31G       4.7G        26G        49M       1.8M        91M
-/+ buffers/cache:       4.7G        26G
Swap:         8.0G         0B       8.0G
root@rho:~# free
             total       used       free     shared    buffers     cached
Mem:      32792720    4975820   27816900      50236       1828      93856
-/+ buffers/cache:    4880136   27912584
Swap:      8388604          0    8388604
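In case it helps, I can also watch the ZFS ARC directly while copying; on ZFS on Linux its counters are exposed as kstats, so something like this should show whether the swell is the ARC (size is the current ARC size in bytes, c_max its limit):
Code:
# print the ARC's current size and its configured maximum, in bytes
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats
If that size figure tracks the RAM growth during a copy, the memory is going to the ARC; if it stays flat, something else is eating it.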
I've tried a scrub of the volume, but it had no effect and found zero errors.
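For reference, that was simply:
Code:
zpool scrub hdd        # left running until completion
zpool status -v hdd    # reported no errors afterwards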
Does anyone have any idea what is going on here? This problem makes the server completely unusable. There is a backup, so destroying the storage volume is an option if it comes to that.
If you need the output of a command or the contents of a log file, please ask.
This is the zpool configuration:
Code:
root@rho:~# zpool get all
NAME   PROPERTY                    VALUE                SOURCE
hdd    size                        8.12T                -
hdd    capacity                    37%                  -
hdd    altroot                     -                    default
hdd    health                      ONLINE               -
hdd    guid                        8250828831358797934  default
hdd    version                     -                    default
hdd    bootfs                      -                    default
hdd    delegation                  on                   default
hdd    autoreplace                 off                  default
hdd    cachefile                   -                    default
hdd    failmode                    wait                 default
hdd    listsnapshots               off                  default
hdd    autoexpand                  off                  default
hdd    dedupditto                  0                    default
hdd    dedupratio                  1.00x                -
hdd    free                        5.11T                -
hdd    allocated                   3.02T                -
hdd    readonly                    off                  -
hdd    ashift                      0                    default
hdd    comment                     -                    default
hdd    expandsize                  -                    -
hdd    freeing                     0                    default
hdd    fragmentation               22%                  -
hdd    leaked                      0                    default
hdd    feature@async_destroy       enabled              local
hdd    feature@empty_bpobj         active               local
hdd    feature@lz4_compress        active               local
hdd    feature@spacemap_histogram  active               local
hdd    feature@enabled_txg         active               local
hdd    feature@hole_birth          active               local
hdd    feature@extensible_dataset  enabled              local
hdd    feature@embedded_data       active               local
hdd    feature@bookmarks           enabled              local
hdd    feature@filesystem_limits   enabled              local
hdd    feature@large_blocks        enabled              local
rpool  size                        222G                 -
rpool  capacity                    2%                   -
rpool  altroot                     -                    default
rpool  health                      ONLINE               -
rpool  guid                        4109231484567507720  default
rpool  version                     -                    default
rpool  bootfs                      rpool/ROOT/pve-1     local
rpool  delegation                  on                   default
rpool  autoreplace                 off                  default
rpool  cachefile                   -                    default
rpool  failmode                    wait                 default
rpool  listsnapshots               off                  default
rpool  autoexpand                  off                  default
rpool  dedupditto                  0                    default
rpool  dedupratio                  1.00x                -
rpool  free                        217G                 -
rpool  allocated                   4.76G                -
rpool  readonly                    off                  -
rpool  ashift                      12                   local
rpool  comment                     -                    default
rpool  expandsize                  -                    -
rpool  freeing                     0                    default
rpool  fragmentation               0%                   -
rpool  leaked                      0                    default
rpool  feature@async_destroy       enabled              local
rpool  feature@empty_bpobj         active               local
rpool  feature@lz4_compress        active               local
rpool  feature@spacemap_histogram  active               local
rpool  feature@enabled_txg         active               local
rpool  feature@hole_birth          active               local
rpool  feature@extensible_dataset  enabled              local
rpool  feature@embedded_data       active               local
rpool  feature@bookmarks           enabled              local
rpool  feature@filesystem_limits   enabled              local
rpool  feature@large_blocks        enabled              local
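In case tunables matter: the ARC cap can be read from the module parameter below (on ZFS on Linux, 0 means the built-in default of half of physical RAM); I'll post the value if it's useful.
Code:
# 0 = built-in default ARC limit (half of physical RAM)
cat /sys/module/zfs/parameters/zfs_arc_max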