Hello,
I've been struggling with this for weeks, which is why I'm now opening this thread.
I recently bought a new Supermicro server with the following configuration:
- AMD EPYC 7402P 24c/48t CPU
- 256GB ECC RAM
- 2x Samsung EVO 1TB M.2 NVMe SSD (on board)
- 2x Samsung PM983 2.5" U.2 NVMe SSD
- 2x 1TB 2.5" SATA HDD
What I would like to have, configuration-wise (feel free to suggest something better):
- Proxmox 6.1
- ZFS RAID 1 on the Samsung EVOs for VM storage, and for running Proxmox itself
- ZFS RAID 1 on the Samsung PM983s for VM storage
- ZFS RAID 1 on the HDDs for VMs that don't need much IO, plus log storage, so as not to wear out the SSDs as quickly (a rough sketch of the intended pools follows this list)
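For reference, the additional pools I have in mind would be created roughly like this (pool and device names are just examples, not the exact ones on my system; the EVO mirror is the rpool the Proxmox installer creates):

  # mirror on the PM983 U.2 drives for VM storage
  zpool create -o ashift=12 pm983 mirror /dev/nvme2n1 /dev/nvme3n1
  # mirror on the SATA HDDs for low-IO VMs and logs
  zpool create -o ashift=12 hddpool mirror /dev/sda /dev/sdb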
What I've done:
- I set up the configuration above with the Proxmox defaults (8K volblocksize, ashift=12, lz4 compression)
- Installed a test Windows Server 2019 VM (150GB SCSI disk, VirtIO NIC, 6 cores, 4GB RAM; config excerpt below)
- Copied a large test file (~6GB) to this VM
- Then made a local copy of that file inside the VM
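The test VM config looks roughly like this (written from memory, so the VMID, MAC and disk ID are placeholders):

  # /etc/pve/qemu-server/100.conf (excerpt)
  cores: 6
  memory: 4096
  ostype: win10
  scsihw: virtio-scsi-pci
  scsi0: local-zfs:vm-100-disk-0,size=150G
  net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0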
The issue I encounter:
- While it copies this file, the host goes to almost 100% CPU on all cores.
- Over a few minutes, memory consumption on the host climbs to 130+GB with only one VM running. This can be accelerated by repeating the copy operation in the VM.
- Transfer speeds for this copy are ~300-600MB/s. I know the EVOs are not enterprise SSDs, but they are rated for 3+GB/s and more or less deliver that when I install plain Windows on the host. The copy is also not steady: it runs at 600MB/s, drops to 0, goes back to ~350MB/s, and so on until it finishes.
- atop shows 100% busy time on the SSDs in question
- During the copy there are many "zvol" worker threads, which seem to lock up the system (the commands I used to watch this are listed below)
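In case anyone wants to reproduce the observations, I was basically just watching the copy with:

  atop 2                # per-disk busy %
  ps -eLf | grep zvol   # zvol worker threads piling up
  arcstat 1             # ARC size growing during the copy
  free -g               # overall host memory use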
What I've checked/tried (the exact settings I changed are shown after this list):
- Lowered the maximum zvol thread count to 8. Not good: even worse performance.
- Changed the amount of RAM ZFS can use for its ARC cache. Didn't help either.
- Changed the block size to 128K. This actually helped: CPU "only" at around 50%, transfer speeds around 1GB/s, memory consumption still high, but if it gets released when VMs need it, that's fine with me. The speeds are still far from what they should be, though, which leads me to believe it is indeed a configuration issue.
- Tried LVM instead of ZFS. Works as intended: fast, no CPU issues during the copy, normal RAM consumption. But I'd like to set up a supported ZFS RAID 1 rather than LVM/mdraid.
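For completeness, the tuning mentioned above was done via the ZFS module options and the Proxmox storage config, roughly like this (example values, I varied them while testing; my storage is the default local-zfs):

  # /etc/modprobe.d/zfs.conf
  options zfs zvol_threads=8            # lowered zvol worker thread count
  options zfs zfs_arc_max=17179869184   # cap ARC at 16 GiB
  # (followed by update-initramfs -u and a reboot, since root is on ZFS)

  # switch the ZFS storage to a 128K block size; only affects newly created disks
  pvesm set local-zfs --blocksize 128k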
I would be very thankful if someone could give me some ideas as to what the cause could be and, ideally, how to make this server perform as it should.
Thanks!
Regards,
tsoo