Little remark: MS SQL can now run on Linux, too.

Wow, I almost fell out of my chair when I saw your message!
That is amazing! Unfortunately I still need Windows Server for other things beyond MS SQL, but that was certainly one of the reasons.
 
I just ran a test on my Samsung Pro 950 SSD 256 GB.

RAND-WRITE TEST:
Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
4kqd32_read: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=windowsaio, iodepth=32
fio-2.15
Starting 1 thread
4kqd32_read: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/355.8MB/0KB /s] [0/91.8K/0 iops] [eta 00m:00s]
4kqd32_read: (groupid=0, jobs=1): err= 0: pid=3928: Tue Dec 13 04:02:36 2016
  Description  : [4K QD32]
  write: io=10810MB, bw=368956KB/s, iops=92239, runt= 30001msec
  slat (usec): min=4, max=3019.6K, avg= 8.69, stdev=1819.83
  clat (usec): min=1, max=3309.1K, avg=169.36, stdev=3294.51
  lat (usec): min=18, max=3333.8K, avg=178.05, stdev=3855.91
  clat percentiles (usec):
  |  1.00th=[  25],  5.00th=[  39], 10.00th=[  52], 20.00th=[  78],
  | 30.00th=[  104], 40.00th=[  131], 50.00th=[  157], 60.00th=[  183],
  | 70.00th=[  209], 80.00th=[  235], 90.00th=[  266], 95.00th=[  318],
  | 99.00th=[  454], 99.50th=[  548], 99.90th=[  996], 99.95th=[ 2128],
  | 99.99th=[ 5728]
  lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.45%, 50=8.81%
  lat (usec) : 100=18.97%, 250=57.13%, 500=13.95%, 750=0.51%, 1000=0.07%
  lat (msec) : 2=0.05%, 4=0.04%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec) : >=2000=0.01%
  cpu  : usr=6.67%, sys=76.67%, ctx=0, majf=0, minf=0
  IO depths  : 1=0.1%, 2=1.2%, 4=11.7%, 8=27.7%, 16=55.9%, 32=3.5%, >=64=0.0%
  submit  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete  : 0=0.0%, 4=96.6%, 8=0.1%, 16=0.1%, 32=3.4%, 64=0.0%, >=64=0.0%
  issued  : total=r=0/w=2767263/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
  latency  : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=10810MB, aggrb=368956KB/s, minb=368956KB/s, maxb=368956KB/s, mint=30001msec, maxt=30001msec

READ TEST:
Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
4kqd32_read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=windowsaio, iodepth=32
fio-2.15
Starting 1 thread
4kqd32_read: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=0): [f(1)] [100.0% done] [252.9MB/0KB/0KB /s] [64.8K/0/0 iops] [eta 00m:00s]
4kqd32_read: (groupid=0, jobs=1): err= 0: pid=3688: Wed Dec 14 02:33:16 2016
  Description  : [4K QD32]
  read : io=13490MB, bw=460428KB/s, iops=115106, runt= 30001msec
  slat (usec): min=3, max=340, avg= 6.60, stdev= 2.17
  clat (usec): min=0, max=8624, avg=152.31, stdev=117.12
  lat (usec): min=17, max=8632, avg=158.91, stdev=117.67
  clat percentiles (usec):
  |  1.00th=[  23],  5.00th=[  33], 10.00th=[  47], 20.00th=[  74],
  | 30.00th=[  98], 40.00th=[  121], 50.00th=[  145], 60.00th=[  167],
  | 70.00th=[  191], 80.00th=[  213], 90.00th=[  237], 95.00th=[  266],
  | 99.00th=[  410], 99.50th=[  636], 99.90th=[ 1672], 99.95th=[ 1848],
  | 99.99th=[ 2008]
  lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.92%, 50=9.75%
  lat (usec) : 100=20.13%, 250=62.98%, 500=5.52%, 750=0.30%, 1000=0.01%
  lat (msec) : 2=0.38%, 4=0.01%, 10=0.01%
  cpu  : usr=6.67%, sys=86.67%, ctx=0, majf=0, minf=0
  IO depths  : 1=0.1%, 2=1.4%, 4=10.7%, 8=27.0%, 16=57.0%, 32=4.0%, >=64=0.0%
  submit  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete  : 0=0.0%, 4=96.6%, 8=0.1%, 16=0.1%, 32=3.4%, 64=0.0%, >=64=0.0%
  issued  : total=r=3453324/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
  latency  : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  READ: io=13490MB, aggrb=460427KB/s, minb=460427KB/s, maxb=460427KB/s, mint=30001msec, maxt=30001msec
 
What is random (16)? Please use fio and report back.

Is it possible that my Samsung PCIe NVMe SSD is almost the same speed as my Samsung PRO 950 SSD SATA?
Am I reading the test results correctly?
How is that possible? When I test the two on a physical server, the NVMe PCIe SSD beats the PRO 950 SATA in everything.


I just ran a fio test on a Samsung PRO 950 SSD SATA.
The results are in my previous message, and the fio test from my NVMe PCIe SSD is the one I posted the day before (on page 1).

LnxBil, can you please read both test results to confirm that I'm not making a mistake?
If this is true, then there is really no need to use an NVMe PCIe SSD in Proxmox (when running Windows guests).
 
My test was from a 950 Pro 512 GB. Maybe the difference comes from that and not from some Windows slowness, but we'll see.

Here is my test on Windows 10, 64-bit, version 1607, run as Administrator (using your example from page 1):

Code:
E:\bin\fio\2.15>fio C:\fio.test
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
4kqd32_read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=windowsaio, iodepth=32
fio-2.15
Starting 1 thread
4kqd32_read: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [r(1)] [100.0% done] [891.8MB/0KB/0KB /s] [228K/0/0 iops] [eta 00m:00s]
4kqd32_read: (groupid=0, jobs=1): err= 0: pid=10280: Wed Dec 14 17:56:21 2016
  Description  : [4K QD32]
  read : io=26528MB, bw=905463KB/s, iops=226365, runt= 30001msec
    slat (usec): min=2, max=143, avg= 3.63, stdev= 2.43
    clat (usec): min=12, max=7200, avg=123.96, stdev=49.74
     lat (usec): min=62, max=7203, avg=127.58, stdev=49.70
    clat percentiles (usec):
     |  1.00th=[   73],  5.00th=[   81], 10.00th=[   87], 20.00th=[   94],
     | 30.00th=[  100], 40.00th=[  106], 50.00th=[  113], 60.00th=[  122],
     | 70.00th=[  133], 80.00th=[  149], 90.00th=[  175], 95.00th=[  199],
     | 99.00th=[  262], 99.50th=[  290], 99.90th=[  370], 99.95th=[  410],
     | 99.99th=[ 1864]
    lat (usec) : 20=0.01%, 50=0.01%, 100=29.11%, 250=69.56%, 500=1.30%
    lat (usec) : 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=13.33%, sys=70.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=73.2%, 32=26.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=94.5%, 8=3.7%, 16=1.5%, 32=0.3%, 64=0.0%, >=64=0.0%
     issued    : total=r=6791197/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=26528MB, aggrb=905462KB/s, minb=905462KB/s, maxb=905462KB/s, mint=30001msec, maxt=30001msec


E:\bin\fio\2.15>fio C:\fio.test
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
4kqd32_read: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=windowsaio, iodepth=32
fio-2.15
Starting 1 thread
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/308.7MB/0KB /s] [0/79.2K/0 iops] [eta 00m:00s]
4kqd32_read: (groupid=0, jobs=1): err= 0: pid=6828: Wed Dec 14 17:57:51 2016
  Description  : [4K QD32]
  write: io=9929.3MB, bw=338906KB/s, iops=84726, runt= 30001msec
    slat (usec): min=2, max=198, avg= 4.66, stdev= 2.20
    clat (usec): min=19, max=41468, avg=366.38, stdev=499.59
     lat (usec): min=23, max=41472, avg=371.03, stdev=499.61
    clat percentiles (usec):
     |  1.00th=[  102],  5.00th=[  126], 10.00th=[  139], 20.00th=[  165],
     | 30.00th=[  199], 40.00th=[  235], 50.00th=[  274], 60.00th=[  318],
     | 70.00th=[  378], 80.00th=[  466], 90.00th=[  652], 95.00th=[  820],
     | 99.00th=[ 1640], 99.50th=[ 2704], 99.90th=[ 7008], 99.95th=[ 8512],
     | 99.99th=[12736]
    lat (usec) : 20=0.01%, 50=0.05%, 100=0.79%, 250=42.99%, 500=38.58%
    lat (usec) : 750=10.89%, 1000=4.17%
    lat (msec) : 2=1.76%, 4=0.48%, 10=0.25%, 20=0.03%, 50=0.01%
  cpu          : usr=6.67%, sys=33.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=48.1%, 32=51.7%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=98.0%, 8=1.6%, 16=0.4%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=2541881/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=9929.3MB, aggrb=338906KB/s, minb=338906KB/s, maxb=338906KB/s, mint=30001msec, maxt=30001msec

So there are "only" 228k IOPS on Windows, whereas Proxmox VE had 331k IOPS.

Are you sure your NVMe slot is actually a full 4-lane slot and not a capped mSATA slot?
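A quick sanity check for anyone comparing these runs: with a fixed 4K block size, the bandwidth fio reports should be close to IOPS times block size. The sketch below only reuses the iops and bw figures from the two summary lines above; the function name is my own and nothing here comes from fio itself.

```python
# Cross-check fio's reported bandwidth against IOPS x block size.
BLOCK_SIZE_KIB = 4  # bs=4K in the job output


def bandwidth_kib_per_s(iops: int, block_kib: int = BLOCK_SIZE_KIB) -> int:
    """Bandwidth in KiB/s implied by an IOPS figure at a fixed block size."""
    return iops * block_kib


# (iops, bw in KB/s) copied from the 'read :' and 'write:' summary lines above
runs = {
    "randread": (226365, 905463),
    "randwrite": (84726, 338906),
}

for name, (iops, reported_bw) in runs.items():
    implied = bandwidth_kib_per_s(iops)
    print(f"{name}: implied {implied} KB/s, fio reported {reported_bw} KB/s")
```

Both runs agree to within a few KB/s, so the IOPS and bandwidth columns tell the same story and either can be used for comparison.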
 

This is the motherboard I bought: http://www.asrock.com/mb/Intel/Fatal1ty X99 Professional Gaming i7/
It says: 2 Ultra M.2 (PCIe Gen3 x4 & SATA3) "The PCIe Gen3 x4 Ultra M.2 interface pushes data transfer speeds up to 32Gb/s."
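To put the "32Gb/s" figure in context, here is a rough calculation of what a PCIe Gen3 x4 link and a SATA 3 link can carry after line encoding. The line rates and encodings (8 GT/s with 128b/130b per PCIe 3.0 lane, 6 Gbit/s with 8b/10b for SATA 3) are taken from the interface specs, not from this thread, and protocol overhead is ignored, so treat the results as upper bounds.

```python
# Rough usable-bandwidth arithmetic for the two interfaces an M.2 slot can expose.
# Protocol overhead (packet headers, flow control) is ignored on purpose.


def usable_mb_per_s(line_rate_gbit: float, payload_bits: int,
                    encoded_bits: int, lanes: int = 1) -> float:
    """Payload bandwidth in MB/s (10^6 bytes) after line encoding."""
    bits_per_s = line_rate_gbit * 1e9 * lanes * payload_bits / encoded_bits
    return bits_per_s / 8 / 1e6


pcie3_x4 = usable_mb_per_s(8.0, 128, 130, lanes=4)  # 8 GT/s per lane, 128b/130b
sata3 = usable_mb_per_s(6.0, 8, 10)                 # 6 Gbit/s, 8b/10b

print(f"PCIe Gen3 x4: ~{pcie3_x4:.0f} MB/s")
print(f"SATA 3:       ~{sata3:.0f} MB/s")
```

The 32Gb/s in the board spec is the raw line rate (4 lanes at 8 GT/s). A run reporting bw=905463KB/s is above anything SATA 3 could deliver, while a run at bw=460428KB/s would still fit under the SATA ceiling, so the fio numbers alone do not prove which path the slower drive is on.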

This is the only drive I have in that system.
It is this one: http://www.newegg.com/Product/Product.aspx?Item=N82E16820147467
 
I just installed Windows 10 Pro 64-bit as a guest in Proxmox and ran fio inside Windows.

Here are my results:

Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
4kqd32_read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=windowsaio, iodepth=32
fio-2.15
Starting 1 thread
4kqd32_read: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [r(1)] [100.0% done] [20852KB/0KB/0KB /s] [5213/0/0 iops] [eta 00m:00s]
4kqd32_read: (groupid=0, jobs=1): err= 0: pid=3852: Thu Dec 15 03:29:26 2016
  Description  : [4K QD32]
  read : io=606960KB, bw=20228KB/s, iops=5056, runt= 30006msec
  slat (usec): min=7, max=187, avg=12.54, stdev= 3.61
  clat (usec): min=264, max=20773, avg=6294.17, stdev=2666.21
  lat (usec): min=277, max=20787, avg=6306.71, stdev=2666.18
  clat percentiles (usec):
  |  1.00th=[ 1048],  5.00th=[ 2040], 10.00th=[ 2768], 20.00th=[ 3792],
  | 30.00th=[ 4640], 40.00th=[ 5472], 50.00th=[ 6240], 60.00th=[ 7072],
  | 70.00th=[ 7904], 80.00th=[ 8768], 90.00th=[ 9792], 95.00th=[10560],
  | 99.00th=[12096], 99.50th=[12736], 99.90th=[14144], 99.95th=[14912],
  | 99.99th=[17024]
  lat (usec) : 500=0.12%, 750=0.30%, 1000=0.47%
  lat (msec) : 2=3.86%, 4=17.55%, 10=69.30%, 20=8.39%, 50=0.01%
  cpu  : usr=0.00%, sys=6.67%, ctx=0, majf=0, minf=0
  IO depths  : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
  submit  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
  issued  : total=r=151740/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
  latency  : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  READ: io=606960KB, aggrb=20227KB/s, minb=20227KB/s, maxb=20227KB/s, mint=30006msec, maxt=30006msec

Is it possible that fio is not working correctly in a Windows guest/VM? It gave me 5213 IOPS in the Windows 10 Pro x64 guest, and when I run it in a Windows Server guest it tests a lot slower.
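One way to check whether the guest number is internally consistent is Little's law: with the queue kept full, IOPS should be roughly queue depth divided by average total latency. This is only a sketch using the lat avg and iodepth values from the guest output above; the function name is my own.

```python
def expected_iops(queue_depth: int, avg_latency_usec: float) -> float:
    """IOPS predicted by Little's law for a queue that is always full."""
    return queue_depth / (avg_latency_usec / 1_000_000)


# Values from the Windows 10 guest run: iodepth=32, lat avg=6306.71 usec, iops=5056
predicted = expected_iops(32, 6306.71)
measured = 5056

print(f"predicted {predicted:.0f} IOPS, measured {measured} IOPS")
print(f"difference {abs(predicted - measured) / measured:.2%}")
```

The prediction lands within about half a percent of what fio measured, so fio itself appears to be counting correctly: the guest really is seeing about 6.3 ms per request with 32 requests outstanding (the IO depths line shows 32=100.0%). The bare-metal runs in this thread show the queue sitting at depth 16 most of the time, so the same formula with 32 would overestimate there.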
 

And here is another test from me, from a physical box running Windows Server 2012 R2 that has a Samsung NVMe drive just like the one installed in my Proxmox box.
Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
4kqd32_read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=windowsaio, iodepth=32
fio-2.15
Starting 1 thread
4kqd32_read: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [r(1)] [9.7% done] [836.2MB/0KB/0KB /s] [214K/0/0 iops] [eta 00m:
Jobs: 1 (f=1): [r(1)] [12.9% done] [852.3MB/0KB/0KB /s] [218K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [16.1% done] [858.6MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [19.4% done] [858.9MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [22.6% done] [855.9MB/0KB/0KB /s] [219K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [25.8% done] [862.6MB/0KB/0KB /s] [221K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [29.0% done] [842.7MB/0KB/0KB /s] [216K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [32.3% done] [862.5MB/0KB/0KB /s] [221K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [35.5% done] [860.2MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [38.7% done] [863.6MB/0KB/0KB /s] [221K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [41.9% done] [860.2MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [46.7% done] [857.9MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [48.4% done] [862.6MB/0KB/0KB /s] [221K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [51.6% done] [858.8MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [54.8% done] [857.8MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [58.1% done] [860.4MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [61.3% done] [858.7MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [64.5% done] [855.1MB/0KB/0KB /s] [219K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [67.7% done] [857.4MB/0KB/0KB /s] [219K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [71.0% done] [852.4MB/0KB/0KB /s] [218K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [74.2% done] [854.1MB/0KB/0KB /s] [219K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [77.4% done] [858.9MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [80.6% done] [859.8MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [83.9% done] [858.7MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [90.0% done] [858.1MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [90.3% done] [859.2MB/0KB/0KB /s] [220K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [93.5% done] [856.7MB/0KB/0KB /s] [219K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [96.8% done] [862.6MB/0KB/0KB /s] [221K/0/0 iops] [eta 00m
Jobs: 1 (f=1): [r(1)] [100.0% done] [861.9MB/0KB/0KB /s] [221K/0/0 iops] [eta 00m:00s]
4kqd32_read: (groupid=0, jobs=1): err= 0: pid=4020: Thu Dec 15 01:54:47 2016
  Description  : [4K QD32]
  read : io=24955MB, bw=851762KB/s, iops=212940, runt= 30001msec
    slat (usec): min=1, max=68, avg= 3.97, stdev= 2.56
    clat (usec): min=32, max=16147, avg=120.92, stdev=117.22
     lat (usec): min=58, max=16152, avg=124.89, stdev=117.16
    clat percentiles (usec):
     |  1.00th=[   65],  5.00th=[   74], 10.00th=[   80], 20.00th=[   89],
     | 30.00th=[   96], 40.00th=[  103], 50.00th=[  110], 60.00th=[  117],
     | 70.00th=[  125], 80.00th=[  137], 90.00th=[  157], 95.00th=[  181],
     | 99.00th=[  258], 99.50th=[  310], 99.90th=[ 2320], 99.95th=[ 2320],
     | 99.99th=[ 2416]
    lat (usec) : 50=0.01%, 100=34.78%, 250=64.11%, 500=0.85%, 750=0.02%
    lat (usec) : 1000=0.01%
    lat (msec) : 2=0.03%, 4=0.21%, 10=0.01%, 20=0.01%
  cpu          : usr=10.00%, sys=53.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=90.8%, 32=9.1%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=92.1%, 8=1.2%, 16=6.5%, 32=0.2%, 64=0.0%, >=64=0.0%
     issued    : total=r=6388427/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: io=24955MB, aggrb=851761KB/s, minb=851761KB/s, maxb=851761KB/s, mint=30001msec, maxt=30001msec

So it looks like your 950 PRO 512 GB on Proxmox is as fast as my Samsung NVMe PCIe SSD running on a physical Windows box?
(I have also attached an AS SSD benchmark of the physical Windows server.)
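Since the comparison gets hard to follow across several long outputs, here is a small sketch that pulls the headline numbers out of the fio summary lines and lists them side by side. The regular expression assumes the `bw=...KB/s, iops=...` form that fio-2.15 printed in this thread; other fio versions may format the line differently. The run labels and function name are my own.

```python
import re

# Matches lines such as: "  read : io=26528MB, bw=905463KB/s, iops=226365, runt= 30001msec"
SUMMARY = re.compile(
    r"^\s*(read|write)\s*:\s*io=([^,]+),\s*bw=(\d+)KB/s,\s*iops=(\d+),\s*runt=\s*(\d+)msec"
)


def parse_summary(line):
    """Return op, bandwidth, IOPS and runtime from a fio summary line, or None."""
    m = SUMMARY.match(line)
    if m is None:
        return None
    op, _io, bw, iops, runt = m.groups()
    return {"op": op, "bw_kb_s": int(bw), "iops": int(iops), "runtime_ms": int(runt)}


# Summary lines copied from the randread runs posted in this thread
lines = {
    "Windows 10, 950 Pro 512 GB": "  read : io=26528MB, bw=905463KB/s, iops=226365, runt= 30001msec",
    "Windows Server 2012 R2, physical": "  read : io=24955MB, bw=851762KB/s, iops=212940, runt= 30001msec",
    "Windows 10 guest on Proxmox": "  read : io=606960KB, bw=20228KB/s, iops=5056, runt= 30006msec",
}

results = {label: parse_summary(text) for label, text in lines.items()}
for label, r in sorted(results.items(), key=lambda kv: kv[1]["iops"], reverse=True):
    print(f"{label:34s} {r['iops']:>8d} IOPS  {r['bw_kb_s']:>8d} KB/s")
```

Laid out this way, the two bare-metal Windows runs are within a few percent of each other, and the Windows guest is the outlier by a wide margin.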
 

Attachments

  • AS_BENCHMARK_SERVER_AT_CLS.JPG
