[SOLVED] VMs that are created on striped storage (RAID0) are not striped

robotmoon · New Member · Sep 19, 2024
I have two disks arranged into a striped logical volume, but when I create a VM and select that volume group as storage, the VM disk is created as a new linear logical volume within the volume group instead of landing on the existing striped logical volume.

lvs -o+lv_layout,stripes
Code:
LV                 VG         Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Layout     #Str
  striped_storage_lv storage_vg -wi-a-----  3.49t                                                     striped       2
  vm-203-disk-0      storage_vg -wi-a----- 32.00g                                                     linear        1

The setup is two disk drives joined into a volume group that's set up as LVM storage in Proxmox. These are meant for storing temporary VMs that need fast read/write speeds for various data processing jobs. Backups and the Proxmox OS are on separate disks (i.e. if there's total data loss because of the RAID0 setup, that's an acceptable risk).

Here are the steps I took to create it:
Code:
wipefs --all /dev/nvme0n1
wipefs --all /dev/nvme1n1
pvcreate /dev/nvme0n1 /dev/nvme1n1
vgcreate storage_vg /dev/nvme0n1 /dev/nvme1n1
lvcreate -i 2 -I 64 -L 3.49T -n striped_storage_lv storage_vg

I then add the storage via the GUI under Datacenter as LVM storage for holding VMs and containers.
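
For reference, the roughly equivalent CLI step would be something like the following (the storage ID fast_lvm is just a placeholder I picked):
Code:
# register the volume group as LVM storage for VM/CT disks
pvesm add lvm fast_lvm --vgname storage_vg --content images,rootdir
# confirm Proxmox sees it
pvesm status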

I've also tried this with a thin pool, with the same result:
Code:
lvcreate -i 2 -I 64 -L 3.49T -c 128K --thinpool striped_storage_lv storage_vg

The NVMe drives are 4TB each, so the 3.49TB I allocated to the striped logical volume is only about half of the total storage available in the volume group. In other words, the new VM disks are correctly added to the volume group, just not to the logical volume.
Code:
ls /dev/storage_vg/
striped_storage_lv  vm-203-disk-0

How do I set this up properly to have striped VM storage?
 
Partially solved. Thanks to this thread.

I mounted the logical volume as a directory. Proxmox wouldn't let me add it that way without a filesystem on it, so I chose XFS.
Code:
mkfs.xfs /dev/storage_vg/striped_storage_lv
mount /dev/storage_vg/striped_storage_lv /mnt/striped_storage_lv/

Then in Datacenter, I added /mnt/striped_storage_lv as directory storage.
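
If it helps anyone else, the full sequence looks roughly like this (the storage ID and the nofail mount option are my own choices, not anything Proxmox requires):
Code:
mkdir -p /mnt/striped_storage_lv
mkfs.xfs /dev/storage_vg/striped_storage_lv
# make the mount survive reboots
echo '/dev/storage_vg/striped_storage_lv /mnt/striped_storage_lv xfs defaults,nofail 0 0' >> /etc/fstab
mount /mnt/striped_storage_lv
# CLI equivalent of the Datacenter -> Storage -> Add -> Directory step
pvesm add dir striped_storage --path /mnt/striped_storage_lv --content images,rootdir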

Then I created a VM to verify it didn't make another logical volume.
Code:
  LV                 VG         Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Layout     #Str
  striped_storage_lv storage_vg -wi-ao---- 3.49t                                                     striped       2

In the VM, I ran fio with these settings:
Code:
[global]
name=nvme-seq-read
time_based
ramp_time=5
runtime=180
readwrite=read
bs=128K
ioengine=libaio
direct=1
numjobs=1
iodepth=32
group_reporting=1

[vg]
filename=/dev/sda
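
I saved the job as nvme-seq-read.fio (the filename is arbitrary) and ran it inside the VM with:
Code:
fio nvme-seq-read.fio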

The block size of 128k was chosen to match the disk's datasheet (max seq read @ 128k was listed as 6.8 GB/s). My results:
Code:
READ: bw=6015MiB/s (6307MB/s), 6015MiB/s-6015MiB/s (6307MB/s-6307MB/s), io=1057GiB (1135GB), run=180001-180001msec

So it matches the factory benchmarks. But it should be roughly twice that since it's striped (the stripe unit is 64K, so bs=128K should span both drives).

The VM is stored properly in the logical volume now. But the data doesn't seem to be getting striped.
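
For anyone checking the same thing, the striping itself can be verified on the host with something like:
Code:
lvs -o +lv_layout,stripes,stripe_size,devices storage_vg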
 
Ok, I've done more tests. There's something I'm not understanding here.

I ran the same fio test on a completely separate machine with an NVMe to get a sense of what to expect.
I also ran the fio test on a logical volume with no striping. I used lvremove to rebuild the logical volume each time I changed the stripe settings, so it's apples to apples: same disks, same volume group size, same VM setup (rebuilt each time).
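
The rebuild between rounds was roughly this (paths and names as above; the -I value changed per test, and the VM was recreated on the fresh storage afterwards):
Code:
umount /mnt/striped_storage_lv
lvremove -y storage_vg/striped_storage_lv
lvcreate -i 2 -I 128 -L 3.49T -n striped_storage_lv storage_vg
mkfs.xfs /dev/storage_vg/striped_storage_lv
mount /dev/storage_vg/striped_storage_lv /mnt/striped_storage_lv/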

Here are the raw results:
Code:
DESKTOP (control)
=====
bs=128K:
   READ: bw=2940MiB/s (3083MB/s), 2940MiB/s-2940MiB/s (3083MB/s-3083MB/s), io=517GiB (555GB), run=180002-180002msec
bs=256K:
   READ: bw=2994MiB/s (3139MB/s), 2994MiB/s-2994MiB/s (3139MB/s-3139MB/s), io=526GiB (565GB), run=180003-180003msec

NO STRIPE
=====
bs=64K:
   READ: bw=5130MiB/s (5379MB/s), 5130MiB/s-5130MiB/s (5379MB/s-5379MB/s), io=902GiB (968GB), run=180001-180001msec
bs=128K:
   READ: bw=6460MiB/s (6773MB/s), 6460MiB/s-6460MiB/s (6773MB/s-6773MB/s), io=1136GiB (1219GB), run=180001-180001msec
bs=256K:
   READ: bw=7688MiB/s (8061MB/s), 7688MiB/s-7688MiB/s (8061MB/s-8061MB/s), io=1351GiB (1451GB), run=180001-180001msec
bs=512K:
   READ: bw=5873MiB/s (6158MB/s), 5873MiB/s-5873MiB/s (6158MB/s-6158MB/s), io=1032GiB (1108GB), run=180002-180002msec

64K STRIPE (i=2 I=64)
=====
bs=128K:
   READ: bw=6015MiB/s (6307MB/s), 6015MiB/s-6015MiB/s (6307MB/s-6307MB/s), io=1057GiB (1135GB), run=180001-180001msec
bs=256K:
   READ: bw=7928MiB/s (8314MB/s), 7928MiB/s-7928MiB/s (8314MB/s-8314MB/s), io=1394GiB (1496GB), run=180001-180001msec

128K STRIPE (i=2 I=128)
=====
bs=128K:
   READ: bw=6392MiB/s (6703MB/s), 6392MiB/s-6392MiB/s (6703MB/s-6703MB/s), io=1124GiB (1207GB), run=180001-180001msec
bs=256K:
   READ: bw=8618MiB/s (9036MB/s), 8618MiB/s-8618MiB/s (9036MB/s-9036MB/s), io=1515GiB (1627GB), run=180001-180001msec

Looking at no striping as a baseline:
- the spec sheet for the disk lists 6.8 GB/s max @ 128k seq read. So even if it really did hit exactly 6.8 GB/s, I'm still seeing 99.6% of that inside the VM (and I'm assuming companies always round their speeds up lol). This rules out the disk itself from troubleshooting.
- the matched speeds also rule out alignment issues (quick check below)
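
For completeness, a quick way to confirm where LVM starts allocating data on each PV (a misaligned pe_start would be the usual suspect):
Code:
pvs -o +pe_start /dev/nvme0n1 /dev/nvme1n1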

Taking a closer look at lvcreate, we have this:
-i|--stripes Number
       Specifies the number of stripes in a striped LV. This is the number of PVs (devices) that a striped LV is spread across. Data that appears sequential in the LV is spread across multiple devices in units of the stripe size (see --stripesize). This does not change existing allocated space, but only applies to space being allocated by the command. When creating a RAID 4/5/6 LV, this number does not include the extra devices that are required for parity. The largest number depends on the RAID type (raid0: 64, raid10: 32, raid4/5: 63, raid6: 62), and when unspecified, the default depends on the RAID type (raid0: 2, raid10: 2, raid4/5: 3, raid6: 5.) To stripe a new raid LV across all PVs by default, see lvm.conf(5) allocation/raid_stripe_all_devices.

-I|--stripesize Size[k|UNIT]
       The amount of data that is written to one device before moving to the next in a striped LV.
So the "stripesize" here isn't actually the stripe size; it's the stripe UNIT size. The terminology in lvcreate is misleading (e.g. when striping across 3 disks with a stripe UNIT size of 1k, the full stripe size is 3k, but in lvcreate the "stripesize" refers to the stripe UNIT size).
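
As a made-up illustration (the LV/VG names are placeholders):
Code:
# stripe UNIT of 64k per device across 3 devices -> full stripe width of 192k
lvcreate -i 3 -I 64k -L 30G -n example_lv example_vg
lvs -o +stripes,stripe_size example_vg/example_lv   # reports #Str=3 and a 64.00k stripe unit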

Based on this, I SHOULD be getting somewhere close to 13.546 GB/s (2 x 6.773 GB/s) with bs=128k and --stripesize=64k. But I'm not.

There seems to be something I'm not getting. @CCupp
 
I've done more testing with strange results.

I've created logical volumes with 64K, 128K, 256K, and 512K stripe sizes.
I then built VMs on each LV from scratch and ran the same fio test on each of them.
The fio tests check read and write speeds at block sizes of 64K, 128K, 256K, and 512K; otherwise they are identical.

Here is the fio test:
Code:
[global]
ramp_time=5
ioengine=libaio
direct=1
numjobs=1
iodepth=32

[seq-read-64k]
stonewall
name=seq-read-64k
time_based
runtime=15
readwrite=read
bs=64k
filename=/dev/sda

[seq-read-128k]
stonewall
name=seq-read-128k
time_based
runtime=15
readwrite=read
bs=128k
filename=/dev/sda

[seq-read-256k]
stonewall
name=seq-read-256k
time_based
runtime=15
readwrite=read
bs=256k
filename=/dev/sda

[seq-read-512k]
stonewall
name=seq-read-512k
time_based
runtime=15
readwrite=read
bs=512k
filename=/dev/sda

[seq-write-64k]
stonewall
name=seq-write-64k
readwrite=write
bs=64k
filename=/home/user/write-test
size=10G

[seq-write-128k]
stonewall
name=seq-write-128k
readwrite=write
bs=128k
filename=/home/user/write-test
size=10G

[seq-write-256k]
stonewall
name=seq-write-256k
readwrite=write
bs=256k
filename=/home/user/write-test
size=10G

[seq-write-512k]
stonewall
name=seq-write-512k
readwrite=write
bs=512k
filename=/home/user/write-test
size=10G

Here are the results (the # marks are just my highlighting of the fastest runs):
Code:
NO STRIPE
=====
bs=64K:
   READ: bw=5130MiB/s (5379MB/s), 5130MiB/s-5130MiB/s (5379MB/s-5379MB/s), io=902GiB (968GB), run=180001-180001msec
   WRITE: bw=2637MiB/s (2765MB/s), 2637MiB/s-2637MiB/s (2765MB/s-2765MB/s), io=10.0GiB (10.7GB), run=3883-3883msec
bs=128K:
   READ: bw=6460MiB/s (6773MB/s), 6460MiB/s-6460MiB/s (6773MB/s-6773MB/s), io=1136GiB (1219GB), run=180001-180001msec
   WRITE: bw=1053MiB/s (1104MB/s), 1053MiB/s-1053MiB/s (1104MB/s-1104MB/s), io=5480MiB (5746MB), run=5204-5204msec
bs=256K:
   READ: bw=7688MiB/s (8061MB/s), 7688MiB/s-7688MiB/s (8061MB/s-8061MB/s), io=1351GiB (1451GB), run=180001-180001msec
   WRITE: bw=2612MiB/s (2739MB/s), 2612MiB/s-2612MiB/s (2739MB/s-2739MB/s), io=10.0GiB (10.7GB), run=3920-3920msec
bs=512K:
   READ: bw=5873MiB/s (6158MB/s), 5873MiB/s-5873MiB/s (6158MB/s-6158MB/s), io=1032GiB (1108GB), run=180002-180002msec
   WRITE: bw=2636MiB/s (2765MB/s), 2636MiB/s-2636MiB/s (2765MB/s-2765MB/s), io=10.0GiB (10.7GB), run=3884-3884msec

64K STRIPE SIZE
==========
bs=64k
   READ: bw=4800MiB/s (5034MB/s), 4800MiB/s-4800MiB/s (5034MB/s-5034MB/s), io=70.3GiB (75.5GB), run=15001-15001msec
   WRITE: bw=3465MiB/s (3634MB/s), 3465MiB/s-3465MiB/s (3634MB/s-3634MB/s), io=10.0GiB (10.7GB), run=2955-2955msec
bs=128k
   READ: bw=5856MiB/s (6141MB/s), 5856MiB/s-5856MiB/s (6141MB/s-6141MB/s), io=85.8GiB (92.1GB), run=15001-15001msec
   WRITE: bw=5079MiB/s (5326MB/s), 5079MiB/s-5079MiB/s (5326MB/s-5326MB/s), io=10.0GiB (10.7GB), run=2016-2016msec
#bs=256k
#   READ: bw=8093MiB/s (8486MB/s), 8093MiB/s-8093MiB/s (8486MB/s-8486MB/s), io=119GiB (127GB), run=15001-15001msec
#   WRITE: bw=5219MiB/s (5473MB/s), 5219MiB/s-5219MiB/s (5473MB/s-5473MB/s), io=10.0GiB (10.7GB), run=1962-1962msec
bs=512k
   READ: bw=6480MiB/s (6795MB/s), 6480MiB/s-6480MiB/s (6795MB/s-6795MB/s), io=94.9GiB (102GB), run=15003-15003msec
#   WRITE: bw=5224MiB/s (5478MB/s), 5224MiB/s-5224MiB/s (5478MB/s-5478MB/s), io=10.0GiB (10.7GB), run=1960-1960msec

128K STRIPE SIZE
==========
bs=64k
   READ: bw=4850MiB/s (5086MB/s), 4850MiB/s-4850MiB/s (5086MB/s-5086MB/s), io=71.1GiB (76.3GB), run=15001-15001msec
   WRITE: bw=3579MiB/s (3753MB/s), 3579MiB/s-3579MiB/s (3753MB/s-3753MB/s), io=10.0GiB (10.7GB), run=2861-2861msec
bs=128k
   READ: bw=6124MiB/s (6422MB/s), 6124MiB/s-6124MiB/s (6422MB/s-6422MB/s), io=89.7GiB (96.3GB), run=15001-15001msec
   WRITE: bw=5079MiB/s (5326MB/s), 5079MiB/s-5079MiB/s (5326MB/s-5326MB/s), io=10.0GiB (10.7GB), run=2016-2016msec
#bs=256k
#   READ: bw=8543MiB/s (8958MB/s), 8543MiB/s-8543MiB/s (8958MB/s-8958MB/s), io=125GiB (134GB), run=15001-15001msec
#   WRITE: bw=5224MiB/s (5478MB/s), 5224MiB/s-5224MiB/s (5478MB/s-5478MB/s), io=10.0GiB (10.7GB), run=1960-1960msec
bs=512k
   READ: bw=6447MiB/s (6760MB/s), 6447MiB/s-6447MiB/s (6760MB/s-6760MB/s), io=94.4GiB (101GB), run=15002-15002msec
#   WRITE: bw=5254MiB/s (5509MB/s), 5254MiB/s-5254MiB/s (5509MB/s-5509MB/s), io=10.0GiB (10.7GB), run=1949-1949msec

256K STRIPE SIZE
==========
bs=64k
   READ: bw=5939MiB/s (6227MB/s), 5939MiB/s-5939MiB/s (6227MB/s-6227MB/s), io=87.0GiB (93.4GB), run=15001-15001msec
   WRITE: bw=2806MiB/s (2943MB/s), 2806MiB/s-2806MiB/s (2943MB/s-2943MB/s), io=10.0GiB (10.7GB), run=3649-3649msec
bs=128k
   READ: bw=7780MiB/s (8158MB/s), 7780MiB/s-7780MiB/s (8158MB/s-8158MB/s), io=114GiB (122GB), run=15001-15001msec
   WRITE: bw=5135MiB/s (5385MB/s), 5135MiB/s-5135MiB/s (5385MB/s-5385MB/s), io=10.0GiB (10.7GB), run=1994-1994msec
#bs=256k
#   READ: bw=8708MiB/s (9132MB/s), 8708MiB/s-8708MiB/s (9132MB/s-9132MB/s), io=128GiB (137GB), run=15001-15001msec
#   WRITE: bw=5201MiB/s (5453MB/s), 5201MiB/s-5201MiB/s (5453MB/s-5453MB/s), io=10.0GiB (10.7GB), run=1969-1969msec
bs=512k
   READ: bw=7371MiB/s (7729MB/s), 7371MiB/s-7371MiB/s (7729MB/s-7729MB/s), io=108GiB (116GB), run=15001-15001msec
#   WRITE: bw=5198MiB/s (5450MB/s), 5198MiB/s-5198MiB/s (5450MB/s-5450MB/s), io=10.0GiB (10.7GB), run=1970-1970msec

512K STRIPE SIZE
==========
bs=64k
   READ: bw=5990MiB/s (6281MB/s), 5990MiB/s-5990MiB/s (6281MB/s-6281MB/s), io=87.7GiB (94.2GB), run=15001-15001msec
   WRITE: bw=2850MiB/s (2988MB/s), 2850MiB/s-2850MiB/s (2988MB/s-2988MB/s), io=10.0GiB (10.7GB), run=3593-3593msec
bs=128k
   READ: bw=7770MiB/s (8148MB/s), 7770MiB/s-7770MiB/s (8148MB/s-8148MB/s), io=114GiB (122GB), run=15001-15001msec
   WRITE: bw=5185MiB/s (5437MB/s), 5185MiB/s-5185MiB/s (5437MB/s-5437MB/s), io=10.0GiB (10.7GB), run=1975-1975msec
#bs=256k
#   READ: bw=8622MiB/s (9040MB/s), 8622MiB/s-8622MiB/s (9040MB/s-9040MB/s), io=126GiB (136GB), run=15001-15001msec
#   WRITE: bw=5209MiB/s (5462MB/s), 5209MiB/s-5209MiB/s (5462MB/s-5462MB/s), io=10.0GiB (10.7GB), run=1966-1966msec
bs=512k
   READ: bw=8430MiB/s (8839MB/s), 8430MiB/s-8430MiB/s (8839MB/s-8839MB/s), io=123GiB (133GB), run=15001-15001msec
#   WRITE: bw=5222MiB/s (5475MB/s), 5222MiB/s-5222MiB/s (5475MB/s-5475MB/s), io=10.0GiB (10.7GB), run=1961-1961msec

I was expecting some variation between them, but the speeds are always highest with a fio block size of 256k (with a slightly better write speed at bs=512k), regardless of which stripe size I use for the LV. None of them come close to the speeds I was expecting.

What am I missing?

But the write speeds do show the striping working well: the spec sheet lists one drive at 2.6 GB/s max @ 256k seq write, and I am getting ~5.2 GB/s in my write tests. Just not in my read tests. Is the way I'm testing read speeds wrong? Could it be the PCIe card the drives are attached to?
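
To rule the PCIe card in or out, the negotiated link can be checked on the host with something like this (assuming the drives show up as nvme0/nvme1; the PCI address below is a placeholder):
Code:
cat /sys/class/nvme/nvme0/device/current_link_speed /sys/class/nvme/nvme0/device/current_link_width
cat /sys/class/nvme/nvme1/device/current_link_speed /sys/class/nvme/nvme1/device/current_link_width
# or, with the PCI address from "lspci | grep -i nvme":
lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'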
 
Ah, thank you very much!

The answer given in the link matches my exact use case. I think not including --type=raid0 is exactly why it's not working. I'll give it a try.

I'll repost a summary here in case the link dies:
Q:
One set of instructions uses this command:
lvcreate -i[num drives] -I[stripe size] -l100%FREE -n[lv name] [vg name]
But another uses this command:
lvcreate --type raid0 [--stripes Number --stripesize Size] VG [PVs]
Why?

A:
The two LVM commands above both create a striped volume, but using different drivers:
the first command (the one without --type=raid0) defines a striped segment type which, in turn, is a fancy name for devicemapper-level striping;
the second command (the one with --type=raid0) uses the classical Linux MD driver to set up a "true" RAID0.
Managing RAID/striping at the LVM level (rather than at the disk/partition level) can be useful when you want to apply different protection profiles (i.e. RAID0 vs RAID1) to different logical volumes (e.g. scratch space vs data repository). For more information, have a look here.
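
Translated to my setup, that should be something along the lines of (not tested yet as of this post):
Code:
lvcreate --type raid0 --stripes 2 --stripesize 64k -L 3.49T -n striped_storage_lv storage_vg /dev/nvme0n1 /dev/nvme1n1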
 
Solved. The problem came from how fio was testing the read speeds.

I recreated the logical volumes utilizing --type=raid0 and still didn't get the read speeds I was expecting. However, in both cases, I was getting the correct write speeds, so I double checked the fio settings.

Changing the iodepth value from 32 to 512 produced the expected speed results for both read and write. With the smaller iodepth, fio was waiting for in-flight I/O requests to complete before queuing more, so the queue never got deep enough to keep both drives busy and the reported bandwidth was lower than the hardware can actually deliver. All is well.
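
For reference, the adjusted job is just the earlier one with a deeper queue (only iodepth changed):
Code:
[global]
name=nvme-seq-read
time_based
ramp_time=5
runtime=180
readwrite=read
bs=128K
ioengine=libaio
direct=1
numjobs=1
iodepth=512
group_reporting=1

[vg]
filename=/dev/sda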

As a side note,
I was initially going to set up one logical volume with the most optimal stripe size for average future workloads. However, in the process of my testing, I ended up creating 4 logical volumes of different stripe sizes and...I kind of like it.

The build is for general-use data processing, mostly for ETL. The VMs on raid0 storage will be like kitchens. Data goes in, gets chopped, boiled, minced, and steamed. Then the finished meal leaves the kitchen. So this new arrangement is like four kitchens that master specific types of ingredients instead of one big one that cooks everything adequately. We'll see how it goes.
 