Heavy load on other VMs when installing a new one

zeitspringer

New Member
Aug 7, 2014
Hey folks,

there's some strange behavior with my PVE installation since the migration from the old hardware to the new one.
Description:
Everything has been working fine in day-to-day operation for nearly two weeks now, and on the old system it ran fine for 9 months.
Since the migration there's some kind of system-wide load problem. E.g., when I install a new QEMU VM from an ISO, at least one - sometimes more - of my existing VMs breaks down under heavy "load" ("top" shows 40 for the 1-minute value inside the VM). As there's no process causing it (according to the process list), I assume this is caused by I/O (!?).
Even when I cancel the installation, the system does not calm down. I have to restart the affected VM to get it fixed.
This scenario is fully reproducible.
I asked my provider to do a hardware check, including the disks, which did not reveal any problems (not really a surprise to me).
I'm not sure whether to blame the hardware (which is "better" in general) or whether this is a PVE issue. I can't rule out a misconfiguration on my part, but I have no clue what it could be.
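To back up the I/O theory, I'll watch the I/O wait on the host the next time I reproduce it; something like this (just the standard sysstat tools, nothing specific to my setup) should show whether the disks are saturated while the other VMs stall:

Code:
iostat -x 2     # watch %util and await on sda/sdb while reproducing the problem
vmstat 2        # the "wa" column shows the share of CPU time spent waiting for I/O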

Migration:
As said, the old system never had any issues. It simply had too little RAM, which is why I replaced it (... never touch a running system ...).
I installed the new system by hand. It's Debian Wheezy with the PVE no-subscription repository. I copied all data storages, the conf files from /etc/pve/ and /var/lib/vz, and any other relevant config from /etc/. The easy way.


I hope you can give me some hints.


Best regards,
Martin
 
Linux software RAID (mdadm).

Hint:
The "old" disks were 1.5 TB, the new ones are 3.0 TB. From my experience, this should not impose any restrictions.
 
Running Proxmox on software RAID is not supported or recommended. That doesn't explain why your old system didn't have these issues, but honestly you should look at getting onto some type of hardware RAID.

Was the old system running a different version of Debian, maybe an older version of mdadm? Just shots in the dark.

Just an FYI

https://pve.proxmox.com/wiki/Software_RAID
 
Hm, interesting point :)
Good to know that it's not supported. Nevertheless, it worked very well for me for nearly a year.

Maybe it's worth a try to degrade the RAID and re-check the scenario with just one disk, just to be sure. It's a bit risky, so not my favorite experiment :)

Both were Debian Wheezy stable (no contrib or non-free, no backports).
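If I do try it, the rough idea would be something like this (array and partition names are just examples from my layout; re-adding the disk afterwards triggers a full resync):

Code:
mdadm --manage /dev/md4 --fail /dev/sdb5      # mark one mirror half as failed
mdadm --manage /dev/md4 --remove /dev/sdb5    # take it out of the array
# ... re-run the install test against the degraded array ...
mdadm --manage /dev/md4 --add /dev/sdb5       # put it back, mdadm resyncs the mirror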
 
Is it just a RAID 1 setup with mdadm?
 
Let me give some more information.
Yes, it's RAID 1. Both disks are used in full; partitions were created on each disk and then combined into separate arrays.

Code:
Personalities : [raid1]
md4 : active raid1 sda5[0] sdb5[1]
      963510080 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sda4[0] sdb4[1]
      963510080 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sda3[0] sdb3[1]
      963510080 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
      22003584 blocks super 1.2 [2/2] [UU]
md0 : active (auto-read-only) raid1 sda1[0] sdb1[1]
      16768896 blocks super 1.2 [2/2] [UU]

md2-md4 are LUKS encrypted, with ext4 inside. The data storages are placed on them.

It's the same configuration on the old and the new machine; "just" the hardware differs.
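If it helps, the whole stack (partition -> md -> LUKS -> ext4) is easiest to see with lsblk:

Code:
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT /dev/sda /dev/sdb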
 
Interesting that the new box is having such issues.

Another shot in the dark: possibly AES-NI was enabled on the old hardware and it's not enabled on the new hardware? I can't recall the exact openssl command to verify whether AES-NI is loaded and working. It might be "openssl engine"; you could also look at the CPU flags.
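Something along these lines should tell you (CPU flag, kernel module, and a quick openssl speed comparison; the numbers are only meaningful relative to each other):

Code:
grep -m1 -o aes /proc/cpuinfo       # does the CPU advertise the AES instructions?
lsmod | grep aesni                  # is the aesni_intel module loaded?
openssl engine                      # engines openssl knows about
openssl speed -evp aes-128-cbc      # should be much faster than plain "openssl speed aes-128-cbc" if AES-NI is used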
 
Good point.

LUKS uses aes, cbc-essiv:sha256. As I did not set this explicitly, I assume it is the same as on the old machine.
And yes, /proc/cpuinfo shows AES as available, and the module is loaded, too.

A friend of mine suggested generating disk load, e.g. with dd. It would be interesting to see whether that also affects the existing VMs; until now I have only tested this by installing a new VM.

As everything works in general, this does not seem to be an issue with my migration, does it? I'm quite sure I did not use the "recommended way of migrating".
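For the record, this is roughly what I plan to run: confirm the cipher on one of the arrays, then generate sequential write load on a data storage (the path and size are just examples; oflag=direct bypasses the page cache):

Code:
cryptsetup luksDump /dev/md3 | grep -i cipher                              # cipher/mode should match the old box
dd if=/dev/zero of=/var/lib/vz/ddtest.img bs=1M count=4096 oflag=direct   # ~4 GB sequential write
rm /var/lib/vz/ddtest.img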
 
Sounds like you have AES-NI working, so that shouldn't be the issue. The odd part is that the load doesn't come down on the VMs even after you are done; that tells me something major is wrong, but it's very tough to tell what. I'm not sure what the recommended way of migrating is; I would have backed up each VM individually, set up the new system, and then restored each VM one by one on the new system.

What are the overall resources of the new machine vs. the old machine (RAM, CPU, drives, etc.)? How many VMs are you running, and what resources are they allowed to use?
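For the backup/restore route, that would be roughly the following (VMID, storage names and the archive name are just examples):

Code:
# on the old host: full backup of VM 101
vzdump 101 --storage backup --mode stop --compress lzo

# copy the resulting archive to the new host, then restore it as VM 101
qmrestore /var/lib/vz/dump/vzdump-qemu-101-<timestamp>.vma.lzo 101 --storage local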
 
Maybe the new disks have 4k sectors and you have an alignment issue.
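A quick way to check would be something like this (partition start sectors divisible by 8 means 4 KiB aligned):

Code:
cat /sys/block/sda/queue/physical_block_size   # 4096 on an advanced-format (4k) disk
parted /dev/sda unit s print                   # partition start sectors should be divisible by 8
parted /dev/sda align-check optimal 1          # check partition 1 for optimal alignment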

What brand/model disks are you using in the new server?
I have had some "desktop" SATA disks that performed like crap when installed in a server rack in the datacenter because they could not deal with the vibrations from all the other servers' disks/fans.

What sort of performance benchmarks are you getting on the old vs new server?
pveperf would be a good place to start.
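E.g., run it against the root filesystem and against the VM storage on both boxes and compare the FSYNCS/SECOND values; smartctl will show the exact drive model:

Code:
pveperf                 # defaults to /
pveperf /var/lib/vz     # run it against the VM storage path as well
smartctl -i /dev/sda    # exact model, and usually rotation rate and sector sizes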