Proxmox 3.4 on IBM x3850 X6

hzk916

Feb 24, 2015
Hi
I am having trouble with an install of Proxmox 3.4 on a new IBM x3850 X6. The VMs hang or crash every 2-3 days, but restart OK.
The x3850 has 2 x Xeon 12C E7-4860v2 130W 2.6GHz/1600MHz/30MB processors and 64GB of memory. There are 2 RAID configurations: a RAID5 with 3TB SAS drives and a RAID1 with 2 x SSD drives.
Proxmox is installed on the RAID5.
As it is a new server, I am not sure whether this is a hardware fault or a software compatibility issue. IBM support are also looking at it.
The VMs are all Windows (1 x 2003 Ent, 1 x 2008 x32, 1 x 2012R2 and 1 x Win7 x64) and all have locked up or stopped at some point over the last 2 weeks. I have moved them all to a standby HP DL360 and all are working fine.
 
Hi, given the huge number of possible factors, posting some more detail about pve (pveversion, pveperf, storage.conf, repository used, /etc/network & such) and the config of at least one vm (vmid.conf) would probably help someone spot what it could be.

Marco
 
The install and the VMs were done through the GUI. I am not a Linux guy, Windows only.
Where do I find vmid.conf?
 
Well, I'm sorry, but IMHO using Proxmox without knowing its inner workings is possible but not advisable. It's super easy to use only the GUI when everything works smoothly, but not that easy when something causes trouble.

I'll give some suggestions below, but if you are seriously (professionally) considering this solution, either get a Proxmox support subscription or you're on your own (i.e. you have to get your hands dirty with many quite advanced Linux topics).
It's not as terrible as it sounds, but it has a learning curve, depending on the skills you have or can get from someone else's help. This forum is here to help too, and for free :)

That said:
Proxmox Virtual Environment (pve) comes with many diagnostic tools. The only way to use them is from the shell (the Linux command line) as root.
Running them (and knowing how to) and posting the results on this forum is the usual way to get help from others (volunteers and pve staff),

for example the output of these commands:
#pveversion -v
#pveperf

and many, many others, as listed in http://pve.proxmox.com/wiki/Command_line_tools (plus many others that are not specific to pve but cover Linux networking, logging, system & such).

In /etc/pve you will find (on any node, since it's replicated in real time) nearly everything you need to know,
including storage.conf and VM config files like /etc/pve/qemu-server/100.conf
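
For someone not used to the shell, the commands and files mentioned above could be gathered into one text file and posted here. This is only a minimal sketch, to be run as root on the node. The function name collect_pve_info and the report path are made up for illustration, and /etc/network/interfaces is assumed to be the file meant by "/etc/network":

```shell
#!/bin/sh
# Sketch: gather the output helpers usually ask for into one text file.
# collect_pve_info is a hypothetical helper name, not a Proxmox tool.

collect_pve_info() {
    report="$1"
    : > "$report" || return 1

    # Commands mentioned in this thread; a note is written if one is missing.
    for cmd in "pveversion -v" "pveperf"; do
        tool=${cmd%% *}
        echo "### $cmd" >> "$report"
        if command -v "$tool" >/dev/null 2>&1; then
            $cmd >> "$report" 2>&1
        else
            echo "(not found: $tool)" >> "$report"
        fi
    done

    # Config files mentioned in this thread; a note is written if one is missing.
    for f in /etc/pve/storage.conf /etc/network/interfaces /etc/pve/qemu-server/*.conf; do
        echo "### $f" >> "$report"
        if [ -r "$f" ]; then
            cat "$f" >> "$report"
        else
            echo "(not readable: $f)" >> "$report"
        fi
    done
}
```

For example, collect_pve_info /root/pve-report.txt, then read the file through before posting it.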

The wiki has a lot of info about nearly everything, but it is not a manual.
There is a very good manual written by an experienced user (symmcom), which you can get for a few bucks, see: http://forum.proxmox.com/threads/18...-a-book-about-Proxmox-VE-is-finally-available
and another about HA topics: http://forum.proxmox.com/threads/20368-New-book-Proxmox-High-Availability

I'll stop here for now, but feel free to ask more.
And remember, there are highly skilled and nice people around here (I'm not talking about me).

Without you posting some more details, nobody is able to jump in and solve your issues... it's simply too difficult.

Marco
 
Thank you so much for your excellent reply Marco,
I will have to just dig down and learn some Linux, and perhaps I will start with the support subscription.
What I was hoping to find out from this forum is whether anyone else has used Proxmox on this new IBM X6 (x3850 3837C1G) server, as it is very new hardware.

Tony
 
A quick search for "x3850" reveals that some people are using the X5, and the X6 should mainly be an evolution of it, perhaps?
Here is a comparison:
http://www.thevirtualist.org/x3850-x5-vs-x3850-x6-comparison/

When I started with pve (it was 1.5 then) I had a brand new x3650m2 (still here) and its network card was not supported... I can't remember exactly, but somehow I managed to get it loaded (or 1.7 soon appeared, which worked for sure, I can't recall). So I'm quite confident that your problems are elsewhere, maybe in the storage, network, VM config or caching...

Marco
 
As I now know that Proxmox is running RHEL 6.6, do you think I need to apply these IBM updates? There are 107 of them.

group: ibm_utl_uxsp_a8sp03p-1.30_rhel_32-64

IBM System x3850 X6 / x3950 X6 UpdateXpress System Pack for RHEL 6 x64
The following files implement this fix.
ibm_utl_uxsp_a8sp03p-1.30_rhel_32-64.xml (1.42 MB)
ibm_utl_uxsp_a8sp03p-1.30_rhel_32-64.html (857.03 KB)
ibm_utl_uxsp_a8sp03p-1.30_rhel_32-64.txt (28.96 KB)
brcd_dd_fc_bfa-3.2.4.0_rhel6_32-64.chg (4.33 KB)
brcd_dd_fc_bfa-3.2.4.0_rhel6_32-64.tgz (76.46 MB)
brcd_dd_fc_bfa-3.2.4.0_rhel6_32-64.txt (53.7 KB)
brcd_dd_fc_bfa-3.2.4.0_rhel6_32-64.xml (32.34 KB)
 
I don't think you will be able to apply these updates. What disk type and what CPU type are you using? Are the VMs locked up? Bluescreen?
 
It is an IBM X x3850 X6
No bluescreen, just a disconnect and/or lockup of the VMs (4 x VMs: Win7 x64, Win 2003 Ent, Win 2008 x32 and 2012R2). All would randomly lock up or freeze, restart fine, run for 2-3 days and lock up again. I have moved them to an HP DL360 and they are all fine.

I have 2 RAID setups: a 4 x 900GB SAS RAID5 with its own controller and a 2 x 250GB SSD RAID1 with its own controller. Proxmox is installed on the RAID5.
Processors are 2 x Xeon 12C E7-4860v2 130W 2.6GHz/1600MHz/30MB.
RAID cards are 2 x ServeRAID M5210 SAS/SATA Controller for IBM System X, with 1GB cache upgrade
Memory is 64GB 8 x 8GB (1x8GB, 1Rx4, 1.35V) PC3L-12800 CL11 ECC DDR3 1600MHz LP RDIMM
 
I am more interested in the VM side. Are you using virtio, ide or sata? Are you using CPU type "host" or something else?

Are you running the latest Proxmox?
 
Everything looks good; we are running a similar Windows VM here without any issues. However, we are using a raw disk rather than qcow, as we noticed some odd performance issues in Windows guests with qcow.

These are all just shots in the dark, but it would be worth trying a Windows VM with a raw disk, or possibly changing the CPU type. It doesn't explain why it runs OK on the HP, but it's worth a shot. Is the IBM on the latest IMM and an updated BIOS?
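
To see which disk bus, disk format and CPU type a VM currently uses, the relevant lines can be pulled out of its config file (for example /etc/pve/qemu-server/100.conf). This is a rough sketch only; the function name show_vm_disk_cpu is invented for illustration, and it assumes disk lines begin with the bus name and an index, such as virtio0: or ide1:

```shell
#!/bin/sh
# Sketch: print the disk and CPU lines from a VM config file.
# show_vm_disk_cpu is a hypothetical helper name, not a Proxmox tool.

show_vm_disk_cpu() {
    conf="$1"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }

    # Disk lines start with the bus name and an index (assumed layout).
    # CD-ROM entries are left out because they are not VM disks.
    grep -E '^(virtio|ide|sata|scsi)[0-9]+:' "$conf" | grep -v 'media=cdrom' \
        || echo "(no disk lines found)"

    # The cpu: line is only present when a CPU type was chosen explicitly.
    grep -E '^cpu:' "$conf" || echo "cpu: (not set, default type in use)"
}
```

For example, show_vm_disk_cpu /etc/pve/qemu-server/100.conf. A disk line ending in .qcow2 or carrying format=qcow2 points to a qcow disk, and a line reading cpu: host means the host CPU type is in use.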

This post is a bit similar; he also ended up moving away from qcow and it fixed the issue. Worth a shot!

http://forum.proxmox.com/threads/20046-Some-Windows-guests-hanging-every-night/page2
 
Yes, I just updated the IBM yesterday with the latest IMM2 4.8 and have run their BOMC "Latest available individual updates" ISO several times until everything was updated. I am going to test now for the next week with 1-2 VMs to see how it runs; it usually takes 2-3 days before a lockup.
 
Day 6: running OK until today, when one Windows 7 VM locked up at lunch time and I had to reboot it. Proxmox and the other VMs are OK.
 
Very strange, in my experience. Are you completely sure your RAM is not faulty? (I know it's ECC, but something strange like this could also happen for that reason.)

If you have some other server working OK (even other brands/models) and the RAM can be swapped... I would try that.
Is it the only IBM x3850 X6 you have?

And: is there nothing in the logs about some possible sw/hw/net issue causing this?
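
As a starting point for the log question, a log file can be searched for lines that often accompany hardware trouble. This is a sketch under assumptions: the function name scan_hw_errors is made up, and the keyword list is only a guess at typical kernel messages, so an empty result does not prove the hardware is healthy:

```shell
#!/bin/sh
# Sketch: search a log file for lines that often accompany hardware trouble.
# scan_hw_errors and its keyword list are assumptions for this example.

scan_hw_errors() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }

    # -i ignores case, -n prefixes each hit with its line number,
    # -E enables the alternation syntax used in the pattern.
    grep -inE 'machine check|hardware error|mce:|edac|temperature|throttled|i/o error' "$log" \
        || echo "no matching lines in $log"
}
```

For example, scan_hw_errors /var/log/syslog or scan_hw_errors /var/log/kern.log on the node, run as root.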

Marco
 
I agree, good points. Are these VMs now running on raw disks with CPU type host?
 
Servers tend to be very picky about the brand and model of ECC RAM, so have you verified that the RAM you use is certified by IBM for your particular server?
 
I purchased the memory with the server from an IBM distributor.
We have just had our longest uptime, 12 days, before the Win7 VM froze. This time one of the Server 2008 VMs also froze at the same time; the other Server 2008R2 VM was fine.
I am going to remove one of the books (and the memory that was supplied) and just use the original memory that came with the books, to see whether that resolves the problem. (There are only 2 books in this 4-book x3850 X6.)
 
The latest is that the server is still not running reliably. IBM have replaced a mid-section board in the server, and now they are coming to replace fans in one of the books, as the DSA report shows overheating. It looks like it was a hardware issue all along.
 
