Large file write test to iSCSI SAN causes Proxmox to hang

OldSunGuy

New Member
Mar 23, 2012
Hi All,
New member; relatively new to the Proxmox world; first post; please be kind. Complicated setup, so I'm probably leaving out important info. But here goes:

Working with Proxmox 2.0-42, 10GbE cards (ATTO, fiber cables direct-connected SAN to Proxmox server, no switch), and an Enhance-Tech SAN box. We are running write tests at the shell (not even using VMs yet) to make sure we are getting good rates. Having a difficult time writing large files: io-wait times go sky high on the Proxmox Summary page. For small files (< 10GB) speeds are less than expected; ESXi 5 gives rates about 3X faster and lets me complete the 65GB write test. (I chose 65GB because that is more than twice the physical memory, 32GB RAM, to flush out whether caching might be involved.) Using dd to do the write test.
The process goes like this (exact commands are sketched after the list):

0- All MTUs set to 1500 - no jumbo frames (for some reason the Enhance SAN technician says their maximum value is 3500 MTU, so I just turned jumbo frames off)

1- Make a 100GB virtual disk volume on the SAN (RAID 60 - striped RAID 6) with the following params: Block size 512 | Stripe 64KB | Readahead Enabled | AV-media Enabled | Write-through cache | 4 out of 4 for Background task priority

2- Attach LUN

3- Find the LUN on the Proxmox server and make a filesystem (usually ext4, but I have tried ext3 since I got a journalling error during one of the hangs)

4- Mount the new filesystem, cd to a directory on the new volume, and run the test: dd if=/dev/zero of=./65GB_file bs=1M count=65k conv=fdatasync
The shell never comes back. If I go into the Proxmox server through another shell and check the filesystem usage, it looks like the write finished, but the prompt in the original shell never returns.
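For completeness, the whole sequence on the Proxmox side looks roughly like this (the interface name eth1, portal IP, target IQN, and /dev/sdb are placeholders for my actual values):

# ip link set dev eth1 mtu 1500
# iscsiadm -m discovery -t sendtargets -p 192.168.0.10
# iscsiadm -m node -T iqn.2012-03.com.example:vd100gb -p 192.168.0.10 --login
# mkfs.ext4 /dev/sdb
# mkdir /mnt/santest; mount /dev/sdb /mnt/santest
# cd /mnt/santest
# dd if=/dev/zero of=./65GB_file bs=1M count=65k conv=fdatasync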

On the ESXi box the shell does not recognize conv=fdatasync, so I run: sync; sync; sync; time dd if=/dev/zero of=./65GB_file bs=1M count=65k; sync

If I try the same command on the Proxmox server, dd returns unbelievably fast (rates too high to trust as real) but hangs doing the final sync. If I try the large file write test, the prompt never comes back and the Summary graphs on the Proxmox web interface stop updating, even though the iowait and CPU load numbers at the top keep changing. I cannot umount and remount the LUN even after I kill off dd; fdisk -l hangs at this point, as does a sync. Tired of deleting the LUN and rebooting the Proxmox server to restart the whole process.
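(For reference, GNU dd also supports direct I/O, which bypasses the page cache so the reported rate reflects the device rather than RAM - a variant worth comparing against the cached numbers:)

# dd if=/dev/zero of=./65GB_file bs=1M count=65k oflag=direct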

Any suggestions/guidance/parameter changes would be greatly appreciated. Really like live migration and don't want to have to give that up by moving to VMware or Xen.

Thanks.
 
Hi,
perhaps the driver of the ATTO card is flaky? A short time ago somebody else also had trouble with an ATTO card.

You can check with "iostat" how fast the disk is actually being written ("apt-get install sysstat" and e.g. "iostat -dm 5 sdb" - or sdc...).
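With -m the output columns look like this; MB_wrtn/s is the one to watch for the sustained write rate the device actually sees, independent of what dd reports:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn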
Do you also have a 1GbE NIC on the storage? I made a short test with an openATTIC iSCSI box and got 108MB/s write speed with a 1GbE NIC. That's not bad.

Udo
 
Thank you Udo for your suggestions.

I checked the driver version with modinfo ixgbe and found that we are using the standard Intel 10 Gigabit PCI Express Network Driver, version 3.7.17-NAPI. This is only one month out of date, so I did not upgrade at this point. I do not have a 1GbE NIC connection to the SAN at this point to cross-test.
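(For anyone following along, the check was along these lines, with eth1 standing in for the 10GbE interface:)

# modinfo ixgbe | grep -i ^version
# ethtool -i eth1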

I've gone back to do more testing using a slightly different command line (dropping the conv=fdatasync and syncing before and after the writes instead), and monitoring the writes with the iostat utility. I am now using the following:

# sync; date; time dd if=/dev/zero of=<filename> bs=1M count=<number>; sync; date

I am running tests that increase the file size in 5GB steps to see where I run into a problem. I am getting overall throughput of only 25MB/s to 90MB/s, though some rates seen by iostat while the run is ongoing go as high as 130MB/s; the run doesn't maintain those rates. Similarly, the Proxmox Summary graph for Network Traffic bounces around between 40 and 100 MB/s.

My tests show pretty consistent throughput averages on multiple runs with file sizes up to 40GB: around 60-80 MB/s overall for the entire write. But when I get to 45GB, the jobs begin the malevolent behavior again: the dd disappears, and I am forced to delete the LUN and reboot Proxmox to clear the iowait before fdisk/sync/remounting filesystems will complete.
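(Side note, an assumption rather than a confirmed diagnosis: with 32GB of RAM the kernel's default writeback thresholds allow many gigabytes of dirty pages to pile up before flushing starts, which could line up with the hang only appearing on the largest files. The knobs to inspect and, experimentally, lower are:)

# sysctl vm.dirty_background_ratio vm.dirty_ratio
# sysctl -w vm.dirty_background_ratio=5
# sysctl -w vm.dirty_ratio=10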
 
Installed the newest ixgbe 10 gigabit ethernet driver. Still getting low bandwidth. Used netstat and found packet loss and retransmissions. Here is some output:

# netstat -s | grep retrans

247239 segments retransmited
3063 times recovered from packet loss due to fast retransmit
3161 timeouts after reno fast retransmit
3069 fast retransmits
237893 retransmits in slow start
3061 classic Reno fast retransmits failed

(ten seconds later)

247843 segments retransmited
3070 times recovered from packet loss due to fast retransmit
3168 timeouts after reno fast retransmit
3076 fast retransmits
238476 retransmits in slow start
3068 classic Reno fast retransmits failed

Any suggestions on what parameters I might need to tune to reduce or eliminate the packet loss and retransmissions would be greatly appreciated. Thanks.
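(For reference, the usual first knobs for TCP on 10GbE links are the socket buffer limits; the values below are common illustrative starting points, not tested recommendations:)

# sysctl -w net.core.rmem_max=16777216
# sysctl -w net.core.wmem_max=16777216
# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
# sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"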
 
Hi Spirit -
Not using a switch. Doing direct connect (peer to peer) using the auto-crossover feature of the cards.
Thanks.
 
