Queue Depth and more...
Monday, 15 June 2009 11:27

Over the last few weeks we have had some issues with our HDS USP-V. For some reason a SCSI lock locked the whole VMFS, and none of the ESX servers in that cluster could read from or write to the VMFS anymore. The LUN was still available, but the VMFS was not, so all the VMs in the cluster crashed. The incident repeated itself after 6 weeks on another server, in another chassis, behind other switches, so we contacted VMware and HDS for help.

After a lot of sending log files, checking settings, etc., the two things we were advised to change were:

  • Masking
  • Queue depth of HBA

Masking

According to a PDF sent by HDS Services, masking can be done in two ways:

"Host Groups per HBA Versus Host Groups per ESX Hosts or VMware Cluster
To present a set of common, shared LUs to multiple ESX hosts or to a VMware cluster, host groups can be created either per HBA port (that is, per WWPN) or per a group of ESX hosts or VMware cluster.
A host group created on an HBA port basis contains the HBA's WWPN and a set of common, shared LUs (that is, only one WWPN, multiple LUs). A host group created per group of ESX hosts or per VMware cluster contains at least one WWPN from every ESX host and multiple LUs (that is, multiple WWPNs, multiple LUs). Every LU must be presented with the same host LU ID to every host or VMware treats the LU as a snapshot LU and disables access to the VMFS by default.
Although both concepts are supported, Hitachi Data Systems recommends creating host groups per HBA port (that is, per WWPN)."

So this is exactly what we changed: we now use host groups per HBA port. This creates more administrative overhead, but since HDS recommended it, we did it.

Queue Depth

HDS also told us the default Queue Depth on our Emulex adapters was too high. The default is 32, and after some calculations it turned out we needed to set it to 4. It is VERY important that you check with your vendor before changing this. More info about Queue Depth can be found on Duncan Epping's site. If you want to know how the recommended setting is calculated, check Frank Denneman's website. He did an excellent job describing how to calculate the queue depth.
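To give an idea of the kind of calculation involved, here is a minimal sketch of one common rule of thumb: divide the queue limit of the storage port by the number of hosts times the number of LUNs behind that port. This is an assumption on my side, not necessarily the exact formula HDS or the linked articles use. The function name `Get-LunQueueDepth`, its parameters, and the example numbers are all hypothetical.

```powershell
# Hypothetical helper: spreads a storage port's queue limit over all hosts and LUNs.
# The formula is a rule of thumb (an assumption), not a vendor-confirmed calculation.
function Get-LunQueueDepth {
    param(
        [int]$PortQueueLimit,   # max outstanding commands the array port accepts (ask your vendor)
        [int]$HostCount,        # ESX hosts sharing that port
        [int]$LunsPerHost       # LUNs each host reaches through that port
    )
    $depth = [math]::Floor([double]$PortQueueLimit / ($HostCount * $LunsPerHost))
    # Never return less than 1; a queue depth of 0 is not a usable setting.
    [int][math]::Max(1, $depth)
}

# Example values are invented for illustration only.
Get-LunQueueDepth -PortQueueLimit 2048 -HostCount 16 -LunsPerHost 32
```

With these made-up numbers the result is 4, but your own port limit, host count and LUN count will differ, so get the real values from your storage vendor.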

The next thing I needed to do was change 96 servers to the correct queue depth and reboot them. I first created a .sh script, but running that by hand on every server was not very clever. So I was thinking: Powershell. Powershell is the way to go. So I created the script:

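$VC = Connect-VIServer "<vCenter FQDN>"
$QUEUEDEPTH = 4     # <---- change to your value!!!!
$ESXHOSTS = Get-VMHost -Server $VC

foreach ($ESXHOST in $ESXHOSTS)
{
    # Connect directly to the ESX host; replace the placeholders with your own credentials
    $VIServer = Connect-VIServer $ESXHOST.Name -User "<administrator role>" -Password "<password>"

    # Emulex driver module: set the per-LUN queue depth
    Get-VMHostModule -Name "lpfc_740" -Server $VIServer | Set-VMHostModule -Options "lpfc_lun_queue_depth=$QUEUEDEPTH"
    # Keep the VMkernel outstanding-request limit equal to the HBA queue depth
    Set-VMHostAdvancedConfiguration -Server $VIServer -Name "Disk.SchedNumReqOutstanding" -Value $QUEUEDEPTH
    # Use LUN resets instead of device resets
    Set-VMHostAdvancedConfiguration -Server $VIServer -Name "Disk.UseDeviceReset" -Value 0
    Set-VMHostAdvancedConfiguration -Server $VIServer -Name "Disk.UseLunReset" -Value 1

    # Close the host connection before moving on to the next host
    Disconnect-VIServer -Server $VIServer -Confirm:$false
}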
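
To check a host after its reboot, something like the following could be used. It is only a sketch: it assumes a direct connection to a single ESX host, the same Emulex module name as in the script above, and that the module object exposes `Name` and `Options` properties. The `<ESX host FQDN>` placeholder is hypothetical, just like the credentials.

```powershell
# Sketch only: reads back the values the script above has set on one host.
# "<ESX host FQDN>" and the credentials are placeholders, replace them with your own.
$VIServer = Connect-VIServer "<ESX host FQDN>" -User "<administrator role>" -Password "<password>"

# Show the option string currently configured on the Emulex module
Get-VMHostModule -Name "lpfc_740" -Server $VIServer | Select-Object Name, Options

# Show the three advanced settings that were changed
foreach ($NAME in "Disk.SchedNumReqOutstanding", "Disk.UseDeviceReset", "Disk.UseLunReset")
{
    Get-VMHostAdvancedConfiguration -Name $NAME -Server $VIServer
}

Disconnect-VIServer -Server $VIServer -Confirm:$false
```

I have not added expected output here, since what comes back depends on your toolkit version and your hosts.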
 

Be aware that this script is written for our servers with an Emulex FC adapter. If you have a QLogic HBA, you need to change "lpfc_740" to the module name of your driver, and the options part needs to be changed as well. For more info, check http://www.vmware.com/pdf/vi3_301_201_san_cfg.pdf pages 107 and 108.

In the end, using this script, I was able to update 96 servers within 30 minutes, awesome! Again, consult your storage vendor before changing the queue depth, otherwise you might end up with an unsupported configuration.


 
Did you know: ESX checks every 20ms whether to migrate a vCPU to another pCPU for optimal workload balance. This is configurable (0ms - 5000ms) with Cpu.MigratePeriod in the Advanced Settings of your ESX server.