Are physical limits insignificant?
Thursday, 25 February 2010 17:30

Today's post is not really a tech post, but more of a discussion starter. The question is: "Are physical limits insignificant?"

The reason I came up with this question was a job last night at one of my customers. We needed to re-patch some ESX servers, and since we didn't want any downtime, we put the hosts in maintenance mode. The hardware was blade enclosures with 16 ESX servers each. While putting all those hosts in maintenance mode and waiting until every VM was VMotioned, I kept time. Fully migrating the whole enclosure took us almost 45 minutes! That's 45 minutes for migrating all those VMs! And the environment isn't that special, with consolidation ratios of about 20:1 to 30:1.

Anyway, I see a lot of trends toward BIGGER, FASTER, LARGER hardware. Take Cisco UCS, which allows for a stunning 384GB of memory. Can you imagine how long it would take to VMotion that? Besides that, how high is the impact when you lose one piece of hardware with all those VMs (or, better yet, lose an enclosure with 4, 8 or 16 blades)?
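
To put a rough number on that question, here is a back-of-the-envelope sketch in Python. It only computes a lower bound: it assumes the full 384GB is in use, that it is copied exactly once at the full line rate, and that there is no protocol overhead or re-copying of changed memory pages. The function name and the link speeds are my own assumptions for illustration, and real VMotions will take longer.

```python
def min_transfer_minutes(memory_gb: float, link_gbps: float) -> float:
    """Lower bound on the time to push memory_gb gigabytes over a link_gbps link.

    Assumes a single copy at full line rate with no overhead (optimistic).
    """
    gigabits = memory_gb * 8          # 1 byte = 8 bits
    seconds = gigabits / link_gbps    # Gb divided by Gb/s gives seconds
    return seconds / 60


# Hypothetical link speeds: a 1 Gbps and a 10 Gbps VMotion network
for link in (1, 10):
    print(f"{link} Gbps: {min_transfer_minutes(384, link):.1f} min")
# prints:
# 1 Gbps: 51.2 min
# 10 Gbps: 5.1 min
```

So even under ideal conditions, emptying one fully loaded 384GB host over a 1 Gbps link would take longer than the 45 minutes we needed for the whole enclosure.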

On the other hand, how high is the risk of a piece of hardware failing when you use redundant power supplies, RAID solutions, etc.? To be honest, I've seen large environments and different brands of hardware, and the only things that break once in a while are a management controller, an HDD (which is in a RAID set anyway), or a small fan that can be easily replaced. So while the impact is very high, the risk is very low. Is it then OK for us to get those gigantic machines and run incredible virtualization ratios?

Well, I think other things come into play when we do. One of them is human error. Ever shut down a host or put it in standby without putting it in maintenance mode first? Ever started a firmware update and seen all hosts shut down one by one? Ever pulled out the 'active' cable while you were thinking it was the standby one? Well, I can't say I did those things, but I've seen them happen. And another thing: perhaps there is an SLA that requires you to be able to empty a whole enclosure within a specific time frame? OK, OK, that's a bit out of line, or is it? How long would it take to remediate ESX servers using Update Manager?

Anyway, luckily for us, tweaking helps a bit. While the maximum number of VMotions at a time is 2, you are able to tweak this to 12, so it would go somewhat faster. Check it out here on Boche's blog.
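
To get a feel for what that setting could mean, here is a minimal sketch, again in Python. Everything in it is an assumption for illustration: the VM count, the 60 seconds per VMotion, and above all the idea that running more VMotions in parallel does not slow down each individual one. In practice the parallel migrations share the same network, so expect a smaller gain.

```python
import math


def evacuation_minutes(vm_count: int, seconds_per_vmotion: float, concurrent: int) -> float:
    """Estimate the time to empty one host when VMs are moved in batches.

    Simplified model: every VMotion takes the same time, and a batch of
    `concurrent` VMotions finishes in the time of a single one.
    """
    batches = math.ceil(vm_count / concurrent)
    return batches * seconds_per_vmotion / 60


# Hypothetical host: 25 VMs, 60 seconds per VMotion
for concurrent in (2, 12):
    minutes = evacuation_minutes(25, 60, concurrent)
    print(f"{concurrent} at a time: {minutes:.0f} min")
# prints:
# 2 at a time: 13 min
# 12 at a time: 3 min
```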

Cheers.

Comments (3)
  • Jose Ruelas  - performance
    What would the performance be if you had 10Gbps instead of 1Gbps?
  • Duncan
    Do keep in mind that both VMotion and SVMotion are significantly faster with vSphere than with ESX 3.x. Besides that, for UCS and HP Flex-10 environments it is not uncommon to give VMotion 2Gb of bandwidth, which should also speed up the VMotion.

    And like you said, setting the number of parallel VMotions to, for instance, 4 will also decrease the time it takes.
  • Yvo Wiskerke  - UCS
    I was reading your comments on the 384GB of memory in UCS. That is only for the full-width blades. The current half-width blades only allow for 96GB, which is an industry average.
    Also, when you're talking about vMotion-ing a lot of VMs, UCS has multiple 10Gb uplinks available that are dynamically used by the whole UCS system and can be controlled by QoS.
    Seems like we need to set up a comparison here.

    The risk of human error when cabling is one of the pros of UCS. UCS eliminates at least 60% of cabling used by other vendors. Cable once and lock the rack and only open it when you need to add or replace blades. No cabling involved in that.

    http://viewyonder.com/2009/07/14/cisco-ucs-dog-food-tastes-nice/

 
Did you know that ESX checks every 20ms whether to migrate a vCPU to another pCPU for optimal workload balance? This is configurable (0ms - 5000ms) via Cpu.MigratePeriod in the Advanced Settings of your ESX server.