Questions from the CTO - HA related
Posted by ~Ray @ 2007-12-09 14:42:08
So our CTO has some questions about HA that I cannot find answers to. They are kind of obscure but valid nonetheless, so here we go:
1. How does HA determine the primary agent in the cluster? Does ESX decide this based on which host is more powerful (CPU, RAM)? Is it the first host in the cluster? Does this ever change, or will the primary agent always remain the primary agent?
2. What would happen in this scenario? Say you have two hosts in a cluster using HA. One ESX host (ESX1) gets a memory failure, but not one that brings the host down. Do all the VMs on this host (ESX1) migrate to the other host (ESX2), assuming there are enough resources? Then the other ESX host (ESX2, with all of the VMs now on it) has a hardware failure, but again not enough to completely bring the host down. So if both ESX hosts have some kind of minor hardware issue, do the VMs constantly bounce back and forth between the hosts? Or does one ESX host take itself out of the cluster at that point, making HA unavailable so that all the VMs would be stuck on ESX2?
3. Does ESX have built-in parameters that determine when to fail over VMs to a new host (not DRS)? Does ESX rate hardware failures at different levels? For example, is a memory error rated higher than a CPU error?
I have searched the forum, looked through the book, etc. The CTO would like answers, and he pays my salary... you know how it is!
HA works on a heartbeat, so any loss of that heartbeat will trigger it. Typically this is the loss of network communication on the service console, which severs the heartbeat and isolates the host server. So things like memory errors or CPU errors that do not take the whole host down will not trigger HA; only a catastrophic failure that brings the host down or causes a loss of network connectivity on the service console will trigger it. The docs below go into detail on HA:
- Automating High Availability (HA) Services with VMware HA
- Effective DRS and HA in Production
- Choosing the HA host destination
- VMware HA with 2 ESX hosts
- Knocking Out Downtime with Two Punches: VMotion & VMware HA
- A Practical Guide to HA
- Das isolationaddress
- HA restart order of VMs with the same priority
- Setting Failure and Isolation Detection Timeout and Multiple Isolation Response Addresses
- HA Technical Best Practices
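For illustration only, here is a minimal Python sketch of the two ideas described above: a host is declared failed only when its heartbeats stop arriving within a timeout, and a host that stops hearing heartbeats checks whether it can still reach an isolation address before deciding that it is the one cut off. This is not VMware's implementation. The names `HeartbeatMonitor` and `is_isolated`, the 15-second timeout, the host names and the IP addresses are all hypothetical, and ping is stubbed out so the sketch runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical timeout-based failure detector (not VMware's code)."""

    # Assumed timeout in seconds; an illustrative value only.
    timeout: float = 15.0
    # Last time (in seconds) a heartbeat was received from each host.
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        # Record that `host` was alive at time `now`.
        self.last_seen[host] = now

    def failed_hosts(self, now: float) -> List[str]:
        # A host counts as failed only if its heartbeat is overdue.
        # Hardware errors that leave the heartbeat running are invisible here.
        return sorted(
            host
            for host, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def is_isolated(
    heartbeats_received: bool,
    isolation_addresses: Iterable[str],
    ping: Callable[[str], bool],
) -> bool:
    """Hypothetical isolation check (not VMware's code).

    A host that still receives heartbeats is not isolated. A host that has
    lost heartbeats treats itself as isolated only if none of the
    isolation addresses answer.
    """
    if heartbeats_received:
        return False
    return not any(ping(addr) for addr in isolation_addresses)


# Stubbed ping: pretend only 10.0.0.1 answers (hypothetical addresses).
reachable = {"10.0.0.1"}


def fake_ping(addr: str) -> bool:
    return addr in reachable


monitor = HeartbeatMonitor()
monitor.heartbeat("esx1", now=0.0)
monitor.heartbeat("esx2", now=0.0)
# esx1 has a memory error but keeps sending heartbeats; esx2 goes silent.
monitor.heartbeat("esx1", now=10.0)
monitor.heartbeat("esx1", now=20.0)
print(monitor.failed_hosts(now=20.0))  # ['esx2']

print(is_isolated(False, ["10.0.0.1", "10.0.0.2"], fake_ping))  # False
print(is_isolated(False, ["10.0.0.2"], fake_ping))  # True
```

In this sketch, esx1's memory error never shows up in `failed_hosts`, because the detector only looks at whether heartbeats arrive, not at the health of the hardware behind them.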
3. Does ESX have built-in parameters that determine when to fail over VMs to a new host (not DRS)? Does ESX rate hardware failures at different levels? For example, is a memory error rated higher than a CPU error?
In reality, this is not going to happen. Memory errors are not passive events that just show up while the system works around them. When you have memory errors, you are going to have a multitude of issues, such as host crashes, VM crashes and extensive data corruption (typically leading immediately to host crashes).
Disk errors can be more benign and manifest themselves over a longer period before causing issues, but bad memory almost always wreaks havoc on a system, which is why the 72-hour burn-in is so important.
If you have a CPU error, your whole host is typically going to crash, as ESX will not handle a physical CPU going away, even if the hardware is designed to let you hot-swap CPUs (e.g. some Sun boxes will do that).
Related article:
http://communities.vmware.com/thread/110827