Article by Borris Sadacca, published on 3 October 2006.
The article is mostly concerned with datacentres and their reliance on reliable equipment and a reliable power supply, including an Uninterruptible Power Supply (UPS). "It is clear that to achieve high availability in the datacentre, IT directors need to look not only at the applications and server infrastructure and service level agreements associated with the IT, but also at the non-IT infrastructure - the mechanical, electrical and plumbing systems that keep the datacentre operational."
It points out that systems designed to be highly reliable are often brought down by human error. Examples quoted include:
* Staff may be required to work after hours, when they are tired.
* A common problem is maintenance staff not following procedures step by step; this is especially likely with experienced personnel.
* System components are replaced even though there are no signs of wear or failure, which creates an opportunity to introduce other failures.
* Invasive checks that require the removal of other components can introduce problems.
"So while technology and multiple levels of redundancy can limit the effect of failure, much of what keeps a datacentre going is down to the people. Many problems can be avoided simply by operating a two-person maintenance team."
Andy Brazier
Thursday, October 05, 2006