Thursday, December 30, 2010

Lessons Learned From Maintenance Mergers

Article at Aviation Week by Heather Baldwin on 23 December 2010

Quoting Hal Heule, president of HMH Consulting, who was senior VP technical operations during the America West-US Airways merger.

"There are three standout issues maintenance organizations should have a plan to address."

1. Communication. The announcement of a merger causes immediate distraction in any workplace, raising questions such as: will I have a job, will it change, will I have to move? "It is important to address them head-on, repeatedly, with solid information. Otherwise, rumors and misinformation will take over and degrade job performance, increasing the likelihood of a maintenance error."

2. Training. Heule’s biggest “lesson learned” from the America West-US Airways merger was in the area of training. “I wish we’d done more of it, and I wish we had done it better,” he says. If he had to do it all again, he would "beef up the training department and slow down the training process, devoting more time, attention and resources to this critical area." When training neglects the “why”, people try to answer it themselves and overlook what they need to learn about the new organisation. If employees are not 100% on board with why the new way will be better, that limits their ability to engage with the material.

3. Integration workload. Not surprisingly, it takes a lot of work to integrate two major airlines. Maintenance leaders cannot expect to handle the added workload and still fully perform their jobs.

Where there are redundancies, consider redeploying personnel to areas such as training, which need more attention. Another option is to split the workload: give one person responsibility for merger issues while another runs the day-to-day airline operations.

Taking the time to manage communications, plan out training that addresses the “why” behind forthcoming changes, and keep all maintenance personnel employed through the merger (even if it means assigning new, merger-related responsibilities) vastly reduces the human factors issues that can lead to error.

Wednesday, December 22, 2010

Motorcycle Simulator Shows Slow Is Not Always Better

Article at AutoEvolution by Alina Dumitrache on 6 December 2010

A study performed by researchers at The University of Nottingham's Centre for Motorcycle Ergonomics & Rider Human Factors used a motorcycle simulator to analyse rider behaviour.

Three groups of riders, namely novice, experienced and those who had taken advanced motorcycle training, were put through the same scenarios.

The findings showed that experience on its own does not make riders safer on the road and in some cases the experienced riders behaved more like novice riders. Advanced riders used better road positioning to anticipate and respond to hazards, kept to urban speed limits, and actually made better progress through bends than riders without the formal advanced training.

“It has demonstrated clear differences between the rider groups and potential benefits to advanced training above and beyond rider experience and basic training. Whilst experience seems to help develop rider skills to an extent, advanced training appears to develop deeper levels of awareness, perception and responsibility. It also appears to make riders better urban riders and quicker, smoother and safer riders in rural settings,” said Alex Stedmon from the Human Factors Research Group.

Why we must always factor in the human element

Article on Arabia Aerospace website by Ali Al Naqbi on 8 December 2010

Increasing automation on the flight deck is supposed to improve safety – but many pilots are questioning whether, in fact, automation overload is putting aircraft at risk. Research into human factors has thrown up some evidence that cultural differences may not be adequately considered in automation design, training, certification, and operations. If they are not factored in, they may have resulting effects on performance and how automation is used.

At the same time, automation design may not be guided by a philosophy that gives adequate attention to the proper role and function of the human and to human capabilities and limitations. This may compromise system effectiveness and safety.

Pilots from different cultural backgrounds should be involved in the basic design of the avionics functions and of the training systems.

IEHF Oil & Gas conference

Taking the time to reflect back on the Institute of Ergonomics and Human Factors (IEHF) conference in November titled Human & Organisational factors in the oil, gas and chemical industries - More information here

Jim Wetherbee, ex Space Shuttle pilot and now working for BP in the US, made a very interesting observation. After the Deepwater Horizon oil platform was lost, relief wells were drilled. These represented the same technical challenges as the original well, and were carried out by the same companies (Transocean drillers, Halliburton cement) yet were delivered ahead of time without incident. So it is not the case that the organisation could not drill for oil in deep waters. But it is easy to forget how dangerous these activities can be, and this complacency leads to critical items being overlooked and forgotten.

The presentation "Not on my shift" by George Petrie (Petrofac) showed how a focus on avoiding problems can work at the sharp end. This involves giving supervisors real ownership of safety, making them feel personally responsible for what happens when they are in charge.

At a higher level, the presentation from Caroline Sugden (HSL) and Peter Jefferies (ConocoPhillips) illustrated the concept of a high reliability organisation, which is one that manages to maintain almost error-free performance. These organisations have a certain set of characteristics including an 'intelligent wariness' and knowing where the 'edges of the safety envelope' are.

I felt these three presentations really gave us an insight into a future direction for safety management, culture etc. They were in contrast to some of the other presentations that gave a good account of current approaches, which may, in my opinion, be based on too simplistic an idea of what really makes the difference between success and an accident.

This was another excellent conference from the Institute, and all presentations were very good. Looking forward to the next in 2012.

Thursday, November 04, 2010

US - PHMSA amendment to Federal pipeline safety regulations to address human factors and other aspects of control room management

Notification of amendment from US Department of Transportation - Pipeline and Hazardous Materials Safety Administration.

Safety regulations are being amended to address human factors and other aspects of control room management for pipelines where controllers use supervisory control and data acquisition (SCADA) systems. Under the final rule, affected pipeline operators must define the roles and responsibilities of controllers and provide controllers with the necessary information, training, and processes to fulfill these responsibilities. Operators must also implement methods to prevent controller fatigue. The final rule further requires operators to manage SCADA alarms, assure control room considerations are taken into account when changing pipeline equipment or configurations, and review reportable incidents or accidents to determine whether control room actions contributed to the event.

This rule improves opportunities to reduce risk through more effective control of pipelines. These regulations will enhance pipeline safety by coupling strengthened control room management with improved controller training and fatigue management.

Effective control of pipelines is one key component of accident prevention. Controllers can help identify risks, prevent accidents, and minimize commodity loss if provided with the necessary tools and working environment. This rule will increase the likelihood that pipeline controllers have the necessary knowledge, skills, and abilities to help prevent accidents. The rule will also ensure that operators provide controllers with the necessary training, tools, procedures, management support, and environment where a controller’s actions can be effective in helping to assure safe operation.

Wednesday, November 03, 2010

Put your back into it - My brilliant career

Article on the South African Times Live by Margaret Harris on 10 October 2010

It is an interview with Dale Kennedy, an ergonomist and director at consulting firm Ergomax.

I particularly liked Dale's answer to the question "What does an ergonomist do?"

"An ergonomist looks at how working environments affect people. We consider what the human body can and wants to do and design the work environment appropriately to minimise risk exposure, optimise efficiency and maximise profits. Ergonomics marries the occupational health and safety of the workers to the fundamental needs of any business - that of making money."

Nuclear submarine freed after running aground off Isle of Skye

Article in The Guardian on 22 October 2010

The Royal Navy's HMS Astute, the world's most advanced submarine, ran aground in familiar waters during an exercise off the Isle of Skye.

The accident is particularly embarrassing as it involves a new state-of-the-art vessel, the largest British nuclear-powered attack submarine ever built for the navy. It cost £1.2bn and is equipped with the latest stealth and sonar technology, making it difficult to detect under the sea.

The causes of the incident are likely to include human error. No details are currently available, but the images of the sub stuck on the sea bed make it worth keeping the reference.

Tuesday, November 02, 2010

Camelford poisoning: 'water authority insisted supplies were safe'

Article in The Guardian 1 November 2010 by Steven Morris

This incident happened in July 1988, but is in the news again because of an inquest into the death of a woman in 2004 that may have been linked to the contamination of drinking water with aluminium sulphate. The incident is often mentioned when talking about human error, but details have not been available before.

The incident occurred when concentrated aluminium sulphate was transferred to the wrong tank at the Lowermoor plant, which supplied a large area of north Cornwall including Camelford. This meant it was present in the drinking water at a much higher concentration than it should have been.

The driver of the delivery tanker told the inquest how he had stepped in at the last minute to take over the delivery. He was told to put his load "in a tank on the left", but he was confused because there were several tanks and manhole covers.

He said he had asked his colleagues to telephone the authority to say he would be running late but when he arrived at Lowermoor no one was there. He said there was no phone available to ring anyone.

According to the BBC on 1 November 2010, the driver told the inquest he had let himself into the works with a key given to him by the regular driver Barry Davey. But he did not know the former South West Water Authority, which ran the works, used the same key at all its plants. He had believed the key would let him into the site and open one tank.

Another article from the BBC on 1 November 2010 explains that the chemical was used to treat cloudy water. As well as the chemical being too concentrated in the drinking water supply, the acidity of the water also released chemicals from the pipe networks into people's homes.

The water company was inundated with around 900 complaints about dirty, foul-tasting water but no warnings were given to the public on the night of the incident on 6 July, 1988.

Local residents subsequently reported suffering health problems, including stomach cramps, rashes, diarrhoea, mouth ulcers, aching joints and some even said their hair had turned green from copper residues.

Friday, October 15, 2010

Common Sense Common Safety

Report by Lord Young of Graffham to the Prime Minister published 15 October 2010 and available from the Number 10 website

Lord Young was given the job of reviewing the operation of health and safety laws and the growth of the compensation culture. The main concern is that the standing of health and safety in the eyes of the public has dramatically reduced.

According to the report, over 800,000 compensation claims were made in the UK during 2009. Some of these were for trivial matters, and the rise of claims management companies seems to have been a driving factor. The problem is that companies, voluntary organisations, schools, emergency services and others are becoming overly risk-averse and bureaucratic because of their fear of compensation claims.

There are quite a number of recommendations, but taken as a whole they cover:

* Reviewing the way that compensation claims can be made, including the role of companies that assist in these claims, to get society to move away from a compensation culture;
* Making sure well-intentioned volunteers (i.e. good Samaritans) are not held liable for consequences that may arise;
* Simplifying compliance processes for low-hazard workplaces, and providing more help so that organisations can easily check and record their compliance;
* Implementing an accreditation scheme for health and safety consultants with the aim of improving professionalism and raising standards;
* Encouraging insurance companies to take a more reasonable approach to minimise the burden on their customers;
* Simplifying processes for schools;
* Providing a means for citizens to challenge local authorities if they want to ban events on health and safety grounds.

The report suggests some aspects of health and safety legislation should be reviewed, but does not appear to advocate significant change. It is the application that is of most concern.

One suggestion that I think is particularly powerful is to "Shift from a system of risk assessment to a system of risk–benefit assessment." I am disappointed that this is only directed towards education establishments and not every organisation, because I think this fundamental shift could significantly improve the way risks are managed in practice.

Wednesday, October 13, 2010

Similar to Snail Mail

Article at the Daily WTF (curious perversions in Information Technology) by Remy Porter on 20 July 2010.

A story relating to a company that sells addresses for use with direct marketing (junk mail). Maintaining lists of addresses is relatively easy, but to have names of residents at those addresses requires more work to acquire and update. Therefore, it is cheaper to use a generic name such as "The resident" or "The car owner." Apparently these generic terms are known as "slug names."

The company in question provided a web based service whereby the direct marketers could go online and download the address lists for a fee. They updated the service so that the customer could choose the cheaper, unnamed list.

The code used by the website checked whether the 'Slug' option was selected, and knew to not include names. However, an oversight meant that an alternative to the name was not supplied, and instead it was labelled "slug." This meant, when mail was sent it was addressed "To the Slug." This was only discovered after several mailings had been sent out.

The underlying cause of this error was in the specification for the code. It simply required the option to be provided to the customer, and did not say anything about how that was to be handled. The code went through full quality assurance, but this simply checked against the original specification.
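The gap between "meets the specification" and "does what was intended" can be sketched in a few lines of Python. This is only an illustration of the kind of oversight described, not the company's actual code: the function names, the record layout and the "The Resident" default are all hypothetical.

```python
# Hypothetical sketch: names, record layout and default text are assumptions.

def addressee_as_specified(record, use_slug):
    """Meets the spec as written: the option exists and names are left out."""
    if use_slug:
        # Nothing said what to print instead, so the internal label leaks out.
        return "the Slug"
    return record["name"]


def addressee_with_generic_name(record, use_slug, slug_name="The Resident"):
    """Same option, but with an explicit generic name for the envelope."""
    if use_slug:
        return slug_name
    return record["name"]


def spec_check(addressee_fn, record):
    """QA against the original spec: no resident name when slug is chosen."""
    return record["name"] not in addressee_fn(record, True)


def outcome_check(addressee_fn, record):
    """QA against the intended outcome: the internal label never gets printed."""
    return "slug" not in addressee_fn(record, True).lower()


record = {"name": "Jane Doe", "address": "1 High Street"}
print(spec_check(addressee_as_specified, record))         # passes the spec
print(outcome_check(addressee_as_specified, record))      # fails on outcome
print(outcome_check(addressee_with_generic_name, record))
```

The first version passes a test written from the specification, which is why quality assurance alone would not have caught the problem. Only a check written from the customer's point of view exposes it.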

Family's Titanic secret revealed

BBC Website 22 September 2010

According to new information from novelist Louise Patten, granddaughter of Titanic's Second Officer Charles Lightoller, the ship hit the iceberg because the helmsman turned the wrong way when ordered to change direction.

The explanation of why such a fundamental error occurred is that the accident happened at a time when ship communications were in transition from sail to steam. Two different systems were in operation at the time, Rudder Orders (used for steam ships) and Tiller Orders (used for sailing ships). Crucially, Mrs Patten said, the two steering systems were the complete opposite of one another, so a command to turn 'hard a-starboard' meant turn the wheel right under one system and left under the other.

It just so happens that the helmsman had been trained on sail, but was steering a steam vessel.
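A small Python sketch shows why two opposite conventions for the same spoken order are so hazardous. The article only says the two systems were opposites. Which one maps to which direction is my assumption here, based on the usual description of tiller orders (the order names the side the tiller moves to, so the ship turns the other way). All names are hypothetical.

```python
from enum import Enum


class Side(Enum):
    PORT = "port"
    STARBOARD = "starboard"

    def opposite(self):
        return Side.STARBOARD if self is Side.PORT else Side.PORT


class Convention(Enum):
    RUDDER = "rudder orders"   # assumed: order names the way the ship turns
    TILLER = "tiller orders"   # assumed: order names the way the tiller moves


def wheel_direction(order_side, convention):
    """Return the side the wheel should be turned towards for a given order."""
    if convention is Convention.RUDDER:
        return order_side
    return order_side.opposite()


# The same spoken order gives opposite actions under the two conventions.
order = Side.STARBOARD
for convention in Convention:
    print(convention.value, "->", wheel_direction(order, convention).value)
```

The words of the order carry no clue as to which convention applies, so a helmsman trained under one and working under the other has nothing to prompt him that his well-practised response is wrong.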

Mrs Patten claims that only a very small number of people knew about this mistake, but they kept quiet because if the White Star Line had been found to be negligent, it would have gone bankrupt and everyone would have lost their jobs.

Tuesday, October 12, 2010

A manager's guide to reducing human error

Publication 770 from the American Petroleum Institute (API). Full title 'A manager's guide to reducing human errors. Improving human performance in the process industries.' Published March 2001 and written by D.K. Lorenzo

Nine years after publication, it would be easy to think that things have moved on and that this may be somewhat out of date. However, I don't think this is the case. I particularly like section 3.1 which lists "examples of error likely situations." They include:

1. Deficient procedures - good procedures help ensure qualified people can operate correctly and safely
2. Inadequate, inoperative or misleading instrumentation - poor instrumentation means workers have to 'fill in the blanks' by deduction and inference when working out what is going on
3. Insufficient knowledge - not just knowing the 'what' and 'how', but also the 'why'
4. Conflicting priorities - particularly between production (which usually has more tangible rewards) and safety
5. Inadequate labelling - useful for new workers, workers who only use the system infrequently or frequent users when in stressful situations
6. Inadequate feedback - if feedback on actions is not prompt, people tend to over-react
7. Policy/practice discrepancies - once any discrepancy is tolerated between what is written and what happens in practice, the workers will use their own judgement to decide which policies are to be applied
8. Disabled equipment - either means workers are not aware of what is happening or are distracted by spurious alarms and trips
9. Poor communication - two way verbal communications to confirm understanding, backed up by written where it is critical
10. Poor layout - if an instrument, control or other item is not located conveniently, it is unlikely to be used as intended or required
11. Violations of population stereotypes - the way people will respond without thinking because of what they are used to in everyday life
12. Overly sensitive controls - people are not very precise and so controls have to be designed with that in mind
13. Excessive mental tasks - the more demanding, the greater chance of error
14. Opportunities for error - if there is the chance for error, even if likelihood is considered low, it will eventually occur
15. Inadequate tools - proper tools expand human capabilities and reduce the likelihood of error
16. Sloppy housekeeping - appearance of a facility is usually perceived as a reflection of management's general attitude
17. Extended, uneventful vigilance - if people need to monitor something for some time (more than 30 minutes) when nothing is happening, their ability to detect something becomes very low
18. Computer control failure - computers are prone to errors in software and operator input, and so whilst having the potential to improve efficiency, safety etc. the risks need to be understood and managed properly
19. Inadequate physical restriction - as long as it does not impede normal operations, interlocks, unique connections and other physical characteristics can reduce error likelihood
20. Appearance at the expense of functionality - during design it is easy to concentrate on aesthetic factors such as consistency and order, whereas how the person will use the system is actually most important

I believe the publication is still available from the API website.

Wednesday, September 22, 2010

Fight - Move - Float

At a recent task analysis workshop I was running, one of the process operators, who was ex-navy, told me that warships have three modes of operation. They are Fight, Move and Float.

It occurs to me that there may be a case to identify explicit modes of operation at industrial facilities. I'm blogging this now as a prompt for me to give it some thought.

I've had a quick google and only found a couple of references to this philosophy.

The Australian Navy's Semaphore Issue 10, August 2008 says these modes are fundamental in the design of warships.

Parliamentary Candidate Richard Mollet refers to it in his blog saying that the Navy use it to decide command priorities.

According to the Wikipedia page for Canadian Navy mottos, "Innare Progredi Bellare" is Latin for "To Float, to Move, to Fight".

Wednesday, September 15, 2010

World Cup TV gaffe 'caused by human error'

According to the BBC Website on 23 August 2010

An "unfortunate error" in a control centre led to ITV HD viewers seeing a car advert instead of Steven Gerrard's goal four minutes into the England v USA match on 12 June.

ITV stated that it took additional measures after the incident to ensure it could not happen again, including manufacturing special covers to avoid accidental activation of equipment.

BA passengers told in error of imminent crash

Article in The Independent by James Corcut 28 August 2010

Passengers on a British Airways flight were told they were about to crash into the sea after the message "This is an emergency. We may shortly need to make an emergency landing on water" was played in error.

BA said it was investigating the incident but denied reports that the message was triggered in error by one of the pilots. A spokesman added: "We would like to apologise to passengers."

If it was not pilot error, I wonder what caused it. If it was computer error, that seems more worrying.

Simulator training flaws tied to airline crashes

Article in USA Today by Alan Levin 31 August 2010

According to the paper's analysis, "Flaws in flight simulator training helped trigger some of the worst airline accidents in the past decade. More than half of the 522 fatalities in U.S. airline accidents since 2000 have been linked to problems with simulators".

One problem is that in rare but critical instances simulators can trick pilots into habits that lead to catastrophic mistakes. For example, many simulators make difficult take-offs, such as in gusty cross-winds, seem far easier than in the real world. But people may not be told that simulators are inaccurate.

According to Kevin Darcy, an aviation safety consultant "It's really important to know how that data is programmed and where the holes are. Otherwise you are fooling yourself."

Simulators are only as good as the data used to program them. Current simulators aren't accurate when a plane goes out of control, which has prevented their use in training for the leading killer in commercial aviation.

Simulator training was cited in some of the deadliest accidents in the past decade. Among them:

• After a Colgan Air plane went out of control and 50 people died near Buffalo on Feb. 12, 2009, the NTSB found that airline simulators needed to be improved to give pilots better training in such emergencies.

• On Nov. 12, 2001, an American Airlines pilot's aggressive use of the rudder caused his jet to break apart, killing 265 people. The NTSB found that an American simulator exercise had given pilots a false sense of how the rudders worked.

Friday, September 10, 2010

Deepwater Horizon - The Human Factors

I've had a quick look at the Accident Investigation Report published 8 September 2010, available from the BP website. There are a number of human factors issues raised that are not new discoveries in the world of process safety, but this does re-emphasise their importance.

I've provided more information below, but the main human factors issues uncovered include:
* Personnel reassured by initial test results so that they did not complete the subsequent test steps that may have shown they had a problem;

* Personnel developing theories that explained what was being observed was acceptable and not a problem;

* Failure to provide detailed procedures for critical tasks, and relying on competence without any assurance that competence was in place;

* Failure to define what level of monitoring is required during critical activities, leaving it to the individual's discretion;

* Personnel undertaking critical tasks being distracted by other activities;

* Not preparing personnel (through training, exercises, instructions) to deal with problems so that they do not know what to do to avoid knock-on effects and escalation.

Recommendations made in the report cover:
* Clarifying practices and procedures for critical activities;

* Improving incident reporting, investigation and close out of the resulting actions;

* Improving risk management and management of change (MOC) processes;

* Enhancing competency programs;

* Establishing BP’s in-house expertise in activities performed by contractors.

For further information see below.

Key Finding 1 of the report is that the annulus cement barrier did not isolate the hydrocarbons. It is clear that the cement requirements were complex. A simulation was run to establish an acceptable slurry design and placement. Some tests were run and passed, but the report suggests that they were not comprehensive. It appears to me that the team were happy with the initial test results, and so did not see the need to do more. This is fairly standard human behaviour. We are usually reassured if everything seems to be OK, and not inclined to look further in case something is wrong. One factor identified in High Reliability Organisations that sets them apart from others is that they are NOT easily reassured that things are OK. They always assume something must be wrong somewhere, and continually search for problems.

Key Finding 3 of the report is that the negative-pressure test was accepted although well integrity had not been established. It is clear that the test was carried out, which involves reducing wellbore pressure below reservoir pressure to check that mechanical barriers are able to prevent hydrocarbons getting where they should not. The report says that the initial findings from the test were not as expected, with 15 barrels of fluid flowing whereas less than 4 would have been considered normal. But further testing seemed to suggest that there was not a serious problem, and a theory of 'annular compression' or 'bladder effect' was used to explain the initial observations. However, there were other possible reasons for what happened, including plugging with solids or human error. Again, it is normal human behaviour to believe the information that tells us what we want or expect to see (i.e. in this case the test being passed) whilst ignoring other information or observations that may be indicating something is wrong or not as expected.

Another aspect of Key Finding 3 was that a detailed procedure was not provided, even though the negative-pressure test was a critical activity. I don't know if it is the case here, but most companies have many procedures, often with lots of detail. But they rarely concentrate on making sure they have the procedures they really need. In reality we need detailed procedures for the most critical tasks, and it is vital that they are actually used. For lower criticality tasks we can rely on competence, providing we have some way of ensuring that competence has been established. The reality for many companies is that the procedures are rarely used and competence is relied on for high and low criticality tasks, with poor systems in place to manage competence.

Key Finding 4 of the report was that the influx of hydrocarbons was not recognised soon enough to rectify the problem. Instructions were in place stating that "the well was to be monitored at all times." It is very common to write in a procedure or instruction that a system or item of equipment has to be monitored. But unless it says how the monitoring has to be carried out, what conditions need to be achieved or what needs to be done in response to events, these instructions are almost worthless. This seems to have been the case in this incident as the evidence suggests that the indications of a problem were not picked up for over 40 minutes. As well as lack of clarity in the instruction to monitor, the crew were probably distracted by a number of other activities taking place at the time.

Key Finding 5 of the report was that well control response actions failed to regain control of the well. It is suggested that the protocols in place did not "fully address how to respond to high flow emergency situations" and that the crew were not sufficiently prepared. Again this is a common weakness. Company procedure and competency systems are often best at addressing routine activities and obvious emergencies (e.g. fire, toxic release) but do little to prepare people for unplanned events that can have serious knock-on effects if not recognised or dealt with properly. Although exercises to test muster and evacuation may be carried out regularly, personnel are rarely put through their paces on how to deal with major process upsets, trips, loss of utilities etc.

Recommendations from the report that relate to human factors most closely include:

* Update and clarify current practices to ensure that a clear and comprehensive set of cementing guidelines and associated Engineering Technical Practices (ETPs) are available as controlled standards. The practices should include, as a minimum:
- Clearly defined mandatory practices.
- Recommended practices and operational guidance.
- Definitions of critical cement jobs.
- Description of the technical authority’s (TA’s) role in oversight and decision making.

* Clarify and strengthen standards for well control and well integrity incident reporting and investigation. Ensure that all incidents are rigorously investigated and that close out of corrective actions are completed effectively.

* Review and assess the consistency, rigor and effectiveness of the current risk management and management of change (MOC) processes practiced by Drilling and Completions (D&C)

* Enhance D&C competency programs to deepen the capabilities of personnel in key operational and leadership positions and augment existing knowledge and proficiency in managing deepwater drilling and wells operations by:
- Defining the key roles to be included in the enhanced competency programs.
- Defining critical leadership and technical competencies.
- Creating a ‘Deepwater Drilling Leadership Development Program.’ The program would build proficiency and deepen capabilities through advanced training and the practical application of skills.
- Developing a certification process to assure and maintain proficiency.
- Conduct periodic assessments of competency that include testing of knowledge and demonstrations of the practical application of skills.

* Develop an advanced deepwater well control training program that supplements current industry and regulatory training. Training outcomes would be the development of greater response capability and a deeper understanding of the unique well control conditions that exist in deepwater drilling. This program should:
- Embed lessons learned from Deepwater Horizon accident.
- Require mandatory attendance and successful completion of the program for all BP and drilling contractor staff who are directly involved in deepwater operations, specifically supervisory and engineering staff, both onshore and offshore.
- Where appropriate, seek opportunities to engage the broader drilling industry to widen and share learning.

* Establish BP’s in-house expertise in the areas of subsea BOPs and BOP control systems through the creation of a central expert team, including a defined segment engineering technical authority (SETA) role to provide independent assurance of the integrity of drilling contractors’ BOPs and BOP control systems. A formalized set of authorities and accountabilities for the SETA role should be defined.

Tuesday, July 20, 2010

Mapwright - practical, simple and flexible ISO 9001

Stumbled across this website from Australian company Mapwright. It gives some great advice for generating simple and effective quality systems.

I don't know anything about the company, but the website gives some great advice. It also reminds us of some of the bad things done in the name of 'quality' over recent years.

Friday, July 16, 2010

The Nature of Human Error

Taken from The Human Contribution: Unsafe Acts, Accidents and Heroic Recoveries

Reason says that error is generally considered to be some form of deviation. These can be:
* From the upright (trip or stumble);
* From current intention (slip or lapse);
* From an appropriate route towards some goal (mistake);
* Straying from the path of righteousness (sin).

A variety of classifications and taxonomies have been developed over the years to explain error, which fall into four basic categories:

1. Intention - Was there an intention before action, was the intention the right one and were the actions performed correct to achieve the intention?

2. Action - Were there omissions, unwanted or unintended actions, repetitions, actions on the wrong object, misorderings, mistimings or merging of actions?

3. Context - Deviating because you are anticipating what is coming or echoing the past, have been primed to follow a pattern that does not work for all circumstances, you are disrupted, or distracted or stressed.

4. Outcome - Looking at errors according to the outcome, which can include inconsequential free lessons, exceedances working near the edge of safe limits, incidents and accidents.

There are many myths about errors. Reason suggests the following:

1. Errors are intrinsically bad - trial-and-error learning is essential if we are to understand novel situations.

2. Bad people make bad errors - in fact it is often the best people who make the worst errors because they may be in positions of responsibility and tend to push the limits by trying out new techniques.

3. Errors are random and variable - we can actually predict the type of error likely to occur based on the situation.

The Human Contribution

Book by James Reason, The Human Contribution: Unsafe Acts, Accidents and Heroic Recoveries. Published by Ashgate, 2008.

Some of this book is really interesting and useful. Equally, I did feel quite a lot of the content could have been cut to make it more readable and focussed on the real issues. I'd say it was not the writing that was the problem, but instead a lack of good editing.

It is certainly a book worth reading, and I will put some further posts here summarising some of the bits I thought were most useful. I'll start with some excerpts from the introduction.

"The purpose of this book is to explore the human contribution to both the reliability and resilience of complex and well-defended systems." However, instead of just taking the normal view that the human in a system is a 'hazard' because of its unsafe acts, the book also explores the role of the human as a 'hero' whose adaptations and compensations have "brought troubled systems back from the brink of disaster." The author says that he believes that learning more about 'heroic recoveries' will be "potentially more beneficial to the pursuit of improved safety in dangerous operations."

Wednesday, July 07, 2010

Overfill Protective Systems - Complex Problem, Simple Solution

Useful paper by Angela E. Summers, available from

It considers the Buncefield, BP Texas City and Longford accidents and the issues related to high levels. The main conclusion is as follows:

"Catastrophic overfills are easily preventable. When overfill can lead to a fatality, follow these 7 simple steps to provide overfill protection:
1. Acknowledge that overfill of any vessel is credible regardless of the time required to overfill.
2. Identify each high level hazard and address the risk in the unit where it is caused rather than allowing it to propagate to downstream equipment.
3. Determine the safe fill limit based on the mechanical limits of the process or vessel, the measurement error, the maximum fill rate, and time required to complete action that stops filling.
4. When operator response can be effective, provide an independent high level alarm at a set point that provides sufficient time for the operator to bring the level back into the normal operating range prior to reaching a trip set point.
5. When the overfill leads to the release of highly hazardous chemicals or to significant equipment damage, design and implement an overfill protection system that provides an automated trip at a set point that allows sufficient time for the action to be completed safely. Risk analysis should be used to determine the safety integrity level (SIL) required to ensure that the overfill risk is adequately addressed. While there are exceptions, the majority of overfill protection systems are designed and managed to achieve SIL 1 or SIL 2.
6. Determine the technology most appropriate for detecting level during abnormal operation. The most appropriate technology may be different from the one applied for level control and custody transfer.
7. Finally, provide means to fully proof test any manual or automated overfill protective systems to demonstrate the ability to detect level at the high set point and to take action on the process in a timely manner."
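
Step 3 can be illustrated with a rough calculation. The sketch below is not from the paper: it assumes a simple subtractive model in which the safe fill limit is the mechanical limit, less the measurement error, less the rise in level during the time needed to stop filling. The function name, parameters and example figures are all hypothetical.

```python
def safe_fill_limit(mechanical_limit_m, measurement_error_m,
                    max_fill_rate_m_per_min, response_time_min):
    """Return an illustrative safe fill level in metres.

    Assumed model (not taken from the paper): start at the mechanical
    limit, subtract the worst-case measurement error, then subtract the
    level rise that occurs while the stop-filling action is completed.
    """
    rise_during_response = max_fill_rate_m_per_min * response_time_min
    limit = mechanical_limit_m - measurement_error_m - rise_during_response
    if limit <= 0:
        raise ValueError("No safe fill level exists for these inputs")
    return limit


# Hypothetical tank: 10 m mechanical limit, 0.1 m gauge error,
# level rising 0.05 m/min at maximum fill rate, 10 min to stop filling.
example_limit = safe_fill_limit(10.0, 0.1, 0.05, 10.0)
print(f"Safe fill limit: {example_limit:.2f} m")
```

A real assessment would use site-specific data and would probably add further margins. The point is only that a slow response or a fast fill rate lowers the level at which filling has to stop.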

Tuesday, July 06, 2010

When risk management goes bad

Article in the Risk Digest on 2 July 2010, which itself was created from an article by Robert Charette available to members of Enterprise Risk Management & Governance

"According to BP PLC's 582-page 2009 spill response plan for the Gulf of Mexico, walruses along with sea otters, sea lions, and seals are among the "sensitive biological resources" that could be harmed by an oil discharge from its operations in the Gulf. The only problem is that walruses, sea otters, sea lions, and seals don't happen to live in the Gulf of Mexico, and haven't for a considerable period of time—like millions of years."

"The spill plan also lists a Japanese home shopping site as one of BP's primary providers of equipment for containing a spill, a dead professor as one of its wildlife experts to consult with in the event of spill, and other outrageous gaffes."

"BP was not alone in worrying about walruses. Chevron, ConocoPhillips, and ExxonMobil's oil discharge response plans in the Gulf of Mexico also listed those poor walruses as potential victims of a spill."

"The US government must have been worried about those walruses, too, since those in government accountable for reviewing and approving the oil companies' response plans didn't say a word about them."

The oil companies had actually outsourced the writing of their oil response plans to a consulting group. Either the organisations did not read the plans or they read them but did not pick up the errors. The latter may be more worrying because it suggests oil companies and the government lack the competence to manage risk.

"It is pretty clear that oil spill risk management wasn't taken seriously at all by BP, or by most of the other major oil companies drilling in the Gulf. In congressional hearings, oil industry officials admitted that the industry is poorly equipped to handle oil spills of any size in the Gulf, and that is why the industry tries to prevent spills from happening. The industry also viewed its oil-well blowout preventers as foolproof safety mechanisms, even though they fail regularly. However, the industry officials also admitted that less than 0.1% of corporate profits are spent on improving offshore drilling technologies, even as the risks of drilling offshore have increased significantly over the past decade."

The author suggests that in the future, whenever risk management is incompetently performed, done just to meet some requirement, isn't taken seriously, or is plain lackadaisical, it should be described by the phrase, "Jumping the Walrus."

Friday, July 02, 2010

Airline pilot vowed to improve NHS safety culture after his wife's death

Article at Wales Today by Madeleine Brindle of the Western Mail, on 28 June 2010

Airline pilot Martin Bromiley is now helping the NHS in Wales to put patient safety at the forefront of everything it does and prevent future fatal mistakes after his wife Elaine died because of clinical errors in her hospital care.

Elaine, aged 37, was admitted for a routine sinus operation, but never regained consciousness and died 13 days later.

After her death, Mr Bromiley was told by the ENT surgeon that they couldn’t have foreseen the complication and that they’d made all the right decisions, but that "it just didn’t work out." But when he realised the death would not be investigated unless he decided to sue, he reflected on the aviation industry, where all accidents are considered avoidable and investigations are thorough and routine, not to place blame, but so that lessons can be learned to ensure it doesn’t happen again.

He persuaded the director of the hospital unit where his wife died that an investigation was necessary. This showed that two minutes into the procedure his wife had turned blue and was struggling to breathe. Four minutes in, she was taking in only 40% oxygen. Six minutes in, the team tried to put a tube down her throat. Ten minutes in, they still couldn’t get the tube in.

Guidelines stated this was an emergency, but the theatre staff continued with their attempts to intubate. This was a very experienced team: "In many ways they were the dream team to deal with something going wrong. So why didn’t they?"

The communication process seemed to have dried up. The lead anaesthetist lost control. Many of the nursing staff seemed to know what needed to be done but were ignored.

Mr Bromiley believes that inadvertent human error caused Elaine’s death and that systems need to be developed and people trained to reduce harm.

He said the NHS needs to look at how humans behave in the system and manage the structure around them to make it as easy as possible for the best service to be delivered.

“In aviation we accept that error is normal – it’s not poor performance and it’s not weakness. If you accept this then you can start to catch error; not hide and deny it. Then you can make a difference.

“If you work in healthcare and you feel something is going wrong you have to speak up. If the team in Elaine’s case had taken a minute to get as many views as possible from the team present, maybe it would have helped. Maybe she would still be here. We will never know.”

Mr Bromiley is now involved in the "1,000 Lives Plus" campaign, which involves patients to ensure that NHS Wales is working together to deliver a safe, quality, productive service.

The new UK government is asking the public to participate in restoring Britain’s traditions of freedom and fairness, and in freeing our society of unnecessary laws and regulations – both for individuals and businesses. One area that seems likely to receive some attention is health and safety.

They have set up a website where anyone can post ideas and comment on others. Those flagged as health and safety can be viewed at

I think the challenge is that many of the problems are not with laws themselves, but the way they are interpreted. People are frightened to do things because they think they may be breaking a law or will be held liable if something goes wrong. This is perpetuated by a press that seems to love to use health and safety as a convenient excuse for many things. A government can change laws, but I am not sure how they can change perceptions.

Fines over taxi firm fatal blast

Article from BBC website on 24 February 2010

A taxi firm owner has been fined £2,400 for failing to protect his employees in relation to the storage of petrol and failing to protect the public, and a petrol station has been fined £7,500 for breaching its petroleum licence, after an explosion in Immingham in which two people died.

Sue Barker, 43, and Ann Mawer, 52, died in the blast at Fred's Taxis in 2007 when petrol on the premises ignited.

Mr Barker, owner of the company and husband of Sue, bought nearly 25 litres of petrol from the service station, using an unapproved container.

He then carried it into the taxi firm's office, which also contained a gas heater and electrical appliances.

The container broke and the petrol spilled and ignited, causing an explosion.

Thursday, June 24, 2010

Pollution Prevention Guidelines PPG21

I was given a link to the updated guidelines on the Environment Agency Website.

Although I don't do a lot of purely environmental work, it is always a consideration, especially in relation to major accidents. These guidelines will be a useful reference when checking company emergency plans to make sure all the environmental aspects have been properly covered.

Monday, June 21, 2010

HSE Human Factors Roadmap

Recently released by the Health and Safety Executive and available at their website

The document presents a graphical framework explaining how companies can address human factors. The introductory paragraph reads "The following framework is intended to guide the reader through a practical approach for linking major accident hazards (MAH) to the assured performance of humans engaged on safety critical tasks associated with those hazards. The framework is presented as a human factors journey with key milestones. For each of the milestones there is a link to human factors topics which may be investigated by Seveso inspectors. Most of these topics are described in more detail in the UK Human Factors Inspectors Toolkit."

The framework works through the following stages:
* Major accident hazard scenarios
* Safety critical tasks
* Task analysis
* Human error analysis
* Procedures
* Training
* Consolidation
* Competence assurance

It shows that if the above are monitored and reviewed the outcome is assured human performance.

The framework includes a side-stream covering maintenance and inspection, branching off from human error analysis. Its stages are
* Engineering/automation
* Maintenance and inspection
* Task analysis
* Human error analysis.

I think this framework is simple and practical, and it is really useful that HSE have set out what they expect. Equally, I am very pleased that it is very close to what I do with my clients!

Monday, June 14, 2010

Group Challenges Proposed Limits on Vial Labeling

Article by Erik Greb on 10 June 2010

US Pharmacopeia’s (USP’s) Nomenclature Expert Committee has proposed that printing on ferrules and cap overseals should be restricted. They felt that healthcare professionals should rely exclusively on package inserts and vial labels for information about drug products. The organization proposed limiting cap messages to a small set of drugs that pose a risk of imminent harm or death in the event of medication errors.

But this has been challenged by the Consortium for the Advancement of Patient Safety (CAPS), which described the proposal as "ambiguous and could unintentionally reduce patient safety."

CAPS hired Anthony Andre, of Interface Analysis Associates and adjunct professor of human factors and ergonomics at San Jose State University, to study the relationship between patient safety and messages on ferrule and cap overseals. A literature review did not find any reported incidents of medication errors that were associated with cap messages, and it was felt that the human-factors principles found in scientific literature did not support the premises of USP’s proposal. An online survey of healthcare practitioners resulted in about 80% of respondents predicting that medication errors would increase if many of the currently allowed cap messages were prohibited and roughly 69% disagreeing with USP’s approach to making warnings more prominent for healthcare professionals.

An empirical human-factors study was carried out. Twenty participants, including nurses, physicians and pharmacists who normally handle drug vials and check drugs against prescriptions, had to select the correct drug from a group of drug vials, some of which had cap labels that would be prohibited by the USP proposal, and some of which did not. Participants selected drugs with cap labels more accurately and more quickly than they selected unlabelled drugs, according to the report. Participants rated the labelled drugs as easy to use more often than the unlabelled drugs.

I can't comment on whether USP or CAPS is right on the subject. But I am concerned that the CAPS study has only looked at the likelihood of error and not the risk. Some drug administration errors can be fatal and irreversible, whilst others are far more minor. It could be the case that reserving this labelling for where it really matters may, as suggested, lead to more errors but may actually reduce risks.

Just How Risky Are Risky Businesses?

A post on Carl Bialik's Numbers Guy blog on 11 June 2010 considers the role of quantified risk assessment in light of the Transocean disaster.

Apparently "BP didn’t make a quantitative estimate of risk, instead seeing the chance of a spill as “low likelihood” based on prior events in the gulf." Other industries such as aviation and nuclear tend to use more quantitative assessments, and clearly the question should be whether BP should have done this.

Jon Pack, described as a 'spokesman', is quoted as saying "If you look at the history of drilling in the Gulf, and elsewhere, blowouts are very low-likelihood, but obviously it’s a high impact, and that’s what you plan for," and that the "industry will need to take a second look at measures put in place to prevent hazards," but said this would likely focus on changing processes rather than on calculating risk.

Barry Franklin, a director in Towers Watson’s corporate risk management practice, is quoted as saying "My recommendation to companies faced with low-probability and high-severity events would be to worry less about quantifying the probability of those events and focus on developing business continuity and disaster recovery plans that can minimize or contain the damage."

The post includes quite a bit about human error. Some sections are summarised below.

By observing people at work, Scott Shappell, professor of industrial engineering at Clemson University, has estimated that 60% of problems caused by human error involve skill failures, such as failures of attention or memory, while 35% involve decision errors — poor choices based on bad information, incomplete knowledge or insufficient experience.

NASA has used similar techniques for decades. Among the biggest components of shuttle risk, according to Robert Doremus, manager of the NASA shuttle program’s safety and mission assurance office, are orbital debris — which has a one-in-300 chance of leading to disaster — main-engine problems (one in 650) and debris on ascent (one in 840), which felled Columbia. Human error is also a factor: There’s a 1 in 770 chance that human error in general will cause a disaster, and a 1 in 1,200 chance of crew error on entry.
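
To get a feel for how these "one in N" figures add up, the sketch below combines them into a single chance that at least one of the events occurs. This is only an illustration and not NASA's method: it assumes the five events are independent and do not overlap, which may well not hold (crew error on entry could already be counted within human error in general). The variable and function names are hypothetical.

```python
from math import prod

# Odds quoted in the post, each a "one in N" chance of disaster.
one_in_n = {
    "orbital debris": 300,
    "main-engine problems": 650,
    "debris on ascent": 840,
    "human error in general": 770,
    "crew error on entry": 1200,
}


def combined_probability(odds):
    """Probability that at least one event occurs, ASSUMING independence."""
    # Multiply the chances of each event NOT happening, then invert.
    return 1 - prod(1 - 1 / n for n in odds.values())


p_any = combined_probability(one_in_n)
print(f"Chance of at least one: about 1 in {1 / p_any:.0f}")
```

Because each individual probability is small, the combined figure comes out only slightly below the simple sum of the five.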

Human error adds to the imprecision. “Human reliability analysis is a challenge, because you could have widespread variability,” said Donald Dube, a senior technical advisor who works on risk assessment for NRC. “But it is founded on real data.”

In nuclear power plants that have been operating for some time, human errors are the most common ones, said Paul Barringer, a consulting engineer and president of Barringer & Associates Inc. “People are at the root” of many risks, Barringer said.

Doug Wiegmann, associate professor of industrial and systems engineering at the University of Wisconsin, Madison, has studied human error in cockpits, operating rooms and other contexts. “The general human-factors issues are the same whether you’re in a cockpit or anywhere else”: communications, technology design and a checklist chief among them.

ITV HD goal miss with adverts blamed on human error

According to Kate McMahon in The Mirror on 14 June 2010

Steven Gerrard scored a goal for England after about 4 minutes in the World Cup game against the USA. Unfortunately people watching the match on ITV HD missed it because an advert was being shown.

Clearly showing the advert at that time was not planned. ITV have blamed "human error" from a French supplier after more than 1.5 million viewers missed out on the first England goal of the World Cup. The Daily Mirror understands it was caused by an operator hitting the switch at the wrong time in the French company's London office.

Angry executives organised a crisis meeting yesterday morning to ensure the gaffe wouldn't happen again.

Windscreen water infection risk

Article on the BBC website by Emma Wilkinson on 13 June 2010

The Health Protection Agency has been studying Legionnaires' Disease and concluded that 20% of the cases experienced may have their source in windscreen wiper water. Yet adding screenwash kills the bacteria and could save lives, the Agency advised.

Legionnaires' disease is fairly rare, with 345 cases reported in 2009. Early symptoms feel similar to flu with muscle aches, tiredness, headaches, dry cough and fever. It mainly affects the over 50s, is generally more common in men and is fatal in around 10-15% of patients.

Most cases are sporadic and a source of the infection is not found. But it was noticed that people who spend a long time driving were at higher risk of infection.

A pilot study found traces of Legionella in 20% of cars that did not have screenwash, but none in cars that did.

The advice is clear: add screenwash to your windscreen washer water.

Sunday, June 13, 2010

Organising events with risks

There is a concern across companies and the wider community that health and safety is stopping people from doing things that are worthwhile because there may be a risk. A short article in Tips and Advice Health and Safety provides a useful summary of the issues.

The article points out that the tabloid press gives the impression that every type of outdoor event has been banned. However, the HSE and other safety bodies are trying to say this is not the case. The problem is that people are not sure what would happen if someone did get hurt and the organisers ended up in court.

A case heard in the High Court earlier in 2010 concerned Robert Uren, who was paralysed when he hit his head on the bottom of a pool whilst taking part in an "it's a knock-out" type of event organised for the RAF. In this case the judge concluded that the organisers were not at fault, recognising that the fun of the game included a degree of physical challenge. He said "a balance has to be struck between the level of risk involved and the benefits the activity confers on the participants."

The article suggests that participation in any potentially dangerous event should be voluntary, that participants should be well informed of the hazards, and that they need to take responsibility for deciding if they are suitably fit and prepared to take part. Using experienced organisers is probably a good idea, but it is still important to make sure they have a good understanding of the risks and hold the appropriate insurance.

Thursday, June 10, 2010

Oil rig culture can breed mistakes

Article by John Hofmeister in the Calgary Herald on 9 June 2010

Reflecting on the Deepwater Horizon blowout and oil spill, he points out that whilst there are a number of possible technical failures that led to the accident, evidence from other major accidents shows that human factors are likely to have made a major contribution.

He describes a deep water drilling rig as "many people, highly skilled, brilliant on the job, with decades of knowledge and comprehension of what they are doing, motivated by high pay and great benefits, working for two weeks on and two weeks off. A deepwater rig is also a village dedicated to a single task, yet organized by small neighbourhoods of specialty skills and independent businesses." It is a good example of an oil industry that has fragmented itself through outsourcing because of economic drivers stemming from oil-price volatility and the anti-competitive requirements of most governments.

Hofmeister identifies chain of command and communications as the two human factors he expects to have been the greatest influence.

"Chain of command in high-risk endeavours is the most important human success factor. It must be clearly understood and must work under all circumstances." But on a drilling rig, individuals do not necessarily know who is in charge. There are multiple chains of command from several different subcontractors, and staff working alongside each other may barely know one another.

"In the worst cases, decision-making can lead to buck passing until no one knows where it stops. Legal contracts set the ground rules for who is responsible for what. When disputes arise, companies disagree, battle or reconcile at higher levels on or even off the platform." Efforts can be made to formalise the various chains, but "People are still people" who work for their own boss, and may have relatively little understanding of the overall operation.

Person-to-person communications is the other factor. "People communicating can be respectful and polite; they can also be demeaning, abrupt or abusive."

Monday, June 07, 2010

IBM distributes virus-laden USB keys at security conference

Article by Asher Moses in the Sydney Morning Herald on 21 May 2010

IBM distributed virus-laden USB keys to attendees at Australia's biggest computer security conference. The incident is ironic because conference attendees include the who's who of the computer security world and IBM was there to show off its security credentials.

Crisis response time measurement

Blog post at Houppermans on 28 April 2010

Here is a simple guide to measure your response time to a crisis:

1. Take a copy of your business continuity plan or guide.
2. Carry it to a safe place.
3. Set fire to it and measure how long it burns.

Speed is essential to deal with a crisis. Reacting appropriately in a timely manner minimises the risk of further escalation, be it a fire, toxic substance release, kidnapping or other grave situation.

Many organisations provide guides that are just too big - one contained almost 200 separate recovery processes, each extensively documented.

The problems with this include:

* Exercise is critical to facilitate smooth, low risk execution. Ensuring so many processes are sufficiently practised presents major challenges.
* Recovery processes must be flexible - changing circumstances will endanger any format that is too prescriptive.
* Processes must be light and focused. Extraneous information distracts and costs time to read, thus delaying appropriate reaction.
* Volume causes critical delay. Starting crisis management with a choice of almost two hundred separate processes loses precious time, with an added risk of choosing the wrong starting scenario. After all, this initial selection is made under stress.

Seven to nine crisis handling processes should cover every need. This number is based on practical experience and on client feedback where recovery processes were exercised or used for real. Good processes are slim, efficient, focused and flexible, stripped of anything that can distract from the actual crisis at hand.

The worst time to discover the process problems is during a crisis…

Video of an early ergonomics study

This is an excerpt from a half hour documentary on the life and work of Frank Gilbreth. Gilbreth lived at the turn of the last century and was a student of Frederick Taylor. He studied work to make it more efficient. This excerpt is about his work to improve bricklaying and find the “one best way” to lay bricks. In doing so he made bricklaying more efficient but also safer. More on his life can be found at the Gilbreth Network website.

Human error at meat plants gives UK beef farmers an annual £14m bonus

Article by Andrew Forgrave in the Daily Post on 18 May 2010

Trials of Video Image Analysis (VIA) show the machines are more accurate than human operators, who are instructed to give farmers the benefit of doubt in an estimated 6% of cases.

The National Beef Association is calling for compensation for affected farmers to cover the £14 million that may be lost if VIA is introduced.

How You Work Can Affect How You Feel

Article by Dr. Jennifer Yang on Health News Digest on 18 May 2010.

It provides a good summary of typical health and medical problems caused by office work.

Computer work may appear to be a low-effort activity when viewed from a total body perspective, but maintaining postures or performing highly repetitive tasks for extended periods can lead to problems in specific areas of the body. They include
* Cervical myofascial pain syndrome, neck and shoulder pain that can be caused by poor posture and muscle overuse when sitting at a computer workstation for prolonged periods of time.
* Rotator cuff disease, affecting the muscles and tendons that hold the shoulder joint in place (the “rotator cuff”). Shoulder pain and weakness limit movement and are typically caused by frequent performance of overhead activities and reaching.
* DeQuervain’s tenosynovitis, an inflammation of the tendons of the muscles moving the thumb, caused by repetitive pinching motions of the thumb and fingers (such as from using joysticks or scissors).
* Ulnar neuropathy at the elbow, which manifests as numbness in the pinkie and ring fingers, hand clumsiness and weakness, and pain from the elbow down the forearm. Symptoms are due to damage to the ulnar nerve that stretches across the elbow joint, and are associated with repetitive elbow movements or prolonged and frequent placement of the elbows on a desk or armrests.
* Carpal tunnel syndrome, the most widely recognized of all cumulative trauma disorders (CTDs), resulting in pain, tingling and numbness from the heel of the hand through the middle finger, sometimes including the wrist; in severe cases, hand grip weakness and clumsiness are also common. Repetitive strain and overuse of the wrist joint cause inflammation of the tendons, which in turn crowd around the median nerve that runs alongside the tendons. Any repetitive motions involving the wrist such as excessive keyboard typing and computer mouse use are common causes of carpal tunnel syndrome.

BA - 'stalinist' bosses and safety concerns

Simon Calder writing in The Independent on 29 May 2010

Safety has been one of the issues raised during the long running dispute at British Airways. One union demand was assurances about cabin-crew rosters on new aircraft, to avoid existing staff being obliged to work aboard "an ageing fleet of old, broken, ill-maintained aircraft". Apparently BA flies an older fleet than most carriers, including its low-cost rivals and even Aeroflot.

The article says "Older aircraft are in no sense unsafe, since they are impeccably maintained by BA's engineers." But Professor Martin Upchurch of Middlesex University Business School believes "an embedded culture of bullying and authoritarianism" by the airline's top management could jeopardise safety.

In a report commissioned by Unite and sent to BA's investors, the Professor of International Employment Relations warns:

"The reporting of 'errors' may diminish if staff feel vulnerable and insecure."

"Employing newer, younger staff on lower terms and conditions may not only affect employee commitment (and customer satisfaction) but also have implications for safety when evaluated through 'critical incidents' or 'human error' reporting."

A spokesman for BA said:

"Safety of our customers and crew are our highest priority and we make no compromises. All of our cabin crew are trained to the highest standards and meet all regulatory requirements."

Professor Upchurch also describes the use of disciplinary action against cabin crew as "being reminiscent of the worse [sic] aspects of methods used by Stalinist secret police".

Writing a good checklist

My last post from Checklist Manifesto by Atul Gawande

Bad checklists are vague, imprecise, too long, hard to use and impractical. They are typically written by people sitting in offices and treat the user as "dumb." They "turn people's brains off rather than turn them on."

Good checklists are the opposite. They provide reminders for the most critical and important steps, the ones that even a highly skilled person could miss. Most importantly, they are practical, helping people manage complex situations by making priorities clear. They have their limitations, and need to be perfected through use.

According to Dan Boorman of Boeing you have to decide what is going to prompt the use of a checklist and what type of checklist is required. The two main types are:
1. Do then confirm - people do the steps from memory then stop and go through the checklist to make sure they have not forgotten anything.
2. Read then do - people follow through the checklist like a recipe, ticking steps off as they do them.

The rule of thumb is to have 6 to 9 items on a checklist (but this can depend on circumstances). Ideally it should fit on one page and be free of clutter and unnecessary colour. Use familiar language. Overall, you have to make sure your checklist achieves the balance between providing help whilst not becoming a distraction from other things.

Gawande uses an example of a checklist for an engine failure on a single-engined aircraft. It only has six steps, but the number one step is "fly the plane." It has been found that pilots can be so desperate to restart the engine that they become fixated and forget to do what they can to survive without an engine.

Sunday, June 06, 2010

More checklists

More from Checklist Manifesto by Atul Gawande

Gawande uses a number of non-medical examples to illustrate the role of checklists.

Hurricane Katrina, which devastated New Orleans, provides examples of what can go well and what can go wrong. The main problem was that there were too many decisions to be made with too little information. However, authorities continued to work as if the normal way of doing things applied. This meant the federal government wouldn't yield power to the state, the state wouldn't yield to local government and no one would involve the private sector. As a result, trucks with vital supplies of water and food were not allowed entry because the authorities did not have them on their plan. Bus requisitions required for evacuation were held up for days. The root of the problem was that people assumed the normal command and control structure would work for any situation and that there would be a big plan that was going to provide the solution. This case was far too complex for that.

Gawande uses Wal-Mart as an example of an organisation that did things much better. Apparently Lee Scott, the chief executive, said in a meeting with upper management "a lot of you are going to have to make decisions above your level. Make the best decision you can with the information that's available to you at the time, and, above all, do the right thing." This was passed down to store managers and set the way for people to react. The initial focus was on the 20,000 employees and their families, but once they were able to function as stores, local managers acted on their own authority to distribute nappies, baby formula, food, toiletries, sleeping bags etc. They even broke into the store pharmacy to supply the hospitals. Senior managers at Wal-Mart did not issue instructions but instead supported the people who were in a position to assist. They found that given common goals, everyone was able to coordinate with others and come up with "extraordinary solutions."

Gawande sees the key message from this as being that under conditions of true complexity, efforts to exert central control will fail. People need to be able to act and adapt. There need to be expectations, co-ordination and common goals. Checklists have a place here to make sure stupid things are not missed, but they cannot tell people what to do.

This is something I can associate with. When suggesting the need for emergency procedures to cover specific types of event I am often given the response that 'you cannot write a procedure to cover everything.' This is something I totally agree with, but I cannot agree that the answer is to provide nothing. Instead, people need brief prompt cards or checklists (of sorts) to help them make the right decisions. Reading these may not be the first thing someone does when confronted with a situation, but they are very useful in training and assessment, and it is likely that others coming to assist can be pointed to the prompt card to make sure nothing has been forgotten.

Gawande uses the example of US Airways Flight 1549, the plane that landed in the Hudson River in 2009 after it flew into a flock of geese, which caused both engines to fail. Captain Chesley Sullenberger was held up as a hero for carrying out the "most successful ditching in aviation history," but he was very quick to point out that the success was down to teamwork and adherence to procedure. Sullenberger's first officer Jeffrey Skiles had nearly as many flying hours under his belt, although fewer on the Airbus A320. Gawande makes the point that this could have been a problem in an incident because both may have been inclined to take control, especially as the two men had never flown together. But before starting engines the two men had gone through the various checklists, which included requiring the team to introduce themselves, a discussion of the flight plan and how they would handle any problems. By having the discipline to go through this right at the start of the flight "they not only made sure the plane was fit to travel but also transformed themselves from individuals into a team, one systematically prepared to handle whatever came their way." This was a crew that had over 150 total years of flight experience, but they still went through the routine checklists, even though none of those involved had ever been in an air accident before.

The aviation industry has learnt from experience. The need for much better teamwork was identified following the 1977 Tenerife plane collision, where the Captain on the KLM plane had total command and the second officer was not able to intervene successfully. But it has also been learnt that checklists have to avoid rigidity or creating the situation where people follow them blindly. In the Hudson River incident the checklist of main focus was engine failure. Sullenberger took control of the plane and Skiles concentrated on trying to restart the engines, whilst also doing the key steps in the ditching procedure, including sending a distress signal and making sure the plane was configured correctly. Sullenberger was greatly helped by systems on the plane that assisted in accomplishing a perfect glide, eliminating drift and wobble, to the point of displaying a green dot on his screen to give a target for optimal descent. All this freed him to focus on finding a suitable landing point. At the same time flight attendants were following their protocols to prepare passengers for a crash landing and to be ready to open doors. Gawande summarises this by saying the crew "showed an ability to adhere to vital procedures when it mattered most, to remain calm under pressure, to recognise where one needed to improvise and where one needed not to improvise. They understood how to function in a complex and dire situation. They recognised that it required teamwork and preparation and that it required them long before the situation became complex and dire. This is what it means to be a hero in the modern era."

The origin of checklists

As promised some time ago, I have summarised parts of the Checklist Manifesto by Atul Gawande

Gawande traces the use of checklists back to 1935 when the US Army Air Corps were looking for a long-range bomber. Boeing developed the model 299, which was significantly faster and had a greater capacity than anything offered by other companies and was nicknamed the 'flying fortress'. However, at a demonstration flight it crashed shortly after take-off. No technical failure was identified, and it was concluded that the crash was caused by the pilot forgetting to release a locking mechanism on the elevator and rudder controls. Some people concluded that the plane was too complicated, and would never be flyable by humans. Douglas won the contract to supply their less able, but less complex plane.

Some in the Army were still keen to use the Boeing 299. They realised that the pilots on the plane that crashed were some of the most experienced pilots in the business, so more training could not be the solution. Instead they came up with the idea of checklists for take-off, flight and landing. These lists were simple, brief and to the point. They were short enough to fit on an index card, but they worked. The Army went on to order nearly 13,000 of the planes, which were dubbed the 'B-17.'

Gawande explains that in complex environments, experts are up against two main difficulties:
1. Fallibility of human memory and attention - especially for mundane routine matters that are easily overlooked when there appear to be more important things to attend to;
2. The tendency to skip checks, even when remembered, because they don't always matter - things that could cause a problem but have never done so in the past.
Checklists help to overcome these difficulties because they provide a reminder of what needs to be done and they instil a "kind of discipline" that makes people do things they may otherwise skip over.

In 2001 a critical care specialist at Johns Hopkins Hospital named Peter Pronovost developed a checklist with the aim of reducing infections by making sure key steps were carried out when inserting a central line (tube inserted into a vein). At first nurses were asked to observe doctors and record whether the steps were performed. The result was that in a third of cases, at least one step was missed. Nurses were then authorised to stop doctors if a step was being missed. This was seen as 'revolutionary' because it gave nurses some power over doctors.

After a year of using the checklist the results were so spectacular that Pronovost was not sure whether to believe them or not. So they kept monitoring. It was calculated that after a little over two years the checklist had prevented 43 infections and eight deaths, and saved $2 million in costs.

Pronovost observed that checklists helped memory recall and clearly set out the minimum steps in a process. He was surprised by the results because he had not realised how often even very experienced people did not grasp the importance of certain precautions. His results were impressive, and he put a lot of effort into spreading the word across the country (speaking in an average of seven cities per month). But others were very reluctant to take up the idea, either because they did not believe the results or because they simply thought they would not need a checklist.

A checklist can be useful, not just if there are lots of steps that can be forgotten or intentionally skipped, but if there are lots of people involved in the task. Taking construction as an example, Gawande explains that in the past the 'Master Builder' designed, engineered and oversaw all aspects of construction for a building. By the middle of the twentieth century this did not work any more, and instead a team of experts and specialists is required. This is where checklists start to become necessary.

The trouble is that some professions have been slow to realise that the nature of the job has changed and become more complex. In medicine doctors don't seem to realise that most patients receive attention from many different specialists. If a checklist or equivalent is not used the result is duplicated, flawed and sometimes completely uncoordinated care.

Monday, May 03, 2010

The Checklist Manifesto by Atul Gawande

I was recommended this book by an old colleague/friend. I have to say I was a little sceptical and was not filled with great confidence by the first couple of chapters. It was not that I thought the author was wrong in any way, but I did think it was going to be another of those business books that has only one relatively trivial thing to say, but fills the book saying the same thing over and over again.

However, I am very pleased to say this book was much better than I expected. It shows how the very simple concept of a checklist, if done well, can have a great impact on reducing risks. But this has less to do with the fact that steps to be performed have been written down and more to do with the cultural change that takes place, with people communicating better, working as a team and accepting that they cannot remember everything.

Equally, the author points out that getting people to accept checklists is not easy. It is difficult to explain why, but it is almost as if the idea is too simple and so people somehow feel it cannot possibly help them improve.

The book has certainly got me thinking, and I will be blogging more over the coming weeks once I have had time to reflect.

The book is available from Amazon (current price £6.50) at

The Checklist Manifesto: How to get things right

Also, as an audio book

The Checklist Manifesto: How to Get Things Right

Tuesday, April 13, 2010

Basic human body measurements for technological design

ISO/TR 7250-2:2010 has been published providing statistical summaries of body measurements together with database background information for working age people in the national populations of individual ISO member bodies. The data are intended for use in conjunction with ISO standards for equipment design and safety, which require body measurement input, wherever national specificity of design parameters is required.

It gives average heights and weights, for example:
* American man: 1.76 m and 80 kg
* Thai man: 1.67 m and 64 kg
* Dutch woman: 1.67 m and 72 kg
* Japanese woman: 1.57 m and 51 kg

Monday, April 12, 2010

‘Alarm fatigue’ linked to patient’s death

Article by Liz Kowalczyk on 3 April 2010

Investigations into the death have concluded that desensitization to alarms was a factor in the patient’s death. The article says this is "a national problem with heart sensors and other ubiquitous patient monitoring devices. Numerous deaths have been reported because of alarm fatigue, as beeps are ignored or go unheard, or because monitors are accidentally turned off or purposely disabled by staff who find the noise aggravating."

I'm well aware of the concerns about alarm overload of control room operators in the process industry. I'm not surprised to find it is an issue in medicine.

available for download

I've taken this summary from an article by Robert (Bob) Waixel on the Risks Digest


Docklands Light Railway (DLR) is an off-street rapid transit light railway
system in London England (it is different from the London Underground or
'Tube' system).

DLR trains are normally run under remote automatic computer control
(monitored by controllers) but from time to time are controlled by a
passenger service agent onboard, at times of so called degraded working. At
the time of the derailment on 10 March 2009 this was the case, as the
automatic signaling had failed at a complex three way intersection. The
person driving (for simplicity referred to as 'the driver' from now on) was
being given instructions by a controller in a control room by radio. When
being manually driven trains can only be driven at a very restricted speed.

There are very few colour light signals on this railway since they are not
needed when trains are being driven automatically. Points (US: switches)
where lines diverge (or converge as in this case) have Point Position
Indicator (PPI) display lights (at ground level) to indicate their
setting. Such setting can also, of course, be confirmed by the position of
the point/switch blades themselves.

In this accident the train ran through a set of trailing points at low speed
and was derailed. There were no injuries and passengers were detrained
rapidly to an adjacent station platform.

Why did it happen?

The interest to RISKS readers lies in the mix of factors that led to the
incident, a mix of technical and human problems, including these:

* Major long term upgrade work on the whole railway caused the signaling
in this complex trackwork area to fail for long periods thus needing
trains to be driven from onboard under manual control (giving a heavy
sustained workload on controllers).

* A software change in the behaviour of the interlocking of signaling and
these points had not been communicated by the upgrade contractor to the
controllers.

* The controller did not fully follow correct procedure in authorising the
train forward.

* The controller did not monitor progress of the train (controller was busy
elsewhere) (their screen was switched to a different type of display).

* The driver did not check the position of the points/switches for their
intended route.

* That type of Point Position Indicator was hard to see by the driver
(management had postponed replacement of them as not being urgent).

* The bulb in the PPI had failed (replacement of failed light bulbs in PPIs
wasn't considered urgent).

* The driver should not have crossed points without correct PPI showing
(driver didn't notice that no indication was showing).


* Equipment that might not be safety critical in 'normal usage' becomes so
in 'abnormal/degraded' working conditions

* People's workloads that might not be safety critical in 'normal usage'
become so in 'abnormal/degraded' working conditions

* If it takes a lot of simultaneous failures for an accident to happen, then
it will happen, sooner or later.

Robert (Bob) Waixel, MBCS, CITP, MCInstM, FHEA, Cambridge, CB4 1JL, UK

Tuesday, March 02, 2010

Ergotron Intros New Sit-Stand Mobile Computer Workstation

Article by Brendan B. Read on 25 February 2010

It starts by saying "Ergonomics experts have long known that enabling employees to stand for even part of the time while working is healthier than sitting the entire time. It also improves performance." Also, it quotes research by Dr. Marc Hamilton at the University of Missouri which indicates that "the simple act of standing may have great health benefits for office workers. The sedentary act of sitting all day actually lowers metabolic rate, negatively impacting other bodily systems."

This article is about a workstation that can be used at different heights to allow working whilst seated or standing. I understand the comments about improving productivity by getting up out of the chair, but I am not totally convinced that being able to work at a computer standing up is really the answer.

Andy Brazier

Why did Ethiopian flight ET409 crash and who is at fault?

Partner and Head of the Aviation Department of that firm, Attorney James Healy-Pratt, answered a number of questions related to this crash, reported on 26 February 2010.

His answers to the question "can you offer any theories, or information on the possible causes of this tragic event?" were as follows:

"There are a number of factors that may have affected the aircraft. In our experience, accidents almost always occur because of a number of factors combining together, rather than a single cause. Potential factors to consider include:

1. Meteorological Factors

These include lightning and turbulence. There have been reports that the aircraft was struck by lightning several times prior to impact. Turbulence alone may have been a contributing factor, particularly if the aircraft encountered a storm cell. While it does not usually cause crashes, it can greatly increase the workload on a pilot.

2. Spatial Disorientation

Spatial disorientation can occur during steep banking or acceleration and is particularly dangerous in night-time and bad weather conditions. This disorientation can result in pilots' perception disagreeing with reality. In these states, if not corrected, pilots can lose control of an aircraft.

3. Engine Failure / Technical Problems

Reports of a fire may indicate that the aircraft suffered engine failure. Aircraft are designed to be able to fly on only a single engine, however if this was complicated by other factors, control may have been lost.

4. Failure of the spoiler actuators

An accident with very similar facts to the Ethiopian Airlines accident is Kenya Airways Flight KQ507 in 2007, where a Boeing 737-800 crashed shortly after takeoff in night-time and bad weather conditions. We were instructed by families in that accident to seek answers and pursue claims for compensation. No accident report has been released. We have commenced litigation in the US as a result of our concerns that the spoiler actuators, hydraulic pumps which control the spoilers, may have jammed or asymmetrically deployed leading to the loss of control of the aircraft."

Tuesday, February 16, 2010

Recent calculation errors

There have been a few reports in the press recently where calculation errors have led to spectacular claims. They include:

The Intergovernmental Panel on Climate Change (IPCC) have admitted that the figure of 55% of the Netherlands lying below sea level quoted in a 2007 report was incorrect. In fact the correct figure is 26%. The report used data for areas of the country prone to flooding, which includes land alongside rivers that is above sea level. Reported in the Guardian by Robin McKie on 14 February 2010

In the same IPCC report it was claimed that Himalayan glaciers could melt away by 2035. Apparently this was based on "poorly substantiated estimates of rate of recession" and was unfounded. Reported in the Guardian by Damian Carrington on 20 January 2010

The UK Conservative Party claimed in a dossier that 54% of girls in the 10 most disadvantaged areas of UK get pregnant before they turn 18. The figure was actually 5.4%. Reported on the BBC website 15 February 2010

Many people have been issued the wrong tax code, which may result in them paying extra tax. Seven different errors have been reported, which are being blamed on the introduction of a new computer system and errors made in bringing data forward from older systems. Reported on the BBC website 8 February 2010

Nodar Kumaritashvili dies at Winter Olympics

There will be many articles about the tragic death of Nodar whilst practising the luge at the Vancouver Winter Olympics. This blog from James Pearce of the BBC gives a good summary of the issues to consider. It appears that Nodar made a mistake, but was the course to blame for the consequences? And exactly where do you put the balance of risk for a dangerous sport like the luge?