Wednesday, September 22, 2010

Fight - Move - Float

At a recent task analysis workshop I was running, one of the process operators, who was ex-Navy, told me that warships have three modes of operation: Fight, Move and Float.

It occurs to me that there may be a case to identify explicit modes of operation at industrial facilities. I'm blogging this now as a prompt for me to give it some thought.

I've had a quick google and only found a couple of references to this philosophy.

The Australian Navy's Semaphore, Issue 10 (August 2008), says these modes are fundamental to the design of warships.

Parliamentary candidate Richard Mollet refers to it in his blog, saying that the Navy uses it to decide command priorities.

According to the Wikipedia page on Canadian Navy mottos, "Innare Progredi Bellare" is Latin for "To Float, to Move, to Fight".

Wednesday, September 15, 2010

World Cup TV gaffe 'caused by human error'

According to the BBC website on 23 August 2010:

An "unfortunate error" in a control centre led to ITV HD viewers seeing a car advert instead of Steven Gerrard's goal four minutes into the England v USA match on 12 June.

ITV stated that it took additional measures after the incident to ensure it could not happen again, including manufacturing special covers to avoid accidental activation of equipment.

BA passengers told in error of imminent crash

Article in The Independent by James Corcut 28 August 2010

Passengers on a British Airways flight were told they were about to crash into the sea after the message "This is an emergency. We may shortly need to make an emergency landing on water" was played in error.

BA said it was investigating the incident but denied reports that the message was triggered in error by one of the pilots. A spokesman added: "We would like to apologise to passengers."

If it was not pilot error, I wonder what caused it. If it was computer error, that seems more worrying.

Simulator training flaws tied to airline crashes

Article in USA Today by Alan Levin 31 August 2010

According to the paper's analysis, "Flaws in flight simulator training helped trigger some of the worst airline accidents in the past decade. More than half of the 522 fatalities in U.S. airline accidents since 2000 have been linked to problems with simulators".

One problem is that, in rare but critical instances, simulators can trick pilots into habits that lead to catastrophic mistakes. For example, many simulators make difficult take-offs, such as in gusty cross-winds, seem far easier than they are in the real world. But pilots may not be told that the simulators are inaccurate.

According to Kevin Darcy, an aviation safety consultant, "It's really important to know how that data is programmed and where the holes are. Otherwise you are fooling yourself."

Simulators are only as good as the data used to program them. Current simulators aren't accurate when a plane goes out of control, which has prevented their use in training for the leading killer in commercial aviation.

Simulator training was cited in some of the deadliest accidents in the past decade. Among them:

• After a Colgan Air plane went out of control and 50 people died near Buffalo on Feb. 12, 2009, the NTSB found that airline simulators needed to be improved to give pilots better training in such emergencies.

• On Nov. 12, 2001, an American Airlines pilot's aggressive use of the rudder caused his jet to break apart, killing 265 people. The NTSB found that an American simulator exercise had given pilots a false sense of how the rudders worked.

Friday, September 10, 2010

Deepwater Horizon - The Human Factors

I've had a quick look at the Accident Investigation Report published on 8 September 2010 and available from the BP website. There are a number of human factors issues raised that are not new discoveries in the world of process safety, but the report does re-emphasise their importance.

I've provided more information below, but the main human factors issues uncovered include:
* Personnel being reassured by initial test results, so that they did not complete the subsequent test steps that might have shown they had a problem;

* Personnel developing theories to explain why what was being observed was acceptable and not a problem;

* Failure to provide detailed procedures for critical tasks, and relying on competence without any assurance that competence was in place;

* Failure to define what level of monitoring is required during critical activities, leaving it to the individual's discretion;

* Personnel undertaking critical tasks being distracted by other activities;

* Not preparing personnel (through training, exercises and instructions) to deal with problems, leaving them unsure what to do to avoid knock-on effects and escalation.

Recommendations made in the report cover:
* Clarify practices and procedures for critical activities;

* Improve incident reporting, investigation and close-out of the resulting actions;

* Improve risk management and management of change (MOC) processes;

* Enhance competency programs;

* Establish BP’s in-house expertise in activities performed by contractors.

For further information see below.

Key Finding 1 of the report is that the annulus cement barrier did not isolate the hydrocarbons. It is clear that the cement requirements were complex. A simulation was run to establish an acceptable slurry design and placement. Some tests were run and passed, but the report suggests that they were not comprehensive. It appears to me that the team were happy with the initial test results, and so did not see the need to do more. This is fairly standard human behaviour. We are usually reassured if everything seems to be OK, and are not inclined to look further in case something is wrong. One factor identified in High Reliability Organisations that sets them apart from others is that they are NOT easily reassured that things are OK; they always assume something must be wrong somewhere and continually search for problems.

Key Finding 3 of the report is that the negative-pressure test was accepted although well integrity had not been established. It is clear that the test was carried out; it involves reducing wellbore pressure below reservoir pressure to check that the mechanical barriers are able to prevent hydrocarbons getting where they should not. The report says that the initial findings from the test were not as expected, with 15 barrels of fluid flowing where fewer than 4 would have been considered normal. But further testing seemed to suggest that there was not a serious problem, and a theory of 'annular compression' or 'bladder effect' was used to explain the initial observations. However, there were other possible reasons for what happened, including plugging with solids or human error. Again, it is normal human behaviour to believe the information that tells us what we want or expect to see (in this case, that the test had been passed) whilst ignoring other information or observations that may be indicating something is wrong or not as expected.

Another aspect of Key Finding 3 was that a detailed procedure was not provided, even though the negative-pressure test was a critical activity. I don't know if it is the case here, but most companies have many procedures, often with lots of detail, yet they rarely concentrate on making sure they have the procedures they really need. In reality we need detailed procedures for the most critical tasks, and it is vital that they are actually used. For lower-criticality tasks we can rely on competence, provided we have some way of ensuring that competence has been established. The reality for many companies is that procedures are rarely used and competence is relied on for both high- and low-criticality tasks, with poor systems in place to manage competence.
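
The rule of thumb in the paragraph above can be written down as a simple decision rule. This is a minimal sketch, assuming a two-level criticality scale and a yes/no flag for whether competence has actually been assessed; the names `Criticality` and `required_control` are hypothetical and do not come from the BP report or any standard.

```python
from enum import Enum


class Criticality(Enum):
    # Hypothetical two-level scale; a real task analysis may use more levels.
    HIGH = "high"
    LOW = "low"


def required_control(criticality: Criticality, competence_assured: bool) -> str:
    """Return the control a task needs under the rule of thumb described above."""
    if criticality is Criticality.HIGH:
        # Critical tasks: a detailed procedure that is actually used,
        # however competent the person doing the task is.
        return "detailed procedure, used every time"
    if competence_assured:
        # Lower-criticality tasks can rely on competence,
        # but only where competence has been established.
        return "rely on assured competence"
    # Relying on competence that nobody has checked is the gap described above.
    return "gap: establish and assess competence first"


# Example: the negative-pressure test was a critical activity.
print(required_control(Criticality.HIGH, competence_assured=True))
```

The point of writing it this way is that competence alone never satisfies the rule for a critical task, which is exactly the situation the report describes for the negative-pressure test.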

Key Finding 4 of the report was that the influx of hydrocarbons was not recognised soon enough to rectify the problem. Instructions were in place stating that "the well was to be monitored at all times." It is very common to write in a procedure or instruction that a system or item of equipment has to be monitored. But unless the instruction says how the monitoring has to be carried out, what conditions need to be achieved and what needs to be done in response to events, it is almost worthless. This seems to have been the case in this incident, as the evidence suggests that the indications of a problem were not picked up for over 40 minutes. As well as the lack of clarity in the instruction to monitor, the crew were probably distracted by a number of other activities taking place at the time.
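
To make the point concrete, an instruction to monitor can be treated as incomplete unless it states the method, the conditions and the responses. The sketch below is hypothetical (the `MonitoringInstruction` class and its fields are illustrative, not taken from the BP report) and simply flags which of those three elements an instruction is missing.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringInstruction:
    """Hypothetical structure for an instruction to monitor a system."""
    target: str
    method: str = ""  # how the monitoring is carried out
    conditions: list[str] = field(default_factory=list)  # what conditions need to be achieved
    responses: dict[str, str] = field(default_factory=dict)  # event -> action to take

    def missing_elements(self) -> list[str]:
        """List the elements the instruction needs but does not state."""
        missing = []
        if not self.method:
            missing.append("how to monitor")
        if not self.conditions:
            missing.append("conditions to achieve")
        if not self.responses:
            missing.append("response to events")
        return missing


# An instruction that only says the well is to be monitored "at all times".
vague = MonitoringInstruction(target="well")
print(vague.missing_elements())
```

An instruction like the one quoted in the report would fail on all three counts, which is the sense in which it is "almost worthless".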

Key Finding 5 of the report was that well control response actions failed to regain control of the well. It is suggested that the protocols in place did not "fully address how to respond to high flow emergency situations" and that the crew were not sufficiently prepared. Again, this is a common weakness. Company procedure and competency systems are often best at addressing routine activities and obvious emergencies (e.g. fire, toxic release) but do little to prepare people for unplanned events that can have serious knock-on effects if not recognised or dealt with properly. Although exercises to test muster and evacuation may be carried out regularly, personnel are rarely put through their paces on how to deal with major process upsets, trips, loss of utilities, etc.

Recommendations from the report that relate to human factors most closely include:

* Update and clarify current practices to ensure that a clear and comprehensive set of cementing guidelines and associated Engineering Technical Practices (ETPs) are available as controlled standards. The practices should include, as a minimum:
- Clearly defined mandatory practices.
- Recommended practices and operational guidance.
- Definitions of critical cement jobs.
- Description of the technical authority’s (TA’s) role in oversight and decision making.

* Clarify and strengthen standards for well control and well integrity incident reporting and investigation. Ensure that all incidents are rigorously investigated and that close-out of corrective actions is completed effectively.

* Review and assess the consistency, rigor and effectiveness of the current risk management and management of change (MOC) processes practiced by Drilling and Completions (D&C)

* Enhance D&C competency programs to deepen the capabilities of personnel in key operational and leadership positions and augment existing knowledge and proficiency in managing deepwater drilling and wells operations by:
- Defining the key roles to be included in the enhanced competency programs.
- Defining critical leadership and technical competencies.
- Creating a ‘Deepwater Drilling Leadership Development Program.’ The program would build proficiency and deepen capabilities through advanced training and the practical application of skills.
- Developing a certification process to assure and maintain proficiency.
- Conducting periodic assessments of competency that include testing of knowledge and demonstrations of the practical application of skills.

* Develop an advanced deepwater well control training program that supplements current industry and regulatory training. Training outcomes would be the development of greater response capability and a deeper understanding of the unique well control conditions that exist in deepwater drilling. This program should:
- Embed lessons learned from the Deepwater Horizon accident.
- Require mandatory attendance and successful completion of the program for all BP and drilling contractor staff who are directly involved in deepwater operations, specifically supervisory and engineering staff, both onshore and offshore.
- Where appropriate, seek opportunities to engage the broader drilling industry to widen and share learning.

* Establish BP’s in-house expertise in the areas of subsea BOPs and BOP control systems through the creation of a central expert team, including a defined segment engineering technical authority (SETA) role to provide independent assurance of the integrity of drilling contractors’ BOPs and BOP control systems. A formalized set of authorities and accountabilities for the SETA role should be defined.