Friday, December 28, 2007

Aircraft Maintenance Incident Analysis

CAA PAPER 2007/04 Aircraft Maintenance Incident Analysis - Published by the Civil Aviation Authority, December 2007

The paper is an analysis of a selection of maintenance related events on jet aircraft above 5,700kg MTOW, captured and stored under the requirements of the CAA’s Mandatory Occurrence Reporting (MOR) scheme, to identify trends, themes and common causes or factors.

It presents a taxonomy that looks useful. It has three main categories:

1. Maintenance Control – An event attributed to an ineffective maintenance control system.

2. Incomplete Maintenance – An event where the prescribed maintenance activity is prematurely terminated. In these circumstances the correct maintenance procedures appear to have been followed but something was not removed, not fitted or not set correctly towards the end of the process.

3. Incorrect Maintenance Action – An event where the maintenance procedure was completed but did not achieve its aim through the actions or omissions of the maintainer. In these circumstances it appears that an incorrect maintenance procedure or practice was being used. This has resulted in a larger number of second level descriptors than Incomplete Maintenance, but includes the actions of not removing, not fitting or not setting something correctly by virtue of not performing the task correctly, rather than as an error of omission.

Each category is broken down further as follows, showing the results of the analysis.

1. Maintenance Control (Total 733):
* Scheduled task - 223 - 30·4%
* Inadequate tool control - 84 - 11·5%
* Deferred defect - 81 - 11%
* Airworthiness data - 78 - 10·7%
* Tech log - 67 - 9·2%
* Airworthiness Directive - 66 - 9%
* Modification control - 55 - 7·5%
* MEL interpretation - 37 - 5%
* Configuration control - 23 - 3·1%
* Certification - 13 - 1·8%
* Component robbery - 6 - 0·8%

2. Incomplete Maintenance (Total 602):
* Not fitted - 268 - 44·5%
* Not set correctly - 229 - 38%
* Not removed - 105 - 17·5%

3. Incorrect Maintenance (Total 1589):
* Incorrect fit - 619 - 39%
* Not set correctly - 447 - 28·1%
* Incorrect part - 160 - 10·1%
* Poor maintenance practice - 94 - 5·9%
* Procedure not adhered to - 83 - 5·2%
* Not fitted - 78 - 4·9%
* Incorrect repair - 62 - 3·9%
* Incorrect procedure - 24 - 1·5%
* Not removed - 22 - 1·4%
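
The percentages above are each descriptor's share of its category total. A minimal sketch of that arithmetic, using the counts listed above (the helper name `shares` is my own, not something from the CAA paper):

```python
# Counts copied from the three breakdowns above.
maintenance_control = {
    "Scheduled task": 223, "Inadequate tool control": 84, "Deferred defect": 81,
    "Airworthiness data": 78, "Tech log": 67, "Airworthiness Directive": 66,
    "Modification control": 55, "MEL interpretation": 37,
    "Configuration control": 23, "Certification": 13, "Component robbery": 6,
}
incomplete_maintenance = {
    "Not fitted": 268, "Not set correctly": 229, "Not removed": 105,
}
incorrect_maintenance = {
    "Incorrect fit": 619, "Not set correctly": 447, "Incorrect part": 160,
    "Poor maintenance practice": 94, "Procedure not adhered to": 83,
    "Not fitted": 78, "Incorrect repair": 62, "Incorrect procedure": 24,
    "Not removed": 22,
}


def shares(counts):
    """Return each descriptor's share of the category total, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for title, counts in [
    ("Maintenance Control", maintenance_control),
    ("Incomplete Maintenance", incomplete_maintenance),
    ("Incorrect Maintenance", incorrect_maintenance),
]:
    print(title, sum(counts.values()))
    for name, pct in shares(counts).items():
        print(f"  {name}: {counts[name]} ({pct}%)")
```

The category totals reproduce exactly; a few recomputed shares may differ from the listed figures in the last digit, presumably because of how the originals were rounded.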

Unfortunately the analysis found that information regarding underlying causes is rarely reported. This significantly limits the value of the analysis, and is something the industry needs to address.

Andy Brazier

Introducing new technology

Abbey targets cost and service gains from IT overhaul - article on computerweekly.com on 4 December 2007 by Karl Flinders.

Abbey have developed 'The Partenon' banking platform to replace 30-year-old legacy computer systems and provide the bank with a single view of its customers for the first time. It is hoped to reduce costs to the business by £300m. Abbey has consolidated all of its customer records on to a single database. Eliminating duplication has allowed the bank to reduce the number of customer records it stores from 52 million to 20 million.

The article goes on to say "Training and getting users to buy into projects is an important competency which is often overlooked in the banking sector, according to Ralph Silva, analyst at TowerGroup."

He said human error is responsible for 40% of the failures of major IT projects in the European banking sector. Only 5% are caused by problems with the technology.

"Almost every major failure of any significant IT project in the European financial services sector can be attributed to human error," said Silva. "The human element is always the last one to be considered, and yet it is the highest cause of failure."

Abbey's training programme

* Face-to-face tuition and e-learning on tools and ways of working delivered to 25,000 staff
* Support for staff in branches and contact centres
* Dedicated single point of contact helpline
* Comprehensive pilots before full roll-out
* Post implementation consolidation training
* Training includes "contingency" processes to minimise service disruption
* Training is piloted with focus groups
* Senior Abbey management sent to Santander to meet colleagues and see Partenon working.

Andy Brazier

Adverse drug reactions

Allergy to medicines 'is killing thousands' - Article in the Times Online on 27 December 2007 by David Rose.

Nearly 3,000 patients have died in the past three years as a result of taking medicines intended to help them, official figures show. Thousands more have been hospitalised after suffering harmful side-effects or serious allergic reactions to prescription drugs and other medications.

Drugs most commonly implicated in adverse reactions include low-dose aspirin, diuretics, the anticoagulant drug warfarin and other nonsteroidal antiinflammatory drugs. The most common problem associated with these medications is gastrointestinal bleeding, which can be fatal. But many of the reactions were likely to be because of incorrect dosages or known interactions of the drugs and as such were avoidable, research suggests.

Teresa Innes, 38, lapsed into a coma in September 2001 after a surgeon at Bradford Royal Infirmary prescribed a drug containing penicillin as she was about to undergo a routine procedure to drain fluid from an abscess on her thigh. Despite wearing a red allergy band on her wrist and medical notes giving warning about her acute aversion to the antibiotic, Mrs Innes was given the drug Magnapen, which staff did not realise contained penicillin.

The former care worker suffered anaphylactic shock, which stopped her heart for 35 minutes, resulting in permanent brain damage. She was left in a persistent vegetative state from which she never recovered. She died two years later.

This is a good example of how complex it is for someone to become competent in a task. In this case it seems likely that everyone knew about Teresa's allergy, but did not have deep enough knowledge of the drug. Given the number of drugs used in health care this is hardly surprising. Some form of job aid could probably help, if people would use it in practice.
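
As a rough illustration of what such a job aid might do, here is a minimal sketch that checks a product name against a patient's recorded allergies. The lookup table, function name and data layout are all hypothetical; the single entry reflects only what the article says about Magnapen, and a real aid would need a maintained clinical data source.

```python
# Illustrative only: a tiny ingredient lookup, not a clinical reference.
# The one entry reflects the article's statement that Magnapen contains penicillin.
PRODUCT_INGREDIENT_CLASSES = {
    "magnapen": {"penicillin"},
}


def allergy_conflicts(product, patient_allergies):
    """Return the recorded allergies that the named product would trigger.

    Raises KeyError for an unknown product rather than guessing it is safe.
    """
    key = product.strip().lower()
    if key not in PRODUCT_INGREDIENT_CLASSES:
        raise KeyError(f"No ingredient data for {product!r}; check manually")
    allergies = {a.strip().lower() for a in patient_allergies}
    return PRODUCT_INGREDIENT_CLASSES[key] & allergies


conflicts = allergy_conflicts("Magnapen", ["Penicillin"])
if conflicts:
    print("Do not give: patient is allergic to", ", ".join(sorted(conflicts)))
```

The point of refusing to answer for an unknown product is that the aid should never turn a gap in its data into false reassurance.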

Andy Brazier

Wednesday, December 12, 2007

Ergonomics society oil and gas conference - part 5

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Andrew Hopkins gave a presentation entitled "Thinking about process safety indicators." Andrew is very well known for his book "Lessons from Longford" which gives a fascinating account of organisational failures related to Esso's fire and explosion in Australia.

Andrew made a number of very good points in his presentation. He talked about the 'Heinrich triangles' which suggest that for every fatal accident there will be 10 major injuries, 100 minor injuries, 1000 near misses etc. He said this gives the impression that reducing the rate of minor incidents can influence the likelihood of a major accident. However, this is not the case, and a separate triangle is required that only covers process safety incidents, so that for every major accident there are 10 major process disturbances, 100 minor process disturbances and 1000 near misses. There may be a very small overlap on the bottom level of the personal and process safety triangles.
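
The two triangles can be written out as a small sketch. The factor of 10 per level comes from the figures quoted above; the function name and the idea of holding the triangles as separate dictionaries are my own illustration.

```python
def triangle(levels, ratio=10):
    """Map each level (top first) to its nominal count per one top-level event."""
    return {level: ratio ** i for i, level in enumerate(levels)}


# Two separate triangles: counts in one are not meant to predict the other.
personal_safety = triangle(
    ["fatal accident", "major injuries", "minor injuries", "near misses"]
)
process_safety = triangle(
    ["major accident", "major process disturbances",
     "minor process disturbances", "near misses"]
)

for name, tri in [("Personal safety", personal_safety),
                  ("Process safety", process_safety)]:
    print(name)
    for level, count in tri.items():
        print(f"  {count:>5} {level}")
```

Keeping the two as separate structures mirrors the point of the talk: a fall in minor injuries says nothing about the process safety triangle, which needs its own data.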

Andrew's main point was that we have become overly concerned with the difference between leading and lagging indicators of safety performance. This distinction is quite artificial and not as clear cut as it may appear. Instead what we need is more process safety indicators. It does not really matter if they are leading or lagging, as they only need to occur with sufficient frequency to give statistically relevant data. To be effective the indicators need to show how well barriers or defences are working and performing.

An interesting suggestion from Andrew was that manager bonuses should be linked to process safety, although it must be done in a way that does not cause 'perverse outcomes' whereby the act of measuring leads to data being hidden. Any personal incentives should be symbolic and public (e.g. cinema pass).

Andy Brazier

Ergonomics society oil and gas conference - part 4

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Ian James of HSE presented the 7 step approach to managing human factors:

1. Consider main site hazards
2. Identify human activities for these (e.g. bulk transfers, maintenance, startup, reactor charging)
3. Outline key steps in these activities (remember to talk to operators)
4. Identify potential human failures for key steps (slips, mistakes and violations)
5. Identify performance influencing factors that make failure more likely (job, person, organisation)
6. Use the hierarchy of control (don't rely on humans as the last line of defense, but automation introduces new issues)
7. Manage error recovery (makes it more likely that errors will be detected by others or the system)

HSE expect companies to take a structured approach, focused on the human role in initiating and mitigating major hazards, that considers all error types (unintentional and decision failures, as well as intentional and action failures). They expect operators to be involved, and that management failures are considered. HSE prefer a qualitative approach, and do not expect quantification of risks related to human factors.

Andy Brazier

Ergonomics society oil and gas conference - part 3

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Isadore (Irv) Rosenthal gave a presentation titled 'BP's Texas City accident - are the lessons taught likely to be learned and implemented?' Irv had been a member of the Baker Panel that investigated the management and organisational failures that contributed to this accident. I have blogged findings from the report previously, and Irv covered many of these points. However, his presentation provided further insight, which is summarised below.

It is easy to see BP as a large, highly profitable company and wonder why money was not being spent to improve safety. Whilst this is true, the refinery arm of the business made a relatively small contribution to the overall profit, well below that of exploration and production. It is estimated that the accident has cost BP over $2.5 billion in fines, settling claims and, most significantly, lost opportunity. It also had a very negative impact on stock/share prices for up to 18 months.

The findings from the Baker Panel report should not have been a surprise to the company, because many similar issues had been raised by reports of the accidents at BP Grangemouth Refinery in 2000. For example, quoting from reports:

1. Grangemouth - "Insufficient management attention and resources were given to maintaining and improving technical standards for process operations and enforcing adherence to standards, codes of practice, company procedures and HSE guidance"
1. Texas City - "Process safety, operations performance, and systematic risk reduction priorities had not been set and consistently reinforced by management."

2. Grangemouth - There was a need to build awareness and competencies in process safety and integrity management within senior leadership and the organisation in order to develop a meaningful value conversation around cost versus safety. "There was a lack of experience in some areas, and limited refresher training plans."
2. Texas City - The Texas City Refinery suffers from an "inability to see risks and, hence, tolerance of a high level of risk. This is largely due to poor hazard/risk identification skills throughout management and the workforce, exacerbated by a poor understanding of process safety...There was no ongoing training program in process hazards risk awareness and identification for either operators, supervisors or managers."

3. Grangemouth - "With no formal structure or specific focus on process safety, many of the components of process safety management (PSM) were not formalised at Grangemouth. There was no site governance structure to provide overview and assurance that process safety issues were being handled appropriately. Process safety needed to be elevated to the same level as person safety."
3. Texas City - "The investigation team was not able to identify a clear view of the key process safety priorities for the site or a sense of a vision or future for the long term. Focus (was) on environment and personal safety, not process safety. There was little ownership of PSM through the line organisation."

4. Grangemouth - "BP group and Complex Management did not detect and intervene early enough on deteriorating performance....Inadequate performance measurement and audit systems, poor root cause analysis of incidents, and incorrect assumption about performance based on lost time accident frequencies and a lack of key performance indicators.. meant the company did not adequately measure the major accident hazard potential."
4. Texas City - "The safety measures focused primarily on occupational safety measures, such as recordable and lost time injuries. This focus on personal safety had led to the sense that safety was improving at the site. There was not clear focus or visibility on measures around process safety, such as lagging indicators on loss of containment, hydrocarbon fires, and process upsets."

5. Grangemouth - "Over the years, a number of maintenance and reliability reviews, task forces, and studies had been conducted, but many recommendations had not been implemented. There was a maintenance backlog and mechanical integrity testing was not prioritised to ensure that safety critical equipment received timely preventative maintenance."
5. Texas City - Risk awareness "repeated failures to complete recommended actions from audits, peer reviews and past incident investigations." "There is currently a backlog of unclosed action items in the tracking database related to various aspects of process safety management, including those stemming from incident investigation. Some of the latter extend back over a period of more than twelve months."

In conclusion Irv felt BP will learn from Texas City because:

1. Everyone at the company felt very bad about the accident and it had had a major financial and public relations impact.
2. The board had recognised that good process safety also improves product quality, yields, profits and the public image needed to keep its license to operate and win oil leases.
3. Unions, neighbours, regulatory agencies and political concerns will motivate more action.
4. BP are implementing process safety that should lead to better process safety practices.

I hope he is right in his conclusions!!

Andy Brazier

Ergonomics society oil and gas conference - part 2

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Trevor Kletz gave a presentation titled '25+ years of human factors and process safety.' Although I have heard him speak many times and read some of his books, his message is (unfortunately) still very relevant to many.

In this presentation he recounted that in the 1960s it was believed that 80% or more of accidents were due to people not taking enough care, and so the methods used following an accident were to 'persuade' people to be more careful. The actual action taken depended entirely on the consequences, not potential consequences, and ranged from a 'friendly word' through to dismissal "pour encourager les autres."

Trevor's key message was that one element of human factors that is still not getting enough attention is design. Lessons about design are not being learnt, and so opportunities to engineer-out human error are being missed. His examples included:

* Avoid people falling down stairs by only building bungalows. OK, so this may not be possible, but if staircases have one or more turns in them, the distance that can be fallen is significantly reduced;
* At Bhopal the substance that caused the harm to so many people was an intermediate. It was convenient to store it, but not essential;
* Piper Alpha occurred in part because oil and gas is separated offshore, yet it is technically possible to carry out this step onshore;
* Nitration is a common but very hazardous reaction used to make amines. No other process is known, but no one has ever looked for one;
* The new Pendolino trains have a major problem with toilets leaking. This is because the waste materials (which are corrosive) are stored at roof level and when they leak create very bad smells.

Trevor's message was that we are still missing simple fixes during design. Perhaps if accident reports were discussed critically by designers, some of these problems that cause human error would be avoided.

Andy Brazier

Ergonomics society oil and gas conference - part 1

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Martin Anderson opened the conference by giving an idea of where industry should be heading. Of particular note was his negativity towards behavioural safety. Not because there is anything particularly wrong with it, but because too many companies think using such a programme means they have 'done' human factors.

Martin showed a poster ad from the air force. It read "It takes about 80,000 rivets, 30,000 washers, 10,000 screws and bolts to help make this aircraft fly...... and only one nut to destroy it." Martin made it clear that this was NOT a useful message. Individuals rarely have much influence over the factors that make it more or less likely they will make an error, and so telling people to 'be careful' makes very little difference.

We all know that culture is an important part of human factors, but Martin made the point that we can think this refers to 'operator culture' when in fact it is the 'organisational culture' that we need to be looking at. He quoted the following examples from major accidents:

* Poor competency assurance - Esso Longford
* Poor user interfaces - Texaco Pembroke
* Failure to learn from the past - Mexico City
* Poor maintenance management - Bhopal
* Inadequate management of change - Flixborough
* Poor communications - Piper Alpha
* Poor implementation of safety policy - Kings Cross fire

Martin made the point very forcibly that behavioural safety does not equal human factors. Behavioural approaches:

* Focus on observable behaviours only
* Draw attention away from process safety issues
* Don't address the significant impacts of management behaviour
* Can make a contribution to safety, but have limited benefits for the control of major hazards.

In particular it is not appropriate to focus on employee behaviour or culture when the organisation has insufficient resources, inappropriate priorities, does not plan work effectively, has not assessed risks, has poor control over contractors, does not invest capital, has inadequate procedures and competency assurance etc.

Martin finished with a quote from Winston Churchill

"To look is one thing, to see what you look at is another
To understand what you see is another
To learn from what you understand is something else.
But to act on what you learn is all that really matters"

Andy Brazier

Friday, November 23, 2007

Humorous Communication Errors - QI

Last week's episode of the BBC comedy QI included a couple of classic stories from the panel.

Alan Davies explained how he had the opportunity to go behind the scenes at London Zoo, and took his young nieces along. At the lion enclosure the keeper gave very clear instructions that standing near the smaller mesh was safe but near the bigger mesh was not (or vice versa). When asked if they understood, the young girls said yes. On proceeding, one of the girls asked Alan "what is mesh?"

On a similar note, Bill Bailey recounted an opportunity he had to go into a big cat enclosure in a Brazilian zoo. The keeper said "always approach the cat from the front." As Bill made his way into the enclosure the keeper said "sorry, I meant to say never approach from the front."

They made me laugh

Andy Brazier

Missing disks

It is very big news in the UK this week that HM Revenue and Customs have lost two CDs containing benefit details of 25 million people when posting them to the National Audit Office. There are lots of news articles, but this one from the BBC is a good place to start.

I think there are a number of worrying things about this case. Not particularly that the disks got lost, as that is entirely predictable. I would expect arrangements are made to minimise the likelihood of it happening, but it can never be zero.

Instead I am concerned that little effort seems to have been made to protect the data. As I understand it, the data was not encrypted, so could be quite easy to extract by someone who knew what they were doing. Also, some of the data was not necessary but it was considered too expensive to remove it. I would have thought the need to send this type of information to the National Audit Office would be known, and hence that databases etc. would have been set up to allow it to happen safely and easily. It seems that is not the case.

I am also dismayed that the government are so quick to deny systemic failures and blame junior members of staff for not following procedures. What a terrible attitude towards organisational responsibility.

Andy Brazier

Monday, November 12, 2007

LEAN - 5s

LEAN manufacturing has cropped up a few times in various conversations I have had. I must find out more about it sometime. However, one related technique is known as 5s, and I think it has some safety and human factors application.

The five S's are Japanese words, but they translate quite well:

Seiri - Sort/Tidiness - Throw away all rubbish and unrelated materials in the workplace

Seiton - Simplify/Orderliness - Set everything in proper place for quick retrieval and storage

Seiso - Systematically clean/Cleanliness - Clean the workplace; everyone should be a janitor

Seiketsu - Standardisation - Standardize the way of maintaining cleanliness

Shitsuke - Sustain/Discipline - Practice 'Five S' daily - make it a way of life; this also means 'commitment'

I am sure there are many references to this. Here is one.

Andy Brazier

Tuesday, October 16, 2007

Stress

Article called "Relax? Don't do it" by Catherine Quin published in the Guardian Office Hours supplement on 15 October 2007

The article covers the debate about whether some amount of stress is good for you. Research from the universities of Kentucky and British Columbia has shown that moderate amounts of stress can strengthen the immune system, thought to relate to the primeval fight or flight response that protected early humans from injury sustained in a stressful encounter. However, any more than controlled bursts of stress have a negative impact.

Stress can be "acute," which is bouts interspersed with periods of calm. However, "chronic" stress means you don't get the periods of calm. For example, you can't switch off when you get home from work or lie awake at night worrying.

Stress can cause over stimulation of the adrenal gland which interferes with cortisol levels, which in turn disrupts waking and sleep patterns. The result can be migraines, hypertension, lowered immunity and depression.

The consequences of stress are closely related to the individual's perception. Even top performers get stressed, but they have identified strategies to control the symptoms and harness the stress to help them perform better. To do this you need to first recognise you have a choice in how you respond to stress. You then need to be able to recognise the effects stress has on you and then learn how to control these effects.

Andy Brazier

The role of consultants

An article in The Times2 on 15 October 2007 by Joe Joseph titled 'Modern morals.'

It asks if you are sitting on a bus or train next to someone doing their homework and you can see them answering a question wrong, should you correct them?

Alexander Pope is quoted as saying "to err is human, to forgive is divine." He did not say "to err is human, but to correct is divine" because no one likes an interfering know-it-all stranger.

There are certain groups of people we pay to criticise others. They include teachers, judges and management consultants. For the latter, bosses pay consultants to break uncomfortable news (e.g. restructuring). This leaves the boss free to implement the changes they planned all along and the consultants get the blame. The consultant is not being paid for their corporate insights, but to keep quiet about the Machiavellian subterfuge.

The article finishes by asking whose job is it to criticise outside of work, school etc.? The answer being "spouses."

Andy Brazier

Friday, October 12, 2007

Aviation industry humour

These appear in many places on the internet. They make me laugh and I have blogged them as I am sure I will have a use for them one day when providing human factors training. I don't know the original source, but I copied them from this website

Qantas Maintenance Humor
After every flight, pilots fill out a form called a gripe sheet, which conveys to the mechanics problems encountered with the aircraft during the flight that need repair or correction. The mechanics read and correct the problem, and then respond in writing on the lower half of the form what remedial action was taken, and the pilot reviews the gripe sheets before the next flight. Never let it be said that ground crews and engineers lack a sense of humor. Here are some actual logged maintenance complaints and problems as submitted by Qantas pilots and the solution recorded by maintenance engineers. By the way, Qantas is the only major airline that has never had an accident.

(P = the problem logged by the pilot.)
(S = the solution and action taken by the engineers.)

P: Left inside main tire almost needs replacement.
S: Almost replaced left inside main tire.

P: Test flight OK, except auto-land very rough.
S: Auto-land not installed on this aircraft.

P: Something loose in cockpit.
S: Something tightened in cockpit.

P: Dead bugs on windshield.
S: Live bugs on back-order.

P: Autopilot in altitude-hold mode produces a 200 feet per minute descent.
S: Cannot reproduce problem on ground.

P: Evidence of leak on right main landing gear.
S: Evidence removed.

P: DME volume unbelievably loud.
S: DME volume set to more believable level.

P: Friction locks cause throttle levers to stick.
S: That's what they're there for.

P: IFF inoperative.
S: IFF always inoperative in OFF mode.

P: Suspected crack in windshield.
S: Suspect you're right.

P: Number 3 engine missing.
S: Engine found on right wing after brief search.

P: Aircraft handles funny.
S: Aircraft warned to straighten up, fly right, and be serious.

P: Target radar hums.
S: Reprogrammed target radar with lyrics.

P: Mouse in cockpit.
S: Cat installed.

P: Noise coming from under instrument panel. Sounds like a midget pounding on something with a hammer.
S: Took hammer away from midget.

All too rarely, airline attendants and pilots make an effort to make the in flight safety lecture and other announcements a bit more entertaining. Here are some real examples of intentional and unintentional humor that have been heard or reported:

1. On a Southwest flight (SW has no assigned seating, you just sit where you want) passengers were apparently having a hard time choosing, when a flight attendant announced, "People, people we're not picking out furniture here, find a seat and get in it!"

2. On a Continental Flight with a very "senior" flight attendant crew, the pilot said, "Ladies and gentlemen, we've reached cruising altitude and will be turning down the cabin lights. This is for your comfort and to enhance the appearance of your flight attendants."

3. On landing, the stewardess said, "Please be sure to take all of your belongings. If you're going to leave anything, please make sure it's something we'd like to have."

4. "There may be 50 ways to leave your lover, but there are only 4 ways out of this airplane"

5. "Thank you for flying Delta Business Express. We hope you enjoyed giving us the business as much as we enjoyed taking you for a ride."

6. As the plane landed and was coming to a stop at Ronald Reagan, a lone voice came over the loudspeaker: "Whoa, big fella. WHOA!"

7. After a particularly rough landing during thunderstorms in Memphis, a flight attendant on a Northwest flight announced, "Please take care when opening the overhead compartments because, after a landing like that, sure as hell everything has shifted."

8. From a Southwest Airlines employee: "Welcome aboard Southwest Flight 245 to Tampa. To operate your seat belt, insert the metal tab into the buckle, and pull tight. It works just like every other seat belt; and, if you don't know how to operate one, you probably shouldn't be out in public unsupervised."

9. "In the event of a sudden loss of cabin pressure, masks will descend from the ceiling. Stop screaming, grab the mask, and pull it over your face. If you have a small child traveling with you, secure your mask before assisting with theirs. If you are traveling with more than one small child, pick your favorite."

10. "Weather at our destination is 50 degrees with some broken clouds, but we'll try to have them fixed before we arrive. Thank you, and remember, nobody loves you, or your money, more than Southwest Airlines."

11. "Your seat cushions can be used for flotation; and, in the event of an emergency water landing, please paddle to shore and take them with our compliments."

12. "As you exit the plane, make sure to gather all of your belongings. Anything left behind will be distributed evenly among the flight attendants. Please do not leave children or spouses."

13. And from the pilot during his welcome message: "Delta Airlines is pleased to have some of the best flight attendants in the industry. Unfortunately, none of them are on this flight!"

14. Heard on Southwest Airlines just after a very hard landing in Salt Lake City the flight attendant came on the intercom and said, "That was quite a bump, and I know what y'all are thinking. I'm here to tell you it wasn't the airline's fault, it wasn't the pilot's fault, it wasn't the flight attendant's fault, it was the asphalt."

15. Overheard on an American Airlines flight into Amarillo, Texas, on a particularly windy and bumpy day: During the final approach, the Captain was really having to fight it. After an extremely hard landing, the Flight Attendant said, "Ladies and Gentlemen, welcome to Amarillo. Please remain in your seats with your seat belts fastened while the Captain taxis what's left of our airplane to the gate!"

16. Another flight attendant's comment on a less than perfect landing: "We ask you to please remain seated as Captain Kangaroo bounces us to the terminal."

17. An airline pilot wrote that on this particular flight he had hammered his ship into the runway really hard. The airline had a policy which required the first officer to stand at the door while the passengers exited, smile, and give them a "Thanks for flying our airline." He said that, in light of his bad landing, he had a hard time looking the passengers in the eye, thinking that someone would have a smart comment. Finally everyone had gotten off except for a little old lady walking with a cane. She said, "Sir, do you mind if I ask you a question?" "Why, no, Ma'am," said the pilot. "What is it?" The little old lady said, "Did we land, or were we shot down?"

18. After a real crusher of a landing in Phoenix, the attendant came on with, "Ladies and Gentlemen, please remain in your seats until Capt. Crash and the Crew have brought the aircraft to a screeching halt against the gate. And, once the tire smoke has cleared and the warning bells are silenced, we'll open the door and you can pick your way through the wreckage to the terminal."

19. Part of a flight attendant's arrival announcement: "We'd like to thank you folks for flying with us today. And, the next time you get the insane urge to go blasting through the skies in a pressurized metal tube, we hope you'll think of US Airways."

20. Heard on a Southwest Airlines flight: "Ladies and gentlemen, if you wish to smoke, the smoking section on this airplane is on the wing and if you can light 'em, you can smoke 'em."

21. A plane was taking off from Kennedy Airport. After it reached a comfortable cruising altitude, the captain made an announcement over the intercom, "Ladies and gentlemen, this is your captain speaking. Welcome to Flight Number 293, nonstop from New York to Los Angeles. The weather ahead is good and, therefore, we should have a smooth and uneventful flight. Now sit back and relax... OH, MY GOD!" Silence followed, and after a few minutes, the captain came back on the intercom and said, "Ladies and Gentlemen, I am so sorry if I scared you earlier. While I was talking to you, the flight attendant accidentally spilled a cup of hot coffee in my lap. You should see the front of my pants!" A passenger in Coach yelled, "That's nothing. You should see the back of mine."

Tuesday, October 09, 2007

Ergonomics checklist

A very useful information sheet is available from Bosch Rexroth at their website.

Andy Brazier

Six sigma applied to safety improvement

Article "A Safer Way to Manufacture" by Walt Rostykus. Published on the Industry Week website September 2007

Goodyear wanted to improve safety in their tyre manufacturing process. They mapped out a strategy that was linked to Goodyear's continuous improvement process and applied Six Sigma's five steps to safety:

* Define: Establish a common goal for improvement and metrics to track process. Establish needed resources including a support infrastructure.
* Measure: Identify and assess tasks for ergonomic risk. Determine the level of exposure to risk.
* Analyze: Evaluate and identify hazards. Evaluate new tools and processes for risk.
* Improve: Control risks and hazards in the workplace. Validate reduction of risk.
* Control: Monitor, review and maintain controls.

Given the number and diversity of Goodyear operations, officials decided the long-term plan would focus on select locations each year, and be initiated in four phases:

* Establish common tools and approach. In addition to the Ergonomics Process Standard, officials selected common assessment and tracking tools to ensure consistent measurement and tracking. Consultant engineers conducted workshops to engage both plant leadership and leaders of the ergonomics process. Together, they developed implementation plans for their respective sites.
* Engage associates and make quick improvements. Consultant engineers led rapid improvement activities to make quick, simple changes in the workplace. This approach engaged associates, improved the workplace quickly, and started the momentum for the ergonomics process.
* Establish a sustainable improvement process. Key associates took special training to develop the skills to conduct ergonomic risk assessments and design/implement solutions in the workplace. This phase established a sustainable improvement process that could continue long after the consultants left the plant.
* Follow up and audit the process. Finally, officials audited each ergonomic process against the criteria to ensure the plant met company expectations.

To improve the chances of success, Goodyear started with five pilot plants. They were selected for different reasons, including: high incidence of work-related musculoskeletal disorders (need to do it), interest of plant management (want to do it), type of operations/products (opportunities for improvement), and agreement with labor (commitment to do it).

A Systems Approach

Based on the pilot implementations, Goodyear designed an Ergonomics Center of Excellence (ECOE) model, which allows for a systematic rollout that includes:

* Site visits by an ergonomics consultant. The purpose is to align expectations with the project charter.
* Conduct RAPID events. RAPID events are a form of Kaizen tactical activity that makes swift, measurable and relevant improvements to the workplace, eliminating non-value-added work elements.
* Follow-up audits to ensure that the process aligns with Goodyear's internal process document.
* Training for all team members, which includes plant manufacturing, functional leadership and floor employees.

Andy Brazier

Simulator training alone not enough, experts warn

Excerpts from an article by Jennifer Harrington September 24, 2007. Available from AINonline

Human error is a contributing factor in 60 to 80 percent of all air incidents and accidents, according to FAA statistics. Advisory Circular 120-51E states that many “problems encountered by flight crews have very little to do with the technical aspects of operating in a multi-person cockpit. Instead, problems are associated with poor group decision-making, ineffective communication, inadequate leadership and poor task or resource management.” The facts also show that relatively few corporate flight departments routinely address issues such as human factors and crew resource management (CRM).

Steve Hopkins, chief instructor and senior partner at Century CRM (Booth No. 1217), a pilot-oriented resource management training provider, said part of the problem stems from the fact that most training programs have been developed using outdated data. “Historically, back in the 1950s, ’60s and ’70s, aircraft weren’t as reliable as they are today,” he said. “If the engine or equipment failed, you needed to know what to do.” As technology advanced, however, hardware failures declined. Unfortunately, “the human factors have stayed pretty constant. People still make the same stupid mistakes,” he said.

“For most operators, 100 percent of their training budget is focused on the simulator, which addresses 20 percent of the accidents,” said Gary Rower, founder of Century CRM. “The human factors, which cause 80 percent of the accidents, go unaddressed.”

Andy Brazier

Monday, October 08, 2007

The tragic human cost of NHS baby blunders

Article in the Observer by Denis Campbell on Sunday September 23, 2007. Available from their website

Quote - "Errors and negligence that result in stillbirths or disabled babies are costing Britain's hospitals billions in compensation. In this investigation, The Observer reveals how staff shortages are wrecking the lives of countless parents"

This article lists a number of tragic cases where errors by medical staff have led to death or severe handicap to babies during birth. However, I can't see how the conclusion that these errors are caused by staff shortages has been made.

I am concerned that the NHS fails to learn from the mistakes that take place. It almost seems that they are expected to say that they need more staff, and are using these errors as a justification. This seems pretty bogus to me. Until the NHS starts to understand the root causes of error they will not be able to learn.

Another quote from the article - "The Department of Health insist that England has a good record on births. Gwyneth Lewis, the Department of Health's chief adviser on childbirth, says: 'Due to the skill and expertise of our midwives and doctors, England is one of the safest places to have a baby.'"

Perhaps it is a case that the principles of risk management do not apply in the medical profession.

Andy Brazier

Ergonomics and the bottom line

"Ergonomics and Economics: Why ergonomics makes a lot of sense from a dollars-and-cents standpoint and why it may be inevitable because of legislation."
By M. Franz Schneider, published in Office Ergonomics May/June 1985 and available at this web address

A group of 123 office workers was selected to investigate the impact of ergonomic furniture on productivity. For eight months before any design changes, workers kept diaries of time spent on various tasks. The absenteeism rate, the number of errors per document and the time to complete tasks were monitored. The workers were given checklists which they completed every half hour, describing their postural comfort and perceived well-being.

Workers participated in the selection of furniture through user evaluations, development of layouts, and determination of finishes and accessories. The performance measures were continued for six months after the design changes.

Results were impressive: Monday morning absenteeism dropped from 7 per cent to less than 1 per cent. Over-all absenteeism fell from 4 per cent to less than 1 per cent. Error rates in document preparation fell from 25 per cent to 11 per cent. The percent of the day computer equipment was in use increased from 60 to 86. These results signified an increase in active work time of more than 40 per cent. Reports of postural discomfort showed a marked drop in frequency, severity and duration.
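As a rough check on these figures, the relative changes can be worked out directly. The sketch below is illustrative only: the function name is my own, and the "less than 1 per cent" absenteeism figures are left out because the article does not give exact values.

```python
def relative_change(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Figures quoted in the article (both in per cent)
equipment_use = relative_change(60, 86)  # share of the day computer equipment was in use
error_rate = relative_change(25, 11)     # error rate in document preparation

print(f"Equipment use: {equipment_use:+.1f}%")  # +43.3%
print(f"Error rate:    {error_rate:+.1f}%")     # -56.0%
```

The rise from 60 to 86 is a relative increase of about 43 per cent, which is consistent with the "more than 40 per cent" quoted for active work time.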

The subjective ratings that managers made of their own performance indicated that more than 70 per cent felt that their effectiveness had improved "very much." Ninety per cent subjectively rated the productivity of their employees as "much improved."

It is suggested that the fact that the study started 8 months prior to the design changes should mean the "observer effect" was minimised because performance only improved after the design changes were made. Also, the productivity improvements endured after the study team was no longer on-site.

Other studies have demonstrated similar benefits:

The performance of State Farm Insurance clerical workers improved as much as 15 per cent with ergonomically acceptable work stations and seating (Dr. T.J.Springer).

Laboratory work showed that the keystroke rate for data-entry tasks increased five per cent when workers were moved from an ergonomically unacceptable environment to one that was ergonomically correct (Dr. Marvin Dainoff).

The performance of office workers at Blue Cross-Blue Shield was shown to improve with the move to an ergonomically enhanced environment, resulting in an overall productivity improvement of 4.4 per cent.

The Norwegian State Institute showed that improvements to work station layout and seating halved back-related absenteeism and reduced turnover from 40 per cent to 5 per cent.

At a major automobile company, management workers used their computer equipment less than 12 per cent of the day. After the introduction of ergonomic computer tables and an improved chair, the VDT-use rate went up four times. Time taken to complete reports and memos was reduced, and the quality of correspondence was rated as being higher. More significantly, the average management worker had at least three more hours per week of time for work. Time that had initially been eaten up by the tedious clerical/management interface was freed by the use of "user-friendly" computer equipment. A telemarketing group reported an increase from ten per cent to 80 per cent on final closings of sales after the change to ergonomically enhanced office furnishings and improvements to the acoustics and lighting of the environment.

Conclusion

People generally work only 60 per cent of the working day, or about 288 minutes. A 5 per cent improvement would provide 14 minutes of productive work per day, 14 fewer minutes of back discomfort and getting up to wander around the office, and 14 more minutes to review reports. There would be 14 fewer minutes of re-doing memos that have been processed incorrectly and 14 minutes for new work, 14 fewer minutes of frustration with screen glare and 14 more minutes of effective programming.
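The arithmetic behind the 14 minutes can be sketched as follows. The 8-hour (480-minute) working day is my assumption; the article does not state it, but it is what the 288-minute figure implies. The function names are also my own.

```python
WORKING_DAY_MINUTES = 8 * 60  # assumption: 8-hour day, implied by 288 / 0.60 = 480
PRODUCTIVE_FRACTION = 0.60    # from the article: people work about 60% of the day
IMPROVEMENT = 0.05            # from the article: a 5 per cent improvement


def productive_minutes(day_minutes=WORKING_DAY_MINUTES, fraction=PRODUCTIVE_FRACTION):
    """Minutes of the working day actually spent working."""
    return day_minutes * fraction


def extra_minutes(improvement=IMPROVEMENT):
    """Additional productive minutes per day gained from the improvement."""
    return productive_minutes() * improvement


print(f"Productive minutes per day: {productive_minutes():.0f}")  # 288
print(f"Extra minutes per day:      {extra_minutes():.1f}")       # 14.4
```

The article rounds the 14.4 minutes down to 14.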

Andy Brazier

Tuesday, September 18, 2007

Classifying causes

I have had a comment on my last post regarding the Virgin train crash. It questions whether there are problems with accident investigation because we have not properly defined the terminology used to describe different types of cause (e.g. "immediate" and "underlying").

My view is that these classifications of cause do not give us the full range needed to fully explain an accident. The reality is that there are many different types of failure that can contribute to an accident, and each of these failures may have multiple causes. Most accidents will start with a combination of technical, human and organisational failures that create a hazardous situation. This already highlights the complexity. For example, a human error can be an "immediate" cause of an accident, but it can also cause a technical failure, in which case the human error would be an "underlying" cause.

This is further complicated by the fact that a hazardous situation does not necessarily result in an accident. If the situation has been predicted, defences can be put in place. Only if these fail do you have a developing incident. Even then there are opportunities to recover the situation. Failure to recover results in an accident, whilst successful recovery means it is a near miss.
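The progression described above can be written as a simple decision sequence. This is only a sketch of my reading of the model: the function name and the labels "no event" and "hazard contained" are my own, whilst "near miss" and "accident" are the terms used above.

```python
def classify_event(hazard_present, defences_held, recovered):
    """Classify an outcome by walking through hazard, defences and recovery in turn."""
    if not hazard_present:
        return "no event"            # no hazardous situation developed
    if defences_held:
        return "hazard contained"    # predicted hazard, defences worked
    if recovered:
        return "near miss"           # developing incident, successfully recovered
    return "accident"                # defences failed and recovery failed


print(classify_event(hazard_present=True, defences_held=False, recovered=True))
print(classify_event(hazard_present=True, defences_held=False, recovered=False))
```

The first call gives "near miss" and the second gives "accident", which shows that the two differ only in whether recovery succeeded.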

If I look at an accident I tend to start by thinking "what failures resulted in a hazardous situation developing, were there potential defences and did they fail, and could the situation have been recovered?" This gives me a set of failures (you may call them "immediate" causes) that require analysis. These can then be broken down, for example using "why trees" until the root causes are found. In this case the root causes are where the "why tree" cannot be broken down any further.

My view of failure types comes from a model I first saw in the PhD thesis of Tjerk van der Schaaf, who is now a professor at Eindhoven University. You can see the model reproduced in another thesis (see figure 1.1 on page 6).

Andy Brazier

Friday, September 14, 2007

Virgin Rail crash, February 2007

A summary of Network Rail's investigation into the Virgin rail crash that killed one person was released 4 September 2007. It is available from here

The report uses a lot of railway jargon that I am not familiar with.

The conclusions identify the immediate cause as the deterioration of components in the stretcher bar system on the points. The underlying cause was a failure to carry out an inspection that would have identified the fault.
* Deficiencies in the asset inspection and maintenance regime employed on Lancs & Cumbria maintenance area resulted in the deterioration of 2B points not being identified. These deficiencies included:
  * A breakdown in the local management/supervisory structure that leads, monitors and regulates asset inspection and maintenance activities;
  * A systematic failure in the track patrolling regime employed on the local area;
  * The issue and subsequent briefing of mandated standards not being carried out in a robust and auditable manner;
  * A lack of sample verification to test the quality and arrangements for inspections undertaken.

I find this quite bizarre. Failure to inspect something does not cause it to fail. Yes, inspection may allow a hazard to be discovered before an accident occurs, but that is not the same thing. It sounds to me like Network Rail are trying to distract us from more fundamental problems with the design of points, especially given that we still do not know what caused the Potters Bar train crash, which also involved a failure of points.

In fact, reading more of the report into this crash it seems design issues were raised, and most of the action items are focussed on these types of issue. This makes it even more strange in my opinion that the conclusions in the report (which are probably all that most people will read) are so focussed on inspection.

Wednesday, July 11, 2007

Drug recall

According to news reports on 6 June 2007, including this one from the BBC, Roche have had to recall all batches of the anti-HIV drug Viracept (generic name nelfinavir) because it was contaminated with potentially cancer-causing chemicals. The contamination is being blamed on human error.

According to the Roche website the company are to establish Viracept Patient Registries in order to register and closely follow patients who may have been exposed to a chemical impurity in their Viracept HIV formulations.

Andy Brazier

Inexplicable errors

This is something I have come across a couple of times recently. Someone makes a completely bizarre error and no one can explain exactly why it occurred. I guess this is one of the things with humans: we are emotional rather than logical creatures. Sometimes things can't be explained.

This article called "What next for crane safety?" by Phil Bishop, 27 June 2007 describes an incident at Canary Wharf in May 2000 where the top of a tower crane fell whilst being raised into position. In this task it is important to attach the top of the crane to the 'climbing frame.' Because this was forgotten the crane was simply balancing and able to fall. According to the article this is such an obvious and well known issue that it seems incredible that it was forgotten.

Advice is given in the article to reduce the likelihood of errors, but it does not actually address the accident in question. I think the reality is that some bizarre things will happen and we should not spend lots of effort preventing them in the future because we can be fairly sure they are one-off incidents. However, we can use these incidents to delve into systems as a whole and find weaknesses and hence opportunities to improve.

Andy Brazier

Quantified human reliability analysis

Available here. Academic paper by Marzio Marseguerra, Enrico Zio, and Massimo Librizzi entitled Human Reliability Analysis by Fuzzy “CREAM”

The work uses the Cognitive Reliability and Error Analysis Method (CREAM) model, which assumes that the human failure probability depends on the level of control a person has over the contextual scenario in which they are requested to perform. Four modes of control are identified:
* Scrambled,
* Opportunistic,
* Tactical,
* Strategic.

The fuzzy approach allows for ambiguity and uncertainty in the calculations.

I must take a closer look at the paper some time.

Andy Brazier

Monday, May 14, 2007

Shift handover in the NHS

Publication from the BMA has some useful information about shift handover that may apply to industry. (I think it ironic that a 38 page document is used to discuss good communication, but I guess that is how things work in the NHS).

It makes the point that good handover does not happen by chance. It requires work by all those involved (organisations and individuals):
* shifts must coordinate
* adequate time must be allowed
* handover should have clear leadership
* adequate information technology support must be provided.

Sufficient and relevant information should be exchanged to ensure patient safety:
* the clinically unstable patients are known to the senior and covering clinicians
* junior members of the team are adequately briefed on concerns from previous shifts
* tasks not yet completed are clearly understood by the incoming team.

Handover is of little value unless action is taken as a result:
* tasks should be prioritised
* plans for further care are put into place
* unstable patients are reviewed.


The document includes a list of handover pitfalls, including:
* Giving verbal handovers at the same time as the team taking over the patient’s care are setting up vital life support and monitoring equipment - valuable information will be lost. The importance of written handover information must be stressed.
* Roles and responsibilities are not always clear during handover and this can lead to omissions.
* Checklists and written updates are important and often under-utilised. When such information is incomplete or omitted it has a knock on effect of increasing the workload of the staff who have taken over the patient's care because they have to spend a significant proportion of time chasing information.
* It is important that nursing staff are made aware of critical features in the medical management of a patient that will affect care during the next shift.
* Fragmentation of information at the point of handover is a major problem. It is important to avoid multiple concurrent conversations between individuals and let one person (a nominated lead) speak at a time to everyone. This reduces the opportunities for conflicting information to be given.
* Handover is a two way process. Good handover practice is characterised by the team who are taking over the patient’s care asking questions and having the opportunity to clarify points they are uncertain of. They should not be passive recipients of information.

This document gives some practical examples of how handover is being dealt with in NHS hospitals.

Failing to respond to a safety problem

According to this article on the Jishka homework website, "In 1970, psychologists Latane and Darley published their study on "bystander apathy." They found that - in order to help in a crisis - any bystander has to answer five questions. If any one of these questions is answered negatively, help will not be given."

1. Do I notice something happening? If the person is in a hurry or distracted by personal problems, they are less likely to notice what is happening around them.
2. Is the situation an emergency? Is a person lying in a doorway a homeless person resting, a drunk, or a person who has collapsed from a heart attack? Most situations have a high degree of ambiguity. It is hard to tell what is happening.
3. Am I responsible? Latane and Darley found that with more people around, there was a diffusion of responsibility - bystanders assume that others will act, so they are not personally responsible.
4. What can I do? Often people are unsure of their abilities (training or skills) to help in a given situation. They may be concerned that they might make the situation worse.
5. Will I intervene? Bystanders must weigh the costs or dangers of intervening. Will I be harmed? Will I be sued?
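The key feature of the five questions is that they act in series: a single "no" anywhere stops help being given. A minimal sketch of that logic is below; the function names and the shortened question wording are my own, not from Latane and Darley.

```python
QUESTIONS = [
    "Do I notice something happening?",
    "Is the situation an emergency?",
    "Am I responsible?",
    "What can I do?",
    "Will I intervene?",
]


def first_barrier(answers):
    """Return the first question answered negatively, or None if all are positive."""
    for question, answer in zip(QUESTIONS, answers):
        if not answer:
            return question
    return None


def will_help(answers):
    """Help is given only if every one of the five questions is answered positively."""
    return first_barrier(answers) is None


# Example: the bystander notices and recognises an emergency,
# but assumes someone else will act.
answers = [True, True, False, True, True]
print(will_help(answers))      # False
print(first_barrier(answers))  # Am I responsible?
```

Applied to a procedure violation, the same check suggests asking which of the five questions a witness would have answered "no" to.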

I was led to this link by a discussion on this forum where someone was asking what to do about a situation where a number of people violated a procedure. It seemed likely that they did so because others were doing the same without anyone raising any concerns. Bystander apathy can certainly help in understanding how violations are often caused.

There is a good discussion on the topic on Wikipedia.

Andy Brazier

Friday, May 04, 2007

BP Texas City - managers + supervisor blamed

According to a number of news articles, including one in the Denver Post by Juan A. Lozano on 5 May 2007:

An internal report by BP PLC about its deadly 2005 Texas City plant explosion recommended that four executives and managers be fired for failing to perform their jobs and demonstrating poor judgment.

Accountability of John Manzoni, BP's top refinery executive, should be reviewed by the company after he "failed to implement his duties" and didn't "carry out his responsibilities."

The February report singled out four managers who "failed to perform their management accountabilities in significant ways": Mike Hoffman, BP's group vice president for refining and marketing; Pat Gower, U.S. refining vice president; Don Parus, the Texas City refinery manager; and Willie Willis, a plant supervisor.

Hoffman has resigned. The others are still employed by the company, according to the report.

Interestingly, according to an article by Ed Crooks and Sheila McNulty on 4 May 2007 in the Financial Times:

New BP Chief Executive Tony Hayward wants to slow the rapid circulation of BP managers. Since 2000, it has been common for BP managers to stay only 18-24 months in their jobs; Mr Hayward wants to raise that to three to four years.

Also, he wants BP to employ fewer contractors and bring more activities in house.

Andy Brazier

Thursday, April 26, 2007

Seven Reasons Why Organisational Change Fails

From Tony Kenneson-Adams' website

Seven reasons why 70% of all change and transformation programs, acquisitions and downsizing efforts fail to meet the expectations of the managers that implement them.

1. Change is driven by a symptom not a cause.
Knee-jerk reactions to symptoms without analysis of the true problem. A classic example is making staff redundant to reduce costs without analysis of costs. If the high costs are due to poor utilisation of machinery, rearranging shift patterns could improve utilisation and increase profits.

2. Businesses don't know where they are before committing to change.
This is like using a map to get from Derby to Bristol when in fact you are sat in Cardiff.

3. Organisations do not plan for success.
The journey from old to new must be clearly planned, milestones achieved and objectives set.

4. Businesses do not know when they have arrived.
Without a plan and objectives, how do you know you have arrived at your new organization, structure, target or whatever? Also, unless you measure success, how will you know if the change has improved anything? There is also no better way to disenfranchise your workforce than introducing multiple changes and them seeing no real effect.

5. Approaches to change do not consider the impact on the business.
Change sends ripples across the whole business and so must be looked at in the context of the whole business.

6. Staff enablers are victims of change.
Far too often the management have decided on the why, who and when of the change before they think of discussing the change with their staff. However, staff have a vested interest in change and will invest themselves in change if they are brought into the change process early enough. Not only can they suggest how best change can be brought about, as they truly know the 'nuts and bolts' of the job, but working with the staff will reduce conflict, change intolerance and resistance.

7. Lack of Management Buy-in.
Management at all levels need to 'buy-in' to the change, and not just those that will be directly affected. By actively seeking buy-in you are adding a multiplication factor for your success, and a line of communication that you may need to access at some point in the change procedure.

Evidence that you are not managing change well includes:
* Losing valuable managers to other companies for the same or a lesser salary
* Staff suffering from stress related illness
* Managers putting in long hours for a diminishing return
* Spending large amounts of money changing process and structure without improving the bottom line to the planned extent

Andy Brazier

Wednesday, April 25, 2007

Surviving a plane crash

Not quite sure why, but the Metro free newspaper published an article on 23 April 2007 by Ed West called "Landing on your feet." I was interested because it shows that training and behaviour can make a difference in many different types of circumstance.

Key points from the article are:

* Knowing the brace position means you are less likely to be injured in the impact and hence more likely to be in a fit state to get out
* Frequent fliers have heard the safety announcements and read the safety card more times; and consequently know what to do in a crash (including simple things like how to undo the seatbelt)
* Count the number of rows from where you are sitting to the emergency exit
* Wear full-length cotton clothes and good footwear; these give you a better chance of getting out and some fire protection
* You need to get out within 2 minutes if there is a fire
* Once out, stay at least 150 metres from the plane

The article finishes by saying Aer Lingus has not lost a passenger since 1970, whilst Ryanair, Monarch, Easyjet and Air Kazakhstan have a 'perfect safety record.'

Andy Brazier

Demolition hammers

Article by Becky Schultz on forconstructionpros.com 24 April 2007

It is about demolition hammers, and in particular changes that have been made to their design to reduce vibration for the users. These are a bit like a big electric drill where the bit goes in and out rather than around. The article makes some interesting points about perception, but also points out that the extra cost of reducing vibration for the user is offset by increased productivity.

One of the biggest marketing challenges for demolition hammer suppliers has been getting users to understand that less vibration doesn't mean less power. "There has always been this correlation, historically, that the more productive I am, the more the tool is going to vibrate," says Bernstein. Consequently, when low-vibration hammers were first introduced in the U.S., they encountered resistance. "[Users] would complain that the tool doesn't hit as hard as everyone else's only because they didn't feel that vibration back to their bodies, and they have this perception that vibration equals power," says Gallert [of Wacker Corp].

This perception is changing. "It's just within the last year or two that we're starting to get people to realize the benefits," says Gallert. "After they use the [hammer] all day, they see that the tool is working, they're getting the job done and at the end of the day, they feel much better."

Another obstacle has been cost. Vibration-dampening technology does, in some cases, increase the price of the tool. "But it's important to weigh the costs with the benefits you're getting down the road," says Cook. "On the one hand, productivity is immediately increased. And there is certainly a benefit down the road with the workers. Workers' comp incidents or claims shouldn't be as prevalent."

Ironically, when it comes to low-vibration hammer designs, productivity may prove to be the determining factor, not operator comfort.

"A lot of times the guys that are buying the tool aren't the ones using it," Bernstein points out. "What we found is these guys are really paying for productivity all day long. If the tool is more comfortable [to use] because we've taken the vibration out, then the user doesn't have to take as many breaks during the day. So a pretty interesting added benefit of the lower vibration is the added productivity that results." "That," he says, "is something the guy who's buying the tool is willing to pay for."

Andy Brazier

Edward De Bono

Quote from De Bono in an article in the Guardian on 24 April 2007

"Studies have shown that 90% of error in thinking is due to error in perception. If you can change your perception, you can change your emotion and this can lead to new ideas. Logic will never change emotion or perception."

I understand this to mean that people who can think creatively are more likely to make the right decisions and selections. Technical knowledge and written procedures are relatively unhelpful where problem solving is required. This does have an application to industrial safety. It is fairly well known that people often suffer from tunnel vision in high demand situations, often assuming the situation they have been presented with is the same as ones they have seen in the past, and so trying the same solution.

So it seems creative thinking needs to be part of the training for plant operators, maintenance technicians, supervisors and plant managers. Interesting idea!

Andy Brazier

Human error in maintenance

Below are excerpts from an article entitled "human error is preventable" by Daryl Mather. Published at the plant.com website in April 2007

Human error used to be an area that was only associated with high-risk industries like aviation, rail, petrochemical and the nuclear industry. The high consequences of failure in these industries meant that there was a real obligation on companies to try to reduce the likelihood of all failure causes, not just those related to “normal” or engineering failures. However, there is a lot to be gained, for relatively little outlay, by including a focus on human error within all maintenance operations.

Human error continues to be a common cause of asset failure, both in terms of how an asset is maintained, as well as how it is operated. We see this all the time in areas such as poor calibration, poor alignment, incorrect settings, and even poor quality workmanship.

If you look at the conditions involved in asset maintenance, there are a multitude of reasons why human error would occur which include frequent removal and replacement of large numbers of varied components, often carried out in cramped and poorly lit spaces with less-than-adequate tools, and usually under severe time pressure.

Equipment alignment is a prime example. Fixing a motor to a new plinth and then aligning it to whatever it is driving is a pretty standard task. Yet there are a large number of ways we can make mistakes. Poorly marking the footing mounts, poor drilling (too shallow, not in line) and poor alignment practices are all valid examples.

Also, after a few months new concrete plinths have a tendency to “settle,” often forcing misalignment through shifting of the motor. Failure to take this into account and to perform the necessary checks to correct it if it occurs, is also a human error related issue.

You’ll be surprised to learn that the work procedure helps to increase the likelihood of error, not reduce it. For example:

* Very wordy sentences and instructions will often be ignored. This is human nature. Make sure that the instructions are broken into logical parts, and that they are written in short concise sentences in layman’s terms.
* Studies have shown that when there is a long list of instructions, those in the middle will often be omitted. Make a quality assurance check at the end and ask the technician to double check whether they did certain frequently omitted tasks.
* Too many instructions will be ignored, as will too few. Procedures need to be aimed at presenting an accurate level of detail and instruction as is required.
* A lot of work instructions are focused on the present, but often there is a need for a re-check of alignment several months afterwards. Employ this in the work procedure; make it a task for the maintenance scheduler or to program a separate task once this task has been done.
* More than all of the above, procedures must not tell technicians how to perform basic skills, or they will be ignored. (E.g. don’t go into detail about how to torque a bolt or remove a screw.)

Procedures are one of the many areas where slight adjustments in current practice could have a big impact in reducing lost time and money due to human error. There are many others.

Thursday, April 19, 2007

Prescribing Errors

From Medical News Today on 7 April 2007

The GMC (General Medical Council) has announced funding for a £100,000 research project that aims to investigate the prevalence and causes of errors in doctors' prescribing.


Professor Peter Rubin, chair of the GMC's Education Committee said: "Safe prescribing is crucial to patient safety. Claims that there is a link between education, training and poor prescribing are, so far, anecdotal rather than based on robust evidence. The GMC takes a strong interest in these claims, and is committed to finding out more. We are confident that this research will help shed light on the extent to which this problem exists and identify its causes."

All very good, but £100k to examine what everyone knows, with no suggestion of developing solutions, seems bizarre to me. This need to have all the supposed facts before doing anything is, in my opinion, why everything takes so long in the NHS. The trouble is that while this study takes place, many more people will be harmed by the errors.

Andy Brazier

Research Centre for NHS Patient Safety and Service Quality

Article at Medical News Today on 9 April 2007

Announced on 6 April 2007 the centre will bring together academic and clinical researchers. The £4.5m Research Centre for NHS Patient Safety and Service Quality will be one of two such Centres in the UK, funded by the National Institute for Health Research. It will be based in Imperial College's Biosurgery and Surgical Technology section, at St Mary's Hospital, London.

The Centre will trial new approaches and technologies to reduce human error and improve patient care, for example through the use of pharmacy robots to dispense medication, and the involvement of patients themselves in spotting and anticipating medical errors.

This is a lot of money, and of course has to be potentially a good thing. However, I note it brings academics and clinicians together, with no mention of practical human factors expertise, including that from other industries. The idea of using robots and patients, when little basic human factors work has been done in the NHS, leads me to be sceptical. Time will tell, and from what I have seen so far of the NHS, results will be way off in the future.

Andy Brazier

Ergonomics at Britax

Article at ferret.com.au 11 April 2007

Australian supplier of children’s car safety products Britax has installed a new lean assembly system as part of an ongoing improvement plan looking at ways to improve operator ergonomics, production efficiency and working environment.

Britax says it has dramatically improved the working environment, ergonomics, operator efficiency and material flow and has achieved the company’s goals with improvements in both production efficiency and materials handling.

Forklift traffic has been reduced and significant space saving has been achieved throughout the processing areas setting a foundation for a culture of ongoing continuous improvement.

Andy Brazier

Offshore safety

An interesting article on the BBC website on 13 April following the capsize of the Norwegian rig support ship in the North Sea.

Recent accident statistics from HSE (excluding helicopter)
2006/07 No fatalities, 7 major injuries
2005/06 One fatality, 28 major injuries
2004/05 No fatalities, 27 serious injuries

That seems pretty good to me, given the industry employs 20,000 people.
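To put those figures on a common footing, here is a rough sketch that converts the injury counts into a rate per 100,000 workers. It assumes the workforce was a constant 20,000 in every year, which is my simplification (only the one figure is quoted), and the function name is just for illustration.

```python
# Assumption: a constant workforce of 20,000 in each reporting year.
WORKFORCE = 20_000

# (fatalities, major or serious injuries) per reporting year, as listed above.
stats = {
    "2006/07": (0, 7),
    "2005/06": (1, 28),
    "2004/05": (0, 27),
}


def rate_per_100k(count, workforce=WORKFORCE):
    """Return the number of events per 100,000 workers."""
    return count * 100_000 / workforce


# Injury rate for each year, keyed by reporting year.
injury_rates = {year: rate_per_100k(injuries) for year, (_, injuries) in stats.items()}

for year, rate in injury_rates.items():
    print(f"{year}: {rate:.0f} major injuries per 100,000 workers")
```

On these assumptions the 2006/07 rate is about a quarter of the rate in each of the two previous years.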

HSE has warned that an increasing number of the floating rigs were now beyond their planned life by as much as 10 years. While there was no suggestion that they were in danger, they were needing increasingly regular maintenance.

Professor Mick Bloor of the University of Glasgow, who has studied the industry, described the North Sea oil rig support vessels as being "the quality end of the shipping industry." But a study he was involved in found the dangers of work at sea were made worse by the long hours and irregular working hours of those in the offshore industry. "That has implications for getting proper sleep and leads to the possibility of fatigue-related problems in what is an already demanding environment," he explained.

Andy Brazier

Mars-probe failure 'human error'

From BBC website on 14 April 2007

The US space agency, Nasa, has said that human error was to blame for the failure of the $247m (£124m) Mars Global Surveyor spacecraft (MGS). The craft was 10 years old, but changes made to the computer software caused batteries to overheat and fail 5 months later.

Also, in an article by Staff writers on 17 April 2007 from itnew.com.au

It has been determined that someone uploading commands to update the High Gain Antenna's positioning for contingency operations wrote the information to the wrong memory address in the onboard computer.

"This resulted in the corruption of two independent parameters and had dire consequences for the spacecraft," the report released by NASA explained.

The corrupted upload happened, according to the report, because two previous updates conflicted and programmers were trying to fix the discrepancy.

NASA said the error caused problems with a solar array, which caused the craft to go into contingency mode, exposing batteries to direct sunlight and overheating. That ultimately depleted the batteries, most likely within 12 hours, according to the report. A second parameter error caused the antenna to rotate away from Earth, which blocked communications.

NASA said that more thorough operating procedures and processes and periodic reviews could have reduced the chance of errors.
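As an illustration of the kind of check a more thorough upload procedure might include, here is a minimal sketch that compares a proposed write against a table of known parameter locations before anything is sent. The parameter names, addresses and function are entirely hypothetical; they are not taken from the NASA report or from the spacecraft's software.

```python
# Hypothetical memory map: parameter name -> (start address, length in words).
# Names and addresses are invented for illustration only.
MEMORY_MAP = {
    "hga_contingency_position": (0x1000, 2),
    "solar_array_soft_stop": (0x1010, 1),
}


def check_upload(parameter, address, values, memory_map=MEMORY_MAP):
    """Return a list of problems found; an empty list means the upload looks consistent."""
    if parameter not in memory_map:
        return [f"unknown parameter {parameter!r}"]

    problems = []
    start, length = memory_map[parameter]
    if address != start:
        problems.append(
            f"address {address:#x} does not match {parameter!r} at {start:#x}"
        )
    if len(values) != length:
        problems.append(f"expected {length} value(s), got {len(values)}")

    # Flag any other parameter whose region the write would overlap.
    end = address + len(values)
    for other, (other_start, other_length) in memory_map.items():
        if other != parameter and address < other_start + other_length and other_start < end:
            problems.append(f"write would overwrite {other!r}")
    return problems
```

A write to the wrong address is reported twice here: once because it does not match the intended parameter, and once because it would corrupt a different one. That second message is the sort of prompt that might trigger an independent review before upload.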

Andy Brazier

Alert Bulletin

The Nautical Institute has published a series of bulletins related to human factors, which are now available on a dedicated website

They look good, with useful centre-page pull-outs on key issues. Also, some cartoons that can be used by others (with suitable credits). There is no list of contents, to date, which is a shame but they look very useful.

Andy Brazier

Human Focus online magazine

Lloyd's Register has published this magazine, available on their website

It is essentially a marketing exercise, but I guess it gives a reasonable overview of key human factors issues applied to the marine industry. I can't see anything revolutionary there, but it is a quality publication.

I do like their opening quote.

"There is a limit to the improvements in maritime safety that can be made by attending simply to the hull, machinery and essential systems. To ensure that further progress is made we need to focus on the way that the ship is used and, specifically, the people who interact with it."

Andy Brazier

Thursday, March 29, 2007

CSB Report of BP Texas accident - detail

The 337-page report is now available at the Chemical Safety Board's website.

Below is a reasonably detailed summary of the report. I have also put together a much briefer overview.

On March 23, 2005, at 1:20 p.m., the BP Texas City Refinery suffered one of the worst industrial disasters in recent U.S. history. Explosions and fires killed 15 people and injured another 180, alarmed the community, and resulted in financial losses exceeding $1.5 billion. The incident occurred during the startup of an isomerization (ISOM) unit when a raffinate splitter tower was overfilled; pressure relief devices opened, resulting in a flammable liquid geyser from a blowdown stack that was not equipped with a flare. The release of flammables led to an explosion and fire. All of the fatalities occurred in or near office trailers located close to the blowdown drum.

The Texas City disaster was caused by organizational and safety deficiencies at all levels of the BP Corporation. Warning signs of a possible disaster were present for several years, but company officials did not intervene effectively to prevent it. The extent of the serious safety culture deficiencies was further revealed when the refinery experienced two additional serious incidents just a few months after the March 2005 disaster.

There were numerous cultural, human factors, and organizational causes of the disaster. One underlying cause was that BP used inadequate methods to measure safety conditions at Texas City. For instance, a very low personal injury rate at Texas City gave BP a misleading indicator of process safety performance. In addition, while most attention was focused on the injury rate, the overall safety culture and process safety management (PSM) program had serious deficiencies.

Cost-cutting and failure to invest in the 1990s by Amoco and then BP left the Texas City refinery vulnerable to a catastrophe.


Key Technical Findings

PROCEDURES
* Many deviations from written procedures occurred. These were not unique actions but the result of established work practices, frequently taken to protect unit equipment and complete the startup in a timely and efficient manner.
* Management did not ensure procedures were updated, incorporated learning from incidents or adapted to cover unique startup circumstances.
* There was no effective management of change of procedures.
* Management actions (or inactions) sent a strong message to operations personnel that procedures were not strict instructions but were outdated documents to be used as guidance.
* The ISOM startup procedure was not followed and no record was made of steps completed. As a result a key valve was shut that prevented liquid leaving the raffinate splitter tower.
* The procedure required filling the tower to 50% level. Previous experience showed it needed to be filled higher because the level would typically drop significantly during startup. It was filled to a 99% level reading, but was actually way off scale. A high level alarm was activated at 72%, but a subsequent high level switch was faulty (this was not noticed by operators).

PRE-STARTUP CHECKS
* A rigorous pre-startup procedure required all startups after turnarounds to go through a Pre-Startup Safety Review (PSSR). The process safety coordinator for the ISOM was unfamiliar with its applicability, and therefore, no PSSR procedure was conducted.
* The PSSR is a formal review carried out by a technical team led by the operations superintendent and signed off by senior management. It involves verification of all safety systems and equipment, including procedures and training, process safety information, alarms and equipment functionality, and instrument testing and calibration. It also verifies that all non-essential personnel have been removed from the unit and neighboring units and that the operations crew has reviewed the startup procedure.
* BP guidelines state that unit startup requires a thorough review of startup procedures by operators and supervisors; but this was not performed or checked off.
* The start-up procedure covered the scenario of one continuous startup. In reality the startup was paused, part of the plant shut down and later restarted.

MAINTENANCE
* Faulty equipment, including level indicators and control valves, had been identified but not repaired.
* BP Supervisors deemed there was not enough time during the turnaround to make the necessary repairs.
* BP Supervisors stopped technicians checking alarms and instruments because there was not time to complete the checks before the unit was due to start.
* The same BP Supervisors then signed the startup procedure that required that all control valves had been tested and were operational prior to startup.

CONTROL SYSTEM
* Level indicator showed the tower level declining when it was actually overfilling.
* Redundant high level alarm did not activate.
* Tower was not equipped with any other level indications or automatic safety devices.
* The control board display did not provide adequate information on the imbalance of flows in and out of the tower to alert the operators to the dangerously high level.

CONTROL SYSTEM INTERFACE
* The reading of how much liquid raffinate was entering the unit was on a different screen from the one showing how much raffinate product was leaving the unit. This made it difficult to identify a discrepancy (Texaco Pembroke Explosion is referenced).
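The discrepancy that was hard to spot across two screens is a simple material balance: if more is going in than coming out, the inventory must be rising, whatever the level indicator says. A minimal sketch of that check follows. The function names, units and the alarm limit are my own assumptions for illustration, not anything from the CSB report or BP's control system.

```python
def net_accumulation(flow_in, flow_out, interval_hours):
    """Sum (in - out) * interval over paired flow readings.

    Flows are in volume per hour, so the result is a volume.
    """
    return sum((fi - fo) * interval_hours for fi, fo in zip(flow_in, flow_out))


def imbalance_alarm(flow_in, flow_out, interval_hours, limit):
    """Return (alarm, accumulated volume); alarm is True above the limit."""
    accumulated = net_accumulation(flow_in, flow_out, interval_hours)
    return accumulated > limit, accumulated


# Hypothetical hourly readings: feed entering with no product leaving.
feed = [20.0, 20.0, 20.0]
product = [0.0, 0.0, 0.0]
alarm, volume = imbalance_alarm(feed, product, interval_hours=1.0, limit=50.0)
print(alarm, volume)
```

The point of the sketch is that the calculation needs both flows side by side, which is exactly what a single-screen display (or a trained habit of doing the sum) would provide.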

MANNING
* There was a lack of supervisory oversight and technically trained personnel during the startup. This is despite analysis carried out by Amoco based on 15 previous incidents that showed incidents were 10 times more likely during startup than normal operation. Guidelines on site were that supplementary assistance be present during startup, including additional board operators.
* One supervisor had to leave site due to a family emergency. No one was assigned to provide effective cover.

SHIFT HANDOVER
* Supervisors and operators poorly communicated critical information during the shift turnover (handover).
* Night shift operator left early. Subsequent shift handover was brief because it did not involve the person who had done all the work.
* Records in the shift log were brief and ambiguous. They were misinterpreted by the incoming shift. This was further exacerbated by the failure to record steps completed on the startup procedure by the previous shift operators.
* BP did not have a shift turnover communication requirement for its operations staff.


FATIGUE
* Operators had worked 12-hour shifts for 29 or more consecutive days.
* They had been getting about 5 hours of sleep per night.
* BP had no corporate or site-specific fatigue prevention policy or regulations.
* “Operators were expected to work” the 12-hour, 7-days-a-week turnaround schedule, although they were allowed time off if they had scheduled vacation, used personal/vacation time, or had extenuating circumstances that would be considered on a “case-by-case” basis.

COMMUNICATION
* Key messages were not written down, but passed verbally over phone and radio.
* Board and outside operators interpreted a message regarding routing of raffinate differently. The Board operator closed a control valve. The outside operator manually opened that valve.

TRAINING
* The operator training program was inadequate, in particular on the hazards of unit startup.
* Training for abnormal situations was insufficient.
* Training consisted of on-the-job instruction, which covered primarily daily, routine duties.
* Startup or shutdown procedures would be reviewed only if the trainee happened to be scheduled for training at the time the unit was undergoing such an operation.
* BP’s computerized tutorials provided factual and often narrowly focused information, such as which alarm corresponded to which piece of equipment or instrumentation. This type of information did not provide operators with knowledge of the process or safe operating limits.
* BP's training program did not include specific instruction on the importance of calculating material balances, and the startup procedures did not discuss how to make such calculations.
* Managers did not effectively conduct performance appraisals to determine the knowledge level and training development plans of operators.
* The central training department staff had been reduced from 28 to eight.
* Simulators were unavailable for operators to practice handling abnormal situations, including infrequent and high hazard operations such as startups and unit upsets.

PLANT
* The process unit was started despite previously reported malfunctions of the tower level indicator, level sight glass, and a pressure control valve.
* The size of the blowdown drum was insufficient to contain the liquid sent to it by the pressure relief valves.
* Neither Amoco nor BP replaced blowdown drums and atmospheric stacks, even though a series of incidents warned that this equipment was unsafe.

SAFE OPERATING LIMITS
* ISOM operating limits did not include limits for high level in the raffinate splitter tower.
* BP had developed an electronic system for monitoring operation outside defined envelope. However, the feature to alert that this had occurred had not been activated.

RISK MANAGEMENT
* Occupied trailers were sited too close to a process unit handling highly hazardous materials. All fatalities occurred in or around the trailers.
* Eight previous serious releases of flammable material from the ISOM blowdown stack had not been investigated.
* BP Texas City managers did not effectively implement their pre-startup safety review policy to ensure that nonessential personnel were removed from areas

ORGANISATIONAL FAILURES

COST-CUTTING – failure to invest and production pressures

BOARD OF DIRECTORS – No director responsible for assessing and verifying the performance of BP’s major accident hazard prevention programs.

SAFETY PERFORMANCE - Reliance on the low personal injury rate rather than indicators of process safety performance and the health of the safety culture.

MECHANICAL INTEGRITY - “run to failure” of process equipment at Texas City.

CHECK BOX MENTALITY - Personnel completed paperwork and checked off on safety policy and procedural requirements even when those requirements had not been met.

CULTURE – lack of reporting and learning culture. Personnel not encouraged to report safety problems and some feared retaliation for doing so. Lessons not captured or acted upon, including those from other sites and organisations.

FAILURE TO ACT - Numerous surveys, studies, and audits identified deep-seated safety problems at Texas City, but the response of BP managers at all levels was typically “too little, too late.”

MANAGEMENT OF CHANGE - BP Texas City did not effectively assess changes involving people, policies, or the organization that could impact process safety.

CSB Report of BP Texas accident - overview

The 337-page report is now available at the Chemical Safety Board's website. The executive summary seems quite comprehensive and readable, but a quick scan of the main report suggests that there is more to learn if you dig deep enough.

From what I have read so far, the key issues were:

* Procedures - did not reflect how tasks were done in practice, and were not really used for the startup
* Pre-start checks - a comprehensive program of checks was specified but not carried out
* Maintenance - faulty equipment was not repaired during the turnaround because supervisors decided there was not enough time
* Control system - indicators and alarms were not working
* Interface - information to carry out a mass balance was not available on a single screen (exactly the same as the Texaco Pembroke accident)
* Manning - failure to provide extra personnel for startup
* Shift handover - insufficient discussion and poor log keeping
* Fatigue - operators working on the turnaround, 12-hour shifts for 30 days without a break
* Communication - critical messages passed verbally and misunderstood
* Training - mostly on the job with no training for abnormal situations, including startup.
* Poor plant design
* Operating limits - failure to identify all key operating limits and to monitor operations
* Poor risk management - including siting of trailers and failure to remove non-essential personnel during start-up
* Multiple organisational failures - as identified in Baker report

I've also put together a more detailed summary here

Wednesday, March 21, 2007

Shift handover and shift log software

I realised a long time ago from my studies of the Piper Alpha inquiry that shift handover is a critical activity that can contribute to major accidents. However, it has not received much attention in the past, possibly because it has fallen into the category of 'too hard.' The Buncefield inquiry has also identified shift handover as an issue, and it seems likely that it will be receiving a higher profile now.

I met up with the guys from Infotechnics last week to look at their shift log and handover software called Opralog. I was very impressed. It seems to be very easy to use but provides a great deal of power that takes it well beyond being simply a tool for assisting handovers. In particular, it allows companies to start logging events from the perspective of what needs to be done to deal with them, rather than simply what happened to plant and equipment.

Opralog's main features (as I see it) include:

* Predefined events mean operators and technicians have less to write, so they are more inclined to record useful information;
* Interfacing with plant data allows text descriptions to be recorded to explain observed plant events;
* Events can be logged automatically, triggered by plant data (e.g. if a parameter exceeds a certain value), which prompts the operator or technician to record an explanation;
* Logs can feed into each other - for example, certain parts of operator logs can populate part of their supervisor's log.
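To show the idea behind automatically triggered events, here is a minimal sketch of a log that creates an entry when a reading exceeds a limit and keeps it open until someone explains it. This is my own illustration of the concept, not Opralog's actual design or API; the class names, tag name and limit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogEntry:
    tag: str
    value: float
    limit: float
    explanation: Optional[str] = None  # filled in later by the operator

    @property
    def needs_explanation(self):
        return self.explanation is None


@dataclass
class ShiftLog:
    limits: dict  # tag name -> high limit (hypothetical values)
    entries: list = field(default_factory=list)

    def record_reading(self, tag, value):
        """Create an entry when a reading exceeds its limit; return it, or None."""
        limit = self.limits.get(tag)
        if limit is not None and value > limit:
            entry = LogEntry(tag, value, limit)
            self.entries.append(entry)
            return entry
        return None

    def open_items(self):
        """Entries still awaiting an explanation, e.g. for review at handover."""
        return [e for e in self.entries if e.needs_explanation]
```

The list returned by `open_items()` is the useful part for handover: anything the plant flagged that nobody has yet explained is put in front of the incoming shift rather than left to memory.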

Well worth a look.

You can find out more about shift handover at my website

Monday, February 26, 2007

Too much training

Scotland were heavily defeated by Italy in the six nations rugby union on Saturday. Scotland had a disastrous start with Italy intercepting three times (a chip-over kick, pop-pass and long pass) in the first 7 minutes, scoring tries each time.

I'm no expert on rugby, but it looked to me that Scotland had been practicing the maneuvers, but in training had no opposition. This was later said by one of the commentators at the BBC.

I think parallels can be made with industrial settings. We know training is important, but often fail to provide the right training. This case highlights that whilst skills are important, the training programme can backfire if people are not able to make the correct decisions about which skills to use and when. In at least two of the three cases (chip kick and long pass) an interception is an obvious risk. What Scotland failed to do was consider whether the risk was worthwhile. If they had been close to Italy's try line, it probably would have been, as they would have had a good chance of scoring points and more chance of recovering from an interception before Italy scored. As they were close to their own line, the benefit was much less and the risk much higher.

This is one area where companies get it wrong with simulators. They ask for 'high fidelity' versions that allow people to gain skills in operating the plant. Unfortunately the complexity and cost of these simulators means that more time is spent gaining skill, leaving relatively little time to practice decision making. Conversely, 'low fidelity' simulators do not provide the opportunity to gain skills in operating plant, but this leaves a lot more time for practicing decision making, problem solving etc.

Andy Brazier

Friday, February 16, 2007

TLA's

Jeremy Clarkson's article in the InGear section of the Sunday Times on 11 February 2007 discussed how some Three Letter Acronyms take longer to say than the full versions. Given that communication can be critical to safety, these examples may be useful to illustrate scenarios.

www = 9 syllables whereas world wide web = 3

From the army

IED = improvised explosive device = bomb
ACV = armoured combat vehicle = tank
ADW = air defence warning = siren

In business China becomes PRC

And Jeremy adds one that is probably libelous IFA = thief.

Leading indicators of safety performance part 2

Following a comment I received on my previous post on the topic, I have given some further thought to leading indicators.

When you talk about performance indicators people often say they need to be S.M.A.R.T - Simple/sensible; measurable; attainable; realistic; time-based.

But there is a counter argument (I am sure I read this as a quote somewhere once) that says 'all that is important cannot be measured and things that can be measured are not always important.'

The comment made to my earlier post says that lagging indicators need to be based on actual consequences so that they are precise, accurate, difficult to manipulate and easily understood. Therefore near misses and high potential incidents cannot be used as indicators. This is an interesting point, and now I have had time to think about it I am pretty sure it is correct. I still maintain that a huge amount can be learnt from near misses, but agree that this is not the same as using them to provide performance indicators.

The comment also said that leading indicators are certainly more difficult than lagging ones, but if we ask the people who are close to the risk and working with it every day, we will very quickly get a good indication of which of our systems are weak. Then we can hang indicators on those systems to drive improvement.

So from this I conclude that

1. Our traditional lagging indicators are useful, but there is probably no need to look for many new ones.
2. Leading indicators can be identified, but they need to be fluid in order to reflect the issues most relevant to an organisation at any time.
3. Near misses are an excellent source of important information but do not provide data that we can use to measure performance.

Wednesday, February 14, 2007

Behavioural safety - IOSH branch presentation

Presentation by Nick Wharton of JMOC at the IOSH Manchester Branch 13 February 2007

Nick gave a very good presentation covering the basics of behavioural safety. He is a good and entertaining presenter, and clearly very experienced. I think that, although he is obviously quite evangelical about behavioural safety, he was also very honest that it is impossible to create a direct link between introducing a behavioural programme and improving safety performance. This is quite a contrast to some presentations I have seen where it is claimed behaviour modification is The answer to safety.

I am fairly ambivalent to behavioural safety. I consider it to be a useful tool in the safety toolbox but have concerns that companies often put all their effort and resources into it, at the expense of other approaches. In fact some of Nick's figures showed how much continued effort is required. It is not just a case of keeping up a level of effort, but it seems you need to keep increasing effort otherwise safety performance starts to drop. I wonder how sustainable this can be.

Nick suggested that behaviour modification is applicable to process safety as well as personal safety. I am far from convinced about this. However, Nick did say that behavioural safety should not be used until good systems are in place. Perhaps it is the case that the process safety systems are not yet well enough developed, and so there is more to do before we can try behavioural safety. I guess my question is whether systems will ever be good enough, and I feel effort spent on systems may always be more beneficial than that spent on behaviours.