Aircraft Maintenance Incident Analysis

CAA PAPER 2007/04 Aircraft Maintenance Incident Analysis - Published by the Civil Aviation Authority, December 2007

The paper is an analysis of a selection of maintenance related events on jet aircraft above 5,700kg MTOW, captured and stored under the requirements of the CAA’s Mandatory Occurrence Reporting (MOR) scheme, to identify trends, themes and common causes or factors.

It presents a taxonomy that looks useful. It has three main categories:

1. Maintenance Control – An event attributed to an ineffective maintenance control

2. Incomplete Maintenance – An event where the prescribed maintenance activity is prematurely terminated. In these circumstances the correct maintenance procedures appear to have been followed but something was not removed, not fitted or not set correctly towards the end of the process.

3. Incorrect Maintenance Action – An event where the maintenance procedure was completed but did not achieve its aim through the actions or omissions of the maintainer. In these circumstances it appears that an incorrect maintenance procedure or practice was being used. This has resulted in a larger number of second level descriptors than Incomplete Maintenance, but includes the actions of not removing, not fitting or not setting something correctly by virtue of not performing the task correctly, rather than as an error of omission.

Each category is broken down further as follows, showing the results of the analysis.

1. Maintenance Control (Total 733):
* Scheduled task - 223 - 30·4%
* Inadequate tool control - 84 - 11·5%
* Deferred defect - 81 - 11%
* Airworthiness data - 78 - 10·7%
* Tech log - 67 - 9·2%
* Airworthiness Directive - 66 - 9%
* Modification control - 55 - 7·5%
* MEL interpretation - 37 - 5%
* Configuration control - 23 - 3·1%
* Certification - 13 - 1·8%
* Component robbery - 6 - 0·8%

2. Incomplete Maintenance (Total 602):
* Not fitted - 268 - 44·5%
* Not set correctly - 229 - 38%
* Not removed - 105 - 17·5%

3. Incorrect Maintenance Action (Total 1589):
* Incorrect fit - 619 - 39%
* Not set correctly - 447 - 28·1%
* Incorrect part - 160 - 10·1%
* Poor maintenance practice - 94 - 5·9%
* Procedure not adhered to - 83 - 5·2%
* Not fitted - 78 - 4·9%
* Incorrect repair - 62 - 3·9%
* Incorrect procedure - 24 - 1·5%
* Not removed - 22 - 1·4%
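
The percentages in the lists above are each descriptor's share of its own category total. As a rough sketch (not taken from the CAA paper), the arithmetic can be reproduced as follows. The variable and function names are my own and purely illustrative, and the recomputed shares may differ from the published ones in the last digit because of rounding.

```python
# Counts copied from the three lists above.
# All names here are illustrative assumptions, not terms from the CAA paper.
MAINTENANCE_CONTROL = {
    "Scheduled task": 223,
    "Inadequate tool control": 84,
    "Deferred defect": 81,
    "Airworthiness data": 78,
    "Tech log": 67,
    "Airworthiness Directive": 66,
    "Modification control": 55,
    "MEL interpretation": 37,
    "Configuration control": 23,
    "Certification": 13,
    "Component robbery": 6,
}

INCOMPLETE_MAINTENANCE = {
    "Not fitted": 268,
    "Not set correctly": 229,
    "Not removed": 105,
}

INCORRECT_MAINTENANCE = {
    "Incorrect fit": 619,
    "Not set correctly": 447,
    "Incorrect part": 160,
    "Poor maintenance practice": 94,
    "Procedure not adhered to": 83,
    "Not fitted": 78,
    "Incorrect repair": 62,
    "Incorrect procedure": 24,
    "Not removed": 22,
}

CATEGORIES = {
    "Maintenance Control": MAINTENANCE_CONTROL,
    "Incomplete Maintenance": INCOMPLETE_MAINTENANCE,
    "Incorrect Maintenance": INCORRECT_MAINTENANCE,
}


def shares(counts):
    """Return each descriptor's share of the category total, in percent (1 d.p.)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for title, counts in CATEGORIES.items():
        print(f"{title} (Total {sum(counts.values())})")
        for name, pct in shares(counts).items():
            print(f"  {name}: {counts[name]} ({pct}%)")
```

Summing each dictionary is also a quick check that the counts listed for each category add up to the stated totals.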

Unfortunately the analysis found that information regarding underlying causes is rarely reported. This significantly limits the value of the analysis, and is something the industry needs to address.

Introducing new technology

Abbey targets cost and service gains from IT overhaul - article on 4 December 2007 by Karl Flinders.

Abbey has developed 'The Partenon' banking platform to replace 30-year-old legacy computer systems and provide the bank with a single view of its customers for the first time. It is hoped this will reduce costs to the business by £300m. Abbey has consolidated all of its customer records on to a single database. Eliminating duplication has allowed the bank to reduce the number of customer records it stores from 52 million to 20 million.

The article goes on to say "Training and getting users to buy into projects is an important competency which is often overlooked in the banking sector, according to Ralph Silva, analyst at TowerGroup."

He said human error is responsible for 40% of the failures of major IT projects in the European banking sector. Only 5% are caused by problems with the technology.

"Almost every major failure of any significant IT project in the European financial services sector can be attributed to human error," said Silva. "The human element is always the last one to be considered, and yet it is the highest cause of failure."

Abbey's training programme

* Face-to-face tuition and e-learning on tools and ways of working delivered to 25,000 staff
* Support for staff in branches and contact centres
* Dedicated single point of contact helpline
* Comprehensive pilots before full roll-out
* Post implementation consolidation training
* Training includes "contingency" processes to minimise service disruption
* Training is piloted with focus groups
* Senior Abbey management sent to Santander to meet colleagues and see Partenon working.

Adverse drug reactions

Allergy to medicines 'is killing thousands' - Article in the Times Online on 27 December 2007 by David Rose.

Nearly 3,000 patients have died in the past three years as a result of taking medicines intended to help them, official figures show. Thousands more have been hospitalised after suffering harmful side-effects or serious allergic reactions to prescription drugs and other medications.

Drugs most commonly implicated in adverse reactions include low-dose aspirin, diuretics, the anticoagulant drug warfarin and other non-steroidal anti-inflammatory drugs. The most common problem associated with these medications is gastrointestinal bleeding, which can be fatal. But many of the reactions were likely to be because of incorrect dosages or known interactions of the drugs and as such were avoidable, research suggests.

Teresa Innes, 38, lapsed into a coma in September 2001 after a surgeon at Bradford Royal Infirmary prescribed a drug containing penicillin as she was about to undergo a routine procedure to drain fluid from an abscess on her thigh. Despite wearing a red allergy band on her wrist and medical notes giving warning about her acute aversion to the antibiotic, Mrs Innes was given the drug Magnapen, which staff did not realise contained penicillin.

The former care worker suffered anaphylactic shock, which stopped her heart for 35 minutes, resulting in permanent brain damage. She was left in a persistent vegetative state from which she never recovered. She died two years later.

This is a good example of how complex it is for someone to become competent in a task. In this case it seems likely that everyone knew about Teresa's allergy, but did not have deep enough knowledge of the drug. Given the number of drugs used in health care this is hardly surprising. Some form of job aid could probably help, if people would use it in practice.

Ergonomics society oil and gas conference - part 5

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Andrew Hopkins gave a presentation entitled "Thinking about process safety indicators." Andrew is very well known for his book "Lessons from Longford" which gives a fascinating account of organisational failures related to Esso's fire and explosion in Australia.

Andrew made a number of very good points in his presentation. He talked about the 'Heinrich triangles', which suggest that for every fatal accident there will be 10 major injuries, 100 minor injuries, 1000 near misses etc. He said this gives the impression that reducing the rate of minor incidents can influence the likelihood of a major accident. However, this is not the case: a separate triangle is required that only covers process safety incidents, so that for every major accident there are 10 major process disturbances, 100 minor process disturbances and 1000 near misses. There may be a very small overlap on the bottom level of the personal and process safety triangles.

Andrew's main point was that we have become overly concerned with the difference between leading and lagging indicators of safety performance. This distinction is quite artificial and not as clear cut as it may appear. Instead what we need is more process safety indicators. It does not really matter if they are leading or lagging, as they only need to occur with sufficient frequency to give statistically relevant data. To be effective the indicators need to show how well barriers or defences are working and performing.

An interesting suggestion from Andrew was that manager bonuses should be linked to process safety, although it must be done in a way that does not cause 'perverse outcomes' whereby the act of measuring leads to data being hidden. Any personal incentives should be symbolic and public (e.g. cinema pass).

Ergonomics society oil and gas conference - part 4

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Ian James of HSE presented the 7 step approach to managing human factors:

1. Consider main site hazards
2. Identify human activities for these (e.g. bulk transfers, maintenance, startup, reactor charging)
3. Outline key steps in these activities (remember to talk to operators)
4. Identify potential human failures for key steps (slips, mistakes and violations)
5. Identify performance influencing factors that make failure more likely (job, person, organisation)
6. Use the hierarchy of control (don't rely on humans as the last line of defence, but automation introduces new issues)
7. Manage error recovery (makes it more likely that errors will be detected by others or the system)

HSE expect companies to take a structured approach, focused on the human role in initiating and mitigating major hazards, that considers all error types (unintentional and decision failures, as well as intentional and action failures). They expect operators to be involved, and management failures to be considered. HSE prefer a qualitative approach, and do not expect quantification of risks related to human factors.

Ergonomics society oil and gas conference - part 3

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Isadore (Irv) Rosenthal gave a presentation titled 'BP's Texas City accident - are the lessons taught likely to be learned and implemented?' Irv had been a member of the Baker Panel that investigated the management and organisational failures that contributed to this accident. I have blogged findings from the report previously, and Irv covered many of these points. However, his presentation provided further insight, which is summarised below.

It is easy to see BP as a large, highly profitable company and wonder why money was not being spent to improve safety. Whilst this is true, the refinery arm of the business made a relatively small contribution to overall profit, well below that of exploration and production. It is estimated that the accident has cost BP over $2.5 billion in fines, settling claims and, most significantly, lost opportunity. It also had a very negative impact on stock/share prices for up to 18 months.

The findings from the Baker Panel report should not have been a surprise to the company, because many similar issues had been raised by reports of the accidents at BP Grangemouth Refinery in 2000. For example, quoting from reports:

1. Grangemouth - "Insufficient management attention and resources were given to maintaining and improving technical standards for process operations and enforcing adherence to standards, codes of practice, company procedures and HSE guidance"
1. Texas City - "Process safety, operations performance, and systematic risk reduction priorities had not been set and consistently reinforced by management."

2. Grangemouth - There was a need to build awareness and competencies in process safety and integrity management within senior leadership and the organisation in order to develop a meaningful value conversation around cost versus safety. "There was a lack of experience in some areas, and limited refresher training plans."
2. Texas City - The Texas City Refinery suffers from an "inability to see risks and, hence, tolerance of a high level of risk. This is largely due to poor hazard/risk identification skills throughout management and the workforce, exacerbated by a poor understanding of process safety...There was no ongoing training program in process hazards risk awareness and identification for either operators, supervisors or managers."

3. Grangemouth - "With no formal structure or specific focus on process safety, many of the components of process safety management (PSM) were not formalised at Grangemouth. There was no site governance structure to provide overview and assurance that process safety issues were being handled appropriately. Process safety needed to be elevated to the same level as person safety."
3. Texas City - "The investigation team was not able to identify a clear view of the key process safety priorities for the site or a sense of a vision or future for the long term. Focus (was) on environment and personal safety, not process safety. There was little ownership of PSM through the line organisation."

4. Grangemouth - "BP group and Complex Management did not detect and intervene early enough on deteriorating performance....Inadequate performance measurement and audit systems, poor root cause analysis of incidents, and incorrect assumption about performance based on lost time accident frequencies and a lack of key performance indicators.. meant the company did not adequately measure the major accident hazard potential."
4. Texas City - "The safety measures focused primarily on occupational safety measures, such as recordable and lost time injuries. This focus on personal safety had led to the sense that safety was improving at the site. There was not clear focus or visibility on measures around process safety, such as lagging indicators on loss of containment, hydrocarbon fires, and process upsets."

5. Grangemouth - "Over the years, a number of maintenance and reliability reviews, task forces, and studies had been conducted, but many recommendations had not been implemented. There was a maintenance backlog and mechanical integrity testing was not prioritised to ensure that safety critical equipment received timely preventative maintenance."
5. Texas City - Risk awareness "repeated failures to complete recommended actions from audits, peer reviews and past incident investigations." "There is currently a backlog of unclosed action items in the tracking database related to various aspects of process safety management, including those stemming from incident investigation. Some of the latter extend back over a period of more than twelve months."

In conclusion Irv felt BP will learn from Texas City because:

1. Everyone at the company felt very bad about the accident and it had had a major financial and public relations impact.
2. The board had recognised that good process safety also improves product quality, yields, profits and the public image needed to keep its licence to operate and win oil leases.
3. Unions, neighbours, regulatory agencies and political concerns will motivate more action
4. BP are implementing process safety that should lead to better process safety practices.

I hope he is right in his conclusions!!

Ergonomics society oil and gas conference - part 2

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Trevor Kletz gave a presentation titled '25+ years of human factors and process safety.' Although I have heard him speak many times and read some of his books, his message is (unfortunately) still very relevant to many.

In this presentation he recounted that in the 1960s it was believed that 80% or more of accidents were due to people not taking enough care, and so the methods used following an accident were to 'persuade' people to be more careful. The actual action taken depended entirely on the consequences, not potential consequences, and ranged from a 'friendly word' through to dismissal "pour encourager les autres."

Trevor's key message was that one element of human factors that is still not getting enough attention is design. Lessons about design are not being learnt, and so opportunities to engineer-out human error are being missed. His examples included:

* Avoid people falling down stairs by only building bungalows. OK, so this may not be possible, but if staircases have one or more turns in them, the distance that can be fallen is significantly reduced;
* At Bhopal the substance that caused the harm to so many people was an intermediate. It was convenient to store it, but not essential;
* Piper Alpha occurred in part because oil and gas is separated offshore, yet it is technically possible to carry out this step onshore;
* Nitration is a common but very hazardous reaction used to make amines. No other process is known, but no one has ever looked for one;
* The new Pendolino trains have a major problem with toilets leaking. This is because the waste materials (which are corrosive) are stored at roof level and when they leak create very bad smells.

Trevor's message was that we are still missing simple fixes during design. Perhaps if accident reports were discussed critically by designers, some of these problems that cause human error would be avoided.

Ergonomics society oil and gas conference - part 1

I was a speaker at the Ergonomics Society's conference on 'Human and organisational factors in the oil, gas and chemical industries' on 27-28 November 2007. I am blogging key messages from some of the presentations.

Martin Anderson opened the conference by giving an idea of where industry should be heading. Of particular note was his negativity towards behavioural safety. Not because there is anything particularly wrong with it, but because too many companies think using such a programme means they have 'done' human factors.

Martin showed a poster ad from the air force. It read "It takes about 80,000 rivets, 30,000 washers, 10,000 screws and bolts to help make this aircraft fly...... and only one nut to destroy it." Martin made it clear that this was NOT a useful message. Individuals rarely have much influence over the factors that make it more or less likely they will make an error, and so telling people to 'be careful' makes very little difference.

We all know that culture is an important part of human factors, but Martin made the point that we can think this refers to 'operator culture' when in fact it is the 'organisational culture' that we need to be looking at. He quoted the following examples from major accidents:

* Poor competency assurance - Esso Longford
* Poor user interfaces - Texaco Pembroke
* Failure to learn from the past - Mexico City
* Poor maintenance management - Bhopal
* Inadequate management of change - Flixborough
* Poor communications - Piper Alpha
* Poor implementation of safety policy - Kings Cross fire

Martin made the point very forcibly that behavioural safety does not equal human factors. Behavioural approaches:

* Focus on observable behaviours only
* Draw attention away from process safety issues
* Don't address the significant impacts of management behaviour
* Can make a contribution to safety, but have limited benefits for the control of major hazards.

In particular it is not appropriate to focus on employee behaviour or culture when the organisation has insufficient resources, inappropriate priorities, does not plan work effectively, has not assessed risks, has poor control over contractors, does not invest capital, has inadequate procedures and competency assurance etc.

Martin finished with a quote from Winston Churchill

"To look is one thing, to see what you look at is another
To understand what you see is another
To learn from what you understand is something else.
But to act on what you learn is all that really matters"

