Thursday, March 29, 2007

CSB Report of BP Texas accident - detail

The 337-page report is now available at the Chemical Safety Board's website.

Below is a reasonably detailed summary of the report. I have also put together a much briefer overview.

On March 23, 2005, at 1:20 p.m., the BP Texas City Refinery suffered one of the worst industrial disasters in recent U.S. history. Explosions and fires killed 15 people and injured another 180, alarmed the community, and resulted in financial losses exceeding $1.5 billion. The incident occurred during the startup of an isomerization (ISOM) unit when a raffinate splitter tower was overfilled; pressure relief devices opened, resulting in a flammable liquid geyser from a blowdown stack that was not equipped with a flare. The release of flammables led to an explosion and fire. All of the fatalities occurred in or near office trailers located close to the blowdown drum.

The Texas City disaster was caused by organizational and safety deficiencies at all levels of the BP Corporation. Warning signs of a possible disaster were present for several years, but company officials did not intervene effectively to prevent it. The extent of the serious safety culture deficiencies was further revealed when the refinery experienced two additional serious incidents just a few months after the March 2005 disaster.

There were numerous cultural, human factors, and organizational causes of the disaster. One underlying cause was that BP used inadequate methods to measure safety conditions at Texas City. For instance, a very low personal injury rate at Texas City gave BP a misleading indicator of process safety performance. In addition, while most attention was focused on the injury rate, the overall safety culture and process safety management (PSM) program had serious deficiencies.

Cost-cutting and failure to invest in the 1990s by Amoco and then BP left the Texas City refinery vulnerable to a catastrophe.


Key Technical Findings

PROCEDURES
* Many deviations from written procedures occurred. These were not unique actions but the result of established work practices, frequently taken to protect unit equipment and complete the startup in a timely and efficient manner.
* Management did not ensure procedures were updated, incorporated learning from incidents, or were adapted to cover unique startup circumstances.
* There was no effective management of change of procedures.
* Management actions (or inactions) sent a strong message to operations personnel that procedures were not strict instructions but were outdated documents to be used as guidance.
* The ISOM startup procedure was not followed and no record was made of steps completed. As a result a key valve was shut that prevented liquid leaving the raffinate splitter tower.
* The procedure required filling the tower to 50% level. Previous experience showed it needed to be filled higher because the level would typically drop significantly during startup. It was filled to a 99% level reading, but the actual level was well off the scale. A high level alarm was activated at 72%, but a subsequent high level switch was faulty (this was not noticed by operators).

PRE-STARTUP CHECKS
* A rigorous pre-startup procedure required all startups after turnarounds to go through a Pre-Startup Safety Review (PSSR). The process safety coordinator for the ISOM was unfamiliar with its applicability, and therefore, no PSSR procedure was conducted.
* The PSSR is a formal review carried out by a technical team led by the operations superintendent and signed off by senior management. It involves verification of all safety systems and equipment, including procedures and training, process safety information, alarms and equipment functionality, and instrument testing and calibration. It also verifies that all non-essential personnel have been removed from the unit and neighboring units and that the operations crew has reviewed the startup procedure.
* BP guidelines state that unit startup requires a thorough review of startup procedures by operators and supervisors, but this was not performed or checked off.
* The startup procedure covered the scenario of one continuous startup. In reality the startup was paused, part of the plant was shut down, and it was later restarted.

MAINTENANCE
* Faulty equipment, including level indicators and control valves, had been identified but not repaired.
* BP Supervisors deemed there was not enough time during the turnaround to make the necessary repairs.
* BP Supervisors stopped technicians checking alarms and instruments because there was not time to complete the checks before the unit was due to start.
* The same BP Supervisors then signed the startup procedure that required that all control valves had been tested and were operational prior to startup.

CONTROL SYSTEM
* Level indicator showed the tower level declining when it was actually overfilling.
* Redundant high level alarm did not activate.
* Tower was not equipped with any other level indications or automatic safety devices.
* The control board display did not provide adequate information on the imbalance of flows in and out of the tower to alert the operators to the dangerously high level.

CONTROL SYSTEM INTERFACE
* The reading of how much liquid raffinate was entering the unit was on a different screen from the one showing how much raffinate product was leaving the unit. This made it difficult to identify a discrepancy (Texaco Pembroke Explosion is referenced).
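The discrepancy the operators could not easily see is a simple material balance: whatever goes in and does not come out stays in the tower. A minimal sketch of that check is below. The function names, units, limit and readings are all my own assumptions for illustration, not figures or methods from the CSB report.

```python
def net_accumulation(flows_in, flows_out, interval_hours):
    """Return liquid accumulated in a vessel over equal time intervals.

    flows_in / flows_out are average flow rates for each interval,
    in the same units (for example barrels per hour).
    """
    if len(flows_in) != len(flows_out):
        raise ValueError("need one inflow and one outflow reading per interval")
    return sum((fin - fout) * interval_hours
               for fin, fout in zip(flows_in, flows_out))


def imbalance_exceeds(flows_in, flows_out, interval_hours, limit):
    """True when the accumulated liquid is above the chosen limit."""
    return net_accumulation(flows_in, flows_out, interval_hours) > limit


if __name__ == "__main__":
    # Hypothetical hourly readings, not data from the CSB report.
    feed = [100.0, 100.0, 100.0]
    product = [0.0, 0.0, 0.0]
    print(net_accumulation(feed, product, 1.0))
    print(imbalance_exceeds(feed, product, 1.0, 200.0))
```

The point is only that both flows need to be available in one place for the sum to be made at all, whether by a person or by the control system.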

MANNING
* There was a lack of supervisory oversight and technically trained personnel during the startup. This is despite analysis carried out by Amoco based on 15 previous incidents that showed incidents were 10 times more likely during startup than normal operation. Guidelines on site were that supplementary assistance be present during startup, including additional board operators.
* One supervisor had to leave site due to a family emergency. No one was assigned to provide effective cover.

SHIFT HANDOVER
* Supervisors and operators poorly communicated critical information during the shift turnover (handover).
* Night shift operator left early. Subsequent shift handover was brief because it did not involve the person who had done all the work.
* Records in the shift log were brief and ambiguous. They were mis-interpreted by the incoming shift. This was further exacerbated by the failure to record steps completed on the startup procedure by the previous shift operators.
* BP did not have a shift turnover communication requirement for its operations staff.


FATIGUE
* Operators had worked 12-hour shifts for 29 or more consecutive days.
* They had been getting about 5 hours of sleep per night.
* BP has no corporate or site-specific fatigue prevention policy or regulations.
* “Operators were expected to work” the 12-hour, 7-days-a-week turnaround schedule, although they were allowed time off if they had scheduled vacation, used personal/vacation time, or had extenuating circumstances that would be considered on a “case-by-case” basis.

COMMUNICATION
* Key messages were not written down, but passed verbally over phone and radio.
* Board and outside operators interpreted a message regarding routing of raffinate differently. The Board operator closed a control valve. The outside operator manually opened that valve.

TRAINING
* The operator training program was inadequate, particularly on the hazards of unit startup.
* Training for abnormal situations was insufficient.
* Training consisted of on-the-job instruction, which covered primarily daily, routine duties.
* Startup or shutdown procedures would be reviewed only if the trainee happened to be scheduled for training at the time the unit was undergoing such an operation.
* BP’s computerized tutorials provided factual and often narrowly focused information, such as which alarm corresponded to which piece of equipment or instrumentation. This type of information did not provide operators with knowledge of the process or safe operating limits.
* BP's training program did not include specific instruction on the importance of calculating material balances, and the startup procedures did not discuss how to make such calculations.
* Managers did not effectively conduct performance appraisals to determine the knowledge level and training development plans of operators.
* The central training department staff had been reduced from 28 to eight.
* Simulators were unavailable for operators to practice handling abnormal situations, including infrequent and high hazard operations such as startups and unit upsets.

PLANT
* The process unit was started despite previously reported malfunctions of the tower level indicator, level sight glass, and a pressure control valve.
* The size of the blowdown drum was insufficient to contain the liquid sent to it by the pressure relief valves.
* Neither Amoco nor BP replaced blowdown drums and atmospheric stacks, even though a series of incidents warned that this equipment was unsafe.

SAFE OPERATING LIMITS
* ISOM operating limits did not include limits for high level in the raffinate splitter tower.
* BP had developed an electronic system for monitoring operation outside the defined envelope. However, the feature to alert that this had occurred had not been activated.

RISK MANAGEMENT
* Occupied trailers were sited too close to a process unit handling highly hazardous materials. All fatalities occurred in or around the trailers.
* Eight previous serious releases of flammable material from the ISOM blowdown stack had not been investigated.
* BP Texas City managers did not effectively implement their pre-startup safety review policy to ensure that nonessential personnel were removed from areas

ORGANISATIONAL FAILURES

COST-CUTTING – failure to invest and production pressures

BOARD OF DIRECTORS – No director responsible for assessing and verifying the performance of BP’s major accident hazard prevention programs.

SAFETY PERFORMANCE - Reliance on the low personal injury rate rather than on indicators of process safety performance and the health of the safety culture.

MECHANICAL INTEGRITY - “run to failure” of process equipment at Texas City.

CHECK BOX MENTALITY - Personnel completed paperwork and checked off on safety policy and procedural requirements even when those requirements had not been met.

CULTURE – lack of reporting and learning culture. Personnel not encouraged to report safety problems and some feared retaliation for doing so. Lessons not captured or acted upon, including those from other sites and organisations.

FAILURE TO ACT - Numerous surveys, studies, and audits identified deep-seated safety problems at Texas City, but the response of BP managers at all levels was typically “too little, too late.”

MANAGEMENT OF CHANGE - BP Texas City did not effectively assess changes involving people, policies, or the organization that could impact process safety.

CSB Report of BP Texas accident - overview

The 337-page report is now available at the Chemical Safety Board's website. The executive summary seems quite comprehensive and readable, but a quick scan of the main report suggests that there is more to learn if you dig deep enough.

From what I have read so far, the key issues were:

* Procedures - did not reflect how tasks were done in practice, and were not really used for the startup
* Pre-start checks - a comprehensive program of checks was specified but not carried out
* Maintenance - faulty equipment was not repaired during the turnaround because supervisors decided there was not enough time
* Control system - indicators and alarms were not working
* Interface - information to carry out a mass balance was not available on a single screen (exactly the same as the Texaco Pembroke accident)
* Manning - failure to provide extra personnel for startup
* Shift handover - insufficient discussion and poor log keeping
* Fatigue - operators working on the turnaround, 12-hour shifts for 30 days without a break
* Communication - critical messages passed verbally and misunderstood
* Training - mostly on the job with no training for abnormal situations, including startup.
* Poor plant design
* Operating limits - failure to identify all key operating limits and to monitor operations
* Poor risk management - including siting of trailers and failure to remove non-essential personnel during start-up
* Multiple organisational failures - as identified in the Baker report

I've also put together a more detailed summary here

Wednesday, March 21, 2007

Shift handover and shift log software

I realised a long time ago from my studies of the Piper Alpha inquiry that shift handover is a critical activity that can contribute to major accidents. However, it has not received much attention in the past, possibly because it has fallen into the category of 'too hard.' The Buncefield inquiry has also identified shift handover as an issue, and it seems likely that it will be receiving a higher profile now.

I met up with the guys from Infotechnics last week to look at their shift log and handover software called Opralog. I was very impressed. It seems to be very easy to use but provides a great deal of power that takes it well beyond being simply a tool for assisting handovers. In particular, it allows companies to start logging events from the perspective of what needs to be done to deal with them, rather than simply what happened to plant and equipment.

Opralog's main features (as I see them) include:

* Predefined events mean operators and technicians have less to write, so they are more inclined to record useful information;
* Interfacing with plant data allows text descriptions to be recorded to explain observed plant events;
* Events can be logged automatically, triggered by plant data (e.g. if a parameter exceeds a certain value), which prompts the operator or technician to record an explanation;
* Logs can feed into each other - for example, certain parts of operator logs can populate part of their supervisor's log.
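To give a rough idea of what an automatically triggered log event might look like, here is a small sketch. It is not Opralog's API or data model, which I have not seen in code form; the `LogEntry` class, the `check_reading` function, and the tag name and values are hypothetical, chosen only to illustrate the "parameter exceeds a value, operator is prompted to explain" idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    """A log entry raised by plant data (hypothetical structure)."""
    tag: str                            # name of the plant parameter
    value: float                        # reading that triggered the entry
    limit: float                        # threshold that was exceeded
    explanation: Optional[str] = None   # filled in later by the operator

    @property
    def needs_explanation(self) -> bool:
        # The entry stays open until someone records an explanation.
        return self.explanation is None


def check_reading(tag: str, value: float, limit: float) -> Optional[LogEntry]:
    """Create a log entry if the reading is above its limit, else None."""
    if value > limit:
        return LogEntry(tag=tag, value=value, limit=limit)
    return None


if __name__ == "__main__":
    # Hypothetical reading and limit, for illustration only.
    entry = check_reading("tower_level_pct", 85.0, 70.0)
    if entry is not None and entry.needs_explanation:
        entry.explanation = "Level raised deliberately during startup."
    print(entry)
```

The useful part is the open `explanation` field: the system records that something happened, and the person records why.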

Well worth a look.

You can find out more about shift handover at my website