Gas turbines (GTs) truly are engineering marvels. Their sophisticated design, materials, and operation challenge the limits of modern technology. When one of these machines ‘breaks,’ understanding what happened, and why, is a demanding exercise requiring the participation of experts in equipment forensics.
By Ron Munson and John Malloy, Mechanical & Materials Engineering LLC (M&M Engineering)
When a major failure occurs to any piece of equipment, the immediate concern is for the welfare of employees. Hazardous conditions must be eliminated and personnel safety secured. It is important to isolate energy sources using proper lock-out/tag-out (LOTO) procedures and to assure structural stability of the compromised equipment. Keep in mind that large dynamic forces from the failure event and resulting fires can weaken structures, leading to their collapse.
The remaining equipment in the quarantined area must be protected from the environment to prevent further damage. It is likely that most equipment will be inoperable for an extended time while repairs are made, thus lay-up procedures are appropriate. Immediate actions after the event have an impact on the success of the failure investigation to follow.
There is no established forensics procedure for the first steps to take following a significant GT event. The best advice is to “document/document/document.” Record the scene with video and/or digital photographs. “Shoot” everything, including control equipment settings and even things that appear normal. As memory fades, the images captured will prove invaluable. Another essential step: Make a projectile map. Tag and label each liberated piece and record its position relative to the equipment of origin.
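A projectile map can be kept as a simple structured log alongside the photographs. The sketch below shows one possible layout; it is not an established forensic standard. The field names, the tag format, and the distance/bearing convention are all assumptions made for illustration, and the example entries are made up.

```python
from dataclasses import dataclass
import csv
import io


@dataclass
class DebrisRecord:
    """One liberated piece, as tagged at the scene (hypothetical fields)."""
    tag_id: str            # label physically attached to the piece
    description: str       # what the piece appears to be
    origin_equipment: str  # equipment the piece is believed to come from
    distance_m: float      # distance from the equipment of origin, in metres
    bearing_deg: float     # compass bearing from the equipment of origin
    photo_ref: str         # filename of the photograph taken in place


def to_csv(records):
    """Serialize records to CSV text, sorted by tag, for archiving."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["tag_id", "description", "origin_equipment",
                     "distance_m", "bearing_deg", "photo_ref"])
    for r in sorted(records, key=lambda rec: rec.tag_id):
        writer.writerow([r.tag_id, r.description, r.origin_equipment,
                         r.distance_m, r.bearing_deg, r.photo_ref])
    return buf.getvalue()


# Made-up example entries.
records = [
    DebrisRecord("P-002", "airfoil fragment", "GT exhaust", 4.5, 170.0, "img_0102.jpg"),
    DebrisRecord("P-001", "casing bolt", "GT compressor", 12.0, 85.0, "img_0101.jpg"),
]
csv_text = to_csv(records)
```

Keeping the photo reference on each record ties every tagged piece back to an image of it in its as-found position.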
The tone and demeanor of investigators during the initial visit is very important. Be casual and do not offer any opinions. Listen carefully to statements by operators and others who could have knowledge about the event; do not prejudge what is said. Record what everyone says; the validity of the statements can be determined later.
People to interview as soon as possible after the event, while memories are fresh, include these:
- First responders. Ask them what they moved or disturbed during their initial damage-control efforts.
- Operators. Ask them about observed trends, anomalies, or procedural changes prior to the event; also, what was reset or turned off immediately after the failure.
- Maintenance personnel. Ask them about recent maintenance and observations they made during that work.
If a fire occurred, give serious consideration to retaining a “cause and origin” professional early in the recovery project. Details of a fire deteriorate quickly with time.
Preserving the remains
Once at-risk personnel are accounted for, equipment is rendered safe, and environmental issues around the GT workspace are resolved, it’s time to begin the investigation into what happened. However, before the investigative team arrives onsite, be sure to protect affected equipment both from the elements, using tarps and enclosures, and from “souvenir hunters.” Broken components may be sought after as keepsakes, but pieces critical to the investigation can be lost if not properly guarded.
In addition, it is important to secure the electronic “remains,” such as data loggers, control-system CPUs, and any other data storage devices. You’ll want information from at least six months prior to the event through the failure to identify trends that may offer a reason for the incident.
Often a step-change or trend in vibration level or lube-oil temperature, considered insignificant before the failure, proves important afterward. Even data within normal limits should be retained because “exclusion” is critical to establishing root cause.
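One way to screen archived data for such a step-change is to compare the average of the readings just before each point with the average just after it. The following is a minimal sketch of that idea, not a validated condition-monitoring method. The function name, the window length, the threshold, the scatter floor, and the sample readings are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def largest_step(values, window=5, threshold=3.0, min_scatter=0.05):
    """Return the index of the largest step-change in `values`, or None.

    At each index, the mean of the next `window` samples is compared with the
    mean of the previous `window` samples. The difference is divided by the
    scatter (population standard deviation) of the previous window, floored at
    `min_scatter` so that very quiet data does not produce huge scores.
    An index is reported only if its score exceeds `threshold`.
    """
    best_index, best_score = None, threshold
    for i in range(window, len(values) - window + 1):
        before = values[i - window:i]
        after = values[i:i + window]
        scatter = max(pstdev(before), min_scatter)
        score = abs(mean(after) - mean(before)) / scatter
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Made-up vibration readings (arbitrary units) with a shift partway through.
readings = [2.0, 2.1, 1.9, 2.0, 2.1, 2.0, 1.9, 2.1,
            2.6, 2.7, 2.5, 2.6, 2.7, 2.6]
step_at = largest_step(readings)
```

Both sets of readings in this example could sit inside normal alarm limits, which is why a check on the shift itself, rather than on the absolute level, may be useful.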
Level of investigation
At the very early stages, it is important to decide what you want the failure investigation to achieve. There are three basic levels to such investigations:
- Level One, which has a 75% to 99% certainty of being correct, focuses on the mechanistic level of damage. In this case, the investigation will determine the component(s) that failed first and the damage mechanism that precipitated the failure. Possible damage mechanisms include creep, thermal-mechanical fatigue, foreign-object introduction (impact), and incipient melting, among many others. Metallurgical investigation techniques and sound logic are your best tools at this level.
- Level Two, with a 50% to 90% certainty of being correct, is determination of cause. Generally, this level is easy to achieve once the failure mechanism is known. To illustrate: If a hot section is overheated, and a faulty exhaust gas thermocouple controller is identified, the mechanism (overheating) and cause (faulty thermocouple controller) are easily identified.
- Level Three, the most difficult and costly to achieve, is the root-cause level of analysis. Its certainty of success always is less than 100%. A root cause analysis (RCA) is multi-disciplined and should ultimately provide the underlying causes of the event. Notice the plural usage of the word “cause.” A major GT loss event almost always has multiple causes, the cumulative effect of which leads to the failure.
Conducting a proper RCA for a GT loss requires a team of skilled and knowledgeable people who work well together and communicate effectively. The composition and size of the team are critical. The team must be small enough to operate efficiently, but have all necessary technical disciplines covered. Members generally should include:
- The owner. It obviously must be represented on the team, but the specific person chosen is an important consideration. He or she should know the big picture of plant and equipment operation, and, more importantly, know where to go for information and facts. The owner’s representative also must be empowered with access to all sensitive information: In an RCA effort the root cause may be controversial.
A plant engineer or maintenance manager usually is involved—perhaps both. Oftentimes, however, the people who should be involved are likely fully occupied and focused on repair of the damaged unit. Understanding what happened often is secondary to restoring the cash flow.
- The OEM. It best understands equipment design and the details of construction. Its delegate must have access to details on construction materials, stress models, and dimensions. The OEM also has historical information on the failed machine, as well as on the entire fleet of similar machines.
However, the OEM usually is not willing or able to share this information with the other team members because of confidentiality and proprietary information concerns. It also is cautious because of litigation/subrogation that could result from this event or similar events at other locations. Management and regulation of the OEM effort by the equipment owner is critical to the success of the RCA.
- Third party, hired by owner. In most cases, the owner will hire a third party, either an individual or a company, to represent it on the RCA team. These engagements usually occur even if the owner has employees on the team. The third party usually has special expertise in either GT technology or failure analysis that is not resident with the owner.
- Third party, hired by insurance carrier and/or others. Insurance contract coverage on modern GTs often is complex. Many companies, each of which has a vested interest in the results of the investigation, typically share the risk. These insurance companies generally will hire an adjuster which, in turn, may hire a third party to help it understand technical issues on the loss or recovery, or to provide the accounting to quantify the financial loss.
In addition, the individual insurance companies also will hire (or jointly hire) an independent third party to protect their interests. The broker, who sold the policy, has an interest to be sure any subsequent claim is fairly and accurately addressed. In many cases, there is a financial institution that has a significant financial interest in the loss and recovery efforts.
The bottom line: For a major, complex failure there can be many interested participants vying for a spot on the RCA team. Managing these diverse parties and their agendas can be challenging, often frustrating.
Not all participants can be on the RCA team. The phrase “Too many cooks in the kitchen” certainly applies here. The owner usually has the final say on participation roles. In most cases, all parties share in the findings of the RCA team, but only a limited number of qualified people participate in the actual effort. In no case should any one participant ever have complete control of the RCA staff. Such a team cannot be objective, candid, and thorough.
Research, data mining
Many GT loss events are not unique. Other machines have had similar or even identical failures. The lessons learned from such earlier failures can greatly streamline the efforts of the RCA team. However, it is important not to leap to conclusions based upon someone else’s evaluation on a similar but not identical piece of equipment. The RCA process must be carefully conducted and fully completed before comparisons are drawn. However, if two or more independent studies reach similar conclusions, the degree of certainty improves.
There is information on GT loss events available to interested parties. User groups are the best source for sharing information, but this must be done discreetly and ethically. Some research organizations such as the Electric Power Research Institute (EPRI), Palo Alto, Calif., have chartered sponsor groups on generic fleet problems. Other sources of information include repair organizations and depots. Retired OEM engineers are also good sources for data and expertise, and make valued members of RCA teams.
OEMs usually issue technical information letters to owners concerning generic or fleet issues affecting machines. It is always prudent to review the letters and determine if one or more are relevant to the failure under investigation. In some instances an owner has received an information letter, but has not implemented the directive. The relevance of this non-compliance must be weighed as it pertains to the impact on the failure event.
Separation of cause versus effect
The first look at a failed turbine can be confusing. The often-catastrophic nature of failure events leads to a jumbled mass of parts and debris. It is a daunting task to sort the sequence of events from an initial triage. The following points aid in assessing the damage:
Think front to rear. Turbine air flow will sweep liberated debris to the rear of the engine, creating an increasing cascade of damage. The failure often, though not always, will begin at the front, or near the front of this damage. Usually damage originating in a row or stage can and will impact the preceding stage by debris being propelled forward. A good example is the liberation of a second-stage airfoil. It would be knocked forward into the trailing edge of the second-stage vane, leaving impact damage in front of the original failed stage.
Machine inputs. It’s always important to understand what’s entering the GT. The following are important:
1. Air, filtered but not pure. While filters generally are effective, some particles will pass around or through the filter into the machine. This is especially true of volatile substances contained in atmospheric water vapor. Because a massive volume of air passes through the turbine, even a low concentration of contaminants translates into a large quantity of particulates entering the engine. Determination of filter condition is vital to this end.
2. Fuel. Variation in fuel composition and properties can affect combustor dynamics and acoustic events. If the fuel quality or composition is suspect, often there are analytical data available for the fuel source.
3. Lube oil. Substandard or contaminated oil can affect the integrity of the hydrodynamic seal, leading to premature bearing failures.
4. Water for fogging, evaporative cooling, NOx control, or thrust augmentation can adversely impact compressor performance and damage the leading edges of compressor blades in the first few stages. Fogging and thrust augmentation also can affect the flow dynamics in the air path and lead to resonant-frequency shifts of the blade harmonics. Injection of water may lead to the buildup of an electrostatic charge that can cause bearing failure.
5. Steam for NOx control may not be pure and can cause corrosion.
Unintentional inputs. Turbomachinery does not like free objects in the flow path. Objects either foreign (unintentionally left in machine) or domestic (parts liberated by metal fracture) can cause failures, usually catastrophic. There are two ways to look for evidence of FOD (foreign object damage) or DOD (domestic object damage).
One is to sort through debris recovered from inside the machine and in the exhaust. This is time consuming and rarely successful. Foreign objects usually are ground up and obliterated by the rotational motion of the turbine. Domestic objects cannot be differentiated from accident debris.
The other method is to examine the airfoils just forward of the catastrophic damage to look for impact marks, which can be examined for geometric form to match suspect objects. It is also possible to examine the marks in an electron microscope to check for metal transfer between the offending object and the damaged component. The transferred metal usually can be matched to a DOD or FOD component.
Physics matters. Often a specialist will focus on his area of expertise and not see an obvious detail. Physics matters! There are several points worth remembering, including these:
- Hot metal expands.
- Gas flows from a region of high pressure to one of lower pressure.
- Rotating parts do not like debris.
- Air gets hotter as it is compressed.
- A filtration efficiency of 99.7% means that 0.3% of the contaminant still gets through.
- Without cooling and protective coatings, GT materials heated above 2600F melt.
- Liberated parts cause increasing damage as they move aft through the engine in the direction of air flow.
- Unbalanced rotors rub the casing.
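The filtration point in the list above can be made concrete with back-of-envelope arithmetic. The sketch below assumes the 99.7% figure is an efficiency by mass. The air flow, dust loading, and operating hours are illustrative assumptions, not data from any particular machine or site.

```python
def mass_passing_filter(air_flow_kg_s, dust_mg_per_kg_air, efficiency, hours):
    """Mass of particulate (kg) that gets past the filter over `hours`."""
    dust_in_kg_s = air_flow_kg_s * dust_mg_per_kg_air * 1e-6  # mg -> kg
    passed_kg_s = dust_in_kg_s * (1.0 - efficiency)
    return passed_kg_s * hours * 3600.0  # hours -> seconds


# All inputs except the efficiency are illustrative assumptions.
passed = mass_passing_filter(
    air_flow_kg_s=400.0,     # assumed compressor inlet mass flow
    dust_mg_per_kg_air=0.1,  # assumed ambient dust loading
    efficiency=0.997,        # the 99.7% figure quoted in the list
    hours=8000.0,            # assumed operating hours, roughly one year
)
```

Under these assumed inputs, roughly 3.5 kg of particulate would pass the filter and enter the engine over the period, even though the filter stops 99.7% of what reaches it.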
The metal does not lie (but it only tells part of the story). A detailed examination of the fracture surfaces by an experienced metallurgist usually can identify the initial damage mechanism for a GT. Generally, the initial component failure will have experienced a progressive damage mechanism such as creep or fatigue; secondary damage will show different mechanisms, such as ductile overload or cleavage. It is prudent to evaluate all fractures, especially those forward of the catastrophic damage. Analysis of the data may involve many tasks; a typical investigation may include:
- Metallurgical laboratory analysis.
- Finite-element modeling and stress analysis.
- Analytical flow modeling.
- NDE of like or sister equipment.
- Apparatus testing.
- Review of parts pedigree, including tracking of third-party repairs.
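The two rules of thumb given earlier (think front to rear, and expect a progressive mechanism on the initiating fracture) can be combined into a simple screening step once the laboratory has classified each fracture. The sketch below is only an aid for shortlisting candidates and does not replace metallurgical judgment. The record layout, the axial-position convention, and the sample observations are assumptions made for illustration.

```python
# Mechanisms the text treats as progressive (developing over time).
PROGRESSIVE = {"creep", "fatigue", "thermal-mechanical fatigue"}


def candidate_origins(fractures):
    """Return fractures with a progressive mechanism, ordered front to rear.

    fractures: list of (component, axial_position, mechanism) tuples, where a
    smaller axial_position means closer to the inlet (an assumed convention).
    """
    progressive = [f for f in fractures if f[2] in PROGRESSIVE]
    return sorted(progressive, key=lambda f: f[1])


# Made-up observations for illustration only.
observations = [
    ("stage 3 blade", 3.0, "ductile overload"),
    ("stage 2 blade", 2.0, "fatigue"),
    ("stage 2 vane", 1.5, "ductile overload"),
    ("stage 4 blade", 4.0, "cleavage"),
]
shortlist = candidate_origins(observations)
```

In this made-up case the stage 2 blade is the only shortlisted item. The overload damage on the vane ahead of it is consistent with debris being propelled forward, as described earlier.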
When all phases of the investigation are complete—including laboratory examination, mechanical analysis, records review, witness statements, and visual observations—a failure scenario should be postulated in a suitable forum for discussion by the RCA team. This must be done before a final report is written. All data and observations should be reviewed critically. If all pieces of the failure puzzle do not fit together, there are two possibilities:
- The theory of failure is wrong or incomplete.
- The input data are wrong.
In the end, the various analyses must converge to the same conclusions.