Monitoring the service life of HRSGs

By Peter Rop, NEM bv

For many years, it has been common practice to employ a robust, service-life monitoring system on a combined-cycle/cogeneration plant’s most valuable turbomachinery: the gas turbines (GTs) and steam turbines (STs). To date, such monitoring systems typically have not been installed on the plant’s heat-recovery steam generators (HRSGs).

Yet HRSGs, just like turbines, are exposed to large and rapid temperature changes, are critical to maintaining plant capacity, and can create enormous repair and business-interruption losses when unscheduled maintenance is required.

A service-life monitoring system doesn’t just measure temperatures and pressures, and display a snapshot of them on a computer screen. Rather, it determines the long-term effects of large temperature changes, as well as the effects of pressure changes and sustained operation at high temperature levels, on the system’s most vulnerable components.

Armed with this information, plant operators can fully understand the effects of different modes of operation, and can take corrective action to improve availability and extend service life. Such information also helps planners determine the best time for equipment inspections, and understand the expected maintenance requirements during upcoming outages.

In other words, by monitoring what is happening inside GTs, STs, and HRSGs, operating procedures can be optimized, maintenance can be better planned and probably even reduced, and smarter economic decisions can be made regarding the benefit of production versus the cost of increased maintenance and reduced service life. Given today’s trend toward increasingly cyclic operation, and the industry’s better understanding of cycling’s impact on plant equipment, manufacturers and users should now extend the use of service-life monitoring systems from their GTs and STs to their HRSGs.

How it works

A service-life monitoring system comprises three systems: a sensor system, a stress evaluator, and a service-life damage evaluator. The sensor system records the necessary operating parameters—localized temperatures and pressures. From these inputs, the stresses are calculated at critical locations in the HRSG. Finally, the stress data are combined with operating history to determine the impact on remaining service life.

Three criteria are used in evaluating the remaining service life: fatigue, creep, and magnetite layer stress. The first two are based on the fraction of service life that is degraded. During every stress cycle or time interval at high temperature and stress, the material is degraded until finally it cannot withstand the stress—either fatigue or creep—anymore. At this point, the material will start to crack and ultimately collapse (Fig 1).

The third criterion is based on absolute stress level, though indirectly also on cycles. When the absolute stress becomes too high at any one location, the magnetite layer—the oxide layer that protects an HRSG’s internal surfaces, especially in the HP drum—will crack. At each crack location, new magnetite will form, but only at the expense of base material, and with reduced material strength. Each time the stress exceeds a limit, the magnetite layer will crack at the same location and the penetration into the base material will grow further until a serious crack is formed that substantially lowers the strength of the drum. How fast this will happen is difficult to predict, so it is preferable to avoid exceeding the stress limit at all locations.

No two are alike. Virtually all combined-cycle/cogen plants are unique designs. Even two side-by-side units will not be perfectly identical, because of differences in site layout and geometry. This means that behavior and responses in each HRSG will be different, so each monitoring system will have to be customized. For example, when geometries are not identical, the same temperature differences in two HRSGs result in different thermal stresses. Therefore the calculation of stress from the measured data must be customized.

For a cycling plant, fatigue is the primary criterion that will determine service life. So the main focus should be on monitoring the largest stress ranges and their number of cycles. This becomes even more important when you realize that fatigue is a logarithmic phenomenon—that is, the allowable number of cycles, before a crack is initiated, is a logarithmic function of the stress range. For this reason the allowable number of cycles is highly sensitive to the stress range. To illustrate how sensitive, consider that when the peak stress range in a high-pressure (HP) superheater outlet header is varied by 10%, the allowable number of cycles in that component varies between -40% and +70%.
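
The sensitivity quoted above can be reproduced approximately with a simple power-law fatigue curve. The sketch below assumes the allowable number of cycles scales as the stress range raised to a negative exponent. The function name and the exponent of 5 are illustrative assumptions, chosen only because they roughly match the figures in the text. They are not properties of any particular header material or design code.

```python
def relative_cycle_change(stress_factor: float, exponent: float = 5.0) -> float:
    """Fractional change in allowable cycles when the stress range is scaled.

    Assumes a power-law fatigue curve N = C * stress_range ** (-exponent).
    The default exponent is a hypothetical, illustrative value.
    """
    return stress_factor ** (-exponent) - 1.0


if __name__ == "__main__":
    # Vary the peak stress range by -10%, 0%, and +10%.
    for factor in (0.9, 1.0, 1.1):
        change = relative_cycle_change(factor)
        print(f"stress range x{factor:.1f}: allowable cycles {change:+.0%}")
```

With this assumed exponent, a 10% increase in stress range cuts the allowable cycles by roughly 38%, and a 10% decrease raises them by roughly 69%, which is close to the range cited for the HP superheater outlet header.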

Bottom line: To get reliable results from a service-life monitoring system, both the input data and the system evaluating that data must be as accurate as possible.

A credible dynamic analysis

Once the decision is made to employ a service-life monitoring system for an HRSG, you first must determine where the critical locations are. Critical means locations with the smallest margins with respect to the three criteria explained earlier.

For the fatigue-damage criteria, the locations must be found with the most severe combination of large stress variations and number of cycles. Because fatigue damage is linked to thermal stresses, and because thermal stresses are caused by dynamic effects—predominately transient temperature differences during startup and shutdown—a dynamic analysis is the necessary starting point.

The term “dynamic analysis” has gained attention in recent years, and many HRSG designers now are at least paying lip service to it. But to be useful, the dynamic analysis and the modeling environment must be thorough, and must cover the entire range of startup and shutdown conditions. Simplifications, quasi-static approaches, and the omission of thermodynamic phenomena, such as local formation of condensate, have been attempted by some designers. These attempts are not valid. For example, the maximum stress in an HP steam drum during a cold startup occurs around 290 psia, which is only about 15% of the maximum pressure. Dynamic linearization around the base-load condition, or a series of static cases around 15% of the maximum pressure, clearly will not yield an accurate result.

A thorough and detailed dynamic model is important also because the peak stresses occur at the subcomponent level. For example, to simulate the temperature behavior of a tube-to-header connection in an HP superheater module, the headers and the tube bundle have to be distinguished separately in the dynamic model. Simulating only the superheater bundle is not detailed enough, and therefore will not provide the proper temperature data for the stress model.

To keep the number of details that must be modeled reasonable, you can first assess the overall design to identify the possible critical locations. Modeling then can be focused on these locations.

Calculating stresses. Once the critical locations are determined, their behavior must be known. Specifically, how do the local temperature and pressure data translate into peak stress?

Finite-element analyses can help, particularly for the most critical spots. Such analyses assist in determining how the peak stress develops as a function of different temperature histories (Fig 2).

Finite-element analyses also indicate how the critical parameters can best be measured. The peak stress itself is mostly on the inside surface of a pressure part, and therefore is practically impossible to measure. Actually, stress itself cannot be measured, only the related strain or force, which is then used to calculate stress. So the measurement will always be of a derived parameter at a location that is not the critical spot itself. Keeping the required accuracy in mind, it therefore is important to select the right parameter to measure, to determine the right location for it to be measured, and to properly relate that parameter to the peak stress.

Often the conditions in the undisturbed wall of the pressure part are used to derive a stress situation in that part of the wall. Through a stress-concentration factor, the peak stress at the critical location is then determined, based on the assumption that the peak stress and the “undisturbed wall” stress are related. Essentially, the stress concentration factor describes this relationship. Note, however, that this often is a complex relationship, not a simple multiplication factor. This means designers must focus on truly understanding the relationship during all possible operating conditions and transients.

From studying the behavior of the peak stress and how it relates to the “undisturbed wall” stress, it is possible to determine the best locations for parameter measurements. But it also is important to determine the type of measurement and to specify the instrumentation. For example, to determine the temperature profile through a drum wall, several temperature measurements through the wall can be used. These include inside, mid-wall, and outside measurements.
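
As a rough illustration of how three through-wall temperatures could be turned into an "undisturbed wall" thermal stress, the sketch below fits a parabola through the inside, mid-wall, and outside readings and uses the difference between the mean wall temperature and the inside temperature. The function name, the flat-plate approximation, the parabolic profile, and the material properties (typical carbon-steel values) are all assumptions made for this example and are not taken from the article.

```python
def inner_wall_thermal_stress(t_inside, t_mid, t_outside,
                              youngs_modulus_mpa=200_000.0,
                              expansion_per_k=13e-6,
                              poisson=0.3):
    """Estimate thermal stress (MPa) at the inside surface of a thick wall.

    Assumptions (hypothetical, for illustration only):
      - the three readings are equally spaced through the wall,
      - the temperature profile between them is parabolic,
      - the wall behaves like a fully restrained flat plate.
    Negative values are compressive.
    """
    # Simpson's rule gives the exact mean of a parabola through 3 points.
    t_mean = (t_inside + 4.0 * t_mid + t_outside) / 6.0
    return (youngs_modulus_mpa * expansion_per_k
            * (t_mean - t_inside) / (1.0 - poisson))


if __name__ == "__main__":
    # Hypothetical readings in degC during a warm-up (inside surface hottest).
    stress = inner_wall_thermal_stress(250.0, 230.0, 220.0)
    print(f"undisturbed-wall thermal stress: {stress:.1f} MPa")
```

The result is the stress in the undisturbed wall only. Relating it to the peak stress at a nozzle or other discontinuity still requires the stress-concentration relationship discussed above, which is generally not a single constant factor.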

A reliable inside temperature measurement, however, is difficult to achieve. To drill a hole from the outside as close as possible to the inside wall, you must know the actual localized wall thickness with high accuracy. Only then is the thickness of the material remaining at the bottom of the hole precisely defined, and only then can the temperature measurement be accurate.

The temperature gradients are large in the first few millimeters extending away from the inside wall, so every millimeter off the expected depth will yield considerable differences in temperature measurements. To increase the accuracy, engineers can use additional measurements that are derived from the process.

For example, on the steam side of an HP drum, the steam is condensing (as long as the steam is warmer than the drum), therefore the inside temperature of the wall is very close to the saturation temperature. Thus engineers can use the saturation temperature, which is easy to calculate from a pressure measurement in the HP drum.
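
One way to obtain the saturation temperature from a drum pressure reading is the saturation-line equation of the IAPWS-IF97 industrial steam-table formulation. The article does not say which steam-property method is used, so the choice of IF97 is an assumption here, as are the function and constant names. The coefficients should be checked against the published IAPWS release before any real use.

```python
import math

# IAPWS-IF97 Region 4 (saturation line) coefficients n1..n10.
# Index 0 is unused so that N[i] matches the published numbering.
N = (
    None,
    0.11670521452767e4, -0.72421316703206e6, -0.17073846940092e2,
    0.12020824702470e5, -0.32325550322333e7, 0.14915108613530e2,
    -0.48232657361591e4, 0.40511340542057e6, -0.23855557567849,
    0.65017534844798e3,
)

PSI_TO_MPA = 0.006894757  # 1 psi expressed in MPa


def saturation_temperature_k(pressure_mpa: float) -> float:
    """Saturation temperature (K) for an absolute pressure in MPa."""
    if not 611.213e-6 <= pressure_mpa <= 22.064:
        raise ValueError("pressure is outside the saturation-line range")
    beta = pressure_mpa ** 0.25
    e = beta * beta + N[3] * beta + N[6]
    f = N[1] * beta * beta + N[4] * beta + N[7]
    g = N[2] * beta * beta + N[5] * beta + N[8]
    d = 2.0 * g / (-f - math.sqrt(f * f - 4.0 * e * g))
    s = N[10] + d
    return (s - math.sqrt(s * s - 4.0 * (N[9] + N[10] * d))) / 2.0


if __name__ == "__main__":
    # 290 psia is the cold-start drum pressure mentioned earlier in the text.
    t_sat_c = saturation_temperature_k(290.0 * PSI_TO_MPA) - 273.15
    print(f"saturation temperature at 290 psia: {t_sat_c:.1f} degC")
```

At 290 psia (about 2 MPa) this gives a saturation temperature of about 212 °C, which could then stand in for the hard-to-measure inside wall temperature on the steam side of the drum while the steam is condensing.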

When cycling the hottest sections of the HRSG (the superheaters and reheaters), it is important to monitor the effect of dynamic temperature differences between tube rows. When the GT is ignited, a heat front will travel through the superheater and reheater towards the HP evaporator (Fig 3). The tube rows warm up one by one, while not being cooled by steam (since steam production has not started yet). This creates large expansion differences between tube rows and bundles, as well as twisting or bending of the headers.
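
To give a feel for the magnitudes involved, the sketch below computes the free (unrestrained) length difference between adjacent tube rows from their temperatures. The function name, the tube length, the row temperatures, and the expansion coefficient are hypothetical values picked for illustration. In a real bundle the headers restrain the tubes, so this difference shows up as force and bending rather than free movement.

```python
def differential_expansion_mm(row_temps_c, tube_length_m, expansion_per_k=13e-6):
    """Free length difference (mm) between each pair of adjacent tube rows.

    row_temps_c is ordered from the first row hit by the gas to the last.
    The expansion coefficient is an assumed, typical steel value.
    """
    mm_per_kelvin = expansion_per_k * tube_length_m * 1000.0
    return [
        mm_per_kelvin * (upstream - downstream)
        for upstream, downstream in zip(row_temps_c, row_temps_c[1:])
    ]


if __name__ == "__main__":
    # Hypothetical snapshot while the heat front passes three uncooled rows.
    for i, delta in enumerate(differential_expansion_mm([450.0, 300.0, 150.0], 20.0)):
        print(f"rows {i + 1}-{i + 2}: {delta:.0f} mm")
```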

Many tube-to-header failures can be traced to stresses induced by these large differential-expansion forces and bending moments that occur during every startup. It follows that accurate measurement of the tube row temperatures is essential to a service-life monitoring system. Such accuracy, however, comes at the cost of progressively more data that has to be handled and interpreted.

Interpreting the results

Finally, once the stresses and their cycles are determined, the impact on the material must be established. More important, the impact must be presented in such a way that it can be used in economic decision-making.

Often the damage resulting from fatigue and creep—plus their very important interaction—is presented as a “damage fraction.” This assumes that every stress cycle or interval at high temperature inflicts a certain amount of damage. After a 100% damage level is reached, the material is predicted to collapse. The damage fraction tells you how far, linearly speaking, the material is from failure.
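
A damage fraction of this kind is commonly built up by linear summation: each group of fatigue cycles contributes its count divided by the allowable count, and each period at high temperature contributes its duration divided by the time to rupture. The sketch below assumes that simple cycle-fraction and time-fraction approach. The function name and all numbers are hypothetical, and the plain sum shown here does not capture the creep-fatigue interaction mentioned above.

```python
def damage_fraction(fatigue_cycles, creep_hours):
    """Linear damage sum: fatigue (n / N_allowable) plus creep (t / t_rupture).

    fatigue_cycles: iterable of (cycles_experienced, cycles_allowable)
    creep_hours:    iterable of (hours_at_condition, hours_to_rupture)
    A result of 1.0 corresponds to the 100% damage level.
    """
    fatigue = sum(n / n_allow for n, n_allow in fatigue_cycles)
    creep = sum(t / t_rupture for t, t_rupture in creep_hours)
    return fatigue + creep


if __name__ == "__main__":
    # Hypothetical history: 200 cold starts, 1,500 hot starts, 40,000 h at load.
    total = damage_fraction(
        fatigue_cycles=[(200, 2_000), (1_500, 10_000)],
        creep_hours=[(40_000, 200_000)],
    )
    print(f"accumulated damage fraction: {total:.0%}")
```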

In reality, the material’s behavior is not linear. It is more complicated than that. In fact, both fatigue and creep are probabilistic phenomena. This means that a probability of failure develops according to a certain function of the number of stress cycles and time at high temperature. That particular failure represents an amount of money that will be lost when it actually occurs. Therefore the probability of failure represents a “risk,” defined as the probability multiplied by the consequence, which also can be expressed as an amount of money. That amount of money can be compared directly with the benefits that are expected if the plant takes that risk.

So instead of presenting a damage fraction, a robust service-life monitoring system can present the chance of failure at a certain critical location. Combined with the cost of that failure, it provides operators with an actual monetary risk, which can be taken into account when deciding on a certain mode of operation. This approach requires good understanding of the probability of failure and its behavior, and the cost of failure of the critical locations.

The latter might also be a function of the way the probability function develops. For example, repairing a crack in a steam nozzle obviously is different from replacing that nozzle because of its complete failure.
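
The risk comparison described above reduces to simple arithmetic once a probability and a cost are available. The sketch below is a minimal illustration. The function names, the added probability of failure, the failure cost, and the expected benefit are all hypothetical numbers, not data from any plant.

```python
def monetary_risk(probability_of_failure: float, cost_of_failure: float) -> float:
    """Risk expressed as money: probability multiplied by consequence."""
    return probability_of_failure * cost_of_failure


def benefit_exceeds_risk(added_probability: float,
                         cost_of_failure: float,
                         expected_benefit: float) -> bool:
    """True when the expected benefit of an operating mode exceeds its risk."""
    return expected_benefit > monetary_risk(added_probability, cost_of_failure)


if __name__ == "__main__":
    # Hypothetical case: a faster startup adds 0.2% failure probability at a
    # location where a failure would cost 5,000,000, for a 25,000 benefit.
    risk = monetary_risk(0.002, 5_000_000)
    print(f"monetary risk: {risk:,.0f}")
    print("take the risk:", benefit_exceeds_risk(0.002, 5_000_000, 25_000))
```

If the cost of failure itself depends on how far the damage has progressed (crack repair versus full replacement), the single cost figure would be replaced by a cost that varies with the failure mode.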


Nevertheless, presenting the data merely as a damage fraction can be quite useful because it makes it easy to compare different modes of operation. This is particularly true at the beginning of service life, when chances of complete failure are still small.

Challenge for existing HRSGs. A service-life monitoring system ideally would be employed on new HRSGs, before they have accumulated any run time. The difficulty in retrofitting such a monitoring system on existing HRSGs is that you somehow must estimate the damage that has already been inflicted on each unit since its initial startup.

One option might be to connect the service-life monitoring system to a dynamic model that runs the operational data accumulated by the plant, from initial startup to the moment the service-life monitoring system was installed. Depending on the chosen method of presenting the data, an initial damage fraction, a probability of failure, or a risk could then be determined for each of the critical locations.

Summarizing the points made

  • A service-life monitoring system for an HRSG is essential to optimizing operating procedures and to improving maintenance strategies.
  • Since no two combined-cycle/cogen plants are perfectly identical, the service-life monitoring system must be customized for each HRSG and balance-of-plant configuration.
  • For cycling units, fatigue is the predominant damage mechanism that reduces service life. Therefore, a thorough dynamic analysis of the HRSG and balance-of-plant must be performed to identify the critical locations that must be monitored.
  • For a service-life monitoring system to yield credible results, the sensitivity of fatigue damage to stress requires the best attainable accuracy. This begins with proper measurements, which allow calculation of the actual peak stresses.
  • Peak stress depends not only on local effects at the component level, but also on interactions at the sub-component level. Therefore, these interactions also must be monitored, requiring a significant increase in the number of measurements that must be collected.
  • Proper measurement of essential parameters requires extra attention. This is particularly true for the many critical locations that are hard to reach, and therefore must rely on indirect measurements.
  • To gain a better insight into the damage mechanisms that are affecting your HRSG, you can use several different methods for presenting the data. Both fatigue and creep are probabilistic phenomena that can be used to determine how the probability of crack initiation develops. Translating this probability into a “risk” factor provides operators with an expected financial cost of failure that can be weighed against the expected financial benefits of continuing to operate.