Degradation modeling applied to residual lifetime prediction

Preparing to load PDF file. please wait...

0 of 0
Degradation modeling applied to residual lifetime prediction

Transcript Of Degradation modeling applied to residual lifetime prediction

The Annals of Applied Statistics 2011, Vol. 5, No. 2B, 1586–1610 DOI: 10.1214/10-AOAS448 © Institute of Mathematical Statistics, 2011
Georgia Institute of Technology
Sensor-based degradation signals measure the accumulation of damage of an engineering system using sensor technology. Degradation signals can be used to estimate, for example, the distribution of the remaining life of partially degraded systems and/or their components. In this paper we present a nonparametric degradation modeling framework for making inference on the evolution of degradation signals that are observed sparsely or over short intervals of times. Furthermore, an empirical Bayes approach is used to update the stochastic parameters of the degradation model in real-time using training degradation signals for online monitoring of components operating in the field. The primary application of this Bayesian framework is updating the residual lifetime up to a degradation threshold of partially degraded components. We validate our degradation modeling approach using a real-world crack growth data set as well as a case study of simulated degradation signals.
1. Introduction. Most failures of engineering systems result from a gradual and irreversible accumulation of damage that occurs during a system’s life cycle. This process is known as degradation [Bogdanoff and Kozin (1985)]. In many applications, it can be very difficult to assess and observe physical degradation, especially when real-time observations are required. However, degradation processes are almost always associated with some manifestations that are much easier to observe and monitor overtime. Generally, the evolution of these manifestations can be monitored using sensor technology through a process known as Condition Monitoring (CM). The observed condition-based signals are known as degradation signals [Nelson (1990)] and are usually correlated with the underlying physical degradation process. Some examples of degradation signals include vibration signals for monitoring excessive wear in rotating machinery, acoustic emissions for monitoring crack propagation, temperature changes and oil debris for monitoring engine lubrication and many others.
Degradation modeling attempts to characterize the evolution of degradation signals. There is a significant number of research works that have focused on degradation models; these include models presented in Lu and Meeker (1993), Padgett and Tomlinson (2004), Gebraeel et al. (2005), Müller and Zhang (2005), Gebraeel
Received May 2010; revised October 2010. 1Supported in part by the NSF Grant CMMI-0738647. Key words and phrases. Condition Monitoring, functional principal component analysis, nonparametric estimation, residual life distribution, sparse degradation signal.



FIG. 1. Examples of incomplete degradation signals.
(2006) and Park and Padgett (2006). Many of these models rely on a representative sample of complete degradation signals. A complete degradation signal is a continuously observed signal that captures the degradation process of a component from a brand “New State” to a completely “Failed State.”
Unfortunately, building a database of complete degradation signals can be very expensive and time consuming in applications, such as monitoring of jet engines, turbines, power generating units, structures and bridges and many others. For example, in applications consisting of relatively static structures such as bridges, degradation usually takes place very slowly (several tens of years). Since the system is relatively static, it suffices to observe the degradation process at intermittent discrete time points. The result is a sparsely observed degradation signal such as the signals depicted in Figure 1(a). In contrast, in applications consisting of dynamic systems, such as turbines, generators and machines, degradation cannot be reasonably assessed by sparse measurements. At the same time, continuous observations of the complete degradation process of such systems are economically unjustifiable. Usually, the only way to gain a relatively accurate understanding of the health/performance of a dynamic system is to monitor its performance over a time interval. In naval maritime applications, power generating units of an aircraft are removed, tested for a short period of time (during which degradation data can be acquired) and put back into operation. The result is a collection of fragmented degradation signals as depicted in Figure 1(b).
In this paper we develop a degradation model that applies to incomplete degradation signals as well as complete degradation signals. An incomplete degradation signal is defined as a signal that consists of sparse observations of the degradation process or continuous observations made over short time intervals (fragments). One challenge in such applications is that the evolution of the degradation signals cannot be readily assessed to determine the parametric form of the underlying degradation model. This is because one cannot clearly trace how a degradation



signal progresses over time from incomplete observations. For example, is there a well defined parametric model that describes the signals in Figure 1?
To overcome this challenge, the underlying degradation model in this paper is assumed nonparametric. Most degradation models used to characterize the evolution of sensor-based degradation signals are parametric models. A common approach is to model the degradation signals using a parametric (linear) model with random coefficients [Lu and Meeker (1993); Gebraeel et al. (2005); Gebraeel (2006)]. Other modeling approaches assume that the degradation signal follows a Brownian motion process [Doksum and Hoyland (1992); Pettit and Young (1999)] or a Gaussian process with known covariance structure [Padgett and Tomlinson (2004); Park and Padgett (2006)]. In contrast, we assume that the mean and covariance functions of the degradation process are unknown and they are estimated based on an assembly of training incomplete degradation signals. The mean function is estimated using standard nonparametric regression methods such as local smoothing [Fan and Yao (2003)]. The covariance function is decomposed using the Karhunen–Loève decomposition [Karhunen (1947); Loève (1945)] and estimated using the Functional Principal Component Analysis (FPCA) method introduced by Yao, Müller and Wang (2005).
Under the nonparametric modeling framework, one condition for accurate estimation of the mean and covariance functions is that the degradation process is densely observed throughout its support. However, in applications where the degradation signals are incompletely sampled, not all degradation signals are observed up to the point of failure; in addition, only a few components will survive up to the maximum time point of the degradation process support. Consequently, the degradation process is commonly under-sampled close to the upper bound of its support. To overcome this difficulty, we introduce a nonuniform sampling procedure for collecting incomplete degradation signals, which ensures relatively dense coverage throughout the sampling time domain.
One important application of degradation modeling is predicting the lifetime of components operating in the field. For real-time monitoring, an empirical Bayes approach is introduced to update the stochastic parameters of the degradation model. In this paper we focus on estimation of the distribution of the residual life up to a degradation threshold for partially degraded components using training degradation signals which are sparsely or completely observed. Other applications of the degradation modeling and the Bayesian updating procedure are estimation of the lifetime at a specified degradation level and estimation of the degradation level at a specified lifetime.
We evaluate the performance of our methodology using both a crack growth data set and simulated degradation signals. In these empirical studies we compare parametric to nonparametric degradation modeling, assess the estimation accuracy of the remaining lifetime for complete and incomplete signals, and contrast uniform vs. nonuniform sampling procedures for acquiring ensembles of incomplete



degradation signals. In both studies there is not a significant decrease in the accuracy of the residual life estimation when using ensembles of incomplete instead of complete signals. We also highlight the robustness of our approach by comparing it with misspecified parametric models, which are common when the underlying degradation process is complicated and sparsely observed. Last, we show in the simulation study that using a nonuniform sampling procedure that ensures dense observation of the sampling time domain reduces the estimation error. Based on these empirical studies, we conclude that the nonparametric approach introduced in this paper is efficient in characterizing the underlying degradation process and it is more robust to model misspecification than parametric approaches, which is particularly important when the training signals are incompletely observed (sparse or fragmented).
The remainder of the paper is organized as follows. Section 2 discusses the development of our degradation modeling framework. The empirical Bayes approach for updating the degradation distribution of a partially degraded component is introduced in Section 3. The derivation of the remaining lifetime distribution under the empirical Bayes approach is presented in Section 4. In Section 5 we introduce an experimental design for sampling incomplete degradation signals. We discuss performance results of our methodology using real-world and simulated degradation signals in Section 6 and 7, respectively.

2. Sensor-based degradation modeling. We denote the observed degradation signals Si(tij ), for j = 1, . . . , mi (mi is the number of observation time points for signal i) and i = 1, . . . , n (n is the number of signals) where {tij }j=1,...,mi are the observation time points in a bounded time domain [0, M] for signal i. Note
that M will always be finite since any industrial application has a finite time of
failure. We model the distribution of the signals Si(t) by borrowing information across multiple degradation signals. We decompose the degradation signal as


Si(t) = μ(t) + Xi(t) + σ εi(t),

where μ(t) is the underlying trend of the degradation process and is assumed to be
fixed but unknown, and Xi(t) represents the random deviation from the underlying degradation trend. We also assume Xi(t) and εi(t) are independent.
The model in (2.1) is a general decomposition for functional data with various
modeling alternatives and assumptions for the model components: μ(t), Xi(t), and εi(t). In this paper we discuss one such modeling alternative which applies to sparse and fragmented signals as well as to complete signals and it applies under the assumption that the observation time points {tij }j=1,...,mi are fixed but not necessarily equally spaced and the assumption that the error terms εi(t) are independent and identically distributed. Deviations from these assumptions may require
some modifications to the modeling approach discussed in this paper.
In our modeling approach, the degradation signal Si(t) follows a stochastic process with mean μ(t) and stochastic deviations Xi(t) with mean zero and covariance cov(t, t ). The mean function μ(t) and the covariance surface cov(t, t )



are both assumed to be nonparametric, that is, no prespecified assumption on their
shape. This generalized assumption encompasses the particular cases developed
earlier by Gebraeel et al. (2005) and Gebraeel (2006), which assume a linear trend, μ(t) = α + βt where α ∼ N (0, δα) and β ∼ N (0, δβ), and parametric covariance structure cov(t, t ) = δα + δβ tt .
The following steps discuss how we estimate the mean function and the covari-
ance surface of our degradation model.

Step 1: We use nonparametric methods to estimate the mean μ(t). In this paper we use local quadratic smoothing [Fan and Yao (2003)] to allow estimation of the mean function under general settings including complete and incomplete (sparse and fragmented) signals. The bandwidth parameter, which controls the smoothing level, is selected using the leave-one-curve-out cross-validation method [Rice and Silverman (1991)]. Alternative estimation methods include decomposition of the mean function using a basis of functions (e.g., splines, Fourier, wavelets) and estimate the coefficients using parametric methods. These alternative methods will apply under various signal behaviors (e.g., smooth vs. with sharp changes, uniformly vs. nonuniformly sampled).
Step 2: The covariance surface is estimated using the demeaned data, Si(t) − μˆ (t), where μˆ (t) is the local quadratic smoothing estimate of μ(t). Using the Karhunen–Loéve decomposition [Karhunen (1947); Loève (1945)], the covariance, cov(t, t ) = Cov(Si(t), Si(t )), can be expressed as follows:


cov(t, t ) = λkφk(t)φk(t ), t, t ∈ [0, M],


where φk(t) for k = 1, 2, . . . are the associated eigenfunctions with support [0, M] and λ1 ≥ λ2 ≥ · · · are the ordered eigenvalues. Based on this decomposition, the deviations from the underlying degradation trend Xi(t) are decomposed using the following expression:


Xi(tij ) = ξikφk(tij ),

where ξik called scores are uncorrelated random effects with mean zero and variance E(ξi2k) = λk. The decomposition in equation (2.3) is an infinite sum. Generally, only a small number of eigenvalues are commonly significantly nonzero. For
the eigenvalues which are approximately zero, the corresponding scores will also
be approximately zero. Consequently, we use a truncated version of this decompo-
sition. Therefore, expression (2.3) can be approximated as follows:


Xi(tij ) = ξikφk(tij ),

where K is the number of significantly nonzero eigenvalues. We select K to minimize the modified Akaike criterion defined by Yao, Müller and Wang (2005).



In the statistical literature this method has been coined Functional Principal Component Analysis (FPCA). The key reference for FPCA is Ramsay and Silverman [(1997), Chapter 8]. Another important reference is Yao, Müller and Wang (2005), in which the authors derived theoretical results for model parameter consistency and asymptotic (n large) distribution results under the assumption that the scores follow a normal distribution.
An alternative method for estimating the covariance function of the process Xi(t) is decomposing the covariance function as in equation (2.2) where the basis of functions {φk, k = 1, 2, . . .} is fixed [James, Hastie and Sugar (2000)]. However, this approach doesn’t allow dimensionality reduction in the same way FPCA does and it is not theoretically founded.

3. Degradation model updating. Next, we consider a component operating in the field called fielded component. Assume that we have observed its degradation signal at a vector of time t = (t1, . . . , tm∗); therefore, S(t) denotes the observed signal of the testing component, m∗ represents the number of observations and t∗ = tm∗ denotes the latest observation time. In this section we introduce an Empirical Bayes approach which allows real-time updating of the distribution of the degradation process for partially degraded components given the observed signal S(t) and the prior distribution of the scores ξik for k = 1, 2, . . . . The prior distribution of the scores is estimated empirically from a set of historical degradation signals.
Proposition 1 illustrates the updating procedure assuming that the prior distribution of the scores is normal and assuming that the mean function μ(t) and the basis of functions φk(t), k = 1, . . . , K, are fixed. The proof of this proposition follows from the theory of Bayesian linear models.

PROPOSITION 1. Assume that S(t) follows

S(t) = μ(t) + ξkφk(t) + ε(t),

where the prior distribution of ξk is N (0, λk) with ξ1, . . . , ξK uncorrelated; ε(t) are independent of ξk for k = 1, . . . , K; the distribution of ε(t) is N (0, σ 2) with σ 2 fixed. It follows that the posterior distribution of the scores is

(ξ1∗, . . . , ξK∗ ) ∼ N (Cd, C),



1 P (t) P (t) +

−1 −1

and d = 1 P (t) S(t) − μ(t)





with (3.1)

S(t) = (S(t1), . . . , S(tm∗)) , = diag(λ1, . . . , λK ),

μ(t) = (μ(t1), . . . , μ(tm∗)) ,

φ1(t1) . . . φK (t1)

P (t) = ⎝ . . . . . . . . . ⎠ .

φ1(tm∗ ) . . . φK (tm∗ )

In Proposition 1 the prior distribution of the scores is specified by the variance

parameters λk, k = 1, . . . , K, which are estimated using the degradation model

in Section 2 and based on a set of incomplete or complete training degradation

signals. Specifically, we first apply Functional Principal Component Analysis on

the historical degradation signals which will further provide estimates for the vari-

ance parameters λk, k = 1, . . . , K, and the eigenfunctions φk, k = 1, . . . , K. Based

on these estimates, we obtain the posterior distributions of the updated scores ξ1∗, . . . , ξK∗ since the matrix C and the vector d are fully determined by the eigenvalues λk, k = 1, . . . , K, and the eigenfunctions φk, k = 1, . . . , K. The expectation

of the posterior scores is nonzero and, therefore, we denote the posterior mean

function μ∗(t) = μ(t) +

K k=1







Following Proposition 1, the expectation of the posterior distribution follows

the same formula as the conditional expectation estimator in Yao, Müller and

Wang (2005), equation (4). Generally, this similarity applies under the empirical

Bayesian prior derived from FPCA. On the other hand, the sampling distribution

of the conditional expectation estimator in Yao, Müller and Wang (2005) is different from the posterior distribution of ξk∗, k = 1, . . . , K, since their variances are
not equal. Moreover, the conditional expectation estimator and its mean estimation

error in Yao, Müller and Wang (2005) are conditional on the training observations,

whereas the posterior distribution in Proposition 1 is conditional on the observa-

tions of a new component.

The advantage of this Bayesian framework is that it unifies the conditional ex-

pectation estimation and prediction into a procedure which allows updating the

distribution of the degradation process for a new component. We can therefore use

the posterior distribution of the scores for a partially degraded signal to estimate

the distribution of various statistical summaries, including the lifetime at a speci-

fied degradation level and estimation of the degradation level at a specified time.

In the next section we discuss one specific application to this updating framework:

residual life estimation.

4. Remaining life distribution. In this paper we focus our attention on engineering applications where a soft-failure of a system occurs once its underlying degradation process reaches a predetermined critical threshold. This critical threshold is commonly used to initiate maintenance activities such as repair and/or component replacement well in advance of catastrophic failure. Consequently, degradation data can still be observed beyond the critical threshold. In this section we



describe how our degradation modeling framework is applied to estimate the dis-
tribution of remaining life up to a degradation threshold of partially degraded sys-
tems. In the remainder of this section, S∗(·) will denote the underlying degradation
process of a partially degraded system. Based on the degradation process S∗(·), the failure time of a system is defined as


T = inf {S∗(t) ≥ D}.
t ∈[0,M ]

One has to bear in mind that T may not exist if the threshold D is set too high, that is, the component may fail before its degradation signal reaches the threshold. The selection of the failure threshold D is an important problem, however, this aspect is beyond the scope of this paper. In this work, we assume that T exists, and the threshold D is known a priori. This is a reasonable assumption because in many industrial applications failure/alarm thresholds are usually based on subjective engineering judgement or well-accepted standards, such as the International Standards Organization (ISO) (e.g., the ISO 2372 is used for defining acceptable vibration threshold levels for different machine classifications). A second assumption is that the failure time T is smaller than a maximum failure time M. This assumption is also reasonable, as in practice a component may be replaced after a given period of time even if it did not fail.
The distribution of the residual life (RLD) of a partially degraded component at a fixed time t∗ ∈ [0, M] is estimated assuming that the degradation process S∗(·) of the component follows a posterior distribution based on Proposition 1. We estimate the distribution of the residual life (RLD) using

R(y|t∗) = P T − t∗ ≤ y|S∗(t) ∼ Gaussian(μ∗(t), Cov∗(t, t )), t∗ ≤ T ≤ M ,

where μ∗(t) and Cov∗(t, t ) are the posterior mean and covariance functions of the degradation process S∗(·). The derivations of μ∗(t) and Cov∗(t, t ) are based on the results of Proposition 1. We note here that the distribution of the RLD above is not conditional on the observed signal of the partially degraded component but on the posterior distribution of its degradation process; since the degradation is only partially observed and most often sparsely sampled, conditioning on the posterior distribution will generally provide a more accurate RLD estimator since we incorporate the additional information in the training degradation signals.
Furthermore, we estimate RLD under two assumptions:

A.1 The new component has not failed up to the last observation time point t∗, that is, the failure time becomes

T = inf {S∗(t) ≥ D} = inf {S∗(t) ≥ D} := T ∗.

t ∈[0,M ]

t ∈[t ∗ ,M ]

A.2 We assume the probability that the degradation process S∗(t) crosses back the threshold D after the failure time T ∗ is negligible, that is, P (S∗(T ∗ + y) <



D) ≈ 0 for all y > 0. This implies, if we condition on y ≥ T ∗ − t∗ > 0, which is the same as conditioning on T ∗ ≤ t∗ +y, P (S∗(t∗ +y) < D|T ∗ ≤ t∗ +y) ≈ 0. This further implies
P S∗(t∗ + y) ≥ D = P S∗(t∗ + y) ≥ D|T ∗ ≤ t∗ + y P (T ∗ ≤ t∗ + y)
≈ P (T ∗ ≤ t∗ + y).

Under these two assumptions, the RLD becomes R(y|t∗) = (by A.1) P T ∗ − t∗ ≤ y|S∗(t) ≈ (by A.2) P S∗(t∗ + y) ≥ D|S∗(t) .

The approximation in assumption A.2 is similar to the approximation in the paper
by Lu and Meeker (1993) which assumes that the probability of a negative random
slope in the linear model is negligible. One particular case for the assumption A.2
to hold is that the signal is monotone. However, monotonicity is not a necessary
condition. Assumption A.2 also holds for nonmonotone signals—an example of
such a signal is in Figure 2.
Proposition 2 below describes the updating procedure for RLD of a new component given the posterior distribution of its degradation process S∗(·) updated up to time t∗. The proof follows directly as a consequence of Proposition 1.

PROPOSITION 2. For a new partially degraded component with its degradation process S∗(·) updated up to time t∗, the residual life distribution is given as

FIG. 2. Example of a signal for which assumption A.2 holds.



follows: (4.2) P T − t∗ ≤ y|S∗(·), T ≥ t∗ =

Z(g∗(y|t∗)) − Z(g∗(0|t∗))

1 − Z(g∗(0|t∗))


where Z represents the standard normal cumulative distribution function and g∗(y|t ∗) = μ√∗(Vt∗∗+(ty∗+)−yD) with
μ∗(t∗ + y) = μ(t∗ + y) + (Cd) p(t∗ + y),


V ∗(t∗ + y) =

[Ck1,k2φk1 (t ∗ + y)φk2 (t ∗ + y)].

k1=1 k2=1

In the above equations, p(t∗ + y) = (φ1(t∗ + y), . . . , φK (t∗ + y)) , and Ck1,k2 refers to the (k1, k2) element of the matrix C.
One advantage of obtaining the distribution rather than simply a point estimate is that we can also derive a confidence interval for the remaining lifetime up to a degradation threshold D. Following the derivation in Proposition 2, a 1 − α confidence interval for RLD is [L, U ] such that
P L ≤ T − t∗ ≤ U |S∗(·), T ≥ t∗ = 1 − α.

Since we have one equation with two unknowns, the lower—L and the upper— U tails are commonly equally weighted, and, therefore,

Z(g∗(U |t∗)) − Z(g∗(0|t∗))


1 − Z(g∗(0|t∗))

=1− 2


Z(g∗(L|t∗)) − Z(g∗(0|t∗)) α

1 − Z(g∗(0|t∗))

=. 2

However, we cannot obtain exact solutions for L and U because we do not have a closed-form expression for the inverse of the cumulative density function of T −t∗. For example, the first relationship is equivalent to finding U from g∗(U |t∗) = zα1 where zα1 is the 1 − α1 quantile of the normal distribution. Using this equation, we

would like to obtain U such that

μ∗(t∗ + U ) − D

√ V





= zα1 ,

which is a nonlinear function of U and its solution does not have a close form
expression. We therefore resort to parametric bootstrap [Efron and Tibshirani (1993); Davison and Hinkley (1997)] to sample from the distribution of T − t∗ which will give us a set of realizations from this distribution—T1, T2, . . . , TB . Using these realizations from the distribution of T − t∗, we estimate a quantile boot-
strap confidence interval. The confidence interval estimation procedure is as follows. For b = 1, . . . , B:
DistributionDegradation SignalsDegradation ProcessComponentPaper