Prognostics Health Management: Perspectives in Engineering Systems Reliability Prognostics

Prognostic Health Management (PHM) has been establishing itself as the most promising methodology for enhancing the effective reliability and availability of a product or system under its life-cycle conditions. It does so by detecting current and impending failures, thus mitigating system risks while reducing logistics and support costs. However, PHM is still at an early stage of development, and there are concerns about possible shortcomings of its methods, tools, metrics and standardization. These factors have severely restricted the applicability of PHM and its adoption by industry. This paper presents a comprehensive literature review of the main general weaknesses of PHM. Drawing on the research opportunities identified in recent publications, general guidelines for addressing these issues are discussed and outlined.


INTRODUCTION
In modern complex engineering systems, Prognostic Health Management (PHM) is of increasing importance in System Design and Development, Production and Construction, Operations, Logistics Support and Maintenance, Safety, and Phase-out and Disposal (Sun et al. 2012, Rundle et al. 2012). This importance is due to several factors, summarized in Figure 1, most of which are discussed in detail by Sun et al. (2012) and Rundle et al. (2012). The PHM approach simultaneously combines several methods, tools and approaches that until quite recently were used in isolation. These characteristics have allowed PHM to continually reach new application areas not yet covered by any other approach, such as energy efficiency programs (Welch & Rogers 2010) or decision support for prognostics-based product warranties and health-monitoring-based liability (Rundle et al. 2012, Ning et al. 2013).
Many successful applications have shown the promising potential of this recent methodology, whose technologies, methods and tools address problems of reliability, availability and maintenance from a Condition Based Maintenance (CBM) perspective (Sheppard et al. 2009, Sun et al. 2012). Using operational and environmental conditions in system analysis and modelling has proven advantageous for detecting performance degradation and failures, so that undesired events can be avoided and managed. In recent years many PHM case studies have been published, based mainly on the isolated use of Model-based, Data-driven, or Fusion (Hybrid) approaches. However, other relevant issues remain unresolved, especially problems related to: data fusion with multi-dimensional Condition Monitoring (CM) input; development of models that can deal with multiple failure modes and with the influence of external environmental variables; the treatment of the dynamics and non-linearity of some degradation processes with linear approaches (Si et al. 2012); reliability prognostics over extended periods of time; detection of intermittent faults; and the inclusion of software failures in degradation models.
The development of hybrid methods should be intensified, including uncertainty quantification, confidence-level quantification, and the associated methods, tools, metrics and standardization (Sun et al. 2012, Bird & Shao 2014). The influence of uncertainty sources on the accuracy and confidence levels of PHM reliability prognostics metrics is still a key area of development (Pecht 2010, Sun et al. 2012, Lumme & Pylvänen 2012, Wang et al. 2013).
The rest of this paper is divided into six sections. (2) PHM approaches describes the fundamental PHM approaches, Failure Modes, Mechanisms and Effects Analysis (FMMEA), Model-based, Data-driven and Fusion, and discusses the benefits and constraints of each. (3) PHM metrics describes the classification established for PHM metrics and the methods that can be used to calculate them, including stochastic, statistical, time series, machine learning and Physics of Failure (PoF) methods. (4) PHM standards reports the most recent developments in PHM standardization. (5) PHM perspectives discusses the fundamental PHM gaps related to unsolved problems: the inclusion of uncertainty sources in metric accuracy and confidence levels, and the inconsistency of solving highly nonlinear problems with linear approximations. (6) Future work draws on research opportunities identified in recent publications to outline guidelines for contributing to the development, consolidation and industrial acceptance of PHM. (7) Conclusions gives an overview of the article's highlights and the most significant aspects of the proposed work.

PHM APPROACHES
FMMEA, Data-driven, Model-based and Hybrid or Fusion approaches are the most widely used approaches to perform PHM. The following subsections summarize the key points, advantages and weaknesses of each.

FMMEA approach
FMMEA was first put forward by the University of Maryland's Computer Aided Life Cycle Engineering Centre (CALCE) in the early 1990s (He & Ma 2012). FMMEA evolved from Failure Mode and Effects Analysis (FMEA) and Failure Mode, Effects and Criticality Analysis (FMECA), which are mainly carried out on the basis of experience and relevant standards. FMMEA is an analytical method developed to identify, evaluate and prevent product and/or process failure modes, mechanisms and effects. Compared with classical FMEA techniques, FMMEA is a very effective problem-solving method: building on past experience, it places more emphasis on failure mechanisms and can be more effective in controlling the risk of product failure by uncovering the root causes of potential failures. FMMEA is normally an early stage of the PHM process. Its outcome is a list of critical failure modes and failure mechanisms that identifies the parameters to be monitored. These parameters are the basis for accurate PoF models used to predict the life expectancy of the system, module or component. The second FMMEA step is the risk analysis, which provides the criteria for the Risk Priority Number (RPN) and involves estimating the detection capability, severity and likelihood of each failure (Pei et al. 2012, Wang et al. 2013). The main weakness revealed in FMMEA is that failure mechanism identification is hard to implement. This difficulty stems from non-standardized analysis processes, which make FMMEA inefficient, unreliable and difficult to reuse (Wang et al. 2013, Liu et al. 2014). Recently, Liu et al. (2014) developed an automated FMMEA method based on fuzzy cognitive map theory. It proposes a standardized description of the physical process of failure and a standardized definition of functional failure modes in terms of property changes of the output flow, which is an effective solution to the non-standard nature of traditional FMMEA.
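The RPN step can be illustrated with a minimal sketch that assumes the classical FMEA convention: severity, occurrence (likelihood) and detection are each scored on a 1 to 10 scale and multiplied. The scales, the class and function names, and the example mechanisms and scores are hypothetical illustrations, not data or code from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class FailureMechanism:
    """One failure mechanism identified by FMMEA, scored on assumed 1-10 scales."""
    name: str
    severity: int    # 1 = negligible effect, 10 = catastrophic
    occurrence: int  # likelihood: 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost certainly detected, 10 = practically undetectable

    def rpn(self) -> int:
        """Risk Priority Number under the classical FMEA convention S x O x D."""
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError(f"score {score} for '{self.name}' is outside 1..10")
        return self.severity * self.occurrence * self.detection


def rank_mechanisms(mechanisms, top_n=None):
    """Return mechanisms sorted by descending RPN, optionally keeping only the top_n."""
    ranked = sorted(mechanisms, key=lambda m: m.rpn(), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical mechanisms and scores for an electronic assembly (illustration only).
    candidates = [
        FailureMechanism("connector fretting corrosion", 5, 4, 7),
        FailureMechanism("solder joint thermal fatigue", 8, 6, 5),
        FailureMechanism("electrolytic capacitor wear-out", 6, 7, 4),
    ]
    for m in rank_mechanisms(candidates):
        print(f"{m.name:35s} RPN = {m.rpn()}")
```

The mechanisms at the top of the ranking are the ones whose parameters would be selected for monitoring and PoF modelling.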

Model-based approach
Model-based approaches are mathematical representations of the physical system, built from domain expertise, that simultaneously use mathematical and PoF models of the system. Residuals are then calculated with statistical estimation techniques, typically based on the Kalman filter (KF), the particle filter (PF), also known as the Sequential Monte Carlo (SMC) method, parity relations, Bayesian approaches and Petri nets. From the residuals it is possible to obtain the system degradation model and to detect and isolate faults (Han et al. 2012, Kulkarni et al. 2013). Generally, the PoF approach can be applied in prognostics with two methods: (1) monitoring the life-cycle environmental and operational conditions of the product and feeding this information into physics-based failure models, which provide real-time or periodic estimates of the Remaining Useful Life (RUL) based on knowledge of the processes that cause deterioration and lead to system failure; (2) designing canary device(s), based on the FMMEA, that indicate the end of useful life. The wear on each identified weak point is obtained as a function of the load conditions, depending on subsystem geometry and material properties. The purpose of the PoF approach in the PHM process is to calculate the damage accumulated by the different system failure mechanisms under the operational environment (Mathew et al. 2012).
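The damage-accumulation step of the PoF approach can be illustrated with the Palmgren-Miner linear damage rule, one common choice that the cited works do not necessarily use. The sketch below assumes a hypothetical inverse power-law (Coffin-Manson type) cycles-to-failure model with made-up constants `a` and `b`; in a real PoF analysis the life model would come from the FMMEA-selected mechanisms, geometry and material properties.

```python
def cycles_to_failure(delta_t, a=1.0e7, b=2.0):
    """Hypothetical inverse power-law (Coffin-Manson type) life model: N_f = a * dT**(-b).

    delta_t is the thermal-cycle temperature swing; a and b are illustrative
    constants, not material data.
    """
    if delta_t <= 0:
        raise ValueError("temperature swing must be positive")
    return a * delta_t ** (-b)


def miner_damage(load_history):
    """Palmgren-Miner linear damage D = sum(n_i / N_i); failure is expected when D >= 1.

    load_history: iterable of (delta_t, n_cycles) pairs observed in service.
    """
    return sum(n / cycles_to_failure(dt) for dt, n in load_history)


def remaining_cycles(load_history, future_mix):
    """Cycles left before D reaches 1 if future loading follows future_mix.

    future_mix: iterable of (delta_t, fraction) pairs whose fractions sum to 1.
    """
    future_mix = list(future_mix)
    if abs(sum(frac for _, frac in future_mix) - 1.0) > 1e-9:
        raise ValueError("future load fractions must sum to 1")
    damage = miner_damage(load_history)
    damage_per_cycle = sum(frac / cycles_to_failure(dt) for dt, frac in future_mix)
    return max(0.0, (1.0 - damage) / damage_per_cycle)
```

With these illustrative constants, 1000 cycles at a 50 degree swing plus 250 cycles at a 100 degree swing consume half of the life (D = 0.5), leaving about 2000 cycles if all future cycles are 50 degree swings.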
The advantages of the model-based approach include the estimation of damage that may occur during all stages of storage and transport, taking into account the degradation caused by environmental conditions such as thermal loads, humidity, vibration and impact. Knowledge of the failure mechanisms, coupled with monitoring of system loads and parametric data, allows the nature and extent of the fault to be identified (Kulkarni et al. 2013, Zhanyong & Xuegang 2014). The model-based approach also has a number of constraints: it requires specific and detailed knowledge of the system, its geometry and material composition, and the physical processes that lead to failure, which is not always available. In complex systems it is difficult or impossible to create models that represent the multiple physical processes occurring in the system, for example intermittent faults. The presence of anomalies in the database affects the definition of the system's healthy baseline pattern and the resulting data classification. Misclassification leads to false indications of anomalies, false failure alarms and false indications of abnormal system operation (Pecht 2010).

Data-driven approach
The Data-driven approach can be used as an alternative to PoF for reliability prognostics. It monitors system operation through sensors that observe environmental data and performance parameters (power, current, voltage, temperature, humidity, vibration, acoustic noise, etc.). The data are then filtered and normalized to reduce noise and remove scale effects, and are used to establish the health status of the system and to identify performance deviations caused by fault occurrence. The Data-driven approach assumes that the statistical characteristics of the system data remain unchanged until a fault occurs. System failure is defined by setting limit values for the observed parameters. Trends provide a more thorough evaluation of how the parameters progress with damage or malfunction over time, allowing a prediction system to be built (Sheppard et al. 2009, Gu et al. 2012, Tsui et al. 2014). The Data-driven approach can be applied on-line, when data acquisition devices are integrated into the systems to be supervised, or off-line, when historical information previously collected and stored in databases is analysed to build the models (Chen & Pecht 2012, Tsui et al. 2014).
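The filtering, normalization and limit-value steps described above can be sketched as follows. The sketch assumes a moving-average filter, z-score normalization against healthy-state statistics, and a fixed limit of `k` standard deviations; the function names and the default `k = 3` are hypothetical choices, not prescribed by the cited works.

```python
import numpy as np


def baseline_stats(healthy):
    """Per-parameter mean and standard deviation of healthy-state data (rows = samples)."""
    healthy = np.asarray(healthy, dtype=float)
    return healthy.mean(axis=0), healthy.std(axis=0, ddof=1)


def smooth(x, window=5):
    """Moving-average filter applied column by column to reduce sensor noise."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="valid"), 0, x)


def flag_deviations(observed, mean, std, k=3.0, window=5):
    """Filter, normalize against the healthy baseline, and flag samples with any |z| > k.

    Returns (flags, z): one boolean per smoothed sample and the normalized values.
    Because the filter uses mode="valid", the output has len(observed) - window + 1 rows.
    """
    z = (smooth(np.asarray(observed, dtype=float), window) - mean) / std
    return np.any(np.abs(z) > k, axis=1), z
```

Normalizing by the healthy-state statistics removes scale effects, so a single limit `k` can be applied across parameters measured in different units.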
Data-driven approaches can be divided into three types according to the available data: (1) supervised learning algorithms are applied when representative data from both healthy and unhealthy system states are available; (2) a semi-supervised approach is used when data from only one class, such as the healthy state, are available; (3) unsupervised learning is applied to unlabelled data. Semi-supervised and supervised learning techniques require reliable test data to avoid detection errors (Pecht 2010, Hu et al. 2012).
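For case (2), a minimal semi-supervised sketch is a detector fitted only on healthy-state data. The version below assumes roughly Gaussian healthy data, a Mahalanobis-distance score and a threshold at the 99th percentile of the healthy training scores; these choices and the class name are illustrative assumptions, not the method of the cited works.

```python
import numpy as np


class HealthyBaselineDetector:
    """Semi-supervised (one-class) anomaly detector trained only on healthy-state data.

    Fits the mean and covariance of healthy samples and flags new samples whose
    Mahalanobis distance exceeds a chosen percentile of the healthy training distances.
    """

    def __init__(self, percentile=99.0):
        self.percentile = percentile  # assumed false-alarm budget of about 1 %

    def fit(self, healthy):
        x = np.asarray(healthy, dtype=float)
        self.mean_ = x.mean(axis=0)
        self.inv_cov_ = np.linalg.inv(np.cov(x, rowvar=False))
        self.threshold_ = np.percentile(self._distance(x), self.percentile)
        return self

    def _distance(self, x):
        d = np.asarray(x, dtype=float) - self.mean_
        # Row-wise sqrt(d^T S^-1 d): the Mahalanobis distance to the healthy mean.
        return np.sqrt(np.einsum("ij,jk,ik->i", d, self.inv_cov_, d))

    def predict(self, x):
        """True for samples judged to lie outside the healthy class."""
        return self._distance(x) > self.threshold_
```

The percentile sets the trade-off between false alarms on healthy data and sensitivity to faults; its value would normally be tuned on the reliable test data mentioned above.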
The data analysis techniques most commonly used with Data-driven tools are Markov chains, stochastic processes and time series analysis. The strengths of the Data-driven approach are as follows. It can be used as a black-box model that captures system behaviour from monitored data without requiring specific knowledge of the system. Using the local data acquisition system, it can also correlate parameter variations with interactions between subsystems and with the effects of changes in environmental parameters. Data-driven tools are useful for diagnostics, since they enable pattern recognition and the use of statistical techniques to detect changes in system parameters, allowing intermittent faults to be detected and analysed (Pecht 2010, Tsui et al. 2014). The Data-driven approach also has disadvantages: it may require training with historical data to determine the strength of correlations, establish patterns, and evaluate data indicators of degradation and failure. In most applications, historical or operating data are insufficient and difficult to obtain. The same applies to diagnosis and to the determination of trend thresholds for fault prognosis, particularly for products in storage or standby, non-operated systems, and systems with infrequent failures that have never been exposed to environmental wear conditions. The solution to this problem is the use of system models known as Fusion or Hybrid models, such as PoF models combined with Data-driven models (Pecht 2010, Kulkarni et al. 2013), which are discussed in the next section.
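As an illustration of the Markov-chain technique mentioned above, the sketch below models degradation as a discrete-time Markov chain with an absorbing failed state and computes the expected number of transitions to failure from each state with the fundamental-matrix result t = (I - Q)^-1 1. The chain structure and any transition probabilities used with it are hypothetical; in practice they would be estimated from observed state transitions in monitoring data.

```python
import numpy as np


def expected_steps_to_failure(P, failed_state):
    """Mean number of transitions until absorption in failed_state.

    P is a row-stochastic transition matrix of a discrete-time Markov chain whose
    only absorbing state is failed_state. Uses t = (I - Q)^-1 * 1, where Q is P
    restricted to the transient (not yet failed) states.
    """
    P = np.asarray(P, dtype=float)
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("rows of P must sum to 1")
    transient = [i for i in range(P.shape[0]) if i != failed_state]
    Q = P[np.ix_(transient, transient)]
    t = np.linalg.solve(np.eye(len(transient)) - Q, np.ones(len(transient)))
    return dict(zip(transient, t))
```

For a hypothetical four-state chain in which the healthy, degraded and severely degraded states persist with probabilities 0.9, 0.8 and 0.5 and otherwise move one state closer to failure, the expected times to failure are 17, 7 and 2 inspection intervals respectively.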

Fusion or hybrid approach
The hybrid or fusion approach combines Model-based and Data-driven tools, exploiting the benefits of both approaches to estimate the RUL under both operating and non-operating life-cycle conditions (Figure 2).
The first step of the fusion process is FMMEA, which supports real-time system diagnostics.
It generally uses all the available information, including operational and environmental loads as well as performance parameters. To verify whether malfunctions exist, the observed data are compared with the healthy baseline pattern stored in the database, previously collected during the system's operational phase. The database also contains additional specifications and standards for the definition of faults. The parameters that contribute most significantly to the anomaly are isolated and used to determine PoF models of system degradation. The databases therefore provide information such as fault thresholds for the system parameters, failure modes, degradation states, and labels for healthy or unhealthy operating conditions (Pecht 2010, Kulkarni et al. 2013). Machine learning techniques, or other probabilistic approaches such as parametric or distribution analysis, can be used to set the detection parameters in the database. It is essential to identify the parameters that indicate changes in system performance. The isolation of parameters for PoF models can be accomplished with a variety of techniques, such as Principal Components Analysis (PCA), Least Squares Estimation (LSE), Expectation Maximization (EM) and Maximum Likelihood Estimation (MLE). Based on the information collected in the parameter isolation phase, the most relevant degradation models are selected to identify the type of fault or the failure degradation mechanisms that drove the system to a potential failure. The PoF models are used to calculate the RUL of the system, based on environmental parameters and on data relating to the material properties and specifications of the system. Knowledge of the failure mechanisms and their representative models is used to extract fault identification information from the measured parameters, failure modes, degradation states, and labels for healthy or unhealthy conditions.
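Of the isolation techniques listed above, PCA is the simplest to sketch. The example below assumes one common way of using it, which may differ from the cited works: fit PCA on standardized healthy data, then attribute an anomalous sample to the parameters with the largest squared residuals after projecting out the retained components (a contribution analysis on the squared prediction error). The class name, the number of retained components and any parameter names are hypothetical.

```python
import numpy as np


class PCAIsolator:
    """Fit PCA on healthy data; attribute anomalies to parameters via residual contributions."""

    def __init__(self, n_components=1):
        self.k = n_components  # assumed number of retained principal components

    def fit(self, healthy):
        x = np.asarray(healthy, dtype=float)
        self.mean_, self.std_ = x.mean(axis=0), x.std(axis=0, ddof=1)
        z = (x - self.mean_) / self.std_
        # Rows of vt are principal directions, ordered by decreasing explained variance.
        _, _, vt = np.linalg.svd(z, full_matrices=False)
        self.P_ = vt[: self.k].T  # (n_params, k) retained loadings
        return self

    def contributions(self, sample):
        """Squared residual per parameter after projecting out the retained components."""
        z = (np.asarray(sample, dtype=float) - self.mean_) / self.std_
        residual = z - self.P_ @ (self.P_.T @ z)
        return residual ** 2

    def isolate(self, sample, names):
        """Parameter names ranked from largest to smallest contribution to the anomaly."""
        c = self.contributions(sample)
        return [names[i] for i in np.argsort(c)[::-1]]
```

When healthy parameters move together, a fault that breaks that correlation in one parameter produces a large residual on that parameter, which is the candidate passed on to PoF model selection.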
Alarms can be set to warn of imminent system failure based on the reported RUL value, so that repair or replacement can be provided in time, depending on the criticality of the application (Zhanyong & Xuegang 2014). Lui et al. (2011) propose a condition recognition method for complex systems based on multi-fractal analysis and Dempster-Shafer evidence theory, in order to properly extract and use the multi-source fusion information contained in highly non-linear monitoring data from industrial safety systems or from equipment operating under highly non-linear load conditions. Rotshtein et al. (2012) propose applying the innovative approach of chaos theory to reliability modelling, by incorporating information about the causes of failures and linking them to the factors (temperature, humidity, voltage, load condition, etc.) from which they originate.
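Dempster-Shafer evidence theory, used by Lui et al. (2011) to fuse multi-source information, rests on Dempster's rule of combination. The sketch below implements only that generic rule, not the multi-fractal method of the cited work, and assumes that each source (for example, a vibration sensor and a temperature sensor) assigns belief masses to sets of hypothetical fault hypotheses.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two basic belief (mass) assignments.

    m1, m2: dicts mapping frozenset hypotheses (focal elements) to masses that sum to 1.
    Products of masses whose intersection is empty are counted as conflict K, and the
    remaining masses are renormalized by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}
```

For example, if a vibration source assigns 0.6 to {wear}, 0.3 to {wear, misalignment} and 0.1 to the whole frame, and a temperature source assigns 0.5 to {wear}, 0.3 to {normal} and 0.2 to the whole frame, the conflict is 0.27 and the combined belief in {wear} becomes 0.62 / 0.73, about 0.85.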
The Fusion approach takes advantage of both PoF and Data-driven approaches. It allows the detection of intermittent faults, identifies failure precursors, detects deviations from normal operation and enables the construction of degradation models to estimate the RUL or the End of Life (EOL). It also supports effective maintenance planning and the identification of the processes that may cause system failure, establishing the extent and nature of failures and enabling effective maintenance strategies. However, most studies published to date use either Data-driven or PoF approaches alone, and few Fusion studies have been reported. Improving and optimizing this approach requires applying it to a larger and more diverse set of case studies.

PHM METRICS
Prognostics metrics are classified according to two criteria: user requirements and functionality. Metrics based on end-user requirements fall into operational, engineering and regulatory categories. The functional classification is based on the type of information that the metrics can provide. So far, five broad categories of metrics have been identified and listed in detail: algorithm performance, computational performance, cost-benefit-risk, ease of algorithm certification (Saxena et al. 2010, Gu et al. 2012), and cost-benefit metrics (Sun et al. 2012, Chang et al. 2013).
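Among the algorithm-performance metrics, two that are commonly discussed in the prognostics-metrics literature are relative accuracy, RA = 1 - |r* - r_hat| / r*, where r* is the true and r_hat the predicted RUL, and the alpha-lambda check of whether the prediction made at time t_lambda = t_P + lambda * (EOL - t_P) falls within plus or minus alpha of the true RUL. The sketch below follows that reading; the exact definitions and the alpha and lambda values should be checked against Saxena et al. (2010), and the `predict_rul` callable is a hypothetical interface.

```python
def relative_accuracy(true_rul, predicted_rul):
    """RA = 1 - |r* - r_hat| / r*; equals 1 for a perfect prediction."""
    if true_rul <= 0:
        raise ValueError("true RUL must be positive")
    return 1.0 - abs(true_rul - predicted_rul) / true_rul


def alpha_lambda_pass(t_start, eol, lam, predict_rul, alpha=0.2):
    """Check whether the RUL predicted at t_lambda lies within +/- alpha of the true RUL.

    t_start: time of the first prediction (t_P); eol: actual end of life;
    lam: fraction of the interval [t_start, eol] at which the check is made;
    predict_rul: hypothetical callable returning the algorithm's RUL estimate at a time.
    """
    t_lam = t_start + lam * (eol - t_start)
    true_rul = eol - t_lam
    r_hat = predict_rul(t_lam)
    return (1.0 - alpha) * true_rul <= r_hat <= (1.0 + alpha) * true_rul
```

Because the true RUL is only known after the end of life, both metrics are evaluated offline on run-to-failure data; this is one reason the text calls for online performance evaluation methods.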
In the current context of system health management, prognostics is defined as the detection of failure precursors from sensor data or a historical maintenance database, and the prediction of the RUL or EOL by generating a current state estimate and using the expected future operational conditions of a specific system. Estimating the RUL or EOL is a prediction, forecasting or extrapolation process (Saxena et al. 2010). The RUL is fundamental for assessing the condition and health monitoring information of components or systems, and is considered a key factor in CBM, in the planning of maintenance activities, spare parts provision, operational performance, and the profitability of assets. In green manufacturing the RUL is significant for managing product reuse and recycling, which affects energy consumption, raw material use, pollution, landfill and other requirements of environmental protection standards. The RUL is characterized as a random variable that depends on the asset age, the operational environment and the health information from CM. The RUL can be calculated or estimated by Data-driven, Model-based or Fusion approaches. The data analysis methods associated with Data-driven approaches are statistical and machine learning methods (Sun et al. 2012). Si et al. (2012) propose the classification of Data-driven statistical methods shown in Table 1. First, the statistical models are divided into two subgroups: models based on directly observed state processes (online) and models based on indirectly observed state processes (offline). The online statistical models include regression-based models, namely coefficient regression, auto-regressive (AR) linear and auto-regressive moving average (ARMA) models (Chen & Pecht 2012), the non-parametric regression method (Fang et al. 2012), the Wiener process (Wei et al. 2013), Gamma processes (Xu & Wang 2012) and Markovian-based models (Tsui et al. 2014).
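For the Wiener process listed among the online statistical models, a sketch of RUL estimation follows. It assumes a degradation indicator X(t) = x0 + mu*t + sigma*B(t), with B(t) a standard Brownian motion, constant drift and diffusion, and a fixed failure threshold w. Drift and diffusion are estimated by maximum likelihood from the increments, and the RUL distribution uses the standard result that the first-passage time of a Wiener process with positive drift is inverse Gaussian. Function names, the threshold and sample sizes are hypothetical, and this is not the specific model of Wei et al. (2013).

```python
import numpy as np


def fit_wiener(times, values):
    """Maximum-likelihood drift and diffusion for X(t) = x0 + mu*t + sigma*B(t).

    Increments dx_i over dt_i are independent N(mu*dt_i, sigma^2*dt_i), which gives
    mu = sum(dx) / sum(dt) and sigma^2 = mean((dx - mu*dt)^2 / dt).
    """
    dt = np.diff(np.asarray(times, dtype=float))
    dx = np.diff(np.asarray(values, dtype=float))
    mu = dx.sum() / dt.sum()
    sigma2 = np.mean((dx - mu * dt) ** 2 / dt)
    return mu, sigma2


def rul_distribution(current_value, threshold, mu, sigma2, n_samples=10_000, seed=0):
    """RUL as first passage of the failure threshold.

    Returns (mean RUL, array of 5th, 50th and 95th percentiles). For positive drift the
    first-passage time is inverse Gaussian with mean (w - x)/mu and shape (w - x)^2/sigma^2,
    sampled here with numpy's Generator.wald.
    """
    gap = threshold - current_value
    if gap <= 0:
        return 0.0, np.zeros(3)  # threshold already crossed
    if mu <= 0:
        raise ValueError("drift must be positive for a finite mean RUL")
    mean = gap / mu
    shape = gap ** 2 / sigma2
    samples = np.random.default_rng(seed).wald(mean, shape, size=n_samples)
    return mean, np.percentile(samples, [5, 50, 95])
```

Returning percentiles rather than a single number reflects the characterization of the RUL as a random variable.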
The offline statistical models include the stochastic non-linear filtering-based models, namely non-linear filtering, Kalman filtering (KF) (Celaya et al. 2012, Zhang & Pisu 2014), Monte Carlo (MC) (Chen & Pecht 2012, Sustrino et al. 2012, Wei et al. 2013) and the particle filter (PF) (Lau et al. 2012, Celaya et al. 2012, Chen & Pecht 2012), as well as covariate-based hazard models (Ghodrati et al. 2012), the Hidden Markov Model (HMM) and the Hidden Semi-Markov Model (HSMM) (Lau et al. 2012, Medjaher et al. 2012). Among the machine learning methods (Table 2), those most widely used in RUL estimation are Artificial Neural Networks (ANN) (Lau et al. 2012, Fang et al. 2012), Neuro-Fuzzy (NF) systems (Chen & Pecht 2012, Lau et al. 2012, Medjaher et al. 2012), Support Vector Machines (SVM) (Vasan et al. 2013), the Hidden Markov Model (HMM) (Lau et al. 2012, Medjaher et al. 2012) and Dynamic Bayesian Networks (Si et al. 2012, Medjaher et al. 2012).
Table 2. Data-driven machine learning methods.
In the context of standardizing concepts, definitions and metrics, it is important to develop algorithms for online performance evaluation, such as robustness and sensitivity metrics. Current performance evaluation methods ignore the effects of future load conditions on the rate of useful life consumption, as well as maintenance operations and the fault tolerance characteristics of some systems. The development of these new metrics should be associated with the best methods for managing and representing uncertainty. Risk, cost-benefit and requirements analyses should also be incorporated (Saxena et al. 2010, Sun et al. 2012).
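Of the filtering-based models listed above, the Kalman filter is the easiest to sketch. The example below assumes a linearly degrading indicator tracked with a two-state (level, rate) constant-rate model, a measurement-noise variance `r` and a small process-noise intensity `q`, and estimates the RUL by extrapolating the current level at the current rate to a failure threshold. The model, noise values, threshold and function names are illustrative assumptions; non-linear degradation would call for extended Kalman or particle filters.

```python
import numpy as np


def kalman_track(measurements, dt=1.0, q=1e-6, r=0.25):
    """Constant-rate Kalman filter for a degradation indicator.

    State x = [level, rate]; q is an assumed process-noise intensity and r the
    measurement-noise variance. Returns the filtered state after each measurement.
    """
    z_all = np.asarray(measurements, dtype=float)
    F = np.array([[1.0, dt], [0.0, 1.0]])  # level += rate * dt
    H = np.array([[1.0, 0.0]])             # only the level is measured
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    R = np.array([[r]])
    x = np.array([z_all[0], 0.0])          # start at the first reading with zero rate
    P = np.eye(2)                          # broad initial uncertainty
    history = []
    for z in z_all:
        # Predict step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: the innovation y is the measurement residual.
        y = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        history.append(x.copy())
    return np.array(history)


def rul_from_state(state, threshold):
    """Time until the estimated level reaches threshold at the estimated rate."""
    level, rate = state
    if rate <= 0:
        return float("inf")  # not degrading: no finite RUL
    return max(0.0, (threshold - level) / rate)
```

The innovation `y` computed at each update is the kind of residual that model-based approaches use for fault detection: persistently large innovations signal a departure from the assumed degradation model.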

Quantification and uncertainty management
One of the biggest current challenges in PHM is the measurement and management of uncertainty. Predictions of the RUL or time to failure over extended periods become less accurate because of the various sources of uncertainty. The most common sources of uncertainty are sensor errors, uncertainty in unplanned operating and loading regimes, model assumptions and inconsistencies, loss of information due to reduced amounts of data, prognostics under conditions that do not correspond to the data used for tuning, and the data acquisition system as the amount of available data increases (Pecht 2010, Sun et al. 2012, Lumme & Pylvänen 2012, Wang et al. 2013). For these reasons, decisions regarding the state of the system (maintenance, repair and replacement activities) should take these uncertainties into account. Methods are needed to bound the uncertainty (upper and lower limits) and to establish confidence levels for the metrics used. Recently, Du et al. (2014) proposed an innovative reliability sensitivity analysis method that quantifies the influence of changes in the uncertainty of model inputs on the reliability model. Lau et al. (2012), Vasan et al. (2013), Tsui et al. (2014) and Zhicai et al. (2014) have also put forward Dempster-Shafer theory, Bayesian probability theory, fuzzy logic and Weibull model algorithms to provide RUL estimates in the form of probability density functions (PDF). These methods are suitable for developing state models and offer a solution to the uncertainty problem.
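As a small example of an RUL estimate expressed as a distribution with bounds rather than a single value, the sketch below uses a Weibull life model conditioned on the current age of the item. The shape `beta` and scale `eta` are assumed to have been estimated already (any values used with it are hypothetical), and the closed-form quantile follows from the conditional reliability written in the docstring.

```python
import math


def weibull_residual_life_quantiles(age, beta, eta, probs=(0.05, 0.5, 0.95)):
    """Quantiles of remaining life for a Weibull(beta, eta) item that has survived to age.

    Conditional reliability: R(x | age) = exp(-((age + x) / eta)**beta + (age / eta)**beta).
    Setting R = 1 - p and solving for x gives
    x_p = eta * ((age / eta)**beta - ln(1 - p))**(1 / beta) - age.
    """
    if beta <= 0 or eta <= 0 or age < 0:
        raise ValueError("beta and eta must be positive and age non-negative")
    base = (age / eta) ** beta
    return {p: eta * (base - math.log(1.0 - p)) ** (1.0 / beta) - age for p in probs}
```

With beta = 1 (constant hazard) the median remaining life equals eta * ln 2 at any age; with beta > 1 (wear-out) the median remaining life shrinks as the item ages, and the 5 % and 95 % quantiles give the lower and upper limits called for above.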

PHM STANDARDS
PHM is an engineering discipline that is still climbing the learning curve, which is why it is not universally accepted as a research methodology. One of the biggest current difficulties is establishing, within the scientific community, a set of prognostic concepts, definitions, metrics and universal standard methods that allow results, performance measurement methods and tools to be compared without ambiguous or inconsistent interpretations (Saxena et al. 2010, Gu et al. 2012, Liu & Sun 2014). Sheppard et al. (2009) survey the methods and standards already standardized by IEEE in the areas of diagnostics and maintenance, with special focus on CBM, that can be applied or adapted to PHM. Saxena et al. (2010) propose a number of terms and definitions for prognostics, and categorize, classify and review prediction applications in different fields, including aerospace, electronics, nuclear, medicine, finance, weather and automotive. More recently, the PHM Society has been working to establish joint committees to study and define a PHM taxonomy. However, much hard work remains to be done, aggravated by the lack of skilled human resources and of PHM training programs, which are also difficult obstacles to overcome (Bird & Shao 2014).

PHM PERSPECTIVES
Current PHM challenges include uncertainty analysis and the investigation of techniques for combining RUL estimates and other metrics from multiple sources into a single combined RUL value. Suggested techniques include Dempster-Shafer regression, fuzzy algorithms and model-based information fusion techniques.
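Dempster-Shafer regression and fuzzy fusion do not fit in a short sketch. As a simpler and plainly different technique, the example below fuses several RUL estimates by inverse-variance weighting, assuming each source reports a mean and a standard deviation and that the errors of the sources are independent and roughly Gaussian; the function name is hypothetical.

```python
import math


def fuse_rul(estimates):
    """Inverse-variance weighted fusion of independent RUL estimates.

    estimates: iterable of (mean, std) pairs, one per source (model, sensor, algorithm).
    Returns the fused mean and standard deviation, assuming independent, roughly
    Gaussian errors; more precise sources receive proportionally more weight.
    """
    total_weight, weighted_sum = 0.0, 0.0
    for mean, std in estimates:
        if std <= 0:
            raise ValueError("each estimate needs a positive standard deviation")
        w = 1.0 / std ** 2
        total_weight += w
        weighted_sum += w * mean
    if total_weight == 0.0:
        raise ValueError("no estimates supplied")
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)
```

For instance, fusing a PoF estimate of 100 cycles (standard deviation 10) with a Data-driven estimate of 200 cycles (standard deviation 20) gives 120 cycles with a standard deviation of about 8.9. The fused spread is smaller than that of either source, which holds only if the independence assumption is valid.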
Uncertainty analysis requires identifying and quantifying all the sources that contribute to erroneous predictions, such as measurement noise, model uncertainties, and loss or unavailability of baseline data. Research is also needed to develop model-based and Data-driven approaches that take the uncertainty of their predictions into account. Estimates in the form of PDFs are more informative and realistic for maintenance and logistics decisions. Another major PHM challenge is to establish a set of prognostic concepts, definitions, metrics and universal standard methods that allow results, performance measurement methods and tools to be compared, preventing ambiguous and inconsistent interpretations in the scientific community. As part of the standardization of concepts, definitions and metrics, it is important to develop algorithms for online performance evaluation. Current performance evaluation methods ignore the effects of future load conditions on the rate of useful life consumption, as well as maintenance work and the fault tolerance characteristics of some systems. These new metrics should be developed together with uncertainty management methods, risk analysis, cost-benefit analysis and requirements analysis.

FUTURE WORK
Future work will focus on two main weaknesses of the PHM approach: (i) the nonlinearity of failure degradation models, and (ii) the limited knowledge of fusion approaches. Nonlinearity is widespread in the failure degradation processes of components or systems, because degradation may accelerate at a late stage of the process. Linear models or linear assumptions therefore cannot track the dynamics of such degradation processes. However, nonlinearity remains complex and difficult to model (Si et al. 2012, Zhang & Pisu 2014). Fusion approaches combine PoF and Data-driven models, but very few studies have been reported (Wang et al. 2013, Chen & Pecht 2012). The study of phenomena with nonlinear behaviour is not exclusively an engineering problem. In medicine, for instance, fractals have been used to study and extract information from the non-linear dynamics of vital health signals (Goldberger 1999). Yanqing et al. (2011) demonstrate that multi-fractal analysis in the form of Multi-fractal Detrended Fluctuation Analysis (MF-DFA) has significant advantages over conventional methods for extracting non-linear features for condition recognition in complex industrial systems. The paper discusses chaos theory in the general context of reliability theory, with relevance to the reliability of Wireless Sensor Networks (WSN) in a PHM system.
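To make the MF-DFA idea concrete, the sketch below computes generalized Hurst exponents h(q): the mean-removed signal is integrated into a profile, split into windows of size s, detrended with a polynomial in each window, and the q-th order fluctuation function F_q(s) is regressed on s in log-log scale. It is a simplified reading of the standard procedure (it uses only windows taken from the start of the series, whereas the usual formulation also takes them from the end), and the scales and q values passed to it are assumptions. A spread of h(q) across q indicates multifractality; for uncorrelated noise h(2) is close to 0.5.

```python
import numpy as np


def mfdfa(signal, scales, qs, order=1):
    """Simplified Multifractal Detrended Fluctuation Analysis.

    Returns one generalized Hurst exponent h(q) per q in qs: the slope of
    log F_q(s) versus log s over the given window sizes s.
    """
    x = np.asarray(signal, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-removed series
    log_f = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n_seg = len(profile) // s
        t = np.arange(s)
        variances = np.empty(n_seg)
        for v in range(n_seg):
            seg = profile[v * s:(v + 1) * s]
            coeffs = np.polyfit(t, seg, order)  # local polynomial trend
            variances[v] = np.mean((seg - np.polyval(coeffs, t)) ** 2)
        for i, q in enumerate(qs):
            if q == 0:
                # Limit q -> 0 of the q-th order average is a logarithmic average.
                log_f[i, j] = 0.5 * np.mean(np.log(variances))
            else:
                log_f[i, j] = np.log(np.mean(variances ** (q / 2.0))) / q
    log_s = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_s, log_f[i], 1)[0] for i in range(len(qs))])
```

Applied to a monitored signal, a change in h(q) or in its spread over time could serve as a non-linear feature for condition recognition, which is the role MF-DFA plays in the work cited above.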
In a preliminary study, Rotshtein et al. (2012) consider chaos theory a convenient methodology for observing the reliability dynamics of parameters connected with a failure. Finding a solution to nonlinear failure degradation modelling and to the optimization of the fusion approach remains at a deadlock.
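A basic quantitative check used in chaos theory is the sign of the largest Lyapunov exponent: positive values indicate sensitive dependence on initial conditions. The sketch below computes it for the logistic map, a textbook system chosen purely for illustration because its answer is known (ln 2 at r = 4). Applying the idea to degradation data would require estimating the exponent from measured time series, which is outside this sketch.

```python
import math


def logistic_lyapunov(r, x0=0.3, n_transient=1_000, n_iter=50_000):
    """Largest Lyapunov exponent of the logistic map x -> r * x * (1 - x).

    For a one-dimensional map the exponent is the orbit average of ln|f'(x)|,
    with f'(x) = r * (1 - 2x). Positive values indicate chaos; negative values
    indicate convergence to a stable fixed point or cycle.
    """
    x = x0
    for _ in range(n_transient):  # discard the transient part of the orbit
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n_iter
```

At r = 3.2 the map settles on a stable period-2 cycle and the exponent is negative, while at r = 4 it is close to ln 2. This is the kind of regime distinction that chaos-based reliability models would need to detect in monitored parameters.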
Within this framework, our future studies will focus on developing and integrating fractal and chaotic algorithms into the parameter fault isolation and Fusion degradation modelling processes identified in the Fusion approach model in Figure 2.

CONCLUSIONS
We are currently witnessing the development and expansion of PHM for reliability prognostics in various engineering areas. There are three basic approaches for estimating the RUL of components or systems subjected to a variety of environmental conditions and functional loads: the Data-driven approach, the Model-based or PoF approach, and the fusion or hybrid approach.
The Model-based and Data-driven approaches currently used in PHM have both advantages and limitations. A limitation of the Model-based approach is that it does not detect intermittent faults. The Data-driven approach is useful when detailed knowledge of the system is unavailable. Its strengths are diagnostics, the detection of intermittent faults, and the reduction of the number of undetected failures. However, it has some constraints, including the difficulty of determining the RUL without historical data and the lack of rules for establishing the failure thresholds used to calculate the RUL.
Current studies focus on finding effective means of providing a PHM fusion system, which combines Model-based approaches with Data-driven diagnosis and prognosis techniques for dynamic RUL prognostics. In the studies published so far, many fundamental questions remain unsolved: problems related to reliability prognostics over extended periods of time, detection of intermittent faults, inclusion of software failures in degradation models, data fusion with multi-dimensional CM input, modelling the influence of external environmental variables, development of models that can deal with multiple failures, the treatment of the dynamics and non-linearity of some degradation processes with linear approaches, and the influence of uncertainty sources on the accuracy and confidence levels of PHM metrics.
Regarding the difficulty of estimating the RUL under multiple failures and the nonlinearity of failure degradation processes, fractals and chaotic dynamics theory could be helpful, as they have been in other areas of science facing similar problems. From this perspective, future work will focus on integrating fractals and chaos theory into the fusion approach model.