Prediction of Maintenance Equipment Failures Using Automated Machine Learning

Predictive maintenance is a key area that is benefiting from the advent of Industry 4.0. Recently, there have been several attempts to use Machine Learning (ML) to optimize the maintenance and repair of equipment, with most of these approaches assuming an expert-based ML modeling. In this paper, we explore an Automated Machine Learning (AutoML) approach to address a predictive maintenance task related to a Portuguese software company. Using recently collected data from one of the company's clients, we first performed a benchmark comparison study that included four modern open-source AutoML technologies to predict the number of days until the next failure of a piece of equipment and also to determine whether the equipment will fail within a fixed number of days. Overall, the results were very close among all AutoML tools, with AutoGluon obtaining the best results for all ML tasks. Then, the best AutoML predictive results were compared with a manual ML modeling approach that used the same dataset. The results achieved by the AutoML approach outperformed the manual method, thus demonstrating the quality of the automated modeling for the predictive maintenance domain.


Introduction
The Industry 4.0 phenomenon allowed companies to focus on the analysis of historical data to obtain useful insights. In particular, predictive maintenance is a crucial application area that emerged from this context, where the goal is to optimize the maintenance and repair of equipment through the usage of Machine Learning (ML) algorithms [17]. Indeed, some ML studies try to anticipate equipment failures (typically, of manufacturing machines), aiming to reduce the costs of repairs [4,6,11,16]. Other approaches [2,3,5,19] use ML algorithms to predict the behavior of the manufacturing process.
Despite all the potential benefits of Industry 4.0, many organizations do not currently apply ML to enhance maintenance activities. Those that do rely mostly on data science experts, who tune the ML models manually, often requiring a large number of trial-and-error experiments. Indeed, we found only one study that applied AutoML to the maintenance domain [18]. Yet, we note that this study only used synthetic data, which might not reflect the complexities of real industrial maintenance data. In contrast with the "traditional" ML expert design approach, in this paper we apply Automated Machine Learning (AutoML), aiming to automate the ML modeling phase and thus shorten the cycle from data to maintenance insights. Moreover, we apply AutoML using real-world data, collected from a client of a Portuguese software company in the area of maintenance management.
AutoML was explored for two specific prediction tasks: predicting the number of days until a piece of equipment fails and predicting whether the equipment will fail within a fixed number of days. We designed a large set of computational experiments to assess the predictive performance of four open-source AutoML tools. To provide a baseline comparison, we also compare the best AutoML results with a manual ML modeling that was previously performed by one of the company's professionals. The comparison clearly favors the AutoML results, thus attesting the potential of the AutoML approach for the predictive maintenance domain.

AutoML Tools
In this article, we apply and compare four modern open-source AutoML tools, based on a recent benchmark study performed in [9]. In order to achieve a more fair comparison, we did not tune the hyperparameters of the AutoML tools. Table 1 summarizes the main characteristics of the four explored AutoML tools:
- AutoGluon is an AutoML toolkit based on the Gluon framework [1]. In this work, we only considered the tabular data module, which runs several algorithms and returns a Stacked Ensemble with multiple layers [8].
- H2O AutoML is the AutoML module from the H2O framework. H2O AutoML runs several algorithms from H2O and two Stacked Ensembles, one with the best ML model of each family and another with all models [10,13].
- rminer is a library for the R programming language, focused on facilitating the usage of ML algorithms [7]. Rminer also provides AutoML functions that can be highly customized. In this paper, we used the "automl3" template, which runs several ML algorithms and one Stacked Ensemble.
- TPOT is a Python AutoML tool that uses Genetic Programming to automate several phases of the ML workflow [12,15]. It uses the Python Scikit-Learn framework to produce ML pipelines.

Data
The provided data comprises a large number of datasets related to predictive maintenance, which are detailed in Fig. 1. For the context of this work, we assume a tabular dataset composed of the aggregation of several attributes from each entity. Overall, the data includes 2,608 records and 21 input attributes. Each record represents an action (e.g., a work order) related to one piece of the company's equipment (e.g., an industrial machine). Each record includes diverse input attributes, such as the tasks performed by the machine, consumption of material, and meter readings.
The data also includes five target variables for regression or binary classification tasks. The regression task target (attribute DaysToNextFailure) describes the number of days between that record and the failure of the respective equipment. As for the binary classification targets (attributes FailOnxDays), these describe whether or not the equipment will fail within a certain number of days (e.g., in three days). Table 2 details the input and output variables (Attribute), their description (Description), data type (Type), number of levels (Levels), domain values (Domain), and example values from one of the records (Example). More than half (12) of the 21 input attributes are categorical. Among these, most present a low cardinality (e.g., RecordType, Brand). However, some of the attributes present a very high cardinality (e.g., Part).
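To make the relation between the two kinds of targets concrete, the following minimal sketch builds the five target values for one record. It assumes, purely for illustration, that each FailOnxDays label can be derived from DaysToNextFailure by a threshold (next failure at most x days away); the source does not state how the labels were actually produced. The helper names are hypothetical, and the horizons are taken from the target names used in the results (FailOn3Days, FailOn5Days, FailOn7Days, FailOn10Days).

```python
# Hypothetical helpers (not from the paper): derive FailOnxDays-style labels
# from DaysToNextFailure. Assumption: "fail in x days" means that the next
# failure is at most x days away.

HORIZONS = (3, 5, 7, 10)  # horizons matching the FailOnxDays target names


def fail_on_x_days(days_to_next_failure, horizon):
    """Return 1 if the next failure occurs within `horizon` days, else 0."""
    return 1 if days_to_next_failure <= horizon else 0


def build_targets(days_to_next_failure):
    """Return the regression target plus one binary target per horizon."""
    targets = {"DaysToNextFailure": days_to_next_failure}
    for horizon in HORIZONS:
        targets[f"FailOn{horizon}Days"] = fail_on_x_days(
            days_to_next_failure, horizon
        )
    return targets
```

Under this assumption, a record whose next failure is four days away would be negative for the three-day horizon and positive for the longer ones.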

Data Preprocessing
Since several data attributes are of type String, which is not accepted by some AutoML tools, we opted to encode all String attributes into numerical types. For the String attributes that presented a low cardinality (five levels or fewer), we applied the well-known One-Hot encoding. Since this method creates one binary column for each level of the original attribute, we applied a different transformation for the columns with a higher cardinality.
Indeed, for the categorical variables with more than five levels, we used the Inverse Document Frequency (IDF) technique [14]. This method is used to convert a categorical column into a numerical column of positive values, based on the frequency of each level of the attribute. IDF uses the function $f(x) = \log(n / f_x)$, where $n$ is the length of the column and $f_x$ is the frequency of level $x$ in that column. The benefit of IDF, when compared with One-Hot Encoding, is that the IDF technique does not generate new columns, which is useful for attributes with high cardinality (e.g., the attribute Part has 161 levels).
The remaining attributes (of Integer and Float types) were not altered because most AutoML tools already apply preprocessing techniques to the numerical columns (e.g., normalization, standardization). After applying the transformations, the final dataset had 42 inputs and 5 target columns.
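The encoding rule described in this section can be sketched as follows. This is an illustrative, dependency-free version and not the authors' implementation: the function names are hypothetical, the natural logarithm is assumed (the source does not state the base), and columns are represented as plain Python lists.

```python
import math
from collections import Counter

MAX_ONE_HOT_LEVELS = 5  # threshold from the text: five levels or fewer


def idf_encode(values):
    """Map each level x to log(n / f_x): n = column length, f_x = count of x."""
    n = len(values)
    counts = Counter(values)
    return [math.log(n / counts[v]) for v in values]


def one_hot_encode(name, values):
    """Return one binary column per level, keyed as '<name>_<level>'."""
    levels = sorted(set(values))
    return {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


def encode_column(name, values):
    """One-Hot for low-cardinality columns, IDF otherwise."""
    if len(set(values)) <= MAX_ONE_HOT_LEVELS:
        return one_hot_encode(name, values)
    # IDF keeps a single numerical column regardless of the number of levels.
    return {name: idf_encode(values)}
```

With this rule, a two-level column such as Brand expands into two binary columns, whereas a column such as Part stays a single numerical column in which rare levels receive larger values than frequent ones.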

Evaluation
In order to evaluate the results from the AutoML tools, we adopted a similar approach to the benchmark developed in [9]. For every predictive experiment, we divided the dataset into 10 folds for an external cross-validation and adopted an internal 5-fold cross-validation (i.e., over the training data) to select the best [...]

Table 2 (fragment). FailOn5Days: indication whether the equipment will fail in the next 5 days (Type: Integer; Levels: 2; Domain: {0,1}; Example: 1). FailOn7Days: indication whether the equipment will fail in the next 7 days (Type: Integer; Levels: 2; Domain: {0,1}; Example: 1).

For all four AutoML tools we defined a maximum training time of one hour (3,600 seconds) and an early stopping of three rounds, when available. The maximum time of one hour was chosen since it is the default value for most of the AutoML tools. We computed the average of the evaluation measures, computed on the test sets of the 10 external folds, to provide an aggregated value. Additionally, we use confidence intervals based on the t-distribution with 95% confidence to verify the statistical significance of the experiments. In order to identify the best results for each target, we choose the AutoML tool that had the best average predictive performance (with maximum precision of 0.01). All experiments were executed using an Intel Xeon 1.70GHz server with 56 cores and 64GB of RAM.
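The aggregation over the 10 external folds can be illustrated with the short sketch below. It is only a sketch under stated assumptions: the fold split is contiguous and unshuffled (the source does not describe how records were assigned to folds), the function names are hypothetical, and the critical value of Student's t-distribution for 9 degrees of freedom is hard-coded to avoid external dependencies.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (10 external folds minus 1), hard-coded to keep the sketch dependency-free.
T_CRIT_95_DF9 = 2.262


def k_fold_indices(n_records, k=10):
    """Split range(n_records) into k contiguous, near-equal test folds."""
    base, extra = divmod(n_records, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_with_ci(fold_scores, t_crit=T_CRIT_95_DF9):
    """Return (mean, half-width) of the t-based confidence interval."""
    n = len(fold_scores)
    mean = statistics.mean(fold_scores)
    # Standard error uses the sample standard deviation (n - 1 denominator).
    half_width = t_crit * statistics.stdev(fold_scores) / math.sqrt(n)
    return mean, half_width
```

For the 2,608 records of this dataset, such a split yields test folds of 260 or 261 records each; the value returned as half-width corresponds to the quantity reported next to the ± symbol. A different number of folds would require a different critical value.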

Results
All the experiments were implemented in Python or R (when the tool did not have a Python API) using the AutoML libraries detailed in Section 2.1. For each AutoML tool, we executed five experiments, one for each target variable (DaysToNextFailure and FailOnxDays). Table 3 shows the average external test scores for all 10 folds and the respective confidence intervals (near the ± symbol). Overall, the best AutoML tool was AutoGluon, which produced the highest AUC values for the binary classification tasks and the lowest MAE value for the regression task. For the regression task (DaysToNextFailure), besides AutoGluon, the best AutoML tools were H2O AutoML, TPOT, and rminer. The maximum predictive difference was 3.94 points (days). As for the binary classification, the predictive test set results are more similar: maximum difference of 3 percentage points (pp) for FailOn3Days, 4 pp for FailOn5Days, 1 pp for FailOn7Days, and 1 pp for FailOn10Days. AutoGluon was the best tool for all four binary classification targets, followed by H2O AutoML and TPOT (best in two targets each).
Finally, we compare the best AutoML results for each target with the best result achieved by a human ML modeling (conducted before this study). For each target, Table 4 shows the best predictive result and the respective AutoML tool in parentheses. For each AutoML tool, we also show the algorithm that was most often the leader across the external folds. For the human modeling, we show the best obtained result and the algorithm used (also in parentheses).
It should be noted that the human modeling used a distinct preprocessing procedure since it applied the One-Hot encoding to all categorical attributes (and not IDF for the high cardinality ones, as we adopted for the AutoML tools). Nevertheless, the comparison clearly favors the AutoML results for all predicted target variables. In particular, for the binary classification task, the human modeling achieved only slightly better results than a random model, while all AutoML tools achieved results that can be considered excellent (e.g., AUC higher than 0.90). For regression, the human modeling achieved an average error of 68.36 days, while the highest MAE obtained by the AutoML tools was 8.89 (achieved by rminer).
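For reference, the two evaluation measures discussed in this section, MAE for regression and AUC for binary classification, can be computed as in the following sketch. It is a plain-Python illustration with hypothetical function names, not the code used in the experiments; the AUC is computed through its pairwise-ranking interpretation, which is quadratic in the number of records and is meant only to make the definition explicit.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under this definition, a model that assigns the same score to every record obtains an AUC of 0.5, which is the random-model reference point mentioned above, while a perfect ranking obtains 1.0. For the regression target, the MAE is expressed in days.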

Conclusions
Predictive maintenance is a key industrial application that is being increasingly enhanced by the adoption of ML. Yet, most ML related works assume an expert ML model design that requires manual effort and time. In this paper, we explore the potential of AutoML to automate predictive maintenance ML modeling. We used real-world data provided by a Portuguese software company within the domain of maintenance management to predict equipment malfunctions.
Our goal was to anticipate failures from several types of equipment (e.g., industrial machines), using two ML tasks: regression, to predict the number of days until the next failure of the equipment; and binary classification, to predict whether the equipment will fail within a fixed number of days (e.g., in three days). For the ML modeling and training, we used four recent state-of-the-art Automated Machine Learning (AutoML) tools: AutoGluon, H2O AutoML, rminer, and TPOT.
Several computational experiments were held, assuming five predictive tasks (one regression and four binary classifications). For all ML tasks, AutoGluon presented the best average results among the AutoML tools. The AutoML results were further compared with a human ML design, performed previously by a professional of the Portuguese company. The comparison favored all AutoML tools, which provided better average results than the manual approach by a large margin. These results confirm the potential of the AutoML modeling, which can automatically provide high quality predictive models. This is particularly valuable for the predictive maintenance domain since industrial data can arise with a high velocity, thus the predictive models can be dynamically updated through time, reducing the effort of the data analysis.
In future work, we intend to perform experiments with more AutoML tools and with more datasets from the predictive maintenance domain. In particular, we intend to experiment with AutoML technologies that can automatically perform feature engineering and selection tasks.