Condition-based maintenance is the practice of performing maintenance only when it is required. Adopting this strategy leads to significant monetary gains, as it eliminates unnecessary periodic maintenance and reduces unplanned downtime. Condition-based maintenance is also commonly called predictive maintenance because, as the name suggests, we predict in advance when maintenance should be performed. Maintenance is required when a fault has already occurred or is imminent. This leads us to the problems of fault diagnosis and prognosis.
In fault diagnosis, a fault has already occurred and our aim is to determine its type and severity. In fault prognosis, our aim is to predict when a fault will occur in the future, given the machine's present state. These two problems are central to condition-based maintenance. Many methods exist to solve them, and they can be broadly divided into two groups:
- Model-Based Approaches
- Data-Driven Approaches
In the model-based approach, a complete model of the system is formulated and then used for fault diagnosis and prognosis. This approach has several limitations. Firstly, accurately modelling a system is difficult, and modelling becomes even more challenging when working conditions vary. Secondly, different tasks require different models; for example, diagnosing a bearing fault and a gear fault requires two different models. Data-driven methods provide a convenient alternative.
In the data-driven approach, we use the machine's operational data to design algorithms that are then used for fault diagnosis and prognosis. The operational data may be vibration data, thermal imaging data, acoustic emission data, or something else. These techniques are robust to environmental variations, and their accuracy is on par with, and sometimes better than, that of model-based approaches. For these reasons, data-driven methods are becoming increasingly popular for diagnosis and prognosis tasks.
In this project, we apply standard machine learning techniques to publicly available data sets and present the results along with code. Publicly available data sets in machinery condition monitoring are scarce, so we work with what is available. Unlike the machine learning community, where almost all data and code are open, very little is open in condition monitoring, though some researchers are gradually releasing their code. This project is a step, however small, in that direction.
This is an ongoing project; modifications and new techniques will be added over time. We will use Python and R, two popular programming languages for machine learning applications, for our demonstrations, and TensorFlow for deep learning applications. This page contains results on fault diagnosis only; results on fault prognosis will be summarized on a separate webpage.
Results using Case Western Reserve University Bearing Data
We first apply traditional feature-based methods (so-called shallow learning methods) and then deep learning based methods. For the feature-based methods, we make extensive use of wavelet packet energy features and wavelet packet entropy features computed from raw time-domain data. The procedure detailing the calculation of wavelet packet energy features can be found at this link, and the corresponding calculation for wavelet packet entropy features at this link. A minimal sketch of the feature calculation is given immediately below, and sketches of the feature-based and CNN-based workflows follow the results list.
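As a quick illustration (not a substitute for the linked write-ups), here is a minimal sketch of how wavelet packet energy features can be computed in Python using the PyWavelets package. The wavelet (`db4`), decomposition level (3), and segment length are illustrative assumptions and may differ from the choices made in the linked notebooks.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_packet_energy(segment, wavelet="db4", level=3):
    """Relative energy of each terminal wavelet packet node of a 1-D signal segment."""
    wp = pywt.WaveletPacket(data=segment, wavelet=wavelet, mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="natural")        # 2**level terminal nodes
    energies = np.array([np.sum(node.data ** 2) for node in nodes])
    return energies / np.sum(energies)                   # normalize to relative energies

# Example: split a raw vibration signal into fixed-length segments and
# compute one feature vector (of length 2**level) per segment.
raw_signal = np.random.randn(48000)                      # placeholder for CWRU vibration data
segment_length = 1024                                    # illustrative segment length
usable = len(raw_signal) // segment_length * segment_length
segments = raw_signal[:usable].reshape(-1, segment_length)
features = np.array([wavelet_packet_energy(s) for s in segments])
print(features.shape)                                    # (n_segments, 8) for level 3
```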
- SVM on time domain features (10 classes, sampling frequency: 48k) (Overall accuracy: 96.5%) (Python code) (R code)
- SVM on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 99.3%) (Python code) (R code)
- Visualizing High Dimensional Data Using Dimensionality Reduction Techniques (Python Code) (R Code)
- SVM on wavelet packet entropy features (10 classes, sampling frequency: 48k) (Overall accuracy: 99.3%) (Python code) (R code)
- SVM on time and wavelet packet features (12 classes, sampling frequency: 12k) (Achieves 100% test accuracy in one case) (Python code) (R code)
- Multiclass Logistic Regression on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 98.5%) (Python code) (R code)
- Multiclass Logistic Regression on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 99.7%) (Python code) (R code)
- LDA on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 89.8%) (Python code) (R code)
- LDA on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 99.5%) (Python code) (R code)
- QDA on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 96.5%) (Python code) (R code)
- QDA on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 99%) (Python code) (R code)
- kNN on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 89.8%) (Python code) (R code)
- kNN on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 99.5%) (Python code) (R code)
- Decision tree on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 94.5%) (Python code) (R code)
- Decision tree on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 99.7%) (Python code) (R code)
- Bagging on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 97%) (Python code) (R code)
- Bagging on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 100%) (Python code) (R code)
- Boosting on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 99%) (Python code) (R code)
- Boosting on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 100%) (Python code) (R code)
- Random forest on wavelet packet energy features (10 classes, sampling frequency: 48k) (Overall accuracy: 98.1%) (Python code) (R code)
- Random forest on wavelet packet energy features (12 classes, sampling frequency: 12k) (Overall accuracy: 100%) (Python code) (R code)
- Fault diagnosis using convolutional neural network (CNN) (10 classes, sampling frequency: 48k) (Overall accuracy: 96.2%)
- CNN based fault diagnosis using continuous wavelet transform (CWT) (10 classes, sampling frequency: 48k) (Overall accuracy: 98.2%)
(This list will be updated gradually.)
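For readers who want a feel for the feature-based workflow before opening the linked notebooks, here is a minimal scikit-learn sketch of a multiclass SVM trained on wavelet packet energy features. The feature arrays are random placeholders standing in for the processed CWRU data, and the hyperparameters are illustrative, not the tuned values behind the accuracies reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# X: (n_segments, n_features) wavelet packet energy features; y: integer fault labels (0-9).
# Random placeholders are used here in place of the processed CWRU feature matrix.
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = rng.integers(0, 10, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Similarly, the CNN entries can be approximated by a small 1-D convolutional network in TensorFlow/Keras. The input length (1024 samples per segment), layer sizes, and training settings below are assumptions for illustration only, not the architectures used in the linked CNN results.

```python
import tensorflow as tf

# Minimal 1-D CNN: raw vibration segments of length 1024 in, 10 fault classes out.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024, 1)),
    tf.keras.layers.Conv1D(16, kernel_size=64, strides=8, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_segments, train_labels, epochs=20, validation_split=0.2)  # with real data
```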
- Transient vibration and shock response spectrum plots in MATLAB
- Simple examples on finding instantaneous frequency using Hilbert transform (MATLAB Code)
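The instantaneous frequency example above is linked in MATLAB; a rough Python analogue using SciPy's Hilbert transform is sketched below. The chirp test signal and its parameters are purely illustrative.

```python
import numpy as np
from scipy.signal import chirp, hilbert

fs = 1000.0                                    # sampling frequency in Hz (illustrative)
t = np.arange(0, 2.0, 1 / fs)
x = chirp(t, f0=20.0, t1=2.0, f1=100.0, method="linear")  # test signal: 20 Hz -> 100 Hz

analytic = hilbert(x)                          # analytic signal x(t) + j*H{x(t)}
phase = np.unwrap(np.angle(analytic))          # instantaneous phase in radians
inst_freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency in Hz

print(inst_freq[:5])                           # starts near 20 Hz and rises towards 100 Hz
```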
Readers who use the processed data sets from this page must cite the original data source as
BibTeX citation
@misc{casewesternbearingdata,
  url = {https://csegroups.case.edu/bearingdatacenter/home},
  note = {These data come from the Case Western Reserve University Bearing Data Center website}
}
For attribution, readers may cite this project as
BibTeX citation
@misc{sahoo2016datadriven,
  author = {Sahoo, Biswajit},
  title = {Data-Driven Machinery Fault Diagnosis},
  url = {https://biswajitsahoo1111.github.io/cbm_codes_open/},
  year = {2016}
}