ImbGAFS: GA Feature Selection for AUC in Bird Strike Prediction


Aji Gautama Putrada and Sidik Prabowo, Telkom University, Indonesia


Several studies discuss the prediction of airplane failure due to bird strikes. However, these studies do not sufficiently analyze the class imbalance in their datasets. Our research aims to build a bird strike airplane failure predictor using a machine learning method optimized with genetic algorithm (GA) feature selection. The GA feature selection uses area under curve (AUC) maximization as its objective function to tackle the imbalance problem in the bird strike dataset. First, we obtain the airplane bird strike dataset from Kaggle and preprocess it. We then compare four state-of-the-art machine learning methods and choose one: support vector machine (SVM), multilayer perceptron (MLP), logistic regression, and random forest. The selection process involves oversampling with the synthetic minority oversampling technique (SMOTE) and optimal threshold selection, and it is evaluated with geometric mean (g-mean) and AUC values. Finally, we optimize airplane failure prediction by performing AUC maximization using GA feature selection. Our test results show that random forest is the best machine learning method for airplane failure prediction compared to SVM, logistic regression, and MLP. SMOTE increases the random forest AUC from 0.845 to 0.878. The random forest model from ImbGAFS then outperforms the conventional method without feature selection, raising the AUC from 0.878 to 0.889. After optimal threshold selection, ImbGAFS+random forest also has better sensitivity and g-mean than the conventional method, with nearly equal specificity: sensitivity, specificity, and g-mean change from 0.7737, 0.8350, and 0.8037 to 0.8033, 0.8301, and 0.8166, respectively.
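As an illustration of the optimal threshold selection step, the following minimal Python sketch picks the decision threshold that maximizes the g-mean, taken here as the square root of sensitivity times specificity. It is not the authors' implementation: the function names `g_mean` and `best_threshold`, the toy labels, and the toy scores are assumptions made for this example only, and the exhaustive scan over observed scores is one simple way to do the search.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)


def best_threshold(y_true, scores):
    """Scan every observed score as a candidate threshold.

    A sample is predicted positive when its score is >= the threshold.
    Returns (threshold, sensitivity, specificity, g_mean) for the
    candidate with the highest g-mean (first one wins on ties).
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        sens = tp / positives
        spec = tn / negatives
        g = g_mean(sens, spec)
        if best is None or g > best[3]:
            best = (t, sens, spec, g)
    return best


# Hypothetical imbalanced toy data: six negatives, two positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.55, 0.9]

threshold, sens, spec, g = best_threshold(y_true, scores)
print(threshold, sens, round(spec, 4), round(g, 4))

# Sanity check against the sensitivity and specificity pairs reported above.
print(round(g_mean(0.7737, 0.8350), 3), round(g_mean(0.8033, 0.8301), 3))
```

Under this definition, the reported sensitivity and specificity pairs reproduce the reported g-mean values to about three decimal places. On imbalanced data the default threshold of 0.5 tends to favor the majority class, which is why the threshold is tuned on a metric that weighs both classes equally.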


Genetic Algorithm, Area Under Curve Maximization, Airplane Failure, Imbalanced Dataset, Bird Strike.

Volume 13, Number 16