Diabetes Mellitus Detection Using Information Gain and Machine Learning Algorithms
Keywords:
Diabetes, Data mining, Feature selection, K Nearest Neighbor, Naïve Bayes, Logistic Regression
Abstract
Diabetes is a serious chronic disease characterized by hyperglycemia. Early detection with efficient, intelligent diagnostic tools is essential to prevent untimely death and further cardiovascular complications. Data mining techniques are gaining prominence in the medical domain for predicting chronic health conditions and have the potential to diagnose diabetes mellitus efficiently. This study aims to develop an intelligent diagnostic tool for the early prediction of diabetes. Three data mining algorithms, k-nearest neighbor (KNN), Naïve Bayes and logistic regression, were used to make the predictions. Attribute selection was performed with an information gain attribute evaluator on the Pima Indian dataset to determine the best subset of attributes for disease classification. Experimental results using the selected features show that KNN and Naïve Bayes achieved accuracies of 75.2% and 80.9% respectively, while logistic regression attained the highest prediction accuracy of 82.2%. The model is suitable for early prognosis of diabetes in medical settings and can assist medical personnel in making informed decisions.
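The attribute selection step described above can be illustrated with a minimal sketch of information gain ranking. This is not the evaluator used in the study; it assumes the attributes have already been discretized, and the attribute names (glucose_level, bmi_level, id_parity), the helper functions, and the eight toy records are all hypothetical, chosen only to show how an informative attribute outranks an uninformative one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class within each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels, names):
    """Return (attribute name, information gain) pairs, best first."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records (NOT taken from the Pima Indian dataset).
names = ["glucose_level", "bmi_level", "id_parity"]
rows = [
    ("high", "high", "even"),
    ("high", "high", "odd"),
    ("high", "high", "even"),
    ("high", "low", "odd"),
    ("low", "low", "even"),
    ("low", "low", "odd"),
    ("low", "low", "even"),
    ("low", "high", "odd"),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = diabetic, 0 = non-diabetic

ranking = rank_attributes(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.4f}")
```

In this toy data, glucose_level separates the classes completely, bmi_level is only partly informative, and id_parity carries no information, so a threshold on the gain would keep the first one or two attributes for the classifiers. Continuous attributes such as those in the Pima Indian dataset would need to be binned before this calculation applies.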
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.