(1) Background: Diabetes is a common chronic disease and a leading cause of death. Early diagnosis gives patients with diabetes the opportunity to improve their dietary habits and lifestyle and manage the disease successfully. Several studies have explored the use of machine learning (ML) techniques to predict and diagnose this disease. In this study, we conducted experiments to predict diabetes in Pima Indian females with particular ML classifiers. (2) Method: A Pima Indian diabetes dataset (PIDD) with 768 female patients was considered for this study. Different data mining operations were performed to a conduct comparative analysis of four different ML classifiers: Naïve Bayes (NB), J48, Logistic Regression (LR), and Random Forest (RF). These models were analyzed by different cross-validation (K = 5, 10, 15, and 20) values, and the performance measurements of accuracy, precision, F-score, recall, and AUC were calculated for each model. (3) Results: LR was found to have the highest accuracy (0.77) for all 'k' values. When k = 5, the accuracy of J48, NB, and RF was found to be 0.71, 0.76, and 0.75. For k = 10, the accuracy of J48, NB, and RF was found to be 0.73, 0.76, 0.74, while for k = 15, 20, the accuracy of NB was found to be 0.76. The accuracy of J48 and RF was found to be 0.76 when k = 15, and 0.75 when k = 20. Other parameters, such as precision, f-score, recall, and AUC, were also considered in evaluations to rank the algorithms. (4) Conclusion: The present study on PIDD sought to identify an optimized ML model, using with cross-validation methods. The AUC of LR was 0.83, RF 0.82, and NB 0.81). These three were ranked as the best models for predicting whether a patient is diabetic or not.

Comparative machine-learning approach: A follow-up study on type 2 diabetes predictions by cross-validation methods

Seyed Khosrow Tayebati;Gopi Battineni;Getu Sagaro;C Nalini;Francesco Amenta
2019-01-01

Abstract

(1) Background: Diabetes is a common chronic disease and a leading cause of death. Early diagnosis gives patients with diabetes the opportunity to improve their dietary habits and lifestyle and manage the disease successfully. Several studies have explored the use of machine learning (ML) techniques to predict and diagnose this disease. In this study, we conducted experiments to predict diabetes in Pima Indian females with particular ML classifiers. (2) Method: A Pima Indian diabetes dataset (PIDD) with 768 female patients was considered for this study. Different data mining operations were performed to a conduct comparative analysis of four different ML classifiers: Naïve Bayes (NB), J48, Logistic Regression (LR), and Random Forest (RF). These models were analyzed by different cross-validation (K = 5, 10, 15, and 20) values, and the performance measurements of accuracy, precision, F-score, recall, and AUC were calculated for each model. (3) Results: LR was found to have the highest accuracy (0.77) for all 'k' values. When k = 5, the accuracy of J48, NB, and RF was found to be 0.71, 0.76, and 0.75. For k = 10, the accuracy of J48, NB, and RF was found to be 0.73, 0.76, 0.74, while for k = 15, 20, the accuracy of NB was found to be 0.76. The accuracy of J48 and RF was found to be 0.76 when k = 15, and 0.75 when k = 20. Other parameters, such as precision, f-score, recall, and AUC, were also considered in evaluations to rank the algorithms. (4) Conclusion: The present study on PIDD sought to identify an optimized ML model, using with cross-validation methods. The AUC of LR was 0.83, RF 0.82, and NB 0.81). These three were ranked as the best models for predicting whether a patient is diabetic or not.
2019
262
File in questo prodotto:
File Dimensione Formato  
109.pdf

accesso aperto

Descrizione: Machines 2019, 7, 74:1-11
Tipologia: Versione Editoriale
Licenza: PUBBLICO - Creative Commons
Dimensione 1.69 MB
Formato Adobe PDF
1.69 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11581/433828
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 61
  • ???jsp.display-item.citation.isi??? 37
social impact