
Comparative analysis of Weka-based classification algorithms on medical diagnosis datasets

Published: 01 January 2023

Abstract

Background:

The advent of 5G and big data, the rapid worldwide development of medical information technology, the widespread use of electronic medical records and case files, and the digitization of medical equipment and instruments have caused large volumes of data to accumulate in hospital database systems. These data include both clinical diagnosis data and hospital management data.

Objective:

This study aimed to compare the classification performance of different machine learning algorithms on medical datasets in order to better assess the value of machine learning methods in aiding medical diagnosis.

Methods:

Classification datasets from four different medical fields in the University of California, Irvine (UCI) Machine Learning Repository were used as the research objects. Six categories of classification models, based on Bayes' theorem, ensemble learning, rules, and trees, were constructed on the Weka platform.
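The Methods name ensemble-based and tree-based model families without showing how they work. As a minimal illustrative sketch, and not the authors' Weka configuration, the stdlib-only Python code below implements bootstrap aggregation (Bagging) over one-level decision trees (decision stumps): each base tree is trained on a bootstrap sample drawn with replacement, and the trees' predictions are combined by majority vote. The function names (`fit_stump`, `fit_bagging`, `predict_bagging`), the stump base learner, the toy data, and `n_estimators=11` are assumptions chosen for illustration; Weka's own `Bagging` classifier uses `REPTree` as its default base learner.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a decision stump: the single (feature, threshold) split with the
    fewest training errors. Each side of the split predicts its majority label."""
    majority = Counter(y).most_common(1)[0][0]
    best_err, best = len(y) + 1, None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            # The right side can be empty (t is the maximum); fall back to the overall majority.
            l_lab = Counter(left).most_common(1)[0][0] if left else majority
            r_lab = Counter(right).most_common(1)[0][0] if right else majority
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best_err, best = err, (f, t, l_lab, r_lab)
    return best


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_bagging(X, y, n_estimators=11, seed=1):
    """Train n_estimators stumps, each on a bootstrap sample of size len(X)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Combine the base stumps by unweighted majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data with one feature: class 0 for small values, class 1 for large.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
models = fit_bagging(X, y, n_estimators=11, seed=1)
print(predict_bagging(models, [0.5]), predict_bagging(models, [6.5]))
```

An odd number of estimators avoids vote ties in this two-class toy case. Replacing the stump with a deeper tree and sampling a random feature subset at each split would move the sketch toward Random Forest.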

Results:

Between-group experiments showed that Random Forest achieved the best results on the Indian liver patient dataset (ILPD), cardiotocography (CADG), and lymphography (LYMP) datasets, followed by Bagging and classification and regression tree (CART). In the within-group comparison of ensemble-based algorithms, Bagging outperformed the other ensemble algorithms on 11 metrics across all datasets, mainly on the two binary datasets. LogitBoost was significantly better on only 7 metrics, and the best algorithm in this group was Rotation Forest, which achieved optimal values on 28 metrics. Among the tree-based algorithms, the logistic model tree (LMT) achieved optimal results on all metrics on the mammographic (MAGR) dataset, whereas BFTree, J48, and Random Tree performed poorly on every dataset. The best tree-based algorithm was Random Forest, which reached the optimum on 27 metrics across the ILPD, CADG, and LYMP datasets.
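The within-group results are stated as counts of metrics on which an algorithm reaches the optimal value (for example, 28 for Rotation Forest). The abstract does not say how ties or lower-is-better metrics were handled, so the Python sketch below assumes one plausible convention: for each (dataset, metric) cell the best score wins, error-type metrics are minimized, and every tied algorithm receives credit. The `count_optimal` function, the `LOWER_IS_BETTER` set, the dataset and metric names, and all scores are hypothetical placeholders, not data from the paper.

```python
from collections import Counter

# Metrics for which a smaller value is better (assumed; e.g. error measures).
LOWER_IS_BETTER = {"rmse", "mae"}


def count_optimal(results, lower_is_better=LOWER_IS_BETTER, tol=1e-12):
    """Count, per algorithm, the (dataset, metric) cells where it scores best.

    results maps dataset -> metric -> algorithm -> score.
    Scores within tol of the best value all count as optimal (ties share credit).
    """
    wins = Counter()
    for metrics in results.values():
        for metric, scores in metrics.items():
            pick = min if metric in lower_is_better else max
            best = pick(scores.values())
            for algo, value in scores.items():
                if abs(value - best) <= tol:
                    wins[algo] += 1
    return wins


# Hypothetical placeholder scores, NOT values reported in the paper.
results = {
    "toy_a": {
        "accuracy": {"Bagging": 0.80, "LogitBoost": 0.78, "RotationForest": 0.82},
        "rmse": {"Bagging": 0.35, "LogitBoost": 0.37, "RotationForest": 0.33},
    },
    "toy_b": {
        "accuracy": {"Bagging": 0.91, "LogitBoost": 0.91, "RotationForest": 0.90},
        "rmse": {"Bagging": 0.20, "LogitBoost": 0.22, "RotationForest": 0.21},
    },
}
print(count_optimal(results))
```

Crediting all tied algorithms means the per-algorithm counts can sum to more than the number of (dataset, metric) cells; a study that breaks ties differently would report different totals.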

Conclusion:

Machine learning algorithms show good application value in disease prediction and can provide a reference for disease diagnosis.


Cited By

  • (2024) Comparative analysis of supervised learning algorithms for prediction of cardiovascular diseases. Technology and Health Care. 32(S1): 241-251. doi:10.3233/THC-248021. Online publication date: 1 January 2024.
  • (2024) Value of magnetic resonance imaging radiomics features in predicting histologic grade of invasive ductal carcinoma of the breast. Technology and Health Care. 32(3): 1609-1618. doi:10.3233/THC-230671. Online publication date: 1 January 2024.


Published In

Technology and Health Care, Volume 31, Issue S1
2023
552 pages
This is an open access article distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC 4.0) License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

IOS Press

Netherlands


Author Tags

1. algorithms
2. comparison
3. machine learning
4. medical data
5. prediction
6. Weka
