Multiclass disease predictions based on integrated clinical and genomics datasets
Affiliation: University of Derby
Abstract: Clinical predictions made from clinical data by computational methods are common in bioinformatics. However, clinical predictions that also draw on genomics datasets are rarely seen in research. Precision medicine research requires information from all available datasets to provide intelligent clinical solutions. In this paper, we have attempted to create a prediction model which uses information from both clinical and genomics datasets. We have demonstrated multiclass disease predictions based on combined clinical and genomics datasets using machine learning methods. We have created an integrated dataset from a clinical (ClinVar) and a genomics (gene expression) dataset, and trained an instance-based learner on it to predict clinical diseases. We have used an innovative but simple approach to multiclass classification, where the number of output classes is as high as 75. We have used Principal Component Analysis for feature selection. The classifier predicted diseases with 73% accuracy on the integrated dataset. The results were consistent and compared well with those of other classification models. The results show that genomics information can be reliably included in datasets for clinical predictions and that it can prove valuable in clinical diagnostics and precision medicine.
Citation: Subhani, M. and Anjum, A. (2019) 'Multiclass disease predictions based on integrated clinical and genomics datasets', The Eleventh International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies. Novotel, Athens, 2-6 June. IARA: Wilmington, pp. 20-27.