Show simple item record

dc.contributor.author: Hamidi, Farzaneh
dc.date.accessioned: 2022-07-31T04:23:27Z
dc.date.available: 2022-07-31T04:23:27Z
dc.date.issued: 2022 [en_US]
dc.identifier.uri: http://dspace.tbzmed.ac.ir:80/xmlui/handle/123456789/66835
dc.description.abstract: Introduction: Early diagnosis of ovarian cancer and identification of the genes that affect it play a key role in the patient's treatment and life. Gene expression data obtained with microarray technology, combined with machine learning algorithms, make it possible to provide new, intelligent methods for the health and treatment system that diagnose ovarian cancer with high accuracy.
Objectives: To compare regularization and machine learning approaches (LASSO, Elastic Net, and Boruta) in variable selection and prediction, and to apply them to ovarian cancer microarray data.
Methods: We used Boruta, LASSO, and Elastic Net to select, in the training sample, the most critical miRNAs related to ovarian cancer that produce the highest prediction accuracy. We used SMOTE oversampling to balance the outcome in the GSE106817 data. We then used five-fold cross-validation to find the optimal hyperparameters for decision tree (DT), random forest (RF), logistic regression (LR), XGBoost (XGBT), and artificial neural network (ANN) models and to choose the best approach in the balanced sample, using the most important variables selected by Boruta, LASSO, and Elastic Net. Once the prediction models were developed, we applied them to the test samples GSE113486 and GSE113740 to verify the accuracy of the developed prediction approach. Among the five ML algorithms, we looked for the one with the highest predictive power in terms of the area under the ROC curve (AUC). Sensitivity, specificity, positive predictive value, negative predictive value, misclassification rate, and Kappa were also assessed. The guidelines for developing transparent multivariable prediction models were followed for this analysis. We used the “Boruta” and “glmnet” packages in R. This study also investigates the shrinkage strategy, focusing on the regularized linear regression variants LASSO and Elastic Net, together with a wrapper method named Boruta, which implements a novel feature selection algorithm for finding all relevant variables. The algorithm is a wrapper around a Random Forest classification algorithm; it iteratively removes the variables that a statistical test shows to be less important than random probes. The performance of these techniques was studied in a simulation environment, which is discussed in Section 3; the following section summarizes the results.
Result: The methods described yielded a very small set of important variables. Based on the evaluation criteria, the results had considerable validity and value, and the selected miRNAs were identified as potentially strong biomarkers for ovarian cancer. All of these miRNAs individually had significant expression levels in cancer cases (p=0.001 and ROC>90%), in the original data set (p=0.001 and ROC>98%), and in the external evaluation data (p=0.001 and ROC>95%), so that all 5 classification models using them had high and significant AUC. The simulation results, shown as box plots, indicated that as the sample size increases, LASSO performs better than Elastic Net, followed by Boruta, when correlations are high, whereas Boruta performs better than Elastic Net and LASSO when correlations are low. In the high-dimensional scenarios, Boruta performed better than Elastic Net, followed by LASSO, when the correlation was strong, whereas Elastic Net performed better than Boruta and LASSO when correlations were low and weak.
Conclusion: The findings of this study provide significant evidence that a set of serum miRNA profiles are promising diagnostic biomarkers for ovarian cancer. The simulation phase of the study showed that, under conditions such as high correlation and high dimensionality, Boruta performs better than Elastic Net and LASSO. [en_US]
dc.language.iso: fa [en_US]
dc.publisher: Tabriz University of Medical Sciences, School of Health [en_US]
dc.relation.isversionof: http://dspace.tbzmed.ac.ir:80/xmlui/handle/123456789/66834 [en_US]
dc.subject: LASSO [en_US]
dc.subject: Elastic Net [en_US]
dc.subject: Boruta [en_US]
dc.subject: Feature Selection [en_US]
dc.subject: Ovarian Cancer [en_US]
dc.subject: Biomarker [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Gene Expression Omnibus [en_US]
dc.subject: Classification [en_US]
dc.subject: Prediction [en_US]
dc.subject: Simulation [en_US]
dc.title: Comparison of Regularization and Machine Learning Approaches in Variable Selection and Prediction and its Application in Biological Data Analysis [en_US]
dc.type: Thesis [en_US]
dc.contributor.supervisor: Gilani, Neda
dc.identifier.callno: 588ب [en_US]
dc.contributor.department: Biostatistics [en_US]
dc.description.discipline: Biostatistics [en_US]
dc.description.degree: MSc [en_US]

