Analyzing a Lung Cancer Patient Dataset with the Focus on Predicting Survival Rate One Year after Thoracic Surgery
Date
2017
Author
Rezaei Hachesu, P
Moftian, N
Dehghani, M
Samad Soltani, T
Abstract
Background: Data mining, a new concept introduced in the mid-1990s, can help researchers to gain new, profound
insights and facilitate access to unanticipated knowledge sources in biomedical datasets. Many issues in the medical
field are concerned with the diagnosis of diseases based on tests conducted on individuals at risk. Early diagnosis
and treatment can provide a better outcome regarding the survival of lung cancer patients. Researchers can use data
mining techniques to create effective diagnostic models. The aim of this study was to evaluate patterns in risk
factor data for mortality one year after thoracic surgery for lung cancer. Methods: The dataset used in this study
contained 470 records and 17 features. First, the most important variables involved in the incidence of lung cancer
were extracted using knowledge discovery and data mining algorithms such as naive Bayes and expectation
maximization. Then, using a regression analysis algorithm, a questionnaire was developed to predict the risk of death one year
after lung surgery. Outliers in the data were excluded and reported using a clustering algorithm. Finally, a calculator
was designed to estimate the risk for one-year post-operative mortality based on a scorecard algorithm. Results: The
results revealed the most important factor involved in increased mortality to be large tumor size. Roles for type II
diabetes and preoperative dyspnea in lower survival were also identified. The feature most commonly shared in the classification
of patients was forced expiratory volume in the first second (FEV1), whose levels allowed patients to be classified
into different categories. Conclusion: A calculation-based questionnaire for diagnosing disease can
be used to identify and fill knowledge gaps in clinical practice guidelines.
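
The scorecard-style risk calculator mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not report the regression coefficients, the scaling, or the exact scorecard construction, so everything numeric below is a hypothetical placeholder: the weights, the intercept, and the points-per-unit scaling are assumptions chosen only to show the mechanics, and the function names are illustrative. The sketch assumes a logistic-regression-based scorecard, in which each log-odds weight is scaled to integer points and the total score is mapped back to a probability.

```python
import math

# HYPOTHETICAL log-odds weights for illustration only; NOT taken from the study.
ILLUSTRATIVE_WEIGHTS = {
    "large_tumor": 1.2,
    "type2_diabetes": 0.8,
    "preoperative_dyspnea": 0.6,
}
ILLUSTRATIVE_INTERCEPT = -2.0  # hypothetical baseline log-odds
POINTS_PER_UNIT = 10           # assumed scaling: points per unit of log-odds


def scorecard_points(weights, points_per_unit=POINTS_PER_UNIT):
    """Convert each regression weight into an integer number of points."""
    return {name: round(w * points_per_unit) for name, w in weights.items()}


def total_score(patient, points):
    """Sum the points of every risk factor flagged as present for the patient."""
    return sum(p for name, p in points.items() if patient.get(name, False))


def estimated_risk(score, intercept=ILLUSTRATIVE_INTERCEPT,
                   points_per_unit=POINTS_PER_UNIT):
    """Map a total score back to a probability with the logistic function."""
    log_odds = intercept + score / points_per_unit
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    points = scorecard_points(ILLUSTRATIVE_WEIGHTS)
    patient = {"large_tumor": True, "type2_diabetes": True}
    score = total_score(patient, points)
    print(points, score, round(estimated_risk(score), 3))
```

Because the points are rounded, a scorecard of this kind trades a little precision for a form that can be summed by hand on a paper questionnaire. Real weights would have to come from a model fitted to the dataset.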