
Coping with Unbalanced Class Data Sets in Oral Absorption Models

Date
2013
Author
Newby, D
Freitas, AA
Ghafourian, T
Abstract
Class imbalance occurs frequently in drug discovery data sets. In oral absorption data sets in the literature, there are considerably more highly absorbed compounds than poorly absorbed compounds. This produces models that are biased toward highly absorbed compounds and that generalize poorly to industry settings, where more early-stage drug candidates are poorly absorbed. This paper presents two strategies to cope with unbalanced class data sets: undersampling the majority high-absorption class and applying misclassification costs, using classification decision trees. The published data set by Hou et al. [J. Chem. Inf. Model. 2007, 47, 208-218], which contained percentage human intestinal absorption of 645 drug and drug-like compounds, was used for the development and validation of classification trees using classification and regression tree (C&RT) analysis. The results indicate that undersampling the majority class (highly absorbed compounds) to give a training set with a balanced (50:50) distribution can achieve better accuracies for poorly absorbed compounds, whereas the biased training set achieved higher accuracies for highly absorbed compounds. The use of misclassification costs resulted in improved class predictions when applied to reduce false positives or false negatives. Moreover, it was shown that the classical overall accuracy measure used in many publications is particularly misleading in the case of unbalanced data sets, and that the more appropriate measures presented in the paper may be used for a more realistic assessment of the classification models' performance. Thus, these strategies offer improvements for coping with unbalanced class data sets to obtain classification models applicable in industry.
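The two ideas in the abstract that lend themselves to a small illustration are random undersampling of the majority class and the way overall accuracy can mislead on unbalanced data. The following is a minimal Python sketch under stated assumptions, not the authors' procedure: the function names `undersample` and `class_metrics`, the 90:10 class split, and the choice of sensitivity, specificity, and balanced accuracy as the alternative measures are all illustrative, since the abstract does not name the measures the paper uses, and the C&RT trees themselves are not reproduced here.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows so every class is reduced to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def class_metrics(y_true, y_pred, positive="high"):
    """Overall accuracy plus per-class measures for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Made-up labels for illustration only: 90 highly and 10 poorly absorbed compounds.
y_true = ["high"] * 90 + ["low"] * 10
# A trivial "model" that always predicts the majority class.
y_pred = ["high"] * 100
metrics = class_metrics(y_true, y_pred)

# Undersample hypothetical compound indices to a 50:50 training set.
rows = list(range(100))
bal_rows, bal_labels = undersample(rows, y_true)
balanced_counts = Counter(bal_labels)
```

In this toy case the always-"high" predictor has high overall accuracy while its specificity for the poorly absorbed class is zero, which is the kind of gap that per-class measures expose. Misclassification costs could be sketched similarly by weighting false positives and false negatives differently when a tree is trained, but that depends on the tree implementation and is left out here.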
URI
http://dspace.tbzmed.ac.ir:8080/xmlui/handle/123456789/49326
Collections
  • Published Articles

Knowledge Repository of Tabriz University of Medical Sciences, running on DSpace software. Copyright 2018 ©