The effect of variable selection on the non-linear modelling of oestrogen receptor binding
Abstract
Oestrogen Receptor Binding Affinity (RBA) is often used as a measure of the oestrogenicity of endocrine disrupting chemicals. Quantitative Structure-Activity Relationship (QSAR) modelling of binding affinities has been performed with three-dimensional approaches such as Comparative Molecular Field Analysis (CoMFA). Such techniques are of limited use for chemically diverse sets of chemicals, however, because aligning the molecules is complex. The aim of the present study was to use non-linear methods to model the RBA to the oestrogen receptor for a large, diverse set of chemicals. To this end, several variable selection methods were applied to a large pool of descriptors. The methods included stepwise regression, partial least squares and recursive partitioning (Formal Inference Based Recursive Modelling, FIRM). The selected descriptors were used in Counter-Propagation Neural Networks (CPNNs) and Support Vector Machines (SVMs), and the resulting models were compared in terms of their ability to predict the activities of an external validation set. The results showed that, although there was a degree of similarity between the structural descriptors selected by the different methods, the predictive power of the CPNN and SVM models varied. Although the variables selected by stepwise regression led to poor CPNN models, they resulted in the most predictive SVM model. The descriptors selected by some of the FIRM methods were superior in CPNN models.