Wegner, Jörg K. and Fröhlich, Holger and Zell, Andreas

Feature Selection for Descriptor Based Classification Models. 1. Theory and GA-SEC Algorithm

Journal of Chemical Information and Computer Sciences (JCICS), vol. 44 (2004), no. 3, pp. 921-930


Abstract

The paper describes different aspects of classification models based on molecular data sets with the focus on feature selection methods. Especially model quality and avoiding a high variance on unseen data (overfitting) will be discussed with respect to the feature selection problem. We present several standard approaches and modifications of our Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC) algorithm and the extension for classification problems using boosting.


BibTeX

@article{2004_131,
  author = {Wegner, J{\"o}rg K. and Fr{\"o}hlich, Holger and Zell, Andreas},
  title = {{Feature Selection for Descriptor Based Classification Models. 1.
	Theory and GA-SEC Algorithm}},
  journal = {Journal of Chemical Information and Computer Sciences (JCICS)},
  year = {2004},
  volume = {44},
  pages = {921--930},
  number = {3},
  month = feb,
  abstract = {The paper describes different aspects of classification models based
	on molecular data sets with the focus on feature selection methods.
	Especially model quality and avoiding a high variance on unseen data
	(overfitting) will be discussed with respect to the feature selection
	problem. We present several standard approaches and modifications
	of our Genetic Algorithm based on the Shannon Entropy Cliques (GA-SEC)
	algorithm and the extension for classification problems using boosting.},
  doi = {10.1021/ci0342324},
  pdf = {http://www.cogsys.cs.uni-tuebingen.de/publikationen/2004/Wegner2004.pdf},
  url = {http://dx.doi.org/10.1021/ci0342324}
}