AutoML Strategy Based on Grammatical Evolution: A Case Study about Knowledge Discovery from Text
Suilan Estevez-Velarde, Yoan Gutiérrez, Andrés Montoyo, Yudivián Almeida-Cruz
The process of extracting knowledge from natural language text poses a complex problem that requires both a combination of machine learning techniques and proper feature selection. Recent advances in Automatic Machine Learning (AutoML) provide effective tools to explore large sets of algorithms, hyper-parameters and features to find out the most suitable combination of them. This paper proposes a novel AutoML strategy based on probabilistic grammatical evolution, which is evaluated on the health domain by facing the knowledge discovery challenge in Spanish text documents. Our approach achieves state-of-the-art results and provides interesting insights into the best combination of parameters and algorithms to use when dealing with this challenge. Source code is provided for the research community.
@inproceedings{estevez-velarde-etal-2019-automl, title = "{A}uto{ML} Strategy Based on Grammatical Evolution: A Case Study about Knowledge Discovery from Text", author = "Estevez-Velarde, Suilan and Guti{'e}rrez, Yoan and Montoyo, Andr{'e}s and Almeida-Cruz, Yudivi{'a}n", booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics", month = jul, year = "2019", address = "Florence, Italy", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/P19-1428", doi = "10.18653/v1/P19-1428", pages = "4356--4365", abstract = "The process of extracting knowledge from natural language text poses a complex problem that requires both a combination of machine learning techniques and proper feature selection. Recent advances in Automatic Machine Learning (AutoML) provide effective tools to explore large sets of algorithms, hyper-parameters and features to find out the most suitable combination of them. This paper proposes a novel AutoML strategy based on probabilistic grammatical evolution, which is evaluated on the health domain by facing the knowledge discovery challenge in Spanish text documents. Our approach achieves state-of-the-art results and provides interesting insights into the best combination of parameters and algorithms to use when dealing with this challenge. Source code is provided for the research community.", }