ARTIFICIAL INTELLIGENCE RESEARCH LABORATORY
    Center for Computational Intelligence, Learning, and Discovery
    Department of Computer Science


Language Models, Language Learning, and Applications


Automata induction, the task of inferring an unknown grammar (or, equivalently, the corresponding recognition device) from examples, finds applications in several areas, including structural pattern recognition, language learning, information retrieval, and computational biology. Honavar's research on grammar inference explores the design and analysis of algorithms for the induction of regular grammars under different models of interaction between the learner and the environment. Of particular interest are models of language learning from simple examples, the induction of large regular grammars, and the acquisition of semantics along with the syntax of natural as well as artificial languages.
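As an illustration of what "inferring a recognition device from examples" can mean for regular languages, the Python sketch below builds a prefix tree acceptor (PTA) from positive example strings. A PTA is the most specific DFA that accepts exactly the given examples. State-merging DFA learners such as RPNI start from a PTA and generalize by merging states. This is only a minimal sketch of that first step, not the algorithm from any of the references below. The names `build_pta` and `accepts` are hypothetical, and the merging step is deliberately omitted.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: transitions map (state, symbol) -> next state;
# state 0 is the start state, corresponding to the empty prefix.
Transitions = Dict[Tuple[int, str], int]


def build_pta(positives: Iterable[str]) -> Tuple[Transitions, Set[int]]:
    """Build a prefix tree acceptor that accepts exactly the given strings."""
    transitions: Transitions = {}
    accepting: Set[int] = set()
    next_state = 1
    for word in positives:
        state = 0
        for symbol in word:
            key = (state, symbol)
            if key not in transitions:
                # Each new prefix gets its own fresh state.
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        # The state reached after reading a whole example is accepting.
        accepting.add(state)
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int], word: str) -> bool:
    """Run the (partial) DFA on a word; a missing transition means reject."""
    state = 0
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting
```

For example, `build_pta(["a", "ab", "abb"])` produces a four-state chain that accepts those three strings and rejects everything else, such as `"b"` or the empty string. A learner would then merge states to generalize beyond the sample.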

More recent work on language modeling has focused on probabilistic generative models and their applications in sequence classification.
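A probabilistic generative model can be used for sequence classification by modeling how each class generates sequences and then choosing the class with the highest posterior probability. The Python sketch below assumes one first-order Markov chain per class with add-one (Laplace) smoothing over a known alphabet. It is a generic illustration of the generative approach, not the RNBL-MN or discriminatively trained models cited below. The class name `MarkovClassifier` and its methods are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class MarkovClassifier:
    """Hypothetical per-class first-order Markov chain classifier.

    Assumes every symbol comes from the given alphabet; add-one smoothing
    is applied to start-symbol and transition probabilities.
    """

    def __init__(self, alphabet: Sequence[str]):
        self.alphabet = list(alphabet)
        self.class_counts: Counter = Counter()
        self.start: Dict[str, Counter] = defaultdict(Counter)
        self.trans: Dict[str, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))

    def fit(self, sequences: List[str], labels: List[str]) -> "MarkovClassifier":
        for seq, label in zip(sequences, labels):
            self.class_counts[label] += 1
            if seq:
                self.start[label][seq[0]] += 1
            # Count adjacent symbol pairs (prev -> cur) for this class.
            for prev, cur in zip(seq, seq[1:]):
                self.trans[label][prev][cur] += 1
        return self

    def log_joint(self, seq: str, label: str) -> float:
        """log P(label) + log P(seq | label) under the class's Markov chain."""
        k = len(self.alphabet)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        if seq:
            s = self.start[label]
            score += math.log((s[seq[0]] + 1) / (sum(s.values()) + k))
        for prev, cur in zip(seq, seq[1:]):
            row = self.trans[label][prev]
            score += math.log((row[cur] + 1) / (sum(row.values()) + k))
        return score

    def predict(self, seq: str) -> str:
        # Maximum a posteriori class; the evidence P(seq) is the same for all classes.
        return max(self.class_counts, key=lambda c: self.log_joint(seq, c))
```

Trained on `["aaaa", "aaab", "bbbb", "bbba"]` with labels `["A", "A", "B", "B"]` over the alphabet `"ab"`, the model assigns `"aaa"` to class `A` and `"bbb"` to class `B`. Working in log space avoids numerical underflow on long sequences.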

Selected References

  1. Kang, D-K., Silvescu, A., and Honavar, V. (2006). RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. In: Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science. Berlin: Springer-Verlag.

  2. Kang, D-K., Fuller, D., and Honavar, V. (2005). Learning Misuse and Anomaly Detectors from System Call Frequency Vector Representation. In: IEEE International Conference on Intelligence and Security Informatics. Lecture Notes in Computer Science. Vol. 3495. pp. 511-516. Berlin: Springer-Verlag.

  3. Kang, D-K., Zhang, J., Silvescu, A., and Honavar, V. (2005). Multinomial Event Model Based Abstraction for Sequence and Text Classification. In: Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA 2005). Edinburgh, UK. Vol. 3607. pp. 134-148. Berlin: Springer-Verlag.

  4. Yakhnenko, O., Silvescu, A., and Honavar, V. (2005). Discriminatively Trained Markov Model for Sequence Classification. In: IEEE Conference on Data Mining (ICDM 2005). Houston, Texas. IEEE Press.

  5. Andorf, C., Silvescu, A., Dobbs, D., and Honavar, V. (2004). Probabilistic Graphical Models for Protein Function Classification. In: Proceedings of the Conference on Knowledge-Based Computer Systems, Hyderabad, India.

  6. Parekh, R. and Honavar, V. (2001). DFA Learning from Simple Examples. Machine Learning. Vol. 44. pp. 9-35.

  7. Parekh, R. and Honavar, V. (2000). On the Relationships between Models of Learning in Helpful Environments. In: Proceedings of the Fifth International Conference on Grammatical Inference. Lecture Notes in Artificial Intelligence. Vol. 1891. pp. 207-220. Berlin: Springer-Verlag.

  8. Parekh, R. and Honavar, V. (2000). Automata Induction, Grammar Inference, and Language Acquisition. Invited chapter. In: Handbook of Natural Language Processing. Dale, Moisl, and Somers (Eds.). New York: Marcel Dekker.

  9. Parekh, R. and Honavar, V. (1999). Simple DFA are Polynomially Probably Exactly Learnable from Simple Examples. In: Proceedings of the International Conference on Machine Learning. Bled, Slovenia.

  10. Parekh, R., Nichitiu, C., and Honavar, V. (1998). A Polynomial Time Incremental Algorithm for Learning DFA. In: Proceedings of the Fourth International Colloquium on Grammatical Inference (ICGI'98), Ames, IA. Lecture Notes in Computer Science. Vol. 1433. pp. 37-49. Berlin: Springer-Verlag.