Data Mining and Knowledge Discovery: Algorithms and Applications
Data mining is concerned with the development and application of algorithms for discovering a priori unknown relationships (associations, groupings, and classifiers) from data. Honavar's current research on data mining focuses on:
- Algorithms for learning from distributed, autonomous data sources
- Algorithms for learning from ontologies (e.g., attribute value taxonomies, part-whole relationships, class taxonomies) and partially specified data
- Algorithms for learning ontologies (e.g., attribute value taxonomies, class taxonomies, part-whole relationships) from data
- Algorithms for learning classifiers from relational data
- Applications of data mining approaches for knowledge discovery in computational molecular biology (e.g., discovery of macromolecular sequence-structure-expression-evolution-function relationships)
- Applications of data mining algorithms in monitoring and control of complex engineered systems (e.g., prediction of demand for electric power from historical data, coordinated intrusion detection in computer networks)
Selected References
- Caragea, D., Zhang, J., Pathak, J., and Honavar, V. (2006). Learning Classifiers from Distributed, Ontology-Extended Data Sources. In: Proceedings of the 8th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2006). Krakow, Poland. Lecture Notes in Computer Science. Berlin: Springer. In press.
- Yan, C., Terribilini, M., Wu, F., Jernigan, R.L., Dobbs, D. and Honavar, V. (2006). Identifying Amino Acid Residues Involved in Protein-DNA Interactions from Sequence. BMC Bioinformatics.
- Terribilini, M., Lee, J.-H., Yan, C., Jernigan, R. L., Honavar, V. and Dobbs, D. (2006). Predicting RNA-Binding Sites from Amino Acid Sequence. RNA Journal. In press.
- Kang, D-K., Silvescu, A. and Honavar, V. (2006). RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. In: Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science. Berlin: Springer-Verlag. In press.
- Zhang, J., Kang, D-K., Silvescu, A. and Honavar, V. (2006) Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data. Knowledge and Information Systems. Vol. 9. No. 2. pp. 157-179, 2006.
- Pathak, J., Yong, J., Honavar, V., and McCalley, J. (2006). Condition Data Aggregation for Failure Mode Estimation of Power Transformers. In: Hawaii International Conference on System Sciences.
- Terribilini, M., Lee, J.-H., Yan, C., Carpenter, S., Jernigan, R., Honavar, V. and Dobbs, D. (2006). Identifying Interaction Sites in Recalcitrant Proteins: Predicted Protein and RNA Binding Sites in HIV-1 and EIAV Agree with Experimental Data. In: Pacific Symposium on Biocomputing. Hawaii. In press.
- Vasile, F., Silvescu, A., Kang, D-K., and Honavar, V. (2006). TRIPPER: An Attribute Value Taxonomy Guided Rule Learner. In: Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). In press.
- Silvescu, A. and Honavar, V. (2005). Independence, Decomposability and Functions Which Take Values into an Abelian Group. In: Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics. http://anytime.cs.umass.edu/aimath06/proceedings.html.
- Yakhnenko, O., Silvescu, A., and Honavar, V. (2005). Discriminatively Trained Markov Model for Sequence Classification. In: IEEE Conference on Data Mining (ICDM 2005). Houston, Texas. IEEE Press.
- Caragea, D., Zhang, J., Bao, J., Pathak, J., and Honavar, V. (2005). Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous Information Sources (Invited paper). In: Proceedings of the 16th International Conference on Algorithmic Learning Theory. Lecture Notes in Computer Science. Singapore. Vol. 3734. pp. 13-44. Berlin: Springer-Verlag.
- Zhang, J., Caragea, D. and Honavar, V. (2005). Learning Ontology-Aware Classifiers. In: Proceedings of the 8th International Conference on Discovery Science. Lecture Notes in Computer Science. Singapore. Vol. 3735. pp. 308-321. Berlin: Springer-Verlag.
- Kang, D-K., Fuller, D., and Honavar, V. (2005). Learning Misuse and Anomaly Detectors from System Call Frequency Vector Representation. In: IEEE International Conference on Intelligence and Security Informatics. Springer-Verlag Lecture Notes in Computer Science. Vol. 3495. pp. 511-516. Springer-Verlag.
- Kang, D-K., Zhang, J., Silvescu, A., and Honavar, V. (2005). Multinomial Event Model Based Abstraction for Sequence and Text Classification. In: Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA 2005). Edinburgh, UK. Vol. 3607. pp. 134-148. Berlin: Springer-Verlag.
- Kang, D-K., Fuller, D., and Honavar, V. (2005). Learning Classifiers for Misuse and Anomaly Detection Using a Bag of System Calls Representation. In: Proceedings of the 6th IEEE Systems, Man, and Cybernetics Workshop (IAW 05). West Point, NY. pp. 118-125. IEEE.
- Andorf, C., Silvescu, A., Dobbs, D. and Honavar, V. (2004). Learning Classifiers for Assigning Protein Sequences to Gene Ontology Functional Families. In: Fifth International Conference on Knowledge Based Computer Systems (KBCS 2004). India. pp. 256-255. New Delhi, India: Allied Publishers.
- Caragea, D., Silvescu, A., and Honavar, V. (2004). A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees. In: International Journal of Hybrid Intelligent Systems. Vol. 1. No. 2. pp. 80-89.
- Cook, D., Caragea, D., and Honavar, V. (2004). Visualization in Classification Problems. In: Proceedings in Computational Statistics (COMPSTAT 2004). pp. 799-806. Springer-Verlag.
- Kang, D-K., Silvescu, A., Zhang, J. and Honavar, V. (2004). Generation of Attribute Value Taxonomies from Data for Accurate and Compact Classifier Construction. In: IEEE International Conference on Data Mining. pp. 130-137. IEEE Press.
- Lonosky, P., Zhang, X., Honavar, V., Dobbs, D., Fu, A., and Rodermel, S. (2004). A Proteomic Analysis of Chloroplast Biogenesis in Maize. In: Plant Physiology. Vol. 134. pp. 560-574.
- Polikar, R., Udpa, L., Udpa, S., and Honavar, V. (2004). An Incremental Learning Algorithm with Confidence Estimation for Automated Identification of NDE Signals. In: IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control. Vol. 51. pp. 990-1001.
- Sen, T.Z., Kloczkowski, A., Jernigan, R.L., Yan, C., Honavar, V., Ho, K-M., Wang, C-Z., Ihm, Y., Cao, H., Gu, X., and Dobbs, D. (2004). Predicting Binding Sites of Protease-Inhibitor Complexes by Combining Multiple Methods. In: BMC Bioinformatics. Vol. 5. pp. 205.
- Yan, C., Dobbs, D., and Honavar, V. (2004). A Two-Stage Classifier for Identification of Protein-Protein Interface Residues. In: Bioinformatics. Vol. 20. pp. i371-378.
- Yan, C., Dobbs, D., and Honavar, V. (2004). Identifying Protein-Protein Interaction Sites from Surface Residues: A Support Vector Machine Approach. In: Neural Computing and Applications. Vol. 13. pp. 123-129.
- Zhang, J. and Honavar, V. (2004). Learning Compact and Accurate Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: IEEE International Conference on Data Mining. pp. 289-298. IEEE Press.
- Atramentov, A., Leiva, H., and Honavar, V. (2004). A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments. In: Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag. In press.
- Caragea, D., Silvescu, A., and Honavar, V. (2003). Decision Tree Induction from Distributed, Heterogeneous, Autonomous Data Sources. In: Proceedings of the Conference on Intelligent Systems Design and Applications (ISDA 03). In press.
- Wang, X., Schroeder, D., Dobbs, D., and Honavar, V. (2003). Automated Data-Driven Discovery of Motif-Based Protein Function Classifiers. Information Sciences. In press.
- Yan, C. and Dobbs, D. (2003). Identification of Surface Residues Involved in Protein-Protein Interaction: A Support Vector Machine Approach. In: Proceedings of the Conference on Intelligent Systems Design and Applications (ISDA-03). Tulsa, Oklahoma.
- Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC.
- Helmer, G., Wong, J., Honavar, V., and Miller, L. (2003). Lightweight Agents for Intrusion Detection. Journal of Systems and Software. Vol. 67. pp. 109-122.
- Helmer, G., Wong, J., Honavar, V., and Miller, L. (2002). Automated Discovery of Concise Predictive Rules for Intrusion Detection. Journal of Systems and Software. Vol. 60. No. 3. pp. 165-175.
- Caragea, D., Silvescu, A., and Honavar, V. (2001). Towards a Theoretical Framework for Analysis and Synthesis of Agents That Learn from Distributed Dynamic Data Sources (Invited chapter). In: Emerging Neural Architectures Based on Neuroscience. Berlin: Springer-Verlag.
- Caragea, D., Cook, D., and Honavar, V. (2001). Gaining Insights into Support Vector Machine Classifiers Using Projection-Based Tour Methods. In: Proceedings of the Conference on Knowledge Discovery and Data Mining.
- Parekh, R. and Honavar, V. (2001). DFA Learning from Simple Examples. Machine Learning. Vol. 44. pp. 9-35.
- Polikar, R., Shinar, R., Honavar, V., Udpa, L., and Porter, M. (2001). Detection and Identification of Odorants Using an Electronic Nose. In: Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing.
- Polikar, R., Udpa, L., Udpa, S., and Honavar, V. (2001). Learn++: An Incremental Learning Algorithm for Multi-Layer Perceptron Networks. IEEE Transactions on Systems, Man, and Cybernetics. Vol. 31. No. 4. pp. 497-508.
- Silvescu, A., and Honavar, V. (2001). Temporal Boolean Network Models of Genetic Networks and Their Inference from Gene Expression Time Series. Complex Systems. Vol. 13. No. 1. pp. 54-.
- Parekh, R., Yang, J., and Honavar, V. (2000). Constructive Neural Network Learning Algorithms for Multi-Category Pattern Classification. IEEE Transactions on Neural Networks. Vol. 11. No. 2. pp. 436-451.
- Yang, J. and Honavar, V. (1999). DistAl: An Inter-Pattern Distance Based Constructive Neural Network Learning Algorithm. Intelligent Data Analysis. Vol. 3. pp. 55-73.
- Yang, J. and Honavar, V. (1998). Feature Subset Selection Using a Genetic Algorithm. In: Feature Extraction, Construction, and Subset Selection: A Data Mining Perspective. Motoda, H. and Liu, H. (Eds.) New York: Kluwer. A shorter version of this paper appears in IEEE Intelligent Systems (Special Issue on Feature Transformation and Subset Selection).
- Balakrishnan, K. and Honavar, V. (1998). Intelligent Diagnosis Systems. Journal of Intelligent Systems. Vol. 8. No. 3/4. pp. 239-290.