Shashi K. Gadia, Associate Professor
Current Affiliations
- Adjunct Professor, Birla Institute of Technology and Science.
Research Interests - Non-atomic values in databases: XML, where values are subtrees, and parametric data, where values are functions from parametric spaces consisting of one-dimensional or multidimensional points. We have studied database models, query languages, incomplete information, and query optimization in temporal, spatial, and multilevel security databases. As ongoing efforts, we have built prototypes for storage of large XML documents and of geotemporal datasets for the study of agriculture.
Research Areas - Database Systems, Software Systems
Research Statement - In databases, query languages are algebraic: for users, queries are natural and easy to write, and optimization is left to the system. Traditionally this framework is available only for atomic values. Our major interest is in extending this framework to non-atomic values. As non-atomic values can be complex in structure and arbitrarily large in size, one faces difficulties ranging from the development of models and query languages to storage, implementation, and optimization. There are two major categories of non-atomic values: nested and semistructured values on one hand, and parametric values that have underlying dimensions, such as space, time, and beliefs, on the other.
The nested and semistructured values of databases have been absorbed into the data model offered by XML and its associated query language, XQuery. The reason for the success of XML is that it is not tied to a linear arrangement of atomic values; rather, it offers a tree-based recursive model for information in which values are subtrees. XML is destined to become ubiquitous, as it is found to be highly conducive to user data and metadata and to the various layers of a computer system, ranging from hardware support, memory organization, and secondary storage to the layers of software system architecture, their specifications, and their interfacing. XML and the efforts leading to it are not our creation; our participation is in enabling XML itself. There are multiple approaches to storage of XML; we too are developing easily deployable and efficient support for storage, access, and navigation of terabyte-range XML documents. A full realization of this technology is a long-term exercise requiring efforts ranging from building support for XML navigation in hardware to low-level software-based navigation and XQuery.
Parametric values are functions from a space of one-dimensional or multidimensional points. Although XML seems to absorb so many forms of data, it is inherently ill-suited to modeling the behavior of parametric values in a way that is linguistically satisfactory to users. We have given a methodology for natural unification of temporal, spatial, and belief dimensions. Our work, starting from scratch, involves a wide range of activities: development of models, query languages, dealing with incomplete information, pattern matching, query optimization, access methods, implementation, and user interfaces. It is interesting to note that even though XML does not mimic the linguistics of the concept of dimension, it can serve as a physical layer for parametric data that is far easier to use for system building than any storage technology existing today. Our implementation of prototypes for parametric data is made possible by the use of our own XML-based storage technology at the physical level.
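The idea of a value being a function from points can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the class name `TemporalValue`, its methods `at` and `when`, and the sample department data are all hypothetical and are not taken from our prototypes. It models a one-dimensional (temporal) parametric value as a set of half-open intervals, each carrying the atomic value held over that interval.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TemporalValue:
    """Hypothetical parametric value: a function from time points to atomic values.

    The function is stored piecewise as (start, end, value) triples, where each
    triple means the value holds on the half-open interval [start, end).
    """
    pieces: Tuple[Tuple[int, int, str], ...]

    def at(self, t: int) -> Optional[str]:
        """Evaluate the function at time point t; None if t is outside the domain."""
        for start, end, value in self.pieces:
            if start <= t < end:
                return value
        return None

    def when(self, predicate: Callable[[str], bool]) -> List[Tuple[int, int]]:
        """Return the intervals over which the held value satisfies the predicate."""
        return [(start, end) for start, end, value in self.pieces if predicate(value)]


# Illustrative data only: an employee's department as it changes over time.
dept = TemporalValue(((0, 10, "Toys"), (10, 20, "Shoes")))
```

A query such as "when was the department Shoes?" then reads as `dept.when(lambda v: v == "Shoes")`, which returns a set of time intervals rather than a single atomic value.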
Linguistically, XML and parametric values are independent of each other; neither can absorb the other. Their understanding and deployment in computer systems will continue to grow for many decades to come. The linguistic advantages offered by our approach to parametric data are unsurpassed in the temporal, spatial, and multilevel security database literature.
Currently our focus is on the following projects.
1. Building technology to enable XML by providing support for it in hardware (CPU), and continuing to use XML for system building.
2. A parametric database prototype for NC94, built in collaboration with our colleagues in meteorology and atmospheric sciences. NC94 is an important geotemporal dataset consisting of climate, soil, and agricultural yield data covering the north-central region of the United States from 1970 to 2000. The objective of this highly refined dataset, developed by agriculturists, is to help build scientific models for the study of agriculture and its relationship to natural phenomena such as climate and the spread of rust (crop-based soil diseases). The prototype will make NC94 more accessible to societal users ranging from farmers to policy makers.
3. A parametric model and query language for multilevel security databases that would benefit privacy and confidentiality when information is shared in environments such as medical care and intelligence.
4. We have also developed a style for database implementation. All systems (subsystems) are developed in a modular fashion to support a command line interface. A common GUI is being implemented that allows commands from different subsystems to be executed in a single batch. The GUI has a language of its own in which string variables can hold queries and their corresponding expression trees, and an expression tree can be displayed in our native XML format or as a graph. Queries can be executed inside nested loops with different system parameters, and performance benchmarks can be logged in XML-based log files created by the user. The benchmark results can be queried to generate reports (hopefully even graphs). Starting from empty disks, the entire exercise of creating storage and a buffer manager, loading real datasets or creating synthetic data, running queries, and collecting and reporting benchmarks can be done at the click of a button, making experiments completely repeatable. Depending on the experiment, a run can last from seconds to weeks or months unattended. The GUI helps make the system architecture at large self-documenting. It is suitable for research and development as well as for a database implementation course (Com S 562 at ISU). In addition, the GUI can run SQL and XQuery queries, making it suitable for an elementary database course (Com S 363 at ISU). This leads to a smooth interpolation between instruction and research.
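The benchmarking workflow in item 4 can be sketched roughly as follows. This is a minimal Python illustration, not the GUI's own batch language: the function `run_benchmarks`, the stand-in executor `fake_execute`, and the XML element and attribute names (`benchmarks`, `run`, `bufferPages`, and so on) are all assumptions made for the example. It runs each query inside nested loops over a system parameter and records the timings in an XML log that can itself be queried.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable


def run_benchmarks(
    queries: Dict[str, str],
    buffer_sizes: Iterable[int],
    execute: Callable[[str, int], int],
) -> ET.Element:
    """Run every query under every buffer size and log one <run> element per execution.

    `execute` is a hypothetical hook that takes (query text, buffer pages)
    and returns the number of result rows.
    """
    log = ET.Element("benchmarks")
    sizes = list(buffer_sizes)
    for name, text in queries.items():      # outer loop: queries
        for size in sizes:                  # inner loop: system parameter
            started = time.perf_counter()
            rows = execute(text, size)
            elapsed = time.perf_counter() - started
            ET.SubElement(log, "run", {
                "query": name,
                "bufferPages": str(size),
                "rows": str(rows),
                "seconds": f"{elapsed:.6f}",
            })
    return log


def fake_execute(query_text: str, buffer_pages: int) -> int:
    """Stand-in for a real query engine; returns a made-up row count."""
    return len(query_text)


log = run_benchmarks({"q1": "SELECT 1"}, [16, 64], fake_execute)

# The log is ordinary XML, so benchmark results can be queried afterwards.
runs_with_64_pages = log.findall("./run[@bufferPages='64']")
```

Writing the log with `ET.ElementTree(log).write(path)` would produce a file that a later report step can read back, which is the property that makes a run repeatable and inspectable.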
Education - Ph.D. University of Illinois 1977
Current Grants - CSR: Small: Meta Analysis Directed Execution. Akhilesh Tyagi, Arun Somani, and Shashi Gadia. NSF (2009-2011). $100,000.
Representative Publications - Refereed Journal and Conference Publications
Seo-Young Noh, Shashi K. Gadia, and Shihe Ma. An XML-based methodology for parametric temporal database model implementation. Journal of Systems and Software, Elsevier. Vol. 81. No. 6. pp. 929-948, 2008.
Seo-Young Noh and Shashi K. Gadia. A Comparison of Two Approaches to Utilizing XML in Parametric Databases for Temporal Data. Information & Software Technology, Elsevier. Vol. 48. pp. 807-819, 2006.
Shashi K. Gadia and T. Cheng. A pattern matching language for spatio-temporal databases. Proceedings of the Third International Conference on Information and Knowledge Management. pp. 288-295, 1994.
Shashi K. Gadia, T. Cheng and S. Nair. Object identity and dimension alignment in parametric databases. Proceedings of the Second International Conference on Information and Knowledge Management. pp. 615-624, 1993.
Shashi K. Gadia and G. Bhargava. Relational database systems with zero information-loss. IEEE Transactions on Knowledge and Data Engineering. Vol. 5. pp. 76-87, 1993.