Instance-Level Integration, Query Processing and Optimization in Federated Database Systems

This thesis addresses instance-level integration, query processing, and query optimization in a federated database environment, where heterogeneity among component databases has to be resolved and the autonomy of component database systems has to be preserved. The main contribution of this thesis is to define entity identification and attribute value conflict as two instance-level integration problems arising from the heterogeneity of local databases, and to propose solutions to them. The objective of entity identification is to match object instances from different databases that correspond to the same real-world entity. Attribute value conflict arises when attribute values in two databases that model the same property of a real-world entity do not match. The thesis also addresses federated query processing and optimization in the context of heterogeneity and autonomy. Federated query optimization is concerned with producing an efficient execution plan for a query over a virtually integrated database. In this research, we propose a two-step entity identification process that separates the derivation of identifying attributes from the matching of object instances. The two-step process adopts reasoning techniques based on definite logic and indefinite logic. For attribute value conflict resolution, we are concerned with resolving values that contain uncertainties. We propose an extended relational model based on the Dempster-Shafer theory of evidence to incorporate such uncertain knowledge about the source databases, and we formulate the closure and boundedness properties of the proposed extended operations. For federated query processing and optimization, we propose a set of integration operations that are useful in resolving instance-level conflicts. We develop an algebraic transformation framework that involves both the existing relational operations and the integration operations. This framework is subsequently used to optimize federated database queries based on our proposed query processing architecture. We have also implemented the definite logic approach to the entity identification problem and the evidential reasoning approach to resolving conflicting attribute values. The algorithms for performing the proposed integration operations and for federated query processing have been realized in the Myriad project, a federated database prototype.
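
As a rough illustration of the evidential reasoning mentioned above, the following minimal sketch applies the standard Dempster rule of combination to two uncertain values of the same attribute reported by two source databases. It is not the thesis's extended relational model or its integration operations, which are not reproduced in this abstract. The function name `combine` and the candidate values "A" and "B" are hypothetical and chosen only for the example.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of candidate attribute values
    to a mass in [0, 1]; the masses of one function sum to 1.
    (Hypothetical helper, not code from the thesis.)
    """
    combined = {}
    conflict = 0.0
    for (set1, mass1), (set2, mass2) in product(m1.items(), m2.items()):
        common = set1 & set2
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass1 * mass2
        else:
            # Disjoint sets contribute to the total conflict.
            conflict += mass1 * mass2
    if 1.0 - conflict <= 1e-12:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so the combined masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Assumed example: two databases describe the same entity's attribute.
source1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.4}
source2 = {frozenset({"B"}): 0.5, frozenset({"A", "B"}): 0.5}
merged = combine(source1, source2)
```

In this assumed example the conflicting mass is 0.3, so the remaining products are divided by 0.7: the merged value assigns 3/7 to {A}, 2/7 to {B}, and 2/7 to {A, B}.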


Databases and Information Systems

Research Areas

Data Management and Analytics


University of Minnesota
