In Partial Fulfillment of the Requirements for the Degree of
Master of Science
Will defend his thesis
Structured and unstructured data have been studied separately in database systems and information retrieval research, respectively. Search engine techniques, however, have inspired renewed interest in integrating the two fields. Our research focuses on establishing links between a database and the documents surrounding it by exploiting a DBMS query language (SQL) and its extensibility features, such as User-Defined Functions (UDFs) and stored procedures. In this thesis, we first study how to adapt classical information retrieval techniques to work inside a DBMS using SQL queries and UDFs. We then study how to efficiently compute top-k queries, which are particularly difficult to optimize in SQL. Finally, we study how to match keywords in documents with keywords in a database at different levels of storage granularity. Specifically, we study how to link metadata, such as table and column names, to documents based on exact matches. This matching process is carried out systematically in a bottom-up fashion. Experimental evaluation shows that our system can efficiently analyze and explore a digital library with thousands of documents.
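As an illustration only, the idea of linking metadata to documents by exact match and then ranking the links with a top-k query can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the system evaluated in the thesis: it uses Python's standard sqlite3 module rather than the thesis's DBMS and UDFs, the tables (employee, department), the helper tables (meta_keyword, doc_term), the function names, and the sample documents are all hypothetical, and ranking links by term frequency is an assumed scoring choice.

```python
import re
import sqlite3
from collections import Counter


def metadata_keywords(conn):
    """Return (keyword, kind) pairs for every user table and column name."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    keywords = []
    for table in tables:
        keywords.append((table.lower(), "table"))
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            keywords.append((col[1].lower(), "column"))
    return keywords


def link_documents(conn, documents, k):
    """Link metadata keywords to documents by exact match.

    Returns the k links with the highest term frequency (an assumed score).
    """
    keywords = metadata_keywords(conn)
    # Hypothetical helper tables; TEMP tables do not appear in sqlite_master.
    conn.execute("CREATE TEMP TABLE meta_keyword (keyword TEXT, kind TEXT)")
    conn.execute("CREATE TEMP TABLE doc_term (doc_id TEXT, term TEXT, tf INTEGER)")
    conn.executemany("INSERT INTO meta_keyword VALUES (?, ?)", keywords)
    for doc_id, text in documents.items():
        # Lowercase and split into word tokens, then count occurrences.
        counts = Counter(re.findall(r"\w+", text.lower()))
        conn.executemany(
            "INSERT INTO doc_term VALUES (?, ?, ?)",
            [(doc_id, term, tf) for term, tf in counts.items()])
    # Exact-match join followed by a top-k selection expressed in SQL.
    return conn.execute(
        """SELECT m.keyword, m.kind, d.doc_id, d.tf
           FROM meta_keyword m JOIN doc_term d ON d.term = m.keyword
           ORDER BY d.tf DESC, m.keyword, d.doc_id
           LIMIT ?""", (k,)).fetchall()


# Hypothetical schema and documents, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
conn.execute("CREATE TABLE department (dname TEXT, budget REAL)")
docs = {
    "d1": "Each employee row stores a salary. The salary is paid monthly.",
    "d2": "The department budget is reviewed yearly.",
}
links = link_documents(conn, docs, k=3)
for keyword, kind, doc_id, tf in links:
    print(keyword, kind, doc_id, tf)
```

The sketch keeps the join and the ranking inside SQL, in the spirit of working within the DBMS. The ORDER BY ... LIMIT form is the straightforward way to state a top-k query, and says nothing about how such queries are optimized in the thesis.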