Fabio Benedetti







Revealing the underlying
structure of Linked Open Data
for enabling visual querying

The Linked Data Principles formulated by Tim Berners-Lee promise that a large portion of Web data will be usable as one big interlinked RDF (Resource Description Framework) database. Today, with more than one thousand Linked Open Data (LOD) sources available on the Web, we are witnessing an emerging trend in the publication and consumption of LOD datasets. However, the pervasive use of external resources, together with the poor definition of a dataset's internal structure, makes many LOD sources extremely hard to understand. The goal of this thesis is to propose tools and techniques able to reveal the underlying structure of a generic LOD dataset, in order to promote the consumption of this new data format. In particular, I propose an approach for the automatic extraction of statistical and structural information from a LOD source and the creation of a set of indexes (the Statistical Indexes) that enhance the description of the dataset. Using this structural information, I defined two models able to effectively describe the structure of a generic RDF dataset: the Schema Summary and the Clustered Schema Summary. The Schema Summary contains all the main classes and properties used within the dataset, whether or not they are taken from external vocabularies. The Clustered Schema Summary, suitable for large LOD datasets, provides a higher-level view of the classes and properties used, by grouping together classes that are the object of multiple instantiations. These efforts led to the development of a tool called LODeX, which provides a high-level summarization of a LOD dataset and a powerful visual query interface that supports users in querying and analyzing an unknown dataset.
All the techniques proposed in this thesis have been extensively evaluated and compared with the state of the art in their field: a performance evaluation is given for the LODeX module that extracts the indexes; the schema summarization technique has been evaluated according to ontology summarization metrics; finally, LODeX itself has been evaluated for portability and usability. In the second part of the thesis, I present a novel technique called CSA (Context Semantic Analysis) that exploits the information contained in a knowledge graph to estimate the similarity between documents. This technique has been compared with other state-of-the-art measures using a benchmark containing documents and similarity measures provided by human judges.
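
As a rough illustration of the kind of structural information a schema summary captures, the sketch below counts class instances and class-to-class property usage over a handful of in-memory triples. It is only a minimal sketch under stated assumptions, not the LODeX extraction procedure: the function name schema_summary, the "untyped" placeholder for objects without a declared class, and the sample data are all hypothetical.

```python
from collections import Counter

RDF_TYPE = "rdf:type"  # abbreviated form of the RDF type predicate


def schema_summary(triples):
    """Return (class_counts, edge_counts) for an iterable of (s, p, o) triples.

    class_counts maps each class to its number of instances.
    edge_counts maps (subject_class, property, object_class) to the number
    of triples that connect instances of those classes.
    """
    triples = list(triples)

    # First pass: record the classes each resource instantiates.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    class_counts = Counter(c for classes in types.values() for c in classes)

    # Second pass: lift every non-type triple to the class level.
    # Assumption: objects with no declared class are grouped as "untyped".
    edge_counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_class in types.get(s, ()):
            for o_class in types.get(o, ("untyped",)):
                edge_counts[(s_class, p, o_class)] += 1

    return class_counts, edge_counts


# Hypothetical sample data mixing local and external vocabularies.
triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:paper1", "rdf:type", "ex:Paper"),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:paper1", "dc:creator", "ex:bob"),
    ("ex:paper1", "dc:title", "A title"),
]

class_counts, edge_counts = schema_summary(triples)
for (s_class, prop, o_class), n in sorted(edge_counts.items()):
    print(s_class, prop, o_class, n)
```

Each entry of edge_counts can be read as a labeled edge between two class nodes, which is the shape a visual query interface needs in order to let users navigate a dataset they do not know.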

Marius Octavian Olaru



 Heterogeneous DataWarehouse Analysis and Dimensional Integration

The Data Warehouse (DW) is the main Business Intelligence instrument for analyzing large banks of operational data and for extracting strategic information in support of the decision-making process. It usually focuses on a specific area of an organization. Data Warehouse integration is the process of combining multidimensional information from two or more heterogeneous DWs and presenting users with a unified global overview of the combined strategic information. The problem is becoming increasingly frequent, as the dynamic economic context sees many company mergers and acquisitions and the formation of new business networks, such as co-opetition, where managers need to analyze all the involved parties and take strategic decisions concerning all the participants. The contribution of the thesis is to analyze heterogeneous DW environments and to present a dimension integration methodology that allows users to combine, access and query data from heterogeneous multidimensional sources. The integration methodology relies on graph theory and the Combined Word Sense Disambiguation technique for generating semantic mappings between multidimensional schemas. Subsequently, schema heterogeneity is analyzed and handled, and compatible dimensions are made uniform by importing dimension categories from one dimension to another. This gives users from different sources the same overview of the local data and increases local schema compatibility for drill-across queries. The dimensional attributes are populated with instance values using a variant of the chase algorithm based on the RELEVANT clustering approach. Finally, several quality properties are discussed and analyzed. Dimension homogeneity/heterogeneity is presented from the integration perspective; the thesis also presents the theoretical conditions under which mapping quality properties (such as coherency, soundness and consistency) are preserved.
Furthermore, the integration methodology is analyzed in the presence of slowly changing dimensions.


Matteo Interlandi




On Declarative Data-Parallel Computation: Models, Languages and Semantics

Analyzing the plethora of large-scale data-processing tools available today, we can recognize two main approaches: a declarative approach, pursued by parallel DBMSs and firmly grounded in relational model theory; and an imperative approach, followed by modern “MapReduce-like” data-processing systems, which are highly scalable, fault-tolerant, and mainly driven by industrial needs. Although some work has tried to bring the two worlds together, it focuses mainly on exporting languages and interfaces (i.e., declarative languages on top of imperative systems, or MapReduce-like functions over parallel DBMSs) or on systematically merging the features of the two approaches. We advocate instead a declarative-imperative approach: that is, the development of a new computational model with a related language, based on relational theory and following the patterns commonly found in modern data-processing systems, while maintaining a declarative flavor.
The goal of this thesis is to take a first step in this direction. More concretely, we developed a new synchronous computational model for relational, distributed, parallel data processing, building on previous work on relational transducers and transducer networks. This computational model accepts declarative program specifications expressed in a version of Datalog¬ specifically tailored for parallel computation. Datalog¬ is a language lying between logic and query languages; thanks to its nature, not only can data-driven parallel computation be expressed declaratively, but the theoretical foundations connecting the semantics of programs with the emerging properties of their parallel execution can also be explored.
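
To give a flavor of what a Datalog program computes, the sketch below evaluates the classic two-rule reachability program bottom-up until a fixpoint is reached. It is an illustrative sketch only, on a single machine and without negation, so it does not reproduce the parallel model or the tailored Datalog¬ dialect developed in the thesis; the function name transitive_closure and the sample edges are hypothetical.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the Datalog program

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    Rules are applied repeatedly until no new fact is derived (a fixpoint).
    """
    edges = set(edges)
    path = set(edges)  # first rule: every edge is a path
    while True:
        # Second rule: join path(X, Y) with edge(Y, Z) on the shared Y.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path
        path |= new_facts


# Hypothetical input relation: a chain a -> b -> c -> d.
edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(edges)))
```

The program states only which facts follow from which; the evaluation order is left to the engine, and that freedom is what a data-parallel execution model can exploit.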

Abdul Rahman Dannaoui



Information Integration for biological data sources

This thesis focuses on data integration and data provenance in the context of the MOMIS data integration system, which was used to create the CEREALAB database. Its main contributions are the creation of CEREALAB database V2.0, with new functionalities derived from the needs of end users, and a study of different data provenance models that led to a new MOMIS component offering data provenance support to CEREALAB users.

Serena Sorrentino

Label Normalization and Lexical Annotation for Schema and Ontology Matching

The goal of this thesis is to propose and experimentally evaluate automatic and semi-automatic methods for label normalization and lexical annotation of schema labels. In this way, sharable semantics can be added to legacy data sources. Moreover, annotated labels are a powerful means of discovering lexical relationships among structured and semi-structured data sources. Original methods to automatically normalize schema labels and extract lexical relationships have been developed, and their effectiveness for automatic schema matching has been shown.

Nana Mbinkeu Carlos

Query Optimization and Quality-Driven Query Processing for Integration Systems

This thesis focuses on two core aspects of data integration: Query Processing and Data Quality. First, it proposes new techniques for optimizing the full outer join operation, which data integration systems use for data fusion. It then demonstrates how to achieve Quality-Driven Query Processing, where quality constraints specified in Data Quality Aware Queries are used to perform query optimization.
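
The role of the full outer join in data fusion can be sketched as follows: records describing the same entity in two sources are merged, while records that appear in only one source are kept. This is a minimal unoptimized sketch, not the technique proposed in the thesis; the function name full_outer_join, the rule that the left source wins on conflicting attributes, and the sample records are assumptions made for illustration.

```python
def full_outer_join(left, right, key):
    """Join two lists of dicts on `key`, keeping unmatched rows from both sides.

    Attributes missing from a fused record are simply absent, standing in
    for the NULLs a relational full outer join would produce.
    """
    # Index the right source by join key.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    matched_keys = set()
    fused = []
    for l_row in left:
        partners = right_by_key.get(l_row[key])
        if partners:
            matched_keys.add(l_row[key])
            for r_row in partners:
                # Assumption: on conflicting attributes the left source wins.
                fused.append({**r_row, **l_row})
        else:
            fused.append(dict(l_row))  # left-only record

    # Right-only records complete the outer join.
    for k, rows in right_by_key.items():
        if k not in matched_keys:
            fused.extend(dict(r) for r in rows)
    return fused


# Hypothetical records from two sources describing overlapping entities.
source_a = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
source_b = [{"id": 2, "city": "X"}, {"id": 3, "city": "Y"}]
for record in full_outer_join(source_a, source_b, "id"):
    print(record)
```

Because every record of every source has to be considered, this operation is a natural target for optimization in an integration system.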

Antonio Sala

Data and Service Integration: Architectures and Applications to Real Domains

This thesis focuses on Semantic Data Integration Systems, with particular attention to mediator system approaches, for performing data and service integration. One of its topics is the application of MOMIS to the bioinformatics domain, integrating different public databases to create an ontology of molecular and phenotypic cereal data. The main contribution, however, is a semantic approach to the aggregated search of data and services. In particular, I describe a technique that, on the basis of an ontological representation of the data and services related to a domain, supports the translation of a data query into a service discovery process; the technique has also been implemented as a MOMIS extension. This approach can be described as a Service as Data approach, as opposed to Data as a Service approaches. In the Service as Data approach, informative services are treated as a kind of source to be integrated with other data sources, in order to enhance the domain knowledge provided by a Global Schema of data. Finally, new technologies and approaches for data integration have been investigated, in particular distributed architectures, with the objective of providing a scalable architecture for data integration. An integration framework for distributed environments is presented that allows a data integration process to be carried out on the cloud.

Laura Po

Automatic Lexical Annotation: an effective technique for dynamic data integration

The thesis shows how lexical annotation is a crucial element in data integration. Thanks to lexical annotation, new relationships are discovered among the elements of a schema or among elements of different schemas. Several methods for automatically annotating data sources are described and evaluated in different scenarios. Lexical annotation can also improve systems for discovering matches between ontologies. Some experiments applying lexical annotation to the results of a matcher are presented. Finally, the probabilistic annotation approach is introduced and its application to dynamic integration processes is illustrated.

Mirko Orsini

Query Management in Data Integration Systems: the MOMIS approach

This thesis investigates the issue of Query Management in Data Integration Systems, taking into account several problems that must be faced during the query processing phase. Its goals have been the study, analysis and proposal of techniques for effectively querying Data Integration Systems. The proposed techniques have been implemented in the MOMIS Query Manager prototype, which enables users to query an integrated schema and provides them with a consistent and concise unified answer. The effectiveness of the MOMIS Query Manager prototype has been demonstrated by means of the THALIA testbed for Data Integration Systems. Experimental results show that the MOMIS Query Manager can deal with all the queries of the benchmark.

A new kind of metadata that offers a synthesized view of an attribute's values, the relevant values, has been defined, and its effectiveness for creating or refining a search query on a knowledge base is demonstrated by experimental results.

The security issues in Data integration/interoperation systems have been investigated and an innovative method to preserve data confidentiality and availability when querying integrated data has been proposed. A security framework for collaborative applications, in which the actions that users can perform are dynamically determined on the basis of their attribute values, has been presented, and the effectiveness of the framework has been demonstrated by an implemented prototype.

Gionata Gelati



Agent Technology Applied to Information Systems
The thesis is divided into three parts. In the first, software agents are presented and critically compared to other mainstream technologies; modeling issues are also discussed. In the second part, some example systems where we applied agent technology are presented and the solutions are discussed. The realistic scenarios and requirements for these systems were provided by the WINK and SEWASIE projects. The third part presents a logical framework for characterizing the interaction of software agents in virtual societies where they may act as representatives of humans.


Francesco Guerra

Dai Dati all'Informazione: il sistema MOMIS (From Data to Information: the MOMIS System)

The thesis introduces the methodology, implemented in the MOMIS system, for constructing a Global Virtual View of structured data sources. In particular, the thesis focuses on the problem of managing and updating multi-language sources. Moreover, the thesis compares MOMIS with the main mediators available in the literature. Finally, some applications of the MOMIS system in the fields of the Semantic Web and e-commerce (developed within national and European projects) are presented.

Ilario Benetti

Knowledge Management for Electronic Commerce applications

This work summarizes the activities carried out during the Ph.D. studies in Information Engineering. It is organized in two parts. The first part describes Knowledge Management Systems and their applications to Electronic Commerce. In particular, it gives a technical and organizational overview of the most critical issues concerning Electronic Commerce applications. This part is the result of two years of research carried out in cooperation with Professor Enrico Scarso within the interdisciplinary (ICT and business organization) MIUR project “Il Commercio Elettronico: nuove opportunità e nuovi mercati per le PMI”. The second part introduces the Intelligent Integration of Information (I3) research topic and presents the MOMIS approach to I3. It outlines the theory underlying the MOMIS prototype and focuses on the issues of generating virtual catalogs in the electronic commerce environment by exploiting the SIDesigner component. Finally, a new MOMIS architecture based on XML Web Services is proposed. The new architecture not only addresses specific virtual catalog issues, but also leads to a general improvement of the MOMIS system.

Alberto Corni

Intelligent Information Integration: The MOMIS Project

This thesis describes the work done during my Ph.D. studies in Computer Engineering. It is organized in two parts. The first and main part describes the MOMIS research project for the Intelligent Integration of heterogeneous information. It outlines the theory of Intelligent Integration and the design and implementation of the prototype that implements the theoretical techniques. During my Ph.D. studies I spent a period at Northeastern University in Boston, Mass. (USA). The second part of this document covers the work I did there with Professor Ken Baclawski in information retrieval, on the annotation of documents using ontologies and the retrieval of the annotated documents.

Maurizio Vincini

Utilizzo di tecniche di Intelligenza Artificiale nell'Integrazione di Sorgenti Informative Eterogenee (Use of Artificial Intelligence Techniques in the Integration of Heterogeneous Information Sources)

The doctoral thesis presents the MOMIS system (Mediator envirOnment for Multiple Information Sources) for integrating structured and semi-structured data sources according to the source federation approach. The system supports the semi-automatic definition of a single integrated schema that exploits the semantic information of each source schema (here, schema means the set of metadata describing a data repository).

Domenico Beneventano

Uno Strumento di Inferenza nelle Basi di Dati ad Oggetti (Subsumption inference for Object-Oriented Data Models)

Object-oriented data models are being extended with recursion to gain expressive power. This complicates both the incoherence detection problem, which has to deal with recursive class descriptions, and the optimization problem, which has to deal with recursive queries on complex objects. In this PhD thesis, we propose a theoretical framework able to address these problems. In particular, it is able to validate and automatically classify, in a database schema, (recursive) classes, views and queries organized in an inheritance taxonomy. The framework adopts the ODL formalism (an extension of the Description Logics developed in the area of Artificial Intelligence), which is able to express the semantics of complex object data models and to deal with cyclic references at the schema and instance levels. It includes subsumption algorithms, which automatically place (recursive) views and queries in a specialization hierarchy, and incoherence algorithms, which detect incoherent (i.e., always empty) (recursive) classes, views and queries. Since different styles of semantics (greatest fixed-point, least fixed-point and descriptive) can be adopted to interpret recursive views and queries, we first analyze and discuss the choice among them and then give the subsumption and incoherence algorithms for each of the three semantics. We show that subsumption computation and incoherence detection appear to be feasible, since in almost all practical cases they can be solved by polynomial-time algorithms. Finally, we show how subsumption computation is useful for Semantic Query Optimization, which uses semantic knowledge (i.e., integrity constraints) to transform a query into an equivalent one that may be answered more efficiently.
The PhD thesis is in Italian. Its content can be found in the following two papers:
  • D. Beneventano, S. Bergamaschi, "Incoherence and Subsumption for recursive views and queries in Object-Oriented Data Models", Data & Knowledge Engineering 21 (1997), pp. 217-252, Elsevier Science B.V. (North-Holland).
  • D. Beneventano, S. Bergamaschi, C. Sartori, "Description Logics for Semantic Query Optimization in Object-Oriented Database Systems", ACM Transactions on Database Systems 28 (2003), pp. 1-50.