DataBase Group

Annotation and Mapping Discovery among Data Sources

Schema matching is the task of finding the semantic correspondences (mappings) between the elements of two schemata.

Approach: starting from the “hidden” meanings associated with schema labels (i.e., class and attribute names, also called terms), the MOMIS Data Integration system discovers lexical relationships among schema elements.
Lexical Annotation of schema labels is the explicit assignment of meanings w.r.t. a reference lexical thesaurus (such as WordNet).
- Manual Annotation is tedious and does not scale → Automatic or Semi-automatic Annotation is needed
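
To illustrate how annotated labels can yield lexical relationships, here is a minimal sketch. Everything in it is an assumption made for illustration: the labels, the sense identifiers (which are not real WordNet synset IDs), the toy hypernym table, and the relationship names SYN, NT and BT (synonym, narrower term, broader term). It is not the MOMIS implementation.

```python
from itertools import combinations

# Hypothetical annotations: schema label -> set of sense identifiers.
# The identifiers are invented; they are not real WordNet synsets.
ANNOTATIONS = {
    "Customer.name": {"name#1"},
    "Client.denomination": {"name#1"},
    "Client": {"client#2"},
    "Person": {"person#1"},
}

# Hypothetical hypernym links of a toy thesaurus: sense -> broader senses.
HYPERNYMS = {"client#2": {"person#1"}}


def lexical_relationships(annotations, hypernyms):
    """Derive SYN / NT / BT relationships between pairs of annotated labels."""
    found = []
    for a, b in combinations(sorted(annotations), 2):
        senses_a, senses_b = annotations[a], annotations[b]
        if senses_a & senses_b:
            # The two labels share at least one meaning.
            found.append((a, "SYN", b))
        elif any(hypernyms.get(s, set()) & senses_b for s in senses_a):
            # A meaning of b is broader than a meaning of a.
            found.append((a, "NT", b))
        elif any(hypernyms.get(s, set()) & senses_a for s in senses_b):
            # A meaning of a is broader than a meaning of b.
            found.append((a, "BT", b))
    return found


for relationship in lexical_relationships(ANNOTATIONS, HYPERNYMS):
    print(relationship)
```

In this sketch two labels are related only if their annotations share a sense or are linked by one hypernym step; a real thesaurus would offer longer hypernym chains and further relation types.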

WSD (Word Sense Disambiguation) is the task of computationally identifying the meaning of a word in its context.
The semi-automatic CWSD (Combined Word Sense Disambiguation) method:
1. associates one or more WordNet meanings with each label
2. combines two WSD algorithms: SD (Structural Disambiguation), which exploits the relationships derived from the schema, and WND (WordNet Domains Disambiguation), which exploits WordNet Domains
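
The text above does not say how the outputs of the two algorithms are merged. The sketch below shows one possible strategy, chosen here purely as an assumption: keep every sense proposed by either algorithm, and record separately the senses on which both agree. The function name, the input format and the sense identifiers are hypothetical.

```python
def combine_annotations(sd_result, wnd_result):
    """Merge the senses proposed by two disambiguation algorithms.

    Each argument maps a schema label to the set of senses selected by
    one algorithm. For every label the result holds the union of the
    proposed senses and the subset chosen by both algorithms.
    """
    combined = {}
    for label in sorted(set(sd_result) | set(wnd_result)):
        sd = sd_result.get(label, set())
        wnd = wnd_result.get(label, set())
        combined[label] = {"senses": sd | wnd, "agreed": sd & wnd}
    return combined


# Hypothetical outputs of SD and WND for three labels.
sd_output = {"address": {"address#2"}, "name": {"name#1", "name#3"}}
wnd_output = {"name": {"name#1"}, "city": {"city#1"}}

for label, info in combine_annotations(sd_output, wnd_output).items():
    print(label, sorted(info["senses"]), sorted(info["agreed"]))
```

Taking the union is consistent with step 1, where a label may receive more than one meaning. The agreed subset could be shown to the designer first in a semi-automatic workflow.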

Schema label normalization is the reduction of each label to a standardized form that can be easily recognized
→ abbreviation expansion and CN (Compound Noun) annotation
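
As an illustration of the abbreviation expansion step, the following minimal sketch splits a label into tokens and expands known abbreviations. The abbreviation dictionary, the tokenization rules and the function names are assumptions made for this example and do not describe how NORMS works. Compound noun annotation is not covered.

```python
import re

# Hypothetical abbreviation dictionary; a real one would be far larger
# and might be built from the schema, its documentation or online sources.
ABBREVIATIONS = {
    "cust": "customer",
    "addr": "address",
    "qty": "quantity",
    "no": "number",
}


def tokenize(label):
    """Split a label on separators, camelCase boundaries and digits."""
    tokens = []
    for part in re.split(r"[_\-\s]+", label):
        # Acronyms (XML), capitalized or lowercase words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [token.lower() for token in tokens]


def normalize(label, abbreviations=ABBREVIATIONS):
    """Return the label as space-separated words with abbreviations expanded."""
    return " ".join(abbreviations.get(token, token) for token in tokenize(label))


for label in ("custAddr", "ORDER_QTY", "invoice_no", "XMLFile"):
    print(label, "->", normalize(label))
```

A normalized label made of several words, such as "customer address", is the kind of compound noun that would then need its own annotation.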

For a detailed description, please see the PhD thesis of Serena Sorrentino and the PhD thesis of Laura Po.
Techniques are implemented in NORMS, a tool of the MOMIS-Datariver Data Integrator, developed within the FIT STARTUP project.