Proposal received 2003-03-31

METADATA FOR WEATHER DATABASES AT SMHI: Metadata-harvesting agents for weather databases at SMHI

SMHI maintains large volumes of meteorological, hydrological and oceanographic data (MHO data) in many separate databases. Each database contains different data from different time periods in different formats. A major problem is locating the relevant databases, understanding how to access them, and interpreting the data correctly. Most of the database systems provide metadata (*), but we lack a central tool that can search across all available information resources.
As a first step towards a central metadata tool, SMHI adopted the FGDC (Federal Geographic Data Committee) (**) standard for electronic documentation of our MHO databases. However, collecting, formatting, and entering metadata by hand is tedious work. The next step is to develop harvesting agents that automatically collect metadata from the separate database systems and organise it according to the FGDC's XML-based metadata model. The XML data needs to be organised and stored in an efficient manner. Finally, users should be able to search the harvested metadata using an intuitive but flexible GUI.
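
To illustrate the step of organising harvested metadata as XML, the following is a minimal Java sketch and not a description of any existing SMHI system. The class name FgdcRecordWriter and its method are hypothetical. The element names (metadata, idinfo, citation, citeinfo, title, descript, abstract) are assumed to follow the short names of the FGDC content standard; a complete FGDC record requires many more elements than are shown here.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: maps two harvested fields to a small FGDC-style XML subset. */
class FgdcRecordWriter {

    /** Appends a child element (with optional text) to parent and returns it. */
    private static Element child(Document doc, Element parent, String name, String text) {
        Element element = doc.createElement(name);
        if (text != null) {
            element.setTextContent(text);
        }
        parent.appendChild(element);
        return element;
    }

    /** Serialises a database title and abstract as one XML string. */
    static String toXml(String title, String abstractText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element metadata = doc.createElement("metadata");
            doc.appendChild(metadata);

            // Identification information: citation title and descriptive abstract.
            Element idinfo = child(doc, metadata, "idinfo", null);
            Element citation = child(doc, idinfo, "citation", null);
            Element citeinfo = child(doc, citation, "citeinfo", null);
            child(doc, citeinfo, "title", title);
            Element descript = child(doc, idinfo, "descript", null);
            child(doc, descript, "abstract", abstractText);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Configuration or serialisation failure; surfaced as an unchecked error.
            throw new IllegalStateException("Could not build metadata record", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from a real SMHI database.
        System.out.println(toXml("Example precipitation database", "Hourly observations"));
    }
}
```

Building the record through the DOM API rather than by string concatenation means that special characters in harvested text are escaped by the serialiser.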

Project Proposal: Metadata-harvesting agents for weather databases at SMHI

The proposed Master's thesis is to investigate techniques for real-time harvesting of metadata from heterogeneous information resources, including relational database systems, XML documents, dedicated file servers, etc. The final report should include:
- a description of criteria for a good metadata harvesting system
- an analysis of known techniques for harvesting metadata (including agents, polling, active databases, etc.)
- a recommendation of a suitable harvesting technique, based on the criteria for a good harvester
- a practical evaluation of the selected harvesting technique, based on a prototype implementation in Java or C++
- a discussion of an appropriate solution
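
As an illustration of one of the techniques listed above, polling, the following is a minimal Java sketch under simplifying assumptions. The class PollingHarvester and its methods are hypothetical, and each information resource is assumed to expose its current metadata as a single text snapshot. A real harvester would query the database catalogue or file system and run the poll on a schedule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical sketch of polling-based harvesting with simple change detection. */
class PollingHarvester {

    // Registered sources: a name and a function returning the current metadata snapshot.
    private final Map<String, Supplier<String>> sources = new LinkedHashMap<>();

    // Snapshot seen for each source at the previous poll.
    private final Map<String, String> lastSeen = new HashMap<>();

    /** Registers an information resource under the given name. */
    void register(String name, Supplier<String> snapshot) {
        sources.put(name, snapshot);
    }

    /** Polls every source once and returns the names whose snapshot has changed. */
    List<String> pollOnce() {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Supplier<String>> entry : sources.entrySet()) {
            String current = entry.getValue().get();
            String previous = lastSeen.put(entry.getKey(), current);
            if (!Objects.equals(current, previous)) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        String[] state = {"table=obs; columns=time,temp"}; // stand-in for a real catalogue query
        PollingHarvester harvester = new PollingHarvester();
        harvester.register("observations", () -> state[0]);

        System.out.println(harvester.pollOnce()); // first poll: every source counts as changed
        System.out.println(harvester.pollOnce()); // nothing changed since the last poll
        state[0] = "table=obs; columns=time,temp,wind";
        System.out.println(harvester.pollOnce()); // the schema change is detected
    }
}
```

Polling is simple to implement but only notices changes at the next poll, which is one of the trade-offs against agents or active databases that the analysis would need to weigh.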

(*) Simply defined, metadata is data about data. Metadata describes the content, quality, conditions and other characteristics of information resources.
(**) Documentation of FGDC is available at http://www.fgdc.gov/

Applicants must speak fluent Swedish, since the work involves contact with SMHI staff (system developers, information owners, system administrators, meteorologists, etc.).


