In recent years, the amount of statistical data available on the web has been growing fast. Numerous organizations and governments publish data sets in a multitude of formats and encodings, using different scales, and providing access through a wide range of mechanisms. Due to such inconsistent publishing practices, integrated analysis of statistical data is challenging. StatSpace tackles this problem through semantic integration and provides uniform access to disparate statistical data. At present, it incorporates more than 1,800 data sets published by a variety of data providers including the World Bank, the European Union, and the European Environment Agency. StatSpace transparently lifts data from raw sources, maps geographical and temporal dimensions, aligns value ranges, and allows users to explore and integrate the previously isolated data sets. This paper introduces the constituent elements of the StatSpace architecture - i.e., a metadata repository, URI design patterns, and supporting services - and demonstrates the usefulness of the resulting Linked Data infrastructure by means of use case examples.
Architecture | Metadata structure |
Datahub | https://datahub.io/dataset/statspace |
SPARQL endpoint | http://ogd.ifs.tuwien.ac.at/sparql (Graph URI: http://statspace.linkedwidgets.org) |
RDF data (~33MB) | http://statspace.linkedwidgets.org/code/statspace.ttl |
Example 1: | List data sets in the metadata repository |
Example 2: | List information about a specific metadata description |
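As a rough illustration of Example 1, the sketch below builds a SPARQL query that lists data sets from the graph named above. It assumes that entries in the metadata repository are typed as qb:DataSet from the RDF Data Cube Vocabulary; the actual metadata structure may use different classes, and the helper name list_datasets_query is hypothetical.

```python
# Graph URI taken from the resource table above.
GRAPH_URI = "http://statspace.linkedwidgets.org"


def list_datasets_query(limit: int = 20) -> str:
    """Build a SPARQL query listing data sets in the StatSpace graph.

    Assumption: data sets are typed as qb:DataSet. Adjust the class
    if the repository models its metadata differently.
    """
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT DISTINCT ?dataset
FROM <{GRAPH_URI}>
WHERE {{ ?dataset a qb:DataSet }}
LIMIT {int(limit)}"""


if __name__ == "__main__":
    # Print the query so it can be pasted into the SPARQL endpoint form.
    print(list_datasets_query(10))
```

The resulting query string can be submitted to the SPARQL endpoint listed above. For Example 2, the same pattern would apply with the WHERE clause replaced by a lookup of all properties of one data set URI.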
Goal: | Allowing users to integrate statistical data from multiple data sources |
Input: | SPARQL query |
Output: | Statistical data from different data sets which satisfy the input query |
Link: | http://statspace.linkedwidgets.org/mediator/ |
Goal: | Transforming raw data into RDF following the RDF Data Cube Vocabulary |
Input: | RML mapping (required) and parameters (optional) |
Output: | Data in RDF format |
Example: | Transform data of the World Bank to RDF format |
Goal: | Generating metadata for statistical data sets |
Input: | RML mapping or SPARQL endpoint |
Output: | Metadata of data sets |
Example 1: | Generate metadata for a statistical data set of the ONS |
Example 2: | Generate metadata for a statistical data set of the World Bank |
Example 3: | Generate metadata for a SPARQL endpoint |
StatSpace Explorer | http://statspace.linkedwidgets.org/explorer |
StatSpace Interface | http://linkedwidgets.org/resource |
Mashup of statistical information on an area | http://linkedwidgets.org/LinkedWidgetPlatform.html?id=MashupSpatialDataLocator |
Mashup of statistical data comparison | http://linkedwidgets.org/LinkedWidgetPlatform.html?id=MashupSpatialDataComparator |
Goal: | Allowing end users to explore statistical datasets in an easy way |
Input: | SPARQL endpoint |
Output: | Statistical widgets, one per data set in the input endpoint. Users can run the widgets in a browser or upload them to our platform (http://linkedwidgets.org) to create mashups |
Link: | http://statspace.linkedwidgets.org/generation/ |
Goal: | Allowing end users to query data from a SPARQL endpoint without worrying about cross-domain issues |
Input: | Values for two parameters: endpoint and query |
Output: | On success, the service returns the result in JSON format; otherwise, it returns a 500 error code |
Structure: | http://statspace.linkedwidgets.org/sparql?endpoint=endpoint_string&query=query_string |
Example: | URL |
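The URL structure above can be exercised with a short client. The following is a minimal sketch, not the service's official client: the function names build_proxy_url and run_query are hypothetical, and the sketch assumes the proxy accepts URL-encoded parameter values over HTTP GET and that a 500 status is the only service-specific failure signal, as described above.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the "Structure" row above.
PROXY_URL = "http://statspace.linkedwidgets.org/sparql"


def build_proxy_url(endpoint: str, query: str) -> str:
    """Return the proxy URL with both parameters percent-encoded."""
    return PROXY_URL + "?" + urlencode({"endpoint": endpoint, "query": query})


def run_query(endpoint: str, query: str, timeout: float = 30.0):
    """Send the query through the proxy and return the parsed JSON result.

    A 500 response is reported as a RuntimeError; other HTTP errors
    are re-raised unchanged.
    """
    try:
        with urlopen(build_proxy_url(endpoint, query), timeout=timeout) as response:
            return json.load(response)
    except HTTPError as err:
        if err.code == 500:
            raise RuntimeError("the proxy could not execute the query") from err
        raise


if __name__ == "__main__":
    # Only builds and prints the URL; call run_query() to contact the service.
    print(build_proxy_url(
        "http://ogd.ifs.tuwien.ac.at/sparql",
        "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
    ))
```

Encoding the parameters matters because both the endpoint URL and the query text contain characters such as `:`, `/`, `?`, and spaces that are not valid inside a query string as-is.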