
StatSpace: A linked statistical data space

In recent years, the amount of statistical data available on the web has been growing fast. Numerous organizations and governments publish data sets in a multitude of formats and encodings, using different scales, and providing access through a wide range of mechanisms. Due to such inconsistent publishing practices, integrated analysis of statistical data is challenging. StatSpace tackles this problem through semantic integration and provides uniform access to disparate statistical data. At present, it incorporates more than 1,800 data sets published by a variety of data providers including the World Bank, the European Union, and the European Environment Agency. StatSpace transparently lifts data from raw sources, maps geographical and temporal dimensions, aligns value ranges, and allows users to explore and integrate the previously isolated data sets. This paper introduces the constituent elements of the StatSpace architecture - i.e., a metadata repository, URI design patterns, and supporting services - and demonstrates the usefulness of the resulting Linked Data infrastructure by means of use case examples.



Figures: Architecture; Metadata structure

Source code

Documentation of URI design and mapping

1. Metadata repository

Datahub https://datahub.io/dataset/statspace
SPARQL endpoint http://ogd.ifs.tuwien.ac.at/sparql (Graph URI: http://statspace.linkedwidgets.org)
RDF data (~33MB) http://statspace.linkedwidgets.org/code/statspace.ttl
Example 1: List data sets in the metadata repository
Example 2: List information about a specific metadata description
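
As a minimal sketch of how the metadata repository could be queried programmatically, the Python snippet below builds a SPARQL query restricted to the graph URI listed above and sends it to the endpoint. It assumes the endpoint accepts standard SPARQL Protocol GET requests and can return JSON results; the page does not confirm this. Because the repository's vocabulary is not described here, the query only lists the classes used in the graph rather than assuming specific class names. The function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint and graph URI as listed in the metadata repository section.
ENDPOINT = "http://ogd.ifs.tuwien.ac.at/sparql"
GRAPH_URI = "http://statspace.linkedwidgets.org"


def build_query(limit=20):
    """Return a SPARQL query listing the classes used in the StatSpace graph."""
    return (
        "SELECT DISTINCT ?type WHERE { "
        f"GRAPH <{GRAPH_URI}> {{ ?s a ?type }} "
        f"}} LIMIT {int(limit)}"
    )


def build_request(query):
    """Wrap the query in a GET request asking for SPARQL JSON results.

    Assumption: the endpoint follows the SPARQL 1.1 Protocol ('query' parameter).
    """
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, timeout=30):
    """Send the query and return the list of result bindings (needs network access)."""
    with urlopen(build_request(query), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Usage (requires network access and a reachable endpoint):
#   for row in run_query(build_query()):
#       print(row["type"]["value"])
```

Inspecting the classes first is a cautious way to discover which terms the repository uses before writing more specific queries, such as listing data sets.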

2. Code lists

SDMX code list https://sdmx.org/?page_id=3215
CL_Area http://statspace.linkedwidgets.org/codelist/cl_area RDF data
CL_Period http://statspace.linkedwidgets.org/codelist/cl_period RDF data
CL_Age http://statspace.linkedwidgets.org/codelist/cl_age RDF data
CL_Education_Lev http://statspace.linkedwidgets.org/codelist/cl_educationLev RDF data
CL_Occupation http://statspace.linkedwidgets.org/codelist/cl_occupation RDF data
CL_Currency http://statspace.linkedwidgets.org/codelist/cl_currency RDF data
CL_Civil_Status http://statspace.linkedwidgets.org/codelist/cl_civilStatus RDF data
CL_Freq http://purl.org/linked-data/sdmx/2009/code#freq RDF data
CL_Sex http://purl.org/linked-data/sdmx/2009/code#sex RDF data
CL_EconomicActivity http://statspace.linkedwidgets.org/codelist/cl_economicActivity RDF data
CL_Coicop http://statspace.linkedwidgets.org/codelist/cl_coicop RDF data
CL_Cofog http://statspace.linkedwidgets.org/codelist/cl_cofog RDF data
CL_Copp http://statspace.linkedwidgets.org/codelist/cl_copp RDF data
CL_Copni http://statspace.linkedwidgets.org/codelist/cl_copni RDF data
CL_Unit_Measure http://statspace.linkedwidgets.org/codelist/cl_unitMeasure RDF data
CL_Subject http://statspace.linkedwidgets.org/codelist/cl_subject RDF data

3. Mediator service

Goal: Allowing users to integrate statistical data from multiple data sources
Input: SPARQL query
Output: Statistical data from different data sets which satisfy the input query
Link: http://statspace.linkedwidgets.org/mediator/

4. RML mapping service

Goal: Transforming raw data into RDF that follows the RDF Data Cube vocabulary
Input: RML mapping (required) and parameters (optional)
Output: Data in RDF format
Example: Transform data of the World Bank to RDF format
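
To illustrate the target format, the sketch below writes one statistical value as an RDF Data Cube observation in N-Triples. It is not RML and does not reproduce the service's actual output. The `http://example.org/` base URI, the identifier layout, and the sample values are hypothetical placeholders. Only the Data Cube and SDMX property URIs are taken from the published vocabularies.

```python
# Vocabulary namespaces from the RDF Data Cube and SDMX-RDF vocabularies.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Hypothetical base URI, used only for this illustration.
BASE = "http://example.org/"


def observation_triples(dataset_id, area, period, value):
    """Return N-Triples lines describing one observation of a data set."""
    obs = f"{BASE}obs/{dataset_id}/{area}/{period}"
    return [
        f"<{obs}> <{RDF_TYPE}> <{QB}Observation> .",
        f"<{obs}> <{QB}dataSet> <{BASE}dataset/{dataset_id}> .",
        f"<{obs}> <{SDMX_DIM}refArea> <{BASE}area/{area}> .",
        f"<{obs}> <{SDMX_DIM}refPeriod> <{BASE}period/{period}> .",
        f'<{obs}> <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD_DECIMAL}> .',
    ]


# Placeholder input row; the value is made up for illustration.
for line in observation_triples("demo", "AT", "2014", "123.4"):
    print(line)
```

Each observation links to its data set, one spatial and one temporal dimension, and a measured value, which is the shape that later integration steps on geographical and temporal dimensions rely on.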

5. Metadata generation service

Goal: Generating metadata for statistical data sets
Input: RML mapping or SPARQL endpoint
Output: Metadata of data sets
Example 1: Generate metadata for a statistical data set of the ONS
Example 2: Generate metadata for a statistical data set of the World Bank
Example 3: Generate metadata for a SPARQL endpoint

6. StatSpace exploration

StatSpace Explorer http://statspace.linkedwidgets.org/explorer
StatSpace Interface http://linkedwidgets.org/resource
Mashup of statistical information on an area http://linkedwidgets.org/LinkedWidgetPlatform.html?id=MashupSpatialDataLocator
Mashup of statistical data comparison http://linkedwidgets.org/LinkedWidgetPlatform.html?id=MashupSpatialDataComparator

7. Other services

7.1. Widget generation

Goal: Allowing end users to explore statistical data sets easily
Input: SPARQL endpoint
Output: Statistical widgets. Each data set in the input endpoint is modelled as one widget. Users can run widgets in a browser or upload them to our platform (http://linkedwidgets.org) to create mashups
Link: http://statspace.linkedwidgets.org/generation/

7.2. SPARQL query service

Goal: Allowing end users to query data from a SPARQL endpoint without worrying about cross-domain issues
Input: Values for two parameters - endpoint and query
Output: On success, the service returns the result in JSON format; otherwise, it returns a 500 error code
Structure: http://statspace.linkedwidgets.org/sparql?endpoint=endpoint_string&query=query_string
Example: URL
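
The URL structure above can be assembled as in the following Python sketch, which URL-encodes the two parameters and treats a 500 response as a failed query. The function names and the choice to return `None` on failure are illustrative assumptions, not part of the service.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Service address as given in the "Structure" line above.
SERVICE = "http://statspace.linkedwidgets.org/sparql"


def build_service_url(endpoint, query):
    """Return the service URL with both parameters percent-encoded."""
    return SERVICE + "?" + urlencode({"endpoint": endpoint, "query": query})


def query_via_service(endpoint, query, timeout=30):
    """Run a query through the service (needs network access).

    Returns the parsed JSON result, or None if the service answers with
    the documented 500 error code. Other HTTP errors are re-raised.
    """
    try:
        with urlopen(build_service_url(endpoint, query), timeout=timeout) as response:
            return json.load(response)
    except HTTPError as error:
        if error.code == 500:
            return None
        raise


# Example: build (but do not send) a request for one triple.
print(build_service_url(
    "http://ogd.ifs.tuwien.ac.at/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
))
```

Encoding matters here because both the endpoint address and the query contain characters such as `:`, `/`, `?`, and spaces that would otherwise break the outer URL.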

Publications

StatSpace: A unified platform for statistical data exploration
Ba-Lam Do, Peb Ruswono Aryan, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, and A Min Tjoa
Proceedings of OTM 2016 Conferences: CoopIS, ODBASE, and C&TC
Rhodes, Greece, October 2016

Toward a Framework for Statistical Data Integration
Ba-Lam Do, Peb Ruswono Aryan, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, A Min Tjoa
Proceedings of the 3rd International Workshop on Semantic Statistics
Bethlehem, U.S., October, 2015

Toward a statistical data integration environment: the role of semantic metadata
Ba-Lam Do, Peb Ruswono Aryan, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, A Min Tjoa
Proceedings of the 11th International Conference on Semantic Systems
Vienna, Austria, August, 2015

Multiscale Exploration of Spatial Statistical Datasets: A Linked Data Mashup Approach
Ba-Lam Do, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, Amin Anjomshoaa, A Min Tjoa
Proceedings of the 2nd International Workshop on Semantic Statistics
Riva del Garda, Italy, October, 2014

Widget-based Exploration of Linked Statistical Data Spaces
Ba-Lam Do, Tuan-Dat Trinh, Peter Wetz, Amin Anjomshoaa, Elmar Kiesling, A Min Tjoa
Proceedings of 3rd International Conference on Data Management Technologies and Applications
Vienna, Austria, August, 2014

Exploring linked statistical data using linked widgets
A Min Tjoa, Ba-Lam Do, Amin Anjomshoaa, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling
Proceedings of the Fifth Symposium on Information and Communication Technology
Hanoi, Vietnam, December, 2014