The number of systems and data sources in an organisation can be overwhelming and nearly impossible to keep track of, and Helsinki is no exception. Data is dispersed across countless systems, and bringing it all into one location is not feasible. How can anyone find the data they are looking for if they don’t even know where to start? How can we make the findings and wisdom of experts part of the data for the next users? Helsinki has started a project to explore whether a data catalogue could solve these issues, and Forum Virium Helsinki aims to support the project by piloting our own data catalogue, Datahub.
In a city environment, even a single question may require multiple points of view and diverse sets of data. For example, researchers investigating the impact of weather on traffic have to look for data in multiple systems. E-scooter data is in one place, public transport data in another, snow plough data in yet another, and so on.
In addition, each data source has its own specialist, whose help is often needed to understand the data correctly. When there is a great amount of data and many specialists, building the big picture can prove challenging.
The traditional solution for sharing this kind of knowledge is documentation. With data, however, documentation rarely keeps up with a frequently changing environment, and data tools usually offer a way to share “just” the data. Moreover, these tools and their documentation are often not accessible to all data users! A data catalogue is, in principle, the answer to this challenge of sharing information.
A data catalogue automatically collects the schema and basic information from different systems and brings them together in a single place. This leaves the systems that specialise in storing and moving data to do what they do best, while the catalogue focuses on data discovery and understanding. With an easy-to-use interface, everyone can find out what kind of data is available and what it represents.
The researchers investigating the impact of weather on traffic could start looking for data from the data catalogue’s search bar.
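The idea of harvesting schemas into one searchable place can be illustrated with a minimal sketch. It is not how Datahub is implemented. The class names, source system names and column names are all hypothetical, chosen only to mirror the traffic example above.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata the catalogue keeps about one dataset (not the data itself)."""
    name: str
    source_system: str
    columns: list[str]
    description: str = ""


@dataclass
class Catalogue:
    """Hypothetical in-memory catalogue: one list of metadata entries."""
    entries: list[DatasetEntry] = field(default_factory=list)

    def harvest(self, source_system: str, schemas: dict[str, list[str]]) -> None:
        """Register the schema of every dataset a source system reports."""
        for name, columns in schemas.items():
            self.entries.append(DatasetEntry(name, source_system, list(columns)))

    def search(self, term: str) -> list[DatasetEntry]:
        """Return entries whose name, description or columns mention the term."""
        needle = term.lower()
        return [
            e for e in self.entries
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in c.lower() for c in e.columns)
        ]


# Assumed example sources; a real catalogue would read these from the systems.
catalogue = Catalogue()
catalogue.harvest("scooter_api", {"e_scooter_trips": ["start_time", "end_time", "distance_m"]})
catalogue.harvest("transit_db", {"bus_delays": ["line", "delay_s", "weather_code"]})
catalogue.harvest("maintenance_db", {"snow_plough_routes": ["street", "ploughed_at"]})

hits = catalogue.search("weather")
print([(e.name, e.source_system) for e in hits])
```

The data itself stays in the source systems. The catalogue holds only names, columns and descriptions, which is enough for a search to show where to look next.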
Open source as a solution
Forum Virium Helsinki has already been involved in establishing Helsinki Region Infoshare (HRI), which has evolved into an acknowledged open data catalogue. Before data ends up in HRI, it is created in the various systems of the city, partner organisations, projects and processes. Now we have piloted a next-generation data catalogue for internal use, which allows even better use of information sources. This helps gather the data, and the knowledge about it, at an early stage. And like HRI, it is based on open source solutions.
Automation and usability are essential in next-generation data catalogues. Whereas public data sources are updated manually, the internal catalogue publishes all data sources automatically and continuously. We started the pilot in the winter of 2022, and within the first weeks it was connected to an important data warehouse and an IoT stream processor.
For users, the data catalogue offers many improvements:
- Data discovery over different systems
- A centralised place for information and documentation about data, offering up-to-date metadata
- New possibilities to group and enrich datasets, for example by #tagging
- Automatic information on data quality
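Two of the points above, tagging and automatic quality information, can be sketched in a few lines. This is an illustration under stated assumptions, not the pilot’s implementation. The function names, the dataset names and the choice of completeness (the share of non-missing values) as the quality measure are all hypothetical.

```python
from collections import defaultdict

# tag -> names of the datasets carrying that tag
datasets_by_tag = defaultdict(set)


def add_tag(dataset, tag):
    """Attach a tag to a dataset; '#Traffic' and 'traffic' count as the same tag."""
    datasets_by_tag[tag.lstrip("#").lower()].add(dataset)


def completeness(rows, column):
    """Share of rows where `column` has a value; 1.0 means nothing is missing."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


add_tag("e_scooter_trips", "#traffic")
add_tag("bus_delays", "#Traffic")
add_tag("bus_delays", "#weather")

# Assumed sample rows; a delay of 0 is a real value, only None counts as missing.
sample = [
    {"line": "550", "delay_s": 120},
    {"line": "550", "delay_s": None},
    {"line": "560", "delay_s": 45},
    {"line": "560", "delay_s": 0},
]

print(sorted(datasets_by_tag["traffic"]))
print(completeness(sample, "delay_s"))
```

A catalogue could compute such a figure on a schedule and show it next to each dataset, so that users see at a glance how far a column can be trusted.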
A data catalogue is technically a simple tool, but its full benefits can only be achieved through continuous development and collaboration. It serves the different kinds of specialists working with data and supports them in sharing their knowledge. The aim is to make using diverse data sources easier and more trustworthy. It is our way of making wiser decisions about developing our urban environment.