1•1 • Introduction to data •
“We need to look at the whole society and think, ‘Are we actually thinking about what we're doing as we go forward, and are we preserving the really important values that we have in society? Are we keeping it democratic, and open, and so on?’” Tim Berners-Lee
By Javiera Atenas, with contributions from Juan Pane and Juan Belbis
Data are characteristics or information, usually numerical, that are collected through observation. In a more technical sense, they are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum is a single value of a single variable. Data are transformed into information when they are created, extracted, elaborated and used with pre-established objectives. The resulting body of information, often made up of data of the same or different types (such a collection is called a “dataset”), is transformed into knowledge when it is interpreted with the help of tools, applications, methods, indicators, etc.
Data can be small or big, private, personal, governmental, military, scientific, public, confidential, commercial, financial or open, and normally pertain to information delivered in machine-readable file formats, in a form known as raw data. The most common data types are integer, floating-point number, character, string and Boolean. With the constant evolution of technology, the informative content and the data held by public administrations represent excellent opportunities to promote transparency in the actions of governments and administrations. Moreover, they can support more efficient services and, since they facilitate reuse by other public and private subjects, they can also be used in areas other than those for which they were produced or collected. Knowledge, in practice, acquires the value of awareness – in the case of open data this can be defined as “collective”, understood as being for the “common good” – when used for change and the improvement of reality (the facts).
Whilst data are features of information that are collected through observation, information is understood as a symbolic representation that describes facts, conditions, values or situations, collected and arranged in an appropriate way to fulfil the objective of the institution that manages it. On their own, data lack semantic value, that is, they have no meaning for anyone, so they add no value for the recipient of the message. For data to make sense, they must be processed, associated or grouped within the same context to form information. Thus, we can conclude that information is a set of data that has been processed, related and organised in a way that allows us to communicate or acquire knowledge.
1•1 • Understanding Open Data
According to the International Open Data Charter, “Open Data is digital data that is made available with the technical and legal characteristics necessary so that it can be freely used, reused and redistributed by anyone, at any time and anywhere.” The Charter arose from a conversation between governments and civil society, which has resulted in the promotion of the adoption of the six principles described below. Moreover, Open Data has been defined by the Open Knowledge Foundation as data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike. The core technical principles of Open Data can be understood as follows:
- Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
- Universal Participation: everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
The six principles of open data developed by the Open Data Charter are a globally agreed set of aspirational norms for how to publish data, which can be summarised as follows.
- Open by default: This represents a real shift in how government operates and how it interacts with citizens. At the moment, we often have to ask officials for the specific information we want. Open by default turns this on its head under the perspective that there should be a presumption of publication for all. Governments need to justify data that is kept closed, for example, for security or data protection reasons. To make this work, citizens must also feel confident that open data will not compromise their right to privacy.
- Timely and comprehensive: Open data is only valuable if it is still relevant. Getting information published quickly and in a comprehensive way is central to its potential for success. As much as possible, governments should provide data in its original, unmodified form.
- Accessible and usable: Ensuring that data is machine-readable facilitates its dissemination, with portals being one way of achieving this. It is also important to consider the user experience of those accessing data, including such matters as the file formats in which information is provided. Data should be free of charge and under an open licence, such as those developed by Creative Commons.
- Comparable and interoperable: Data has a multiplier effect. The more quality datasets you have access to, and the easier it is for those datasets to talk to each other, the more potential value you can get from them. Commonly agreed data standards play a crucial role in making this happen.
- For improved governance & citizen engagement: Open data has the capacity to let citizens (and others in government) have a better idea of what officials and politicians are doing. Transparency can improve public services and help hold governments to account.
- For inclusive development and innovation: Finally, open data can help spur inclusive economic development. For example, greater access to data can make farming more efficient, or it can be used to tackle climate change. We often think of open data as just about improving government performance, but there is a whole universe out there of entrepreneurs making money off the back of open data.
The Government of Canada summarises the benefits of Open Data as follows:
- Support for innovation – Access to knowledge resources in the form of data supports innovation in the private sector by reducing duplication and promoting the reuse of existing resources.
- Advancing the government’s accountability and democratic reform – Increased access to government data and information provides the public with greater insight into government activities, service delivery, and use of tax dollars.
- Leveraging public sector information to develop consumer and commercial products – Open and unrestricted access to scientific data for public interest purposes, particularly statistical, scientific, geographical, and environmental information, maximises its use and value, whilst the reuse of existing data in commercial applications improves time-to-market for businesses.
- Better use of existing investment in broadband and community information infrastructure – Canada has invested in information and communications networks in the form of technical infrastructure and community services, such as libraries and social service agencies.
- Support for research – Access to federal research data supports evidence-based primary research in the Canadian and international academic, public sector, and industry-based research communities. Access to collections of data, reports, publications, and artefacts held in federal institutions allows for the use of these collections by researchers.
- Support informed decisions for consumers – Providing access to public sector service information to support informed decision-making, for example, real-time air travel statistics, can help travellers to choose an airline and understand the factors that can lead to flight delays.
- Proactive Disclosure – Proactively providing data that is relevant to Canadians reduces the number of access-to-information requests, email campaigns and media inquiries. This greatly reduces the administrative cost and burden associated with responding to such inquiries.
If you would like to delve deeper into open data, we advise you to go to The Open Data Handbook, where you will find the Open Data guide, which discusses the legal, social and technical aspects of open data in more depth. It can be used by anyone but is especially designed for those seeking to open up data. You will also find case studies highlighting the social and economic value, the impact and the varied applications of open data from cities and countries across the globe; and the resource library with a curated collection of open data resources, including articles, longer publications, how-to guides, presentations and videos, produced by the global open data community.
This video will allow you to get familiar with how to navigate the EU open data portal. You will need to navigate this site for our hackathon activity, so get ready!
1•2 • Opening up data
A dataset is a collection of organised data records where each element has the same structure, ordered for processing by a computer. For example, a dataset can be the list of schools in a country, the list of all state contracts for all its institutions, or the general budget of the nation.
The same dataset can have multiple distributions (or resources) that can vary in two dimensions:
- Temporal: in this case, the same dataset has records associated with different points in time. For example, the general budget of the nation has a different version each year, as does the list of contracts of a government.
- Format: each dataset can be represented in various formats. For example, if we consider that the list of government contracts can be represented in a table, it can be published as a file to be opened with Acrobat Reader (.pdf) or Microsoft Excel (.xls), with any text editor (.csv), or to be processed by automated systems (.json), among other arrangements.
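To make the format dimension concrete, here is a minimal sketch that serialises one small table of contracts as both CSV and JSON using only the Python standard library. The records, field names and values are invented for illustration and do not come from any real portal.

```python
import csv
import io
import json

# Hypothetical records: every element has the same structure (same fields).
contracts = [
    {"contract_id": "C-001", "supplier": "Acme Ltd", "amount": 12000.0},
    {"contract_id": "C-002", "supplier": "Globex SA", "amount": 8500.5},
]


def to_csv(records):
    """Serialise a list of uniformly structured dicts as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialise the same records as JSON text."""
    return json.dumps(records, indent=2)


# Two distributions of the same dataset, differing only in format.
csv_text = to_csv(contracts)
json_text = to_json(contracts)
```

Both outputs carry the same information; what changes is how easily a person or a program can read it.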
A wide variety of formats can be used to make data available to the public; however, not all meet the necessary requirements to define such data as “open”. The format in which the information is published, that is, the digital form in which the information is stored, can be open or closed. An open format is one whose specification is available to anyone, free of charge, so that anyone can implement it in their own software without any limitation on reuse imposed through intellectual property rights. A closed format, by contrast, is either a proprietary format whose technical specification is not publicly available, or a proprietary format whose specification is public but whose use is restricted.
The fundamental reason why it is important to clarify the meaning of “open”, and to use this particular definition, can be summarised in one term: interoperability. This is the ability of different systems and organisations to work together. In our case, it is the ability to combine a database with others. Interoperability is the key that allows for the first practical advantage of openness: it dramatically increases the possibility of combining different databases and thus of developing new and better products and services.
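As a rough illustration of what combining datasets means in practice, the sketch below joins two small datasets on a shared identifier. The datasets, the `school_id` key and the `combine` helper are all hypothetical; real datasets rarely line up this cleanly, which is why agreed identifiers and standards matter.

```python
# Two hypothetical datasets, imagined as published by different institutions.
schools = [
    {"school_id": "S1", "name": "North School"},
    {"school_id": "S2", "name": "River School"},
]
budgets = [
    {"school_id": "S1", "budget": 50000},
    {"school_id": "S2", "budget": 72000},
]


def combine(left, right, key):
    """Join two lists of records on a shared key (an inner join)."""
    index = {row[key]: row for row in right}
    # Keep only records whose key appears in both datasets.
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


combined = combine(schools, budgets, "school_id")
```

The join only works because both datasets use the same identifier for each school. Without a common key or standard, each combination would need manual matching.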
Furthermore, the advantage of open file formats is that they allow developers to produce software and services that use them. This minimises the obstacles to reusing the information the files contain. Using proprietary formats can lead to dependency on third-party software or on the licence holders of the formats. At worst, this may mean that the information can be read only with a specific software package, which could be prohibitively expensive or become obsolete over time.
Publishing data in open data portals efficiently requires a strategy that addresses the following questions:
What data will be published iteratively and when? This refers to the roadmap for publishing the information. Given that resources are generally limited, it is difficult to publish all of the available information at the outset. It is therefore important to have a roadmap with clear and prioritised objectives for what will be published and when.
Where will the data be published and how will the data be published? This refers to the decision on the web address (the URL) where the open data portal will be, as well as the decisions regarding the formats in which the data will be published (JSON, CSV, JSON-LD). Important points to consider include, for example, whether the portal will include an API for developers and whether bulk downloads are expected.
What is the data update frequency? It should be acknowledged that some datasets need to be updated more frequently than others: daily (at night, at noon, etc.), weekly, monthly, and so on.
Who is responsible for the publication of the data? This refers to those who are responsible for data management (system, institution, etc.). In all cases, it must be specified who publishes the data, and who is responsible for maintaining its accuracy and quality.
Who to contact if you have questions? It is important to explain clearly how to make inquiries relating to the data, in order to avoid misunderstandings.
What licence will be used to publish the data? The licence defines the permissions that the data owner grants in relation to what users can do. An open licence (for open data) may require, at most, attribution to the source and redistribution under the same licence.
Where can I find more reference information? There must be a place within the open data portal where users can access more information on related topics, such as data dictionaries and data manuals, or links to sites where these can be found.
What is the regulatory framework of reference? It is very important always to include all the necessary references to policies, laws, decrees, resolutions, circulars, etc., that underpin everything published in the open data portal.
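One way to keep track of the answers to these questions is to store them as a metadata record alongside each dataset. The sketch below is only an assumption about how such a record might look: the `DatasetMetadata` class, its field names and all example values (including the `example.org` addresses) are hypothetical and do not follow any particular portal's schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical record answering the publication questions above."""

    title: str
    portal_url: str            # where the data is published
    formats: list              # how it is published (e.g. CSV, JSON)
    update_frequency: str      # how often it is updated
    publisher: str             # who is responsible for the data
    contact: str               # who to contact with questions
    licence: str               # which licence applies
    reference_docs: list       # data dictionaries, manuals, etc.
    regulatory_framework: list # laws, decrees, resolutions, etc.

    def missing_fields(self):
        """Return the names of fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


example = DatasetMetadata(
    title="Government contracts",
    portal_url="https://data.example.org/contracts",
    formats=["CSV", "JSON"],
    update_frequency="monthly",
    publisher="Hypothetical Ministry of Finance",
    contact="opendata@example.org",
    licence="CC BY-SA 4.0",
    reference_docs=["data dictionary"],
    regulatory_framework=[],  # deliberately left empty for the check below
)

metadata_json = json.dumps(asdict(example), indent=2)
unanswered = example.missing_fields()
```

Here `unanswered` flags the regulatory framework as still undocumented, which is the kind of gap a publisher would want to close before release.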
- Read this short article about the state of open data in agriculture from The State of Open Data for Development
- Prepare notes about the main points that you think could be addressed and brainstorm some ideas on how that could be done.
- Why do you think that sustainable agriculture is a ‘wicked’ problem? Do you have any ideas as to ways forward in addressing problems in this area using open data?
- Upload your thoughts here
1•3 • Publishing open data
The technical approach to data opening is based on the five-star data opening scheme defined by Tim Berners-Lee, a summary of which can be seen in the five-star figure below. This scheme proposes an incremental scale of data openness levels, where each level implies progress in terms of the objectives of open data: freedom of use, reuse and redistribution.
The great leap to the third star: the third star implies that the data is in a non-proprietary format, that is, it can be consumed and reused by anyone. To this end, the open data organisations are championing the standardisation of the open formats to be used in order to facilitate the work of data consumers. These formats are summarised in the following table.
Something to keep in mind is that the appropriate format depends on the type of data to be published. For example, if the data is tabular, that is, it is contained in a table, one of the most used formats is CSV. On the other hand, if the data is georeferenced, there are other specialised formats to represent this information. Below are some of the most commonly used data types and formats.
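As a rough illustration of the five-star scale discussed in this section, the sketch below encodes the five levels as they are commonly summarised and assigns a provisional star level to a file based only on its extension. The extension-to-star mapping and the `stars_for` helper are simplifying assumptions made for this example: they presume the file is already published under an open licence, and the fourth and fifth stars depend on how the data is modelled and linked, which a file extension cannot reveal.

```python
# The five levels of the scheme, as commonly summarised.
FIVE_STAR_LEVELS = {
    1: "Available on the web, in any format, under an open licence",
    2: "Available as structured, machine-readable data",
    3: "Available in a non-proprietary (open) format",
    4: "Uses URIs to identify things",
    5: "Linked to other data to provide context",
}

# Illustrative assumption only: a provisional level per file extension.
FORMAT_STARS = {"pdf": 1, "xls": 2, "csv": 3, "json": 3}


def stars_for(filename):
    """Return a provisional star level for a file, or None if unknown."""
    extension = filename.rsplit(".", 1)[-1].lower()
    return FORMAT_STARS.get(extension)


level = stars_for("contracts.csv")
description = FIVE_STAR_LEVELS[level]
```

Moving the same table from `.xls` to `.csv` is the step from the second to the third star: the content is unchanged, but the format is no longer proprietary.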
When designing an open data policy, it is recommended to focus on the user, assess the demand for data and, based on this, prioritise the data to be published. When developing a data opening plan, those who publish the data, be it academia or the public sector, need to analyse and understand which datasets can be considered of high value or of greater relevance, in order to prioritise their publication according to factors such as their value for user communities or their potential to promote public participation. Data that becomes urgent in national or international contingencies, such as data on emergencies or natural disasters, epidemics or cases of corruption, should also be considered, since it needs to be published quickly.
Data expedition! Follow the link