Data on the web


The amount of data available on the web is eye-watering. Where do you even start to understand it, let alone access it? Luckily we have Alejandra Garcia Rojas, who walks us through a brief history of the web, explains big data and provides a multitude of links for us to explore further.

The history of the internet is relatively recent. It starts in 1958 with the first modem, but the internet we’re more familiar with only really came to be in 1989. Data, on the other hand, has been around forever. It comes in all shapes, sizes and measures, and can be as informal as a weekly flyer or as formal as an annual national census. Now, with the internet, we seem to have an infinite expanse of data available.

The Growth of the WWW

When you think about the growth of the internet, the amount of available data doesn’t come as a surprise. The online world is growing at an exponential rate: in 1996 there were 45 million online users and 250,000 sites, and by 2016 that had ballooned to 3 billion users and 800 million sites. If we look at how much the internet is being used, we’re blown away again. Global internet traffic grew from 100 GBps in 1992 to 16,144 GBps in 2014, and it is expected to more than triple by 2019, to an estimated 51,794 GBps. Just think about how many data points that will create.
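
To put the traffic figures in perspective, here is a minimal Python sketch that works out the growth multiples from the numbers quoted above. The function names are made up for this example, and the yearly rate assumes smooth compound growth between the two readings, which is a simplification rather than a claim about how traffic actually grew.

```python
def growth_factor(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Average yearly growth implied by two readings `years` apart.

    Assumes steady compound growth, which is only an approximation.
    """
    return (end / start) ** (1 / years) - 1


# Traffic figures quoted in the article, in GBps.
traffic = {1992: 100, 2014: 16_144, 2019: 51_794}

historic = growth_factor(traffic[1992], traffic[2014])
forecast = growth_factor(traffic[2014], traffic[2019])
rate = annual_growth_rate(traffic[1992], traffic[2014], 2014 - 1992)

print(f"1992 to 2014: about {historic:.0f} times more traffic")
print(f"2014 to 2019: about {forecast:.1f} times more traffic (forecast)")
print(f"Implied average growth, 1992 to 2014: about {rate:.0%} per year")
```

The second line is where the "more than triple" in the forecast comes from: 51,794 divided by 16,144 is a little over 3.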

How is Big Data being used?

With so much data available and more being created every minute, who is using it all? Alejandra breaks it down by the biggest general users:

Consumer Business

  • Customer experience
  • Brand perception
  • Target segment identification

Producers

  • Demand forecast
  • Supply chain
  • Product design

Government and Financial Services

  • Risk management
  • Fraud detection

Health Sciences

  • Research
  • Real-time data
  • Health care

What is Open Data?

What exactly is open data? Let’s look at a definition: “Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)”. However, there’s an aim to take that even further. In the piece Open Data: The Next Phase in the Technology Revolution, Casey Coleman pushes for Open Government Data, which means “Publicly available data structured in a way that enables the data to be fully discoverable and useable by end users.” The benefits are transparency, public service improvement, and economic and social value. Ultimately, Open Data means free data for the general public, and with access to more data we can make better-informed decisions.

I want some data, where do I get it?

Ready to get your hands on some of this wonderful information? Alejandra has provided some links and sources to start you on your data hunt:

Open Data Issues

Unsurprisingly, there are issues associated with open data. Unlike the information you’ve collected through formal or informal research, open data has been collected by someone else. Here are some of the top issues with open data:

  • Source of origin, trust and privacy
  • Timeliness, relevancy, completeness, sufficiency
  • Usage rights like acquiring licenses for creative content
    • Public domain (CC0, PDDL)
    • Attribution (CC-by, ODC-by)
    • Attribution & share-alike (CC-by-sa, ODbL)
  • Reusability

How can we improve on these open data issues? One solution, presented by Tim Berners-Lee (the inventor of the Web and initiator of Linked Data), is based on a 5 Star system. 5 Star Data is a way of classifying data based on how usable the information is for someone else. Below is the proposed 5 Star system:

★ make your stuff available on the Web (whatever format) under an open license

★★ make it available as structured data (e.g., Excel instead of image scan of a table)

★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)

★★★★ use URIs to denote things, so that people can point at your stuff

★★★★★ link your data to other data to provide context
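
As a rough illustration of how the star levels stack up, here is a small Python sketch that rates a dataset against the five criteria above. The `Dataset` class and its field names are hypothetical, invented for this example, and the code assumes the stars are cumulative: a dataset only earns a star if it has also earned every star before it.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_license: bool  # on the Web under an open license
    structured: bool    # structured data rather than, say, a scanned image
    open_format: bool   # non-proprietary format such as CSV
    uses_uris: bool     # things are identified by URIs
    linked: bool        # links out to other data for context


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first criterion not met."""
    criteria = [ds.open_license, ds.structured, ds.open_format,
                ds.uses_uris, ds.linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Two made-up datasets: a scanned table and a plain CSV file.
scanned_table = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)

print(star_rating(scanned_table), star_rating(csv_file))
```

Under that cumulative assumption, a dataset without an open license scores zero however well structured it is, which matches the idea that openness comes first.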

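Beyond the availability of data and the quality of the information, we can take it one step further and look at Linked [Open] Data. This refers to publishing data so that it is connected to other datasets through shared URIs, which makes it easier for users to find and combine related information. The resulting web of interlinked datasets is often called the Linked Open Data cloud.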
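
To make the idea of linking a little more concrete, here is a toy Python sketch, not a real Linked Data toolkit. Facts are stored as (subject, predicate, object) triples, and two made-up datasets are joined by a single link. Every example.org and example.net URI is invented for illustration; the only real identifier is the `owl:sameAs` property, which says that two URIs denote the same thing.

```python
# Real OWL property meaning "these two URIs denote the same thing".
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs from two unrelated publishers.
CITY = "http://example.org/city/rotterdam"
PLACE = "http://example.net/place/42"

city_dataset = [
    (CITY, "http://example.org/prop/name", "Rotterdam"),
    (CITY, SAME_AS, PLACE),  # the link that earns the fifth star
]
weather_dataset = [
    (PLACE, "http://example.net/prop/hasStation", "http://example.net/station/7"),
]


def describe(uri, triples):
    """Collect facts about `uri`, following sameAs links one hop."""
    aliases = {uri} | {o for s, p, o in triples if s == uri and p == SAME_AS}
    return [(s, p, o) for s, p, o in triples if s in aliases and p != SAME_AS]


facts = describe(CITY, city_dataset + weather_dataset)
for fact in facts:
    print(fact)
```

Without the `sameAs` triple, asking about the city would return only its name. With it, the weather station from the second dataset comes along too, which is the extra context that linking is meant to provide.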

The world of data is an exciting place, and it is continually growing. With Alejandra’s help we have a better understanding of what open data is and where we can begin to access the wealth of information available.