Scotland’s Covid-19 Open Data

We are in unprecedented times. People are trying to make sense of what is going on around them, and the demand for up-to-date, even up-to-the-minute, information is greater than ever. Journalists, data scientists, immunologists, epidemiologists and others are looking for data with which to develop that information for the broader public, as well as to feed into predictive modelling. That means that governments and health services at all levels (UK and Scotland) need to publish that data quickly, consistently, and in a way that makes it easy for data users to consume. They need to look at best practice and adopt those standards and approaches quickly.

Let’s start with what this post is not. It is not a criticism of some very hard pressed people in NHS Scotland and Scottish Government who are trying very hard to do the right thing.

So, what is it? It is an honest suggestion of how the Scottish Government must adapt the way it publishes data on the most pressing issue of modern times.

The last five days

Last Sunday, 15th March, as the number of people in Scotland with Covid-19 started to climb (even if numbers were still low in comparison to other EU countries), I went looking for open data on which I could start to plan some analysis and visualisation. And I found none.

What I did find was a static HTML webpage. This had the figures for that day: the total number of tests conducted, the total number of negative results, and the number of positive cases for each Health Board. The page is then overwritten at 2pm the next day. This is an awful practice, one also used by ScotRail to hide its month-on-month performance.

I was able, using the Internet Archive's Wayback Machine, to fill in some gaps back to 5th March, but that was far from complete. I published what I could on GitHub and mentioned it on Twitter and in a couple of Slack groups. Thankfully a friend, Lesley, was ahead of me in terms of data collection for her work as a data journalist, and was able to furnish testing data back to the start on 24 January 2020. Since then I’ve updated the GitHub repo daily – usually when the data is published at 2pm.

Almost immediately after I began, a couple of people started to build visualisations based on what I had put in GitHub, including this one. Others said that they were waiting for the numbers to climb to more significant levels, particularly deaths, before they would start to use the data.

Two or three times the data has been published and then corrected, with some test results for Shetland / Grampian being reassigned between the two health boards. This is understandable given the current circumstances.

SG webpage with table of Covid-19 daily cases

On 19 March 2020, the 2pm publication was delayed, with the number of fatalities and positive results being published after 3.30pm, and the total number of tests after 7pm. Again, this is understandable. The present circumstances are unprecedented and processes are still being developed. Up to now much of Scotland’s open data publication has been done, if at all, at a more leisurely and considered pace. It does make one wonder how the processes will cope as the numbers rise exponentially, as they surely will.

Why is this important?

At this time the public are trying to make sense of a very difficult situation. Journalists, scientists and others are trying to assist in that by interpreting what data there is for them, including building visualisations of that. People are also seeking reassurances – that the UK and Scottish Government are on top of the situation. Transparency around government activity such as testing, and the spread of the virus, would build trust. Indeed there is real concern that Scotland, and the UK as a whole, is not meeting WHO guidance on testing and tracing cases.

But with a static web page containing a limited range of data that is erased daily, this is not possible. Even setting up a scraper to grab the essential content from that page is not feasible if the data is only partially published for long periods.
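
For illustration, here is a minimal sketch of the kind of scraper people are forced to write against a page like that. The URL, the table layout and the column order are all assumptions made for the example, not the real page structure, and the approach breaks whenever the page is published piecemeal or restructured.

```python
import csv
import os
from datetime import date

import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://www.gov.scot/coronavirus-covid-19/"  # hypothetical address

def scrape_daily_figures(url=PAGE_URL):
    """Grab each health board's positive-case count from the page's first table."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise RuntimeError("No table found - the page may be mid-update or restructured")
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            rows.append({"date": date.today().isoformat(),
                         "health_board": cells[0],
                         "positive_cases": cells[1]})
    return rows

def append_to_csv(rows, path="scotland_daily_cases.csv"):
    """Append today's scrape so the history is not lost when the page is overwritten."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "health_board", "positive_cases"])
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    append_to_csv(scrape_daily_figures())
```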

We have some useful data visualisations, such as this set by Lesley herself, but what can be done is limited. Deaths per health board are collected, we’ve been told, but they are not published – only a Scotland-level total is.

I’ve had it confirmed by someone I know in the Scottish Government that they are looking at creating and publishing Linked Open Data, which I suspect will be on their platform – a great resource, but one which is seen by many as a barrier to actually getting data quickly and simply.

Italian government GitHub repo

Compare this with the Italian Government, who have won plaudits from the data science, journalism and developer communities for making their data available quickly and simply, using GitHub as the platform – one that is familiar to the end users. They also provide a great range of background information (view it in Chrome, which will translate it). On that platform they publish daily national and regional statistics for:

  • date
  • state
  • hospitalised with symptoms
  • intensive care
  • total hospitalised
  • home isolation
  • total currently positive
  • new currently positive
  • discharged healed
  • deceased
  • total cases
  • swab tests
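
To show how low the barrier is once data lives in a plain CSV on GitHub, here is a short sketch of loading the national time series with pandas. The repository path and the Italian column names ("data", "totale_casi") are my best understanding of the pcm-dpc/COVID-19 layout and should be checked against the repo itself.

```python
import pandas as pd

RAW_CSV = ("https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/"
           "dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv")

# "data" is the date column, "totale_casi" the running total of cases
df = pd.read_csv(RAW_CSV, parse_dates=["data"])
df["new_cases"] = df["totale_casi"].diff()  # daily change, ready for charting or modelling
print(df[["data", "totale_casi", "new_cases"]].tail())
```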

Not only is the data feeding larger, world-wide analyses such as that by Johns Hopkins University, but people at a national level are using it to create some compelling, interactive visualisations such as this one. As each country starts to recover, and infections and deaths begin to slow, having ways of visualising that will depend on data to drive those views.

[edited] Wouldn’t a dashboard such as this one for Singapore, built by volunteers, be a good thing for Scotland? We could do it with the right data supplied.

Singapore dashboard

[/edited]

So, this is a suggestion, or rather a request, to NHS Scotland and the Scottish Government to put in place a better set of published data, which is made available in as simple and as timely a fashion as can be accomplished under the present circumstances. Give us the data and we’ll crowd-source some useful tools built on it.

How to do that?

The Scottish Government should look to fork one of the current repositories and use that as a starting point. In an ideal world that would be the Italian one – but even starting with my simple one (if the former is too much) would be a step forward.

Also, I would encourage the government to get involved in the conversations that are already happening – here for example in the Scottish Open Data group.

There is a large and growing community there, composed of open data practitioners, enthusiasts and consumers, across many disciplines, who can help and are willing to support the government’s work in this area.

Aberdeen Air Quality

Update: A write-up of this event, which took place on 16-17 February 2019, is available on this page.

How much do you care about the quality of the air you breathe as you walk to work or university, take the kids to school, cycle or jog, or open your bedroom window?

How good is the air you are breathing? How do you know? What are the levels of particulates (PM2.5 or PM10) and why is this important?

pm25_comparison

When do these levels go up or down? What does that mean?

Who warns you? Where do they get their data, and how good is it?

Where do you get information, or alerts that you can trust?

We aim to sort this in Aberdeen

Partnering with community groups, Aberdeen University and 57 North Hacklab, we are working on a long-term project to build and deploy community-built and community-hosted sensors for PM2.5 and PM10. We aim to have fifty of these in place across Aberdeen in the next few months. You can see some early ones in place and generating data here.

The first significant milestone of this will be the community workshop we are holding on 16-17 February 2019. If you want to be part of it, you can get a ticket here. But be quick; they are going fast.

Weekend activities

There are loads of things you can do if you attend.

Sensor Building

For a small cost, you can come along and build your own sensor, with someone on hand to help you, and take it home to plug into your home wifi. It will then contribute data for your part of the city.

But we will be doing much more than that.

Working with the data

If you have experience in data science or data analysis, or if you want to work with those who do, there are loads of options to work with the data from existing and future sensors.

These include

  • Analyse historical readings against those from the official government sensors for comparison
  • Use the data (wind speed, humidity and so on) to build live maps of readings and identify sources of emissions
  • Compensate readings from sensors for factors which affect pollution levels, to attempt to understand the emissions of pollutants in a given area
  • Build predictive models of future pollution
  • Fix a minor issue with the existing collected data (see https://github.com/opendata-stuttgart/madavi-api/issues/8)
  • Build an API for accessing the Luftdaten sensor data, to allow querying of that data (see the sketch after this list)
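
As a starting point for the analysis ideas above, here is a minimal sketch of pulling recent readings for a single sensor. The endpoint format reflects my understanding of the Luftdaten v1 API and may have changed, and the sensor id is a placeholder.

```python
import requests

SENSOR_ID = 12345  # placeholder: replace with a real Aberdeen sensor id
API_URL = f"https://api.luftdaten.info/v1/sensor/{SENSOR_ID}/"

def latest_particulates(url=API_URL):
    """Return the most recent PM10 ('P1') and PM2.5 ('P2') values reported by the sensor."""
    readings = requests.get(url, timeout=30).json()
    if not readings:
        return {}
    latest = readings[-1]  # each entry is one recent measurement with its sensordatavalues
    return {v["value_type"]: float(v["value"])
            for v in latest["sensordatavalues"]
            if v["value_type"] in ("P1", "P2")}

if __name__ == "__main__":
    print(latest_particulates())
```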

Software development

If you are a software developer or studying to be one, you could

  • Create alert systems to warn of anticipated spikes in pollutants, perhaps using Twitter or email (a rough sketch follows this list)
  • Add to the code for the Luftdaten sensors to allow connection over a LoRaWAN interface
  • Create LoRaWAN server code to allow sensors to feed up to the Luftdaten website
  • Security-test the IoT code used by the Luftdaten sensors
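
And here is a rough sketch of the simplest possible alert loop suggested in the first bullet: poll one sensor and flag when PM2.5 crosses a threshold. The endpoint and sensor id are the same placeholder assumptions as before, and the actual tweet or email sending is left as a stub.

```python
import time

import requests

SENSOR_ID = 12345                     # placeholder sensor id
API_URL = f"https://api.luftdaten.info/v1/sensor/{SENSOR_ID}/"
PM25_THRESHOLD = 25.0                 # illustrative threshold, ug/m3
POLL_SECONDS = 600                    # check every ten minutes

def current_pm25():
    """Fetch the latest PM2.5 ('P2') value for the sensor, or None if unavailable."""
    readings = requests.get(API_URL, timeout=30).json()
    if not readings:
        return None
    for v in readings[-1]["sensordatavalues"]:
        if v["value_type"] == "P2":
            return float(v["value"])
    return None

def send_alert(value):
    """Stub: swap in a tweet or an email via whichever library you prefer."""
    print(f"ALERT: PM2.5 at {value} ug/m3 exceeds {PM25_THRESHOLD}")

if __name__ == "__main__":
    while True:
        value = current_pm25()
        if value is not None and value > PM25_THRESHOLD:
            send_alert(value)
        time.sleep(POLL_SECONDS)
```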

Community Groups / Educators / Activists / Journalists

You don’t have to be a techie! If you are a concerned citizen, a community activist, a teacher, or a journalist, there is so much you could do. For example:

  • Work out how you can understand the data
  • Identify how this could assist with local issues, campaigns and educational activities
  • Help us capture the weekend by blogging or creating digital content

Even if you just want to be part of the buzz and keep the coffees and teas flowing, that is a great contribution.

See you there!

Ian, Bruce, Andrew and Steve

Header image by Jaroslav Devia on Unsplash

CTC9 – Team Presentations

In this close-out post I shall hand over to the teams themselves to walk you through their CTC9 weekend. Check out the videos using the links below. Use the ‘ctc9’ tag to find all other blog posts about the amazing volunteering experience this weekend.

Team: Soul Cats

Team: The Professionals

Team: ALISS API

CTC9 – What a weekend!

I am so glad I joined the CTC9 project as a volunteer. Blogging about this project was a tremendous experience. There are two aspects of this weekend that amazed me beyond the teams’ achievements.

The idea funnel

It was fascinating to witness the journey we all ventured on – from random ideas on post-its to distilling them down into structured approaches.

ideation
ideas ideas ideas
planning
how things fit together

Team work

The teams seemed to develop naturally based on people’s interests. It is remarkable how smoothly people from different sectors and backgrounds worked together in a very productive way. The Code the City staff did a great job in keeping us all on track.

team work

CTC9 – Near the finish line

Here’s a quick update before the big show-and-tell later on.

Team: ALISS API database

The team has developed a draft version of the website, tucked away on a test server. They have established the first functional search, using the category ‘social isolation’: it returns a list of service providers in the area, drawn from the three source databases. This is a big step forward, as we now know how to program a search and can deliver visible results on a user interface.

The team is also working on searches based on location by postcode or radius.

One expected challenge is the extraction of information from differently formatted data sources. For example, one source database does not provide contact details in dedicated address fields but in a more general description box.
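
To make that concrete, here is a hedged sketch of the sort of normalisation involved: mapping records from differently shaped sources onto one common structure, and pulling a phone number out of a free-text description where no dedicated field exists. The field names and the phone-number pattern are illustrative assumptions, not the real source schemas.

```python
import re

# Illustrative UK phone pattern for pulling numbers out of free text
PHONE_RE = re.compile(r"(\+44\s?\d{4}|\(?0\d{4}\)?)\s?\d{3}\s?\d{3,4}")

def normalise(record, source):
    """Map a raw record from any of the three (hypothetical) sources onto one common shape."""
    if source == "source_a":            # dedicated contact fields
        return {"name": record["name"],
                "phone": record.get("phone"),
                "postcode": record.get("postcode")}
    if source == "source_b":            # contact details buried in a description box
        match = PHONE_RE.search(record.get("description", ""))
        return {"name": record["title"],
                "phone": match.group(0) if match else None,
                "postcode": record.get("post_code")}
    # source_c and anything else: best-effort fallback
    return {"name": record.get("name") or record.get("title"),
            "phone": record.get("telephone"),
            "postcode": record.get("postcode")}

# Example: a source_b record with the phone number inside the description text
print(normalise({"title": "Befriending Service",
                 "description": "Call us on 01224 123 456 for a chat.",
                 "post_code": "AB10 1AB"}, "source_b"))
```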

Team: Soul Cats

This group went back to focusing on the public end users. They came up with various names for this new website that would make it easy to find, playing with words from Scots dialect and proper King’s English. All suggestions were googled to see whether they already exist or would be buried amongst a ton of other results. Ideally, we want something unique!

The team suggested submitting a selection of names to a public forum in order to collect opinions or votes.

Team: The Professionals

The Professionals are a spin-off group from the Soul Cats. It’s a rollercoaster with those Cats! They went back to focusing on the value of this website for health care professionals. Taking a structured approach, they answered four key questions:

  1. Who are key stakeholders?
  2. What are key relationships?
  3. What are key challenges?
  4. What are the gains right now if this project went live?

team-gathering