Scotland’s Covid-19 Open Data

We are in unprecedented times. People are trying to make sense of what is going on around them, and the demand for up-to-date, even up-to-the-minute, information is greater than ever before. Journalists, data scientists, immunologists, epidemiologists and others are looking for data to use to develop that information for the broader public, as well as to feed into predictive modelling. That means that governments and health services at all levels (UK and Scotland) need to be publishing that data quickly, consistently, and in a way that makes it easy for data users to consume. They need to look at best practice and quickly adopt those standards and approaches.

Let’s start with what this post is not. It is not a criticism of some very hard pressed people in NHS Scotland and Scottish Government who are trying very hard to do the right thing.

So, what is it? It is an honest suggestion of how the Scottish Government must adapt the way it publishes data on the most pressing issue of modern times.

The last five days

Last Sunday, 15th March, as the number of people in Scotland with Covid-19 started to climb (even if numbers were still low in comparison to other EU countries), I went looking for open data with which I could start to plan some analysis and visualisation. And I found none.

What I did find was a static HTML webpage. This had the figures for that day: the total number of tests conducted, the total number of negative results, and the number of positive cases for each Health Board. This page is then overwritten at 2pm the next day. This is an awful practice, also used by Scotrail to hide its performance month on month.

I was able, using the Internet Archive’s Wayback Machine, to fill in some gaps back to 5th March, but that was far from complete. I published what I could on GitHub and mentioned that on Twitter and in a couple of Slack groups. Thankfully a friend, Lesley, was ahead of me in terms of data collection for her work as a data journalist, and was able to furnish testing data back to the start on 24 January 2020. Since then I’ve updated the GitHub repo daily, usually when the data is published at 2pm.
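For anyone wanting to repeat that gap-filling, here is a minimal sketch of looking up an archived copy of a page for a given day. It assumes the Wayback Machine's public availability endpoint and the JSON shape it returns; the function names are my own, and the page address is a parameter because the actual address is not reproduced here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine's public availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url: str, day: str) -> str:
    """Build a query asking for the snapshot closest to `day` (YYYYMMDD)."""
    return WAYBACK_API + "?" + urlencode({"url": page_url, "timestamp": day})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived snapshot URL from the JSON reply, or None."""
    reply = json.loads(response_text)
    closest = reply.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def find_snapshot(page_url: str, day: str) -> Optional[str]:
    """Ask the archive for a copy of `page_url` near `day` (needs network)."""
    with urlopen(availability_url(page_url, day)) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

The archive returns the closest snapshot it holds, which may be from a different day, so the date of each result still needs checking by hand before the figures are used.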

Almost as soon as I began, a couple of people started to build visualisations based on what I had put in GitHub, including this one. Some said that they were waiting for the numbers, particularly deaths, to climb to more significant levels before they would start to use the data.

Two or three times the data has been published then corrected with some test results for Shetland / Grampian being reassigned between the two. This is understandable given the current circumstances.

SG webpage with table of Covid19 daily cases

On 19 March 2020, the 2pm publication was delayed, with the number of fatalities and positive results being published after 3.30pm and the total number of tests being published after 7pm. Again, this is understandable. The present circumstances are unprecedented, and processes are still being developed. Up to now much of Scotland’s open data publication has been done, if at all, at a more leisurely and considered pace. It does make one wonder how the processes will cope as the numbers rise exponentially, as they surely will.

Why is this important?

At this time the public are trying to make sense of a very difficult situation. Journalists, scientists and others are trying to assist in that by interpreting what data there is for them, including building visualisations of it. People are also seeking reassurance that the UK and Scottish Governments are on top of the situation. Transparency around government activity such as testing, and the spread of the virus, would build trust. Indeed there is real concern that Scotland, and the UK as a whole, is not meeting WHO guidance on testing and tracing cases.

But with a static web page, with a limited range of data that is erased daily, this is not possible. Even setting up a scraper to grab the essential content from that page is not feasible if the data is only partially published for long periods.
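To make the partial-publication problem concrete, here is a rough sketch of the guard any archiving script would need: it only stores a day's figures once all of them are present, and it replaces an earlier entry for the same date so that corrections are kept. The key names are hypothetical, chosen to mirror the figures on the page (total tests, negative results, positive cases per Health Board), and the parsing of the page itself is deliberately left out.

```python
from typing import Dict, List

# Hypothetical key names mirroring the figures shown on the daily page.
REQUIRED_KEYS = ("date", "total_tests", "negative_results",
                 "positive_by_health_board")


def is_complete(snapshot: Dict) -> bool:
    """True only if every required figure is present and non-empty."""
    for key in REQUIRED_KEYS:
        value = snapshot.get(key)
        # Zero is a valid figure; None, "" and {} mean "not published yet".
        if value is None or value == "" or value == {}:
            return False
    return True


def store_snapshot(history: List[Dict], snapshot: Dict) -> bool:
    """Store a day's figures; return False if the day is only part-published.

    A later snapshot for the same date replaces the earlier one, so that
    corrected figures overwrite the first version instead of duplicating it.
    """
    if not is_complete(snapshot):
        return False
    for index, row in enumerate(history):
        if row["date"] == snapshot["date"]:
            history[index] = snapshot
            return True
    history.append(snapshot)
    return True
```

A script built this way would have had to retry for more than five hours on 19 March before it could store anything, which is the practical cost of publishing a day's figures in pieces.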

We have some useful data visualisations such as this set by Lesley herself. But what can be done is limited. Deaths per health board are collected, we’ve been told, but they are not published; only a Scotland-level total is.

I’ve had it confirmed by someone I know in the Scottish Government that they are looking at creating and posting Linked Open Data, which I suspect will be on their platform. That platform is a great resource, but it is seen by many as a barrier to actually getting data quickly and simply.

Italian government GitHub repo

Compare this with the Italian Government, who have won plaudits from the data science, journalism and developer communities for making their data available quickly and simply using GitHub as the platform. This is one that is familiar to the end-users. They also have a great range of background information (look at it in Chrome, which will translate it). On that platform they publish daily national and regional statistics for:

  • date
  • state
  • hospitalised with symptoms
  • intensive care
  • total hospitalised
  • home isolation
  • total currently positive
  • new currently positive
  • discharged healed
  • deceased
  • total cases
  • swab tests.
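As an illustration of how little is needed, the list above can be sketched as a single record type written out as CSV. The field names below are my own English renderings of the list, not the column names the Italian repository actually uses, and the numeric types are an assumption.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class DailyRecord:
    """One row per day and area; names are illustrative, not official."""
    date: str
    state: str
    hospitalised_with_symptoms: int
    intensive_care: int
    total_hospitalised: int
    home_isolation: int
    total_currently_positive: int
    new_currently_positive: int
    discharged_healed: int
    deceased: int
    total_cases: int
    swab_tests: int


def to_csv(records: Iterable[DailyRecord]) -> str:
    """Serialise records as CSV text with a header row."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(DailyRecord)]
    writer = csv.DictWriter(buffer, fieldnames=column_names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

One flat file like this, appended to every day and never overwritten, is something spreadsheets, scripts and visualisation tools can all read directly.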

Not only is the data feeding the larger, world-wide analysis such as that by Johns Hopkins University, but people at a national level are using that data to create some compelling, interactive visualisations such as this one. As each country starts to recover and infections and deaths start to slow, visualising that progress will depend on having the data to drive those views.

[edited] Wouldn’t a dashboard such as this one for Singapore, built by volunteers, be a good thing for Scotland? We could do it with the right data supplied.

Singapore dashboard

[/edited]

So, this is a suggestion, or rather a request, to NHS Scotland and the Scottish Government to put in place a better set of published data, which is made available in as simple and as timely a fashion as can be accomplished under the present circumstances. Give us the data and we’ll crowd-source some useful tools built on it.

How to do that?

The Scottish Government should look to fork one of the current repositories and use that as a starting point. In an ideal world that would be the Italian one, but even starting with my simple one (if the former is too much) would be a step forward.

Also, I would encourage the government to get involved in the conversations that are already happening – here for example in the Scottish Open Data group.

There is a large and growing community there, composed of open data practitioners, enthusiasts and consumers, across many disciplines, who can help and are willing to support the government’s work in this area.