Scotland’s Covid-19 Open Data

We are in unprecedented times. People are trying to make sense of what is going on around them, and the demand for up-to-date, even up-to-the-minute, information is greater than ever before. Journalists, data scientists, immunologists, epidemiologists and others are looking for data they can turn into information for the broader public, as well as feed into predictive modelling. That means that governments and health services at all levels (UK and Scotland) need to publish that data quickly, consistently, and in a way that makes it easy for data users to consume. They need to look at best practice and quickly adopt those standards and approaches.

Let’s start with what this post is not. It is not a criticism of some very hard pressed people in NHS Scotland and Scottish Government who are trying very hard to do the right thing.

So, what is it? It is an honest suggestion of how the Scottish Government must adapt the way it publishes data on the most pressing issue of modern times.

The last five days

Last Sunday, 15th March, as the number of people in Scotland with Covid-19 started to climb (even if numbers were still low in comparison to other EU countries), I went looking for open data with which I could start to plan some analysis and visualisation. And I found none.

What I did find was a static HTML webpage. This had the figures for that day: the total number of tests conducted, the total number of negative results, and the number of positive cases for each Health Board. The page is overwritten at 2pm each day with the next set of figures. This is an awful practice, also used by Scotrail to hide its performance month on month.

I was able, using the Internet Archive’s Wayback Machine, to fill in some gaps back to 5th March, but that was far from complete. I published what I could on GitHub and mentioned that on Twitter and in a couple of Slack groups. Thankfully a friend, Lesley, was ahead of me in terms of data collection for her work as a data journalist, and was able to furnish testing data back to the start on 24 January 2020. Since then I’ve updated the GitHub repo daily, usually when the data is published at 2pm.
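The daily update described above amounts to appending each day's figures to a running file rather than overwriting them. A minimal sketch of that idea follows. The column names, the `append_snapshot` helper and the figures in the usage lines are all made up for illustration, and this is not necessarily how the repository is actually maintained.

```python
import csv
import io

# Hypothetical column names; the real repository's layout may differ.
FIELDS = ["date", "health_board", "positive_cases"]


def append_snapshot(existing_csv: str, day: str, cases_by_board: dict) -> str:
    """Return CSV text with the day's figures added.

    Rows already recorded for the same date are dropped first, so a
    corrected release replaces the earlier one instead of duplicating it.
    """
    rows = []
    if existing_csv.strip():
        rows = list(csv.DictReader(io.StringIO(existing_csv)))
    rows = [r for r in rows if r["date"] != day]
    for board, cases in cases_by_board.items():
        rows.append({"date": day, "health_board": board, "positive_cases": str(cases)})
    # ISO dates sort correctly as plain strings.
    rows.sort(key=lambda r: (r["date"], r["health_board"]))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder board names and made-up numbers, for illustration only.
    history = append_snapshot("", "2020-03-15", {"Board A": 3, "Board B": 1})
    history = append_snapshot(history, "2020-03-16", {"Board A": 5, "Board B": 1})
    print(history)
```

Keeping one row per date and health board means the history survives each 2pm overwrite, and a same-day correction simply supersedes the earlier rows.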

Almost as soon as I began, a couple of people started to build visualisations based on what I had put in GitHub, including this one. Some said that they were waiting for the numbers, particularly deaths, to climb to more significant levels before they would start to use the data.

Two or three times the data has been published and then corrected, with some test results being reassigned between Shetland and Grampian. This is understandable given the current circumstances.

SG webpage with table of Covid19 daily cases

On 19 March 2020, the 2pm publication was delayed, with the number of fatalities and positive results being published after 3.30pm and the total number of tests being published after 7pm. Again, this is understandable. The present circumstances are unprecedented and processes are still being developed. Up to now much of Scotland’s open data publication has been done, if at all, at a more leisurely and considered pace. It does make one wonder how the processes will cope as the numbers rise exponentially, as they surely will.

Why is this important?

At this time the public are trying to make sense of a very difficult situation. Journalists, scientists and others are trying to assist by interpreting what data there is, including by building visualisations of it. People are also seeking reassurance that the UK and Scottish Governments are on top of the situation. Transparency around government activity such as testing, and around the spread of the virus, would build trust. Indeed there is real concern that Scotland, and the UK as a whole, is not meeting WHO guidance on testing and tracing cases.

But with a static web page, with a limited range of data that is erased daily, this is not possible. Even setting up a scraper to grab the essential content from that page is not feasible if the data is only partially published for long periods.
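To illustrate why partial publication undermines scraping, here is a rough sketch of what a scraper would have to do: read the table, then refuse to record the snapshot if any expected health board is missing. The two-column table layout, the `TableCells` and `parse_snapshot` names, and the set of expected boards are all assumptions, not the structure of the actual Scottish Government page.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def parse_snapshot(html: str, expected_boards: set) -> dict:
    """Return {board: cases}; raise ValueError if the page looks partial."""
    parser = TableCells()
    parser.feed(html)
    cases = {}
    for row in parser.rows:
        # Assumed layout: board name in the first cell, a count in the second.
        if len(row) == 2 and row[1].isdigit():
            cases[row[0]] = int(row[1])
    missing = expected_boards - set(cases)
    if missing:
        raise ValueError(f"incomplete publication, missing: {sorted(missing)}")
    return cases
```

A check like this only detects a gap. It cannot recover figures that are published hours late or not at all, which is why a scraper is a poor substitute for properly published data.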

We have some useful data visualisations such as this set by Lesley herself. But what can be done is limited. Deaths per health board are collected, we’ve been told, but they are not published; only a Scotland-level total is.

I’ve had it confirmed by someone I know in the Scottish Government that they are looking at creating and posting Linked Open Data, which I suspect will be on their platform. That platform is a great resource, but it is seen by many as a barrier to actually getting data quickly and simply.

Italian government GitHub repo

Compare this with the Italian Government, who have won plaudits from the data science, journalism and developer communities for making their data available quickly and simply using GitHub as the platform. This is one that is familiar to the end-users. They also have a great range of background information (look at it in Chrome, which will translate it). On that platform they publish daily national and regional statistics for:

  • date
  • state
  • hospitalised with symptoms
  • intensive care
  • total hospitalised
  • home isolation
  • total currently positive
  • new currently positive
  • discharged healed
  • deceased
  • total cases
  • swabs tests.
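As a rough sketch of how simple such a publication format can be, the fields listed above can be modelled as one record per day and region and written out as CSV. The field names below are English renderings of that list, not the Italian repository's actual column headers, and `DailyRecord` and `to_csv` are hypothetical names.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DailyRecord:
    # Names mirror the list above; they are illustrative, not the real headers.
    date: str
    state: str
    hospitalised_with_symptoms: int
    intensive_care: int
    total_hospitalised: int
    home_isolation: int
    total_currently_positive: int
    new_currently_positive: int
    discharged_healed: int
    deceased: int
    total_cases: int
    swabs: int


def to_csv(records: list) -> str:
    """Serialise records to CSV text with a single header row."""
    out = io.StringIO()
    names = [f.name for f in fields(DailyRecord)]
    writer = csv.DictWriter(out, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return out.getvalue()
```

A flat file like this, appended to each day and kept under version control, is something most of the data users mentioned in this post could load directly without any special tooling.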

Not only is the data feeding the larger, world-wide analysis such as that by Johns Hopkins University, but people at a national level are using that data to create some compelling, interactive visualisations such as this one. As each country starts to recover, and infections and deaths start to slow, being able to visualise that will depend on having data to drive those views.

[edited] Wouldn’t a dashboard such as this one for Singapore, built by volunteers, be a good thing for Scotland? We could do it with the right data supplied.

Singapore dashboard

[/edited]

So, this is a suggestion, or rather a request, to NHS Scotland and the Scottish Government to put in place a better set of published data, which is made available in as simple and as timely a fashion as can be accomplished under the present circumstances. Give us the data and we’ll crowd-source some useful tools built on it.

How to do that?

The Scottish Government should look to fork one of the current repositories and use that as a starting point. In an ideal world that would be the Italian one, but even starting with my simple one (if the former is too much) would be a step forward.

Also, I would encourage the government to get involved in the conversations that are already happening – here for example in the Scottish Open Data group.

There is a large and growing community there, composed of open data practitioners, enthusiasts and consumers, across many disciplines, who can help and are willing to support the government’s work in this area.

SODU2020 – a guest post by Sarah Roberts of Swirrl

Scottish Open Data Unconference

It’s all going on in Scotland in March. As we spring into Spring (nearly there!), we’re very excited to be sponsoring, and going to, the Scottish Open Data Unconference in Aberdeen on 14th and 15th March. Topics are pitched in the morning of each day, an agenda is created and participants talk as much as the chair.

Our colleague Jamie Whyte is lucky enough to have a ticket, so if you spot him do say hi! Here are some recent open data happenings we’ve picked up on our radar…

Scottish Index of Multiple Deprivation

The Scottish Index of Multiple Deprivation was released late January and we loved the accompanying briefing document, which put the numbers into context (find it here). The data’s also available on the Scottish Government’s Open Data site, where you can use the Atlas section to find key data zones and see key facts about them. The below screenshot is of the data zone which is ranked as the most deprived in the 2020 SIMD.

SIMD - Greenock Town centre

People are already making stuff with the data: below is a screenshot of Jamie’s lava lamp visualisation.

Commentary, explanation and analysis from others include: Alasdair Rae’s summary matrix of the SIMD data by council area, a story graphic of the data, an interactive mapping tool, an analysis blog post from the Scottish Parliament Information Centre and news articles, like this one from the BBC.

Jamie Whyte - King of the Lava Lamp

W3C Community Group

Another thing we’ve noticed is that there’s preliminary work happening on GraphQL and RDF, which aims to serve as a case for future standardisation. More on this here, where you can send a request to join the group if this is your bag. It’s definitely ours! 

Collaborative work with data

Last, but not least, collaboration. This is a wide concept but it’s also a trend that’s cropping up in different aspects of working with open data. Here are some we’ve noticed:

“promote trust and co-operation between government and civil society.”

  • The Office for National Statistics is publishing data in a collaborative project across a spread of organisations including ONS, HMRC, MHCLG, DWP and DIT. The Connected Open Government Statistics (COGS) project involves a lot of technical collaborative work in harmonising codelists, as well as harmonising a data model and all the processes that go into it. More on this project here on the GSS blog site. 
  • 2019 saw a growing, collaborative API community, with API events involving government and people working with government. We went to one in Newcastle and another one’s arranged for March 16th (if you’re still hungry for more after the unconference!) 
  • The Open Data Institute have been busy, busy, busy. Jeni Tennison spoke about how collaboration is key for new institutions of the data age at our Power of Data conference in October (catch that video here). The ODI have also been working on a data and public services toolkit, and there’s an introductory event for this in Edinburgh just a few days before the Scottish Open Data Unconference.

Thanks for reading! If you’d like to find out a bit more about who we are and what we do, take a look at our website, our blog, our latest newsletter and / or our twitter stream. We’ve just been named as one of the FT1000 fastest growing companies in Europe and we’re still hiring, so if you think you can help us we’d love to hear from you. 

We love data and we’re delighted to be sponsoring the Scottish Open Data Unconference. See you there.