Joining the dots between Britain’s historical railways using Wikidata – Part One

A bit of background

The evening before Code The City 18 I started to think about what fun project to spend the day on at our one-day mini-hack event. After reading Ian Watt’s blog post about Wikidata and spending 10 minutes or so playing around with it, I decided a topic for further experimentation was required.

At the time of writing, I’m just over a third of the way through my very interesting part-time online MA Railway Studies at University of York. Looking at Britain’s railways from their very beginning, there are many railway companies from 1821 onwards. Some of these companies merged, some were taken over, others just disappeared whilst others were replaced by new companies. All these amalgamations eventually led to the “Big Four” groupings in 1923 and then on to British Railways with the nationalisation of the railways in 1948. British Railways rebranded as British Rail in 1965 and then splintered into numerous companies as a result of the denationalisation of the 1990s.

With the railway companies appearing in some form or another in Wikipedia, I thought it would be useful to be able to pick any railway company and view the chain of companies that led to it and those that followed. The ultimate goal would be to be able to bring up the data for British Rail and then see the whole past unfold to the left and the future unravel to the right. In theory at least, Wikidata should allow me to do that. 

No software coding skills are required to see the results of my experimentation: by clicking on the links provided (usually directly after the code) it is possible to run the queries and see what happens. However, using the code provided as a start, it is possible to build on the examples to find out things for yourself.

Understanding Wikidata and SPARQL

SPARQL is the query language used to retrieve various data sets from Wikidata via the Wikidata Query Service.

As is always the case with anything software related, the examples and tutorials never seem to handle those edge cases that you seem to hit within the first 5 minutes. Maybe I hit these cases so soon because I jumped straight from the “hello world” of requesting all the railway companies formed in the UK to trying to build the more complex web of railway companies, rather than working my way through all the simpler steps? However, my belief is to fail quickly, leaving plenty of time to fail some more before succeeding. After all, you never see a young child plan out a strategy when they are learning to get the different shaped blocks through the correct holes.

At the time of writing…

Comments about the state of certain items of data were relevant at the time I wrote this article. As one of the big features of Wikidata is that it is constantly being updated, expanded and corrected, the data referenced may have changed by the time you read this. Some of the changes are ones I’ve made in reaction to my discoveries, but I have left some issues out there for others to fix.

A simple list

First off, I created a simple SPARQL query to request all the railway companies that were formed in the UK.

SELECT ?company ?companyLabel
WHERE {
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))

Run the query

The output of this query can be seen by running it yourself here by clicking on the white-on-blue arrow displayed on the Wikidata Query Service console. It is safe to modify the query in the console without messing up my query as any changes cause a new bookmarked query to be created. So please experiment as that’s the only way to learn.
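For anyone who would rather run the query from a script than from the console, a minimal Python sketch might look like the one below. It assumes the public endpoint at https://query.wikidata.org/sparql and its JSON result format; the function names and the user agent string are illustrative only and are not part of the original experiment.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# The "simple list" query from above.
QUERY = """
SELECT ?company ?companyLabel
WHERE {
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))
"""


def build_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{ENDPOINT}?{params}"


def run_query(query: str) -> list:
    """Send the query and return the result rows (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        # Placeholder user agent; replace it with one that identifies you.
        headers={"User-Agent": "railway-companies-example/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each row maps a variable name to a dict holding its "value".
    return payload["results"]["bindings"]
```

Calling run_query(QUERY) and reading row["companyLabel"]["value"] from each row should give the same company names as the console, subject to whatever the data looks like on the day.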

Now what does the query mean and where do all those magic numbers come from?

  • wdt:P31 wd:Q249556 means get me every item with a direct (“truthy”) statement, which is what the wdt: prefix selects, whose instance of property (P31) has the value railway company (Q249556).
  • wdt:P17 wd:Q145 means narrow the results so far to those that have the property country (P17) set to United Kingdom (Q145).

Where did I get those numbers from? First, I went to Wikipedia and searched for a railway company, LMS Railway, and got to the page for London, Midland and Scottish Railway. From here I went to the Wikidata item for the page.

Wikipedia page for LMSR that shows how to get to the Wikidata

From here I hovered my pointer over instance of, railway company, country and United Kingdom to find out those magic numbers.

Wikidata page for LMSR

Some unexpected results

Some unexpected companies turned up in the results list because my query was not specific enough. For example, Algeciras Gibraltar Railway Company is located in Gibraltar, but because its headquarters were registered in the UK the data has its country as United Kingdom. To filter my results down to just those that are located in the UK I tried searching for those that had the property located in the administrative territorial entity (P131) set to any of the following values:

  • England (Q21) 
  • Northern Ireland (Q26)
  • Scotland (Q22)
  • Wales (Q25)
  • Ireland (Q57695350) (covering 1801 – 1922)

using this query:

SELECT ?company ?companyLabel ?countryLabel
WHERE {
  VALUES ?country { wd:Q21 wd:Q26 wd:Q22 wd:Q25 wd:Q57695350 }
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145; wdt:P131 ?country.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))

Run the query

However, that dropped my result set from 228 to 25 due to not all the companies having that property set.

Note: When trying to find out what values to use it is often quick and easy to run a simple query to ask Wikidata itself. To find out what all the values were for UK countries I wrote the following that asked for all countries that had an instance of value of country within the United Kingdom (Q3336843):

SELECT ?country ?countryLabel
WHERE {
  ?country wdt:P31 wd:Q3336843 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
}

Run the query

Dates

In order to see what other information could easily be displayed for the companies, I looked at the list of properties on the London, Midland and Scottish Railway. I saw several dates listed so decided that would be my next area of investigation. There is an inception (P571) date that shows when something came into being, so I tried a query with that:

SELECT ?company ?companyLabel ?inception
WHERE {
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  ?company wdt:P571 ?inception .
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))

Run the query.

This demonstrated two big issues with the data. Firstly, the result set had dropped from 228 to 106, indicating that not all the company entries have the inception property set. Secondly, only one, Scottish North Eastern Railway, had a full date (29th July 1856) specified; the rest only had a year, which was being displayed as 1st January of that year. Adding the OPTIONAL clause to the inception request returns the full data set, with blanks where there is no inception date specified.

SELECT ?company ?companyLabel ?inception
WHERE {
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))

Run the query
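The difference between the two versions of the query can be pictured as the difference between an inner join and a left join. The Python sketch below mimics that behaviour on a few made-up rows; the company names and years are placeholders, not Wikidata values.

```python
# Hypothetical sample rows standing in for the query's two patterns.
companies = ["Company A", "Company B", "Company C"]
inception = {"Company A": 1845, "Company C": 1923}  # Company B has no date


def required_match(names, dates):
    """Mimic a required triple pattern: rows without a date are dropped."""
    return [(name, dates[name]) for name in names if name in dates]


def optional_match(names, dates):
    """Mimic OPTIONAL: every row is kept, with None where no date exists."""
    return [(name, dates.get(name)) for name in names]


print(required_match(companies, inception))
print(optional_match(companies, inception))
```

The first call returns two rows and the second returns three, which is the same effect as the result set recovering from 106 to 228 once OPTIONAL is added.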

Railway companies are not a straightforward case when it comes to a start date due to there being no one single start date. Each railway company required an Act of Parliament to officially enable it to be formed and grant permission to build the railway line(s). This raises the question: should the start date be the date the Act was passed, the date the company was actually formed, or the date that the company commenced operating its service? Here is a revised query that gets both the start time (P580) and end time (P582) of the company if they have been set:

SELECT ?company ?companyLabel ?inception ?startTime ?endTime
WHERE {
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  OPTIONAL { ?company wdt:P580 ?startTime. }
  OPTIONAL { ?company wdt:P582 ?endTime. }
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))

Run the query

Unfortunately, of the 228 results only one, London, Midland and Scottish Railway, has both a startTime and an endTime, and London and North Eastern Railway is the only one with just an endTime. Based on these results it looks like startTime and endTime are not generally used for railway companies. Looking through the data for Scottish North Eastern Railway did turn up a new source of end dates in the form of the dissolved, abolished or demolished (P576) property. Adding a search for this resulted in 9 companies with dissolved dates.

SELECT ?company ?companyLabel ?inception ?startTime ?endTime ?dissolved
WHERE {
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  OPTIONAL { ?company wdt:P580 ?startTime. }
  OPTIONAL { ?company wdt:P582 ?endTime. }
  OPTIONAL { ?company wdt:P576 ?dissolved. }
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))

Run the query

There is no logic in which companies have this property: they range from Scottish North Eastern Railway dissolving on 10th August 1866 to several that ended due to the formation of British Railways, the more recent British Rail ending on 1st January 2001, and the short lived National Express East Coast (1st January 2007 – 1st January 2009). However, once again, the dates are at times misleading as, in the case of National Express East Coast, it is only the year rather than the full date that is recorded in the inception and dissolved, abolished or demolished properties.
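The misleading 1st January dates come from year-only values being rendered as full timestamps. Wikidata's time datatype stores a precision alongside each value (9 for year, 10 for month, 11 for day), so a display routine can avoid overstating what is known. The sketch below assumes the timestamp and its precision have already been retrieved; the function name is illustrative, and anything coarser than a year is simply shown as a year.

```python
from datetime import datetime

# Precision codes from Wikidata's time datatype.
YEAR, MONTH, DAY = 9, 10, 11

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def describe(timestamp: str, precision: int) -> str:
    """Format a timestamp such as '1856-07-29T00:00:00Z' no more precisely than is known."""
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    month = MONTH_NAMES[moment.month - 1]
    if precision >= DAY:
        return f"{moment.day} {month} {moment.year}"
    if precision == MONTH:
        return f"{month} {moment.year}"
    # Year precision (or coarser, which this sketch does not distinguish).
    return str(moment.year)


print(describe("1856-07-29T00:00:00Z", DAY))   # a fully specified date
print(describe("2007-01-01T00:00:00Z", YEAR))  # a year-only value
```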

Some of the railway companies, such as Underground Electric Railways Company of London, have another source of dates: qualifiers attached to the railway company value of their instance of statement. It is possible to extract the start and end dates, if they are present, by nesting a qualifier lookup inside an OPTIONAL clause. In the line:

OPTIONAL {?company p:P31 [ pq:P580 ?companyStart]. }

the start time (P580) qualifier is extracted from the instance of (P31) statement if it exists.

SELECT ?company ?companyLabel ?inception ?startTime ?endTime ?dissolved ?companyStart ?companyEnd
WHERE {
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  OPTIONAL { ?company wdt:P580 ?startTime. }
  OPTIONAL { ?company wdt:P582 ?endTime. }
  OPTIONAL { ?company wdt:P576 ?dissolved. }
  OPTIONAL { ?company p:P31 [ pq:P580 ?companyStart] . }
  OPTIONAL { ?company p:P31 [ pq:P582 ?companyEnd] . }
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
  }
}
ORDER BY (lcase(?companyLabel))

Run the query
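With several possible sources for a start date now in the results, one way to use them is to take the first one that is present. The Python sketch below shows the idea; the order of preference is an assumption rather than anything the data dictates, and the sample row is made up. SPARQL's own COALESCE function could do the same job inside the query.

```python
from typing import Optional


def first_known(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is not None, or None if all are missing."""
    for value in candidates:
        if value is not None:
            return value
    return None


# Hypothetical row shaped like one result of the query above.
row = {
    "inception": None,
    "startTime": None,
    "companyStart": "1902-01-01T00:00:00Z",  # placeholder value
}

# Assumed preference: inception, then startTime, then the qualifier date.
start = first_known(row["inception"], row["startTime"], row["companyStart"])
print(start)
```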

Another date that can be used to work out the start and end of the companies can be found hanging off the values of a very useful pair of properties: replaced by (P1366) and replaces (P1365). This conveniently connects into the next part of my exploration, which will follow in Part Two. Although, as with many railway related things, the exact time of arrival of Part Two cannot be confirmed.
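As a taste of what following those two properties involves, here is a small Python sketch that walks a set of replaced by links forwards and, by inverting them, backwards. The companies and links are placeholders, not real Wikidata results, and the function names are illustrative.

```python
# Hypothetical "replaced by" links, keyed by company label.
REPLACED_BY = {
    "Company A": ["Company C"],
    "Company B": ["Company C"],
    "Company C": ["Company D"],
}


def invert(links):
    """Build the reverse ("replaces") mapping from the "replaced by" one."""
    reverse = {}
    for source, targets in links.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse


def reachable(start, links):
    """Return every company reachable from start by following links, in visit order."""
    seen, queue = [], list(links.get(start, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # guards against loops in the data
        seen.append(current)
        queue.extend(links.get(current, []))
    return seen


successors = reachable("Company A", REPLACED_BY)            # the "future"
predecessors = reachable("Company C", invert(REPLACED_BY))  # the "past"
print(successors, predecessors)
```

Starting from any one company, the two calls give the chain that followed it and the chain that led to it.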

[Header photograph taken by Andrew Sage]

Aberdeen Plaques – Part One

On Saturday 14th December 2019 we ran a one-day mini hack event. The idea behind it was for people to come along for a day to work on their side projects and, if they needed support, attempt to persuade others to assist them.

That’s what I did with my Aberdeen Plaques project: something I’d had on the back burner for more than a year.

Why do it?

The commemorative plaques which are dotted around the city are a perfect candidate for open data. They have a subject, usually some dates, are located somewhere, and are of different types etc. Making that all available as open data would open up a whole range of possibilities.

Some Aberdeen plaques

If we captured all of that well then we could do analysis on the data (ratio of women to men, most represented professions), create walking routes (maybe one for the arts, one for the sciences and so on), create timelines to see what periods are more represented.

Having recently trained as a Wikimedia UK trainer, and having experimented with some of the tools (Wikimedia Commons, Wikidata, Wikipedia, Histropedia), I was convinced that these were the right way to go.

Pre-event prep

So, in advance of the hack day I’d done a bit of prep in the two weeks running up to the day itself.

I’d created a spreadsheet which recorded:
* the subject (person or ‘thing’)
* gender, if known
* the link to the now-retired city council plaques system (hidden from public view)
* the location, if known
* the geo coordinates (to be determined)
* whether the subject had a Wikipedia page (tbd)
* whether there was an image of the plaque on Wikimedia Commons (tbd)
* whether the subject of the plaque was represented on Wikidata (tbd)
* any identifiers on Open Plaques (tbd)
* any external links (eg to Flickr for photos)

I’d then populated some of the data (eg whether there were images of the plaque on Wiki Commons) as well as some other bits. But most cells were blank.

Pre-event spreadsheet

As a keen walker and photographer I had also photographed and uploaded seventeen plaque images to Wiki Commons in the lead up, so that we would have some images to work with.

How to use our time most effectively on the day?

Our aim for the day was then to find out what data / info / images existed, fill in the gaps, and explore how to use Wikidata to store and retrieve data, and how we could potentially create maps, timelines and similar new products.

What we did on the day

At the start of the event we pitched our project ideas, and I managed to persuade five others (Angela, Mike, Stephen, James and Steve) to join me in working on the plaques project.

Angela and Mike, and later Angela and Stephen, would go out and take photographs. Steve, James and I would work on the data capture, completing research on what existed, creating new entries for the data on Wikidata, and testing queries on the Wikidata Query Service.

How we did it

We used the spreadsheet that I had set up to capture all of the data we’d gathered – and as it evolved it would show progress as well as what was still lacking. We had no expectations that we would do it all on the day, but we could pick away at it in future weeks and months.

In the run-up to the event I’d discovered The Pingus’ album of plaques photographs on Flickr. Sadly these had not been published with a licence that would allow us to use them. I’d sent a request, a few days before CTC18, for them to change the licence for the Aberdeen plaques pictures to a CC-SA one. This would have allowed our republishing on Wiki Commons. Sadly it didn’t elicit a response. But the album did show that there were many more plaques than the old ACC system listed. And it was possible to get co-ordinates from them. So the number of plaques to deal with kept growing.

During the day James filled in loads of gaps in which subjects were on Wikipedia and which on Wikidata.

Steve and I experimented with capturing and querying the data. Structuring that in a way that aids recall through the Wikidata Query Service was an iterative process. Firstly I tried adding a statement ‘commemorative plaque image’ (P1801) into the Wikidata record for the subject, as you can see in this first example: https://www.wikidata.org/wiki/Q2095630. But that limited what we could do.

So, we discovered that we could create a new object which was an instance of commemorative plaque. Our first attempt was https://www.wikidata.org/wiki/Q78438703 and we evolved what we captured there, adding statements, and Steve discovered the ‘OpenPlaques plaque ID’ (P1893). Incidentally we also tried ‘OpenPlaques subject ID’ (P1430), but adding that to the plaque object throws an error. The latter should be added to the person record, not the plaque.

At the end of CTC18

We ended the day with

  • 138 plaques listed.
  • 57 sets of co-ordinates identified
  • 68 Wikipedia articles identified as matching plaque subjects (and eleven plaque subjects who had NO Wikipedia page)
  • 36 Images in WikiCommons
  • 77 WikiData entries for the subject of the plaques (existing or created)
  • 11 new wikidata entries for the plaques themselves
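
To put those tallies into proportion, the short Python sketch below expresses each one as a share of the 138 plaques listed. Measuring every count against the plaque total is an assumption made for illustration (a subject can have more than one plaque), and the labels are paraphrased from the list above.

```python
TOTAL_PLAQUES = 138

# Counts reported at the end of the day.
progress = {
    "co-ordinates identified": 57,
    "Wikipedia articles matched": 68,
    "images in Wikimedia Commons": 36,
    "Wikidata entries for subjects": 77,
    "Wikidata entries for plaques": 11,
}


def coverage(count, total=TOTAL_PLAQUES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in progress.items():
    print(f"{label}: {count}/{TOTAL_PLAQUES} ({coverage(count)}%)")
```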

This was a great leap forward in one day and would pave the way for future work.

What next?

Since CTC18 ended, I’ve got firmly stuck into this project over the xmas break. Over the last three weeks I have photographed over a hundred plaques (plenty of walking) and have created Wikidata entries for most plaques and also for their subjects.

I’ll cover all of that, and how we can now use the data in part two, coming soon.

We help kids in regeneration areas. What’s one of them?

At CTC we work with ONE Codebase to deliver Young City Coders classes. These are after school activities to encourage young people to get into coding by trying Scratch, Python and other languages in a CoderDojo-like environment.

Inoapps generously gave us some funding to cover costs and donated old laptops (as did the James Hutton Institute) which we cleaned up and recycled into machines they could use.

All of which is great – and we have 20-25 kids each session starting to get into these coding languages.

The Challenge

But there is an issue – our kids are overwhelmingly from west-end schools. And we have an aim to help kids in regeneration areas where opportunities are generally fewer.

So, that means identifying Aberdeen schools that fall in the regeneration areas and contacting the head teacher and having a discussion about what help they would like to see us provide. Simple?

No.

Search for regeneration areas

Starting with the basics – what are the regeneration areas of Aberdeen? According to Google, the Aberdeen City Council website doesn’t tell us. Certainly not in the top five pages of results (and yes, I did go down that far).

Google’s top answer is an Evening Express article which says that there are five regeneration areas: Middlefield, Woodside, Tillydrone, Torry and Seaton. From what I have heard that sounds like it might be about right – but surely there is an official source of this.

Further searching turns up a page from Pinacl Solutions who won a contract from ACC to provide wifi in the Northern regeneration areas of “Northfield, Middlefield, Woodside and Tillydrone.” Which raises the question of whether Northfield is or isn’t a sixth regeneration area.

The Citizens Advice Bureau Aberdeen has an article on support services for regeneration areas of “Cummings Park, Middlefield, Northfield, Seaton, Tillydrone, Torry, Woodside and Powis.” That adds two more to our list.

Other sites report there being an “Aberdeen City Centre regeneration area.” Is that a ninth?

Having a definitive and authoritative page from ACC would help. Going straight to their site and using the site’s own search function should help. I search for “regeneration areas” and then just “regeneration.”

ACC results for regeneration areas

I get two results: “Union Street Conservation Area Regeneration Scheme” and “Buy Back Scheme”. The latter page has not a single mention of regeneration despite the site throwing up a positive result. The former appears to be all about the built environment. So it is probably not a ninth one in the sense that the others are. Who knows?

So what are the regeneration areas – and how can I find which schools fall within them?

Community Planning Aberdeen

Someone suggested that I try the Community Planning Aberdeen site. Its lack of a site search wasn’t very helpful, but using Google to restrict results to that domain threw up a mass of PDFs.

After wading through half a dozen of these I could find no list or definition of what the regeneration areas of the city are. Amending the query to a specific “five regeneration areas” or “eight….” didn’t work.

Trying “seven regeneration areas” did return this document with a line: “SHMU supports residents in the seven regeneration areas of the city.” So, if that is correct then it appears there are seven. What they are – and which of the eight (or nine) we’ve found so far is not included – is still unknown.

Wards, neighbourhoods, districts, areas, school catchment areas

And – do they map onto council wards or are they exact matches for other defined areas – such as neighbourhoods?

It turns out that there are 13 council wards in the city. I had to manually count them from this page. I got there via Google, as searching the ACC site for Council Wards doesn’t get you there.

I seem to remember there were 37(?) city neighbourhoods identified at one time. To find them I had to know that there were 37, as searching for “aberdeen neighbourhoods” wasn’t specific enough to return any meaningful list or useful page.

And until we find out what the regeneration areas are, and we can work out which primary and secondary schools fall in those areas, we can’t do very much. Which means that the kids who would benefit from code clubs most don’t get our help.

I thought this would be easy!

At the very minimum I could have used a web page with a list of regeneration areas and some jpg maps to show where they are. That’s not exactly hard to provide. And I’d make sure that the SEO was done in a way that it performed well on Google (oh and I’d sort the site’s own search). But that would do at a pinch. Sticking at that would miss so many opportunities, though.

Better would be a set of shapefiles or GeoJSON (ideally presented on the almost empty open data platform) with polygons that I could download and overlay on a background map.

That done, I could download a set of school boundaries (they do exist here – yay), overlay those and work out the intersections between the two. Does the school boundary overlap a regeneration area? If so, it is on our target list to help.
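If polygon data were available, the overlap check itself is not hard. The Python sketch below uses only the standard library and made-up square boundaries; a real version would more likely load GeoJSON and use a geometry library. It treats coordinates as flat x/y pairs and does not handle degenerate cases such as boundaries that only touch or share an edge.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is the point inside the polygon (a list of (x, y) pairs)?"""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, 0 or -1."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def segments_cross(p1, p2, q1, q2):
    """True if the two segments properly cross (merely touching does not count)."""
    return (orientation(p1, p2, q1) * orientation(p1, p2, q2) < 0
            and orientation(q1, q2, p1) * orientation(q1, q2, p2) < 0)


def edges(polygon):
    """Return the polygon's edges as pairs of consecutive vertices."""
    return [(polygon[i], polygon[(i + 1) % len(polygon)]) for i in range(len(polygon))]


def polygons_overlap(a, b):
    """True if simple polygons a and b overlap (degenerate cases not handled)."""
    if any(segments_cross(p1, p2, q1, q2)
           for p1, p2 in edges(a) for q1, q2 in edges(b)):
        return True
    # No edges cross, so one polygon contains the other or they are apart.
    return point_in_polygon(a[0], b) or point_in_polygon(b[0], a)


# Hypothetical shapes: one regeneration area and three school boundaries.
regeneration_area = [(0, 0), (4, 0), (4, 4), (0, 4)]
schools = {
    "School A": [(3, 3), (6, 3), (6, 6), (3, 6)],          # partly overlaps
    "School B": [(10, 10), (12, 10), (12, 12), (10, 12)],  # well away
    "School C": [(1, 1), (2, 1), (2, 2), (1, 2)],          # wholly inside
}

targets = [name for name, boundary in schools.items()
           if polygons_overlap(boundary, regeneration_area)]
print(targets)
```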

Incidentally, what has happened to the ACC online mapping portal? Not only does it not appear in any search results, but all of the maps except the council boundary appear to have vanished, and there used to be dozens of them!

Lack of clarity helps no-one

A failure to publish information and data helps no-one. How can anyone know if their child’s school is in a regeneration area? How can a community group know if they are entitled to additional funding?

Without accurate boundary maps – and better still data – how can we match activities to physical areas (be they regeneration areas, wards, neighbourhoods, or catchment areas)?

How can we analyse investment, spending, attainment, street cleanliness, crime, poverty, number of planning applications, house values, RTAs per area if we can’t get the data?

For us this is a problem, but for the kids in the schools this is another opportunity denied.

Just as we highlighted in our previous post on recycling, the lack of open data is not an abstract problem. It deprives people of data and information and stifles opportunities for innovation. Our charity, and our many volunteers at events can do clever stuff with the data – build new services, apps, websites, and act as data intermediaries to help with data literacy.

Until there is a commitment nationally (and at a city level) to open data by default we will continue to highlight this as a failing by government.

——————————-

The header image for this page is a map of secondary school boundaries from ACC Open Data, on an OpenStreetMap background.

 

Scotland’s Open Data, February 2019. An update.

Note: this blog post first appeared on codethecity.co.uk in February 2019 and has been archived here with a redirect from the original URL.

Scotland’s provision of open data may be slowly improving, but it is a long way behind the rest of the UK. In my most recent trawl through websites and portals I found a few minor improvements, which are positive, but progress is too slow; some data providers are slipping backwards; and most others are still ignoring the issue altogether. Now is the time for the Scottish Government to act to fix this drag on the Scottish economy and society, and stop inhibiting innovation.

Latest review

Over the last week, I have conducted yet another trawl of Scottish Open Data websites and portals. I keep this updated on this Github Repo.  I’ve carried out this research without assistance, in my own time. The review could be more comprehensive, frequent and robust if I was supported to do it.

This work builds on previous pieces of research I’ve carried out and articles that I have written. Recently, I’ve created an index of those blog posts here as much for my own convenience of finding and linking to them as anything.

During this latest trawl, I’ve tried to better capture the wide spread of Scottish Government departments, agencies, non-departmental public bodies, health boards, local authorities, health and social care partnerships and academic institutions;  and assess each sector using quite conservative measures.

The output of that, as we will see below, does not paint a good picture of Scotland’s performance, although there are a few very good examples of people doing good work despite a clear policy gap.

Let us look at this sector by sector, following the list of findings here.

Local Authorities

Of Scotland’s 32 local authorities, only 19 produce open data of any kind.  This group uses a mixture of open data portals (10), web landing pages (7) and GIS systems (2). This leaves 13 who produce no open data whatsoever.

Those 19 councils (ignoring the other 13) produce a total of 731 datasets, giving a mean for the group of 38 and a median of 17 datasets. This total is only six more than I found three months ago, despite Dumfries and Galloway launching a new portal with 33 datasets!

Also, stagnation is a real issue. For example, it is worth noting once again that while Edinburgh produces an impressive 234 open data sets, only five of those have been updated in the last six months, and 228 of them date from 2014-2017. While there is a value in retaining historic data (allowing comparisons, trends etc to be analysed), the value of data which is not being updated diminishes rapidly.

When I ran the OD programme for Aberdeen City Council (which, like all Scottish councils, is a unitary authority), based on some back-of-the-envelope calculations I reckoned that we could reasonably expect to have about 250 data sets. So, if each of the 32 did the same, as we would expect, then we’d have 8,000 datasets from local authorities alone. This puts the 731 current figure into perspective.

Scottish Government

So far, I have found the following open data being produced:

  • 248 datasets on the excellent, and expanding, Statistics.Gov.Scot portal  covering a number of departments, agencies and NDPBs,
  • 54 datasets on the Scottish Natural Heritage portal, 53 of which are explicitly covered by OGL and one marked “free to use data.”
  • At least 43 OGL-licensed mapping layers on the Marine Scotland portal
  • Just four geospatial datasets for download on the Spatial Hub
  • Six Linked open data sets, licensed under OGL, on the SEPA site.
  • Great interactive mapping of the Scottish Indices of Multiple Deprivation, for which the source Data is included above on the Statistics Portal mentioned above.

That makes a total of 353 datasets. I’ve not tracked these numbers previously, so can’t say if they are rising, but there certainly appears to be good progress and some good quality work going on to make Scottish Government data available openly. This includes the four newly-opened sets of boundary data by the Spatial Hub, out of 33 data sets.

However, if we look at the breadth of agencies etc that comprise the Scottish Government, it is clear that there are many gaps. In addition to the parent body of the Scottish Government there are a further 33 Directorates, 9 Agencies, and 92 Non-Departmental Public Bodies. That’s a total of 135 business units.

Let’s assume that they could each produce a conservative 80 data sets, and it is arguable that that should be considerably higher, then we’d expect 10,800 datasets to be released. Suddenly, 353 doesn’t seem that great.

Health

Scotland’s Health service is composed, in addition to the parent NHS Scotland body, of 14 Health Boards and 30 joint Health and Social Care Partnerships. That gives a total of 45 bodies.

Again, taking the same modest yardstick, of 80 open data sets for each, we would expect to see 3,600 data sets released.

What I found was 26 data sets on the new NHS Scotland open data portal. This is a great, high-quality resource, and I know from conversations with those behind it that there is great commitment to adding to the range of data it provides.

However, given our yardstick above, we are still 3,574 data sets short on Scottish Health data.

Higher and Further education

Scotland’s HE / FE landscape comprises 35 universities and colleges.

Glasgow and Edinburgh Universities each have an open data publication mechanism for data arising out of a business operation, which contain interesting and useful data.

Despite that, there is no operational, statistical or other open data being created by any universities or colleges that I could identify. Again, using the same measure as above, that produces a deficit of (80 x 35) or 2,800 datasets.

Supply versus expectation

If we accept for the moment that the approximate number of data sets that we might expect in the Scottish public sector is as set out above, and that the current provision is, or is close to, what I have found in this trawl, then what is the overall picture?

Sector                Published   Expected   Deficit
Local Government            731      8,000     7,269
Scottish Government         353     10,800    10,447
Health                       26      3,600     3,574
FE / HE                       0      2,800     2,800
Totals                    1,110     25,200    24,090

Table 1: Supply versus expectation of Scottish public sector Open Data

As we can see from the table above, it appears that the Scottish public sector is currently publishing 1,110 of an expected 25,200 open data sets. This is just 4.4%. So, by those calculations, more than 95% of the data that we might reasonably expect to see published as Open Data is not being released.
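
The arithmetic behind Table 1 can be reproduced with a short Python sketch. The body counts, per-body yardsticks and published figures are those given in the sections above; the variable and function names are illustrative.

```python
# (number of bodies, assumed data sets per body, data sets found)
SECTORS = {
    "Local Government": (32, 250, 731),
    "Scottish Government": (135, 80, 353),
    "Health": (45, 80, 26),
    "FE / HE": (35, 80, 0),
}


def summarise(sectors):
    """Return (sector, published, expected, deficit) rows for each sector."""
    rows = []
    for name, (bodies, per_body, published) in sectors.items():
        expected = bodies * per_body
        rows.append((name, published, expected, expected - published))
    return rows


rows = summarise(SECTORS)
total_published = sum(row[1] for row in rows)
total_expected = sum(row[2] for row in rows)
total_deficit = sum(row[3] for row in rows)
share = 100 * total_published / total_expected

for name, published, expected, deficit in rows:
    print(f"{name}: {published:,} of {expected:,} (deficit {deficit:,})")
print(f"Totals: {total_published:,} of {total_expected:,} "
      f"(deficit {total_deficit:,}), {share:.1f}% published")
```

Changing the per-body yardsticks in SECTORS shows how sensitive the overall share is to those assumptions.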

Scotland is behind the UK generally

Whether you agree with the exact figures or not, and I am open to challenge and discussion, it is clear that we are failing to produce the data that is badly needed to stimulate innovation and deliver the economic and social benefits that we expected when we set out to deliver open data for Scotland.

I’ve long argued that in terms of the UK’s performance in Open Data league tables, such as the Open Data Barometer, Scotland is a drag on the UK’s performance, with Scotland’s meagre output falling well short of the rest of the UK’s Open Data. In addition to existing approaches, we should see Scotland’s OD assessed separately, using the same methodology, in order to be able to compare Scotland with the UK as a whole. That would allow us to measure Scotland’s performance on a like-for-like basis, identify shortfalls and target remedial action where needed.

Policy underpinning

I have argued previously that a significant issue which stops the Scottish public sector getting behind open data is the lack of public policy to make it happen, as well as an ignorance, or denial, of the potential economic and social benefits that it would bring. While I was part of the group who wrote the Scottish Government’s 2015 Open Data Strategy, it was, in its final form, toothless and not underpinned by policy.

We now have an Open Government Action Plan for Scotland 2018-2020 (PDF). This is a great step forward but unfortunately it is almost entirely silent on Open Data, as pointed out in my response to the draft in November 2018.

Even when Open Data does make an appearance, on page 19, it is in relation to a broader topic rather than forming actions on its own merits. The position is similar in the plan’s detailed commitments. This is not to denigrate the work that has gone into these, and the early positive engagement between Scottish Government and civic groups, but this is a huge missed opportunity – and we should not have to wait until 2020 to rectify it.

At this point, it is worth contrasting this with the Welsh Government’s Open Government plan 2016-2018 which was reviewed recently (PDF). In that plan, Open Data was the entire focus of the first two sections, and covered pages 4 to 6 of the plan. This was no afterthought: it was a significant driver and a central plank of their open government plan.

The broader community

Scotland still lacks a developed Open Data community. This will come in time as data is made more widely available, is more usable and useful – and also through the engagement with the Open Government process  – but we all need to work to develop that and accelerate the process. I set out suggestions for this in a previous post.

There are significant opportunities to grow the use of open data through the opening of private sector and community-generated and -curated data.

The universities and colleges in Scotland should be adopting open data in their curriculum, raising awareness among students, creating entrepreneurs who can establish businesses on the back of open data.

Schools should be using open data to get their classes involved: using it to explain their environment, climate, and transport system; to understand local demographics, the distribution of local government spending, or comparative attainment of schools.

Government should be  developing the curriculum to use open data to foster a better understanding of data and how it underpins modern society.

There are some positive things going on: the roadshows that the Scottish Government are doing, as well as other Data Fest Fringe events; the regular data hack weekends we’ve been doing in Aberdeen under the Code The City banner; and the major long-term project to build and deploy community-hosted air quality monitoring sensors which provide open data for the local community. These need to become the norm – and to be happening across the country.

Organisations such as The Data Lab, Censis and other innovation centres have a great opportunity here to advance their work, whether in education, community building or fostering innovation, and to support this to achieve their organisational missions.

Bringing people together

Having earlier created a Twitter account for a nascent Scottish Open Data Action Group (@Soda_group), I have reconsidered that. Instead of an action group to pressure, shame or coerce the Scottish Government into action, what we need is a common group that has the Scottish Government onside – and everyone works together. So I have renamed it @opendata_sco. It already has 179 followers and I hope that we can grow that quickly, and use that to generate more interest and engagement.

I have also launched a new open Slack channel for Open Data Scotland, so that a community can better communicate with one another.

Please join, using this form.

As I have said previously this isn’t a them-and-us, supply-and-demand relationship. We’re all in it together, and the better we collaborate as a community the better, and quicker, society as a whole benefits from it.

========================================

Header photo by Andrew Amistad on Unsplash

Boundaries, not barriers

Note: This blogpost first appeared on codethecity.co.uk in January 2019 and has been archived here with a redirect from the original URL. 

I wrote some recent articles about the state of open data in Scotland. Those highlighted the poor current provision and set out some thoughts on how to improve the situation. This post is about a concrete example of the impact of government doing things poorly.

Ennui: a great spur to experimentation

As Christmas ticked by I started to get restless. Rather than watch a third rerun of Elf, I decided I wanted to practise some new skills in mapping data: specifically how to make Choropleth Maps. Rather than slavishly follow some online tutorials and show unemployment per US state, I thought it would be more interesting to plot some data for Scotland’s 32 local authorities.

Where to get the council boundaries?

If you search Google for “boundary data Scottish Local Authorities”  you will be taken to this page on the data.gov.uk website. It is titled “Scottish Local Authority Areas”  and the description explains the background to local government boundaries in Scotland. The publisher of the data is the Scottish Government Spatial Data Infrastructure (SDI). Had I started on their home page, which is far from user-friendly, and filtered and searched, I would have eventually been taken back to the page on the data.gov.uk data portal.

The latter page offers a link to “Download via OS OpenData” which sounds encouraging.

Download via OS Open Data

This takes you to a page headed, alarmingly, “Order OS Open Data.” After some lengthy text (which warns that DVDs will take about 28 days to arrive but that downloads will normally arrive within an hour), there then follows a list of fifteen data sets to choose from. The Boundary Line option looked most appropriate after reading the descriptions.

This was described as being in a proprietary ESRI shapefile format, and being 754MB of files, with another version in the also proprietary MapInfo format. Importantly, there was no option for downloading data for Scotland only, which I wanted. In order to download it, I had to give some minimal details, and complete a captcha. On completion, I got the message, “Your email containing download links may take up to 2 hours to arrive.”

There was a very welcome message at the foot of the page: “OS OpenData products are free under the Open Government Licence.” This linked not to the usual National Archives definition, but to a page on the OS site itself with some extra, but non-onerous reminders.

Once the link arrived (actually within a few minutes) I then clicked to download the data as a Zip file. Thankfully, I have a reasonably fast connection, and within a few minutes I received and unzipped twelve sets of 4 files each, which now took up 1.13GB on my hard drive.

Partial directory listing of downloaded files

Two sets of files looked relevant: scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. I couldn’t work out what the differences were in these, and it wasn’t clear why Wales data is also bundled with Scotland – but these looked useful.

Wrong data in the wrong format

My first challenge was that I didn’t want Shapefiles, but these were the only thing on offer, it appeared. The tutorials I was going to follow and adapt used a library called Folium, which called for data as GeoJson, which is a neutral, lightweight and human readable file format.

I needed to find a way to check the contents of the Shapefiles: were they even the ones I wanted? If so, then perhaps I could convert them in some way.

To check the shapefile contents, I settled on a library called GeoPandas. One after the other I loaded scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. After viewing the data in tabular form, I could see that these were not what I was looking for.

So, I searched again on the Scottish Spatial Data Infrastructure site and found this page. It has a Download link at the top right. I must have missed that.

SSI Download Link

But when you click on Download it turns out to be a download of the metadata associated with the data, not the data files. Clicking the “Download via OS OpenData” link, further down the page, takes you back to the very same page as above.

I did further searching. It appeared that the Scottish Local Government Boundary Commission offered data for wards within councils but not the councils’ own boundaries themselves. For admin boundaries, there were links to OS’ Boundary Line site where I was confronted by the same choices as earlier.

Eventually, through frustration, I started to check the rest of the twelve previously-downloaded Boundary Line data sets and found there was a shapefile called “district_borough_unitary_region.shp”. On inspection in GeoPandas it appeared that this was what I needed – despite Scottish Local Authorities being neither districts nor boroughs – except that it contained all local authority boundaries for the UK – some 380 (not just the 32 that I needed).

Converting the data

Having downloaded the data I then had to find a way to convert it from Shapefile to Geojson (adapting some code I had discovered on StackOverflow) then subset the data to throw away almost 350 of the 380 boundaries. This was a two stage process: use a conversion script to read in Shapefiles, process and spit out Geojson; write some code to read in the Geojson, convert it to a Python dictionary, match elements against a list of Scottish LAs, then write the subset of boundaries back out as a geojson text file.

Code to convert shapefiles to geojson
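The second stage, subsetting the converted Geojson, can be sketched with nothing but the Python standard library. This is not the code pictured above, only a minimal illustration under some assumptions: the property holding the council name is taken to be "NAME", the list of councils is abbreviated to two entries, and a tiny in-memory collection stands in for the real UK-wide file. The function name subset_features is hypothetical.

```python
import json

# Hypothetical, abbreviated list; the real one would hold all 32 Scottish councils.
SCOTTISH_LAS = {"Aberdeen City", "Perth and Kinross"}


def subset_features(collection, wanted, name_key="NAME"):
    """Return a new FeatureCollection with only the features whose name is wanted."""
    kept = [
        feature
        for feature in collection["features"]
        # "properties" may be null in Geojson, so fall back to an empty dict.
        if (feature.get("properties") or {}).get(name_key) in wanted
    ]
    return {"type": "FeatureCollection", "features": kept}


# Tiny stand-in for the converted UK-wide file (geometries omitted for brevity).
uk = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME": "Aberdeen City"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME": "Cornwall"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME": "Perth and Kinross"}, "geometry": None},
    ],
}

scotland = subset_features(uk, SCOTTISH_LAS)

# Serialise the subset; in practice this string would be written to a .geojson file.
scotland_text = json.dumps(scotland, indent=2)
print(len(scotland["features"]), "features kept")
```

With a real file, the stand-in dictionary would be replaced by the result of json.load on the converted file, and scotland_text would be written back out to disk.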

Using the Geojson to create a choropleth map

I’ll spare the details here, but I then spent many, many hours trying to get the Geojson which I had generated to work with the Folium library. Eventually it dawned on me that while the converted Geojson looked ok, in fact it was not correct. The conversion routine was not producing the correct Geojson.
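The post does not record what was wrong with the converted file. One common cause of Geojson that looks fine but will not plot, offered here purely as an assumption, is that coordinates are carried over from the shapefile in a projected grid (metres) when web mapping libraries expect longitude and latitude in degrees. A quick standard-library check along those lines might look like this; the function names and sample coordinates are hypothetical.

```python
def iter_positions(coords):
    """Yield every [x, y, ...] position from an arbitrarily nested coordinate array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_lon_lat(feature_collection):
    """Return True if every position lies within valid longitude/latitude ranges.

    Geometries without a "coordinates" member (null geometries or
    GeometryCollections) are skipped rather than inspected.
    """
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry") or {}
        for x, y, *_ in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= x <= 180 and -90 <= y <= 90):
                return False
    return True


# A small polygon in degrees, roughly around Aberdeen.
in_degrees = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-2.2, 57.1], [-2.0, 57.1], [-2.0, 57.2], [-2.2, 57.1]]],
        },
    }],
}

# A point whose values look like grid eastings and northings in metres.
in_metres = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Point", "coordinates": [394000.0, 806000.0]},
    }],
}

print(looks_like_lon_lat(in_degrees))
print(looks_like_lon_lat(in_metres))
```

A check like this only rules out one class of problem; a file can pass it and still be malformed in other ways.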

Another source

Having returned to this about 10 days after my first attempts, and done more hunting around (surely someone else had tried to use Scottish LAs as geojson!) I discovered that Martin Crowley had republished on Github boundaries for UK Administrations as Geojson. This was something that I had intended to do for myself later, once I had working conversions, since the OGL licence permits republishing with attribution.

Had I had access to these two weeks ago, I could have used them. With the Scottish data downloaded as Geojson, producing a simple choropleth map as a test took less than ten minutes!

Choropleth map of Scottish Local Authorities

While there is some tidying to do on the scale of the key, and the shading, the general principle works very well. I will share the code for this in a future post.

Some questions

There is something decidedly user-unfriendly about the SDI approach which is reflective of the Scottish public sector at large when it comes to open data. This raises some specific, and some general questions.

  1. Why can’t the Scottish Government’s SDI team publish data themselves, as the OGL facilitates, rather than have a reliance on OS publishing?
  2. Why are boundary data, and by the looks of it other geographic data, published as ESRI GIS shapefiles or Mapinfo formats rather than the generally more-useable, and much-smaller, GeoJson format?
  3. Why can’t we have Scottish (and English, and Welsh) authority boundaries as individual downloads, rather than bundled as UK-level data, forcing the developer to download unnecessary files? I ended up with 1.13GB (and 48 files) of data instead of a single 8.1MB Scottish geojson file.
  4. What engagement with the wider data science / open community has the SDI team done to establish how their data could be useful, useable and used?
  5. How do we, as the broader Open Data community, share or signpost resources? Is it all down to government? Should we actively and routinely push things to Google Dataset Search? Had there been a place for me to look, then I would have found the GitHub repo of council boundaries in minutes, and been done in time to see the second half of Elf!

And finally

I am always up for a conversation about how we make open data work as it should in Scotland. If you want to make the right things happen, and need advice, or guidance, for your organisation, business or community, then we can help you. Please get in touch. You can find me here or here or fill in this contact form and we will respond promptly.

Open Data Scotland – a nudge from OD Camp?

Note: This blog post was originally published in November 2018 at CodeTheCity.co.uk and was archived here with redirects from the original URL. 

Over the first weekend of November 2018, just over 100 people congregated in Aberdeen to attend the UK Open Data Camp. We’d pushed hard to bring it to Scotland, and specifically Aberdeen, for the first time. The event, the sixth of its type, which follows an unconference model where the attendees set the agenda, has previously taken place in England, Wales and Northern Ireland.

I’m not going to go through what we did over the weekend; you can find plenty of that here and here. Links to all 44 sessions which took place are on this Google doc, and many of those have collaborative notes taken during the sessions.

Instead this is a reflective piece, seeking to understand what OD Camp can show us about the state of Open Data in Scotland and beyond.

Who was there?

Of the 100+ attendees, including camp-makers, we estimate that about 40 were from the public sector. Getting exact numbers is hard – people register in their own name, with their own email addresses, but we think that is a good guess.

While this sounds good, during the pitching session on the first day Rory Gianni asked a question: “Hands up who is here from the Scottish public sector?” Two people’s hands went up out of 100+. Both were from local authorities, Aberdeen and Perth city councils, and a third (also from Aberdeen) joined later on Saturday.

This is really concerning and shows the gulf between what Scotland could, or rather must, be doing and what is actually happening.

The Scottish Public Sector

It is estimated that the Scottish Civil Service encompasses 16,000+ officers. It comprises 33 directorates, nine executive agencies and around 90 non-departmental public bodies (NDPBs) plus other odds and ends such as the Crown Office and Procurator Fiscal Service.

Then we have 14 health boards, 32 local authorities, 32 Joint Health and Social Care Partnerships and so on.

All of these should be producing open data.

Reality

Sadly, we are very far from that. Few are producing open data of any scale or quality. I’ve written about this extensively in the past including in this blog post and its successor post.

So, if we use attendance by the Scottish public sector, at a free-to-attend event which was arranged for them on their very doorstep, as a barometer of commitment to open data, it is clear that something is rotten in the state of Denmark, or rather Scotland.

Three weeks on

Since the event, I’ve reached out to the Scottish Government through two channels. I contacted Roger Halliday, the Chief Statistician and the senior civil servant with responsibility for open data, and responded to a Twitter contact from Kate Forbes, the minister for Public Finance and Digital Economy.

I then had an hour-long conversation with Roger and two of his colleagues. This was a very positive discussion. I took away that there is a genuine commitment to doing things better, underpinned by a realism about capacity and capability to widely deliver publication and engagement with the wider OD community. I have agreed to be part of a round table meeting on OD to be held in the new year – and have expressed a commitment to assist in any way needed to improve things.

Meanwhile

Ironically, in the midst of this three week period, the Scottish Government published its Open Government action plan. This emerged on 14th November and is open for feedback until 27th November. So, if you are quick, you can respond to that – and I encourage you to do so. While this certainly seeks to move things in the right direction in terms of openness and transparency, it is extremely light on open data and committed actions to address some of the issues which I have raised.

My next blog post will be a copy of the feedback which I provide, and on which I am currently working.

And finally

When I started drafting this post I was in a very negative frame of mind as regards the Scottish Open Data scene – and particularly in terms of the public sector. In the intervening period, I launched the Scottish Open Data Action Group on Twitter. The thinking behind this was to get together a group of activists to swell the public voice beyond mine and that of ODI Aberdeen.

Given the way things are moving on with the Scottish Government and the positive engagement that has begun, the group, which is in its infancy, may not be needed as a vocal pressure group. Instead we could be a supportive external panel who provide expertise and encouragement as needed. Who knows – let’s see!

The Many False Dawns of Scottish Open Data (2010 to 2018)

Note: this blog post was first published on 10th June 2018 at CodeTheCity.co.uk and has been archived here with redirects from the original URL. 

In this, the first of two posts, I look back over eight years of open data in Scotland, showing where ambition and intent mostly didn’t deliver as we hoped.

In the next part I will look forward, examining how we should rectify things, engage the right people, build on current foundations, and how we all can be involved in making it work as we hoped it would all those years ago.

Let our story begin

“The moon was low down, and there was just the glimmer of the false dawn that comes about an hour before the real one.” – Rudyard Kipling, Plain Tales from the Hills, 1888

The story to-date of Open Data in Scotland is one of multiple false dawns. Are we at last about to witness a real sunrise after so much misplaced hope?

The trigger

At Data Lab’s recent Innovation Week in Glasgow, I found myself among 115 other data science MSc students – some of the brightest and best in Scotland – working on seven different industry challenges. You can read more of how that went on my own blog. In this post I want to mention briefly one of the challenges, and the subsequent conversations which it stirred in the room, then on social media and even in email correspondence, then use that to illustrate my false dawn analogy.

The Challenge

The Innovation Week challenge was a simple one compared to some others, and was composed of two questions: “how might we analyse planning applications in light of biodiversity?”, and, “how might we evaluate the cumulative impact of planning applications across the 32 Scottish Local Authorities?”

These are, on the face of it, fairly easily answered. To make it even simpler, as part of the preparation for the innovation week, Data Lab, Snook and others had done some of the leg work for us. This included identifying the NBN Atlas system as one which contained over 219 million sightings of wildlife species, which could be queried easily and which provided open access to its data.

That should have been the difficult part. The other part, getting current and planning application data from the Scottish Local Authorities should have been the easier task – but it was far from it. In fact, in the context of the time available to us, it was impossible as we could find not a single council, of the 32, offering its planning data as open data. You can read more of the particulars of that on my earlier blog posts, above.

This is about the general – not the specific, so, for now, let us set some context to this, and perhaps see how we got to this point.

The first false dawn.

We start in August 2010, when I was working in Aberdeen City Council. I’d been reading quite a bit about open data, and following what a few enlightened individuals, such as Chris Taggart, were doing. It seemed to me so obvious that open data could deliver so much socially and economically – even if no formal studies had by then been published. So, since it was a no-brainer, I arranged for us to publish the first open data in Scotland – at least from a Scottish City Council.

Another glimmer

The UK Coalition Government had, in 2010, put Open Data front and centre. They created http://data.gov.uk and mandated a transparency agenda for England and Wales which necessitated publishing Open Data for all LA transactions over £500.

At some point thereafter, in 2011-12 both Edinburgh and Glasgow councils started to produce some open data. Sally Kerr in Edinburgh became their champion – and began working with Ewan Klein in Edinburgh University to get things moving there. I can’t track the exact dates. If you can help me, please let me know and I will update this post.

Studies, and even mainstream press, were starting to highlight the benefits of open data. Now this was starting to feel like a movement!

Suffering from premature congratulation

In 2012 the Open Data Institute was founded by Nigel Shadbolt and Tim Berners-Lee, and from day one championed open data as a public good, stressing the need for effective governance models to protect it.

During 2012 and 2013 Aberdeen, Edinburgh and others started work with Nesta Scotland, run out of Dundee, by the inspirational Jackie McKenzie and her amazing team. They funded two collaborative programmes: Make It Local Scotland and Open Data Scotland.

The former had Aberdeen City Council using Linked Open Data (another leap forward) to create a citizen-driven alerts system for road travel disruption. This was built by Bill Roberts and his team at Swirrl – who have gone on to do more excellent work in this area.

Around mid 2013 Glasgow had received Technology Strategy Board funding for a future cities demonstrator and was recruiting people to work on its open data programme.

Sh*t gets real

The second Nesta programme, Open Data Scotland, saw two cities – Aberdeen and Edinburgh – work with two rural councils, East Lothian and Clackmannanshire. Crucially, it linked us all with the Code For Europe movement, and we were able to see at first-hand the amazing work being done in Amsterdam, Helsinki, Barcelona, Berlin and elsewhere. It felt that we were part of something bigger, and unstoppable.

And it gets real-er

In late 2014 the Scottish Government appeared to suddenly ‘get’ open data. They wanted a strategy – so they pulled a bunch of us together to write one. The group included Sally from Edinburgh and me – and the document was published in March 2015. I had pushed for it to have more teeth than it ended up having, and to commit to defined actions, putting an onus on departments and local government to deliver widely on this in a tight timescale.

It did include –
“To realise our vision and to meet the growing interest from users we encourage all organisations to have an Open Data publication plan in place and published on their website by December 2015. Organisations currently publishing data in a format which does not readily support re-use, should within their plan identify when the data will be made available in a more re-usable format. The ambition is for all data by 2017 to be published in a format of 3* or above.” I will come back to this later.

This MUST be it!

In 2016-2017 the Scottish Cities Alliance, supported by the European Regional Development Fund launched a programme: Scotland’s Eighth City – The Smart City. At its heart was data – and more specifically open data. The data project was to feature all seven of Scotland’s cities, working on four streams of work:

  • data standards
  • data platforms
  • data engagement and
  • data analytics.

The perception was also at that time that the Scottish Government had taken its eye off the ball as regards open data. Little if anything had changed as a result of the 2015 strategy. By working together as 7 cities we could lead the way – and get the other 25 councils, and the Scottish Government themselves, not only to take notice, but also to work with us to put Open Data at the heart of Scottish public services.

The programme would run from Jan 2017 to Dec 2018. I was asked to lead it, which I was delighted to do – and remained involved in that way until I retired from Aberdeen City Council in June 2017.

At that point Aberdeen abandoned all commitment to open data and withdrew from the SCA programme. I have no first-hand knowledge of the SCA programme as it stands now.

Six False Dawns Later

So, after six false dawns what is the state of open data in Scotland: is it where we expected it to be? The short answer to that has to be a resounding no.

Some of the developments which should have acted as beacons have been abandoned. The few open data portals we have are, with some newer exceptions, looking pretty neglected: data is incomplete or out of date. There is no national co-ordination of effort, no clear sets of guidance, no agreement on standards or terminologies, no technical co-ordination.

Activity, where it happens at all, is localised, and is more often than not grass-roots driven (which is not in itself a bad thing). In some cases local authorities are being shamed into reinstating their programmes by community groups.

The Scottish Government, with the exception of their SIMD Linked Data work, which was again built by Swirrl, and some statistical data, have produced shamefully little Open Data since their 2015 Strategy.

Despite a number of key players in the examples above still being around, in one role or another, and a growing body of evidence demonstrating ROI, there is strong evidence that Senior Managers, Elected Members and others don’t understand the socio-economic benefits that publishing open data can bring. This is particularly disturbing considering the shrinking budgets and the need to be more efficient and effective.

So, what now?

Given that we have witnessed these many false dawns, when will the real sunrise be? What will trigger that, and what can we each do to make it happen?

For that you will have to read our next instalment!

[Header image by Marc Marchal on Unsplash]