Earlier this year Code The City held an Editathon with Wikimedia UK. The subject was the history of Aberdeen Cinemas. We ended up with 16 people all working together to create new articles, update existing ones, capture new images for Wiki Commons, and generate or enhance WikiData items. This was a follow up to previous sessions that Dr Sara Thomas of WikiMedia UK led for us in the city, mainly for information professionals.
This has led to significant interest from cultural bodies in the city in using the suite of WikiMedia platforms and tools to improve access to their collections in Aberdeen. We expect to do quite a bit more of this with them in 2020.
Two weeks ago I attended a Train the Trainer 3-day workshop in Glasgow for Wikimedia UK to become a trainer for them in Scotland. That will see me training professionals and volunteers in how to use Wikipedia, Wiki Commons and Wikidata in particular.
In this blog post I explain why you might want to use some of the fancy features of the Wikidata Query Service, show you how to do that, drawing on my adaptation of others’ shared examples, and encourage you to experiment for yourself.
Wikidata uses a Linked Open Data format to store data. While I have added quite a number of items to Wikidata, I’ve not had a chance to really study how to use SPARQL (the query language behind the scenes) to execute queries against the data. This is done in the Wikidata Query Service, and it is a key skill for using some of the more advanced features. Without the means to extract data, there is little point in stuffing data in. In fact, Wikidata allows us to do some very fancy things with the data we retrieve.
So, I decided this week to start working on that. This post describes my first steps. It should also provide a simple introduction for anyone else wanting to dip their toe in the SPARQL waters.
Where to start?
This 16-minute tutorial on Youtube is a great place to begin; it is where I started. It describes how to create a simple query and build it up to something more powerful. I copied what it did then adapted that to build a query that I wanted. I suggest that you watch it first to understand what each line of SPARQL is doing.
Here are the steps, mainly drawn from and adapted from that tutorial.
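The steps boil down to a query like the one below, assembled here as a Python string. This is a sketch reconstructed from the P and Q codes discussed in this post, not necessarily the tutorial’s exact query:

```python
# Illustrative reconstruction: count women educated at the University of
# Aberdeen. The P/Q codes come from the text; the rest is an assumption
# about how the tutorial's query is structured.
UNIVERSITY = "wd:Q270532"   # University of Aberdeen
GENDER = "wd:Q6581072"      # female

query = f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P69 {UNIVERSITY} .   # educated at
  ?person wdt:P21 {GENDER} .       # sex or gender
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""
print(query)
```

Pasting the printed query into the Wikidata Query Service (query.wikidata.org) and pressing the run arrow returns the matching people.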
In the query above we use the Educated at statement (P69) with the identifier for Aberdeen University (Q270532), in combination with the Sex or gender statement (P21) and the Female identifier (Q6581072).
You can run this for yourself here using the white-on-blue arrow. I’ve used one of the great features of Wikidata: sharing the query via the link symbol on the left of the page, just above the arrow:
Changing the parameters of the query means that we can check males (Q6581097) against females (Q6581072), or compare different universities. To do this, go to the Wikidata homepage and search for the name of the institution. The search will return a page with the Q code in the title. Thus we can compare various universities by amending the Q code in the query above: the University of Aberdeen (Q270532) with the University of Glasgow (Q192775) or Edinburgh University (Q160302).
Running these queries, we can see that the number of both male and female graduates of Aberdeen University with entries on Wikidata is significantly smaller than for either Glasgow or Edinburgh, and that the proportion of female graduates is smallest for Aberdeen.
The results of these queries should themselves cause us to reflect on the relatively smaller number of results of either gender from Aberdeen compared to the other universities; and also the smaller proportion of women. It suggests that there is some work to do to ensure that we get better representation of both genders in Wikidata.
Enhancing our query
Now that we have a basic query we can retrieve additional bits of data for the subjects of the query including place of birth, date of birth and images.
These are represented by P19 (place of birth), P569 (date of birth) and P18 (image). As we see in the example below, when we query these we follow each with a name we assign to the item returned (e.g. ?person wdt:P19 ?birthPlace), and we add that name, in this case ?birthPlace, to the Select statement on the first line of the query, ensuring that it features in the table or other format output.
You will note that the example above now uses ?birthPlace in a further statement to get the co-ordinates (P625) of that place, which we assign to ?coordinates:
> ?birthPlace wdt:P625 ?coordinates
and we include ?coordinates in the first line of things we will display.
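Putting those pieces together, the enhanced query looks roughly like this, again assembled as a Python string. This is a hedged reconstruction: the variable names, and P569 for date of birth, are my choices rather than a copy of the shared query:

```python
# Each extra statement binds a new variable, which is then added to the
# SELECT line so it appears as a column in the results.
query = """
SELECT ?person ?personLabel ?birthPlace ?birthDate ?image ?coordinates WHERE {
  ?person wdt:P69 wd:Q270532 .          # educated at: University of Aberdeen
  ?person wdt:P21 wd:Q6581072 .         # sex or gender: female
  ?person wdt:P19 ?birthPlace .         # place of birth
  ?person wdt:P569 ?birthDate .         # date of birth
  ?person wdt:P18 ?image .              # image
  ?birthPlace wdt:P625 ?coordinates .   # co-ordinates of the birthplace
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
print(query)
```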
Advantages of extra data elements
By having birthplace coordinates we can plot the results on a map, which is easily done using the tools built into the Wikidata Query Service.
Run the query (white arrow on blue on the left menu) and observe the table that was returned. You can see that the first line of the Select statement formed the columns of the table.
Note that instead of the 125 results we had with the simple query, we now get only 20. My understanding is that we are specifying records which must have a place of birth, an image and so on; where these do not exist, the records for that person are not returned. This in itself shows that there is a piece of work to do to identify which records in the batch of 125 lack these elements and fix them.
In fact you could say that there is a whole cycle of adding data, querying it, spotting anomalies, fixing those and re-querying which leads to substantial enrichment of the data.
Now click on the dropdown by the eye symbol, on the left immediately above the results, and choose the map option. The tool will generate a map with a pin in the location of each place of birth. You can pan and zoom to the UK and click on each pin. Try it. To get back to the query, click on the arrow, top-right.
Now click on the eye symbol to show other options, and choose Timeline.
As we can see below, the Wikidata query service will construct a rudimentary timeline with relatively little effort. This is one of its great features. So far we have the same 20 complete records – and the cards or tiles are titled by the place of birth but we can change that.
Enhancing the timeline on Histropedia
To improve on our timeline we can construct a better query using the Wikidata Query Service, then paste it into the Histropedia service to run it. Our first version, which makes small improvements on our previous timeline, produces the results below. It labels each card with the person’s name and colour-codes the individual records by place-of-birth label. To see the code, click the gear wheel at the top right of the screen. Note we still only retrieve 20 results.
We can substantially enhance this query, as we have done in the following version. This makes certain items optional, gets the country of birth and colour-codes by that, and ranks the records by prominence (with the most prominent at the front). If I understand it correctly, by making elements optional it also retrieves 76 records, many more than previously.
I would encourage you to watch the tutorial video at the start of this post, then try to hack some of the queries to which I provided links. For example how many female graduates of the Robert Gordon University would each query generate? How would you find the Q code of that institution? Have fun with it!
At CTC we work with ONE Codebase to deliver Young City Coders classes. These are after school activities to encourage young people to get into coding by trying Scratch, Python and other languages in a Coder Dojo like environment.
Inoapps generously gave us some funding to cover costs and donated old laptops (as did the James Hutton Institute) which we cleaned up and recycled into machines they could use.
All of which is great – and we have 20-25 kids each session starting to get into these coding languages.
But there is an issue – the bulk of our kids are overwhelmingly from west-end schools. And we have an aim to help kids in regeneration areas where opportunities are generally fewer.
So, that means identifying Aberdeen schools that fall in the regeneration areas and contacting the head teacher and having a discussion about what help they would like to see us provide. Simple?
Search for regeneration areas
Starting with the basics – what are the regeneration areas of Aberdeen? According to Google, the Aberdeen City Council website doesn’t tell us. Certainly not in the top five pages of results (and yes, I did go down that far).
Google’s top answer is an Evening Express article which says that there are five regeneration areas: Middlefield, Woodside, Tillydrone, Torry and Seaton. From what I have heard, that sounds about right – but surely there is an official source for this.
Further searching turns up a page from Pinacl Solutions who won a contract from ACC to provide wifi in the Northern regeneration areas of “Northfield, Middlefield, Woodside and Tillydrone.” Which raises the question of whether Northfield is or isn’t a sixth regeneration area.
The Citizens Advice Bureau Aberdeen has an article on support services for regeneration areas of “Cummings Park, Middlefield, Northfield, Seaton, Tillydrone, Torry, Woodside and Powis.” That adds two more to our list.
Other sites report there being an “Aberdeen City Centre regeneration area.” Is that a ninth?
Having a definitive and authoritative page from ACC would help. Going straight to their site and using the site’s own search function should get us there. I searched for “regeneration areas” and then just “regeneration.”
I get two results: “Union Street Conservation Area Regeneration Scheme” and “Buy Back Scheme”. The latter page has not a single mention of regeneration, despite the site throwing up a positive result. The former appears to be all about the built environment, so it is probably not a ninth area in the sense that the others are. Who knows?
So what are the regeneration areas – and how can I find which schools fall within them?
Community Planning Aberdeen
Someone suggested that I try the Community Planning Aberdeen site. Its lack of a site search wasn’t very helpful, but using Google to restrict results to that domain threw up a mass of PDFs.
After wading through half a dozen of these I could find no list or definition of the regeneration areas of the city. Amending the query to a specific “five regeneration areas” or “eight…” didn’t work.
Trying “seven regeneration areas” did return this document with a line: “SHMU supports residents in the seven regeneration areas of the city.” So, if that is correct then it appears there are seven. What they are – and which of the eight (or nine) we’ve found so far is not included – is still unknown.
Wards, neighbourhoods, districts, areas, school catchment areas
And – do they map onto council wards or are they exact matches for other defined areas – such as neighbourhoods?
It turns out that there are 13 council wards in the city. I had to count them manually from this page, which I reached via Google, as searching the ACC site for Council Wards doesn’t get you there.
I seem to remember there were 37(?) city neighbourhoods identified at one time. To find them I had to know that there were 37, as searching for “Aberdeen neighbourhoods” wasn’t specific enough to return any meaningful list or useful page.
And until we find out what the regeneration areas are, and can work out which primary and secondary schools fall within them, we can’t do very much. Which means that the kids who would benefit most from code clubs don’t get our help.
I thought this would be easy!
At the very minimum, a web page with a list of regeneration areas and some JPG maps to show where they are would have done. That’s not exactly hard to provide. And I’d make sure the SEO was done so that it performed well on Google (oh, and I’d fix the site’s own search). That would do at a pinch – but sticking at that would miss so many opportunities.
Better would be a set of shapefiles or GeoJSON (ideally presented on the almost-empty open data platform) with polygons that I could download and overlay on a background map.
That done, I could download a set of school boundaries (they do exist here – yay), overlay those, and work out the intersections between the two. Does a school’s boundary overlap a regeneration area? Yes? Then it is on our target list to help.
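As a sketch of that intersection test – assuming each boundary is reduced to its bounding box, which over-counts slightly but is fine for building a candidate list – the check could look like this in Python (real polygon intersection would need a library such as shapely):

```python
def bbox(coords):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical sample coordinates (lon, lat), purely for illustration:
school = bbox([(-2.15, 57.16), (-2.10, 57.19)])
regen_area = bbox([(-2.12, 57.18), (-2.08, 57.21)])
print(boxes_overlap(school, regen_area))  # → True: these sample boxes overlap
```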
Incidentally, what has happened to the ACC online mapping portal? Not only does it not appear in any search results, but all of the maps except the council boundary appear to have vanished – and there used to be dozens of them!
Lack of clarity helps no-one
A failure to publish information and data helps no-one. How can anyone know if their child’s school is in a regeneration area? How can a community group know if they are entitled to additional funding?
Without accurate boundary maps – and better still data – how can we match activities to physical areas (be they regeneration areas, wards, neighbourhoods, or catchment areas)?
How can we analyse investment, spending, attainment, street cleanliness, crime, poverty, number of planning applications, house values, RTAs per area if we can’t get the data?
For us this is a problem, but for the kids in the schools this is another opportunity denied.
Just as we highlighted in our previous post on recycling, the lack of open data is not an abstract problem. It deprives people of data and information and stifles opportunities for innovation. Our charity, and our many volunteers at events can do clever stuff with the data – build new services, apps, websites, and act as data intermediaries to help with data literacy.
Until there is a commitment nationally (and at a city level) to open data by default we will continue to highlight this as a failing by government.
The header image for this page is for a map of secondary school boundaries from ACC Open Data, on an Open Street Map background.
Note: This blogpost first appeared on codethecity.co.uk in January 2019 and has been archived here with a redirect from the original URL.
I wrote some recent articles about the state of open data in Scotland. Those highlighted the poor current provision and set out some thoughts on how to improve the situation. This post is about a concrete example of the impact of government doing things poorly.
Ennui: a great spur to experimentation
As Christmas ticked by I started to get restless. Rather than watch a third rerun of Elf, I decided I wanted to practice some new skills in mapping data: specifically how to make choropleth maps. Rather than slavishly follow some online tutorials and show unemployment per US state, I thought it would be more interesting to plot some data for Scotland’s 32 local authorities.
Where to get the council boundaries?
If you search Google for “boundary data Scottish Local Authorities” you will be taken to this page on the data.gov.uk website. It is titled “Scottish Local Authority Areas” and the description explains the background to local government boundaries in Scotland. The publisher of the data is the Scottish Government Spatial Data Infrastructure (SDI). Had I started on their home page, which is far from user-friendly, and filtered and searched, I would have eventually been taken back to the page on the data.gov.uk data portal.
This takes you to a page headed, alarmingly, “Order OS Open Data.” After some lengthy text (which warns that DVDs will take about 28 days to arrive but that downloads will normally arrive within an hour), there follows a list of fifteen data sets to choose from. The Boundary Line option looked most appropriate after reading the descriptions.
This was described as being in the proprietary ESRI shapefile format, 754MB of files, with another version in the also-proprietary MapInfo format. Importantly, there was no option to download data for Scotland only, which is what I wanted. In order to download it, I had to give some minimal details and complete a captcha. On completion, I got the message, “Your email containing download links may take up to 2 hours to arrive.”
There was a very welcome message at the foot of the page: “OS OpenData products are free under the Open Government Licence.” This linked not to the usual National Archives definition, but to a page on the OS site itself with some extra, but non-onerous reminders.
Once the link arrived (actually within a few minutes) I then clicked to download the data as a Zip file. Thankfully, I have a reasonably fast connection, and within a few minutes I received and unzipped twelve sets of 4 files each, which now took up 1.13GB on my hard drive.
Two sets of files looked relevant: scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. I couldn’t work out the difference between them, and it wasn’t clear why Wales data is bundled with Scotland – but these looked useful.
Wrong data in the wrong format
My first challenge was that I didn’t want shapefiles, but they appeared to be the only thing on offer. The tutorials I was going to follow and adapt used a library called Folium, which called for data as GeoJSON – a neutral, lightweight and human-readable format.
I needed to find a way to check the contents of the Shapefiles: were they even the ones I wanted? If so, then perhaps I could convert them in some way.
To check the shapefile contents, I settled on a library called GeoPandas. One after the other I loaded scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. After viewing the data in tabular form, I could see that these were not what I was looking for.
So, I searched again on the Scottish Spatial Infrastructure and found this page. It has a Download link at the top right. I must have missed that.
But when you click on Download, it turns out to be a download of the metadata associated with the data, not the data files themselves. Clicking the “Download link via OS Open Data”, further down the page, takes you back to the very same link as above.
I did further searching. It appeared that the Scottish Local Government Boundary Commission offered data for wards within councils, but not the councils’ own boundaries. For admin boundaries there were links to the OS Boundary Line site, where I was confronted by the same choices as earlier.
Eventually, through frustration, I started to check the others of the twelve previously-downloaded Boundary Line data sets and found a shapefile called “district_borough_unitary_region.shp”. On inspection in GeoPandas, it appeared that this was what I needed – despite Scottish local authorities being neither districts nor boroughs – except that it contained all local authority boundaries for the UK: some 380, not just the 32 that I needed.
Converting the data
Having downloaded the data, I then had to find a way to convert it from shapefile to GeoJSON (adapting some code I had discovered on StackOverflow), then subset the data to throw away almost 350 of the 380 boundaries. This was a two-stage process: use a conversion script to read in the shapefiles, process them and spit out GeoJSON; then write some code to read in the GeoJSON, convert it to a Python dictionary, match elements against a list of Scottish local authorities, and write the subset of boundaries back out as a GeoJSON text file.
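The second stage of that process can be sketched in plain Python. The property key "NAME" and the file names here are assumptions about what the Boundary Line conversion produces, so treat this as a pattern rather than a drop-in script:

```python
import json

# A few of the 32 Scottish local authority names, for illustration:
SCOTTISH_LAS = {"Aberdeen City", "Glasgow City", "City of Edinburgh"}

def subset_features(collection, wanted_names, name_key="NAME"):
    """Return a new GeoJSON FeatureCollection keeping only wanted boundaries."""
    kept = [f for f in collection["features"]
            if f["properties"].get(name_key) in wanted_names]
    return {"type": "FeatureCollection", "features": kept}

# Usage, with assumed file names:
# with open("district_borough_unitary_region.geojson") as fh:
#     uk = json.load(fh)
# scotland = subset_features(uk, SCOTTISH_LAS)
# with open("scottish_councils.geojson", "w") as fh:
#     json.dump(scotland, fh)
```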
Using the Geojson to create a choropleth map
I’ll spare you the details, but I then spent many, many hours trying to get the GeoJSON I had generated to work with the Folium library. Eventually it dawned on me that while the converted GeoJSON looked OK, it was not correct: the conversion routine was not producing valid GeoJSON.
Having returned to this about 10 days after my first attempts, and done more hunting around (surely someone else had tried to use Scottish LAs as GeoJSON!), I discovered that Martin Crowley had republished boundaries for UK administrations on GitHub as GeoJSON. This was something I had intended to do myself later, once I had working conversions, since the OGL licence permits republishing with accreditation.
Had I had access to these two weeks earlier, I could have used them. With the Scottish data downloaded as GeoJSON, producing a simple choropleth map as a test took less than ten minutes!
While there is some tidying to do on the scale of the key, and the shading, the general principle works very well. I will share the code for this in a future post.
There is something decidedly user-unfriendly about the SDI approach, which is reflective of the Scottish public sector at large when it comes to open data. This raises some specific, and some general, questions.
Why can’t the Scottish Government’s SDI team publish data themselves, as the OGL facilitates, rather than have a reliance on OS publishing?
Why are boundary data, and by the looks of it other geographic data, published in ESRI shapefile or MapInfo formats rather than the generally more usable, and much smaller, GeoJSON format?
Why can’t we have Scottish (and English, and Welsh) authority boundaries as individual downloads, rather than bundled as UK-level data, forcing the developer to download unnecessary files? I ended up with 1.13GB (and 48 files) of data instead of a single 8.1MB Scottish GeoJSON file.
What engagement with the wider data science / open data community has the SDI team undertaken to establish how their data could be useful, useable and used?
How do we, as the broader Open Data community share or signpost resources? Is it all down to government? Should we actively and routinely push things to Google Dataset Search? Had there been a place for me to look, then I would have found the GitHub repo of council boundaries in minutes, and been done in time to see the second half of Elf!
I am always up for a conversation about how we make open data work as it should in Scotland. If you want to make the right things happen, and need advice, or guidance, for your organisation, business or community, then we can help you. Please get in touch. You can find me here or here or fill in this contact form and we will respond promptly.
Note: This post was first published in June 2018 on the CodeTheCity.co.uk blog and has been archived here with redirects from the URL.
There is an oft-repeated joke in which a tourist, completely lost in the Irish countryside, asks an old fellow who is leaning on a gate at the edge of a field, “Can you tell me how to get to Dublin?” After a long pause, the old guy replies, “Well, you don’t want to start from here”.
Previously, I covered open data in Scotland from 2010 to the present. Now I look ahead, but to get there we need to start from where we currently find ourselves.
Scottish open data publishing – now
Earlier this week I spent a couple of hours pulling this list together as a first snapshot of the current open data publishing landscape. The intention is to present an accurate precis of the current state, within the available time to do the research. If I have missed anything, or got it wrong, let me know and I will fix it.
There have been sporadic attempts – of varying size, cost, and success – to make Scottish open data available. How these were initiated or funded varies. Examples include bodies such as Nesta, individual local authorities, groups such as the Scottish Cities Alliance (SCA), and by the Scottish Government.
It appears that, at the time of writing, the SCA programme (scheduled to run from January 2017 to December 2018) has so far delivered new open data portals for Dundee, Perth, Inverness and Stirling. Some of these have started to publish a few data sets; others, 18 months into the programme, are still waiting to do so. Aberdeen, who dropped out in late 2017, announced in May of this year that they were back on board, but so far there is no sign of anything being delivered. Even the open data landing pages which Aberdeen City Council once hosted have been removed, although I have heard mention of some GIS open data due to be released.
Edinburgh and Glasgow had existing portals prior to the SCA programme. In Edinburgh’s case, while it has an impressive 234 datasets, only four of these have been updated in the last six months, and no new data sets have been added for over 15 months.
It looks like Glasgow’s open data platform is a new one, replacing the one created as part of the TSB-funded £20m+ Future Cities programme (PDF; links to the original site have disappeared), which used to host over 370 data sets. The new one has far fewer: 72. While a number of these have been added to the new portal this year, many are historic: e.g. house sales data only go to 2013, which suggests that they were ported from the old site and not updated. It also suggests that around 300 data sets have vanished (temporarily, we hope)!
Some considerable recent attention, and an award, has been given to a project carried out on business rates data by North Lanarkshire Council (NLC) with partners Snook and Urban Tide. This is part of a programme funded by the ODI, and the press coverage reiterates NLC’s claim to have an open-by-default policy. I know both Urban Tide and Snook, and their work – so I am sure that it will be great. In researching this, though, I could find no data.
In response to my enquiries NLC told me that they are testing a platform. Interestingly, Edinburgh has claimed in the past to have an open-by-default policy for data too, which I cannot locate. Sadly this position is not supported by their own portal’s current condition.
Similarly, Renfrewshire have an Open Data in Renfrewshire page – “The Council is taking a lead role in complying with the Scottish Government’s Open Data Strategy” – the Dublin Core metadata of which shows it was created, and last updated, in April 2016. They have a 25-page strategy dated 2015 with a commitment to open data by default, but NO open data that I can find; not even an entry in their website A-Z.
When we created the business case for the SCA data programme, I was quite clear that each of the seven local authorities was procuring a portal for its city, not for the council. This is an important point. When local councils fail to provide a platform, and data, it is not just the local authority’s image that is tarnished – they are failing citizens, academia and businesses alike.
Where can we see best practice in action?
Sadly, the answer isn’t in Scottish local government, at least for now. Perhaps, when the SCA project reaches its conclusion in December, there will be more to show for it. Let us hope.
The Scottish Government also has its Scottish Spatial Data Infrastructure hub, which presents geospatial data for both local authorities and the Scottish Government. This is a welcome resource, but it is not without its challenges. I’ve not found a way to search by licence (it appears that not all data is licensed for reuse), and some of the data formats (e.g. WMS or WFS) are more suited to specialists than to the general public.
If you know of other high quality examples which I have missed, please let me know.
What stops publishers doing better?
I have had many conversations about this over the years. Since I wrote part one of this mini-series several people contacted me with their thoughts about the Scottish Public Sector’s approach to OD.
Issues which get in the way of doing it right (in no particular order) include:
Lack of awareness (or deliberate ignoring) of legal commitments to provide the data
No open data policy, making it easy not to do it
No organisational commitment
A lack of understanding by, and therefore no support from, senior managers / elected members
Short-termism. Too frequently, OD is delivered as a project, not a long-term commitment
No clear responsibility for OD, or the wrong people / roles with responsibility
Lack of awareness of benefits (to organisation, to economy, to society)
Lack of capacity or lack of skills
Lack of engagement with wider data community
Imagined barriers, or no drive to overcome them
Poor data management, and / or siloed structures within the organisation
Data hoarding by services (“data is power and I am not giving mine up”)
Legal restrictions on publishing (real or imagined)
I can’t deal with all of these in this post – many are cultural and need to be resolved by the organisations themselves – but I will address a few of them below. It should also be noted that the G8 Charter on Open Data from 2013, and the Scottish Government’s 2015 Open Data Strategy (PDF), mean that not publishing is simply not an option.
But, licensing …
While not all open data is geospatial, a significant proportion is, and it is a particularly useful proportion at that. A common barrier raised when electing not to release geospatial data is the licensing restrictions imposed by the Ordnance Survey. Sometimes these are genuine issues, but on occasion the difficulties are thrown up either by over-cautious individuals or by those who can’t be bothered to research and tackle them.
I do recognise that the issue is a complex one, but it is worth comparing the likes of the Surrey Planning Hub, which offers a developer-friendly API returning fully-geocoded planning application data for all local authorities in an entire county, with – for example – the Scottish Spatial Hub, which hosts 27 amalgamated spatial datasets for the 32 councils. Only three of these are open data. If you try to download the planning application data (cf. Surrey) you are asked for an authentication key. If you try to register for one, you are informed that you can only do so if you work for a local authority.
If anyone can explain why Surrey, the Hampshire Hub, and other English authorities such as Camden can offer downloads of planning open data of this quality and Scotland can’t, I would love to hear it. At its heart, I believe there is a misunderstanding about the OS licence for derived data and the presumption to publish.
This recent blog post by Ben Proctor, based on work at OD Camp Belfast, gives a good set of guidance and debunks some myths. His summary hits the nail on the head: “The vast majority of derived data based on OS information can just be published by public bodies under this ‘presumption to publish’.”
An announcement last week by Ordnance Survey points in the direction of further openness and a more permissive licensing regime (see this post by Owen Boswarva) and this is ahead of the formation and work of the new Geospatial Commission (GC).
So, perceived licence issues will soon no longer be a barrier behind which the misinformed can shelter. If I were working in local government data, or in a Scottish Government directorate, I would be proactively planning now how to start publishing it.
Of course, the issues are not just with publication.
The aim of the Aberdeen meet-up is to create that city-region local data community: bringing together interested, engaged participants from academia, citizens, community groups, developers, councils, Scottish Government departments, private companies and others. Open data is a large part of that conversation, as are data science and other related topics.
Activity such as that should be happening in each of the seven cities, and across Scotland more generally. While it doesn’t have to be driven by the local council – ours wasn’t – it should open up a meaningful dialogue with authorities: demonstrating need, prioritising specific data, providing feedback, creating opportunities for data use, identifying data in others’ hands, providing advocacy etc.
When we created the Scottish Cities Alliances Open Data programme, one of the four planned work streams, which was well-funded, was the nurturing of local data communities. Our aim was to move from the position of council as provider, and citizen / developer as consumer, of data, to one of all interested parties working together. As I said in that piece, “Going beyond publication, the true value of open data will be realised in its re-use and in the innovative uses to which it is put. The SCA partners will work to develop city-region open data eco-systems where the public, third and private sectors collaborate to encourage data use, economic stimulation and creative approaches to solving civic challenges.”
As an adjunct to the SCA programme I put forward a proposal in 2017 for funding of a Code For Scotland programme, based on our experience as part of Code For Europe 2014 (PDF). There was general support for it, but it was put on hold at the time. Part of the idea behind that was to provide seed support for creating a grass-roots movement to work with data in each Scottish city. In the absence of that, or to complement it should it come about, we do need to create informal networks of open data groups across the country.
So, what’s missing?
I subscribe to the notion that data in public hands is a common asset – and should be treated as such: a concept sometimes referred to as a data commons. Getting to that position entails quite a change in thinking and action. A first step is to create open data, publishing that in a way that easily allows, or encourages, re-use, with clear permissive licensing.
Drawing from the points above, to achieve the potential offered by open data (and already realised in more progressive places) Scotland needs the following:
The Scottish Government, and its many branches, Local Government, Health Boards, and others must now demonstrate a commitment to publish open data. This should follow the Enschede model and implement an open-by-default data policy. This means having the policy formally adopted, published, and committed to by all managers and employees.
We need to stop seeing open data as a separate activity to an organisation’s other data governance. It is not. Open data can be regarded to some degree as a barometer of how well an organisation manages its data assets.
Government needs to move beyond the ‘build-it-and-they-will-come’ attitude to data publishing, and to work with all partners to make data usable, useful and used.
While publishing static open data at three-star level on the five-star model is a useful starting point, it is not in itself an end. We need common standards such as DCAT to enable interoperability between data catalogues.
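To make the DCAT point concrete, here is a minimal sketch of what a single catalogue entry looks like when serialised as JSON-LD. The dataset title, publisher and URLs are purely illustrative, not a real council's metadata:

```python
import json

# A minimal DCAT dataset description serialised as JSON-LD.
# All names and URLs below are hypothetical examples.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Planning applications (example)",
    "dct:publisher": "Example City Council",
    "dct:license": "http://www.opendatacommons.org/licenses/odbl/",
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:downloadURL": "https://example.org/planning.csv",
        "dcat:mediaType": "text/csv",
    },
}

print(json.dumps(dataset, indent=2))
```

Because every conformant catalogue describes its datasets with the same vocabulary, a harvester can aggregate entries from all 32 councils into one national index without bespoke work per publisher.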
Collaboration is key – and organisations should band together to share some of the heavy lifting. This improves outcomes, raises standards and reduces local cost. We should bin the ‘not invented here’ mentality, look further afield to where high-quality work is taking place, and share that best practice.
While we are on this topic, individual councils should abandon the “we’re special” mentality which surfaces far too often. All unitary authorities essentially provide the same bunch of services, and have the same core systems from a few suppliers. Each would benefit from increased co-operation, collaboration and common approaches to data management and publication.
Academia needs to get behind the open data movement. Data Lab and its many partner universities should be actively involved in the Scottish open data eco-system. MSc programmes (and undergraduate courses) should
regularly use open data, and
teach how to make use of it,
show how to build new and innovative services,
encourage students to be advocates for open data, to know how to request it, and to act as intermediaries between publishers and citizens.
We should then extend that to school pupils – linking it to the curriculum, and demonstrating how to use, interpret and understand data, and build with it.
Each local city region, at a minimum, should have an active open data group – and links between these should be encouraged. Funding for this core part of the eco-system should be seen by Scottish and Local government as an investment in the economic and social future of Scotland.
The whole is greater than the sum of the parts: recruiting and involving additional local partners, such as local businesses, to make their data open will significantly enhance what the data community can build or create.
We need more meet-ups, events, competitions, challenges, and opportunities for data scientists, coders, analysts to work with government data.
And what will you do?
As the old adage says, “If you are not part of the solution, you are part of the problem.” So my challenge to you is, whatever your role, what are you doing to bring this about?
For local government in particular, please stop boasting about what you are going to do. Do the thing, whatever it is; make it live, publish the data, deliver the policy, live up to your promises – then you can boast about it.
If you have a responsibility for data and you aren’t actively pushing for its release as open data then you are probably in the wrong job.
If you are a politician or elected official, and you are not questioning why your organisation is not publishing open data and supporting its use, then you should stand down and let someone who understands this stand for your seat.
If you work in Economic Development, Community Development, Health, Social Care, Transport, Environmental Services or anything else, and you aren’t supporting a movement which can positively impact your area of specialism, then you need to rethink your commitment to that role.
If you find yourself justifying why you haven’t published – you couldn’t get support, you would have liked to but didn’t get a budget, ‘legal’ said no, the dog ate your data… – please stop. I have heard excuses from all quarters for the last eight years. No more, please.
If you are an academic and your course neither makes use of, nor champions, open data, then revise your course materials (they could probably do with a refresh anyway).
If you are a developer, citizen, journalist, analyst – whatever – and you are not part of a local data meet-up, join one. If there isn’t one, start one.
If your local authority isn’t publishing open data, ask them why: lobby councillors, use FOI, get in the press.
Stop waiting for others to make stuff happen!
My intention is to write a follow up to this section, with a more detailed list of suggestions, links to handy guides, useful publications etc.
I am always up for a conversation about this. If you want to make the right things happen, and need advice, or guidance, for your organisation, business or community, then we can help you. Please get in touch. You can find me here or here or fill in this contact form and we will respond promptly.
Note: this blog post was first published on 10th June 2018 at CodeTheCity.co.uk and has been archived here with redirects from the original URL.
In this, the first of two posts, I look back over eight years of open data in Scotland, showing where ambition and intent mostly didn’t deliver as we hoped.
In the next part I will look forward, examining how we should rectify things, engage the right people, build on current foundations, and how we all can be involved in making it work as we hoped it would all those years ago.
Let our story begin
“The moon was low down, and there was just the glimmer of the false dawn that comes about an hour before the real one.” – Rudyard Kipling, Plain Tales from the Hills, 1888
The story to-date of Open Data in Scotland is one of multiple false dawns. Are we at last about to witness a real sunrise after so much misplaced hope?
At Data Lab‘s recent Innovation Week in Glasgow, I found myself among 115 other data science MSc students – some of the brightest and best in Scotland – working on seven different industry challenges. You can read more of how that went on my own blog. In this post I want to mention briefly one of the challenges, and the subsequent conversations which it stirred in the room, then on social media and even in email correspondence, then use that to illustrate my false dawn analogy.
The Innovation Week challenge was a simple one compared to some others, and was composed of two questions: “how might we analyse planning applications in light of biodiversity?”, and, “how might we evaluate the cumulative impact of planning applications across the 32 Scottish Local Authorities?”
These are, on the face of it, fairly easily answered. To make it even simpler, as part of the preparation for the innovation week, Data Lab, Snook and others had done some of the leg work for us. This included identifying the NBN Atlas system as one which contained over 219 million sightings of wildlife species, which could be queried easily and which provided open access to its data.
That should have been the difficult part. The other part – getting current planning application data from the Scottish local authorities – should have been the easier task, but it was far from it. In fact, in the context of the time available to us, it was impossible: we could not find a single council of the 32 offering its planning data as open data. You can read more of the particulars of that on my earlier blog posts, above.
This post is about the general, not the specific, so, for now, let us set some context and perhaps see how we got to this point.
The first false dawn.
We start in August 2010, when I was working in Aberdeen City Council. I’d been reading quite a bit about open data, and following what a few enlightened individuals, such as Chris Taggart, were doing. It seemed to me so obvious that open data could deliver so much socially and economically – even if no formal studies had by then been published. So, since it was a no-brainer, I arranged for us to publish the first open data in Scotland – at least from a Scottish City Council.
The UK Coalition Government had, in 2010, put Open Data front and centre. They created http://data.gov.uk and mandated a transparency agenda for England and Wales which necessitated publishing Open Data for all LA transactions over £500.
At some point thereafter, in 2011-12 both Edinburgh and Glasgow councils started to produce some open data. Sally Kerr in Edinburgh became their champion – and began working with Ewan Klein in Edinburgh University to get things moving there. I can’t track the exact dates. If you can help me, please let me know and I will update this post.
In 2012 the Open Data Institute was founded by Nigel Shadbolt and Tim Berners-Lee, and from day one championed open data as a public good, stressing the need for effective governance models to protect it.
During 2012 and 2013 Aberdeen, Edinburgh and others started work with Nesta Scotland, run out of Dundee, by the inspirational Jackie McKenzie and her amazing team. They funded two collaborative programmes: Make It Local Scotland and Open Data Scotland.
The former had Aberdeen City Council using Linked Open Data (another leap forward) to create a citizen-driven alerts system for road travel disruption. This was built by Bill Roberts and his team at Swirrl – who have gone on to do more excellent work in this area.
Around mid-2013 Glasgow received Technology Strategy Board funding for a future cities demonstrator and was recruiting people to work on its open data programme.
The second Nesta programme, Open Data Scotland, saw two cities – Aberdeen and Edinburgh – work with two rural councils, East Lothian and Clackmannanshire. Crucially, it linked us all with the Code For Europe movement, and we were able to see at first hand the amazing work being done in Amsterdam, Helsinki, Barcelona, Berlin and elsewhere. It felt that we were part of something bigger, and unstoppable.
And it gets real-er
In late 2014 the Scottish Government appeared to suddenly ‘get’ open data. They wanted a strategy – so they pulled a bunch of us together to write one. The group included Sally from Edinburgh and me – and the document was published in March 2015. I had pushed for it to have more teeth than it ended up having, and to commit to defined actions, putting an onus on departments and local government to deliver widely on this in a tight timescale.
It did include –
“To realise our vision and to meet the growing interest from users we encourage all organisations to have an Open Data publication plan in place and published on their website by December 2015. Organisations currently publishing data in a format which does not readily support re-use, should within their plan identify when the data will be made available in a more re-usable format. The ambition is for all data by 2017 to be published in a format of 3* or above.” I will come back to this later.
This MUST be it!
In 2016-2017 the Scottish Cities Alliance, supported by the European Regional Development Fund launched a programme: Scotland’s Eighth City – The Smart City. At its heart was data – and more specifically open data. The data project was to feature all seven of Scotland’s cities, working on four streams of work:
data engagement, among others.
The perception was also at that time that the Scottish Government had taken its eye off the ball as regards open data. Little if anything had changed as a result of the 2015 strategy. By working together as 7 cities we could lead the way – and get the other 25 councils, and the Scottish Government themselves, not only to take notice, but also to work with us to put Open Data at the heart of Scottish public services.
The programme would run from Jan 2017 to Dec 2018. I was asked to lead it, which I was delighted to do – and remained involved in that way until I retired from Aberdeen City Council in June 2017.
At that point Aberdeen abandoned all commitment to open data and withdrew from the SCA programme. I have no first-hand knowledge of the SCA programme as it stands now.
Six False Dawns Later
So, after six false dawns, what is the state of open data in Scotland: is it where we expected it to be? The short answer to that has to be a resounding no.
Some of the developments which should have acted as beacons have been abandoned. The few open data portals we have are, with some newer exceptions, looking pretty neglected: data is incomplete or out of date. There is no national co-ordination of effort, no clear sets of guidance, no agreement on standards or terminologies, no technical co-ordination.
Activity, where it happens at all, is localised, and is more often than not grass-roots driven (which is not in itself a bad thing). In some cases local authorities are being shamed into reinstating their programmes by community groups.
The Scottish Government, with the exception of their SIMD Linked Data work, which was again built by Swirrl, and some statistical data, have produced shamefully little Open Data since their 2015 Strategy.
Despite a number of key players in the examples above still being around, in one role or another, and a growing body of evidence demonstrating ROI, there is strong evidence that Senior Managers, Elected Members and others don’t understand the socio-economic benefits that publishing open data can bring. This is particularly disturbing considering shrinking budgets and the need to be more efficient and effective.
So, what now?
Given that we have witnessed these many false dawns, when will the real sunrise be? What will trigger that, and what can we each do to make it happen?
Last weekend we hosted the second Aberdeen Air Quality hack weekend in recent months. Coming out of it, there are a number of tasks which we need to work on next. While some of these fall to the community to deliver, there are also significant opportunities for us to work with partners.
While the Air Aberdeen website is better, we still need to apply the styling that was created at the weekend.
We’ve established that the DHT022 chips which we use in the standard Luftdaten device model have challenges in working in our maritime climate. They get saturated and stop reporting meaningful values. There is a fix which is to use BME380 chips in their place. These will continue to give humidity and temperature readings, plus pressure, but due to the different technology used will handle the humidity better. Knowing local humidity is important (see weather data below). So, we need to adapt the design of all new devices to use these chips, and retrofit the existing devices with the new chips.
Placement of new devices
We launched in February with a target of 50 sensors by the end of June and 100 by the end of the year. So far attendees have built 55 devices of which 34 are currently, or have recently been, live. That leaves 21 in people’s hands that are still to be registered and turned on. We’re offering help to those hosts to make them live.
Further, with the generous sponsorship of Converged, Codify, and now IFB, we will shortly build 30 more devices, taking us to a total of 85. We’ve had an approach from a local company who may be able to sponsor another 40, so it looks like we will soon exceed the 100 target. Where do we locate these new ones? We need a plan to place them strategically around the city where they will be most useful – which is where the map, above, comes in.
Community plus council?
We really want to work with the local authority on several aspects of the project. It’s not them versus us. We all gain by working together. There are several areas that we could collaborate on, in addition to the strategic placement of future devices.
For example, we’ve been in discussions with the local authority’s education service with a view to siting a box on every one of the 60 schools in the city. That would take us to about 185 devices – far in excess of the target. Doing that needs funding, and while the technology challenge to get them on the network is trivial, ensuring that the devices survive on the exterior of the buildings might be a challenge.
Also, we’ve asked but had no response to our request to co-locate one of our devices on a roadside monitoring station which would allow us to check the correlation between the outputs of the two. We need to pursue that again.
Comparing our data suggests that we can more than fill in gaps in the local council’s data. The map of the central part of Aberdeen in the image above, shows all of the six official sensors (green) and 12 of the 24 community sensors that we have in the city (in red). You can also see great gaps where there are no sensors which again shows the need for strategic placement of the new ones.
We’ve calculated that with a hundred sensors we’d have 84,096,000 data observations per year for the city, all as open data. The local authority, with six sensors each publishing three items of data hourly, have 157,680 readings per annum – which is 0.18% of the community readings (and if we reach 185 devices then ACC’s data is about 0.10% or 1/1000th of the community data) and the latter of course, besides being properly open-licensed, has much greater granularity and geographic spread.
We need to ensure that we gather historic and new weather data and use that to check if adjustments are needed to PM values. Given that the one-person team who was going to work on this at CTC16 disappeared, we need to first set up that weather data gathering, then apply some algorithms to adjust the data when needed, then make that data available.
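As a starting point for those adjustment algorithms, here is a sketch of one published humidity correction for low-cost particulate sensors (the growth-factor approach of Crilley et al., 2018). The κ value is illustrative only and would need local calibration against the weather data we gather; this is not the project's agreed method:

```python
def humidity_corrected_pm(pm_raw: float, rh: float, kappa: float = 0.4) -> float:
    """Correct a raw PM reading for hygroscopic particle growth.

    Divides the raw value by a growth factor derived from relative
    humidity (rh, in %). kappa is an aerosol hygroscopicity parameter;
    0.4 is an illustrative value needing local calibration.
    """
    rh = min(rh, 99.0)  # avoid division by zero near saturation
    growth = 1.0 + (kappa / 1.65) * rh / (100.0 - rh)
    return pm_raw / growth

# A reading taken in near-saturated air is scaled down far more
# than the same reading taken in dry air.
print(humidity_corrected_pm(30.0, 95.0))
print(humidity_corrected_pm(30.0, 40.0))
```

Once the weather-data gathering is in place, a correction like this could be applied retrospectively to the historic readings as well as to new ones.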
Engagement with Academia
We need to get the two local universities aboard, particularly on the data science work. We have some academics and post-grads who attend our events, but how do we get the data used in classes and projects? How do we attract more students to work with us? And, again, we need schools not only hosting the devices but pupils using the data to understand their local environment.
The cool stuff
Finally, when we have the data collected, cleaned and curated, and APIs in place (from the green up through orange to red layers below), we can start to build some cool things (the blue layers).
These might include, but are not limited to:
data science-driven predictive models of forecast AQ in local areas,
public health alerts,
mobile apps to guide you where it is safe to walk, cycle, jog or suggest cleaner routes to school for children,
logging AQ over time and measuring changes,
correlating local AQ with hospital admissions for COPD and other health conditions, and
informing debate and the formulation of local government strategy and policy.
As we saw at CTC16, we could also provide the basis for people to innovate using the data. One great example was the hacked LED table-top lamp which changes colour depending on the AQ outside. Others want to develop personalised dashboards.
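That lamp hack boils down to a simple threshold mapping from a PM2.5 reading to a colour. A minimal sketch, with illustrative thresholds (loosely inspired by the DEFRA daily air quality index bands, not the banding the CTC16 lamp actually used):

```python
def lamp_colour(pm25: float) -> str:
    """Map a PM2.5 reading (in µg/m³) to a lamp colour.

    Thresholds here are illustrative only; a real lamp would use
    whichever banding the project agrees on.
    """
    if pm25 <= 11:
        return "green"   # low
    if pm25 <= 35:
        return "amber"   # moderate
    return "red"         # high

print(lamp_colour(8))   # green
print(lamp_colour(20))  # amber
print(lamp_colour(60))  # red
```

The same mapping could drive a personalised dashboard widget just as easily as an LED lamp.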
Update: A write-up of this event which took place on 16-17th February 2019 is available on this page.
How much do you care about the quality of the air you breathe as you walk to work or university, take the kids to school, cycle or jog, or open your bedroom window?
How good is the air you are breathing? How do you know? What are the levels of particulates (PM2.5 or PM10) and why is this important?
When do these levels go up or down? What does that mean?
Who warns you? Where do they get their data, and how good is it?
Where do you get information, or alerts that you can trust?
We aim to sort this in Aberdeen
Partnering with community groups, Aberdeen University and 57 North Hacklab, we are working on a long-term project to build and deploy community-built and community-hosted sensors for PM2.5 and PM10. We aim to have fifty of these in place across Aberdeen in the next few months. You can see some early ones in place and generating data here.
The first significant milestone of this will be the community workshop we are holding on 16-17 February 2019. If you want to be part of it, you can get a ticket here. But, be quick; they are going quickly.
There are loads of things you can do if you attend.
For a small cost, you can come along and build your own sensor with someone to help you, and take it home to plug into your home wifi. It will then contribute data for your part of the city.
But we will be doing much more than that.
Working with the data
If you have experience in data science or data analysis, or if you want to work with those who do, there are loads of options to work with the data from existing and future sensors.
Allow historical readings to be analysed against those of the official government sensors for comparison.
Use weather data – wind speed, humidity and so on – alongside the readings to build live maps and help identify sources of emissions.
Compensate sensor readings for factors which affect measured pollution levels, to better understand the emissions of pollutants in a given area.
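The comparison against official sensors could start as simply as correlating hourly series from a community device and a co-located monitor. A sketch using only the standard library, with made-up sample values standing in for real readings:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical hourly PM2.5 values (µg/m³) from a community sensor
# and a co-located official monitor.
community = [12.0, 15.5, 18.0, 22.5, 19.0, 14.0]
official = [10.5, 14.0, 17.5, 21.0, 18.5, 13.0]

print(f"r = {pearson(community, official):.3f}")
```

A consistently high correlation (with a stable offset) would support using the community network to fill the gaps between the six official sensors; a poor one would point back at calibration work such as the humidity compensation above.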