Aberdeen Provosts

In the run-up to Code The City 19 we had several suggestions of potential projects that we could work on over the weekend. One was that we add all of the Provosts of Aberdeen to Wikidata. This appealed to me, so I volunteered to work on it in a team with Wikimedia UK’s Scotland Programme Coordinator, Dr Sara Thomas, with whom I have worked on other projects.

In preparation for CTC19 I’d been reading up on the history of the City’s provosts and discovered that up to 1863 the official title was Provost, and from that point it was Lord Provost. I’d made changes to the Wikipedia page to reflect that, and I’d added an extra item to Wikidata so that we could create statements that properly reflected which position the people held.

Sara and I began by agreeing an approach and sharing resources. We made full use of Google Docs and Google Sheets.

We had two main sources of information on Provosts: the Wikipedia page, List of provosts of Aberdeen, and a printed reference work on the city’s provosts (see Data harvesting, below).

Running the project

I started by setting up a Google Sheet to pull data from Wikipedia as a first attempt to import a list to work with. The importHTML function in Google Sheets is a useful way to retrieve data in list or table format.

I entered the formula in the top left cell (A1):

=importhtml("https://en.wikipedia.org/wiki/List_of_provosts_of_Aberdeen", "list", 27)

I then repeated the formula for each of the other lists – one per century – each with its own index number as the third argument. This populated our sheet with the numerous lists of provosts.

That state didn’t last very long. An importHTML query is dynamic, and the structure of the Wikipedia page was, it appeared, being edited with extra lists while we worked – so the index numbers shifted and groups of former provosts kept disappearing from our sheet.

I decided to create a list manually – copying the HTML of the Wikipedia page and running some regex find-and-replace commands in a text editor to leave only the text we needed, which I then pasted into Sheets.

Partial list of Provosts
Partial list of Lord Provosts
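The find-and-replace was done by hand in a text editor, but the same cleanup can be sketched in a few lines of Python (illustrative only – the file name and the regexes assume a saved copy of the page’s HTML lists):

import re

with open("provosts.html") as f:
    html = f.read()
# Pull out each list item, then strip any tags left inside it.
entries = [re.sub(r"<[^>]+>", "", m).strip()
           for m in re.findall(r"<li>(.*?)</li>", html, re.S)]
print("\n".join(entries))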

Once we had that in the Google Sheet we got to work with some formulae to clean and arrange the data. Our entries were in the form “(1410–1411) Robert Davidson” so we had to do three things (see the formulas sketched after this list):

    • split names from dates,
    • split the start dates from end dates, and
    • split names into family names and given names.
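With the raw entry in A2, the extracted dates in B2 and the full name in C2, formulas along the following lines do the splitting (cell references are illustrative, and simple two-part names are assumed – longer names were among the odd results we fixed by hand):

=MID(A2, 2, FIND(")", A2) - 2)                gives 1410–1411 (the dates)
=LEFT(B2, FIND("–", B2) - 1)                  gives 1410 (start date)
=MID(B2, FIND("–", B2) + 1, 4)                gives 1411 (end date)
=TRIM(MID(A2, FIND(")", A2) + 1, LEN(A2)))    gives Robert Davidson
=LEFT(C2, FIND(" ", C2) - 1)                  gives Robert (given name)
=MID(C2, FIND(" ", C2) + 1, LEN(C2))          gives Davidson (family name)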

Having got that working (albeit with a few odd results to fix manually), Sara identified a Chrome plugin called “Wikipedia and Wikidata tools” which proved really useful. For example, we could query the term in a cell, e.g. “Hadden”, and get back the QID of the first matching item; we could then point another query at that QID and ask what it was an instance of. If the answer was Family Name or Given Name we could use those QIDs directly and only look up the others manually. That saved quite a bit of time.

Identifying QIDs for Given and Family Names
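Under the hood these are two straightforward calls to Wikidata’s public API. A minimal Python sketch of the same pair of lookups (Q101352 is the Wikidata item for family name, Q202444 for given name):

import requests

API = "https://www.wikidata.org/w/api.php"

def first_qid(term):
    # Return the QID of the first search hit for a term, e.g. "Hadden".
    r = requests.get(API, params={"action": "wbsearchentities",
                                  "search": term, "language": "en",
                                  "format": "json"})
    hits = r.json().get("search", [])
    return hits[0]["id"] if hits else None

def instance_of(qid):
    # Return the QIDs that an item is an instance of (property P31).
    r = requests.get(API, params={"action": "wbgetclaims", "entity": qid,
                                  "property": "P31", "format": "json"})
    claims = r.json().get("claims", {}).get("P31", [])
    return [c["mainsnak"]["datavalue"]["value"]["id"] for c in claims]

qid = first_qid("Hadden")
print(qid, instance_of(qid))  # a family name comes back as Q101352, a given name as Q202444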

Our aim in all of this was to prepare a bulk upload to Wikidata with as little manual entry as possible. For that, Sara had identified Quickstatements, a bulk-upload tool for Wikidata that allows you to make large numbers of edits through a relatively simple interface.

Sara created a model for what each item in Quickstatements should contain:

A model of a Quickstatements entry

There are a few quirks – for example, how you format a date – but once you’ve got the basics down it’s an incredibly powerful tool. The help page is really very useful.
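For illustration (this is not our actual batch), a single hypothetical V1-format entry might look like the following. Columns are tab-separated; LAST refers to the item just created by CREATE; Q99999999 is a placeholder for the provost position item; P31 is instance of (Q5 is human), and P39 is position held, qualified by start time (P580) and end time (P582):

CREATE
LAST    Len    "Robert Davidson"
LAST    P31    Q5
LAST    P39    Q99999999    P580    +1410-00-00T00:00:00Z/9    P582    +1411-00-00T00:00:00Z/9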

Where dates were concerned, I created a formula to look up the date in another cell then surround it with the formatting needed:

="+"&Sheet1!J99&"-00-00T00:00:00Z/9"

This gave +1515-00-00T00:00:00Z/9 as the output – the trailing /9 denotes year-level precision in Wikidata’s date model.

You can also bulk-create items, which is what we did here. After a few stumbles, we found that the tool worked best in Firefox.

Data harvesting

As mentioned above, we used a printed source, from which we harvested the data about the individual Provosts. It’s easy to get very detailed very quickly, but we decided on a basic upload of:

  • Name
  • First name
  • Last name
  • Position held (qualified by the dates)
  • Date of birth, and death (where available).

Some of our provosts held the position three or four times, often with breaks between terms. We attempted to work out a way to add the same role held twice with different date qualifiers, but ultimately this had to be done manually.

The first upload

We made a few test batches – five or six entries each – to see how the process worked.

A test batch to upload via Quickstatements

When that worked we created larger batches. We concluded the weekend with all of the Provosts and Lord Provosts added to Wikidata, which was very satisfying. We also had a list of further tasks to carry out to enhance the data. These included:

  • Add multiple terms of office – now complete,
  • Add statements for replaces (P1365) and replaced by (P1366) – partly done,
  • Add honorific titles – partly done,
  • Add images of signatures (partly done) and portraits (completed) from the reference book,
  • Add biographical details from the book – hardly started,
  • Source images for Wikimedia Commons from the collection of portraits at AAGM – request sent,
  • Add places of burial, identifiers from Find A Grave, and photographs of gravestones,
  • Add streets named after provosts and link them.

You can see the results in this Wikidata query: https://w.wiki/PsF

A Wikidata query showing Provosts’ terms of office, and their replacements
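For anyone wanting to reuse the data programmatically, the same sort of query can be run against the Wikidata Query Service endpoint. A sketch, with a placeholder QID standing in for the provost position item:

import requests

QUERY = """
SELECT ?person ?personLabel ?start ?end WHERE {
  ?person p:P39 ?held .
  ?held ps:P39 wd:Q99999999 .            # placeholder: the provost position item
  OPTIONAL { ?held pq:P580 ?start . }    # start of term
  OPTIONAL { ?held pq:P582 ?end . }      # end of term
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?start
"""
r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"})
for b in r.json()["results"]["bindings"]:
    print(b["personLabel"]["value"], b.get("start", {}).get("value", ""))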

This was a very interesting project to work on – and there is still more to do to improve the data, which you can help with.

We help kids in regeneration areas. What’s one of them?

At CTC we work with ONE Codebase to deliver Young City Coders classes. These are after-school activities to encourage young people to get into coding by trying Scratch, Python and other languages in a CoderDojo-like environment.

Inoapps generously gave us some funding to cover costs and donated old laptops (as did the James Hutton Institute), which we cleaned up and recycled into machines the kids could use.

All of which is great – and we have 20-25 kids each session starting to get into these coding languages.

The Challenge

But there is an issue – our attendees come overwhelmingly from west-end schools, while our aim is to help kids in regeneration areas, where opportunities are generally fewer.

So that means identifying the Aberdeen schools that fall within the regeneration areas, contacting the head teachers, and having a discussion about what help they would like to see us provide. Simple?

No.

Search for regeneration areas

Starting with the basics – what are the regeneration areas of Aberdeen? According to Google, the Aberdeen City Council website doesn’t tell us. Certainly not in the top five pages of results (and yes, I did go down that far).

Google’s top answer is an Evening Express article which says that there are five regeneration areas: Middlefield, Woodside, Tillydrone, Torry and Seaton. From what I have heard that sounds about right – but surely there is an official source for this.

Further searching turns up a page from Pinacl Solutions, who won a contract from ACC to provide wifi in the northern regeneration areas of “Northfield, Middlefield, Woodside and Tillydrone.” Which raises the question of whether Northfield is or isn’t a sixth regeneration area.

The Citizens Advice Bureau Aberdeen has an article on support services for regeneration areas of “Cummings Park, Middlefield, Northfield, Seaton, Tillydrone, Torry, Woodside and Powis.” That adds two more to our list.

Other sites report there being an “Aberdeen City Centre regeneration area.” Is that a ninth?

Having a definitive and authoritative page from ACC would help. Going straight to their site and using the site’s own search function should help. I search for “regeneration areas” and then just “regeneration.”

ACC results for regeneration areas

I get two results: “Union Street Conservation Area Regeneration Scheme” and “Buy Back Scheme”. The latter page contains not a single mention of regeneration, despite the site returning it as a result. The former appears to be all about the built environment, so it is probably not a ninth regeneration area in the sense that the others are. Who knows?

So what are the regeneration areas – and how can I find which schools fall within them?

Community Planning Aberdeen

Someone suggested that I try the Community Planning Aberdeen site. It has no site search, which wasn’t very helpful, but using Google to restrict results to that domain threw up a mass of PDFs.

After wading through half a dozen of these I could find no list or definition of the regeneration areas of the city. Amending the query to a specific “five regeneration areas” or “eight…” didn’t work either.

Trying “seven regeneration areas” did return this document, with the line: “SHMU supports residents in the seven regeneration areas of the city.” So, if that is correct, it appears there are seven. Which seven they are – and which of the eight (or nine) we’ve found so far is not included – is still unknown.

Wards, neighbourhoods, districts, areas, school catchment areas

And do the regeneration areas map onto council wards, or are they exact matches for other defined areas, such as neighbourhoods?

It turns out that there are 13 council wards in the city. I had to count them manually from this page, which I reached via Google – searching the ACC site for council wards doesn’t get you there.

I seem to remember there were 37(?) city neighbourhoods identified at one time. To find them I had to know that there were 37, as searching for “aberdeen neighbourhoods” wasn’t specific enough to return any meaningful list or useful page.

And until we find out what the regeneration areas are, and can work out which primary and secondary schools fall within them, we can’t do very much. Which means that the kids who would benefit most from code clubs don’t get our help.

I thought this would be easy!

At the very minimum I could have used a web page with a list of regeneration areas and some JPG maps to show where they are. That’s not exactly hard to provide. (And I’d make sure the SEO was done in a way that performed well on Google – oh, and I’d sort the site’s own search.) That would do at a pinch, but sticking at that would miss so many opportunities.

Better would be a set of shapefiles or GeoJSON (ideally presented on the almost empty open data platform) with polygons that I could download and overlay on a background map.

That done, I could download a set of school boundaries (they do exist here – yay), overlay those, and work out the intersections between the two. Does a school boundary overlap a regeneration area? Yes? Then that school is on our target list to help.
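With those two layers in hand, the overlap test is a few lines of GeoPandas. A sketch only, since the regeneration-area file is exactly what is missing – the file and column names here are hypothetical:

import geopandas as gpd

schools = gpd.read_file("school_catchments.geojson")
regen = gpd.read_file("regeneration_areas.geojson").to_crs(schools.crs)
# A school is a target if its catchment overlaps any regeneration area.
targets = schools[schools.intersects(regen.unary_union)]
print(targets["school_name"].tolist())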

Incidentally, what has happened to the ACC online mapping portal? Not only does it not appear in any search results, but all of the maps except the council boundary appear to have vanished – and there used to be dozens of them!

Lack of clarity helps no-one

A failure to publish information and data helps no-one. How can anyone know if their child’s school is in a regeneration area? How can a community group know if they are entitled to additional funding?

Without accurate boundary maps – and better still data – how can we match activities to physical areas (be they regeneration areas, wards, neighbourhoods, or catchment areas)?

How can we analyse investment, spending, attainment, street cleanliness, crime, poverty, number of planning applications, house values, RTAs per area if we can’t get the data?

For us this is a problem, but for the kids in the schools this is another opportunity denied.

Just as we highlighted in our previous post on recycling, the lack of open data is not an abstract problem. It deprives people of data and information and stifles opportunities for innovation. Our charity, and our many volunteers at events can do clever stuff with the data – build new services, apps, websites, and act as data intermediaries to help with data literacy.

Until there is a commitment nationally (and at a city level) to open data by default we will continue to highlight this as a failing by government.

——————————-

The header image for this page is a map of secondary school boundaries from ACC Open Data, on an OpenStreetMap background.

 

Boundaries, not barriers

Note: This blogpost first appeared on codethecity.co.uk in January 2019 and has been archived here with a redirect from the original URL. 

I wrote some recent articles about the state of open data in Scotland. Those highlighted the poor current provision and set out some thoughts on how to improve the situation. This post is about a concrete example of the impact of government doing things poorly.

Ennui: a great spur to experimentation

As the Christmas break ticked by I started to get restless. Rather than watch a third rerun of Elf, I decided I wanted to practise some new skills in mapping data: specifically, how to make choropleth maps. Rather than slavishly follow some online tutorials and show unemployment per US state, I thought it would be more interesting to plot some data for Scotland’s 32 local authorities.

Where to get the council boundaries?

If you search Google for “boundary data Scottish Local Authorities” you will be taken to this page on the data.gov.uk website. It is titled “Scottish Local Authority Areas” and the description explains the background to local government boundaries in Scotland. The publisher of the data is the Scottish Government Spatial Data Infrastructure (SDI). Had I started on their home page, which is far from user-friendly, and filtered and searched, I would eventually have been taken back to the same page on the data.gov.uk portal.

The latter page offers a link to “Download via OS OpenData” which sounds encouraging.

Download via OS Open Data

This takes you to a page headed, alarmingly, “Order OS Open Data.” After some lengthy text (which warns that DVDs will take about 28 days to arrive but that downloads should normally arrive within an hour), there follows a list of fifteen data sets to choose from. After reading the descriptions, the Boundary Line option looked most appropriate.

This was described as being in the proprietary ESRI shapefile format, at 754MB, with another version in the also-proprietary MapInfo format. Importantly, there was no option to download data for Scotland only, which I wanted. In order to download it, I had to give some minimal details and complete a captcha. On completion, I got the message, “Your email containing download links may take up to 2 hours to arrive.”

There was a very welcome message at the foot of the page: “OS OpenData products are free under the Open Government Licence.” This linked not to the usual National Archives definition, but to a page on the OS site itself with some extra, but non-onerous reminders.

Once the link arrived (actually within a few minutes) I clicked to download the data as a Zip file. Thankfully, I have a reasonably fast connection, and within a few minutes I had received and unzipped twelve sets of four files each, which took up 1.13GB on my hard drive.

Partial directory listing of downloaded files

Two sets of files looked relevant: scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. I couldn’t work out the difference between them, and it wasn’t clear why Wales data was bundled with Scotland’s – but these looked useful.

Wrong data in the wrong format

My first challenge was that I didn’t want shapefiles, but these appeared to be the only thing on offer. The tutorials I was going to follow and adapt used a library called Folium, which calls for data as GeoJSON – a neutral, lightweight and human-readable format.

I needed to find a way to check the contents of the Shapefiles: were they even the ones I wanted? If so, then perhaps I could convert them in some way.

To check the shapefile contents, I settled on a library called GeoPandas. One after the other I loaded scotland_and_wales_region.shp and scotland_and_wales_const_region.shp. After viewing the data in tabular form, I could see that these were not what I was looking for.
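Checking a shapefile this way takes only a couple of lines – for example:

import geopandas as gpd

gdf = gpd.read_file("scotland_and_wales_region.shp")
print(gdf.head())   # the attribute table: names, codes, geometry
print(len(gdf))     # how many boundary polygons it contains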

So, I searched again on the Scottish SDI site and found this page. It has a Download link at the top right. I must have missed that earlier.

SSI Download Link

But when you click on Download, it turns out to be a download of the metadata associated with the data, not the data files themselves. Clicking the “Download link via OS Open Data” further down the page takes you back to the very same link as above.

I did further searching. It appeared that the Scottish Local Government Boundary Commission offered data for wards within councils, but not the councils’ own boundaries. For admin boundaries, there were links to OS’s Boundary Line site, where I was confronted by the same choices as earlier.

Eventually, through frustration, I started to check the others of the twelve previously-downloaded Boundary Line data sets and found a shapefile called “district_borough_unitary_region.shp”. On inspection in GeoPandas, it appeared that this was what I needed – despite Scottish local authorities being neither districts nor boroughs – except that it contained all local authority boundaries for the UK, some 380 of them, not just the 32 that I needed.

Converting the data

Having downloaded the data I then had to find a way to convert it from shapefile to GeoJSON (adapting some code I had discovered on StackOverflow), then subset it to throw away almost 350 of the 380 boundaries. This was a two-stage process: use a conversion script to read in the shapefiles, process them, and spit out GeoJSON; then write some code to read in the GeoJSON, convert it to a Python dictionary, match elements against a list of Scottish LAs, and write the subset of boundaries back out as a GeoJSON text file.

Code to convert shapefiles to GeoJSON
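For anyone repeating this, GeoPandas can collapse both stages into a few lines. A condensed sketch, where the NAME column and the authority names are assumptions about the Boundary Line attributes:

import geopandas as gpd

SCOTTISH_LAS = ["Aberdeen City", "Aberdeenshire", "Angus"]  # ...plus the other 29

gdf = gpd.read_file("district_borough_unitary_region.shp")  # ~380 UK boundaries
scot = gdf[gdf["NAME"].isin(SCOTTISH_LAS)]                  # keep the Scottish 32
scot.to_file("scottish_las.geojson", driver="GeoJSON")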

Using the GeoJSON to create a choropleth map

I’ll spare you the details, but I then spent many, many hours trying to get the GeoJSON I had generated to work with the Folium library. Eventually it dawned on me that while the converted GeoJSON looked OK, it was not in fact correct: the conversion routine was not producing valid GeoJSON.

Another source

Having returned to this about 10 days after my first attempts, and done more hunting around (surely someone else had tried to use Scottish LAs as GeoJSON!), I discovered that Martin Crowley had republished boundaries for UK administrations on GitHub as GeoJSON. This was something I had intended to do myself later, once I had working conversions, since the OGL licence permits republishing with attribution.

Had I had access to these two weeks earlier, I could have used them. With the Scottish data downloaded as GeoJSON, producing a simple choropleth map as a test took less than ten minutes!

Choropleth map of Scottish Local Authorities

While there is some tidying to do on the scale of the key, and the shading, the general principle works very well. I will share the code for this in a future post.
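In the meantime, the core of such a map is a single Folium call. A minimal sketch, with a hypothetical CSV of per-authority values and an assumed NAME property in the GeoJSON:

import folium
import pandas as pd

df = pd.read_csv("council_values.csv")                # columns: NAME, value
m = folium.Map(location=[57.5, -4.0], zoom_start=6)   # centred on Scotland
folium.Choropleth(
    geo_data="scottish_las.geojson",                  # the boundaries from above
    data=df,
    columns=["NAME", "value"],
    key_on="feature.properties.NAME",                 # must match the GeoJSON property
    fill_color="YlGn",
    legend_name="Value per local authority",
).add_to(m)
m.save("choropleth.html")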

Some questions

There is something decidedly user-unfriendly about the SDI approach, and it is reflective of the Scottish public sector at large when it comes to open data. This raises some specific, and some general, questions.

  1. Why can’t the Scottish Government’s SDI team publish data themselves, as the OGL facilitates, rather than rely on OS publishing?
  2. Why are boundary data, and by the looks of it other geographic data, published in the ESRI shapefile or MapInfo formats rather than the generally more usable, and much smaller, GeoJSON format?
  3. Why can’t we have Scottish (and English, and Welsh) authority boundaries as individual downloads, rather than bundled as UK-level data, forcing the developer to download unnecessary files? I ended up with 1.13GB (and 48 files) of data instead of a single 8.1MB Scottish GeoJSON file.
  4. What engagement with the wider data science / open data community has the SDI team undertaken to establish how their data could be useful, useable and used?
  5. How do we, as the broader open data community, share or signpost resources? Is it all down to government? Should we actively and routinely push things to Google Dataset Search? Had there been a place for me to look, I would have found the GitHub repo of council boundaries in minutes, and been done in time to see the second half of Elf!

And finally

I am always up for a conversation about how we make open data work as it should in Scotland. If you want to make the right things happen, and need advice or guidance for your organisation, business or community, then we can help you. Please get in touch. You can find me here or here, or fill in this contact form and we will respond promptly.

AQ – what’s next?

For more background read this post and this one. 

Last weekend we hosted the second Aberdeen Air Quality hack weekend in recent months. Coming out of it there are a number of tasks which we need to work on next. While some of these fall to the community to deliver, there are also significant opportunities for us to work with partners.

The Website

While the Air Aberdeen website is better than it was, we still need to apply the styling that was created at the weekend.

Draft web design

Humidity Measurement

We’ve established that the DHT22 chips which we use in the standard Luftdaten device model have trouble working in our maritime climate: they get saturated and stop reporting meaningful values. The fix is to use BME280 chips in their place. These will continue to give humidity and temperature readings, plus pressure, but due to the different technology used they will handle the humidity better. Knowing local humidity is important (see weather data, below). So we need to adapt the design of all new devices to use these chips, and retrofit the existing devices with them.

Placement of new devices

We launched in February with a target of 50 sensors by the end of June and 100 by the end of the year. So far attendees have built 55 devices, of which 34 are currently, or have recently been, live. That leaves 21 in people’s hands still to be registered and turned on. We’re offering help to those hosts to get them live.

Further, with the generous sponsorship of Converged, Codify, and now IFB, we will shortly build 30 more devices, taking us to a total of 85. We’ve also had an approach from a local company who may be able to sponsor another 40. So it looks like we will soon exceed the 100 target. Where do we locate these new ones? We need a plan to place them strategically around the city where they will be most useful – which is where the map, above, comes in.

Community plus council?

We really want to work with the local authority on several aspects of the project. It’s not them versus us: we all gain by working together. There are several areas we could collaborate on, in addition to the strategic placement of future devices.

For example, we’ve been in discussions with the local authority’s education service with a view to siting a box on every one of the 60 schools in the city. That would take us to about 185 devices – far in excess of the target. Doing that needs funding, and while the technology challenge of getting them onto the network is trivial, ensuring that the devices survive on the exterior of the buildings might be a challenge.

Also, we’ve asked, but had no response to, our request to co-locate one of our devices at a roadside monitoring station, which would allow us to check the correlation between the outputs of the two. We need to pursue that again.

Comparing our data suggests that we can more than fill the gaps in the local council’s data. The map of the central part of Aberdeen in the image above shows all six official sensors (green) and 12 of the 24 community sensors that we have in the city (red). You can also see great gaps where there are no sensors, which again shows the need for strategic placement of the new ones.

We’ve calculated that with a hundred sensors we’d have 84,096,000 data observations per year for the city, all as open data. The local authority, with six sensors each publishing three items of data hourly, has 157,680 readings per annum – about 0.18% of the community readings (and if we reach 185 devices then ACC’s data will be about 0.10%, or one-thousandth, of the community data). The community data, besides being properly open-licensed, also has much greater granularity and geographic spread.
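A back-of-envelope check of those figures, assuming each community device reports four values (PM2.5, PM10, temperature, humidity) roughly every 2.5 minutes:

per_device = 4 * (60 / 2.5) * 24 * 365   # 840,960 observations per device per year
community = 100 * per_device             # 84,096,000 for 100 devices
council = 6 * 3 * 24 * 365               # 157,680 readings per year
print(community, council, council / community)   # ratio ≈ 0.0019, i.e. under 0.2%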

Weather data

We need to ensure that we gather historic and new weather data and use it to check whether adjustments are needed to PM values. Given that the one-person team who was going to work on this at CTC16 disappeared, we need first to set up that weather-data gathering, then apply some algorithms to adjust the data when needed, then make that data available.

Engagement with Academia

We need to get the two local universities aboard, particularly on the data science work. We have some academics and post-grads who attend our events, but how do we get the data used in classes and projects? How do we attract more students to work with us? And, again, how do we get schools not only hosting the devices but their pupils using the data to understand their local environment?

The cool stuff

Finally, when we have the data collected, cleaned and curated, and APIs in place (from the green up through orange to red layers below), we can start to build some cool things (the blue layers).

AQA Data Layers

These might include, but are not limited to:

  • data science-driven predictive models forecasting AQ in local areas,
  • public health alerts,
  • mobile apps to guide you where it is safe to walk, cycle or jog, or to suggest cleaner routes to school for children,
  • logging AQ over time and measuring changes,
  • correlating local AQ with hospital admissions for COPD and other health conditions, and
  • informing debate and the formulation of local government strategy and policy.

As we saw at CTC16, we could also provide the basis for people to innovate using the data. One great example was the hacked LED table-top lamp which changes colour depending on the AQ outside. Others want to develop personalised dashboards.

The possibilities, as they say, are endless.

An open letter to Aberdeen City Council

It has been well documented that there is a problem with Aberdeen City Council’s approach to Smart City and Open Data in particular. See these posts, these requests and this GitHub page from a project at CTC11, where we tried to help fix things. Today, a Finnish researcher on Smart Cities posted this on Reddit! International reputation? What international reputation!

Now it appears that in the relaunch last week of the Aberdeen City Council website, the council has ditched masses of content. This includes the city-wide What’s On listing, which was until recently the most heavily-used part of the council website and provided an extremely useful community resource.

More digging – well, Googling of some popular terms for council website content and functions – returns nothing but 404 errors. See the list below for some examples.

When the site last underwent a major update, in 2006, the small team took just six months over the transition, beginning to end. No content was lost or broken, and with URL rewriting and redirects they ensured that everything worked on day one.

The council has been working on the current relaunch – on and off, as managers were swapped around or dispensed with – for two years! And the mess of the site, with massive holes in content and functionality, far outweighs the much-improved look and feel.

So, what is the plan to restore content, much of which is a matter of public record?

We, as tax-payers, have paid for the creation of functionality and information which is of significant public use. So, where has it gone?

For example, where is:

Don’t the citizens of Aberdeen deserve better than this?

Maybe someone would care to make an FOI request to the city council – to ask what data the decision-making on the transfer of content and functionality was based on, and to get a copy of the website stats for the last three months? I think they are fed up with me.

Ian