Opening the data in Aberdeen Convicts – part 2

Introduction

In part one of this blog post we explained the rationale for opening up the data from the Register of Returned Convicts of Aberdeen (1869-1939). In this second part, our intern for the summer project, Sara Mazzoli, explains our methodology and our results.

How we did it

Preparatory work: designing the Google Sheets

Having made the case to open the data, we then designed a process for doing so. We considered Wikidata an ideal site to host the records: data uploaded to Wikidata is released under the CC0 license, which allows anyone to share and use it freely. Moreover, Wikidata allows anyone to query the data freely, and to apply visualisation and analysis techniques.

The process consisted of a few different steps, both for opening the records and for opening the convicts’ pictures – to which we will dedicate a separate section. First, we designed two Google spreadsheets.

The first Google spreadsheet hosted the instructions for transcribing and checking the transcribed data, which we designed before the process began, as well as a table for the volunteers to sign up to either transcribe or check the records. 

The other Google spreadsheet was divided into seven further sheets, one for each decade. To determine which decade a record pertained to, we took the discharge date of the prisoner as our reference. Each row contained information on the transcription and checking (the person who transcribed the record; the person who checked it; any notes – e.g. outstanding information or illegible writing on the page; a link to the register page; the page number) as well as the data from the Register’s page itself: registered number of the convict; age on discharge; name and aliases; gender; complexion; eyes; hair; height (in imperial measurement); crime; sentence; sentence date; discharge date; distinguishing marks (such as tattoos or scars); and address.

Once all the data was transcribed, we followed two different processes: one to open the convicts’ data on Wikidata, and one to upload the isolated mug shots to Wikimedia Commons.

Opening the convicts’ data: upload to Wikidata

Once all the data was transcribed, we created another Google sheet, since it is easier to upload data to Wikidata from this platform. Here we designed look-up tables, and created formulas to translate the convicts’ details from natural language (English) into Wikidata property and item codes (e.g. from “brown hair” to “P1884: Q2367101”, where “P1884” is the property for hair colour and “Q2367101” is the item for brown).

Lookups of hair colours and QID codes
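As an illustration of what those look-up formulas did, here is a minimal sketch of the same logic in code. Only the brown-hair pairing comes from the post; the function name and any further table rows are our own illustrative stand-ins, not the project's actual spreadsheet:

```python
HAIR_PROPERTY = "P1884"  # Wikidata property: hair colour

# Only the "brown" pairing below is taken from the blog post; the rest of
# the project's look-up table would add one row per colour in the Register.
HAIR_LOOKUP = {
    "brown": "Q2367101",
}

def to_wikidata_statement(hair_colour: str) -> str:
    """Translate a transcribed hair colour into a 'property: item' pair."""
    qid = HAIR_LOOKUP[hair_colour.strip().lower()]
    return f"{HAIR_PROPERTY}: {qid}"

# to_wikidata_statement("Brown") -> "P1884: Q2367101"
```

A spreadsheet VLOOKUP over a hidden sheet of colour/QID pairs achieves the same effect; the point is that the translation is a pure table lookup, so it can be checked and corrected in one place.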

However, again, there were decisions we had to make:

  • Because metric measurements are more precise and easier to understand, we designed a formula to convert the imperial measurements into metric.
  • Because the distinguishing marks lacked a consistent format, and because Wikidata requires structured information in order for it to be machine-readable and queryable, we decided not to upload the distinguishing marks.
  • As for the addresses, given that we wanted to visualise them on a map, and noting that plotting every address would make the map too cluttered, we decided to upload only the first address for each individual.
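The imperial-to-metric conversion itself is simple; a sketch of the spreadsheet formula's logic in code (the function name is ours):

```python
def height_to_cm(feet: int, inches: float) -> float:
    """Convert a height recorded in feet and inches to centimetres
    (1 inch = 2.54 cm exactly), rounded to one decimal place."""
    return round((feet * 12 + inches) * 2.54, 1)

# A convict recorded as 5 ft 7 in:
# height_to_cm(5, 7) -> 170.2
```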

Finally, we created a unique ID for each of the items we created, so that the items would be linked together. We also enriched the information available in the Register with the data collected by Phil Astley in his blog.

All the convicts are now available here: https://w.wiki/3bZn 

Opening the convicts’ mugshots: upload to Wikimedia Commons

For the mug shots, we had to follow a different process. 

First, we isolated the mug shots. Then we created a Wikimedia Commons category for the Aberdeen mug shots, where we uploaded the isolated pictures. We also created a broader category for Images from the Aberdeen and Aberdeenshire Archives, so that it would function as a collector for possible future projects. 

To upload the pictures, we created an Excel sheet to generate an automatic description for each picture, using the available data from the register as well as information from Phil’s blog. Once all the pictures were uploaded, we matched each picture with the individual it depicted. All Wikimedia services are interlinked, so it is easy to link a Wikimedia Commons image to a Wikidata item. 
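A hedged sketch of such templated description generation. The field names, the example person, and the wording are our own illustration of the approach, not the project's exact template:

```python
def commons_description(record: dict) -> str:
    """Build an upload description from a transcribed register row.
    The 'name' and 'discharge_date' keys are illustrative field names."""
    return (
        f"Mug shot of {record['name']}, discharged {record['discharge_date']}, "
        "from the Register of Returned Convicts of Aberdeen (1869-1939)."
    )

# "John Smith" here is a made-up example, not a person in the Register:
example = {"name": "John Smith", "discharge_date": "1884"}
# commons_description(example) gives a ready-made caption for the upload form.
```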

What we found out: results

Analysis of sentences: general

Since the following visualisations do not report the names of the convicts, we decided to make use of all the available data: in total, it was possible to analyse the data from 278 records. 

The graph below represents the number of convicts discharged each year. Amongst these, there appear to be 20 individuals who were probably sentenced to penal servitude twice, such as Elisabeth Wilson or Baxter. For all the data analysis, we took the date of discharge, rather than the sentence date, as our reference: the sentence date was often not stated in the Register, while the date of discharge was always present, which allows a more meaningful and precise analysis.

It can be seen that, after 1904 (the year that splits the register in half), far fewer individuals were released. This means that most convictions happened in the early stages of the Register’s existence – in particular during the 1860s, 1870s and 1880s. In fact, we counted 167 individuals released between 1869 and 1904, but only 111 released after 1904. 

Another interesting feature that can be seen here is that no individuals were released in 1918 or 1919, as the Great War was raging. 

If we look closer at the convictions, it is noticeable that all the sentences were penal servitude (apart from one transportation, given in 1851). 83 of them, however, are peculiar, since they also included police supervision, hard labour, forfeiture of licence, or fines.

Secondly, it is possible to see that a person could be convicted of more than one crime. For example, nine people were convicted of robbery and assault – which is why the number of offences is higher than the number of returned convicts, as can be seen from the graph below. Moreover, each sentence could include “P.Cs” (i.e. previous convictions) and/or “hab & rep” (i.e. habit and repute – see the first blog post for more details). 

Finally, three of the sentences recorded in the register were of penal servitude for life. All three were given at the start of the twentieth century, and two of them were for crimes against the person (culpable homicide and murder). The third was for attempting to “communicate information respecting H.M.Forces with the intention of assisting the enemy”. However, all these individuals were ultimately released between 1914 and 1931.

Because these sentences were given in the nineteenth and early twentieth centuries, they do not map neatly onto present-day crime categories. For example, convictions for “abortion”, “sodomy” or “plagium” were hard to classify. We therefore decided to follow a classification of the time, first developed by Hume in 1797. Hume wrote a very comprehensive work on the classification and cases within Scots Law, which better represents the crimes present in the register. 

Offences present in the Register pertaining to each category:
  • Offences against property: Theft (which can be aggravated by habit and repute, housebreaking, shopbreaking, or warehouse breaking), theft by opening lockfast places, embezzlement, plagium, larceny; reset; falsehood and fraud
  • Offences against the person: Culpable homicide, murder, procuring abortion; rape, indecent assault; assault, wounding
  • Offences against public peace: Sending threatening letters
  • Offences against public police or economy: Incest, bigamy, sodomy, indecency, removing a body from its grave
  • Offences against the State: Forgery of notes, uttering; attempting to communicate information respecting H.M. Forces with the intention of assisting the enemy

Table 1: Offences as classified by Hume in 1797 (Hume, 1819). It is really fascinating to find out that plagium, which is still defined as a crime of “child stealing”, is classified as an offence against property; the rationale being that “the creature taken, which has no will of its own, is a thing” (Hume, 1819, p. 82). 

Of course, this classification cannot be employed uncritically, as it reflects a particular worldview on crime, justice, morality and human nature. Fully grasping that worldview is impossible for us, but it is interesting to reflect on it, and on the effects it may still have on the way we see and experience crime. Therefore, in classifying the offences in the Register according to these categories, we do not aim to justify the classifications, but rather to frame those offences within the moral and social paradigms to which they belong. As Pauw (2014, p. 9) claims, “Crime history can provide insight into the social response to crime. In terms of social history, the study of crime provides perspective on society’s definition and expectations for moral behavior”. From this point of view, the category of “Offences against public police or economy” is quite exemplary: it comprised all offences that went against “propriety, good neighbourhood and good manners”. 

Therefore, as we also stressed in the first blogpost, it is important to underline that such classifications, despite having real effects on individuals, are constructed – and thus can and must be framed and questioned.  

Taking a closer look at the offences for which individuals were convicted, it is evident that the vast majority were accused of crimes against property. The total number of offences against property was 246. In comparison, the number of offences against the person was just 50, and those against public police or economy numbered 12. The offences pertaining to the other two classes of crimes, summed together, account for fewer than 10 crimes. There were, moreover, a few sentences that we could not classify, such as “Military striking a superior officer”.

It is interesting to notice that, while offences in most other categories are scattered throughout the register, convictions against public police or economy were mainly given in the last few years of the Register’s existence.

Analysis of sentences: averages and medians

To understand how sentence length changed over the decades, and whether any particular pattern emerged, we calculated the average sentence length for each crime category, both before 1904 and after 1904. We compared these two periods, rather than individual decades, because only the data on crimes against property is plentiful enough to allow a more granular, decade-by-decade comparison of sentences. 

As mentioned above, 84 sentences contained other elements (such as hard labour or police supervision) and so could not be included in the calculation of the averages. Likewise, for mixed sentences such as the above-mentioned “Assault & Robbery”, we could not determine whether the sentence length could be split equally between the two charges, and therefore grouped them with “Other” in this analysis.

Point in time                     | Against property | Against the person | Against public peace | Against public police or economy | Against the State | Other and mixed sentences
Convicts released before 1904
  Number of sentences             | 109              | 17                 | 3                    | 2                                | 1                 | 2
  Average sentence length (years) | 6.55             | 7.94               | 8.33                 | 6.5                              | 6.33              | –
  Median sentence length (years)  | 7                | 7                  | 5                    | 6.5                              | –                 | –
Convicts released after 1904
  Number of sentences             | 64               | 16                 | 1                    | 10                               | 2                 | –
  Average sentence length (years) | 4.57             | 7.53               | 4.41                 | 2.5                              | 5.71              | –
  Median sentence length (years)  | 5                | 5                  | 3                    | 4                                | 12.5              | –

Table 2: statistics on sentence length for each class of crimes, divided by point in time.
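The averages and medians in Table 2 were computed in the spreadsheet; the same calculation can be sketched in a few lines with Python's statistics module. The sentence lengths below are made-up illustrative values, not figures from the Register:

```python
from statistics import mean, median

# Hypothetical sentence lengths (years) for one crime category:
sentences_years = [5, 5, 7, 7, 10]

avg = round(mean(sentences_years), 2)  # arithmetic mean of the lengths
med = median(sentences_years)          # middle value when sorted
# avg -> 6.8, med -> 7
```

As the text notes, a large gap between mean and median (as for crimes against the person after 1904) signals a skewed distribution: a few very long sentences pull the average up while the median stays low.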

In general, sentence length for individuals released after 1904 was somewhat lower than for individuals released before. Calculating the average sentence length for the sentences given before 1904 (where it was possible to calculate it), we found that convicts were, on average, sentenced to 6.73 years in prison. After 1904, that figure fell to 5.26 years. Most of this seems to stem from shorter punishments for crimes against property, which made up the vast majority of sentences and whose average length decreased significantly for individuals released after 1904. 

For crimes against the person after 1904, we see an average sentence length of 7.81 years, contrasted with a median length of 5. For convicts released before 1904 who had committed a crime against the person, a sentence of 5 years or more was usually given, and sentences were more homogeneous – hence median and average are similar. For those released after 1904, instead, there were four harsher sentences of 10 years or more (including the two penal servitudes for life, which lasted 15 and 30 years) and eleven sentences of 5 years or less. Something similar is true for crimes against public police or economy before 1904, where, of the three sentences given, two were 5 years long and one, for incest, was 15 years long.

Analysis of gender and age data

Amongst the 278 records we could analyse, we found that 37 convicts were women. Most of them were convicted of crimes against property, mostly theft. Four, however, were convicted of a crime against the person – all for culpable homicide, in 1886, 1893, 1923 and 1929. When analysing the data by gender, it is apparent that most women were convicted in the second half of the nineteenth century, with most of them discharged between 1869 and 1893. 

Comparing the average age of female and male convicts on discharge, as well as the average sentence length given to each, there are no apparent significant differences.

Point in time                 | Average age on discharge (Female / Male) | Average sentence (Female / Male)
Convicts released before 1904 | 44 / 39                                  | 7.5 / 7.03
Convicts released after 1904  | 39 / 40                                  | 3.67 / 5.17

Table 3: statistics on sentence length and age, divided by gender and by point in time.

Analysis of addresses data

The map of the addresses was developed by Ian, and can be seen here. This visualisation was created with the Wikidata Query Service, and is based on the data we have currently uploaded to Wikidata; it therefore covers records of individuals discharged between 1869 and 1921.

Fig 1: Map of the addresses to which convicts returned. The first view, on the right, covers the whole Aberdeen urban area; the second, below it, is more focused on the city (the area shown is that inside the black square in the first picture). The third shows Aberdeen’s city centre, where most convicts returned (the area shown is that inside the black square in the second picture). 

The dot colours represent the decade in which the convicts at each address were released: yellow for the 1870s, green for the 1880s, red for the 1890s, purple for the 1900s, brown for the 1910s, and pink for the 1920s.
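As a rough illustration of how such a map is produced: the Wikidata Query Service accepts SPARQL over HTTP and, when the query starts with the #defaultView:Map directive, renders any returned coordinates on a map. The query below is a minimal sketch using the standard “coordinate location” property (P625), not the exact query behind Ian's map:

```python
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# A generic map query: any items with a coordinate location (P625).
# The real query would instead select the convict items uploaded by
# the project and the coordinates of their first addresses.
QUERY = """
#defaultView:Map
SELECT ?item ?itemLabel ?coords WHERE {
  ?item wdt:P625 ?coords .    # P625 = coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

def build_request(endpoint: str, query: str) -> dict:
    """Assemble the GET parameters the Query Service expects for JSON results."""
    return {"url": endpoint, "params": {"query": query, "format": "json"}}

req = build_request(SPARQL_ENDPOINT, QUERY)
# import requests
# rows = requests.get(req["url"], params=req["params"]).json()["results"]["bindings"]
```

Pasting the same query text into query.wikidata.org draws the map directly in the browser, which is how shareable links such as the one above are generated.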

According to Smith (2000, p. 22), in 1708 Aberdeen had 5,000 inhabitants. By 1800 that figure had quintupled, but the city’s borders remained unchanged: “the boundaries at the time were defined by the Denburn Valley to the west, the south end of the Spital (now St Peter Street) to the north, and the tidal estuary of the River Dee to the south. […] This growth [in population] had been accommodated primarily by the infill of open space thus greatly increasing the density of the urban population”. Aberdeen’s council therefore decided on a plan to expand the city towards the western side of the Denburn valley, building Union Street – a project which started in 1799.

The construction of Union Street redefined the geography and the social composition of the city. While at the start of the nineteenth century the poor and the rich more or less shared the same urban space, with the expansion of the city towards the west the middle and upper classes moved to the newly built suburbs. The working class was thus left in the city centre, often in squalid and unsanitary conditions, in slum-like housing. Indeed, despite the economic prosperity the city experienced during the second half of the nineteenth century, the living conditions of the working class did not improve (Williams, 2000). 

In fact, according to Williams (2000), despite the widespread poverty amongst the working classes, caused by low wages, the local government hardly ever intervened before the twentieth century. In Victorian times, poverty was generally seen as a choice rather than the result of social forces, and thus public intervention was not seen as a possible solution. The Victoria Lodging-House – previously a residence known as Provost Skene, and the first address of 19 convicts in the Register – was opened as a result of philanthropic action rather than governmental initiative. The first municipal attempts to solve the housing crisis and clear the slums came at the very end of the nineteenth century and the start of the twentieth, with the appointment of Matthew Hay as medical officer of health. Hay denounced the conditions in which the poor lived, and suggested improving the most critical areas in the interest of general public health. The government therefore built the Corporation Lodging House in East North Street in 1899, where 17 convicts resided as their first address, and started closing uninhabitable dwellings. More systematic attempts to overcome the indigent conditions of the poor started in the 1920s, when, for example, Guestrow was cleared. In total, 31 people in the Register gave Guestrow as their first address, the last moving there in 1913. 

The high concentration of former convicts in the areas of Castlehill and Castlegate may therefore suggest that most of them lived in such conditions and belonged to the working class; they might, that is, have lived in extreme poverty. This could also explain why the rate of crimes against property – whose patterns usually change with the economic cycle – remained quite high during the second half of the nineteenth century, despite Aberdeen’s economy growing in that period.

Reflections on the project

All in all, this project has been incredibly interesting and stimulating. It was a chance to dive into the history of Aberdeen, explore Victorian social practices, and understand and work with a platform such as Wikidata. It was fascinating to see, for example, the way in which convicts were described: we noticed the widespread presence of tattoos, and even the presence of vaccination marks after the 1880s. 

Part of the reason the project was so stimulating, and such an occasion to learn, lies in its complexity. We had to make decisions at every stage, design spreadsheets and formulas, and employ tools we had never used before – which required quite a bit of trial and error. For me, it was an occasion to understand how Wikidata, a platform launched in 2012, works, and what its potential is. 

Since its foundation, Wikidata has grown through projects like this one. In fact, as Ian told me, the project has also been a great occasion to help shape and change the way Wikidata categorises convictions. The way data is stored on the platform is quite flexible and fairly easy to change. 

Many scholars, activists and Wikimedians have highlighted the possibilities opened up by publishing open data on Wikidata. For example, Evans (2017) has claimed that Wikidata can provide better access to datasets, and can better connect collections together. Because it is a single shared platform, it allows items from one dataset to be linked with items from other datasets. Wikidata can therefore be described as a platform for publishing linked open data (LOD), and as such can provide more insight into the data than a single institution’s website (Allison-Cassin & Scott, 2018), since that website probably does not offer the same possibility of linking the institution’s data with that of other institutions.

Furthermore, as Ian explained to me, using Wikidata is free, and thus far less costly than maintaining an institutional website on which to open the data. There is, as with any site, a small risk of Wikidata being shut down, but the datasets on the platform can always be downloaded and backed up, and the risk is much lower than for a project website set up by a local authority whose funding may be cut in future. Wikidata can therefore represent a great opportunity for the GLAM sector, whose data is so crucial to understanding the history of the places where we live. Of course, I would argue that this data must be framed and contextualised, and the choices made along the way must be as transparent as possible. There are also limits to which data can be opened, and how: since data still requires some uniform formatting to be opened, it was not possible for us to open the distinguishing marks yet. Nonetheless, for the GLAM sector Wikidata can represent an occasion to engage with the local community, and to co-create meaningful projects. 

Ultimately, on this note, we want to thank once again the volunteers who took some time to help us with the project. We couldn’t have done it without your kind collaboration.

References

Allison-Cassin, S., & Scott, D. (2018). Wikidata: a platform for your library’s linked open data. Code4Lib Journal, (40).

Evans, J. [Wikimedian in Residence – University of Edinburgh]. (2017, November 7). Wikidata loves Galleries. Libraries, Archives & Museums – Jason Evans, National Library of Wales [Video]. YouTube. https://www.youtube.com/watch?v=qf6OG2QTvT4&t=1406s

Hume, D. (1819). Commentaries on the Law of Scotland, Vol. 1

Pauw, E. (2014). Reports of Criminality: The Aberdeen Journal and the Presentation of Crime, 1845-1850 (Doctoral dissertation).

Smith, J. S. (2000). The Growth of the City. In Aberdeen, 1800-2000: A New History (pp. 22-46). Tuckwell Press.

Williams, N. J. (2000). Housing. In Aberdeen, 1800-2000: A New History (pp. 295-322). Tuckwell Press.

Social Sounds

A blog post by James Littlejohn.

CTC23 – The Future of the City: a new theme to explore. After introductions, initial ideas were sought on the Miro board; to ease us along, Bruce put on some jazz music. This inspired Dimi to put forward an idea about what the sounds of a city could be. His post-it gathered interest, and the Social Sounds project and team were formed.

Dimi shared his vision and his existing knowledge of sound projects – namely Luckas Martinelli’s project, which became the algorithmic starting point for visualising sound data on a map. The first goal was to take this model and apply it to visualise a sound map of Aberdeen. This was achieved over the weekend, but was only half of the visualisation goal. The other half was to build a set of tools that would allow communities to envision and demonstrate noise pollution reductions through interventions: green walls, tree plantings, or even pop-up bandstands. An early proof-of-concept toolkit was produced. The “social” in Social Sounds refers to community: connecting all those joined by sound and place. The project concluded by showing how this social graph could be exported to a decision-making platform, e.g. loomio.org. 
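We can't reproduce Martinelli's model here, but the core idea behind such sound maps – free-field, distance-based attenuation from a point source, losing about 6 dB for every doubling of distance – can be sketched as follows. This is our illustration of the general principle, not the project's actual code:

```python
import math

def sound_level_db(level_at_ref_db: float, ref_m: float, distance_m: float) -> float:
    """Sound level (dB) at distance_m from a point source, given the
    level measured at ref_m. Free-field inverse-square spreading only:
    no barriers, ground effects, or interventions are modelled."""
    return level_at_ref_db - 20 * math.log10(distance_m / ref_m)

# Doubling the distance loses about 6 dB:
# sound_level_db(70, 10, 20) ≈ 63.98
```

An intervention toolkit of the kind described above would then subtract further attenuation for each obstacle (a green wall, a row of trees) placed between the source and a map cell.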

What is next? The algorithmic model needs to be grounded in real-world sound sensor data; air quality devices in Aberdeen could be upgraded with microphones. Noise itself also needs to be included in the map experience, which could be achieved through a sound plug-in playing existing recordings. The toolkit needs much more work: it should give members of the community the ability to add their own intervention ideas and have those ideas visualised on the map, highlighting the potential noise reduction or enhancement, permanent or temporary. Much achieved, much to do.

CTC23 – The OD Bods

Introduction

This blog post was written to accompany the work of The OD Bods team at Code the City 23 – The Future of The City.

Open data has the power to bring about economic, social, environmental, and other benefits for everyone. It should be the fuel of innovation and entrepreneurship, and provide trust and transparency in government.

But there are barriers to delivering those benefits. These include:

  • Knowing who publishes data, and where,
  • Knowing what data is being published – and when that happens, and
  • Knowing under what licence (how) the data is made available, so that you can use it, or join it together with other agencies’ data.

In a perfect world we’d have local and national portals publishing or sign-posting data that we all could use. These portals would be easy to use, rich with metadata and would use open standards at their core. And they would be federated so that data and metadata added at any level could be found further up the tree. They’d use common data schemas with a fixed vocabulary which would be used as a standard across the public sector. There would be unique identifiers for all identifiable things, and these would be used without exception. 

You could start at your child’s school’s open data presence and get an open data timetable of events, or its own-published data on air quality in the vicinity of the school (and the computing science teacher would be using that data in classes). You could move up to a web presence at the city or shire level and find the same school data alongside other schools’ data, and an aggregation or comparison of each of their data. That council would publish the budget it spends on each school in the area, and how it is spent. It would provide all of the local authority’s schools’ catchment areas and other LA-level education-specific datasets. And if you went up to the national level you’d see all of that data gathered upwards: all Scottish schools, plus national data such as SQA results and school inspection reports – all as open data.

But this is Scotland, and it is only six years since the Scottish Government published a national Open Data Strategy – one which committed to data publication being open by default.

Looking at the lowest units – the 32 local authorities – only 10, fewer than a third, have any open data at all. Beyond local government, none of the fourteen health boards publishes open data, and of the thirty Health and Social Care Partnerships only one does. Further, in 2020 it was found that of an assumed 147 business units comprising the Scottish Government (just try getting data on what makes up the Scottish Government), 120 had published no data.

And, of course, there are no regional or national open data portals. Why would the Scottish Government bother? Apart, that is, from that six-year-old national strategy, and an EU report in 2020 which made clear that open data done well would benefit the Scottish economy by around £2.21bn per annum. Both of these are referred to in the Digital Strategy for Scotland 2021.

Why there is no national clamour around this is baffling. 

And despite there being a clear remit at the Scottish Government for implementing the Open Data Strategy, no-one, we are told, measures or counts performance nationally. Because if you were doing this poorly, you’d want to hide that too, wouldn’t you? 

And, for now, there is no national portal. There isn’t even one for the seven cities, let alone all 32 councils. This means there is:

  • no facility to aggregate open data on, say, planning, across all 32 councils. 
  • no way to download all of the bits of the national cycle paths from their custodians. 
  • no way to find out how much each spends on taxis etc or the amount per pupil per school meal. 

There is, of course, the Spatial Hub for Scotland, the very business model of which is designed (as a perfect example of the law of unintended consequences) to stifle the publication of open data by local government. 

So, if we don’t have these things, what do we have?

What might we expect?

What should we expect from our councils – or even our cities? 

Here are some comparators

Remember, back in about 2013, both Aberdeen and Edinburgh councils received funding from Nesta Scotland to be part of Code For Europe, where they learned from the cities above. One might have expected that by now they’d have reached the same publication levels as these great European cities. We’ll see soon. 

But let’s be generous. Assume that each local authority in Scotland could produce somewhere between 100 and 200 open data sets. 

  • Scotland has 32 local authorities 
  • Each should be able to produce 100–200 datasets – say 150 on average

= 150 x 32 = 4800 data sets.

The status quo

Over the weekend our aim was to look in detail at each of Scotland’s 32 local authorities and see which were publishing their data openly, in line with the 2015 Open Data Strategy for Scotland. What did we find?

Our approach

As we’ve noted above, there is no national portal, and no-one in the Scottish Government is counting or publishing this data. So, following the good old adage “if you want something done, do it yourself”, a few of us set about pulling together a list of all the open datasets for Scotland’s 7 cities and the other 25 authorities. To the naive amongst us, it sounded like an easy thing to do. But even getting started proved problematic. Why?

  1. Only some councils had any open data – but which?
  2. Only some of those had a landing page for Open Data. Some had a portal. Some used their GIS systems. 
  3. Those that did provide data used different categories. There was no standardised schema. 
  4. Some had a landing page, but additional datasets could be found elsewhere on their websites.
  5. Some pages carried contradictory licence references – was the data open or not?

We also looked to see if there was already a central hub of sorts on which we could build. We found a reference to open data on the Scottish Cities Alliance website, but couldn’t find any links to actual open data. 

Curiosity then came into play: why were some councils prepared to publish some data while others were so reluctant? What was causing the reluctance? And for those publishing, why were not all datasets made open, and how had they selected the ones they chose?

What we did

Our starting point was to create a file in which to log the source of each dataset found. As a group, we decided upon headers for the file, such as the type of file and the date last updated, to name but a few.

From previous CTC events we knew that Ian had put a lot of effort into creating a list of council datasets – IW’s work of 2019 and 2020 – which became our starting source. We also knew that Glasgow and Edinburgh were famous for having had large, but very out of date, open data portals which were at some point simply switched off. 


We were also made aware of another previous attempt from the end of 2020 to map out the cities’ open data. The screenshot below (Fig 1) is from a PDF by Frank Kelly of DDI Edinburgh which compared datasets across cities in Scotland. You can view the full file here.

Fig 1 From an analysis of Scottish cities’ open data by Frank Kelly of DDI Edinburgh, late 2020 or early 2021

For some councils, we were able to pull in a list of datasets using the CKAN API. That worked best of all, with a quick bit of scripting to gather the info we needed. If all cities and other authorities did the same, we’d have cracked it all in a few hours! But it appears that there is no joined-up thinking, no sharing of best practice, no pooling of resources at play in Scotland. Surely COSLA, SCA, SOCITM and other groups could get their heads together and tackle this?
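For the CKAN portals, the scripting amounted to little more than this. A minimal sketch, assuming a placeholder portal URL – substitute a real council portal:

```python
# Minimal sketch of pulling a dataset list from a CKAN portal's action API.
# The portal URL used below is a placeholder, not a real council's.
import json
import urllib.request

CKAN_ACTION = "/api/3/action/package_list"

def build_url(portal_url):
    """Compose the package_list endpoint for a CKAN portal."""
    return portal_url.rstrip("/") + CKAN_ACTION

def parse_package_list(raw_json):
    """CKAN wraps results as {"success": true, "result": [...]}."""
    payload = json.loads(raw_json)
    return payload["result"] if payload.get("success") else []

def list_ckan_datasets(portal_url):
    """Fetch and parse the portal's full dataset name list."""
    with urllib.request.urlopen(build_url(portal_url)) as resp:
        return parse_package_list(resp.read())
```

The same few lines work against any CKAN site, which is exactly why a shared platform would have made the whole exercise trivial.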

For others there were varying degrees of friction. We could use the ArcGIS API to gather a list of datasets, but it tied us up in knots trying to get past the sign-in process – did we need an account or could we use it anonymously? It was difficult to tell. Luckily, with an experienced coder on our team we were able to make calls to the API and get responses – even if these were verbose and needed manual processing afterwards. This post from Terence Eden, “What’s your API’s “Time To 200”?”, is really relevant here!
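For ArcGIS-backed sites, a public REST services directory can often be read anonymously. A hedged sketch – the server root is a placeholder, and a secured server would still demand a token:

```python
# Sketch of reading an ArcGIS REST services directory. Public servers
# usually answer anonymously; secured ones require a token. The server
# root passed in is a placeholder, not a real council endpoint.
import json
import urllib.request

def parse_catalog(raw_json):
    """A services directory response carries "services" and "folders" keys."""
    catalog = json.loads(raw_json)
    names = [s.get("name") for s in catalog.get("services", [])]
    return names, catalog.get("folders", [])

def list_arcgis_services(server_root):
    """Fetch the top-level catalog; a full crawl would recurse into folders."""
    url = server_root.rstrip("/") + "/rest/services?f=pjson"
    with urllib.request.urlopen(url) as resp:
        return parse_catalog(resp.read())
```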

For the rest it was a manual process of going into each city or council website and listing files. With three of us working on it for several hours, we succeeded in pulling together the datasets from the different sources into our csv file.

One council was trying to publish open data, but both the quality and the currency of it were questionable.

Ultimately, the sources were so varied and difficult to navigate that it took 5 digitally-skilled individuals a full day – some 30 person-hours – to pull this data together. We are sure to have missed some, perhaps because they have moved or are hidden away. Let us know if there are more.

From this output it became clear that there was no consistency in the types of files in which the data was provided, and no consistency in the refresh frequency. This makes it difficult to get a comprehensive view of any particular subject across Scotland (because there are huge gaps), and difficult for someone not well versed in data manipulation to aggregate datasets, reducing usability and accessibility. After all, we want everyone to be able to use the data, not to put barriers in the way.

We have a list, now what?

We now had a list of datasets in a csv file, so it was time to understand what was in it. Using Python in Jupyter Notebooks, we graphed the available datasets by file type, by the council which provided them, and by how the data is accessed. This made it clear that, even among the few councils which provide any data, there is huge variation in how they do it. There is so much to say about the findings of this analysis that we will follow up with a blog post of its own.
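The analysis itself needs nothing fancy – a tally over the csv columns is the heart of it. A toy sketch (the column names and rows are illustrative, not our actual file):

```python
# Illustrative tally over a dataset-listing csv. Column names and the
# sample rows are made up for the sketch; our real file differed.
import csv
import io
from collections import Counter

SAMPLE = """council,dataset,filetype
Aberdeen,Recycling Points,CSV
Aberdeen,Planning Applications,GeoJSON
Dundee,Recycling Points,CSV
"""

def tally(rows, column):
    """Count how many rows fall under each value of the given column."""
    return Counter(row[column] for row in rows)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(tally(rows, "council"))   # datasets per council
print(tally(rows, "filetype"))  # datasets per file type
```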

Unique Datasets by Council
Unique datasets by council and filetype
Average filetypes provided for each data set by Council

One of our team also worked on creating a webpage (not currently publicly-accessible) to show the data listings and the graphs from the analysis. It also includes a progress bar to show the number of datasets found against an estimated number of datasets which could be made available – this figure was arbitrary but based on a modest expectation of what any local authority could produce. As you saw above, we set this figure much lower than we see from major cities on the continent.

What did we hope to achieve?

A one stop location where links to all council datasets could be found. 

Consistent categories and tags, such that datasets covering similar topics could be found together.

But importantly we wanted to take action – no need for plans and strategies, instead we took the first step.

What next?

As we noted at the start of this blog post, Scotland’s approach to Open Data is not working. There is a widely-ignored national strategy. There is no responsibility for delivery, no measure of ongoing progress, no penalty for doing nothing, and some initiatives which actually work against the drive to get data open.

Despite the recognised economic value of open data – which is highlighted in the 2021 Digital Strategy but was also a driver for the 2015 strategy! – we still have those in government asking why they should publish, and looking for success stories specifically within Scotland (a failed state for OD) rather than overseas.

We’ve seen APIs being closed off – we assume to try to measure use. We suspect the thinking goes something like this:

A common circular argument

In order for open data to be a success in Scotland we need it to be useful, usable, and used. 

Useful

That means the data needs to be geared towards those who will be using it: students, lecturers, developers, entrepreneurs, data journalists, infomediaries. Think of the campaign in 2020 led by Ian to get Scottish Government to publish Covid data as open data, and what has been made of it by Travelling Tabby and others to turn raw data into something of use to the public.

Usable

The data needs to be findable, accessible, and well structured. It needs to follow common standards for both the data and the metadata. Publishers need to collaborate, coordinating data releases across all cities and all local authorities. ‘Things’ in the data need to use common identifiers across datasets so that they can be joined together, but the data needs to be usable by humans too.
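A toy illustration of why shared identifiers matter: two datasets keyed on the same code (a made-up `site_id` here) join trivially, where free-text names would not:

```python
# Two illustrative datasets sharing a common identifier. With a shared
# key the join is a one-liner; matching on free-text names would need
# fuzzy matching and manual checking.
sites = {
    "S001": {"name": "Hazlehead Recycling Centre"},
    "S002": {"name": "Tullos Recycling Centre"},
}
opening_hours = {
    "S001": "Mon-Sun 09:00-17:00",
    "S002": "Mon-Sat 08:00-18:00",
}

# Join the two datasets on site_id.
joined = {
    site_id: {**attrs, "hours": opening_hours.get(site_id)}
    for site_id, attrs in sites.items()
}
print(joined["S001"])
```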

Used

The data will only be used if the foregoing conditions are met. But government needs to do much more to stimulate its use: to encourage, advertise, train, fund, and invest in potential users. 

The potential GDP rewards for Scotland are huge (est £2.21bn per annum) if this is done well. But that will not happen by chance. If the same lacklustre, uninterested, unimaginative mindsets are allowed to persist, and no coordination is applied across cities and other authorities, then we’ll see no more progress in the next six years than we’ve seen in the last.

While the OGP process is useful, bringing a transparency lens to government, it is too limited. Government needs to see this as the economic issue it is, and one which the current hands-off approach is failing to address. We also need civic society to get behind this – to be active, visible, even militant – and hold government to account. What we’ve seen so far from civic society is, at best, complacent apathy.

Scotland could be great at this – but the signs, so far, are far from encouraging!

Team OD Bods (Karen, Pauline, Rob, Jack, Stephen and Ian)

Waste Wizards at CTC22

A write-up of progress at the March 2021 Environment-themed hack weekend.

What problem were we addressing?


The public have access to two free, easily accessible waste recycling and disposal methods. The first is “kerbside collection”, where a bin lorry will drive close to almost every abode in the UK and crews will (in a variety of different ways) empty the various bins, receptacles, boxes and bags. The second is access to recycling centres, officially named Household Waste Recycling Centres (HWRCs) but more commonly known as the tip or the dump. These HWRCs are owned by councils or local authorities, and information about them is available on local government websites.


However, knowledge about this second option – the tips, the dumps, the HWRCs – is limited. One reason for that is poor standardisation: Council A will label, map, or describe a centre one way; Council B will do it in another. There is a lot of received wisdom – “well, everybody just looks at their council’s website, and everybody knows you can only use your own council’s centres”. That is why at CTC22 we wanted to get all the data about HWRCs into a standard format, and release it into the open for communities to keep current and up to date. Then we’d use that data to produce a modern UI so that residents can actually get the information they require:

  • Which tips can they use?
  • When are these dumps open?
  • What can they take to these HWRCs?
  • “I have item x – where can I dispose of it?”
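As a sketch of what a “standard set format” could look like, here is one possible minimal record. The field names are our own suggestion for illustration, not an agreed standard:

```python
# One possible minimal schema for a standardised HWRC record.
# Field names and the example values are illustrative suggestions.
from dataclasses import dataclass, field

@dataclass
class RecyclingCentre:
    name: str
    operator: str            # the council or local authority
    lat: float
    lon: float
    opening_hours: str       # e.g. OSM-style "Mo-Su 09:00-17:00"
    accepts: list = field(default_factory=list)  # accepted waste streams

centre = RecyclingCentre(
    name="Hazlehead Recycling Centre",
    operator="Aberdeen City Council",
    lat=57.146,
    lon=-2.166,
    opening_hours="Mo-Su 09:00-17:00",
    accepts=["glass", "cardboard", "wood"],
)
```

With every council describing its centres in one shape like this, the four questions above become simple lookups rather than a trawl through 32 differently-worded websites.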

Our approach


There were six main tasks to complete:

  1. Get together a list of all the HWRCs in the UK
  2. Build an open data community page to be the centre point
  3. Bulk upload the HWRCs’ data to WikiData
  4. Manually enter the HWRCs into OpenStreetMap
  5. Create a website to show all the data
  6. Create a connection with OpenStreetMap so that users could use the website to update OSM.

What we built / did

All HWRCs are regulated by a nation’s environmental regulator:

  • For Scotland it is SEPA
  • For Northern Ireland it is NIEA
  • For Wales it is NRW
  • For England it is EA

A list of over 1,000 centres was collated from these four agencies. The data was of variable quality and inconsistent from one agency to the next.


This information was added to a wiki page on OpenStreetMap – Household waste in the United Kingdom – along with some definitions to help the community navigate the overly complex nature of the waste industry.


From that, the lists for Scotland, Wales and England were bulk uploaded to Wikidata. This was achieved by processing the data in Jupyter Notebooks, from which formatted data was exported for bulk upload via the QuickStatements tool. The NIEA dataset did not include geolocation information, so that will need to be added before those records can be uploaded too. A Wikidata query has been created to show progress on a map. At the time of writing, 922 HWRCs are in Wikidata.
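For anyone curious, the QuickStatements step boils down to emitting tab-separated commands from each cleaned record. A simplified sketch – the labels and the example centre are illustrative, not our exact export:

```python
# Sketch of turning one cleaned record into QuickStatements v1 commands.
# Commands are tab-separated; "LAST" refers to the item created by the
# preceding CREATE. The example centre is illustrative.
def to_quickstatements(name, lat, lon):
    lines = [
        "CREATE",
        f'LAST\tLen\t"{name}"',                       # English label
        'LAST\tDen\t"household waste recycling centre"',  # English description
        f"LAST\tP625\t@{lat}/{lon}",                  # P625 = coordinate location
    ]
    return "\n".join(lines)

print(to_quickstatements("Hazlehead Recycling Centre", 57.146, -2.166))
```

One such block per centre, pasted into the QuickStatements tool, is what turns a notebook's dataframe into live Wikidata items.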

Then the never-ending task of locating, updating, and committing the changes of each of the OSM locations was started.

To present this data the team built a front-end UI with .NET Core and Leaflet.js, using Overpass Turbo to query OSM. Local authority boundary polygons were added to highlight the sites that a member of the public could access. By further querying the accepted waste streams, the website can indicate which of the centres they can visit will accept the items they want to recycle.
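The team’s front end is .NET, but the Overpass side is language-neutral. An equivalent query built in Python might look like this – the bounding box is illustrative, and `recycling_type=centre` is the OSM tag that distinguishes HWRCs from bottle banks:

```python
# Sketch of building an Overpass QL query for recycling centres in a
# bounding box. The coordinates below are illustrative.
def build_overpass_query(south, west, north, east):
    """Query nodes/ways/relations tagged amenity=recycling + recycling_type=centre."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:25];"
        f'nwr["amenity"="recycling"]["recycling_type"="centre"]({bbox});'
        "out center;"
    )

query = build_overpass_query(57.0, -2.3, 57.2, -2.0)
# POST the query string to https://overpass-api.de/api/interpreter to run it.
print(query)
```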

However, the tool is only as good as the data, so to close the loop we added a “suggest a change” button that lets users post a note on that location on OpenStreetMap, so the wider community can update the data.

We named the website OpenWasteMap and released it into the wild.

The github repo from CTC22 is open and available to access.

Pull requests are also welcome on the repo for OpenWasteMap.

What we will do next (or would do with more time/ funding etc)

The next task is to get all the data up to date, and to keep it up to date; we are confident that we can do this thanks to the wonderful open data community. It would also be great to improve the current frontend interface for editing existing waste sites. Adding a single note to a map when suggesting a change could be replaced with an edit form listing the fields we would like populated for HWRCs. An existing example of an excellent editing interface in the wild is healthsites.io, which provides an element of gamification and completionism with a progress bar showing how much data is populated for a particular location.

An example entry from Healthsites.io

Source: https://healthsites.io/map#!/locality/way/26794119

While working through the council websites it became apparent that there is no standard set of terms for household items, and the lists are not machine friendly. For example, a household fridge can be called:

  • Fridge
  • Fridge Freezer
  • WEEE
  • Large Domestic Electrical Appliance
  • Electric Appliance
  • White Good

A “fun” next task would be to come up with a taxonomy of terms that allows easier classification and understanding for both the user and the machine. Part of this would include matching “human readable” names to relevant OpenStreetMap tags – for example, “glass” as an OSM tag would be “recycling:glass”.
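A first cut at such a taxonomy could be a simple synonym table mapping messy labels to one canonical term plus an OSM tag. The mappings below are illustrative, not a finished scheme:

```python
# Illustrative first cut at a waste-term taxonomy: messy human labels
# map to a canonical term, which maps to an OSM recycling tag.
# These mappings are a sketch, not an agreed standard.
SYNONYMS = {
    "fridge": "fridge_and_freezer",
    "fridge freezer": "fridge_and_freezer",
    "weee": "electrical_appliances",
    "large domestic electrical appliance": "electrical_appliances",
    "electric appliance": "electrical_appliances",
    "white good": "electrical_appliances",
    "glass": "glass",
}
OSM_TAGS = {
    "glass": "recycling:glass",
    "fridge_and_freezer": "recycling:fridge_and_freezer",
    "electrical_appliances": "recycling:electrical_appliances",
}

def classify(label):
    """Return (canonical term, OSM tag) for a free-text label, or (None, None)."""
    canonical = SYNONYMS.get(label.strip().lower())
    return canonical, OSM_TAGS.get(canonical)

print(classify("Fridge Freezer"))
```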


There are other waste sites that the public can use, called Bring Banks or Recycling Points, which are not run by local authorities and are more informal locations for recycling. These too should be added, but some consideration is needed on how that information would be maintained, as their number could be tenfold that of HWRCs.

As we look to the future we must also anticipate the volume of data we may get out of sources like OpenStreetMap and Wikidata once they are well populated by the community. A response time of mere milliseconds when querying the dozen points you created in a hackathon is a great start; but as a project grows, the data size can spiral into megabytes and response times into seconds. With around 1,000 recycling centres in the UK, and thousands more of the aforementioned Bring Banks, this could be a lot of data to handle and serve up to the public in a presentable manner.

Using Wikidata to model Aberdeen’s Industrial Heritage

Saturday 6th March, 2021 was World Open Data Day. To mark this international event CTC ran a Wikidata Taster session. The objectives were to introduce attendees to Wikidata and how it works, and give them a few hours to familiarise themselves with how to add items, link items, and add images.

Presentation title screen

The theme of the session (to give it some structure and focus) was the Industrial Heritage of Aberdeen – more specifically the bygone industries of Aberdeen and, more specifically still, the many iron foundries that once existed. I chose this topic as it is still relatively easy to spot the industry’s products on streets and pavements as we walk around the city, photograph them, and add them to Wiki Commons, as I have been doing.

We had thirteen people book and eight turn up. After I gave a short presentation on how Wikidata operates, we divided ourselves into three groups in breakout rooms. This was all on Zoom, of course, as we were still under lockdown.

The teams of attendees chose a foundry each: Barry, Henry & Cook Limited; Blaikie Brothers, and William McKinnon & Company Ltd. I’d already created an entry for John Duffus and Company in preparation for the event and to use as a model.

I’d also created a Google Sheet with a tab for each of the other thirteen foundries I’d identified (including those selected by the groups). I’d spent quite a while figuring out how to access and search the city’s old business and Post Office directories, which had been digitised for 1824 to 1941. Eventually I built myself a tool, which I shared with the teams, that generated a URL for a specific search term in a certain directory. They used this, as well as other sources, to identify key dates, addresses and name changes of businesses.
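The tool was essentially a URL builder. A sketch of the idea – the base URL and parameter names here are placeholders, not the real site’s:

```python
# Sketch of a search-URL builder for digitised directories (1824-1941).
# The base URL and query parameter names are placeholders, not the
# actual digitised-directories site.
from urllib.parse import urlencode

BASE = "https://example.org/directories/search"  # placeholder

def search_url(term, year):
    """Compose a search URL for a term in a given directory year."""
    if not 1824 <= year <= 1941:
        raise ValueError("directories were digitised for 1824 to 1941")
    return f"{BASE}?{urlencode({'q': term, 'year': year})}"

print(search_url("Blaikie Brothers", 1890))
```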

By the end of the session our teams had created items for:

They had also created items for the foundry buildings – linked to Canmore etc. – as well as for the founders. We enhanced these with their places of burial, portraits and images of gravestones. I took further photos, which I uploaded to Commons and linked the following Monday. I created two Wikidata queries: one showing the businesses added, and one showing the founders who created those businesses.

The statistics for the 3-hour session (although some worked into the afternoon and even the next day) are impressive. You can see more detail on the event dashboard.

We received positive feedback from the attendees, who have taken their first steps towards using Wikidata as public linked open data for heritage items.

I hope that the attendees will keep working on the iron founders until we have all of these represented on Wikidata. Next we can tackle shipbuilders and the granite industry!