Code The City is a civic hacking initiative focused on using tech and (open) data for civic good. We use hack weekends, open data, workshops, and idea generation tools. We run Data Meetups, the Aberdeen Python User Group and the annual Scottish Open Data Unconference
This was one of three projects which were worked on during CTC24 – Open In Practice. We asked Jan Ainali, who led the project, to explain it for those who were not present at the weekend event.
Why did we run this project?
On Wikidata, there is a WikiProject for getting all the government agencies of all levels and the whole world properly modeled. Since it is a huge project, every way to try to break it down to bite size pieces are necessary. Work on the UK was already started, so it made a lot of sense trying to complete Scotland.
Work done before CTC24
In October during Scottish Open Data Unconference 2021 we got started on this task. We found some good sources and made fine progress, completing several categories of agencies. By completing here, I mean that we made sure there were items in Wikidata representing the agencies and that they were well enough modeled so that we could query for them. But we weren’t done, and some of the trickiest parts remained.
What we achieved at CTC24, what impact we hope it will have
With a joint effort, we managed to sort out how the judicial system was organized, which was something that remained unclear since the last session. Most time was spent in researching to understand it, and when that was done, it was fairly straightforward to create items for the courts that were missing and to model the others in a way that made it possible to query for them. We also managed to sort out some other small tasks from the last event, and finally we could produce one huge query to get all Scottish agencies at once: https://w.wiki/4TpN
Now we think this is the most complete and up-to-date list of Scottish agencies. If we are wrong about that, we would love feedback so that we can improve it.
What next – how can people get involved?
A few things are happening right now. First, we are importing agencies into the Govdirectory platform that is a more user-friendly view of Wikidata. We have already imported the local authorities, NHS boards and Health and Social Care Partnerships. You can find that data here: https://www.govdirectory.org/united-kingdom/
Since the large query was a bit messy, we will also try to improve the modeling in Wikidata. You are more than welcome to help with this. This will make queries for everybody simpler, and we will continuously be importing more agencies to the platform as we get done.
At Code the City we believe that the right people, with the right skills and tools, can do great things. We believe that we can use technology and data to solve many civic challenges. Those beliefs are as applicable now as was when we started seven years ago. And our volunteers who come to our events time and again agree. They know that sharing their skills and knowledge with others in small teams, over a weekend, working on a focussed and achievable project, is a satisfying experience which leaves them with a sense of achievement. It also introduces them to working in teams and in an agile way: short sprints of work and pauses for review.
In the last seven years we’ve tackled many topics – and worked with multiple partner organisations in the public and private sector to solve their challenges – and to identify opportunities to use data and technology to improve how they deliver their services.
Throughout that period we’ve had some central principles that we’ve adopted which still hold true:
Data, where appropriate, should be open and licensed for reuse
Software should be developed as open source – where the code can be inspected, and improved on by anyone, and reusable openly by others
Information, images and other content should be as openly licensed as possible to encourage re-use and creativity
Where appropriate stable platforms exist (such as WIkidata, Open Streetmap, Github,or Wiki Commons) we should use those
People working in small teams and in short sprints of activity can achieve an enormous amount over a weekend
Last week at Open UK’s COP26 event “Open Technology for Sustainability”, which our co-founder and trustee Ian Watt attended, those same principles that inspired our creation, and inform our continuing work, were echoed time and again by speakers. And at the evening awards dinner we were runners-up to the the wonderful Open Knowledge Foundation, in the Data category. This further validates our belief in our approach.
More recently we’ve been concentrating even harder on improving open data in Scotland and the UK – but not to the exclusion of other projects. In addition to several history and heritage projects which have seen large amounts of open data created and published, we’ve had projects such as Open Wastemap which was built almost entirely over two CTC weekend and uses community-sourced data in Wikidata and OpenStreetmap to power this really useful tool to find local recycling facilities.
Our next event CTC24 – Open In Practice is taking place in just over a week. It is the perfect introduction to what we do and to becoming involved. We already have a list of potential projects that attendees, new and experienced, can get involved in. Some of these are local in scale and some national. All need a blend of skills from attendees. You don’t need to be either a coder or data expert to participate. You can sign up directly here or from the event link above.
No excuses: be part of the group that does the good things – or stand by and watch while we do!
One of the biggest strengths of the events we run at CTC is the wide variety of projects we undertake, with each one being engaging in a completely different way. In our last blog post we looked at the ‘History + Data = Innovation’ and ‘History and Culture’ events which highlighted interesting records from the 19th and 20th century and which generate new open data from historic sources. The Event we’re looking at today however, ‘Archeology Meets Data Science’ uncovers an even older part of Aberdeen’s History.
To give some context, an excavation of St Nicholas Kirk from 2006-2007 uncovered more than a thousand human remains and artefacts, as well as parts of the building dated at over a thousand years old. In 2018 we were granted the opportunity to work with data taken from the excavation as well as work with some of the human remains themselves – certainly not the usual kind of work you’d expect to carry out at a hack event! Six teams were formed to work on various aspects of the project such as working through the original data and compiling supplementary info in an extensive Q&A document.
The team which mostly worked with the Skeletons used a technique called Photogrammetry – this involves taking photos from many angles which with the right software, can be used to create a 3d model.
While some experts were involved with the project many of the team had never done something like this, so there was a lot of trial and error with finding the right software. Eventually their perseverance paid off however, and the result is a great looking archive of the scanned remains freely available online.
As well as this we also had a team of 3d modellers use Unity to recreate the burial site as well as where the skeletons were located. There’s even Virtual Reality support if you own an Oculus Go, and the burial can be accessed here.
Ali Cameron was one of the experts involved in the project, having been involved in the field for more than 40 years, and was impressed with the work we carried out. We asked about her thoughts regarding the project, she had the following to say:
Some of the quieter students really came out of their shells, we all got a chance to meet the coders and I have kept up with a couple of them to discuss other projects. The coders really enjoyed the archaeology side and I chatted with them all about the aspect they were working on which is quite different from a lot of the programming they had done before. The Event was extremely successful and very fascinating.
Overall the project turned out to be quite an unforgettable experience for those involved. It challenged our team and volunteers in an interesting way, and was a unique chance to interact with some of the oldest relics in Aberdeen’s history. It also highlights our goal of making data accessible very well, as we took a fascinating discovery and allowed its contents to be made freely available online thanks to our 3d modelling endeavors.We have a more in-depth post regarding the project for those wanting to find out more.
While the projects we carry out may seem a bit daunting to newcomers, everyone involved in our events carries out a vital role which is highlighted by some past attendee experiences.
As part of our commitments to making Data accessible for all we were looking to digitally archive many 19th and 20th century records that were only available physically or were missing information online. This was the main focus of our 19th and 20th Events known as ‘History + Data = Innovation’ and ‘History and Culture’ respectively. These primarily involved uploading data on a multitude of subjects to Wikidata such as Listed buildings, convict registers and March Stones, which signified the boundary of crofts in Aberdeen primarily in the 16th century.
Heather Black initially found out about the project through the Aberdeen memories Facebook page and got involved transcribing part of the Aberdeen harbour logbooks; specifically the names, registered countries, the name of the captain and what cargo was being shipped.
Having a keen interest in local history the project immediately drew her attention, and when asked about her thoughts on the event, she stated that “It was a good distraction while being on furlough from my work at the time, and very interesting too”, and is looking to participate in future events that catch her interest.
Sheila Watt was similarly involved in this project, and despite not having too much interest initially ended up having a great experience. Her daughter joined Code the City after looking for a volunteering project to take part in for a Duke of Edinburgh Award last year. Eventually Sheila joined as well after hearing about the Aberdeen Harbour Arrivals project and due to her interest in local history ended up greatly enjoying her experience, describing it as the perfect project to keep her interested during the first lockdown last year.
Taking on the role of a volunteer transcriber, she enjoyed her experience so much that she took part again for the Returned Prisoner Project and similarly had a great time in the same role. She now regularly checks on our projects through the slack channel and will hopefully be involved in future projects.
Despite some participants being apprehensive about the tech-side of things, there are lots of ways to contribute to each event, and you’ll definitely find something that plays to your strengths and skills. If you’re interested in any of the event topics you’ll find the work engaging too, and get to talk to some friendly like-minded people.
Check this post to learn more about the Harbour archival process as well as the wider project.
Our next event is CTC24 Open in Practice where we will have several projects that will appeal to non-coders, offering the opportunity to gather more data and open it up for public benefit.
In part oneof this blog post we explained the rationale of opening up the data from the Register of Returned Convicts of Aberdeen (1869-1939) . In this second part our intern for the summer project, Sara Mazzoli, explains our methodology and our results.
How we did it
Preparatory work: designing the Google Sheets
Having made the case to open the data we then designed a process for opening that data. We considered that Wikidata would be an ideal site to upload the records. Indeed, the data uploaded on Wikidata falls under the CC0 license, which allows individuals to share it and use it freely. Moreover, Wikidata allows individuals to freely query the data, and to apply visualisation and analysis techniques.
The process consisted of a few different steps, both for the opening of records and for the opening of convicts’ pictures – to which we will dedicate a separate paragraph. First, we designed two Google spreadsheets.
The first Google spreadsheet hosted the instructions for transcribing and checking the transcribed data, which we designed before the process began, as well as a table for the volunteers to sign up to either transcribe or check the records.
The other Google spreadsheet was divided into seven further sheets, one for each decade. To determine if one record pertained to one decade or the other, we took as reference the discharge date of the prisoners. Each row contained information on the transcription and checking (the person who transcribed the record; the person who checked the record; their eventual notes – e.g., outstanding information, illegible writing in the page to transcribe, etc.); link to the page of the register; page number) as well as the data of the Register’s page – which was: registered number of the convict; age on discharge; convict name and aliases; gender of the convict; complexion; eyes; hair; height (in imperial measurement); crime; sentence; sentence date; discharge date; distinguishing marks (such as tattoos, scars); address.
Once all the data was transcribed, we took two different processes to open the convicts’ data on Wikidata and to upload the isolated mug shots on Wiki commons.
Opening the convicts’ data: upload to Wikidata
Once all the data was transcribed, we decided to create another Google sheet, since it is easier to upload data on Wikidata through this platform. Here, we designed the look-up tables, and created formulas to translate the convicts’ details from natural language (English) to Wikidata properties and items’ codes (e.g., from “brown hair” to “P1884: Q2367101”, where “P1884” is the category for hair colour and “Q2367101” is brown).
However, again, there were decisions we had to make:
Because metric measurement is more accurate and easier to understand, we designed a formula to translate the imperial measurement into metric.
Because distinguishing marks lacked a unique format, and because Wikidata needs some structure for the information that is uploaded to be machine-readable and to be queried; we decided not to upload the distinguishing marks.
As for the addresses, given that we wanted to visualize the addresses in a map, and noting all the addresses on the map would make it too cluttered, we have decided to just upload the first address for each individual.
Therefore, we created a Unique ID for all the items that we created so that they would be connected together. Also, we enriched the available information of the Register with the data collected by Phil Astley in his blog.
To upload the pictures, we created an Excel sheet to generate an automatic description for the pictures, using the available data from the register, as well as information from Phil’s blog. Once all the pictures were uploaded, we matched the picture(s) with the individual it was related to. Indeed, all Wikimedia services are related, and therefore it is easy to link a Wiki Commons image with a Wikidata item.
What we found out: results
Analysis of sentences: general
For the following visualizations, since they do not report the name of the convicts, we decided to make use of all the available data -, it was effectively possible to analyse the data from 278 records.
The graph below represents the number of convicts discharged every single year. Amongst these, it seems to us that there are 20 individuals who were probably sentenced to penal servitude twice, such as Elisabeth Wilson or Baxter. For all the data analysis, we decided to take as reference the date of discharge, rather than the sentence date. The reason for this lies in the fact that the sentence date was often not stated in the Register, while the date of discharge was always present. Therefore, it allows us to carry out a more meaningful and precise analysis.
It can be seen that, after the year 1904 (the year that splits the register in half), far less individuals were released. This means that most convictions happened in the early stages of the Register’s existence – and in particular during the 1860s, 1870s and 1880s. In fact, we counted 167 released individuals between 1869 and 1904, but only 111 individuals being released after 1904.
Another interesting feature, that can be seen here, is that no individuals were released in 1918 and 1919, as the Great War was raging.
If we look closer at the convictions, it is noticeable that all the sentences were penal servitudes (apart from one transportation, given in 1851). 83, however, are peculiar, since they included police supervision, hard labour, fortifying of license or fines.
Secondly, it is possible to see that a person could be convicted for more than one crime. For example, nine people were convicted for robbery and assault – that is why the number of offences is higher than the number of returned convicts, as it can be seen from the graph below. Moreover, each sentence could include “P.Cs” (i. e., previous convictions) and/or “hab & rep” (i. e., habit and repute – check the first blog post for more details).
Finally, three of the sentences given in the register were of penal servitude for life. All these three were given at the start of the twentieth century, and two of them were for crimes against the person (culpable homicide and murder). The third one was for attempting to “communicate information respecting H.M.Forces with the intention of assisting the enemy”. However, all these individuals were ultimately released within 1914 and 1931.
Because these sentences were given in the nineteenth and twentieth centuries, they did not respond to nowadays crime categories as much. For example, convictions of “abortion”, “sodomy” or “plagium” were hard to classify. Therefore, we decided to follow a classification of the time, first developed by Hume in 1797. Indeed, Hume wrote a very comprehensive work on the classification and cases within Scots Law, which better represents the crimes present in the register.
Offences present in the Register pertaining to each category:
Offences against property
Theft (which can be aggravated by: habit and repute, housebreaking, shopbreaking, warehouse breaking), theft by opening lockfast places, embezzlement, plagium, larceny; reset; falsehood and fraud
Incest, bigamy, sodomy, indecency, removing a body from its grave
Offences against the State
Forgery of notes, uttering, Attempt to communicate information respecting H.M.Forces with the intention of assisting the enemy
Table 1: Offences as classified by Hume in 1797 (Hume, 1819). It is really fascinating to find out that plagium, which is still defined as a crime of “child stealing”, is classified as an offence against property; the rationale being that “the creature taken, which has no will on its own, is a thing” (Hume, 1819, p. 82).
Of course, this classification cannot be employed uncritically, as it reflects a view of the world on crime, justice, morality and human nature. Grasping this view of the world is impossible for us, but it is interesting to reflect on it and the impacts that it may still hold on the way we see and experience crime. Therefore, in suggesting that the offences in the Register are classified according to these categories, we do not aim to justify these classifications; but rather to frame those offences in the moral and social paradigms in which they belong to. As claimed by Pauw (2014, p. 9), “Crime history can provide insight into the social response to crime. In terms of social history, the study of crime provides perspective on society’s definition and expectations for moral behavior”. From this point of view, the category of “Offences against public police or economy” is quite exemplary. In fact, it comprehended all offences that went against “propriety, good neighbourhood and good manners”.
Therefore, as we also stressed in the first blogpost, it is important to underline that such classifications, despite having real effects on individuals, are constructed – and thus can and must be framed and questioned.
Taking a closer look at the offences for which individuals were convicted, it is evident that the vast majority were accused of crimes against
Property. The total number of offences against property were 246. In comparison, the number of offences against the person was just 50, and those against public police or economy were 12. The offences pertaining to the other two classes of crimes, summed together, account for less than 10 crimes. There were moreover a few sentences that we could not classify, such as “Military striking a superior officer”.
It is interesting to notice that, while most offences for other categories seem quite scattered through the register, most of convictions against public police or economy were mainly given within the last few years of the Register’s existence.
Analysis of sentences: averages and medians
With the aim of understanding how the sentences length changed over the decades, to understand if we could find out any particular pattern; we calculated the average sentence length given for each crime category, both before 1904 and after 1904. This is because the data on charges for crimes against property is the only one which allows a more granular and detailed analysis and confrontation of sentences from decade to decade.
Due to the fact that, as mentioned above, 84 sentences were containing other pieces of information (such as hard labour, police supervision, etc.), these sentences could not be included in the calculation of the average. Also, in this case, as for the mixed sentences, such as the above-mentioned “Assault & Robbery”, we could not determine whether the sentence length could be equally splitted between the two charges, and therefore decided to pair them together with “Other” in this analysis.
Point in time
Against the person
Against public peace
Against public police or economy
Against the State
Other and mixed sentences
Convicts released before 1904
Number of sentences per type of crime
Average sentence length (years)
Median sentence length (years)
Convicts released after 1904
Number of sentences per type of crime
Average sentence length (years)
Median sentence length (years)
Table 2: statistics on sentence length for each class of crimes, divided by point in time.
In general, sentence length for individuals released after 1904 was a bit lower compared to the penalty received by individuals released before. Calculating the average sentence length for the sentences given before 1904 (where it was possible to calculate it), we found out that convicts were, on average, sentenced to 6.73 years in prison. After 1904, that number reduced to 5.26 years. It seems that most of it has to do with shorter punishments for crimes against property, which were the vast majority and which average length for crimes against property significantly decreased for individuals released after 1904.
For crimes against the person after 1904, we see an average sentence length of 7.81 years, contrasted by a median length of 5. For convicts released before 1904 who committed a crime against the person, usually a sentence of 5 years or more was given, and thus sentences were more homogeneous – and therefore, median and average are similar. Instead, for those convicted of crime against the person released after 1904, there were four harsher sentences to 10 years or more (including the two penal servitudes for life, which lasted 15 and 30 years), and eleven sentences that lasted 5 years or less. A similar thing is true for crimes against public police or economy before 1904; where amongst those three sentences given, two were 5 years long and one, for incest, 15 years long.
Analysis of gender and age data
Amongst these 278 records that we could analyse, we found out that 37 convicts were women. Most of them were convicted for crimes against property, mostly theft. There were four however, who were convicted for a crime against the person – all of them for culpable homicide in 1886, 1893, 1923, 1929. When analysing the data for gender, it is apparent that most of women were convicted in the second half of the nineteenth century, being most of them discharged between 1869 and 1893.
In terms of how the average age of female convicts compares to male convicts, as well as to how the average sentence length given to women compares with that of male convicts; there are no apparent significant differences.
Point in time
Average age on discharge
Convicts released before 1904
Convicts released after 1904
Table 3: statistics on sentence length and age, divided by gender and by point in time.
Analysis of addresses data
The map with the addresses was developed by Ian, and can be seen here. This visualization was created through the Wikidata queries service, and is based on the data we have currently uploaded on Wikidata. Thus, it is based on records of individuals discharged between 1869 and 1921.
Fig 1: Map with the addresses to which convicts returned. The first view on the right comprehends the whole Aberdeen urban area, while the second one below it is more focussed on the city (the represented area is that inside the black square in the first picture). The third represents Aberdeen’s city centre, where most convicts returned to (the represented area is that inside the black square in the second picture).
The yellow dots represent the address of convicts released in 1870s, the green ones is for the addresses of those who were released in 1880s, the red ones are for the addresses of those who were released in 1890s, the purple ones is for the addresses of those who were released in 1900s, the brown ones are for the addresses of those who were released in 1910s, the pink ones are for the addresses of those who were released in 1920s.
According to Smith (2000, p. 22), in 1708 Aberdeen had 5000 inhabitants. By 1800, that figure quintupled, but the city’s borders remained unchanged: “the boundaries at the time were defined by the Denburn Valley to the west, the south end of the Spital (now St Peter Street) to the north, and the tidal estuary of the River Dee to the south. […] This growth [in population] had been accommodated primarily by the infill of open space thus greatly increasing the density of the urban population”. Therefore, Aberdeen’s council decided on a plan to expand the city towards the Western areas of the Denburn valley, thereby building Union Street – which project started in 1799.
The construction of Union Street redefined the geography and the social composition of the city. While at the start of the nineteenth century, the poor and the rich more or less shared the same urban space; with the expansion of the city towards the West, the middle and upper classes moved to the newly built suburbs. The working class was thus left to live in the city centre, often in squalid and unsanitary conditions, in slum-like housing. Indeed, despite the economic prosperity experienced by the city during the second half of the nineteenth century, the living condition of the working class did not improve (Williams, 2000).
In fact, according to Williams (2000), despite the widespread poverty amongst the working classes, caused by low wages, the local government hardly ever intervened before the twentieth century. Indeed, in Victorian times, poverty was generally seen as a choice, rather than the result of social forces; and thus public intervention was not seen as a possible solution. The Victoria Lodging-House, in which 19 convicts resided as their first address – previously a residence known as Provost Skene – was opened as a result of philanthropic action, rather than from governmental initiative. The first municipal attempts to solve the housing crisis and to clear the slums are to be found at the very end of the nineteenth century and at the start of the twentieth, with the appointment of Matthew Hay as the medical officer of health. Hay denounced the conditions in which the poor lived, and suggested improving the most critical areas for the general public health interest. Therefore, the government built the Corporation Lodging House in East North street in 1899, where 17 convicts resided as their first address. It also started closing uninhabitable dwellings. More systematic attempts to overcome the indigent conditions in which the poor lived started in the Twenties, in which, for example, Guestrow was cleared. In total, the people in the Register that lived in Guestrow as their first address were 31, with the last moving there in 1913.
The high concentration of former convicts in the areas of Castlehill and Castlegate therefore may suggest that most of them lived in such conditions, and were belonging to the working class. Therefore, they might have lived in conditions of extreme poverty. This could also explain why the rate of crimes against property, which patterns usually change according to the economic cycle, remained quite high during the second half of the nineteenth century; despite Aberdeen’s economy increasing during that period.
Reflections on the project
All in all, this project has been incredibly interesting and stimulating. It represented a chance to dive into the history of Aberdeen, explore social Victorian practices, and understand and work with a platform such as Wikidata. Indeed, it was fascinating to see, for example, the way in which convicts were described. We could notice the widespread presence of tattoos, and even the presence of vaccination marks after the 1880s.
Part of the reason why the project was so stimulating, and represented an occasion to learn so much, lies in the complexity of the project itself. We had to make decisions at every stage of the project, we had to design spreadsheets and formulas, and employ tools which we had never before – and that required quite a bit of trial and error. To me, it was an occasion to understand how Wikidata, a platform born in 2012, works, and what are its potentials.
Since its foundation, Wikidata has since grown with projects like this one. In fact, as Ian told me, the project has been a great occasion to also help shape and change Wikidata’s way to categorize convictions. Indeed, the way in which data can be stored in the platform is quite flexible and fairly easy to change.
Many scholars, activists and Wikimedians have highlighted the possibilities unfolded by the opening data using Wikidata. For example, Evans (2017) has claimed that Wikidata can provide better access to datasets, and can better connect collections together. In fact, being just one platform, it potentially allows for the datasets’ items to be linked also with other items in other datasets. Therefore, Wikidata can be defined as a platform to publish linked open data (LOD), and as such can provide us with more insights on the data compared to a single institution’s website (Allison-Cassin & Scott, 2018) – since that website probably does not offer the same possibility of linking the institution’s data with that of other institutions.
Furthermore, as Ian explained to me, employing Wikidata is free, and thus far less costly than maintaining an institution’s website on which to open the data. Indeed, there is a very small risk of Wikidata being closed, as it is with any site, but the datasets on the platform can always be downloaded and backed up, and this risk is much lower than a project website set up by a local authority whose funding may be cut in future. Therefore, Wikidata can potentially represent a great opportunity for the GLAM sector, which data is so crucial to understand the history of the places where we live in. Of course, I would argue that this data must be framed and contextualized, and the choices that were made must also be made as transparent as possible. Also, it is noticeable that there are limits to which data can be opened, and how. Since the data to be opened still requires some uniform formatting, it was not possible for us to open the distinguishing marks yet. Nonetheless, for the GLAM sector Wikidata can represent an occasion to engage with the local community, and to co-create meaningful projects.
Ultimately, on this note, we want to thank once again the volunteers who took some time to help us with the project. We couldn’t have done it without your kind collaboration.
Allison-Cassin, S., & Scott, D. (2018). Wikidata: a platform for your library’s linked open data. Code4Lib Journal, (40).