Aberdeen Built Ships

This project was one of several initiated at the fully online Code the City 19 History and Data event.

Its purpose is to gather data on Aberdeen-built ships from the Aberdeen Built Ships website, with the permission of the site’s owners, and to push that refined bulk data, with added structure, onto Wikidata as open data, with links back to the Aberdeen Ships site through a new identifier.

By adding the data for the Aberdeen Built Ships to Wikidata we will be able to do several things, including:

Create a timeline of ship building
Create maps, charts and graphs of the data (e.g. showing the change in sizes and types of ships over time)
Show the relative activity of the many shipbuilders and how that changed
Link ship data to external data sources
Improve the data quality
Increase engagement with the ships database.

The description below is largely borrowed from the README file of the project’s GitHub repo.

Progress to date

So far the following has been accomplished, mainly during the course of the weekend.

A script, get_ids.py, was developed to gather all the ship IDs from the Aberdeen Built Ships site and write them to ids.txt.
The script get_details.py uses the IDs from ids.txt to scrape the full ship information from the Aberdeen Built Ships site and writes it to the file ships.json.
The file query.rq contains a SPARQL query to run on the Wikidata Query Service to get the QID and name of every ship on Wikidata. The results have been downloaded manually as all_wd_ships.json.
The script ship_builders.py reads ships.json, constructs a list of all shipbuilders with a count of how often each appears, and writes it out to ship_builders.csv.
The script already_in_wd.py has checked the ship names in ships.json, cross-matched them with all_wd_ships.json and generated a list of ships whose name indicates that they MAY already be in Wikidata.
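To make the last two steps concrete, here is a minimal Python sketch of the builder count and the name cross-match. It is not the code in the repo: it assumes ships.json holds a list of objects with hypothetical "name" and "builder" keys, and that all_wd_ships.json is a list of objects with hypothetical "item" and "itemLabel" keys. Names are compared case-insensitively, which is exactly why a match only means a ship MAY already be in Wikidata.

```python
import csv
import json
from collections import Counter


def builder_counts(ships):
    """Count how often each shipbuilder appears (hypothetical "builder" key)."""
    return Counter(s["builder"].strip() for s in ships if s.get("builder"))


def write_builder_csv(counts, path):
    """Write builders to a CSV, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["builder", "count"])
        writer.writerows(counts.most_common())


def possible_wd_matches(ships, wd_ships):
    """Return (name, [item URIs]) for ships whose name matches a Wikidata ship label.

    A match is only a candidate: many ships share a name, so each one
    still needs checking by hand (e.g. on year or tonnage).
    """
    label_to_items = {}
    for row in wd_ships:  # hypothetical "item" / "itemLabel" keys
        label = row["itemLabel"].strip().casefold()
        label_to_items.setdefault(label, []).append(row["item"])
    matches = []
    for ship in ships:
        items = label_to_items.get(ship["name"].strip().casefold())
        if items:
            matches.append((ship["name"], items))
    return matches


def run(ships_path="ships.json", wd_path="all_wd_ships.json", out_csv="ship_builders.csv"):
    """Load both files, write the builder counts and return the candidate matches."""
    with open(ships_path, encoding="utf-8") as f:
        ships = json.load(f)
    with open(wd_path, encoding="utf-8") as f:
        wd_ships = json.load(f)
    write_builder_csv(builder_counts(ships), out_csv)
    return possible_wd_matches(ships, wd_ships)
```

Calling run() in the folder holding the two JSON files would write ship_builders.csv and return the candidate list for manual review.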

Next Steps?

To complete the project the following needs to be done:

Ensure that a Wikidata identifier for Aberdeen Built Ships (ABS) is created for us to use when adding ships to Wikidata. A request to create this identifier is currently pending.
Create Wikidata entities for all shipbuilders and note the QID for each. We’ve already loaded nine of these into Wikidata.
Decide how to deal with the list of ships that MAY already be in Wikidata. This may have to be a manual process. Think about how we reconcile this – name / year / tonnage may all be useful.
Decide on the best route to bulk upload, e.g. QuickStatements. This may be useful: Wikidata Import Guide
Agree a core set of data for each ship that will be parsed from ships.json to be added to Wikidata, e.g. name, year, builder, tonnage, length etc.
Create a script to output text that can be dropped into a CSV or other file to be used by QuickStatements (assuming that to be the right tool) for bulk input, ensuring that the shipbuilder QIDs and ABS identifiers are used.
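A sketch of what that last script might look like, under assumptions that still need agreeing: the ship dicts use hypothetical keys (name, year, builder, abs_id); builder names map to the QIDs of the shipbuilder items created earlier; and the choice of P31 (instance of) with Q11446 (ship), P176 (manufacturer) and P571 (inception) is illustrative rather than settled. P_ABS is a placeholder for the ABS identifier property, which does not exist yet. The output is QuickStatements V1 syntax: tab-separated commands, with LAST referring to the item just created.

```python
ABS_PROPERTY = "P_ABS"  # hypothetical placeholder: the ABS identifier property is still pending


def ship_commands(ship, builder_qids):
    """Return QuickStatements V1 lines that create one ship item.

    Names containing double quotes would need escaping; not handled here.
    """
    lines = ["CREATE", f'LAST\tLen\t"{ship["name"]}"', "LAST\tP31\tQ11446"]  # label; instance of: ship
    builder_qid = builder_qids.get(ship.get("builder"))
    if builder_qid:
        lines.append(f"LAST\tP176\t{builder_qid}")  # manufacturer
    if ship.get("year"):
        # inception, with /9 = year precision
        lines.append(f"LAST\tP571\t+{ship['year']}-00-00T00:00:00Z/9")
    if ship.get("abs_id"):
        lines.append(f'LAST\t{ABS_PROPERTY}\t"{ship["abs_id"]}"')  # external ID as a string
    return lines


def batch(ships, builder_qids):
    """Join the commands for many ships into one paste-able batch."""
    return "\n".join(line for s in ships for line in ship_commands(s, builder_qids))
```

Fields that are missing for a ship are simply skipped, so a sparse record still produces a valid, if minimal, item.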

We will also be looking to get pictures of the ships published on Wikimedia Commons with permissive licences, link these to Wikidata and, in the longer term, increase the number and improve the quality of Wikipedia articles on Aberdeen ships.

Header Image of a Scale Model of Thermopylae at Aberdeen Maritime Museum By Stephencdickson – Own work, CC BY-SA 4.0

Aberdeen Provosts

In the run-up to Code The City 19 we had several suggestions of potential projects that we could work on over the weekend. One was that we add all of the Provosts of Aberdeen to Wikidata. This appealed to me, so I volunteered to work on it in a team with Wikimedia UK’s Scotland Programme Coordinator, Dr Sara Thomas, with whom I have worked on other projects.

In preparation for CTC19 I’d been reading up on the history of the City’s provosts and discovered that up to 1863 the official title was Provost, and from that point it was Lord Provost. I’d made changes to the Wikipedia page to reflect that, and I’d added an extra item to Wikidata so that we could create statements that properly reflected which position the people held.

Sara and I began by agreeing an approach and sharing resources. We made full use of Google Docs and Google Sheets.

We had two main sources of information on Provosts:

Memorials of the Aldermen, Provosts, and Lord Provosts of Aberdeen, 1272–1895 by Munro (1897), which is out of copyright and has been scanned and made openly available, and
Wikipedia, which I suspect draws on the former, although there are discrepancies in some dates.

Running the project

I started by setting up a Google Sheet to pull data from Wikipedia as a first attempt to import a list to work with. The importHTML function in Google Sheets is a useful way to retrieve data in list or table format.

I entered the formula in the top left cell (A1):

=importhtml("https://en.wikipedia.org/wiki/List_of_provosts_of_Aberdeen", "list", 27)

and then repeated it for each of the other lists, one per century, changing the list index (the final argument) each time. This populated our sheet with the numerous lists of provosts.

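That didn’t last very long. The formula is dynamic, so it re-fetches the page, and the structure of the Wikipedia page was apparently being changed at the time, with extra lists added. As a result, the list numbers no longer pointed at the same lists, and groups of former provosts kept disappearing from our sheet.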

I decided to create a list manually instead, copying the HTML of the Wikipedia page and running some regex find-and-replace commands in a text editor to leave only the text we needed, which I then pasted into the sheet.

Partial list of Provosts; partial list of Lord Provosts

Once we had that in the Google Sheet we got to work with some formulae to clean and arrange the data. Our entries were in the form “(1410–1411) Robert Davidson”, so we had to:

- split names from dates,
- split the start dates from end dates, and
- split names into family names and given names.
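We did this with spreadsheet formulae, but the same three splits can be sketched in Python with a regular expression. This is an illustration rather than our formulae: it assumes entries look like “(1410–1411) Robert Davidson”, accepts either an en dash or a hyphen between the years, and treats the last word of the name as the family name, which is exactly the kind of rule that produces a few odd results to fix by hand.

```python
import re

# "(start–end) Name": en dash or hyphen between the years, end year optional.
ENTRY = re.compile(r"^\((\d{4})(?:\s*[–-]\s*(\d{4}))?\)\s*(.+)$")


def split_entry(text):
    """Split an entry into start year, end year, given name(s) and family name.

    Returns None if the entry does not match, so it can be fixed manually.
    """
    m = ENTRY.match(text.strip())
    if m is None:
        return None
    start, end, name = m.groups()
    words = name.split()
    return {
        "start": int(start),
        "end": int(end) if end else None,
        "given": " ".join(words[:-1]),  # everything before the last word
        "family": words[-1],  # naive: assumes the family name is one final word
    }
```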

Once that was working (albeit with a few odd results to fix manually), Sara identified a Chrome plugin called “Wikipedia and Wikidata Tools”, which proved really useful. For example, we could query the term in a cell, e.g. “Hadden”, and get back the QID of the first matching item. We could then point another query at that QID and ask what it was an instance of. If it was a family name or given name we could use that QID, and only had to look up the others manually. That saved quite a bit of time.

Identifying QIDs for Given and Family Names

Our aim in all of this was to prepare a bulk upload to Wikidata with as little manual entry as possible. To do that, Sara had identified QuickStatements, a bulk upload tool for Wikidata that allows you to make large numbers of edits through a relatively simple interface.

Sara created a model for what each item in QuickStatements should contain:

There are a few quirks – for example, how you format a date – but once you’ve got the basics down it’s an incredibly powerful tool. The help page is really very useful.

Where dates were concerned, I created a formula to look up the date in another cell and then wrap it in the formatting needed:

="+"&Sheet1!J99&"-00-00T00:00:00Z/9"

This gave +1515-00-00T00:00:00Z/9 as the output; the /9 at the end sets the precision to a year.
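The same formatting can be expressed as a small Python function, which may help if the data is ever prepared outside a spreadsheet. It extends the formula slightly (our own addition, not something used at the event) so that a known month or day raises the precision: in QuickStatements, /9 means year, /10 month and /11 day.

```python
def qs_date(year, month=None, day=None):
    """Format a date as a QuickStatements time value.

    Unknown parts are zero-filled and the trailing precision reflects
    what is known: 9 = year, 10 = month, 11 = day.
    """
    if day is not None and month is None:
        raise ValueError("a day needs a month")
    precision = 11 if day else 10 if month else 9
    return f"+{int(year):04d}-{month or 0:02d}-{day or 0:02d}T00:00:00Z/{precision}"
```

For a year alone, qs_date(1515) reproduces the spreadsheet output above.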

As well as editing existing items, QuickStatements can bulk-create new ones, which is what we did here. After a few stumbles, we found that it worked best in Firefox.

Data harvesting

As mentioned above, we used a printed source, from which we harvested the data about the individual Provosts. It’s easy to get very detailed very quickly, but we decided on a basic upload covering:

Name
First name
Last name
Position held (qualified by the dates)
Dates of birth and death (where available).
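This is not Sara’s model, just a sketch of how a QuickStatements V1 CREATE block carrying those fields might look. The properties are standard Wikidata ones: P31 (instance of) with Q5 (human), P735 (given name), P734 (family name), P39 (position held) qualified by P580 (start time) and P582 (end time), and P569/P570 (dates of birth and death). The dictionary keys and the QIDs passed in are hypothetical; the name QIDs would come from the lookups described earlier, and the position QID would be the Provost or Lord Provost item.

```python
def year(y):
    """QuickStatements time value with year precision (/9)."""
    return f"+{int(y)}-00-00T00:00:00Z/9"


def provost_commands(p):
    """Build QuickStatements V1 lines creating one provost item.

    p is a dict with hypothetical keys: name, given_qid, family_qid,
    position_qid, start, end, and optionally born and died (years).
    """
    lines = [
        "CREATE",
        f'LAST\tLen\t"{p["name"]}"',  # English label
        "LAST\tP31\tQ5",  # instance of: human
        f"LAST\tP735\t{p['given_qid']}",  # given name
        f"LAST\tP734\t{p['family_qid']}",  # family name
        # position held, with start time (P580) and end time (P582) qualifiers
        f"LAST\tP39\t{p['position_qid']}\tP580\t{year(p['start'])}\tP582\t{year(p['end'])}",
    ]
    if p.get("born"):
        lines.append(f"LAST\tP569\t{year(p['born'])}")  # date of birth
    if p.get("died"):
        lines.append(f"LAST\tP570\t{year(p['died'])}")  # date of death
    return lines
```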

Some of our provosts held the position three or four times, often with breaks in between. We tried to work out a way to add the same role held more than once with different date qualifiers, but ultimately this had to be done manually.
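Although the upload itself was manual, finding the people affected is easy to automate. A minimal sketch, assuming cleaned rows with hypothetical name, start and end keys, and assuming a repeated name means the same person (worth checking, since different people can share a name).

```python
from collections import defaultdict


def terms_by_person(rows):
    """Group terms of office by name, each person's terms sorted by start year."""
    terms = defaultdict(list)
    for row in rows:
        terms[row["name"]].append((row["start"], row["end"]))
    return {name: sorted(t) for name, t in terms.items()}


def multi_term_holders(rows):
    """Names that held office more than once, i.e. the ones to enter by hand."""
    return {name: t for name, t in terms_by_person(rows).items() if len(t) > 1}
```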

The first upload

We started with a few test batches of five or six entries each to see how the process worked.

A test batch to upload via Quickstatements

When those worked, we created larger batches. By the end of the weekend all of the Provosts and Lord Provosts had been added to Wikidata, which was very satisfying. We also had a list of further tasks to carry out to enhance the data. These included:

Add multiple terms of office – now complete,
Add statements for Replaces (P1365) and Replaced By (P1366) – partly done,
Add honorific titles – partly done,
Add images of signatures (partly done) and portraits (completed) from the reference book,
Add biographical details from the book – hardly started,
Source images for Wikimedia Commons from the portrait collection at Aberdeen Art Gallery & Museums (AAGM) – request sent,
Add places of burial, identifiers from Find A Grave, photographs of gravestones,
Add streets named after provosts and link them.
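For the Replaces (P1365) and Replaced By (P1366) task, each term’s predecessor and successor follow directly from a chronological list of terms. A minimal sketch, assuming a hypothetical list of (QID, start year) tuples already sorted by start year; it only works out the pairs and leaves open how they are uploaded, since people with several terms make it tricky to attach them to the right statement.

```python
def succession(terms):
    """Pair each term with the holders before and after it.

    terms: list of (qid, start_year) tuples in chronological order.
    Returns (qid, start_year, replaces, replaced_by) tuples, with None
    at either end. Back-to-back terms by the same person would list
    that person as their own predecessor, so filter those if needed.
    """
    out = []
    for i, (qid, start) in enumerate(terms):
        before = terms[i - 1][0] if i > 0 else None
        after = terms[i + 1][0] if i + 1 < len(terms) else None
        out.append((qid, start, before, after))
    return out
```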

You can see the results in this Wikidata query: https://w.wiki/PsF

A Wikidata Query showing Provosts’ Terms of Office, and their replacements

This was a very interesting project to work on – and there is still more to do to improve the data, which you can help with.

Aberdeen Harbour Board Arrivals Transcription Project

A blog post by Mollie Horne, Project Archivist at Aberdeen City and Aberdeenshire Archives and Ian Watt of Code The City.

The arrivals transcription project is an ongoing partnership between Code the City and Aberdeen City & Aberdeenshire Archives. It forms part of a wider project, funded through The National Archives’ Archives Revealed initiative, which aims to improve the accessibility of records.

The arrival registers are a small part of a much larger collection which was transferred to Aberdeen City and Aberdeenshire Archives as a result of a partnership with the Aberdeen Harbour Board.

The project was originally intended to be part of the physical Code the City 19 event in April 2020 but in anticipation of the nationwide restrictions, it was decided to move entirely online. In the week before we were told to work from home, Mollie photographed each individual page (all 649 of them) from the arrival registers from 1914-1920 and uploaded them to the Google Sheets system which had been set up by Ian. This meant that we had a large amount of material which could be worked on for an extended period.

After creating a set of guidelines and helpful links, we invited the public to work on transcribing and checking entries from March 27th onwards. As the online CTC19 event was scheduled for 11–12 April, this gave us two weeks to create enough data to be useful to the coders over the official weekend.

Transcribers used two Google Sheets. The first was for logging their participation and noting which photograph they were transcribing.

The second sheet was the one into which they transcribed the data.

We also set up an open Slack group where transcribers could chat, ask questions, get help etc.

Progress was rapid: by the end of the weekend almost 4,000 records had been transcribed and checked. At the time of writing (2nd May 2020) that has now grown to over 7,000 records transcribed.

When an image has been transcribed and checked, we lock its entries to protect them from change.

The transcribed data was used to create a website, set up by Andrew Sage of CTC, where we could see the information in a collated and organised way, which was extremely useful for informing other transcriptions. So far we have fully completed 1914 and are working through the rest of the years.

The arrivals transcription project started as a great way to highlight an important time in the history of the Harbour, which has always been a big part of Aberdeen. However, given current circumstances, it has also become a great opportunity to give people something to focus on.

The project remains open – and you can still get involved by contributing just an hour or two of your time. Start here.