1914 – 1920 Aberdeen Harbour Arrivals Transcription Project – CTC20 update

Building on our foundations

After such a successful weekend at CTC19, we were delighted to be back for CTC20 to continue work on the Aberdeen Harbour Arrivals project. As expected, the team working on the project was made up of both avid coders and history enthusiasts which brings a great range of skills and knowledge to the weekend.
A second spreadsheet was created to input adjustments, this allowed us to clean data to be more presentable whilst keeping the accurate ledger transcriptions intact; a must when dealing with archival material. This data cleaning has allowed us to create a more presentable website which is easier to understand and navigate.

Expanding the data set

The adjustments spreadsheet also included the addition of a new column of information sourced externally from the original transcription documents. When first registered fishing vessels were assigned a Fishing Port Registration Number. Where known, that number has been added and will hopefully allow us to cross reference this vessels with other sources at some point in the future.

Vessel types and roles

Initial steps were taken to begin to create a better understanding about the various vessels, their history and purpose. Many of the vessel names contain prefixes relating to their type (e.g. HMS – His Majesty’s Ship for a regular naval vessel, HMSS for a submarine) and they have now been extracted and a list of definitions is being built up. Decoding these prefixes highlighted just how much naval military activity was taking place around Aberdeen during the First World War.

Visualising the data

Some of the team also looked forward to consider how the data could be used in the future. A series of graphs and charts have been created to highlight patterns such as most frequent ships and most popular cargo. We even have an interactive map to show where the in the world the ships were arriving from.

As with CTC19, the weekend has been a great success. Archivists learned more about data and the coders benefitted from over 15,000 records to play with.

Next steps

An ideal future step for the project is the creation of individual records in the website for each vessel so we can begin to expand on the information – i.e. vessel name, history of Masters, expanded description about what it was, what role in played in the First World War. Given the heavy use of Wikidata by many of the other projects that were part of CTC19 and CTC20, consideration has to be given to using Wikidata as the expanded repository for building up the bigger picture for each vessel. However, as we are still very much in the historical investigation stage and not entirely sure about the full facts for many vessels it would not be appropriate at this stage to start pushing unverified information into Wikidata.

Aberdeen Built Ships – an update at CTC20

This project was commenced at CTC19 on 11th -12th April. The aim was to import from Aberdeen Built Ships (with the permission of the Galleries and Museums Service who operate it) a complete set of data on those 3000+ ships into Wikidata data in as clean and well-formatted state as possible.

We got part of the way there at CTC19, and in work done in the following weeks, but the data had still not been imported.

CTC20 progress

We had in the weeks since CTC19, we had identified issues with two significant aspects of the data in the core ABS system: a lack of standardisation of ship types (meaning that there were up to nine variants of a single type) and a similar issue with ship builders.

For the purposes of CTC20 we agreed to set these aside and press ahead with an import of core data for each ship we could – and to revisit the specific details above later.

What was done

Core data was imported into Wikidata for most of the ships. We excluded some ships from the import if the name field was blank or UNKNOWN or UNNAMED. Other, existing, ships had an ABS ID added to their item. This has resulted in 3085 ships in Wikidata with an ABS ID at the time of writing.

Method

We initially tried to use the CSV format for wikidata quickstatements, but couldn’t get this to work so switched to the TSV version. A python script was written to write the quickstatements file that could then be copied into the quickstatements batch import tool. The import had 2 errors for ships that had a range of years in the Date so generated invalid dates in the quickstatements. These (and 2 duplicates that I noticed after the import) are noted to correct later.

The ABS ID property (P8260) was manually added to the ships that already existed in wikidata.

The mappings between QID and ABS ID was found from SPARQL query:

SELECT ?qid ?absid
WHERE
{
  ?qid wdt:P8260 ?absid.
}

Next Steps?

To complete the project the following needs to be done

Add Country of Origin (P495) to all existing Aberdeen-built ships in Wikidata. This will suppress the warning messages when viewing each ship.
Rationalise all ship builders that exist in ship_builders.csv – deduplicating these and create Wikidata entries for each we will use.
Rationalise all ship types that exist in ship_types.csv – deduplicating these and create Wikidata entries for each we will use.
Update each ship with specific type and ship builder.
Extract / rationalise data from some of the fields, e.g. we have one dimensions field rather than separate fields for length/beam/draft/… and what’s there is inconsistent
Isolate ships that have no Wikidata identifier – i.e. any one not in the list of 59 positive matches. Set aside those which have entries for later processing.
Source and add pictures of the ships in ABS (see below)
Develop a means of monitoring both the original ABS system (rescrape periodically and do a diff on the file in some way? ) and monitor Wikidata for changes to the ships records (Wikidata query, executed periodically, generating a CSV download and checked for differences from previous runs?) to feed back to ABS.

Images of ships

Despite there now being 3,085 Aberdeen-built ships in Wikidata only 12 of these (or 0.388%) has a picture associated with them. There is a significant opportunity to work with Aberdeen Museums to add images from their extensive collection to Wiki Commons and associate these with the ships now in Wikidata.

Header image Twice & Rinina25 / CC BY-SA https://upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Genova-Tall_Ship-IMG_1509.JPG/512px-Genova-Tall_Ship-IMG_1509.JPG

Aberdeen Built Ships

This project was one of several initiated at the fully-online Code the City 19 History and Data event.

It’s purpose is to gather data on Aberdeen-built ships, with the permission of the site’s owners, and to push that refined bulk data, with added structure, onto Wikidata as open data, with links back to the Aberdeen Ships site through using a new identifier.

By adding the data for the Aberdeen Built Ships to Wikidata we will be able to do several things including

Create a timeline of ship building
Create maps, charts and graphs of the data (e.g. showing the change in sizes and types of ships over time
Show the relative activity of the many shipbuilders and how that changed
Link ship data to external data sources
Improve the data quality
Increase engagement with the ships database.

The description below is largely borrowed from the ReadMe file of the project’s Github Repo.

Progress to date

So far the following has been accomplished, mainly during the course of the weekend.

A script get_ids.py was developed to gather all the ship IDs from Aberdeen-built ships and writes them to ids.txt.
The script get_details.py uses the IDs from ids.txt and scrapes the full ship information from Aberdeen-built ships and writes it to the file ships.json.
The file query.rq contains code to execute a query on Wikidata Query Service to get the QID and name of every ship on Wikidata. This has been manually downloaded as all_wd_ships.json.
The file ship_builders.py checks ships.json and constructs a list of all ship builders and a frequency count of their appearance, writing it out to ship_builders.csv.
The file already_in_wd.py has checked for ships names in ships.json and crossed matched with all_wd_ships.json and generated a list of ships whose name indicates that they MAY be already in Wikidata.

Next Steps?

To complete the project the following needs to be done

Ensure that the request for an identifier for ABS is created for use by us in adding ships to Wikidata. A request to create an identifier for Aberdeen Ships is currently pending.
Create Wikidata entities for all shipbuilders and note the QID for each. We’ve already loaded nine of these into WikiData.
Decide on how to deal with the list of ships that MAY be already in Wikidata. This may have to be a manual process. Think about how we reconcile this – name / year / tonnage may all be useful.
Decide on best route to bulk upload – eg Quickstatements. This may be useful: Wikidata Import Guide
Agree a core set of data for each ship that will parsed from ships.json to be added to Wikidata – e.g. name, year, builder, tonnage, length etc
Create a script to output text that can be dropped into a CSV or other file to be used by QuickStatements (assuming that to be the right tool) for bulk input ensuring links for shipbuilder IDs and ABS identifiers are used.

We will also be looking to get pictures of the ships published onto Wiki Commons with permissive licences, link these to the Wiki Data and increase and improve the number of Wikipedia articles on Aberdeen Ships in the longer-term.

Header Image of a Scale Model of Thermopylae at Aberdeen Maritime Museum By Stephencdickson – Own work, CC BY-SA 4.0

Aberdeen Harbour Board Arrivals Transcription Project

A blog post by Mollie Horne, Project Archivist at Aberdeen City and Aberdeenshire Archives and Ian Watt of Code The City.

The arrivals transcription project is an ongoing partnership between Code the City and Aberdeen City & Aberdeenshire Archives. It forms part of a wider project funded by the Archives Revealed initiative funded by The National Archives which aims to improve the accessibility of records.

The arrival registers are a small part of a much larger collection which was transferred to Aberdeen City and Aberdeenshire Archives as a result of a partnership with the Aberdeen Harbour Board.

The project was originally intended to be part of the physical Code the City 19 event in April 2020 but in anticipation of the nationwide restrictions, it was decided to move entirely online. In the week before we were told to work from home, Mollie photographed each individual page (all 649 of them) from the arrival registers from 1914-1920 and uploaded them to the Google Sheets system which had been set up by Ian. This meant that we had a large amount of material which could be worked on for an extended period.

After creating a set of guidelines and helpful links, we invited the public to work on transcribing and checking entries from March 27th onwards. As the online CTC19 event was scheduled for 11-12th April this allowed us two weeks to create enough data to be useful to the coders over the official weekend.

Transcribers accessed two Google sheets. The first was to log their participation and note what photograph they were transcribing.

The second sheet was the one into which they transcribed the data.

We also set up an open Slack group where transcribers could chat, ask questions, get help etc.

Progress was rapid: by the end of the weekend almost 4,000 records had been transcribed and checked. At the time of writing (2nd May 2020) that has now grown to over 7,000 records transcribed.

When an image has been transcribed, and checked, we lock off the entries to preserve them form change.

The data which had been transcribed was used to create a website, set up by Andrew Sage of CTC, where we could see information in a collated an organised way – this was extremely useful to inform other transcriptions. So far we have managed to fully complete 1914 and are working through the rest of the years.

The arrivals transcription project started as a great way to highlight an important time in the history of the Harbour, which has always been a big part of Aberdeen. However, given current circumstances, it has also become a great opportunity to give people something to focus on.

The project remains open – and you can still get involved by contributing just an hour or two of your time. Start here.