Aberdeen Built Ships – an update at CTC20

This project started at CTC19 on 11th-12th April. The aim was to import a complete set of data on the 3000+ ships from Aberdeen Built Ships (with the permission of the Galleries and Museums Service, who operate it) into Wikidata, in as clean and well-formatted a state as possible.

We got part of the way there at CTC19 and in the weeks that followed, but the data had still not been imported.

CTC20 progress

In the weeks since CTC19 we had identified issues with two significant aspects of the data in the core ABS system: a lack of standardisation of ship types (there were up to nine variants of a single type) and a similar issue with ship builders.

For the purposes of CTC20 we agreed to set these aside and press ahead with an import of core data for each ship we could – and to revisit the specific details above later.

What was done

Core data was imported into Wikidata for most of the ships. We excluded some ships from the import if the name field was blank or UNKNOWN or UNNAMED. Other, existing, ships had an ABS ID added to their item. This has resulted in 3085 ships in Wikidata with an ABS ID at the time of writing.

Screenshot of Samuel Plimsoll

Method

We initially tried to use the CSV format for Wikidata QuickStatements but couldn’t get this to work, so we switched to the TSV version. A Python script was written to generate the QuickStatements file, the contents of which could then be copied into the QuickStatements batch import tool. The import produced 2 errors, both for ships that had a range of years in the Date field, which generated invalid dates in the QuickStatements. These (and 2 duplicates that I noticed after the import) are noted for correction later.
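As a rough illustration of the approach described above, the sketch below builds tab-separated QuickStatements commands for one ship. It is not the script used at the event. The record keys (`abs_id`, `name`, `year`) are hypothetical, and the use of P31 with Q11446 (instance of: ship) and P571 (inception) is an assumption about which statements were written; only P8260 comes from this post. It skips blank, UNKNOWN and UNNAMED names, and it leaves the date out when the year is a range rather than a single year.

```python
import re

# Names that were excluded from the import.
SKIP_NAMES = {"", "UNKNOWN", "UNNAMED"}


def ship_to_quickstatements(ship):
    """Return QuickStatements lines for one ship, or [] if it is skipped.

    `ship` is a dict with hypothetical keys: abs_id, name, year.
    """
    name = (ship.get("name") or "").strip()
    if name.upper() in SKIP_NAMES:
        return []

    lines = [
        "CREATE",
        f'LAST\tLen\t"{name}"',              # English label
        "LAST\tP31\tQ11446",                 # assumption: instance of -> ship
        f'LAST\tP8260\t"{ship["abs_id"]}"',  # ABS ID (property named in the post)
    ]

    year = str(ship.get("year") or "").strip()
    # Only a plain four-digit year becomes a date; a range such as
    # "1850-1852" is left out rather than written as an invalid date.
    if re.fullmatch(r"\d{4}", year):
        # The trailing /9 marks year precision (assumed property: P571).
        lines.append(f"LAST\tP571\t+{year}-01-01T00:00:00Z/9")
    return lines


def build_batch(ships):
    """Join the commands for all ships into one block of text for pasting."""
    return "\n".join(
        line for ship in ships for line in ship_to_quickstatements(ship)
    )
```

Calling `build_batch` on a list of ship records gives text that could be pasted into the batch import tool. Ships whose year was skipped would still need their date added by hand afterwards.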

The ABS ID property (P8260) was manually added to the ships that already existed in Wikidata.

The mapping between QID and ABS ID was obtained with this SPARQL query:

SELECT ?qid ?absid
WHERE
{
  ?qid wdt:P8260 ?absid.
}
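For anyone wanting to pull that mapping into a script, here is a minimal sketch that sends the same query to the public Wikidata Query Service and turns the JSON response into a dictionary keyed by ABS ID. The function names and the User-Agent string are made up for this example, and this is not necessarily how the mapping was retrieved at the event.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = "SELECT ?qid ?absid WHERE { ?qid wdt:P8260 ?absid. }"


def build_url(query=QUERY):
    """Return the GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def parse_mapping(results):
    """Turn SPARQL JSON results into a dict of {ABS ID: QID}."""
    mapping = {}
    for row in results["results"]["bindings"]:
        # ?qid comes back as a full entity URI; keep only the trailing Q-number.
        qid = row["qid"]["value"].rsplit("/", 1)[-1]
        mapping[row["absid"]["value"]] = qid
    return mapping


def fetch_mapping():
    """Run the query against the live service (needs network access)."""
    request = urllib.request.Request(
        build_url(),
        headers={"User-Agent": "abs-mapping-example/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        return parse_mapping(json.load(response))
```

Keeping `parse_mapping` separate from the network call means it can be checked against a saved response without hitting the query service.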

Next Steps?

To complete the project the following needs to be done:

  • Add Country of Origin (P495) to all existing Aberdeen-built ships in Wikidata. This will suppress the warning messages when viewing each ship.
  • Rationalise all ship builders that exist in ship_builders.csv – deduplicating these and creating Wikidata entries for each one we will use.
  • Rationalise all ship types that exist in ship_types.csv – deduplicating these and creating Wikidata entries for each one we will use.
  • Update each ship with specific type and ship builder.
  • Extract / rationalise data from some of the fields, e.g. we have one dimensions field rather than separate fields for length/beam/draft/… and what’s there is inconsistent
  • Isolate ships that have no Wikidata identifier – i.e. any one not in the list of 59 positive matches. Set aside those which have entries for later processing.
  • Source and add pictures of the ships in ABS (see below)
  • Develop a means of monitoring both the original ABS system (rescrape periodically and diff the resulting file in some way?) and Wikidata for changes to the ships’ records (a Wikidata query, executed periodically, generating a CSV download that is checked for differences from previous runs?), to feed back to ABS.
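The monitoring idea in the last item could be sketched roughly as follows: load two CSV snapshots (from a rescrape of ABS or from a periodic Wikidata query download), index them by a key column, and report what was added, removed or changed. The column name `absid` and the function names are assumptions for illustration; nothing like this has been built yet.

```python
import csv
import io


def load_snapshot(csv_text, key="absid"):
    """Index the rows of one CSV snapshot by their key column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_snapshots(old, new):
    """Return the keys added, removed and changed between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A row counts as changed if any of its columns differ.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}
```

Run against the previous and current download, the resulting lists could be reviewed by hand before anything is fed back to ABS.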

Images of ships

Ships with images

Despite there now being 3,085 Aberdeen-built ships in Wikidata, only 12 of these (about 0.39%) have a picture associated with them. There is a significant opportunity to work with Aberdeen Museums to add images from their extensive collection to Wikimedia Commons and associate these with the ships now in Wikidata.

Header image Twice & Rinina25 / CC BY-SA https://upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Genova-Tall_Ship-IMG_1509.JPG/512px-Genova-Tall_Ship-IMG_1509.JPG

Aberdeen Built Ships

This project was one of several initiated at the fully-online Code the City 19 History and Data event.

Its purpose is to gather data on Aberdeen-built ships, with the permission of the site’s owners, and to push that refined bulk data, with added structure, onto Wikidata as open data, with links back to the Aberdeen Ships site through a new identifier.

By adding the data for the Aberdeen Built Ships to Wikidata we will be able to do several things including

  • Create a timeline of ship building
  • Create maps, charts and graphs of the data (e.g. showing the change in sizes and types of ships over time)
  • Show the relative activity of the many shipbuilders and how that changed
  • Link ship data to external data sources
  • Improve the data quality
  • Increase engagement with the ships database.

The description below is largely borrowed from the ReadMe file of the project’s GitHub repo.

Progress to date

So far the following has been accomplished, mainly during the course of the weekend.

Next Steps?

To complete the project the following needs to be done:

  • Ensure that the request for an identifier for ABS is created for use by us in adding ships to Wikidata. A request to create an identifier for Aberdeen Ships is currently pending.
  • Create Wikidata entities for all shipbuilders and note the QID for each. We’ve already loaded nine of these into Wikidata.
  • Decide on how to deal with the list of ships that MAY be already in Wikidata. This may have to be a manual process. Think about how we reconcile this – name / year / tonnage may all be useful.
  • Decide on the best route to bulk upload, e.g. QuickStatements. This may be useful: Wikidata Import Guide
  • Agree a core set of data for each ship that will be parsed from ships.json to be added to Wikidata – e.g. name, year, builder, tonnage, length etc.
  • Create a script to output text that can be dropped into a CSV or other file to be used by QuickStatements (assuming that to be the right tool) for bulk input ensuring links for shipbuilder IDs and ABS identifiers are used.
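One possible starting point for the reconciliation question above is to pair ABS ships with existing Wikidata items when the normalised names are equal and the build years are close, then review the candidates by hand. The sketch below assumes that approach; the record keys (`abs_id`, `name`, `year`, `qid`, `label`) and the one-year tolerance are hypothetical, and tonnage is left out for brevity.

```python
def normalise_name(name):
    """Upper-case and collapse whitespace so spacing and case do not matter."""
    return " ".join(name.upper().split())


def candidate_matches(abs_ships, wikidata_ships, year_tolerance=1):
    """Pair ABS ships with Wikidata items sharing a name and a nearby year.

    Returns a list of (abs_id, qid) tuples for manual review.
    """
    # Group the Wikidata items by normalised label for quick lookup.
    by_name = {}
    for item in wikidata_ships:
        by_name.setdefault(normalise_name(item["label"]), []).append(item)

    matches = []
    for ship in abs_ships:
        for item in by_name.get(normalise_name(ship["name"]), []):
            if abs(ship["year"] - item["year"]) <= year_tolerance:
                matches.append((ship["abs_id"], item["qid"]))
    return matches
```

Because several ships can share a name, one ABS record may produce more than one candidate, which is why the output is meant as a review list rather than a final mapping.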

We will also be looking to get pictures of the ships published onto Wikimedia Commons with permissive licences, to link these to the Wikidata items, and to increase and improve the number of Wikipedia articles on Aberdeen ships in the longer term.

Header Image of a Scale Model of Thermopylae at Aberdeen Maritime Museum By Stephencdickson – Own work, CC BY-SA 4.0