1914 – 1920 Aberdeen Harbour Arrivals Transcription Project – CTC20 update

Building on our foundations

After such a successful weekend at CTC19, we were delighted to be back for CTC20 to continue work on the Aberdeen Harbour Arrivals project. As expected, the team working on the project was made up of both avid coders and history enthusiasts which brings a great range of skills and knowledge to the weekend.
A second spreadsheet was created to input adjustments, this allowed us to clean data to be more presentable whilst keeping the accurate ledger transcriptions intact; a must when dealing with archival material. This data cleaning has allowed us to create a more presentable website which is easier to understand and navigate.

Expanding the data set

The adjustments spreadsheet also included the addition of a new column of information sourced externally from the original transcription documents. When first registered fishing vessels were assigned a Fishing Port Registration Number. Where known, that number has been added and will hopefully allow us to cross reference this vessels with other sources at some point in the future.

Vessel types and roles

Initial steps were taken to begin to create a better understanding about the various vessels, their history and purpose. Many of the vessel names contain prefixes relating to their type (e.g. HMS – His Majesty’s Ship for a regular naval vessel, HMSS for a submarine) and they have now been extracted and a list of definitions is being built up. Decoding these prefixes highlighted just how much naval military activity was taking place around Aberdeen during the First World War.

Visualising the data

Some of the team also looked forward to consider how the data could be used in the future. A series of graphs and charts have been created to highlight patterns such as most frequent ships and most popular cargo. We even have an interactive map to show where the in the world the ships were arriving from.

As with CTC19, the weekend has been a great success. Archivists learned more about data and the coders benefitted from over 15,000 records to play with.

Next steps

An ideal future step for the project is the creation of individual records in the website for each vessel so we can begin to expand on the information – i.e. vessel name, history of Masters, expanded description about what it was, what role in played in the First World War. Given the heavy use of Wikidata by many of the other projects that were part of CTC19 and CTC20, consideration has to be given to using Wikidata as the expanded repository for building up the bigger picture for each vessel. However, as we are still very much in the historical investigation stage and not entirely sure about the full facts for many vessels it would not be appropriate at this stage to start pushing unverified information into Wikidata.

Aberdeenshire Settlements on Wikidata and Wikipedia

Introduction

This project was part of Code The City’s #CTC20 History and Culture hack weekend.

The Challenge

To identify (all of) the settlements – towns, hamlets, villages – in Aberdeenshire and ensure that these are well represented with high quality items on Wikidata and Wikipedia.

Aims

Identify one or more lists of settlements in Aberdeenshire
Use those lists to identify gaps in Wikipedia and WIkidata for Aberdeenshire settlements.
Create Wikidata items, update Wikipedia with a more comprehensive list of settlements and, time permitting, enhance existing Wikipedia articles with Infoboxes, and create new Wikipedia articles where these are missing.

Approach

We began by importing a list from Wikipedia  into Google Sheets using its function

=importHTML(url, item, position)

This gave a list of 183 settlements – with five having missing Wikipedia articles.

To compare, we then wrote an initial Wikidata query  which only returned 10 results. It turned out that there are two (or more) Aberdeenshires in Wikidata (each representing something subtly different) and we used the wrong one.

Amending our query and running the new one   gave us 283 settlements. On checking we saw that  they included the 10 above too. It also included whether the item had a Wikipedia article associated with it. We used this Wikidata list (with a quick python script) to update the original Wikipedia list page above.

Adding Images / WikiShootMe

We further updated the query adding whether there was a photo associated with the item giving firstly these results and, by changing the default view to map, we could see where the coordinates were placing each point. The vast majority (est. 90% ) of items had no photograph.

By following this tutorial that Ian had created recently,  we were able to create a custom clickable map in the WikiShootMe tool. This means that anyone can click on a red dot, and choose to take or upload a photo of the settlement and have that added to Wiki Commons, and associated automatically with the Wikidata item.

We published that on Twitter and asked for contributions. Not only could someone take and upload a photo, but it also meant that one could search Wiki Commons for a matching image (which hadn’t yet been associated with the Wikidata item) and tell it to use that. Where none existed it was possible to search on Geograph for a locality. The licensing on Geograph is compatible with Wiki Commons’s terms, so if a suitable image was available, we could use the Geograph2Commons tool and import it.

Over the next few days (i.e. beyond the weekend itself), we went from a starting point of about 10% of settlements in Wikidata having photos to about 90%. You can see this on an image grid, or table.

Red dots show missing photos; green, ones found
Red dots show missing photos; green, ones found

Updating Coordinates

Looking closer at the mapped Wikidata, a number of the items’ coordinates were well out (e.g. Rosehearty, Sandhaven New Aberdour etc). We started to fix these. We did this by finding the settlements in our WikiShootme map, right clicking on the correct position and selecting show coordinates, and pasting those back into the Wikidata item.

Where the original coordinates were imported from Wikipedia it raised a warning. We fixed each one in Wikipedia too, as we went. This needs much more error checking and fixing.

Fixing coordinates and uploading images
Fixing coordinates and uploading images

Missing Places

Our list of places started at 183 links on wikipedia, it grew to 283 with wikidata but still it was clear that many of the populous settlements are missing from Wikidata such as Fintry.

Fintray missing
Fintray missing

These can be added manually but we figured there must be a larger list available from another source like OpenStreetMap (OSM). Not knowing how to get this list we put out a tweet for help.

A tweet for help
A tweet for help

@MaxErickson was one those that came to our aid with a query search for overpass turbo (a web-based data filtering tool for OpenStreetMap) which listed all its identified places in Aberdeenshire with coordinates and place types (town, village, hamlet). This gave us over 780 results but many of these were farm steadings or small islands (islets) in the Ythan, with a bit of filter we got it down to 629 places. We plan to add these to Wikidata, but first it’s worth gathering more data on them.

MySociety

We wanted to add more information to these place such as which constituency each was in for Scottish and UK elections. The Boundary Commission for Scotland website has a tool which lets you enter a postcode and returns this information:

Querying the Boundary Commission for Scotland website
Querying the Boundary Commission for Scotland website

After digging around their website we found that they use mapit.mysociety api to do this. Mapit is open-source software but there is a charge for using their api, luckily CodeTheCity is a charity and eligible for free usage so Ian signed us up!  The API accepts a variety of inputs including lat/lon which we got from the turbo query of OSM.

With a bit more python scripting we now have a CSV with 629 places each listed with coordinates, Scottish Parliament region, Scottish Parliament constituency, UK parliament constituency, Health Board and Unitary Authority.

A spreadsheet of enhanced data for Aberdeenshire settlements
A spreadsheet of enhanced data for Aberdeenshire settlements

What Next?

We are going to get the csv uploaded to Wikidata via Quick Statements, to add the missing places, update existing places with Mysociety data and correct any wandering coordinates in wikidata/wikipedia.

  1. Check the Wikidata list with the OSM list for any missing places in the OSM list (ensuring that core data for each place is included).
  2. Add more information to our CSV to allow us to populate Wikipedia infoboxes for these places. This would include
    • Altitude
    • Distance from London (UK Capital)
    • Distance from Edinburgh (Scotland Capital)
    • Postcode district(s)
    • Dial Code(s)
    • Population (may be difficult for smaller settlements)
    • Area (may be difficult for smaller settlements)
  3. Update Wikidata with new places and any edits required to existing places
  4. Update Wikipedia List page as a table from this data.

Gavin Barnett and Ian Watt

06 August 2020

Urban Henges

Yesterday, thanks to Giuseppe Sollazzo’s fantastic newsletter, I discovered a great project on Github: Urban Henges. This is the work of Victoria Crawford. The purpose of the project is to take a map of any town or city and work out which streets align with sunrise each day of the year. It then creates images for each day and compiles them into an animated GIF.

I cloned her repo and after a little tinkering I was able to run it for myself. At present it is a single Jupyter Notebook containing some Python scripts.

If you are looking to run it for yourself I recommend creating a new Anaconda environment, running Python 3.7, and then installing the OSMNX library using

> conda install -c conda-forge osmnx

I chose to make an animation for Aberdeen. I spotted too late that it truncates the city title after 7 characters, something I later changed.

The process took one hour and 20 minutes to complete, even on a fast MacBook Pro with 32Gb RAM as there is a lot of computation.

Here is the Aberdeen animation.

Aberdeen Urban Henge Animation
Aberdeen Urban Henge Animation

Fun, don’t you think!?

Kudos to Victoria for sharing her code on Github, and to Guiseppe for highlighting this, and so many more projects in his regular newsletter. Hopefully Victoria will add an open licence to the Github repo to make it clear that we can repurpose the code.

And don’t forget this is only possible because the main data for the streets network is Open Data from Open Street Map which is entirely contributed and published by a large community of users. Why don’t you help maintain the maps for your area?

Ian

Header image  by Simon Hattinga Verschure on Unsplash

Aberdeen Built Ships

This project was one of several initiated at the fully-online Code the City 19 History and Data event.

It’s purpose is to gather data on Aberdeen-built ships, with the permission of the site’s owners, and to push that refined bulk data, with added structure, onto Wikidata as open data, with links back to the Aberdeen Ships site through using a new identifier.

By adding the data for the Aberdeen Built Ships to Wikidata we will be able to do several things including

  • Create a timeline of ship building
  • Create maps, charts and graphs of the data (e.g. showing the change in sizes and types of ships over time
  • Show the relative activity of the many shipbuilders and how that changed
  • Link ship data to external data sources
  • Improve the data quality
  • Increase engagement with the ships database.

The description below is largely borrowed from the ReadMe file of the project’s Github Repo.

Progress to date

So far the following has been accomplished, mainly during the course of the weekend.

Next Steps?

To complete the project the following needs to be done

  • Ensure that the request for an identifier for ABS is created for use by us in adding ships to Wikidata. A request to create an identifier for Aberdeen Ships is currently pending.
  • Create Wikidata entities for all shipbuilders and note the QID for each. We’ve already loaded nine of these into WikiData.
  • Decide on how to deal with the list of ships that MAY be already in Wikidata. This may have to be a manual process. Think about how we reconcile this – name / year / tonnage may all be useful.
  • Decide on best route to bulk upload – eg Quickstatements. This may be useful: Wikidata Import Guide
  • Agree a core set of data for each ship that will parsed from ships.json to be added to Wikidata – e.g. name, year, builder, tonnage, length etc
  • Create a script to output text that can be dropped into a CSV or other file to be used by QuickStatements (assuming that to be the right tool) for bulk input ensuring links for shipbuilder IDs and ABS identifiers are used.

We will also be looking to get pictures of the ships published onto Wiki Commons with permissive licences, link these to the Wiki Data and increase and improve the number of Wikipedia articles on Aberdeen Ships in the longer-term.

Header Image of a Scale Model of Thermopylae at Aberdeen Maritime Museum By Stephencdickson – Own work, CC BY-SA 4.0

Aberdeen Provosts

In the run up to Code The City 19 we had several suggestions of potential projects that we could work on over the weekend. One was that we add all of the Provosts of Aberdeen to Wikidata. This appealed to me so I volunteered to work on it in a team with Wikimedia UK’s Scotland Programme Coordinator, Dr Sara Thomas, with whom I have worked on other projects.

In preparation for CTC19 I’d been reading up on the history of the City’s provosts and discovered that up to 1863 the official title was Provost, and from that point it was Lord Provost. I’d made changes to the Wikipedia page to reflect that, and I’d added an extra item to Wikidata so that we could create statements that properly reflected which position the people held.

Sara and I began by agreeing an approach and sharing resources. We made full use of Google Docs and Google Sheets.

We had two main sources of information on Provosts:

Running the project

I started by setting up a Google Sheet to pull data from Wikipedia as a first attempt to import a list to work with. The importHTML function in Google Sheets is a useful way to retrieve data in list or table format.

I entered the formula in the top left cell (A1):

=importhtml("https://en.wikipedia.org/wiki/List_of_provosts_of_Aberdeen", "list", 27)

and repeating the formula for all the lists – one per century. This populated our sheet with the numerous lists of provosts.

That state didn’t last very long. The query is dynamic. The structure of the Wikipedia page was being adapted, it appeared, with extra lists – so groups of former provosts kept disappearing from our sheet.

I decided to create a list manually – copying the HTML of the Wikipedia page and running some regex find and replace commands in a text editor to leave only the text we needed, which I then pasted into sheets.

Partial list of Provosts
Partial list of Lord Provosts

Once we had that in the Google Sheet we got to work with some formulae to clean and arrange the data. Our entries were in the form “(1410–1411) Robert Davidson” so we had to

    • split names from dates,
    • split the start dates from end dates, and
    • split names into family names and given names.

Having got that working (albeit with a few odd results to manually fix) Sara identified a Chrome plugin called “Wikipedia and WikiData tools” which proved really useful. For example we could query the term in a cell e.g. “Hadden” and get back the QID of the first instance of that. And we could point another query at the QID and ask what it was an instance of. If it was Family Name, or Given Name we could use those codes and only manually look up the others. That saved quite a bit of time.

Identifying QIDs for Given and Family Names
Identifying QIDs for Given and Family Names

Our aim in all of this was to prepare a bulk upload to Wikidata with as little manual entry as possible. To do that Sara had identified Quickstatements, which is a bulk upload tool for Wikidata, which allows you to make large numbers of edits through a relatively simple interface.

Sara created a model for what each item in Quickstatements should contain:

A model of a Quickstatements entry
A model of a Quickstatements entry

There are a few quirks – for example, how you format a date – but once you’ve got the basics down it’s an incredibly powerful tool. The help page is really very useful.

Where dates were concerned, I created a formula to look up the date in another cell then surround it with the formatting needed:

="+"&Sheet1!J99&"-00-00T00:00:00Z/9"

Which gave +1515-00-00T00:00:00Z/9 as the output.

You can also bulk-create items, which is what we did here. We found that it worked best in Firefox, after a few stumbles.

Data harvesting

As mentioned above, we used a printed source, from which we harvested the data about the individual Provosts.  It’s easy to get very detailed very quickly, but we decided on a basic upload for:

  • Name
  • First name
  • Last name
  • Position held (qualified by the dates)
  • Date of birth, and death (where available).

Some of our provosts held the position three or four times, often with breaks between. We attempted to work out a way to add the same role held twice with different date qualifiers, but ultimately this had to be done manually

The first upload

We made a few test batches – five or six entries to see how the process worked.

A test batch to upload via Quickstatements
A test batch to upload via Quickstatements

When that worked we created larger batches. We concluded the weekend with all of the Provosts and Lord Provosts being added to Wikidata which was very satisfying. We also had a list of further tasks to carry out to enhance the data. These included:

  • Add multiple terms of office – now complete,
  • Add statements for Replaces (P1365) and Replaced By (P1366) – partly done,
  • Add honorific titles, partly done
  • Add images of signatures (partly done) and portraits ( completed) from the reference book,
  • Add biographical details from the book – hardly started,
  • Source images for WIkiCommons from the collection portraits at AAGM – request sent,
  • Add places of burial, identifiers from Find A Grave, photographs of gravestones,
  • Add streets named after provosts and link them.

You can see the results in this WikiData query: https://w.wiki/PsF

A Wikidata Query showing Provosts' Terms of Office, and their replacements
A Wikidata Query showing Provosts’ Terms of Office, and their replacements

This was a very interesting project to work on – and there is still more to do to improve the data, which you can help with.