Aberdeen Built Ships – an update at CTC20

This project was started at CTC19 on 11th-12th April. The aim was to import from Aberdeen Built Ships (with the permission of the Galleries and Museums Service, who operate it) a complete set of data on those 3000+ ships into Wikidata, in as clean and well-formatted a state as possible.

We got part of the way there at CTC19, and in work done in the following weeks, but the data had still not been imported.

CTC20 progress

In the weeks since CTC19, we had identified issues with two significant aspects of the data in the core ABS system: a lack of standardisation of ship types (meaning that there were up to nine variants of a single type) and a similar issue with ship builders.
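To give a feel for how such variants can be surfaced, here is a minimal Python sketch that groups labels differing only in case, spacing or punctuation. The sample labels are invented for illustration and are not taken from the ABS data; real variants (abbreviations, misspellings) would need more than this simple normalisation.

```python
import re
from collections import defaultdict


def normalise(label):
    """Reduce a free-text label to a comparison key (lowercase, alphanumerics only)."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def group_variants(labels):
    """Group labels whose normalised keys are identical."""
    groups = defaultdict(set)
    for label in labels:
        groups[normalise(label)].add(label)
    # Keep only keys that appear with more than one spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample values, not taken from the ABS data.
sample = ["Steam Trawler", "steam trawler", "STEAM-TRAWLER", "Barque", "Schooner "]
variant_groups = group_variants(sample)
```

Each group that comes back is a candidate for merging into a single Wikidata item, subject to a human check.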

For the purposes of CTC20 we agreed to set these aside and press ahead with an import of core data for each ship we could – and to revisit the specific details above later.

What was done

Core data was imported into Wikidata for most of the ships. We excluded some ships from the import if the name field was blank or UNKNOWN or UNNAMED. Other, existing, ships had an ABS ID added to their item. This has resulted in 3085 ships in Wikidata with an ABS ID at the time of writing.

Screenshot of Samuel Plimsoll

Method

We initially tried to use the CSV format for Wikidata QuickStatements, but couldn’t get this to work, so we switched to the TSV version. A Python script was written to generate the QuickStatements file, which could then be copied into the QuickStatements batch import tool. The import produced 2 errors, for ships that had a range of years in the Date field and so generated invalid dates in the QuickStatements. These (and 2 duplicates that I noticed after the import) are noted for later correction.
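As an illustration only (this is not the script we ran), a generator for tab-separated QuickStatements commands might look like the sketch below. The field names name, abs_id and year are hypothetical, as is the choice of P571 (inception) for the year; P8260 is the ABS ID property and Q11446 is the Wikidata item for a ship. It skips the blank, UNKNOWN and UNNAMED names mentioned above, and leaves out the date when the value is not a single four-digit year, which is the case that tripped up our import.

```python
import re

SKIP_NAMES = {"", "UNKNOWN", "UNNAMED"}


def year_to_qs_date(raw):
    """Return a year-precision QuickStatements date, or None if raw is not a single year."""
    raw = (raw or "").strip()
    if not re.fullmatch(r"\d{4}", raw):
        return None  # e.g. a range such as "1850-1852" would produce an invalid date
    return f"+{raw}-00-00T00:00:00Z/9"


def ship_to_statements(ship):
    """Build the tab-separated command lines for one ship, or [] if it is skipped."""
    name = (ship.get("name") or "").strip()
    if name.upper() in SKIP_NAMES:
        return []
    lines = [
        "CREATE",
        f'LAST\tLen\t"{name}"',              # English label
        "LAST\tP31\tQ11446",                 # instance of: ship
        f'LAST\tP8260\t"{ship["abs_id"]}"',  # ABS ID
    ]
    date = year_to_qs_date(ship.get("year"))
    if date:
        lines.append(f"LAST\tP571\t{date}")  # inception: an assumed property choice
    return lines


# Hypothetical record, for illustration only.
example = ship_to_statements({"name": "Example Ship", "abs_id": "12345", "year": "1869"})
```

Joining the lines for every ship with newlines gives text that can be pasted into the batch tool. Names containing double quotes would need extra handling that this sketch does not attempt.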

The ABS ID property (P8260) was manually added to the ships that already existed in wikidata.

The mappings between QID and ABS ID were found with this SPARQL query:

SELECT ?qid ?absid
WHERE
{
  ?qid wdt:P8260 ?absid.
}
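If you would rather fetch that mapping from a script than from the query service web page, something like the following Python sketch should work. It uses only the standard library; the User-Agent string is a placeholder of my own, and the network call is kept in its own function so the parsing can be tried on sample data first.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = "SELECT ?qid ?absid WHERE { ?qid wdt:P8260 ?absid. }"


def bindings_to_mapping(results):
    """Turn SPARQL JSON results into a {ABS ID: QID} dictionary."""
    mapping = {}
    for row in results["results"]["bindings"]:
        # The item comes back as a full entity URI; keep only the trailing QID.
        qid = row["qid"]["value"].rsplit("/", 1)[-1]
        mapping[row["absid"]["value"]] = qid
    return mapping


def fetch_mapping():
    """Run the query against the public endpoint (needs network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "abs-mapping-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return bindings_to_mapping(json.load(response))
```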

Next Steps?

To complete the project the following needs to be done

  • Add Country of Origin (P495) to all existing Aberdeen-built ships in Wikidata. This will suppress the warning messages when viewing each ship.
  • Rationalise all ship builders that exist in ship_builders.csv – deduplicating these and creating Wikidata entries for each one we will use.
  • Rationalise all ship types that exist in ship_types.csv – deduplicating these and creating Wikidata entries for each one we will use.
  • Update each ship with specific type and ship builder.
  • Extract / rationalise data from some of the fields, e.g. we have one dimensions field rather than separate fields for length/beam/draft/… and what’s there is inconsistent
  • Isolate ships that have no Wikidata identifier – i.e. any one not in the list of 59 positive matches. Set aside those which have entries for later processing.
  • Source and add pictures of the ships in ABS (see below)
  • Develop a means of monitoring both the original ABS system (rescrape periodically and do a diff on the file in some way?) and Wikidata for changes to the ships’ records (a Wikidata query, executed periodically, generating a CSV download that is checked for differences from previous runs?) to feed back to ABS.
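The monitoring idea in the last point could start as simply as comparing two CSV snapshots keyed on the ABS ID. This is a rough Python sketch of that comparison, not something we have built; the column names and rows are made up for the example.

```python
import csv
import io


def load_snapshot(handle, key="abs_id"):
    """Read a CSV snapshot into a {key value: row dict} dictionary."""
    return {row[key]: row for row in csv.DictReader(handle)}


def diff_snapshots(old, new):
    """Return the keys added, removed and changed between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Hypothetical snapshots standing in for two runs of a scrape or query download.
old_csv = "abs_id,name,year\n1,Example A,1869\n2,Example B,1870\n"
new_csv = "abs_id,name,year\n1,Example A,1869\n2,Example B,1871\n3,Example C,1880\n"
old = load_snapshot(io.StringIO(old_csv))
new = load_snapshot(io.StringIO(new_csv))
added, removed, changed = diff_snapshots(old, new)
```

The same function would serve for both directions: ABS rescrapes and periodic Wikidata query downloads, provided each is saved as a CSV with a stable ID column.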

Images of ships

Ships with images

Although there are now 3,085 Aberdeen-built ships in Wikidata, only 12 of these (about 0.39%) have a picture associated with them. There is a significant opportunity to work with Aberdeen Museums to add images from their extensive collection to Wiki Commons and associate these with the ships now in Wikidata.

Header image Twice & Rinina25 / CC BY-SA https://upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Genova-Tall_Ship-IMG_1509.JPG/512px-Genova-Tall_Ship-IMG_1509.JPG

Aberdeenshire Settlements on Wikidata and Wikipedia

Introduction

This project was part of Code The City’s #CTC20 History and Culture hack weekend.

The Challenge

To identify (all of) the settlements – towns, hamlets, villages – in Aberdeenshire and ensure that these are well represented with high quality items on Wikidata and Wikipedia.

Aims

Identify one or more lists of settlements in Aberdeenshire
Use those lists to identify gaps in Wikipedia and Wikidata for Aberdeenshire settlements.
Create Wikidata items, update Wikipedia with a more comprehensive list of settlements and, time permitting, enhance existing Wikipedia articles with Infoboxes, and create new Wikipedia articles where these are missing.

Approach

We began by importing a list from Wikipedia into Google Sheets using its function

=importHTML(url, item, position)

This gave a list of 183 settlements – with five having missing Wikipedia articles.

To compare, we then wrote an initial Wikidata query, which returned only 10 results. It turned out that there are two (or more) Aberdeenshires in Wikidata (each representing something subtly different) and we had used the wrong one.

Amending our query and running the new one gave us 283 settlements. On checking, we saw that they included the 10 above too. The results also showed whether each item had a Wikipedia article associated with it. We used this Wikidata list (with a quick Python script) to update the original Wikipedia list page above.
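The quick script is not reproduced here, but the idea can be sketched in a few lines of Python: take each settlement label with its article title (or None where Wikidata lists no article) and emit wikitext bullets. The input shape and the sample rows are assumptions for illustration, including which places do or do not have articles.

```python
def to_wikitext(settlements):
    """Render (label, article_title_or_None) pairs as a wikitext bullet list."""
    lines = []
    for label, article in sorted(settlements, key=lambda pair: pair[0]):
        if article is None or article == label:
            # With no known article this renders as a red link if the page is missing.
            lines.append(f"* [[{label}]]")
        else:
            # Piped link: article title differs from the displayed label.
            lines.append(f"* [[{article}|{label}]]")
    return "\n".join(lines)


# Hypothetical rows, for illustration only.
rows = [
    ("Rosehearty", "Rosehearty"),
    ("Sandhaven", None),
    ("Exampleton", "Exampleton, Aberdeenshire"),
]
wikitext = to_wikitext(rows)
```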

Adding Images / WikiShootMe

We further updated the query to show whether there was a photo associated with each item. This gave us firstly these results and, by changing the default view to map, we could see where the coordinates were placing each point. The vast majority (est. 90%) of items had no photograph.

By following this tutorial that Ian had created recently, we were able to create a custom clickable map in the WikiShootMe tool. This means that anyone can click on a red dot, and choose to take or upload a photo of the settlement and have that added to Wiki Commons, and associated automatically with the Wikidata item.

We published that on Twitter and asked for contributions. Not only could someone take and upload a photo, but it also meant that one could search Wiki Commons for a matching image (which hadn’t yet been associated with the Wikidata item) and tell it to use that. Where none existed it was possible to search on Geograph for a locality. The licensing on Geograph is compatible with Wiki Commons’s terms, so if a suitable image was available, we could use the Geograph2Commons tool and import it.

Over the next few days (i.e. beyond the weekend itself), we went from a starting point of about 10% of settlements in Wikidata having photos to about 90%. You can see this on an image grid, or table.

Red dots show missing photos; green, ones found

Updating Coordinates

Looking closer at the mapped Wikidata items, a number of the items’ coordinates were well out (e.g. Rosehearty, Sandhaven, New Aberdour etc.). We started to fix these. We did this by finding the settlements in our WikiShootMe map, right-clicking on the correct position and selecting show coordinates, and pasting those back into the Wikidata item.

Where the original coordinates were imported from Wikipedia it raised a warning. We fixed each one in Wikipedia too, as we went. This needs much more error checking and fixing.
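One way to approach that error checking, sketched here in Python as a suggestion rather than something we have run, is to compare each item’s Wikidata coordinates against a second source and flag any that are more than a couple of kilometres apart. The 2 km threshold, the use of a second source such as OSM, and the sample coordinates are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_suspect(items, reference, threshold_km=2.0):
    """Return names whose coordinates lie more than threshold_km from the reference."""
    return sorted(
        name
        for name, (lat, lon) in items.items()
        if name in reference and haversine_km(lat, lon, *reference[name]) > threshold_km
    )


# Hypothetical coordinates, for illustration only.
wikidata_coords = {"Place A": (57.50, -2.00), "Place B": (57.30, -2.50)}
reference_coords = {"Place A": (57.50, -2.00), "Place B": (57.40, -2.50)}
suspects = flag_suspect(wikidata_coords, reference_coords)
```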

Fixing coordinates and uploading images

Missing Places

Our list of places started at 183 links on Wikipedia and grew to 283 with Wikidata, but it was still clear that many of the populous settlements, such as Fintry, are missing from Wikidata.

Fintray missing

These can be added manually but we figured there must be a larger list available from another source like OpenStreetMap (OSM). Not knowing how to get this list we put out a tweet for help.

A tweet for help

@MaxErickson was one of those who came to our aid, with a query for overpass turbo (a web-based data filtering tool for OpenStreetMap) which listed all its identified places in Aberdeenshire with coordinates and place types (town, village, hamlet). This gave us over 780 results, but many of these were farm steadings or small islands (islets) in the Ythan. With a bit of filtering we got it down to 629 places. We plan to add these to Wikidata, but first it’s worth gathering more data on them.
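For anyone repeating this, a filter along the following lines would do a similar job on an overpass turbo JSON export. It is a Python sketch under the assumption that filtering means keeping only named nodes tagged as town, village or hamlet; our own filtering may have differed in detail, and the sample elements are invented.

```python
KEEP_TYPES = {"town", "village", "hamlet"}


def filter_places(overpass_json, keep_types=KEEP_TYPES):
    """Keep named elements with coordinates whose place tag is one of keep_types."""
    places = []
    for element in overpass_json.get("elements", []):
        tags = element.get("tags", {})
        if tags.get("place") in keep_types and "name" in tags and "lat" in element:
            places.append(
                {
                    "name": tags["name"],
                    "type": tags["place"],
                    "lat": element["lat"],
                    "lon": element["lon"],
                }
            )
    return places


# Hypothetical export fragment, for illustration only.
export = {
    "elements": [
        {"type": "node", "id": 1, "lat": 57.4, "lon": -2.1,
         "tags": {"name": "Example Village", "place": "village"}},
        {"type": "node", "id": 2, "lat": 57.3, "lon": -2.0,
         "tags": {"name": "Example Farm", "place": "farm"}},
        {"type": "node", "id": 3, "lat": 57.3, "lon": -2.0,
         "tags": {"name": "Example Islet", "place": "islet"}},
    ]
}
kept = filter_places(export)
```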

MySociety

We wanted to add more information to these places, such as which constituency each was in for Scottish and UK elections. The Boundary Commission for Scotland website has a tool which lets you enter a postcode and returns this information:

Querying the Boundary Commission for Scotland website

After digging around their website we found that they use the mapit.mysociety API to do this. MapIt is open-source software but there is a charge for using their API. Luckily, Code The City is a charity and eligible for free usage, so Ian signed us up! The API accepts a variety of inputs including lat/lon, which we got from the overpass turbo query of OSM.

With a bit more python scripting we now have a CSV with 629 places each listed with coordinates, Scottish Parliament region, Scottish Parliament constituency, UK parliament constituency, Health Board and Unitary Authority.
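The lookup itself can be sketched roughly as below in Python. This is not our actual script. The point URL pattern and the area type codes (SPE, SPC, WMC, UTA) reflect my understanding of MapIt’s UK data and should be checked against the MapIt documentation, as should passing the key as an api_key parameter. I have left Health Board out because I am not sure of its type code.

```python
import json
import urllib.request

MAPIT_BASE = "https://mapit.mysociety.org"

# Area type codes as I understand them; treat these as assumptions to verify.
TYPE_LABELS = {
    "SPE": "Scottish Parliament region",
    "SPC": "Scottish Parliament constituency",
    "WMC": "UK Parliament constituency",
    "UTA": "Unitary Authority",
}


def point_url(lat, lon):
    """Build the point lookup URL; MapIt takes x,y order, so longitude comes first."""
    return f"{MAPIT_BASE}/point/4326/{lon},{lat}"


def summarise_areas(areas):
    """Pick the named areas of interest out of a point lookup response."""
    summary = {}
    for area in areas.values():
        label = TYPE_LABELS.get(area.get("type"))
        if label:
            summary[label] = area.get("name")
    return summary


def lookup(lat, lon, api_key=None):
    """Fetch and summarise the areas covering one coordinate (needs network access)."""
    url = point_url(lat, lon)
    if api_key:
        url += f"?api_key={api_key}"  # assumed way of supplying the key
    with urllib.request.urlopen(url) as response:
        return summarise_areas(json.load(response))
```

Running lookup once per row of the OSM list and writing the summaries out with the csv module gives a spreadsheet of the kind shown below.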

A spreadsheet of enhanced data for Aberdeenshire settlements

What Next?

We are going to get the CSV uploaded to Wikidata via QuickStatements, to add the missing places, update existing places with the mySociety data and correct any wandering coordinates in Wikidata and Wikipedia.

  1. Check the Wikidata list with the OSM list for any missing places in the OSM list (ensuring that core data for each place is included).
  2. Add more information to our CSV to allow us to populate Wikipedia infoboxes for these places. This would include
    • Altitude
    • Distance from London (UK Capital)
    • Distance from Edinburgh (Scotland Capital)
    • Postcode district(s)
    • Dial Code(s)
    • Population (may be difficult for smaller settlements)
    • Area (may be difficult for smaller settlements)
  3. Update Wikidata with new places and any edits required to existing places
  4. Update Wikipedia List page as a table from this data.

Gavin Barnett and Ian Watt

06 August 2020

How to make a custom WikiShootMe page for missing images

One of the many WikiLabs tools that I use a lot is Wikishootme.

Wikishootme screenshot by https://tools.wmflabs.org/wikishootme/, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=73548153

This application is designed to be used on a mobile phone. It allows you to call up a map of where you are at the moment and find listed buildings with missing images (shown as red dots). You can then authorise the app, using your Wikipedia / Wikidata credentials, and click on a red dot to upload a photo that you either take there and then or choose from your phone’s media. The image goes straight to Wiki Commons with a CC-BY-SA licence. And, once uploaded, the photos are automatically linked to the Wikidata entry for that item! Should that be automagically?

I had a bunch of projects where I thought it would be useful to generate a custom map with missing images (for example of plaques, or boundary stones), then encourage people to photograph them and add them. Thankfully, Wikishootme allows you to do that.

It turns out it’s not too hard to do. Here is a walk through.

1. Create your wikidata query

I’m going to use the March Stones of Aberdeen as an example. I suggest that you copy exactly what I do, creating this query in full through all three steps. Then when you understand how it works, substitute your own query.

In Wikidata’s Query Service, create the query to retrieve the data you want. Wikishootme is quite particular about column names in the final output, so we need to make sure that our query has columns called ‘q‘ (for the wikidata identifiers) and ‘location‘ for the coordinate locations.

SELECT ?q ?location WHERE{
?q wdt:P31 wd:Q921099; wdt:P131 wd:Q62274582 .
?q wdt:P625 ?location .
}

(For the purposes of this tutorial it is not necessary to understand the syntax of a SPARQL query. If you are curious, in the above query P31 means an instance of; Q921099 is the identifier for a boundary marker; P131 means located in the administrative entity; and Q62274582 is Aberdeen City)

Try it here

Test that your query runs ok and returns what you expect. The query above will generate a table with two columns – one labelled q with a list of Wikidata QID codes, and another, location with coordinate pairs for each item.

2. Grab the SPARQL

Next, copy all of the code between the {} pair (i.e. all of the second and third lines of the query above), but without the curly braces themselves.

Then head to https://urldecode.org, paste your query text into it, and click on encode.

This will create a stream of characters that can be passed as part of a URL to another service. Copy all of that text. When I encode the query above I get the following string:

%3Fq%20wdt%3AP31%20wd%3AQ921099%3B%20wdt%3AP131%20wd%3AQ62274582%20.%20%3Fq%20wdt%3AP625%20%3Flocation%20.

3. Generate the URL

We now need to append (or add) the encoded text to the end of the following URL.

https://wikishootme.toolforge.org/#lat=0&lng=0&zoom=1&layers=wikidata_no_image&worldwide=1&sparql_filter=

This is best done in a text editor.

So, when I paste the encoded string to the end of that, I get this:

https://wikishootme.toolforge.org/#lat=0&lng=0&zoom=1&layers=wikidata_no_image&worldwide=1&sparql_filter=%3Fq%20wdt%3AP31%20wd%3AQ921099%3B%20wdt%3AP131%20wd%3AQ62274582%20.%20%3Fq%20wdt%3AP625%20%3Flocation%20.
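If you prefer to script steps 2 and 3 rather than use a website and a text editor, Python’s standard library can do the encoding and the joining. This is a small sketch; the function name is my own, and the lat, lng and zoom arguments simply fill in the matching parts of the URL shown above.

```python
from urllib.parse import quote

BASE = "https://wikishootme.toolforge.org/"


def wikishootme_url(sparql_filter, lat=0, lng=0, zoom=1):
    """Percent-encode the SPARQL fragment and append it to the Wikishootme URL."""
    encoded = quote(sparql_filter, safe="")  # safe="" so that ':' ';' and '?' are encoded too
    return (
        f"{BASE}#lat={lat}&lng={lng}&zoom={zoom}"
        f"&layers=wikidata_no_image&worldwide=1&sparql_filter={encoded}"
    )


# The body of the March Stones query, joined onto one line.
FILTER = "?q wdt:P31 wd:Q921099; wdt:P131 wd:Q62274582 . ?q wdt:P625 ?location ."
url = wikishootme_url(FILTER)
```

With the default arguments this produces the same URL as the one above. Passing your own lat, lng and zoom values gives a link that opens centred on your area.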

4. Try it out

Click on the link above. Did it work? It does for me. When I open it, it defaults to a whole world map.

Default view of Wikishootme

Scroll and zoom to where your red dots are.

Wikishootme, scrolled and zoomed

Tip: when you get the map centred and at the scale you like, recopy the URL. This will capture the location and zoom level in your map for sharing.

Also, click on the layers symbol at the top right of the map. Choose to display where the data has images (green) as well as the red:

Wikishootme Layers control

That will change your view to showing red (missing) and green (captured) images for your wikidata items.

Wikishootme showing red and green dots

Now you can share your map. I suggest copying your URL (see the Tip above) into a link shortener such as bit.ly so as to make sharing easier.

Now, when someone clicks on your URL they can click on a red dot, and upload a missing photo to Wiki Commons, and automatically link it to Wikidata – and turn those red dots green!

Header Photo by Ravi Roshan on Unsplash

Urban Henges

Yesterday, thanks to Giuseppe Sollazzo’s fantastic newsletter, I discovered a great project on Github: Urban Henges. This is the work of Victoria Crawford. The purpose of the project is to take a map of any town or city and work out which streets align with sunrise each day of the year. It then creates images for each day and compiles them into an animated GIF.
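To give a flavour of the geometry involved, here is a simplified Python sketch of my own, not Victoria’s code. It estimates the sunrise azimuth from latitude and solar declination using the textbook relation cos(A) = sin(declination) / cos(latitude), which ignores refraction and terrain, and then checks whether a street’s compass bearing lies within a tolerance of it. The function names and the one-degree tolerance are assumptions.

```python
import math


def sunrise_azimuth(latitude_deg, declination_deg):
    """Approximate sunrise azimuth in degrees clockwise from north (no refraction)."""
    cos_az = math.sin(math.radians(declination_deg)) / math.cos(math.radians(latitude_deg))
    cos_az = max(-1.0, min(1.0, cos_az))  # clamp for latitudes where the sun does not rise or set
    return math.degrees(math.acos(cos_az))


def bearing(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlambda = math.radians(lon2 - lon1)
    y = math.sin(dlambda) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlambda)
    return math.degrees(math.atan2(y, x)) % 360


def is_aligned(street_bearing, azimuth, tolerance=1.0):
    """True if the street line points at the azimuth, in either direction."""
    # A street segment has no inherent direction, so compare modulo 180 degrees.
    diff = abs(street_bearing - azimuth) % 180
    return min(diff, 180 - diff) <= tolerance
```

At an equinox the declination is zero, so the estimate comes out as due east and any street running east-west counts as aligned.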

I cloned her repo and after a little tinkering I was able to run it for myself. At present it is a single Jupyter Notebook containing some Python scripts.

If you are looking to run it for yourself I recommend creating a new Anaconda environment, running Python 3.7, and then installing the OSMNX library using

> conda install -c conda-forge osmnx

I chose to make an animation for Aberdeen. I spotted too late that it truncates the city title after 7 characters, something I later changed.

The process took one hour and 20 minutes to complete, even on a fast MacBook Pro with 32GB of RAM, as there is a lot of computation.

Here is the Aberdeen animation.

Aberdeen Urban Henge Animation

Fun, don’t you think!?

Kudos to Victoria for sharing her code on Github, and to Giuseppe for highlighting this, and so many more projects, in his regular newsletter. Hopefully Victoria will add an open licence to the Github repo to make it clear that we can repurpose the code.

And don’t forget this is only possible because the main data for the street network is Open Data from OpenStreetMap, which is entirely contributed and published by a large community of users. Why don’t you help maintain the maps for your area?

Ian

Header image by Simon Hattinga Verschure on Unsplash

Aberdeen Built Ships

This project was one of several initiated at the fully-online Code the City 19 History and Data event.

Its purpose is to gather data on Aberdeen-built ships, with the permission of the site’s owners, and to push that refined bulk data, with added structure, onto Wikidata as open data, with links back to the Aberdeen Ships site through a new identifier.

By adding the data for the Aberdeen Built Ships to Wikidata we will be able to do several things including

  • Create a timeline of ship building
  • Create maps, charts and graphs of the data (e.g. showing the change in sizes and types of ships over time)
  • Show the relative activity of the many shipbuilders and how that changed
  • Link ship data to external data sources
  • Improve the data quality
  • Increase engagement with the ships database.

The description below is largely borrowed from the ReadMe file of the project’s Github Repo.

Progress to date

So far the following has been accomplished, mainly during the course of the weekend.

Next Steps?

To complete the project the following needs to be done

  • Ensure that the request for an identifier for ABS is created for use by us in adding ships to Wikidata. A request to create an identifier for Aberdeen Ships is currently pending.
  • Create Wikidata entities for all shipbuilders and note the QID for each. We’ve already loaded nine of these into Wikidata.
  • Decide on how to deal with the list of ships that MAY be already in Wikidata. This may have to be a manual process. Think about how we reconcile this – name / year / tonnage may all be useful.
  • Decide on best route to bulk upload – eg Quickstatements. This may be useful: Wikidata Import Guide
  • Agree a core set of data for each ship that will be parsed from ships.json to be added to Wikidata – e.g. name, year, builder, tonnage, length etc.
  • Create a script to output text that can be dropped into a CSV or other file to be used by QuickStatements (assuming that to be the right tool) for bulk input ensuring links for shipbuilder IDs and ABS identifiers are used.
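On the reconciliation question above, one possible starting point is to shortlist Wikidata items whose name matches after normalisation and whose year is within a year of the ABS record, leaving the final decision to a person. This Python sketch is only a suggestion; the field names, the tolerance and the sample records are hypothetical, and tonnage could be added as a further check.

```python
import re


def name_key(name):
    """Normalise a ship name for comparison (lowercase, letters and digits only)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_candidates(abs_ship, wikidata_ships, year_tolerance=1):
    """Return QIDs of Wikidata ships with a matching name and a close build year."""
    matches = []
    for item in wikidata_ships:
        if name_key(item["name"]) != name_key(abs_ship["name"]):
            continue
        if abs(int(item["year"]) - int(abs_ship["year"])) <= year_tolerance:
            matches.append(item["qid"])
    return matches


# Hypothetical records, for illustration only.
abs_record = {"name": "EXAMPLE SHIP", "year": "1869"}
wikidata_records = [
    {"qid": "Q1", "name": "Example Ship", "year": "1868"},
    {"qid": "Q2", "name": "Example Ship", "year": "1901"},
    {"qid": "Q3", "name": "Another Ship", "year": "1869"},
]
candidates = find_candidates(abs_record, wikidata_records)
```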

We will also be looking to get pictures of the ships published onto Wiki Commons with permissive licences, link these to Wikidata, and increase the number and quality of Wikipedia articles on Aberdeen Ships in the longer term.

Header Image of a Scale Model of Thermopylae at Aberdeen Maritime Museum By Stephencdickson – Own work, CC BY-SA 4.0