Mapping Memorials to Women in Aberdeen

This project, which was part of CTC20,  grew from a WMUK / Archaeology Scotland join project carried out by Scottish Graduate School of Arts & Humanities intern Roberta Leotta during lockdown 2020. More details about the background to the project can be found here.

It’s often touted that there are some cities in Scotland (coughEdinburghcough) where there are more statues to animals than there are to women. In my own work transferring OpenPlaques data to Wikidata I’ve observed that there are more entries for Charles Rennie Macintosh than there for women in Glasgow. So in this light, it’s somewhat refreshing to work on a project that celebrates all kinds of memorials to women in Scotland.

The Women of Scotland: Mapping Memorials project began in 2010 as a joint project between Glasgow Women’s Library, and Women’s History Scotland. It’s similar in many ways to OpenPlaques, but using Wikidata could add an extra dimension – let’s increase the coverage of women’s history and culture on the Wikimedia projects by getting these memorials and the women they celebrate into Wikidata, use that to identify gaps in knowledge, and then work to fill the gap.

Over the two days, here’s what we did:

Data collection

We scraped the initial list of data from Mapping Memorials website manually, and created a shared worksheet based on a model that’s been used previously for other cities. (The manual process is slow, and a bit fiddly, and is the one thing that I wouldn’t do again. We’re in contact with the admin so going forward, I’m hopeful that we wouldn’t need to repeat this step in the future.)

Once we had this list, we could create a more automated process to deal with gathering the other pieces of information we needed to create new, good quality Wikidata items, although some (description, for example) needed a human eye.

Wikidata identifiers

We were using two main identifiers on Wikidata – P8048 (Women of Scotland memorial ID) and P8050 (Women of Scotland subject ID). The former for the entries to the memorials themselves, and the latter for the women they celebrate. Where the women didn’t have entries, we could create those, and then link them to the entries for the memorials.

Both identifiers use the last part of the URL for each entry on the Mapping Memorials site, so that was fairly easy to do in Google docs. Once we had that info, it’s an easy enough step to bulk-create items either using Quickstatements or Wikibase CLI.

Creating items & avoiding duplicates

There’s a plug in for Google Sheets called Wikipedia and Wikidata Tools which has some useful features for projects like this – WikidataQID for looking up whether something already exists on Wikidata, and WikidataFacts, which tells you what that item is. The former is ok if you have an exact match, the latter is really useful for flagging anything which might lead to a disambiguation page, for example.

Ultimately we did end up with a few duplicates that needed to be merged, but this was pretty easily managed, and it really showed how useful it is to have local knowledge involved in local projects – there were a couple of sets of coordinates that were obviously wrong, but also some errors that wouldn’t have been spotted by someone unfamiliar with the area.

Coordinates and dates

I really like Quickstatements, but there are a few areas in which it’s fiddly, including coordinates and dates. I’m really interested in looking further into Wikibase CLI for dates in particular, as the process there for dates (documented here) looks to be substantially easier in terms of data prep than it does in Quickstatements. Many thanks to Tony for that work, as his expertise saved us a lot of time! He also used that tool to create items for those women commemorated who were missing from Wikidata, documented here.

As with dates, coordinates are entered into Quickstatements in a different format than that which you’d use manually inside Wikidata itself, hence the formatting you’ll see in column Q on the Data collection tab. Most of this we had to grab from Google Maps, which again is a bit fiddly.


Once we had a master list of QIDs for the memorials we were working with, we could use Quickstatements to bulk upload sets of statements to those items.

For example, matching the memorials to the women commemorated, using this format:

Screenshot of a spreadsheet showing QID for memorials and the women they commemorate
Screenshot of a spreadsheet showing QID for memorials and the women they commemorate

The Q numbers on the left are those of the memorials, P547 is “commemorates”, and the Q numbers on the right are those of the women celebrated. We were also able to add P8050 (Women of Scotland subject ID) to some women who already had entries on Wikidata, but no WoS ID.

Screenshot of a spreadsheet showing each memorial QID and its type
Screenshot of a spreadsheet showing each memorial QID and its type

The Q number on the left again is the memorial, P31 is “instance of”, and the Q number on the right corresponds to a type of thing – a commemorative plaque, a garden, or a road, for example.

Once you’ve got the info in this format, it’s just a case of copy & pasting into QS, clicking import, and then run. (Note – you do need to be an autoconfirmed user to use QS, which means that your account must be at least 4 days old, and having more than 50 edits.) It’s relatively easy, and I was pleased that one of our relatively-new-to-Wikidata participants had the chance to make her first bulk uploads (description & commons category) using the tool over the weekend.


This project grew out of a desire to increase the coverage of Scottish heritage on Wikimedia Commons, so it was great to take some time on this. Mapping Memorials does have some images, but they’re not openly licensed, and others are missing. After Wikimedia Commons, our next port of call was Geograph, where many images have been released on Wiki-compatible Creative Commons licenses. Using Geograph2Commons, images can easily be transferred over to Wikimedia Commons, so that they can be used in any Wikimedia Project. Geograph also links to this feature from their site – click on “Find out how to reuse this image”, and then scroll down to “Wikipedia template for image page”, then click on the “geograph2commons” link. Really simple. Our group did some detective work for images, and then added them to Commons, and linked them manually to the Wikidata item.

This gave us a list of missing images… which is fine, but wouldn’t it be better to see them on a map?

Visualisation and filling the gaps

Thanks to Ian’s tutorial on how to create a custom WikiShootMe map, we were able to create a custom map that showed us which of the memorials we were working on had images, which didn’t, and where they were. That map is here, and it was great to see it slowly turn more green than red over the weekend as we found more images, or as volunteers headed out across Aberdeen between days to take missing pictures.

A screenshot of a clickable map where people can upload photos of monuments
A screenshot of a clickable map where people can upload photos of monuments

One of the small, but very satisfying, things you can do with these kinds of images is to integrate them into relevant Wikipedia articles. I added images from the project to the articles for Aberdeen Town House, Caroline Phillips, and Katherine Grainger. At the time of writing, around 2500 people have viewed those articles since I added the images.

Next steps

Over the course of the weekend we added 77 new memorials, and 26 new women to Wikidata, as well as a whole host of new photos. These entries all had some quite rich data, and as complete as we could make it.

We were surprised to see some of the individuals who didn’t have a Wikipedia article – and of course, we can use the Wikidata query service to identify those gaps. The queries below could give us a great starting point for an editathon, or indeed, for any Wikipedia editor interested in writing Women’s biography.

  • Wikidata query for women with a Women of Scotland subject ID, a memorial in Aberdeen, but no enwiki article:
  • Wikidata query for women with a Women of Scotland subject ID, but no enwiki article:

Huge thanks to the team, and to Code the City for another great hack weekend!

Dr Sara Thomas
Scotland Programme Coordinator, Wikimedia UK


Header image: The Grave of Jessie Seymour Irvine by Ian Watt on Wiki Commons  (CC-BY-SA)

How to make a custom WIkiShootMe page for missing images

One of the many WikiLabs tools that I use a lot is Wikishootme.

Wikishootme screenshot by -, CC BY-SA 3.0,
Wikishootme screenshot – CC BY-SA 3.0,

This application is designed to be used on a mobile phone. It allows you to call up a map of where you are at the moment and find missing images of listed building (as red dots). You can then authorise the app, using your Wikipedia / Wikidata credentials, and click on a red dot to upload a photo that you either take there and then or from your phone’s media. The image goes straight to Wiki Commons with a CC-BY-SA licence. And, once uploaded, the photos are automatically linked to the wikidata entry for that item! Should that be automagically?

I had a bunch of projects where I thought it would be useful to generate a custom map with missing images (for example of plaques, or boundary stones), then encourage people to photograph them and add them. Thankfully, Wikishootme allows you to do that.

It turns out it’s not too hard to do. Here is a walk through.

1. Create your wikidata query

I’m going to use the March Stones of Aberdeen as an example. I suggest that you copy exactly what I do, creating this query in full through all three steps. Then when you understand how it works, substitute your own query.

In Wikidata’s Query Service, create the query to retrieve the data you want. Wikishootme is quite particular about column names in the final output, so we need to make sure that our query has columns called ‘q‘ (for the wikidata identifiers) and ‘location‘ for the coordinate locations.

SELECT ?q ?location WHERE{
?q wdt:P31 wd:Q921099; wdt:P131 wd:Q62274582 .
?q wdt:P625 ?location .

(For the purposes of this tutorial it is not necessary to understand the syntax of a SPARQL query. If you are curious, in the above query P31 means an instance of; Q921099 is the identifier for a boundary marker; P131 means located in the administrative entity; and Q62274582 is Aberdeen City)

Try it here

Test that your query runs ok and returns what you expect. The query above will generate a table with two columns – one labelled q with a list of Wikidata QID codes, and another, location with coordinate pairs for each item.

2. Grab the SPARQL

Next copy all of the code between the {} pair (i.e. all of the second and third lines of the query above, but without the curly braces.

Then head to, paste your query text into it, and click on encode.

This will create a stream of characters that can be passed as part of a URL to another service. Copy all of that text. When I encode the query above I get the following string:


3. Generate the URL

We now need to append (or add) the encoded text to the end of the following URL.

This is best done in a text editor.

So, when I paste the encoded string to the end of that, I get this:

4. Try it out

Click on the link above. Did it work? It does for me. When I open it it defaults to a whole world map.

Default view of Wikishootme
Default view of Wikishootme

Scroll and zoom to where your red dots are.

Wikishootme, scrolled and zoomed
Wikishootme, scrolled and zoomed

Tip: when you get the map centred and at the scale you like, recopy the URL. This will capture the location and zoom level in your map for sharing.

Also, click on the layers symbol at the top right of the map. Choose to display where the data has images (green) as well as the red:

Wikishootme Layers control
Wikishootme Layers control

That will change your view to showing red (missing) and green (captured) images for your wikidata items.

Wikishootme showing red and green dots
Wikishootme showing red and green dots

Now you can share your map. I suggest copying your URL (see the Tip above) into a link shortener such as so as to make sharing easier.

Now, when someone clicks on your URL they can click on a red dot, and upload a missing photo to Wiki Commons, and automatically link it to Wikidata – and turn those red dots green!

Header Photo by Ravi Roshan on Unsplash

Aberdeen Provosts

In the run up to Code The City 19 we had several suggestions of potential projects that we could work on over the weekend. One was that we add all of the Provosts of Aberdeen to Wikidata. This appealed to me so I volunteered to work on it in a team with Wikimedia UK’s Scotland Programme Coordinator, Dr Sara Thomas, with whom I have worked on other projects.

In preparation for CTC19 I’d been reading up on the history of the City’s provosts and discovered that up to 1863 the official title was Provost, and from that point it was Lord Provost. I’d made changes to the Wikipedia page to reflect that, and I’d added an extra item to Wikidata so that we could create statements that properly reflected which position the people held.

Sara and I began by agreeing an approach and sharing resources. We made full use of Google Docs and Google Sheets.

We had two main sources of information on Provosts:

Running the project

I started by setting up a Google Sheet to pull data from Wikipedia as a first attempt to import a list to work with. The importHTML function in Google Sheets is a useful way to retrieve data in list or table format.

I entered the formula in the top left cell (A1):

=importhtml("", "list", 27)

and repeating the formula for all the lists – one per century. This populated our sheet with the numerous lists of provosts.

That state didn’t last very long. The query is dynamic. The structure of the Wikipedia page was being adapted, it appeared, with extra lists – so groups of former provosts kept disappearing from our sheet.

I decided to create a list manually – copying the HTML of the Wikipedia page and running some regex find and replace commands in a text editor to leave only the text we needed, which I then pasted into sheets.

Partial list of Provosts
Partial list of Lord Provosts

Once we had that in the Google Sheet we got to work with some formulae to clean and arrange the data. Our entries were in the form “(1410–1411) Robert Davidson” so we had to

    • split names from dates,
    • split the start dates from end dates, and
    • split names into family names and given names.

Having got that working (albeit with a few odd results to manually fix) Sara identified a Chrome plugin called “Wikipedia and WikiData tools” which proved really useful. For example we could query the term in a cell e.g. “Hadden” and get back the QID of the first instance of that. And we could point another query at the QID and ask what it was an instance of. If it was Family Name, or Given Name we could use those codes and only manually look up the others. That saved quite a bit of time.

Identifying QIDs for Given and Family Names
Identifying QIDs for Given and Family Names

Our aim in all of this was to prepare a bulk upload to Wikidata with as little manual entry as possible. To do that Sara had identified Quickstatements, which is a bulk upload tool for Wikidata, which allows you to make large numbers of edits through a relatively simple interface.

Sara created a model for what each item in Quickstatements should contain:

A model of a Quickstatements entry
A model of a Quickstatements entry

There are a few quirks – for example, how you format a date – but once you’ve got the basics down it’s an incredibly powerful tool. The help page is really very useful.

Where dates were concerned, I created a formula to look up the date in another cell then surround it with the formatting needed:


Which gave +1515-00-00T00:00:00Z/9 as the output.

You can also bulk-create items, which is what we did here. We found that it worked best in Firefox, after a few stumbles.

Data harvesting

As mentioned above, we used a printed source, from which we harvested the data about the individual Provosts.  It’s easy to get very detailed very quickly, but we decided on a basic upload for:

  • Name
  • First name
  • Last name
  • Position held (qualified by the dates)
  • Date of birth, and death (where available).

Some of our provosts held the position three or four times, often with breaks between. We attempted to work out a way to add the same role held twice with different date qualifiers, but ultimately this had to be done manually

The first upload

We made a few test batches – five or six entries to see how the process worked.

A test batch to upload via Quickstatements
A test batch to upload via Quickstatements

When that worked we created larger batches. We concluded the weekend with all of the Provosts and Lord Provosts being added to Wikidata which was very satisfying. We also had a list of further tasks to carry out to enhance the data. These included:

  • Add multiple terms of office – now complete,
  • Add statements for Replaces (P1365) and Replaced By (P1366) – partly done,
  • Add honorific titles, partly done
  • Add images of signatures (partly done) and portraits ( completed) from the reference book,
  • Add biographical details from the book – hardly started,
  • Source images for WIkiCommons from the collection portraits at AAGM – request sent,
  • Add places of burial, identifiers from Find A Grave, photographs of gravestones,
  • Add streets named after provosts and link them.

You can see the results in this WikiData query:

A Wikidata Query showing Provosts' Terms of Office, and their replacements
A Wikidata Query showing Provosts’ Terms of Office, and their replacements

This was a very interesting project to work on – and there is still more to do to improve the data, which you can help with.

Aberdeen Plaques – Part Two

In part one I described what we did at CTC18 to capture data and images of Commemorative Plaques in Aberdeen, and what I then did in the following three weeks.

A few people asked my why we would bother to put plaques into Wikidata and WikiCommons in this way. Why not have a council website – or why not use Open Plaques?

In this second instalment I am going to demonstrate how we can use the data which we have created to make some interesting visualisations and even do some calculations and analysis.

It can also power other new apps and services – allowing developers to create tailored routes around the city, on themes such as the arts or medicine – which is beyond the scope of this post.

Getting Started

At the time of writing we now have 132 Aberdeen Commemorative Plaques recorded  in Wiki Data.

I can check that with this simple query on the Wiki Data Query Service:

Plaques - Query One
Plaques – Query One

All that this does is ask for every instance (P31) of a commemorative plaque (Q721747) whcich is located in (P131) the Aberdeen City (Q62274582) area.

Try It for yourself.

Click on the white-on-blue arrow at the left. See what it produces. Note the bottom half of the screen turns into a table of results, and on the centre bar there is a message ‘xxx results in xxxx milliseconds‘.

How many pictures of plaques?

I can retrieve the photograph for plaque using the following query.

Plaques - Query Two
Plaques – Query Two

Here I am saying give us plaques which have image (P18). In effect this is saying ONLY those that have an image. If not all entries have an image, yet, then we will get a smaller number.

Try it.

As I run it I get 126 – which is six fewer than I got plaques.

Get all plaques with images or not

Let’s modify the query to this.

Plaques - Query Three
Plaques – Query Three

Here I am the OPTIONAL command which has the effect of saying IF there is an image give me it, but don’t restrict the results to only those with images. When we run that we can spot the missing ones by scrolling down through the list. I get six plaques with no images. This is a useful technique to spot missing things when totals (in this case plaques and images) don’t tally.

Try it.

Commemorating who or what?

As it stands the query is still not very user-friendly as all we have for the plaques is their Plaque ID. Of course we can click on those, but it would be more helpful to have the names of their subjects.

We’ll do that in two steps.

Firstly, let’s work out what the subjects are.

We can add the following line to the query and remember to add ?subject to the SELECT on the first line.

 ?plaque wdt:P547 ?subject

Note P547 is the statement “commemorates“.

Try it

If we run that we get a new column called subject and it is filled with links to subject IDs, which are the Wikidata entries for either people or things that the plaques commemorates. I note that when I run it my list has grown from 132 to 134.

Any guesses why that should be?

Some of the plaques commemorate more than one person.

Let’s make it a bit more friendly.

Add the following line just before the end of your query

 SERVICE wikibase:label {bd:serviceParam wikibase:language "en". }

And change ?subject to ?subjectLabel in the first line.

This instructs the WikiData Query service to use another service to retrieve labels from the items.

Plaques - Query Four
Plaques – Query Four

The label is in effect the title of the Wikidata item. Look at this one Immediately below the title, and to the left, there is an edit link. Click that. See how the ‘label‘ and the ‘description immediately below it become editable. Cancel that for now.

Try running that query to get subject names (labels) back

Now we have a name (in a subjectLabel column) for who or what is being commemorated.

Which provosts have plaques?

We can ask which of our plaques commemorates a previous Lord Provost of Aberdeen.

We use the P547 (commemorates) statement to get our subject, then use the following

subject wdt:P39 wd:Q57906938.

where P39 is Position Held, and Q57906938 is the identifier for Lord Provost of Aberdeen.

Plaques - provosts?
Plaques – provosts?

Currently we appear to have four plaques to former Lord Provosts.

Note: the “Try it” link below has been updated to take  account of subsequent work done to separate Provosts and Lord Provosts into separate categories.

Try it

A different view

At this point you might want to change the view for your query just to have a look at the images we have.

Above the table of results, on the extreme left there is an eye symbol and a drop down. Choose “Image Grid” to see the images only.

Plaques - change view
Plaques – change view

You might also have noticed that there are other options, several of which are greyed out as we don’t yet have that data in our query. These views include ‘Map‘ and “Timeline‘. We’ll come back to those.

Our Image Grid looks something like this:

Plaques - Image Grid
Plaques – Image Grid

Remember to swap back to ‘Table’ view once you’ve finished.

Adding more data fields

We can now add more data fields to our query.

Firstly, let’s add the geographic coordinates of the plaques’ locations.

Add the following line to your code:

 OPTIONAL {?plaque wdt:P625 ?coordinates .}

and, again add the new value, ?coordinates to the first line of the query too.

You will now have an extra field in the returned data table.

Try it 

Mapping results

Now change the view from Table to Map. The Wikidata query service automatically uses the coordinates to plot the results on a map which is scaled to show the results. You may need to scroll down to see all of the map. Click on one of the plotted points. You should get a pop up with the name of the person or building commemorated, plus a photo of the plaque itself, as shown below.

Plaques - map view
Plaques – map view

Note – if you add the following as the first line of your query, it will default to a map view rather than table when first run.


Now let’s see if we can get more data for the people for whom there are plaques.

Dates of birth and death

We can change our query to find out if there are dates of birth and death for our human subjects  (rather than buildings).

We can use P569 (date of birth) and P570 (date of death) and ascribe those to
?DOB and ?DOD respectively – again, adding those fields to our SELECT statement on line one. Your query should look like this?

Plaques - Query Five
Plaques – Query Five

Try it

Looking at our table of results we can see that we have a mix of types of results – people, bridges, buildings etc. but only the people have dates.

Table showing dates of birth
Table showing dates of birth

Interestingly the one subject with the DOB and DOD in the screenshot above is Elizabeth Crombie Duthie who gifted Duthie Park to the city of Aberdeen.

Remember, if you change the DOB and DOB from being OPTIONAL to just being regular requests, you can filter records to show ONLY those with dates associated with them which will screen out not only non-human subjects but will exclude any people with incomplete or missing dates.

Notable people

It could be argued that the fact there is a plaque to a person would indicate that they are notable, but not every person or object for which there is a plaque has a Wikipedia article. Let’s add some code to see which of our plaques has an associated article.

Plaques - Query Six
Plaques – Query Six

Try It

Changing the above so that we remove the OPTIONAL {} around the section beginning ?article  we get ONLY those with Wikipedia articles which is, as I run it, 79 plaque subjects.

You can if you want we add the following

 ?subject wdt:P31 wd:Q5 .

where P31 (instance of ) is Q5 (human) we can screen out all of the non-people plaques.

Try it

At this point, try flipping the view to TimeLine – you may have to scroll down quite a way to see all of the plaques. Many of them are concentrated at the right, spanning much of the 20th century. You should see John Barbour (1316-1395 at the extreme left).

Plaques - timeline
Plaques – timeline

Finally, before we start doing some statistical analysis let’s try something more sophisticated.

Can we create a map showing only female subjects whose work was in the medical sciences?

To do that we need to select only subjects who have a P21 (gender or sex) of Q6581072 (female). Then we need to select an occupation (P31) which is an instance or subclass of Q66811410 (the medical profession). This requires a structure that we haven’t see before:

?occupation wdt:P31/wdt:P279* wd:Q66811410

While we are at it, let’s get an image of the subject if there is one, and find out of there is a wikipedia article about the subject. And, since we want a map, we add that as our default view at the top.

Plaques - map of female medics
Plaques – map of female medics

This gives us the following output:

Map view of female medics
Map view of female medics

Try it

Changing this query to male (Q6581097) or choosing different types of professions is straightforward.

Statistical analysis

The Wikidata Query Service allows us to move beyond visualising the data in different ways. Let’s have a look at a couple of examples.

Analysing who or what is commemorated

The following query finds out what the subject of the plaque is an instance of (P31) – line 6:

Plaque - query seven
Plaque – query seven

but instead of creating a list, it use the COUNT () function to analyse the subject being an instance of (P31) Instance Of.

Try it

We can see that we have 105 humans, 5 lanes etc. Note that some double counting occurs. Some structures, for example, are instances of two things.

We can also analyse the gender of the human subjects just by changing P31 in the above to P21 (Sex or Gender).

At present I get

Plaques by gender
Plaques by gender

That’s far from gender equality, isn’t it!

What’s in a name?

Ascertaining the most common first names on plaques is also straightforward.

We use P735 (given name) statement, get the labels, count and group by those.

Try it.

We get the following results

Plaques - given names chart
Plaques – given names chart

With 81% of plaques to people being for males it is hardly surprising that our league table of names begins with James, William, George, John, Alexander ….

We can do more sophisticated analysis too.

Analysing Occupations

We can add the following line to our query to get back the occupation of the subject of the plaque:

 ?subject wdt:P106 ?occupation

Bear in mind that many of our plaque subjects are true polymaths. Have a look at Robert Brown. He has 10 listed occupations!

So what are the most common occupations of those people for whom there are plaques? Any guesses?

Let’s use the following query:

Plaques - Using Count()
Plaques – Using Count()

This uses the COUNT () function as well as a GROUP BY clause. The query looks at all of the different occupation labels, counts how many of each there are.

Try it

This returns, by default, a table of values. We can flip to a Bar Chart to make better sense of the data:

Plaques - Bar Chart of occupations
Plaques – Bar Chart of occupations

So, we can see that for those commemorated by a plaque the most common occupations are Physician, Painter, University Lecturer, Writer and so on.

We can add a couple of refinements if we wish. If we want our query to default to a BarChart when we run it we can add the following line at the start of the query:


and if we want the table to be sorted by value we can add a line such as

ORDER BY DESC (?count)

Try it

What next?

Over the last month I’ve been busy gathering data, taking photographs and publishing all of those on WikiData and wiki Commons. That phase is not quite complete, if it ever could be considered complete. You can monitor live progress here.

There are a couple of photographs which I can’t easily take which I know Aberdeen City Council’s Museum and Galleries team have. It would be great to see those made available by them on Wiki Commons, as I have shared the 148 plaque photos I have taken.

I know of at least 24 more plaques which I have photographed which are not listed yet in Wikidata.

When I published part one of this series I got some great feedback on Twitter. One suggestion is that we add structured data to the Wiki Commons pages for each photograph. Another was to add further data to the record for each plaque using statement P276 (location) where the plaque is on a known listed building. So far I have done that for 5 plaques – check it for yourself. There are loads more to do.

Many of the people records that I have created in Wikidata are skeletal. They need more detail, photographs, biographical links etc. Similarly, given that people or places are noteworthy enough to merit a plaque, they should pass the notability test for Wikipedia, yet at least 68 plaque subjects have no Wikipedia entry.

And plaques are just a start – an easy introduction to what is possible given, in this case, about 100 hours of work. While that was almost all done by one person, if we ran a Code The City weekend on a similar theme and similar sized challenge, six people could achieve the same over a weekend with a little coordination.

At Code The City, we’re about to start discussions with the local cultural institutions about setting up a more formal alliance for the city (shire?) to help shape how they use digital and data more effectively and grow volunteers with skills and tools to make that happen, which is an exciting note on which to finish this post! Watch this space, as they say.


Joining the dots between Britain’s historical railways using Wikidata – Part One

A bit of background

The evening before Code The City 18 I started to think about what fun project to spend the day doing at our one day mini-hack event. After reading Ian Watt’s blogpost about Wikidata and spending 10 minutes or so playing around with it, I decided a topic for further experimentation was required.

At the time of writing, I’m just over a third of the way through my very interesting part-time online MA Railway Studies at University of York. Looking at Britain’s railways from their very beginning, there are many railway companies from 1821 onwards. Some of these companies merged, some were taken over, others just disappeared whilst others were replaced by new companies. All these amalgamations eventually led to the “Big Four” groupings in 1923 and then on to British Railways in 1948’s railway nationalisation. British Railways rebranded as British Rail in 1965 and then splintered into numerous companies as a result of the denationalisation of the 1990s.

With the railway companies appearing in some form or another in Wikipedia, I thought it would be useful to be able to pick any railway company and view the chain of companies that led to it and those that followed. The ultimate goal would be to be able to bring up the data for British Rail and then see the whole past unfold to the left and the future unravel to the right. In theory at least, Wikidata should allow me to do that. 

No software coding skills are required to see the results of my experimentation: by clicking on the links provided (usually directly after the code) it is possible to run the queries and see what happens. However, using the code provided as a start, it is possible to build on the examples to find out things for yourself.

Understanding Wikidata and SPARQL

SPARQL is the query language used to retrieve various data sets from Wikidata via the Wikidata Query Service.

As is always the case with anything software related, the examples and tutorials never seem to handle those edge cases that you seem to hit within the first 5 minutes. Maybe I hit these cases so soon due to jumping straight from the “hello world” of requesting all the railway companies formed in the UK to trying to build the more complex web of railway companies rather than working my way through all the simpler steps? However, my belief is to fail quickly, leaving plenty of time left to fail some more before succeeding, after all you never see a young child plan out a strategy when they are learning to get the different shaped blocks through the correct holes.

At the time of writing…

Comments about the state of certain items of data were relevant at the time I wrote this article. As one of the big features of Wikidata is it constantly being updated, expanded and corrected, the data referenced may have changed by the time you read this. Some of the changes are those I’ve made in reaction to my discoveries, but I have left some out there for others to fix.

A simple list

First off, I created a simple SPARQL query to request all the railway companies that were formed in the UK.

SELECT ?company ?companyLabel
?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
SERVICE wikibase:label {
bd:serviceParam wikibase:language “en”.
ORDER BY (lcase(?companyLabel))

Run the query

The output of this query can be seen by running it yourself here by clicking on the white-on-blue arrow displayed on the Wikidata Query Service console. It is safe to modify the query in the console without messing up my query as any changes cause a new bookmarked query to be created. So please experiment as that’s the only way to learn.

Now what does the query mean and where do all those magic numbers come from?

  • wdt:P31 means get me all Wikidata triples (wdt) that have the property instance of  (P31) that is has a value of railway company (Q249556).
  • wdt:P17 means get me all of the results so far that have the property country (P17) set to United Kingdom (Q145).

Where did I get those numbers from? First, I went to Wikipedia and searched for a railway company, LMS Railway, and got to the page for London, Midland and Scottish Railway. From here I went to the Wikidata item for the page.

Screen grab of Wikipedia page for LMSR that shows how to get to the Wikidata
Wikipedia page for LMSR that shows how to get to the Wikidata

From here I hovered my pointer over instance ofrailway companycountry and United Kingdom to find out those magic numbers.

Screen grab of the Wikidata page for LMSR
Wikidata page for LMSR

Some unexpected results

Some unexpected companies turned up in the results list due to my query not being specific enough. For example, Algeciras Gibraltar Railway Company, located in Gibraltar but with headquarters registered in the UK the data has its country as United Kingdom. To filter my results down to just those that are located in the UK I tried searching for those that had the located in the administrative territorial entity (P131) with any of the following values:

  • England (Q21) 
  • Northern Ireland (Q26)
  • Scotland (Q22)
  • Wales (Q25)
  • Ireland (Q57695350) (covering 1801 – 1922)

using this query:

SELECT ?company ?companyLabel ?countryLabel
  VALUES ?country { wd:Q21 wd:Q26 wd:Q22 wd:Q25 wd:Q57695350 }
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145; wdt:P131 ?country.
  SERVICE wikibase:label {
bd:serviceParam wikibase:language "en".

ORDER BY (lcase(?companyLabel))

Run the query

However, that dropped my result set from 228 to 25 due to not all the companies having that property set.

Note: When trying to find out what values to use it is often quick and easy to run a simple query to ask Wikidata itself. To find out what all the values were for UK countries I wrote the following that asked for all countries that had an instance of value of country within the United Kingdom (Q3336843):

select ?country ?countryLabel
  ?country wdt:P31 wd:Q3336843 .
  SERVICE wikibase:label {
 bd:serviceParam wikibase:language "en".

Run the query


In order to see what other information could easily be displayed for the companies, I looked at the list of properties on the London, Midland and Scottish Railway. I saw several dates listed so decided that would be my next area of investigation. There is an inception (P571) date that shows when something came into being, so I tried a query with that:

SELECT ?company ?companyLabel ?inception
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  ?company wdt:P571 ?inception 
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
ORDER BY (lcase(?companyLabel))

Run the query.

This demonstrated two big issues with data. Firstly, the result set had dropped from 228 to 106 indicating that not all the company entries have the inception property set. The second was that only one, Scottish North Eastern Railway, had a full date (29th July 1856) specified, the rest only had a year and that was being displayed as 1st January for the year. Adding the OPTIONAL clause to the inception request returns the full data set with blanks where there is no inception date specified.

SELECT ?company ?companyLabel ?inception
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
ORDER BY (lcase(?companyLabel))

Run the query

Railway companies are not a straightforward case when it comes to a start date due to there being no one single start date. Each railway company required an Act of Parliament to officially enable it to be formed and grant permission to build the railway line(s). This raises the question: is it the date that Act was passed, the date the company was actually formed or the date that the company commenced operating their service that should be used for the start date? Here is a revised query that gets both the start time (P580) and end time(P582) of the company if they have been set:

SELECT ?company ?companyLabel ?inception ?startTime ?endTime
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  OPTIONAL { ?company wdt:P580 ?startTime. }
  OPTIONAL { ?company wdt:P582 ?endTime. }
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
ORDER BY (lcase(?companyLabel))

Run the query

Unfortunately, of the 228 results only one, London, Midland and Scottish Railway, has a startTime and endTime, and London and North Eastern Railway is the only with endTime. Based on these results it looks like that startTime and endTime are not generally used for railway companies. Looking through the data for Scottish North Eastern Railway did turn up a new source of end dates in the form of the dissolved, abolished or demolished (P576) property. Adding a search for this resulted in 9 companies with dissolved dates. 

SELECT ?company ?companyLabel ?inception ?startTime ?endTime ?dissolved
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  OPTIONAL { ?company wdt:P580 ?startTime. }
  OPTIONAL { ?company wdt:P582 ?endTime. }
  OPTIONAL { ?company wdt:P576 ?dissolved. }
  SERVICE wikibase:label {
bd:serviceParam wikibase:language "en".
ORDER BY (lcase(?companyLabel))

Run the query

There is no logic in which companies have this property: they range from Scottish North Eastern Railway dissolving on 10th August 1866 to several that ended due to the formation of British Railways, the more recent British Rail ending on 1st January 2001, and the short lived National Express East Coast (1st January 2007 – 1st January 2009). However, once again, the dates are at times misleading as, in the case of National Express East Coast, it is only the year rather than full date in the inception and dissolved, abolished or demolishedproperties.

Some of the railway companies, such as Underground Electric Railways Company of London, have another source of dates and that is as part of the railway company value for their instance of. It is possible to extract the start and end dates if they are present by making use of nested conditional queries. In the line:

OPTIONAL {?company p:P31 [ pq:P580 ?companyStart]. }

the startTime property is extracted from the instance of property if it exists.

SELECT ?company ?companyLabel ?inception ?startTime ?endTime ?dissolved ?companyStart ?companyEnd
  ?company wdt:P31 wd:Q249556; wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  OPTIONAL { ?company wdt:P580 ?startTime. }
  OPTIONAL { ?company wdt:P582 ?endTime. }
  OPTIONAL { ?company wdt:P576 ?dissolved. }
  OPTIONAL { ?company p:P31 [ pq:P580 ?companyStart] . }
  OPTIONAL { ?company p:P31 [ pq:P582 ?companyEnd] . }
  SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en".
ORDER BY (lcase(?companyLabel))

Run the query

Another date that can be used to work out the start and end of the companies can be found hanging off the values of very useful pair of properties: replaced by (P1366) and replaces (P1365). This conveniently connects into the next part of my exploration that will follow in Part Two. Although, as with many railway related things, the exact time of arrival of part two cannot be confirmed.

[Header photograph taken by Andrew Sage]