Aberdeen Plaques – Part One

On Saturday 14th December 2019 we ran a one-day mini hack event. The idea behind it was for people to come along for a day to work on their side projects and, if they needed support, attempt to persuade others to assist them.

That’s what I did with my Aberdeen Plaques project: something I’d had on the back burner for more than a year.

Why do it?

The commemorative plaques which are dotted around the city are a perfect candidate for open data. They have a subject, usually some dates, are located somewhere, and are of different types etc. Making that all available as open data would open up a whole range of possibilities.

Some Aberdeen plaques

If we captured all of that well then we could do analysis on the data (ratio of women to men, most represented professions), create walking routes (maybe one for the arts, one for the sciences and so on), create timelines to see what periods are more represented.

Having recently trained as a WikiMedia UK trainer, and having experimented with some of the tools (Wiki Commons, Wiki Data, Wikipedia, Histropedia), I was convinced that these were the right way to go.

Pre-event prep

So, in advance of the hack day I’d done a bit of prep in the two weeks running up to the day itself.

I’d created a spreadsheet which recorded:
* subject (person or ‘thing’)
* Gender if known
* the link to the now-retired city council plaques system (hidden from public view)
* The location if known
* The geo coordinates (to be determined)
* Whether the subject had a Wikipedia page (tbd)
* Whether there was an image of the plaque on Wiki Commons (tbd)
* Whether the subject of the plaque was represented on Wiki Data (tbd)
* Any identifiers on Open Plaques (tbd)
* Any external links (eg to Flickr for photos)
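For anyone who prefers to see that structure as code, here is a minimal sketch of one spreadsheet row. The field names are my own hypothetical labels rather than the real column headers, and the example row is made up. The helper simply reports which cells are still blank.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PlaqueRecord:
    """One spreadsheet row; field names are hypothetical, not the real headers."""
    subject: str
    gender: Optional[str] = None
    council_link: Optional[str] = None
    location: Optional[str] = None
    coordinates: Optional[str] = None
    wikipedia_page: Optional[str] = None
    commons_image: Optional[str] = None
    wikidata_item: Optional[str] = None
    open_plaques_id: Optional[str] = None
    external_links: Optional[str] = None


def missing_fields(record: PlaqueRecord) -> List[str]:
    """Return the names of the columns that are still blank for this row."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Illustrative row only: the values are invented, not taken from the real sheet.
example = PlaqueRecord(subject="Example subject", gender="female")
print(missing_fields(example))
```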

I’d then populated some of the data (eg whether there were images of the plaque on Wiki Commons) as well as some other bits. But most cells were blank.

Pre-event spreadsheet

As a keen walker and photographer I had also photographed and uploaded seventeen plaque images to Wiki Commons in the lead up, so that we would have some images to work with.

How to use our time most effectively on the day?

Our aim for the day was then to find out what data / info / images existed, fill in the gaps, and explore how to use WikiData to store and retrieve data, and how we could potentially create maps, timelines and similar new products.

What we did on the day

At the start of the event we pitched our project ideas, and I managed to persuade five others (Angela, Mike, Stephen, James and Steve) to join me in working on the plaques project.

Angela and Mike, and later Angela and Stephen, would go out and take photographs. Steve, James and I would work on the data capture, completing research on what existed, creating new entries for the data on Wiki Data, and testing queries on the Wiki Data query service.

How we did it

We used the spreadsheet that I had set up to capture all of the data we’d gathered, and as it evolved it would show progress as well as what was still lacking. We had no expectations that we would do it all on the day, but we could pick away at it in future weeks and months.

In the run-up to the event I’d discovered The Pingus’ album of plaques photographs on Flickr. Sadly these had not been published with a licence that would allow us to use them. I’d sent a request, a few days before CTC18, for them to change the licence for the Aberdeen plaques pictures to a CC-SA one. This would have allowed our republishing on Wiki Commons. Sadly it didn’t elicit a response. But the album did show that there were many more plaques than the old ACC system listed. And it was possible to get co-ordinates from them. So the number of plaques to deal with kept growing.

During the day James filled in loads of gaps in which subjects were on Wikipedia and which on Wikidata.

Steve and I experimented with capturing and querying the data. Structuring it in a way that aids retrieval through the Wiki Data Query Service was an iterative process. First I tried adding a ‘commemorative plaque image’ (P1801) statement to the wikidata record for the subject, as you can see in this first example: https://www.wikidata.org/wiki/Q2095630. But that limited what we could do.

So, we discovered that we could create a new item which was an instance of commemorative plaque. Our first attempt was https://www.wikidata.org/wiki/Q78438703 and we evolved what we captured there by adding statements; Steve discovered the ‘OpenPlaques plaque ID’ (P1893). Incidentally, we also tried ‘OpenPlaques subject ID’ (P1430), but adding that to the plaque item throws an error. The latter should be added to the person record, not the plaque.
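As a rough sketch of what querying those plaque items might look like, the Python below builds a SPARQL query that starts from our first plaque item (Q78438703), looks up whatever it is an instance of (P31), and lists other items of the same class along with any OpenPlaques plaque ID (P1893). The function name and the limit are my own choices, and this is not the exact query we used on the day.

```python
PLAQUE_ITEM = "Q78438703"  # our first plaque item, as linked in the text


def plaques_like(example_qid: str, limit: int = 50) -> str:
    """Build a SPARQL query for items sharing the example's 'instance of' class."""
    return f"""
SELECT ?plaque ?plaqueLabel ?openPlaquesId WHERE {{
  wd:{example_qid} wdt:P31 ?class .                  # what the example is an instance of
  ?plaque wdt:P31 ?class .                           # other items of that same class
  OPTIONAL {{ ?plaque wdt:P1893 ?openPlaquesId . }}  # OpenPlaques plaque ID, if present
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


print(plaques_like(PLAQUE_ITEM))
```

The printed text can be pasted into the Wiki Data query service. Because it matches every item of that class, not only Aberdeen ones, the limit keeps the result set manageable.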

At the end of CTC18

We ended the day with

  • 138 plaques listed.
  • 57 sets of co-ordinates identified
  • 68 Wikipedia articles identified as matching plaque subjects (and eleven plaque subjects who had NO wikipedia page)
  • 36 Images in WikiCommons
  • 77 WikiData entries for the subject of the plaques (existing or created)
  • 11 new wikidata entries for the plaques themselves

This was a great leap forward in one day and would pave the way for future work.

What next?

Since CTC18 ended, I’ve got firmly stuck into this project over the xmas break. Over the last three weeks I have photographed over a hundred plaques (plenty of walking) and have created wikidata entries for most of the plaques and for their subjects.

I’ll cover all of that, and how we can now use the data, in part two, coming soon.

A timeline of Female Aberdeen Uni Graduates

Background

Earlier this year Code The City held an Editathon with Wikimedia UK. The subject was the history of Aberdeen Cinemas. We ended up with 16 people all working together to create new articles, update existing ones, capture new images for Wiki Commons, and generate or enhance WikiData items. This was a follow up to previous sessions that Dr Sara Thomas of WikiMedia UK led for us in the city, mainly for information professionals.

This has led to significant interest from cultural bodies in the city in using the suite of WikiMedia platforms and tools to improve access to their collections in Aberdeen. We expect to do quite a bit more of this with them in 2020.

Two weeks ago I attended a Train the Trainer 3-day workshop in Glasgow for Wikimedia UK to become a trainer for them in Scotland.  That will see me training professionals and volunteers in how to use Wikipedia, Wiki Commons and Wikidata in particular.

In this blog post I explain why you might want to use some of the fancy features of the WikiData query service, show you how to do that using my adaptation of others’ shared examples, and encourage you to experiment for yourself.

Wikidata

Wikidata uses a Linked Open Data format to store data. While I have added quite a number of items to Wikidata, I’ve not had a chance to really study how to use SPARQL (the query language behind the scenes) to execute queries against the data. This is done in the Wikidata Query Service, and it is a key skill for using some of the more advanced features. Without the means to extract data there is little point in stuffing data into it. In fact WikiData allows us to do some very fancy things with the data which we retrieve.

So, I decided this week to start working on that. This post describes the first steps I have taken. It should also provide a simple introduction for anyone else wanting to dip their toe in the SPARQL waters.

Where to start?

This 16-minute tutorial on YouTube is a great place to begin; it is where I started. It describes how to create a simple query and build it up to something more powerful. I copied what it did then adapted that to build a query that I wanted. I suggest that you watch it first to understand what each line of SPARQL is doing.

Here are the steps, mainly drawn from and adapted from that tutorial.

Find all female graduates of Aberdeen University

In the query above we use the Educated at statement (P69) and the identifier for Aberdeen University (Q270532) in combination with the Sex or gender statement (P21) with the Female identifier (Q6581072).
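In case the screenshot is hard to read, here is a reconstruction of that kind of query as text, wrapped in a little Python. It is a sketch built from the identifiers just described rather than a copy of my saved query. The `run_query` helper is hypothetical and its User-Agent string is a placeholder; it sends the query to the public endpoint at https://query.wikidata.org/sparql and is defined but not called here, since it needs a network connection.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

FEMALE_ABERDEEN_GRADUATES = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P69 wd:Q270532 ;    # educated at: University of Aberdeen
          wdt:P21 wd:Q6581072 .   # sex or gender: female
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""".strip()


def run_query(query: str) -> list:
    """Send a query to the Wikidata Query Service and return the result rows."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent: replace it with one that identifies you.
    request = urllib.request.Request(url, headers={"User-Agent": "plaques-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


print(FEMALE_ABERDEEN_GRADUATES)
```

The SPARQL text on its own can be pasted straight into the query service. Calling `run_query(FEMALE_ABERDEEN_GRADUATES)` should give one row per person, with the name under `personLabel`.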

You can run this for yourself here using the white-on-blue arrow. I’ve used one of the great things you can do with Wikidata which is to share this query  using the link symbol on the left of the page just above the arrow:

Save a Wikidata query

Changing the parameters of the query means that we can check males (Q6581097) against females (Q6581072). Or you can compare different universities. To do this go to the Wikidata homepage and search for the name of the institution. The search will take you to a page with the Q code in the title. Thus we can compare various universities by amending the Q code in the query above: University of Aberdeen (Q270532) with University of Glasgow (Q192775) or Edinburgh University (Q160302).

Running these queries we can see that the number of Aberdeen University graduates with entries on WikiData, both male and female, is significantly smaller than for either Glasgow or Edinburgh, and that the proportion of all graduates who are female is smallest for Aberdeen.
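Swapping the Q codes by hand gets tedious, so here is a small sketch that generates one counting query per university and gender. The function name is my own, and using `COUNT(DISTINCT ...)` is my choice for getting a single number back; I ran the original comparisons by hand.

```python
UNIVERSITIES = {"Aberdeen": "Q270532", "Glasgow": "Q192775", "Edinburgh": "Q160302"}
GENDERS = {"male": "Q6581097", "female": "Q6581072"}


def count_graduates_query(university_qid: str, gender_qid: str) -> str:
    """Build a SPARQL query counting people educated at a university, by gender."""
    return f"""
SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {{
  ?person wdt:P69 wd:{university_qid} ;   # educated at
          wdt:P21 wd:{gender_qid} .       # sex or gender
}}
""".strip()


for university, university_qid in UNIVERSITIES.items():
    for gender, gender_qid in GENDERS.items():
        print(f"# {university}, {gender}")
        print(count_graduates_query(university_qid, gender_qid))
```

Each printed query can be pasted into the query service in turn. The counts change as Wikidata is edited, so expect them to differ from the ones in this post.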

 

| University | Male Grads | Female Grads | % Female |
|------------|------------|--------------|----------|
| Aberdeen   | 944        | 125          | 11.7     |
| Edinburgh  | 3804       | 571          | 13.1     |
| Glasgow    | 1562       | 291          | 15.7     |
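The percentage column is the number of female graduates divided by the total of male and female graduates. A few lines of Python, using the counts from the table (the variable and function names are just my own), reproduce the figures:

```python
# (male, female) counts as returned by the queries at the time of writing.
COUNTS = {
    "Aberdeen": (944, 125),
    "Edinburgh": (3804, 571),
    "Glasgow": (1562, 291),
}


def percent_female(male: int, female: int) -> float:
    """Female graduates as a percentage of male plus female graduates."""
    return round(100 * female / (male + female), 1)


for university, (male, female) in COUNTS.items():
    print(university, percent_female(male, female))
```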

The results of these queries should themselves cause us to reflect on the relatively smaller number of results of either gender from Aberdeen compared to the other universities;  and also the smaller proportion of women. It suggests that there is some work to do to ensure that we get better representation of both genders in Wikidata.

Enhancing our query

Now that we have a basic query we can retrieve additional bits of data for the subjects of the query including place of birth, date of birth and images.

These are represented by P19 (place of birth), P569 (date of birth) and P18 (image). As we see in the example below, when we query these we follow them with a name we assign to the item returned (e.g. ?person wdt:P19 ?birthPlace ) and we add the name we give it, in this case ?birthPlace, to the Select statement on the first line of the query, ensuring that it will feature in the data returned in the table or other format output.

enhanced wikidata query

You will note that the above example now uses ?birthPlace in a further query pattern to get the co-ordinates (P625) of that place, which we assign to ?coordinates:

> ?birthPlace wdt:P625 ?coordinates

and we include ?coordinates in the first line of things we will display.
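To make the pattern explicit, here is a sketch that builds such a query from a mapping of property to variable name. On Wikidata, place of birth is P19, date of birth is P569 and image is P18. The variable names and the helper are my own, and the result is a reconstruction of this kind of query rather than my saved one. It assumes the mapping includes ?birthPlace, because the co-ordinates line depends on it.

```python
# Property -> the variable name we choose for the value returned.
EXTRAS = {"P19": "birthPlace", "P569": "dateOfBirth", "P18": "image"}


def enhanced_query(extras: dict) -> str:
    """Add one triple per extra property and list its variable in the SELECT line."""
    select_vars = " ".join(f"?{name}" for name in extras.values())
    triples = "\n".join(
        f"  ?person wdt:{prop} ?{name} ." for prop, name in extras.items()
    )
    return "\n".join([
        f"SELECT ?person ?personLabel {select_vars} ?coordinates WHERE {{",
        "  ?person wdt:P69 wd:Q270532 .   # educated at: University of Aberdeen",
        "  ?person wdt:P21 wd:Q6581072 .  # sex or gender: female",
        triples,
        # Assumes extras maps some property to ?birthPlace.
        "  ?birthPlace wdt:P625 ?coordinates .  # co-ordinates of the birth place",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
        "}",
    ])


print(enhanced_query(EXTRAS))
```

Every name that appears after a property in the body also appears in the first line, which is what makes it show up as a column in the results.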

Advantages of extra data elements

By having birthplace coordinates we can plot the results in a map which is easily done using the tools built into the wikidata query service.

Run the query (white arrow on blue on the left menu) and observe the table that was returned. You can see that the first line of the Select statement formed the columns of the table.

Table of wikidata query results

Note that instead of the 125 results we had in the simple query, we only get 20 results. My understanding of this is that we are specifying records which must have a place of birth, an image etc. Where these do not exist, the records for that person are not returned. This in itself shows that there is a piece of work to do to identify where records in the batch of 125 lack these elements and fix them.
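One way to find the records with gaps, sketched here under my own naming, is to ask for graduates who do not have a given property, using SPARQL's `FILTER NOT EXISTS`. This is a suggestion for the clean-up work rather than something I have run across all 125 records.

```python
def missing_property_query(prop: str) -> str:
    """Build a query for female Aberdeen graduates lacking the given property."""
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P69 wd:Q270532 ;    # educated at: University of Aberdeen
          wdt:P21 wd:Q6581072 .   # sex or gender: female
  FILTER NOT EXISTS {{ ?person wdt:{prop} ?value . }}  # keep only those without it
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


# P18 is image; swap in P19 (place of birth) or P569 (date of birth) as needed.
print(missing_property_query("P18"))
```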

In fact you could say that there is a whole cycle of adding data, querying it, spotting anomalies, fixing those and re-querying which leads to substantial enrichment of the data.

Map results

Now click on the dropdown by the eye symbol, on the left immediately above the results, and choose the map option. The tool will generate a map with a pin in the location of each place of birth. You can pan and zoom to the UK and click on each pin. Try it. To get back to the query, click on the arrow, top-right.

wikidata map view with clicked point

A timeline

Now click on the eye symbol to show other options, and choose Timeline.

As we can see below, the Wikidata query service will construct a rudimentary timeline with relatively little effort.  This is one of its great features. So far we have the same 20 complete records – and the cards or tiles are titled by the place of birth but we can change that.

Wikidata timeline
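Instead of picking the view from the dropdown each time, the query service also accepts a `#defaultView:` comment on the first line of a query, for example `#defaultView:Map` or `#defaultView:Timeline`. The helper below is my own small sketch for adding that line; the base query is a cut-down reconstruction, not my saved one.

```python
BASE_QUERY = """
SELECT ?person ?personLabel ?dateOfBirth ?coordinates WHERE {
  ?person wdt:P69 wd:Q270532 ;    # educated at: University of Aberdeen
          wdt:P21 wd:Q6581072 ;   # sex or gender: female
          wdt:P569 ?dateOfBirth ; # date of birth, used by the timeline
          wdt:P19 ?birthPlace .   # place of birth
  ?birthPlace wdt:P625 ?coordinates .  # co-ordinates, used by the map
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""".strip()


def with_default_view(query: str, view: str) -> str:
    """Prefix a query with the comment that selects the result view."""
    return f"#defaultView:{view}\n{query}"


print(with_default_view(BASE_QUERY, "Map"))
print(with_default_view(BASE_QUERY, "Timeline"))
```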

Enhancing the timeline on Histropedia

To improve on our timeline we can construct a better query using the Wikidata Query Service then paste it into the Histropedia service to run it.  Our first version which makes small improvements on our previous timeline produces the results below. This labels by the person’s name, and colour codes the individual records by place of birth label. To see the code, click the gear wheel at the top right of the screen. Note we still only retrieve 20 results.

A first query on Histropedia

We can substantially enhance this query as we have done on the following version. This makes certain items optional, gets the country of birth and colour-codes by that, and ranks the records by prominence (with the most prominent at the front). If I understand it correctly, by using optional elements it also retrieves 76 records, many more than previously.

enhanced Histropedia timeline
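The part of that query that brings back the extra records is SPARQL's `OPTIONAL` keyword. Here is a minimal sketch of the idea for the plain Wikidata query service. It is not the Histropedia query itself, it leaves out the colour-coding and ranking, and the helper and variable names are my own. The date of birth stays compulsory, since a timeline needs a date, while the image and the birth place with its country (P17) become optional.

```python
def optional(pattern: str) -> str:
    """Wrap a graph pattern so that a missing match does not drop the row."""
    return f"OPTIONAL {{ {pattern} }}"


OPTIONAL_PARTS = [
    "?person wdt:P18 ?image .",
    # Both triples sit in one OPTIONAL, so a birth place with no country
    # leaves ?birthPlace and ?country unbound together.
    "?person wdt:P19 ?birthPlace . ?birthPlace wdt:P17 ?country .",
]

QUERY = "\n".join(
    [
        "SELECT ?person ?personLabel ?dateOfBirth ?birthPlaceLabel ?countryLabel ?image WHERE {",
        "  ?person wdt:P69 wd:Q270532 ;     # educated at: University of Aberdeen",
        "          wdt:P21 wd:Q6581072 ;    # sex or gender: female",
        "          wdt:P569 ?dateOfBirth .  # date of birth stays compulsory",
    ]
    + ["  " + optional(part) for part in OPTIONAL_PARTS]
    + [
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
        "}",
    ]
)

print(QUERY)
```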

I would encourage you to watch the tutorial video at the start of this post, then try to hack some of the queries to which I provided links. For example how many female graduates of the Robert Gordon University would each query generate? How would you find the Q code of that institution?  Have fun with it!
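If you would rather look up a Q code from a script than from the Wikidata homepage, the site's `wbsearchentities` API action searches items by name. The sketch below builds the search URL; `find_q_codes` is a hypothetical helper with a placeholder User-Agent, and it is defined but not called here because it needs a network connection.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def search_url(name: str) -> str:
    """Build the URL for a Wikidata item search by name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": "en",
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def find_q_codes(name: str) -> list:
    """Return (Q code, label) pairs for items matching the name."""
    # Placeholder User-Agent: replace it with one that identifies you.
    request = urllib.request.Request(
        search_url(name), headers={"User-Agent": "plaques-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [(hit["id"], hit.get("label", "")) for hit in data.get("search", [])]


print(search_url("Robert Gordon University"))
```

Opening the printed URL in a browser shows the matching items as JSON; the `id` of the right one is the Q code to drop into the queries.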