Saturday 6th March, 2021 was World Open Data Day. To mark this international event CTC ran a Wikidata Taster session. The objectives were to introduce attendees to Wikidata and how it works, and give them a few hours to familiarise themselves with how to add items, link items, and add images.
The theme of the session (to give it some structure and focus) was the Industrial Heritage of Aberdeen. More specifically the bygone industries of Aberdeen and, more specific still, the many Iron Foundries that once existed. I chose the specific topic as it is still relatively easy to spot the products of the industry on streets and pavements as we walk around the city, photograph those and add them to Wiki Commons, as I have been doing.
We had thirteen people book and eight turn up. After I gave a short presentation on how Wikidata operates we divided ourselves into three groups in breakout rooms. This was all on Zoom, of course, while we were still under lockdown.
The teams of attendees chose a foundry each: Barry, Henry & Cook Limited; Blaikie Brothers, and William McKinnon & Company Ltd. I’d already created an entry for John Duffus and Company in preparation for the event and to use as a model.
I’d also created a Google Sheet with a tab for each of the other thirteen foundries I’d identified (including those selected by the groups). I’d also spent quite a while trying to figure out how to access and search the old business and Post Office Directories for the city which had been digitised for 1824 to 1941. I eventually I built myself a tool, which I shared with the teams, which generated an URL for a specific search term for a certain directory. They used this, as well as other sources, to identify key dates, addresses and name changes of businesses.
By the end of the session our teams had created items for
They had also created items for foundry buildings – linked to Canmore etc, as well as founders. We enhanced these with places of their burial, portraits and images of gravestones. I took further photos which I uploaded to Commons and linked the following Monday. I created two Wikidata queries to show the businesses added, and the founders who created the businesses.
The statistics for the 3 hour session (although some worked into the afternoon and even the next day) are impressive. You can see more detail on the event dashboard.
We received positive feedback from the attendees who have been able to take their first steps towards using Wikidata as a public linked open data for heritage items.
I hope that the attendees will keep working on the iron founders until we have all of these represented on Wikidata. Next we can tackle shipbuilders and the granite industry!
It’s often touted that there are some cities in Scotland (coughEdinburghcough) where there are more statues to animals than there are to women. In my own work transferring OpenPlaques data to Wikidata I’ve observed that there are more entries for Charles Rennie Macintosh than there for women in Glasgow. So in this light, it’s somewhat refreshing to work on a project that celebrates all kinds of memorials to women in Scotland.
The Women of Scotland: Mapping Memorials project began in 2010 as a joint project between Glasgow Women’s Library, and Women’s History Scotland. It’s similar in many ways to OpenPlaques, but using Wikidata could add an extra dimension – let’s increase the coverage of women’s history and culture on the Wikimedia projects by getting these memorials and the women they celebrate into Wikidata, use that to identify gaps in knowledge, and then work to fill the gap.
Once we had this list, we could create a more automated process to deal with gathering the other pieces of information we needed to create new, good quality Wikidata items, although some (description, for example) needed a human eye.
We were using two main identifiers on Wikidata – P8048 (Women of Scotland memorial ID) and P8050 (Women of Scotland subject ID). The former for the entries to the memorials themselves, and the latter for the women they celebrate. Where the women didn’t have entries, we could create those, and then link them to the entries for the memorials.
Both identifiers use the last part of the URL for each entry on the Mapping Memorials site, so that was fairly easy to do in Google docs. Once we had that info, it’s an easy enough step to bulk-create items either using Quickstatements or Wikibase CLI.
Creating items & avoiding duplicates
There’s a plug in for Google Sheets called Wikipedia and Wikidata Tools which has some useful features for projects like this – WikidataQID for looking up whether something already exists on Wikidata, and WikidataFacts, which tells you what that item is. The former is ok if you have an exact match, the latter is really useful for flagging anything which might lead to a disambiguation page, for example.
Ultimately we did end up with a few duplicates that needed to be merged, but this was pretty easily managed, and it really showed how useful it is to have local knowledge involved in local projects – there were a couple of sets of coordinates that were obviously wrong, but also some errors that wouldn’t have been spotted by someone unfamiliar with the area.
Coordinates and dates
I really like Quickstatements, but there are a few areas in which it’s fiddly, including coordinates and dates. I’m really interested in looking further into Wikibase CLI for dates in particular, as the process there for dates (documented here) looks to be substantially easier in terms of data prep than it does in Quickstatements. Many thanks to Tony for that work, as his expertise saved us a lot of time! He also used that tool to create items for those women commemorated who were missing from Wikidata, documented here.
As with dates, coordinates are entered into Quickstatements in a different format than that which you’d use manually inside Wikidata itself, hence the formatting you’ll see in column Q on the Data collection tab. Most of this we had to grab from Google Maps, which again is a bit fiddly.
Once we had a master list of QIDs for the memorials we were working with, we could use Quickstatements to bulk upload sets of statements to those items.
For example, matching the memorials to the women commemorated, using this format:
The Q numbers on the left are those of the memorials, P547 is “commemorates”, and the Q numbers on the right are those of the women celebrated. We were also able to add P8050 (Women of Scotland subject ID) to some women who already had entries on Wikidata, but no WoS ID.
The Q number on the left again is the memorial, P31 is “instance of”, and the Q number on the right corresponds to a type of thing – a commemorative plaque, a garden, or a road, for example.
Once you’ve got the info in this format, it’s just a case of copy & pasting into QS, clicking import, and then run. (Note – you do need to be an autoconfirmed user to use QS, which means that your account must be at least 4 days old, and having more than 50 edits.) It’s relatively easy, and I was pleased that one of our relatively-new-to-Wikidata participants had the chance to make her first bulk uploads (description & commons category) using the tool over the weekend.
This project grew out of a desire to increase the coverage of Scottish heritage on Wikimedia Commons, so it was great to take some time on this. Mapping Memorials does have some images, but they’re not openly licensed, and others are missing. After Wikimedia Commons, our next port of call was Geograph, where many images have been released on Wiki-compatible Creative Commons licenses. Using Geograph2Commons, images can easily be transferred over to Wikimedia Commons, so that they can be used in any Wikimedia Project. Geograph also links to this feature from their site – click on “Find out how to reuse this image”, and then scroll down to “Wikipedia template for image page”, then click on the “geograph2commons” link. Really simple. Our group did some detective work for images, and then added them to Commons, and linked them manually to the Wikidata item.
This gave us a list of missing images… which is fine, but wouldn’t it be better to see them on a map?
Visualisation and filling the gaps
Thanks to Ian’s tutorial on how to create a custom WikiShootMe map, we were able to create a custom map that showed us which of the memorials we were working on had images, which didn’t, and where they were. That map is here, and it was great to see it slowly turn more green than red over the weekend as we found more images, or as volunteers headed out across Aberdeen between days to take missing pictures.
One of the small, but very satisfying, things you can do with these kinds of images is to integrate them into relevant Wikipedia articles. I added images from the project to the articles for Aberdeen Town House, Caroline Phillips, and Katherine Grainger. At the time of writing, around 2500 people have viewed those articles since I added the images.
Over the course of the weekend we added 77 new memorials, and 26 new women to Wikidata, as well as a whole host of new photos. These entries all had some quite rich data, and as complete as we could make it.
We were surprised to see some of the individuals who didn’t have a Wikipedia article – and of course, we can use the Wikidata query service to identify those gaps. The queries below could give us a great starting point for an editathon, or indeed, for any Wikipedia editor interested in writing Women’s biography.
Wikidata query for women with a Women of Scotland subject ID, a memorial in Aberdeen, but no enwiki article: https://w.wiki/YVH
Wikidata query for women with a Women of Scotland subject ID, but no enwiki article: https://w.wiki/YVG
Huge thanks to the team, and to Code the City for another great hack weekend!
Dr Sara Thomas
Scotland Programme Coordinator, Wikimedia UK
Mesolithic Deeside is a group of archaeologists, students and local volunteers investigating the river Dee area 10,000 years ago. They’ve been gathering flints on seasonal field-walking trips and recording the data from the outputs of those allowing them to map Mesolithic Deeside.
The following is a summary of what the the group with some additional helpers achieved over the two days of CTC20.
Team: Andy, Ali, Sheila and Irvine
Discussed the goals of the project with the Mesolithic Deeside Team
Displaying data visually for public consumption
Updating / refreshing the website
Looking at ways to identify future sites for test pitting
Decided to focus on developing a way of visualising the data that has been collected
Data is currently stored in a QGIS project and a number of csv files
Initial work looked into the possibility of using QGIS and Tableau for visualisation
Tableau was later dropped in favour of QGIS
Issues with Andy loading QGIS data from the project – no reason why it shouldn’t work
Decided that Irvine would focus on working with QGIS and Andy would focus on finding a solution with Google Maps
Andy has selected a subset of the data and is currently working to put that data on a Google Map
Data needs to be cleaned and tidied up before being displayed, i.e typos, consistent name formatting
Currently working with Google Sites & Awesome Table
Awesome table works with Google Sheets and picks up certain types from a header row – can be tricky to get working
Unable to disable clustering when zoomed in
Team 1: Andy, Robert & Irvine
Team 2: Ali, Sheila & Dave
Collate the finds data into a single spreadsheet
Investigate simple HTML / JS implementation of a google map with filter
Two extra members joined our group today: Robert and Dave
Provided an update and explanation of what we have done so far to Robert and Dave
It was decided that the we split into two groups:
Ali, Sheila and Dave would explore options for the Mesolithic Deeside website
Andy, Irvine and Robert would continue working with the flint data
The following codepen was found showing what we were looking for, however, the code and script was not runnable, which meant devising our own code
After such a successful weekend at CTC19, we were delighted to be back for CTC20 to continue work on the Aberdeen Harbour Arrivals project. As expected, the team working on the project was made up of both avid coders and history enthusiasts which brings a great range of skills and knowledge to the weekend.
A second spreadsheet was created to input adjustments, this allowed us to clean data to be more presentable whilst keeping the accurate ledger transcriptions intact; a must when dealing with archival material. This data cleaning has allowed us to create a more presentable website which is easier to understand and navigate.
Expanding the data set
The adjustments spreadsheet also included the addition of a new column of information sourced externally from the original transcription documents. When first registered fishing vessels were assigned a Fishing Port Registration Number. Where known, that number has been added and will hopefully allow us to cross reference this vessels with other sources at some point in the future.
Vessel types and roles
Initial steps were taken to begin to create a better understanding about the various vessels, their history and purpose. Many of the vessel names contain prefixes relating to their type (e.g. HMS – His Majesty’s Ship for a regular naval vessel, HMSS for a submarine) and they have now been extracted and a list of definitions is being built up. Decoding these prefixes highlighted just how much naval military activity was taking place around Aberdeen during the First World War.
Visualising the data
Some of the team also looked forward to consider how the data could be used in the future. A series of graphs and charts have been created to highlight patterns such as most frequent ships and most popular cargo. We even have an interactive map to show where the in the world the ships were arriving from.
As with CTC19, the weekend has been a great success. Archivists learned more about data and the coders benefitted from over 15,000 records to play with.
An ideal future step for the project is the creation of individual records in the website for each vessel so we can begin to expand on the information – i.e. vessel name, history of Masters, expanded description about what it was, what role in played in the First World War. Given the heavy use of Wikidata by many of the other projects that were part of CTC19 and CTC20, consideration has to be given to using Wikidata as the expanded repository for building up the bigger picture for each vessel. However, as we are still very much in the historical investigation stage and not entirely sure about the full facts for many vessels it would not be appropriate at this stage to start pushing unverified information into Wikidata.
This project was commenced at CTC19 on 11th -12th April. The aim was to import from Aberdeen Built Ships (with the permission of the Galleries and Museums Service who operate it) a complete set of data on those 3000+ ships into Wikidata data in as clean and well-formatted state as possible.
We got part of the way there at CTC19, and in work done in the following weeks, but the data had still not been imported.
We had in the weeks since CTC19, we had identified issues with two significant aspects of the data in the core ABS system: a lack of standardisation of ship types (meaning that there were up to nine variants of a single type) and a similar issue with ship builders.
For the purposes of CTC20 we agreed to set these aside and press ahead with an import of core data for each ship we could – and to revisit the specific details above later.
What was done
Core data was imported into Wikidata for most of the ships. We excluded some ships from the import if the name field was blank or UNKNOWN or UNNAMED. Other, existing, ships had an ABS ID added to their item. This has resulted in 3085 ships in Wikidata with an ABS ID at the time of writing.
We initially tried to use the CSV format for wikidata quickstatements, but couldn’t get this to work so switched to the TSV version. A python script was written to write the quickstatements file that could then be copied into the quickstatements batch import tool. The import had 2 errors for ships that had a range of years in the Date so generated invalid dates in the quickstatements. These (and 2 duplicates that I noticed after the import) are noted to correct later.
The ABS ID property (P8260) was manually added to the ships that already existed in wikidata.
SELECT ?qid ?absid
?qid wdt:P8260 ?absid.
To complete the project the following needs to be done
Add Country of Origin (P495) to all existing Aberdeen-built ships in Wikidata. This will suppress the warning messages when viewing each ship.
Rationalise all ship builders that exist in ship_builders.csv – deduplicating these and create Wikidata entries for each we will use.
Rationalise all ship types that exist in ship_types.csv – deduplicating these and create Wikidata entries for each we will use.
Update each ship with specific type and ship builder.
Extract / rationalise data from some of the fields, e.g. we have one dimensions field rather than separate fields for length/beam/draft/… and what’s there is inconsistent
Isolate ships that have no Wikidata identifier – i.e. any one not in the list of 59 positive matches. Set aside those which have entries for later processing.
Source and add pictures of the ships in ABS (see below)
Develop a means of monitoring both the original ABS system (rescrape periodically and do a diff on the file in some way? ) and monitor Wikidata for changes to the ships records (Wikidata query, executed periodically, generating a CSV download and checked for differences from previous runs?) to feed back to ABS.
Images of ships
Despite there now being 3,085 Aberdeen-built ships in Wikidata only 12 of these (or 0.388%) has a picture associated with them. There is a significant opportunity to work with Aberdeen Museums to add images from their extensive collection to Wiki Commons and associate these with the ships now in Wikidata.