Code The City is a civic hacking initiative focused on using tech and (open) data for civic good. We use hack weekends, open data, workshops and idea generation tools, and we run Data Meetups, the Aberdeen Python User Group and the annual Scottish Open Data Unconference.
One of the Code The City 21 (CTC21) projects looked at providing Scots translations of Aberdeenshire place names for display on an OpenStreetMap map. The planned outcomes for that project included a list of translated place names and, potentially, an audio version of each name to guide pronunciation.
I’m a firm believer that Open Data shouldn’t just become “dusty data left on the digital shelf”, and in “show, don’t tell”. This led me to show just how easy it is to do something with the data created during the weekend’s activities, and to make use of the outcomes of a previous CTC event (Aberdeenshire Settlements on Wikidata and Wikipedia), thus taking that data off the digital shelf.
My plan was to build a simple iOS app, using SwiftUI, that would allow the following:
Listing of place names in English and their Scots translation
View details about a place including its translation, location and photo
Map showing all the places and indicating if a translation exists or not
I used SwiftUI as it is fun (always an important consideration) to play with and quick to get visible results. It also provides the future option to run the app as a Mac desktop app.
Playing along at home
Anyone with a Mac running at least Catalina (macOS 10.15) can install Xcode 12 and run the app in the Simulator. The source code can be found on GitHub.
Getting the source data
Knowing that work had previously been done on populating Wikidata with a list of Aberdeenshire Settlements and providing photos for them, I turned to Wikidata for sourcing the data to use in the app.
# Get list of places in Aberdeenshire, name in English and Scots, single image, lat and long
SELECT ?place (SAMPLE(?place_EN) as ?place_EN) (SAMPLE(?place_SCO) as ?place_SCO) (SAMPLE(?image) as ?image) (SAMPLE(?longitude) as ?longitude) (SAMPLE(?latitude) as ?latitude)
WHERE {
?place wdt:P31/wdt:P279* wd:Q486972 .
?place wdt:P131 wd:Q189912 .
?place p:P625 ?coordinate.
?coordinate psv:P625 ?coordinate_node .
?coordinate_node wikibase:geoLongitude ?longitude .
?coordinate_node wikibase:geoLatitude ?latitude .
OPTIONAL { ?place wdt:P18 ?image }.
OPTIONAL { ?place rdfs:label ?place_EN filter (lang(?place_EN) = "en" )}.
OPTIONAL { ?place rdfs:label ?place_SCO filter (lang(?place_SCO) = "sco" )}.
}
GROUP BY ?place
ORDER BY ?place_EN
For each place, the query returns:
The Wikidata item for the place
Name in English
Name in Scots, where one exists
A single image of the place (some places have multiple images, so the results had to be restricted to a single image)
Latitude of the place
Longitude of the place
Just requesting the coordinate for each place returned a text string, such as Point(-2.63004 57.5583), which complicated its use later on. Adding the relevant code to the query to generate separate latitude and longitude values simplified reuse of the data at the next stage.
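For illustration, here is a short Python sketch (hypothetical, not part of the app) of the parsing step that the coordinate string would otherwise require; generating latitude and longitude in the query avoids this entirely:

```python
import re

def parse_point(wkt: str) -> tuple[float, float]:
    """Parse a WKT-style "Point(longitude latitude)" string into (lat, long).

    Note that the coordinate string puts longitude first."""
    match = re.fullmatch(r"Point\(([-\d.]+) ([-\d.]+)\)", wkt.strip())
    if match is None:
        raise ValueError(f"Not a point: {wkt!r}")
    longitude, latitude = (float(value) for value in match.groups())
    return latitude, longitude

# parse_point("Point(-2.63004 57.5583)") returns (57.5583, -2.63004)
```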
The results returned by the query were exported as a JSON file that could be dropped straight into the Xcode project.
The App
SwiftUI allows data-driven apps to be pulled together quickly. The data powering the app was a collection of Place structures populated from the JSON exported from Wikidata.
import CoreLocation

struct Place: Codable, Identifiable {
    let place: String        // Wikidata item URI; also serves as the stable id
    let place_EN: String     // Name in English
    let place_SCO: String?   // Scots translation, if one exists
    let image: String?       // Image URL, if one exists
    var latitude: String
    var longitude: String

    // Computed properties
    var id: String { place }
    var location: CLLocationCoordinate2D {
        // Assumes the exported strings are valid numbers; a malformed
        // value would crash this proof-of-concept app
        CLLocationCoordinate2D(latitude: Double(latitude)!, longitude: Double(longitude)!)
    }
}
The app itself was split into three parts: Places list, Map, Settings. The Places list drills down to a Place details view.
Screenshots: the list of places in English and their Scots translation if included in the data; the details screen about a place; the map showing places and indicating if they have a Scots translation (yellow) or not (red).
The Settings screen just displays some “about” information and where the data came from. For now it acts partly as a placeholder, with room to expand as the app evolves.
Next Steps
The app created over the weekend was very much a proof of concept and so has room for many improvements. The list includes:
Caching the location photos on the device
Displaying additional information about the place
Adding search to the list and map
Adding audio pronunciation of names (the related Doric Tiles project did not manage to add audio during the CTC21 event)
Modifying the app to run as a Mac desktop app
Adding the ability to request an updated list of places and translations
The final item on the above list, the ability to request an updated list of places, is in theory straightforward: all that would be required is to send the query to the Wikidata Query Service and process the results within the app. The problem is that the query takes a long time to run (nearly 45 seconds), so there may be timeout issues before the results arrive.
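As a sketch of that idea (in Python rather than Swift, and with hypothetical names), fetching updated results would amount to sending the query to the Wikidata Query Service endpoint and allowing a generous timeout:

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_wdqs_request(sparql: str) -> urllib.request.Request:
    """Build (but do not yet send) a request to the Wikidata Query Service.

    Asking for JSON keeps the response in the same shape as the file
    exported and bundled with the app."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "PlaceNamesApp/0.1 (hypothetical example)",
        },
    )

# Sending it with a generous timeout, since the query can take ~45 seconds:
# with urllib.request.urlopen(build_wdqs_request(query), timeout=90) as response:
#     results = json.load(response)
```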
After such a successful weekend at CTC19, we were delighted to be back for CTC20 to continue work on the Aberdeen Harbour Arrivals project. As expected, the team working on the project was made up of both avid coders and history enthusiasts, which brought a great range of skills and knowledge to the weekend.
A second spreadsheet was created for input of adjustments. This allowed us to clean the data to be more presentable whilst keeping the accurate ledger transcriptions intact; a must when dealing with archival material. The data cleaning has allowed us to create a more presentable website which is easier to understand and navigate.
Expanding the data set
The adjustments spreadsheet also included a new column of information sourced externally from the original transcription documents. When first registered, fishing vessels were assigned a Fishing Port Registration Number. Where known, that number has been added and will hopefully allow us to cross-reference these vessels with other sources at some point in the future.
Vessel types and roles
Initial steps were taken to build up a better understanding of the various vessels, their history and purpose. Many of the vessel names contain prefixes relating to their type (e.g. HMS, His Majesty’s Ship, for a regular naval vessel; HMSS for a submarine). These prefixes have now been extracted and a list of definitions is being built up. Decoding the prefixes highlighted just how much naval military activity was taking place around Aberdeen during the First World War.
Visualising the data
Some of the team also looked ahead to consider how the data could be used in the future. A series of graphs and charts has been created to highlight patterns such as the most frequent ships and the most popular cargo. We even have an interactive map to show where in the world the ships were arriving from.
As with CTC19, the weekend has been a great success. Archivists learned more about data and the coders benefitted from over 15,000 records to play with.
Next steps
An ideal future step for the project is the creation of individual records on the website for each vessel so we can begin to expand on the information: the vessel name, the history of its Masters, an expanded description of what it was and what role it played in the First World War. Given the heavy use of Wikidata by many of the other projects that were part of CTC19 and CTC20, consideration has to be given to using Wikidata as the expanded repository for building up the bigger picture for each vessel. However, as we are still very much in the historical investigation stage and not entirely sure of the full facts for many vessels, it would not be appropriate at this stage to start pushing unverified information into Wikidata.
The evening before Code The City 18 I started to think about what fun project to spend the day doing at our one day mini-hack event. After reading Ian Watt’s blogpost about Wikidata and spending 10 minutes or so playing around with it, I decided a topic for further experimentation was required.
At the time of writing, I’m just over a third of the way through my very interesting part-time online MA Railway Studies at University of York. Looking at Britain’s railways from their very beginning, there are many railway companies from 1821 onwards. Some of these companies merged, some were taken over, others just disappeared whilst others were replaced by new companies. All these amalgamations eventually led to the “Big Four” groupings in 1923 and then on to British Railways in 1948’s railway nationalisation. British Railways rebranded as British Rail in 1965 and then splintered into numerous companies as a result of the denationalisation of the 1990s.
With the railway companies appearing in some form or another in Wikipedia, I thought it would be useful to be able to pick any railway company and view the chain of companies that led to it and those that followed. The ultimate goal would be to be able to bring up the data for British Rail and then see the whole past unfold to the left and the future unravel to the right. In theory at least, Wikidata should allow me to do that.
No software coding skills are required to see the results of my experimentation: by clicking on the links provided (usually directly after the code) it is possible to run the queries and see what happens. However, using the code provided as a start, it is possible to build on the examples to find out things for yourself.
Understanding Wikidata and SPARQL
SPARQL is the query language used to retrieve various data sets from Wikidata via the Wikidata Query Service.
As is always the case with anything software related, the examples and tutorials never seem to handle those edge cases that you hit within the first 5 minutes. Maybe I hit these cases so soon by jumping straight from the “hello world” of requesting all the railway companies formed in the UK to trying to build the more complex web of railway companies, rather than working my way through all the simpler steps. However, my belief is to fail quickly, leaving plenty of time to fail some more before succeeding; after all, you never see a young child plan out a strategy when learning to get the different shaped blocks through the correct holes.
At the time of writing…
Comments about the state of certain items of data were accurate at the time I wrote this article. As one of the big features of Wikidata is that it is constantly being updated, expanded and corrected, the data referenced may have changed by the time you read this. Some of the changes are ones I have made in reaction to my discoveries, but I have left others out there for someone else to fix.
A simple list
First off, I created a simple SPARQL query to request all the railway companies that were formed in the UK.
SELECT ?company ?companyLabel WHERE {
  ?company wdt:P31 wd:Q249556;
           wdt:P17 wd:Q145 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (lcase(?companyLabel))
The output of this query can be seen by running it yourself here by clicking on the white-on-blue arrow displayed on the Wikidata Query Service console. It is safe to modify the query in the console without messing up my query as any changes cause a new bookmarked query to be created. So please experiment as that’s the only way to learn.
Now what does the query mean and where do all those magic numbers come from?
wdt:P31 means: match all Wikidata triples (wdt) that have the property instance of (P31) with a value of railway company (Q249556).
wdt:P17 means: keep only those results that have the property country (P17) set to United Kingdom (Q145).
Where did I get those numbers from? First, I went to Wikipedia and searched for a railway company, LMS Railway, and got to the page for London, Midland and Scottish Railway. From here I went to the Wikidata item for the page.
Wikipedia page for LMSR that shows how to get to the Wikidata
From here I hovered my pointer over instance of, railway company, country and United Kingdom to find out those magic numbers.
Wikidata page for LMSR
Some unexpected results
Some unexpected companies turned up in the results list due to my query not being specific enough. For example, the Algeciras Gibraltar Railway Company is located in Gibraltar but, with its headquarters registered in the UK, has its country recorded as United Kingdom. To filter my results down to just those located in the UK, I tried searching for those that had the located in the administrative territorial entity (P131) property set to any of the following values: England (Q21), Scotland (Q22), Wales (Q25) or Northern Ireland (Q26).
However, that dropped my result set from 228 to 25 due to not all the companies having that property set.
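One way that filter can be expressed, sketched here under the assumption that the values tried were the four countries of the UK (England Q21, Scotland Q22, Wales Q25 and Northern Ireland Q26), following the located-in chain upwards:

```sparql
SELECT ?company ?companyLabel WHERE {
  ?company wdt:P31 wd:Q249556;
           wdt:P17 wd:Q145 .
  VALUES ?ukCountry { wd:Q21 wd:Q22 wd:Q25 wd:Q26 }
  # Follow P131 chains upwards until a UK country is reached
  ?company wdt:P131/wdt:P131* ?ukCountry .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```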
Note: when trying to find out what values to use, it is often quick and easy to run a simple query to ask Wikidata itself. To find out the values for the UK countries, I wrote the following query, which asks for everything with an instance of value of country within the United Kingdom (Q3336843):
SELECT ?country ?countryLabel WHERE {
  ?country wdt:P31 wd:Q3336843 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
In order to see what other information could easily be displayed for the companies, I looked at the list of properties on the London, Midland and Scottish Railway. I saw several dates listed so decided that would be my next area of investigation. There is an inception (P571) date that shows when something came into being, so I tried a query with that:
SELECT ?company ?companyLabel ?inception WHERE {
  ?company wdt:P31 wd:Q249556;
           wdt:P17 wd:Q145 .
  ?company wdt:P571 ?inception .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (lcase(?companyLabel))
This demonstrated two big issues with the data. Firstly, the result set dropped from 228 to 106, indicating that not all the company entries have the inception property set. Secondly, only one company, Scottish North Eastern Railway, had a full date (29th July 1856) specified; the rest had only a year, which was displayed as 1st January of that year. Adding an OPTIONAL clause to the inception request returns the full data set, with blanks where no inception date is specified.
SELECT ?company ?companyLabel ?inception WHERE {
  ?company wdt:P31 wd:Q249556;
           wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P571 ?inception. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (lcase(?companyLabel))
Railway companies are not a straightforward case when it comes to a start date, as there is no one single start date. Each railway company required an Act of Parliament to officially enable it to be formed and to grant permission to build the railway line(s). This raises the question: is it the date that Act was passed, the date the company was actually formed, or the date the company commenced operating its service that should be used as the start date? Here is a revised query that gets both the start time (P580) and end time (P582) of the company if they have been set:
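A sketch of such a query, following the pattern of the earlier examples, with both properties wrapped in OPTIONAL so that companies without them still appear:

```sparql
SELECT ?company ?companyLabel ?startTime ?endTime WHERE {
  ?company wdt:P31 wd:Q249556;
           wdt:P17 wd:Q145 .
  OPTIONAL { ?company wdt:P580 ?startTime. }
  OPTIONAL { ?company wdt:P582 ?endTime. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (lcase(?companyLabel))
```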
Unfortunately, of the 228 results only one, London, Midland and Scottish Railway, has both a startTime and an endTime, and London and North Eastern Railway is the only other one with an endTime. Based on these results it looks like startTime and endTime are not generally used for railway companies. Looking through the data for Scottish North Eastern Railway did turn up a new source of end dates in the form of the dissolved, abolished or demolished (P576) property. Adding a search for this property resulted in 9 companies with dissolved dates.
There is no obvious pattern to which companies have this property: they range from Scottish North Eastern Railway, dissolved on 10th August 1866, to several that ended with the formation of British Railways, the more recent British Rail ending on 1st January 2001, and the short-lived National Express East Coast (1st January 2007 – 1st January 2009). However, once again the dates are at times misleading as, in the case of National Express East Coast, only the year rather than the full date is held in the inception and dissolved, abolished or demolished properties.
Some of the railway companies, such as Underground Electric Railways Company of London, have another source of dates: qualifiers attached to the railway company value of their instance of property. It is possible to extract those start and end dates, where present, by making use of nested conditional queries.
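A sketch of that kind of nested query, assuming the dates are stored as start time (P580) and end time (P582) qualifiers on the instance of (P31) statement:

```sparql
SELECT ?company ?companyLabel ?start ?end WHERE {
  ?company wdt:P17 wd:Q145 .
  # Walk into the instance of statement itself (p:/ps:)
  # so that its qualifiers (pq:) can be read
  ?company p:P31 ?statement .
  ?statement ps:P31 wd:Q249556 .
  OPTIONAL { ?statement pq:P580 ?start. }
  OPTIONAL { ?statement pq:P582 ?end. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```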
Another set of dates that can be used to work out the start and end of the companies can be found hanging off the values of a very useful pair of properties: replaced by (P1366) and replaces (P1365). This conveniently connects into the next part of my exploration, which will follow in Part Two. Although, as with many railway-related things, the exact time of arrival of Part Two cannot be confirmed.