Waste Wizards at CTC22

A write-up of progress at the March 2021 Environment-themed hack weekend.

What problem we were addressing?


The public have access to two free, easy accessible waste recycling and disposal methods. The first is “kerbside collection” where a bin lorry will drive close to almost every abode in the UK and crews will (in a variety of different ways) empty the various bins, receptacles, boxes and bags. The second is access to recycling centres, officially named Household Waste Recycling Centres (HWRCs) but more commonly known as the tip or the dump. These HWRCs are owned by councils or local authorities and the information about these is available on local government websites.


However, knowledge about this second option: the tips, the dumps, the HWRCs, is limited. One of the reasons for that is poor standardisation. Council A will label, map, or describe a centre one way; Council B will do it in a different way. There is a lot of perceived knowledge – “well everybody just looks at their council’s website, and everybody knows you can only use your council’s centres”. This is why at CTC22 we wanted to get all the data about HWRCs into a standard set format, and release it into the open for communities to keep it present and up to date. Then we’d use that data to produce a modern UI so that residents can actually get the information they require:

  • Which tips they can use?
  • When these dumps are open?
  • What can they take to these HWRCs?
  • “I have item x – where can I dispose of it?”

Our approach


There were six main tasks to complete:

  1. Get together a list of all the HWRCs in the UK
  2. Build an open data community page to be the centre point
  3. Bulk upload the HWRCs’ data to WikiData
  4. Manually enter the HWRCs into OpenStreetMap
  5. Create a website to show all the data
  6. Create a connection with OpenStreetMap so that users could use the website to update OSM.

What we built / did

All HWRCs are regulated by a nation’s environmental regulator:

  • For Scotland it is SEPA
  • For Northern Ireland it is NIEA
  • For Wales it is NRW
  • For England it is EA

A list of over 1,000 centres was collated from these four agencies. The data was of variable quality and inconsistent.


This information was added to a wiki page on Open Street Map – Household waste in the United Kingdom, along with some definitions to help the community navigate the overly complex nature of the waste industry.


From that the lists for Scotland, Wales and England were bulk uploaded to WikiData. The was achieved by processing the data in Jupiter Notebooks, from which formatted data was exported to be bulk uploaded via the Quick Statements tool. The NIEA dataset did not include geolocation information so future investigation will need to be done to add these before these too can be uploaded. A Wikidata query has been created to show progress on a map. At the time of writing 922 HWRCs are now in Wikidata.

Then the never-ending task of locating, updating, and committing the changes of each of the OSM locations was started.

To represent this data the team built a front-end UI with .NET Core and Leaflet.js that used Overpass Turbo to query OSM. Local Authority geolocation polygons were added to highlight the sites that a member of the public could access. By further querying the accepted waste streams the website is able to indicate which of those centres they can visit can accept the items they are wanting to recycle.

However, the tool is only as good as the data so to close the loop we added a “suggest a change” button that allowed users to post a note on that location on OpenStreetMap so the wider community can update that data.

We named the website OpenWasteMap and released it into the wild.

The github repo from CTC22 is open and available to access.

Pull requests are also welcome on the repo for OpenWasteMap.

What we will do next (or would do with more time/ funding etc)

The next task is to get all the data up-to-date and to keep it up to date; we are confident that we can do this because of the wonderful open data community. It would also be great if we could improve the current interface on the frontend for users to edit existing waste sites. Adding a single note to a map when suggesting a change could be replaced with an edit form with a list of fields we would like to see populated for HWRCs. Existing examples of excellent editing interfaces in the wild include healthsites.io which provides an element of gamification and completionism with a progress bar with how much data is populated for a particular location.

An example entry from Healthsites.io

Source: https://healthsites.io/map#!/locality/way/26794119

While working through the council websites it has become an issue that there is no standard set of terms for household items, and the list is not machine friendly. For example, a household fridge can be called:

  • Fridge
  • Fridge Freezer
  • WEEE
  • Large Domestic Electrical Appliance
  • Electric Appliance
  • White Good

A “fun” next task would be to come up with a taxonomy of terms that allows easier classification and understanding for both the user and the machine. Part of this would include matching “human readable” names to relevant OpenStreetMap tags. For example “glass” as an OSM tag would be “recycling:glass”


There are other waste sites that the public can used called Bring Banks / Recycling Points that are not run by Local Authorities that are more informal locations for recycling – these too should be added but there needs to be some consideration on how this information is maintained as their number could be tenfold that of HWRCs.

As we look into the future we must also anticipate the volume of data we may be able to get out of sources like OpenStreetMap and WikiData once well populated by the community. Starting out with a response time of mere milliseconds when querying a dozen points you created in a hackathon is a great start; but as a project grows the data size can spiral into megabytes and response times into seconds. With around 1,000 recycling centres in the UK and thousands more of the aforementioned Bring Banks this could be a lot of data to handle and serve up to the public in a presentable manner.

Using Wikidata to model Aberdeen’s Industrial Heritage

Saturday 6th March, 2021 was World Open Data Day. To mark this international event CTC ran a Wikidata Taster session. The objectives were to introduce attendees to Wikidata and how it works, and give them a few hours to familiarise themselves with how to add items, link items, and add images.

Presentation title screen
Presentation title screen

The theme of the session (to give it some structure and focus) was the Industrial Heritage of Aberdeen. More specifically the bygone industries of Aberdeen and, more specific still, the many Iron Foundries that once existed. I chose the specific topic as it is still relatively easy to spot the products of the industry on streets and pavements as we walk around the city, photograph those and add them to Wiki Commons, as I have been doing.

We had thirteen people book and eight turn up. After I gave a short presentation on how Wikidata operates we divided ourselves into three groups in breakout rooms. This was all on Zoom, of course, while we were still under lockdown.

The teams of attendees chose a foundry each: Barry, Henry & Cook Limited; Blaikie Brothers, and William McKinnon & Company Ltd. I’d already created an entry for John Duffus and Company in preparation for the event and to use as a model.

I’d also created a Google Sheet with a tab for each of the other thirteen foundries I’d identified (including those selected by the groups). I’d also spent quite a while trying to figure out how to access and search the old business and Post Office Directories for the city which had been digitised for 1824 to 1941. I eventually I built myself a tool, which I shared with the teams, which generated an URL for a specific search term for a certain directory. They used this, as well as other sources, to identify key dates, addresses and name changes of businesses.

By the end of the session our teams had created items for

They had also created items for foundry buildings – linked to Canmore etc, as well as founders. We enhanced these with places of their burial, portraits and images of gravestones. I took further photos which I uploaded to Commons and linked the following Monday. I created two Wikidata queries to show the businesses added, and the founders who created the businesses.

The statistics for the 3 hour session (although some worked into the afternoon and even the next day) are impressive. You can see more detail on the event dashboard.

We received positive feedback from the attendees who have been able to take their first steps towards using Wikidata as a public linked open data for heritage items.

I hope that the attendees will keep working on the iron founders until we have all of these represented on Wikidata. Next we can tackle shipbuilders and the granite industry!

Nautical Wrecks

This is project started as part of CTC21: Put Your City on the Map which ran Saturday 28th Nov 2020 and Sunday 29th Nov 2020. You can find our code on Github.

There are thousands of ship wrecks off the coast of Scotland which can be seen on Marine Scotland’s website

Marine Scotland map of wrecks

In Wikidata the position was quite different with only a few wrecks being logged. The information for the image below was derived from running the following query in Wikidata https://w.wiki/nDt

Initial map of Wikidata shipwrecks

Day one – sourcing the information of the wrecks. 

The project started by research various website to obtain the raw data required. Maps with shipwrecks plotted were found but finding the underlying data source was not so easy.

Data on Marine Scotland, Aberdeenshire Council’s website and on the Canmore website were considered. 

Once data was found, the next stage was finding out the licensing rights and whether or not the data could be downloaded and legitimately reused. The data found on Canmore’s website indicated that it was provided under an Open Government Licence hence could be uploaded to Wikidata. This is the data source which was then used on day two of the project. 

A training session on how to use Wikidata was also required on day one to allow the team to understand how to upload the data to Wikidata and how the identifiers etc worked.

Day two – cleaning and uploaded the data to Wikidata. 

Deciding on the identifiers to use in Wikidata was the starting point, then the data had to be cleaned and manipulated. This involved translating Easting and Northings coordinates to latitude and longitude, matching the ship types between the Canmore file and Wikidata, extracting the reference to the ship from Canmore’s URL and general overall common sense review of the data. To aid with this work a Python script was created. It produced a tab separated file with the necessary statements to upload to Wikidata via Quickstatements. 

A screenshot of the output text file.

The team members were new to Wikidata and were unable to create batch uploads as they didn’t have 4 days since creating their accounts and 50 manual edits to their credit – a safeguard to stop new accounts creating scripts to do damage. 

We asked Ian from Code The City to assist, as he has a long editing history. He continues this blog post. 

Next steps

I downloaded the output.txt file and checked if it could be uploaded straight to Quickstatements. It looked like there were minor problems with the text encoding of strings. So I imported the file into Google Docs. There, I ensured that the Label, Description and Canmore links were surrounded in double quotation marks. A quick find and replace did this. 

I tested an upload of five or six entries and these all ran smoothly. I then did several hundred. That turned up some errors. I spotted loads of ships with the label “unknown” and every wreck had the same description. I returned to the Python script and tweaked it to concatenate the word “Unknown” with a Canmore ID. This fixed the problem. I also had to create a checking method of seeing if our ship had already been uploaded. I did this by downloading all the matching Canmore IDs for successfully uploaded ships. I then filtered these out before re-creating the output.txt file. 

I then generated the bulk of the 24,185 to be uploaded.  I noticed a fairly high error rate. This was due to a similar issue to the Unknown-named ships. The output.txt script was trying to upload multiple ships with the same names (e.g. over 50 ships with the name Hope). I solved this in the same manner as with Unknown-named wrecks, concatenating ship names with “Canmore nnnnnn.”

I prepared this even as the bulk upload was running. Filtering out the recently uploaded ships and re-running the creation of the Output.txt file meant that within a few minutes I was able to have the corrective upload ready. Running this a final time resulted in all shipwrecks being added to WIkidata, albeit with some issues to fix. This had taken about a day to run, refine and rerun. 

The following day I set out to refine the quality of the data. The names of shipwrecks had been left in sentence case: an initial capital and everything else in lower case. I downloaded a CSV of records we’d created, and changed the Labels to Proper Case. I also took the opportunity to amend the descriptions to reflect the provenance of the records from Canmore in the description of each. I set one browser the task of changing Labels, and another the change to descriptions. This was 24,185 changes each – and took many hours to run. I noticed several hundred failed updates – which appear to just be “The save has failed” messages. I checked those and reran them. Having no means of exporting errors from Quickstatements (that I know of) makes fixing errors more difficult than it should be.

Finally I noticed by chance that a good number of records (estimated at 400) are not shipwrecks at all but wrecks of aircraft. Most, if not all, are prefixed “A/C’ in the label.

I created a batch to remove statements for ships and shipwrecks and to add statements saying that these are instances of crash sites. I also scripted the change to descriptions identifying these as aircraft wrecks rather than ship wrecks.

This query https://w.wiki/pjA now identifies and maps all aircraft wrecks.

aircraft wrecks uploaded from Canmore
All aircraft wrecks uploaded from Canmore

This query https://w.wiki/pSy maps all shipwrecks

the location of all shipwrecks uploaded to Wikidata from Canmore.
The location of all shipwrecks uploaded to Wikidata from Canmore.

Next steps?

I’ve noted the following things that the team could do to enhanced and refine the data further:

  • Check what other data is available by download or scraping from Canmore (such as date of sinking, depth, dimensions) and add that to the wikidata records
  • Attempt to reconcile data uploaded from Aberdeen built ships at CTC19 with these wrecks – there may be quite a few to be merged

Finally, in the process of working on the cleaning of this uploaded data I noticed the the data model on Wikidata to support this is not well structured.

This was what I sketched out as I attempted to understand it.

The confusing data model in Wikidata
A confusing data model

Before I changed the aircraft wrecks to “crash site” I merged the two items which works with the queries above. But this needs more work.

  • Should the remains of a crashed aircraft be something other than a crash site? The latter could be cleared of debris and still be the crash site. The term Shipwreck more clearly describes where a wreck is whether buried, on land, or beneath the sea.
  • Why is a shipwreck a facet of a ship, but a crash site is a subclass of aircraft.
  • And Disaster Remains seems like the wrong term for what might be a non-disastrous event (say if a ship from the middle ages gently settled into mud over the centuries and was forgotten about – and certainly isn’t a subclass of Conservation Status, anyway.

I’d be happy to work with anyone else on better working out an ontology for this.

Swift use of Doric Place Names

Introduction

One of the Code the City 21 projects was looking at providing Scots translations of Aberdeenshire place names for displaying on an OpenStreetMap map. Part of the outcomes for that project included a list of translated places names and potentially an audio version of name to guide in pronunciation.

I’m a firm believer that Open Data shouldn’t just become “dusty data left on the digital shelf” and to “show don’t tell”. This led me to decide to show just how easy it is to do something with the data created as part of the weekend’s activities and to make use of outcomes from a previous CTC event (Aberdeenshire Settlements on Wikidata and Wikipedia) and thus take that data off the digital shelf.

My plan was to build a simple iOS app, using SwiftUI, that would allow the following:

  • Listing of place names in English and their Scots translation
  • View details about a place including its translation, location and photo
  • Map showing all the places and indicating if a translation exists or not

I used SwiftUI as it is fun (always an important consideration) to play with and quick to get visible results. It also provides the future option to run the app as a Mac desktop app.

Playing along at home

Anyone with a Mac running at least Catalina (macOS 10.15) can install Xcode 12 and run the app on the Simulator. The source code can be found in GitHub.

Getting the source data

Knowing that work had previously been done on populating Wikidata with a list of Aberdeenshire Settlements and providing photos for them, I turned to Wikidata for sourcing the data to use in the app.

# Get list of places in Aberdeenshire, name in English and Scots, single image, lat and long

 
SELECT  ?place (SAMPLE(?place_EN) as ?place_EN) (SAMPLE(?place_SCO) as ?place_SCO) (SAMPLE(?image) as ?image) (SAMPLE(?longitude) as ?longitude)  (SAMPLE(?latitude) as ?latitude)
  WHERE {
    ?place wdt:P31/wdt:P279* wd:Q486972 .
    ?place wdt:P131 wd:Q189912 .
    ?place p:P625 ?coordinate.
    ?coordinate psv:P625 ?coordinate_node .
    ?coordinate_node wikibase:geoLongitude ?longitude .
    ?coordinate_node wikibase:geoLatitude ?latitude .
    OPTIONAL { ?place wdt:P18 ?image }.
    OPTIONAL { ?place rdfs:label ?place_EN filter (lang(?place_EN) = "en" )}.
    OPTIONAL { ?place rdfs:label ?place_SCO filter (lang(?place_SCO) = "sco" )}.
    }
GROUP BY ?place
ORDER By ?place_EN

The query can be found in the CTC21 Doric Tiles GitHub repository and run via the Wikidata Query Service.

The query returned a dataset that consisted of:

  • Place name in English
  • Place name in Scots (if it exists)
  • Single image for the place (some places have multiple images so had to be restricted to single image)
  • Latitude of place
  • Longitude of place

Just requesting the coordinate for each place resulted in a text string, such as Point(-2.63004 57.5583), which complicated the use later on. Adding the relevant code

?coordinate psv:P625 ?coordinate_node .
?coordinate_node wikibase:geoLongitude ?longitude .
?coordinate_node wikibase:geoLatitude ?latitude .

to the query to generate latitude and longitude values simplified the data reuse at the next stage.

The results returned by the query were exported as a JSON file that could be dropped straight into the Xcode project.

The App

SwiftUI allows data driven apps to be quickly pulled together. The data powering the app was a collection of Place structures populated with the contents of the JSON exported from Wikidata.

struct Place: Codable, Identifiable {
     let place: String
     let place_EN: String
     let place_SCO: String?
     let image: String?
     var latitude: String
     var longitude: String
     
     // Computed Property
     var id: String { return place }
     var location: CLLocationCoordinate2D {
         CLLocationCoordinate2D(latitude: Double(latitude)!, longitude: Double(longitude)!)
     }
 }

The app itself was split into three parts: Places list, Map, Settings. The Places list drills down to a Place details view.

List view of Places showing English and Scots translation.
List of places in English and their Scots translation if included in the data
Details view showing place name, photo, translation and map.
Details screen about a place
Map showing places and indication if they have been translated into Scots or not.
Map showing places and indicating if they have Scots translation (yellow) or not (red)

The Settings screen just displays some about information and where the data came from. It acts partially as a placeholder for now with the room to expand as the app evolves.

Next Steps

The app created over the weekend was very much a proof of concept and so has room from many improvements. The list includes:

  • Caching the location photos on the device
  • Displaying additional information about the place
  • Adding search to the list and map
  • Adding audio pronunciation of name (the related Doric Tiles project did not achieve adding of audio during the CT21 event)
  • Modified to run on Mac desktop
  • Ability to requested updated list of places and translations

The final item on the above list, the ability to request an updated list of places, in theory is straight forward. All that would be required is to send the query to the Wikidata Query Service and process the results within the app. The problem is that the query takes a long time to run (nearly 45 seconds) and there may be timeout issues before the results arrive.

Mapping Memorials to Women in Aberdeen

This project, which was part of CTC20,  grew from a WMUK / Archaeology Scotland join project carried out by Scottish Graduate School of Arts & Humanities intern Roberta Leotta during lockdown 2020. More details about the background to the project can be found here.

It’s often touted that there are some cities in Scotland (coughEdinburghcough) where there are more statues to animals than there are to women. In my own work transferring OpenPlaques data to Wikidata I’ve observed that there are more entries for Charles Rennie Macintosh than there for women in Glasgow. So in this light, it’s somewhat refreshing to work on a project that celebrates all kinds of memorials to women in Scotland.

The Women of Scotland: Mapping Memorials project began in 2010 as a joint project between Glasgow Women’s Library, and Women’s History Scotland. It’s similar in many ways to OpenPlaques, but using Wikidata could add an extra dimension – let’s increase the coverage of women’s history and culture on the Wikimedia projects by getting these memorials and the women they celebrate into Wikidata, use that to identify gaps in knowledge, and then work to fill the gap.

Over the two days, here’s what we did:

Data collection

We scraped the initial list of data from Mapping Memorials website manually, and created a shared worksheet based on a model that’s been used previously for other cities. (The manual process is slow, and a bit fiddly, and is the one thing that I wouldn’t do again. We’re in contact with the admin so going forward, I’m hopeful that we wouldn’t need to repeat this step in the future.)

Once we had this list, we could create a more automated process to deal with gathering the other pieces of information we needed to create new, good quality Wikidata items, although some (description, for example) needed a human eye.

Wikidata identifiers

We were using two main identifiers on Wikidata – P8048 (Women of Scotland memorial ID) and P8050 (Women of Scotland subject ID). The former for the entries to the memorials themselves, and the latter for the women they celebrate. Where the women didn’t have entries, we could create those, and then link them to the entries for the memorials.

Both identifiers use the last part of the URL for each entry on the Mapping Memorials site, so that was fairly easy to do in Google docs. Once we had that info, it’s an easy enough step to bulk-create items either using Quickstatements or Wikibase CLI.

Creating items & avoiding duplicates

There’s a plug in for Google Sheets called Wikipedia and Wikidata Tools which has some useful features for projects like this – WikidataQID for looking up whether something already exists on Wikidata, and WikidataFacts, which tells you what that item is. The former is ok if you have an exact match, the latter is really useful for flagging anything which might lead to a disambiguation page, for example.

Ultimately we did end up with a few duplicates that needed to be merged, but this was pretty easily managed, and it really showed how useful it is to have local knowledge involved in local projects – there were a couple of sets of coordinates that were obviously wrong, but also some errors that wouldn’t have been spotted by someone unfamiliar with the area.

Coordinates and dates

I really like Quickstatements, but there are a few areas in which it’s fiddly, including coordinates and dates. I’m really interested in looking further into Wikibase CLI for dates in particular, as the process there for dates (documented here) looks to be substantially easier in terms of data prep than it does in Quickstatements. Many thanks to Tony for that work, as his expertise saved us a lot of time! He also used that tool to create items for those women commemorated who were missing from Wikidata, documented here.

As with dates, coordinates are entered into Quickstatements in a different format than that which you’d use manually inside Wikidata itself, hence the formatting you’ll see in column Q on the Data collection tab. Most of this we had to grab from Google Maps, which again is a bit fiddly.

Quickstatements

Once we had a master list of QIDs for the memorials we were working with, we could use Quickstatements to bulk upload sets of statements to those items.

For example, matching the memorials to the women commemorated, using this format:

Screenshot of a spreadsheet showing QID for memorials and the women they commemorate
Screenshot of a spreadsheet showing QID for memorials and the women they commemorate

The Q numbers on the left are those of the memorials, P547 is “commemorates”, and the Q numbers on the right are those of the women celebrated. We were also able to add P8050 (Women of Scotland subject ID) to some women who already had entries on Wikidata, but no WoS ID.

Screenshot of a spreadsheet showing each memorial QID and its type
Screenshot of a spreadsheet showing each memorial QID and its type

The Q number on the left again is the memorial, P31 is “instance of”, and the Q number on the right corresponds to a type of thing – a commemorative plaque, a garden, or a road, for example.

Once you’ve got the info in this format, it’s just a case of copy & pasting into QS, clicking import, and then run. (Note – you do need to be an autoconfirmed user to use QS, which means that your account must be at least 4 days old, and having more than 50 edits.) It’s relatively easy, and I was pleased that one of our relatively-new-to-Wikidata participants had the chance to make her first bulk uploads (description & commons category) using the tool over the weekend.

Photos

This project grew out of a desire to increase the coverage of Scottish heritage on Wikimedia Commons, so it was great to take some time on this. Mapping Memorials does have some images, but they’re not openly licensed, and others are missing. After Wikimedia Commons, our next port of call was Geograph, where many images have been released on Wiki-compatible Creative Commons licenses. Using Geograph2Commons, images can easily be transferred over to Wikimedia Commons, so that they can be used in any Wikimedia Project. Geograph also links to this feature from their site – click on “Find out how to reuse this image”, and then scroll down to “Wikipedia template for image page”, then click on the “geograph2commons” link. Really simple. Our group did some detective work for images, and then added them to Commons, and linked them manually to the Wikidata item.

This gave us a list of missing images… which is fine, but wouldn’t it be better to see them on a map?

Visualisation and filling the gaps

Thanks to Ian’s tutorial on how to create a custom WikiShootMe map, we were able to create a custom map that showed us which of the memorials we were working on had images, which didn’t, and where they were. That map is here, and it was great to see it slowly turn more green than red over the weekend as we found more images, or as volunteers headed out across Aberdeen between days to take missing pictures.

A screenshot of a clickable map where people can upload photos of monuments
A screenshot of a clickable map where people can upload photos of monuments

One of the small, but very satisfying, things you can do with these kinds of images is to integrate them into relevant Wikipedia articles. I added images from the project to the articles for Aberdeen Town House, Caroline Phillips, and Katherine Grainger. At the time of writing, around 2500 people have viewed those articles since I added the images.

Next steps

Over the course of the weekend we added 77 new memorials, and 26 new women to Wikidata, as well as a whole host of new photos. These entries all had some quite rich data, and as complete as we could make it.

We were surprised to see some of the individuals who didn’t have a Wikipedia article – and of course, we can use the Wikidata query service to identify those gaps. The queries below could give us a great starting point for an editathon, or indeed, for any Wikipedia editor interested in writing Women’s biography.

  • Wikidata query for women with a Women of Scotland subject ID, a memorial in Aberdeen, but no enwiki article: https://w.wiki/YVH
  • Wikidata query for women with a Women of Scotland subject ID, but no enwiki article: https://w.wiki/YVG

Huge thanks to the team, and to Code the City for another great hack weekend!

Dr Sara Thomas
Scotland Programme Coordinator, Wikimedia UK

——————————————————————————

Header image: The Grave of Jessie Seymour Irvine by Ian Watt on Wiki Commons  (CC-BY-SA)