The current situation

  • Weds 11th November 2020 – Aberdeen Python User Group. “Using Streamlit.Io to create interactive Machine Learning applications in Python” by Dr Eyad Elyan of RGU. More details and tickets here.
  • Sat 28 – Sun 29 November 2020, CTC21: Put your City on the Map, a hack weekend of learning and using geospatial tools and data. Learn new skills, contribute to OpenStreetMap, digitise old maps and much more! More details. Tickets.
  • Weds 9th December 2020 – Aberdeen Python User Group. “Data Visualisation in Python.” More details and tickets here
  • Weds 13th Jan 2021 – Aberdeen Python User Group. More details soon.

Note – all of our planned events are currently being run fully online. Everything we do is free to attend. Tickets for events have a suggested donation of £5 to help with charity running costs but this should not be a barrier to anyone attending.

To get advance notice of our events, and make sure of a place, why not sign up for our monthly, spam-free mailing list?

Do we know if we know what Open Data is?

A guest post by Karen Jewell, a Data Scientist who attended SODU2020

I went into the weekend of SODU, headset and coffee at the ready, thinking that as SODU was both my first experience of an unconference and of Open Data, I wouldn’t be able to participate much but that I could take the opportunity to learn from the brighter and more informed voices around me. Well, it turns out I was quite wrong about my involvement with Open Data.

In the networking sessions of the first day, I introduced myself as someone who didn’t work at all with Open Data, had no experience of it and was here to learn about it. Yet as the event carried on through the day, many discussions and concepts seemed familiar to me and in the afternoon of the first day I had that “ah-hah!” moment. I realised it wasn’t true that I did not work with Open Data: I did, and had actually done so quite a bit in the last 12 months. I just had not realised that that is what it was called.

Open Data is data which is not owned or controlled, and is free for use and distribution. Having completed my MSc in Data Science at the Robert Gordon University only 3 weeks earlier, I knew that free data had been pretty critical to my work as a student. Not only was I able to practice concepts using freely available datasets, 3 of my 8 taught modules required me to source my own dataset for that module’s assessment. To rephrase that, I needed Open Data to complete my degree. Data Scientists are aware of Kaggle and the UCI ML repository, and a quick online search for Scotland’s data will return the Scottish Government’s statistics portal. We see these sources as free data we can practice on, but we may not have recognised them as Open Data. I certainly didn’t until SODU took my blinkers off.

Coming out of SODU, I started to wonder how many other people were in the same metaphorical boat. Were they not answering the call for involvement because they did not realise the availability of Open Data affected them too? To test the idea, I set up a non-scientific survey on Instagram and asked my peers the question “Do you know what Open Data is?” with a simple “Yes/No” response. Of the 22 persons who responded, 3 said Yes (14%) and 19 said No (86%). In a perfect world, I would have also had a follow-up question asking if they had used information from a list of known Open Data sources to confirm the theory, but we will have to do without for now.

Quick Poll

Yet in the age of Covid-19 where everyone is quite capable of quoting a statistic or method in every online argument for and against, how many of us haven’t realised we are benefiting from the availability of Open Data when we quote new case counts, % positive tests, and infection rates in our conversations on a daily basis?

I attended SODU to learn about Open Data, and I learnt I’d actually been using it all along. Several prominent themes discussed at SODU included the need for a community of practitioners, having a central point of access, and having evidence of the benefits of supporting Open Data. The question that bugs me now is, how do we know who our practitioners are and where our success stories are, if they can’t even recognise themselves? Maybe there is an opportunity to do some work here?

 

Ten Years After

“Hear me calling, hear me calling loud, 
If you don’t come soon, I’ll be wearing a shroud.” – Ten Years After (1969)

Introduction

Today marks the tenth anniversary of my involvement with Open Data in Scotland. As I wrote here, back in 2009-2010 I’d been following the work that Chris Taggart and others were doing with open data, and was inspired by them to  create what I now believe to have been the first open data published in the public sector in Scotland.

This piece is a reflection of my own views. These views may be the same as those held by colleagues at Code The City or indeed on the civic side of the Open Government Partnership. I’ve not specifically asked other individuals in either group.

While my involvement in, and championing of, open data in Scotland is now a decade long, my enthusiasm for the subject, and for the social and economic benefits it can deliver, is undiminished by my leaving the public sector in 2017 after thirty-four years. In fact the opposite is true: the more I am involved in the OD movement, and study what is being achieved beyond Scotland’s narrow borders, the more I am convinced that we are a country intent on squandering a rich opportunity, regardless of our politicians’ public pronouncements.

But the journey has not been easy, primarily due to a lack of direction from Scottish Government and little commitment, resource or engagement at all levels of public service. A friend who reviewed this blog post suggested that I should replace the picture of a birthday cake (above) with one of a naked human back bearing bleeding scars from our battles. He’s right – it is STILL a battle ten years on.

It is not as if the position in Scotland is getting better. We are moving at a glacial pace. The gap between Scotland and other countries in this regard is widening. I gave a talk earlier this year in which I showed assessments of Scotland, Romania and Kenya’s performance in Open Government (source: https://www.opengovpartnership.org/campaigns/global-report/ Vol 2) and asked the audience to identify which was Scotland.

Extracts from Vol 2 of the Open Government Partnership global report (https://www.opengovpartnership.org/campaigns/global-report/)


Question: Which is Scotland? (Answer)

Economic opportunities

In February 2020 the European Data Portal published a report – The Economic Impact of Open Data – which sets out a clear economic case for open data. That paper looks at 15 previous studies, published between 1999 and 2020, which examined the market size of open data at national and international levels, measured as a share of the GDP of each study’s geographical area.

Taking the median and average values from those reports (1.19% and 1.33% respectively) and an estimated GDP for Scotland (2018) of £170.4bn we can see that the missed opportunity for Scotland is of the order of £2.027bn to £2.266bn per annum. What is the actual value of the local market created by Scottish-created open data? If pushed for a figure I would estimate that it is currently worth a few hundred thousand pounds per annum, and no more. Quite a gap!
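The arithmetic behind that range can be reproduced in a few lines. This is only a sketch using the three figures quoted above (the 1.19% median, the 1.33% average and the £170.4bn GDP estimate); the variable and function names are illustrative and do not come from the report.

```python
# Figures quoted in the text above; the names are illustrative only.
SCOTLAND_GDP_2018_BN = 170.4  # estimated GDP for Scotland (2018), in £bn
MEDIAN_SHARE = 0.0119         # median open data market size as a share of GDP
AVERAGE_SHARE = 0.0133        # average open data market size as a share of GDP


def market_size_bn(gdp_bn: float, share: float) -> float:
    """Return the implied open data market size in £bn."""
    return gdp_bn * share


low = market_size_bn(SCOTLAND_GDP_2018_BN, MEDIAN_SHARE)
high = market_size_bn(SCOTLAND_GDP_2018_BN, AVERAGE_SHARE)
print(f"Implied market size: £{low:.2f}bn to £{high:.2f}bn per annum")
```

Rounded to two decimal places this comes out at roughly £2.03bn to £2.27bn, in line with the range quoted above.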

Meantime we have the usual suspects, the consultants, whispering sweetly in the ears of ministers, senior civil servants and council bosses that we should be monetising data, creating markets, selling it. There will be no mention, I suspect, of the heavily-subsidised, private sector led, yet failed Copenhagen Data Exchange. (Maybe they can make a few bob back selling the domain name!)

You can buy the failed CityDataExchange.com for just $5195

While this commercial approach to data may plug small gaps in annual funding for Scotland, and line the pockets of some big companies in the process, it won’t deliver financial benefits at a national level of anything like the figures suggested by that EU Data Portal report. It will, however, actively hamper innovation and inhibit societal benefits.

I hear lots of institutions saying “we need to sell data” or “we need to sell access rights to these photos” or similar. Yet, in so many cases, the cost of operating the mechanisms of control (the staffing, administration, payment processing etc.) far outstrips any generated income. When I challenged ex-colleagues in local government about this behaviour their response was “but our managers want to see an income line”, to which we could add “no matter how much it is costing us.” And this tweet from The Ferret on Tuesday of this week is another excellent example of this!

I have also heard lots of political proclamations of “open and transparent” government in Scotland since 2014. Yet most of the evidence points in exactly the opposite direction. Don’t forget, when Covid-19 struck, Scotland’s government was reportedly the only political administration apart from Bolsonaro’s far-right one in Brazil to use the opportunity to limit Freedom of Information.

Openness, really?

It is clear that there is little or no commitment to open data in any meaningful way at a Scottish Government level, in local authorities, or among national agencies. This is not to say that there aren’t civil servants who are doing their best, often fighting against the actions of politicians or senior administrators. Public declarations are rarely matched by delivery of anything of substance, and conversations with people in those agencies (of which I have had many) paint a grim picture of political masters saying one thing and doing another, of senior management not backing up public statements of intent with the necessary resource commitment and, on more than one occasion, suggestions of bad actors actually going against what is official policy.

I mention below that I joined the Open Government Partnership late in 2019. Initially I was enthusiastic about what we might achieve. While there are civil servants working dedicatedly on open government who want to make it work, I am unconvinced about political commitment to it. We really need to get some positive and practical demonstration that Scottish Government are behind us – otherwise I and the other civil society representatives are just assisting in an open-washing exercise.

In my view (and that of others) the press in Scotland does not provide adequate scrutiny and challenge of government. We have a remarkably ineffective political opposition. We also have a network of agencies and quangos which are reliant on the Scottish Government for funding who are unwilling to push back. All of this gives the political side a free pass to spout encouraging words of “open and transparent” yet do the minimum at all times.

We may have an existing Open Data Strategy for Scotland (2015) stating that Scotland’s data is “open by default”, yet my 2019 calculation was that over 95% of the data that could and should be open was still locked up. And there is little movement on fixing that.

We have many examples of agencies doing one thing and saying another, such as  Scottish Enterprise extolling the virtues of  Open Data yet producing none. Its one API has been broken for many months, I am told.

My good friends at The Data Lab do amazing work on funding MSc and PhD places, and providing funding for industrial research in the application of data science. Their mission is “to help Scotland maximise value from data …” yet they currently offer no guidance on open data, no targeted programme of support, no championing of open data at all, despite the widely-accepted economic advantages which it can deliver. There is the potential for The Data Lab to lead on how Scotland makes the most of open data and to guide government thinking on this!

All of this is not to pick on specific organisations, or hard working and dedicated employees within them. But it does highlight systemic failures in Scotland from the top of government downwards.

Fixing this is an enormous task: one which can only be done by the development of a fresh strategy for open data in Scotland, which is mandated for all public sector bodies, is funded as an investment (recognising the economic potential), and which is rigorously monitored and enforced.

I could go on…. but let’s look at this year’s survey.

(skip to summary)

Another year with what to show for it?

In February 2019 I conducted a survey of the state of open data in Scotland. It didn’t paint an encouraging picture. The data behind that survey has been preserved here. A year on, I started thinking about repeating the review.

In the intervening year I’d been involved in quite a bit of activity around open data. I had

  • joined the civic side of the Open Government group for Scotland and was asked to lead for the next iteration of the plan on Commitment Three (sharing information and data),
  • joined the steering group of Stirling University’s research project, Data Commons Scotland,
  • trained as a trainer for Wikimedia UK, delivering training in Wikidata, Wikipedia and Wiki Commons, and running multiple sessions for Code The City with a focus on Wikidata,
  • created an open Slack Group  for the open data community in Scotland to engage with one another,
  • created an Open Data Scotland twitter account which has gained almost 500 followers, and
  • initiated the first Scottish Open Data Unconference (SODU) 2020 which had been scheduled to take place as a physical event in March this year. That has now been reconfigured as an online unconference which will happen on 5th and 6th September 2020.

In restarting this year’s review of open data publishing in Scotland my aims were to see what had changed in the intervening 12 months and to increase the coverage of the survey: going broader and deeper and developing an even more accurate picture. That work spilled into March at which point Covid-19 struck. During lockdown I was distracted by various pieces of work. It wasn’t until August, and with a growing sense of the imminence of this 10-year anniversary, that I was galvanised to finish that review.

I am conscious that the methodology employed here is not the cleverest – one person counting only the numbers of datasets produced.  This is something I return to later.

The picture in 2020

I broke the review down into sectoral groupings to make it more manageable to conduct. By sticking to that I hope to make this overview more readable. The updated GitHub repo in which I noted my findings is available publicly, and I encourage anyone who spots errors or omissions to make a pull request to correct them. Each heading below has a link to the GitHub page for the research.

Overall there is little significant positive change. This is one factor which gives rise to concerns about government’s commitment to openness generally and open data specifically; and to a growing cynicism in the civic community about where we go from here.

Local Government

(Source data here)

I reviewed this area in February 2020 and rechecked it in August.  Sadly there has been no significant change in the publication of open data by local government in the eighteen months since I last reviewed this. More than a third of councils (13 out of a total of 32) still make no open data provision.

While the big gain is that Renfrewshire Council have launched a new data portal with over fifty datasets, most councils have shown little or no change.

Sadly the Highland Council portal, procured as part of the Scottish Cities Alliance’s Data Cluster programme at a cost of tens of thousands of pounds, has vanished. I don’t think it ever saw a dataset being added to it. Searching Highland Council’s website for open data finds nothing.

While big numbers of data sets don’t mean much by themselves, the City of Edinburgh Council has a mighty 236 datasets. Brilliant! BUT … none of them are remotely current. The last update to any of them was September 2019. Over 90% of them haven’t been updated since 2016 or earlier.
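A freshness check of this kind is easy to automate once the last-updated dates are known. The following is a minimal sketch, not the method used in this review: the function name is hypothetical and the dates are made up for illustration rather than taken from the Edinburgh portal.

```python
from datetime import date


def share_stale(last_updated: list[date], cutoff: date) -> float:
    """Return the fraction of datasets last updated before the cutoff date."""
    if not last_updated:
        return 0.0
    stale = sum(1 for updated in last_updated if updated < cutoff)
    return stale / len(last_updated)


# Made-up last-updated dates, for illustration only (not real portal data).
example_dates = [
    date(2015, 6, 1),
    date(2016, 3, 12),
    date(2016, 11, 30),
    date(2019, 9, 2),
]

# Anything last touched before 1 January 2017 counts as "2016 or earlier".
stale_share = share_stale(example_dates, date(2017, 1, 1))
print(f"{stale_share:.0%} of datasets not updated since 2016 or earlier")
```

Reporting a figure like this alongside the raw dataset count gives a fairer picture of a portal than the count alone.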

Similarly, Glasgow, which has 95 datasets listed, has a portal which is repeatedly offline for days at a time. A portal which won’t load is useless.

Dundee, Perth and Stirling continue to do well. Their offerings are growing and they demonstrate commitment to the long-haul.

Aberdeen launched a portal, more than three years in the planning, populated it with 16 datasets and immediately let their open data officer leave at the end of a short-term contract. Some of their datasets are interesting and useful – but there was no consultation with the local data community about what they would find useful, or deliver benefits locally; all despite multiple invitations from me to interact with that community at the local data meet-ups which I was running in the city.

It was hoped that the programme under the Scottish Cities Alliance would yield uniform datasets, prioritised across all seven Scottish cities, but there is no sign of that happening, sadly. So what you find on all portals or platforms is pretty much a pot-luck draw.

Where common standards exist – such as the 360 Giving standard for the publication of support for charities – organisations should be universally adopting these. Yet this is only used by two of 32 authorities, all of whom have grant-making services. Surely, during a pandemic especially,  it would be advantageous to funders and recipients to know who is funding which body to deliver what project?

Councils – Open Government Licence and RPSI

This is a slight aside from the publication of open data, but an important one. If the Scottish Authorities were to adopt an OGL approach to the publication of data and information on their website (as both the Scottish Government’s core site and the Information Commissioner for Scotland do) then we would be able to at least reuse data obtained from those sites. This is not a replacement for publishing proper open data but it would be a tiny step forward.

The table below (source and review data here)  shows the current permissions to reuse the content of Scottish Local Authorities’ websites. Many are lacking in clarity, have messy wording, are vague or misunderstand terminologies. They also, in the main, ignore legislation on fair re-use.

Table of local authority adoption of OGL and RPSI

Open Government Licence

The Scottish Government’s own site is excellent and clear: permitting all content except logos to be reused under the Open Government Licence. This is not true for local authorities. At present only Falkirk and Orkney Councils – two of the smaller ones – allow and promote OGL re-use of content. There is no good reason why all of the public sector, including local government, should not be compelled to adopt the terms of OGL.

Re-use of Public Sector Information (RPSI) Regulations

Since 2015 the public sector has been obliged by the RPSI Regulations to permit reasonable reuse of information held by local authorities. So, even if Scottish LAs have not yet adopted OGL for all website content, they should have been making it clear for the last five years how a citizen can re-use their data and information from their website.

In my latest trawl through the T&Cs and Copyright Statements of 32 Scottish Local Authorities, I found only 7 referencing RPSI rights there, with 25 not doing so (see the full table above). I am fairly sure that these authorities are breaking the legal obligation on public bodies to provide that information.

Finally, given the presence of COSLA on the Open Government Scotland steering group, the situation with no open data; poor, missing or outdated data; and OGL and RPSI issues needs to be raised there and some reassurance sought that they will work with their member organisations to fix these issues.

Health

(Source data here)

The NHS Scotland Open Data platform continues to be developed as a very useful resource. The number of datasets  there has more than doubled since last year (from 26 to 73).

None of the fourteen Health Boards publish their own open data beyond what is on the NHS Scotland portal.

Only one of the thirty Health and Social Care Partnerships (HSCPs) publishes anything resembling open data: Angus HSCP.

COVID-19 and open data

While we are on health, I wrote (here and here) early in the pandemic about the need for open data to help the better public understanding of the situation, and stimulate innovative responses to the crisis. The statistics team at Scottish Government responded well to this and we’ve started to develop a good relationship. I’ve not followed that up with a retrospective about what did happen. Perhaps I will in time.

It was clear that the need for open data in the Covid-19 situation caught government and the health sector napping. The response was slower than it should have been and patchy, and there are still gaps. People find it difficult to locate data when it is on multiple platforms, spread across Scottish Government, Health and NRS. That is, in a microcosm, one of the real challenges of OD in Scotland.

With an open Slack group for Open Data Scotland there is a direct channel that data providers could use to engage the open data community on their plans and proposals. They could also sound out what data analysts and dataviz specialists would find useful. That opportunity was not taken during the Covid crisis, and while I was OK in the short term with being used as a human conduit to that group, it was neither efficient nor sustainable. My hope is that post SODU 2020, and as the next iteration of the Open Gov Scotland plan comes together, we will see better, more frequent, direct engagement with the data community on the outside of Government, and a more porous border altogether.

Further and Higher Education

(Source data here)

There is no significant change across the sector in the past 18 months. The vast majority of institutions make no provision of open data. Some have vague plans, many of them historic – going back four years or more – and not acted on.

Lumping Universities and Colleges together, one might expect at a minimum properly structured and licensed open data from every institution on:

  • courses
  • modules
  • events
  • performance (perhaps some of this is on HESA and SFC sites?)
  • physical assets
  • environmental performance
  • KPI targets and achievements etc.

Of course, there is none of that.

Universities and colleges

I reviewed open data provision of Universities and Colleges around 17 February 2020. I revisited this on 11 August 2020, making minor changes to the numbers of data sets found.

While five of fifteen universities are publishing increasing amounts of data in relation to research projects, most of which are on a CC-0 or other open basis, there continues to be a very limited amount of real operational open data across the sector with loads of promises and statements of intent, some going back several years.

The Higher Education Statistics Agency publishes a range of potentially useful-looking Open Data under a CC-BY-4.0 licence. This is data about institutions, courses, students etc – and not data published by the institutions themselves. But I could identify none of that. Overall, this was very disappointing.

Further, there are 20 FE colleges, and none produces anything that might be classed as open data. Few have anything beyond vague statements of intent. City of Glasgow College perhaps comes closest, in that it does link to some sources of info and data.

The Crichton Observatory

While doing all of this, I was reminded of the Crichton Institute’s Regional Observatory which was announced to loud fanfares in 2013 and appears to have quietly been shut down in 2017. Two of the team involved say in their LinkedIn profiles that they left at the end of the project. Even the domain name to which articles point is now up for grabs (Feb 2020).

It now appears (Aug 2020) that the total initial budget for the project was >£1.1m. Given that the purpose of the observatory was to amass a great deal of open data, I have also attempted to find out where the data it collected now is, and where the knowledge and learning arising from the project has been published for posterity. I can’t locate either. This FOI request may help. The big question: what benefits did the £1.1m+ deliver?

Scottish Parliament

(Source data here).

In February 2019 I found that The Scottish Parliament had released 121 data sets. These cover motions, petitions, Bills and other procedural data, and are very interesting. This year we find that they still have 121 data sets, so there are no new data sources.

In fact that number is misleading. In February 2020 I discovered that while 75 of these have been updated with new data, the remaining 46 (marked BETA) no longer work. As of August 2020 this is still the case. Why not fix them, or at worst clear them out to simplify the findability of working data?

Some of these BETA datasets should contain potentially more interesting / useful data, e.g. the Register of Members’ Interests, but they just don’t work. Instead they return: [“{message: ‘Data is presently unavailable’}”]

I didn’t note the availability of APIs last year, but there are 186 API calls available. Many of these are year-specific. I tested half a dozen and about a third of those returned error messages. I suspect some of these align with the non-functioning historic BETAs.
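Spot checks like this could be repeated across all of the API calls rather than half a dozen. The sketch below is illustrative only: it classifies response bodies that have already been fetched, using the error message quoted above. The function names and the sample bodies are assumptions, and no real endpoint address is implied.

```python
# The error text quoted above; everything else here is a hypothetical sketch.
UNAVAILABLE_TEXT = "Data is presently unavailable"


def is_unavailable(body: str) -> bool:
    """Return True if a response body carries the 'unavailable' message."""
    return UNAVAILABLE_TEXT in body


def error_rate(bodies: list[str]) -> float:
    """Return the fraction of response bodies that report unavailable data."""
    if not bodies:
        return 0.0
    return sum(is_unavailable(body) for body in bodies) / len(bodies)


# Made-up response bodies standing in for real API results.
sample_bodies = [
    '[{"ID": 1, "Title": "Example motion"}]',
    '["{message: \'Data is presently unavailable\'}"]',
    '[{"ID": 2, "Title": "Example petition"}]',
]

print(f"{error_rate(sample_bodies):.0%} of sampled calls returned no data")
```

Fetching the bodies is deliberately left out, since the exact addresses of the API calls are not listed here.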

Sadly the issue raised a year ago about the lack of clarity in the licensing of the data is unchanged. To find the licence, you have to go to Notes > Policy on Use of SPCB Copyright Material. Following the first link there (to a PDF) you see that you have to add “Contains information licenced under the Scottish Parliament Copyright Licence.” to anything you make with it, which is OK. But if you go to the second link “Scottish Parliament Copyright Licence” (another PDF) the wording (slightly) contradicts that obligation. It then has a chunk about OGL but says, “This Scottish Parliament Licence is aligned with OGLv3.0”, whatever that means. Why not just license all of the data under OGL? I can’t see what they are trying to do.

Scottish Government

(Source data here)

Trying to work out the business units within the structure of Scottish Government is a significant challenge in itself. Attempting to then establish which have published open data, and what those data sets are, and how they are licensed, is almost an impossible task. If my checking, and arithmetic are right, then of 147 discrete business units, only 27 have published any open data and 120 have published none.

So we can say with some confidence  that the issue with findability of data raised in Feb 2019 is unchanged, there being no central portal for open data in the Scottish public sector or even for Scottish Government. Searching the main Scottish Government website for open data yields 633 results, none of which are links to data on the first four screenfuls. I didn’t go deeper than that.

The Scottish Government’s Statistics Team have a very good portal with 295 data sets from multiple organisational providers. This is up by 46 datasets on last year and includes two new organisations: The Care Inspectorate and Registers of Scotland. The latter, so far (Aug 2020), has no datasets on the portal.

There are some interesting new entrants into the list of those parts of Scottish Government publishing data, such as David MacBrayne Limited which is, I believe, wholly owned by SG and is the parent, or operator, of Calmac Ferries Limited. On 1st March 2020 they released a new data platform providing data about their 29 ferry routes. This is very welcome. After choosing the dates, routes and traffic types you can download a CSV of results. While their intent appears to be to make it Open Data, the website is copyright and there is no specific licensing of the data. This is easily fixable.

It is also interesting to contrast Transport Scotland with work going on in England. Transport Scotland’s publication scheme says of open data “Open data made available by the authority as described by the Scottish Government’s Open Data Strategy and Resource Pack, available under an open licence. We comply with the guidance above when publishing data and other information to our website. Details of publications and statistics can be found in the body of this document or on the Publications section of our website.” I searched both without success for any OD. Why not say “we don’t publish any Open Data”? Compare this complete absence of open data with even the single project Open Bus Data for England. Read the story here. Scotland is yet again so far behind!

Summary

In the review of data I’ve shown that little has changed in 18 months. Very few branches of government are publishing open data at all. The landscape is littered with outdated and forgotten statements of good intent which are not acted on; broken links; portals that vanish or don’t work; out of date data; yawning gaps in publication and so on.

The claim of “Open By Default” in the current (2015) Open Data Strategy is misleading and mostly ignored with consequence. The First Minister may frequently repeat the mantra of “Open and Transparent” when speaking or questioned by journalists, but it is easily demonstrable that the administration frequently acts in directly the opposite way.

The recent situations with Covid-19 and the SQA exam results show Scotland would have found itself in a much better place this year with a mature and well-developed approach to open data: an approach one might have reasonably expected after five full years of “open by default”.

The social and economic arguments for open data are indisputable. These have been accepted by most other governments of the developed world. Importantly, they have also been taken up and acted on by developing nations who have in many cases overtaken Scotland in their delivery of their Open Government plans.

The work I have done in 2019 and in this review is not sustainable: it relies on a single volunteer monitoring the activity of every branch and level of government in Scotland. And the methodology is limited to what is achievable by an individual.

A country which was serious about Open Data would have targets and measures, monitoring and open reporting of progress.

  • It wouldn’t just count datasets published. It would be looking at engagement, the usefulness of data and its integration into education.
  • It would fund innovation: specifically in the use of open data; in the creation of tools; in developing services both to support government in creating data pipelines and to help citizens in data use.
  • It would co-develop and mandate the use of data standards across the public sector.
  • It would develop and share canonical lists of ‘things’ with unique identifiers allowing data sets to be integrated.
  • It would adopt the concept of data as infrastructure on which new products, services, apps, and insights could be built.
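To make the point about canonical lists concrete, here is a minimal sketch of how a shared identifier lets separately published datasets be combined. Every identifier, name and figure below is invented for illustration, and the `integrate` function is a hypothetical helper rather than part of any existing tool.

```python
# A hypothetical canonical list: one unique identifier per 'thing'.
canonical_authorities = {
    "X1": "Authority One",
    "X2": "Authority Two",
}

# Two made-up datasets from different publishers, keyed by the same identifiers.
datasets = {
    "libraries": {"X1": 12, "X2": 7},
    "schools": {"X1": 40},  # the publisher has no figure for X2
}


def integrate(canonical: dict[str, str],
              sources: dict[str, dict[str, int]]) -> list[dict]:
    """Combine datasets into one row per canonical identifier."""
    rows = []
    for identifier, name in canonical.items():
        row = {"id": identifier, "name": name}
        for field, values in sources.items():
            # A missing value stays visible as None instead of being dropped.
            row[field] = values.get(identifier)
        rows.append(row)
    return rows


combined = integrate(canonical_authorities, datasets)
for row in combined:
    print(row)
```

Without the shared identifiers, the same join would depend on matching free-text names, which is where integration usually breaks down.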

I really want Scotland to make the most of the opportunities afforded by Open Data. I wouldn’t have spent ten years at this if I didn’t believe in the potential this offers; nor if I didn’t have the evidence to show that this can be done. I wouldn’t be giving up my time year-on-year researching this, giving talks, organising groups and creating opportunities for engagement.

What is fundamentally lacking here is some honesty from Scottish Government ministers instead of their pretence of support for open data.

 

Ian Watt
20 August 2020

Link to an index of pieces I have written on Open Data:
http://watty62.co.uk/2019/02/open-data-index-of-pieces-that-i-have-written/

Answer to quiz

Scotland is B, in the centre. Kenya is A, and Romania C.
I could have chosen Mexico, Honduras, Paraguay, Uruguay – or others. All are doing better than Scotland.

Header Image by David Ballew on Unsplash.

Mapping Memorials to Women in Aberdeen

This project, which was part of CTC20, grew from a WMUK / Archaeology Scotland joint project carried out by Scottish Graduate School of Arts & Humanities intern Roberta Leotta during lockdown 2020. More details about the background to the project can be found here.

It’s often touted that there are some cities in Scotland (coughEdinburghcough) where there are more statues to animals than there are to women. In my own work transferring OpenPlaques data to Wikidata I’ve observed that there are more entries for Charles Rennie Mackintosh than there are for women in Glasgow. So in this light, it’s somewhat refreshing to work on a project that celebrates all kinds of memorials to women in Scotland.

The Women of Scotland: Mapping Memorials project began in 2010 as a joint project between Glasgow Women’s Library, and Women’s History Scotland. It’s similar in many ways to OpenPlaques, but using Wikidata could add an extra dimension – let’s increase the coverage of women’s history and culture on the Wikimedia projects by getting these memorials and the women they celebrate into Wikidata, use that to identify gaps in knowledge, and then work to fill the gap.

Over the two days, here’s what we did:

Data collection

We scraped the initial list of data from the Mapping Memorials website manually, and created a shared worksheet based on a model that’s been used previously for other cities. (The manual process is slow, and a bit fiddly, and is the one thing that I wouldn’t do again. We’re in contact with the admin, so I’m hopeful that we won’t need to repeat this step in the future.)

Once we had this list, we could create a more automated process to deal with gathering the other pieces of information we needed to create new, good quality Wikidata items, although some (description, for example) needed a human eye.

Wikidata identifiers

We were using two main identifiers on Wikidata – P8048 (Women of Scotland memorial ID) and P8050 (Women of Scotland subject ID). The former for the entries to the memorials themselves, and the latter for the women they celebrate. Where the women didn’t have entries, we could create those, and then link them to the entries for the memorials.

Both identifiers use the last part of the URL for each entry on the Mapping Memorials site, so that was fairly easy to do in Google docs. Once we had that info, it was an easy enough step to bulk-create items using either Quickstatements or Wikibase CLI.
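As a rough illustration of the identifier step, the sketch below pulls the last part of a URL and formats it as a tab-separated Quickstatements line with a quoted string value. It is written in Python rather than in a spreadsheet, and both the URL and the item number (Q4115189, the Wikidata sandbox item) are placeholders, not real project data.

```python
from urllib.parse import urlparse


def last_url_segment(url: str) -> str:
    """Return the final path segment of a URL, ignoring any trailing slash."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def external_id_statement(qid: str, prop: str, url: str) -> str:
    """Build one tab-separated Quickstatements line with a quoted string value."""
    return f'{qid}\t{prop}\t"{last_url_segment(url)}"'


# Hypothetical URL; P8048 is the Women of Scotland memorial ID property.
line = external_id_statement(
    "Q4115189", "P8048", "https://example.org/memorials/some-memorial/"
)
print(line)
```

The same function would serve for P8050 by passing a subject URL and that property instead.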

Creating items & avoiding duplicates

There’s a plug in for Google Sheets called Wikipedia and Wikidata Tools which has some useful features for projects like this – WikidataQID for looking up whether something already exists on Wikidata, and WikidataFacts, which tells you what that item is. The former is ok if you have an exact match, the latter is really useful for flagging anything which might lead to a disambiguation page, for example.
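For anyone working outside Google Sheets, a similar duplicate check can be sketched against the Wikidata API directly. This is not how the plug-in works internally (I don't know that); it simply calls the standard `wbsearchentities` action and returns candidate items with their descriptions for a human to review. The function names and User-Agent string are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def search_url(label: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request URL for a label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def candidate_items(label: str) -> list:
    """Return (QID, description) pairs that might already cover this label."""
    request = Request(
        search_url(label),
        headers={"User-Agent": "memorial-duplicate-check/0.1 (example script)"},
    )
    with urlopen(request, timeout=30) as response:
        hits = json.load(response)["search"]
    return [(hit["id"], hit.get("description", "")) for hit in hits]
```

Calling `candidate_items("Caroline Phillips")` needs network access; an empty list suggests there is no existing item under that exact label, while several hits are a prompt to check for a disambiguation problem before creating anything.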

Ultimately we did end up with a few duplicates that needed to be merged, but this was pretty easily managed, and it really showed how useful it is to have local knowledge involved in local projects – there were a couple of sets of coordinates that were obviously wrong, but also some errors that wouldn’t have been spotted by someone unfamiliar with the area.

Coordinates and dates

I really like Quickstatements, but there are a few areas in which it’s fiddly, including coordinates and dates. I’m really interested in looking further into Wikibase CLI for dates in particular, as the process there for dates (documented here) looks to be substantially easier in terms of data prep than it does in Quickstatements. Many thanks to Tony for that work, as his expertise saved us a lot of time! He also used that tool to create items for those women commemorated who were missing from Wikidata, documented here.

As with dates, coordinates are entered into Quickstatements in a different format than that which you’d use manually inside Wikidata itself, hence the formatting you’ll see in column Q on the Data collection tab. Most of this we had to grab from Google Maps, which again is a bit fiddly.
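To make the formatting concrete, here is a minimal Python sketch of the two value formats that Quickstatements expects: coordinates as `@LAT/LON`, and dates as a timestamp followed by a precision code (9 for year, 10 for month, 11 for day). The helper names are my own, and padding unknown months and days with 01 is a choice made for this sketch.

```python
from typing import Optional


def qs_coordinate(lat: float, lon: float) -> str:
    """Format decimal latitude and longitude as a Quickstatements coordinate."""
    return f"@{lat}/{lon}"


def qs_date(year: int, month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Format a date with precision 9 (year), 10 (month) or 11 (day)."""
    if month is None:
        precision, month, day = 9, 1, 1
    elif day is None:
        precision, day = 10, 1
    else:
        precision = 11
    return f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z/{precision}"


print(qs_coordinate(57.1497, -2.0943))  # illustrative coordinates
print(qs_date(1887))
print(qs_date(1887, 9, 2))
```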

Quickstatements

Once we had a master list of QIDs for the memorials we were working with, we could use Quickstatements to bulk upload sets of statements to those items.

For example, matching the memorials to the women commemorated, using this format:

Screenshot of a spreadsheet showing QID for memorials and the women they commemorate

The Q numbers on the left are those of the memorials, P547 is “commemorates”, and the Q numbers on the right are those of the women celebrated. We were also able to add P8050 (Women of Scotland subject ID) to some women who already had entries on Wikidata, but no WoS ID.

Screenshot of a spreadsheet showing each memorial QID and its type

The Q number on the left again is the memorial, P31 is “instance of”, and the Q number on the right corresponds to a type of thing – a commemorative plaque, a garden, or a road, for example.
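If the pairs live somewhere other than a spreadsheet, the same three-column layout can be produced with a few lines of Python. This is only a sketch: the Q numbers are the Wikidata sandbox items, used here as placeholders, and `statement_lines` is a name invented for the example.

```python
def statement_lines(pairs, prop: str) -> str:
    """Join (subject QID, value QID) pairs into tab-separated Quickstatements lines."""
    return "\n".join(f"{subject}\t{prop}\t{value}" for subject, value in pairs)


# Placeholder QIDs (Wikidata sandbox items), not real memorials or women.
commemorates = statement_lines(
    [("Q4115189", "Q13406268"), ("Q4115189", "Q15397819")], "P547"
)
instance_of = statement_lines([("Q4115189", "Q13406268")], "P31")
print(commemorates)
print(instance_of)
```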

Once you’ve got the info in this format, it’s just a case of copy & pasting into QS, clicking import, and then run. (Note – you do need to be an autoconfirmed user to use QS, which means that your account must be at least 4 days old and have made at least 50 edits.) It’s relatively easy, and I was pleased that one of our relatively-new-to-Wikidata participants had the chance to make her first bulk uploads (description & commons category) using the tool over the weekend.

Photos

This project grew out of a desire to increase the coverage of Scottish heritage on Wikimedia Commons, so it was great to take some time on this. Mapping Memorials does have some images, but they’re not openly licensed, and others are missing. After Wikimedia Commons, our next port of call was Geograph, where many images have been released on Wiki-compatible Creative Commons licenses. Using Geograph2Commons, images can easily be transferred over to Wikimedia Commons, so that they can be used in any Wikimedia Project. Geograph also links to this feature from their site – click on “Find out how to reuse this image”, and then scroll down to “Wikipedia template for image page”, then click on the “geograph2commons” link. Really simple. Our group did some detective work for images, and then added them to Commons, and linked them manually to the Wikidata item.

This gave us a list of missing images… which is fine, but wouldn’t it be better to see them on a map?

Visualisation and filling the gaps

Thanks to Ian’s tutorial on how to create a custom WikiShootMe map, we were able to create a custom map that showed us which of the memorials we were working on had images, which didn’t, and where they were. That map is here, and it was great to see it slowly turn more green than red over the weekend as we found more images, or as volunteers headed out across Aberdeen between days to take missing pictures.

A screenshot of a clickable map where people can upload photos of monuments

One of the small, but very satisfying, things you can do with these kinds of images is to integrate them into relevant Wikipedia articles. I added images from the project to the articles for Aberdeen Town House, Caroline Phillips, and Katherine Grainger. At the time of writing, around 2500 people have viewed those articles since I added the images.

Next steps

Over the course of the weekend we added 77 new memorials, and 26 new women to Wikidata, as well as a whole host of new photos. These entries all carry quite rich data, as complete as we could make it.

We were surprised to see some of the individuals who didn’t have a Wikipedia article – and of course, we can use the Wikidata query service to identify those gaps. The queries below could give us a great starting point for an editathon, or indeed, for any Wikipedia editor interested in writing Women’s biography.

  • Wikidata query for women with a Women of Scotland subject ID, a memorial in Aberdeen, but no enwiki article: https://w.wiki/YVH
  • Wikidata query for women with a Women of Scotland subject ID, but no enwiki article: https://w.wiki/YVG
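For readers who would rather run this kind of gap-finding query from a script, here is a hedged Python sketch. The SPARQL is my own reconstruction of the second query's intent (items with P8050 and no English Wikipedia article) and may differ from what sits behind the short links; the function names and User-Agent string are invented for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"


def missing_article_query(limit: int = 100) -> str:
    """SPARQL for items with a Women of Scotland subject ID and no enwiki article."""
    return f"""
SELECT ?woman ?womanLabel WHERE {{
  ?woman wdt:P8050 ?wosId .
  FILTER NOT EXISTS {{
    ?article schema:about ?woman ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def run_query(query: str) -> list:
    """Send the query to the Wikidata query service and return the result rows."""
    url = f"{ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
    request = Request(url, headers={"User-Agent": "wos-gap-finder/0.1 (example script)"})
    with urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

`run_query(missing_article_query())` needs network access; each row holds the item URI and its English label, which is enough to seed a worklist for an editathon.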

Huge thanks to the team, and to Code the City for another great hack weekend!

Dr Sara Thomas
Scotland Programme Coordinator, Wikimedia UK

——————————————————————————

Header image: The Grave of Jessie Seymour Irvine by Ian Watt on Wiki Commons  (CC-BY-SA)

Mesolithic Deeside

Mesolithic Deeside is a group of archaeologists, students and local volunteers investigating the river Dee area 10,000 years ago. They’ve been gathering flints on seasonal field-walking trips and recording data on what those trips turn up, which allows them to map Mesolithic Deeside.

Close up of hand holding a lithic

The following is a summary of what the group, with some additional helpers, achieved over the two days of CTC20.

Day 1

Team: Andy, Ali, Sheila and Irvine

Notes:

  • Discussed the goals of the project with the Mesolithic Deeside Team
    • Displaying data visually for public consumption
    • Updating / refreshing the website
    • Looking at ways to identify future sites for test pitting
  • Decided to focus on developing a way of visualising the data that has been collected
  • Data is currently stored in a QGIS project and a number of csv files
  • Initial work looked into the possibility of using QGIS and Tableau for visualisation
  • Tableau was later dropped in favour of QGIS
  • Issues with Andy loading QGIS data from the project – no reason why it shouldn’t work
  • Decided that Irvine would focus on working with QGIS and Andy would focus on finding a solution with Google Maps
  • Andy has selected a subset of the data and is currently working to put that data on a Google Map
  • Data needs to be cleaned and tidied up before being displayed, e.g. fixing typos and making name formatting consistent
  • Currently working with Google Sites & Awesome Table
    • Awesome table works with Google Sheets and picks up certain types from a header row – can be tricky to get working
    • Unable to disable clustering when zoomed in

Example of AwesomeTable as a Map with clustering

Example of data being filtered by flint type

Example of colour coding for the different finds

Day 2

Team 1: Andy, Robert & Irvine

Team 2: Ali, Sheila & Dave

Objectives:

  • Collate the finds data into a single spreadsheet
  • Investigate simple HTML / JS implementation of a google map with filter

Notes:

  • Two extra members joined our group today: Robert and Dave
  • Provided an update and explanation of what we have done so far to Robert and Dave
  • Andy had done some extra investigating into displaying finds data on a google map and found that AwesomeTable was limited to 100 views in total before having to pay, and suggested that a free option using javascript and HTML would be a better choice
  • It was decided that we would split into two groups:
    • Ali, Sheila and Dave would explore options for the Mesolithic Deeside website
    • Andy, Irvine and Robert would continue working with the flint data
  • The following codepen was found showing what we were looking for; however, the code and script were not runnable, which meant devising our own code
  • Andy focused on gathering together individual spreadsheets into a single google sheet that would later be converted to a json file for loading into the google map
    • Contains over 8,000 flint samples
    • Files needed to be manually joined as columns differed between files
  • Irvine tidied up the dropbox to ensure that only processed spreadsheet files were ready for loading, and helped with any issues that came up with the files
  • A number of entries under type needed tidying up to catch variations in spelling and change in case
  • Before loading into Google Maps, the X & Y co-ordinates needed converting from OSGB36 to Lat & Long
  • Robert began working with Google Maps API to get info boxes and data points onto a google map

https://github.com/CodeTheCity/ctc20-mesolithic-deeside

Example of an info box and colour-coded points by find type.
Multiple points displayed at once on a zoomed out version of the map.

Summary of Technologies Used

| Technology | Description | Comments |
| --- | --- | --- |
| QGIS | Original software used by Mesolithic Deeside for collating the flint finds | Looked at options to use QGIS cloud, but features were limited. Andy had issues loading the shape files and project files – likely to be a problem with Andy’s setup, as version was up to date |
| Python | Conversion of co-ordinates from OSGB36 to Lat & Long; conversion of main spreadsheet to JSON file | Robert put together a short script for carrying out the conversion of co-ordinates, however, points were offset. Method dropped in favour of Batch Convert Tool. |
| AwesomeTable | Seems like a simple way to display and visualise data on a website. Has multiple options for tables and maps. | Allowed for quick displaying and filtering of data. No need to worry about coding. Limited to 100 views before you had to start paying. Ditched in favour of a manual solution. |
| Batch Convert Tool | Quickly converts OSGB36 to WGS84 and vice versa. (link) | Very quick when converting 6,000 points at once. |
| Google Maps (My Maps) | Initially tested for displaying the points on a Google Map | Limited functionality, but points could be easily colour coded for a simple visualisation |
| Google Maps API, Javascript & HTML | A manual way of displaying a Google Map on a webpage. Allows for full control over what is displayed. | Google Maps API was very tricky to work with. Took a bit of working out how to get points and info boxes to display correctly. The selected solution for going forward |
| Google Sheets | Used for compiling flint finds into one file from multiple csv files. | Easily allowed multiple users to work on the same spreadsheet at the same time |

Close up of hand holding a lithic

Header and other images of lithics by Mesolithic Deeside on Wikimedia Commons CC-BY-SA

1914 – 1920 Aberdeen Harbour Arrivals Transcription Project – CTC20 update

Building on our foundations

After such a successful weekend at CTC19, we were delighted to be back for CTC20 to continue work on the Aberdeen Harbour Arrivals project. As expected, the team working on the project was made up of both avid coders and history enthusiasts which brings a great range of skills and knowledge to the weekend.
A second spreadsheet was created to input adjustments, this allowed us to clean data to be more presentable whilst keeping the accurate ledger transcriptions intact; a must when dealing with archival material. This data cleaning has allowed us to create a more presentable website which is easier to understand and navigate.

Expanding the data set

The adjustments spreadsheet also included the addition of a new column of information sourced externally from the original transcription documents. When first registered, fishing vessels were assigned a Fishing Port Registration Number. Where known, that number has been added and will hopefully allow us to cross-reference these vessels with other sources at some point in the future.
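The idea of keeping the transcription intact and layering adjustments on top can be sketched in a few lines of Python. This is an illustration only, not the project's actual code: the field names (`entry_id`, `vessel`, `registration_number`) and the sample values are hypothetical.

```python
def apply_adjustments(transcribed, adjustments):
    """Return cleaned copies of the records; the transcribed records are not modified."""
    return [
        {**record, **adjustments.get(record["entry_id"], {})}
        for record in transcribed
    ]


# Hypothetical sample data standing in for the two spreadsheets.
transcribed = [
    {"entry_id": 1, "vessel": "st. example"},
    {"entry_id": 2, "vessel": "Another Example"},
]
adjustments = {
    1: {"vessel": "St Example", "registration_number": "A 000"},
}

cleaned = apply_adjustments(transcribed, adjustments)
print(cleaned)
```

Because the adjustments are applied to copies, the ledger transcription stays exactly as entered, and an extra column such as the registration number only appears on records where it is known.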

Vessel types and roles

Initial steps were taken to begin to create a better understanding about the various vessels, their history and purpose. Many of the vessel names contain prefixes relating to their type (e.g. HMS – His Majesty’s Ship for a regular naval vessel, HMSS for a submarine) and they have now been extracted and a list of definitions is being built up. Decoding these prefixes highlighted just how much naval military activity was taking place around Aberdeen during the First World War.
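As a sketch of how the prefix extraction might look in Python, the snippet below splits a known prefix from the rest of a vessel name. Only the two prefixes mentioned above are included, the vessel names are invented, and the project's real list of definitions is still being built up.

```python
# The two prefixes mentioned in the text; a real list would be longer.
PREFIXES = {
    "HMS": "His Majesty's Ship (regular naval vessel)",
    "HMSS": "submarine",
}


def split_prefix(name: str):
    """Return (prefix, remaining name); prefix is None when none is recognised."""
    parts = name.strip().split(maxsplit=1)
    if len(parts) == 2 and parts[0].upper() in PREFIXES:
        return parts[0].upper(), parts[1]
    return None, name.strip()


# Hypothetical vessel names.
for vessel in ["HMS Example", "hmss Sample", "Plain Name"]:
    prefix, rest = split_prefix(vessel)
    print(prefix, rest, PREFIXES.get(prefix, "no known prefix"))
```

Matching on the whole first word, rather than on the first few letters, stops "HMSS" being mistaken for "HMS".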

Visualising the data

Some of the team also looked forward to consider how the data could be used in the future. A series of graphs and charts have been created to highlight patterns such as most frequent ships and most popular cargo. We even have an interactive map to show where in the world the ships were arriving from.
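Counts like "most frequent ships" and "most popular cargo" are simple tallies over the records. A minimal Python sketch, assuming hypothetical `vessel` and `cargo` fields and made-up sample rows, might look like this:

```python
from collections import Counter


def most_common_values(records, field: str, top: int = 3):
    """Count non-empty values of one field and return the most frequent ones."""
    counts = Counter(record[field] for record in records if record.get(field))
    return counts.most_common(top)


# Made-up sample rows, not taken from the harbour ledgers.
records = [
    {"vessel": "Example A", "cargo": "Coal"},
    {"vessel": "Example B", "cargo": "Coal"},
    {"vessel": "Example A", "cargo": "Fish"},
    {"vessel": "Example C", "cargo": ""},
]

print(most_common_values(records, "vessel", top=1))
print(most_common_values(records, "cargo", top=1))
```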

As with CTC19, the weekend has been a great success. Archivists learned more about data and the coders benefitted from over 15,000 records to play with.

Next steps

An ideal future step for the project is the creation of individual records in the website for each vessel so we can begin to expand on the information – i.e. vessel name, history of Masters, expanded description about what it was, what role it played in the First World War. Given the heavy use of Wikidata by many of the other projects that were part of CTC19 and CTC20, consideration has to be given to using Wikidata as the expanded repository for building up the bigger picture for each vessel. However, as we are still very much in the historical investigation stage and not entirely sure about the full facts for many vessels, it would not be appropriate at this stage to start pushing unverified information into Wikidata.