CTC23 – the future of the city: a new theme to explore. After introductions, initial ideas were sought for the Miro board – and to ease us along, Bruce put on some jazz music. That inspired Dimi to put forward an idea about how sounds in a city could be. His post-it gathered interest, and the Social Sounds project and team was formed.
Dimi shared his vision and existing knowledge of sound projects – namely Luckas Martinelli's project, which would become the algorithmic starting point for visualising sound data on a map. The first goal was to take this model and apply it to visualise a sound map of Aberdeen. This was achieved over the weekend, but was only half of the visualisation goals. The other half was to build a set of tools that would allow communities to envision and demonstrate noise-pollution reductions through interventions: green walls, tree plantings, or even pop-up bandstands. An early proof-of-concept toolkit was produced. The 'social' in Social Sounds refers to community – connecting all those linked by sound and place. The project concluded by showing how this social graph could be exported to a decision-making platform, e.g. loomio.org.
What is next? The algorithmic model needs to be grounded with real-world sound-sensor data; air-quality devices in Aberdeen can be upgraded with a microphone. Noise itself also needs to be included in the map experience, which can be achieved through a sound plug-in of existing recordings. The toolkit needs much more work: it needs to give members of the community the ability to add their own intervention ideas, and for those ideas to be visualised on the map, highlighting the noise-reduction potential or enhancement, permanent or temporary. Much achieved, much to do.
Open data has the power to bring about economic, social, environmental, and other benefits for everyone. It should be the fuel of innovation and entrepreneurship, and provide trust and transparency in government.
But there are barriers to delivering those benefits. These include:
Knowing who publishes data, and where,
Knowing what data is being published – and when that happens, and
Knowing under what licence (how) the data is made available, so that you can use it, or join it together with other agencies’ data.
In a perfect world we’d have local and national portals publishing or sign-posting data that we all could use. These portals would be easy to use, rich with metadata and would use open standards at their core. And they would be federated so that data and metadata added at any level could be found further up the tree. They’d use common data schemas with a fixed vocabulary which would be used as a standard across the public sector. There would be unique identifiers for all identifiable things, and these would be used without exception.
You could start at your child's school's open data presence and get an open data timetable of events, or its own-published data on air quality in the vicinity of the school (and the computing science teacher would be using that data in classes). You could move up to a web presence at the city or shire level and find the same school data alongside other schools' data, and an aggregation or comparison of each of their data. That council would publish the budget that it spends on each school in the area, and how it is spent. It would provide all of the local authority's schools' catchment areas and other LA-level education-specific data sets. And if you went up to a national level you'd see all of that data gathered upwards: all Scottish schools, plus national data such as SQA results and school inspection reports – all as open data.
But this is Scotland, and it's only six years since the Scottish Government published a national Open Data Strategy; one which committed to data publication being open by default.
Looking at the lowest units – the 32 local authorities – only 10, fewer than a third, publish any open data at all. Beyond local government, of the fourteen health boards none publishes open data, and of the thirty Health and Social Care Partnerships only one has open data. Further, in 2020 it was found that of an assumed 147 business units comprising the Scottish Government (just try getting data on what the Scottish Government actually comprises), 120 have published no data.
And, of course, there are no regional or national open data portals. Why would the Scottish Government bother? Apart, that is, from that six-year-old national strategy, and an EU report in 2020 from which it was clear that OD done well would benefit the Scottish economy by around £2.21bn per annum? Both of these are referred to in the Digital Strategy for Scotland 2021.
Why there is no national clamour around this is baffling.
And despite there being a clear remit at Scottish Government for implementing the OD Strategy no-one, we are told, measures or counts the performance nationally. Because if you were doing this poorly, you’d want to hide that too, wouldn’t you?
And, for now, there is no national portal. There isn’t even one for the seven cities, let alone all 32 councils. Which means there is
no facility to aggregate open data on, say, planning, across all 32 councils.
no way to download all of the bits of the national cycle paths from their custodians.
no way to find out how much each spends on taxis etc or the amount per pupil per school meal.
There is, of course, the Spatial Hub for Scotland, the very business model of which is designed (as a perfect example of the law of unintended consequences) to stifle the publication of open data by local government.
So, if we don’t have these things, what do we have?
What might we expect?
What should we expect from our councils – or even our cities?
Remember, back in about 2013, both Aberdeen and Edinburgh councils received funding from Nesta Scotland to be part of Code For Europe, where they learned from those cities above. One might have expected that by now they'd have reached the same publication levels as these great European cities. We'll see soon.
But let’s be generous. Assume that each local authority in Scotland could produce somewhere between 100 and 200 open data sets.
Scotland has 32 local authorities
Each should be able to produce 100 – 200 datasets per authority – say 150 average
= 150 x 32 = 4800 data sets.
The status quo
Over the weekend our aim was to look in detail at each of Scotland's 32 local authorities and see which were publishing their data openly – to conform with the 2015 Open Data Strategy for Scotland. What did we find?
As we've noted above, there is no national portal. And no one in Scottish Government is counting or publishing this data. So, following the good old adage, “if you want something done, do it yourself”, a few of us set about trying to pull together a list of all the open datasets for Scotland's 7 cities and the other 25 authorities. To the naive amongst us, it sounded like an easy thing to do. But even getting started proved problematic. Why?
Only some councils had any open data – but which?
Only some of those had a landing page for Open Data. Some had a portal. Some used their GIS systems.
Those that did provide data used different categories. There was no standardised schema.
Some had a landing page, but additional datasets were found elsewhere on their websites
Contradictory licence references on pages – was it open or not?
We also looked to see if there was already a central hub of sorts upon which we could build. We found reference to Open Data on the Scottish Cities Alliance website but couldn't find any links to open data.
Curiosity then came into play: why were some councils prepared to publish some data and others so reluctant? What was causing the reluctance? And for those publishing, why were not all datasets made open – how had they selected the ones they chose?
What we did
Our starting point was to create a file to allow us to log the source of data found. As a group, we decided upon headers in the file, such as the file type and the date last updated, to name but a few.
From previous CTC events which we attended, we knew that Ian had put a lot of effort into creating a list of council datasets – IW's work of 2019 and 2020 – which became our starting source. We also knew that Glasgow and Edinburgh were famous for having large, but very out of date, open data portals which were at some point simply switched off.
We were also made aware of another previous attempt from the end of 2020 to map out the cities’ open data. The screenshot below (Fig 1) is from a PDF by Frank Kelly of DDI Edinburgh which compared datasets across cities in Scotland. You can view the full file here.
For some councils, we were able to pull in a list of datasets using the CKAN API. That worked best of all with a quick bit of scripting to gather the info we needed. If all cities, and other authorities did the same we’d have cracked it all in a few hours! But it appears that there is no joined up thinking, no sharing of best practices, no pooling of resources at play in Scotland. Surely COSLA, SCA, SOCITM and other groups could get their heads together and tackle this?
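The "quick bit of scripting" against CKAN really is small, because every CKAN portal exposes the same action API. A minimal sketch – the portal URL below is a placeholder, not a real council site:

```python
import json
from urllib.request import urlopen

def package_list_url(portal_base):
    # CKAN exposes a catalogue listing at a standard action-API path.
    return portal_base.rstrip("/") + "/api/3/action/package_list"

def parse_package_list(raw_json):
    # CKAN action responses wrap their payload in {"success": ..., "result": ...}.
    payload = json.loads(raw_json)
    if not payload.get("success"):
        raise ValueError("CKAN API call reported failure")
    return payload["result"]

if __name__ == "__main__":
    # Placeholder base URL -- substitute any real CKAN instance.
    url = package_list_url("https://data.example-council.gov.uk")
    print(url)
    # In practice: names = parse_package_list(urlopen(url).read())
```

One such loop over all 32 authorities would have finished the whole survey in minutes, had they all run CKAN.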
For others there were varying degrees of friction. We could use the ArcGIS API to gather a list of data sets. But the ArcGIS API tied us up in knots trying to get past the sign-in process, i.e. did we need an account or could we use it anonymously – it was difficult to tell. Luckily, with an experienced coder in our team we were able to make calls to the API and get responses – even if these were verbose and needed manual processing afterwards. This post from Terence Eden, “What’s your API’s “Time To 200”?”, is really relevant here!
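For the ArcGIS-backed sites, a REST services directory can often be read by appending `f=json`, after which the verbose response boils down to a list of named services. A sketch of the response handling (the sample payload is invented, standing in for a council's endpoint):

```python
import json

def list_services(raw_json):
    # An ArcGIS REST services directory response carries a "services"
    # array of {"name": ..., "type": ...} entries (MapServer,
    # FeatureServer, ...), which is all we needed for our log.
    doc = json.loads(raw_json)
    return [(s["name"], s["type"]) for s in doc.get("services", [])]

# Invented sample payload for illustration.
sample = json.dumps({"services": [
    {"name": "RecyclingCentres", "type": "FeatureServer"},
    {"name": "SchoolCatchments", "type": "MapServer"},
]})
print(list_services(sample))
```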
For the rest it was a manual process of going into each city or council website and listing files. With three of us working on it for several hours, we succeeded in pulling together the datasets from the different sources into our CSV file.
Ultimately, the sources were so varied and difficult to navigate that it took 5 digitally-skilled individuals a full day, that is 30 man-hours, to pull this data together. Yet if we have missed any, as we are sure to have done, it may be because they have moved or are hidden away. Let us know if there are more.
From this output it became clear that there was no consistency in the types of files in which the data was being provided and no consistency in the refresh frequency. This makes it difficult to see a comprehensive view in a particular subject across Scotland (because there are huge gaps) and makes it difficult for someone not well versed in data manipulation to aggregate datasets, hence reducing usability and accessibility. After all, we want everyone to be able to use the data and not put barriers in the way.
We have a list, now what
We now had a list of datasets in a csv file, so it was time to work on understanding what was in it. Using Python in Jupyter Notebooks, graphs were used to analyse the available datasets by file type, the councils which provided it, and how the data is accessed. This made it clear that even among the few councils which provide any data, there is a huge variation in how they do that. There is so much to say about the findings of this analysis, that we are going to follow it up with a blog post of its own.
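The first pass of that analysis amounts to frequency counts over columns of the CSV log. A stdlib-only sketch of the idea (the column names here are illustrative, not the actual headers we chose):

```python
import csv
import io
from collections import Counter

def tally(rows, column):
    # Frequency count of one column across the dataset log.
    return Counter(row[column] for row in rows)

# Invented sample standing in for the real CSV log.
sample_csv = io.StringIO(
    "council,file_type,access_method\n"
    "Aberdeen,CSV,portal\n"
    "Dundee,GeoJSON,portal\n"
    "Aberdeen,CSV,webpage\n"
)
rows = list(csv.DictReader(sample_csv))
print(tally(rows, "file_type"))
print(tally(rows, "access_method"))
```

Feeding such counts to any plotting library gives the bar charts described above.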
One of our team also worked on creating a webpage (not currently publicly-accessible) to show the data listings and the graphs from the analysis. It also includes a progress bar to show the number of datasets found against an estimated number of datasets which could be made available – this figure was arbitrary but based on a modest expectation of what any local authority could produce. As you saw above, we set this figure much lower than we see from major cities on the continent.
What did we hope to achieve?
A one stop location where links to all council datasets could be found.
Consistent categories and tags, such that datasets containing similar data could be found together.
But importantly we wanted to take action – no need for plans and strategies, instead we took the first step.
As we noted at the start of this blog post, Scotland's approach to Open Data is not working. There is a widely-ignored national strategy. There is no responsibility for delivery, no measure of ongoing progress, no penalty for doing nothing, and some initiatives which actually work against the drive to get data open.
Despite the recognised economic value of open data – which is highlighted in the 2021 Digital Strategy but was also a driver for the 2015 strategy! – we still have those in government asking why they should publish and looking specifically to Scotland (a failed state for OD) for success stories rather than overseas.
We’ve seen closed APIs being used, we assume, to try to measure use. We suspect the thinking goes something like this:
In order for open data to be a success in Scotland we need it to be useful, usable, and used.
That means the data needs to be geared towards those who will be using it: students, lecturers, developers, entrepreneurs, data journalists, infomediaries. Think of the campaign in 2020 led by Ian to get Scottish Government to publish Covid data as open data, and what has been made of it by Travelling Tabby and others to turn raw data into something of use to the public.
The data needs to be findable, accessible, and well structured. It needs to follow common standards for data and the metadata. Publishers need to collaborate – coordinate data releases across all cities, all local authorities. ‘Things’ in the data need to use common identifiers across data sets so that they can be joined together, but the data needs to be usable by humans too.
The data will only be used if the foregoing conditions are met. But government needs to do much more to stimulate its use: to encourage, advertise, train, fund, and invest in potential users.
The potential GDP rewards for Scotland are huge (est £2.21bn per annum) if done well. But that will not happen by chance. If the same lacklustre, uninterested, unimaginative mindsets are allowed to persist; and no coordination applied to cities and other authorities, then we’ll see no more progress in the next six years than we’ve seen in the last.
While the OGP process is useful, bringing a transparency lens to government, it is too limited. Government needs to see this as the economic issue it is – and one which the current hands-off approach is failing. We also need civic society to get behind this: to be active, visible, militant, and to hold government to account. What we've seen so far from civic society is at best complacent apathy.
Scotland could be great at this – but the signs, so far, are far from encouraging!
Team OD Bods (Karen, Pauline, Rob, Jack, Stephen and Ian)
Once data was found, the next stage was finding out the licensing rights and whether or not the data could be downloaded and legitimately reused. The data found on Canmore's website indicated that it was provided under an Open Government Licence, and hence could be uploaded to Wikidata. This was the data source then used on day two of the project.
A training session on how to use Wikidata was also required on day one, to allow the team to understand how to upload the data to Wikidata and how the identifiers etc. worked.
Day two – cleaning and uploading the data to Wikidata.
Deciding on the identifiers to use in Wikidata was the starting point; then the data had to be cleaned and manipulated. This involved translating easting and northing coordinates to latitude and longitude, matching the ship types between the Canmore file and Wikidata, extracting the reference to the ship from Canmore's URL, and a general common-sense review of the data. To aid with this work a Python script was created. It produced a tab-separated file with the necessary statements to upload to Wikidata via Quickstatements.
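The output stage of such a script can be sketched as below. P625 is Wikidata's "coordinate location" property and Len/Den set the English label and description; the field names in the input rows are our own assumptions, and the easting/northing-to-WGS84 conversion step (e.g. via pyproj) is omitted here:

```python
def quickstatements(wrecks):
    # Quickstatements v1 commands: CREATE starts a new item; LAST rows
    # then attach an English label (Len), description (Den) and a
    # coordinate-location statement (P625, written as "@lat/lon").
    lines = []
    for w in wrecks:
        lines.append("CREATE")
        lines.append('LAST\tLen\t"{}"'.format(w["label"]))
        lines.append('LAST\tDen\t"{}"'.format(w["description"]))
        lines.append("LAST\tP625\t@{}/{}".format(w["lat"], w["lon"]))
    return "\n".join(lines)

print(quickstatements([{
    "label": "Hope",
    "description": "shipwreck off the coast of Scotland",
    "lat": 57.1497, "lon": -2.0943,
}]))
```

Pasting the resulting tab-separated text into Quickstatements runs the batch.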
The team members were new to Wikidata and were unable to create batch uploads: they had neither held their accounts for four days nor made 50 manual edits – a safeguard to stop new accounts running scripts to do damage.
We asked Ian from Code The City to assist, as he has a long editing history. He continues this blog post.
I downloaded the output.txt file and checked if it could be uploaded straight to Quickstatements. It looked like there were minor problems with the text encoding of strings. So I imported the file into Google Docs. There, I ensured that the Label, Description and Canmore links were surrounded in double quotation marks. A quick find and replace did this.
I tested an upload of five or six entries and these all ran smoothly. I then did several hundred. That turned up some errors. I spotted loads of ships with the label “unknown”, and every wreck had the same description. I returned to the Python script and tweaked it to concatenate the word “Unknown” with a Canmore ID. This fixed the problem. I also had to create a way of checking whether a ship had already been uploaded. I did this by downloading all the matching Canmore IDs for successfully uploaded ships, then filtering these out before re-creating the output.txt file.
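Both fixes – skipping already-uploaded Canmore IDs and disambiguating labels by appending the ID – are simple set logic. A sketch with assumed field names and label format (not the script's real ones):

```python
def disambiguate(rows, uploaded_ids):
    # Drop wrecks already on Wikidata, then make the remaining labels
    # unique by appending the Canmore ID to "Unknown" or repeated names.
    seen = set()
    out = []
    for row in rows:
        if row["canmore_id"] in uploaded_ids:
            continue  # already uploaded in an earlier batch
        label = row["label"]
        if label == "Unknown" or label in seen:
            label = "{} (Canmore {})".format(row["label"], row["canmore_id"])
        seen.add(row["label"])
        out.append({**row, "label": label})
    return out

wrecks = [
    {"canmore_id": "101", "label": "Hope"},
    {"canmore_id": "102", "label": "Hope"},     # duplicate name
    {"canmore_id": "103", "label": "Unknown"},
    {"canmore_id": "104", "label": "Verona"},   # pretend: already uploaded
]
print(disambiguate(wrecks, uploaded_ids={"104"}))
```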
I then generated the bulk of the 24,185 records to be uploaded. I noticed a fairly high error rate. This was due to an issue similar to the Unknown-named ships: the output.txt script was trying to upload multiple ships with the same names (e.g. over 50 ships with the name Hope). I solved this in the same manner as with the Unknown-named wrecks, concatenating ship names with “Canmore nnnnnn”.
I prepared this even as the bulk upload was running. Filtering out the recently uploaded ships and re-running the creation of the output.txt file meant that within a few minutes I was able to have the corrective upload ready. Running this a final time resulted in all shipwrecks being added to Wikidata, albeit with some issues to fix. This had taken about a day to run, refine and rerun.
The following day I set out to refine the quality of the data. The names of shipwrecks had been left in sentence case: an initial capital and everything else in lower case. I downloaded a CSV of records we’d created, and changed the Labels to Proper Case. I also took the opportunity to amend the descriptions to reflect the provenance of the records from Canmore in the description of each. I set one browser the task of changing Labels, and another the change to descriptions. This was 24,185 changes each – and took many hours to run. I noticed several hundred failed updates – which appear to just be “The save has failed” messages. I checked those and reran them. Having no means of exporting errors from Quickstatements (that I know of) makes fixing errors more difficult than it should be.
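The label fix itself is a one-liner per record. A deliberately naive sketch – it capitalises each whitespace-separated word and ignores edge cases such as "HMS", Roman numerals or apostrophes:

```python
def proper_case(label):
    # Capitalise the first letter of each whitespace-separated word.
    return " ".join(word.capitalize() for word in label.split())

print(proper_case("royal archer"))  # -> Royal Archer
```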
Finally, I noticed by chance that a good number of records (estimated at 400) are not shipwrecks at all but wrecks of aircraft. Most, if not all, are prefixed “A/C” in the label.
I created a batch to remove statements for ships and shipwrecks and to add statements saying that these are instances of crash sites. I also scripted the change to descriptions identifying these as aircraft wrecks rather than ship wrecks.
I’ve noted the following things that the team could do to enhance and refine the data further:
Check what other data is available by download or scraping from Canmore (such as date of sinking, depth, dimensions) and add that to the Wikidata records
Attempt to reconcile data uploaded from Aberdeen built ships at CTC19 with these wrecks – there may be quite a few to be merged
Finally, in the process of cleaning this uploaded data I noticed that the data model on Wikidata to support it is not well structured.
This was what I sketched out as I attempted to understand it.
Before I changed the aircraft wrecks to “crash site” I merged the two items which works with the queries above. But this needs more work.
Should the remains of a crashed aircraft be something other than a crash site? The latter could be cleared of debris and still be the crash site. The term Shipwreck more clearly describes where a wreck is whether buried, on land, or beneath the sea.
Why is a shipwreck a facet of a ship, but a crash site a subclass of aircraft?
And Disaster Remains seems like the wrong term for what might be a non-disastrous event (say, if a ship from the Middle Ages gently settled into mud over the centuries and was forgotten about) – and it certainly isn't a subclass of Conservation Status, anyway.
I’d be happy to work with anyone else on better working out an ontology for this.
During the course of Code The City 17: Make Aberdeen Better this weekend we made a startling discovery. It is easier to recycle your old fridge-freezer than to get data and content for re-use from Scottish public sector websites. As a consequence, innovating new solutions to common problems and helping make things easier for citizens is made immeasurably more difficult.
One of the challenges posed at the event was “How do we easily help citizens to find where to recycle item ‘x’ in the most convenient fashion?” That was quickly broadened out to ‘dispose of an item’, since not everything can be recycled – some things might be better reused, and others treated as waste if they can’t be reused or recycled. With limited kerbside collections, getting rid of domestic items mainly involves taking them somewhere – but where?
With climate change, and the environment on most people’s minds at the moment, and legislative and financial pressures on local authorities to put less to landfill, surely it is in everyone’s interest to make it work as well as it can.
To test how to help people to help themselves by giving advice and guidance, we came up with a list of 12 items to test this on – including a fridge, a phone charger, a glass bottle, and a Tetra Pak carton. On the face of it this should be simple, and probably has been solved already.
The Github Repo
All of Code The City's hack weekend projects are based on open data and open source code. We use Github to share that code – and any other digital artefacts created as part of the project. All of this one's outputs can be found (and shared openly) here.
That was where we started: looking to see if the problem has already been solved. There is no point in reinventing the wheel.
We looked for two things – apps for mobile phones, and websites with appropriate guidance.
Aberdeen specific information?
Since we were at an event in Aberdeen we first looked at Aberdeen City Council’s website. What could we find out there?
Not much as it turned out – and certainly not anything useful in an easy-to-use fashion. On the front page there was an icon and group of suggested services for Bins and recycling; none of which were what we were looking for.
Typing recycling into the search box (and note we didn't at this stage know if our hypothetical item could be recycled) returned the first 15 of 33 results, as shown below.
The results were a strangely unordered list – neither sorted alphabetically nor by obvious themes. So relevant items could be on page 3 of the results. Who wants to read policies if they are trying to dispose of a sofa? Why are two of (we later discovered) five recycling centres shown but three others not? Why would I as a citizen want to find out about trade waste when I just want to get rid of a dodgy phone charger?
Why is there a link to all recycling points (smaller facilities in supermarket car parks or suchlike, with limited acceptance of items), but apparently not to all centres, which accept far more items? Actually there is a link, ‘Find Your Nearest Recycling Centre’ (but not your nearest recycling point, though these are much more numerous). This takes you to a map and tabular list of centres and what they accept. And it is easy to miss the search box between the two. No such facility exists for the recycling points.
Perhaps there is open data on the ACC Data portal that we could re-purpose – allowing us to build our own solution? Sadly not – the portal has had the same five data sets for almost two years, and every one of those has a broken link to the WMSes.
If we were in Dundee we could download and use freely their recycling centre data. But not in Aberdeen.
Apps to the rescue?
There are some apps and services that do most of what we are trying to do. For example, iRecycle is a nice app for iOS and Android that would work, were it not for its US-only locations.
We couldn’t find something for Scotland that worked as an App.
Other sources of information?
Since we drew a blank with both Aberdeen City Council and any usable apps, we widened our search.
Recycle For Scotland
The website Recycle For Scotland (RFS) is, on the face of it, a useful means to identify what to do with a piece of domestic waste. Oddly, there appears not to be any link to it that we could find from any of the ACC recycling pages.
But it doesn't work as well as it could, and the content and data behind it have no clear licence to permit reuse.
The Issues with RFS
Searching the site, or navigating by the menus, for Electrical Items results in a page that is headed “This content was archived on 13th August 2018” – hardly inspiring confidence. No alternative page appears to exist and this page is the one turned up in navigation on the site.
Searching for what to do with batteries in Aberdeen results in a list of shops at least one of which closed down about 18 months ago. Entering a search means entering your location manually – every time you search! This quickly becomes wearing.
While the air of neglect is strong, the site is at least useful compared to the ACC website. But it doesn’t do what we want. Perhaps we could re-use some of the content? No – there is no clear licence regarding reuse of the website’s content.
ZWS (Zero Waste Scotland) is publicly funded by the Scottish Government and the European Regional Development Fund – all public money.
Public funding should equal open licences
We argue that any website operated by a government agency, or department, or NDPB, should automatically be licensed under the Open Government Licence (OGL). And any data behind that site should be licensed as Open Data.
Changing the licensing of Recycle For Scotland website, making its code open source, and making its data open would have many benefits.
its functionality could be improved on by anyone
the data could be repurposed in new applications
errors could be corrected by a larger group than a single company maintaining it.
Where did this leave us?
Having failed to identify an app that worked for Scotland, or any interactive guidance on the ACC website, we tried the patchy and, on the face of it, unreliable RFS site. We then turned to the data, and to whether we could construct something usable from open data and repurposed, fixed content over the weekend – this is a hack event, after all.
But in this we were defeated – data is wrapped up in web pages: formatted for human readability, not reuse in new apps.
Websites which were set up to encourage re-use and recycling ironically prohibit that as far as their content and data is concerned, and deliberately stifle innovation.
Public funding from the City Council, the Scottish Government and the European Regional Development Fund is used to fund sites which you have paid for, but elements of which you cannot reuse yourself.
At a time of climate crisis, which the Scottish Government has announced is a priority action, it can't be right that not only is it difficult to find ways to divert domestic items from landfill, but also that these Government-funded websites have deliberate measures in place to stop us innovating to make reuse and recycling easier!
Hopefully politicians, ministers and councillors will read this (please draw it to their attention) and wake up to the fact that Scotland deserves, and needs, better than this.
Only by having an Open Data by default policy for the whole of the Scottish Public Sector, and an open government licence on all websites can we fix these problems through innovation.
After all if the non-functioning Northern Ireland Assembly can come up with an open data strategy that commits the region to open data by default, why on earth can’t Scotland?
“Northern Ireland public sector data is open by default. Open by default is the first guiding principle that will facilitate and accelerate Open Data publication.”
[Edit – Added 12-Nov-2019]
If you are interested to read more about the poor state of Scottish Open Data, you might be interested in this post I wrote in February 2019, which also contains links to other posts on the subject.
At Code The City our objective is to help our local community become literate in both technology and data, and to use them to full advantage. We help people, organisations and charities to gain the right skills. We are improving what we do at Code The City, and how we do it: changes which are fundamental to making that vision a reality.
Our work up to now
Over the past five years we’ve run 16 Hack Weekends and, in Spring 2018, we started to host monthly data meet-ups. Both things have been very successful but are not the sum total of our ambitions. To deliver those fully we needed a base from which to operate and to grow.
We’re now set up in the ONE Tech Hub, hosted by ONE Codebase. This has cemented our position as part of the local ecosystem. Since moving in six weeks ago we’ve launched the Young City Coders sessions. Our first one, last week, attracted 22 keen young people, and there is a waiting list for places. We’ll run those twice a month from now on. We’re really grateful for the assistance we have received: Inoapps gave us sponsorship to get these sessions started, and both they and the James Hutton Institute donated used laptops.
The immediate future
In another six weeks or so we’ll start a Tech Tribe. That’s the name we’ve given to a programme to get people, and women in particular, into STEM careers and education. Many of them missed the chance first time round. The Data Lab already sponsor our Data Meet-ups and are now sponsoring these sessions, too.
All this educational activity is reliant on volunteer time. Two of our founders, Ian and Bruce, have now become STEM ambassadors. Part of that was getting PVG checks to allow them to work with children and vulnerable adults. We have a handful of others who are going to go through the same process. But, we want to be resilient, and scale up and so we need more people. If you would like to volunteer and get the appropriate certification, please get in touch.
This week also sees the start of the new Aberdeen Python User Group, which kicks off on Wednesday. Python is, by many measures, the most popular, flexible and fastest-growing programming language, used in data science, astronomy, biology, security, web development… the list is endless.
Our next Hack Weekend will be in November and will address volunteering and civic engagement. We also hope to run another hack weekend in December just before Christmas.
We are planning a springtime event: the Scottish Open Data Unconference. Details of this will be announced very soon.
A picture takes shape
All this is like a jigsaw puzzle, the picture of which is gradually emerging as we fit the pieces together.
By running coding sessions for youngsters and mums, we are starting to help families better understand the potential of data and technology to transform their lives.
By creating Data and Python meet-ups we are creating networking opportunities. These raise awareness of the good work that is going on in academia and industry, and expose employers to graduate talent. We help people to share their skills, experience and expertise, and to self-organise.
By running hack events we are helping charities and public sector organisations to make the most of the opportunities of digital and data to transform. We also help the local tech community of coders and developers and others to give something back to worthy causes.
By leading projects such as Aberdeen Air Quality we put the creation of data into people’s hands. This demonstrates the potential of collective endeavour for a common cause. The data is made available openly for anyone to build any new product or service. And it offers up the potential for schools and universities to use that data to better understand the local environment.
By running a national unconference we bring specialists, experts, and a wider network to the city to mix with local practitioners. This facilitates discussions at local, regional and national levels and between data users, publishers and academics at every level.
Our charity values. Your values?
In addition to all of the above, Ian, our founder CEO, is a non-executive director of the UK-wide Community Interest Company, Democracy Club. Its strapline is “Our vision is of a country with the digital foundations to support everyone’s participation in democratic life.” Now, Ian has joined the steering group of Scotland’s Open Government Network. He is also now on the board of Stirling University-led project, Data Commons.
The commitment of our charity and its founders is to create that better world, underpinned by data and digital, from the ground up. That means running events of many kinds, empowering people and giving them the skills and knowledge they need.
You can do your bit too: come to meet-ups; share your work; be part of a network; become a STEM ambassador; coach and mentor others; put something back.