CTC8 – Thanks to our sponsors

Code The City Weekends would not happen were it not for the generosity of our sponsors.

As we approach CodeTheCity #8 we must recognise two organisations who have backed this event.

The first is The Health and Social Care Alliance Scotland (the ALLIANCE), which is sponsoring the event through its ALISS Programme.

The ALISS Programme is excited to be sponsoring and attending the Code the City AI and Chatbots hack weekend. This area of our work allows us to test concepts and prototype real-world solutions to problems such as: how does a person who is living with sight loss access the great local resources available on ALISS? Or, how does a person who finds it challenging to use normal desk-based computers access the local support available through ALISS? We hope to test these types of problems at the hack weekend and follow this up with a blog on our work.

ALISS Logo

Please follow the ALLIANCE and their work on Twitter at @ALLIANCEScot and @ALISSProgramme, or search for the hashtag #ALISS. If you are attending the weekend, please make sure you meet up with @DouglasMaxw3ll and have a chat with him about the great work that his organisation does.

Our other main sponsor is Fifth Ring.

Fifth Ring is a marketing and communications agency in Aberdeen, Houston and Singapore with a big focus on digital and inbound marketing. Fifth Ring is already experimenting with conversational interfaces for some of their client work.

If you are attending the weekend please make sure to say hello to Steve Milne or Alan Stobie to discuss some of the great work they do.

You can find Fifth Ring on the web, on Twitter, and on LinkedIn.

 

Scraping Goes Off The Rails

This post was originally published on 10ml.com by Ian Watt

The art of scraping websites is one beset by difficulties, as I was reminded this week when re-testing a scraper that I built recently.

Schienenbruch

 

Railway performance

As part of my participation in 100 Days of Code I’ve been working on a few projects.

The first one that I tackled was a scraper to gather data from the PDF performance reports which are published on a four-weekly cycle on Scotrail’s website. On the face of it this is a straightforward thing to do.

  1. Find the link to the latest PDF on the performance page using the label “Download Monthly Performance Results”.
  2. Grab that PDF to archive it. (Scotrail don’t do that: they remove each one and replace it with a new one every four weeks, so there is no archive.)
  3. Use a service such as PDFTables, which has an API, uploading the PDF and getting a CSV file in return (XLSX and XML versions are also available but less useful in this project).
  4. Parse the CSV file and extract a number of values, including headline figures, and four monthly measures for each of the 73 stations in Scotland.
  5. Store those values somewhere. I decided on clean monthly CSV output files as a failsafe, and a relational SQLite database as an additional, better solution.
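Steps 4 and 5 above might look something like the following in Python. This is only a rough sketch and not the scraper's actual code: the CSV layout, the sample figures, the table name `performance` and the function names are all invented for illustration, since the real report layout isn't described here.

```python
import csv
import io
import sqlite3

# Invented sample data: the real layout of the converted CSV is an assumption.
SAMPLE_CSV = """Station,Measure 1,Measure 2,Measure 3,Measure 4
Aberdeen,91.2,88.5,95.0,97.1
Dundee,89.9,90.1,93.4,96.0
"""


def parse_station_rows(csv_text):
    """Return a list of (station, [four float measures]) tuples."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        if not record or not record[0].strip():
            continue  # ignore empty lines
        station = record[0].strip()
        measures = [float(value) for value in record[1:5]]
        rows.append((station, measures))
    return rows


def store_rows(conn, year, period, rows):
    """Write one record per station and measure (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS performance ("
        "year TEXT, period INTEGER, station TEXT, "
        "measure_index INTEGER, value REAL, "
        "PRIMARY KEY (year, period, station, measure_index))"
    )
    for station, measures in rows:
        for index, value in enumerate(measures, start=1):
            # INSERT OR REPLACE makes re-running a period harmless
            conn.execute(
                "INSERT OR REPLACE INTO performance VALUES (?, ?, ?, ?, ?)",
                (year, period, station, index, value),
            )
    conn.commit()
```

Keying each record on year, period, station and measure means that re-running the scraper for a period overwrites the earlier values rather than duplicating them.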

Creating the scraper

So, I built the bones of the scraper in a few hours over the first couple of days of the year. I tested it on the then current PDF which was for period nine of 2016-17. That worked, first creating the clean CSV, then later adding the DB-write routines.

Boom – number 1

I then remembered that I had downloaded the previous period’s PDF. So I modified the code (to omit the downloading routine) and ran it to test the scraping routine on it – and it blew up my code. The format of the table structure in the PDF had changed, with an extra blank column to the right of the first list of station names.
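One way to guard against that kind of layout drift is to drop empty cells from each row before reading values by position. The helper below is a hypothetical sketch, not what the scraper actually does, and it assumes that real values are never blank (a genuinely missing figure would shift the columns).

```python
def drop_blank_cells(row):
    """Return the row with empty or whitespace-only cells removed."""
    return [cell.strip() for cell in row if cell.strip()]
```

With this in place, a row read as `["Aberdeen", "", "91.2"]` and one read as `["Aberdeen", "91.2"]` would both be handled the same way.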

After creating a new version and publishing that, I sat back and waited for the publication of period 10 data. That was published in the middle of this week.

Boom – number 2

I re-ran the scraper to add that new PDF to my database – and guess what? It blew up the scraper again. What had happened? Scotrail had changed the structure of the filename of the PDF – from using dashes (as in ‘performance-display-p1617-09.pdf’) to underscores (‘performance_display_p1617_10.pdf’).

That change meant that my routine for picking out the year and period, which is used to identify database records, broke. So I had to rewrite it. Not a major hassle – but it means that each new publication has necessitated a tweaking of the code. Hopefully in time the code will be flexible enough to accommodate minor deviations from what is expected without manual changes. We’ll see.
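As an illustration of the sort of flexibility I mean, a regular expression can accept either separator. This is a sketch rather than the rewritten routine itself: the function name is made up, and it assumes the filename always ends in a ‘p’, four digits for the year, a dash or underscore, and the period number.

```python
import re

# Accepts both 'p1617-09.pdf' and 'p1617_10.pdf' style endings.
FILENAME_PATTERN = re.compile(r"p(\d{4})[-_](\d{1,2})\.pdf$", re.IGNORECASE)


def parse_year_and_period(filename):
    """Return (year, period), e.g. ('1617', 9), from a report filename."""
    match = FILENAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"Unrecognised report filename: {filename!r}")
    year, period = match.groups()
    return year, int(period)
```

Raising an error on anything unrecognised means the next change of naming scheme fails loudly instead of writing records under the wrong year or period.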

We’re ‘doing the wrong thing righter’ – Drucker

Of course, none of this should be necessary.

In a perfect world Scotrail would publish well structured, machine-readable open data for performance. I did email them on 26th November 2016, long before I started the scraper, both asking for past periods’ data and asking if they wanted assistance in creating Open Data. I got a customer service reply on 7th December saying that a manager would be in touch. To date (15 Jan 2017) I’ve had no further response.

The right thing

Abellio operates the Scotrail franchise under contract to the Scottish Government.

Should the terms of such contracts not oblige the companies not only to put the monthly data into the public domain, but also to make it available as good open data, following the Scottish Government’s own strategy for Open Data? Extending the government’s open data obligation to those performing contracts for governments would be a welcome step forward for Scotland.

CTC7 – Health – Final Presentations

These are the six presentations made by the teams at the conclusion of Code The City 7, Health Hack, captured on periscope.tv.

Team Float My Boat

An enhanced prototype has been created, with plans to create a more complete version. Using postcodes and mapping it would be straightforward to consume good data from elsewhere if available.

Some community centres and churches have over 100 groups operating at some point in the month. They can be hugely valuable, but somewhat invisible to the internet. Just making the existence of many of these groups visible can be a big step.

There was also discussion of the importance of occupational therapists, librarians and dog walkers: many different individuals in the community can feed valuable information into this kind of platform. It is important to remember that it’s not just primary care data that matters.

https://www.periscope.tv/w/1gqxvRgODoexB

Some interesting visualisations of the underlying data were also created, and led to some interesting discussions around assumptions that are made about data. Again, the value of having the experts in the room at a hack event was demonstrated, as assumptions were challenged, and analysis changed based on feedback. Such feedback can often take weeks to acquire – but was available during the presentation. A snapshot of the data is available on github, and you can see the visualisation here.

https://www.periscope.tv/w/1dRKZRLnErbKB

Team Text

The team have a working prototype, with functioning logic to query the ALISS dataset and return three results via SMS. It pulls JSON data from ALISS based on a query generated from the SMS exchange, and sends those results back.
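To give a flavour of the last step, turning a JSON response into a short text reply might look like the sketch below. This is not the team's code: the JSON shape (a `results` list whose items have a `title` field) and the function name are assumptions made purely for illustration, and the real ALISS response may differ.

```python
import json

MAX_SMS_LENGTH = 160  # size of a single standard SMS segment


def format_sms_reply(json_text, limit=3):
    """Build a numbered text reply from the first few results."""
    payload = json.loads(json_text)
    results = payload.get("results", [])[:limit]  # assumed response shape
    if not results:
        return "Sorry, no matching services found."
    lines = [
        f"{number}. {item['title']}"
        for number, item in enumerate(results, start=1)
    ]
    # Truncate so the reply still fits in one message
    return "\n".join(lines)[:MAX_SMS_LENGTH]
```

Capping the reply at three results and one message's length keeps it readable on a basic phone.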

The team say that there is still work to do to make this production ready, and some of the language processing and logic could be improved – but getting a working prototype over the course of the weekend is a real achievement. You can see elements of the code on github.

https://www.periscope.tv/watty62/1nAKEkZeAqXJL?

Team Pomoc

The team have created a video prototype, which looks great. The full Polish translation is complete, and will be added to the video using YouTube closed captions, with an audio overlay to follow later.

The project is to be presented to a group of GPs later this week for feedback as to usefulness and likely impact. Code, and scripts, are posted to the team github page.

https://www.periscope.tv/watty62/1vAGRXqNVBaxl?

Team Delta Test

The limiting factor for this team has been the size of the datasets that they are working with, and the speed at which these can be moved around. Despite early setbacks with port access through the wifi (something we’re working on for the next weekend) the team were able to show some real results for the final presentation.

There were some interesting findings around the geotagging, and the inconsistencies that can arise. Some really interesting possible extensions to the project were discussed. The plan is to take this project ‘back to the office’ as the prototype for a full roll out to help optimise the use of lab support for GPs.

https://www.periscope.tv/watty62/1kvJpqRjjzMxE?

Team Friend Tree

This team found that overlaps between their objectives and those of other teams were significant, so concentrated on some of the more ‘marketing’ aspects of service delivery – identity, and some thoughts around messaging to bring people into the service.

Hand drawn illustrations

This is a good example of a service that could be rolled out quickly on top of the kind of datasets being used by the Float My Boat project.

https://www.periscope.tv/watty62/1eaKblAmPdnJX?

On the horizon

We currently deliver several types of event: Hackathons, Aberdeen Python User Group (APUG) meetings, Aberdeen Data Meetups (ADM), Aberdeen Wiki Meetup, and the annual Scottish Open Data Unconference (SODU). Others use our space such as the Aberdeen Linux User Group (AberLUG).

Our events are now held at our base at The Soap Factory, 111 Gallowgate, Aberdeen AB25 1BU.

Our upcoming events include:

  • 📅 Tues 4th Nov – ADM – Lightning Data Talks – Details and booking
  • 📅 Sat 8th Nov – Volunteer Clean-Up Day (10:00 – 16:00). Just drop in and help us continue with the spruce-up!
  • 📅 Weds 12th Nov – APUG – Two talks including “Historical Data Meets Modern Tech” – Details and booking
  • 📅 Mon 24th Nov – Wiki Meetup – Details soon
  • 📅 Sun 30th Nov, 2pm – Aberdeen Linux User Group – Details and booking
  • 📅 Thurs 4th Dec – Christmas Fundraising Social Night with mega raffle draw – Book here. Early bird discounted tickets on sale until 31st Oct.

We are seeking sponsorship for both the Aberdeen Data Meetup, and the Aberdeen Python User Group. If your company is interested in this opportunity please contact info@codethecity.org.

Note – Our events have reverted to being run physically (with an option to attend virtually) where we can accommodate this.

To get advance notice of our events, and make sure of a place, why not sign up for our bi-monthly, spam-free mailing list?

CTC7 – Health – more ideas than you can shake a prescription pad at

In the lead up to Code the City 7 we sent attendees some blank Barrier and Opportunity cards.  We asked them to complete and bring them – with a single suggestion or idea per sheet.

On arrival people were to stick them to the wall. The response was great – with an enormous display of creativity quickly assembled. Many of these suggestions grouped well together.  As we got started, five volunteers stepped forward to be the champion for one idea each, which formed the starting point of each of the projects taken forward during the weekend. You can read more about these from this blogpost onwards. Even the drawings accompanying the ideas were great – see the montage above!

But what of the remaining ideas – of which there were dozens? I read each of them and have summarised some of them – often grouping several together – below. Each of these has merit as a potential area to explore further (perhaps at a future event).

  • Find out how busy a GP practice is, before you register

This links to a blog post I wrote recently about the ratio of GPs to patients at Scottish surgeries.

  • Information on GP practices

It is suggested that there is no consistency across the NHS Grampian area – with some good examples of websites and some poor.

  • Waiting times for appointments at GPs’ surgeries?

Where is the data to show which days are busier than others? How could that help patients?

  • Live Tracking of referrals to consultants

Patients, on being referred to a consultant, are often left in the dark for weeks or months until a letter arrives. How could that be made transparent? Could we have a ‘track my referral’ as you would a ‘track my parcel’? How or when will you get an appointment with a consultant? Could you self-select from a calendar rather than get an appointment which doesn’t suit and has to be changed?

  • Lack of data interoperability between elements of health service / Health and Social Care etc.
  • Assist GPs to do more online – self service – online calendars for appointments – meaning that they can spend longer with patients or reduce waiting times for appointments
  • Citizen / Patient digital literacy

How could we assist patients to use digital services as these are developed? This also raises the issue of health literacy – how could we assist people to understand their own health – e.g. cause and effect?

  • Persuade / help GPs to get citizens to use informal / community-based support
  • A shared calendar across NHS Grampian to share training opportunities. Much training is common but is delivered on a siloed basis.
  • Develop a common organogram showing remits and areas of operation across the formal and informal H&SC landscape
  • Address the challenges of patients being treated in parallel between two specialists, so that they don’t feel that they are being passed from pillar to post.

These ideas alone would feed another three hack weekends! If you are interested in working on these – or sponsoring a further weekend such as this – please let us know!