Chatbot weekend on Sat 25th to Sunday 26th February 2017.
This post was originally published on 10ml.com by Ian Watt
The art of scraping websites is one beset by difficulties, as I was reminded this week when re-testing a scraper that I built recently.
As part of my participation in 100 Days of Code I’ve been working on a few projects.
The first one that I tackled was a scraper to gather data from the PDF performance reports which are published on a four-weekly cycle on Scotrail’s website. On the face of it this is a straightforward thing to do.
- Find the link to the latest PDF on the performance page using the label “Download Monthly Performance Results”.
- Grab that PDF to archive it. (Scotrail don’t do that – they vanish each one and replace it with a new one every four weeks, so there is no archive).
- Use a service such as PDFTables which has an API, uploading the PDF and getting a CSV file in return (XLSX and XML versions are also available but less useful in this project).
- Parse the CSV file and extract a number of values, including headline figures, and four monthly measures for each of the 73 stations in Scotland.
- Store those values somewhere. I decided on clean monthly CSV output files as a failsafe, and a relational SQLite database as an additional, better solution.
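The last two steps, parsing and storing, can be sketched in Python using only the standard library. This is a minimal sketch rather than the scraper's actual code: the column layout (a station name followed by four measures), the table name `station_performance`, the column names `m1` to `m4` and the sample row are all assumptions made up for illustration. The download and PDF conversion steps are left out because they depend on external services.

```python
import csv
import io
import sqlite3


def parse_station_rows(csv_text):
    """Yield (station, m1, m2, m3, m4) tuples from CSV text.

    Assumes (hypothetically) one station per row: a name followed by
    four numeric measures. Rows that do not fit are skipped.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [c.strip() for c in row]
        if len(cells) != 5 or not cells[0]:
            continue
        try:
            measures = [float(c.rstrip("%")) for c in cells[1:]]
        except ValueError:
            continue  # header line or malformed row
        yield (cells[0], *measures)


def store_rows(conn, year, period, rows):
    """Insert rows into a hypothetical station_performance table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS station_performance (
               year TEXT, period INTEGER, station TEXT,
               m1 REAL, m2 REAL, m3 REAL, m4 REAL,
               PRIMARY KEY (year, period, station))"""
    )
    # INSERT OR REPLACE makes re-running the scraper for a period harmless.
    conn.executemany(
        "INSERT OR REPLACE INTO station_performance VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(year, period, *r) for r in rows],
    )
    conn.commit()


# Made-up sample data, not real Scotrail figures.
SAMPLE_CSV = "Station,A,B,C,D\nExampleton,91.2%,88.0%,95.5%,90.1%\n"

conn = sqlite3.connect(":memory:")
store_rows(conn, "1617", 9, parse_station_rows(SAMPLE_CSV))
print(conn.execute("SELECT * FROM station_performance").fetchall())
```

Keying each record on year, period and station means that loading the same PDF twice overwrites the earlier rows instead of duplicating them.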
Creating the scraper
So, I built the bones of the scraper in a few hours over the first couple of days of the year. I tested it on the then current PDF which was for period nine of 2016-17. That worked, first creating the clean CSV, then later adding the DB-write routines.
Boom – number 1
I then remembered that I had downloaded the previous period’s PDF. So I modified the code (to omit the downloading routine) and ran it to test the scraping routine on it – and it blew up my code. The format of the table structure in the PDF had changed, with an extra blank column to the right of the first list of station names.
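If a layout change like this shows up in the converted CSV as a column that is empty in every row (an assumption, since the post does not say how the change appeared after conversion), one defensive option is to drop such columns before parsing. The helper name `drop_blank_columns` and the sample rows are hypothetical.

```python
def drop_blank_columns(rows):
    """Remove columns that are empty (or whitespace only) in every row."""
    if not rows:
        return []
    width = max(len(r) for r in rows)
    # Pad short rows so every row can be indexed by column number.
    padded = [list(r) + [""] * (width - len(r)) for r in rows]
    keep = [i for i in range(width) if any(row[i].strip() for row in padded)]
    return [[row[i] for i in keep] for row in padded]


# Made-up rows with a stray blank column after the station name.
sample = [["Station", "", "A"], ["Exampleton", " ", "91.2"]]
print(drop_blank_columns(sample))
```

This only helps with columns that are blank throughout; a column that is blank in some rows and populated in others would still need a manual look.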
After creating a new version and publishing that, I sat back and waited for the publication of period 10 data. That was published in the middle of this week.
Boom – number 2
I re-ran the scraper to add that new PDF to my database – and guess what? It blew up the scraper again. What had happened? Scotrail had changed the structure of the filename of the PDF – from using dashes (as in ‘performance-display-p1617-09.pdf’) to underscores (‘performance_display_p1617_10.pdf’)
That change meant that my routine for picking out the year and period, which is used to identify database records, broke. So I had to rewrite it. Not a major hassle – but it means that each new publication has necessitated a tweaking of the code. Hopefully in time the code will be flexible enough to accommodate minor deviations from what is expected without manual changes. We’ll see.
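A routine that tolerates both separators could look something like the sketch below. It is not the code used in the scraper; the function name and the assumption that the filename always ends in `p<yyyy><separator><period>.pdf` are mine, based only on the two filenames quoted above.

```python
import re

# Matches e.g. 'p1617-09.pdf' or 'p1617_10.pdf': 'p', four digits for the
# year (2016-17 -> '1617'), a dash or underscore, then the period number.
PERIOD_RE = re.compile(r"p(\d{4})[-_](\d{1,2})\.pdf$", re.IGNORECASE)


def parse_year_and_period(filename):
    """Return (year, period) from a performance PDF filename."""
    match = PERIOD_RE.search(filename)
    if match is None:
        raise ValueError(f"Unrecognised filename: {filename!r}")
    return match.group(1), int(match.group(2))


print(parse_year_and_period("performance-display-p1617-09.pdf"))  # ('1617', 9)
print(parse_year_and_period("performance_display_p1617_10.pdf"))  # ('1617', 10)
```

Raising an error on an unrecognised name is deliberate: a loud failure is easier to spot than records silently filed under the wrong period.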
We’re ‘doing the wrong thing righter’ – Drucker
Of course, none of this should be necessary.
In a perfect world Scotrail would publish well structured, machine-readable open data for performance. I did email them on 26th November 2016, long before I started the scraper, both asking for past periods’ data and asking if they wanted assistance in creating Open Data. I got a customer service reply on 7th December saying that a manager would be in touch. To date (15 Jan 2017) I’ve had no further response.
The right thing
Abellio operates the Scotrail franchise under contract to the Scottish Government.
Should the terms of such contracts not put an obligation on the companies not only to put the monthly data into the public domain, but also to make it available as good open data, following the Scottish Government’s own strategy for Open Data? Extending the government’s open data obligation to those performing contracts for governments would be a welcome step forward for Scotland.
These are the six presentations made by the teams at the conclusion of Code The City 7, Health Hack, captured on periscope.tv.
Team Float My Boat
An enhanced prototype has been created, with plans to create a more complete version. Using postcodes and mapping it would be straightforward to consume good data from elsewhere if available.
Some community centres and churches have over 100 groups operating at some point in the month. They can be hugely valuable, but somewhat invisible to the internet. Just making the existence of many of these groups visible can be a big step.
Also discussion of the importance of occupational therapists, librarians, dog walkers – many different individuals in the community that can feed valuable information into this kind of platform – important to remember that it’s not just primary care data that matters.
Some interesting visualisations of the underlying data were also created, and led to some interesting discussions around assumptions that are made about data. Again, the value of having the experts in the room at a hack event was demonstrated, as assumptions were challenged, and analysis changed based on feedback. Such feedback can often take weeks to acquire – but was available during the presentation. A snapshot of the data is available on github, and you can see the visualisation here.
The team have a working prototype, with functioning logic to query the Aliss dataset and return three results via SMS. It pulls JSON data from Aliss based on a query generated from the SMS exchange, and sends those results back.
The team say that there is still work to do to make this production ready, and some of the language processing and logic could be improved – but getting a working prototype over the course of the weekend is a real achievement. You can see elements of the code on github.
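As an illustration of the "return three results" step, here is a minimal Python sketch that turns a JSON response into a short text reply. It is not the team's code: the JSON shape (a `results` list of objects with `name` and `phone`), the function name and the sample entries are all hypothetical, and the real Aliss response format may differ.

```python
import json

MAX_RESULTS = 3


def format_sms_reply(json_text, limit=MAX_RESULTS):
    """Build a plain-text reply from the first `limit` results.

    Assumes a hypothetical response shape:
    {"results": [{"name": ..., "phone": ...}, ...]}
    """
    results = json.loads(json_text).get("results", [])[:limit]
    if not results:
        return "Sorry, no services found. Try another search."
    lines = [f"{i}. {r['name']}: {r['phone']}" for i, r in enumerate(results, 1)]
    return "\n".join(lines)


# Made-up sample response with four entries; only three are sent.
sample = json.dumps({"results": [
    {"name": "Example Service A", "phone": "01234 000001"},
    {"name": "Example Service B", "phone": "01234 000002"},
    {"name": "Example Service C", "phone": "01234 000003"},
    {"name": "Example Service D", "phone": "01234 000004"},
]})
print(format_sms_reply(sample))
```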
The team have created a video prototype, which looks great. The full Polish translation is complete, and will be added to the video using youtube closed captions, as well as an audio overlay later.
The project is to be presented to a group of GPs later this week for feedback as to usefulness and likely impact. Code, and scripts, are posted to the team github page.
Team Delta Test
The limiting factor for this team has been the size of the datasets that they are working with, and the speed at which these can be moved around. Despite early setbacks with port access through the wifi (something we’re working on for the next weekend) the team were able to show some real results for the final presentation.
Some interesting findings around the geotagging, and inconsistencies that can arise. Some really interesting possible extensions to the project were discussed. The plan is to take this project ‘back to the office’ as the prototype for a full roll out to help optimise the use of lab support for GPs.
Team Friend Tree
This team found that overlaps between their objectives and those of other teams were significant, so concentrated on some of the more ‘marketing’ aspects of service delivery – identity, and some thoughts around messaging to bring people into the service.
A good example of a service that could be rolled out quickly on top of the kind of datasets being used by the Float your boat project.
In the lead up to Code the City 7 we sent attendees some blank Barrier and Opportunity cards. We asked them to complete and bring them – with a single suggestion or idea per sheet.
On arrival people were to stick them to the wall. The response was great – with an enormous display of creativity quickly assembled. Many of these suggestions grouped well together. As we got started, five volunteers stepped forward to be the champion for one idea each, which formed the starting point of each of the projects taken forward during the weekend. You can read more about these from this blogpost onwards. Even the drawings accompanying the ideas were great – see the montage above!
But what of the remaining ideas – of which there were dozens? I read each of them and have summarised some of them – often grouping several together – below. Each of these has merit as a potential area to explore further (perhaps at a future event).
- Find out how busy a GP practice is, before you register
This links to a blog post I wrote recently about the ratio of GPs to patients at Scottish surgeries.
- Information on GP practices
It is suggested that there is no consistency across the NHS Grampian area – with some good examples of websites and some poor.
- Waiting times for appointments at GPs’ surgeries?
Where is the data to show which days are busier than others? How could that help patients?
- Live Tracking of referrals to consultants
Patients, on being referred to a consultant, are often left in the dark for weeks or months until a letter arrives. How could that be made transparent? Could we have a ‘track my referral’ as you would a ‘track my parcel’? How or when will you get an appointment with a consultant? Could you self-select from a calendar rather than get an appointment which doesn’t suit and has to be changed?
- Lack of data interoperability between elements of health service / Health and Social Care etc.
- Assist GPs to do more online – self service – online calendars for appointments – meaning that they can spend longer with patients or reduce waiting times for appointments
- Citizen / Patient digital literacy
How could we assist patients to use digital services as these are developed? This also raises the issue of health literacy – how could we assist people to understand their own health, e.g. cause and effect?
- Persuade / help GPs to get citizens to use informal / community-based support
- A shared calendar across NHS Grampian to share training opportunities. Much training is common but is delivered on a siloed basis.
- Develop a common organogram showing remits and areas of operation across the formal and informal H&SC landscape
- Address the challenges of patients being treated in parallel between two specialists, so that they don’t feel that they are being passed from pillar to post.
These ideas alone would feed another three hack weekends! If you are interested in working on these – or sponsoring a further weekend such as this – please let us know!
Great progress overnight and through the morning. Very few drop outs overnight – keeping a real energy in the room.
Float your boat
The team have created a prototype website focusing on helping people find events and services locally. Includes stories about people improving their lives through accessing services.
Currently acting as a central hub for finding further information.
Have discussed turning it into an app, but clearly a web first approach seems the most appropriate at this stage. Discussion about the potential for local community ownership, or for a body like Health Partnership Development to take the lead.
A key observation was that the scope of ambition for the project has jumped from very ambitious and broad, to much tighter, and back again multiple times. Deciding on the scope to tackle took significant time, and was acknowledged as key to making progress.
Worth noting that the team is treating the sourcing and management of high quality data to be a parallel problem, likely tackled elsewhere.
The team has a paper prototype app aiming to guide people towards independently finding a way to take part in the local community.
Similar to the ‘Float’ project, but focussed more tightly on social isolation issues and solutions.
They have been looking at the scoring, rating and categorisation of services and activities to aid in selection and guide people towards appropriate choices.
This team agreed with the importance of selecting a specific objective for the project – and to focus on that. Very easy to get distracted by related issues.
The unique element identified in the group discussion was the potential to allow the creation of some small groups. A fascinating example was the creation of a ‘take the bins out’ agreement among neighbours – helping people find help if they are away from home, and easing a ‘first contact’ event with new neighbours when you move home.
While the team ‘have nothing fancy to show’ they have made substantial progress since the last update, and are confident of having a well progressed project by the end of the day. Work has progressed on three fronts:
Data collection and insertion to new database.
Reporting layer, where work is focussed on generating mean values for overview presentation.
The Geo team have been translating postcodes into coordinates, and creating workflows and automation to allow this to happen as time goes on when new data and boundary changes happen.
An interesting discussion about availability of data about GP practices, (there is more than you might think, much of which can be reviewed here) and what can be done with it.
The language barrier project has focussed on refining the story told in the video and literature that it is creating. The discussion touched on the existing use of mobile phones as a primary translation tool for many people with English as a second language, especially when confronted with technical or medical terms.
Also discussed options to not only offer better translation access, but to offer language skills development services as a preferred approach.
This team have met a couple of technical barriers when tying their various elements together, but have achieved a number of key elements.
SMS messages are being relayed successfully.
A prototype of the service has been created in Java to simulate the interaction, on screen for now rather than by SMS.
Discussion has been primarily around the importance of marketing and communications around the service. Targeting of publicity thought to centre on food banks, shelters, pubs, chemists, community centres – all places with high footfall from the demographic groups the service would be most appropriate for.
The wider group identified this as a key tool in self management of long term issues, and something that would have a genuine impact.
Finally, a demonstration of some visualisation options using off the shelf visualisation tools to gain insights into the quality, coverage and usefulness of a data set.
Discussion around the demonstration identified the usefulness of the geographical visualisations in identifying differences and gaps in service levels from area to area.
Pre-pizza updates from the teams:
Since lunchtime the team have grabbed more coffee, created a big list of tasks, and been working on pulling Aliss data into their project. Work still to be done on the SMS layer.
Also discussing interesting natural language processing element to improve ease of use for the app.
Watch Team Text on Periscope.
The team have created a script and video prototype in English with Polish translation underway. Web based version is in progress and likely to be complete early tomorrow.
Looking at options for animation of the video tomorrow.
Watch team Pomoc on Periscope.
Since lunch the group have wrangled some network issues which held up progress, but have completed initial database design, and are working on the data and reporting layers in parallel.
Watch Team Delta Test on Periscope.
Since lunch the team have worked on a web prototype of the front end of the service. Lowering barriers to participation through better data access, easier navigation and quality curation.
Watch team Friends Tree on Periscope.
Floating my boat
Since lunch have eaten sweets and cola. Community layer is vital to many health issues – a service discovery app.
They have created a number of user personas to enable
Four sections to the envisaged service:
- Folk that can help
- Your community
- Folk that can listen
- Events and getting about
Watch Float My Boat on Periscope.
Lunchtime updates from the teams:
Everyone has the right to access to information to make them well. Many people do not have regular access to the internet, best indicated by the deprivation index.
This project is looking at how to bridge this gap over SMS – allowing individuals to send a text with a location and theme such as “Aberdeen Anxiety” and receive a response either asking further questions, or suggesting a course of action. No web links, just telephone numbers and addresses.
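The first step of such a service, splitting an incoming text into a location and a theme, might look like the Python sketch below. It is only an illustration under a simple assumption taken from the “Aberdeen Anxiety” example: the first word is the location and everything after it is the theme. The function name is hypothetical.

```python
def parse_sms(text):
    """Split an incoming text into (location, theme).

    Assumes the first word is the location and the rest is the theme.
    Returns None when either part is missing, so the service can reply
    with a follow-up question instead of a result.
    """
    words = text.strip().split(maxsplit=1)
    if len(words) < 2:
        return None
    location, theme = words
    return location.title(), theme.strip().lower()


print(parse_sms("Aberdeen Anxiety"))  # ('Aberdeen', 'anxiety')
print(parse_sms("Aberdeen"))          # None
```

A multi-word location such as “Bridge of Don” would be split wrongly by this rule; handling that would need a list of known place names or a follow-up question.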
Currently looking at quality of results, data protection issues, tech layer, how to pull information, how to store information, how to access and discover the service.
Pomoc is “help” in Polish. Many people are blocked from accessing appropriate services effectively due to language barriers.
This team is looking at the best ways to overcome this. They have discovered that in many cases it is cultural, rather than linguistic differences that are the biggest challenge.
Examples include cold remedies being available from doctors in Poland, but easily available in high street pharmacies in the UK. Seeing a doctor can waste GP time, lead to frustrating first appointments, and discourage further access.
They are currently working on the communication challenge around these differences, rather than on translation of medical terminology.
The team are addressing demand optimisation of GP requests into labs. The core problem is that many GPs do not have access to adequate information to select which tests to request – so request a full suite of tests.
The current system is very disjointed, Excel-based, and difficult to use. They’re working on an on-demand alternative to deliver easier access.
A Gumtree for friends. Social isolation is a significant problem. The team wants to tackle this with a theme of ‘finding volunteers and other ways to provide social contact’.
The research phase has discovered a number of apps delivering similar ideas. The team is keen to avoid recreating something that already exists. The next phase is to test existing options.
The Team (formerly known as Torry Dolphin Watchers and now known as) Float My Boat – Signposting
The key challenge here is how to enable people to identify options for health and wellbeing services and other related activities.
Have identified the potential for an app that can be aware of individuals, carers, cared for, and even provide profile management for care professionals.
Looking at novel interfaces to encourage usage. Also looking at challenges around the sourcing and validation of data.
We have a great turn out for the first morning of Codethecity Health this morning – despite the venue change and the sub zero temperatures in Aberdeen.
Following the initial idea capture using our barrier and opportunity cards, and a few warm-up exercises, we formed teams around initial ideas and problem areas.
More updates about each of the teams as the day progresses, and on #codethecity.
Saturday 10th December from 0900 – 1600 at Bridge of Don Academy, Aberdeen, AB22 8RR
Sport Aberdeen and Code the City are inviting people from across Bridge of Don and the wider Aberdeen area to take part in a full day community workshop looking at active travel ideas. The day will consider ideas to develop an Active Travel Hub in Bridge of Don which can promote and support cycling and walking in the community.
The event will be structured across a whole day, allowing for drop-in attendance throughout the day.
You can choose to drop in either morning or afternoon – or even stay for the full day if you like.
The day will involve:
- Identification of potential opportunities or problems relating to the siting and functionality of an active travel hub in the Bridge of Don area.
- Group idea generation session to address each of these areas employing a variety of appropriate techniques in order to generate the best ideas possible.
- Team and group work to explore each idea – developing these to envision what future states might be.
- Iterative development of prototype ‘solutions’.
- Catering (teas, coffee, juices, snacks during the morning and afternoon and a sandwich / pizza lunch) for all participants.
Please register your interest via Eventbrite https://www.eventbrite.co.uk/e/active-travel-hub-innovation-day-tickets-29474133928
For more detail on the event please contact Susan Fraser, Project Development Manager, Sport Aberdeen email@example.com
Code the City #8, which will take place on Sat 25th to Sunday 26th February 2017, will be an exploration of the world of chatbots and AI (or Artificial Intelligence), identifying problems to tackle and quickly prototyping solutions.
>>> Book a ticket on our Eventbrite page
What are chat bots?
A chatbot is a piece of software that interacts with a customer or user to directly answer their questions. It uses existing data or information coupled with artificial intelligence to respond in a human-like way, guiding the user to a solution.
There are many examples of live chat bots in this exciting, emerging field. A chatbot could give you travel directions, tell you when it’s next going to rain in your area, or help you contest parking tickets. It could book you a flight and hotel, or act as a free lawyer to help the homeless get housing. The HBO series Westworld has even launched a bot to help you interact with the (fictional) holiday park!
If you are new to this field and want to get started we suggest you read the Complete Beginners Guide to Chatbots (and some of the links at the end of this article).
How will the weekend run?
We’ll apply our usual Code The City methodology:
- Bring together a diverse range of people from various backgrounds, to form teams.
- Identify problems that we’d like to apply chatbots to solve.
- Identify approaches, information and data, to guide how we develop the bots and train them
- Mix academic thinking, and user need, with open source technology and open data to develop new services
- Iterate quickly through approaches, testing ideas, failing quickly and refining our approaches.
- Prototype and demonstrate solutions to an interested audience
Who should attend?
- Service owners – and service providers
- Academics and students in the field of chatbots and artificial intelligence
- Data specialists
- Front-end and UX designers
- Bloggers and social media practitioners
- Anyone with an interest in getting involved in creating bots even for fun!
What you will do?
You will create mixed teams to workshop chatbot solutions to real world issues. Maybe these will build on the outputs of previous work we’ve done at CodeTheCity. Through rapid prototyping you will create new applications and have some fun in the process.
We’ll show you new techniques for service design, idea generation, prototyping, and rapid iterative application development – and you will show other participants some tricks and approaches, too. We’ll share knowledge and learning.
You might even get a Tshirt, and we can guarantee the best catering of any weekend workshop in the city!
To book a free ticket visit our Eventbrite page. But be quick, tickets will go swiftly!
All attendees will get a year’s free membership of the Open Data Institute.
If you have any questions please get in touch.
How can I support this event?
If you are interested in sponsoring this event, or providing other support such as access to online tools or services, please get in touch.
Useful Articles and Resources
- Chatbots Magazine
- Chatbots Aren’t A Fad – they’re a revolution
- Ten tools to build your own Chatbot.
- Building Bots for Service
- Eight Principles of Bot Design
- Chatbots – the ultimate prototyping tool.
- Introduction: Deep Learning for Chatbots, part 1 and part 2
- Eleven Examples of Conversational Commerce and Chatbots 2016
- If you are a Slack user, you can create a Slack bot to mimic your colleagues in Python.