CTC8 – Chatbots and AI – final presentations

After two days of intense activity and a whole heap of learning for all of us, Code The City #8, our Chatbots and AI weekend, came to an end at tea time on Sunday.

It couldn’t have happened without the generous support of our two sponsors, The Health Alliance and Fifth Ring, for which we are very grateful.

The weekend rounded off with presentations of each project, four of which we’ve captured on video (see below).

Each of the projects has its own Github repo. Links are included at the end of each project description. And, two days later, the projects are still being worked on!

Team: ALISS

Team ALISS worked on providing a chatbot interface onto healthcare and social data provided via the ALISS system.

ALISS bot project : Code the City 8 from Andrew Sage on Vimeo.

You can find Project ALISS’s code here on Github.

You can also watch this video of Douglas Maxwell from the Alliance being interviewed about the weekend (although at the time of writing the video is offline due to an AWS problem).

Team: City-consult

This team aimed to make the quality of consultations better through using intelligent chatbot interfaces to guide users through the process – and to provide challenge by prompting citizens to comment on previous consultees’ input.

City-Consult bot project : Code the City 8 from Andrew Sage on Vimeo.

You can find the code for City-Consult at this Github repo.

Team: NoBot

The concept for NoBot grew out of an initial idea for a bot that would make scheduling meetings easier. That spawned another idea: what if the bot’s purpose was to make you have fewer meetings by challenging you at every turn? In the process, the bot’s personality as a sarcastic gatekeeper was born.

NoBot project : Code the City 8 from Andrew Sage on Vimeo.

The code for NoBot lives here on Github.

Team: Seymour

Sadly there is no video of the wind-up talk for Seymour. In short the purpose of Seymour is to help you keep your houseplants alive. (More details to come).

You can find the code for Seymour at this repo on Github.

Team: Stuff Happens

We started this project with the aim of helping citizens find out about the myriad local events which we all often seem to miss. Many local authorities have a What’s On calendar, sometimes with an RSS feed. Unfortunately, none that we found had an API.

We identified that by pulling multiple RSS feeds into a single database and putting a bot in front of it, driven either by scripting or by some AI, it should be possible to put potential audiences in touch with what is happening.

Further, by enhancing the collected data – enriching it either manually or by applying machine logic – we could make it more easily navigable and intelligible.
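As a rough illustration of the "many feeds, one database" idea, here is a minimal Python sketch. It is not the team's code: the two sample feeds, the `events` table and the function names are all made up for the example, and a real version would fetch live feeds and cope with far messier data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 snippets standing in for real What's On feeds (hypothetical data).
SAMPLE_FEEDS = {
    "council-a": """<rss version="2.0"><channel><title>Council A</title>
      <item><title>Farmers Market</title><link>http://example.org/a/1</link>
        <pubDate>Sat, 04 Mar 2017 09:00:00 GMT</pubDate></item>
      <item><title>Ceilidh Night</title><link>http://example.org/a/2</link>
        <pubDate>Fri, 10 Mar 2017 19:30:00 GMT</pubDate></item>
    </channel></rss>""",
    "council-b": """<rss version="2.0"><channel><title>Council B</title>
      <item><title>Book Fair</title><link>http://example.org/b/1</link>
        <pubDate>Sun, 12 Mar 2017 10:00:00 GMT</pubDate></item>
    </channel></rss>""",
}


def parse_rss(xml_text):
    """Return a list of dicts, one per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    events = []
    for item in root.iter("item"):
        events.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return events


def load_feeds(conn, feeds):
    """Store every item from every feed in one table, keyed on its link."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(source TEXT, title TEXT, link TEXT PRIMARY KEY, published TEXT)"
    )
    for source, xml_text in feeds.items():
        for ev in parse_rss(xml_text):
            # INSERT OR IGNORE means re-running the import does not duplicate events.
            conn.execute(
                "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)",
                (source, ev["title"], ev["link"], ev["published"]),
            )
    conn.commit()


def find_events(conn, keyword):
    """The kind of simple lookup a bot could run to answer 'what's on?'."""
    rows = conn.execute(
        "SELECT title, source FROM events WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load_feeds(conn, SAMPLE_FEEDS)
```

Keying the table on the event link is only one possible choice; feeds that reuse links, or omit them, would need a different way of spotting duplicates.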

Expect a full write-up of the challenges of this project, and what progress was made, on Ian’s blog.

There is no video, but you can find the project code here on Github.

Team: W[oa]nder

This project set out to solve the problem of checking if a shop or business was still open for the day through a Facebook bot interface – as you wander around, wondering about the question, as it were.

W[oa]nder bot project : Code the City 8 from Andrew Sage on Vimeo.

You can find their code here.

And finally, we were joined on day two by Rory, who set out to assist team Stuff Happens by developing some of the AI around terminologies or categories. That became the:

Word Association Scorer

This is now on Github – not a bot but a set of Python functions that score a given text against a set of categories.
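To give a flavour of what scoring a text against categories can look like, here is a deliberately simple sketch. It is not Rory's code: the category word lists and function names are invented for illustration, and it only counts exact word matches rather than doing any real word association.

```python
import re
from collections import Counter

# Hypothetical categories and trigger words, purely for illustration.
CATEGORIES = {
    "music": {"gig", "concert", "band", "choir"},
    "sport": {"football", "match", "run", "swim"},
    "family": {"children", "kids", "family", "storytime"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score_text(text, categories=CATEGORIES):
    """Count how many words in the text belong to each category."""
    counts = Counter(tokenize(text))
    return {
        name: sum(counts[word] for word in words)
        for name, words in categories.items()
    }


def best_category(text, categories=CATEGORIES):
    """Return the highest-scoring category, or None if nothing matched."""
    scores = score_text(text, categories)
    name = max(scores, key=scores.get)
    return name if scores[name] > 0 else None
```

For example, `score_text("Family concert with a local band and choir")` gives music a score of 3 and family a score of 1, so `best_category` returns `"music"` for that event title.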

And Finally

We had loads of positive feedback from those who attended the weekend (both old hands and newbies) and from those who watched from afar, following progress on Twitter.

We’ve published the dates for CTC9 and subsequent workshops on our front page. We hope you can join us for more creative fun.

Ian, Andrew, Steve and Bruce
@codethecity

Scraping Goes Off The Rails

This post was originally published on 10ml.com by Ian Watt

The art of scraping websites is one beset by difficulties, as I was reminded this week when re-testing a scraper that I built recently.

Schienenbruch

 

Railway performance

As part of my participation in 100 Days of Code I’ve been working on a few projects.

The first one that I tackled was a scraper to gather data from the PDF performance reports which are published on a four-weekly cycle on Scotrail’s website. On the face of it this is a straightforward thing to do.

  1. Find the link to the latest PDF on the performance page using the label “Download Monthly Performance Results”.
  2. Grab that PDF to archive it. (Scotrail don’t do that – they remove each one and replace it with a new one every four weeks, so there is no archive).
  3. Use a service such as PDFTables which has an API, uploading the PDF and getting a CSV file in return (XLSX and XML versions are also available but less useful in this project).
  4. Parse the CSV file and extract a number of values, including headline figures, and four monthly measures for each of the 73 stations in Scotland.
  5. Store those values somewhere. I decided on clean monthly CSV output files as a failsafe, and a relational SQLite database as an additional, better solution.
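Steps 4 and 5 can be sketched roughly as follows. This is an illustration rather than the scraper itself: the CSV layout, the column names `measure_a` to `measure_d` and the `performance` table are assumptions made up for the example, since the real converted file is far less tidy.

```python
import csv
import io
import sqlite3

# A made-up, already-clean CSV standing in for the converted PDF table.
SAMPLE_CSV = """station,measure_a,measure_b,measure_c,measure_d
Aberdeen,91.2,88.0,95.5,90.1
,,,,
Dundee,89.7,85.3,94.0,88.8
"""

MEASURES = ("measure_a", "measure_b", "measure_c", "measure_d")


def parse_rows(csv_text):
    """Return (station, a, b, c, d) tuples, skipping rows with no station name."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        station = (row.get("station") or "").strip()
        if not station:
            continue  # blank spacer rows are common in converted PDFs
        rows.append((station, *(float(row[m]) for m in MEASURES)))
    return rows


def store(conn, year, period, rows):
    """Write one period's figures, replacing any earlier run for that period."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS performance (
               year TEXT, period INTEGER, station TEXT,
               a REAL, b REAL, c REAL, d REAL,
               PRIMARY KEY (year, period, station))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO performance VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(year, period, *row) for row in rows],
    )
    conn.commit()
```

Making year, period and station the primary key means the scraper can be re-run on the same PDF without creating duplicate records.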

Creating the scraper

So, I built the bones of the scraper in a few hours over the first couple of days of the year. I tested it on the then-current PDF, which was for period nine of 2016-17. That worked, first creating the clean CSV, then later adding the DB-write routines.

Boom – number 1

I then remembered that I had downloaded the previous period’s PDF. So I modified the code (to omit the downloading routine) and ran it to test the scraping routine on it – and it blew up my code. The format of the table structure in the PDF had changed, with an extra blank column to the right of the first list of station names.

After creating a new version and publishing that, I sat back and waited for the publication of period 10 data. That was published in the middle of this week.

Boom – number 2

I re-ran the scraper to add that new PDF to my database – and guess what? It blew up the scraper again. What had happened? Scotrail had changed the structure of the filename of the PDF – from using dashes (as in ‘performance-display-p1617-09.pdf’) to underscores (‘performance_display_p1617_10.pdf’).

That change meant that my routine for picking out the year and period, which is used to identify database records, broke. So I had to rewrite it. Not a major hassle – but it means that each new publication has necessitated a tweaking of the code. Hopefully in time the code will be flexible enough to accommodate minor deviations from what is expected without manual changes. We’ll see.
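One way to make that routine more forgiving is a regular expression that accepts either separator. The sketch below is a possible approach, not the code in the scraper; the function name is made up, and it assumes the year is always two pairs of digits after a "p" and belongs to the 2000s.

```python
import re

# "p", two 2-digit years, a dash or underscore, then the period number.
FILENAME_RE = re.compile(r"p(\d{2})(\d{2})[-_](\d{1,2})\.pdf$", re.IGNORECASE)


def parse_period(filename):
    """Return (year_label, period) from a performance PDF filename."""
    match = FILENAME_RE.search(filename)
    if match is None:
        # Fail loudly so an unexpected new naming scheme is noticed straight away.
        raise ValueError(f"Unrecognised filename: {filename!r}")
    start, end, period = match.groups()
    return f"20{start}-{end}", int(period)
```

Both `parse_period("performance-display-p1617-09.pdf")` and `parse_period("performance_display_p1617_10.pdf")` return the year label `"2016-17"`, with periods 9 and 10 respectively. A completely different naming scheme would still need a code change, but it would at least raise a clear error rather than write bad records.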

We’re ‘doing the wrong thing righter’ – Drucker

Of course, none of this should be necessary.

In a perfect world Scotrail would publish well structured, machine-readable open data for performance. I did email them on 26th November 2016, long before I started the scraper, both asking for past periods’ data and asking if they wanted assistance in creating Open Data. I got a customer service reply on 7th December saying that a manager would be in touch. To date (15 Jan 2017) I’ve had no further response.

The right thing

Abellio operates the Scotrail franchise under contract to the Scottish Government.

Should the terms of such contracts not put an obligation on the companies not only to put the monthly data into the public domain, but also to make it available as good open data – and to follow the Scottish Government’s own strategy for Open Data? Extending the government’s open data obligation to those performing contracts for governments would be a welcome step forward for Scotland.

Chatbots and AI – #CTC8

Code the City #8, which will take place on Saturday 25th and Sunday 26th February 2017, will be an exploration of the world of chatbots and AI (or Artificial Intelligence), identifying problems to tackle and quickly prototyping solutions.

>>> Book a ticket on our  Eventbrite page 

What are chatbots?

A chatbot is a piece of software that interacts with a customer or user to directly answer their questions. It uses existing data or information coupled with artificial intelligence to respond in a human-like way, guiding the user to a solution.

There are many examples of live chatbots in this exciting, emerging field. A chatbot could give you travel directions, tell you when it’s next going to rain in your area, or help you contest parking tickets. It could book you a flight and hotel, or act as a free lawyer to help the homeless get housing. The HBO series Westworld has even launched a bot to help you interact with the (fictional) holiday park!

If you are new to this field and want to get started we suggest you read the Complete Beginners Guide to Chatbots (and some of the links at the end of this article).

Example Travel Bot
Example Waste Bot

How will the weekend run?

We’ll apply our usual  Code The City methodology:

  • Bring together a diverse range of people from various backgrounds, to form teams.
  • Identify problems that we’d like to apply chatbots to solve.
  • Identify approaches, information and data to guide how we develop and train the bots.
  • Mix academic thinking and user need with open source technology and open data to develop new services.
  • Iterate quickly through approaches, testing ideas, failing quickly and refining our approaches.
  • Prototype and demonstrate solutions to an interested audience.

Who should attend?

  • Service owners – and service providers
  • Academics and students in the field of chatbots and artificial intelligence
  • Coders
  • Data specialists
  • Front-end and UX designers
  • Bloggers and social media practitioners
  • Anyone with an interest in getting involved in creating bots even for fun!

What will you do?

You will create mixed teams to workshop chatbot solutions to real world issues. Maybe these will build on the outputs of previous work we’ve done at CodeTheCity. Through rapid prototyping you will create new applications and have some fun in the process.

We’ll show you new techniques for service design, idea generation, prototyping, and rapid iterative application development – and you will show other participants some tricks and approaches, too. We’ll share knowledge and learning.

You might even get a T-shirt, and we can guarantee the best catering of any weekend workshop in the city!

To book a free ticket visit our Eventbrite page. But be quick, tickets will go swiftly!

All attendees will get a year’s free membership of the Open Data Institute.

You can find out more about the previous events on tumblr, on the eventifier, and on flickr.

If you have any questions please get in touch.

How can I support this event?

If you are interested in sponsoring this event, or providing other support such as access to online tools or services, please get in touch.

Useful Articles and Resources


Journeygrid

“We should build one for here!”

So starts another Code The City conversation on discovering a neat data-driven tool. This time it’s the excellent New York subway toy created by Jason Wright.

Brand_New_Subway

The tool allows you to redesign transit provision in the city by building new subway routes. By adding new stations. By removing or moving existing lines.

It’s addictive and fascinating.

As is so often the case, we then start riffing on what it could also do. It could time travel using that tram data we have from the early 1900s. It could give alternate route options if we hook up to that academic project we spoke with earlier in the year. It could carbon count. It could give safety information for cyclists. We could data collect with a new app to feed it improved validation data…

Before we have the cake we’re discussing how pretty the icing will look.

In reality what we should be looking at is the bottom layer. The underpinnings. The data.

Where do people live? Where do they work? Where do they school run? Where is the football stadium and where do the fans live? Where are the shops and where is the money?

We’re going to start with the commute. Where do people start, spend, and end their day? How do they move around? And when? No agenda. No grand insights planned. Just a good solid data gathering and modelling project.

We’re calling it journeygrid.

journeygrid open data transportation project

If you have any data, or methodologies for gathering and storing such data we’d love to speak to you.

You can find out more about the New York Subway project here, and you can play with it here.

Tourism Hack – Perth – TBC

PLEASE NOTE – Due to low take-up this event has been postponed. We are sorry for any inconvenience this will cause. 

Perth wants to boost its tourism offer and wants some help! They want to see whether some well developed apps could help the city and its wider area bring attractions, trails, events, culture, accommodation, eateries and activities to life.

They are also interested in bringing the quirky and interesting aspects of the city together, using great images and interesting user generated content through social media.

==================================================
=
= Update
=
= Data sources added on Github
=
==================================================

They have developed the website http://www.perthcity.co.uk/ and there is an app (http://www.mi-perthshire.co.uk/), but they want some creative minds to take a fresh look at the city and surrounding area and generate new ideas that they could then develop into new apps, open data or other projects.

As always we’re looking for coders, designers, data wranglers, service users and providers, bloggers – in fact anyone with an interest – to join us for a weekend of ideation, creation, open data and rapid prototyping.

We’ll feed you, keep you stimulated, and provide good wifi. You will leave with a sense of accomplishment, new skills and potentially new friends.

Accommodation.

We’ve uploaded a list of hotels in this Perth City Accommodation List.

In addition there is a cluster of B&Bs on Dunkeld Road.

Also, just outside the city itself, The Lodge at the Perth Racecourse are offering a flat rate of £90 per night in a Double or Twin bedded room (£45 per person), which also includes a full breakfast. See  http://perthlodge.co.uk/dining

New Code the City dates set

We’ve set the dates for the next three Code The City events. Please add these to your diaries. Bookings will open soon.

Code The City #7

Dates: Saturday 19th  – Sunday 20th November 2016
Location: Aberdeen University (tbc)
Theme: Health
More info /  Tickets

Code The City #8

Dates: Saturday 25th  – Sunday 26th February 2017
Location: Aberdeen University
Theme:  Chatbots and AI
More info /  Tickets

Code The City #9

Dates: Saturday 13th  – Sunday 14th May 2017
Location: Aberdeen University
Theme:  Transport (tbc)
Further details to follow.

======================================================

The following event will be rescheduled:

Code The City #Perth

Dates: TBC in 2017
Location: Perth College UHI
Theme: Tourism

So, how did CTC6 – The History Jam go?

Intro

On 19th and 20th March we found ourselves back at Aberdeen Uni with 35 or so eager hackers looking to bring to life a 3D Virtual Reality historic model of Aberdeen city centre using new open data. So how did it go?

This time we were more prescriptive than at any previous Code The City event. In the run up to the weekend we’d identified several sub-team roles.

  • Locating, identifying and curating historic content
  • Transcribing, formatting and creating valid open data
  • Building the 3D model, fixing and importing images and
  • Integrating and visualising the new data in the model.
Andrew Gives us an Open Data Briefing

After some breakfast, an intro and a quick tutorial on Open Data, delivered by Andrew Sage, we got stuck in to the work in teams.

Old Books into Open Data

We were lucky to have a bunch (or should that be a shelf-ful) of city librarians, an archivist and a gaggle of other volunteers working on sourcing and transcribing data into some templates we’d set up in Google Sheets.

Given that we’d been given scanned photos of all the shop frontages of Union Street, starting in 1937, of which more below, we settled on that as the main period to work from.

The Transcribers

The librarians and helpers quickly got stuck into transcribing the records they’d identified – particularly the 1937-38 Post Office Directory of Aberdeen. If my arithmetic is correct they completely captured the details of 1100+ businesses in the area around Union Street.

At present these are sitting in a Google Spreadsheet – and we will be working out with the librarians how we present this as well structured, licensed Open Data. It is also a work in progress. So there are decisions to be made – do we complete the transcription of the whole of Aberdeen, or do we move on to another year, e.g. 1953, which is when we have the next set of shopfront photos?

We have a plan

Music, pictures and sound

At the same time as this transcription was ongoing, we had someone sourcing and capturing music such as might have been around in 1937, and sounds that you might have heard on the street – including various tram sounds – which could be imported into the model.

Sounds of the city

And three of us did some work on beginning an open list of gigs for Aberdeen since the city had both the Capitol Theatre (Queen, AC/DC, Hawkwind) and the Music Hall (Led Zeppelin, David Bowie, Elton John) on Union Street. This currently stands at 735 gigs and growing. Again, we need to figure out when to make it live and how.

The 3D Model

At CTC5 back in November 2015, Andrew Sage had started to build a 3D model of the city centre in Unity. That relied heavily on manually creating the buildings. Andrew’s idea for CTC6 was to use OpenStreetMap data as a base for the model, and to use some scripting to pull the buildings’ footprints into the model.
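For anyone curious what pulling footprints out of OpenStreetMap data involves, here is a small Python sketch (not the Unity scripting used at the event). It reads an OSM XML extract, where a building is a "way" tagged `building` that lists the nodes forming its outline. The sample data and function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# A tiny hand-made OSM XML extract: one building and one road (hypothetical data).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="57.1460" lon="-2.1000"/>
  <node id="2" lat="57.1460" lon="-2.0998"/>
  <node id="3" lat="57.1461" lon="-2.0998"/>
  <node id="4" lat="57.1461" lon="-2.1000"/>
  <way id="100">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="4"/><nd ref="1"/>
    <tag k="building" v="retail"/>
    <tag k="name" v="Example Shop"/>
  </way>
  <way id="101">
    <nd ref="1"/><nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def building_footprints(osm_xml):
    """Map each building way's id to its outline as a list of (lat, lon) points."""
    root = ET.fromstring(osm_xml)
    nodes = {
        node.get("id"): (float(node.get("lat")), float(node.get("lon")))
        for node in root.findall("node")
    }
    footprints = {}
    for way in root.findall("way"):
        tags = {tag.get("k"): tag.get("v") for tag in way.findall("tag")}
        if "building" not in tags:
            continue  # roads, paths and so on are not wanted here
        refs = [nd.get("ref") for nd in way.findall("nd")]
        # Skip outlines that reference nodes missing from the extract.
        if all(ref in nodes for ref in refs):
            footprints[way.get("id")] = [nodes[ref] for ref in refs]
    return footprints
```

Each outline is a closed ring, with the first point repeated at the end, which a 3D tool could then extrude into a simple block. Buildings mapped as multipolygon relations are ignored by this sketch.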

Oculus Rift Headset and a 1937 Post Office Directory

This proved to be more challenging than expected. Steven Milne has written a great post on his site. I suggest that you read that then come back to this article.

As you’ve hopefully just read, Steve has identified the challenge of using OpenStreetMap data for a project such as this: the data just isn’t complete enough or accurate enough to be the sole source of the data.

While we could update data – and push it back to OSM, that isn’t necessarily the best use of time at a workshop such as this.

An alternative

There is an alternative to some of that. All 32 local authorities in Scotland maintain a gazetteer of all properties in their area. These are highly accurate and constantly updated, and have Unique Property Reference Numbers (UPRNs) and geo-coordinates for all buildings. This data (if it were open) would make projects such as this so much easier. While we would still need building shapes to be created in the 3D model, we would have accurate geo-location of all addresses, and so could tie the transcribed data to the 3D map very easily.

By using UPRNs as the master data across each transcribed year’s data we could match the change in use of individual buildings through time much more easily. There is a real need to get the data released by authorities as open data, or at least with a licence allowing generous re-use of the data. ODI Aberdeen are exploring this with Aberdeen City Council and the Scottish Government.
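To show why a shared key helps, here is a minimal sketch of comparing two transcribed years once every record carries a UPRN. Everything in it is invented for illustration: the UPRNs, the business names and the function name are not from the real gazetteer or the transcribed directories.

```python
def changes_of_use(earlier, later):
    """Compare two {UPRN: business} mappings and report what changed.

    Returns {UPRN: (business_before, business_after)}; None means the
    property has no record for that year.
    """
    changes = {}
    for uprn in sorted(set(earlier) | set(later)):
        before = earlier.get(uprn)
        after = later.get(uprn)
        if before != after:
            changes[uprn] = (before, after)
    return changes


# Made-up records keyed on made-up UPRNs.
records_1937 = {
    "900000000001": "Grocer",
    "900000000002": "Tailor",
    "900000000003": "Tobacconist",
}
records_1953 = {
    "900000000001": "Grocer",
    "900000000002": "Shoe shop",
    "900000000004": "Cafe",
}

changes = changes_of_use(records_1937, records_1953)
```

With these sample records the grocer is unchanged and drops out, the tailor has become a shoe shop, and two properties appear in only one of the years. Without a shared key, the same comparison would mean fuzzy-matching addresses as they were written in each directory.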

Fixing photos

The city’s Planning Service gave us scans of photos of the shopfronts of Union Street from a number of decades, from 1937 and 1953 on to the present. Generally the photos are very good but there are issues: we have seams between photos which run down the centre of buildings, binding tape showing through, and so on.

A split building on Castle Street.

These issues are not so very difficult to fix – but they do need someone with competence in Photoshop, some standard guidance, and workflow to follow.

We started fixing some photos so that they could provide the textures for the buildings of Union Street in the model. But given the problems we were having with the model, and a lack of dedicated Photoshop resource, we parked this for now.

Next steps

Taking this project forward, while still posing some challenges, is far from impossible. We’ve shown that the data for the entire city centre for any year can be crowd-transcribed in just 36 hours. But there are some decisions to be made.

Picking up on the points above, these can be broken down as follows.

Historical Data

  • Licensing model to be agreed
  • Publishing platform to be identified
  • Do we widen geographically (across the city as a whole) or temporally (same area, different years)?
  • Creating volunteer transcribing teams, with guidance, supervision and perhaps a physical space to carry out the work.
  • Identify new data sources (e.g. the Archives were able to offer valuation roll data for the same period – would these add extra data for buildings, addresses, businesses?)
  • Set up a means for the general public to get involved – gamifying the transcription process, perhaps?

Photos

  • Similar to the data above.
  • We need clear CC licences to be generated for the pictures
  • Crowdsource the fixing of the photos
  • Create workflow, identify places for the pictures to be stored
  • Look at how we gamify or induce skilled Photoshop users to get involved
  • Set up a repository of republished, fixed pictures, licensed for reuse, with proper addressing system and naming  – so that individual pictures can be tied to the map and data sources

The 3D Model

  • Build the model
  • Extend the coverage (geographically and through time)
  • Establish how best to display the transcribed data – and to allow someone in the 3D environment to move forward and back in time.
  • Look at how we can import other data such as a forthcoming 3D scan of the city centre to shortcut some development work
  • Look at how we can reuse the data in other formats and platforms (such as Minecraft) with minimum rework.
  • Speed up the 3D modelling by identifying funding streams that could be used to progress this more quickly. If you have suggestions please let us know as a comment below.

Taking all of this forward is quite an undertaking, but it is also achievable if we break the work down into streams and work on those. Some aspects would benefit from CTC’s involvement – but some could be done without us. So, libraries could use the experience gained here to set up transcribing teams of volunteers, creating proper open data with real re-use value. That data could then easily be used by anyone who wants to reuse it – e.g. to create a city centre mobile app which allows you to see any premises on Union Street, call up photos from different periods, find out which businesses operated there, etc.

As the model takes shape and we experiment with how we present the data we can hopefully get more attention and interest (and funding?) to support its development. It would be good to get some students on placements working on some aspects of this too.

Aberdeen City Council is working with the Scottish Cities Alliance to replace and improve the Open Data platforms for all seven Scottish cities later this year. Once in place, that will provide a robust means of presenting and storing all this open data, but in the meantime we will need to find some temporary alternatives (perhaps on Github) until we are ready.

We welcome your input on this – how could you or your organisation help, what is your interest, how could you assist with taking this forward? Please leave comments below.

Code The City 6 – The History Jam was funded by Aberdeen City Council’s Libraries service and generously supported by Eventifier, who provided us with free use of their Social Media platform and its LiveWall for the sixth consecutive time!

History Jam – #CTC6

The History Jam (or Code The City #6 if you are counting) will take place on 19-20 March 2016 at Aberdeen University. You can get one of the remaining tickets here.

As a participant, you’ll be bringing history to life, creating a 3D virtual reality map of a square mile of Aberdeen’s city centre. You’ll be gathering data from a variety of historical sources, transcribing that and creating new open data. You’ll import that into the 3D model.
And there will also be the opportunity to re-use that data in imaginative new ways. So, if you are a Minecraft fan, why not use the data to start building Minecraft Aberdeen?
This is not one of our usual hacks, whatever that is! This time around instead of you proposing problems to be worked on, we’ve set the agenda, we’ll help form the teams, and provide you with more guidance and support.
If you come along you’ll learn open data skills. And you’ll get a year’s free membership of the Open Data Institute!

Saturday’s Running Order

09:00 Arrive in time for fruit juices, coffee, pastries, or a rowie.

09:30 Introduction to the day
09:45 Briefing of teams and, if you are new to Open Data, a quick training session

10:15 Split into three streams:

  • Sourcing and curation of data, and structuring capture mechanisms
  • Transcribing, cleaning, and publishing open data
  • Creating the 3D map, importing and visualising the data

CTC-6-Flow1

Throughout the day we’ll have feedback sessions, presenting back to the room on progress. We’ll write blog posts, create videos, photograph progress.

13:00 Lunch (the best sandwiches in Aberdeen)

More workstream sessions with feedback and questions.

17:30 (or so) Pizza and a drink

We’ll wind up about 8pm or so, if you can stay until then.

Sunday’s Agenda

09:30 arrive for breakfast

10:00 kick off

Morning sessions

12:30 Lunch

Afternoon sessions

16:00 Show and Tell sessions – demonstrate to the room, and a wider audience, and preserve for posterity what you’ve produced in less than 36 hours. You’ll be amazed!