Category: Data Journalism

Uni applications drop off – but which subject area is hardest hit?

Thanks to a hike in tuition fees, there has been a drop-off in the number of people applying to UK universities compared with the 2011 figures.

Usefully, the Guardian has posted the numbers on its Datablog and I’m starting to munch through the data.

Here’s the first set of findings – by subject area grouped into discipline, thanks to Wikipedia’s List of Academic Disciplines.

MPs’ subsidised dining rooms

So MPs are expected to pay the equivalent of “high street pub” prices when buying food in the subsidised restaurant in the House of Commons, are they? (Telegraph)

Let’s see if that’s the case.

Taking 5 dishes mentioned in the article above, I compared them to an equivalent TYPE of dish at Wetherspoons, Walkabout and All Bar One.

Sources:

http://www.jdwetherspoon.co.uk/

http://www.allbarone.co.uk/

http://www.walkabout.eu.com/

And special thanks to @keridavies (http://www.keridavies.net)

5 ways to gather data

Before you can begin to draw fancy charts, visualisations and create in-depth hard hitting stories about data – you need to find the data in the first place.

Here’s how I sourced the data for my Datamud project, a look at the statistics behind the big UK music festivals.

1. SEARCH

Official Site

The last thing you want to do is call up a press officer asking for some stats when they are there, for all to see, on the website. Dig around in any areas labelled information, statistics, FOI and Press Area. Companies will often post useful statistics if they are frequently requested, but they won’t necessarily make those statistics easy to find.

The Glastonbury Festival Educational Resources area is rich with information. A series of PDFs contains details about every element of the event, from crowd management and security to stalls and sanitation. As the UK’s largest festival, it is often the subject of assignments and reports. This was useful as I looked for recycling information to back up the organisers’ claims that they are a green event.

Google

Google is a wonderful tool – it not only searches websites, but also blogs, news postings, pictures and videos. It’s well worth checking the NEWS section as someone else may have already done similar research and posted the stats online.

Unfortunately a search can return thousands of pages, so you need to be smart when submitting your search. Putting inverted commas around a phrase searches for those words exactly as written, and combining a quoted phrase with a simple keyword narrows the results further.

e.g. “were arrested” 2010

Don’t forget to check the later pages of the search too – sometimes you will find some juicy stuff buried on the less Google juicy sites.

Governing Bodies

Often Google won’t be able to pick up deep linked pages, or documents embedded or linked in pages, so it’s always worth looking at official agencies’ and governing bodies’ websites too.
Councils and the Government are now much better at archiving their agendas and minutes, and whilst the search facilities are still pretty archaic and frustrating, it’s a start.

None of the various police forces’ websites had the crime stats that I needed, although they do often have documents that may be of use, e.g. Leicestershire Police.

Search / Scraping Sites

Although I did not use this during this assignment, in retrospect using a site like Scraperwiki to access data from an official site would have saved me a lot of time. I could have used it to draw together all the line ups, for example, instead of a long winded cut-and-paste process, and plenty of cleaning up.
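As a rough illustration of what a scraper does, here is a minimal Python sketch that pulls band names out of a line-up page using only the standard library. It is not how Scraperwiki itself works, and the HTML snippet, the `band` class name and the band names are all invented placeholders; a real festival page would need its own rules, and it is worth checking a site’s terms before scraping it.

```python
from html.parser import HTMLParser

# Hypothetical markup: real festival pages will be structured differently.
SAMPLE_HTML = """
<ul id="lineup">
  <li class="band">Example Band A</li>
  <li class="band">Example Band B</li>
  <li class="sponsor">Not a band</li>
  <li class="band">Example Band C</li>
</ul>
"""


class LineupParser(HTMLParser):
    """Collects the text of every <li class="band"> element."""

    def __init__(self):
        super().__init__()
        self.bands = []
        self._in_band = False

    def handle_starttag(self, tag, attrs):
        # Only list items carrying the (assumed) "band" class are of interest.
        if tag == "li" and dict(attrs).get("class") == "band":
            self._in_band = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_band = False

    def handle_data(self, data):
        if self._in_band and data.strip():
            self.bands.append(data.strip())


def extract_bands(html):
    """Return the band names found in the given HTML, in page order."""
    parser = LineupParser()
    parser.feed(html)
    return parser.bands


print(extract_bands(SAMPLE_HTML))
```

The resulting list can be pasted straight into a spreadsheet, which is the step that replaces the cut-and-paste and clean-up work.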

Nowadays there are also sites that have done a lot of the work for you, by monitoring official sites and databases and turning the data into an easy to handle format.

First stop should be What Do They Know, a site geared up around FOI requests (more on this in a moment). You should also definitely visit TheyWorkForYou: I set up an alert for the Glastonbury festival, which would tell me whenever it was mentioned. My hope was that crime levels or crowd management would be raised at some point, with a reference to the information given.

Interest Sites

I mentioned Google News search above, but it’s also worth looking for sites that deal with the specific subject area. They may have useful resources but may not appear on page 1 of a Google Search.

When I was compiling lists of the bands playing the various festivals, the official sites were often clunky, or the names were shown on a JPG of the official event poster. However, festival news/interest sites, such as EFestivals, present the information in a more useful way.

2. ASK PRESS OFFICE

For archive or very up to date statistics, often a call to the press office is necessary.

I wanted to find out more about historical weather forecasts, and a visit to the Met Office website informed me that they had a library of data that could be accessed. Within one quick email conversation I was furnished with a link to a host of archive weather data, with records often going back to the 1700s. In CSV format, these were simple to manipulate and visualise.
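To give a flavour of why CSV is so easy to work with, here is a small Python sketch that averages a value by month across several years. The column names (`year`, `month`, `rainfall_mm`) and the figures are placeholders I have made up for illustration; they are not real Met Office data, and an actual archive file will have its own layout.

```python
import csv
import io

# Placeholder rows for illustration only; these are NOT real Met Office figures.
SAMPLE_CSV = """year,month,rainfall_mm
1990,6,40.0
1990,7,60.0
1991,6,50.0
1991,7,30.0
"""


def average_by_month(csv_text, value_column="rainfall_mm"):
    """Return {month: mean value} across all years in the file."""
    totals = {}
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = int(row["month"])
        totals[month] = totals.get(month, 0.0) + float(row[value_column])
        counts[month] = counts.get(month, 0) + 1
    return {month: totals[month] / counts[month] for month in totals}


print(average_by_month(SAMPLE_CSV))
```

With a real file you would open it with `open(path, newline="")` instead of wrapping a string in `io.StringIO`.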

Press offices are used to dealing with requests for information (it’s their job) and are usually happy to help you meet deadlines.

3. FOI

FOI requests are for those tricky bits of data that an organisation is more reluctant to send out (for reasons of time, size, sensitivity etc.). I sent ONE FOI request, for crime stats, to a police force, foolishly thinking this would be quicker than contacting the press office directly. It was not.

Use these if you do not need the information urgently (it can take up to a month from start to finish).

Interesting article on FOI Requests from Channel 4

4. CROWDSOURCE

Of course, carrying out your own research is one way of gathering data, but this project relied on the theory that “many hands make light work”.

I wanted to find out how much it would cost to see the various mainstage bands, if you were to see them on their own headline tours. I could have spent DAYS trawling the internet ticketing sites (both UK and international) collecting the data. Instead I started a public Google Docs spreadsheet. Through the social networks I encouraged people to enter the prices of tickets they had recently bought. The database was soon a third full, and a chance message from an old friend (the man behind Ents24) completed the rest by gaining access to their database.
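Once the spreadsheet is filled in, the sum itself is simple. The Python sketch below is one way it could be done, assuming the sheet has been exported as rows of band and price; the band names, prices and line-up are invented placeholders rather than anything from the real spreadsheet.

```python
# Invented placeholder data: band names, prices and the line-up are hypothetical.
ticket_prices = [
    ("Example Band A", 25.00),
    ("Example Band A", 35.00),  # two contributors entered different prices
    ("Example Band B", 40.00),
]
mainstage_lineup = ["Example Band A", "Example Band B", "Example Band C"]


def average_prices(rows):
    """Average the crowdsourced prices entered for each band."""
    grouped = {}
    for band, price in rows:
        grouped.setdefault(band, []).append(price)
    return {band: sum(prices) / len(prices) for band, prices in grouped.items()}


def lineup_cost(lineup, prices):
    """Return (total cost of the bands we have prices for, bands still missing)."""
    total = sum(prices[band] for band in lineup if band in prices)
    missing = [band for band in lineup if band not in prices]
    return total, missing


total, missing = lineup_cost(mainstage_lineup, average_prices(ticket_prices))
print(f"Cost to see them separately: £{total:.2f}; still need prices for {missing}")
```

Reporting the missing bands alongside the total makes it obvious which gaps to ask contributors to fill next.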

Google Docs is a fantastic way of collaborating and getting large jobs completed.

5. I GOT MY CALCULATOR OUT

This can be hard work if you are dealing with a lot of data, but for me it was feasible.

I wanted to assess the nationalities of the various bands, and compare the overall nationalities of the different lineups. This involved a lot of searches on Myspace and Wikipedia (still both very useful resources for the facts about bands) and using the visualisation software Tableau.
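I did the comparison in Tableau, but the underlying arithmetic is just counting. As a hedged sketch, this is roughly how the share of each nationality per lineup could be worked out in Python; the festivals, bands and nationalities listed are placeholders, not the project’s real data.

```python
from collections import Counter

# Hypothetical rows of (festival, band, nationality); not the project's real data.
bands = [
    ("Festival X", "Example Band A", "UK"),
    ("Festival X", "Example Band B", "USA"),
    ("Festival X", "Example Band C", "UK"),
    ("Festival Y", "Example Band D", "USA"),
]


def nationality_share(rows):
    """Return {festival: {nationality: percentage of that festival's bands}}."""
    counts = {}
    for festival, _band, nationality in rows:
        counts.setdefault(festival, Counter())[nationality] += 1
    shares = {}
    for festival, counter in counts.items():
        total = sum(counter.values())
        shares[festival] = {nat: 100 * n / total for nat, n in counter.items()}
    return shares


for festival, share in nationality_share(bands).items():
    print(festival, {nat: round(pct, 1) for nat, pct in share.items()})
```

Percentages rather than raw counts keep the lineups comparable when one festival books far more bands than another.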

In retrospect I should have doubled this database up with the ticket prices one, and asked people to fill in the nationalities of the bands as well. Hindsight is a wonderful thing.


Datamud

For the past few weeks I have been working on another project for my MA Online Journalism course.

This is an investigation into some of the facts and figures of the UK music festivals.

The site will be updated over the next few days, so stay tuned and, of course, any feedback is much appreciated.

http://www.datamudwordpress.com

jonny dorey x

Another flash project …

As I gather my portfolio together for my MA Online Journalism Multimedia module, I discovered my first ever Flash project.

Sad story, a British student missing in America, where he was studying.

I decided to take the facts of the story and turn it into a roll-over breakdown.

It’s basic, but it works. It still needs an embedded link (to the Facebook group) and some embedded video, but it works as a basic test of the theory.


UK Festival headliners map


As a keen festival goer, I thought it would be interesting to MAP where some of the larger bands can be seen playing this summer.

I am hoping to work on a larger version of this map for a later assignment, but this is a taster of what is to come!

I took the 6 big UK festivals (Glastonbury, T in the Park, Reading/Leeds, V Festival, Download and Sonisphere) and noted down all the bands who are headlining, whether on the main stage or the second stage.

I wanted a clickable flash map where you could see where your favourite band was playing, and if they were playing multiple events over the summer.

METHOD

  • Find headliner information on the official festival websites
  • Paste a map of the UK into a new Flash CS4 document
  • Create a second layer and write a list of band names down the left hand side (leaving room for further additions as Reading/Leeds have not announced any bands yet)
  • Turn each band name into a BUTTON, with the text turning red and showing red points, Festival name, and appearance date on the map for the OVER, DOWN and HIT options
  • Export the map as a *.swf file, upload into WordPress and embed

PROBLEMS

I wanted to create a second tier to the map, where the user could click on the Festival point on the map and be shown all the bands playing (marked with red boxes around their names).

Unfortunately the map was too crowded with “hotspots” and became messy.

I may still simply add a list of festival names NEXT to the map, so the user can click on those and see all the bands playing.

There is a slight glitch with the map in that it has turned the FESTIVAL NAMES TEXT BOXES into buttons – so if the mouse rolls over those, it highlights one of the bands playing. (If it highlighted ALL of them, that would have solved my above problem, but it does not)

I also had a few problems working out how to embed the file into WordPress. The solution was simple:

  • upload the file into Media
  • install the Kimili Flash Embed Tag plugin
  • Type in the name, tweak the size and it’s done!

Looks like I’m not into metal any more Toto

Data can be an interesting and eye opening thing.

I decided to cut and paste some sections of my iTunes library into Google Docs and create a data set from Artist, Track, Genre and Plays.

  1. sort tracks by PLAY COUNT
  2. remove TIME, BITRATE, DATE ADDED and TRACK NO columns
  3. scroll down to the bottom of the tracks with “2” plays
  4. select every song with 2+ plays
  5. CTRL+C
  6. open a blank spreadsheet (I use Google Docs) and CTRL+V into the top left corner of the page
  7. the iTunes data appears in the spreadsheet
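The same tidy-up can be sketched in Python, assuming the copied rows arrive as tab-separated text with a header row. The column names and sample rows below are placeholders, and a real paste from iTunes may use different headers or have none at all.

```python
import csv
import io

# Hypothetical export: column names and rows are placeholders, and a real
# copy-and-paste from iTunes may use different headers.
SAMPLE_TSV = (
    "Artist\tTrack\tTime\tGenre\tPlays\n"
    "Example Artist A\tSong One\t3:45\tMetal\t1\n"
    "Example Artist B\tSong Two\t4:10\tPop\t7\n"
    "Example Artist C\tSong Three\t2:58\tAlternative\t2\n"
)

KEEP = ["Artist", "Track", "Genre", "Plays"]


def tidy_library(tsv_text, min_plays=2):
    """Keep the wanted columns, drop rarely played tracks, sort by play count."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        plays = int(row["Plays"] or 0)  # treat an empty play count as zero
        if plays >= min_plays:
            rows.append({**{col: row[col] for col in KEEP}, "Plays": plays})
    return sorted(rows, key=lambda r: r["Plays"], reverse=True)


def plays_by_genre(rows):
    """Total the play counts per genre, ready for charting."""
    totals = {}
    for row in rows:
        totals[row["Genre"]] = totals.get(row["Genre"], 0) + row["Plays"]
    return totals


library = tidy_library(SAMPLE_TSV)
print(plays_by_genre(library))
```

Totalling plays per genre gives the numbers a genre chart needs, and swapping `"Genre"` for `"Artist"` gives the per-artist view.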

Obviously this data is immediately out of date, so I am now looking into turning this into a live feed. As a PC user, iTunes stats is not an option.

Points to Note

  • I often listen to Spotify instead of iTunes at home
  • I only listen to iTunes when I am working – this does not take into account iPod plays, or CD listening in the car
  • genre categorizations on iTunes can be questionable

So the first chart:

I’m not sure what I find more interesting – that metal is SUCH a tiny category (smaller than Country, worryingly) or that I seem to really like pop. I will investigate this further. OK – a quick tweak to the options (colour to GENRE and label to ARTIST) showed that, phew, I’ve not turned into a pop-loving indie kid just yet. It’s just that someone thinks Celldweller (industrial drum n bass noise) is alternative (see for yourself). (See, mislabelling, very deceiving.)

NEXT STOP:

  • Find a way to make my iTunes data public, and feed this into a live chart.
  • Create a flash animation using one of these charts, with shooty out bits that play music from that artist or genre …
  • Stop messing around with data for today and make some tea.