Friday, March 19, 2010

Responding to jacquesm's challenge

The other day on Hacker News a user posted an anonymous comment. Regular Hacker News participant jacquesm wanted to know who wrote it and posted a challenge to unmask the user.

He also emailed me because he thought I might be the anonymous user. I agreed to help him with a little bit of text mining. Jacques had a nice database of all Hacker News submissions and comments and gave me a 250MB SQL file suitable for loading into MySQL. Unfortunately, Jacques' database only had comments up until September 2009, so if the user who wrote the anonymous comment joined recently then it won't be possible to identify them.

I quickly whipped up a naive Bayesian text classifier in Perl (similar to this one that I wrote about in Dr Dobbs five years ago) with one category per Hacker News user. I took the text of the anonymous comment and fed it to the classifier. The classifier used the complete set of comments for each user to train each category and then scored the anonymous comment against those categories.
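For readers who want to see the shape of the approach, here is a minimal sketch in Python (not the Perl code I actually ran). It assumes a uniform prior over users, a crude regex tokenizer and add-one smoothing; the function names `tokenize`, `train` and `rank_users` are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(comments_by_user):
    """Build one word-count table (one category) per user."""
    counts = {user: Counter() for user in comments_by_user}
    for user, comments in comments_by_user.items():
        for comment in comments:
            counts[user].update(tokenize(comment))
    return counts


def rank_users(counts, text):
    """Return users sorted from most to least likely author of text."""
    vocabulary = set()
    for table in counts.values():
        vocabulary.update(table)
    tokens = tokenize(text)
    scores = {}
    for user, table in counts.items():
        total = sum(table.values())
        # Sum log probabilities; add-one smoothing (an assumption of this
        # sketch) stops an unseen word from zeroing out a user's score.
        scores[user] = sum(
            math.log((table[token] + 1) / (total + len(vocabulary)))
            for token in tokens
        )
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_users(train(comments_by_user), anonymous_comment)` with a dictionary mapping each user name to a list of their comments gives the users in order of likelihood.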

Culling the list to users who have commented recently, the 10 most likely users according to the text classifier are (in order of likelihood): sh1mmer, stcredzero, gojomo, patio11, andreyf, anigbrowl, teej, physcab, thorax, run4yourlives.

Now none of those users actually commented on the thread in question. Assuming it was someone who was commenting and then switched to a different account the most likely candidates are (again in order): petercooper and jacquesm.

So did this approach work?

PS As a quick test of my classifier I ran it against one of jacquesm's own comments and got the following people in order of likelihood: jacquesm, geoscripting, kunqiana, jerf, mixmax.


Thursday, March 18, 2010

A fascinating little beastie

Back in 2004 I was living in New York and commuting between New York and Washington, DC on the Acela. I was working in a fairly rural part of Virginia and was lucky enough to accidentally experience a once in 17 years event: the emergence of Magicicada Brood X.


(Picture from Wikipedia)

Now I realize that most people probably don't think that being in a place where millions of large winged insects appear from the ground and make a deafening noise in search of a mate is fun. But Magicicada is so cool that its emergence is not to be missed. And Magicicada loves prime numbers.

For that reason, Magicicada Brood X (which lives along the Eastern coast of the US) is place number 127 in The Geek Atlas. Your next chance to meet the little beastie is 2021.

This particular brood spends 17 years underground living off the sap inside the roots of trees. Once ready to become an adult it burrows upwards and climbs its tree. Then when high up in the tree it molts and spreads its wings.

Once ready to fly it makes a humming sound which, given that millions emerge all at the same time, fills the air with an incredible din. The female cicadas make a clicking sound and between the humming and clicking the males and females find each other and mate.

They live only a few weeks above ground and new baby cicadas fall to the ground and burrow to the roots to start another 17 year wait.

To get a feeling for what it's like to be around when Magicicada makes its appearance, here's Sir David Attenborough:



There are other cicadas that live on 7 and 13 year cycles. All these cycles are based around a prime number of years. The hypothesis is that the cicada uses a combination of emergence en masse and a prime numbered cycle to avoid predators. A prime numbered cycle means that the cicada rarely meets a predator and mass emergence would overwhelm any predator around.

A prime numbered cycle avoids predators because it has no factors other than 1 and itself. Suppose that the cicada had an 18 year cycle: any predator whose population peaked every 2, 3, 6 or 9 years could synchronize with the cicada and always meet it and eat it. By having a prime numbered cycle the cicada rarely meets a predator and predators have a hard time synchronizing with it.
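The arithmetic is easy to check. The following is a small sketch in Python, assuming (purely for illustration) that the cicada and the predator both peak in year 0 and then repeat exactly on their cycles; the function names are invented here.

```python
from math import gcd


def years_between_meetings(cicada_cycle, predator_cycle):
    """Years between coinciding peaks: the least common multiple."""
    return cicada_cycle * predator_cycle // gcd(cicada_cycle, predator_cycle)


def meets_every_emergence(cicada_cycle, predator_cycle):
    """True if the predator peaks at every single cicada emergence."""
    return cicada_cycle % predator_cycle == 0


for cicada_cycle in (17, 18):
    # Predator cycles shorter than the cicada's that line up every time.
    synced = [
        p for p in range(2, cicada_cycle)
        if meets_every_emergence(cicada_cycle, p)
    ]
    print(cicada_cycle, "year cycle, predators always in sync:", synced)
```

For the 18 year cycle the list is 2, 3, 6 and 9; for the 17 year cycle it is empty. A predator on a 6 year cycle, for example, would coincide with a 17 year cicada only once every 102 years.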

For a more detailed look at the prime numbered cycle, you can read the paper Prime Number Selection of Cycles in a Predator-Prey Model.

PS If you've enjoyed this, consider buying my book.


Tuesday, March 16, 2010

London Transport Museum: Acton Depot Weekend

This past weekend the London Transport Museum held an open weekend at its Acton Depot, where they keep a collection of trams, trolley cars, buses and underground trains, plus all the associated equipment. They only open the depot twice a year so this was a chance to see some things that are rarely open to the public.

I didn't include this museum in The Geek Atlas but after a visit it's likely a candidate for a volume 2 since it is packed with interesting stuff.

Like a really big collection of old underground signs:


Or shielding used while constructing the tunnels for the London Underground:


And speaking of the Underground, here's a power control panel with meters indicating hundreds of amps and some serious on/off switches:


And a lovely mercury arc rectifier used to turn AC into DC (the Underground uses 630V DC power).


And here are the wheels of a 1930s trolley car:


And here's the control panel from an Otis elevator:


But the highlight was a ride on the prototype Routemaster RM-1 bus. I forgot to photograph it, so here's a picture from Wikipedia:


Thursday, March 11, 2010

My bio

Occasionally I get asked for some sort of official bio. Here's one people can use:

John Graham-Cumming is a computer programmer and author. He studied mathematics and computation at Oxford and stayed for a doctorate in computer security. As a programmer he has worked in Silicon Valley, New York, the UK and France. His open source POPFile program won a Jolt Productivity Award in 2004.

He is the author of a travel book for scientists called The Geek Atlas and has written articles for The Times, The Guardian, The Sunday Times, The San Francisco Chronicle and New Scientist.

He is CTO of Causata. He can be found on the web at jgc.org and on Twitter as @jgrahamc.


Wednesday, March 10, 2010

An Olympic honour for Alan Turing

Over at The Guardian I write:

Last year I led a campaign to obtain an apology for the mistreatment of the British mathematician Alan Turing. Turing's prosecution for homosexuality led to the death of a true genius at the age of only 41 in 1954. On 10 September last year, Gordon Brown issued an apology that recognised Turing's stature as one of the greatest Britons. But Britain has a final opportunity to unapologetically recognise Alan Turing in two years' time, at the 2012 Olympics.

Read the rest here.


Tuesday, March 09, 2010

Did Monbiot try to understand climate science?

In The Guardian's Comment is Free section there's an article by George Monbiot called The trouble with trusting complex science which argues that:

The detail of modern science is incomprehensible to almost everyone, which means that we have to take what scientists say on trust.

He does this in the context of climate change science. I wonder if he actually tried to read the key paper that describes why we know that the global temperature is increasing. The paper is Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. Go on, read it. I dare you.

The critical thing you need to know to understand that paper is... how to calculate an average. That's a GCSE level maths subject; here's a quick page to revise that in case you've forgotten how to average.

Because, you see, the entire process described in that paper involves the following steps:

1. Get temperature data (i.e. thermometer readings) at different places around the world for many, many years
2. Work out the average temperature at each location by averaging the values between 1961 and 1990 on a monthly basis. So you end up knowing things like the average January temperature at Heathrow.
3. Now go back and work out how much the temperature for any given month and year deviates from the average: all that means is subtract the average temperature from the observed temperature for the same month. Now you know how 'different' the temperature is. This is called the anomaly. If it's getting hotter the anomalies will get bigger.
4. Divide the globe up into squares 5 degrees on each side. Find all the thermometers inside each square, find their anomalies for each month and year. Average them to get an average anomaly for that square.
5. Take all the squares in the northern hemisphere, average their anomalies for each month and year. Draw a graph showing the temperature changing. Repeat for the southern hemisphere.
6. Now take the northern and southern hemisphere temperatures for each month and year and average them to get a global temperature anomaly chart.
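
The six steps above can be sketched in a few lines of Python. This follows only the simplified description given here, with plain unweighted averages, and is not the paper's actual processing. The data layout is an assumption for illustration: readings are (station, year, month, temperature) tuples, and `locations` maps each station to its (latitude, longitude).

```python
from collections import defaultdict
from statistics import mean


def baseline(readings, first=1961, last=1990):
    """Step 2: average temperature per (station, month) over the base period."""
    buckets = defaultdict(list)
    for station, year, month, temp in readings:
        if first <= year <= last:
            buckets[(station, month)].append(temp)
    return {key: mean(values) for key, values in buckets.items()}


def anomalies(readings, base):
    """Step 3: observed temperature minus the station's average for that month."""
    return [
        (station, year, month, temp - base[(station, month)])
        for station, year, month, temp in readings
        if (station, month) in base
    ]


def grid_cell(lat, lon, size=5):
    """Step 4: index of the size-degree square containing a location."""
    return (int(lat // size), int(lon // size))


def global_anomaly(anoms, locations, size=5):
    """Steps 4 to 6: average per square, then per hemisphere, then globally."""
    cells = defaultdict(list)
    for station, year, month, anomaly in anoms:
        lat, lon = locations[station]
        cells[(year, month, grid_cell(lat, lon, size))].append(anomaly)

    hemispheres = defaultdict(list)
    for (year, month, cell), values in cells.items():
        # Squares whose latitude index is negative lie south of the equator.
        hemisphere = "north" if cell[0] >= 0 else "south"
        hemispheres[(year, month, hemisphere)].append(mean(values))

    per_month = defaultdict(list)
    for (year, month, _), values in hemispheres.items():
        per_month[(year, month)].append(mean(values))
    # If only one hemisphere has data for a month, its value is used alone.
    return {key: mean(values) for key, values in per_month.items()}
```

Every function is a loop and an average, which is rather the point.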

Child's play? Yes.

I'll admit that the rest of the paper has some harder concepts (standard deviation, anyone?). But I'll wager that the real reason people don't understand science is not that it's too hard to understand, but that they aren't motivated.

Yes, there are parts of science that require a lot of knowledge, but covering your eyes and not trying to understand is likely where many people go wrong.

Or to put it Monbiot's way:

My heart rebels against this project: I would rather be pelting scientists with eggs than trying to understand their datasets.


Wednesday, March 03, 2010

A welcome bunch of amateurs

Here's me writing in The Guardian's Comment is Free section:

We're all the children of amateurs: amateur parents. There's no government department that will certify you as a parent (thankfully), nor a university department where you get your PhD in being a daddy, nor a professional body ready to strike you off for not following mothering standards. But any parent who's held a newborn child in their arms has unconsciously taken the amateur's oath: "I may not be a professional, but I'm going to do whatever it takes to act like one."

You can read the rest here.
