Friday, 27 March 2009

United we tag

A depth first and pruned 3D rendering of the tags similar to 'obama'.   The video should be embedded below, if it is not then it can be found here.

Thursday, 26 March 2009

Mockneys

This post was originally published in 2006 and was lost when my disk crashed.

The Metropolitan Police, Linguistics Division are today on the hunt for a number of dangerous criminals. Raids are expected all over London as the Police crack down on the fake accents of celebrities. Suspects include:
  • Vernon Kay for his attempts to convince people he really is from Bolton.
  • Jamie Oliver for his mockney twitter.
  • Russell Brand for his Elizabethan tone which leaves the young women he bogarts* in a dazed and confused state.
  • and finally Sir Alan Sugar who is alleged to have used voice-enhancing drugs during filming of The Apprentice to ensure his accent is as cockney as the day he first started out selling shoe laces on the streets of Hackney.
*bogart (verb) To keep something all for oneself, thus depriving anyone else of having any. The act of pursuing a woman through charm, sharp suits and expensive dinners.

Wednesday, 25 March 2009

The Formula Formula

London, UK. Scientists have finally uncovered one of the most elusive secrets of modern tabloid journalism: the so-called 'formula formula'.

The editor of 'Super News Celebrity Fun' explains: 'Picture the scene: it's 11PM on a Monday. The final copy is due in 20 minutes and there's still half a page unfilled because the pullout 'grow your own clothes' ran under-length. Our only recourse is to ring some media-friendly pseudo-scientists for a 'madcap boffin formula' that renders some banal aspect of everyday life in nauseatingly pedantic mathematics. Recent 'successes' include 'dunkability analysis' and the 'Brittany Spears Ratio'.'

'Dunkability' is an approximated count of the number of biscuits that can be 'dunked' into the unit cup of coffee before the unit research group runs out of funding. The formula is complicated but elegant:

Dunkability = mu / gamma + 2 * phi.

Where mu is the number of biscuits in a packet, gamma is the mass of the earth divided by its angular momentum and phi is very complicated, you probably wouldn't understand.

'The problem', sighed the editor, 'is that with this whole credit crunch thing going on it's a lot harder to get enough junk science to fill our pages'. It is for this reason that interest in the 'formula formula' has peaked.

The 'formula formula' is a formula that generates all manner of nonsense formulae (a meta-formula) without the need for a misguided research group. The formula formula is imprecise and is more a recipe for creating further nonsense than an actual formula. The most important aspect of the guide dictates that the subject of the research must be able to be prefixed with 'formula for the perfect ...', e.g. 'toast, woman, haircut, etc.'. The second key ingredient is plenty of Greek letters: these summon up images of bearded men (and women) of science in white coats, lending the formula an air of credibility. If this isn't sufficient, simply replace every instance of 'scientist' and of 'researcher' with 'boffin'.




The final ingredient is to ensure that you use language that completely alienates readers from pursuing scientific enquiry for themselves. If you are to televise your formula, ensure that the narrator is standing in front of a blackboard with as many wave equations as you can muster. For larger 'feature' length articles include a picture like the one to the right.

Next week: 'knit yourself thin'.

Tuesday, 24 March 2009

Pirates

Inspired by the exploits of Somali pirates, local pirates off the coast of Southampton have gone one further and stolen one square mile of the English Channel.

The coastguard has advised sailors to avoid the affected area which is located twenty miles due South North South of the Isle of Dogs (becoming cyclonic). It is thought that in an audacious heist pirates pumped the precious salt-water 200 miles inland into an evil network of disused swimming pools, leaving a gap as wide as it is fat.

This is not the first time the high seas have been stolen: in 1998 two youths stole some of the Atlantic and part of the Thames. If anyone has seen the Seine or any of the Channel Islands they are advised to contact their MP.

Monday, 23 March 2009

In abstract

After hearing the story of the Black Swan it is impossible not to feel (at least slightly) smug.   As programmers, however, we often have to deal with objects from the 'real world', (which to remind people that we are sort-of scientists,  we call 'the problem domain', or if we're sitting opposite a logician: the 'domain of discourse').

Nomenclature aside, it is often our task to build a model of the real world, a world about which we do not have complete information.   A model (my CS inspired definition) is really just the result of taking salient features of the real world and representing them in a way that is suitable for a computer to do something useful with.  If you hang around with enough programmers for long enough the word 'abstraction' will rear its head, which (at least on my blog) is a synonym for the act of modelling.

Modelling (or abstraction) is favoured by programmers because it allows great swathes of the real world to be captured by very little code. Imagine you are building a till/checkout system for a supermarket which, to reduce the size of this post, stocks only 3 items:
  1. bread
  2. milk 
  3. the guardian
Let us implement this as naively as possible:
  • scan bread, add £1.10 to the bill, display total on screen
  • scan milk, add £0.90 to the bill, display total on screen
  • scan the guardian, add £0.70 to the bill, display total on screen
The problem with this code is that we're treating every item with too much reverence. For every extra item we add to the shop (a fine cheese for example) we'd have to add some more code. We don't really care about the specifics of each item; all we care about is their price. By treating the goods of the supermarket as 'priced products' we can simplify the logic (although it is unrealistic, we will assume the price is encoded in the barcode):
  • scan priced product X, convert barcode to price, display total.
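As a rough sketch of the 'priced product' idea in Python: the barcode layout here (last four digits are the price in pence) and the function names are invented for illustration, since the post only says the price is somehow encoded in the barcode.

```python
def price_from_barcode(barcode: str) -> int:
    """Return the price in pence.

    Assumption: the last four digits of the barcode are the price in pence.
    """
    return int(barcode[-4:])


def scan_all(barcodes: list[str]) -> int:
    """Scan each priced product in turn and return the final total in pence."""
    total = 0
    for barcode in barcodes:
        total += price_from_barcode(barcode)
        print(f"Total: £{total / 100:.2f}")  # display running total on screen
    return total


# Hypothetical barcodes for bread (£1.10), milk (£0.90) and the guardian (£0.70).
basket = ["50000010110", "50000020090", "50000030070"]
scan_all(basket)
```

Adding a fine cheese now means printing a new barcode, not writing new code.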
If you've ever received junk mail you can be sure it was generated by treating all 'customers' as 'named addressable units', not as the unique individuals all my readers inevitably are. Back to our example: the shopkeeper has been adding extra products to his store without needing to update any software, until one day he decides to add a 'pick and mix' counter. His 'pick and mix' bags have a barcode, but the price is determined by the bag's weight, not just the barcode.

To fix this we have to add an exception or specialisation:
  • if product is 'pick and mix' weigh product, calculate price times weight, display total
  • else scan priced product X, convert barcode to price, display total 
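The two rules above could be sketched in Python roughly as follows. Everything concrete is an assumption made up for the example: the `PICKMIX` barcode prefix, the price per 100g, and the convention that ordinary barcodes end in a four-digit price in pence.

```python
from typing import Optional

PICK_AND_MIX_PREFIX = "PICKMIX"  # hypothetical marker for pick and mix bags
PENCE_PER_100G = 99              # hypothetical price; the post does not give one


def price_from_barcode(barcode: str) -> int:
    """Default rule: the last four digits are the price in pence (assumed)."""
    return int(barcode[-4:])


def price_of(barcode: str, weight_grams: Optional[int] = None) -> int:
    """Return the price in pence, applying the pick and mix exception first."""
    if barcode.startswith(PICK_AND_MIX_PREFIX):
        if weight_grams is None:
            raise ValueError("pick and mix must be weighed")
        # Exception: price times weight, rounded down to whole pence.
        return PENCE_PER_100G * weight_grams // 100
    # Default: the barcode alone determines the price.
    return price_from_barcode(barcode)


print(price_of("50000010110"))       # ordinary priced product
print(price_of("PICKMIX0000", 250))  # weighed product
```

The default branch is untouched by the new product; only the exception needed new code.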
I'd hoped this post would lead nicely into an overview of default reasoning, where we accept a default argument (the price of an item is determined by its barcode) unless we have further information (the item is 'pick and mix' and should be weighed) but I have had more fun writing about a fictitious supermarket. Sorry!

Transport

The government announced today that the District Line is to join the long list of National Parks. The line was selected primarily for its proximity to central London, but its serenity and tranquillity were also key to its selection. Inspectors from OHMNPATF (On Her Majesty's National Park Acquisition Task Force) rejected both the Central and Piccadilly lines, noting that moving trains presented a significant risk to picnicking families. It is expected that Earl's Court station will be repurposed, creating a two-storey cafe and visitors' centre. Holborn station will return to its pre-war role as an owl sanctuary. Entry to the park will cost 1.50 per badger and the park opens from 6AM to 12PM closing briefly at 8AM to allow the 08:01 to Uxbridge to settle just outside Earl's Court. Commuters are advised not to add any extra time to complete their journey.

Money news

London, UK. The pound has continued to tumble and is now trading at 6 litres a barrel. Traders are quick to blame the fall on 'quantitative quantity'. They argue that the meagre size of the average pound coin is reducing confidence in sterling.

Financial analysts predict that simply increasing the size of a pound coin to the size of a small grapefruit will restore confidence to post-war levels.

The new coins will capture our glorious monarch (in full high definition: 2 billion pixels per square furlong) updating her Twitter status. The new coins will go on sale from April and are expected to cost around £1.50.

Tuesday, 17 March 2009

Reading this may or may not make you smarter

"The bus always leaves when I reach the end of the road"

Imagine you are the proud owner of an (admittedly irregular, but fair) four-sided dice. Instead of the usual dot patterns, the faces are marked with 'N', 'S', 'E' and 'W'. Imagine also that you are standing at the base of Trafalgar Square intending to go for an unusual stroll. Before taking a step you roll the dice. You observe the result and take one pace in the direction indicated by the dice (North, South, East or West). You repeat this until your legs wither. Where do you end up and what does your route look like?

It might look something like the image to the right. Such an exercise is a 'random walk', the mathematical implications of which we will not discuss further. I've told you how this graph was constructed, but given only the graph would you have been able to determine (or even guess) what created it? If I had told you that this graph showed the most polluted zones in a city you'd probably want to live in the extreme North-West of the diagram.
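For anyone who would rather not wear out their legs, the stroll can be sketched in a few lines of Python. The function name, the number of paces and the seed are arbitrary choices for the example; the origin (0, 0) stands in for the base of Trafalgar Square.

```python
import random

# One pace in each compass direction, as (east, north) offsets.
STEPS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def random_walk(paces: int, seed=None) -> list[tuple[int, int]]:
    """Roll the four-sided dice `paces` times and return every position visited."""
    rng = random.Random(seed)
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(paces):
        dx, dy = STEPS[rng.choice("NSEW")]  # fair roll of the dice
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


path = random_walk(1000, seed=2009)
print("finished at", path[-1])
```

Plotting the positions in `path` gives a picture of the same kind as the one described above; a different seed gives a different, equally meaningless, map.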

The problem for humans as decision makers is that we are only presented with the physical results of a process, not the generator. Nassim Nicholas Taleb states this problem better than I ever could, in the context of trading. He uses a coin toss but we can equally apply our random walk. Let us start with 4000 traders, each with their own dice. At each step we eliminate (remove from the game) those who fall behind the most northerly traders and reward the remaining traders with 10,000 pounds and a bigger office. At each step we lose (on average and approximately) 3/4 of traders (and their massive offices) since rolling anything other than an 'N' sees you ejected from the game (well, ignoring the pathological case where no trader rolls an 'N'). Imagine you are recruiting for an investment fund: who do you pick to manage it? I'd pick one of those superstar traders striding forward.
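A small Python sketch of the elimination game, under one simplifying assumption: a trader survives a round only by rolling 'N', so in the pathological round where nobody rolls 'N' everyone is ejected. The function name and the number of rounds are made up for the example.

```python
import random


def surviving_traders(traders: int, rounds: int, seed=None) -> list[int]:
    """Return the head-count after each round, starting with the initial count."""
    rng = random.Random(seed)
    history = [traders]
    for _round in range(rounds):
        # Each remaining trader rolls their own dice; only 'N' keeps them in.
        traders = sum(1 for _trader in range(traders) if rng.choice("NSEW") == "N")
        history.append(traders)
    return history


# On average each round keeps 1/4 of the traders:
# 4000 -> 1000 -> 250 -> about 62 -> about 16 -> about 4 -> about 1
print(surviving_traders(4000, 6))
```

Whoever is left after six rounds has a flawless record and no skill whatsoever, which is exactly the difficulty with judging the generator from its results.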

Unfortunately we have to make decisions based on visible perturbations of the world: when standing in the middle of the road, too much philosophising about the nature of traffic flow results in one less philosopher. A tool that has served us well, at least during our rise to the top of the food chain, is that of correlation. A caveman who notes that all those cavemen who have hats made of leaves are (generally) not eaten by lions may fashion himself an equally leafy hat. If his inference is correct and having a hat of leaves causes the lions not to eat you, then he has extended his life by possibly many years. He may be wrong. Maybe it is not the hat that scares away lions: consider that those with leafy hats must be able to climb the tall trees to get the leaves for the hats. It is this climbing ability that allows them to escape lions. So it was not the hat causing the lack of death: it was the ability to climb trees that caused both the immunity to lions and the silly hats. If we assume causality then in the worst case we are wearing a pointless hat; if we miss the pattern then we are dead.
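The leafy hat story can be played out as a toy simulation in Python. All the numbers are invented for illustration: half the cavemen can climb, climbers are never eaten, and non-climbers are eaten 80% of the time. The hat itself does nothing.

```python
import random


def simulate(cavemen: int, everyone_wears_hat: bool, seed: int):
    """Return a list of (wears_hat, survived) pairs, one per caveman."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(cavemen):
        climber = rng.random() < 0.5                   # the hidden common cause
        hat = climber or everyone_wears_hat            # climbers fetch their own leaves
        eaten = (not climber) and rng.random() < 0.8   # depends on climbing only
        outcomes.append((hat, not eaten))
    return outcomes


def survival_rate(outcomes, hat_worn=None) -> float:
    """Survival rate of hat wearers, the hatless, or everyone (hat_worn=None)."""
    group = [survived for hat, survived in outcomes
             if hat_worn is None or hat == hat_worn]
    return sum(group) / len(group)


natural = simulate(10_000, everyone_wears_hat=False, seed=1)
print("with hat:   ", survival_rate(natural, hat_worn=True))
print("without hat:", survival_rate(natural, hat_worn=False))

# Now hand every caveman a hat and replay the same lives.
forced = simulate(10_000, everyone_wears_hat=True, seed=1)
print("same overall survival:", survival_rate(forced) == survival_rate(natural))
```

In the natural population the hat looks like a miracle cure; once everyone is issued one, overall survival does not move at all, because climbing was doing the work all along.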

The authors of Freakonomics (another good read) give a contemporary example: children's names. In summary: whilst most rich children are called Cuthbert, naming your child Cuthbert won't make your child rich. It's clear from the caveman's hat dilemma that assuming causality served us well in earlier days. However, in modern, interconnected and fast-paced situations blindly inferring causality (that one event causes another) can be as detrimental as missing the relationship altogether.

Cargo cults are groups of otherwise intelligent people who copy the external attributes of a process. The term originates from tribes who (at least according to Wikipedia), upon seeing that nearby armies received food and supplies from airplanes landing at airports, assumed that it was the presence of the airport that caused the food to arrive. In an attempt to receive their own food they built all the external facets of an airbase: runways, headphones, a control tower. Of course no food came.

We are coming close to the moral of today's story.  Remember, when you see two events, A and B say, there are four causal possibilities (ignoring mutual causation):
  1. A causes B (when A happens B happens).
  2. B causes A (when B happens A happens).
  3. A and B are not causally related (it was just chance that they happened to occur together, e.g. having a rock in your front garden and not being eaten by a tiger).
  4. Another event, C, causes A and B (see the leafy hat example).
So if you hear that the successful company down the road uses some technology or technique, e.g. an advanced spanner, and you blame all your failures on a lack of such a spanner, remember that followed to its logical conclusion you may end up wearing a leafy hat or building an airport out of palm trees.

Making an inference

  1. Something must be done.
  2. This is something.
  3. Therefore we must do it.
I'm fascinated by our ability as humans to reason, argue and debate within the confines of our expressive, yet imprecise, natural language.  We quickly reject nonsense arguments; very few managers would accept the following argument from an employee:
  1. Today is Tuesday.
  2. In two days' time it will be Thursday.
  3. I deserve a 2K pay rise.
The problem is that of inference: drawing conclusions from existing knowledge. Combined with our childhood-learnt knowledge of the calendar, step 1 'obviously' implies step 2. That is, if it is Tuesday then in two days' time it will be Thursday. Equivalently there exists no Tuesday where in two days' time it will not be Thursday.

Step 3 is pure speculation (if you are wasting your time reading this I wouldn't hold out for one!); most importantly, it doesn't seem to follow from the previous two statements. It's not always clear that an argument is 'wrong': in the argument at the top of this post (the so-called politician's fallacy) each step seems to follow logically from the previous one, but the ultimate conclusion is garbage.

The point of the waffle that has preceded is to show that discriminating between reasonable and fallacious logical arguments is hard. To help, I'm going to reiterate the fallacy of the undistributed middle. It is a fallacy that I see committed on almost a daily basis: by myself, in direct contact with people and in print. It goes something like this:
  1. Assign a property to all people or objects of a certain group.
  2. Find another object or person that also shares the property.
  3. Conclude that the person or object is a member of the group.
Do you agree with the following (quite convincing) argument?:
  1. Terrible programmers write Java
  2. John writes Java.
  3. John is a terrible programmer.
How about this (less convincing) one:
  1. All dogs smell
  2. John smells
  3. John is a dog
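One way to see why the argument fails is to think in sets, sketched here in Python. The dogs Rex and Fido are invented for the example; John comes from the argument above.

```python
dogs = {"Rex", "Fido"}            # hypothetical members of the group
smelly = {"Rex", "Fido", "John"}  # everything that has the property

# Premise 1: all dogs smell (the dogs are a subset of the smelly things).
print(dogs <= smelly)    # True

# Premise 2: John smells.
print("John" in smelly)  # True

# Conclusion: John is a dog. Both premises hold, yet this does not follow.
print("John" in dogs)    # False
```

The premises only say that the dogs sit inside the smelly set; they say nothing about what else is in there. The Java argument has exactly the same shape.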
So, to reiterate, this post is part of my attempt to post one item per day, it has no conclusion.  Well, ok, maybe that you should think about your inferences carefully and don't be led by attractive but specious arguments.

Monday, 16 March 2009

Unfazed by randomness

About a month ago, after having my hair cut, I met Fran in the Waterstones in Fulham with the intention of buying the memoirs of Sherlock Holmes.   To cut what was originally a long story short, I ended up buying 'The Black Swan'  by Nassim Nicholas Taleb.  

The book's central tenet is that in many situations we are not very good at predicting future events, and that these situations do not lend themselves to accurate prediction anyway. We ignore the lessons of history and are left at the mercy of 'Black Swans': rare but often devastating events. To use his phraseology: traders are (often) picking up pennies from in front of an oncoming steamroller. My summary was written rather tentatively since the author is quick to chastise those who claim, but fail, to fully comprehend and act upon his ideas.

The Black Swan analogy should be appealing to Computing Students. Imagine you're on a train passing through a country prone to large populations of swans. On the trip you pass by thousands of swans, all of whom are white. What statements can be made about the swan population of our fictitious country? Given a 'reasonable' distribution of swans and taking into account our large sample we might (wrongly) reason that there exist only white swans. At this juncture the author reiterates a distinction probably made in many undergraduate logic courses: having no evidence to support the existence of black swans is distinct from having evidence that rules out the existence of black swans. Our earlier train trip offered no evidence for the existence of black swans (we didn't see any), but did not find evidence to discredit the existence of black swans (like an anthropological argument such as 'swans have trouble producing pigments' might do). In real life it's very easy to miss the distinction between the two: the server has never crashed, so we don't need a backup.

Computing has its own softer variant of the Black Swan: the shaved yak. Imagine you have some marking to do, but you've lost your red pen. You find yourself skipping down to the stationery shop to buy a red pen. On the way you meet a friend who has a bug he'd like you to look at. You take a look at it, but can't solve it without a copy of an authoritative reference. The library has a copy; however, you have overdue books (that you lost years ago) so can't get the book out. So you ring your friend to ask him to get the book for you, but he's stuck in the queue for a coffee... (continues). This chaotic diversion is called 'yak shaving' because the original story ends with the author shaving a yak (a problem usually unrelated to programming).

This post doesn't really have a point, and is going to end abruptly. But it has a moral: don't expect your completion estimates for software to be even within 90% of the original, make sure when you're programming a swan class you allow for some perturbations in the problem domain (Black Swans), and read 'The Black Swan' (but don't also read Fooled by Randomness, because they are very similar).

Hello

Welcome to my blog. I will attempt to write on here once a day. I will inevitably fail; however, once a day is a suitable maxim. These meta-posts (posts about blogging) are the easiest to write since they require little research. I have set myself the following goals (and non-goals):
  • To use at least one word whose exact meaning I have had to look up.
  • To publish thoughts even if they are 'wrong', and acknowledge that the process of being corrected is a little humiliating, but certainly better than remaining an ignorant fool. 
  • To not mindlessly propagate memes.
  • To avoid jargon; there will be no lazy-initialised singleton factory beans here.
I will probably add to these, but they seem like a suitable starting point.  Wish me luck!