The journey starts in a plain, unfurnished apartment on the second floor of a high-rise housing complex on 214th and Broadway, Inwood, New York City: a down-at-heel, Spanish-speaking neighbourhood just a few hundred metres from the northern tip of Manhattan Island.

From 214th Street, Broadway runs north for six more streets to an old, rugged metal bridge that links Manhattan to the northern Bronx. Across the river, Route 9 continues north, running alongside a train track raised above the road on riveted steel stilts, through intermingled stretches of high-rise housing projects, scruffy commercial strips and light-industrial zones full of warehouses and prefabricated workshops. Everything is grimly functional – prettiness is a luxury this part of the city cannot afford. After three or four miles, a slip road provides access to the elevated highway system, which allows cars to speed unperturbed over the grubby urban sprawl of the Bronx on their way to the shiny lights of Manhattan. The slip road joins the Saw Mill River Parkway, which takes traffic north at a steady 55 miles per hour. Within a few minutes, the city limits have been crossed and Westchester County begins. A dense wall of trees encroaches on all sides.

The wall of trees thins out for a moment as the road passes Yonkers. A couple of warehouses and a lone, distant skyscraper appear fleetingly between the trees before the foliage closes in again. Mile after mile goes by without a single building in sight; trees stretch back from the road as far as the eye can see. It feels like passing through virgin forest, but the appearance is deceptive. Behind the trees lies one of America's wealthiest counties, a suburban spill-over of Manhattan's elite with the second most expensive real estate in the country: a far cry from wilderness.

Aerial view of T.J. Watson under construction (image: Wikipedia)

After 40 miles or so, a slip road leaves the highway and joins Kitchawan Road, a nondescript secondary road. A small paved driveway leaves that road on the right-hand side. A sign, set discreetly back from the road, bears the legend IBM, gold on black, perhaps a metre high. Beyond the sign, the drive enters a sylvan idyll. Herds of deer roam freely, grazing in grassy clearings between copses of old-growth deciduous trees. A security booth interrupts the peaceful setting. It is automated, operated from a remote security centre somewhere: a camera swivels, a disembodied voice asks one's business, and the barrier rises. The road sweeps onwards through lush parkland. After a mile or so, a most peculiar building appears. It is semi-circular and three storeys high, but partially sunken into the ground and hidden behind a grassy knoll that has been built up as if to conceal its existence. The driveway is laid out so that, by the time one sees the building, one is right on top of it. Its scale is enormous: the arc of its semi-circle is hundreds of metres long. It has a uniform, featureless, black-tinted glass façade, interrupted only by a metal canopy that extends over the main entrance. Everything about it suggests secrecy. It is reminiscent of what I always imagined the headquarters of SPECTRE, James Bond's villainous opponents, would look like: a clandestine, top-secret, hi-tech institute hidden in the woods.

The driveway forks, with one route leading up to the front door and the other veering off to the left, towards the back of the building. At the end of this road is a parking lot: the space enclosed by the semi-circular building is filled with endless rows of modern, sensible suburban vehicles. Despite the number of cars, it is a strangely desolate place. Occasionally somebody can be seen walking briskly through the lot, but mostly it is quiet, deserted and windswept, creating a sense of exposure and vulnerability. The back of the building has entrance gangways every hundred metres or so. The ground floor is subterranean here, so the entrances open onto the building's middle floor.

Diverging spokes: internal corridors (image: Wikipedia)

A swipe card opens the automated doors. Inside the building, long, curving corridors extend in all directions. The main arteries run along its edges, uninterrupted through the entire length of the building. Other corridors form regular radial connections between the front and back, like spokes on a bicycle wheel. Everything is gleaming, antiseptic and white. Very occasionally, some eccentric-looking man with a wild beard glides into view in the distance before disappearing into one of the corridors or doors that line the walls. Other than that, an eerie quiet holds semi-permanent sway. From the entrance, a short stroll along the boundary corridor and a turn to the left bring one to a door on the right, which opens onto a fairly large office, maybe 6 metres square.

This was my daily commute to work for six months or so from late April 1997. I was working as an intern at IBM's T.J. Watson Research Center in Yorktown Heights, New York, as a programmer for the artificial intelligence group of the mathematics department of IBM Research. This job marked my personal crossing of a category boundary in the world of work. Up to this point, I had worked in a variety of jobs: security guard, warehouse picker and shipper, supermarket shelf-stacker, waiter, shoe shop assistant, barista. Those jobs all had a lot in common despite the different roles. In particular, they all corresponded reasonably closely to the basic Marxist model of waged labour. On one side was management, who saw their workers as a commodity to be exploited, commanded, mistrusted and watched. On the other was an antagonistic workforce, who basically attempted to do as little work as they could for their wage.

Working for IBM Research was completely different. Whereas I had been used to punching a clock in and out, and having every minute of presence measured, here nobody paid any attention to what time anybody showed up. My boss disappeared for weeks at a time. I had my own office and, when I shut the door, nobody would disturb me. Nobody issued orders. My boss would ask me if I would be available to do things and never minded when I said no. My wage of $1700 a week was more than three times higher than it had been in any previous job. Recruiters would cold-call my office with offers of six-figure salaries on Wall St. IBM paid for my flight and relocation expenses and gave me an expense account to cover all costs for my first week. Word got around that I was using public transport to get to work, so an employee in an office a couple of corridors over dropped in on me and gave me a car. He just freaking gave it to me. He was getting a new one and it was too much trouble to sell the old one. It was in perfect condition.

Deep Blue: good at chess for a lump of plastic and silicon (image: IBM)

The T.J. Watson building was home to most of IBM's core research activity. It was packed full of some of the world's top mathematicians, physicists and computer scientists. In the canteen, one could rub shoulders with the inventor of the track-pad and the people responsible for many of the most fundamental advances in digital storage technology. Wild white beards were de rigueur. In the building's lobby sat Deep Blue, the chess-playing computer that had just defeated the human world champion, Garry Kasparov, and some of the project team who had built it shared my corridor.

My particular research team were working on natural language processing: trying to get computers to understand human language. Amusingly, to me at least, one of the team's big academic rivals in this field was Noam Chomsky, who was based at MIT, just a couple of hundred miles to the north-east. My colleagues frequently grumbled about him getting all the attention.

I was working on a project in conjunction with a Southern US bank. The bank's problem was that it could not handle the volume of email it was receiving. Every day they printed out all the emails they had received that day, put them on a truck and sent them to a warehouse, where workers would go through them and sort them into bags for each department in the bank. The next morning the truck would return and deliver the sorted bags of printed emails along with the daily mail. Yes, they actually did that.

Our team's goal was to automate this process: to teach a computer how to deliver these emails to the correct department based on their contents. This is a classic problem, known as text categorisation, with a relatively long history of research behind it. The general solution is simple and fairly standard: you take a bunch of pre-categorised example documents, divide them up into tokens (e.g. words), and use some AI algorithm or other to identify the most characteristic patterns of tokens in each category. Then you analyse each new document to see which category pattern it most closely resembles. Our team's innovation was to use compound multi-word phrases from the texts in addition to simple words (although the words themselves were stemmed, lemmatised and linguistically processed, so not altogether simple). My personal contribution was writing the code that took the tokenised output from the linguistic tools, statistically analysed it and turned it into a great big feature vector that could be fed to a decision tree algorithm. Our training documents were a set of news bulletins, categorised by Reuters, about the international commodities markets. I came to appreciate the subtle differences between the vocabularies of the palm oil and palm kernel trading communities much better than I could ever have expected.
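The general recipe can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the system described here: adjacent word pairs stand in for the linguistically derived multi-word phrases (no stemming or lemmatisation is done), and a simple nearest-profile comparison using cosine similarity replaces the decision tree. The function names and the tiny training set are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenise(text, use_phrases=True):
    """Split text into lowercase word tokens; optionally add adjacent word pairs
    as a crude stand-in for multi-word phrases (no stemming or lemmatisation)."""
    words = re.findall(r"[a-z]+", text.lower())
    features = list(words)
    if use_phrases:
        features += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return features

def train(examples):
    """Sum the feature counts of every pre-categorised example into one profile per category."""
    profiles = {}
    for text, category in examples:
        profiles.setdefault(category, Counter()).update(tokenise(text))
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[feature] for feature, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(text, profiles):
    """Return the category whose profile the new document most closely resembles."""
    doc = Counter(tokenise(text))
    return max(profiles, key=lambda category: cosine(doc, profiles[category]))

# Hypothetical toy training data in the spirit of the commodity news bulletins.
training = [
    ("palm oil prices rose as Malaysian exports increased", "palm-oil"),
    ("crude palm oil futures ended higher on export demand", "palm-oil"),
    ("palm kernel oil traded lower in Rotterdam", "palm-kernel"),
    ("palm kernel cake and palm kernel expeller shipments fell", "palm-kernel"),
]
profiles = train(training)
print(classify("palm kernel prices fell in Rotterdam", profiles))  # → palm-kernel
print(classify("crude palm oil exports rose", profiles))           # → palm-oil
```

Setting `use_phrases=False` gives a plain bag-of-words baseline, which makes it easy to compare the two feature sets on the same data.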

Once we had built the system, it turned out that the phrases did make a difference, but the difference was as often bad as it was good. In the years since then, research has largely taken the opposite direction: towards simpler statistical models and away from linguistic complexity. Some of the most effective text categorisers nowadays don't even use words, just character sequences known as n-grams, where n is usually a small number like 3. For example, the opening of this sentence could be broken into 3-grams as follows: “for”, “or ”, “r e”, “ ex”, “exa”, “xam”, and so on. You just add up the number of times each 3-gram appears in the documents in each category and extract the most characteristic patterns. This simple approach has proved surprisingly successful in fields such as language identification (getting a computer to figure out which language a given document is written in), sentiment analysis and even author identification. It turns out that the patterns of 3-grams that people use are pretty individual and tend to repeat themselves in whatever they write.
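The character n-gram approach is simple enough to sketch directly. The Python below is an illustrative assumption, not a reference implementation: the function names are hypothetical, the scoring rule (summing each n-gram's share of a category's total count) is just one simple choice, and the one-sentence "training corpora" are invented. Practical language identifiers use much larger samples and more careful ranking or probabilistic scoring.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return every overlapping run of n characters, spaces included, from the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_profile(texts, n=3):
    """Add up how often each n-gram appears across a category's example texts."""
    counts = Counter()
    for text in texts:
        counts.update(char_ngrams(text, n))
    return counts

def score(text, profile, n=3):
    """Sum the profile's relative frequencies for every n-gram in the text."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(profile[gram] for gram in char_ngrams(text, n)) / total

def identify(text, profiles, n=3):
    """Pick the category (here, a language) whose n-gram profile best matches the text."""
    return max(profiles, key=lambda name: score(text, profiles[name], n))

print(char_ngrams("For example")[:6])
# → ['for', 'or ', 'r e', ' ex', 'exa', 'xam']

# Hypothetical one-sentence "training corpora"; real systems use far more text.
profiles = {
    "english": build_profile(["the cat sat on the mat and the dog ate the bone"]),
    "french": build_profile(["le chat est sur le tapis et le chien mange"]),
}
print(identify("the dog and the cat", profiles))        # → english
print(identify("le chien est sur le tapis", profiles))  # → french
```

Nothing in the sketch knows what a word is, which is rather the point: given suitable training texts, the same counting could be applied to categories or authors instead of languages.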

The first reason for this digression into the arcane details of text categorisation is that it illustrates an interesting general principle. In practice, simple models that don't make much intuitive sense can be more accurate reflections of the real world than more sophisticated models that embody much more intelligence and capture much more meaningful information. The underlying cause is that human language comprehension is an extraordinarily, ridiculously complex problem, and fully mastering it will ultimately require extraordinarily complex, sophisticated models. Between the simple statistical models that currently perform best and the type of processing that human brains do, there is a ridiculously wide gap. In bridging this gap, performance declines at first, and it seems likely that we will have to travel a long way down that road before we arrive at solutions that consistently outperform the simplest models.

But the real reason for going into all this detail about text categorisation is that it is a window into the most wondrous phenomenon in the universe: the human brain and its capacity to solve absurdly complex problems.

Optimism, 1958 (image: The Times, 30 Aug 1958)

The history of artificial intelligence research has followed a regular pattern in most of its sub-fields. Researchers invariably start from a naively optimistic point of view, assuming that human behaviour can be encoded in a set of rules that determine what to do in any given situation. When it comes to translation between natural languages, for example, there is a standard joke in the research community that the solution is five years away and always has been. The earliest computer researchers, back in the 1950s, assumed that language translation would be easy and would merely require parallel dictionaries and a few simple rules. Oh, how wrong they were. A similar pattern has repeated itself again and again in every sub-discipline that has attempted to translate human behaviours into computer form. A naively optimistic researcher starts out on the problem, writes a few rules that cover the simple cases, tests them, realises that they are inadequate, then goes back to try to figure out how to fix the problems and, sooner or later, has an “Oh shit, we're going to need a bigger boat” moment as the jaw-dropping complexity of the problem becomes clear. Every few years, new teams of researchers attack the problem again, with slightly more sophisticated tools but an equally naïve optimism in their ability to solve it, until they too hit the lurking complexity wall. Advances do regularly happen, but they are in small, highly constrained areas. For example, after many decades of effort, in the last couple of years a researcher finally managed to get a robot to successfully fold a sheet, albeit in a highly controlled environment. The goal of emulating the rich, integrated, multi-faceted nature of human behaviour has, if anything, receded even further into the future as our understanding of the problem has advanced.

(image: briandeadly, Flickr)

The propensity of computer scientists to be extravagantly optimistic about the feasibility of codifying human behaviour and human systems in simple rule-based forms is endlessly infuriating. However, it is understandable. Firstly, in my experience, computer science research tends strongly to attract individuals whose personalities are not a million miles away from the high-functioning end of the autism spectrum: very interested in numbers, abstract reasoning and order, not so interested in social interactions. Secondly, when you have a room full of the world's cleverest people, all intensely focusing their collective intelligence on a problem that some of the least intelligent people on the planet can master effortlessly, it's hard not to be optimistic. Just how difficult these problems are is not obvious a priori, and the only reason we now understand the explosion of complexity that lies beneath the surface is that foolhardy researchers have charged headlong in and discovered all the traps by falling into them and all the brick walls by running headfirst into them.

It's hard to explain in writing just how difficult the problem of emulating human behaviour is, because it only really becomes apparent after you have analysed a problem in considerable detail, in a structured way, and have stripped out the easily solvable aspects. Nevertheless, a couple of examples can at least serve as reasonable illustrations.

The self-destructing domestic servant

If you want to go beating us at chess, you can do the bloody laundry.

In the first example, consider the problem of writing a program to control a robot that can carry out simple domestic duties like folding the laundry. There are all sorts of intricate mechanical problems that are far from trivial to solve, but a much bigger problem is that, unless you explicitly tell it otherwise, the robot will happily move around the space it is in, bashing itself into things and knocking bits off itself. This is one of the things that we take for granted with humans and even the simplest of animals. It seems sort of trivial, until we have to address it. So, how do we get our robot not to destroy itself? We need to teach it self-preservation, but this can't be an absolute rule, as there are certain situations in which self-preservation should be sacrificed: it's better for the robot to crash into a wall than into a toddler, even if the wall is likely to cause it greater damage. So, we now need a fairly complex internal ethical model to dictate when self-preservation principles should be applied and when they should be set aside, and to build that we need a fairly complex model of human ethical hierarchies and… oh shit, we're going to need a bigger boat.

The gnomic conversationalist

In the second example, take a situation where two men are standing beside each other looking out of a window. One of the men says to the other, “it'll be a clear sky for the birds tonight”. Our computer now magically takes control of the second man, and we have to come up with a program that will produce a response. We could, of course, simply respond with a banality such as a formulaic affirmation, and this is the approach that most Turing test competitors have tended to take. But formulaic answers only go so far, and if we are to improve upon them, we need to actually understand the statement. In doing so, we must ask ourselves: “what are the factors that I must take into account in interpreting the statement?” Most obviously, the actual scene outside the window is significant: if it is raining heavily outside, then perhaps the other person is making an ironic joke? Perhaps the reference to birds is slang and he is actually talking about human females, and the comment relates to a planned excursion to a den of iniquity later that night? Or maybe it is a quote by a third party that is known to both people: some shared cultural reference point? Or maybe it harks back to some shared experience of the two men: maybe they were in a war together and the phrase is one they used to signal that bombing raids were imminent? Or maybe he just has a strange fascination with the effects of meteorology on birds' moods? And how do we distinguish between these possibilities? Well, that must depend upon the personality of the first man and the nature of their relationship, past and present. So, all we need to take into account in order to understand the phrase properly is the immediate visual environment, the cultural memes in circulation, the shared history of the two people, and a sophisticated model of the first man's personality and… oh shit, we're going to need a bigger boat.

When I first encountered this phenomenon, whereby problems that are ludicrously difficult for us to solve explicitly are apparently trivially easy for even the least intelligent humans, it resonated with my egalitarian anarchist politics. It was, as I saw it, just another example of highly educated elitists underestimating the inherent creativity and intelligence of ordinary people. Although there is some truth to this, it turns out to be extremely limited as an explanation: an individual may be able to effortlessly master the extraordinary complexity of language, but still be eternally incapable of mastering much simpler problems that require explicit abstract reasoning. To properly understand what is going on, we need to go much deeper. We need to look at these problems, understand exactly why they are so difficult, and how the brain manages to solve them with such apparent ease. It turns out that, although a comprehensive understanding of the brain is far off in the distant future, we already know enough to come up with pretty good answers to these questions. And these answers reveal an awful lot about the world, human society and why things happen the way they do, yet very few people know them.

My next post, which will appear next week, will have answers!
