“Then you should say what you mean,” the March Hare went on. “I do,” Alice hastily replied; “at least-at least I mean what I say-that’s the same thing, you know.” “Not the same thing a bit!” said the Hatter.

Lewis Carroll, Alice in Wonderland
In recent times, the technology behind human-computer communication has advanced rapidly. We have seen the arrival of Siri (2010), Alexa (2014) and Google Home (2016), all of which have enjoyed remarkable success.
A computer that can talk and listen to you can be broadly divided into two halves, setting aside the processing in between:
- Speech-to-text (listening)
- Text-to-speech (speaking)
Of the two halves, text-to-speech has been around for ages, albeit in primitive form: it is easy, if tedious, to record every word in the dictionary. Today, computer programs can sound almost human, and their fluency keeps improving. Speech-to-text, or speech recognition, is much harder. There are thousands of accents and dialects in the world, and recognising words across all of them is extremely difficult, which is why only a handful of companies have managed it.
There is a distinction to be drawn between the challenging task of speech recognition and the even more daunting one of computers understanding complex vocabulary and grammar. Most of the time, asking Siri or Alexa a question yields little more than what you would get by typing the nouns into Google; the answers lack nuance. Their abilities in this respect seem limited for the moment, but is there potential for improvement?
To ask whether computers are theoretically capable of answering deeper questions, we must look at some of the fundamental problems behind human-computer communication, in the overlapping fields of linguistics and philosophy.
Firstly, I would ask, what is meaning?
The ability to impart meaning unambiguously is an important skill in many fields. In mathematics, proofs can depend on very careful specification of what a particular word means. In law, the intricacy of the legal code demonstrates the ability of humans to argue their way out of a supposedly airtight situation. In contexts such as these, the intention behind a word is next to worthless; what counts is its precise definition.
These fields could prove fertile ground for computers; there, ‘taking things literally’ may be an advantage. Computers have already made great strides here, for example in mathematical proof-checking software: a program called EQP was used to prove the Robbins conjecture in 1996. In fact, using computers to prove mathematical theorems was one of the main motivations behind the development of computer science.
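The principle behind proof checking can be sketched in a few lines. The toy below is my own illustration, vastly simpler than a real system like EQP: it only verifies chains of modus ponens steps (“from P and P implies Q, conclude Q”), but it shows how a machine can confirm each step mechanically, with no understanding required.

```python
# A toy proof checker. 'facts' are known statements, 'rules' are
# implications (p, q) meaning "p implies q", and 'goals' is the
# sequence of conclusions the proof claims to derive, in order.

def check_proof(facts, rules, goals):
    known = set(facts)
    for goal in goals:
        # A step is valid only if some already-known fact implies it.
        if any(p in known and q == goal for (p, q) in rules):
            known.add(goal)  # valid conclusions become known facts
        else:
            return False
    return True

facts = {"P"}
rules = {("P", "Q"), ("Q", "R")}
print(check_proof(facts, rules, ["Q", "R"]))  # True
print(check_proof(facts, rules, ["R"]))       # False: R needs Q first
```

The checker never decides whether a statement is true in any deeper sense; it merely confirms that every step follows the rules, which is exactly the kind of literal-mindedness that suits computers.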
However, in normal conversation, meaning is rarely this simple. We humans observe the ‘cooperative principle’: the assumption that each contribution to a conversation is relevant and made in good faith. For example, if you say, “My pen is out of ink,” and I reply, “I have a pen here,” the subtext is that I am offering to lend you a pen. This makes sense only in light of the cooperative principle; without it, I would merely be making a factual statement of little relevance to yours. For the moment at least, computers do not have the capability to follow this line of reasoning.
A second reason computers find it difficult to converse in casual contexts is the inherent ambiguity in language built upon a shared perspective. For a concrete example, read these two sentences (an example of a Winograd schema).
- The city councilmen refused the demonstrators a permit because they feared violence.
- The city councilmen refused the demonstrators a permit because they advocated violence.
If you consider these for a few seconds, it is obvious that ‘they’ refers to a different group in each sentence (the councilmen in the first, the demonstrators in the second). To distinguish the meanings, some basic knowledge of the world is required. As yet, computers lack this ability to judge how people might act and react in various situations. It requires an internal ‘model of reality’.
A model of reality is the picture of the world that everybody carries inside their head. It is broadly similar for all humans but is shaped by almost everything in a person’s environment. To what extent any model of reality is valid or realistic is a point of much debate. The central issue here is perception.
How much of what we see can be trusted?
In the Internet age, this question has become divisive, with many holding to traditional news outlets, while others turn to bottom-up alternatives such as Reddit. Political partisanship is at an all-time high in the United States, and Europe fares little better, with the resurgence of populism.
The fact is that we live in a bigger, faster, more confusing world every day. Most people would agree that we can trust the evidence of our own eyes, but independent investigation is not where most of our information comes from today. Overall, the ability to deceive others has increased rather than decreased with the advent of ‘Big Data’. In the long term, deception about facts changes a person’s model of reality. Altering what we think gradually turns into altering how we think, which is how societies accrue biases and assumptions.
Clearly, models of reality can be flawed, which has implications for how we build AI.
As anybody who has tried to write a computer program realises, certain things must be built into the program to capture what happens in the real world. If you want an algorithm that recommends videos, you need something within the program that represents you: a set of variables describing your interests and personality traits. But how do you create a program that represents key facts about the world?
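A minimal sketch of such a representation might look like this. The traits, weights and scoring rule are all invented for illustration; real recommenders are far more elaborate, but the point stands: the user exists inside the program only as a handful of numbers.

```python
# A toy user model for a recommender: the 'person' is just a
# dictionary of interest weights, and relevance is a crude sum.
from dataclasses import dataclass, field

@dataclass
class UserModel:
    interests: dict = field(default_factory=dict)  # topic -> weight (0 to 1)

    def score(self, video_tags):
        """Relevance of a video: sum of weights for its matching tags."""
        return sum(self.interests.get(tag, 0.0) for tag in video_tags)

user = UserModel(interests={"music": 0.9, "chess": 0.4})
videos = {
    "Piano tutorial": ["music"],
    "Chess opening traps": ["chess", "strategy"],
}
ranked = sorted(videos, key=lambda v: user.score(videos[v]), reverse=True)
print(ranked)  # ['Piano tutorial', 'Chess opening traps']
```

Everything the program ‘knows’ about you is whatever someone chose to encode, which is precisely where the biases discussed next can creep in.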
There are two bad alternatives here:
- That we create programs with zero inherent bias and hence no comprehension of human traits like morality, culture or music.
- That we pass on our own biases to computer programs, as happened with automated sentencing systems in the USA, where racially biased sentencing patterns in the data were passed on to the software.
Neither pure objectivity nor pure subjectivity seems an acceptable alternative; this will be discussed further in the article on ethics.
Finally, I would like to talk about machine translation, i.e. between human languages.
Machine translation has become an everyday tool, despite arriving surprisingly quickly (Google Translate was first released in 2006). Unlike other products in the field, Google Translate has little competition: the next most popular general translation software (as opposed to dictionaries) is Bing Translate, which is both poorer in quality and less widely used. Machine translation suffers from many of the problems discussed earlier, as well as some unique to it:
- Polysemy, where a word can have multiple translations (think informal vs. formal ‘you’ in Romance languages)
- Homographs (words spelt the same but with different meanings)
- Metaphor: “There’s a grey cloud over him”
- Idiom: “It’s raining cats and dogs”
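The polysemy problem is easy to reproduce. The fragment below is a made-up toy dictionary, not any real translation system: a word-for-word lookup has no way to choose between the options, so it can only pick one blindly.

```python
# A toy word-for-word English-to-French lookup, showing why polysemy
# defeats naive translation. The lexicon is an invented fragment.
lexicon = {
    "you": ["tu", "vous"],   # informal vs. formal: only context decides
    "have": ["as", "avez"],  # the verb form depends on which 'you'
    "a": ["un", "une"],      # depends on the gender of the noun
    "pen": ["stylo"],
}

def translate_word_by_word(sentence):
    # With no model of context, we can only ever take the first option.
    return " ".join(lexicon.get(w, [w])[0] for w in sentence.lower().split())

print(translate_word_by_word("You have a pen"))  # tu as un stylo
```

Here the blind first choice happens to be consistent, but addressing a stranger would require “vous avez”, and nothing in a word-by-word scheme can know that; the choice depends on who is speaking to whom.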
Machine translation seems to work well for very common languages such as Mandarin, French, Spanish and German but performs badly with rarer languages. It is also reliant on a community of people to check translations and suggest new ones, as well as existing texts that have been translated by experts.
There seems to be an attitude in today’s world that problems can be solved simply by throwing more data at them, but the evidence suggests that genuinely new ideas are needed. We need to think carefully about the philosophical questions behind what communication fundamentally is, and come up with solutions that can be applied in a context where everything must be digitised and compartmentalised.
There are no easy answers for giving computers ‘common sense’ and an understanding of the world. But perhaps one day a program will be written (perhaps by another program) that will hear “the cricket bat didn’t fit in the suitcase because it was too big”, and understand that suitcases are for putting things in. In fact, this would be a general AI, the holy grail of AI researchers, something that can interact with the world without needing to be fed carefully curated input.
The problem, as you will see both here and in the following articles, is that computers think in a profoundly different way from humans. Many people do not realise this, and carry a mental image of a computer ‘thinking’ as we do.
When I said earlier that a computer might understand something, it was a figure of speech, shorthand for ‘this program contains the information that…’. Computers are merely extensions of the human intellect. I would urge you to remember that computer programs are merely sets of instructions carried out in an extremely organised manner.
Are humans the same? I’ll leave that question to the neurologists, and leave you with a thought on the nature of language.
What we cannot speak about we must pass over in silence.

Ludwig Wittgenstein
- Fifth Generation Computer Corporation http://www.fifthgen.com/speaker-independent-connected-s-r.htm
- Melanie Pinola, PC World (2011) https://www.pcworld.com/article/243060/speech_recognition_through_the_decades_how_we_ended_up_with_siri.html
- Mann, Allen (2003) http://math.colgate.edu/~amann/MA/robbins_complete.pdf
- Partisan voters… http://bit.ly/1I3nUkJ
- Stanford Encyclopedia of Philosophy (2018) https://plato.stanford.edu/entries/wittgenstein/
- Quora.com, Google vs. Bing translate (2012) https://www.quora.com/What-is-better-between-Google-Translate-or-Bing-Translate-why-and-if-the-former-is-better-then-why-do-most-websites-use-the-Bing-translator
- Wired.com, Courts are using AI… (2017) https://www.wired.com/2017/04/courts-using-ai-sentence-criminals-must-stop-now/