Sunday, 14 October 2012

Natural Language Processing

There have been high hopes for Natural Language Processing. Natural Language Processing, also known
simply as NLP, is part of the broader field of Artificial Intelligence, the effort towards making machines think.
Computers may appear intelligent as they crunch numbers and process information with blazing speed. In truth,
computers are literal machines that understand only on and off and are limited to exact instructions. But
since the invention of the computer, scientists have been attempting to make computers not only appear intelligent
but actually be intelligent. A truly intelligent computer would not be limited to rigid programming-language
commands; it would be able to process and understand the English language. This is the concept behind Natural
Language Processing.
The phases a message goes through during NLP are the message itself, syntax, semantics,
pragmatics, and finally the intended meaning (M. A. Fischer, 1987). Syntax is the grammatical structure. Semantics is the
literal meaning. Pragmatics is world knowledge, knowledge of the context, and a model of the sender. Only when
syntax, semantics, and pragmatics are all applied can accurate Natural Language Processing exist.
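To make those stages concrete, here is a toy Python sketch of the pipeline. The stage functions, the example sentence, and the context are all invented for illustration; a real system would do far more at every step.

def parse_syntax(message):
    # Syntax: recover the grammatical structure (here, just a list of words).
    return message.rstrip(".!?").split()

def interpret_semantics(structure):
    # Semantics: attach a literal meaning to the structure.
    return {"words": structure, "literal": " ".join(structure)}

def apply_pragmatics(meaning, context):
    # Pragmatics: add world knowledge, context, and a model of the sender.
    meaning["context"] = context
    return meaning

message = "The door is open."
intended = apply_pragmatics(interpret_semantics(parse_syntax(message)),
                            context="the sender is shivering, so this is likely a request to close the door")
print(intended)
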
Alan Turing made this prediction about conversing machines in 1950 (Daniel Crevier, 1994, page 9):
"I believe that in about fifty years' time it will be possible to program computers .... to
make them play the imitation game so well that an average interrogator will not have more than
70 per cent chance of making the right identification after five minutes of questioning."
But in 1950, computer technology was limited. Because of those limitations, the NLP programs of
that day focused on exploiting the strengths computers did have. For example, a program called SYNTHEX
tried to determine the meaning of sentences by looking up each word in its encyclopedia. Another early approach
was Noam Chomsky's at MIT. He believed that language could be analyzed without any reference to semantics or
pragmatics, simply by looking at the syntax. Neither of these techniques worked. Scientists realized that
their Artificial Intelligence programs did not think the way people do, and since people are much more intelligent than
those programs, they decided to make their programs think more like a person would. So in the late 1950s,
scientists shifted from trying to exploit the capabilities of computers to trying to emulate the human brain. (Daniel
Crevier, 1994)
Ross Quillian at Carnegie Mellon wanted to try to program the associative aspects of human memory to
create better NLP programs. (Daniel Crevier, 1994) Quillian's idea was to determine the meaning of a word by the
words around it. For example, look at these sentences:
After the strike, the president sent him away.
After the strike, the umpire sent him away.
Even though these sentences are identical except for one word, they have very different meanings because of the
meaning of the word "strike". Quillian said the meaning of "strike" should be determined by looking at the subject.
In the first sentence, the word "president" makes the word "strike" mean labor dispute. In the second sentence, the
word "umpire" makes the word "strike" mean that a batter has swung at a baseball and missed.
In 1958, Joseph Weizenbaum took a different approach to Artificial Intelligence, which he later described
this way (Daniel Crevier, 1994, page 133):
"Around 1958, I published my first paper, in the commercial magazine Datamation. I
had written a program that could play a game called "five in a row." It's like ticktacktoe, except
you need rows of five exes or noughts to win. It's also played on an unbounded board; ordinary
coordinate paper will do. The program used a ridiculously simple strategy with no look ahead, but it
could beat anyone who played at the same naive level. Since most people had never played the
game before, that included just about everybody. Significantly, the paper was entitled: "How to
Make a Computer Appear Intelligent" with appear emphasized. In a way, that was a forerunner
to my later ELIZA, to establish my status as a charlatan or con man. But the other side of the
coin was that I freely stated it. The idea was to create the powerful illusion that the computer
was intelligent. I went to considerable trouble in the paper to explain that there wasn't much
behind the scenes, that the machine wasn't thinking. I explained the strategy well enough that
anybody could write that program, which is the same thing I did with ELIZA."
ELIZA was a program written by Joe Weizenbaum which communicated with its user while impersonating a
psychotherapist. Weizenbaum wrote the program to demonstrate tricky alternatives to having programs look at
syntax, semantics, or pragmatics. One of ELIZA's tricks was mirroring sentences. Another trick was to pick a
sentence from earlier in the dialogue and return it attached to a leading phrase at random intervals. Also, ELIZA
would watch for a list of key words and, when one appeared, transform the sentence in some way and return it
attached to a leading sentence. These tricks worked well in the context of a psychotherapist, who encourages
patients to talk about their problems and answers their questions with other questions. However, the same tricks
do not work well in other situations.
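None of Weizenbaum's original code appears here, but a couple of those tricks can be imitated in a short Python sketch. The mirror table, key words, and leading phrases are invented for the example.

import random

MIRROR = {"i": "you", "am": "are", "my": "your", "me": "you"}
KEY_WORDS = {"mother": "Tell me more about your family.",
             "dream": "What does that dream suggest to you?"}
LEADS = ["Why do you say ", "Does it bother you that "]

def respond(sentence):
    words = sentence.lower().rstrip(".!?").split()
    for word in words:
        if word in KEY_WORDS:                      # trick: watch for key words
            return KEY_WORDS[word]
    mirrored = " ".join(MIRROR.get(w, w) for w in words)
    return random.choice(LEADS) + mirrored + "?"   # trick: mirror the sentence

print(respond("I am unhappy with my job"))   # e.g. Why do you say you are unhappy with your job?
print(respond("My mother ignores me"))       # Tell me more about your family.
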
In 1970, William Woods, an AI researcher at Bolt, Beranek, and Newman, described an NLP method called
the Augmented Transition Network (ATN). (Daniel Crevier, 1994) The idea was to look at the case of each word: agent
(instigator of an event), instrument (stimulus or immediate physical cause of an event), and experiencer (undergoes
the effect of the action). To tell the cases apart, Fillmore put restrictions on them, such as requiring that an agent be
animate. For example, in "The heat is baking the cake", the cake is inanimate and therefore the experiencer, while the
heat would be the instrument. An ATN could mix syntax rules with semantic props such as knowing a cake is inanimate. This
worked out better than any other NLP technique to date, and ATNs are still used in most modern NLP programs.
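The animacy restriction can be shown with a toy Python sketch. The animacy list and the single rule below are invented for the example and are far simpler than a real ATN.

ANIMATE = {"baker", "cook", "boy", "umpire", "waitress"}

def assign_cases(subject, direct_object):
    cases = {"experiencer": direct_object}   # undergoes the effect of the action
    if subject in ANIMATE:
        cases["agent"] = subject             # instigator of the event: must be animate
    else:
        cases["instrument"] = subject        # immediate physical cause of the event
    return cases

print(assign_cases("heat", "cake"))    # {'experiencer': 'cake', 'instrument': 'heat'}
print(assign_cases("baker", "cake"))   # {'experiencer': 'cake', 'agent': 'baker'}
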
Roger Schank, a researcher at Stanford, described his approach this way (Daniel Crevier, 1994, page 167):
"Our aim was to write programs that would concentrate on crucial differences in
meaning, not on issues of grammatical structure .... We used whatever grammatical rules were
necessary in our quest to extract meanings from sentences but, to our surprise, little grammar
proved to be relevant for translating sentences into a system of conceptual representations."
Schank reduced all verbs to 11 basic acts. Some of them are ATRANS (to transfer an abstract
relationship), PTRANS (to transfer the physical location of an object), PROPEL (to apply physical force to an
object), MOVE (for its owner to move a body part), MTRANS (to transfer mental information), and MBUILD (to
build new information out of old information). Schank called these basic acts semantic primitives. When his
program saw in a sentence words usually relating to the transfer of possession (such as give, buy, sell, donate, etc.)
it would search for the normal props of ATRANS: the object being transferred, its receiver and original owner, the
means of transfer, and so on. If the program didn't find these props, it would try another possible meaning of the
verb. After successfully determining the meaning of the verb, the program would make inferences associated with
the semantic primitive. For example, an ATRANS rule might be that if someone gets something they want, they
may be happy about it and may use it. (Daniel Crevier, 1994)
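The verb-to-primitive mapping and the prop check can be pictured with a toy Python sketch. The verb table and the required props below are invented for illustration and only hint at what Schank's programs tracked.

PRIMITIVE_OF = {"give": "ATRANS", "sell": "ATRANS", "donate": "ATRANS",
                "walk": "PTRANS", "push": "PROPEL"}

REQUIRED_PROPS = {"ATRANS": ["object", "receiver", "original_owner"],
                  "PTRANS": ["object", "destination"],
                  "PROPEL": ["object"]}

def conceptualize(verb, props):
    primitive = PRIMITIVE_OF.get(verb)
    if primitive is None:
        return None
    missing = [p for p in REQUIRED_PROPS[primitive] if p not in props]
    if missing:
        return None            # props not found: try another possible meaning of the verb
    return {"act": primitive, **props}

print(conceptualize("give", {"object": "a book", "receiver": "Mary",
                             "original_owner": "John"}))
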
Schank implemented his idea of conceptual dependency in a program called MARGIE (Meaning Analysis,
Response Generation, and Inference on English). MARGIE analyzed English sentences, turned them into
semantic representations, and generated inferences from them. Take, for example: "John went to a restaurant. He
ordered a hamburger. It was cold when the waitress brought it. He left her a very small tip." On a story like this, MARGIE didn't work.
Schank and his colleagues found that "any single sentence lends itself to so many plausible inferences that it was
impossible to isolate those pertinent to the next sentence." For example, from "It was cold when the waitress
brought it" MARGIE might say "The hamburger's temperature was between 75 and 90 degrees, The waitress
brought the hamburger on a plate, She put the plate on a table, etc." The inference that cold food makes people
unhappy would be so far down the line that it wouldn't be looked at and as a result MARGIE wouldn't have
understood the story well enough to answer the question, "Why did John leave a small tip?" While MARGIE
applied syntax and semantics well, it forgot about pragmatics. To solve this problem, Schank moved to Yale and
teamed up with Professor of Psychology Robert Abelson. They realized that most of our everyday activities are
linked together in chains which they called "scripts." (Daniel Crevier, 1994)
In 1975, SAM (Script Applier Mechanism), written by Richard Cullingford, used an automobile accident
script to make sense out of newspaper reports of such accidents. SAM built internal representations of the articles using
semantic primitives. SAM was the first working natural language processing program. It successfully went
from message to intended meaning because it implemented the steps in between - syntax, semantics,
and pragmatics.
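A script can be pictured as a stereotyped chain of events that a story fills in. The miniature accident script below is invented for this Python sketch and is not Cullingford's SAM; it only shows how unstated events can still be assumed.

ACCIDENT_SCRIPT = ["vehicle leaves the road", "vehicle hits an obstacle",
                   "occupants are injured", "ambulance takes the injured to a hospital"]

def apply_script(mentioned_events):
    understood = {}
    for event in ACCIDENT_SCRIPT:
        # Events the article never states can still be assumed,
        # because the script says they normally happen in this kind of story.
        understood[event] = "stated" if event in mentioned_events else "assumed"
    return understood

article = {"vehicle hits an obstacle", "occupants are injured"}
print(apply_script(article))
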
Despite the success of SAM, Schank said "real understanding requires the ability to establish connections
between pieces of information for which no prescribed set of rules, or scripts, exist." (Daniel Crevier, 1994, page
167) So Robert Wilensky created PAM (Plan Applier Mechanism). PAM interpreted stories by linking sentences
together through a character's goals and plans.
Here is an example of PAM (Daniel Crevier, 1994):
John wanted money. He got a gun and walked into a liquor store. He told the owner he wanted some
money. The owner gave John the money and John left.
In the process of understanding the story, PAM put itself in the shoes of the participants. From John's
point of view:
I needed to get some dough. So I got myself this gun, and I walked down to the liquor store. I told the
shopkeeper that if he didn't let me have the money then I would shoot him. So he handed it over. Then I left.
From the store owner's point of view:
I was minding the store when a man entered. He threatened me with a gun and demanded all the cash
receipts. Well, I didn't want to get hurt so I gave him the money. Then he escaped.
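The linking of sentences through goals can be illustrated with a toy Python sketch. The goal table below is hand-written for this one story and is only a cartoon of what PAM actually did.

TOP_GOAL = "John wants money"

GOAL_OF_ACTION = {"John got a gun": "obtain a way to threaten the owner",
                  "John walked into a liquor store": "get to a place that has money",
                  "John told the owner he wanted money": "make the owner hand the money over"}

def explain(actions):
    # Link each action to the goal it serves, and each goal back to the top goal.
    return [(action, GOAL_OF_ACTION.get(action, "goal unknown"), TOP_GOAL)
            for action in actions]

for step in explain(["John got a gun",
                     "John walked into a liquor store",
                     "John told the owner he wanted money"]):
    print(step)
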
A newer idea from MIT is for the program to grab whatever bits and parts of speech it can, then ask the user
for more details, both to understand what it missed and to better understand what it already caught (G. McWilliams, 1993).
In IBM's current NLP programs, instead of being given rules for determining context and meaning, the
program derives its own rules from the relationships between words in its input. For example, the program
could add a new definition to the word "bad" once it realized that the word is used as slang for "incredible." IBM also uses
statistical probability to determine the meaning of a word. IBM's NLP programs also use a sentence-charting
technique. For example, charting the sentence "The boy has left" and storing "the boy" as a noun phrase allows the
computer to recognize the subject of a following sentence beginning with "He" as "the boy." (G. McWilliams, 1993)
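The sentence-charting example can be sketched in a few lines of Python. The chart format and the gender tags here are invented for the illustration and are not IBM's actual technique.

chart = []   # noun phrases stored from earlier sentences, most recent last

def record_noun_phrase(phrase, gender):
    chart.append({"phrase": phrase, "gender": gender})

def resolve_pronoun(pronoun):
    wanted = {"he": "male", "she": "female", "it": "neuter"}[pronoun.lower()]
    for entry in reversed(chart):       # prefer the most recently charted match
        if entry["gender"] == wanted:
            return entry["phrase"]
    return pronoun                      # nothing charted yet: leave the pronoun alone

record_noun_phrase("the boy", "male")   # from charting "The boy has left"
print(resolve_pronoun("He"))            # -> the boy
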
In the 1950s, Noam Chomsky believed that NLP consisted only of syntax. With MARGIE, Roger Schank
added semantics. By the mid-1970s, SAM and Robert Wilensky's PAM could handle pragmatics, too. And as Joe Weizenbaum did
with ELIZA decades earlier, IBM is now adding tricks to its NLP programs. Natural Language Processing
has had many successes - and many failures. How well can a computer understand us?
