Natural Language Processing

Posted by ljmacphee on July 13, 2007 under topics in artificial intelligence

Natural language processing (NLP) allows people to interact with computers and robots in a more natural way. There are five major areas in NLP: speech recognition, language processing, language understanding, language generation, and speech synthesis.

Speech recognition consists of breaking speech down into words. Simpler (discrete) recognizers require a pause between words so the computer knows where one word ends and another begins; continuous recognizers have to find those boundaries themselves.

Language processing takes the recognized words and parses them using grammar rules and a lexicon. A lexicon is a list of the words the computer is expected to understand, along with attributes for each word (noun, verb, etc.). The grammar is a formal description of the language. It might have rules describing how sentences may be formed, for example Subject->NounPhrase->Verb or NounPhrase->VerbPhrase. Some systems use a parser, which breaks a sentence down into its parts and builds a parse tree. Others use a recognizer, which parses the sentence and only determines whether it is valid under the grammar and lexicon supplied.
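The difference between a parser and a recognizer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon, the grammar rules, and the function names (`parse`, `expand`, `parse_sentence`, `recognize`) are all made up for this example, and the grammar is far smaller than anything a real system would use.

```python
# Hypothetical toy lexicon: word -> part of speech.
LEXICON = {
    "the": "Det",
    "a": "Det",
    "robot": "Noun",
    "ball": "Noun",
    "sees": "Verb",
    "moves": "Verb",
}

# Hypothetical toy grammar: each symbol maps to its alternative expansions.
GRAMMAR = {
    "Sentence": [["NounPhrase", "VerbPhrase"]],
    "NounPhrase": [["Det", "Noun"]],
    "VerbPhrase": [["Verb", "NounPhrase"], ["Verb"]],
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` matches words[pos:]."""
    if symbol not in GRAMMAR:
        # Part-of-speech symbol: look the current word up in the lexicon.
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for expansion in GRAMMAR[symbol]:
        yield from expand(symbol, expansion, words, pos, [])


def expand(symbol, remaining, words, pos, children):
    """Match the symbols in `remaining` one after another, collecting subtrees."""
    if not remaining:
        yield (symbol, *children), pos
        return
    for child, next_pos in parse(remaining[0], words, pos):
        yield from expand(symbol, remaining[1:], words, next_pos, children + [child])


def parse_sentence(text):
    """Parser: return a parse tree covering the whole sentence, or None."""
    words = text.lower().split()
    for tree, end in parse("Sentence", words, 0):
        if end == len(words):
            return tree
    return None


def recognize(text):
    """Recognizer: only report whether the sentence is valid."""
    return parse_sentence(text) is not None
```

With these toy rules, `parse_sentence("the robot sees a ball")` returns a nested tuple that serves as the parse tree, while `recognize("robot the sees")` simply reports that the sentence does not fit the grammar.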

Understanding is mapping the processed language onto a representation that the computer can handle.
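As a rough illustration of what such a representation might look like, the sketch below maps words that have already been labeled with a part of speech onto a flat agent/action/object frame. The frame layout and the `to_frame` function are assumptions made for this example, not a standard format.

```python
def to_frame(tagged_words):
    """Map (word, category) pairs onto a flat agent/action/object frame."""
    frame = {"agent": None, "action": None, "object": None}
    for word, category in tagged_words:
        if category == "Verb" and frame["action"] is None:
            frame["action"] = word
        elif category == "Noun":
            # A noun before the verb is treated as the agent, after it as the object.
            role = "agent" if frame["action"] is None else "object"
            if frame[role] is None:
                frame[role] = word
    return frame


tagged = [("the", "Det"), ("robot", "Noun"), ("sees", "Verb"),
          ("a", "Det"), ("ball", "Noun")]
frame = to_frame(tagged)
```

The resulting dictionary is something a program can act on directly, which is the point of the understanding step.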

Language generation is the formation of an answer by the machine. Usually these are predefined phrases.
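Generation from predefined phrases can be as simple as filling slots in templates. The sketch below assumes a hypothetical set of intents and template strings; none of these names come from a particular system.

```python
# Hypothetical predefined phrases, keyed by the kind of answer required.
TEMPLATES = {
    "greeting": "Hello, {name}.",
    "time": "The time is {time}.",
    "unknown": "Sorry, I did not understand that.",
}


def generate(intent, **slots):
    """Pick the predefined phrase for `intent` and fill in its slots."""
    template = TEMPLATES.get(intent, TEMPLATES["unknown"])
    return template.format(**slots)
```

For example, `generate("greeting", name="Ada")` fills the greeting template, and an intent with no template falls back to the "unknown" phrase.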

Speech synthesis is the spoken expression of the generated language, and it also usually relies on predefined phrases.

Language is difficult to parse: its structure can be as simple as a two-word sentence or as complex as a lengthy one. Using a limited grammar and lexicon simplifies the problem.

Language can also be ambiguous: a word can sometimes be either a noun or a verb (a run, to run; a program, to program). This can create multiple possible parse trees from a single statement.
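To see how quickly the ambiguity multiplies, the sketch below lists every part-of-speech reading of a phrase when the lexicon allows more than one category per word. The lexicon and the `tag_sequences` function are hypothetical, and a real parser would use its grammar to discard most of these readings.

```python
from itertools import product

# Hypothetical lexicon in which some words have more than one category.
LEXICON = {
    "i": {"Pronoun"},
    "a": {"Det"},
    "run": {"Noun", "Verb"},
    "program": {"Noun", "Verb"},
}


def tag_sequences(text):
    """Return every combination of categories the lexicon allows for the words."""
    words = text.lower().split()
    options = [sorted(LEXICON[word]) for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


readings = tag_sequences("run a program")
```

Two ambiguous words already give four candidate readings here, and each one that survives the grammar rules corresponds to a separate parse tree.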

Metaphors are not well translated by machines.

Parsers can be top-down, bottom-up, or finite state machines. Top-down parsers start with the sentence rule and work down to words; the parse is complete when every word has been classified. Bottom-up parsers start with the words and apply rules to group them into noun and verb phrases until the phrases fit a sentence rule. Finite state machines also work from the bottom up: they identify words and step through a graph-like structure to form the sentence.
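The finite state approach can be sketched as a table of transitions between states, driven by the category of each word. The states, transitions, and lexicon below are assumptions chosen for illustration; they accept only sentences of the form "Det Noun Verb" optionally followed by "Det Noun".

```python
# Hypothetical toy lexicon: word -> part of speech.
LEXICON = {
    "the": "Det",
    "a": "Det",
    "robot": "Noun",
    "ball": "Noun",
    "sees": "Verb",
    "moves": "Verb",
}

# Edges of the graph: (current state, word category) -> next state.
TRANSITIONS = {
    ("start", "Det"): "subject_det",
    ("subject_det", "Noun"): "subject",
    ("subject", "Verb"): "verb",
    ("verb", "Det"): "object_det",
    ("object_det", "Noun"): "object",
}

# A sentence is complete if the machine stops in one of these states.
ACCEPTING = {"verb", "object"}


def accepts(text):
    """Walk the graph one word at a time; fail on any missing edge."""
    state = "start"
    for word in text.lower().split():
        category = LEXICON.get(word)
        state = TRANSITIONS.get((state, category))
        if state is None:
            return False
    return state in ACCEPTING
```

Because the machine keeps only its current state, it is fast and simple, but it cannot handle the nested structures that a grammar-based parser can.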

NLP is used in speech interfaces, text processing, language translation and information retrieval. It is an important tool in the war against spam.

More information:
Stanford School of Engineering, Natural Language Processing Online Course
Simon: Open source speech to text software
Natural Language Processing Blog
Getting Started with OpenNLP ( Natural Language Processing )
Has voice recognition finally come of age?

See also:
Darpa builds speech translator
Neural net learning vowels
