Some rules governing natural language must be explained prior to producing object and relational models. The study of these rules is known as linguistics. From these rules both object and relational representations can then be devised.
The term natural language processing generally implies the interpretation of written or textual language. Natural language processing is the syntactic and semantic validation of words and sentences based on known words and strings of words. For example, "a apple" is invalid whereas "an apple" is valid, since a noun beginning with a vowel is preceded by the article "an" and not by the article "a".
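The article rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the check looks at the first letter only, whereas real English chooses the article by sound ("an hour", "a university").

```python
VOWELS = set("aeiou")

def expected_article(noun: str) -> str:
    """Return 'an' if the noun starts with a vowel letter, otherwise 'a'."""
    # Simplifying assumption: spelling stands in for pronunciation.
    return "an" if noun[:1].lower() in VOWELS else "a"

def is_valid_article(article: str, noun: str) -> bool:
    """Validate an article/noun pair against the simplified rule."""
    return article.lower() == expected_article(noun)

print(is_valid_article("a", "apple"))   # False
print(is_valid_article("an", "apple"))  # True
```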
Syntax validation can be broken into two distinct steps, spelling and grammar. Syntax validation is the validation of the positioning of words in a sentence based on their spellings, their existence in a lexicon (a set of valid words) and their relationships to each other based on their word types, i.e. the grammar or linguistics of a language. Syntax analysis contributes to determining the meaning of a word, a group of words or a sentence. Note that computerised grammatical validation is not always an exact match to normal language linguistic rules. Sometimes computerisation requires a certain amount of specialisation to be added to a set of grammatical rules.
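A minimal sketch of the two steps, assuming a tiny hypothetical lexicon and a single allowed pattern of word types. The names `LEXICON`, `GRAMMAR` and `validate_syntax` are invented for illustration. The sketch does not check subject and verb agreement, so "she run" would pass.

```python
# Hypothetical miniature lexicon: word -> word type.
LEXICON = {
    "i": "pronoun", "you": "pronoun", "he": "pronoun", "she": "pronoun",
    "we": "pronoun", "they": "pronoun",
    "run": "verb", "runs": "verb", "walk": "verb", "walks": "verb",
}

# Hypothetical grammar: the permitted sequences of word types.
GRAMMAR = {("pronoun", "verb")}

def check_spelling(words):
    """Step 1: return the words that are not present in the lexicon."""
    return [w for w in words if w.lower() not in LEXICON]

def check_grammar(words):
    """Step 2: check that the sequence of word types is an allowed pattern."""
    types = tuple(LEXICON[w.lower()] for w in words)
    return types in GRAMMAR

def validate_syntax(sentence):
    """Run spelling first, then grammar, and report the first failure."""
    words = sentence.split()
    unknown = check_spelling(words)
    if unknown:
        return False, f"unknown words: {unknown}"
    if not check_grammar(words):
        return False, "word order not allowed by grammar"
    return True, "ok"

print(validate_syntax("she runs"))
print(validate_syntax("runs she"))
```

Spelling is checked before grammar because word types can only be looked up for words that exist in the lexicon.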
| to run | to walk |
|---|---|
| I run | I walk |
| you run | you walk |
| he runs | he walks |
| she runs | she walks |
| we run | we walk |
| they run | they walk |
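
The present-tense pattern in the conjugations above can be expressed as one rule: the verb takes an "s" after "he" or "she". The sketch below is an assumption-laden illustration, with hypothetical names, that only covers verbs which take a plain "s", such as "run" and "walk".

```python
THIRD_PERSON_SINGULAR = {"he", "she", "it"}

def present_tense(pronoun: str, verb: str) -> str:
    """Conjugate a regular verb in the present tense for a pronoun."""
    # Assumption: the verb takes a plain "s" (not "es" or an irregular form).
    if pronoun.lower() in THIRD_PERSON_SINGULAR:
        verb = verb + "s"
    return f"{pronoun} {verb}"

for pronoun in ["I", "you", "he", "she", "we", "they"]:
    print(present_tense(pronoun, "run"))
```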
| English | French |
|---|---|
| I speak | je parle |
| you speak | tu parles (familiar, close friends), vous parlez (non-familiar) |
| he speaks, she speaks | il parle (masculine), elle parle (feminine) |
| we speak | nous parlons |
| they speak | ils parlent (masculine), elles parlent (feminine) |
| to be (present) |
|---|
| I am |
| you are |
| he is |
| she is |
| we are |
| they are |
| English | French |
|---|---|
| I am | je suis |
| you are | tu es (familiar, close friends), vous êtes (non-familiar) |
| he is, she is | il est (masculine), elle est (feminine) |
| we are | nous sommes |
| they are | ils sont (masculine), elles sont (feminine) |
| to be (past) |
|---|
| I was |
| you were |
| he was |
| she was |
| we were |
| they were |
| Past | Present | Future (will, should, etc.) |
|---|---|---|
| I ran | I run | I will/should/etc. run |
| you ran | you run | you will/should/etc. run |
| he ran | he runs | he will/should/etc. run |
| she ran | she runs | she will/should/etc. run |
| we ran | we run | we will/should/etc. run |
| they ran | they run | they will/should/etc. run |
Semantics is the meaning of words and of combinations of words. A lexicon, as already stated, stores words and their meanings. A program cannot understand these meanings, since to a program a sentence is simply a string of characters. Meaning, or semantics, is probably closer than syntax to what is meant by natural language processing.
In some respects semantics is probably more complicated than voice recognition, for instance. Voice recognition is possibly more dependent on microphone technology than on anything else. It is the breaking up of sounds into patterns of frequencies, which are known quantities. Dialects could cause problems, but new frequency patterns can be learned over a period of time. A computer can learn new frequency patterns more easily than it can learn semantics, because meaning is not necessarily interpretable from a set number of known patterns.
In principle, semantics is probably interpretable from a fixed number of known patterns. However, the known patterns of the meanings of word combinations could produce a phenomenal number of permutations of the meanings that can be derived from groups of similar, and sometimes even the same, words. In some languages, accents or expressions applied to words or groups of words can give them multiple meanings. The only way a computer could interpret meaning would be to take all of the factors involved into account. This would imply judging syntax, sound and word order simultaneously in order to determine meaning. The permutations in this are unimaginable, at least with current computer technology.
The future, however, is open to debate. The best currently available supercomputers may be able to accomplish this task, and five years from now those supercomputers will probably be on your wrist. Programmers probably will not be able to solve the software construction problem that quickly. However, we can now construct software which can learn simply by human interaction. Therefore computers will learn how to communicate with us on this level.
Let us take a quick look at two simple sentences. How would these simple sentences be validated semantically such that their meaning is correctly interpreted and understood, to the point where the two sentences could perhaps be explained by a computer in the computer's own words?
How would the computer distinguish between the two words "there" and "their"? Both words sound the same but are spelt differently. The only way to correctly deduce the different meanings of the two sentences is to compare the order of the other words in each sentence. The order of words in each sentence may assist in determining meaning.
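One very rough way to use word order is to look at the word that follows the ambiguous sound. The sketch below is hypothetical throughout: the word lists and the function name are invented for illustration, and a single following word is far too little context for real text.

```python
# Hypothetical word lists, for illustration only.
NOUNS = {"house", "dog", "car"}
BE_FORMS = {"is", "are", "was", "were"}

def choose_homophone(next_word: str) -> str:
    """Pick 'their' or 'there' from the word that follows the sound."""
    word = next_word.lower()
    if word in NOUNS:
        return "their"   # a possessive comes before a noun
    if word in BE_FORMS:
        return "there"   # as in "there is" or "there are"
    return "unknown"     # one following word is not enough context

print(choose_homophone("house"))  # their
print(choose_homophone("is"))     # there
```

Even this toy rule needs word types from a lexicon, which hints at how quickly the required information grows.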
Even these two simple sentences could generate enormous complexities in semantic analysis. Semantics, even in its simplest computer-based form, can potentially reference a very large amount of information at once. Meaning is uncovered based on inter-relationships between words on an enormous scale and can include standard responses to standard questions or statements. A machine can appear to understand by varying its answers and ensuring that things like tense are correctly constructed. Different languages, colloquialisms (slang) and multiple dialects could also cause problems in terms of the quantity of information required to validate word and sentence meanings.
A lexicon is by definition a set of words. A dictionary contains the same set of words as a lexicon but also contains meanings for each of those words. At least, that is what I mean by a computerised lexicon.
It should now be apparent that a computerised lexicon is much more complex than a book-form dictionary. A program cannot judge the meaning of a word based on the definition of that word. The computer cannot simply understand the meaning of the word in the dictionary, since the machine does not understand the meaning of the words comprising the explanation of that word.
A lexicon or a dictionary is a group of words and their associated meanings. Book form and computerised forms of lexicons could be very different. In general a lexicon is a storage area for data. In book form a lexicon contains words and meanings. A computerised form of lexicon could contain words, word inter-relationships and rules governing or triggered by access to those words or relationships.
At some stage the computerised lexicon becomes, firstly, an expert system and, secondly, a knowledge base.
An expert system or database is an expert in a specific area. Thus if one asks an expert system a question about engineering, and the expert system was programmed by an architect, one could get an abstract answer to a precise question. The point to note is that an expert system is written by an expert, for an expert, in a very specific field. An expert system gives standard answers to standard questions. In its simplest form an expert system will always give the same answer to the same question, and could have multiple questioning and answering pathways depending on user responses. However, it is important to note that there is no semantic capability. An expert system only appears to be intelligent based on its content.
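The idea of fixed answers reached through question pathways can be sketched as a small decision tree. The questions, the answers and the `consult` function below are all hypothetical; a real expert system would hold far more content, but the mechanism is the same lookup.

```python
# Hypothetical decision tree: each inner node holds a question and one
# branch per user response; each leaf is a fixed, standard answer.
TREE = {
    "question": "Is the structure load-bearing?",
    "yes": {
        "question": "Is it made of steel?",
        "yes": "Consult the steel design tables.",
        "no": "Consult the concrete design tables.",
    },
    "no": "No structural check is required.",
}

def consult(node, responses):
    """Follow the user's yes/no responses down the tree to a standard answer."""
    responses = iter(responses)
    while isinstance(node, dict):
        node = node[next(responses)]
    return node

print(consult(TREE, ["yes", "no"]))
print(consult(TREE, ["no"]))
```

The same responses always lead to the same answer, and nothing in the code interprets the meaning of the questions or answers.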
A knowledge base is a database repository of expert information. An expert system is like a knowledge base with a user-friendly front-end. A knowledge base, however, not only contains standard answers to standard questions but can also contain rules. These rules can be triggered under specified circumstances. The original idea of a knowledge base was that of an expert system repository which has some inherent processing power, i.e. the processing power is contained within the database in the form of event triggers firing rules and performing other tasks such as creating new knowledge base entries.
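A minimal sketch of a repository whose rules fire when an entry is added, and which can create new entries as a result. The class, the fact format (subject, relation, object) and the example rule are assumptions made for illustration, not a description of any particular product.

```python
class KnowledgeBase:
    """Stores facts and fires rules whenever a new fact is added."""

    def __init__(self):
        self.facts = set()   # each fact is a (subject, relation, object) tuple
        self.rules = []      # each rule is a (condition, action) pair

    def add_rule(self, condition, action):
        self.rules.append((condition, action))

    def add_fact(self, fact):
        if fact in self.facts:
            return  # already known, so rules are not fired a second time
        self.facts.add(fact)
        # Event trigger: adding a fact fires every rule whose condition matches.
        for condition, action in self.rules:
            if condition(fact):
                action(self, fact)

kb = KnowledgeBase()
# Hypothetical rule: anything that is a dog is also an animal.
kb.add_rule(
    lambda fact: fact[1:] == ("is_a", "dog"),
    lambda base, fact: base.add_fact((fact[0], "is_a", "animal")),
)
kb.add_fact(("Rex", "is_a", "dog"))
print(sorted(kb.facts))
```

Skipping facts that are already stored keeps a rule from triggering itself indefinitely.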
This leads us to the next item of interest, learning.
Intelligence is made up of a number of things. One of these is learning. Learning implies gaining more information by interaction with an environment. Thus if a computer is asked a question it does not have an answer to, it could store the question and request an answer. By storing both question and answer the computer is learning. This is a very simplistic description. Neural networks are effective at learning, particularly with language. This is because language is generally finite in comparison to other things. Language is easily learned because there are lots of people to talk to. The problem with learning is that the more one learns, the more searching is done when questions are asked, and searching takes time. Neural networks attempt to solve this problem by modelling the way the brain works: in short, direct and indirect dynamic node interconnections. Isaac Asimov called them positronic pathways, routes from one point to another. Neural networks are highly complex and intensive in processing time.
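The simplistic form of learning described above, storing a question and requesting an answer, might look like the sketch below. The `Learner` class and the `teacher` callback are hypothetical names, and this is not a neural network, only the store-and-ask idea.

```python
class Learner:
    """Remembers answers it has been given and asks when it has none."""

    def __init__(self):
        self.answers = {}  # normalised question -> answer

    def ask(self, question, teacher=None):
        key = question.strip().lower()
        if key in self.answers:
            return self.answers[key]   # already learned
        if teacher is None:
            return None                # no answer known and nobody to ask
        answer = teacher(question)     # request an answer from the environment
        self.answers[key] = answer     # store question and answer together
        return answer

learner = Learner()
print(learner.ask("What is a lexicon?"))
print(learner.ask("What is a lexicon?", teacher=lambda q: "A set of words."))
print(learner.ask("what is a lexicon?"))
```

The third call is answered from memory, without consulting the teacher again.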