Herself’s Artificial Intelligence

Humans, meet your replacements.

Archive for February, 2007

Perceptron

without comments

Perceptron

Rosenblatt added the learning law to the McCulloch-Pitts neurode to make it Perception, which is the first of the neural net learning models. The perception has one layer of inputs and one layer of outputs, but only one group of weights. If data points on a plot are linearly separable (we can draw a straight line separating points that belong in different categories), then we can use this learning method to teach the neural net to properly separate the data points.

The McCulloch-Pitts neurode fires a +1 if the neurode’s total input the sum of each input * its weight + some bias function is greater than the set threshold. If it is less than the set threshold, or if there is any inhibitory input a -1 is fired. If the weights are chosen to be 1 for each input and the threshold is zero, then the bias is chosen to be 0.5 input*weight then the neurode works as an AND function. If the bias is chosen to be -0.5 then the neurode acts as a OR function. If the bias is chosen to be 0.5 it behaves as a NOT operator. Any logical function can be created using only AND, OR and NOT gates so a neural net can be created with McCulloch-Pitts neurodes to solve any logical function.

We start with a weight vector that has its tail at the origin and a randomly picked point. Each data point is input to the neurode and it responds with either a +/1, the weight vector is multiplied by the correct output. This is done until all data points are input and the neurode gives the correct output for each point.

The perception fell out of favor since it can only handle linearly separable functions which means simple functions like XOR, or parity can not be computed by them. Minsky and Papert published a book ‘Perceptions’, in the 1980′s, that proved that one and two layer neural nets could not handle many real world problems and research fell off for about twenty years in neural nets.

An additional layer and set of weights can enable the Perception to handle functions that are not linear. A separate layer is needed for each vertex needed to separate the function. A 1950′s paper by A.N. Kilmogorov published a proof that a three layer neural network could perform any mapping exactly between any two sets of numbers.

Multi layered perceptrons were developed than can handle XOR functions. Hidden layers are added and they are trained using backpropagation or a similar training algorithm. Using one layer linearly separable problems can be solved. Using two layers regions can be sorted and with three layers enclosed regions can be sorted.

More information:
Handwritten Arabic Numeral Recognition using a Multi Layer Perceptron

Written by Linda MacPhee-Cobb

February 28th, 2007 at 12:00 pm

Hebbian Learning

without comments

Hebbian Learning

“When the axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” [D.O.Hebb, The Organization of Behavior]

In other words, in a neural net, the connections between neurodes get larger weights if they are repeatedly used during training.

There are adjustments that have been made to this rule. Weights are bounded between -1.0 and 1.0. Neurodes that are not used are decreased in value. Neohebbian Learning takes this into consideration. It iteratively computes each nodes connection weights using NewWeight = OldWeight F*ForgottenWeight + N*NewLearningWeight. F, N are constants between 0 and 1.0, F being how quickly to forget and N being how quickly to learn.

Differential Hebbian Learning adjusts the learning and forgetting by pro portion to the amount of change in weight since last cycle. Which is just the derivative of the neurode’s output over time.

Drive reinforcement theory developed by Harry Klopf is a learning system that modifies differential Hebbian learning. The weight increase depends on the product of the change in the output signal of the receiving neurode and the weighted sum of the inputs over time. This allows some temporal learning to occur in the system. This system is closer to the classical conditioned training done by Pavlov. 207

See also:
Reinforcement Learning

Written by Linda MacPhee-Cobb

February 27th, 2007 at 12:00 pm

Neural Networks

without comments

Neural Networks

Neural nets are good at doing what computers traditionally do not do well, pattern recognition. They are good for sorting data, classifying information, speech recognition, diagnosis, and predictions of non-linear phenomena. Neural nets are not programmed but learn from examples either with or without supervised feedback.

Modeled after the human brain, they give more weight to connections used frequently and reduce the size (weight) of connections not used. Some neural nets must be supervised while learning, given data to sort and given feedback as to whether data is correctly sorted, forward feed back propagation networks are the best understood and most successful of these. Some, such as self organizing networks, figure things out for themselves.

If a neural net is too large it will memorize rather than learn. Neural nets usually are composed of three layers, input, hidden, and output. More layers can be added, but usually little is gained from doing so. The connections vary by the network type. Some nets have connections from each node in one layer to the next, some have backward connections to the previous layer and some have connections with in the same layer.

McCulloch and Pitts, in 1943, proved that networks comprised of neurodes could represent any finite logical expression. In 1949 Hebb defined a method for updating the weights in neural networks. Kolmogorov’s Theorem was published in the 1950′s. It states that any mapping between two sets of numbers can be exactly done with a three layer neural network. He did not refer to neural networks in his paper, this was applied later. His paper also describes how the neural network is to be constructed. The input layer has one neurode for every input. These neurodes have a connection to each neurode in the hidden layer. The hidden layer has (2*n + 1) Neurodes, n is the number of inputs. The hidden layer sums a set of continuous real monotonically increasing functions, like the sigmoid function. The output layer has one neurode for every output. 205

Rosenblatt in 1961 developed the Perception ANN (artificial neural network). In the 1960′s Cooley and Tucky devised the Fast Fourier Transform algorithm which made signal processing with neural networks feasible. Widrow and Hoff then developed Adaline. 1969 was the year neural networks almost died. A paper published by Minsky and Papert showed that the XOR function could not be done with the Adeline and other similar networks. 1972 brought new interest with Kohonen and Anderson independently published papers about networks that learned with out supervision, SOM, (self organizing maps). Grossberg and Carpenter developed the ART (adaptive resonance theory) which learns with out supervision in the late 1960′s. The 1970′s brought NEOCOGNItrON, for visual pattern recognition. Hopfield published PDP (“Parallel Distributed Processing”) in three volumes. These books described neural networks in a way that was easy to understand.

Neural networks map sets of inputs to sets of outputs. Learning is what shapes the neural networks surface. Supervised learning algorithms take inputs and match them to outputs, correcting the network if the output does not match the desired output. Unsupervised learning algorithms do not correct the output given by the neural net. The net is provided with inputs, but not with outputs.

Training data for a neural net should be fairly representative of the actual data that will be used. All possibilities should be covered and the proportion of data in each area should match the proportion in the real data. Ways of training of neural nets:
Hard coded weights determined by experience or mathematical formulas can serve in place of a training algorithm.
Supervised training uses input and matching output patterns to let the net set the weights.
Graded training only uses input patterns, but then the neural net receives feedback on how accurate its answer is.
Unsupervised Training uses only input patterns then the neural nets out put is the correct answer.

Autonomous learning in neural nets is different from other unsupervised learning systems in that the neural net can learn selectively, it doesn’t learn every pattern input, only those that are ‘important’. An autonomous learning neural net has the following capabilities; it organizes information into categories without outside input and will reorganize them if it makes sense to do so; it retrieves information from less than perfect input; it is configured to work in parallel to keep speed reasonable; the system is always selectively learning; priorities given to input patterns can change; it can generalize; and it has more memory space than it needs; it must be able to expand and add to its knowledge rather than overwriting previously learned knowledge. Of course something this wonderful should also make your coffee and sort your email for you too.

The delta rule is used for error correction in backpropagation networks. This is also known as the least mean squared rule. NewWeight = OldWeight LearningConstant*NeurodeOutput(desiredOutput-actualOutput) The delta rule uses local information for error correction. This rule looks for a minimum. In an effort to find a minimum it may find a local minimum rather than the global minimum. Picture trying to find the deepest hole in your yard, if you measure small sections at a time you may locate a hole but it may not be the deepest in the yard. The generalized delta rule seeks to correct this by looking at the gradient for the entire surface, not just local gradients.

Simulated annealing is a statistical way to solve optimization problems, like setting a schedule or wiring a network. Boltzmann networks use this algorithm to learn. A random solution is chosen and compared to the current best solution found. The better of the two is kept and then depending on the problem some random changes are made. The amount of randomness in each loop is decreased over time allowing the net to slowly settle into a solution. The randomness helps to keep the net from settling into local minimas rather than global minimas.

Self organization is a form of unsupervised learning. This sets weights with a ‘winner take all’ algorithm. Each neurode learns a classification. Input vectors will be classed into the group to which they are closest.

The Lyapunov function, also known as the energy function, is used to test for convergence of the neural network. The function decreases as the network changes and assures stability.

Neural net building tool ( Java )

More information:
Birth of a Learn Law, Steve Grossberg

Introduction to Neural Networks

See also:
Neural networks in financial modeling
Song of the neurons

Written by Linda MacPhee-Cobb

February 26th, 2007 at 12:00 pm

Inter agent communication

without comments

Inter-agent Communication

One of the obstacles in designing agents is to find a common language for them to use to communicate with each other and other people’s agents. Right now each designer writes a communications system for her own agents, so they can only talk to each other, much like people who speak only one language can only speak with others who speak the same language. Several ideas have been put forth and these are the most promising.

KQML (Knowledge Query Markup Language) messages carry information about the type information they are transmitting; assertion, request, query. Performatives are the primitives which define permissible operations agents may do in effort to communicate with each other.

It uses special agents called facilitators that handle many tasks: Track locations of agents by specific identity or type of service; Track services available and needed by agents; Act like post offices, holding, forwarding, receiving messages for agents; Translate between agent communication languages; Break complex problems in to parts and distribute tasks to agents that can handle them; Monitor the agents.

KQML uses categories or levels for agent communication content the content of the message, text, binary strings etc. communication sender, recipient, message ids. message identifies protocols for message transfer, handles encoding, and de scriptions of content

Requirements of KQML
form simple, declarative, and easily understood by humans
semantics Should be familiar, unambiguous, well grounded in theory
implementation Needs to be efficient and backward compatible
networking Platform independent across networks and support synchronous and asynchronous.
environment will be distributed, dynamic and non-heterogeneous.
reliability must be reliable and secure
content the language should be layered, like all networked software.

There are still some difficulties with KQML. There are ambiguities and vagueness and misdirections, misclassifications and some things that are needed but not yet in existence, in statements known as performatives (statements that work just by declaration).

KSE (Knowledge Sharing Effort) is sponsored by the Advanced Research Projects Agency. It is a project to put together a uniform method for agents to communicate with each other. KQML is one of the projects being developed. The two main subproblems are: translating from one representation language to another; content sharing among applications. Communication protocols must develop: an interaction protocol; a communication language; and a transport protocol (SMTP, HTTP, …).

ACL (Agent Communication Language) has three pieces; vocabularies, KIF, and KQML. The vocabulary uses an open ended dictionary of terms that can be referenced by agents.

KIF (Knowledge Interchange Format) is particular syntax format, similar to Lisp, for first predicate calculus communication between agents. KIF can perform translation from one language format to another. It can also be used to communicate between agents. KIF is a first order predicate calculus using prefix format. It supports the definition of objects, functions, relations, rules and meta knowledge. It is not a programming language. KIF has three main parts; variables, operators, and constants. KIF has two types of variables; individual (begin with ?) and sequences (begin with @). It has four operators; term (objects), rule (legal logical inference steps), sentence (facts), definition (constants). A form is a sentence, rule or definition.

Telescript is an object orientated remote programming language for use with mobile agents developed by General Magic. It has three main parts: a language for developing agents and environments; an interpreter for the Telescript language; and communication protocols (TCP/IP). The entire application can be written in Telescript but usually a combination of Telescript and C/C++ is used.

KAoS (Knowledgeable Agent-orientated System) differs from the other inter agent communication methods in that it considers not only the message but the sequence of messages in which it occurs. This enables agents to coordinate frequently recurring interactions. KAoS makes use of ‘agent orientated programming’ which is an extension of object orientated programming. This provides a consistent structure for the agents and an easier way to do agent programming. The agents contain: Knowledge (facts, beliefs); Desires; Intentions; and Capabilities. From birth the agent goes into a loop of ‘updating the structure’ and ‘formulating and acting on intentions’, unless it is in a cryogenic state, until its death.

Communication takes place with messages containing verbs, and informa tion. The messages are structured much like the KQML messages. Communication between agents takes place only within the domain the agents are in. Proxy agents communicate between domains in a given environment. Mediation agents communicate with outside agents.

Instances of agents of particular classes are created to work in various do mains. Using inheritance specialized agents are created. Domain managers control entry and exit of agents in a domain, and matchmaker agents give ac cess and information about services in the domain they are in.

More information:
Robots evolve and learn how to lie

See also:
Ant Algorithms

Written by Linda MacPhee-Cobb

February 23rd, 2007 at 12:00 pm

Knowledge Representation and Predicate Calculus

without comments

One of the trickier parts of designing artificial intelligence is how to represent information to something that can not understand language. Knowledge representation is defined by its syntax and its facts (semantics) which together along with the rules for deducing new sentences make up the logic. Entailment is the relationship between current sentences and new sentences derived from the current sentences. A good knowledge representation language will combine the ease of spoken and written languages, like English, with the conciseness of a computer language, like C.

Propositional (boolean) Logic This consists of a syntax, the semantics and rules for deducing new sentences. Symbols are used to represent facts which may or may not be true. W may 72 represent the fact it is windy outside. The symbols are combined with boolean logic to generate new sentences. Facts can be true or false only. The program may know a statement is true or false or be uncertain as to its truth or falsity. The syntax consists of , v, => (if x then y), (equal), and (not). Backus-Naur Form (bnf) is the grammar used to put sentences together. The semantics are just the interpretation of a statement about a proposition. Truth tables are used to define the semantics of sentences or test validity of statements.

Truth table for AND

T and T = T

T and F = F

F and F = F

F and T = F

A model is a world in which a sentence is true under a particular interpreta tion. There can be several models at once that have the same interpretations. An inference rule is a rule that is derived from a pattern that is true all the time for specific cases.

First Order Logic (first-order predicate calculus)
This consists of objects, predicates on objects, connectives and quantifiers. Predicates are the relations between objects, or properties of the objects. Connectives and quantifiers allow for universal sentences. Relations between objects can be true or false as well as the objects themselves. The program may not know whether something is true or false or give it a probability of truth or falseness.

Procedural Representation
This method of knowledge representation encodes facts along with the sequence of operations for manipulation and processing of the facts. This is what expert systems are based on. Knowledge engineers question experts in a given field and code this information into a computer program. It works best when experts follow set procedures for problem solving, i.e. a doctor making a diagnoses.

The most popular of the Procedural Representations is the Declarative. In the Declarative Representation the user states facts, rules, and relationships. These represent pure knowledge. This is processed with hard coded procedures.

Relational Representation
Collections of knowledge are stored in table form. This is the method used for most commercial databases, Relational Databases. The information is manipulated with relational calculus using a language like SQL. This is a flexible way to store information but not good for storing complex relationships. Problems arise when more than one subject area is attempted. A new knowledge base from scratch has to be built for each area of expertise.

Hierarchical Representation
This is based on inherited knowledge and the relationships and shared attributes between objects. This good for abstracting or granulating knowledge. Java and C++ are based on this.

Semantic Net
A data graph structure is used and concrete and abstract knowledge is represented about a class of problems. Each one is designed to handle a specific problem. The nodes are the concepts features or processes. The edges are the relationships (is a, has a, begins, ends, duration, etc). The edges are bidirectional, backward edges are called ‘Hyper Types’ or ‘Back Links’. This allows backward and forward walking through the net. The reasoning part of the nets includes: expert systems; blackboard architecture; and a semantic net description of the problem. These are used for natural language parsing and databases. Predicate Logic and propositional Most of the logic done with AI is predicate logic. It used to represent objects, functions and relationships. Predicate logic allows representation of complex facts about things and the world. (If A then B). A ‘knowledge base’ is a set of facts about the world called ‘sentences’. These are put in a form of ‘knowledge representation language’. The program will ‘ASK’ to get information from the knowledge base and ‘TELL’ to put information into the knowledge base. Using objects, relations between them, and their attributes almost all knowledge can be represented. It does not do well deriving new knowledge. The knowledge representation must take perceptions and turn them into sentences for the program to be able to use them, and it must take queries and put them into a form the program can understand.

Resolution and unification
Resolution: prove A true by proving A Not is false. Unification: take two predicate logic sentences and using substitutions make them the same. Unification is the single operation that can be done on data structures (expression trees) in Prolog. These are the techniques used to process predicate logic knowledge and the are the basis for Lisp and Prolog.

Frames
Each frame has a name and a set of attribute-value pairs called slots. The frame is a node in a semantic network. Hybrid frame systems are meant to over come serious limitations in current setups. They work much like an object oriented language. A frame contains an object, its attributes, relationships and its inherited attributes. This is much like Java classes. We have a main class and sub classes that have attributes, relationships, and methods for use.

A logic has a language, inference rules, and semantics. Two logical languages are propositional calculus and predicate calculus.

Propositional Calculus which is a descendant of boolean algebra is a language that can express constraints among objects, values of objects, and inferences about objects and values of objects.

The elements of propositonal calculus are:
Atoms: the smallest elements
Connectives: or, and, implies, not
Sentences: aka ‘well-formed formula’s, wffs

The legal wwfs
disjunction: or
conjunction: and
implication: implies
negation: not

Rules of inference are used to produce other wwfs
modus ponens: ( x AND ( x implies y) ) implies y
AND introduction: x, y implies ( x AND y )
AND commutativity: x AND y implies x
OR introduction: x, y implies ( x OR y )
NOT elimination: NOT ( NOT x ) implies x
resolution: combining rules of inference into one rule:
{ example: (x OR y) AND ( NOT y OR z ) = x OR z
Horn clauses: a clause having one TRUE literal, there are three types: a single atom (q); an implication or rule ( p AND q = r ); a set of negative literals ( p OR q => ) . These have linear time algorithms.

Definitions:
Semantics:
associations of elements of a language with the elements of the domain

Propositions:
a statement about an atom, example: The car is running. Car is the atom, ‘is running’ is the proposition.

interpretation:
is the association of the proposition with the atom

denotation:
in a given interpretation the proposition associated with the atom is the denotation

value:
TRUE, FALSE, given to an atom

knowledge base:
a collection of propositional calculus statements that are true in the domain

truth table:
a tablular format for representing states

satisfies:
a true statement under a given interpretation

model:
an interpretation that satisfies each statement in a set of statements.

validity:
a statement that is TRUE under all interpretations

equivalence:
statements are equivalent if their truth values are identical under all interpretations.

Examples:
DeMorgan’s Laws:
NOT (x OR y ) = ( NOT x ) AND ( NOT y )
NOT ( x AND y ) = ( Not x ) OR ( NOT y )

Contrapositive:
( x implies y ) = ( NOT x implies NOT y )
if ( x = y ) then ( x implies y ) = ( y implies x )

Propositional satisfiability, aka PSAT
a model formula that comprises the conjunction of all the statements in the set.

Predicate Calculus takes propositional calculus further by allowing statements about propositions as well as about objects. This is first order predicate calculus.
Contains:
object constants, term: strings of characters, xyz, linda, paris
relation constants: divided by, distance to/from, larger than
function constants: small, big, blue
functional expression: examples: distance(here, there); xyz;
worlds: can have infinite objects, functions on objects, relations over objects
interpretations: maps object constants into objects in the world
quantifiers: can be universal or for a selected object or group of objects

Predicate Calculus is used to express mathematical theories. It consists of sentences, inference rules and symbols. First-order predicate calculus symbols consist of variables about which a statement can be made, logic symbols (and, or, not, for all, there exists, implies) and punctuation ( ‘(‘, ‘)’ ).

If we have a set S in which all of the statements are true then S is a model. If S implies U then U is true for all models of S and NOT U is false for all models of S. If we make a set S’ which has all of the statements of S and the statement NOT U it is not a model. All statements in a model must be true. S’ is unsatisfiable since there is no way for the statements of S and the statement NOT U, both of which are in S’ to be true at the same time. This is used to prove formulas in theorem proving. To show S implies U is is sufficient to show S’= S, NOT U is unsatisfiable.

Resolution and unification

Resolution: prove A true by proving A Not is false.

Unification: take two predicate logic sentences and using substitutions make them the same. Unification is the single operation that can be done on data structures (expression trees) in Prolog. These are the techniques used to process predicate logic knowledge and the are the basis for Lisp and Prolog.

Resolution is one way to prove unsatisfiability.

More information:
Programming in Haskell ( pdf)
Real World Haskell
The Haskell Road to Logic (pdf)
First Order Predicate Calculus
Syntax of Predicate Calculus (pdf)
Knowledge Representation and Predicate Calculus (pdf)
Predicate Calculus (pdf)

See also:
Knowledge based expert systems

Written by Linda MacPhee-Cobb

February 20th, 2007 at 12:00 pm