Herself’s Artificial Intelligence

Humans, meet your replacements.

Archive for February, 2007

Reasoning and common sense

without comments

Reasoning Programs and Common Sense

Programs that have commonsense are some of the earliest attempts at artificial intelligence. The problem with the commonsense programs is there is no way to easily define the knowledge in a way the computer can use with out the computer understanding language. Little success was made here, but the descendants of common sense programs, expert systems have been very successful at problem solving of the type people learn in college, like engineering, law and medical diagnoses. From these studies came the computer languages Lisp and Prolong.

“Common sense knowledge includes the basic facts about events (including actions) and their effects, facts about knowledge and how it is obtained, facts about beliefs and desires. It also includes the basic facts about material objects and their properties.” [John McCarthy]

John McCarthy’s paper on common sense shaped the artificial intelligence movement for several decades. The Advice Taker is a program that was to have commonsense. He states that ‘A program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows’. So the goal is for the Advice Taker to be able to discover simple abstractions. McCarthy defines proof of a program having common sense to be: that all behaviors be representable in the system; interesting changes in behavior must be expressible in a simple way; the system must understand the concept of partial success; and the system must be able to create subroutines to be used in procedures.

Another attempt at a general problem solving program following this type of reasoning is the “General Problem Solver” [Newell, Shaw, Simon, Ernst]. The program was given objects and operators with which to compare objects, take actions, and develop situations. An initial situation and a goal are given to the program and the program determines how to get from here to there. Three basic steps are taken to do this:
Evaluate difference between current situation and the goal state
Find an operator that lowers the difference between current and goal state
If it is legal to apply this operator, do so, else determine a situation that can use that operator and set it as a short term goal.

Reasoning Programs are the next step in this line of research. They will need to be able to sense and or find information about their environment, to prove whether or not solutions exist to problems given them. They will need to be able to reason out steps from initial situations to goals. Importantly it will need a language to define objects, goals, operators, logic, and temporality, states of being and other things.

Situational states or states of being are data structures and often defined with matrices, lists, statements, sets, trees, or more commonly graphs. Situational calculus, a derivative of first-order predicate calculus was developed to allow for states to be described as data structures as well as the effect of actions on states.

Common sense is very different than intelligence or education. Some people have one, two, or all three of these qualities. Teaching and testing for common sense has not progressed well with people and will probably not do well with computers until we have a greater understanding of exactly what common sense is, how it is acquired and how it can be tested for. Some other problems developing these systems are putting common sense into a language that is easily understood by people and computers. A second major problem has been representing time and changes that occur over time. Common sense seems to be learned from doing rather than being taught, so it may be that agents may gain common sense about the computer network they exist on, or further down the line robots may gain a bit of what we consider common sense about our world.

See also:
Knowledge based expert systems

Written by Linda MacPhee-Cobb

February 19th, 2007 at 12:00 pm

Bayesian Logic

without comments

Bayesian neural networks and expert systems (a.k.a. Uncertainty represen tation, belief networks, probabilistic networks) are based on “Bayes’ rule” or “Bayes’ theorem” which is a statistical theory. It was developed by Thomas Bayes in the 18th century. It takes and flips the probability given the original conditional probability. This is used to deal with uncertainty in expert systems. They often form the main part of spell correcting and speech recognition programs.

P (y|x) = P (x|y)P (y) / P (x)
Example:
P(A) is the event of a person having cancer (10%)
P(B) is the event of person being a smoker (50%)
P(B–A) is the percent cancer patients who smoke (80%)
We wish to know the likelihood of the smoker having cancer P (A|B) = (.8.1) .5 = .16 or a 16%chance.

A Bayesian network is an acyclic tree graph. An acyclic tree graph can not cycle back to previous conditions. Its nodes, occurrences, contain the possible outcomes and tables of probabilities of each considering the inputs to this node. The connecting edges contain the effects of occurrences on one another. The probabilities of all occurrences must total 100%, and all occurrences must be accounted for.

A node must be conditionally independent of any subset of nodules that are not descendants of it, this reduces the number of possibilities for each node that must be calculated.

There are three commonly used patterns of inference in Bayes Networks; Top-down which uses a chain rule to add up probabilities; Bottom-up which uses Bayes Rule; and a hybrid system. All of these use recursion in the algorithms making them computationally intensive.

Children of a parent node can be independent of each other, none of them contributing to the probabilities of another. In that case the parent is said to d-separate them. This can be used to cut down the number of calculations needed to work through the net.

The is network trained by giving the likely probabilities to seed it. When something new happens the probabilities are re-evaluated. This causes all the probabilities to be re-calculated (remember they must total 100%). The network structure must also be determined. Often this can be done before training occurs. Hidden nodes can sometimes help reduce the size of the network.

More information:
Bayesian Nets
Java Bayes

See also:
Graphical models of knowledge representation

Written by Linda MacPhee-Cobb

February 16th, 2007 at 2:00 pm

Hidden Markov Models

without comments

These have been used in speech recognition, handwriting recognition and in many biotechnology projects.

Markov Chain: is a statistical technique that uses a weighted automaton, a weighted directed graph, in which the input sequence uniquely determines the path through the automaton to the output observed.

Hidden Markov Model: is a weighted automaton in which only one path is allowed per specific input. The Viterbi is the most commonly used algorithm for processing these models. Viterbi Algorithm traces through state graph multiplying the probabilities.

If the probability from the previous level is higher it back traces Example, for words: need (n-iy), neat (n-iy-t), new (n-uw), knee (n-iy)

Begin
n/1.0
iy/.64 uw/.36
t/.24 d/.315
End

Possible paths are:
new/n – uw => 1.0 .36 => .36
neat/n – iy – t => 1.0 .64 .24 => .128
need/n – iy – d => 1.0 .64 .445 => .178
knee/n – iy => 1.0 .64 .315 => .2016

The first loop checks n – uw1.0 .36 = .36 and n – iy1.0 .64 = .64 64 is a higher probability so we pursue that

Next pass gives us iy – t.64 .24 = .128
iy.64 .315 = .2016
iy – d.64 .445 = .178

But these are smaller than the .36 we collected as a high probability in the previous pass so we back track to that. If there were more levels through our graph we would continue this loop until reaching the end.

The probabilities are calculated as so: weight = -log[actualprobability] so if the probability of n – uwis.44 the graph weight is -log(.44) => .36

See also:
Simple example, Java source parsing and printing text using Markov models
Graphical models of knowledge representation

Papers:
An Efficient Hidden Markov Mobel for Offline Handwritten Numerical Recognition
Network as a computer: ranking paths to find flows ( pdf )
A spectral algorithm for learning hidden markov models

Written by Linda MacPhee-Cobb

February 15th, 2007 at 2:00 pm

Support Vector Machines

without comments

Support Vector Machines

Support Vector Machines (SVM) are based on a non-linear generalization of the ‘Generalized Portrait’ algorithm developed in Russia in the 1960s. They have been around since the 1970s but only recently have begun to attract attention. They have been successfully used for handwriting and speech recognition, as well as speaker recognition and have the ability to pull objects such as faces out of images.

Support vector machines can sort data into two classes, it is in the set or it is not in the set. Data which is not linearly separable can become linearly separable in higher dimensions. However, if data is put into too many dimensions then data classes are memorized rather than learned. If the data is memorized then new data is not handled well, this is known as over-fitting. The error rate of SVM can be explicitly calculated which can not be done for neural networks.

We want to create a hyperplane that gives the maximum possible distance between the points in the set and the points out of the set, with the maximum margin around it. So we want the maximum distance between the point in each set that is closest to the other group. We then create a margin of two lines between the data. The main method used to do this is using Lagrange Multipliers. (aka the ‘Quadratic Programming Problem’ most better spreadsheets and math programs have this built in to them.)

The ‘kernel’ is a formula for the dot product in higher dimensional ‘feature space’. Feature space is the higher dimension space we have mapped our data into to make it linearly separable. A Polynomial of various dimensions and Gaussian Kernels are the most commonly used.

More information:
Overview of Support Vector Machines
Intro to SVM

Written by Linda MacPhee-Cobb

February 14th, 2007 at 2:59 am

User Interfaces

without comments

Computers were originally only used by people familiar with the workings of them and programmers. The GUI (Graphical User Interface) originally was demonstrated by Ivan Sutherland in Sketchpad, his Phd thesis. Others attribute the GUI interface design to the smalltalk Project at Xerox. Apple was the company to bring the GUI to the public, followed by Windows and lastly by Unix/Linux users, who stubbornly considered it to be unnecessary overhead.

Later came hypertext and the Internet morphed into the WWW as we know it today. Hypertext was first used by Vannevar Bush in MEMEX. The term hypertext was coined later by Ted Nelson. The universities used it in information tools and Apple used it in HyperCard bringing the idea to the public. Others attribute the first use of hyperlinks to Dynatext.

Interactive interfaces with bots and agents combined with speech recognition are the current cutting edge in interfaces. Many new concerns come with the new interfaces. A too human appearing interface may be scary, or may attributed far more intelligence than it should be. Consider how much intelligence the public already attributes to computers. The user interface should speed up and make the task easier, not exist to entertain and should not slow down the task at hand.

Some interfaces may be able to do speech interaction. If the task is user intensive such as word processing, searches, etc. Then speech in place of typing will not slow things down since the computer is already tied to the users speed. Otherwise traditional keyboarding and mouse interfaces should be used.

Other, non-anthropomorphic interfaces should be considered. People take in large amounts of information visually. Looking at a plot is far more informative than looking a list of numbers. A graphical user interface can convey much more information than a textual one. TreeViz conveys information about files stored on a users hard drive. It uses color sound and shape to map the whole drive on the screen in front of the user for easy cleanups. Animations should only be used to convey information quickly, or speed up user interactions.

See also:
Does your computer have a Halloween costume?
Games are merging fantasy with reality
Just how real is that dinosaur you are carrying?

Written by Linda MacPhee-Cobb

February 13th, 2007 at 2:00 pm