Archive for the ‘topics in artificial intelligence’ Category
Reinforcement Learning
This is an algorithm where an artificial intelligence interacts with the environment, receives either positive or negative feedback and adjusts its behavior due to this reinforcement.
Reinforcement learning is usually used to solve Markov decision problems. Markov decision problems have a set of states; possible actions for each state; probabilities and rewards for each action; and a goal for the artificial intelligence to reach.
We start by setting up an environment of states. The artificial intelligence then repeatedly explores our states and will learn an optimal method of finding a path from start to goal that gains it the most rewards. Think of a game character starting on one square of a grid and trying to get to the final square. Between the start and end square are squares filled with gold and traps, like a Wupus World game board. Reinforcement learning is used in control problems for robotics, not just in games.
Learning can be supervised or unsupervised.
One popular unsupervised reinforcement learning algorithm is the Q Learning Algorithm.
Start:
Loop:
For current state select an action
Collect reward or cost
Update Q Value
Move to new state
End Loop:
Update Q Value
Q-Value += Learning_Rate ( Reward_New_State + Discount_Factor( Max_Q_New_State – Q-Value ) )
Move to new state
Use either Greedy algorithm or a Probabilistic Greedy which calculates best choices and randomly chooses one of the better choices.
This algorithm can be slow and memory intensive but has been used successfully in game problems, scheduling problems, and robotic problems.
See also:
Hidden Markov Models
Why Reinforcement Learning is Important
Reinforcement Learning (CASTrader Blog)
Wiki, Reinforcement Learning
Wiki, Q Learning
More information:
Ms. Pac-Man Plays Herself
Ant algorithms
The ant algorithm was developed from real ants. Ants searching for food leave a trail of pheromones. An ant who takes the shortest path between the nest and the food leaves the most amount of pheromone since he makes more trips. The stronger the pheromone scent the more ants follow that trail.
This has been applied to the Traveling Salesman Problem, as well as to scheduling problems. Though it is an interesting solution it is not fast so may have limited applications with out modification.
This algorithm assumes a fully connected graph, otherwise your ants may quickly box them selves into corners. Each ant starts in a different city. Each ant chooses a city he has not yet visited. The choice is based on a greedy heuristic ( shortest path ) and the path with the greatest amount of pheromone.
Once a path is chosen, the ant puts down pheromone. The amount of pheromone is some constant divided by distance. This gives more pheromone to shortest paths.
The older pheromone evaporates. So each round we decrease the pheromone on each edge by a small amount.
Algorithm:
-choose a map or generate one
-generate map ( pick a specific number of cities, generate a random x,y for each to give it a position. )
-create one ant for each city
-start each ant in a different city
LOOP
for each ant {
pick city to move to
lay down pheromone along path just traveled
add this city to city list so we can see the path the ant took when we are done
update the total distance ant traveled so far
}
for each path {
decrease pheromone
}
END LOOP
Print shortest path
} //end
Pick_City( cities_ant_can_move_to ){
beta //is length or amount of pheromone most important? Use a number between 0 and 1
for each city ant may choose from this location{
Score = Amount_of_Pheromone + Length_to_City^Beta
If this score lowest keep
}
Move on path with least score
}
Lay_down_pheromone( edge ){
gamma //some constant between 0 & 1 to adjust pheromone
pheromone_on_this_edge += gamma/length_of_edge
}
Decrease_pheromone_with_time( edge ){
alpha // some number between 0 and 1
pheromone_on_this_edge *= alpha
}
Ant{
total_distance_traveled;
cities_traveled_in_order;
}
Edge{
City1;
City2;
length;
current_amount_of_pheromone;
}
More information:
The ant algorithm was developed from real ants. Ants searching for food leave a trail of pheromones. An ant who takes the shortest path between the nest and the food leaves the most amount of pheromone since he makes more trips. The stronger the pheromone scent the more ants follow that trail.
This has been applied to the Traveling Salesman Problem, as well as to scheduling problems. Though it is an interesting solution it is not fast so may have limited applications with out modification.
This algorithm assumes a fully connected graph, otherwise your ants may quickly box them selves into corners. Each ant starts in a different city. Each ant chooses a city he has not yet visited. The choice is based on a greedy heuristic ( shortest path ) and the path with the greatest amount of pheromone.
Once a path is chosen, the ant puts down pheromone. The amount of pheromone is some constant divided by distance. This gives more pheromone to shortest paths.
The older pheromone evaporates. So each round we decrease the pheromone on each edge by a small amount.
Algorithm:
-choose a map or generate one
-generate map ( pick a specific number of cities, generate a random x,y for each to give it a position. )
-create one ant for each city
-start each ant in a different city
LOOP
for each ant {
pick city to move to
lay down pheromone along path just traveled
add this city to city list so we can see the path the ant took when we are done
update the total distance ant traveled so far
}
for each path {
decrease pheromone
}
END LOOP
Print shortest path
} //end
Pick_City( cities_ant_can_move_to ){
0<beta<1 //is length or amount of pheromone most important?
for each city ant may choose from this location{
Score = Amount_of_Pheromone + Length_to_City^Beta
If this score lowest keep
}
Move on path with least score
}
Lay_down_pheromone( edge ){
gamma //some constant between 0 & 1 to adjust pheromone
pheromone_on_this_edge += gamma/length_of_edge
}
Decrease_pheromone_with_time( edge ){
alpha // some number between 0 and 1
pheromone_on_this_edge *= alpha
}
Ant{
total_distance_traveled;
cities_traveled_in_order;
}
Edge{
City1;
City2;
length;
current_amount_of_pheromone;
}
See also:
SantaFe Ant trail description and source code
Ant Colony Algorithm for Weighted Item Layout Optimization Problem
More information:
Central Intelligence: From Ants to Web ( YouTube TED video )
Artificial Intelligence network load balancing using Ant Colony Optimisation ( source code and article )
Ant Colony System: A Cooperative Learning Approach to the TSP
ACO Algorithms for the TSP
Wiki, Ant colony optimization
Ants riddled with cheating and corruption ( perhaps they are not as simple as we think? )
Classifier systems
John Holland in the mid 1970s designed the first ‘classifier systems’. These systems take inputs, match them to known conditions then perform the action requested.
Inputs can be true, negative, or not relevant ( 1, -1, 0 ). Say we have a game character who can be hungry, thirsty both or neither. We would then have inputs:
| hungry | thirsty |
| -1 | -1 |
| -1 | 1 |
| 1 | -1 |
| 1 | 1 |
Inside our program we will have a table of four possible conditions and a specific action for each.
I. If -1, -1 keep doing what we were doing
II. If -1, 1 stop and get a drink
III. If 1, -1 stop and eat
IV. If 1, 1 stop eat and drink.
If some conditions over lap we can use statistics to find the closest match. In newer classifiers we also allow for learning new conditions.
If no existing condition matches our state create a new condition. Usually we start by adding zero ( not relevant ) to existing conditions in various inputs and building a rule based on that. If we run out of rule space we drop a little used rule from our database.
Currently these are used in determining what type of melanoma a person has; Price Grabber uses these algorithms to sort and fetch prices; and in NLP ( natural language processing ) applications.
For more information:
Natural Language Processing
Classifier System Abstracts
What’s a Classifier System
Wiki, Learning Classifier System
Knowledge based expert systems
There are knowledge based agents and expert systems that reason using rules of logic. These systems that do what an expert in a given field might do, tax consulting, medical diagnosis etc. They do well at the type of problem solving that people go to a university to learn. Usually predicate calculus is used to work through a given problem. This type of problem solving is known as ‘system inference’. The program should be able to infer relationships, functions between sets, some type of grammar, and some basic logic skills. The system needs to have three major properties: soundness, confidence that a conclusion is true; completeness, the system has the knowledge to be able to reach a conclusion; and tractability, it is realistic that a conclusion can be reached.
Reasoning is commonly done with if-then rules in expert systems. Rules are easily manipulated, forward chaining can produce new facts and backward chaining can check statements accuracy. The newer expert systems are set up so that users, who are not programmers, can add rules and objects and alter existing rules and objects. This provides a system that can remain current and useful with out having to have a full time programmer working on it.
There are three main parts to the expert system: knowledge base, a set of if-then rules; working memory, a database of facts; inference engine, the reasoning logic to create rules and data.
The knowledge base is composed of sentences. Each sentence is a representation of a fact or facts about the world the agent exists in or facts the expert system will use to make determinations. The sentences are in a language known as the knowledge representation language.
Rule learning for knowledge based and expert systems is done with either inductive or deductive reasoning. Inductive learning creates new rules, that are not derivable from previous rules about a domain. Deductive learning creates new rules from existing rules and facts.
Rules are made of antecedent clauses (if), conjunctions (and, or) and consequent clauses (then). A rule in which all antecedent clauses are true is ready to fire or triggered. Rules are generally named for ease of use and usually have a confidence index. The confidence index (certainty factor) shows how true something is, i.e. 100\% a car has four wheels, 50\% a car has four doors. Sometimes sensors are also part of the system. They may monitor states in the computer or environment. The Rete algorithm is the most efficient of the forward chaining algorithms.
Reasoning can be done using ‘Horn Clauses’, these are first-order predicate calculus statements that have, at most, one true literal. Horn Clauses have linear order time algorithms and this allows for a faster method of reasoning through lots of information. This is usually done with PROLOG or lisp. Clauses are ordered as such: goal, facts, rules. Rules have one or more negative literals and one positive literal that can be strung together in conjunctions that imply a true literal. A fact is a rule that has no negative literals. A list of positive literals with out a consequent are a goal. The program loops checking the list in order, when a resolution is performed a new loop is begun with that resolution. If the program resolves its goal the proof can be given in tree form, ‘and/or tree’.
Nonmonotomic reasoning is used to fix problems created by a change in information over time. More information coming in negates a previous conclusion and a new one needs to be drawn.
A conflict resolution process must be put in place as well to deal with conflicting information. This can be done by: first come, first serve; most specific rule is kept; most recently changed data rule triggered; once rule is resolved take it out of the conflict resolution set.
Forward chaining takes the available facts and rules and deduces new facts which it then uses to deduce more new facts, or invoke actions. Forward chaining can also be done by simply the application of if-then statements: The RETE algorithm is the most efficient at doing forward chaining right now, it compiles the rules into a network that it traverses efficiently. This is similar to the blackboard systems.
Dynamic knowledge bases, known as truth maintenance systems, may be used. This uses a ‘spreadline’ which is similar to a spread sheet that will calculate missing and updated values as other values entered.
General algorithm forward chaining
load rule base into memory
load facts into memory
load initial data into memory
match rules to data and collect triggered rules
LOOP
if conflict resolution done BREAK
use conflict resolution to resolve conflicts among rules
fire selected rules
END LOOP
Backward Chaining evaluates a goal and moves backward through the rules to see if true. An example is a medical diagnosis expert system, that takes in information from questions then returns a diagnoses. PROLOG systems are backward chaining.
General algorithm backward chaining
load rule base into memory
load facts into memory
load initial data
specify a goal
load rules specific to that goal onto a stack
LOOP
if stack is empty BREAK
pop stack
WHILE MORE ANTECEDENT CLAUSES
if antecedent is false pop stack and NEXT WHILE
if antecedent true fire rule and NEXT WHILE
if antecedent unknown PUSH onto stack (we may later ask user for more information about this antecedent.
END LOOP
The Rete Algorithm is considered to be the best algorithm for forward chaining expert systems. It is the fastest but also requires much memory. It uses temporal redundancy, rules only alter a few facts at a time, and structural similarity in the left hand side of rules to do so.
The Rete is a an acyclic graph that has a root node. The nodes are patterns and the paths are the left hand sides of the rules. The root node has one kind node attached to it for each kind of fact. Each kind node has one alpha node attached to it for each rule and pattern. Then the alpha nodes have associated memories which describe relationships. Each rule has two beta nodes. The left part is from alpha(i) and the right from alpha(i+1). Each beta node stores the JOIN relationships. Changes to rules are entered at the root and propagated through the graph.
Knowledge based agents loop through two main functions. One is to sense the world and TELL the knowledge base what it senses, two is to ASK what it should do about what it senses, which it then does. An agent can be constructed by giving it all the sentences it will need to perform its functions. An agent can also be constructed by a learning mechanism that takes perceptions about the environment and turns them into sentences that it adds to the knowledge base.
The Plant Doctor (perl)
Plant doctor in action
iPhone Plant Doctor
Particle Swarms
Particle Swarms take several particles move them randomly around an area looking for a minimum/maximum or some other condition. As particles test each location they land at the best score for the group and each particle is updated. The particles velocity is then adjusted to head more in the direction of each one’s personal best and in the direction of the group best. The velocities also contain a random variable to keep particles from swarming the first good location they find and ending up at a local min or max.
The algorithm is very simple:
Initialize particles
Loop:
For each particle:
move
test location to see if it is better than previous best
update velocity vectors
End for each particle
Test to see if any particle has beaten the group best so far and up date if so
End Loop:
The velocity vector is calculated:
OldVelocity += ConstantForGroup * randomNumberBetweenZeroAndOne * (GroupBestLocation – CurrentParticleLocation) + ConstantForParticles * randomNumberBetweenZeroAndOne * ( PersonalBestLocation – CurrentLocation)
The time step is decreased each cycle by multiplying dt by .99. Otherwise the particles start zooming around and fail to home in except by brute force.
The example takes six particles through a hundred loops and attempts to find the location of the maximum. The code is heavily commented and should be easy to understand and adjust.
See also:
Particle Swarm Optimization
Swarm Intelligence, Focus on Ant and Particle Swarm Optimization
Ant Algorithms
Particle Swarm Optimization ( PSO ) in Python (source code available)
Swarm Behavior – National Geographic
The swarm is reporting for duty
Swarming birds, plasma, crowds and stock markets
Swarm intelligence reaches a new level
Are swarms chaotic
Swarm approach to photography