Archive for the ‘source code’ Category
Genetic Algorithms create agents who compete to find the best solution to a problem using evolution.
Agents are initially created either using the best chromosomes from a previous run or a random mixture of chromosomes. After each round we evaluate each agent to see how well he has done. We then mix up the genes by switching among our best agents and or adding in some mutations. We repeat this for a set number of rounds or until some goal is reached.
If only gene switching is done and no mutations are entered into the population the agents will often only find local maximums not global maximums. If the number of agents is too small a solution may not be found.
Simple Genetic Program – you have predators, prey and food for the prey. When either a predator or prey reaches reproduction energy 2 of 3 genes are passed directly to the child, one is random.
There are knowledge based agents and expert systems that reason using rules of logic. These systems that do what an expert in a given field might do, tax consulting, medical diagnosis etc. They do well at the type of problem solving that people go to a university to learn. Usually predicate calculus is used to work through a given problem. This type of problem solving is known as ‘system inference’. The program should be able to infer relationships, functions between sets, some type of grammar, and some basic logic skills. The system needs to have three major properties: soundness, confidence that a conclusion is true; completeness, the system has the knowledge to be able to reach a conclusion; and tractability, it is realistic that a conclusion can be reached.
Reasoning is commonly done with if-then rules in expert systems. Rules are easily manipulated, forward chaining can produce new facts and backward chaining can check statements accuracy. The newer expert systems are set up so that users, who are not programmers, can add rules and objects and alter existing rules and objects. This provides a system that can remain current and useful with out having to have a full time programmer working on it.
There are three main parts to the expert system: knowledge base, a set of if-then rules; working memory, a database of facts; inference engine, the reasoning logic to create rules and data.
The knowledge base is composed of sentences. Each sentence is a representation of a fact or facts about the world the agent exists in or facts the expert system will use to make determinations. The sentences are in a language known as the knowledge representation language.
Rule learning for knowledge based and expert systems is done with either inductive or deductive reasoning. Inductive learning creates new rules, that are not derivable from previous rules about a domain. Deductive learning creates new rules from existing rules and facts.
Rules are made of antecedent clauses (if), conjunctions (and, or) and consequent clauses (then). A rule in which all antecedent clauses are true is ready to fire or triggered. Rules are generally named for ease of use and usually have a confidence index. The confidence index (certainty factor) shows how true something is, i.e. 100\% a car has four wheels, 50\% a car has four doors. Sometimes sensors are also part of the system. They may monitor states in the computer or environment. The Rete algorithm is the most efficient of the forward chaining algorithms.
Reasoning can be done using ‘Horn Clauses’, these are first-order predicate calculus statements that have, at most, one true literal. Horn Clauses have linear order time algorithms and this allows for a faster method of reasoning through lots of information. This is usually done with PROLOG or lisp. Clauses are ordered as such: goal, facts, rules. Rules have one or more negative literals and one positive literal that can be strung together in conjunctions that imply a true literal. A fact is a rule that has no negative literals. A list of positive literals with out a consequent are a goal. The program loops checking the list in order, when a resolution is performed a new loop is begun with that resolution. If the program resolves its goal the proof can be given in tree form, ‘and/or tree’.
Nonmonotomic reasoning is used to fix problems created by a change in information over time. More information coming in negates a previous conclusion and a new one needs to be drawn.
A conflict resolution process must be put in place as well to deal with conflicting information. This can be done by: first come, first serve; most specific rule is kept; most recently changed data rule triggered; once rule is resolved take it out of the conflict resolution set.
Forward chaining takes the available facts and rules and deduces new facts which it then uses to deduce more new facts, or invoke actions. Forward chaining can also be done by simply the application of if-then statements: The RETE algorithm is the most efficient at doing forward chaining right now, it compiles the rules into a network that it traverses efficiently. This is similar to the blackboard systems.
Dynamic knowledge bases, known as truth maintenance systems, may be used. This uses a ‘spreadline’ which is similar to a spread sheet that will calculate missing and updated values as other values entered.
General algorithm forward chaining
load rule base into memory
load facts into memory
load initial data into memory
match rules to data and collect triggered rules
if conflict resolution done BREAK
use conflict resolution to resolve conflicts among rules
fire selected rules
Backward Chaining evaluates a goal and moves backward through the rules to see if true. An example is a medical diagnosis expert system, that takes in information from questions then returns a diagnoses. PROLOG systems are backward chaining.
General algorithm backward chaining
load rule base into memory
load facts into memory
load initial data
specify a goal
load rules specific to that goal onto a stack
if stack is empty BREAK
WHILE MORE ANTECEDENT CLAUSES
if antecedent is false pop stack and NEXT WHILE
if antecedent true fire rule and NEXT WHILE
if antecedent unknown PUSH onto stack (we may later ask user for more information about this antecedent.
The Rete Algorithm is considered to be the best algorithm for forward chaining expert systems. It is the fastest but also requires much memory. It uses temporal redundancy, rules only alter a few facts at a time, and structural similarity in the left hand side of rules to do so.
The Rete is a an acyclic graph that has a root node. The nodes are patterns and the paths are the left hand sides of the rules. The root node has one kind node attached to it for each kind of fact. Each kind node has one alpha node attached to it for each rule and pattern. Then the alpha nodes have associated memories which describe relationships. Each rule has two beta nodes. The left part is from alpha(i) and the right from alpha(i+1). Each beta node stores the JOIN relationships. Changes to rules are entered at the root and propagated through the graph.
Knowledge based agents loop through two main functions. One is to sense the world and TELL the knowledge base what it senses, two is to ASK what it should do about what it senses, which it then does. An agent can be constructed by giving it all the sentences it will need to perform its functions. An agent can also be constructed by a learning mechanism that takes perceptions about the environment and turns them into sentences that it adds to the knowledge base.
Particle Swarms take several particles move them randomly around an area looking for a minimum/maximum or some other condition. As particles test each location they land at the best score for the group and each particle is updated. The particles velocity is then adjusted to head more in the direction of each one’s personal best and in the direction of the group best. The velocities also contain a random variable to keep particles from swarming the first good location they find and ending up at a local min or max.
The algorithm is very simple:
For each particle:
test location to see if it is better than previous best
update velocity vectors
End for each particle
Test to see if any particle has beaten the group best so far and up date if so
The velocity vector is calculated:
OldVelocity += ConstantForGroup * randomNumberBetweenZeroAndOne * (GroupBestLocation – CurrentParticleLocation) + ConstantForParticles * randomNumberBetweenZeroAndOne * ( PersonalBestLocation – CurrentLocation)
The time step is decreased each cycle by multiplying dt by .99. Otherwise the particles start zooming around and fail to home in except by brute force.
The example takes six particles through a hundred loops and attempts to find the location of the maximum. The code is heavily commented and should be easy to understand and adjust.
Particle Swarm Optimization
Swarm Intelligence, Focus on Ant and Particle Swarm Optimization
Particle Swarm Optimization ( PSO ) in Python (source code available)
Swarm Behavior – National Geographic
The swarm is reporting for duty
Swarming birds, plasma, crowds and stock markets
Swarm intelligence reaches a new level
Are swarms chaotic
Swarm approach to photography
This AI process mimicks the physical process of annealing. Annealing is the process of heating and cooling a metal in a controlled manner. This gives us a stronger metal if the cooling part is done slowly. A higher temperature means more energy. So if I am trying to place a marble on a hilly landscape in the global minimum I may have to shake things up a great deal to bounce the marble out of a local minimum. So I start out with lots of energy hoping to locate the global minimum. Then reduce the energy hoping to zero in on the minimum better.
Start with a random solution or your best solution from a previous run. This is our current solution. The energy of our solution is then measured. For the eight queens problem the amount of energy is the amount of conflicts between our queens.
We then take a copy of the solution, bounce it around a bit and see if it has less energy than the previous solution. If so we use that as our working solution. If not we test it using P(dE) = exp( -dE/T). At higher temperatures we take worse solutions than we do when we get close to finding the correct solution.
Simulated Annealing is used in traveling salesman and scheduling problems and also it has been used to do some image clean up work.
Set up initial board
Copy initial solution to current, working and best solutions
Bounce the queens around a the board
See if this solution is better than last one ( uses less energy / has less conflicts )
if so use new solution
if not test again using P(dE) = exp( -dE/T )
if this gives a lower energy than a random number we pick, use this solution
Switch copy boards as needed
Bail from loop if problem solved ( energy < some number )
Simulated Annealing was developed by S. Kirkpatrick, CD Gelatt and MP Vecchi in 1983 and by Cerny 1985. It is a Monte Carlo type of solution to a problem.
Game theory is the study of how decision makers will act and react in various situations like negotiating business deals. It is used quite a bit in the study of economics and politics. This is the field that has recently gained some fame with the ‘A Beautiful Mind’ book and movie about John Nash. Much of game theory is built on the Nash Equilibrium. John Von Neumann laid much of the ground work for game theory. Using simplified models, often based only a few rules, many behaviors of people in various situations can be predicted.In these game models it is assumed that the players will make rational choices in each decision. Rational choice is defined as: ‘the action chosen by a decision maker is at least as good as every other available action’. A player is presented with a set of actions from which she chooses one. She may prefer one action over another or may consider some actions to be equally preferable. The only restriction on the actions preferences is that if a player prefers action A over action B, and she prefers action B over actions C then she must also prefer action A over C. A payoff function is used to describe the benefit of each action from the player’s point of view. For example I may visit a used car lot and find a car that is worth 1,000 by my valuation, you may value that car at 500. So the payoff function for the car for me is 1,000 for you it is 500. If I see another car I like that I value at 250, it only means I prefer the first car to the second car. It does not mean that I prefer it 4 times as much. The values are only to show ordering of choices.
The Nash Equilibrium is the place where each player can do no better, no matter what the other person decides to do.
In the well known game of ‘Prisoner’s Dilemma’ we have two crooks who worked together on a crime and have each been caught and are being held separately from each other. They each have a choice of finking or remaining quiet. If one finks, he walks with 1 years time in jail, the other person gets 4 years in jail. If neither finks there is enough evidence to put each away for 2 years time.
The highest score of all the plays is for both crooks remain quiet and each receives 2 years jail time. But if Crook one finks, he gets a score of 3 (1 years time ) against the other player who remains quiet or a score of 1(and gets 3 years jail time) So his best bet is to fink, as is the other crooks. The Nash equilibrium is at fink/fink (1,1) since finking is the best move for each player individually.
The payoff function for this game is the same for each player and is: f(Fink, Quiet) f(Quiet, Quiet) f(Fink, Fink) f(Quiet, Fink) So we could as easily score it 3, 2, 1, 0 rather than counting years out or 32, 21, 9, 1 the score only serves to order the choices. A clearer example is a game in which we have 2 players moving pieces on a 3-d game board. Each player can move in the x, y, or z direction.
The * is the best move for each player considering what the other player does. If player 1 moves in the Y direction player 2’s best move is also in the Y direction (2*,1) The squares with both players have *s are X1 X2 and Z1 Y 2 . These are both Nash equilibriums. Finding the Nash Equilibrium this way is called ‘Best Response’
Suppose instead of set numbers I have a function that describes the payoff for each player. I could have A(x) = y 2 andB(y) = 1 2 x + xy then to find the Nash equilibrium I take the derivative of each function, set it to zero, solve and plot. Any and all places the functions cross on the plot are Nash equilibriums.
It is not possible, even for most simple games, to search all the possible routes the game can take. Too much time and hardware would be needed. For example tic-tac-toe is one of the simplest games we all know. A game tree that mapped all possible moves from start to finish would be 9! or 362,880 nodes large. The first player would have 9 choices of which box to play in, the second 8 choices since the first player had taken one, the first player’s second move would have 7 choices, etc. So the top level of nodes would have 9 choices. Each level in the tree represents a turn in the game. The second level would have 8 nodes off of each of the original 9 nodes and so on. So you can imagine what chess or other more complicated games have as the number of possible moves. All games and all competitions can be represented by trees.
Pruning is used to take sections off of the search tree that make no difference to play. Heuristic (rule of thumb) evaluations allow approximations to save search time. For instance in the tic-tac-toe tree described above once the first player chooses a position to play then the other 8 nodes of the top layer can be trimmed off and only the 8 trees under that node need to be searched.
Since it is not usually practical to calculate each possible outcome a cut off is usually put in place. As an example look at a chess game. For each board in play we can calculate the advantage by adding up the point value of the pieces on the board or adding points for position. Then the program can see which of those gives the program a higher score. Then the program need only calculate five or so moves ahead, calculate the advantage at each node and choose the best path. Rather than calculate ahead a set number of moves, the program can use an iterative deepening approach and calculate until time runs out. A quiescent search restricts the above approach. This eliminates moves that are likely to cause wild swings in the score. The horizon problem occurs when searches do not look ahead to the end of the game. This is a current unsolved problem in game programming.
The min max algorithm assumes a ‘zero sum game’, such as tic-tac-toe where what is good for one player is bad for the other player. This algorithm assumes that both players will play perfectly and attempt to maximize their scores. The algorithm only generates the trees on the nodes that are likely to be played. Max is the computer, min is the opposing player. It is assumed max will get first turn.
-generate entire game tree down to the maximum level to check generate each terminal state value, high values are most beneficial to max, negative values are most beneficial to min, zero holds no advantage for either player.
-go up one level, give the node above the previous layer the best score from the prior layer
-continue up the tree one level at a time until top is reached
-pick the node with the highest score.
The Alpha-Beta method determines whether an evaluation should be made of the top node by the min-max algorithm. It searches all of the nodes, like min max, then eliminates (prunes) those that are never going to reached. The program begins by proceeding with the min-max algorithm systematically through the nodes of a tree. First we go down a branch of the tree and calculate the score for that node. Then we proceed down the next branch. If the score at one of the leaves is lower than the score obtained in a previous branch of the tree we don’t finish evaluating all the nodes of the branch, rather we move onto the next branch. The search can be shallow rather than deep saving time. Further gains in speed can be made by caching the information from branches in a look up table, re-ordering results, extending some and shortening other searches, or using probabilities rather than actual numbers for cutoffs and using parallel algorithms.
Ordering may be used to save time as well. In chess captures would be considered first, followed by forward moves, followed by backward moves. Or, ordering can consider the nodes with the highest values first.
The program must try to find a winning strategy that does not depend on the human user’s moves. Humans often make small goals and consider moves that work toward that goal, i.e. capture the queen. David Wilkins Paradise is the only program so far to do this successfully. Another approach is to use book learning. Several boards are loaded into a table in memory and if the same board comes into play the computer can look up what to do from there. The Monte Carlo simulation has been used successfully in games with non-deterministic information, such as; Scrabble, dice, and card games.
Temporal-difference learning is derived from Samuel’s machine learning re search. Several games are played out and kept in a database. This works well with board games like backgammon, chess and checkers. Neural nets can be trained to play games this way, TD Gammon being one the more famous ones.
Martin J Osborne
Distinguishing the opponents in the prisoner dilemma in well-mixed populations
Towards Network Games with Social Preferences
An introduction to quantum game theory
Games on social networks: On a problem posed by Goyal
Are mathematics the new artificial intelligence?