Archive for the ‘things you should know’ tag
Data Mining
Most datamining is either done using private corporate databases, online government databases, or with web bots, spiders and scrapers. RSS has made data mining the web trivial with PERL. I’m told it is trival with PHP also, I’m still experimenting with that.
Currently, with the help of computers, most fields of science, the government and businesses are collecting data faster than they can comb through it. Some agencies have what would be hundreds of years worth of data if it had to be parsed by humans. So we need to use artificially intelligent datamining to sort the data, develop useful informative rules about the data, and or put it into useful formats for us humans.
This task of artificial intelligence is often put under the category of ‘machine learning’. Sometimes a set of rules is used. The rules may be created by an expert ( domain knowledge ) or discovered through machine learning using statistics.
Problems with this type of machine learning include coming up with an insane number of rules that are far too specific ( over fitting ) and using example data that skews the learning. Other problems include when do you decide you have the best set of rules? At what point is your algorithm good enough? Do you want all possible out comes or is it only specific outcomes you need? An example would be do you need 40 categories of healthy plant or only descriptions and diseases for unhealthy ones?
Datamining has four main styles of sorting through data. Classification: classes are presented and future data is to be sorted into one of the given classes. Association: associations between data are sought. Clustering: data is sorted into clusters usually using various traits as vectors. Prediction: in which some specific information, usually numeric is to be output.
Data details for data mining is often stored in ARFF Attribute-Relation File Format
More information:
Applications of Machine Learning and Rule Induction
Machine Learning ( Theory )
UT ML Group: Text Data Mining ( several papers here )
UCI Machine Learning Repository has over 160 data sets for you to use to test and develop your AI.
See also:
Electronic cop solves crimes
Finding new diseases for known cures
Predator Prey Chases using Potential Functions
Using potential functions for predator prey movement gives a more realistic chase than using line of sight or other line algorithms.
Potential functions are well known in physics and describe attraction and repulsion due to electricity, gravity etc.
Potential_Energy = – Attraction/distance^j + Repulsion/distance^k
A strong attraction will cause the predator to chase the prey, a strong repulsion will cause the prey to flee the predator. Either of these can be set to zero. Predators may only be attracted to prey, prey may only be repulsed by predators. Various prey and predators can be given slightly different numbers for attraction, repulsion, j, k allowing them to behave differently.
To chase or run a character using potential functions use this algorithm:
Move_Predator (){
double attraction = some_constant; // experiment with these, the size of your game
double repulsion = some_constant; // area will determine good choices
double j = some_constant; // try something between 1 to 3 for these numbers
double k = some_constant; // higher number mean smaller results
double x_predator;
double y_predator;
double x_prey;
double y_prey;
//distance between predator and prey
double distance_x = x_predator – x_prey;
double distance_y = y_predator – y_prey;
double distance = sqrt ( distance_x*distance_x + distance_y*distance_y);
//normalized distance vector
distance_x = distance_x / distance;
distance_y = distance_y / distance;
//see what our attraction or repulsion is
double potential = -attraction/distance^j + repulsion/distance^k;
//now use that attraction/repulsion to see how far we move
double move_x = potential * distance_x;
double move_y = potential * distance_y;
//new location for us
double new_x = predator_x + move_x;
double new_y = predator_y + move_y;
}
More information:
The use of potential functions on modelling animal movement
Advanced Movement Model of Crowd Robots ( pdf )
Books:
AI for Game Developers
Natural Language Processing
Natural language processing allows people to interact with computers and robots in a more natural way. There are five major areas in NLP: Speech recognition, Language processing, Language understanding, Language generation, Speed synthesis.
Speech recognition consists of breaking down speech into words. Most modern systems require a pause between words so the computer knows where one word ends and another begins.
Language processing is breaking down of words it is then parsed using grammar rules and a lexicon. A lexicon is a list of words to be understood by the computer and attributes for those words( noun/verb/etc ). The grammar is a formal description of the language. It might have rules explaining how sentences may be formed example Subject->NounPhrase->Verb or NounPhrase->VerbPhrase etc. Sometimes you have parsers that break a sentence down into parts and a parse tree. Other times you have a recognizer which parses the sentence and determines if it is valid with the grammar and lexicon supplied.
Understanding is mapping the processed language onto a representation that the computer can handle.
Language generation is the formation of an answer by the machine. Usually these are predefined phrases.
Speech synthesis is the verbal expression of the language generated and is also usually predefined phrases.
Language is difficult to parse structure can be as simple as two word sentences or complex lengthy sentences. By using a limited grammar and lexicon this can be simplified.
Language can also be ambiguous words can some times be nouns or verbs ( a run – to run ; a program – to program ). This can create multiple possible parse trees from a single statement.
Metaphors are not well translated by machines.
Parsers can be top down, bottom up, or finite state machines. Top down parsers start with sentences and work down to words. When every word has been classified it is complete. Bottom up parsing starts with words applying rules to parse them into verb and noun phrases until a sentence can be fit into a sentence rule. Finite state machines work from the bottom up. Finite state machines identify words and work through a graph like structure to form the sentence.
NLP is used in speech interfaces, text processing, language translation and information retrieval. It is an important tool in the war against spam.
More information:
Stanford School of Engineering, Natural Language Processing Online Course
Simon: Open source speech to text software
Natural Language Processing Blog
Getting Started with OpenNLP ( Natural Language Processing )
Has voice recognition finally come of age?
See also:
Darpa builds speech translator
Neural net learing vowels
Simple Predator Prey Chase Algorithms
Whether you are writing a game with artificial intelligence or a life form to evolve you have to decide how and when predators catch their prey. Should the predator try to catch the prey? If so what’s the best way? And the predator must watch out for obstacles during the chase, we don’t want any Wiley Coyote type accidents with our predators.
About the simplest method the predator can use to meet up with the prey is to see if the prey’s x & y positions are above or below the predator’s.
Prey ( 3, 6 )
Predator ( 5, 1 )
if ( prey_x > predator_x ) { x++; } else { x–; }
if ( prey_y > predator_y ) { y++; } else { y–; }
So the first pass through the predator moves from ( 5, 1 ) to ( 4, 2 ).
A bit more natural appearing predator chase would be to use line of sight as the path to move towards the prey. Bresenham’s Line Algorithm is usually used for this. While this is not more effective than the previous method it is more natural appearing. It is also more steps to compute.
prey current position ( xp, yp )
predator current position ( xP, yP )
x = x position to move to
y = y position to move to
dx = xp – xP
dy = yp – yP
Adx = AbsoluteValue ( dx )
Ady = AbsoluteValue (dy )
if ( xp > xP ) stepX = 1 else stepX = -1
if ( yp > yP ) stepY = 1 else stepY = -1
if ( Ady > Adx ){ //the y distance from prey is larger than the x distance
fraction = 2*dx – dy;
if (( yP != yp ) && ( fraction > 0 )){
x += stepX
}
y += stepY
}else{
fraction = 2*dy – dx;
if (( xP != xp ) && ( fraction > 0 )){
y += stepY
}
x += stepX
}
Sometimes instead of having the predator chase the prey it is more effective to have the predator intercept the path of the prey. To do this the velocity as well as the position of the prey must be taken into account.
//figure out when we can reach the prey
dvx = vxp – vxP //difference in x direction velocity
dvy = vyp – vxP //difference in y direction velocity
dx = xp – xP //difference in x positions
dy = yp – yP //difference in y positions
dtx = dx/dvx //time away from x intercept position
dty = dy/dvy //time away from y intercept position
//calculate where prey will be at that time
x = xp + vxp*dxt
y = yp + vyp*dxy
//use above Bresenham’s line algorithm to aim for that location ( x, y )
More information:
Line Drawing explained
Bresenham’s line algorithm
Genetic Algorithms
Genetic Algorithms create agents who compete to find the best solution to a problem using evolution.
Agents are initially created either using the best chromosomes from a previous run or a random mixture of chromosomes. After each round we evaluate each agent to see how well he has done. We then mix up the genes by switching among our best agents and or adding in some mutations. We repeat this for a set number of rounds or until some goal is reached.
If only gene switching is done and no mutations are entered into the population the agents will often only find local maximums not global maximums. If the number of agents is too small a solution may not be found.
Simple Genetic Program – you have predators, prey and food for the prey. When either a predator or prey reaches reproduction energy 2 of 3 genes are passed directly to the child, one is random.
GeneticLife.java
See Also:
Bioinformatics Blog: Genetic Algorithms: A Quick Tutorial
Racing with Evolutionary Algorithms