Forward Feed Back Propagation networks (aka Three Layer Forward Feed Networks) have been very successful. Some uses include teaching neural networks to play games, speak and recognize things. If a non-linear activation function is used (usually the sigmoid function) then one hidden layer is sufficient to solve almost any equation with reasonable accuracy.
Back-propagation supervised training for Forward-Feed neural nets uses pairs of input and output patterns.
The weights on all the vectors are set to random values.
Then input is fed to the net and propagates to the output layer
hidden layer = sigmoid( sum ( inputs * weights + bias ) )
output layer = sigmoid( sum ( hidden * weights + bias ) )
The errors are calculated.
Error = Output * ( 1 – Output ) * ( expected value – Output )
Then the error correction is propagated back through the hidden layer
adjust output weights
Wnew = Wold + ( Error * Output ) * learningRate
then to the input layer in the network.
Error = Hidden * ( 1 – Hidden ) * ( OutputError * Weight + OutputError * Weight….. )
Wnew = Wold + ( Error * Output ) * learningRate
There is one input neurode for each number (dimension) in the input vector, there is one output neurode for each dimension in the output vector. So the network maps in-dimensional space to out-dimension space.
The weight adjustment is gradient descent. You are calculating the derivative and finding the quickest route to zero error.
The learningRate is a small number ~0.1 to keep the network from blowing up or over shooting a minimum. If the rate is too small the network will be slow to learn.
There is no set rule for determining the number of hidden layers or the number of neurodes in the hidden layer. However, if too few hidden neurodes are chosen then the network can not learn. If too many are chosen, then the network memorizes the patterns rather than learning to extract relevant information. A rule of thumb for choosing the number of hidden neurodes is to choose log ( 2)X where X is the number of patterns. So if you have 8 distinct patterns to be learned, then log ( 2)8 = 3 and 3 hidden neurodes are probably needed. This is just a rule of thumb, experiment to see what works best for your situation.
The error vector is aimed at zero during training.
Error = ( 1/2 * (sum (desired-actual)^2))
To get the error close to zero, with in a tolerance, we use iteration. Each iteration we move a step downward. We take the gradient, the derivative of a vector, and use the steepest descent to minimize the error. So thenewweight = oldW eight + stepsize (-gradientW (e(W )).
The error may be backpropagated after each training example, or more commonly after a batch of examples.
Perceptrons separate data linearly, if the data is linearly separable.
y = mx + b
m => the weights which give the slope of the line
b => the bias, the place where the line crosses the y axis
x => input data
y => output
Rosenblatt added the learning law to the McCulloch-Pitts neurode to make it Perception, which is the first of the neural net learning models. The perception has one layer of inputs and one layer of outputs, and one group of weights. If data points on a plot are linearly separable (we can draw a straight line separating points that belong in different categories), then we can use this learning method to teach the neural net to properly separate the data points.
The McCulloch-Pitts neurode fires a +1 if the neurode’s total input the sum of each input * its weight is greater than the set threshold. If it is less than the set threshold, or if there is any inhibitory input a 0 is fired. Any logical function can be created using only AND, OR and NOT gates so a neural net can be created with McCulloch-Pitts neurodes to solve any logical function.
We start with a weight vector that is set to zeros or random numbers. During training weights are increased or decreased by adding or subtracting the input to the weights. This is done until all data points are input and the neurode gives the correct output for each point.
The perception fell out of favor since it can only handle linearly separable functions which means simple functions like XOR, or parity can not be computed by them. Minsky and Papert published a book ‘Perceptions’, in the 1980’s, that proved that one and two layer neural nets could not handle many real world problems and research fell off for about twenty years in neural nets.
An additional layer and set of weights can enable the Perception to handle functions that are not linear. A separate layer is needed for each vertex needed to separate the function. A 1950’s paper by A.N. Kilmogorov published a proof that a three layer neural network could perform any mapping exactly between any two sets of numbers.
Multi layered perceptrons were developed than can handle XOR functions. Hidden layers are added and they are trained using backpropagation or a similar training algorithm. Using one layer linearly separable problems can be solved. Using two layers regions can be sorted and with three layers enclosed regions can be sorted.
GitHub source code for a Swift Perceptron network
Synthetic psychology is a field where biological behavior is synthesized rather than analyzed. The father of such behavior Valentino Braitenberg ( home page ) did some interesting work with toy vehicles in the 1986 book “Vehicles: Experiments in Synthetic Psychology“.
The Braitenberg vehicles were simple toy cars with light sensors as headlights. In some cars the wire for each headlight went to the real wheel directly behind it, in some the wires went to the opposite back wheel. The headlight receptors were aimed slightly off to the outside. The more light received by a receptor the faster the wheel wired to that receptor would turn.
Each vehicle exhibited realistic behavior when placed in a plane with several randomly placed light sources. A vehicle wired straight back when placed near a light source will then rush towards the light veering off as it gets closer to the light. As it gets more distant from the light sources the vehicle slows down. The reason is the wheel receiving the most light spins fastest turning the car away from the light source.
The vehicle with the crossed wires will turn and drive towards the brightest light in its field of view. The closer it gets, the faster it goes eventually running into the light.
Pretty interesting behavior from a very simple machine. But what if we add in a neurode to each wire and instead of a plain wire we use a wire that inhibits signals? Neurodes are set to only fire if they receive signals over a certain threshold. In this case zero is to be our threshold. So now our cars send signals to the wheels if there is no light, and do not send a signal if there is a light. Now the vehicle with the wires straight back moves toward a light and slows as it approaches, facing the light. The second vehicle now avoids light sources, speeding off to the darkest corner it can find.
Using thresholds ( see logic version ) you begin to see group behavior, one set of vehicles herds and traps another set.
So what has this all to do with current artificial intelligence? Some of our best stuff right now came from earlier work that was done and stopped. Some of our best mathematical algorithms come from extremely early math. And to remind you ( and me ) that simple rules can create very complex behavior in game characters and artificial life forms.
Four neurode versions of these vehicles have been built and they will exhibit more complex, non-linear behavior. “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” (Edsger Dijkstra) At what point does the behavior become realist enough to be considered a life form?
ObjC simulation of Braitenberg Vehicles( #2 fear and aggression )
ObjC simulation of Braitenberg Vehicles( #3 love, explore and complex )
ObjC simulation of Braitenberg Vehicles( #4 values and tastes )
ObjC simulation of Braitenberg Vehicles( #5 Logic )
Braitenberg Vehicles ( Java and Lisp simulators here )
I’m revisiting this problem again and hoping to dig more into evolutionary computing.
The SantaFe Ants Trail is a maze of 89 food pellets, with breaks in between. The ants must find a way to pick up all of the pellets.
Each ant checks to see if there is food in front, moves forward and collects it if so, else turns left, right or moves one forward depending on its coding. The food is removed for this ant once it is collected.
The genome must be a minimum of 32 chromosomes, that’s the minimum the state machine must have to solve the maze. It’s been solved with 32 chromosomes, 200 steps, 200 generations, 65,536 starting ants.
The fitness of the ant is the number of pellets it has collected.
Each generation the top 50% of the ants mate, half the genes ( in sequence from one parent, half from the other )
I only released 2048 ants at once, I suppose with out graphics the computer would’ve handled much more but it was interesting to watch and I spotted several problems along the way while watching.
At 512 ants, my best score was 43, at 1024 best was 50.
Problems I ran into:
– If one or a few ants were far above rest they cloned the entire population and diversity was lost.
– Need a way to cull individuals too far from the center and or remove clones. I inserted random mutations rather than check for clones, sorting by genes might make it easy to id and remove clones.
ObjC iOS SantaFe Ant Trail
This is very cool, iOS video image recognition.
I am totally convinced that deep learning approaches to hard AI are going to change our world, especially when they’re running on cheap networked devices scattered everywhere. I’m a believer because I’ve seen how good the results can be on image recognition, but I understand why so many experienced engineers are skeptical. It sounds too good to be true, and we’ve all been let down by AI promises in the past.
That’s why I’ve decided to release DeepBeliefSDK, an iOS version of the deep learning approach that has taken the computer vision world by storm. In technical terms it’s a framework that implements the full Krizhevsky stack of 60 million neural network connections, with a customizable top layer inspired by the Decaf approach. It does all this in under 300ms on an iPhone 5S, and in less than 20MB of memory. Here’s a video of me of me using the sample app to detect our cat!
More at Pete Warden’s Blog