## Hopfield Networks

John Hopfield introduced these networks in 1982. They generalize well, are robust, and can be described mathematically. On the downside, they can only store about 15% as many patterns as they have neurodes, and the stored patterns must be separated by Hamming distances of roughly 50% of the number of neurodes.

Hopfield networks are networks that recall what is fed into them, which makes them useful for restoring degraded images. A Hopfield network is a fully connected net: every node is connected to every other node, but no node is connected to itself.

Calculating the weight matrix for a Hopfield network is straightforward. For each pattern to be stored, take the outer product of the pattern with itself; the weight matrix is the sum of these outer products over all the patterns, with the diagonal set to zero (no self-connections).
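As a quick sketch in Python with NumPy (the function and variable names are mine), that weight-matrix calculation might look like:

```python
import numpy as np

def hopfield_weights(patterns):
    """Build a Hopfield weight matrix from bipolar (+1/-1) patterns
    by summing the outer product of each pattern with itself."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for p in patterns:
        p = np.asarray(p, dtype=float)
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)   # no self-connections
    return W

W = hopfield_weights([[1, -1, 1, -1], [1, 1, -1, -1]])
```

Note that the resulting matrix is symmetric, which is what makes the energy-surface picture below work.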

Hopfield networks work best on input data that has been encoded as vectors containing only -1 and +1.

They can also be described as having a potential-energy surface with conical holes representing the stored data. Holes of similar depth and diameter represent data with similar properties. The input data seeks the lowest potential energy and settles into the closest hole. The energy surface of these networks is mathematically equivalent to that of ‘spin glasses’. ( E = -1/2 * sum( weight[i][j] * state[i] * state[j] ) )
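The energy formula and the settling process can be sketched as follows (Python with NumPy; storing a single pattern and the number of update sweeps are illustrative choices of mine). A stored pattern sits at a lower energy than a corrupted copy, and repeated updates let the corrupted copy fall into that hole:

```python
import numpy as np

# store one bipolar pattern with the Hebbian rule (outer product, zero diagonal)
stored = np.array([1., -1., 1., -1.])
W = np.outer(stored, stored)
np.fill_diagonal(W, 0.0)

def energy(W, s):
    """Hopfield energy: E = -1/2 * sum_ij w_ij * s_i * s_j."""
    return -0.5 * s @ W @ s

def recall(W, s, sweeps=5):
    """Asynchronous node updates; each sweep moves the state downhill in energy."""
    s = np.array(s, dtype=float)
    for _ in range(sweeps):
        for i in range(len(s)):
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

noisy = np.array([1., 1., 1., -1.])   # the stored pattern with one bit flipped
```

Here `recall(W, noisy)` settles back onto `stored`, the bottom of the nearest hole.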

These neural nets do have drawbacks: they are computationally intensive and use a lot of memory.

Hopfield Network source code using MNIST

Excellent tutorial with example

## Backpropagation Networks

Feed-forward backpropagation networks (aka three-layer feed-forward networks) have been very successful. Uses include teaching neural networks to play games, to speak, and to recognize things. If a non-linear activation function is used (usually the sigmoid function), then one hidden layer is sufficient to approximate almost any function with reasonable accuracy.

Back-propagation is supervised training for feed-forward neural nets that uses pairs of input and output patterns.

The weights on all the connections are set to random values.

Then input is fed to the net and propagated to the output layer:

hidden layer = sigmoid( sum ( inputs * weights + bias ) )

output layer = sigmoid( sum ( hidden * weights + bias ) )
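The two equations above can be sketched in Python with NumPy (function and parameter names are mine; the all-zero weights are just for illustration, since sigmoid(0) = 0.5 makes the result easy to check):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_output, b_output):
    """One forward pass through a three-layer feed-forward network."""
    hidden = sigmoid(inputs @ w_hidden + b_hidden)   # hidden = sigmoid(sum(in * w) + bias)
    output = sigmoid(hidden @ w_output + b_output)   # output = sigmoid(sum(hid * w) + bias)
    return hidden, output

# 2 inputs -> 3 hidden -> 1 output, with zero weights for illustration
h, y = forward(np.array([1.0, -1.0]),
               np.zeros((2, 3)), np.zeros(3),
               np.zeros((3, 1)), np.zeros(1))
```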

The errors are calculated.

Error = Output * ( 1 – Output ) * ( expected value – Output )

Then the error correction is propagated back through the hidden layer

Adjust the output weights (here Hidden is the activation of the hidden neurode feeding the weight):

Wnew = Wold + ( Error * Hidden ) * learningRate

then to the input layer in the network.

Error = Hidden * ( 1 – Hidden ) * ( OutputError1 * Weight1 + OutputError2 * Weight2 + … )

Wnew = Wold + ( Error * Input ) * learningRate

There is one input neurode for each number (dimension) in the input vector and one output neurode for each dimension in the output vector, so the network maps the input-dimensional space to the output-dimensional space.

The weight adjustment is gradient descent. You are calculating the derivative and finding the quickest route to zero error.

The learningRate is a small number, around 0.1, that keeps the network from blowing up or overshooting a minimum. If the rate is too small, the network will be slow to learn.

There is no set rule for determining the number of hidden layers or the number of neurodes in the hidden layer. However, if too few hidden neurodes are chosen, the network cannot learn. If too many are chosen, the network memorizes the patterns rather than learning to extract the relevant information. A rule of thumb for choosing the number of hidden neurodes is log2(X), where X is the number of patterns. So if you have 8 distinct patterns to be learned, then log2(8) = 3, and 3 hidden neurodes are probably needed. This is just a rule of thumb; experiment to see what works best for your situation.

The error vector is aimed at zero during training.

Error = 1/2 * sum( ( desired – actual )^2 )

To get the error close to zero, within a tolerance, we use iteration. Each iteration we move a step downward. We take the gradient (the vector of partial derivatives of the error) and follow the steepest descent to minimize the error. So newWeight = oldWeight + stepSize * ( -gradient_W( E(W) ) ).
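Putting the forward pass, error terms, and weight updates together, a minimal training sketch on XOR might look like this (Python with NumPy; the network size, learning rate, iteration count, and random seed are my choices, and whether the rounded outputs end up matching XOR exactly depends on the random initialization):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])                # XOR targets

w_h = rng.uniform(-1, 1, (2, 4)); b_h = np.zeros(4)   # 4 hidden neurodes
w_o = rng.uniform(-1, 1, (4, 1)); b_o = np.zeros(1)
lr = 0.5                                              # learning rate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

errors = []
for _ in range(20000):
    h = sigmoid(X @ w_h + b_h)                        # forward pass
    y = sigmoid(h @ w_o + b_o)
    err_o = y * (1 - y) * (T - y)                     # output error term
    err_h = h * (1 - h) * (err_o @ w_o.T)             # hidden error term
    w_o += lr * (h.T @ err_o); b_o += lr * err_o.sum(axis=0)   # weight updates
    w_h += lr * (X.T @ err_h); b_h += lr * err_h.sum(axis=0)
    errors.append(0.5 * np.sum((T - y) ** 2))         # Error = 1/2 * sum((desired - actual)^2)
```

The `errors` list shows the squared-error measure above being driven toward zero over the iterations.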

The error may be backpropagated after each training example, or more commonly after a batch of examples.

Other resources:

GitHub source code for a backpropagation network in Swift using the Accelerate Library

Backpropagation Tutorial

An Introduction to Neural Networks, textbook by Krose and van der Smagt

## Perceptron

Perceptrons separate data linearly, if the data is linearly separable.

y = mx + b

m => the weights which give the slope of the line

b => the bias, the place where the line crosses the y axis

x => input data

y => output

Rosenblatt added the learning law to the McCulloch-Pitts neurode to make the Perceptron, the first of the neural net learning models. The perceptron has one layer of inputs, one layer of outputs, and one group of weights. If data points on a plot are linearly separable (we can draw a straight line separating points that belong in different categories), then we can use this learning method to teach the neural net to properly separate the data points.

The McCulloch-Pitts neurode fires a +1 if the neurode’s total input (the sum of each input times its weight) is greater than the set threshold. If it is less than the threshold, or if there is any inhibitory input, a 0 is fired. Any logical function can be created using only AND, OR and NOT gates, so a neural net built from McCulloch-Pitts neurodes can solve any logical function.
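A sketch of a McCulloch-Pitts neurode and the three basic gates (Python; the particular threshold values and weights are my choices, picked so the strict greater-than rule works out):

```python
def mp_neurode(inputs, weights, threshold, inhibitory=()):
    """McCulloch-Pitts neurode: fires 1 when the weighted input sum exceeds
    the threshold and no inhibitory input is active; otherwise fires 0."""
    if any(inputs[i] for i in inhibitory):
        return 0                                   # any inhibitory input forces a 0
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

AND = lambda a, b: mp_neurode([a, b], [1, 1], 1.5)
OR  = lambda a, b: mp_neurode([a, b], [1, 1], 0.5)
# NOT uses a constant 1 input, with the actual input wired as inhibitory
NOT = lambda a:    mp_neurode([a, 1], [0, 1], 0.5, inhibitory=(0,))
```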

We start with a weight vector that is set to zeros or random numbers. During training weights are increased or decreased by adding or subtracting the input to the weights. This is done until all data points are input and the neurode gives the correct output for each point.
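That training loop can be sketched as follows (Python with NumPy; the AND-style data set, the {-1, +1} target encoding, and the epoch count are illustrative choices of mine):

```python
import numpy as np

def train_perceptron(X, targets, epochs=20):
    """Perceptron learning: when the output is wrong, add the input to the
    weights (target +1) or subtract it (target -1) until every point is right."""
    Xb = np.c_[X, np.ones(len(X))]        # append a constant 1 input for the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, ti in zip(Xb, targets):
            if np.sign(w @ xi) != ti:     # misclassified: nudge the weights
                w += ti * xi
    return w

# a linearly separable example: AND, with targets in {-1, +1}
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
targets = np.array([-1, -1, -1, 1])
w = train_perceptron(X, targets)
```

Because the data is linearly separable, the perceptron convergence theorem guarantees this loop eventually stops making updates.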

The perceptron fell out of favor since it can only handle linearly separable functions, which means simple functions like XOR or parity cannot be computed by it. Minsky and Papert published a book, ‘Perceptrons’, in 1969 that proved one- and two-layer neural nets could not handle many real world problems, and research in neural nets fell off for about twenty years.

An additional layer and set of weights can enable the perceptron to handle functions that are not linearly separable. A separate layer is needed for each vertex needed to separate the function. A 1950s paper by A.N. Kolmogorov published a proof that a three-layer neural network could perform any mapping exactly between any two sets of numbers.

Multi-layered perceptrons were developed that can handle XOR functions. Hidden layers are added, and they are trained using backpropagation or a similar training algorithm. Using one layer, linearly separable problems can be solved. Using two layers, regions can be sorted, and with three layers enclosed regions can be sorted.

More information:

GitHub source code for a Swift Perceptron network

An Introduction to Neural Networks, textbook by Krose and van der Smagt

## Braitenberg Vehicles

Synthetic psychology is a field where biological behavior is synthesized rather than analyzed. The father of the field, Valentino Braitenberg ( home page ), did some interesting work with toy vehicles in his 1984 book “Vehicles: Experiments in Synthetic Psychology“.

The Braitenberg vehicles were simple toy cars with light sensors in place of headlights. In some cars the wire from each sensor went to the rear wheel directly behind it; in others the wires went to the opposite rear wheel. The sensors were aimed slightly off to the outside. The more light a sensor received, the faster the wheel wired to it would turn.

Each vehicle exhibited realistic behavior when placed on a plane with several randomly placed light sources. A vehicle wired straight back, when placed near a light source, will rush towards the light and veer off as it gets closer. As it gets more distant from the light sources, the vehicle slows down. The reason is that the wheel receiving the most light spins fastest, turning the car away from the light source.

The vehicle with the crossed wires will turn and drive towards the brightest light in its field of view. The closer it gets, the faster it goes eventually running into the light.

Pretty interesting behavior from a very simple machine. But what if we add a neurode to each wire, and instead of a plain wire we use one that inhibits signals? Neurodes are set to fire only if they receive signals over a certain threshold; in this case zero is our threshold. So now our cars send signals to the wheels when there is no light, and do not send a signal when there is light. Now the vehicle with the wires straight back moves toward a light and slows as it approaches, facing the light. The second vehicle now avoids light sources, speeding off to the darkest corner it can find.
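One way to sketch a single update step for such a vehicle (Python; the sensor geometry, the light-falloff formula, and the differential-drive step are all my assumptions, not Braitenberg's specification):

```python
import math

def vehicle_step(x, y, heading, lights, crossed=False, inhibit=False, dt=0.1):
    """One update of a two-sensor Braitenberg vehicle (differential drive)."""
    def light_at(px, py):
        # simple inverse-square-style falloff summed over all light sources
        return sum(1.0 / (0.1 + (px - lx) ** 2 + (py - ly) ** 2) for lx, ly in lights)

    # two sensors at the front, angled slightly outward
    left  = light_at(x + math.cos(heading + 0.3), y + math.sin(heading + 0.3))
    right = light_at(x + math.cos(heading - 0.3), y + math.sin(heading - 0.3))
    if inhibit:                      # inhibitory wiring: light suppresses the signal
        left, right = 1.0 / (1.0 + left), 1.0 / (1.0 + right)
    if crossed:                      # crossed wires: each sensor drives the opposite wheel
        v_left, v_right = right, left
    else:                            # straight wires: sensor drives the wheel behind it
        v_left, v_right = left, right
    speed = (v_left + v_right) / 2.0
    turn = v_right - v_left          # a faster right wheel turns the vehicle left
    return (x + dt * speed * math.cos(heading),
            y + dt * speed * math.sin(heading),
            heading + dt * turn)
```

With a light on its left, the straight-wired vehicle turns away from it and the crossed-wired vehicle turns toward it, matching the fear and aggression behaviors described above.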

Using thresholds ( see logic version ) you begin to see group behavior: one set of vehicles herds and traps another set.

So what has all this to do with current artificial intelligence? Some of our best techniques today came from earlier work that was started and then abandoned. Some of our best mathematical algorithms come from extremely early math. And it reminds you ( and me ) that simple rules can create very complex behavior in game characters and artificial life forms.

Four-neurode versions of these vehicles have been built, and they exhibit more complex, non-linear behavior. “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” (Edsger Dijkstra) At what point does the behavior become realistic enough to be considered a life form?

Source Code:

ObjC simulation of Braitenberg Vehicles( #2 fear and aggression )

ObjC simulation of Braitenberg Vehicles( #3 love, explore and complex )

ObjC simulation of Braitenberg Vehicles( #4 values and tastes )

ObjC simulation of Braitenberg Vehicles( #5 Logic )

More information:

Braitenberg Vehicles ( Java and Lisp simulators here )

Papers:

Swarm modelling. The use of Swarm Intelligence to generate architectural form. ( pdf)

## SantaFe Ant Trail

I’m revisiting this problem again and hoping to dig more into evolutionary computing.

The SantaFe Ant Trail is a twisting trail of 89 food pellets, with gaps in between. The ants must find a way to pick up all of the pellets.

Each ant checks whether there is food directly in front of it; if so, it moves forward and collects it. Otherwise it turns left, turns right, or moves one step forward, depending on its coding. Once collected, the food is removed for that ant.

The genome must be a minimum of 32 chromosomes; that’s the minimum the state machine must have to solve the trail. It has been solved with 32 chromosomes, 200 steps, 200 generations, and 65,536 starting ants.

The fitness of the ant is the number of pellets it has collected.

Each generation, the top 50% of the ants mate: the child takes half its genes, in sequence, from one parent and half from the other.
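The mating rule can be sketched like this (Python; `mate`, the action alphabet, and the mutation-rate knob are hypothetical names and choices of mine, the mutation being the fix I mention below for lost diversity):

```python
import random

def mate(parent_a, parent_b, mutation_rate=0.01, actions=("LEFT", "RIGHT", "MOVE")):
    """Child takes the first half of its genes, in sequence, from one parent and
    the second half from the other; occasional random mutation keeps diversity up."""
    half = len(parent_a) // 2
    child = list(parent_a[:half]) + list(parent_b[half:])
    for i in range(len(child)):
        if random.random() < mutation_rate:
            child[i] = random.choice(actions)
    return child

# with mutation turned off the result is plain half-and-half crossover
child = mate(["MOVE"] * 4, ["LEFT"] * 4, mutation_rate=0.0)
```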

I only released 2048 ants at once. I suppose without graphics the computer could have handled many more, but it was interesting to watch, and I spotted several problems along the way while watching.

At 512 ants, my best score was 43, at 1024 best was 50.

Problems I ran into:

– If one or a few ants scored far above the rest, they cloned the entire population and diversity was lost.

– I need a way to cull individuals too far from the center and/or remove clones. I inserted random mutations rather than checking for clones; sorting by genes might make it easy to identify and remove clones.

Source code:

ObjC iOS SantaFe Ant Trail

More info:

Wiki SantaFe Ant Trail

Solving the Artificial Ant on the Santa Fe Trail Problem in 20,696 Fitness Evaluations

See also:

Evolutionary AI