Archive for March, 2007
Probabilistic Neural Networks
Probabilistic neural networks are feed-forward networks built with three layers. They are derived from Bayes Decision Networks. They train quickly because each training vector is presented in a single pass, rather than several. Probabilistic neural networks estimate the probability density function for each class based on the training samples.
The probabilistic neural network uses a Parzen window or a similar estimator of the probability density function. This is calculated for each stored training vector, and it is what is used in the dot product against the input vector as described below. Usually a spherical Gaussian basis function is used, although many other functions work equally well.
Vectors must be normalized before they are input to the network. There is an input unit for each dimension in the vector. The input layer is fully connected to the hidden layer. The hidden layer has a node for each classification. Each hidden node calculates the dot product of the input vector with a stored training vector, subtracts 1 from it, and divides the result by the standard deviation squared. The output layer has a node for each pattern classification. The sum for each hidden node is sent to the output layer, and the highest value wins.
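The classification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the function names, the sample data, and the value of sigma are hypothetical. The text also does not say what happens after dividing by the standard deviation squared; this sketch assumes the usual Gaussian step of taking exp() of that value, with one hidden unit per training vector summed per class.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def pnn_classify(x, training_set, sigma=0.5):
    """Return (winning_label, per_class_sums) for input vector x.

    training_set is a list of (label, vector) pairs. sigma is the
    standard deviation (smoothing parameter); 0.5 is an arbitrary choice.
    """
    x = normalize(x)
    class_sums = {}
    for label, vector in training_set:
        w = normalize(vector)
        dot = sum(a * b for a, b in zip(x, w))
        # (dot - 1) / sigma^2 as in the text; exp() is the assumed Gaussian step.
        activation = math.exp((dot - 1.0) / sigma ** 2)
        class_sums[label] = class_sums.get(label, 0.0) + activation
    # The class with the highest summed activation wins.
    winner = max(class_sums, key=class_sums.get)
    return winner, class_sums

# Hypothetical two-class training set.
training_set = [
    ("a", [1.0, 0.1]), ("a", [0.9, 0.2]),
    ("b", [0.1, 1.0]), ("b", [0.2, 0.8]),
]
winner, sums = pnn_classify([0.8, 0.3], training_set)
print(winner, sums)
```

Note that "training" here is nothing more than storing the vectors, which is why the network trains in one pass but has to visit every stored vector at classification time.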
The probabilistic neural network trains immediately, but execution is slow and it requires a large amount of memory. It really only works for classifying data. The training set must be a thorough representation of the data. Probabilistic neural networks handle data that has spikes and outliers better than other neural nets.
More information:
A tutorial on probabilistic neural networks ( very nice )
A weighted probabilistic neural network (pdf)
Speech recognition using the probabilistic neural network (pdf)
Parzen probabilistic neural networks
Associative Memories
Associative memory stores information by associating or correlating it with other memories. Most neural nets have the capability to store memory this way. Associative memory systems can recall information based on garbled input, store details in a distributed fashion, are accessible by content, are very robust, and, most importantly, can generalize. The two classes of associative memory, classified by how they store memories, are auto-associative and hetero-associative.
Auto-associative: each data item is associated with itself. Used for cleaning up and recognizing handwriting. Training is done by giving the same pattern to the input and output nodes.
Hetero-associative: different data items are associated with each other. One pattern is given and another is output; a translation program would fall in this category. This one is trained by giving one input pattern to the input nodes and the desired output pattern to the output nodes.
The main architectures for associative memory neural networks are: crossbar (aka Hopfield); adaptive filter networks; competitive filter networks. Adaptive filter networks, like Adelines, test each neurode to see whether the input is the pattern specific to that neurode. These are used in signal processing.
Competitive filter networks, like Kohonen networks, have neurodes competing to be the one that matches the pattern. They self-organize, and they perform statistical modeling without outside aid or input.
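As a rough illustration of auto-associative recall from garbled input, here is a minimal crossbar (Hopfield-style) sketch in Python. The posts do not describe a specific training rule, so the Hebbian outer-product rule, the bipolar (+1/-1) encoding, the function names, and the sample patterns are all assumptions made for this example.

```python
def train_autoassociative(patterns):
    """Build a crossbar weight matrix by associating each pattern with itself."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, pattern):
    """One synchronous update: each node takes the sign of its weighted input."""
    result = []
    for row in weights:
        total = sum(w * x for w, x in zip(row, pattern))
        result.append(1 if total >= 0 else -1)
    return result

# Two hypothetical stored patterns in bipolar form.
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
weights = train_autoassociative(stored)

# The first stored pattern with its first element flipped.
garbled = [-1, 1, 1, 1, -1, -1, -1, -1]
print(recall(weights, garbled))
```

The memory is distributed across the whole weight matrix rather than stored in one place, which is why a single damaged element in the input can still be cleaned up.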
Adaptive Resonance Networks
Developed by spouses Stephen Grossberg and Gail Carpenter, Adaptive Resonance Theory (ART) is a self-organizing network that learns without supervised training. ART uses competitive input-output training to allow the network to learn new information without losing information already learned. It does this by classifying information into groups. If a group is not found for incoming information, a new group is formed.
These networks consist of an input (comparison) layer with a node for each input dimension and an output (recognition) layer that has a node for each category. There is a hidden layer between them that filters information fed back to the input layer from the output layer. There are also controls for each layer that govern the direction of information flow. Competitive training occurs and the highest valued node wins.
Patterns are presented to the input layer, which tries to find the closest matching weight vector. If a matching weight vector is found, it is compared to the categories for a match. If there are weight and category matches, then the network is in resonance and training is performed to better match the weights. If no category is found, a new one is created.
Input is in the form of a binary vector. Say we are trying to match people with businesses. An input { 1, 0, 0, 0 } might describe a customer’s hobbies:
garden true
golf false
run false
tennis false
Algorithm:
Initialize prototype vector
For each example input vector:
    is example close to an existing prototype ( proximity test )?
        yes -> ? pass vigilance test ?
            yes -> place vector in current prototype vector
            no -> ? is example close to another existing prototype ?
                yes -> add to existing prototype
                no -> add prototype
        no -> add prototype
end for each
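The loop above can be sketched in Python as follows. This is a simplified reading of the algorithm, not a definitive ART implementation: it uses the proximity and vigilance tests defined in the next part of this post, tries each existing prototype in turn, and assumes that "placing" a vector in a prototype means replacing the prototype with the bitwise AND of the two. All function and parameter names are hypothetical.

```python
def bit_and(p, i):
    """Element-wise AND of two binary vectors (P && I)."""
    return [a & b for a, b in zip(p, i)]

def magnitude(v):
    """Number of 1s in a binary vector (|V|)."""
    return sum(v)

def passes_proximity(p, i, beta):
    d = len(i)
    return magnitude(bit_and(p, i)) / (beta + magnitude(p)) > magnitude(i) / (beta + d)

def passes_vigilance(p, i, vigilance):
    return magnitude(bit_and(p, i)) / magnitude(i) >= vigilance

def cluster(examples, beta=1.0, vigilance=0.5):
    """Assign each example to a prototype, creating new prototypes as needed."""
    prototypes = []
    assignments = []
    for example in examples:
        for index, proto in enumerate(prototypes):
            if passes_proximity(proto, example, beta) and passes_vigilance(proto, example, vigilance):
                # Assumed update rule: keep only the bits both vectors share.
                prototypes[index] = bit_and(proto, example)
                assignments.append(index)
                break
        else:
            # No existing prototype matched, so the example starts a new group.
            prototypes.append(list(example))
            assignments.append(len(prototypes) - 1)
    return prototypes, assignments

examples = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
prototypes, assignments = cluster(examples)
print(prototypes, assignments)
```

Because a new prototype is created whenever nothing matches, the network can absorb new kinds of input without overwriting the groups it has already formed.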
Testing of vectors:
VP = vigilance parameter ( set between 0 and 1 )
d = the dimension of the vectors ( number of inputs )
B = beta ( usually chosen by experiment )
P = prototype vector
I = input vector
1) proximity test
| P && I | / ( B + |P| ) ?> |I| / ( B + d )
2) vigilance test
| P && I | / | I | ?>= VP
So if our input vector I = { 1 1 0 }
and our prototype vector P = { 0 1 0 }
let's try B = 1.0
and VP = 0.5
So then:
d = 3
P && I = { 0 1 0 }
| P && I | = 1
|P| = 1
B + |P| = 2
|I| = 2
B + d = 4
our proximity test is then:
1/2 ?> 2/4 false
our vigilance test is then:
1/2 >= 0.5 true
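The arithmetic of this worked example can be checked with a few lines of Python. The variable names simply mirror the symbols used above; the snippet is a sketch for verification only.

```python
I = [1, 1, 0]   # input vector
P = [0, 1, 0]   # prototype vector
B = 1.0         # beta
VP = 0.5        # vigilance parameter
d = len(I)      # dimension of the vectors

p_and_i = [p & i for p, i in zip(P, I)]   # P && I
overlap = sum(p_and_i)                    # | P && I |

# 1) proximity test: 1/2 > 2/4
proximity = overlap / (B + sum(P)) > sum(I) / (B + d)
# 2) vigilance test: 1/2 >= 0.5
vigilance = overlap / sum(I) >= VP

print(p_and_i, proximity, vigilance)
```

The proximity test fails only because the comparison is strict: both sides come out to exactly 0.5.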
Adeline Neural Nets
Adeline (more commonly spelled Adaline, for ADAptive LINear Neuron) was developed by Widrow and Hoff in 1959. It is a classic example of an ‘Adaptive Filter Associative Memory Neural Net’ or ‘Adaptive Linear Element’. It has only an input layer consisting of a node for each input and an output layer that has only one node. It can learn to sort linearly separable input into two groups. Inputs are real numbers between -1 and +1. The neurode forms a weighted sum of all inputs and outputs +/-1. There is one input with a weighted synapse for every number in the input vector. It has an extra input, the ‘mentor’, used during training, which carries the expected output for the given input.
Adeline can only separate data into two groups, and the data must be linearly separable. The Adeline’s training starts with a straight line drawn anywhere on the plot, provided it intersects the origin. The training effectively rotates this line until it properly separates the data into the two groups, using the least mean squares algorithm. Adeline tests the input vector against the weight vector using their dot product, which depends on the angle between the two vectors. If that angle is less than 90 degrees, the dot product is positive and +1 is output; if it is greater than 90 degrees, the dot product is negative and -1 is output.
Dot product: $A \cdot B = A_x B_x + A_y B_y + \dots = |A||B|\cos\theta$, where $\theta$ is the angle between vector A and vector B, and $|A|$ and $|B|$ are the lengths of the vectors. (The weights are then adjusted by the amounts in the vector.) The learning constant must be less than 2 or the network will not stabilize.
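The two forms of the dot product can be compared in a short Python sketch. The helper names and the sample vectors are hypothetical, chosen only so that the angle is easy to check by hand.

```python
import math

def dot(a, b):
    """Component form: Ax*Bx + Ay*By + ..."""
    return sum(x * y for x, y in zip(a, b))

def length(v):
    """Length of a vector."""
    return math.sqrt(dot(v, v))

def angle_degrees(a, b):
    """Angle between two non-zero vectors, recovered from the dot product."""
    cos_theta = dot(a, b) / (length(a) * length(b))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))

a = [1.0, 0.0]
b = [1.0, 1.0]
print(dot(a, b), angle_degrees(a, b))
```

A positive dot product corresponds to an angle under 90 degrees, a negative one to an angle over 90 degrees, which is what makes its sign usable as a two-way classifier.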
Input patterns are used to set the initial weights, during which time the mentor node is set to +/-1 depending on the desired output. Following that, a training set, different from the initial set, is tried. If the answer is correct we do nothing. If the answer is not correct, the weights are adjusted using the delta rule.
The delta rule changes the weights in proportion to the amount they are incorrect. The error is determined by subtracting the network’s actual response from the expected response; this is multiplied by a training constant and by the size and direction of the input pattern vector to determine the change in weight. This is also known as the least mean squares (LMS) rule:
$\Delta W_j = 2 \cdot \text{LearningRate} \cdot \text{InputNode}_j \cdot (\text{DesiredOutput} - \text{ActualOutput})$
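A minimal Python sketch of this training procedure follows. Several things here are assumptions rather than details from the text: the weights start at zero instead of being set from initial patterns, the sample data is made up, and the actual output used in the error is the thresholded +/-1 value (the classic Widrow-Hoff formulation uses the raw weighted sum instead). Function and parameter names are hypothetical.

```python
def output(weights, inputs):
    """Weighted sum of the inputs, thresholded to +1 or -1."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else -1

def train(samples, learning_rate=0.1, max_epochs=100):
    """Adjust weights with the delta rule until every sample is classified correctly."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in samples:
            actual = output(weights, inputs)
            if actual != desired:  # correct answers leave the weights alone
                errors += 1
                for j, x in enumerate(inputs):
                    # ChangeInWeight = 2 * LearningRate * Input_j * (Desired - Actual)
                    weights[j] += 2 * learning_rate * x * (desired - actual)
        if errors == 0:
            break
    return weights

# Hypothetical data, separable by a line through the origin.
samples = [
    ([0.5, 0.8], 1), ([0.9, 0.2], 1),
    ([-0.6, -0.7], -1), ([-0.3, -0.9], -1),
]
weights = train(samples)
print(weights, [output(weights, x) for x, _ in samples])
```

There is no bias input in this sketch, which matches the description of a separating line that must pass through the origin.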
Collections of Adelines in a layer can be taught multiple patterns. Adelines can have additional inputs that are powers or products of the inputs; these are referred to as higher order networks. They may work better at pattern solving than a many layered single order network. This may be used in more than two dimensions. A line separates linear data in a plane, a plane separates linear data in three dimensions, etc. Adelines and Madelines can be used to clean up noise from data, provided there is a good copy of the data to learn from during training.
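To illustrate the higher order idea, the sketch below adds the product of the two inputs as a third input. With that extra input, a single thresholded unit can reproduce XOR on +/-1 inputs, which no line through the plane of the original two inputs can separate. The weights are chosen by hand for the example rather than learned, and all names are hypothetical.

```python
def expand(inputs):
    """Append the product of the two inputs as a higher order input."""
    x1, x2 = inputs
    return [x1, x2, x1 * x2]

def output(weights, features):
    """Weighted sum of the features, thresholded to +1 or -1."""
    total = sum(w * f for w, f in zip(weights, features))
    return 1 if total >= 0 else -1

# Hand-picked weights: only the product term matters for XOR.
weights = [0.0, 0.0, -1.0]

# XOR in bipolar form: +1 when the inputs differ, -1 when they match.
xor_samples = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
results = [output(weights, expand(x)) for x, _ in xor_samples]
print(results)
```

In the expanded three-dimensional input space the two groups are separated by a plane, in line with the remark above about planes separating data in three dimensions.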