Archive for the ‘neural networks’ Category
Wavelets
I hadn’t heard anything about wavlets in several years and then this news story caught my eye.
. . .Meningiomas are tumours of the brain and nervous system and they account for 20% of all brain tumours. Doctors have a major problem of discriminating between the four different subtypes of meningiomas but doctors face three key problems in making such a diagnois:
– The work can be painstakingly slow requiring up to two hours of analysis and expert consideration of a full “slide” of information.
– The finest tumour specialists (histopathologists) can at times come up with completely contradictory findings based on slight variations in their method of analysis.
– Currently the slides that specialists examine contain a few million pixels of data and the task of tumour diagnosis is painstakingly slow already. This problem is quite literally growing as medical equipment is coming on stream that can produce slides with hundreds of millions pixel resolution.
. . .
Now researchers in the University of Warwick’s Department of Computer Science have devised a method of using “wavelets” to provide an automated analysis of the varying texture of the tumours and guidance to doctor’s within seconds of being presented the data.
[ read more Wavelets crunch through doctor’s day to long struggle to diagnose brain tumors
Maybe wavelets are about to make a bigger splash in the world of artificial intelligence?
Learn more:
An introduction to wavelets
A really friendly guide to wavelets
Tutorial on continuous wavelet analysis
Wavelet ( Wolfram site )
Wavelets
Wavelets for computer graphics
Code:
WAILI – Wavelets C++ library ( open source )
PyWavelets – Python library ( open source )
Wavelets in Java ( source code provided)
Hopfield Networks
John Hopfield, in the late 1970′s, brought us these networks. These networks can be generalized and are robust. These networks can also be described mathematically. On the downside they can only store 15% as many states as they have neurodes, and the patterns stored must have Hamming distances that are about 50% of the number of neurodes.
Hopfield networks, aka crossbar systems, are a networks that recall what is fed into them. This makes it useful for restoring degraded images. It is a fully connected net, every node is connected to every other node. The nodes are not connected to themselves.
Calculating the weight matrix for a Hopfield network is easy. This is an example with 3 input vectors. You can train the network to match any number of vectors provided that they are orthogonal.
Orthogonal vectors are vectors that give zero when you calculate the dot product.
orthogonal (0, 0, 0, 1) (1, 1, 1, 0) => 0*1 + 0*1 + 0*1 + 1*0 = 0
orthogonal (1, 0, 1, 0) (0, 1, 0, 1) => 1*0 + 0*1 + 1*0 + 0*1 = 0
NOT orthogonal (0, 0, 0, 1) (0, 1, 0, 1) = 0*0 + 0*1 + 0*0 + 1*1 = 1
Orthogonal vectors are perpendicular to each other.
To calculate the weight matrix for the orthogonal vectors (0, 1, 0, 0), (1, 0, 1, 0), (0, 0, 0, 1)
first make sure all the vectors are orthogonal
(0, 1, 0, 0) (1, 0, 1, 0) => 0*1 + 1*0 + 0*1 + 0*0 = 0
(0, 1, 0, 0) (0, 0, 0, 1) => 0*0 + 1*0 + 0*0 + 0*1 = 0
(1, 0, 1, 0) (0, 0, 0, 1) => 1*0 + 1*0 + 1*0 + 0*1 = 0
Change the zeros to negative ones in each vector
(0, 1, 0, 0) => (-1, 1, -1, -1)
(1, 0, 1, 0) => (1, -1, 1, -1)
(0, 0, 0, 1) => (-1, -1, -1, 1)
Multiply each matrix by itself
(-1) (-1, 1, -1, -1)
[ 1, -1, 1, 1]
( 1)
[-1, 1, -1, -1]
(-1)
[ 1, -1, 1, 1]
(-1)
[ 1, -1, 1, 1]
( 1) (1, -1, 1, -1)
[ 1, -1, 1, -1]
(-1)
[-1, 1, -1, 1]
( 1)
[ 1, -1, 1, -1]
(-1)
[-1, 1, -1, 1]
(-1) (-1, -1, -1, 1)
[ 1, 1, 1,-1]
(-1)
[ 1, 1, 1,-1]
(-1)
[ 1, 1, 1,-1]
( 1)
[-1,-1,-1,-1]
The third step is to put zeros on the main diagonal of each matrix and add them together. (Putting zeros on the main diagonal keeps each node from being connected to itself.
( 0,-1, 1, 1)
( 0,-1, 1,-1)
(0, 1, 1,-1)
(-1, 0,-1,-1)
+
(-1, 0,-1, 1)
+
(1, 0, 1,-1)
( 1,-1, 0, 1)
( 1,-1, 0,-1)
(1, 1, 0,-1)
( 1,-1, 1, 1)
(-1, 1,-1, 0)
(1, 1, 1, 0)
The resulting matrix is:
[ 0,-1, 3,-1]
[-1, 0,-1,-1]
[ 3,-1, 0,-1]
[ 1, 1, 0, 1]
The Hopfield network is fully connected, each weight connects to every other weight
[n1] -> [n2] => weight is -1
[n1] -> [n3] => weight is 3
[n1] -> [n4] => weight is -1
[n2] -> [n1] => weight is -1
[n2] -> [n3] => weight is -1
[n2] -> [n4] => weight is -1
[n3] -> [n1] => weight is 3
[n3] -> [n2] => weight is -1
[n3] -> [n4] => weight is -1
[n4] -> [n1] => weight is 1
[n4] -> [n2] => weight is 1
[n4] -> [n3] => weight is 1
They can also be described as having a potential energy surface with conical holes representing the data. Holes having similar depth and diameter represent data with similar properties. The input data seeks the lowest potential energy and settles in to the closest hole. The energy surfaces of these networks are mathematically equivalent to that of ‘spin glasses’.
Some problems with these neural nets are they are computationally intensive, use lots of memory, and although I haven’t seen it mentioned I would guess race conditions may present a problem since data is updated continuously at each node with the output from one becoming the input for another.
BAM, bidirectional associative memory is an example of a Hopfield network. It consists of two fully connected layers, one for input and one for output. The nodes in each layer do not have connections to other nodes in the same layer. The weights are bidirectional, meaning that there is communication in both directions along the weight vector. There are no connections between neurodes in the same layer. BAM networks take only -1′s and 1′s as input and only output -1′s and 1′s. So if you are working with binary data, you must convert the zeros to -1′s. The weights are calculated in the same way as the Hopfield example above. The nodes are either 0 or 1 (on or off).
Hopfield Network Example
Backpropagation Networks
Forward Feed Back Propagation networks (aka Three Layer Forward Feed Networks) have been very successful. Some uses include teaching neural networks to play games, speak and recognize things. Backpropagation networks can be used on several network architectures. The networks are all highly interconnected and use non-linear transfer functions. The network must have at minimum three layers, but rarely needs more than three layers.
Back-propagation supervised training for Forward-Feed neural nets uses pairs of input and output patterns. The weights on all the vectors are set to random values. Then input is fed to the net and propagates to the output layer and the errors are calculated. Then the error correction is propagated back through the hidden layer then to the input layer in the network. There is one input neurode for each number (dimension) in the input vector, there is one output neurode for each dimension in the output vector. So the network maps IN-dimensional space to OUT-dimension space. There is no set rule for determining the number of hidden layers or the number of neurodes in the hidden layer. However, if too few hidden neurodes are chosen then the network can not learn. If too many are chosen, then the network memorizes the patterns rather than learning to extract relevant information. A rule of thumb for choosing the number of hidden neurodes is to choose log ( 2)X where X is the number of patterns. So if you have 8 distinct patterns to be learned, then log ( 2)8 = 3 and 3 hidden neurodes are probably needed. This is just a rule of thumb, experiment to see what works best for your situation.
The error vector is aimed at zero during training. The vector is calculated as: Error = ( 1/2 * (sum (desired-actual)^2)) To get the error close to zero, with in a tolerance, we use iteration. Each iteration we move a step downward. We take the gradient, the derivative of a vector, and use the steepest descent to minimize the error. So thenewweight = oldW eight + stepsize (-gradientW (e(W )).
The derivative of the function T (x) = (1/(1 e^-x )) is just T (x) (1 T (x)) so using the chain rule we arrive at the error correction function (desired actual)(1 actual) eachN odeOutW eight eachN odeHiddenW eight the weight is then changed by the amount of the error correction function as it propagates back through the network.
To train the net all weights are randomly set to a value between -1.0 and 1.0
To do the calculations going forward through the net:
Each NodeInput is multplied by each weight connected to it
Each HiddenNode sums up these incoming weights and adds a bias to the total
This value is used in the sigmoid function as x { 1/(1+e^-x) }
If this value is greater than the threshold the HiddenNode fires this value, else it fires zero
Each HiddenNode is multiplied by each weight connected to it
Each OutputNode sums up these incoming weights and adds a bias to the total
This value is used in the sigmoid function as x { 1/(1 + e^-x) }
This is the value out put by the OutputNode
To calculate the adjusments during training, you figure out the error and propigate it back like this:
Adjust weights between HiddenNodes and OutputNodes
ErrorOut = ( OutputNode)*(1-OutputNode)(DesiredOutput – OutputNode)
ErrorHidden = (HiddenNode)*(1-HiddenNode)*(Sum { ErrorOut*Weight + ErrorOut*Weight … } ) for each weight connected to this node
LearningRate = LearningConstant * HiddenNode
(LearningConstant is usually set to something around 0.2 )
Adjustment = ErrorOut * LearningRate
Weight = Weight – Adjustment
Adjust weights between HiddenNodes and InputNodes
Adjustment = ( ErrorHidden)*(LearningConstant)*(NodeInput)
Weight = Weight – Adjustment
Adjust Threshold
On OutputNode, Threshold = Threshold – ErrorOut * LearningRate
On HiddenNode, Threshold = Threshold – ErrorHidden * LearningRate
If you use a neural net that also accounts for imaginary numbers you can adapt this function so it is not always positive and calculate all of the four derivatives needed.
Numerous iterations are required for a backpropagation network to learn. Therefore it is not practical for neural nets that must learn in ‘real time’. It will not always arrive at a correct set of weights. It may get trapped in local minimums rather than an actual minimum. This is a problem with the ‘steepest decent’ algorithm. A momentum term that allows the calculation to slide over small bumps is sometimes employed. Back propagation networks do not scale well. They are only good for small neural nets.
Winning Dog Track Predictor ( C/PERL)
