Multiple Inputs for Backpropagation Neural Network - algorithm

I've been working on this for about a week. There are no errors in my code; I just need to get the algorithm and the concept right. I've implemented a neural network with one hidden layer, and I use the backpropagation algorithm to correct the weights.
My problem is that the network can only learn one pattern. If I train it with the same training data over and over again, it produces the desired outputs when given input that is numerically close to the training data.
training_input: 1, 2, 3
training_output: 0.6, 0.25
after 300 epochs...
input: 1, 2, 3
output: 0.6, 0.25
input: 1, 1, 2
output: 0.5853, 0.213245
But if I use multiple varying training sets, it only learns the last pattern. Aren't neural networks supposed to be able to learn multiple patterns? Is this a common beginner mistake? If so, point me in the right direction. I've looked at many online guides, but I've never seen one that goes into detail about dealing with multiple inputs. I'm using sigmoid for the hidden layer and tanh for the output layer.
Example training arrays:
13 tcp telnet SF 118 2425 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 26 10 0.38 0.12 0.04 0 0 0 0.12 0.3 anomaly
0 udp private SF 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 3 0 0 0 0 0.75 0.5 0 255 254 1 0.01 0.01 0 0 0 0 0 anomaly
0 tcp telnet S3 0 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 255 79 0.31 0.61 0 0 0.21 0.68 0.6 0 anomaly
The last column (anomaly/normal) is the expected output. I turn everything into numbers, so each word can be represented by a unique integer.
I give the network one array at a time, then I use the last column as the expected output to adjust the weights. I have around 300 arrays like these.
As for the number of hidden neurons, I tried 3, 6 and 20, but nothing changed.
To update the weights, I calculate the gradient for the output and hidden layers. Then I calculate the deltas and add them to their associated weights. I don't understand how that is ever going to learn to map multiple inputs to multiple outputs. It looks linear.
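For reference, a textbook formulation of one such update step for a single training pair, written as a MATLAB sketch (my own variable names; biases omitted for brevity):

% One back-propagation step for a 1-hidden-layer net
% (sigmoid hidden layer, tanh output layer).
% Wh: nHidden-by-nIn hidden weights, Wo: nOut-by-nHidden output weights,
% x: input column vector, t: target column vector, eta: learning rate.
zh = Wh * x;                   % hidden pre-activations
h  = 1 ./ (1 + exp(-zh));      % sigmoid hidden activations
y  = tanh(Wo * h);             % tanh output activations
deltaO = (t - y) .* (1 - y.^2);          % output delta: error times tanh'
deltaH = (Wo' * deltaO) .* h .* (1 - h); % hidden delta: backpropagated error times sigmoid'
Wo = Wo + eta * deltaO * h';   % the weight changes are outer products
Wh = Wh + eta * deltaH * x';

Note the h .* (1 - h) and (1 - y.^2) factors: the activation derivatives are where the non-linearity enters, so the learned mapping is not linear even though each individual update looks like a linear rule.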

If you train a neural network on one data set for too many iterations of the back-propagation algorithm, the weights will eventually converge to a state that gives the best outcome for that specific training set (overtraining, in machine learning terms). The network then learns only the relationship between input and target data for that specific training set, not the broader, more general relationship you might be looking for. It's better to merge your distinct sets and train the network on the full set.
Without seeing the code for your back-propagation algorithm, I can't give you any advice on whether it's working correctly. One problem I had when implementing back-propagation was not properly calculating the derivative of the activation function at the input value. This website was very helpful for me.
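To make the merging concrete: rather than running many epochs on one example before moving to the next (which lets the last example overwrite the others), present every example in each epoch, in shuffled order. A minimal sketch, assuming a hypothetical update_weights function that performs one back-propagation step:

% X: nSamples-by-nIn input matrix, T: nSamples-by-nOut target matrix
% (hypothetical names). Every example is seen once per epoch.
for epoch = 1:300
    order = randperm(size(X, 1));    % visit the samples in random order
    for i = order
        [Wh, Wo] = update_weights(Wh, Wo, X(i,:)', T(i,:)', eta);
    end
end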

No, neural networks are not supposed to know multiple tricks.
You train them for a specific task.
Yes, they can be trained for other tasks as well,
but then they get optimized for that other task.
So that's why you should create load and save functions for your network, so that you can easily switch brains and perform other tasks, if required.
If you're not sure which task it is currently trained for, train a neural network to find the difference between the tasks.

Related

Connected component labeling in matrix

I'm trying to do the following.
Given the following matrix (where 1's are empty cells and 0's are obstacles):
0 0 1 1
1 0 0 0
1 0 1 1
1 1 0 0
I want it to become like this:
0 0 1 1
2 0 0 0
2 0 2 2
2 2 0 0
What I need to do is to label all connected components (free spaces).
What I have already tried is writing a function called isConnected() which takes the indices of two cells and checks whether there is a connected path between them. By calling this function for every pair of empty cells in the matrix I can label all connected spaces, but this algorithm has a bad time complexity (n^2 * n^2 * O(isConnected())), so I'd prefer to use something else.
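A flood fill avoids the repeated reachability checks: it visits every cell exactly once overall, so the whole labelling costs O(rows * cols). A minimal MATLAB sketch (my own naming; I use 8-connectivity, since the expected output above joins the diagonally adjacent cells (4,2) and (3,3), and I start the labels at 2, so the exact numbers may differ from the example):

% Label each connected region of free cells (1s) with its own id;
% obstacles (0s) stay 0. Uses an explicit stack instead of recursion.
function M = labelRegions(M)
    offs = [-1 -1; -1 0; -1 1; 0 -1; 0 1; 1 -1; 1 0; 1 1]; % 8 neighbours
    [rows, cols] = size(M);
    label = 1;
    for r0 = 1:rows
        for c0 = 1:cols
            if M(r0, c0) == 1            % unvisited free cell: new region
                label = label + 1;
                stack = [r0 c0];
                while ~isempty(stack)
                    r = stack(end, 1); c = stack(end, 2);
                    stack(end, :) = [];
                    if r >= 1 && r <= rows && c >= 1 && c <= cols && M(r, c) == 1
                        M(r, c) = label;                 % claim the cell
                        stack = [stack; [r c] + offs];   % push its neighbours
                    end
                end
            end
        end
    end
end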

Evaluating the model in WEKA

I have applied a classification algorithm to a dataset and came out with the stats below:
Correctly Classified Instances 684 76.1693 %
Incorrectly Classified Instances 214 23.8307 %
Kappa statistic 0
Mean absolute error 0.1343
Root mean squared error 0.2582
Relative absolute error 100 %
Root relative squared error 100 %
Total Number of Instances 898
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0 0 0 0 0 0.5 1
0 0 0 0 0 0.5 2
1 1 0.762 1 0.865 0.5 3
0 0 0 0 0 ? 4
0 0 0 0 0 0.5 5
0 0 0 0 0 0.5 U
Weighted Avg. 0.762 0.762 0.58 0.762 0.659 0.5
=== Confusion Matrix ===
a b c d e f <-- classified as
0 0 8 0 0 0 | a = 1
0 0 99 0 0 0 | b = 2
0 0 684 0 0 0 | c = 3
0 0 0 0 0 0 | d = 4
0 0 67 0 0 0 | e = 5
0 0 40 0 0 0 | f = U
I can understand much of the data; however, I have a problem interpreting the values, since I am new to Weka:
1. Which error rate to report overall?
2. How to interpret if something interesting about the model?
1) Overall error measure
The triplet Precision, Recall and F-Measure together is reported quite often because each number represents a different aspect of the model.
If you would like a single number only, then take the percentage of (in)correctly classified instances or the weighted average F-Measure.
The other error measures are also useful but they require deeper knowledge of statistics (which I'm lacking :-)
2) Something interesting about the model
From Detailed Accuracy By Class and the Confusion Matrix you can see that the model is quite simple. It classifies everything as class 3. The error measures look quite successful, but that is just because 76% of the instances in the dataset have class 3. The model corresponds to the often-used baseline algorithm called "most common class".
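You can verify this directly from the confusion matrix: overall accuracy is the trace divided by the total, and the "most common class" baseline is the largest row sum divided by the total; here the two coincide. A quick sanity check in MATLAB, with the matrix typed in from the output above:

% Rows = actual class, columns = predicted class.
C = [0 0   8 0 0 0;
     0 0  99 0 0 0;
     0 0 684 0 0 0;
     0 0   0 0 0 0;
     0 0  67 0 0 0;
     0 0  40 0 0 0];
accuracy = trace(C) / sum(C(:))        % 684/898 = 0.7617
baseline = max(sum(C, 2)) / sum(C(:))  % majority class share, also 0.7617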
The ROC area is also useful in terms of evaluating accuracy and interpreting how interesting a model is. Simply speaking, the true positive rate is plotted against the false positive rate and the ROC area is calculated as the area underneath this curve. A high ROC area, say 0.9 to 1, indicates that the model is very good at classifying instances, whereas a ROC area of 0.5 (as in your model) means that the model is no better at classification than a random method like flipping coins.

Explanation of Matlab's bwlabel,regionprops & centroid functions

I have spent all day reading up on the above MATLAB functions. I can't seem to find any good explanations online, even on the MathWorks website!
I would be very grateful if anyone could explain bwlabel, regionprops and centroid. How do they work if applied to a grayscale image?
Specifically, they are being used in this code below. How do the above functions apply to the code below?
fun = @minutie;
L = nlfilter(K, [3 3], fun);
%% Termination
LTerm = (L==1);
figure; imshow(LTerm)
LTermLab = bwlabel(LTerm);
propTerm = regionprops(LTermLab, 'Centroid');
CentroidTerm = round(cat(1, propTerm(:).Centroid));
figure; imshow(~K)
set(gcf, 'position', [1 1 600 600]); hold on
plot(CentroidTerm(:,1), CentroidTerm(:,2), 'ro')
That's quite a mouthful to explain!... nevertheless, I'd love to explain it to you. However, I'm a bit surprised that you couldn't understand the documentation from MathWorks. It's actually quite good at explaining a lot (if not all...) of their functions.
BTW, bwlabel and regionprops are not defined for grayscale images. You can only apply these to binary images.
Update: bwlabel still has the restriction of accepting a binary image but regionprops no longer has this restriction. It can also take in a label matrix that is usually output from bwlabel as well as binary images.
Assuming binary images are what you want, my explanation of each function is as follows.
bwlabel
bwlabel takes in a binary image. This binary image should contain a bunch of objects that are separated from each other. Pixels that belong to an object are denoted with 1 / true while those pixels that are the background are 0 / false. For example, suppose we have a binary image that looks like this:
0 0 0 0 0 1 1 1 0 0
0 1 0 1 0 0 1 1 0 0
0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 1
0 0 1 1 1 1 0 0 1 1
You can see that there are four objects in this image. An object is a group of pixels equal to 1 that are connected in a chain, judged by looking at local neighbourhoods. We usually look at 8-pixel neighbourhoods, where you look at the North, Northeast, East, Southeast, South, Southwest, West and Northwest directions; another way of saying this is that the objects are 8-connected. For simplicity, sometimes people look at 4-pixel neighbourhoods, where you just look at the North, East, South and West directions. This would mean that the objects are 4-connected.
The output of bwlabel will give you an integer map where each object is assigned a unique ID. As such, the output of bwlabel would look something like this:
0 0 0 0 0 3 3 3 0 0
0 1 0 1 0 0 3 3 0 0
0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4
0 0 0 0 0 0 0 0 4 4
0 0 2 2 2 2 0 0 4 4
Because MATLAB processes things in column-major order, the labelling comes out as you see above. As such, bwlabel gives you the membership of each pixel: it tells you which object each pixel belongs to if it falls on one, and 0 in this map corresponds to the background. To call bwlabel, you can do:
L = bwlabel(img);
img would be the binary image that you supply to the function and L is the integer map I just talked about. Additionally, you can request 2 outputs from bwlabel, where the second output tells you how many objects exist in the image. As such:
[L, num] = bwlabel(img);
With our above example, num would be 4. As another method of invocation, you can specify the connected pixel neighbourhoods you would examine, and so you can do this:
[L, num] = bwlabel(img, N);
N would be the pixel neighbourhood you want to examine (i.e. 4 or 8).
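To see the difference, consider a tiny toy image (my own example) with two pixels that only touch diagonally:

img = logical([1 0; 0 1]);
L8 = bwlabel(img, 8)    % diagonal neighbours count: one object, both pixels labelled 1
L4 = bwlabel(img, 4)    % diagonals don't count: two objects, labelled 1 and 2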
regionprops
regionprops is a very useful function that I use daily. It measures a variety of image quantities and features in a black and white image. Specifically, given a black and white image, it automatically determines the properties of each contiguous white region that is 8-connected. One of these properties is the centroid, which is also the centre of mass: you can think of it as the "middle" of the object, the (x,y) location of where the middle of the object sits. When you ask for the Centroid property, regionprops calculates the centre of mass of every object it finds and returns a structure array where each element tells you the centroid of one of the objects in your black and white image. Centroid is just one of the properties; there are other useful features as well, but I'm assuming you don't need those. To call regionprops, you would do this:
s = regionprops(img, 'Centroid');
The above code will calculate the centroids of each of your objects in the image. You can specify additional flags to regionprops to select each feature that you want. I highly encourage you to take a look at all of the possible features that regionprops can calculate, as there are many that are useful in a variety of different applications and situations.
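For example, asking for several properties in a single call (these property names all come from the regionprops documentation) looks like this:

% s becomes a struct array with one element per object, each carrying
% the Centroid, Area and BoundingBox fields.
s = regionprops(img, 'Centroid', 'Area', 'BoundingBox');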
Also, by omitting any flags as input into the function, you would calculate all of the features in your image by default. So, let's declare the image that we have seen above in MATLAB and run regionprops on it to calculate the centroids:
img = logical(...
[0 0 0 0 0 1 1 1 0 0;
0 1 0 1 0 0 1 1 0 0;
0 1 1 1 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 1;
0 0 0 0 0 0 0 0 1 1;
0 0 1 1 1 1 0 0 1 1]);
s = regionprops(img, 'Centroid');
... and finally when we display the centroids:
>> disp(cat(1,s.Centroid))
3.0000 2.6000
4.5000 6.0000
7.2000 1.4000
9.6000 5.2000
As such, the first centroid is located at (x,y) = (3, 2.6), the next centroid is located at (x,y) = (4.5, 6) and so on. Take special note that the x co-ordinate is the column while the y co-ordinate is the row.
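If you need the centroids in (row, column) order instead, say for indexing back into the image, you can simply swap the two columns:

xy = cat(1, s.Centroid);   % each row is [x y], i.e. [column row]
rc = fliplr(xy);           % each row is now [row column]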
Hope this is clear!

create multiple NxM matrices

I'm looking for code that returns all combinations of NxM matrices consisting of only ones and zeros, where every row contains precisely one '1' and every column contains one or more '1'. If the rules for rows and columns are hard to program, they may be left out, since computation time is not going to be an issue; a brute-force sketch is given after the examples below.
1 0 0
1 0 0
1 0 0
0 1 0
0 0 1

1 0 0
1 0 0
0 1 0
0 1 0
0 0 1

etc. etc. etc.
Hopefully someone can help me out.
Cheers, Raymond
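Since computation time is not an issue, here is a brute-force MATLAB sketch of one way to do this: each row contains exactly one 1, so a matrix is fully described by a choice of one column index per row. That gives M^N candidates, which can be enumerated and filtered on the column rule:

% Enumerate all N-by-M 0/1 matrices with exactly one 1 per row and
% at least one 1 per column.
N = 5; M = 3;
results = {};                                     % valid matrices
for k = 0:M^N - 1
    choice = mod(floor(k ./ M.^(0:N-1)), M) + 1;  % column index per row
    if numel(unique(choice)) == M                 % every column used
        A = zeros(N, M);
        A(sub2ind([N M], 1:N, choice)) = 1;
        results{end+1} = A;                       % keep it
    end
end

For the 5-by-3 case above this keeps the 150 matrices whose rows cover all three columns.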

Enumerate graphs under edge and symmetry constraints

I would like to create the set of all directed graphs with n vertices where each vertex has k direct successors and k direct predecessors. n and k won't be that large, rather around n = 8 and k = 3. The set includes cyclic and acyclic graphs. Each graph in turn will serve as a template for sampling a large number of weighted graphs.
My interest is in the role of topology motifs, so I don't want to sample weights for any two graphs that are symmetric to each other, where symmetry means that a permutation of the vertices of one graph exists that transforms it into the other.
A naive solution would be to consider all 2 ^ (n * (n - 1)) adjacency matrices (the diagonal excluded) and eliminate all those (most of them) that violate the direct successor or predecessor constraints. For n = 8 that is 56 bits, still few enough to represent and simply enumerate each matrix comfortably inside a uint64_t.
Keeping track of row counts and column counts would be another improvement, but the real bottleneck will be adding the graph to the result set, at which point we need to test for symmetry against each graph that's already in the set. For n = 8 that would already be more than 40,000 permutations (8! = 40,320) per insert operation.
Could anyone refer me to an algorithm that I could read up on that can do all this in a smarter way? Is there a graph library for C, C++, Java, or Python that already implements such a comprehensive graph generator? Is there a repository where someone has already "tabulated" all graphs for reasonable n and k?
Graph isomorphism is, in my opinion, not something you should be thinking about implementing yourself. I believe the current state-of-the-art is Brendan McKay's Nauty (and associated programs/libraries). It's a bit of a bear to work with, but it may be worth it to avoid doing your own, naive graph isomorphism. Also, it's primarily geared towards undirected graphs, but it can do digraphs as well. You may want to check out the geng (which generates undirected graphs) and directg (which generates digraphs given an underlying graph) utilities that come with Nauty.
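For intuition, the canonical-labelling idea behind such tools can be sketched naively: map each adjacency matrix to the lexicographically smallest matrix reachable by relabelling its vertices, and key your result set on that form, so isomorphic graphs collide instead of requiring pairwise tests. A toy MATLAB version (perms(1:8) has 40,320 rows, so this works for n = 8 but is slow; Nauty computes canonical labellings far more efficiently):

% Canonical form of a digraph adjacency matrix: the lexicographically
% smallest matrix over all simultaneous row/column permutations.
% Two digraphs are isomorphic iff their canonical forms are equal.
function key = canonicalForm(A)
    n = size(A, 1);
    P = perms(1:n);                               % all n! relabellings
    best = [];
    for i = 1:size(P, 1)
        v = reshape(A(P(i,:), P(i,:)), 1, []);    % relabel and flatten
        if isempty(best) || isLexSmaller(v, best)
            best = v;
        end
    end
    key = char(best + '0');                       % string key for a map
end

function smaller = isLexSmaller(a, b)
    d = find(a ~= b, 1);                          % first differing entry
    smaller = ~isempty(d) && a(d) < b(d);
end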
This is more of a comment than an answer, because it seems like I have missed something in your question.
First of all, is it possible for such a graph to be acyclic?
I am also wondering about your symmetry constraint. Does this not make all such graphs symmetric to one another? Is it allowed to permute rows and columns of the connection-matrix?
For example, if we allow self-connections in the graph, does the following connection-matrix fulfill your conditions?
1 1 0 0 0 0 0 1
1 1 1 0 0 0 0 0
0 1 1 1 0 0 0 0
0 0 1 1 1 0 0 0
0 0 0 1 1 1 0 0
0 0 0 0 1 1 1 0
0 0 0 0 0 1 1 1
1 0 0 0 0 0 1 1
Starting from this matrix, is it then not possible to permute the rows and columns of it to obtain all such graphs where all rows and columns have a sum of three?
One example of such a matrix can be obtained from the above matrix A in the following way (using MATLAB).
>> A(randperm(8),randperm(8))
ans =
0 1 0 0 0 1 1 0
0 0 1 0 1 0 1 0
1 1 0 1 0 0 0 0
1 1 0 0 0 1 0 0
1 0 0 1 0 0 0 1
0 0 1 1 0 0 0 1
0 0 1 0 1 0 0 1
0 0 0 0 1 1 1 0
PS. In this case I have repeated the command a few times in order to obtain a matrix with only zeros in the diagonal. :)
Edit
Ah, I see from your comments that I was not correct. Of course the permutation index must be the same for rows and columns. I at least should have noticed it when I started out with a graph with self-connections and obtained one without them after the permutation.
A random isomorphic permutation would instead look like this:
idx = randperm(8);
A(idx,idx);
which will keep all the self-connections.
Perhaps this could be of some use when the matrices are generated, but it is not at all as useful as I thought it would be.
