I'm having a hard time constructing a good algorithm for training a neural network because there are so many nuances. First things first: my goal is to teach a NN the XOR function. I'm using sigmoid as the activation function and simple gradient descent. Feed-forward is easy, but backprop is somehow confusing - the steps common to most descriptions of the algorithm are:
1. Calculate the error on the output layer.
2. Propagate this error back to the hidden layer according to the weights.
3. Update the weights on the synapses.
So my questions:
1. Should the bias also be updated, and if so, how? Currently I choose the bias randomly from [0.5; 1].
2. Can the weights be updated during step 2?
3. My approach assumes that the first layer in the NN is an input layer with neurons. So what values in this layer must be updated? Only the weights on the synapses connecting the input layer to the first hidden layer?
The bias should also be updated. Treat the bias as a weight with activation 1.
The backpropagation step should include a weight update. That is the purpose of this step.
Whether the first layer counts as a layer is a question of terminology. Often the inputs are modeled as a layer, but this is a special case, since input = activation. The inputs themselves do not have weights; the weights are the connections to the next layer. Apart from that, this layer is no different from the other layers.
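For concreteness, here is a minimal NumPy sketch of the three steps for XOR, with the bias folded in as an extra weight whose activation is always 1. The 2-2-1 architecture, the learning rate, and all variable names are my own assumptions, not something prescribed by the algorithm; with only 2 hidden units the net can occasionally get stuck in a local minimum, in which case a different seed or an extra hidden unit helps.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR inputs with a constant 1 appended, so the bias is just another weight
# whose "activation" is always 1 (the trick described above).
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs + bias -> 2 hidden units, 2 hidden units + bias -> 1 output
W1 = rng.uniform(-1, 1, size=(3, 2))
W2 = rng.uniform(-1, 1, size=(3, 1))

lr = 0.5
for epoch in range(20000):
    # ---- feed-forward ----
    h = sigmoid(X @ W1)                      # hidden activations, shape (4, 2)
    h_b = np.hstack([h, np.ones((4, 1))])    # append the bias activation 1
    out = sigmoid(h_b @ W2)                  # output, shape (4, 1)

    # ---- backprop ----
    # Step 1: error (delta) on the output layer
    delta_out = (out - y) * out * (1 - out)
    # Step 2: propagate the error to the hidden layer through the weights
    delta_h = (delta_out @ W2[:2].T) * h * (1 - h)   # the bias row carries no error back
    # Step 3: update all weights, bias rows included
    W2 -= lr * h_b.T @ delta_out
    W1 -= lr * X.T @ delta_h

print(np.round(out, 3))  # predictions after training; should be close to [0, 1, 1, 0]
```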
I am currently developing a Self-Organizing Map prototype for clustering big data and don't understand one thing.
The SOM algorithm updates the weights of the Best Matching Unit and its neighborhood to better fit the input vector. Does the algorithm somehow change the actual position of the neuron on the lattice? I mean, if I define a square lattice (5x5), each neuron can be referenced by a two-dimensional coordinate (for example 1/1 or 1/5). So what I am asking is whether the SOM algorithm updates the coordinate of the neuron (for example from 1/1 to 1.1/1.3).
If not, how does the software display clusters? I mean, some programs show a unified distance between neurons (for example, black areas are those where the distances between neurons are low, and white areas are those where the distances are high). So how does the software know which neurons are next to each other?
Weight vectors are updated but the positions of neurons in the lattice never change.
A SOM is a topology-preserving map. That is, if two vectors are close to one another in input space, the same holds for their map representation [1].
But sometimes topographic errors occur.
[1]: Engelbrecht, A.P., 2007. Computational intelligence: an introduction. John Wiley & Sons.
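To make the distinction concrete, here is a rough NumPy sketch (the lattice size, learning rate, neighborhood width, and the Gaussian neighborhood function are my own assumptions, not taken from any particular SOM implementation). The grid coordinates stay fixed; only the weight vectors move, and the U-matrix-style display is computed from distances between the weight vectors of lattice neighbors:

```python
import numpy as np

rng = np.random.default_rng(1)

# A 5x5 lattice: the grid coordinates are fixed; only the weight vectors move.
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)  # (25, 2)
weights = rng.random((25, 3))   # e.g. 3-dimensional input vectors

def som_step(x, weights, grid, lr=0.1, sigma=1.0):
    # Best Matching Unit: the neuron whose weight vector is closest to the input
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    # The neighborhood is defined on the fixed lattice coordinates, not on the weights
    lattice_dist = np.linalg.norm(grid - grid[bmu], axis=1)
    h = np.exp(-lattice_dist**2 / (2 * sigma**2))
    # Pull the weight vectors toward x; the grid stays untouched
    return weights + lr * h[:, None] * (x - weights)

for _ in range(2000):
    x = rng.random(3)
    weights = som_step(x, weights, grid)

# U-matrix-style value per neuron: mean weight-space distance to its lattice neighbors
u = np.zeros(25)
for i in range(25):
    neighbors = np.where(np.isclose(np.linalg.norm(grid - grid[i], axis=1), 1.0))[0]
    u[i] = np.mean(np.linalg.norm(weights[neighbors] - weights[i], axis=1))
print(u.reshape(5, 5).round(3))  # low values = neurons whose neighbors are close in input space
```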
I am curious to know why the Watts-Strogatz random graph generation model uses a ring lattice in its algorithm.
I am creating a spatially embedded network, where nodes are randomly placed on a grid. Each node will connect to its k-nearest neighbors. Then, at random with probability p, connections are rewired.
In principle, this sounds exactly the same as the Watts-Strogatz algorithm, but nodes are not neatly organised in a lattice. In terms of the logical topology, are there any significant differences?
To answer your first question (why use a ring): in my opinion, they used a ring lattice because it is the simplest form of lattice, and they did not need a more complex form to illustrate their point. By using the ring as a starting point and applying their rewiring process, they showed they could obtain the desired topological properties.
For your second question (regarding your own method), I think the effect depends on the spatial distribution of the nodes. Also, what is the exact rule you use to create a link between two nodes? Do both nodes need to be among the k nearest neighbors of one another (in which case the maximal degree is k), or do you apply only a unilateral condition (in which case the degree can be much larger than k, depending on the spatial distribution)?
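If you want to check empirically how much the two constructions differ in their logical topology, comparing clustering coefficient and average path length is quick to set up. Here is a rough sketch assuming networkx is available; the node count, k, p, the unit-square placement, and the unilateral linking rule are all my own assumptions:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)
n, k, p = 200, 6, 0.1

# Reference model: Watts-Strogatz ring lattice with rewiring (networkx built-in)
ws = nx.watts_strogatz_graph(n, k, p, seed=2)

# Spatial variant: nodes scattered on a unit square, unilateral k-nearest-neighbor
# links, then each edge rewired to a random endpoint with probability p
# (no retry on collisions, which is good enough for a sketch).
pos = rng.random((n, 2))
g = nx.Graph()
g.add_nodes_from(range(n))
d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
for i in range(n):
    for j in np.argsort(d[i])[1:k + 1]:        # skip self (distance 0)
        g.add_edge(i, int(j))
for u, v in list(g.edges()):
    if rng.random() < p:
        w = int(rng.integers(n))
        if w != u and not g.has_edge(u, w):
            g.remove_edge(u, v)
            g.add_edge(u, w)

for name, graph in [("ring WS", ws), ("spatial kNN", g)]:
    cc = nx.average_clustering(graph)
    apl = nx.average_shortest_path_length(graph) if nx.is_connected(graph) else float("nan")
    print(f"{name}: clustering={cc:.3f}, avg path length={apl:.3f}")
```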
I have searched the internet a lot but could not figure out why we use weights in each layer of the backpropagation algorithm. I know that the weights are multiplied by the output of the previous layer to get the input of the next layer, but I do not understand why we need these weights. Please help.
Thanks Ark
Without the weights there could be no learning. The weights are the values that are adjusted during the backpropagation learning process. A neural network is nothing more than a function, and the weights parametrize the behavior of that function.
To understand this better, first look at a single-layer perceptron such as the ADALINE.
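To illustrate that last point, here is a tiny ADALINE-style sketch in NumPy (learning the AND function is my own choice of example, as are the learning rate and number of epochs). The only things that change during training are the weights and the bias, and changing them is exactly what changes the function the unit computes:

```python
import numpy as np

rng = np.random.default_rng(3)

# ADALINE learning the AND function with the delta (LMS) rule.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w = rng.uniform(-0.5, 0.5, size=2)   # the adjustable parameters of the "function"
b = 0.0
lr = 0.1

for epoch in range(100):
    for xi, target in zip(X, y):
        out = xi @ w + b             # linear output (no threshold during learning)
        err = target - out
        w += lr * err * xi           # delta rule: nudge the weights to reduce the error
        b += lr * err

# Thresholded predictions; they should approach [0 0 0 1] for AND.
print(np.where(X @ w + b > 0.5, 1, 0))
```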
The question is about the KNN algorithm for classification - the class labels of the training samples are discrete.
Suppose that the training set has n points that are identical to the new pattern we are about to classify, that is, the distances from these points to the new observation are zero (or < epsilon). It may happen that these identical training points have different class labels. Now suppose that n < K and there are some other training points which are part of the nearest-neighbor collection but have non-zero distances to the new observation. How do we assign the class label to the new point in this case?
There are a few possibilities, such as:
1. consider all K neighbors (or more, if there are ties with the worst nearest neighbor) and do majority voting
2. ignore the neighbors with non-zero distances if there are "clones" of the new point in the training data, and take the majority vote only over the clones
3. same as 2., but assign the class with the highest prior probability in the training data (among the clones)
...
Any ideas? (references would be appreciated as well)
Each of the proposed methods will work in some problems, and in others it won't. In general, there is no need to think much about such border cases; simply use the default behaviour (option 1 from your question). In fact, if the border cases of any classification algorithm become a problem, it is a signal of at least one of the following:
bad problem definition,
bad data representation,
bad data preprocessing,
bad model used.
From a theoretical point of view, nothing changes if some points lie exactly on your training data. The only difference would be this: if you have a consistent training set (in the sense that duplicates with different labels do not occur in the training data) that is 100% correct (each label is a perfect labeling for its point), then it would be reasonable to add an if clause that answers according to the label of that point. But in reality this is rarely the case.
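For reference, the default behaviour (option 1) is just a majority vote in which exact duplicates participate like any other neighbor, since their distance is 0 and they always land in the top K. A minimal sketch, assuming NumPy and Euclidean distance; the toy data and the function name are made up for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Plain majority vote over the k nearest neighbors (option 1 above).

    Duplicates of x in the training set are treated like any other neighbor:
    their distance is simply 0, so they always make it into the top k.
    """
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data with two exact clones of the query carrying conflicting labels.
X_train = np.array([[0.0, 0.0], [0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array(["a", "b", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.0, 0.0]), k=3))  # 'a' by majority
```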
I need some input on solving the following problem:
Given a set of unordered (X,Y) points, I need to reduce/simplify the points and end up with a connected graph representation.
The following image shows an example of an actual data set and the corresponding desired output (hand-drawn by me in MSPaint, sorry for the rough drawing, but the basic idea should be clear enough).
Some other things:
The input size will be between 1,000 and 20,000 points.
The algorithm will be run by a user, who can see the input/output visually, tweak input parameters, etc. So automatically finding a solution is not a requirement, but the user should be able to achieve one within a fairly limited number of retries (and parameter tweaks). This also means that the distance between the nodes on the resulting graph can be a parameter and does not need to be derived from the data.
The time/space complexity of the algorithm is not important, but in practice it should be possible to finish a run within a few seconds on a standard desktop machine.
I think it boils down to two distinct problems:
1) Running a filtering pass, reducing the number of points (including some noise filtering for removing stray points)
2) Some kind of connect-the-dots graph problem afterwards. A very problematic area can be seen in the bottom/center part of the example data. It's very easy to end up connecting the wrong parts of the graph.
Could anyone point me in the right direction for solving this? Cheers.
K-nearest neighbors (or, perhaps more accurately, a sigma neighborhood) might be a good starting point. If you're working in strictly Euclidean space, you may be able to achieve 90% of what you're looking for by specifying some L2 distance threshold beyond which points are not connected.
The next step might be some sort of spectral graph analysis where you can define edges between points using some sort of spectral algorithm in addition to a distance metric. This would give the user a lot more knobs to turn with regards to the connectivity of the graph.
Both of these approaches should be able to handle outliers, e.g. "noisy" points that simply won't be connected to anything else. That said, you could probably combine them for the best possible performance (as spectral clustering performs a lot better when there are no 1-point clusters): run a basic KNN to identify and remove outliers, then a spectral analysis to more robustly establish edges.
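As a concrete starting point for the distance-threshold idea, here is a rough sketch assuming SciPy is available; the eps and min_neighbours values, the random test data, and the function name are all placeholders you would tune. It only covers the "connect nearby points and drop strays" step; the spectral step (or any further simplification) would run on the graph it returns:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_neighbourhood_graph(points, eps, min_neighbours=2):
    """Connect points closer than eps; drop points with too few neighbours as noise.

    Returns the indices of the kept points and an edge list over those indices.
    """
    tree = cKDTree(points)
    pairs = tree.query_pairs(eps)                 # set of (i, j) pairs with distance < eps
    degree = np.zeros(len(points), dtype=int)
    for i, j in pairs:
        degree[i] += 1
        degree[j] += 1
    keep = np.where(degree >= min_neighbours)[0]  # stray points never reach the threshold
    kept = set(keep.tolist())
    edges = [(i, j) for i, j in pairs if i in kept and j in kept]
    return keep, edges

# Hypothetical usage: 'pts' stands in for your (n, 2) array of (X, Y) points.
pts = np.random.default_rng(4).random((1000, 2))
keep, edges = build_neighbourhood_graph(pts, eps=0.05, min_neighbours=3)
print(len(keep), "points kept,", len(edges), "edges")
```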