Weights of layers in the backpropagation algorithm

I have searched the internet a lot, but I could not work out why we use weights in each layer of the backpropagation algorithm. I know that the weights are multiplied by the output of the previous layer to get the input of the next layer, but I do not understand why we need these weights. Please help.
Thanks, Ark

Without the weights there could be no learning. The weights are the values that are adjusted during the backpropagation learning process. A neural network is nothing more than a function, and the weights parametrize the behavior of that function.
To understand this better, first look at a single-layer perceptron such as the ADALINE.
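To make that concrete, here is a minimal ADALINE-style training loop in Java. The toy AND task, the learning rate, and all names are illustrative (not from the original post); the point is that the weights and bias are the only quantities that change during training, which is exactly what "learning" means here.

```java
// Minimal ADALINE-style neuron trained with the delta (LMS) rule.
// The toy AND task and all constants are illustrative.
public class AdalineSketch {
    public static void main(String[] args) {
        double[][] inputs = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
        double[] targets  = { -1, -1, -1, 1 };   // AND function encoded in {-1, +1}
        double[] w = { 0.0, 0.0 };               // weights: the adjustable parameters
        double bias = 0.0;
        double lr = 0.1;                         // learning rate

        for (int epoch = 0; epoch < 100; epoch++) {
            for (int i = 0; i < inputs.length; i++) {
                // forward pass: weighted sum of the inputs
                double net = bias;
                for (int j = 0; j < w.length; j++) net += w[j] * inputs[i][j];

                // delta rule: nudge the weights to reduce the squared error
                double error = targets[i] - net;
                for (int j = 0; j < w.length; j++) w[j] += lr * error * inputs[i][j];
                bias += lr * error;
            }
        }
        System.out.printf("learned weights: w0=%.3f w1=%.3f bias=%.3f%n", w[0], w[1], bias);
    }
}
```

Without the weight vector there would be nothing for the delta rule to adjust, and the output would be the same fixed function of the inputs forever.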

Related

Neural network backpropagation and bias

I'm having a hard time constructing a good algorithm for training a neural network because there are many nuances. First things first: my goal is to teach the NN the XOR function. I'm using the sigmoid as the activation function and simple gradient descent. Feedforward is easy, but backprop is somewhat confusing. The steps common to most descriptions of the algorithm are:
1. Calculate the error on the output layer.
2. Propagate this error back to the hidden layer through the weights.
3. Update the weights on the synapses.
So my questions:
1. Should the bias also be updated, and if so, how? Currently I choose the bias randomly from [0.5; 1].
2. Can the weights be updated during step 2?
3. My approach assumes that the first layer in the NN is an input layer with neurons. So which values in this layer must be updated? Only the weights on the synapses connecting the input layer to the first hidden layer?
The bias should also be updated. Treat the bias as a weight with activation 1.
The backpropagation step should include a weight update. That is the purpose of this step.
The first layer is a question of terminology. Often the inputs are modeled as a layer. However, this is a special case, because the input equals the activation. The inputs themselves do not have weights; the weights are on the connections to the next layer. That layer is then no different from the rest of the layers.
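As a small illustration of "treat the bias as a weight with activation 1", here is a sketch in Java of updating a single sigmoid output unit with gradient descent; the variable names and values are made up for the example.

```java
// Sketch: updating one sigmoid output unit; the bias is treated as a weight whose input is 1.0.
// All names and values are illustrative.
public class BiasUpdateSketch {
    public static void main(String[] args) {
        double[] hidden = {0.3, 0.7};   // activations coming from the hidden layer
        double[] w = {0.1, -0.2};       // weights hidden -> output
        double bias = 0.05;
        double target = 1.0, lr = 0.5;

        // forward pass
        double net = bias;
        for (int j = 0; j < w.length; j++) net += w[j] * hidden[j];
        double out = 1.0 / (1.0 + Math.exp(-net));          // sigmoid

        // backward pass: delta = dE/dnet for squared error E = 0.5 * (out - target)^2
        double delta = (out - target) * out * (1.0 - out);

        // step 3: weight update; the bias gets exactly the same rule with input 1.0
        for (int j = 0; j < w.length; j++) w[j] -= lr * delta * hidden[j];
        bias -= lr * delta * 1.0;

        System.out.printf("updated bias = %.4f%n", bias);
    }
}
```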

Demons algorithm for image registration (for dummies)

I was trying to make an application that compares the difference between two images in Java with OpenCV. After trying various approaches I came across an algorithm called the Demons algorithm.
To me it seems to compute the difference between images by applying some transformation at each location, but I couldn't understand it, since the references I found were too complex for me.
Even if the Demons algorithm does not do what I need, I'm interested in learning it.
Can anyone explain simply what happens in the Demons algorithm and how to write simple code to apply it to two images?
I can give you an overview of general algorithms for deformable image registration; Demons is one of them.
There are three components to such an algorithm: a similarity metric, a transformation model, and an optimization algorithm.
A similarity metric is used to compute pixel-based or patch-based similarity. Common similarity measures are SSD and normalized cross-correlation for mono-modal images, while information-theoretic measures like mutual information are used in the case of multi-modal image registration.
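For reference, here is a small sketch of two of the similarity measures mentioned above, assuming patches are stored as equally sized 2D double arrays; the class and method names are illustrative.

```java
// Two patch similarity measures: SSD and normalized cross-correlation (NCC).
public class SimilaritySketch {
    // Sum of squared differences: lower means more similar.
    static double ssd(double[][] a, double[][] b) {
        double sum = 0;
        for (int y = 0; y < a.length; y++)
            for (int x = 0; x < a[0].length; x++) {
                double d = a[y][x] - b[y][x];
                sum += d * d;
            }
        return sum;
    }

    // Normalized cross-correlation: close to 1 means very similar (mono-modal images).
    static double ncc(double[][] a, double[][] b) {
        int n = a.length * a[0].length;
        double meanA = 0, meanB = 0;
        for (int y = 0; y < a.length; y++)
            for (int x = 0; x < a[0].length; x++) { meanA += a[y][x]; meanB += b[y][x]; }
        meanA /= n; meanB /= n;

        double num = 0, varA = 0, varB = 0;
        for (int y = 0; y < a.length; y++)
            for (int x = 0; x < a[0].length; x++) {
                double da = a[y][x] - meanA, db = b[y][x] - meanB;
                num += da * db; varA += da * da; varB += db * db;
            }
        return num / Math.sqrt(varA * varB + 1e-12);   // small epsilon avoids division by zero
    }

    public static void main(String[] args) {
        double[][] p = { {1, 2}, {3, 4} };
        double[][] q = { {1, 2}, {3, 5} };
        System.out.println("SSD = " + ssd(p, q) + ", NCC = " + ncc(p, q));
    }
}
```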
In the case of deformable registration, a regular grid is generally superimposed over the image, and the grid is deformed by solving an optimization problem formulated so that the similarity metric and a smoothness penalty imposed on the transformation are minimized. Once the grid has been deformed, the final transformation at the pixel level is computed using a B-spline interpolation of the grid, so that the transformation is smooth and continuous.
There are two general approaches to solving the optimization problem: some people use discrete optimization and solve it as an MRF optimization problem, while others use gradient descent; I think Demons uses gradient descent.
In the case of MRF-based approaches, the unary cost is the cost of deforming each node in the grid, computed as the similarity between patches. The pairwise cost, which imposes the smoothness of the grid, is generally a Potts or truncated quadratic potential which ensures that neighboring nodes in the grid have almost the same displacement. Once you have the unary and pairwise costs, you feed them to an MRF optimization algorithm and get the displacements at the grid level; then you use a B-spline interpolation to compute the pixel-level displacement. This process is repeated in a coarse-to-fine fashion over several scales, and the algorithm is also run many times at each scale (reducing the displacement at each node every time).
In the case of gradient-descent-based methods, the problem is formulated with the similarity metric and the grid transformation computed over the image, and then the gradient of this energy function is computed. The energy function is minimized using iterative gradient descent; however, these approaches can get stuck in a local minimum and are quite slow.
Some popular methods are DROP and Elastix, and ITK provides some tools.
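To give a feel for the gradient-descent flavour, here is a very rough sketch of a single update in the spirit of Thirion's classic demons force. A real implementation would also smooth the displacement field with a Gaussian after each iteration, re-warp the moving image, and iterate coarse-to-fine; the plain 2D-array representation and all names are assumptions made for the example.

```java
// Very rough sketch of one demons-style update (in the spirit of Thirion's demons force).
// f = fixed image, m = moving image (already warped by the current displacement field).
public class DemonsStepSketch {
    static void demonsStep(double[][] f, double[][] m, double[][] ux, double[][] uy) {
        int h = f.length, w = f[0].length;
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // central-difference gradient of the fixed image
                double gx = (f[y][x + 1] - f[y][x - 1]) / 2.0;
                double gy = (f[y + 1][x] - f[y - 1][x]) / 2.0;
                double diff = m[y][x] - f[y][x];
                double denom = gx * gx + gy * gy + diff * diff;
                if (denom > 1e-9) {
                    // displacement pushing the moving image toward the fixed image
                    ux[y][x] += diff * gx / denom;
                    uy[y][x] += diff * gy / denom;
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] f = { {0, 0.5, 1}, {0, 0.5, 1}, {0, 0.5, 1} };   // fixed: horizontal ramp
        double[][] m = { {0, 0.5, 1}, {0, 0.8, 1}, {0, 0.5, 1} };   // moving: centre differs
        double[][] ux = new double[3][3], uy = new double[3][3];
        demonsStep(f, m, ux, uy);
        System.out.println("ux at centre = " + ux[1][1]);
    }
}
```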
If you want to know more about algorithms related to deformable image registration, I recommend taking a look at FAIR (and its guide book). FAIR is a toolbox for Matlab, so you will have examples to help you understand the theory.
http://www.cas.mcmaster.ca/~modersit/FAIR/
Then, if you want to see a specific Demons example, here is another toolbox:
http://www.mathworks.es/matlabcentral/fileexchange/21451-multimodality-non-rigid-demon-algorithm-image-registration

Algorithm for building a graph from a set of points

I need some input on solving the following problem:
Given a set of unordered (X, Y) points, I need to reduce/simplify the points and end up with a connected graph representation.
The following image shows an example of an actual data set and the corresponding desired output (hand-drawn by me in MSPaint, sorry for the shoddy drawing, but the basic idea should be clear enough).
Some other things:
The input size will be between 1000 and 20000 points.
The algorithm will be run by a user, who can see the input/output visually, tweak input parameters, etc. So automatically finding a solution is not a requirement, but the user should be able to achieve one within a fairly limited number of retries (and parameter tweaks). This also means that the distance between the nodes on the resulting graph can be a parameter and does not need to be derived from the data.
The time/space complexity of the algorithm is not important, but in practice it should be possible to finish a run within a few seconds on a standard desktop machine.
I think it boils down to two distinct problems:
1) Running a filtering pass, reducing the number of points (including some noise filtering for removing stray points)
2) Some kind of connect-the-dots graph problem afterwards. A very problematic area can be seen in the bottom/center part of the example data. It's very easy to end up connecting the wrong parts of the graph.
Could anyone point me in the right direction for solving this? Cheers.
K-nearest neighbors (or, perhaps more accurately, a sigma neighborhood) might be a good starting point. If you're working in strictly Euclidean space, you may be able to achieve 90% of what you're looking for by specifying some L2 distance threshold beyond which points are not connected.
The next step might be some sort of spectral graph analysis where you can define edges between points using some sort of spectral algorithm in addition to a distance metric. This would give the user a lot more knobs to turn with regards to the connectivity of the graph.
Both of these approaches should be able to handle outliers, e.g. "noisy" points that simply won't be connected to anything else. That said, you could probably combine them for the best possible performance (as spectral clustering performs a lot better when there are no 1-point clusters): run a basic KNN to identify and remove outliers, then a spectral analysis to more robustly establish edges.
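As a starting point, here is a minimal sketch of the distance-threshold idea in Java: connect points that lie within some radius and treat points that end up with no neighbours as noise. The brute-force O(n^2) pairing is an assumption that should still be fast enough for ~20000 points; a k-d tree or grid index would speed it up, and the radius itself becomes the user-tweakable parameter mentioned in the question.

```java
// Build a neighbourhood graph by connecting points within `radius` (L2 distance),
// then drop isolated points as noise. Names are illustrative.
import java.util.*;

public class RadiusGraphSketch {
    public static Map<Integer, List<Integer>> build(double[][] pts, double radius) {
        double r2 = radius * radius;
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int i = 0; i < pts.length; i++) adj.put(i, new ArrayList<>());

        for (int i = 0; i < pts.length; i++) {
            for (int j = i + 1; j < pts.length; j++) {
                double dx = pts[i][0] - pts[j][0];
                double dy = pts[i][1] - pts[j][1];
                if (dx * dx + dy * dy <= r2) {      // within the L2 threshold: add an edge
                    adj.get(i).add(j);
                    adj.get(j).add(i);
                }
            }
        }
        adj.values().removeIf(List::isEmpty);       // stray points are treated as noise
        return adj;
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0}, {1, 0}, {0.5, 0.5}, {10, 10} };   // last point is a stray
        System.out.println(build(pts, 1.5));
    }
}
```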

Pathfinding through four dimensional data

The problem is finding an optimal route for a plane through four-dimensional winds (winds at differing heights that change as you travel; a predictive wind model).
I've used the traditional A* search algorithm and hacked it to get it to work in 3 dimensions and with wind vectors.
It works in a lot of cases, but it is extremely slow (I'm dealing with huge numbers of data nodes) and doesn't work for some edge cases.
I feel like I've got it working "well", but it feels very hacked together.
Is there a better, more efficient way of pathfinding through data like this (maybe a genetic algorithm or a neural network), or something I haven't even considered? Maybe fluid dynamics? I don't know.
Edit: further details.
Data is wind vectors (direction, magnitude).
Data is spaced 15x15km at 25 different elevation levels.
By "doesnt always work" I mean it will pick a stupid path for an aircraft because the path weight is the same as another path. Its fine for path finding but sub-optimal for a plane.
I take many things into account for each node change:
Cost of climbing versus descending.
Wind resistance.
Ignoring nodes with too high of a resistance.
Cost of diagonal travel vs. straight travel, etc.
I use Euclidean distance as my heuristic, or H value.
I use various factors for my weight or G value (the list above).
Thanks!
You can always trade off time versus optimality by using weighted A*.
Weighted A* (or A*-epsilon) is expected to find a path faster than A*, but the path won't be optimal. However, it gives you a bound on its optimality as a function of your epsilon/weight.
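For illustration, the only change weighted A* needs compared to plain A* is the priority function: f = g + epsilon * h with epsilon >= 1, which bounds the solution cost to at most epsilon times the optimum. A minimal sketch (names and numbers are made up):

```java
// The open list in weighted A* orders nodes by the inflated value f = g + epsilon * h.
// With epsilon = 1 this degenerates to plain A*.
import java.util.*;

public class WeightedAStarSketch {
    static class Node implements Comparable<Node> {
        int id; double g, h, epsilon;
        Node(int id, double g, double h, double epsilon) {
            this.id = id; this.g = g; this.h = h; this.epsilon = epsilon;
        }
        double f() { return g + epsilon * h; }                  // the only change vs. plain A*
        public int compareTo(Node o) { return Double.compare(f(), o.f()); }
    }

    public static void main(String[] args) {
        PriorityQueue<Node> open = new PriorityQueue<>();
        open.add(new Node(1, 10.0, 4.0, 1.5));                  // f = 10 + 1.5 * 4 = 16
        open.add(new Node(2, 8.0, 7.0, 1.5));                   // f = 8 + 1.5 * 7 = 18.5
        System.out.println("expand node " + open.poll().id);    // node 1 is expanded first
    }
}
```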
A* isn't advertised to be the fastest search algorithm; it does guarantee that the first solution it finds is the best (assuming you provide an admissible heuristic). If yours isn't working for some cases, then something is wrong with some aspect of your implementation (maybe with the mechanics of A*, maybe the domain-specific parts; given that you haven't provided many details, it's hard to say more than that).
If it is too slow, you might want to reconsider the heuristic you are using.
If you don't need an optimal solution, then some other technique might be more appropriate. Again, given how little you have provided about the problem, hard to say more than that.
Are you planning offline or online?
Typically for these problems you don't know what the winds are until you're actually flying through them. If this really is an online problem, you may want to consider trying to construct a near-optimal policy. There is quite a lot of research in this area already; one of the best references is "Autonomous Control of Soaring Aircraft by Reinforcement Learning" by John Wharington.

Explain 0-extension algorithm

I'm trying to implement the 0-extension algorithm.
It is used to colour a graph with a number of colours where some nodes already have a colour assigned and where every edge has a distance. The algorithm calculates an assignment of colours so that neighbouring nodes with the same colour have as much distance between them as possible.
I found this paper explaining the algorithm: http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=1FBA2D22588CABDAA8ECF73B41BD3D72?doi=10.1.1.100.8049&rep=rep1&type=pdf
but I don't see how to implement it.
I already asked this question on the "theoretical computer science" site, but halfway through the discussion we went beyond the site's scope:
https://cstheory.stackexchange.com/questions/6163/explain-0-extension-algorithm
Can anyone explain this algorithm in layman's terms?
I'm planning to make the final code open source in the jgrapht package.
The objective of 0-extension is to minimize the total weighted cost of edges with different color endpoints rather than to maximize it, so 0-extension is really a clustering problem rather than a coloring problem. I'm generally skeptical that using a clustering algorithm to color would have good results. If you want something with a theoretical guarantee, you could look into approximations to the MAXCUT problem (really a generalization if there are more than two colors), but I suspect that a local-search algorithm would work better in practice.
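To make the local-search suggestion concrete, here is a naive sketch for the 0-extension objective (minimize the total weight of edges whose endpoints get different colours, with the terminal nodes' colours fixed). This is not the LP-based approximation from the linked paper, just the simple kind of local search hinted at above; the graph representation and all names are assumptions for the example.

```java
// Naive local search for the 0-extension objective: repeatedly give each free node the colour
// that minimizes the weight of edges to differently-coloured neighbours, until nothing improves.
import java.util.*;

public class ZeroExtensionLocalSearch {
    // edges.get(v) = list of {neighbour, weight}; color[v] is fixed for terminals, -1 otherwise
    static void localSearch(List<List<double[]>> edges, int[] color, int numColors, boolean[] terminal) {
        Random rnd = new Random(42);
        for (int v = 0; v < color.length; v++)
            if (!terminal[v]) color[v] = rnd.nextInt(numColors);   // random start for free nodes

        boolean improved = true;
        while (improved) {
            improved = false;
            for (int v = 0; v < color.length; v++) {
                if (terminal[v]) continue;
                // cost[c] = weight of edges from v to neighbours that do NOT have colour c
                double[] cost = new double[numColors];
                for (double[] e : edges.get(v)) {
                    int u = (int) e[0]; double w = e[1];
                    for (int c = 0; c < numColors; c++)
                        if (c != color[u]) cost[c] += w;
                }
                int best = color[v];
                for (int c = 0; c < numColors; c++)
                    if (cost[c] < cost[best]) best = c;            // strict improvement only
                if (best != color[v]) { color[v] = best; improved = true; }
            }
        }
    }

    static void addEdge(List<List<double[]>> edges, int u, int v, double w) {
        edges.get(u).add(new double[]{ v, w });
        edges.get(v).add(new double[]{ u, w });
    }

    public static void main(String[] args) {
        int n = 4, numColors = 2;
        List<List<double[]>> edges = new ArrayList<>();
        for (int i = 0; i < n; i++) edges.add(new ArrayList<>());
        addEdge(edges, 0, 1, 1.0);
        addEdge(edges, 1, 2, 1.0);
        addEdge(edges, 1, 3, 3.0);
        addEdge(edges, 3, 2, 1.0);
        int[] color = { 0, -1, 1, -1 };                  // nodes 0 and 2 are terminals
        boolean[] terminal = { true, false, true, false };
        localSearch(edges, color, numColors, terminal);
        System.out.println(Arrays.toString(color));
    }
}
```

Each move strictly decreases the total weight of cut edges, so the loop always terminates, though only at a local optimum.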

Resources