How to adjust the cost function for rprop in Accord? - backpropagation

I am playing with Accord.NET and, sadly, seem unable to find detailed documentation anywhere.
I would like to try different cost functions for an activation network with a resilient backpropagation teacher. What is the default? Least squares? How can I change it to, say, cross entropy?

Related

Path finding in real world maps with custom valuation function

Description:
Our customer, who works in logistics, wants automated path finding for his business. The issue is that each city, harbor, state, and country has a different legal system with different policies for various goods. For example, some goods are forbidden from being transported through a country, fees apply to others, and so on.
Intention:
For a given transport of particular goods, I need to find the best possible route with respect to the known policies. If no such route exists, I have to find several of the least problematic alternatives.
Questions:
I believe that custom weights on the nodes and edges of the graph rule out the public map APIs. A quick search showed that no well-known API accepts a custom valuation function. Am I right?
I thought that path finding over a custom graph is quite simple, even with a custom valuation function. On the other hand, real-world maps form such a giant graph that merely thinking about applying conventional algorithms seems silly. Should I pursue this solution, or is it too complicated and I should look at other options?
The only thing I can think of is something like the OSPF algorithm: divide the world into regions with their policies and then find routes only through the permitted regions. Then, for each region on these routes, find routes through states, and so on. That means dynamically changing the granularity based on the supported geographical objects (countries, states, cities, ...) and slowly converging to the finest granularity, i.e., streets. The downside of this approach is that it requires a lot of programming as well as a lot of computation. Furthermore, I am not sure whether maps with such granularity exist and are publicly available. Is this the wrong way of thinking, or is it too complicated?
What are other options? I could not figure out anything else.
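To make the "custom valuation function" idea concrete, here is a minimal sketch of Dijkstra's algorithm where the edge cost comes from a caller-supplied function. Everything in it is an assumption for illustration: the graph layout, the `km` and `country` attributes, the `forbidden` and `fees` tables, and the function names are hypothetical, and the costs must be non-negative for Dijkstra to be valid. It says nothing about how well this scales to a full real-world map.

```python
import heapq

def cheapest_route(graph, start, goal, cost_fn):
    """Dijkstra over graph = {node: [(neighbour, attrs), ...]}.

    cost_fn(node, neighbour, attrs) returns a non-negative cost,
    or None when the edge may not be used at all.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            # Walk the predecessor links back to the start.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, attrs in graph.get(node, []):
            step = cost_fn(node, neighbour, attrs)
            if step is None:
                continue  # edge forbidden for these goods
            new_cost = cost + step
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return None  # no permitted route exists

def make_cost_fn(goods, forbidden, fees):
    """Build a valuation function for one kind of goods (hypothetical policy tables)."""
    def cost_fn(node, neighbour, attrs):
        country = attrs["country"]
        if goods in forbidden.get(country, set()):
            return None
        return attrs["km"] + fees.get((country, goods), 0.0)
    return cost_fn

# Toy data: two routes from A to D through different countries.
graph = {
    "A": [("B", {"km": 100, "country": "X"}), ("C", {"km": 150, "country": "Y"})],
    "B": [("D", {"km": 100, "country": "X"})],
    "C": [("D", {"km": 120, "country": "Y"})],
}
forbidden = {"X": {"chemicals"}}       # country X bans chemicals
fees = {("Y", "chemicals"): 10.0}      # country Y charges a fee per leg

print(cheapest_route(graph, "A", "D", make_cost_fn("grain", forbidden, fees)))
print(cheapest_route(graph, "A", "D", make_cost_fn("chemicals", forbidden, fees)))
```

Returning `None` for a forbidden edge keeps the legal rules out of the search itself, so the same routine can be reused per region if the hierarchical approach is pursued.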

Indoor positioning of a moving object in 3D space

I am working on a project which determines the indoor position of an object which moves in 3D space (e.g. a quadcopter).
I have built some prototypes which use a combination of gyroscope, accelerometer and compass. However, the results were far from satisfactory, especially for the distance moved, which I calculated using the accelerometer. Determining the orientation using a fusion of gyroscope and compass was close to perfect.
In my opinion I am missing some more sensors to get some acceptable results. Which additional sensors would I need for my purpose? I was thinking about adding one or more infrared cameras/distance sensors. I have never worked with such sensors and I am not sure which sensor would lead to better results.
I appreciate any suggestions, ideas and experiences.
Distance checks would definitely help. The whole approach of any ground geodetic survey is based on the concept of a start/finish check. You know the start, then you add error-prone steps and arrive at a finish that you also know. Along the way you have accumulated some total error. You then distribute that error among all the steps taken, with the opposite sign, of course.
Interestingly, in most cases you not only somewhat reduce the effect of random errors, but almost eliminate the systematic ones, because those are mostly linear or close to linear, and such a linear distribution of the error found simply cancels them.
That is only an illustration of the idea. Any non-trivial task involves collecting all the data, finding their dependencies, linearizing them, and setting up parametric or condition (correlate) systems of equations. Solving them gives you the optimal corrections to the measured values. With the parametric method you can also easily estimate the errors of these new values.
The foundation of these methods is Gauss's method of least squares. More concrete procedures can be found in old books on geodesy, geomatics, triangulation, and geodetic networks. Books written after the introduction of GPS are of little use here, because GPS simplified everything enormously. Look for books with matrix formulas for least squares solutions.
Sorry if I have translated some terms into English incorrectly.
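The start/finish check can be sketched in a few lines. This is only an illustration under stated assumptions: positions are 2D, each step gets an equal share of the closing error (surveying practice often weights the share by leg length instead, as in the compass rule), and the function name is made up for this example.

```python
def distribute_closure_error(start, steps, known_finish):
    """Correct measured (dx, dy) steps so the path ends at known_finish.

    The closing error (computed finish minus known finish) is split
    equally among the steps and applied with the opposite sign.
    """
    end_x = start[0] + sum(dx for dx, _ in steps)
    end_y = start[1] + sum(dy for _, dy in steps)
    err_x = end_x - known_finish[0]
    err_y = end_y - known_finish[1]
    n = len(steps)

    x, y = start
    positions = [start]
    for dx, dy in steps:
        x += dx - err_x / n
        y += dy - err_y / n
        positions.append((x, y))
    return positions

# Four true steps of 1.0 along x, each measured with a constant bias of +0.1.
measured = [(1.1, 0.0)] * 4
corrected = distribute_closure_error((0.0, 0.0), measured, (4.0, 0.0))
print(corrected)
```

In this toy case the bias is constant, so the equal split removes it almost entirely at every intermediate position, which is the effect on systematic errors described above.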
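For orientation, the parametric form of a least squares adjustment is usually written in matrix notation roughly as below. The symbols are chosen here for illustration and do not come from the answer: A is the design matrix of the linearized observation equations, l the vector of reduced observations, P the weight matrix, v the residuals, n the number of observations, and u the number of unknowns.

```latex
\begin{aligned}
v &= A\hat{x} - l, \qquad v^{\mathsf T} P\, v \to \min \\
\hat{x} &= \left(A^{\mathsf T} P A\right)^{-1} A^{\mathsf T} P\, l \\
\hat{\sigma}_0^2 &= \frac{v^{\mathsf T} P\, v}{n - u}, \qquad
\Sigma_{\hat{x}} = \hat{\sigma}_0^2 \left(A^{\mathsf T} P A\right)^{-1}
\end{aligned}
```

The last line is what gives the approximate errors of the adjusted values mentioned above.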

methods of lessening the number of features when machine learning on images

I'm performing machine learning on a set of 25 x 125 images. After extracting the RGB components, each example has 9375 features (and I have about 675 examples). I tried fminunc and fminsearch and thought there was something wrong with my method because it was 'freezing', but when I decreased the number of features by a factor of 10, it took a while but worked. How can I reduce the number of features while keeping the relevant information in the picture? I tried k-means, but I don't see how that helps, as I still have the same number of features, just with a lot of redundancy.
You're looking for feature reduction or selection methods. For example see this library:
http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
or see this question
Feature Selection in MATLAB
If you google "feature selection/reduction matlab" you will find many relevant articles and tools. Or you could google some commonly used methods like PCA (principal component analysis).
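As a rough illustration of what PCA does, the sketch below finds only the first principal component using power iteration and projects each example onto it. It is written in plain Python rather than MATLAB, the function names are made up for this example, and power iteration is just one simple way to get the leading component. A real project would more likely use a library routine and keep several components.

```python
import math
import random

def top_principal_component(rows, iterations=200, seed=0):
    """Return (mean, unit_direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Multiply by X^T X without ever forming the d x d matrix.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # data has no variance along v
        v = [x / norm for x in w]
    return mean, v

def project(row, mean, direction):
    """Reduce one example to a single feature: its coordinate along the component."""
    return sum((row[j] - mean[j]) * direction[j] for j in range(len(row)))

# Toy data: two perfectly correlated features, so one component captures everything.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, direction = top_principal_component(data)
reduced = [project(row, mean, direction) for row in data]
print(direction, reduced)
```

Unlike k-means, which groups examples, this replaces the original features with fewer new ones, which is why it reduces the size of the optimization problem.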

Implementing a model written in a Predicate Calculus into ProLog, how do I start?

I have four sets of algorithms that I want to set up as modules, but I need all algorithms within each module to be executed at the same time. I'm a complete noob and have no programming experience. I do, however, know how to prove that my models are decidable and have already done so (I know Applied Logic).
The models are sensory parsers. I know how to create the state spaces for the modules, but I don't know how to program driver access into ProLog for my web cam (I have a Toshiba Satellite laptop with a built-in web cam). I also don't know how to link the input from the web cam to the variables in the algorithms I've written. The variables I use, when combined and identified with functions, are set to identify unknown input using a probabilistic database search for the best match after a breadth-first search. The parsers aren't holistic, which is why I want to run them either in parallel or as needed.
How should I go about this?
"I also don't know how to link the input from the web cam to the variables in the algorithms I've written."
I think the most common way for this is to use the machine learning approach: first calculate features from your video stream (like position of color blobs, optical flow, amount of green in image, whatever you like). Then you use supervised learning on labeled data to train models like HMMs, SVMs, ANNs to recognize the labels from the features. The labels are usually higher level things like faces, a smile or waving hands.
Depending on the nature of your "variables", they may already be covered on the feature-level, i.e. they can be computed from the data in a known way. If this is the case you can get away without training/learning.
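The feature-then-classifier pipeline can be sketched as follows. This is a toy illustration in Python, not Prolog, and it swaps in a nearest-centroid classifier instead of the HMMs, SVMs, or ANNs named above, purely to keep the example short. The frame format (a list of RGB tuples), the two features, and all function names are assumptions made for this example.

```python
def green_fraction(pixels):
    """Share of the total intensity that sits in the green channel."""
    total = sum(r + g + b for r, g, b in pixels)
    if total == 0:
        return 0.0
    return sum(g for _, g, _ in pixels) / total

def mean_brightness(pixels):
    """Average intensity scaled to the range 0..1."""
    return sum(r + g + b for r, g, b in pixels) / (3 * 255 * len(pixels))

def features(pixels):
    """Map one frame to a small feature vector."""
    return (green_fraction(pixels), mean_brightness(pixels))

def train_centroids(labelled_frames):
    """Supervised step: average the feature vectors of each label."""
    sums = {}
    for pixels, label in labelled_frames:
        f = features(pixels)
        total, count = sums.get(label, ((0.0,) * len(f), 0))
        sums[label] = (tuple(t + v for t, v in zip(total, f)), count + 1)
    return {label: tuple(t / count for t in total)
            for label, (total, count) in sums.items()}

def classify(pixels, centroids):
    """Pick the label whose centroid is closest in feature space."""
    f = features(pixels)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(f, centroids[label])))

# Tiny labelled training set of uniform-colour "frames".
training = [
    ([(20, 200, 30)] * 4, "grass"),
    ([(40, 80, 220)] * 4, "sky"),
]
centroids = train_centroids(training)
print(classify([(30, 180, 40)] * 4, centroids))
```

If the variables in the original models can be computed directly like `green_fraction` here, the training step can be dropped, as noted above.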

Artificial Neural Network Question

Generally speaking what do you get out of extending an artificial neural net by adding more nodes to a hidden layer or more hidden layers?
Does it allow for more precision in the mapping, or does it allow for more subtlety in the relationships it can identify, or something else?
There's a very well known result in machine learning that states that a single hidden layer is enough to approximate any smooth, bounded function (the paper was called "Multilayer feedforward networks are universal approximators" and it's now almost 20 years old). There are several things to note, however.
The single hidden layer may need to be arbitrarily wide.
This says nothing about the ease with which an approximation may be found; in general, large networks are hard to train properly and fall victim to overfitting quite frequently (the exception is so-called "convolutional neural networks", which really are only meant for vision problems).
This also says nothing about the efficiency of the representation. Some functions require exponential numbers of hidden units if done with one layer but scale much more nicely with more layers (for more discussion of this, read Scaling Learning Algorithms Towards AI).
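The point about width can be illustrated without any training. The sketch below hand-sets the weights of a single hidden layer of sigmoid units so that the network forms a smoothed staircase following a target function; more units give a finer staircase and a smaller error. The construction, the steepness constant, and the function names are assumptions chosen for illustration, not the construction used in the cited paper.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def staircase_network(f, lo, hi, hidden_units):
    """One hidden layer with hand-set weights approximating f on [lo, hi]."""
    width = (hi - lo) / hidden_units
    knots = [lo + k * width for k in range(hidden_units + 1)]
    steepness = 50.0 / width  # input weight: makes each unit nearly a step
    bias = f(knots[0])
    # Each unit switches on midway between two knots and adds the jump in f.
    units = [((knots[k - 1] + knots[k]) / 2.0, f(knots[k]) - f(knots[k - 1]))
             for k in range(1, hidden_units + 1)]

    def network(x):
        return bias + sum(h * sigmoid(steepness * (x - m)) for m, h in units)
    return network

def max_error(f, lo, hi, hidden_units, samples=1000):
    """Largest absolute error of the network on an evenly spaced grid."""
    net = staircase_network(f, lo, hi, hidden_units)
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(net(x) - f(x)) for x in xs)

for units in (10, 100):
    print(units, max_error(math.sin, 0.0, 2.0 * math.pi, units))
```

The error shrinks only in proportion to the number of units here, which gives a feel for why a single layer "may need to be arbitrarily wide".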
The problem with deep neural networks is that they're even harder to train. You end up with very very small gradients being backpropagated to the earlier hidden layers and the learning not really going anywhere, especially if weights are initialized to be small (if you initialize them to be of larger magnitude you frequently get stuck in bad local minima). There are some techniques for "pre-training" like the ones discussed in this Google tech talk by Geoff Hinton which attempt to get around this.
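The shrinking gradient can be seen in a deliberately simplified setting: a chain of layers with one sigmoid unit each, all sharing the same small weight. By the chain rule, the gradient reaching the input is a product of one factor per layer, and each factor is at most 0.25 times the weight. The single-unit chain and the function names are assumptions made to keep the sketch short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_through_chain(weights, x=0.5):
    """Return |d output / d input| for a chain of one-unit sigmoid layers."""
    a = x
    grad = 1.0
    for w in weights:
        a = sigmoid(w * a)
        # Chain rule: derivative of sigmoid(w * a_prev) w.r.t. a_prev.
        grad *= w * a * (1.0 - a)
    return abs(grad)

for depth in (2, 5, 10):
    print(depth, gradient_through_chain([0.1] * depth))
```

With a weight of 0.1 each layer multiplies the gradient by at most 0.025, so ten layers leave almost nothing for the earliest weights to learn from.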
This is a very interesting question, but it's not so easy to answer. It depends on the problem you are trying to solve and which neural network you are trying to use. There are several neural network types.
In general, it's not clear that more nodes equals more precision. Research shows that you mostly need only one hidden layer. The number of nodes should be the minimal number required to solve the problem. If you don't have enough of them, you will not reach a solution.
On the other hand, once you have reached a number of nodes that is sufficient to solve the problem, you can add more and more of them and you will not see any further improvement in the results.
That's why there are so many types of neural networks. They try to solve different types of problems. So you have NNs for static problems, for time-related problems, and so on. The number of nodes is not as important as their design.
What a hidden layer does is create combined features of the input. So, is the problem better tackled by more features of the existing input, or through higher-order features that come from combining existing features? This is the trade-off for a standard feed-forward network.
You have a theoretical reassurance that any function can be represented by a neural network with two hidden layers and non-linear activation.
Also, consider using additional resources for boosting, instead of adding more nodes, if you're not certain of the appropriate topology.
Very rough rules of thumb:
Generally, more elements per layer for bigger input vectors.
More layers may let you model more non-linear systems.
If the kind of network you are using has delays in propagation, more layers may allow modelling of time series. Take care to have time jitter in the delays or it won't work very well. If this is just gobbledegook to you, ignore it.
More layers let you insert recurrent features. This can be very useful for discrimination tasks. Your ANN implementation may not permit this.
HTH
The number of units per hidden layer accounts for the ANN's potential to describe an arbitrarily complex function. Some (complicated) functions may require many hidden nodes, or possibly more than one hidden layer.
When a function can be roughly approximated by a certain number of hidden units, any extra nodes will provide more accuracy, but only if the training samples are numerous enough to justify the addition. Otherwise what happens is "overconvergence" (more commonly called overfitting): the ANN loses its ability to generalize because it has overemphasized the particular samples.
In general it is best to use the fewest hidden units possible, provided the resulting network gives good results. The additional training patterns required to justify more hidden nodes cannot easily be found in most cases, and accuracy is not the NNs' strong point.

Resources