Can usage of FLANN come under Machine Learning?

I have written a program to compute SURF features and then use FLANN (Fast Library for Approximate Nearest Neighbors) to match and show the nearest neighbours. Now, can the usage of FLANN be considered machine learning? My understanding is that it is an approximate version of k-nearest-neighbour search, which is considered a machine learning algorithm (supervised learning).

You will find mention of methods like FLANN, LSH, Spectral Hashing, and KD-tree (variants) in a lot of machine learning publications.
However, as you said, these methods themselves are not learners/classifiers, but they may often be used within typical machine learning applications. Per your example, FLANN is not a supervised classifier, but it can be used to significantly improve taggers and recommenders.
(That said, this question may be more appropriate for CrossValidated or the proposed Machine Learning forum.)

FLANN is just an approximate nearest neighbor search structure; that's not machine learning.
But your K-nearest-neighbor classifier that uses FLANN is machine learning.
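For concreteness, here is a minimal sketch of the kind of FLANN-based matching described in the question, using OpenCV's Python bindings. SIFT stands in for SURF (SURF requires an opencv-contrib build) and the file names are placeholders; the FLANN matching step is the same either way.

    import cv2

    # load the two images to match (placeholder file names)
    img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("train.png", cv2.IMREAD_GRAYSCALE)

    # detect keypoints and compute float descriptors
    detector = cv2.SIFT_create()
    kp1, des1 = detector.detectAndCompute(img1, None)
    kp2, des2 = detector.detectAndCompute(img2, None)

    # FLANN with a KD-tree index (algorithm=1 is FLANN_INDEX_KDTREE);
    # 'checks' trades accuracy against speed
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

    # approximate k-NN search, then Lowe's ratio test to keep good matches
    matches = flann.knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.7 * m[1].distance]
    print(len(good), "good matches")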

Related

performance issue, edit distance for large strings LCP vs Levenshtein vs SIFT

So I'm trying to calculate the distance between two large strings (about 20-100).
The obstacle is performance: I need to run 20k distance comparisons, and it takes hours.
After investigating, I came across a few algorithms, and I'm having trouble deciding which to choose (based on performance vs. accuracy).
https://github.com/tdebatty/java-string-similarity - performance list for each of the algorithms.
** EDITED **
Is SIFT4 algorithm well-proven / reliable?
Is SIFT4 the right algorithm for the task?
How come it's so much faster than LCP-based / Levenshtein algorithm?
Is SIFT also used in image processing, or is it a different thing?
Thanks.
As far as I know, the Scale-Invariant Feature Transform (SIFT) is a computer vision algorithm that detects and describes local features in images.
If you want to find similar images, you compare the local features of the images to each other by calculating their distance, which may do what you intend. But local features are vectors of numbers, as I remember. It uses a brute-force matcher: Feature Matching - OpenCV Library - SIFT.
Please read about SIFT here: http://docs.opencv.org/3.1.0/da/df5/tutorial_py_sift_intro.html
SIFT4, which is mentioned in the link you provided, is a completely different thing: a string distance algorithm, unrelated to the computer vision SIFT.
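To make the performance trade-off in the question concrete, here is the classic dynamic-programming Levenshtein distance as a minimal Python sketch (the linked library is Java, but the algorithm is the same). Each comparison costs O(len(a) * len(b)), which is why exact edit distance gets slow over 20k comparisons of long strings, and why approximate algorithms such as SIFT4 can be much faster.

    def levenshtein(a, b):
        # prev[j] = edit distance from the current prefix of a to b[:j]
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution
            prev = curr
        return prev[-1]

    print(levenshtein("kitten", "sitting"))  # prints 3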

What is the algorithm used by the "Universal Recommender" on Prediction.IO?

Good afternoon,
What is the name of the algorithm used by the "Universal Recommender (UR)" on Prediction.IO?
As far as I know, the algorithms for recommender systems are "collaborative filtering" and "content-based filtering".
Thanks!
It uses the Correlated Cross-Occurrence (CCO) algorithm from Apache Mahout.
Check out these:
https://actionml.com/blog/cco
https://mahout.apache.org/users/algorithms/recommender-overview.html
Prediction.io uses Apache Spark MLlib's Alternating Least Squares (ALS) matrix factorization method. It is one of the basic approaches to collaborative filtering, alongside user-based and item-based methods. Documentation can be found at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
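As an illustration of the ALS method mentioned above, here is a minimal PySpark sketch (the column names and toy ratings are made up for the example):

    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS

    spark = SparkSession.builder.appName("als-demo").getOrCreate()

    # toy (user, item, rating) data standing in for real interaction logs
    ratings = spark.createDataFrame(
        [(0, 10, 4.0), (0, 11, 1.0), (1, 10, 5.0), (1, 12, 2.0)],
        ["userId", "itemId", "rating"])

    # factorize the user-item matrix into rank-8 latent factors
    als = ALS(rank=8, maxIter=10, regParam=0.1,
              userCol="userId", itemCol="itemId", ratingCol="rating")
    model = als.fit(ratings)

    model.recommendForAllUsers(3).show()  # top 3 items per user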
The Universal Recommender template uses this algorithm for computing "events" that appear "often" together with "buying" some "item". Factorization is not what the authors of the Universal Recommender describe in their original idea; instead, they used LLR (log-likelihood ratio) similarity to find statistically significant "events". I personally doubt the suitability of matrix factorization and of HBase here (use a Redis cluster instead). You can read about the general idea of the Universal Recommender at https://www.mapr.com/practical-machine-learning and http://mahout.apache.org/users/algorithms/recommender-overview.html
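For reference, the LLR similarity mentioned above is Dunning's log-likelihood ratio over a 2x2 co-occurrence table; a minimal sketch (the counts are illustrative):

    import math

    def llr(k11, k12, k21, k22):
        # k11: both events occur; k12/k21: only one occurs; k22: neither
        def h(*ks):  # sum of k*ln(k) over the non-zero counts
            return sum(k * math.log(k) for k in ks if k > 0)
        n = k11 + k12 + k21 + k22
        return 2 * (h(k11, k12, k21, k22)          # cell counts
                    - h(k11 + k12, k21 + k22)      # row sums
                    - h(k11 + k21, k12 + k22)      # column sums
                    + h(n))

    # e.g. how significant is a co-occurrence pattern over 10k events?
    print(llr(100, 900, 400, 8600))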

Reinforcement Learning for Continuous State Spaces with Discrete Actions (in NetLogo)

For anybody unfamiliar, NetLogo is an agent-based modeling language. In this case the agents are simulating organisms in a dynamic environment where they search for energy. The energy moves unpredictably but diffuses over time, so that foragers can find the source by going "uphill". (I'm simplifying slightly; the agents also socialize and reproduce, but if we can find a good algorithm for eating and moving, it should generalize.)
The goal is for agents to maximize their energy using two actions: move N/S/E/W, and eat. The agents have access to some information: the energy level on surrounding locations and their own energy, all continuous variables. The agents can't have full explicit knowledge of their past or the world, which limits the use of most traditional RL algorithms. They can have implicit knowledge (so for example a neural network with weights that are adapted over time is okay).
My intuition was that a neural network could solve this and I implemented one successfully... But I simply ran the simulation a few thousand times to optimize the weights. This (1) doesn't guarantee convergence, and (2) is probably far from optimal/efficient.
Any ideas for how to go about learning in this world? Either a better reinforcement learning approach or an algorithm for learning the neural network weights would be great. I have gone through a lot of literature recently trying to find a solution and each algorithm I find ends up having one or two issues that preclude their use. Thanks in advance for any help!
Since your environment is continuous, standard algorithms such as Q-learning or SARSA aren't directly applicable -- they expect a discrete environment state. However, your actions are discrete and that might be useful.
One possibility is to use some Bayesian approach in order to estimate the world state and apply it to Reinforcement Learning with function approximation. In fact, that's what I did in my undergraduate thesis, in which the state was estimated via Bayesian Programming.
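To sketch what function approximation looks like here: below is a minimal Q-learning update with a linear approximator and discrete actions. The feature vector phi(s) and the action set (N/S/E/W, eat) are stand-ins for whatever state encoding the NetLogo model exposes; this illustrates the general technique, not the thesis code.

    import numpy as np

    N_ACTIONS = 5            # move N/S/E/W, plus eat
    N_FEATURES = 10          # length of the state feature vector phi(s)
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
    rng = np.random.default_rng(0)

    w = np.zeros((N_ACTIONS, N_FEATURES))   # one weight vector per action

    def q_values(phi):
        return w @ phi                       # Q(s, a) = w_a . phi(s)

    def choose_action(phi):
        if rng.random() < EPS:               # epsilon-greedy exploration
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(q_values(phi)))

    def td_update(phi, a, reward, phi_next):
        # one Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = reward + GAMMA * np.max(q_values(phi_next))
        w[a] += ALPHA * (target - q_values(phi)[a]) * phi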

What are some algorithms for symbol-by-symbol handwriting recognition?

I think there are some algorithms that evaluate difference between drawn symbol and expected one, or something like that. Any help will be appreciated :))
You can implement a simple Neural Network to recognize handwritten digits. The simplest type to implement is a feed-forward network trained via backpropagation (it can be trained stochastically or in batch-mode). There are a few improvements that you can make to the backpropagation algorithm that will help your neural network learn faster (momentum, Silva and Almeida's algorithm, simulated annealing).
As far as looking at the difference between a real symbol and an expected image, one algorithm that I've seen used is the k-nearest-neighbor algorithm. Here is a paper that describes using the k-nearest-neighbor algorithm for character recognition (edit: I had the wrong link earlier. The link I've provided requires you to pay for the paper; I'm trying to find a free version of the paper).
If you were using a neural network to recognize your characters, the steps involved would be (a minimal code sketch follows the list):
Design your neural network with an appropriate training algorithm. I suggest starting with the simplest (stochastic backpropagation) and then improving the algorithm as desired, while you train your network.
Get a good sample of training data. For my neural network, which recognizes handwritten digits, I used the MNIST database.
Convert the training data into an input vector for your neural network. For the MNIST data, you will need to binarize the images. I used a threshold value of 128. I started with Otsu's method, but that didn't give me the results I wanted.
Create your network. Since the images from MNIST come in an array of 28x28, you have an input vector with 784 components and 1 bias (so 785 inputs) to your neural network. I used one hidden layer with the number of nodes set as per the guidelines outlined here (along with a bias). Your output vector will have 10 components (one for each digit).
Randomly present training data (so randomly ordered digits, with random input image for each digit) to your network and train it until it reaches a desired error-level.
Run test data (MNIST data comes with this as well) against your neural network to verify that it recognizes digits correctly.
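A minimal sketch of the network described in these steps: 784 binarized pixel inputs plus a bias, one sigmoid hidden layer, 10 outputs, and one stochastic-backpropagation update per example. Random toy data stands in for MNIST, and the layer sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_HIDDEN, N_OUT, LR = 784, 64, 10, 0.1

    # weight matrices; the extra column in each is the bias weight
    w1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_IN + 1))
    w2 = rng.normal(0.0, 0.1, (N_OUT, N_HIDDEN + 1))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_step(pixels, label):
        global w1, w2  # update the module-level weights

        # forward pass (append 1.0 for the bias input at each layer)
        x = np.append(pixels, 1.0)
        h = sigmoid(w1 @ x)
        hb = np.append(h, 1.0)
        y = sigmoid(w2 @ hb)

        # backward pass: squared-error deltas for sigmoid units
        t = np.zeros(N_OUT)
        t[label] = 1.0                               # one-hot target digit
        d_out = (y - t) * y * (1.0 - y)
        d_hid = (w2[:, :-1].T @ d_out) * h * (1.0 - h)

        w2 -= LR * np.outer(d_out, hb)
        w1 -= LR * np.outer(d_hid, x)

    # toy example: one "image" binarized with a threshold, as described above
    fake_image = (rng.random(N_IN) > 0.5).astype(float)
    train_step(fake_image, label=3)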
You can check out an example here (shameless plug) that tries to recognize handwritten digits. I trained the network using data from MNIST.
Expect to spend some time getting yourself up to speed on neural network concepts, if you decide to go this route. It took me at least 3-4 days of reading and writing code before I actually understood the concept. A good resource is heatonresearch.com. I recommend starting with trying to implement neural networks to simulate the AND, OR, and XOR boolean operations (using a threshold activation function). This should give you an idea of the basic concepts. When it actually comes down to training your network, you can try to train a neural network that recognizes the XOR boolean operator; it's a good place to start for an introduction to learning algorithms.
When it comes to building the neural network, you can use existing frameworks like Encog, but I found it to be far more satisfactory to build the network myself (you learn more that way I think). If you want to look at some source, you can check out a project that I have on github (shameless plug) that has some basic classes in Java that help you build and train simple neural-networks.
Good luck!
EDIT
I've found a few sources that use k-nearest-neighbors for digit and/or character recognition (a short code sketch follows this list):
Bangla Basic Character Recognition Using Digital Curvelet Transform (the curvelet coefficients of an original image, as well as its morphologically altered versions, are used to train separate k-nearest-neighbor classifiers; the output values of these classifiers are fused using a simple majority voting scheme to arrive at a final decision)
The Homepage of Nearest Neighbors and Similarity Search
Fast and Accurate Handwritten Character Recognition using Approximate Nearest Neighbors Search on Large Databases
Nearest Neighbor Retrieval and Classification
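To make the k-nearest-neighbor approach concrete, here is a minimal scikit-learn sketch on its bundled 8x8 digits set (a small stand-in for the datasets used in the papers above):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # 1797 digit images, already unrolled into 64-component vectors
    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # classify each test image by a vote among its 3 nearest training images
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_tr, y_tr)
    print("accuracy:", knn.score(X_te, y_te))  # usually around 0.98 here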
For resources on Neural Networks, I found the following links to be useful:
CS-449: Neural Networks
Artificial Neural Networks: A neural network tutorial
An introduction to neural networks
Neural Networks with Java
Introduction to backpropagation Neural Networks
Momentum and Learning Rate Adaptation (this page goes over a few enhancements to the standard backpropagation algorithm that can result in faster learning)
Have you checked Detexify? I think it does pretty much what you want:
http://detexify.kirelabs.org/classify.html
It is open source, so you could take a look at how it is implemented.
You can get the code from here (if I recall correctly, it is in Haskell):
https://github.com/kirel/detexify-hs-backend
In particular what you are looking for should be in Sim.hs
I hope it helps
Addendum
If you have not implemented machine learning algorithms before you should really check out: www.ml-class.org
It's a free class taught by Andrew Ng, Director of the Stanford Machine Learning Centre. The course is an entirely online-taught course specifically on implementing a wide range of machine learning algorithms. It does not go too much into the theoretical intricacies of the algorithms, but rather teaches you how to choose, implement, and use the algorithms, and how to diagnose their performance. It is unique in that your implementation of the algorithms is checked automatically! It's great for getting started in machine learning as you get instantaneous feedback.
The class also includes at least two exercises on recognising handwritten digits. (Programming Exercise 3: with multinomial classification and Programming Exercise 4: with feed-forward neural networks)
The class has started a while ago but it should still be possible to sign up. If not, a new run should start early next year. If you want to be able to check your implementations you need to sign up for the "Advanced Track".
One way to implement handwriting recognition
The answer to this question depends on a number of factors, including what kind of resource constraints you have (e.g. an embedded platform) and whether you have a good library of correctly labelled symbols: i.e. different examples of a handwritten letter for which you know what letter they represent.
If you have a decent sized library, implementation of a quick and dirty standard machine learning algorithm is probably the way to go. You can use multinomial classifiers, neural networks or support vector machines.
I believe a support vector machine would be fastest to implement, as there are excellent libraries out there that handle the machine learning portion of the code for you, e.g. libSVM. If you are familiar with using machine learning algorithms, this should take you less than 30 minutes to implement.
The basic procedure you would probably want to implement is as follows:
Learning what symbols "look like"
Binarise the images in your library.
Unroll the images into vectors / 1-D arrays.
Pass the "vector representation" of the images in your library and their labels to libSVM to get it to learn how the pixel coverage relates to the represented symbol for the images in the library.
The algorithm gives you back a set of model parameters which describe the recognition algorithm that was learned.
You should repeat 1-4 for each character you want to recognise to get an appropriate set of model parameters.
Note: you only have to carry out steps 1-4 once for your library (but once for each symbol you want to recognise). You can do this on your developer machine and only include the parameters in the code you ship / distribute.
If you want to recognise a symbol:
Each set of model parameters describes an algorithm which tests whether a symbol represents one specific character or not. You "recognise" a character by testing all the models with the current symbol and then selecting the model that best fits the symbol you are testing.
This testing is done by again passing the model parameters and the symbol to test in unrolled form to the SVM library which will return the goodness-of-fit for the tested model.
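A minimal sketch of this procedure using scikit-learn, whose SVC class wraps libSVM (the bundled 8x8 digits set and the binarisation threshold are illustrative; note that SVC builds its per-class models one-vs-one internally, rather than the explicit per-symbol loop described above):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # learning phase (steps 1-4)
    X, y = load_digits(return_X_y=True)
    X = (X > 8).astype(float)   # step 1: binarise (these pixels range 0-16)
    # step 2: load_digits already gives images unrolled as 64-vectors
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = SVC(kernel="linear").fit(X_tr, y_tr)   # steps 3-4: learn the models

    # recognition phase: predict()/score() return the best-fitting class
    print("accuracy:", clf.score(X_te, y_te))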

Nesting maximum amount of shapes on a surface

In industry, there is often a problem where you need to calculate the most efficient use of material, be it fabric, wood, metal, etc. So the starting point is X amount of shapes of given dimensions, made out of polygons and/or curved lines, and the target is another polygon of given dimensions.
I assume many of the current CAM suites implement this, but having no experience using them or of their internals, what kind of computational algorithm is used to find the most efficient use of space? Can someone point me to a book or other reference that discusses this topic?
After Andrew, in his answer, pointed me in the right direction and named the problem for me, I decided to dump my research results here in a separate answer.
This is indeed a packing problem, and to be more precise, a nesting problem. The problem is mathematically NP-hard, so the algorithms currently in use are heuristic approaches. There do not seem to be any solutions that solve the problem in linear time, except for trivial problem sets. Solving complex problems takes from minutes to hours with current hardware, if you want a solution with good material utilization. There are tens of commercial software packages that offer nesting of shapes, but I was not able to locate any open source solutions, so there are no real examples where one could see the algorithms actually implemented.
An excellent description of the nesting and strip nesting problems, with historical solutions, can be found in a paper written by Benny Kjær Nielsen of the University of Copenhagen (Nielsen).
The general approach seems to be to mix and use multiple known algorithms in order to find the best nesting solution. These algorithms include (guided / iterated) local search, fast neighborhood search based on the no-fit polygon, and jostling heuristics. I found a great paper on this subject with pictures of how the algorithms work, which also includes benchmarks of the different software implementations so far. This paper was presented at the International Symposium on Scheduling 2006 by S. Umetani et al. (Umetani).
A relatively new and possibly the best approach to date is based on a Hybrid Genetic Algorithm (HGA), a hybrid of simulated annealing and a genetic algorithm, described by Wu Qingming et al. of Wuhan University (Qingming). They implemented this using Visual Studio, an SQL database, and the genetic algorithm optimization toolbox (GAOT) in MATLAB.
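As a toy illustration of how simple such a heuristic can be at the trivial end of the spectrum, here is a first-fit "shelf" packer for axis-aligned rectangles on fixed-width stock; real nesting software handles arbitrary polygons with techniques like the no-fit polygon and local search mentioned above.

    def shelf_pack(rects, sheet_width):
        # sort by decreasing height so each shelf wastes less vertical space
        order = sorted(enumerate(rects), key=lambda r: -r[1][1])
        placements = {}
        shelf_y = shelf_h = x = 0
        for idx, (w, h) in order:
            if x + w > sheet_width:     # current shelf is full: open a new one
                shelf_y += shelf_h
                x = shelf_h = 0
            placements[idx] = (x, shelf_y)
            x += w
            shelf_h = max(shelf_h, h)
        return placements, shelf_y + shelf_h  # placements and used sheet height

    parts = [(4, 3), (2, 2), (5, 1), (3, 3), (2, 4)]   # (width, height) of parts
    placed, used_height = shelf_pack(parts, sheet_width=8)
    print(placed, "height used:", used_height)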
You are referring to a well-known computer science domain of packing, for which there is a variety of problems defined and research done, for both 2-dimensional and 3-dimensional space.
There is considerable material available on the net for the defined problems, but to find it you kind of have to know the name of the problem to search for.
Some packages might well adopt a heuristic approach (which I suspect they do), and some might go to the lengths of calculating all the possibilities to get the absolutely right answer.
http://en.wikipedia.org/wiki/Packing_problem
