Practical use cases for machine learning algorithms

I am just starting out studying machine learning and am currently working through Andrew Ng's course on Coursera, but I am a bit lost. Studying all those algorithms and the theory behind them would be far more rewarding if I could see some use cases for them.
For example, the first topics I read about were gradient descent, then linear regression and logistic regression. Are these used directly in practice, or are other algorithms like k-means and kernel density estimation used instead? I guess I am trying to get real-world (software engineering, data mining) examples of these topics. Can someone suggest a post that explains the usage of any machine learning algorithm(s)? It would be greatly helpful.

The No Free Lunch theorem states that if algorithm A outperforms algorithm B on some set of problems, then, loosely speaking, there must exist exactly as many other problems where B outperforms A.
So it is difficult to tie an algorithm to a particular use case.
If you are looking for use cases of machine learning algorithms, visit https://www.kaggle.com/wiki/DataScienceUseCases
Update: I just came across http://pkghosh.wordpress.com (use cases with algorithms). Check it out.
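To make the gradient descent part of your question concrete: yes, linear and logistic regression (usually fit with some form of gradient descent) are used constantly in practice, e.g. for price prediction and spam/click-through classification. Here is a minimal sketch of batch gradient descent fitting a linear regression on made-up toy data (in practice you'd use a library such as scikit-learn, which does this for you):

    import numpy as np

    # Made-up toy data: y = 3x + 4 plus noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=100)
    y = 3 * X + 4 + rng.normal(0, 1, size=100)

    # Batch gradient descent on the mean squared error.
    w, b = 0.0, 0.0   # slope and intercept, both start at zero
    lr = 0.01         # learning rate
    for _ in range(2000):
        y_hat = w * X + b
        grad_w = 2 * np.mean((y_hat - y) * X)   # d(MSE)/dw
        grad_b = 2 * np.mean(y_hat - y)         # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b

    print(w, b)  # should end up close to 3 and 4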

Understanding algorithm design techniques in depth

"Designing the right algorithm for a given application is a difficult job. It requires a major creative act, taking a problem and pulling a solution out of the ether. This is much more difficult than taking someone else's idea and modifying it or tweaking it to make it a little better. The space of choices you can make in algorithm design is enormous, enough to leave you plenty of freedom to hang yourself".
I have studied several basic algorithm design techniques such as divide and conquer, dynamic programming, greedy, and backtracking.
But I always fail to recognize which principles to apply when I come across a given programming problem. I want to master algorithm design.
So can anyone suggest a good place to understand the principles of algorithm design in depth?
I suggest Programming Pearls, 2nd edition, by Jon Bentley. He talks a lot about algorithm design techniques and provides examples of real world problems, how they were solved, and how different algorithms affected the runtime.
Throughout the book, you learn algorithm design techniques and program verification methods to ensure your algorithms are correct, and you also learn a little bit about data structures. It's a very good book and I recommend it to anyone who wants to master algorithms. Go read the reviews on Amazon: http://www.amazon.com/Programming-Pearls-2nd-Edition-Bentley/dp/0201657880
You can have a look at some of the book's contents here: http://netlib.bell-labs.com/cm/cs/pearls/
Enjoy!
You can't learn algorithm design just from reading books. Certainly, books can help. Books like Programming Pearls as suggested in another answer are great because they give you problems to work. Each problem forces you to think about how to solve a particular type of problem.
The idea is that you expose yourself to many different types of problems and their solutions. In doing so, you learn how to examine a problem and see if it shares anything in common with problems you've already seen. In that regard, it's not a whole lot different than the way you learned how to solve "word problems" in math class. Granted, most algorithm problems are more complex than having to figure out where on the tracks the two trains will collide, but the way you learn how to solve the problems is the same. You learn common techniques used to solve simple problems, then combine those techniques to solve more complex problems, etc.
Read, practice, lather, rinse, repeat.
In addition to books like Programming Pearls, there are sites online that post different programming challenges that you can test yourself on. It helps if you have friends or co-workers who also are interested in algorithms, because you can bounce ideas off each other and pose interesting challenges, or work together to come up with solutions to problems.
Did I mention that it takes practice?
"Mastering" anything takes time. A long time. A popular theory is that it takes 10,000 hours of practice to become an expert at anything. There's some dispute about that for particular endeavors, but in general it's true. You don't master anything overnight. You have to study. And practice. And read what others have done. Study some more and practice some more.
A good book about algorithm design is Kleinberg & Tardos (Algorithm Design). Every design technique depends on the problem you are going to tackle. It is very important to do the exercises in algorithm books and get feedback from teachers about them.
If there exists a locally optimal choice that leads to the globally optimal solution, you can use a greedy algorithm.
If the problem has optimal substructure, you can use dynamic programming (see the coin-change sketch below).
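As a toy illustration of those two rules (my own example, not from the book): for coin change, the greedy choice is optimal for some coin systems but not others, while dynamic programming exploits the optimal substructure and always finds the minimum:

    def greedy_change(coins, amount):
        # Greedy: always take the largest coin that fits.
        # Optimal for "canonical" systems like {1, 5, 10, 25}, but not in general.
        count = 0
        for c in sorted(coins, reverse=True):
            count += amount // c
            amount %= c
        return count if amount == 0 else None

    def dp_change(coins, amount):
        # DP: optimal substructure means the best way to make `amount`
        # extends the best way to make `amount - c` by one coin.
        INF = float("inf")
        best = [0] + [INF] * amount
        for a in range(1, amount + 1):
            for c in coins:
                if c <= a and best[a - c] + 1 < best[a]:
                    best[a] = best[a - c] + 1
        return best[amount] if best[amount] < INF else None

    print(greedy_change([1, 3, 4], 6))  # 3 coins (4+1+1): greedy is fooled
    print(dp_change([1, 3, 4], 6))      # 2 coins (3+3): DP finds the optimum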

Neural network and IDS

I am trying to get a grasp of how efficient neural networks are compared to other artificial intelligence algorithms for use in intrusion detection systems. Most of the literature I'm reading doesn't give a good comparison of neural networks with other IDS approaches.
Do they work better (detect more true attacks with fewer false positives)? Are they more or less efficient?
Another question: how new are neural networks in IDS environments? Are they widely used, or is it old news?
It seems like you're asking the question:
Will this algorithm help me, reliably, detect when an "intrusion" has happened.
Looking at some of the criticism of neural networks, it seems that NNs can be over-trained (which is possible for any AI algorithm); this can be guarded against by using k-fold cross-validation. NNs are also problematic because it is difficult to explain why a NN gave the result that it did.
Is this a research problem that you are working on?
Initially, I'd look at Naive Bayes to solve this problem because it (1) is easy to implement and (2) serves as a good baseline. Also look at decision trees as a solution to your problem.
After implementing NB and DT, implement the NN and see whether the NN does better.
You could also try an ensemble technique and see if that gives you better results.
There is a Java-based package called Weka which implements many of the algorithms I've discussed and could be valuable to you.
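Weka is Java; purely as an illustrative sketch of the baseline idea (Python with scikit-learn, on a bundled stand-in dataset since I don't have your IDS data), comparing Naive Bayes and a decision tree with 10-fold cross-validation could look like:

    from sklearn.datasets import load_breast_cancer  # stand-in for real IDS data
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    for name, clf in [("Naive Bayes", GaussianNB()),
                      ("Decision tree", DecisionTreeClassifier(random_state=0))]:
        scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
        print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")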
I am also new to NNs. I think you can use the Encog Neural Network Library to implement NN algorithms. It is available in both Java and C#.

Learning efficient algorithms

Up until now I've mostly concentrated on how to design code properly, making it as readable and as maintainable as possible. So I always chose to learn about the higher-level details of programming, such as class interactions, API design, etc.
I never really found algorithms particularly interesting. As a result, even though I can come up with a good design for my programs, and even if I can come up with a solution to a given problem, it is rarely the most efficient one.
Is there a particular way of thinking about problems that helps you come up with as efficient a solution as possible, or is it simply a matter of practice and/or memorization?
Also, what online resources can you recommend that teach you various efficient algorithms for different problems?
Data dominates. If you design your program around the right abstract data structures (ADTs), you often get a clean design, the algorithms follow quite naturally and when performance is lacking, you should be able to "plug in" more efficient ones.
A strong background in maths and logic helps here, as it allows you to visualize your program at a high level as the interaction between functions, sets, graphs, sequences, etc. You then decide whether the sets need to be ordered (balanced BST, O(lg n) operations) or not (hash tables, O(1) operations), what operations need to be supported on sequences (vector-like or list-like), etc.
If you want to learn some algorithms, get a good book such as Cormen et al. and try to implement the main data structures (a minimal BST sketch follows the list):
binary search trees
generic binary search trees (that work on more than just ints or strings)
hash tables
priority queues/heaps
dynamic arrays
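A minimal sketch of the first item, an (unbalanced) binary search tree with insert and search; without balancing the worst case is O(n), which is exactly why the balanced variants are worth studying:

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        # Standard BST insert: smaller keys go left, larger keys go right.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        return root  # duplicate keys are ignored

    def search(root, key):
        while root is not None and root.key != key:
            root = root.left if key < root.key else root.right
        return root is not None

    root = None
    for k in [5, 2, 8, 1, 3]:
        root = insert(root, k)
    print(search(root, 3), search(root, 7))  # True False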
Introduction to Algorithms is a great book to get you thinking about the efficiency of different algorithms and data structures.
The authors of the book also teach an algorithms course at MIT, and you can find most of the lectures online.
I would say that in coming up with good algorithms (which is actually part of good design IMHO), you have to develop a way of thinking. This is best done by studying algorithm design. By study I don't mean just knowing all the common algorithms covered in a textbook, but actually understanding how and why they work, and being able to apply the basic idea contained in them to actual problems you are trying to solve.
I would suggest reading a good book on algorithms (my favourite is CLRS). For an online resource I would recommend the series of articles in the TopCoder Algorithm Tutorials.
I do not understand why you would mention practice and memorization in the same breath. Memorization won't help you at all (you probably already know this), but practice is essential. If you cannot apply what you learned, it's not really learning. You can practice at various online programming contest/puzzle sites like SPOJ, Project Euler, and PythonChallenge.
Recommendations:
First of all, I recommend the book Introduction to Algorithms, Second Edition, by Cormen et al.; it's a great book that has most (if not all) of the algorithms you will need. (Some of the more important topics are sorting algorithms, shortest paths, dynamic programming, and many data structures like BSTs, hash maps, and heaps.)
Another great way to learn algorithms is http://ace.delos.com/usacogate, which is great practice after the beginning stages.
As for your question: you will simply get used to writing good, fast-running code; after a little practice you just won't want to write inefficient code.
While I think #larsmans is correct inasmuch as understanding logic and maths is a fast way to understanding how to choose useful ADTs for solving a given problem, studying existing solutions may be more instructive for those of us who struggle with those topics. In particular, review the code of established open-source software (OSS) that solves a problem similar to the one you're interested in.
I find that a particularly good method for this kind of study is reviewing the unit tests of such a project. Apache Lucene, for example, has a source control repository containing numerous examples. While the tests don't reveal the underlying algorithms directly, they help you trace to the particular functionality that solves a certain problem. That leads to an opportunity to study its innards, i.e. an interesting algorithm. In Lucene's case, inverted indices come to mind (a toy sketch follows below).
While this does not guarantee that the algorithm you discover is the best, it's likely one that has received a lot of scrutiny and probably comes from a project with an active mailing list that may answer your questions. So it's a good way to find a solution that is probably better than what most of us would come up with on our own.
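To make the inverted-index example concrete, here is a toy sketch of the idea (nothing like Lucene's actual implementation): map each term to the set of documents containing it, so a query intersects small sets instead of scanning every document:

    from collections import defaultdict

    docs = {  # made-up corpus for illustration
        0: "the quick brown fox",
        1: "the lazy dog",
        2: "the quick dog",
    }

    # Build the inverted index: term -> set of ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)

    # The query "quick AND dog" is just a set intersection.
    print(index["quick"] & index["dog"])  # {2}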

State of the art in classification algorithms

We know there are something like a thousand classifiers; recently I was told that some people say AdaBoost is the "off the shelf" one.
Are there better algorithms (with that voting idea)?
What is the state of the art in classifiers? Do you have an example?
First, adaboost is a meta-algorithm which is used in conjunction with (on top of) your favorite classifier. Second, classifiers which work well in one problem domain often don't work well in another. See the No Free Lunch wikipedia page. So, there is not going to be AN answer to your question. Still, it might be interesting to know what people are using in practice.
Weka and Mahout aren't algorithms... they're machine learning libraries. They include implementations of a wide range of algorithms. So, your best bet is to pick a library and try a few different algorithms to see which one works best for your particular problem (where "works best" is going to be a function of training cost, classification cost, and classification accuracy).
If it were me, I'd start with naive Bayes, k-nearest neighbors, and support vector machines. They represent well-established, well-understood methods with very different tradeoffs. Naive Bayes is cheap, but not especially accurate. K-NN is cheap during training but (can be) expensive during classification, and while it's usually very accurate it can be susceptible to overtraining. SVMs are expensive to train and have lots of meta-parameters to tweak, but they are cheap to apply and generally at least as accurate as k-NN.
If you tell us more about the problem you're trying to solve, we may be able to give more focused advice. But if you're just looking for the One True Algorithm, there isn't one -- the No Free Lunch theorem guarantees that.
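If you want to see those tradeoffs for yourself, here is a rough sketch (Python/scikit-learn, on a bundled stand-in dataset; timings are only indicative) that fits all three and measures training versus classification time:

    import time
    from sklearn.datasets import load_digits          # stand-in dataset
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, clf in [("naive Bayes", GaussianNB()),
                      ("k-NN", KNeighborsClassifier(n_neighbors=5)),
                      ("SVM", SVC())]:
        t0 = time.perf_counter()
        clf.fit(X_tr, y_tr)                 # training cost
        t1 = time.perf_counter()
        acc = clf.score(X_te, y_te)         # classification cost + accuracy
        t2 = time.perf_counter()
        print(f"{name}: accuracy {acc:.3f}, "
              f"train {t1 - t0:.3f}s, classify {t2 - t1:.3f}s")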
Apache Mahout (open source, Java) seems to be picking up a lot of steam.
Weka is a very popular and stable machine learning library. It has been around for quite a while and is written in Java.
Hastie et al. (2013, The Elements of Statistical Learning) conclude that the gradient boosting machine is the best "off-the-shelf" method, independent of the problem you have.
Definition (see page 352):
An "off-the-shelf" method is one that can be directly applied to the data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure.
And a somewhat older version of the claim:
In fact, Breiman (NIPS Workshop, 1996) referred to AdaBoost with trees as the “best off-the-shelf classifier in the world” (see also Breiman (1998)).
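To illustrate the "no careful tuning required" claim, here is a minimal sketch using scikit-learn's gradient boosting with its default settings on a bundled stand-in dataset (your mileage will vary on real data):

    from sklearn.datasets import load_wine            # stand-in dataset
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_wine(return_X_y=True)

    # Default hyperparameters, no preprocessing: the "off-the-shelf" appeal.
    gbm = GradientBoostingClassifier(random_state=0)
    print(cross_val_score(gbm, X, y, cv=5).mean())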

Genetic Engineering simulation

Does anybody have a good source of software or tutorials about genetic engineering simulation?
Maybe open-source software for gene splicing / cloning simulation?
Thanks
This might be up your alley:
Genetic Programming - Evolution of Mona Lisa by Roger Alsing.
Mona Lisa Source Code and binaries
Mona Lisa FAQ
All very interesting and impressive stuff.
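The core loop behind Alsing's demo is surprisingly small: mutate a candidate, keep it if it matches the target better. Here is a toy sketch of the same idea (my own simplification), evolving a string instead of polygons:

    import random

    TARGET = "mona lisa"
    ALPHABET = "abcdefghijklmnopqrstuvwxyz "

    def fitness(candidate):
        # Number of positions matching the target (higher is better).
        return sum(a == b for a, b in zip(candidate, TARGET))

    random.seed(0)
    parent = [random.choice(ALPHABET) for _ in TARGET]
    while fitness(parent) < len(TARGET):
        child = parent[:]
        child[random.randrange(len(child))] = random.choice(ALPHABET)  # mutate
        if fitness(child) >= fitness(parent):  # keep the child if no worse
            parent = child

    print("".join(parent))  # converges to "mona lisa"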
This may not be exactly what you're after, but have you looked at Spore?
It's quite possible that there is no such thing as "serious" genetic engineering simulation software. I think it would be a very difficult concept to implement.
Excellent question!
Clearly genetic engineering can't be simulated at an analytic level, i.e. by modelling the mechanisms involved, because some of the processes, such as protein folding and DNA transcription, are too complex and haven't been modelled yet.
They are still working on a map of the human genome, and I'm sure they haven't gotten around to much else, because anything of reasonable complexity isn't that much less complex (in terms of number of genes) than a human. For example, you can save 2% in complexity, but then you have a chimp, not a human.
However, there might be possible approaches based on empirical methods, i.e. "best guess" methods or experimentation. Thus, you might look at recent work in machine learning or data mining for possible approaches. There are a lot of data mining and machine learning methods out there, ranging from genetic algorithms and neural networks to decision trees and SVMs.
http://en.wikipedia.org/wiki/Data_mining
http://en.wikipedia.org/wiki/Machine_learning
