Related
Let me begin by saying that I am struggling to understand what is going on in deep learning. From what I gather, it is an approach to try to have a computer engineer different layers of representations and features to enable it learn stuff on its own. SIFT seems to be a common way to sort of detect things by tagging and hunting for scale invariant things in some representation. Again I am completely stupid and in awe and wonder about how this magic is achieved. How does one have a computer do this by itself? I have looked at this paper https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf and I must say at this point I think it is magic. Can somebody help me distill the main points of how this works and why a computer can do it on its own?
SIFT and CNN are both methods to extract features from images in different ways and outputs.
SIFT/SURF/ORB or any similar feature extraction algorithms are "Hand-made" feature extraction algorithms. It means, independent from the real world cases, they are aiming to extract some meaningful features. This approach has some advantages and disadvantages.
Advantages :
You don't have to care about input image conditions and probably you don't need any pre-processing step to extract those features.
You can directly get SIFT implementation and integrate it to your application
With GPU based implementations (i.e. GPU-SIFT), you can achieve high inference speed.
Disadvantages:
It has limitation about finding the features. You will have trouble about getting features over quite plain surfaces.
SIFT/SURF/ORB cannot solve all problems that requires feature classification / matching. Think face recognition problem. Do you think that extracting & classifying SIFT features over face will be enough to recognize people?
Those are hand-made feature extraction techniques, they cannot be improved over time (of course unless a better technique is being introduced)
Developing such a feature extraction technique requires a lot of research work
In the other hand, in deep learning, you can start analyzing much complex features which are impossible by human to recognize. CNNs are perfect approach as today to analyze hierarchical filter responses and much complex features which are created by combining those filter responses (going deeper).
The main power of CNNs are coming from not extracting features by hand. We only define "how" PC has to look for features. Of course this method has some pros and cons too.
Advantages :
More data, better success! It is all depending on data. If you have enough data to explain your case, DL outperforms hand-made feature extraction techniques.
As soon as you extract the features from image, you can use it for many purposes like to segment image, to create description words, to detect objects inside image, to recognize them. The better part is, all of them can be obtained in one shot, rather than complex sequential processes.
Disadvantages:
You need data. Probably a lot.
It is better to use supervised or reinforcement learning methods in these days. As unsupervised learning is still not good enough yet.
It takes time and resource to train a good neural net. A complex hierarchy like Google Inception took 2 weeks to be trained on 8 GPU server rack. Of course not all the networks are so hard to train.
It has some learning curve. You don't have to know how SIFT is working to use it for your application but you have to know how CNNs are working to use them in your custom purposes.
Up until now I've mostly concentrated on how to properly design code, make it as readable as possible and as maintainable as possible. So I alway chose to learn about the higher level details of programming, such as class interactions, API design, etc.
Algorithms I never really found particularly interesting. As a result, even though I can come up with a good design for my programs, and even if I can come up with a solution to a given problem it rarely is the most efficient.
Is there a particular way of thinking about problems that helps you come up with an as efficient solution as possible, or is it simple a matter of practice and/or memorizing?
Also, what online resources can you recommend that teach you various efficient algorithms for different problems?
Data dominates. If you design your program around the right abstract data structures (ADTs), you often get a clean design, the algorithms follow quite naturally and when performance is lacking, you should be able to "plug in" more efficient ones.
A strong background in maths and logic helps here, as it allows you to visualize your program at a high level as the interaction between functions, sets, graphs, sequences, etc. You then decide whether the sets need to be ordered (balanced BST, O(lg n) operations) or not (hash tables, O(1) operations), what operations need to supported on sequences (vector-like or list-like), etc.
If you want to learn some algorithms, get a good book such as Cormen et al. and try to implement the main data structures:
binary search trees
generic binary search trees (that work on more than just int or strings)
hash tables
priority queues/heaps
dynamic arrays
Introduction To Algorithms is a great book to get you thinking about efficiency of different algorithms/data structures.
The authors of the book also teach an algorithms course on MIT . You can find most lectures here
I would say that in coming up with good algorithms (which is actually part of good design IMHO), you have to develop a way of thinking. This is best done by studying algorithm design. By study I don't mean just knowing all the common algorithms covered in a textbook, but actually understanding how and why they work, and being able to apply the basic idea contained in them to actual problems you are trying to solve.
I would suggest reading a good book on algorithms (my favourite is CLRS). For an online resource I would recommend the series of articles in the TopCoder Algorithm Tutorials.
I do not understand why you would mention practice and memorization in the same breath. Memorization won't help you at all (you probably already know this), but practice is essential. If you cannot apply what you learned, its not really learning. You can practice at various online programming contest/puzzle sites like SPOJ, Project Euler and PythonChallenge.
Recommendations:
First of all i recommend the book "Intro to Algorithms, Second Edition By corman", great book has most(if not all) of the algorithms you will need. (Some of the more important topics are sorting-algorithms, shortest paths, dynamic programing, many data structures like bst, hash maps, heaps).
another great way to learn algorithms is http://ace.delos.com/usacogate, great practice after the begining.
To your questions you will just get used to write good fast running code, after a little practice you just wouldnt want to write un-efficient code.
While I think #larsmans is correct inasmuch that understanding logic and maths is a fast way to understanding how to choose useful ADTs for solving a given problem, studying existing solutions may be more instructive for those of us who struggle with those topics. In particular, reviewing code of established software (OSS) that solves some similar problem as the one you're interested in.
I find a particularly good method for this method of study is reviewing unit tests of such a project. Apache Lucene, for example, has a source control repository containing numerous examples. While it doesn't reveal the underlying algorithms, it helps trace to particular functionality that solves a certain problem. This leads to an opportunity for studying its innards - i.e. an interesting algorithm. In Lucene's case inverted indices come to mind.
While this does not guarantee the algorithm you discover is the best, it's likely one that's received a lot scrutiny and probably comes from project with an active mailing that may answer your questions. So it's a good resource for finding a solution that is probably better than what most of us would come up with on our own.
I am brainstorming an idea of developing a high level software to manipulate matrix algebra equations, tensor manipulations to be exact, to produce optimized C++ code using several criteria such as sizes of dimensions, available memory on the system, etc.
Something which is similar in spirit to tensor contraction engine, TCE, but specifically oriented towards producing optimized rather than general code.
The end result desired is software which is expert in producing parallel program in my domain.
Does this sort of development fall on the category of expert systems?
What other projects out there work in the same area of producing code given the constraints?
What you are describing is more like a Domain-Specific Language.
http://en.wikipedia.org/wiki/Domain-specific_language
It wouldn't be called an expert system, at least not in the traditional sense of this concept.
Expert systems are rule-based inference engines, whereby the expertise in question is clearly encapsulated in the rules. The system you suggest, while possibly encapsulating insight about the nature of the problem domain inside a linear algebra model of sorts, would act more as a black box than an expert system. One of the characteristics of expert systems is that they can produce an "explanation" of their reasoning, and such a feature is possible in part because the knowledge representation, while formalized, remains close to simple statements in a natural language; matrices and operations on them, while possibly being derived upon similar observation of reality, are a lot less transparent...
It is unclear from the description in the question if the system you propose would optimize existing code (possibly in a limited domain), or if it would produced optimized code, in that case driven bay some external goal/function...
Well production systems (rule systems) are one of four general approaches to computation (Turing machines, Church recursive functions, Post production systems and Markov algorithms [and several more have been added to that list]) which more or less have these respective realizations: imperative programming, functional programming, rule based programming - as far as I know Markov algorithms don't have an independent implementation. These are all Turing equivalent.
So rule based programming can be used to write anything at all. Also early mathematical/symbolic manipulation programs did generally use rule based programming until the problem was sufficiently well understood (whereupon the approach was changed to imperative or constraint programming - see MACSYMA - hmmm MACSYMA was written in Lisp so perhaps I have a different program in mind or perhaps they originally implemented a rule system in Lisp for this).
You could easily write a rule system to perform the matrix manipulations. You could keep a trace depending on logical support to record the actual rules fired that contributed to a solution (some rules that fire might not contribute directly to a solution afterall). Then for every rule you have a mapping to a set of C++ instructions (these don't have to be "complete" - they sort of act more like a semi-executable requirement) which are output as an intermediate language. Then that is read by a parser to link it to the required input data and any kind of fix up needed. You might find it easier to generate functional code - for one thing after the fix up you could more easily optimize the output code in functional source.
Having said that, other contributors have outlined a domain specific language approach and that is what the TED people did too (my suggestion is that too just using rules).
I was wondering how common it is to find genetic algorithm approaches in commercial code.
It always seemed to me that some kinds of schedulers could benefit from a GA engine, as a supplement to the main algorithm.
Genetic Algorithms have been widely used commercially. Optimizing train routing was an early application. More recently fighter planes have used GAs to optimize wing designs. I have used GAs extensively at work to generate solutions to problems that have an extremely large search space.
Many problems are unlikely to benefit from GAs. I disagree with Thomas that they are too hard to understand. A GA is actually very simple. We found that there is a huge amount of knowledge to be gained from optimizing the GA to a particular problem that might be difficult and as always managing large amounts of parallel computation continue to be a problem for many programmers.
A problem that would benefit from a GA is going to have the following characteristics:
A good way to encode potential solutions
A way to compute an a numerical score to evaluate the quality of the solution
A large multi-dimensional search space where the answer is non-obvious
A good solution is good enough and a perfect solution is not required
There are many problems that could probably benefit from GAs and in the future they will probably be more widely deployed. I believe that GAs are used in cutting edge engineering more than people think however most people (like my company does) guards those secrets extremely closely. It is only long after the fact that it is revealed that GAs were used.
Most people that deal with "normal" applications probably don't have much use for them though.
If you want to find an example, look at Postgres's Query Planner. It uses many techniques, and one just so happens to be genetic.
http://developer.postgresql.org/pgdocs/postgres/geqo-pg-intro.html
I used GA in my Master's thesis, but after that I haven't found anything in my daily work a GA could solve that I couldn't solve faster with some other Algorithm.
I don't think it is particularly common to find genetic algorithms in everyday-commercial code. They are more commonly found in academic/research code where the need to find the "best algorithm" is less important than the need to just find a good solution to a problem.
Nonetheless, I have consulted on a couple of commercial projects that do use GAs (chiefly as a result of my involvement with GAUL). I think the most interesting example was at a Biotech company. They used the GA to optimise scoring functions that were used for virtual screening, as part of their drug discovery application.
Earlier this year, with my current company, I added a new feature to one of our products that uses another GA. I think we might be marketing this from next month. Basically, the GA is used to explore molecules that have the potential for binding to a protein, and could therefore be further investigated as drugs targeting that protein. A competing product that also uses a GA is EA inventor.
As part of my thesis I wrote a generic java framework for the multi-objective optimisation algorithm mPOEMS (Multiobjective prototype optimization with evolved improvement steps), which is a GA using evolutionary concepts. It is generic in a way that all problem-independent parts have been separated from the problem-dependent parts, and an interface is povided to use the framework with only adding the problem-dependent parts. Thus one who wants to use the algorithm does not have to begin from zero, and it facilitates work a lot.
You can find the code here.
The solutions which you can find with this algorithm have been compared in a scientific work with state-of-the-art algorithms SPEA-2 and NSGA, and it has been proven that
the algorithm performes comparable or even better, depending on the metrics you take to measure the performance, and especially depending on the optimization-problem you are looking on.
You can find it here.
Also as part of my thesis and proof of work I applied this framework to the project selection problem found in portfolio management. It is about selecting the projects which add the most value to the company, support most the strategy of the company or support any other arbitrary goal. E.g. selection of a certain number of projects from a specific category, or maximization of project synergies, ...
My thesis which applies this framework to the project selection problem:
http://www.ub.tuwien.ac.at/dipl/2008/AC05038968.pdf
After that I worked in a portfolio management department in one of the fortune 500, where they used a commercial software which also applied a GA to the project selection problem / portfolio optimization.
Further resources:
The documentation of the framework:
http://thomaskremmel.com/mpoems/mpoems_in_java_documentation.pdf
mPOEMS presentation paper:
http://portal.acm.org/citation.cfm?id=1792634.1792653
Actually with a bit of enthusiasm everybody could easily adapt the code of the generic framework to an arbitrary multi-objective optimisation problem.
I haven't but I've heard of this company (can't remember their name) which uses mutating, genetic algos to calculate placements and lengths of antennas (or something) from a friend of mine. And they're supposed to (according to my friend) have huge success with this. I guess GA is just too complex for "average Joe developer" to become mainstream. Kind of like Map Reduce - spectacularly cool, but WAY too advanced to hit the "mainstream"...
LibreOffice Calc uses it in its Solver module.
I wonder how many of you have implemented one of computer science's "classical algorithms" like Dijkstra's algorithm or data structures (e.g. binary search trees) in a real world, not academic project?
Is there a benefit to our dayjobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?
Is there a benefit to our dayjobs in knowing these algorithms and data structures when there are tons of libraries, frameworks and APIs which give you the same functionality?
The library doesn't know what your problem domain is and won't be able to chose the correct algorithm to do the job. That is why I think it is important to know about them: then YOU can make the correct choice of algorithms to solve YOUR problem.
Knowing, or being able to understand these algorithms is important, these are the tools of your trade. It does not mean you have to be able to implement A* in an hour from memory. But you should be able to figure out what the advantages of using a red-black tree as opposed to a normal unbalanced tree are so you can decide if you need it or not. You need to be able to judge the fitness of an algorithm for solving your problem.
This might sound too school-masterish but these "classical algorithms" were not invented to give college students exam questions, they were invented to solve problems or improve on current solutions, just like the array, the linked list or the stack are building blocks to write a program so are some of these. Just like in math where you move from addition and subtraction to integration and differentiation, these are advanced techniques that will help you solve problems that are out there.
They might not be directly applicable to your problems or work situation but in the long run knowing of them will help you as a professional software engineer.
To answer your question, I did an implementation of A* recently for a game.
Is there a benefit to understanding your tools, rather than simply knowing that they exist?
Yes, of course there is. Taking a trivial example, don't you think there's a benefit to knowing what the difference is List (or your language's equivalent dynamic array implementation) and LinkedList (or your language's equivalent)? It's pretty important to know that one has constant random access time, while the other is linear. And one requires N copies if you insert a value in the middle of the sequence, while the other can do it in constant time.
Don't you think there's an advantage to understanding that the same sorting algorithm isn't always optimal? That for almost-sorted data, quicksort sucks, for example? Naively just calling Sort() and hoping for the best can become ridiculously expensive if you don't understand what's happening under the hood.
Of course there are a lot of algorithms you probably won't need, but even so, just understanding how they work may make it easier for yourself to come up with efficient algorithms to solve other, unrelated, problems.
Well, someone has to write the libraries. While working at a mapping software company, I implemented Dijkstra's, as well as binary search trees, b-trees, n-ary trees, bk-trees and hidden markov models.
Besides, if all you want is a single 'well known' algorithm, and you also want the freedom to specialise it and optimise it if it becomes critical to performance, including a whole library seems like a poor choice.
We use a home grown implementation of a p-random number generator from Knuth SemiNumeric as an aid in some statistical processing
In my previous workplace, which was an EDA company, we implemented versions of Prim and Dijsktra's algorithms, disjoint set data structures, A* search and more. All of these had real world significance. I believe this is dependent on problem domain - some domains are more algorithm-intensive and some less so.
Having said that, there is a fine line to walk - I see no business reason for re-implementing STL or Java Generics. In many cases, a standard library is better than "inventing a wheel". The more you are near your core application, the more it may be necessary to implement a textbook algorithm or data structure.
If you never work with performance-critical code, consider yourself lucky. However, I consider this scenario unrealistic. Performance problems could occur anywhere. And then it's necessary to know how to fix that problem. Obviously, merely knowing a few algorithm names isn't enough here – unless you want to implement them all and try them out one after the other.
No, knowing (at least some of) the inner workings of different algorithms is important for gauging their strengths and weaknesses and for analyzing how they would handle your situation.
Obviously, if there's a library already implementing exactly what you need, you're incredibly lucky. But let's face it, even if there is such a library, using it is often not completely straightforward (at the very least, interfaces and data representation often have to be adapted) so it's still good to know what to expect.
A* for a pac man clone. It took me weeks to really get but to this day I consider it a thing of beauty.
I've had to implement some of the classical algorithms from numerical analysis. It was easier to write my own than to connect to an existing library. Also, I've had to write variations on classical algorithms because the textbook case didn't fit my application.
For classical data structures, I nearly always use the standard libraries, such as STL for C++. The one time recently when I thought STL didn't have the structure I needed (a heap) I rolled my own, only to have someone point out almost immediately that I didn't need to do that.
Classical algorithms I have used in actual work:
A topological sort
A red-black tree (although I will
confess that I only had to implement
insertions for that application and
it only got used in a prototype).
This got used to implement an
'ordered dict' type structure in
Python.
A priority queue
State machines of various sorts
Probably one or two others I can't remember.
As to the second part of the question:
An understanding of how the algorithms work, their complexity and semantics gets used on a fairly regular basis. They also inform the design of systems. Occasionally one has to do things involving parsing or protocol handling, or some computation that's slightly clever. Having a working knowledge of what the algorithms do, how they work, how expensive they are and where one might find them lying around in library code goes a long way to knowing how to avoid reinventing the wheel poorly.
I use the Levenshtein distance algorithm to help implement a 'Did you mean [suggested word]?' feature in our website search.
Works quite well when combined with our 'tagging' system, which allows us to associate extra words (other than those in title/description/etc) with items in the database. \
It's not perfect by any means, but it's way better than most corporate site searches, if I don't say so myself ; )
Classical algorithms are usually associated with something glamorous, like games, or Web search, or scientific computation. However, I had to use some of the classical algorithms for a mere enterprise application.
I was building a metadata migration tool, and I had to use topological sort for dependency resolution, various forms of graph traversals for queries on metadata, and a modified variation of Tarjan's union-find datastructure to partition forest-like structured metadata to trees.
That was a really satisfying experience. Most of those algorithms were implemented before, but their implementations lacked something that I would need for my task. That's why It's important to understand their internals.