Is there any documentation about how MPI functions such as MPI_Allgather, MPI_Alltoall, MPI_Allreduce, etc. are implemented?
I would like to learn about their algorithms and compute their complexity in terms of uni-directional or bi-directional bandwidth and total data transfer size, for a given number of nodes and a fixed data size.
I think the exact implementation of those algorithms varies depending on the communication mechanism: for example, a network will use tree-based reduction algorithms, while shared-memory models will use different ones.
I'm not exactly sure where to find answers to such questions, but searching for papers on Google Scholar or having a look at the paper list at open-mpi.org should be useful.
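As one concrete example of the kind of cost accounting the question asks for, here is a sketch of a back-of-the-envelope model, not real MPI internals: it assumes the textbook ring algorithm for allgather (real libraries such as Open MPI switch algorithms based on message size and process count, so treat the numbers as a lower-bound estimate).

```python
def ring_allgather_traffic(p, n):
    """Cost model for a ring allgather: p processes, n total result bytes.

    Each process contributes one block of n/p bytes and forwards one block
    per step; after p-1 steps every process holds the full result.
    Returns (steps, bytes sent per process).
    """
    block = n / p
    steps = p - 1
    return steps, steps * block

# 8 nodes assembling a 1 MiB result: each node sends 7 blocks of 128 KiB.
steps, sent = ring_allgather_traffic(8, 1 << 20)
```

The same style of accounting works for the other collectives: count the steps and the bytes crossing each link, then divide by the uni- or bi-directional link bandwidth to get a time estimate.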
http://www.amazon.com/Parallel-Programming-MPI-Peter-Pacheco/dp/1558603395/ref=sr_1_10?s=books&ie=UTF8&qid=1314807638&sr=1-10
The book linked above is a great resource that explains all the basic MPI algorithms and lets you implement a simple version yourself. However, when you compare your own implementations against the MPI library's algorithms, you will see that the library applies many optimizations depending on the message size and the number of nodes you are running on. Hopefully this helps.
Can someone tell me which algorithm for minimum cost maximum flow is best (and easy to implement), and where I can read about it? I searched online and found the names of many algorithms, but I'm unable to decide which one to study.
From my experience benchmarking MCF solvers in an industry setting, there are three publicly available implementations that are competitive:
Andrew V. Goldberg's cost-scaling implementation.
COIN-OR's LEMON library cost-scaling implementation.
COIN-OR's network simplex implementation.
I would try those in that order if you are short on time. Other honorable mentions are:
Google OR-Tools' cost-scaling implementation. I haven't benchmarked it, but I'd expect it to be competitive with those above.
MCFClass lists several implementations under licenses that restrict commercial use. RelaxIV is very competitive but restrictively licensed.
In terms of literature and a survey of competitive algorithms, the work of Király and Kovács is an excellent starting point.
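If you want a baseline to sanity-check those libraries against, a minimal successive-shortest-path implementation fits in a few dozen lines. This is a teaching sketch of the classic textbook algorithm, far slower than cost scaling or network simplex, and the function and variable names are my own:

```python
def min_cost_max_flow(n, edges, s, t):
    """n nodes, edges as (u, v, capacity, cost); returns (max_flow, min_cost).

    Successive shortest paths with Bellman-Ford, which tolerates the
    negative-cost residual edges that plain Dijkstra cannot.
    """
    graph = [[] for _ in range(n)]   # adjacency lists of edge indices
    to, cap, cost = [], [], []

    def add_edge(u, v, c, w):
        # Forward edge and zero-capacity reverse edge, stored as a pair
        # so that e ^ 1 is always the reverse of edge e.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    for u, v, c, w in edges:
        add_edge(u, v, c, w)

    flow = total_cost = 0
    while True:
        INF = float("inf")
        dist = [INF] * n
        dist[s] = 0
        prev = [-1] * n              # edge index used to reach each node
        for _ in range(n - 1):       # Bellman-Ford relaxation rounds
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[t] == INF:           # no augmenting path left: done
            return flow, total_cost
        push, v = INF, t             # bottleneck capacity along the path
        while v != s:
            e = prev[v]; push = min(push, cap[e]); v = to[e ^ 1]
        v = t                        # apply the augmentation
        while v != s:
            e = prev[v]; cap[e] -= push; cap[e ^ 1] += push; v = to[e ^ 1]
        flow += push
        total_cost += push * dist[t]
```

For example, on a four-node graph with two parallel source-to-sink routes, it saturates the cheaper route first and tops up with the more expensive one.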
Given a data structure specification such as a purely functional map with known complexity bounds, one has to pick between several implementations. There is some folklore on how to pick the right one; for example, red-black trees are considered generally faster, while AVL trees perform better on workloads with many lookups.
Is there a systematic presentation (published paper) of this knowledge (as relates to sets/maps)? Ideally I would like to see statistical analysis performed on actual software. It might conclude, for example, that there are N typical kinds of map usage, and list the input probability distribution for each.
Are there systematic benchmarks that test map and set performance on different distributions of inputs?
Are there implementations that use adaptive algorithms to change representation depending on actual usage?
These are basically research topics, and the results are generally given in the form of conclusions, while the underlying statistical data is rarely published. You can, however, run statistical analysis on your own data.
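As an illustration of that "analyze your own data" suggestion, a wrapper like the one below can record the operation mix of a real workload, which is exactly the input distribution you would need before choosing between, say, an AVL-style and a red-black-style implementation. The class and counter names are made up for this sketch:

```python
class ProfiledMap:
    """A dict wrapper that tallies the operation mix of a workload."""

    def __init__(self):
        self._d = {}
        self.counts = {"insert": 0, "lookup": 0, "delete": 0}

    def __setitem__(self, key, value):
        self.counts["insert"] += 1
        self._d[key] = value

    def __getitem__(self, key):
        self.counts["lookup"] += 1
        return self._d[key]

    def __delitem__(self, key):
        self.counts["delete"] += 1
        del self._d[key]

# Run the real workload against the wrapper, then inspect the mix.
m = ProfiledMap()
m["a"] = 1
for _ in range(5):
    _ = m["a"]
```

A lookup-heavy tally like this one would be the folklore argument for the AVL-style structure; an insert-heavy one would point the other way.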
For benchmarks, you are better off going through the implementation details of each library.
The third part of the question is a very subjective matter, and the actual intentions may never be known at the time of implementation. However, languages like Perl do their best to implement highly optimized solutions to every operation.
The following might be of help:
Purely Functional Data Structures by Chris Okasaki
http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
I have been reading parts of Introduction to Algorithms by Cormen et al, and have implemented some of the algorithms.
In order to test my implementations, I wrote some glue code to do file I/O, then made some sample input by hand and generated more sample input with small programs.
However, I am doubtful as to the quality of my own sample inputs: I may have missed corner cases or the more interesting possibilities; I may have miscalculated the proper output; etc.
Is there a set of test inputs and outputs for various algorithms collected somewhere on the Internet so that I might be able to test my code? I am looking for test data reasonably specific to particular algorithms, rather than contest problems that often involve a problem solving component as well.
I understand that I might have to adjust my code depending on the format the input comes in (e.g. the various constraints on the inputs; for graph algorithms, the representation of the graph; etc.), although I am hoping the changes I would have to make would be reasonably trivial.
Edit:
Some particular datasets I am currently looking for are:
Lists of numbers
Skewed so that quicksort performs badly.
Skewed so that a Fibonacci heap performs particularly well or poorly for specific operations.
Graphs (for which High Performance Mark has offered a number of interesting references)
Sparse graphs (with specific bounds on the number of edges),
Dense graphs,
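For the quicksort entry specifically, you may not need a downloaded dataset at all: a naive first-element-pivot quicksort already degrades to quadratic behavior on pre-sorted input. A small sketch (the function here is my own toy implementation, not from CLRS) makes the skew measurable by counting comparisons:

```python
import random

def quicksort_comparisons(xs):
    """Naive quicksort (first element as pivot); returns the comparison count."""
    comparisons = 0

    def sort(a):
        nonlocal comparisons
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        comparisons += len(rest)        # pivot is compared to every element
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return sort(left) + [pivot] + sort(right)

    sort(xs)
    return comparisons

n = 500
worst = quicksort_comparisons(list(range(n)))                # pre-sorted: O(n^2)
typical = quicksort_comparisons(random.sample(range(n), n))  # random: ~n log n
```

On pre-sorted input every partition is maximally unbalanced, so the count reaches n(n-1)/2, while the shuffled input stays near n log n; the same counting trick adapts to profiling heap operations.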
Since I am still working through the book, if you are in a similar situation, or you just feel the list could be improved, please feel free to edit it -- I may soon need datasets similar to what you are looking for. I am not entirely sure how editing privileges work, but if I have any say over it, I will try to approve your edit.
I don't know of any single resource that will provide you with sample inputs for all the types of algorithms that Cormen et al. cover, but for graph datasets here are a couple of references:
Knuth's Stanford Graphbase
and
the Stanford Large Network Dataset Collection
which I stumbled across while looking for the link to the former. You might be interested in this one too:
the Matrix Market
Why not edit your question and let SO know what other types of input you are looking for?
I am going to stick my neck out and say that I do not know of any such source, and I very much doubt that one exists.
As you seem to be aware, algorithms can be applied to almost any sort of data, and so it would be fruitless to attempt to provide sample data.
There are various types of trees I know of. For example, binary trees can be classified as binary search trees, two trees, etc.
Can anyone give me a complete classification of all the trees in computer science?
Please provide me with reliable references or web links.
It's virtually impossible to answer this question since there are essentially arbitrarily many different ways of using trees. The issue is that a tree is a structure - it's a way of showing how various pieces of data are linked to one another - and what you're asking for is every possible way of interpreting the meaning of that structure. This would be similar, for example, to asking for all uses of calculus in engineering; calculus is a tool with which you can solve an enormous class of problems, but there's no concise way to explain all possible uses of the integral because in each application it is used a different way.
In the case of trees, I've found that there are thousands of research papers describing different tree structures and ways of using trees to solve problems. They arise in string processing, genomics, computational geometry, theory of computation, artificial intelligence, optimization, operating systems, networking, compilers, and a whole host of other areas. In each of these domains they're used to encode specific structures that are domain-specific and difficult to understand without specialized knowledge of the field. No one reference can cover all these areas in any reasonable depth.
In short, you seem to already know the structure of a tree, and this general notion is transferable to any of the above domains. But to try to learn every possible way of using this structure or all its applications would be a Herculean undertaking that no one, not even the legendary Don Knuth, could ever hope to achieve in a lifetime.
Wikipedia has a nice compilation of the various trees at the bottom of the page.
The Dictionary of Algorithms and Data Structures has more information.
What specifics are you looking for?
I'm a self-taught developer and, quite frankly, am not all that great at figuring out which search or sort algorithm to use in any particular situation. I was just wondering if there was a Design Patterns-esque listing of the common algorithms available out there in the ether for me to bookmark. Something like:
Name of algorithm (with aliases, if any)
Problem it addresses
Big-O cost
Algorithm itself
Examples
Other algorithms it may be used with/substituted for
I'm just looking for a simple, concise listing of the algorithms I probably should know in one location. Is there anything like this available?
The web site http://www.sorting-algorithms.com/ shows many popular sorting algorithms and describes their complexity and implementation. It goes the extra step of showing, via animations, how those algorithms perform on different types of data (e.g. pre-sorted, sparse, reverse-sorted, etc.).
This site has some examples of sorting algorithms, including visual aids to help you get the hang of them. I personally like the various best/worst/average/few-unique cases they show.
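If you want to reproduce those best/worst/average comparisons locally, a quick sketch along these lines works (insertion sort is chosen here simply because its case behavior is easy to see; this is my own example, not taken from the site):

```python
import random
import time

def insertion_sort(xs):
    """Plain insertion sort on a copy of the input."""
    a = list(xs)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

n = 2000
cases = {
    "pre-sorted": list(range(n)),          # best case: O(n)
    "reversed": list(range(n, 0, -1)),     # worst case: O(n^2)
    "random": random.sample(range(n), n),  # average case: O(n^2), smaller constant
}
for name, data in cases.items():
    start = time.perf_counter()
    insertion_sort(data)
    print(f"{name}: {time.perf_counter() - start:.4f}s")
```

Swapping in other sorts and other input shapes gives you the same best/worst/average picture the animations show, but on your own machine.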
Wikipedia has a nice table that lists most of the common sorting algorithms along with classification of them and basic analysis of their complexity characteristics.
The more common sorting algorithms have pseudocode and more in-depth analysis. For less common sorting algorithms, you'll probably have better luck finding details in academic papers or real implementations.
You should read CLRS (Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein).
In terms of problem variety, there are millions, and they all come from puzzles and math.
Skiena has nice problems with plenty of variety.
There is a great article on Wikipedia:
http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms
But I would also suggest reading a book; almost every algorithms book has a chapter on sorting.