MapReduce alternatives [closed] - algorithm

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 months ago.
Are there any alternative paradigms to MapReduce (Google, Hadoop)? Is there any other reasonable way to split and merge big problems?

Definitely. Check out, for example, Bulk Synchronous Parallel. Map/Reduce is in fact a very restricted way of reducing problems, but that restriction is what makes it manageable in a framework like Hadoop. The question is whether it is less trouble to press your problem into a Map/Reduce setting, or whether it is easier to create a domain-specific parallelization scheme and take care of all the implementation details yourself. Pig, in fact, is only an abstraction layer on top of Hadoop that automates many standard problem transformations from not-Map-Reduce-y to Map-Reduce-compatible.
Edit 26.1.13: Found a nice up-to-date overview here
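To make the "restricted" shape of Map/Reduce concrete, here is a minimal single-process sketch in Python. It is not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real framework would distribute each phase across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit one (word, 1) pair per word in a single input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on an iterable of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

The whole problem has to be expressed as independent per-record maps plus per-key reductions; anything that needs shared state or iteration between records has to be pressed into that shape, which is the restriction the answer refers to.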

Phil Colella identified seven numerical methods for scientific computation based on the patterns of scattering and gathering of data between processing nodes, and called them 'dwarfs'. Others have since added to these; a list is available at the Dwarf Mine:
Dense Linear Algebra
Sparse Linear Algebra
Spectral Methods
N-Body Methods
Structured Grids
Unstructured Grids
MapReduce
Combinational Logic
Graph Traversal
Dynamic Programming
Backtrack and Branch-and-Bound
Graphical Models
Finite State Machines

Update (August 2014): Stratosphere is now called Apache Flink (incubating).
Have a look at Stratosphere. It is another Big Data runtime that offers more operators (map, reduce, join, union, cross, iterate, ...). It also lets you define advanced data flow graphs (with Hadoop MR, you would have to chain jobs).
Stratosphere also supports BSP with its graph processing abstraction (called Spargel).
If you like to read scientific papers, have a look at Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing. It explains the theoretical background of the system.
Another system in the field is Spark, which has its own model (RDDs). Since BSP has been mentioned here, also have a look at GraphLab; they offer an alternative to BSP.

Microsoft's Dryad is claimed to be more general than MapReduce.

The best alternative to MapReduce is Spark, because it is reported to be 10 to 100 times faster than MapReduce.
It is also very easy to maintain and needs less code for high performance.

Related

Application of dynamic programming in real world programming [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
I have found that dynamic programming takes some skill and is demanding. But since I expect to become an adequate software engineer, I am wondering in which development scenarios DP is used heavily. In other words, is there any practical use for it in development on modern computers?
Compared with design patterns like the Proxy pattern and dynamic proxies, which are broadly used in the Spring framework, DP seems to be useful only in tech interviews.
Also, it does not seem easy to apply parallel computing and distributed systems to DP on modern computers.
Are there any common scenarios where DP is widely used in very practical ways?
Please forgive my ignorance; I haven't met DP in real production-level development, which makes me doubt the point of digging into it.
I agree with @Matt Timmermans.
You don't learn about DP in case you have to use DP someday. By practicing DP, you learn ways of thinking about problems that will make you a better developer. In 10 years, nobody will care about the spring framework, but the techniques you learned from DP will still serve you well.
Now, the answer to your questions, part by part:
1) Why DP if we have modern computers?
I think you got confused by the relationship between modern computers and the need for DP. Because modern computers are powerful, you may wonder why you need DP when you have fast processors to run your application on.
Not every task can simply be thrown at these modern computers, because they come with storage, network, and compute costs. In fact, as engineers, we should be thinking about optimizing the usage of such resources, that is, making our code efficient enough to run on minimal system configurations.
In today's world, we have a shared service architecture. It means that different independent services share resources, so they are in fact indirectly interdependent. Imagine what happens if non-optimized code consumes a lot of memory and compute time: the processors will have difficulty allocating resources to other services or applications.
The thing is, "Why should I buy a whole apartment building if a single multi-bedroom flat can meet my needs and also leaves others the opportunity to buy a flat in the same building?"
2) DP in tech interviews
What makes DP the most challenging topic to ace is the number of variations it has.
It checks your ability to break a difficult task down into small ones, avoiding repetition and thus saving time, effort, and overall resources.
That is one of the main reasons why DP is part of tech interviews.
Not only does DP teach you to optimize and learn useful things, it also highlights bad coding practices.
3) Usage of DP in real life
In Google Maps to find the shortest path between source and the series of destinations (one by one) out of the various available paths.
In networking to transfer data from a sender to various receivers in a sequential manner.
Document distance algorithms, used by search engines like Google, Wikipedia, Quora, and other websites to measure how similar two text documents are.
Edit distance algorithm used in spell checkers.
Databases caching common queries in memory: through dedicated cache tiers storing data to avoid DB access, web servers store common data like configuration that can be used across requests. Then multiple levels of caching in code abstractions within every single request that prevents fetching the same data multiple times and save CPU cycles by avoiding recomputation. Finally, caches within your browser or mobile phones that keep the data that doesn't need to be fetched from the server every time.
Git merge. Document diffing is one of the most prominent uses of LCS.
Dynamic programming is used in TeX's system of calculating the right amounts of hyphenations and justifications.
Genetic algorithms.
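As an illustration of one item from the list above, here is a minimal sketch of the edit distance (Levenshtein) computation that spell checkers build on. It is a textbook formulation, not the code of any particular spell checker, and the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Each table cell is computed once from three already-known neighbours, which is the "break the task into small ones and avoid repetition" idea from the interview section; keeping only two rows keeps memory proportional to the length of `b`.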
Also, I found a great answer on Quora which lists the areas in which DP can be used:
Operations research,
Decision making,
Query optimization,
Water resource engineering,
Economics,
Reservoir Operations problems,
Connected speech recognition,
Slope stability analysis,
Using Matlab,
Using Excel,
Unit commitment,
Image processing,
Optimal Inventory control,
Reservoir operational Problems,
Sap Abap,
Sequence Alignment,
Simulation for sewer management,
Finance,
Production Optimization,
Genetic Algorithms for permutation problem,
Haskell,
HTML,
Healthcare,
Hydropower scheduling,
LISP,
Linear space,
XML indexing and querying,
Business,
Bioinformatics

Javascript data structures library [closed]

Closed 4 years ago.
I'd like to ask for recommendations of JavaScript libraries that supply implementations of some basic data structures, such as a priority queue, a map with arbitrary keys, tries, graphs, etc., along with some algorithms that operate on them.
I'm mostly interested in:
The set of features covered,
Flexibility of the solution - this mostly applies to graphs. For example do I have to use a supplied graph implementation,
Use of functional features of the language - again it sometimes gives greater flexibility,
Performance of the implementation
I'd like to point out that I'm aware that it's possible to implement the following data structures in JavaScript:
A map, if key values are either strings or numbers,
A set, (using a map implementation),
A queue, although as was pointed out below, it's inefficient on some browsers.
At the moment I'm mostly interested in priority queues (not to be confused with regular queues) and in graph implementations that aren't very intrusive as to the format of the input graph. For example, they could use callbacks to traverse the structure of the graph rather than access some concrete properties with fixed names.
I recommend using the Closure Library (especially with the Closure Compiler).
It includes a data structures package, goog.structs.
The library contains:
goog.structs.AvlTree
goog.structs.CircularBuffer
goog.structs.Heap
goog.structs.InversionMap
goog.structs.LinkedMap
goog.structs.Map
goog.structs.PriorityQueue
goog.structs.Set
As an example of usage, see the unit test goog.structs.PriorityQueueTest.
If you need to work on arrays, there's also an array lib: goog.array.
As noted in comments, the source has moved to github.com/google/closure and the documentation's new location is: google.github.io/closure-library.
You can try Buckets, a very complete JavaScript data structure library that includes:
Linked List
Dictionary
Multi Dictionary
Binary Search Tree
Stack
Queue
Set
Bag
Binary Heap
Priority Queue
Probably most of what you want is built-in to Javascript in one way or another, or easy to put together with built-in functionality (native Javascript data structures are incredibly flexible). You might like JSClass.
As for the functional features of the language, underscore.js is where it's at.
I can help you with the maps with arbitrary keys: my jshashtable does this, and there is also a hash set implementation built on top of it.
Efficient queue.
If you find more of these, could you please add them to jswiki? Thanks. :)
Especially for graph-like structures, I find graphlib very convenient:
https://github.com/cpettitt/graphlib/wiki/API-Reference
It is very straightforward, faster than other implementations I tried, and has all the basic features, popular graph algorithms and a JSON data export.
For future visitors to this thread, here is a link to a custom JavaScript library that provides priority queues, tries, basic graph processing and other implementations: check out dsjslib.
data.js.
I don't believe it's as feature-rich as you want, but it has graphs, hashes and collections.
I would take this as a lightweight start that you can extend.
As for what it does offer, it's well written, efficient and documented.
Is your JavaScript in an application, or a web page? If it's for an application, why not outsource the data structures to Redis? There's a client for Node.js.
Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.

Fast linear system solver for D? [closed]

Closed 8 years ago.
Where can I get a fast linear system solver written in D? It should be able to take a square matrix A and a vector b and solve the equation Ax = b for x and, ideally, also perform explicit inversion on A. I have one I wrote myself, but it's pretty slow, probably because it's completely cache-naive. However, for my use case, I need something with the following absolute, non-negotiable requirements, i.e. if it doesn't meet these, then I don't care how good it otherwise is:
Must be licensed public domain, Boost license, or some similar permissive license. Ideally it should not require attribution in binaries (i.e. not BSD), though this point is somewhat negotiable.
Must be written in pure D or easily translatable to pure D. Inscrutable Fortran code (i.e. LAPACK) is not a good answer no matter how fast it is.
Must be optimized for large (i.e. n > 1000) systems. I don't want something that's designed for game programmers to solve 4x4 matrices really, really fast.
Must not be inextricably linked to a huge library of stuff I don't need.
Edit: The reason for these seemingly insane requirements is that I need this code for a permissively licensed open source library that I don't want to have any third-party dependencies.
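For readers unfamiliar with the problem being asked about, the following is a minimal sketch of solving Ax = b by Gaussian elimination with partial pivoting. It is written in Python rather than D, is deliberately cache-naive, and is not the fast solver the question asks for; the function name `solve` and the singularity tolerance `1e-12` are assumptions made for this illustration.

```python
def solve(a, b):
    """Solve the dense linear system a x = b and return x as a list.

    a is a square matrix given as a list of rows; b is the right-hand side.
    """
    n = len(a)
    # Build an augmented copy [a | b] so the caller's data is not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x
```

The innermost loop walks rows element by element with no blocking, which is roughly what "cache-naive" means for n > 1000; optimized libraries get most of their speed from reorganizing exactly this loop nest.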
If you don't like Fortran code, one reasonably fast C++ dense matrix library with modest multi-core support, well-written code and a good user-interface is Eigen. It should be straightforward to translate its code to D (or to take some algorithms from it).
And now my "think about your requirements": there is a reason why "everyone" (Mathematica, Matlab, Maple, SciPy, GSL, R, ...) uses ATLAS / LAPACK, UMFPACK, PARDISO, CHOLMOD etc. It is hard work to write fast, multi-threaded, memory-efficient, portable and numerically stable matrix solvers (trust me, I have tried). A lot of this hard work has gone into ATLAS and the rest.
So my approach would be to write bindings for the relevant library depending on your matrix type, and link from D against the C interfaces. Maybe the bindings in multiarray are enough (I haven't tried). Otherwise, I'd suggest looking at another C++ library, namely uBlas and the respective bindings for ideas.

Data structures for bioinformatics [closed]

Closed 5 years ago.
What are some data structures that should be known by somebody involved in bioinformatics? I guess that anyone is supposed to know about lists, hashes, balanced trees, etc., but I expect that there are domain specific data structures. Is there any book devoted to this subject?
The most fundamental data structure used in bioinformatics is the string. There is also a whole range of different data structures for representing strings, and algorithms like string matching depend on those efficient representations.
A comprehensive work on this is Dan Gusfield's Algorithms on Strings, Trees and Sequences.
A lot of introductory books on bioinformatics will cover some of the basic structures you'd use. I'm not sure what the standard textbook is, but I'm sure you can find that. It might be useful to look at some of the language-specific books:
Bioinformatics Programming With Python
Beginning Perl for Bioinformatics
I chose those two as examples because they're published by O'Reilly, which, in my experience, publishes good quality books.
I just so happen to have the Python book on my hard drive, and a great deal of it talks about processing strings for bioinformatics using Python. It doesn't seem like bioinformatics uses any fancy special data structures, just existing ones.
Spatial data structures (kd-trees, for example) are often used for nearest-neighbor queries on arbitrary feature vectors as well as for 3D protein structure analysis.
Best book for your $$ is Understanding Bioinformatics by Zvelebil because it covers everything from sequence analysis to structure comparison.
In addition to basic familiarity with the structures you mentioned, suffix trees (and suffix arrays), de Bruijn graphs, and interval graphs are used extensively. The Handbook of Computational Molecular Biology is very well written. I've never read the whole thing, but I've used it as a reference.
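As a rough illustration of one of the structures named above, here is a minimal suffix array sketch with a binary-search pattern lookup. The construction is the naive sort-all-suffixes approach, which is fine for short sequences but not what production tools use; the function names `build_suffix_array` and `find_occurrences` are made up for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction: sorts the suffixes directly, so it is only
    suitable for short inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of pattern in text, in increasing order."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix at this rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


dna = "GATTACATTA"
sa = build_suffix_array(dna)
print(find_occurrences(dna, sa, "TTA"))  # [2, 7]
```

Because all suffixes starting with the pattern sit next to each other in sorted order, every occurrence is found with two binary searches instead of a scan of the whole sequence.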
I also highly recommend this book, http://www.comp.nus.edu.sg/~ksung/algo_in_bioinfo/
More recently, Python has become much more widely used in bioinformatics than Perl, so I really suggest you start with Python. It is widely used in my projects.
Many projects in bioinformatics involve combining information from different, semi-structured sources. RDF and ontologies are essential for much of this. See, for example, the bio2RDF project. http://bio2rdf.org/. A good understanding of identifiers is valuable.
Much bioinformatics is exploratory and rapid lightweight tools are often used. See workflow tools such as Taverna where the primary resource is often a set of web services - so HTTP/REST are common.
Whatever your mathematical or computational expertise is, you are likely to find an application in computational biology. If not, post it as another question on Stack Overflow and you'll be helped :o)
As mentioned in the other answers, somewhat timeless are string comparisons and pattern discovery in 1-dimensional data since sequences are so easy to get. With a renewed interest in medical informatics though you also have two/three-dimensional image analysis that you run e.g. against genomic data. With molecular biochemistry you also have pattern searches on 3D surfaces and molecular simulations. To study drug effects you will work with gene networks and compare those across tissues. Typical challenges for big data and information integration apply. And then, you need statistical descriptions of the likelihood of a pattern or the clinical association of any features identified to be found by chance.

Behavior Tree Implementations [closed]

Closed 7 years ago.
I am looking for behavior tree implementations in any language. I would like to learn more about how they are implemented and used so I can roll my own, but I could only find one:
Owyl. Unfortunately, it does not contain examples of how it is used.
Does anyone know of any other open source ones where I can browse through the code and see some examples of how they are used?
EDIT: Behavior tree is the name of the data structure.
Here's a few I've found:
C# - https://github.com/netgnome/BehaviorLibrary (free)
C++ - http://aigamedev.com/insider/tutorial/second-generation-bt/ ($10)
C# - http://code.google.com/p/treesharp/ (free)
C# - https://github.com/ArtemKoval/Simple-Behavior-Tree-Library
Java - http://code.google.com/p/daggame/ DAG AI Code
C# - http://www.sgtconker.com/affiliated-projects/brains/
This Q on GameDev could be helpful, too.
Take a look at https://skill.codeplex.com/. This is a BehaviorTree code generator for Unity. You can download the source code and see if it is useful.
I did my own behavior tree implementation in C++ and used some modified code from the Protothreads Library. Coroutines in C is also a good read. Using this one can implement a coroutine system that allows one to run multiple behaviors concurrently without the use of multiple threads. Basically each tree node would have its own coroutine.
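The coroutine-per-node idea can be sketched with Python generators standing in for Protothreads-style coroutines. This is not the answerer's C++ code; the node constructors (`action`, `wait`, `sequence`, `selector`), the status constants and the `run` helper are all hypothetical names chosen for this illustration.

```python
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"


def action(fn):
    """Leaf node: call fn(state) once and report success or failure."""
    def node(state):
        yield SUCCESS if fn(state) else FAILURE
    return node


def wait(ticks):
    """Leaf node: stay RUNNING for the given number of ticks, then succeed."""
    def node(state):
        for _ in range(ticks):
            yield RUNNING
        yield SUCCESS
    return node


def sequence(*children):
    """Composite: run children in order; fail as soon as one fails."""
    def node(state):
        for child in children:
            for status in child(state):
                if status == RUNNING:
                    yield RUNNING      # suspend here, resume on next tick
                elif status == FAILURE:
                    yield FAILURE
                    return
                else:
                    break              # child succeeded, move to the next
        yield SUCCESS
    return node


def selector(*children):
    """Composite: try children in order; succeed as soon as one succeeds."""
    def node(state):
        for child in children:
            for status in child(state):
                if status == RUNNING:
                    yield RUNNING
                elif status == SUCCESS:
                    yield SUCCESS
                    return
                else:
                    break              # child failed, try the next one
        yield FAILURE
    return node


def run(tree, state):
    """Tick the tree until it finishes; return (final status, tick count)."""
    ticks = 0
    for status in tree(state):
        ticks += 1
        if status != RUNNING:
            return status, ticks


def open_door(state):
    state["door_open"] = True
    return True


tree = selector(
    action(lambda s: s["door_open"]),       # already open?
    sequence(wait(2), action(open_door)),   # otherwise take two ticks to open
)
print(run(tree, {"door_open": False}))  # ('success', 3)
```

Each `yield RUNNING` suspends the node exactly where it is, so several trees can be ticked in turn from one game loop without any threads, which is the property the answer attributes to the coroutine approach.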
I'm not sure I understand you correctly, but I think that to implement a tree your better choice is a functional language such as F# or Haskell. With Haskell you can use flexible and fast tree structures, and with F# you have a multi-paradigm language to parse and handle tree structures in OO code.
I hope that helps you.
You can find behavior trees implemented in .NET in the YVision framework. We found them to be particularly suited for the development of Natural User Interface (NUI) applications.
It's not open-source but it's free to use and you can find information on how we implemented them in the tutorials: http://www.yvision.com/support/tutorials/
EDIT: Let me add that we use behavior trees for a lot more than just AI. Even the synchronization of the subsystems in the game loop is defined by them.
Check the cases page to see the range of applications we are using them for: robotics, camera-based interaction, augmented reality, etc.
Download the framework, try the samples and please give us feedback on our implementation.
https://github.com/TencentOpen/behaviac is a really excellent one.
behaviac supports the behavior tree, finite state machine and hierarchical task network.
Behaviors can be designed and debugged in the designer, exported and executed by the game.
The C++ version is suitable for the client and server side.
And it is open source!

Resources