This question already has answers here:
Sort with the limited memory
(6 answers)
Closed 5 years ago.
Normally, to sort a limited number of records, we use RAM to hold the elements being processed. The problem arises when we are asked to sort millions of random records, where each record contains a set of elements. Such a huge file cannot be sorted using traditional in-memory sorting algorithms. How can I solve this problem?
You need to look for an efficient algorithm for sorting data that is not completely read into memory. A few adaptations of merge sort can achieve this.
Here is a Java implementation of merge sort that sorts very large files:
Take a look at these too:
http://en.wikipedia.org/wiki/Merge_sort
http://en.wikipedia.org/wiki/External_sorting
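A minimal sketch of the external-sorting idea from the links above, assuming the input is a text file with one integer per line; the class name, file names, and chunk size are all illustrative, not from the original answer:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Sketch of an external merge sort: sort a file of integers (one per line)
// while holding only `chunkSize` values in memory at a time.
public class ExternalSort {

    // Phase 1: read the input in chunks, sort each chunk in RAM,
    // and write each sorted chunk to its own temporary "run" file.
    static List<Path> createSortedRuns(Path input, int chunkSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<Long> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(Long.parseLong(line.trim()));
                if (chunk.size() == chunkSize) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
        }
        return runs;
    }

    static Path writeRun(List<Long> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("run", ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(run)) {
            for (long v : chunk) { out.write(Long.toString(v)); out.newLine(); }
        }
        return run;
    }

    // Phase 2: k-way merge of the sorted runs with a priority queue,
    // keeping only one value per run in memory.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong(a -> a[0]));
        List<BufferedReader> readers = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            BufferedReader r = Files.newBufferedReader(runs.get(i));
            readers.add(r);
            String line = r.readLine();
            if (line != null) heap.add(new long[]{Long.parseLong(line), i});
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            while (!heap.isEmpty()) {
                long[] top = heap.poll();
                out.write(Long.toString(top[0])); out.newLine();
                String next = readers.get((int) top[1]).readLine();
                if (next != null) heap.add(new long[]{Long.parseLong(next), top[1]});
            }
        }
        for (BufferedReader r : readers) r.close();
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo: 20 shuffled numbers, chunks of 5, so 4 sorted runs on disk.
        Path input = Files.createTempFile("input", ".txt");
        List<Long> data = new ArrayList<>();
        for (long i = 0; i < 20; i++) data.add(i);
        Collections.shuffle(data, new Random(42));
        try (BufferedWriter out = Files.newBufferedWriter(input)) {
            for (long v : data) { out.write(Long.toString(v)); out.newLine(); }
        }
        Path output = Files.createTempFile("sorted", ".txt");
        mergeRuns(createSortedRuns(input, 5), output);
        System.out.println(Files.readAllLines(output));  // ascending 0..19
    }
}
```

The chunk size is what bounds peak memory: only one chunk is sorted in RAM at a time, and the merge phase holds just one value per run.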
This question already has answers here:
1D Number Array Clustering
(6 answers)
Closed 8 years ago.
Can I cluster data with one variable instead of many (which I have already tested) using Mahout's k-means algorithm? If yes (I hope so :) ), could you give me an example of such clustering? Thanks.
How big is your data? If it is not exabytes, you would be better off without Mahout.
If it is exabytes, use sampling, and then process it on a single machine.
See also:
Cluster one-dimensional data optimally?
1D Number Array Clustering
Which clustering algorithm is suitable for one-dimensional Lists without knowing k?
and many more.
Mahout is not your general go-to place for data analysis. It only shines when you have Google-scale data. Otherwise, the overhead is too large.
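For one-dimensional data on a single machine, plain Lloyd's k-means fits in a few lines; here is a minimal sketch (the data points, initial centers, and iteration count below are made up for illustration):

```java
import java.util.*;

// Minimal sketch of Lloyd's k-means on one-dimensional data, run on a
// single machine -- no Mahout needed.
public class KMeans1D {

    static double[] cluster(double[] data, double[] initialCenters, int iterations) {
        double[] centers = initialCenters.clone();
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[centers.length];
            int[] count = new int[centers.length];
            // Assignment step: each point goes to its nearest center.
            for (double x : data) {
                int best = 0;
                for (int c = 1; c < centers.length; c++)
                    if (Math.abs(x - centers[c]) < Math.abs(x - centers[best])) best = c;
                sum[best] += x;
                count[best]++;
            }
            // Update step: move each center to the mean of its assigned points.
            for (int c = 0; c < centers.length; c++)
                if (count[c] > 0) centers[c] = sum[c] / count[c];
        }
        return centers;
    }

    public static void main(String[] args) {
        // Two obvious groups, around 1.0 and around 10.0.
        double[] data = {0.9, 1.0, 1.1, 9.9, 10.0, 10.1};
        double[] centers = cluster(data, new double[]{0.0, 5.0}, 20);
        Arrays.sort(centers);
        System.out.println(Arrays.toString(centers));  // roughly [1.0, 10.0]
    }
}
```

Note that for strictly one-dimensional data the linked questions describe optimal alternatives (e.g. sorting plus dynamic programming), which avoid k-means's sensitivity to initialization.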
This question already has answers here:
What is stability in sorting algorithms and why is it important?
(10 answers)
Closed 8 years ago.
Can someone explain what "stable" and "unstable" mean in relation to various sorting algorithms? How can one determine whether an algorithm is stable or not, and what applications do unstable sorting algorithms usually have (given that they are unstable)?
If a sorting algorithm is said to be "unstable", this means that for any items that rank the same, the order of the tied members is not guaranteed to stay the same with successive sorts of that collection. For a 'stable' sort, the tied entries will always end up in the same order when sorted.
For an example of applications, the quick sort algorithm is not stable. This would work fine for something like sorting actions by priority (if two actions are of equal priority, you would not be likely to care about which elements of a tie are executed first).
A stable sorting algorithm, on the other hand, is good for things like a leaderboard for an online game. If you were to use an unstable sort, sorting by points (for instance), then a user viewing the sorted results on a webpage could experience different results on page refreshes and operations like paging through results would not function correctly.
A stable sort retains the relative order of identical items. Any sort can be made stable by appending the row index to the key. Unstable sorts, such as heap sort and quick sort, do not have this property inherently, but they are used because they tend to be faster and easier to implement than stable sorts. As far as I know there are no other reasons to use unstable sorts.
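To see the difference concretely, here is a small sketch using Java's `Arrays.sort` for objects (a stable merge sort), plus the append-the-row-index trick mentioned above; the leaderboard data is invented:

```java
import java.util.*;

// Sketch illustrating sort stability: Java's Arrays.sort for objects is a
// stable merge sort, so entries that compare equal keep their input order.
public class StabilityDemo {
    public static void main(String[] args) {
        // Leaderboard rows: name + points. Alice and Carol are tied on 50.
        String[][] rows = {{"Alice", "50"}, {"Bob", "70"}, {"Carol", "50"}};

        // Stable sort by points (descending): the Alice-before-Carol input
        // order of the tie is guaranteed to survive.
        Arrays.sort(rows, (a, b) -> Integer.compare(
                Integer.parseInt(b[1]), Integer.parseInt(a[1])));
        System.out.println(Arrays.deepToString(rows));
        // [[Bob, 70], [Alice, 50], [Carol, 50]]

        // The trick from the answer above: any sort becomes stable if you
        // append the original row index to the key as a tie-breaker.
        Integer[] order = {0, 1, 2};
        int[] points = {50, 70, 50};
        Arrays.sort(order, (i, j) -> points[i] != points[j]
                ? Integer.compare(points[j], points[i])  // primary key: points desc
                : Integer.compare(i, j));                // tie-breaker: original index
        System.out.println(Arrays.toString(order));      // [1, 0, 2]
    }
}
```

With the index tie-breaker, no two keys ever compare equal, so even an unstable algorithm produces a deterministic, stable result.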
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
A data structure supporting O(1) random access and worst-case O(1) append?
I saw an answer a while ago on StackOverflow regarding a provably optimal vector ("array list") data structure, which, if I remember correctly, lazily copied elements onto a larger vector so that it wouldn't cause a huge pause every time the vector reallocated.
I remember it needed O(sqrt(n)) extra space for bookkeeping, and that the answer linked to a published paper, but that's about it... I'm having a really hard time searching for it (you can imagine that searches like optimal vector are getting me nowhere).
Where can I find the paper?
I think that the paper you are referring to is "Resizable Arrays in Optimal Time and Space" by Brodnik et al. Their data structure uses the lazy copying dynamic array you mentioned in your question as a building block to assemble this structure. There is this older question on Stack Overflow describing the lazy-copying data structure, which might be useful to get a better feel for how it works.
Hope this helps!
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Practical uses of different data structures
Could anyone please point me to a brief summary that describes real-life applications of various data structures? I am looking for a ready-to-use summary, not a reference to Cormen's book :)
For example, almost every article says what a binary tree is, but they don't provide examples of when it should really be used in real life; the same goes for other data structures.
Thank you,
Data structures are so widely used that such a summary would actually be enormous. The simplest cases come up almost every day: hash maps for easy lookup of a particular item; linked lists for easy adding/removing of elements (for example, you can describe an object's properties with a linked list and easily add or remove properties); priority queues for many algorithms (Dijkstra's algorithm, Prim's algorithm for minimum spanning trees, Huffman encoding); a trie for representing a dictionary of words; Bloom filters for fast, memory-cheap membership tests (your email spam filter may use one). Data structures are all around us -- you really should study and understand them, and then you will find applications for them everywhere.
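As a tiny concrete illustration of two of the everyday uses above (the data here is hypothetical):

```java
import java.util.*;

// Two everyday data-structure uses, sketched concretely: a hash map for
// fast lookup of a particular item, and a priority queue as the engine
// behind algorithms like Dijkstra's or Huffman coding.
public class EverydayStructures {
    public static void main(String[] args) {
        // Hash map: find a user's email without scanning a whole list.
        Map<String, String> emails = new HashMap<>();
        emails.put("alice", "alice@example.com");
        emails.put("bob", "bob@example.com");
        System.out.println(emails.get("alice"));  // alice@example.com

        // Priority queue: always hands back the smallest pending item first,
        // which is exactly the step Huffman coding repeats on symbol weights.
        PriorityQueue<Integer> weights = new PriorityQueue<>(List.of(5, 9, 2, 7));
        int merged = weights.poll() + weights.poll();  // two smallest: 2 + 5
        weights.add(merged);                           // re-insert combined weight
        System.out.println(merged);                    // 7
    }
}
```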
This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
B- trees, B+ trees difference
What are the advantages/disadvantages of a B+ tree over a B-tree? When should I prefer one over the other? I'm also interested in knowing of any real-world examples where one has been preferred over the other.
According to the Wikipedia article on B+ trees, this kind of data structure is frequently used for indexing block-oriented storage. In a B+ tree, only keys (and not values) are stored in the intermediate nodes, with the values kept in the leaves. This means you need fewer intermediate node blocks, which increases the likelihood of a cache hit.
Real-world examples include various file systems; see the linked article.
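A back-of-envelope sketch of why keys-only interior nodes help; the block and field sizes below are assumptions for illustration, not figures from the article:

```java
// Why B+ tree interior nodes have higher fan-out: with values pushed down
// to the leaves, an interior block holds only keys and child pointers.
// Assumed sizes: 4 KiB disk blocks, 8-byte keys/pointers, 64-byte values.
public class FanOut {
    public static void main(String[] args) {
        int block = 4096, key = 8, ptr = 8, value = 64;
        // B-tree interior entry carries key + value + child pointer.
        int btreeFanout = block / (key + value + ptr);
        // B+ tree interior entry carries only key + child pointer.
        int bplusFanout = block / (key + ptr);
        System.out.println(btreeFanout + " vs " + bplusFanout);  // 51 vs 256
    }
}
```

Higher fan-out means a shallower tree, so an index lookup touches fewer blocks, and the interior levels are far more likely to stay cached.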