I found that non-linear data structures give better memory efficiency than linear data structures. How do non-linear data structures increase memory efficiency?
I'm trying to solve this exercise for this algorithm.
I've tried researching multithreading, but I couldn't come up with a solution.
Cache-oblivious traversal is not about complexity, it is about efficient use of the CPU cache.
The performance when traversing matrices is very dependent on the CPU cache. There can be orders of magnitude difference between two algorithms with identical complexity but with different cache access patterns.
It is a technique that can be used both in a single-threaded and a multi-threaded implementation.
Its basic idea is that you do not traverse the matrix line by line but quadrant by quadrant, allowing the CPU to keep the data it brings in from memory in its cache. Experiment with the size of your quadrants and you will see a huge improvement.
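For illustration, here is a sketch of the quadrant-by-quadrant idea applied to a matrix transpose (the tile size B is an assumption you would tune for your cache; the class and method names are made up):

```java
// Cache-blocked matrix transpose: process the matrix tile by tile so that
// both the source rows and the destination columns stay resident in cache.
public class BlockedTranspose {
    static int[][] transpose(int[][] a, int B) {
        int n = a.length;
        int[][] t = new int[n][n];
        for (int bi = 0; bi < n; bi += B)
            for (int bj = 0; bj < n; bj += B)
                // Finish one B x B tile completely before moving on,
                // so its cache lines are reused while still loaded.
                for (int i = bi; i < Math.min(bi + B, n); i++)
                    for (int j = bj; j < Math.min(bj + B, n); j++)
                        t[j][i] = a[i][j];
        return t;
    }

    public static void main(String[] args) {
        int n = 6;
        int[][] a = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i][j] = i * n + j;
        int[][] t = transpose(a, 2);
        System.out.println(t[3][1]); // a[1][3] = 9
    }
}
```

The naive version (a single `i`/`j` double loop) touches `t` column by column and misses the cache on nearly every write for large `n`; the blocked version does the same work in a cache-friendly order.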
I'm watching university lectures on algorithms and it seems so many of them rely almost entirely on binary search trees of some particular sort for querying/database/search tasks.
I don't understand this obsession with binary search trees. It seems like in the vast majority of scenarios, a BST could be replaced with a sorted array in the case of static data, or a sorted bucketed list if insertions occur dynamically, and then a binary search could be employed over them.
With this approach, you get the same algorithmic complexity (for querying at least) as a BST, way better cache coherency, way less memory fragmentation (and less gc allocs depending on what language you're in), and are likely much simpler to write.
The fundamental issue is that BSTs are completely memory naïve -- their focus is entirely on big-O complexity, and they ignore the very real performance considerations of memory fragmentation and cache coherency... Am I missing something?
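For concreteness, the alternative I have in mind looks roughly like this for the static case (the data values are made up for illustration):

```java
import java.util.Arrays;

// Static sorted data queried with a plain binary search over a contiguous
// array: same O(log n) lookup as a balanced BST, but no per-node objects,
// no pointer chasing, and far better cache behavior.
public class SortedArrayQuery {
    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88, 23};
        Arrays.sort(data);                       // {3, 7, 19, 23, 42, 88}
        int idx = Arrays.binarySearch(data, 23); // O(log n) lookup
        System.out.println(idx);                 // 3
        System.out.println(data[idx]);           // 23
    }
}
```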
Binary search trees (BSTs) are not totally equivalent to the proposed data structure. Their asymptotic complexity is better when it comes to both inserting and removing sorted values dynamically (assuming they are balanced correctly). For example, when you want to maintain an index of the top-k values dynamically:
while not end_of_stream(stream):
    value <- stream.pop_value()
    tree.insert(value)
    if tree.size() > k:
        tree.remove_max()
Sorted arrays are not efficient in this case because of their linear-time insertion. The asymptotic complexity of bucketed lists is no better than that of plain lists, and they also suffer from a linear-time search. One can note that a heap could be used in this case, and in fact it is probably better to use a heap here, although heaps and BSTs are not always interchangeable.
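As a side note, the heap variant just mentioned can be sketched with Java's built-in PriorityQueue used as a max-heap (the stream values, k, and the method name are made up for illustration):

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class TopK {
    // Keep the k smallest values seen so far; popping the max-heap's root
    // plays the role of tree.remove_max() in the pseudocode above.
    static PriorityQueue<Integer> kSmallest(int[] stream, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int value : stream) {
            heap.offer(value);   // O(log k) insertion
            if (heap.size() > k) {
                heap.poll();     // evict the current maximum
            }
        }
        return heap;
    }

    public static void main(String[] args) {
        // After consuming the stream, the heap holds the 3 smallest values.
        System.out.println(kSmallest(new int[] {5, 1, 9, 3, 7, 2}, 3)); // contains 1, 2, 3
    }
}
```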
That being said, you are right: BSTs are slow, cause a lot of cache misses and fragmentation, etc. Thus, they are often replaced by more compact variants like B-trees. A B-tree uses a sorted array in each node to reduce the number of node jumps and make the data structure much more compact. They can be combined with pointer-compression optimizations (e.g. 4-byte indices) to make them even more compact. B-trees are to BSTs what bucketed linked lists are to plain linked lists. B-trees are very good for building dynamic database indexes of huge datasets stored on a slow storage device (because of their size): they enable applications to fetch the values associated with a key using very few storage-device lookups (which are very slow on HDDs, for example). Another real-world example is interval trees.
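To illustrate why the per-node sorted array helps, here is a minimal sketch of one B-tree node's search step (the field names and fan-out are illustrative assumptions, not a full B-tree implementation):

```java
import java.util.Arrays;

// Sketch of a B-tree node: keys live in one sorted array, so locating the
// child to descend into is a binary search within a single contiguous block
// of memory instead of a chain of pointer hops as in a BST.
public class BTreeNode {
    int[] keys;           // sorted, up to (fanout - 1) entries
    BTreeNode[] children; // keys.length + 1 entries when the node is internal

    int childIndexFor(int key) {
        int pos = Arrays.binarySearch(keys, key);
        // Exact hit: descend to the right of the key. Miss: binarySearch
        // returns -(insertionPoint) - 1, so recover the insertion point.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        BTreeNode node = new BTreeNode();
        node.keys = new int[] {10, 20, 30};
        System.out.println(node.childIndexFor(5));  // 0
        System.out.println(node.childIndexFor(20)); // 2
        System.out.println(node.childIndexFor(35)); // 3
    }
}
```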
Note that memory fragmentation can be reduced using compaction methods. For BSTs/B-trees, one can reorder the nodes into a heap-like array layout. However, compaction is not always easy to apply, especially in native languages with raw pointers like C/C++, although some very clever methods exist to do so.
Keep in mind that B-trees shine only on big datasets (especially ones that do not fit in cache). On relatively small datasets, a plain array or even a sorted array is often a very good solution.
I recently started reading about data structures in detail and came across trees. AVL trees are designed with fast in-memory access in mind, and B-trees are designed with efficient disk storage in mind. Suppose I want to design a tree that is both memory efficient and disk-storage efficient; what tree should I use? Is there any way I can combine an AVL tree and a B-tree? Is there any other tree that can do both? Is this fundamentally possible in a real-world scenario?
I want to design a tree which is both memory efficient and disk storage efficient (...) Is there any way I can combine AVL tree and B Tree?
Short answer is no, there isn't, unless you make a breakthrough discovery in the field of data structures. Both of them were designed with specific optimization requirements in mind, you can't have the best of both worlds.
There's a concept in computing called the space–time tradeoff, which can be extended to other types of tradeoffs, like the one you're interested in. You can think of it like this: to improve one property of an already optimized algorithm, you will have to worsen another (unless you discover some new approach no one has thought of before).
I suggest you take a look at the available implementations of optimized Binary Trees and start with the one that best fits your needs.
I'm looking for heap structures for priority queues that can be implemented using Object[] arrays instead of Node objects.
Binary heaps of course work well, and so do n-ary heaps. Java's java.util.PriorityQueue is a binary heap, using an Object[] array as storage.
There are plenty of other heaps, such as Fibonacci heaps for example, but as far as I can tell, these need to be implemented using nodes. And from my benchmarks I have the impression that the overhead of managing all these node objects comes at a cost that may well eat up all the benefits gained. I find it very hard to implement a heap that can compete with the simple array-backed binary heap.
So I'm currently looking for advanced heap / priority queue structures that also do not have the overhead of using Node objects. Because I want it to be fast in reality, not just better in complexity theory... and there is much more happening in reality: for example the CPU has L1, L2 and L3 caches etc. that do affect performance.
My question also focuses on Java for the very reason that I have little influence on memory management here, and there are no structs as in C. A lot of heaps that work well when implemented in C become costly in Java because of the memory management overhead and garbage collection costs.
Several uncommon heap structures can be implemented this way. For example, the Leonardo Heap used in Dijkstra's smoothsort algorithm is typically implemented as an array augmented with two machine words. Like a binary heap, the array representation is an array representation of a specific tree structure (or rather, a forest, since it's a collection of trees).
The poplar heap structure was introduced as a slightly less efficient theoretically but more efficient practically heap structure with the same basic idea - the data is represented as an array with some extra small state information, and is a compressed representation of a forest structure.
A more canonical array-based heap is a d-ary heap, which is a natural generalization of the binary heap to have d children instead of two.
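For illustration, the index arithmetic that lets a d-ary heap live in a flat array can be sketched as follows (the value of d and the method names are assumptions):

```java
// Index arithmetic for a d-ary heap stored in a flat array: no node objects,
// parent/child relationships are pure arithmetic on array indices.
public class DaryHeapIndex {
    static int parent(int i, int d)     { return (i - 1) / d; }
    static int firstChild(int i, int d) { return d * i + 1; }

    public static void main(String[] args) {
        int d = 4;
        // In a 4-ary heap, node 0's children occupy indices 1..4,
        // node 1's children occupy indices 5..8, and so on.
        System.out.println(firstChild(0, d)); // 1
        System.out.println(firstChild(1, d)); // 5
        System.out.println(parent(7, d));     // 1
    }
}
```

Larger d means a shallower tree (fewer levels to sift through) at the cost of more comparisons per level; since each node's children are contiguous in the array, they often share a cache line, which is exactly the kind of real-world win the question is after.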
As a very silly example, a sorted array can be used as a priority queue, though it's very inefficient.
Hope this helps!
I have started studying data structures again. I have found very few practical uses of m-way trees; one of them is file systems on disk. Can someone give me more examples of practical uses of m-way trees?
M-way trees come up in a lot of arenas. Here's a small sampling:
B-trees: these are search trees like binary search trees but with a huge branching factor. They're designed in such a way that each node fits exactly in the amount of memory that can be read from a hard disk in one pass. They have all the same asymptotic guarantees as regular BSTs, but are designed to minimize the number of nodes searched to find a particular element. Consequently, many giant database systems use B-trees or related structures to store large tables on disk. That way, the number of expensive disk reads is minimized and the overall efficiency is much greater.
Octrees. Octrees and their two-dimensional cousins, quadtrees, are data structures for storing points in three-dimensional space. They're used extensively in video games for fast collision detection and real-time rendering computations, and we would be much the worse off without them.
Link/cut trees. These specialized trees are used in network flow problems to efficiently compute matchings or find maximum flows much faster than conventional approaches, which has huge applicability in operations research.
Disjoint-set forests. These multiway trees are used in minimum-spanning tree algorithms to compute connectivity blindingly fast, optimizing the runtime to around the theoretical limit.
Tries. These trees are used to encode string data and allow for extremely fast lookup, storage, and maintenance of sets of strings. They're also used in some regular expression matchers.
Van Emde Boas trees: a lightning-fast implementation of priority queues over integers, backed by a forest of trees with an enormous branching factor.
Suffix trees. These jewels of the text processing world allow for fast string searches. They also typically have a branching factor much greater than two.
PQ-trees. These trees for encoding permutations allow for linear-time planarity testing, which has applications in circuit layout and graph drawing.
Phew! That's a lot of trees. Hope this helps!
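As one concrete illustration from the list above, a minimal trie for storing and looking up a set of strings might look like this (a sketch, not production code):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal trie: each node branches on the next character, so lookup cost is
// proportional to the key's length, independent of how many strings are stored.
public class Trie {
    private final Map<Character, Trie> children = new HashMap<>();
    private boolean terminal; // true if a stored word ends at this node

    void insert(String word) {
        Trie node = this;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Trie());
        }
        node.terminal = true;
    }

    boolean contains(String word) {
        Trie node = this;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.terminal;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insert("tree");
        trie.insert("trie");
        System.out.println(trie.contains("trie")); // true
        System.out.println(trie.contains("tri"));  // false (only a prefix)
    }
}
```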
By m-way, do you mean a generalized tree? If so, pretty much any 'single parent' hierarchy.