Optimal vector data structure? [duplicate] - data-structures

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
A data structure supporting O(1) random access and worst-case O(1) append?
I saw an answer a while ago on StackOverflow regarding a provably optimal vector ("array list") data structure, which, if I remember correctly, lazily copied elements onto a larger vector so that it wouldn't cause a huge pause every time the vector reallocated.
I remember it needed O(sqrt(n)) extra space for bookkeeping, and that the answer linked to a published paper, but that's about it... I'm having a really hard time searching for it (you can imagine that searches like optimal vector are getting me nowhere).
Where can I find the paper?

I think that the paper you are referring to is "Resizable Arrays in Optimal Time and Space" by Brodnik et al. Their data structure uses the lazy-copying dynamic array you mentioned in your question as a building block. There is an older question on Stack Overflow describing the lazy-copying data structure, which might be useful to get a better feel for how it works.
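To give a feel for the lazy-copying idea (only the building block, not the structure from the Brodnik et al. paper, and without its O(sqrt(n)) space bound), here is a minimal sketch. The class name `LazyCopyArray` and its copying schedule are my own assumptions for illustration. It keeps a current array and a next array of twice the capacity, and every append moves one old element across, so no single append ever copies the whole vector.

```python
class LazyCopyArray:
    """Dynamic array that spreads the cost of growing over many appends.

    Illustrative sketch: in Python, `[None] * k` itself takes O(k) time, so
    this only demonstrates the copying schedule. In a lower-level language
    the allocation would be an uninitialized block obtained in O(1).
    """

    def __init__(self):
        self._cap = 2
        self._cur = [None] * self._cap        # holds all live elements
        self._nxt = [None] * (2 * self._cap)  # being filled in the background
        self._n = 0
        self._copied = 0  # length of the prefix of _cur mirrored into _nxt

    def __len__(self):
        return self._n

    def _check(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)

    def __getitem__(self, i):
        self._check(i)
        return self._cur[i]

    def __setitem__(self, i, value):
        self._check(i)
        # Write to both arrays so the mirror never goes stale.
        self._cur[i] = value
        self._nxt[i] = value

    def append(self, value):
        if self._n == self._cap:
            # By now _nxt already mirrors every element, so switching is
            # just a pointer swap plus one fresh allocation.
            self._cur, self._cap = self._nxt, 2 * self._cap
            self._nxt = [None] * (2 * self._cap)
            self._copied = 0
        self._cur[self._n] = value
        self._nxt[self._n] = value
        self._n += 1
        # Migrate one element of the older half per append.
        if self._copied < self._cap // 2:
            self._nxt[self._copied] = self._cur[self._copied]
            self._copied += 1
```

Each append does a constant number of writes; the price is that the structure holds roughly three times the capacity in memory at once, which is what the paper improves on.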
Hope this helps!

Related

Need a good overview for Succinct Data Structures [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Cross posted: Need a good overview for Succinct Data Structure algorithms
Ever since I learned about succinct data structures, I have been in desperate need of a good overview of the most recent developments in that area.
I have googled and read a lot of the articles that show up at the top of the Google results for the queries I could think of. I still suspect I have missed something important here.
Here are topics of particular interest for me:
Succinct encoding of binary trees with efficient operations of getting parent, left/right child, number of elements in a subtree.
The main question here is as follows: all approaches I know of assume the tree nodes are enumerated in breadth-first order (as in the pioneering work in this area, Jacobson, G. J. (1988), Succinct static data structures), which does not seem appropriate for my task. I deal with huge binary trees given in depth-first layout, and the depth-first node indices are keys to other node properties, so changing the tree layout has some cost for me which I'd like to minimize. Hence the interest in references to works that consider tree layouts other than breadth-first.
Large arrays of variable-length items in external memory. The arrays are immutable: I don't need to add/delete/edit the items. The only requirement is O(1) element access time and as low an overhead as possible, better than the straightforward offset-and-size approach. Here are some statistics I gathered about typical data for my task:
typical number of items: hundreds of millions, up to tens of billions;
about 30% of items have a length of no more than 1 bit;
40%-60% of items have a length of less than 8 bits;
only a few percent of items have a length between 32 and 255 bits (255 bits is the limit);
average item length is ~4 bits +/- 1 bit;
any other distribution of item lengths is theoretically possible, but all practically interesting cases have statistics close to those described above.
Links to articles of any complexity, tutorials of any obscurity, more or less documented C/C++ libraries, anything that was useful for you in similar tasks or that looks useful by your educated guess: all such things are gratefully appreciated.
Update: I forgot to add to question 1 that the binary trees I'm dealing with are immutable. I have no requirement to alter them; all I need is to traverse them in various ways, always moving from a node to its children or to its parent, so that the average cost of such operations is O(1).
Also, a typical tree has billions of nodes and should not be fully stored in RAM.
Update 2: In case anyone is interested, I got a couple of good links in https://cstheory.stackexchange.com/a/11265/9276.

What are the standard data structures that can be used to efficiently represent the world of minecraft? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I am thinking of something like a 3x3 matrix for each of the x,y,z coordinates. But that would be a waste of memory since a lot of block spaces are empty. Another solution would be to have a hashmap ((x,y,z) -> BlockObject), but that doesn't seem too efficient either.
When I say efficient, I do not mean optimal. I simply mean efficient enough to run smoothly on a modern-day computer. Keep in mind that the worlds generated by Minecraft are quite huge, so efficiency is important regardless. There is also a ton of metadata that needs to be stored.
As noted in my comment, I have no idea how MineCraft does this, but a common efficient way of representing this sort of data is an Octree; http://en.wikipedia.org/wiki/Octree. The general idea is that it's like a binary tree but in three-space. You recursively divide each block of space in each dimension to get eight smaller blocks, and each block contains the pointers to the smaller blocks and a pointer to its parent block.
This allows you to be efficient about storing large blocks of the same material (e.g., "empty space"), because you can terminate the recursion whenever you get to a block that is made up of all the same thing, even if you haven't recursed down to the level of individual "cube" units.
Also, this means that you can efficiently find all the cubes in a given region by taking your current block and going up the tree just far enough to get to a block that contains all you can see -- and that way, you can very easily ignore all the cubes that are somewhere else.
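As a rough illustration of the octree idea described above, here is a minimal sketch. The class name `Octree`, the integer block ids and the power-of-two world size are assumptions made for the example, and parent pointers are left out to keep it short. A node is either a single block id covering its whole cube, or a list of eight children.

```python
class Octree:
    """Sparse voxel grid over a cube of side `size` (a power of two).

    A node is either a block id (the whole sub-cube is that one material)
    or a list of eight child nodes, one per octant.
    """

    def __init__(self, size, fill=0):
        self.size = size
        self.root = fill  # a single leaf: the whole world is `fill`

    @staticmethod
    def _octant(x, y, z, half):
        # One bit per axis selects which of the eight children holds (x, y, z).
        return (x >= half) | ((y >= half) << 1) | ((z >= half) << 2)

    def get(self, x, y, z):
        node, size = self.root, self.size
        while isinstance(node, list):
            half = size // 2
            node = node[self._octant(x, y, z, half)]
            x, y, z, size = x % half, y % half, z % half, half
        return node

    def set(self, x, y, z, block):
        self.root = self._set(self.root, self.size, x, y, z, block)

    def _set(self, node, size, x, y, z, block):
        if size == 1:
            return block
        if not isinstance(node, list):
            if node == block:
                return node      # cube is already uniformly this material
            node = [node] * 8    # split the uniform cube into octants
        half = size // 2
        i = self._octant(x, y, z, half)
        node[i] = self._set(node[i], half, x % half, y % half, z % half, block)
        # Merge back into one leaf if all eight octants are the same material.
        if all(not isinstance(c, list) and c == node[0] for c in node):
            return node[0]
        return node


world = Octree(16)        # 16x16x16 cube of "empty space" (id 0)
world.set(3, 4, 5, 7)     # place one block of material 7
print(world.get(3, 4, 5), world.get(3, 4, 6))
```

An entirely empty world costs a single leaf, and setting one block only splits the cubes along the path down to that block.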
If you're interested in exploring alternative means of representing Minecraft world (chunk) data, you can also look into the idea of bitstrings. Each 'chunk' comprises a volume of 16*16*128 blocks, where 16*16 can adequately be represented by a single byte character and can be consolidated into a binary string.
As this approach is highly specific to a certain goal of trading client-computation vs highly optimized storage and transfer time, it seems imprudent to attempt to explain all the details, but I have created a specification for just this purpose, if you're interested.
Using this method, the storage cost is calculated very differently from the current 1 byte per block; it is instead 'variable-bit-rate': ((1 bit per block, rounded up to a multiple of 8) * (number of unique layers in which a blocktype appears in a chunk) + 2 bytes)
This is then summed over the unique blocktypes in that chunk.
Pretty much only in deliberate edge cases can this be more expensive than a normally structured chunk. In excess of 99% of Minecraft chunks are naturally generated and would benefit from this variable-bit representation, by a ratio of 8:1 or more in many of my tests.
Your best bet is to decompile Minecraft and look at the source. Modifying Minecraft: The Source Code is a nice walkthrough on how to do that.
Minecraft is very far from efficient. It just stores "chunks" of data.
Check out the "Map formats" in the Development Resources at Minecraft Wiki. AFAIK, the internal representation is exactly the same.

Is there any summary which describes "real-life" applications of various data structures? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Practical uses of different data structures
Could anyone please point me to a brief summary which describes real-life applications of various data structures? I am looking for a ready-to-use summary, not a reference to Cormen's book :)
For example, almost every article says what a binary tree is, but they don't provide examples of when it really should be used in real life; the same goes for other data structures.
Thank you,
Data structures are so widely used that such a summary would be enormous. The simplest cases are used almost every day -- hashmaps for fast lookup of a particular item. Linked lists -- for easy adding/removing of elements (for example, you can describe an object's properties with a linked list and easily add or remove such properties). Priority queues -- used in many algorithms (Dijkstra's algorithm, Prim's algorithm for minimum spanning trees, Huffman coding). Tries for representing a dictionary of words. Bloom filters for fast, memory-cheap membership tests (your email's spam filter may use one). Data structures are all around us -- you really should study and understand them, and then you can find applications for them everywhere.
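To make one of these concrete, here is a small sketch of a priority queue at work in Dijkstra's algorithm, using Python's standard `heapq` module. The graph format (a dict mapping each node to a list of `(neighbour, weight)` pairs) and the function name are assumptions chosen for the example.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm; edge weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]  # priority queue ordered by tentative distance
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was found already
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


roads = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}
print(shortest_distances(roads, "a"))
```

The heap always hands back the closest unfinished node, which is exactly the operation a priority queue makes cheap.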

Advantages of BTree+ over BTree [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
B- trees, B+ trees difference
What are the advantages/disadvantages of BTree+ over BTree? When should I prefer one over other? I'm also interested in knowing any real world examples where one has been preferred over other.
According to the Wikipedia article about BTree+, this kind of data structure is frequently used for indexing block-oriented storage. Apparently, in a BTree+ only keys (and not values) are stored in the intermediate nodes. This would mean that you need fewer intermediate node blocks, which increases the likelihood of a cache hit.
Real world examples include various file systems; see the linked article.
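A back-of-the-envelope calculation shows why keeping values out of the intermediate nodes helps. All sizes below (4096-byte blocks, 8-byte keys and pointers, 120-byte values, one billion keys) are made-up assumptions, and the model ignores node headers and leaf layout, so treat it as a rough sketch only.

```python
def tree_height(num_keys, fanout):
    """Smallest number of levels whose combined fan-out reaches num_keys."""
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels


BLOCK = 4096    # bytes per storage block (assumed)
KEY = 8         # bytes per key (assumed)
POINTER = 8     # bytes per child pointer (assumed)
VALUE = 120     # bytes per stored value (assumed)
NUM_KEYS = 10**9

# BTree: every entry in an intermediate node carries key, value and pointer.
btree_fanout = BLOCK // (KEY + VALUE + POINTER)
# BTree+: intermediate nodes carry only keys and pointers.
bplus_fanout = BLOCK // (KEY + POINTER)

print("BTree  fan-out", btree_fanout, "levels", tree_height(NUM_KEYS, btree_fanout))
print("BTree+ fan-out", bplus_fanout, "levels", tree_height(NUM_KEYS, bplus_fanout))
```

With a larger fan-out per block the tree gets shallower, so a lookup touches fewer blocks and the few intermediate blocks are more likely to stay cached.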

Skip Lists -- ever used them? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I'm wondering whether anyone here has ever used a skip list. It looks to have roughly the same advantages as a balanced binary tree but is simpler to implement. If you have, did you write your own, or use a pre-written library (and if so, what was its name)?
My understanding is that they're not so much a useful alternative to binary trees (e.g. red-black trees) as they are to B-trees for database use, so that you can keep the # of levels down to a feasible minimum and deal w/ base-K logs rather than base-2 logs for performance characteristics. The algorithms for probabilistic skip-lists are (IMHO) easier to get right than the corresponding B-tree algorithms. Plus there's some literature on lock-free skip lists. I looked at using them a few months ago but then abandoned the effort on discovering the HDF5 library.
literature on the subject:
Papers by Bill Pugh:
A skip list cookbook
Skip lists: A probabilistic alternative to balanced trees
Concurrent Maintenance of Skip Lists
non-academic papers/tutorials:
Eternally Confuzzled (has some discussion on several data structures)
"Skip Lists" by Thomas A. Anastasio
Actually, for one of my projects, I am implementing my own full STL. And I used a skiplist to implement my std::map. The reason I went with it is that it is a simple algorithm which is very close to the performance of a balanced tree but has much simpler iteration capabilities.
Also, Qt4's QMap was a skiplist as well which was the original inspiration for my using it in my std::map.
Years ago I implemented my own for a probabilistic algorithms class. I'm not aware of any library implementations, but it's been a long time. It is pretty simple to implement. As I recall they had some really nice properties for large data sets and avoided some of the problems of rebalancing. I think the implementation is also simpler than binary trees in general. There is a nice discussion and some sample C++ code here:
http://www.ddj.us/cpp/184403579?pgno=1
There's also an applet with a running demonstration. Cute 90's Java shininess here:
http://www.geocities.com/siliconvalley/network/1854/skiplist.html
Java 1.6 (Java SE 6) introduced ConcurrentSkipListSet and ConcurrentSkipListMap to the collections framework. So, I'd speculate that someone out there is really using them.
Skiplists tend to offer far less contention for locks in a multithreaded situation, and (probabilistically) have performance characteristics similar to trees.
See the original paper [pdf] by William Pugh.
I implemented a variant that I termed a Reverse Skip List for a rules engine a few years ago. Much the same, but the reference links run backward from the last element.
This is because it was faster for inserting sorted items that were most likely towards the back-end of the collection.
It was written in C# and took a few iterations to get working successfully.
The skip list has the same logarithmic time bounds for searching as binary search on a sorted table, and it extends that performance to updates when inserting or deleting entries. Note, however, that the bounds for the skip list are expected bounds, while binary search of a sorted table has a worst-case bound.
Skip lists are easy to implement, but you have to be careful when adjusting the pointers on insertion and deletion. I have not used one in a real program, but I have done some runtime profiling. Skip lists are different from search trees. The similarity is that a skip list gives average log(n) time over a sequence of dictionary operations, just like the splay tree. It is better than an unbalanced search tree but is not better than a balanced tree.
Every skip list node has forward pointers which represent the current->next() connections at the different levels of the skip list. Typically the number of levels is bounded at a maximum of ln(N), so if N = 1 million the level is 13. There will be that many pointers, and in Java this means twice the number of pointers for implementing reference data types, whereas a balanced search tree has fewer and gives the same runtime!
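For readers who have not seen one, here is a minimal sketch of a skip list with per-node forward pointers, in the spirit of Pugh's description. The names, the level cap of 16 and the promotion probability of 0.5 are assumptions picked for the example, not values taken from any of the answers.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 16  # assumed cap on the number of levels
    P = 0.5         # assumed chance of promoting a node one level up

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1                           # levels currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, the last node whose key is smaller than `key`."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def __contains__(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is not None and node.key == key:
            return  # already present
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(key, level)
        for i in range(level):  # splice the new node in on each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def remove(self, key):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is None or node.key != key:
            return False
        for i in range(len(node.forward)):  # unlink on each of its levels
            update[i].forward[i] = node.forward[i]
        return True

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Insertion and deletion only touch the predecessors collected on the way down, which is the pointer bookkeeping that needs care; there is no rebalancing step.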
Skip list vs. splay tree vs. hash: as profiled for dictionary lookup operations, a lock-striped hash table gives a result in under 0.010 ms, whereas a splay tree takes ~1 ms and a skip list ~720 ms.
