Why does std::map have a bidirectional iterator type?

I'm studying the different types of iterators right now. I've read that std::map has bidirectional iterators, and that std::set and std::list also provide this type of iterator. Why aren't they random access iterators? Can anyone explain this to me? Thank you!

The Standard C++ library provides random access iterators for container types in which an arbitrary element can be accessed in constant time, e.g. std::array, std::vector, and std::deque. That is because a random access container is one in which any element can be reached in constant time. std::list is not a random access container type: access takes linear time. Neither is std::set: access takes logarithmic time. Likewise std::map.
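As a rough illustration, here is a minimal sketch (assuming a C++11-or-later compiler; the container contents are arbitrary). The iterator category is visible in the iterator's traits, and the practical consequence is that advancing a map iterator by n positions is an n-step walk, while a vector iterator can jump directly:

#include <iostream>
#include <iterator>
#include <map>
#include <type_traits>
#include <vector>

int main() {
    // The iterator category is part of each iterator's traits.
    using MapIt = std::map<int, int>::iterator;
    using VecIt = std::vector<int>::iterator;

    static_assert(std::is_same<std::iterator_traits<MapIt>::iterator_category,
                               std::bidirectional_iterator_tag>::value,
                  "map iterators are bidirectional");
    static_assert(std::is_base_of<std::random_access_iterator_tag,
                                  std::iterator_traits<VecIt>::iterator_category>::value,
                  "vector iterators are random access");

    std::map<int, int> m{{1, 10}, {2, 20}, {3, 30}};
    std::vector<int> v{10, 20, 30};

    // std::advance works on both, but it is O(1) for the vector iterator and
    // O(n) for the map iterator, which can only be moved one step at a time.
    auto mit = m.begin();
    std::advance(mit, 2);          // walks node by node
    auto vit = v.begin() + 2;      // operator+ exists only for random access iterators

    std::cout << mit->second << ' ' << *vit << '\n';  // prints "30 30"
}

The jump v.begin() + 2 has no equivalent for the map iterator precisely because the tree nodes are not laid out at predictable addresses.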

Related

Hashtable with both values as key

Is there a hashing-based data structure where I can search for an item in O(1) time by both key and value?
This can be achieved by adding a duplicate entry for each key-value pair with the key and value reversed, but that takes double the space.
This kind of data structure might be useful in some scenarios: for example, I want to store opening and closing parentheses in a map, and while parsing a string I can simply check whether a character is present as a key, without worrying about whether the map goes from opening to closing or from closing to opening, and without storing duplicates.
I hope I am clear enough!!
The data structure that fulfills your needs is called a bidirectional map.
I suppose that you are looking for an existing implementation, not for pointers on how to implement it :) Since you didn't specify the programming language, this is the current situation for Java - there is no such data structure in the Java API. However, there is Google Guava's bidirectional map interface with several implementations. From the docs:
A bimap (or "bidirectional map") is a map that preserves the
uniqueness of its values as well as that of its keys. This constraint
enables bimaps to support an "inverse view", which is another bimap
containing the same entries as this bimap but with reversed keys and
values.
Alternatively, there is BidiMap from Apache Collections.
For C++, have a look at Boost.Bimap.
For Python, have a look at bidict.
In C#, as well as in other languages, there does not exist an official implementation, but that's where Jon Skeet comes in.
You're searching for a bidirectional map. Here is an article describing the implementation in C++. Note though that a bidirectional map is basically two maps merged into a single object. There isn't any more efficient solution than this though, for a simple reason:
a map is basically an unconnected directed graph of (key,value)-pairs. Each pair is represented by an edge. If you want the map to be bidirectional you'll wind up with twice as many edges, thus doubling the amount of required memory.
The C++ and Java standard libraries don't provide any classes for this purpose though. In Java you can use Google's Guava library; in C++ the Boost library provides bidirectional maps.
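To make the "two maps merged into a single object" idea concrete, here is a minimal, hypothetical C++ sketch (the BiMap name and interface are invented for illustration, and it assumes C++17 for std::optional; Boost.Bimap is the maintained library to use in practice):

#include <iostream>
#include <map>
#include <optional>

// A toy bidirectional map: two std::maps kept in sync, so lookups work in
// O(log n) by key or by value, at the cost of storing every pair twice.
template <typename K, typename V>
class BiMap {
    std::map<K, V> forward_;
    std::map<V, K> backward_;
public:
    void insert(const K& key, const V& value) {
        forward_[key] = value;
        backward_[value] = key;
    }
    std::optional<V> by_key(const K& key) const {
        auto it = forward_.find(key);
        if (it != forward_.end()) return it->second;
        return std::nullopt;
    }
    std::optional<K> by_value(const V& value) const {
        auto it = backward_.find(value);
        if (it != backward_.end()) return it->second;
        return std::nullopt;
    }
};

int main() {
    BiMap<char, char> parens;          // the parentheses use case from the question
    parens.insert('(', ')');
    parens.insert('[', ']');

    std::cout << *parens.by_key('(') << '\n';    // prints ')'
    std::cout << *parens.by_value(']') << '\n';  // prints '['
}

As the answers point out, the forward and backward maps hold the same pairs twice, which is exactly where the doubled memory comes from.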

An efficient Javascript set structure

After reading many similar questions:
JavaScript implementation of a set data structure
Mimicking sets in JavaScript?
Node JS, traditional data structures? (such as Set, etc), anything like Java.util for node?
Efficient Javascript Array Lookup
Best way to find if an item is in a JavaScript array?
How do I check if an array includes an object in JavaScript?
I still have a question: suppose I have a large array of strings (several thousand), and I have to make many lookups (i.e. check many times whether a given string is contained in this array). What is the most efficient way to do this in Node.js?
A. Sort the array of strings, then use binary search? or:
B. Convert the strings to keys of an object, then use the "in" operator?
I know that the complexity of A is O(log N), where N is the number of strings.
But I don't know the complexity of B.
If a JavaScript object is implemented as a hash table, then the complexity of B is, on average, O(1), which is better than A. However, I don't know whether a JavaScript object is really implemented as a hash table!
Update for 2016
Since you're asking about node.js and it is 2016, you can now use either the Set or Map object from ES6, as these are built into the language. Both allow you to use any string as a key. The Set object is appropriate when you just want to see if the key exists, as in:
if (mySet.has(someString)) {
//code here
}
And, Map is appropriate when you want to store a value for that key as in:
if (myMap.has(someString)) {
let val = myMap.get(someString);   // values stored with .set() are read back with .get()
// do something with val here
}
Both ES6 features are now built into node.js as of node V4 (the current version of node.js as of this edit is v6).
See this performance comparison to see how much faster the Set operations are than many other choices.
Older Answer
All important performance questions should be tested with actual performance tests in a tool like jsperf.com. In your case, a JavaScript object uses a hash-table-like implementation, because without something that performs pretty well the whole language would be slow, since so much of JavaScript relies on objects.
String keys on an object would be the first thing I'd test and would be my guess for the best performer. Since the internals of an object are implemented in native code, I'd expect this to be faster than your own hashtable or binary search implemented in JavaScript.
But, as I said at the start of my answer, you should really test your specific circumstances, with the number and length of strings you are most concerned about, in a tool like jsperf.
For a fixed, large array of strings I suggest using some form of radix search.
Also, take a look at the different data structures and algorithms (AVL trees, queues/heaps, etc.) in this package.
I'm pretty sure that using a JS object as storage for strings will result in 'hash mode' for that object. Depending on the implementation, this could be anywhere from O(log n) to O(1) time. Look at some jsperf benchmarks to compare property lookup vs. binary search on a sorted array.
In practice, especially if I'm not going to use the code in the browser, I would offload this functionality to something like Redis or memcached.

Data Structure that supports constant time for addbegin, addend and random access

I am looking for a data structure that supports constant-time performance for adding an element at the beginning, adding an element at the end, and random access.
I am thinking of a double-ended queue. Does a double-ended queue support constant-time random access? If so, how does it achieve it?
I know one can use a doubly linked list to build a double-ended queue, but how do you build an index over all the elements to achieve constant-time random access?
Thank you for your help.
Jerry
What you're looking for would be a double-ended queue by definition, but that's just an abstract data structure. Wikipedia discusses several implementation strategies for deques in terms of dynamic arrays; using these, you can get amortized O(1) prepend and append. The circular buffer strategy seems the easiest to implement, at first glance.
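For what it's worth, C++'s std::deque already offers O(1) indexing together with amortized O(1) insertion at both ends. To sketch how the circular-buffer strategy achieves this, here is a toy C++ example (the class name, growth policy, and interface are simplified inventions for illustration, and it assumes the element type is default-constructible):

#include <cstddef>
#include <iostream>
#include <vector>

// A toy circular-buffer deque. Element i is stored at slot (head + i) % capacity,
// so random access is a single address computation, and pushing at either end
// just moves the head or tail. Doubling the buffer when it fills keeps pushes
// amortized O(1).
template <typename T>
class RingDeque {
    std::vector<T> buf_;
    std::size_t head_ = 0;   // slot of the logical first element
    std::size_t size_ = 0;

    std::size_t slot(std::size_t i) const { return (head_ + i) % buf_.size(); }

    void grow_if_full() {
        if (size_ < buf_.size()) return;
        std::vector<T> bigger(buf_.size() * 2);
        for (std::size_t i = 0; i < size_; ++i) bigger[i] = (*this)[i];  // copy in logical order
        buf_ = std::move(bigger);
        head_ = 0;
    }

public:
    explicit RingDeque(std::size_t initial_capacity = 8) : buf_(initial_capacity) {}

    std::size_t size() const { return size_; }

    T& operator[](std::size_t i) { return buf_[slot(i)]; }              // O(1) random access
    const T& operator[](std::size_t i) const { return buf_[slot(i)]; }

    void push_back(const T& v) {
        grow_if_full();
        buf_[slot(size_)] = v;
        ++size_;
    }

    void push_front(const T& v) {
        grow_if_full();
        head_ = (head_ + buf_.size() - 1) % buf_.size();  // step the head back one slot
        buf_[head_] = v;
        ++size_;
    }
};

int main() {
    RingDeque<int> d;
    d.push_back(2);
    d.push_back(3);
    d.push_front(1);
    for (std::size_t i = 0; i < d.size(); ++i) std::cout << d[i] << ' ';  // prints 1 2 3
    std::cout << '\n';
}

The key point is that random access never walks the structure: element i is always at a computable slot, so indexing costs the same no matter how many elements the deque holds.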

why 2D array is better than objects to store x-y coordinates for better performance and less memory?

Assume I want to store n points with integer (x, y) coordinates. I can use a 2-d (2 x n) array, or I can use a list / collection / array of n objects where each object has 2 integer fields to store the coordinates.
As far as I know, the 2-d array option is faster and consumes less memory, but I don't know why. A detailed explanation or links with details are appreciated.
This is a very broad question, and it kinda has many parts to it. First off, this is relative to the language you are working in. Let's take Java as an example.
When you create an object, it inherits from the root Object class, and that inheritance is where the overhead comes from. The compiler has to virtualize certain method calls so that when you call .equals() or .toString(), the program knows which one to call (that is, your class's .equals() or Object's .equals()). This is accomplished with a lookup table and resolved at runtime through pointers.
This is called virtualization. Now, in Java, an array is actually an object, so you really don't gain much from an array of arrays. In fact, you might do better using your own class, since you can limit the metadata associated with it. Arrays in Java store information on their length.
However, many of the collections DO have overhead associated with them. ArrayList, for example, will resize itself and stores metadata about itself in memory that you might not need. LinkedList has references to other nodes, which is overhead on top of its actual data.
Now, what I said is only true about Java. In other OO languages, objects behave differently on the insides, and some may be more/less efficient.
In a language such as C++, when you allocate an array, you are really just getting a chunk of memory, and it is up to you what you want to do with it. In that sense, it might be better. C++ has similar overhead with its objects if you use overriding (keyword virtual), as it will create these virtual lookups in memory.
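A tiny C++ sketch of that last point (the struct names are made up for illustration, and the exact sizes depend on the platform; 8 vs 16 bytes is typical on a 64-bit system):

#include <iostream>

struct PlainPoint { int x; int y; };   // no virtual functions, just the data

struct VirtualPoint {                  // same data, plus one virtual method
    int x; int y;
    virtual ~VirtualPoint() = default;
};

int main() {
    // The virtual destructor forces each VirtualPoint to carry a hidden pointer
    // to a lookup table (the vtable), so every instance gets bigger and calls
    // through the base type go through one extra indirection.
    std::cout << sizeof(PlainPoint) << " vs " << sizeof(VirtualPoint) << '\n';  // e.g. 8 vs 16
}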
It all comes down to how efficiently you'll be using the storage space and what your access requirements are. Having to set aside memory to hold a 10,000 x 10,000 array to store only 10 points would be a hideous waste of memory. On the flip side, saving memory by storing the points in a linked list will also be pointless if you spend so much time iterating the list to find the one point you actually need out of the 10,000,000 stored.
Some of the downsides of both can be overcome: sparse arrays, pre-sorting the list by some rule so that "needed" points float to the top, etc.
In most languages, with a multidimensional array, say A x B, you just have a chunk of memory big enough to hold A*B elements, and when you look up an element (m, n) all you need to do is find the item at location m*B + n. When you have a list of objects, there is overhead associated with every object, and the lookup is more complex than a simple address calculation.
If the size of your matrix is constant, a 2D array is the fastest option. If it needs to grow and shrink, though, you probably have no option but to use the second approach.
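To illustrate the address calculation, here is a small C++ sketch assuming integer coordinates (note that in C++ a std::vector of structs is itself contiguous; the per-object indirection described above applies mainly to Java-style object references):

#include <cstddef>
#include <iostream>
#include <vector>

struct Point { int x; int y; };

int main() {
    const std::size_t n = 4;

    // Option 1: one flat, contiguous block; coordinate j of point i lives at index i * 2 + j.
    std::vector<int> flat(2 * n);
    flat[3 * 2 + 0] = 30;   // x of point 3
    flat[3 * 2 + 1] = 31;   // y of point 3

    // Option 2: an array of small objects holding the same data. In C++ this is
    // still one contiguous block, but in Java each element would be a separately
    // allocated object with a header, reached through a reference.
    std::vector<Point> pts(n);
    pts[3] = Point{30, 31};

    std::cout << flat[6] << ',' << flat[7] << " vs " << pts[3].x << ',' << pts[3].y << '\n';
}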

What is the standard OCaml data structure with fastest iteration?

I'm looking for a container that provides the fastest unordered iteration through the encapsulated elements. In other words, "add once, iterate many times".
Is there one among OCaml's standard modules that is fast enough (such that further optimization of it would be useless)? Or some kind of third-party, GPL-ready one?
AFAIK there's just one OCaml compiler, so the concept of being fast is more or less clear...
...But after I saw a couple of answers, it appears it's not. Of course, there are plenty of data structures that allow O(n) iteration through a container of size n. But the task I'm solving is one of those where the difference between O(n) and O(2n) matters ;-).
I also see that Arrays and Lists provide unnecessary information about the order in which elements were added, which I don't need. Maybe in the "functional world" there exist data structures that can trade this information for a bit of iteration speed.
In C I would outright pick a plain array. The question is, what should I pick in OCaml?
You are unlikely to do better than the built-in arrays and lists, since they are hand-coded in C, unless you bind to your own native implementation of an iterator. An array will behave almost exactly like an array in C (a contiguously allocated block of memory containing a sequence of element values), possibly with some extra pointer indirections due to boxing. Lists are implemented exactly how you would expect: as cells with a value and a "next" pointer. Arrays will give you the best locality for unboxed types (especially floats, which have a super-special unboxed implementation).
For information about the implementation of arrays and lists, see Section 18.3 of the OCaml manual and the files byterun/mlvalues.h, byterun/array.c, and byterun/alloc.c in the OCaml source code.
From the questioner: indeed, Array appeared to be the fastest solution. However, it only outperformed List by 7%. Maybe it was because the type of an array element was not plain enough: it was an algebraic type. Hashtbl performed 4 times worse, as expected.
So, I will pick Array, and I'm accepting this one. Good.
To know for sure, you're going to have to measure. Based on the machine instructions the compiler is likely to generate, I would try an array, then a list.
Access to an array element requires a bounds check, address arithmetic, and a load.
Access to the head of a list requires a load, a test for the empty list, and a load at a known compile-time offset.
The details of which is faster probably depend on your application and what else is happening on your machine. They also depend on the type of elements; for example, if they are floating-point numbers, ocamlopt may be clever enough to make an unboxed array, which will save you a level of indirection.
Other common data structures like hash tables or balanced trees generally require that you allocate some context somewhere to keep track of where you are. With an array, keeping track requires only an integer index; with a list, keeping track requires a single pointer. I think this is going to be hard to beat in another data structure.
Finally please note that there may be only one OCaml compiler, but it has two back ends: bytecode and native code. Naturally if you care about this level of performance, you are using the native-code ocamlopt version. Right?
Please take measurements and edit the results into your question.
Don't forget about Bigarrays; they are the closest to C arrays (just a flat piece of memory), but they cannot contain arbitrary OCaml values. Also consider switching bounds checking off (unsafe_set/get). And of course you should profile first.
The array - a linear piece of memory with the items visited in sequential order - best utilises the CPU's L1 data cache.
All common data structures are iterable in O(n) time, so the differences between data structures will only be constant (and very probably not significant).
At least lists and arrays allow iteration without significant overhead. I can't think of a situation where that would not be fast enough.
