Iterating over unordered_map C++ - c++11

Is it true that keys inserted in a particular order in an unordered_map will come out in the same order when iterating over the map with an iterator?
For example: if we insert (4, 3), (2, 5), (6, 7) into B.
And iterate like:
for (auto it = B.begin(); it != B.end(); it++) {
    cout << (it->first);
}
will it give us 4, 2, 6, or may the keys come in any order?

From the cplusplus.com page about the begin member function of unordered_map (link):
Notice that an unordered_map object makes no guarantees on which specific element is considered its first element.
So no, there is no guarantee the elements will be iterated over in the order they were inserted.
FYI, you can iterate over an unordered_map more simply:
for (auto& it : B) {
    // Do stuff
    cout << it.first;
}

To add to the answer provided by @Aimery:
Unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements
have average constant-time complexity.
Internally, the elements are not sorted in any particular order but organized into buckets. Which bucket an element is placed into
depends entirely on the hash of its key. This allows fast access to
individual elements since once the hash is computed, it refers to the
exact bucket the element is placed into.
See the reference at https://en.cppreference.com/w/cpp/container/unordered_map.
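As a small illustration of the bucket organization (a sketch only; the bucket count and the bucket each key lands in are entirely implementation-dependent), unordered_map exposes the layout through bucket_count() and bucket(key):
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> B{{4, 3}, {2, 5}, {6, 7}};
    std::cout << "bucket count: " << B.bucket_count() << '\n';
    for (const auto& kv : B)
        std::cout << "key " << kv.first << " lives in bucket "
                  << B.bucket(kv.first) << '\n';
}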
According to an answer Sumod Mathilakath gave on Quora:
If you prefer to keep intermediate data in sorted order, use std::map<key, value> instead of std::unordered_map. It will sort on the key by default using std::less<>, so you will get the results in ascending key order.
std::unordered_map is an implementation of the hash table data structure, so it arranges its elements internally according to their hash values. std::map, on the other hand, is usually implemented as a red-black tree.
See the reference: What will be order of key in unordered_map in c++ and why?
So, I think that makes the answer clearer.
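A minimal sketch contrasting the two containers, reusing the (4, 3), (2, 5), (6, 7) insertions from the question:
#include <iostream>
#include <map>
#include <unordered_map>

int main() {
    // std::map keeps keys sorted (std::less by default), so iteration
    // visits 2, 4, 6 regardless of insertion order.
    std::map<int, int> m{{4, 3}, {2, 5}, {6, 7}};
    for (const auto& kv : m) std::cout << kv.first << ' ';   // 2 4 6
    std::cout << '\n';

    // std::unordered_map gives no such guarantee: the order depends on
    // the hash values and the implementation's bucket layout.
    std::unordered_map<int, int> u{{4, 3}, {2, 5}, {6, 7}};
    for (const auto& kv : u) std::cout << kv.first << ' ';   // unspecified order
    std::cout << '\n';
}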

Related

How can I invert the order of items in a golang map? [duplicate]

I'm new to golang and I'd like to invert the order of appearance of pairs in a map like this so that the last pair comes first:
mapA := map[string]int{
    "cat":  5,
    "dog":  2,
    "fish": 3,
}
fmt.Println(mapA)
map[cat:5 dog:2 fish:3]
The resulting map should be like:
map[fish:3 dog:2 cat:5]
It can be a new mapB with the same items but inverted order.
How can I achieve this?
The Go Programming Language Specification
Map types
A map is an unordered group of elements.
For statements
The iteration order over maps is not specified and is not guaranteed
to be the same from one iteration to the next.
You can't. A Go map is not ordered. A Go map is a hash map (hash table).
To get the map contents in order (or reverse order), read the map contents into a slice and sort it.
Hash table - Wikipedia
A hash table uses a hash function to compute an index into an array of
buckets or slots, from which the desired value can be found.
Drawbacks
The entries stored in a hash table can be enumerated efficiently (at
constant cost per entry), but only in some pseudo-random order.
Therefore, there is no efficient way to locate an entry whose key is
nearest to a given key. Listing all n entries in some specific order
generally requires a separate sorting step, whose cost is proportional
to log(n) per entry.
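The pattern the answer describes (copy the keys out, sort them, then index back into the map) looks much the same in any language with a hash map. Here is a rough sketch of it in C++ terms with std::unordered_map; in Go you would collect the keys into a []string and use sort.Strings, or sort.Sort with sort.Reverse for descending order:
#include <algorithm>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    std::unordered_map<std::string, int> mapA{{"cat", 5}, {"dog", 2}, {"fish", 3}};

    // Copy the keys into a vector, sort them (descending here), then look
    // the values back up in the map in that order.
    std::vector<std::string> keys;
    for (const auto& kv : mapA) keys.push_back(kv.first);
    std::sort(keys.begin(), keys.end(), std::greater<std::string>());

    for (const auto& k : keys)
        std::cout << k << ':' << mapA[k] << ' ';   // fish:3 dog:2 cat:5
    std::cout << '\n';
}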

Ruby- delete a value from sorted (unique) array at O(log n) runtime

I have a sorted array (unique values, not duplicated).
I know I can use Array#bsearch, but it's used to find values, not delete them.
Can I delete a value at O(log n) as well? How?
Lets say I have this array:
arr = [-3, 4, 7, 12, 15, 20] #very long array
And I would like to delete the value 7.
So far I have this:
arr.delete(7) #I'm quite sure it's O(n)
Assuming Array#delete_at works at O(1).
I could do arr.delete_at(value_index)
Now I just need to get the value's index.
Binary search can do it, since the array is already sorted.
But the only method utilizing the sorted property (that I know of) is binary search, which returns values; there is nothing about deleting or returning indexes.
To sum it up:
1) How do I delete a value from a sorted, duplicate-free array in O(log n)?
Or
2) Assuming Array#delete_at works at O(1) (does it?), how can I get the value's index at O(log n)? (I mean, the array is already sorted; must I implement the search myself?)
Thank you.
The standard Array implementation has no constraints on sorting or duplicates. Therefore, the default implementation has to trade performance for flexibility.
Array#delete deletes an element in O(n). Here's the C implementation. Notice the loop
for (i1 = i2 = 0; i1 < RARRAY_LEN(ary); i1++) {
...
}
The cost is justified by the fact that Ruby has to scan all the items matching the given value (note that delete removes all the entries matching a value, not just the first), then shift the following items to compact the array.
delete_at has the same cost. It deletes the element at the given index, but then it uses memmove to shift the remaining entries down by one position in the array.
Using a binary search will not change the overall cost. The search will cost you O(log n), but you then need to delete the element at the found index. In the worst case, when the element is at position [0], the cost of shifting all the other items in memory by one position is O(n).
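For concreteness, here is the "binary search, then delete" sequence sketched in C++ on a plain contiguous array (Ruby's bsearch_index and delete_at play the analogous roles): the search is logarithmic, but the erase still shifts every later element.
#include <algorithm>
#include <vector>

std::vector<int> arr{-3, 4, 7, 12, 15, 20};

void delete_sorted(int value) {
    // O(log n): binary search for the position of 'value'.
    auto it = std::lower_bound(arr.begin(), arr.end(), value);
    // O(n): erasing still shifts every later element down by one slot.
    if (it != arr.end() && *it == value)
        arr.erase(it);
}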
In all cases, the cost is O(n). This is not unexpected. Ruby's default Array is backed by a plain contiguous array, and, as said before, there are no specific constraints that could be used to optimize operations. Easy iteration and manipulation of the collection is the priority.
Array, sorted array, list and sorted list: all these data structures are flexible, but you pay the cost in some specific operations.
Back to your question: if you care about performance, and your array is sorted and unique, you can definitely take advantage of it. If your primary goal is finding and deleting items from your collection, there are better data structures. For instance, you can create a custom class that stores your data internally in a d-ary heap, where delete() costs O(log_d n); the same applies if you use a binomial heap.
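As one concrete illustration of such a structure, here is a minimal C++ sketch using std::set, a balanced red-black tree rather than one of the heaps mentioned above; erasing by value is O(log n) because the tree only relinks a few nodes instead of shifting a contiguous block:
#include <iostream>
#include <set>

int main() {
    std::set<int> s{-3, 4, 7, 12, 15, 20};  // stays sorted, keys unique
    s.erase(7);                             // O(log n): locate and unlink a tree node
    for (int v : s) std::cout << v << ' ';  // -3 4 12 15 20
    std::cout << '\n';
}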

Hashtable and the bucket array

I read that in a hash table we have a bucket array, but I don't understand what that bucket array contains.
Does it contain the hash index? The entry (key/value pair)? Both?
This image, for me, is not very clear:
(reference)
So, what exactly is the bucket array?
The array index is mostly equivalent to the hash value (well, the hash value mod the size of the array), so there's no need to store that in the array at all.
As to what the actual array contains, there are a few options:
If we use separate chaining:
A reference to a linked-list of all the elements that have that hash value. So:
LinkedList<E>[]
A linked-list node (i.e. the head of the linked-list) - similar to the first option, but we instead just start off with the linked-list straight away without wasting space by having a separate reference to it. So:
LinkedListNode<E>[]
If we use open addressing, we're simply storing the actual element. If there's another element with the same hash value, we use some reproducible technique to find a place for it (e.g. we just try the next position). So:
E[]
There may be a few other options, but the above are the best known, with separate chaining being the most popular (to my knowledge).
* I'm assuming some familiarity with generics and Java/C#/C++ syntax - E here is simply the type of the element we're storing, LinkedList<E> means a LinkedList storing elements of type E. X[] is an array containing elements of type X.
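A rough C++ sketch of the separate-chaining layout described in the first option (the names Entry, Bucket, insert and find are purely illustrative):
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Each bucket is a singly linked chain of key-value pairs; the bucket array
// is simply a vector of those chains.
using Entry  = std::pair<std::string, int>;
using Bucket = std::forward_list<Entry>;

std::vector<Bucket> buckets(16);

void insert(const std::string& key, int value) {
    std::size_t b = std::hash<std::string>{}(key) % buckets.size();
    buckets[b].push_front({key, value});   // which chain depends only on the hash
}

const int* find(const std::string& key) {
    std::size_t b = std::hash<std::string>{}(key) % buckets.size();
    for (const Entry& e : buckets[b])      // scan only this bucket's chain
        if (e.first == key) return &e.second;
    return nullptr;
}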
What goes into the bucket array depends a lot on what is stored in the hash table, and also on the collision resolution strategy.
When you use linear probing or another open addressing technique, your bucket table stores keys or key-value pairs, depending on the use of your hash table *.
When you use a separate chaining technique, then your bucket array stores pairs of keys and the headers of your chaining structure (e.g. linked lists).
The important thing to remember about the bucket array is that it establishes a mapping between a hash code and a group of zero or more keys. In other words, given a hash code and a bucket array, you can find out, in constant time, what are the possible keys associated with this hash code (enumerating the candidate keys may be linear, but finding the first one needs to be constant time in order to meet hash tables' performance guarantee of amortized constant time insertions and constant-time searches on average).
* If your hash table is used for checking membership (i.e. it represents a set of keys), then the bucket array stores keys; otherwise, it stores key-value pairs.
In practice, it is a linked list of the entries that have been computed (by hashing the key) to go into that bucket.
In a hash table there are, most of the time, collisions: different elements end up with the same hash value. Elements with the same hash value are stored in one bucket, so for each hash value you have a bucket containing all the elements that have that hash value.
A bucket is a linked list of key-value pairs. The hash index tells you "which bucket", and the "key" in the key-value pair tells you "which entry in that bucket".
Also check out hashing in Java -- structure & access time, where I've given more details.

Programming : find the first unique string in a file in just 1 pass

Given a very long list of product names, find the first product name that is unique (occurs exactly once). You can only iterate over the file once.
I am thinking of using a hashmap and storing the (key, count) pairs in a doubly linked list -- basically a linked hashmap.
Can anyone optimize this or give a better approach?
Since you can only iterate the list once, you have to store
each string that occurs exactly once, because it could be the output
their relative position within the list
each string that occurs more than once (or their hash, if you're not afraid)
Notably, you don't have to store the relative positions of strings that occur more than once.
You need
efficient storage of the set of strings. A hash set is a good candidate, but a trie could offer better compression depending on the set of strings.
efficient lookup by value. This rules out a bare list. A hash-set is the clear winner, but a trie also performs well. You can store the leaves of the trie in a hash set.
efficient lookup of the minimum. This asks for a linked list.
Conclusion:
Use a linked hash-set for the set of strings, and a flag indicating if they're unique. If you're fighting for memory, use a linked trie. If a linked trie is too slow, store the trie leaves in a hash map for look-up. Include only the unique strings in the linked list.
In total, your nodes could look like: Node:{Node[] trieEdges, Node trieParent, String inEdge, Node nextUnique, Node prevUnique}; Node firstUnique, Node[] hashMap
If you strive for ease of implementation, you can have two hash-sets instead (one linked).
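For the simpler variant, a minimal single-pass C++ sketch might look like this (firstUnique and the container names are illustrative; the hash map stores, for each string seen, its position in the linked list of still-unique strings):
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the first string that occurs exactly once, or "" if none does.
std::string firstUnique(const std::vector<std::string>& names) {
    std::list<std::string> uniques;  // still-unique strings, in first-seen order
    std::unordered_map<std::string, std::list<std::string>::iterator> seen;
    for (const std::string& s : names) {
        auto it = seen.find(s);
        if (it == seen.end()) {
            // First occurrence: remember where it sits in the "unique" list.
            seen.emplace(s, uniques.insert(uniques.end(), s));
        } else if (it->second != uniques.end()) {
            // Second occurrence: it is no longer unique; drop it from the list.
            uniques.erase(it->second);
            it->second = uniques.end();  // sentinel meaning "seen more than once"
        }
    }
    return uniques.empty() ? std::string{} : uniques.front();
}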
The following algorithm solves it in O(N + M) time, where N = number of strings and M = total number of characters across all strings.
The steps are as follows:
1. Create a hash value for each string.
2. XOR all the hash values together and find the one which didn't have a pair.
XOR has the useful property that a xor a = 0 and b xor 0 = b.
Tips to generate the hash value for a string:
Use a base-27 number system: give 'a' the value 1, 'b' the value 2, and so on up to 'z', which gets 26. So if the string is "abc", we compute the hash value as:
H = 3*(27^0) + 2*(27^1) + 1*(27^2) = 786
You could use the modulus operator to make the hash values small enough to fit in 32-bit integers. If you do that, keep an eye out for collisions, which are two different strings that get the same hash value because of the modulus operation.
Mostly I guess you won't be needing it.
So compute the hash for each string, then start from the first hash and keep XOR-ing; the result will hold the hash value of the string which didn't have a pair.
Caution: this is useful only when strings occur in pairs. Still, it is a good idea to start with, which is why I answered it.
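A rough sketch of this approach in C++ (hash27 is an illustrative name; it assumes lowercase ASCII input and lets the 64-bit unsigned arithmetic wrap instead of applying an explicit modulus):
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Base-27 hash as described above: 'a' -> 1, ..., 'z' -> 26, with the last
// character multiplied by 27^0, the one before it by 27^1, and so on.
std::uint64_t hash27(const std::string& s) {
    std::uint64_t h = 0, pow = 1;
    for (auto it = s.rbegin(); it != s.rend(); ++it) {
        h += static_cast<std::uint64_t>(*it - 'a' + 1) * pow;
        pow *= 27;
    }
    return h;
}

int main() {
    // Caution, as noted above: this only identifies the unpaired string when
    // every other string occurs exactly twice.
    std::vector<std::string> names{"cat", "dog", "cat", "fish", "dog"};
    std::uint64_t x = 0;
    for (const auto& n : names) x ^= hash27(n);
    std::cout << "unpaired hash: " << x
              << " (hash27(\"fish\") = " << hash27("fish") << ")\n";
}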
Using a linked hashmap is obvious enough. Otherwise, you could use a TreeMap-style data structure where the strings are ordered by count. So as soon as you are done reading the input, the root of your tree is unique if a unique string exists. Unlike a linked hashmap, insertion takes at most O(log n) as opposed to O(n). You can read up on TreeMaps for insight on how to augment a basic TreeMap into what you need. Also, in your linked hashmap you may have to traverse O(n) entries to find your first unique key. With a TreeMap-style data structure, the lookup is O(1) -- the root. Even if more unique keys exist, the first one you encountered will be the root. The subsequent ones will be children of the root.

Hashtable with doubly linked lists?

Introduction to Algorithms (CLRS) states that a hash table using doubly linked lists is able to delete items more quickly than one with singly linked lists. Can anybody tell me what the advantage is of using doubly linked lists instead of singly linked lists for deletion in a hashtable implementation?
The confusion here is due to the notation in CLRS. To be consistent with the question, I use the CLRS notation in this answer.
We use the hash table to store key-value pairs. The value portion is not mentioned in the CLRS pseudocode, while the key portion is defined as k.
In my copy of CLR (I am working off of the first edition here), the routines listed for hashes with chaining are insert, search, and delete (with more verbose names in the book). The insert and delete routines take argument x, which is the linked list element associated with key key[x]. The search routine takes argument k, which is the key portion of a key-value pair. I believe the confusion is that you have interpreted the delete routine as taking a key, rather than a linked list element.
Since x is a linked list element, having it alone is sufficient to do an O(1) deletion from the linked list in the h(key[x]) slot of the hash table, if it is a doubly-linked list. If, however, it is a singly-linked list, having x is not sufficient. In that case, you need to start at the head of the linked list in slot h(key[x]) of the table and traverse the list until you finally hit x to get its predecessor. Only when you have the predecessor of x can the deletion be done, which is why the book states the singly-linked case leads to the same running times for search and delete.
Additional Discussion
Although CLRS says that you can do the deletion in O(1) time, assuming a doubly-linked list, it also requires you have x when calling delete. The point is this: they defined the search routine to return an element x. That search is not constant time for an arbitrary key k. Once you get x from the search routine, you avoid incurring the cost of another search in the call to delete when using doubly-linked lists.
The pseudocode routines are lower level than you would use if presenting a hash table interface to a user. For instance, a delete routine that takes a key k as an argument is missing. If that delete is exposed to the user, you would probably just stick to singly-linked lists and have a special version of search to find the x associated with k and its predecessor element all at once.
Unfortunately my copy of CLRS is in another country right now, so I can't use it as a reference. However, here's what I think it is saying:
Basically, a doubly linked list supports O(1) deletions because if you know the address of the item, you can just do something like:
x.left.right = x.right;
x.right.left = x.left;
to delete the object from the linked list, whereas in a singly linked list, even if you have the address, you need to search through the list to find its predecessor in order to do:
pred.next = x.next
So, when you delete an item from the hash table, you look it up, which is O(1) due to the properties of hash tables, then delete it in O(1), since you now have the address.
If this was a singly linked list, you would need to find the predecessor of the object you wish to delete, which would take O(n).
However:
I am also slightly confused about this assertion in the case of chained hash tables, because of how lookup works. In a chained hash table, if there is a collision, you already need to walk through the linked list of values in order to find the item you want, and thus would need to also find its predecessor.
But, the way the statement is phrased gives clarification: "If the hash table supports deletion, then its linked lists should be doubly linked so that we can delete an item quickly. If the lists were only singly linked, then to delete element x, we would first have to find x in the list T[h(x.key)] so that we could update the next attribute of x’s predecessor."
This is saying that you already have element x, which means you can delete it in the above manner. If you were using a singly linked list, even if you had element x already, you would still have to find its predecessor in order to delete it.
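A minimal C++ sketch of that point (the Node layout and table are illustrative, not CLRS's pseudocode): once you hold a pointer to x, unlinking it from a doubly linked chain touches only its neighbours, with no traversal.
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Node {
    std::string key;
    int value;
    Node* prev = nullptr;   // previous node in this bucket's chain
    Node* next = nullptr;   // next node in this bucket's chain
};

std::vector<Node*> table(128);   // heads of the per-bucket chains

std::size_t slot(const std::string& key) {
    return std::hash<std::string>{}(key) % table.size();
}

void insert_front(Node* x) {     // O(1) insertion at the head of x's chain
    std::size_t b = slot(x->key);
    x->prev = nullptr;
    x->next = table[b];
    if (table[b]) table[b]->prev = x;
    table[b] = x;
}

void erase(Node* x) {            // O(1): we already hold x, no traversal needed
    if (x->prev) x->prev->next = x->next;
    else         table[slot(x->key)] = x->next;   // x was the chain head
    if (x->next) x->next->prev = x->prev;
    delete x;
}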
I can think of one reason, but this isn't a very good one. Suppose we have a hash table of size 100. Now suppose values A and G are each added to the table. Maybe A hashes to slot 75. Now suppose G also hashes to 75, and our collision resolution policy is to jump forward by a constant step size of 80. So we try to jump to (75 + 80) % 100 = 55. Now, instead of starting at the front of the list and traversing forward 85, we could start at the current node and traverse backwards 20, which is faster. When we get to the node that G is at, we can mark it as a tombstone to delete it.
Still, I recommend using arrays when implementing hash tables.
A hashtable is often implemented as a vector of lists, where the index in the vector is the key (hash).
If you don't have more than one value per key and you are not interested in any logic regarding those values, a singly linked list is enough. A more complex/specific design for selecting one of the values may require a doubly linked list.
Let's design the data structures for a caching proxy. We need a map from URLs to content; let's use a hash table. We also need a way to find pages to evict; let's use a FIFO queue to track the order in which URLs were last accessed, so that we can implement LRU eviction. In C, the data structure could look something like
struct node {
    struct node *queueprev, *queuenext;
    struct node **hashbucketprev, *hashbucketnext;
    const char *url;
    const void *content;
    size_t contentlength;
};
struct node *queuehead;   /* circular doubly-linked list */
struct node **hashbucket;
One subtlety: to avoid a special case and wasting space in the hash buckets, x->hashbucketprev points to the pointer that points to x. If x is first in the bucket, it points into hashbucket; otherwise, it points into another node. We can remove x from its bucket with
if (x->hashbucketnext)   /* x may be the last node in its bucket */
    x->hashbucketnext->hashbucketprev = x->hashbucketprev;
*(x->hashbucketprev) = x->hashbucketnext;
When evicting, we iterate over the least recently accessed nodes via the queuehead pointer. Without hashbucketprev, we would need to hash each node and find its predecessor with a linear search, since we did not reach it via hashbucketnext. (Whether that's really bad is debatable, given that the hash should be cheap and the chain should be short. I suspect that the comment you're asking about was basically a throwaway.)
If the items in your hashtable are stored in "intrusive" lists, they can be aware of the linked list they are a member of. Thus, if the intrusive list is also doubly-linked, items can be quickly removed from the table.
(Note, though, that the "intrusiveness" can be seen as a violation of abstraction principles...)
An example: in an object-oriented context, an intrusive list might require all items to be derived from a base class.
class BaseListItem {
    BaseListItem *prev, *next;
    ...
public: // list operations
    void insertAfter(BaseListItem*);
    void insertBefore(BaseListItem*);
    void removeFromList();
};
The performance advantage is that any item can be quickly removed from its doubly-linked list without locating or traversing the rest of the list.
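A rough sketch of what that removal looks like, assuming the removeFromList() declared in the class above and null-terminated list ends (illustrative only, not from any particular library):
void BaseListItem::removeFromList() {
    if (prev) prev->next = next;   // splice the neighbours around this item
    if (next) next->prev = prev;
    prev = next = nullptr;         // this item is now detached
}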
