Is there a data structure that combines a queue and a hashmap?

Is there a data structure that combines a queue and a hashmap?
In addition to the FIFO (enqueue/dequeue) behaviour that a queue normally has, I want the following:
when enqueuing, always enqueue with a key;
when peeking without a key, return the head of the queue;
when peeking with a key, return the first element enqueued with that key;
when dequeuing without a key, remove the first element ever enqueued;
when dequeuing with a key, remove all elements having that key.
I wonder if such a data structure already exists in the wild.

No, there is not. But you can combine the two to achieve the behavior you want (though you will have to make trade-offs along the way).
To do so, you will store:
A HashMap where the values are references to items in the queue: HashMap<Key, ReferenceToFIFOElement> or HashMap<Key, Set<ReferenceToFIFOElement>>.
An actual FIFO queue: FIFO<Item>
When you enqueue, you first add your element at the back of the queue. Then you update the hashmap with a reference to the newly created element if the key was not registered yet (or, in the set case, add the reference to the bucket mapped to the given key).
Peeking is easy: look the key up in the hashmap and access the referenced item (the first referenced item in the set case), or simply the head of the queue if no key was provided.
Dequeuing is where the real trade-off takes place:
If you only store a reference to the first item inserted with a given key, then you will have to iterate over the whole queue, starting from that item, to find the others. This means a higher overall time complexity.
If you store all the references to items with a given key (using a set), then you just iterate over that set and remove the referenced elements from the queue. This increases the space complexity of the data structure.
However, in practice it can be more complicated, depending on the data structure you choose to put under the hood of the FIFO:
Array list: cache-friendly, with random access... but it can require reallocation as you insert/delete elements, which invalidates references, so store indices instead of actual references.
Linked list: not cache-friendly, but insertion and deletion are guaranteed to be O(1) once you hold a reference to the node.
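For concreteness, here is a minimal sketch of the set-based variant in Python (the class and method names are illustrative, not an existing library's API). A doubly linked list provides the FIFO order, and a dict maps each key to its nodes in insertion order, so a keyed dequeue removes all matching elements without scanning the queue:

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

class KeyedQueue:
    def __init__(self):
        # Sentinel nodes: _head.next is the front, _tail.prev the back.
        self._head, self._tail = _Node(), _Node()
        self._head.next, self._tail.prev = self._tail, self._head
        self._by_key = {}                    # key -> [nodes, oldest first]

    def enqueue(self, key, value):
        node = _Node()
        node.key, node.value = key, value
        last = self._tail.prev
        last.next = node
        node.prev, node.next = last, self._tail
        self._tail.prev = node
        self._by_key.setdefault(key, []).append(node)

    def peek(self, key=None):
        if key is None:
            if self._head.next is self._tail:
                raise IndexError("peek from empty queue")
            return self._head.next.value
        return self._by_key[key][0].value    # KeyError if key absent

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def dequeue(self, key=None):
        if key is None:
            node = self._head.next
            if node is self._tail:
                raise IndexError("dequeue from empty queue")
            self._unlink(node)
            bucket = self._by_key[node.key]
            bucket.pop(0)                    # the head is the oldest of its key
            if not bucket:
                del self._by_key[node.key]
            return node.value
        # Keyed dequeue: remove *all* elements carrying this key.
        nodes = self._by_key.pop(key)        # KeyError if key absent
        for node in nodes:
            self._unlink(node)
        return [n.value for n in nodes]
```

This is the space-for-time side of the trade-off: every node is referenced twice (once by the list, once by its key bucket), but both keyed operations avoid a queue scan.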

Related

Best statically allocated data structure for writing and extending contiguous blocks of data?

Here's what I want to do:
I have an arbitrary number of values of different kinds: string, int, float, bool, etc. that I need to store somehow. Multiple elements are often written and read as a whole, forming "contiguous blocks" that can also be extended and shortened at the user's wish, and even elements in the middle may be taken out. Also, the whole thing should be statically allocated.
I was thinking about using some kind of statically allocated forward list. The way I imagine this working is to define an array of a struct containing one std::variant field and a "previous head" field which always points to the location of the previous head of the list. A new element is always placed at the globally known "head", which it stores in its "previous head" field. This way I can keep track of holes inside my list, because once an element is taken out, its location is written to the global head and will be filled by subsequent inserts.
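For illustration, a minimal sketch of that free-list bookkeeping (Python stands in for C++ here, and the names are mine; a real implementation would use std::variant for the value slot). Unlinking a removed element from the occupied chain is omitted for brevity:

```python
CAPACITY = 8                       # statically allocated, fixed up front

class Slot:
    __slots__ = ("value", "prev_head")
    def __init__(self):
        self.value = None          # stands in for std::variant<...>
        self.prev_head = -1

slots = [Slot() for _ in range(CAPACITY)]
for i in range(CAPACITY):          # chain all slots into the free list
    slots[i].prev_head = i + 1 if i + 1 < CAPACITY else -1

free_head = 0                      # next hole (or fresh slot) to fill
list_head = -1                     # head of the occupied forward list

def insert(value):
    """Place a new element at the global head, remembering the old head."""
    global free_head, list_head
    if free_head == -1:
        raise MemoryError("static storage exhausted")
    idx = free_head
    free_head = slots[idx].prev_head     # advance the free list
    slots[idx].value = value
    slots[idx].prev_head = list_head     # link back to the previous head
    list_head = idx
    return idx

def remove(idx):
    """Free a slot; its location becomes the next insertion point."""
    global free_head
    slots[idx].value = None
    slots[idx].prev_head = free_head
    free_head = idx
```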
This approach, however, has downsides: when a "contiguous block" is extended, it may happen that elements of other blocks have already queued up in the list past its last element. So I either need to move all subsequent entries, or copy over the last element of the block and insert a link object that lets me jump to the new location when traversing the contiguous block.
The priority for optimizing this data structure is the following (by number of use cases):
Initially write contiguous blocks
read the whole data structure
add new elements to contiguous blocks
remove elements from contiguous blocks
At the moment my data structure has time complexity O(1) for writes, O(n) for contiguous reads (with the caveat that, in the worst case, there is a jump to another location inside the array every other element), O(1) for adding new elements, and O(1) for removing elements. However, the space complexity is 2n in the worst case (when I have to do a jump every second element, since the slot that would store data is lost to the "link").
What I'm wondering now is: is the described way the best viable approach to accomplish what I'm trying to do, or is there a better data structure? Is there an official name for this data structure?

How to implement dynamic indexes?

I know the title may be a little confusing; however, my actual question is basic, I think.
I'm working on a brand-new LRU implementation, for which I use an Index Table that maps the name of an incoming packet to the index where the packet's content is stored in the CS.
Each incoming packet is stored in the CS and can be addressed through the Index Table.
Now suppose a new packet arrives: per LRU, its index must be set to the top of the CS (zero), and as a result all the other indexes need to be updated, i.e. incremented.
One obvious solution is to loop over all entries in the Index Table and increment them.
Is there any solution or data structure that is used for such a problem?
I don't see how you are establishing the order of your cache from the description. But to answer your question, it is possible to reduce the LRU store operation to O(1) time complexity.
The classical way to do it is to have these two data structures:
Doubly linked list: maintains the order in the cache. Each node stores a data element (it plays the role of your content store).
HashMap that associates each key with a pointer to the corresponding node in the linked list (it plays the role of your index table).
So when you access data already stored in your cache, it must move to the top of the list: you delete the corresponding node from the linked list (in O(1) time, because you have access to its previous and next nodes) and re-insert it at the head.
For new data it is simpler: just store it at the head of the list and put the (key, value) pair in the hashmap.
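A minimal sketch of that design in Python (names are illustrative; eviction at capacity is added for completeness, since that is the usual LRU behavior):

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                      # key -> node (the index table)
        # Sentinels: head.next is most recent, tail.prev least recent.
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map[key]               # KeyError on a miss
        self._unlink(node)                 # O(1): we hold prev/next pointers
        self._push_front(node)             # move to most-recent position
        return node.value

    def put(self, key, value):
        if key in self.map:
            node = self.map[key]
            node.value = value
            self._unlink(node)
        else:
            if len(self.map) >= self.capacity:
                lru = self.tail.prev       # least recently used
                self._unlink(lru)
                del self.map[lru.key]
            node = _Node()
            node.key, node.value = key, value
            self.map[key] = node
        self._push_front(node)
```

Note that nothing is ever renumbered: recency is encoded in list position, so "moving to the top" is two pointer splices instead of incrementing every index.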

Heap-like data structure with fast random access?

My situation is the following:
I have a collection of entities, each of which has a "goodness" property.
I wish to grab the entities one at a time, from "best" to "worst."
After a "best" entity is grabbed, the "goodness" properties of several (relatively few) of my other entities change, and this change must be incorporated into my upcoming decision of the next "best" entity to grab.
Some (relatively few) entities may become "worthless" after a grab, and these should be removed from my collection.
It is easy for me to construct, given the entity that I just grabbed, the set of now-"dirty" objects, that is, the set of entities which potentially have a now-different "goodness," or have become "worthless."
So, I need a data structure that allows me to:
Quickly grab the "biggest" of a collection (as in, a max-heap).
Quickly update the underlying ordering of the objects in my collection to accommodate the situation described above. (Easy to do in a heap, if we can access the dirty objects' locations, e.g. array indices, within the underlying heap implementation.)
There is a guarantee that there are no collisions among the entries of my collection. (The entries are references to the entities I described above.)
The idea I have is to use a max-heap together with an unordered map, keyed on the heap entries, and having values equal to, e.g., the objects' respective indices in the underlying array in the heap implementation.
What I'm wondering is whether there may be a data structure which is better for this situation.
If few members are affected when the best entity is grabbed, then you might be able to improve the runtime by using a linked list sorted by goodness and an unordered map (each initially holding the full set of entities), plus a max-heap. After removing the best entity from the end of the linked list, you use the map to locate the affected entities, removing them from the list and adding the non-worthless ones to the max-heap. Thereafter, the next best entity is the greater of the entity at the end of the list and the max entity in the heap. The advantage of this setup is that removal from the linked list is a constant-time operation, while insertion into the max-heap is a relatively small (compared to the total number of entities) log-time operation.
Because entities' values can only get worse, you can remove them from the linked list lazily: if an item has become worthless, remove it, and if its value has merely changed, flag it as "changed". Check the "changed" flag on the entity at the end of the linked list, and if it is set, remove the entity and add it to the max-heap. The advantage of lazy updates is that you usually won't need to update items that are already in the heap (you'll just need to update the values of items in the linked list), and if an item is changed and later becomes worthless, you can remove it from the linked list without ever adding it to the heap.
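For reference, here is a sketch of the indexed max-heap the question itself proposes (Python; names are illustrative): a binary heap of (goodness, entity) pairs plus a hashmap from each entity to its current array index, giving O(log n) update and removal for the dirty set. Entities must be hashable and unique, which the question guarantees:

```python
class IndexedMaxHeap:
    def __init__(self):
        self._heap = []                    # (goodness, entity) pairs
        self._pos = {}                     # entity -> index into _heap

    def push(self, entity, goodness):
        self._heap.append((goodness, entity))
        self._pos[entity] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def pop_max(self):
        goodness, entity = self._heap[0]   # IndexError if empty
        self._remove_at(0)
        return entity, goodness

    def update(self, entity, goodness):
        """Re-score a dirty entity, restoring heap order in O(log n)."""
        i = self._pos[entity]
        old = self._heap[i][0]
        self._heap[i] = (goodness, entity)
        if goodness > old:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def remove(self, entity):
        """Drop a now-worthless entity in O(log n)."""
        self._remove_at(self._pos[entity])

    def _remove_at(self, i):
        del self._pos[self._heap[i][1]]
        last = self._heap.pop()
        if i < len(self._heap):            # the removed item wasn't last
            self._heap[i] = last
            self._pos[last[1]] = i
            self._sift_up(i)               # the filler may move up...
            self._sift_down(self._pos[last[1]])   # ...or down

    def _swap(self, i, j):
        self._heap[i], self._heap[j] = self._heap[j], self._heap[i]
        self._pos[self._heap[i][1]] = i
        self._pos[self._heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self._heap[i][0] > self._heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            best, left = i, 2 * i + 1
            for c in (left, left + 1):
                if c < n and self._heap[c][0] > self._heap[best][0]:
                    best = c
            if best == i:
                return
            self._swap(i, best)
            i = best
```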

Data Structure, independent of volume of data in it

Is there any data structure in which locating an item is independent of the volume of data in it?
"locating a data is independent of volume of data in it" - I assume this means O(1) for get operations. That would be a hash map.
This presumes that you fetch the object based on the hash.
If you have to check each element to see if an attribute matches a particular value, like your rson or ern or any other parts of it, then you have to make that value the key up front.
If you have several values that you need to search on - all of the must be unique and immutable - you can create several maps, one for each value. That lets you search on more than one. But they have to all be unique, immutable, and known up front.
If you don't establish the key up front it's O(N), which means you have to check every element in turn until you find what you want. On average, this time will increase as the size of the collection grows. That's what O(N) means.
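A small Python illustration of the several-maps idea (the rson/ern field names are borrowed from the question's context and the data is made up):

```python
records = [
    {"rson": "a1", "ern": 101, "payload": "first"},
    {"rson": "b2", "ern": 202, "payload": "second"},
]

# One map per unique, immutable attribute, all pointing at the same records.
by_rson = {r["rson"]: r for r in records}    # O(1) lookup by rson
by_ern = {r["ern"]: r for r in records}      # O(1) lookup by ern

assert by_rson["b2"] is by_ern[202]          # same record, two indexes
```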

Best way to remove an entry from a hash table

What is the best way to remove an entry from a hash table that uses linear probing? One way would be to use a flag to mark deleted elements. Are there any ways better than this?
An easy technique is to:
Find and remove the desired element
Go to the next bucket
If the bucket is empty, quit
If the bucket is occupied, remove the element in that bucket and re-add it to the hash table using the normal means. The element must be removed before re-adding, because it is likely to hash back into its original spot.
Repeat from step 2.
This technique keeps your table tidy at the expense of slightly slower deletions.
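A minimal sketch of this delete-and-re-add technique in Python, using a fixed-size list as an open-addressed table with linear probing (the layout and helper names are illustrative, and the table is assumed never to become completely full):

```python
TABLE_SIZE = 11
table = [None] * TABLE_SIZE              # slot: None or a (key, value) pair

def insert(key, value):
    i = hash(key) % TABLE_SIZE
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % TABLE_SIZE         # linear probing
    table[i] = (key, value)

def delete(key):
    i = hash(key) % TABLE_SIZE
    while table[i] is not None:
        if table[i][0] == key:
            table[i] = None              # step 1: remove the element
            # Steps 2-5: walk the cluster that follows, re-adding each
            # entry so nothing becomes unreachable across the new hole.
            j = (i + 1) % TABLE_SIZE
            while table[j] is not None:
                k, v = table[j]
                table[j] = None          # remove before re-adding
                insert(k, v)
                j = (j + 1) % TABLE_SIZE
            return True
        i = (i + 1) % TABLE_SIZE
    return False                         # key was not in the table
```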
It depends on how you handle overflow, and on whether (1) the item being removed is in an overflow slot or not, and (2) if there are overflow items beyond the item being removed, whether they have the hash key of the removed item or possibly some other hash key. [Overlooking that double condition is a common source of bugs in deletion implementations.]
If collisions overflow into a linked list, it is pretty easy. You are either popping the head of the list (which may then become empty) or deleting a member from the middle or end of the linked list. Those cases are fun and not particularly difficult. There can be further optimizations to avoid excessive memory allocation and freeing, to make this even more efficient.
For linear probing, Knuth suggests that a simple approach is to have a way to mark a slot as empty, deleted, or occupied. Mark a removed occupant slot as deleted so that overflow by linear probing will skip past it, but if an insertion is needed, you can fill the first deleted slot that you passed over [The Art of Computer Programming, vol.3: Sorting and Searching, section 6.4 Hashing, p. 533 (ed.2)]. This assumes that deletions are rather rare.
Knuth gives a nice refinement as Algorithm R [section 6.4, pp. 533-534] that instead marks the cell as empty rather than deleted, and then finds ways to move table entries back closer to their initial-probe location by moving the hole that was just made until it ends up next to another hole.
Knuth cautions that this moves existing, still-occupied slot entries and is not a good idea if pointers to the slots are being held outside of the hash table. [If the slots hold garbage-collected or other managed references, it is fine to move a slot, since it is the reference that is used outside the table, and it doesn't matter where the slot referencing the same object sits in the table.]
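For comparison, here is a sketch of the simple marker scheme described above (empty/deleted/occupied), not of Algorithm R itself. Names are illustrative, and the table is assumed to always keep at least one truly empty slot:

```python
EMPTY, DELETED = object(), object()
TABLE_SIZE = 11
table = [EMPTY] * TABLE_SIZE         # slot: EMPTY, DELETED, or (key, value)

def insert(key, value):
    i = hash(key) % TABLE_SIZE
    first_deleted = None
    while table[i] is not EMPTY:
        if table[i] is DELETED:
            if first_deleted is None:
                first_deleted = i    # remember it, keep probing for the key
        elif table[i][0] == key:
            table[i] = (key, value)  # key already present: update in place
            return
        i = (i + 1) % TABLE_SIZE
    # Reuse the first deleted slot passed over, if any.
    table[first_deleted if first_deleted is not None else i] = (key, value)

def lookup(key):
    i = hash(key) % TABLE_SIZE
    while table[i] is not EMPTY:     # DELETED does not stop the probe
        if table[i] is not DELETED and table[i][0] == key:
            return table[i][1]
        i = (i + 1) % TABLE_SIZE
    raise KeyError(key)

def delete(key):
    i = hash(key) % TABLE_SIZE
    while table[i] is not EMPTY:
        if table[i] is not DELETED and table[i][0] == key:
            table[i] = DELETED       # tombstone, not EMPTY
            return
        i = (i + 1) % TABLE_SIZE
    raise KeyError(key)
```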
The Python hash table implementation (arguably very fast) uses dummy elements to mark deletions. As you grow or shrink the table (assuming you're not doing a fixed-size table), you can drop the dummies at the same time.
If you have access to a copy, have a look at the article in Beautiful Code about the implementation.
The best general solutions I can think of include:
If you can use a non-const iterator (à la the C++ STL or Java), you should be able to remove elements as you encounter them. Presumably, though, you wouldn't be asking this question unless you're using a const iterator or an enumerator that would be invalidated if the underlying collection is modified.
As you said, you could mark a deleted flag within the contained object. This doesn't release any memory or reduce collisions on the key, though, so it's not the best solution. It also requires adding a property to the class that probably doesn't really belong there. If this bothers you as much as it would me, or if you simply can't add a flag to the stored object (perhaps you don't control the class), you could store these flags in a separate hash table. This approach requires the most long-term memory use.
Push the keys of the to-be-removed items into a vector or array list while traversing the hash table. After releasing the enumerator, loop through this secondary list and remove the keys from the hash table (a small example follows this list). If you have a lot of items to remove and/or the keys are large (which they shouldn't be), this may not be the best solution.
If you're going to end up removing more items from the hash table than you're leaving in there, it may be better to create a new hash table, and as you traverse your original one, add to the new hash table only the items you're going to keep. Then replace your reference(s) to the old hash table with the new one. This saves a secondary list iteration, but it's probably only efficient if the new hash table will have significantly fewer items than the original one, and it definitely only works if you can change all the references to the original hash table, of course.
If your hash table gives you access to its collection of keys, you may be able to iterate through those and remove items from the hash table in one pass.
If your hash table or some helper in your library provides you with predicate-based collection modifiers, you may have a Remove() function to which you can pass a lambda expression or function pointer to identify the items to remove.
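As a tiny illustration of the collect-then-remove option above (Python; the data is made up), collecting first avoids mutating the dict while iterating over it:

```python
scores = {"a": 3, "b": -1, "c": 7, "d": -5}

to_remove = [k for k, v in scores.items() if v < 0]  # pass 1: collect
for k in to_remove:                                  # pass 2: remove
    del scores[k]

print(scores)   # {'a': 3, 'c': 7}
```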
A common technique when time is a factor is to keep a second table of deleted items and clean up the main table when you have time. This is commonly used in search engines.
How about enhancing the hash table to contain pointers, like a linked list?
When you insert, if the bucket is full, create a pointer from this bucket to the bucket where the new entry is stored.
When deleting something from the hash table, the solution is then equivalent to deleting a node from a linked list.
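A sketch of that chaining idea in Python (illustrative names): each bucket holds the head of a singly linked list, and deletion is ordinary linked-list node removal:

```python
class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

SIZE = 8
buckets = [None] * SIZE   # each bucket: head of a linked list, or None

def insert(key, value):
    i = hash(key) % SIZE
    buckets[i] = Node(key, value, buckets[i])  # push onto the chain

def delete(key):
    i = hash(key) % SIZE
    node, prev = buckets[i], None
    while node is not None:
        if node.key == key:
            if prev is None:
                buckets[i] = node.next   # pop the head of the chain
            else:
                prev.next = node.next    # splice out a middle/end node
            return True
        prev, node = node, node.next
    return False
```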
