Scala Update first element of collection - performance

I need a collection that is efficient when prepending and when returning the first element. A stack does the job well.
Now, the elements of the stack are themselves collections. Let's say they are lists, so I have a Stack of Lists.
My question is the following:
If I want to add an element to the list at the head of the Stack, is my only choice to pop the head, add the element to the list, and push the new list?
Efficiency-wise, all of these operations are constant time, correct?

If you're working with immutable collections, the list you add and the outer stack (or List) will be new objects as well, so it doesn't matter. Still, lists are efficient for prepend and head operations (see the Scala collections documentation on performance characteristics for the costs of each operation).
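The question is about Scala, but the shape of the operation can be sketched in Python, with nested pairs standing in for immutable cons cells (all names are illustrative):

# Immutable lists as nested pairs: (head, tail), with None as the empty list.
def cons(x, lst):
    return (x, lst)                    # O(1) prepend; shares lst, copies nothing

# A stack of lists, itself a cons list.
inner = cons(2, cons(3, None))         # the list at the head of the stack
stack = cons(inner, None)

# "Add to the list at the head": pop the head, prepend, push the result.
head, rest = stack                     # O(1) pop
stack = cons(cons(1, head), rest)      # O(1) prepend + O(1) push
# Every step is constant time; the old lists are shared, not copied.

Scala's immutable List behaves the same way: 1 :: head and newHead :: rest are both constant time, and everything not on the rebuilt spine is shared.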

Related

Why Use A Doubly Linked List and HashMap for a LRU Cache Instead of a Deque?

I have implemented the LRU Cache design problem on LeetCode using the conventional method (doubly linked list + hash map). For those unfamiliar with the problem, the implementation looks something like this:
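A minimal Python sketch of that conventional design (illustrative, not the asker's exact code):

class _Node:
    # Doubly-linked-list node holding one cache entry.
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class DllLRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                                # key -> _Node
        self.head, self.tail = _Node(), _Node()      # sentinels: MRU side, LRU side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._unlink(node)                           # O(1) move-to-front on use
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._unlink(self.map.pop(key))
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)
        if len(self.map) > self.capacity:
            lru = self.tail.prev                     # evict the least recently used
            self._unlink(lru)
            del self.map[lru.key]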
I understand why this method is used (quick removal/insertion at both ends, fast access in the middle). What I am failing to understand is why someone would use both a HashMap and a LinkedList when one could simply use an array-based deque (ArrayDeque in Java, simply deque in C++). This deque allows for easy insertion/deletion at both ends and quick access in the middle, which is exactly what you need for an LRU cache. You would also use less space, because you wouldn't need to store a pointer to each node.
Is there a reason why the LRU cache is almost universally designed (in most tutorials, at least) using the HashMap/LinkedList method as opposed to the Deque/ArrayDeque method? Would the HashMap/LinkedList method have any benefits?
When an LRU cache is full, we discard the Least Recently Used item.
If we're discarding items from the front of the queue, then we have to make sure the item at the front is the one that hasn't been used for the longest time.
We ensure this by making sure that an item goes to the back of the queue whenever it is used. The item at the front is then the one that hasn't been moved to the back for the longest time.
To do this, we need to maintain the queue on every put OR get operation:
When we put a new item in the cache, it becomes the most recently used item, so we put it at the back of the queue.
When we get an item that is already in the cache, it becomes the most recently used item, so we move it from its current position to the back of the queue.
Moving items from the middle to the end is not a deque operation and is not part of the Deque interface that ArrayDeque implements. It's also not supported efficiently by the underlying data structure that ArrayDeque uses. Doubly linked lists are used because they do support this operation efficiently.
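As an aside, Python's OrderedDict is exactly this hash-map-plus-doubly-linked-list combination, so the move-to-back operation described above is available directly as move_to_end; a minimal sketch (names are illustrative):

from collections import OrderedDict

class OrderedDictLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()          # insertion order = recency order

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)         # used item goes to the back of the queue
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # discard from the front: least recently used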
The purpose of an LRU cache is to support two operations in O(1) time: get(key) and put(key, value), with the additional constraint that least recently used keys are discarded first. Normally the keys are the parameters of a function call and the value is the cached output of that call.
Regardless of how you approach this problem, we can agree that you MUST use a hash map: you need it to map a key already present in the cache to its value in O(1).
In order to deal with the additional constraint of least recently used keys being discarded first, you can use a LinkedList or an ArrayDeque. However, since we don't actually need to access the middle, a LinkedList is better, since it never needs to resize.
Edit:
Mr. Timmermans discussed in his answer why ArrayDeques cannot be used in an LRU cache, due to the necessity of moving elements from the middle to the end. With that being said, here is an implementation of an LRU cache that successfully submits on LeetCode using only appends and poplefts on the deque. Note that Python's collections.deque is implemented as a doubly linked list; however, we are only using operations on collections.deque that are also O(1) in a circular array, so the algorithm stays the same regardless.
from collections import deque

class LRUCache:
    def __init__(self, capacity: 'int'):
        self.capacity = capacity
        self.hashmap = {}        # key -> [value, number of copies of key in the deque]
        self.deque = deque()

    def get(self, key: 'int') -> 'int':
        res = self.hashmap.get(key, [-1, 0])[0]
        if res != -1:
            self.put(key, res)   # a hit counts as a use: re-append the key
        return res

    def put(self, key: 'int', value: 'int') -> 'None':
        self.add(key, value)
        while len(self.hashmap) > self.capacity:
            self.remove()

    def add(self, key, value):
        if key in self.hashmap:
            self.hashmap[key][1] += 1      # one more (now stale) copy of key in the deque
            self.hashmap[key][0] = value
        else:
            self.hashmap[key] = [value, 1]
        self.deque.append(key)

    def remove(self):
        k = self.deque.popleft()           # oldest queue entry
        self.hashmap[k][1] -= 1
        if self.hashmap[k][1] == 0:        # no copies left: key is truly least recently used
            del self.hashmap[k]
I do agree with Mr. Timmermans that using the LinkedList approach is preferable - but I want to highlight that using an ArrayDeque to build an LRU cache is possible.
The main mixup between myself and Mr. Timmermans is how we interpreted capacity. I took capacity to mean caching the last N get / put requests, while Mr. Timmermans took it to mean caching the last N unique items.
The above code does have a loop in put which slows the code down - but this is just to get the code to conform to caching the last N unique items. If we had the code cache the last N requests instead, we could replace the loop with:
if len(self.deque) > self.capacity: self.remove()
This will make it as fast as, if not faster than, the linked-list variant.
Regardless of how capacity is interpreted, the above method still works as an LRU cache: least recently used elements get discarded first.
I just want to highlight that designing an LRU cache in this manner is possible. The source is right there; try submitting it on LeetCode!
A doubly linked list is a natural implementation of a queue. Because doubly linked lists have immediate access to both the front and the end of the list, they can insert data on either side in O(1), as well as delete data on either side in O(1). Because doubly linked lists can insert data at the end in O(1) time and delete data from the front in O(1) time, they make the perfect underlying data structure for a queue. Queues are lists of items in which data can only be inserted at the end and removed from the beginning.
Queues are an example of an abstract data type, and we are able to use an array to implement them under the hood. But since queues insert at the end and delete from the beginning, arrays are only so good as the underlying data structure. While arrays are O(1) for insertions at the end, they're O(N) for deleting from the beginning. A doubly linked list, on the other hand, is O(1) for both inserting at the end and deleting from the beginning. That's what makes it a perfect fit for serving as the queue's underlying data structure.
Python's deque uses a doubly linked list as part of its data structure. With doubly linked lists, a deque is capable of inserting or deleting elements from both ends of a queue with constant O(1) performance.
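To make the cost difference concrete, a quick sketch (both deque operations below are O(1); a plain Python list's pop(0) is O(n), because every remaining element shifts down):

from collections import deque

q = deque()
q.append("first")     # enqueue at the back: O(1)
q.append("second")
print(q.popleft())    # dequeue from the front: O(1); prints "first"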

Implement an efficient stack without pointers?

So, I'm working in an environment where pointers are non-existent (or at least, inaccessible), and I'm trying to efficiently implement a stack. I have a stack implementation working, but it's O(n), which of course isn't as efficient as the usual O(1) you get with pointer-based stacks. I just can't figure out a better way to implement this.
Some important background on the limitations of this environment: there's a global array of instances of a class called Entity; variables can only store signed integers; and there's no way to use pointers or even to create new arrays. Super limited.
Entities have members for (x,y,z) coordinates, a map of strings to integers for arbitrary data storage (of integers, at least), and a list of strings for arbitrary string storage. The environment provides no way of comparing two strings, except by comparing them to hard-coded values, and it provides no native way of comparing two integers, unless one is hard-coded; so to compare two variable integers, you have to subtract them and compare to 0 (very Assembly-like in that regard).
The implementation I have now adds a new Entity instance to the list for each entry in the stack, storing its value and index in its map with the keys Value and Index (I know, original). Whenever a value is pushed onto the stack, I iterate through the list and increment the Index of each existing Entity, then create a new Entity with an Index of 0. When it's popped, I iterate through the list, find the one with Index=0, and copy that value; I decrement the Index of every non-zero Entity I find on that list.
It works perfectly, but of course that's O(n) for both pushing and popping. Even if I were to track the head Index somewhere, the only way to find the entry with the matching Index would be to subtract the head Index from all the entries first, which is still O(n).
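For illustration only, here is a rough Python model of the scheme described above; Entity, Value, and Index come from the question, and the real environment's restrictions (integer-only variables, no pointers, no new arrays) obviously can't be reproduced here:

class Entity:
    def __init__(self):
        self.data = {}                 # the map of strings to integers

entities = []                          # the global array of Entity instances

def push(value):
    for e in entities:                 # O(n): shift every existing entry deeper
        e.data["Index"] += 1
    top = Entity()
    top.data["Value"] = value
    top.data["Index"] = 0
    entities.append(top)

def pop():
    top = None
    for e in entities:                 # O(n): find Index == 0, decrement the rest
        if e.data["Index"] == 0:
            top = e
        else:
            e.data["Index"] -= 1
    entities.remove(top)
    return top.data["Value"]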
Is there any way to do this more efficiently than O(n) without access to pointers or even additional arrays? Or is this the best that can be done with these restrictions?

How to implement a collection that supports real-time filtering?

I want to implement a mutable sequential collection FilteredList that wraps another collection List and filters it based on a predicate.
Both the wrapped List and the exposed FilteredList are mutable and observable, and should be synchronized (so for example, if someone adds an element to List that element should appear in the correct position in FilteredList, and vice versa).
Elements that don't satisfy the predicate can still be added to FilteredList, but they will not be visible (they will still appear in the inner list).
The collections should support:
Insert(index, value), which inserts an element value at position index, pushing subsequent elements forward.
Remove(index), which removes the element at position index, moving all subsequent elements back.
Update(index, value), which updates the element at position index to be value.
I'm having trouble coming up with a good synchronization mechanism.
I don't have any strict complexity bounds, but real world efficiency is important.
The best way to avoid synchronization difficulties is to create a data structure that doesn't need them: use a single data structure to present the filtered and unfiltered data.
You should be able to do that with a modified skip list (actually, an indexable skip list), which will give you O(log n) access by index.
What you do is maintain two separate sets of forward pointers for each node, rather than just one set. One set is for the unfiltered list, as in a normal skip list, and the other set is for the filtered list.
Adding to or removing from the list is the same for the filtered and unfiltered lists. That is, you find the node at index by following the appropriate filtered or unfiltered links, and then add or remove the node, updating both sets of link pointers.
This should be more efficient than a standard sequential list, because insertion and removal don't incur the cost of moving items up or down to make a hole or fill a gap; it's all done with references.
It takes a little more space per node, though. On average, a skip list requires two extra references per node. Since you're building what is in effect two skip lists in one, expect your nodes to require, on average, four extra references per node.
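A sketch of the per-node layout under this scheme (field names are illustrative; the width counts are what make the skip list indexable):

class DualSkipNode:
    def __init__(self, value, level):
        self.value = value
        # one tower of forward links per view
        self.next_all = [None] * level         # links in the unfiltered list
        self.width_all = [0] * level           # nodes skipped per link (for indexing)
        self.next_filtered = [None] * level    # links in the filtered list
        self.width_filtered = [0] * level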
Edit after comment
If, as you say, you don't control List, then you still maintain this dual skip list that I described. But the data stored in the skip list is just the index into List. You said that List is observable, so you get notification of all insert and delete operations, so you should be able to maintain an index by reacting to all notifications.
When somebody wants to operate on FilteredList, you use the filtered index links to find the List index of the FilteredList record the user wanted to affect. Then you pass the request on to List, using the translated index. And then you react to the observable notification from List.
Basically, you're just maintaining a secondary index into List, so that you can translate FilteredList indexes into List indexes.
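Schematically, the wrapper reduces to index translation; a sketch with the skip-list machinery elided, so the scan below is O(n) where the skip list would give O(log n) (FilteredListView and its method names are illustrative):

class FilteredListView:
    def __init__(self, inner, predicate):
        self.inner = inner                     # the observable List being wrapped
        self.predicate = predicate

    def _to_inner_index(self, filtered_index):
        # translate a FilteredList index into a List index
        seen = -1
        for i, value in enumerate(self.inner):
            if self.predicate(value):
                seen += 1
                if seen == filtered_index:
                    return i
        raise IndexError(filtered_index)

    def remove(self, index):
        # delegate to the inner list using the translated index
        del self.inner[self._to_inner_index(index)]

    def update(self, index, value):
        self.inner[self._to_inner_index(index)] = value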

Linked lists sharing tails?

Is there a library implementation or a way to define singly linked lists in C++ that share a common tail from some element onwards? I'd like to be able to append one linked list to another, so that appending would just mean changing the final next pointer of one of the lists to point to the head of the other. Similarly, I would like to be able to prepend an element to a list, returning a new list whose head is just one step before the head of the other.
This would allow me to trivially get memory consumption down from quadratic to linear in a piece of code that I have, i.e., to use the standard way of handling lists that all functional languages use internally.
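There's no standard C++ container with these semantics (std::forward_list owns its nodes), but the cons-cell idea behind the question is easy to sketch; here it is in Python, with Node purely illustrative (in C++ the tail would typically be held through a std::shared_ptr):

class Node:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail      # tail is another Node, or None

def prepend(value, lst):
    return Node(value, lst)                    # O(1): the new list shares lst as its tail

def append_in_place(lst, other):
    node = lst
    while node.tail is not None:               # walk to the last cell...
        node = node.tail
    node.tail = other                          # ...and repoint its next pointer
    return lst

shared = prepend(3, None)
a = prepend(1, shared)                         # a = 1 -> 3
b = prepend(2, shared)                         # b = 2 -> 3; the tail is shared, not copied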

Heap-like data structure with fast random access?

My situation is the following:
I have a collection of entities, each of which has a "goodness" property.
I wish to grab the entities one at a time, from "best" to "worst."
After a "best" entity is grabbed, the "goodness" properties of several (relatively few) of my other entities change, and this change must be incorporated into my upcoming decision of the next "best" entity to grab.
Some (relatively few) entities may become "worthless" after a grab, and these should be removed from my collection.
It is easy for me to construct, given the entity that I just grabbed, the set of now-"dirty" objects, that is, the set of entities which potentially have a now-different "goodness," or have become "worthless."
So, I need a data structure that allows me to:
Quickly grab the "biggest" of a collection (as in, a max-heap).
Quickly update the underlying ordering of the objects in my collection to accommodate the situation described above. (Easy to do in a heap, if we can access the dirty objects' locations, e.g. array indices, within the underlying heap implementation.)
There is a guarantee that there are no collisions among the entries of my collection. (The entries are references to the entities I described above.)
The idea I have is to use a max-heap together with an unordered map, keyed on the heap entries, and having values equal to, e.g., the objects' respective indices in the underlying array in the heap implementation.
What I'm wondering is whether there may be a data structure which is better for this situation.
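A minimal sketch of that max-heap-plus-position-map idea (names are illustrative; entries must be hashable, and since the standard heapq module neither exposes positions nor supports changing keys, the sift routines are written out by hand):

class IndexedMaxHeap:
    def __init__(self, goodness):
        self.goodness = goodness   # function: entry -> current goodness
        self.heap = []             # entries, heap-ordered by goodness
        self.pos = {}              # entry -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i]], self.pos[self.heap[j]] = i, j

    def _sift_up(self, i):
        while i > 0 and self.goodness(self.heap[i]) > self.goodness(self.heap[(i - 1) // 2]):
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.goodness(self.heap[c]) > self.goodness(self.heap[largest]):
                    largest = c
            if largest == i:
                return
            self._swap(i, largest)
            i = largest

    def push(self, entry):
        self.heap.append(entry)
        self.pos[entry] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def pop_max(self):
        self._swap(0, len(self.heap) - 1)
        top = self.heap.pop()
        del self.pos[top]
        if self.heap:
            self._sift_down(0)
        return top

    def reheapify(self, entry):
        # call for each "dirty" entry after its goodness changes
        i = self.pos[entry]
        self._sift_up(i)
        self._sift_down(i)

    def remove(self, entry):
        # for entries that became "worthless"
        i = self.pos.pop(entry)
        last = self.heap.pop()
        if i < len(self.heap):
            self.heap[i] = last
            self.pos[last] = i
            self._sift_up(i)
            self._sift_down(i)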
If few members are affected when the best entity is grabbed, then you might be able to improve the runtime by using a linked list and an unordered map (each with the original set of entities), and a max heap. After removing the best entity from the end of the linked list you'll use the map to locate the affected entities, removing them from the list and adding the non-worthless entities to the max heap. Thereafter, the next best entity is the greater of the entity at the end of the list or the max entity in the heap. The advantage of this setup is that removal from the linked list is a constant time operation, and insertion into the max heap will be a relatively small (compared to the total number of entities) log time operation.
Because entities' values can only get worse, you can lazily remove them from the linked list - if the item is worthless then remove it, and if its value has changed then flag it as "changed." Check the "changed" flag on the entity at the end of the linked list, and if it's "true" then remove the entity and add it to the max-heap. The advantage of lazy updates is that you usually won't need to update items that are in the heap (you'll just need to update the value of items in the linked list), and if an item is changed and then later made worthless then you can remove it from the linked list without ever having to add it to the heap.
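A rough sketch of that hybrid, under a couple of simplifying assumptions (all names are mine; the sorted linked list is modeled as a plain Python list with lazy deletion, and stale heap entries are skipped on pop, which preserves the amortized costs for the purposes of the sketch):

import heapq

class BestFirstPicker:
    def __init__(self, items):
        # items: (entity, goodness) pairs; goodness only ever decreases
        ordered = sorted(items, key=lambda kv: kv[1])
        self.order = [e for e, g in ordered]      # models the linked list, best at the end
        self.in_list = set(self.order)
        self.goodness = dict(ordered)
        self.heap = []                             # (-goodness, entity); may hold stale entries

    def update(self, entity, new_goodness=None):
        # entity became worthless (None) or merely worse
        self.in_list.discard(entity)
        if new_goodness is None:
            self.goodness.pop(entity, None)
        else:
            self.goodness[entity] = new_goodness
            heapq.heappush(self.heap, (-new_goodness, entity))

    def _list_best(self):
        while self.order and self.order[-1] not in self.in_list:
            self.order.pop()                       # lazily drop migrated/worthless entries
        return self.order[-1] if self.order else None

    def _heap_best(self):
        while self.heap:
            g, e = self.heap[0]
            if self.goodness.get(e) == -g:
                return e
            heapq.heappop(self.heap)               # stale: superseded or worthless
        return None

    def pop_best(self):
        lb, hb = self._list_best(), self._heap_best()
        if lb is None and hb is None:
            return None
        if hb is None or (lb is not None and self.goodness[lb] >= self.goodness[hb]):
            self.order.pop()
            self.in_list.discard(lb)
            best = lb
        else:
            heapq.heappop(self.heap)
            best = hb
        del self.goodness[best]
        return best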
