I just got this question from a textbook exercise:
"Stack and Queue ADTs can be implemented using an array. Which one is simpler to implement using an array? Explain."
I think an array is probably not the best way to implement either a stack or a queue in the first place, because of an array's fixed capacity, unless it is resized whenever it overflows.
I do not have a perfect response to this, but which one of them is simpler to implement using arrays?
The only difference I can think of is that with a stack you only have to keep track of the top of the stack in the array, while with a queue you need to keep track of both the front and the end of the queue.
"Keep track of" means "storing an array index/offset for".
Other than that, the standard operations on stacks and queues are fairly similar in number: push() and pop() for stacks, enqueue() and dequeue() for queues. Neither data type is particularly complex or difficult to implement.
A stack is better implemented as an array than a queue is, mainly because of how each structure's operations affect the array itself.
Queue
For a queue data structure, you need to be able to remove elements from one end and add elements at the other. With an array, adding or removing an element at the front is relatively expensive, because it involves shifting every other element to accommodate the change.
queue: [2, 3, 4, 5, 6]
enqueue: 1
queue: [1, 2, 3, 4, 5, 6] (every element had to shift to fit 1 in the front)
or if you oriented your queue the opposite way,
queue: [1, 2, 3, 4, 5, 6]
dequeue: 1
queue: [2, 3, 4, 5, 6] (every element had to shift when 1 was removed from the front)
So no matter which direction you orient your queue, you will always have some operation (enqueue or dequeue) that adds or removes an element at the front of the array, which in turn forces every other element to shift. That is relatively inefficient (something you would much rather avoid, and the reason most queues aren't implemented with a plain array).
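To make the cost concrete, here is a minimal sketch in Python (a plain list stands in for the array; all names are illustrative) of the second orientation, where dequeue removes from the front:

class ArrayQueue:
    def __init__(self):
        self.items = []

    def enqueue(self, item):
        # O(1) amortized: appends at the back, nothing shifts.
        self.items.append(item)

    def dequeue(self):
        # O(n): removing index 0 shifts every remaining element left.
        return self.items.pop(0)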
Stack
With a stack data structure, you only need to add and remove elements from the same end. This allows us to avoid the problem we were having with adding/removing elements from the front of the array. We just need to orient our stack to add and remove elements from the back of the array, and we will not encounter the problem with having to shift all the elements when something is added or removed.
stack: [1, 2, 3, 4]
push: 5
stack: [1, 2, 3, 4, 5] (nothing had to be shifted)
pop:
stack: [1, 2, 3, 4] (nothing had to be shifted)
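The same sketch for a stack, again in Python with illustrative names, shows both operations confined to the back of the array:

class ArrayStack:
    def __init__(self):
        self.items = []

    def push(self, item):
        # O(1) amortized: appends at the back.
        self.items.append(item)

    def pop(self):
        # O(1): removes the last element, nothing shifts.
        return self.items.pop()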
Yes, it is clear that an array is not the best structure for implementing a queue or a stack for real-life problems.
Implementing a stack is easier than implementing a queue, because with a stack we push each element at the highest used index and pop from that very same index; if we then push another element, it again goes in at that position. Every operation happens at a single index.
In the case of a queue, however, there are two indices to trace: one telling us from where to dequeue the next element, and another for the enqueue operation.
We have to update each index on its corresponding operation (i.e. the front on dequeue and the end on enqueue).
I have a ragged array represented as a contiguous block of memory along with its "shape", corresponding to the length of each row, and its "offsets", corresponding to the index of the first element in each row. To illustrate, I have an array conceptually like this:
[[0, 0, 0],
[1, 1, 1, 1],
[2],
[3, 3],
[4, 4]]
Represented in-memory as:
values: [0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 4, 4]
shape: [3, 4, 1, 2, 2]
offsets: [0, 3, 7, 8, 10]
I may have on the order of hundreds of millions of rows, with typically, say, 3-20 four-byte floats per row, though with no hard upper bound on the row length.
I wish to shuffle the rows of this array randomly. Since the array is ragged, I can't see how the Fisher-Yates algorithm can be applied in a straightforward manner. I see how I can carry out a shuffle by randomly permuting the array shape, pre-allocating a new array, and then copying over rows according to the permutation generating the new shape with some book-keeping on the indexes. However, I do not necessarily have the RAM required to duplicate the array for the purposes of this operation.
Therefore, my question is whether there is a good way to perform this shuffle in-place, or using only limited extra memory? Run-time is also a concern, but shuffling is unlikely to be the main bottleneck.
For illustration purposes, I wrote a quick toy-version in Rust here, which attempts to implement the shuffle sketched above with allocation of a new array.
shape is redundant, since shape[i] is offsets[i+1] - offsets[i] (if you extend offsets by one element containing the length of the values array). But since your data structure has both these fields, you could shuffle your array by just in-place shuffling the two descriptor vectors (in parallel), using Fisher-Yates. This would be slightly easier if shape and offsets were combined into an array of pairs (offset, length), which might also improve locality of reference, but it's certainly not critical if you have some need for the separate arrays.
That doesn't preserve the contiguity of the rows in the values list, but if all your array accesses are through offset, it will not require any other code modification.
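A minimal sketch of that idea in Python (lists stand in for the descriptor vectors; names are illustrative):

import random

def shuffle_descriptors(offsets, shape):
    # In-place Fisher-Yates applied to both descriptor arrays in
    # parallel; values itself is never touched.
    for i in range(len(offsets) - 1, 0, -1):
        j = random.randint(0, i)
        offsets[i], offsets[j] = offsets[j], offsets[i]
        shape[i], shape[j] = shape[j], shape[i]

Afterwards, row r still lives at values[offsets[r] : offsets[r] + shape[r]]; only the order in which the descriptors enumerate the rows has changed.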
It is possible to implement an in-place swap of two variable-length subsequences using a variant of the classic rotate-with-three-reversals algorithm. Given P V Q, a sequence conceptually divided into three variable length parts, we first reverse P, V, and Q in-place independently producing PR VR QR. Then we reverse the entire sequence in place, yielding Q V P. (Afterwards, you'd need to fixup the offsets array.)
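A sketch of that swap in Python, with the three parts given by index boundaries (illustrative; the offsets fixup is omitted):

def reverse_span(a, lo, hi):
    # Reverse a[lo:hi] in place.
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1

def swap_spans(a, p_end, q_start):
    # P = a[:p_end], V = a[p_end:q_start], Q = a[q_start:].
    reverse_span(a, 0, p_end)         # P -> reversed
    reverse_span(a, p_end, q_start)   # V -> reversed
    reverse_span(a, q_start, len(a))  # Q -> reversed
    reverse_span(a, 0, len(a))        # whole -> Q V P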
That's linear time in the length of the span from P to Q, but as a shuffle algorithm it will add up to quadratic time, which is impractical for "hundreds of millions" of rows.
As often happens, I started with a complex idea and then simplified it. Here is the simple version, with the complex one below.
What we're going to do is quicksort it into a random arrangement. The key operation is partitioning: we want to take a section of m blocks and randomly partition it into m_l blocks on the left and m_r blocks on the right.
The idea is this. We keep a queue of temporarily copied blocks on the left and another on the right. Each is a queue of blocks, but its size is measured in the number of elements it holds. The partitioning logic looks like this:
while m_l + m_r > 0:
    pick the larger queue, breaking ties randomly
    if the queue is empty:
        read a block into it
    get block from queue
    if random() < m_l / (m_l + m_r):
        # put the block on the left
        until we have enough room to write the block:
            copy blocks into the left queue
        write block to the left
        m_l -= 1
    else:
        # put the block on the right
        until we have enough room to write the block:
            copy blocks into the right queue
        write block to the right
        m_r -= 1
And now we need to recursively repeat until we've quicksorted it into a random order.
Note that, unlike with a regular quicksort, we are constructing our partitions to be exactly the size we expect. So we don't have to recurse. Instead we can:
# pass 1
partition whole array
# pass 2
partition 0..(m//2-1)
partition (m//2)..(m-1)
# pass 3
partition 0..(m//4-1)
partition (m//4)..(m//2-1)
partition (m//2)..((3*m)//4-1)
partition ((3*m)//4)..(m-1)
etc.
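A sketch of that schedule in Python (partition is assumed to be the routine described above; exact split points may differ by one from the listing, but the idea is the same):

def run_passes(m, partition):
    # Each pass splits every current range in half, so after about
    # log2(m) passes every block is alone in its range.
    width = m
    while width > 1:
        lo = 0
        while lo < m:
            hi = min(lo + width, m)
            if hi - lo > 1:
                partition(lo, hi)
            lo = hi
        width = (width + 1) // 2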
The result is time O(n * log(m)). And the queues will never hold more than 5k data, where k is the largest block size.
Here is an approach that runs in time O(n log(n)). The maximum space needed is O(k), where k is the maximum block size.
First, note that shape and offsets are largely redundant, because shape[i] = offsets[i+1] - offsets[i] for all i but the last. So with O(1) extra data (which we already have in values.len()) we can make shape redundant, (ab)use it however we want, and then recalculate it at the end.
So let's start by picking a random permutation of 0..(shape.len()-1) and placing it in shape. This records where each block will go, and can be computed in time O(n) using Fisher-Yates.
Our idea now is to use quicksort to actually get them to the right places.
First, our pivot. For O(n) work in a single pass we can add up the lengths of all blocks which will come before the median block, and also find the length of said median block.
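As a sketch in Python (dest is the permutation temporarily stored in shape, lengths the true block lengths recomputed from offsets; names are illustrative):

def median_split(dest, lengths):
    # One O(m) pass: total length of the blocks destined to land before
    # the median block, plus the median block's own length.
    m_mid = len(dest) // 2
    before = mid_len = 0
    for d, length in zip(dest, lengths):
        if d < m_mid:
            before += length
        elif d == m_mid:
            mid_len = length
    return before, mid_len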
Now quicksort depends on being able to swap things, but we can't do that directly (that is your whole problem). The idea instead is to partition from the middle out. The values, shape and offsets arrays will then have beginning and ending sections that we haven't gotten to yet, and a rewritten piece in the middle. Where those sections meet we'll also need queues of blocks copied off of the left and right and not yet written back. And, of course, we'll need a record of where the boundaries are.
So the idea is this.
set up the data structures
copy a few blocks in the middle to one of the queues - enough to have a place for the median block to go
record where the median will go
while we have not finished partitioning:
    pick the larger queue (break ties by farthest from its end, then randomly)
    if it is empty:
        read a block into it
    figure out where its next block needs to be written
    copy blocks in its way to the appropriate queue
    write the block out
Here, writing a block out means writing its elements to the right place, setting its offset, and setting its shape entry to the final location for that block (shape is still holding destinations at this point).
This operation will partition around the median block. Recursively repeat to sort each side into blocks being in their final position.
And, finally, fix the shape array back to what it was supposed to be.
The time complexity of this is O(n log(n)), for the same reason quicksort's is. As for space complexity: if k is the largest block size, then any time the queues exceed size 4k, the next block extracted is guaranteed to have room to be written, so they cannot grow any further. This makes the maximum space used O(k).
I am taking the Algorithms course on Coursera. One of the assignments is the following:
Randomized queue. A randomized queue is similar to a stack or queue,
except that the item removed is chosen uniformly at random among items
in the data structure.
I am trying to find a way to implement dequeue (randomly removing an item) in a constant amount of time. I have thought of an idea to do this requiring a deque (which supports removing and adding an item at the front and back in constant time). My idea is as follows:
Use a deque as an underlying data structure inside the randomized queue
Enqueue - use a library function to generate a random integer that is either 0 or 1. If the integer is 0, add the item to the front of the deque. Otherwise, add it to the back.
Dequeue - any direction would be fine.
The reason the randomness happens in enqueue rather than in dequeue is that dequeueing from a fixed end is not exactly random (e.g. after n calls to enqueue, dequeue could only ever return the first or the nth item). So, to be sure the items are removed randomly, I decided to enqueue them randomly.
This idea seems good to me because I cannot find holes in it, but the problem is I cannot prove that my idea would really work. I don't know much about randomness. In fact, this is only the 5th time I am working with random data structures.
Is my idea correct? Will it generate a data structure that removes items at random?
Enqueueing only at the ends does not produce a uniformly random sequence. The last item to be enqueued is necessarily at one of the two ends, and the first item to be enqueued is more likely to be somewhere in the middle than at either end once other items have been enqueued.
To illustrate, take the set of three items {1, 2, 3}, the smallest set that does not result in a uniform distribution. Enqueueing them in that order gives the following possible results (in parentheses is where the next item is enqueued).
[1] -> (front) -> [1, 2] -> (front) -> [1, 2, 3]
[1] -> (front) -> [1, 2] -> (back) -> [3, 1, 2]
[1] -> (back) -> [2, 1] -> (front) -> [2, 1, 3]
[1] -> (back) -> [2, 1] -> (back) -> [3, 2, 1]
These four results are the only possibilities and are all equally likely. And as you can see, the last item is never in the middle while both the first and second items are in the middle twice.
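You can confirm this by brute force with a few lines of Python (the front/back bookkeeping here is illustrative):

from itertools import product
from collections import Counter

counts = Counter()
for choices in product(("front", "back"), repeat=2):
    dq = [1]
    for item, side in zip((2, 3), choices):
        dq = [item] + dq if side == "front" else dq + [item]
    counts[dq[1]] += 1  # which item ended up in the middle?

print(counts)  # Counter({2: 2, 1: 2}); item 3 never lands in the middle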
What you want is to dequeue at a random position. But you don't need to preserve the order of the other items, since they are uniformly distributed. That means you can just swap the last item with a random one, and then dequeue that one (which has become the last item).
I don't think your proposed approach will work because of the uniformity requirement. Uniformity means that every item in the queue has an equal likelihood of being dequeued. Your proposal always adds elements to one of the ends, and dequeues from one end or the other. Consequently, on any given dequeue request the non-end elements have zero probability of being selected.
An alternative might be to use an array-based stack. Add elements at the end of the stack, but for dequeueing choose an element at random, swap it with the last element, and then pop it. That will have uniformity of selection, and all of the component operations are constant time.
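A minimal sketch of that approach in Python (the class and method names are illustrative, not the assignment's required API):

import random

class RandomizedQueue:
    def __init__(self):
        self.items = []

    def enqueue(self, item):
        # O(1) amortized append at the end.
        self.items.append(item)

    def dequeue(self):
        # Pick a uniformly random element, swap it to the end, pop it.
        # O(1), and every element is equally likely to be removed.
        i = random.randrange(len(self.items))
        self.items[i], self.items[-1] = self.items[-1], self.items[i]
        return self.items.pop()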
I have a linked list 1->2->3->4->5->6.
I need to change it to 1->6->2->5->3->4,
i.e. the last element is linked to the first element, the second-to-last element is linked to the second element, and so on.
I used 2 pointers, one fast and one slow. Once I reach the center, I put all the elements of the second half in a stack: [4, 5, 6].
Now, using a third pointer, I traverse the original linked list and insert nodes popped from the stack, i.e. 6, 5, 4.
Is there any better solution than this?
I think this is optimal.
I use 2 pointers: one slow, moving one step at a time, and one fast, moving two steps at a time.
That way I find the center, and also the mid count.
Now, from the center to the end, I reverse the linked list.
I then have 2 linked lists: one from the start to the center, and the other, reversed, from the end back to the center.
Simply take one node from list 1, link the next node from list 2 after it, and advance both lists (see the sketch below).
No extra space is required, and the time complexity is O(N).
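Here is a sketch of that pointer-only approach in Python (a minimal Node type is assumed for illustration):

class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt

def reorder(head):
    # Find the center with slow/fast pointers.
    slow = fast = head
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
    # Reverse the second half (everything after the center) and detach it.
    prev, cur = None, slow.next
    slow.next = None
    while cur:
        cur.next, prev, cur = prev, cur, cur.next
    # Interleave the two halves, taking one node from each in turn.
    first, second = head, prev
    while second:
        nxt1, nxt2 = first.next, second.next
        first.next = second
        second.next = nxt1
        first, second = nxt1, nxt2
    return head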
On many occasions, we need to perform two or more different operations on an array like flatten and compact.
some_array.flatten.compact
My concern here is that it will loop over the array two times. Is there more efficient way of doing this?
I actually think this is a great question. But first off, why is everyone not too concerned about this? Here's the performance of flatten and flatten.compact compared:
Here's the code I used to generate this chart, and one that includes memory.
Hopefully now you see why most folks won't worry: composing flatten with compact just adds another constant factor. Still, it is at least theoretically interesting to ask how we might shave off the time and space of that intermediate structure. Asymptotically it's not super valuable, but it's a curious thing to think about.
As far as I can tell, you can't do this by making use of flatten:
Before looking at the source, I hoped that flatten could take a block like so:
[[3, [3, 3, 3]], [3, [3, 3, 3]], [3, [3, 3, 3]], nil].flatten {|e| e unless e.nil? }
No dice though. We get this as a return:
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, nil]
This is weird in that it basically tosses the block away as a no-op. But it makes sense with the source. The C method flatten used in Ruby's core isn't parameterized to take a block.
The procedure in the Ruby source code reads kinda weird to me (I am not a C programmer), but it's basically doing something like depth-first search. It uses a stack onto which it pushes every new nested array it encounters for processing. (It terminates when none remain.) I've not worked this out formally, but it leads me to guess the complexity is on par with DFS.
So the source code could have been written such that this would work, by allowing for extra setup when a block is passed in. But without that, you're stuck with the (small) performance hit!
It is not iterating over the same array two times. flatten in general creates an array with an entirely different structure from the original one, so the first and the second iteration are not visiting the same elements. It naturally follows that you cannot fuse the two passes.
If the array is only one layer deep, then the inner arrays can be merged into a set:
require 'set'
s = Set.new
some_array.each { |a| s.merge(a) }
I was reading about the tortoise and hare algorithm on Wikipedia, and I am wondering whether the Python pseudocode is wrong. It seems to fail for the array [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, ...]: at the very beginning the two values meet, and the algorithm then continues to look for the start of a cycle, which is doomed to fail.
I understand that there is a condition i ≥ μ; should this constraint be added to the code for finding the start of the cycle?
If this constraint is added, should the algorithm terminate and report that no cycle was found when it fails, or continue for another iteration? After all, what if the input is [1, 2, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5, ...]?
How does this algorithm guarantee that at the first meeting point, both pointers are inside the cycle?
The tortoise and hare algorithm runs two pointers, one at offset i and the other at offset 2i, both one-based, so initially 1 and 2, and is meant to detect cycles in linked-list-style data structures.
And, just to be clear, it compares the pointers rather than the data values they point to. I'm unsure whether you knew that, so on the off-chance you didn't, I thought I'd mention it.
The initial starting point is to have the tortoise on the first element and the hare on the second (assuming they exist, of course; if they don't, no loop is possible), so it's incorrect to state that they're equal at the start. The pointer values can only ever become equal if the hare cycles around and catches the tortoise from behind.
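For reference, here is the linked-list form in Python, comparing node identity rather than stored values (the Node type is a minimal stand-in):

class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt

def has_cycle(head):
    # Tortoise starts on the first node, hare on the second; they are
    # compared with `is`, so equal values in distinct nodes (like the
    # two 2s in the question) can never cause a false meeting.
    if head is None or head.next is None:
        return False
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        if slow is fast:
            return True
        slow = slow.next
        fast = fast.next.next
    return False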