So given an array and a window size, I need to find the second largest element in every window. The brute-force solution is pretty simple, but I want to find an efficient solution using dynamic programming.
The brute-force solution times out when I try it on big arrays, so I need a better approach. My solution was to find the second greatest in each sliding window by sorting the window and taking the second-largest element. I understand that some data structures can keep things sorted faster, but I would like to know if there are better ways.
There are many ways that you can solve this problem. Here are a couple of options. In what follows, I'm going to let n denote the number of elements in the input array and w be the window size.
Option 1: A simple, O(n log w)-time algorithm
One option would be to maintain a balanced binary search tree containing all the elements in the current window, including duplicates. Inserting something into this BST would take time O(log w) because there are only w total elements in the window, and removing an element would also take time O(log w) for the same reason. This means that sliding the window over by one position takes time O(log w).
To find the second-largest element in the window, you'd just need to apply a standard algorithm for finding the second-largest element in a BST, which takes time O(log w) in a BST with w elements.
The advantage of this approach is that in most programming languages, it'll be fairly simple to code this one up. It also leverages a bunch of well-known standard techniques. The disadvantage is that the runtime isn't optimal, and we can improve upon it.
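For concreteness, here's a minimal sketch of this option in Python. It uses SortedList from the (third-party) sortedcontainers package as a stand-in for a balanced BST: insertion and removal are logarithmic, and indexing from the end replaces the "second-largest element in a BST" walk. The names are mine, and it assumes w >= 2:

from sortedcontainers import SortedList  # third-party stand-in for a balanced BST

def second_largest_sorted(arr, w):
    """Second-largest element of every length-w window, in O(n log w)."""
    window = SortedList(arr[:w])      # the current window's elements, duplicates included
    result = [window[-2]]             # second-largest = second element from the end
    for i in range(w, len(arr)):
        window.remove(arr[i - w])     # slide the window: drop the outgoing element...
        window.add(arr[i])            # ...and insert the incoming one
        result.append(window[-2])
    return result

For example, second_largest_sorted([31, 41, 59, 26, 53], 3) returns [41, 41, 53].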
Option 2: An O(n) prefix/suffix algorithm
Here's a linear-time solution that's relatively straightforward to implement. At a high level, the solution works by splitting the array into a series of blocks, each of which has size w. For example, consider the following array:
31 41 59 26 53 58 97 93 23 84 62 64 33 83 27 95 02 88 41 97
Imagine that w = 5. We'll split the array into blocks of size 5, as shown here:
31 41 59 26 53 | 58 97 93 23 84 | 62 64 33 83 27 | 95 02 88 41 97
Now, imagine placing a window of length 5 somewhere in this array, as shown here:
31 41 59 26 53 | 58 97 93 23 84 | 62 64 33 83 27 | 95 02 88 41 97
         |--------------|
Notice that this window will always consist of a suffix of one block followed by a prefix of another. This is nice, because it allows us to solve a slightly simpler problem. Imagine that, somehow, we can efficiently determine the two largest values in any prefix or suffix of any block. Then we could find the second-max value in any window as follows:
1. Figure out which block's suffix and which block's prefix the window corresponds to.
2. Get the top two elements from that suffix and that prefix (or just the top element, if the piece is sufficiently small).
3. Of those (up to) four values, determine which is the second-largest and return it.
With a little bit of preprocessing, we can indeed set up our blocks to answer queries of the form "what are the two largest elements in this suffix?" and "what are the two largest elements in this prefix?" You can kinda sorta think of this as a dynamic programming problem, set up as follows:
- For any prefix/suffix of length one, the top value is the single element it contains (there is no second value yet).
- For any prefix/suffix of length two, the top two values are the two elements themselves.
- Any longer prefix or suffix can be formed by extending a shorter one by a single element. To determine the top two elements of the longer prefix/suffix, compare the new element against the top two of the shorter one and keep the two largest of those three.
Notice that filling in each prefix/suffix entry takes time O(1). This means that we can fill in all the entries for a block in time O(w), since there are w entries per block. Moreover, since there are O(n / w) total blocks, the total time required to fill in these entries is O(n), so our overall algorithm runs in time O(n).
As for space usage: if you eagerly compute all prefix/suffix values throughout the entire array, you'll need space O(n) to hold everything. However, since at any point in time we only care about two blocks, you could alternatively compute the prefixes/suffixes only when you need them. That requires only O(w) space, which is really, really good!
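Here's what this can look like in Python (an illustrative sketch; the names are mine). This is the eager variant that stores all the prefix/suffix pairs in O(n) space for clarity. Each entry is a (max, second) pair, NEG_INF plays the role of "no second value yet", and w >= 2 is assumed:

NEG_INF = float('-inf')

def top_two(pair, x):
    """Merge value x into a (max, second) pair."""
    m, s = pair
    if x > m:
        return (x, m)
    if x > s:
        return (m, x)
    return (m, s)

def second_largest_blocks(arr, w):
    n = len(arr)
    prefix = [None] * n   # prefix[i] = top two of arr[start of i's block .. i]
    suffix = [None] * n   # suffix[i] = top two of arr[i .. end of i's block]
    for start in range(0, n, w):
        end = min(start + w, n)
        prefix[start] = (arr[start], NEG_INF)
        for i in range(start + 1, end):
            prefix[i] = top_two(prefix[i - 1], arr[i])
        suffix[end - 1] = (arr[end - 1], NEG_INF)
        for i in range(end - 2, start - 1, -1):
            suffix[i] = top_two(suffix[i + 1], arr[i])

    result = []
    for i in range(n - w + 1):
        j = i + w - 1
        if i % w == 0:            # the window lines up exactly with one block
            best = prefix[j]
        else:                     # suffix of one block plus prefix of the next
            m, s = prefix[j]
            best = top_two(top_two(suffix[i], m), s)
        result.append(best[1])
    return result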
Option 3: An O(n)-time solution using clever data structures
This last approach turns out to be totally equivalent to the above approach, but frames it differently.
It's possible to build a queue that allows for constant-time querying of its maximum element. The idea behind this queue - beginning with a stack that supports efficient find-max and then using it in the two-stack queue construction - can easily be generalized to build a queue that gives constant-time access to the second-largest element. To do so, you'd just adapt the stack construction to store the top two elements at each point in time, not just the largest element.
If you have a queue like this, the algorithm for finding the second-max value in any window is pretty quick: load the queue up with the first w elements, then repeatedly dequeue an element (shift something out of the window) and enqueue the next element (shift something into the window). Each of these operations takes amortized O(1) time to complete, so this takes time O(n) overall.
Fun fact - if you look at what this queue implementation actually does in this particular use case, you'll find that it's completely equivalent to the above strategy. One stack corresponds to suffixes of the previous block and the other to prefixes of the next block.
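Here's a minimal sketch of such a queue in Python (the names and details are mine). Each stack entry carries a (max, second) summary of everything at or below it in that stack, so the top two of the whole queue can be read off the two stack tops in O(1):

NEG_INF = float('-inf')

class TopTwoQueue:
    """Two-stack queue with amortized O(1) enqueue/dequeue and O(1)
    access to its largest and second-largest elements."""

    def __init__(self):
        self._in, self._out = [], []   # entries: (value, (max, second) at or below)

    @staticmethod
    def _merge(pair, x):
        m, s = pair
        if x > m:
            return (x, m)
        if x > s:
            return (m, x)
        return (m, s)

    def _push(self, stack, x):
        summary = stack[-1][1] if stack else (NEG_INF, NEG_INF)
        stack.append((x, self._merge(summary, x)))

    def enqueue(self, x):
        self._push(self._in, x)

    def dequeue(self):
        if not self._out:              # amortized O(1): each element moves once
            while self._in:
                self._push(self._out, self._in.pop()[0])
        return self._out.pop()[0]

    def top_two(self):
        a = self._in[-1][1] if self._in else (NEG_INF, NEG_INF)
        b = self._out[-1][1] if self._out else (NEG_INF, NEG_INF)
        return self._merge(self._merge(a, b[0]), b[1])   # (largest, second-largest)

The sliding-window algorithm is then exactly as described: preload the first w elements, and for each later element dequeue once, enqueue once, and read off top_two()[1].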
This last strategy is my personal favorite, but admittedly that's just my own data structures bias.
Hope this helps!
So just take a data structure, like a set, that stores the data in sorted order (use a multiset if the array can contain duplicates).
For example, if you store 4 2 6 in the set, it will be stored as 2 4 6.
So the algorithm is:
Let,
Array = [12, 8, 10, 11, 4, 5]
window size = 4
first window = [12, 8, 10, 11]
set = [8, 10, 11, 12]
How to get the second highest:
- Remove the last element from the set and store it in a container: set = [8, 10, 11], container = 12
- After removing it, the current last element of the set is the second largest of the current window.
- Put the element stored in the container back into the set: set = [8, 10, 11, 12]
Now shift your window:
- delete 12 from the set and add 4;
- now you have the new window and set;
- repeat the same process for each window.
Removing and adding an element in a set of w elements costs O(log w) each, so processing the whole array takes O(n log w).
One trick:
If you want the data stored in decreasing order instead, insert each value multiplied by -1, and multiply by -1 again whenever you take a value back out.
We can use a double ended queue for an O(n) solution. The front of the queue will have larger (and earlier seen) elements:
  0   1   2   3   4   5
{12,  8, 10, 11,  4,  5}
window size: 3

 i   queue (stores indexes, listed back to front)
 -   -----
 0   0
 1   1,0
 2   2,0   (pop 1, then insert 2)
     output 10
     remove 0 (remove indexes not in the
     next window from the front of the queue)
 3   3     (special case: there's only one smaller
           element left in the queue, which we still
           need, so keep index 2 in a temporary variable)
     output 10
 4   4,3
     output 10
     remove 2 from temporary storage
     (it falls outside the next window)
 5   5,3   (pop 4, insert 5)
     output 5
The "pop" and "remove from front" are while A[queue_back] <= A[i] and while queue_front is outside next window respectively (the complication of only one smaller element left represented in the queue notwithstanding). We output the array element indexed by the second element from the front of the queue (although our front may have a special temporary friend that was once in the front, too; the special friend is dumped as soon as it represents an element that's either outside of the window or smaller than the element indexed by the second queue element from the front). A double ended queue has complexity O(1) to remove from either front or back. We insert in the back only.
Per templatetypedef's request in the comments ("how do you determine which queue operations to use?"): at every iteration with index i, before inserting it into the queue, we (1) pop every element from the back of the queue that represents an element of the array smaller than or equal to A[i], and (2) remove every element from the front of the queue whose index is outside the current window. (If during (1) we are left with only one smaller-or-equal element, we save it in a temporary variable, since it is the current second largest.)
There is a relatively simple dynamic programming O(n^2) solution:
Build the classic pyramid structure for an aggregate value over subranges (the one where you combine the values from the pairs below to make each cell above), but track the largest two values (and their positions) in each cell. When combining two cells, simply keep the largest two of the four combined values (fewer in practice due to the overlap; use the positions to make sure they are actually different elements). You then just read off the second-largest value from the layer whose cells span the sliding-window size.
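That idea might look like the following in Python (a sketch; the names are mine, and w >= 2 is assumed). Each cell keeps up to two (value, position) candidates, the positions resolve the overlap between adjacent cells, and stopping at the width-w layer costs O(n * w) time rather than the full O(n^2) pyramid:

def second_largest_pyramid(arr, w):
    # layer of width 1: cell i covers arr[i] alone
    layer = [[(v, i)] for i, v in enumerate(arr)]
    for width in range(2, w + 1):
        nxt = []
        for i in range(len(layer) - 1):
            # combine the candidates of the two overlapping cells below,
            # deduplicating by position so the top two are distinct elements
            merged = {pos: val for val, pos in layer[i] + layer[i + 1]}
            best = sorted(((v, p) for p, v in merged.items()), reverse=True)[:2]
            nxt.append(best)
        layer = nxt
    # each cell of the width-w layer is one window; report its second value
    return [cell[1][0] for cell in layer]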
Suppose I search in the middle of a doubly linked list that is sorted in ascending order, like
10 20 30 40 50
Imagine I search for "20". I first pick 30 (the middle):
10 20 30 40 50
Then I can move left in the doubly-linked list, so I move left (and pick 20):
10 20 30 40 50
Can this be O(log n), since the search space is halved each time?
The thing that makes binary search (which is what you're discussing) possible is the ability to get to any element in the collection in constant time, O(1).
This means that arrays are one possibility since you can convert the base and index into a memory location in O(1). For example, if the array is based at byte location 314159 and each element is seven bytes in size, element 42 can be found at 314159 + 42 * 7 = 314453.
A linked list, even a doubly linked one, cannot do this since, in order to find an element, you have to traverse from either the head or the tail until you get there (and, in fact, you don't know you've gotten there unless you traverse them all or have the count handy).
So, for a linked list, it's very much an O(n) method for finding a given element.
It's possible only in linear O(n) time. Here is the whole explanation: https://stackoverflow.com/a/19554432/3457707. The main thing is that you don't have direct access to every element, so access time is not constant as it is in an array. You have to iterate through the elements one at a time; you cannot skip over them as you can in an array.
I am studying hash table problems and I have a doubt about the delete operation under linear probing. I need to remove the element 97 from the following array:
empty 37 97 50 49 38 empty empty empty 45 57 46
When I delete the element 97 from my array, do I need to resize the array, or can I just remove the element?
I ask because, in a previous exercise, I needed to resize the array to double its capacity once N >= M/2.
NEW DOUBT
When I delete one element from a linear probing hash table, like in the example: imagine h(49) = 3, but since 50 was already in array[3], 49 was placed in the next empty spot, array[4]. Then I delete the element 50. Now when I search for the element 49, I know h(49) = 3, but array[3] holds a NULL value, so the search tells me that the element 49 does not exist. How can I resolve this situation?
You'd typically not resize a hashtable array when removing elements, as there's "nothing" gained in terms of access speed by having a smaller array, but you'd waste quite some CPU time re-hashing the entries and copying them from the old to the new array.
Only when the array gets very sparse, for memory economy it might become worth investing the CPU time to reduce the memory footprint of the hashtable.
That's different from adding to the hashtable, where a too densely populated array makes access slower, as more often you have to use linear search after the direct hash-based access. So, when adding to a hashtable, you'd typically resize the array when reaching some load factor.
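To illustrate that policy, here's a minimal linear-probing map sketch in Python (names mine; not production code). It grows when the load factor reaches N >= M/2 at insertion time and never shrinks; deletion, which needs tombstones or re-inserting the rest of the cluster, is deliberately omitted:

class LinearProbingMap:
    """Minimal linear-probing hash map that only ever grows."""

    def __init__(self, capacity=8):
        self._keys = [None] * capacity
        self._vals = [None] * capacity
        self._n = 0

    def put(self, key, value):
        if self._n >= len(self._keys) // 2:   # load factor N >= M/2: grow
            self._resize(2 * len(self._keys))
        i = hash(key) % len(self._keys)
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) % len(self._keys)     # probe the next slot
        if self._keys[i] is None:
            self._n += 1
        self._keys[i], self._vals[i] = key, value

    def _resize(self, capacity):
        old = [(k, v) for k, v in zip(self._keys, self._vals) if k is not None]
        self._keys = [None] * capacity
        self._vals = [None] * capacity
        self._n = 0
        for k, v in old:                      # re-hash every entry: the CPU cost
            self.put(k, v)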
A min-max heap can be useful to implement a double-ended priority queue because of its constant-time find-min and find-max operations. We can also delete the minimum and maximum elements of the min-max heap in O(log2 n) time. Sometimes, though, we may also want to delete an arbitrary node in the min-max heap, and this can be done in O(log2 n), according to the paper which introduced min-max heaps:
...
The structure can also be generalized to support the operation Find(k) (determine the kth smallest value in the structure) in constant time and the operation Delete(k) (delete the kth smallest value in the structure) in logarithmic time, for any fixed value (or set of values) of k.
...
How exactly do I perform a deletion of the kth element on a min-max heap?
I don't consider myself an "expert" in the fields of algorithms and data structures, but I do have a detailed understanding of binary heaps, including the min-max heap. See, for example, my blog series on binary heaps, starting with http://blog.mischel.com/2013/09/29/a-better-way-to-do-it-the-heap/. I have a min-max implementation that I'll get around to writing about at some point.
Your solution to the problem is correct: you do indeed have to bubble up or sift down to re-adjust the heap when you delete an arbitrary node.
Deleting an arbitrary node in a min-max heap is not fundamentally different from the same operation in a max-heap or min-heap. Consider, for example, deleting an arbitrary node in a min-heap. Start with this min-heap:
     0
   4   1
  5 6 2 3
Now if you remove the node 5 you have:
     0
   4   1
    6 2 3
You take the last node in the heap, 3, and put it in the place where 5 was:
     0
   4   1
  3 6 2
In this case you don't have to sift down because it's already a leaf, but it's out of place because it's smaller than its parent. You have to bubble it up to obtain:
     0
   3   1
  4 6 2
The same rules apply for a min-max heap. You replace the element you're removing with the last item from the heap, and decrease the count. Then, you have to check to see if it needs to be bubbled up or sifted down. The only tricky part is that the logic differs depending on whether the item is on a min level or a max level.
In your example, the heap that results from the first operation (replacing 55 with 31) is invalid because 31 is smaller than 54. So you have to bubble it up the heap.
One other thing: removing an arbitrary node is indeed a log2(n) operation. However, finding the node to delete is an O(n) operation unless you have some other data structure keeping track of where nodes are in the heap. So, in general, removal of an arbitrary node is considered O(n).
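In code, the whole operation on a plain list-based min-heap might look like this sketch (names mine). If the bubble-up moves the element, the subtree under its old spot is already valid and the sift-down becomes a no-op, which is why it's safe to simply call both:

def bubble_up(heap, i):
    while i > 0 and heap[i] < heap[(i - 1) // 2]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

def sift_down(heap, i):
    n = len(heap)
    while True:
        smallest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and heap[child] < heap[smallest]:
                smallest = child
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def delete_at(heap, i):
    """Delete the element at index i of a list-based binary min-heap."""
    last = heap.pop()        # take the last node...
    if i < len(heap):        # (unless it was the one being deleted)
        heap[i] = last       # ...put it where the deleted node was,
        bubble_up(heap, i)   # move it up if it's smaller than its parent,
        sift_down(heap, i)   # otherwise move it down if a child is smaller

Deleting the 5 from the example heap above with delete_at reproduces the final picture: [0, 4, 1, 5, 6, 2, 3] becomes [0, 3, 1, 4, 6, 2].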
What led me to develop this solution (which I'm not 100% sure is correct) is the fact that I actually found a solution to delete any node in a min-max heap, but it's wrong.
The wrong solution can be found here (implemented in C++) and here (implemented in Python). I'm going to present the just-mentioned wrong Python solution, which is more accessible to everyone.
The solution is the following:
def DeleteAt(self, position):
    """delete given position"""
    self.heap[position] = self.heap[-1]
    del(self.heap[-1])
    self.TrickleDown(position)
Now, suppose we have the following min-max heap:
level 0                  10
level 1          92              56
level 2      41      54      23      11
level 3    69  51    55  65    37  31
as far as I've checked, this is a valid min-max heap. Now, suppose we want to delete the element 55, which in a 0-based array would be found at index 9 (if I counted correctly).
What the solution above would do is simply take the last element of the array, in this case 31, and put it at position 9:
level 0                  10
level 1          92              56
level 2      41      54      23      11
level 3    69  51    31  65    37  55
it would delete the last element of the array (which is now 55), and the resulting min-max heap would look like this:
level 0                  10
level 1          92              56
level 2      41      54      23      11
level 3    69  51    31  65    37
and finally it would "trickle-down" from that position (i.e. where we now have the number 31).
"Trickle-down" would check if we're on an even (or min) or odd (or max) level: we're on an odd level (3), so "trickle-down" would call "trickle-down-max" starting from 31, but since 31 has no children, it stops (check the original paper above if you don't know what I'm talking about).
But, as you can observe, that leaves the data structure in a state that is no longer a min-max heap: 54, which is on an even level and therefore should be smaller than all of its descendants, is greater than 31, one of its descendants.
This made me think that we couldn't just look at the children of the node at position, but that we also needed to check from that position upwards, that maybe we needed to use "trickle-up" too.
In the following reasoning, let x be the element at position after we delete the element that we wanted to delete, and before any fix-up operations have run. Let p be its parent (if any).
The idea of my algorithm is really that one, and more specifically is based on the fact that:
If x is on an odd level (like in the example above), and we exchange it with its parent p, which is on an even level, that does not break any rules/invariants of the min-max heap from the new x's position downwards.
The same reasoning (I think) applies in the reversed situation, i.e. if x was originally on an even level and it was greater than its parent.
Now, if you noticed, once x has been exchanged with its parent and sits on an even (respectively, odd) level, the only thing that could still need fixing is that x may be smaller (respectively, greater) than its ancestors on the even (respectively, odd) levels above, so we have to keep checking upwards.
This of course didn't seem to be the whole solution to me: I also wanted to check whether the previous parent of x, i.e. p, was in a correct position.
If p, after the exchange with x, is on an odd (respectively, even) level, it could be smaller (respectively, greater) than some of its descendants, because it was previously on an even (respectively, odd) level. So I thought we needed a "trickle-down" there.
Regarding whether p is in a correct position with respect to its ancestors, I think the reasoning is similar to the one above (but I'm not 100% sure).
Putting this together I came up with the solution:
function DELETE(H, i):
    // H is the min-max heap array
    // i is the index of the node we want to delete
    // (I assume, for simplicity, it's not out of the bounds of the array)
    if i is the last index of H:
        remove and return H[i]
    else:
        l = get_last_index_of(H)
        swap(H, i, l)
        d = DELETE(H, l)
        // d is the element we wanted to remove initially,
        // and was initially at position i.
        // So, at index i we now have what was the last element of H.
        push_up(H, i)
        push_down(H, i)
        return d
This seems to work according to an implementation of a min-max heap that I made and that you can find here.
Note also that the solution runs in O(log2 n) time, because we're just calling "push-up" and "push-down", each of which runs in O(log2 n) time.
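For reference, here is a sketch (mine, not code from the paper) of what push_down and push_up can look like for an array-based min-max heap, following the trickle-down and bubble-up operations of the Atkinson et al. paper. Plugged into the DELETE pseudocode above, it reproduces the fix described earlier (31 bubbling up past 54):

from operator import lt, gt

def on_min_level(i):
    # the root (index 0) is on level 0; even levels are min levels
    return ((i + 1).bit_length() - 1) % 2 == 0

def push_down(H, i):
    n = len(H)
    while True:
        wins = lt if on_min_level(i) else gt   # "more extreme" for this level
        children = [c for c in (2 * i + 1, 2 * i + 2) if c < n]
        grandkids = [g for c in children for g in (2 * c + 1, 2 * c + 2) if g < n]
        m = i                                  # most extreme of i, children, grandkids
        for j in children + grandkids:
            if wins(H[j], H[m]):
                m = j
        if m == i:
            return
        H[i], H[m] = H[m], H[i]
        if m in grandkids:
            parent = (m - 1) // 2
            if wins(H[parent], H[m]):          # repair the level in between
                H[m], H[parent] = H[parent], H[m]
            i = m                              # keep trickling from the grandchild
        else:
            return

def push_up(H, i):
    if i == 0:
        return
    parent = (i - 1) // 2
    if on_min_level(i):
        if H[i] > H[parent]:                   # belongs among the max levels
            H[i], H[parent] = H[parent], H[i]
            grandparent_climb(H, parent, gt)
        else:
            grandparent_climb(H, i, lt)
    else:
        if H[i] < H[parent]:                   # belongs among the min levels
            H[i], H[parent] = H[parent], H[i]
            grandparent_climb(H, parent, lt)
        else:
            grandparent_climb(H, i, gt)

def grandparent_climb(H, i, wins):
    # hop two levels at a time while the element beats its grandparent
    while i > 2 and wins(H[i], H[(i - 3) // 4]):
        g = (i - 3) // 4
        H[i], H[g] = H[g], H[i]
        i = g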
Am I correct in saying that there would be many ways to perform a Quick Sort?
For argument's sake, let's use the first textbook's numbers:
20 47 12 53 32 84 85 96 45 18
This book says to swap the 18 and 20 (in the book, the 20 is red and the 18 is blue).
Basically it keeps moving the blue pointer until the numbers are:
18 12 20 53 32 84 85 96 45 47
Now it says (and this is obvious to me) that all the numbers to the left of the 20 are less than it and all of the numbers to the right are greater than it, but it never names the 20 as a "pivot", which is how most other resources talk about it. Then, as all the other methods state, it does a quick sort on the two sides (it only covers sorting the right half of the list), and we end up with:
47 32 45 53 96 85 84
and there the book ends. Now I know from the other resources that once all of the sublists are in order they are put back together. I guess I understand this, but I am constantly confused by the fact that the one "Cambridge approved" textbook differs from the second one, which talks about finding a pivot by picking the median.
What's the best way to find a "pivot" for a list?
What is given in your textbook is the same pivot-based concept; they just haven't mentioned the terminology there. But anyway, the concepts are the same.
What's the best way to find a "pivot" for a list?
There's no fixed way of selecting the pivot element. You can select any element of the array: the first, the second, the last, etc. It can also be selected at random for a given array.
But scientists and mathematicians generally talk about the median element, i.e. the middle value of the list, for symmetry reasons: it splits the list into two equal halves, thereby reducing the depth of the recursion.
If you select the first or the last element of an already (or nearly) sorted array, the partitions are maximally unbalanced, so there will be many more recursive calls, moving you closer to the worst-case scenario. The extra recursive calls come from separately quick-sorting the two unbalanced partitions.
Theoretically, choosing the median element as the pivot guarantees the least number of recursive calls and guarantees Theta(n log n) running time.
However, finding this median is done with a selection algorithm, and if you want to guarantee that the selection takes linear time, you need the median-of-medians algorithm, which has poor constants.
If you choose the first (or last) element as the pivot, you are guaranteed to get poor performance on a sorted or almost-sorted array, which is pretty likely to be your input in many applications, so that's not a good choice either. Choosing the first/last element of the array is actually a bad idea.
A good, solid way to choose the pivot is at random: draw a random index r = rand([0, length(array))), and take the r-th element as your pivot.
While there is still a theoretical possibility of hitting the worst case here, it is:
- very unlikely, and
- hard for a malicious user to exploit, since predicting the worst-case input is difficult, especially if the random function and/or seed is unknown to him.
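For example, here is a random-pivot quicksort sketch in Python (names mine; Lomuto partition for brevity):

import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a uniformly random pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    r = random.randint(lo, hi)      # pick a random pivot...
    a[r], a[hi] = a[hi], a[r]       # ...and park it at the end
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):         # Lomuto partition
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]       # the pivot lands at its final position i
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)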
Let's say I have a list of objects that are sorted by a specific field on that object. If one of the objects changes that property, its position in the sorted list would need to be updated.
What sorting algorithm or "tricks" could I use to very quickly sort this list, given that it falls out of sort only one item at a time?
The data structure is an array, and I have direct access to the index of the changed item.
I am using Scala for this, but any general tips or pointers would be helpful too.
If the list is sorted, you can simply remove the element you're about to change from the list and, after changing it, "binary-insert" it back, no? Finding the insertion point takes about log(n) steps, though in an array the removal and re-insertion themselves still shift O(n) elements.
If you can, change from an array to a java.util.TreeMap: both removal and insertion will be log(n) operations: which will be faster than your O(1) access + O(n) re-insertion solution using an array.
Depending on whether the new value is larger than, or smaller than, the previous one, you could "bubble" it in place.
The pseudo-code would look something like this:
if new value larger than old value
    then if new value is larger than next value in collection
        then swap the value with the next value
        iterate until value is not larger than next value
else if new value is smaller than previous value in collection
    then swap the value with the previous value
    iterate until value is not smaller than the previous value
Of course, a better way would be to use binary search.
First, locate the new spot in the collection where the element should be. Then, shift elements into place. If the new spot index is greater than the current spot index, you shift elements down one element, otherwise you shift them up. You shift elements starting from the spot you previously occupied, to the one you want to occupy. Then you store the value into the spot you found.
For instance, assume this collection:
 a  b  c  d  e  f  g  h  i   j
10 20 30 40 50 60 70 80 90 100
Then you want to change the value of the f element from 60 to 95.
First you figure out where it should be. Using binary search, we found that it should be between 90 and 100:
 a  b  c  d  e  f  g  h  i   j
10 20 30 40 50 60 70 80 90 100
                           ^
                           +- here
Then you shift elements from the current position down one element, like this:
 a  b  c  d  e  f  g  h  i   j
10 20 30 40 50 60 70 80 90 100  <-- from this
10 20 30 40 50 70 80 90 ?? 100  <-- to this
And then you store the value into the ?? space, which gives you this formation
 a  b  c  d  e  g  h  i  f   j
10 20 30 40 50 70 80 90 95 100
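In Python, for instance, the whole update can be written as the following sketch: bisect does the binary search for the new spot, and del/insert do the shifting, so the update is O(n) overall with an O(log n) search inside.

from bisect import bisect_left

def update_sorted(a, i, new_value):
    """Change the element at index i of sorted list a, keeping a sorted."""
    del a[i]                          # close the gap: O(n) shift
    j = bisect_left(a, new_value)     # binary search for the new spot: O(log n)
    a.insert(j, new_value)            # open a gap and store: O(n) shift

Applying update_sorted([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 5, 95) leaves the list as 10 20 30 40 50 70 80 90 95 100, exactly the final formation shown above.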
If the list is really big and a large number of update operations is expected, a simple random-access array or linked list will be too slow: each update operation will cost O(n). With small lists and/or a small number of updates this is adequate.
For larger lists, you can achieve O(log(n)) updates by using a sorted O(log(n)) data structure (AVL/RB-trees, Skip-Lists,segment trees etc.). A simple implementation can involve removing the element to be updated, changing the value and then reinserting it. Many popular languages have some sort of sorted data structure in their library (e.g. TreeMap/TreeSet in Java, multiset/multimap in C++'s STL), or you can easily find a free implementation for your language.
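If you want that O(log n) behavior in Python, for instance, the same remove-then-reinsert pattern with sortedcontainers' SortedList (a third-party ordered sequence, standing in here for TreeMap/multiset) is just:

from sortedcontainers import SortedList  # third-party ordered multiset stand-in

def update_value(items, old_value, new_value):
    """items is a SortedList; both operations are roughly O(log n)."""
    items.remove(old_value)
    items.add(new_value)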
Moving the out-of-order element to the left or right in the list seems the optimal solution.
For an array, inserting an element into the correct position would be O(n), because you need to copy the array elements to make room for an extra element. You could find the index you need to insert at by doing a binary search (O(log n)) or linear search (O(n)). Whatever choice you make, the algorithm as a whole will be O(n).
The only way to do this very quickly is to use a data structure that's better suited to this situation: a binary search tree. Insertion would be O(log n) if the tree remains decently balanced (Use a self-balancing binary search tree to ensure this, or hope your data will not be inserted in a highly regular order to approximate O(log n).)
O(log n) is way faster than O(n) for even moderately large lists, so if you have lists that may be nearly arbitrarily large and really care about sorting performance, use a binary search tree.
You could just do one iteration of bubble sort: start from the beginning of the list and iterate until you find the out-of-order element. Then move it in the appropriate direction until it falls into place. This gives you at worst about 2N steps.
Remove the one item and re-add it into the correct position. If you are only doing one item, your max run time is N.
If you are doing more than one, you should wait till they are all done and then re-sort. But you'll need to tell us a lot more about your problem space: the quickest approach is constrained by memory and other factors, which you'll need to determine to pick the right algorithm.