I have been learning about sorting.
Most sorting algorithms (merge, quick, etc.) utilise arrays.
I was wondering what would happen if I did not sort the array in place.
An algorithm I thought of is:
Iterate through each element in array - O(n).
For each element, compare the element with starting and ending element of a doubly linked list.
Add the element to correct position in linked list. (Start iterating from start/end of list based on which one would be faster).
When all elements from the original array have been sorted into the list, create a background thread which copies them back into the array. Until the copy is done, return the element at a given index by iterating over the list.
When copy is done, return elements through array index.
Now, what would be the time complexity of this, and how do I calculate it?
Let's go through everything one step at a time.
Iterate through each element in array - O(n).
Yep!
For each element, compare the element with starting and ending element of a doubly linked list.
Add the element to correct position in linked list. (Start iterating from start/end of list based on which one would be faster).
Let's suppose that the doubly-linked list currently has k elements in it. Unfortunately, just by looking at the front and back element of the list, you won't be able to tell where in the list the element is likely to go. It's quite possible that your element is closer in value to the front element of the list than the back, but would actually belong just before the back element. You also don't have random access in a linked list, so in the worst case you may have to scan all k elements of the linked list trying to find the spot where this element belongs. That means that the work done is in the worst case going to be O(k). Now, each iteration of the algorithm increases k (the number of elements in the list) by one, so the work done is in the worst case 1 + 2 + 3 + ... + n = Θ(n²).
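For concreteness, here is a minimal Java sketch of that insertion step, assuming the list is a java.util.LinkedList (which is doubly linked); the method name and the use of Integer elements are just for illustration:

    import java.util.LinkedList;
    import java.util.ListIterator;

    // Sketch of the insertion step described above: scan until the first
    // element larger than x, then insert x just before it. In the worst case
    // all k elements are scanned, which is the O(k) cost discussed above.
    static void insertIntoSortedList(LinkedList<Integer> list, int x) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() > x) {   // found the first element larger than x
                it.previous();     // step back so add() places x before it
                break;
            }
        }
        it.add(x);                 // appends at the end if no larger element exists
    }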
When all elements from the original array have been sorted into the list, create a background thread which copies them back into the array. Until the copy is done, return the element at a given index by iterating over the list.
When copy is done, return elements through array index.
This is an interesting idea and it's hard to measure the complexity. If the background thread gets starved out or is really slow, then the cost of looking up any element will be O(n) in the worst case because you may have to scan over half the elements in the list to find the one you're looking for.
In total, your algorithm runs in time O(n²) and uses Θ(n) memory. It's essentially a variant of insertion sort (as @Yu Hao pointed out) and, in practice, I'd expect that this would be substantially slower than just using a standard O(n log n) sorting algorithm, or even an in-place insertion sort, due to the extra memory overhead and poor locality of reference afforded by linked lists.
The algorithm you describe is basically a variant of insertion sort.
The major reason for using a linked list here is to avoid the extra swapping of elements that arrays require. Comparing elements with both the head and tail of the doubly linked list provides a minor performance improvement, if any.
The time complexity is still O(N²) for random input.
Write four O(1)-time procedures to insert elements into and delete elements from both ends of a deque constructed from an array.
In my implementation I have maintained 4 pointers: front1, rear1, front2, rear2.
Do you have any other algorithm with fewer pointers and O(1) complexity? Please explain.
There are two common ways to implement a deque:
Doubly linked list: You implement a doubly linked list, and maintain pointers to the front and the end of the list. It is easy to both insert and remove the start/end of the linked list in O(1) time.
A circular dynamic array: In here, you have an array, which is treated as circular array (so elements in index=arr.length-1 and index=0 are regarded as adjacent).
In this implementation you hold the index number of the "head", and the "tail". Adding element to the "head" is done to index head-1 (while moving the head backward), and adding element to the tail is done by writing it to index tail+1.
This method is amortized O(1), and has better constants than the linked list implementation. However, it is not "strict worst case" O(1), since if the number of elements exceeds the size of the array, you need to allocate a new array and move the elements from the old one to the new one. This takes O(n) time (but only needs to happen after at least O(n) operations), and thus it is O(1) by amortized analysis, but an individual operation can still take O(n) from time to time.
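A rough Java sketch of the circular-array variant (resizing and bounds checks are omitted, and the class and field names are only for illustration):

    // Circular-array deque: head marks the front element, size the element count.
    // A real implementation would grow the array when size == data.length.
    class CircularArrayDeque {
        private int[] data = new int[8];
        private int head = 0;
        private int size = 0;

        void addFirst(int x) {
            head = (head - 1 + data.length) % data.length;  // move head backward
            data[head] = x;
            size++;
        }

        void addLast(int x) {
            data[(head + size) % data.length] = x;          // slot just past the tail
            size++;
        }

        int removeFirst() {
            int x = data[head];
            head = (head + 1) % data.length;
            size--;
            return x;
        }

        int removeLast() {
            size--;
            return data[(head + size) % data.length];
        }
    }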
I'm trying to balance a set of (Million +) 3D points using a KD-tree and I have two ways of doing it.
Way 1:
Use an O(n) algorithm to find the arraysize/2-th largest element along a given axis and store it at the current node
Iterate over all the elements in the vector and for each, compare them to the element I just found and put those smaller in newArray1, and those larger in newArray2
Recurse
Way 2:
Use quicksort, O(n log n), to sort all the elements in the array along a given axis, take the element at position arraysize/2 and store it in the current node.
Then put all the elements from index 0 to arraysize/2-1 in newArray1, and those from arraysize/2 to arraysize-1 in newArray2
Recurse
Way 2 seems more "elegant", but way 1 seems faster, since the median search and the iteration are both O(n), so I get O(2n), which just reduces to O(n). But then at the same time, even though way 2 takes O(n log n) time to sort, splitting the array into two can be done in constant time; does that make up for the O(n log n) time for sorting?
What should I do? Or is there an even better way to do this that I'm not even seeing?
How about Way 3:
Use an O(n) algorithm such as QuickSelect to ensure that the element at position length/2 is the correct element, all elements before are less, and all afterwards are larger than it (without sorting them completely!) - this is probably the algorithm you used in your Way 1 step 1 anyway...
Recurse into each half (except middle element) and repeat with next axis.
Note that you actually do not need to make "node" objects. You can actually keep the tree in a large array. When searching, start at length/2 with the first axis.
I've seen this trick being used by ELKI. It uses very little memory and code, which makes the tree quite fast.
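A rough Java sketch of that build, assuming the points are stored as rows of a double[][]; for brevity it sorts each sub-range instead of using an O(n) selection such as QuickSelect, so this version is O(n log² n) rather than O(n log n), but the recursion and the in-array layout (median at the middle index of each range) are the same:

    import java.util.Arrays;
    import java.util.Comparator;

    // Build the kd-tree in place: the median of each sub-range ends up at its
    // middle index, so the "node" at position (lo + hi) / 2 splits the range.
    class KdBuild {
        static void build(double[][] points, int lo, int hi, int axis, int dims) {
            if (hi - lo <= 1) return;
            int mid = (lo + hi) / 2;
            // With QuickSelect this step would only partition around the median, in O(n).
            Arrays.sort(points, lo, hi, Comparator.comparingDouble(p -> p[axis]));
            build(points, lo, mid, (axis + 1) % dims, dims);      // left half
            build(points, mid + 1, hi, (axis + 1) % dims, dims);  // right half
        }
    }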
Another way:
Sort for each of the dimensions: O(K N log N). This will be performed only once; we will reuse the sorted lists for the dimensions later.
For the current dimension, find the median in O(1) time, split around the median in O(N) time, split the sorted arrays for each of the dimensions in O(KN) time as well, and recurse for the next dimension.
In that way, you perform the sorts only at the beginning, and then (K+1) splits/filterings per subtree, each around a known value (the median). For small K, this approach should be faster than the other approaches.
Note: The additional space needed for the algorithm can be decreased by the tricks pointed out by Anony-Mousse.
Notice that if the query hyper-rectangle contains many points (all of them for example) it does not matter if the tree is balanced or not. A balanced tree is useful if the query hyper-rects are small.
A sorting algorithm is stable if it preserves the relative order of any two elements with equal keys. Under which conditions is quicksort stable?
Quicksort is stable when no item is passed unless it has a smaller key.
What other conditions make it stable?
Well, it is quite easy to make a stable quicksort that uses O(N) space rather than the O(log N) that an in-place, unstable implementation uses. Of course, a quicksort that uses O(N) space doesn't have to be stable, but it can be made to be so.
I've read that it is possible to make an in-place quicksort that uses O(log N) memory, but it ends up being significantly slower (and the details of the implementation are kind of beastly).
Of course, you can always just go through the array being sorted and add an extra key that is its place in the original array. Then the quicksort will be stable and you just go through and remove the extra key at the end.
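A small Java sketch of that trick (the helper name is made up; with plain ints the effect is invisible, but the same decoration works for records carrying satellite data):

    import java.util.Arrays;
    import java.util.Comparator;

    // Decorate each value with its original index, compare by (value, index),
    // then strip the index. Equal keys can no longer swap places, so the sort
    // behaves stably regardless of the underlying algorithm.
    static int[] sortStably(int[] values) {
        int[][] decorated = new int[values.length][];
        for (int i = 0; i < values.length; i++) decorated[i] = new int[] { values[i], i };
        Arrays.sort(decorated, Comparator.<int[]>comparingInt(p -> p[0])
                                         .thenComparingInt(p -> p[1]));
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) out[i] = decorated[i][0];
        return out;
    }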
The condition is easy to figure out: just keep the original order for equal elements. There is no other condition that differs essentially from this one. The point is how to achieve it.
Here is a good example.
1. Use the middle element as the pivot.
2. Create two lists, one for smaller, the other for larger.
3. Iterate from the first element to the last and put elements into the two lists. Append elements that are smaller than the pivot, or equal to it and located before it, to the smaller list; append elements that are larger than the pivot, or equal to it and located after it, to the larger list. Then recurse on the smaller list and the larger list.
https://www.geeksforgeeks.org/stable-quicksort/
The latter two points are both necessary. Using the middle element as the pivot is optional. If you choose the last element as the pivot, just append all equal elements to the smaller list one by one from the beginning, and they will naturally keep their original order.
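A short Java sketch of that scheme (list-based, so it uses O(N) extra space; the method name is just for illustration):

    import java.util.ArrayList;
    import java.util.List;

    // Stable quicksort via two extra lists: one left-to-right pass sends equal
    // elements that appear before the pivot to "smaller" and equal elements
    // that appear after it to "larger", so ties keep their relative order.
    static List<Integer> stableQuickSort(List<Integer> a) {
        if (a.size() <= 1) return a;
        int mid = a.size() / 2;
        int pivot = a.get(mid);
        List<Integer> smaller = new ArrayList<>();
        List<Integer> larger = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            if (i == mid) continue;                       // the pivot is handled separately
            int x = a.get(i);
            if (x < pivot || (x == pivot && i < mid)) smaller.add(x);
            else larger.add(x);
        }
        List<Integer> out = new ArrayList<>(stableQuickSort(smaller));
        out.add(pivot);
        out.addAll(stableQuickSort(larger));
        return out;
    }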
You can add these to a list, if that's your approach:
When the elements have absolute ordering
When the implementation takes O(N) time to note the relative orderings and restore them after the sort
When the pivot chosen is ensured to have a unique key, or to be the first occurrence of its key in the current sublist.
I have a sorted list at hand. Now I add a new element to the end of the list. Which sorting algorithm is suitable for such a scenario?
Quicksort has a worst-case time complexity of O(n²) when the list is already sorted. Does this mean that the time complexity, if quicksort is used in the above case, will be close to O(n²)?
If you are adding just one element, find the position where it should be inserted and put it there. For an array, you can do binary search for O(logN) time and insert in O(N). For a linked list, you'll have to do a linear search which will take O(N) time but then insertion is O(1).
As for your question on quicksort: If you choose the first value as your pivot, then yes it will be O(N²) in your case. Choose a random pivot and your case will still be O(N log N) on average. However, the method I suggest above is both easier to implement and faster in your specific case.
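In Java that might look roughly like this for an array-backed list (Collections.binarySearch encodes the insertion point for a key that is not present):

    import java.util.Collections;
    import java.util.List;

    // Insert x into an already-sorted list: O(log N) to find the spot,
    // O(N) to shift elements, instead of appending and re-sorting.
    static void insertInOrder(List<Integer> sorted, int x) {
        int pos = Collections.binarySearch(sorted, x);
        if (pos < 0) pos = -pos - 1;   // negative result encodes the insertion point
        sorted.add(pos, x);
    }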
It depends on the implementation of the underlying list.
It seems to me that insertion sort will fit your needs, except in the case when the list is implemented as an array list. In that case too many moves will be required.
Rather than appending to the end of the list, you should do an insert operation.
That is, when adding 5 to [1,2,3,4,7,8,9] you'd want to do the "insert" by putting it where it belongs in the sorted list, instead of putting it at the end and then re-sorting the whole list.
You can quickly find the position to insert the item by using a binary search.
This is basically how insertion sort works, except it operates on the entire list. This method will have better performance than even the best sorting algorithm, for a single item. It may also be faster than appending at the end of the list, depending on your implementation.
I'm assuming you're using an array, since you talk about quicksort, so just adding an element would involve finding the place to insert it (O(log n)) and then actually inserting it (O(n)) for a total cost of O(n). Just appending it to the end and then resorting the entire list is definitely the wrong way to go.
However, if this is to be a frequent operation (i.e. if you have to keep adding elements while maintaining the sorted property) you'll incur an O(n^2) cost of adding another n elements to the list. If you change your representation to a balanced binary tree, that drops to O(n log n) for another n inserts, but finding an element by index will become O(n). If you never need to do this, but just iterate over the elements in order, the tree is definitely the way to go.
Of possible interest is the indexable skiplist which, for a slight storage cost, has O(log n) inserts, deletes, searches and lookups-by-index. Give it a look, it might be just what you're looking for here.
What exactly do you mean by "list" ? Do you mean specifically a linked list, or just some linear (sequential) data structure like an array?
If it's linked list, you'll need a linear search for the correct position. The insertion itself can be done in constant time.
If it's something like an array, you can add to the end and sort, as you mentioned. A sorted collection is only bad for Quicksort if the Quicksort is really badly implemented. If you select your pivot with the typical median-of-3 algorithm, a sorted list will give optimal performance.
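For reference, a small sketch of median-of-3 pivot selection (names are illustrative): on an already-sorted range it picks the middle element, which is what keeps quicksort at O(n log n) here.

    // Return the index of the median of a[lo], a[mid], a[hi]; on sorted input
    // this is the middle index, so the partition stays balanced.
    static int medianOfThree(int[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        int x = a[lo], y = a[mid], z = a[hi];
        if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
        if ((y <= x && x <= z) || (z <= x && x <= y)) return lo;
        return hi;
    }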
I have a set of positive numbers. Given a number not in the set, I want to find the next smallest and next largest numbers that are in the set. The only way I can think to do it now is to find the next smallest by decreasing by 1 until I find a number in the set, and then do the same for finding the next largest.
Motivation: I have a bunch of data in a hashmap, keyed by dates. I don't have a datapoint for every single date. If I have data for, say, 10/01/2000 as 60 and 10/05/2000 as 68, and I ask for 10/02/2000, I want to linearly interpolate. I should get 62.
It depends on if your set is sorted.
If your set is unsorted then finding the closest (higher and lower) is an O(n) operation and a fairly simple algorithm.
If your set is sorted then you can use a modified bisection search to find the answer in O(log n), which is obviously a lot better particularly on larger sets.
If you're doing this repeatedly it might be worth sorting the set, which incurs an O(n log n) cost that might be once off or not depending on how often the set changes. Some kind of tree sort may help improve future sorts as new items are added.
All this boils down to is binary search, provided you can get your data sorted. There are two options.
Sorted Container
If you keep your numbers in a sorted container, this is pretty easy. Instead of using a HashMap, put the data in a TreeMap, then you can efficiently find the next lower or next higher element. Java even has methods to do exactly what you want:
higherKey(K)
lowerKey(K)
This is efficient because TreeMap uses a red-black tree (a kind of balanced binary search tree) internally. higherKey and lowerKey simply start at the root and traverse the tree to find where your element should go.
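For the date-interpolation example from the question, a TreeMap sketch might look like this (day numbers stand in for real dates purely for brevity; floorEntry/ceilingEntry also match an exact key if it exists):

    import java.util.Map;
    import java.util.TreeMap;

    // Nearest keys on either side of a missing key, then linear interpolation.
    public class InterpolateDemo {
        public static void main(String[] args) {
            TreeMap<Integer, Double> data = new TreeMap<>();
            data.put(1, 60.0);   // 10/01/2000
            data.put(5, 68.0);   // 10/05/2000

            int query = 2;       // 10/02/2000
            Map.Entry<Integer, Double> lo = data.floorEntry(query);
            Map.Entry<Integer, Double> hi = data.ceilingEntry(query);
            double value = lo.getValue()
                + (hi.getValue() - lo.getValue())
                  * (query - lo.getKey()) / (hi.getKey() - lo.getKey());
            System.out.println(value);   // prints 62.0
        }
    }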
I'm not sure what language you're using, but in C++ you would use std::map, and the analogous methods are:
iterator lower_bound(const key_type& k)
iterator upper_bound(const key_type& k)
Array + Sorting
If you don't want to keep your data sorted all the time, you can always dump your data into an array (or any random access container), use sort, and then use the STL's binary search routines on the array:
lower_bound
upper_bound
In Java the analog would be to dump things into an ArrayList, call Java's sort(), then use binarySearch().
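A small sketch of that Java route; when binarySearch does not find the key, its negative return value encodes where the key would go, which gives the neighbours on each side (the class name is illustrative):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Sort once, then binary-search for a missing key and read its neighbours.
    public class NeighbourLookup {
        public static void main(String[] args) {
            List<Integer> keys = new ArrayList<>(List.of(9, 1, 5, 3));
            Collections.sort(keys);                           // [1, 3, 5, 9]
            int pos = Collections.binarySearch(keys, 4);      // negative: 4 is absent
            int insertionPoint = -pos - 1;                    // 2
            System.out.println(keys.get(insertionPoint - 1)); // next smaller: 3
            System.out.println(keys.get(insertionPoint));     // next larger: 5
        }
    }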
All the search routines here are O(log n) time. The cost of keeping your data sorted is O(n log n) with either a sorted container or with the array. With a sorted container, the cost is amortized over n insertions; with the array you pay it all at once when you call sort().
If you don't want to sort things at all, you can always use a linear search, but you will pay if you use this a lot, as it's an O(n) algorithm.
Put your data items into a tree, like an AVL tree, a red-black tree, or a B+/B- tree. Then you can search the ordered values.
Sort the numbers, then perform binary search on each key to bisect the set. You can then find which numbers are on either side of your missing key.
Convert the set to a list and sort it, then run a binary search for the number not in the set. The result will be the insertion point, i.e. the position at which the number would appear if it were present. If you call that n, then the element at index n-1 of the sorted list is the next smallest number and the element at index n of the sorted list is the next largest number.
You can also do this by keeping the set in sorted order as you construct it, then it becomes an easy matter to search for the insertion point. This approach is used by e.g. the floorEntry() and ceilingEntry() methods of Java's TreeMap.
Keep your set as a sorted list/array and perform bisection-search: e.g., in Python, a sorted list and the bisect module from the standard Python library match your needs to the hilt.
If you get the keys in an array, you can sort the array and find the index of the last element that is less than the desired element. Then you know the index of the key directly before your desired point, and the next element after that is the one directly after.
That should give you enough to interpolate.
(The data structure used need not be an array, anything that will sort is fine. A balanced binary tree, as suggested by others, would be ideal, especially if you plan to reuse the data later).
Finding the n'th element in an unsorted set is O(n). (Select Algorithm) Although here you can boil it down to a simpler, less general algorithm, if you always want the smallest & next smallest elements. But in general, finding the smallest, second smallest, etc. element within an unsorted list is O(n). (You should have been taught this in your algorithms class...)
Sorting a set, and then indexing the element is O(n log n)
Finding an element in a sorted set is O(log n) (binary search)
If you know that there will always be a data point for, say, each week, then keep your HashMap as it is and do what you suggest... That will be a constant time operation since you will be doing 14 hash table lookups (probing 7 days on each side of your search date), each taking O(1) primitive operations.
If you don't know how dense your data is and you can keep it in RAM, then put it into a balanced tree structure as suggested by many others. But this can be costly if you have very many dates and if you have to load the data over the network from a database.