Complexity of insertion - sorting

I have a sorted vector,
what is better:
using lower_bound and insert the element at its place in the vector.
or simply push_back the element into vector and sort it when it is required with sort()?

For that individual operation performed once, it is cheaper to use lower_bound and insert. lower_bound is O(log(n)) and insert will be (worst and average case) O(n), giving a final complexity of O(n).
If you don't need the vector sorted until after you're done inserting into it, it will be much cheaper to sort only once, since all the push_backs will be O(1)*O(n) = O(n) and the final sort will be O(n*log(n)), giving you a final complexity of O(n*log(n)), whereas the multiple lower_bound+insert will give you a complexity of O(n)*O(n) = O(n**2).
If you need the elements sorted throughout the insertion process, which one is cheaper will depend heavily on your usage pattern (how often and when you need the elements sorted). It can often be the case that you only need the elements sorted at the end of the construction process, which is why sorting at the end can be a very powerful tool in your tool belt. If that is not your usage pattern, you may want to at least consider using an associative container such as multi_set or multi_map.

Related

Best Case For Merge Sort

I've made a program that count cost of mergesort algorithm for different value of n, i've taken cost variable and i am incrementing it every time loop encounter or condition chech occure and when i get sorted array i gave that sorted array in and input to merge sort again and after that in third case i am reversing the sorted array so it would be worst case but for all three cases i am getting the same cost,so what would be the Best And Worst Case For Mergesort.
The cost of mergesort implemented classically either as a top-down recursive function or a bottom-up iterative with a small local array of pointers is the same: O(N.log(N)). The number of comparisons will vary depending on the actual contents of the array, but by at most a factor of 2.
You can improve this algorithm at a linear cost by adding an initial comparison between the last element of the left slice and the first element of the right slice in the merge phase. If the comparison yields <= then you can skip the merge phase for this pair of slices.
With this modification, a fully sorted array will sort much faster, with a linear complexity, making it the best case, and a partially sorted array will behave better as well.

How does merge sort have space complexity O(n) for worst case?

O(n) complexity means merge sort in worst case takes a memory space equal to the number of elements present in the initial array. But hasn't it created new arrays while making the recursive calls? How that space is not counted?
A worst case implementation of top down merge sort could take more space than the original array, if it allocates both halves of the array in mergesort() before making the recursive calls to itself.
A more efficient top down merge sort uses an entry function that does a one time allocation of a temp buffer, passing the temp buffer's address as a parameter to one of a pair of mutually recursive functions that generate indices and merge data between the two arrays.
In the case of a bottom up merge sort, a temp array 1/2 the size of the original array could be used, merging both halves of the array, ending up with the first half of data in the temp array, and the second half in the original array, then doing a final merge back into the original array.
However the space complexity is O(n) in either case, since constants like 2 or 1/2 are ignored for big O.
MergeSort has enough with a single buffer of the same size as the original array.
In the usual version, you perform a merge from the array to the extra buffer and copy back to the array.
In an advanced version, you perform the merges from the array to the extra buffer and conversely, alternately.
Note: This answer is wrong, as was pointed out to me in the comments. I leave it here as I believe it is helpful to most people who wants to understand these things, but remember that this algorithm is actually called in-place mergesort and can have a different runtime complexity than pure mergesort.
Merge sort is easy to implement to use the same array for everything, without creating new arrays. Just send the bounds in each recursive call. So something like this (in pseudocode):
mergesort(array) ->
mergesort'(array, 0, length of array - 1)
mergesort'(array, start, end) ->
mergesort'(array, start, end/2)
mergesort'(array, end/2+1, end)
merge(array, start, end/2, end/2+1, end)
merge(array, start1, end1, start2, end2) ->
// This function merges the two partitions
// by just moving elements inside array
In Merge Sort, space complexity is always omega(n) as you have to store the elements somewhere. Additional space complexity can be O(n) in an implementation using arrays and O(1) in linked list implementations. In practice implementations using lists need additional space for list pointers, so unless you already have the list in memory it shouldn't matter. edit if you count stack frames, then it's O(n)+ O(log n) , so still O(n) in case of arrays. In case of lists it's O(log n) additional memory.
That's why in merge-sort complexity analysis people mention 'additional space requirement' or things like that. It's obvious that you have to store the elements somewhere, but it's always better to mention 'additional memory' to keep purists at bay.

Complexity of maintaining a sorted list vs inserting all values then sorting

Would the time and space complexity to maintain a list of numbers in sorted order (i.e start with the first one insert it, 2nd one comes along you insert it in sorted order and so on ..) be the same as inserting them as they appear and then sorting after all insertions have been made?
How do I make this decision? Can you demonstrate in terms of time and space complexity for 'n' elements?
I was thinking in terms of phonebook, what is the difference of storing it in a set and presenting sorted data to the user each time he inserts a record into the phonebook VS storing the phonebook records in a sorted order in a treeset. What would it be for n elements?
Every time you insert into a sorted list and maintain its sortedness, it is O(logn) comparisons to find where to place it but O(n) movements to place it. Since we insert n elements this is O(n^2). But, I think that if you use a data structure that is designed for inserting sorted data into (such as a binary tree) then do a pass at the end to turn it into a list/array, it is only O(nlogn). On the other hand, using such a more complex data structure will use about O(n) additional space, whereas all other approaches can be done in-place and use no additional space.
Every time you insert into an unsorted list it is O(1). Sorting it all at the end is O(nlogn). This means overall it is O(nlogn).
However, if you are not going to make lists of many elements (1000 or less) it probably doesn't matter what big-O it is, and you should either focus on what runs faster for small data sets, or not worry at all if it is not a performance issue.
It depends on what data structure you are inserting them in. If you are asking about inserting in an array, the answer is no. It takes O(n) space and time to store the n elements, and then O(n log n) to sort them, so O(n log n) total. While inserting into an array may require you to move \Omega(n) elements so takes \Theta(n^2). The same problem will be true with most "sequential" data structures. Sorry.
On the other hand, some priority queues such as lazy leftist heaps, fibonacci heaps, and Brodal queues have O(1) insert. While, a Finger Tree gives O(n log n) insert AND linear access (Finger trees are as good as a linked list for what a linked list is good for and as good as balanced binary search trees for what binary search trees are good for--they are kind of amazing).
There are going to be application-specific trade-offs to algorithm selection. The reasons one might use an insertion sort rather than some kind of offline sorting algorithm are enumerated on the Insertion Sort wikipedia page.
The determining factor here is less likely to be asymptotic complexity and more likely to be what you know about your data (e.g., is it likely to be already sorted?)
I'd go further, but I'm not convinced that this isn't a homework question asked verbatim.
Option 1
Insert at correct position in sorted order.
Time taken to find the position for i+1-th element :O(logi)
Time taken to insert and maintain order for i+1-th element: O(i)
Space Complexity:O(N)
Total time:(1*log 1 +2*log 2 + .. +(N-1)*logN-1) =O(NlogN)
Understand that this is just the time complexity.The running time can be very different from this.
Option 2:
Insert element O(1)
Sort elements O(NlogN)
Depending on the sort you employ the space complexity varies, though you can use something like quicksort, which doesn't need much space anyway.
In conclusion though both time complexity are the same, the bounds are weak and mathematically you can come up with better bounds.Also note that worst case complexity may never be encountered in practical situations, probably you will see only average cases all the time.If performance is such a vital issue in your application, you should test both sets of code on random sampling.Do tell me which one works faster after your tests.My guess is option 1.

sorting algorithm suitable for a sorted list

I have a sorted list at hand. Now i add a new element to the end of the list. Which sorting algorithm is suitable for such scenario?
Quick sort has worst case time complexity of O(n2) when the list is already sorted. Does this mean time complexity if quick sort is used in the above case will be close to O(n2)?
If you are adding just one element, find the position where it should be inserted and put it there. For an array, you can do binary search for O(logN) time and insert in O(N). For a linked list, you'll have to do a linear search which will take O(N) time but then insertion is O(1).
As for your question on quicksort: If you choose the first value as your pivot, then yes it will be O(N2) in your case. Choose a random pivot and your case will still be O(NlogN) on average. However, the method I suggest above is both easier to implement and faster in your specific case.
It depends on the implementation of the underlying list.
It seems to me that insertion sort will fit your needs except the case when the list is implemented as an array list. In this case too many moves will be required.
Rather than appending to the end of the list, you should do an insert operation.
That is, when adding 5 to [1,2,3,4,7,8,9] you'd result want to the "insert" by putting it where it belongs in the sorted list, instead of at the end and then re-sorting the whole list.
You can quickly find the position to insert the item by using a binary search.
This is basically how insertion sort works, except it operates on the entire list. This method will have better performance than even the best sorting algorithm, for a single item. It may also be faster than appending at the end of the list, depending on your implementation.
I'm assuming you're using an array, since you talk about quicksort, so just adding an element would involve finding the place to insert it (O(log n)) and then actually inserting it (O(n)) for a total cost of O(n). Just appending it to the end and then resorting the entire list is definitely the wrong way to go.
However, if this is to be a frequent operation (i.e. if you have to keep adding elements while maintaining the sorted property) you'll incur an O(n^2) cost of adding another n elements to the list. If you change your representation to a balanced binary tree, that drops to O(n log n) for another n inserts, but finding an element by index will become O(n). If you never need to do this, but just iterate over the elements in order, the tree is definitely the way to go.
Of possible interest is the indexable skiplist which, for a slight storage cost, has O(log n) inserts, deletes, searches and lookups-by-index. Give it a look, it might be just what you're looking for here.
What exactly do you mean by "list" ? Do you mean specifically a linked list, or just some linear (sequential) data structure like an array?
If it's linked list, you'll need a linear search for the correct position. The insertion itself can be done in constant time.
If it's something like an array, you can add to the end and sort, as you mentioned. A sorted collection is only bad for Quicksort if the Quicksort is really badly implemented. If you select your pivot with the typical median of 3 alogrithm, a sorted list will give optimal performance.

Fastest data structure for inserting/sorting

I need a data structure that can insert elements and sort itself as quickly as possible. I will be inserting a lot more than sorting. Deleting is not much of a concern and nethier is space. My specific implementation will additionally store nodes in an array, so lookup will be O(1), i.e. you don't have to worry about it.
If you're inserting a lot more than sorting, then it may be best to use an unsorted list/vector, and quicksort it when you need it sorted. This keeps inserts very fast. The one1 drawback is that sorting is a comparatively lengthy operation, since it's not amortized over the many inserts. If you depend on relatively constant time, this can be bad.
1 Come to think of it, there's a second drawback. If you underestimate your sort frequency, this could quickly end up being overall slower than a tree or a sorted list. If you sort after every insert, for instance, then the insert+quicksort cycle would be a bad idea.
Just use one of the self-balanced binary search trees, such as red-black tree.
Use any of the Balanced binary trees like AVL trees. It should give O(lg N) time complexity for both of the operations you are looking for.
If you don't need random access into the array, you could use a Heap.
Worst and average time complexity:
O(log N) insertion
O(1) read largest value
O(log N) to remove the largest value
Can be reconfigured to give smallest value instead of largest. By repeatedly removing the largest/smallest value you get a sorted list in O(N log N).
If you can do a lot of inserts before each sort then obviously you should just append the items and sort no sooner than you need to. My favorite is merge sort. That is O(N*Log(N)), is well behaved, and has a minimum of storage manipulation (new, malloc, tree balancing, etc.)
HOWEVER, if the values in the collection are integers and reasonably dense, you can use an O(N) sort, where you just use each value as an index into a big-enough array, and set a boolean TRUE at that index. Then you just scan the whole array and collect the indices that are TRUE.
You say you're storing items in an array where lookup is O(1). Unless you're using a hash table, that suggests your items may be dense integers, so I'm not sure if you even have a problem.
Regardless, memory allocating/deleting is expensive, and you should avoid it by pre-allocating or pooling if you can.
I had some good experience for that kind of task using a Skip List
At least in my case it was about 5 times faster compared to adding everything to a list first and then running a sort over it at the end.

Resources