Is there a data structure whose elements can be indexed and whose insertion runtime is O(1)? For example, I could index the data structure like so: a[4], and yet when inserting an element at an arbitrary place in the data structure, the runtime is O(1). Note that the data structure does not maintain sorted order -- just the ability for each sequential element to have an index.
I don't think it's possible, since inserting anywhere other than the beginning or end of the ordered data structure would mean that all the indices after the insertion point must be updated to reflect that their index has increased by 1, which takes worst-case O(n) time. If the answer is no, could someone prove it mathematically?
EDIT:
To clarify, I want to maintain the order of insertion of elements, so upon inserting, the item inserted remains sequentially between the two elements it was placed between.
The problem that you are looking to solve is called the list labeling problem.
There are lower bounds on the cost that depend on the relationship between the maximum number of labels you need (n) and the number of possible labels (m).
If n is in O(log m), i.e., if the number of possible labels is exponential in the number of labels you need at any one time, then O(1) cost per operation is achievable... but this is not the usual case.
If n is in O(m), i.e., if they are proportional, then O(log^2 n) per operation is the best you can do, and the algorithm is complicated.
If n^2 <= m, then you can do O(log n). Amortized O(log n) is simple, and O(log n) worst case is hard. Both algorithms are described in this paper by Dietz and Sleator. The hard way makes use of the O(log^2 n) algorithm mentioned above.
HOWEVER, maybe you don't really need labels. If you just need to be able to compare the order of two items in the collection, then you are solving a slightly different problem called "list order maintenance". This problem can actually be solved in constant time -- O(1) cost per operation and O(1) cost to compare the order of two items -- although again O(1) amortized cost is a lot easier to achieve.
When inserting into slot i, append the element which was first at slot i to the end of the sequence.
If the sequence capacity must be grown, then this growing may not necessarily be O(1).
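For concreteness, here is a minimal sketch of that trick with a std::vector (insert_at is my own illustrative name); note that the displaced element loses its sequential position, which is exactly the ordering guarantee the question's edit asks for:

#include <cstddef>
#include <vector>

// O(1) amortized insert at slot i: evict the current occupant of
// slot i to the end of the sequence, then overwrite slot i.
// Assumes i < a.size().
void insert_at(std::vector<int>& a, std::size_t i, int value)
{
    a.push_back(a[i]);  // displaced element goes to the back
    a[i] = value;       // new element takes over slot i
}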
I am at an intermediate level in Algorithms. Recently when I was comparing different sorting algorithms this thing stuck to me.
How do you compare different sorting algorithms when the data is arriving incrementally rather than already being present?
I have compared a few myself, but I am not very sure whether my approach is right.
Insertion Sort: As the name itself suggests, it presents a nice solution to the problem, with O(n^2) complexity.
Heap Sort: The technique is to build the heap for each data item pushed. It corresponds to a sift-up operation with O(log n) complexity, and then exchanging the first element with the last element and Heapify to restore the heap properties. Heapify is again O(log n), so the overall complexity is O(n log n log n). But if we have all the data items present with us already, it is only O(n log n), because we are doing only the Heapify operation on the data items after we have built the heap.
Selection sort: It needs all the data items before sorting, so I assume there is no solution to our problem using selection sort.
Tree sort: The dominant step in this technique is to build a tree, which has a worst-case time complexity of O(n^2). An in-order traversal would then do.
I am not very sure about the other algorithms.
I am posting this question because I am looking for a thorough grasp of these sorting techniques.
Pardon me if you find any discrepancies in either my question or my comparisons.
Building a heap one item at a time doesn't add any extra complexity (and your O(n log n log n) doesn't make sense to me at all). In fact, the normal way to build a heap is to add one item at a time to the heap anyway. That is to say, even if you have all your items available to start with, you start by building a heap of the first two, then adding the third, the fourth, and so on. If you're receiving the items one at a time, it doesn't cause any extra work at all. It ends up O(N log N), just like always.
Just to be clear: swapping an item to the first position and then sifting is not for when you're adding items -- it's for when you're removing items. That is to say: when you're done inserting all the items into the heap, you're ready to produce sorted output. You do that by swapping the item at the base of the heap (i.e., the first item in the array) to the end of the array, then taking the item that started out at the end of the array, and sifting it down in the heap until the heap property is restored.
When you're adding new items to the heap, you add them at the end, then sift them up to restore the heap property. In the typical case where they're all available up front, you do that by starting at the beginning of the array and keeping a partition between items that form a heap and items that are still just a jumbled mess. To build all the items into a heap, you move that partition over one item, sift that item up to where it belongs, and repeat. With an "online" version that doesn't have all the data immediately available, all that changes is that after you sift an item into the heap, you have to wait for the next item to arrive before you sift it into the heap (where you'd normally insert the next item immediately).
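For illustration, a minimal online-heap sketch using the standard library's heap helpers (sample values arbitrary): std::push_heap performs the sift-up as each item arrives, and std::sort_heap performs the repeated swap-and-sift-down pass at the end.

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> heap;
    // Items arriving one at a time: append, then sift up -- O(log n) each.
    for (int item : {5, 1, 4, 2, 3})
    {
        heap.push_back(item);
        std::push_heap(heap.begin(), heap.end());
    }
    // Input done: repeatedly swap the max to the back and sift down.
    std::sort_heap(heap.begin(), heap.end());
    for (int v : heap)
        std::cout << v << ' ';  // prints 1 2 3 4 5
    std::cout << '\n';
}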
Building a tree is only O(N^2) if 1) the items arrive perfectly ordered, 2) you don't realize that, and 3) you don't re-balance the tree as you build it. If the items are in fairly random order to start with, even without balancing, your complexity will be close to O(N log N). In a typical case of building something like a Red-Black or AVL tree, it'll remain O(N log N) even if the items arrive in order (though with lower constants if they arrive in more or less random order instead of sorted). Likewise, if you use a B-tree, you'll get O(N log N) complexity even if the data arrive in order.
Insertion Sort: As the name itself suggests, it presents a nice solution to the problem, with O(n^2) complexity.
I agree, not all that great, in either case.
Heap Sort: The technique is to build the heap for each data item pushed. It corresponds to a sift-up operation with O(log n) complexity, and then exchanging the first element with the last element and Heapify to restore the heap properties. Heapify is again O(log n), so the overall complexity is O(n log n log n). But if we have all the data items present with us already, it is only O(n log n), because we are doing only the Heapify operation on the data items after we have built the heap.
You can do this with a min-heap binary tree, but if you're using an array, this may be difficult.
Selection sort: It needs all the data items before sorting, so I assume there is no solution to our problem using selection sort.
Terrible in either case.
Tree sort: The dominant step in this technique is to build a tree, which has a worst-case time complexity of O(n^2). An in-order traversal would then do.
O(n log n) if you use radix sort. See the heap sort comments about arrays.
Other sorts:
Counting sort: Terrible if you don't know the data beforehand, since you don't know how much space you will need in advance.
Radix Sort: Still amazing -- it will be the best sorting algorithm in either case, especially when you do it as a byte sort. All you do is feed each item into a histogram as the data comes in and store it in a temporary linked list, then do the radix sort afterwards.
Bucket Sort: Can go either way -- much better than counting sort, not as good as radix sort.
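As a concrete sketch of the byte-at-a-time idea, here is my own minimal LSD radix sort over unsigned 32-bit keys (the per-pass histograms could just as well be accumulated as the data streams in, as the answer suggests):

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort: four stable counting-sort passes, one per byte.
void radix_sort(std::vector<std::uint32_t>& a)
{
    std::vector<std::uint32_t> tmp(a.size());
    for (int shift = 0; shift < 32; shift += 8)
    {
        std::array<std::size_t, 257> count{};  // histogram, offset by one
        for (std::uint32_t x : a)
            ++count[((x >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i)          // prefix sums -> bucket starts
            count[i + 1] += count[i];
        for (std::uint32_t x : a)              // stable distribution pass
            tmp[count[(x >> shift) & 0xFF]++] = x;
        a.swap(tmp);
    }
}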
Would the time and space complexity of maintaining a list of numbers in sorted order (i.e., start by inserting the first one, then when the second one comes along insert it in sorted order, and so on) be the same as inserting them as they appear and then sorting after all insertions have been made?
How do I make this decision? Can you demonstrate in terms of time and space complexity for 'n' elements?
I was thinking in terms of a phonebook: what is the difference between storing the records in a set and presenting sorted data to the user each time he inserts a record, versus storing the records in sorted order in a TreeSet? What would it be for n elements?
Every time you insert into a sorted list and maintain its sortedness, it is O(log n) comparisons to find where to place the element but O(n) movements to place it. Since we insert n elements, this is O(n^2). But I think that if you use a data structure designed for inserting sorted data (such as a binary tree) and then do a pass at the end to turn it into a list/array, it is only O(n log n). On the other hand, such a more complex data structure uses about O(n) additional space, whereas all the other approaches can be done in place and use no additional space.
Every time you insert into an unsorted list it is O(1). Sorting it all at the end is O(n log n). This means overall it is O(n log n).
However, if you are not going to make lists of many elements (1000 or less) it probably doesn't matter what big-O it is, and you should either focus on what runs faster for small data sets, or not worry at all if it is not a performance issue.
It depends on what data structure you are inserting them into. If you are asking about an array: it takes O(n) space and time to store the n elements, and then O(n log n) to sort them, so O(n log n) total. Inserting into a sorted array, on the other hand, may require you to move Ω(n) elements per insertion, so it takes Θ(n^2) overall. The same problem will be true of most "sequential" data structures. Sorry.
On the other hand, some priority queues such as lazy leftist heaps, Fibonacci heaps, and Brodal queues have O(1) insert, while a finger tree gives O(log n) insert AND linear access (finger trees are as good as a linked list for what a linked list is good for, and as good as balanced binary search trees for what binary search trees are good for -- they are kind of amazing).
There are going to be application-specific trade-offs in algorithm selection. The reasons one might use an insertion sort rather than some kind of offline sorting algorithm are enumerated on the insertion sort Wikipedia page.
The determining factor here is less likely to be asymptotic complexity and more likely to be what you know about your data (e.g., is it likely to be already sorted?)
I'd go further, but I'm not convinced that this isn't a homework question asked verbatim.
Option 1:
Insert at correct position in sorted order.
Time taken to find the position for the (i+1)-th element: O(log i)
Time taken to insert the (i+1)-th element and maintain order: O(i)
Space complexity: O(N)
Total time: O(log 1 + log 2 + ... + log(N-1)) = O(N log N) comparisons, plus O(1 + 2 + ... + (N-1)) = O(N^2) element moves if the list is stored in an array.
Understand that this is just the asymptotic complexity; the actual running time can be very different.
Option 2:
Insert each element: O(1)
Sort the elements: O(N log N)
Depending on the sort you employ, the space complexity varies, though you can use something like quicksort, which doesn't need much extra space anyway.
In conclusion, though the comparison bounds are the same, these bounds are weak, and mathematically you can come up with better ones. Also note that the worst case may never be encountered in practical situations; you will probably see only average cases. If performance is such a vital issue in your application, you should test both sets of code on random samples. Do tell me which one works faster after your tests. My guess is option 1.
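In that spirit, here is a minimal timing-harness sketch for the two options (the container choice, element count, and seed are arbitrary assumptions, purely to illustrate the test):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::mt19937 rng(42);
    std::vector<int> input(100000);
    for (int& x : input)
        x = static_cast<int>(rng());

    auto time = [](auto&& f) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        return dt.count();
    };

    // Option 1: keep the vector sorted as each element arrives.
    std::vector<int> a;
    double t1 = time([&] {
        for (int x : input)
            a.insert(std::upper_bound(a.begin(), a.end(), x), x);
    });

    // Option 2: append everything, then sort once at the end.
    std::vector<int> b;
    double t2 = time([&] {
        for (int x : input)
            b.push_back(x);
        std::sort(b.begin(), b.end());
    });

    std::cout << "option 1: " << t1 << " s, option 2: " << t2 << " s\n";
}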
I have a sorted list at hand. Now I add a new element to the end of the list. Which sorting algorithm is suitable for such a scenario?
Quicksort has a worst-case time complexity of O(n^2) when the list is already sorted. Does this mean the time complexity will be close to O(n^2) if quicksort is used in the above case?
If you are adding just one element, find the position where it should be inserted and put it there. For an array, you can do a binary search in O(log N) time and insert in O(N). For a linked list, you'll have to do a linear search, which takes O(N) time, but then insertion is O(1).
As for your question on quicksort: if you choose the first value as your pivot, then yes, it will be O(N^2) in your case. Choose a random pivot and your case will still be O(N log N) on average. However, the method I suggest above is both easier to implement and faster in your specific case.
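A minimal sketch of that suggestion for the array case, using std::upper_bound for the binary search (insert_sorted is my own illustrative name; the vector must already be sorted):

#include <algorithm>
#include <vector>

// Binary search for the position is O(log n); the shifting insert is O(n).
void insert_sorted(std::vector<int>& v, int value)
{
    v.insert(std::upper_bound(v.begin(), v.end(), value), value);
}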
It depends on the implementation of the underlying list.
It seems to me that insertion sort will fit your needs, except when the list is implemented as an array list -- in that case too many moves will be required.
Rather than appending to the end of the list, you should do an insert operation.
That is, when adding 5 to [1,2,3,4,7,8,9], you'd want to "insert" it by putting it where it belongs in the sorted list, instead of putting it at the end and then re-sorting the whole list.
You can quickly find the position to insert the item by using a binary search.
This is basically how insertion sort works, except it operates on the entire list. This method will have better performance than even the best sorting algorithm, for a single item. It may also be faster than appending at the end of the list, depending on your implementation.
I'm assuming you're using an array, since you talk about quicksort, so just adding an element would involve finding the place to insert it (O(log n)) and then actually inserting it (O(n)) for a total cost of O(n). Just appending it to the end and then resorting the entire list is definitely the wrong way to go.
However, if this is to be a frequent operation (i.e. if you have to keep adding elements while maintaining the sorted property) you'll incur an O(n^2) cost of adding another n elements to the list. If you change your representation to a balanced binary tree, that drops to O(n log n) for another n inserts, but finding an element by index will become O(n). If you never need to do this, but just iterate over the elements in order, the tree is definitely the way to go.
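If you go the balanced-tree route, a minimal sketch with std::multiset (a red-black tree in typical implementations; sample values arbitrary) looks like this:

#include <iostream>
#include <set>

int main()
{
    // Each insert is O(log n); iteration yields elements in sorted order.
    std::multiset<int> s;
    for (int x : {5, 1, 4, 2, 3})
        s.insert(x);
    for (int v : s)
        std::cout << v << ' ';  // prints 1 2 3 4 5
    std::cout << '\n';
}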
Of possible interest is the indexable skiplist which, for a slight storage cost, has O(log n) inserts, deletes, searches and lookups-by-index. Give it a look, it might be just what you're looking for here.
What exactly do you mean by "list" ? Do you mean specifically a linked list, or just some linear (sequential) data structure like an array?
If it's linked list, you'll need a linear search for the correct position. The insertion itself can be done in constant time.
If it's something like an array, you can add to the end and sort, as you mentioned. A sorted collection is only bad for quicksort if the quicksort is really badly implemented. If you select your pivot with the typical median-of-three algorithm, a sorted list will give optimal performance.
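For reference, a minimal median-of-three sketch (my own formulation): after the three swaps, the middle element holds the median of the three, which splits already-sorted input evenly instead of letting quicksort degenerate to O(n^2).

#include <algorithm>

// Order *lo <= *mid <= *hi with three comparisons; *mid is then the
// median of the three and a good pivot for quicksort.
template <typename It>
It median_of_three(It lo, It mid, It hi)
{
    if (*mid < *lo) std::iter_swap(lo, mid);
    if (*hi < *lo) std::iter_swap(lo, hi);
    if (*hi < *mid) std::iter_swap(mid, hi);
    return mid;
}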
If I have a sorted list (say, sorted with quicksort) and I have a lot of values to add, is it better to suspend sorting and add them to the end, then sort, or to use binary chop to place the items correctly while adding them? Does it make a difference whether the items are random or already more or less in order?
If you add enough items that you're effectively building the list from scratch, you should be able to get better performance by sorting the list afterwards.
If items are mostly in order, you can tweak both incremental update and regular sorting to take advantage of that, but frankly, it usually isn't worth the trouble. (You also need to be careful of things like making sure some unexpected ordering can't make your algorithm take much longer, q.v. naive quicksort)
Both incremental update and regular list sort are O(N log N), but you can get a better constant factor by sorting everything afterward (I'm assuming here that you've got some auxiliary data structure so your incremental update can access list items faster than O(N)...). Generally speaking, sorting all at once has a lot more design freedom than maintaining the ordering incrementally, since an incremental update has to maintain a complete order at all times, but an all-at-once bulk sort does not.
If nothing else, remember that there are lots of highly-optimized bulk sorts available.
Usually it's far better to use a heap. In short, it splits the cost of maintaining order between the pusher and the picker: both operations are O(log n), instead of the O(n log n) of most other solutions.
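Concretely, with the standard library's heap adapter (a min-heap here; sample values arbitrary):

#include <functional>
#include <iostream>
#include <queue>
#include <vector>

int main()
{
    // The pusher pays O(log n) per insert, the picker O(log n) per removal.
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq;
    for (int x : {5, 1, 4, 2, 3})
        pq.push(x);
    while (!pq.empty())
    {
        std::cout << pq.top() << ' ';  // prints 1 2 3 4 5
        pq.pop();
    }
    std::cout << '\n';
}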
If you're adding in bunches, you can use a merge sort. Sort the list of items to be added, then copy from both lists, comparing items to determine which one gets copied next. You could even copy in place if you resize your destination array and work from the end backwards.
The efficiency of this solution is O(n+m) + O(m log m) where n is the size of the original list, and m is the number of items being inserted.
Edit: Since this answer isn't getting any love, I thought I'd flesh it out with some C++ sample code. I assume that the sorted list is kept in a linked list rather than an array. This changes the algorithm to look more like an insertion than a merge, but the principle is the same.
#include <algorithm>
#include <list>
#include <vector>

// Note that itemstoadd is modified (sorted) as a side effect of this function.
template <typename T>
void AddToSortedList(std::list<T>& sortedlist, std::vector<T>& itemstoadd)
{
    std::sort(itemstoadd.begin(), itemstoadd.end());
    typename std::list<T>::iterator listposition = sortedlist.begin();
    typename std::vector<T>::iterator nextnewitem = itemstoadd.begin();
    // Walk both sequences in step; stop once every new item is placed.
    // (Looping on listposition too would dereference nextnewitem past
    // the end of itemstoadd once the new items are exhausted.)
    while (nextnewitem != itemstoadd.end())
    {
        if (listposition == sortedlist.end() || *nextnewitem < *listposition)
            sortedlist.insert(listposition, *nextnewitem++);
        else
            ++listposition;
    }
}
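And a quick usage sketch continuing the block above (values arbitrary):

#include <iostream>

int main()
{
    std::list<int> sorted = {1, 3, 5, 7};
    std::vector<int> incoming = {6, 2, 8};
    AddToSortedList(sorted, incoming);
    for (int v : sorted)
        std::cout << v << ' ';  // prints 1 2 3 5 6 7 8
    std::cout << '\n';
}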
I'd say, let's test it! :)
I tried with quicksort, but sorting an almost sorted array with quicksort is... well, not really a good idea. I tried a modified version, cutting off at 7 elements and using insertion sort below that threshold. Still, horrible performance. I switched to merge sort. It might need quite a lot of memory for sorting (it's not in-place), but the performance is much better on sorted arrays and almost identical on random ones (the initial sort took almost the same time for both; quicksort was only slightly faster).
This already shows one thing: the answer to your question depends strongly on the sorting algorithm you use. If it has poor performance on almost-sorted lists, inserting at the right position will be much faster than adding at the end and then re-sorting; and merge sort might be no option for you, as it might need way too much external memory if the list is huge. BTW, I used a custom merge sort implementation that only uses half the external storage of the naive implementation (which needs as much external storage as the array itself).
If merge sort is no option and quicksort is no option for sure, the best alternative is probably heap sort.
My results: adding the new elements simply at the end and then re-sorting the array was several orders of magnitude faster than inserting them in the right position. However, my initial array had 10 million elements (sorted) and I was adding another million (unsorted). So if you add 10 elements to an array of 10 million, inserting them correctly is much faster than re-sorting everything. The answer to your question therefore also depends on how big the initial (sorted) array is and how many new elements you want to add to it.
In principle, it's faster to create a tree than to sort a list. Each tree insert is O(log n), leading to O(n log n) overall; sorting is likewise O(n log n).
That's why Java has TreeMap and TreeSet, in addition to the ArrayList and LinkedList implementations of List.
A TreeSet keeps things in object comparison order. The key is defined by the Comparable interface.
A LinkedList keeps things in the insertion order.
An ArrayList uses more memory but is faster for some operations.
A TreeMap, similarly, removes the need to sort by a key. The map is built in key order during the inserts and maintained in sorted order at all times.
However, for some reason, the Java implementation of TreeSet is quite a bit slower than using an ArrayList and a sort.
[It's hard to speculate as to why it would be dramatically slower, but it is. It should be slightly faster by one pass through the data. This kind of thing is often the cost of memory management trumping the algorithmic analysis.]
It's about the same. Inserting an item into a sorted list is O(log N), and doing this for each of the N elements (thus building the list) would be O(N log N), which is the speed of quicksort (or merge sort, which is closer to this approach).
If you instead inserted them onto the front, it would be O(1) per insert, but doing a quicksort afterwards would still make it O(N log N).
I would go with the first approach, because it has the potential to be slightly faster. If the initial size of your list, N, is much greater than the number of elements to insert, X, then the insert approach is O(X log N). Sorting after inserting to the head of the list is O(N log N). If N=0 (IE: your list is initially empty), the speed of inserting in sorted order, or sorting afterwards are the same.
Inserting an item into a sorted list takes O(n) time, not O(log n) time. You have to find the place to put it, taking O(log n) time, but then you have to shift over all the elements after it, which takes O(n) time. So inserting while maintaining sortedness is O(n^2), whereas inserting them all and then sorting is O(n log n).
Depending on your sort implementation, you can get even better than O(n log n) if the number of inserts is much smaller than the list size. But if that is the case, it doesn't matter either way.
So use the insert-all-then-sort solution if the number of inserts is large; otherwise it probably won't matter.
If the list is a) already sorted and b) dynamic in nature, then inserting into the sorted list should always be faster: find the right place (O(n)) and insert (O(1)).
However, if the list is static, then a shuffle of the remainder of the list has to occur: O(n) to find the right place and O(n) to slide things down.
Either way, inserting into a sorted list (or something like a binary search tree) should be faster.
O(n) + O(n) should always be faster than O(n log n).
At a high level, it's a pretty simple problem, because you can think of sorting as just iterated searching. When you want to insert an element into an ordered array, list, or tree, you have to search for the point at which to insert it. Then you put it in, at hopefully low cost. So you could think of a sort algorithm as taking a bunch of things and, one by one, searching for the proper position of each and inserting it. Thus, an insertion sort (O(n^2)) is an iterated linear search (O(n)). Tree, heap, merge, and quicksort (O(n log n)) can be thought of as iterated binary search (O(log n)). It is even possible to have an O(n) sort, if the underlying search is O(1), as in an ordered hash table. (An example of this is sorting 52 cards by flinging them into 52 bins.)
So the answer to your question is: inserting things one at a time versus saving them up and then sorting them should not make much difference in a big-O sense. You could of course have constant factors to deal with, and those might be significant.
Of course, if n is small, like 10, the whole discussion is silly.
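A minimal pigeonhole sketch of the 52-card example (the handful of card ranks 0-51 below is arbitrary):

#include <array>
#include <iostream>
#include <vector>

int main()
{
    // With O(1) "search" (direct indexing), the whole sort is O(n):
    // fling each card into its bin, then read the bins back in order.
    std::vector<int> deck = {51, 0, 17, 33, 4};
    std::array<bool, 52> bin{};
    for (int card : deck)
        bin[card] = true;
    for (int i = 0; i < 52; ++i)
        if (bin[i])
            std::cout << i << ' ';  // prints 0 4 17 33 51
    std::cout << '\n';
}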
You should add them first and then use a radix sort; this should be optimal.
http://en.wikipedia.org/wiki/Radix_sort#Efficiency
(If the list you're talking about is like C#'s List<T>.) Adding a few values at the right positions in a sorted list with many values requires fewer operations. But if the number of values being added becomes large, it will require more.
I would suggest using not a list but a more suitable data structure in your case, like a binary tree, for example: a sorted data structure with minimal insertion time.
If this is .NET and the items are integers, it's quicker to add them to a SortedDictionary (or, on .NET 4.0 and above, a SortedSet if you don't mind losing duplicates). This gives you automagic sorting.
I think strings would work the same way as well. The beauty is you get O(log n) insertion and sorted order as you go.
Finding where a new item goes in a sorted list is O(log n), while sorting the whole list is O(n log n),
which would suggest that it's always better to keep the list sorted and insert new items into place.
But remember, big O only concerns how the speed scales with the number of items; it might be that for your application an insert in the middle is expensive (e.g. if it's a vector), and so appending and sorting afterward might be better.