Is there a data structure whose elements can be indexed and whose insertion runtime is O(1)? For example, I could index the data structure like a[4], and yet insert an element at an arbitrary position in O(1). Note that the data structure does not maintain sorted order, just the ability for each sequential element to have an index.
I don't think it's possible, since inserting anywhere other than the beginning or end of the ordered data structure would mean that all indices after the insertion point must be updated to reflect that their index has increased by 1, which takes O(n) time in the worst case. If the answer is no, could someone prove it mathematically?
EDIT:
To clarify, I want to maintain the order of insertion of elements, so upon inserting, the item inserted remains sequentially between the two elements it was placed between.
The problem that you are looking to solve is called the list labeling problem.
There are lower bounds on the cost that depend on the relationship between the maximum number of labels you need (n) and the number of possible labels (m).
If n is in O(log m), i.e., if the number of possible labels is exponential in the number of labels you need at any one time, then O(1) cost per operation is achievable... but this is not the usual case.
If n is in O(m), i.e., if they are proportional, then O(log² n) per operation is the best you can do, and the algorithm is complicated.
If n ≤ m², then you can do O(log n). Amortized O(log n) is simple, and O(log n) worst case is hard. Both algorithms are described in this paper by Dietz and Sleator. The hard way makes use of the O(log² n) algorithm mentioned above.
HOWEVER, maybe you don't really need labels. If you just need to be able to compare the order of two items in the collection, then you are solving a slightly different problem called "list order maintenance". This problem can actually be solved in constant time -- O(1) cost per operation and O(1) cost to compare the order of two items -- although again O(1) amortized cost is a lot easier to achieve.
When inserting into slot i, append the element which was first at slot i to the end of the sequence.
If the sequence's capacity must be grown, the resizing step is not O(1), though it is O(1) amortized for a typical dynamic array.
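A minimal sketch of this trick in Python. Note the caveat: insertion is O(1) amortized, but, as the asker's edit clarifies they need, the sequential order of the displaced elements is not preserved:

```python
def insert_at(seq, i, x):
    """Place x at index i in O(1) amortized time by relocating the
    displaced element to the end of the sequence.
    Caveat: this does NOT keep elements in insertion order."""
    seq.append(seq[i])  # old occupant of slot i moves to the end
    seq[i] = x

items = [10, 20, 30, 40]
insert_at(items, 1, 99)
# items is now [10, 99, 30, 40, 20]
```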
I thought about doing this with a sorted array and saving the index of the median, so the median query takes O(1). But I couldn't think of any way to do the insert in O(1) while keeping the array sorted.
I would really appreciate it if someone could help me with this problem.
What you are asking for is impossible, because it would allow comparison-based sorting in O(n) time:
Suppose you have an unsorted array of length n.
Find the minimum element and maximum element in O(n) time.
Insert all n elements into the data structure, each insertion takes O(1) time so this takes O(n) time.
Insert n-1 extra copies of the minimum element. This also takes O(n) time.
Initialise an output array of length n.
Do this n times:
Read off the median of the elements currently in the data structure, and write it at the next position into the output array. This takes O(1) time.
Insert two copies of the maximum element into the data structure. This takes O(1) time.
The above algorithm supposedly runs in O(n) time, and the result is a sorted array of the elements from the input array. But this is impossible, because comparison-sorting takes Ω(n log n) time. Therefore, the supposed data structure cannot exist.
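To see the reduction concretely, here is a sketch in Python. Since the hypothetical structure cannot exist, a sorted list (whose operations are not O(1)) stands in for it, purely to verify that the reduction really emits the input in sorted order:

```python
import bisect

def sort_via_median_structure(arr):
    """The reduction from the text: if a structure offered O(1) insert
    and O(1) median queries, this whole routine would run in O(n),
    contradicting the Omega(n log n) comparison-sorting bound.
    The 'structure' here is an ordinary sorted list, used only to
    check correctness of the reduction, not to achieve O(1) ops."""
    n = len(arr)
    lo, hi = min(arr), max(arr)      # O(n)
    ds = []
    for x in arr:                    # insert all n elements
        bisect.insort(ds, x)
    for _ in range(n - 1):           # n-1 extra copies of the minimum
        bisect.insort(ds, lo)
    out = []
    for _ in range(n):
        out.append(ds[len(ds) // 2])  # read off the current median
        bisect.insort(ds, hi)         # two copies of the maximum,
        bisect.insort(ds, hi)         # shifting the median up by one
    return out
```

After the setup there are 2n-1 elements, so the median is the n-th smallest, i.e. the minimum of the original array; each pair of maxima then advances the median by exactly one rank, so the loop reads off the elements in sorted order.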
This is a question for one of my assignments.
Given four lists of N names, devise a linearithmic algorithm to determine if there is any name common to all four lists.
The closest I've come to a solution that satisfies O(n log n) only works if there are two data sets: iterate through one of the sets and use binary search to find a match.
Any hints on how to solve this? I first posted this on programmers.stackexchange, but most of the replies mistook linearithmic for linear.
The algorithm you proposed can be extended to work with any (constant) number of lists:
Sort all the lists but one, using an O(n * log n) sort.
Iterate over the unsorted list.
For each item, use binary search on each sorted list to see if it is present in them all.
This takes the same amount of time as your solution, multiplied by a constant (the number of lists). So it is still O(n * log n).
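A sketch of this approach in Python, using `bisect` for the binary searches (the list contents here are illustrative):

```python
import bisect

def contains(sorted_list, item):
    """Binary search: O(log n) membership test in a sorted list."""
    i = bisect.bisect_left(sorted_list, item)
    return i < len(sorted_list) and sorted_list[i] == item

def common_to_all(lists):
    """Return names common to all lists in O(n log n) total time:
    sort all lists but one, then binary-search every sorted list
    for each item of the unsorted one."""
    first, *rest = lists
    sorted_rest = [sorted(lst) for lst in rest]  # O(n log n) each
    return [name for name in first
            if all(contains(s, name) for s in sorted_rest)]
```

For example, `common_to_all([["ann", "bob", "carol"], ["bob", "dave"], ["eve", "bob"], ["bob"]])` yields `["bob"]`.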
Note that it is also possible to get an O(n) average-case runtime by using hash tables instead of sort + binary search.
Sort all four lists in O(N log N).
Then sequentially select the smallest element among the four lists (this takes a constant number of comparisons per element) until all lists are exhausted, in O(N). In case of ties, progress in all lists with the same value (and report quadruple ties).
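A sketch of this merge-style scan in Python; the cursor handling below is one way to do it, not the only one:

```python
def common_after_sort(lists):
    """Sort each list (O(N log N)), then advance one cursor per list
    in lockstep (O(N)): repeatedly advance any cursor whose value is
    below the current maximum; a quadruple tie is a common name."""
    lists = [sorted(lst) for lst in lists]
    pos = [0] * len(lists)
    common = []
    while all(pos[i] < len(lists[i]) for i in range(len(lists))):
        current = [lists[i][pos[i]] for i in range(len(lists))]
        biggest = max(current)
        if min(current) == biggest:        # all four cursors agree
            common.append(biggest)
            pos = [p + 1 for p in pos]     # progress in all lists
        else:
            for i in range(len(lists)):
                if current[i] < biggest:   # behind: catch up
                    pos[i] += 1
    return common
```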
So I have developed a priority queue using a min heap, and according to online tutorials it takes O(n log n) time to sort an entire array using a priority queue. This is because we extract n times, and every extraction requires a priority fix that takes O(log n) time; hence O(n log n).
However, if I only want to sort half the array each time, would it still be O(n log n), or would it be just O(log n)? The reason I want to do this is to get the element with middle priority, and extracting half the elements seems like the only way to do it with a priority queue, unless there is a more intuitive way of getting the element with middle priority.
I think that the question is in two parts, so I will answer in two parts:
(a) If I understand you correctly, by sorting "half an array" you mean obtaining a sorted array of (n/2) smallest values of the given array. This will have to take O(n lg n) time. If there were a technique for doing this shorter than O(n lg n) time, then whenever we wanted to sort an array of n values whose maximum value is known to be v (and we can obtain the maximum value in O(n) time), we could construct an array of 2n elements, where the first half is the original array and the second half is filled with a value larger than v. Then, applying the hypothetical technique, we could in effect sort the original array in a time shorter than O(n lg n), which is known to be impossible.
(b) But if I am correct in understanding "the element with middle priority" as the median element in an array, you may be interested in this question.
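For reference, the half-extraction idea from the question can be sketched with Python's `heapq`; note that `heapify` is O(n) but the n/2 pops cost O(n log n) overall, and for a pure median a selection algorithm (e.g. `statistics.median` or quickselect) avoids the log factor on average:

```python
import heapq

def middle_priority(items):
    """Extract elements from a min-heap until the middle position is
    reached: O(n) to heapify, then O((n/2) log n) for the pops."""
    heap = list(items)
    heapq.heapify(heap)               # O(n)
    for _ in range(len(items) // 2):  # pop off the smaller half
        heapq.heappop(heap)           # O(log n) each
    return heap[0]                    # element with middle priority
```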
This is a question that's been lingering in my mind for some time ...
Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time.
I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.
One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.
But can it be done more efficiently than with sorting? Is the time complexity of this problem lower than that of sorting? If not, why not?
You seem to be asking two different questions at one go here.
1) If we allow only equality checks, is partitioning easier than if we had some ordering? The answer is no. You require Ω(n²) comparisons to determine the partitioning in the worst case (for instance, when all items are different).
2) If we allow ordering, is partitioning easier than sorting? Again the answer is no, because of the Element Distinctness Problem: merely determining whether all objects are distinct requires Ω(n log n) comparisons. Since sorting can be done in O(n log n) time (and also has an Ω(n log n) lower bound) and solves the partition problem, the two are asymptotically equally hard.
If you pick an arbitrary hash function, equal objects need not have the same hash, in which case you haven't done any useful work by putting them in a hashtable.
Even if you do come up with such a hash (equal objects guaranteed to have the same hash), the time complexity is expected O(n) for a good hash, and the worst case is Ω(n²).
Whether to use hashing or sorting completely depends on other constraints not available in the question.
The other answers also seem to be forgetting that your question is (mainly) about comparing partitioning and sorting!
If you can define a hash function for the items as well as an equivalence relation, then you should be able to do the partition in linear time -- assuming computing the hash is constant time. The hash function must map equivalent items to the same hash value.
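A sketch of that linear-time partition in Python, assuming the items are hashable and equivalent items map to the same key (the identity key here is just a placeholder):

```python
from collections import defaultdict

def partition(items, key=lambda x: x):
    """Group items into equivalence classes in expected O(n) time.
    `key` must map equivalent items to the same hashable value."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)  # amortized O(1) per item
    return list(groups.values())
```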
Without a hash function, you would have to compare every new item to be inserted into the partitioned lists against the head of each existing list. The efficiency of that strategy depends on how many partitions there will eventually be.
Let's say you have 100 items, and they will eventually be partitioned into 3 lists. Then each item would have to be compared against at most 3 other items before inserting it into one of the lists.
However, if those 100 items would eventually be partitioned into 90 lists (i.e., very few equivalent items), it's a different story. Now your runtime is closer to quadratic than linear.
If you don't care about the final ordering of the equivalence sets, then partitioning into equivalence sets could be quicker. However, it depends on the algorithm and the numbers of elements in each set.
If there are very few items in each set, then you might as well just sort the elements and then find the adjacent equal elements. A good sorting algorithm is O(n log n) for n elements.
If there are a few sets with many elements each, then you can take each element and compare it to the existing sets: if it belongs in one of them, add it; otherwise create a new set. This is O(n*m), where n is the number of elements and m is the number of equivalence sets, which is less than O(n log n) for large n and small m, but worse as m tends to n.
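That strategy can be sketched using only an equivalence predicate, no hashing or ordering; `equiv` below stands in for whatever relation applies:

```python
def partition_by_equiv(items, equiv):
    """Partition using only equality checks: compare each item
    against one representative per existing class. O(n * m)
    comparisons, where m is the number of classes."""
    classes = []
    for item in items:
        for cls in classes:
            if equiv(item, cls[0]):  # compare against the class head
                cls.append(item)
                break
        else:
            classes.append([item])   # no match: start a new class
    return classes
```

For example, partitioning integers by residue mod 3 with `equiv=lambda a, b: a % 3 == b % 3` groups 1, 4, 7 together and 2 on its own.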
A combined sorting/partitioning algorithm may be quicker.
If a comparator must be used, then the lower bound is Ω(n log n) comparisons for sorting or partitioning. The reason is that all elements must be inspected (Ω(n)), and a comparator must perform log n comparisons per element to uniquely identify or place that element in relation to the others (each comparison divides the space in two, so for a space of size n, log n comparisons are needed).
If each element can be associated with a unique key that is derived in constant time, then the lower bound is Ω(n) for both sorting and partitioning (cf. radix sort).
Comparison-based sorting has a lower bound of Ω(n log n).
Assume you iterate over your set of items and put them into buckets of items with the same comparative value, for example in a set of lists (say, using a hash table). This operation is clearly O(n), even after retrieving the list of lists from the table.
--- EDIT: ---
This of course requires two assumptions:
There exists a constant time hash-algorithm for each element to be partitioned.
The number of buckets does not depend on the amount of input.
Thus, under these assumptions, partitioning can be done in O(n) time.
Partitioning is faster than sorting, in general, because you don't have to compare each element to each potentially-equivalent already-sorted element, you only have to compare it to the already-established keys of your partitioning. Take a close look at radix sort. The first step of radix sort is to partition the input based on some part of the key. Radix sort is O(kN). If your data set has keys bounded by a given length k, you can radix sort it O(n). If your data are comparable and don't have a bounded key, but you choose a bounded key with which to partition the set, the complexity of sorting the set would be O(n log n) and the partitioning would be O(n).
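The partitioning step of radix sort can be sketched as a single O(n) bucket pass over a bounded key (the key function and bucket count below are illustrative assumptions):

```python
def bucket_partition(items, key, nbuckets):
    """One radix-sort-style pass: distribute items into buckets by a
    bounded integer key in O(n) time (plus O(nbuckets) setup).
    `key` must return an int in range(nbuckets)."""
    buckets = [[] for _ in range(nbuckets)]
    for item in items:
        buckets[key(item)].append(item)
    return buckets
```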
This is a classic problem in data structures, and yes, it is easier than sorting. If you want to also quickly be able to look up which set each element belongs to, what you want is the disjoint set data structure, together with the union-find operation. See here: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
The time required to perform a possibly-imperfect partition using a hash function will be O(n+bucketcount) [not O(n*bucketcount)]. Making the bucket count large enough to avoid all collisions will be expensive, but if the hash function works at all well there should be a small number of distinct values in each bucket. If one can easily generate multiple statistically-independent hash functions, one could take each bucket whose keys don't all match the first one and use another hash function to partition the contents of that bucket.
Assuming a constant number of buckets on each step, the time is going to be O(N lg N), but if one sets the number of buckets to something like sqrt(N), the average number of passes should be O(1) and the work in each pass O(N).