Writing a linearithmic algorithm

This is a question for one of my assignments.
Given four lists of N names, devise a linearithmic algorithm to determine if there is any name common to all four lists.
The closest I've come to an O(n log n) solution only works when there are just two lists: sort one list, then iterate through the other and use binary search to look for a match.
Any hints on how to solve this? I first posted this on programmers.stackexchange, but most of the replies mistook linearithmic for linear.

The algorithm you proposed can be extended to work with any (constant) number of lists:
Sort all the lists but one, using an O(n * log n) sort.
Iterate over the unsorted list.
For each item, use binary search on each sorted list to see if it is present in them all.
This takes the same amount of time as your solution, multiplied by a constant (the number of lists). So it is still O(n * log n).
Note that it is also possible to get an O(n) average-case runtime by using hash tables instead of sort + binary search.
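The steps above can be sketched roughly as follows. This is only an illustration, assuming the names are plain strings; the function names `contains` and `common_name` are made up for this example.

```python
from bisect import bisect_left

def contains(sorted_names, name):
    """Binary search: True if name occurs in the sorted list."""
    i = bisect_left(sorted_names, name)
    return i < len(sorted_names) and sorted_names[i] == name

def common_name(lists):
    """Return a name present in every list, or None if there is none."""
    first, *rest = lists
    sorted_rest = [sorted(lst) for lst in rest]  # O(n log n) per list
    for name in first:                           # n iterations
        # one O(log n) binary search per sorted list
        if all(contains(s, name) for s in sorted_rest):
            return name
    return None
```

For the hash-table variant, replacing `sorted_rest` with a list of `set` objects and `contains` with the `in` operator should give O(n) time on average.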

Sort all four lists in O(N log N).
Then repeatedly select the smallest element among the heads of the four lists (this takes a constant number of comparisons per element) until all lists are exhausted, in O(N). In case of ties, advance in every list that holds the tied value (and report four-way ties).
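A minimal sketch of that simultaneous scan, assuming the names are comparable strings. The name `common_in_sorted` is hypothetical, and this version stops as soon as one list is exhausted, since no further four-way tie is possible after that.

```python
def common_in_sorted(lists):
    """Return the values present in every list, in sorted order."""
    sorted_lists = [sorted(lst) for lst in lists]  # O(n log n) each
    pos = [0] * len(sorted_lists)                  # one cursor per list
    common = []
    while all(p < len(s) for p, s in zip(pos, sorted_lists)):
        heads = [s[p] for p, s in zip(pos, sorted_lists)]
        smallest = min(heads)
        if all(h == smallest for h in heads):
            # tie across every list: report it once
            if not common or common[-1] != smallest:
                common.append(smallest)
        # advance every list whose head equals the smallest value
        pos = [p + 1 if h == smallest else p for p, h in zip(pos, heads)]
    return common
```

Each loop iteration advances at least one cursor, so the scan itself is linear in the total number of names.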

Related

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity log(n)??
example [8,2,7,5,0,1]
sort given array with time complexity log(n)
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) take n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we have no way to sort arbitrary objects that can only be compared with one another in less than n*log(n) in the worst case.
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in the list. In that case, structures like a balanced binary search tree let you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (i.e. performing n inserts) takes n*log(n) time.
As @dominicm00 also mentioned, the answer is no.
In general, when you see an algorithm with a time complexity of log N (base 2), it means that you are dividing the input into 2 halves and discarding one of them, repeatedly. A sorting algorithm has to put every element in its appropriate place, so discarding half of the list in each iteration is incompatible with sorting.
The most efficient sorting algorithms have a time complexity of O(n), but with some limitations. The three most famous algorithms with O(n) complexity are:
Counting sort, with time complexity O(n+k), where k is the maximum number in the given list. Assuming n>>k, you can consider its time complexity to be O(n).
Radix sort, with time complexity O(d*(n+k)), where k is the number of possible values of a single digit (the radix) and d is the maximum number of digits of any number in the input list. As with counting sort, assuming n>>k and n>>d, the time complexity will be O(n).
Bucket sort, with an average time complexity of O(n) when the input is spread evenly over the buckets.
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n * log n) algorithms, such as merge sort, quick sort, and heap sort.
There are also sorting algorithms with a time complexity of O(n^2), such as insertion sort, selection sort, and bubble sort, which are recommended only for small lists.
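To make the O(n+k) claim concrete, here is a small counting sort sketch. It assumes the input holds only non-negative integers no larger than `k`; the function name is just for illustration.

```python
def counting_sort(values, k):
    """Sort non-negative integers in the range 0..k in O(n + k) time."""
    counts = [0] * (k + 1)
    for v in values:                 # O(n): tally each value
        counts[v] += 1
    result = []
    for v, c in enumerate(counts):   # O(k): walk the tallies in order
        result.extend([v] * c)
    return result
```

On the example from the question, `counting_sort([8, 2, 7, 5, 0, 1], 8)` gives `[0, 1, 2, 5, 7, 8]` without ever comparing two elements with each other.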
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values.
count each amount in parallel and sum using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massively parallel hardware would be able to do this; general-purpose CPUs could not, unless it were implemented in silicon as part of their SIMD instructions.

What is the running time of merge sort for these inputs

I am trying to determine the running time in Big O of merge sort for:
(A) sorted input
(B) reverse-ordered input
(C) random input
My answer is that it would take O(n lgn) for all three scenarios, since regardless of the default order of the input, merge sort will always divide the input into the smallest unit of 1 element. Then it will compare each element with each element in the adjacent list to sort and merge the two adjacent lists. It will continue to do this until finally all the elements are sorted and merged.
That said, all we really need to find then is the Big O complexity of merge sort in general, since the worst, average, and best cases will all take the same time.
My question is, can somebody tell me if my answers are correct, and if so, explain why the Big O complexity of merge sort ends up being O(n lgn)?
The answer to this question depends on your implementation of Merge Sort.
When naively implemented, merge sort indeed uses O(n * log n) time, as it will always divide the input down to the smallest unit. However, there's a specific implementation called Natural Merge Sort that leaves numbers in place if they're already ordered in the input array. It essentially first looks at the given input and decides which parts still need to be ordered, that is, divided and later merged again.
Natural Merge Sort will only take O(n) time for an ordered input and in general be faster for a random input than for a reverse-ordered input. In the latter two cases, runtime will be O(n * log n).
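One possible way to sketch a natural merge sort, assuming "already ordered" means maximal non-decreasing runs. The helper names are made up, and `heapq.merge` from the standard library is used only to merge two sorted runs.

```python
from heapq import merge

def split_runs(items):
    """Split the input into maximal non-decreasing runs in one O(n) pass."""
    runs = []
    for x in items:
        if runs and runs[-1][-1] <= x:
            runs[-1].append(x)   # x extends the current run
        else:
            runs.append([x])     # x starts a new run
    return runs

def natural_merge_sort(items):
    """Merge neighbouring runs pairwise until a single run remains."""
    runs = split_runs(items)
    while len(runs) > 1:
        runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
    return runs[0] if runs else []
```

A sorted input forms a single run, so no merging happens and only the O(n) scan is paid; a reverse-ordered input with distinct values forms n runs of length 1, which is the ordinary O(n * log n) case.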
To answer your last question, I'll look at the "normal" Mergesort; the explanation is easier that way.
Note that Mergesort can be visualized as a binary tree where in the root we have the whole input, on the next layer the two halves you get from dividing the input once, on the third layer we have four quarters and so on... On the last layer we finally have individual numbers.
Then note that the whole tree is O(log n) deep (this can also be proved mathematically). On each layer we have to make some comparisons and moves on n numbers in total - this is because the total amount of numbers on a layer doesn't decrease when we go down the tree. For example, with an input of 8 numbers we work on 8 numbers on each layer: merging a layer takes at most 8 comparisons, and each of the 8 numbers is moved once into the merged output. If we have an input of length n instead of 8, we'll need at most n comparisons and n moves per layer (this is O(n)). We have O(log n) layers, so the whole runtime will be O(n * log n).
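If it helps, the per-layer argument can be checked with a plain top-down merge sort that counts its comparisons. This is a sketch for illustration only; returning the count alongside the result is my own addition.

```python
def merge_sort(items):
    """Return (sorted_list, comparison_count) for a top-down merge sort."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, c_left = merge_sort(items[:mid])
    right, c_right = merge_sort(items[mid:])
    merged = []
    comparisons = c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1                 # one comparison per loop step
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # at most one of these is non-empty
    merged.extend(right[j:])
    return merged, comparisons
```

For 8 elements there are 3 merging layers, so the count returned can never exceed 8 * 3 = 24, whatever the input order.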

Running time of merge sort :: All elements are identical

Suppose we have an array of size n with all the elements identical. What will the running time of merge sort be? Will it be linear, i.e. O(n)?
This depends on how the algorithm is implemented.
With a standard "vanilla" implementation of mergesort, the time required to sort an array will always be Θ(n log n) because the merges required at each step each take linear time.
However, with the appropriate optimizations, it's possible to get this to run in time O(n). In many mergesort implementations, the input array is continuously modified so that larger and larger ranges are sorted, and when a merge step occurs, the algorithm uses an external buffer to merge two adjacent sorted ranges. In that case, there's a nifty optimization you can do: before doing the merge, check if the last element of the first range is less than or equal to the first element of the second range. If so, the two ranges taken together are already sorted, so no merging needs to be done.
Suppose you perform this optimization and try sorting an array where all elements are already sorted. What happens? Well, each call to mergesort will fire off two more recursive calls. After those return, it can check the endpoints of the sorted ranges and will notice that they're already in sorted order, so there's no more work left to be done. Overall, this does O(1) work per call, so we have this recurrence relation for the time complexity of the algorithm:
T(n) = 2T(n/2) + O(1)
This solves to O(n), so only linear work is done.
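Here is a rough sketch of that optimization, assuming an in-place merge sort over index ranges with a temporary buffer for the merge. The `stats` counter is only there to show how many real merges happen.

```python
def merge_sort_skip(a, lo=0, hi=None, stats=None):
    """Sort a[lo:hi] in place, skipping merges of halves already in order."""
    if hi is None:
        hi = len(a)
    if stats is None:
        stats = {"merges": 0}
    if hi - lo <= 1:
        return stats
    mid = (lo + hi) // 2
    merge_sort_skip(a, lo, mid, stats)
    merge_sort_skip(a, mid, hi, stats)
    if a[mid - 1] <= a[mid]:
        return stats                      # O(1): the range is already sorted
    stats["merges"] += 1
    merged = []
    i, j = lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(a[j])
            j += 1
    merged.extend(a[i:mid])
    merged.extend(a[j:hi])
    a[lo:hi] = merged                     # copy the buffer back
    return stats
```

With all elements identical (or already sorted) the check succeeds in every call, so `stats["merges"]` stays at 0 and each call does constant work outside its recursive calls.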

Is partitioning easier than sorting?

This is a question that's been lingering in my mind for some time ...
Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time.
I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.
One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.
But can it be done more efficiently than with sorting? Is the time complexity of this problem lower than that of sorting? If not, why not?
You seem to be asking two different questions at one go here.
1) If allowing only equality checks, does it make partition easier than if we had some ordering? The answer is, no. You require Omega(n^2) comparisons to determine the partitioning in the worst case (all different for instance).
2) If allowing ordering, is partitioning easier than sorting? The answer again is no. This is because of the Element Distinctness Problem, which says that even determining whether all objects are distinct requires Omega(n log n) comparisons. Since sorting can be done in O(n log n) time (and also has an Omega(n log n) lower bound) and solves the partition problem, asymptotically they are equally hard.
If you pick an arbitrary hash function, equal objects need not have the same hash, in which case you haven't done any useful work by putting them in a hashtable.
Even if you do come up with such a hash (equal objects guaranteed to have the same hash), the time complexity is expected O(n) for good hashes, and worst case is Omega(n^2).
Whether to use hashing or sorting completely depends on other constraints not available in the question.
The other answers also seem to be forgetting that your question is (mainly) about comparing partitioning and sorting!
If you can define a hash function for the items as well as an equivalence relation, then you should be able to do the partition in linear time -- assuming computing the hash is constant time. The hash function must map equivalent items to the same hash value.
Without a hash function, you would have to compare every new item to be inserted into the partitioned lists against the head of each existing list. The efficiency of that strategy depends on how many partitions there will eventually be.
Let's say you have 100 items, and they will eventually be partitioned into 3 lists. Then each item would have to be compared against at most 3 other items before inserting it into one of the lists.
However, if those 100 items would eventually be partitioned into 90 lists (i.e., very few equivalent items), it's a different story. Now your runtime is closer to quadratic than linear.
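A small sketch of the compare-against-the-head strategy, assuming the only thing available is a constant-time `equivalent(a, b)` predicate. The function name and the comparison counter are illustrative additions.

```python
def partition_by_equivalence(items, equivalent):
    """Group items using only equality tests; return (classes, comparisons)."""
    classes = []
    comparisons = 0
    for item in items:
        for cls in classes:
            comparisons += 1
            if equivalent(cls[0], item):  # compare against the class head
                cls.append(item)
                break
        else:
            classes.append([item])        # no class matched: start a new one
    return classes, comparisons
```

With 100 items in 3 classes this makes at most 300 comparisons, while 10 pairwise different items already need 0 + 1 + ... + 9 = 45, which is the quadratic behaviour described above.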
If you don't care about the final ordering of the equivalence sets, then partitioning into equivalence sets could be quicker. However, it depends on the algorithm and the numbers of elements in each set.
If there are very few items in each set, then you might as well just sort the elements and then find the adjacent equal elements. A good sorting algorithm is O(n log n) for n elements.
If there are a few sets with lots of elements in each, then you can take each element and compare it to the existing sets. If it belongs in one of them then add it, otherwise create a new set. This will be O(n*m), where n is the number of elements and m is the number of equivalence sets, which is less than O(n log n) for large n and small m, but worse as m tends to n.
A combined sorting/partitioning algorithm may be quicker.
If a comparator must be used, then the lower bound is Ω(n log n) comparisons for sorting or partitioning. The reason is all elements must be inspected Ω(n), and a comparator must perform log n comparisons for each element to uniquely identify or place that element in relation to the others (each comparison divides the space in 2, and so for a space of size n, log n comparisons are needed.)
If each element can be associated with a unique key which is derived in constant time, then the lower bound is Ω(n), for sorting and partitioning (cf. RadixSort).
Comparison-based sorting generally has a lower bound of Ω(n log n).
Assume you iterate over your set of items and put them in buckets with items with the same comparative value, for example in a set of lists (say using a hash set). This operation is clearly O(n), even after retrieving the list of lists from the set.
--- EDIT: ---
This of course requires two assumptions:
There exists a constant time hash-algorithm for each element to be partitioned.
The number of buckets does not depend on the amount of input.
Thus, under these assumptions, partitioning can be done in O(n).
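As an illustration of the bucket idea, here is a sketch that assumes a caller-supplied `key` function which maps equivalent items to the same hashable value in constant time (that function is an assumption, not something given in the question).

```python
from collections import defaultdict

def partition_by_key(items, key):
    """Group items whose key values are equal, in expected O(n) time."""
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)  # expected O(1) per item
    return list(buckets.values())
```

For example, grouping words by first letter with `key=lambda s: s[0]` puts "apple" and "avocado" in one list and "banana" in another. The worst case still degrades if many different keys collide inside the hash table.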
Partitioning is faster than sorting, in general, because you don't have to compare each element to each potentially-equivalent already-sorted element; you only have to compare it to the already-established keys of your partitioning. Take a close look at radix sort. The first step of radix sort is to partition the input based on some part of the key. Radix sort is O(kN). If your data set has keys bounded by a given length k, you can radix sort it in O(n). If your data are comparable and don't have a bounded key, but you choose a bounded key with which to partition the set, the complexity of sorting the set would be O(n log n) and the partitioning would be O(n).
This is a classic problem in data structures, and yes, it is easier than sorting. If you want to also quickly be able to look up which set each element belongs to, what you want is the disjoint set data structure, together with the union-find operation. See here: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
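For reference, a compact sketch of a disjoint-set structure with union by size and path halving. It assumes the items have already been numbered 0..n-1; the class and method names are my own.

```python
class DisjointSet:
    """Union-find over the integers 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.size = [1] * n

    def find(self, x):
        """Return the representative of the set containing x."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra           # attach the smaller tree to the larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
```

Two elements are in the same set exactly when `find` returns the same representative for both. Note that you still need to know which pairs to `union`, so this does not by itself remove the comparisons discussed in the other answers.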
The time required to perform a possibly-imperfect partition using a hash function will be O(n+bucketcount) [not O(n*bucketcount)]. Making the bucket count large enough to avoid all collisions will be expensive, but if the hash function works at all well there should be a small number of distinct values in each bucket. If one can easily generate multiple statistically-independent hash functions, one could take each bucket whose keys don't all match the first one and use another hash function to partition the contents of that bucket.
Assuming a constant number of buckets on each step, the time is going to be O(NlgN), but if one sets the number of buckets to something like sqrt(N), the average number of passes should be O(1) and the work in each pass O(n).

Is it possible to find two numbers whose difference is minimum in O(n) time

Given an unsorted integer array, and without making any assumptions on the numbers in the array:
Is it possible to find two numbers whose difference is minimum in O(n) time?
Edit: Difference between two numbers a, b is defined as abs(a-b)
Find smallest and largest element in the list. The difference smallest-largest will be minimum.
If you're looking for nonnegative difference, then this is of course at least as hard as checking if the array has two same elements. This is called element uniqueness problem and without any additional assumptions (like limiting size of integers, allowing other operations than comparison) requires >= n log n time. It is the 1-dimensional case of finding the closest pair of points.
I don't think you can do it in O(n). The best I can come up with off the top of my head is to sort them (which is O(n * log n)) and find the minimum difference of adjacent pairs in the sorted list (which adds another O(n)).
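The sort-then-scan approach is short enough to sketch directly. This assumes the array has at least two elements; the function name is only illustrative.

```python
def min_difference_pair(numbers):
    """Return the adjacent pair (a, b) in sorted order with the smallest b - a."""
    if len(numbers) < 2:
        raise ValueError("need at least two numbers")
    ordered = sorted(numbers)                       # O(n log n)
    # the closest pair is always adjacent in sorted order: O(n) scan
    return min(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
```

When several pairs share the minimum difference, this returns the first one in sorted order.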
I think it is possible. The secret is that you don't actually have to sort the list, you just need to create a tally of which numbers exist. This may count as "making an assumption" from an algorithmic perspective, but not from a practical perspective. We know the ints are bounded by a min and a max.
So, create an array of 2 bit elements, 1 pair for each int from INT_MIN to INT_MAX inclusive, set all of them to 00.
Iterate through the entire list of numbers. For each number in the list, if the corresponding 2 bits are 00 set them to 01. If they're 01 set them to 10. Otherwise ignore. This is obviously O(n).
Next, if any of the 2 bits is set to 10, that is your answer. The minimum distance is 0 because the list contains a repeated number. If not, scan through the list and find the minimum distance. Many people have already pointed out there are simple O(n) algorithms for this.
So O(n) + O(n) = O(n).
Edit: responding to comments.
Interesting points. I think you could achieve the same results without making any assumptions by finding the min/max of the list first and using a sparse array ranging from min to max to hold the data. Takes care of the INT_MIN/MAX assumption, the space complexity and the O(m) time complexity of scanning the array.
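A sketch of the min/max variant, using a dense `bytearray` rather than a truly sparse array (that substitution is mine), and one flag per value instead of two bits since the scan returns as soon as a repeat is seen. It assumes at least two numbers, and its time and space are O(n + (max - min)), so it is only linear when the value range is itself O(n).

```python
def min_difference_tally(numbers):
    """Smallest abs(a - b) over distinct positions, via a presence table."""
    lo, hi = min(numbers), max(numbers)
    seen = bytearray(hi - lo + 1)        # one flag per value in lo..hi
    for x in numbers:
        if seen[x - lo]:
            return 0                     # repeated value: distance is 0
        seen[x - lo] = 1
    best = hi - lo
    prev = None
    for offset, flag in enumerate(seen): # O(max - min) scan of the table
        if flag:
            if prev is not None:
                best = min(best, offset - prev)
            prev = offset
    return best
```

The second loop is the O(m) scan mentioned in the comments; it is what keeps this from being O(n) for arbitrary integers.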
The best I can think of is to counting sort the array (possibly combining equal values) and then do the sorted comparisons -- bin sort is O(n + M) (M being the number of distinct values). This has a heavy memory requirement, however. Some form of bucket or radix sort would be intermediate in time and more efficient in space.
Sort the list with radixsort (which is O(n) for integers), then iterate and keep track of the smallest distance so far.
(I assume your integer is a fixed-bit type. If they can hold arbitrarily large mathematical integers, radixsort will be O(n log n) as well.)
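A sketch of an LSD radix sort under that fixed-width assumption. The parameters `bits=32` and `radix_bits=8` are illustrative choices, and negative values are handled by adding a bias so that all keys become non-negative.

```python
def radix_sort(numbers, bits=32, radix_bits=8):
    """LSD radix sort for integers that fit in a signed `bits`-bit type."""
    bias = 1 << (bits - 1)
    keys = [x + bias for x in numbers]   # shift into the range 0..2**bits - 1
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(mask + 1)]
        for k in keys:                   # stable distribution by current digit
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return [k - bias for k in keys]
```

With these defaults there are four passes of O(n + 256) work each, after which the minimum difference can be found by scanning adjacent elements.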
It seems to be possible to sort an unbounded set of integers in O(n*sqrt(log(log(n)))) time. After sorting, it is of course trivial to find the minimal difference in linear time.
But I can't think of any algorithm to make it faster than this.
No, not without making assumptions about the numbers/ordering.
It would be possible given a sorted list though.
I think the answer is no, and the proof is similar to the proof that you cannot sort faster than n lg n: you have to compare all of the elements, i.e. create a comparison tree, which implies an omega(n lg n) algorithm.
EDIT. OK, if you really want to argue, then the question does not say whether it should be a Turing machine or not. With quantum computers, you can do it in linear time :)

Resources