Average case n log n Nuts and Bolts matching - algorithm

I have to design an algorithm that matches items from two arrays. We are not allowed to sort either array first; we can only compare an item from array 1 with an item from array 2 (the comparison result being <, =, or >). The output is the two lists arranged in the same order. I can think of ways to solve it in n(n+1)/2 comparisons, but the goal is n log(n). I have been banging my head against a wall trying to think of a way but I can't. Can anyone give me a hint?
So to explain, the input is two arrays, e.g. A = [1,3,6,2,5,4] and B = [4,2,3,5,1,6], and the output is the two arrays rearranged into the same order. You cannot sort the arrays individually first or compare items within the same array. You can only compare items across lists, like so: A_1 < B_1, A_2 = B_3, A_4 < B_3.

Similar to quicksort:
Use a random A-element to partition B into smaller-B, equal-B and larger-B. Use its equal B-element to partition A. Recursively match smaller-A with smaller-B as well as larger-A with larger-B.
Just like quicksort, the expected running time is O(n log n) and the worst case is O(n^2).
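A minimal sketch of that recursion in Python, assuming the two arrays are permutations of the same distinct values (as in the example above); cross_cmp is a stand-in for the allowed cross-array comparison and is the only place items are compared:

import random

def cross_cmp(a_item, b_item):
    # The only allowed operation: compare an A item with a B item (<, =, >).
    return (a_item > b_item) - (a_item < b_item)

def match(a, b):
    # Returns the two arrays reordered so that a[i] pairs with b[i].
    if not a:
        return [], []
    pivot_a = random.choice(a)
    smaller_b = [x for x in b if cross_cmp(pivot_a, x) > 0]
    larger_b  = [x for x in b if cross_cmp(pivot_a, x) < 0]
    pivot_b   = next(x for x in b if cross_cmp(pivot_a, x) == 0)
    smaller_a = [x for x in a if cross_cmp(x, pivot_b) < 0]
    larger_a  = [x for x in a if cross_cmp(x, pivot_b) > 0]
    sa, sb = match(smaller_a, smaller_b)
    la, lb = match(larger_a, larger_b)
    return sa + [pivot_a] + la, sb + [pivot_b] + lb

For the example above, match([1,3,6,2,5,4], [4,2,3,5,1,6]) returns both lists in ascending order, which is one valid common ordering.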

Related

Is there a way to recover the two sorted halves in a merge sort algorithm if I have the sorted array?

Suppose I have an unsorted array P and its sorted equivalent P_Sorted. Suppose L and R refer to the left and right halves of P. Is there a way to recover L_Sorted and R_Sorted from P and P_Sorted in linear time without using extra memory?
For further clarification, during a recursive merge sort implementation L_Sorted and R_Sorted would be merged together to form P_Sorted, so I'm kinda looking to reverse the merge step.
In a merge sort you divide the array into two halves recursively and merge them, so at the last merge the left and right halves have already been sorted independently; that is where the "divide and conquer" name comes from.
Therefore, when doing a merge, you can just look at the sizes of the arrays being merged: if they are equal (even input size) or differ by 1 (odd input size), you are at the last merge. At that point you could store those two sorted halves in some variable before merging them.
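If you do control the sort, a minimal sketch of capturing the two halves at the top-level (last) merge, assuming a standard top-down merge sort:

def merge(left, right):
    # Standard merge of two sorted lists.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(a):
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

def sort_and_capture(p):
    # The two inputs of the last merge are exactly L_Sorted and R_Sorted.
    mid = len(p) // 2
    l_sorted = merge_sort(p[:mid])
    r_sorted = merge_sort(p[mid:])
    return merge(l_sorted, r_sorted), l_sorted, r_sorted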
BUT if you are not allowed to modify the function and you have to work only with the sorted array and the original array, I don't think the solution is straightforward. I found a URL that poses this problem and a possible solution.
It seems feasible in linear time for very specific datasets:
If there is a way to tell the original position of each data element from the sorted list (for example, if these are records with a creation date and a name field and the original array is in chronological order), then selecting from the sorted array the elements that fall in the first or second half can be done in a single scan, in linear time, with no space overhead.
In the general case, sorting the left and right halves separately seems the most efficient way to get L_Sorted and R_Sorted, with or without P_Sorted. The time complexity is O(n log n).

Does the non-parallel sample sort have the same complexity as quick sort?

According to Wikipedia and other resources, quick sort happens to be a special case of sample sort, because we always choose 1 partitioning item, put it in its place and continue the sort; so quick sort is sample sort where m (the number of partitioning items at each step) is 1. So my question is: for 1 < m < n, does it have the same complexity as quick sort when it's not parallel?
The following is the algorithm for sample sort described on Wikipedia.
1) Find splitters, values that break up the data into buckets, by sampling the data.
2) Use the sorted splitters to define buckets and place data in appropriate buckets.
3) Sort each of the buckets.
I am not exactly sure I understand this algorithm correctly, but I think we first find the partitioning item, put it in its place, then look to the left and to the right to find more partitioning items there, and then recursively call the same function to partition each of those m buckets into m samples again; am I right? Because if so, it seems that sample sort performs the same as quick sort, since it simply does the same thing, except half of it iteratively (when looking for splitters) and half of it recursively.
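For reference, here is a minimal single-threaded sketch of those three steps in Python; the oversampling factor, the cutoff and the degenerate-case fallback are simplifications of my own, not part of the Wikipedia description:

import random

def sample_sort(a, m=3, cutoff=16):
    if len(a) <= cutoff:
        return sorted(a)                         # small input: plain sort
    # 1) Sample the data and pick m splitters from the sorted sample.
    sample = sorted(random.sample(a, min(len(a), 4 * m)))
    step = max(1, len(sample) // (m + 1))
    splitters = sample[step::step][:m]
    # 2) Place every element into one of the m+1 buckets the splitters define.
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in a:
        i = 0
        while i < len(splitters) and x > splitters[i]:
            i += 1
        buckets[i].append(x)
    # Degenerate splitters (e.g. many equal keys): fall back to avoid looping.
    if max(len(b) for b in buckets) == len(a):
        return sorted(a)
    # 3) Sort each bucket recursively and concatenate.
    return [x for b in buckets for x in sample_sort(b, m, cutoff)]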
They will have different complexity. When m > 1, the running time is approximately C * N * log_(m+1)(N). The constant C will be large enough to make it slower than ordinary quicksort, because there is no known way to partition a list into m + 1 buckets as efficiently as partitioning it into two buckets.
For example, normal quicksort takes O(N) to partition the list into two subarrays. Assume the best case, where quicksort always picks a value that splits the list into two buckets of the same size:
C(n) = 2C(n/2) + n, which gives C(n) = n log2(n).
Now let m = 2, meaning we need to partition the list into three subarrays, and again assume the best case, where the chosen values split the list into three buckets of the same size. However, say the cost of the partition step is O(3N):
C(n) = 3C(n/3) + 3n, which gives C(n) = 3n log3(n).
As you can see,
3n log3(n) > n log2(n).

Algorithm for finding mutual name in lists

I've been reading up on algorithms from the book Algorithms by Robert Sedgewick and I've been stuck on an exercise problem for a while. Here is the question:
Given 3 lists of N names each, find an algorithm to determine if there is any name common to all three lists. The algorithm must have O(NlogN) complexity. You're only allowed to use sorting algorithms and the only data structures you can use are stacks and queues.
I figured I could solve this problem using a HashMap, but the question restricts us from doing so. Even then, that still wouldn't have a complexity of N log N.
If you sort each of the lists, you can then check in O(n) time whether all three lists share a name: take the first name of list A and compare it to the first name of list B; while the list B element is less than the list A element, pop list B and compare again. If you find a match, repeat the process on list C; if you find a match in C as well, return true, otherwise move on to the next element of A.
Now you have to sort all of the lists in O(n log n) time, which you can do with your favorite sorting algorithm, though you will have to be a little creative using just stacks and queues; I would probably recommend merge sort.
The pseudocode below is a little messy because I am changing the lists I am iterating over.
Pseudocode:
Assume listA, listB and listC are queues sorted so that pop() removes and returns the smallest remaining name.
def common_name(listA, listB, listC):
    if not (listA and listB and listC):
        return False
    eltB = listB.pop()
    eltC = listC.pop()
    for eltA in listA:
        # Advance B until it reaches or passes the current A element.
        while eltB <= eltA:
            if eltB == eltA:
                # Advance C until it reaches or passes the matching B element.
                while eltC <= eltB:
                    if eltC == eltB:
                        return True          # common name found
                    if not listC:
                        return False
                    eltC = listC.pop()
            if not listB:
                return False
            eltB = listB.pop()
    return False
Steps:
1) Sort the three lists using an O(N lg N) sorting algorithm.
2) Pop one item from each list.
3) If any of the lists you just tried to pop from is empty, you are done: no common element exists.
4) Else, compare the three elements.
5) If the elements are equal, you are done: you have found the common element.
6) Else, keep the maximum of the three elements (constant time) and replenish from the lists whose two smaller elements were discarded.
7) Go to step 3.
Step 1 takes O(N lg N) and the rest of the steps take O(N), so the overall complexity is O(N lg N).
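A minimal sketch of that loop, assuming the already-sorted lists are held in deques so that popping the smallest element is O(1) (the helper name find_common_name is mine):

from collections import deque

def find_common_name(a, b, c):
    # a, b, c are already sorted ascending (step 1).
    qa, qb, qc = deque(a), deque(b), deque(c)
    if not (qa and qb and qc):
        return None
    x, y, z = qa.popleft(), qb.popleft(), qc.popleft()   # step 2
    while True:
        if x == y == z:
            return x                                     # step 5: common element
        m = max(x, y, z)                                 # step 6: keep the maximum
        # Replenish from every list whose element is below the maximum.
        if x < m:
            if not qa: return None                       # step 3: a list ran dry
            x = qa.popleft()
        if y < m:
            if not qb: return None
            y = qb.popleft()
        if z < m:
            if not qc: return None
            z = qc.popleft()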

Union of inverted lists

Given k sorted inverted lists, I want an efficient algorithm to get the union of these k lists.
Each inverted list is a read-only array in memory, and each list contains integers in sorted order.
The result will be saved in a predefined array that is large enough. Is there any algorithm better than a k-way merge?
K-way merge is optimal. It takes O(n log k) operations [where n is the number of elements in all lists combined].
It is easy to see it cannot be done better: as @jpalecek mentioned, otherwise you could sort any array faster than O(n log n) by splitting it into chunks [inverted indexes] of size 1.
Note: This answer assumes it is important that the inverted indexes [the resulting array] be sorted. This assumption holds for most applications that use inverted indexes, especially in the Information Retrieval area. Sorted indexes allow elegant and quick intersection of indexes.
Note: the standard k-way merge allows duplicates, so you will have to make sure that an element appearing in two lists is added only once [easy to do by simply checking the last element in the target array before adding].
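A minimal sketch of that deduplicating k-way merge, using Python's standard heap-based merge; the preallocated output array out follows the question's setup:

import heapq

def union_into(lists, out):
    # Merge k sorted read-only arrays into the preallocated array `out`,
    # skipping values already written; returns the number of elements written.
    n = 0
    for x in heapq.merge(*lists):          # k-way merge, O(total * log k)
        if n == 0 or out[n - 1] != x:      # drop duplicates across lists
            out[n] = x
            n += 1
    return n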
If you don't need the resulting array to be sorted, the best approach is to use a hash table to mark which elements you have already seen. That way you get O(n) time complexity (n being the total number of elements).
Something along the lines of (Perl):
my %seen;
# Flatten all input array refs and keep only the first occurrence of each element.
my @merged = grep { exists $seen{$_} ? 0 : ($seen{$_} = 1) } (map { (@$_) } @inputs);

Find a common element within N arrays

If I have N arrays, what is the best (time complexity; space is not important) way to find the common elements? You could just find 1 element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index with the elements as keys and counts as values. Loop through all the arrays and update the counts in the index. Afterwards, run through the index and check which elements have a count of N. Looking up an element in the index is O(1), so combined with looping through all M elements this is O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
if you know that the elements are (positive) integers with a maximum value that is not too high, you could just use a normal array as the "hash" index to keep the counts, where the numbers themselves are the array indices.
I've assumed that in each array each number occurs only once. Adapting it for more occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
EDIT: when I answered the question, it did not yet contain the part about "a better way" than hashing.
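A minimal sketch of that counting idea in Python; I de-duplicate within each array first, so the "each number occurs only once per array" assumption above is not needed:

from collections import Counter

def common_element(arrays):
    counts = Counter()
    for arr in arrays:
        counts.update(set(arr))        # count each value at most once per array
    for x in arrays[0]:                # report in the order of the first array
        if counts[x] == len(arrays):
            return x                   # value present in every array
    return None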
The most direct method is to intersect the first 2 arrays and then intersect that result with each of the remaining N-2 arrays.
If 'intersection' is not defined in the language you're working in, or you require a more specific answer (i.e. you need the answer to 'how do you do the intersection'), then modify your question accordingly.
Without sorting, there isn't an optimized way to do this based on the information given (i.e. sorting and positioning all elements relative to each other, then iterating over the length of the arrays checking for elements present in all the arrays at once).
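A minimal sketch of that repeated intersection using Python sets, which rely on hashing rather than sorting (assumes at least one input array):

from functools import reduce

def intersect_all(arrays):
    # Intersect the first array with each remaining array in turn.
    common = reduce(lambda acc, arr: acc & set(arr), arrays[1:], set(arrays[0]))
    return next(iter(common), None)    # any one common element, or None if none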
The question asks whether there is a better way than hashing. There is no better way (i.e. better time complexity) than a hash, since the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one to an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, and then there is the log N factor for sorting each array.
Back to hashing: from a performance standpoint, you will get the best empirical performance by not processing each array fully, but processing only a block of elements from each array before proceeding to the next array. This takes advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases, when common elements appear in the same regions of the arrays (e.g. common elements at the start of all arrays). Worst-case behaviour is no worse than hashing each array in full; merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays
1: {1,1,2,3,4,5}
2: {1,3,6,7}
Then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have a count of 3 and we will never find 1 in this situation.
The problem also becomes more complex if we have input like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think we should give the output as 1, 1. The suggested approach fails in both cases.
Solution :
Read the first array and put it into a hashtable. If we find the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable holds the common elements, which have a count of 2.
But again, this approach will fail on the second input set I gave earlier.
I'd first start with the degenerate case: finding the common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare against the next array. This check would be performed N-1 times, or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, with divide and conquer: split the N arrays into the leaf nodes of a tree. The next level up the tree holds N/2 arrays of common elements, and so forth, until you have a single array at the top that is either filled or not. Either way, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N^2).
