Best logic to implement a comparison of two arrays

I have two arrays, arr1 and arr2. Both arrays contain some common items and many uncommon items. First, the common items should be removed from both arrays.
Then, each uncommon item in arr1 may be the sum of two or more values in arr2, or vice versa. If such a sum is found, the values involved must be removed from the respective arrays. Finally, the output should contain only the unmatched values from both arrays.
I need a way to do this calculation much faster.

I'm not going to give you the code that implements your logic, but I'd be happy to point you in the right direction.
I code in C++, so I'm going to answer with respect to that. If you want a different language, you can freely translate the approach.
To remove the common elements:
First sort arr1 and arr2, then run set_symmetric_difference on them (or set_difference in both directions, if you need to keep each array's leftovers separate). The difference step runs in linear time; the sort dominates at O(n log n). Then create two sets out of the remaining elements, say set1 and set2.
To remove pairs that sum up to an element in the other array
For this, you need to loop through all possible pairs of elements in arr1 and check whether the sum of each pair exists in set2. Do likewise for arr2, and remove the elements when a match is found.
This can be done in O(n²). If you don't want to create the extra sets, you can always trade performance for memory by looping through the pairs of arr1 and checking for the sum in arr2 with a binary search.
The time complexity then goes up to O(n² log n).
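Here's a rough sketch of that approach (in Python for brevity, since the idea is language-agnostic). The names arr1/arr2 come from the question; the greedy pair-removal policy, handling only pairs (not sums of three or more values), and assuming no duplicates within a single array are simplifications on my part:

def reduce_arrays(arr1, arr2):
    # drop the common items, keeping each array's leftovers separate
    s1, s2 = set(arr1), set(arr2)
    rem1 = sorted(v for v in arr1 if v not in s2)
    rem2 = sorted(v for v in arr2 if v not in s1)

    def remove_pair_sums(src, dst):
        # O(n^2): for every pair in src whose sum exists in dst,
        # greedily remove the pair from src and the matched sum from dst
        dst_set = set(dst)
        used = [False] * len(src)
        for i in range(len(src)):
            if used[i]:
                continue
            for j in range(i + 1, len(src)):
                if not used[j] and src[i] + src[j] in dst_set:
                    used[i] = used[j] = True
                    dst_set.remove(src[i] + src[j])
                    break
        return [v for k, v in enumerate(src) if not used[k]], sorted(dst_set)

    rem1, rem2 = remove_pair_sums(rem1, rem2)
    rem2, rem1 = remove_pair_sums(rem2, rem1)
    return rem1, rem2   # the unmatched values of each array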

Related

Algorithm to compare two arrays with user-defined criteria

I want to compare the values of two float arrays, but the criterion is a bit different from the usual ones. Here is how I define which array is the best.
Say we have two arrays named a and b. First, we compare the max value of the two arrays, and the array with the smaller max value wins. If they have the same max value, we divide each array into two parts: the first part is a[1:max_loc(a)-1] and the second is a[max_loc(a)+1:len(a)], and similarly for b. Then we apply the same criterion to a[1:max_loc(a)-1] and b[1:max_loc(b)-1] to see which has the smaller max value. If they have the same max value on these intervals, we divide them into smaller arrays and repeat the comparison. We also do the same for a[max_loc(a)+1:len(a)] and b[max_loc(b)+1:len(b)]. As soon as we find a smaller max value on corresponding intervals, the program ends and prints out the best array.
What's the algorithm to fulfill this comparison?
P.S. These two arrays may have different lengths.
Most of the time, what you're searching for is already somewhere on the Internet:
https://www.ics.uci.edu/~eppstein/161/960118.html
There you'll find two examples, with full explanations, that follow the divide-and-conquer idea (MergeSort and QuickSort).
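For what it's worth, here is one possible reading of the criterion from the question as a recursive function. This is only a sketch: the order of tie-breaking (left parts before right parts), how to treat an empty part, and the use of 0-based Python slicing are my own assumptions.

def compare(a, b):
    # returns -1 if a wins, 1 if b wins, 0 if there is no decision
    if not a and not b:
        return 0
    if not a:
        return -1                        # assumption: an empty part (no max at all) wins
    if not b:
        return 1
    ma, mb = max(a), max(b)
    if ma != mb:
        return -1 if ma < mb else 1      # smaller max value wins
    ia, ib = a.index(ma), b.index(mb)
    left = compare(a[:ia], b[:ib])       # same criterion on the parts before the max
    if left != 0:
        return left
    return compare(a[ia + 1:], b[ib + 1:])   # then on the parts after the max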

Possible to do quicksort without splitting into separate lists?

In many quicksort algorithms, the implementation involves placing the elements of the array into three groups: (less, pivot, more), and sometimes placing the groups back together. What if I do not want to use this? Is there a simpler approach to sorting a list with quicksort manually?
Basically, I plan to keep the array as one and swap the elements based on a partition (for example, given a list x and pivot position r, we could have the reference lists [0:r] and [r:len(x)]). However, as the sorting continues, how do I keep referencing each smaller "subarray"?
So this is my code, but I'm not sure how to continue from here:
x = [4,7,4,2,4,6,5]
# r is the pivot POSITION
r = len(x)-1
i = -1
for a in range(0, r+1):
    if x[a] <= x[r]:
        i += 1
        x[i], x[a] = x[a], x[i]
You can implement quicksort purely by swapping the locations of items in a list, rather than actually creating new lists.
But unless this is some sort of homework assignment, the best option is generally to use Python's built-in sort(), which is already highly optimized (it uses Timsort, an adaptive mergesort, rather than quicksort).
There's something not quite right here. You need two definitions: one for the partition step and one for the quicksort itself. The quicksort then needs some form of recursion (or an explicit loop) so that it keeps applying the partition to subarrays of the array, referenced by index ranges. Go and check the Wikipedia article to understand how this works.
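As a sketch of what that can look like, using index ranges instead of new lists and the same Lomuto-style partition as the snippet in the question:

def partition(x, lo, hi):
    i = lo - 1
    for a in range(lo, hi + 1):       # hi is the pivot position
        if x[a] <= x[hi]:
            i += 1
            x[i], x[a] = x[a], x[i]
    return i                          # final position of the pivot

def quicksort(x, lo=0, hi=None):
    if hi is None:
        hi = len(x) - 1
    if lo < hi:
        p = partition(x, lo, hi)
        quicksort(x, lo, p - 1)       # recurse on the "subarrays" by index
        quicksort(x, p + 1, hi)

x = [4, 7, 4, 2, 4, 6, 5]
quicksort(x)
print(x)                              # [2, 4, 4, 4, 5, 6, 7]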

Best algorithm to find N unique random numbers in VERY large array

I have an array with, for example, 1,000,000,000,000 elements (integers). What is the best approach to pick, for example, only 3 random and unique elements from this array? Elements must be unique within the whole array, not just within the list of N (3 in my example) picked elements.
I read about reservoir sampling, but it only provides a method to pick random elements, which may not be unique within the array.
If the odds of hitting a non-unique value are low, your best bet will be to select 3 random numbers from the array, then check each against the entire array to ensure it is unique - if not, choose another random sample to replace it and repeat the test.
If the odds of hitting a non-unique value are high, this increases the number of times you'll need to scan the array looking for uniqueness and makes the simple solution non-optimal. In that case you'll want to split the task of ensuring unique numbers from the task of making a random selection.
Sorting the array is the easiest way to find duplicates. Most sorting algorithms are O(n log n), but since your keys are integers Radix sort can potentially be faster.
Another possibility is to use a hash table to find duplicates, but that will require significant space. You can use a smaller hash table or Bloom filter to identify potential duplicates, then use another method to go through that smaller list.
import random   # for the final random pick

# counting-array approach: assumes all values lie in [MININT, MAXINT]
counts = [0] * (MAXINT - MININT + 1)
for value in Elements:
    counts[value - MININT] += 1        # offset so any value maps to a valid index
uniques = [MININT + i for i, c in enumerate(counts) if c == 1]
result = random.sample(uniques, 3)     # 3 distinct values that are unique in the array
I assume that you have a reasonable idea what fraction of the array values are likely to be unique. So you would know, for instance, that if you picked 1000 random array values, the odds are good that one is unique.
Step 1. Pick 3 random hash algorithms. They can all be the same algorithm, except that you add different integers to each as a first step.
Step 2. Scan the array. Hash each integer all three ways, and for each hash algorithm, keep track of the X lowest hash codes you get (you can use a priority queue for this), and keep a hash table of how many times each of those integers occurs.
Step 3. For each hash algorithm, look for a unique element in that bucket. If it is already picked in another bucket, find another. (Should be a rare boundary case.)
That is your set of three random unique elements. Every unique triple should have even odds of being picked.
(Note: For many purposes it would be fine to just use one hash algorithm and find 3 things from its list...)
This algorithm will succeed with high likelihood in one pass through the array. What is better yet is that the intermediate data structure that it uses is fairly small and is amenable to merging. Therefore this can be parallelized across machines for a very large data set.
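A minimal sketch of the simpler single-hash variant from the note above: one salted hash, keep the X candidates with the lowest hash codes together with their occurrence counts in a single pass, then sample k unique values from those candidates. The names elements, X and salt are placeholders, and X is assumed to be large enough that at least k candidates turn out to be unique.

import heapq
import random

def pick_unique_sample(elements, k=3, X=1000, salt=12345):
    counts = {}               # candidate value -> occurrences seen so far
    heap = []                 # max-heap over hash codes, via negation
    for v in elements:
        if v in counts:
            counts[v] += 1
            continue
        hv = hash((salt, v))
        if len(heap) < X:
            heapq.heappush(heap, (-hv, v))
            counts[v] = 1
        elif hv < -heap[0][0]:
            # this value's hash is lower than the worst candidate: swap them
            _, evicted = heapq.heapreplace(heap, (-hv, v))
            del counts[evicted]
            counts[v] = 1
    # candidates seen exactly once are unique in the whole array
    uniques = [v for v, c in counts.items() if c == 1]
    return random.sample(uniques, k)   # assumes len(uniques) >= k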

Find the maximum element which is common in n different arrays?

Earlier today I asked about a similar problem: finding the maximum element common to two arrays. I got a couple of good solutions there (Find the maximum element which is common in two arrays?).
Now it occurred to me, what if instead of two arrays we have to find the maximum element which is common in n different arrays?
Example:
array1 = [1,5,2,4,6,88,34]
array2 = [1,5,6,2,34]
array3 = [1,34]
array4 = [7,99,34]
Here the maximum element which is common in all the arrays is 34.
Is it a good idea to create a hashmap for each of array1, array2, ..., array(N-1) separately and then check every element of arrayN against each of these hashmaps, keeping track of the maximum element that is present in all of them?
Can we have better solutions than this?
For each array A_n:
    Add all elements in A_n to a hashset, H_n
Create a hashmap, M, which maps values to counts.
For each hashset H_n:
    For each value, v, in H_n:
        M[v]++
Go through M for the highest value with count == N
This will run in O(n) time and space, where n is the total number of elements in all arrays. It also properly deals with elements being duplicated in a single array, which you didn't mention but which might cause problems for some algorithms. If you know that you won't have duplicate elements in a single array you can skip the first step and add values to M directly from the arrays.
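A minimal Python sketch of this approach (function and variable names are just illustrative):

def max_common(arrays):
    counts = {}
    for arr in arrays:
        for v in set(arr):                 # hashset per array: dedupe first
            counts[v] = counts.get(v, 0) + 1
    common = [v for v, c in counts.items() if c == len(arrays)]
    return max(common) if common else None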
A hashmap alone is not ideal for keeping track of maximum elements; what you want is a max heap or a hashset.
You could create n max heaps and traverse all of them, taking the common maximum element, but this may be hard to implement.
Another approach is to find the intersection of all the sets and take the max from the intersection; in this case, you can use a hashset.
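For example, with the arrays from the question, the intersection approach is essentially a one-liner in Python:

arrays = [[1, 5, 2, 4, 6, 88, 34], [1, 5, 6, 2, 34], [1, 34], [7, 99, 34]]
common = set(arrays[0]).intersection(*arrays[1:])
print(max(common) if common else None)     # 34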

Find a common element within N arrays

If I have N arrays, what is the best (in time complexity; space is not important) way to find the common elements? You could just find one element and stop.
Edit: The elements are all Numbers.
Edit: These are unsorted. Please do not sort and scan.
This is not a homework problem. Somebody asked me this question a long time ago. He was using a hash to solve the problem and asked me if I had a better way.
Create a hash index, with elements as keys and counts as values. Loop through all values and update the count in the index. Afterwards, run through the index and check which elements have count == N. Looking up an element in the index is O(1); combined with looping through all M elements, the total is O(M).
If you want to keep order specific to a certain input array, loop over that array and test the element counts in the index in that order.
Some special cases:
if you know that the elements are (positive) integers with a maximum that is not too high, you could just use a normal array as the "hash" index to keep the counts, where the numbers are simply the array indices.
I've assumed that in each array each number occurs only once. Adapting it for more occurrences should be easy (set the i-th bit in the count for the i-th array, or only update if the current element count == i-1).
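A small sketch of the bitmask variant from the parenthesis above (set the i-th bit when a value appears in the i-th array, so repeated values within one array do no harm); the function name is just for illustration:

def find_common(arrays):
    seen = {}
    for i, arr in enumerate(arrays):
        for v in arr:
            seen[v] = seen.get(v, 0) | (1 << i)
    full = (1 << len(arrays)) - 1          # all N bits set
    for v, bits in seen.items():
        if bits == full:
            return v                       # one common element is enough
    return None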
EDIT: when I answered the question, it did not yet include the part asking for "a better way" than hashing.
The most direct method is to intersect the first 2 arrays and then intersect the result with each of the remaining N-2 arrays.
If 'intersection' is not defined in the language you're working in, or you require a more specific answer (i.e. you need the answer to "how do you do the intersection"), then modify your question to say so.
Without sorting, there isn't an optimized way to do this based on the information given (i.e. sorting and positioning all elements relative to each other, then iterating over the arrays and checking for elements present in all of them at once).
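A rough sketch of the successive-intersection idea, assuming arrays is a list of number lists:

from functools import reduce

common = reduce(lambda acc, arr: acc & set(arr), arrays[1:], set(arrays[0]))
element = next(iter(common), None)      # any one common element, or None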
The question asks whether there is a better way than hashing. There is no better way (i.e. better time complexity), since the time to hash each element is typically constant. Empirical performance is also favorable, particularly if the range of values can be mapped one-to-one onto an array maintaining counts. The time is then proportional to the number of elements across all the arrays. Sorting will not give better complexity, since it still needs to visit each element at least once, plus the log N factor for sorting each array.
Back to hashing, from a performance standpoint, you will get the best empirical performance by not processing each array fully, but processing only a block of elements from each array before proceeding onto the next array. This will take advantage of the CPU cache. It also results in fewer elements being hashed in favorable cases when common elements appear in the same regions of the array (e.g. common elements at the start of all arrays.) Worst case behaviour is no worse than hashing each array in full - merely that all elements are hashed.
I don't think the approach suggested by catchmeifyoutry will work.
Let us say you have two arrays:
1: {1,1,2,3,4,5}
2: {1,3,6,7}
Then the answer should be 1 and 3. But if we use the hashtable approach, 1 will have a count of 3 and we will never find 1 in this situation.
The problem becomes even more complex if we have input like this:
1: {1,1,1,2,3,4}
2: {1,1,5,6}
Here I think we should give the output as 1,1. The suggested approach fails in both cases.
Solution: read the first array and put it into a hashtable; if we see the same key again, don't increment the counter. Read the second array in the same manner. Now the hashtable holds the common elements, which have a count of 2.
But again, this approach will fail on the second input set I gave earlier.
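If the repeated-element output from the second example really is wanted, one option (a different technique from the ones discussed above) is a multiset intersection, for instance with Python's collections.Counter:

from collections import Counter
from functools import reduce

arrays = [[1, 1, 1, 2, 3, 4], [1, 1, 5, 6]]   # the second input set above
common = reduce(lambda a, b: a & b, (Counter(x) for x in arrays))
print(list(common.elements()))                # [1, 1]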
I'd first start with the degenerate case, finding common elements between 2 arrays (more on this later). From there I'll have a collection of common values which I will use as an array itself and compare it against the next array. This check would be performed N-1 times or until the "carry" array of common elements drops to size 0.
One could speed this up, I'd imagine, by divide-and-conquer, splitting the N arrays into the end nodes of a tree. The next level up the tree is N/2 common element arrays, and so forth and so on until you have an array at the top that is either filled or not. In either case, you'd have your answer.
Without sorting and scanning, the best operational speed you'll get for comparing 2 arrays for common elements is O(N²).
