Algorithm get a new list containing no duplicated item by adding any 2 elements in a big array - algorithm

I can only think of this naive algorithm. Any better way? C/C++, Ruby ,Haskell is OK.
arry = [1,5,.....4569895] //1000000 elements ,sorted , no duplicated
newArray = Hash.new
for (i = 0 ; i < arry.length ;i++ )
{
for (j = 0 ; j < arry.length ;j ++ )
{
elem = arry[i] + arry[j]
if (! newArray.key?(elem))
{
newArray [elem] = arry[i] + arry[j]
}
}
}
EDIT : sorry. I have discrete value in the array , instead of [1..1000000]

It would be more efficient to separate the algorithm into two distinct steps. (Warning: pseudocode ahead)
First create n-1 lists by adding the rest of the elements to the ith element. This can be done in parallel for each list. Note that the resulting lists will be sorted.
newArray = array(array.length);
for (i = 0 ; i < array.length ;i++ ) {
newArray[i] = array(array.length - i - 1);
for (j = 0; j < array.length - i; j++) {
newArray[i][j] = array[i] + array[j + i];
}
}
Second use merge sort in to merge the resulted lists. You can do this in parallel, e.g. merge newArray[0] - newArray[i], newArray[2] - newArray[1-i], ... and then again until you only have one list.

If the condition says that you should be able to add any item in the range, then the only way i can think of is to check if the sum is not yet in the result list. Since for any number x, there are x different additions that lead to x. (Or x/2 if you think that 1 + 2 and 2 + 1 is the same addition).

There is one obvious optimization: make the second loop start at the indice i, that way you will avoid having x+y and y+x.
Then if you don't want to use a set, you could use the fact that the items are sorted, so you could build N lists, and merge them while removing the duplicates.

I'm afraid the best worst-case time complexity is O(n2). For input {20, 21, 22, ...}, you won't get any duplicate adding these numbers. Assuming hash insertions are O(1), you already have the best algorithm...

Related

How to get original array from random shuffle of an array

I was asked in an interview today below question. I gave O(nlgn) solution but I was asked to give O(n) solution. I could not come up with O(n) solution. Can you help?
An input array is given like [1,2,4] then every element of it is doubled and
appended into the array. So the array now looks like [1,2,4,2,4,8]. How
this array is randomly shuffled. One possible random arrangement is
[4,8,2,1,2,4]. Now we are given this random shuffled array and we want to
get original array [1,2,4] in O(n) time.
The original array can be returned in any order. How can I do it?
Here's an O(N) Java solution that could be improved by first making sure that the array is of the proper form. For example it shouldn't accept [0] as an input:
import java.util.*;
class Solution {
public static int[] findOriginalArray(int[] changed) {
if (changed.length % 2 != 0)
return new int[] {};
// set Map size to optimal value to avoid rehashes
Map<Integer,Integer> count = new HashMap<>(changed.length*100/75);
int[] original = new int[changed.length/2];
int pos = 0;
// count frequency for each number
for (int n : changed) {
count.put(n, count.getOrDefault(n,0)+1);
}
// now decide which go into the answer
for (int n : changed) {
int smallest = n;
for (int m=n; m > 0 && count.getOrDefault(m,0) > 0; m = m/2) {
//System.out.println(m);
smallest = m;
if (m % 2 != 0) break;
}
// trickle up from smallest to largest while count > 0
for (int m=smallest, mm = 2*m; count.getOrDefault(mm,0) > 0; m = mm, mm=2*mm){
int ct = count.getOrDefault(mm,0);
while (count.get(m) > 0 && ct > 0) {
//System.out.println("adding "+m);
original[pos++] = m;
count.put(mm, ct -1);
count.put(m, count.get(m) - 1);
ct = count.getOrDefault(mm,0);
}
}
}
// check for incorrect format
if (count.values().stream().anyMatch(x -> x > 0)) {
return new int[] {};
}
return original;
}
public static void main(String[] args) {
int[] changed = {1,2,4,2,4,8};
System.out.println(Arrays.toString(changed));
System.out.println(Arrays.toString(findOriginalArray(changed)));
}
}
But I've tried to keep it simple.
The output is NOT guaranteed to be sorted. If you want it sorted it's going to cost O(NlogN) inevitably unless you use a Radix sort or something similar (which would make it O(NlogE) where E is the max value of the numbers you're sorting and logE the number of bits needed).
Runtime
This may not look that it is O(N) but you can see that it is because for every loop it will only find the lowest number in the chain ONCE, then trickle up the chain ONCE. Or said another way, in every iteration it will do O(X) iterations to process X elements. What will remain is O(N-X) elements. Therefore, even though there are for's inside for's it is still O(N).
An example execution can be seen with [64,32,16,8,4,2].
If this where not O(N) if you print out each value that it traverses to find the smallest you'd expect to see the values appear over and over again (for example N*(N+1)/2 times).
But instead you see them only once:
finding smallest 64
finding smallest 32
finding smallest 16
finding smallest 8
finding smallest 4
finding smallest 2
adding 2
adding 8
adding 32
If you're familiar with the Heapify algorithm you'll recognize the approach here.
def findOriginalArray(self, changed: List[int]) -> List[int]:
size = len(changed)
ans = []
left_elements = size//2
#IF SIZE IS ODD THEN RETURN [] NO SOLN. IS POSSIBLE
if(size%2 !=0):
return ans
#FREQUENCY DICTIONARY given array [0,0,2,1] my map will be: {0:2,2:1,1:1}
d = {}
for i in changed:
if(i in d):
d[i]+=1
else:
d[i] = 1
# CHECK THE EDGE CASE OF 0
if(0 in d):
count = d[0]
half = count//2
if((count % 2 != 0) or (half > left_elements)):
return ans
left_elements -= half
ans = [0 for i in range(half)]
#CHECK REST OF THE CASES : considering the values will be 10^5
for i in range(1,50001):
if(i in d):
if(d[i] > 0):
count = d[i]
if(count > left_elements):
ans = []
break
left_elements -= d[i]
for j in range(count):
ans.append(i)
if(2*i in d):
if(d[2*i] < count):
ans = []
break
else:
d[2*i] -= count
else:
ans = []
break
return ans
I have a simple idea which might not be the best, but I could not think of a case where it would not work. Having the array A with the doubled elements and randomly shuffled, keep a helper map. Process each element of the array and, each time you find a new element, add it to the map with the value 0. When an element is processed, increment map[i] and decrement map[2*i]. Next you iterate over the map and print the elements that have a value greater than zero.
A simple example, say that the vector is:
[1, 2, 3]
And the doubled/shuffled version is:
A = [3, 2, 1, 4, 2, 6]
When processing 3, first add the keys 3 and 6 to the map with value zero. Increment map[3] and decrement map[6]. This way, map[3] = 1 and map[6] = -1. Then for the next element map[2] = 1 and map[4] = -1 and so forth. The final state of the map in this example would be map[1] = 1, map[2] = 1, map[3] = 1, map[4] = -1, map[6] = 0, map[8] = -1, map[12] = -1.
Then you just process the keys of the map and, for each key with a value greater than zero, add it to the output. There are certainly more efficient solutions, but this one is O(n).
In C++, you can try this.
With time is O(N + KlogK) where N is the length of input, and K is the number of unique elements in input.
class Solution {
public:
vector<int> findOriginalArray(vector<int>& input) {
if (input.size() % 2) return {};
unordered_map<int, int> m;
for (int n : input) m[n]++;
vector<int> nums;
for (auto [n, cnt] : m) nums.push_back(n);
sort(begin(nums), end(nums));
vector<int> out;
for (int n : nums) {
if (m[2 * n] < m[n]) return {};
for (int i = 0; i < m[n]; ++i, --m[2 * n]) out.push_back(n);
}
return out;
}
};
Not so clear about the space complexity required in the question, so this is my top-of-the-mind attempt to this question if this requires O(n) time complexity.
If the length of the input array is not even, then its wrong !!
Create a map, add the elements of the input array to it.
Divide each element in the input array by 2 and check if that value exists in the map. If it exists, add it to the array (slice) orig.
There is a chance we have added duplicate values to this original array, clean it!!
Here is a sample go code:
https://go.dev/play/p/w4mm-rloHyi
I am sure we can optimize this code in a lot of ways for space complexities. But its O(n) time complexity.

Find max sum of elements in an array ( with twist)

Given a array with +ve and -ve integer , find the maximum sum such that you are not allowed to skip 2 contiguous elements ( i.e you have to select at least one of them to move forward).
eg :-
10 , 20 , 30, -10 , -50 , 40 , -50, -1, -3
Output : 10+20+30-10+40-1 = 89
This problem can be solved using Dynamic Programming approach.
Let arr be the given array and opt be the array to store the optimal solutions.
opt[i] is the maximum sum that can be obtained starting from element i, inclusive.
opt[i] = arr[i] + (some other elements after i)
Now to solve the problem we iterate the array arr backwards, each time storing the answer opt[i].
Since we cannot skip 2 contiguous elements, either element i+1 or element i+2 has to be included
in opt[i].
So for each i, opt[i] = arr[i] + max(opt[i+1], opt[i+2])
See this code to understand:
int arr[n]; // array of given numbers. array size = n.
nput(arr, n); // input the array elements (given numbers)
int opt[n+2]; // optimal solutions.
memset(opt, 0, sizeof(opt)); // Initially set all optimal solutions to 0.
for(int i = n-1; i >= 0; i--) {
opt[i] = arr[i] + max(opt[i+1], opt[i+2]);
}
ans = max(opt[0], opt[1]) // final answer.
Observe that opt array has n+2 elements. This is to avoid getting illegal memory access exception (memory out of bounds) when we try to access opt[i+1] and opt[i+2] for the last element (n-1).
See the working implementation of the algorithm given above
Use a recurrence that accounts for that:
dp[i] = max(dp[i - 1] + a[i], <- take two consecutives
dp[i - 2] + a[i], <- skip a[i - 1])
Base cases left as an exercise.
If you see a +ve integer add it to the sum. If you see a negative integer, then inspect the next integer pick which ever is maximum and add it to the sum.
10 , 20 , 30, -10 , -50 , 40 , -50, -1, -3
For this add 10, 20, 30, max(-10, -50), 40 max(-50, -1) and since there is no element next to -3 discard it.
The last element will go to sum if it was +ve.
Answer:
I think this algorithm will help.
1. Create a method which gives output the maximum sum of particular user input array say T[n], where n denotes the total no. of elements.
2. Now this method will keep on adding array elements till they are positive. As we want to maximize the sum and there is no point in dropping positive elements behind.
3. As soon as our method encounters a negative element, it will transfer all consecutive negative elements to another method which create a new array say N[i] such that this array will contain all the consecutive negative elements that we encountered in T[n] and returns N[i]'s max output.
In this way our main method is not affected and its keep on adding positive elements and whenever it encounters negative element, it instead of adding their real values adds the net max output of that consecutive array of negative elements.
for example: T[n] = 29,34,55,-6,-5,-4,6,43,-8,-9,-4,-3,2,78 //here n=14
Main Method Working:
29+34+55+(sends data & gets value from Secondary method of array [-6,-5,-4])+6+43+(sends data & gets value from Secondary method of array [-8,-9,-4,-3])+2+78
Process Terminates with max output.
Secondary Method Working:
{
N[i] = gets array from Main method or itself as and when required.
This is basically a recursive method.
say N[i] has elements like N1, N2, N3, N4, etc.
for i>=3:
Now choice goes like this.
1. If we take N1 then we can recurse the left off array i.e. N[i-1] which has all elements except N1 in same order. Such that the net max output will be
N1+(sends data & gets value from Secondary method of array N[i-1] recursively)
2. If we doesn't take N1, then we cannot skip N2. So, Now algorithm is like 1st choice but starting with N2. So max output in this case will be
N2+(sends data & gets value from Secondary method of array N[i-2] recursively).
Here N[i-2] is an array containing all N[i] elements except N1 & N2 in same order.
Termination: When we are left with the array of size one ( for N[i-2] ) then we have to choose that particular value as no option.
The recursions will finally yield the max outputs and we have to finally choose the output of that choice which is more.
and redirect the max output to wherever required.
for i=2:
we have to choose the value which is bigger
for i=1:
We can surely skip that value.
So max output in this case will be 0.
}
I think this answer will help to you.
Given array:
Given:- 10 20 30 -10 -50 40 -50 -1 -3
Array1:-10 30 60 50 10 90 40 89 86
Array2:-10 20 50 40 0 80 30 79 76
Take the max value of array1[n-1],array1[n],array2[n-1],array2[n] i.e 89(array1[n-1])
Algorithm:-
For the array1 value assign array1[0]=a[0],array1=a[0]+a[1] and array2[0]=a[0],array2[1]=a[1].
calculate the array1 value from 2 to n is max of sum of array1[i-1]+a[i] or array1[i-2]+a[i].
for loop from 2 to n{
array1[i]=max(array1[i-1]+a[i],array1[i-2]+a[i]);
}
similarly for array2 value from 2 to n is max of sum of array2[i-1]+a[i] or array2[i-2]+a[i].
for loop from 2 to n{
array2[i]=max(array2[i-1]+a[i],array2[i-2]+a[i]);
}
Finally find the max value of array1[n-1],array[n],array2[n-1],array2[n];
int max(int a,int b){
return a>b?a:b;
}
int main(){
int a[]={10,20,30,-10,-50,40,-50,-1,-3};
int i,n,max_sum;
n=sizeof(a)/sizeof(a[0]);
int array1[n],array2[n];
array1[0]=a[0];
array1[1]=a[0]+a[1];
array2[0]=a[0];
array2[1]=a[1];
for loop from 2 to n{
array1[i]=max(array1[i-1]+a[i],array1[i-2]+a[i]);
array2[i]=max(array2[i-1]+a[i],array2[i-2]+a[i]);
}
--i;
max_sum=max(array1[i],array1[i-1]);
max_sum=max(max_sum,array2[i-1]);
max_sum=max(max_sum,array2[i]);
printf("The max_sum is %d",max_sum);
return 0;
}
Ans: The max_sum is 89
public static void countSum(int[] a) {
int count = 0;
int skip = 0;
int newCount = 0;
if(a.length==1)
{
count = a[0];
}
else
{
for(int i:a)
{
newCount = count + i;
if(newCount>=skip)
{
count = newCount;
skip = newCount;
}
else
{
count = skip;
skip = newCount;
}
}
}
System.out.println(count);
}
}
Let the array be of size N, indexed as 1...N
Let f(n) be the function, that provides the answer for max sum of sub array (1...n), such that no two left over elements are consecutive.
f(n) = max (a[n-1] + f(n-2), a(n) + f(n-1))
In first option, which is - {a[n-1] + f(n-2)}, we are leaving the last element, and due to condition given in question selecting the second last element.
In the second option, which is - {a(n) + f(n-1)} we are selecting the last element of the subarray, so we have an option to select/deselect the second last element.
Now starting from the base case :
f(0) = 0 [Subarray (1..0) doesn't exist]
f(1) = (a[1] > 0 ? a[1] : 0); [Subarray (1..1)]
f(2) = max( a(2) + 0, a[1] + f(1)) [Choosing atleast one of them]
Moving forward we can calculate any f(n), where n = 1...N, and store them to calculate next results. And yes, obviously, the case f(N) will give us the answer.
Time complexity o(n)
Space complexity o(n)
n = arr.length().
Append a 0 at the end of the array to handle boundary case.
ans: int array of size n+1.
ans[i] will store the answer for array a[0...i] which includes a[i] in the answer sum.
Now,
ans[0] = a[0]
ans[1] = max(a[1], a[1] + ans[0])
for i in [2,n-1]:
ans[i] = max(ans[i-1] , ans[i-2]) + a[i]
Final answer would be a[n]
If you want to avoid using Dynamic Programming
To find the maximum sum, first, you've to add all the positive
numbers.
We'll be skipping only negative elements. Since we're not
allowed to skip 2 contiguous elements, we will put all contiguous
negative elements in a temp array, and can figure out the maximum sum
of alternate elements using sum_odd_even function as defined below.
Then we can add the maximum of all such temp arrays to our sum of all
positive numbers. And the final sum will give us the desired output.
Code:
def sum_odd_even(arr):
sum1 = sum2 = 0
for i in range(len(arr)):
if i%2 == 0:
sum1 += arr[i]
else:
sum2 += arr[i]
return max(sum1,sum2)
input = [10, 20, 30, -10, -50, 40, -50, -1, -3]
result = 0
temp = []
for i in range(len(input)):
if input[i] > 0:
result += input[i]
if input[i] < 0 and i != len(input)-1:
temp.append(input[i])
elif input[i] < 0:
temp.append(input[i])
result += sum_odd_even(temp)
temp = []
else:
result += sum_odd_even(temp)
temp = []
print result
Simple Solution: Skip with twist :). Just skip the smallest number in i & i+1 if consecutive -ve. Have if conditions to check that till n-2 elements and check for the last element in the end.
int getMaxSum(int[] a) {
int sum = 0;
for (int i = 0; i <= a.length-2; i++) {
if (a[i]>0){
sum +=a[i];
continue;
} else if (a[i+1] > 0){
i++;
continue;
} else {
sum += Math.max(a[i],a[i+1]);
i++;
}
}
if (a[a.length-1] > 0){
sum+=a[a.length-1];
}
return sum;
}
The correct recurrence is as follow:
dp[i] = max(dp[i - 1] + a[i], dp[i - 2] + a[i - 1])
The first case is the one we pick the i-th element. The second case is the one we skip the i-th element. In the second case, we must pick the (i-1)th element.
The problem of IVlad's answer is that it always pick i-th element, which can lead to incorrect answer.
This question can be solved using include,exclude approach.
For first element, include = arr[0], exclude = 0.
For rest of the elements:
nextInclude = arr[i]+max(include, exclude)
nextExclude = include
include = nextInclude
exclude = nextExclude
Finally, ans = Math.max(include,exclude).
Similar questions can be referred at (Not the same)=> https://www.youtube.com/watch?v=VT4bZV24QNo&t=675s&ab_channel=Pepcoding.

Algorithm to partition/distribute sum between buckets in all unique ways

The Problem
I need an algorithm that does this:
Find all the unique ways to partition a given sum across 'buckets' not caring about order
I hope I was clear reasonably coherent in expressing myself.
Example
For the sum 5 and 3 buckets, what the algorithm should return is:
[5, 0, 0]
[4, 1, 0]
[3, 2, 0]
[3, 1, 1]
[2, 2, 1]
Disclaimer
I'm sorry if this question might be a dupe, but I don't know exactly what these sort of problems are called. Still, I searched on Google and SO using all wordings that I could think of, but only found results for distributing in the most even way, not all unique ways.
Its bit easier for me to code few lines than writing a 5-page essay on algorithm.
The simplest version to think of:
vector<int> ans;
void solve(int amount, int buckets, int max){
if(amount <= 0) { printAnswer(); return;}
if(amount > buckets * max) return; // we wont be able to fulfill this request anymore
for(int i = max; i >= 1; i--){
ans.push_back(i);
solve(amount-i, buckets-1, i);
ans.pop_back();
}
}
void printAnswer(){
for(int i = 0; i < ans.size(); i++) printf("%d ", ans[i]);
for(int i = 0; i < all_my_buckets - ans.size(); i++) printf("0 ");
printf("\n");
}
Its also worth improving to the point where you stack your choices like solve( amount-k*i, buckets-k, i-1) - so you wont create too deep recurrence. (As far as I know the stack would be of size O(sqrt(n)) then.
Why no dynamic programming?
We dont want to find count of all those possibilities, so even if we reach the same point again, we would have to print every single number anyway, so the complexity will stay the same.
I hope it helps you a bit, feel free to ask me any question
Here's something in Haskell that relies on this answer:
import Data.List (nub, sort)
parts 0 = []
parts n = nub $ map sort $ [n] : [x:xs | x <- [1..n`div`2], xs <- parts(n - x)]
partitions n buckets =
let p = filter (\x -> length x <= buckets) $ parts n
in map (\x -> if length x == buckets then x else addZeros x) p
where addZeros xs = xs ++ replicate (buckets - length xs) 0
OUTPUT:
*Main> partitions 5 3
[[5,0,0],[1,4,0],[1,1,3],[1,2,2],[2,3,0]]
If there are only three buckets this wud be the simplest code.
for(int i=0;i<=5;i++){
for(int j=0;j<=5-i&&j<=i;j++){
if(5-i-j<=i && 5-i-j<=j)
System.out.println("["+i+","+j+","+(5-i-j)+"]");
}
}
A completely different method, but if you don't care about efficiency or optimization, you could always use the old "bucket-free" partition algorithms. Then, you could filter the search by checking the number of zeroes in the answers.
For example [1,1,1,1,1] would be ignored since it has more than 3 buckets, but [2,2,1,0,0] would pass.
This is called an integer partition.
Fast Integer Partition Algorithms is a comprehensive paper describing all of the fastest algorithms for performing an integer partition.
Just adding my approach here along with the others'. It's written in Python, so it's practically like pseudocode.
My first approach worked, but it was horribly inefficient:
def intPart(buckets, balls):
return uniqify(_intPart(buckets, balls))
def _intPart(buckets, balls):
solutions = []
# base case
if buckets == 1:
return [[balls]]
# recursive strategy
for i in range(balls + 1):
for sol in _intPart(buckets - 1, balls - i):
cur = [i]
cur.extend(sol)
solutions.append(cur)
return solutions
def uniqify(seq):
seen = set()
sort = [list(reversed(sorted(elem))) for elem in seq]
return [elem for elem in sort if str(elem) not in seen and not seen.add(str(elem))]
Here's my reworked solution. It completely avoids the need to 'uniquify' it by the tracking the balls in the previous bucket using the max_ variable. This sorts the lists and prevents any dupes:
def intPart(buckets, balls, max_ = None):
# init vars
sols = []
if max_ is None:
max_ = balls
min_ = max(0, balls - max_)
# assert stuff
assert buckets >= 1
assert balls >= 0
# base cases
if (buckets == 1):
if balls <= max_:
sols.append([balls])
elif balls == 0:
sol = [0] * buckets
sols.append(sol)
# recursive strategy
else:
for there in range(min_, balls + 1):
here = balls - there
ways = intPart(buckets - 1, there, here)
for way in ways:
sol = [here]
sol.extend(way)
sols.append(sol)
return sols
Just for comprehensiveness, here's another answer stolen from MJD written in Perl:
#!/usr/bin/perl
sub part {
my ($n, $b, $min) = #_;
$min = 0 unless defined $min;
# base case
if ($b == 0) {
if ($n == 0) { return ([]) }
else { return () }
}
my #partitions;
for my $first ($min .. $n) {
my #sub_partitions = part($n - $first, $b-1, $first);
for my $sp (#sub_partitions) {
push #partitions, [$first, #$sp];
}
}
return #partitions;
}

How can I efficiently determine if two lists contain elements ordered in the same way?

I have two ordered lists of the same element type, each list having at most one element of each value (say ints and unique numbers), but otherwise with no restrictions (one may be a subset of the other, they may be completely disjunct, or share some elements but not others).
How do I efficiently determine if A is ordering any two items in a different way than B is? For example, if A has the items 1, 2, 10 and B the items 2, 10, 1, the property would not hold as A lists 1 before 10 but B lists it after 10. 1, 2, 10 vs 2, 10, 5 would be perfectly valid however as A never mentions 5 at all, I cannot rely on any given sorting rule shared by both lists.
You can get O(n) as follows. First, find the intersection of the two sets using hashing. Second, test whether A and B are identical if you only consider elements from the intersection.
My approach would be to first make sorted copies of A and B which also record the positions of elements in the original lists:
for i in 1 .. length(A):
Apos[i] = (A, i)
sortedApos = sort(Apos[] by first element of each pair)
for i in 1 .. length(B):
Bpos[i] = (B, i)
sortedBpos = sort(Bpos[] by first element of each pair)
Now find those elements in common using a standard list merge that records the positions in both A and B of the shared elements:
i = 1
j = 1
shared = []
while i <= length(A) && j <= length(B)
if sortedApos[i][1] < sortedBpos[j][1]
++i
else if sortedApos[i][1] > sortedBpos[j][1]
++j
else // They're equal
append(shared, (sortedApos[i][2], sortedBpos[j][2]))
++i
++j
Finally, sort shared by its first element (position in A) and check that all its second elements (positions in B) are increasing. This will be the case iff the elements common to A and B appear in the same order:
sortedShared = sort(shared[] by first element of each pair)
for i = 2 .. length(sortedShared)
if sortedShared[i][2] < sortedShared[i-1][2]
return DIFFERENT
return SAME
Time complexity: 2*(O(n) + O(nlog n)) + O(n) + O(nlog n) + O(n) = O(nlog n).
General approach: store all the values and their positions in B as keys and values in a HashMap. Iterate over the values in A and look them up in B's HashMap to get their position in B (or null). If this position is before the largest position value you've seen previously, then you know that something in B is in a different order than A. Runs in O(n) time.
Rough, totally untested code:
boolean valuesInSameOrder(int[] A, int[] B)
{
Map<Integer, Integer> bMap = new HashMap<Integer, Integer>();
for (int i = 0; i < B.length; i++)
{
bMap.put(B[i], i);
}
int maxPosInB = 0;
for (int i = 0; i < A.length; i++)
{
if(bMap.containsKey(A[i]))
{
int currPosInB = bMap.get(A[i]);
if (currPosInB < maxPosInB)
{
// B has something in a different order than A
return false;
}
else
{
maxPosInB = currPosInB;
}
}
}
// All of B's values are in the same order as A
return true;
}

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is a O(n) solution if you know what the possible domain of input is. For example if your input array contains numbers between 0 to 100, consider the following code.
bool flags[100];
for(int i = 0; i < 100; i++)
flags[i] = false;
for(int i = 0; i < input_size; i++)
if(flags[input_array[i]])
return input_array[i];
else
flags[input_array[i]] = true;
Of course there is the additional memory but this is the fastest.
OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};
int signed_1(n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain sum without overflow
for (int i=0; i<N-2; ++i)
{
S1 += signed_1(A[i]) - signed_1(i);
S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
S1 += signed_1(A[i]);
S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
p = (S1+S2)/2;
q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate sums, the algorithm does not stand for overflow because of abs().
if abs(S1)==abs(S2) then the algorithm fails, though this value will still be the difference between p and q (i.e. abs(p - q) == abs(S1)).
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)
so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Simple math remains:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it as math classes teach to solve square equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.
You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2
So if you substract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * n - 3 * p * q = (n - 3)! * p * q
prod(Array) / (n - 3)! = y = p * q
Your now got these terms:
x = p + q
y = p * q
=> y(p + q) = x(p * q)
If you transform this term, you should be able to calculate p and q
Insert each element into a set/hashtable, first checking if its are already in it.
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + two missing numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The program leaves the array unchanged (the array should be writeable but its values are restored on exit).
It works for array sizes upto INT_MAX (on 64-bit systems it is 9223372036854775807).
suppose array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n]*a[n]
sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose repeated elements are = X and Y
so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser and a trivial solution (i.e. HashMap, Sort, etc) no matter how good they are would be boring.
As the numbers are integers, they have constant bit size (i.e. 32). Let us assume we are working with 4 bit integers right now. We look for A and B which are the duplicate numbers.
We need 4 buckets, each for one bit. Each bucket contains numbers which its specific bit is 1. For example bucket 1 gets 2, 3, 4, 7, ...:
Bucket 0 : Sum ( x where: x & 2 power 0 == 0 )
...
Bucket i : Sum ( x where: x & 2 power i == 0 )
We know what would be the sum of each bucket if there was no duplicate. I consider this as prior knowledge.
Once above buckets are generated, a bunch of them would have values more than expected. By constructing the number from buckets we will have (A OR B for your information).
We can calculate (A XOR B) as follows:
A XOR B = Array[i] XOR Array[i-1] XOR ... 0, XOR n-3 XOR n-2 ... XOR 0
Now going back to buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bit).
For the buckets that have only one number we can extract the number num = (sum - expected sum of bucket). However, we should be good only if we can find one of the duplicate numbers so if we have at least one bit in A XOR B, we've got the answer.
But what if A XOR B is zero?
Well this case is only possible if both duplicate numbers are the same number, which then our number is the answer of A OR B.
Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. something like this:
int count [10];
for (int i = 0; i < arraylen; i++) {
count[array[i]]++;
}
Then just search your array for any numbers greater than 1. Those are the items with duplicates. Only requires one pass across the original array and one pass across the count array.
Here's implementation in Python of #eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used then it is requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrary large integer sequences (it reads one integer at a time therefore a whole sequence doesn't require to be in memory).
def two_repeated(iterable):
s1, s2 = 0, 0
for i, j in enumerate(iterable):
s1 += j - i # number_of_digits(s1) ~ 2 * number_of_digits(i)
s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i)
s1 += (i - 1) + i
s2 += (i - 1)**2 + i**2
p = (s1 - int((2*s2 - s1**2)**.5)) // 2
# `Decimal().sqrt()` could replace `int()**.5` for really large integers
# or any function to compute integer square root
return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
"""Return the only two duplicates from `arr`.
>>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
"""
n = len(arr)
assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
assert len(set(arr)) == (n - 2) # number of unique items
s1 = (n-2) + (n-1) # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
for i, j in enumerate(arr):
s1 += j - i
s2 += j*j - i*i
"""
s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
= sum(arr) - sum(range(n-2))
= sum(range(n-2)) + p + q - sum(range(n-2))
= p + q
"""
assert s1 == (sum(arr) - sum(range(n-2)))
"""
s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
= sum(i*i for i in arr) - sum(i*i for i in range(n-2))
= p*p + q*q
"""
assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
"""
s1 = p+q
-> s1**2 = (p+q)**2
-> s1**2 = p*p + 2*p*q + q*q
-> s1**2 - (p*p + q*q) = 2*p*q
s2 = p*p + q*q
-> p*q = (s1**2 - s2)/2
Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
that p and q are roots of x**2 - B*x + C = 0
-> p = (B + sqrtD) / 2
-> q = (B - sqrtD) / 2
where sqrtD = sqrt(B**2 - 4*C)
-> p = (s1 + sqrt(2*s2 - s1**2))/2
"""
sqrtD = (2*s2 - s1**2)**.5
assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
sqrtD = int(sqrtD)
assert (s1 - sqrtD) % 2 == 0 # even
p = (s1 - sqrtD) // 2
q = s1 - p
assert q == ((s1 + sqrtD) // 2)
assert sqrtD == (q - p)
return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
Since a range is specified, you can perform radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n)
You can use simple nested for loop
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
for (int j = i + 1; j < numArray.Length; j++)
{
if (numArray[i] == numArray[j])
{
//DO SOMETHING
}
}
*OR you can filter the array and use recursive function if you want to get the count of occurrences*
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1,4 };
int[] myNewArray = null;
int a = 1;
void GetDuplicates(int[] array)
for (int i = 0; i < array.Length; i++)
{
for (int j = i + 1; j < array.Length; j++)
{
if (array[i] == array[j])
{
a += 1;
}
}
Console.WriteLine(" {0} occurred {1} time/s", array[i], a);
IEnumerable<int> num = from n in array where n != array[i] select n;
myNewArray = null;
a = 1;
myNewArray = num.ToArray() ;
break;
}
GetDuplicates(myNewArray);
answer to 18..
you are taking an array of 9 and elements are starting from 0..so max ele will be 6 in your array. Take sum of elements from 0 to 6 and take sum of array elements. compute their difference (say d). This is p + q. Now take XOR of elements from 0 to 6 (say x1). Now take XOR of array elements (say x2). x2 is XOR of all elements from 0 to 6 except two repeated elements since they cancel out each other. now for i = 0 to 6, for each ele of array, say p is that ele a[i] so you can compute q by subtracting this ele from the d. do XOR of p and q and XOR them with x2 and check if x1==x2. likewise doing for all elements you will get the elements for which this condition will be true and you are done in O(n). Keep coding!
check this out ...
O(n) time and O(1) space complexity
for(i=0;i< n;i++)
xor=xor^arr[i]
for(i=1;i<=n-3;i++)
xor=xor^i;
So in the given example you will get the xor of 3 and 5
xor=xor & -xor //Isolate the last digit
for(i = 0; i < n; i++)
{
if(arr[i] & xor)
x = x ^ arr[i];
else
y = y ^ arr[i];
}
for(i = 1; i <= n-3; i++)
{
if(i & xor)
x = x ^ i;
else
y = y ^ i;
}
x and y are your answers
For each number: check if it exists in the rest of the array.
Without sorting you're going to have a keep track of numbers you've already visited.
in psuedocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
if number not already in unique numbers list
add it to the unique numbers list
else
return that number as it is a duplicate
end if
end for each
How about this:
for (i=0; i<n-1; i++) {
for (j=i+1; j<n; j++) {
if (a[i] == a[j]) {
printf("%d appears more than once\n",a[i]);
break;
}
}
}
Sure it's not the fastest, but it's simple and easy to understand, and requires
no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)
In c:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int num = 0, i;
for (i=0; i < 8; i++)
num = num ^ arr[i] ^i;
Since x^x=0, the numbers that are repeated odd number of times are neutralized. Let's call the unique numbers a and b.We are left with a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask ie.choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that the pairs of a and b are in different buckets. So we can now apply the same method used above to each bucket independently, and discover what a and b are.
I have written a small programme which finds out the number of elements not repeated, just go through this let me know your opinion, at the moment I assume even number of elements are even but can easily extended for odd numbers also.
So my idea is to first sort the numbers and then apply my algorithm.quick sort can be use to sort this elements.
Lets take an input array as below
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
the number 2,10 and 4 are not repeated ,but they are in sorted order, if not sorted use quick sort to first sort it out.
Lets apply my programme on this
using namespace std;
main()
{
//int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
int i = 0;
vector<int> vec;
int var = arr[0];
for(i = 1 ; i < sizeof(arr)/sizeof(arr[0]); i += 2)
{
var = var ^ arr[i];
if(var != 0 )
{
//put in vector
var = arr[i-1];
vec.push_back(var);
i = i-1;
}
var = arr[i+1];
}
for(int i = 0 ; i < vec.size() ; i++)
printf("value not repeated = %d\n",vec[i]);
}
This gives the output:
value not repeated= 2
value not repeated= 10
value not repeated= 4
Its simple and very straight forward, just use XOR man.
for(i=1;i<=n;i++) {
if(!(arr[i] ^ arr[i+1]))
printf("Found Repeated number %5d",arr[i]);
}
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as parameter.
You also rely on the fact that After a call to SELECT,
the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2) then the repeated values are right to the median. So you continue with the right half of the array.
Else if it is not so then a repeated value is left to the median. So you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3} then the median should be 2 and it is so we continue with the right half.
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).
What about using the https://en.wikipedia.org/wiki/HyperLogLog?
Redis does http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
Well using the nested for loop and assuming the question is to find the number occurred only twice in an array.
def repeated(ar,n):
count=0
for i in range(n):
for j in range(i+1,n):
if ar[i] == ar[j]:
count+=1
if count == 1:
count=0
print("repeated:",ar[i])
arr= [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr,n)
Why should we try out doing maths ( specially solving quadratic equations ) these are costly op . Best way to solve this would be t construct a bitmap of size (n-3) bits , i.e, (n -3 ) +7 / 8 bytes . Better to do a calloc for this memory , so every single bit will be initialized to 0 . Then traverse the list & set the particular bit to 1 when encountered , if the bit is set to 1 already for that no then that is the repeated no .
This can be extended to find out if there is any missing no in the array or not.
This solution is O(n) in time complexity

Resources