Find subarray with given sum - algorithm

I am trying to find a subarray with a given sum in a functional style.
The code I wrote is not very functional. Can someone help make it more functional?
Problem: Given an unsorted array of nonnegative integers, find a continuous subarray which adds up to a given number.
Input: arr[] = {1, 4, 20, 3, 10, 5}, sum = 33
Output: Sum found between indexes 2 and 4
Input: arr[] = {1, 4, 0, 0, 3, 10, 5}, sum = 7
Output: Sum found between indexes 1 and 4
I could solve this problem with a brute-force approach, but I am looking for a more efficient functional solution.
val sumList = list.foldLeft(List(0), 0)((l, r) => (l._1 :+ (l._2+r), l._2 + r))._1.drop(1)
//Brute force approach
sumList.zipWithIndex.combinations(2).toList.collectFirst({
case i if i(1)._1 - i(0)._1 == sum => i
}) match {
case Some(List(x, y)) => println("elements which form the given sum are => "+ list.drop(x._2+1).take(y._2-x._2))
case _ => println("couldn't find elements which satisfy the given condition")
}
Algorithm: Initialize a variable curr_sum as the first element. curr_sum indicates the sum of the current subarray. Start from the second element and add all elements one by one to curr_sum. If curr_sum becomes equal to sum, then print the solution. If curr_sum exceeds the sum, then remove trailing elements while curr_sum is greater than sum.
val list:List[Int] = List(1, 4, 20, 3, 10, 5)
val sum = 33
val (totalSum, start, end, isSumFound) = list.zipWithIndex.drop(1).foldLeft(list.head, 0, 1, false)((l, r) =>
if(!l._4) {
val tempSum = l._1 + r._1
if (tempSum == sum){
(sum, l._2, r._2, true)
} else if(tempSum > sum){
var (curSum, curIndex) = (tempSum, l._2)
while(curSum > sum && curIndex < list.length-1){
curSum = curSum - list(curIndex)
curIndex = l._2 +1
}
(curSum, curIndex, r._2, curSum == sum)
} else {
(tempSum, l._2, r._2, false)
}
}else
l
)
if(isSumFound || totalSum == sum){
println("elements which form the given sum are => "+ list.drop(start+1).take(end-start))
}else{
println("couldn't find elements which satisfy the given condition")
}

val list:List[Int] = List(1, 4, 20, 3, 10, 5)
val sum = 33
A method to return an iterator of sublists, first the ones that start with the first element, then those starting with the second...
def subLists[T](xs:List[T]):Iterator[List[T]] =
if (xs == Nil) Iterator.empty
else xs.inits ++ subLists(xs.tail)
Find the first list with the correct sum
val ol = subLists(list).collectFirst{ case x if x.sum == sum => x}
Then find the index again, and print the result
ol match {
case None => println("No such subsequence")
case Some(l) => val i = list.indexOfSlice(l)
println("Sequence of sum " + sum +
" found between " + i +
" and " + (i + l.length - 1))
}
//> Sequence of sum 33 found between 2 and 4
(you could keep track of the index associated with the sublist when building the iterator, but that seems more trouble than it is worth, and reduces the general usefulness of subLists)
EDIT: Here's a version of the code you posted that's more "functional". But I think my first version is clearer - it's simpler to separate the concerns of generating the sequences from checking their sums
val sumList = list.scanLeft(0){_ + _}
val is = for {i <- 1 to list.length // sumList has list.length + 1 entries
j <- 0 until i
if sumList(i)-sumList(j) == sum}
yield (j, i-1)
is match {
case Seq() => println("No such subsequence")
case (start, end) +: _ =>
println("Sequence of sum " + sum +
" found between " + start + " and " + end )
}
//> Sequence of sum 33 found between 2 and 4
EDIT2: And here's an O(N) one. "Functional" in that there are no mutable variables, but it's less clear than the others, in my opinion. It's a bit clearer if you just print the results as they are found (no need to carry the rs part of the accumulator between iterations) but that side-effecting way seems less functional, so I return a list of solutions.
val target = sum
val sums = list.scanLeft(0)(_ + _).zipWithIndex
sums.drop(1).foldLeft((sums, List[(Int, Int)]())) {
case ((leftTotal, rs), total) =>
val newL = leftTotal.dropWhile(total._1 - _._1 > target)
if (total._1 - newL.head._1 == target)
(newL, (newL.head._2, total._2 - 1) :: rs)
else (newL, rs)
}._2
//> res0: List[(Int, Int)] = List((2,4))
O(N) because we pass the shortened newL as the next iterations leftTotal, so dropWhile only ever goes through the list once. This one relies on the integers being non-negative (so adding another element cannot reduce the total), the others work with negative integers too.
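For comparison, here is a minimal Python sketch of the same sliding-window idea described in the question's algorithm. The function name `find_subarray_with_sum` is my own, and the sketch assumes all integers are non-negative (otherwise shrinking the window from the left is not valid).

```python
def find_subarray_with_sum(nums, target):
    """Return inclusive (start, end) indices of the first window whose
    sum equals target, or None. Assumes non-negative integers."""
    start = 0
    window = 0
    for end, value in enumerate(nums):
        window += value
        # Drop elements from the left while the window is too large.
        while window > target and start < end:
            window -= nums[start]
            start += 1
        if window == target:
            return (start, end)
    return None


print(find_subarray_with_sum([1, 4, 20, 3, 10, 5], 33))     # (2, 4)
print(find_subarray_with_sum([1, 4, 0, 0, 3, 10, 5], 7))    # (1, 4)
```

Each index enters and leaves the window at most once, which is where the O(N) bound comes from.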

Related

Counting the number of ways to make up a string

I have just started learning dynamic programming and was able to do some of the basic problems, such as Fibonacci, the knapsack and a few more problems. Coming across the problem
below, I got stuck and do not know how to proceed forward. What confuses me is what would be the base case in this case, and the overlapping problems. Not knowing
this prevents me from developing a relation. They are not as apparent in this example as they were in the previous ones I have solved thus far.
Suppose we are given some string origString, a string toMatch and some number maxNum greater than or equal to 0. How can we count in how many ways it is possible to take maxNum number of nonempty and nonoverlapping substrings of the string origString to make up the string toMatch?
Example:
If origString = "ppkpke", and toMatch = "ppke"
maxNum = 1: countWays("ppkpke", "ppke", 1) will give 0 because toMatch is not a substring of origString.
maxNum = 2: countWays("ppkpke", "ppke", 2) will give 4 because 4 different combinations of 2 substring made up of "ppkpke" can make "ppke".
Those strings are "ppk" & "e", "pp" & "ke" , "p" & "pke" (excluding "p") and "p" & "pke" (excluding "k")
As an initial word of caution, I’d say that although my solution happens to match the expected output for the tiny test set, it is very likely wrong. It’s up to you to double-check it on other examples you may have etc.
The algorithm walks the longer string and tries to spread the shorter string over it. The incremental state of the algorithm consists of tuples of 3 elements:
long string coordinate i (origString[i] == toMatch[j])
short string coordinate j (origString[i] == toMatch[j])
number of ways we made it into that^^^ state
Then we just walk along the strings over and over again, using stored, previously discovered state, and sum up the total number(s) of ways each state was achieved — in the typical dynamic programming fashion.
For a state to count as a solution, j must be at the end of the short string and the number of iterations of the dynamic algorithm must be equivalent to the number of substrings we wanted at that point (because each iteration added one substring).
It is not entirely clear to me from the assignment whether maxNum actually means something like “exactNum”, i.e. exactly that many substrings, or whether we should sum across all lower or equal numbers of substrings. So the function returns a dictionary like { #substrings : #decompositions }, so that the output can be adjusted as needed.
#!/usr/bin/env python
def countWays(origString, toMatch, maxNum):
    origLen = len(origString)
    matchLen = len(toMatch)
    # state maps (position in origString, position in toMatch) -> number of ways
    state = {}
    for i in range(origLen):
        for j in range(matchLen):
            o = i + j
            # stop at the end of origString or at the first mismatch
            if o >= origLen or origString[o] != toMatch[j]:
                break
            state[(o, j)] = 1
    sums = {}
    for n in range(1, maxNum):
        if not state:
            break
        nextState = {}
        for istart, jstart in state:
            prev = state[(istart, jstart)]
            for i in range(istart + 1, origLen):
                for j in range(jstart + 1, matchLen):
                    o = i + j - jstart - 1
                    if o >= origLen or origString[o] != toMatch[j]:
                        break
                    nextState[(o, j)] = prev + nextState.get((o, j), 0)
        sums[n] = sum(state[(i, j)] for i, j in state if j == matchLen - 1)
        state = nextState
    sums[maxNum] = sum(state[(i, j)] for i, j in state if j == matchLen - 1)
    return sums
result = countWays(origString='ppkpke', toMatch='ppke', maxNum=5)
print('for an exact number of substrings:', result)
print(' for up to a number of substrings:', {
    n: s for n, s in ((m, sum(result[k] for k in range(1, m + 1)))
                      for m in range(1, 1 + max(result.keys())))})
This^^^ code is a quick and ugly hack and nothing more. There is huge room for improvement, including (but not limited to) the use of generator functions (yield), the use of @memoize etc. Here’s some output:
for an exact number of substrings: {1: 0, 2: 4, 3: 8, 4: 4, 5: 0}
for up to a number of substrings: {1: 0, 2: 4, 3: 12, 4: 16, 5: 16}
It would be an interesting (and nicely challenging) exercise to store a bit more of the dynamic state (e.g. to keep it for each n) and then reconstruct and pretty-print (efficiently) the exact string (de)compositions that were counted.
Here is a recursive solution.
It compares the first character of source and target and, if they're equal, chooses to either take it (advancing by 1 char in both strings) or not take it (advancing by 1 char in source but not in target). The value of k is decremented every time a new substring is created; there is an additional variable continued which is True if we're in the middle of building a substring, and False otherwise.
def countWays(source, target, k, continued=False):
    if len(target) == 0:
        return (k == 0)
    elif (k == 0 and not continued) or len(source) == 0:
        return 0
    elif source[0] == target[0]:
        if continued:
            return countWays(source[1:], target[1:], k, True) + countWays(source[1:], target[1:], k-1, True) + countWays(source[1:], target, k, False)
        else:
            return countWays(source[1:], target[1:], k-1, True) + countWays(source[1:], target, k, False)
    else:
        return countWays(source[1:], target, k, False)
print(countWays('ppkpke', 'ppke', 1))
# 0
print(countWays('ppkpke', 'ppke', 2))
# 4
print(countWays('ppkpke', 'ppke', 3))
# 8
print(countWays('ppkpke', 'ppke', 4))
# 4
print(countWays('ppkpke', 'ppke', 5))
# 0
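Since the asker was looking for the overlapping subproblems, here is a hedged sketch of the same recursion with memoization. It indexes into the strings instead of slicing them, so the state is just `(i, j, left, continued)`. The names `count_ways`, `go` and `left` are my own, and using `functools.lru_cache` is my choice, not part of the answer above.

```python
from functools import lru_cache


def count_ways(source, target, k):
    """Count ways to build target from exactly k non-empty,
    non-overlapping substrings of source, taken in order."""

    @lru_cache(maxsize=None)
    def go(i, j, left, continued):
        # i, j: positions in source and target; left: substrings still to start
        if j == len(target):
            return 1 if left == 0 else 0
        if (left == 0 and not continued) or i == len(source):
            return 0
        if source[i] != target[j]:
            return go(i + 1, j, left, False)
        # Either start a new substring here, or skip this character.
        total = go(i + 1, j + 1, left - 1, True) + go(i + 1, j, left, False)
        if continued:
            # Or extend the substring that ended at the previous character.
            total += go(i + 1, j + 1, left, True)
        return total

    return go(0, 0, k, False)


print([count_ways('ppkpke', 'ppke', k) for k in range(1, 6)])
# [0, 4, 8, 4, 0]
```

There are at most len(source) * len(target) * (k + len(target) + 1) * 2 distinct states (`left` can dip below zero while a substring is being extended), so the memoized version is polynomial rather than exponential.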

alternating substring using (recursive) dynamic programming

My task is to solve alternating sub string problem with a recursive dynamic programming approach:
Consider a sequence A = a1, a2, a3, ... an of integers. A subsequence B of A is a sequence B = b1, b2, ..., bm which is created from A by removing some elements while keeping the order. Given an integer sequence A, the goal is to compute an alternating subsequence B, i.e. a sequence b1, ..., bm such that for all i in {2, 3, ..., m-1}, if b{i-1} < b{i} then b{i} > b{i+1} and if b{i-1} > b{i} then b{i} < b{i+1}
So far I need to check on every recursive step if I want to take the element and look for the next alternating number or if I simply take the next number and start with both possibles of alternation.
s index from left
e end ( len(Array))
A array
g(A,s) a function which get next greater or smaller integer.
my recursive formula is:
V(A, s, e) = max( V(A, g(A,s),e), V(A, s+1, e) ) +1
V(A, g(A,s),e) takes the element and continues with next alternating one
V(A, s+1, e) leaves the element and start new sequence at next element
Assuming that my implementation and approach are correct, I estimate the runtime to be O(n^2), since we need to look at every combination once.
Without the memoization part it would be O(2^n), like the number of leaves in a binary tree.
Is this analysis correct? It might be correct for the formula but not for the code...
the function getsmaller and getbigger are g(A,s)
A = [5,6,5,5,5,7,5,5,5,87,5]
s = 0
e = len(A)
memo_want_small = [-1] * len(A)
memo_want_bigger = [-1] * len(A)
def getsmaller(A, s):
    erg = 0
    for i in range(s, len(A)):
        if A[i] < A[s]:
            if i is not None:
                return i
    return -1
def getbigger(A, s):
    for i in range(s, len(A)):
        if A[i] > A[s]:
            if i is not None:
                return i
    return -1
def wantsmall(A, s, e):
    if s == -1: # no more alternating element
        return 0
    if s == e: # base case
        return 0
    if memo_want_small[s] != -1:
        return memo_want_small[s]
    return_V = max(wantbigger(A, getbigger(A, s), e) , alt(A, s+1, e)) + 1
    memo_want_small[s] = return_V
    return return_V
def wantbigger(A, s, e):
    if s == -1: # no more alternating element
        return 0
    if s == e: # base case
        return 0
    if memo_want_bigger[s] != -1:
        return memo_want_bigger[s]
    return_V = max(wantsmall(A, getsmaller(A, s), e) , alt(A, s+1, e)) + 1
    memo_want_bigger[s] = return_V
    return return_V
def alt(A, s, e):
    if s == e: # base case
        return 0
    return max(wantsmall(A, getsmaller(A, s), e), wantbigger(A, getbigger(A, s), e))
print("solution: " + str(alt(A,s,e)))
Let's consider a sequence going left from A[i], with direction up first.
First, there could not be a higher element, A[j] to the left of A[i] that ends a longer sequence, because if there were, we could always switch that element with A[i] and end up with an up-first sequence of the same length.
* Going left from A[i], up-first
↖
A[j]
... A[i]
Secondly, there could not be a lower element, A[j] to the left, ending a longer up-first sequence, and an element in between, A[k], that's higher than A[i], because then we could just add both A[i] and the higher element and get a sequence longer by two.
* Going left from A[i], up-first
A[k]
... ... A[i]
↖
A[j]
So, looking left, the longest up-first sequence ending at A[i] is either (1) the same length or longer than the sequence ending at the next higher element to the left, or (2) the same length as the sequence ending at the lowest element of any contiguous, monotonically increasing subarray that reaches A[i].
Now, consider an element, A[r], the first higher to the right of A[i], for which we would like to find the longest down-first sequence ending at it. As we've shown, elements to the left of A[i] that end an up-first sequence and are either higher or lower than A[i] can already be accounted for when calculating a result for A[i], therefore it remains the only cell of interest for calculating the longest down-first sequence ending at A[r] (looking to the left). This points to an O(n) dynamic program.
JavaScript code:
// Preprocess previous higher and lower elements in O(n)
// Adapted from https://www.geeksforgeeks.org/next-greater-element
function prev(A, higherOrLower) {
function compare(a, b){
if (higherOrLower == 'higher')
return a < b
else if (higherOrLower == 'lower')
return a > b
}
let result = new Array(A.length)
let stack = [A.length - 1]
for (let i=A.length-2; i>=0; i--){
if (!stack.length){
stack.push(i) // the stack holds indices, not values
continue
}
while (stack.length && compare(A[ stack[stack.length-1] ], A[i]))
result[ stack.pop() ] = i
stack.push(i)
}
while (stack.length)
result[ stack.pop() ] = -1
return result
}
function longestAlternatingSequence(A){
let prevHigher = prev(A, 'higher')
let prevLower = prev(A, 'lower')
let longestUpFirst = new Array(A.length)
let longestDownFirst = new Array(A.length)
let best = 1
longestUpFirst[0] = 1
longestDownFirst[0] = 1
for (let i=1; i<A.length; i++){
// Longest up-first
longestUpFirst[i] = Math.max(
A[i] >= A[i-1] ? longestUpFirst[i - 1] : -Infinity,
prevHigher[i] != -1 ? longestUpFirst[ prevHigher[i] ] : -Infinity,
prevHigher[i] != -1 ? 1 + longestDownFirst[ prevHigher[i] ] : -Infinity,
1
)
// Longest down-first
longestDownFirst[i] = Math.max(
A[i] <= A[i-1] ? longestDownFirst[i - 1] : -Infinity,
prevLower[i] != -1 ? longestDownFirst[ prevLower[i] ] : -Infinity,
prevLower[i] != -1 ? 1 + longestUpFirst[ prevLower[i] ] : -Infinity,
1
)
best = Math.max(best, longestUpFirst[i], longestDownFirst[i])
}
console.log(`${ longestUpFirst } (longestUpFirst)`)
console.log(`${ longestDownFirst } (longestDownFirst)`)
return best
}
var tests = [
[5,6,5,5,5,7,5,5,5,87,5],
[1,2,3,4,5,6,7,8],
new Array(10).fill(null).map(_ => ~~(Math.random()*50))
]
for (let A of tests){
console.log(JSON.stringify(A))
console.log(longestAlternatingSequence(A))
console.log('')
}
Update
Heh, there's a simpler O(n) recurrence here: https://www.geeksforgeeks.org/longest-alternating-subsequence/
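For reference, one common formulation of such a simpler O(n) recurrence can be sketched in Python as follows. This is my own sketch of the well-known "up/down" technique, not code taken from the linked page; the function name `longest_alternating` is hypothetical, and equal neighbours are simply skipped.

```python
def longest_alternating(a):
    """Length of the longest strictly alternating subsequence of a."""
    if not a:
        return 0
    up = 1    # best length so far ending with a rise
    down = 1  # best length so far ending with a fall
    for prev, cur in zip(a, a[1:]):
        if cur > prev:
            up = down + 1
        elif cur < prev:
            down = up + 1
    return max(up, down)


print(longest_alternating([5, 6, 5, 5, 5, 7, 5, 5, 5, 87, 5]))  # 7
print(longest_alternating([1, 2, 3, 4, 5, 6, 7, 8]))            # 2
```

Only the two running values are kept, so the sketch uses O(1) extra space.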

Find number of continuous subarray having sum zero

You are given an array and have to count the continuous subarrays whose sum is zero.
example:
1) 0 ,1,-1,0 => 6 {{0},{1,-1},{0,1,-1},{1,-1,0},{0,1,-1,0},{0}};
2) 5, 2, -2, 5 ,-5, 9 => 3.
It can be done in O(n^2). I am trying to find a solution below this complexity.
Consider S[0..N] - prefix sums of your array, i.e. S[k] = A[0] + A[1] + ... + A[k-1] for k from 0 to N.
Now sum of elements from L to R-1 is zero if and only if S[R] = S[L]. It means that you have to find number of indices 0 <= L < R <= N such that S[L] = S[R].
This problem can be solved with a hash table. Iterate over elements of S[] while maintaining for each value X number of times it was met in the already processed part of S[]. These counts should be stored in a hash map, where the number X is a key, and the count H[X] is the value. When you meet a new element S[i], add H[S[i]] to your answer (these account for substrings ending with (i-1)-st element), then increment H[S[i]] by one.
Note that if sum of absolute values of array elements is small, you can use a simple array instead of hash table. The complexity is linear on average.
Here is the code:
long long CountZeroSubstrings(vector<int> A) {
int n = A.size();
vector<long long> S(n+1, 0);
for (int i = 0; i < n; i++)
S[i+1] = S[i] + A[i];
long long answer = 0;
unordered_map<long long, int> H;
for (int i = 0; i <= n; i++) {
if (H.count(S[i]))
answer += H[S[i]];
H[S[i]]++;
}
return answer;
}
This can be solved in linear time by keeping a hash table of sums reached during the array traversal. The number of subarrays can then be directly calculated from the counts of revisited sums.
Haskell version:
import qualified Data.Map as M
import Data.List (foldl')
f = foldl' (\b a -> b + div (a * (a + 1)) 2) 0 . M.elems . snd
    . foldl' (\(s,m) x -> let s' = s + x in case M.lookup s' m of
        Nothing -> (s',M.insert s' 0 m)
        otherwise -> (s',M.adjust (+1) s' m)) (0,M.fromList[(0,0)])
Output:
*Main> f [0,1,-1,0]
6
*Main> f [5,2,-2,5,-5,9]
3
*Main> f [0,0,0,0]
10
*Main> f [0,1,0,0]
4
*Main> f [0,1,0,0,2,3,-3]
5
*Main> f [0,1,-1,0,0,2,3,-3]
11
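The same "count the revisited sums" idea can be sketched in Python: if a prefix sum occurs c times (counting the empty prefix), it contributes c*(c-1)/2 zero-sum subarrays. The function name `count_zero_sum_subarrays` is my own, and `itertools.accumulate(..., initial=0)` assumes Python 3.8 or later.

```python
from collections import Counter
from itertools import accumulate


def count_zero_sum_subarrays(nums):
    """Count contiguous subarrays of nums that sum to zero."""
    # Prefix sums, including the empty prefix (sum 0).
    prefix_counts = Counter(accumulate(nums, initial=0))
    # Every pair of equal prefix sums delimits one zero-sum subarray.
    return sum(c * (c - 1) // 2 for c in prefix_counts.values())


print(count_zero_sum_subarrays([0, 1, -1, 0]))          # 6
print(count_zero_sum_subarrays([5, 2, -2, 5, -5, 9]))   # 3
```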
C# version of @stgatilov's answer https://stackoverflow.com/a/31489960/3087417 with readable variables:
int[] sums = new int[arr.Count() + 1];
for (int i = 0; i < arr.Count(); i++)
sums[i + 1] = sums[i] + arr[i];
int numberOfFragments = 0;
Dictionary<int, int> sumToNumberOfRepetitions = new Dictionary<int, int>();
foreach (int item in sums)
{
if (sumToNumberOfRepetitions.ContainsKey(item))
numberOfFragments += sumToNumberOfRepetitions[item];
else
sumToNumberOfRepetitions.Add(item, 0);
sumToNumberOfRepetitions[item]++;
}
return numberOfFragments;
If you want to have sum not only zero but any number k, here is the hint:
int numToFind = currentSum - k;
if (sumToNumberOfRepetitions.ContainsKey(numToFind))
numberOfFragments += sumToNumberOfRepetitions[numToFind];
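To make the hint concrete, here is a small Python sketch of the generalization to an arbitrary target sum k. The function and variable names are my own; the running sum is looked up as `running - k` before the current prefix is recorded, which I assume is what the hint intends.

```python
def count_subarrays_with_sum(nums, k):
    """Count contiguous subarrays of nums whose elements sum to k."""
    seen = {0: 1}   # prefix sum -> how many times it has occurred so far
    running = 0
    count = 0
    for value in nums:
        running += value
        # Every earlier prefix equal to running - k ends a matching subarray here.
        count += seen.get(running - k, 0)
        seen[running] = seen.get(running, 0) + 1
    return count


print(count_subarrays_with_sum([0, 1, -1, 0], 0))   # 6
print(count_subarrays_with_sum([1, 1, 1], 2))       # 2
```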
I feel it can be solved using DP:
Let the state be :
DP[i][j] represents the number of ways j can be formed using all the subarrays ending at i!
Transitions:
for every element in the initial step ,
Increase the number of ways to form Element[i] using i elements by 1 i.e. using the subarray of length 1 starting from i and ending with i i.e
DP[i][Element[i]]++;
then for every j in Range [ -Mod(highest Magnitude of any element ) , Mod(highest Magnitude of any element) ]
DP[i][j]+=DP[i-1][j-Element[i]];
Then your answer will be the sum of all the DP[i][0] (Number of ways to form 0 using subarrays ending at i ) where i varies from 1 to Number of elements
Complexity is O(MOD highest magnitude of any element * Number of Elements)
https://www.techiedelight.com/find-sub-array-with-0-sum/
This would be an exact solution.
# Utility function to insert <key, value> into the dict
def insert(dict, key, value):
    # if the key is seen for the first time, initialize the list
    dict.setdefault(key, []).append(value)
# Function to print all sub-lists with 0 sum present
# in the given list
def printallSublists(A):
    # create an empty dict to store ending index of all
    # sub-lists having same sum
    dict = {}
    # insert (0, -1) pair into the dict to handle the case when
    # sub-list with 0 sum starts from index 0
    insert(dict, 0, -1)
    result = 0
    sum = 0
    # traverse the given list
    for i in range(len(A)):
        # sum of elements so far
        sum += A[i]
        # if sum is seen before, there exists at-least one
        # sub-list with 0 sum
        if sum in dict:
            list = dict.get(sum)
            result += len(list)
            # find all sub-lists with same sum
            for value in list:
                print("Sublist is", (value + 1, i))
        # insert (sum so far, current index) pair into the dict
        insert(dict, sum, i)
    print("length :", result)
if __name__ == '__main__':
    A = [0, 1, 2, -3, 0, 2, -2]
    printallSublists(A)
I don't know what the complexity of my suggestion would be, but I have an idea :)
What you can do is remove elements from the main array that cannot contribute to a solution.
Suppose the elements are -10, 5, 2, -2, 5, 7, -5, 9, 11, 19
so you can see that -10, 9, 11 and 19 are elements
that are never going to be useful to make the sum 0 in your case
so try to remove -10, 9, 11, and 19 from your main array
to do this what you can do is
1) create two sub array from your main array
`positive {5,7,2,9,11,19}` and `negative {-10,-2,-5}`
2) remove element from positive array which does not satisfy condition
condition -> value should be construct from negative arrays element
or sum of its elements
ie.
5 = -5 //so keep it //don't consider the sign
7 = (-5 + -2 ) // keep
2 = -2 // keep
9 // cannot be construct using -10,-2,-5
same for all 11 and 19
3) remove element form negative array which does not satisfy condition
condition -> value should be construct from positive arrays element
or sum of its elements
i.e. -10 // cannot be construct so discard
-2 = 2 // keep
-5 = 5 // keep
so finally you get an array which contains -2, -5, 5, 7, 2; create all possible subarrays from it and check for sum = 0
(Note if your input array contains 0 add all 0's in final array)

how to generate the following sequence?

I want to generate the following sequence:
set S = {1,2,3}
op = {{1,2},{1,3},{2,3}}
set S = {1,2,3,4}
op = {{1,2,3},{1,2,4},{1,3,4},{2,3,4}}
set S = {1,2,3,4,5}
op = {{1,2,3,4},{1,2,3,5},{1,2,4,5},{1,3,4,5},{2,3,4,5}}
in general, given a set of n numbers, I have to find all the possible subsets of (n-1) numbers with the constraint that they are in alphabetical order (numbers in order).
Is there any algorithm or approach to solve the particular problem? I know that we can use recursion to generate smaller subsets.
There are only n such subsets; each one with one of the n original numbers removed from the original set. So sort the set, and for each of the numbers, create a set which is the original set with that number removed.
A possible caveat is that if there are duplicate numbers in the original set, you will only have as many subsets as there are unique numbers in the original set, so possibly fewer than n in that case.
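A minimal Python sketch of that "remove one element at a time" idea, assuming the input is any iterable of comparable values. The function name `leave_one_out` is my own. It sorts the unique values first, so duplicates collapse as described in the caveat above, and it removes the largest value first to reproduce the order shown in the question.

```python
def leave_one_out(values):
    """Return every subset of size n-1 of the sorted unique values."""
    items = sorted(set(values))
    # Drop the last element first so the output matches the question's order.
    return [items[:i] + items[i + 1:] for i in reversed(range(len(items)))]


print(leave_one_out([1, 2, 3]))
# [[1, 2], [1, 3], [2, 3]]
```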
Some languages have this functionality built-in. For example, Python's itertools.combinations() method. In your case:
>>> import itertools
>>> l = [1,2,3,4]
>>> combinations = itertools.combinations(l, len(l) - 1) #for the list of numbers l, for sublists with a length 1 less than l's length
>>> for comb in combinations:
...     print(comb)
...
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
>>>
However, if you want to implement this yourself the link above may still prove useful as it shows equivalent code. You could use this code to make your own implementation in any language.
This should be simple enough. Let arr have the sorted set and n be the number of elements:
int arr[100];
int n;
printf("{");
for (int i = n - 1; i >= 0; i--){
printf("{");
for (int j = 0; j < n; j++) {
if (i == j) {
continue;
}
printf("%d, ", arr[j]);
}
printf("}, ");
}
printf("}\n");
The above prints some additional commas and you can filter them out yourself.
Think about
how to generate the set 1..N
how to identify the number n to be removed from each set (n: N .. 1)
To generate/print a set from 1..N
print "{"
for i=1 to N
if (i > 1) print ","
print i
end
print "}"
How to create a loop that selects n from N to 1
for j=N to 1
...
end
Use that last loop as a wrapper around that above loop - and in the above loop test if the current selected number j is equal to i and don't print it in that case.
For the fun a Perl implementation that does not pretend to be optimized :-)
$N = 5;
sub rec {
    my($j,$i,@a) = @_;
    if ($j > 0) {
        while (++$i <= $N) { push(@a,$i) if ($i != $j); }
        print('{' . join(',', @a) . "}\n");
        &rec($j-1);
    }
}
&rec($N);
Or this, (maybe) more conventional
for ($i=$N ; $i>0 ; $i--) {
    @a = ();
    for (1..$N) { push(@a,$_) if ($i != $_); }
    print('{' . join(',', @a) . "}\n");
}
In Haskell you could do this:
import Data.List
combinations 0 _ = [ [] ]
combinations n xs = [ y:ys | y:xs' <- tails xs
                           , ys <- combinations (n-1) xs']
subsets set = combinations (length set - 1) (sort set)
Haskell, briefly:
_ => anything
[] => empty list
a = 1; as = [2,3] => a:as = [1,2,3]
[a:b | a <- [1], b <- [[2],[3]]] => [[1,2],[1,3]]
tails [1,2,3] => [[1,2,3],[2,3],[3],[]]
For example, "combinations 2 [1,2,3]":
tails xs = [[1,2,3],[2,3],[3],[]]
[1,2,3] => y = 1; ys = [[2],[3]] => [1,2],[1,3]
[2,3] => y = 2; ys = [[3]] => [2,3]
[3] => y = 3; ys = NULL => []
Result [[1,2],[1,3],[2,3]]

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is an O(n) solution if you know the possible domain of the input. For example, if your input array contains numbers between 0 and 100, consider the following code.
bool flags[101]; // one flag per possible value 0..100
for(int i = 0; i <= 100; i++)
flags[i] = false;
for(int i = 0; i < input_size; i++)
if(flags[input_array[i]])
return input_array[i];
else
flags[input_array[i]] = true;
Of course there is the additional memory but this is the fastest.
OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};
int signed_1(int n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(int n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain sum without overflow
for (int i=0; i<N-2; ++i)
{
S1 += signed_1(A[i]) - signed_1(i);
S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
S1 += signed_1(A[i]);
S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
p = (S1+S2)/2;
q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate sums, the algorithm does not stand for overflow because of abs().
if abs(S1)==abs(S2) then the algorithm fails, though this value will still be the difference between p and q (i.e. abs(p - q) == abs(S1)).
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Lets take array {0,1,2,...,n-2,n-1},
The given one can be produced by replacing last two elements n-2 and n-1 with unknown p and q (less order)
so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Simple math remains:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it as math classes teach to solve square equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.
You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, lets ignore the 0-case for now.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2
So if you subtract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * (n - 3) * p * q = (n - 3)! * p * q
prod(Array) / (n - 3)! = y = p * q
You have now got these terms:
x = p + q
y = p * q
=> y(p + q) = x(p * q)
If you transform this term, you should be able to calculate p and q
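One way to carry out that transformation, as a sketch: since x is the sum and y is the product of p and q, both numbers are roots of the same quadratic, which can be solved directly.

```latex
\begin{aligned}
x &= p + q, \qquad y = p\,q \\
0 &= z^2 - x z + y \quad \text{has the roots } z = p \text{ and } z = q \\
p,\ q &= \frac{x \pm \sqrt{x^2 - 4y}}{2}, \qquad \text{where } x^2 - 4y = (p - q)^2
\end{aligned}
```

Note that the product grows like a factorial, so in practice it needs arbitrary-precision integers for anything but small arrays.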
Insert each element into a set/hashtable, first checking if it is already in it.
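A minimal Python sketch of that set-based check (the function name `find_repeated` is my own). It uses O(n) extra memory and reports repeats in the order they are found.

```python
def find_repeated(nums):
    """Return the values that occur more than once, in discovery order."""
    seen = set()
    repeated = []
    for value in nums:
        if value in seen:
            repeated.append(value)  # second sighting: this is a repeat
        else:
            seen.add(value)
    return repeated


print(find_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]
```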
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + two missing numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The program leaves the array unchanged (the array should be writeable but its values are restored on exit).
It works for array sizes up to INT_MAX (2147483647 where int is 32 bits wide, which is typical even on 64-bit systems).
suppose array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n-1]*a[n-1]
sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose repeated elements are = X and Y
so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser and a trivial solution (i.e. HashMap, Sort, etc) no matter how good they are would be boring.
As the numbers are integers, they have constant bit size (i.e. 32). Let us assume we are working with 4 bit integers right now. We look for A and B which are the duplicate numbers.
We need 4 buckets, one for each bit. Each bucket contains the numbers whose specific bit is 1. For example bucket 1 gets 2, 3, 6, 7, ...:
Bucket 0 : Sum ( x where: x & 2 power 0 != 0 )
...
Bucket i : Sum ( x where: x & 2 power i != 0 )
We know what would be the sum of each bucket if there was no duplicate. I consider this as prior knowledge.
Once above buckets are generated, a bunch of them would have values more than expected. By constructing the number from buckets we will have (A OR B for your information).
We can calculate (A XOR B) as follows:
A XOR B = Array[0] XOR Array[1] XOR ... XOR Array[n-1] XOR 0 XOR 1 XOR ... XOR (n-3)
Now going back to buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bit).
For the buckets that have only one number we can extract the number num = (sum - expected sum of bucket). However, we should be good only if we can find one of the duplicate numbers so if we have at least one bit in A XOR B, we've got the answer.
But what if A XOR B is zero?
Well this case is only possible if both duplicate numbers are the same number, which then our number is the answer of A OR B.
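A closely related technique, which is not exactly the bucket-sum scheme described above, is the classic XOR partition: compute A XOR B, pick any bit where A and B differ, and XOR together only the numbers that have that bit set. The Python sketch below is my own (the name `two_repeated_xor` is hypothetical) and assumes the array holds every value in 0..n-3 once plus two distinct repeated values.

```python
def two_repeated_xor(nums):
    """Find the two distinct repeated values using O(1) extra space."""
    n = len(nums)
    # XOR of all elements and of 0..n-3 leaves A XOR B.
    x = 0
    for value in nums:
        x ^= value
    for value in range(n - 2):
        x ^= value
    low = x & -x  # lowest bit in which A and B differ
    # Restrict both XORs to numbers with that bit set: only one repeat survives.
    a = 0
    for value in nums:
        if value & low:
            a ^= value
    for value in range(n - 2):
        if value & low:
            a ^= value
    b = a ^ x
    return tuple(sorted((a, b)))


print(two_repeated_xor([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```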
Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment the matching bucket as you go through the array, something like this:
int count[10] = {0}; /* one zero-initialized bucket per possible value */
for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}
Then just search the count array for any entries greater than 1. Those are the items with duplicates. This only requires one pass across the original array and one pass across the count array.
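The same counting idea as a short Python sketch. The function name `duplicates_by_counting` and the `domain_size` parameter are assumptions for this example; every value must lie in 0..domain_size-1.

```python
def duplicates_by_counting(arr, domain_size):
    """Return every value that occurs more than once, in increasing order."""
    count = [0] * domain_size          # one bucket per possible value
    for v in arr:                      # pass over the original array
        count[v] += 1
    # pass over the count array
    return [value for value, c in enumerate(count) if c > 1]


print(duplicates_by_counting([2, 3, 6, 1, 5, 4, 0, 3, 5], 7))
```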
Here's an implementation in Python of @eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used, then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, so the whole sequence doesn't need to be in memory).
def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i     # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i)
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2
    p = (s1 - int((2*s2 - s1**2)**.5)) // 2
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.

    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
    assert len(set(arr)) == (n - 2) # number of unique items

    s1 = (n-2) + (n-1)       # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
    s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i
        s2 += j*j - i*i

    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))

    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))

    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2

    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)

    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0 # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
Since a range is specified, you can perform radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n)
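A minimal sketch of that idea in Python, assuming non-negative integers. `radix_sort` and `duplicates_after_sort` are hypothetical helper names, and the least-significant-digit radix sort shown here is just one common variant.

```python
def radix_sort(nums, base=10):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    nums = list(nums)
    largest = max(nums, default=0)
    place = 1
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for x in nums:
            buckets[(x // place) % base].append(x)   # stable within each bucket
        nums = [x for bucket in buckets for x in bucket]
        place *= base
    return nums


def duplicates_after_sort(arr):
    """Sort, then report every value equal to its right-hand neighbour."""
    ordered = radix_sort(arr)
    return [a for a, b in zip(ordered, ordered[1:]) if a == b]


print(duplicates_after_sort([2, 3, 6, 1, 5, 4, 0, 3, 5]))
```

The number of passes depends on the number of digits of the largest value, so the sort is linear only when that digit count is treated as a constant.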
You can use a simple nested for loop:
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            //DO SOMETHING
        }
    }
}
Or you can filter the array and use a recursive function if you want to get the count of occurrences:
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };

// Needs `using System;` and `using System.Linq;`
void GetDuplicates(int[] array)
{
    if (array.Length == 0)
        return; // base case: every value has been reported

    int a = 1; // occurrences of array[0], counting itself
    for (int j = 1; j < array.Length; j++)
    {
        if (array[0] == array[j])
        {
            a += 1;
        }
    }
    Console.WriteLine(" {0} occurred {1} time/s", array[0], a);

    // Drop every copy of array[0] and recurse on what is left.
    int first = array[0];
    GetDuplicates(array.Where(n => n != first).ToArray());
}

GetDuplicates(array);
answer to 18..
You are taking an array of 9 elements whose values start from 0, so the maximum element in your array will be 6. Take the sum of the numbers from 0 to 6 and the sum of the array elements, and compute their difference (say d). This is p + q. Now take the XOR of the numbers from 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all numbers from 0 to 6 except the two repeated ones, since each of those appears twice and cancels itself out. Now, for each element a[i] of the array, let p = a[i] and compute q = d - p. XOR p and q into x2 and check whether the result equals x1. Doing this for all elements, you will find the elements for which the condition is true, and you are done in O(n). Note that a matching sum and XOR do not always pin down a single pair (for example, 3, 5 and 1, 7 both have sum 8 and XOR 6), so confirm a candidate by counting its occurrences in the array. Keep coding!
Check this out:
O(n) time and O(1) space complexity
int i, xor = 0, x = 0, y = 0; /* valid C; rename xor if compiling as C++ */
for (i = 0; i < n; i++)
    xor = xor ^ arr[i];
for (i = 1; i <= n-3; i++)
    xor = xor ^ i;
So in the given example you will get the xor of 3 and 5
xor = xor & -xor; // isolate the lowest set bit
for (i = 0; i < n; i++)
{
    if (arr[i] & xor)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for (i = 1; i <= n-3; i++)
{
    if (i & xor)
        x = x ^ i;
    else
        y = y ^ i;
}
x and y are your answers
For each number: check if it exists in the rest of the array.
Without sorting, you're going to have to keep track of the numbers you've already visited.
In pseudocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
    if number not already in unique numbers list
        add it to the unique numbers list
    else
        return that number as it is a duplicate
    end if
end for each
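One possible reading of the pseudocode above in Python. It swaps the "unique numbers list" for a `set`, which is an assumption made here so that membership checks take expected constant time; `first_duplicate` is a name chosen for this sketch.

```python
def first_duplicate(numbers):
    """Return the first value seen for a second time, or None if there is none."""
    seen = set()                 # values already visited
    for number in numbers:
        if number in seen:
            return number        # already visited, so it is a duplicate
        seen.add(number)
    return None


print(first_duplicate([2, 3, 6, 1, 5, 4, 0, 3, 5]))
```

To collect both repeated numbers, append to a result list instead of returning at the first hit.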
How about this:
for (i=0; i<n-1; i++) {
for (j=i+1; j<n; j++) {
if (a[i] == a[j]) {
printf("%d appears more than once\n",a[i]);
break;
}
}
}
Sure it's not the fastest, but it's simple and easy to understand, and requires no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc.)
In C:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int num = 0, i;
for (i = 0; i < 9; i++)   /* XOR in all n = 9 array elements */
    num = num ^ arr[i];
for (i = 0; i < 7; i++)   /* XOR in every value of the range 0..n-3 */
    num = num ^ i;
Since x^x=0, every number that occurs an even number of times is neutralized. Each non-repeated number occurs twice (once in the array and once in the range) and disappears; each repeated number occurs three times and survives. Let's call the repeated numbers a and b. We are left with a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b and use that as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists: one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b land in different sublists. So we can now apply the same method used above to each sublist independently, and discover what a and b are.
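The split described above, sketched in Python under the assumption that the array holds 0..n-3 once plus two distinct values a second time. `repeated_pair_by_xor_split` is an illustrative name, and the lowest set bit is used as the mask, although any set bit of a^b would do.

```python
def repeated_pair_by_xor_split(arr):
    """Return the two repeated values (smaller first) using only XOR."""
    n = len(arr)

    a_xor_b = 0
    for v in arr:
        a_xor_b ^= v
    for v in range(n - 2):           # the values 0..n-3
        a_xor_b ^= v

    mask = a_xor_b & -a_xor_b        # lowest set bit: a and b differ here

    a = b = 0
    for v in arr:                    # split the array by the mask bit
        if v & mask:
            a ^= v
        else:
            b ^= v
    for v in range(n - 2):           # split the range the same way
        if v & mask:
            a ^= v
        else:
            b ^= v
    return tuple(sorted((a, b)))


print(repeated_pair_by_xor_split([2, 3, 6, 1, 5, 4, 0, 3, 5]))
```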
I have written a small program that finds the elements that are not repeated. Please go through it and let me know your opinion. At the moment I assume every repeated element occurs an even number of times, but it can easily be extended to odd counts as well.
My idea is to first sort the numbers and then apply my algorithm. Quicksort can be used to sort the elements.
Let's take the input array below:
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
The numbers 2, 10 and 4 are not repeated. Equal values already sit next to each other here, which is what the algorithm relies on; if they did not, use quicksort to sort the array first.
Let's apply my program to this:
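#include <cstdio>
#include <vector>
using namespace std;
int main()
{
    //int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
    int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
    int n = sizeof(arr)/sizeof(arr[0]);
    int i = 0;
    vector<int> vec;
    int var = arr[0];
    for(i = 1 ; i < n; i += 2)
    {
        var = var ^ arr[i];   // zero only when arr[i-1] == arr[i]
        if(var != 0 )
        {
            //put in vector
            var = arr[i-1];
            vec.push_back(var);
            i = i-1;          // re-align so the next pair starts at arr[i+1]
        }
        if(i + 1 < n)         // do not read past the end of the array
            var = arr[i+1];
    }
    if(i == n)                // the last element was left without a partner
        vec.push_back(arr[n-1]);
    for(size_t k = 0 ; k < vec.size() ; k++)
        printf("value not repeated = %d\n",vec[k]);
    return 0;
}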
This gives the output:
value not repeated = 2
value not repeated = 10
value not repeated = 4
It's simple and very straightforward: just use XOR, man. (The array has to be sorted first so that equal values are adjacent.)
for(i=0;i<n-1;i++) {
    if(!(arr[i] ^ arr[i+1]))
        printf("Found Repeated number %5d",arr[i]);
}
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as the parameter.
You also rely on the fact that, after a call to SELECT, the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2), then the repeated values are to the right of the median, so you continue with the right half of the array.
Otherwise, a repeated value is to the left of the median, so you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3} then the median should be 2 and it is so we continue with the right half.
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).
What about using the https://en.wikipedia.org/wiki/HyperLogLog?
Redis does http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
Using a nested for loop, and assuming the repeated numbers occur exactly twice in the array:
def repeated(ar, n):
    for i in range(n):
        count = 0  # occurrences of ar[i] after position i
        for j in range(i+1, n):
            if ar[i] == ar[j]:
                count += 1
        if count == 1:
            print("repeated:", ar[i])

arr = [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr, n)
Why should we do maths (especially solving quadratic equations)? These are costly operations. The best way to solve this would be to construct a bitmap of n-2 bits (one per value in 0..n-3), i.e. ((n-2) + 7) / 8 bytes. It is better to calloc this memory, so that every single bit is initialized to 0. Then traverse the list and set the corresponding bit to 1 as each number is encountered; if the bit is already set to 1 for that number, then that is a repeated number.
This can be extended to find out whether any number is missing from the array.
This solution is O(n) in time complexity.
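A small Python sketch of the bitmap approach, using a zero-filled `bytearray` in place of calloc'd memory. The function name `repeated_by_bitmap` is made up for this example, and the values are assumed to lie in 0..n-3.

```python
def repeated_by_bitmap(arr):
    """Return the repeated values in the order their second copy is met."""
    n = len(arr)
    bitmap = bytearray((n - 2 + 7) // 8)   # one bit per value in 0..n-3, all zero
    repeated = []
    for v in arr:
        byte, bit = divmod(v, 8)
        if bitmap[byte] & (1 << bit):      # bit already set: v was seen before
            repeated.append(v)
        else:
            bitmap[byte] |= 1 << bit       # mark v as seen
    return repeated


print(repeated_by_bitmap([2, 3, 6, 1, 5, 4, 0, 3, 5]))
```

Values whose bit is still 0 after the pass are the ones missing from the array.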

Resources