How to find the following type of set with computation time less than O(n)? - algorithm

Here 5 different sets are shown. S1 contains 1, and each subsequent set S2, S3, ... is calculated from the previous one using the following rule:
Suppose Sn contains {a1, a2, a3, ..., an} and the middle element of Sn is b.
Then the set Sn+1 contains the elements {b, b+a1, b+a2, ..., b+an}, i.e. (n+1) elements in total. If a set contains an even number of elements, then the middle element is the (n/2 + 1)-th one.
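For example, applying this rule repeatedly gives:
S1 = {1}
S2 = {1, 2}                      (middle of S1 is 1)
S3 = {2, 3, 4}                   (middle of S2 is 2)
S4 = {3, 5, 6, 7}                (middle of S3 is 3)
S5 = {6, 9, 11, 12, 13}          (middle of S4 is 6)
S6 = {11, 17, 20, 22, 23, 24}    (middle of S5 is 11)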
Now, if n is given as input then we have to display all the elements of set Sn.
Clearly it is possible to solve the problem in O(n) time.
We can compute the middle element of the k-th set as (2^(k-1) - middle element of the (k-1)-th set + 1), with S1 = {1} as the base case. In this way we get all the middle elements up to the (n-1)-th set in O(n) time. The middle element of the (n-1)-th set is the first element of the n-th set; adding the middle element of the (n-2)-th set to it gives the second element of the n-th set, and so on. In this way we obtain all the elements of the n-th set.
So it needs O(n) time.
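For example, the middle elements come out as 1, 2, 3, 6, 11, 22, ...: the middle of S5 is 2^4 - 6 + 1 = 11, and 11 is indeed the first element of S6.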
Here is the complete Java code I have written:
import java.util.Scanner;

public class SpecialSubset {
    private static Scanner inp;

    public static void main(String[] args) {
        int N, fst, mid, con = 0;
        inp = new Scanner(System.in);
        N = inp.nextInt();
        int[] setarr = new int[N];
        int[] midarr = new int[N];
        fst = 1;
        mid = 1;
        midarr[0] = 1;
        // middle element of set (i+1): 2^i - (previous middle) + 1
        for (int i = 1; i < N; i++) {
            midarr[i] = (int) (Math.pow(2, i) - midarr[i - 1] + 1);
        }
        // the first element of the N-th set is the middle element of the (N-1)-th set
        setarr[0] = midarr[N - 2];
        System.out.print(setarr[0]);
        System.out.print(" ");
        // each further element adds the middle element of the next-earlier set
        for (int i = 1, j = N - 3; i < N - 1; i++, j--) {
            setarr[i] = setarr[i - 1] + midarr[j];
            System.out.print(setarr[i]);
            System.out.print(" ");
        }
        setarr[N - 1] = setarr[N - 2] + 1;
        System.out.print(setarr[N - 1]);
    }
}
Here is the link of the Question:
https://www.hackerrank.com/contests/projecteuler/challenges/euler103
Is it possible to solve the problem in less than O(n) time?

@Paul Boddington has given an answer that relies on the sequence of first numbers of these sets being the Narayana-Zidek-Capell numbers, and has checked it for some small-ish values. However, no proof of the conjecture was given. This answer is in addition to the above, to make it complete. I'm no HTML/CSS/Markdown guru, so you'll have to excuse the bad positioning of subscripts (if anyone can improve those, be my guest).
Notation:
Let a_{i,j} be the i-th number in the j-th set.
I'll also define b_j as the first number of the (j-2)-th set. This is the sequence the proof is about. The -2 is to account for the first and second 1 in the Narayana-Zidek-Capell sequence.
Generating rules:
The problem statement didn't clarify what the "center number" is for an even-length set (a list really, but whatever), but it seems they meant the "center right" element in that case. I'll refer to the rule numbers in parentheses when I use them below.
(1) a_{1,1} = 1
(2) a_{1,n} = a_{ceil((n+1)/2), n-1}
(3) a_{i,n} = a_{1,n} + a_{i-1, n-1}
(4) b_n = a_{1, n-2}
Proof:
The first step is to make a slightly more involved formula for a_{i,n} by unwinding the recursion a bit more and substituting b:
(5) a_{i,n} = Σ_{j=0..i-1} a_{1, n-j} = Σ_{j=0..i-1} b_{n-j+2}
Next, we consider two cases for b_n - one where n is even, one where n is odd.
Even case:
b_{2n+2} = a_{1, 2n}                                   (by 4)
         = a_{ceil((2n+1)/2), 2n-1} = a_{n+1, 2n-1}    (by 2)
         = a_{1, 2n-1} + a_{n, 2n-2}                   (by 3)
         = b_{2n+1} + a_{1, 2n-1}                      (by 2, 4)
         = 2 * b_{2n+1}                                (by 5)
Odd case:
b_{2n+1} = a_{1, 2n-1}                                 (by 4)
         = a_{ceil(2n/2), 2n-2} = a_{n, 2n-2}          (by 2)
         = a_{1, 2n-2} + a_{n-1, 2n-3}                 (by 3)
         = 2 * b_{2n} + (a_{n-1, 2n-3} - a_{1, 2n-2})  (by 4)
         = 2 * b_{2n} + (a_{n-1, 2n-3} - a_{n, 2n-3})  (by 2)
         = 2 * b_{2n} - b_n                            (by 5)
These rules are exactly the sequence definition, and they provide a way to generate the n-th set in linear time (as opposed to quadratic time when generating each set in turn).

The smallest numbers in the sets appear to be the Narayana-Zidek-Capell numbers
1, 1, 2, 3, 6, 11, 22, ...
The other numbers are obtained from the first number by repeatedly adding these numbers in reverse.
For example,
S6 = {11, 17, 20, 22, 23, 24}
+6 +3 +2 +1 +1
Using a recurrence for the Narayana-Zidek-Capell sequence found in that link, I have managed to produce a solution for this problem that runs in O(n) time. Here is a solution in Java. It only works for n <= 32 due to int overflow, but it could be written using BigInteger to work for higher values.
static Set<Integer> set(int n) {
    // a[i] follows the Narayana-Zidek-Capell recurrence
    int[] a = new int[n + 2];
    for (int i = 1; i < n + 2; i++) {
        if (i <= 2)
            a[i] = 1;
        else if (i % 2 == 0)
            a[i] = 2 * a[i - 1];
        else
            a[i] = 2 * a[i - 1] - a[i / 2];
    }
    // the set consists of the running sums a[n+1], a[n+1]+a[n], ..., down to a[2]
    Set<Integer> set = new HashSet<>();
    int sum = 0;
    for (int i = n + 1; i >= 2; i--) {
        sum += a[i];
        set.add(sum);
    }
    return set;
}
I'm not able to justify right now why this is the same as the set in the question, but I'm working on it. However I have checked for all n <= 32 that this algorithm gives the same set as the "obvious" algorithm, so I'm reasonably sure it's correct.
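For reference, a brute-force cross-check along those lines could look like this (a quick sketch of mine, not the original test code): it builds the sets directly from the problem's rule and compares them with the recurrence-based construction above.

def brute_force_set(n):
    s = [1]
    for _ in range(n - 1):
        b = s[len(s) // 2]               # middle element ("center right" for even sizes)
        s = [b] + [b + x for x in s]
    return set(s)

def recurrence_set(n):
    a = [0] * (n + 2)
    for i in range(1, n + 2):
        if i <= 2:
            a[i] = 1
        elif i % 2 == 0:
            a[i] = 2 * a[i - 1]
        else:
            a[i] = 2 * a[i - 1] - a[i // 2]
    out, total = set(), 0
    for i in range(n + 1, 1, -1):        # running sums of a[n+1], a[n], ..., a[2]
        total += a[i]
        out.add(total)
    return out

for n in range(1, 25):
    assert brute_force_set(n) == recurrence_set(n)
print("recurrence construction matches brute force for n = 1..24")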

Related

How to find algorithm for triple 1 in bitvector in O(nlog(n)) with divide and conquer without FFT? [duplicate]

I had this question on an Algorithms test yesterday, and I can't figure out the answer. It is driving me absolutely crazy, because it was worth about 40 points. I figure that most of the class didn't solve it correctly, because I haven't come up with a solution in the past 24 hours.
Given an arbitrary binary string of length n, find three evenly spaced ones within the string if they exist. Write an algorithm which solves this in O(n * log(n)) time.
So strings like these have three ones that are "evenly spaced": 11100000, 0100100100
edit: It is a random number, so it should be able to work for any number. The examples I gave were to illustrate the "evenly spaced" property. So 1001011 is a valid number, with the ones at positions 1, 4, and 7 being evenly spaced.
Finally! Following up leads in sdcvvc's answer, we have it: the O(n log n) algorithm for the problem! It is simple too, after you understand it. Those who guessed FFT were right.
The problem: we are given a binary string S of length n, and we want to find three evenly spaced 1s in it. For example, S may be 110110010, where n=9. It has evenly spaced 1s at positions 2, 5, and 8.
Scan S left to right, and make a list L of positions of 1. For the S=110110010 above, we have the list L = [1, 2, 4, 5, 8]. This step is O(n). The problem is now to find an arithmetic progression of length 3 in L, i.e. to find distinct a, b, c in L such that b-a = c-b, or equivalently a+c=2b. For the example above, we want to find the progression (2, 5, 8).
Make a polynomial p with terms x^k for each k in L. For the example above, we make the polynomial p(x) = x + x^2 + x^4 + x^5 + x^8. This step is O(n).
Find the polynomial q = p^2, using the Fast Fourier Transform. For the example above, we get the polynomial q(x) = x^16 + 2x^13 + 2x^12 + 3x^10 + 4x^9 + x^8 + 2x^7 + 4x^6 + 2x^5 + x^4 + 2x^3 + x^2. This step is O(n log n).
Ignore all terms except those corresponding to x^(2k) for some k in L. For the example above, we get the terms x^16, 3x^10, x^8, x^4, x^2. This step is O(n), if you choose to do it at all.
Here's the crucial point: the coefficient of any x^(2b) for b in L is precisely the number of pairs (a,c) in L such that a+c=2b. [CLRS, Ex. 30.1-7] One such pair is (b,b) always (so the coefficient is at least 1), but if there exists any other pair (a,c), then the coefficient is at least 3, from (a,c) and (c,a). For the example above, we have the coefficient of x^10 to be 3 precisely because of the AP (2,5,8). (These coefficients of x^(2b) will always be odd numbers, for the reasons above. And all other coefficients in q will always be even.)
So then, the algorithm is to look at the coefficients of these terms x^(2b), and see if any of them is greater than 1. If there is none, then there are no evenly spaced 1s. If there is a b in L for which the coefficient of x^(2b) is greater than 1, then we know that there is some pair (a,c) — other than (b,b) — for which a+c=2b. To find the actual pair, we simply try each a in L (the corresponding c would be 2b-a) and see if there is a 1 at position 2b-a in S. This step is O(n).
That's all, folks.
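For what it's worth, here is a small Python sketch of those steps (my illustration; the answer itself gives no code). numpy.convolve stands in for the polynomial squaring; a genuinely O(n log n) version would use an FFT-based multiplication instead.

import numpy as np

def find_evenly_spaced_ones(s):
    L = [i for i, ch in enumerate(s) if ch == '1']   # positions of 1s (0-based)
    p = np.zeros(len(s), dtype=np.int64)
    p[L] = 1                                         # coefficient vector of p
    q = np.convolve(p, p)                            # q = p^2; q[m] = number of pairs in L summing to m
    for b in L:
        if q[2 * b] > 1:                             # some pair (a, c) != (b, b) with a + c = 2b
            for a in L:
                c = 2 * b - a
                if a != b and 0 <= c < len(s) and s[c] == '1':
                    return a, b, c
    return None

print(find_evenly_spaced_ones("110110010"))          # (1, 4, 7), i.e. positions 2, 5, 8 in 1-based numbering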
One might ask: do we need to use FFT? Many answers, such as beta's, flybywire's, and rsp's, suggest that the approach that checks each pair of 1s and sees if there is a 1 at the "third" position, might work in O(n log n), based on the intuition that if there are too many 1s, we would find a triple easily, and if there are too few 1s, checking all pairs takes little time. Unfortunately, while this intuition is correct and the simple approach is better than O(n^2), it is not significantly better. As in sdcvvc's answer, we can take the "Cantor-like set" of strings of length n=3^k, with 1s at the positions whose ternary representation has only 0s and 2s (no 1s) in it. Such a string has 2^k = n^(log 2/log 3) ≈ n^0.63 ones in it and no evenly spaced 1s, so checking all pairs would be of the order of the square of the number of 1s in it: that's 4^k ≈ n^1.26 which unfortunately is asymptotically much larger than (n log n). In fact, the worst case is even worse: Leo Moser in 1953 constructed (effectively) such strings which have n^(1-c/√(log n)) 1s in them but no evenly spaced 1s, which means that on such strings, the simple approach would take Θ(n^(2-2c/√(log n))) — only a tiny bit better than Θ(n^2), surprisingly!
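For concreteness, a small generator for such a Cantor-like string (my sketch, not from the answer):

def cantor_like_string(k):
    # a 1 at every position whose ternary representation contains no digit 1
    def no_one_in_ternary(m):
        while m:
            if m % 3 == 1:
                return False
            m //= 3
        return True
    n = 3 ** k
    return ''.join('1' if no_one_in_ternary(i) else '0' for i in range(n))

s = cantor_like_string(3)       # length 27
print(s, s.count('1'))          # 8 ones (2**3), and no three of them are evenly spaced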
About the maximum number of 1s in a string of length n with no 3 evenly spaced ones (which we saw above was at least n^0.63 from the easy Cantor-like construction, and at least n^(1-c/√(log n)) with Moser's construction) — this is OEIS A003002. It can also be calculated directly from OEIS A065825 as the k such that A065825(k) ≤ n < A065825(k+1). I wrote a program to find these, and it turns out that the greedy algorithm does not give the longest such string. For example, for n=9, we can get 5 1s (110100011) but the greedy gives only 4 (110110000), for n=26 we can get 11 1s (11001010001000010110001101) but the greedy gives only 8 (11011000011011000000000000), and for n=74 we can get 22 1s (11000010110001000001011010001000000000000000010001011010000010001101000011) but the greedy gives only 16 (11011000011011000000000000011011000011011000000000000000000000000000000000). They do agree at quite a few places until 50 (e.g. all of 38 to 50), though. As the OEIS references say, it seems that Jaroslaw Wroblewski is interested in this question, and he maintains a website on these non-averaging sets. The exact numbers are known only up to 194.
Your problem is called AVERAGE in this paper (1999):
A problem is 3SUM-hard if there is a sub-quadratic reduction from the problem 3SUM: Given a set A of n integers, are there elements a,b,c in A such that a+b+c = 0? It is not known whether AVERAGE is 3SUM-hard. However, there is a simple linear-time reduction from AVERAGE to 3SUM, whose description we omit.
Wikipedia:
When the integers are in the range [−u ... u], 3SUM can be solved in time O(n + u lg u) by representing S as a bit vector and performing a convolution using FFT.
This is enough to solve your problem :).
What is very important is that O(n log n) is the complexity in terms of the number of zeroes and ones, not the count of ones (which could be given as an array, like [1,5,9,15]). Checking whether a set has an arithmetic progression, in terms of the number of 1's, is hard, and according to that paper, as of 1999 no faster algorithm than O(n^2) was known, and it is conjectured that none exists. Everybody who doesn't take this into account is attempting to solve an open problem.
Other interesting info, mostly irrelevant:
Lower bound:
An easy lower bound is the Cantor-like set (numbers 1..3^n-1 not containing 1 in their ternary expansion) - it has about n^(log_3 2) ones, where the exponent is circa 0.631. So just checking whether the set isn't too large, and then checking all pairs, is not enough to get O(n log n). You have to investigate the sequence smarter. A better lower bound is quoted here - it's n^(1-c/(log(n))^(1/2)). This means the Cantor set is not optimal.
Upper bound - my old algorithm:
It is known that for large n, a subset of {1,2,...,n} not containing an arithmetic progression has at most n/(log n)^(1/20) elements. The paper On triples in arithmetic progression proves more: the set cannot contain more than n * 2^28 * (log log n / log n)^(1/2) elements. So you could check whether that bound is exceeded and if not, naively check pairs. This is an O(n^2 * log log n / log n) algorithm, faster than O(n^2). Unfortunately "On triples..." is on Springer - but the first page is available, and Ben Green's exposition is available here, page 28, theorem 24.
By the way, the papers are from 1999 - the same year as the first one I mentioned, so that's probably why the first one doesn't mention that result.
This is not a solution, but a similar line of thought to what Olexiy was thinking.
I was playing around with creating sequences with the maximum number of ones, and they are all quite interesting. I got up to 125 digits; here are the first 3 numbers it found by attempting to insert as many '1' bits as possible:
11011000011011000000000000001101100001101100000000000000000000000000000000000000000110110000110110000000000000011011000011011
10110100010110100000000000010110100010110100000000000000000000000000000000000000000101101000101101000000000000101101000101101
10011001010011001000000000010011001010011001000000000000000000000000000000000000010011001010011001000000000010011001010011001
Notice they are all fractals (not too surprising given the constraints). There may be something in thinking backwards: perhaps if the string is not a fractal with this characteristic, then it must have a repeating pattern?
Thanks to beta for the better term to describe these numbers.
Update:
Alas it looks like the pattern breaks down when starting with a large enough initial string, such as: 10000000000001:
100000000000011
10000000000001101
100000000000011011
10000000000001101100001
100000000000011011000011
10000000000001101100001101
100000000000011011000011010000000001
100000000000011011000011010000000001001
1000000000000110110000110100000000010011
1000000000000110110000110100000000010011001
10000000000001101100001101000000000100110010000000001
10000000000001101100001101000000000100110010000000001000001
1000000000000110110000110100000000010011001000000000100000100000000000001
10000000000001101100001101000000000100110010000000001000001000000000000011
1000000000000110110000110100000000010011001000000000100000100000000000001101
100000000000011011000011010000000001001100100000000010000010000000000000110100001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001100100000000100100000000000010000000010000100000100100010010000010000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000110000010000000000000000000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001100100000000100100000000000010000000010000100000100100010010000010000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000110000010000000000000000000001001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001000000000000000000000100100000000000000000000000000000000000011
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001000000000000000000000100100000000000000000000000000000000000011001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001001000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001000000000000000000000100100000000000000000000000000000000000011001000000000000000000000010010000010000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001100100000000100100000000000010000000010000100000100100010010000010000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000110000010000000000000000000001001000000000000000000000000000000000000110010000000000000000000000100100000100000011
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001001000001000000110000000000001
I suspect that a simple approach that looks like O(n^2) will actually yield something better, like O(n ln(n)). The sequences that take the longest to test (for any given n) are the ones that contain no trios, and that puts severe restrictions on the number of 1's that can be in the sequence.
I've come up with some hand-waving arguments, but I haven't been able to find a tidy proof. I'm going to take a stab in the dark: the answer is a very clever idea that the professor has known for so long that it's come to seem obvious, but it's much too hard for the students. (Either that or you slept through the lecture that covered it.)
Revision: 2009-10-17 23:00
I've run this on large numbers (like, strings of 20 million) and I now believe this algorithm is not O(n log n). Notwithstanding that, it's a cool enough implementation and contains a number of optimizations that make it run really fast. It evaluates all the arrangements of binary strings of 24 or fewer digits in under 25 seconds.
I've updated the code to include the 0 <= L < M < U <= X-1 observation from earlier today.
Original
This is, in concept, similar to another question I answered. That code also looked at three values in a series and determined if a triplet satisfied a condition. Here is C# code adapted from that:
using System;
using System.Collections.Generic;
namespace StackOverflow1560523
{
class Program
{
public struct Pair<T>
{
public T Low, High;
}
static bool FindCandidate(int candidate,
List<int> arr,
List<int> pool,
Pair<int> pair,
ref int iterations)
{
int lower = pair.Low, upper = pair.High;
while ((lower >= 0) && (upper < pool.Count))
{
int lowRange = candidate - arr[pool[lower]];
int highRange = arr[pool[upper]] - candidate;
iterations++;
if (lowRange < highRange)
lower -= 1;
else if (lowRange > highRange)
upper += 1;
else
return true;
}
return false;
}
static List<int> BuildOnesArray(string s)
{
List<int> arr = new List<int>();
for (int i = 0; i < s.Length; i++)
if (s[i] == '1')
arr.Add(i);
return arr;
}
static void BuildIndexes(List<int> arr,
ref List<int> even, ref List<int> odd,
ref List<Pair<int>> evenIndex, ref List<Pair<int>> oddIndex)
{
for (int i = 0; i < arr.Count; i++)
{
bool isEven = (arr[i] & 1) == 0;
if (isEven)
{
evenIndex.Add(new Pair<int> {Low=even.Count-1, High=even.Count+1});
oddIndex.Add(new Pair<int> {Low=odd.Count-1, High=odd.Count});
even.Add(i);
}
else
{
oddIndex.Add(new Pair<int> {Low=odd.Count-1, High=odd.Count+1});
evenIndex.Add(new Pair<int> {Low=even.Count-1, High=even.Count});
odd.Add(i);
}
}
}
static int FindSpacedOnes(string s)
{
// List of indexes of 1s in the string
List<int> arr = BuildOnesArray(s);
//if (s.Length < 3)
// return 0;
// List of indexes to odd indexes in arr
List<int> odd = new List<int>(), even = new List<int>();
// evenIndex has indexes into arr to bracket even numbers
// oddIndex has indexes into arr to bracket odd numbers
List<Pair<int>> evenIndex = new List<Pair<int>>(),
oddIndex = new List<Pair<int>>();
BuildIndexes(arr,
ref even, ref odd,
ref evenIndex, ref oddIndex);
int iterations = 0;
for (int i = 1; i < arr.Count-1; i++)
{
int target = arr[i];
bool found = FindCandidate(target, arr, odd, oddIndex[i], ref iterations) ||
FindCandidate(target, arr, even, evenIndex[i], ref iterations);
if (found)
return iterations;
}
return iterations;
}
static IEnumerable<string> PowerSet(int n)
{
for (long i = (1L << (n-1)); i < (1L << n); i++)
{
yield return Convert.ToString(i, 2).PadLeft(n, '0');
}
}
static void Main(string[] args)
{
for (int i = 5; i < 64; i++)
{
int c = 0;
string hardest_string = "";
foreach (string s in PowerSet(i))
{
int cost = FindSpacedOnes(s);
if (cost > c)
{
hardest_string = s;
c = cost;
Console.Write("{0} {1} {2}\r", i, c, hardest_string);
}
}
Console.WriteLine("{0} {1} {2}", i, c, hardest_string);
}
}
}
}
The principal differences are:
Exhaustive search of solutions
This code generates a power set of data to find the hardest input to solve for this algorithm.
All solutions versus hardest to solve
The code for the previous question generated all the solutions using a python generator. This code just displays the hardest for each pattern length.
Scoring algorithm
This code checks the distance from the middle element to its left- and right-hand edge. The python code tested whether a sum was above or below 0.
Convergence on a candidate
The current code works from the middle towards the edge to find a candidate. The code in the previous problem worked from the edges towards the middle. This last change gives a large performance improvement.
Use of even and odd pools
Based on the observations at the end of this write-up, the code searches pairs of even numbers or pairs of odd numbers to find L and U, keeping M fixed. This reduces the number of searches by pre-computing information. Accordingly, the code uses two levels of indirection in the main loop of FindCandidate and requires two calls to FindCandidate for each middle element: once for even numbers and once for odd ones.
The general idea is to work on indexes, not the raw representation of the data. Calculating an array where the 1's appear allows the algorithm to run in time proportional to the number of 1's in the data rather than in time proportional to the length of the data. This is a standard transformation: create a data structure that allows faster operation while keeping the problem equivalent.
The results are out of date: removed.
Edit: 2009-10-16 18:48
On yx's data, which is given some credence in the other responses as representative of hard data to calculate on, I get these results... I removed these. They are out of date.
I would point out that this data is not the hardest for my algorithm, so I think the assumption that yx's fractals are the hardest to solve is mistaken. The worst case for a particular algorithm, I expect, will depend upon the algorithm itself and will not likely be consistent across different algorithms.
Edit: 2009-10-17 13:30
Further observations on this.
First, convert the string of 0's and 1's into an array of indexes for each position of the 1's. Say the length of that array A is X. Then the goal is to find
0 <= L < M < U <= X-1
such that
A[M] - A[L] = A[U] - A[M]
or
2*A[M] = A[L] + A[U]
Since A[L] and A[U] sum to an even number, they can't be (even, odd) or (odd, even). The search for a match could be improved by splitting A[] into odd and even pools and searching for matches on A[M] in the pools of odd and even candidates in turn.
However, this is more of a performance optimization than an algorithmic improvement, I think. The number of comparisons should drop, but the order of the algorithm should be the same.
Edit 2009-10-18 00:45
Yet another optimization occurs to me, in the same vein as separating the candidates into even and odd. Since the three indexes have to add to a multiple of 3 (a, a+x, a+2x -- mod 3 is 0, regardless of a and x), you can separate L, M, and U into their mod 3 values:
M   L  U
0   0  0
    1  2
    2  1
1   0  2
    1  1
    2  0
2   0  1
    1  0
    2  2
In fact, you could combine this with the even/odd observation and separate them into their mod 6 values:
M   L  U
0   0  0
    1  5
    2  4
    3  3
    4  2
    5  1
and so on. This would provide a further performance optimization but not an algorithmic speedup.
Wasn't able to come up with the solution yet :(, but have some ideas.
What if we start from the reverse problem: construct a sequence with the maximum number of 1s and WITHOUT any evenly spaced trios. If you can prove the maximum number of 1s is o(n), then you can improve your estimate by iterating only through the list of 1s.
This may help....
This problem reduces to the following:
Given a sequence of positive integers, find a contiguous subsequence partitioned into a prefix and a suffix such that the sum of the prefix of the subsequence is equal to the sum of the suffix of the subsequence.
For example, given a sequence of [ 3, 5, 1, 3, 6, 5, 2, 2, 3, 5, 6, 4 ], we would find a subsequence of [ 3, 6, 5, 2, 2] with a prefix of [ 3, 6 ] with prefix sum of 9 and a suffix of [ 5, 2, 2 ] with suffix sum of 9.
The reduction is as follows:
Given a sequence of zeros and ones, and starting at the leftmost one, continue moving to the right. Each time another one is encountered, record the number of moves since the previous one was encountered and append that number to the resulting sequence.
For example, given a sequence of [ 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0 ], we would find the reduction of [ 1, 3, 4 ]. From this reduction, we calculate the contiguous subsequence of [ 1, 3, 4 ], the prefix of [ 1, 3 ] with sum of 4, and the suffix of [ 4 ] with sum of 4.
This reduction may be computed in O(n).
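A small sketch of that reduction (mine, not part of the original answer):

def gaps_between_ones(bits):
    # bits: list of 0/1; returns the distances between consecutive 1s
    ones = [i for i, b in enumerate(bits) if b == 1]
    return [b - a for a, b in zip(ones, ones[1:])]

print(gaps_between_ones([0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]))   # [1, 3, 4]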
Unfortunately, I am not sure where to go from here.
For the simple problem type (i.e. you search for three "1"s with only "0"s (zero or more of them) between them), it's quite simple: you could just split the sequence at every "1" and look for two adjacent subsequences having the same length (the second subsequence not being the last one, of course). Obviously, this can be done in O(n) time.
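A possible sketch of that simple case (my code, not the answerer's):

def three_consecutive_evenly_spaced(s):
    # zero-runs between consecutive 1s; two adjacent equal runs mean three
    # consecutive 1s that are evenly spaced
    runs = s.strip('0').split('1')[1:-1]
    return any(len(a) == len(b) for a, b in zip(runs, runs[1:]))

print(three_consecutive_evenly_spaced("11100000"))   # True  (1s at positions 1, 2, 3)
print(three_consecutive_evenly_spaced("1001011"))    # False: 1, 4, 7 are evenly spaced, but another 1 sits between them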
For the more complex version (i.e. you search for an index i and a gap g>0 such that s[i]==s[i+g]==s[i+2*g]=="1"), I'm not sure whether there exists an O(n log n) solution, since there are possibly O(n²) triplets having this property (think of a string of all ones; there are approximately n²/2 such triplets). Of course, you are looking for only one of these, but I have currently no idea how to find it ...
A fun question, but once you realise that the actual pattern between two '1's does not matter, the algorithm becomes:
scan for a '1'
starting from the next position, scan for another '1' (up to the end of the array minus the distance from the current first '1', or else the 3rd '1' would be out of bounds)
if at the position of the 2nd '1' plus the distance to the first '1' a third '1' is found, we have evenly spaced ones.
In code, JTest fashion, (Note this code isn't written to be most efficient and I added some println's to see what happens.)
import java.util.Random;
import junit.framework.TestCase;
public class AlgorithmTest extends TestCase {
/**
* Constructor for GetNumberTest.
*
* @param name The test's name.
*/
public AlgorithmTest(String name) {
super(name);
}
/**
* @see TestCase#setUp()
*/
protected void setUp() throws Exception {
super.setUp();
}
/**
* @see TestCase#tearDown()
*/
protected void tearDown() throws Exception {
super.tearDown();
}
/**
* Tests the algorithm.
*/
public void testEvenlySpacedOnes() {
assertFalse(isEvenlySpaced(1));
assertFalse(isEvenlySpaced(0x058003));
assertTrue(isEvenlySpaced(0x07001));
assertTrue(isEvenlySpaced(0x01007));
assertTrue(isEvenlySpaced(0x101010));
// some fun tests
Random random = new Random();
isEvenlySpaced(random.nextLong());
isEvenlySpaced(random.nextLong());
isEvenlySpaced(random.nextLong());
}
/**
* @param testBits
*/
private boolean isEvenlySpaced(long testBits) {
String testString = Long.toBinaryString(testBits);
char[] ones = testString.toCharArray();
final char ONE = '1';
for (int n = 0; n < ones.length - 1; n++) {
if (ONE == ones[n]) {
for (int m = n + 1; m < ones.length - m + n; m++) {
if (ONE == ones[m] && ONE == ones[m + m - n]) {
System.out.println(" IS evenly spaced: " + testBits + '=' + testString);
System.out.println(" at: " + n + ", " + m + ", " + (m + m - n));
return true;
}
}
}
}
System.out.println("NOT evenly spaced: " + testBits + '=' + testString);
return false;
}
}
I thought of a divide-and-conquer approach that might work.
First, in preprocessing you need to insert all numbers up to one half of your input size (n/2) into a list.
Given a string: 1000010101000100 (note that this particular example is valid)
Insert all numbers from 1 to (16/2) into a list: {1, 2, 3, 4, 5, 6, 7}
Then divide it in half:
10000101 01000100
Keep doing this until you get to strings of size 1. For all size-one strings with a 1 in them, add the index of the string to the list of possibilities; otherwise, return -1 for failure.
You'll also need to return a list of still-possible spacing distances, associated with each starting index. (Start with the list you made above and remove numbers as you go) Here, an empty list means you're only dealing with one 1 and so any spacing is possible at this point; otherwise the list includes spacings that must be ruled out.
So continuing with the example above:
1000 0101 0100 0100
10 00 01 01 01 00 01 00
1 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0
In the first combine step, we have eight sets of two now. In the first, we have the possibility of a set, but we learn that spacing by 1 is impossible because of the other zero being there. So we return 0 (for the index) and {2,3,4,5,7} for the fact that spacing by 1 is impossible. In the second, we have nothing and so return -1. In the third we have a match with no spacings eliminated in index 5, so return 5, {1,2,3,4,5,7}. In the fourth pair we return 7, {1,2,3,4,5,7}. In the fifth, return 9, {1,2,3,4,5,7}. In the sixth, return -1. In the seventh, return 13, {1,2,3,4,5,7}. In the eighth, return -1.
Combining again into four sets of four, we have:
1000: Return (0, {4,5,6,7})
0101: Return (5, {2,3,4,5,6,7}), (7, {1,2,3,4,5,6,7})
0100: Return (9, {3,4,5,6,7})
0100: Return (13, {3,4,5,6,7})
Combining into sets of eight:
10000101: Return (0, {5,7}), (5, {2,3,4,5,6,7}), (7, {1,2,3,4,5,6,7})
01000100: Return (9, {4,7}), (13, {3,4,5,6,7})
Combining into a set of sixteen:
10000101 01000100
As we've progressed, we keep checking all the possibilities so far. Up to this step we've left stuff that went beyond the end of the string, but now we can check all the possibilities.
Basically, we check the first 1 with spacings of 5 and 7, and find that they don't line up to 1's. (Note that each check is CONSTANT, not linear time) Then we check the second one (index 5) with spacings of 2, 3, 4, 5, 6, and 7-- or we would, but we can stop at 2 since that actually matches up.
Phew! That's a rather long algorithm.
I don't know 100% if it's O(n log n) because of the last step, but everything up to there is definitely O(n log n) as far as I can tell. I'll get back to this later and try to refine the last step.
EDIT: Changed my answer to reflect Welbog's comment. Sorry for the error. I'll write some pseudocode later, too, when I get a little more time to decipher what I wrote again. ;-)
I'll give my rough guess here, and let those who are better at calculating complexity help me figure out how my algorithm fares O-notation-wise.
given binary string 0000010101000100 (as example)
crop head and tail of zeroes -> 00000 101010001 00
we get 101010001 from previous calculation
check if the middle bit is 'one'; if it is, we have found three valid evenly spaced 'ones' (this only applies if the number of bits is odd)
correspondingly, if the remaining cropped string has an even number of bits, the head and tail 'one' cannot both be the endpoints of an evenly spaced triple,
we use 1010100001 as an example (with an extra 'zero' to make the cropped length even); in this case we need to crop again, and it becomes -> 10101 00001
we get 10101 from the previous calculation, check the middle bit, and we find the evenly spaced bits again
I have no idea how to calculate complexity for this, can anyone help?
edit: add some code to illustrate my idea
edit2: tried to compile my code and found some major mistakes, fixed
#include <stdio.h>
#include <string.h>

char *binaryStr = "0000010101000100";

int find3even(int head, int tail);

int main() {
    int head, tail, pos;
    head = 0;
    tail = strlen(binaryStr) - 1;
    if ((pos = find3even(head, tail)) >= 0)
        printf("found it at position %d\n", pos);
    return 0;
}

int find3even(int head, int tail) {
    int pos = 0;
    if (head >= tail) return -1;
    while (head < tail && binaryStr[head] == '0')
        head++;
    while (head < tail && binaryStr[tail] == '0')
        tail--;
    if (head >= tail) return -1;
    if ((tail - head) % 2 == 0 &&              // true if the number of bits is odd
        (binaryStr[head + (tail - head) / 2] == '1')) {
        return head;
    } else {
        if ((pos = find3even(head, tail - 1)) >= 0)
            return pos;
        if ((pos = find3even(head + 1, tail)) >= 0)
            return pos;
    }
    return -1;
}
I came up with something like this:
def IsSymetric(number):
    number = number.strip('0')
    if len(number) < 3:
        return False
    if len(number) % 2 == 0:
        # even length: try skipping the first character, then skipping the last one
        return IsSymetric(number[1:]) or IsSymetric(number[0:len(number)-1])
    else:
        if number[len(number)//2] == '1':
            return True
        return IsSymetric(number[:(len(number)//2)]) or IsSymetric(number[len(number)//2+1:])
    return False
This is inspired by andycjw.
Truncate the zeros.
If the length is even, then test two substrings: 0 to (len-2) (skip the last character) and 1 to (len-1) (skip the first character).
If the length is not even, then if the middle char is a one we have success. Else divide the string in the middle, without the middle element, and check both parts.
As to the complexity, this might be O(n log n), as in each recursion we are dividing by two.
Hope it helps.
Ok, I'm going to take another stab at the problem. I think I can prove a O(n log(n)) algorithm that is similar to those already discussed by using a balanced binary tree to store distances between 1's. This approach was inspired by Justice's observation about reducing the problem to a list of distances between the 1's.
We could scan the input string to construct a balanced binary tree around the positions of the 1's, such that each node stores the position of a 1 and each edge is labeled with the distance to the adjacent 1 for each child node. For example:
10010001 (1's at positions 0, 3 and 7) gives the following tree

        3
     3 / \ 4
      /   \
     0     7
This can be done in O(n log(n)) since, for a string of size n, each insertion takes O(log(n)) in the worst case.
Then the problem is to search the tree to discover whether, at any node, there is a path from that node through the left-child that has the same distance as a path through the right child. This can be done recursively on each subtree. When merging two subtrees in the search, we must compare the distances from paths in the left subtree with distances from paths in the right. Since the number of paths in a subtree will be proportional to log(n), and the number of nodes is n, I believe this can be done in O(n log(n)) time.
Did I miss anything?
This seemed liked a fun problem so I decided to try my hand at it.
I am making the assumption that 111000001 would find the first 3 ones and be successful. Essentially the number of zeroes following the 1 is the important thing, since 0111000 is the same as 111000 according to your definition. Once you find two cases of 1, the next 1 found completes the trilogy.
Here it is in Python:
def find_three(bstring):
    print bstring
    dict = {}
    lastone = -1
    zerocount = 0
    for i in range(len(bstring)):
        if bstring[i] == '1':
            print i, ': 1'
            if lastone != -1:
                if(zerocount in dict):
                    dict[zerocount].append(lastone)
                    if len(dict[zerocount]) == 2:
                        dict[zerocount].append(i)
                        return True, dict
                else:
                    dict[zerocount] = [lastone]
            lastone = i
            zerocount = 0
        else:
            zerocount = zerocount + 1
    #this is really just book keeping, as we have failed at this point
    if lastone != -1:
        if(zerocount in dict):
            dict[zerocount].append(lastone)
        else:
            dict[zerocount] = [lastone]
    return False, dict
This is a first try, so I'm sure this could be written in a cleaner manner. Please list the cases where this method fails down below.
I assume the reason this is n log(n) is due to the following:
To find the 1 that is the start of the triplet, you need to check (n-2) characters. If you haven't found it by that point, you won't (chars n-1 and n cannot start a triplet) (O(n))
To find the second 1 that is the part of the triplet (started by the first one), you need to check m/2 (m=n-x, where x is the offset of the first 1) characters. This is because, if you haven't found the second 1 by the time you're halfway from the first one to the end, you won't... since the third 1 must be exactly the same distance past the second. (O(log(n)))
It O(1) to find the last 1 since you know the index it must be at by the time you find the first and second.
So, you have n, log(n), and 1... O(nlogn)
Edit: Oops, my bad. My brain had it set that n/2 was logn... which it obviously isn't (doubling the number on items still doubles the number of iterations on the inner loop). This is still at n^2, not solving the problem. Well, at least I got to write some code :)
Implementation in Tcl
proc get-triplet {input} {
for {set first 0} {$first < [string length $input]-2} {incr first} {
if {[string index $input $first] != 1} {
continue
}
set start [expr {$first + 1}]
set end [expr {1+ $first + (([string length $input] - $first) /2)}]
for {set second $start} {$second < $end} {incr second} {
if {[string index $input $second] != 1} {
continue
}
set last [expr {($second - $first) + $second}]
if {[string index $input $last] == 1} {
return [list $first $second $last]
}
}
}
return {}
}
get-triplet 10101 ;# 0 2 4
get-triplet 10111 ;# 0 2 4
get-triplet 11100000 ;# 0 1 2
get-triplet 0100100100 ;# 1 4 7
I think I have found a way of solving the problem, but I can't construct a formal proof. The solution I made is written in Java, and it uses a counter 'n' to count how many list/array accesses it does. So n should be less than or equal to stringLength*log(stringLength) if it is correct. I tried it for the numbers 0 to 2^22, and it works.
It starts by iterating over the input string and making a list of all the indexes which hold a one. This is just O(n).
Then from the list of indexes it picks a firstIndex, and a secondIndex which is greater than the first. These two indexes must hold ones, because they are in the list of indexes. From there the thirdIndex can be calculated. If the inputString[thirdIndex] is a 1 then it halts.
public static int testString(String input){
//n is the number of array/list accesses in the algorithm
int n=0;
//Put the indices of all the ones into a list, O(n)
ArrayList<Integer> ones = new ArrayList<Integer>();
for(int i=0;i<input.length();i++){
if(input.charAt(i)=='1'){
ones.add(i);
}
}
//If less than three ones in list, just stop
if(ones.size()<3){
return n;
}
int firstIndex, secondIndex, thirdIndex;
for(int x=0;x<ones.size()-2;x++){
n++;
firstIndex = ones.get(x);
for(int y=x+1; y<ones.size()-1; y++){
n++;
secondIndex = ones.get(y);
thirdIndex = secondIndex*2 - firstIndex;
if(thirdIndex >= input.length()){
break;
}
n++;
if(input.charAt(thirdIndex) == '1'){
//This case is satisfied if it has found three evenly spaced ones
//System.out.println("This one => " + input);
return n;
}
}
}
return n;
}
additional note: the counter n is not incremented when it iterates over the input string to construct the list of indexes. This operation is O(n), so it won't have an effect on the algorithm complexity anyway.
One inroad into the problem is to think of factors and shifting.
With shifting, you compare the string of ones and zeroes with a shifted version of itself. You then take matching ones. Take this example shifted by two:
1010101010
  1010101010
------------
001010101000
The resulting 1's (bitwise ANDed) must represent all those 1's which are evenly spaced by two. The same example shifted by three:
1010101010
   1010101010
-------------
0000000000000
In this case there are no 1's which are evenly spaced three apart.
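As an aside (my sketch, not part of this answer's argument): the same AND idea, extended to three shifted copies and applied for every candidate spacing, detects a triple directly, although trying all spacings is still roughly quadratic in the worst case:

def has_evenly_spaced_ones(s):
    x = int(s, 2)
    n = len(s)
    # for each candidate spacing k, a set bit in x & (x >> k) & (x >> 2*k)
    # marks the highest 1 of a triple spaced exactly k apart
    for k in range(1, (n - 1) // 2 + 1):
        if x & (x >> k) & (x >> (2 * k)):
            return True
    return False

print(has_evenly_spaced_ones("0100100100"))   # True
print(has_evenly_spaced_ones("10010001"))     # False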
So what does this tell you? Well that you only need to test shifts which are prime numbers. For example say you have two 1's which are six apart. You would only have to test 'two' shifts and 'three' shifts (since these divide six). For example:
10000010
  10000010 (Shift by two)
    10000010
      10000010 (We have a match)
10000010
   10000010 (Shift by three)
      10000010 (We have a match)
So the only shifts you ever need to check are 2,3,5,7,11,13 etc. Up to the prime closest to the square root of size of the string of digits.
Nearly solved?
I think I am closer to a solution. Basically:
Scan the string for 1's. For each 1, note its remainder after taking a modulus of its position. The modulus ranges from 1 to half the size of the string. This is because the largest possible separation size is half the string. This is done in O(n^2). BUT. Only prime moduli need be checked so O(n^2/log(n))
Sort the list of modulus/remainders in order largest modulus first, this can be done in O(n*log(n)) time.
Look for three consecutive moduli/remainders which are the same.
Somehow retrieve the position of the ones!
I think the biggest clue to the answer, is that the fastest sort algorithms, are O(n*log(n)).
WRONG
Step 1 is wrong, as pointed out by a colleague. If we have 1's at positions 2, 12 and 102, then taking a modulus of 10, they would all have the same remainder, and yet they are not equally spaced apart! Sorry.
Here are some thoughts that, despite my best efforts, will not seem to wrap themselves up in a bow. Still, they might be a useful starting point for someone's analysis.
Consider the proposed solution as follows, which is the approach that several folks have suggested, including myself in a prior version of this answer. :)
Trim leading and trailing zeroes.
Scan the string looking for 1's.
When a 1 is found:
Assume that it is the middle 1 of the solution.
For each prior 1, use its saved position to compute the anticipated position of the final 1.
If the computed position is after the end of the string it cannot be part of the solution, so drop the position from the list of candidates.
Check the solution.
If the solution was not found, add the current 1 to the list of candidates.
Repeat until no more 1's are found.
Now consider input strings like the following, which will not have a solution:
101
101001
1010010001
101001000100001
101001000100001000001
In general, this is the concatenation of k strings of the form j 0's followed by a 1 for j from zero to k-1.
k=2 101
k=3 101001
k=4 1010010001
k=5 101001000100001
k=6 101001000100001000001
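A quick way to generate these test strings (a hypothetical helper of mine, not from the answer):

def no_solution_string(k):
    # concatenation of j zeros followed by a 1, for j = 0 .. k-1
    return ''.join('0' * j + '1' for j in range(k))

print(no_solution_string(4))   # 1010010001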
Note that the lengths of the substrings are 1, 2, 3, etc. So, problem size n has substrings of lengths 1 to k such that n = k(k+1)/2.
k=2 n= 3 101
k=3 n= 6 101001
k=4 n=10 1010010001
k=5 n=15 101001000100001
k=6 n=21 101001000100001000001
Note that k also tracks the number of 1's that we have to consider. Remember that every time we see a 1, we need to consider all the 1's seen so far. So when we see the second 1, we only consider the first, when we see the third 1, we reconsider the first two, when we see the fourth 1, we need to reconsider the first three, and so on. By the end of the algorithm, we've considered k(k-1)/2 pairs of 1's. Call that p.
k=2 n= 3 p= 1 101
k=3 n= 6 p= 3 101001
k=4 n=10 p= 6 1010010001
k=5 n=15 p=10 101001000100001
k=6 n=21 p=15 101001000100001000001
The relationship between n and p is that n = p + k.
The process of going through the string takes O(n) time. Each time a 1 is encountered, a maximum of (k-1) comparisons are done. Since n = k(k+1)/2, we have k**2 < 2n, so k < sqrt(2n). This gives us O(n sqrt(n)) or O(n**3/2). Note however that may not be a really tight bound, because the number of comparisons goes from 1 up to a maximum of k; it isn't k the whole time. But I'm not sure how to account for that in the math.
It still isn't O(n log(n)). Also, I can't prove those inputs are the worst cases, although I suspect they are. I think a denser packing of 1's to the front results in an even sparser packing at the end.
Since someone may still find it useful, here's my code for that solution in Perl:
#!/usr/bin/perl
# read input as first argument
my $s = $ARGV[0];
# validate the input
$s =~ /^[01]+$/ or die "invalid input string\n";
# strip leading and trailing 0's
$s =~ s/^0+//;
$s =~ s/0+$//;
# prime the position list with the first '1' at position 0
my @p = (0);
# start at position 1, which is the second character
my $i = 1;
print "the string is $s\n\n";
while ($i < length($s)) {
if (substr($s, $i, 1) eq '1') {
print "found '1' at position $i\n";
my @t = ();
# assuming this is the middle '1', go through the positions
# of all the prior '1's and check whether there's another '1'
# in the correct position after this '1' to make a solution
while (scalar @p) {
# $p is the position of the prior '1'
my $p = shift @p;
# $j is the corresponding position for the following '1'
my $j = 2 * $i - $p;
# if $j is off the end of the string then we don't need to
# check $p anymore
next if ($j >= length($s));
print "checking positions $p, $i, $j\n";
if (substr($s, $j, 1) eq '1') {
print "\nsolution found at positions $p, $i, $j\n";
exit 0;
}
# if $j isn't off the end of the string, keep $p for next time
push @t, $p;
}
@p = @t;
# add this '1' to the list of '1' positions
push @p, $i;
}
$i++;
}
print "\nno solution found\n";
While scanning 1s, add their positions to a List. When adding the second and successive 1s, compare them to each position in the list so far. Spacing equals currentOne (center) - previousOne (left). The right-side bit is currentOne + spacing. If it's 1, the end.
The list of ones grows inversely with the space between them. Simply stated, if you've got a lot of 0s between the 1s (as in a worst case), your list of known 1s will grow quite slowly.
using System;
using System.Collections.Generic;
namespace spacedOnes
{
class Program
{
static int[] _bits = new int[8] {128, 64, 32, 16, 8, 4, 2, 1};
static void Main(string[] args)
{
var bytes = new byte[4];
var r = new Random();
r.NextBytes(bytes);
foreach (var b in bytes) {
Console.Write(getByteString(b));
}
Console.WriteLine();
var bitCount = bytes.Length * 8;
var done = false;
var onePositions = new List<int>();
for (var i = 0; i < bitCount; i++)
{
if (isOne(bytes, i)) {
if (onePositions.Count > 0) {
foreach (var knownOne in onePositions) {
var spacing = i - knownOne;
var k = i + spacing;
if (k < bitCount && isOne(bytes, k)) {
Console.WriteLine("^".PadLeft(knownOne + 1) + "^".PadLeft(spacing) + "^".PadLeft(spacing));
done = true;
break;
}
}
}
if (done) {
break;
}
onePositions.Add(i);
}
}
Console.ReadKey();
}
static String getByteString(byte b) {
var s = new char[8];
for (var i=0; i<s.Length; i++) {
s[i] = ((b & _bits[i]) > 0 ? '1' : '0');
}
return new String(s);
}
static bool isOne(byte[] bytes, int i)
{
var byteIndex = i / 8;
var bitIndex = i % 8;
return (bytes[byteIndex] & _bits[bitIndex]) > 0;
}
}
}
I thought I'd add one comment before posting the 22nd naive solution to the problem. For the naive solution, we don't need to show that the number of 1's in the string is at most O(log(n)), but rather that it is at most O(sqrt(n*log(n))).
Solver:
def solve(Str):
    indexes=[]
    #O(n) setup
    for i in range(len(Str)):
        if Str[i]=='1':
            indexes.append(i)
    #O((number of 1's)^2) processing
    for i in range(len(indexes)):
        for j in range(i+1, len(indexes)):
            indexDiff = indexes[j] - indexes[i]
            k=indexes[j] + indexDiff
            if k<len(Str) and Str[k]=='1':
                return True
    return False
It's basically a fair bit similar to flybywire's idea and implementation, though looking ahead instead of back.
Greedy String Builder:
#assumes final char hasn't been added, and would be a 1
def lastCharMakesSolvable(Str):
    endIndex=len(Str)
    j=endIndex-1
    while j-(endIndex-j) >= 0:
        k=j-(endIndex-j)
        if k >= 0 and Str[k]=='1' and Str[j]=='1':
            return True
        j=j-1
    return False

def expandString(StartString=''):
    if lastCharMakesSolvable(StartString):
        return StartString + '0'
    return StartString + '1'

n=1
BaseStr=""
lastCount=0
while n<1000000:
    BaseStr=expandString(BaseStr)
    count=BaseStr.count('1')
    if count != lastCount:
        print(len(BaseStr), count)
        lastCount=count
    n=n+1
(In my defense, I'm still in the 'learn python' stage of understanding)
Also, potentially useful output from the greedy building of strings, there's a rather consistent jump after hitting a power of 2 in the number of 1's... which I was not willing to wait around to witness hitting 2096.
strlength # of 1's
1 1
2 2
4 3
5 4
10 5
14 8
28 9
41 16
82 17
122 32
244 33
365 64
730 65
1094 128
2188 129
3281 256
6562 257
9842 512
19684 513
29525 1024
I'll try to present a mathematical approach. This is more a beginning than an end, so any help, comment, or even contradiction - will be deeply appreciated. However, if this approach is proven - the algorithm is a straight-forward search in the string.
Given a fixed number of spaces k and a string S, the search for a k-spaced-triplet takes O(n) - We simply test for every 0<=i<=(n-2k) if S[i]==S[i+k]==S[i+2k]. The test takes O(1) and we do it n-k times where k is a constant, so it takes O(n-k)=O(n).
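For instance, the fixed-spacing scan could look like this (a sketch of mine, not the poster's code):

def has_triplet_with_spacing(S, k):
    # O(n) scan for a triple spaced exactly k apart
    return any(S[i] == S[i + k] == S[i + 2 * k] == '1' for i in range(len(S) - 2 * k))

print(has_triplet_with_spacing("0100100100", 3))   # True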
Let us assume that there is an Inverse Proportion between the number of 1's and the maximum spaces we need to search for. That is, If there are many 1's, there must be a triplet and it must be quite dense; If there are only few 1's, The triplet (if any) can be quite sparse. In other words, I can prove that if I have enough 1's, such triplet must exist - and the more 1's I have, a more dense triplet must be found. This can be explained by the Pigeonhole principle - Hope to elaborate on this later.
Say we have an upper bound k on the possible number of spaces we have to look for. Now, for each 1 located at S[i] we need to check for a 1 in S[i-1] and S[i+1], S[i-2] and S[i+2], ... S[i-k] and S[i+k]. This takes O((k^2-k)/2)=O(k^2) for each 1 in S - due to Gauss' series summation formula. Note that this differs from section 1 - I'm using k as an upper bound for the number of spaces, not as a constant space.
We need to prove O(n*log(n)). That is, we need to show that k*(number of 1's) is proportional to log(n).
If we can do that, the algorithm is trivial - for each 1 in S whose index is i, simply look for 1's on each side up to distance k. If two were found at the same distance, return i and k. Again, the tricky part would be finding k and proving the correctness.
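A minimal sketch of that trivial search, assuming an upper bound k is already known (the function name is mine); the open part of this answer - finding and justifying such a k - is not addressed here:

def find_triplet_within_k(S, k):
    # For every 1 at index i, look for a matching pair of 1s at the same
    # distance d <= k on both sides. Total cost is O(n*k).
    n = len(S)
    for i in range(n):
        if S[i] != '1':
            continue
        for d in range(1, k + 1):
            if i - d >= 0 and i + d < n and S[i - d] == '1' and S[i + d] == '1':
                return (i - d, i, i + d)
    return None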
I would really appreciate your comments here - I have been trying to find the relation between k and the number of 1's on my whiteboard, so far without success.
Assumption:
Just wrong - the assumption that the number of ones is bounded above by log(n) does not hold.
EDIT:
Now I found that, using Cantor numbers (if correct), the density of the set is (2/3)^(log_3(n)) (what a weird function), and I agree that a log(n)/n density is too strong.
If this is the upper limit, there is an algorithm that solves this problem in O(n*(3/2)^(log(n)/log(3))) time complexity and O((3/2)^(log(n)/log(3))) space complexity. (Check Justice's answer for the algorithm.)
This is still by far better than O(n^2).
This function ((3/2)^(log(n)/log(3))) really looks like n*log(n) at first sight.
How did I get this formula?
Applying Cantor numbers to the string.
Suppose the length of the string is 3^p == n.
At each step in the generation of the Cantor string you keep 2/3 of the previous number of ones. Apply this p times.
That means (n * (2/3)^p) -> ((3^p) * ((2/3)^p)) remaining ones, which after simplification is 2^p.
That means 2^p ones in a string of length 3^p -> a ratio of (3/2)^p. Substitute p = log(n)/log(3) and get
((3/2)^(log(n)/log(3)))
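As a sanity check of the 2^p count, here is a hedged Python sketch (the function name is mine) that builds the Cantor-like string of length 3^p, with 1s at positions whose base-3 representation contains no digit 1 (the same AP-free construction mentioned elsewhere on this page), and counts its ones:

def cantor_string(p):
    # 1s at positions whose ternary digits are only 0 and 2; such a set has
    # no three evenly spaced ones, and it contains exactly 2**p positions.
    n = 3 ** p
    def no_one_digit(x):
        while x:
            if x % 3 == 1:
                return False
            x //= 3
        return True
    return ''.join('1' if no_one_digit(i) else '0' for i in range(n))

# for p in range(1, 8):
#     s = cantor_string(p)
#     print(len(s), s.count('1'))   # expect 3**p and 2**p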
How about a simple O(n) solution, with O(n^2) space? (Uses the assumption that all bitwise operators work in O(1).)
The algorithm basically works in four stages:
Stage 1: For each bit in your original number, find out how far away the ones are, but consider only one direction. (I considered all the bits in the direction of the least significant bit.)
Stage 2: Reverse the order of the bits in the input;
Stage 3: Re-run step 1 on the reversed input.
Stage 4: Compare the results from Stage 1 and Stage 3. If any bits are equally spaced above AND below we must have a hit.
Keep in mind that no step in the above algorithm takes longer than O(n). ^_^
As an added benefit, this algorithm will find ALL equally spaced ones from EVERY number. So for example if you get a result of "0x0005" then there are equally spaced ones at BOTH 1 and 3 units away
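To show how the Stage 1 distance masks accumulate, here is a hedged Python illustration (names are mine) of the same shifted-accumulator idea used in the Distances() method below:

def distances(bits):
    # bits[0] is the least significant bit. For each position, record a mask
    # whose set bits are the distances to every 1 at less significant positions.
    dist = []
    mask = 0
    for b in bits:
        dist.append(mask)   # distances from this position to earlier 1s
        mask <<= 1          # every earlier 1 is now one step further away
        if b:
            mask |= 1       # this 1 will be at distance 1 from the next position
    return dist

# distances([1, 0, 1]) -> [0b0, 0b1, 0b10]: the bit at index 2 sees a 1 two steps below.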
I didn't really try optimizing the code below, but it is compilable C# code that seems to work.
using System;
namespace ThreeNumbers
{
class Program
{
const int uint32Length = 32;
static void Main(string[] args)
{
Console.Write("Please enter your integer: ");
uint input = UInt32.Parse(Console.ReadLine());
uint[] distancesLower = Distances(input);
uint[] distancesHigher = Distances(Reverse(input));
PrintHits(input, distancesLower, distancesHigher);
}
/// <summary>
/// Returns an array showing how far away the ones are from each bit in the input. Only
/// considers ones at lower significant bits. Index 0 represents the least significant bit
/// in the input. Index 1 represents the second least significant bit in the input and so
/// on. If a one is 3 away from the bit in question, then the third least significant bit
/// of the value will be set.
///
/// As programmed this algorithm needs: O(n) time, and O(n*log(n)) space.
/// (Where n is the number of bits in the input.)
/// </summary>
public static uint[] Distances(uint input)
{
uint[] distanceToOnes = new uint[uint32Length];
uint result = 0;
//Sets how far each bit is from other ones. Going in the direction of LSB to MSB
for (uint bitIndex = 1, arrayIndex = 0; bitIndex != 0; bitIndex <<= 1, ++arrayIndex)
{
distanceToOnes[arrayIndex] = result;
result <<= 1;
if ((input & bitIndex) != 0)
{
result |= 1;
}
}
return distanceToOnes;
}
/// <summary>
/// Reverses the bits in the input.
///
/// As programmed this algorithm needs O(n) time and O(n) space.
/// (Where n is the number of bits in the input.)
/// </summary>
/// <param name="input"></param>
/// <returns></returns>
public static uint Reverse(uint input)
{
uint reversedInput = 0;
for (uint bitIndex = 1; bitIndex != 0; bitIndex <<= 1)
{
reversedInput <<= 1;
reversedInput |= (uint)((input & bitIndex) != 0 ? 1 : 0);
}
return reversedInput;
}
/// <summary>
/// Goes through each bit in the input, to check if there are any bits equally far away in
/// the distancesLower and distancesHigher
/// </summary>
public static void PrintHits(uint input, uint[] distancesLower, uint[] distancesHigher)
{
const int offset = uint32Length - 1;
for (uint bitIndex = 1, arrayIndex = 0; bitIndex != 0; bitIndex <<= 1, ++arrayIndex)
{
//hits checks if any bits are equally spaced away from our current value
bool isBitSet = (input & bitIndex) != 0;
uint hits = distancesLower[arrayIndex] & distancesHigher[offset - arrayIndex];
if (isBitSet && (hits != 0))
{
Console.WriteLine(String.Format("The {0}-th LSB has hits 0x{1:x4} away", arrayIndex + 1, hits));
}
}
}
}
}
Someone will probably comment that for any sufficiently large number, bitwise operations cannot be done in O(1). You'd be right. However, I'd conjecture that every solution that uses addition, subtraction, multiplication, or division (which cannot be done by shifting) would also have that problem.
Below is a solution. There could be some little mistakes here and there, but the idea is sound.
Edit: It's not n * log(n)
PSEUDO CODE:
foreach character in the string
if the character equals 1 {
if length cache > 0 { //we can skip the first one
foreach location in the cache { //last in first out kind of order
if ((currentlocation + (currentlocation - location)) < length string)
if (string[(currentlocation + (currentlocation - location))] equals 1)
return found evenly spaced string
else
break;
}
}
remember the location of this character in some sort of cache.
}
return didn't find evenly spaced string
C# code:
public static Boolean FindThreeEvenlySpacedOnes(String str) {
List<int> cache = new List<int>();
for (var x = 0; x < str.Length; x++) {
if (str[x] == '1') {
if (cache.Count > 0) {
for (var i = cache.Count - 1; i >= 0; i--) {
if ((x + (x - cache[i])) >= str.Length)
break;
if (str[(x + (x - cache[i]))] == '1')
return true;
}
}
cache.Add(x);
}
}
return false;
}
How it works:
iteration 1:
x
|
101101001
// the location of this 1 is stored in the cache
iteration 2:
x
|
101101001
iteration 3:
a x b
| | |
101101001
//we retrieve location a out of the cache and then based on a
//we calculate b and check if the string contains a 1 at location b
//and of course we store x in the cache because it's a 1
iteration 4:
axb
|||
101101001
a x b
| | |
101101001
iteration 5:
x
|
101101001
iteration 6:
a x b
| | |
101101001
a x b
| | |
101101001
//return found evenly spaced string
Obviously we need to at least check bunches of triplets at the same time, so we need to compress the checks somehow. I have a candidate algorithm, but analyzing the time complexity is beyond my ability*time threshold.
Build a tree where each node has three children and each node contains the total number of 1's at its leaves. Build a linked list over the 1's, as well. Assign each node an allowed cost proportional to the range it covers. As long as the time we spend at each node is within budget, we'll have an O(n lg n) algorithm.
--
Start at the root. If the square of the total number of 1's below it is less than its allowed cost, apply the naive algorithm. Otherwise recurse on its children.
Now we have either returned within budget, or we know that there are no valid triplets entirely contained within one of the children. Therefore we must check the inter-node triplets.
Now things get incredibly messy. We essentially want to recurse on the potential sets of children while limiting the range. As soon as the range is constrained enough that the naive algorithm will run under budget, you do it. Enjoy implementing this, because I guarantee it will be tedious. There's like a dozen cases.
--
The reason I think that algorithm will work is because the sequences without valid triplets appear to alternate between bunches of 1's and lots of 0's. It effectively splits the nearby search space, and the tree emulates that splitting.
The run time of the algorithm is not obvious, at all. It relies on the non-trivial properties of the sequence. If the 1's are really sparse then the naive algorithm will work under budget. If the 1's are dense, then a match should be found right away. But if the density is 'just right' (e.g. near ~n^0.63, which you can achieve by setting all bits at positions with no '2' digit in base 3), I don't know if it will work. You would have to prove that the splitting effect is strong enough.
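To make the budget idea concrete, here is a hedged Python skeleton (names and the budget constant c are mine) of only the per-node part: each node covers a contiguous range, its allowed cost is proportional to the range length, and the naive pair check runs only when the squared number of 1's fits that budget. The messy inter-node cases described above are deliberately omitted, so this illustrates the recursion shape rather than being a complete solver.

def naive_check(ones, s):
    # Check all pairs of 1-positions for a third, equally spaced 1 to the right.
    for i in range(len(ones)):
        for j in range(i + 1, len(ones)):
            k = 2 * ones[j] - ones[i]
            if k < len(s) and s[k] == '1':
                return (ones[i], ones[j], k)
    return None

def budgeted_search(s, lo=0, hi=None, c=4.0):
    # Recurse on thirds; run the naive check on a node only when the squared
    # count of 1s fits within the node's budget c * (hi - lo).
    if hi is None:
        hi = len(s)
    ones = [i for i in range(lo, hi) if s[i] == '1']
    if len(ones) ** 2 <= c * (hi - lo) or hi - lo <= 3:
        return naive_check(ones, s)
    third = (hi - lo) // 3
    for a, b in ((lo, lo + third), (lo + third, lo + 2 * third), (lo + 2 * third, hi)):
        hit = budgeted_search(s, a, b, c)
        if hit:
            return hit
    return None  # a full implementation must now check the inter-node triplets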
No theoretical answer here, but I wrote a quick Java program to explore the running-time behavior as a function of k and n, where n is the total bit length and k is the number of 1's. I'm with a few of the answerers who say that the "regular" algorithm, which checks all pairs of bit positions and looks for the 3rd bit, is in reality O(n ln n): although it requires O(k^2) in the worst case, the worst case needs sparse bitstrings.
Anyway here's the program, below. It's a Monte-Carlo style program which runs a large number of trials NTRIALS for constant n, and randomly generates bitsets for a range of k-values using Bernoulli processes with ones-density constrained between limits that can be specified, and records the running time of finding or failing to find a triplet of evenly spaced ones, with time measured in steps, NOT in CPU time. I ran it for n=64, 256, 1024, 4096, 16384* (still running), first a test run with 500000 trials to see which k-values take the longest running time, then another test with 5000000 trials with a narrowed ones-density focus to see what those values look like. The longest running times do happen with very sparse density (e.g. for n=4096 the running-time peaks are in the k=16-64 range, with a gentle peak in mean runtime of 4212 steps at k=31, and max runtime peaking at 5101 steps at k=58). It looks like it would take extremely large values of N for the worst-case O(k^2) step to become larger than the O(n) step where you scan the bitstring to find the 1's position indices.
package com.example.math;
import java.io.PrintStream;
import java.util.BitSet;
import java.util.Random;
public class EvenlySpacedOnesTest {
static public class StatisticalSummary
{
private int n=0;
private double min=Double.POSITIVE_INFINITY;
private double max=Double.NEGATIVE_INFINITY;
private double mean=0;
private double S=0;
public StatisticalSummary() {}
public void add(double x) {
min = Math.min(min, x);
max = Math.max(max, x);
++n;
double newMean = mean + (x-mean)/n;
S += (x-newMean)*(x-mean);
// this algorithm for mean,std dev based on Knuth TAOCP vol 2
mean = newMean;
}
public double getMax() { return (n>0)?max:Double.NaN; }
public double getMin() { return (n>0)?min:Double.NaN; }
public int getCount() { return n; }
public double getMean() { return (n>0)?mean:Double.NaN; }
public double getStdDev() { return (n>0)?Math.sqrt(S/n):Double.NaN; }
// some may quibble and use n-1 for sample std dev vs population std dev
public static void printOut(PrintStream ps, StatisticalSummary[] statistics) {
for (int i = 0; i < statistics.length; ++i)
{
StatisticalSummary summary = statistics[i];
ps.printf("%d\t%d\t%.0f\t%.0f\t%.5f\t%.5f\n",
i,
summary.getCount(),
summary.getMin(),
summary.getMax(),
summary.getMean(),
summary.getStdDev());
}
}
}
public interface RandomBernoulliProcess // see http://en.wikipedia.org/wiki/Bernoulli_process
{
public void setProbability(double d);
public boolean getNextBoolean();
}
static public class Bernoulli implements RandomBernoulliProcess
{
final private Random r = new Random();
private double p = 0.5;
public boolean getNextBoolean() { return r.nextDouble() < p; }
public void setProbability(double d) { p = d; }
}
static public class TestResult {
final public int k;
final public int nsteps;
public TestResult(int k, int nsteps) { this.k=k; this.nsteps=nsteps; }
}
////////////
final private int n;
final private int ntrials;
final private double pmin;
final private double pmax;
final private Random random = new Random();
final private Bernoulli bernoulli = new Bernoulli();
final private BitSet bits;
public EvenlySpacedOnesTest(int n, int ntrials, double pmin, double pmax) {
this.n=n; this.ntrials=ntrials; this.pmin=pmin; this.pmax=pmax;
this.bits = new BitSet(n);
}
/*
* generate random bit string
*/
private int generateBits()
{
int k = 0; // # of 1's
for (int i = 0; i < n; ++i)
{
boolean b = bernoulli.getNextBoolean();
this.bits.set(i, b);
if (b) ++k;
}
return k;
}
private int findEvenlySpacedOnes(int k, int[] pos)
{
int[] bitPosition = new int[k];
for (int i = 0, j = 0; i < n; ++i)
{
if (this.bits.get(i))
{
bitPosition[j++] = i;
}
}
int nsteps = n; // first, it takes N operations to find the bit positions.
boolean found = false;
if (k >= 3) // don't bother doing anything if there are less than 3 ones. :(
{
int lastBitSetPosition = bitPosition[k-1];
for (int j1 = 0; !found && j1 < k; ++j1)
{
pos[0] = bitPosition[j1];
for (int j2 = j1+1; !found && j2 < k; ++j2)
{
pos[1] = bitPosition[j2];
++nsteps;
pos[2] = 2*pos[1]-pos[0];
// calculate 3rd bit index that might be set;
// the other two indices point to bits that are set
if (pos[2] > lastBitSetPosition)
break;
// loop inner loop until we go out of bounds
found = this.bits.get(pos[2]);
// we're done if we find a third 1!
}
}
}
if (!found)
pos[0]=-1;
return nsteps;
}
/*
* run an algorithm that finds evenly spaced ones and returns # of steps.
*/
public TestResult run()
{
bernoulli.setProbability(pmin + (pmax-pmin)*random.nextDouble());
// probability of bernoulli process is randomly distributed between pmin and pmax
// generate bit string.
int k = generateBits();
int[] pos = new int[3];
int nsteps = findEvenlySpacedOnes(k, pos);
return new TestResult(k, nsteps);
}
public static void main(String[] args)
{
int n;
int ntrials;
double pmin = 0, pmax = 1;
try {
n = Integer.parseInt(args[0]);
ntrials = Integer.parseInt(args[1]);
if (args.length >= 3)
pmin = Double.parseDouble(args[2]);
if (args.length >= 4)
pmax = Double.parseDouble(args[3]);
}
catch (Exception e)
{
System.out.println("usage: EvenlySpacedOnesTest N NTRIALS [pmin [pmax]]");
System.exit(0);
return; // make the compiler happy
}
final StatisticalSummary[] statistics;
statistics=new StatisticalSummary[n+1];
for (int i = 0; i <= n; ++i)
{
statistics[i] = new StatisticalSummary();
}
EvenlySpacedOnesTest test = new EvenlySpacedOnesTest(n, ntrials, pmin, pmax);
int printInterval=100000;
int nextPrint = printInterval;
for (int i = 0; i < ntrials; ++i)
{
TestResult result = test.run();
statistics[result.k].add(result.nsteps);
if (i == nextPrint)
{
System.err.println(i);
nextPrint += printInterval;
}
}
StatisticalSummary.printOut(System.out, statistics);
}
}
# <algorithm>
def contains_evenly_spaced?(input)
return false if input.size < 3
one_indices = []
input.each_with_index do |digit, index|
next if digit == 0
one_indices << index
end
return false if one_indices.size < 3
previous_indexes = []
one_indices.each do |index|
if !previous_indexes.empty?
previous_indexes.each do |previous_index|
multiple = index - previous_index
success_index = index + multiple
return true if input[success_index] == 1
end
end
previous_indexes << index
end
return false
end
# </algorithm>
def parse_input(input)
input.chars.map { |c| c.to_i }
end
I'm having trouble with the worst-case scenarios with millions of digits. Fuzzing from /dev/urandom essentially gives you O(n), but I know the worst case is worse than that. I just can't tell how much worse. For small n, it's trivial to find inputs at around 3*n*log(n), but it's surprisingly hard to differentiate those from some other order of growth for this particular problem.
Can anyone who was working on worst-case inputs generate a string with length greater than say, one hundred thousand?
An adaptation of the Rabin-Karp algorithm could be possible for you.
Its complexity is O(n), so it could help you.
Take a look http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
Could this be a solution? I'm not sure if it's O(n log n), but in my opinion it's better than O(n²), because the only way not to find a triple would be a prime number distribution.
There's room for improvement: the second 1 found could be the next first 1. Also, no error checking.
#include <iostream>
#include <string>
int findIt(std::string toCheck) {
for (int i=0; i<toCheck.length(); i++) {
if (toCheck[i]=='1') {
std::cout << i << ": " << toCheck[i];
for (int j = i+1; j<toCheck.length(); j++) {
if (toCheck[j]=='1' && (i+2*(j-i)) < toCheck.length() && toCheck[(i+2*(j-i))] == '1') {
std::cout << ", " << j << ":" << toCheck[j] << ", " << (i+2*(j-i)) << ":" << toCheck[(i+2*(j-i))] << " found" << std::endl;
return 0;
}
}
}
}
return -1;
}
int main (int agrc, char* args[]) {
std::string toCheck("1001011");
findIt(toCheck);
std::cin.get();
return 0;
}
I think this algorithm has O(n log n) complexity (C++, DevStudio 2k5). Now, I don't know the details of how to analyse an algorithm to determine its complexity, so I have added some metric gathering information to the code. The code counts the number of tests done on the sequence of 1's and 0's for any given input (hopefully, I've not made a balls of the algorithm). We can compare the actual number of tests against the O value and see if there's a correlation.
#include <iostream>
using namespace std;
bool HasEvenBits (string &sequence, int &num_compares)
{
bool
has_even_bits = false;
num_compares = 0;
for (unsigned i = 1 ; i <= (sequence.length () - 1) / 2 ; ++i)
{
for (unsigned j = 0 ; j < sequence.length () - 2 * i ; ++j)
{
++num_compares;
if (sequence [j] == '1' && sequence [j + i] == '1' && sequence [j + i * 2] == '1')
{
has_even_bits = true;
// we could 'break' here, but I want to know the worst case scenario so keep going to the end
}
}
}
return has_even_bits;
}
int main ()
{
int
count;
string
input = "111";
for (int i = 3 ; i < 32 ; ++i)
{
HasEvenBits (input, count);
cout << i << ", " << count << endl;
input += "0";
}
}
This program outputs the number of tests for each string length up to 32 characters. Here's the results:
n Tests n log (n)
=====================
3 1 1.43
4 2 2.41
5 4 3.49
6 6 4.67
7 9 5.92
8 12 7.22
9 16 8.59
10 20 10.00
11 25 11.46
12 30 12.95
13 36 14.48
14 42 16.05
15 49 17.64
16 56 19.27
17 64 20.92
18 72 22.59
19 81 24.30
20 90 26.02
21 100 27.77
22 110 29.53
23 121 31.32
24 132 33.13
25 144 34.95
26 156 36.79
27 169 38.65
28 182 40.52
29 196 42.41
30 210 44.31
31 225 46.23
I've added the 'n log n' values as well. Plot these using your graphing tool of choice to see a correlation between the two results. Does this analysis extend to all values of n? I don't know.

The least adding numbers - algorithm

I came across this problem online.
Given an integer N and an array int arr[], you have to add some
elements to the array so that every number from 1 to N can be
generated by adding (summing) elements of the array.
Please keep in mind that you can only use each element in the array once when generating a certain x (1<=x<=N). Return the minimum number of elements that need to be added.
For example:
N=6, arr = [1, 3]
1 is already in arr.
add 2 to the arr.
3 is already in arr
4 = 1 + 3
5 = 2 + 3
6 = 1 + 2 + 3
So we return 1 since we only need to add one element which is 2.
Can anyone give some hints?
N can always be made by adding a subset of the numbers 1 to N-1, except for N = 1 and N = 2. So a number X can always be made when the consecutive elements 1 to X-1 are already in the array.
Example -
arr[] = {1, 2, 5}, N = 9
ans := 0
1 is already present.
2 is already present.
3 is absent, but the prior elements 1 to (3 - 1) are present, so 3 is added to the array. But as 3 is built from already existing elements, the answer won't increase.
same rule for 4 and 5
So, ans is 0
arr[] = {3, 4}, for any N >= 2
ans = 2
arr[] = {1, 3}, for any N >= 2
ans = 1
So it seems that only when 1 or 2 is not present in the array do we have to add that element, regardless of whether the previous elements are already in the array or not. All later numbers can be made using previous elements. And when trying to make any number X (> 2), we will already have found the previous elements 1 to X-1 in the array, so X can always be made.
So basically we need to check whether 1 and 2 are present or not, and the answer to this problem won't be bigger than 2.
Constraint 2
In the above algorithm, we assume that when a new element X is not present in the array but can be made using already existing elements, the answer won't increase, but X is treated as available for building the next numbers. What if X can't be added to the array in that way?
Then basically it turns into a subset sum problem: for every missing number we have to check whether it can be made using some subset of elements in the array. It's a typical O(N^2) dynamic programming algorithm.
#include <vector>
#include <unordered_map>
using namespace std;

int subsetSum(vector<int>& arr, int N)
{
// The value of subset[i][j] will be true if there is a subset of arr[0..j-1]
// with sum equal to i
vector<vector<bool>> subset(N + 1, vector<bool>(arr.size() + 1));
// If sum is 0, then answer is true
for (int i = 0; i <= arr.size(); i++)
subset[0][i] = true;
// If sum is not 0 and set is empty, then answer is false
for (int i = 1; i <= N; i++)
subset[i][0] = false;
// Fill the subset table in bottom-up manner
for (int i = 1; i <= N; i++)
{
for (int j = 1; j <= arr.size(); j++)
{
subset[i][j] = subset[i][j - 1];
if (i >= arr[j - 1])
subset[i][j] = subset[i][j] || subset[i - arr[j - 1]][j - 1];
}
}
unordered_map<int, bool> exist;
for(int i = 0; i < arr.size(); ++i) {
exist[arr[i]] = true;
}
int ans = 0;
for(int i = 1; i <= N; ++i) {
if(!exist[i] or !subset[i][arr.size()]) {
ans++;
}
}
return ans;
}
Let A be the collection of input numbers.
Initialize a boolean array B to store in B[i] whether or not we can 'make' i by adding the numbers in A as described in the problem. Make all B[i] initially FALSE.
Then, pseudocode:
for i = 1 to N
if B[i] && (not A.Contains(i))
continue next i
if not A.Contains(i)
countAdded++
for j = N-i downTo 1
if B[j] then B[j+i] = TRUE
B[i] = TRUE
next i
Explanation:
Within the (main) loop (i): B contains TRUE for the values that we can construct with the values in A that are lower than i. Initially, therefore, with i=1 all B are FALSE.
Then, for each i we have two aspects to consider: (a) is B[i] already TRUE? If not we'll have to add i; (b) is i present in A? because, see previous remark, at this point we haven't yet processed that A-value. So, even if B[i] is already TRUE we'll have to flag TRUE for all (other) B that we may reach with i.
Consequently:
For each i we first determine if either of these two cases applies, and if not, we skip to the next i.
Then, if A does NOT (yet) contain i, it must be the case that B[i] is FALSE, see skip-condition, and therefore we'll add i (to A, conceptually, but it's not necessary to actually put it into A).
Next, either we had i in A initially, or we have just added it. In any case, we'll need to flag B TRUE for all values that can be constructed with this new i. To do so, we better scan existing B in downward fashion; otherwise we may add i to a "new" B-value that has i already as constituent.
Finally, B[i] itself is set TRUE (it may already be TRUE...), simply because i is in A (originally, or by adding).
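As a hedged illustration, the pseudocode above translates into the following runnable Python (names are mine):

def min_numbers_to_add(A, N):
    # B[i] is True once value i can be built from the numbers processed so far.
    A = set(A)
    B = [False] * (N + 1)
    count_added = 0
    for i in range(1, N + 1):
        if B[i] and i not in A:
            continue                     # already reachable and nothing new to process
        if i not in A:
            count_added += 1             # i must be added
        for j in range(N - i, 0, -1):    # downward scan so i is used at most once
            if B[j]:
                B[j + i] = True
        B[i] = True
    return count_added

# e.g. min_numbers_to_add([1, 3], 6) gives 1 (add 2), min_numbers_to_add([3, 4], 6) gives 2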
One way can be to make a set of all possible numbers that can be generated by the array. This can be done in O(n^2) time. Then check whether each number from 1 to N is present in the set, in O(1) per lookup. If a number is not present, add it to the count of numbers to add (initially zero) and build a new empty set: take all elements of the previous set, add the missing number to each of them, and put the results into the new set. Replace the original set with the union of the original and new sets. Doing this from 1 to N gives the count of numbers to add in O(n^3) time.
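A hedged Python sketch of this set-based idea (names are mine); it keeps a set of reachable sums and grows it whenever a missing value has to be added:

def min_numbers_to_add_sets(arr, N):
    sums = {0}
    for a in arr:                           # all sums generable from the initial array
        sums |= {s + a for s in sums}
    added = 0
    for x in range(1, N + 1):
        if x not in sums:
            added += 1
            sums |= {s + x for s in sums}   # new set = old sums shifted by x
    return added

# e.g. min_numbers_to_add_sets([1, 3], 6) -> 1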
Sort the array (NLogN)
Think this should work -
max_sum = 0
numbers_added = 0  # this will contain your final answer
for i in range(1, N+1):
    if i not in arr and i > max_sum:
        numbers_added += 1
        max_sum += i
    elif i < len(arr):
        max_sum += arr[i]
print(numbers_added)
For each number starting from 1 we may either
Have it in the arr. In such case we update the list of numbers we can make.
Don't have it in the arr but we can form it with existing numbers. We simply ignore it.
We don't have it in the arr and we cannot form it with existing numbers. We add it to the arr and update the list of numbers we can make.
For example:
N=10, arr = [1, 2, 6]
1 is already in arr.
2 is already in arr.
3 = 1 + 2
3 is not in the arr but we can already form 3.
4 is not present in arr and we cannot form 4 either with existing numbers.
So add 4 to the arr and update.
5 = 1 + 4
6 = 2 + 4
7 = 1 + 2 + 4
5 is not in arr but we can form 5.
6 is in array. So update
8 = 2 + 6
9 = 1 + 2 + 6
10 = 4 + 6
So we return 1 since we only need to add one element which is 4.
And the following might be an implementation:
int calc(bool arr[], bool can[], int N) {
// arr[i] is true if we already have number
// can[i] is true if we have been able to form number i
int count=0;
for(int i=1;i<=N;i++) {
if(arr[i]==false && can[i]==true) { // case 2: not in arr but can already be formed
continue;
} else if(arr[i]==false && can[i]==false) { // case 3
count++;
}
for(int j=N-i;j>=1;j--) { // update for case 1 and case 3
if(can[j]==true) can[i+j]=true;
}
can[i]=1;
}
return count;
}

Google Combinatorial Optimization interview problem

I got asked this question in an interview with Google a couple of weeks ago. I didn't quite get the answer, and I was wondering if anyone here could help me out.
You have an array with n elements. The elements are either 0 or 1.
You want to split the array into k contiguous subarrays. The size of each subarray can vary between ceil(n/2k) and floor(3n/2k). You can assume that k << n.
After you split the array into k subarrays, one element of each subarray will be randomly selected.
Devise an algorithm for maximizing the sum of the randomly selected elements from the k subarrays.
Basically this means that we want to split the array in such a way that the sum of the expected values of the elements selected from each subarray is maximized.
You can assume that n is a power of 2.
Example:
Array: [0,0,1,1,0,0,1,1,0,1,1,0]
n = 12
k = 3
Size of subarrays can be: 2,3,4,5,6
Possible subarrays [0,0,1] [1,0,0,1] [1,0,1,1,0]
Expected Value of the sum of the elements randomly selected from the subarrays: 1/3 + 2/4 + 3/5 = 43/30 ~ 1.4333333
Optimal split: [0,0,1,1,0,0][1,1][0,1,1,0]
Expected value of optimal split: 1/3 + 1 + 1/2 = 11/6 ~ 1.83333333
I think we can solve this problem using dynamic programming.
Basically, we have:
f(i,j) is defined as the maximum sum of all expected values chosen from an array of size i and split into j subarrays. Therefore the solution should be f(n,k).
The recursive equation is:
f(i,j) = f(i-x,j-1) + sum(i-x+1,i)/x where ceil(n/2k) <= x <= floor(3n/2k)
I don't know if this is still an open question or not, but it seems like the OP has managed to add enough clarifications that this should be straightforward to solve. At any rate, if I am understanding what you are saying this seems like a fair thing to ask in an interview environment for a software development position.
Here is the basic O(n^2 * k) solution, which should be adequate for small k (as the interviewer specified):
from math import ceil, floor  # integer-valued ceil/floor so range() gets ints

def best_val(arr, K):
    n = len(arr)
    psum = [ 0.0 ]
    for x in arr:
        psum.append(psum[-1] + x)
    tab = [ -100000 for i in range(n) ]
    tab.append(0)
    for k in range(K):
        for s in range(n - (k+1) * ceil(n/(2*K))):
            terms = range(s + ceil(n/(2*K)), min(s + floor((3*n)/(2*K)) + 1, n+1))
            tab[s] = max( [ (psum[t] - psum[s]) / (t - s) + tab[t] for t in terms ])
    return tab[0]
I used the ceil/floor functions (math.ceil/math.floor in the version above, so that range() gets integers) but you basically get the idea. The only 'tricks' in this version are that it does windowing to reduce the memory overhead to just O(n) instead of O(n * k), and that it precalculates the partial sums to make computing the expected value for a box a constant-time operation (thus saving a factor of O(n) from the inner loop).
I don't know if anyone is still interested in seeing the solution for this problem. I just stumbled upon this question half an hour ago and thought of posting my solution (Java). The complexity for this is O(n*K^log10). The proof is a little convoluted, so I would rather provide runtime numbers:
n k time(ms)
48 4 25
48 8 265
24 4 20
24 8 33
96 4 51
192 4 143
192 8 343919
The solution is the same old recursive one: given an array, choose the first partition of size ceil(n/2k) and find the best solution recursively for the rest with the number of partitions = k-1, then take a first partition of size ceil(n/2k) + 1, and so on.
Code:
public class PartitionOptimization {
public static void main(String[] args) {
PartitionOptimization p = new PartitionOptimization();
int[] input = { 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0};
int splitNum = 3;
int lowerLim = (int) Math.ceil(input.length / (2.0 * splitNum));
int upperLim = (int) Math.floor((3.0 * input.length) / (2.0 * splitNum));
System.out.println(input.length + " " + lowerLim + " " + upperLim + " " +
splitNum);
Date currDate = new Date();
System.out.println(currDate);
System.out.println(p.getMaxPartExpt(input, lowerLim, upperLim,
splitNum, 0));
System.out.println(new Date().getTime() - currDate.getTime());
}
public double getMaxPartExpt(int[] input, int lowerLim, int upperLim,
int splitNum, int startIndex) {
if (splitNum <= 1 && startIndex<=(input.length -lowerLim+1)){
double expt = findExpectation(input, startIndex, input.length-1);
return expt;
}
if (!((input.length - startIndex) / lowerLim >= splitNum))
return -1;
double maxExpt = 0;
double curMax = 0;
int bestI=0;
for (int i = startIndex + lowerLim - 1; i < Math.min(startIndex
+ upperLim, input.length); i++) {
double curExpect = findExpectation(input, startIndex, i);
double splitExpect = getMaxPartExpt(input, lowerLim, upperLim,
splitNum - 1, i + 1);
if (splitExpect>=0 && (curExpect + splitExpect > maxExpt)){
bestI = i;
curMax = curExpect;
maxExpt = curExpect + splitExpect;
}
}
return maxExpt;
}
public double findExpectation(int[] input, int startIndex, int endIndex) {
double expectation = 0;
for (int i = startIndex; i <= endIndex; i++) {
expectation = expectation + input[i];
}
expectation = (expectation / (endIndex - startIndex + 1));
return expectation;
}
}
Not sure I understand; the algorithm is to split the array into groups, right? The maximum value the sum can have is the number of ones. So split the array into "n" groups of 1 element each and the sum will be the maximum value possible. But it must be something else and I did not understand the problem; that seems too silly.
I think this can be solved with dynamic programming. At each possible split location, get the maximum sum if you split at that location and if you don't split at that point. A recursive function and a table to store history might be useful.
sum_i = max{ NumOnesNewPart/NumZerosNewPart * sum(NewPart) + sum(A_i+1, A_end),
sum(A_0,A_i+1) + sum(A_i+1, A_end)
}
This might lead to something...
I think it's a bad interview question, but it is also an easy problem to solve.
Every integer contributes to the expected value with weight 1/s where s is the size of the set where it has been placed. Therefore, if you guess the sizes of the sets in your partition, you just need to fill the sets with ones starting from the smallest set, and then fill the remaining largest set with zeroes.
You can easily see then that if you have a partition, filled as above, where the sizes of the sets are S_1, ..., S_k and you do a transformation where you remove one item from set S_i and move it to set S_i+1, you have the following cases:
Both S_i and S_i+1 were filled with ones; then the expected value does not change
Both them were filled with zeroes; then the expected value does not change
S_i contained both 1's and 0's and S_i+1 contains only zeroes; moving 0 to S_i+1 increases the expected value because the expected value of S_i increases
S_i contained 1's and S_i+1 contains both 1's and 0's; moving 1 to S_i+1 increases the expected value because the expected value of S_i+1 increases and S_i remains intact
In all these cases, you can shift an element from S_i to S_i+1, maintaining the filling rule of filling smallest sets with 1's, so that the expected value increases. This leads to the simple algorithm:
Create a partitioning where there is a maximal number of maximum-size arrays and maximal number of minimum-size arrays
Fill the arrays starting from smallest one with 1's
Fill the remaining slots with 0's
How about a recursive function:
int BestValue(Array A, int numSplits)
// Returns the best value that would be obtained by splitting
// into numSplits partitions.
This in turn uses a helper:
// The additional argument is an array of the valid split sizes which
// is the same for each call.
int BestValueHelper(Array A, int numSplits, Array splitSizes)
{
int result = 0;
for splitSize in splitSizes
int splitResult = ExpectedValue(A, 0, splitSize) +
BestValueHelper(A+splitSize, numSplits-1, splitSizes);
if splitResult > result
result = splitResult;
return result;
}
ExpectedValue(Array A, int l, int m) computes the expected value of a split of A that goes from l to m i.e. (A[l] + A[l+1] + ... A[m]) / (m-l+1).
BestValue calls BestValueHelper after computing the array of valid split sizes between ceil(n/2k) and floor(3n/2k).
I have omitted error handling and some end conditions but those should not be too difficult to add.
Let
a[] = given array of length n
from = inclusive index of array a
k = number of required splits
minSize = minimum size of a split
maxSize = maximum size of a split
d = maxSize - minSize
expectation(a, from, to) = average of all elements of array a from "from" to "to"
Optimal(a[], from, k) = MAX[ for (j >= minSize-1 to <= maxSize-1) { expectation(a, from, from+j) + Optimal(a, from+j+1, k-1) } ]
Runtime (assuming memoization or dp) = O(n*k*d)
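As a hedged illustration of the recursion above, here is a memoized Python sketch (names are mine); it assumes the whole array must be consumed by exactly k contiguous parts with sizes in [minSize, maxSize]:

from functools import lru_cache
from math import ceil, floor

def max_expected_sum(a, k):
    n = len(a)
    minSize = ceil(n / (2 * k))
    maxSize = floor(3 * n / (2 * k))
    prefix = [0] * (n + 1)
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x

    @lru_cache(maxsize=None)
    def optimal(frm, parts):
        if parts == 0:
            return 0.0 if frm == n else float('-inf')   # must consume the whole array
        best = float('-inf')
        for size in range(minSize, maxSize + 1):
            to = frm + size
            if to > n:
                break
            exp_val = (prefix[to] - prefix[frm]) / size  # expected value of this part
            best = max(best, exp_val + optimal(to, parts - 1))
        return best

    return optimal(0, k)

# e.g. max_expected_sum([0,0,1,1,0,0,1,1,0,1,1,0], 3) gives 11/6 ≈ 1.8333 (the optimal split above)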

O(nlogn) Algorithm - Find three evenly spaced ones within binary string

I had this question on an Algorithms test yesterday, and I can't figure out the answer. It is driving me absolutely crazy, because it was worth about 40 points. I figure that most of the class didn't solve it correctly, because I haven't come up with a solution in the past 24 hours.
Given an arbitrary binary string of length n, find three evenly spaced ones within the string if they exist. Write an algorithm which solves this in O(n * log(n)) time.
So strings like these have three ones that are "evenly spaced": 11100000, 0100100100
edit: It is a random number, so it should be able to work for any number. The examples I gave were to illustrate the "evenly spaced" property. So 1001011 is a valid number. With 1, 4, and 7 being ones that are evenly spaced.
Finally! Following up leads in sdcvvc's answer, we have it: the O(n log n) algorithm for the problem! It is simple too, after you understand it. Those who guessed FFT were right.
The problem: we are given a binary string S of length n, and we want to find three evenly spaced 1s in it. For example, S may be 110110010, where n=9. It has evenly spaced 1s at positions 2, 5, and 8.
Scan S left to right, and make a list L of positions of 1. For the S=110110010 above, we have the list L = [1, 2, 4, 5, 8]. This step is O(n). The problem is now to find an arithmetic progression of length 3 in L, i.e. to find distinct a, b, c in L such that b-a = c-b, or equivalently a+c=2b. For the example above, we want to find the progression (2, 5, 8).
Make a polynomial p with terms x^k for each k in L. For the example above, we make the polynomial p(x) = (x + x^2 + x^4 + x^5 + x^8). This step is O(n).
Find the polynomial q = p^2, using the Fast Fourier Transform. For the example above, we get the polynomial q(x) = x^16 + 2x^13 + 2x^12 + 3x^10 + 4x^9 + x^8 + 2x^7 + 4x^6 + 2x^5 + x^4 + 2x^3 + x^2. This step is O(n log n).
Ignore all terms except those corresponding to x^(2k) for some k in L. For the example above, we get the terms x^16, 3x^10, x^8, x^4, x^2. This step is O(n), if you choose to do it at all.
Here's the crucial point: the coefficient of any x^(2b) for b in L is precisely the number of pairs (a,c) in L such that a+c=2b. [CLRS, Ex. 30.1-7] One such pair is (b,b) always (so the coefficient is at least 1), but if there exists any other pair (a,c), then the coefficient is at least 3, from (a,c) and (c,a). For the example above, we have the coefficient of x^10 to be 3 precisely because of the AP (2,5,8). (These coefficients x^(2b) will always be odd numbers, for the reasons above. And all other coefficients in q will always be even.)
So then, the algorithm is to look at the coefficients of these terms x^(2b), and see if any of them is greater than 1. If there is none, then there are no evenly spaced 1s. If there is a b in L for which the coefficient of x^(2b) is greater than 1, then we know that there is some pair (a,c) — other than (b,b) — for which a+c=2b. To find the actual pair, we simply try each a in L (the corresponding c would be 2b-a) and see if there is a 1 at position 2b-a in S. This step is O(n).
That's all, folks.
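For concreteness, here is a hedged Python/NumPy sketch of the steps above (the function name and 0-based indexing are mine); it squares the 0/1 coefficient vector with an FFT and then recovers an actual triple from any coefficient of x^(2b) that exceeds 1:

import numpy as np

def find_evenly_spaced_ones(s):
    n = len(s)
    L = [i for i, ch in enumerate(s) if ch == '1']   # positions of 1s, O(n)
    if len(L) < 3:
        return None
    p = np.zeros(n)                                   # coefficients of p(x)
    p[L] = 1.0
    size = 1
    while size < 2 * n - 1:                           # q = p^2 has degree 2n-2
        size *= 2
    fp = np.fft.rfft(p, size)
    q = np.rint(np.fft.irfft(fp * fp, size)).astype(np.int64)
    for b in L:
        # q[2b] counts pairs (a, c) in L with a + c = 2b, including (b, b);
        # anything greater than 1 means a genuine progression exists.
        if q[2 * b] > 1:
            for a in L:
                c = 2 * b - a
                if a != b and 0 <= c < n and s[c] == '1':
                    return (min(a, c), b, max(a, c))
    return None

# print(find_evenly_spaced_ones("110110010"))  # -> (1, 4, 7) in 0-based indexing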
One might ask: do we need to use FFT? Many answers, such as beta's, flybywire's, and rsp's, suggest that the approach that checks each pair of 1s and sees if there is a 1 at the "third" position, might work in O(n log n), based on the intuition that if there are too many 1s, we would find a triple easily, and if there are too few 1s, checking all pairs takes little time. Unfortunately, while this intuition is correct and the simple approach is better than O(n^2), it is not significantly better. As in sdcvvc's answer, we can take the "Cantor-like set" of strings of length n=3^k, with 1s at the positions whose ternary representation has only 0s and 2s (no 1s) in it. Such a string has 2^k = n^((log 2)/(log 3)) ≈ n^0.63 ones in it and no evenly spaced 1s, so checking all pairs would be of the order of the square of the number of 1s in it: that's 4^k ≈ n^1.26 which unfortunately is asymptotically much larger than (n log n). In fact, the worst case is even worse: Leo Moser in 1953 constructed (effectively) such strings which have n^(1-c/√(log n)) 1s in them but no evenly spaced 1s, which means that on such strings, the simple approach would take Θ(n^(2-2c/√(log n))) — only a tiny bit better than Θ(n^2), surprisingly!
About the maximum number of 1s in a string of length n with no 3 evenly spaced ones (which we saw above was at least n^0.63 from the easy Cantor-like construction, and at least n^(1-c/√(log n)) with Moser's construction) — this is OEIS A003002. It can also be calculated directly from OEIS A065825 as the k such that A065825(k) ≤ n < A065825(k+1). I wrote a program to find these, and it turns out that the greedy algorithm does not give the longest such string. For example, for n=9, we can get 5 1s (110100011) but the greedy gives only 4 (110110000), for n=26 we can get 11 1s (11001010001000010110001101) but the greedy gives only 8 (11011000011011000000000000), and for n=74 we can get 22 1s (11000010110001000001011010001000000000000000010001011010000010001101000011) but the greedy gives only 16 (11011000011011000000000000011011000011011000000000000000000000000000000000). They do agree at quite a few places until 50 (e.g. all of 38 to 50), though. As the OEIS references say, it seems that Jaroslaw Wroblewski is interested in this question, and he maintains a website on these non-averaging sets. The exact numbers are known only up to 194.
Your problem is called AVERAGE in this paper (1999):
A problem is 3SUM-hard if there is a sub-quadratic reduction from the problem 3SUM: Given a set A of n integers, are there elements a,b,c in A such that a+b+c = 0? It is not known whether AVERAGE is 3SUM-hard. However, there is a simple linear-time reduction from AVERAGE to 3SUM, whose description we omit.
Wikipedia:
When the integers are in the range [−u ... u], 3SUM can be solved in time O(n + u lg u) by representing S as a bit vector and performing a convolution using FFT.
This is enough to solve your problem :).
What is very important is that O(n log n) is the complexity in terms of the number of zeroes and ones, not the count of ones (which could be given as an array, like [1,5,9,15]). Checking whether a set has an arithmetic progression, in terms of the number of 1's, is hard, and according to that paper, as of 1999 no faster algorithm than O(n^2) is known, and it is conjectured that none exists. Everybody who doesn't take this into account is attempting to solve an open problem.
Other interesting info, mostly irrelevant:
Lower bound:
An easy lower bound is the Cantor-like set (numbers 1..3^n-1 not containing 1 in their ternary expansion) - its density is n^(log_3 2) (circa 0.631). So just checking whether the set is too large, and then checking all pairs, is not enough to get O(n log n). You have to investigate the sequence smarter. A better lower bound is quoted here - it's n^(1-c/(log(n))^(1/2)). This means the Cantor set is not optimal.
Upper bound - my old algorithm:
It is known that for large n, a subset of {1,2,...,n} not containing an arithmetic progression has at most n/(log n)^(1/20) elements. The paper On triples in arithmetic progression proves more: the set cannot contain more than n * 228 * (log log n / log n)^(1/2) elements. So you could check whether that bound is achieved and, if not, naively check pairs. This is an O(n^2 * log log n / log n) algorithm, faster than O(n^2). Unfortunately "On triples..." is on Springer - but the first page is available, and Ben Green's exposition is available here, page 28, theorem 24.
By the way, the papers are from 1999 - the same year as the first one I mentioned, so that's probably why the first one doesn't mention that result.
This is not a solution, but a similar line of thought to what Olexiy was thinking
I was playing around with creating sequences with a maximum number of ones, and they are all quite interesting. I got up to 125 digits, and here are the first 3 numbers it found by attempting to insert as many '1' bits as possible:
11011000011011000000000000001101100001101100000000000000000000000000000000000000000110110000110110000000000000011011000011011
10110100010110100000000000010110100010110100000000000000000000000000000000000000000101101000101101000000000000101101000101101
10011001010011001000000000010011001010011001000000000000000000000000000000000000010011001010011001000000000010011001010011001
Notice they are all fractals (not too surprising given the constraints). There may be something in thinking backwards; perhaps if the string is not a fractal with such a characteristic, then it must have a repeating pattern?
Thanks to beta for the better term to describe these numbers.
Update:
Alas it looks like the pattern breaks down when starting with a large enough initial string, such as: 10000000000001:
100000000000011
10000000000001101
100000000000011011
10000000000001101100001
100000000000011011000011
10000000000001101100001101
100000000000011011000011010000000001
100000000000011011000011010000000001001
1000000000000110110000110100000000010011
1000000000000110110000110100000000010011001
10000000000001101100001101000000000100110010000000001
10000000000001101100001101000000000100110010000000001000001
1000000000000110110000110100000000010011001000000000100000100000000000001
10000000000001101100001101000000000100110010000000001000001000000000000011
1000000000000110110000110100000000010011001000000000100000100000000000001101
100000000000011011000011010000000001001100100000000010000010000000000000110100001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001100100000000100100000000000010000000010000100000100100010010000010000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000110000010000000000000000000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001100100000000100100000000000010000000010000100000100100010010000010000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000110000010000000000000000000001001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001000000000000000000000100100000000000000000000000000000000000011
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001000000000000000000000100100000000000000000000000000000000000011001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001001
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001001000001
100000000000011011000011010000000001001100100000000010000010000000000000110100001001000001000000110001000000001000000000000000000000000000000000000000010000001000000000000001100000000110010000000010010000000000001000000001000010000010010001001000001000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000011000001000000000000000000000100100000000000000000000000000000000000011001000000000000000000000010010000010000001
1000000000000110110000110100000000010011001000000000100000100000000000001101000010010000010000001100010000000010000000000000000000000000000000000000000100000010000000000000011000000001100100000000100100000000000010000000010000100000100100010010000010000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000110000010000000000000000000001001000000000000000000000000000000000000110010000000000000000000000100100000100000011
10000000000001101100001101000000000100110010000000001000001000000000000011010000100100000100000011000100000000100000000000000000000000000000000000000001000000100000000000000110000000011001000000001001000000000000100000000100001000001001000100100000100000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000001100000100000000000000000000010010000000000000000000000000000000000001100100000000000000000000001001000001000000110000000000001
I suspect that a simple approach that looks like O(n^2) will actually yield something better, like O(n ln(n)). The sequences that take the longest to test (for any given n) are the ones that contain no trios, and that puts severe restrictions on the number of 1's that can be in the sequence.
I've come up with some hand-waving arguments, but I haven't been able to find a tidy proof. I'm going to take a stab in the dark: the answer is a very clever idea that the professor has known for so long that it's come to seem obvious, but it's much too hard for the students. (Either that or you slept through the lecture that covered it.)
Revision: 2009-10-17 23:00
I've run this on large numbers (like, strings of 20 million) and I now believe this algorithm is not O(n log n). Notwithstanding that, it's a cool enough implementation and contains a number of optimizations that make it run really fast. It evaluates all the arrangements of binary strings of 24 or fewer digits in under 25 seconds.
I've updated the code to include the 0 <= L < M < U <= X-1 observation from earlier today.
Original
This is, in concept, similar to another question I answered. That code also looked at three values in a series and determined if a triplet satisfied a condition. Here is C# code adapted from that:
using System;
using System.Collections.Generic;
namespace StackOverflow1560523
{
class Program
{
public struct Pair<T>
{
public T Low, High;
}
static bool FindCandidate(int candidate,
List<int> arr,
List<int> pool,
Pair<int> pair,
ref int iterations)
{
int lower = pair.Low, upper = pair.High;
while ((lower >= 0) && (upper < pool.Count))
{
int lowRange = candidate - arr[pool[lower]];
int highRange = arr[pool[upper]] - candidate;
iterations++;
if (lowRange < highRange)
lower -= 1;
else if (lowRange > highRange)
upper += 1;
else
return true;
}
return false;
}
static List<int> BuildOnesArray(string s)
{
List<int> arr = new List<int>();
for (int i = 0; i < s.Length; i++)
if (s[i] == '1')
arr.Add(i);
return arr;
}
static void BuildIndexes(List<int> arr,
ref List<int> even, ref List<int> odd,
ref List<Pair<int>> evenIndex, ref List<Pair<int>> oddIndex)
{
for (int i = 0; i < arr.Count; i++)
{
bool isEven = (arr[i] & 1) == 0;
if (isEven)
{
evenIndex.Add(new Pair<int> {Low=even.Count-1, High=even.Count+1});
oddIndex.Add(new Pair<int> {Low=odd.Count-1, High=odd.Count});
even.Add(i);
}
else
{
oddIndex.Add(new Pair<int> {Low=odd.Count-1, High=odd.Count+1});
evenIndex.Add(new Pair<int> {Low=even.Count-1, High=even.Count});
odd.Add(i);
}
}
}
static int FindSpacedOnes(string s)
{
// List of indexes of 1s in the string
List<int> arr = BuildOnesArray(s);
//if (s.Length < 3)
// return 0;
// List of indexes to odd indexes in arr
List<int> odd = new List<int>(), even = new List<int>();
// evenIndex has indexes into arr to bracket even numbers
// oddIndex has indexes into arr to bracket odd numbers
List<Pair<int>> evenIndex = new List<Pair<int>>(),
oddIndex = new List<Pair<int>>();
BuildIndexes(arr,
ref even, ref odd,
ref evenIndex, ref oddIndex);
int iterations = 0;
for (int i = 1; i < arr.Count-1; i++)
{
int target = arr[i];
bool found = FindCandidate(target, arr, odd, oddIndex[i], ref iterations) ||
FindCandidate(target, arr, even, evenIndex[i], ref iterations);
if (found)
return iterations;
}
return iterations;
}
static IEnumerable<string> PowerSet(int n)
{
for (long i = (1L << (n-1)); i < (1L << n); i++)
{
yield return Convert.ToString(i, 2).PadLeft(n, '0');
}
}
static void Main(string[] args)
{
for (int i = 5; i < 64; i++)
{
int c = 0;
string hardest_string = "";
foreach (string s in PowerSet(i))
{
int cost = FindSpacedOnes(s);
if (cost > c)
{
hardest_string = s;
c = cost;
Console.Write("{0} {1} {2}\r", i, c, hardest_string);
}
}
Console.WriteLine("{0} {1} {2}", i, c, hardest_string);
}
}
}
}
The principal differences are:
Exhaustive search of solutions
This code generates a power set of data to find the hardest input to solve for this algorithm.
All solutions versus hardest to solve
The code for the previous question generated all the solutions using a python generator. This code just displays the hardest for each pattern length.
Scoring algorithm
This code checks the distance from the middle element to its left- and right-hand edge. The python code tested whether a sum was above or below 0.
Convergence on a candidate
The current code works from the middle towards the edge to find a candidate. The code in the previous problem worked from the edges towards the middle. This last change gives a large performance improvement.
Use of even and odd pools
Based on the observations at the end of this write-up, the code searches pairs of even numbers or pairs of odd numbers to find L and U, keeping M fixed. This reduces the number of searches by pre-computing information. Accordingly, the code uses two levels of indirection in the main loop of FindCandidate and requires two calls to FindCandidate for each middle element: once for even numbers and once for odd ones.
The general idea is to work on indexes, not the raw representation of the data. Calculating an array where the 1's appear allows the algorithm to run in time proportional to the number of 1's in the data rather than in time proportional to the length of the data. This is a standard transformation: create a data structure that allows faster operation while keeping the problem equivalent.
The results are out of date: removed.
Edit: 2009-10-16 18:48
On yx's data, which is given some credence in the other responses as representative of hard data to calculate on, I get these results... I removed these. They are out of date.
I would point out that this data is not the hardest for my algorithm, so I think the assumption that yx's fractals are the hardest to solve is mistaken. The worst case for a particular algorithm, I expect, will depend upon the algorithm itself and will not likely be consistent across different algorithms.
Edit: 2009-10-17 13:30
Further observations on this.
First, convert the string of 0's and 1's into an array of indexes for each position of the 1's. Say the length of that array A is X. Then the goal is to find
0 <= L < M < U <= X-1
such that
A[M] - A[L] = A[U] - A[M]
or
2*A[M] = A[L] + A[U]
Since A[L] and A[U] sum to an even number, they can't be (even, odd) or (odd, even). The search for a match could be improved by splitting A[] into odd and even pools and searching for matches on A[M] in the pools of odd and even candidates in turn.
However, this is more of a performance optimization than an algorithmic improvement, I think. The number of comparisons should drop, but the order of the algorithm should be the same.
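As a rough Python sketch of that split (the function name is mine, not taken from the code above): build the index array A, put the positions into even and odd pools, and for each candidate middle A[M] look for a left element whose mirror position 2*A[M] - A[L] is also a 1.

def find_triple_with_parity_pools(s):
    A = [i for i, c in enumerate(s) if c == '1']   # positions of the 1's
    ones = set(A)
    even = [a for a in A if a % 2 == 0]
    odd = [a for a in A if a % 2 == 1]
    for m in A:                       # candidate middle position A[M]
        for pool in (even, odd):      # A[L] and A[U] must share a parity
            for l in pool:
                if l >= m:
                    break
                u = 2 * m - l         # mirror position forced by 2*A[M] = A[L] + A[U]
                if u in ones:
                    return l, m, u
    return None

As noted above, this is a constant-factor saving, not a change in the order of the algorithm.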
Edit 2009-10-18 00:45
Yet another optimization occurs to me, in the same vein as separating the candidates into even and odd. Since the three indexes have to add to a multiple of 3 (a, a+x, a+2x -- mod 3 is 0, regardless of a and x), you can separate L, M, and U into their mod 3 values:
M   L   U
0   0   0
    1   2
    2   1
1   0   2
    1   1
    2   0
2   0   1
    1   0
    2   2
In fact, you could combine this with the even/odd observation and separate them into their mod 6 values:
M   L   U
0   0   0
    1   5
    2   4
    3   3
    4   2
    5   1
and so on. This would provide a further performance optimization but not an algorithmic speedup.
Wasn't able to come up with the solution yet :(, but have some ideas.
What if we start from the reverse problem: construct a sequence with the maximum number of 1s and WITHOUT any evenly spaced trio. If you can prove the maximum number of 1s is o(n), then you can improve your estimate by iterating only over the list of 1s.
This may help....
This problem reduces to the following:
Given a sequence of positive integers, find a contiguous subsequence partitioned into a prefix and a suffix such that the sum of the prefix of the subsequence is equal to the sum of the suffix of the subsequence.
For example, given a sequence of [ 3, 5, 1, 3, 6, 5, 2, 2, 3, 5, 6, 4 ], we would find a subsequence of [ 3, 6, 5, 2, 2] with a prefix of [ 3, 6 ] with prefix sum of 9 and a suffix of [ 5, 2, 2 ] with suffix sum of 9.
The reduction is as follows:
Given a sequence of zeros and ones, and starting at the leftmost one, continue moving to the right. Each time another one is encountered, record the number of moves since the previous one was encountered and append that number to the resulting sequence.
For example, given a sequence of [ 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0 ], we would find the reduction of [ 1, 3, 4 ]. From this reduction, we calculate the contiguous subsequence of [ 1, 3, 4 ], the prefix of [ 1, 3 ] with sum of 4, and the suffix of [ 4 ] with sum of 4.
This reduction may be computed in O(n).
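A minimal sketch of that reduction in Python (illustrative only):

def gaps_between_ones(bits):
    # positions of the 1's, then distances between consecutive 1's
    ones = [i for i, b in enumerate(bits) if b == 1]
    return [b - a for a, b in zip(ones, ones[1:])]

# Example from above:
# gaps_between_ones([0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0]) == [1, 3, 4]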
Unfortunately, I am not sure where to go from here.
For the simple problem type (i.e. you search for three "1"s with only "0"s, zero or more, between them), it's quite simple: you could just split the sequence at every "1" and look for two adjacent subsequences having the same length (the second subsequence not being the last one, of course). Obviously, this can be done in O(n) time.
For the more complex version (i.e. you search for an index i and a gap g>0 such that s[i]==s[i+g]==s[i+2*g]=="1"), I'm not sure if there exists an O(n log n) solution, since there are possibly O(n²) triplets with this property (think of a string of all ones; there are approximately n²/2 such triplets). Of course, you are looking for only one of these, but I currently have no idea how to find it ...
A fun question, but once you realise that the actual pattern between two '1's does not matter, the algorithm becomes:
scan for a '1'
starting from the next position, scan for another '1' (up to the end of the array minus the distance from the current first '1', or else the 3rd '1' would be out of bounds)
if at the position of the 2nd '1' plus the distance to the first '1' a third '1' is found, we have evenly spaced ones.
In code, JTest fashion (note this code isn't written to be most efficient; I added some println's to see what happens):
import java.util.Random;
import junit.framework.TestCase;
public class AlgorithmTest extends TestCase {
/**
* Constructor for GetNumberTest.
*
* @param name The test's name.
*/
public AlgorithmTest(String name) {
super(name);
}
/**
* @see TestCase#setUp()
*/
protected void setUp() throws Exception {
super.setUp();
}
/**
* @see TestCase#tearDown()
*/
protected void tearDown() throws Exception {
super.tearDown();
}
/**
* Tests the algorithm.
*/
public void testEvenlySpacedOnes() {
assertFalse(isEvenlySpaced(1));
assertFalse(isEvenlySpaced(0x058003));
assertTrue(isEvenlySpaced(0x07001));
assertTrue(isEvenlySpaced(0x01007));
assertTrue(isEvenlySpaced(0x101010));
// some fun tests
Random random = new Random();
isEvenlySpaced(random.nextLong());
isEvenlySpaced(random.nextLong());
isEvenlySpaced(random.nextLong());
}
/**
* @param testBits
*/
private boolean isEvenlySpaced(long testBits) {
String testString = Long.toBinaryString(testBits);
char[] ones = testString.toCharArray();
final char ONE = '1';
for (int n = 0; n < ones.length - 1; n++) {
if (ONE == ones[n]) {
for (int m = n + 1; m < ones.length - m + n; m++) {
if (ONE == ones[m] && ONE == ones[m + m - n]) {
System.out.println(" IS evenly spaced: " + testBits + '=' + testString);
System.out.println(" at: " + n + ", " + m + ", " + (m + m - n));
return true;
}
}
}
}
System.out.println("NOT evenly spaced: " + testBits + '=' + testString);
return false;
}
}
I thought of a divide-and-conquer approach that might work.
First, in preprocessing you need to insert all numbers less than one half your input size (n/3) into a list.
Given a string: 1000010101000100 (note that this particular example is valid)
Insert all primes (and 1) from 1 to (16/2) into a list: {1, 2, 3, 4, 5, 6, 7}
Then divide it in half:
10000101 01000100
Keep doing this until you get to strings of size 1. For all size-one strings with a 1 in them, add the index of the string to the list of possibilities; otherwise, return -1 for failure.
You'll also need to return a list of still-possible spacing distances, associated with each starting index. (Start with the list you made above and remove numbers as you go) Here, an empty list means you're only dealing with one 1 and so any spacing is possible at this point; otherwise the list includes spacings that must be ruled out.
So continuing with the example above:
1000 0101 0100 0100
10 00 01 01 01 00 01 00
1 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0
In the first combine step, we have eight sets of two now. In the first, we have the possibility of a set, but we learn that spacing by 1 is impossible because of the other zero being there. So we return 0 (for the index) and {2,3,4,5,7} for the fact that spacing by 1 is impossible. In the second, we have nothing and so return -1. In the third we have a match with no spacings eliminated in index 5, so return 5, {1,2,3,4,5,7}. In the fourth pair we return 7, {1,2,3,4,5,7}. In the fifth, return 9, {1,2,3,4,5,7}. In the sixth, return -1. In the seventh, return 13, {1,2,3,4,5,7}. In the eighth, return -1.
Combining again into four sets of four, we have:
1000: Return (0, {4,5,6,7})
0101: Return (5, {2,3,4,5,6,7}), (7, {1,2,3,4,5,6,7})
0100: Return (9, {3,4,5,6,7})
0100: Return (13, {3,4,5,6,7})
Combining into sets of eight:
10000101: Return (0, {5,7}), (5, {2,3,4,5,6,7}), (7, {1,2,3,4,5,6,7})
01000100: Return (9, {4,7}), (13, {3,4,5,6,7})
Combining into a set of sixteen:
10000101 01000100
As we've progressed, we keep checking all the possibilities so far. Up to this step we've left stuff that went beyond the end of the string, but now we can check all the possibilities.
Basically, we check the first 1 with spacings of 5 and 7, and find that they don't line up to 1's. (Note that each check is CONSTANT, not linear time) Then we check the second one (index 5) with spacings of 2, 3, 4, 5, 6, and 7-- or we would, but we can stop at 2 since that actually matches up.
Phew! That's a rather long algorithm.
I don't know 100% if it's O(n log n) because of the last step, but everything up to there is definitely O(n log n) as far as I can tell. I'll get back to this later and try to refine the last step.
EDIT: Changed my answer to reflect Welbog's comment. Sorry for the error. I'll write some pseudocode later, too, when I get a little more time to decipher what I wrote again. ;-)
I'll give my rough guess here, and let those who are better at calculating complexity help me work out how my algorithm fares in O-notation.
given binary string 0000010101000100 (as example)
crop head and tail of zeroes -> 00000 101010001 00
we get 101010001 from previous calculation
check if the middle bit is 'one'; if it is, we have found three evenly spaced 'ones' (this only applies when the number of bits is odd)
correspondingly, if the remaining cropped string has an even number of bits, the head and tail 'one' cannot be the two ends of an evenly spaced triple,
we use 1010100001 as an example (with an extra 'zero' appended to make the cropped length even); in this case we need to crop again, which gives -> 10101 00001
we take 10101 from the previous step, check its middle bit, and find the evenly spaced ones again
I have no idea how to calculate complexity for this, can anyone help?
edit: add some code to illustrate my idea
edit2: tried to compile my code and found some major mistakes, fixed
#include <stdio.h>
#include <string.h>
int find3even(int head, int tail);
char *binaryStr = "0000010101000100";
int main() {
int head, tail, pos;
head = 0;
tail = strlen(binaryStr)-1;
if( (pos = find3even(head, tail)) >=0 )
printf("found it at position %d\n", pos);
return 0;
}
int find3even(int head, int tail) {
int pos = 0;
if(head >= tail) return -1;
while(head < tail && binaryStr[head] == '0')
head++;
while(head < tail && binaryStr[tail] == '0')
tail--;
if(head >= tail) return -1;
if( (tail-head)%2 == 0 && //true if odd numbered
(binaryStr[head + (tail-head)/2] == '1') ) {
return head;
}else {
if( (pos = find3even(head, tail-1)) >=0 )
return pos;
if( (pos = find3even(head+1, tail)) >=0 )
return pos;
}
return -1;
}
I came up with something like this:
def IsSymetric(number):
    number = number.strip('0')
    if len(number) < 3:
        return False
    if len(number) % 2 == 0:
        # test the substring without the first char, and the one without the last char
        return IsSymetric(number[1:]) or IsSymetric(number[0:len(number)-1])
    else:
        if number[len(number)//2] == '1':
            return True
        return IsSymetric(number[:(len(number)//2)]) or IsSymetric(number[len(number)//2+1:])
    return False
This is inspired by andycjw.
Truncate the zeros.
If the length is even, test two substrings: 0 to (len-2) (skip the last character) and 1 to (len-1) (skip the first character).
If it is odd and the middle char is one, we have success. Otherwise divide the string in the middle, drop the middle element, and check both parts.
As to the complexity, this might be O(n log n) since in each recursion we are dividing by two.
Hope it helps.
Ok, I'm going to take another stab at the problem. I think I can prove a O(n log(n)) algorithm that is similar to those already discussed by using a balanced binary tree to store distances between 1's. This approach was inspired by Justice's observation about reducing the problem to a list of distances between the 1's.
We could scan the input string and construct a balanced binary tree around the positions of the 1's, such that each node stores the position of a 1 and each edge to a child is labeled with the distance to that adjacent 1. For example:
10010001 gives the following tree
        3
       / \
   2  /   \  3
     /     \
    0       7
This can be done in O(n log(n)) since, for a string of size n, each insertion takes O(log(n)) in the worst case.
Then the problem is to search the tree to discover whether, at any node, there is a path from that node through the left-child that has the same distance as a path through the right child. This can be done recursively on each subtree. When merging two subtrees in the search, we must compare the distances from paths in the left subtree with distances from paths in the right. Since the number of paths in a subtree will be proportional to log(n), and the number of nodes is n, I believe this can be done in O(n log(n)) time.
Did I miss anything?
This seemed liked a fun problem so I decided to try my hand at it.
I am making the assumption that 111000001 would find the first 3 ones and be successful. Essentially the number of zeroes following the 1 is the important thing, since 0111000 is the same as 111000 according to your definition. Once you find two cases of 1, the next 1 found completes the trilogy.
Here it is in Python:
def find_three(bstring):
    print bstring
    dict = {}
    lastone = -1
    zerocount = 0
    for i in range(len(bstring)):
        if bstring[i] == '1':
            print i, ': 1'
            if lastone != -1:
                if(zerocount in dict):
                    dict[zerocount].append(lastone)
                    if len(dict[zerocount]) == 2:
                        dict[zerocount].append(i)
                        return True, dict
                else:
                    dict[zerocount] = [lastone]
            lastone = i
            zerocount = 0
        else:
            zerocount = zerocount + 1
    #this is really just book keeping, as we have failed at this point
    if lastone != -1:
        if(zerocount in dict):
            dict[zerocount].append(lastone)
        else:
            dict[zerocount] = [lastone]
    return False, dict
This is a first try, so I'm sure this could be written in a cleaner manner. Please list the cases where this method fails down below.
I assume the reason this is n log(n) is due to the following:
To find the 1 that is the start of the triplet, you need to check (n-2) characters. If you haven't found it by that point, you won't (chars n-1 and n cannot start a triplet) (O(n))
To find the second 1 that is the part of the triplet (started by the first one), you need to check m/2 (m=n-x, where x is the offset of the first 1) characters. This is because, if you haven't found the second 1 by the time you're halfway from the first one to the end, you won't... since the third 1 must be exactly the same distance past the second. (O(log(n)))
It's O(1) to find the last 1, since you know the index it must be at by the time you find the first and second.
So, you have n, log(n), and 1... O(nlogn)
Edit: Oops, my bad. My brain had it set that n/2 was logn... which it obviously isn't (doubling the number on items still doubles the number of iterations on the inner loop). This is still at n^2, not solving the problem. Well, at least I got to write some code :)
Implementation in Tcl
proc get-triplet {input} {
for {set first 0} {$first < [string length $input]-2} {incr first} {
if {[string index $input $first] != 1} {
continue
}
set start [expr {$first + 1}]
set end [expr {1+ $first + (([string length $input] - $first) /2)}]
for {set second $start} {$second < $end} {incr second} {
if {[string index $input $second] != 1} {
continue
}
set last [expr {($second - $first) + $second}]
if {[string index $input $last] == 1} {
return [list $first $second $last]
}
}
}
return {}
}
get-triplet 10101 ;# 0 2 4
get-triplet 10111 ;# 0 2 4
get-triplet 11100000 ;# 0 1 2
get-triplet 0100100100 ;# 1 4 7
I think I have found a way of solving the problem, but I can't construct a formal proof. The solution I made is written in Java, and it uses a counter 'n' to count how many list/array accesses it does. So n should be less than or equal to stringLength*log(stringLength) if it is correct. I tried it for the numbers 0 to 2^22, and it works.
It starts by iterating over the input string and making a list of all the indexes which hold a one. This is just O(n).
Then from the list of indexes it picks a firstIndex, and a secondIndex which is greater than the first. These two indexes must hold ones, because they are in the list of indexes. From there the thirdIndex can be calculated. If the inputString[thirdIndex] is a 1 then it halts.
public static int testString(String input){
//n is the number of array/list accesses in the algorithm
int n=0;
//Put the indices of all the ones into a list, O(n)
ArrayList<Integer> ones = new ArrayList<Integer>();
for(int i=0;i<input.length();i++){
if(input.charAt(i)=='1'){
ones.add(i);
}
}
//If less than three ones in list, just stop
if(ones.size()<3){
return n;
}
int firstIndex, secondIndex, thirdIndex;
for(int x=0;x<ones.size()-2;x++){
n++;
firstIndex = ones.get(x);
for(int y=x+1; y<ones.size()-1; y++){
n++;
secondIndex = ones.get(y);
thirdIndex = secondIndex*2 - firstIndex;
if(thirdIndex >= input.length()){
break;
}
n++;
if(input.charAt(thirdIndex) == '1'){
//This case is satisfied if it has found three evenly spaced ones
//System.out.println("This one => " + input);
return n;
}
}
}
return n;
}
additional note: the counter n is not incremented when it iterates over the input string to construct the list of indexes. This operation is O(n), so it won't have an effect on the algorithm complexity anyway.
One inroad into the problem is to think of factors and shifting.
With shifting, you compare the string of ones and zeroes with a shifted version of itself. You then take matching ones. Take this example shifted by two:
1010101010
  1010101010
------------
001010101000
The resulting 1's (bitwise ANDed), must represent all those 1's which are evenly spaced by two. The same example shifted by three:
1010101010
   1010101010
-------------
0000000000000
In this case there are no 1's which are evenly spaced three apart.
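For what it's worth, the shift-and-AND test for one fixed shift k can be written very compactly with integer bit operations (a sketch, not part of the original answer):

def ones_spaced_by(bits, k):
    # AND the string (as an integer) with itself shifted by k and by 2*k;
    # any surviving 1 bit marks a triple of 1's spaced exactly k apart.
    x = int(bits, 2)
    return x & (x >> k) & (x >> 2 * k)

# ones_spaced_by("1010101010", 2) != 0   # ones spaced two apart exist
# ones_spaced_by("1010101010", 3) == 0   # none spaced three apart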
So what does this tell you? Well that you only need to test shifts which are prime numbers. For example say you have two 1's which are six apart. You would only have to test 'two' shifts and 'three' shifts (since these divide six). For example:
10000010
  10000010 (Shift by two)
    10000010
      10000010 (We have a match)

10000010
   10000010 (Shift by three)
      10000010 (We have a match)
So the only shifts you ever need to check are 2, 3, 5, 7, 11, 13, etc., up to the prime closest to the square root of the size of the string of digits.
Nearly solved?
I think I am closer to a solution. Basically:
Scan the string for 1's. For each 1, note its remainder after taking a modulus of its position. The modulus ranges from 1 to half the size of the string. This is because the largest possible separation size is half the string. This is done in O(n^2). BUT. Only prime moduli need be checked so O(n^2/log(n))
Sort the list of modulus/remainders in order largest modulus first, this can be done in O(n*log(n)) time.
Look for three consecutive moduli/remainders which are the same.
Somehow retrieve the position of the ones!
I think the biggest clue to the answer, is that the fastest sort algorithms, are O(n*log(n)).
WRONG
Step 1 is wrong as pointed out by a colleague. If we have 1's at position 2,12 and 102. Then taking a modulus of 10, they would all have the same remainders, and yet are not equally spaced apart! Sorry.
Here are some thoughts that, despite my best efforts, will not seem to wrap themselves up in a bow. Still, they might be a useful starting point for someone's analysis.
Consider the proposed solution as follows, which is the approach that several folks have suggested, including myself in a prior version of this answer. :)
Trim leading and trailing zeroes.
Scan the string looking for 1's.
When a 1 is found:
Assume that it is the middle 1 of the solution.
For each prior 1, use its saved position to compute the anticipated position of the final 1.
If the computed position is after the end of the string it cannot be part of the solution, so drop the position from the list of candidates.
Check the solution.
If the solution was not found, add the current 1 to the list of candidates.
Repeat until no more 1's are found.
Now consider input strings like the following, which will not have a solution:
101
101001
1010010001
101001000100001
101001000100001000001
In general, this is the concatenation of k strings of the form j 0's followed by a 1 for j from zero to k-1.
k=2 101
k=3 101001
k=4 1010010001
k=5 101001000100001
k=6 101001000100001000001
Note that the lengths of the substrings are 1, 2, 3, etc. So, problem size n has substrings of lengths 1 to k such that n = k(k+1)/2.
k=2 n= 3 101
k=3 n= 6 101001
k=4 n=10 1010010001
k=5 n=15 101001000100001
k=6 n=21 101001000100001000001
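(As a quick aside, a one-line Python sketch that generates these strings; the function name is just for illustration.)

def hard_string(k):
    # concatenation of "j zeros followed by a 1" for j = 0 .. k-1
    return ''.join('0' * j + '1' for j in range(k))

# hard_string(4) == '1010010001', length k*(k+1)/2 == 10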
Note that k also tracks the number of 1's that we have to consider. Remember that every time we see a 1, we need to consider all the 1's seen so far. So when we see the second 1, we only consider the first, when we see the third 1, we reconsider the first two, when we see the fourth 1, we need to reconsider the first three, and so on. By the end of the algorithm, we've considered k(k-1)/2 pairs of 1's. Call that p.
k=2 n= 3 p= 1 101
k=3 n= 6 p= 3 101001
k=4 n=10 p= 6 1010010001
k=5 n=15 p=10 101001000100001
k=6 n=21 p=15 101001000100001000001
The relationship between n and p is that n = p + k.
The process of going through the string takes O(n) time. Each time a 1 is encountered, a maximum of (k-1) comparisons are done. Since n = k(k+1)/2, we have k**2 < 2n, so k < sqrt(2n); in other words k is O(sqrt(n)). This gives us O(n sqrt(n)) or O(n**3/2). Note however that may not be a really tight bound, because the number of comparisons goes from 1 up to a maximum of k; it isn't k the whole time. But I'm not sure how to account for that in the math.
It still isn't O(n log(n)). Also, I can't prove those inputs are the worst cases, although I suspect they are. I think a denser packing of 1's to the front results in an even sparser packing at the end.
Since someone may still find it useful, here's my code for that solution in Perl:
#!/usr/bin/perl
# read input as first argument
my $s = $ARGV[0];
# validate the input
$s =~ /^[01]+$/ or die "invalid input string\n";
# strip leading and trailing 0's
$s =~ s/^0+//;
$s =~ s/0+$//;
# prime the position list with the first '1' at position 0
my @p = (0);
# start at position 1, which is the second character
my $i = 1;
print "the string is $s\n\n";
while ($i < length($s)) {
if (substr($s, $i, 1) eq '1') {
print "found '1' at position $i\n";
my @t = ();
# assuming this is the middle '1', go through the positions
# of all the prior '1's and check whether there's another '1'
# in the correct position after this '1' to make a solution
while (scalar @p) {
# $p is the position of the prior '1'
my $p = shift @p;
# $j is the corresponding position for the following '1'
my $j = 2 * $i - $p;
# if $j is off the end of the string then we don't need to
# check $p anymore
next if ($j >= length($s));
print "checking positions $p, $i, $j\n";
if (substr($s, $j, 1) eq '1') {
print "\nsolution found at positions $p, $i, $j\n";
exit 0;
}
# if $j isn't off the end of the string, keep $p for next time
push @t, $p;
}
@p = @t;
# add this '1' to the list of '1' positions
push @p, $i;
}
$i++;
}
print "\nno solution found\n";
While scanning for 1s, add their positions to a List. When adding the second and successive 1s, compare the current position to each position in the list so far. Spacing equals currentOne (center) - previousOne (left). The right-side bit is at currentOne + spacing; if it's a 1, we're done.
The list of ones grows inversely with the space between them. Simply stated, if you've got a lot of 0s between the 1s (as in a worst case), your list of known 1s will grow quite slowly.
using System;
using System.Collections.Generic;
namespace spacedOnes
{
class Program
{
static int[] _bits = new int[8] {128, 64, 32, 16, 8, 4, 2, 1};
static void Main(string[] args)
{
var bytes = new byte[4];
var r = new Random();
r.NextBytes(bytes);
foreach (var b in bytes) {
Console.Write(getByteString(b));
}
Console.WriteLine();
var bitCount = bytes.Length * 8;
var done = false;
var onePositions = new List<int>();
for (var i = 0; i < bitCount; i++)
{
if (isOne(bytes, i)) {
if (onePositions.Count > 0) {
foreach (var knownOne in onePositions) {
var spacing = i - knownOne;
var k = i + spacing;
if (k < bitCount && isOne(bytes, k)) {
Console.WriteLine("^".PadLeft(knownOne + 1) + "^".PadLeft(spacing) + "^".PadLeft(spacing));
done = true;
break;
}
}
}
if (done) {
break;
}
onePositions.Add(i);
}
}
Console.ReadKey();
}
static String getByteString(byte b) {
var s = new char[8];
for (var i=0; i<s.Length; i++) {
s[i] = ((b & _bits[i]) > 0 ? '1' : '0');
}
return new String(s);
}
static bool isOne(byte[] bytes, int i)
{
var byteIndex = i / 8;
var bitIndex = i % 8;
return (bytes[byteIndex] & _bits[bitIndex]) > 0;
}
}
}
I thought I'd add one comment before posting the 22nd naive solution to the problem. For the naive solution, we don't need to show that the number of 1's in the string is at most O(log(n)), but rather that it is at most O(sqrt(n*log(n))).
Solver:
def solve(Str):
    indexes=[]
    #O(n) setup
    for i in range(len(Str)):
        if Str[i]=='1':
            indexes.append(i)
    #O((number of 1's)^2) processing
    for i in range(len(indexes)):
        for j in range(i+1, len(indexes)):
            indexDiff = indexes[j] - indexes[i]
            k=indexes[j] + indexDiff
            if k<len(Str) and Str[k]=='1':
                return True
    return False
It's basically quite similar to flybywire's idea and implementation, though it looks ahead instead of back.
Greedy String Builder:
#assumes final char hasn't been added, and would be a 1
def lastCharMakesSolvable(Str):
    endIndex=len(Str)
    j=endIndex-1
    while j-(endIndex-j) >= 0:
        k=j-(endIndex-j)
        if k >= 0 and Str[k]=='1' and Str[j]=='1':
            return True
        j=j-1
    return False

def expandString(StartString=''):
    if lastCharMakesSolvable(StartString):
        return StartString + '0'
    return StartString + '1'

n=1
BaseStr=""
lastCount=0
while n<1000000:
    BaseStr=expandString(BaseStr)
    count=BaseStr.count('1')
    if count != lastCount:
        print(len(BaseStr), count)
        lastCount=count
    n=n+1
(In my defense, I'm still in the 'learn python' stage of understanding)
Also, potentially useful output from the greedy building of strings, there's a rather consistent jump after hitting a power of 2 in the number of 1's... which I was not willing to wait around to witness hitting 2096.
strlength # of 1's
1 1
2 2
4 3
5 4
10 5
14 8
28 9
41 16
82 17
122 32
244 33
365 64
730 65
1094 128
2188 129
3281 256
6562 257
9842 512
19684 513
29525 1024
I'll try to present a mathematical approach. This is more a beginning than an end, so any help, comment, or even contradiction - will be deeply appreciated. However, if this approach is proven - the algorithm is a straight-forward search in the string.
Given a fixed number of spaces k and a string S, the search for a k-spaced-triplet takes O(n) - We simply test for every 0<=i<=(n-2k) if S[i]==S[i+k]==S[i+2k]. The test takes O(1) and we do it n-k times where k is a constant, so it takes O(n-k)=O(n).
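A tiny Python sketch of that fixed-k test (the name is illustrative):

def has_triple_with_gap(s, k):
    # O(n) scan for a triple of '1's with a fixed gap k
    return any(s[i] == s[i + k] == s[i + 2 * k] == '1' for i in range(len(s) - 2 * k))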
Let us assume that there is an Inverse Proportion between the number of 1's and the maximum spaces we need to search for. That is, If there are many 1's, there must be a triplet and it must be quite dense; If there are only few 1's, The triplet (if any) can be quite sparse. In other words, I can prove that if I have enough 1's, such triplet must exist - and the more 1's I have, a more dense triplet must be found. This can be explained by the Pigeonhole principle - Hope to elaborate on this later.
Say we have an upper bound k on the possible number of spaces we have to look for. Now, for each 1 located at S[i] we need to check for a 1 in S[i-1] and S[i+1], S[i-2] and S[i+2], ... S[i-k] and S[i+k]. This takes O((k^2-k)/2)=O(k^2) for each 1 in S - due to Gauss' series summation formula. Note that this differs from section 1 - I'm using k as an upper bound for the number of spaces, not as a constant space.
We need to prove O(n*log(n)). That is, we need to show that k*(number of 1's) is proportional to log(n).
If we can do that, the algorithm is trivial - for each 1 in S whose index is i, simply look for 1's from each side up to distance k. If two were found in the same distance, return i and k. Again, the tricky part would be finding k and proving the correctness.
I would really appreciate your comments here - I have been trying to find the relation between k and the number of 1's on my whiteboard, so far without success.
Assumption:
Just wrong - my earlier claim of a log(n) upper limit on the number of ones does not hold.
EDIT:
Now I found that using Cantor numbers (if correct), the density of the set is (2/3)^log_3(n) (what a weird function), and I agree that a log(n)/n density is too strong.
If this is the upper limit, there is an algorithm that solves this problem in at least O(n*(3/2)^(log(n)/log(3))) time complexity and O((3/2)^(log(n)/log(3))) space complexity. (Check Justice's answer for the algorithm.)
This is still by far better than O(n^2)
This function ((3/2)^(log(n)/log(3))) really looks like n*log(n) on first sight.
How did I get this formula?
Applying Cantor numbers to the string.
Suppose the length of the string is 3^p == n.
At each step in the generation of the Cantor string you keep 2/3 of the previous number of ones. Apply this p times.
That means (n * ((2/3)^p)) -> (((3^p)) * ((2/3)^p)) remaining ones, and after simplification, 2^p.
This means 2^p ones in a string of length 3^p -> one 1 per (3/2)^p positions. Substitute p=log(n)/log(3) and get
((3/2)^(log(n)/log(3)))
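One concrete way to realise this construction, assuming it matches what is meant above, is to keep exactly the positions whose base-3 representation contains no digit 2 (such positions are known to contain no three-term arithmetic progression); a sketch:

def cantor_string(p):
    # length 3^p, with a 1 at every position whose base-3 digits avoid 2;
    # this keeps 2/3 of the ones at each of the p steps, i.e. 2^p ones in total
    def no_two(i):
        while i:
            if i % 3 == 2:
                return False
            i //= 3
        return True
    return ''.join('1' if no_two(i) else '0' for i in range(3 ** p))

# cantor_string(2) == '110110000'  (2^2 == 4 ones in a string of length 3^2 == 9)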
How about a simple O(n) solution, with O(n^2) space? (Uses the assumption that all bitwise operators work in O(1).)
The algorithm basically works in four stages:
Stage 1: For each bit in your original number, find out how far away the ones are, but consider only one direction. (I considered all the bits in the direction of the least significant bit.)
Stage 2: Reverse the order of the bits in the input;
Stage 3: Re-run step 1 on the reversed input.
Stage 4: Compare the results from Stage 1 and Stage 3. If any bits are equally spaced above AND below we must have a hit.
Keep in mind that no step in the above algorithm takes longer than O(n). ^_^
As an added benefit, this algorithm will find ALL equally spaced ones from EVERY number. So, for example, if you get a result of "0x0005" then there are equally spaced ones at BOTH 1 and 3 units away.
I didn't really try optimizing the code below, but it is compilable C# code that seems to work.
using System;
namespace ThreeNumbers
{
class Program
{
const int uint32Length = 32;
static void Main(string[] args)
{
Console.Write("Please enter your integer: ");
uint input = UInt32.Parse(Console.ReadLine());
uint[] distancesLower = Distances(input);
uint[] distancesHigher = Distances(Reverse(input));
PrintHits(input, distancesLower, distancesHigher);
}
/// <summary>
/// Returns an array showing how far away the ones are from each bit in the input. Only
/// considers ones at lower significant bits. Index 0 represents the least significant bit
/// in the input. Index 1 represents the second least significant bit in the input and so
/// on. If a one is 3 away from the bit in question, then the third least significant bit
/// of the value will be set.
///
/// As programmed this algorithm needs: O(n) time, and O(n*log(n)) space.
/// (Where n is the number of bits in the input.)
/// </summary>
public static uint[] Distances(uint input)
{
uint[] distanceToOnes = new uint[uint32Length];
uint result = 0;
//Sets how far each bit is from other ones. Going in the direction of LSB to MSB
for (uint bitIndex = 1, arrayIndex = 0; bitIndex != 0; bitIndex <<= 1, ++arrayIndex)
{
distanceToOnes[arrayIndex] = result;
result <<= 1;
if ((input & bitIndex) != 0)
{
result |= 1;
}
}
return distanceToOnes;
}
/// <summary>
/// Reverses the bits in the input.
///
/// As programmed this algorithm needs O(n) time and O(n) space.
/// (Where n is the number of bits in the input.)
/// </summary>
/// <param name="input"></param>
/// <returns></returns>
public static uint Reverse(uint input)
{
uint reversedInput = 0;
for (uint bitIndex = 1; bitIndex != 0; bitIndex <<= 1)
{
reversedInput <<= 1;
reversedInput |= (uint)((input & bitIndex) != 0 ? 1 : 0);
}
return reversedInput;
}
/// <summary>
/// Goes through each bit in the input, to check if there are any bits equally far away in
/// the distancesLower and distancesHigher
/// </summary>
public static void PrintHits(uint input, uint[] distancesLower, uint[] distancesHigher)
{
const int offset = uint32Length - 1;
for (uint bitIndex = 1, arrayIndex = 0; bitIndex != 0; bitIndex <<= 1, ++arrayIndex)
{
//hits checks if any bits are equally spaced away from our current value
bool isBitSet = (input & bitIndex) != 0;
uint hits = distancesLower[arrayIndex] & distancesHigher[offset - arrayIndex];
if (isBitSet && (hits != 0))
{
Console.WriteLine(String.Format("The {0}-th LSB has hits 0x{1:x4} away", arrayIndex + 1, hits));
}
}
}
}
}
Someone will probably comment that for any sufficiently large number, bitwise operations cannot be done in O(1). You'd be right. However, I'd conjecture that every solution that uses addition, subtraction, multiplication, or division (which cannot be done by shifting) would also have that problem.
Below is a solution. There could be some little mistakes here and there, but the idea is sound.
Edit: It's not n * log(n)
PSEUDO CODE:
foreach character in the string
if the character equals 1 {
if length cache > 0 { //we can skip the first one
foreach location in the cache { //last in first out kind of order
if ((currentlocation + (currentlocation - location)) < length string)
if (string[(currentlocation + (currentlocation - location))] equals 1)
return found evenly spaced string
else
break;
}
}
remember the location of this character in a some sort of cache.
}
return didn't find evenly spaced string
C# code:
public static Boolean FindThreeEvenlySpacedOnes(String str) {
List<int> cache = new List<int>();
for (var x = 0; x < str.Length; x++) {
if (str[x] == '1') {
if (cache.Count > 0) {
for (var i = cache.Count - 1; i >= 0; i--) {
if ((x + (x - cache[i])) >= str.Length)
break;
if (str[(x + (x - cache[i]))] == '1')
return true;
}
}
cache.Add(x);
}
}
return false;
}
How it works:
iteration 1:
x
|
101101001
// the location of this 1 is stored in the cache
iteration 2:
x
|
101101001
iteration 3:
a x b
| | |
101101001
//we retrieve location a out of the cache and then based on a
//we calculate b and check if te string contains a 1 on location b
//and of course we store x in the cache because it's a 1
iteration 4:
axb
|||
101101001
a x b
| | |
101101001
iteration 5:
x
|
101101001
iteration 6:
a x b
| | |
101101001
a x b
| | |
101101001
//return found evenly spaced string
Obviously we need to at least check bunches of triplets at the same time, so we need to compress the checks somehow. I have a candidate algorithm, but analyzing the time complexity is beyond my ability*time threshold.
Build a tree where each node has three children and each node contains the total number of 1's at its leaves. Build a linked list over the 1's, as well. Assign each node an allowed cost proportional to the range it covers. As long as the time we spend at each node is within budget, we'll have an O(n lg n) algorithm.
--
Start at the root. If the square of the total number of 1's below it is less than its allowed cost, apply the naive algorithm. Otherwise recurse on its children.
Now we have either returned within budget, or we know that there are no valid triplets entirely contained within one of the children. Therefore we must check the inter-node triplets.
Now things get incredibly messy. We essentially want to recurse on the potential sets of children while limiting the range. As soon as the range is constrained enough that the naive algorithm will run under budget, you do it. Enjoy implementing this, because I guarantee it will be tedious. There's like a dozen cases.
--
The reason I think that algorithm will work is because the sequences without valid triplets appear to go alternate between bunches of 1's and lots of 0's. It effectively splits the nearby search space, and the tree emulates that splitting.
The run time of the algorithm is not obvious, at all. It relies on the non-trivial properties of the sequence. If the 1's are really sparse then the naive algorithm will work under budget. If the 1's are dense, then a match should be found right away. But if the density is 'just right' (eg. near ~n^0.63, which you can achieve by setting all bits at positions with no '2' digit in base 3), I don't know if it will work. You would have to prove that the splitting effect is strong enough.
No theoretical answer here, but I wrote a quick Java program to explore the running-time behavior as a function of k and n, where n is the total bit length and k is the number of 1's. I'm with a few of the answerers who say that the "regular" algorithm, which checks all the pairs of bit positions and looks for the 3rd bit, is in reality O(n ln n) even though it requires O(k^2) in the worst case, because the worst case needs sparse bitstrings.
Anyway here's the program, below. It's a Monte-Carlo style program which runs a large number of trials NTRIALS for constant n, and randomly generates bitsets for a range of k-values using Bernoulli processes with ones-density constrained between limits that can be specified, and records the running time of finding or failing to find a triplet of evenly spaced ones, time measured in steps NOT in CPU time. I ran it for n=64, 256, 1024, 4096, 16384* (still running), first a test run with 500000 trials to see which k-values take the longest running time, then another test with 5000000 trials with narrowed ones-density focus to see what those values look like. The longest running times do happen with very sparse density (e.g. for n=4096 the running time peaks are in the k=16-64 range, with a gentle peak for mean runtime at 4212 steps @ k=31, max runtime peaked at 5101 steps @ k=58). It looks like it would take extremely large values of N for the worst-case O(k^2) step to become larger than the O(n) step where you scan the bitstring to find the 1's position indices.
package com.example.math;
import java.io.PrintStream;
import java.util.BitSet;
import java.util.Random;
public class EvenlySpacedOnesTest {
static public class StatisticalSummary
{
private int n=0;
private double min=Double.POSITIVE_INFINITY;
private double max=Double.NEGATIVE_INFINITY;
private double mean=0;
private double S=0;
public StatisticalSummary() {}
public void add(double x) {
min = Math.min(min, x);
max = Math.max(max, x);
++n;
double newMean = mean + (x-mean)/n;
S += (x-newMean)*(x-mean);
// this algorithm for mean,std dev based on Knuth TAOCP vol 2
mean = newMean;
}
public double getMax() { return (n>0)?max:Double.NaN; }
public double getMin() { return (n>0)?min:Double.NaN; }
public int getCount() { return n; }
public double getMean() { return (n>0)?mean:Double.NaN; }
public double getStdDev() { return (n>0)?Math.sqrt(S/n):Double.NaN; }
// some may quibble and use n-1 for sample std dev vs population std dev
public static void printOut(PrintStream ps, StatisticalSummary[] statistics) {
for (int i = 0; i < statistics.length; ++i)
{
StatisticalSummary summary = statistics[i];
ps.printf("%d\t%d\t%.0f\t%.0f\t%.5f\t%.5f\n",
i,
summary.getCount(),
summary.getMin(),
summary.getMax(),
summary.getMean(),
summary.getStdDev());
}
}
}
public interface RandomBernoulliProcess // see http://en.wikipedia.org/wiki/Bernoulli_process
{
public void setProbability(double d);
public boolean getNextBoolean();
}
static public class Bernoulli implements RandomBernoulliProcess
{
final private Random r = new Random();
private double p = 0.5;
public boolean getNextBoolean() { return r.nextDouble() < p; }
public void setProbability(double d) { p = d; }
}
static public class TestResult {
final public int k;
final public int nsteps;
public TestResult(int k, int nsteps) { this.k=k; this.nsteps=nsteps; }
}
////////////
final private int n;
final private int ntrials;
final private double pmin;
final private double pmax;
final private Random random = new Random();
final private Bernoulli bernoulli = new Bernoulli();
final private BitSet bits;
public EvenlySpacedOnesTest(int n, int ntrials, double pmin, double pmax) {
this.n=n; this.ntrials=ntrials; this.pmin=pmin; this.pmax=pmax;
this.bits = new BitSet(n);
}
/*
* generate random bit string
*/
private int generateBits()
{
int k = 0; // # of 1's
for (int i = 0; i < n; ++i)
{
boolean b = bernoulli.getNextBoolean();
this.bits.set(i, b);
if (b) ++k;
}
return k;
}
private int findEvenlySpacedOnes(int k, int[] pos)
{
int[] bitPosition = new int[k];
for (int i = 0, j = 0; i < n; ++i)
{
if (this.bits.get(i))
{
bitPosition[j++] = i;
}
}
int nsteps = n; // first, it takes N operations to find the bit positions.
boolean found = false;
if (k >= 3) // don't bother doing anything if there are less than 3 ones. :(
{
int lastBitSetPosition = bitPosition[k-1];
for (int j1 = 0; !found && j1 < k; ++j1)
{
pos[0] = bitPosition[j1];
for (int j2 = j1+1; !found && j2 < k; ++j2)
{
pos[1] = bitPosition[j2];
++nsteps;
pos[2] = 2*pos[1]-pos[0];
// calculate 3rd bit index that might be set;
// the other two indices point to bits that are set
if (pos[2] > lastBitSetPosition)
break;
// loop inner loop until we go out of bounds
found = this.bits.get(pos[2]);
// we're done if we find a third 1!
}
}
}
if (!found)
pos[0]=-1;
return nsteps;
}
/*
* run an algorithm that finds evenly spaced ones and returns # of steps.
*/
public TestResult run()
{
bernoulli.setProbability(pmin + (pmax-pmin)*random.nextDouble());
// probability of bernoulli process is randomly distributed between pmin and pmax
// generate bit string.
int k = generateBits();
int[] pos = new int[3];
int nsteps = findEvenlySpacedOnes(k, pos);
return new TestResult(k, nsteps);
}
public static void main(String[] args)
{
int n;
int ntrials;
double pmin = 0, pmax = 1;
try {
n = Integer.parseInt(args[0]);
ntrials = Integer.parseInt(args[1]);
if (args.length >= 3)
pmin = Double.parseDouble(args[2]);
if (args.length >= 4)
pmax = Double.parseDouble(args[3]);
}
catch (Exception e)
{
System.out.println("usage: EvenlySpacedOnesTest N NTRIALS [pmin [pmax]]");
System.exit(0);
return; // make the compiler happy
}
final StatisticalSummary[] statistics;
statistics=new StatisticalSummary[n+1];
for (int i = 0; i <= n; ++i)
{
statistics[i] = new StatisticalSummary();
}
EvenlySpacedOnesTest test = new EvenlySpacedOnesTest(n, ntrials, pmin, pmax);
int printInterval=100000;
int nextPrint = printInterval;
for (int i = 0; i < ntrials; ++i)
{
TestResult result = test.run();
statistics[result.k].add(result.nsteps);
if (i == nextPrint)
{
System.err.println(i);
nextPrint += printInterval;
}
}
StatisticalSummary.printOut(System.out, statistics);
}
}
# <algorithm>
def contains_evenly_spaced?(input)
return false if input.size < 3
one_indices = []
input.each_with_index do |digit, index|
next if digit == 0
one_indices << index
end
return false if one_indices.size < 3
previous_indexes = []
one_indices.each do |index|
if !previous_indexes.empty?
previous_indexes.each do |previous_index|
multiple = index - previous_index
success_index = index + multiple
return true if input[success_index] == 1
end
end
previous_indexes << index
end
return false
end
# </algorithm>
def parse_input(input)
input.chars.map { |c| c.to_i }
end
I'm having trouble with the worst-case scenarios with millions of digits. Fuzzing from /dev/urandom essentially gives you O(n), but I know the worst case is worse than that. I just can't tell how much worse. For small n, it's trivial to find inputs at around 3*n*log(n), but it's surprisingly hard to differentiate those from some other order of growth for this particular problem.
Can anyone who was working on worst-case inputs generate a string with length greater than say, one hundred thousand?
An adaptation of the Rabin-Karp algorithm could be possible for you.
Its complexity is O(n), so it could help you.
Take a look http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
Could this be a solution? I'm not sure if it's O(n log n), but in my opinion it's better than O(n²) because the only way not to find a triple would be a prime number distribution.
There's room for improvement, the second found 1 could be the next first 1. Also no error checking.
#include <iostream>
#include <string>
int findIt(std::string toCheck) {
for (int i=0; i<toCheck.length(); i++) {
if (toCheck[i]=='1') {
std::cout << i << ": " << toCheck[i];
for (int j = i+1; j<toCheck.length(); j++) {
if (toCheck[j]=='1' && (i+2*(j-i)) < toCheck.length() && toCheck[(i+2*(j-i))] == '1') {
std::cout << ", " << j << ":" << toCheck[j] << ", " << (i+2*(j-i)) << ":" << toCheck[(i+2*(j-i))] << " found" << std::endl;
return 0;
}
}
}
}
return -1;
}
int main (int agrc, char* args[]) {
std::string toCheck("1001011");
findIt(toCheck);
std::cin.get();
return 0;
}
I think this algorithm has O(n log n) complexity (C++, DevStudio 2k5). Now, I don't know the details of how to analyse an algorithm to determine its complexity, so I have added some metric gathering information to the code. The code counts the number of tests done on the sequence of 1's and 0's for any given input (hopefully, I've not made a balls of the algorithm). We can compare the actual number of tests against the O value and see if there's a correlation.
#include <iostream>
#include <string>
using namespace std;
bool HasEvenBits (string &sequence, int &num_compares)
{
bool
has_even_bits = false;
num_compares = 0;
for (unsigned i = 1 ; i <= (sequence.length () - 1) / 2 ; ++i)
{
for (unsigned j = 0 ; j < sequence.length () - 2 * i ; ++j)
{
++num_compares;
if (sequence [j] == '1' && sequence [j + i] == '1' && sequence [j + i * 2] == '1')
{
has_even_bits = true;
// we could 'break' here, but I want to know the worst case scenario so keep going to the end
}
}
}
return has_even_bits;
}
int main ()
{
int
count;
string
input = "111";
for (int i = 3 ; i < 32 ; ++i)
{
HasEvenBits (input, count);
cout << i << ", " << count << endl;
input += "0";
}
}
This program outputs the number of tests for each string length up to 32 characters. Here's the results:
n Tests n log (n)
=====================
3 1 1.43
4 2 2.41
5 4 3.49
6 6 4.67
7 9 5.92
8 12 7.22
9 16 8.59
10 20 10.00
11 25 11.46
12 30 12.95
13 36 14.48
14 42 16.05
15 49 17.64
16 56 19.27
17 64 20.92
18 72 22.59
19 81 24.30
20 90 26.02
21 100 27.77
22 110 29.53
23 121 31.32
24 132 33.13
25 144 34.95
26 156 36.79
27 169 38.65
28 182 40.52
29 196 42.41
30 210 44.31
31 225 46.23
I've added the 'n log n' values as well. Plot these using your graphing tool of choice to see a correlation between the two results. Does this analysis extend to all values of n? I don't know.

Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.
E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.
What is the best way to find the repeated numbers?
P.S. [You should not use sorting]
There is an O(n) solution if you know what the possible domain of input is. For example, if your input array contains numbers between 0 and 100, consider the following code.
bool flags[100];
for(int i = 0; i < 100; i++)
flags[i] = false;
for(int i = 0; i < input_size; i++)
if(flags[input_array[i]])
return input_array[i];
else
flags[input_array[i]] = true;
Of course there is the additional memory but this is the fastest.
OK, seems I just can't give it a rest :)
Simplest solution
int A[N] = {...};
int signed_1(int n) { return n%2<1 ? +n : -n; } // 0,-1,+2,-3,+4,-5,+6,-7,...
int signed_2(int n) { return n%4<2 ? +n : -n; } // 0,+1,-2,-3,+4,+5,-6,-7,...
long S1 = 0; // or int64, or long long, or some user-defined class
long S2 = 0; // so that it has enough bits to contain sum without overflow
for (int i=0; i<N-2; ++i)
{
S1 += signed_1(A[i]) - signed_1(i);
S2 += signed_2(A[i]) - signed_2(i);
}
for (int i=N-2; i<N; ++i)
{
S1 += signed_1(A[i]);
S2 += signed_2(A[i]);
}
S1 = abs(S1);
S2 = abs(S2);
assert(S1 != S2); // this algorithm fails in this case
p = (S1+S2)/2;
q = abs(S1-S2)/2;
One sum (S1 or S2) contains p and q with the same sign, the other sum - with opposite signs, all other members are eliminated.
S1 and S2 must have enough bits to accommodate the sums; the algorithm cannot tolerate overflow, because of the abs().
if abs(S1)==abs(S2) then the algorithm fails, though this value will still be the difference between p and q (i.e. abs(p - q) == abs(S1)).
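For concreteness, a Python transcription of the same S1/S2 idea (a sketch; the function name is mine), checked against the example array from the question:

def two_repeated_signed_sums(a):
    n = len(a)
    signed_1 = lambda x: x if x % 2 < 1 else -x   # 0, -1, +2, -3, +4, ...
    signed_2 = lambda x: x if x % 4 < 2 else -x   # 0, +1, -2, -3, +4, +5, ...
    s1 = abs(sum(signed_1(v) for v in a) - sum(signed_1(i) for i in range(n - 2)))
    s2 = abs(sum(signed_2(v) for v in a) - sum(signed_2(i) for i in range(n - 2)))
    assert s1 != s2, "the algorithm fails in this case"
    return (s1 + s2) // 2, abs(s1 - s2) // 2

# two_repeated_signed_sums([2, 3, 6, 1, 5, 4, 0, 3, 5]) == (5, 3)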
Previous solution
I doubt somebody will ever encounter such a problem in the field ;)
and I guess, I know the teacher's expectation:
Let's take the array {0,1,2,...,n-2,n-1}.
The given one can be produced by replacing the last two elements, n-2 and n-1, with the unknown p and q (both of lower value),
so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2
Simple math remains:
(1) p+q = S1
(2) p^2+q^2 = S2
Surely you won't solve it the way math classes teach you to solve quadratic equations.
First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally, check whether the found candidates really are present in the array twice.
You know that your Array contains every number from 0 to n-3 and the two repeating ones (p & q). For simplicity, let's ignore the 0-case for now.
You can calculate the sum and the product over the array, resulting in:
1 + 2 + ... + n-3 + p + q = p + q + (n-3)(n-2)/2
So if you substract (n-3)(n-2)/2 from the sum of the whole array, you get
sum(Array) - (n-3)(n-2)/2 = x = p + q
Now do the same for the product:
1 * 2 * ... * (n-3) * p * q = (n-3)! * p * q
prod(Array) / (n - 3)! = y = p * q
You've now got these terms:
x = p + q
y = p * q
=> p and q are the roots of t^2 - x*t + y = 0 (Vieta's formulas)
Solving that quadratic should let you calculate p and q.
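For illustration, a small sketch (names are mine) that recovers p and q once x and y are known; per the caveat above, the product trick needs the 0-case excluded:

import math

def pair_from_sum_and_product(x, y):
    # p and q are the roots of t^2 - x*t + y = 0
    d = math.isqrt(x * x - 4 * y)   # the discriminant is a perfect square here
    return (x - d) // 2, (x + d) // 2

# e.g. x = 3 + 5 = 8 and y = 3 * 5 = 15 give pair_from_sum_and_product(8, 15) == (3, 5)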
Insert each element into a set/hashtable, first checking if it's already in it.
You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + the two repeated numbers.
Edit: As others have noted, combined with the sum-of-squares, you can use this, I was just a little slow in figuring it out.
Check this old but good paper on the topic:
Finding Repeated Elements (PDF)
Some answers to the question: Algorithm to determine if array contains n…n+m? contain as a subproblem solutions which you can adopt for your purpose.
For example, here's a relevant part from my answer:
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
bool has_duplicates(int* a, int m, int n)
{
/** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')
Whether a[] array has duplicates.
precondition: all values are in [n, n+m) range.
feature: It marks visited items using a sign bit.
*/
assert((INT_MIN - (INT_MIN - 1)) == 1); // check n == INT_MIN
for (int *p = a; p != &a[m]; ++p) {
*p -= (n - 1); // [n, n+m) -> [1, m+1)
assert(*p > 0);
}
// determine: are there duplicates
bool has_dups = false;
for (int i = 0; i < m; ++i) {
const int j = abs(a[i]) - 1;
assert(j >= 0);
assert(j < m);
if (a[j] > 0)
a[j] *= -1; // mark
else { // already seen
has_dups = true;
break;
}
}
// restore the array
for (int *p = a; p != &a[m]; ++p) {
if (*p < 0)
*p *= -1; // unmark
// [1, m+1) -> [n, n+m)
*p += (n - 1);
}
return has_dups;
}
The program leaves the array unchanged (the array should be writeable but its values are restored on exit).
It works for array sizes up to INT_MAX (which is 9223372036854775807 on systems where int is 64 bits).
suppose array is
a[0], a[1], a[2] ..... a[n-1]
sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n-1]*a[n-1]
sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2
similarly
sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6
Suppose repeated elements are = X and Y
so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;
So on solving this quadratic we can get value of X and Y.
Time Complexity = O(n)
space complexity = O(1)
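A short Python sketch of this approach (the function name is illustrative), verified on the example array from the question:

import math

def find_repeated_pair(a):
    n = len(a)
    N = n - 3
    s = sum(a) - N * (N + 1) // 2                              # X + Y
    q = sum(v * v for v in a) - N * (N + 1) * (2 * N + 1) // 6 # X*X + Y*Y
    # X and Y are the roots of t^2 - s*t + (s*s - q)/2 = 0
    d = math.isqrt(2 * q - s * s)
    return (s - d) // 2, (s + d) // 2

# find_repeated_pair([2, 3, 6, 1, 5, 4, 0, 3, 5]) == (3, 5)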
I know the question is very old but I suddenly hit it and I think I have an interesting answer to it.
We know this is a brainteaser, and trivial solutions (i.e. HashMap, sort, etc.), no matter how good they are, would be boring.
As the numbers are integers, they have constant bit size (i.e. 32). Let us assume we are working with 4 bit integers right now. We look for A and B which are the duplicate numbers.
We need 4 buckets, each for one bit. Each bucket contains the numbers whose specific bit is 1. For example bucket 1 gets 2, 3, 6, 7, ...:
Bucket 0 : Sum ( x where: x & 2 power 0 != 0 )
...
Bucket i : Sum ( x where: x & 2 power i != 0 )
We know what would be the sum of each bucket if there was no duplicate. I consider this as prior knowledge.
Once above buckets are generated, a bunch of them would have values more than expected. By constructing the number from buckets we will have (A OR B for your information).
We can calculate (A XOR B) as follows:
A XOR B = Array[0] XOR Array[1] XOR ... XOR Array[n-1] XOR 0 XOR 1 XOR ... XOR (n-3)
Now going back to buckets, we know exactly which buckets have both our numbers and which ones have only one (from the XOR bit).
For a bucket that holds only one of the two numbers we can extract it as num = (bucket sum - expected bucket sum). We only need to recover one of the duplicates this way, so as long as A XOR B has at least one set bit, we've got the answer.
But what if A XOR B is zero?
Well this case is only possible if both duplicate numbers are the same number, which then our number is the answer of A OR B.
Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.
Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment each as you go through the array. Something like this:
int count[10] = {0};   // one bucket per possible value
for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}
Then just search your count array for any entries greater than 1; those are the values with duplicates. This only requires one pass across the original array and one pass across the count array.
Here's an implementation in Python of #eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, so the whole sequence doesn't need to be in memory).
def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i      # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i  # number_of_digits(s2) ~ 4 * number_of_digits(i)
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2
    p = (s1 - int((2*s2 - s1**2)**.5)) // 2
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p
Example:
>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)
A more verbose version of the above code follows with explanation:
def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.

    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr)  # all in range [0, n-2)
    assert len(set(arr)) == (n - 2)          # number of unique items
    s1 = (n-2) + (n-1)        # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits
    s2 = (n-2)**2 + (n-1)**2  # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i
        s2 += j*j - i*i
    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))
    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))
    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2

    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1; then from Vieta's theorem it follows
    that p and q are the roots of x**2 - B*x + C = 0
    -> p = (B - sqrtD) / 2
    -> q = (B + sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)
    -> p = (s1 - sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2)  # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0            # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q
NOTE: calculating integer square root of a number (~ N**4) makes the above algorithm non-linear.
Since a range is specified, you can perform a radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then also O(n).
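In the same spirit, since the values are known to lie in [0, n-3], even a single counting pass does the job; a minimal Python sketch (my illustration, under that assumption):

def find_repeated_by_counting(arr):
    n = len(arr)
    counts = [0] * (n - 2)        # one counter per possible value 0 .. n-3
    for v in arr:
        counts[v] += 1
    return [v for v, c in enumerate(counts) if c > 1]

print(find_repeated_by_counting([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]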
You can use a simple nested for loop:
int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };
for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            // DO SOMETHING
        }
    }
}
Or you can filter the array and use a recursive function if you want to get the count of occurrences:
int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };
int[] myNewArray = null;
int a = 1;

void GetDuplicates(int[] array)
{
    if (array == null || array.Length == 0)   // base case: nothing left to check
        return;

    for (int i = 0; i < array.Length; i++)
    {
        for (int j = i + 1; j < array.Length; j++)
        {
            if (array[i] == array[j])
            {
                a += 1;
            }
        }
        Console.WriteLine(" {0} occurred {1} time/s", array[i], a);
        // drop every occurrence of the element just counted, then recurse
        IEnumerable<int> num = from n in array where n != array[i] select n;
        a = 1;
        myNewArray = num.ToArray();
        break;
    }
    GetDuplicates(myNewArray);
}
Answer to 18:
You are taking an array of 9 elements starting from 0, so the maximum element in your array will be 6. Take the sum of the numbers from 0 to 6 and the sum of the array elements, and compute their difference (say d). This is p + q. Now take the XOR of the numbers from 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all elements from 0 to 6 except the two repeated elements, since those cancel each other out. Now, for each array element, take p to be that element a[i] and compute q = d - p. XOR p and q with x2 and check whether the result equals x1. Doing this for every element, you find the elements for which this condition holds, and you are done in O(n). Keep coding!
check this out ...
O(n) time and O(1) space complexity
int xor = 0, i;
for (i = 0; i < n; i++)
    xor = xor ^ arr[i];
for (i = 1; i <= n - 3; i++)
    xor = xor ^ i;
So in the given example you will get the xor of 3 and 5
xor = xor & -xor; // isolate the lowest set bit
int x = 0, y = 0;
for (i = 0; i < n; i++)
{
    if (arr[i] & xor)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for (i = 1; i <= n - 3; i++)
{
    if (i & xor)
        x = x ^ i;
    else
        y = y ^ i;
}
x and y are your answers
For each number: check if it exists in the rest of the array.
Without sorting, you're going to have to keep track of the numbers you've already visited.
In pseudocode this would basically be (done this way so I'm not just giving you the answer):
for each number in the list
    if number not already in unique numbers list
        add it to the unique numbers list
    else
        return that number as it is a duplicate
    end if
end for each
How about this:
for (i = 0; i < n-1; i++) {
    for (j = i+1; j < n; j++) {
        if (a[i] == a[j]) {
            printf("%d appears more than once\n", a[i]);
            break;
        }
    }
}
Sure it's not the fastest, but it's simple and easy to understand, and requires
no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)
In C:
int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
int num = 0, i;
for (i = 0; i < 9; i++)   /* XOR of all array elements */
    num = num ^ arr[i];
for (i = 0; i <= 6; i++)  /* XOR of the full range 0..6 */
    num = num ^ i;
Since x^x = 0, the numbers that appear an even number of times cancel out. Let's call the two duplicated numbers a and b. We are left with a^b. We know a^b != 0, since a != b. Choose any set bit of a^b and use it as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y & x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b end up in different sublists. So we can now apply the same method used above to each sublist independently, and discover what a and b are.
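A compact Python sketch of that split (my illustration, not this answer's code), again assuming values in [0, n-3] with two distinct values repeated:

def two_repeated_xor_split(arr):
    n = len(arr)
    x = 0
    for v in arr:
        x ^= v
    for v in range(n - 2):
        x ^= v                 # x is now a ^ b
    mask = x & -x              # any set bit of a ^ b works as the mask
    a = b = 0
    for v in arr:
        if v & mask:
            a ^= v
        else:
            b ^= v
    for v in range(n - 2):
        if v & mask:
            a ^= v
        else:
            b ^= v
    return a, b

print(two_repeated_xor_split([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)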
I have written a small program which finds the elements that are not repeated. Just go through it and let me know your opinion. At the moment I assume the number of elements is even, but it can easily be extended to odd counts as well.
My idea is to first sort the numbers and then apply my algorithm. Quicksort can be used for the sorting.
Let's take an input array as below:
int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
The numbers 2, 10 and 4 are not repeated, and they are already in sorted order; if not sorted, use quicksort to sort them first.
Let's apply my program to this:
#include <cstdio>
#include <vector>
using namespace std;

int main()
{
    //int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
    int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
    int n = sizeof(arr)/sizeof(arr[0]);
    int i = 0;
    vector<int> vec;
    int var = arr[0];
    for(i = 1; i < n; i += 2)
    {
        var = var ^ arr[i];
        if(var != 0)
        {
            // no pair found: remember the unpaired element
            var = arr[i-1];
            vec.push_back(var);
            i = i - 1;
        }
        if(i + 1 < n)          // guard against reading past the end of arr
            var = arr[i+1];
    }
    for(int j = 0; j < (int)vec.size(); j++)
        printf("value not repeated = %d\n", vec[j]);
    return 0;
}
This gives the output:
value not repeated = 2
value not repeated = 10
value not repeated = 4
It's simple and very straightforward: just use XOR. Note that this only works on a sorted array, since it compares adjacent elements.
for (i = 0; i < n - 1; i++) {
    if (!(arr[i] ^ arr[i+1]))   /* i.e. arr[i] == arr[i+1] */
        printf("Found repeated number %5d\n", arr[i]);
}
Here is an algorithm that uses order statistics and runs in O(n).
You can solve this by repeatedly calling SELECT with the median as parameter.
You also rely on the fact that after a call to SELECT, the elements that are less than or equal to the median are moved to the left of the median.
Call SELECT on A with the median as the parameter.
If the median value is floor(n/2), then the repeated values are to the right of the median, so you continue with the right half of the array.
Otherwise, a repeated value lies to the left of the median, so you continue with the left half of the array.
You continue this way recursively.
For example:
When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
After the first call to SELECT
A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
A={3, 2, 0, 1, 3}
After the second call to SELECT
A={1, 0, <2>, 3, 3}. The median should be 2, and it is, so we continue with the right half.
A={3, 3}, found.
This algorithm runs in O(n+n/2+n/4+...)=O(n).
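A Python sketch of this select-and-recurse idea (my interpretation, not the answer's code): an in-place quickselect plays the role of SELECT, and we look for the first index whose selected value falls below the value it would have if no duplicate occurred earlier, shrinking the working window each time. It assumes values lie in [0, n-3] and returns one of the two duplicated values, as in the worked example.

import random

def quickselect(a, lo, hi, k):
    # After the call, a[k] is the k-th smallest of a[lo..hi]; smaller-or-equal
    # elements sit to its left within the window, larger-or-equal to its right.
    while lo < hi:
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        if store == k:
            return
        if k < store:
            hi = store - 1
        else:
            lo = store + 1

def one_duplicate_by_select(a):
    # In sorted order, index i would hold the value i if no duplicate occurred
    # at or before i, so we search for the first index where i - value >= 1.
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        quickselect(a, lo, hi, mid)
        if mid - a[mid] >= 1:   # a duplicate lies at or before mid
            hi = mid
        else:                   # no duplicate up to mid, look right
            lo = mid + 1
    return a[lo]

print(one_duplicate_by_select([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # 3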
What about using a HyperLogLog (https://en.wikipedia.org/wiki/HyperLogLog)?
Redis provides one: http://redis.io/topics/data-types-intro#hyperloglogs
A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.
Using a nested for loop, and assuming the question is to find the numbers that occur exactly twice in the array:
def repeated(ar, n):
    count = 0
    for i in range(n):
        for j in range(i+1, n):
            if ar[i] == ar[j]:
                count += 1
        if count == 1:
            count = 0
            print("repeated:", ar[i])

arr = [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr, n)
Why should we do maths (especially solving quadratic equations)? Those are costly operations. A better way to solve this is to construct a bitmap of (n-2) bits, one per possible value 0 .. n-3, i.e. ((n-2)+7)/8 bytes. Use calloc for this memory so that every single bit is initialized to 0. Then traverse the list and set the corresponding bit for each value encountered; if the bit is already set to 1 for some number, that number is a repeated one.
This can be extended to find out whether there is any missing number in the array or not.
This solution is O(n) in time complexity.
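A minimal Python sketch of the bitmap idea (a bytearray standing in for the calloc'd C buffer; my illustration, assuming values in [0, n-3]):

def find_repeated_bitmap(arr):
    n = len(arr)
    bits = bytearray((n - 2 + 7) // 8)     # one bit per possible value, all 0
    repeated = []
    for v in arr:
        byte, mask = v // 8, 1 << (v % 8)
        if bits[byte] & mask:              # bit already set -> v was seen before
            repeated.append(v)
        else:
            bits[byte] |= mask
    return repeated

print(find_repeated_bitmap([2, 3, 6, 1, 5, 4, 0, 3, 5]))   # [3, 5]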
