dynamic programming reduction of brute force - algorithm

An emoticon consists of an arbitrary positive number of underscores between two semicolons. Hence, the shortest possible emoticon is ;_;. The strings ;__; and ;_____________; are also valid emoticons.
Given a string containing only the characters ; and _, the problem is to divide the string into one or more emoticons and count how many divisions are possible. Each emoticon must be a subsequence of the message, and each character of the message must belong to exactly one emoticon. Note that the subsequences are not required to be contiguous.
The approach I thought of is to write a recursive method as follows:
countDivision(string s){
    // base cases
    if (s.empty()) return 1;
    if (s.length() <= 3){
        if (s.length() != 3) return 0;
        return s[0]==';' && s[1]=='_' && s[2]==';';
    }
    result = 0;
    // subproblems: generate every valid emoticon that is a subsequence
    // of s; for each one, remove it from s and call the remainder w
    for each such w:
        result += countDivision(w);
    return result;
}
The solution above will easily time out when n is large, such as 100. What kind of approach should I use to convert this brute force solution to a dynamic programming solution?
A few examples:
1. ";_;;_____;" — the answer is 2
2. ";;;___;;;" — the answer is 36
Example 1.
";_;;_____;" Returns: 2
There are two ways to divide this string into two emoticons.
One looks as follows: ;_;|;_____; and the other looks like
this (remember we can pick a subsequence, it need not be contiguous): ;_ ;|; _____;

I'll describe an O(n^4)-time and -space dynamic programming solution (that can easily be improved to use just O(n^3) space) that should work for up to n=100 or so.
Call a subsequence "fresh" if it consists of a single ;.
Call a subsequence "finished" if it corresponds to an emoticon.
Call a subsequence "partial" if it has nonzero length and is a proper prefix of an emoticon. (So for example, ;, ;_, and ;___ are all partial subsequences, while the empty string, _, ;; and ;___;; are not.)
Finally, call a subsequence "admissible" if it is fresh, finished or partial.
Let f(i, j, k, m) be the number of ways of partitioning the first i characters of the string into exactly j+k+m admissible subsequences, of which exactly j are fresh, k are finished and m are partial (partial here meaning a proper prefix that contains at least one underscore, so that the three categories are disjoint). Notice that any prefix of a valid partition into emoticons determines i, j, k and m uniquely -- this means that no prefix of a valid partition will be counted by more than one tuple (i, j, k, m), so if we can guarantee that, for each tuple (i, j, k, m), the partition prefixes within that tuple are all counted once and only once, then we can add together the counts for tuples to get a valid total. Specifically, the answer to the question will then be the sum over all 1 <= k <= n of f(n, 0, k, 0).
If s[i] = "_":
f(i, j, k, m) =
(j+1) * f(i-1, j+1, k, m-1) // Convert any of the j+1 fresh subsequences to partial
+ m * f(i-1, j, k, m) // Add _ to any of the m partial subsequences
Else if s[i] = ";":
f(i, j, k, m) =
f(i-1, j-1, k, m) // Start a fresh subsequence
+ (m+1) * f(i-1, j, k-1, m+1) // Finish any of the m+1 partial subsequences
We also need the base cases
f(0, 0, 0, 0) = 1
f(0, j, k, m) = 0 for any (j, k, m) != (0, 0, 0)
f(i, j, k, m) = 0 if any of i, j, k or m are negative
My own C++ implementation gives the correct answer of 36 for ;;;___;;; in a few milliseconds, and e.g. for ;;;___;;;_;_; it gives an answer of 540 (also in a few milliseconds). For a string consisting of 66 ;s followed by 66 _s followed by 66 ;s, it takes just under 2s and reports an answer of 0 (probably due to overflow of the long long).
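For concreteness, here is a minimal sketch of this DP in C++ (my own illustration, not the answer's original code), rolling the i dimension away so that only two O(n^3) layers are kept. The indices follow the convention above: j = fresh, k = finished, m = partial; the helper names (countPartitions, at) are mine.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

long long countPartitions(const string& s) {
    int n = s.size(), dim = n + 2;
    // prev[j][k][m] = f(i-1, j, k, m); cur holds f(i, ., ., .), flattened into 1-D
    vector<long long> prev((size_t)dim * dim * dim, 0), cur(prev);
    auto at = [&](vector<long long>& t, int j, int k, int m) -> long long& {
        return t[((size_t)j * dim + k) * dim + m];
    };
    at(prev, 0, 0, 0) = 1;                                   // f(0, 0, 0, 0) = 1
    for (int i = 1; i <= n; ++i) {
        fill(cur.begin(), cur.end(), 0);
        for (int j = 0; j <= n; ++j)
            for (int k = 0; k <= n; ++k)
                for (int m = 0; m <= n; ++m) {
                    long long v = 0;
                    if (s[i - 1] == '_') {
                        if (m >= 1)                          // convert a fresh subsequence to partial
                            v += (j + 1) * at(prev, j + 1, k, m - 1);
                        v += m * at(prev, j, k, m);          // add _ to one of the m partials
                    } else {                                 // ';'
                        if (j >= 1)                          // start a fresh subsequence
                            v += at(prev, j - 1, k, m);
                        if (k >= 1)                          // finish one of the partials
                            v += (m + 1) * at(prev, j, k - 1, m + 1);
                    }
                    at(cur, j, k, m) = v;
                }
        swap(prev, cur);
    }
    long long total = 0;                                     // everything must be finished at the end
    for (int k = 1; k <= n; ++k) total += at(prev, 0, k, 0);
    return total;
}

int main() {
    cout << countPartitions(";;;___;;;") << "\n";            // prints 36
}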

Here's a fairly straightforward memoized recursion that returns an answer immediately for a string of 66 ;s followed by 66 _s followed by 66 ;s. The function has three parameters: i = index in the string, j = number of accumulating emoticons with only a left semi-colon, and k = number of accumulating emoticons with a left semi-colon and one or more underscores.
An array is also constructed for how many underscores and semi-colons are available to the right of each index, to help decide on the next possibilities.
Complexity is O(n^3) and the problem constrains the search space, where j is at most n/2 and k at most n/4.
Commented JavaScript code:
var s = ';_;;__;_;;';

// record the number of semicolons and
// underscores at or to the right of each index
var cs = new Array(s.length + 1);
cs[s.length] = 0;
var us = new Array(s.length + 1);
us[s.length] = 0;
for (var i = s.length - 1; i >= 0; i--) {
  if (s[i] == ';') {
    cs[i] = cs[i + 1] + 1;
    us[i] = us[i + 1];
  } else {
    us[i] = us[i + 1] + 1;
    cs[i] = cs[i + 1];
  }
}

// memoize
var h = {};

function f(i, j, k) {
  // memoization
  var key = [i, j, k].join(',');
  if (h[key] !== undefined) {
    return h[key];
  }
  // base case: a valid division leaves no unfinished emoticons
  if (i == s.length) {
    return j + k == 0 ? 1 : 0;
  }
  var a = 0,
      b = 0;
  if (s[i] == ';') {
    // if there are still enough semicolons to close the
    // open emoticons plus a newly started one
    if (cs[i] > j + k + 1) {
      // start a new emoticon
      a = f(i + 1, j + 1, k);
    }
    // close any of k partial emoticons
    if (k > 0) {
      b = k * f(i + 1, j, k - 1);
    }
  }
  if (s[i] == '_') {
    // if there are still extra underscores
    if (j < us[i] && k > 0) {
      // apply them to partial emoticons
      a = k * f(i + 1, j, k);
    }
    // convert started emoticons to partial
    if (j > 0) {
      b = j * f(i + 1, j - 1, k + 1);
    }
  }
  return h[key] = a + b;
}

console.log(f(0, 0, 0)); // 52

Related

Majority element of JPEG images using Divide and conquer

An array is said to have a majority element if more than half of its elements are the same. Is there a divide-and-conquer algorithm for determining if an array has a majority element?
I normally do the following, but it is not using divide-and-conquer. I do not want to use the Boyer-Moore algorithm.
int find(const int arr[], int size) {
    int count = 0, i, mElement = 0;
    for (i = 0; i < size; i++) {
        if (count == 0) mElement = arr[i];
        if (arr[i] == mElement) count++;
        else count--;
    }
    count = 0;
    for (i = 0; i < size; i++) {
        if (arr[i] == mElement) count++;
    }
    if (count > size / 2) return mElement;
    return -1;
}
I can see at least one divide and conquer method.
Start by finding the median, such as with Hoare's Select algorithm. If one value forms a majority of the elements, the median must have that value, so we've just found the value we're looking for.
From there, find (for example) the 25th and 75th percentile items. Again, if there's a majority element, at least one of those would need to have the same value as the median.
Assuming you haven't ruled out there being a majority element yet, you can continue the search. For example, let's assume the 75th percentile was equal to the median, but the 25th percentile wasn't.
We then continue searching for the item halfway between the 25th percentile and the median, as well as the one halfway between the 75th percentile and the end.
Continue bisecting each partition that must contain a boundary of the run of elements equal to the median, until you've either confirmed or denied the existence of a majority element.
As an aside: I don't quite see how Boyer-Moore would be used for this task. Boyer-Moore is a way of finding a substring in a string. (There is, however, also a Boyer-Moore majority vote algorithm, which is what the code in the question implements.)
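To make the first observation above concrete, here is a condensed sketch of my own (not the answerer's code), using std::nth_element as the Hoare-style selection. Note that once the median is in hand, a single counting pass already settles the question, so the percentile refinement described above is an optimization rather than a necessity; the name hasMajority is mine.

#include <algorithm>
#include <vector>

// If a majority element exists, it must equal the median: select the median
// in O(n) average time with a Hoare-style selection, then verify by counting.
bool hasMajority(std::vector<int> a, int& out) {
    if (a.empty()) return false;
    auto mid = a.begin() + a.size() / 2;
    std::nth_element(a.begin(), mid, a.end());
    long long cnt = std::count(a.begin(), a.end(), *mid);
    if (2 * cnt > (long long)a.size()) { out = *mid; return true; }
    return false;
}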
There is, and it does not require the elements to have an order.
To be formal, we're dealing with multisets (also called bags.) In the following, for a multiset S, let:
v(e;S) be the multiplicity of an element e in S, i.e. the number of times it occurs (the multiplicity is zero if e is not a member of S at all.)
#S be the cardinality of S, i.e. the number of elements in S counting multiplicity.
⊕ be the multiset sum: if S = L ⊕ R then S contains all the elements of L and R counting multiplicity, i.e. v(e;S) = v(e;L) + v(e;R) for any element e. (This also shows that the multiplicity can be calculated by 'divide-and-conquer'.)
[x] be the largest integer less than or equal to x.
The majority element m of S, if it exists, is that element such that 2 v(m;S) > #S.
Let's call L and R a splitting of S if L ⊕ R = S, and an even splitting if |#L - #R| ≤ 1. That is, if n=#S is even, L and R have exactly half the elements of S, and if n is odd, then one has cardinality [n/2] and the other has cardinality [n/2]+1.
For an arbitrary split of S into L and R, two observations:
If neither L nor R has a majority element, then S cannot have one either: for any element e, 2 v(e;S) = 2 v(e;L) + 2 v(e;R) ≤ #L + #R = #S.
If one of L and R has a majority element m with multiplicity k, then it is the majority element of S only if it has multiplicity r in the other half, with 2(k+r) > #S.
The algorithm majority(S) below returns either a pair (m,k), indicating that m is the majority element with k occurrences, or none:
If S is empty, return none; if S has just one element m, then return (m,1). Otherwise:
Make an even split of S into two halves L and R.
Let (m,k) = majority(L), if not none:
a. Let k' = k + v(m;R).
b. Return (m,k') if 2 k' > n.
Otherwise let (m,k) = majority(R), if not none:
a. Let k' = k + v(m;L).
b. Return (m,k') if 2 k' > n.
Otherwise return none.
Note that the algorithm is still correct even if the split is not an even one. Splitting evenly though is likely to perform better in practice.
Addendum
Made the terminal case explicit in the algorithm description above. Some sample C++ code:
struct majority_t {
    int m;     // majority element
    size_t k;  // multiplicity of m; zero => no majority element
    constexpr majority_t(): m(0), k(0) {}
    constexpr majority_t(int m_, size_t k_): m(m_), k(k_) {}
    explicit operator bool() const { return k > 0; }
};
static constexpr majority_t no_majority;

size_t multiplicity(int x, const int *arr, size_t n) {
    if (n == 0) return 0;
    else if (n == 1) return arr[0] == x ? 1 : 0;
    size_t r = n / 2;
    return multiplicity(x, arr, r) + multiplicity(x, arr + r, n - r);
}

majority_t majority(const int *arr, size_t n) {
    if (n == 0) return no_majority;
    else if (n == 1) return majority_t(arr[0], 1);
    size_t r = n / 2;
    majority_t left = majority(arr, r);
    if (left) {
        left.k += multiplicity(left.m, arr + r, n - r);
        if (left.k > r) return left;   // k > floor(n/2) is exactly 2k > n
    }
    majority_t right = majority(arr + r, n - r);
    if (right) {
        right.k += multiplicity(right.m, arr, r);
        if (right.k > r) return right; // same majority test
    }
    return no_majority;
}
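A quick hypothetical driver for the code above (mine, for illustration):

#include <iostream>

int main() {
    int a[] = {1, 2, 1, 1, 3, 1, 1};   // 1 occurs 5 times out of 7
    majority_t r = majority(a, 7);
    if (r) std::cout << r.m << " occurs " << r.k << " times\n"; // 1 occurs 5 times
    else   std::cout << "no majority\n";
}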
A simpler divide and conquer algorithm works when more than half of the elements are the same and there are n = 2^k elements for some integer k. The version below adds a verification count in the combine step, since a half can otherwise report a candidate that is not a majority of the whole range:
FindMost(A, startIndex, endIndex)
{ // input array A
    if (startIndex == endIndex) // base case
        return A[startIndex];
    mid = (startIndex + endIndex - 1)/2;
    x = FindMost(A, startIndex, mid);
    y = FindMost(A, mid + 1, endIndex);
    // verify the candidates against the whole range; Count is a linear scan
    len = endIndex - startIndex + 1;
    if (x != null && 2 * Count(A, startIndex, endIndex, x) > len)
        return x;
    if (y != null && 2 * Count(A, startIndex, endIndex, y) > len)
        return y;
    return null;
}
This algorithm could be modified so that it works for n which is not a power of 2, but boundary cases must be handled carefully.
Let's say the array is 1, 2, 1, 1, 3, 1, 4, 1, 6, 1.
If an array contains more than half of its elements the same, then there should be a position where two consecutive elements are the same. (This is guaranteed for even n; for odd n a majority can avoid adjacency, e.g. 1, 2, 1, 2, 1, 2, 1, 2, 1.)
In the above example, observe that 1 is repeated more than half the time, and indexes 2 and 3 (indexing from 0) hold the same element.

Max suffix of a list

This problem is to find the lexicographically maximal suffix of a given list.
Suppose we have an array/list [e1;e2;e3;e4;e5].
Then all suffixes of [e1;e2;e3;e4;e5] are:
[e1;e2;e3;e4;e5]
[e2;e3;e4;e5]
[e3;e4;e5]
[e4;e5]
[e5]
Then our goal is to find the lexicographically greatest one among the above 5 lists.
For example, all suffixes of [1;2;3;1;0] are
[1;2;3;1;0]
[2;3;1;0]
[3;1;0]
[1;0]
[0].
The lexicographically max suffix in the above example is [3;1;0].
The straightforward algorithm is just to compare all suffixes one by one and always record the max. The time complexity is O(n^2), as comparing two lists needs O(n).
However, the desired time complexity is O(n), and no suffix tree (and no suffix array either) should be used.
Please note that elements in the list may not be distinct.
int max_suffix(const vector<int> &a)
{
    int n = a.size(),
        i = 0,
        j = 1,
        k;
    while (j < n)
    {
        for (k = 0; j + k < n && a[i + k] == a[j + k]; ++k);
        if (j + k == n) break;
        (a[i + k] < a[j + k] ? i : j) += k + 1;
        if (i == j)
            ++j;
        else if (i > j)
            swap(i, j);
    }
    return i;
}
My solution is a little modification of the solution to the problem Minimum Rotations.
In the above code, each time execution enters the loop the invariant i < j holds, and every a[p...n] (0 <= p < j && p != i) is already known not to be the max suffix. Then, in order to decide which of a[i...n] and a[j...n] is lexicographically smaller, the for-loop finds the least k that makes a[i+k] != a[j+k], and i and j are updated according to k.
We can skip k elements for i or j and still keep it true that no a[p...n] (0 <= p < j && p != i) is the max suffix. For example, if a[i+k] < a[j+k], then a[i+p...n] (0 <= p <= k) is not the max suffix, since a[j+p...n] is lexicographically greater than it.
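As a quick check, here is a hypothetical driver (mine) for max_suffix above on the example from the question:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> a = {1, 2, 3, 1, 0};
    cout << max_suffix(a) << endl; // prints 2: the max suffix [3;1;0] starts at index 2
}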
Imagine a two-player game in which two opponents A and B work against each other on finding the max suffix of a given string s. Whoever first finds the max suffix wins the game. In the first round, A picks suffix s[i..], and B picks suffix s[j..].
i: _____X
j: _____Y
Matched length = k
A judge compares the two suffixes and finds a mismatch after k comparisons, as shown in the figure above.
Without loss of generality, we assume X > Y; then B loses this round. So he has to pick a different suffix in order to (possibly) beat A in the next round. If B is smart, he will not pick any suffix starting at position j, j + 1, ..., j + k, because s[j..] is already beaten by s[i..] and he knows s[j+1..] will be beaten by s[i+1..], s[j+2..] by s[i+2..], and so on. So B should pick suffix s[j + k + 1..] for the next round. One extra observation is that B should not pick the same suffix as A either, because the first person who finds the max suffix wins the game. If j + k + 1 happens to be equal to i, B should skip to the next position.
Finally, after many rounds, either A or B will run out of choices and lose the game, because the number of choices is limited for both players and some choices are eliminated after each round.
When this happens, the current suffix that the winner holds is the max suffix. (Remember the loser has run out of choices. A choice is given up because either it cannot possibly be the max suffix, or it is currently held by the other player. So the only reason the loser gives up the actual max suffix in some round is that his opponent is holding it. Once a player holds the max suffix, he will never lose and give it up.)
The program below, in C++, is an almost literal translation of this game.
int maxSuffix(const std::string& s) {
    std::size_t i = 0, j = 1, k;
    while (i < s.size() && j < s.size()) {
        for (k = 0; i + k < s.size() && j + k < s.size() && s[i + k] == s[j + k]; ++k) { } // judge
        if (j + k >= s.size()) return i; // B is finally lost
        if (i + k >= s.size()) return j; // A is finally lost
        if (s[i + k] > s[j + k]) { // B is lost in this round so he needs a new choice
            j = j + k + 1;
            if (j == i) ++j;
        } else { // A is lost in this round so he needs a new choice
            i = i + k + 1;
            if (i == j) ++i;
        }
    }
    return j >= s.size() ? i : j;
}
Running time analysis: initially each player has n choices. After each round, the judge makes k comparisons, and at least k possible choices are eliminated from either A or B. So the total number of comparisons is bounded by 2n when the game is over.
The discussion above is in the context of string, but it should work with minor modification on any container that supports sequential access only.

Generating M distinct random numbers (one at a time) from a given range 0..N-1 in less than O(M) memory

Is there any method to do this?
I mean, we cannot even work with an array of {0,1,...,N-1} (because that's at least O(N) memory).
M can be equal to N. N can be > 2^64. The result should be uniformly random, and ideally every possible sequence would be obtainable (but it may not be).
Also, full-range PRNGs (and friends) aren't suitable, because they will give the same sequence each time.
Time complexity doesn't matter.
If you don't care what order the random selection comes out in, then it can be done in constant memory. The selection comes out in order.
The answer hinges on estimating the probability that the smallest value in a random selection of M distinct values from the set {0, ..., N-1} is i, for each possible i. Call this value p(i, M, N). With more mathematics than I have the patience to type into an interface which doesn't support LaTeX, you can derive some pretty good estimates for the p function; here, I'll just show the simple, non-time-efficient approach.
Let's just focus on p(0, M, N), which is the probability that a random selection of M out of N objects will include the first object. Then we can iterate through the objects (that is, the numbers 0...N-1) one at a time; deciding for each one whether it is included or not by flipping a weighted coin. We just need to compute the coin's weights for each flip.
By definition, there are C(N,M) possible M-selections of a set of N objects. Of these, C(N-1,M) do not include the first element. (That's the count of M-selections of N-1 objects, which is all the M-selections of the set missing one element.) Similarly, C(N-1,M-1) selections do include the first element (that is, all the (M-1)-selections of the (N-1)-set, with the first element added to each selection).
These two values add up to C(N,M); that is the well-known recursion C(N,M) = C(N-1,M) + C(N-1,M-1) for computing binomial coefficients.
So p(0, M, N) is just C(N-1,M-1)/C(N,M). Since C(N,M) = N!/(M!*(N-M)!), we can simplify that fraction to M/N. As expected, if M == N, that works out to 1 (a selection of N of N objects must include every object).
So now we know what the probability that the first object will be in the selection. We can then reduce the size of the set, and either reduce the remaining selection size or not, depending on whether the coin flip determined that we did or did not include the first object. So here's the final algorithm, in pseudo-code, based on the existence of the weighted random boolean function:
w(x, y) => true with probability x / y; otherwise false.
I'll leave the implementation of w for the reader, since it's trivial.
So:
Generate a random M-selection from the set 0...N-1
Parameters: M, N

Set i = 0
while M > 0:
    if w(M, N):
        output i
        M = M - 1
    N = N - 1
    i = i + 1
It might not be immediately obvious that that works, but note that:
the output i statement must be executed exactly M times, since it is coupled with a decrement of M, and the while loop executes until M is 0
The closer M gets to N, the higher the probability that M will be decremented. If we ever get to the point where M == N, then both will be decremented in lockstep until they both reach 0.
i is incremented exactly when N is decremented, so it must always be in the range 0...N-1. In fact, it's redundant; we could output N-1 instead of outputting i, which would change the algorithm to produce sets in decreasing order instead of increasing order. I didn't do that because I think the above is easier to understand.
The time complexity of that algorithm is O(N+M) which must be O(N). If N is large, that's not great, but the problem statement said that time complexity doesn't matter, so I'll leave it there.
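For illustration, here is a compact sketch of the algorithm in C++ (mine, not the answerer's), including one possible w. It assumes M <= N and 64-bit N for simplicity; the question allows N > 2^64, which would need a big-integer type and an arbitrary-range random source. The names w and selectSubset are my own.

#include <cstdint>
#include <iostream>
#include <random>

// w(x, y): true with probability x / y.
bool w(uint64_t x, uint64_t y, std::mt19937_64& rng) {
    std::uniform_int_distribution<uint64_t> d(0, y - 1);
    return d(rng) < x; // exactly x of the y equally likely values
}

// Output a uniformly random M-selection of {0, ..., N-1}, in increasing
// order, using O(1) working memory. Assumes M <= N.
void selectSubset(uint64_t M, uint64_t N, std::mt19937_64& rng) {
    for (uint64_t i = 0; M > 0; ++i, --N) {
        if (w(M, N, rng)) { // include i with probability M / N
            std::cout << i << "\n";
            --M;
        }
    }
}

int main() {
    std::mt19937_64 rng(std::random_device{}());
    selectSubset(5, 100, rng); // five distinct values from 0..99, ascending
}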
PRNGs that don't map their state space to a lower number of bits for output should work fine. Examples include Linear Congruential Generators and Tausworthe generators. They will give the same sequence if you use the same seed to start them, but that's easy to change.
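As a sketch of the LCG case (my illustration, not part of the answer): with modulus 2^64, an odd increment and a multiplier congruent to 1 mod 4, the Hull-Dobell theorem gives full period, so the generator visits every 64-bit value exactly once and successive outputs are automatically distinct in O(1) memory. Varying the increment as well as the seed changes the sequence itself, not just the starting point; values >= N can simply be skipped to restrict the output to 0..N-1.

#include <cstdint>

// Full-period 64-bit LCG: x' = a*x + c (mod 2^64), with c odd and a % 4 == 1.
struct FullPeriodLCG {
    uint64_t x, c;
    static constexpr uint64_t a = 6364136223846793005ULL; // Knuth's MMIX multiplier
    FullPeriodLCG(uint64_t seed, uint64_t inc) : x(seed), c(inc | 1) {}
    uint64_t next() { return x = a * x + c; }             // wraps mod 2^64
};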
Brute force:
If time complexity doesn't matter, this would be a solution under the invariant 0 < M <= N. nextRandom(N) is a function which returns a random integer in [0..N):
init() {
    for (int idx = 0; idx < M; idx++) { // the window only needs M slots
        a[idx] = -1;
    }
    for (int idx = 0; idx < M; idx++) {
        getNext();
    }
}

int getNext() {
    for (int idx = 1; idx < M; idx++) { // shift the window left by one
        a[idx - 1] = a[idx];
    }
    while (true) {
        r = nextRandom(N);
        idx = 0;
        while (idx < M && a[idx] != r) idx++; // reject r if already in the window
        if (idx == M) {
            a[idx - 1] = r;
            return r;
        }
    }
}
O(M) solution: it is a recursive solution, for simplicity. It assumes nextRandom() returns a random real in [0..1) and, as above, nextRandom(n) returns a random integer in [0..n):
rnd(0, 0, N, M); // to get the next M distinct random numbers

int rnd(int idx, int n1, int n2, int m) {
    if (n1 >= n2 || m <= 0) return idx;
    int r = nextRandom(n2 - n1) + n1; // pivot in [n1..n2)
    int m1 = (int) ((m - 1.0) * (r - n1) / (n2 - n1) + nextRandom()); // gives [0..m-1]
    int m2 = m - m1 - 1;
    idx = rnd(idx, n1, r, m1);          // left sub-range [n1..r)
    print r;
    return rnd(idx + 1, r + 1, n2, m2); // right sub-range [r+1..n2)
}
The idea is to select a random r in [0..N) in the first step, which splits the range into two sub-ranges of N1 and N2 elements (N1+N2 == N-1). We then repeat the same step for [0..r), which has N1 elements, and for [r+1..N) (N2 elements), choosing M1 and M2 (M1+M2 == M-1) so that M1/M2 == N1/N2 in expectation. M1 and M2 must be integers, but the proportion can give real results, so we need to round values probabilistically (1.2 will give 1 with p=0.8 and 2 with p=0.2, etc.).
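The probabilistic rounding step might look like this (a minimal sketch of my own, assuming a std::mt19937_64 engine is available; the name probRound is mine):

#include <cmath>
#include <random>

// Round v down or up so that the expected result equals v:
// e.g. 1.2 -> 1 with p = 0.8 and 2 with p = 0.2.
int probRound(double v, std::mt19937_64& rng) {
    double fl = std::floor(v);
    std::bernoulli_distribution up(v - fl); // P(round up) = fractional part
    return (int)fl + (up(rng) ? 1 : 0);
}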

counting boolean parenthesizations implementation

Given a boolean expression containing the symbols {true, false, and, or, xor}, count the number of ways to parenthesize the expression such that it evaluates to true.
For example, there are 2 ways to parenthesize 'true and false xor true' such that it evaluates to true: both (true and false) xor true and true and (false xor true) evaluate to true.
Here is my algorithm
We can calculate the total number of parenthesizations of a string.
Definitions:
N - the total number of parenthesizations
True - the number of parenthesizations that evaluate to true
False - the number of parenthesizations that evaluate to false
True + False = N
Left_True - the number of parenthesizations of the left part that evaluate to true
similarly for Left_False, Right_True, Right_False
We iterate the input string from left to right and deal with each operator as follows:
if it is "and", the number of parenthesization leads to true is
Left_True * Right_True;
if it is "xor", the number of parenthesization leads to true
Left_True * Right_False + Left_False * Right_True
if it is 'or', the number is
N - Left_False * Right_False
Here is my pseudocode:
n = number of operators within the string
int[n][n] M; // saves the number of ways to evaluate to true
for l = 2 to n
    for i = 1 to n-l+1
        j = i+l-1
        // here we have subexpressions of every size from 2 to n, starting at i and ending at j
        for k = i to j-1
            // (i, k-1) is the left part
            // (k+1, j) is the right part, and operator k is applied last
            switch (operator[k]) {
                case 'and': // calculate, update array M
                case 'or':  // same
                case 'xor': // same
            }
We save all the solutions to subproblems and read them when we meet them again, thus saving time.
Can we have a better solution?
Your pseudocode gives an algorithm in O(2^n). I think you can have something in O(n^3).
First of all, let's see the complexity of your algorithm. Let's say that the number of operations needed to check the parenthesization is T(n). If I understood well, your algorithm consists of:
Cut the expression in two (n-1 possibilities)
Check if the left and the right part have appropriate parenthesization.
So T(n) = checking if you cut at the first place + checking if you cut at the second place + ... + checking if you cut at the last place
T(n) = T(1)+T(n-1) + T(2)+T(n-2) + ... + T(n-1)+T(1) + n
A bit of computation will tell you that T(n) = 2^n*T(1) + O(n^2) = O(2^n)
My idea is that all you need to check for parenthesization are the "subwords". The "subword_i_j" consists of all the literals between position i and position j. Of course i<j, so you have N*(N-1)/2 subwords. Let's say that L[i][j] is the number of valid parenthesizations of subword_i_j. For the sake of convenience, I'll forget the other values M[i][j] that count the parenthesizations that lead to false, but don't forget that they're here!
You want to compute all the possible subwords starting from the smallest ones (size 1) to the biggest one (size N).
You begin by computing L[i][i] for all i. There are N such values. It's easy: if the i-th literal is True then L[i][i]=1, else L[i][i]=0. Now you know the number of parenthesizations for all subwords of size 1.
Let's say that you know the parenthesizations for all subwords of size S.
Then compute L[i][i+S] for i between 1 and N-S. These are subwords of size S+1. It consists of splitting the subword in all possible ways (S ways), and checking whether the left part (which is a subword of size S1<=S), the right part (which is of size S2<=S) and the operator in between (or, xor, and) are compatible. There are N-S such values, and computing them all takes S*(N-S) operations.
Finally, you'll end up with L[1][N], which will tell you the number of valid parenthesizations.
The cost is:
checking subwords of size 1 + checking subwords of size 2 + ... + checking subwords of size N
= N + 1*(N-1) + 2*(N-2) + ... + (N-1)*1
= O(N^3)
The reason the complexity is better is that in your pseudocode, you check multiple times the same subwords without storing the result in memory.
Edit: Argh, I overlooked the sentence "we save all the solutions to subproblems and read them when we meet them again, thus saving time". Well, it seems that if you do, you also have an algorithm in worst-case O(N^3). I don't think you can do much better than that...
This problem can be solved by dynamic programming, and it is similar to the matrix chain multiplication problem. Here is a detailed answer:
1. Let the expression consist of operands a_i and operators b_j (1<=i<=n, 1<=j<=n-1, where n is the number of operands); write 1 for true and 0 for false.
2. Let DPone[i][j] be the number of ways to parenthesize {a_i b_i a_i+1 ... b_j-1 a_j} such that the result is 1, and let DPzero[i][j] be the number of ways to parenthesize {a_i b_i a_i+1 ... b_j-1 a_j} such that the result is 0.
3. Build a function oper(i,j,k) whose return value is the number of ways such that the result is 1 when b_k is the last operator applied in {a_i b_i a_i+1 ... b_j-1 a_j}; the computation depends directly on b_k. For example, if b_k is and, the return value is DPone[i][k]*DPone[k+1][j].
4. Now the DP equation is as follows:
DPone[i][j] = sum of oper(i,j,k) over i<=k<=j-1
so we just need to determine DPone[1][n]. The complexity is O(n^3).
Notes:
1. We should determine DPzero[i][j] after determining DPone[i][j], but it's simple: DPzero[i][j] = total_Parenthesize_Ways[i][j] - DPone[i][j].
2. The order in which to fill DPone is [1][1],[2][2],...,[n][n],[1][2],[2][3],...,[n-1][n],[1][3],[2][4],...,[2][n],[1][n]; of course, [1][1]~[n][n] should be initialized by ourselves.
Here is the code for counting parenthesizations for an array of booleans and operators.
Time complexity is O(N^3) and space complexity is O(N^2).
public static int CountingBooleanParenthesizations(bool[] boolValues, string[] operators)
{
    int[,] trueTable = new int[boolValues.Length, boolValues.Length];
    int[,] falseTable = new int[boolValues.Length, boolValues.Length];
    for (int j = 0; j < boolValues.Length; j++)
    {
        for (int i = j; i >= 0; i--)
        {
            if (i == j)
            {
                trueTable[i, j] = boolValues[i] ? 1 : 0;
                falseTable[i, j] = boolValues[i] ? 0 : 1;
            }
            else
            {
                int trueSum = 0;
                int falseSum = 0;
                for (int k = i; k < j; k++)
                {
                    int total1 = trueTable[i, k] + falseTable[i, k];
                    int total2 = trueTable[k + 1, j] + falseTable[k + 1, j];
                    switch (operators[k])
                    {
                        case "or":
                        {
                            int or = falseTable[i, k] * falseTable[k + 1, j];
                            falseSum += or;
                            or = total1 * total2 - or;
                            trueSum += or;
                        }
                        break;
                        case "and":
                        {
                            int and = trueTable[i, k] * trueTable[k + 1, j];
                            trueSum += and;
                            and = total1 * total2 - and;
                            falseSum += and;
                        }
                        break;
                        case "xor":
                        {
                            int xor = trueTable[i, k] * falseTable[k + 1, j] + falseTable[i, k] * trueTable[k + 1, j];
                            trueSum += xor;
                            xor = total1 * total2 - xor;
                            falseSum += xor;
                        }
                        break;
                    }
                }
                trueTable[i, j] = trueSum;
                falseTable[i, j] = falseSum;
            }
        }
    }
    return trueTable[0, boolValues.Length - 1];
}

Minimal cyclic shift algorithm explanation

I have recently come up against this code, which lacks any comments. It finds the minimal cyclic shift of a word (this code specifically returns its index in the string), and it is called Duval's algorithm. The only info I found describes the algorithm in a few words and has cleaner code. I would appreciate any help in understanding this algorithm. I have always found text algorithms pretty tricky and rather hard to understand.
int minLexCyc(const char *x) {
    int i = 0, j = 1, k = 1, p = 1, a, b, l = strlen(x);
    while (j + k <= (l << 1)) {
        if ((a = x[(i + k - 1) % l]) > (b = x[(j + k - 1) % l])) {
            i = j++;
            k = p = 1;
        } else if (a < b) {
            j += k;
            k = 1;
            p = j - i;
        } else if (a == b && k != p) {
            k++;
        } else {
            j += p;
            k = 1;
        }
    }
    return i;
}
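For reference, a quick hypothetical check of mine of what the function returns (assuming, per the discussion below, that returning i is the intended behavior):

#include <cstdio>

int main() {
    printf("%d\n", minLexCyc("bca")); // prints 2: the minimal rotation "abc" starts at index 2
}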
First, I believe that your code has a bug in it. The last line should be return p;. I believe that i holds the index of the lexicographically smallest cyclic shift, and p holds the smallest shift that matches. I also think that your stopping condition is too weak, i.e. you are doing too much checking after you have found a match, but I am not sure exactly what it should be.
Note that i and j only advance and that i is always less than j. We are looking for a string that matches the string starting at i, and we are trying to match it with a string that starts at j. We do this by comparing the k'th character of each string while increasing k (as long as they match). Note that we only change i if we determine that the string starting at j is lexicographically less than the string starting at i, and then we set i to j and reset k and p to their initial values.
I do not have time for a detailed analysis, but it looks like
i = the start of the lexicographically smallest cyclic shift
j = the start of the cyclic shift we are matching against the shift starting at i
k = the position currently under consideration in the strings starting at i and j (the strings match in positions 1 to k-1)
p = the cyclic shift under consideration (I believe p stands for prefix)
Edit: Going further
This section of code:
if ((a=x[(i+k-1)%l])>(b=x[(j+k-1)%l])) {
    i=j++;
    k=p=1;
Moves the start of the comparison to a lexicographically earlier string when we find one and reinitializes everything else.
This section:
} else if (a<b) {
    j+=k;
    k=1;
    p=j-i;
is the tricky part. We have found a mismatch that is lexicographically later than our reference string, so we skip to the end of the text matched so far and start matching from there. We also increase p (our stride). Why can we skip over all the starting points between j and j + k? This is because the string starting at i is the lexicographically smallest seen, and if the tail of the current j string is greater than the string at i, then any suffix of the string at j will be greater than the string at i.
Finally,
} else if (a==b && k!=p) {
    k++;
} else {
    j+=p;
    k=1;
this just checks that the string of length p starting at i repeats.
Further edit
We do this by incrementing k until k == p, checking that the k'th character of the string starting at i equals the k'th character of the string starting at j. Once k reaches p we start scanning again at the next supposed occurrence of the string.
Even further edit to attempt to answer jethro's questions.
First: the k != p in else if (a==b && k!=p) Here we have a match in that the k'th and all previous characters in the strings starting at i and j are equal. The variable p represents the length that we think that the repeating string is. When k != p, actually k < p, so we are ensuring that the p characters at the string beginning at i are the same as the p characters of the string beginning at j. When k == p (the final else) we should be at a point where the string starting at j + k looks the same as the string starting at j, so we increase j by p and set k back to 1 and go back to comparing the two strings.
Second: Yes, I believe you are correct, it should return i. I was misunderstanding the meaning of "Minimum Cyclic Shift"
It may be the same as this algorithm, whose explanation can be found here:
int ComputeMaxSufPos(string w)
{
    int i = 0, n = w.Length;
    for (int j = 1; j < n; ++j)
    {
        int c, k = 0;
        while ((c = w[(i + k) % n].CompareTo(w[(j + k) % n])) == 0 && k != n)
        { k++; }
        j += c > 0 ? k / (j - i) * (j - i) : k;
        i = c > 0 ? j : i;
    }
    return i;
}
