While converting a project from Python to C#, I noticed some interesting differences between the syntax families, but I got stuck, unable to understand the dissimilar behavior of the comparison operators in C#.
While satisfying this curiosity, I considered a few languages of the C-syntax family (C, C++, C#, Java, JavaScript) and verified the behavior. Here is how it transpired:
Let a=2, b=3, c=4, d=5;
Now, consider the following expressions:
a < a < a // returns true
c < b < a // returns true
c > b > a // returns false
a < c > b // returns false
If this were due to right-associativity, then the following JavaScript code shouldn't behave the way it does:
console.info(a < false); // returns false
console.info(a < a); // returns false
console.info(a < a < a); // returns true, as opposed to returning false
Here is the C/C++ version
#include <stdio.h>
int main(){
int a=2, b=3, c=4, d=5;
printf("%s\n","false\0true"+6*(a < a < a)); // returns true
printf("%s\n","false\0true"+6*(c < b < a)); // returns true
printf("%s\n","false\0true"+6*(c > b > a)); // returns false
printf("%s\n","false\0true"+6*(a < c > b)); // returns false
return 0;
}
Except in Python, where
a < a < a // returns false
c < b < a // returns false
c > b > a // returns true
a < c > b // returns true
Can anyone explain why the C-family languages and Python compute these expressions differently?
Because Python uses a slightly different interpretation of your input:
Formally, if a, b, c, ..., y, z are expressions and op1, op2, ..., opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... y opN z, except that each expression is evaluated at most once.
This means your lines will be interpreted as
a < a < a = a < a and a < a // returns false
c < b < a = c < b and b < a // returns false
c > b > a = c > b and b > a // returns true
a < c > b = a < c and c > b // returns true
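You can see both the chaining and the "evaluated at most once" rule by comparing against a value produced by a function with a side effect (the noisy helper below is purely illustrative):
def noisy(x):
    # purely illustrative helper: returns x, but reports every evaluation
    print("evaluating", x)
    return x

a, b, c = 2, 3, 4
print(a < noisy(b) < c)                # "evaluating 3" printed once, then True
print(a < noisy(b) and noisy(b) < c)   # "evaluating 3" printed twice, then True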
In C-style languages, a comparison expression will evaluate to either false (integer value 0) or true (integer value 1). So in C it will behave like
a < a < a = (a < a) < a = 0 < a // returns true
c < b < a = (c < b) < a = 0 < a // returns true
c > b > a = (c > b) > a = 1 > a // returns false
a < c > b = (a < c) > b = 0 > b // returns false
Note that almost all languages define comparison operators with a boolean return value, but since boolean values can be implicitly converted to zero or one, the proposition above is still valid:
// C++ example
struct myComparableObject{
int data;
bool operator<(const myComparableObject& o){
return data < o.data;
}
};
myComparableObject a, b;
a.data = 2;
b.data = 3;
int c = 5;
a < b; // true
a < c; // error, a cannot be converted to int / unknown operator
a.data < c; // true
a < b < c; // true, as this is equal to
// (a < b) < c = true < c = 1 < c
For example JavaScript will actually use ToNumber in order to compare two non-string objects, see [ECMAScript p78, 11.8.5 The Abstract Relational Comparison Algorithm], where ToNumber(false) is zero and ToNumber(true) === 1.
The comparison x < y, where x and y are values, produces true, false, or undefined[...]
Let px be the result of calling ToPrimitive(x, hint Number).
Let py be the result of calling ToPrimitive(y, hint Number).
If it is not the case that both Type(px) is String and Type(py) is String, then
a. Let nx be the result of calling ToNumber(px). Because px and py are primitive values evaluation
order is not important.
b. Let ny be the result of calling ToNumber(py).
c. If nx is NaN, return undefined.
d. If ny is NaN, return undefined.
e. If nx and ny are the same Number value, return false.
f. If nx is +0 and ny is -0, return false.
g. If nx is -0 and ny is +0, return false.
h. If nx is +∞, return false.
i. If ny is +∞, return true.
j. If ny is -∞, return false.
k. If nx is -∞, return true.
l. If the mathematical value of nx is less than the mathematical value of ny —note that these
mathematical values are both finite and not both zero— return true. Otherwise, return false.
It's because a < a < a is evaluated as ((a < a) < a), which then becomes 0 < a, which is true (when a > 0).
If you run the same program with a = 0, the first expression changes to false (and the second and third results flip as well):
#include <stdio.h>
int main(){
int a=0, b=3, c=4, d=5;
printf("%s\n","false\0true"+6*(a < a < a)); // returns false
printf("%s\n","false\0true"+6*(c < b < a)); // returns false
printf("%s\n","false\0true"+6*(c > b > a)); // returns true
printf("%s\n","false\0true"+6*(a < c > b)); // returns false
return 0;
}
Whereas in Python, a < a < a becomes a < a and a < a. So it's not comparing the result of a < a. If you add parentheses to your statements, you'll get the C-like behavior again.
(a < a) < a ## returns true
(c < b) < a ## returns true
(c > b) > a ## returns false
(a < c) > b ## returns false
I'm trying to prove the gcd algorithm with Dafny and it's apparently not that simple. What I have so far (not much, indeed) is a functional specification, and Dafny manages to prove that compute_gcd behaves like it. However, when I uncomment lines [1], [2] and [3], Dafny fails to prove that the functional specification has the desired properties:
function gcd(a: int, b: int) : (result : int)
requires a > 0
requires b > 0
// [1] ensures (exists q1:int :: (q1 * result == a))
// [2] ensures (exists q2:int :: (q2 * result == b))
// [3] ensures forall d :int, q1:int, q2:int :: ((q1*d==a)&&(q2*d==b)) ==> (exists q3:int :: (q3*d == result))
{
if (a > b) then gcd(a-b,b) else
if (b > a) then gcd(a,b-a) else a
}
method compute_gcd(a: int, b: int) returns (result: int)
requires a > 0
requires b > 0
ensures result == gcd(a,b)
{
var x := a;
var y := b;
while (x != y)
decreases x+y
invariant x > 0
invariant y > 0
invariant gcd(x,y) == gcd(a,b)
{
if (x > y) { x := x - y; }
if (y > x) { y := y - x; }
}
return x;
}
Am I going in the right direction?
Any help is very much appreciated, thanks!
I managed to prove a weaker gcd specification (permalink here), but I'm still having a hard time with property [3] above:
function gcd(a: int, b: int) : (result : int)
requires a > 0
requires b > 0
// [1] ensures (exists q1:int :: (q1 * result == a))
// [2] ensures (exists q2:int :: (q2 * result == b))
{
if (a > b) then gcd(a-b,b) else
if (b > a) then gcd(a,b-a) else a
}
lemma gcd_correct(a: int, b: int)
requires a > 0
requires b > 0
ensures (exists q1:int :: (q1 * gcd(a,b) == a))
ensures (exists q2:int :: (q2 * gcd(a,b) == b))
{
if (a > b)
{
gcd_correct(a-b, b);
var q1 :| q1 * gcd(a-b,b) == a-b;
var q2 :| q2 * gcd(a-b,b) == b;
assert (q1+q2) * gcd(a,b) == a;
}
else if (b > a)
{
gcd_correct(a,b-a);
var q1 :| q1 * gcd(a,b-a) == a;
var q2 :| q2 * gcd(a,b-a) == b-a;
assert (q2+q1) * gcd(a,b) == b;
}
else
{
assert 1 * gcd(a,b) == a;
}
}
method compute_gcd(a: int, b: int) returns (result: int)
requires a > 0
requires b > 0
ensures result == gcd(a,b)
ensures (exists q1:int :: (q1 * result == a))
ensures (exists q2:int :: (q2 * result == b))
{
var x := a;
var y := b;
while (x != y)
decreases x+y
invariant x > 0
invariant y > 0
invariant gcd(x,y) == gcd(a,b)
{
if (x > y) { x := x - y; }
if (y > x) { y := y - x; }
}
gcd_correct(a,b);
return x;
}
Any tips?
You might take a look at the GCD algorithm in the Dafny test suite (in Test/VerifyThis2015/Problem2.dfy) and compare it with your approach.
I have found this interesting dynamic-programming problem and want to know the approach.
We are given an array a of size n.
Each element of the array is either 1 or 2.
We start at index 0. If a[i] = 1, we can go to i+1 or i-1.
If a[i] = 2, we can go to i+1, i+2, i-1 or i-2.
We have to find the number of all possible paths.
Main constraints:
1) We can visit a particular index of the array only once.
2) We always start at index 0.
3) A path can end any time we want. :-)
Example array: [1,1,1,1]
Answer: 4
1st possible path: [0]
2nd possible path: [0,1]
3rd possible path: [0,1,2]
4th possible path: [0,1,2,3]
Another example:
[2,2,2]
Answer: 5
Paths: [0], [0,1], [0,1,2], [0,2,1], [0,2]
(This question is divided into 3 parts!)
Values of n are in the range:
1) [1,100000]
2) [1,10]
3) [1,1000]
Consider the used spaces (cells), with ^ marking the cell x where a path ends:
0 1 2 3 4 5 6
^
In order to reach a number from the right, the cell just before it must have been used. Therefore, all the ways to end with x coming from the left cannot include numbers from the right. And all the ways to end with x coming from the right used x-1 and a set of moves to the right of x disjoint from the left side.
Let f(A, x) = l(A, x) + r(A, x), where l(A, x) represents all ways to end at x coming from the left; r(A, x), coming from the right.
To obtain l(A, x), we need:
(1) all ways to reach (x-1)
= l(A, x-1)
(there are no numbers used to
the right of x, and since
x is used last, we could not
have reached x-1 from the right.)
(2) all ways to reach (x-2):
clearly we need l(A, x-2). Now
to reach (x-2) from the right,
the only valid path would have
been ...(x-3)->(x-1)->(x-2)
which equals the number of ways
to reach (x-3) from the left.
= l(A, x-2) + l(A, x-3)
To obtain r(A, x), we need:
(1) all ways to reach (x+1) so as
to directly go from there to x
= l(A, x-1)
(We can only reach (x+1) from (x-1).)
(2) all ways to reach (x+2) after
starting at (x+1)
= l(A, x-1) * f(A[x+1...], 1)
(To get to the starting point in
A[x+1...], we must first get to
(x-1).)
So it seems that
f(A, x) = l(A, x) + r(A, x)
l(A, x) =
l(A, x-1) + l(A, x-2) + l(A, x-3)
r(A, x) =
l(A, x-1) + l(A, x-1) * f(A[x+1...], 1)
The JavaScript code below tries a different 7-element array each time we run it. I leave memoisation and optimisation to the reader (for efficiently tabling f(_, 1), notice that l(_, 1) = 1).
function f(A, x){
if (x < 0 || x > A.length - 1)
return 0
return l(A, x) + r(A, x)
function l(A, x){
if (x < 0 || x > A.length - 1)
return 0
if (x == 0)
return 1
let result = l(A, x-1)
if (A[x-2] && A[x-2] == 2){
result += l(A, x-2)
if (A[x-3] && A[x-3] == 2)
result += l(A, x-3)
}
return result
}
function r(A, x){
if (x < 0 || x >= A.length - 1 || !(A[x-1] && A[x-1] == 2))
return 0
let result = l(A, x-1)
if (A[x+2] && A[x+2] == 2)
result += l(A, x-1) * f(A.slice(x+1), 1)
return result
}
}
function validate(A){
let n = A.length
function g(i, s){
if (debug)
console.log(s)
let result = 1
let [a, b] = [i+1, i-1]
if (a < n && !s.includes(a))
result += g(a, s.slice().concat(a))
if (b >= 0 && !s.includes(b))
result += g(b, s.slice().concat(b))
if (A[i] == 2){
[a, b] = [i+2, i-2]
if (a < n && !s.includes(a))
result += g(a, s.slice().concat(a))
if (b >= 0 && !s.includes(b))
result += g(b, s.slice().concat(b))
}
return result
}
return g(0, [0])
}
let debug = false
let arr = []
let n = 7
for (let i=0; i<n; i++)
arr[i] = Math.ceil(Math.random() * 2)
console.log(JSON.stringify(arr))
console.log('')
let res = 0
for (let x=0; x<arr.length; x++){
let c = f(arr, x)
if (debug)
console.log([x, c])
res += c
}
if (debug)
console.log('')
let v = validate(arr)
if (debug)
console.log('')
console.log(v)
console.log(res)
Given a string and the constraint of matching on >= 3 characters, how can you ensure that the result string will be as small as possible?
Edit, adding explicitness as Gassa suggested:
E.G.
'AAAABBBAC'
If I remove the B's first,
AAAA[BBB]AC --> AAAAAC, then I can remove all of the A's from the resultant string and be left with:
[AAAAA]C --> C
'C'
If I just remove what is available first (the sequence of A's), I get:
[AAAA]BBBAC --> [BBB]AC --> AC
'AC'
A tree would definitely get you the shortest string(s).
The tree solution:
Define a State (node) holding the current string Input and the indexes of all its removable substrings (int[] Indexes).
Create the tree: for each index, create a child State and add it to the parent's State[] Children.
A State with no removable substrings has no children (Children = null).
Get all descendants (State[]) of your root State, order them by the length of their Input string, and the shortest one(s) are your answer(s).
Test cases:
string result = FindShortest("AAAABBBAC"); // AC
string result2 = FindShortest("AABBAAAC"); // AABBC
string result3 = FindShortest("BAABCCCBBA"); // B
The Code:
Note: of course, everyone is welcome to improve the following code in terms of performance and/or to fix any bugs.
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
string result = FindShortest("AAAABBBAC"); // AC
string result2 = FindShortest("AABBAAAC"); // AABBC
string result3 = FindShortest("BAABCCCBBA"); // B
}
// finds the FIRST shortest string for a given input
private static string FindShortest(string input)
{
// all possible removable strings' indexes
// for this given input
int[] indexes = RemovableIndexes(input);
// each input string and its possible removables are a state
var state = new State { Input = input, Indexes = indexes };
// create the tree
GetChildren(state);
// get the FIRST shortest
// i.e. there would be more than one answer sometimes
// this could be easily changed to get all possible results
var result =
Descendants(state)
.Where(d => d.Children == null || d.Children.Length == 0)
.OrderBy(d => d.Input.Length)
.FirstOrDefault().Input;
return result;
}
// simple get all descendants of a node/state in a tree
private static IEnumerable<State> Descendants(State root)
{
var states = new Stack<State>(new[] { root });
while (states.Any())
{
State node = states.Pop();
yield return node;
if (node.Children != null)
foreach (var n in node.Children) states.Push(n);
}
}
// creates the tree
private static void GetChildren(State state)
{
// for each an index there is a child
state.Children = state.Indexes.Select(
i =>
{
var input = RemoveAllAt(state.Input, i);
return input.Length < state.Input.Length && input.Length > 0
? new State
{
Input = input,
Indexes = RemovableIndexes(input)
}
: null;
}).Where(c => c != null).ToArray(); // skip null entries (fully reduced strings) so the recursion below never dereferences a null child
foreach (var c in state.Children)
GetChildren(c);
}
// find all possible removable strings' indexes
private static int[] RemovableIndexes(string input)
{
var indexes = new List<int>();
char d = input[0];
int count = 1;
for (int i = 1; i < input.Length; i++)
{
if (d == input[i])
count++;
else
{
if (count >= 3)
indexes.Add(i - count);
// reset
d = input[i];
count = 1;
}
}
if (count >= 3)
indexes.Add(input.Length - count);
return indexes.ToArray();
}
// remove all duplicate chars starting from an index
private static string RemoveAllAt(string input, int startIndex)
{
string part1, part2;
int endIndex = startIndex + 1;
int i = endIndex;
for (; i < input.Length; i++)
if (input[i] != input[startIndex])
{
endIndex = i;
break;
}
if (i == input.Length && input[i - 1] == input[startIndex])
endIndex = input.Length;
part1 = startIndex > 0 ? input.Substring(0, startIndex) : string.Empty;
part2 = endIndex <= (input.Length - 1) ? input.Substring(endIndex) : string.Empty;
return part1 + part2;
}
// our node, which is
// an input string &
// all possible removable strings' indexes
// & its children
public class State
{
public string Input;
public int[] Indexes;
public State[] Children;
}
}
I propose an O(n^2) solution with dynamic programming.
Let's introduce some notation: the prefix and suffix of length l of string A are denoted by P[l] and S[l], and we call our procedure Rcd.
Rcd(A) = Rcd(Rcd(P[n-1])+S[1])
Rcd(A) = Rcd(P[1]+Rcd(S[n-1]))
Note that the outer Rcd in the RHS is trivial. So that's our optimal substructure. Based on this, I came up with the following implementation:
#include <iostream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;
string remdupright(string s, bool allowEmpty) {
if (s.size() >= 3) {
auto pos = s.find_last_not_of(s.back());
if (pos == string::npos && allowEmpty) s = "";
else if (pos != string::npos && s.size() - pos > 3) s = s.substr(0, pos + 1);
}
return s;
}
string remdupleft(string s, bool allowEmpty) {
if (s.size() >= 3) {
auto pos = s.find_first_not_of(s.front());
if (pos == string::npos && allowEmpty) s = "";
else if (pos != string::npos && pos >= 3) s = s.substr(pos);
}
return s;
}
string remdup(string s, bool allowEmpty) {
return remdupleft(remdupright(s, allowEmpty), allowEmpty);
}
string run(const string in) {
vector<vector<string>> table(in.size());
for (int i = 0; i < (int)table.size(); ++i) {
table[i].resize(in.size() - i);
}
for (int i = 0; i < (int)table[0].size(); ++i) {
table[0][i] = in.substr(i,1);
}
for (int len = 2; len <= (int)table.size(); ++len) {
for (int pos = 0; pos < (int)in.size() - len + 1; ++pos) {
string base(table[len - 2][pos]);
const char suffix = in[pos + len - 1];
if (base.size() && suffix != base.back()) {
base = remdupright(base, false);
}
const string opt1 = base + suffix;
base = table[len - 2][pos+1];
const char prefix = in[pos];
if (base.size() && prefix != base.front()) {
base = remdupleft(base, false);
}
const string opt2 = prefix + base;
const string nodupopt1 = remdup(opt1, true);
const string nodupopt2 = remdup(opt2, true);
table[len - 1][pos] = nodupopt1.size() > nodupopt2.size() ? opt2 : opt1;
assert(nodupopt1.size() != nodupopt2.size() || nodupopt1 == nodupopt2);
}
}
string& res = table[in.size() - 1][0];
return remdup(res, true);
}
void testRcd(string s, string expected) {
cout << s << " : " << run(s) << ", expected: " << expected << endl;
}
int main()
{
testRcd("BAABCCCBBA", "B");
testRcd("AABBAAAC", "AABBC");
testRcd("AAAA", "");
testRcd("AAAABBBAC", "C");
}
Clearly we are not concerned with any block of repeated characters longer than 2. And there is only one way that two blocks of the same character, where at least one of the blocks is shorter than 3, can be combined: namely, if the sequence between them can be removed.
So (1) look at pairs of blocks of the same character where at least one is shorter than 3, and (2) determine whether the sequence between them can be removed.
We want to decide which pairs to join so as to minimize the total length of the blocks that remain shorter than 3 characters. (Note that the number of pairs is bounded by the size (and distribution) of the alphabet.)
Let f(b) represent the minimal total length of same-character blocks shorter than 3 characters that remain, up to and including block b. Then:
f(b):
p1 <- previous block of the same character
if b and p1 can combine:
if b.length + p1.length > 2:
f(b) = min(
// don't combine
(0 if b.length > 2 else b.length) +
f(block before b),
// combine
f(block before p1)
)
// b.length + p1.length < 3
else:
p2 <- block previous to p1 of the same character
if p1 and p2 can combine:
f(b) = min(
// don't combine
b.length + f(block before b),
// combine
f(block before p2)
)
else:
f(b) = b.length + f(block before b)
// b and p1 cannot combine
else:
f(b) = b.length + f(block before b)
for all p1 before b
The question is how we can efficiently determine whether a block can be combined with the previous block of the same character (aside from the obvious recursion into the sub-list of blocks between the two).
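As a side note (not part of the implementation below, which builds its block list by hand while scanning the string), the block list that the pseudocode operates on is just a run-length encoding of the input; a compact way to get the same view, shown only for illustration, is itertools.groupby:
from itertools import groupby

def to_blocks(s):
    # run-length encode the string into (character, block length) pairs
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(s)]

print(to_blocks("AAAABBBAC"))  # [('A', 4), ('B', 3), ('A', 1), ('C', 1)]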
Python code:
import random
import time
def parse(length):
return length if length < 3 else 0
def f(string):
chars = {}
blocks = [[string[0], 1, 0]]
chars[string[0]] = {'indexes': [0]}
chars[string[0]][0] = {'prev': -1}
p = 0 # pointer to current block
for i in xrange(1, len(string)):
if blocks[len(blocks) - 1][0] == string[i]:
blocks[len(blocks) - 1][1] += 1
else:
p += 1
# [char, length, index, f(i), temp]
blocks.append([string[i], 1, p])
if string[i] in chars:
chars[string[i]][p] = {'prev': chars[string[i]]['indexes'][ len(chars[string[i]]['indexes']) - 1 ]}
chars[string[i]]['indexes'].append(p)
else:
chars[string[i]] = {'indexes': [p]}
chars[string[i]][p] = {'prev': -1}
#print blocks
#print
#print chars
#print
memo = [[None for j in xrange(len(blocks))] for i in xrange(len(blocks))]
def g(l, r, top_level=False):
####
####
#print "(l, r): (%s, %s)" % (l,r)
if l == r:
return parse(blocks[l][1])
if memo[l][r]:
return memo[l][r]
result = [parse(blocks[l][1])] + [None for k in xrange(r - l)]
if l < r:
for i in xrange(l + 1, r + 1):
result[i - l] = parse(blocks[i][1]) + result[i - l - 1]
for i in xrange(l, r + 1):
####
####
#print "\ni: %s" % i
[char, length, index] = blocks[i]
#p1 <- previous block of the same character
p1_idx = chars[char][index]['prev']
####
####
#print "(p1_idx, l, p1_idx >= l): (%s, %s, %s)" % (p1_idx, l, p1_idx >= l)
if p1_idx < l and index > l:
result[index - l] = parse(length) + result[index - l - 1]
while p1_idx >= l:
p1 = blocks[p1_idx]
####
####
#print "(b, p1, p1_idx, l): (%s, %s, %s, %s)\n" % (blocks[i], p1, p1_idx, l)
between = g(p1[2] + 1, index - 1)
####
####
#print "between: %s" % between
#if b and p1 can combine:
if between == 0:
if length + p1[1] > 2:
result[index - l] = min(
result[index - l],
# don't combine
parse(length) + (result[index - l - 1] if index - l > 0 else 0),
# combine: f(block before p1)
result[p1[2] - l - 1] if p1[2] > l else 0
)
# b.length + p1.length < 3
else:
#p2 <- block previous to p1 of the same character
p2_idx = chars[char][p1[2]]['prev']
if p2_idx < l:
p1_idx = chars[char][p1_idx]['prev']
continue
between2 = g(p2_idx + 1, p1[2] - 1)
#if p1 and p2 can combine:
if between2 == 0:
result[index - l] = min(
result[index - l],
# don't combine
parse(length) + (result[index - l - 1] if index - l > 0 else 0),
# combine the block, p1 and p2
result[p2_idx - l - 1] if p2_idx - l > 0 else 0
)
else:
#f(b) = b.length + f(block before b)
result[index - l] = min(
result[index - l],
parse(length) + (result[index - l - 1] if index - l > 0 else 0)
)
# b and p1 cannot combine
else:
#f(b) = b.length + f(block before b)
result[index - l] = min(
result[index - l],
parse(length) + (result[index - l - 1] if index - l > 0 else 0)
)
p1_idx = chars[char][p1_idx]['prev']
#print l,r,result
memo[l][r] = result[r - l]
"""if top_level:
return (result, blocks)
else:"""
return result[r - l]
if len(blocks) == 1:
return ([parse(blocks[0][1])], blocks)
else:
return g(0, len(blocks) - 1, True)
"""s = ""
for i in xrange(300):
s = s + ['A','B','C'][random.randint(0,2)]"""
print f("abcccbcccbacccab") # b
print
print f("AAAABBBAC"); # C
print
print f("CAAAABBBA"); # C
print
print f("AABBAAAC"); # AABBC
print
print f("BAABCCCBBA"); # B
print
print f("aaaa")
print
The string answers for these longer examples were computed using jdehesa's answer:
t0 = time.time()
print f("BCBCCBCCBCABBACCBABAABBBABBBACCBBBAABBACBCCCACABBCAABACBBBBCCCBBAACBAABACCBBCBBAABCCCCCAABBBBACBBAAACACCBCCBBBCCCCCCCACBABACCABBCBBBBBCBABABBACCAACBCBBAACBBBBBCCBABACBBABABAAABCCBBBAACBCACBAABAAAABABB")
# BCBCCBCCBCABBACCBABCCAABBACBACABBCAABACAACBAABACCBBCBBCACCBACBABACCABBCCBABABBACCAACBCBBAABABACBBABABBCCAACBCACBAABBABB
t1 = time.time()
total = t1-t0
print total
t0 = time.time()
print f("CBBACAAAAABBBBCAABBCBAABBBCBCBCACACBAABCBACBBABCABACCCCBACBCBBCBACBBACCCBAAAACACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBCCCABACABBCABBAAAAABBBBAABAABBCACACABBCBCBCACCCBABCAACBCAAAABCBCABACBABCABCBBBBABCBACABABABCCCBBCCBBCCBAAABCABBAAABBCAAABCCBAABAABCAACCCABBCAABCBCBCBBAACCBBBACBBBCABAABCABABABABCA")
# CBBACCAABBCBAACBCBCACACBAABCBACBBABCABABACBCBBCBACBBABCACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBABACABBCBBCACACABBCBCBCABABCAACBCBCBCABACBABCABCABCBACABABACCBBCCBBCACBCCBAABAABCBBCAABCBCBCBBAACCACCABAABCABABABABCA
t1 = time.time()
total = t1-t0
print total
t0 = time.time()
print f("AADBDBEBBBBCABCEBCDBBBBABABDCCBCEBABADDCABEEECCECCCADDACCEEAAACCABBECBAEDCEEBDDDBAAAECCBBCEECBAEBEEEECBEEBDACDDABEEABEEEECBABEDDABCDECDAABDAEADEECECEBCBDDAEEECCEEACCBBEACDDDDBDBCCAAECBEDAAAADBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDAEEEBBBCEDECBCABDEDEBBBABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCECCCA")
# AADBDBECABCEBCDABABDCCBCEBABADDCABCCEADDACCEECCABBECBAEDCEEBBECCBBCEECBAEBCBEEBDACDDABEEABCBABEDDABCDECDAABDAEADEECECEBCBDDACCEEACCBBEACBDBCCAAECBEDDBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDACEDECBCABDEDEABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCEA
t1 = time.time()
total = t1-t0
print total
Another Scala answer, using memoization and (partly) tail-call optimization (updated).
import scala.collection.mutable.HashSet
import scala.annotation._
object StringCondense extends App {
@tailrec
def groupConsecutive (s: String, sofar: List[String]): List[String] = s.toList match {
// def groupConsecutive (s: String): List[String] = s.toList match {
case Nil => sofar.reverse // the accumulator builds the groups in reverse; restore the original order
// case Nil => Nil
case c :: str => {
val (prefix, rest) = (c :: str).span (_ == c)
// Strings of equal characters, longer than 3, don't make a difference to just 3
groupConsecutive (rest.mkString(""), (prefix.take (3)).mkString ("") :: sofar)
// (prefix.take (3)).mkString ("") :: groupConsecutive (rest.mkString(""))
}
}
// to count the effect of memoization
var count = 0
// recursively try to eliminate every group of 3 or more, brute forcing
// but for "aabbaabbaaabbbaabb", many reductions will lead sooner or
// later to the same result, so we try to detect these and avoid duplicate
// work
def moreThan2consecutive (s: String, seenbefore: HashSet [String]): String = {
if (seenbefore.contains (s)) s
else
{
count += 1
seenbefore += s
val sublists = groupConsecutive (s, Nil)
// val sublists = groupConsecutive (s)
val atLeast3 = sublists.filter (_.size > 2)
atLeast3.length match {
case 0 => s
case 1 => {
val res = sublists.filter (_.size < 3)
moreThan2consecutive (res.mkString (""), seenbefore)
}
case _ => {
val shrinked = (
for {idx <- (0 until sublists.size)
if (sublists (idx).length >= 3)
pre = (sublists.take (idx)).mkString ("")
post= (sublists.drop (idx+1)).mkString ("")
} yield {
moreThan2consecutive (pre + post, seenbefore)
}
)
(shrinked.head /: shrinked.tail) ((a, b) => if (a.length <= b.length) a else b)
}
}
}
}
// don't know what Rcd means, adopted from other solution but modified
// kind of a unit test **update**: forgot to reset count
def testRcd (s: String, expected: String) : Boolean = {
count = 0
val seenbefore = HashSet [String] ()
val result = moreThan2consecutive (s, seenbefore)
val hit = result.equals (expected)
println (s"Input: $s\t result: ${result}\t expected ${expected}\t $hit\t count: $count");
hit
}
// some test values from other users with expected result
// **upd:** more testcases
def testgroup () : Unit = {
testRcd ("baabcccbba", "b")
testRcd ("aabbaaac", "aabbc")
testRcd ("aaaa", "")
testRcd ("aaaabbbac", "c")
testRcd ("abcccbcccbacccab", "b")
testRcd ("AAAABBBAC", "C")
testRcd ("CAAAABBBA", "C")
testRcd ("AABBAAAC", "AABBC")
testRcd ("BAABCCCBBA", "B")
testRcd ("AAABBBAAABBBAAABBBC", "C") // 377 subcalls reported by Yola,
testRcd ("AAABBBAAABBBAAABBBAAABBBC", "C") // 4913 when preceeded with AAABBB
}
testgroup
def testBigs () : Unit = {
/*
testRcd ("BCBCCBCCBCABBACCBABAABBBABBBACCBBBAABBACBCCCACABBCAABACBBBBCCCBBAACBAABACCBBCBBAABCCCCCAABBBBACBBAAACACCBCCBBBCCCCCCCACBABACCABBCBBBBBCBABABBACCAACBCBBAACBBBBBCCBABACBBABABAAABCCBBBAACBCACBAABAAAABABB",
"BCBCCBCCBCABBACCBABCCAABBACBACABBCAABACAACBAABACCBBCBBCACCBACBABACCABBCCBABABBACCAACBCBBAABABACBBABABBCCAACBCACBAABBABB")
*/
testRcd ("CBBACAAAAABBBBCAABBCBAABBBCBCBCACACBAABCBACBBABCABACCCCBACBCBBCBACBBACCCBAAAACACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBCCCABACABBCABBAAAAABBBBAABAABBCACACABBCBCBCACCCBABCAACBCAAAABCBCABACBABCABCBBBBABCBACABABABCCCBBCCBBCCBAAABCABBAAABBCAAABCCBAABAABCAACCCABBCAABCBCBCBBAACCBBBACBBBCABAABCABABABABCA",
"CBBACCAABBCBAACBCBCACACBAABCBACBBABCABABACBCBBCBACBBABCACCABAACCACCBCBCABAACAABACBABACBCBAACACCBCBABACABBCBBCACACABBCBCBCABABCAACBCBCBCABACBABCABCABCBACABABACCBBCCBBCACBCCBAABAABCBBCAABCBCBCBBAACCACCABAABCABABABABCA")
/*testRcd ("AADBDBEBBBBCABCEBCDBBBBABABDCCBCEBABADDCABEEECCECCCADDACCEEAAACCABBECBAEDCEEBDDDBAAAECCBBCEECBAEBEEEECBEEBDACDDABEEABEEEECBABEDDABCDECDAABDAEADEECECEBCBDDAEEECCEEACCBBEACDDDDBDBCCAAECBEDAAAADBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDAEEEBBBCEDECBCABDEDEBBBABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCECCCA",
"AADBDBECABCEBCDABABDCCBCEBABADDCABCCEADDACCEECCABBECBAEDCEEBBECCBBCEECBAEBCBEEBDACDDABEEABCBABEDDABCDECDAABDAEADEECECEBCBDDACCEEACCBBEACBDBCCAAECBEDDBEADBAAECBDEACDEABABEBCABDCEEAABABABECDECADCEDACEDECBCABDEDEABABEEBDAEADBEDABCAEABCCBCCEDCBBEBCEA")
*/
}
// for generated input, but with fixed seed, to compare the count with
// and without memoization
import util.Random
val r = new Random (31415)
// generate Strings but with high chances to produce some triples and
// longer sequences of char clones
def genRandomString () : String = {
(1 to 20).map (_ => r.nextInt (6) match {
case 0 => "t"
case 1 => "r"
case 2 => "-"
case 3 => "tt"
case 4 => "rr"
case 5 => "--"
}).mkString ("")
}
def testRandom () : Unit = {
(1 to 10).map (i=> testRcd (genRandomString, "random mode - false might be true"))
}
testRandom
testgroup
testRandom
// testBigs
}
Comparing the effect of memoization led to interesting results:
Updated measurements. In the old values, I had forgotten to reset the counter, which led to much higher results. Now the spread of the results is much more impressive and, in total, the values are smaller.
No seenbefore:
Input: baabcccbba result: b expected b true count: 4
Input: aabbaaac result: aabbc expected aabbc true count: 2
Input: aaaa result: expected true count: 2
Input: aaaabbbac result: c expected c true count: 5
Input: abcccbcccbacccab result: b expected b true count: 34
Input: AAAABBBAC result: C expected C true count: 5
Input: CAAAABBBA result: C expected C true count: 5
Input: AABBAAAC result: AABBC expected AABBC true count: 2
Input: BAABCCCBBA result: B expected B true count: 4
Input: AAABBBAAABBBAAABBBC res: C expected C true count: 377
Input: AAABBBAAABBBAAABBBAAABBBC r: C expected C true count: 4913
Input: r--t----ttrrrrrr--tttrtttt--rr----result: rr--rr expected ? unknown ? false count: 1959
Input: ttrtt----tr---rrrtttttttrtr--rr result: r--rr expected ? unknown ? false count: 213
Input: tt----r-----ttrr----ttrr-rr--rr-- result: ttrttrrttrr-rr--rr-- ex ? unknown ? false count: 16
Input: --rr---rrrrrrr-r--rr-r--tt--rrrrr result: rr-r--tt-- expected ? unknown ? false count: 32
Input: tt-rrrrr--r--tt--rrtrrr------- result: ttr--tt--rrt expected ? unknown ? false count: 35
Input: --t-ttt-ttt--rrrrrt-rrtrttrr result: --tt-rrtrttrr expected ? unknown ? false count: 35
Input: rrt--rrrr----trrr-rttttrrtttrr result: rrtt- expected ? unknown ? false count: 1310
Input: ---tttrrrrrttrrttrr---tt-----tt result: rrttrr expected ? unknown ? false count: 1011
Input: -rrtt--rrtt---t-r--r---rttr-- result: -rrtt--rr-r--rrttr-- ex ? unknown ? false count: 9
Input: rtttt--rrrrrrrt-rrttt--tt--t result: r--t-rr--tt--t expectd ? unknown ? false count: 16
real 0m0.607s (without testBigs)
user 0m1.276s
sys 0m0.056s
With seenbefore:
Input: baabcccbba result: b expected b true count: 4
Input: aabbaaac result: aabbc expected aabbc true count: 2
Input: aaaa result: expected true count: 2
Input: aaaabbbac result: c expected c true count: 5
Input: abcccbcccbacccab result: b expected b true count: 11
Input: AAAABBBAC result: C expected C true count: 5
Input: CAAAABBBA result: C expected C true count: 5
Input: AABBAAAC result: AABBC expected AABBC true count: 2
Input: BAABCCCBBA result: B expected B true count: 4
Input: AAABBBAAABBBAAABBBC rest: C expected C true count: 28
Input: AAABBBAAABBBAAABBBAAABBBC C expected C true count: 52
Input: r--t----ttrrrrrr--tttrtttt--rr----result: rr--rr expected ? unknown ? false count: 63
Input: ttrtt----tr---rrrtttttttrtr--rr result: r--rr expected ? unknown ? false count: 48
Input: tt----r-----ttrr----ttrr-rr--rr-- result: ttrttrrttrr-rr--rr-- xpe? unknown ? false count: 8
Input: --rr---rrrrrrr-r--rr-r--tt--rrrrr result: rr-r--tt-- expected ? unknown ? false count: 19
Input: tt-rrrrr--r--tt--rrtrrr------- result: ttr--tt--rrt expected ? unknown ? false count: 12
Input: --t-ttt-ttt--rrrrrt-rrtrttrr result: --tt-rrtrttrr expected ? unknown ? false count: 16
Input: rrt--rrrr----trrr-rttttrrtttrr result: rrtt- expected ? unknown ? false count: 133
Input: ---tttrrrrrttrrttrr---tt-----tt result: rrttrr expected ? unknown ? false count: 89
Input: -rrtt--rrtt---t-r--r---rttr-- result: -rrtt--rr-r--rrttr-- ex ? unknown ? false count: 6
Input: rtttt--rrrrrrrt-rrttt--tt--t result: r--t-rr--tt--t expected ? unknown ? false count: 8
real 0m0.474s (without testBigs)
user 0m0.852s
sys 0m0.060s
With tailcall:
real 0m0.478s (without testBigs)
user 0m0.860s
sys 0m0.060s
For some random strings, the difference is more than tenfold.
For long strings with many groups one could, as an improvement, eliminate all groups which are the only group of their character, for instance:
aa bbb aa ccc xx ddd aa eee aa fff xx
The groups bbb, ccc, ddd, eee and fff are unique in the string, so they can't combine with anything else and could all be eliminated, and the order of removal does not matter. This would lead to the intermediate result
aaaa xx aaaa xx
and a fast solution. Maybe I will try to implement it too. However, I guess it will be possible to produce randomly generated strings where this has a big impact, and, with a different form of random generation, distributions where the impact is low.
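For what it's worth, here is a rough Python sketch of that preprocessing idea; it simply follows the reasoning above and has not been tuned or proven, and the run-length encoding via itertools.groupby and the helper name are mine:
from collections import Counter
from itertools import groupby

def preprocess(s, min_len=3):
    # Repeatedly remove any run of >= min_len identical characters whose
    # character occurs in no other run of the string; per the reasoning
    # above, such runs cannot combine with anything else, so the order of
    # their removal does not matter. Removing a run can merge its two
    # neighbours, so re-scan until nothing changes.
    changed = True
    while changed:
        changed = False
        runs = [(ch, sum(1 for _ in g)) for ch, g in groupby(s)]
        counts = Counter(ch for ch, _ in runs)
        for i, (ch, n) in enumerate(runs):
            if n >= min_len and counts[ch] == 1:
                s = "".join(c * m for c, m in runs[:i] + runs[i + 1:])
                changed = True
                break
    return s

print(preprocess("aabbbaacccxxdddaaeeeaafffxx"))  # aaaaxxaaaaxx
On the example above this yields the intermediate result aaaa xx aaaa xx, which the main search can then reduce further.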
Here is a Python solution (function reduce_min), not particularly smart but I think fairly easy to understand (excessive amount of comments added for answer clarity):
def reductions(s, min_len):
"""
Yields every possible reduction of s by eliminating contiguous blocks
of min_len or more repeated characters.
For example, reductions('AAABBCCCCBAAC', 3) yields
'BBCCCCBAAC' and 'AAABBBAAC'.
"""
# Current character
curr = ''
# Length of current block
n = 0
# Start position of current block
idx = 0
# For each character
for i, c in enumerate(s):
if c != curr:
# New block begins
if n >= min_len:
# If previous block was long enough
# yield reduced string without it
yield s[:idx] + s[i:]
# Start new block
curr = c
n = 1
idx = i
else:
# Still in the same block
n += 1
# Yield reduction without last block if it was long enough
if n >= min_len:
yield s[:idx]
def reduce_min(s, min_len):
"""
Finds the smallest possible reduction of s by successive
elimination of contiguous blocks of min_len or more repeated
characters.
"""
# Current set of possible reductions
rs = set([s])
# Current best solution
result = s
# While there are strings to reduce
while rs:
# Get one element
r = rs.pop()
# Find reductions
r_red = list(reductions(r, min_len))
# If no reductions are found it is irreducible
if len(r_red) == 0 and len(r) < len(result):
# Replace if shorter than current best
result = r
else:
# Save reductions for next iterations
rs.update(r_red)
return result
assert reduce_min("BAABCCCBBA", 3) == "B"
assert reduce_min("AABBAAAC", 3) == "AABBC"
assert reduce_min("AAAA", 3) == ""
assert reduce_min("AAAABBBAC", 3) == "C"
EDIT: Since people seem to be posting C++ solutions, here is mine in C++ (again, function reduce_min):
#include <string>
#include <vector>
#include <unordered_set>
#include <iterator>
#include <utility>
#include <cassert>
using namespace std;
void reductions(const string &s, unsigned int min_len, vector<string> &rs)
{
char curr = '\0';
unsigned int n = 0;
unsigned int idx = 0;
for (auto it = s.begin(); it != s.end(); ++it)
{
if (curr != *it)
{
auto i = distance(s.begin(), it);
if (n >= min_len)
{
rs.push_back(s.substr(0, idx) + s.substr(i));
}
curr = *it;
n = 1;
idx = i;
}
else
{
n += 1;
}
}
if (n >= min_len)
{
rs.push_back(s.substr(0, idx));
}
}
string reduce_min(const string &s, unsigned int min_len)
{
unordered_set<string> rs { s };
string result = s;
vector<string> rs_new;
while (!rs.empty())
{
auto it = rs.begin();
auto r = *it;
rs.erase(it);
rs_new.clear();
reductions(r, min_len, rs_new);
if (rs_new.empty() && r.size() < result.size())
{
result = move(r);
}
else
{
rs.insert(rs_new.begin(), rs_new.end());
}
}
return result;
}
int main(int argc, char **argv)
{
assert(reduce_min("BAABCCCBBA", 3) == "B");
assert(reduce_min("AABBAAAC", 3) == "AABBC");
assert(reduce_min("AAAA", 3) == "");
assert(reduce_min("AAAABBBAC", 3) == "C");
return 0;
}
If you can use C++17 you can save memory by using string views.
EDIT 2: About the complexity of the algorithm. It is not straightforward to figure out, and as I said the algorithm is meant to be simple more than anything, but let's see. In the end, it is more or less the same as a breadth-first search. Let's say the string length is n, and, for generality, let's say the minimum block length (value 3 in the question) is m. In the first level, we can generate up to n / m reductions in the worst case. For each of these, we can generate up to (n - m) / m reductions, and so on. So basically, at "level" i (loop iteration i) we create up to (n - i * m) / m reductions per string we had, and each of these will take O(n - i * m) time to process. The maximum number of levels we can have is, again, n / m. So the complexity of the algorithm (if I'm not making mistakes) should have the form:
O( sum {i = 0 .. n / m} ( O(n - i * m) * prod {j = 0 .. i} ((n - i * m) / m) ))
(here the outer sum is over the iteration levels, O(n - i * m) is the per-string processing cost at level i, and the product of the per-level branching factor (n - i * m) / m counts the strings present at that level)
Whew. So this should be something like:
O( sum {i = 0 .. n / m} (n - i * m) * O(n^i / m^i) )
Which in turn would collapse to:
O((n / m)^(n / m))
So yeah, the algorithm is more or less simple, but it can run into exponential cost cases (the bad cases would be strings made entirely of exactly m-long blocks, like AAABBBCCCAAACCC... for m = 3).
I tried to use the Miller-Rabin + Pollard rho method to factorize an integer into primes in Python 3, to reduce the time complexity as much as I could. But it failed some tests. I knew where the problem was, but I am a tyro in algorithms and didn't know how to fix it, so I will put all the relevant code here.
import random
def gcd(a, b):
"""
a, b: integers
returns: a positive integer, the greatest common divisor of a & b.
"""
if a == 0:
return b
if a < 0:
return gcd(-a, b)
while b > 0:
c = a % b
a, b = b, c
return a
def mod_mul(a, b, n):
# Calculate a * b % n iterately.
result = 0
while b > 0:
if (b & 1) > 0:
result = (result + a) % n
a = (a + a) % n
b = (b >> 1)
return result
def mod_exp(a, b, n):
# Calculate (a ** b) % n iterately.
result = 1
while b > 0:
if (b & 1) > 0:
result = mod_mul(result, a, n)
a = mod_mul(a, a, n)
b = (b >> 1)
return result
def MillerRabinPrimeCheck(n):
if n in {2, 3, 5, 7, 11}:
return True
elif (n == 1 or n % 2 == 0 or n % 3 == 0 or n % 5 == 0 or n % 7 == 0 or n % 11 == 0):
return False
k = 0
u = n - 1
while not (u & 1) > 0:
k += 1
u = (u >> 1)
random.seed(0)
s = 5 # if the result isn't right, increase s (the number of test rounds)
for i in range(s):
x = random.randint(2, n - 1)
if x % n == 0:
continue
x = mod_exp(x, u, n)
pre = x
for j in range(k):
x = mod_mul(x, x, n)
if (x == 1 and pre != 1 and pre != n - 1):
return False
pre = x
if x != 1:
return False
return True
def PollardP1_rho(n, c):
'''
Consider c as a constant integer.
'''
i = 1
k = 2
x = random.randrange(1, n - 1) + 1
y = x
while 1:
i += 1
x = (mod_mul(x, x, n) + c) % n
d = gcd(y - x, n)
if 1 < d < n:
return d
elif x == y:
return n
elif i == k:
y = x
k = (k << 1)
result = []
def PrimeFactorsListGenerator(n):
if n <= 1:
pass
elif MillerRabinPrimeCheck(n) == True:
result.append(n)
else:
a = n
while a == n:
a = PollardP1_rho(n, random.randrange(1,n - 1) + 1)
PrimeFactorsListGenerator(a)
PrimeFactorsListGenerator(n // a)
When I tried to test this:
PrimeFactorsListGenerator(4)
It didn't stop, and kept looping in this call:
PollardP1_rho(4, random.randrange(1,4 - 1) + 1)
I have already tested the functions before PollardP1_rho and they work normally, so I know the function PollardP1_rho cannot deal with the number 4 correctly, nor with the number 5. How can I fix that?
I have solved it myself.
There was one mistake in the code.
I should not have used the variable result outside of the function as a global; I should define it inside the function and use result.extend() so that the whole recursive process works correctly. So I rewrote PollardP1_rho(n, c) and PrimeFactorsListGenerator(n):
def Pollard_rho(x, c):
'''
Consider c as a constant integer.
'''
i, k = 1, 2
x0 = random.randint(0, x)
y = x0
while 1:
i += 1
x0 = (mod_mul(x0, x0, x) + c) % x
d = gcd(y - x0, x)
if d != 1 and d != x:
return d
if y == x0:
return x
if i == k:
y = x0
k += k
def PrimeFactorsListGenerator(n):
result = []
if n <= 1:
return None
if MillerRabinPrimeCheck(n):
return [n]
p = n
while p >= n:
p = Pollard_rho(p, random.randint(1, n - 1))
result.extend(PrimeFactorsListGenerator(p))
result.extend(PrimeFactorsListGenerator(n // p))
return result
#PrimeFactorsListGenerator(400)
#PrimeFactorsListGenerator(40000)
One additional tip: you don't need to write mod_exp(a, b, n) at all; the Python built-in pow(a, b, n) does modular exponentiation and is fully optimized. (Likewise, mod_mul(a, b, n) can simply be written a * b % n, since Python integers never overflow.)
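A quick self-contained check of that tip (the specific numbers are arbitrary and only for illustration):
a, b, n = 12345, 6789, 101

# The built-in three-argument pow performs modular exponentiation directly,
# i.e. it computes (a ** b) % n without relying on a hand-written loop.
assert pow(a, b, n) == (a ** b) % n

# A hand-written mod_mul is likewise unnecessary: Python integers never
# overflow, so the product can be reduced in a plain expression.
print(pow(a, b, n), (a * b) % n)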
How do you design an algorithm to simulate multiplication by addition? The input is two integers; they may be zero, positive, or negative.
def multiply(a, b):
if (a == 1):
return b
elif (a == 0):
return 0
elif (a < 0):
return -multiply(-a, b)
else:
return b + multiply(a - 1, b)
some pseudocode:
function multiply(x, y)
if abs(x) = x and abs(y) = y or abs(x) <> x and abs(y) <> y then sign = 'plus'
if abs(x) = x and abs(y) <> y or abs(x) <> x and abs(y) = y then sign = 'minus'
res = 0
for i = 1 to abs(y)
res = res + abs(x)
end
if sign = 'plus' return res
else return -1 * res
end function
val := 0
bothNegative := false
if (input1 < 0) && (input2 < 0)
bothNegative := true
smaller_number := absolute_value_of(smaller_number)
for [i := absolute_value_of(bigger_number); i != 0; i--]
do val += smaller_number
if (not bothNegative) && ((input1 < 0) || (input2 < 0))
val := -val
return val;
mul(a,b)
{
sign1=sign2=1;
if(a==0 || b==0)
return 0;
if(a<0){
sign1=-1;
a=-a;
}
if(b<0){
sign2=-1;
b=-b;
}
s=a;
for(i=1;i<b;i++)
s+=a;
if(sign1==sign2)
return s;
else
return -s;
}
How about this for integers:
int multiply(int a, int b)
{
int product = 0;
int i;
if ( b > 0 )
{
for(i = 0; i < b ; i++)
{
product += a;
}
}
else
{
for(i = 0; i > b ; i--)
{
product -= a;
}
}
return product;
}
I got here because I was looking for a multiplication algorithm that doesn't use the * operation. All I see here is just adding or subtracting a number n times; that's O(n) and it's OK, but...
If you have bitwise shift operations you can get an O(log n) algorithm for multiplication.
Here is my pseudocode:
function mul(n, x)
if n < 0 then # 'n' cannot be negative
n := -n
x := -x
endif
y := 0
while n != 0 do
if n % 2 == 0 then
x := x << 1 # x := x + x
n := n >> 1 # n := n / 2
else
y := y + x
x := x << 1 # x := x + x
n := n - 1 # n := (n-1)/2
n := n >> 1
endif
endwhile
return y # y = n * x
end
Remember that function above for mul(1000000, 2) is O(log 1000000) and for mul(2, 1000000) is only O(log 2).
Of course, you will get the same results, but keep in mind that the order of the parameters in function call does matter.
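For reference, here is a direct Python transcription of the pseudocode above (same logic; in Python, x << 1 is equivalent to x + x even for negative x):
def mul(n, x):
    # shift-and-add multiplication, O(log n) additions
    if n < 0:          # normalize so the loop counter is non-negative
        n, x = -n, -x
    y = 0
    while n != 0:
        if n % 2 == 0:
            x = x << 1          # x := x + x
            n = n >> 1          # n := n / 2
        else:
            y = y + x
            x = x << 1          # x := x + x
            n = (n - 1) >> 1    # n := (n - 1) / 2
    return y

assert mul(1000000, 2) == 2000000
assert mul(-3, 7) == -21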
Edit: a side note on implementing n % 2
Implementation of n % 2 using bitwise shift
It's pretty straightforward. First divide n by 2, then multiply n by 2 and check if n has changed. Pseudocode:
function is_even(n)
n_original := n
n := n >> 1 # n := n / 2
n := n << 1 # n := n * 2
if n = n_original then
return true # n is even
else
return false # n is not even
endif
end
Implementation of n % 2 using bitwise and
function is_even(n)
if n and 1 = 0 then
return true
else
return false
endif
end
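Both variants translate directly to Python, for example (shown only to make the pseudocode concrete):
def is_even_shift(n):
    # divide by 2, multiply back by 2; if nothing was lost, n was even
    return ((n >> 1) << 1) == n

def is_even_and(n):
    # the lowest bit is 0 exactly for even numbers
    return (n & 1) == 0

assert is_even_shift(10) and is_even_and(10)
assert not is_even_shift(7) and not is_even_and(7)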