How to sort a list of full names by how common the ancestors are? - algorithm

If every letter in the following represents a name. What is the best way to sort them by how common the ancestors are?
A B C D
E F G H
I J K L
M N C D
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L
The result should be:
I J K L # Three names is more important that two names
W J K L
X J K L
A B C D # C D is repeated more than G H
M N C D
O P C D
Q R C D
E F G H
S T G H
U V G H
EDIT:
Names might have spaces in them (Double names).
Consider the following example where each letter represents a single word:
A B C D M
E F G H M
I J K L M
M N C D M
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L
The output should be:
A B C D M
M N C D M
I J K L M
E F G H M
W J K L
X J K L
O P C D
Q R C D
S T G H
U V G H

First count the number of occurrences for each chain. Then rank each name according to that count. Try this:
from collections import defaultdict
words = """A B C D
E F G H
I J K L
M N C D
O P C D
Q R C D
S T G H
U V G H
W J K L
X J K L"""
words = words.split('\n')
# Count ancestors
counters = defaultdict(lambda: defaultdict(lambda: 0))
for word in words:
parts = word.split()
while parts:
counters[len(parts)][tuple(parts)] += 1
parts.pop(0)
# Calculate tuple of ranks, used for sorting
ranks = {}
for word in words:
rank = []
parts = word.split()
while parts:
rank.append(counters[len(parts)][tuple(parts)])
parts.pop(0)
ranks[word] = tuple(rank)
# Sort by ancestor count, longest chain comes first
words.sort(key=lambda word: ranks[word], reverse=True)
print(words)

Here's how you could do it in Java - essentially the same method as #fafl's solution:
static List<Name> sortNames(String[] input)
{
List<Name> names = new ArrayList<>();
for (String name : input)
names.add(new Name(name));
Map<String, Integer> partCount = new HashMap<>();
for (Name name : names)
for (String part : name.parts)
partCount.merge(part, 1, Integer::sum);
for (Name name : names)
for (String part : name.parts)
name.counts.add(partCount.get(part));
Collections.sort(names, new Comparator<Name>()
{
public int compare(Name n1, Name n2)
{
for (int c, i = 0; i < n1.parts.size(); i++)
if ((c = Integer.compare(n2.counts.get(i), n1.counts.get(i))) != 0)
return c;
return 0;
}
});
return names;
}
static class Name
{
List<String> parts = new ArrayList<>();
List<Integer> counts = new ArrayList<>();
Name(String name)
{
List<String> s = Arrays.asList(name.split("\\s+"));
for (int i = 0; i < s.size(); i++)
parts.add(String.join(" ", s.subList(i, s.size())));
}
}
Test:
public static void main(String[] args)
{
String[] input = {
"A B C D",
"W J K L",
"E F G H",
"I J K L",
"M N C D",
"O P C D",
"Q R C D",
"S T G H",
"U V G H",
"X J K L" };
for (Name name : sortNames(input))
System.out.println(name.parts.get(0));
}
Output:
I J K L
W J K L
X J K L
A B C D
M N C D
O P C D
Q R C D
E F G H
S T G H
U V G H

Related

Reverse the isometric projection algorithm

I've got this code:
const a = 2; // always > 0 and known in advance
const b = 3; // always > 0 and known in advance
const c = 4; // always > 0 and known in advance
for (let x = 0; x <= a; x++) {
for (let y = 0; y <= b; y++) {
for (let z = 0; z <= c; z++) {
for (let p = 0; p <= 1; p++) {
for (let q = 0; q <= 2; q++) {
let u = b + x - y + p;
let v = a + b + 2 * c - x - y - 2 * z + q;
let w = c + x + y - z;
}
}
}
}
}
The code generates (a+1)*(b+1)*(c+1)*2*3 triplets of (u,v,w), each of them is unique. And because of that fact, I think it should be possible to write reversed version of this algorithm that will calculate x,y,z,p,q based on u,v,w. I understand that there are only 3 equations and 5 variables to get, but known boundaries for x,y,z,p,q and the fact that all variables are integers should probably help.
for (let u = ?; u <= ?; u++) {
for (let v = ?; v <= ?; v++) {
for (let w = ?; w <= ?; w++) {
x = ?;
y = ?;
z = ?;
p = ?;
q = ?;
}
}
}
I even managed to produce the first line: for (let u = 0; u <= a + b + 1; u++) by taking the equation for u and finding min and max but I'm struggling with moving forward. I understand that min and max values for v are depending on u, but can't figure out the formulas.
Examples are in JS, but I will be thankful for any help in any programming language or even plain math formulas.
If anyone is interested in what this code is actually about - it projects voxel 3d model to triangles on a plain. u,v are resulting 2d coordinates and w is distance from the camera. Reversed algorithm will be actually a kind of raytracing.
UPDATE: Using line equations from 2 points I managed to create minmax conditions for v and code now looks like this:
for (let u = 0; u <= a + b + 1; u++) {
let minv = u <= a ? a - u : -a + u - 1;
let maxv = u <= b ? a + 2 * c + u + 2 : a + 2 * b + 2 * c - u + 3;
for (let v = minv; v <= maxv; v++) {
...
}
}
I think I know what to do with x, y, z, p, q on the last step so the problem left is minw and maxw. As far as I understand those values should depend both on u and v and I must use plane equations?
If the triplets are really unique (didn't check that) and if p and q always go up to 1 and 2 (respectively), then you can "group" triplets together and go up the loop chain.
We'll first find the 3 triplets that where made in the same "q loop" : the triplets make with the same x,y,z,p. As only q change, the only difference will be v, and it will be 3 consecutive numbers.
For that, let's group triplets such that, in a group, all triplets have the same u and same w. Then we sort triplets in groups by their v parameters, and we group them 3 by 3. Inside each group it's easy to assign the correct q variable to each triplet.
Then reduce the groups of 3 into the first triplet (the one with q == 0). We start over to assign the p variable : Group all triplets such that they have same v and w inside a group. Then sort them by the u value, and group them 2 by 2. This let's us find their p value. Remember that each triplet in the group of 3 (before reducing) has that same p value.
Then, for each triplet, we have found p and q. We solve the 3 equation for x,y,z :
z = -1 * ((v + w) - a - b - 3c -q)/3
y = (w - u + z + b - c - p)/2
x = u + y - b - p
After spending some time with articles on geometry and with the huge help from Wolfram Alpha, I managed to write needed equations myself. And yes, I had to use plane equations.
const a = 2; // always > 0 and known in advance
const b = 3; // always > 0 and known in advance
const c = 4; // always > 0 and known in advance
const minu = 0;
const maxu = a + b + 1;
let minv, maxv, minw, maxw;
let x, y, z, p, q;
for (let u = minu; u <= maxu; u++) {
if (u <= a) {
minv = a - u;
} else {
minv = -a + u - 1;
}
if (u <= b) {
maxv = a + 2 * c + u + 2;
} else {
maxv = a + 2 * b + 2 * c - u + 3;
}
for (let v = minv; v <= maxv; v++) {
if (u <= b && v >= a + u + 1) {
minw = (-a + 2 * b - 3 * u + v - 2) / 2;
} else if (u > b && v >= a + 2 * b - u + 2) {
minw = (-a - 4 * b + 3 * u + v - 5) / 2;
} else {
minw = a + b - v;
}
if (u <= a && v <= a + 2 * c - u + 1) {
maxw = (-a + 2 * b + 3 * u + v - 1) / 2;
} else if (u > a && v <= -a + 2 * c + u) {
maxw = (5 * a + 2 * b - 3 * u + v + 2) / 2;
} else {
maxw = a + b + 3 * c - v + 2;
}
minw = Math.round(minw);
maxw = Math.round(maxw);
for (let w = minw; w <= maxw; w++) {
z = (a + b + 3 * c - v - w + 2) / 3;
q = Math.round(2 - (z % 1) * 3);
z = Math.floor(z);
y = (a + 4 * b + q - 3 * u - v + 2 * w + 3) / 6;
p = 1 - (y % 1) * 2;
y = Math.floor(y);
x = (a - 2 * b - 3 * p + q + 3 * u - v + 2 * w) / 6;
x = Math.round(x);
}
}
}
This code passes my tests, but if someone can create better solution, I would be very interested.

Binary splitting algorithm needed for nested hypergeometric-type rational sums

There are some good sources on-line to implement fast summation using binary splitting techniques. For example, Ch. 20, Jörg Arndt Book, (2004), Cheng et al. (2007) and papers from Haible and Papanikolaou (1997) and distributed with the CLN library source code. From this last article, the following notes apply to the evaluation of this kind of linearly convergent series (Type 1)
S = SUM(n=0,+oo,a(n)/b(n)*PROD(k=0,n,p(k)/q(k)))
where a(n), b(n), p(n), q(n) are integers with O(log N) bits. The most often used
case is that a(n), b(n), p(n), q(n) are polynomials in n with integer coefficients. Sum S is computed considering the following sequence of partial series with given two index bounds [n1, n2]
S(n1,n2) = SUM(n=n1,n2-1,a(n)/b(n)*PROD(k=n1,n,p(k)/q(k))
S(n1,n2) are not computed directly. Instead, the product of integers
P = p(n1) ... p(n2-1)
Q = q(n1) ... q(n2−1)
B = b(n1) ... b(n2-1)
T = B Q S
are computed recursively by binary splitting until n2 - n1 < 5 when these are computed directly. Choose an index nm in the middle of n1 and n2, compute the components Pl, Ql, Bl , Tl belonging to the interval n1 =< n < nm, compute the components Pr, Qr , Br, Tr belonging to the interval nm =< n < n2 and set these products and sums
P = Pl Pr
Q = Ql Qr,
B = Bl Br
T = Br Qr Tl + Bl Pl Tr
Finally, this algorithm is applied to n1 = 0 and n2 = nmax = O(N), and a final floating-point division
S = T/(B Q)
is performed.
The bit complexity of computing S with N bits of precision is O((log N)^2 M(N)) where M(N) is the bit complexity of the multiplication of two N-bit numbers.
A slightly modified but more complex series (Type 2) that is found in the last reference above can be also summed by binary splitting. It has an additional inner sum of rationals where c(n) and d(n) are integers with O(log N) bits
U = SUM(n=0,+oo,a(n)/b(n) * PROD(k=0,n,p(k)/q(k)) * SUM(m=0,n,c(m)/d(m)))
We consider these partial sums
U(n1,n2) = SUM(n=n1,n2-1,a(n)/b(n) * PROD(k=n1,n,p(k)/q(k)) * SUM(m=n1,n,c(m)/d(m)))
The algorithm is a variation of the above as follows.
P = p(n1) ... p(n2-1)
Q = q(n1) ... q(n2−1)
B = b(n1) ... b(n2-1)
T = B Q S
D = d(n1) ... d(n2-1)
C = D (c(n1)/d(n1) + ... + c(n2-1)/d(n2-1))
V = D B Q U
If n2 - n1 =< 4 these values are computed directly. If n2 - n1 > 4 they are computed by binary splitting. Choose an index nm in the middle of n1 and n2, compute the components Pl, Ql, Bl , Tl, Dl, Cl, Vl belonging to the interval n1 =< n < nm, compute the components Pr, Qr , Br, Tr, Dr, Cr, Vr belonging to the interval nm =< n < n2 and set these products and sums
P = Pl Pr
Q = Ql Qr
B = Bl Br
T = Br Qr Tl + Bl Pl Tr
D = Dl Dr
C = Cl Dr + Cr Dl
V = Dr Br Qr Vl + Dr Cl Bl Pl Tr + Dl Bl Pl Vr
At last, this algorithm is applied to n1 = 0 and n2 = nmax = O(N), and final floating point divisions are performed
S = T / (B Q)
V = U / (D B Q)
I have programmed both algorithms in Pari-GP and applied them to compute some mathematical constants using Chudnovsky's formula for Pi, this formula for Catalan Constant and more. (I have got more than 1000000 decimal digits in some cases under this platform). This code has been used to compute some difficult series as well.
I want to go one step ahead to accelerate some series by mixing binary splitting algorithm and levin-type sequence transformations. To do this I need to find the binary splitting relationships for a slight extension of these series.
W = SUM(n=0,+oo,a(n)/b(n) * PROD(k=0,n,p(k)/q(k)) * SUM(m=0,n,c(m)/d(m) * PROD(i=0,m,f(i)/g(i))))
It has has an additional product of rationals inside the inner sum where f(n) and g(n) are integers with O(log N) bits. These series are not hypergeometric but they are nested hypergeometric type sums. I think this algorithm might be derived from these partial series
W(n1,n2) = SUM(n=n1,n2-1,a(n)/b(n) * PROD(k=n1,n,p(k)/q(k)) * SUM(m=n1,n,c(m)/d(m) * PROD(i=n1,m,f(i)/g(i))))
I would very much appreciate if someone can derive the product and sum steps to bin-split this type of series.
I will leave the PARI-GP code for computing fast linearly convergent series of type 1 and 2 as explained. Use ?sumbinsplit for help. There are some Testing examples for Type 2 series as well. You can un-comment one of them and use
precision(-log10(abs(sumbinsplit(~F)[1]/s-1)),ceil(log10(Digits())));
to check it.
\\ ANSI COLOR CODES
{
DGreen = Dg = "\e[0;32m";
Brown = Br = "\e[0;33m";
DCyan = Dc = "\e[0;36m";
Purple = Pr = "\e[0;35m";
Gray = Gy = "\e[0;37m";
Red = Rd = "\e[0;91m";
Green = Gr = "\e[0;92m";
Yellow = Yw = "\e[0;93m";
Blue = Bl = "\e[0;94m";
Magenta = Mg = "\e[0;95m";
Cyan = Cy = "\e[0;96m";
Reset = "\e[0m";
White = Wh = "\e[0;97m";
Black = Bk = "\e[0;30m";
}
eps()={ my(e=1.); while(e+1. != 1., e>>=1); e; }
addhelp(eps,Str(Yw,"\n SMALLEST REAL NUMBER\n",Wh,"\n eps() ",Gr,"returns the minimum positive real number for current precision"));
log10(x) = if(x==0,log(eps()),log(x))/log(10);
Digits(n) = if(type(n) == "t_INT" && n > 0,default(realprecision,n); precision(1.), precision(1.));
addhelp(Digits,Str(Yw,"\n DIGITS\n",Wh,"\n Digits(n)",Gr," Sets global precision to",Wh," n",Gr," decimal digits.",Wh," Digits()",Gr," returns current global precision."));
addhelp(BinSplit2,Str(Yw,"\n SERIES BINARY SPLITTING (TYPE 2)\n\n",Wh,"BinSplit(~F,n1,n2)",Gr," for ",Wh,"F = [a(n),b(n),p(n),q(n),c(n),d(n)]",Gr," a vector of ",Br,"t_CLOSUREs",Gr," whose\n components are typically polynomials, computes by binary splitting method sums of type\n\n",Wh,"S2 = sum(n=n1,n2-1,a(n)/b(n)*prod(k=n1,n,p(k)/q(k))*sum(m=n1,n,c(m)/d(m)))\n\n",Gr,"Output: ",Wh," [P,Q,B,T,D,C,V]",Gr," integer valued algorithm computing parameters"));
addhelp(BinSplit1,Str(Yw,"\n SERIES BINARY SPLITTING (TYPE 1)\n\n",Wh,"BinSplit(~F,n1,n2)",Gr," for ",Wh,"F = [a(n),b(n),p(n),q(n)]",Gr," a vector of ",Br,"t_CLOSUREs",Gr," whose components\n are typically polynomials, computes by binary splitting method sums of type\n\n",Wh,"S1 = sum(n=n1,n2-1,a(n)/b(n)*prod(k=n1,n,p(k)/q(k)))\n\n",Gr,"Output: ",Wh,"[P,Q,B,T]",Gr," integer valued algorithm computing parameters"));
BinSplit2(~F, n1, n2) =
{
my( P = 1, Q = 1, B = 1, T = 0, D = 1, C = 0, V = 0,
LP, LQ, LB, LT, LD, LC, LV, RP, RQ, RB, RT, RD, RC, RV,
nm, tmp1 = 1, tmp2, tmp3 );
\\
\\ F = [a(n),b(n),p(n),q(n),c(n),d(n)]
\\
if( n2 - n1 < 5,
\\
\\ then
\\
for ( j = n1, n2-1,
LP = F[3](j);
LQ = F[4](j);
LB = F[2](j);
LD = F[6](j);
LC = F[5](j);
\\
tmp2 = LB * LQ;
tmp3 = LP * F[1](j) * tmp1;
T = T * tmp2 + tmp3;
C = C * LD + D * LC;
V = V * tmp2 * LD + C * tmp3;
P *= LP;
Q *= LQ;
B *= LB;
D *= LD;
tmp1 *= LP * LB;
),
\\
\\ else
\\
nm = (n1 + n2) >> 1;
\\
[RP,RQ,RB,RT,RD,RC,RV] = BinSplit2(~F, nm, n2);
[LP,LQ,LB,LT,LD,LC,LV] = BinSplit2(~F, n1, nm);
\\
tmp1 = RB * RQ;
tmp2 = LB * LP;
tmp3 = LC * RD;
\\
P = LP * RP;
Q = RQ * LQ;
B = LB * RB;
T = LT * tmp1 + RT * tmp2;
D = LD * RD;
C = RC * LD + tmp3;
V = RD * LV * tmp1 + ( RT * tmp3 + LD * RV ) * tmp2;
\\
\\ end if
);
return([P,Q,B,T,D,C,V]);
}
BinSplit1(~F, n1, n2) =
{
my( P = 1, Q = 1, B = 1, T = 0,
LP, LQ, LB, LT, RP, RQ, RB, RT,
tmp1 = 1, nm );
\\
\\ F = [a(n),b(n),p(n),q(n)]
\\
if( n2 - n1 < 5,
\\
\\ then
\\
for ( j = n1, n2-1,
LP = F[3](j);
LQ = F[4](j);
LB = F[2](j);
\\
T = T * LB * LQ + LP * F[1](j) * tmp1;
P *= LP;
Q *= LQ;
B *= LB;
\\
tmp1 *= LP * LB;
),
\\
\\ else
\\
nm = (n1 + n2) >> 1;
\\
[RP,RQ,RB,RT] = BinSplit1(~F, nm, n2);
[LP,LQ,LB,LT] = BinSplit1(~F, n1, nm);
\\
P = LP * RP;
Q = RQ * LQ;
B = LB * RB;
T = LT * RB * RQ + RT * LB * LP;
\\
\\ end if
);
return([P,Q,B,T]);
}
sumbinsplit(~F, n1 = 1, dgs = getlocalprec()) =
{
my( n = #F, P, Q, B, T, D, C, V, [a,b] = F[3..4] );
my( n2 = 1 + ceil(dgs*log(10)/log(abs(pollead(Pol(b(x),x))/pollead(Pol(a(x),x))))) );
\\
if ( n > 4, [P, Q, B, T, D, C, V] = BinSplit2(~F,n1,n2); return(1.*([V/D,T]/B/Q)),\
[P, Q, B, T] = BinSplit1(~F,n1,n2); return(1.*(T/B/Q)));
}
addhelp(sumbinsplit,Str(Yw,"\n LINEARLY CONVERGENT SERIES BINARY SPLITTING SUMMATION\n\n",Wh,"sumbinsplit( ~F, {n1 = 1}, {dgs = getlocalprec()} )\n\n",Gr,"for either ",Wh,"F = [a(n),b(n),p(n),q(n)] ",Gr,"or",Wh," F = [a(n),b(n),p(n),q(n),c(n),d(n)]",Gr," vectors of ",Br,"t_CLOSUREs",Gr," whose\n components are typically polynomials. It computes sums of type 1 or type 2 by binary splitting method\n\n (See BinSplit1, BinSplit2 help)\n\n",Wh,"n1",Gr," starting index (default 1),",Wh," dgs",Gr," result's floating precision\n\n",Yw,"OUTPUT:",Gr," either",Wh," S1",Gr," series value (Type 1) or ",Wh," [S2, S1]",Gr," series values [Type 2, Type1]"));
/* TESTINGS */
/*
Digits(100000);
a = n->1;
b = n->n;
p = n->n*(n<<1-1);
q = n->3*(3*n-1)*(3*n-2);
c = n->1;
d = n->n*(n<<1-1)<<1;
s = log(2)*(3*log(2)+Pi/2)/10-Pi^2/60;
F = [a,b,p,q,c,d];
*/
/*
Digits(100000);
s = -Pi*Catalan/2+33/32*zeta(3)+log(2)*Pi^2/24;
F = [n->1,n->n^2,n->n*(n<<1-1),n->3*(3*n-1)*(3*n-2),n->1,n->n*(n<<1-1)<<1];
\\ precision(-log10(abs(sumbinsplit(~F)[1]/s-1)),ceil(log10(Digits())));
*/
/*
Digits(10000);
a = n->1;
b = n->n<<1+1;
p = n->n<<1-1;
q = n->n<<3;
c = n->1;
d = n->(n<<1-1)^2;
s = Pi^3/648;
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->-1;
b = n->n^3;
p = n->-n;
q = n->(n<<1-1)<<1;
c = n->20*n-9;
d = n->n*(n<<1-1)<<1;
s = 2*Pi^4/75;
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->2;
b = n->n^2;
p = n->n;
q = n->(n<<1-1);
c = n->1;
d = n->n<<1-1;
s = 7*zeta(3)-2*Pi*Catalan;
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->1;
b = n->n^4;
p = n->n;
q = n->(n<<1-1)<<1;
c = n->36*n-17;
d = n->n*(n<<1-1)<<1;
s = 14*zeta(5)/9+5/18*Pi^2*zeta(3);
F = [a,b,p,q,c,d];
*/
/*
Digits(10000);
a = n->1;
b = n->n^2;
p = n->n;
q = n->(n<<1-1)<<1;
c = n->12*n-5;
d = n->n*(n<<1-1)<<1;
s = 5*zeta(3)/3;
F = [a,b,p,q,c,d];
*/
Making a close analysis of the nested hypergeometric series W and partial sums W(n1,n2) I have found the binary splitting relationships,
Write the inner hypergeometric sum in W(n1,n2) as
Z(n1,n2) = SUM(m=n1,n2-1,c(m)/d(m) * PROD(i=n1,m,f(i)/g(i)))
we have
W(n1,n2) = = SUM(n=n1,n2-1,a(n)/b(n) * PROD(k=n1,n,p(k)/q(k)) * Z(n1,n+1))
The algorithm is a deeper variation of the previous one. We have now 8 polynomial functions a(n), b(n), p(n), q(n), c(n), d(n), f(n), g(n). Set these products and sums to be computed by binary splitting
P = p(n1) ... p(n2-1)
Q = q(n1) ... q(n2−1)
B = b(n1) ... b(n2-1)
T = B Q S
D = d(n1) ... d(n2-1)
F = f(n1) ... f(n2-1)
G = g(n1) ... g(n2-1)
C = D G Z
V = D B Q G W
If n2 - n1 =< 4 these values are computed directly. If n2 - n1 > 4 they are computed recursively by binary splitting. Choose an index nm in the middle of n1 and n2, compute the components Pl, Ql, Bl , Tl, Dl, Cl, Vl, Fl, Gl belonging to the interval n1 =< n < nm, compute the components Pr, Qr , Br, Tr, Dr, Cr, Vr, Fr, Gr belonging to the interval nm =< n < n2 and set these products and sums
P = Pl Pr
Q = Ql Qr
B = Bl Br
T = Br Qr Tl + Bl Pl Tr
D = Dl Dr
F = Fl Fr
G = Gl Gr
C = Cl Dr Gr + Cr Dl Fl
V = Dr Br Qr Gr Vl + Dr Cl Bl Pl Gr Tr + Dl Bl Pl Fl Vr
Using auxiliary variables and factorizing, these 27 big integer products can be reduced to just 19. Finally, this algorithm is applied to n1 = 0 and n2 = nmax = O(N), and final floating point divisions are performed. Algorithm provides all 3 sums
S = T / (B Q)
Z = C / (D G)
W = V / (B Q D G)
I will code and test this algorithm to complement this answer. Many convergence acceleration methods (CAM) applied to some slowly convergent series have the structure of series W (for example, some classical CAMs like Salzer's, Gustavson's, sumalt() from Pari GP -Cohen, Rodriguez, Zagier-, Weniger's transformations and several Levin-type CAMs). I believe that the merge of BinSplit and Levin-type sequence transformations should provide a strong boost to this topic. We will see.

Competitive programming using Haskell

I am currently trying to refresh my Haskell knowledge by solving some Hackerrank problems.
For example:
https://www.hackerrank.com/challenges/maximum-palindromes/problem
I've already implemented an imperative solution in C++ which got accepted for all test cases. Now I am trying to come up with a pure functional solution in (reasonably idiomatic) Haskell.
My current code is
module Main where
import Control.Monad
import qualified Data.ByteString.Char8 as C
import Data.Bits
import Data.List
import qualified Data.Map.Strict as Map
import qualified Data.IntMap.Strict as IntMap
import Debug.Trace
-- precompute factorials
compFactorials :: Int -> Int -> IntMap.IntMap Int
compFactorials n m = go 0 1 IntMap.empty
where
go a acc map
| a < 0 = map
| a < n = go a' acc' map'
| otherwise = map'
where
map' = IntMap.insert a acc map
a' = a + 1
acc' = (acc * a') `mod` m
-- precompute invs
compInvs :: Int -> Int -> IntMap.IntMap Int -> IntMap.IntMap Int
compInvs n m facts = go 0 IntMap.empty
where
go a map
| a < 0 = map
| a < n = go a' map'
| otherwise = map'
where
map' = IntMap.insert a v map
a' = a + 1
v = (modExp b (m-2) m) `mod` m
b = (IntMap.!) facts a
modExp :: Int -> Int -> Int -> Int
modExp b e m = go b e 1
where
go b e r
| (.&.) e 1 == 1 = go b' e' r'
| e > 0 = go b' e' r
| otherwise = r
where
r' = (r * b) `mod` m
b' = (b * b) `mod` m
e' = shift e (-1)
-- precompute frequency table
initFreqMap :: C.ByteString -> Map.Map Char (IntMap.IntMap Int)
initFreqMap inp = go 1 map1 map2 inp
where
map1 = Map.fromList $ zip ['a'..'z'] $ repeat 0
map2 = Map.fromList $ zip ['a'..'z'] $ repeat IntMap.empty
go idx m1 m2 inp
| C.null inp = m2
| otherwise = go (idx+1) m1' m2' $ C.tail inp
where
m1' = Map.update (\v -> Just $ v+1) (C.head inp) m1
m2' = foldl' (\m w -> Map.update (\v -> liftM (\c -> IntMap.insert idx c v) $ Map.lookup w m1') w m)
m2 ['a'..'z']
query :: Int -> Int -> Int -> Map.Map Char (IntMap.IntMap Int)
-> IntMap.IntMap Int -> IntMap.IntMap Int -> Int
query l r m freqMap facts invs
| x > 1 = (x * y) `mod` m
| otherwise = y
where
calcCnt cs = cr - cl
where
cl = IntMap.findWithDefault 0 (l-1) cs
cr = IntMap.findWithDefault 0 r cs
f1 acc cs
| even cnt = acc
| otherwise = acc + 1
where
cnt = calcCnt cs
f2 (acc1,acc2) cs
| cnt < 2 = (acc1 ,acc2)
| otherwise = (acc1',acc2')
where
cnt = calcCnt cs
n = cnt `div` 2
acc1' = acc1 + n
r = choose acc1' n
acc2' = (acc2 * r) `mod` m
-- calc binomial coefficient using Fermat's little theorem
choose n k
| n < k = 0
| otherwise = (f1 * t) `mod` m
where
f1 = (IntMap.!) facts n
i1 = (IntMap.!) invs k
i2 = (IntMap.!) invs (n-k)
t = (i1 * i2) `mod` m
x = Map.foldl' f1 0 freqMap
y = snd $ Map.foldl' f2 (0,1) freqMap
main :: IO()
main = do
inp <- C.getLine
q <- readLn :: IO Int
let modulo = 1000000007
let facts = compFactorials (C.length inp) modulo
let invs = compInvs (C.length inp) modulo facts
let freqMap = initFreqMap inp
forM_ [1..q] $ \_ -> do
line <- getLine
let [s1, s2] = words line
let l = (read s1) :: Int
let r = (read s2) :: Int
let result = query l r modulo freqMap facts invs
putStrLn $ show result
It passes all small and medium test cases but I am getting timeout with large test cases.
The key to solve this problem is to precompute some stuff once at the beginning and use them to answer the individual queries efficiently.
Now, my main problem where I need help is:
The initital profiling shows that the lookup operation of the IntMap seems to be the main bottleneck. Is there better alternative to IntMap for memoization? Or should I look at Vector or Array, which I believe will lead to more "ugly" code.
Even in current state, the code doesn't look nice (by functional standards) and as verbose as my C++ solution. Any tips to make it more idiomatic? Other than IntMap usage for memoization, do you spot any other obvious problems which can lead to performance problems?
And is there any good sources, where I can learn how to use Haskell more effectively for competitive programming?
A sample large testcase, where the current code gets timeout:
input.txt
output.txt
For comparison my C++ solution:
#include <vector>
#include <iostream>
#define MOD 1000000007L
long mod_exp(long b, long e) {
long r = 1;
while (e > 0) {
if ((e & 1) == 1) {
r = (r * b) % MOD;
}
b = (b * b) % MOD;
e >>= 1;
}
return r;
}
long n_choose_k(int n, int k, const std::vector<long> &fact_map, const std::vector<long> &inv_map) {
if (n < k) {
return 0;
}
long l1 = fact_map[n];
long l2 = (inv_map[k] * inv_map[n-k]) % MOD;
return (l1 * l2) % MOD;
}
int main() {
std::string s;
int q;
std::cin >> s >> q;
std::vector<std::vector<long>> freq_map;
std::vector<long> fact_map(s.size()+1);
std::vector<long> inv_map(s.size()+1);
for (int i = 0; i < 26; i++) {
freq_map.emplace_back(std::vector<long>(s.size(), 0));
}
std::vector<long> acc_map(26, 0);
for (int i = 0; i < s.size(); i++) {
acc_map[s[i]-'a']++;
for (int j = 0; j < 26; j++) {
freq_map[j][i] = acc_map[j];
}
}
fact_map[0] = 1;
inv_map[0] = 1;
for (int i = 1; i <= s.size(); i++) {
fact_map[i] = (i * fact_map[i-1]) % MOD;
inv_map[i] = mod_exp(fact_map[i], MOD-2) % MOD;
}
while (q--) {
int l, r;
std::cin >> l >> r;
std::vector<long> x(26, 0);
long t = 0;
long acc = 0;
long result = 1;
for (int i = 0; i < 26; i++) {
auto cnt = freq_map[i][r-1] - (l > 1 ? freq_map[i][l-2] : 0);
if (cnt % 2 != 0) {
t++;
}
long n = cnt / 2;
if (n > 0) {
acc += n;
result *= n_choose_k(acc, n, fact_map, inv_map);
result = result % MOD;
}
}
if (t > 0) {
result *= t;
result = result % MOD;
}
std::cout << result << std::endl;
}
}
UPDATE:
DanielWagner's answer has confirmed my suspicion that the main problem in my code was the usage of IntMap for memoization. Replacing IntMap with Array made my code perform similar to DanielWagner's solution.
module Main where
import Control.Monad
import Data.Array (Array)
import qualified Data.Array as A
import qualified Data.ByteString.Char8 as C
import Data.Bits
import Data.List
import Debug.Trace
-- precompute factorials
compFactorials :: Int -> Int -> Array Int Int
compFactorials n m = A.listArray (0,n) $ scanl' f 1 [1..n]
where
f acc a = (acc * a) `mod` m
-- precompute invs
compInvs :: Int -> Int -> Array Int Int -> Array Int Int
compInvs n m facts = A.listArray (0,n) $ map f [0..n]
where
f a = (modExp ((A.!) facts a) (m-2) m) `mod` m
modExp :: Int -> Int -> Int -> Int
modExp b e m = go b e 1
where
go b e r
| (.&.) e 1 == 1 = go b' e' r'
| e > 0 = go b' e' r
| otherwise = r
where
r' = (r * b) `mod` m
b' = (b * b) `mod` m
e' = shift e (-1)
-- precompute frequency table
initFreqMap :: C.ByteString -> Map.Map Char (Array Int Int)
initFreqMap inp = Map.fromList $ map f ['a'..'z']
where
n = C.length inp
f c = (c, A.listArray (0,n) $ scanl' g 0 [0..n-1])
where
g x j
| C.index inp j == c = x+1
| otherwise = x
query :: Int -> Int -> Int -> Map.Map Char (Array Int Int)
-> Array Int Int -> Array Int Int -> Int
query l r m freqMap facts invs
| x > 1 = (x * y) `mod` m
| otherwise = y
where
calcCnt freqMap = cr - cl
where
cl = (A.!) freqMap (l-1)
cr = (A.!) freqMap r
f1 acc cs
| even cnt = acc
| otherwise = acc + 1
where
cnt = calcCnt cs
f2 (acc1,acc2) cs
| cnt < 2 = (acc1 ,acc2)
| otherwise = (acc1',acc2')
where
cnt = calcCnt cs
n = cnt `div` 2
acc1' = acc1 + n
r = choose acc1' n
acc2' = (acc2 * r) `mod` m
-- calc binomial coefficient using Fermat's little theorem
choose n k
| n < k = 0
| otherwise = (f1 * t) `mod` m
where
f1 = (A.!) facts n
i1 = (A.!) invs k
i2 = (A.!) invs (n-k)
t = (i1 * i2) `mod` m
x = Map.foldl' f1 0 freqMap
y = snd $ Map.foldl' f2 (0,1) freqMap
main :: IO()
main = do
inp <- C.getLine
q <- readLn :: IO Int
let modulo = 1000000007
let facts = compFactorials (C.length inp) modulo
let invs = compInvs (C.length inp) modulo facts
let freqMap = initFreqMap inp
replicateM_ q $ do
line <- getLine
let [s1, s2] = words line
let l = (read s1) :: Int
let r = (read s2) :: Int
let result = query l r modulo freqMap facts invs
putStrLn $ show result
I think you've shot yourself in the foot by trying to be too clever. Below I'll show a straightforward implementation of a slightly different algorithm that is about 5x faster than your Haskell code.
Here's the core combinatoric computation. Given a character frequency count for a substring, we can compute the number of maximum-length palindromes this way:
Divide all the frequencies by two, rounding down; call this the div2-frequencies. We'll also want the mod2-frequencies, which is the set of letters for which we had to round down.
Sum the div2-frequencies to get the total length of the palindrome prefix; its factorial gives an overcount of the number of possible prefixes for the palindrome.
Take the product of the factorials of the div2-frequencies. This tells the factor by which we overcounted above.
Take the size of the mod2-frequencies, or choose 1 if there are none. We can extend any of the palindrome prefixes by one of the values in this set, if there are any, so we have to multiply by this size.
For the overcounting step, it's not super obvious to me whether it would be faster to store precomputed inverses for factorials, and take their product, or whether it's faster to just take the product of all the factorials and do one inverse operation at the very end. I'll do the latter, because it just intuitively seems faster to do one inversion per query than one lookup per repeated letter, but what do I know? Should be easy to test if you want to try to adapt the code yourself.
There's only one other quick insight I had vs. your code, which is that we can cache the frequency counts for prefixes of the input; then computing the frequency count for a substring is just pointwise subtraction of two cached counts. Your precomputation on the input I find to be a bit excessive in comparison.
Without further ado, let's see some code. As usual there's some preamble.
module Main where
import Control.Monad
import Data.Array (Array)
import qualified Data.Array as A
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M
import Data.Monoid
Like you, I want to do all my computations on cheap Ints and bake in the modular operations where possible. I'll make a newtype to make sure this happens for me.
newtype Mod1000000007 = Mod Int deriving (Eq, Ord)
instance Num Mod1000000007 where
fromInteger = Mod . (`mod` 1000000007) . fromInteger
Mod l + Mod r = Mod ((l+r) `rem` 1000000007)
Mod l * Mod r = Mod ((l*r) `rem` 1000000007)
negate (Mod v) = Mod ((1000000007 - v) `rem` 1000000007)
abs = id
signum = id
instance Integral Mod1000000007 where
toInteger (Mod n) = toInteger n
quotRem a b = (a * b^1000000005, 0)
I baked in the base of 1000000007 in several places, but it's easy to generalize by giving Mod a phantom parameter and making a HasBase class to pick the base. Ask a fresh question if you're not sure how and are interested; I'll be happy to do a more thorough writeup. There's a few more instances for Mod that are basically uninteresting and primarily needed because of Haskell's wacko numeric class hierarchy:
instance Show Mod1000000007 where show (Mod n) = show n
instance Real Mod1000000007 where toRational (Mod n) = toRational n
instance Enum Mod1000000007 where
toEnum = Mod . (`mod` 1000000007)
fromEnum (Mod n) = n
Here's the precomputation we want to do for factorials...
type FactMap = Array Int Mod1000000007
factMap :: Int -> FactMap
factMap n = A.listArray (0,n) (scanl (*) 1 [1..])
...and for precomputing frequency maps for each prefix, plus getting a frequency map given a start and end point.
type FreqMap = Map Char Int
freqMaps :: String -> Array Int FreqMap
freqMaps s = go where
go = A.listArray (0, length s)
(M.empty : [M.insertWith (+) c 1 (go A.! i) | (i, c) <- zip [0..] s])
substringFreqMap :: Array Int FreqMap -> Int -> Int -> FreqMap
substringFreqMap maps l r = M.unionWith (-) (maps A.! r) (maps A.! (l-1))
Implementing the core computation described above is just a few lines of code, now that we have suitable Num and Integral instances for Mod1000000007:
palindromeCount :: FactMap -> FreqMap -> Mod1000000007
palindromeCount facts freqs
= toEnum (max 1 mod2Freqs)
* (facts A.! sum div2Freqs)
`div` product (map (facts A.!) div2Freqs)
where
(div2Freqs, Sum mod2Freqs) = foldMap (\n -> ([n `quot` 2], Sum (n `rem` 2))) freqs
Now we just need a short driver to read stuff and pass it around to the appropriate functions.
main :: IO ()
main = do
inp <- getLine
q <- readLn
let freqs = freqMaps inp
facts = factMap (length inp)
replicateM_ q $ do
[l,r] <- map read . words <$> getLine
print . palindromeCount facts $ substringFreqMap freqs l r
That's it. Notably I made no attempt to be fancy about bitwise operations and didn't do anything fancy with accumulators; everything is in what I would consider idiomatic purely-functional style. The final count is about half as much code that runs about 5x faster.
P.S. Just for fun, I replaced the last line with print (l+r :: Int)... and discovered that about half the time is spent in read. Ouch! Seems there's still plenty of low-hanging fruit if this isn't fast enough yet.

Send UTF-16 encoded SOAP request with Ruby and Savon

How do I encode the request in UTF-16? Here's what I have:
# Create Savon client
#client = Savon::Client.new do
wsdl.document = File.expand_path("account_list.wsdl", __FILE__)
end
# Set header encoding
#client.http.headers["Content-Type"] = "text/xml;charset=UTF-16"
# Setup ssl configuration
#client.http.auth.ssl.cert_key_file = "cert_key_file.pem"
#client.http.auth.ssl.cert_file = "cert_file.pem"
#client.http.auth.ssl.ca_cert_file = "ca_cert_file.pem"
#client.http.auth.ssl.verify_mode=:none
# Execute request
response = #client.request :account_list do
soap.body = {
:id => "18615618"
}
end
Here's the begging of what's sent, notice the encoding="UTF-8":
Content-Type: text/xml;charset=UTF-16, SOAPAction: "accountList", Content-Length: 888 <?xml version="1.0" encoding="UTF-8"?><env:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema
Here's the error I get:
< s o a p : E n v e l o p e x m l n s : s o a p = " h t t p : / / s c h e m a s . x m l s o a p . o r g / s o a p / e n v e l o p e / " x m l n s : w s>d
< s o a p : B o d y >
< s o a p : F a u l t >
< f a u l t c o d e > s o a p : C l i e n t < / f a u l t c o d e >
< f a u l t s t r i n g > F a i l e d t o p r o c e s s S O A P r e q u e s t . S O A P b o d y n o t i n U T F - 1 6 .
< / f a u l t s t r i n g >
< d e t a i l >
< w s d l _ o p s : e r r o r > F a i l e d t o p r o c e s s S O A P r e q u e s t . S O A P b o d y n o t i n U T F - 1 6 .
< / w s d l _ o p s : e r r o r >
< / d e t a i l >
< / s o a p : F a u l t >
< / s o a p : B o d y >
< / s o a p : E n v e l o p e >
Savon currently only supports changing the XML directive tag via the integrated Builder-method:
response = #client.request(:account_list) do
soap.xml(:xml, :encoding => "UTF-16") { |xml| xml.id("18615618") }
end
You'll miss out a lot of XML-support by using this approach though. No SOAP envelope, no header or body:
<?xml version="1.0" encoding="UTF-16"?><id>18615618</id>
I'll use your ticket to come up with a better solution asap!

Simple equations solving

Think of a equations system like the following:
a* = b + f + g
b* = a + c + f + g + h
c* = b + d + g + h + i
d* = c + e + h + i + j
e* = d + i + j
f* = a + b + g + k + l
g* = a + b + c + f + h + k + l + m
h* = b + c + d + g + i + l + m + n
...
a, b, c, ... element of { 0, 1 }
a*, b*, c*, ... element of { 0, 1, 2, 3, 4, 5, 6, 7, 8 }
+ ... a normal integer addition
Some of the variables a, b, c... a*, b*, c*... are given. I want to calculate as much other variables (a, b, c... but not a*, b*, c*...) as logically possible.
Example:
given: a = 0; b = 0; c = 0;
given: a* = 1; b* = 2; c* = 1;
a* = b + f + g ==> 1 = 0 + f + g ==> 1 = f + g
b* = a + c + f + g + h ==> 2 = 0 + 0 + f + g + h ==> 2 = f + g + h
c* = b + d + g + h + i ==> 1 = 0 + d + g + h + i ==> 1 = d + g + h + i
1 = f + g
2 = f + g + h ==> 2 = 1 + h ==> h = 1
1 = d + g + h + i ==> 1 = d + g + 1 + i ==> d = 0; g = 0; i = 0;
1 = f + g ==> 1 = f + 0 ==> f = 1
other variables calculated: d = 0; f = 1; g = 0; h = 1; i = 0;
Can anybody think of a way to perform this operations automatically?
Brute force may be possible in this example, but later there are about 400 a, b, c... variables and 400 a*, b*, c*... variables.
This sounds a little like constraint propogation. You might find "Solving every Sudoku Puzzle" a good read to get the general idea.
The problem is NP-complete. Look at the system of equations:
2 = a + c + d1
2 = b + c + d2
2 = a + b + c + d3
Assume that d1,d2,d3 are dummy variables that are only used once and hence add no other constraints that di=0 or di=1. Hence from the first equation follows c=1 if a=0. From the second equation follows c=1 if b=0 and from the third one we get c=0 if a=1 and b=1 and hence we get the relation
c = a NAND b.
Thus we can express any boolean circuit using such a system of equations and therefore the boolean satisfyability problem can be reduced to solving such a system of equations.

Resources