Collect partial results from parallel streams - parallel-processing

In Java 8, I am processing pairs of items in two nested parallel streams, as below:
final List<Item> items = getItemList();
final int l = items.size();
List<String> results = Collections.synchronizedList(new ArrayList<String>());
IntStream.range(0, l - 1).parallel().forEach(
    i -> {
        Item item1 = items.get(i);
        int x1 = item1.x;
        IntStream.range(i + 1, l).parallel()
            .forEach(j -> {
                Item item2 = items.get(j);
                int x2 = item2.x;
                if (x1 + x2 < 200) return;
                // code that writes to ConcurrentHashMap defined near results
                if (x1 + x2 > 500) results.add(i + " " + j);
            });
    }
);
Each stream pair writes to a ConcurrentHashMap, and depending on certain conditions it may terminate the stream execution by calling return; or it may write to a synchronized list.
I want to make the streams return the results, like return i + " " + j, and collect those results into a list of strings outside. It should be partial, as returning nothing must be supported (in the case when x1 + x2 < 200).
What would be the most time-efficient (fastest code) way to achieve that?

In this answer, I will not address the time efficiency, because there are correctness problems that should be handled beforehand.
As I said in the comments, it is not possible to stop the stream execution after a certain condition if we parallelize the stream. Otherwise, there might be some pairs (i,j) that are already being executed that are numerically after a pair that triggered the stop condition x1 + x2 < 200.
Another issue is the return; inside the lambda: all it does is skip the second if for the j for which x1 + x2 < 200 holds, and the stream then continues with j+1.
There is no straightforward way to stop a stream in Java, but we can achieve that with allMatch, since we can expect that as soon as it finds a false value, it will short-circuit and return false right away.
So, this would be a correct version of your code:
IntStream.range(0, l - 1).allMatch(i -> {
    int x1 = items.get(i).x;
    return IntStream.range(i + 1, l).allMatch(j -> {
        int x2 = items.get(j).x;
        if (x1 + x2 < 200) {
            return false;
        } else {
            if (x1 + x2 > 500) results2.add(i + " " + j);
            return true;
        }
    });
});
For the following example, with the constructor Item(int x, int y):
final List<Item> items = Arrays.asList(
    new Item(200, 0),
    new Item(100, 0),
    new Item(500, 0),
    new Item(400, 0),
    new Item(1, 0));
The contents of the result list (results2) in my version are:
[0 2, 0 3, 1 2]
With your code (order and elements vary in each execution):
[2 4, 2 3, 1 2, 0 3, 0 2]

I think this will be more efficient (haven't done any micro benchmarking though):
IntStream.range(0, l - 1).forEach(
    i -> IntStream.range(i + 1, l)
        .filter(j -> items.get(i).x + items.get(j).x > 500)
        .forEach(j -> results.add(i + " " + j)));
However, if I were really worried about the time taken to do this, I'd pay more attention to which List implementation is used for items. Perhaps even convert the list to a HashMap<Integer, Item> before getting into the lambda. For example, if items is a LinkedList, any improvement to the lambda may be inconsequential, because items.get() takes time linear in the index and will eat up all the time.

Related

Change x,y from 1,1 to p,q using given rules

The target is a = p, b = q.
In one cycle, either a changes to a = a + b, or b changes to b = b + a.
In any cycle, exactly one of the two operations is performed, never both.
Start from a = 1, b = 1.
Calculate the number of iterations required to convert (a, b) from (1, 1) to (p, q) using the rules above.
Return "not possible" if it cannot be done.
Can anyone tell me how to solve this problem?
As already mentioned in a comment, you can just go backwards. Each operation leaves the changed element larger than the other one, so the larger element must be the one on which the last calculation was performed. So you could just undo one step at a time on the larger element and see whether you end up with (1, 1). Better still, subtract the smaller element from the larger one as many times as needed in one go, so that it becomes smaller than the other one (or equal to it, when the smaller element divides the larger one):
function steps(a, b) {
  let count = 0
  while (a != b) {
    console.log('(' + a + ', ' + b + ')')
    let t
    if (a > b) {
      t = a % b == 0 ? a / b - 1 : Math.floor(a / b)
      a -= t * b
    } else {
      t = b % a == 0 ? b / a - 1 : Math.floor(b / a)
      b -= t * a
    }
    count += t
  }
  if (a == 1)
    return count
  return -1
}
console.log(steps(87, 13))
console.log(steps(23, 69))
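As a sanity check on the backward approach, the sketch below compares it with a brute-force forward search from (1, 1). The names countBackward and countForward are illustrative and not part of the answer; countBackward is the same loop without the trace output, and both functions assume positive integer inputs.

```javascript
// Backward method from the answer above, without the logging.
function countBackward(a, b) {
  let count = 0;
  while (a !== b) {
    let t;
    if (a > b) {
      // Undo t additions at once; if b divides a, stop at a == b instead of 0.
      t = a % b === 0 ? a / b - 1 : Math.floor(a / b);
      a -= t * b;
    } else {
      t = b % a === 0 ? b / a - 1 : Math.floor(b / a);
      b -= t * a;
    }
    count += t;
  }
  return a === 1 ? count : -1;
}

// Brute force: breadth-first search forward from (1, 1).
// Values only grow, so any state with a > p or b > q can be discarded.
function countForward(p, q) {
  let frontier = [[1, 1]];
  const seen = new Set(["1,1"]);
  for (let depth = 0; frontier.length > 0; depth++) {
    const next = [];
    for (const [a, b] of frontier) {
      if (a === p && b === q) return depth;
      for (const [na, nb] of [[a + b, b], [a, a + b]]) {
        const key = na + "," + nb;
        if (na <= p && nb <= q && !seen.has(key)) {
          seen.add(key);
          next.push([na, nb]);
        }
      }
    }
    frontier = next;
  }
  return -1;
}

console.log(countBackward(87, 13), countForward(87, 13));
console.log(countBackward(23, 69), countForward(23, 69));
```

The forward search is only practical for small targets, whereas the backward method takes about as many loop iterations as the Euclidean algorithm.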

Compact way to produce a large sequence of strings in lexical order

I want to generate a sequence of strings with the following properties:
Lexically ordered
Theoretically infinite
Compact over a realistic range
Generated by a simple process of incrementation
Matches the regexp /\w+/
The obvious way to generate a lexically-ordered sequence is to choose a string length and pad the strings with a base value like this: 000000, 000001, etc. This approach poses a trade-off between the number of permutations and compactness: a string long enough to yield many permutations will be filled with many zeros along the way. Plus, the length I choose sets an upper bound on the total number of permutations unless I have some mechanism for expanding the string when it maxes out.
So I came up with a sequence that works like this:
Each string consists of a "head", which is a base-36 number, followed by an underscore, and then the "tail", which is also a base-36 number padded by an increasing number of zeros
The first cycle goes from 0_0 to 0_z
The second cycle goes from 1_00 to 1_zz
The third cycle goes from 2_000 to 2_zzz, and so on
Once the head has reached z and the tail consists of 36 zs, the first "supercycle" has ended. Now the whole sequence starts over, except the z remains at the beginning, so the new cycle starts with z0_0, then continues to z1_00, and so on
The second supercycle goes zz0_0, zz1_00, and so on
Although the string of zs in the head could become unwieldy over the long run, a single supercycle contains over 10^56 permutations, which is far more than I ever expect to use. The sequence is theoretically infinite but very compact within a realistic range. For instance, the trillionth permutation is a succinct 7_bqd55h8s.
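The size claim is easy to check with BigInt arithmetic. The sketch below assumes the scheme exactly as described in the list above (cycle k has a tail of k base-36 digits, for k = 1 to 36); the function name supercycleLength is illustrative.

```javascript
// Number of strings in one supercycle: cycle k (k = 1..base) has base^k tails.
function supercycleLength(base) {
  const b = BigInt(base);
  let total = 0n;
  for (let k = 1n; k <= b; k++) {
    total += b ** k;
  }
  return total;
}

const len36 = supercycleLength(36);
console.log(len36 > 10n ** 56n); // true
console.log(len36.toString().length); // number of decimal digits
```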
I can generate the sequence relatively simply with this JavaScript function:
function genStr (n) {
  n = BigInt(n);
  let prefix = "",
      cycle = 0n,
      max = 36n ** (cycle + 1n);
  while (n >= max) {
    n -= max;
    if (cycle === 35n) {
      prefix += "z";
      cycle = 0n;
    } else {
      cycle++;
    }
    max = 36n ** (cycle + 1n);
  }
  return prefix
    + cycle.toString(36)
    + "_"
    + n.toString(36).padStart(Number(cycle) + 1, "0");
}
The n parameter is a number that I increment and pass to the function to get the next member of the sequence. All I need to keep track of is a simple integer, making the sequence very easy to use.
So obviously I spent a lot of time on this and I think it's pretty good, but I'm wondering if there is a better way. Is there a good algorithm for generating a sequence along the lines of the one I'm looking for?
An idea close to yours (more refined than my first edit).
Let our alphabet be A = {0,1,2,3}.
Let |2| mean we iterate from 0 to 2, and |2|^2 mean we generate the Cartesian product in lexically sorted order (00, 01, 02, 10, ..., 22).
We start with
0 |3|
So we have a string of length 2. We "unshift" the digit 1 which "factorizes" since any 0|3|... is less than 1|3|^2.
1 |3|^2
Same idea: unshift 2, and make words of length 4.
2 |3|^3
Now we can continue and generate
3 |2| |3|^3
Notice |2| and not |3|. Now our maximum number becomes 32333. And as you did, we can now add the carry and start a new supercycle:
33 0|3|
This is a slight improvement, since _ can now be part of our alphabet: we don't need to reserve it as a token separator.
In our case we can represent in a supercycle:
n + n^2 + ... + n^(n-1) + (n-1) * n^(n-1)
\---------------------/   \-------------/
       geometric              special
In your case, the special part would be n^n (with the nuance that you theoretically have one char less, so replace n with n-1 everywhere).
The proposed supercycle has length:
P = n * \sum_{k=0}^{n-2} n^k + (n-1) * n^{n-1}
P = n * \sum_{k=0}^{n-3} n^k + n^n
P = n(n^{n-2} - 1)/(n-1) + n^n
Here is an example diff with alphabet A={0,1,2}
my genStr(grandinero)
,00 0_0
,01 0_1
,02 0_2
,100 1_00
,101 1_01
,102 1_02
,110 1_10
,111 1_11
,112 1_12
,120 1_20
,121 1_21
,122 1_22
,2000 2_000
,2001 2_001
,2002 2_002
,2010 2_010
,2011 2_011
,2012 2_012
,2020 2_020
,2021 2_021
,2022 2_022
,2100 2_100
,2101 2_101
,2102 2_102
,2110 2_110
,2111 2_111
,2112 2_112
,2120 2_120
,2121 2_121
,2122 2_122
22,00 2_200 <-- end of my supercycle if no '_' allowed
22,01 2_201
22,02 2_202
22,100 2_210
22,101 2_211
22,102 2_212
22,110 2_220
22,111 2_221
22,112 2_222 <-- end of yours
22,120 z0_0
That said, for a given number x, we can count how many supercycles (E(x / P), the integer part of x / P) there are, each supercycle adding two leading e (e being the last char of A).
e.g: A = {0,1,2} and x = 43
e = 2
P = n(n^{n-2} - 1)/(n-1) + n^n = 3(3^1 -1)/2 + 27 = 30
// our supercycle is of length 30
E(43/30) = 1 // 43 makes one supercycle and a few more "strings"
r = x % P = 13 // this is also x - (E(43/30) * 30) (the rest of the euclidean division by P)
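The decomposition above can be sketched with BigInt as follows. The function names are illustrative; supercycleBySum adds up the terms directly, supercycleClosedForm uses the closed form for P, and the closed form assumes an alphabet of at least two characters.

```javascript
// P computed term by term: n + n^2 + ... + n^(n-1), plus the special part.
function supercycleBySum(n) {
  const b = BigInt(n);
  let total = 0n;
  for (let k = 1n; k <= b - 1n; k++) {
    total += b ** k;
  }
  return total + (b - 1n) * b ** (b - 1n);
}

// P from the closed form n(n^(n-2) - 1)/(n-1) + n^n (requires n >= 2).
function supercycleClosedForm(n) {
  const b = BigInt(n);
  return (b * (b ** (b - 2n) - 1n)) / (b - 1n) + b ** b;
}

// Split x into completed supercycles and the remainder inside the current one.
function splitIndex(x, n) {
  const P = supercycleClosedForm(n);
  const big = BigInt(x);
  return { supercycles: big / P, remainder: big % P };
}

console.log(supercycleBySum(3), supercycleClosedForm(3)); // 30n 30n
console.log(splitIndex(43, 3)); // { supercycles: 1n, remainder: 13n }
```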
Then for the leftover (r = x % P) there are two cases to consider:
either we fall in the geometric sequence,
or we fall in the (n-1) * n^(n-1) part.
1. Addressing the geometric sequence with cumulative sums (x < S_w)
Let S_i be the cumsum of n, n^2,..
S_i = n\sum_{k = 0}^{i-1} n^k
S_i = n/(n-1)*(n^i - 1)
which gives S_0 = 0, S_1 = n, S_2 = n + n^2...
So basically, if x < S_1, we get 0(x), elif x < S_2, we get 1(x-S_1)
Let S_w = S_{n-1} be the count of all the strings we can represent in the geometric part.
If x < S_w, then we want the i such that
S_i <= x < S_{i+1} <=> n^i <= (n-1)/n * x + 1 < n^{i+1}
We can then take the floor of the base-n logarithm to get that i.
We can then associate the string: A[i] + base_n(x - S_i), with the base_n part left-padded to length i+1.
Illustration:
This time with A = {0,1,2,3}.
Let x be 17.
Our consecutive S_i are:
S_0 = 0
S_1 = 4
S_2 = S_1 + 4^2 = 20
S_3 = S_2 + 4^3 = 84
S_w = S_{4-1} = S_3 = 84
x = 17 is indeed less than 84, so we can assign it to one of the S_i ranges.
In particular, S_1 = 4 <= x = 17 < S_2 = 20.
We remove the strings encoded with the leading 0 (there are S_1 of those strings).
The position to encode with the leading 1 is
x - 4 = 13.
And we conclude that the string at position 13 among those with a leading 1 is base_4(13) = '31', so the full string is '131'.
Had we had x = 21, we would have removed the count S_2, so 21 - 20 = 1, which with a leading 2 gives the string '2001'.
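A minimal sketch of this first case, counting x from 0. It finds i by walking the cumulative sums instead of taking a logarithm, which is a simplification and not the method used in the full code further down; geometricString and toDigits are illustrative names.

```javascript
// Digits of a non-negative BigInt in the given alphabet, left-padded to `width`.
function toDigits(value, alphabet, width) {
  const n = BigInt(alphabet.length);
  let s = "";
  while (value > 0n) {
    s = alphabet[Number(value % n)] + s;
    value /= n;
  }
  return s.padStart(width, alphabet[0]);
}

// String for position x (0-based) inside the geometric part, i.e. x < S_w.
function geometricString(x, alphabet) {
  const n = BigInt(alphabet.length);
  let rest = BigInt(x);
  let i = 0;
  let groupSize = n; // group i holds n^(i+1) strings
  while (rest >= groupSize) {
    rest -= groupSize; // afterwards rest = x - S_(i+1)
    groupSize *= n;
    i++;
  }
  if (i > alphabet.length - 2) {
    throw new RangeError("x is past the geometric part (x >= S_w)");
  }
  return alphabet[i] + toDigits(rest, alphabet, i + 1);
}

console.log(geometricString(17, "0123")); // 131
console.log(geometricString(21, "0123")); // 2001
```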
2. Addressing x in the special part (x >= S_w)
Let's consider the case study below:
with A = {0,1,2}
The special part is
2 |1| |2|^2
that is:
2 0 00
2 0 01
2 0 02
2 0 10
2 0 11
2 0 12
2 0 20
2 0 21
2 0 22
2 1 00
2 1 01
2 1 02
2 1 10
2 1 11
2 1 12
2 1 20
2 1 21
2 1 22
Each value of the second column (here 0 to 1, as specified by |1|) gives 3^2 combinations.
This is similar to the geometric series except that here each range is constant. We want to find the range which means we know which string to prefix.
We can represent it as the matrix
20 (00,01,02,10,11,12,20,21,22)
21 (00,01,02,10,11,12,20,21,22)
The portion in parenthesis is our matrix.
Every item in a row is simply its position base_3 (left-padded with 0).
e.g: n=7 has base_3 value '21'. (7=2*3+1).
'21' does occur in position 7 in the row.
Assuming we get some x (relative to that special part).
E(x / 3^2) gives us the row number (here E(7/9) = 0 so prefix is '20')
x % 3^2 give us the position in the row (here base_3(7%9)='21' giving us the final string '2021')
If we want to observe it, remember that we subtracted S_w = 12 beforehand to get x = 7, so we would call mygen(A)(7 + 12).
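The row and column lookup for the special part can be sketched like this, with x already made relative to the special part (that is, after subtracting S_w) and counted from 0. The name specialString is illustrative.

```javascript
// String for position x (0-based) inside the special part of a supercycle.
function specialString(x, alphabet) {
  const n = BigInt(alphabet.length);
  const rowLength = n ** (n - 1n); // 3^2 = 9 for A = {0,1,2}
  const offset = BigInt(x);
  const row = offset / rowLength; // which second character to use
  if (row > n - 2n) {
    throw new RangeError("x is past the end of the special part");
  }
  let column = offset % rowLength; // position inside the row
  let digits = "";
  while (column > 0n) {
    digits = alphabet[Number(column % n)] + digits;
    column /= n;
  }
  const last = alphabet[alphabet.length - 1];
  return last + alphabet[Number(row)] + digits.padStart(alphabet.length - 1, alphabet[0]);
}

console.log(specialString(7, "012")); // 2021
console.log(specialString(9, "012")); // 2100
```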
Some code
Notice the same output as long as we stand in the "geometric" range, without supercycle.
Obviously, when carry starts to appear, it depends on whether I can use '_' or not. If yes, my words get shorter otherwise longer.
// https://www.cs.sfu.ca/~ggbaker/zju/math/int-alg.html
// \w insensitive could give base64
// but also éè and other accents...
function base_n(x, n, A) {
const a = []
while (x !== 0n) {
a.push(A[Number(x % n)])
x = x / n // auto floor with bigInt
}
return a.reverse().join('')
}
function mygen (A) {
const n = A.length
const bn = BigInt(n)
const A_last = A[A.length-1]
const S = Array(n).fill(0).map((x, i) => bn * (bn ** BigInt(i) - 1n) / (bn - 1n))
const S_w = S[n-1]
const w = S_w + (bn - 1n) * bn ** (bn - 1n)
const w2 = bn ** (bn - 1n)
const flog_bn = x => {
// https://math.stackexchange.com/questions/1627914/smart-way-to-calculate-floorlogx
let L = 0
while (x >= bn) {
L++
x /= bn
}
return L
}
return function (x) {
x = BigInt(x)
let r = x % w
const q = (x - r) / w
let s
if (r < S_w) {
const i = flog_bn(r * (bn - 1n) / bn + 1n)
const r2 = r - S[i]
s = A[i] + base_n(r2, bn, A).padStart(i+1, A[0]) // pad with the alphabet's zero digit
} else {
const n2 = r - S_w
const r2 = n2 % w2
const q2 = (n2 - r2 ) / w2
s = A_last + A[q2] + base_n(r2, bn, A).padStart(n-1, A[0]) // pad with the alphabet's zero digit
}
// comma below __not__ necessary, just to ease seeing cycles
return A_last.repeat(2*Number(q)) +','+ s
}
}
function genStr (A) {
A = A.filter(x => x !== '_')
const bn_noUnderscore = BigInt(A.length)
return function (x) {
x = BigInt(x);
let prefix = "",
cycle = 0n,
max = bn_noUnderscore ** (cycle + 1n);
while (x >= max) {
x -= max;
if (cycle === bn_noUnderscore - 1n) {
prefix += "z";
cycle = 0n;
} else {
cycle++;
}
max = bn_noUnderscore ** (cycle + 1n);
}
return prefix
+ base_n(cycle, bn_noUnderscore, A)
+ "_"
+ base_n(x, bn_noUnderscore, A).padStart(Number(cycle) + 1, 0);
}
}
function test(a, b, x){
console.log(a(x), b(x))
}
{
console.log('---my supercycle is shorter if underscore not used. Plenty of room for grandinero')
const A = '0123456789abcdefghijklmnopqrstuvwxyz'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e4)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
}
{
console.log('---\n my supercycle is greater if underscore is used in my alphabet (not grandinero since "forbidden')
// underscore used
const A = '0123456789abcdefghijklmnopqrstuvwxyz_'.split('').sort((a,b)=>a.localeCompare(b))
let my = mygen(A)
const grandinero = genStr(A)
test(my, grandinero, 1e12)
test(my, grandinero, 106471793335560744271846581685593263893929893610517909620n) // cycle ended for me (w variable value)
test(my, grandinero, 1e57) // still got some place in the supercycle
}
After considering the advice provided by #kaya3 and #grodzi and reviewing my original code, I have made some improvements. I realized a few things:
There was a bug in my original code. If one cycle ends at z_z (actually 36 z's after the underscore, but you get the idea) and the next one begins at z0_0, then lexical ordering is broken because _ comes after 0. The separator (or "neck") needs to be lower in lexical order than the lowest possible value of the head.
Though I was initially resistant to the idea of rolling a custom baseN generator so that more characters can be included, I have now come around to the idea.
I can squeeze more permutations out of a given string length by also incrementing the neck. For example, I can go from A00...A0z to A10...A1z, and so on, thus increasing the number of unique strings I can generate with A as the head before I move on to B.
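The ordering bug from the first point is easy to reproduce. The sketch below uses JavaScript's default string comparison (by UTF-16 code unit); the helper name isLexicallySorted and the shortened sample strings are illustrative, and the hyphen is used only as an example of a character that sorts below "0" (it does not match \w).

```javascript
// True if every string is strictly smaller than its successor
// under JavaScript's default (UTF-16 code unit) string comparison.
function isLexicallySorted(strings) {
  for (let i = 1; i < strings.length; i++) {
    if (!(strings[i - 1] < strings[i])) return false;
  }
  return true;
}

// Supercycle boundary with "_" as separator: "_" (0x5F) sorts after "0" (0x30).
console.log(isLexicallySorted(["z_zz", "z0_0"])); // false
// Same boundary with "-" (0x2D), which sorts before "0".
console.log(isLexicallySorted(["z-zz", "z0-0"])); // true
```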
With that in mind, I have revised my code:
// this is the alphabet used in standard baseN conversions:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
// this is a factory for creating a new string generator:
function sequenceGenerator (config) {
let
// alphabets for the head, neck and body:
headAlpha = config.headAlpha,
neckAlpha = config.neckAlpha,
bodyAlpha = config.bodyAlpha,
// length of the body alphabet corresponds to the
// base of the numbering system:
base = BigInt(bodyAlpha.length),
// if bodyAlpha is identical to an alphabet that
// would be used for a standard baseN conversion,
// then use the built-in method, which should be
// much faster:
convertBody = baseAlpha.startsWith(bodyAlpha)
? (n) => n.toString(bodyAlpha.length)
// otherwise, roll a custom baseN generator:
: function (n) {
let s = "";
while (n > 0n) {
let i = n % base;
s = bodyAlpha[i] + s;
n = n / base;
}
return s;
},
// n is used to cache the last iteration and is
// incremented each time you call `getNext`
// it can optionally be initialized to a value other
// than 0:
n = BigInt(config.start || 0),
// see below:
headCycles = [0n],
cycleLength = 0n;
// the length of the body increases by 1 each time the
// head increments, meaning that the total number of
// permutations increases geometrically for each
// character in headAlpha
// here we cache the maximum number of permutations for
// each length of the body
// since we know these values ahead of time, calculating
// them in advance saves time when we generate a new
// string
// more importantly, it saves us from having to do a
// reverse calculation involving Math.log, which requires
// converting BigInts to Numbers, which breaks the
// program on larger numbers:
for (let i = 0; i < headAlpha.length; i++) {
// the maximum number of permutations depends on both
// the string length (i + 1) and the number of
// characters in neckAlpha, since the string length
// remains the same while the neck increments
cycleLength += BigInt(neckAlpha.length) * base ** BigInt(i + 1);
headCycles.push(cycleLength);
}
// given a number n, this function searches through
// headCycles to find where the total number of
// permutations exceeds n
// this is how we avoid the reverse calculation with
// Math.log to determine which head cycle we are on for
// a given permutation:
function getHeadCycle (n) {
for (let i = 0; i < headCycles.length; i++) {
if (headCycles[i] > n) return i;
}
}
return {
cycleLength: cycleLength,
getString: function (n) {
let cyclesDone = Number(n / cycleLength),
headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(cyclesDone),
nn = n % cycleLength,
headCycle = getHeadCycle(nn),
head = headAlpha[headCycle - 1],
nnn = nn - headCycles[headCycle - 1],
neckCycleLength = BigInt(bodyAlpha.length) ** BigInt(headCycle),
neckCycle = nnn / neckCycleLength,
neck = neckAlpha[Number(neckCycle)],
body = convertBody(nnn % neckCycleLength);
body = body.padStart(headCycle , bodyAlpha[0]);
return prefix + head + neck + body;
},
getNext: function () { return this.getString(n++); }
};
}
let bodyAlpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
getStr = sequenceGenerator({
// achieve more permutations within a supercycle
// with a larger headAlpha:
headAlpha: "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
// the highest value of neckAlpha must be lower than
// the lowest value of headAlpha:
neckAlpha: "0",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:")
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
console.log("");
// here we use a shorter headAlpha and longer neckAlpha
// to shorten the maximum length of the body, but this also
// decreases the number of permutations in the supercycle:
getStr = sequenceGenerator({
headAlpha: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
neckAlpha: "0123456789",
bodyAlpha: bodyAlpha
});
console.log("---supercycle length:");
console.log(Number(getStr.cycleLength));
console.log("---first two values:");
console.log(getStr.getNext());
console.log(getStr.getNext());
console.log("---arbitrary large value (1e57):");
console.log(getStr.getString(BigInt(1e57)));
EDIT
After further discussion with #grodzi, I have made some more improvements:
I realized that the "neck" or separator wasn't providing much value, so I have gotten rid of it. Later edit: actually, the separator is necessary. I am not sure why I thought it wasn't. Without the separator, the beginning of each new supercycle will lexically precede the end of the previous supercycle. I haven't changed my code below, but anyone using this code should include a separator. I have also realized that I was wrong to use an underscore as the separator. The separator must be a character, such as the hyphen, which lexically precedes the lowest digit used in the sequence (0).
I have taken #grodzi's suggestion to allow the length of the tail to continue growing indefinitely.
Here is the new code:
let baseAlpha = "0123456789abcdefghijklmnopqrstuvwxyz";
function sequenceGenerator (config) {
let headAlpha = config.headAlpha,
tailAlpha = config.tailAlpha,
base = BigInt(tailAlpha.length),
convertTail = baseAlpha.startsWith(tailAlpha)
? (n) => n.toString(tailAlpha.length)
: function (n) {
if (n === 0n) return "0";
let s = "";
while (n > 0n) {
let i = n % base;
s = tailAlpha[i] + s;
n = n / base;
}
return s;
},
n = BigInt(config.start || 0);
return {
getString: function (n) {
let cyclesDone = 0n,
headCycle = 0n,
initLength = 0n,
accum = 0n;
for (;; headCycle++) {
let _accum = accum + base ** (headCycle + 1n + initLength);
if (_accum > n) {
n -= accum;
break;
} else if (Number(headCycle) === headAlpha.length - 1) {
cyclesDone++;
initLength += BigInt(headAlpha.length);
headCycle = -1n;
}
accum = _accum;
}
let headLast = headAlpha[headAlpha.length - 1],
prefix = headLast.repeat(Number(cyclesDone)),
head = headAlpha[Number(headCycle)],
tail = convertTail(n),
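// head cycle h (0-based) has base^(h + 1 + initLength) tails, so each tail needs that many digits
tailLength = Number(headCycle + initLength) + 1;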
tail = tail.padStart(tailLength, tailAlpha[0]);
return prefix + head + tail;
},
getNext: function () { return this.getString(n++); }
};
}
let alpha = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz",
genStr = sequenceGenerator({headAlpha: alpha, tailAlpha: alpha});
console.log("--- first string:");
console.log(genStr.getString(0n));
console.log("--- 1e+57");
console.log(genStr.getString(BigInt(1e+57)));
console.log("--- end of first supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)-1n));
console.log("--- start of second supercycle:");
console.log(genStr.getString(63n*(1n-(63n**63n))/(1n-63n)));

The sum of my page ranks converge at 0.9

I'm calculating page ranks for a set of crawled domains, using a damping factor of 0.85. As mentioned in many PageRank papers, the sum of the page ranks should converge to 1. But regardless of how many iterations I do, it seems to converge to 0.90xxx. If I lower the damping factor to 0.5, I move closer to 1, obviously.
Is it bad that the sum of the page ranks converges to 0.90, and what would this generally imply?
Yes, it is bad, since it indicates a bug in your implementation. PageRank gives as a result a probability distribution, and it must sum to 1 as a basic sanity check.
My guess is that you did not handle 'sinks': nodes that have no outgoing links.
Common ways to handle sinks are:
1. For a sink vi, regard all edges (vi,vj) as existing, except vi=vj.
2. Remove them from the graph completely (and repeat until convergence).
3. Link them back to all nodes that linked to them (if vi is a sink, for every edge (vj,vi), add (vi,vj) as well).
Consider the following toy example: 2 pages, A,B. A links to B, B links to nothing. The resulting matrix is:
W=
0 1
0 0
Now, using d=0.85, you get the following equations:
v = 0.85* W'v + 0.15*[1/2,1/2]
v1 = 0.85* (0*v1+0*v2) + 0.15*1/2 = 0.15*1/2 = 0.075
v2 = 0.85*(1*v1 + 0*v2) + 0.15/2 = 0.85v1 + 0.075 = 0.06375 + 0.075 = 0.13875
And the sum is not 1 (it is 0.075 + 0.13875 = 0.21375).
However, if you handle the sinks using one of the suggested approaches (let's examine approach (1)), you will get:
W =
0 1
1 0
You will now get the set of equations:
v = 0.85* W'v + 0.15*[1/2,1/2]
v1 = 0.85*(0*v1 + 1*v2) + 0.15*1/2 = 0.85v2 + 0.075
v2 = 0.85*(1*v1 + 0*v2) + 0.15*1/2 = 0.85v1 + 0.075
-> (divide the second equation by 0.85)
1/0.85*v2 = v1 + 0.075/0.85
-> (add the 2 equations)
1/0.85*v2 + v1 = 0.85v2 + 0.075 + v1 + 0.075/0.85
-> (approximately)
0.326*v2 = 0.163
v2 = 0.5
v1 = 0.85*0.5 + 0.075 = 0.5
As you can see, by using this method we get a probability distribution and now, as expected, the page ranks of all nodes sum to 1.
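The two-page toy example can be reproduced with a short power iteration. This is a sketch under the assumptions of the example (pages numbered from 0, uniform teleport vector, sink handling by approach (1)); the function name pageRank and its parameters are illustrative and unrelated to the asker's code.

```javascript
// links[u] lists the pages that page u links to.
function pageRank(links, d, iterations, handleSinks) {
  const n = links.length;
  let rank = new Array(n).fill(1 / n);
  for (let it = 0; it < iterations; it++) {
    const next = new Array(n).fill((1 - d) / n);
    for (let u = 0; u < n; u++) {
      if (links[u].length > 0) {
        // Split the damped rank of u evenly among its outlinks.
        for (const v of links[u]) {
          next[v] += (d * rank[u]) / links[u].length;
        }
      } else if (handleSinks) {
        // Approach (1): treat the sink as linking to every other page.
        for (let v = 0; v < n; v++) {
          if (v !== u) next[v] += (d * rank[u]) / (n - 1);
        }
      }
    }
    rank = next;
  }
  return rank;
}

const toy = [[1], []]; // A -> B, B is a sink
const leaky = pageRank(toy, 0.85, 50, false);
const repaired = pageRank(toy, 0.85, 50, true);
console.log((leaky[0] + leaky[1]).toFixed(5)); // 0.21375
console.log((repaired[0] + repaired[1]).toFixed(5)); // 1.00000
```

Without sink handling, the rank held by B is simply dropped in every iteration, which is where the missing mass goes.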
This became the algorithm:
// data structures
private HashMap<String, Double> pageRanks;
private HashMap<String, Double> oldRanks;
private HashMap<String, Integer> numberOutlinks;
private HashMap<String, HashMap<String, Integer>> inlinks;
private HashSet<String> domainsWithNoOutlinks;
private double N;

// data parsing omitted

public void startAlgorithm() {
    int maxIterations = 20;
    int itr = 0;
    double d = 0.85;
    double dp = 0;
    double dpp = (1 - d) / N;
    // initialize pagerank
    for (String s : oldRanks.keySet()) {
        oldRanks.put(s, 1.0 / N);
    }
    System.out.println("Starting page rank iterations..");
    while (maxIterations >= itr) {
        System.out.println("Iteration: " + itr);
        dp = 0;
        // teleport probability
        for (String domain : domainsWithNoOutlinks) {
            dp = dp + d * oldRanks.get(domain) / N;
        }
        for (String domain : oldRanks.keySet()) {
            pageRanks.put(domain, dp + dpp);
            for (String inlink : inlinks.get(domain).keySet()) { // for every inlink of domain
                pageRanks.put(domain, pageRanks.get(domain) + inlinks.get(domain).get(inlink) * d * oldRanks.get(inlink) / numberOutlinks.get(inlink));
            }
        }
        // update pageranks with new values
        for (String domain : pageRanks.keySet()) {
            oldRanks.put(domain, pageRanks.get(domain));
        }
        itr++;
    }
}
Where this line is the important one:
pageRanks.put(domain, pageRanks.get(domain) + inlinks.get(domain).get(inlink) * d * oldRanks.get(inlink) / numberOutlinks.get(inlink));
inlinks.get(domain).get(inlink) returns how many times an inlink "liked/referenced" the current domain, and we divide that by the number of outlinks of that inlinking domain (numberOutlinks.get(inlink)). And "inlinks.get(domain).get(inlink)" is what I missed in my algorithm, which is why the sum didn't converge to 1.
Read more: http://www.ccs.northeastern.edu/home/daikeshi/notes/PageRank.pdf

find all subsets that sum to x - using an initial code

I am trying to build upon one problem to solve another, similar problem. Given below is code for finding the total number of subsets that sum to a particular value, and I am trying to modify it so that it returns all subsets that sum to that value (instead of just the count).
Code for finding the total number of subsets that sum to 'sum':
/**
 * method to return number of sets with a given sum.
 **/
public static int count = 0;
public static void countSubsetSum2(int arr[], int k, int sum) {
    if(sum == 0) {
        count++;
        return;
    }
    if(sum != 0 && k == 0) {
        return;
    }
    if(sum < arr[k - 1]) {
        countSubsetSum2(arr, k-1, sum);
    }
    countSubsetSum2(arr, k-1, sum - arr[k-1]);
    countSubsetSum2(arr, k-1, sum);
}
Can someone propose some changes to this code, to make it return the subsets rather than the subset count?
Firstly, your code isn't correct.
The function, at every step, recurses with the sum excluding and including the current element [1], moving on to the next element, thanks to these lines:
countSubsetSum2(arr, k-1, sum - arr[k-1]);
countSubsetSum2(arr, k-1, sum);
But then there's also this:
if(sum < arr[k - 1]) {
    countSubsetSum2(arr, k-1, sum);
}
which causes it to recurse twice with the sum excluding the current element under some circumstances (which it should never do).
Essentially you just need to remove that if-statement.
If all the elements are positive and sum - arr[k-1] < 0, we'd keep going, but we can never get a sum of 0 since the sum can't increase, thus we'd be doing a lot of unnecessary work. So, if the elements are all positive, we can add a check for if(arr[k - 1] <= sum) to the first call to improve the running time. If the elements aren't all positive, the code won't find all sums.
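A compact sketch of the corrected recursion with the pruning check for positive elements, written here in JavaScript; the structure carries over to the Java method directly. It returns the count instead of updating a static variable, and the name countSubsetSum is illustrative.

```javascript
// Counts subsets of arr[0..k-1] that sum to `sum`.
// Assumes every element of arr is a positive integer.
function countSubsetSum(arr, k, sum) {
  if (sum === 0) return 1; // nothing left to add: one valid subset found
  if (k === 0) return 0; // no elements left, but sum is not reached
  // Exclude arr[k-1].
  let count = countSubsetSum(arr, k - 1, sum);
  // Include arr[k-1], but only if it cannot overshoot the target.
  if (arr[k - 1] <= sum) {
    count += countSubsetSum(arr, k - 1, sum - arr[k - 1]);
  }
  return count;
}

console.log(countSubsetSum([1, 4, 5, 6], 4, 10)); // 2, namely {4, 6} and {1, 4, 5}
```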
Now on to printing the sums
If you understand the code well, changing it to print the sums instead should be pretty easy. I suggest you work on understanding it a bit more - trace what the program will do by hand, then trace what you want the program to do.
And a hint for solving the actual problem: On noting that countSubsetSum2(arr, k-1, sum - arr[k-1]); recurses with the sum including the current element (and the other recursive call recurses with the sum excluding the current element), what you should do should become clear.
[1]: Well, technically it's reversed (we start with the target sum and decrease to 0 instead of starting at 0 and increasing to sum), but the same idea is there.
This is the code that works:
import java.util.LinkedList;
import java.util.List;

public class subset {
    public static int count = 0;
    // elements chosen so far on the current recursion path
    public static List<Integer> list = new LinkedList<>();

    public static void countSubsetSum2(int arr[], int k, int sum) {
        if (sum <= 0 || k < 0) {
            count++;
            return;
        }
        if (sum == arr[k]) {
            // arr[k] completes a subset: print it with the elements chosen so far
            System.out.print(arr[k]);
            for (Integer value : list)
                System.out.print("\t" + value);
            System.out.println();
        }
        list.add(arr[k]);
        countSubsetSum2(arr, k - 1, sum - arr[k]);
        list.remove(list.size() - 1);
        countSubsetSum2(arr, k - 1, sum);
    }

    public static void main(String[] args) {
        int[] array = {1, 4, 5, 6};
        countSubsetSum2(array, 3, 10);
    }
}
First off, the code you have there doesn't seem to actually work (I tested it on input [1,2,3, ..., 10] with a sum of 3 and it output 128).
To get it working, first note that you implemented the algorithm in a pretty unorthodox way. Mathematical functions take input and produce output. (Arguably) the most elegant programming functions should also take input and produce output because then we can reason about them as we reason about math.
In your case you don't produce any output (the return type is void) and instead store the result in a static variable. This means it's hard to tell exactly what it means to call countSubsetSum2. In particular, what happens if you call it multiple times? It does something different each time (because the count variable will have a different starting value!) Instead, if you write countSubsetSum2 so that it returns a value then you can define its behavior to be: countSubsetSum2 returns the number of subsets of the input arr[0...k] that sum to sum. And then you can try proving why your implementation meets that specification.
I'm not doing the greatest job of explaining, but I think a more natural way to write it would be:
// Algorithm stops once k is the least element in the array
if (k == 0) {
    if (sum == 0 || sum == arr[k]) {
        // Either we can sum to "sum"
        return 1;
    }
    else {
        // Or we can't sum to "sum"
        return 0;
    }
}
// Otherwise, let's recursively see if we can sum to "sum"
// Any valid subset either includes arr[k]
return countSubsetSum2(arr, k-1, sum - arr[k]) +
    // Or it doesn't
    countSubsetSum2(arr, k-1, sum);
As described above, this function takes an input and outputs a value that we can define and prove to be true mathematically (caveat: it's usually not quite a proof because there are crazy edge cases in most programming languages unfortunately).
Anyways, to get back to your question. The issue with the above code is that it doesn't store any data... it just returns the count. Instead, let's build the actual subsets as we go. In particular, when I say Any valid subset either includes arr[k] I mean... the subset we're generating includes arr[k]; so add it. Below I assumed that the code you wrote above is Java. Hopefully it makes sense:
// requires java.util.ArrayList, java.util.HashSet, java.util.List, java.util.Set
static List<Set<Integer>> genSubsetSum(int[] arr, int k, int sum) {
    // Algorithm stops once k is the least element in the array
    if (k == 0) {
        // Return a list of all of the subsets of {arr[0]} that sum to "sum"
        // There are actually a few edge cases here, so we need to be careful
        // If neither case below applies, we can't sum to "sum" using just
        // arr[0], and the list stays empty
        List<Set<Integer>> ret = new ArrayList<>();
        // First consider if the singleton containing arr[k] could equal sum
        if (sum == arr[k]) {
            Set<Integer> subSet = new HashSet<>();
            subSet.add(arr[k]);
            ret.add(subSet);
        }
        // Now consider the empty set
        if (sum == 0) {
            Set<Integer> subSet = new HashSet<>();
            ret.add(subSet);
        }
        return ret;
    }
    // Otherwise, let's recursively generate subsets summing to "sum"
    // Any valid subset either includes arr[k]
    List<Set<Integer>> subsetsThatNeedKthElement = genSubsetSum(arr, k - 1, sum - arr[k]);
    // Or it doesn't
    List<Set<Integer>> completeSubsets = genSubsetSum(arr, k - 1, sum);
    // Note that subsetsThatNeedKthElement only sum to "sum" - arr[k]... so we need to add
    // arr[k] to each of those subsets to create subsets which sum to "sum"
    // On the other hand, completeSubsets contains subsets which already sum to "sum"
    // so they're "complete"
    // Initialize it with the completed subsets
    List<Set<Integer>> ret = new ArrayList<>(completeSubsets);
    // Now augment the incomplete subsets and add them to the final list
    // (a Set drops repeated values; use a List per subset if arr can contain duplicates)
    for (Set<Integer> subset : subsetsThatNeedKthElement) {
        subset.add(arr[k]);
        ret.add(subset);
    }
    return ret;
}
The code is pretty cluttered with all the comments; but the key point is that this implementation always returns what it's specified to return (a list of sets of ints from arr[0] to arr[k] which sum to whatever sum was passed in).
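For anyone who wants something they can run directly, here is a minimal Python sketch of the same recursion. The name gen_subset_sum and the use of plain lists instead of sets are my own choices for illustration, and it assumes arr is non-empty:

```python
def gen_subset_sum(arr, k, target):
    """Return every subset of arr[0..k] (as a list) whose elements sum to target."""
    if k == 0:
        subsets = []
        if target == arr[0]:
            subsets.append([arr[0]])  # the singleton containing arr[0]
        if target == 0:
            subsets.append([])        # the empty subset
        return subsets
    # Subsets that include arr[k]: solve for target - arr[k], then add arr[k] to each.
    with_k = [s + [arr[k]] for s in gen_subset_sum(arr, k - 1, target - arr[k])]
    # Subsets that skip arr[k] already sum to target.
    without_k = gen_subset_sum(arr, k - 1, target)
    return without_k + with_k


print(gen_subset_sum([1, 2, 3, 4, 5], 4, 9))
```

Each call still returns exactly what it is specified to return, so the recursive results can be combined without any shared state.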
FYI, there is another approach which is "bottom-up" (i.e. doesn't use recursion) which should be more performant. If you implement it that way, then you need to store extra data in static state (a "memoized table")... which is a bit ugly but practical. However, when you implement it this way you need to have a more clever way of generating the subsets. Feel free to ask that question in a separate post after giving it a try.
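To give a rough idea of what the bottom-up version can look like, here is a Python sketch that only counts the subsets (it does not generate them). It assumes non-negative integers and a non-negative target, and it keeps the table in a local list rather than static state; the function name is made up for this example:

```python
def count_subsets_bottom_up(arr, target):
    """Count subsets of arr that sum to target (non-negative integers assumed)."""
    # ways[s] = number of subsets of the elements processed so far that sum to s
    ways = [0] * (target + 1)
    ways[0] = 1  # the empty subset sums to 0
    for value in arr:
        # Walk from high to low so each element is used at most once per subset.
        for s in range(target, value - 1, -1):
            ways[s] += ways[s - value]
    return ways[target]


print(count_subsets_bottom_up([1, 2, 3, 4, 5], 9))
```

The table has target + 1 entries and each element is processed once, so the work grows with len(arr) * target rather than with the number of subsets.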
Based on the comments and suggestions here, I was able to solve the problem this way:
public static int counter = 0;
public static List<List<Integer>> lists = new ArrayList<>();
public static void getSubsetCountThatSumToTargetValue(int[] arr, int k, int targetSum, List<Integer> list) {
if(targetSum == 0) {
counter++;
lists.add(list);
return;
}
if(k <= 0) {
return;
}
getSubsetCountThatSumToTargetValue(arr, k - 1, targetSum, list);
List<Integer> appendedlist = new ArrayList<>();
appendedlist.addAll(list);
appendedlist.add(arr[k - 1]);
getSubsetCountThatSumToTargetValue(arr, k - 1, targetSum - arr[k - 1], appendedlist);
}
The main method looks like this:
public static void main(String[] args) {
int[] arr = {1, 2, 3, 4, 5};
SubSetSum.getSubsetCountThatSumToTargetValue(arr, 5, 9, new ArrayList<Integer>());
System.out.println("Result count: " + counter);
System.out.println("lists: " + lists);
}
Output:
Result count: 3
lists: [[4, 3, 2], [5, 3, 1], [5, 4]]
A Python implementation with k moving from 0 to len(numbers) - 1:
import functools
def sum_of_subsets( numbers, sum_original ):
    def _sum_of_subsets( list, k, sum ):
        if sum < 0 or k == len( numbers ):
            return
        if ( sum == numbers[ k ] ):
            expression = functools.reduce( lambda result, num: str( num ) if len( result ) == 0 else \
                                           "%s + %d" % ( result, num ),
                                           sorted( list + [ numbers[ k ]] ),
                                           '' )
            print( "%d = %s" % ( sum_original, expression ))
            # No return here: subsets that skip numbers[ k ] may still reach the target
        list.append( numbers[ k ] )
        _sum_of_subsets( list, k + 1, sum - numbers[ k ])
        list.pop( -1 )
        _sum_of_subsets( list, k + 1, sum )
    _sum_of_subsets( [], 0, sum_original )
...
sum_of_subsets( [ 8, 6, 3, 4, 2, 5, 7, 1, 9, 11, 10, 13, 12, 14, 15 ], 15 )
...
15 = 1 + 6 + 8
15 = 3 + 4 + 8
15 = 1 + 2 + 4 + 8
15 = 2 + 5 + 8
15 = 7 + 8
15 = 2 + 3 + 4 + 6
15 = 1 + 3 + 5 + 6
15 = 4 + 5 + 6
15 = 2 + 6 + 7
15 = 6 + 9
15 = 1 + 2 + 3 + 4 + 5
15 = 1 + 3 + 4 + 7
15 = 1 + 2 + 3 + 9
15 = 2 + 3 + 10
15 = 3 + 5 + 7
15 = 1 + 3 + 11
15 = 3 + 12
15 = 2 + 4 + 9
15 = 1 + 4 + 10
15 = 4 + 11
15 = 1 + 2 + 5 + 7
15 = 1 + 2 + 12
15 = 2 + 13
15 = 1 + 5 + 9
15 = 5 + 10
15 = 1 + 14
15 = 15

Combinations with multiple containers of varying sizes

If you know what this kind of problem is called, let me know (unless you actually know the answer to the question).
If I have a set Z of objects, is there an algorithm for dividing them up between a bunch of containers (each holding a certain number of objects)?
To slightly complicate the problem, let's assume the set of objects we start with has a subset X. There are |X| containers, and each container must hold a single element of X, in addition to other objects (if it has room).
The best way I can currently think of is to take the set difference of Z and X, let's call it Y. Then we can generate the z choose x combinations, and then expand that out for all possible combinations of x.
Example:
The actual problem is basically generating all events in a space. Suppose we have 2 event triggers (X) and 2 event arguments (Y), where Z = X U Y. Each event must have a trigger, and it can have 0...N arguments (depending on the type of event, but that isn't important for now). A trigger can also be an argument. Clearly, in this situation we can have a single event with one trigger and 3 arguments (one of which is the second trigger).
Our event space is as follows (Trigger[Arguments], + indicates a new event):
X1[] + X2[]
X1[Y1] + X2[]
X1[Y2] + X2[]
X1[] + X2[Y1]
X1[] + X2[Y2]
X1[Y1] + X2[Y2]
X1[Y2] + X2[Y1]
X1[X2]
X1[X2,Y1]
X1[X2,Y2]
X1[X2,Y1,Y2]
X2[X1]
X2[X1,Y1]
X2[X1,Y2]
X2[X1,Y1,Y2]
I'm pretty sure that's all the combinations.
Update:
After thinking a bit more about the problem, I have a few thoughts on constraints and stuff: Rules for creating "events":
1) There is an event for every trigger, and every event must have a trigger
2) Event must have >= 0 arguments
3) Events cannot share arguments
4) Triggers can be used as arguments
For a brute force solution, perhaps one could generate all permutations of the triggers + events and then eliminate results that don't match the above 4 rules, and treat the ordering as grouping of events?
Thanks for any problem names or ideas!
Algorithm:
For all nonempty subsets Triggers of X:
    For all maps from (X \ Triggers) to Triggers:
        For all maps from Y to (Triggers union {None}):
            print the combination, where an assignment of y in Y to None means y is omitted
In Python:
def assignments(xs, ys):
    # All ways of assigning one element of ys to every element of xs
    asgns = [[]]
    for x in xs:
        asgns1 = []
        for y in ys:
            for asgn in asgns:
                asgn1 = asgn[:]
                asgn1.append((x, y))
                asgns1.append(asgn1)
        asgns = asgns1
    return asgns

def combinations(xs, ys):
    xroleasgns = assignments(xs, ('argument', 'trigger'))
    for xroleasgn in xroleasgns:
        triggers = [x for (x, role) in xroleasgn if role == 'trigger']
        if (xs or ys) and not triggers:
            continue
        xargs = [x for (x, role) in xroleasgn if role == 'argument']
        for xargasgn in assignments(xargs, triggers):
            for yargasgn in assignments(ys, [None] + triggers):
                d = dict((x, []) for x in triggers)
                for xarg, t in xargasgn:
                    d[t].append(xarg)
                for yarg, t in yargasgn:
                    if t is not None:
                        d[t].append(yarg)
                print(' + '.join('%s[%s]' % (t, ','.join(args)) for (t, args) in d.items()))
"""
>>> assign.combinations(['X1','X2'],['Y1','Y2'])
X1[X2]
X1[X2,Y1]
X1[X2,Y2]
X1[X2,Y1,Y2]
X2[X1]
X2[X1,Y1]
X2[X1,Y2]
X2[X1,Y1,Y2]
X2[] + X1[]
X2[] + X1[Y1]
X2[Y1] + X1[]
X2[] + X1[Y2]
X2[] + X1[Y1,Y2]
X2[Y1] + X1[Y2]
X2[Y2] + X1[]
X2[Y2] + X1[Y1]
X2[Y1,Y2] + X1[]
"""
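As a cross-check on the enumeration, here is a small Python 3 sketch (math.comb needs 3.8+) that yields the combinations instead of printing them and compares the total against a closed-form count. The names enumerate_events and count_events are made up for this example, and it assumes there is at least one element in xs:

```python
from itertools import combinations, product
from math import comb


def enumerate_events(xs, ys):
    """Yield dicts mapping each chosen trigger to its list of arguments."""
    for size in range(1, len(xs) + 1):
        for triggers in combinations(xs, size):
            rest = [x for x in xs if x not in triggers]
            # Every x that is not a trigger becomes an argument of some trigger.
            for x_targets in product(triggers, repeat=len(rest)):
                # Every y goes to some trigger or is omitted (None).
                for y_targets in product((None,) + triggers, repeat=len(ys)):
                    events = {t: [] for t in triggers}
                    for x, t in zip(rest, x_targets):
                        events[t].append(x)
                    for y, t in zip(ys, y_targets):
                        if t is not None:
                            events[t].append(y)
                    yield events


def count_events(n_x, n_y):
    """Sum over trigger-set sizes t of C(n_x, t) * t**(n_x - t) * (t + 1)**n_y."""
    return sum(comb(n_x, t) * t ** (n_x - t) * (t + 1) ** n_y
               for t in range(1, n_x + 1))


found = list(enumerate_events(['X1', 'X2'], ['Y1', 'Y2']))
print(len(found), count_events(2, 2))
```

For two triggers and two arguments both numbers come out to 17, which matches the number of lines in the output above.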
Here is my Java implementation of over9000's solution to the original problem:
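import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map.Entry;
// Pair is not shown here; it is assumed to be a simple holder class with public fields a and b
public static void main(String[] args) throws Exception {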
ArrayList xs = new ArrayList();
ArrayList ys = new ArrayList();
xs.add("X1");
xs.add("X2");
ys.add("Y1");
ys.add("Y2");
combinations(xs,ys);
}
private static void combinations(ArrayList xs, ArrayList ys) {
ArrayList def = new ArrayList();
def.add("argument");
def.add("trigger");
ArrayList<ArrayList> xroleasgns = assignments(xs, def);
for(ArrayList xroleasgn:xroleasgns){
// create triggers list
ArrayList triggers = new ArrayList();
for(Object o:xroleasgn){
Pair p = (Pair)o;
if("trigger".equals(p.b.toString()))
triggers.add(p.a);
}
if((xs.size()>0 || ys.size()>0) && triggers.size()==0)
continue;
// create xargs list
ArrayList xargs = new ArrayList();
for(Object o:xroleasgn){
Pair p = (Pair)o;
if("argument".equals(p.b.toString()))
xargs.add(p.a);
}
// Get combinations!
for(ArrayList xargasgn:assignments(xargs,triggers)){
ArrayList yTriggers = new ArrayList(triggers);
yTriggers.add(null);
for(ArrayList yargasgn:assignments(ys,yTriggers)){
// d = dict((x, []) for x in triggers)
HashMap<Object,ArrayList> d = new HashMap<Object,ArrayList>();
for(Object x:triggers)
d.put(x, new ArrayList());
for(Object o:xargasgn){
Pair p = (Pair)o;
d.get(p.b).add(p.a);
}
for(Object o:yargasgn){
Pair p = (Pair)o;
if(p.b!=null){
d.get(p.b).add(p.a);
}
}
for(Entry<Object, ArrayList> e:d.entrySet()){
Object t = e.getKey();
ArrayList args = e.getValue();
System.out.print(t+"["+args.toString()+"]"+"+");
}
System.out.println();
}
}
}
}
private static ArrayList<ArrayList> assignments(ArrayList xs, ArrayList def) {
ArrayList<ArrayList> asgns = new ArrayList<ArrayList>();
asgns.add(new ArrayList()); //put an initial empty arraylist
for(Object x:xs){
ArrayList asgns1 = new ArrayList();
for(Object y:def){
for(ArrayList<Object> asgn:asgns){
ArrayList asgn1 = new ArrayList();
asgn1.addAll(asgn);
asgn1.add(new Pair(x,y));
asgns1.add(asgn1);
}
}
asgns = asgns1;
}
return asgns;
}

Resources