Find optimal array intersections - algorithm

I have an array of arrays and a matching array. Each array has unique id values.
MatchingArray = [1,2,3,4,5,6]
A1 = [1, 4, 6]
A2 = [2,3,5]
A3 = [1,5]
A4 = [4]
A5 = [1, 6]
I need to find "optimal matchings". An optimal matching is a selection of arrays from A1-A5 that has the maximum possible intersection with MatchingArray while using as few arrays as possible.
For this example there are 2 possible matchings with a maximum intersection: M1 = [[2,3,5], [1, 4, 6]] and M2 = [[1,5], [4], [1, 6]]. But M1.length < M2.length, so the algorithm should output M1.
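Here is a brute-force Python sketch of the definition above, for reference. It assumes that "maximum possible intersection" means the union of the chosen arrays covers as many MatchingArray values as the union of all arrays does. That reading is our interpretation, not something the question states. It tries selections in order of increasing size, so the first hit uses the fewest arrays. The name optimal_matching is hypothetical. The search is exponential, so it only suits small inputs or cross-checking a faster solution.

```python
from itertools import combinations


def optimal_matching(matching, arrays):
    """Brute-force reference (hypothetical helper): return the fewest arrays
    whose union covers as many `matching` values as all arrays together can.

    Assumption: "maximum possible intersection" is read as coverage of the
    union, which is an interpretation of the question, not a given."""
    target = set(matching)
    # The best achievable coverage is what the union of *all* arrays reaches.
    reachable = target & set().union(*arrays)
    # Try selections of increasing size; the first one that reaches the best
    # coverage therefore uses the fewest arrays.
    for size in range(len(arrays) + 1):
        for combo in combinations(arrays, size):
            if (target & set().union(*combo)) == reachable:
                return list(combo)
    return []  # not reached: selecting every array always reaches `reachable`
```

With the question's MatchingArray and A1-A5 it returns [[1, 4, 6], [2, 3, 5]], i.e. M1.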

You could use sets (or hashes, whatever the language calls them) to optimise the time efficiency.
Convert the target array to a set, then subtract the selected source from it (i.e. remove the common values). Keep doing this recursively until the target set is empty. Keep track of the best result (the one using the fewest source arrays). Backtrack as soon as the current selection can no longer end up shorter than the best solution found so far, but keep searching after a solution is found, because a shorter one may still exist.
Here is the code in Python:
def find_optimal_coverage(target, sources):
    max_size = len(target)
    best = None
    def recurse(target, sources, selected):
        nonlocal max_size, best
        if len(target) == 0:
            # every target value is covered: this is the shortest solution so far
            best = selected
            max_size = len(best) - 1
            return
        if len(selected) >= max_size:
            # backtrack: this branch cannot produce a shorter solution
            return
        # keep exploring after a solution is found, so a shorter one can replace it
        for i, source in enumerate(sources):
            recurse(target - source, sources[i+1:], selected + [list(source)])
    target = set(target) # convert to set for faster lookup
    # limit the source lists to elements that occur in the target
    sources = list(map(target.intersection, sources))
    # limit target to elements that occur in at least one source
    target = set.union(*sources)
    # sort sources by decreasing length to maximise probability of
    # finding optimal solution sooner
    sources.sort(key = len, reverse = True)
    recurse(target, sources, [])
    return best
result = find_optimal_coverage(
[1, 2, 3, 4, 5, 6, 8],
[
[1, 4, 6, 7],
[2, 3, 5],
[1, 5],
[4],
[1, 6]
]
)
print(result)
In JavaScript:
// return a copy of set s with the values of arr removed
function subtractArray(s, arr) {
    return arr.reduce( (s, v) => (s.delete(v), s), new Set(s) );
}
function findOptimalCoverage(target, sources) {
    var maxSize = target.length; // target is still an array at this point
    var best = null;
    function recurse(target, sources, selected) {
        if (target.size == 0) {
            // every target value is covered: this is the shortest solution so far
            best = selected;
            maxSize = best.length - 1;
            return;
        }
        // backtrack: this branch cannot produce a shorter solution
        if (selected.length >= maxSize) return;
        // keep exploring after a solution is found, so a shorter one can replace it
        sources.forEach( (source, i) =>
            recurse(subtractArray(target, source), sources.slice(i+1),
                    selected.concat([source]))
        );
    }
    target = new Set(target); // convert to set for faster lookup
    // limit the source arrays to elements that occur in the target
    sources = sources.map( source => source.filter(target.has.bind(target)));
    // limit target to elements that occur in at least one source
    target = new Set([].concat(...sources));
    // sort sources by decreasing length to maximise probability of
    // finding optimal solution sooner
    sources.sort( (a,b) => b.length - a.length );
    recurse(target, sources, []);
    return best;
}
var result = findOptimalCoverage(
[1, 2, 3, 4, 5, 6, 8],
[
[1, 4, 6, 7],
[2, 3, 5],
[1, 5],
[4],
[1, 6]
]
);
console.log(result);

Here is the algorithm implemented in JavaScript:
var matchingArray = [1, 2, 3, 4, 5, 6];
var A1 = [1, 4, 6],
    A2 = [2, 3, 5],
    A3 = [1, 5],
    A4 = [4],
    A5 = [1, 6];
var M = [A1, A2, A3, A4, A5];
function compareArrays(M, machingArray) {
    // for every array, record its length and the share of its elements
    // that also occur in the matching array
    var intersections = [];
    M.forEach(function(A) {
        var partOfItersections;
        if (A.length > 0) {
            var intersectionsCount = getIntersectionCount(A, machingArray);
            partOfItersections = intersectionsCount / A.length;
        } else {
            partOfItersections = 0;
        }
        intersections.push({
            length: A.length,
            partOfItersections: partOfItersections
        });
    });
    // keep the arrays with the highest share, preferring the longest on ties
    var maxLength = 0,
        maxPartOfItersections = 0,
        optimalArrays = [];
    intersections.forEach(function(arrayData, index) {
        var currentArr = M[index];
        var currentArrLength = currentArr.length;
        if (maxPartOfItersections < arrayData.partOfItersections) {
            setCurrentOptimalArr(arrayData.partOfItersections, currentArr);
        } else if (maxPartOfItersections === arrayData.partOfItersections) {
            if (maxLength < currentArrLength) {
                setCurrentOptimalArr(arrayData.partOfItersections, currentArr);
            } else if (maxLength === currentArrLength) {
                optimalArrays.push(currentArr);
            }
        }
    });
    return optimalArrays;
    function setCurrentOptimalArr(intersectionsCount, currentArr) {
        optimalArrays = [currentArr];
        maxLength = currentArr.length;
        maxPartOfItersections = intersectionsCount;
    }
    function getIntersectionCount(A, machingArray) {
        var intersectionCount = 0;
        A.forEach(function(elem) {
            if (machingArray.indexOf(elem) != -1) {
                intersectionCount++;
            }
        });
        return intersectionCount;
    }
}
console.log(JSON.stringify(compareArrays(M, matchingArray))); // [[1,4,6],[2,3,5]]
Count the intersection of each array with the matching array separately.
Return the arrays with the largest share of their elements in that intersection, preferring the longest arrays on ties.
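For comparison, here is a Python sketch of the same scoring rule. The name best_ratio_arrays is hypothetical. It uses fractions.Fraction so that ties compare exactly instead of relying on floating-point equality. Like the JavaScript it mirrors, it ranks arrays individually and does not check whether the returned arrays together cover MatchingArray.

```python
from fractions import Fraction


def best_ratio_arrays(arrays, matching):
    """Hypothetical port of the scoring idea: keep the arrays with the largest
    share of elements found in `matching`, preferring the longest on ties."""
    matching_set = set(matching)
    best, best_ratio, best_len = [], Fraction(-1), -1
    for arr in arrays:
        hits = sum(1 for v in arr if v in matching_set)
        # empty arrays score 0, as in the JavaScript version
        ratio = Fraction(hits, len(arr)) if arr else Fraction(0)
        if ratio > best_ratio or (ratio == best_ratio and len(arr) > best_len):
            best, best_ratio, best_len = [arr], ratio, len(arr)  # new leader
        elif ratio == best_ratio and len(arr) == best_len:
            best.append(arr)  # tie on both share and length
    return best
```

On the question's data every array scores 1, so the two longest, [1, 4, 6] and [2, 3, 5], are returned.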

Related

Java/Kotlin - convert Set to Map<Long, Set<Long>>

I have a Set of Long values:
Set<Long> ids = {1,2,3,4}
What I'd like to achieve is
Set<Map<Long, Set<Long>>>
From this Set of ids I need a Set with 4 elements, like:
Set: {
Map -> key: 1, values: 2,3,4
Map -> key: 2, values: 1,3,4
Map -> key: 3, values: 1,2,4
Map -> key: 4, values: 1,2,3
}
How can I get it with a stream, or maybe with Kotlin's groupBy?
Has anyone built a map like this? (I'm looking for a solution without a for or while loop.)
You can use the map method to transform every element into a Map, then collect the results into a Set:
val set = setOf(1, 2, 3, 4)
val map = set.map { v -> mapOf(v to set.filter { it != v }.toSet()) }
    .toSet()
However, I don't think it's much better than a simple forEach loop in terms of performance or readability.
Opinions on Kotlin's groupBy
Note that groupBy can only split the original set into several disjoint groups (each element ends up in exactly one group). The requested value sets overlap, so it's impossible to construct the requested map directly with the groupBy function.
The solution below uses groupBy to compute result, but result2 is much clearer to read and more intuitive:
fun main() {
    val set = setOf(1, 2, 3, 4)
    val result = set
        .groupBy { it }
        .mapValues { (_, values) -> set.filter { it !in values } }
    println(result) // {1=[2, 3, 4], 2=[1, 3, 4], 3=[1, 2, 4], 4=[1, 2, 3]}
    val result2 = HashMap<Int, List<Int>>().apply {
        set.forEach { this[it] = (set - it).toList() }
    }
    println(result2) // {1=[2, 3, 4], 2=[1, 3, 4], 3=[1, 2, 4], 4=[1, 2, 3]}
}
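Here is a small Python sketch that makes the partition argument concrete. group_by_key is a minimal, hypothetical stand-in for groupBy, and all_but_self builds the requested mapping. Grouping the four ids produces groups holding four entries in total. The requested value sets overlap and hold twelve. A Python set cannot contain dicts because they are unhashable, so the sketch returns one dict instead of a set of one-entry maps.

```python
def all_but_self(values):
    """Map each value to the set of all *other* values (hypothetical helper)."""
    s = set(values)
    return {v: s - {v} for v in s}


def group_by_key(values, key):
    """A minimal groupBy stand-in: every value lands in exactly one group."""
    groups = {}
    for v in values:
        groups.setdefault(key(v), []).append(v)
    return groups


ids = [1, 2, 3, 4]
# groupBy partitions: the group sizes add up to len(ids) == 4 ...
grouped = group_by_key(ids, lambda v: v)
# ... whereas the wanted value sets overlap and hold 4 * 3 == 12 entries in total
wanted = all_but_self(ids)
```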
A possible solution with a for loop would be:
val ids: Set<Long> = setOf(1, 2, 3, 4)
val result: MutableSet<Map<Long, Set<Long>>> = mutableSetOf()
for (id in ids) {
    result.add(mapOf(id to ids.filter { it != id }.toSet()))
}
println(result)

Why does my parallel merge algorithm produce the correct values in all positions of the output except the first?

I am writing a parallel merging algorithm in Rust using scoped-threadpool, but it seems to be producing the correct values in all positions of the output except the first.
I am attempting to adapt the pseudocode from the merge algorithm Wikipedia page:
fn parallel_merge(first: &[i32], second: &[i32], output: &mut [i32]) {
    let mut n = first.len();
    let mut m = second.len();
    let a;
    let b;
    // Make sure that 'first' is the largest of the two to be merged
    if m < n {
        a = first;
        b = second;
    } else {
        a = second;
        b = first;
        let tmp = n;
        n = m;
        m = tmp;
    }
    if m <= 0 {
        return;
    }
    let pivot = n / 2;
    let s = bisect(a[pivot], b);
    let t = pivot + s;
    output[t] = a[pivot];
    let mut pool = Pool::new(2);
    pool.scoped(|scoped| {
        let (left, right) = output.split_at_mut(t);
        scoped.execute(move || {
            parallel_merge(&a[..pivot], &b[..s], left);
        });
        scoped.execute(move || {
            parallel_merge(&a[pivot..], &b[s..], right);
        });
    });
}
When called with first as the slice [1, 3, 5, 7, 9], second as [2, 4, 6, 8, 10] and a slice of ten zeroes as the initial output, output is left as [0, 2, 3, 4, 5, 6, 7, 8, 9, 10].
What is going wrong? As far as I can see, it matches the pseudocode aside from the unnecessary tracking of indexes.
You've misread the algorithm. m is the length of A:
algorithm merge(A[i...j], B[k...ℓ], C[p...q]) is
let m = j - i,
n = ℓ - k
You have it as the length of B:
let mut m = second.len();
The complete example:
use scoped_threadpool::Pool; // 0.1.9
fn parallel_merge(a: &[i32], b: &[i32], output: &mut [i32]) {
    let (a, b) = if a.len() >= b.len() { (a, b) } else { (b, a) };
    if a.is_empty() {
        return;
    }
    let pivot = a.len() / 2;
    let s = match b.binary_search(&a[pivot]) {
        Ok(x) => x,
        Err(x) => x,
    };
    let t = pivot + s;
    let (a_left, a_tail) = a.split_at(pivot);
    let (a_mid, a_right) = a_tail.split_first().unwrap();
    let (b_left, b_right) = b.split_at(s);
    let (o_left, o_tail) = output.split_at_mut(t);
    let (o_mid, o_right) = o_tail.split_first_mut().unwrap();
    *o_mid = *a_mid;
    let mut pool = Pool::new(2);
    pool.scoped(|scoped| {
        scoped.execute(move || parallel_merge(a_left, b_left, o_left));
        scoped.execute(move || parallel_merge(a_right, b_right, o_right));
    });
}
#[test]
fn exercise() {
    let first = [1, 3, 5, 7, 9];
    let second = [2, 4, 6, 8, 10];
    let mut output = [0; 10];
    parallel_merge(&first, &second, &mut output);
    assert_eq!(output, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
}
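Here is a sequential Python sketch of the same divide-and-conquer merge, without the threading and slice-splitting. The name dc_merge and its lo offset parameter are our own. The two recursive calls are independent, which is what lets the Rust version run them on the pool. Look at the base case: the recursion stops only when the longer input is empty. The question's version returned as soon as the shorter input was empty, which left the longer input's elements unwritten.

```python
from bisect import bisect_left


def dc_merge(a, b, out, lo=0):
    """Sequential sketch: write the merge of sorted lists a and b into
    out[lo : lo + len(a) + len(b)] by divide and conquer."""
    # Make `a` the longer input, as in the Wikipedia pseudocode.
    if len(a) < len(b):
        a, b = b, a
    # Base case: only when the *longer* input is empty is there nothing to write.
    if not a:
        return
    pivot = len(a) // 2
    s = bisect_left(b, a[pivot])  # insertion point of a[pivot] in b
    t = pivot + s                 # final position of a[pivot] in the output
    out[lo + t] = a[pivot]
    # These two calls touch disjoint parts of `out`, so they could run in parallel.
    dc_merge(a[:pivot], b[:s], out, lo)
    dc_merge(a[pivot + 1:], b[s:], out, lo + t + 1)
```

With the question's inputs and a list of ten zeroes, out ends up as [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].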

What is the best approach to solve this problem?

If an array contains [1, 10, 3, 5, 2, 7] and k = 2, combine each group of k consecutive numbers into one number, giving {110, 35, 27}. Sort them to get {27, 35, 110}, then split them back into an array: [2, 7, 3, 5, 1, 10].
Here is a way to implement this in JavaScript:
const k = 2;
const arr = [1, 10, 3, 5, 2, 7];
// STEP 1 - Combine every k consecutive numbers, keeping each original group
const groups = [];
for (let i = 0; i < arr.length; i += k) {
    const group = arr.slice(i, i + k);
    groups.push({ group, combined: parseInt(group.join(''), 10) });
}
console.log('STEP1 - combined: \n', groups.map(g => g.combined));
// STEP 2 - Sort by the combined value
groups.sort((a, b) => a.combined - b.combined);
console.log('STEP2 - sorted: \n', groups.map(g => g.combined));
// STEP 3 - Split back into the original numbers; joining everything and
// splitting into single characters would turn 10 into 1, 0
const splitArray = groups.flatMap(g => g.group);
console.log('STEP3 - split: \n', splitArray);
I wasn't sure, though, whether by "combine the set" you really meant to keep only unique values. Let me know.
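Here are the same three steps as a Python sketch. The name sort_groups is hypothetical. It assumes non-negative integers, because a group such as [1, -2] would not concatenate into a valid integer. It keeps duplicates rather than treating the groups as a set. Keeping each group intact, instead of re-splitting a joined string, is what lets multi-digit numbers like 10 survive.

```python
def sort_groups(arr, k):
    """Split arr into consecutive groups of k, sort the groups by the number
    formed by concatenating their elements, and flatten them back out.
    Assumes non-negative integers."""
    groups = [arr[i:i + k] for i in range(0, len(arr), k)]
    # e.g. [1, 10] -> int("110") == 110; the group itself is kept intact
    groups.sort(key=lambda g: int("".join(map(str, g))))
    return [v for g in groups for v in g]
```

For [1, 10, 3, 5, 2, 7] with k = 2 it returns [2, 7, 3, 5, 1, 10].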

How can I select the minimum sub-sequence using LINQ?

If I have an array of golf results:
-3, +5, -3, 0, +1, +8, 0, +6, +2, -8, +5
I need to find a sequence of three adjacent numbers which have the minimum sum. For this example, the sub-sequences would be:
[-3, +5, -3]
[+5, -3, 0]
[-3, 0, +1]
... etc ...
[+2, -8, +5]
And the minimum sequence would be [-3, 0, +1] having a sum of -2.
You could use this LINQ query:
int[] golfResult = { -3, +5, -3, 0, +1, +8, 0, +6, +2, -8, +5 };
var combinations = from i in Enumerable.Range(0, golfResult.Length - 2)
                   select new {
                       i1 = golfResult[i],
                       i2 = golfResult[i + 1],
                       i3 = golfResult[i + 2],
                   };
var min = combinations.OrderBy(x => x.i1 + x.i2 + x.i3).First();
int[] minGolfResult = { min.i1, min.i2, min.i3 }; // -3, 0, +1
Of course, you need to check that the array contains at least three results.
I'm not sure why you would do this with LINQ. I think a straightforward iterative solution is easier to understand:
int[] scores = new[] { -3, 5, -3, 0, 1, 8, 0, 6, 2, -8, 5 };
int minimumSubsequence = int.MaxValue;
int minimumSubsequenceIndex = -1;
for (int i = 0; i < scores.Length - 2; i++)
{
    int sum = scores[i] + scores[i + 1] + scores[i + 2];
    if (sum < minimumSubsequence)
    {
        minimumSubsequence = sum;
        minimumSubsequenceIndex = i;
    }
}
// minimumSubsequenceIndex is index of the first item in the minimum subsequence
// minimumSubsequence is the minimum subsequence's sum.
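For a general window length k, a running sum avoids re-adding every window from scratch. The LINQ Skip/Take version below does O(k) work per position. Here is a Python sketch with the hypothetical name min_window. Like the loop above, it keeps the first window on ties.

```python
def min_window(scores, k=3):
    """Return (start_index, window_sum) of the k adjacent scores with the
    smallest sum, updating a running sum in O(1) per step."""
    if len(scores) < k:
        raise ValueError("need at least k scores")
    window = sum(scores[:k])
    best_sum, best_start = window, 0
    for i in range(k, len(scores)):
        # slide right by one: add the new score, drop the one that left
        window += scores[i] - scores[i - k]
        if window < best_sum:  # strict <, so the first minimum wins
            best_sum, best_start = window, i - k + 1
    return best_start, best_sum
```

For the golf results it returns (2, -2), i.e. [-3, 0, +1].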
If you really want to do it in LINQ, you can go this way:
int length = 3;
var scores = new List<int>() { -3, +5, -3, 0, +1, +8, 0, +6, +2, -8, +5 };
var results =
    scores
        .Select((value, index) => new
        {
            Value = scores.Skip(index - length + 1).Take(length).Sum(),
            Index = index - length + 1
        })
        .Skip(length - 1)
        .OrderBy(x => x.Value)
        .First()
        .Index;
This creates a second sequence in which each element is the sum of the 'length' elements ending at that position. It skips the incomplete windows at the start and then sorts by sum. You have

closest to zero [absolute value] sum of consecutive subsequence of a sequence of real values

This is an algorithmic playground for me! I've seen variations of this problem that tackle the maximum consecutive subsequence, but this is a different variation.
The formal definition:
Given A[1..n], find i and j such that abs(A[i] + A[i+1] + ... + A[j]) is closest to zero among all such pairs.
I'm wondering how to get an O(n log^2 n), or even an O(n log n), solution.
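Calculate the cumulative sums S[0] = 0, S[k] = A[1] + ... + A[k], so that A[i] + ... + A[j] = S[j] - S[i-1].
Sort them.
Find the sequential pair with the least difference. After sorting, the two cumulative sums closest to each other are always adjacent, and their difference is the smallest absolute subsequence sum. The sort dominates, so this is O(n log n).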
function leastSubsequenceSum(values) {
    var n = values.length;
    // Store the cumulative sum along with the index.
    var sums = [];
    sums[0] = { index: 0, sum: 0 };
    for (var i = 1; i <= n; i++) {
        sums[i] = {
            index: i,
            sum: sums[i-1].sum + values[i-1]
        };
    }
    // Sort by cumulative sum
    sums.sort(function (a, b) {
        return a.sum == b.sum ? b.index - a.index : a.sum - b.sum;
    });
    // Find the sequential pair with the least difference.
    var bestI = -1;
    var bestDiff = null;
    for (var i = 1; i <= n; i++) {
        var diff = Math.abs(sums[i-1].sum - sums[i].sum);
        if (bestDiff === null || diff < bestDiff) {
            bestDiff = diff;
            bestI = i;
        }
    }
    // Just to make sure start < stop
    var start = sums[bestI-1].index;
    var stop = sums[bestI].index;
    if (start > stop) {
        var tmp = start;
        start = stop;
        stop = tmp;
    }
    return [start, stop-1, bestDiff];
}
Examples:
>>> leastSubsequenceSum([10, -5, 3, -4, 11, -4, 12, 20]);
[2, 3, 1]
>>> leastSubsequenceSum([5, 6, -1, -9, -2, 16, 19, 1, -4, 9]);
[0, 4, 1]
>>> leastSubsequenceSum([3, 16, 8, -10, -1, -8, -3, 10, -2, -4]);
[6, 9, 1]
In the first example, [2, 3, 1] means that summing from index 2 to 3 (inclusive) gives an absolute sum of 1:
[10, -5, 3, -4, 11, -4, 12, 20]
         ^^^^^
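Off-by-one mistakes are easy to make with cumulative sums, so here is a Python port of the method alongside an O(n²) brute force that can cross-check it. The port is our own, not the answer's code, and both function names are hypothetical. On the three examples above it returns the same triples as leastSubsequenceSum. It breaks ties in the sort differently (by ascending index), so on other inputs it may report a different but equally good range.

```python
def closest_to_zero_prefix(values):
    """Prefix-sum method: return (i, j, abs_sum) where values[i..j]
    (inclusive) has the sum closest to zero. Requires len(values) >= 1."""
    prefix = [(0, 0)]  # (cumulative sum, number of elements summed)
    for idx, v in enumerate(values, start=1):
        prefix.append((prefix[-1][0] + v, idx))
    prefix.sort()
    # After sorting, the closest pair of cumulative sums are neighbours.
    p = min(range(1, len(prefix)), key=lambda q: prefix[q][0] - prefix[q - 1][0])
    (s1, i1), (s2, i2) = prefix[p - 1], prefix[p]
    start, stop = min(i1, i2), max(i1, i2)
    return start, stop - 1, s2 - s1  # s2 >= s1, so this is the absolute sum


def closest_to_zero_brute(values):
    """O(n^2) reference: smallest |sum| over all non-empty contiguous ranges."""
    best = None
    for i in range(len(values)):
        total = 0
        for j in range(i, len(values)):
            total += values[j]
            if best is None or abs(total) < best:
                best = abs(total)
    return best
```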
