Java 8 stream refactoring - java-8

Is it possible to express the following logic more succinctly using Java 8 stream constructs:
public static Set<Pair> findSummingPairsLookAhead(int[] data, int sum){
    Set<Pair> collected = new HashSet<>();
    Set<Integer> lookaheads = new HashSet<>();
    for(int i = 0; i < data.length; i++) {
        int elem = data[i];
        if(lookaheads.contains(elem)) {
            collected.add(new Pair(elem, sum - elem));
        }
        lookaheads.add(sum - elem);
    }
    return collected;
}
Something to the effect of Arrays.stream(data).forEach(...).
Thanks in advance.

An algorithm that involves mutating a state during iteration is not well-suited for streams. However, it is often possible to rethink an algorithm in terms of bulk operations that do not explicitly mutate any intermediate state.
In your case, the task is to collect a set of Pair(x, sum - x) where sum - x appears before x in the list. So, we can first build a map of numbers to the index of their first occurrence in the list and then use that map to filter the list and build the set of pairs:
Map<Integer, Integer> firstIdx = IntStream.range(0, data.length)
    .boxed()
    .collect(toMap(i -> data[i], i -> i, (a, b) -> a));
Set<Pair> result = IntStream.range(0, data.length)
    .filter(i -> firstIdx.containsKey(sum - data[i]))
    .filter(i -> firstIdx.get(sum - data[i]) < i)
    .mapToObj(i -> new Pair(data[i], sum - data[i]))
    .collect(toSet());
You can shorten the two filters by either using && or getOrDefault if you find that clearer.
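For reference, here is the whole approach as a self-contained sketch with the two filters collapsed into a single getOrDefault check. The Pair record is a hypothetical stand-in for the question's Pair class (records need Java 16+; on Java 8 a small class with equals/hashCode would play the same role):

```java
import java.util.*;
import java.util.stream.*;
import static java.util.stream.Collectors.*;

public class SummingPairs {
    // Hypothetical stand-in for the question's Pair type.
    record Pair(int left, int right) {}

    static Set<Pair> findSummingPairsLookAhead(int[] data, int sum) {
        // Map each value to the index of its first occurrence; on duplicates keep the first.
        Map<Integer, Integer> firstIdx = IntStream.range(0, data.length)
                .boxed()
                .collect(toMap(i -> data[i], i -> i, (a, b) -> a));
        // Keep data[i] only if its complement first occurred strictly earlier;
        // the default value i makes the comparison false when the complement is absent.
        return IntStream.range(0, data.length)
                .filter(i -> firstIdx.getOrDefault(sum - data[i], i) < i)
                .mapToObj(i -> new Pair(data[i], sum - data[i]))
                .collect(toSet());
    }

    public static void main(String[] args) {
        System.out.println(findSummingPairsLookAhead(new int[]{1, 2, 3, 4, 5, 6}, 8));
    }
}
```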

It's worth mentioning that your imperative implementation is probably the clearest way to express this logic. But if you really want to implement the same logic using the Java 8 Stream API, you can consider using the .reduce() method, e.g.
import org.apache.commons.lang3.tuple.Pair;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class SummingPairsLookAheadExample {
    public static void main(String[] args) {
        final int[] data = new int[]{1, 2, 3, 4, 5, 6};
        final int sum = 8;
        final Set<Pair> pairs = Arrays.stream(data)
            .boxed()
            .parallel()
            .reduce(
                Pair.of(Collections.synchronizedSet(new HashSet<Pair>()), Collections.synchronizedSet(new HashSet<Integer>())),
                (pair, el) -> doSumming(pair, el, sum),
                (a, b) -> a
            ).getLeft();
        System.out.println(pairs);
    }

    private static synchronized Pair<Set<Pair>, Set<Integer>> doSumming(Pair<Set<Pair>, Set<Integer>> pair, int el, int sum) {
        if (pair.getRight().contains(el)) {
            pair.getLeft().add(Pair.of(el, sum - el));
        }
        pair.getRight().add(sum - el);
        return pair;
    }
}
Output
[(5,3), (6,2)]
The first parameter of .reduce() is the accumulator's initial value; this object is passed to each reduction step. In our case it is a pair of Set<Pair> (the expected result) and Set<Integer> (playing the same role as the lookaheads variable in your example). The second parameter is a lambda (a BiFunction) that does the actual logic (extracted to a separate private method to keep the code compact). The last one is a binary operator that combines partial results. It's pretty verbose, but it does not rely on side effects outside the accumulator. @Eugene pointed out that my previous example had issues with parallel execution, so I've updated this example to be safe in parallel as well. If you don't run it in parallel, you can simply remove the synchronized keyword from the helper method and use regular sets instead of synchronized ones as the initial accumulator values.

Are you trying to get the unique Pairs whose sum equals the specified sum?
Arrays.stream(data).boxed()
    .collect(Collectors.groupingBy(i -> i <= sum / 2 ? i : sum - i, toList())).values().stream()
    .filter(e -> e.size() > 1 && (e.get(0) * 2 == sum || e.stream().anyMatch(i -> i == sum - e.get(0))))
    .map(e -> Pair.of(sum - e.get(0), e.get(0)))
    .collect(toList());
A list of the unique pairs is returned; you can change it to a set with toSet() if you want.

What you have in place is fine (and the java-8 gods are happy). The main problem is that you are relying on side-effects and streams are not very happy about it - they even mention it explicitly in the documentation.
Well I can think of this (I've replaced Pair with SimpleEntry so that I could compile)
public static Set<AbstractMap.SimpleEntry<Integer, Integer>> findSummingPairsLookAhead2(int[] data, int sum) {
    Set<Integer> lookaheads = Collections.synchronizedSet(new HashSet<>());
    return Arrays.stream(data)
        .boxed()
        .map(x -> {
            lookaheads.add(sum - x);
            return x;
        })
        .filter(lookaheads::contains)
        .collect(Collectors.mapping(
            x -> new AbstractMap.SimpleEntry<Integer, Integer>(x, sum - x),
            Collectors.toSet()));
}
But we are still breaking the side-effects property of map - in a safe way, but still bad. Think about people that will come after you and look at this code; they might find it at least weird.
If you don't ever plan to run this in parallel, you could drop the Collections.synchronizedSet - but do that at your own risk.

Related

Is There Some Stream-Only Way To Determine The Index Of The Max Stream Element?

I have a Stream<Set<Integer>> intSetStream.
I can do this on it...
Set<Integer> theSetWithTheMax = intSetStream.max( (x,y)->{ return Integer.compare( x.size(), y.size() ); } ).get( );
...and I get a hold of the Set<Integer> that has the highest number of Integer elements in it.
That's great. But what I really need to know is, is it the 1st Set in that Stream that's the max? Or is it the 10th Set in the Stream? Or the ith Set? Which one of them has the most elements in it?
So my question is: Is there some way — using the Stream API — that I can determine "It was the ith Set in the Stream of Sets that returned the largest value of them all, for the Set.size( ) call"?
The best solution I can think of, is to iterate over the Stream<Set<Integer>> (using intSetStream.iterator()) and do a hand-rolled max( ) calculation. But I'm hoping to learn a more Stream-y way to go about it; if there is such a thing.
You can do this with a custom collector:
int posOfMax = stream.mapToInt(Set::size)
    .collect(() -> new int[] { 0, -1, -1 },
        (a, i) -> { int pos = a[0]++; if(i > a[2]) { a[1] = pos; a[2] = i; } },
        (a1, a2) -> {
            if(a2[2] > a1[2]) { a1[1] = a1[0] + a2[1]; a1[2] = a2[2]; }
            a1[0] += a2[0];
        })[1];
This is the most lightweight solution. Its logic becomes clearer when we use a dedicated class instead of an array:
int posOfMax = stream.mapToInt(Set::size)
    .collect(() -> new Object() { int size = 0, pos = -1, max = -1; },
        (o, i) -> { int pos = o.size++; if(i > o.max) { o.pos = pos; o.max = i; } },
        (a, b) -> {
            if(b.max > a.max) { a.pos = a.size + b.pos; a.max = b.max; }
            a.size += b.size;
        }).pos;
The state object holds the size, which is simply the number of elements encountered so far, the last encountered max value and its position which we update to the previous value of the size if the current element is bigger than the max value. That’s what the accumulator function (the second argument to collect) does.
In order to support arbitrary evaluation orders, i.e. parallel stream, we have to provide a combiner function (the last argument to collect). It merges the state of two partial evaluation into the first state. If the second state’s max value is bigger, we update the first’s max value and the position, whereas we have to add the first state’s size to the second’s position to reflect the fact that both are partial results. Further, we have to update the size to the sum of both sizes.
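To illustrate, here is the array-based variant wrapped in a small runnable harness (the example sets are made up; the three array slots hold the running count, the position of the max, and the max value):

```java
import java.util.*;
import java.util.stream.*;

public class PosOfMaxDemo {
    // Returns the index of the set with the most elements, or -1 for an empty stream.
    static int posOfMax(Stream<Set<Integer>> stream) {
        return stream.mapToInt(Set::size)
            .collect(() -> new int[] { 0, -1, -1 },  // { count, posOfMax, max }
                (a, i) -> { int pos = a[0]++; if (i > a[2]) { a[1] = pos; a[2] = i; } },
                (a1, a2) -> {
                    // shift the second part's position by the first part's element count
                    if (a2[2] > a1[2]) { a1[1] = a1[0] + a2[1]; a1[2] = a2[2]; }
                    a1[0] += a2[0];
                })[1];
    }

    public static void main(String[] args) {
        Stream<Set<Integer>> stream = Stream.<Set<Integer>>of(
            new HashSet<>(Arrays.asList(1, 2, 3)),
            new HashSet<>(Arrays.asList(1, 2)),
            new HashSet<>(Arrays.asList(1, 2, 3, 4, 5)),
            new HashSet<>(Arrays.asList(0)));
        System.out.println(posOfMax(stream)); // 2: the third set is the largest
    }
}
```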
One way to do it is to first map the Stream<Set<Integer>> to a Collection<Integer> where each element is the size of the corresponding Set<Integer>. You can then find the largest size, and get the "index" of the largest set by finding the index of that size in the collection of sizes.
Consider following example:
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IntSetStreamExample {
    public static void main(String[] args) {
        final Stream<Set<Integer>> stream = Stream.of(
            new HashSet<>(Arrays.asList(1, 2, 3)),
            new HashSet<>(Arrays.asList(1, 2)),
            new HashSet<>(Arrays.asList(1, 2, 3, 4, 5)),
            new HashSet<>(Arrays.asList(0)),
            new HashSet<>(Arrays.asList(0, 1, 2, 3, 4, 5)),
            new HashSet<>()
        );
        final List<Integer> result = stream.map(Set::size).collect(Collectors.toList());
        System.out.println("List of number of elements in Stream<Set<Integer>>: " + result);
        final int max = Collections.max(result);
        System.out.println("Largest set contains " + max + " elements");
        final int index = result.indexOf(max);
        System.out.println("Index of the largest set: " + index);
    }
}
The output looks like this:
List of number of elements in Stream<Set<Integer>>: [3, 2, 5, 1, 6, 0]
Largest set contains 6 elements
Index of the largest set: 4
Streams methods are not designed to be aware of the current element iterated.
So I think that your current way - find the Set with the max number of elements and then iterate over the Sets to locate it - is not bad.
As alternative you could first collect the Stream<Set<Integer>> into a List (to have a way to retrieve the index) and use a SimpleImmutableEntry but it seems really overkill :
Stream<Set<Integer>> intSetStream = ...;
List<Set<Integer>> list = intSetStream.collect(Collectors.toList());
SimpleImmutableEntry<Integer, Set<Integer>> entry =
    IntStream.range(0, list.size())
        .mapToObj(i -> new SimpleImmutableEntry<>(i, list.get(i)))
        .max((x, y) -> Integer.compare(x.getValue().size(), y.getValue().size()))
        .get();
Integer index = entry.getKey();
Set<Integer> setWithMaxNbElements = entry.getValue();
Insight provided by @Holger's custom Collector-based solution (on top of my downright shameless plagiarizing of the source code of IntSummaryStatistics.java) inspired a custom Collector-based solution of my own; that might, in turn, inspire others...
public class IndexOfMaxCollector implements IntConsumer {
    private int max = Integer.MIN_VALUE;
    private int maxIdx = -1;
    private int currIdx = 0;

    public void accept( int value ){
        if( value > max )
            maxIdx = currIdx;
        max = Math.max( max, value );
        currIdx++;
    }

    public void combine( IndexOfMaxCollector other ){
        if( other.max > max ){
            maxIdx = other.maxIdx + currIdx;
            max = other.max;
        }
        currIdx += other.currIdx;
    }

    public int getMax( ){ return this.max; }
    public int getIndexOfMax( ){ return this.maxIdx; }
}
...Using that custom Collector, I could take the intSetStream of my OQ and determine the index of the Set<Integer> that contains the highest number of elements, like this...
int indexOfMax = intSetStream.map( Set::size )
    .collect( IndexOfMaxCollector::new,
              IndexOfMaxCollector::accept,
              IndexOfMaxCollector::combine )
    .getIndexOfMax( );
This solution — admittedly not the most "beautiful" — possibly has a teensie bit of an edge over others in both the reusability and understandability stakes.

java 8 streams how to find min difference between elements of 2 lists

I'm totally new to Java 8 streams and am currently trying to solve this task. I have two lists as follows:
List<Integer> list1 = Arrays.asList(5, 11,17,123);
List<Integer> list2 = Arrays.asList(124,14,80);
I want to find the minimum absolute difference between any element of list1 and any element of list2.
The expected result: 1 (124 - 123 = 1)
It's not a problem to implement this with Java 7, but how can I achieve it with Java 8 streams? How can I iterate over each element of list1 and each element of list2 and keep the minimum difference?
Try this one
public static void main(String[] args) {
    List<Integer> list1 = Arrays.asList(5, 11, 17, 123);
    List<Integer> list2 = Arrays.asList(124, 14, 80);
    OptionalInt min = list1.stream()
        .flatMap(n -> list2.stream()
            .map(r -> n - r > 0 ? n - r : r - n))
        .mapToInt(t -> t)
        .min();
    System.out.println(min.getAsInt());
}
Edit (Holger's suggestion, using long to avoid int overflow):
OptionalLong min = list1.stream()
    .flatMapToLong(n -> list2.stream()
        .mapToLong(r -> Math.abs(r - (long) n)))
    .min();
While it is easy to convert a “compare each element of list1 with each element of list2” logic to code using the Stream API and the solution’s source code might be short, it isn’t an efficient solution. If the lists are rather large, you will end up doing n × m operations.
Further, note that the distance between two int values can be up to 2³², which doesn’t fit into the (signed) int value range. So you should use long to express the result, if you’re looking for a general solution.
So a solution may look like this:
public static long smallestDistance(List<Integer> list1, List<Integer> list2) {
    int[] list2Sorted = list2.stream().mapToInt(Integer::intValue).sorted().toArray();
    if(list2Sorted.length == 0) throw new IllegalArgumentException("list2 is empty");
    long smallest = Long.MAX_VALUE;
    for(int i: list1) {
        int pos = Arrays.binarySearch(list2Sorted, i);
        if(pos >= 0) return 0;
        pos ^= -1;
        if(pos < list2Sorted.length) {
            long d = Math.abs(list2Sorted[pos] - (long)i);
            if(d < smallest) smallest = d;
        }
        if(pos > 0) {
            long d = Math.abs(list2Sorted[pos-1] - (long)i);
            if(d < smallest) smallest = d;
        }
    }
    if(smallest == Long.MAX_VALUE) throw new IllegalArgumentException("list1 is empty");
    return smallest;
}
By sorting one list, we can efficiently look up the closest element(s) for each element of the other list. This way, the time complexity reduces from O(n×m) (all cases) to O((n+m)×log(m)) (worst case). As a bonus, it will return immediately if it found a match, as that implies the smallest possible distance of zero.
If you want to test it, consider example input like a list of even and a list of odd numbers:
List<Integer> list1
    = IntStream.range(0, 100000).mapToObj(i -> i*2).collect(Collectors.toList());
List<Integer> list2
    = IntStream.range(0, 100000).mapToObj(i -> i*2+1).collect(Collectors.toList());
Since there is no exact match, short-cutting is not possible, but the different time complexity will become noticeable.
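For a quick sanity check, the method above can be dropped into a minimal harness with the lists from the question (the class name is made up; the method body is copied verbatim from the answer):

```java
import java.util.*;

public class SmallestDistanceDemo {
    static long smallestDistance(List<Integer> list1, List<Integer> list2) {
        int[] list2Sorted = list2.stream().mapToInt(Integer::intValue).sorted().toArray();
        if (list2Sorted.length == 0) throw new IllegalArgumentException("list2 is empty");
        long smallest = Long.MAX_VALUE;
        for (int i : list1) {
            int pos = Arrays.binarySearch(list2Sorted, i);
            if (pos >= 0) return 0;       // exact match: smallest possible distance
            pos ^= -1;                    // decode the insertion point (first element > i)
            if (pos < list2Sorted.length)
                smallest = Math.min(smallest, Math.abs(list2Sorted[pos] - (long) i));
            if (pos > 0)
                smallest = Math.min(smallest, Math.abs(list2Sorted[pos - 1] - (long) i));
        }
        if (smallest == Long.MAX_VALUE) throw new IllegalArgumentException("list1 is empty");
        return smallest;
    }

    public static void main(String[] args) {
        // Lists from the question; nearest pair is 123 and 124.
        System.out.println(smallestDistance(Arrays.asList(5, 11, 17, 123),
                                            Arrays.asList(124, 14, 80))); // 1
    }
}
```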

Finding subsets of items with non-conflicting categories

I have a list of objects, with each Item having a cost and a set of resources associated with it (see below). I'm looking for a way to select a subset from this list based on the combined cost, where each resource is contained at most once (not every resource has to be included, though). The way the subset's combined cost is calculated should be exchangeable (e.g. max, min, avg). If two subsets have the same combined cost, the subset with more items is selected.
Item | cost resources [1..3]
================================
P1 | 0.5 B
P2 | 4 A B C
P3 | 1.5 A B
P4 | 2 C
P5 | 2 A
This would allow for these combinations:
Variant | Items sum
==========================
V1 | P1 P4 P5 4.5
V2 | P2 4
V3 | P3 P4 3.5
For a maximum selection V1 would be selected. The number of items can span from anywhere between 1 and a few dozen, the same is true for the number of resources.
My brute force approach would just sum up the cost of all possible permutations and select the max/min one, but I assume there is a much more efficient way to do this. I'm coding in Java 8 but I'm fine with pseudocode or Matlab.
I found some questions which appeared to be similar (i.e. (1), (2), (3)) but I couldn't quite transfer them to my problem, so forgive me if you think this is a duplicate :/
Thanks in advance!
Clarification
A friend of mine was confused about what kinds of sets I want. No matter how I select my subset in the end, I always want to generate subsets with as many items in them as possible. If I have added P3 to my subset and can add P4 without creating a conflict (that is, a resource is used twice within the subset) then I want P3+P4, not just P3.
Clarification2
"Variants don't have to contain all resources" means that if it's impossible to add an item to fill in a missing resource slot without creating a conflict (because all items with the missing resource also have another resource already present) then the subset is complete.
This problem is NP-hard: even without the "resources" factor, you are dealing with the knapsack problem.
If you can transform your costs to relatively small integers, you may be able to modify the Dynamic Programming solution of Knapsack by adding one more dimension per resource allocated, and have a formula similar to (showing concept, make sure all edge cases work or modify if needed):
D(_,_,2,_,_) = D(_,_,_,2,_) = D(_,_,_,_,2) = -Infinity
D(x,_,_,_,_) = -Infinity x < 0
D(x,0,_,_,_) = 0 //this stop clause is "weaker" than above stop clauses - it can applies only if above don't.
D(x,i,r1,r2,r3) = max{1+ D(x-cost[i],i-1,r1+res1[i],r2+res2[i],r3+res3[i]) , D(x,i-1,r1,r2,r3)}
Where cost is an array of costs, and res1, res2, res3, ... are binary arrays of the resources needed by each item.
Complexity will be O(W*n*2^#resources)
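For small inputs, the brute force the question mentions makes a handy reference to check a DP against. A sketch (the class name and the bitmask encoding are my own): each item's resources are encoded as bits, so a subset is valid when no two selected items share a bit, and ties on cost are broken in favour of more items:

```java
public class SubsetSelector {
    // Enumerates all subsets; returns the best combined cost (here: sum) among
    // subsets whose resource masks are pairwise disjoint.
    static double best(double[] cost, long[] resMask) {
        int n = cost.length;
        double bestSum = Double.NEGATIVE_INFINITY;
        int bestCount = -1;
        for (int subset = 1; subset < (1 << n); subset++) {
            long used = 0;
            double sum = 0;
            int count = 0;
            boolean ok = true;
            for (int i = 0; i < n && ok; i++) {
                if ((subset & (1 << i)) != 0) {
                    if ((used & resMask[i]) != 0) ok = false; // resource conflict
                    else { used |= resMask[i]; sum += cost[i]; count++; }
                }
            }
            // prefer higher sum; on equal sum, prefer more items
            if (ok && (sum > bestSum || (sum == bestSum && count > bestCount))) {
                bestSum = sum;
                bestCount = count;
            }
        }
        return bestSum;
    }

    public static void main(String[] args) {
        // P1..P5 from the question's table, resources encoded as bits A=1, B=2, C=4
        double[] cost = {0.5, 4, 1.5, 2, 2};
        long[] res = {2L, 7L, 3L, 4L, 1L};
        System.out.println(best(cost, res)); // 4.5 (P1 + P4 + P5, i.e. variant V1)
    }
}
```

This is O(2^n · n), so it only works for the "few dozen items" upper end with pruning, but it matches the V1 = 4.5 result from the question's table.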
After giving my problem some more thoughts I came up with a solution I am quite proud of. This solution:
will find all possible complete variants, that is, variants where no additional item can be added without causing a conflict
will also find a few non-complete variants. I can live with that.
can select the final variant by any means you want.
works with non-integer item-values.
I realized that this is indeed not a variant of the knapsack problem, as the items have a value but no weight associated with them (or, you could interpret it as a variant of the multi-dimensional knapsack problem variant but with all weights equal). The code uses some lambda expressions, if you don't use Java 8 you'll have to replace those.
public class BenefitSelector<T extends IConflicting>
{
    public ArrayList<T> select(ArrayList<T> proposals, Function<T, Double> valueFunction)
    {
        if (proposals.isEmpty())
            return null;

        ArrayList<ArrayList<T>> variants = findVariants(proposals);
        double value = 0;
        ArrayList<T> selected = null;
        for (ArrayList<T> v : variants)
        {
            double x = 0;
            for (T p : v)
                x += valueFunction.apply(p);
            if (x > value)
            {
                value = x;
                selected = v;
            }
        }
        return selected;
    }

    private ArrayList<ArrayList<T>> findVariants(ArrayList<T> list)
    {
        ArrayList<ArrayList<T>> ret = new ArrayList<>();
        Conflict c = findConflicts(list);
        if (c == null)
            ret.add(list);
        else
        {
            ret.addAll(findVariants(c.v1));
            ret.addAll(findVariants(c.v2));
        }
        return ret;
    }

    private Conflict findConflicts(ArrayList<T> list)
    {
        // Sort conflicts by the number of items remaining in the first list
        TreeSet<Conflict> ret = new TreeSet<>((c1, c2) -> Integer.compare(c1.v1.size(), c2.v1.size()));
        for (T p : list)
        {
            ArrayList<T> conflicting = new ArrayList<>();
            for (T p2 : list)
                if (p != p2 && p.isConflicting(p2))
                    conflicting.add(p2);

            // If conflicts are found create subsets by
            // - v1: removing p
            // - v2: removing all objects offended by p
            if (!conflicting.isEmpty())
            {
                Conflict c = new Conflict(p);
                c.v1.addAll(list);
                c.v1.remove(p);
                c.v2.addAll(list);
                c.v2.removeAll(conflicting);
                ret.add(c);
            }
        }
        // Return only the conflict with the highest number of elements in v1 remaining.
        // The algorithm seems to behave in such a way that it is sufficient to only
        // descend into this one conflict. As the root list contains all items and we use
        // the remainder of objects there should be no way to miss an item.
        return ret.isEmpty() ? null : ret.last();
    }

    private class Conflict
    {
        /** contains all items from the superset minus the offending object */
        private final ArrayList<T> v1 = new ArrayList<>();
        /** contains all items from the superset minus all offended objects */
        private final ArrayList<T> v2 = new ArrayList<>();
        // Not used right now but useful for debugging
        private final T offender;

        private Conflict(T offender)
        {
            this.offender = offender;
        }
    }
}
Tested with variants of the following setup:
public static void main(String[] args)
{
    BenefitSelector<Scavenger> sel = new BenefitSelector<>();
    ArrayList<Scavenger> proposals = new ArrayList<>();
    proposals.add(new Scavenger("P1", new Resource[] {Resource.B}, 0.5));
    proposals.add(new Scavenger("P2", new Resource[] {Resource.A, Resource.B, Resource.C}, 4));
    proposals.add(new Scavenger("P3", new Resource[] {Resource.C}, 2));
    proposals.add(new Scavenger("P4", new Resource[] {Resource.A, Resource.B}, 1.5));
    proposals.add(new Scavenger("P5", new Resource[] {Resource.A}, 2));
    proposals.add(new Scavenger("P6", new Resource[] {Resource.C, Resource.D}, 3));
    proposals.add(new Scavenger("P7", new Resource[] {Resource.D}, 1));

    ArrayList<Scavenger> result = sel.select(proposals, (p) -> p.value);
    System.out.println(result);
}

private static class Scavenger implements IConflicting
{
    private final String name;
    private final Resource[] resources;
    private final double value;

    private Scavenger(String name, Resource[] resources, double value)
    {
        this.name = name;
        this.resources = resources;
        this.value = value;
    }

    @Override
    public boolean isConflicting(IConflicting other)
    {
        return !Collections.disjoint(Arrays.asList(resources), Arrays.asList(((Scavenger) other).resources));
    }

    @Override
    public String toString()
    {
        return name;
    }
}
This results in [P1(B), P5(A), P6(CD)] with a combined value of 5.5, which is higher than any other combination (e.g. [P2(ABC), P7(D)]=5). As variants aren't lost until they are selected dealing with equal variants is easy as well.

Ordered insertion working sporadically with primitive types & strings

For an assignment, we've been asked to implement both ordered and unordered versions of LinkedLists as Bags in Java. The ordered versions simply extend the unordered implementations while overriding the insertion methods.
The ordering on insertion function works... somewhat. Given a test array of
String[] testArray= {"z","g","x","v","y","t","s","r","w","q"};
the output is
q w r s t y v x g z
when it should be
g q r s t v w x y z
However, the ordering works fine when the elements aren't mixed up in value. For example, I originally used the testArray[] above with the alphabet reversed, and the ordering was exactly as it should be.
My add function is
@Override
public void add(E e){
    Iter iter = new Iter(head.prev);
    int compValue;
    E currentItem = null;
    // empty list, add at first position
    if (size < 1)
        iter.add(e);
    else {
        while (iter.hasNext()){
            currentItem = iter.next(); // gets next item
            // saves on multiple compareTo calls
            compValue = e.compareTo(currentItem);
            // adds at given location
            if (compValue <= 0)
                iter.add(e, iter.index);
            else // moves on
                currentItem = iter.next();
        }
    }
}
The iterator functionality is implemented as
// decided to use iterator to simplify method functionality
protected class Iter implements Iterator<E>, ListIterator<E>{
    protected int index = 0;
    protected Node current = null;

    // Sets a new iterator to the index point provided
    public Iter(int index){
        current = head.next;
        this.index = 0;
        while (index > nextIndex()) // moves on to the index point
            next();
    }

    public void add(E e, int index){
        size++;
        Iter iterator = new Iter(index);
        Node node = new Node();
        Node current = iterator.current.prev;
        node.next = current.next;
        node.prev = current;
        node.next.prev = node;
        node.prev.next = node;
        node.item = e;
    }
As it is right now, the only things being used are primitive types. I know for objects, a specific comparable class will have to be written, but in this case, String contains a compareTo() method that should give correct ordering.
By chance, a classmate of mine has a similar implementation and is returning the same results.
Using natural ordering, how can I resolve this problem?
Three things about your add() function jump out at me:
It should exit the loop as soon as it inserts the new value; this might not actually be a problem, but it is inefficient to keep looking
You call iter.next() at the top of the loop, but call it AGAIN if the value isn't added, so every pass that "moves on" skips an element
If your list has just 1 value in it, and you try to add a value larger than the one currently in the list, won't the new value fail to be added?
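Putting all three fixes together, the insertion logic can be sketched against a plain ListIterator (a simplified stand-in for the assignment's Node-based Iter, so the class and method names here are made up):

```java
import java.util.*;

public class SortedInsert {
    static <E extends Comparable<E>> void add(List<E> list, E e) {
        ListIterator<E> it = list.listIterator();
        while (it.hasNext()) {
            if (e.compareTo(it.next()) <= 0) {
                // step back so the new element lands before the larger one
                it.previous();
                it.add(e);
                return;         // fix 1: stop as soon as the value is inserted
            }
            // fix 2: next() is called exactly once per loop iteration
        }
        it.add(e);              // fix 3: a value larger than all others goes at the end
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>();
        for (String s : new String[]{"z", "g", "x", "v", "y", "t", "s", "r", "w", "q"})
            add(list, s);
        System.out.println(list); // [g, q, r, s, t, v, w, x, y, z]
    }
}
```

With the question's test array this produces the expected natural ordering.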

LINQ implementation of Cartesian Product with pruning

I hope someone is able to help me with what is, at least to me, quite a tricky algorithm.
The Problem
I have a List (1 <= size <= 5, but size unknown until run-time) of Lists (1 <= size <= 2) that I need to combine. Here is an example of what I am looking at:-
ListOfLists = { {1}, {2,3}, {2,3}, {4}, {2,3} }
So, there are 2 stages to what I need to do:-
(1). I need to combine the inner lists in such a way that any combination has exactly ONE item from each list, that is, the possible combinations in the result set here would be:-
1,2,2,4,2
1,2,2,4,3
1,2,3,4,2
1,2,3,4,3
1,3,2,4,2
1,3,2,4,3
1,3,3,4,2
1,3,3,4,3
The Cartesian Product takes care of this, so stage 1 is done.....now, here comes the twist which I can't figure out - at least I can't figure out a LINQ way of doing it (I am still a LINQ noob).
(2). I now need to filter out any duplicate results from this Cartesian Product. A duplicate in this case constitutes any line in the result set with the same quantity of each distinct list element as another line, that is,
1,2,2,4,3 is the "same" as 1,3,2,4,2
because each distinct item within the first list occurs the same number of times in both lists (1 occurs once in each list, 2 appears twice in each list, ....
The final result set should therefore look like this...
1,2,2,4,2
1,2,2,4,3
--
1,2,3,4,3
--
--
--
1,3,3,4,3
Another example is the worst-case scenario (from a combination point of view) where the ListOfLists is {{2,3}, {2,3}, {2,3}, {2,3}, {2,3}}, i.e. a list containing inner lists of the maximum size - in this case there would obviously be 32 results in the Cartesian Product result-set, but the pruned result-set that I am trying to get at would just be:-
2,2,2,2,2
2,2,2,2,3 <-- all other results with four 2's and one 3 (in any order) are suppressed
2,2,2,3,3 <-- all other results with three 2's and two 3's are suppressed, etc
2,2,3,3,3
2,3,3,3,3
3,3,3,3,3
To any mathematically-minded folks out there - I hope you can help. I have actually got a working solution to part 2, but it is a total hack and is computationally-intensive, and I am looking for guidance in finding a more elegant, and efficient LINQ solution to the issue of pruning.
Thanks for reading.
pip
Some resources used so far (to get the Cartesian Product)
computing-a-cartesian-product-with-linq
c-permutation-of-an-array-of-arraylists
msdn
UPDATE - The Solution
Apologies for not posting this sooner...see below
You should implement your own IEqualityComparer<IEnumerable<int>> and then use that in Distinct().
The choice of hash code in the IEqualityComparer depends on your actual data, but I think something like this should be adequate if your actual data resemble those in your examples:
class UnorderedSequenceComparer : IEqualityComparer<IEnumerable<int>>
{
    public bool Equals(IEnumerable<int> x, IEnumerable<int> y)
    {
        return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
    }

    public int GetHashCode(IEnumerable<int> obj)
    {
        return obj.Sum(i => i * i);
    }
}
The important part is that GetHashCode() should be O(N), sorting would be too slow.
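The same pruning can also be expressed without a custom comparer by mapping each tuple to a canonical sorted key and keeping one representative per key (in LINQ this would be a GroupBy on the sorted sequence). Since the rest of this page uses Java, here is that idea as a hedged Java sketch (class and method names are made up):

```java
import java.util.*;

public class CartesianPrune {
    // Cartesian product of the inner lists, keeping one representative per multiset.
    static List<List<Integer>> prunedProduct(List<List<Integer>> lists) {
        List<List<Integer>> product = new ArrayList<>();
        product.add(new ArrayList<>());
        for (List<Integer> list : lists) {
            List<List<Integer>> next = new ArrayList<>();
            for (List<Integer> prefix : product)
                for (Integer item : list) {
                    List<Integer> row = new ArrayList<>(prefix);
                    row.add(item);
                    next.add(row);
                }
            product = next;
        }
        // Canonical key: the row sorted; rows with equal keys are the same multiset.
        Map<List<Integer>, List<Integer>> byKey = new LinkedHashMap<>();
        for (List<Integer> row : product) {
            List<Integer> key = new ArrayList<>(row);
            Collections.sort(key);
            byKey.putIfAbsent(key, row);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        // Worst case from the question: {2,3} five times -> 32 rows, 6 distinct multisets
        List<List<Integer>> worstCase = Collections.nCopies(5, Arrays.asList(2, 3));
        System.out.println(prunedProduct(worstCase).size()); // 6
    }
}
```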
void Main()
{
    var query = from a in new int[] { 1 }
                from b in new int[] { 2, 3 }
                from c in new int[] { 2, 3 }
                from d in new int[] { 4 }
                from e in new int[] { 2, 3 }
                select new int[] { a, b, c, d, e };

    query.Distinct(new ArrayComparer());
    //.Dump();
}
public class ArrayComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        if (x == null || y == null)
            return false;

        return x.OrderBy(i => i).SequenceEqual<int>(y.OrderBy(i => i));
    }

    public int GetHashCode(int[] obj)
    {
        if (obj == null || obj.Length == 0)
            return 0;

        var hashcode = obj[0];
        for (int i = 1; i < obj.Length; i++)
        {
            hashcode ^= obj[i];
        }
        return hashcode;
    }
}
The finalised solution to the whole problem - combining the multisets, then pruning the result-sets to remove duplicates - ended up in a helper class as a static method. It takes svick's much-appreciated answer and injects the IEqualityComparer dependency into the existing CartesianProduct implementation I found on Eric Lippert's blog here (I'd recommend reading his post, as it explains the iterations in his thinking and why the LINQ implementation is the best).
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(IEnumerable<IEnumerable<T>> sequences,
                                                       IEqualityComparer<IEnumerable<T>> sequenceComparer)
{
    IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
    var resultsSet = sequences.Aggregate(emptyProduct,
        (accumulator, sequence) => from accseq in accumulator
                                   from item in sequence
                                   select accseq.Concat(new[] { item }));

    if (sequenceComparer != null)
        return resultsSet.Distinct(sequenceComparer);
    else
        return resultsSet;
}
