Using LINQ to join [n] collections and find matches - linq

Using LINQ I can find matching elements between two collections like this:
var alpha = new List<int>() { 1, 2, 3, 4, 5 };
var beta = new List<int>() { 1, 3, 5 };
return (from a in alpha
join b in beta on a equals b
select a);
I can increased this to three collections, like so:
var alpha = new List<int>() { 1, 2, 3, 4, 5 };
var beta = new List<int>() { 1, 3, 5 };
var gamma = new List<int>() { 3 };
return (from a in alpha
join b in beta on a equals b
join g in gamma on a equals g
select a);
But how can I construct a LINQ query that will return the matches between N number of collections?
I'm thinking if each collection was added to a parent collection, then the parent collection was iterated through using a recursive loop, it may work?

There's no need to recurse - you can just iterate. However, you may find it best to create a set and intersect that each time:
List<List<int>> collections = ...;
HashSet<int> values = new HashSet<int>(collections[0]);
foreach (var collection in collections.Skip(1)) // Already done the first
{
values.IntersectWith(collection);
}
(Like BrokenGlass, I'm assuming you've got distint values, and that you really just want to find the values which are in all the collections.)
If you prefer the immutable and lazy approach, you could use:
List<List<int>> collections = ...;
IEnumerable<int> values = collections[0];
foreach (var collection in collections.Skip(1)) // Already done the first
{
values = values.Intersect(collection);
}

If you have only unique values you can use Intersect:
var result = alpha.Intersect(beta).Intersect(gamma).ToList();
If you need to preserve multiple values that are not unique you can just exclude non-intersecting items from the original collection as an additional step:
alpha = alpha.Where(x => result.Contains(x)).ToList();
To generalize the Intersect approach you can just use a loop to do all intersections one by one:
IEnumerable<List<int>> collections = new [] { alpha, beta, gamma };
IEnumerable<int> result = collections.First();
foreach (var item in collections.Skip(1))
{
result = result.Intersect(item);
}
result = result.ToList();

Related

Default values for empty groups in Linq GroupBy query

I have a data set of values that I want to summarise in groups. For each group, I want to create an array big enough to contain the values of the largest group. When a group contains less than this maximum number, I want to insert a default value of zero for the empty key values.
Dataset
Col1 Col2 Value
--------------------
A X 10
A Z 15
B X 9
B Y 12
B Z 6
Desired result
X, [10, 9]
Y, [0, 12]
Z, [15, 6]
Note that value "A" in Col1 in the dataset has no value for "Y" in Col2. Value "A" is first group in the outer series, therefore it is the first element that is missing.
The following query creates the result dataset, but does not insert the default zero values for the Y group.
result = data.GroupBy(item => item.Col2)
.Select(group => new
{
name = group.Key,
data = group.Select(item => item.Value)
.ToArray()
})
Actual result
X, [10, 9]
Y, [12]
Z, [15, 6]
What do I need to do to insert a zero as the missing group value?
Here is how I understand it.
Let say we have this
class Data
{
public string Col1, Col2;
public decimal Value;
}
Data[] source =
{
new Data { Col1="A", Col2 = "X", Value = 10 },
new Data { Col1="A", Col2 = "Z", Value = 15 },
new Data { Col1="B", Col2 = "X", Value = 9 },
new Data { Col1="B", Col2 = "Y", Value = 12 },
new Data { Col1="B", Col2 = "Z", Value = 6 },
};
First we need to determine the "fixed" part
var columns = source.Select(e => e.Col1).Distinct().OrderBy(c => c).ToList();
Then we can process with the normal grouping, but inside the group we will left join the columns with group elements which will allow us to achieve the desired behavior
var result = source.GroupBy(e => e.Col2, (key, elements) => new
{
Key = key,
Elements = (from c in columns
join e in elements on c equals e.Col1 into g
from e in g.DefaultIfEmpty()
select e != null ? e.Value : 0).ToList()
})
.OrderBy(e => e.Key)
.ToList();
It won't be pretty, but you can do something like this:
var groups = data.GroupBy(d => d.Col2, d => d.Value)
.Select(g => new { g, count = g.Count() })
.ToList();
int maxG = groups.Max(p => p.count);
var paddedGroups = groups.Select(p => new {
name = p.g.Key,
data = p.g.Concat(Enumerable.Repeat(0, maxG - p.count)).ToArray() });
You can do it like this:-
int maxCount = 0;
var result = data.GroupBy(x => x.Col2)
.OrderByDescending(x => x.Count())
.Select(x =>
{
if (maxCount == 0)
maxCount = x.Count();
var Value = x.Select(z => z.Value);
return new
{
name = x.Key,
data = maxCount == x.Count() ? Value.ToArray() :
Value.Concat(new int[maxCount - Value.Count()]).ToArray()
};
});
Code Explanation:-
Since you need to append default zeros in case when you have less items in any group, I am storing the maxCount (which any group can produce in a variable maxCount) for this I am ordering the items in descending order. Next I am storing the maximum count which the item can producr in maxCount variable. While projecting I am simply checking if number of items in the group is not equal to maxCount then create an integer array of size (maxCount - x.Count) i.e. maximum count minus number of items in current group and appending it to the array.
Working Fiddle.

Getting wrong answer when trying to get k subsets from Array in ActionScript

I'm working on a Texas Holdem game and i need to generate all possible k subsets from an Array of cards (represented as numbers in this example). This is how it looks so far:
public function getKSubsetsFromArray(arr:Array, k:int):Array {
var data:Array = new Array();
var result:Array = new Array();
combinations(arr, data, 0, arr.length - 1, 0, k, result, 0);
return result;
}
public function combinations(arr:Array, data:Array, start:int, end:int, index:int, r:int, resultArray:Array, resultIndex:int):int {
if (index == r) {
trace(resultIndex, data);
resultArray[resultIndex] = data;
return ++resultIndex;
}
for (var i:int = start; i<=end && end-i+1 >= r-index; i++) {
data[index] = arr[i];
resultIndex = combinations(arr, data, i + 1, end, index + 1, r, resultArray, resultIndex);
}
return resultIndex;
}
I am new to Actionscript, my idea is to have a function that takes an array of number and a parameter k, and returns an Array of arrays each of size k. However once i test the functions I get an array containing only the last combination nCk times. For example:
var testArray:Array = new Array(1, 2, 3, 4, 5);
trace(getKSubsetsFromArray(testArray, 3));
Returns:
0 1,2,3
1 1,2,4
2 1,2,5
3 1,3,4
4 1,3,5
5 1,4,5
6 2,3,4
7 2,3,5
8 2,4,5
9 3,4,5
The function output is
3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5,3,4,5
Of course it should print an array containing all the combinations listed before but it only prints the last one the right amount of times.
Thank your for your help.
The reason for the error is that when you are making array of arrays you are actually using the reference of the same array (data) so when the last combination is executed the contains of data array become 3,4,5 and each of index of resultArray points to data array so it prints out same values.
Solution :-
if (index == r) {
trace(resultIndex, data);
var result = new Array();
copy(result,data)
resultArray[resultIndex] = result;
return ++resultIndex;
}
Note :-
The above is pseudo code as i am not familiar with actionscript but you can implement copy function that copies values of data into result in actionscript syntax.

LINQ filtering on a Dictionary<string, IList<string>>

I have code similar to this:
var dict = new Dictionary<string, IList<string>>();
dict.Add("A", new List<string>{"1","2","3"});
dict.Add("B", new List<string>{"2","4"});
dict.Add("C", new List<string>{"3","5","7"});
dict.Add("D", new List<string>{"8","5","7", "2"});
var categories = new List<string>{"A", "B"};
//This gives me categories and their items matching the category list
var result = dict.Where(x => categories.Contains(x.Key));
Key Value
A 1, 2, 3
B 2, 4
What I would like to get is this:
A 2
B 2
So the keys and just the values that are in both lists. Is there a way to do this in LINQ?
Thanks.
Easy peasy:
string key1 = "A";
string key2 = "B";
var intersection = dict[key1].Intersect(dict[key2]);
In general:
var intersection =
categories.Select(c => dict[c])
.Aggregate((s1, s2) => s1.Intersect(s2));
Here, I'm utilizing Enumerable.Intersect.
A somewhat dirty way of doing it...
var results = from c in categories
join d in dict on c equals d.Key
select d.Value;
//Get the limited intersections
IEnumerable<string> intersections = results.First();
foreach(var valueSet in results)
{
intersections = intersections.Intersect(valueSet);
}
var final = from c in categories
join i in intersections on 1 equals 1
select new {Category = c, Intersections = i};
Assuming we have 2 and 3 common to both lists, this will do the following:
A 2
A 3
B 2
B 3

Order reversed in Aggregate LINQ query

I think the result should contain [1,1,2,2,3,3] but it contains [3,3,2,2,1,1]. Why is the list being reversed?
var sequence = new int[] { 1, 2, 3 };
var result = sequence.Aggregate(
Enumerable.Empty<int>(),
(acc, s) => Enumerable.Repeat(s, 2).Concat(acc));
Thanks
For every item in the sequence, you are concatenating the repetition to the beginning of the accumulated sequence. Swap the order so you are concatenating to the end.
(acc, s) => acc.Concat(Enumerable.Repeat(s, 2))
On a side note, it would be easier (and more efficient) to do this to get that sequence instead.
var result =
from s in sequence
from x in Enumerable.Repeat(s, 2)
select x;
Simpler way to achieve by using SelectMany:
var sequence = new int[] { 1, 2, 3 };
var result = sequence.SelectMany(i => new[] {i, i}).ToArray();

Linq/lambda question about .Select (newby learning 3.0)

I am playing with the new stuff of C#3.0 and I have this code (mostly taken from MSDN) but I can only get true,false,true... and not the real value :
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var oddNumbers = numbers.Select(n => n % 2 == 1);
Console.WriteLine("Numbers < 5:");
foreach (var x in oddNumbers)
{
Console.WriteLine(x);
}
How can I fix that to show the list of integer?
Change your "Select" to a "Where"
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var oddNumbers = numbers.Where(n => n % 2 == 1);
Console.WriteLine("Odd Number:");
foreach (var x in oddNumbers)
{
Console.WriteLine(x);
}
The "Select" method is creating a new list of the lambda result for each element (true/false). The "Where" method is filtering based on the lambda.
In C#, you could also use this syntax, which you may find clearer:
var oddNumbers = from n in numbers
where n % 2 == 1
select n;
which the compiler translates to:
var oddNumbers = numbers.Where(n => n % 2 == 1).Select(n => n);
numbers.Select(n => n % 2 == 1);
Change this to
numbers.Where(n => n % 2 == 1);
What select does is "convert" one thing to another. So in this case, it's "Converting" n to "n % 2 == 1" (which is a boolean) - hence you get all the true and falses.
It's usually used for getting properties on things. For example if you had a list of Person objects, and you wanted to get their names, you'd do
var listOfNames = listOfPeople.Select( p => p.Name );
You can think of this like so:
Convert the list of people into a list of strings, using the following method: ( p => p.Name)
To "select" (in the "filtering" sense of the word) a subset of a collection, you need to use Where.
Thanks Microsoft for the terrible naming

Resources