How can I intersect more than two sets/lists of values? - linq

Here is an example that works in Linqpad. The problem is that I need it to work for more than two words, e.g. searchString = "headboard bed railing". This is a query against an index and instead of "Match Any Word" which I've done, I need it to "Match All Words", where it finds common key values for each of the searched words.
//Match ALL words for categories in index
string searchString = "headboard bed";
List<string> searchList = new List<string>(searchString.Split(' '));
string word1 = searchList[0];
string word2 = searchList[1];
var List1 = (from i in index
where i.word.ToUpper().Contains(word1)
select i.category.ID).ToList();
var List2 = (from i in index
where i.word.ToUpper().Contains(word2)
select i.category.ID).ToList();
//How can I make this work for more than two Lists?
var commonCats = List1.Intersect(List2).ToList();
var category = (from i in index
from s in commonCats
where commonCats.Contains(i.category.ID)
select new
{
MajorCategory = i.category.category1.description,
MinorCategory = i.category.description,
Taxable = i.category.taxable,
Life = i.category.life,
ID = i.category.ID
}).Distinct().OrderBy(i => i.MinorCategory);
category.Dump();
Thanks!

Set intersection is commutative and associative. This means that (A ∩ B ∩ C) = (A ∩ (B ∩ C)) = ((A ∩ B) ∩ C), and rearranging the order of the lists will not change the result. So just apply .Intersect() multiple times:
var commonCats = List1.Intersect(List2).Intersect(List3).ToList();
So, to make your code more general:
var searchList = searchString.Split(' ');
// Change "int" if this is not the type of i.category.ID in the query below.
IEnumerable<int> result = null;
foreach (string word in searchList)
{
var query = from i in index
where i.word.ToUpper().Contains(word)
select i.category.ID;
result = (result == null) ? query : result.Intersect(query);
}
if (result == null)
throw new InvalidOperationException("No words supplied.");
var commonCats = result.ToList();
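The loop above can also be written as a single Aggregate call. The following is a minimal sketch against an in-memory list, not the original index: the IndexEntry class, the sample rows, and the CommonCategoryIds helper are all hypothetical names made up for illustration, and the case-insensitive match via IndexOf is an assumption about the intended behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the index: a word and its category ID.
public sealed class IndexEntry
{
    public IndexEntry(string word, int categoryId) { Word = word; CategoryId = categoryId; }
    public string Word { get; }
    public int CategoryId { get; }
}

public static class MatchAllWordsDemo
{
    // Sample data, invented for this sketch.
    public static List<IndexEntry> SampleIndex()
    {
        return new List<IndexEntry>
        {
            new IndexEntry("HEADBOARD", 1),
            new IndexEntry("BED", 1),
            new IndexEntry("RAILING", 1),
            new IndexEntry("BED", 2),
            new IndexEntry("RAILING", 2),
            new IndexEntry("HEADBOARD", 3),
        };
    }

    // Returns the category IDs that have at least one matching row for every search word.
    public static List<int> CommonCategoryIds(IEnumerable<IndexEntry> index, string searchString)
    {
        var words = searchString.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
            throw new InvalidOperationException("No words supplied.");

        return words
            // One sequence of category IDs per search word.
            .Select(word => index
                .Where(e => e.Word.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                .Select(e => e.CategoryId))
            // Fold the sequences together with Intersect.
            .Aggregate((common, next) => common.Intersect(next))
            .OrderBy(id => id)
            .ToList();
    }

    public static void Main()
    {
        var index = SampleIndex();
        Console.WriteLine(string.Join(", ", CommonCategoryIds(index, "headboard bed railing"))); // 1
        Console.WriteLine(string.Join(", ", CommonCategoryIds(index, "bed railing")));           // 1, 2
    }
}
```

Whether a database-backed LINQ provider can translate the Aggregate form is not something this sketch verifies; against a real index the explicit loop may be the safer choice.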

To build on @cdhowie's answer, why use Intersect? I would think you could make it more efficient by building your query in multiple steps. Something like...
if(string.IsNullOrWhiteSpace(searchString))
{
throw new InvalidOperationException("No word supplied.");
}
var query = index.AsQueryable();
var searchList = searchString.Split(' ');
foreach (string word in searchList)
{
query = query.Where(i => i.word.ToUpper().Contains(word));
}
var commonCats = query.Select(i => i.category.ID).ToList();

Related

Linq FirstOrDefault List inside a query

I have this linq query
var numberGroups =
from n in VISRUBs.Where(a => a.VISANA.VISITE.DATEVIS <= d && a.VISANA.VISITE.PANUM == p)
group n by n.RUBRIQUE into g
select new {
RemainderCHAPLIB = g.Key.ANALYSE.CHAPITRE.LIBELLE,
RemainderLIB = g.Key.LIBELLE,
RemainderRUNUM = g.Key.RUNUM,
vals = from vlist in g.OrderByDescending(a=>a.VISANA.VISITE.DATEVIS)
select vlist.VALEUR
};
which gives me this result in Linqpad
What I want is to select the first and second item from the last field (vals) which is a List<string>.
I have tried this:
var numberGroups =
from n in VISRUBs.Where(a => a.VISANA.VISITE.DATEVIS <= d && a.VISANA.VISITE.PANUM == p)
group n by n.RUBRIQUE into g
select new {
RemainderCHAPLIB = g.Key.ANALYSE.CHAPITRE.LIBELLE,
RemainderLIB = g.Key.LIBELLE,
RemainderRUNUM = g.Key.RUNUM,
vals = from vlist in g.OrderByDescending(a => a.VISANA.VISITE.DATEVIS)
select vlist.VALEUR
};
var lst = from n in numberGroups
select new
{
RemainderCHAPLIB = n.RemainderCHAPLIB,
RemainderLIB = n.RemainderLIB,
RemainderRUNUM = n.RemainderRUNUM,
VAL = n.vals.FirstOrDefault()
};
but it didn't work, I got an exception:
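Dynamic SQL Error
SQL error code = -104
Token unknown - line 54, column 1
OUTER
Found it! Calling ToList() on numberGroups first brings the groups into memory, so FirstOrDefault() runs in LINQ to Objects instead of being translated to SQL: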
var lst = from n in numberGroups.ToList()
select new
{
RemainderCHAPLIB = n.RemainderCHAPLIB,
RemainderLIB = n.RemainderLIB,
RemainderRUNUM = n.RemainderRUNUM,
VAL = n.vals.FirstOrDefault(),
ANT = n.vals.Skip(1).FirstOrDefault()
};

LINQ filtering on a Dictionary<string, IList<string>>

I have code similar to this:
var dict = new Dictionary<string, IList<string>>();
dict.Add("A", new List<string>{"1","2","3"});
dict.Add("B", new List<string>{"2","4"});
dict.Add("C", new List<string>{"3","5","7"});
dict.Add("D", new List<string>{"8","5","7", "2"});
var categories = new List<string>{"A", "B"};
//This gives me categories and their items matching the category list
var result = dict.Where(x => categories.Contains(x.Key));
Key Value
A 1, 2, 3
B 2, 4
What I would like to get is this:
A 2
B 2
So the keys and just the values that are in both lists. Is there a way to do this in LINQ?
Thanks.
Easy peasy:
string key1 = "A";
string key2 = "B";
var intersection = dict[key1].Intersect(dict[key2]);
In general:
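var intersection =
// AsEnumerable() makes the element type IEnumerable<string>, which is what Intersect returns.
categories.Select(c => dict[c].AsEnumerable())
.Aggregate((s1, s2) => s1.Intersect(s2));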
Here, I'm utilizing Enumerable.Intersect.
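To get from the intersection to the key/value rows the question asks for, each selected key can be paired with each shared value. This is a minimal sketch; the SharedPairs helper and the class name are made up for illustration, and it assumes every category in the list exists as a key in the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonValuesDemo
{
    // Returns one (key, value) pair per selected category and per value shared by all of them.
    // Assumes every category is present in dict; a missing key throws KeyNotFoundException.
    public static List<KeyValuePair<string, string>> SharedPairs(
        IDictionary<string, IList<string>> dict, IEnumerable<string> categories)
    {
        var keys = categories.ToList();
        if (keys.Count == 0)
            return new List<KeyValuePair<string, string>>();

        // Values present in every selected category's list.
        var shared = keys
            .Select(k => dict[k].AsEnumerable())
            .Aggregate((s1, s2) => s1.Intersect(s2))
            .ToList();

        return keys
            .SelectMany(k => shared.Select(v => new KeyValuePair<string, string>(k, v)))
            .ToList();
    }

    public static void Main()
    {
        var dict = new Dictionary<string, IList<string>>
        {
            { "A", new List<string> { "1", "2", "3" } },
            { "B", new List<string> { "2", "4" } },
            { "C", new List<string> { "3", "5", "7" } },
            { "D", new List<string> { "8", "5", "7", "2" } },
        };

        // Prints:
        // A 2
        // B 2
        foreach (var pair in SharedPairs(dict, new[] { "A", "B" }))
            Console.WriteLine(pair.Key + " " + pair.Value);
    }
}
```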
A somewhat dirty way of doing it...
var results = from c in categories
join d in dict on c equals d.Key
select d.Value;
//Get the limited intersections
IEnumerable<string> intersections = results.First();
foreach(var valueSet in results)
{
intersections = intersections.Intersect(valueSet);
}
var final = from c in categories
join i in intersections on 1 equals 1
select new {Category = c, Intersections = i};
Assuming 2 and 3 are common to both lists, this produces the following output:
A 2
A 3
B 2
B 3

Linq - return index of collection using conditional logic

I have a collection
List<int> periods = new List<int>();
periods.Add(0);
periods.Add(30);
periods.Add(60);
periods.Add(90);
periods.Add(120);
periods.Add(180);
var overDueDays = 31;
I have a variable, overDueDays. When the value is between 0 and 29, I want to return index 0. When it is between 30 and 59, I want to return index 1, and so on. The periods list comes from the database, so it's not hard-coded and the values can differ from the ones shown here. What is the best way to do it using LINQ in one statement?
It's not really what Linq is designed for, but (assuming that the range is not fixed) you could do the following to get the index
List<int> periods = new List<int>();
periods.Add(0);
periods.Add(30);
periods.Add(60);
periods.Add(90);
periods.Add(120);
periods.Add(180);
var overDueDays = 31;
var result = periods.IndexOf(periods.First(n => overDueDays < n)) - 1;
You can use .TakeWhile():
int periodIndex = periods.TakeWhile(p => p <= overDueDays).Count() - 1;
How about this?
var qPeriods = periods.Where(v => v <= overDueDays)
.Select((result, i) => new { index = i })
.Last();
Assuming that periods is sorted, you can use the following approach:
var result = periods.Skip(1)
.Select((o, i) => new { Index = i, Value = o })
.FirstOrDefault(o => overDueDays < o.Value);
if (result != null)
{
Console.WriteLine(result.Index);
}
else
{
Console.WriteLine("Matching range not found!");
}
The first value is skipped since we're interested in comparing with the upper value of the range. By skipping it, the indices fall into place without the need to subtract 1. FirstOrDefault is used in case overDueDays doesn't fall between any of the available ranges.
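To see how the boundaries behave, here is a minimal sketch that wraps the TakeWhile approach from the earlier answer in a helper and runs it over a few sample values. The IndexFor name and the class are hypothetical, and the sketch assumes periods is sorted in ascending order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PeriodLookup
{
    // Index of the last period boundary that is <= overDueDays.
    // Assumes periods is sorted ascending. Returns -1 when overDueDays is below the first boundary.
    public static int IndexFor(IEnumerable<int> periods, int overDueDays)
    {
        return periods.TakeWhile(p => p <= overDueDays).Count() - 1;
    }

    public static void Main()
    {
        var periods = new List<int> { 0, 30, 60, 90, 120, 180 };

        // Prints: 0 -> 0, 29 -> 0, 30 -> 1, 31 -> 1, 59 -> 1, 60 -> 2, 200 -> 5, -5 -> -1
        foreach (var days in new[] { 0, 29, 30, 31, 59, 60, 200, -5 })
            Console.WriteLine(days + " -> " + IndexFor(periods, days));
    }
}
```

Unlike the First-based version, this does not throw when overDueDays is at or past the last boundary; it returns the last index instead.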

LINQ Grouping: Is there a cleaner way to do this without a for loop

I am trying to create a very simple distribution chart and I want to display the counts of tests score percentages in their corresponding 10's ranges.
I thought about just grouping on Math.Round((d.Percentage/10-0.5),0)*10, which should give me the 10's value, but I wasn't sure of the best way to do this given that I would probably have missing ranges, and all ranges need to appear even if the count is zero. I also thought about doing an outer join on the ranges array, but since I'm fairly new to Linq, for the sake of time I opted for the code below. I would however like to know what a better way might be.
Also note: As I tend to work with larger teams with varying experience levels, I'm not all that crazy about ultra compact code unless it remains very readable to the average developer.
Any suggestions?
public IEnumerable<TestDistribution> GetDistribution()
{
var distribution = new List<TestDistribution>();
var ranges = new int[] { 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110 };
var labels = new string[] { "0%'s", "10%'s", "20%'s", "30%'s", "40%'s", "50%'s", "60%'s", "70%'s", "80%'s", "90%'s", "100%'s", ">110% "};
for (var n = 0; n < ranges.Count(); n++)
{
var count = 0;
var min = ranges[n];
var max = (n == ranges.Count() - 1) ? decimal.MaxValue : ranges[n+1];
count = (from d in Results
where d.Percentage>= min
&& d.Percentage<max
select d)
.Count();
distribution.Add(new TestDistribution() { Label = labels[n], Frequency = count });
}
return distribution;
}
// ranges and labels in a list of pairs of them
var rangesWithLabels = ranges.Zip(labels, (r,l) => new {Range = r, Label = l});
// create a list of intervals (i.e. 0-10, 10-20, ..., 110 - max value)
var rangeMinMax = ranges.Zip(ranges.Skip(1), (min, max) => new {Min = min, Max = max})
.Union(new[] {new {Min = ranges.Last(), Max = Int32.MaxValue}});
//the grouping is made by the lower bound of the interval found for some Percentage
var resultsDistribution = from c in Results
group c by
rangeMinMax.FirstOrDefault(r=> r.Min <= c.Percentage && c.Percentage < r.Max).Min into g
select new {Percentage = g.Key, Frequency = g.Count() };
// left join between the labels and the results with frequencies
var distributionWithLabels =
from l in rangesWithLabels
join r in resultsDistribution on l.Range equals r.Percentage
into rd
from r in rd.DefaultIfEmpty()
select new TestDistribution{
Label = l.Label,
Frequency = r != null ? r.Frequency : 0
};
distribution = distributionWithLabels.ToList();
Another solution if the ranges and labels can be created in another way
// 11 buckets of width 10 (0%'s through 100%'s), plus one open-ended bucket from 110 upwards
var ranges = Enumerable.Range(0, 11)
.Select(c=> new {
Min = c * 10,
Max = (c +1 )* 10,
Label = (c * 10) + "%'s"})
.Union(new[] { new {
Min = 110,
Max = Int32.MaxValue,
Label = ">110% "
}});
var resultsDistribution = from c in Results
group c by ranges.FirstOrDefault(r=> r.Min <= c.Percentage && c.Percentage < r.Max).Min
into g
select new {Percentage = g.Key, Frequency = g.Count() };
var distributionWithLabels =
from l in ranges
join r in resultsDistribution on l.Min equals r.Percentage
into rd
from r in rd.DefaultIfEmpty()
select new TestDistribution{
Label = l.Label,
Frequency = r != null ? r.Frequency : 0
};
This works
public IEnumerable<TestDistribution> GetDistribution()
{
var range = 12;
return Enumerable.Range(0, range).Select(
n => new TestDistribution
{
Label = string.Format("{1}{0}%'s", n*10, n==range-1 ? ">" : ""),
Frequency =
Results.Count(
d =>
d.Percentage >= n*10
&& d.Percentage < ((n == range - 1) ? decimal.MaxValue : (n+1)*10))
});
}
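Another way to keep the empty ranges, sketched here under some assumptions: count the scores per bucket once with GroupBy, then walk a fixed list of bucket numbers and look each count up, defaulting to zero. The DistributionDemo class and Distribution method are hypothetical, the scores are passed in as plain decimals rather than read from Results, and KeyValuePair stands in for TestDistribution, which the question does not define.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistributionDemo
{
    // Returns (label, frequency) pairs for the buckets 0-9, 10-19, ..., 100-109, and 110 and above.
    // Negative percentages fall outside every bucket and are ignored, as in the original loop.
    public static List<KeyValuePair<string, int>> Distribution(IEnumerable<decimal> percentages)
    {
        const int lastBucket = 11; // everything from 110 upwards

        // Bucket number = percentage / 10 rounded down, capped at the last bucket.
        var counts = percentages
            .GroupBy(p => (int)Math.Min(Math.Floor(p / 10m), lastBucket))
            .ToDictionary(g => g.Key, g => g.Count());

        // Walk every bucket so that empty ones still appear with a count of zero.
        return Enumerable.Range(0, lastBucket + 1)
            .Select(n => new KeyValuePair<string, int>(
                n == lastBucket ? ">110%" : (n * 10) + "%'s",
                counts.ContainsKey(n) ? counts[n] : 0))
            .ToList();
    }

    public static void Main()
    {
        var scores = new List<decimal> { 5m, 15m, 19.9m, 100m, 110m, 250m };
        foreach (var row in Distribution(scores))
            Console.WriteLine(row.Key + ": " + row.Value);
    }
}
```

This scans the results once instead of once per range, which may matter if Results is large; for a handful of ranges the difference is unlikely to be noticeable.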

Using LINQ to join [n] collections and find matches

Using LINQ I can find matching elements between two collections like this:
var alpha = new List<int>() { 1, 2, 3, 4, 5 };
var beta = new List<int>() { 1, 3, 5 };
return (from a in alpha
join b in beta on a equals b
select a);
I can increase this to three collections, like so:
var alpha = new List<int>() { 1, 2, 3, 4, 5 };
var beta = new List<int>() { 1, 3, 5 };
var gamma = new List<int>() { 3 };
return (from a in alpha
join b in beta on a equals b
join g in gamma on a equals g
select a);
But how can I construct a LINQ query that will return the matches between N number of collections?
I'm thinking if each collection was added to a parent collection, then the parent collection was iterated through using a recursive loop, it may work?
There's no need to recurse - you can just iterate. However, you may find it best to create a set and intersect that each time:
List<List<int>> collections = ...;
HashSet<int> values = new HashSet<int>(collections[0]);
foreach (var collection in collections.Skip(1)) // Already done the first
{
values.IntersectWith(collection);
}
(Like BrokenGlass, I'm assuming you've got distinct values, and that you really just want to find the values which are in all the collections.)
If you prefer the immutable and lazy approach, you could use:
List<List<int>> collections = ...;
IEnumerable<int> values = collections[0];
foreach (var collection in collections.Skip(1)) // Already done the first
{
values = values.Intersect(collection);
}
If you have only unique values you can use Intersect:
var result = alpha.Intersect(beta).Intersect(gamma).ToList();
If you need to preserve multiple values that are not unique you can just exclude non-intersecting items from the original collection as an additional step:
alpha = alpha.Where(x => result.Contains(x)).ToList();
To generalize the Intersect approach you can just use a loop to do all intersections one by one:
IEnumerable<List<int>> collections = new [] { alpha, beta, gamma };
IEnumerable<int> result = collections.First();
foreach (var item in collections.Skip(1))
{
result = result.Intersect(item);
}
result = result.ToList();
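The loops above can be packaged as a reusable extension method. This is a minimal sketch built on HashSet<T>.IntersectWith; the IntersectAll name is made up for illustration, and returning an empty set when no collections are supplied is an assumption rather than something the answers specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectAllDemo
{
    // Returns the values that occur in every collection.
    // With no collections at all, this sketch returns an empty set.
    public static HashSet<T> IntersectAll<T>(this IEnumerable<IEnumerable<T>> collections)
    {
        HashSet<T> result = null;
        foreach (var collection in collections)
        {
            if (result == null)
                result = new HashSet<T>(collection); // seed with the first collection
            else
                result.IntersectWith(collection);    // keep only values also present here
        }
        return result ?? new HashSet<T>();
    }

    public static void Main()
    {
        var alpha = new List<int> { 1, 2, 3, 4, 5 };
        var beta = new List<int> { 1, 3, 5 };
        var gamma = new List<int> { 3 };

        var collections = new List<IEnumerable<int>> { alpha, beta, gamma };
        var common = collections.IntersectAll();

        Console.WriteLine(string.Join(", ", common.OrderBy(x => x))); // 3
    }
}
```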

Resources