Lambdas in Linq AST - why different behaviour? - linq

Let's assume the following demo classes.
public class Foo {
public int key1 {get; set;}
public Foo(int _key1) {
key1 = _key1;
}
}
public class Bar {
public int key2 {get; set;}
public Bar(int _key2) {
key2 = _key2;
}
}
They are combined together in a simple Linq join.
Foo[]aSet = new Foo[3]{new Foo(1),new Foo(2),new Foo(3)};
Bar[]bSet = new Bar[3]{new Bar(1),new Bar(3),new Bar(5)};
Func<int,Func<Foo,bool>> VisibleLambda = w => x => x.key1 > w;
var pb = Expression.Parameter(typeof(Bar),"z");
var pf = Expression.Parameter(typeof(Foo), "y");
PropertyInfo BarId = typeof(Bar).GetProperty("key2");
PropertyInfo FooId = typeof(Foo).GetProperty("key1");
var eqexpr = Expression.Equal(Expression.Property(pb, BarId), Expression.Property(pf, FooId));
var lambdaInt = Expression.Lambda<Func<Bar, bool>>(eqexpr, pb);
var InvisibleLambda = Expression.Lambda<Func<Foo,Func<Bar, bool>>>( lambdaInt,pf);
var query = from a in aSet.Where(VisibleLambda(1))
from b in bSet.Where(InvisibleLambda.Compile()(a))
select new Tuple<Foo,Bar>(a,b);
Now, the query is implemented through an extension
IQueryable<TElement> IQueryProvider.CreateQuery<TElement>(Expression expression)
{
if (expression == null)
throw new ArgumentNullException("expression");
return new ExpressionQueryImpl<TElement>(DataContextInfo, expression);
}
The details of the implementation are irrelevant: my question is only related to the expression derived from the IQueryable.
There are two lambdas: one ("visible") is generated as an argument of the expression with a NodeType Quote that is very easy to analyse, while the other one ("invisible") is generated as a second argument of the expression with "where" clause of NodeType Invoke that is almost invisible in terms of its sql rendering.
Why is that happening and is there a way to work-around and d-tour it?

As pointed out by the comments 1 and 2 of Ivan Stoev, the different behaviour, and in particular the problem in the sql generation, was due to different signature expected from the Queryable.Where
Here is the solution from Igor Tkachev, for anyone who would be interested.
Everything boils down to implementing the helpful extension, where one can leverage the linq method with the appropriate signature: i.e the Queryable.GroupJoin :-)
static class ExpressionTestExtensions
{
public class LeftJoinInfo<TOuter,TInner>
{
public TOuter Outer;
public TInner Inner;
}
[ExpressionMethod("LeftJoinImpl")]
public static IQueryable<LeftJoinInfo<TOuter,TInner>> LeftJoin<TOuter, TInner, TKey>(
this IQueryable<TOuter> outer,
IEnumerable<TInner> inner,
Expression<Func<TOuter, TKey>> outerKeySelector,
Expression<Func<TInner, TKey>> innerKeySelector)
{
return outer
.GroupJoin(inner, outerKeySelector, innerKeySelector, (o, gr) => new { o, gr })
.SelectMany(t => t.gr.DefaultIfEmpty(), (o,i) => new LeftJoinInfo<TOuter,TInner> { Outer = o.o, Inner = i });
}
static Expression<Func<
IQueryable<TOuter>,
IEnumerable<TInner>,
Expression<Func<TOuter,TKey>>,
Expression<Func<TInner,TKey>>,
IQueryable<LeftJoinInfo<TOuter,TInner>>>>
LeftJoinImpl<TOuter, TInner, TKey>()
{
return (outer,inner,outerKeySelector,innerKeySelector) => outer
.GroupJoin(inner, outerKeySelector, innerKeySelector, (o, gr) => new { o, gr })
.SelectMany(t => t.gr.DefaultIfEmpty(), (o,i) => new LeftJoinInfo<TOuter,TInner> { Outer = o.o, Inner = i });
}
}
Having defined such an extension, my "generic join" will turn to
static internal IQueryable<ExpressionTestExtensions.LeftJoinInfo<T2,T1>> NewJoin<T1, T2, TKey>(Expression<Func<T2, TKey>> outer, Expression<Func<T1, TKey>> inner)
where T2: class
where T1 : class
{
using (var db = new MyContext()) {
var query = (from b in db.GetTable<T2>() select b).LeftJoin <T2,T1, TKey>((from f in db.GetTable<T1>() select f), outer, inner);
return query;
}
}
}
Finally, the elegant use case simply becomes
public static void Main(string[] args)
{
Console.WriteLine("Hello World!");
//var queryList = Test.Join<Bar, Foo>(b => q => q.id == b.id);
var queryList = Test.NewJoin<Bar, Foo, int>(q => q.id, b => b.id);
foreach (var telement in queryList)
{
var bar = telement.Inner as Bar;
var element = telement.Outer as Foo;
Console.WriteLine(element.id.ToString() + " " + element.FromDate.ToShortDateString() +" "
+bar.id.ToString() + " " + bar.Name
);
}
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}

Related

Both left join and right join in only one linq query? [duplicate]

I have a list of people's ID and their first name, and a list of people's ID and their surname. Some people don't have a first name and some don't have a surname; I'd like to do a full outer join on the two lists.
So the following lists:
ID FirstName
-- ---------
1 John
2 Sue
ID LastName
-- --------
1 Doe
3 Smith
Should produce:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
3 Smith
I have found quite a few solutions for 'LINQ Outer Joins' which all look quite similar, but really seem to be left outer joins.
My attempts so far go something like this:
private void OuterJoinTest()
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });
List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });
var outerJoin = from first in firstNames
join last in lastNames
on first.ID equals last.ID
into temp
from last in temp.DefaultIfEmpty()
select new
{
id = first != null ? first.ID : last.ID,
firstname = first != null ? first.Name : string.Empty,
surname = last != null ? last.Name : string.Empty
};
}
}
public class FirstName
{
public int ID;
public string Name;
}
public class LastName
{
public int ID;
public string Name;
}
But this returns:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
What am I doing wrong?
Update 1: providing a truly generalized extension method FullOuterJoin
Update 2: optionally accepting a custom IEqualityComparer for the key type
Update 3: this implementation has recently become part of MoreLinq - Thanks guys!
Edit Added FullOuterGroupJoin (ideone). I reused the GetOuter<> implementation, making this a fraction less performant than it could be, but I'm aiming for 'highlevel' code, not bleeding-edge optimized, right now.
See it live on http://ideone.com/O36nWc
static void Main(string[] args)
{
var ax = new[] {
new { id = 1, name = "John" },
new { id = 2, name = "Sue" } };
var bx = new[] {
new { id = 1, surname = "Doe" },
new { id = 3, surname = "Smith" } };
ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
.ToList().ForEach(Console.WriteLine);
}
Prints the output:
{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b = }
{ a = , b = { id = 3, surname = Smith } }
You could also supply defaults: http://ideone.com/kG4kqO
ax.FullOuterJoin(
bx, a => a.id, b => b.id,
(a, b, id) => new { a.name, b.surname },
new { id = -1, name = "(no firstname)" },
new { id = -2, surname = "(no surname)" }
)
Printing:
{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }
Explanation of terms used:
Joining is a term borrowed from relational database design:
A join will repeat elements from a as many times as there are elements in b with corresponding key (i.e.: nothing if b were empty). Database lingo calls this inner (equi)join.
An outer join includes elements from a for which no corresponding
element exists in b. (i.e.: even results if b were empty). This is usually referred to as left join.
A full outer join includes records from a as well as b if no corresponding element exists in the other. (i.e. even results if a were empty)
Something not usually seen in RDBMS is a group join[1]:
A group join, does the same as described above, but instead of repeating elements from a for multiple corresponding b, it groups the records with corresponding keys. This is often more convenient when you wish to enumerate through 'joined' records, based on a common key.
See also GroupJoin which contains some general background explanations as well.
[1] (I believe Oracle and MSSQL have proprietary extensions for this)
Full code
A generalized 'drop-in' Extension class for this
internal static class MyExtensions
{
internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));
var join = from key in keys
let xa = alookup[key]
let xb = blookup[key]
select projection(xa, xb, key);
return join;
}
internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<TA, TB, TKey, TResult> projection,
TA defaultA = default(TA),
TB defaultB = default(TB),
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));
var join = from key in keys
from xa in alookup[key].DefaultIfEmpty(defaultA)
from xb in blookup[key].DefaultIfEmpty(defaultB)
select projection(xa, xb, key);
return join;
}
}
I don't know if this covers all cases, logically it seems correct. The idea is to take a left outer join and right outer join then take the union of the results.
var firstNames = new[]
{
new { ID = 1, Name = "John" },
new { ID = 2, Name = "Sue" },
};
var lastNames = new[]
{
new { ID = 1, Name = "Doe" },
new { ID = 3, Name = "Smith" },
};
var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last?.Name,
};
var rightOuterJoin =
from last in lastNames
join first in firstNames on last.ID equals first.ID into temp
from first in temp.DefaultIfEmpty()
select new
{
last.ID,
FirstName = first?.Name,
LastName = last.Name,
};
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
This works as written since it is in LINQ to Objects. If LINQ to SQL or other, the query processor might not support safe navigation or other operations. You'd have to use the conditional operator to conditionally get the values.
i.e.,
var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last != null ? last.Name : default,
};
I think there are problems with most of these, including the accepted answer, because they don't work well with Linq over IQueryable either due to doing too many server round trips and too much data returns, or doing too much client execution.
For IEnumerable I don't like Sehe's answer or similar because it has excessive memory use (a simple 10000000 two list test ran Linqpad out of memory on my 32GB machine).
Also, most of the others don't actually implement a proper Full Outer Join because they are using a Union with a Right Join instead of Concat with a Right Anti Semi Join, which not only eliminates the duplicate inner join rows from the result, but any proper duplicates that existed originally in the left or right data.
So here are my extensions that handle all of these issues, generate SQL as well as implementing the join in LINQ to SQL directly, executing on the server, and is faster and with less memory than others on Enumerables:
public static class Ext {
public static IEnumerable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
return from left in leftItems
join right in rightItems on leftKeySelector(left) equals rightKeySelector(right) into temp
from right in temp.DefaultIfEmpty()
select resultSelector(left, right);
}
public static IEnumerable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
return from right in rightItems
join left in leftItems on rightKeySelector(right) equals leftKeySelector(left) into temp
from left in temp.DefaultIfEmpty()
select resultSelector(left, right);
}
public static IEnumerable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
public static IEnumerable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) {
var hashLK = new HashSet<TKey>(from l in leftItems select leftKeySelector(l));
return rightItems.Where(r => !hashLK.Contains(rightKeySelector(r))).Select(r => resultSelector(default(TLeft),r));
}
public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector) where TLeft : class {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;
public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
var sampleAnonLR = new { left = default(TLeft), rightg = default(IEnumerable<TRight>) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
var parmC = Expression.Parameter(typeof(TRight), "c");
var argLeft = Expression.PropertyOrField(parmP, "left");
var newleftrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));
return leftItems.AsQueryable().GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
}
public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
var sampleAnonLR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "p");
var parmC = Expression.Parameter(typeof(TLeft), "c");
var argRight = Expression.PropertyOrField(parmP, "right");
var newrightrs = CastSMBody(Expression.Lambda(Expression.Invoke(resultSelector, parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));
return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
}
public static IQueryable<TResult> FullOuterJoinDistinct<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Union(leftItems.RightOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
private static Expression<Func<TP, TResult>> CastSBody<TP, TResult>(LambdaExpression ex, TP unusedP, TResult unusedRes) => (Expression<Func<TP, TResult>>)ex;
public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
var sampleAnonLgR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
var argLeft = Expression.Constant(default(TLeft), typeof(TLeft));
var argRight = Expression.PropertyOrField(parmLgR, "right");
var newrightrs = CastSBody(Expression.Lambda(Expression.Invoke(resultSelector, argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));
return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
}
public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
}
The difference between a Right Anti-Semi-Join is mostly moot with Linq to Objects or in the source, but makes a difference on the server (SQL) side in the final answer, removing an unnecessary JOIN.
The hand coding of Expression to handle merging an Expression<Func<>> into a lambda could be improved with LinqKit, but it would be nice if the language/compiler had added some help for that. The FullOuterJoinDistinct and RightOuterJoin functions are included for completeness, but I did not re-implement FullOuterGroupJoin yet.
I wrote another version of a full outer join for IEnumerable for cases where the key is orderable, which is about 50% faster than combining the left outer join with the right anti semi join, at least on small collections. It goes through each collection after sorting just once.
I also added another answer for a version that works with EF by replacing the Invoke with a custom expansion.
Here is an extension method doing that:
public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
{
var leftOuterJoin = from left in leftItems
join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
from right in temp.DefaultIfEmpty()
select new { left, right };
var rightOuterJoin = from right in rightItems
join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
from left in temp.DefaultIfEmpty()
select new { left, right };
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
}
I'm guessing #sehe's approach is stronger, but until I understand it better, I find myself leap-frogging off of #MichaelSander's extension. I modified it to match the syntax and return type of the built-in Enumerable.Join() method described here. I appended the "distinct" suffix in respect to #cadrell0's comment under #JeffMercado's solution.
public static class MyExtensions {
public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector
) {
var leftJoin =
from left in leftItems
join right in rightItems
on leftKeySelector(left) equals rightKeySelector(right) into temp
from right in temp.DefaultIfEmpty()
select resultSelector(left, right);
var rightJoin =
from right in rightItems
join left in leftItems
on rightKeySelector(right) equals leftKeySelector(left) into temp
from left in temp.DefaultIfEmpty()
select resultSelector(left, right);
return leftJoin.Union(rightJoin);
}
}
In the example, you would use it like this:
var test =
firstNames
.FullJoinDistinct(
lastNames,
f=> f.ID,
j=> j.ID,
(f,j)=> new {
ID = f == null ? j.ID : f.ID,
leftName = f == null ? null : f.Name,
rightName = j == null ? null : j.Name
}
);
In the future, as I learn more, I have a feeling I'll be migrating to #sehe's logic given it's popularity. But even then I'll have to be careful, because I feel it is important to have at least one overload that matches the syntax of the existing ".Join()" method if feasible, for two reasons:
Consistency in methods helps save time, avoid errors, and avoid unintended behavior.
If there ever is an out-of-the-box ".FullJoin()" method in the future, I would imagine it will try to keep to the syntax of the currently existing ".Join()" method if it can. If it does, then if you want to migrate to it, you can simply rename your functions without changing the parameters or worrying about different return types breaking your code.
I'm still new with generics, extensions, Func statements, and other features, so feedback is certainly welcome.
EDIT: Didn't take me long to realize there was a problem with my code. I was doing a .Dump() in LINQPad and looking at the return type. It was just IEnumerable, so I tried to match it. But when I actually did a .Where() or .Select() on my extension I got an error: "'System Collections.IEnumerable' does not contain a definition for 'Select' and ...". So in the end I was able to match the input syntax of .Join(), but not the return behavior.
EDIT: Added "TResult" to the return type for the function. Missed that when reading the Microsoft article, and of course it makes sense. With this fix, it now seems the return behavior is in line with my goals after all.
As you've found, Linq doesn't have an "outer join" construct. The closest you can get is a left outer join using the query you stated. To this, you can add any elements of the lastname list that aren't represented in the join:
outerJoin = outerJoin.Concat(lastNames.Select(l=>new
{
id = l.ID,
firstname = String.Empty,
surname = l.Name
}).Where(l=>!outerJoin.Any(o=>o.id == l.id)));
My clean solution for situation that key is unique in both enumerables:
private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
IEnumerable<Ta> a, IEnumerable<Tb> b,
Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
Func<Ta, Tb, TResult> selector)
{
var alookup = a.ToLookup(key_a);
var blookup = b.ToLookup(key_b);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
keys.UnionWith(blookup.Select(p => p.Key));
return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
}
so
var ax = new[] {
new { id = 1, first_name = "ali" },
new { id = 2, first_name = "mohammad" } };
var bx = new[] {
new { id = 1, last_name = "rezaei" },
new { id = 3, last_name = "kazemi" } };
var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();
outputs:
f: ali l: rezaei
f: mohammad l:
f: l: kazemi
I like sehe's answer, but it does not use deferred execution (the input sequences are eagerly enumerated by the calls to ToLookup). So after looking at the .NET sources for LINQ-to-objects, I came up with this:
public static class LinqExtensions
{
public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> left,
IEnumerable<TRight> right,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TKey, TResult> resultSelector,
IEqualityComparer<TKey> comparator = null,
TLeft defaultLeft = default(TLeft),
TRight defaultRight = default(TRight))
{
if (left == null) throw new ArgumentNullException("left");
if (right == null) throw new ArgumentNullException("right");
if (leftKeySelector == null) throw new ArgumentNullException("leftKeySelector");
if (rightKeySelector == null) throw new ArgumentNullException("rightKeySelector");
if (resultSelector == null) throw new ArgumentNullException("resultSelector");
comparator = comparator ?? EqualityComparer<TKey>.Default;
return FullOuterJoinIterator(left, right, leftKeySelector, rightKeySelector, resultSelector, comparator, defaultLeft, defaultRight);
}
internal static IEnumerable<TResult> FullOuterJoinIterator<TLeft, TRight, TKey, TResult>(
this IEnumerable<TLeft> left,
IEnumerable<TRight> right,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TKey, TResult> resultSelector,
IEqualityComparer<TKey> comparator,
TLeft defaultLeft,
TRight defaultRight)
{
var leftLookup = left.ToLookup(leftKeySelector, comparator);
var rightLookup = right.ToLookup(rightKeySelector, comparator);
var keys = leftLookup.Select(g => g.Key).Union(rightLookup.Select(g => g.Key), comparator);
foreach (var key in keys)
foreach (var leftValue in leftLookup[key].DefaultIfEmpty(defaultLeft))
foreach (var rightValue in rightLookup[key].DefaultIfEmpty(defaultRight))
yield return resultSelector(leftValue, rightValue, key);
}
}
This implementation has the following important properties:
Deferred execution, input sequences will not be enumerated before the output sequence is enumerated.
Only enumerates the input sequences once each.
Preserves order of input sequences, in the sense that it will yield tuples in the order of the left sequence and then the right (for the keys not present in left sequence).
These properties are important, because they are what someone new to FullOuterJoin but experienced with LINQ will expect.
I decided to add this as a separate answer as I am not positive it is tested enough. This is a re-implementation of the FullOuterJoin method using essentially a simplified, customized version of LINQKit Invoke/Expand for Expression so that it should work the Entity Framework. There's not much explanation as it is pretty much the same as my previous answer.
public static class Ext {
private static Expression<Func<TP, TC, TResult>> CastSMBody<TP, TC, TResult>(LambdaExpression ex, TP unusedP, TC unusedC, TResult unusedRes) => (Expression<Func<TP, TC, TResult>>)ex;
public static IQueryable<TResult> LeftOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
// (lrg,r) => resultSelector(lrg.left, r)
var sampleAnonLR = new { left = default(TLeft), rightg = default(IEnumerable<TRight>) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "lrg");
var parmC = Expression.Parameter(typeof(TRight), "r");
var argLeft = Expression.PropertyOrField(parmP, "left");
var newleftrs = CastSMBody(Expression.Lambda(resultSelector.Apply(argLeft, parmC), parmP, parmC), sampleAnonLR, default(TRight), default(TResult));
return leftItems.GroupJoin(rightItems, leftKeySelector, rightKeySelector, (left, rightg) => new { left, rightg }).SelectMany(r => r.rightg.DefaultIfEmpty(), newleftrs);
}
public static IQueryable<TResult> RightOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) {
// (lgr,l) => resultSelector(l, lgr.right)
var sampleAnonLR = new { leftg = default(IEnumerable<TLeft>), right = default(TRight) };
var parmP = Expression.Parameter(sampleAnonLR.GetType(), "lgr");
var parmC = Expression.Parameter(typeof(TLeft), "l");
var argRight = Expression.PropertyOrField(parmP, "right");
var newrightrs = CastSMBody(Expression.Lambda(resultSelector.Apply(parmC, argRight), parmP, parmC), sampleAnonLR, default(TLeft), default(TResult));
return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right })
.SelectMany(l => l.leftg.DefaultIfEmpty(), newrightrs);
}
private static Expression<Func<TParm, TResult>> CastSBody<TParm, TResult>(LambdaExpression ex, TParm unusedP, TResult unusedRes) => (Expression<Func<TParm, TResult>>)ex;
public static IQueryable<TResult> RightAntiSemiJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {
// newrightrs = lgr => resultSelector(default(TLeft), lgr.right)
var sampleAnonLgR = new { leftg = (IEnumerable<TLeft>)null, right = default(TRight) };
var parmLgR = Expression.Parameter(sampleAnonLgR.GetType(), "lgr");
var argLeft = Expression.Constant(default(TLeft), typeof(TLeft));
var argRight = Expression.PropertyOrField(parmLgR, "right");
var newrightrs = CastSBody(Expression.Lambda(resultSelector.Apply(argLeft, argRight), parmLgR), sampleAnonLgR, default(TResult));
return rightItems.GroupJoin(leftItems, rightKeySelector, leftKeySelector, (right, leftg) => new { leftg, right }).Where(lgr => !lgr.leftg.Any()).Select(newrightrs);
}
public static IQueryable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
this IQueryable<TLeft> leftItems,
IQueryable<TRight> rightItems,
Expression<Func<TLeft, TKey>> leftKeySelector,
Expression<Func<TRight, TKey>> rightKeySelector,
Expression<Func<TLeft, TRight, TResult>> resultSelector) where TLeft : class where TRight : class where TResult : class {
return leftItems.LeftOuterJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector).Concat(leftItems.RightAntiSemiJoin(rightItems, leftKeySelector, rightKeySelector, resultSelector));
}
public static Expression Apply(this LambdaExpression e, params Expression[] args) {
var b = e.Body;
foreach (var pa in e.Parameters.Cast<ParameterExpression>().Zip(args, (p, a) => (p, a))) {
b = b.Replace(pa.p, pa.a);
}
return b.PropagateNull();
}
public static Expression Replace(this Expression orig, Expression from, Expression to) => new ReplaceVisitor(from, to).Visit(orig);
public class ReplaceVisitor : System.Linq.Expressions.ExpressionVisitor {
public readonly Expression from;
public readonly Expression to;
public ReplaceVisitor(Expression _from, Expression _to) {
from = _from;
to = _to;
}
public override Expression Visit(Expression node) => node == from ? to : base.Visit(node);
}
public static Expression PropagateNull(this Expression orig) => new NullVisitor().Visit(orig);
public class NullVisitor : System.Linq.Expressions.ExpressionVisitor {
public override Expression Visit(Expression node) {
if (node is MemberExpression nme && nme.Expression is ConstantExpression nce && nce.Value == null)
return Expression.Constant(null, nce.Type.GetMember(nme.Member.Name).Single().GetMemberType());
else
return base.Visit(node);
}
}
public static Type GetMemberType(this MemberInfo member) {
switch (member) {
case FieldInfo mfi:
return mfi.FieldType;
case PropertyInfo mpi:
return mpi.PropertyType;
case EventInfo mei:
return mei.EventHandlerType;
default:
throw new ArgumentException("MemberInfo must be if type FieldInfo, PropertyInfo or EventInfo", nameof(member));
}
}
}
Performs a in-memory streaming enumeration over both inputs and invokes the selector for each row. If there is no correlation at the current iteration, one of the selector arguments will be null.
Example:
var result = left.FullOuterJoin(
right,
x=>left.Key,
x=>right.Key,
(l,r) => new { LeftKey = l?.Key, RightKey=r?.Key });
Requires an IComparer for the correlation type, uses the Comparer.Default if not provided.
Requires that 'OrderBy' is applied to the input enumerables
/// <summary>
/// Performs a full outer join on two <see cref="IEnumerable{T}" />.
/// </summary>
/// <typeparam name="TLeft"></typeparam>
/// <typeparam name="TValue"></typeparam>
/// <typeparam name="TRight"></typeparam>
/// <typeparam name="TResult"></typeparam>
/// <param name="left"></param>
/// <param name="right"></param>
/// <param name="leftKeySelector"></param>
/// <param name="rightKeySelector"></param>
/// <param name="selector">Expression defining result type</param>
/// <param name="keyComparer">A comparer if there is no default for the type</param>
/// <returns></returns>
[System.Diagnostics.DebuggerStepThrough]
public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TValue, TResult>(
this IEnumerable<TLeft> left,
IEnumerable<TRight> right,
Func<TLeft, TValue> leftKeySelector,
Func<TRight, TValue> rightKeySelector,
Func<TLeft, TRight, TResult> selector,
IComparer<TValue> keyComparer = null)
where TLeft: class
where TRight: class
where TValue : IComparable
{
keyComparer = keyComparer ?? Comparer<TValue>.Default;
using (var enumLeft = left.OrderBy(leftKeySelector).GetEnumerator())
using (var enumRight = right.OrderBy(rightKeySelector).GetEnumerator())
{
var hasLeft = enumLeft.MoveNext();
var hasRight = enumRight.MoveNext();
while (hasLeft || hasRight)
{
var currentLeft = enumLeft.Current;
var valueLeft = hasLeft ? leftKeySelector(currentLeft) : default(TValue);
var currentRight = enumRight.Current;
var valueRight = hasRight ? rightKeySelector(currentRight) : default(TValue);
int compare =
!hasLeft ? 1
: !hasRight ? -1
: keyComparer.Compare(valueLeft, valueRight);
switch (compare)
{
case 0:
// The selector matches. An inner join is achieved
yield return selector(currentLeft, currentRight);
hasLeft = enumLeft.MoveNext();
hasRight = enumRight.MoveNext();
break;
case -1:
yield return selector(currentLeft, default(TRight));
hasLeft = enumLeft.MoveNext();
break;
case 1:
yield return selector(default(TLeft), currentRight);
hasRight = enumRight.MoveNext();
break;
}
}
}
}
I've written this extensions class for an app perhaps 6 years ago, and have been using it ever since in many solutions without issues. Hope it helps.
edit: I noticed some might not know how to use an extension class.
To use this extension class, just reference its namespace in your class by adding the following line
using joinext;
^ this should allow you to to see the intellisense of extension functions on any IEnumerable object collection you happen to use.
Hope this helps. Let me know if it's still not clear, and I'll hopefully write a sample example on how to use it.
Now here is the class:
namespace joinext
{
public static class JoinExtensions
{
public static IEnumerable<TResult> FullOuterJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
where TInner : class
where TOuter : class
{
var innerLookup = inner.ToLookup(innerKeySelector);
var outerLookup = outer.ToLookup(outerKeySelector);
var innerJoinItems = inner
.Where(innerItem => !outerLookup.Contains(innerKeySelector(innerItem)))
.Select(innerItem => resultSelector(null, innerItem));
return outer
.SelectMany(outerItem =>
{
var innerItems = innerLookup[outerKeySelector(outerItem)];
return innerItems.Any() ? innerItems : new TInner[] { null };
}, resultSelector)
.Concat(innerJoinItems);
}
public static IEnumerable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
{
return outer.GroupJoin(
inner,
outerKeySelector,
innerKeySelector,
(o, i) =>
new { o = o, i = i.DefaultIfEmpty() })
.SelectMany(m => m.i.Select(inn =>
resultSelector(m.o, inn)
));
}
public static IEnumerable<TResult> RightJoin<TOuter, TInner, TKey, TResult>(
this IEnumerable<TOuter> outer,
IEnumerable<TInner> inner,
Func<TOuter, TKey> outerKeySelector,
Func<TInner, TKey> innerKeySelector,
Func<TOuter, TInner, TResult> resultSelector)
{
return inner.GroupJoin(
outer,
innerKeySelector,
outerKeySelector,
(i, o) =>
new { i = i, o = o.DefaultIfEmpty() })
.SelectMany(m => m.o.Select(outt =>
resultSelector(outt, m.i)
));
}
}
}
Full outer join for two or more tables:
First extract the column that you want to join on.
var DatesA = from A in db.T1 select A.Date;
var DatesB = from B in db.T2 select B.Date;
var DatesC = from C in db.T3 select C.Date;
var Dates = DatesA.Union(DatesB).Union(DatesC);
Then use left outer join between the extracted column and main tables.
var Full_Outer_Join =
(from A in Dates
join B in db.T1
on A equals B.Date into AB
from ab in AB.DefaultIfEmpty()
join C in db.T2
on A equals C.Date into ABC
from abc in ABC.DefaultIfEmpty()
join D in db.T3
on A equals D.Date into ABCD
from abcd in ABCD.DefaultIfEmpty()
select new { A, ab, abc, abcd })
.AsEnumerable();
I think that LINQ join clause isn't the correct solution to this problem, because of join clause purpose isn't to accumulate data in such way as required for this task solution. The code to merge created separate collections becomes too complicated, maybe it is OK for learning purposes, but not for real applications. One of the ways how to solve this problem is in the code below:
class Program
{
static void Main(string[] args)
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });
List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });
HashSet<int> ids = new HashSet<int>();
foreach (var name in firstNames)
{
ids.Add(name.ID);
}
foreach (var name in lastNames)
{
ids.Add(name.ID);
}
List<FullName> fullNames = new List<FullName>();
foreach (int id in ids)
{
FullName fullName = new FullName();
fullName.ID = id;
FirstName firstName = firstNames.Find(f => f.ID == id);
fullName.FirstName = firstName != null ? firstName.Name : string.Empty;
LastName lastName = lastNames.Find(l => l.ID == id);
fullName.LastName = lastName != null ? lastName.Name : string.Empty;
fullNames.Add(fullName);
}
}
}
public class FirstName
{
public int ID;
public string Name;
}
public class LastName
{
public int ID;
public string Name;
}
class FullName
{
public int ID;
public string FirstName;
public string LastName;
}
If real collections are large for HashSet formation instead foreach loops can be used the code below:
List<int> firstIds = firstNames.Select(f => f.ID).ToList();
List<int> LastIds = lastNames.Select(l => l.ID).ToList();
HashSet<int> ids = new HashSet<int>(firstIds.Union(LastIds));//Only unique IDs will be included in HashSet
Thank You everybody for the interesting posts!
I modified the code because in my case I needed
a personalized join predicate
a personalized union distinct comparer
For the ones interested this is my modified code (in VB, sorry)
Module MyExtensions
<Extension()>
Friend Function FullOuterJoin(Of TA, TB, TResult)(ByVal a As IEnumerable(Of TA), ByVal b As IEnumerable(Of TB), ByVal joinPredicate As Func(Of TA, TB, Boolean), ByVal projection As Func(Of TA, TB, TResult), ByVal comparer As IEqualityComparer(Of TResult)) As IEnumerable(Of TResult)
Dim joinL =
From xa In a
From xb In b.Where(Function(x) joinPredicate(xa, x)).DefaultIfEmpty()
Select projection(xa, xb)
Dim joinR =
From xb In b
From xa In a.Where(Function(x) joinPredicate(x, xb)).DefaultIfEmpty()
Select projection(xa, xb)
Return joinL.Union(joinR, comparer)
End Function
End Module
Dim fullOuterJoin = lefts.FullOuterJoin(
rights,
Function(left, right) left.Code = right.Code And (left.Amount [...] Or left.Description.Contains [...]),
Function(left, right) New CompareResult(left, right),
New MyEqualityComparer
)
Public Class MyEqualityComparer
Implements IEqualityComparer(Of CompareResult)
Private Function GetMsg(obj As CompareResult) As String
Dim msg As String = ""
msg &= obj.Code & "_"
[...]
Return msg
End Function
Public Overloads Function Equals(x As CompareResult, y As CompareResult) As Boolean Implements IEqualityComparer(Of CompareResult).Equals
Return Me.GetMsg(x) = Me.GetMsg(y)
End Function
Public Overloads Function GetHashCode(obj As CompareResult) As Integer Implements IEqualityComparer(Of CompareResult).GetHashCode
Return Me.GetMsg(obj).GetHashCode
End Function
End Class
Yet another full outer join
As was not that happy with the simplicity and the readability of the other propositions, I ended up with this :
It does not have the pretension to be fast ( about 800 ms to join 1000 * 1000 on a 2020m CPU : 2.4ghz / 2cores). To me, it is just a compact and casual full outer join.
It works the same as a SQL FULL OUTER JOIN (duplicates conservation)
Cheers ;-)
using System;
using System.Collections.Generic;
using System.Linq;
namespace NS
{
public static class DataReunion
{
public static List<Tuple<T1, T2>> FullJoin<T1, T2, TKey>(List<T1> List1, Func<T1, TKey> KeyFunc1, List<T2> List2, Func<T2, TKey> KeyFunc2)
{
List<Tuple<T1, T2>> result = new List<Tuple<T1, T2>>();
Tuple<TKey, T1>[] identifiedList1 = List1.Select(_ => Tuple.Create(KeyFunc1(_), _)).OrderBy(_ => _.Item1).ToArray();
Tuple<TKey, T2>[] identifiedList2 = List2.Select(_ => Tuple.Create(KeyFunc2(_), _)).OrderBy(_ => _.Item1).ToArray();
identifiedList1.Where(_ => !identifiedList2.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
result.Add(Tuple.Create<T1, T2>(_.Item2, default(T2)));
});
result.AddRange(
identifiedList1.Join(identifiedList2, left => left.Item1, right => right.Item1, (left, right) => Tuple.Create<T1, T2>(left.Item2, right.Item2)).ToList()
);
identifiedList2.Where(_ => !identifiedList1.Select(__ => __.Item1).Contains(_.Item1)).ToList().ForEach(_ => {
result.Add(Tuple.Create<T1, T2>(default(T1), _.Item2));
});
return result;
}
}
}
The idea is to
Build Ids based on provided key function builders
Process left only items
Process inner join
Process right only items
Here is a succinct test that goes with it :
Place a break point at the end to manually verify that it behaves as expected
using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using NS;
namespace Tests
{
[TestClass]
public class DataReunionTest
{
[TestMethod]
public void Test()
{
List<Tuple<Int32, Int32, String>> A = new List<Tuple<Int32, Int32, String>>();
List<Tuple<Int32, Int32, String>> B = new List<Tuple<Int32, Int32, String>>();
Random rnd = new Random();
/* Comment the testing block you do not want to run
/* Solution to test a wide range of keys*/
for (int i = 0; i < 500; i += 1)
{
A.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "A"));
B.Add(Tuple.Create(rnd.Next(1, 101), rnd.Next(1, 101), "B"));
}
/* Solution for essential testing*/
A.Add(Tuple.Create(1, 2, "B11"));
A.Add(Tuple.Create(1, 2, "B12"));
A.Add(Tuple.Create(1, 3, "C11"));
A.Add(Tuple.Create(1, 3, "C12"));
A.Add(Tuple.Create(1, 3, "C13"));
A.Add(Tuple.Create(1, 4, "D1"));
B.Add(Tuple.Create(1, 1, "A21"));
B.Add(Tuple.Create(1, 1, "A22"));
B.Add(Tuple.Create(1, 1, "A23"));
B.Add(Tuple.Create(1, 2, "B21"));
B.Add(Tuple.Create(1, 2, "B22"));
B.Add(Tuple.Create(1, 2, "B23"));
B.Add(Tuple.Create(1, 3, "C2"));
B.Add(Tuple.Create(1, 5, "E2"));
Func<Tuple<Int32, Int32, String>, Tuple<Int32, Int32>> key = (_) => Tuple.Create(_.Item1, _.Item2);
var watch = System.Diagnostics.Stopwatch.StartNew();
var res = DataReunion.FullJoin(A, key, B, key);
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
String aser = JToken.FromObject(res).ToString(Formatting.Indented);
Console.Write(elapsedMs);
}
}
}
I really hate these linq expressions, this is why SQL exists:
select isnull(fn.id, ln.id) as id, fn.firstname, ln.lastname
from firstnames fn
full join lastnames ln on ln.id=fn.id
Create this as sql view in database and import it as entity.
Of course, (distinct) union of left and right joins will make it too, but it is stupid.

.net core - combine a list of func with or to a single func

Hello i try to generate a single Func from a list combined by or.
var funcs = new List<Func<User, bool>>()
{
(u) => u.Id.Equals(entityToFind.Id),
(u) => u.UserName == entityToFind.UserName,
(u) => u.Email == entityToFind.Email
};
//TODO: Some magic that funs is euqaly to that:
Func<User, bool> func = (u) => u.Id.Equals(entityToFind.Id) || u.UserName == entityToFind.UserName || u.Email == entityToFind.Email;
I also tried it with Expressions, like that:
private Dictionary<string, Expression<Func<User, bool>>> private Dictionary<string, Expression<Func<User, bool>>> test(User entityToFind)
{
return new Dictionary<string, Expression<Func<User, bool>>>() {
{"Id", (u) => u.Id.Equals(entityToFind.Id) },
{"Name", (u) => u.UserName == entityToFind.UserName },
{"Email", (u) => u.Email == entityToFind.Email }
};
}
public static Expression<Func<T, bool>> ToOrExpression<T>(this Dictionary<string, Expression<Func<T, bool>>> dict)
{
var expressions = dict.Values.ToList();
if (!expressions.Any())
{
return t => true;
}
var delegateType = typeof(Func<T, bool>)
.GetGenericTypeDefinition()
.MakeGenericType(new[]
{
typeof(T),
typeof(bool)
}
);
var tfd = Expression.OrElse(expressions[0], expressions[1]);
var combined = expressions
.Cast<Expression>()
.Aggregate( (e1, e2) => Expression.OrElse(e1, e2) );
return (Expression<Func<T, bool>>)Expression.Lambda(delegateType, combined);
}
test(entityToFind).ToOrExpression();
But there i will get the following error:
The binary operator OrElse is not defined for the types 'System.Func2[Models.User,System.Boolean]' and
'System.Func2[Models.User,System.Boolean]'
While you could create a wrapper method to combine a bunch of Funcs, because you are using Entity Framework, that would cause the entire dataset to be downloaded into memory and the search done locally. What you should be using is Expression<Func<T, bool>> instead.
Fortunately Marc Gravell has already written a handy bit of code to combine expressions. Your question is strictly a dupliucate because you want to combine more than 2 together, but that is quite easy with a little Linq. So, lets start with your expressions first, the code barely changes:
var expressions = new List<Expression<Func<User, bool>>>()
{
(u) => u.Id.Equals(entityToFind.Id),
(u) => u.UserName == entityToFind.UserName,
(u) => u.Email == entityToFind.Email
};
Now using Marc's code and modifying it to be or instead of and:
public static class ExpressionExtensions
{
public static Expression<Func<T, bool>> OrElse<T>(
this Expression<Func<T, bool>> expr1,
Expression<Func<T, bool>> expr2)
{
var parameter = Expression.Parameter(typeof(T));
var leftVisitor = new ReplaceExpressionVisitor(expr1.Parameters[0], parameter);
var left = leftVisitor.Visit(expr1.Body);
var rightVisitor = new ReplaceExpressionVisitor(expr2.Parameters[0], parameter);
var right = rightVisitor.Visit(expr2.Body);
return Expression.Lambda<Func<T, bool>>(
Expression.OrElse(left, right), parameter);
}
private class ReplaceExpressionVisitor
: ExpressionVisitor
{
private readonly Expression _oldValue;
private readonly Expression _newValue;
public ReplaceExpressionVisitor(Expression oldValue, Expression newValue)
{
_oldValue = oldValue;
_newValue = newValue;
}
public override Expression Visit(Expression node)
{
if (node == _oldValue)
return _newValue;
return base.Visit(node);
}
}
}
No you combine your expressions with the Linq Aggregate method:
var combinedExpression = expressions.Aggregate((x, y) => x.OrElse(y));
And use it something like this:
var result = db.Things.Where(combinedExpression);

Linq query filter with "contains" with list<T> multiple elements

I want to query a list by Linq but filter by an other list containing two elements ( Name, Status) in my example.
This is inspired by an old question I've adapted to my issue.
LINQ: "contains" and a Lambda query
(in this answer it's working for only one element i.e. Status)
I try to use the "contains" method but didn't succeed to filter my list.
I should obtain a result with only two buildings (two, five)
Has anyone an idea where I'm stopped ?
Thanks
Blockquote
public class Building
{
public enum StatusType
{
open,
closed,
weird,
};
public string Name { get; set; }
public StatusType Status { get; set; }
}
private static readonly List<Building> BuildingList = new List<Building>()
{
new Building() {Name = "one", Status = Building.StatusType.open},
new Building() {Name = "two", Status = Building.StatusType.closed},
new Building() {Name = "three", Status = Building.StatusType.weird},
new Building() {Name = "four", Status = Building.StatusType.open},
new Building() {Name = "five", Status = Building.StatusType.closed},
new Building() {Name = "six", Status = Building.StatusType.weird},
};
private void GetResult()
{
var buildingSelect = new List<Building>
{
new Building() {Name = "two", Status = Building.StatusType.closed},
new Building() {Name = "five", Status = Building.StatusType.closed}
};
var q = (from building in BuildingList
where buildingSelect.Contains(building.Name, building.Status)
select building).ToList();
dataGridView1.DataSource = q;
}
The main problem of your LINQ is that you are trying the compare the equality of two Buildings, which LINQ can only compare by their references because Building does not implement IEquatable<Building> nor override object.Equals.
One way to solve it is to manually specify which properties to compare for equality as per #Wayne's answer.
The other way is, if Building instances are meant to be equated by their values and not by their references, implement IEquatable<Building> and override object.Equals:
public class Building : IEquatable<Building>
{
public Building(string name, StatusType status)
{
Name = name;
Status = status;
}
public enum StatusType
{
open,
closed,
weird,
};
public string Name { get; }
public StatusType Status { get; }
public static bool operator ==(Building left, Building right)
=> Equals(left, right);
public static bool operator !=(Building left, Building right)
=> !Equals(left, right);
public override bool Equals(object obj) => Equals(obj as Building);
public bool Equals(Building other)
{
if (ReferenceEquals(this, other))
{
return true;
}
if (ReferenceEquals(other, null) || GetType() != other.GetType())
{
return false;
}
return Name == other.Name && Status == other.Status;
}
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + Name?.GetHashCode() ?? 0;
hash = hash * 23 + Status.GetHashCode();
return hash;
}
}
}
That way, your original code would work because List.Contains will now use your implementation of IEquatable<Building> to check for equality.
You mean something like this?
var q = from b in BuildingList
from bs in buildingSelect
where b.Name == bs.Name && b.Status == bs.Status
select b;
or perhaps:
var q = from b in BuildingList
join bs in buildingSelect
on new { b.Name, b.Status } equals new { bs.Name, bs.Status }
select b;
You can either override the equality in the class itself, IF this makes sense.
Or just make the check normally with Any(), like this:
var q = (from building in BuildingList
where buildingSelect.Any(b => b.Name == building.Name
&& b.Status == building.Status)
select building).ToList();

How can I transform this linq expression?

Say I have an entity that I want to query with ranking applied:
public class Person: Entity
{
public int Id { get; protected set; }
public string Name { get; set; }
public DateTime Birthday { get; set; }
}
In my query I have the following:
Expression<Func<Person, object>> orderBy = x => x.Name;
var dbContext = new MyDbContext();
var keyword = "term";
var startsWithResults = dbContext.People
.Where(x => x.Name.StartsWith(keyword))
.Select(x => new {
Rank = 1,
Entity = x,
});
var containsResults = dbContext.People
.Where(x => !startsWithResults.Select(y => y.Entity.Id).Contains(x.Id))
.Where(x => x.Name.Contains(keyword))
.Select(x => new {
Rank = 2,
Entity = x,
});
var rankedResults = startsWithResults.Concat(containsResults)
.OrderBy(x => x.Rank);
// TODO: apply thenby ordering here based on the orderBy expression above
dbContext.Dispose();
I have tried ordering the results before selecting the anonymous object with the Rank property, but the ordering ends up getting lost. It seems that linq to entities discards the ordering of the separate sets and converts back to natural ordering during both Concat and Union.
What I think I may be able to do is dynamically transform the expression defined in the orderBy variable from x => x.Name to x => x.Entity.Name, but I'm not sure how:
if (orderBy != null)
{
var transformedExpression = ???
rankedResults = rankedResults.ThenBy(transformedExpression);
}
How might I be able to use Expression.Lambda to wrap x => x.Name into x => x.Entity.Name? When I hard code x => x.Entity.Name into the ThenBy I get the ordering that I want, but the orderBy is provided by the calling class of the query, so I don't want to hard-code it in. I have it hardcoded in the example above for simplicity of explanation only.
This should help. However you are going to have to concrete up the Anonymous type for this to work. My LinqPropertyChain will not work with it, since its going to be difficult to create the Expression<Func<Anonymous, Person>> whilst its still Anonymous.
Expression<Func<Person, object>> orderBy = x => x.Name;
using(var dbContext = new MyDbContext())
{
var keyword = "term";
var startsWithResults = dbContext.People
.Where(x => x.Name.StartsWith(keyword))
.Select(x => new {
Rank = 1,
Entity = x,
});
var containsResults = dbContext.People
.Where(x => !startsWithResults.Select(y => y.Entity.Id).Contains(x.Id))
.Where(x => x.Name.Contains(keyword))
.Select(x => new {
Rank = 2,
Entity = x,
});
var rankedResults = startsWithResults.Concat(containsResults)
.OrderBy(x => x.Rank)
.ThenBy(LinqPropertyChain.Chain(x => x.Entity, orderBy));
// TODO: apply thenby ordering here based on the orderBy expression above
}
public static class LinqPropertyChain
{
public static Expression<Func<TInput, TOutput>> Chain<TInput, TOutput, TIntermediate>(
Expression<Func<TInput, TIntermediate>> outter,
Expression<Func<TIntermediate, TOutput>> inner
)
{
Console.WriteLine(inner);
Console.WriteLine(outter);
var visitor = new Visitor(new Dictionary<ParameterExpression, Expression>
{
{inner.Parameters[0], outter.Body}
});
var newBody = visitor.Visit(inner.Body);
Console.WriteLine(newBody);
return Expression.Lambda<Func<TInput, TOutput>>(newBody, outter.Parameters);
}
private class Visitor : ExpressionVisitor
{
private readonly Dictionary<ParameterExpression, Expression> _replacement;
public Visitor(Dictionary<ParameterExpression, Expression> replacement)
{
_replacement = replacement;
}
protected override Expression VisitParameter(ParameterExpression node)
{
if (_replacement.ContainsKey(node))
return _replacement[node];
else
{
return node;
}
}
}
}
Figured out a way to do this with less Explicite Generics.
Expression<Func<Person, object>> orderBy = x => x.Name;
Expression<Func<Foo, Person>> personExpression = x => x.Person;
var helper = new ExpressionChain(personExpression);
var chained = helper.Chain(orderBy).Expression;
// Define other methods and classes here
public class ExpressionChain<TInput, TOutput>
{
private readonly Expression<Func<TInput, TOutput>> _expression;
public ExpressionChain(Expression<Func<TInput, TOutput>> expression)
{
_expression = expression;
}
public Expression<Func<TInput, TOutput>> Expression { get { return _expression; } }
public ExpressionChain<TInput, TChained> Chain<TChained>
(Expression<Func<TOutput, TChained>> chainedExpression)
{
var visitor = new Visitor(new Dictionary<ParameterExpression, Expression>
{
{_expression.Parameters[0], chainedExpression.Body}
});
var lambda = Expression.Lambda<Func<TInput, TOutput>>(newBody, outter.Parameters);
return new ExpressionChain(lambda);
}
private class Visitor : ExpressionVisitor
{
private readonly Dictionary<ParameterExpression, Expression> _replacement;
public Visitor(Dictionary<ParameterExpression, Expression> replacement)
{
_replacement = replacement;
}
protected override Expression VisitParameter(ParameterExpression node)
{
if (_replacement.ContainsKey(node))
return _replacement[node];
else
{
return node;
}
}
}
}
Since you're ordering by Rank first, and the Rank values are identical within each sequence, you should be able to just sort independently and then concatenate. It sounds like the hiccup here would be that, according to your post, Entity Framework isn't maintaining sorting across Concat or Union operations. You should be able to get around this by forcing the concatenation to happen client-side:
var rankedResults = startsWithResults.OrderBy(orderBy)
.AsEnumerable()
.Concat(containsResults.OrderBy(orderBy));
This should render the Rank property unnecessary and probably simplify the SQL queries being executed against your database, and it doesn't require mucking about with expression trees.
The downside is that, once you call AsEnumerable(), you no longer have the option of appending additional database-side operations (i.e., if you chain additional LINQ operators after Concat, they will use the LINQ-to-collections implementations). Looking at your code, I don't think this would be a problem for you, but it's worth mentioning.

How to retrieve ordering information from IQueryable object?

Let's say, I have an instance of IQueryable. How can I found out by which parameters it was ordered?
Here is how OrderBy() method looks like (as a reference):
public static IOrderedQueryable<T> OrderBy<T, TKey>(
this IQueryable<T> source, Expression<Func<T, TKey>> keySelector)
{
return (IOrderedQueryable<T>)source.Provider.CreateQuery<T>(
Expression.Call(null,
((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(
new Type[] { typeof(T), typeof(TKey) }
),
new Expression[] { source.Expression, Expression.Quote(keySelector) }
)
);
}
A hint from Matt Warren:
All queryables (even IOrderedQueryable's) have expression trees underlying them that encode the activity they represent. You should find using the IQueryable.Expression property a method-call expression node representing a call to the Queryable.OrderBy method with the actual arguments listed. You can decode from the keySelector argument the expression used for ordering. Take a look at the IOrderedQueryable object instance in the debugger to see what I mean.
This isn't pretty, but it seems to do the job:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Linq.Expressions;
using System.Windows.Forms;
public class Test
{
public int A;
public string B { get; set; }
public DateTime C { get; set; }
public float D;
}
public class QueryOrderItem
{
public QueryOrderItem(Expression expression, bool ascending)
{
this.Expression = expression;
this.Ascending = ascending;
}
public Expression Expression { get; private set; }
public bool Ascending { get; private set; }
public override string ToString()
{
return (Ascending ? "asc: " : "desc: ") + Expression;
}
}
static class Program
{
public static List<QueryOrderItem> GetQueryOrder(Expression expression)
{
var members = new List<QueryOrderItem>(); // queue for easy FILO
GetQueryOrder(expression, members, 0);
return members;
}
static void GetQueryOrder(Expression expr, IList<QueryOrderItem> members, int insertPoint)
{
if (expr == null) return;
switch (expr.NodeType)
{
case ExpressionType.Call:
var mce = (MethodCallExpression)expr;
if (mce.Arguments.Count > 1)
{ // OrderBy etc is expressed in arg1
switch (mce.Method.Name)
{ // note OrderBy[Descending] shifts the insertPoint, but ThenBy[Descending] doesn't
case "OrderBy": // could possibly check MemberInfo
members.Insert(insertPoint, new QueryOrderItem(mce.Arguments[1], true));
insertPoint = members.Count; // swaps order to enforce stable sort
break;
case "OrderByDescending":
members.Insert(insertPoint, new QueryOrderItem(mce.Arguments[1], false));
insertPoint = members.Count;
break;
case "ThenBy":
members.Insert(insertPoint, new QueryOrderItem(mce.Arguments[1], true));
break;
case "ThenByDescending":
members.Insert(insertPoint, new QueryOrderItem(mce.Arguments[1], false));
break;
}
}
if (mce.Arguments.Count > 0)
{ // chained on arg0
GetQueryOrder(mce.Arguments[0], members, insertPoint);
}
break;
}
}
static void Main()
{
var data = new[] {
new Test { A = 1, B = "abc", C = DateTime.Now, D = 12.3F},
new Test { A = 2, B = "abc", C = DateTime.Today, D = 12.3F},
new Test { A = 1, B = "def", C = DateTime.Today, D = 10.1F}
}.AsQueryable();
var ordered = (from item in data
orderby item.D descending
orderby item.C
orderby item.A descending, item.B
select item).Take(20);
// note: under the "stable sort" rules, this should actually be sorted
// as {-A, B, C, -D}, since the last order by {-A,B} preserves (in the case of
// a match) the preceding sort {C}, which in turn preserves (for matches) {D}
var members = GetQueryOrder(ordered.Expression);
foreach (var item in members)
{
Console.WriteLine(item.ToString());
}
// used to investigate the tree
TypeDescriptor.AddAttributes(typeof(Expression), new[] {
new TypeConverterAttribute(typeof(ExpandableObjectConverter)) });
Application.Run(new Form
{
Controls = {
new PropertyGrid { Dock = DockStyle.Fill, SelectedObject = ordered.Expression }
}
});
}
}

Resources