Solve Linq expression using Distinct, GroupBy and Take - linq

I am using Entity Framework with lambda expressions. I need to select a fixed number of records from the grid (_numRecords), with sorting applied and duplicates removed based on one column (distinct by a column). Here is the code:
private IEnumerable<ReadViewModel> generateLocalData(IQueryable<ReadViewModel> query, [DataSourceRequest] DataSourceRequest dsRequest, int exportXml)
{
if (exportXml > 0)
query = query.GroupBy(x => x.Id).Select(x => x.FirstOrDefault());
//query = query.OrderByDescending(x => x.EventDate).GroupBy(x => x.Id).Select(x => x.FirstOrDefault());
//query = query.DistinctBy(x => x.Id).AsQueryable<ReadViewModel>();
query = query.OrderByDescending(x => x.EventDate);
query = query.ApplySorting(dsRequest.Groups, dsRequest.Sorts);
query = query.Take(this._numRecords);
List<ReadViewModel> data;
data = query.ToList();
return data;
}
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> knownKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
The grid data can contain duplicate Id values, but when I export it to XML the records should be unique.
The problem is that applying a distinct operation to the query destroys the sorting, and because I am using Take(), the distinct step has to happen before the data is fetched from the database.
I've tried two approaches to achieve distinct: GroupBy with Single/Select, and DistinctBy from extension methods. I can't get either to work, because the sorting is lost.

private static IQueryable<T> DistinctBy<T, TKey>(this IQueryable<T> source, Expression<Func<T, TKey>> keySelector)
{
return source.GroupBy(keySelector).Select(grp => grp.First());
}
Group by the key you want to be distinct by, then take the first item from each group. You might need FirstOrDefault() with some providers that don't do well with First() within queries.
This is sub-optimal when used with linq-to-objects, but you have the optimal approach for that case in your question already.
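To make the ordering of the steps concrete, here is a minimal sketch that de-duplicates first, then sorts, then takes the first N rows. It runs against an in-memory list via AsQueryable(), so it says nothing about the SQL a particular EF provider would generate. The Row class, the DistinctByKey name (renamed to avoid clashing with the built-in DistinctBy in newer .NET versions) and TopDistinct are assumptions made for illustration. Note that which duplicate survives in each group is only predictable here because the source is an in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for ReadViewModel.
public class Row
{
    public int Id { get; set; }
    public DateTime EventDate { get; set; }
}

public static class QueryableExtensions
{
    // Same idea as the answer: group by the key, keep one item per group.
    public static IQueryable<T> DistinctByKey<T, TKey>(
        this IQueryable<T> source, Expression<Func<T, TKey>> keySelector)
    {
        return source.GroupBy(keySelector).Select(grp => grp.FirstOrDefault());
    }
}

public static class DistinctThenSortDemo
{
    // De-duplicate first, sort afterwards, and only then limit the row count.
    public static List<Row> TopDistinct(IQueryable<Row> query, int numRecords)
    {
        return query
            .DistinctByKey(x => x.Id)
            .OrderByDescending(x => x.EventDate)
            .Take(numRecords)
            .ToList();
    }

    public static IQueryable<Row> SampleRows()
    {
        return new List<Row>
        {
            new Row { Id = 1, EventDate = new DateTime(2020, 1, 1) },
            new Row { Id = 1, EventDate = new DateTime(2020, 1, 5) },
            new Row { Id = 2, EventDate = new DateTime(2020, 1, 3) },
            new Row { Id = 3, EventDate = new DateTime(2020, 1, 2) },
        }.AsQueryable();
    }

    public static void Main()
    {
        foreach (var row in TopDistinct(SampleRows(), 2))
            Console.WriteLine($"{row.Id} {row.EventDate:yyyy-MM-dd}");
    }
}
```

Because the sort is applied after the grouping, the grouping can no longer disturb it; any grid sorting (ApplySorting in the question) would go in the same place as OrderByDescending.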

Related

EF 6 Core IEnumerable<T> Projection to Dynamic Object

I have been struggling on this for a day now and I can't figure out how to do this:
I have an IEnumerable (Query generated in steps) that I'm taking in as a param (right after the where clause).
Based on a list of column names, I'm trying to create a dynamic select clause.
public static IQueryable<T> AddSelectClause<T>(SearchDto criteria, IQueryable<T> queryable, ControllerStateManager stateManager)
{
if (criteria.SelectColumns != null)
{
var formattedColumns = string.Join(", ", criteria.SelectColumns);
queryable.SelectMany(c => $"new({formattedColumns})");
}
return queryable;
}
Obviously this does not work: there is no exception, it just returns all columns.
I tried this:
queryable.Select(c => new { c.Id, c.Name });
IntelliSense doesn't like this and says: T does not contain a definition for Id (same for Name).
That is about the depth of my LINQ knowledge; any suggestions would be greatly appreciated.

How to query a datatable in C#

I have a datatable with a number of columns. Two of them are JobNum and ProcessNum. I want to see if I have duplicate records in the datatable.
Normally in SQL I would just do a
select count(jobnum) cntjobnum, JobNum, ProcessNum
from table
where jobnum = '1234' and processnum = '5678'
Is there a similar manner in C# (using a C# console app)?
I want to see if I have duplicate records in the datatable.
I don't think that your SQL statement will detect duplicate records, do you?
There is no LINQ method that tells you whether an IEnumerable<...> contains duplicates, but you can write one. If you only want to know whether there are duplicates, it is not efficient to keep processing the remaining elements after you have found one. So the most efficient method stops as soon as it finds a duplicate.
public static bool HasDuplicates<T>(this IEnumerable<T> source)
{
return HasDuplicates(source, null);
}
public static bool HasDuplicates<T>(this IEnumerable<T> source,
IEqualityComparer<T> comparer)
{
if (source == null) throw new ArgumentNullException(...);
// if null comparer, use the default comparer:
if (comparer == null) comparer = EqualityComparer<T>.Default;
HashSet<T> set = new HashSet<T>(comparer);
foreach (T t in source)
{
if (set.Contains(t))
return true; // found a duplicate
else
set.Add(t);
}
return false; // no duplicates found
}
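As a usage sketch, the same early-exit idea can be written more compactly by relying on the return value of HashSet<T>.Add, which is false when the item is already present. The class name DuplicateCheck and the sample data are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateCheck
{
    // Returns true as soon as the first repeated element is seen.
    public static bool HasDuplicates<T>(this IEnumerable<T> source,
        IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // A null comparer falls back to the default equality comparer.
        var seen = new HashSet<T>(comparer ?? EqualityComparer<T>.Default);
        foreach (T item in source)
        {
            // Add returns false when the item was already in the set.
            if (!seen.Add(item))
                return true;
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(new[] { 1, 2, 2, 3 }.HasDuplicates());   // True
        Console.WriteLine(new[] { "a", "A" }.HasDuplicates());     // False
        Console.WriteLine(new[] { "a", "A" }
            .HasDuplicates(StringComparer.OrdinalIgnoreCase));     // True
    }
}
```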
If you want to find all duplicates, consider using GroupBy with the complete item as the key.
IEnumerable<Product> products = ...
IEqualityComparer<Product> comparer = ... // or use default comparison
var duplicates = products.GroupBy(product => product, comparer)
// keep only the groups that have more than one element
.Where(group => group.Skip(1).Any())
// from the remaining groups, take the key (which is the product)
.Select(group => group.Key);
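Applied to the DataTable from the question, the GroupBy approach might look like the sketch below. It assumes JobNum and ProcessNum are string columns and that a duplicate means two rows sharing both values; the class and method names are hypothetical. AsEnumerable() and Field<T>() are the DataTable extension methods from the System.Data namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DuplicateRowsDemo
{
    // Returns one entry per (JobNum, ProcessNum) pair that occurs more than once.
    public static List<(string JobNum, string ProcessNum, int Count)> FindDuplicates(DataTable table)
    {
        return table.AsEnumerable()
            .GroupBy(row => (JobNum: row.Field<string>("JobNum"),
                             ProcessNum: row.Field<string>("ProcessNum")))
            // keep only the groups that have more than one row
            .Where(g => g.Skip(1).Any())
            .Select(g => (g.Key.JobNum, g.Key.ProcessNum, Count: g.Count()))
            .ToList();
    }

    public static DataTable SampleTable()
    {
        var table = new DataTable();
        table.Columns.Add("JobNum", typeof(string));
        table.Columns.Add("ProcessNum", typeof(string));
        table.Rows.Add("1234", "5678");
        table.Rows.Add("1234", "5678");
        table.Rows.Add("1234", "9999");
        return table;
    }

    public static void Main()
    {
        foreach (var d in FindDuplicates(SampleTable()))
            Console.WriteLine($"{d.JobNum} {d.ProcessNum} x{d.Count}");
    }
}
```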

Linq query where there's a certain desired relationship between items in the result

A linq query Where clause can apply a func to an item in the original set and return a bool to include or not include the item based on the item's characteristics. Great stuff:
var q = myColl.Where(o => o.EffectiveDate == LastThursday);
But what if I want to find a set of items where each item is related to the last item in some way? Like:
var q = myColl.Where(o => o.EffectiveDate == thePreviousItem.ExpirationDate);
How do you make a Where (or other linq function) "jump out" of the current item?
Here's what I tried, trying to be clever. I made every item an array just so I can use the Aggregate function:
public IQueryable<T> CurrentVersions
{
get => AllVersions
.Select(vo => new T[] { vo })
.Aggregate((voa1, voa2) => voa1[0].BusinessExpirationDate.Value == voa2[0].BusinessEffectiveDate.Value ? voa1.Concat(voa2).ToArray() : voa1)
.SelectMany(vo => vo);
}
but that doesn't compile on the SelectMany:
The type arguments for method Enumerable.SelectMany<TSource,
TResult>(IEnumerable<TSource>, Func<TSource, IEnumerable<TResult>>)
cannot be inferred from the usage. Try specifying the type arguments
explicitly.
EDIT (SOLUTION)
As it turns out, I was on the right track, but was just confused about what SelectMany does. I didn't need it. I also needed to change IQueryable to IEnumerable because I'm using EF and you can't query after you let go of the DbContext. So, here is the actual solution.
public IEnumerable<T> CurrentVersions
{
get => AllVersions
.Select(vo => new T[] { vo })
.Aggregate((voa1, voa2) => voa1[0].BusinessExpirationDate.Value == voa2[0].BusinessEffectiveDate.Value ? voa1.Concat(voa2).ToArray() : voa1);
}
LINQ queries are most effective when each item is processed in isolation. The standard LINQ operators do not work well when you need to relate items within the same collection, because that usually means processing the collection multiple times.
The MoreLINQ library helps provide additional operators to fill in some of those gaps. I'm not sure what operators it provides that could be used in this instance, but I know it has a Pairwise() method that combines the current and previous items in the iteration.
In general, for situations like this, if you needed to roll out your own, it would be far easier to write it using a generator to generate your sequence. Either as a general purpose extension method:
public static IEnumerable<TSource> WhereWithPrevious<TSource>(
this IEnumerable<TSource> source,
Func<TSource, TSource, bool> predicate)
{
using (var iter = source.GetEnumerator())
{
if (!iter.MoveNext())
yield break;
var previous = iter.Current;
while (iter.MoveNext())
{
var current = iter.Current;
if (predicate(current, previous))
yield return current;
// remember this item so the next iteration compares against it
previous = current;
}
}
}
or one specifically for the problem you're trying to solve.
public static IEnumerable<MyType> GetVersions(IEnumerable<MyType> source)
{
using (var iter = source.GetEnumerator())
{
if (!iter.MoveNext())
yield break;
var previous = iter.Current;
while (iter.MoveNext())
{
var current = iter.Current;
if (current.EffectiveDate == previous.ExpirationDate)
yield return current;
// remember this item so the next iteration compares against it
previous = current;
}
}
}
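For a quick check of how such a generator behaves, here is a self-contained sketch with sample data. The PlanVersion class and the dates are made up for illustration. This version stores the current item in previous after every step, so each item is compared with its immediate predecessor rather than with the first item of the sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type with the two dates used in the question.
public class PlanVersion
{
    public string Name { get; set; } = "";
    public DateTime EffectiveDate { get; set; }
    public DateTime ExpirationDate { get; set; }
}

public static class PreviousItemExtensions
{
    // Yields each item for which predicate(current, previous) holds.
    public static IEnumerable<TSource> WhereWithPrevious<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, bool> predicate)
    {
        using (var iter = source.GetEnumerator())
        {
            if (!iter.MoveNext())
                yield break;
            var previous = iter.Current;
            while (iter.MoveNext())
            {
                var current = iter.Current;
                if (predicate(current, previous))
                    yield return current;
                previous = current; // advance the "previous" pointer
            }
        }
    }
}

public static class WhereWithPreviousDemo
{
    public static List<string> ContiguousNames()
    {
        var versions = new List<PlanVersion>
        {
            new PlanVersion { Name = "A", EffectiveDate = new DateTime(2020, 1, 1), ExpirationDate = new DateTime(2020, 2, 1) },
            new PlanVersion { Name = "B", EffectiveDate = new DateTime(2020, 2, 1), ExpirationDate = new DateTime(2020, 3, 1) },
            new PlanVersion { Name = "C", EffectiveDate = new DateTime(2020, 3, 15), ExpirationDate = new DateTime(2020, 4, 1) },
            new PlanVersion { Name = "D", EffectiveDate = new DateTime(2020, 4, 1), ExpirationDate = new DateTime(2020, 5, 1) },
        };
        return versions
            .WhereWithPrevious((cur, prev) => cur.EffectiveDate == prev.ExpirationDate)
            .Select(v => v.Name)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ContiguousNames())); // B, D
    }
}
```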
An alternative approach, which is standard practice in other languages but terribly inefficient here, is to zip the collection with itself offset by one.
var query = Collection.Skip(1).Zip(Collection, (c, p) => (current:c,previous:p))
.Where(x => x.current.EffectiveDate == x.previous.ExpirationDate)
...;
And with all of that said, using any of these options will most likely make your query incompatible with query providers. It's not something you would want expressed as a single query anyway.

Generic expression for where clause - "The LINQ expression node type 'Invoke' is not supported in LINQ to Entities."

I am trying to write a really generic way to load EF entities in batches, using the Contains method to generate a SQL IN statement. I've got it working if I pass the entire expression in, but when I try to build the expression dynamically, I am getting a "The LINQ expression node type 'Invoke' is not supported in LINQ to Entities." So I know this means that EF thinks I'm calling an arbitrary method and it can't translate it into SQL, but I can't figure out how to get it to understand the underlying expression.
So if I do something like this (just showing the relevant snippets):
Function declaration:
public static List<T> Load<T>(IQueryable<T> entityQuery, int[] entityIds, Func<T, int> entityKey, int batchSize = 500, Func<T, bool> postFilter = null) where T : EntityObject
{
var retList = new List<T>();
// Append a where clause to the query passed in, that will use a Contains expression, which generates a SQL IN statement. So our SQL looks something like
// WHERE [ItemTypeId] IN (1921,1920,1922)
// See http://rogeralsing.com/2009/05/21/entity-framework-4-where-entity-id-in-array/ for details
Func<int[], Expression<Func<T, bool>>> containsExpression = (entityArray => (expr => entityArray.Contains(entityKey(expr))));
// Build a new query with the current batch of IDs to retrieve and add it to the list we are returning
newQuery = entityQuery.Where<T>(containsExpression(entityIds));
retList.AddRange(newQuery.ToList());
return retList;
}
Call function:
var entities = BatchEntity.Load<ItemType>(from eItemType in dal.Context.InstanceContainer.ItemTypes
select eItemType
, itemTypeData
, (ek => ek.ItemTypeId)
);
I get "The LINQ expression node type 'Invoke' is not supported in LINQ to Entities."
But if I change it to be this:
Function declaration:
public static List<T> Load<T>(IQueryable<T> entityQuery, int[] entityIds, Func<int[], Expression<Func<T, bool>>> containsExpression, int batchSize = 500, Func<T, bool> postFilter = null) where T : EntityObject
{
var retList = new List<T>();
// Build a new query with the current batch of IDs to retrieve and add it to the list we are returning
newQuery = entityQuery.Where<T>(containsExpression(entityIds));
retList.AddRange(newQuery.ToList());
return retList;
}
Call function:
var entities = BatchEntity.Load<ItemType>(from eItemType in dal.Context.InstanceContainer.ItemTypes
select eItemType
, itemTypeData
, (entityArray => (ek => entityArray.Contains(ek.ItemTypeId)))
);
It works fine. Is there any way I can make EF understand the more generic version?
The problem, as you describe, is that the entityKey function in the first example is opaque since it is of type Func rather than Expression. However, you can get the behavior you want by implementing a Compose() method to combine two expressions. I posted the code to implement compose in this question: use Expression<Func<T,X>> in Linq contains extension.
With Compose() implemented, your function can be implemented as below:
public static List<T> Load<T>(this IQueryable<T> entityQuery,
int[] entityIds,
// note that this is an expression now
Expression<Func<T, int>> entityKey,
int batchSize = 500,
Expression<Func<T, bool>> postFilter = null)
where T : EntityObject
{
Expression<Func<int, bool>> containsExpression = id => entityIds.Contains(id);
Expression<Func<T, bool>> whereInEntityIdsExpression = containsExpression.Compose(entityKey);
IQueryable<T> filteredById = entityQuery.Where(whereInEntityIdsExpression);
// if your post filter is compilable to SQL, you might as well do the filtering
// in the database
if (postFilter != null) { filteredById = filteredById.Where(postFilter); }
// finally, pull into memory
return filteredById.ToList();
}
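The Compose() code itself is not reproduced here, so the following is only a sketch of one common way to write such a method, not necessarily the code from the linked answer. It inlines the body of the inner expression wherever the outer expression uses its parameter, so the resulting tree contains no Invoke node. The ParameterReplacer class, the ItemType stand-in and the demo method are assumptions for illustration, and the demo runs against an in-memory list rather than EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class ExpressionComposition
{
    // Builds x => outer(inner(x)) as a single expression tree.
    public static Expression<Func<TIn, TOut>> Compose<TIn, TMid, TOut>(
        this Expression<Func<TMid, TOut>> outer,
        Expression<Func<TIn, TMid>> inner)
    {
        // Substitute inner's body for every use of outer's parameter.
        var body = new ParameterReplacer(outer.Parameters[0], inner.Body).Visit(outer.Body);
        return Expression.Lambda<Func<TIn, TOut>>(body, inner.Parameters);
    }

    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression _parameter;
        private readonly Expression _replacement;

        public ParameterReplacer(ParameterExpression parameter, Expression replacement)
        {
            _parameter = parameter;
            _replacement = replacement;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _parameter ? _replacement : base.VisitParameter(node);
        }
    }
}

// Hypothetical stand-in for the EF entity in the question.
public class ItemType
{
    public int ItemTypeId { get; set; }
    public string Name { get; set; } = "";
}

public static class ComposeDemo
{
    public static List<string> NamesWithIds(IQueryable<ItemType> query, List<int> ids)
    {
        Expression<Func<int, bool>> containsExpression = id => ids.Contains(id);
        Expression<Func<ItemType, int>> entityKey = item => item.ItemTypeId;
        // Equivalent to: item => ids.Contains(item.ItemTypeId)
        Expression<Func<ItemType, bool>> whereInIds = containsExpression.Compose(entityKey);
        return query.Where(whereInIds).Select(item => item.Name).ToList();
    }

    public static IQueryable<ItemType> SampleItems()
    {
        return new List<ItemType>
        {
            new ItemType { ItemTypeId = 1920, Name = "A" },
            new ItemType { ItemTypeId = 1921, Name = "B" },
            new ItemType { ItemTypeId = 1922, Name = "C" },
        }.AsQueryable();
    }

    public static void Main()
    {
        var names = NamesWithIds(SampleItems(), new List<int> { 1920, 1922 });
        Console.WriteLine(string.Join(", ", names)); // A, C
    }
}
```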

Generics and Database Access

I have the following method, to which I can pass a lambda expression to filter my results and a callback that works on the list of results. This is just one particular table in my system, and I will use this construct over and over. How can I build a generic method, say DBget, that takes a table as a parameter (an ADO.NET Data Services entity, to be fair) and a filter (a lambda expression)?
public void getServiceDevelopmentPlan(Expression<Func<tblServiceDevelopmentPlan, bool>> filter, Action<List<tblServiceDevelopmentPlan>> callback)
{
var query = from employerSector in sdContext.tblServiceDevelopmentPlan.Where(filter)
select employerSector;
var DSQuery = (DataServiceQuery<tblServiceDevelopmentPlan>)query;
DSQuery.BeginExecute(result =>
{
callback(DSQuery.EndExecute(result).ToList<tblServiceDevelopmentPlan>());
}, null);
}
My first bash at this is:
public delegate Action<List<Table>> DBAccess<Table>(Expression<Func<Table, bool>> filter);
If you are using Linq to Ado.NET Dataservices or WCF Dataservices, your model will build you a lot of typed. Generally though you will be selecting and filtering. You need the following, then all your methods are just candy over the top of this:
Query Type 1 - One Filter, returns a list:
public void makeQuery<T>(string entity, Expression<Func<T, bool>> filter, Action<List<T>> callback)
{
IQueryable<T> query = plussContext.CreateQuery<T>(entity).Where(filter);
var DSQuery = (DataServiceQuery<T>)query;
DSQuery.BeginExecute(result =>
{
callback(DSQuery.EndExecute(result).ToList<T>());
}, null);
}
Query Type 2 - One Filter, returns a single entity:
public void makeQuery<T>(string entity, Expression<Func<T, bool>> filter, Action<T> callback)
{
IQueryable<T> query = plussContext.CreateQuery<T>(entity).Where(filter);
var DSQuery = (DataServiceQuery<T>)query;
DSQuery.BeginExecute(result =>
{
callback(DSQuery.EndExecute(result).First<T>());
}, null);
}
What you need to do is overload these and swap out the single filter for a simple array of filters:
Expression<Func<T, bool>>[] filter
And repeat for single and list returns.
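One way to consume such an array is to chain a Where call per filter, which keeps only the items that satisfy all of them. The sketch below uses an in-memory sequence as a stand-in for the data service context; the FilterChaining class and ApplyFilters method are hypothetical names, and how a given query provider translates several chained Where calls is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class FilterChaining
{
    // Applies every filter in turn; an item must pass all of them.
    public static IQueryable<T> ApplyFilters<T>(
        IQueryable<T> source,
        params Expression<Func<T, bool>>[] filters)
    {
        foreach (var filter in filters)
            source = source.Where(filter);
        return source;
    }

    public static List<int> EvenNumbersAboveFour()
    {
        IQueryable<int> numbers = Enumerable.Range(1, 10).AsQueryable();
        return ApplyFilters(numbers, n => n % 2 == 0, n => n > 4).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", EvenNumbersAboveFour())); // 6, 8, 10
    }
}
```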
Bundle this into a singleton if you want one data context, or keep track of an array of contexts in some sort of hybrid factory/singleton. Let the constructor take a context, or use its own if none is supplied, and you are away.
I then use this on a big line but all in one place:
GenericQuery.Instance.Create().makeQuery<tblAgencyBranches>("tblAgencyBranches", f => f.tblAgencies.agencyID == _agency.agencyID, res => { AgenciesBranch.ItemsSource = res; });
This may look complicated but it hides a lot of async magic, and in certain instances can be called straight from the button handlers. Not so much a 3 tier system, but a huge time saver.
