LINQ groupby with condition dependent on other sequence elements - linq

I have an IEnumerable of type Transaction:
private class Transaction{
public string Debitor { get; set; }
public double Debit { get; set; }
public string Creditor { get; set; } }
Sample IEnumerable<Transaction>:
[Debitor] | [Debit] | [Creditor]
luca 10 alessio
giulia 12 alessio
alessio 7 luca
alessio 6 giulia
marco 5 giulia
alessio 3 marco
luca 1 alessio
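I would like to group together the Transactions that involve the same pair of people, regardless of direction (the Debitor of one is the Creditor of the other, or both match); every other pair goes in a separate group. For the previous example, I should get: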
Group 1:
luca 10 alessio
alessio 7 luca
luca 1 alessio
Group 2:
giulia 12 alessio
alessio 6 giulia
Group 3:
marco 5 giulia
Group 4:
alessio 3 marco
I have solved it using 2 for loops (one nested) over the same IEnumerable, performing the check and using separate lists for the output, but I wonder if there is a less clunky way of doing this using LINQ.
A similar question might be: LINQ Conditional Group. However, in this case the grouping condition depends on the other elements of the IEnumerable.

The simplest way would indeed be to create a combined key, such as in Anu's answer.
Another way to do that (not necessarily better, but avoids sub collections and string joining), is:
var groups = transactions.GroupBy(t=> t.Debitor.CompareTo(t.Creditor) > 0 ? (t.Debitor,t.Creditor) : (t.Creditor,t.Debitor));
NB: The above assumes you can use implicit tuple creation. If you have a C# version older than 7.0 and/or don't have the System.ValueTuple NuGet package installed, you can use:
var groups = transactions.GroupBy(t=> t.Debitor.CompareTo(t.Creditor) > 0 ? Tuple.Create(t.Debitor,t.Creditor) : Tuple.Create(t.Creditor,t.Debitor));
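To make the ordered-pair key concrete, here is a minimal sketch run against the sample data from the question. The transactions are modelled as tuples rather than the Transaction class so the snippet runs as a single script, and string.CompareOrdinal is my substitution for CompareTo to keep the ordering culture-independent; neither choice comes from the answer itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question, as (Debitor, Debit, Creditor) tuples.
var transactions = new List<(string Debitor, double Debit, string Creditor)>
{
    ("luca", 10.0, "alessio"),
    ("giulia", 12.0, "alessio"),
    ("alessio", 7.0, "luca"),
    ("alessio", 6.0, "giulia"),
    ("marco", 5.0, "giulia"),
    ("alessio", 3.0, "marco"),
    ("luca", 1.0, "alessio"),
};

// Put the two names in a fixed order so (a, b) and (b, a) yield the same key.
var groups = transactions
    .GroupBy(t => string.CompareOrdinal(t.Debitor, t.Creditor) > 0
        ? (t.Debitor, t.Creditor)
        : (t.Creditor, t.Debitor))
    .ToList();

foreach (var g in groups)
{
    Console.WriteLine($"{g.Key.Item1} <-> {g.Key.Item2}: {g.Count()} transaction(s)");
}
```

GroupBy on an in-memory sequence yields groups in the order their keys first appear, so the luca/alessio pair comes first here.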
Purely for the sake of mentioning it, another way is to use a custom equality comparer for the GroupBy. This might be overkill depending on your needs, collection size, need for reusability, etc., but to show the possibility all the same: first create a class (or implement it in Transaction directly):
class TransactionDebCredComparer : EqualityComparer<Transaction>
{
public override bool Equals(Transaction t1, Transaction t2) => (t1.Debitor == t2.Creditor && t2.Debitor == t1.Creditor) || (t1.Debitor == t2.Debitor && t2.Creditor == t1.Creditor);
public override int GetHashCode(Transaction t) => t.Debitor.GetHashCode() ^ t.Creditor.GetHashCode();
}
Then you can group your enumerable by using
var groups = transactions.GroupBy(t=>t, new TransactionDebCredComparer() );

For a simple solution, you could group using a key created from Creditor and Debitor. For example:
string CreateKey(params string[] names)=>string.Join(",",names.OrderBy(x => x));
var result = transactionCollection.GroupBy(x=> CreateKey(x.Debitor,x.Creditor));
Please note I have used a collection in CreateKey, in case you have more similar grouping factors, but you can write a simpler version for CreateKey if the condition is always involving Creditor and Debitor alone.

Related

How would I count these records using Linq to NHibernate (3.1)

I have an entity defined like so:
public class TestEntity : EntityBase
{
public string Source {get;set;}
public bool Suppressed {get;set;}
/* other stuff */
}
I want to show an HTML table that looks like:
Source Suppressed Not Suppressed
-------------------------------------------------
Source1 30 1225
Soure 7 573
My first attempt to query this was:
from e in _session.Query<TestEntity>()
group e by e.Source into g1
select new
{
Source = g1.Key,
Suppressed = g1.Sum(x=>x.Suppressed ? 1 : 0),
NotSuppressed = g1.Sum(x=>x.Suppressed ? 0 : 1),
}
But of course, Linq choked on the ternary expression when converting it to SQL. Any alternative ways to do this?
Edit: I tried Dmitry's suggestion, and it returns the same counts for both. The SQL generated by his suggestion is:
select
customer0_.SourceA as col_0_0_,
cast(count(*) as INT) as col_1_0_,
cast(count(*) as INT) as col_2_0_
from
dbo.Customers customer0_
group by
customer0_.SourceA
Which obviously isn't what I want...
What about doing something like g1.Count(x => x.Suppressed == true)?

prevent unnecessary cross joins in count query of generated sql code

I am using this query:
return from oi in NHibernateSession.Current.Query<BlaInteraction>()
select new BlaViewModel
{
...
NoPublications = oi.Publications.Count(),
...
};
BlaInteraction contains an IList of publications (i.e. entities). Determining the number of publications does not really require all the joins for a publication. Can I somehow prevent NHibernate from using joins in the generated SQL (e.g. using a projection)?
Thanks.
Christian
PS:
This is what NH produces (slightly adapted):
select cast(count(*) as INT) from RelationshipStatementPublications publicatio21_, Publication publicatio22_ inner join Statements publicatio22_1_ on publicatio22_.StatementId=publicatio22_1_.DBId where publicatio21_.StatementId = 22762181 and publicatio21_.PublicationId=publicatio22_.StatementId
This is what would be sufficient:
select cast(count(*) as INT) from RelationshipStatementPublications publicatio21_ where publicatio21_.StatementId = 22762181
Why can't you just create another query?
Session.QueryOver<Publication>().Where(x => x.BlaInteractionId == idSentAsParameter).Select(Projections.RowCount()).SingleOrDefault<int>();
I think this will work:
return from oi in NHibernateSession.Current.Query<BlaInteraction>()
select new BlaViewModel
{
...
NoPublications = Session.QueryOver<Publication>().Where(x => x.BlaInteractionId == oi.Id).Select(Projections.RowCount()).SingleOrDefault<int>(),
...
};
Another edit, have you tried lazy="extra" ?
Ok the best solution I have found so far is to use a FNH Formula:
mapping.Map(x => x.NOPublications).Formula("(select count(distinct RelationshipStatementPublications.PublicationId) from RelationshipStatementPublications where RelationshipStatementPublications.StatementId = DBId)");
public virtual int NOPublications {get; private set;}
when I map from the domain to the view model I use:
NoPublications = oi.NOPublications,
Christian

Linq Contains and Distinct

I have the following 3 tables with their fields
Books(Id_Book | Title | Year)
Book_Themes (Id | Id_Book| Id_Theme)
Themes (Id_Theme| Title)
I also have a Guid array with Id_Themes
Guid [] themesArray = new Guid []{new Guid("6236c491-b4ae-4a2f-819e-06a38bf2cf41"), new Guid("06586887-7e3f-4f0a-bb17-40c86bfa76ce")};
I'm trying to get all Books containing any of the Theme_Ids from the themesArray
This is what I have so far, which is not working. I'm not sure how to use Contains in this scenario.
int index = 1; int size= 10;
var books = (from book in DB.Books
join bookWThemes in DB.Book_Themes
on book.Id_Book equals bookWThemes.Id_Book
where themesArray.Contains(bookWThemes.Id_Theme)
orderby book.Year
select book)
.Skip((index - 1) * size)
.Take(size);
I'm getting an error on themesArray.Contains(bookWThemes.Id_Theme): System.Guid[] does not contain a definition for Contains. Also I'm not sure where to put the Distinct
****UPDATE****
I noticed that my model had Id_Theme as nullable: I had changed the DB and didn't reflect the change in my model. So, to answer the question: if the column is nullable, just change the Contains line to themesArray.Contains(bookWThemes.Id_Theme.Value), and with this change it works.
Thanks for all the help!.
It's strange that your LINQ query is breaking down on .Contains. All three forms below, on List<> and on IEnumerable<>, work for me.
[Test]
public void Test43()
{
var a = new List<Guid>(){new Guid(),new Guid(),new Guid()};
a.Contains(new Guid()); // works okay
var b = (IEnumerable<Guid>)a;
b.Contains<Guid>(new Guid()); // works okay
b.Contains(new Guid()); // works okay
}
For the "distinct" question, put the call here:
select book)
.Distinct() // <--
.Skip((index - 1) * size)
Try converting the Guid[] to a List<Guid>, and then you can use Contains on it.
where themesArray.ToList().Contains(bookWThemes.Id_Theme)
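The nullable fix from the question's update, together with the Distinct placement, can be sketched in memory as follows. The rows below are hypothetical stand-ins for the Books and Book_Themes tables (tuples instead of entity classes), and the HasValue guard is my addition so that rows without a theme are skipped rather than throwing. This only shows the LINQ-to-objects behaviour; how a particular provider translates it to SQL is not verified here.

```csharp
using System;
using System.Linq;

var themeA = new Guid("6236c491-b4ae-4a2f-819e-06a38bf2cf41");
var themeB = new Guid("06586887-7e3f-4f0a-bb17-40c86bfa76ce");
var otherTheme = Guid.NewGuid();
Guid[] themesArray = { themeA, themeB };

// Hypothetical in-memory rows standing in for the Books table.
var books = new[]
{
    (Id_Book: 1, Title: "First", Year: 2001),
    (Id_Book: 2, Title: "Second", Year: 1999),
    (Id_Book: 3, Title: "Third", Year: 2010),
};

// Hypothetical link rows; Id_Theme is nullable, as in the question's model.
var bookThemes = new (int Id_Book, Guid? Id_Theme)[]
{
    (1, themeA),
    (1, themeB),     // book 1 matches twice, so Distinct matters
    (2, themeB),
    (3, otherTheme),
    (3, null),
};

int index = 1, size = 10;

var page = (from book in books
            join bt in bookThemes on book.Id_Book equals bt.Id_Book
            // Guid[].Contains expects a Guid, so unwrap the Guid? first.
            where bt.Id_Theme.HasValue && themesArray.Contains(bt.Id_Theme.Value)
            select book)
           .Distinct()
           .OrderBy(b => b.Year)
           .Skip((index - 1) * size)
           .Take(size)
           .ToList();

foreach (var b in page)
{
    Console.WriteLine($"{b.Year}: {b.Title}");
}
```

The ordering is applied after Distinct in this sketch because some providers may not preserve an earlier orderby through Distinct, which would make Skip/Take paging unstable.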

How can I pass a Predicate<T> to the Where() method in Linq to SQL?

I have the following model:
+--------+
| Folder |
+--------+
| 1
|
| *
+----------+ +---------+
| WorkItem |---------| Project |
+----------+ * 1 +---------+
I need to retrieve a list of Folders with the current count of WorkItems.
If I have specified a Project, then I only want the count of WorkItems in each Folder that are associated with the specified Project.
If no Project is specified, it should return the total WorkItem count.
I have the following Linq to SQL code:
public interface IWorkItemCriteria {
int? ProjectId { get; }
}
public static IQueryable<Folder> GetFoldersWithItemCounts(IWorkItemCriteria criteria) {
var results = from folder in dataContext.Folders
select new {
Folder = folder,
Count = folder.WorkItems.Count()
};
return(results);
}
The problem is - I want to filter the work items that are being counted.
This works:
var results = from folder in dataContext.Folders
select new {
Folder = folder,
Count = folder.WorkItems.Where(item => item.ProjectId == criteria.ProjectId).Count()
};
but I can't get it to use any kind of dynamic predicate / expression. The syntax I'm trying to use is:
var results = from folder in dataContext.Folders
select new {
Folder = folder,
Count = folder.WorkItems.Where(filter).Count()
};
I've tried
Predicate<WorkItem> filter = (item => item.ProjectId == criteria.ProjectId);
and
Expression<Func<WorkItem, bool>> filter = (item => item.ProjectId == criteria.ProjectId)
neither of which will compile - gives The type arguments for method 'System.Linq.Enumerable.Where<TSource> (System.Collections.Generic.IEnumerable<TSource>, System.Func<TSource,bool>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
I've tried
Func<WorkItem, bool> filter = (item => item.ProjectId == criteria.ProjectId);
which builds, but then fails with 'Unsupported overload used for query operator 'Where'.'
I'm going to need to add additional properties to the IWorkItemCriteria interface, so the ability to dynamically construct a predicate that can be cleanly translated by Linq to SQL is pretty important.
Any ideas?
I think your problem is that you'd like to reference one expression (filter) from within a second expression (the one being passed to results.Where.Select()), and that the LINQ to SQL IQueryable provider doesn't know how to pull in the value of that.
What you're going to have to do is compose the Select expression from scratch in order to inline the expression that's your dynamic filter. This might be extra hard with anonymous types; I'm not sure. I just found the following article, which seems to explain this pretty well: http://www.codeproject.com/KB/linq/rewrite_linq_expressions2.aspx
Something I grabbed off of MSDN: the idea is that you just point it to an existing function:
using System;
public class GenericFunc
{
public static void Main()
{
Func<string, string> convertMethod = UppercaseString;
}
private static string UppercaseString(string inputString)
{
return inputString.ToUpper();
}
}
I'm sure there are also other ways. I remember doing something like
new Func<WorkItem, bool>((item => item.ProjectId == criteria.ProjectId))
but I'm not sure anymore. I'm quite sure you need the "new" though. Hope this helps :)
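The compile error quoted in the question names System.Linq.Enumerable.Where, which takes a Func rather than an Expression; that is why the Expression-typed filter cannot be passed to folder.WorkItems.Where. One way around this, sketched below as an assumption rather than a verified LINQ to SQL recipe, is to apply the expression to the work items as an IQueryable first and then count per folder. The WorkItem class, its FolderId property, and the in-memory list are hypothetical stand-ins; projectId plays the role of criteria.ProjectId.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// In-memory stand-in for the work item table, exposed as IQueryable.
var workItems = new List<WorkItem>
{
    new WorkItem { FolderId = 1, ProjectId = 10 },
    new WorkItem { FolderId = 1, ProjectId = 20 },
    new WorkItem { FolderId = 2, ProjectId = 10 },
    new WorkItem { FolderId = 2, ProjectId = 10 },
}.AsQueryable();

int? projectId = 10; // plays the role of criteria.ProjectId

// Build the predicate as an expression tree; with no project it matches everything.
Expression<Func<WorkItem, bool>> filter = item => true;
if (projectId.HasValue)
{
    int id = projectId.Value;
    filter = item => item.ProjectId == id;
}

// Queryable.Where accepts the expression directly: filter first, then count per folder.
var counts = workItems
    .Where(filter)
    .GroupBy(item => item.FolderId)
    .Select(g => new { FolderId = g.Key, Count = g.Count() })
    .OrderBy(x => x.FolderId)
    .ToList();

foreach (var c in counts)
{
    Console.WriteLine($"Folder {c.FolderId}: {c.Count}");
}

// Hypothetical entity shape; only the two properties used above are modelled.
class WorkItem
{
    public int FolderId { get; set; }
    public int ProjectId { get; set; }
}
```

Note that folders with no matching work items do not appear in this result, so a left join back to the folder list would be needed to report zero counts.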

Can I use LINQ to retrieve only "on change" values?

What I'd like to be able to do is construct a LINQ query that retrieved me a few values from some DataRows when one of the fields changes. Here's a contrived example to illustrate:
Observation Temp Time
------------- ---- ------
Cloudy 15.0 3:00PM
Cloudy 16.5 4:00PM
Sunny 19.0 3:30PM
Sunny 19.5 3:15PM
Sunny 18.5 3:30PM
Partly Cloudy 16.5 3:20PM
Partly Cloudy 16.0 3:25PM
Cloudy 16.0 4:00PM
Sunny 17.5 3:45PM
I'd like to retrieve only the entries when the Observation changed from the previous one. So the results would include:
Cloudy 15.0 3:00PM
Sunny 19.0 3:30PM
Partly Cloudy 16.5 3:20PM
Cloudy 16.0 4:00PM
Sunny 17.5 3:45PM
Currently there is code that iterates through the DataRows and does the comparisons and construction of the results but was hoping to use LINQ to accomplish this.
What I'd like to do is something like this:
var weatherStuff = from row in ds.Tables[0].AsEnumerable()
where row.Field<string>("Observation") != weatherStuff.ElementAt(weatherStuff.Count() - 1) )
select row;
But that doesn't work - and doesn't compile since this tries to use the variable 'weatherStuff' before it is declared.
Can what I want to do be done with LINQ? I didn't see another question like it here on SO, but could have missed it.
Here is one more general thought that may be interesting. It's more complicated than what @tvanfosson posted, but in a way, it's more elegant I think :-). The operation you want to do is to group your observations using the first field, but you want to start a new group each time the value changes. Then you want to select the first element of each group.
This sounds almost like LINQ's group by, but it is a bit different, so you can't really use the standard group by. However, you can write your own version (that's the wonder of LINQ!). You can either write your own extension method (e.g. GroupByMoving) or you can write an extension method that changes the type from IEnumerable to your own interface and then define GroupBy for this interface. The resulting query will look like this:
var weatherStuff =
from row in ds.Tables[0].AsEnumerable().AsMoving()
group row by row.Field<string>("Observation") into g
select g.First();
The only thing that remains is to define AsMoving and implement GroupBy. This is a bit of work, but it is a quite generally useful thing and it can be used to solve other problems too, so it may be worth doing :-). The summary of my post is that the great thing about LINQ is that you can customize how the operators behave to get quite elegant code.
I haven't tested it, but the implementation should look like this:
using System;
using System.Collections;
using System.Collections.Generic;

// Interface & simple implementation so that we can change GroupBy
interface IMoving<T> : IEnumerable<T> { }
class WrappedMoving<T> : IMoving<T> {
    public IEnumerable<T> Wrapped { get; set; }
    public IEnumerator<T> GetEnumerator() {
        return Wrapped.GetEnumerator();
    }
    // Non-generic enumerator, implemented explicitly so the two
    // GetEnumerator methods don't clash
    IEnumerator IEnumerable.GetEnumerator() {
        return ((IEnumerable)Wrapped).GetEnumerator();
    }
}
// Important bits:
static class MovingExtensions {
    public static IMoving<T> AsMoving<T>(this IEnumerable<T> e) {
        return new WrappedMoving<T> { Wrapped = e };
    }
    // This is (an ugly & imperative) implementation of the
    // group by as described earlier (you can probably implement it
    // more nicely using other LINQ methods). It is declared on IMoving<T>
    // so that it is preferred over Enumerable.GroupBy for wrapped sequences.
    public static IEnumerable<IEnumerable<T>> GroupBy<T, K>(this IMoving<T> source,
        Func<T, K> keySelector) {
        List<T> elementsSoFar = new List<T>();
        using (IEnumerator<T> en = source.GetEnumerator()) {
            if (en.MoveNext()) {
                K lastKey = keySelector(en.Current);
                do {
                    K newKey = keySelector(en.Current);
                    // != is not available for an unconstrained generic K
                    if (!EqualityComparer<K>.Default.Equals(newKey, lastKey)) {
                        yield return elementsSoFar;
                        elementsSoFar = new List<T>();
                        lastKey = newKey; // remember the key of the new run
                    }
                    elementsSoFar.Add(en.Current);
                } while (en.MoveNext());
                yield return elementsSoFar;
            }
        }
    }
}
You could use the IEnumerable extension that takes an index.
var all = ds.Tables[0].AsEnumerable();
var weatherStuff = all.Where( (w,i) => i == 0 || w.Field<string>("Observation") != all.ElementAt(i-1).Field<string>("Observation") );
This is one of those instances where the iterative solution is actually better than the set-based solution in terms of both readability and performance. All you really want Linq to do is filter and pre-sort the list if necessary to prepare it for the loop.
It is possible to write a query in SQL Server (or various other databases) using windowing functions (ROW_NUMBER), if that's where your data is coming from, but very difficult to do in pure Linq without making a much bigger mess.
If you're just trying to clean the code up, an extension method might help:
// Declare this inside a static class; note the <T> on the method name.
public static IEnumerable<T> Changed<T>(this IEnumerable<T> items,
Func<T, T, bool> equalityFunc)
{
if (equalityFunc == null)
{
throw new ArgumentNullException("equalityFunc");
}
T last = default(T);
bool first = true;
foreach (T current in items)
{
if (first || !equalityFunc(current, last))
{
yield return current;
}
last = current;
first = false;
}
}
Then you can call this with:
var changed = rows.Changed((r1, r2) =>
r1.Field<string>("Observation") == r2.Field<string>("Observation"));
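As a quick check of the idea, here is a minimal sketch that runs the same logic over the sample data from the question. It uses tuples in place of DataRows and a local function in place of the extension method, purely so the snippet runs as a single script; both substitutions are mine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample observations from the question, as tuples instead of DataRows.
var rows = new List<(string Observation, double Temp, string Time)>
{
    ("Cloudy", 15.0, "3:00PM"),
    ("Cloudy", 16.5, "4:00PM"),
    ("Sunny", 19.0, "3:30PM"),
    ("Sunny", 19.5, "3:15PM"),
    ("Sunny", 18.5, "3:30PM"),
    ("Partly Cloudy", 16.5, "3:20PM"),
    ("Partly Cloudy", 16.0, "3:25PM"),
    ("Cloudy", 16.0, "4:00PM"),
    ("Sunny", 17.5, "3:45PM"),
};

var changed = Changed(rows, (a, b) => a.Observation == b.Observation).ToList();

foreach (var r in changed)
{
    Console.WriteLine($"{r.Observation} {r.Temp} {r.Time}");
}

// Local-function version of the Changed extension method: yields the first
// item, then every item that differs from its immediate predecessor.
static IEnumerable<T> Changed<T>(IEnumerable<T> items, Func<T, T, bool> equalityFunc)
{
    T last = default(T);
    bool first = true;
    foreach (T current in items)
    {
        if (first || !equalityFunc(current, last))
        {
            yield return current;
        }
        last = current;
        first = false;
    }
}
```

Because the comparison is only ever made against the previous element, the sequence is enumerated once, unlike the ElementAt(i-1) approach, which may re-walk the source for every row.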
I think what you are trying to accomplish is not possible using the query "syntactic sugar". However, it could be possible using the overload of the Select extension method that passes the index of the item you are evaluating. You could then use the index to compare the current item with the previous one (index - 1).
You could use MoreLINQ's GroupAdjacent() extension method
GroupAdjacent: Groups the adjacent elements of a sequence according to
a specified key selector function...This method has 4 overloads.
You would use it like this, with the result selector overload to lose the IGrouping key:
var weatherStuff = ds.Tables[0].AsEnumerable().GroupAdjacent(w => w.Field<string>("Observation"), (_, val) => val.Select(v => v));
This is a very popular extension to the default LINQ methods, with more than 1M downloads on NuGet (compared to MS's own Ix.NET with ~40k downloads at time of writing).

Resources