How to merge a collection of collections in LINQ

I would like to be able to flatten an IEnumerable<IEnumerable<T>> into an IEnumerable<T> (i.e. merge all the individual collections into one). The Union operator only applies to two collections. Any ideas?

Try
var it = GetTheNestedCase();
return it.SelectMany(x => x);
SelectMany is a LINQ projection that essentially says "for each item in a collection, return the elements of a collection". It turns one element into many (hence SelectMany), which makes it well suited to flattening collections of collections into a single sequence.

var lists = GetTheNestedCase();
return
from list in lists
from element in list
select element;
is another way of doing this using C# 3.0 query expression syntax.
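For reference, here is a minimal, self-contained sketch of the SelectMany approach. The class name FlattenDemo, the Flatten helper, and the sample data are made up for illustration and stand in for GetTheNestedCase(), which the original does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Flattens a sequence of sequences into one sequence, preserving order.
    public static IEnumerable<T> Flatten<T>(IEnumerable<IEnumerable<T>> nested)
    {
        return nested.SelectMany(inner => inner);
    }

    public static void Main()
    {
        // Hypothetical sample data.
        var nested = new List<List<int>>
        {
            new List<int> { 1, 2 },
            new List<int>(),            // empty inner collections contribute nothing
            new List<int> { 3, 4, 5 }
        };

        Console.WriteLine(string.Join(", ", Flatten<int>(nested)));  // prints "1, 2, 3, 4, 5"
    }
}
```

Like most LINQ operators, SelectMany is lazily evaluated, so nothing is enumerated until the result is consumed (here by string.Join).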

Related

When to prefer joins expressed with SelectMany() over joins expressed with the join keyword in Linq

LINQ lets you express inner joins either with the join keyword or with
SelectMany() (i.e. a couple of from clauses) plus a where clause:
var personsToState = from person in persons
join state in statesOfUS
on person.State equals state.USPS
select new { person, State = state.Name };
foreach (var item in personsToState)
{
System.Diagnostics.Debug.WriteLine(item);
}
// The same query can be expressed with the query operator SelectMany(), which is
// expressed as two from clauses and a single where clause connecting the sequences.
var personsToState2 = from person in persons
from state in statesOfUS
where person.State == state.USPS
select new { person, State = state.Name };
foreach (var item in personsToState2)
{
System.Diagnostics.Debug.WriteLine(item);
}
My question: when does it make sense to use the join style and when the where style?
Does one style have a performance advantage over the other?
For local queries, Join is more efficient because of its keyed lookup, as Athari mentioned. For LINQ to SQL (L2S), however, you'll get more mileage out of SelectMany. In L2S, a SelectMany is ultimately translated into some type of SQL join, depending on your query.
Take a look at questions 11 & 12 of the LINQ Quiz by Joseph/Ben Albahari, authors of C# 4.0 In a Nutshell. They show samples of different types of joins and they state:
With LINQ to SQL, SelectMany-based
joins are the most flexible, and can
perform both equi and non-equi joins.
Throw in DefaultIfEmpty, and you can
do left outer joins as well!
In addition, Matt Warren has a detailed blog post on this topic as it pertains to IQueryable / SQL here: LINQ: Building an IQueryable provider - Part VII.
Back to your question of which to use, you should use whichever query is more readable and allows you to easily express yourself and construct your end goal clearly. Performance shouldn't be an initial concern unless you are dealing with large collections and have profiled both approaches. In L2S you have to consider the flexibility SelectMany offers you depending on the way you need to pair up your data.
Join is more efficient: it uses the Lookup class (a variation of Dictionary that holds multiple values per key) to find matching values.
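To make the comparison concrete, the following sketch runs both styles against in-memory data (LINQ to Objects only; it says nothing about the SQL that L2S would generate). It also shows the SelectMany plus DefaultIfEmpty form of a left outer join mentioned in the quote. The Person and UsState classes, the sample rows, and the method names are assumptions, since the original post does not show its types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinStylesDemo
{
    // Hypothetical sample types; the original post does not show its classes.
    public sealed class Person
    {
        public string Name { get; set; } = "";
        public string State { get; set; } = "";
    }

    public sealed class UsState
    {
        public string USPS { get; set; } = "";
        public string Name { get; set; } = "";
    }

    // join keyword: the inner sequence is indexed by key once, then probed per person.
    public static List<string> WithJoin(IEnumerable<Person> persons, IEnumerable<UsState> states)
    {
        return (from person in persons
                join state in states on person.State equals state.USPS
                select person.Name + " -> " + state.Name).ToList();
    }

    // Two from clauses plus where: every person is compared with every state.
    public static List<string> WithSelectMany(IEnumerable<Person> persons, IEnumerable<UsState> states)
    {
        return (from person in persons
                from state in states
                where person.State == state.USPS
                select person.Name + " -> " + state.Name).ToList();
    }

    // Left outer join: DefaultIfEmpty yields a single null when no state matches.
    public static List<string> LeftOuter(IEnumerable<Person> persons, IEnumerable<UsState> states)
    {
        return (from person in persons
                from state in states.Where(s => s.USPS == person.State).DefaultIfEmpty()
                select person.Name + " -> " + (state == null ? "(no match)" : state.Name)).ToList();
    }

    public static void Main()
    {
        var persons = new[]
        {
            new Person { Name = "Ann", State = "NY" },
            new Person { Name = "Bob", State = "CA" },
            new Person { Name = "Cy", State = "ZZ" }   // no matching state
        };
        var states = new[]
        {
            new UsState { USPS = "NY", Name = "New York" },
            new UsState { USPS = "CA", Name = "California" }
        };

        Console.WriteLine(string.Join("; ", WithJoin(persons, states)));
        Console.WriteLine(string.Join("; ", WithSelectMany(persons, states)));
        Console.WriteLine(string.Join("; ", LeftOuter(persons, states)));
    }
}
```

The first two methods return the same pairs; the difference for local queries is only in how much work is done to find them.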

DynamicObject LINQ query doesn't work with a custom class

DynamicObject LINQ query with the List compiles fine:
List<string> list = new List<string>();
var query = (from dynamic d in list where d.FirstName == "John" select d);
With our own custom class, which we use for "usual" LINQ queries, the compiler reports the error "An expression tree may not contain a dynamic operation":
DBclass db = new DBclass();
var query = (from dynamic d in db where d.FirstName == "John" select d);
What shall we add to handle DynamicObject LINQ?
Does DBClass implement IEnumerable? Perhaps there is a method on it you should be calling to return an IEnumerable collection?
You could add a type, against which to write the query.
I believe your problem is that in the first expression, where you are using the List<>, everything is done in memory using IEnumerable and LINQ to Objects.
Apparently, your DBClass is an IQueryable using LINQ to SQL. IQueryables use an expression tree to build an SQL statement to send to the database.
In other words, despite looking much alike, the two statements do radically different things, one of which is allowed and one of which isn't. (Much in the way var y = x * 5; will either succeed or fail depending on whether x is an int or a string.)
Further, your first example may compile, but as far as I can tell it will fail when you run it, because string has no FirstName member. That's not a particularly good benchmark for success.
The only way I see this working is if the query using dynamic is made on IEnumerables using LINQ to Objects. (Load the full table into a List, and then query the list.)
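As a rough illustration of that last point, the sketch below runs a dynamic query over an in-memory list, where the where clause becomes an ordinary delegate rather than an expression tree. The ExpandoObject rows, the MakeRow helper, and the class name are assumptions standing in for data loaded from DBclass, which the question does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;

public static class DynamicQueryDemo
{
    // Hypothetical helper: builds a row whose members are resolved at run time.
    public static object MakeRow(string firstName, string lastName)
    {
        IDictionary<string, object> row = new ExpandoObject();
        row["FirstName"] = firstName;
        row["LastName"] = lastName;
        return row;
    }

    // LINQ to Objects: the lambda compiles to a delegate, so dynamic operations are allowed.
    // A row without a FirstName member would throw a RuntimeBinderException when enumerated.
    public static List<dynamic> FindByFirstName(IEnumerable<object> rows, string firstName)
    {
        var query = from dynamic d in rows
                    where d.FirstName == firstName
                    select d;
        return query.ToList();
    }

    public static void Main()
    {
        var rows = new List<object>
        {
            MakeRow("John", "Smith"),
            MakeRow("Jane", "Doe")
        };

        foreach (var match in FindByFirstName(rows, "John"))
        {
            string lastName = match.LastName;
            Console.WriteLine(lastName);  // prints "Smith"
        }
    }
}
```

The trade-off is the one described above: the filtering happens in memory after the rows are loaded, not in the database.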

LINQ to SQL many to many int ID array criteria query

Ok this should be really simple, but I am doing my head in here and have read all the articles on this and tried a variety of things, but no luck.
I have 3 tables in a classic many-to-many setup.
ITEMS
ItemID
Description
ITEMFEATURES
ItemID
FeatureID
FEATURES
FeatureID
Description
Now I have a search interface where you can select any number of Features (checkboxes).
I get them all nicely as an int[] called SearchFeatures.
I simply want to find the Items which have the Features that are contained in the SearchFeatures.
E.g. something like:
return db.Items.Where(x => SearchFeatures.Contains(x.ItemFeatures.AllFeatures().FeatureID))
Inside my Items partial class I have added a custom method Features() which simply returns all Features for that Item, but I still can't seem to integrate that in any usable way into the main LINQ query.
Grr, it's gotta be simple, such a 1 second task in SQL. Many thanks.
The following query will return the list of items based on the list of searchFeatures:
from itemFeature in db.ItemFeatures
where searchFeatures.Contains(itemFeature.FeatureID)
select itemFeature.Item;
The trick here is to start with the ItemFeatures table.
It is also possible to search for items that have ALL of the features, as you asked in the comments. The trick here is to build up the query dynamically, adding one filter per feature. Each filter has to be applied to the item rather than to a single ItemFeatures row, because one row can only ever match one FeatureID:
var items = db.Items.AsQueryable();
foreach (var temp in searchFeatures)
{
// You will need this extra variable: before C# 5 the foreach variable is shared
// across iterations, so each lambda must capture its own copy.
var searchFeature = temp;
// Wrap the query with one more filter
items =
from item in items
where item.ItemFeatures.Any(itemFeature => itemFeature.FeatureID == searchFeature)
select item;
}
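The difference between "any of the features" and "all of the features" can be checked in memory with a small sketch. It uses a different technique from the answer: it groups the join-table rows by item and counts the distinct matching features. The ItemFeatureRow class, the sample rows, and the method names are hypothetical, and this runs on LINQ to Objects, so it makes no claim about how LINQ to SQL would translate the same query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FeatureSearchDemo
{
    // Hypothetical stand-in for a row of the ITEMFEATURES join table.
    public sealed class ItemFeatureRow
    {
        public int ItemID { get; set; }
        public int FeatureID { get; set; }
    }

    // Items that have at least one of the requested features.
    public static List<int> ItemsWithAnyFeature(IEnumerable<ItemFeatureRow> rows, int[] searchFeatures)
    {
        return rows
            .Where(row => searchFeatures.Contains(row.FeatureID))
            .Select(row => row.ItemID)
            .Distinct()
            .ToList();
    }

    // Items that have every requested feature: group the matching rows per item
    // and keep the groups that cover all distinct requested feature IDs.
    // Note: an empty searchFeatures array yields no items here.
    public static List<int> ItemsWithAllFeatures(IEnumerable<ItemFeatureRow> rows, int[] searchFeatures)
    {
        var wanted = searchFeatures.Distinct().ToArray();
        return rows
            .Where(row => wanted.Contains(row.FeatureID))
            .GroupBy(row => row.ItemID)
            .Where(g => g.Select(row => row.FeatureID).Distinct().Count() == wanted.Length)
            .Select(g => g.Key)
            .ToList();
    }

    public static void Main()
    {
        var rows = new[]
        {
            new ItemFeatureRow { ItemID = 1, FeatureID = 10 },
            new ItemFeatureRow { ItemID = 1, FeatureID = 20 },
            new ItemFeatureRow { ItemID = 2, FeatureID = 10 },
            new ItemFeatureRow { ItemID = 3, FeatureID = 20 },
            new ItemFeatureRow { ItemID = 3, FeatureID = 10 },
            new ItemFeatureRow { ItemID = 3, FeatureID = 30 }
        };
        var search = new[] { 10, 20 };

        Console.WriteLine(string.Join(", ", ItemsWithAnyFeature(rows, search)));  // prints "1, 2, 3"
        Console.WriteLine(string.Join(", ", ItemsWithAllFeatures(rows, search))); // prints "1, 3"
    }
}
```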

Using LINQ Expression Instead of NHIbernate.Criterion

If I were to select some rows based on certain criteria I can use ICriterion object in NHibernate.Criterion, such as this:
public List<T> GetByCriteria()
{
SimpleExpression newJobCriterion =
NHibernate.Criterion.Expression.Eq("LkpStatu", statusObject);
ICriteria criteria = Session.GetISession().CreateCriteria(typeof(T)).SetMaxResults(maxResults);
criteria.Add(newJobCriterion );
return criteria.List<T>();
}
Or I can use LINQ's where clause to filter what I want:
public List<T> GetByCriteria_LINQ()
{
ICriteria criteria = Session.GetISession().CreateCriteria(typeof(T)).SetMaxResults(maxResults);
return criteria.Where(item => item.LkpStatu == statusObject).ToList();
}
I would prefer the second one, of course, because:
It gives me strong typing
I don't need to learn yet-another-syntax in the form of NHibernate
The question is whether the first one has any performance advantage over the second. From what I know, the first one generates a SQL query, so the data is filtered before it is loaded into memory. Is this performance saving big enough to justify its use?
As usual, it depends. First, note that in your second snippet .List() is missing right after return criteria. Also note that the two examples won't give the same results: the first one applies the where and then returns the top maxResults, whereas the second one first selects the top maxResults and then applies the where.
If your expected result set is relatively small and you are likely to use some of the results in lazy loads, then it's actually better to take the second approach, because all entities loaded through a session stay in its first-level cache.
Usually, however, you don't do it this way and use the first approach instead.
Perhaps you wanted to use NHibernate.Linq (located in the Contrib project), which translates LINQ to Criteria for you.
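The ordering difference described above (filter then limit, versus limit then filter) can be reproduced without NHibernate. This sketch uses plain LINQ to Objects with made-up data and method names, so it only illustrates why the two snippets return different rows, not how NHibernate executes them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LimitOrderDemo
{
    // Mirrors the Criteria version: apply the condition, then keep the top maxResults.
    public static List<int> FilterThenLimit(IEnumerable<int> source, Func<int, bool> predicate, int maxResults)
    {
        return source.Where(predicate).Take(maxResults).ToList();
    }

    // Mirrors the second snippet: load the top maxResults, then filter in memory.
    public static List<int> LimitThenFilter(IEnumerable<int> source, Func<int, bool> predicate, int maxResults)
    {
        return source.Take(maxResults).Where(predicate).ToList();
    }

    public static void Main()
    {
        var source = Enumerable.Range(1, 10);          // 1..10
        Func<int, bool> isEven = n => n % 2 == 0;

        Console.WriteLine(string.Join(", ", FilterThenLimit(source, isEven, 3)));  // prints "2, 4, 6"
        Console.WriteLine(string.Join(", ", LimitThenFilter(source, isEven, 3)));  // prints "2"
    }
}
```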
I combined the two and came up with this:
var crit = _session.CreateCriteria(typeof (T)).SetMaxResults(100);
return (from x in _session.Linq<T>(crit) where x.field == <something> select x).ToList();

Checking for duplicates in a complex object using Linq or Lambda expression

I've just started learning linq and lambda expressions, and they seem to be a good fit for finding duplicates in a complex object collection, but I'm getting a little confused and hope someone can help put me back on the path to happy coding.
My object is structured like list.list.uniqueCustomerIdentifier
I need to ensure there are no duplicate uniqueCustomerIdentifier values within the entire complex object. If there are duplicates, I need to identify which are duplicated and return a list of the duplicates.
Unpack the hierarchy
Project each element to its uniqueID property
Group these ID's up
Filter the groups by groups that have more than 1 element
Project each group to the group's key (back to uniqueID)
Enumerate the query and store the result in a list.
var result =
myList
.SelectMany(x => x.InnerList)
.Select(y => y.uniqueCustomerIdentifier)
.GroupBy(id => id)
.Where(g => g.Skip(1).Any())
.Select(g => g.Key)
.ToList();
There is a LINQ operator, Distinct(), that filters a sequence down to its distinct elements, which is useful if you only want the unique IDs. If your class overrides Equals (and GetHashCode), or you have an IEqualityComparer, you can call the Distinct extension method directly to return the unique results from the list. As an added bonus, you can also use the Union and Intersect methods to merge or filter between two lists.
Another option would be to group by the id and then select the first element.
var results = from item in list
group item by item.id into g
select g.First();
If you want to flatten the two list hierarchies, use the SelectMany method to flatten an IEnumerable<IEnumerable<T>> into IEnumerable<T>.
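Putting the flatten, group, and filter steps together, a runnable sketch of the duplicate check might look like the following. The Batch and Customer classes and the sample identifiers are assumptions; the question only describes the shape list.list.uniqueCustomerIdentifier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCheckDemo
{
    // Hypothetical types matching the list.list.uniqueCustomerIdentifier shape.
    public sealed class Customer
    {
        public string UniqueCustomerIdentifier { get; set; } = "";
    }

    public sealed class Batch
    {
        public List<Customer> InnerList { get; set; } = new List<Customer>();
    }

    // Returns each identifier that occurs more than once across all inner lists.
    public static List<string> FindDuplicateIds(IEnumerable<Batch> batches)
    {
        return batches
            .SelectMany(batch => batch.InnerList)          // unpack the hierarchy
            .Select(customer => customer.UniqueCustomerIdentifier)
            .GroupBy(id => id)                             // group equal IDs together
            .Where(group => group.Skip(1).Any())           // keep groups with more than one element
            .Select(group => group.Key)
            .ToList();
    }

    private static Batch MakeBatch(params string[] ids)
    {
        return new Batch
        {
            InnerList = ids.Select(id => new Customer { UniqueCustomerIdentifier = id }).ToList()
        };
    }

    public static void Main()
    {
        var batches = new[]
        {
            MakeBatch("A", "B"),
            MakeBatch("B", "C", "A"),
            MakeBatch("D")
        };

        Console.WriteLine(string.Join(", ", FindDuplicateIds(batches)));  // prints "A, B"
    }
}
```

Using Skip(1).Any() instead of Count() > 1 stops enumerating a group as soon as a second element is found.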
