What is the linq query that would replace my two foreach - linq

I'm struggeling to figure out how to replace the two foreach below.
public void UpdateChangedRows(List<Row> changedRows)
{
// How to write linq to update the rows (example code using foreach)
foreach (Row row in table.Rows)
{
foreach (Row changedRow in changedRows)
{
if (row.RowId==changedRow.RowId)
{
row.Values = changedRow.Values;
}
}
}
}
I think there would be a linq way to perform the same operation. Thanks for any help.
Larsi

Well, LINQ is more about querying than updating, so I'll still separate it into a LINQ query and then a foreach loop to do the update:
var query = from row in table.Rows
join changedRow in changedRows on row.RowId = changeRow.RowId
select { row, changedRow };
foreach (var entry in query)
{
entry.row.Values = entry.changedRow.Values;
}
Note that this is less efficient in terms of memory, in that it loads all the changed rows into a dictionary (internally) so that it can quickly look up the changed rows based on the row ID - but it means the looping complexity is a lot smaller (O(n) + O(m) instead of O(n * m) if each row only matches a single changed row and vice versa).

Related

Can I refer to items within a LINQ result set by index?

I'm trying to work with a LINQ result set of 4 tables retrieved with html agility pack. I'd like to process each one slightly differently by setting a variable for each (switch statement below), and then processing the rows within the table. The variable would ideally be the index for each of the tables in the set, 0 to 3, and would be used in the switch statement and to select the rows. I haven't been able to locate the index property, but I see it used in situations such as SelectChildNode.
My question is can I refer to items within a LINQ result set by index? My "ideal scenario" is the last commented out line. Thanks in advance.
var ratingsChgs = from table in htmlDoc.DocumentNode
.SelectNodes("//table[#class='calendar-table']")
.Cast<HtmlNode>()
select table;
String rtgChgType;
for (int ratingsChgTbl = 0; ratingsChgTbl < 4; ratingsChgTbl++)
{
switch (ratingsChgTbl)
{
case 0:
rtgChgType = "Upgrades";
break;
case 1:
rtgChgType = "Downgrades";
break;
case 2:
rtgChgType = "Coverage Initiated";
break;
case 3:
rtgChgType = "Coverage Reit/ Price Tgt Changed";
break;
//This is what I'd like to do.
var tblRowsByChgType = from row in ratingsChgs[ratingsChgTbl]
.SelectNodes("tr")
select row;
//Processing of returned rows.
}
}
ElementAt does what you're asking for. I don't recommend using it in your example, though, because each time you call it, your initial LINQ query will be executed. The easy fix is to have ratingsChgs be a List or Array.
You can also refactor out the switch statement. It is overkill when you only need to iterate through a list of items. Here is a possible solution:
var ratingsChgs = from table in htmlDoc.DocumentNode
.SelectNodes("//table[#class='calendar-table']")
.Cast<HtmlNode>()
select table;
var rtgChgTypeNames = new List
{
"Upgrades",
"Downgrades",
"Coverage Initiated",
"Coverage Reit/ Price Tgt Changed"
};
var changeTypes = ratingsChgs.Zip(rtgChgTypeNames, (changeType, name) => new
{
Name = name,
Rows = changeType.SelectNodes("tr")
});
foreach( var changeType in changeTypes)
{
var name = changeType.Name;
var rows = changeType.Rows;
//Processing of returned rows.
}
Also, why not store your rating change types in the HTML doc? It seems odd to have table information defined in the business logic.

LINQ Distinct set by column value

Is there a simple LINQ query to get distinct records by a specific column value (not the whole record)?
Anyone know how i can filter a list with only distinct values?
You could use libraries like morelinq to do this. You'd be interested in the DistinctBy() method.
var query = records.DistinctBy(record => record.Column);
Otherwise, you could do this by hand.
var query =
from record in records
group record by record.Column into g
select g.First();
Select a single value first and then run the Distinct.
(from item in table
select item.TheSingleValue).Distinct();
If you want the entire record you need to use group x by into y. You then need to find a suitable aggregate function like First, Max, Average or similar to select one of the other values in the group.
from item in table
group item by item.TheSingleValue into g
select new { TheSingleValue = g.Key, OtherValue1 = g.First().OtherValue1, OtherValue2 = g.First().OtherValue2 };
You could make an implementation of the IEqualityComparer interface:
public class MyObjectComparer : IEqualityComparer<MyObject>
{
public bool Equals(MyObject x, MyObject y)
{
return x.ColumnNameProperty == y.ColumnNameProperty;
}
public int GetHashCode(MyObject obj)
{
return obj.ColumnNameProperty.GetHashCode();
}
}
And pass an instance into the Distinct method:
var distinctSource = source.Distinct(new MyObjectComparer());

Longish LINQ query breakes SQLite-parser - simplify?

I'm programming a search for a SQLite-database using C# and LINQ.
The idea of the search is, that you can provide one or more keywords, any of which must be contained in any of several column-entries for that row to be added to the results.
The implementation consists of several linq-queries which are all put together by union. More keywords and columns that have to be considered result in a more complicated query that way. This can lead to SQL-code, which is to long for the SQLite-parser.
Here is some sample code to illustrate:
IQueryable<Reference> query = null;
if (searchAuthor)
foreach (string w in words)
{
string word = w;
var result = from r in _dbConnection.GetTable<Reference>()
where r.ReferenceAuthor.Any(a => a.Person.LastName.Contains(word) || a.Person.FirstName.Contains(word))
orderby r.Title
select r;
query = query == null ? result : query.Union(result);
}
if (searchTitle)
foreach (string word in words)
{
var result = from r in _dbConnection.GetTable<Reference>()
where r.Title.Contains(word)
orderby r.Title
select r;
query = query == null ? result : query.Union(result);
}
//...
Is there a way to structure the query in a way that results in more compact SQL?
I tried to force the creation of smaller SQL-statments by calling GetEnumerator() on the query after every loop. But apparently Union() doesn't operate on data, but on the underlying LINQ/SQL statement, so I was generating to long statements regardless.
The only solution I can think of right now, is to really gather the data after every "sub-query" and doing a union on the actual data and not in the statement. Any ideas?
For something like that, you might want to use a PredicateBuilder, as shown in the chosen answer to this question.

LINQ OrderBy versus ThenBy

Can anyone explain what the difference is between:
tmp = invoices.InvoiceCollection
.OrderBy(sort1 => sort1.InvoiceOwner.LastName)
.OrderBy(sort2 => sort2.InvoiceOwner.FirstName)
.OrderBy(sort3 => sort3.InvoiceID);
and
tmp = invoices.InvoiceCollection
.OrderBy(sort1 => sort1.InvoiceOwner.LastName)
.ThenBy(sort2 => sort2.InvoiceOwner.FirstName)
.ThenBy(sort3 => sort3.InvoiceID);
Which is the correct approach if I wish to order by 3 items of data?
You should definitely use ThenBy rather than multiple OrderBy calls.
I would suggest this:
tmp = invoices.InvoiceCollection
.OrderBy(o => o.InvoiceOwner.LastName)
.ThenBy(o => o.InvoiceOwner.FirstName)
.ThenBy(o => o.InvoiceID);
Note how you can use the same name each time. This is also equivalent to:
tmp = from o in invoices.InvoiceCollection
orderby o.InvoiceOwner.LastName,
o.InvoiceOwner.FirstName,
o.InvoiceID
select o;
If you call OrderBy multiple times, it will effectively reorder the sequence completely three times... so the final call will effectively be the dominant one. You can (in LINQ to Objects) write
foo.OrderBy(x).OrderBy(y).OrderBy(z)
which would be equivalent to
foo.OrderBy(z).ThenBy(y).ThenBy(x)
as the sort order is stable, but you absolutely shouldn't:
It's hard to read
It doesn't perform well (because it reorders the whole sequence)
It may well not work in other providers (e.g. LINQ to SQL)
It's basically not how OrderBy was designed to be used.
The point of OrderBy is to provide the "most important" ordering projection; then use ThenBy (repeatedly) to specify secondary, tertiary etc ordering projections.
Effectively, think of it this way: OrderBy(...).ThenBy(...).ThenBy(...) allows you to build a single composite comparison for any two objects, and then sort the sequence once using that composite comparison. That's almost certainly what you want.
I found this distinction annoying in trying to build queries in a generic manner, so I made a little helper to produce OrderBy/ThenBy in the proper order, for as many sorts as you like.
public class EFSortHelper
{
public static EFSortHelper<TModel> Create<TModel>(IQueryable<T> query)
{
return new EFSortHelper<TModel>(query);
}
}
public class EFSortHelper<TModel> : EFSortHelper
{
protected IQueryable<TModel> unsorted;
protected IOrderedQueryable<TModel> sorted;
public EFSortHelper(IQueryable<TModel> unsorted)
{
this.unsorted = unsorted;
}
public void SortBy<TCol>(Expression<Func<TModel, TCol>> sort, bool isDesc = false)
{
if (sorted == null)
{
sorted = isDesc ? unsorted.OrderByDescending(sort) : unsorted.OrderBy(sort);
unsorted = null;
}
else
{
sorted = isDesc ? sorted.ThenByDescending(sort) : sorted.ThenBy(sort)
}
}
public IOrderedQueryable<TModel> Sorted
{
get
{
return sorted;
}
}
}
There are a lot of ways you might use this depending on your use case, but if you were for example passed a list of sort columns and directions as strings and bools, you could loop over them and use them in a switch like:
var query = db.People.AsNoTracking();
var sortHelper = EFSortHelper.Create(query);
foreach(var sort in sorts)
{
switch(sort.ColumnName)
{
case "Id":
sortHelper.SortBy(p => p.Id, sort.IsDesc);
break;
case "Name":
sortHelper.SortBy(p => p.Name, sort.IsDesc);
break;
// etc
}
}
var sortedQuery = sortHelper.Sorted;
The result in sortedQuery is sorted in the desired order, instead of resorting over and over as the other answer here cautions.
if you want to sort more than one field then go for ThenBy:
like this
list.OrderBy(personLast => person.LastName)
.ThenBy(personFirst => person.FirstName)
Yes, you should never use multiple OrderBy if you are playing with multiple keys.
ThenBy is safer bet since it will perform after OrderBy.

LINQ to SQL bug (or very strange feature) when using IQueryable, foreach, and multiple Where

I ran into a scenario where LINQ to SQL acts very strangely. I would like to know if I'm doing something wrong. But I think there is a real possibility that it's a bug.
The code pasted below isn't my real code. It is a simplified version I created for this post, using the Northwind database.
A little background: I have a method that takes an IQueryable of Product and a "filter object" (which I will describe in a minute). It should run some "Where" extension methods on the IQueryable, based on the "filter object", and then return the IQueryable.
The so-called "filter object" is a System.Collections.Generic.List of an anonymous type of this structure: { column = fieldEnum, id = int }
The fieldEnum is an enum of the different columns of the Products table that I would possibly like to use for the filtering.
Instead of explaining further how my code works, it's easier if you just take a look at it. It's simple to follow.
enum filterType { supplier = 1, category }
public IQueryable<Product> getIQueryableProducts()
{
NorthwindDataClassesDataContext db = new NorthwindDataClassesDataContext();
IQueryable<Product> query = db.Products.AsQueryable();
//this section is just for the example. It creates a Generic List of an Anonymous Type
//with two objects. In real life I get the same kind of collection, but it isn't hard coded like here
var filter1 = new { column = filterType.supplier, id = 7 };
var filter2 = new { column = filterType.category, id = 3 };
var filterList = (new[] { filter1 }).ToList();
filterList.Add(filter2);
foreach(var oFilter in filterList)
{
switch (oFilter.column)
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == oFilter.id);
break;
case filterType.category:
query = query.Where(p => p.CategoryID == oFilter.id);
break;
default:
break;
}
}
return query;
}
So here is an example. Let's say the List contains two items of this anonymous type, { column = fieldEnum.Supplier, id = 7 } and { column = fieldEnum.Category, id = 3}.
After running the code above, the underlying SQL query of the IQueryable object should contain:
WHERE SupplierID = 7 AND CategoryID = 3
But in reality, after the code runs the SQL that gets executed is
WHERE SupplierID = 3 AND CategoryID = 3
I tried defining query as a property and setting a breakpoint on the setter, thinking I could catch what's changing it when it shouldn't be. But everything was supposedly fine. So instead I just checked the underlying SQL after every command. I realized that the first Where runs fine, and query stays fine (meaning SupplierID = 7) until right after the foreach loop runs the second time. Right after oFilter becomes the second anonymous type item, and not the first, the 'query' SQL changes to Supplier = 3. So what must be happening here under-the-hood is that instead of just remembering that Supplier should equal 7, LINQ to SQL remembers that Supplier should equal oFilter.id. But oFilter is a name of a single item of a foreach loop, and it means something different after it iterates.
I have only glanced at your question, but I am 90% sure that you should read the first section of On lambdas, capture, and mutability (which includes links to 5 similar SO questions) and all will become clear.
The basic gist of it is that the variable oFilter in your example has been captured in the closure by reference and not by value. That means that once the loop finishes iterating, the variable's reference is to the last one, so the value as evaluated at lambda execution time is the final one as well.
The cure is to insert a new variable inside the foreach loop whose scope is only that iteration rather than the whole loop:
foreach(var oFilter in filterList)
{
var filter = oFilter; // add this
switch (oFilter.column) // this doesn't have to change, but can for consistency
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == filter.id); // use `filter` here
break;
Now each closure is over a different filter variable that is declared anew inside of each loop, and your code will run as expected.
Working as designed. The issue you are confronting is the clash between lexical closure and mutable variables.
What you probably want to do is
foreach(var oFilter in filterList)
{
var o = oFilter;
switch (o.column)
{
case filterType.supplier:
query = query.Where(p => p.SupplierID == o.id);
break;
case filterType.category:
query = query.Where(p => p.CategoryID == o.id);
break;
default:
break;
}
}
When compiled to IL, the variable oFilter is declared once and used multiply. What you need is a variable declared separately for each use of that variable within a closure, which is what o is now there for.
While you're at it, get rid of that bastardized Hungarian notation :P.
I think this is the clearest explanation I've ever seen: http://blogs.msdn.com/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx:
Basically, the problem arises because we specify that the foreach loop is a syntactic sugar for
{
IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
try
{
int m; // OUTSIDE THE ACTUAL LOOP
while(e.MoveNext())
{
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
}
finally
{
if (e != null) ((IDisposable)e).Dispose();
}
}
If we specified that the expansion was
try
{
while(e.MoveNext())
{
int m; // INSIDE
m = (int)(int)e.Current;
funcs.Add(()=>m);
}
then the code would behave as expected.
The problem is that you're not appending to the query, you're replacing it each time through the foreach statement.
You want something like the PredicateBuilder - http://www.albahari.com/nutshell/predicatebuilder.aspx

Resources