How can I get the "actual" count of elements in an IEnumerable? - linq

If I write:
for (int i = 0; i < Strutture.Count(); i++)
{
}
and Strutture is an IEnumerable with 200 elements, IIS crashes. From what I can see, every time I call Strutture.Count() it executes all the LINQ queries linked to that IEnumerable.
So how can I get the "current" number of elements? Do I need a list?

"That's because I see every time I do Strutture.Count() it executes all LINQ queries linked with that IEnumerable."
Without doing such, how is it going to know how many elements there are?
For example:
Enumerable.Range(0, 1000).Where(i => i % 2 == 0).Skip(100).Take(5).Count();
Without executing the LINQ, how could you know how many elements there are?
If you want to know how many elements there are in the source (e.g. Enumerable.Range) then I suggest you use a reference to that source and query it directly. E.g.
var numbers = Enumerable.Range(0,1000);
numbers.Count();
Also keep in mind that some data sources don't really have a concept of 'Count', or if they do, getting it involves going through every single item and counting them.
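For example, here is a minimal sketch (the ReadLines iterator and the file name are purely illustrative, not from the question) of a source that has no stored count at all, so Count() has no choice but to walk every element:
// requires System.IO, System.Collections.Generic and System.Linq
// An iterator has nothing resembling a Count property; counting means enumerating.
static IEnumerable<string> ReadLines(string path)
{
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}

// This reads the entire file just to produce a number:
// int lineCount = ReadLines("data.txt").Count();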
Lastly, if you're calling .Count() repeatedly [and you don't expect the value to actually change] it can be a good idea to cache it:
var count = numbers.Count();
for (int i = 0; i < count; i++) // Do Something
Supplemental:
"At first Count(), LINQ queries are executes. Than, for the next, it just "check" the value :) Not "execute the LINQ query again..." :)" - Markzzz
Then why don't we do that?
var query = Enumerable.Range(0, 1000).Where(i => i % 2 == 0).Skip(100).Take(5);
var result = query.ToArray(); // Gets and stores the result!
var count = result.Length;    // no re-execution of the query
:)
"But when I do the first "count", it should store (after the LINQ queries) the new IEnumerable (the state is changed). If I do again .Count(), why LINQ need to execute again ALL queries." - Markzzz
Because you're creating a query that gets compiled down into X, Y, Z. You're running the same query twice; however, the result may vary.
For example, check this out:
static void Main(string[] args)
{
    var dataSource = Enumerable.Range(0, 100).ToList();
    var query = dataSource.Where(i => i % 2 == 0);

    // Run the query once and return the count:
    Console.WriteLine(query.Count()); // 50

    // Now let's modify the data source - remember this could be a table in a db etc.
    dataSource.AddRange(Enumerable.Range(100, 100));

    // Run the query again and return the count:
    Console.WriteLine(query.Count()); // 100

    Console.ReadLine();
}
This is why I recommended storing the results of the query above!
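To make "storing the results" concrete, here is the same example with the query materialized before the data source changes:
var dataSource = Enumerable.Range(0, 100).ToList();
var query = dataSource.Where(i => i % 2 == 0);

var results = query.ToArray();        // run the query once and keep the results
Console.WriteLine(results.Length);    // 50

dataSource.AddRange(Enumerable.Range(100, 100));

Console.WriteLine(results.Length);    // still 50 - the stored array is not re-evaluated
Console.WriteLine(query.Count());     // 100 - the live query is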

Materialize the number:
int number = Strutture.Count();
for (int i = 0; i < number; i++)
{
}
or materialize the list:
var list = Strutture.ToList();
for (int i = 0; i < list.Count; i++)
{
}
or use a foreach
foreach(var item in Strutture)
{
}

Related

LINQ Out of Memory Error

I am querying 200k records and using up all the server's memory (no surprise). I am new to LINQ, so I found the following code that should help me, but I don't know how to use it:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
    List<T> nextbatch = new List<T>(batchSize);
    foreach (T item in collection)
    {
        nextbatch.Add(item);
        if (nextbatch.Count == batchSize)
        {
            yield return nextbatch;
            nextbatch = new List<T>(batchSize);
        }
    }
    if (nextbatch.Count > 0)
        yield return nextbatch;
}
Source: http://goo.gl/aQZIj
Here is my code which creates the "out of memory" error. How do I incorporate the new Batch function into my code?
var crmMetrics = _crmDbContext.tpm_metricsSet.Where(a => a.ModifiedOn >= lastRunDate);
foreach (var crmMetric in crmMetrics)
{
    var metric = new Metric();
    metric.ProductKey = crmMetric.tpm_Product.Id;
    dbContext.Metrics.Add(metric);
    dbContext.SaveChanges();
}
It's an extension method, so if it is part of a static class and there is a reference to the class's namespace in your code you could do:
var crmMetricsBatches = _crmDbContext.tpm_metricsSet
    .Where(a => a.ModifiedOn >= lastRunDate)
    .AsEnumerable() // !!
    .Batch(20);
Except it wouldn't help. Because of the .AsEnumerable(), you still fetch all the data into memory, just in chunks of 20 now. The .AsEnumerable() is needed because you can't use the method directly against an IQueryable: Entity Framework would try to translate it to SQL and of course has no clue how to do that.
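For completeness, here is a sketch of how the Batch helper could be wired into the loop from the question (reusing its names); the caveat above still applies - every matching row is still pulled from the database - but at least SaveChanges runs once per batch instead of once per record:
var crmMetricsBatches = _crmDbContext.tpm_metricsSet
    .Where(a => a.ModifiedOn >= lastRunDate)
    .AsEnumerable() // the query still returns every matching row; Batch only groups them in memory
    .Batch(20);

foreach (var batch in crmMetricsBatches)
{
    foreach (var crmMetric in batch)
    {
        var metric = new Metric();
        metric.ProductKey = crmMetric.tpm_Product.Id;
        dbContext.Metrics.Add(metric);
    }
    dbContext.SaveChanges(); // one round trip per batch
}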
As TGH said, Skip and Take are better suited for this:
var crmMetricsPage = _crmDbContext.tpm_metricsSet
    .Where(a => a.ModifiedOn >= lastRunDate)
    .OrderBy(a => a.??) // some property you choose
    .Skip(pageNo * pageSize)
    .Take(pageSize);
where pageNo counts from 0 to the number of pages you're going to need (minus 1). Skip and Take are expressions, and EF knows how to convert them to SQL. The OrderBy is required so that EF knows where to start skipping.
In this process, called paging, you always get at most pageSize records at a time. The number of queries is greater, but resources are spared. One condition is that you can determine a pageSize in advance; I don't know if that fits your logic.
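A rough sketch of the surrounding loop, assuming a fixed pageSize and ordering by ModifiedOn (the property choice is just an illustration; any stable ordering works):
const int pageSize = 500; // illustrative value
int pageNo = 0;
while (true)
{
    var crmMetricsPage = _crmDbContext.tpm_metricsSet
        .Where(a => a.ModifiedOn >= lastRunDate)
        .OrderBy(a => a.ModifiedOn)   // assumed ordering
        .Skip(pageNo * pageSize)
        .Take(pageSize)
        .ToList();                    // one bounded SQL query per page

    if (crmMetricsPage.Count == 0)
        break;

    // ... create and save the Metric objects for this page, as in the question ...

    pageNo++;
}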
If you can't use paging, you should try to narrow the filter (Where(a => a.ModifiedOn >= lastRunDate)), e.g. try to get the data in batches of one day or one week.
I would use Linq's Skip and Take to get the batches
Check this out:
http://www.c-sharpcorner.com/UploadFile/3d39b4/take-and-skip-operator-in-linq-to-sql/

Truncating a collection using Linq query

I want to extract part of a collection into another collection.
I can do this easily using a for loop, but my LINQ query for the same is not working.
I am a neophyte in LINQ, so please help me correct the query (if possible with an explanation / beginner's tutorial link).
The legacy way of doing it:
Collection<string> testColl1 = new Collection<string> { "t1", "t2", "t3", "t4" };
Collection<string> testColl2 = new Collection<string>();
for (int i = 0; i < newLength; i++)
{
    testColl2.Add(testColl1[i]);
}
where testColl1 is the source and testColl2 is the desired truncated collection with count = newLength.
I have tried the following LINQ queries, but none of them work:
var result = from t in testColl1 where t.Count() <= newLength select t;
var res = testColl1.Where(t => t.Count() <= newLength);
Use Enumerable.Take:
var testColl2 = testColl1.Take(newLength).ToList();
Note that there's a semantic difference between your for loop and the version using Take. The for loop will throw if there are fewer than newLength items in testColl1, whereas the Take version will silently ignore this and just return as many items as are available, up to newLength.
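A quick illustration of that difference, using made-up values:
var testColl1 = new Collection<string> { "t1", "t2", "t3", "t4" };
int newLength = 6;

// Take never throws; it simply stops when the source runs out:
var truncated = testColl1.Take(newLength).ToList(); // ["t1", "t2", "t3", "t4"]

// The indexer-based loop would throw once i reaches 4:
// for (int i = 0; i < newLength; i++) { testColl2.Add(testColl1[i]); }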
The correct way is by using Take:
var result = testColl1.Take(newLength);
An equivalent way using Where is:
var result = testColl1.Where((item, i) => i < newLength);
These expressions will produce an IEnumerable, so you might also want to attach a .ToList() or .ToArray() at the end.
Both approaches yield at most newLength items, which is the natural behavior (e.g. if newLength == 0, no items are returned).
You could convert the for loop to something like this:
testColl1.Take(newLength)
Use Take:
var result = testColl1.Take(newLength);
This extension method returns the first N elements from the collection where N is the parameter you pass, in this case newLength.

LINQ: Field is not a reference field

I've got a list of IQueryable. I'm trying to split this list into an array of IQueryable matching on a certain field (say fieldnum) in the first list...
For example, if fieldnum == 1, it should go into array[1]. I'm using Where() to filter based on this field; it looks something like this:
var allItems = FillListofMyObjects();
var Filtered = new List<IQueryable<myObject>>(MAX + 1);
for (var i = 1; i <= MAX; i++)
{
    var sublist = allItems.Where(e => e.fieldnum == i);
    if (sublist.Count() == 0) continue;
    Filtered[i] = sublist;
}
However, I'm getting the error Field "t1.fieldnum" is not a reference field on the if line. Stepping through the debugger shows the error actually occurs on the line before (the Where() call), but either way, I don't know what I'm doing wrong.
I'm fairly new to LINQ, so if I'm doing this all wrong please let me know. Thanks!
Why don't you just use ToLookup?
var allItemsPerFieldNum = allItems.ToLookup(e => e.fieldnum);
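A short usage sketch, assuming the fieldnum values are the keys you want to group by:
var allItemsPerFieldNum = allItems.ToLookup(e => e.fieldnum);

// Indexing a lookup with a missing key yields an empty sequence instead of throwing:
foreach (var item in allItemsPerFieldNum[1]) // every item with fieldnum == 1
{
    // process item
}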
Do you need to reevaluate the expression every time you get the values?
Why not use a dictionary?
var dictionary = allItems.ToDictionary(y => y.fieldnum);

Linq to NHibernate, Order by Rand()?

I'm using Linq to NHibernate, and with an HQL statement I can do something like this:
string hql = "from Entity e order by rand()";
And it will be ordered randomly. I'd like to know how I can do the same statement with Linq to NHibernate.
I tried this:
var result = from e in Session.Linq<Entity>()
             orderby new Random().Next(0, 100)
             select e;
but it throws an exception and doesn't work...
Is there any other way or solution?
Thanks
Cheers
I guess Linq to NHibernate is unable to convert the Random.Next call to SQL...
An option would be to sort the results after you retrieve them from the DB:
var rand = new Random();
var query = from e in Session.Linq<Entity>()
            select e;
var result = from e in query.AsEnumerable()
             orderby rand.Next(0, 100)
             select e;
Note that you need to use a single instance of Random, because the seed is based on the current tick count; if you create several instances of Random within a very short interval, they will have the same seed and generate the same sequence.
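A tiny demonstration of that pitfall; this is how Random behaved on the classic .NET Framework of that era, where the default seed is derived from the tick count:
var r1 = new Random();
var r2 = new Random(); // created within the same tick

// Very likely prints the same number twice, because both share the same seed:
Console.WriteLine(r1.Next(0, 100));
Console.WriteLine(r2.Next(0, 100));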
Anyway, sorting a collection based on a random number is not such a good idea, because the sort won't be stable and could theoretically last forever. If you need to shuffle the results, you can use the Fisher-Yates algorithm :
var result = Session.Linq<Entity>().ToArray();
var rand = new Random();
for (int i = result.Length - 1; i > 0; i--)
{
    int swapIndex = rand.Next(i + 1);
    var tmp = result[i];
    result[i] = result[swapIndex];
    result[swapIndex] = tmp;
}

Paging a collection with LINQ

How do you page through a collection in LINQ given that you have a startIndex and a count?
It is very simple with the Skip and Take extension methods.
var query = from i in ideas
            select i;
var pagedCollection = query.Skip(startIndex).Take(count);
A few months back I wrote a blog post about Fluent Interfaces and LINQ which used an Extension Method on IQueryable<T> and another class to provide the following natural way of paginating a LINQ collection.
var query = from i in ideas
select i;
var pagedCollection = query.InPagesOf(10);
var pageOfIdeas = pagedCollection.Page(2);
You can get the code from the MSDN Code Gallery Page: Pipelines, Filters, Fluent API and LINQ to SQL.
I solved this a bit differently from the others, as I had to make my own paginator with a repeater. So I first made a collection of page numbers for the collection of items that I have:
// assumes that the item collection is "myItems"
int pageCount = (myItems.Count + PageSize - 1) / PageSize;
IEnumerable<int> pageRange = Enumerable.Range(1, pageCount);
// pageRange contains [1, 2, ... , pageCount]
Using this I could easily partition the item collection into a collection of "pages". A page in this case is just a collection of items (IEnumerable<Item>). This is how you can do it using Skip and Take together with selecting the index from the pageRange created above:
IEnumerable<IEnumerable<Item>> pages = pageRange
    .Select((page, index) =>
        myItems
            .Skip(index * PageSize)
            .Take(PageSize));
Of course you have to handle each page as an additional collection but e.g. if you're nesting repeaters then this is actually easy to handle.
The one-liner TLDR version would be this:
var pages = Enumerable
.Range(0, pageCount)
.Select((index) => myItems.Skip(index*PageSize).Take(PageSize));
Which can be used like this:
foreach (IEnumerable<Item> page in pages)
{
    // handle page
    foreach (Item item in page)
    {
        // handle item in page
    }
}
This question is somewhat old, but I wanted to post my paging algorithm that shows the whole procedure (including user interaction).
const int pageSize = 10;
const int count = 100;
const int startIndex = 20;
int took = 0;
bool getNextPage = true;
var page = ideas.Skip(startIndex);
do
{
    Console.WriteLine("Page {0}:", (took / pageSize) + 1);
    foreach (var idea in page.Take(pageSize))
    {
        Console.WriteLine(idea);
    }
    took += pageSize;
    if (took < count)
    {
        Console.WriteLine("Next page (y/n)?");
        char answer = Console.ReadLine().FirstOrDefault();
        getNextPage = default(char) != answer && 'y' == char.ToLowerInvariant(answer);
        if (getNextPage)
        {
            page = page.Skip(pageSize);
        }
    }
}
while (getNextPage && took < count);
However, if you are after performance (and in production code we're all after performance), you shouldn't use LINQ's paging as shown above, but rather use the underlying IEnumerator to implement paging yourself. As a matter of fact, it is as simple as the LINQ algorithm shown above, but more performant:
const int pageSize = 10;
const int count = 100;
const int startIndex = 20;
int took = 0;
bool getNextPage = true;
using (var page = ideas.Skip(startIndex).GetEnumerator())
{
    do
    {
        Console.WriteLine("Page {0}:", (took / pageSize) + 1);
        int currentPageItemNo = 0;
        while (currentPageItemNo++ < pageSize && page.MoveNext())
        {
            var idea = page.Current;
            Console.WriteLine(idea);
        }
        took += pageSize;
        if (took < count)
        {
            Console.WriteLine("Next page (y/n)?");
            char answer = Console.ReadLine().FirstOrDefault();
            getNextPage = default(char) != answer && 'y' == char.ToLowerInvariant(answer);
        }
    }
    while (getNextPage && took < count);
}
Explanation: The downside of using Skip() multiple times in a "cascading manner" is that it does not really store the "pointer" of the iteration where it last skipped. Instead, the original sequence is front-loaded with Skip calls, which leads to "consuming" the already consumed pages over and over again. You can prove that yourself by creating the sequence ideas so that it yields side effects: even if you have skipped 10-20 and 20-30 and want to process 40+, you'll see all the side effects of 10-30 being executed again before you start iterating 40+.
The variant using the IEnumerator interface directly will instead remember the position of the end of the last logical page, so no explicit skipping is needed and side effects won't be repeated.
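A small sketch that makes the repeated side effects visible; the Ideas iterator below is hypothetical and only exists to print when each element is produced:
static IEnumerable<int> Ideas()
{
    for (int i = 0; i < 40; i++)
    {
        Console.WriteLine("producing {0}", i); // side effect per yielded element
        yield return i;
    }
}

static void Main()
{
    var ideas = Ideas();

    // Cascading Skip: each page re-enumerates the source from the start,
    // so "producing 0..9" is printed again for every later page.
    var page2 = ideas.Skip(10).Take(10).ToList();
    var page3 = ideas.Skip(20).Take(10).ToList();

    // Single enumerator: the position is kept, so each element is produced only once.
    using (var e = Ideas().Skip(10).GetEnumerator())
    {
        for (int n = 0; n < 10 && e.MoveNext(); n++) { /* page 2 */ }
        for (int n = 0; n < 10 && e.MoveNext(); n++) { /* page 3 */ }
    }
}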
