LINQ First() or Single() has a drastic effect on performance

I found that adding .First() after Take(1) in the following LINQ query

var qty = (from ii in Inventory
           where ii.Part == "abc" && ii.Zone == "xyz"
           select ii.Qty).Take(1);

increases execution time several thousand times. The same happens with .Single(). I'm wondering why. Note that even without First() the result already has only one record.
Full code:
using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp1
{
    class SurroundingClass
    {
        class part
        {
            public string id { get; set; }
            public string zone { get; set; }
            public int qty { get; set; }
        }

        public static void Main()
        {
            // Build 50,000 parts
            List<part> inventory = new List<part>();
            for (var i = 1; i <= 50000; i++)
                inventory.Add(new part() { id = Convert.ToString(i), zone = Convert.ToString(i), qty = 3 });

            // Run the query 20,000 times and measure the elapsed time
            object qty1;
            DateTime d0 = DateTime.Now;
            for (var i = 1; i <= 20000; i++)
                qty1 = (from x in inventory
                        where x.id == "40000" && x.zone == "40000"
                        select x.qty).Take(1).First();
            DateTime d1 = DateTime.Now;
            Console.WriteLine((d1 - d0).TotalSeconds); // TotalSeconds rather than Seconds, which wraps at 60
        }
    }
}

In the first code example you are not executing the query, only building it.
First(), Single(), ToArray(), and some other methods trigger query execution / enumeration.
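To make the distinction concrete, here is a minimal sketch (a plain List<int> stands in for your Inventory list):

using System;
using System.Collections.Generic;
using System.Linq;

var numbers = Enumerable.Range(1, 50000).ToList();

// Cheap: this line only builds a query object; nothing is enumerated yet.
var query = numbers.Where(n => n == 40000).Take(1);

// This is where the work happens: First() enumerates the list until it finds a match.
int value = query.First();

// In a hot loop, execute the query once and cache the result
// instead of re-enumerating the 50,000-item list 20,000 times.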

As @Vladimir's answer points out, you need to be aware of the difference between deferred and immediate query execution in LINQ.
.First(), .Single(), and even a foreach (or VB For Each) loop all cause immediate query execution.
That is why adding .First() increases execution time several thousand times: the query actually runs on every pass through your loop instead of merely being constructed.
Some tips on which one to use (see the sketch after this list):
Immediate Query Execution
If you want to cache the results of a query.
If you want to get the final result without re-executing the query.
Deferred Query Execution
If you want to build up the complexity of a query in several steps by separating query construction from query execution.
If you want to fetch the latest information every time the query is enumerated.
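A minimal sketch of the difference (the variable names are illustrative):

using System;
using System.Collections.Generic;
using System.Linq;

var parts = new List<string> { "abc", "xyz" };

// Deferred: re-evaluated every time it is enumerated, so it sees later changes.
var deferred = parts.Where(p => p.Length == 3);

// Immediate: ToList() runs the query now and caches a snapshot.
var cached = parts.Where(p => p.Length == 3).ToList();

parts.Add("def");

Console.WriteLine(deferred.Count()); // 3 - reflects the added item
Console.WriteLine(cached.Count);     // 2 - the snapshot is unchanged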

Related

Performance issue with IEnumerable when querying a large amount of data with LINQ

I'm using LINQ to execute a query on a List variable with a large amount of data (over a million items). For performance reasons I'm storing the results in an IEnumerable, but when I try to access it there is a noticeable delay.
Specifically, I want to see whether the query produced any results, but when I call .Count() or .Any() the performance drops.
I read that for IEnumerable types the query executes at the time of need, hence the delay. Is there a way to see whether the IEnumerable has elements inside it without that much delay?
This is what I'm trying to run.
IEnumerable<Entity> matchingEntities =
    entities.Where(e => e.Names.Any(n => myEntity.Names.Any(entityName => entityName.CompareNameObjects(n))));
and here are my classes
public class Entity
{
    public string EntityIdentifier { get; set; }
    public List<Name> Names { get; set; }
}

public class Name
{
    public string FullName { get; set; }
    public string NameType { get; set; }

    public bool CompareNameObjects(Name name2)
    {
        return FullName == name2.FullName &&
               NameType == name2.NameType;
    }
}
entities is a list of all my objects, and I want to check whether myEntity has any Names identical to those of another entity in the set.
EDITED:
The data structure is similar to the two classes above (Entity and Name). The entities are created by selecting all the entities, along with their names, from the database in XML format, and then converting the XML to a List<Entity> as follows:
List<Entity> entities = new List<Entity>();
XmlSerializer xmlSerializer = new XmlSerializer(typeof(List<Entity>)); // declaration added; the original snippet used xmlSerializer without declaring it
using (SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["myCS"].ConnectionString))
{
    conn.Open();
    SqlCommand cmd = new SqlCommand("GetAllEntities", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    string entitiesXml = "";
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            entitiesXml += rdr["XmlString"].ToString();
        }
    }
    using (TextReader reader = new StringReader(entitiesXml))
        entities = (List<Entity>)xmlSerializer.Deserialize(reader); // cast to List<Entity>, not Entity
    conn.Close();
}
GetAllEntities (Stored Procedure):
declare @xmlString nvarchar(max) = (
    select e.EntityIdentifier,
    (
        select n.[Full Name] as 'FullName',
               n.[Name Type] as 'NameType'
        from tblNames n
        where e.EntityID = n.[Entity_ID]
        for xml path('Name'), type
    )
    from tblEntities e
    order by e.EntityID
    for xml path('Entity')
)
select @xmlString as XmlString
Basically, you should avoid pulling all the data out of your database and then filtering it in C# code; that wastes a lot of work.
However, as a quick fix, you can improve performance by preparing your match keys in a Dictionary first.
// Let's say you have myEntity here
var myEntity = new Entity();
var entities = new List<Entity>();

// Prepare the set of names you want to find up front,
// so you don't rebuild it on every iteration
var names = myEntity.Names.Select(p => p.FullName + p.NameType).ToDictionary(p => p, p => p);

IEnumerable<Entity> matchingEntities =
    entities.Where(e => e.Names.Any(n => names.ContainsKey(n.FullName + n.NameType)));
This is just an example to give you the idea; you can improve it much more. I hope it helps.
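As a hedged aside (not part of the original answer): a HashSet<string> gives the same O(1) lookup without the duplicate-key exception that ToDictionary throws if two names concatenate to the same key, and a separator guards against accidental collisions such as "ab" + "c" versus "a" + "bc":

// Same idea as above, using a HashSet and a separator character
var names = new HashSet<string>(myEntity.Names.Select(p => p.FullName + "|" + p.NameType));

IEnumerable<Entity> matchingEntities =
    entities.Where(e => e.Names.Any(n => names.Contains(n.FullName + "|" + n.NameType)));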

I want to convert the timestamps of over 200,000 rows after retrieving them from SQL Server using LINQ. What is the optimized way to do that?

I am using LINQ to retrieve one year's worth of data from SQL Server based on an ID. Then I have to iterate through all the rows and convert the timestamp to epoch format, but it is taking too long. What is the optimized way of doing that?
dt = DateTime.Now.AddMonths(-12);
var allData = obj.tableName.Where(m => m.AssetId == id && m.CreatedDateTime > dt)
                           .OrderBy(m => m.CreatedDateTime);
foreach (var eachDataRow in allData)
{
    double date = ToUnixEpoch(eachDataRow.CreatedDateTime);
    // then save date in an array
}
The above sample code is doing everything correctly but takes over 30 seconds to finish the job. I have around 200,000 data points.
Should I not use LINQ? What is the best way to do this?
LINQ is for querying, not for doing the work. System.Threading.Tasks.Parallel provides support for parallel loops and regions. I think this is what you're looking for:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static void Main(string[] args)
{
    IQueryable<MyClass> tmp = new List<MyClass>().AsQueryable();
    Parallel.ForEach(tmp, ConvertIfNeeded); // here the magic happens: pass in your list and the method to run
}

public class MyClass // a dummy class to work with
{
    public DateTime CreatedDateTime { get; set; }
    public double Epoch { get; set; } // added: a double result cannot be stored back into the DateTime property
}

static readonly DateTime dt = DateTime.Now.AddMonths(-12);

public static void ConvertIfNeeded(MyClass t) // static method used in our parallel work
{
    if (t.CreatedDateTime > dt)
        t.Epoch = ToUnixEpoch(t.CreatedDateTime); // was ToUnitEpoch (typo) in the original
}

static double ToUnixEpoch(DateTime d) => // minimal stand-in for the asker's helper (assumed Unix-seconds conversion)
    (d.ToUniversalTime() - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalSeconds;
System.Threading.Tasks.Parallel (MSDN)
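As a hedged alternative sketch (not part of the original answer): if the goal is an ordered array of epoch values, PLINQ can parallelize the conversion while preserving the query's ordering. allData and ToUnixEpoch are the names from the question:

double[] dates = allData
    .AsEnumerable()   // switch to LINQ to Objects so the conversion runs in memory
    .AsParallel()
    .AsOrdered()      // keep the OrderBy(CreatedDateTime) ordering from the query
    .Select(row => ToUnixEpoch(row.CreatedDateTime))
    .ToArray();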

Linq to Objects - Left Outer Join Distinct Object Property Values to an Aggregate Count

Let's say I have a generic list of the following objects:
public class Supermarket
{
    public string Brand { get; set; }
    public string Suburb { get; set; }
    public string State { get; set; }
    public string Country { get; set; }
}
So, using a List<Supermarket> populated with many of these objects with different values, I am trying to:
Select the distinct Suburb properties from a superset of Supermarket objects contained in a List<Supermarket> (say this superset contains 20 distinct Suburbs).
Join the distinct list of Suburbs above to another set of aggregated and counted Suburbs obtained by a LINQ query over a different, smaller List<Supermarket>.
The distinct items in my superset are:
"Blackheath"
"Ramsgate"
"Penrith"
"Vaucluse"
"Newtown"
And the results of my aggregate query are:
"Blackheath", 50
"Ramsgate", 30
"Penrith", 10
I want to join them to get
"Blackheath", 50
"Ramsgate", 30
"Penrith", 10
"Vaucluse", 0
"Newtown", 0
Here is what I have tried so far:
var results = from distinctSuburb in AllSupermarkets.Select(x => x.Suburb).Distinct()
              select new
              {
                  Suburb = distinctSuburb,
                  Count = (from item in SomeSupermarkets
                           group item by item.Suburb into aggr
                           select new
                           {
                               Suburb = aggr.Key,
                               Count = aggr.Count()
                           } into merge
                           where distinctSuburb == merge.Suburb
                           select merge.Count).DefaultIfEmpty(0)
              } into final
              select final;
This is the first time I have had to post on Stack Overflow, as it's such a great resource, but I can't seem to cobble together a solution for this.
Thanks for your time.
EDIT: OK, so I solved this a short while after the initial post. The only thing I was missing was chaining a call to .ElementAtOrDefault(0) after the call to .DefaultIfEmpty(0). I also verified that adding .First() after .DefaultIfEmpty(0), as Ani pointed out, works. The corrected query is as follows:
var results = from distinctSuburb in AllSupermarkets.Select(x => x.Suburb).Distinct()
              select new
              {
                  Suburb = distinctSuburb,
                  Count = (from item in SomeSupermarkets
                           group item by item.Suburb into aggr
                           select new
                           {
                               Suburb = aggr.Key,
                               Count = aggr.Count()
                           } into merge
                           where distinctSuburb == merge.Suburb
                           select merge.Count).DefaultIfEmpty(0).ElementAtOrDefault(0)
              } into final
              select final;
LASTLY: I ran Ani's code snippet and confirmed that it runs successfully, so both approaches work and solve the original question.
I don't really understand the assumed equivalence between State and Suburb (where distinctSuburb == merge.State in your original query), but you can fix your query by adding a .First() after the DefaultIfEmpty(0) call.
But here's how I would write your query, using a GroupJoin:
var results = from distinctSuburb in AllSupermarkets.Select(x => x.Suburb).Distinct()
              join item in SomeSupermarkets
                  on distinctSuburb equals item.Suburb
                  into suburbGroup
              select new
              {
                  Suburb = distinctSuburb,
                  Count = suburbGroup.Count()
              };
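A quick sketch with hypothetical sample data (not from the original post), showing that suburbs missing from the smaller list come back with a count of 0:

var AllSupermarkets = new List<Supermarket>
{
    new Supermarket { Suburb = "Blackheath" },
    new Supermarket { Suburb = "Newtown" }
};
var SomeSupermarkets = new List<Supermarket>
{
    new Supermarket { Suburb = "Blackheath" },
    new Supermarket { Suburb = "Blackheath" }
};

// ... run the GroupJoin query above ...
foreach (var r in results)
    Console.WriteLine(r.Suburb + ", " + r.Count); // Blackheath, 2
                                                  // Newtown, 0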

LINQ (Dynamic): OrderBy within a GroupBy using dynamic linq?

I had the following query using normal LINQ and it was working great (using an anonymous type):
var result = from s in Items
             group s by s.StartTime into groupedItems
             select new { groupedItems.Key, Items = groupedItems.OrderBy(x => x.Name) };
But using Dynamic LINQ I cannot get it to order by within the group by.
result = Items.GroupBy("StartTime", "it").OrderBy("Name");
It states that Name isn't available. It is worth noting that if I take the OrderBy off, everything works great, but the items inside each Key are not ordered.
This is a good question!
I simulated your situation by creating a class called Item.
public class Item
{
    public DateTime StartTime { get; set; }
    public string Name { get; set; }
}
and then created a basic list of items to run the GroupBy on.
List<Item> Items = new List<Item>()
{
    new Item() { StartTime = DateTime.Today, Name = "item2" },
    new Item() { StartTime = DateTime.Today, Name = "item1" },
    new Item() { StartTime = DateTime.Today.AddDays(-1), Name = "item3" }
};
Now the big difference between the two queries is where the OrderBy is performed. In the first query, groupedItems.OrderBy(x => x.Name) is performed on each IGrouping<DateTime, Item> individually, as the query iterates through the groupings.
In the second query, the OrderBy is performed after the fact: you are ordering an IEnumerable<IGrouping<DateTime, Item>>, because the grouping has already happened.
Fortunately, GroupBy has an overload that helps deal with this: a result selector that lets you shape each element returned as the collection is iterated. Here's an example:
var expressionResult = Items.GroupBy(x => x.StartTime,
    (key, grpItems) => new { key, Items = grpItems.OrderBy(y => y.Name) });
In the second argument to GroupBy you specify a lambda that takes a key and the grouping of items under that key, and returns whatever shape you need, which is the same thing your original query does.
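A small usage sketch (not from the original answer), assuming the Items list defined above, to show the ordering inside each group:

foreach (var group in expressionResult)
{
    Console.WriteLine(group.key); // the StartTime key
    foreach (var item in group.Items)
        Console.WriteLine("  " + item.Name); // item1 before item2 within today's group
}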
Hope this helps!

Linq2SQL "Local sequence cannot be used in LINQ to SQL" error

I have a piece of code which combines an in-memory list with some data held in a database. This works just fine in my unit tests (using a mocked LINQ to SQL repository backed by a List).
public IRepository<OrderItem> orderItems { get; set; }

private List<OrderHeld> _releasedOrders = null;
private List<OrderHeld> releasedOrders
{
    get
    {
        if (_releasedOrders == null)
        {
            _releasedOrders = new List<OrderHeld>(); // was List<nOrderHeld> (typo)
        }
        return _releasedOrders;
    }
}

// .....

public int GetReleasedCount(OrderItem orderItem)
{
    int? total =
        (
            from item in orderItems.All
            join releasedOrder in releasedOrders
                on item.OrderID equals releasedOrder.OrderID
            where item.ProductID == orderItem.ProductID
            select new
            {
                item.Quantity,
            }
        ).Sum(x => (int?)x.Quantity);
    return total.HasValue ? total.Value : 0;
}
I am getting an error I don't really understand when I run it against a database.
Exception information:
Exception type: System.NotSupportedException
Exception message: Local sequence cannot be used in LINQ to SQL implementation of query operators except the Contains() operator.
What am I doing wrong?
I'm guessing it's to do with the fact that orderItems is on the database and releasedOrders is in memory.
EDIT
I have changed my code based on the answers given (thanks all)
public int GetReleasedCount(OrderItem orderItem)
{
    var releasedOrderIDs = releasedOrders.Select(x => x.OrderID);
    int? total =
        (
            from item in orderItems.All
            where releasedOrderIDs.Contains(item.OrderID)
                && item.ProductID == orderItem.ProductID
            select new
            {
                item.Quantity,
            }
        ).Sum(x => (int?)x.Quantity);
    return total.HasValue ? total.Value : 0;
}
"I'm guessing it's to do with the fact that orderItems is on the database and releasedOrders is in memory."
You are correct: you can't join a database table to an in-memory List using LINQ to SQL.
Take a look at this link:
http://flatlinerdoa.spaces.live.com/Blog/cns!17124D03A9A052B0!455.entry
He suggests using the Contains() method but you'll have to play around with it to see if it will work for your needs.
It looks like you need to formulate the database query first, because LINQ to SQL can't translate the expression tree into SQL when it references objects that are in memory. The problem is probably the join, so can you reduce the in-memory data to a simple primitive sequence that the query can use? For example, using Contains(), as the error suggests.
Your unit tests work because you're comparing a memory list to a memory list.
For a memory list against the database, you either need to use memoryVariable.Contains(...) or make the database call first and return a list, so that you are again comparing memory list to memory list. The second option would return too much data, so you're forced down the Contains() route.
public int GetReleasedCount(OrderItem orderItem)
{
    // Project the in-memory orders down to their IDs so Contains() can be translated to SQL
    // (the original snippet called Contains(item.OrderID) on the List<OrderHeld> itself, which does not compile)
    var releasedOrderIDs = releasedOrders.Select(x => x.OrderID);
    int? total =
        (
            from item in orderItems.All
            where item.ProductID == orderItem.ProductID
                && releasedOrderIDs.Contains(item.OrderID)
            select new
            {
                item.Quantity,
            }
        ).Sum(x => (int?)x.Quantity);
    return total.HasValue ? total.Value : 0;
}
