Performance issue with IEnumerable when querying a large amount of data with LINQ

I'm using LINQ to execute a query on a List variable that holds a large amount of data (over a million items). For performance reasons I'm using IEnumerable to store the results, but when I try to access them there is a slight delay.
Specifically, I want to see whether the query produced any results, but when I use the .Count() or .Any() methods the performance drops.
I read that for IEnumerable the query is only executed when the results are actually needed (deferred execution), hence the delay. Is there a way to see whether the IEnumerable contains any elements without that much delay?
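To make the deferred-execution point concrete, here is a minimal sketch (not from the original post; the DeferredDemo class and its PredicateCalls counter are hypothetical). It counts how often the Where predicate runs: building the query runs it zero times, Any() stops at the first match, and Count() has to visit every element. That is why Any() is generally the cheaper way to check for results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Incremented every time the Where predicate actually runs.
    public static int PredicateCalls;

    // Builds (but does not run) a filtered query over the source.
    public static IEnumerable<int> EvenNumbers(IEnumerable<int> source) =>
        source.Where(x =>
        {
            PredicateCalls++;
            return x % 2 == 0;
        });

    // Creating the query alone evaluates nothing.
    public static int CallsWithoutEnumerating(IEnumerable<int> data)
    {
        PredicateCalls = 0;
        IEnumerable<int> query = EvenNumbers(data);
        return PredicateCalls;
    }

    // Any() stops as soon as it finds one matching element.
    public static int CallsForAny(IEnumerable<int> data)
    {
        PredicateCalls = 0;
        _ = EvenNumbers(data).Any();
        return PredicateCalls;
    }

    // Count() has to run the predicate on every element.
    public static int CallsForCount(IEnumerable<int> data)
    {
        PredicateCalls = 0;
        _ = EvenNumbers(data).Count();
        return PredicateCalls;
    }
}
```

Applied to the query above, Any() still has to evaluate the nested Names comparison entity by entity until it finds a match (and for every entity if nothing matches), so the cost is dominated by the matching itself, not by IEnumerable. If you need both to check for results and to iterate them, materializing once with ToList() avoids running the query twice.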
This is what I'm trying to run.
IEnumerable<Entity> matchingEntities = entities.Where(e => e.Names.Any(n => myEntity.Names.Any(entityName => entityName.CompareNameObjects(n))));
And here are my classes:
public class Entity
{
public string EntityIdentifier { get; set; }
public List<Name> Names { get; set; }
}
public class Name
{
public string FullName { get; set; }
public string NameType { get; set; }
public bool CompareNameObjects(Name name2)
{
return FullName == name2.FullName &&
NameType == name2.NameType;
}
}
entities is a list of all my objects, and I want to check whether myEntity has any Names identical to those of another entity in the set.
EDITED:
The data structure is similar to the 2 classes (Entity and Name). The entities are created by selecting all the entities, along with their names, from the database in XML format; I then convert the XML to a List like this:
List<Entity> entities = new List<Entity>();
using (SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["myCS"].ConnectionString))
{
conn.Open();
SqlCommand cmd = new SqlCommand("GetAllEntities", conn);
cmd.CommandType = CommandType.StoredProcedure;
string entitiesXml = "";
using (SqlDataReader rdr = cmd.ExecuteReader())
{
while (rdr.Read())
{
entitiesXml += rdr["XmlString"].ToString();
}
}
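// xmlSerializer is an XmlSerializer for List<Entity> (its declaration isn't shown); cast to the list type, not to Entity
using (TextReader reader = new StringReader(entitiesXml))
entities = (List<Entity>)xmlSerializer.Deserialize(reader);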
conn.Close();
}
GetAllEntities (Stored Procedure):
declare @xmlString nvarchar(max) =(
select e.EntityIdentifier,
(
select n.[Full Name] as 'FullName',
n.[Name Type] as 'NameType'
from tblNames n
where e.EntityID=n.[Entity_ID]
for xml path('Name'), type
)
from tblEntities e
order by e.EntityID
for xml path('Entity')
)
select @xmlString as XmlString

Basically, you should avoid fetching all the data from your database and then filtering it in C# code: it loads and processes far more data than you actually need.
However, as a quick fix, you can improve performance by first putting your conditions into a hash-based lookup (a HashSet or Dictionary).
// Let's say you have myEntity here
var myEntity = new Entity();
var entities = new List<Entity>();
// Prepare the set of names you want to find once, up front, so it isn't rebuilt for every iteration.
// A HashSet avoids the duplicate-key exception ToDictionary throws when myEntity has repeated names,
// and the "|" separator keeps e.g. "ab" + "c" and "a" + "bc" from producing the same key.
var names = new HashSet<string>(myEntity.Names.Select(p => p.FullName + "|" + p.NameType));
IEnumerable<Entity> matchingEntities = entities.Where(e => e.Names.Any(n => names.Contains(n.FullName + "|" + n.NameType)));
This is just an example to give you some ideas; there is much more you can improve. I hope it helps.
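A variation on the same idea, sketched under the assumption that Entity and Name look like the classes in the question (CompareNameObjects is omitted because the set does the comparison). Instead of concatenating strings, it keys the HashSet on value tuples of (FullName, NameType), which compare component by component, so no separator is needed. The NameMatcher class is hypothetical, and excluding myEntity itself is an assumption based on "another entity in the set".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Name
{
    public string FullName { get; set; }
    public string NameType { get; set; }
}

public class Entity
{
    public string EntityIdentifier { get; set; }
    public List<Name> Names { get; set; } = new List<Name>();
}

public static class NameMatcher
{
    public static List<Entity> FindMatches(Entity myEntity, IEnumerable<Entity> entities)
    {
        // Build the lookup once. Value tuples compare each component separately,
        // so ("John SmithPri", "mary") does not collide with ("John Smith", "Primary").
        var wanted = new HashSet<(string FullName, string NameType)>(
            myEntity.Names.Select(n => (n.FullName, n.NameType)));

        // Each name check is an O(1) average hash lookup instead of a scan over myEntity.Names.
        return entities
            .Where(e => !object.ReferenceEquals(e, myEntity)   // skip myEntity itself
                        && e.Names.Any(n => wanted.Contains((n.FullName, n.NameType))))
            .ToList();
    }
}
```

If you only need to know whether there is any match, call Any() on the Where query directly instead of ToList(); it will stop at the first matching entity.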

Related

count based on lookup in LINQ

I have a table (or entity) named Cases. There is another table, CaseStatus_Lookup, whose primary key is a foreign key in the Cases table.
What I want to do is: for every status type, get the count of cases. For example, if status = in progress, I want to know how many cases are in that status.
One other thing: I also want to filter the Cases by UserID.
I tried several ways in LINQ but could not get very far. I was wondering if someone could help.
Try LINQ's .GroupBy.
I am assuming your entity structure; suppose your Case entity looks like this:
public class Case
{
public int Id{get;set;}
public int CaseStatusId{get;set;}
public int UserId{get;set;}
//navigational fields
public virtual CaseStatus CaseStatus {get;set;}
}
and suppose your CaseStatus entity looks like this:
public class CaseStatus
{
public int Id{get;set;}
public string Name{get;set;}
//navigational fields..
public virtual ICollection<Case> Cases{get;set;}
}
then you can do this:
using (myDbContext db = new myDbContext())
{
// 'case' is a reserved C# keyword, so it cannot be used as the lambda parameter name
var query = db.Cases.GroupBy(c => c.CaseStatus.Name)
.Select(group =>
new {
Name = group.Key,
Cases= group.OrderBy(x => x.Id),
Count= group.Count()
}
).ToList();
//query will give you count of cases grouped by CaseStatus.
}
Similarly, you can further filter your result by userId, for example with .Where(c => c.UserId == userId) before the GroupBy.
Start by exploring LINQ's .GroupBy.
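Here is a self-contained sketch of the same grouping with the userId filter added, run against in-memory lists instead of a DbContext (the CaseCounts class is hypothetical; Case and CaseStatus are trimmed versions of the assumed entities above). Projecting to { Status, Count } before ToDictionary is the shape LINQ providers usually translate into a SQL GROUP BY, but check that against your provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CaseStatus
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Case
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public CaseStatus CaseStatus { get; set; }
}

public static class CaseCounts
{
    // Filter by user first, then group by status name and count each group.
    public static Dictionary<string, int> CountByStatus(IEnumerable<Case> cases, int userId) =>
        cases.Where(c => c.UserId == userId)
             .GroupBy(c => c.CaseStatus.Name)
             .Select(g => new { Status = g.Key, Count = g.Count() })
             .ToDictionary(x => x.Status, x => x.Count);
}
```

Statuses with no cases for that user simply do not appear in the result; to list them with a count of 0 you would start the query from the CaseStatus set instead.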
You need a function that takes the status as a parameter and returns the count, something like the code below.
MyCaseStatusEnum caseStatus; //Pass your required status
int caseCount = myCases
.Where(r => r.Status == caseStatus)
.GroupBy(p => p.Status)
.Select(q => q.Count()).FirstOrDefault<int>();
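Since the Where already narrows the cases to a single status, the GroupBy produces at most one group, so the chain above reduces to a plain Count with a predicate. A small sketch, where MyCaseStatusEnum's members and the MyCase class are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the enum and case type used above.
public enum MyCaseStatusEnum { Open, InProgress, Closed }

public class MyCase
{
    public MyCaseStatusEnum Status { get; set; }
}

public static class StatusCounter
{
    // Same result as Where/GroupBy/Select/FirstOrDefault: the number of
    // matching cases, or 0 when none match.
    public static int CountWithStatus(IEnumerable<MyCase> cases, MyCaseStatusEnum status) =>
        cases.Count(c => c.Status == status);
}
```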

Linq to Entities, Take(3) from Joined table

I am trying to populate a ViewModel in an MVC app with data from a parent table joined with a child table. The only data I want from the child table is a comma-delimited string built from the Nomenclature field of the top three records, stored in a string field in the ViewModel. Here is what I have tried without success:
public IEnumerable<ReqHeaderVM> GetOpenReqs(string siteCode)
{
var openReqs = from h in context.ReqHeaders
join l in context.ReqLineItems on h.ID equals l.ReqID into reqLineItems
select new ReqHeaderVM
{
ReqID = h.ID,
ShopCode = h.ShopCode,
Nomenclatures = reqLineItems.Select(x => x.Nomenclature).Take(3) // This doesn't work
};
return (openReqs.ToList());
}
Here is the ViewModel:
public class ReqHeaderVM
{
[Editable(false)]
public string ReqID { get; set; }
public string ShopCode { get; set; }
public string Nomenclatures {get; set;}
}
Assuming that you have a proper relationship (foreign key) between ReqHeaders and ReqLineItems, this should give you what you are looking for:
public IEnumerable<ReqHeaderVM> GetOpenReqs(string siteCode)
{
var openReqs = from h in context.ReqHeaders
select new
{
ReqID = h.ID,
ShopCode = h.ShopCode,
Nomenclatures = h.ReqLineItems
.OrderBy(x => x.SomeColumn)
.Select(x => x.Nomenclature)
.Take(3)
};
var openReqsTran = from oreq in openReqs.AsEnumerable()
select new ReqHeaderVM
{
ReqID = oreq.ReqID,
ShopCode = oreq.ShopCode,
Nomenclatures = string.Join(", ", oreq.Nomenclatures)
};
// ToList() runs the query now, while the context is still available
return openReqsTran.ToList();
}
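Note that in the anonymous type, Nomenclatures is still a sequence of Nomenclature values, not a single string; string.Join turns it into one comma-separated string once the data is in memory (after AsEnumerable()).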
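The two-phase projection can be tried without a database by running it over an in-memory IQueryable. In this sketch the entity classes are hypothetical stand-ins, ID is assumed to be an int (hence ToString() for the string ReqID), and ordering by line-item ID stands in for SomeColumn.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the ReqHeaders/ReqLineItems entities.
public class ReqLineItem
{
    public int ID { get; set; }
    public string Nomenclature { get; set; }
}

public class ReqHeader
{
    public int ID { get; set; }
    public string ShopCode { get; set; }
    public List<ReqLineItem> ReqLineItems { get; set; } = new List<ReqLineItem>();
}

public class ReqHeaderVM
{
    public string ReqID { get; set; }
    public string ShopCode { get; set; }
    public string Nomenclatures { get; set; }
}

public static class ReqProjection
{
    public static List<ReqHeaderVM> GetOpenReqs(IQueryable<ReqHeader> headers)
    {
        // Phase 1: a projection a LINQ provider can translate (top three names per header).
        var openReqs = headers.Select(h => new
        {
            h.ID,
            h.ShopCode,
            Nomenclatures = h.ReqLineItems.OrderBy(x => x.ID).Select(x => x.Nomenclature).Take(3)
        });

        // Phase 2: string.Join runs in memory after AsEnumerable().
        return openReqs.AsEnumerable()
            .Select(o => new ReqHeaderVM
            {
                ReqID = o.ID.ToString(),
                ShopCode = o.ShopCode,
                Nomenclatures = string.Join(", ", o.Nomenclatures)
            })
            .ToList();
    }
}
```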
Yes, the join creates a single Cartesian result set (think tabular data), which is not the shape you are attempting to produce. To get the results you want, you have a few choices:
1. Use lazy loading and iterate over each header, querying its line items individually.
   pro - simple queries
   con - SELECT N+1
2. Query all headers and all line items, but build the view model with only the top 3.
   pro - single query
   con - large Cartesian result set; queries too much data
3. Query all headers and all associated lines individually.
   pro - 2 smaller, simpler queries
   con - queries too many line details
4. Query all headers and the top 3 lines per header in 2 queries.
   pro - gets only the information you require
   con - complex query for the top 3 lines per header
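A rough sketch of option 3: the two query results (headers and their line items) are represented here as in-memory lists, and the hypothetical Header/LineItem records and TwoQueryStitch class stand in for your entities. ToLookup groups the line items by foreign key once, and the top 3 are then picked per header in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes of the two query results.
public record Header(int ID, string ShopCode);
public record LineItem(int ID, int ReqID, string Nomenclature);

public static class TwoQueryStitch
{
    public static Dictionary<int, string> TopThreeByHeader(
        IEnumerable<Header> headers, IEnumerable<LineItem> lines)
    {
        // Group the line items by foreign key once; a missing key yields an empty sequence.
        ILookup<int, LineItem> byReq = lines.ToLookup(l => l.ReqID);

        return headers.ToDictionary(
            h => h.ID,
            h => string.Join(", ", byReq[h.ID]
                .OrderBy(l => l.ID)
                .Take(3)
                .Select(l => l.Nomenclature)));
    }
}
```

Option 4 differs only in that the top-3 filtering would happen in the second database query rather than after loading all the lines.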

LINQ (Dynamic): OrderBy within a GroupBy using dynamic linq?

I had the following query using normal LINQ, and it was working great (using an anonymous type):
var result = from s in Items
group s by s.StartTime into groupedItems
select new {groupedItems.Key, Items= groupedItems.OrderBy(x => x.Name) };
But using Dynamic LINQ I cannot get it to order by within the GroupBy.
result = Items.GroupBy("StartTime", "it").OrderBy("Name");
It says that Name isn't available. It is worth noting that if I remove my OrderBy, everything works, but the items inside each "Key" are not ordered.
This is a good question!
I simulated your situation by creating a class called Item.
public class Item
{
public DateTime StartTime { get; set; }
public string Name { get; set; }
}
and then created a basic list of items to do the groupby.
List<Item> Items = new List<Item>()
{
new Item() { StartTime = DateTime.Today, Name = "item2"},
new Item() { StartTime = DateTime.Today, Name = "item1"},
new Item() { StartTime = DateTime.Today.AddDays(-1), Name = "item3"},
};
Now, the big difference between the 2 queries is where the order by is performed. In the first query, groupedItems.OrderBy(x => x.Name) is performed on an IGrouping<DateTime,Item>, that is, on a single group at a time as the query iterates through all the groupings.
In the second query, the order by is performed after the fact. This means you're doing an order by on an IEnumerable<IGrouping<DateTime,Item>>, whose elements are groups with a Key but no Name property, because the grouping has already happened.
Fortunately, GroupBy has an overload that helps with this: it lets you specify the item returned for each group as it iterates through the collection. Here's an example of the code:
var expressionResult = Items.GroupBy(x => x.StartTime,
(key, grpItems) => new { key, Items = grpItems.OrderBy(y => y.Name) });
In the second argument of GroupBy you can specify a lambda expression that takes a key and the items under that key and returns an entry of your choosing, which is the same thing you're doing in the original query.
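A runnable version of the result-selector overload, using the same Item class and sample data as above (the GroupDemo class and the (Key, Names) tuple shape are illustrative choices). This uses the strongly typed Enumerable.GroupBy; it does not assume the Dynamic LINQ library exposes a matching string-based overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public DateTime StartTime { get; set; }
    public string Name { get; set; }
}

public static class GroupDemo
{
    // GroupBy(keySelector, resultSelector): the result selector sees each key
    // together with that group's items, so the ordering happens per group.
    public static List<(DateTime Key, List<string> Names)> GroupAndSort(IEnumerable<Item> items) =>
        items.GroupBy(
                x => x.StartTime,
                (key, grpItems) => (Key: key, Names: grpItems.OrderBy(y => y.Name).Select(y => y.Name).ToList()))
             .ToList();
}
```

Groups come out in the order their keys first appear in the source, so with the sample list the DateTime.Today group (item1, item2) is followed by the previous day's group (item3).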
Hope this helps!

linq: Using methods in select clause

I'm breaking my head over this and decided to share my problem with you.
I want to create an anonymous select from several tables, some of which may contain more than one result. I want to concatenate these results into one string.
I did something like this:
var resultTable = from item in dc.table
select new
{
id= item.id,
name= CreateString((from name in item.Ref_Items_Names
select name.Name).ToList()),
};
and the CreateString() is:
private string CreateString(List<string> list)
{
// string.Join puts ", " only between items, so there is no trailing separator
return string.Join(", ", list);
}
My intention was to convert the "name" query to a list and then send it to CreateString() to turn it into one long concatenated string.
I also tried using .Aggregate((current, next) => current + "," + next);
but when I try to convert my query to a DataTable like below:
public DataTable ToDataTable(Object query)
{
DataTable dt = new DataTable();
IDbCommand cmd = dc.GetCommand(query as IQueryable);
SqlDataAdapter adapter = new SqlDataAdapter();
adapter.SelectCommand = (SqlCommand)cmd;
cmd.Connection.Open();
adapter.Fill(dt);
cmd.Connection.Close();
return dt;
}
I'm getting an exception saying that dc.GetCommand() can't understand a query that uses the Aggregate method.
Later I tried even this simple query:
var resultTable = from item in dc.table
select new
{
name = CreateString()
};
Even when CreateString() just returns "success", nothing is inserted into "name".
Why is there no way of using methods in a select clause?
Thank you
Yotam
There is a difference between LINQ to Objects and LINQ to a database provider. Generally speaking, when using IQueryable you can't use arbitrary methods, only the ones your provider knows how to translate (into SQL, for example).
What you can do is to retrieve the data from the database and then do the formatting using LINQ to objects:
var data = from item in dc.table
where /* some condition */
select item;
var result = from item in data.AsEnumerable()
select new
{
name = SomeFunction(item)
};
The AsEnumerable() extension method forces processing using LINQ to objects.
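A self-contained sketch of that split, with an in-memory IQueryable standing in for dc.table (the TableItem/ItemName classes and TwoPhase helper are hypothetical; an in-memory query will accept any method, so this only shows the shape, not the provider error). Phase 1 selects only plain columns and the related names; phase 2, after AsEnumerable(), does the concatenation with ordinary .NET code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the table and its related names.
public class ItemName
{
    public string Name { get; set; }
}

public class TableItem
{
    public int Id { get; set; }
    public List<ItemName> RefItemsNames { get; set; } = new List<ItemName>();
}

public static class TwoPhase
{
    public static List<(int Id, string Names)> Load(IQueryable<TableItem> table)
    {
        // Phase 1 (translatable): only columns and related values, no custom methods.
        var data = table.Select(item => new
        {
            item.Id,
            Names = item.RefItemsNames.Select(n => n.Name)
        });

        // Phase 2 (LINQ to Objects): after AsEnumerable(), any .NET method may be used.
        return data.AsEnumerable()
                   .Select(x => (Id: x.Id, Names: string.Join(", ", x.Names)))
                   .ToList();
    }
}
```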
Forgive me if I've misinterpreted your question. It seems that what you are trying to do is abstract your select method for reuse. If so, you may consider projection using a lambda expression. For example:
internal static class MyProjectors
{
internal static Expression<Func<Object1, ReturnObject>> StringDataProjector
{
get
{
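// Object1 (source row) and ReturnObject (projection) are placeholders; requires System.Linq.Expressions.
// The body must construct ReturnObject to match Expression<Func<Object1, ReturnObject>>.
return d => new ReturnObject()
{
//assignment here
};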
}
}
}
Now you can select your datasets as such:
dc.Table.Select(MyProjectors.StringDataProjector)
As for the concatenation logic, what about selecting into some base class with an IEnumerable<string> property and a read-only property that handles the concatenation of the strings?
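A minimal sketch of that suggestion (NamedResult and its members are hypothetical names). The query only fills the Id and Names properties, which a provider can map; JoinedNames is computed in .NET code when it is read, so the provider never has to translate the concatenation.

```csharp
using System;
using System.Collections.Generic;

public class NamedResult
{
    public int Id { get; set; }

    // Filled by the query, e.g. (hypothetically):
    // dc.table.Select(item => new NamedResult { Id = item.id, Names = item.Ref_Items_Names.Select(n => n.Name) })
    public IEnumerable<string> Names { get; set; } = Array.Empty<string>();

    // Read-only, computed in memory on access.
    public string JoinedNames => string.Join(", ", Names);
}
```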

Selecting metadata using Linq and Reflection

Here's the situation:
I'm attempting to get a collection of all types in my assembly that implement a specific generic interface, along with the generic type parameters used. I have managed to put together a LINQ query that does this, but it seems awfully redundant.
I've read up on let and joins but couldn't see how I'd use them to reduce the verbosity of this particular query. Can anyone provide any tips on how to shorten or enhance the query, please?
Here's an MSTest class that currently passes and demonstrates what I'm trying to achieve:
[TestClass]
public class Sample
{
[TestMethod]
public void MyTest()
{
var results =
(from type in Assembly.GetExecutingAssembly().GetTypes()
where type.GetInterfaces().Any(x =>
x.IsGenericType &&
x.GetGenericTypeDefinition() == typeof(MyInterface<,>)
)
select new ResultObj(type,
type.GetInterfaces().First(x =>
x.IsGenericType &&
x.GetGenericTypeDefinition() == typeof(MyInterface<,>)
).GetGenericArguments()[0],
type.GetInterfaces().First(x =>
x.IsGenericType &&
x.GetGenericTypeDefinition() == typeof(MyInterface<,>)
).GetGenericArguments()[1]
)).ToList();
Assert.AreEqual(1, results.Count);
Assert.AreEqual(typeof(int), results[0].ArgA);
Assert.AreEqual(typeof(string), results[0].ArgB);
}
interface MyInterface<Ta, Tb>
{ }
class MyClassA : MyInterface<int, string>
{ }
class ResultObj
{
public Type Type { get; set; }
public Type ArgA { get; set; }
public Type ArgB { get; set; }
public ResultObj(Type type, Type argA, Type argB)
{
Type = type;
ArgA = argA;
ArgB = argB;
}
}
}
Regards,
Matt
Here is an example that shows how to rewrite this using the let keyword:
var results =
(from type in Assembly.GetExecutingAssembly().GetTypes()
// Try to find first such interface and assign the result to 'ifc'
// Note: we use 'FirstOrDefault', so if it is not found, 'ifc' will be null
let ifc = type.GetInterfaces().FirstOrDefault(x =>
x.IsGenericType &&
x.GetGenericTypeDefinition() == typeof(MyInterface<,>))
// Filtering and projection can now use 'ifc' that we already have
where ifc != null
// Similarly to avoid multiple calls to 'GetGenericArguments'
let args = ifc.GetGenericArguments()
select new ResultObj(type, args[0], args[1])).ToList();
The let keyword works a bit like a variable declaration, but lives within LINQ queries: it allows you to create a variable that stores some result that is needed in multiple places later in the query. You mention "joins" as well, but those are mostly used for database-like joins (and I'm not sure how they would apply here).
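For comparison, here is roughly what the let version looks like in method syntax: the compiler implements let by carrying the extra value along in an anonymous object, which is what the first Select does explicitly. This sketch takes the candidate types as a parameter (instead of calling Assembly.GetExecutingAssembly()) so it can be tested in isolation; GenericInterfaceFinder is a hypothetical name and a tuple replaces ResultObj.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

interface MyInterface<Ta, Tb> { }
class MyClassA : MyInterface<int, string> { }

public static class GenericInterfaceFinder
{
    public static List<(Type Type, Type ArgA, Type ArgB)> Find(IEnumerable<Type> types, Type openInterface) =>
        types
            // Equivalent of 'let ifc = ...': carry the matching interface (or null) with each type.
            .Select(t => new
            {
                Type = t,
                Ifc = t.GetInterfaces().FirstOrDefault(i =>
                    i.IsGenericType && i.GetGenericTypeDefinition() == openInterface)
            })
            .Where(x => x.Ifc != null)
            .Select(x =>
            {
                // Equivalent of 'let args = ...': call GetGenericArguments only once.
                Type[] args = x.Ifc.GetGenericArguments();
                return (Type: x.Type, ArgA: args[0], ArgB: args[1]);
            })
            .ToList();
}
```

Called as Find(Assembly.GetExecutingAssembly().GetTypes(), typeof(MyInterface<,>)), it should match the query above; it assumes the interface has exactly two type parameters.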
