Entity Framework SQL Selecting 600+ Columns - performance

I have a query generated by Entity Framework running against Oracle that's too slow: it takes about 4 seconds to run.
This is the main portion of my query:
var query = from x in db.BUILDINGs
            join pro_co in db.PROFILE_COMMUNITY on x.COMMUNITY_ID equals pro_co.COMMUNITY_ID
            join co in db.COMMUNITies on x.COMMUNITY_ID equals co.COMMUNITY_ID
            join st in db.STATE_PROFILE on co.STATE_CD equals st.STATE_CD
            where pro_co.PROFILE_NM == authorizedUser.ProfileName
            select new
            {
                COMMUNITY_ID = x.COMMUNITY_ID,
                COUNTY_ID = x.COUNTY_ID,
                REALTOR_GROUP_NM = x.REALTOR_GROUP_NM,
                BUILDING_NAME_TX = x.BUILDING_NAME_TX,
                ACTIVE_FL = x.ACTIVE_FL,
                CONSTR_SQFT_AVAIL_NB = x.CONSTR_SQFT_AVAIL_NB,
                TRANS_RAIL_FL = x.TRANS_RAIL_FL,
                LAST_UPDATED_DT = x.LAST_UPDATED_DT,
                CREATED_DATE = x.CREATED_DATE,
                BUILDING_ADDRESS_TX = x.BUILDING_ADDRESS_TX,
                BUILDING_ID = x.BUILDING_ID,
                COMMUNITY_NM = co.COMMUNITY_NM,
                IMAGECOUNT = x.BUILDING_IMAGE2.Count(),
                StateCode = st.STATE_NM,
                BuildingTypeItems = x.BUILDING_TYPE_ITEM,
                BuildingZoningItems = x.BUILDING_ZONING_ITEM,
                BuildingSpecFeatures = x.BUILDING_SPEC_FEATURE_ITEM,
                buildingHide = x.BUILDING_HIDE,
                buildinghideSort = x.BUILDING_HIDE.Count(y => y.PROFILE_NM == ProfileName) > 0 ? 1 : 0,
                BUILDING_CITY_TX = x.BUILDING_CITY_TX,
                BUILDING_ZIP_TX = x.BUILDING_ZIP_TX,
                LPF_GENERAL_DS = x.LPF_GENERAL_DS,
                CONSTR_SQFT_TOTAL_NB = x.CONSTR_SQFT_TOTAL_NB,
                CONSTR_STORIES_NB = x.CONSTR_STORIES_NB,
                CONSTR_CEILING_CENTER_NB = x.CONSTR_CEILING_CENTER_NB,
                CONSTR_CEILING_EAVES_NB = x.CONSTR_CEILING_EAVES_NB,
                DESCR_EXPANDABLE_FL = x.DESCR_EXPANDABLE_FL,
                CONSTR_MATERIAL_TYPE_TX = x.CONSTR_MATERIAL_TYPE_TX,
                SITE_ACRES_SALE_NB = x.SITE_ACRES_SALE_NB,
                DESCR_PREVIOUS_USE_TX = x.DESCR_PREVIOUS_USE_TX,
                CONSTR_YEAR_BUILT_TX = x.CONSTR_YEAR_BUILT_TX,
                DESCR_SUBDIVIDE_FL = x.DESCR_SUBDIVIDE_FL,
                LOCATION_CITY_LIMITS_FL = x.LOCATION_CITY_LIMITS_FL,
                TRANS_INTERSTATE_NEAREST_TX = x.TRANS_INTERSTATE_NEAREST_TX,
                TRANS_INTERSTATE_MILES_NB = x.TRANS_INTERSTATE_MILES_NB,
                TRANS_HIGHWAY_NAME_TX = x.TRANS_HIGHWAY_NAME_TX,
                TRANS_HIGHWAY_MILES_NB = x.TRANS_HIGHWAY_MILES_NB,
                TRANS_AIRPORT_COM_NAME_TX = x.TRANS_AIRPORT_COM_NAME_TX,
                TRANS_AIRPORT_COM_MILES_NB = x.TRANS_AIRPORT_COM_MILES_NB,
                UTIL_ELEC_SUPPLIER_TX = x.UTIL_ELEC_SUPPLIER_TX,
                UTIL_GAS_SUPPLIER_TX = x.UTIL_GAS_SUPPLIER_TX,
                UTIL_WATER_SUPPLIER_TX = x.UTIL_WATER_SUPPLIER_TX,
                UTIL_SEWER_SUPPLIER_TX = x.UTIL_SEWER_SUPPLIER_TX,
                UTIL_PHONE_SVC_PVD_TX = x.UTIL_PHONE_SVC_PVD_TX,
                CONTACT_ORGANIZATION_TX = x.CONTACT_ORGANIZATION_TX,
                CONTACT_PHONE_TX = x.CONTACT_PHONE_TX,
                CONTACT_EMAIL_TX = x.CONTACT_EMAIL_TX,
                TERMS_SALE_PRICE_TX = x.TERMS_SALE_PRICE_TX,
                TERMS_LEASE_SQFT_NB = x.TERMS_LEASE_SQFT_NB
            };
There is a section of code that tacks dynamic where and sort clauses onto the query, but I've left it out; the query takes about 4 seconds no matter what the where and sort contain.
I dropped the generated SQL into Oracle, and the explain plan didn't appear to show anything that screamed "fix me". The cost is 1554.
If this isn't allowed I apologize, but I can't seem to find a good way to share this information. I've uploaded the explain plan generated by SQL Developer here: http://www.123server.org/files/explainPlanzip-e1d291efcd.html
Table Layout
Building
--------------------
- BuildingID
- CommunityId
- Lots of other columns
Profile_Community
-----------------------
- CommunityId
- ProfileNM
- lots of other columns
state_profile
---------------------
- StateCD
- ProfileNm
- lots of other columns
Profile
---------------------
- ProfileNM
- a few other columns
All of the tables with a lot of columns have 120-150 columns each. It seems like Entity Framework is generating a SELECT statement that pulls every column from every table instead of just the ones I want.
The thing that's bugging me, and that I think might be my issue, is that my LINQ projection selects 50 items, but the generated SQL returns 677 columns. Returning so many columns may well be the source of the slowness.
Any ideas why I am getting so many columns back in SQL, or how to speed up my query?
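A note on capturing the generated SQL in EF4, for anyone reproducing this: LINQ to Entities queries are ObjectQuery-based, so the SQL can be read straight off the query object. A minimal sketch, where query is the LINQ query above:
// Cast succeeds for EF4 LINQ to Entities queries (ObjectQuery-based)
var objectQuery = query as System.Data.Objects.ObjectQuery;
if (objectQuery != null)
{
    Console.WriteLine(objectQuery.ToTraceString()); // the SQL sent to Oracle
}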

I have a suspicion that some of the time is going into object creation. Try running the query with just a basic "select x" and see whether it's the SQL query or the object creation that's taking the time.
Also, if the generated query is too complicated, you could try separating it into smaller sub-queries that gradually enrich your objects rather than trying to query everything at once.
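One thing worth checking: projecting entire navigation collections (BuildingTypeItems = x.BUILDING_TYPE_ITEM and friends) forces EF to select every mapped column of those related tables, which alone can account for hundreds of columns. Projecting only the fields you actually use keeps the generated SELECT narrow. A minimal sketch; the column name on the related entity is invented for illustration, so substitute your own:
var query = from x in db.BUILDINGs
            join co in db.COMMUNITies on x.COMMUNITY_ID equals co.COMMUNITY_ID
            select new
            {
                x.BUILDING_ID,
                x.BUILDING_NAME_TX,
                COMMUNITY_NM = co.COMMUNITY_NM,
                // Pull just the needed columns from the child collection
                // instead of whole BUILDING_TYPE_ITEM entities.
                // BUILDING_TYPE_CD is a hypothetical column name.
                BuildingTypes = x.BUILDING_TYPE_ITEM
                                 .Select(t => new { t.BUILDING_TYPE_CD })
            };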

I ended up creating a view that selects only the columns I want and joins the tables that needed to be left-joined in LINQ.
It's pretty annoying that EF selects every column from every table you join across, but I guess I only noticed because I'm joining a bunch of tables with 150+ columns each.
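Once the view is added to the model, it can be queried like any other entity set, and the generated SQL only touches the columns the view exposes. A sketch, assuming the view was mapped into the model as BUILDING_SEARCH_VW (a hypothetical name):
// The view does the joins and exposes only the ~50 columns the page needs,
// so the SELECT that EF generates stays narrow.
var query = from v in db.BUILDING_SEARCH_VW
            where v.PROFILE_NM == authorizedUser.ProfileName
            select v;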

Related

How to make zero counts show in LINQ query when getting daily counts?

I have a database table with a datetime column, and I simply want to count how many records there are per day, going back three months. I am currently using this query:
var minDate = DateTime.Now.AddMonths(-3);
var stats = from t in TestStats
            where t.Date > minDate
            group t by EntityFunctions.TruncateTime(t.Date) into g
            orderby g.Key
            select new
            {
                date = g.Key,
                count = g.Count()
            };
That works fine, but the problem is that if a day has no records, it doesn't appear in the results at all. For example:
3/21/2008 = 5
3/22/2008 = 2
3/24/2008 = 7
In that short example I want to make 3/23/2008 = 0. In the real query all zeros should show between 3 months ago and today.
Fabricating missing data is not straightforward in SQL. I would recommend fetching the data that is in SQL, then joining it to an in-memory list of all the relevant dates:
var stats = (from t in TestStats
             where t.Date > minDate
             group t by EntityFunctions.TruncateTime(t.Date) into g
             orderby g.Key
             select new
             {
                 date = g.Key,
                 count = g.Count()
             }).ToList(); // hydrate so we only query the DB once

// date is a DateTime? because TruncateTime returns a nullable, hence .Value
var firstDate = stats.Min(s => s.date).Value;
var lastDate = stats.Max(s => s.date).Value;

// Days + 1 so that lastDate itself is included in the range
var allDates = Enumerable.Range(0, (lastDate - firstDate).Days + 1)
                         .Select(i => firstDate.AddDays(i));

stats = (from d in allDates
         join s in stats on (DateTime?)d equals s.date into dates
         from ds in dates.DefaultIfEmpty()
         select new
         {
             date = (DateTime?)d,
             count = ds == null ? 0 : ds.count
         }).ToList();
You could also get a list of dates not in the data and concatenate them.
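A sketch of that alternative, reusing firstDate and lastDate from above: fabricate zero-count entries for the missing dates, then Concat them onto the real results and re-sort.
// Dates that actually appear in the data
var existing = new HashSet<DateTime?>(stats.Select(s => s.date));

// Build zero-count rows for the gaps, merge, and re-sort by date
var filled = Enumerable.Range(0, (lastDate - firstDate).Days + 1)
                       .Select(i => (DateTime?)firstDate.AddDays(i))
                       .Where(d => !existing.Contains(d))
                       .Select(d => new { date = d, count = 0 })
                       .Concat(stats)
                       .OrderBy(s => s.date)
                       .ToList();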
I agree with @D Stanley's answer, but want to throw an additional consideration into the mix: what are you doing with this data? Is it processed by the caller? Is it rendered in a UI? Is it transferred over a network?
Consider the size of the data and why you need the gaps filled in. If it is being returned over a network, for instance, I'd advise against filling in the gaps: all you're doing is increasing the size of the payload that has to be serialised, transferred, and deserialised.
If you are going to loop over the data to render a UI, why do you need the gaps at all? Just loop from the min date to the max date (like D Stanley's join) and substitute a default when no value is found.
If you ARE transferring over a network and you still NEED a single collection, consider applying D Stanley's solution on the other side of the wire.
Just things to consider...

Oracle Update with Joins

I'm trying to convert a few MS Access queries to Oracle. The following is one of the queries from MS Access:
UPDATE [RESULT] INNER JOIN [MASTER]
    ON ([RESULT].[LAST_NAME] = [MASTER].[LAST_NAME])
   AND ([RESULT].[FIRST_NAME] = [MASTER].[FIRST_NAME])
   AND ([RESULT].[DOCUMENT_NUMBER] = [MASTER].[DOCUMENT_NUMBER])
   AND ([RESULT].[BATCH_ID] = [MASTER].[LEAD_ID])
SET [MASTER].[CLOSURE_REASON] = "Closed For Name and Document Number Match",
    [MASTER].[RESULT_ID] = [RESULT].[ID],
    [MASTER].[RESULT_PID] = [RESULT].[PID]
WHERE (([MASTER].[CLOSURE_REASON] Is Null)
   AND ([MASTER].[REC_CODE] = "A1")
   AND ([RESULT].[EVENT_DATE] = [MASTER].[EVENT_DATE])
   AND ([RESULT].[EVENT_TYPE] = "Open")
   AND ([MASTER].[DOCUMENT_NUMBER] Is Not Null)
   AND ([MASTER].[DOCUMENT_NUMBER] <> "null"));
At first I received an "ORA-01779: cannot modify a column which maps to a non key-preserved table" error. I followed different examples (including MERGE) from this site and modified my original query; now I receive an "ORA-30926: unable to get a stable set of rows in the source tables" error.
Most of the examples show only one join between the tables, but I have to join on more columns to meet my requirements.
Any help translating this query into Oracle would be great. Thanks!
I believe this should be equivalent.
UPDATE master m
   SET closure_reason = 'Closed For Name and Document Number Match',
       (result_id, result_pid) = (SELECT r.id, r.pid
                                    FROM result r
                                   WHERE m.last_name = r.last_name
                                     AND m.first_name = r.first_name
                                     AND m.lead_id = r.batch_id
                                     AND m.document_number = r.document_number
                                     AND m.event_date = r.event_date
                                     AND r.event_type = 'Open')
 WHERE m.closure_reason IS NULL
   AND m.rec_code = 'A1'
   AND m.document_number IS NOT NULL
   AND m.document_number != 'null'
   AND EXISTS (SELECT 1
                 FROM result r
                WHERE m.last_name = r.last_name
                  AND m.first_name = r.first_name
                  AND m.lead_id = r.batch_id
                  AND m.document_number = r.document_number
                  AND m.event_date = r.event_date
                  AND r.event_type = 'Open')
Obviously, however, this isn't tested. If you could post the DDL to create your tables, the DML to insert a few rows, and the expected result, we could test our code and would likely be able to give you more accurate answers.
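As for the ORA-30926: it usually means the correlated subquery can return more than one RESULT row for a single MASTER row. If your data allows that, a MERGE against a de-duplicated source is another option. An untested sketch; the ROW_NUMBER ordering is arbitrary, so adjust it if a specific row should win:
MERGE INTO master m
USING (SELECT last_name, first_name, batch_id, document_number,
              event_date, id, pid
         FROM (SELECT r.*,
                      -- keep exactly one match per join key
                      ROW_NUMBER() OVER (PARTITION BY last_name, first_name, batch_id,
                                                      document_number, event_date
                                             ORDER BY id) rn
                 FROM result r
                WHERE r.event_type = 'Open')
        WHERE rn = 1) src
   ON (    m.last_name = src.last_name
       AND m.first_name = src.first_name
       AND m.lead_id = src.batch_id
       AND m.document_number = src.document_number
       AND m.event_date = src.event_date)
 WHEN MATCHED THEN
      UPDATE SET m.closure_reason = 'Closed For Name and Document Number Match',
                 m.result_id = src.id,
                 m.result_pid = src.pid
           WHERE m.closure_reason IS NULL
             AND m.rec_code = 'A1'
             AND m.document_number IS NOT NULL
             AND m.document_number != 'null';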

Wait for DomainContext.Load<T> from an EntityQuery with joins to complete (returning new type via 'select new')

My app consolidates data from other DBs for reporting purposes. We can't link the databases, so all the data processing has to be done in code - this is fine as we want to allow manual validation during the imports.
Certain users will be able to start an update through the Silverlight 4 front end.
I have 3 tables in database x that are fed from one EF4 Model (ModelX). I want to join those tables together, select specific columns and return the result as a new entity that exists in a different EF4 Model (ModelY). I'm using this query:
var myQuery = from i in DBx.table1
              from it in DBx.table2
              from h in DBx.table3
              where (i.id == it.id && h.otherid == i.otherid)
              select new ModelYServer { Name = i.name, Thing = it.thing, Stuff = h.stuff };
The bit I'm stuck on is how to execute that query and wait until the asynchronous call has completed. Normally, I'd use:
DomainContext.Load<T>(myQuery).Completed += (sender, args) =>
{
    List<T> myList = ((LoadOperation<T>)sender).Entities.ToList();
};
but I can't pass myQuery (an IEnumerable) into DomainContext.Load(), as that expects an EntityQuery. The dataset is very large and takes up to 30 seconds to return, so I definitely need to wait before continuing.
So can anyone tell me how I can wait for the IEnumerable query to complete, or suggest a better way of doing this (there very likely is one)?
Thanks
Mick
One simple way is just to force evaluation by calling ToList():
var query = from i in DBx.table1
            join it in DBx.table2 on i.id equals it.id
            join h in DBx.table3 on i.otherid equals h.otherid
            select new ModelYServer
            {
                Name = i.name,
                Thing = it.thing,
                Stuff = h.stuff
            };

// This will block until the results have been fetched
var results = query.ToList();
// Now use results...
(I've changed your where clause into joins on the earlier tables, since that's what you were effectively doing, and this is more idiomatic, IMO.)
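If you do end up going through DomainContext.Load (for instance, against a query method exposed by your domain service), note that Load returns a LoadOperation<T> whose Completed event and Entities collection you can use directly. A sketch; GetModelYQuery() is a hypothetical query method on the generated context:
// Hypothetical query method exposed by the domain service
EntityQuery<ModelYServer> entityQuery = DomainContext.GetModelYQuery();

LoadOperation<ModelYServer> op = DomainContext.Load(entityQuery);
op.Completed += (s, e) =>
{
    // Runs once the asynchronous load has finished
    List<ModelYServer> myList = op.Entities.ToList();
};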

Optimizing LINQ to SharePoint

I have three lists in SharePoint 2010 and working code that gets the lists and relates them. My problem is that my page takes around 15 seconds to load. I am a rank beginner with LINQ to SharePoint and LINQ in general. My question is: is there a way to make this code run faster?
SeatingChartContext dc = new SeatingChartContext(SPContext.Current.Web.Url);
EntityList<Seating_chartItem> seatCharts = dc.GetList<Seating_chartItem>("seating_chart");
EntityList<UsersItem> users = dc.GetList<UsersItem>("users");
EntityList<Excluded_usersItem> exusers = dc.GetList<Excluded_usersItem>("excluded_users");
// EntityList<LogsItem> logs = dc.GetList<LogsItem>("logs");

List<Seating_chartItem> seatList = (from seat in seatCharts
                                    where seat.Room == 0
                                    where seat.Floor == floor
                                    select seat).ToList();
List<UsersItem> usersList = (from user in users select user).ToList();
List<Excluded_usersItem> xusersList = (from xuser in exusers select xuser).ToList();

var results = from seat in seatList
              join user in usersList on seat.User_id equals user.User_id
              where seat.Room == 0
              where seat.Floor == floor
              where !(from xuser in xusersList select xuser.User_id).Contains(user.User_id)
              select new
              {
                  sid = seat.Seat_id,
                  icon = seat.Icon,
                  topCoord = seat.Top_coord,
                  leftCoord = seat.Left_coord,
                  name = user.Name,
                  phone = user.Phone,
                  mobile = user.Mobile,
                  content = seat.Content
              };
The time this code takes is frustrating, to say the least.
Thanks.
One immediate thing: you are re-querying xusersList every time within your join:
where !(from xuser in xusersList select xuser.User_id).Contains(user.User_id)
Instead, first extract just the user ids (since that is all you need):
var xusersList = (from xuser in exusers select xuser.User_id).ToList();
then use it directly:
where !xusersList.Contains(user.User_id)
Even better, determine the valid users before your query:
usersList = usersList.Where(user => !xusersList.Contains(user.User_id))
                     .ToList();
Now you can just completely remove this where condition from your query.
Also, these where conditions seem to be unnecessary:
where seat.Room == 0
where seat.Floor == floor
since you have already filtered seatList this way.
Having said that, you should log some performance data to see what actually takes the most time: is it acquiring the initial lists, or the actual join/LINQ query?
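Putting those pieces together, a consolidated sketch (assuming User_id is an int; a HashSet makes each exclusion test O(1) instead of a scan over the whole excluded list):
// Ids of excluded users, pulled once into a HashSet for cheap lookups
var excludedIds = new HashSet<int>(from xuser in exusers select xuser.User_id);

// Hydrate the users once, then filter in memory
List<UsersItem> usersList = (from user in users select user)
                            .ToList()
                            .Where(user => !excludedIds.Contains(user.User_id))
                            .ToList();

var results = from seat in seatList
              join user in usersList on seat.User_id equals user.User_id
              select new
              {
                  sid = seat.Seat_id,
                  icon = seat.Icon,
                  topCoord = seat.Top_coord,
                  leftCoord = seat.Left_coord,
                  name = user.Name,
                  phone = user.Phone,
                  mobile = user.Mobile,
                  content = seat.Content
              };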

Failed to batch insert in SubSonic3 with error "Must declare the scalar variable..."

I have run into a problem inserting multiple rows in a batch with SubSonic3. My development environment:
1. Visual Studio 2010, but targeting .NET 3.5
2. Active Record mode in SubSonic 3.0.0.4
3. SQL Server 2005 Express
4. Northwind sample database
I am using Active Record mode to insert multiple Products into the Products table. If I insert the rows one by one, either by calling aProduct.Add() or by calling Insert.Execute() multiple times (as in the code below), it works fine.
private static Product[] CreateProducts(int count)
{
    Product[] products = new Product[count];
    for (int index = 0; index < products.Length; ++index)
    {
        products[index] = new Product
        {
            ProductName = string.Format("cheka-test-{0}", index.ToString()),
            Discontinued = (index % 2 == 0),
        };
    }
    return products;
}

private static void SucceedByMultiExecuteInsert()
{
    Product[] products = CreateProducts(2);

    // -------------------------------- prepare batch
    NorthwindDB db = new NorthwindDB();
    var inserts = from prod in products
                  select db.Insert.Into<Product>(x => x.ProductName, x => x.Discontinued)
                                  .Values(prod.ProductName, prod.Discontinued);

    // -------------------------------- batch insert
    var selectAll = Product.All();
    Console.WriteLine("--- before total rows = {0}", selectAll.Count().ToString());
    foreach (Insert insert in inserts)
        insert.Execute();
    Console.WriteLine("+++ after inserting {0} rows, now total rows = {1}",
                      products.Length.ToString(), selectAll.Count().ToString());
}
But if I use BatchQuery, as in the code below,
private static void FailByBatchInsert()
{
    Product[] products = CreateProducts(2);

    // -------------------------------- prepare batch
    NorthwindDB db = new NorthwindDB();
    BatchQuery batchquery = new BatchQuery(db.Provider, db.QueryProvider);
    var inserts = from prod in products
                  select db.Insert.Into<Product>(x => x.ProductName, x => x.Discontinued)
                                  .Values(prod.ProductName, prod.Discontinued);
    foreach (Insert insert in inserts)
        batchquery.Queue(insert);

    // -------------------------------- batch insert
    var selectAll = Product.All();
    Console.WriteLine("--- before total rows = {0}", selectAll.Count().ToString());
    batchquery.Execute();
    Console.WriteLine("+++ after inserting {0} rows, now total rows = {1}",
                      products.Length.ToString(), selectAll.Count().ToString());
}
then it fails with the exception:
Unhandled Exception: System.Data.SqlClient.SqlException: Must declare the scalar variable "@ins_ProductName".
Must declare the scalar variable "@ins_ProductName".
Please give me some help solving this problem. Many thanks.
I ran into this problem as well. If you look at the query it's attempting to run, you'll see it doing something like this (this isn't actual code, but you'll get the point):
exec sp_executesql N'insert into MyTable (SomeField) Values (@ins_SomeField)', N'@0 varchar(32)', @0 = 'SomeValue'
For some reason it declares the parameters in the query as "@ins_" + FieldName but then passes them as ordinals. I have yet to determine the pattern for why/when it does this, but I've lost enough time this dev cycle futzing with SubSonic to try to diagnose the problem properly.
The work-around I implemented involves downloading the 3.0.0.4 source from GitHub and making a change on line 179 of Insert.cs.
Where it reads
ParameterName = _provider.ParameterPrefix + "ins_" + columnName.ToAlphaNumericOnly(),
changing it to
ParameterName = _provider.ParameterPrefix + Inserts.Count.ToString(),
seemed to do the trick for me. I make no warranties about this solution for you, expressed or implied; it worked for me, but your mileage may vary.
I should also note that there's similar logic around the update statements in Update.cs, on lines 181 and 194, but I haven't had those give me problems... yet.
Honestly, I don't think SubSonic is ready for primetime, and that's a shame because I really like how Rob set it up. That said, it's in my product now for better or worse, so you make the best of what you've got.
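If patching the SubSonic source isn't an option, another route is to keep the per-row Execute() calls (which the question shows working) and wrap them in a TransactionScope so the batch stays all-or-nothing. A sketch, reusing CreateProducts from above:
// Requires a reference to System.Transactions
private static void InsertWithTransactionScope()
{
    Product[] products = CreateProducts(2);
    NorthwindDB db = new NorthwindDB();

    using (TransactionScope scope = new TransactionScope())
    {
        foreach (Product prod in products)
        {
            db.Insert.Into<Product>(x => x.ProductName, x => x.Discontinued)
                     .Values(prod.ProductName, prod.Discontinued)
                     .Execute();
        }

        // If any insert throws, Complete() is never reached and
        // the whole transaction rolls back.
        scope.Complete();
    }
}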
