How to handle a large amount of data using LINQ in MVC - linq

I have a problem with a LINQ query. I am getting data from a SQL database using this query with date-time parameters (see below). The query takes a long time to execute, and after a long wait I get an error.
The data is available in the database, and when I use Take() with a number of rows, it works. I don't know how to track down the problem.
Is it possible that my query hits such a large amount of data that it fails? Can you please share any suggestions on how to solve this issue?
from ClassificationData in DbSet
where ClassificationData.CameraListId == id &&
ClassificationData.DateTime <= endDate &&
ClassificationData.DateTime >= startdate
orderby ClassificationData.Id descending
select ClassificationData

Your problem is probably more in the realm of SQL than LINQ. LINQ just translates what you write into Transact-SQL (T-SQL) that gets sent to SQL Server. If SQL Server is not set up properly, you'll get a timeout when the query takes too long.
You need to make sure that you have indexes on the ClassificationData table (I assume it's a table, but it could be a view -- whatever it is, it needs indexes if it has lots of rows). Make sure there is an index on DateTime and another on CameraListId. Then bring the data down unordered and do the order-by in a separate query on the local machine -- that lets SQL Server start returning data right away instead of sorting it first, reducing the chance of a timeout.
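As a rough illustration of the index-then-sort-locally advice, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for SQL Server. The table and column names mirror the question; the index name, the sample rows and the helper function are assumptions made for the example, and a single composite index on (CameraListId, DateTime) is just one way to cover both filter columns.

```python
import sqlite3


def fetch_classifications(conn, camera_id, start, end):
    """Filter on the indexed columns in SQL, then sort on the client."""
    rows = conn.execute(
        "SELECT Id, CameraListId, DateTime FROM ClassificationData "
        "WHERE CameraListId = ? AND DateTime >= ? AND DateTime <= ?",
        (camera_id, start, end),
    ).fetchall()
    # Sort locally (Id descending) instead of asking the server to sort.
    return sorted(rows, key=lambda row: row[0], reverse=True)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ClassificationData ("
    "Id INTEGER PRIMARY KEY, CameraListId INTEGER, DateTime TEXT)"
)
# Hypothetical composite index covering both filter columns.
conn.execute(
    "CREATE INDEX IX_Camera_DateTime "
    "ON ClassificationData (CameraListId, DateTime)"
)
conn.executemany(
    "INSERT INTO ClassificationData (CameraListId, DateTime) VALUES (?, ?)",
    [(1, "2020-01-01"), (1, "2020-01-05"), (2, "2020-01-03"), (1, "2020-02-01")],
)

result = fetch_classifications(conn, 1, "2020-01-01", "2020-01-31")
print(result)
```

In LINQ the same split would roughly correspond to materializing the filtered rows first and ordering the in-memory list afterwards.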
If your problems persist, write queries directly against SQL Server (in Query Analyzer, for instance, if they still use that -- I'm old school, but a bit rusty). Once you have SQL Server actually working, then you should be able to write the LINQ that translates into that SQL. But you could also make it a stored procedure, which has other benefits, but that's another story for another day.
Finally, if it's still too much data, break up your data into multiple tables, possibly by CameraListId, or perhaps a week at a time of data so the DateTime part of the query doesn't have as much to look through.
Good luck.

Related

Impala query with LIMIT 0

As a production support team member, I investigate issues with various Impala queries. While researching one issue, I saw that a team submits an Impala query with LIMIT 0, which obviously does not return any rows, and then submits it again without LIMIT 0, which gives them the result. I guess they submit these queries from IBM DataStage. Before I ask them why they do so, I wanted to check what reasons someone could have for running a query with LIMIT 0. Is it just to check the syntax or the connection to Impala? I see a similar question discussed here in the context of SQL, but thought I would ask anyway from an Impala perspective. Thanks, Neel
I think you are partially correct.
Please note that Impala processes all the data and then applies the LIMIT clause.
LIMIT 0 is mostly used to:
Check that the SQL syntax is correct. Impala does fetch all the records before applying the limit, so the SQL is completely validated. Some systems use this to check SQL they generated automatically before actually running it on the server.
Avoid fetching lots of rows from a huge table or data set every time you run a query.
Create an empty table using the structure of other tables when you do not want to copy the storage format, configuration, etc.
Avoid burdening Hue or whatever interface is interacting with Impala. All the data is processed, but none of it is returned.
Test performance. This gives you a rough idea of the run time of a query. I say rough because it is not the actual time to complete, only an estimate.
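The syntax-check and empty-table uses listed above can be sketched as follows. This is a minimal illustration in Python using the standard-library sqlite3 module, not Impala, so it says nothing about how much data Impala itself reads when LIMIT 0 is present; the table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT, created TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'start', '2020-01-01')")

# The statement is parsed and validated, but no rows come back.
cursor = conn.execute("SELECT id, name FROM events LIMIT 0")
rows = cursor.fetchall()
# Column metadata is still available for the empty result.
columns = [col[0] for col in cursor.description]

# A statement that references a missing column fails even with LIMIT 0.
try:
    conn.execute("SELECT missing_column FROM events LIMIT 0")
    is_valid = True
except sqlite3.OperationalError:
    is_valid = False

# Create an empty table with the same columns as the source query.
conn.execute("CREATE TABLE events_copy AS SELECT * FROM events LIMIT 0")
copy_count = conn.execute("SELECT COUNT(*) FROM events_copy").fetchone()[0]

print(rows, columns, is_valid, copy_count)
```

A tool that generates SQL automatically could run this kind of probe first and only submit the full query once the probe succeeds, which would match the pattern described in the question.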

MS Access 2010: query slows down dramatically when using parameters

I hope this was not asked here before (I did search around here, and did google for an answer, but could not find an answer)
The problem is: I'm using MS Access 2010 to select records from a linked table (there are millions of records in the table). If I specify the criteria (e.g. a date) directly, for example date=#1/1/2013#, the query returns in an instant. If I use parameters (add a parameter of type date/time and provide a value of 1/1/2013 when prompted, or a date in some different format, or reference a control on a form), the query takes minutes to load.
Please let me know if you have any ideas on what could be causing this. I do feel bad about asking such a question and possibly wasting someone's time...
Here's a potential answer. I didn't know this myself and did a little digging.
If performance is important, it may be necessary to prefer dynamic SQL even for where parameter queries are suitable due to how queries are optimized. Generally, Access creates a plan for a new query upon saving. When a query contains a parameter, then Access cannot know what value the parameter may contain and has to make a "good guess". Depending on which actual values are later supplied, it may be okay or poor, resulting in sub-optimal performance. In contrast, dynamic SQL sidesteps this because the "parameters" are hard-coded into the temporary string and thus a new plan is compiled with that value, guaranteeing optimal execution plan. Since compiling a new plan at runtime is very fast, it can be the case that dynamic SQL will outperform parameter queries.
Source: http://www.utteraccess.com/wiki/index.php/Parameter_Query#Performance
Also, if I had to guess, with your parameter query Access is requesting the ENTIRE table from Oracle and then filtering it down with your WHERE clause, whereas when the value is hard-coded in the WHERE clause it loads just those records and possibly makes use of indexes.
As for a solution, I would build your query string in VBA and then execute it. It opens you up to injection, but you can handle that.
Instead of using a saved parameter query object in Access, try something like this:
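Dim qr As String
' Format the date explicitly so the literal does not depend on regional settings.
qr = "SELECT * FROM myTable WHERE myDate = " & Format(Me.dateControl, "\#mm\/dd\/yyyy\#") & ";"
' DoCmd.RunSQL and CurrentDb.Execute accept only action queries, so open a recordset for a SELECT.
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset(qr)
As you noted in your reply, CurrentDb.OpenRecordset(qr) is the call to use for a SELECT.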
This would force the engine to make an execution plan at runtime rather than using a saved, potentially suboptimal plan. Let me know if this works out for you; I'd be interested to see.
Of course, the above reference about using parameters with Access (JET/ACE) applies ONLY to Access back ends, not ODBC ones like SQL Server or Oracle. Since you pointed out that you're using Oracle here, creating a view or using a pass-through query should resolve this performance issue. In any case, one does NOT want to use Access/JET parameters with data coming from a server-based system. At the very least send the server SQL strings, but much better would be to use a pass-through query. Pass-through queries are read-only, so if the result set requires editing, you have to create a view and link to that view.

ADO Search Performance

Because I am not familiar with ADO under the hood, I was wondering which of the two methods of finding a record generally yields quicker results in VB6.
Use a 'select' statement with 'where' as a qualifier. If the recordset count is zero, the record was not found.
Select all records and iterate through them with a client-side cursor until the record is found or the records run out.
The recordset is in the range of 10,000 records and will grow. Also, I am open to anything that will yield shorter search times other than what was mentioned.
SELECT count(*) FROM foo WHERE some_column='some value'
If the result is greater than 0 the record satisfying your condition was found in the database. It is unlikely you would get any faster than this. Proper indexes on the columns you are using in the WHERE clause could considerably improve performance.
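To make the comparison concrete, here is a small sketch of both approaches from the question, written in Python with the standard-library sqlite3 module rather than VB6 and ADO. The table `foo` and column `some_column` come from the query above; the index name, the row count and the function names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, some_column TEXT)")
conn.executemany(
    "INSERT INTO foo (some_column) VALUES (?)",
    [(f"value {i}",) for i in range(10_000)],
)
# Index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_foo_some_column ON foo (some_column)")


def found_with_where(conn, value):
    """Method 1: let the database do the search."""
    count = conn.execute(
        "SELECT COUNT(*) FROM foo WHERE some_column = ?", (value,)
    ).fetchone()[0]
    return count > 0


def found_by_iterating(conn, value):
    """Method 2: pull every row to the client and compare there."""
    for (candidate,) in conn.execute("SELECT some_column FROM foo"):
        if candidate == value:
            return True
    return False


print(found_with_where(conn, "value 9999"))
print(found_by_iterating(conn, "value 9999"))
```

Both functions give the same answer; the difference is that the first sends one value back to the client, while the second may transfer the whole table before it finds a match.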
In every case I can think of, selecting using the where clause is faster.
Even in situations where the client code will iterate through the whole database (file-based databases like Access, for example), you will have optimized code written in C or C++ doing the selection (in the database driver). This is always faster than VB6.
For database servers (SQL Server, MySQL, etc.), the performance increase can be even more pronounced. By using the where clause, you limit the amount of data that must be transmitted over the network, vastly improving the response time.
Some additional performance tips:
Select only the fields you want.
Build indexes on frequently used fields.
Watch what kind of recordset you are returning. Use Forward-only cursors if you are just returning data from a database.
Lastly, I was shocked by VB.NET's database performance, it being several times faster than the fastest VB6 code.

Performance on joins in linq

Hi,
I am going to rewrite a stored procedure in LINQ.
The stored procedure joins 12 tables, gets the data, and inserts it into another table.
It has 7 left outer joins and 4 inner joins, and it returns one row of data.
Now the questions:
1) What is the best way to achieve these joins in LINQ?
2) Do you think this will affect performance? (It only retrieves one row of data at any given time.)
Please advise.
Thanks,
SNA.
You might want to check this question for the multiple joins. I usually prefer lambda syntax, but YMMV.
As for performance: I doubt the query performance itself will be affected, but there may be some overhead in figuring out the execution plan, since it's such a complicated query. The biggest performance hit will likely be the extra database round trip you will need compared to the stored procedure. If I understand you correctly, your current SP does the SELECT AND INSERT all at once. Using LINQ to SQL or LINQ to Entities, you will need to fetch the data first before you can actually write them to the other table.
So whether rewriting is warranted depends on your usage. Alternatively, you can add stored procedures to your data model; each one is exposed as a method on your data context.
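The round-trip difference described above can be sketched like this, using Python and the standard-library sqlite3 module as a stand-in for the real database. The tables, columns and the two-join query are invented for the example and are far smaller than the 12-table join in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, note_id INTEGER);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE summary (order_id INTEGER, customer TEXT, note TEXT);
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (10, 1, NULL);
""")

JOIN_SQL = """
    SELECT o.id, c.name, n.body
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    LEFT OUTER JOIN notes AS n ON n.id = o.note_id
    WHERE o.id = ?
"""

# Option 1: select and insert in one statement, as the stored procedure does.
conn.execute("INSERT INTO summary " + JOIN_SQL, (10,))

# Option 2: fetch the row to the client, then write it back (two round trips).
row = conn.execute(JOIN_SQL, (10,)).fetchone()
conn.execute("INSERT INTO summary VALUES (?, ?, ?)", row)

rows = conn.execute("SELECT * FROM summary").fetchall()
print(rows)
```

Both options leave the same row in `summary`; the second is closer to what a fetch-then-insert flow through LINQ to SQL or LINQ to Entities would do.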

LinqToSQL DateTime filters?

I've got a LINQ to SQL query filtering and ordering by a date column that takes 20 seconds to run. When I run the generated SQL query directly on the DB it returns in 0 seconds.
var myObjs = DB.Table
.Where(obj => obj.DateCreated>=DateTime.Today)
.OrderByDescending(obj => obj.DateCreated);
The table has only 100,000 records and the DateTime column is indexed.
Just another in a long line of linqtosql performance grievances. But this one is SOO bad that I'm sure I must be doing something wrong.
I suspect the difference is that although running the generated query only takes 0 seconds, that's because it's not actually showing you all the results if you're using something like Enterprise Manager. Just fetching (and deserializing) all the data for 100,000 results could well take a significant amount of time, but your manual query is probably only showing you the first 20 hits or something similar.
If you run the same SQL in .NET and use a DataReader to fetch all the data, how long does it take then?
If you run the server with profiling turned on, how long does it say the query took to execute from LINQ to SQL?
Thanks guys...
The problem was mine, not linq's. For brevity I shortened the query in the question but there was actually another filter that had been applied to a NON indexed column. Adding the index solved the problem.
What threw me for a loop, though, was that, as Jon Skeet suggested, running the query in SQL Management Studio gave a false sense of confidence because the query was paged and very quickly returned the top 20 rows, leaving me to think LINQ was to blame. So the index problem only showed up in LINQ and not in SQL Management Studio.
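The paging effect is easy to reproduce. The sketch below, in Python with the standard-library sqlite3 module, times the same query twice: once reading only the first 20 rows, as a paged query window would, and once reading everything, as the LINQ query does. The table, the 100,000 identical rows and the `timed` helper are assumptions for the example, and the timings will vary by machine.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, date_created TEXT)")
conn.executemany(
    "INSERT INTO items (date_created) VALUES (?)",
    [("2020-01-01",) for _ in range(100_000)],
)

QUERY = "SELECT id, date_created FROM items WHERE date_created >= '2020-01-01'"


def timed(fetch):
    """Run QUERY, fetch rows with the given strategy, and time the call."""
    start = time.perf_counter()
    rows = fetch(conn.execute(QUERY))
    return rows, time.perf_counter() - start


# What a paged tool shows: only the first 20 rows are read.
first_page, page_seconds = timed(lambda cur: cur.fetchmany(20))
# What the application does: every matching row is read and materialized.
all_rows, all_seconds = timed(lambda cur: cur.fetchall())

print(len(first_page), page_seconds)
print(len(all_rows), all_seconds)
```

Comparing the two timings side by side is a quick way to check whether a "fast" manual query was really doing the same work as the application.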
I can't see anything wrong in your query. It would be great to see the T-SQL generated by Linq. Did you try that?
