I am using Azure db and my table has over 120000 records.
With paging of 10 records per page I am fetching data into an IQueryable, but fetching just 10 records takes around 2 minutes. The query has no joins and just 2 filters. Using Azure Search, by contrast, I can get the records within 3 seconds.
Please suggest how to minimise my LINQ query time, as Azure Search is costly.
Based on the query in the comment to your question, it looks like you are reading the entire db table into your app's memory. The data is potentially transferred from one side of the data centre to the other, which is what causes the performance issue.
unitofwork.Repository<EntityName>().GetQueriable().ToList().Skip(1).Take(10);
Without seeing the rest of the code, I'm just guessing your LINQ query should be something like this:
unitofwork.Repository<EntityName>().GetQueriable().Skip(1).Take(10).ToList();
Skip and Take should be executed on the db server, while .ToList() at the end will materialize the entities.
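For context, here is a minimal sketch of what a fully server-side paged query can look like once the filters, ordering and paging all stay on the IQueryable. The context, entity and property names are assumptions for illustration, not taken from the question:

using System.Collections.Generic;
using System.Linq;

public static class OrderPaging
{
    // MyDbContext, Order, Status, CustomerId and CreatedOn are hypothetical names.
    // Where/OrderBy/Skip/Take compose on IQueryable and are translated into a
    // single SQL statement; only the requested page of rows crosses the wire
    // when ToList() finally executes the query.
    public static List<Order> GetPage(MyDbContext db, string status, int customerId,
                                      int page, int pageSize = 10)
    {
        return db.Orders
            .Where(o => o.Status == status && o.CustomerId == customerId) // the two filters
            .OrderBy(o => o.CreatedOn)                                    // paging needs a stable order
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToList();                                                    // query executes here
    }
}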
I have to pull data from a table that has 2 million rows. The Eloquent query looks like this:
$imagesData = Images::whereIn('file_id', $fileIds)
->with('image.user')
->with('file')
->orderBy('created_at', 'DESC')
->simplePaginate(12);
The $fileIds array used in whereIn can contain 100s or even 1000s of file ids.
The above query works fine on a small table. But on the production site, where the Images table has over 2 million rows, it takes over 15 seconds to get a reply. I use Laravel for the API only.
I have read through other discussions on this topic. I changed paginate() to simplePaginate(). Some suggest that a DB:: query with whereRaw might work better than whereIn. Some say it might be due to how PDO in PHP processes whereIn, and some recommend using Images::whereIn, which I already use.
I use MariaDB with InnoDB as the DB engine, and it is loaded into RAM. The SQL queries perform well everywhere else; only the ones that have to gather data from a huge table like this take time.
How can I optimise the above Laravel query so that, if possible, the query response comes down to a couple of seconds when the table has millions of rows?
You need indexing, which segments your data by certain columns. You are filtering on file_id and ordering by created_at, so the following composite index will help performance.
$table->index(['file_id', 'created_at']);
Indexing will increase insert time and can occasionally produce odd execution plans. If you run SQL EXPLAIN on the query before and after adding the index, you can verify that it solves the problem.
Here is an update on steps taken to speed up the page load.
The real culprit behind such a slow response was not the above query alone. After the above query fetches the data, the PHP code iterates over the results and runs a sub-query inside the loop to check something. This sub-query searches by the filename column on each iteration. Since filename is a string and not indexed, the controller's response time got so long because each sub-query was crawling through 1.5 million rows inside a foreach loop. Once I removed this sub-query, the loading time decreased by a lot.
Secondly, I added the index to file_id and created_at as suggested by #mrhn and #ceejayoz above. I created a migration file like this:
Schema::table('images', function (Blueprint $table) {
    // composite index covering both the whereIn filter and the orderBy
    $table->index(['file_id', 'created_at']);
});
I optimised the PHP script further. I removed all queries that search by fileName and changed them to fetch results by id instead. Doing this made a huge difference throughout the app and also improved server speed thanks to less CPU work during peak hours.
Lastly, I optimised the server by performing the following steps:
Completed any yum updates
Updated LiteSpeed
Tweaked the Apache profile; this included enabling some caching modules and installing EA-PHP 7.3
Updated cPanel
Tuned MySQL to let the server utilise more of its resources.
Once I had completed all of the above steps, I found a huge difference in the loading speed. Here are the results:
[Before/after screenshots of the page load times omitted]
Thanks to each and every one of you who commented on this. Your comments helped me carry out all of the above steps, and the result was fruitful. Thanks heaps.
I did some googling on my subject title but didn't find a useful answer. Most questions were about the effect of the number of a table's columns on query performance, while I need to know the effect of the number of a table's rows on LINQ query response time in a C# MVC project. I have a web MVC project in which I fetch alarms from the server using an AJAX call and show them in a web grid on the client side. Each AJAX call is performed every 60 seconds in a loop created with the setTimeout method.
The number of rows in the alarm table (in a SQL Server database) is gradually increasing, and after a week it reaches thousands of rows. At first, when launching the project, I can see in the browser's DevTools (Chrome) that each AJAX call takes about 1 second or so. But this time gradually increases every day, and after a week each successful AJAX call takes more than 2 minutes. This causes about 5 AJAX calls to always be sitting in the pending queue. I am sure there is no memory leak in either the client (jQuery) or server (C#) code, so the only culprit I suspect is the response time of the SELECT query on the alarm table. I appreciate any advice.
Check the query that is executed in the database. There are two options:
Your Linq query fetches all the data from the database and processes it locally. In this case the number of rows in the table is quite important. You need to fix your Linq query and make sure it fetches only the relevant rows from the database. Then you may still hit option 2.
Your Linq query fetches only the relevant rows from the database, but there are no relevant indexes in your table and each query scans all the data in the table.
However, with only a few thousand rows in your table, I doubt a full scan would take 2 minutes, so option 1 is the more likely reason for this slowdown. A sketch of the difference between the two is below.
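To make option 1 versus a proper server-side query concrete, here is a minimal sketch; the context, entity and property names (MyDbContext, Alarm, RaisedAt) are assumptions for illustration, not taken from the question:

using System;
using System.Collections.Generic;
using System.Linq;

public static class AlarmQueries
{
    // MyDbContext, Alarm and RaisedAt are hypothetical names.

    // Option 1: ToList() executes first, so the whole Alarms table is pulled into
    // memory and the filtering/ordering happen locally; this gets slower every day
    // as rows accumulate.
    public static List<Alarm> GetRecentSlow(MyDbContext db, DateTime since)
    {
        return db.Alarms.ToList()
            .Where(a => a.RaisedAt >= since)
            .OrderByDescending(a => a.RaisedAt)
            .ToList();
    }

    // Server-side version: Where/OrderBy are translated to SQL, so only the
    // matching rows cross the wire; with an index on RaisedAt this stays fast
    // even as the table grows.
    public static List<Alarm> GetRecentFast(MyDbContext db, DateTime since)
    {
        return db.Alarms
            .Where(a => a.RaisedAt >= since)
            .OrderByDescending(a => a.RaisedAt)
            .ToList();
    }
}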
I am working in Oracle SQL Developer and have created tables with Wikipedia data, so the data is very large and spread across 7 tables. I have created a search engine which fetches and displays data using JSP, but the problem is that each query has to access 4 tables, which makes my application very slow.
I have added indexes to all the tables, but it still takes a long time, so I'd appreciate any suggestion on how to optimize my app and reduce the time it takes to display results.
There are several approaches you can take to tune your application, either at the database end, at the front end, or a combination of the two.
At the database end you could look at, say, a materialized view to summarize the more commonly searched data. This could either be for your search purposes only or to reduce the size and complexity of the resultset. You might also look at tuning the query itself, perhaps placing indexes on the columns used in your search's WHERE clauses, or look at denormalizing your tables.
At the application end, the retrieval of vast recordsets can always cause problems where a single record is large (many columns) and the number of records in the resultset is high.
What you are probably looking for is a rapid response time from your application so your user doesn't feel they are waiting ... and waiting.
A technique I have seen and used is to retrieve the resultset either as
1) a recordset of ROWIDs and to page through these ROWIDs on the display
2) a simulated "paged" recordset. Retrieving the recordset in chunks.
Recently I started learning Entity Framework.
The first question that came to my mind is:
When we use LINQ to fetch data in EF, every query like this:
var a = from p in contacts select p.name ;
will be converted to a SQL command like this:
select name from contacts
Does this conversion repeat every time we run the query?
I heard that stored procedures are cached in the database; does the same thing happen for LINQ queries in Entity Framework?
And finally, is my question clear?
I think the LINQ query is converted each time you execute it. To improve performance you can use compiled queries.
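A minimal sketch of a compiled query, assuming a classic ObjectContext-based model; the MyEntities context and its Contacts set are hypothetical names (EF Core offers EF.CompileQuery for the same purpose):

using System;
using System.Data.Objects; // CompiledQuery lives here in EF 4/5 (System.Data.Entity.Core.Objects in EF6)
using System.Linq;

static class ContactQueries
{
    // MyEntities is a hypothetical ObjectContext-derived context.
    // The LINQ-to-SQL translation is done once, the first time the delegate is
    // invoked; later calls reuse the cached translation instead of re-parsing
    // the expression tree.
    public static readonly Func<MyEntities, IQueryable<string>> Names =
        CompiledQuery.Compile((MyEntities ctx) =>
            from p in ctx.Contacts select p.Name);
}

// Usage:
// using (var ctx = new MyEntities())
// {
//     var names = ContactQueries.Names(ctx).ToList();
// }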
There are all sorts of optimizations being made, both in LINQ expression caching and in what SQL Server chooses to cache. The only way to know for sure is to measure your performance and memory consumption.
To see what SQL is created you can use http://efprof.com/, which I've found quite good. You can get some of this info through SQL Profiler; it's just a lot more work.
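If you only want to inspect the generated SQL from code, there are also built-in hooks; here is a small sketch assuming a hypothetical MyEntities model (ToTraceString() belongs to classic EF's ObjectQuery, and Database.Log is the EF6 DbContext equivalent):

using System;
using System.Data.Objects; // ObjectQuery in EF 4/5
using System.Linq;

class ShowGeneratedSql
{
    static void Main()
    {
        using (var ctx = new MyEntities()) // hypothetical ObjectContext-based model
        {
            var query = from p in ctx.Contacts select p.Name;

            // A LINQ-to-Entities query is an ObjectQuery under the hood, and
            // ToTraceString() returns the SQL it will send to the server.
            Console.WriteLine(((ObjectQuery)query).ToTraceString());

            // EF6 DbContext alternative: log every SQL statement as it executes.
            // ctx.Database.Log = Console.Write;
        }
    }
}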
What is the best way, in terms of platform speed and maintainability, to access data (read only) on Dynamics CRM 4? I've done all three, but I'm interested in the opinions of the crowd.
Via the API
Via the webservices directly
Via DB calls to the views
...and why?
My thoughts normally center around DB calls to the views but I know there are purists out there.
Given both requirements I'd say you want to call the views. Properly crafted SQL queries will fly.
Going through the API is required if you plan to modify data, but it isn't the fastest approach around because it doesn't allow deep loading of entities. For instance, if you want to look at customers and their orders, you'll have to load both up individually and then join them manually, whereas a SQL query will already have the data joined.
Never mind that the TDS stream is a lot more efficient than the SOAP messages used by the API and web services.
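To illustrate the "join them manually" point, here is a rough sketch of the client-side stitching you end up writing once the API has handed you customers and orders as two separate result sets (retrieval code omitted; the types below are hypothetical, not the CRM SDK classes):

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes of what two separate API calls might return.
record Customer(Guid Id, string Name);
record Order(Guid Id, Guid CustomerId, decimal Total);

static class ClientSideJoin
{
    // The join happens in application memory, after both result sets have
    // already been pulled over SOAP one entity type at a time; a SQL query
    // against the views would have done this in a single round trip.
    public static IEnumerable<(string Customer, decimal Total)> CustomersWithOrders(
        IReadOnlyList<Customer> customers, IReadOnlyList<Order> orders)
    {
        return from c in customers
               join o in orders on c.Id equals o.CustomerId
               select (c.Name, o.Total);
    }
}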
UPDATE
I should point out, in regard to the views and the CRM database in general: CRM does not optimize the indexes on the tables or views for custom entities (how could it?). So if you have a truckload entity that you look up by destination all the time, you'll need to add an index for that property. Depending upon your application, it could make a huge difference in performance.
I'll add to Jake's comment by saying that querying against the tables directly instead of the views (*base & *extensionbase) will be even faster.
In order of speed it'd be:
direct table query
view query
filtered view query
api call
Direct table updates:
I disagree with Jake that all updates must go through the API. The correct statement is that going through the API is the only supported way to do updates. There are in fact several instances where directly modifying the tables is the most reasonable option:
One time imports of large volumes of data while the system is not in operation.
Modification of specific fields across large volumes of data.
I agree that this sort of direct modification should only be a last resort when the performance of the API is unacceptable. However, if you want to modify a boolean field on thousands of records, doing a direct SQL update to the table is a great option.
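As a rough sketch of what such a direct update can look like from .NET: the table and column names below are invented for illustration and are not the real CRM schema, and this approach is unsupported, so back up first and run it while the system is offline.

using System.Data.SqlClient;

static class DirectUpdate
{
    // Flips a hypothetical boolean flag on every matching row with one
    // set-based statement, instead of issuing thousands of individual
    // API update calls.
    public static int MarkProcessed(string connectionString)
    {
        const string sql =
            "UPDATE new_truckloadExtensionBase " +  // hypothetical custom entity table
            "SET new_processed = 1 " +
            "WHERE new_processed = 0";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            return cmd.ExecuteNonQuery(); // rows updated
        }
    }
}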
Relative Speed
I agree with XVargas as far as relative speed.
Unfiltered Views vs Tables: I have not found the performance advantage to be worth the hassle of manually joining the base and extension tables.
Unfiltered views vs Filtered views: I recently was working with a complicated query which took about 15 minutes to run using the filtered views. After switching to the unfiltered views this query ran in about 10 seconds. Looking at the respective query plans, the raw query had 8 operations while the query against the filtered views had over 80 operations.
Unfiltered Views vs API: I have never compared querying through the API against querying views, but I have compared the cost of writing data through the API vs inserting directly through SQL. Importing millions of records through the API can take several days, while the same operation using insert statements might take several minutes. I assume the difference isn't as great during reads but it is probably still large.