Simple UPDATE query on a very big SQL table - performance

I have the following update query:
SET NOCOUNT ON;
DECLARE @rows INT, @count INT, @message VARCHAR(100);
SET @rows = 1;
SET @count = 0;
WHILE @rows > 0
BEGIN
    -- Update in batches of 100,000 rows to keep each transaction small
    UPDATE TOP (100000) [dbo].[Table]
    SET T1 = 212
    WHERE T1 = -10;
    -- Stop when a batch updates no rows
    SET @rows = @@ROWCOUNT;
END
My table contains more than 300 million rows. I have set my Azure SQL Database to Premium P2 with 250 DTUs. As you can see in the figure, DTU usage stays at almost 70%.
My question now is: if I scale out my DTUs to 500, could my update query run faster?

If you are running on a heap (no clustered index), then what you are doing is scanning up to 300M rows to try to find the first 100k rows that match your condition. Then you do this potentially multiple times. You may also spool the rows into tempdb.
If you are running on a clustered index on the column in the where clause, then you will do a range scan on the subset you want to update (which is more efficient). Please consider trying this.
Your current query is either CPU bound (if the pages fit in memory) or IO bound (if they don't). Increasing the DTUs will help in either case to improve your query performance. However, you will be happier if you fix the algorithmic issue in your query as your next step.
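For example, here is a minimal sketch of the clustered-index suggestion above (index names are illustrative, and this assumes T1 is the only column in the filter). With an index leading on T1, each batch can seek directly to the remaining T1 = -10 rows instead of rescanning the heap:

CREATE CLUSTERED INDEX CIX_Table_T1 ON [dbo].[Table] (T1);

-- or, if changing the clustered index is not an option, a filtered
-- nonclustered index covering only the rows still to be updated:
CREATE NONCLUSTERED INDEX IX_Table_T1_pending
ON [dbo].[Table] (T1)
WHERE T1 = -10;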

1. If I scale out my DTUs to 500, could my update query run faster?
Yes, it can.
However, Azure doesn't recommend that you scale your DTUs directly to improve the performance of your database. Please see: Improving database performance with more resources.
Summary:
As a general guideline, if your CPU utilization is consistently at or above 80%, you have a running-related performance issue. If you have a running-related issue, it may be caused by insufficient CPU resources or it may be related to one of the following conditions:
Too many running queries
Too many compiling queries
One or more executing queries are using a sub-optimal query plan
Finally, if there are no actionable items that can improve performance of your database, you can change the amount of resources available in Azure SQL Database. You can assign more resources by changing the DTU service tier of a single database or increase the eDTUs of an elastic pool at any time.
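For completeness, here is a hedged example of changing the service objective from T-SQL (the database name is a placeholder; P4 is the Premium objective with 500 DTUs, and the same change can be made from the portal, PowerShell or the CLI):

ALTER DATABASE [YourDatabase] MODIFY (SERVICE_OBJECTIVE = 'P4');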
As Clay said, for better performance at lower cost, Azure also provides suggestions to help you improve performance. One of these suggestions is to optimize the query.
For more details, please see: Monitoring and performance tuning.
Hope this helps.

Related

Random spikes in usage (CockroachCloud Serverless)

I recently set up a free CockroachDB Serverless cluster on CockroachCloud. It's been really great so far, but sometimes there are random spikes in Request Units even though the number of SQL statements doesn't increase at all. Here's a screenshot of the two graphs on the cluster management page; it illustrates pretty well what I mean. I would really appreciate some help on how I could eliminate these spikes, because CockroachCloud has some limits on free usage. That being said, I'm still fairly new to CockroachDB, so I might be missing something obvious.
You are likely performing enough mutations on your data to trigger automatic statistics collection as a background process. By default when 20% or more rows are modified in a table, CockroachDB will trigger a statistics refresh. The statistics are used by the optimizer to create more efficient query plans.
Your SQL Statements graph indicates that almost all your operations are inserts. That many inserts is almost certainly triggering stats collection. While you can turn off stats collection, the optimizer will then be using stale data to calculate query plans, potentially causing performance problems.
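If you want to inspect or tune that behavior, here is a hedged sketch of the relevant cluster settings (setting names are to the best of my knowledge; check the docs for your version):

-- Is automatic statistics collection enabled?
SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled;
-- A refresh is triggered when roughly this fraction of rows has changed (default 0.2, i.e. 20%)
SHOW CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows;
-- Turning collection off entirely is possible, but usually not recommended
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;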
The occasional spikes in your Request Unit graph are above the 100 RUs per second baseline, but the rest of the time you are well below 100 RUs per second. That means you are accumulating RUs most of the time, and that (plus the initial 10 million RU allocation) should cover the bursts.
I added a FAQ entry to the Serverless docs covering this.

Best way to retrieve 150,000 records from Oracle with JDBC

I have been searching for an answer to this today, and it seems the best approach divides opinion somewhat.
I have 150,000 records that I need to retrieve from an Oracle database using JDBC. Is it better to retrieve the data using one select query and allowing the JDBC driver to take care of transferring the records from the database using Oracle cursor and default fetchSize - OR to split up the query into batches using LIMIT / OFFSET?
With the LIMIT / OFFSET option, I think the pros are that you can take control over the number of results you return in each chunk. The cons are that the query is executed multiple times, and you also need to run a COUNT(*) up front using the same query to calculate the number of iterations required.
The pros of retrieving all at once are that you rely on the JDBC driver to manage the retrieval of data from the database. The cons are that the setFetchSize() hint can sometimes be ignored, meaning we could end up with a huge ResultSet containing all 150,000 records at once.
Would be great to hear some real life experiences solving similar issues, and recommendations would be much appreciated.
The native way in Oracle JDBC is to use prepareStatement for the query, executeQuery, and then fetch the results in a loop with a defined fetchSize.
Yes, of course the details depend on the Oracle Database and JDBC driver versions, and in some cases the requested fetchSize can be ignored. But the typical problem is that the fetch size gets reset to fetchSize = 1, so you effectively make a round trip for each record (not that you get all records at once).
Your alternative with LIMIT seems reasonable at first glance. But if you investigate the implementation, you will probably decide not to use it.
Say you divide the result set into 15 chunks of 10K each:
You open 15 queries, each of them on average consuming about half the resources of the original query (OFFSET still selects the skipped data and then discards it).
So the only thing you achieve is that the processing takes approximately 7.5x more time.
Best Practice
Take your query, write a simple script that fetches it with JDBC, and use a 10046 trace to see the fetch size actually in effect.
Test with a range of fetch sizes, observe the performance, and choose the optimal one.
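As a hedged sketch of that best practice: the trace must be enabled on the same session your JDBC program uses, so issue statements like these over the JDBC connection around the test query, then inspect the resulting trace file (e.g. with tkprof). The tracefile identifier is just a label of your choosing:

ALTER SESSION SET tracefile_identifier = 'jdbc_fetch_test';
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8'; -- level 8 includes wait events
-- ... execute the query and fetch the full result set via JDBC here ...
ALTER SESSION SET EVENTS '10046 trace name context off';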
My preference is to maintain a safe execution time with the ability to continue if interrupted. I prefer this approach because it is future-proof and respects memory and execution time limits. Remember, you're not planning for today, you're planning for six months down the road: what may be 150,000 rows today may be 1.5 million in six months.
I use a length + 1 recipe to know whether there is more to fetch, although the count query will let you show a progress bar in % if that is important.
When considering a 150,000-record result set, this is a memory pressure question, and it depends on the average size of each row. If it is a row with three integers, that's small; if it is a row with a bunch of text elements storing user profile details, that's potentially very large. So be prudent about which fields you pull.
You also need to ask whether you really need to pull all the records all the time. It may be useful to apply a sync pattern and only pull records with an updated date newer than your last pull (a sketch follows below).
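A hypothetical sketch of that sync pattern (table, column and bind names are illustrative, not from the original question):

SELECT id, payload, updated_at
FROM my_table
WHERE updated_at > :last_sync_time -- only rows changed since the previous pull
ORDER BY updated_at;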

to index or not in Oracle Exadata

I am new to Oracle Exadata. My question is, to Index or not to Index in Exadata?
I found some blogs which say not to create database indexes and to rely only on storage indexes, which are temporary, but there is no official documentation from Oracle which says not to index in Exadata.
What are the issues if I index in Exadata (since it is implemented with in-memory concepts)? Will it improve or degrade performance? Is it better to drop an index if it is already created?
We have huge data, 15 million rows plus and growing, in Oracle Exadata, with VARCHARs, CLOBs and other common datatypes, and no indexes created except primary keys. Why is a simple select query with a few where conditions taking 10 to 12 minutes to execute against 15 million records? Oracle says Exadata is the fastest database on the planet.
The decision for an index is independent of the platform. It is always the same process, namely:
Do the benefits of having the index outweigh the costs of having the index?
Costs
has to be maintained
space overhead
might increase contention in high insert/update/delete frequency environments
Benefits
faster response times
The reason you might have fewer indexes in Exadata is that if other mechanisms (storage indexes, compression, flash, etc.) can give you response times that meet your business requirements, then you can save yourself the drawbacks of those indexes.
But the decision process remains identical - cost vs benefit.
A common technique to assess an existing index is to make it invisible and see if there is an adverse (or beneficial) impact. In that way, if you have to revert and keep the index, there is no cost in doing so.
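A minimal sketch of that technique (the index name is illustrative). The optimizer stops considering an invisible index, but the index is still maintained, so reverting is instant:

ALTER INDEX my_candidate_index INVISIBLE;
-- ... run the workload and compare performance ...
ALTER INDEX my_candidate_index VISIBLE; -- revert if the impact is adverse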
In addition to Connor's answer, be aware that an index is not always the best way to access the data. This is true even on non-Exadata storage systems. The process and considerations of whether to use an index are independent of Exadata; what Exadata does is give you more reasons/capabilities not to use an index.
The Oak Table article (linked in an earlier comment) shows why, in Exadata, it is almost always better not to have an index. The note from Oracle below explains why. In a non-Exadata DB, dumb storage returns blocks (usually 8K), not rows, so for large tables a full table scan (FTS) is almost always a bad thing (unless you need most rows). Exadata has smart storage that has information from the query and tries to eliminate bytes that won't answer it; it tries to return only the bytes (not blocks) that may answer the query. This lowers I/O back to the DB for processing, so an FTS is not so bad and may actually be preferred. As a DBA, I have a DB with 12TB, and many times I have to stick in a NO_INDEX hint to improve queries. This goes against normal modeling theory. Retrieving data from disk is the slowest process in the DB. Exadata removes unneeded data early in the process (at the storage level) and lessens the amount of data sent back to the DB for processing. Many times, my FTS on an 8 billion row table is much faster than when using an index... only in Exadata ;)
http://www.oracle.com/technetwork/testcontent/o31exadata-354069.html

Oracle select query performance

I am working on an application. It is in its initial stage, so the number of records in the table is not large, but later on it will have around 1 million records in the same table.
I want to know what points I should consider while writing a select query that will fetch a huge amount of data from the table, so that it does not slow down performance.
First rule:
Don't fetch huge amounts of data back to the application.
Unless you are going to display every single one of the items in the huge amount of data, do not fetch it. Communication between the DBMS and the application is (relatively) slow, so avoid it when possible. It isn't so slow that you shouldn't use the DBMS or anything like that, but if you can reduce the amount of data flowing between DBMS and application, the overall performance will usually improve.
Often, one easy way to do this is to list only those columns you actually need in the application, rather than using 'SELECT *' to retrieve all columns when you'll only use 4 of the 24 that exist.
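A trivial illustration, with hypothetical table and column names: fetch the 4 columns the application actually displays rather than all 24.

SELECT customer_id, last_name, email, created_at
FROM customers
WHERE created_at >= DATE '2024-01-01';
-- rather than: SELECT * FROM customers WHERE created_at >= DATE '2024-01-01';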
Second rule:
Try to ensure that the DBMS does not have to look at huge amounts of data.
To the extent possible, minimize the work that the DBMS has to do. It is busy, and typically it is busy on behalf of many people at any given time. If you can reduce the amount of work that the DBMS has to do to process your query, everyone will be happier.
Consider things like ensuring you have appropriate indexes on the table - not too few, not too many. Designed judiciously, indexes can greatly improve the performance of many queries. Always remember, though, that each index has to be maintained, so inserts, deletes and updates are slower when there are more indexes to manage on a given table.
(I should mention: none of this advice is specific to Oracle - you can apply it to any DBMS.)
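To make the second rule concrete, a hedged example (object names are illustrative): an index matching a frequent WHERE or join pattern lets the DBMS touch only the relevant rows, at the cost of slightly slower writes.

CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date);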
To get good performance from a database, there are a lot of things you need to keep in mind. First is the design, where you should primarily think about normalization and denormalization (split up tables, but not so much that performance-heavy joins are required).
There is often a lot of tuning to do when it comes to performance. However, 80% of the performance is determined by the SQL code. Below are some links that might help you.
http://www.smart-soft.co.uk/Oracle/oracle-performance-tuning-part7.htm
http://www.orafaq.com/wiki/Oracle_database_Performance_Tuning_FAQ
A few points to remember:
Fetch only the columns you need to use on the client side.
Ensure you set up the correct indexes that are going to help you find records. These can be done later, but it is better to plan for them if you can.
Ensure you have properly accounted for column widths and data sizes. Don't use an INT when a TINYINT will hold all possible values. A row with 100 TINYINT fields will fetch faster than a row with 100 INT fields, and you'll also be able to fetch more rows per read.
Depending on how clean you need the data to be, it may be permissible to do a "dirty read", where the database fetches data while an update is in progress. This can speed things up significantly in some cases, though it means the data you get might not be the absolute latest.
Give your DBA beer. And hugs.
Jason

DB Index speed vs caching

We have about 10K rows in a table. We want to have a form where we have a select drop down that contains distinct values of a given column in this table. We have an index on the column in question.
To increase performance I created a little cache table that contains the distinct values so we didn't need to do a select distinct field from table against 10K rows. Surprisingly it seems doing select * from cachetable (10 rows) is no faster than doing the select distinct against 10K rows. Why is this? Is the index doing all the work? At what number of rows in our main table will there be a performance improvement by querying the cache table?
For a DB, 10K rows is nothing. You're not seeing much difference because the actual calculation time is minimal, with most of it consumed by other, constant, overhead.
It's difficult to predict when you'd start noticing a difference, but it would probably be at around a million rows.
If you've already set up caching and it's not detrimental, you may as well leave it in.
10k rows is not much... start caring when you reach 500k ~ 1 million rows.
Indexes do a great job, especially if you just have 10 different values for that index.
This depends on numerous factors - the amount of memory your DB has, the size of the rows in the table, use of a parameterised query and so forth, but generally 10K is not a lot of rows and particularly if the table is well indexed then it's not going to cause any modern RDBMS any sweat at all.
As a rule of thumb I would generally only start paying close attention to performance issues on a table when it passes the 100K rows mark, and 500K doesn't usually cause much of a problem if indexed correctly and accessed by such. Performance usually tends to fall off catastrophically on large tables - you may be fine on 500K rows but crawling on 600K - but you have a long way to go before you are at all likely to hit such problems.
Is the index doing all the work?
You can tell how the query is being executed by viewing the execution plan.
For example, try this:
explain plan for select distinct field from table;
select * from table(dbms_xplan.display);
I notice that you didn't include an ORDER BY on that. If you do not include ORDER BY then the order of the result set may be random, particularly if Oracle uses the HASH algorithm for making a distinct list. You ought to check that.
So I'd look at the execution plans for the original query that you think is using an index, and at the one based on the cache table. Maybe post them and we can comment on what's really going on.
Incidentally, the cache table would usually be implemented as a materialised view, particularly if the master table is generally pretty static.
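A hedged sketch of that materialised view approach (object names are illustrative; the refresh strategy would depend on how often the master table changes):

CREATE MATERIALIZED VIEW mv_distinct_field
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS SELECT DISTINCT field FROM master_table;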
Serious premature optimization. Just let the database do its job, maybe with some tweaking to the configuration (especially if it's MySQL, which has several cache types and settings).
Your query on 10K rows most probably uses a HASH UNIQUE (or SORT UNIQUE) operation.
As 10K rows most probably fit into db_buffers and hash_area_size, all operations are performed in memory, and you won't notice any difference.
But if the query will be used as a part of a more complex query, or will be swapped out by other data, you may need disk I/O to access the data, which will slow your query down.
Run your query in a loop in several sessions (as many sessions as there will be users connected), and see how it performs in that case.
For future plans and for scalability, you may want to look into an indexing service that uses pure memory or something faster than the TCP DB round-trip. A lot of people (including myself) use Lucene to achieve this by normalizing the data into flat files.
Lucene has a built-in RAM-based directory (RAMDirectory), which can build the index entirely in memory, removing the dependency on the file system and greatly increasing speed.
Lately, I've architected systems that have a single RAM-drive index wrapped by a web service. Then, I have my Ajax-like dropdowns query that web service for high availability and high speed: no DB layer, no file system, just pure memory (and, if remote, TCP packet speed).
If you have an index on the column, then all the values are in the index and the DBMS never has to look at the table; it just looks at the index, which has only 10 entries. If this is mostly read-only data, then cache it in memory. Caching helps scalability a lot by relieving the database of work. A query that is quick on a database with no users might perform poorly when 30 queries are going on at the same time.
