Oracle Bulk Collect - Limit number - oracle

I read about Bulk Collect and wrote some code using it (not deployed yet). The total number of rows returned is in the vicinity of 80,000. I limited the number of rows returned in one batch to 10,000, but there is no basis for that number; I simply improvised.
What would be a good method for determining how to limit the Bulk Collect?

As with anything, the best approach would be to benchmark the different options.
Realistically, though, in the vast majority of cases, there isn't any appreciable benefit to a limit much higher than 100. With a limit of 100, you're eliminating 99% of the context shifts. It's relatively unlikely that the remaining 1% of the context shifts account for a meaningful fraction of the execution time of your code. Reducing the context shifts further probably does nothing for performance and just causes you to use more valuable PGA memory.
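To make the "99%" figure concrete, here is a rough back-of-the-envelope sketch in Python (not PL/SQL). It assumes one context shift per fetch call and ignores the final, possibly empty, fetch that a real LIMIT loop performs; the function name `fetch_round_trips` is made up for illustration, and 80,000 is the row count from the question.

```python
import math


def fetch_round_trips(total_rows: int, limit: int) -> int:
    """Number of fetch calls needed to pull total_rows rows, limit rows at a time.

    Simplifying assumption: one context shift per fetch call.
    """
    return math.ceil(total_rows / limit)


total_rows = 80_000  # row count mentioned in the question
row_by_row = fetch_round_trips(total_rows, 1)  # baseline: one row per fetch

for limit in (1, 100, 1_000, 10_000):
    trips = fetch_round_trips(total_rows, limit)
    eliminated = 1 - trips / row_by_row
    print(f"LIMIT {limit:>6}: {trips:>6} fetches, "
          f"{eliminated:.2%} of context shifts eliminated")
```

Going from a limit of 100 to 10,000 only removes a further sliver of the original shifts, while each batch holds 100 times as many rows in PGA memory.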

Related

Random spikes in usage (CockroachCloud Serverless)

I recently set up a free CockroachDB Serverless cluster on CockroachCloud. It's been really great so far, but sometimes there are random spikes in Request Units even though the number of SQL statements doesn't increase at all. Here's a screenshot of the two graphs in the cluster management page; it illustrates pretty well what I mean. I would really appreciate some help on how I could eliminate these spikes because CockroachCloud has some limits on free usage. That being said, I'm still fairly new to CockroachDB, so I might be missing something obvious.
You are likely performing enough mutations on your data to trigger automatic statistics collection as a background process. By default when 20% or more rows are modified in a table, CockroachDB will trigger a statistics refresh. The statistics are used by the optimizer to create more efficient query plans.
Your SQL Statements graph indicates that almost all your operations are inserts. That many inserts is almost certainly triggering stats collection. While you can turn off stats collection, the optimizer will then be using stale data to calculate query plans, potentially causing performance problems.
The occasional spikes in your Request Unit graph are above the 100 RUs per second baseline, but the rest of the time you are well below 100 RUs per second. That means you are accumulating RUs most of the time, and that (plus the initial 10 million RU allocation) should cover the bursts.
I added a FAQ entry to the Serverless docs covering this.
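The "accumulating RUs" argument can be sketched as a simple running balance. This is only an illustration in Python: the 100 RU/s baseline and the 10 million RU initial allocation come from the answer above, but the workload numbers, the function name `simulate_ru_balance`, and the assumption that unused RUs accrue without any cap are hypothetical and not taken from CockroachDB's documentation.

```python
def simulate_ru_balance(usage_per_second, initial_balance=10_000_000, baseline_rate=100):
    """Track a hypothetical RU balance second by second.

    Each second the balance gains baseline_rate RUs and loses the RUs used.
    Assumption: unused RUs accumulate without a cap.
    Returns the balance after each second.
    """
    balance = initial_balance
    history = []
    for used in usage_per_second:
        balance += baseline_rate - used
        history.append(balance)
    return history


# Hypothetical workload: 20 RU/s for one hour, with a 60-second spike of 1,500 RU/s.
usage = [20] * 3600
usage[1800:1860] = [1500] * 60

history = simulate_ru_balance(usage)
net_change = history[-1] - 10_000_000
print(f"lowest balance during the hour: {min(history):,} RUs")
print(f"net change over the hour: {net_change:+,} RUs")
```

Under these made-up numbers the spike is paid for several times over by the quiet periods, which is the point the answer is making.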

JDBC performance tuning - setFetchSize

I'm trying to develop a Scala microservice for data management for an Oracle database. I'm using JDBC drivers to connect to it.
Reading the answers to the performance questions comparing the JDBC driver with the .NET one, I've understood that one of the more effective ways to tune JDBC read performance is to set the fetch size through the method ResultSet.setFetchSize.
I've tried connecting to an Oracle database to fetch real data for a real business case, with a fixed number of records returned by the DB, and I measured what looks like exponential behavior in the elapsed time. In particular, fetching 10,000 rows from the database without setting the fetch size results in a ridiculously large fetch time, but specifying a fetch size larger than 1,000 gains only a little time (roughly 100 ms over 1 s).
Here are my questions on this topic:
I suppose that increasing the fetch size too much would consume resources needlessly for little gain, so is there even a rough method to estimate the size of the ResultSet before actually fetching it? I've read about the following technique:
result.last();
result.getRow();
but this would mean scrolling through the entire ResultSet, and I was wondering whether there is any technique, even a roughly accurate one, to estimate the count;
I've estimated that a good fetch size would be 1/10th of the number of records selected, but is there a documented rule for automatically estimating the correct fetch size in most cases?
Please do not set the fetch size too large unless you have a network bottleneck between the application and the database. The larger the fetch size, the more memory is consumed.
In my experience, 1024 - 2048 will lead to the best performance most of the time. See https://docs.oracle.com/javase/tutorial/jdbc/basics/retrieving.html for some details, but the default setting is usually best.
Do not try to get the total number of rows in the result set; it is not a best practice.
And finally, I want to point out that, based on the many times I have optimized JVM and JIT behavior, the bottleneck almost never turns out to be the JDBC fetch size once you have set it to 1000-2000. It is usually SQL performance, the application itself, resource limits, and so on.
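A toy model helps explain why the gain flattens out past a fetch size of about 1,000. The sketch below is Python rather than JDBC code, and both the 1 ms per round trip and the 0.01 ms per row are invented numbers, so only the shape of the curve is meaningful. The fetch size of 10 in the loop stands in for a small driver default (I believe Oracle's JDBC driver defaults to 10, but treat that as an assumption).

```python
import math


def estimated_fetch_time_ms(rows, fetch_size, round_trip_ms=1.0, per_row_ms=0.01):
    """Toy estimate: a fixed cost per network round trip plus a fixed cost per row.

    round_trip_ms and per_row_ms are hypothetical values, not measurements.
    """
    round_trips = math.ceil(rows / fetch_size)
    return round_trips * round_trip_ms + rows * per_row_ms


rows = 10_000  # row count from the question
for fetch_size in (10, 100, 1_000, 2_000, 10_000):
    trips = math.ceil(rows / fetch_size)
    total = estimated_fetch_time_ms(rows, fetch_size)
    print(f"fetch size {fetch_size:>6}: {trips:>5} round trips, ~{total:,.0f} ms")
```

In this model almost all of the round-trip overhead is gone by a fetch size of 1,000; larger values mainly increase the memory held per fetch.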

Dissecting a Function for speed

I wanted to know if there is a way to measure the performance of a function in parts.
I know that you can measure the total time it takes to complete the function, but is there a way to measure the individual queries within a function?
I'm asking because I cannot find the bottleneck in my function's performance.
Most of the time when you see a major difference between the estimated and the actual execution plans, it is because your statistics have never been updated. SQL Server therefore has no idea which tables have little data, which ones are huge, and so on, and is more likely to generate bogus plans (both estimated and actual), or to miscalculate estimated plan costs. The actual plan is based on real, accurate costs of the plan, but when the plan is very far from an optimal one, this accuracy is of very little value for determining bottlenecks.
To correct this, issue the UPDATE STATISTICS statement or execute the sp_updatestats procedure.
Seeing 100% actuals for your function might well be an effect of an empty or almost empty database, regardless of whether you have up-to-date statistics or not.
When optimizing for performance, make sure that your database is populated quasi-realistically with lots of data (put twice as many records in each table as you expect for production, but do maintain the expected rough proportions). There is not much point in looking for a performance bottleneck using an empty or an entirely, disproportionately overblown database; query plans will be different, and even if the plan happens to be the same one, the bottleneck may be elsewhere than in production.

Understanding the results of Execute Explain Plan in Oracle SQL Developer

I'm trying to optimize a query but don't quite understand some of the information returned from Explain Plan. Can anyone tell me the significance of the OPTIONS and COST columns? In the OPTIONS column, I only see the word FULL. In the COST column, I can deduce that a lower cost means a faster query. But what exactly does the cost value represent and what is an acceptable threshold?
The output of EXPLAIN PLAN is a debug output from Oracle's query optimiser. The COST is the final output of the Cost-based optimiser (CBO), the purpose of which is to select which of the many different possible plans should be used to run the query. The CBO calculates a relative Cost for each plan, then picks the plan with the lowest cost.
(Note: in some cases the CBO does not have enough time to evaluate every possible plan; in these cases it just picks the plan with the lowest cost found so far)
In general, one of the biggest contributors to a slow query is the number of rows read to service the query (blocks, to be more precise), so the cost will be based in part on the number of rows the optimiser estimates will need to be read.
For example, let's say you have the following query:
SELECT emp_id FROM employees WHERE months_of_service = 6;
(The months_of_service column has a NOT NULL constraint on it and an ordinary index on it.)
There are two basic plans the optimiser might choose here:
Plan 1: Read all the rows from the "employees" table, for each, check if the predicate is true (months_of_service=6).
Plan 2: Read the index where months_of_service=6 (this results in a set of ROWIDs), then access the table based on the ROWIDs returned.
Let's imagine the "employees" table has 1,000,000 (1 million) rows. Let's further imagine that the values for months_of_service range from 1 to 12 and are fairly evenly distributed for some reason.
The cost of Plan 1, which involves a FULL SCAN, will be the cost of reading all the rows in the employees table, which is approximately equal to 1,000,000; but since Oracle will often be able to read the blocks using multi-block reads, the actual cost will be lower (depending on how your database is set up). For example, let's imagine the multi-block read count is 10: the calculated cost of the full scan will be 1,000,000 / 10; Overall cost = 100,000.
The cost of Plan 2, which involves an INDEX RANGE SCAN and a table lookup by ROWID, will be the cost of scanning the index, plus the cost of accessing the table by ROWID. I won't go into how index range scans are costed but let's imagine the cost of the index range scan is 1 per row; we expect to find a match in 1 out of 12 cases, so the cost of the index scan is 1,000,000 / 12 = 83,333; plus the cost of accessing the table (assume 1 block read per access, we can't use multi-block reads here) = 83,333; Overall cost = 166,666.
As you can see, the cost of Plan 1 (full scan) is LESS than the cost of Plan 2 (index scan + access by rowid) - which means the CBO would choose the FULL scan.
If the assumptions made here by the optimiser are true, then in fact Plan 1 will be preferable and much more efficient than Plan 2 - which disproves the myth that FULL scans are "always bad".
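The arithmetic of the example can be replayed with a few lines of Python. This is only the simplified toy model from the text (rows treated as blocks, 1 cost unit per index row and per table access), not Oracle's actual costing formula; the function names and the second scenario with 50 distinct values are made up for illustration.

```python
def full_scan_cost(rows, multiblock_read_count):
    """Toy cost of a full scan: every row read, multiblock_read_count at a time."""
    return rows // multiblock_read_count


def index_plan_cost(rows, distinct_values, cost_per_index_row=1, cost_per_table_access=1):
    """Toy cost of an index range scan plus table access by ROWID.

    Assumes evenly distributed values, so 1 in distinct_values rows match.
    """
    matching_rows = rows // distinct_values
    return matching_rows * cost_per_index_row + matching_rows * cost_per_table_access


def cheaper_plan(rows, multiblock_read_count, distinct_values):
    """Pick the plan with the lower toy cost, as the CBO would in this model."""
    if full_scan_cost(rows, multiblock_read_count) < index_plan_cost(rows, distinct_values):
        return "full scan"
    return "index range scan"


rows = 1_000_000
print(full_scan_cost(rows, 10))    # Plan 1 from the text
print(index_plan_cost(rows, 12))   # Plan 2 from the text
print(cheaper_plan(rows, 10, 12))  # the scenario in the text
print(cheaper_plan(rows, 10, 50))  # hypothetical: a much more selective predicate
```

In this model the index plan only wins once the predicate matches fewer than about 1 row in 20, since each matching row costs 2 units while a full scan costs 1/10 of a unit per row.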
The results would be quite different if the optimiser goal was FIRST_ROWS(n) instead of ALL_ROWS - in which case the optimiser would favour Plan 2 because it will often return the first few rows quicker, at the cost of being less efficient for the entire query.
The CBO builds a decision tree, estimating the costs of each possible execution path available per query. The costs are set by the CPU_cost or I/O_cost parameter set on the instance. And the CBO estimates the costs, as best it can with the existing statistics of the tables and indexes that the query will use. You should not tune your query based on cost alone. Cost allows you to understand WHY the optimizer is doing what it does. Without cost you could not figure out why the optimizer chose the plan it did. Lower cost does not mean a faster query. There are cases where this is true and there will be cases where this is wrong. Cost is based on your table stats and if they are wrong the cost is going to be wrong.
When tuning your query, you should take a look at the cardinality and the number of rows of each step. Do they make sense? Is the cardinality the optimizer is assuming correct? Is the number of rows being returned reasonable? If the information present is wrong, then it's very likely the optimizer doesn't have the proper information it needs to make the right decision. This could be due to stale or missing statistics on the table and index, as well as CPU stats. It's best to have stats updated when tuning a query to get the most out of the optimizer. Knowing your schema is also of great help when tuning. Recognizing when the optimizer has made a really bad decision, and pointing it down the correct path with a small hint, can save a load of time.
Here is a reference for using EXPLAIN PLAN with Oracle: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm, with specific information about the columns found here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm#i18300
Your mention of 'FULL' indicates to me that the query is doing a full-table scan to find your data. This is okay in certain situations; otherwise it is an indicator of poor indexing or query writing.
Generally, with explain plans, you want to ensure your query is utilizing keys, so that Oracle can find the data you're looking for while accessing the fewest rows possible. Ultimately, you can sometimes only get so far with the architecture of your tables. If the costs remain too high, you may have to think about adjusting the layout of your schema to be more performance based.
In recent Oracle versions the COST represents the amount of time that the optimiser expects the query to take, expressed in units of the amount of time required for a single block read.
So if a single block read takes 2ms and the cost is expressed as "250", the query could be expected to take 500ms to complete.
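As a quick sanity check of that conversion, here is the same calculation as a small Python sketch. The function name is made up, and the 5 ms figure is a hypothetical second environment; the result is an estimate of total work, not a guaranteed elapsed time.

```python
def estimated_time_ms(cost, single_block_read_ms):
    """Convert an optimiser cost into estimated time.

    Assumes cost is expressed in units of single block read time.
    """
    return cost * single_block_read_ms


print(estimated_time_ms(250, 2))  # the example from the text: 2 ms per read
print(estimated_time_ms(250, 5))  # hypothetical slower storage: 5 ms per read
```

The same plan cost therefore translates into different time estimates on hardware with different single block read times.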
The optimiser calculates the cost based on the estimated number of single block and multiblock reads, and the CPU consumption of the plan. The latter can be very useful in minimising the cost by performing certain operations before others to try and avoid high CPU cost operations.
This raises the question of how the optimiser knows how long operations take. Recent Oracle versions allow the collection of "system statistics", which are definitely not to be confused with statistics on tables or indexes. The system statistics are measurements of the performance of the hardware, most importantly:
How long a single block read takes
How long a multiblock read takes
How large a multiblock read is (often different to the maximum possible due to table extents being smaller than the maximum, and other reasons).
CPU performance
These numbers can vary greatly according to the operating environment of the system, and different sets of statistics can be stored for "daytime OLTP" operations and "nighttime batch reporting" operations, and for "end of month reporting" if you wish.
Given these sets of statistics, a given query execution plan can be evaluated for cost in different operating environments, which might promote use of full table scans at some times or index scans at others.
The cost is not perfect, but the optimiser gets better at self-monitoring with every release, and can feed back the actual cost in comparison to the estimated cost in order to make better decisions for the future. This also makes it rather more difficult to predict.
Note that the cost is not necessarily wall clock time, as parallel query operations consume a total amount of time across multiple threads.
In older versions of Oracle the cost of CPU operations was ignored, and the relative costs of single and multiblock reads were effectively fixed according to init parameters.
FULL probably refers to a full table scan, which means that no indexes are in use. This usually indicates that something is wrong, unless the query is supposed to use all the rows in a table.
Cost is a number that signals the sum of the different loads (processor, memory, disk, IO), and high numbers are typically bad. The numbers are added up when moving to the root of the plan, and each branch should be examined to locate the bottlenecks.
You may also want to query v$sql and v$session to get statistics about SQL statements; these views have detailed metrics for all kinds of resources, timings and executions.

Windows Performance Counter limits

What limits exist on the amount of data one can publish to a custom Windows performance counter category?
I understand there is no hard limit on the number of counters or the number of instances, but rather there is a memory limit for the entire category. What is that limit?
Is there a limit on the total number or size of all performance counter categories? What else should be taken into account when dealing with a relatively large amount of data that needs to be published?
To put this into perspective, I need to publish around 50,000 32-bit counter-instance-values. I could split these up into categories in various ways, depending on what limits exist.
I appreciate that performance counters may not be the best solution, but there are reasons for this madness.
Under what circumstances would you need to publish tens of thousands of counters?
Remember that the tools that read those perf counters typically aren't designed for such massive data sets (although they might be). As a result, it is possible that while you'll be able to author such a data set, the tools that read your data will fail in "interesting" ways.
You might want to reconsider your need to collect so much data. Do you really need 50,000 perf counters? What will you do with the information once you collect it? Will you really be able to gather meaningful information from 50,000 counters?
Is there actually a limit though? I thought you basically just published a block of shared memory - why not just increase the size of the block? What makes you think there is a limit?
