Execution of Union in Oracle - Parallel or Sequential?

Execution of Union in Oracle - Parallel or Sequential? - oracle

How does a Union in Oracle execute ?
If I have 4 selects in my union statement, will the selects execute in parallel or sequential manner?

It depends on Oracle database version you use and OPTIMIZER_FEATURES_ENABLE initialization parameter.
Shortly: if set to 12.1 (or higher), concurrent execution is enabled by default. If it is set to a lower version, concurrent execution can be enabled by the P1_CONCURRENT_UNION hint.
Documentation: Concurrent execution of UNION ALL says:
Traditionally, set operators are processed in a sequential manner. Individual branches can be processed in serial or parallel, but only one branch at a time, one branch after another. (...)
The default behavior of concurrent execution for UNION or UNION ALL statements is controlled by the setting of the OPTIMIZER_FEATURES_ENABLE initialization parameter.
Read the whole document for more info. Also, search for the same subject in other Oracle database version documentation (if it differs from 12c whose link I posted).

Related

Is optimizer_use_sql_plan_baselines and resource_manager_cpu_allocation oracle system parameter have impact on sql query performance

Is optimizer_use_sql_plan_baselines and resource_manager_cpu_allocation oracle system parameter have impact on sql query performance.
We have two envt suppose A and B. On A Envt query is running fine but in Envt. B its tacking time. I have compared system parameter and found difference in values in optimizer_use_sql_plan_baselines and resource_manager_cpu_allocation .

SQL plan baselines and the resource manager certainly could have a huge impact on performance, and you should use the below two queries or confirm or deny that those parameters are related to your problem.
GV$SQL stores which SQL plan baseline is associated with each SQL statement. Compare the SQL_PLAN_BASELINE column in the below query, and if they are equal then your problem is not related to baselines:
select sql_plan_baseline, round(elapsed_time/1000000) elapsed_seconds, gv$sql.*
from gv$sql
order by elapsed_time desc;
The Active Session History (ASH) views can tell you if the resource manager is an issue. If your queries are being throttled then you will see an event
named "resmgr:cpu quantum" in the below query. (But pay attention to the counts - don't troubleshoot a wait event if it only happens a small number of times.)
select nvl(event, 'CPU') event, count(*)
from gv$active_session_history
group by event
order by count(*) desc;
Resource manager can have other potentially negative affects. If you're in a data warehouse, and using parallel queries, it's possible that resource manager has downgraded the queries on one system. If you're using parallel queries, try comparing the SQL monitoring reports from both systems:
select dbms_sqltune.report_sql_monitor(sql_id => '&YOUR_SQL_ID') from dual;
However, I have a feeling that you're using the wrong approach for your problem. There are generally two approaches to Oracle database performance - database tuning and query tuning. If you're only interested in a single query, then you should probably focus on things like the execution plan and the wait events for the operations of that specific query.

T-SQL : why is running multiple sql statements in one batch slower without GO?

I have come to very interesting problem (at least for me).
When I run following SQL:
SELECT count(*) AS [count]
FROM [dbo].[contract_v] AS [contract_v]
WHERE 1 = 0;
SELECT *
FROM [dbo].[contract] AS [contract]
LEFT JOIN ([dbo].[contract_accepted_garbage_type] AS [garbageTypes->contract_accepted_garbage_type]
INNER JOIN [dbo].[garbage_type] AS [garbageTypes] ON [garbageTypes].[id] = [garbageTypes->contract_accepted_garbage_type].[garbage_type_id])
ON [contract].[id] = [garbageTypes->contract_accepted_garbage_type].[contract_id]
WHERE [contract].[id] IN (125018);
Execution takes 21s
However when I add GO statement as following:
SELECT count(*) AS [count]
FROM [dbo].[contract_v] AS [contract_v]
WHERE 1 = 0;
GO
SELECT *
FROM [dbo].[contract] AS [contract]
LEFT JOIN ([dbo].[contract_accepted_garbage_type] AS [garbageTypes->contract_accepted_garbage_type]
INNER JOIN [dbo].[garbage_type] AS [garbageTypes] ON [garbageTypes].[id] = [garbageTypes->contract_accepted_garbage_type].[garbage_type_id])
ON [contract].[id] = [garbageTypes->contract_accepted_garbage_type].[contract_id]
WHERE [contract].[id] IN (125018);
It takes only 2s.
The view used in first SQL statement is based on the table called in second statement.
Could you please explain this behaviour to me? I know that GO statement makes database create separate execution plan for every batch. I have checked the execution plans, and the actual steps are identical.
Thank you!

The GO keyword separates execution batches. If the underlying tables are the same in both queries, and they are executed in the same batch, both queries have to be executed with the same transaction context. This ensures that the underlying data in both tables is the same during both executions.
If using separate batches (GO statement in-between), you cannot guarantee that the data will be consistent in that rows could theoretically be modified in between executions.
If you don't care about the chance of the data changing in between queries, then by all means use GO for performance. If you do care, consider it a dangerous move.
SQL Server applications can send multiple Transact-SQL statements to an instance of SQL Server for execution as a batch. The statements in the batch are then compiled into a single execution plan. Programmers executing ad hoc statements in the SQL Server utilities, or building scripts of Transact-SQL statements to run through the SQL Server utilities, use GO to signal the end of a batch.
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-go?view=sql-server-ver15

Nifi executeSql with 30 threads very slow

We are using HDF to fetch large data from oracle. We have a generateTableFetch to create partition of 8000 records which create query like below :
Select * from ( Select a.*, ROWNUM rnum FROM (SELECT * FROM OPUSER.DEPENDENCY_TYPES WHERE (1=1))a WHERE ROWNUM <= 368000) WHERE rnum > 361000
Now this query is taking almost 20-25min to return from oracle.
Is there anything wrong that we are doing wrong or any configuration changes we can do.
Nifi uses jdbc connection so is there any oracle side configuration for that.
Also if we somehow add parallelism hint to the query example /parallel(c,2)/. WIll this help?

I'm guessing you're using Oracle 11 (or less) and have selected Oracle as the database type. Since LIMIT/OFFSET wasn't introduced until Oracle 12, NiFi uses the nested SELECT with ROWNUM approach to ensure each "page" of data contains unique values. If you are using Oracle 12+, make sure to use the Oracle 12+ database adapter instead, as it can leverage the LIMIT/OFFSET capabilities resulting in a faster query. Also make sure you have the appropriate index(es) in place to help with query execution.
As of NiFi 1.7.0, you might also consider setting the Column for Value Partitioning property. If you have a column (perhaps your DEPENDENCY_TYPES column) that is fairly uniformly distributed, and is not "too sparse" in relation to your Partition Size property value, GenerateTableFetch can use the column's values rather than the ROWNUM approach, resulting in faster queries. See NIFI-5143 and the GenerateTableFetch documentation for more details.
If you need to add hints to the JDBC session, then as of NiFi 1.9.0 (see NIFI-5780 for more details) you can add pre- and post-query statements to ExecuteSQL.

What is the Difference in using FORCE, ENABLE, DISABLE keyword in PARALLEL QUERY/DML/DDL

What is the difference in following statements
ALTER SESSION FORCE PARALLEL QUERY;
ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION DISABLE PARALLEL DML;
Which one is more suggested for the optimization point of view. Initially, I was using DISABLE, later on I tested with ENABLE which performed better and now FORCE is doing better. Is there any chance that FORCE will backfire in any case.

Please refer to http://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_2013.htm
PARALLEL DML | DDL | QUERY
The PARALLEL parameter determines whether all subsequent DML, DDL, or query statements in the session will be considered for parallel execution. This clause enables you to override the degree of parallelism of tables during the current session without changing the tables themselves. Uncommitted transactions must either be committed or rolled back prior to executing this clause for DML.
ENABLE Clause
Specify ENABLE to execute subsequent statements in the session in parallel. This is the default for DDL and query statements.
•
DML: DML statements are executed in parallel mode if a parallel hint or a parallel clause is specified.
•
DDL: DDL statements are executed in parallel mode if a parallel clause is specified.
•
QUERY: Queries are executed in parallel mode if a parallel hint or a parallel clause is specified.
Restriction on the ENABLE clause You cannot specify the optional PARALLEL integer with ENABLE.
DISABLE Clause
Specify DISABLE to execute subsequent statements in the session serially. This is the default for DML statements.
•
DML: DML statements are executed serially.
•
DDL: DDL statements are executed serially.
•
QUERY: Queries are executed serially.
Restriction on the DISABLE clause You cannot specify the optional PARALLEL integer with DISABLE.
FORCE Clause
FORCE forces parallel execution of subsequent statements in the session. If no parallel clause or hint is specified, then a default degree of parallelism is used. This clause overrides any parallel_clause specified in subsequent statements in the session but is overridden by a parallel hint.
•
DML: Provided no parallel DML restrictions are violated, subsequent DML statements in the session are executed with the default degree of parallelism, unless a degree is specified in this clause.
•
DDL: Subsequent DDL statements in the session are executed with the default degree of parallelism, unless a degree is specified in this clause. Resulting database objects will have associated with them the prevailing degree of parallelism.
Specifying FORCE DDL automatically causes all tables created in this session to be created with a default level of parallelism. The effect is the same as if you had specified the parallel_clause (with the default degree) in the CREATE TABLE statement.
•
QUERY: Subsequent queries are executed with the default degree of parallelism, unless a degree is specified in this clause.
PARALLEL integer Specify an integer to explicitly specify a degree of parallelism:
•
For FORCE DDL, the degree overrides any parallel clause in subsequent DDL statements.
•
For FORCE DML and QUERY, the degree overrides the degree currently stored for the table in the data dictionary.
•
A degree specified in a statement through a hint will override the degree being forced.
The following types of DML operations are not parallelized regardless of this clause:
•
Operations on cluster tables
•
Operations with embedded functions that either write or read database or package states
•
Operations on tables with triggers that could fire
•
Operations on tables or schema objects containing object types, or LONG or LOB data types

Difference between count (*) and count (1) with join [duplicate]

Just wondering if any of you people use Count(1) over Count(*) and if there is a noticeable difference in performance or if this is just a legacy habit that has been brought forward from days gone past?
The specific database is SQL Server 2005.

There is no difference.
Reason:
Books on-line says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"
"1" is a non-null expression: so it's the same as COUNT(*).
The optimizer recognizes it for what it is: trivial.
The same as EXISTS (SELECT * ... or EXISTS (SELECT 1 ...
Example:
SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID
SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID
Same IO, same plan, the works
Edit, Aug 2011
Similar question on DBA.SE.
Edit, Dec 2011
COUNT(*) is mentioned specifically in ANSI-92 (look for "Scalar expressions 125")
Case:
a) If COUNT(*) is specified, then the result is the cardinality of T.
That is, the ANSI standard recognizes it as bleeding obvious what you mean. COUNT(1) has been optimized out by RDBMS vendors because of this superstition. Otherwise it would be evaluated as per ANSI
b) Otherwise, let TX be the single-column table that is the
result of applying the <value expression> to each row of T
and eliminating null values. If one or more null values are
eliminated, then a completion condition is raised: warning-

In SQL Server, these statements yield the same plans.
Contrary to the popular opinion, in Oracle they do too.
SYS_GUID() in Oracle is quite computation intensive function.
In my test database, t_even is a table with 1,000,000 rows
This query:
SELECT COUNT(SYS_GUID())
FROM t_even
runs for 48 seconds, since the function needs to evaluate each SYS_GUID() returned to make sure it's not a NULL.
However, this query:
SELECT COUNT(*)
FROM (
SELECT SYS_GUID()
FROM t_even
)
runs for but 2 seconds, since it doen't even try to evaluate SYS_GUID() (despite * being argument to COUNT(*))

I work on the SQL Server team and I can hopefully clarify a few points in this thread (I had not seen it previously, so I am sorry the engineering team has not done so previously).
First, there is no semantic difference between select count(1) from table vs. select count(*) from table. They return the same results in all cases (and it is a bug if not). As noted in the other answers, select count(column) from table is semantically different and does not always return the same results as count(*).
Second, with respect to performance, there are two aspects that would matter in SQL Server (and SQL Azure): compilation-time work and execution-time work. The Compilation time work is a trivially small amount of extra work in the current implementation. There is an expansion of the * to all columns in some cases followed by a reduction back to 1 column being output due to how some of the internal operations work in binding and optimization. I doubt it would show up in any measurable test, and it would likely get lost in the noise of all the other things that happen under the covers (such as auto-stats, xevent sessions, query store overhead, triggers, etc.). It is maybe a few thousand extra CPU instructions. So, count(1) does a tiny bit less work during compilation (which will usually happen once and the plan is cached across multiple subsequent executions). For execution time, assuming the plans are the same there should be no measurable difference. (One of the earlier examples shows a difference - it is most likely due to other factors on the machine if the plan is the same).
As to how the plan can potentially be different. These are extremely unlikely to happen, but it is potentially possible in the architecture of the current optimizer. SQL Server's optimizer works as a search program (think: computer program playing chess searching through various alternatives for different parts of the query and costing out the alternatives to find the cheapest plan in reasonable time). This search has a few limits on how it operates to keep query compilation finishing in reasonable time. For queries beyond the most trivial, there are phases of the search and they deal with tranches of queries based on how costly the optimizer thinks the query is to potentially execute. There are 3 main search phases, and each phase can run more aggressive(expensive) heuristics trying to find a cheaper plan than any prior solution. Ultimately, there is a decision process at the end of each phase that tries to determine whether it should return the plan it found so far or should it keep searching. This process uses the total time taken so far vs. the estimated cost of the best plan found so far. So, on different machines with different speeds of CPUs it is possible (albeit rare) to get different plans due to timing out in an earlier phase with a plan vs. continuing into the next search phase. There are also a few similar scenarios related to timing out of the last phase and potentially running out of memory on very, very expensive queries that consume all the memory on the machine (not usually a problem on 64-bit but it was a larger concern back on 32-bit servers). Ultimately, if you get a different plan the performance at runtime would differ. I don't think it is remotely likely that the difference in compilation time would EVER lead to any of these conditions happening.
Net-net: Please use whichever of the two you want as none of this matters in any practical form. (There are far, far larger factors that impact performance in SQL beyond this topic, honestly).
I hope this helps. I did write a book chapter about how the optimizer works but I don't know if its appropriate to post it here (as I get tiny royalties from it still I believe). So, instead of posting that I'll post a link to a talk I gave at SQLBits in the UK about how the optimizer works at a high level so you can see the different main phases of the search in a bit more detail if you want to learn about that. Here's the video link: https://sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer

Clearly, COUNT(*) and COUNT(1) will always return the same result. Therefore, if one were slower than the other it would effectively be due to an optimiser bug. Since both forms are used very frequently in queries, it would make no sense for a DBMS to allow such a bug to remain unfixed. Hence you will find that the performance of both forms is (probably) identical in all major SQL DBMSs.

In the SQL-92 Standard, COUNT(*) specifically means "the cardinality of the table expression" (could be a base table, `VIEW, derived table, CTE, etc).
I guess the idea was that COUNT(*) is easy to parse. Using any other expression requires the parser to ensure it doesn't reference any columns (COUNT('a') where a is a literal and COUNT(a) where a is a column can yield different results).
In the same vein, COUNT(*) can be easily picked out by a human coder familiar with the SQL Standards, a useful skill when working with more than one vendor's SQL offering.
Also, in the special case SELECT COUNT(*) FROM MyPersistedTable;, the thinking is the DBMS is likely to hold statistics for the cardinality of the table.
Therefore, because COUNT(1) and COUNT(*) are semantically equivalent, I use COUNT(*).

COUNT(*) and COUNT(1) are same in case of result and performance.

I would expect the optimiser to ensure there is no real difference outside weird edge cases.
As with anything, the only real way to tell is to measure your specific cases.
That said, I've always used COUNT(*).

As this question comes up again and again, here is one more answer. I hope to add something for beginners wondering about "best practice" here.
SELECT COUNT(*) FROM something counts records which is an easy task.
SELECT COUNT(1) FROM something retrieves a 1 per record and than counts the 1s that are not null, which is essentially counting records, only more complicated.
Having said this: Good dbms notice that the second statement will result in the same count as the first statement and re-interprete it accordingly, as not to do unnecessary work. So usually both statements will result in the same execution plan and take the same amount of time.
However from the point of readability you should use the first statement. You want to count records, so count records, not expressions. Use COUNT(expression) only when you want to count non-null occurences of something.

I ran a quick test on SQL Server 2012 on an 8 GB RAM hyper-v box. You can see the results for yourself. I was not running any other windowed application apart from SQL Server Management Studio while running these tests.
My table schema:
CREATE TABLE [dbo].[employee](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Total number of records in Employee table: 178090131 (~ 178 million rows)
First Query:
Set Statistics Time On
Go
Select Count(*) From Employee
Go
Set Statistics Time Off
Go
Result of First Query:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 35 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 10766 ms, elapsed time = 70265 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Second Query:
Set Statistics Time On
Go
Select Count(1) From Employee
Go
Set Statistics Time Off
Go
Result of Second Query:
SQL Server parse and compile time:
CPU time = 14 ms, elapsed time = 14 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 11031 ms, elapsed time = 70182 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
You can notice there is a difference of 83 (= 70265 - 70182) milliseconds which can easily be attributed to exact system condition at the time queries are run. Also I did a single run, so this difference will become more accurate if I do several runs and do some averaging. If for such a huge data-set the difference is coming less than 100 milliseconds, then we can easily conclude that the two queries do not have any performance difference exhibited by the SQL Server Engine.
Note : RAM hits close to 100% usage in both the runs. I restarted SQL Server service before starting both the runs.

SET STATISTICS TIME ON
select count(1) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 36 ms.
select count(*) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 37 ms.
I've ran this hundreds of times, clearing cache every time.. The results vary from time to time as server load varies, but almost always count(*) has higher cpu time.

There is an article showing that the COUNT(1) on Oracle is just an alias to COUNT(*), with a proof about that.
I will quote some parts:
There is a part of the database software that is called “The
Optimizer”, which is defined in the official documentation as
“Built-in database software that determines the most efficient way to
execute a SQL statement“.
One of the components of the optimizer is called “the transformer”,
whose role is to determine whether it is advantageous to rewrite the
original SQL statement into a semantically equivalent SQL statement
that could be more efficient.
Would you like to see what the optimizer does when you write a query
using COUNT(1)?
With a user with ALTER SESSION privilege, you can put a tracefile_identifier, enable the optimizer tracing and run the COUNT(1) select, like: SELECT /* test-1 */ COUNT(1) FROM employees;.
After that, you need to localize the trace files, what can be done with SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace';. Later on the file, you will find:
SELECT COUNT(*) “COUNT(1)” FROM “COURSE”.”EMPLOYEES” “EMPLOYEES”
As you can see, it's just an alias for COUNT(*).
Another important comment: the COUNT(*) was really faster two decades ago on Oracle, before Oracle 7.3:
Count(1) has been rewritten in count(*) since 7.3 because Oracle like
to Auto-tune mythic statements. In earlier Oracle7, oracle had to
evaluate (1) for each row, as a function, before DETERMINISTIC and
NON-DETERMINISTIC exist.
So two decades ago, count(*) was faster
For another databases as Sql Server, it should be researched individually for each one.
I know that this question is specific for SQL Server, but the other questions on SO about the same subject (without mention a specific database) were closed and marked as duplicated from this answer.

In all RDBMS, the two ways of counting are equivalent in terms of what result they produce. Regarding performance, I have not observed any performance difference in SQL Server, but it may be worth pointing out that some RDBMS, e.g. PostgreSQL 11, have less optimal implementations for COUNT(1) as they check for the argument expression's nullability as can be seen in this post.
I've found a 10% performance difference for 1M rows when running:
-- Faster
SELECT COUNT(*) FROM t;
-- 10% slower
SELECT COUNT(1) FROM t;

COUNT(1) is not substantially different from COUNT(*), if at all. As to the question of COUNTing NULLable COLUMNs, this can be straightforward to demo the differences between COUNT(*) and COUNT(<some col>)--
USE tempdb;
GO
IF OBJECT_ID( N'dbo.Blitzen', N'U') IS NOT NULL DROP TABLE dbo.Blitzen;
GO
CREATE TABLE dbo.Blitzen (ID INT NULL, Somelala CHAR(1) NULL);
INSERT dbo.Blitzen SELECT 1, 'A';
INSERT dbo.Blitzen SELECT NULL, NULL;
INSERT dbo.Blitzen SELECT NULL, 'A';
INSERT dbo.Blitzen SELECT 1, NULL;
SELECT COUNT(*), COUNT(1), COUNT(ID), COUNT(Somelala) FROM dbo.Blitzen;
GO
DROP TABLE dbo.Blitzen;
GO

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio