NiFi ExecuteSQL with 30 threads very slow - Oracle

We are using HDF to fetch a large amount of data from Oracle. We have a GenerateTableFetch processor that creates partitions of 8000 records, which generates queries like the one below:
Select * from ( Select a.*, ROWNUM rnum FROM (SELECT * FROM OPUSER.DEPENDENCY_TYPES WHERE (1=1))a WHERE ROWNUM <= 368000) WHERE rnum > 361000
This query takes almost 20-25 minutes to return from Oracle.
Is there anything we are doing wrong, or any configuration change we can make?
NiFi uses a JDBC connection, so is there any Oracle-side configuration for that?
Also, if we somehow add a parallelism hint to the query, for example /parallel(c,2)/, will this help?

I'm guessing you're using Oracle 11 (or earlier) and have selected Oracle as the database type. Since Oracle did not support LIMIT/OFFSET-style paging (the OFFSET ... FETCH row-limiting clause) until Oracle 12, NiFi uses the nested SELECT with ROWNUM approach to ensure each "page" of data contains unique rows. If you are using Oracle 12+, make sure to use the Oracle 12+ database adapter instead, as it can leverage those paging capabilities, resulting in a faster query. Also make sure you have the appropriate index(es) in place to help with query execution.
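For reference, on Oracle 12c and later the same page can be expressed with the row-limiting clause instead of nested ROWNUM filters. This is only a sketch of the general query shape, not necessarily the exact SQL NiFi generates; ID is a hypothetical key column, and the page bounds are copied from the query in the question:

```sql
-- Oracle 12c+ row-limiting clause: skip 361000 rows, return the next 7000.
-- ID is a hypothetical column; a deterministic ORDER BY keeps pages from overlapping.
SELECT *
FROM OPUSER.DEPENDENCY_TYPES
ORDER BY ID
OFFSET 361000 ROWS FETCH NEXT 7000 ROWS ONLY;
```

Deep offsets still make Oracle read past the skipped rows, so an index on the ordering column is likely to matter here as well.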
As of NiFi 1.7.0, you might also consider setting the Column for Value Partitioning property. If you have a column (perhaps a key column in your DEPENDENCY_TYPES table) that is fairly uniformly distributed, and is not "too sparse" in relation to your Partition Size property value, GenerateTableFetch can use the column's values rather than the ROWNUM approach, resulting in faster queries. See NIFI-5143 and the GenerateTableFetch documentation for more details.
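To illustrate the difference: with value partitioning, each generated query filters on a range of column values rather than on ROWNUM, so Oracle can satisfy it with an index range scan instead of reading and discarding all earlier rows. The shape below is an assumption for illustration (ID is a hypothetical, roughly sequential numeric column), not the exact SQL GenerateTableFetch emits:

```sql
-- One partition of 8000 values of the hypothetical ID column.
SELECT *
FROM OPUSER.DEPENDENCY_TYPES
WHERE ID >= 360000
  AND ID < 368000;
```

If the column has large gaps, some partitions will return far fewer than 8000 rows, which is why the column should not be "too sparse".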
If you need to add hints to the JDBC session, then as of NiFi 1.9.0 (see NIFI-5780 for more details) you can add pre- and post-query statements to ExecuteSQL.
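As a sketch of what such statements might look like, session-level parallelism can be forced before the fetch and reset afterwards. The degree of 2 mirrors the hint in the question; whether parallel query helps at all depends on the plan and on the resources available, so treat this as something to test rather than a fix:

```sql
-- Candidate pre-query statement: force parallel query with degree 2 for this session.
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 2;

-- Candidate post-query statement: return the session to its default parallel behaviour.
ALTER SESSION ENABLE PARALLEL QUERY;
```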

Related

Do the optimizer_use_sql_plan_baselines and resource_manager_cpu_allocation Oracle system parameters have an impact on SQL query performance?

Do the optimizer_use_sql_plan_baselines and resource_manager_cpu_allocation Oracle system parameters have an impact on SQL query performance?
We have two environments, say A and B. On environment A the query runs fine, but on environment B it takes a long time. I compared the system parameters and found differences in the values of optimizer_use_sql_plan_baselines and resource_manager_cpu_allocation.
SQL plan baselines and the resource manager certainly could have a huge impact on performance, and you should use the two queries below to confirm or deny that those parameters are related to your problem.
GV$SQL stores which SQL plan baseline is associated with each SQL statement. Compare the SQL_PLAN_BASELINE column in the below query, and if they are equal then your problem is not related to baselines:
select sql_plan_baseline, round(elapsed_time/1000000) elapsed_seconds, gv$sql.*
from gv$sql
order by elapsed_time desc;
The Active Session History (ASH) views can tell you if the resource manager is an issue. If your queries are being throttled then you will see an event named "resmgr:cpu quantum" in the below query. (But pay attention to the counts - don't troubleshoot a wait event if it only happens a small number of times.)
select nvl(event, 'CPU') event, count(*)
from gv$active_session_history
group by event
order by count(*) desc;
Resource manager can have other potentially negative effects. If you're in a data warehouse, and using parallel queries, it's possible that resource manager has downgraded the queries on one system. If you're using parallel queries, try comparing the SQL monitoring reports from both systems:
select dbms_sqltune.report_sql_monitor(sql_id => '&YOUR_SQL_ID') from dual;
However, I have a feeling that you're using the wrong approach for your problem. There are generally two approaches to Oracle database performance - database tuning and query tuning. If you're only interested in a single query, then you should probably focus on things like the execution plan and the wait events for the operations of that specific query.
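For the single-query approach, a reasonable starting point might be the actual execution plan of the cached cursor plus the ASH samples for just that statement. Both queries below reuse the &YOUR_SQL_ID substitution variable from above. The 'ALLSTATS LAST' format only shows runtime row statistics if they were collected (for example with the gather_plan_statistics hint), and the ASH view requires the Diagnostics Pack license:

```sql
-- Execution plan of the statement as it was actually run.
select *
from table(dbms_xplan.display_cursor(sql_id => '&YOUR_SQL_ID', format => 'ALLSTATS LAST'));

-- Wait events sampled for that one statement only.
select nvl(event, 'CPU') event, count(*)
from gv$active_session_history
where sql_id = '&YOUR_SQL_ID'
group by event
order by count(*) desc;
```

Running both on each environment and comparing the plans and the top events is usually more telling than comparing system parameters.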

Offset Management - Confluent JDBC Connector in query mode

As per the Confluent documentation, when we use query mode we have to do the offset management ourselves. As I understand it, we need to keep track of the last updated timestamp and pass it in the where clause each time we restart the program. Could anyone confirm whether this understanding is correct? I appreciate your help in advance!
You can do both: you can still set the timestamp and/or incrementing mode in addition to the query. The connector will simply append a where clause based on the timestamp.column.name and/or incrementing.column.name field. You can even use a subquery if your query needs its own where clause.
As an example, you could set your query to: select * from (select apples, ripedate from tree where color = 'green') as subquery
With timestamp.column.name set to ripedate, the SQL the connector will execute is:
select * from (select apples, ripedate from tree where color = 'green') as subquery where ripedate > offsetdate

What's the best practice to filter out specific year in query in Netezza?

I am a SQL Server guy and have just started working on Netezza. One thing that came up is a daily query to find the size of a table filtered by year: 2016, 2015, 2014, ...
What I am using now is something like below and it works for me, but I wonder if there is a better way to do it:
select count(1)
from table
where extract(year from datecolumn) = 2016
extract is a built-in function; applying a function to a table with 10 billion+ rows is unimaginable in SQL Server, to my knowledge.
Thank you for your advice.
The only problem I see with the query is the where clause, which applies a function on the 'variable' (column) side. That effectively disables zonemaps and thus forces Netezza to scan all data pages, not only those with data from that year.
Instead write something like:
select count(1)
from table
where datecolumn between '2016-01-01' and '2016-12-31'
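If datecolumn is a timestamp rather than a pure date, a half-open range may be safer, since between '2016-01-01' and '2016-12-31' would miss rows stamped later than midnight on December 31. A sketch using the same placeholder table and column names as above:

```sql
select count(1)
from table
where datecolumn >= '2016-01-01'
  and datecolumn < '2017-01-01'
```

The column is still compared directly against constants, so zonemaps remain usable.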
A more generic alternative is to create a 'date dimension table' with one row per day in your tables (and a couple of years into the future).
This is an example for Postgres: https://medium.com/@duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac
This enables you to write code like this:
Select count(1)
From table t join d_date d on t.datecolumn=d.date_actual
Where year_actual=2016
You may not have the generate_series() function on your system, but a 'select row_number()...' can do the same trick. A download is available here: https://www.ibm.com/developerworks/community/wikis/basic/anonymous/api/wiki/76c5f285-8577-4848-b1f3-167b8225e847/page/44d502dd-5a70-4db8-b8ee-6bbffcb32f00/attachment/6cb02340-a342-42e6-8953-aa01cbb10275/media/generate_series.tgz
A couple of further notes on 'date interval' where clauses:
Those columns are the most likely candidates for zonemap optimization. Add an 'organize on (datecolumn)' at the bottom of your table DDL and reorganize your table. That will cause Netezza to move records onto pages with similar dates, and query times will improve.
Furthermore, you should ensure that the 'distribute on' clause for the table results in an even distribution across data slices if the table is big. The query will never execute faster than its slowest data slice.
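The two suggestions above might look roughly like this. The table name my_table and the id column are hypothetical, and the sketch assumes Netezza's GROOM TABLE command and the hidden datasliceid column are available on your release:

```sql
-- Hypothetical table: spread rows evenly by id, cluster them by date.
create table my_table (
    id         bigint not null,
    datecolumn date
)
distribute on (id)
organize on (datecolumn);

-- For an existing table: set the organizing key, then rewrite the rows.
alter table my_table organize on (datecolumn);
groom table my_table records all;

-- Check for skew: row counts per data slice should be roughly equal.
select datasliceid, count(*) as row_count
from my_table
group by datasliceid
order by row_count desc;
```

A distribution key with many distinct, evenly spread values (such as a surrogate id) tends to give the flattest result in the last query.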
I hope this helps

Oracle accessing multiple databases

I'm using Oracle SQL Developer version 4.02.15.21.
I need to write a query that accesses multiple databases. All I'm trying to do is get a list of all the IDs present in "TableX" in each database (there is an instance of TableX in each of these databases, but with different values) and union all of the results together into one big list.
My problem comes with accessing more than 4 databases -- I get this error: ORA-02020: too many database links in use. I cannot change the INIT.ORA file's open_links maximum limit.
So I've tried dynamically opening/closing these links:
SELECT Local.PUID FROM TableX Local
UNION ALL
----
SELECT Xdb1.PUID FROM TableX@db1 Xdb1;
ALTER SESSION CLOSE DATABASE LINK db1
UNION ALL
----
SELECT Xdb2.PUID FROM TableX@db2 Xdb2;
ALTER SESSION CLOSE DATABASE LINK db2
UNION ALL
----
SELECT Xdb3.PUID FROM TableX@db3 Xdb3;
ALTER SESSION CLOSE DATABASE LINK db3
UNION ALL
----
SELECT Xdb4.PUID FROM TableX@db4 Xdb4;
ALTER SESSION CLOSE DATABASE LINK db4
UNION ALL
----
SELECT Xdb5.PUID FROM TableX@db5 Xdb5;
ALTER SESSION CLOSE DATABASE LINK db5
However, this produces 'ORA-02081: database link is not open.' on whichever db is being closed out last.
Can someone please suggest an alternative or adjustment to the above?
Please provide a small sample of your suggestion with syntactically correct SQL if possible.
If you can't change the open_links setting, you cannot have a single query that selects from all the databases you want to query.
If your requirement is to query a large number of databases via database links, it seems highly reasonable to change the open_links setting. If you have one set of people telling you that you need to do X (query data from a large number of tables) and another set of people telling you that you cannot do X, it almost always makes sense to have those two sets of people talk and figure out which imperative wins.
If we can solve the problem without writing a single query, then you have options. You can write a bit of PL/SQL, for example, that selects the data from each table in turn and does something with it. Depending on the number of database links involved, it may make sense to write a loop that generates a dynamic SQL statement for each database link, executes the SQL, and then closes the database link.
If you need to provide a user with the ability to run a single query that returns all the data, you can write a pipelined table function that implements this sort of loop with dynamic SQL and then let the user query the pipelined table function. This isn't really a single query that fetches the data from all the tables. But it is as close as you're likely to get without modifying the open_links limit.
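A minimal sketch of the loop approach, written as an anonymous PL/SQL block rather than a pipelined function. It assumes the link names db1 to db5 from the question, assumes PUID fits in a VARCHAR2, and simply prints the values with DBMS_OUTPUT; note that the COMMIT ends whatever transaction the session has open, which is needed before a link can be closed:

```sql
DECLARE
  TYPE link_list IS TABLE OF VARCHAR2(128);
  TYPE puid_list IS TABLE OF VARCHAR2(4000);  -- assumed type for PUID
  links link_list := link_list('db1', 'db2', 'db3', 'db4', 'db5');
  puids puid_list;
BEGIN
  FOR i IN 1 .. links.COUNT LOOP
    -- Fetch all IDs from the remote table through one link.
    EXECUTE IMMEDIATE 'SELECT PUID FROM TableX@' || links(i)
      BULK COLLECT INTO puids;

    FOR j IN 1 .. puids.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE(puids(j));  -- replace with whatever you do with the IDs
    END LOOP;

    -- A link can only be closed once the transaction that used it has ended.
    COMMIT;
    EXECUTE IMMEDIATE 'ALTER SESSION CLOSE DATABASE LINK ' || links(i);
  END LOOP;
END;
/
```

Because only one link is open at a time, the session should stay under the open_links limit regardless of how many links are in the list. Inserting the fetched values into a local table instead of printing them would let you query the combined list afterwards.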

Java 1.4: how to insert multiple records in a database with one single hit using executeBatch?

I am reading record data from a file (the record count can be up to thousands). Now I want to insert each record into the database. I want to insert all of the records in one hit to reduce the performance cost. If I use addBatch(String sqlQuery) on a Statement object, my SQL query has to be static, but in my case the query will be non-static. Please tell me the possible solutions with the best performance.
platform
java 1.4
sql server 2000.
From Wiki
A SQL feature (since SQL-92) is the use of row value constructors to insert multiple rows at a time in a single SQL statement:
INSERT INTO table (column1 [, column2, ...])
VALUES (value1a [, value1b, ...]),
(value2a [, value2b, ...]),
...
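As a concrete instance of that syntax (the records table and its columns are hypothetical). Note that SQL Server only accepts multi-row VALUES lists from SQL Server 2008 onwards; on SQL Server 2000, as in the question, the same effect is usually achieved with INSERT ... SELECT ... UNION ALL:

```sql
-- Multi-row VALUES (SQL Server 2008+ and most other databases).
INSERT INTO records (id, name)
VALUES (1, 'first'),
       (2, 'second'),
       (3, 'third');

-- Equivalent single statement that also works on SQL Server 2000.
INSERT INTO records (id, name)
SELECT 1, 'first'
UNION ALL SELECT 2, 'second'
UNION ALL SELECT 3, 'third';
```

From JDBC, the other common route for non-static values is a PreparedStatement with placeholders, calling addBatch() once per record and executeBatch() at the end.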

Resources