Executing data validation rules from a table using a procedure with runtime values - oracle

I have a table that is filled with many data validation queries. For example a row:
SELECT end_time - start_time
FROM mt_process_status
WHERE process_id = <PROCESS_ID> AND ref_date = <REF_DATE>
I have to execute all these SQL statements filling the values within '<>' with run time values and check if the performance of the process has not changed.
Can this be done with a stored procedure? I want to understand what the solution would look like. Any links to documentation of this sort of thing, anything to guide me in the right direction.

Somewhere there's a bunch of analysts in your organization telling each other, "We've done the difficult stuff, we've defined the queries. All the database has to do is execute them, how hard can that be?" Answer: very hard.
Let's take the query you posted:
select end_time - start_time from mt_process_status where process_id = <PROCESS_ID> AND ref_date = <REF_DATE>
It's easy enough to use replace(the_str, '<PROCESS_ID>', 1234) to substitute a value. But for ref_date that's presumably a date, so it needs to be replace(the_str, '<REF_DATE>', 'date ''2017-01-01'''). Starting to get icky, and that's just handling literals. It will be even ickier when the substitution values are passed as parameters.
Of course I've made an assumption that PROCESS_ID is numeric. Maybe it isn't. Who can tell? Is there a data dictionary where these details are defined?
It would be easier if the query was defined with dynamic SQL placeholders:
select end_time - start_time from mt_process_status where process_id = :PROCESS_ID AND ref_date = :REF_DATE
Then you could forget about the replace and simply run
execute immediate the_str
into whatever
using 1234, date '2017-01-01';
But you still need to know how many placeholders there are, in what order they occur and what datatype they are. It may feel like this is soft-coded and configurable but there is still a really hard dependency between the query and the program which calls it.
Plus you have lost the ability to do impact analysis. What queries will be affected when you change mt_process_status? Who can tell?
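The dependency on placeholder order can be loosened by binding by name with DBMS_SQL instead of EXECUTE IMMEDIATE. The block below is only a sketch under assumptions: it hard-codes the statement rather than reading it from the rules table, it assumes process_id is numeric, and it assumes start_time and end_time are DATE columns so that their difference is a NUMBER of days. The local variable names are made up for illustration.

```sql
DECLARE
  -- Statement text as it would be stored in the rules table (hard-coded here)
  l_query      VARCHAR2(4000) :=
    'select end_time - start_time from mt_process_status '
    || 'where process_id = :PROCESS_ID and ref_date = :REF_DATE';
  l_process_id NUMBER := 1234;               -- assumption: numeric key
  l_ref_date   DATE   := DATE '2017-01-01';
  l_cur        PLS_INTEGER;
  l_rows       PLS_INTEGER;
  l_duration   NUMBER;                       -- assumption: DATE - DATE, in days
BEGIN
  l_cur := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(l_cur, l_query, DBMS_SQL.NATIVE);

  -- Bind by name, so the order of placeholders in the text no longer matters
  DBMS_SQL.BIND_VARIABLE(l_cur, ':PROCESS_ID', l_process_id);
  DBMS_SQL.BIND_VARIABLE(l_cur, ':REF_DATE', l_ref_date);

  DBMS_SQL.DEFINE_COLUMN(l_cur, 1, l_duration);
  l_rows := DBMS_SQL.EXECUTE_AND_FETCH(l_cur);
  IF l_rows > 0 THEN
    DBMS_SQL.COLUMN_VALUE(l_cur, 1, l_duration);
    DBMS_OUTPUT.PUT_LINE('Duration (days): ' || l_duration);
  END IF;
  DBMS_SQL.CLOSE_CURSOR(l_cur);
EXCEPTION
  WHEN OTHERS THEN
    IF DBMS_SQL.IS_OPEN(l_cur) THEN
      DBMS_SQL.CLOSE_CURSOR(l_cur);
    END IF;
    RAISE;
END;
/
```

The caller still has to know which names exist and what datatype each one expects, so the hard dependency described above does not go away; it only becomes less fragile.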

CREATE OR REPLACE PROCEDURE MY_PROC(g_id IN my_table.id%TYPE)
AS
l_query varchar2(1000);
l_duration number;
BEGIN
-- Get the query from the table
select query into l_query from my_table where id=g_id;
-- l_query contains "select end_time - start_time from mt_process_status where process_id = <PROCESS_ID> and ref_date = <REF_DATE>"
-- Put value to replace the tags (take care of code injection...)
l_query := REPLACE(l_query,'<PROCESS_ID>', something);
l_query := REPLACE(l_query,'<REF_DATE>', something_else);
EXECUTE IMMEDIATE l_query INTO l_duration;
-- Do what you have to do...
-- If you do a function, then you can:
-- RETURN l_duration;
END;
/
Your question is not clear. This is the answer to your question. Use variables in a procedure.
regards.

I would put the reference data in another table and do it as a join

Related

How do I declare a variable and use it in a subsequent query

I'm trying to do something that's really simple in TSQL but I'm utterly stuck doing the same in PlSQL.
I want to do the equivalent of this:-
declare @Today Date = GetDate();
declare @FirstDate Date;
declare @LastDate Date;
with cteStartDates as
(
Select START_DATE
From TABLE1
Union All
Select START_DATE
From TABLE2
)
Select @FirstDate = MIN(START_DATE),
@LastDate = DateAdd(Day, 1, @Today)
From cteStartDates;
Basically, I'm trying to get a start date and an end date, where the start date is the date of the first record in one of two tables and the end date is tomorrow. I then need to use @FirstDate and @LastDate as parameters in a whole bunch of subsequent queries.
I'm falling at the first hurdle in PLSQL. I can't even work out how to do the equivalent of this:-
declare @Today Date = GetDate();
I've tried reading around Oracle Variables but I'm not really understanding it. I don't understand the difference between DEFINE, DECLARE and VARIABLE and no matter which I try I can't seem to get it to work and keep getting problems I don't really understand.
For example (based on Declare but I've tried all the following with Define and Variable also), I've tried this as an experiment (assign a value to variable and then issue an otherwise valid query which doesn't even use the variable):-
Declare
v_Today Date;
Begin
Select sysdate into v_Today from dual;
Select ID From atable;
End;
That tells me I need an Into clause on the second select. I don't really understand why, I'm not trying to assign ID to anything, I just want to select it. I've seen some examples that sort of imply that an into will define the column names (I'm not sure I've understood that correctly though). OK, I tried this:-
Declare
v_Today Date;
Begin
Select sysdate into v_Today from dual;
Select ID into IDColumn From atable;
End;
That gives me a error saying identifier IDColumn must be declared so clearly the into can't simply name columns.
From examples I get the impression that perhaps the begin and end surround the block in which variables are assigned values that can then be used later in the script. So I tried this:-
Declare
v_Today Date;
Begin
Select sysdate into v_Today from dual;
End;
Select v_Today from Dual;
That tells me that it encountered the keyword Select, so it seems I can't simply follow up the declare/begin/end block with a query.
Some examples seem to show that you can assign a variable, execute, then use the variable. So I tried executing the Declare/Begin/End block on its own - that gave me a message saying it ran successfully. Then I tried executing the subsequent Select v_Today from Dual; v_Today is not recognised, so clearly I've lost the value of the variable by splitting up the executions.
I feel like this should be trivially easy but I'm clearly not getting it. Can someone point me in the right direction?
Edit> Ah, finally figured it out. I can use the variables within the Begin and end but I can't just issue a select in there.
A PL/SQL block is enclosed by the BEGIN and END keywords. Once you leave the block, the PL/SQL context ends and you are back in plain SQL, where PL/SQL variables are not known.
In simple words :-)
What you have learned is correct:
you need to select values into variables
variables must be declared in the DECLARE part of the PL/SQL block
if you want to return a variable to the client, e.g. to have it displayed, you can use Oracle's dbms_output package.
Like this:
Declare
v_Today Date;
Begin
Select sysdate into v_Today from dual;
dbms_output.put_line(v_Today);
End;
You will not see any output unless you issue this before the PL/SQL block:
SET SERVEROUTPUT ON
This blog post can help: https://blogs.oracle.com/connect/post/building-with-blocks
Steve published a whole series of posts for PL/SQL beginners. This is just the first one.
The variables in the Declare section can be used in the Begin and End block
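Putting the answers together, a rough PL/SQL counterpart of the T-SQL in the question might look like the sketch below. It keeps everything inside one block, so the variables stay in scope for the follow-up query. TABLE1, TABLE2 and atable come from the question; the created_on column is a hypothetical name used only to show the variables being reused as filters.

```sql
SET SERVEROUTPUT ON
DECLARE
  v_first_date DATE;
  v_last_date  DATE := TRUNC(SYSDATE) + 1;   -- tomorrow at midnight
BEGIN
  -- Earliest start date across both tables
  SELECT MIN(start_date)
    INTO v_first_date
    FROM (SELECT start_date FROM table1
          UNION ALL
          SELECT start_date FROM table2);

  -- A subsequent query that uses both variables; a cursor FOR loop
  -- avoids the need for an INTO clause on a multi-row select
  FOR r IN (SELECT id
              FROM atable
             WHERE created_on >= v_first_date   -- created_on is hypothetical
               AND created_on <  v_last_date)
  LOOP
    DBMS_OUTPUT.PUT_LINE(r.id);
  END LOOP;
END;
/
```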

Performance difference when using parameters or constants in PL/SQL

I have a performance problem.
First PL/SQL (most of the time it never finishes, and the database OS process stays above 90%):
DECLARE
myId nvarchar2(10) := '0;WF21izb0';
BEGIN
insert into MY_TABLE (select * from MY_VIEW where ID = myId);
END;
Second PL/SQL (ends with a successful result in 50s):
BEGIN
insert into MY_TABLE (select * from MY_VIEW where ID = '0;WF21izb0');
END;
select count(*) from MY_VIEW
also never finishes; there are a lot of table joins behind this view.
select count(*) from MY_VIEW where ID = '0;WF21izb0'
ends in 50s with count=60000.
Can somebody explain to me why my first PL/SQL block does not finish after 50s? What is the difference between using a static string and a declared variable?
It boils down to what the DB engine knows about your data and your query, when preparing the query execution plan.
When a literal is placed in your query, it is a part of your query, so is known to the engine responsible for preparing the plan. It can take that literal value into account and decide on an execution plan, that is suitable, e.g. based on the DB data statistics (e.g. that this value is rare).
When you are using a PL/SQL variable, the actual query, for which the plan is determined, is different. It's something like:
insert into MY_TABLE (select * from MY_VIEW where ID = :param)
As you can see, the DB engine now has no information about the value that will be used when the query is executed. So the best it can do in such a scenario is to prepare a plan that is reasonably good for most of the probable values (i.e. one tuned for the values that are most prevalent in the data).
If your data is unbalanced, and the '0;WF21izb0' value is rare (or even non-existent) in your data, a selective index may be used to narrow down what needs to be processed relatively early in the critical parts of the execution plan. This plan will backfire, however, when you use a value that is all over the place: using the index will be counter-productive. A better plan for such a case may be a full table scan, possibly the same one that is used when executing select count(*) from MY_VIEW.
If you are faced with a scenario where you do not know the filtering value up front, you'll have to analyze the view code and try to adjust it so that it also works effectively for less "selective" values. You could try applying some optimizer hints to the query. You could also give up using a view and try your luck with a table function, where you can push your filtering predicates to the exact spots in the query where they can be used most effectively.
Edit:
All in all, follow the advice from the question comments, and examine your execution plans and execution profile data. You should be able to find the culprit. From there it may not be obvious what the solution is, but you know your data and relations much better than we do.
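One way to start examining the plans is to compare what the optimizer proposes for the literal and for the bind version of the same query. This is only a sketch: EXPLAIN PLAN shows an estimated plan, treats the bind as a VARCHAR2 of unknown value, and may differ from the plan actually used at run time.

```sql
-- Plan when the optimizer can see the literal value
EXPLAIN PLAN FOR
  SELECT * FROM my_view WHERE id = '0;WF21izb0';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan when the value is hidden behind a bind variable
EXPLAIN PLAN FOR
  SELECT * FROM my_view WHERE id = :param;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the two outputs differ in the access paths chosen for the tables behind the view, that difference is a good place to begin looking.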
I was checking some traces, but after reading APC's comment and Hilarion's answer I ended up with this solution:
declare
sql_stmt VARCHAR2(200);
id VARCHAR2(10) := '0;WF21izb0';
BEGIN
sql_stmt := 'insert into MY_TABLE (select * from MY_VIEW where ID = :1)';
EXECUTE IMMEDIATE sql_stmt using id;
END;
This is done in 50s, and id can be now a function/procedure parameter.
Thanks for the comments.

How to run a stored procedure in batch mode or with parallel processing

We are iterating over 100k+ records from a global temporary table. The stored procedure below iterates over all records from the global temp table one by one and has to perform the following three steps:
check whether the product exists
check whether the assets inside the product have the 'category' applied
check whether the assets have file names matching '%pdf%'
So each record has to go through these 3 steps, and the final document names are stored in the table for each successful record. If an error occurs in any of the steps, an error message is stored for that record.
The stored procedure below takes a long time because it processes the records sequentially.
Is there any way to make this process faster in the stored procedure itself by processing in batches?
If that is not possible in a stored procedure, can we move this code to Java and run it in multi-threaded mode, e.g. by creating 10 threads where each thread takes one record concurrently and runs this logic? I would be happy if somebody could give some pseudo code.
Which approach would you suggest?
DECLARE
V_NODE_ID VARCHAR2(20);
V_FILENAME VARCHAR2(100);
V_CATEGORY_COUNT INTEGER :=0;
FINAL_FILNAME VARCHAR2(2000);
V_FINAL_ERRORMESSAGE VARCHAR2(2000);
CURSOR C1 IS
SELECT isbn FROM GT_ADD_ISBNS GT;
CURSOR C2(v_isbn in varchar2) IS
SELECT ANP.NODE_ID NODE_ID
FROM
table1 ANP,
table2 ANPP,
table3 AN
WHERE
ANP.NODE_ID=AN.ID AND
ANPP.NODE_ID=ANP.NODE_ID AND
AN.NAME_ID =26 AND
ANP.CATEORGY='category' AND
ANP.QNAME_ID='categories' AND
ANP.NODE_ID IN(SELECT CHILD_NODE_ID
FROM TABLE_ASSOC START WITH PARENT_NODE_ID IN(v_isbn)
CONNECT BY PRIOR CHILD_NODE_ID = PARENT_NODE_ID);
BEGIN
--Iterating all Products
FOR R1 IN C1
LOOP
FINAL_FILNAME :='';
BEGIN
--To check whether Product is exists or not
SELECT AN.ID INTO V_NODE_ID
FROM TABLE1 AN,
TABLE2 ANP
WHERE
AN.ID=ANP.NODE_ID AND
ANP.VALUE in(R1.ISBN);
V_CATEGORY_COUNT :=0;
V_FINAL_ERRORMESSAGE :='';
--To check Whether Product inside the assets are having the 'category' is applied or not
FOR R2 IN C2(R1.ISBN)
LOOP
V_CATEGORY_COUNT := V_CATEGORY_COUNT+1;
BEGIN
--In this Logic Product inside the assets have applied the 'category' But those assets are having documents LIKE '%pdf%' or not
SELECT ANP.STRING_VALUE into V_FILENAME
FROM
table1 ANP,
table2 ANPP,
table3 ACD
WHERE
ANP.QNAME_ID=21 AND
ACD.ID=ANPP.LONG_VALUE AND
ANP.NODE_ID=ANPP.NODE_ID AND
ANPP.QNAME_ID=36 AND
ANP.STRING_VALUE LIKE '%pdf%' AND
ANP.NODE_ID=R2.NODE_ID;
FINAL_FILNAME := FINAL_FILNAME || V_FILENAME ||',';
EXCEPTION WHEN
NO_DATA_FOUND THEN
V_FINAL_ERRORMESSAGE:=V_FINAL_ERRORMESSAGE|| 'Category is applied for this Product But for the asset:'|| R2.NODE_ID || ':Documents[LIKE %pdf%] were not found ;';
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= V_FINAL_ERRORMESSAGE WHERE ISBN= R1.ISBN;
END;--Iterating for each NODEID
END LOOP;--Iterating the assets[Nodes] for each product of catgeory
-- DBMS_OUTPUT.PUT_LINE('R1.ISBN:' || R1.ISBN ||'::V_CATEGORY_COUNT:' || V_CATEGORY_COUNT);
IF(V_CATEGORY_COUNT = 0) THEN
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= 'Category is not applied to none of the Assets for this Product' WHERE ISBN= R1.ISBN;
END IF;
EXCEPTION WHEN
NO_DATA_FOUND THEN
UPDATE GT_ADD_ISBNS SET ERROR_MESSAGE= 'Product is not Found:' WHERE ISBN= R1.ISBN;
END;
-- DBMS_OUTPUT.PUT_LINE( R1.ISBN || 'Final documents:'||FINAL_FILNAME);
UPDATE GT_ADD_ISBNS SET FILENAME=FINAL_FILNAME WHERE ISBN= R1.ISBN;
COMMIT;
END LOOP;--looping gt_isbns
END;
You have a number of potential performance hits. Here's one:
"We are iterating 100k+ records from global temporary table"
Global temporary tables can be pretty slow. Populating them means writing all that data to disk; reading from them means reading from disk. That's a lot of I/O which might be avoidable. Also, GTTs use the temporary tablespace so you may be in contention with other sessions doing large sorts.
Here's another red flag:
FOR R1 IN C1 LOOP
... FOR R2 IN C2(R1.ISBN) LOOP
SQL is a set-based language. It is optimised for joining tables and returning sets of data in a highly performant fashion. Nested cursor loops mean row-by-row processing, which is undoubtedly easier to code but may be orders of magnitude slower than the equivalent set operation would be.
--To check whether Product is exists or not
You have several queries selecting from the same tables (AN, ANP) using the same criteria (isbn). Perhaps all these duplicates are the only way of validating your business rules but it seems unlikely.
FINAL_FILNAME := FINAL_FILNAME || V_FILENAME ||',';
Maybe you could rewrite your query to use listagg() instead of using procedural logic to concatenate a string?
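A sketch of the listagg() idea, assuming the file names come from table1 filtered by QNAME_ID = 21 as in the question's innermost query. The joins to table2 and table3 are left out for brevity and would have to be added back in a real rewrite.

```sql
-- One row per node, with all matching file names concatenated
SELECT anp.node_id,
       LISTAGG(anp.string_value, ',')
         WITHIN GROUP (ORDER BY anp.string_value) AS filenames
  FROM table1 anp
 WHERE anp.qname_id = 21
   AND anp.string_value LIKE '%pdf%'
 GROUP BY anp.node_id;
```

Unlike the procedural version, this does not leave a trailing comma, and LISTAGG raises an error if the concatenated result exceeds the VARCHAR2 limit, so very long lists need separate handling.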
UPDATE GT_ADD_ISBNS
Again, all your updates are single row operations instead of set ones.
"Is there any way to make this process faster in the stored procedure itself by doing batch process?"
Without knowing your rules and the context we cannot rewrite your logic for you, but 15-16 hours is way too long for this so you can definitely reduce the elapsed time.
Things to consider:
Replace the writing and reading to the temporary table with the query you use to populate it
Rewrite the loops to use BULK COLLECT with a high LIMIT (e.g. 1000) to improve the select efficiency.
Populate arrays and use FORALL to improve the efficiency of the updates.
Try to remove all those individual look-ups by incorporating the logic into the main query, using OUTER JOIN syntax to test for existence.
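The BULK COLLECT and FORALL suggestions can be sketched roughly as follows. This is not a rewrite of the procedure in the question: the message assigned to each row is a placeholder standing in for the real validation logic, and the collection type and variable names are made up for illustration. It also assumes the temporary table preserves rows on commit, as the original per-row COMMIT implies.

```sql
DECLARE
  CURSOR c_isbns IS
    SELECT isbn FROM gt_add_isbns;

  TYPE t_isbn_tab IS TABLE OF gt_add_isbns.isbn%TYPE;
  TYPE t_msg_tab  IS TABLE OF VARCHAR2(2000);

  l_isbns t_isbn_tab;
  l_msgs  t_msg_tab;
BEGIN
  OPEN c_isbns;
  LOOP
    -- Fetch up to 1000 rows per round trip instead of one at a time
    FETCH c_isbns BULK COLLECT INTO l_isbns LIMIT 1000;
    EXIT WHEN l_isbns.COUNT = 0;

    -- Work out the result for each row in memory
    l_msgs := t_msg_tab();
    l_msgs.EXTEND(l_isbns.COUNT);
    FOR i IN 1 .. l_isbns.COUNT LOOP
      l_msgs(i) := 'checked';  -- placeholder for the real validation result
    END LOOP;

    -- Send all updates for this batch to the SQL engine in one call
    FORALL i IN 1 .. l_isbns.COUNT
      UPDATE gt_add_isbns
         SET error_message = l_msgs(i)
       WHERE isbn = l_isbns(i);
  END LOOP;
  CLOSE c_isbns;
  COMMIT;
END;
/
```

Committing once at the end rather than once per row also removes a per-iteration cost that the original loop pays 100k+ times.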
These are all guesses. If you really want to know where the procedure is spending the time - and that knowledge is the root of all successful tuning, so you ought to want to know - you should run the procedure under a PL/SQL Profiler. This will tell you which lines cost the most time, and those are usually the ones where you need to focus your tuning effort. If you don't already have access to DBMS_PROFILER you will need a DBA to run the install script for you.
"can we change this code into Java and run this code in multi threaded mode?"
Given that one of the reasons for slowing down the procedure is the I/O cost of selecting from the temporary table there's a good chance multi-threading might introduce further contention and actually make things worse. You should seek to improve the stored procedure first.

How to find the columns used in a dynamic query without executing the whole query

Problem Statement
I have a dynamic SQL statement which I need to store in a table, but before storing it I need to validate it against the list of columns stored in another table.
Without executing the query, is it possible to find the names of the columns in the select list?
Approach1
The only option I can think of is to use the explain plan of the query and read the metadata from the data dictionary tables. Unfortunately I am not able to find any table with such data. Please let me know if you know of such views.
Approach2
Use the DBMS_SQL.DESCRIBE_COLUMNS procedure to find the column names, but I believe this will execute the whole query.
You don't need to execute the query to get the column names, you just need to parse it; e.g. as a simple example:
set serveroutput on
declare
l_statement varchar2(4000) := 'select * from employees';
l_c pls_integer;
l_col_cnt pls_integer;
l_desc_t dbms_sql.desc_tab;
begin
l_c := dbms_sql.open_cursor;
dbms_sql.parse(c=>l_c, statement=>l_statement, language_flag=>dbms_sql.native);
dbms_sql.describe_columns(c=>l_c, col_cnt=>l_col_cnt, desc_t=>l_desc_t);
for i in 1..l_col_cnt loop
dbms_output.put_line(l_desc_t(i).col_name);
end loop;
dbms_sql.close_cursor(l_c);
exception
when others then
if (dbms_sql.is_open(l_c)) then
dbms_sql.close_cursor(l_c);
end if;
raise;
end;
/
which outputs:
EMPLOYEE_ID
FIRST_NAME
LAST_NAME
EMAIL
PHONE_NUMBER
HIRE_DATE
JOB_ID
SALARY
COMMISSION_PCT
MANAGER_ID
DEPARTMENT_ID
PL/SQL procedure successfully completed.
You can do whatever validation you need on the column names inside the loop.
Bear in mind that you'll only see (and validate) the column names or aliases for column expressions, which won't necessarily reflect the data that is actually being retrieved. Someone could craft a query that pulls any data from anywhere it has permission to access, but then gives the columns/expression aliases that are considered valid.
If you're trying to restrict access to specific data then look into other mechanisms like views, virtual private database, etc.
DBMS_SQL.PARSE will not execute a SELECT statement but it will execute a DDL statement. If the string 'select * from employees' is replaced by 'drop table employees' the code will fail but the table will still get dropped.
If you're only worried about the performance of retrieving the metadata then Alex Poole's answer will work fine.
If you're worried about running the wrong statement types then you'll want to make some adjustments to Alex Poole's answer.
It is surprisingly difficult to tell if a statement is a SELECT instead of something else. A simple condition checking that the string begins with select will work 99% of the time but getting from 99% to 100% is a huge amount of work. Simple regular expressions cannot keep up with all the different keywords, comments, alternative quoting format, spaces, etc.
/*comment in front -- */ select * from dual
select * from dual
with asdf as (select * from dual) select * from asdf;
((((((select * from dual))))));
If you need 100% accuracy I recommend you use my open source PLSQL_LEXER. Once installed you can reliably test the command types like this:
select
statement_classifier.get_command_name(' /*comment*/ ((select * from dual))') test1,
statement_classifier.get_command_name('alter table asdf move compress') test2
from dual;
TEST1 TEST2
----- -----
SELECT ALTER TABLE

How to retrieve parsed dynamic PL/SQL

I have many PL/SQL functions and procedures that execute dynamic sql.
Is it possible to extract the parsed statements and print them with dbms_output as a debugging aid?
What I really want is to see the parsed sql (sql statement with substituted parameters).
Example:
I have a dynamic SQL statement like this
SQ:='SELECT :pComno as COMNO,null t$CPLS,t$CUNO,t$cpgs,t$stdt,t$tdat,t$qanp,t$disc,:cS Source FROM BAAN.TTDSLS031'||PCOMNO --1
|| ' WHERE' ||' TRIM(T$CUNO)=trim(:CUNO)' --2
|| ' AND TRIM(T$CPGS)=trim(:CPGS)' --3
|| ' AND T$QANP = priceWorx.fnDefaultQanp ' --4
|| ' AND priceWorx.fdG2J(sysdate) between priceWorx.fdG2J(t$stdt) and priceWorx.fdG2J(t$tdat)' --5
|| ' AND rownum=1 order by t$stdt';--6
execute immediate SQ into R using
PCOMNO,'C' --1
,PCUNO-- 2
,PCPGS;-- 3
What statement will be sent to the server?
You can display the bind variables associated with a SQL statement like this:
select v$sql.sql_text
,v$sql_bind_capture.*
from v$sql_bind_capture
inner join v$sql on
v$sql_bind_capture.hash_value = v$sql.hash_value
and v$sql_bind_capture.child_address = v$sql.child_address
--Some unique string from your query
where lower(sql_text) like lower('%priceWorx.fdG2J(sysdate)%');
You probably would like to see the entire query, with all the bind variables replaced by their actual values. Unfortunately, there's no easy way to get exactly what you're looking for, because of the following issues.
V$SQL_BIND_CAPTURE doesn't store all of the bind variable information. The biggest limitation is that it only displays data "when the bind variable is used in the WHERE or HAVING clauses of the SQL statement."
Matching the bind variable names from the bind capture data to the query is incredibly difficult. It's easy to get it working 99% of the time, but that last 1% requires a SQL and PL/SQL parser, which is basically impossible.
SQL will age out of the pool. For example, if you gather stats on one of the relevant tables, it may invalidate all queries that use that table. You can't always trust V$SQL to have your query.
Which means you're probably stuck doing it the ugly way. You need to manually store the SQL and the bind variable data, similar to what user1138658 is doing.
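A minimal sketch of that manual approach, under assumptions: the dyn_sql_log table and the log_dyn_sql procedure are hypothetical names invented here, and the bind values are passed as one pre-formatted string because the number and types of binds differ per statement.

```sql
-- Hypothetical log table for dynamic statements and their bind values
CREATE TABLE dyn_sql_log (
  logged_at TIMESTAMP DEFAULT SYSTIMESTAMP,
  sql_text  CLOB,
  binds     VARCHAR2(4000)
);

-- Hypothetical helper; the autonomous transaction lets the log row
-- survive even if the caller later rolls back
CREATE OR REPLACE PROCEDURE log_dyn_sql (
  p_sql   IN CLOB,
  p_binds IN VARCHAR2
) AS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO dyn_sql_log (sql_text, binds)
  VALUES (p_sql, p_binds);
  COMMIT;
END log_dyn_sql;
/
```

In the question's code, a call such as log_dyn_sql(SQ, 'pComno=' || PCOMNO || ', cS=C, CUNO=' || PCUNO || ', CPGS=' || PCPGS); placed just before the execute immediate would record both the statement text and the values bound to it.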
You can do this with the dbms_output package. You can enable and disable the debug output, and retrieve the lines with the get_line procedure.
I tested it with execute immediate, inserting into a table, and it works.
I recently answered another question with an example of this.
One possible solution is to create a table temp(id number, data clob) in your schema and then put an insert statement wherever you want to capture the dynamic statement:
insert into temp values(seq.nextval,v_text);
For example
declare
v_text varchar2(2000);
c_cur sys_refcursor;
begin
v_text := 'select * from emp'; -- your dynamic statement
insert into temp values (seq.nextval, v_text); -- log the statement wherever you want to capture the actual query
open c_cur for v_text;
-- ... fetch from c_cur as needed ...
close c_cur;
end;
/
Now if you see the table temp, you'll get the data for that dynamic statement.

Resources