Is it possible to perform parallel nested transactions? - oracle

I've a scenario were I need to call a set of different Oracle procedures in parallel. These procedures must share the same initial context which has uncommitted transactions. I cannot commit the parent transaction under the danger of having read inconsistency between those parallel processes.
Is it possible in PL/SQL?

One thing comes to my mind: mapreduce with table functions http://blogs.oracle.com/datawarehousing/entry/mapreduce_oracle_tablefunction
I've used this in several scenarios to run things concurrently, though I'm not sure it is applicable to your problem.

You may be able to accomplish this using the DBMS_XA package, which allows you to "switch or share transactions across SQL*Plus sessions or processes using PL/SQL".
Oracle-Base has a good example of how to use the package.
(But if your goal is to use parallelism to improve performance you should use normal statement-level parallel execution instead.)

Far as I know: no.
DBMS_JOB and DBMS_SCHEDULER can be used to run Oracle procedures in parallel but they run them in their own sessions.

Related

Run/execute multiple procedures in Parallel - Oracle PL/SQL

I have an Activity table which is getting all the table events of the system. Events like new orders, insertion/deletion on all the system tables will be inserted into this table. So, the no of events/sec is really huge for Activity table.
Now, I want to process the incoming events based on the business logic depending on the table responsible for raising the event. Every table may have different procedure to do the processing.
I used the same link
Parallelizing calls in PL/SQL
As a solution I have created multiple dbms_scheduler jobs which will be called at the same time. All these jobs (JOB1, JOB2--- - -JOB10) will have the same procedure (ProcForAll_Processing) as JOB_ACTION to achieve parallel processing.
begin
dbms_scheduler.run_job('JOB1',false);
dbms_scheduler.run_job('JOB2',false);
end;
ProcForAll_Processing: This procedure in turn will call 6 other procedures
Proc1,proc2,proc3 --- -- - -- - Proc6 in sequential manner. I want to achieve parallel processing for these as well.
P.S: We can’t create further jobs to achieve parallel processing in ProcForAll_Processing proc as it may lead to consume further resources and also DBA is not agreeing for creating further jobs. Also, I can't use
dbms_parallel_execute for parallel processing.
Please help me as I am really stuck to get it done
It is impossible in general case without jobs, and it will make multiple sessions for this. There is no such thing as multithreading PL\SQL with a few exceptions. One of them is parallel execution of sql statements [1]. So there are some attempts to abuse this stuff for parallel execution of PL\SQL code, for example try to look here [2].
But as i've said it's abuse IMHO.
Reference:
https://docs.oracle.com/cd/B19306_01/server.102/b14223/usingpe.htm
http://www.williamrobertson.net/documents/parallel-plsql-launcher.html
Get a new DBA. Or even better, cut them out of any decision making processes. A DBA should not review your code and should not tell you to not create jobs, unless there is a good, specific reason.
Using DBMS_SCHEDULER to run things in parallel is by far the easiest and most common way to achieve this result. Of course it's going to consume more resources, that's what parallelism will inevitably do.
Another, poorer option, is to use parallel pipelined table functions. It's an advanced PL/SQL feature that can't be easily explained in a simple example. The best I can do is refer you to the manual.
You should try to use DBMS_PARALLEL_EXECUTE (since RDBMS 11).
https://blogs.oracle.com/warehousebuilder/entry/parallel_processing_in_plsql
https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2

How does the PL/SQL context work?

i have some doubts about the PL/SQL context, are there:
the PL/SQL context is static ?
the PL/SQL context is sync ?
if a procedure was called two times at the same time, the first one takes 20 seconds to complete.. will the second one wait this 20 seconds to start its execution ?
thanks.
Each database session that references a package has an independent instance of the package. All package state (i.e. global package variables) is distinct to each session.
There is no synchronization between multiple sessions invoking the same package procedures or functions -- except what might occur as a natural side effect of the database operations that they perform and the locking required to achieve them.
Your questions are a bit difficult to understand. I'll try to answer it any way.
PL/SQL is a procedural language. So I wouldn't talk about instances. The code only exists once, the package variables exist once per session and the local variables exist once per call of the procedure or function. You cannot access aonther session's variables or memory.
All calls to PL/SQL code are synchronuous. There are no concepts like multi-threading or shared memory (in PL/SQL). Note however, that Oracle is a multi-user system. So other sessions might be changing data in the database at the same time. And many of these changes a temporarily hidden from you due to transaction isloation. But it doesn't influence any variables in memory.
Procedures never block unless they try to change the same database row as another session. But this isn't related to any piece of PL/SQL code and can also be experienced with an SQL command run from a different tool.

One data store. Multiple processes. Will this SQL prevent race conditions?

I'm trying to create a Ruby script that spawns several concurrent child processes, each of which needs to access the same data store (a queue of some type) and do something with the data. The problem is that each row of data should be processed only once, and a child process has no way of knowing whether another child process might be operating on the same data at the same instant.
I haven't picked a data store yet, but I'm leaning toward PostgreSQL simply because it's what I'm used to. I've seen the following SQL fragment suggested as a way to avoid race conditions, because the UPDATE clause supposedly locks the table row before the SELECT takes place:
UPDATE jobs
SET status = 'processed'
WHERE id = (
SELECT id FROM jobs WHERE status = 'pending' LIMIT 1
) RETURNING id, data_to_process;
But will this really work? It doesn't seem intuitive the Postgres (or any other database) could lock the table row before performing the SELECT, since the SELECT has to be executed to determine which table row needs to be locked for updating. In other words, I'm concerned that this SQL fragment won't really prevent two separate processes from select and operating on the same table row.
Am I being paranoid? And are there better options than traditional RDBMSs to handle concurrency situations like this?
As you said, use a queue. The standard solution for this in PostgreSQL is PgQ. It has all these concurrency problems worked out for you.
Do you really want many concurrent child processes that must operate serially on a single data store? I suggest that you create one writer process who has sole access to the database (whatever you use) and accepts requests from the other processes to do the database operations you want. Then do the appropriate queue management in that thread rather than making your database do it, and you are assured that only one process accesses the database at any time.
The situation you are describing is called "Non-repeatable read". There are two ways to solve this.
The preferred way would be to set the transaction isolation level to at least REPEATABLE READ. This will mean that any row that concurrent updates of the nature you described will fail. if two processes update the same rows in overlapping transactions one of them will be canceled, its changes ignored, and will return an error. That transaction will have to be retried. This is achieved by calling
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
At the start of the transaction. I can't seem to find documentation that explains an idiomatic way of doing this for ruby; you may have to emit that sql explicitly.
The other option is to manage the locking of tables explicitly, which can cause a transaction to block (and possibly deadlock) until the table is free. Transactions won't fail in the same way as they do above, but contention will be much higher, and so I won't describe the details.
That's pretty close to the approach I took when I wrote pg_message_queue, which is a simple queue implementation for PostgreSQL. Unlike PgQ, it requires no components outside of PostgreSQL to use.
It will work just fine. MVCC will come to the rescue.

Event-driven programming in PL/SQL

I have two PL/SQL systems, residing in two separate databases. SystemA will need to populate SystemB's tables. This will probably be done over a datalink. Everytime a set of records is inserted in SystemB's tables, a process in SystemB must run. I could wait for SystemA to complete and then run a script to start processing in SystemB, but since SystemA could spend many hours processing and then populating SystemB, I'd rather that SystemB handle each set of records as soon as they become available (each set can be processed indpendently of the others so this should work OK).
What I'm not sure of is how I can do even-driven programming in PL/SQL. I'd need SystemA to notify SystemB that a set is ready for processing. My first idea was to have a special "event" table in SystemB and then when SystemA finishes a set, it inserts into the "event" table and there is a trigger on insert that starts the process (and the process could be a long one, possibly 5-10 minutes per process) in SystemB. I don't have enough experience with triggers in Oracle to know if this is an established way of doing it, OR if there's a better mechanism. Suggestions? Tips? Advice?
Use Oracle Advanced Queuing; it's designed for this. I believe you'll still have to set up a database link between the two systems (from B to A in this case, to consume the queue on A).
Yes, Oracle Advance Queues or even having A submit a venerable Oracle Job to B would be a better idea.
And, if your process is going to be needing complete replication of the data from A to B, then you might want to look something like an Oracle Streams process to copy over the data and then do the processing.

Can We use threading in PL/SQL?

Is there any feature of asynchronous calling in PL/SQL?
Suppose I am in a block of code would like to call a procedure multiple times and wouldn't bother when and what the procedure returns?
BEGIN
myProc(1,100);
myProc(101,200);
myProc(201,300);
...
...
END;
In the above case, I don't want my code to wait for myProc(1,100) to finish processing before executing(101,200)
Thanks.
+1 for DBMS_SCHEDULER and DBMS_JOB approaches, but also consider whether you ought to be using a different approach.
If you have a procedure which executes in a row-by-row manner and you find that it is slow, the answer is probably not to run the procedure multiple times simltaneously but to ensure that a set-based aproach is used instead. At an extreme you can even then use parallel query and parallel DML to reduce the wall clock time of the process.
I mention this only because it is a very common fault.
Submit it in a DBMS_JOB like so:
declare
ln_dummy number;
begin
DBMS_JOB.SUBMIT(ln_dummy, 'begin myProc(1,100); end;');
DBMS_JOB.SUBMIT(ln_dummy, 'begin myProc(101,200); end;');
DBMS_JOB.SUBMIT(ln_dummy, 'begin myProc(201,300); end;');
COMMIT;
end;
You'll need the job_queue_processes parameter set to >0 to spawn threads to process the jobs. You can query the jobs by examining the view user_jobs.
Note that this applies to Oracle 9i, not sure what support 10g has. See more info here.
EDIT: Added missed COMMIT
You may want to look into DBMS_SCHEDULER.
Edited for completeness:
DMBS_SCHEDULER is available on Oracle 10g. For versions before this, DBMS_JOB does approximately the same job.
For more information, see: http://download.oracle.com/docs/cd/B12037_01/server.101/b10739/jobtosched.htm
For PL/SQL Parallel processing you have the following options:
DBMS_SCHEDULER (newer)
DBMS_JOB (older)
Parallel Query
These will let you "emulate" forking and threading in PL/SQL. Of course, using these, you may realize the need to communicate between parallel executed procedures. To do so check out:
Advanced Queuing
DBMS_ALERT
DBMS_PIPE
Personally I've implemented a parallel processing system using DBMS_Scheduler, and used DBMS_Pipe to communicate between "threads". I was very happy with the combination of the two, and my main goal (to reduce major processing times with a particular heavy-weight procedure) was achieved!
You do have another option starting in 11g. Oracle has introduced a package that does something similar to what you want to do, named DBMS_PARALLEL_EXECUTE
According to them, "The DBMS_PARALLEL_EXECUTE package enables the user to incrementally update table data in parallel". A fairly good summary of how to use it is here
Basically, you define a way that Oracle should use to break your job up into pieces (in your case by you seem to be passing some key value), and then it will start each of the pieces individually. There is certainly a little planning and a little extra coding involved in order to use it, but nothing that you shouldn't have been doing anyways.
The advantage of using a sanctioned method such as this is that Oracle even provides database views that can be used to monitor each of the independent threads.
Another way of doing parallel (multi-threaded) PL/SQL is shown here:
http://www.williamrobertson.net/documents/parallel-plsql-launcher.html
The disadvantage of using dbms_job or dbms_schedular is that you don't really know when your tasks are finished. I read that you don't bother but maybe you will change your mind in the future.
EDIT:
This article http://www.devx.com/dbzone/10MinuteSolution/20902/0/page/1 describes another way. It uses dbms_job and dbms_alert. The alerts are used to signal that the jobs are done (callback signal).
Here an explanation of different ways of unloading data to a flat file. One of the ways shows how you can do parallel execution with PL/SQL to speed things up.
http://www.oracle-developer.net/display.php?id=425
The parallel pipelined approach listed here by askTom provides a more complex approach, but you will actually pause until the work is complete, unlike the DBMS Job techniques. That said, you did ask for the "asynchronous" technique, and DBMS_JOB is perfect for that.
Have you considered using Oracle Advaned Queuing?

Resources