I have a DB job running daily that manages to process 10,000 rows from a table of 3,500,000 rows in three hours.
Tuning the main cursor's select statement can only save me 30 minutes, but I need to reduce the job running time from 3 hours to 10-15 minutes.
I should state that there is only the main loop over the cursor, and for each record there are calls to external systems to get or send data, so that is an overhead I cannot control. The time to process each record after it is fetched is a little less than a second, and that is not acceptable ...
Is there something I could do? All ideas are more than welcome!
IMHO, you can submit a job for each query to the external system, or try to run them in parallel; maybe you can use Advanced Queuing (AQ). To explain: send each selected row to a queue, and the queries to the external system will then be processed from the AQ.
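A minimal sketch of the enqueue side, assuming a queue named ROW_QUEUE has already been created with DBMS_AQADM against a payload object type MY_ROW_T (both names are made up here); consumer jobs would then dequeue and make the external calls concurrently:

DECLARE
  l_enq_opts  dbms_aq.enqueue_options_t;
  l_msg_props dbms_aq.message_properties_t;
  l_msgid     RAW(16);
  l_payload   my_row_t;   -- hypothetical object type matching the queue's payload
BEGIN
  FOR r IN (SELECT id, col1, col2 FROM big_table) LOOP   -- the main cursor
    l_payload := my_row_t(r.id, r.col1, r.col2);
    dbms_aq.enqueue(queue_name         => 'row_queue',
                    enqueue_options    => l_enq_opts,
                    message_properties => l_msg_props,
                    payload            => l_payload,
                    msgid              => l_msgid);
  END LOOP;
  COMMIT;
END;
/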
You may try to process rows in parallel.
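If the per-row work is independent, one way to run it in parallel inside the database is DBMS_PARALLEL_EXECUTE; a rough sketch, assuming the per-row logic is wrapped in a procedure process_range(p_start_rowid, p_end_rowid) (a made-up name) and the table is called BIG_TABLE:

BEGIN
  dbms_parallel_execute.create_task('process_big_table');
  -- split the table into rowid chunks of roughly 1000 rows each
  dbms_parallel_execute.create_chunks_by_rowid(task_name   => 'process_big_table',
                                               table_owner => USER,
                                               table_name  => 'BIG_TABLE',
                                               by_row      => TRUE,
                                               chunk_size  => 1000);
  -- run 8 job slaves; each one calls the procedure for one chunk at a time
  dbms_parallel_execute.run_task(task_name      => 'process_big_table',
                                 sql_stmt       => 'BEGIN process_range(:start_id, :end_id); END;',
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 8);
  dbms_parallel_execute.drop_task('process_big_table');
END;
/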
Related
I need to create several tens of millions of jobs.
I have tried it with for-loops and Bus::batch([]), and unfortunately the creation of the jobs takes longer than the processing of the jobs by the 10 servers/workers. That means the workers have to wait until the jobs show up in the queue backend (Redis etc.). With redis-benchmark I could verify that Redis is not the problem.
Anyway... is there a way to create jobs in BULK (not batch)? I'm just thinking of something like:
INSERT INTO ... () VALUES (), (), (), (), ...
Anyway, creating several million jobs in a for-loop or in a batch seems to be very slow for some reason, probably because it is always just one query at a time rather than a single multi-row insert.
For any help I would be very grateful!
Writing a million records will be somewhat slow under any conditions. I'd recommend maximizing your queue performance using several methods:
Create one job that creates all the other jobs, if possible
Use QUEUE_CONNECTION=redis for your queues, as Redis stores data in RAM, which is the fastest option
Create your jobs after the response has already been processed
An Oracle stored procedure suddenly throws ORA-01555 while executing.
SELECT a, b
  INTO a_var, b_var
  FROM table1 s
 WHERE s.abc = SYSDATE
   AND requiredate BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
   AND currency = NVL(currency_code, 'USD')
 GROUP BY s.actcount;

table1_invoice(1) := a_var;
table1_invoice(2) := b_var;

FORALL indx IN 1 .. table1_invoice.COUNT SAVE EXCEPTIONS
  INSERT INTO table2 VALUES table1_invoice(indx);
While the procedure was running and using table A, I executed an index rebuild in parallel on the same table.
Once that completed, I gathered stats on table A.
Could these things cause error ORA-01555? Does an index rebuild consume a rollback segment so that the old snapshot of the data is removed?
I have pasted dummy code.
I execute index re-build in parallel on the same table.
This is your likely cause. ORA-1555 pertains to being able to give you a consistent view of the data. For example, using your dummy code as a template:
You open your cursor at 9am
You start fetching from that cursor at 9am, and let's say the total execution of the query takes 60 seconds.
So let's say you are at the 40-second mark of that fetch. Because (you) reading data does not block others from changing it, you might come across some data that has been recently changed (say 3 seconds ago) by someone else.
We can't give you THAT data, because we have to show you the data as it was at 9am (when your query started).
So we find the transaction(s) that changed that data 3 seconds ago, and use the undo information those transactions wrote to reverse out the changes. We'll continue to do that until the data looks like it did at 9am.
Now we can use that (undone) data because it is consistent with the time you opened the cursor.
So where does ORA-1555 fit in? What if our query ran for (say) an hour? Now we might need to be undoing transactions that ran nearly an hour ago. There is only so much space we reserve for the undo of (completed) transactions, because we need to free it up for new transactions as they come in. We throw away the old stuff because those transactions have committed. So, revisiting the processing above, the following might happen:
You open your cursor at 9am
You start fetching from that cursor at 9am, and let's say the total execution of the query takes 60 seconds.
So let's say you are at the 40-second mark of that fetch. Because (you) reading data does not block others from changing it, you might come across some data that has been recently changed (say 3 seconds ago) by someone else.
We can't give you THAT data, because we have to show you the data as it was at 9am (when your query started).
So we find the transaction(s) that changed that data 3 seconds ago and use the undo information those transactions wrote to reverse out the changes.
OH NO! That undo information has been discarded!!!
Now we're stuck: we cannot give you the data as it was at 9am anymore, because we can't take some of the changed data all the way back to 9am. The snapshot in time of the data you want is too old.
Hence "ORA-1555: Snapshot too old"
This is why the most common solution is just to retry your operation because now you are starting your query at a more recent time.
So you can see - the more activity going on against the database from OTHER sessions at the time of your query, the greater the risk of hitting an ORA-1555, because undo space is being consumed quickly and thus we might throw away the older stuff more rapidly.
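If retrying is not enough, it can help to compare how much undo retention the instance targets with how long your queries actually run. A rough sketch, assuming automatic undo management and access to the V$ views:

-- configured undo retention target, in seconds
SELECT value FROM v$parameter WHERE name = 'undo_retention';

-- longest query seen recently, and the retention Oracle has auto-tuned
SELECT MAX(maxquerylen)         AS longest_query_seconds,
       MAX(tuned_undoretention) AS tuned_retention_seconds
  FROM v$undostat;

-- raise the retention target if the undo tablespace can accommodate it
ALTER SYSTEM SET undo_retention = 3600;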
Env: Oracle 12c R2
I am trying to understand the best approach to set up an Oracle DBMS_SCHEDULER job that monitors a DBMS_ALERT signal raised by a trigger when a specific column value changes within a table.
The thing is, this column value will sometimes change frequently and sometimes only twice a day, but either way I need to monitor the change via DBMS_ALERT.
The trigger I have is as follows, and I have a procedure called check_signal that checks for the signal, which I wish to use within the DBMS_SCHEDULER job.
The goal that I am trying to achieve is that I am going to have the situation where I will need to run say, three jobs:
Job1
Job2
Job3
The thing is, the payload returned from Job1 is required and passed as parameters into Job2, and again, the payload returned from Job2 is required and passed as parameters into Job3.
It is this wait/alert that I am trying to achieve through the use of DBMS_ALERTS.
create or replace trigger my_tab_upd
  after update of status on my_tab
  for each row
begin
  dbms_alert.signal('mystatusalert', 'changed from '||:old.status||' to '||:new.status||'.');
end;
/
This will be used via a web-based application which is used by multiple users.
I am just unsure how to set up this scheduled job that will continuously check for the alert and then be used within the web app.
If there is a better means than DBMS_ALERT, then please let me know.
The general answer is simple: when polling for events every N seconds, you get an average delay of N/2 seconds and a maximal delay of N seconds.
In the context of DBMS_ALERT you should rethink this approach, as it would implement polling combined with waiting on the event.
The periodically executed job basically does two things:
DBMS_ALERT.REGISTER on an event name
wait with DBMS_ALERT.WAITONE
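For illustration, a minimal sketch of such a job body, assuming the alert name from your trigger and that your check_signal procedure accepts the alert message as its parameter (that signature is an assumption):

declare
  l_message varchar2(1800);
  l_status  integer;
begin
  dbms_alert.register('mystatusalert');
  -- block for up to 300 seconds; status 0 means an alert arrived, 1 means timeout
  dbms_alert.waitone(name    => 'mystatusalert',
                     message => l_message,
                     status  => l_status,
                     timeout => 300);
  if l_status = 0 then
    check_signal(l_message);   -- assumed signature
  end if;
  dbms_alert.remove('mystatusalert');
end;
/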
Assume that the DBMS_SCHEDULER job runs every 10 seconds and is started in a phase with frequent signalling, so the first execution returns quickly after receiving an event.
The second execution falls in the quiet period, so the job may wait for hours to get an event.
I think this is not what you expect, because:
1) The waiting job will hold an open session, which is what you want to avoid, as follows from your other question.
You may use timeout = 0 in the DBMS_ALERT.WAITONE, but this will catch close to no events, except those fired by chance between the REGISTER and the WAITONE.
2) If two events are signalled within the first 10 seconds, the second one will be lost, because at signalling time the subscribing job is not active and no registration exists.
The situation is simple: there is a table in Oracle used as a "shared table" for data exchange. The table structure and the number of records remain unchanged. In the normal case, I continuously update data in this table while other processes read it for the current data.
The strange thing is, when my process starts, each update statement takes approximately 2 ms to execute. After a certain period of time (like 8 hours), the time consumption increases to 10~20 ms per statement, which makes the procedure quite slow.
The structure of the table:
And the update statement is like:
anaNum = anaList.size();
qry.prepare(tr("update YC set MEAVAL=:MEAVAL, QUALITY=:QUALITY, LASTUPDATE=:LASTUPDATE where YCID=:YCID"));
foreach (STbl_ANA ana, anaList)
{
    qry.bindValue(":MEAVAL", ana.meaVal);
    qry.bindValue(":QUALITY", ana.quality);
    qry.bindValue(":LASTUPDATE", QDateTime::fromTime_t(ana.lastUpdate));
    qry.bindValue(":YCID", ana.ycId);
    if (!qry.exec())
    {
        qWarning() << QObject::tr("update yc failed, ")
                   << qry.lastError().databaseText() << qry.lastError().driverText();
        failedAnaList.append(ana);
    }
}
The update statement is executed through the Qt SQL interface.
There are many reasons why Oracle operations can slow down, but I cannot find a clue to explain this.
I never start a transaction manually in the Qt code, which means a commit is executed after every update statement.
The update frequency is about 200 records per second, but the number changes dynamically over time. It may increase to 1,000 at one moment and drop to 10 the next.
Once the time consumption goes up to 10~20 ms per statement, it never drops back down. It can only be restored to 2 ms by restarting the Oracle service (shutting down or restarting any of the user processes that access Oracle has no effect).
Please tell me how to solve this, or at least what should be examined.
A good starting point is to check the AWR and ASH reports.
By comparing the reports from "good" and "bad" periods you can spot the cause of the change. This could be, for example, a change of execution plan or an increase in wait events. One possible outcome is that the only change you see is that the database is waiting more time on the client (i.e. the problem is not in the DB).
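If you prefer to query directly instead of generating the full reports, here is a rough sketch against the ASH view (note that AWR/ASH require the Diagnostics Pack licence):

-- where sessions spent their time over the last hour, by statement and wait event
SELECT sql_id, event, COUNT(*) AS samples
  FROM v$active_session_history
 WHERE sample_time > SYSDATE - 1/24
 GROUP BY sql_id, event
 ORDER BY samples DESC;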
Anyway, as diagnosed in the other answer, the root cause of the problem seems to be the update in a loop. If your update lists are long (say more than 10-100 entries) you can profit by updating the whole list in a single statement using MERGE (see the sketch below):
build a collection from your list
cast the collection as TABLE
use this table in a MERGE statement to update the rows.
See here for details.
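A minimal sketch of that pattern, using the table and column names from the question; the SQL object and collection types t_yc_row and t_yc_tab are made-up names that would have to be created first:

-- one-time setup (assumed names)
CREATE TYPE t_yc_row AS OBJECT (ycid NUMBER, meaval NUMBER, quality NUMBER, lastupdate DATE);
/
CREATE TYPE t_yc_tab AS TABLE OF t_yc_row;
/

-- then bind the whole list as one collection and update it in a single statement
MERGE INTO yc t
USING (SELECT ycid, meaval, quality, lastupdate
         FROM TABLE(CAST(:ana_list AS t_yc_tab))) s
   ON (t.ycid = s.ycid)
 WHEN MATCHED THEN UPDATE
  SET t.meaval     = s.meaval,
      t.quality    = s.quality,
      t.lastupdate = s.lastupdate;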
You can trace the session while it is running quickly and again later when it is running slowly. Use the SQL trace functionality and tkprof to get a breakdown of where the update is spending its time in each case, and see what has changed.
https://docs.oracle.com/cd/E25178_01/server.1111/e16638/sqltrace.htm#i4640
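For example, to trace the slow session from another session (a sketch; 123 and 4567 are placeholders for the SID and serial# taken from V$SESSION):

-- switch tracing on for the target session, including waits and bind values
EXEC dbms_monitor.session_trace_enable(session_id => 123, serial_num => 4567, waits => TRUE, binds => TRUE);

-- ... let it run through some slow updates, then switch tracing off ...
EXEC dbms_monitor.session_trace_disable(session_id => 123, serial_num => 4567);

-- finally, on the database server, format the trace file:
-- tkprof <tracefile>.trc slow_update.txt sys=no sort=exeela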
If you need help interpreting the results you can update your question or ask a new one.
Secondly, as a rule, single-record updates are not the best way to do updates in Oracle. Since you already have many records to update prepared before you prepare the query, look at execBatch.
https://doc.qt.io/qt-4.8/qsqlquery.html#execBatch
This will both execute the update faster and only issue a single commit.
Which is better with Oracle DBMS_SCHEDULER?
Keeping a job created (but disabled) all the time, and enabling and running it when needed.
Creating the job, running it, and then dropping it each time.
I have a table x, and whenever a record gets submitted to that table, I need a job to process that record.
We may or may not have record insertions all the time.
Keeping this in mind, which is better?
Processing rows as they appear in a table in an asynchronous process can be done in a number of different ways; choose the one that suits you:
Add a trigger to the table which creates a one-off job to process the row using DBMS_JOB (a sketch is shown at the end of this answer). This is suitable if the volume of data being inserted into the table is quite low, and you don't want your job running all the time. The advantage of DBMS_JOB is that the job will not start until the insert is committed; if the insert is rolled back, the job creation is also rolled back, so it doesn't run. The disadvantage is that if there is a sustained spike of activity, all the jobs created will crowd out any other jobs that are running.
Create a single job using DBMS_SCHEDULER which runs regularly, polls the table for new records and processes them. This method would need a column on the table that it can update to mark each record as "processed". For example, add a VARCHAR2(1) flag which is set to 'Y' on insert and set to NULL by the job after processing. You could add an index to that flag which will only store entries for unprocessed rows (so it will be small and fast). This method is much more efficient, especially for large data volumes, because each run of the job can effectively process large chunks of data in bulk at a time.
Use Oracle Advanced Queueing. http://docs.oracle.com/cd/E11882_01/server.112/e11013/aq_intro.htm#ADQUE0100
For (1), a separate job is created for each record in the table. You don't need to create the jobs. You do need to monitor them, however; if one fails, you would need to investigate and re-run manually.
For (2), you just create one job and let it run regularly. If one record fails, it can be picked up by the next iteration of the job. I would process each record in a separate transaction so that the failure of one record doesn't affect the processing of the other records still in the queue.
For (3), you still create a job like (2) but instead of reading the table it pulls requests off a queue.
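For reference, a minimal sketch of option (1), assuming table x has an id primary key and the processing logic lives in a procedure called process_record (both names are assumptions):

create or replace trigger x_after_insert
  after insert on x
  for each row
declare
  l_job binary_integer;
begin
  -- DBMS_JOB is transactional: the job only becomes runnable once this insert commits,
  -- and it disappears again if the insert is rolled back
  dbms_job.submit(job  => l_job,
                  what => 'process_record(' || :new.id || ');');
end;
/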