When we extract data from an Oracle 11g/12c database, the table we are extracting from is sometimes busy for hours (i.e. an exclusive lock has already been taken on it, so we can't extract data). We don't want to wait in the lock queue forever. At the same time, we want the extracting query to take the full time it needs (extraction can take hours, provided actual extraction is taking place rather than waiting for other queries to finish). Is there an Oracle property to do this?
In other words, I want something like MySQL's wait_timeout property: https://askubuntu.com/a/892859/765684
Thank you.
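For illustration, these are the closest per-statement controls we could find, sketched below (some_table is a placeholder), though neither is the session-wide lock-wait timeout we are after:

    -- DDL statements can be told how many seconds to wait for a lock (11g and later):
    ALTER SESSION SET ddl_lock_timeout = 60;

    -- A row-locking read can be limited to n seconds; ORA-30006 is raised on timeout:
    SELECT * FROM some_table FOR UPDATE WAIT 10;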
Let's say there is a job A which runs a Python script to connect to Oracle, fetch the data from Table A, and load it into Snowflake once a day. Application A, which depends on Table A in Snowflake, can simply wait for job A to succeed before further processing; that case is easy.
But if the data movement is via replication (change data capture from Oracle moves to S3 using GoldenGate, pipes push the data into a stage, and a Task applies the stream to the target every few minutes), what is the best way to let Application A know that the data is ready? How do we check whether the data is ready? Is there something available in Oracle, like a table-level marker, that can be carried over to Snowflake? The tables in Oracle cannot be modified to add anything new, and marker rows cannot be added either; both are impractical here. But something that Oracle provides implicitly, which can be moved over to Snowflake, or some SCN-like number at the table level that can be compared every few minutes, could be a solution. I'm eager to hear any approaches.
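One thing Oracle does provide implicitly is the ORA_ROWSCN pseudocolumn, which could perhaps serve as the SCN-like, table-level number described above. A sketch (table_a is a placeholder; note that ORA_ROWSCN is tracked per block unless the table was created with ROWDEPENDENCIES, so it is an upper bound rather than an exact per-row value):

    -- Highest SCN that touched the table; compare this watermark every few minutes:
    SELECT MAX(ora_rowscn) FROM table_a;

    -- Approximate wall-clock time of that SCN (works only for recent SCNs):
    SELECT SCN_TO_TIMESTAMP(MAX(ora_rowscn)) FROM table_a;

    -- Current SCN of the source database, for comparison:
    SELECT current_scn FROM v$database;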
I'm reading data from a table in Sybase using a Table Input step. The query is really simple:
SELECT person_ref, displayname FROM person
That table has about 2 million rows. I'm connecting to Sybase ASE 12. My user has read-only rights. PDI is using the jConnect driver with the following options:
IMPLICIT_CURSOR_FETCH_SIZE=5000
SELECT_OPENS_CURSOR=True
I've also tried using the noholdlock option on that query to change the isolation level.
The problem is that the query seems to remain idle for a long time, nearly a minute. PDI indicates that the step is in idle state for that time and then changes to Running. This makes it hard to measure the time the process takes, because PDI won't start measuring time until the steps change state from idle.
I can't seem to find anything in the manuals, nor any option that will speed up the read by reducing or eliminating this idle time. Is there an option I'm missing? Does the idle status mean that PDI is just waiting for a response from Sybase?
Maybe your query just takes a long time to retrieve the data.
The latency lies in the JDBC architecture. PDI sends the query to the database, which stores the results in a buffer. Only when this buffer is full is the data transferred back to PDI. Until it receives some data, the Table Input step is in idle mode.
If you want to measure the time including the idle time, add a step that fires without any latency, for example a Generate Rows step (1 row is enough). You do not need to connect this step to anything, as PDI starts all the steps in parallel as soon as possible.
You won't see the total on the Table Input row of the Step Metrics bottom tab, but you will have the result in the Metrics tab.
You can also use a "Block this step until steps finish" step. There is an example in the samples directory shipped with your distribution: open yourKettleInstallDir/sample/transformation/Block this step until steps finish.ktr, replace the top row with your flow, then watch the statistics of the blocking step.
In my opinion, another step in your transformation is locking the person table. There is an overwhelming probability that you have a Table Output step trying to truncate the person table.
I don't know if this is what I would call an answer, but I definitely found a way to get the Sybase connection to respond quickly. There's a querying tool called Sybase Anywhere that you can use to query the DB directly. What I did was look at an installation on a separate machine that had a good connection.
That machine had an ODBC connection defined for the Sybase DB, and the installation of the client tool had its own version of the Sybase drivers, along with some DLL files. I took the JARs and DLLs and put them on the machine that had PDI installed, made sure they were all on the classpath, and created a generic JDBC connection that pointed to the system ODBC one. It now runs at the speed you would expect.
I am writing some data-loading code that pulls data from a large, slow table in an Oracle database. I have read-only access to the data, and no ability to change indexes or affect the speed of the query in any way.
My select statement takes 5 minutes to execute and returns around 300,000 rows. The system is inserting large batches of new records constantly, and I need to make sure I get every last one, so I need to save a timestamp for the last time I downloaded the data.
My question is: If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?
My gut tells me that the answer is 'no', especially since a large portion of those 5 minutes is just the time spent on the data transfer from the database to the local environment, but I can't find any direct documentation on the scenario.
"If my select statement is running for 5 minutes, and new rows get inserted while the select is running, will I receive the new rows or not in the query result?"
No. Oracle enforces strict isolation levels and does not permit dirty reads.
The default isolation level is Read Committed. This means the result set you get after five minutes will be identical to the one you would have got if Oracle could have delivered all the records in 0.0000001 seconds. Anything committed after your query started running will not be included in the results. That includes updates to the records as well as inserts.
Oracle does this by tracking changes to the table in the UNDO tablespace. Provided it can reconstruct the original image of the data from the undo, your query will run to completion; if for any reason the undo information has been overwritten, your query will fail with the dreaded ORA-01555: snapshot too old. That's right: Oracle would rather hurl an exception than give us an inconsistent result set.
Note that this consistency applies at the statement level. If we run the same query twice within one transaction, we may see two different result sets. If that is a problem (I think not in your case), we need to switch from Read Committed to Serializable isolation.
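For reference, the switch is a one-liner:

    -- All subsequent transactions in the session:
    ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;

    -- Or a single transaction:
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;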
The Concepts Manual covers Concurrency and Consistency in great depth. Find out more.
So, to answer your question: take the timestamp from the time you start the select. Specifically, take the max(created_ts) from the table before you kick off the query. This should protect you from the gap Alex mentions (if records are not committed the moment they are inserted, there is the potential to lose records if you base the select on a comparison with the system timestamp). Although doing this means you're issuing two queries in the same transaction, which means you do need Serializable isolation after all!
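A minimal sketch of that pattern, assuming a table big_slow_table with a created_ts column (both names hypothetical):

    -- One transaction, so both statements see the same snapshot:
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    -- 1. Capture the new watermark first.
    SELECT MAX(created_ts) FROM big_slow_table;

    -- 2. Pull everything between the previous and the new watermark.
    SELECT *
      FROM big_slow_table
     WHERE created_ts >  :last_watermark
       AND created_ts <= :new_watermark;

    COMMIT;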
The situation is simple: there is a table in Oracle used as a "shared table" for data exchange. The table structure and the number of records remain unchanged. In the normal case, I continuously update data in this table and other processes read it for the current data.
The strange thing is, when my process starts, each update statement takes approximately 2 ms to execute. After a certain period of time (like 8 hours), the time increases to 10–20 ms per statement. This makes the procedure quite slow.
The table YC has columns YCID, MEAVAL, QUALITY and LASTUPDATE; the update statement, issued through the Qt SQL interface, looks like this:
anaNum = anaList.size();

// Prepare once; bind and execute per record. One commit per exec(),
// since no transaction is started manually.
qry.prepare(tr("update YC set MEAVAL=:MEAVAL, QUALITY=:QUALITY, LASTUPDATE=:LASTUPDATE where YCID=:YCID"));

foreach (STbl_ANA ana, anaList)
{
    qry.bindValue(":MEAVAL",     ana.meaVal);
    qry.bindValue(":QUALITY",    ana.quality);
    qry.bindValue(":LASTUPDATE", QDateTime::fromTime_t(ana.lastUpdate));
    qry.bindValue(":YCID",       ana.ycId);

    if (!qry.exec())
    {
        // Log the failure and keep the record for a retry.
        qWarning() << QObject::tr("update yc failed, ")
                   << qry.lastError().databaseText()
                   << qry.lastError().driverText();
        failedAnaList.append(ana);
    }
}
There are many reasons why Oracle operations can slow down, but I cannot find a clue that explains this.
I never start a transaction manually in the Qt code, which means a commit is executed after every update statement.
The update frequency is about 200 records per second, but the number changes dynamically over time. It may increase to 1000 at one moment and drop to 10 the next.
Once the time consumption goes up to 10–20 ms per statement, it never drops back down. It can be restored to 2 ms only by restarting the Oracle service (shutting down or restarting the user processes that access Oracle is useless).
Please tell me how to solve this, or at least what should be examined.
A good starting point is to check the AWR and ASH reports.
By comparing reports from the "good" and "bad" periods you can spot the cause of the change, for example a changed execution plan or an increase in wait events. One possible outcome is that the only change you see is the database spending more time waiting on the client (i.e. the problem is not in the DB).
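Both reports ship with the database and can be generated from SQL*Plus with the standard scripts, which prompt for the snapshot or time range (note that AWR and ASH require the Diagnostics Pack licence):

    @?/rdbms/admin/awrrpt.sql   -- AWR report
    @?/rdbms/admin/ashrpt.sql   -- ASH report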
Anyway, as diagnosed in the other answer, the root cause of the problem seems to be the update in a loop. If your update lists are long (say, more than 10–100 entries), you can profit by updating the whole list in a single statement using MERGE, as sketched after the list below:
- build a collection from your list
- cast the collection as a TABLE
- use this table in a MERGE statement to update the rows
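A sketch of that approach; the type and bind names are illustrative, while the table and column names are taken from the question:

    -- One-time setup: a SQL type matching the list entries.
    CREATE TYPE t_ana AS OBJECT
      (ycid NUMBER, meaval NUMBER, quality NUMBER, lastupdate DATE);
    CREATE TYPE t_ana_tab AS TABLE OF t_ana;

    -- Per batch: bind the whole collection and update in one statement.
    MERGE INTO yc t
    USING (SELECT * FROM TABLE(:ana_list)) s
       ON (t.ycid = s.ycid)
     WHEN MATCHED THEN UPDATE
      SET t.meaval     = s.meaval,
          t.quality    = s.quality,
          t.lastupdate = s.lastupdate;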
See here for details.
You can trace the session while it is running quickly, and again later when it is running slowly. Use the SQL trace functionality and tkprof to get a breakdown of where the update is spending its time in each case, and see what has changed.
https://docs.oracle.com/cd/E25178_01/server.1111/e16638/sqltrace.htm#i4640
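A sketch of switching the trace on (the session_id and serial_num of the slow session can be looked up in v$session; the values here are placeholders):

    -- For your own session: extended SQL trace including waits and binds.
    ALTER SESSION SET events '10046 trace name context forever, level 12';

    -- For another session, via DBMS_MONITOR:
    BEGIN
      DBMS_MONITOR.SESSION_TRACE_ENABLE(
        session_id => 123, serial_num => 45678,
        waits => TRUE, binds => TRUE);
    END;
    /

Then run the resulting trace file from the server's trace directory through tkprof to get the breakdown.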
If you need help interpreting the results you can update your question or ask a new one.
Secondly, as a rule, single-record updates are not the best way to do updates in Oracle. Since you have many records already prepared before you prepare the query, look at execBatch.
https://doc.qt.io/qt-4.8/qsqlquery.html#execBatch
This will both execute the update faster and only issue a single commit.
I have an Oracle DB with roughly 20 million records. I used the BatchInserter to insert the data into my model.
The problem is that I have to loop over a result set containing all 20 million records to get the properties needed for the insert, but just looping takes too long.
Has anyone tried something like this? What is the best way to do it in optimal time?
Can you share more details? Where do you have to loop?
Check http://neo4j.org/develop/import for some options.
If you have JDBC you can also drive the import directly from your JDBC results.
Just loop twice over the results, once for nodes and once for relationships.