In the case of a batch read in JDBC using setFetchSize, if we apply a timeout to the statement, is it a batch-level timeout?

Statement statement = connection.prepareStatement("SELECT * FROM BOOKS");
statement.setFetchSize(100);
statement.setQueryTimeout(10); // timeout of 10 seconds
Here, does the timeout apply per fetched batch or to the total set of records to be fetched?

The JDBC API documentation of Statement.setQueryTimeout says:
Sets the number of seconds the driver will wait for a Statement object
to execute to the given number of seconds. By default there is no
limit on the amount of time allowed for a running statement to
complete. If the limit is exceeded, an SQLTimeoutException is
thrown. A JDBC driver must apply this limit to the execute,
executeQuery and executeUpdate methods.
Note: JDBC driver implementations may also apply this limit to ResultSet methods (consult your driver vendor documentation for
details).
Note: In the case of Statement batching, it is implementation defined as to whether the time-out is applied to individual SQL
commands added via the addBatch method or to the entire batch of SQL
commands invoked by the executeBatch method (consult your driver
vendor documentation for details).
In other words, this will depend on the database and driver you use.
Some drivers might apply it to execute only, so fetching rows after execute is not subject to the timeout. Others might also apply it to the individual fetches triggered by ResultSet.next(). Yet others might even use the wall-clock time since query execution started and forcibly close the cursor if executing the query plus fetching and processing the rows takes longer than the specified timeout; in that case the timeout can fire even when the server responds quickly but the client takes a long time to process the entire result set.
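As a rough illustration of why this matters in client code, here is a minimal sketch (assuming an already-open connection and a driver that may also enforce the timeout during fetches); whether the timeout can fire inside the fetch loop depends entirely on the driver:

// Sketch only: driver-dependent behaviour, not guaranteed by the JDBC spec.
try (PreparedStatement stmt = connection.prepareStatement("SELECT * FROM BOOKS")) {
    stmt.setFetchSize(100);   // rows fetched per round trip (a hint to the driver)
    stmt.setQueryTimeout(10); // seconds; the spec only requires this for execute*

    try (ResultSet rs = stmt.executeQuery()) { // the timeout must apply here
        while (rs.next()) {                    // may or may not be covered by the timeout
            // process row
        }
    }
} catch (SQLTimeoutException e) {
    // Depending on the driver, this can be thrown from executeQuery(),
    // from ResultSet.next() during a fetch, or not at all while fetching.
}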

Related

JDBC batch execution guarantees

We're trying to execute a batch insert into Azure Synapse (formerly Azure SQL Data warehouse). Problems are:
Performance is abysmal (~1 second for insertion of one row of less than 2KB and 20-25 columns)
It scales linearly (~90 seconds for 100 rows I think)
We're using standard JDBC batch insertion pattern addBatch() & executeBatch() with PreparedStatements (https://stackoverflow.com/a/3786127/496289).
We're using JDBC driver provided by Microsoft.
We know what's wrong: in DB telemetry it's clear that the DB is breaking the batch down and more or less running it as if it were in a for-loop. No batch "optimization".
Curiously, when the underlying data source is SQL Server, batch scales as expected.
Question is: Is there nothing in standard/spec that says executeBatch() should scale better than linearly?
E.g. the JDBC™ 4.3 Specification (JSR 221) says it can improve performance, not that it must.
CHAPTER 14 Batch Updates
The batch update facility allows multiple SQL statements to be submitted to a data source for processing at once. Submitting multiple SQL statements, instead of individually, can greatly improve performance. Statement, PreparedStatement, and CallableStatement objects can be used to submit batch updates
14.1.4 PreparedStatement Objects has no such explicit/implied statement to say batch mechanism is for better performance.
Should probably add that Azure Synapse is capable of loading 1 trillion rows of data (~450 GB in Parquet format) from the Data Lake in 17-26 minutes with 500 DWUs.
The JDBC specification doesn't require any kind of optimization for batch execution. In fact, not all databases support batch execution. A conforming JDBC driver is expected to implement batch execution whether or not the underlying database system supports it.
If the database system doesn't support it, the JDBC driver will simulate batch execution by repeatedly executing the statement in a loop. Such an implementation will not perform better than manually executing the statement repeatedly.
This is also why the text you quote says "can greatly improve performance" and not will or must.
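For reference, a minimal sketch of the standard addBatch()/executeBatch() pattern the question refers to; the table, column names, and the 'rows' collection are assumptions for illustration. Whether this runs as one optimized submission or as a hidden per-row loop is entirely up to the driver and database:

// Sketch of the standard JDBC batch pattern. A driver that does not optimize
// batches may run this as N single-row inserts, giving only linear scaling.
String sql = "INSERT INTO target_table (id, payload) VALUES (?, ?)"; // names assumed
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    for (Object[] row : rows) {             // 'rows' is an assumed in-memory collection
        ps.setLong(1, (Long) row[0]);
        ps.setString(2, (String) row[1]);
        ps.addBatch();
    }
    int[] updateCounts = ps.executeBatch(); // one logical submission of the whole batch
    connection.commit();                    // assuming auto-commit is off
}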

Oracle db table data loading is too slow in DBeaver

I'm using DBeaver to connect to an Oracle database. Database connection and table properties view functions are working fine without any delay. But fetching table data is too slow (sometimes around 50 seconds).
Any settings to speed up fetching table data in DBeaver?
Changing the following setting in your Oracle DB connection will make fetching table data faster than when it is not set.
Right-click on your DB connection --> Edit Connection --> Oracle properties --> tick 'Use RULE hint for system catalog queries'
(by default this is not set)
UPDATE
In the newer version (21.0.0) of DBeaver, many more performance options appear here. Turning them on significantly improves performance for me.
I've never used DBeaver, but I often see applications that use too small an "array fetch size"**, which often leads to slow fetches.
** Array fetch size note:
As per the Oracle documentation, the fetch buffer size is an application-side memory setting that affects the number of rows returned by a single fetch. Generally, you balance the number of rows returned in a single fetch (a.k.a. the array fetch size) against the total number of rows that need to be fetched.
An array fetch size that is low compared to the number of rows that need to be returned will manifest as delays from the increased network and client-side processing needed for each fetch (i.e. the high cost of each network round trip [SQL*Net protocol]).
If this is the case, you will likely see very high waits on “SQL*Net message from client” [in gv$session or elsewhere].
SQL*Net message from client
This wait event is posted by the session when it is waiting for a message from the client to arrive. Generally, this means that the session is just sitting idle; however, in a client/server environment it could also mean that either the client process is running slow or there are network latency delays. Database performance is not degraded by high wait times for this wait event.
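If the array fetch size is indeed the problem, the JDBC-level equivalent is Statement.setFetchSize(). A rough sketch, assuming an already-open Oracle connection and a made-up table name; 100-500 is a common starting point, and the right value depends on row size and memory:

// Sketch: raise the array fetch size so each network round trip returns more
// rows, reducing the number of SQL*Net round trips for large result sets.
try (PreparedStatement ps = connection.prepareStatement("SELECT * FROM big_table")) {
    ps.setFetchSize(500);                    // rows per fetch round trip
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process row
        }
    }
}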

oracle: connection timeout when calling a plsql procedure from Java

We have a daily batch job executing an Oracle PL/SQL function. Actually, the Quartz scheduler invokes a Java program which makes a call to the Oracle PL/SQL function. This PL/SQL function deletes data (which is more than 6 months old) from 4 tables and then commits the transaction.
This batch job was running successfully in the test environment but started failing when new data was dumped into the tables, which happened 2 weeks ago (the code is supposed to go into production this week). Earlier, the number of rows in each table was not more than 0.1 million. But now it is 1 million in 3 tables and 2.4 million in the other table.
After running for 3 hours, we are getting an error in Java (written in the log file): "...Connection reset; nested exception is java.sql.SQLException: Io exception: Connection reset....". When the row counts on the tables were checked, it was clear that no record had been deleted from any of the tables.
Is it possible in oracle database, for the plsql procedure/function to be automatically terminated/killed when the connection is timed out and the invoking session is no longer active?
Thanks in advance,
Pradeep.
The PL/SQL won't terminate because it is inactive, since by definition it isn't - it is still doing something. It won't be generating any network traffic back to your client though.
It appears something at the network level is causing the connection to be terminated. This could be a listener timeout, a firewall timeout, or something else. If it's consistently after three hours then it will almost certainly be a timeout configured somewhere rather than a network glitch, which would be more random (and possibly recoverable).
When the network connection is interrupted, Oracle will notice at some point and terminate the session. That will cause the PL/SQL call to be terminated, and that will cause any work it has done to be rolled back, which may take a while.
3 hours seems a long time for your deletes though, even for a few million records. Perhaps you're deleting inefficiently, with row-by-row deletes within your procedure. Which doesn't really help you, of course. It might be worth pointing out that your production environment might not have whatever setting is killing your connection, or might have a shorter timeout, so even reducing the runtime might not make it bullet-proof in live. You probably need to find the source of the timeout and check the equivalent in the live environment to try to pre-empt similar problems there.
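On the Java side, one hedged option is to set a client-side query timeout on the call, so the job fails predictably with SQLTimeoutException instead of waiting until a firewall or listener silently drops the connection. A sketch, with a made-up procedure name; note this does nothing to make the delete itself faster:

// Sketch only: the procedure name is an assumption.
try (CallableStatement cs = connection.prepareCall("{call purge_old_data}")) {
    cs.setQueryTimeout(2 * 60 * 60); // 2 hours, in seconds
    cs.execute();                    // the procedure itself commits, per the question
} catch (SQLTimeoutException e) {
    // the driver asks the server to cancel; Oracle rolls back the uncommitted deletes
}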

Multithreaded access to an embedded HSQLDB database

I have an HSQLDB embedded database, which I use to store statistics of some measurements. The statistics are expected to be arriving about every second from the same thread, but be fetched from a few different threads (from the pool of threads) every several seconds.
I do not have much experience with jdbc, so my questions may sound trivial:
What is the price of creating/disposing a new connection every second? Recall, that the database is embedded, so there is no TCP/IP involved.
What is the price of creating/disposing prepared statements every second?
Please, note that some inserts are bulk inserts, where I think to use addBatch and executeBatch methods of a prepared statement.
You should try to reuse the connections and the prepared statements.
To do this, each thread will have a single connection, and each connection will reuse a set of prepared statements. The connection is committed after the unit of work is complete, but not closed. The connection is closed when your app is closed or completes its work.
You should use executeBatch for bulk inserts.
With HSQLDB, the price of creating / disposing a new connection / prepared statement every second is not high, but you should still avoid this if you can.
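A rough sketch of that advice, assuming an in-process HSQLDB file URL and a made-up stats table: each thread keeps its own connection and reuses the same prepared statement, commits after each unit of work, and only closes everything at shutdown.

// Sketch: one connection and one reusable prepared statement per thread.
// The URL ("jdbc:hsqldb:file:statsdb") and table/column names are assumptions.
final class StatsWriter {
    private final Connection conn;
    private final PreparedStatement insert;

    StatsWriter() throws SQLException {
        conn = DriverManager.getConnection("jdbc:hsqldb:file:statsdb", "SA", "");
        conn.setAutoCommit(false);
        insert = conn.prepareStatement("INSERT INTO stats (ts, value) VALUES (?, ?)");
    }

    void write(List<long[]> samples) throws SQLException {
        for (long[] s : samples) {
            insert.setLong(1, s[0]);
            insert.setLong(2, s[1]);
            insert.addBatch();
        }
        insert.executeBatch(); // bulk insert, as recommended
        conn.commit();         // commit the unit of work, keep the connection open
    }

    void close() throws SQLException {
        insert.close();
        conn.close();          // only when the app shuts down
    }
}

// One StatsWriter per thread, e.g. via ThreadLocal:
ThreadLocal<StatsWriter> writers = ThreadLocal.withInitial(() -> {
    try { return new StatsWriter(); } catch (SQLException e) { throw new IllegalStateException(e); }
});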

Does the compiled prepared statement in the database driver still require compilation in the database?

In the Oracle JDBC driver, there is an option to cache prepared statements. My understanding of this is that the prepared statements are precompiled by the driver, then cached, which improves performance for cached prepared statements.
My question is, does this mean that the database never has to compile those prepared statements? Does the JDBC driver send some precompiled representation, or is there still some kind of parsing/compilation that happens in the database itself?
When you use the implicit statement cache (or the Oracle extension for the explicit statement cache), the Oracle driver will cache a prepared or callable statement after(!) the close() for re-use with the physical connection.
So what happens is: if a prepared statement is used and the physical connection has never seen it, it sends the SQL to the DB. Depending on whether the DB has seen the statement before or not, it will do a hard parse or a soft parse. So typically, if you have a 10-connection pool, you will see 10 parses, one of them being a hard parse.
After the statement is closed on a connection, the Oracle driver puts the handle to the parsed statement (shared cursor) into an LRU cache. The next time you call prepareStatement on that connection, it finds this cached handle and does not need to send the SQL at all. This results in an execution with NO PARSE.
If you use more (different) prepared statements on a physical connection than the cache can hold, the longest-unused open shared cursor is closed. That results in another soft parse the next time the statement is used, because the SQL needs to be sent to the server again.
This is basically the same function that some middleware data sources have implemented more generically (for example prepared-statement-cache in JBoss). Use only one of the two to avoid double caching.
You can find the details here:
http://docs.oracle.com/cd/E11882_01/java.112/e16548/stmtcach.htm#g1079466
Also check out the Oracle Unified Connection Pool (UCP) which supports this and interacts with FAN.
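For completeness, a hedged sketch of enabling the driver's implicit statement cache on a plain physical connection (a pool or UCP setup would expose equivalent properties instead); see the linked documentation for the authoritative details:

// Sketch: enable Oracle's implicit statement cache on a physical connection.
OracleConnection oraConn = connection.unwrap(OracleConnection.class);
oraConn.setImplicitCachingEnabled(true);
oraConn.setStatementCacheSize(50); // number of statements kept per connection

// From here on, prepareStatement()/close() pairs using identical SQL reuse the
// cached cursor handle instead of re-sending the SQL text to the server.
PreparedStatement ps = connection.prepareStatement("SELECT * FROM BOOKS WHERE id = ?");
ps.setLong(1, 42L);
try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) { /* process row */ }
}
ps.close(); // returns the statement to the implicit cache rather than discarding it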
I think that this answers your question (sorry, it is PowerPoint, but it describes how the prepared statement is sent to Oracle, how Oracle stores it in the shared SQL pool, processes it, etc.). The main performance gain you get from prepared statements is that on the 1+nth run you avoid hard parses of the SQL statement.
http://www.google.com/url?sa=t&source=web&cd=2&ved=0CBoQFjAB&url=http%3A%2F%2Fchrisgatesconsulting.com%2FpreparedStatements.ppt&rct=j&q=java%20oracle%20sql%20prepared%20statements&ei=z0iaTJ3tJs2InQeClPwf&usg=AFQjCNG9Icy6hmlFUWHj2ruUsux7mM4Nag&cad=rja
Oracle (or the DB of your choice) will store the prepared statement, and Java just sends it the same statement that the DB will match against. This is a limited resource, however; after some time with no executions, the shared SQL will be purged, especially for uncommon queries, and then a re-parse will be required, whether or not the statement is cached in your Java application.
