Multithreaded access to an embedded HSQLDB database - jdbc

I have an HSQLDB embedded database, which I use to store statistics of some measurements. The statistics are expected to arrive about every second from a single thread, but to be fetched every few seconds by several different threads (from a thread pool).
I do not have much experience with jdbc, so my questions may sound trivial:
What is the price of creating/disposing a new connection every second? Recall that the database is embedded, so there is no TCP/IP involved.
What is the price of creating/disposing prepared statements every second?
Please note that some inserts are bulk inserts, for which I plan to use the addBatch and executeBatch methods of a prepared statement.

You should try to reuse the connections and the prepared statements.
To do this, each thread will have a single connection, and each connection will reuse a set of prepared statements. The connection is committed after the unit of work is complete, but not closed. The connection is closed when your app is closed or completes its work.
You should use executeBatch for bulk inserts.
With HSQLDB, the price of creating/disposing a new connection or prepared statements every second is not high, but you should still avoid it if you can.
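For illustration, here is a minimal sketch of that pattern with plain JDBC. The JDBC URL, the MEASUREMENT table, and the class name are made up for the example; each writer thread would keep one such object for its lifetime.

// A minimal sketch of the pattern described above, assuming an embedded HSQLDB
// database at "data/stats" and a hypothetical MEASUREMENT(ts, value) table.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class StatsWriter implements AutoCloseable {

    private final Connection connection;
    private final PreparedStatement insert;   // reused for every batch

    public StatsWriter() throws SQLException {
        // One connection per thread; keep it open for the life of the app.
        connection = DriverManager.getConnection("jdbc:hsqldb:file:data/stats", "SA", "");
        connection.setAutoCommit(false);
        insert = connection.prepareStatement("INSERT INTO MEASUREMENT (ts, value) VALUES (?, ?)");
    }

    // Bulk insert one batch of measurements, then commit the unit of work.
    public void writeBatch(long[] timestamps, double[] values) throws SQLException {
        for (int i = 0; i < timestamps.length; i++) {
            insert.setLong(1, timestamps[i]);
            insert.setDouble(2, values[i]);
            insert.addBatch();
        }
        insert.executeBatch();
        connection.commit();   // commit, but do not close the connection
    }

    @Override
    public void close() throws SQLException {
        // Close only when the application shuts down or finishes its work.
        insert.close();
        connection.close();
    }
}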

Related

When doing a batch read in JDBC using setFetchSize, if we apply a timeout to the statement, is it a per-batch timeout?

PreparedStatement statement = connection.prepareStatement("SELECT * FROM BOOKS");
statement.setFetchSize(100);
statement.setQueryTimeout(10); // timeout of 10 seconds
Here, does the timeout apply per fetched batch, or to fetching the total set of records?
The JDBC API documentation of Statement.setQueryTimeout says:
Sets the number of seconds the driver will wait for a Statement object to execute to the given number of seconds. By default there is no limit on the amount of time allowed for a running statement to complete. If the limit is exceeded, an SQLTimeoutException is thrown. A JDBC driver must apply this limit to the execute, executeQuery and executeUpdate methods.
Note: JDBC driver implementations may also apply this limit to ResultSet methods (consult your driver vendor documentation for details).
Note: In the case of Statement batching, it is implementation defined as to whether the time-out is applied to individual SQL commands added via the addBatch method or to the entire batch of SQL commands invoked by the executeBatch method (consult your driver vendor documentation for details).
In other words, this will depend on the database and driver you use.
Some drivers might apply it only to execute, so that fetching rows afterwards is not subject to the timeout. Others might apply it to the individual fetches triggered by ResultSet.next() as well. Yet others might even use the wall-clock time since query execution started and forcibly close the cursor if executing, fetching and processing the rows together take longer than the specified timeout, which could trigger the timeout even when the server responds quickly but the client takes a long time to process the entire result set.
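As a sketch, the places where the timeout may surface look like this; the BOOKS table comes from the question above, while the class and method names are just for illustration:

// Sketch: where a query timeout may surface, depending on the driver.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLTimeoutException;

public class TimeoutProbe {
    public static void readBooks(Connection connection) throws Exception {
        try (PreparedStatement statement =
                 connection.prepareStatement("SELECT * FROM BOOKS")) {
            statement.setFetchSize(100);      // rows fetched per round trip
            statement.setQueryTimeout(10);    // seconds

            try (ResultSet rs = statement.executeQuery()) {   // the timeout always applies here
                while (rs.next()) {           // some drivers also apply it to each new fetch
                    // process the row
                }
            }
        } catch (SQLTimeoutException e) {
            // Depending on the driver, this can be thrown by executeQuery(),
            // by ResultSet.next() when a new batch of rows is fetched,
            // or not at all once execution has completed on the server.
            System.err.println("Query timed out: " + e.getMessage());
        }
    }
}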

Cache greenplum query plan globally?

I'd like to save planner cost by using a plan cache, since the ORCA/legacy optimizer can take dozens of milliseconds.
I think Greenplum caches query plans at the session level; once the session ends, other sessions cannot share the analyzed plan. What's more, we can't keep a session open indefinitely, since the GP system will not release resources until the TCP connection is disconnected.
Most major databases cache plans after the first run and reuse them across connections.
So, is there any switch that turns on query plan caching across connections? Also, within a session, the client-side timing statistics do not match the "Total time" the planner reports.
Postgres can cache plans as well, but only on a per-session basis; once the session ends, the cached plan is thrown away. This can be tricky to optimize/analyze, but it is generally of less importance unless the query you are executing is really complex and/or there are a lot of repeated queries.
The documentation explains this in detail pretty well. We can query pg_prepared_statements to see what is cached. Note that it is not shared across sessions and is visible only to the current session.
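For illustration, a small JDBC sketch of inspecting that session-local cache; the URL, credentials and table name are made up, and it assumes the PostgreSQL JDBC driver, where prepareThreshold controls when the driver switches to named server-side prepared statements (which are what pg_prepared_statements shows):

// Sketch: inspecting the session-local statement cache via pg_prepared_statements.
// prepareThreshold=1 asks the pgjdbc driver to use a named server-side prepared
// statement from the first execution; by default the driver keeps it cached even
// after close().
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PlanCachePeek {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/mydb?prepareThreshold=1";
        try (Connection conn = DriverManager.getConnection(url, "gpadmin", "secret")) {
            try (PreparedStatement ps =
                     conn.prepareStatement("SELECT count(*) FROM my_table WHERE id = ?")) {
                ps.setInt(1, 42);
                ps.executeQuery().close();
            }
            // The prepared statement is visible only to this session.
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT name, statement FROM pg_prepared_statements")) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + " -> " + rs.getString("statement"));
                }
            }
        }
    }
}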
When a user starts a session with Greenplum Database and issues a query, the system creates groups or 'gangs' of worker processes on each segment to do the work. After the work is done, the segment worker processes are destroyed except for a cached number which is set by the gp_cached_segworkers_threshold parameter.
A lower setting conserves system resources on the segment hosts, but a higher setting may improve performance for power-users that want to issue many complex queries in a row.
Also see gp_max_local_distributed_cache.
Obviously, the more you cache, the less memory there will be available for other connections and queries. Perhaps not a big deal if you are only hosting a few power users running concurrent queries... but you may need to adjust your gp_vmem_protect_limit accordingly.
For clarification:
Segment resources are released after the gp_vmem_idle_resource_timeout.
Only the master session will remain until the TCP connection is dropped.
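If you want to check what your cluster is currently using for the parameters mentioned above, a quick way from a JDBC session is to issue SHOW for each of them (a sketch; the connection details are made up, and changing the values would be done with gpconfig rather than from here):

// Sketch: reading the Greenplum settings named in this answer from a JDBC session.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GpSettings {
    public static void main(String[] args) throws Exception {
        String[] gucs = {
            "gp_cached_segworkers_threshold",
            "gp_max_local_distributed_cache",
            "gp_vmem_protect_limit",
            "gp_vmem_idle_resource_timeout"
        };
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://mdw:5432/mydb", "gpadmin", "secret")) {
            for (String guc : gucs) {
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery("SHOW " + guc)) {
                    if (rs.next()) {
                        System.out.println(guc + " = " + rs.getString(1));
                    }
                }
            }
        }
    }
}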

oracle: connection timeout when calling a plsql procedure from Java

We have a daily batch job executing an Oracle PL/SQL function. The Quartz scheduler invokes a Java program, which makes a call to the PL/SQL function. This function deletes data older than 6 months from 4 tables and then commits the transaction.
The batch job was running successfully in the test environment, but it started failing when new data was loaded into the tables about 2 weeks ago (the code is supposed to go into production this week). Earlier, the number of rows in each table was not more than 0.1 million; now it is 1 million in 3 tables and 2.4 million in the other.
After running for 3 hours, we get an error in Java (written to the log file): "...Connection reset; nested exception is java.sql.SQLException: Io exception: Connection reset....". When the row counts on the tables were checked, it was clear that no record had been deleted from any of the tables.
Is it possible in an Oracle database for the PL/SQL procedure/function to be automatically terminated/killed when the connection times out and the invoking session is no longer active?
Thanks in advance,
Pradeep.
The PL/SQL won't terminate because it is inactive, since by definition it isn't - it is still doing something. It won't be generating any network traffic back to your client though.
It appears something at the network level is causing the connection to be terminated. This could be a listener timeout, a firewall timeout, or something else. If it's consistently after three hours then it will almost certainly be a timeout configured somewhere rather than a network glitch, which would be more random (and possibly recoverable).
When the network connection is interrupted, Oracle will notice at some point and terminate the session. That will cause the PL/SQL call to be terminated, and that will cause any work it has done to be rolled back, which may take a while.
3 hours seems a long time for your deletes, though, even for a few million records. Perhaps you're deleting inefficiently, with individual row-by-row deletes inside your procedure. That doesn't really help you here, of course. It might be worth pointing out that your production environment might not have whatever setting is killing your connection, or might have a shorter timeout, so even reducing the runtime might not make it bullet-proof in live. You probably need to find the source of the timeout and check the equivalent setting in the live environment to try to pre-empt similar problems there.
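For context, the Java side of such a call usually looks something like the sketch below; the procedure name and parameters are hypothetical. The key point is that the call blocks with no network traffic until the PL/SQL returns, which is exactly what makes it vulnerable to idle or firewall timeouts.

// Sketch of calling a long-running PL/SQL routine from Java, assuming a
// hypothetical procedure PURGE_OLD_DATA(p_months IN NUMBER, p_deleted OUT NUMBER).
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.Types;

public class PurgeJob {
    public static long purge(Connection connection) throws Exception {
        try (CallableStatement call =
                 connection.prepareCall("{ call PURGE_OLD_DATA(?, ?) }")) {
            call.setInt(1, 6);                          // delete data older than 6 months
            call.registerOutParameter(2, Types.NUMERIC);
            call.execute();                             // blocks until the PL/SQL completes
            return call.getLong(2);                     // number of rows deleted
        }
    }
}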

Does the compiled prepared statement in the database driver still require compilation in the database?

In the Oracle JDBC driver, there is an option to cache prepared statements. My understanding of this is that the prepared statements are precompiled by the driver, then cached, which improves performance for cached prepared statements.
My question is, does this mean that the database never has to compile those prepared statements? Does the JDBC driver send some precompiled representation, or is there still some kind of parsing/compilation that happens in the database itself?
When you use the implicit statement cache (or the Oracle extension for the explicit statement cache), the Oracle driver will cache a prepared or callable statement after(!) the close() for re-use with the physical connection.
So what happens is: if a prepared statement is used and the physical connection has never seen it, the driver sends the SQL to the DB. Depending on whether the DB has seen the statement before or not, it will do a hard parse or a soft parse. So typically, if you have a pool of 10 connections, you will see 10 parses, one of them being a hard parse.
After the statement is closed on a connection, the Oracle driver puts the handle to the parsed statement (the shared cursor) into an LRU cache. The next time you call prepareStatement on that connection, it finds the cached handle and does not need to send the SQL at all. This results in an execution with NO PARSE.
If more (different) prepared statements are used on a physical connection than the cache can hold, the least recently used open shared cursor is closed. That results in another soft parse the next time the statement is used, because the SQL needs to be sent to the server again.
This is basically the same function that some middleware data sources have implemented more generically (for example prepared-statement-cache in JBoss). Use only one of the two to avoid double caching.
You can find the details here:
http://docs.oracle.com/cd/E11882_01/java.112/e16548/stmtcach.htm#g1079466
Also check out the Oracle Unified Connection Pool (UCP) which supports this and interacts with FAN.
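A minimal sketch of turning the implicit cache on, assuming the Oracle JDBC driver (ojdbc) is on the classpath; the query and the EMPLOYEES table are hypothetical:

// Sketch: enabling the implicit statement cache described above.
import java.sql.Connection;
import java.sql.PreparedStatement;
import oracle.jdbc.OracleConnection;

public class ImplicitCacheExample {
    public static void enableCache(Connection conn) throws Exception {
        OracleConnection oconn = conn.unwrap(OracleConnection.class);
        oconn.setImplicitCachingEnabled(true);   // turn the cache on
        oconn.setStatementCacheSize(20);         // cursors kept per physical connection

        // First use: the SQL is sent to the DB (hard or soft parse).
        try (PreparedStatement ps = oconn.prepareStatement("SELECT * FROM EMPLOYEES WHERE id = ?")) {
            ps.setInt(1, 1);
            ps.executeQuery().close();
        } // close() puts the open cursor into the connection's LRU cache

        // Second use: the driver reuses the cached cursor, so no parse is needed.
        try (PreparedStatement ps = oconn.prepareStatement("SELECT * FROM EMPLOYEES WHERE id = ?")) {
            ps.setInt(1, 2);
            ps.executeQuery().close();
        }
    }
}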
I think this answers your question (sorry, it is PowerPoint, but it describes how the prepared statement is sent to Oracle, how Oracle stores it in the shared SQL pool, processes it, etc.). The main performance gain you get from prepared statements is that on the (n+1)th run you avoid hard parses of the SQL statement.
http://chrisgatesconsulting.com/preparedStatements.ppt
Oracle (or your DB of choice) will store the prepared statement; Java just sends it the same statement text, which the DB matches against what it has stored. Those resources are limited, however: after some time without the query being run, the shared SQL will be purged (especially for uncommon queries), and then a re-parse will be required, whether or not the statement is cached in your Java application.

BPEL for data synchronization

I am trying to use Oracle SOA BPEL to sync data for about 1000 employees between an HR service and our local DB. I get the IDs of all employees with a findEmp call and loop through it empCount times, calling getEmp(empID) on the same HR service and updating/inserting into our DB in every iteration. This times out after about 60-odd employees, even though the process is asynchronous. How should I redesign the process flow?
The timeout is occurring because you don't have any dehydration points in your BPEL code. Oracle BPEL needs to dehydrate before the Java transaction times out.
If you are using the Oracle BPEL DB Adapter, you can actually submit many objects at once to the database for processing; simply put more than one record in the element passed to the DB Adapter. This may help a lot, since you can fetch all your data at once and then write it all at once.
Additionally, you can extend the transaction timeout for Oracle BPEL; it's a configuration parameter in transaction-manager.xml (there are also some tweaks to the EJB timeouts you need to make on 10.1.3.3.x and 10.1.3.4.x). The Oracle BPEL docs tell you how to change this variable.