Does the compiled prepared statement in the database driver still require compilation in the database? - oracle

In the Oracle JDBC driver, there is an option to cache prepared statements. My understanding of this is that the prepared statements are precompiled by the driver, then cached, which improves performance for cached prepared statements.
My question is, does this mean that the database never has to compile those prepared statements? Does the JDBC driver send some precompiled representation, or is there still some kind of parsing/compilation that happens in the database itself?

When you use the implicit statement cache (or the Oracle extension for the explicit statement cache), the Oracle driver will cache a prepared or callable statement after(!) the close() for re-use with the physical connection.
So what happens is: if a prepared statement is used and the physical connection has never seen it, it sends the SQL to the DB. Depending on whether the DB has seen the statement before or not, it will do a hard parse or a soft parse. So typically, if you have a pool of 10 connections, you will see 10 parses, one of them being a hard parse.
After the statement is closed on a connection, the Oracle driver will put the handle to the parsed statement (shared cursor) into an LRU cache. The next time you call prepareStatement on that connection, it finds this cached handle and does not need to send the SQL at all. This results in an execution with NO PARSE.
If more (different) prepared statements are used on a physical connection than the cache can hold, the least recently used open shared cursor is closed. That results in another soft parse the next time that statement is used, because the SQL needs to be sent to the server again.
This is basically the same function that some middleware data sources have implemented more generically (for example prepared-statement-cache in JBoss). Use only one of the two to avoid double caching.
You can find the details here:
http://docs.oracle.com/cd/E11882_01/java.112/e16548/stmtcach.htm#g1079466
Also check out the Oracle Universal Connection Pool (UCP), which supports this and interacts with FAN.
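
As an illustration, here is a minimal sketch of enabling the implicit statement cache on a bare Oracle connection. The connection URL, credentials, and the emp table are placeholders; connection pools like UCP expose equivalent settings:

import java.sql.DriverManager;
import java.sql.PreparedStatement;
import oracle.jdbc.OracleConnection;

public class ImplicitCacheDemo {
    public static void main(String[] args) throws Exception {
        OracleConnection conn = (OracleConnection) DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
        conn.setImplicitCachingEnabled(true); // turn the implicit statement cache on
        conn.setStatementCacheSize(20);       // LRU cache keeps up to 20 cursors

        String sql = "select ename from emp where empno = ?";
        for (int i = 0; i < 2; i++) {
            // The second iteration reuses the cached cursor: no SQL sent, NO PARSE.
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, 7839);
                ps.executeQuery().close();
            } // close() returns the cursor handle to the cache instead of destroying it
        }
        conn.close();
    }
}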

I think that this answers your question (sorry, it is PowerPoint, but it describes how the prepared statement is sent to Oracle, how Oracle stores it in the shared SQL pool, processes it, etc.). The main performance gain you get from prepared statements is that on the 1+nth run you avoid hard parses of the SQL statement.
http://chrisgatesconsulting.com/preparedStatements.ppt
Oracle (or the DB of your choice) will store the prepared statement; Java just sends the same statement text, which the DB matches against the shared SQL pool. Resources are limited, however: after some time with no executions, the shared SQL will be purged (especially uncommon queries), and then a re-parse will be required, whether or not the statement is cached in your Java application.

Related

Details of JDBC PreparedStatement.executeQuery()

When we execute SQL queries using PreparedStatement (as described here http://tutorials.jenkov.com/jdbc/preparedstatement.html), what exactly does the executeQuery() method do, if we use for example a database like SQL Server or Postgres? Does it convert the SQL query directly into a set of database operations, or does it make a network call to a database server that translates the SQL query to the database operations?
This is more generally a question about how databases like SQL Server work. I'm just wondering if they're running on separate servers than the ones calling executeQuery().
The implementation details vary per database system, but in general, JDBC drivers for RDBMSes that use SQL as their native query language work as follows:
When Connection.prepareStatement(...) is executed, the query is sent to the database server for compilation.
On PreparedStatement.executeQuery(), the driver sends an execute command together with the collected parameter values, and the database server executes the statement compiled earlier using those parameter values.
In other words, the driver is not concerned with low-level operations on the database server, but will just send a 'compile' and 'execute-with-parameters' command to the database server, and the database server takes care of the low-level operations.
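In JDBC terms the split looks like this (a sketch, given an open java.sql.Connection named connection; the table and column names are invented):

// prepareStatement(...) sends the SQL text to the server for compilation once...
PreparedStatement ps = connection.prepareStatement(
        "select name from users where id = ?");
ps.setLong(1, 42L);
// ...while executeQuery() sends only an execute command plus the parameter values.
try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        System.out.println(rs.getString("name"));
    }
}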
And to be clear, not all drivers work this way. For example, the MySQL Connector/J by default 'compiles' the statement locally (determines the number of parameters), and on execute it will inline the parameter values (with proper escaping) into the statement, and then send a SQL string with literal values instead of parameters for execution on the database server. However, in that case, the database server is still responsible for determining and performing the necessary low-level operations.
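For what it's worth, Connector/J can be switched to true server-side prepares with the useServerPrepStmts connection property (a hypothetical local URL shown):
jdbc:mysql://localhost:3306/test?useServerPrepStmts=true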
On the other hand, it is entirely possible that a driver for some NoSQL database (or other type of datastore) will translate a SQL query into low-level operations on that datastore.

JDBC batch execution guarantees

We're trying to execute a batch insert into Azure Synapse (formerly Azure SQL Data warehouse). Problems are:
Performance is abysmal (~1 second for insertion of one row of less than 2KB and 20-25 columns)
It scales linearly (~90 seconds for 100 rows I think)
We're using the standard JDBC batch insertion pattern, addBatch() & executeBatch(), with PreparedStatements (https://stackoverflow.com/a/3786127/496289).
We're using JDBC driver provided by Microsoft.
We know what's wrong: in DB telemetry it's clear that the DB is breaking the batch down and more or less running it as if it were in a for-loop. No batch "optimization".
Curiously, when the underlying data source is SQL Server, batch scales as expected.
Question is: Is there nothing in the standard/spec that says executeBatch() should scale better than linearly?
E.g. the JDBC™ 4.3 Specification (JSR 221) says it can improve performance, not that it must:
CHAPTER 14 Batch Updates
The batch update facility allows multiple SQL statements to be submitted to a data source for processing at once. Submitting multiple SQL statements, instead of individually, can greatly improve performance. Statement, PreparedStatement, and CallableStatement objects can be used to submit batch updates
Section 14.1.4, PreparedStatement Objects, has no such explicit or implied statement saying the batch mechanism is there for better performance.
Should probably add that Azure Synapse is capable of loading 1 trillion rows of data (~450 GB in Parquet format) from the Data Lake in 17-26 minutes with 500 DWUs.
The JDBC specification doesn't require any kind of optimization for batch execution. In fact, not all databases support batch execution. A conforming JDBC driver is expected to implement batch execution whether or not the underlying database system supports it.
If the database system doesn't support it, the JDBC driver will simulate batch execution by repeatedly executing the statement in a loop. Such an implementation will not perform better than manually executing the statement repeatedly.
This is also why the text you quote says "can greatly improve performance" and not will or must.
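So whether the standard pattern below turns into one round trip or into a server-side loop is entirely up to the driver. A sketch, given an open Connection conn; the Measurement class and the table are invented:

String sql = "insert into measurements (id, val) values (?, ?)";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (Measurement m : measurements) {
        ps.setLong(1, m.getId());
        ps.setDouble(2, m.getValue());
        ps.addBatch();                // queue the parameter set on the client
    }
    int[] counts = ps.executeBatch(); // one round trip, or a loop: driver's choice
}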

Oracle Bind Query is very slow

I have an Oracle bind query that is extremely slow (about 2 minutes) when it executes in my C# program but runs very quickly in SQL Developer. It has two parameters that hit the tables index:
select t.Field1, t.Field2
from theTable t
where t.key1=:key1
and t.key2=:key2
Also, if I remove the bind variables and create dynamic sql, it runs just like it does in SQL Developer.
Any suggestion?
BTW, I'm using ODP.
If you are replacing the bind variables with static values in SQL Developer, then you're not really running the same test. Make sure you use the bind variables, and if it's also slow, you're just getting bitten by a bad cached execution plan. Updating the stats on that table should resolve it.
However, if you are actually using bind variables in SQL Developer, then keep reading. The TL;DR version is that the settings ODP.NET runs under sometimes cause a slightly more pessimistic approach. Start with updating the stats, but have your DBA capture the execution plan under both scenarios and compare to confirm.
I'm reposting my answer from here: https://stackoverflow.com/a/14712992/852208
I considered flagging yours as a duplicate, but your title is a little more concise since it identifies that the query runs fast in SQL Developer. I'll welcome advice on handling this another way.
Adding the following to your config will send odp.net tracing info to a log file:
This will probably only be helpful if you can find a large gap in time. Chances are rows are actually coming in, just at a slower pace.
Try adding "enlist=false" to your connection string. I don't consider this a solution since it effectively disables distributed transactions, but it should help you isolate the issue. You can get a little more information from an Oracle forums post:
From an ODP perspective, all we can really point out is that the
behavior occurs when OCI_ATR_EXTERNAL_NAME and OCI_ATR_INTERNAL_NAME
are set on the underlying OCI connection (which is what happens when
distrib tx support is enabled).
I'd guess what you're not seeing is that the execution plan is actually different (meaning the actual performance hit is occurring on the server) between the ODP.NET call and the SQL Developer call. Have your DBA trace the connection and obtain execution plans both from the ODP.NET call and from the call straight from SQL Developer (or with the enlist=false parameter).
If you confirm different execution plans, or if you want to take a preemptive shot in the dark, update the statistics on the related tables. In my case this corrected the issue, indicating that execution plan generation doesn't really follow different rules for the different types of connections, but that the cost analysis is just slightly more pessimistic when a distributed transaction might be involved. Query hints to force an execution plan are also an option, but only as a last resort.
Finally, it could be a network issue. If your ODP.NET install is using a fresh Oracle home (which I would expect unless you did some post-install configuring), then the tnsnames.ora could be different. Host names in tnsnames.ora might not be fully qualified, creating more delays resolving the server. I'd only expect the first attempt (and not subsequent attempts) to be slow in this case, so I don't think it's the issue, but I thought it should be mentioned.
Are the parameters bound to the correct data types in C#? Are the columns key1 and key2 numbers, while the parameters :key1 and :key2 are strings? If so, the query may return the correct results but will require implicit conversion. That implicit conversion is like wrapping the column in a function, to_char(key1), which prevents the index from being used.
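The same pitfall exists in JDBC, for what it's worth; the fix is to bind with the type of the column (a sketch, assuming key1 is a NUMBER column and ps is the prepared statement):

ps.setString(1, "12345"); // mismatched bind: forces an implicit conversion
ps.setLong(1, 12345L);    // matching bind: the index on key1 stays usable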
Please also check the number of rows returned by the query. If the number is big, then possibly C# is fetching all rows while the other tool fetches only the first batch. Fetching all rows may require many more disk reads, which is slower. To check this, try running the following in SQL Developer:
SELECT COUNT(*) FROM (
select t.Field1, t.Field2
from theTable t
where t.key1=:key1
and t.key2=:key2
)
The above query forces all rows, and hence the maximum number of database blocks, to be fetched.
A nice tool in such cases is the tkprof utility, which shows the SQL execution plan; the plans may differ between the cases above (although they should not).
It is also possible that you have accidentally connected to different databases. In such cases it is worth comparing the results of the queries.
Since you are saying "bind is slow", I assume you have checked the SQL without binds and it was fast. In 99% of cases using binds makes things better. Please check whether the query with constants runs fast. If yes, then the problem may be implicit conversion of the key1 or key2 column (e.g. t.key1 is a number and :key1 is a string).

Multithreaded access to an embedded HSQLDB database

I have an HSQLDB embedded database, which I use to store statistics of some measurements. The statistics are expected to arrive about every second from the same thread, but to be fetched by a few different threads (from a thread pool) every several seconds.
I do not have much experience with jdbc, so my questions may sound trivial:
What is the price of creating/disposing a new connection every second? Recall that the database is embedded, so there is no TCP/IP involved.
What is the price of creating/disposing prepared statements every second?
Please note that some inserts are bulk inserts, where I intend to use the addBatch and executeBatch methods of a prepared statement.
You should try to reuse the connections and the prepared statements.
To do this, each thread will have a single connection, and each connection will reuse a set of prepared statements. The connection is committed after the unit of work is complete, but not closed. The connection is closed when your app is closed or completes its work.
You should use executeBatch for bulk inserts.
With HSQLDB, the price of creating / disposing a new connection / prepared statements every second is not high, but you should still avoid this if you can.
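
A minimal sketch of the suggested pattern, assuming an in-process HSQLDB file database and an invented stats table (each thread would get its own instance):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class StatsWriter implements AutoCloseable {
    private final Connection conn;
    private final PreparedStatement insert;

    public StatsWriter() throws SQLException {
        // One connection per thread, kept open for the lifetime of the app.
        conn = DriverManager.getConnection("jdbc:hsqldb:file:statsdb", "SA", "");
        conn.setAutoCommit(false);
        // One prepared statement, reused for every insert.
        insert = conn.prepareStatement("insert into stats (ts, val) values (?, ?)");
    }

    public void writeBulk(long[] timestamps, double[] values) throws SQLException {
        for (int i = 0; i < timestamps.length; i++) {
            insert.setLong(1, timestamps[i]);
            insert.setDouble(2, values[i]);
            insert.addBatch();
        }
        insert.executeBatch(); // bulk insert in one go
        conn.commit();         // commit the unit of work, keep the connection open
    }

    @Override
    public void close() throws SQLException {
        insert.close();
        conn.close(); // only when the app shuts down
    }
}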

Query result in Oracle memory or Perl memory?

I am using the DBI module to fire a select query at Oracle. Using DBI's prepare method the query is prepared, and then using the execute method the select query is executed.
My question is: once the query is executed, the result is stored in memory until we use any of the fetchrow methods to retrieve the result. Until then, is the query result stored in Oracle memory or in Perl memory?
To my understanding it should be in Oracle memory, but I still wanted to confirm.
It is held in Oracle until you issue your first fetch. However, you should be aware that once you make your first fetch call DBD::Oracle (which I presume you are using) will likely fetch multiple rows back in one go even if you asked for only one (you can see how many with RowsInCache). You can alter the settings used with ora_prefetch_rows, ora_prefetch_memory and ora_row_cache_off.
In the Oracle memory. First hint: you don't have access to that data yet.
You could test the amount of memory used by your Perl script before and after the execute statement to confirm.
See http://docstore.mik.ua/orelly/linux/dbi/ch05_01.htm