If a query has no WHERE clause, should I use Statement or PreparedStatement? Which one is more efficient?
For example:
SELECT ID, NAME FROM PERSON
A prepared statement is precompiled to enhance efficiency. Also the database caches the statement which gains performance on later execution. Both can be of use even if you don't have variables in your statement. Especially if the statement is executed often.
If it is executed only once or very seldom, I'd say a normal Statement is fine. Otherwise I would use a PreparedStatement. But there's no way of being sure about it without benchmarking.
It depends on the implementation of the JDBC driver. Some vendors save the statement in a cache regardless of whether it is an instance of java.sql.Statement or java.sql.PreparedStatement. For simplicity, you could use java.sql.Statement. On the other hand, if you plan to add a parameter and execute the statement several times (on the same connection), use an instance of java.sql.PreparedStatement.
The Javadoc for java.sql.PreparedStatement says:
This object can then be used to efficiently execute this statement multiple times.
Apart from what stonedsquirrel has mentioned, another point is that if you later want to add a WHERE condition, the change is easy. All you need to add to your code is the following:
PreparedStatement ps = con.prepareStatement("SELECT ID, NAME FROM PERSON WHERE NAME= ?");
ps.setString(1, getName(""));
....
...
However, if you are using Statement, you need to make more changes to your code.
So by using PreparedStatement you make only a minimal change if you need to add WHERE conditions.
On the other hand, with Statement it is quite easy to log or print the SQL query, whereas with PreparedStatement logging or printing the SQL statement is quite difficult, and there is no direct approach available.
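One possible workaround for the logging problem is to render the SQL template together with its bound values yourself, purely for log output. The following is a minimal sketch under that assumption; the class name SqlLogFormatter and the method render are hypothetical and not part of JDBC, and the quoting is deliberately naive, so the returned string should only be logged, never executed.

```java
import java.util.List;

final class SqlLogFormatter {
    // Renders a parameterized SQL template with its bound values for log output.
    // Hypothetical helper: the result is for humans to read, not for execution.
    static String render(String template, List<?> params) {
        StringBuilder out = new StringBuilder();
        int next = 0;              // index of the next parameter to substitute
        boolean inLiteral = false; // true while inside a single-quoted SQL literal
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == '\'') {
                inLiteral = !inLiteral; // a '?' inside a literal is not a placeholder
                out.append(c);
            } else if (c == '?' && !inLiteral && next < params.size()) {
                Object value = params.get(next++);
                if (value == null) {
                    out.append("NULL");
                } else if (value instanceof Number) {
                    out.append(value);
                } else {
                    // Quote as a string and double any embedded single quotes.
                    out.append('\'').append(value.toString().replace("'", "''")).append('\'');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

You would call it next to the real statement, for example logging SqlLogFormatter.render(sql, List.of(name)) just before ps.executeQuery(), while still binding the value with ps.setString(1, name).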
I want to use the count(*) result from a Hive Query as input for a second hive query. The query is simplified as:
set LIM = SELECT count(*) from default.mytable* 0.8;
select * from default.mytable LIMIT ${hiveconf:LIM};
The above code leads to an error because the first query does not get executed, and therefore the LIM variable is not substituted with a numeric value.
Is there a way to force Hive to substitute the variable LIM so that I have a numeric value in the second query?
## WARNING - verbose explanation follows; the short answer is "no way" ##
In terms of IT architecture, this kind of trick is not done in the database tier but in the application tier.
Since I don't know nuthin' about your Teradata stack (fondly nicknamed "taratata" by some of your French-speaking colleagues) I'll take the Oracle stack as an example.
A. Inside a PL/SQL block, you can retrieve the (scalar) result of a query into a variable and use it later, either as an input bind variable in a prepared statement or to dynamically build a string that is then parsed as a SQL query. That PL/SQL block is an "application", with application logic of arbitrary complexity; it just happens to run inside an Oracle session, on the same host that also runs the database tier.
B. Inside the SQL*Plus client (and maybe compatible tools e.g. SQL Developer) you can use a weird syntax to retrieve a value in a kind of macro-variable, that can be used to stuff the value as-is in further SQL queries. That trick allows some crude "application" logic to be applied to an otherwise static SQL script, client-side. But that is clearly a non-portable trick.
Bottom line - since Hive has no procedural language, and will probably (hopefully) never have one, the best way to do what you want would be to develop your own custom Hive client all by yourself, with whatever business logic you want. After all, there must be thousands of people around the world who are developing Java code to access Hive with JDBC, so you would not be alone...
Well.. you can do that if you are comfy with writing a shell script.
Take the query output and store it into a variable & use the variable for your second query.
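The same two-step idea (run the count first, then build the second query from its result) can also be done from a small JDBC client instead of a shell script. This is only a sketch: the class and method names are hypothetical, the Connection is assumed to come from whatever Hive JDBC setup you already have, and the fraction is passed as an integer percentage to avoid floating-point rounding.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveTwoStepQuery {
    // Step 2 helper: builds the second query from the count returned by the first.
    // percent = 80 means "80% of the rows", rounded down.
    static String limitedQuery(String table, long rowCount, int percent) {
        long limit = rowCount * percent / 100;
        return "SELECT * FROM " + table + " LIMIT " + limit;
    }

    // Step 1: runs the count query and returns the scalar result.
    // The table name is concatenated into the SQL, so it must come from trusted code.
    static long countRows(Connection con, String table) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT count(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // Runs both steps and returns how many rows the second query produced.
    static long runLimited(Connection con, String table, int percent) throws SQLException {
        String sql = limitedQuery(table, countRows(con, table), percent);
        long seen = 0;
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                seen++; // process each row here
            }
        }
        return seen;
    }
}
```

For the table in the question, the call would be HiveTwoStepQuery.runLimited(con, "default.mytable", 80).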
I have an Oracle bind query that is extremely slow (about 2 minutes) when it executes in my C# program but runs very quickly in SQL Developer. It has two parameters that hit the table's index:
select t.Field1, t.Field2
from theTable t
where t.key1=:key1
and t.key2=:key2
Also, if I remove the bind variables and create dynamic sql, it runs just like it does in SQL Developer.
Any suggestion?
BTW, I'm using ODP.
If you are replacing the bind variables with static values in SQL Developer, then you're not really running the same test. Make sure you use the bind variables, and if it's also slow you're just getting bitten by a bad cached execution plan. Updating the stats on that table should resolve it.
However, if you are actually using bind variables in SQL Developer, then keep reading. The TL;DR version is that the parameters ODP.NET runs under sometimes cause a slightly more pessimistic approach. Start with updating the stats, but have your DBA capture the execution plan under both scenarios and compare to confirm.
I'm reposting my answer from here: https://stackoverflow.com/a/14712992/852208
I considered flagging yours as a duplicate but your title is a little more concise since it identifies the query does run fast in sql developer. I'll welcome advice on handling in another manner.
Adding the following to your config will send odp.net tracing info to a log file:
This will probably only be helpful if you can find a large gap in time. Chances are rows are actually coming in, just at a slower pace.
Try adding "enlist=false" to your connection string. I don't consider this a solution since it effectively disables distributed transactions, but it should help you isolate the issue. You can get a little more information from an Oracle forums post:
From an ODP perspective, all we can really point out is that the
behavior occurs when OCI_ATR_EXTERNAL_NAME and OCI_ATR_INTERNAL_NAME
are set on the underlying OCI connection (which is what happens when
distrib tx support is enabled).
I'd guess what you're not seeing is that the execution plan is actually different (meaning the performance hit is actually occurring on the server) between the ODP.NET call and the SQL Developer call. Have your DBA trace the connection and obtain execution plans from both the ODP.NET call and the call straight from SQL Developer (or with the enlist=false parameter).
If you confirm different execution plans, or if you want to take a preemptive shot in the dark, update the statistics on the related tables. In my case this corrected the issue, indicating that execution plan generation doesn't really follow different rules for the different types of connections but that the cost analysis is just slightly more pessimistic when a distributed transaction might be involved. Query hints to force an execution plan are also an option, but only as a last resort.
Finally, it could be a network issue. If your ODP.NET install is using a fresh Oracle home (which I would expect unless you did some post-install configuring) then the tnsnames.ora could be different. Host names in tnsnames.ora might not be fully qualified, creating more delays resolving the server. I'd only expect the first attempt (and not subsequent attempts) to be slow in this case, so I don't think it's the issue, but I thought it should be mentioned.
Are the parameters bound to the correct data type in C#? Are the columns key1 and key2 numbers, but the parameters :key1 and :key2 are strings? If so, the query may return the correct results but will require implicit conversion. That implicit conversion is like using a function to_char(key1), which prevents an index from being used.
Please also check the number of rows returned by the query. If the number is big, then possibly C# is fetching all rows while the other tool fetches only the first batch. Fetching all rows may require many more disk reads in that case, which is slower. To check this, try running the following in SQL Developer:
SELECT COUNT(*) FROM (
select t.Field1, t.Field2
from theTable t
where t.key1=:key1
and t.key2=:key2
)
The above query should fetch the maximum number of database blocks.
A nice tool in such cases is the tkprof utility, which shows the SQL execution plan; the plan may differ between the cases above (although it should not).
It is also possible that you have accidentally connected to different databases. In such cases it is nice to compare results of queries.
Since you are reporting that "bind is slow", I assume you have checked the SQL without binds and it was fast. In 99% of cases using binds makes things better. Please check whether the query with constants runs fast. If it does, then the problem may be implicit conversion of the key1 or key2 column (e.g. t.key1 is a number and :key1 is a string).
I have a very large query. It is too large to post here (it has 6 levels of nested queries with ordering and grouping). The query has 2 parameters that are passed to it via PreparedStatement.setString(index, value). When I execute my query through SQL Developer (replacing the query parameters with actual values by hand beforehand) the query runs in about 10 seconds and returns approximately 15000 rows. But when I try to run it from a Java program using PreparedStatement with variables, it fails with ORA-01652 (unable to extend temp segment). I have tried using a simple Statement from the Java program and it works fine. Also, when I use PreparedStatement without variables (not calling setString(), but specifying the parameters by hand) it works fine too.
So I suspect that the problem is in the PreparedStatement parameters.
How does the mechanism of those parameters work? Why does a simple Statement work fine while the prepared one fails?
You're probably running into issues with bind variable peeking.
For the same query, the best plan can be significantly different depending on the actual bind variables. In 10g, Oracle builds the execution plan based on the first set of bind variables used. 11g mostly fixed this problem with adaptive cursor sharing, a feature that creates multiple plans for different bind variables.
Here are some ideas for solving this problem:
Use literals: This isn't always as bad as people assume. If the good version of your query runs in 10 seconds, the overhead of hard-parsing the query will be negligible. But you may need to be careful to avoid SQL injection.
Force a hard-parse: There are a few ways to force Oracle to hard-parse every query. One method is to call DBMS_STATS with NO_INVALIDATE=>FALSE on one of the tables in the query.
Disable bind-variable peeking / hints: You can do this by removing the relevant histograms, or using one of the parameters in the link provided by OldProgrammer. This will stabilize your plan, but will not necessarily pick the correct plan. You may also need to use hints to pick the right plan. But then you may not have the right plan for every combination of inputs.
Upgrade to 11g: This may not be an option, but this issue is another good reason to start planning an upgrade.
A general question, but I am not able to find an answer to it: if PreparedStatement can run even static SQL, why do we need Statement in java.sql.*?
EDIT:
Thanks Mat
but my concern is why one would use Statement rather than PreparedStatement; in other words, where is Statement preferable to PreparedStatement?
Note: My understanding is that one can use Statement for static queries that are not fired frequently, rather than PreparedStatement, which is used for frequent queries (reason: performance because of pre-compilation of the SQL).
In general, PreparedStatement will perform better when the same query is executed repeatedly in your code, because the query is cached.
But if the same query runs only rarely in your application, PreparedStatement will not improve performance. In that case you should use Statement, which is the default choice for JDBC-based code.
I have a table:
-- Tag
ID | Name
-----------
1 | c#
2 | linq
3 | entity-framework
I have a class that will have the following methods:
IEnumerable<Tag> GetAll();
IEnumerable<Tag> GetByName(string name);
Should I use a compiled query in this case?
static readonly Func<Entities, IEnumerable<Tag>> AllTags =
CompiledQuery.Compile<Entities, IEnumerable<Tag>>
(
e => e.Tags
);
Then my GetByName method would be:
IEnumerable<Tag> GetByName(string name)
{
using (var db = new Entities())
{
return AllTags(db).Where(t => t.Name.Contains(name)).ToList();
}
}
This generates a SELECT ID, Name FROM Tag and executes the Where in code. Or should I avoid CompiledQuery in this case?
Basically I want to know when I should use compiled queries. Also, on a website they are compiled only once for the entire application?
You should use a CompiledQuery when all of the following are true:
The query will be executed more than once, varying only by parameter values.
The query is complex enough that the cost of expression evaluation and view generation is "significant" (trial and error)
You are not using a LINQ feature like IEnumerable<T>.Contains() which won't work with CompiledQuery.
You have already simplified the query, which gives a bigger performance benefit, when possible.
You do not intend to further compose the query results (e.g., restrict or project), which has the effect of "decompiling" it.
CompiledQuery does its work the first time a query is executed. It gives no benefit for the first execution. Like any performance tuning, generally avoid it until you're sure you're fixing an actual performance hotspot.
2012 Update: EF 5 will do this automatically (see "Entity Framework 5: Controlling automatic query compilation") . So add "You're not using EF 5" to the above list.
Compiled queries save you time, which would be spent generating expression trees. If the query is used often and you'll save the compiled query, you should definitely use it. I had many cases when the query parsing took more time than the actual round trip to the database.
In your case, if you are sure that it would generate SELECT ID, Name FROM Tag without the WHERE clause (which I doubt, as your AllTags function should return IQueryable and the actual query should be made only after calling ToList), you shouldn't use it.
As someone already mentioned, on bigger tables SELECT * FROM [someBigTable] would take very long and you'll spend even more time filtering that on the client side. So you should make sure that your filtering is made on the database side, no matter if you are using compiled queries or not.
Compiled queries are most helpful for LINQ queries with large expression trees, such as complex queries, where you gain performance by not building the expression tree again and again each time the query is reused. In your case I guess it will save very little time.
Compiled queries are compiled once, the first time they are executed, and then reused. Whenever you reuse a query often, or it is complex, you should definitely try compiled queries to make execution faster.
But I would not go for it on all queries as it is a little more code to write and for simple queries it might not be worthwhile.
But for maximum performance you should also evaluate Stored Procedures where you do all the processing on the database server, even if Linq tries to push as much of the work to the db as possible you will have situations where a stored procedure will be faster.
Compiled queries offer a performance improvement, but it's not huge. If you have complex queries, I'd rather go with a stored procedure or a view, if possible; letting the database do its thing might be a better approach.