JDBC: Is it possible to execute another query on the results of a previous query?

I want to first get some result set (that includes two joins and selection), and then get the maximum value for one of the columns in the result set.
I need both the data in the original results set, and the max.
Is this possible with JDBC, and how?

I think it is. Should look like this:
SELECT MAX(derivedTable.myRow) FROM
(SELECT * FROM table1 JOIN table2 ON table1.id = table2.some_id) derivedTable
The key is to assign your inner select an alias ("derivedTable" above) and perform another selection on that.
--- Edit based on comment:
No, I don't think that's possible. Even without the JDBC layer, say in a direct SQL console, I don't know of a way in any RDBMS to query the data already held in a result set.
Depending on the speed of your query and the size of the result, your best options are either to perform a second query or simply to iterate through the results to find the maximum.

You can do it with standard SQL although it's a bit awkward:
SELECT a, b, c, (SELECT MAX(c) FROM table1 JOIN table2 ON table1.id = table2.table_id) max_c
FROM table1
JOIN table2 ON table1.id = table2.table_id

Related

Write a nested select statement with a where clause in Hive

I have a requirement to do a nested select within a where clause in a Hive query. A sample code snippet would be as follows:
select *
from TableA
where TA_timestamp > (select timestmp from TableB where id="hourDim")
Is this possible, or am I doing something wrong here? I am getting an error while running the above script.
To further elaborate on what I am trying to do: there is a Cassandra keyspace to which I publish statistics with a timestamp. Periodically (hourly, for example) these stats are summarized using Hive; once summarized, that data is stored separately with the corresponding hour. So when the query runs for the second time (and on subsequent runs) it should only run on the new data (i.e. timestamp > previous_execution_timestamp). I am trying to do that by storing the latest executed timestamp in a separate Hive table, and then using that value to filter out the raw stats.
Can this be achieved using Hive?
Subqueries inside a WHERE clause are not supported in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
However, often you can use a JOIN statement instead to get to the same result:
https://karmasphere.com/hive-queries-on-table-data#join_syntax
For example, this query:
SELECT a.KEY, a.value
FROM a
WHERE a.KEY IN
(SELECT b.KEY FROM b);
can be rewritten to:
SELECT a.KEY, a.value
FROM a LEFT SEMI JOIN b ON (a.KEY = b.KEY)
Looking at the business requirements underlying your question, it occurs to me that you might get more efficient results by partitioning your Hive table by hour. If the data can be written so that the hour is the partition key, then your query to update the summary will be much faster and require fewer resources.
Partitions can get out of hand when they reach the scale of millions, but this does not look like a case that will run into that limitation.
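As a rough sketch of what that might look like (the table, column, and partition-key names here are made up for the example; adapt them to your actual schema):
CREATE TABLE raw_stats (
  stat_name  STRING,
  stat_value BIGINT
)
PARTITIONED BY (hour_key STRING);

CREATE TABLE stats_summary (
  stat_name STRING,
  total     BIGINT
)
PARTITIONED BY (hour_key STRING);

-- the hourly job then only scans the partition for the hour being summarized
INSERT OVERWRITE TABLE stats_summary PARTITION (hour_key = '2013-05-01-13')
SELECT stat_name, SUM(stat_value)
FROM raw_stats
WHERE hour_key = '2013-05-01-13'
GROUP BY stat_name;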
It will work if you write:
select *
from TableA
where TA_timestamp in (select timestmp from TableB where id="hourDim")
EXPLANATION: The operators >, < and = need exactly one value on the right-hand side, whereas here the subquery can return multiple values, which can only be handled with the IN clause.

Oracle Bulk Collect and Update from cursor or user Update with Select subquery

I have two options for updating a table based on a cursor over a select query.
Say I have this select query:
select id1 from table1
and my update query is :
update table2 set value=1 where table2.id2 = table1.id1
Now the 2 options are:
Open a cursor for the select query, fetch it in bulk, and then fire the update query inside a FORALL statement.
Write the update query with the select as a subquery:
update table2 set value=1 where table2.id2 in (select id1 from table1)
which one is better?
Does Oracle internally convert the select subquery into a bulk collect, or does it treat it as a normal cursor?
First, on your question "Does Oracle internally convert the select subquery into a bulk collect?"
No. The optimizer calculates a plan and selects the data from the subquery in some appropriate way. There's no bulk collect involved.
To your question "which one is better?": well, it depends. If you can formulate a query which gets all of your table1.id1 values in one run, and table2 has lots of rows so the subselect is expensive, then I'd probably use bulk collect. But keep in mind that, depending on the amount of data, you'll need some PGA to accomplish this.
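For reference, a rough sketch of the bulk collect / FORALL variant, using the table and column names from the question; the LIMIT of 1000 is an arbitrary batch size chosen to keep PGA usage bounded:
DECLARE
  CURSOR c_src IS SELECT id1 FROM table1;
  TYPE t_id_tab IS TABLE OF table1.id1%TYPE;
  l_ids t_id_tab;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_ids LIMIT 1000; -- batch size bounds PGA usage
    EXIT WHEN l_ids.COUNT = 0;
    FORALL i IN 1 .. l_ids.COUNT
      UPDATE table2 SET value = 1 WHERE id2 = l_ids(i);
  END LOOP;
  CLOSE c_src;
  COMMIT;
END;
/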
But I'd like to point to another, IMHO quite elegant, solution:
MERGE INTO table2
USING (SELECT id1 FROM table1) src
ON (table2.id2 = src.id1)
WHEN MATCHED THEN
UPDATE SET value = 1
;
That's usually faster than doing subqueries and faster than bulk collects: one run on table1 and one run on table2. (Add a WHERE clause to suit your needs.)

Oracle Select Query, Order By + Limit Results

I am new to Oracle and working with a fairly large database. I would like to perform a query that selects the desired columns, orders by a certain column, and also limits the results. According to everything I have read, the query below should work, but it returns "ORA-00918: column ambiguously defined":
SELECT * FROM(SELECT * FROM EAI.EAI_EVENT_LOG e,
EAI.EAI_EVENT_LOG_MESSAGE e1 WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) WHERE ROWNUM <= 20
Any suggestions would be greatly appreciated :D
The error message means your result set contains two columns with the same name. Each column in a query's projection needs to have a unique name. Presumably you have a column (or columns) with the same name in both EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE.
You also want to join on that column. At the moment you are generating a cross join between the two tables. In other words, if you have a hundred records in EAI_EVENT_LOG and two hundred records in EAI_EVENT_LOG_MESSAGE, your result set will be twenty thousand records (without the rownum). This is probably not your intention.
"By switching to innerjoin, will that eliminate the error with the
current code?"
No, you'll still need to handle having two columns with the same name. Basically this comes from using SELECT * on multiple tables. SELECT * is bad practice. It's convenient, but it is always better to specify the exact columns you want in the query's projection. That way you can include (say) e.TRANSACTION_ID and exclude e1.TRANSACTION_ID, and avoid the ORA-00918 exception.
Maybe you have some columns with identical names in both the EAI_EVENT_LOG and EAI_EVENT_LOG_MESSAGE tables? Instead of SELECT *, list all the columns you want to select.
The other problem I see is that you are selecting from two tables without joining them in the WHERE clause, hence the result set will be the cross product of those two tables.
You need to stop using SQL '89 implicit join syntax.
Not because it doesn't work, but because it is evil.
Right now you have a cross join, which in 99.9% of cases is not what you want.
Also, every sub-select needs to have its own alias.
SELECT * FROM
(SELECT e.*, e1.* FROM EAI.EAI_EVENT_LOG e
INNER JOIN EAI.EAI_EVENT_LOG_MESSAGE e1 on (......)
WHERE e.SOURCE_URL LIKE '%.XML'
ORDER BY e.REQUEST_DATE_TIME DESC) s WHERE ROWNUM <= 20
Please specify a join criterion on the dotted line.
Normally you do a join on a keyfield e.g. ON (e.id = e1.event_id)
It's a bad idea to use select *; it's better to specify exactly which fields you want:
SELECT e.field1 as customer_id
,e.field2 as customer_name
.....
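Putting the advice in this thread together, the query could look roughly like this; the join condition reuses the e.id = e1.event_id example above, and the listed columns are only placeholders for whatever columns you actually need:
SELECT *
FROM (SELECT e.TRANSACTION_ID,       -- explicit columns avoid the ORA-00918 ambiguity
             e.SOURCE_URL,
             e.REQUEST_DATE_TIME,
             e1.MESSAGE_TEXT         -- placeholder column name
      FROM EAI.EAI_EVENT_LOG e
      INNER JOIN EAI.EAI_EVENT_LOG_MESSAGE e1 ON (e.id = e1.event_id)
      WHERE e.SOURCE_URL LIKE '%.XML'
      ORDER BY e.REQUEST_DATE_TIME DESC) s
WHERE ROWNUM <= 20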

Why is the SELECT * FROM Table WHERE ID NOT IN (list of int ids) query slow in SQL Server CE?

Well, this problem seems to be general in SQL Server CE.
I have indexes on all the fields.
Also, the same query but with ID IN (list of int ids) is pretty fast.
I tried to change the query to an OUTER JOIN, but that just made it worse.
So, any hints on why this happens and how to fix this problem?
That's because the index is not really helpful for that kind of query, so the database has to do a full table scan. If the query is (for some reason) slower than a simple "SELECT * FROM TABLE", do that instead and filter the unwanted IDs in the program.
EDIT: from your comment, I see that you use a subquery instead of a list. Given that, there are three possible ways to write the same thing (hopefully one of them is faster):
Original statement:
select * from mytable where id not in (select id from othertable);
Alternative 1:
select * from mytable where not exists
(select 1 from othertable where mytable.id=othertable.id);
Alternative 2:
select * from mytable
minus
select mytable.* from mytable inner join othertable on mytable.id=othertable.id;
Alternative 3: (ugly and hard to understand, but if everything else fails...)
select * from mytable
left outer join othertable on (mytable.id=othertable.id)
where othertable.id is null;
This is not a problem specific to SQL Server CE; it applies to databases in general.
The IN operation is sargable, while NOT IN is non-sargable.
What does this mean?
Sargable stands for Search ARGument ABLE: the DBMS engine can take advantage of an index. For a non-sargable predicate, the index can't be used.
The solution might be to use a filter to remove those IDs instead.
More in SQL Performance Tuning by Peter Gulutzan.
ammoQ is right: the index does not help much with your query. Depending on the distribution of values in your ID column, you could optimise the query by specifying which IDs to select rather than which ones not to select. If you end up requesting more than roughly 25% of the table, the index will not be used anyway, because for non-clustered indexes (which, if memory serves, is the only type of index SQL CE supports) it would be cheaper to scan the table. Otherwise (if the query is actually selective) you could rewrite the query with ID ranges to select ('union all' may work better than 'or' to combine ranges, if SQL CE supports 'union all'; not sure).
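For example, something along these lines (the ranges here are made up; they would come from whatever your application knows about which IDs to keep):
SELECT * FROM mytable WHERE id BETWEEN 1 AND 99
UNION ALL
SELECT * FROM mytable WHERE id BETWEEN 150 AND 500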

Table Join Efficiency Question

When joining across tables (as in the examples below), is there an efficiency difference between joining on the tables or joining subqueries containing only the needed columns?
In other words, is there a difference in efficiency between these two queries?
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
vs
SELECT result
FROM (SELECT result, test_id FROM result_tbl)
JOIN (SELECT test_id, sample_id FROM test_tbl) USING(test_id)
JOIN (SELECT sample_id FROM sample_tbl) USING(sample_id)
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A') USING(request_id)
The only way to find out for sure is to run both with tracing turned on and then look at the trace file. But in all probability they will be treated the same: the optimizer will merge all the inline views into the main statement and come up with the same query plan.
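In Oracle, a basic way to do that (assuming you can access the server's trace directory and have tkprof available) looks roughly like this:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'join_test';
ALTER SESSION SET SQL_TRACE = TRUE;
-- run both versions of the query here
ALTER SESSION SET SQL_TRACE = FALSE;
-- then format the resulting trace file with tkprof and compare the two plans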
It doesn't matter. It may actually be WORSE since you are taking control away from the optimizer which generally knows best.
However, remember that if you are doing a JOIN and only including columns from one of the tables, it is QUITE OFTEN better to rewrite it as a series of EXISTS statements, because that's what you really mean. JOINs (with some exceptions) will join all matching rows, which is a lot more work for the optimizer to do.
e.g.
SELECT t1.id1
FROM table1 t1
INNER JOIN table2 ON something = something
should almost always be
SELECT id1
FROM table1 t1
WHERE EXISTS( SELECT *
FROM table2
WHERE something = something )
For simple queries the optimizer may reduce the query plans into identical ones. Check it out on your DBMS.
Also this is a code smell and probably should be changed:
JOIN (SELECT request_id
FROM request_tbl
WHERE request_status='A')
to
SELECT result
FROM request
WHERE EXISTS(...)
AND request_status = 'A'
No difference.
You can tell by running EXPLAIN PLAN on both those statements - Oracle knows that all you want is the "result" column, so it only does the minimum necessary to get the data it needs - you should find that the plans will be identical.
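For example, for the first form (DBMS_XPLAN is the standard way to display the last explained plan):
EXPLAIN PLAN FOR
SELECT result
FROM result_tbl
JOIN test_tbl USING (test_id)
JOIN sample_tbl USING (sample_id)
JOIN (SELECT request_id
      FROM request_tbl
      WHERE request_status='A') USING (request_id);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);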
The Oracle optimiser does, sometimes, "materialize" a subquery (i.e. run the subquery and keep the results in memory for later reuse), but this is rare and only occurs when the optimiser believes this will result in a performance improvement; in any case, Oracle will do this "materialization" whether you specified the columns in the subqueries or not.
Obviously, if the only place the "result" column is stored is in the blocks (along with the rest of the data), Oracle has to visit those blocks; but it will only keep the relevant info (the "result" column and other relevant columns, e.g. "test_id") in memory when processing the query.
