I encountered following problem:
I have a long query (let's call it query "Z" ) that includes lots of joins and subqueries. It ouputs two columns:
A: item
B: Integer attribute, range guaranteed to be 1-10
I want to join from table X items (column A) that are not present as output of query Z and give them an arbitrary attribute value 10 (column B).
I tried creating subquery with inner subquery uisng not exists but that requires copying my original query inside and takes a lot of time (I didn't even manage to execute it).
Any suggestions? Thanks
It's not entirely clear from your question but I think you mean table X is a superset of the records from query Z. If so, a simple outer join should give you the result you want:
select coalesce(z.a, x.a) as a
, coalesce(z.b, 10) as b
from x
left outer join ( your query ) z
on z.a = x.a
If X is not a superset of Z then you should try a FULL OUTER JOIN instead.
I have assumed that column A works as a UID for the query Z and the table X. If this is not the case you'll need to tweak the above statement, or edit your question to include more details.
According to your comments , one possibility is to first make a union of the results of your "Z" query and then another select with the addition of only selecting the MIN() of your column B.
So it would look something along the lines of :
SELECT A , MIN(B) FROM
(
(QUERY Z) AS Z
union
(SELECT ITEM as A, 10 as B FROM X)
)
GROUP BY A
I did it similarily to what #Ancaron posted.
SELECT A, B FROM Z
UNION ALL
SELECT A, '10' FROM X
WHERE NOT EXISTS
(
select Z.A
from Z
WHERE Z.A=X.A
)
Related
I'm trying to execute a left join where multiple conditions must be met with the inclusion of pulling in the MAX sequence number that meets those conditions.
The left join is on the unique identifier in both tables. Table acaps_history has several rows for each app_id. I need to pull in only one row with the highest seq_number and activity_code of 'XU'. If the code 'XU' doesn't exist for the given app_id, then the case statement above should return 'N' for that row. The code I have currently just isn't working - returning the error "a column may not be outer-joined to a subquery":
create table orig_play3 as
(select
x.*,
case when xa.activity_code in 'XU' then 'Y' else 'N' end as cpo_flag
from
dfs_tab_orig_play_x x
left join cf.acaps_history xa on
x.APP_ID = xa.FOC_APPL_ID
and xa.activity_code in 'XU'
and xa.seq_number = (select max(seq_number) from cf.acaps_history where FOC_APPL_ID=x.app_id)
)
Given your error, it seems that the issue is the last part of your query:
and xa.seq_number = (select max(seq_number) from cf.acaps_history where FOC_APPL_ID=x.app_id)
This is still operating in the context of the ON clause, so the sub-query to find the max sequence number is the issue.
You should be able to avoid this by moving that sub-query out of the ON clause:
LEFT JOIN (
SELECT FOC_APPL_ID, activity_code, seq_number
FROM cf.acaps_history
WHERE activity_code in 'XU'
) xa
ON x.APP_ID = xa.FOC_APPL_ID
WHERE xa.seq_number = (select max(ah.seq_number) from cf.acaps_history ah where ah.FOC_APPL_ID=x.app_id and ah.activity_code in 'XU')
This may be the most inefficient way to execute this query, but it worked... It took like 3 minutes to run (table size is over 600K rows), but again, it returned the results I needed:
create table test as (
select x.*,
case when xb.activity_code in 'XU' then 'Y' else 'N' end as cpo_flag
from dfs_tab_orig_play_x x
left join
(select
xa.FOC_APPL_ID, xa.activity_code, xa.seq_number
from dfs_tab_orig_play_x x, cf.acaps_history xa
where x.app_id = xa.FOC_APPL_ID (+)
and xa.seq_number = (select max(seq_number) from cf.acaps_history where
x.app_id=FOC_APPL_ID(+) and activity_code in 'XU')) xb
on x.app_id = xb.FOC_APPL_ID (+)
)
If you are on 12c, I like OUTER APPLY for this sort of thing, because it lets you sort the rows for each app_id descending by seq_number and then just pick the highest one.
SELECT
x.*,
CASE
WHEN xa.activity_code IN 'XU' THEN 'Y'
ELSE 'N'
END
AS cpo_flag
FROM
dfs_tab_orig_play_x x
OUTER APPLY ( SELECT *
FROM cf.acaps_history xa
WHERE xa.foc_appl_id = x.app_id
AND xa.activity_code = 'XU'
ORDER BY xa.seq_number DESC
FETCH FIRST 1 ROW ONLY ) xa
Note: this logic is a little different from what you posted. In this version, it will join to the acaps_history row having the highest seq_number from among 'XU' records for the given app_id. Your version was joining to the row having the highest seq_number for the given app_id, whether that row was an 'XU' row or not. I am assuming (with little reason) that that was a bug on your part. But, if it wasn't, my version won't work as given.
In one of my use case, i have two tables namely flow and conf. The flow table contains list of all flight data. It has columns creationdate,datafilename,aircraftid. The conf table contains configuration information. It has columns configdate, aircraftid, configurationame. There are multiple versions of configurations created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from conf table that was created just before the datafilename was created. So, i tried this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x left join
(
select c.config_date, c.aircraft_id, c.configurationfrom t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.configuration;
This picks up all the configurations created for the aircraft which is obvious as there is no condition to check conf.config_date < flow.f_file_creation_date. I tried to include this condition like this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x join
(
select c.config_date, c.aircraft_id, c.FILEFILTER from t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT where y.config_date < x.f_file_creation_date
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.filefilter;
This time failed with the error
required (...)+ loop did not match anything at input 'where' in statement
Can someone give me a hint or two where i am going wrong and on how to fix this?
select f.f_file_creation_date
,f.f_file_archived_relative_path
,f.f_file_archived_name
,f.k_aircraft
,c.config_date
,c.aircraft_id
,c.filefilter
from t_flow as f
join (select config_date
,aircraft_id
,filefilter
,lead (config_date,1,date '3000-01-01') over
(
partition by aircraft_id
order by config_date
) as next_config_date
from t_conf
) c
on c.aircraft_id =
f.k_aircraft
where f.f_file_creation_date >= c.config_date
and f.f_file_creation_date < c.next_config_date
Please read carefully
Posting a question
When you post a data related question -
Supply a data sample: source data + required results.
It is going to be more clear than any explanation you give.
It will also supply a common background for further discussions and a way for you and others to verify the correctness of the given solutions.
Supply the size properties (records/volume) of the tables.
It is important for performance considerations ans might impact the given solution.
SQL
Hive currently does not support any JOIN condition type other than equijoin (e.g. t1.X = t2.X and t1.Y = t2.Y). This is why you get an error.
If you are doing an inner join (and not outer join) then you can move the non-equijoin conditions to the WHERE clause.
Stick to ISO SQL standard. There is a conventional order for SQL clauses: SELECT-FROM-WHERE...
You gain nothing from esoteric syntax except for esoteric error messages.
There is no reason what so ever to use sub-queries in order to narrow the columns list.
Just to make it perfectly clear - There isn't any performance gain doing that. More than that, if it would have work as you assume (and it does not) the performance would have been worse, not better.
I can't reproduce your error. I guess your query is valid.
What version do you use for Hive ? I tested this query with hive 2.1.1.
DROP TABLE IF EXISTS t_flow;
CREATE TABLE IF NOT EXISTS t_flow (
f_file_creation_date DATE
, f_file_archived_relative_path STRING
, f_file_archived_name STRING
, k_aircraft STRING
);
-- Conf table contains configuration information.
-- It has columns configdate, aircraftid, configurationame
DROP TABLE IF EXISTS t_conf;
CREATE TABLE IF NOT EXISTS t_conf (
config_date DATE
, aircraft_id STRING
, filefilter STRING
);
SELECT
x.f_file_creation_date,
x.f_file_archived_relative_path,
x.f_file_archived_name,
x.k_aircraft,
y.config_date,
y.aircraft_id,
y.filefilter
FROM
(SELECT
f_file_creation_date,
f_file_archived_relative_path,
f_file_archived_name,
k_aircraft
FROM t_flow f) x
JOIN
(SELECT
c.config_date,
c.aircraft_id,
c.filefilter
FROM t_conf c) y on y.aircraft_id = x.k_aircraft where y.config_date < x.f_file_creation_date;
Can we perform outer join with inquality operators.
When I tried I got the result for right outer join but it's not working for left outer join.
SELECT EMP.ENAME,EMP.SALARY,SALG.SALARY_GRADE
FROM EMPLOYEE EMP , SALARY_GRADES SALG
WHERE EMP.SAL BETWEEN SALG.FROM_RANGE(+) AND SALG.TO_RANGE
Above query is generating the result as inner join where as below query is working fine.
SELECT EMP.ENAME,EMP.SALARY,SALG.SALARY_GRADE
FROM EMPLOYEE EMP , SALARY_GRADES SALG
WHERE EMP.SAL(+) BETWEEN SALG.FROM_RANGE AND SALG.TO_RANGE
I meant to say that right outer join is working fine but not left outer join.
Ummm, yes. Did you create a simple test case to demonstrate? Please always do this.
Both LEFT and RIGHT JOINs work fine. Given the following schema:
create table a (
id number
, val number );
insert all
into a values (1, 1)
into a values (2, 2)
into a values (3, 5)
select * from dual;
create table b (
id number
, min_val number
, max_val number );
insert all
into b values (1, 1, 1)
into b values (2, 1, 6)
into b values (3, 4, 6)
into b values (3, 10, 12)
select * from dual;
These two queries return the expected data. Please note my use of ANSI joins.
select *
from a
left outer join b
on a.val between b.min_val and b.max_val;
select *
from a
right outer join b
on a.val between b.min_val and b.max_val;
Here's the proof.
If you're ever in any doubt as to whether there is a problem with the database or your code you should assume either that your code is incorrect or that the data in your database simply does not exist. It's highly unlikely to be the database itself.
A very good way to test this is to do as I have done, create a very simple example. A short, self-contained, correct example that demonstrates the concepts you're using. You can then apply this to your own code to work out where you might have been going wrong.
You've commented:
Thanks for your answer but......when I insert one more record with 4
as id and 50 as value using insert into a values(4,50); then if i
query using oracle proprietary syntax like select * from a, b where
a.val between b.min_val(+) and b.max_val; I am not getting inserted
record in the result...? It is working with ansi syntax but not with
traditional syntax.....
So, this would imply that your query using the Oracle proprietary syntax is incorrect. I much prefer the ANSI standard as it's extremely obvious if you've done something wrong and it's portable. However, if you want to use the Oracle syntax the reason is that you've turned it into an INNER JOIN but not stating that both items in the BETWEEN are part of the OUTER JOIN:
select *
from a
, b
where a.val between b.min_val(+) and b.max_val(+);
Is there any condition under which the two queries will yield different results?
select * from a,b,c where a.id = b.id(+) and a.id=c.id(+);
select * from a,b,c where a.id = b.id(+) or a.id=c.id(+);
I think in both cases, it will return the row if the id is in table a.
The second select fails with ORA-01719, Outer join operator (+) not allowed in operand of OR or IN.
Yet another reason to use ANSI JOIN syntax. You couldn't even conceive of this question if you were doing so.
I have one table, 'a', with id and timestamp. Another table, 'b', has N multiple rows referring to id, and each row has 'type', and "some other data".
I want a LINQ query to produce a single row with id, timestamp, and "some other data" x N. Like this:
1 | 4671 | 46.5 | 56.5
where 46.5 is from one row of 'b', and 56.5 is from another row; both with the same id.
I have a working query in SQLite, but I am new to LINQ. I dont know where to start - I don't think this is a JOIN at all.
SELECT
a.id as id,
a.seconds,
COALESCE(
(SELECT b.some_data FROM
b WHERE
b.id=a.id AND b.type=1), '') AS 'data_one',
COALESCE(
(SELECT b.some_data FROM
b WHERE
b.id=a.id AND b.type=2), '') AS 'data_two'
FROM a first
WHERE first.id=1
GROUP BY first.ID
you didn't mention if you are using Linq to sql or linq to entities. However following query should get u there
(from x in a
join y in b on x.id equals y.id
select new{x.id, x.seconds, y.some_data, y.type}).GroupBy(x=>new{x.id,x.seconds}).
Select(x=>new{
id = x.key.id,
seconds = x.Key.seconds,
data_one = x.Where(z=>z.type == 1).Select(g=>g.some_data).FirstOrDefault(),
data_two = x.Where(z=>z.type == 2).Select(g=>g.some_data).FirstOrDefault()
});
Obviously, you have to prefix your table names with datacontext or Objectcontext depending upon the underlying provider.
What you want to do is similar to pivoting, see Is it possible to Pivot data using LINQ?. The difference here is that you don't really need to aggregate (like a standard pivot), so you'll need to use Max or some similar method that can simulate selecting a single varchar field.