Nested select in hiveQL

Nested select in hiveQL - hadoop

In one of my use case, i have two tables namely flow and conf. The flow table contains list of all flight data. It has columns creationdate,datafilename,aircraftid. The conf table contains configuration information. It has columns configdate, aircraftid, configurationame. There are multiple versions of configurations created for one aircraft type. So, when we process a datafilename, we need to identify the aircraftid from the flow table, and pick up the configuration from conf table that was created just before the datafilename was created. So, i tried this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x left join
(
select c.config_date, c.aircraft_id, c.configurationfrom t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.configuration;
This picks up all the configurations created for the aircraft which is obvious as there is no condition to check conf.config_date < flow.f_file_creation_date. I tried to include this condition like this,
FROM (
SELECT
F_FILE_CREATION_DATE,
F_FILE_ARCHIVED_RELATIVE_PATH,
F_FILE_ARCHIVED_NAME,
K_AIRCRAFT
from T_FLOW f )x join
(
select c.config_date, c.aircraft_id, c.FILEFILTER from t_conf c
) y on y.aircraft_id = x.K_AIRCRAFT where y.config_date < x.f_file_creation_date
select
x.F_FILE_CREATION_DATE,
x.F_FILE_ARCHIVED_RELATIVE_PATH,
x.F_FILE_ARCHIVED_NAME,
x.K_AIRCRAFT,
y.config_date,
y.aircraft_id,
y.filefilter;
This time failed with the error
required (...)+ loop did not match anything at input 'where' in statement
Can someone give me a hint or two where i am going wrong and on how to fix this?

select f.f_file_creation_date
,f.f_file_archived_relative_path
,f.f_file_archived_name
,f.k_aircraft
,c.config_date
,c.aircraft_id
,c.filefilter
from t_flow as f
join (select config_date
,aircraft_id
,filefilter
,lead (config_date,1,date '3000-01-01') over
(
partition by aircraft_id
order by config_date
) as next_config_date
from t_conf
) c
on c.aircraft_id =
f.k_aircraft
where f.f_file_creation_date >= c.config_date
and f.f_file_creation_date < c.next_config_date
Please read carefully
Posting a question
When you post a data related question -
Supply a data sample: source data + required results.
It is going to be more clear than any explanation you give.
It will also supply a common background for further discussions and a way for you and others to verify the correctness of the given solutions.
Supply the size properties (records/volume) of the tables.
It is important for performance considerations ans might impact the given solution.
SQL
Hive currently does not support any JOIN condition type other than equijoin (e.g. t1.X = t2.X and t1.Y = t2.Y). This is why you get an error.
If you are doing an inner join (and not outer join) then you can move the non-equijoin conditions to the WHERE clause.
Stick to ISO SQL standard. There is a conventional order for SQL clauses: SELECT-FROM-WHERE...
You gain nothing from esoteric syntax except for esoteric error messages.
There is no reason what so ever to use sub-queries in order to narrow the columns list.
Just to make it perfectly clear - There isn't any performance gain doing that. More than that, if it would have work as you assume (and it does not) the performance would have been worse, not better.

I can't reproduce your error. I guess your query is valid.
What version do you use for Hive ? I tested this query with hive 2.1.1.
DROP TABLE IF EXISTS t_flow;
CREATE TABLE IF NOT EXISTS t_flow (
f_file_creation_date DATE
, f_file_archived_relative_path STRING
, f_file_archived_name STRING
, k_aircraft STRING
);
-- Conf table contains configuration information.
-- It has columns configdate, aircraftid, configurationame
DROP TABLE IF EXISTS t_conf;
CREATE TABLE IF NOT EXISTS t_conf (
config_date DATE
, aircraft_id STRING
, filefilter STRING
);
SELECT
x.f_file_creation_date,
x.f_file_archived_relative_path,
x.f_file_archived_name,
x.k_aircraft,
y.config_date,
y.aircraft_id,
y.filefilter
FROM
(SELECT
f_file_creation_date,
f_file_archived_relative_path,
f_file_archived_name,
k_aircraft
FROM t_flow f) x
JOIN
(SELECT
c.config_date,
c.aircraft_id,
c.filefilter
FROM t_conf c) y on y.aircraft_id = x.k_aircraft where y.config_date < x.f_file_creation_date;

Related

How to retrieve data from 3 tables using sub query oracle SQL

I want to retrieve users name and there responsibility_key where there end_date is null and i want to convert it to (sysdate+1) using nvl but i am only able to retrieve the responsibility_key not the name please help.

The error in the image says "column ambiguously defined". Take a close look. Your last END_DATE could refer to either the u alias or the table from the subquery. Change it to match the rest of your subquery (FIND_USER_GROUPS_DIRECT.END_DATE)
EDIT
Your query is
select u.USER_NAME, d.responsibility_key from FND_USER u,FND_RESPONSIBILITY_VL d
where responsibility_id in(
select responsibility_id from
FND_USER_RESP_GROUPS_DIRECT WHERE END_USER_RESP_GROUPS_DIRECT.END_DATE=nvl(END_DATE,sysdate+1)) and
u.END_DATE=nvl(END_DATE,SYSDATE + 1)
;
The query isn't formatted, which makes it hard to read.
Not all columns are qualified with table name (or aliases), as mentioned in the comments.
The query currently uses an implicit join.
The query is impossible to understand without seeing the table definitions (desc [table_name]).
For points 1 and 2, a properly formatted query will look something like
select u.user_name, d.responsibility_key
from
fnd_user u,
fnd_responsibility_vl d
where
d.responsibility_id in (
select urgd.responsibility_id
from
fnd_user_resp_groups_direct urgd
where
urgd.end_date = nvl(u.end_date, sysdate+1)
) and
u.end_date = nvl(urgd.end_date, sysdate + 1)
;
This makes it easier to read and in addition to this, you can see that without table definitions I guessed (see point 4) as to which tables the end_date column belongs in your query. If I had to guess, so does Oracle. That means you have an ambiguity problem. To fix it, take a close look at the end_date column as it appears in your original query and where you do not prefix it with anything, you need to prefix it with the appropriate alias (after you have aliased all your tables).
For point 3, you can write your query more clearly with an explicit join and by using aliases for all columns. As for the explicit join I have no idea what your tables look like but one possibility is something like
select u.user_name, d.responsibility_key
from fnd_user u
join fnd_responsibility_vl d
on u.id = d.user_id
where
d.responsibility_id in (
select responsibility_id
from fnd_user_resp_groups_direct urgd
where
urgd.end_date = nvl(u.end_date, sysdate+1)
) and
u.end_date = nvl(urgd.end_date, sysdate+1)
;
If you follow these points you will get to the root of the error.

Update statement with joins in Oracle

I need to update one column in table A with the result of a multiplication of one field from table A with one field from table B.
It would be pretty simple to do this in T-SQL, but I can't write the correct syntax in Oracle.
What I've tried:
UPDATE TABLE_A
SET TABLE_A.COLUMN_TO_UPDATE =
(select TABLE_A.COLUMN_WITH_SOME_VALUE * TABLE_B.COLUMN_WITH_PERCENTAGE
from TABLE_A
INNER JOIN TABLE_B
ON TABLE_A.PRODUCT_ID = TABLE_B.PRODUCT_ID
AND TABLE_A.SALES_CHANNEL_ID = TABLE_B.SALES_CHANNEL_ID)
WHERE TABLE_A.MONTH_ID IN (201601, 201602, 201603);
But I keep getting errors. Could anybody help me, please?

I generally prefer to use the below format for such cases since this will ensure there's no update performed if there's no data in the table(query extracted temp table) whereas in the above solution provided by Brian Leach will update the new value as null if there's no record present in the 2nd table but exists in the first table.
UPDATE
(
select TABLE_A.COLUMN_TO_UPDATE
, TABLE_A.PRODUCT_ID
, TABLE_A.COLUMN_WITH_SOME_VALUE * TABLE_B.COLUMN_WITH_PERCENTAGE as value
from TABLE_A
INNER JOIN TABLE_B
ON TABLE_A.PRODUCT_ID = TABLE_B.PRODUCT_ID
AND TABLE_A.SALES_CHANNEL_ID = TABLE_B.SALES_CHANNEL_ID
AND TABLE_A.MONTH_ID IN (201601, 201602, 201603)
) DATA
SET DATA.COLUMN_TO_UPDATE = DATA.value;
This solution can cause key preserved value issues which shouldn't be an issue here since i expect a single row in both the tables for one product(ID).
More on Key Preserved table concept in inner join can be found here
https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:548422757486

#Jayesh Mulwani raiesed a valid point, this will set the value to null if there is no matching record. This may or may not be the desired result. If it isn't, and no change is desirect, you can change the select statement to:
coalesce((SELECT table_b.column_with_percentage
FROM table_b
WHERE table_a.product_id = table_b.product_id AND table_a.sales_channel_id = table_b.sales_channel_id),1)
If this is the desired outcome, Jayesh's solution will be more efficient as it will only update matching records.
UPDATE table_a
SET table_a.column_to_update = table_a.column_with_some_value
* (SELECT table_b.column_with_percentage
FROM table_b
WHERE table_a.product_id = table_b.product_id
AND table_a.sales_channel_id = table_b.sales_channel_id)
WHERE table_a.month_id IN (201601, 201602, 201603);

Oracle: Invalid identifier

I am using the following query in oracle. However, it gives an error saying that "c.par" in line 5 is an invalid parameter. No idea why. The columns exist. I checked. I have been struggling with this for a long time. All I want to do is to merge one table into another and update it using oracle. Could someone please help?
MERGE INTO SPRENTHIERARCHIES
USING ( SELECT c.PARENTCATEGORYID AS par,
e.rootcategoryId AS root
FROM SPRENTCATEGORIES c,SPRENTHIERARCHIES e
WHERE e.root (+)= c.par
) SPRENTCATEGORIES
ON (SPRENTHIERARCHIES.rootcategoryId = SPRENTCATEGORIES.parentcategoryId)
WHEN MATCHED THEN
UPDATE SET e.root=c.par

The e and c aliases only exist within the query in the using clause. You're trying to refer to them in the update clause. You're also using a column alias from the using clause against the target table, which doesn't have that column (unless your tables have both rootcategoryId and root, and parentCategoryId and par).
So this:
UPDATE SET e.root=c.par
should be:
UPDATE SET SPRENTHIERARCHIES.rootcategoryId= SPRENTCATEGORIES.par
And in that using clause you're trying to use column aliases as the same level of query, so this:
WHERE e.root (+)= c.par
should be:
WHERE e.rootcategoryId (+)= c.PARENTCATEGORYID
Your on clause is wrong too, as that is not using the column alias:
ON (SPRENTHIERARCHIES.rootcategoryId = SPRENTCATEGORIES.par)
But I'd suggest you replace the old syntax in the using clause with proper join clauses:
MERGE INTO SPRENTHIERARCHIES
USING ( SELECT c.PARENTCATEGORYID AS par,
e.rootcategoryId AS root
FROM SPRENTCATEGORIES c
LEFT JOIN SPRENTHIERARCHIES e
ON e.rootcategoryId = c.PARENTCATEGORYID
) SPRENTCATEGORIES
ON (SPRENTHIERARCHIES.rootcategoryId = SPRENTCATEGORIES.par)
WHEN MATCHED THEN
UPDATE SET SPRENTHIERARCHIES.rootcategoryId= SPRENTCATEGORIES.par
You have a more fundamental problem though, as you're trying to update a joining column; this will get:
ORA-38104: Columns referenced in the ON Clause cannot be updated
As Gordon Linoff suggested you can use an update rather than a merge. Something like:
UPDATE SPRENTHIERARCHIES h
SET h.rootcategoryId = (
SELECT c.PARENTCATEGORYID
FROM SPRENTCATEGORIES c
WHERE c.PARENTCATEGORYID = h.rootCategoryID
)
WHERE EXISTS (
SELECT null
FROM SPRENTCATEGORIES c
WHERE c.PARENTCATEGORYID = h.rootCategoryID
)
The where exists clause is there in case there not be a matching record - which the outer join in your original query implies. But in this form it's even more obvious that you're going to update rootcategoryId to the same value, since you're selecting the parentCategoryID which is equal to it. So the update (or merge) seems to be pointless.

summing values from one table into another table

I'm an SQL newbie using VB6/Access 2000 and am trying to get a query which puts the sum of values from a table into another table.
VB6 does the job, but it's so slow.
I searched and tried in Access many times, just got lost with keywords IN, ON, (INNER) JOIN, each time getting a different error.
The core code should be as follows:
update t1
set t1.value = sum(t2.value)
where
val(t2.code)>89
and
t2.date=t1.date
t1.date is a date, no duplicates
t2.code is a variable string like '0081', '090'
values are single precision
After further searching i found a similar question here ( http://goo.gl/uqlw0U ) and tried that:
UPDATE t1
SET t1.value =
(
SELECT
SUM(t2.value)
FROM spese
WHERE
t1.date=t2.date
AND
val(t2.code)>89
)
but Access just says "updatable query needed" -- what does that mean?

Try this:
UPDATE t1
SET t1.value = SUM(t2.value)
FROM t1, t2
WHERE
val(t2.code)>89
AND
t2.date=t1.date

SQL Statement : Update issue

I have these three tables :
RESEARCHER(Re_Id, Re_Name, Re_Address, Re_Phone, Re_HomePhoneNumber,
Re_OfficeNumber, Re_FirstScore, Re_Second_Score)
PUBLICATION(Pub_ID, Pub_Title, Pub_Type, Pub_Publisher, Pub_Year,Pub_Country, Pub_StartingPage, Pub_Number_of_Page, Score1, Score2)
WRITTEN_BY(Re_Id, Pub_ID)
I want to change the authors of the publication “Introduction to Database System” to “Henry Gordon” and “Sarah Parker”.
The problem is in WRITTEN_BY table,I just store the researcher's ID and publication's ID.
My idea is to change the Re_Id in WRITTEN_BY by those names are "Henry Gorgon" , "Sarah Parker" , which are already existed in RESEARCHER table
UPDATE WRITTEN_BY
SET Re_Id = ....( SELECT Re_Id
FROM RESEACHER
WHERE Re_Name = ‘Henry Gordon’ OR Re_Name = ‘Sarah Paker’ )
WHERE Pub_ID IN ( SELECT Pub_ID
FROM PUBLICATION
WHERE Pub_Name = ‘Introduciton to Database system’ );
I have problem in the SET part,so how to write the SQL statement for that requirement?
Here is the sqlfiddle link for my schema : http://sqlfiddle.com/#!9/b9118/1

I'd use something like below query:
DELETE FROM WRITTEN_BY WHERE Pub_ID IN (
SELECT Pub_ID FROM PUBLICATION
WHERE Pub_Title = 'Introduciton to Database system' )
INSERT INTO WRITTEN_BY
SELECT Re_Id,Pub_Id
FROM RESEARCHER CROSS JOIN PUBLICATION
WHERE Re_Name = 'Henry Gordon' OR Re_Name = 'Sarah Paker'
AND Pub_Title like 'Introduciton to Database system'
SELECT * FROM WRITTEN_BY
The idea is to first drop the existing mapping- you should not update it- and the insert a new one.
The reason for delete/insert approach vs update in case of mapping table is justified in favor of delete/insert as most mapping tables contain many-many mapping and usually one-to-many mappings.
Initially we may have a book mapped to say n number of authors where n <>1 then we either add extra rows, or are left with extraneous rows.
See sample fiddle: http://sqlfiddle.com/#!6/a0e72/13
The real deal however is CROSS JOIN. This does not take any ON like other JOINs and is used to produce cartesian product type map.
We are restricting it to get only limited number of rows as per our need by adding suitable WHERE clauses

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio