Spoon lookup obtaining two rows instead of duplicate columns - etl

I have this two tables
If I use the lookup step using the EMPLOYEEID and YEAR I get this columns:
But I need to get this instead:

I have prepare a solution Here.
Using this you will get your result.
Please let me know if its ok with you.

A lookup is designed to retrieve fields from a matching record in another table and append them to the "master" record.
If you want to combine records from 2 tables into a single dataset then you need to UNION them together.
Update following comment
Something like this pseudo-code ought to work:
SELECT T1.*
FROM TABLE1 T1
UNION
SELECT T2.*
FROM TABLE1 T2
INNER JOIN TABLE1 T1A ON T2.KEY1 = T1A.KEY1 AND T2.KEY2 = T1A.KEY2 AND ...

Related

Consecutive JOIN and aliases: order of execution

I am trying to use FULLTEXT search as a preliminary filter before fetching data from another table. Consecutive JOINs follow to further refine the query and to mix-and-match rows (in reality there are up to 6 JOINs of the main table).
The first "filter" returns the IDs of the rows that are useful, so after joining I have a subset to continue with. My issue is performance, however, and my lack of understanding of how the SQL query is executed in SQLite.
SELECT *
FROM mytbl AS t1
JOIN
(SELECT someid
FROM myftstbl
WHERE
myftstbl MATCH 'MATCHME') AS prior
ON
t1.someid = prior.someid
AND t1.othercol = 'somevalue'
JOIN mytbl AS t2
ON
t2.someid = prior.someid
/* Or is this faster? t2.someid = t1.someid */
My thought process for the query above is that first, we retrieve the matched IDs from the myftstbl table and use those to JOIN on the main table t1 to get a sub-selection. Then we again JOIN a duplicate of the main table as t2. The part that I am unsure of is which approach would be faster: using the IDs from the matches, or from t2?
In other words: when I refer to t1.someid inside the second JOIN, does that contain only the someids after the first JOIN (so only those at the intersection of prior and those for which t1.othercol = 'somevalue) OR does it contain all the original someids of the whole original table?
You can assume that all columns are indexed. In fact, when I use one or the other approach, I find with EXPLAIN QUERY PLAN that different indices are being used for each query. So there must be a difference between the two.
The query should be simplified to
SELECT *
FROM mytbl AS t1
JOIN myftstbl USING (someid) -- or ON t1.someid = myftstbl.someid
JOIN mytbl AS t2 USING (someid) -- or ON t1.someid = t2.someid
WHERE myftstbl.{???} MATCH 'MATCHME' -- replace {???} with correct column name
AND t1.othercol = 'somevalue'
PS. The query logic is not clear for me, so it is saved as-is.

Oracle select rows from a query which are not exist in another query

Let me explain the question.
I have two tables, which have 3 columns with same data tpyes. The 3 columns create a key/ID if you like, but the name of the columns are different in the tables.
Now I am creating queries with these 3 columns for both tables. I've managed to independently get these results
For example:
SELECT ID, FirstColumn, sum(SecondColumn)
FROM (SELECT ABC||DEF||GHI AS ID, FirstTable.*
FROM FirstTable
WHERE ThirdColumn = *1st condition*)
GROUP BY ID, FirstColumn
;
SELECT ID, SomeColumn, sum(AnotherColumn)
FROM (SELECT JKM||OPQ||RST AS ID, SecondTable.*
FROM SecondTable
WHERE AlsoSomeColumn = *2nd condition*)
GROUP BY ID, SomeColumn
;
So I make a very similar queries for two different tables. I know the results have a certain number of same rows with the ID attribute, the one I've just created in the queries. I need to check which rows in the result are not in the other query's result and vice versa.
Do I have to make temporary tables or views from the queries? Maybe join the two tables in a specific way and only run one query on them?
As a beginner I don't have any experience how to use results as an input for the next query. I'm interested what is the cleanest, most elegant way to do this.
No, you most probably don't need any "temporary" tables. WITH factoring clause would help.
Here's an example:
with
first_query as
(select id, first_column, ...
from (select ABC||DEF||GHI as id, ...)
),
second_query as
(select id, some_column, ...
from (select JKM||OPQ||RST as id, ...)
)
select id from first_query
minus
select id from second_query;
For another result you'd just switch the tables, e.g.
with ... <the same as above>
select id from second_query
minus
select id from first_query

How to effeciently select data from two tables?

I have two tables: A, B.
A has prisoner_id and prisoner_name columns.
B has all other info about prisoners included prisoner_name column.
First I select all of the data that I need from B:
WITH prisoner_datas AS
(SELECT prisoner_name, ... FROM B WHERE ...)
Then I want to know all of the id of my prisoner_datas. To do this I need to combine information by prisoner_name column, because it's common for both tables
I did the following
SELECT A.prisoner_id, prisoner_datas.prisoner_name, prisoner_datas. ...,
FROM A, prisoner_datas
WHERE A.prisoner_name = prisoner_datas.prisoner_name
But it works very slow. How can I improve performance?
Add an index on the prisoner_name join column in the B table. Then the following join should have some performance improvement:
SELECT
A.prisoner_id,
B.prisoner_name,
B.prisoner_datas.id -- and other columns if needed
FROM A
INNER JOIN B
ON A.prisoner_name = B.prisoner_name
Note here that I used an explicit join syntax here. It isn't required, and the query plan might not change, but it makes the query easier to read. I don't think the CTE will change much, but the lack of an index on the join column should be important here.

Insert Statement Returns ORA-01427 Error While Trying To Insert From Multiple Tables

I have this table F_Flight which I am trying to insert into from 3 different tables. The first, fourth and fifth columns are from the same, and the second and third columns from different tables. When I execute the code, I get a "single-row subquery returns more than one row" error.
insert when 1 = 1 then into F_Flight (planeid, groupid, dateid, flightduration, kmsflown) values
(planeid, (select b.groupid from BridgeTable b where exists (select p.p1id from pilotkeylookup p where b.pilotid = p.p1id)),
(select dd.id from D_Date dd where exists (select p.launchtime from PilotKeyLookup p where dd."Date" = p.launchtime)),
flightduration, kmsflown) select * from PilotKeyLookup p;
Your subqueries get multiple rows back, which is what the error message says. There is no correlation between the various bits of data and subqueries you're trying to insert into a single row.
This can be done as a much simpler insert...select with joins, something like:
insert into f_flight (planeid, groupid, dateid, flightduration, kmsflown)
select pkl.planeid, bt.groupid, dd.id, pkl.flightduration, pkl.kmsflown
from pilotkeylookup pkl
join bridgetable bt on bt.pilotid = pkl.p1id
join d_date dd on dd."Date" = pkl.launchtime;
This joins the main PilotKeyLookup table to the other two on the keys you used in your subqueries.
Storing an ID value instead of an actual date is unusual, and if launchtime has a time component - which seems likely from the name - and your d_date entries are just dates (i.e. all with time at midnight) then you won't find matches; you might need to do:
join d_date dd on dd."Date" = trunc(pkl.launchtime);
It also seems like this could be a view, as you're storing duplicate data - everything in f_flight could, obviously, be found from the other tables.

HIve join without common filed

I have the following tables:
Table1:
user_name Url
Rahul www.cric.info.com
ranbir www.rogby.com
sahil www.google.com
banit www.yahoo.com
Table2:
Keyword category
cric sports
footbal sports
google search
I want to search Table1 by matching the keyword in Table2. I can perform the same using case statement and the query works but it is not the right approach because each time I have to add the case statement when I will add new search keyword.
select user_name from table1
case when url like '%cric%' then sports
else 'undefined'
end as category
from table1;
Thanks find the soluntions for this approach. FIrst we need to do the Join and after that we need to filter the record.
select user_name,url,Keyword,catagory from(select table1.user_name,table1.url ,table2.keyword,table2.catagory from table1 left outer join table2)a where a.url like (concat('%',a.phrase,'%')
Not sure about more current versions, but I've run into a similar problem... the primary issue is that Hive only supports equi-join statements... when you apply logic to either side of the join, it has difficulty translating into a Map Reduce function.
The alternative method, if you have a reliably structured field, is that you can create a matching key from the larger field. For example, if you know that you're looking for your keyword to exist in the second position of a dot-delimited URI, you could do something like:
select
Uri
, split(Uri, "\\.")[1] as matchKey
from
Table1
join Table2 on Table2.keyword = Table1.matchKey
;

Resources