I have these tables:
Products, Articles, Product_Articles
Let's say the product_ids are p1, p2 and the article_ids are a1, a2, a3.
product_articles is:
(p1,a1)
(p1,a2)
(p2,a1)
(p2,a1)
(p2,a2)
(p2,a3)
How do I query for the product_id that has exactly a1 and a2, nothing less and nothing more?
UPDATED: Try
SELECT p.*
FROM products p JOIN
(
SELECT product_id
FROM product_articles
GROUP BY product_id
HAVING COUNT(*) = SUM(CASE WHEN article_id IN (1, 2) THEN 1 ELSE 0 END)
AND SUM(CASE WHEN article_id IN (1, 2) THEN 1 ELSE 0 END) = 2
) q ON p.product_id = q.product_id
or
SELECT p.*
FROM products p JOIN
(
SELECT product_id, COUNT(*) a_count
FROM product_articles
WHERE article_id IN (1, 2)
GROUP BY product_id
HAVING COUNT(*) = 2
) a ON p.product_id = a.product_id JOIN
(
SELECT product_id, COUNT(*) total_count
FROM product_articles
GROUP BY product_id
) b ON p.product_id = b.product_id
WHERE a.a_count = b.total_count
Here is a SQLFiddle demo for both queries.
This is an example of a "set-within-sets" subquery. I advocate using aggregation with a having clause for the logic, because this is the most general way to express the relationships.
The idea is that you can count the appearances of the articles within a product (in this case) in a way similar to using a where clause. The code is a bit more complex, but it offers flexibility. In your case, this would be:
select pa.product_id
from product_articles pa
group by pa.product_id
having sum(case when pa.article_id = 'a1' then 1 else 0 end) > 0 and
sum(case when pa.article_id = 'a2' then 1 else 0 end) > 0 and
sum(case when pa.article_id not in ('a1', 'a2') then 1 else 0 end) = 0;
The first two clauses count the appearance of the two articles, making sure that there is at least one occurrence of each. The last counts the number of rows without those two articles, making sure there are none.
You can see how this easily generalizes to more articles. Or to queries where you have "a1" and "a2" but not "a3". Or where you have three of four specific articles, and so on.
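As a quick check of the HAVING-based query above, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module purely as a convenient test harness (an assumption; the original database is not named), and it stores the ids as the strings 'p1', 'a1', and so on.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_articles (product_id TEXT, article_id TEXT)")
conn.executemany(
    "INSERT INTO product_articles VALUES (?, ?)",
    [("p1", "a1"), ("p1", "a2"),
     ("p2", "a1"), ("p2", "a1"), ("p2", "a2"), ("p2", "a3")],
)

# At least one a1, at least one a2, and no row with any other article.
query = """
SELECT pa.product_id
FROM product_articles pa
GROUP BY pa.product_id
HAVING SUM(CASE WHEN pa.article_id = 'a1' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN pa.article_id = 'a2' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN pa.article_id NOT IN ('a1', 'a2') THEN 1 ELSE 0 END) = 0
"""
exact_match = [row[0] for row in conn.execute(query)]
print(exact_match)  # ['p1']
```

Only p1 comes back: p2 also has a1 and a2, but its a3 row makes the third condition fail.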
I believe this can be done entirely using relational joins, as follows:
SELECT DISTINCT pa1.PRODUCT_ID
FROM PRODUCT_ARTICLES pa1
INNER JOIN PRODUCT_ARTICLES pa2
ON (pa2.PRODUCT_ID = pa1.PRODUCT_ID)
LEFT OUTER JOIN (SELECT *
FROM PRODUCT_ARTICLES
WHERE ARTICLE_ID NOT IN (1, 2)) pa3
ON (pa3.PRODUCT_ID = pa1.PRODUCT_ID)
WHERE pa1.ARTICLE_ID = 1 AND
pa2.ARTICLE_ID = 2 AND
pa3.PRODUCT_ID IS NULL
SQLFiddle here.
The inner join looks for products associated with the articles we care about (articles 1 and 2; this produces products 1 and 2). The left outer join looks for products associated with articles we don't care about (any article except 1 and 2), and the WHERE clause then only accepts products which don't have any unwanted articles (i.e. pa3.PRODUCT_ID IS NULL, indicating that no row from pa3 was joined in).
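The join-only version can be tried the same way. The sketch below assumes integer ids (as in the query above) and adds a hypothetical product 3 that has only article 1, to show that the inner join rejects incomplete products as well. Python's sqlite3 is used only as a test harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_articles (product_id INTEGER, article_id INTEGER)")
conn.executemany(
    "INSERT INTO product_articles VALUES (?, ?)",
    # Product 3 is a made-up extra row: it has article 1 but not article 2.
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1)],
)

query = """
SELECT DISTINCT pa1.product_id
FROM product_articles pa1
INNER JOIN product_articles pa2
        ON pa2.product_id = pa1.product_id
LEFT OUTER JOIN (SELECT *
                 FROM product_articles
                 WHERE article_id NOT IN (1, 2)) pa3
        ON pa3.product_id = pa1.product_id
WHERE pa1.article_id = 1
  AND pa2.article_id = 2
  AND pa3.product_id IS NULL
"""
join_match = [row[0] for row in conn.execute(query)]
print(join_match)  # [1]
```

Product 2 is dropped because its article 3 row joins in through pa3, and product 3 is dropped because no pa2 row with article 2 exists for it.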
Scenario:
Join Table ORDER with Table COST
where COST has multiple rows for a single reference from Table ORDER
Desired outcome:
Return single row per Order with its associated costs.
ID  NAME  PRICE  GST
1   Book  100    10
2   CD    50     5
Ex:
Table ORDER
ID  NAME  COST
1   Book  110
2   CD    55
Table COST
ID  ORDER_ID  COST_TYPE  VALUE
1   1         PRICE      100
2   1         GST        10
3   2         PRICE      50
4   2         GST        5
LEFT OUTER JOIN returns multiple rows when the condition below is used:
SELECT * from ORDER
LEFT OUTER JOIN COST
ON ORDER.ID = COST.ORDER_ID
select o.id, o.name, c.price, c.gst
from order o left outer join
( select order_id,
sum(case when cost_type = 'PRICE' then value end) as price,
sum(case when cost_type = 'GST' then value end) as gst
from cost
group by order_id
) c
on o.id = c.order_id
;
So: SELECT ORDER.ID, COST.COST_TYPE, COST.VALUE FROM ORDER LEFT OUTER JOIN COST ON ORDER.ID = COST.ORDER_ID AND COST.COST_TYPE = 'PRICE'
If you don't specify COST_TYPE then it will return multiple rows because ORDER_ID repeats on your COST TABLE.
This is what I found working in my case.
I had to use two LEFT OUTER JOINs with aliases to get it working:
SELECT ORDER.ID, ORDER.NAME, PRICE.VALUE as PRICE, GST.VALUE as GST
from ORDER
LEFT OUTER JOIN COST as PRICE
ON ORDER.ID = PRICE.ORDER_ID
AND PRICE.COST_TYPE = 'PRICE'
LEFT OUTER JOIN COST as GST
ON ORDER.ID = GST.ORDER_ID
AND GST.COST_TYPE = 'GST'
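To compare the approaches in this thread side by side, here is a small sketch using Python's sqlite3 as a harness. The table is named orders rather than ORDER (an assumption made here because ORDER is a reserved word), and the data is the sample from the question. It runs the naive join, the conditional-aggregation subquery, and the two-alias join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, name TEXT, cost INTEGER);
CREATE TABLE cost (id INTEGER, order_id INTEGER, cost_type TEXT, value INTEGER);
INSERT INTO orders VALUES (1, 'Book', 110), (2, 'CD', 55);
INSERT INTO cost VALUES (1, 1, 'PRICE', 100), (2, 1, 'GST', 10),
                        (3, 2, 'PRICE', 50),  (4, 2, 'GST', 5);
""")

# Naive join: one output row per matching cost row.
naive = conn.execute(
    "SELECT * FROM orders o LEFT OUTER JOIN cost c ON o.id = c.order_id"
).fetchall()

# Conditional aggregation: collapse cost to one row per order first.
pivoted = conn.execute("""
SELECT o.id, o.name, c.price, c.gst
FROM orders o
LEFT OUTER JOIN (SELECT order_id,
                        SUM(CASE WHEN cost_type = 'PRICE' THEN value END) AS price,
                        SUM(CASE WHEN cost_type = 'GST' THEN value END) AS gst
                 FROM cost
                 GROUP BY order_id) c
       ON o.id = c.order_id
ORDER BY o.id
""").fetchall()

# Two aliased joins: each alias is restricted to a single cost type.
two_joins = conn.execute("""
SELECT o.id, o.name, price.value, gst.value
FROM orders o
LEFT OUTER JOIN cost AS price
       ON o.id = price.order_id AND price.cost_type = 'PRICE'
LEFT OUTER JOIN cost AS gst
       ON o.id = gst.order_id AND gst.cost_type = 'GST'
ORDER BY o.id
""").fetchall()

print(len(naive))   # 4
print(pivoted)      # [(1, 'Book', 100, 10), (2, 'CD', 50, 5)]
print(two_joins)    # [(1, 'Book', 100, 10), (2, 'CD', 50, 5)]
```

The two-alias version stays at one row per order only as long as each order has at most one row per cost type; the aggregated version sums duplicates instead.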
I have searched, but can find no example that fits what I need. Perhaps I am lost in the many joins of my query...
I am returning data from three Oracle 11g tables - ATE_TESTS, ATE_DATA, and TM_CONDITION_DYNAMIC. The query has other tables to join these. In fact, there are no fewer than 7 joins.
ATE_DATA may have multiple records on the Many side of a join, but I want only the last-written row. ATE_DATA has an incremented Primary Key which I would like to use in the Aggregate Function MAX(DATA_PK) within a subquery. I think it should be in the WHERE clause of the main query, but I do not know how to implement this and there may be a better way.
Perhaps I might be educated?
My query is:
SELECT ate_serial, data_data, dyn_value
FROM ate_tests
LEFT JOIN ate_test_procedure
ON ate_tests.ate_pk = ate_test_procedure.proc_ate_test_fk
LEFT JOIN ate_test_data
ON ate_test_procedure.proc_pk = ate_test_data.data_ate_test_procedure_fk
LEFT JOIN tm_test_procedure
ON ate_test_procedure.proc_test_procedure = tm_test_procedure.proc_pk
LEFT JOIN tm_test_specification
ON ate_test_data.data_specification = tm_test_specification.spec_pk
LEFT JOIN tm_test_condition_dynamic
ON tm_test_specification.spec_condition_set_fk = tm_test_condition_dynamic.dyn_condition_set_fk
LEFT JOIN tm_test_sequences
ON ate_tests.ate_sequence_fk = tm_test_sequences.seq_pk
LEFT JOIN lu_tm_products_model
ON tm_test_sequences.seq_model = lu_tm_products_model.lumod_pk
WHERE upper(spec_name) = 'POWER'
AND lumod_model = 'AMP'
AND dyn_value = '136'
AND ate_yield = 1
AND upper(proc_procedure_name) = 'FINAL TEST'
AND proc_report = 1
AND proc_status = 1
ORDER BY ate_serial, dyn_value
... but I want only the last-written row.
This is a typical top-n query. Let's say table a contains info about vegetables, table b their historical prices in different shops, and table s info about these shops. You are interested only in the last price, which you can find using the function row_number() (or max() ... keep (dense_rank last ...)).
with a as (select 1 vid, 'Tomato' name from dual union all
select 2 vid, 'Potato' name from dual union all
select 3 vid, 'Garlic' name from dual),
b as (select 1 pid, 1 vid, 1 sid, 11.5 price from dual union all
select 2 pid, 3 vid, 1 sid, 31.8 price from dual union all
select 3 pid, 1 vid, 1 sid, 13.2 price from dual union all
select 4 pid, 1 vid, 2 sid, 12.7 price from dual ),
s as (select 1 sid, 'Best Vegetables' name from dual union all
select 2 sid, 'Organic Products' name from dual)
select a.vid, a.name, s.name as shop, p.price as last_price
from a
left join (select vid, sid, price
from (select vid, sid, price,
row_number() over (partition by vid order by pid desc) rn
from b)
where rn = 1) p
on p.vid = a.vid
left join s on s.sid = p.sid
order by a.vid
Output:
VID Name Shop Price
--- -------- ------------------ -------
1 Tomato Organic Products 12.7
2 Potato
3 Garlic Best Vegetables 31.8
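The same row_number() pattern can be reproduced outside Oracle. Below is a sketch in Python's sqlite3 (window functions need SQLite 3.25 or later); the table names vegetables, prices and shops are stand-ins chosen here for the a, b and s CTEs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vegetables (vid INTEGER, name TEXT);
CREATE TABLE prices (pid INTEGER, vid INTEGER, sid INTEGER, price REAL);
CREATE TABLE shops (sid INTEGER, name TEXT);
INSERT INTO vegetables VALUES (1, 'Tomato'), (2, 'Potato'), (3, 'Garlic');
INSERT INTO prices VALUES (1, 1, 1, 11.5), (2, 3, 1, 31.8),
                          (3, 1, 1, 13.2), (4, 1, 2, 12.7);
INSERT INTO shops VALUES (1, 'Best Vegetables'), (2, 'Organic Products');
""")

# Rank price rows per vegetable by descending key; rn = 1 is the last-written row.
query = """
SELECT a.vid, a.name, s.name AS shop, p.price AS last_price
FROM vegetables a
LEFT JOIN (SELECT vid, sid, price
           FROM (SELECT vid, sid, price,
                        ROW_NUMBER() OVER (PARTITION BY vid ORDER BY pid DESC) AS rn
                 FROM prices) ranked
           WHERE rn = 1) p
       ON p.vid = a.vid
LEFT JOIN shops s ON s.sid = p.sid
ORDER BY a.vid
"""
last_prices = conn.execute(query).fetchall()
for row in last_prices:
    print(row)
```

Applied to the original question, the partition key would be the foreign key from ATE_TEST_DATA to its parent and the ordering column would be DATA_PK; that mapping is an assumption based on the description, not something confirmed by the schema.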
I have a question concerning Hive. Let me explain the scenario:
I am using a Hive action on Oozie; I have a query which does
successive LEFT JOINs on different tables;
The total number of rows to be inserted is about 35 million;
At first the job was crashing due to lack of memory, so I set "set hive.auto.convert.join=false". The query then executed correctly, but it took 4 hours to complete;
I tried to reorder the LEFT JOINs, putting the large tables at the end, but got the same result: about 4 hours to execute;
Here is what the query looks like:
INSERT OVERWRITE TABLE final_table
SELECT
T1.Id,
T1.some_field_name,
T1.another_filed_name,
T2.also_another_filed_name,
FROM table1 T1
LEFT JOIN table2 T2 ON ( T2.Id = T1.Id ) -- T2 is the smallest table
LEFT JOIN table3 T3 ON ( T3.Id = T1.Id )
LEFT JOIN table4 T4 ON ( T4.Id = T1.Id ) -- T4 is the biggest table
So, knowing the structure of the query, is there a way to rewrite it so that I can avoid too many JOINs?
Thanks in advance
PS: Even vectorization gave me the same timing
Too long for a comment, will be deleted later.
(1) Your current query won't compile.
(2) You are not selecting anything from T3 and T4, which makes no sense.
(3) Changing the order of the tables is not likely to have any impact with a cost-based optimizer.
(4) Basically I would suggest collecting statistics on the tables, specifically on the id columns, but in your case I have a feeling that id is not unique in more than one table.
Add to your post the result of the following query:
select *
, case when cnt_1 = 0 then 1 else cnt_1 end
* case when cnt_2 = 0 then 1 else cnt_2 end
* case when cnt_3 = 0 then 1 else cnt_3 end
* case when cnt_4 = 0 then 1 else cnt_4 end as product
from (select id
,count(case when tab = 1 then 1 end) as cnt_1
,count(case when tab = 2 then 1 end) as cnt_2
,count(case when tab = 3 then 1 end) as cnt_3
,count(case when tab = 4 then 1 end) as cnt_4
from ( select 1 as tab,id from table1
union all select 2 as tab,id from table2
union all select 3 as tab,id from table3
union all select 4 as tab,id from table4
) t
group by id
having greatest (cnt_1,cnt_2,cnt_3,cnt_4) >= 10
) t
order by product desc
limit 10
;
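The reason the per-table counts matter is that joining on a non-unique id multiplies rows. The sketch below illustrates this with three tiny hypothetical tables in Python's sqlite3 (not Hive, and not the asker's data): the product of the per-table counts for an id predicts how many rows the chain of LEFT JOINs produces for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER);
CREATE TABLE table2 (id INTEGER);
CREATE TABLE table3 (id INTEGER);
INSERT INTO table1 VALUES (1), (2);
INSERT INTO table2 VALUES (1), (1), (1), (2);  -- id 1 appears three times
INSERT INTO table3 VALUES (1), (1);            -- id 1 appears twice, id 2 never
""")

# Predicted fan-out: product of per-table counts, treating 0 as 1
# (a LEFT JOIN keeps the row when there is no match).
predicted = conn.execute("""
SELECT id,
       CASE WHEN cnt_1 = 0 THEN 1 ELSE cnt_1 END
     * CASE WHEN cnt_2 = 0 THEN 1 ELSE cnt_2 END
     * CASE WHEN cnt_3 = 0 THEN 1 ELSE cnt_3 END AS product
FROM (SELECT id,
             COUNT(CASE WHEN tab = 1 THEN 1 END) AS cnt_1,
             COUNT(CASE WHEN tab = 2 THEN 1 END) AS cnt_2,
             COUNT(CASE WHEN tab = 3 THEN 1 END) AS cnt_3
      FROM (SELECT 1 AS tab, id FROM table1
            UNION ALL SELECT 2 AS tab, id FROM table2
            UNION ALL SELECT 3 AS tab, id FROM table3) t
      GROUP BY id) c
ORDER BY id
""").fetchall()

# Actual number of joined rows per id.
actual = conn.execute("""
SELECT t1.id, COUNT(*)
FROM table1 t1
LEFT JOIN table2 t2 ON t2.id = t1.id
LEFT JOIN table3 t3 ON t3.id = t1.id
GROUP BY t1.id
ORDER BY t1.id
""").fetchall()

print(predicted)  # [(1, 6), (2, 1)]
print(actual)     # [(1, 6), (2, 1)]
```

With millions of rows, a handful of ids with large counts in several tables can dominate the output size, which is what the diagnostic query above is meant to expose.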
I have multiple tables, each with FK relationships that connect them to one another. I need to create a pivot table using details out of some of the tables.
Region Table
Region_ID|Region_Description
State Table
State_ID|State_Description|Region_ID_FK
Order Table
Order_ID|Order_Date|State_ID_FK
Category Table
Category_ID|Category|Description|Order_ID_FK
I am joining all the tables using a natural join, based on the FKs.
I need to determine how many orders are in each category for each region.
The resulting table should look like this:
Category  Region1  Region2  Region3  Total
Sporting  1        0        3        4
ETC       0        2        1        3
SELECT c.Category,
COUNT( CASE r.Region_ID WHEN 1 THEN 1 ELSE NULL END ) AS Region1,
COUNT( CASE r.Region_ID WHEN 2 THEN 1 ELSE NULL END ) AS Region2,
COUNT( CASE r.Region_ID WHEN 3 THEN 1 ELSE NULL END ) AS Region3,
COUNT( CASE r.Region_ID WHEN 4 THEN 1 ELSE NULL END ) AS Region4
FROM REGION r
INNER JOIN
STATE s
ON (r.Region_ID = s.Region_ID_FK)
INNER JOIN
ORDER o
ON (s.State_ID = o.State_ID_FK)
INNER JOIN
CATEGORY c
ON (o.Order_ID = c.Order_ID_FK)
GROUP BY c.Category
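A runnable sketch of this COUNT(CASE ...) pivot is shown below, using Python's sqlite3 as a harness. The rows are invented to reproduce the counts in the question's expected result, the table is named orders instead of ORDER (an assumption, since ORDER is a reserved word), and a Total column computed with COUNT(*) is added because the desired output asks for one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (region_id INTEGER, region_description TEXT);
CREATE TABLE state (state_id INTEGER, state_description TEXT, region_id_fk INTEGER);
CREATE TABLE orders (order_id INTEGER, order_date TEXT, state_id_fk INTEGER);
CREATE TABLE category (category_id INTEGER, category TEXT, description TEXT,
                       order_id_fk INTEGER);
INSERT INTO region VALUES (1, 'R1'), (2, 'R2'), (3, 'R3');
INSERT INTO state VALUES (1, 'S1', 1), (2, 'S2', 2), (3, 'S3', 3);
INSERT INTO orders VALUES (1, '2020-01-01', 1), (2, '2020-01-02', 3),
                          (3, '2020-01-03', 3), (4, '2020-01-04', 3),
                          (5, '2020-01-05', 2), (6, '2020-01-06', 2),
                          (7, '2020-01-07', 3);
INSERT INTO category VALUES (1, 'Sporting', '', 1), (2, 'Sporting', '', 2),
                            (3, 'Sporting', '', 3), (4, 'Sporting', '', 4),
                            (5, 'ETC', '', 5), (6, 'ETC', '', 6), (7, 'ETC', '', 7);
""")

# COUNT ignores NULL, so each CASE counts only the rows of one region.
query = """
SELECT c.category,
       COUNT(CASE r.region_id WHEN 1 THEN 1 END) AS region1,
       COUNT(CASE r.region_id WHEN 2 THEN 1 END) AS region2,
       COUNT(CASE r.region_id WHEN 3 THEN 1 END) AS region3,
       COUNT(*) AS total
FROM region r
INNER JOIN state s ON r.region_id = s.region_id_fk
INNER JOIN orders o ON s.state_id = o.state_id_fk
INNER JOIN category c ON o.order_id = c.order_id_fk
GROUP BY c.category
ORDER BY c.category
"""
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

The region columns are hard-coded, so a new region means a new CASE expression; COUNT(*) keeps the total correct regardless.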
I have a query which returns a list of SIMs. Each SIM is linked to a Customer. The SIMs are in the T_SIM table and the Customers are in the T_CUSTOMER table. There can be more than one SIM linked to a single Customer. When returning the SIMs, the query also returns the Customer details.
The T_SIM table has a foreign key to the T_CUSTOMER table.
The issue is:
1. First run the query requesting the top 100 records, ordered by CUSTOMER_CODE ascending.
2. Now run the same query requesting the top 1000 records, ordered by CUSTOMER_CODE ascending.
In the results of point #2, the first 100 of the 1000 records are not the same as the result of point #1. The records are shuffled; the order is not consistent.
To resolve this I used ROWID along with ORDER BY CUSTOMER_CODE.
But that solution was not accepted by the client.
Could you please suggest an alternative way to resolve the issue? The data type of CUSTOMER_CODE is VARCHAR2.
Below is the query:
SELECT TT.SIM_ID,
TT.IMSI,
TT.MSISDN,
TT.SECONDARY_MSISDN,
TT.CUSTOMER_ID,
TT.SIM_STATE,
TCU.CUSTOMER_CODE
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 = 1
AND TT.SIM_ID IN
(SELECT SIM_ID
FROM
(SELECT *
FROM
(SELECT Z.*,
ROWNUM RNUM
FROM
(SELECT TT.SIM_ID
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 =1
ORDER BY TCU.CUSTOMER_CODE ASC
) Z
WHERE ROWNUM <= 1000
)
WHERE RNUM >= 0
)
)
ORDER BY TCU.CUSTOMER_CODE ASC
In both cases the result is ordered by CUSTOMER_CODE, but the SIMs belonging to each customer do not come back in the same order.
The problem is that you first limit the number of rows when selecting from t_sim (so these are selected randomly), and only then order your output.
So what you should do is remove the ROWNUM limit from the inner query and
put it at the very top level, like this:
select * from
( SELECT TT.SIM_ID,
TT.IMSI,
TT.MSISDN,
TT.SECONDARY_MSISDN,
TT.CUSTOMER_ID,
TT.SIM_STATE,
TCU.CUSTOMER_CODE
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 = 1
AND TT.SIM_ID IN
(SELECT SIM_ID
FROM
(SELECT *
FROM
(SELECT Z.*,
ROWNUM RNUM
FROM
(SELECT TT.SIM_ID
FROM T_SIM TT
LEFT OUTER JOIN T_CUSTOMER TCU
ON TT.CUSTOMER_ID = TCU.CUSTOMER_ID
WHERE 1 =1
ORDER BY TCU.CUSTOMER_CODE ASC
) Z
)
WHERE RNUM >= 0
)
)
ORDER BY TCU.CUSTOMER_CODE ASC
) where rownum <= 1000
Because you first want to build the complete ordered result set, and only then display the top 1000 SIM records ordered by customer_code.
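The "order the full result first, then cut it off" idea can be sketched as follows, with Python's sqlite3 as a harness. LIMIT stands in for Oracle's outer ROWNUM filter, the rows are made up, and SIM_ID is added to the ORDER BY as a tie-breaker on the assumption that it is unique, similar in spirit to the ROWID approach mentioned in the question. Without some unique tie-breaker, rows sharing a CUSTOMER_CODE may come back in any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_customer (customer_id INTEGER, customer_code TEXT);
CREATE TABLE t_sim (sim_id INTEGER, customer_id INTEGER);
INSERT INTO t_customer VALUES (1, 'C001'), (2, 'C002');
INSERT INTO t_sim VALUES (20, 2), (10, 1), (22, 2), (11, 1), (21, 2), (12, 1);
""")

def top_n(n):
    """Order the complete joined result, then keep the first n rows."""
    return conn.execute("""
        SELECT tt.sim_id, tcu.customer_code
        FROM t_sim tt
        LEFT OUTER JOIN t_customer tcu
               ON tt.customer_id = tcu.customer_id
        ORDER BY tcu.customer_code ASC, tt.sim_id ASC  -- sim_id breaks ties
        LIMIT ?
    """, (n,)).fetchall()

small_page = top_n(3)
large_page = top_n(6)
print(small_page)      # [(10, 'C001'), (11, 'C001'), (12, 'C001')]
print(large_page[:3])  # same three rows
```

Because the sort key is now unique per row, the smaller page is always a prefix of the larger one.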