I have 3 tables trader, city_state, city_present.
I have 4million rows in trader table and my query is taking atleast 20sec. Few records in city_present and cities table.
Below is my query.
select t.trader_id, t.name, t.city, t.state from
(
SELECT distinct c.city, c.state
FROM city_present p,city_state c
WHERE p.name = 'TEST_TEST'
AND c.city = p.city
AND c.state = p.state
)
cs, trader t
where
AND t.city = cs.city
AND t.state = cs.state
AND t.name = 'john test'
AND t.is_valid= 1
I have index on customer (city, state, name, valid_customer)
Subquery is taking less than a second .. it is outer query that is taking around 20sec.
Can someone please help me how to reduce the query time.
I assume you have an index on trader.name, possibly including trader.is_valid as well?
And is the distinct really required? Does that really need to be an in-line view, or could it be a regular join?
There are some things you can try without adding anything to your schema: in your subquery you never select anything from city_present, so you can turn it into IN / EXISTS
select t.trader_id, t.name, t.city, t.state from
(
SELECT c.city, c.state
FROM city_state c
WHERE EXISTS (
select null
from city_present p
where
p.name = 'TEST_TEST'
AND c.city = p.city
AND c.state = p.state)
)
cs, trader t
where
AND t.city = cs.city
AND t.state = cs.state
AND t.name = 'john test'
AND t.is_valid= 1
Then, the same thing applies to cs. So you could rewrite to:
select t.trader_id, t.name, t.city, t.state from
trader t
where
exists (
SELECT null
FROM city_state c
WHERE EXISTS (
select null
from city_present p
where
p.name = 'TEST_TEST'
AND c.city = p.city
AND c.state = p.state)
AND t.city = c.city
AND t.state = c.state
)
AND t.name = 'john test'
AND t.is_valid= 1
You can also try flattening the subqueries:
select t.trader_id, t.name, t.city, t.state from
trader t
where
exists (
SELECT null
FROM city_present p,city_state c
WHERE p.name = 'TEST_TEST'
AND c.city = p.city
AND c.state = p.state
AND t.city = c.city
AND t.state = c.state
)
AND t.name = 'john test'
AND t.is_valid= 1
From here, you should investigate about indexing:
trader.name and / or trader.id
(city_state.city, city_state.state) and (city_present.city, city_present.state)
Related
How can I convert the below query in Oracle to Hive?
SELECT A.EMP_NO, A.LOGIN_TIMESTAMP FROM TABLE1 A, TABLE2 B
WHERE A.EMP_NO = 1234 AND B.EMP_CURR =
(SELECT MIN(EMP_CURR) FROM TABLE2 WHERE EMP_NO = A.EMP_NO AND
LOGIN_TIMESTAMP = A.LOGIN_TIMESTAMP AND EMP_STATUS_CODE <> 'P')
Use dense_rank() to get rows with minimum EMP_CURR:
SELECT A.EMP_NO, A.LOGIN_TIMESTAMP
FROM TABLE1 A
INNER JOIN (select B.*,
dense_rank() over(partition by B.EMP_NO, B.LOGIN_TIMESTAMP order by B.EMP_CURR) rn
from TABLE2 B where EMP_STATUS_CODE <> 'P'
) B
on B.EMP_NO = A.EMP_NO and B.LOGIN_TIMESTAMP = A.LOGIN_TIMESTAMP and B.rn=1
where B.rn=1 and A.EMP_NO = 1234;
This is my query:
SELECT count(oi.id) imgCnt, o.*,
IF(pricet=2,c.currency_value*o.attributes_36*o.price,
c.currency_value*o.price) AS pprice, od.title, oi.image,
MIN(oi.id), ( c.currency_value * o.price ) AS fprice,
ag.agent_name, DATE_FORMAT( o.date_added, '%d-%m-%Y') as dadded
FROM i_offers_12 o
LEFT JOIN i_agents ag ON o.agents_id = ag.id
LEFT JOIN i_currencies c ON o.currencies_id = c.id
LEFT JOIN i_offers_details od ON ( o.id = od.offers_id AND od.languages_id = 1 )
LEFT JOIN i_offers_images oi ON ( oi.offers_id = o.id AND oi.o_id = '12' )
WHERE ( o.offer_status='active' OR o.offer_status='sold')
AND actions_id = '1'
AND c.id = o.currencies_id
AND o.counties_id = '2'
AND o.cities_id = '3'
GROUP BY o.id
ORDER BY dadded
DESC
I want to sort after dadded(which is of type date) and offer_status(which is of type enum).
I want to display first, all of the elements which have offer_status = 'active' and sort by dadded and after that all of the elements which have offer_status = 'sold' and sort also by dadded. How can I do that? thx
The fields you want to sort on MUST be part of the select statement:
SELECT count(oi.id) imgCnt, o.*,
IF(pricet=2,c.currency_value*o.attributes_36*o.price,
c.currency_value*o.price) AS pprice, od.title, oi.image,
MIN(oi.id), ( c.currency_value * o.price ) AS fprice,
ag.agent_name, DATE_FORMAT( o.date_added, '%d-%m-%Y') as dadded ,
offer_status
FROM i_offers_12 o
LEFT JOIN i_agents ag ON o.agents_id = ag.id
LEFT JOIN i_currencies c ON o.currencies_id = c.id
LEFT JOIN i_offers_details od ON ( o.id = od.offers_id AND od.languages_id = 1 )
LEFT JOIN i_offers_images oi ON ( oi.offers_id = o.id AND oi.o_id = '12' )
WHERE ( o.offer_status='active' OR o.offer_status='sold')
AND actions_id = '1'
AND c.id = o.currencies_id
AND o.counties_id = '2'
AND o.cities_id = '3'
AND o.offer_status='active'
GROUP BY o.id
ORDER BY offer_status, dadded
DESC
Note that you normally should group by ALL non summarized fields (o.*,
IF(pricet=2,c.currency_value*o.attributes_36*o.price,
c.currency_value*o.price) AS pprice, od.title, oi.image, ( c.currency_value * o.price ) AS fprice,
ag.agent_name, DATE_FORMAT( o.date_added, '%d-%m-%Y') as dadded ,
offer_status)
MySQL is not enforcing it, but other DB like Oracle does.
You can use a comma, like
ORDER BY supplier_city DESC, supplier_state ASC;
I believe you can write the SQL with
ORDER BY offer_status, dadded
Please see my code below as it is running too slowly with the CROSS APPLY.
How can I remove the CROSS APPLY and add something else that will run faster?
Please note I am using SQL Server 2008 R2.
;WITH MyCTE AS
(
SELECT
R.NetWinCURRENCYValue AS NetWin
,dD.[Date] AS TheDay
FROM
dimPlayer AS P
JOIN
dbo.factRevenue AS R ON P.playerKey = R.playerKey
JOIN
dbo.vw_Date AS dD ON Dd.dateKey = R.dateKey
WHERE
P.CustomerID = 12345)
SELECT
A.TheDay AS [Date]
,ISNULL(A.NetWin, 0) AS NetWin
,rt.runningTotal AS CumulativeNetWin
FROM MyCTE AS A
CROSS APPLY (SELECT SUM(NetWin) AS runningTotal
FROM MyCTE WHERE TheDay <= A.TheDay) AS rt
ORDER BY A.TheDay
CREATE TABLE #temp (NetWin money, TheDay datetime)
insert into #temp
SELECT
R.NetWinCURRENCYValue AS NetWin
,dD.[Date] AS TheDay
FROM
dimPlayer AS P
JOIN
dbo.factRevenue AS R ON P.playerKey = R.playerKey
JOIN
dbo.vw_Date AS dD ON Dd.dateKey = R.dateKey
WHERE
P.CustomerID = 12345;
SELECT
A.TheDay AS [Date]
,ISNULL(A.NetWin, 0) AS NetWin
,SUM(B.NetWin) AS CumulativeNetWin
FROM #temp AS A
JOIN #temp AS B
ON A.TheDay >= B.TheDay
GROUP BY A.TheDay, ISNULL(A.NetWin, 0);
Here https://stackoverflow.com/a/13744550/613130 it's suggested to use recursive CTE.
;WITH MyCTE AS
(
SELECT
R.NetWinCURRENCYValue AS NetWin
,dD.[Date] AS TheDay
,ROW_NUMBER() OVER (ORDER BY dD.[Date]) AS RN
FROM dimPlayer AS P
JOIN dbo.factRevenue AS R ON P.playerKey = R.playerKey
JOIN dbo.vw_Date AS dD ON Dd.dateKey = R.dateKey
WHERE P.CustomerID = 12345
)
, MyCTERec AS
(
SELECT C.TheDay AS [Date]
,ISNULL(C.NetWin, 0) AS NetWin
,ISNULL(C.NetWin, 0) AS CumulativeNetWin
,C.RN
FROM MyCTE AS C
WHERE C.RN = 1
UNION ALL
SELECT C.TheDay AS [Date]
,ISNULL(C.NetWin, 0) AS NetWin
,P.CumulativeNetWin + ISNULL(C.NetWin, 0) AS CumulativeNetWin
,C.RN
FROM MyCTERec P
INNER JOIN MyCTE AS C ON C.RN = P.RN + 1
)
SELECT *
FROM MyCTERec
ORDER BY RN
OPTION (MAXRECURSION 0)
Note that this query will work if you have 1 record == 1 day! If you have multiple records in a day, the results will be different from the other query.
As I said here, if you want really fast calculation, put it into temporary table with sequential primary key and then calculate rolling total:
create table #Temp (
ID bigint identity(1, 1) primary key,
[Date] date,
NetWin decimal(29, 10)
)
insert into #Temp ([Date], NetWin)
select
dD.[Date],
sum(R.NetWinCURRENCYValue) as NetWin,
from dbo.dimPlayer as P
inner join dbo.factRevenue as R on P.playerKey = R.playerKey
inner join dbo.vw_Date as dD on Dd.dateKey = R.dateKey
where P.CustomerID = 12345
group by dD.[Date]
order by dD.[Date]
;with cte as (
select T.ID, T.[Date], T.NetWin, T.NetWin as CumulativeNetWin
from #Temp as T
where T.ID = 1
union all
select T.ID, T.[Date], T.NetWin, T.NetWin + C.CumulativeNetWin as CumulativeNetWin
from cte as C
inner join #Temp as T on T.ID = C.ID + 1
)
select C.[Date], C.NetWin, C.CumulativeNetWin
from cte as C
order by C.[Date]
I assume that you could have duplicates dates in the input, but don't want duplicates in the output, so I grouped data before puting it into the table.
I have two queries with a union as follows:
select '00/00/0000' as payment_date , h1.customer_no
from payments h1
where not exists ( select 1 from payments h2 where h2.customer_no = h1.customer_no and h2.ctype = 'CASH' )
and h1.customer_no = 400
group by h1.customer_no
union
select to_char(h1.payment_date, 'MM/DD/YYYY') , h1.customer_no
from payments h1 inner join ( select customer_no, max(payment_date ) as max_date from payments where ctype = 'CASH' group by customer_no ) subQ
on ( h1.customer_no = subQ.customer_no
and h1.payment_date = subQ.max_date )
and h1.customer_no = 400
group by h1.payment_date, h1.customer_no
Now, I want to use this union in another query.
select * from (
select '00/00/0000' as payment_date , h1.customer_no
from payments h1
where not exists ( select 1 from payments h2 where h2.customer_no = h1.customer_no and h2.ctype = 'CASH' )
and h1.customer_no = p.customer_no
group by h1.customer_no
union
select to_char(h1.payment_date, 'MM/DD/YYYY') , h1.customer_no
from payments h1 inner join ( select customer_no, max(payment_date ) as max_date from payments where ctype = 'CASH' group by customer_no ) subQ
on ( h1.customer_no = subQ.customer_no
and h1.payment_date = subQ.max_date )
and h1.customer_no = p.customer_no
group by h1.payment_date, h1.customer_no ) sq,
payments p
where p.customer_no = 400
and sq.customer_no = p.customer_no
when I run this, I get ORA-00904: "P"."CUSTOMER_NO": invalid identifier. I need to join h1.customer_no to outer queries customer_no.
I have seen some queries with rank but I couldn't quite figure it out. How do I join the inner query with the outer query?
thanks in advance.
Does your payments table have a customer_no column? That seems to be what your error is indicating.
As for how to join the subqueries, you might want to look into factored subqueries. You can do:
WITH z AS (
SELECT ...
UNION
SELECT ...
), y AS (
SELECT ...
)
SELECT ...
FROM y
JOIN z ON y.x = z.x
JOIN some_other_table t ON z.a = t.a
I need to pull a value from the row that has the maximum date, while also summing all the values of a different column.
What I mean is something like this:
select
a.account_number,
a.client,
a.referral_date,
sum(b.amount),
max(b.date),
case when b.date = max(b.date) then b.due end as due
from a join b on a.account_number = b.account_number
group by a.account_number, a.client, a.referral_date, sum(b.amount), max(b.date), case when b.date = max(b.date) then b.due end
I'm sorry if this doesn't make sense, but I'm trying to sum ALL of "amount" while only getting "due" from the row with the maximum "date".
So if I join them so it only pulls max(date) I won't be able to sum ALL of the amounts.
I've been searching forever for this, but frankly I don't even know what to type into a search engine for this question. Thank you in advance for your help! Let me know how I can further clarify!
Steven
Wouldn't this work:
select a.account_number
, a.client,
, a.referral_date
, sum(b.amount)
, case when b.date = max(b.date) then b.due end as due
from a join b
on a.account_number = b.account_number
group by a.account_number, a.client, a.referral_date
... and if you're using PL\SQL then untested but:
select account_number
, a.client
, a.referral_date
, sum(b.amount)
, max_date
from ( select a.account_number
, a.client,
, a.referral_date
, b.amount
, max(b.date) over ( partition by a.account_number, a.client, a.referral_date ) as max_date
from a join b
on a.account_number = b.account_number )
group by a.account_number, a.client, a.referral_date, max_date
Not sure I precisely understand your goal, but how about this:
SELECT account_number, client, referral_date, amount, due
FROM (SELECT a.account_number,a.client,a.referral_date, b.due, b.date TheDate
, SUM(b.amount) OVER (PARTITION BY b.account_number) amount
, MAX(b.date) OVER (PARTITION BY b.account_number) max_dt
FROM a JOIN b ON a.account_number = b.account_number)
WHERE TheDate = max_dt;