Oracle script for getting results and updating a table column at the same time

I would like your help creating a script that returns results and, at the same time, updates a field in my table if necessary.
In my application, I have persons (table PERSON) who create REQUESTS (table REQUEST). A person is active when she has created a request during the last 3 years. I have created a field (ACTIVE - default value: 1) in the table PERSON in order to know if the person is still active.
I created a query that retrieves the number of requests for each person (total requests, active requests, inactive requests):
-- PERSONS List with number of request for each person and RE_ACTIVE field
SELECT p.id,
p.lastname || ' ' || p.firstname personname,
p.company,
p.active,
(SELECT count(*)
FROM request req
WHERE req.personid = p.id) total_request_nb,
(SELECT count(*)
FROM request reqact
WHERE reqact.personid = p.id
AND reqact.requestdate > add_months(trunc(sysdate, 'YYYY'), -36)) nb_active_requests,
(SELECT count(*)
FROM request reqinact
WHERE reqinact.personid = p.id
AND reqinact.requestdate < add_months(trunc(sysdate, 'YYYY'), -36)) nb_inactive_requests,
CASE
WHEN EXISTS (SELECT *
FROM request reqreact
WHERE reqreact.personid = p.id
AND reqreact.requestdate > add_months(trunc(sysdate, 'YYYY'), -36))
THEN 1
ELSE 0
END re_active
FROM person p;
This script works. I would like to update the ACTIVE field based on the previous result (1 if the person is active, 0 otherwise). For instance:
UPDATE PERSON p SET ACTIVE =
CASE WHEN (
(SELECT count(*)
FROM request reqreact
WHERE reqreact.personid = p.id
AND reqreact.requestdate > add_months(trunc(sysdate, 'YYYY'), -36)) > 0
)
THEN 1
ELSE 0
END
I would like to know whether it's possible to do both in the same script, so that I could see how many updates succeeded, failed, ... in one query.
Thank you in advance for your help.

You want a WHERE EXISTS condition with a correlated subquery:
UPDATE PERSON p
SET p.ACTIVE = 1
WHERE EXISTS (
SELECT 1
FROM request reqreact
WHERE reqreact.personid = p.id
AND reqreact.requestdate > add_months(trunc(sysdate, 'YYYY'), -36)
)
If there is no match in the subquery, the outer UPDATE skips that row, so its ACTIVE value is left unchanged.
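A minimal runnable sketch of the WHERE EXISTS update, using Python's built-in sqlite3 module as a stand-in for Oracle. Assumptions: the tables and rows are hypothetical, `date('2024-06-15', 'start of year', '-36 months')` replaces `add_months(trunc(sysdate, 'YYYY'), -36)` with a fixed "today" so the result is reproducible, and ACTIVE defaults to 0 here (not 1 as in the question) so the effect is visible. `cursor.rowcount` reports how many rows the UPDATE touched; in Oracle PL/SQL the corresponding figure is `SQL%ROWCOUNT`.

```python
import sqlite3

# Hypothetical, minimal versions of the PERSON and REQUEST tables.
# ACTIVE defaults to 0 here (the question uses 1) so the update is visible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, active INTEGER DEFAULT 0);
    CREATE TABLE request (personid INTEGER, requestdate TEXT);
    INSERT INTO person (id) VALUES (1), (2), (3);
    INSERT INTO request VALUES
        (1, '2023-05-01'),  -- recent request: person 1 is active
        (1, '2010-01-01'),
        (2, '2015-03-10');  -- only an old request: person 2 is not active
""")

# SQLite stand-in for add_months(trunc(sysdate, 'YYYY'), -36): '2021-01-01'.
cur = conn.execute("""
    UPDATE person
    SET active = 1
    WHERE EXISTS (
        SELECT 1
        FROM request r
        WHERE r.personid = person.id
          AND r.requestdate > date('2024-06-15', 'start of year', '-36 months')
    )
""")
updated = cur.rowcount  # number of rows matched by WHERE, i.e. rows updated
states = conn.execute("SELECT id, active FROM person ORDER BY id").fetchall()
print(updated, states)  # 1 [(1, 1), (2, 0), (3, 0)]
```

With the question's default of 1, this form alone would never reset anyone to 0, which is what the CASE variant addresses.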
If you want to set ACTIVE to 1 or 0 depending on the result of the subquery:
UPDATE PERSON p
SET p.ACTIVE =
CASE
WHEN EXISTS (
SELECT 1
FROM request reqreact
WHERE reqreact.personid = p.id
AND reqreact.requestdate > add_months(trunc(sysdate, 'YYYY'), -36)
)
THEN 1
ELSE 0
END
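A sketch of the CASE WHEN EXISTS form, again with sqlite3 standing in for Oracle (hypothetical data, fixed reference date in place of `sysdate`). ACTIVE defaults to 1 as in the question. Because the statement has no WHERE clause, every person is rewritten, so the reported row count is the number of persons, not the number of flags that actually changed.

```python
import sqlite3

# Hypothetical tables, with ACTIVE defaulting to 1 as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, active INTEGER DEFAULT 1);
    CREATE TABLE request (personid INTEGER, requestdate TEXT);
    INSERT INTO person (id) VALUES (1), (2), (3);
    INSERT INTO request VALUES (1, '2023-05-01'), (2, '2015-03-10');
""")

# Every row gets 1 if a request exists after the cutoff, otherwise 0.
cur = conn.execute("""
    UPDATE person
    SET active = CASE
        WHEN EXISTS (
            SELECT 1
            FROM request r
            WHERE r.personid = person.id
              AND r.requestdate > date('2024-06-15', 'start of year', '-36 months')
        )
        THEN 1
        ELSE 0
    END
""")
states = conn.execute("SELECT id, active FROM person ORDER BY id").fetchall()
print(cur.rowcount, states)  # 3 [(1, 1), (2, 0), (3, 0)]
```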

Related

Adding filters in subquery from CTE quadruples run time

I am working on an existing query for an SSRS report that focuses on aggregated financial aid data split into 10 aggregations. The user wants to be able to select the students included in that aggregated data based on new vs. returning status and 'selected for verification.' For the new/returning status, I added a CTE that returns the earliest admit year for each student. 2 of the 10 data fields are created by a subquery. I have been trying for 3 days to get the subquery to use the CTE fields as a filter, but it won't work: either the filter is ignored or I get a 'not a group by expression' error. If I put the join to the CTE inside the subquery, the query time jumps from 45 seconds to 400 seconds. This shouldn't be that complicated! What am I missing? I have added some of the code; 3 of the chunks work, but paid_something doesn't.
with stuStatus as
(select
person_uid, min(year_admitted) admit_year
from academic_study
where aid_year between :AidYearStartParameter and :AidYearEndParameter
group by person_uid)
--- above code added to get student information not originally in qry
select
finaid_applicant_status.aid_year
, count(1) as fafsa_cnt --works
, sum( --works
case
when (
package_complete_date is not null
and admit.status is not null
)
then 1
else 0
end
) as admit_and_package
, (select count(*) --doesn't work
from (
select distinct award_by_aid_year.person_uid
from
award_by_aid_year
where
award_by_aid_year.aid_year = finaid_applicant_status.aid_year
and award_by_aid_year.total_paid_amount > 0 )dta
where
(
(:StudentStatusParameter = 'N' and stuStatus.admit_year = finaid_applicant_status.aid_year)
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
)
as paid_something
, sum( --works
case
when exists (
select
1
from
award_by_person abp
where
abp.person_uid = fafsa.person_uid
and abp.aid_year = fafsa.aid_year
and abp.award_paid_amount > 0
) and fafsa.requirement is not null
then 1
else 0
end
) as paid_something_fafsa
from
finaid_applicant_status
join finaid_tracking_requirement fafsa
on finaid_applicant_status.person_uid = fafsa.person_uid
and finaid_applicant_status.aid_year = fafsa.aid_year
and fafsa.requirement = 'FAFSA'
left join finaid_tracking_requirement admit
on finaid_applicant_status.person_uid = admit.person_uid
and finaid_applicant_status.aid_year = admit.aid_year
and admit.requirement = 'ADMIT'
and admit.status in ('M', 'P')
left outer join stuStatus
on finaid_applicant_status.person_uid = stuStatus.person_uid
where
finaid_applicant_status.aid_year between :AidYearStartParameter and :AidYearEndParameter
and (
(:VerifiedParameter = '%') OR
(:VerifiedParameter <> '%' AND finaid_applicant_status.verification_required_ind = :VerifiedParameter)
)
and
(
(:StudentStatusParameter = 'N' and (stuStatus.admit_year IS NULL OR stuStatus.admit_year = finaid_applicant_status.aid_year ))
OR
(:StudentStatusParameter = 'R' and stuStatus.admit_year <> finaid_applicant_status.aid_year)
OR :StudentStatusParameter = '%'
)
group by
finaid_applicant_status.aid_year
order by
finaid_applicant_status.aid_year
Not sure if this helps, but you have something like this:
select aid_year, count(1) c1,
(select count(1)
from (select distinct person_uid
from award_by_aid_year a
where a.aid_year = fas.aid_year))
from finaid_applicant_status fas
group by aid_year;
This query throws ORA-00904 (FAS.AID_YEAR: invalid identifier). This is because fas.aid_year is referenced too deeply nested in the subquery.
If you are able to change your subquery from select count(1) from (select distinct sth from ... where year = fas.year) to select count(distinct sth) from ... where year = fas.year, then it has a chance to work.
select aid_year, count(1) c1,
(select count(distinct person_uid)
from award_by_aid_year a
where a.aid_year = fas.aid_year) c2
from finaid_applicant_status fas
group by aid_year
Here is a simplified demo showing the non-working and working queries. Of course your query is much more complicated, but this is something you could check.
Also, maybe you can use dbfiddle or sqlfiddle to set up a test case? Or show us sample (anonymized) data and the required output for it?
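A runnable sketch of the working shape, `count(distinct ...)` with only one level of correlation, using sqlite3 and hypothetical rows. SQLite accepts deeper correlation, so it cannot reproduce ORA-00904; this only illustrates that the flattened scalar subquery returns the intended per-year counts. The `total_paid_amount > 0` filter is taken from the asker's paid_something subquery.

```python
import sqlite3

# Hypothetical rows for the two tables used in the demo queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE finaid_applicant_status (person_uid INTEGER, aid_year INTEGER);
    CREATE TABLE award_by_aid_year (person_uid INTEGER, aid_year INTEGER,
                                    total_paid_amount REAL);
    INSERT INTO finaid_applicant_status VALUES (1, 2020), (2, 2020), (3, 2021);
    INSERT INTO award_by_aid_year VALUES
        (1, 2020, 500), (1, 2020, 250),  -- person 1 was paid twice in 2020
        (2, 2020, 0),                    -- nothing paid
        (3, 2021, 100);
""")

# The correlated scalar subquery refers to fas.aid_year one level up only.
rows = conn.execute("""
    SELECT fas.aid_year,
           COUNT(1) AS c1,
           (SELECT COUNT(DISTINCT a.person_uid)
            FROM award_by_aid_year a
            WHERE a.aid_year = fas.aid_year
              AND a.total_paid_amount > 0) AS c2
    FROM finaid_applicant_status fas
    GROUP BY fas.aid_year
    ORDER BY fas.aid_year
""").fetchall()
print(rows)  # [(2020, 2, 1), (2021, 1, 1)]
```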

Re-writing a join query

I have a question concerning Hive. Let me explain the scenario:
I am using a Hive action in Oozie; I have a query which performs
successive LEFT JOINs on different tables;
The total number of rows to be inserted is about 35 million;
First, the job was crashing due to lack of memory, so I set "set hive.auto.convert.join=false". The query then executed successfully, but it took 4 hours;
I tried reordering the LEFT JOINs to put the large tables at the end, but got the same result: about 4 hours;
Here is what the query looks like:
INSERT OVERWRITE TABLE final_table
SELECT
T1.Id,
T1.some_field_name,
T1.another_filed_name,
T2.also_another_filed_name,
FROM table1 T1
LEFT JOIN table2 T2 ON ( T2.Id = T1.Id ) -- T2 is the smallest table
LEFT JOIN table3 T3 ON ( T3.Id = T1.Id )
LEFT JOIN table4 T4 ON ( T4.Id = T1.Id ) -- T4 is the biggest table
So, knowing the structure of the query, is there a way to rewrite it so that I can avoid too many JOINs?
Thanks in advance
PS: Even vectorization gave me the same timing
Too long for a comment, will be deleted later.
(1) Your current query won't compile.
(2) You are not selecting anything from T3 and T4, which makes no sense.
(3) Changing the order of tables is not likely to have any impact with cost based optimizer.
(4) Basically, I would suggest collecting statistics on the tables, specifically on the id columns, but in your case I have a feeling that id is not unique in more than one table.
Add to your post the result of the following query:
select *
, case when cnt_1 = 0 then 1 else cnt_1 end
* case when cnt_2 = 0 then 1 else cnt_2 end
* case when cnt_3 = 0 then 1 else cnt_3 end
* case when cnt_4 = 0 then 1 else cnt_4 end as product
from (select id
,count(case when tab = 1 then 1 end) as cnt_1
,count(case when tab = 2 then 1 end) as cnt_2
,count(case when tab = 3 then 1 end) as cnt_3
,count(case when tab = 4 then 1 end) as cnt_4
from ( select 1 as tab,id from table1
union all select 2 as tab,id from table2
union all select 3 as tab,id from table3
union all select 4 as tab,id from table4
) t
group by id
having greatest (cnt_1,cnt_2,cnt_3,cnt_4) >= 10
) t
order by product desc
limit 10
;
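A small sketch of what the diagnostic query reveals, run in sqlite3 rather than Hive. Assumptions: the four tables and their rows are hypothetical, Hive's `greatest(...)` is replaced by SQLite's multi-argument scalar `max(...)`, and the threshold is lowered from 10 to 2 to suit the tiny data. For an id present in table1, the product of its per-table counts (a 0 counted as 1, since an unmatched LEFT JOIN still yields one row) is the number of rows the LEFT JOIN chain produces for it, which the second query confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    CREATE TABLE table3 (id INTEGER);
    CREATE TABLE table4 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES (1);
    INSERT INTO table3 VALUES (1), (1), (1);            -- id 1 repeated 3 times
    INSERT INTO table4 VALUES (1), (1), (1), (1), (2);  -- id 1 repeated 4 times
""")

# Per-id counts per table; MAX(cnt, 1) is CASE WHEN cnt = 0 THEN 1 ELSE cnt END.
suspects = conn.execute("""
    SELECT id, cnt_1, cnt_2, cnt_3, cnt_4,
           MAX(cnt_1, 1) * MAX(cnt_2, 1) * MAX(cnt_3, 1) * MAX(cnt_4, 1) AS product
    FROM (SELECT id,
                 COUNT(CASE WHEN tab = 1 THEN 1 END) AS cnt_1,
                 COUNT(CASE WHEN tab = 2 THEN 1 END) AS cnt_2,
                 COUNT(CASE WHEN tab = 3 THEN 1 END) AS cnt_3,
                 COUNT(CASE WHEN tab = 4 THEN 1 END) AS cnt_4
          FROM (SELECT 1 AS tab, id FROM table1
                UNION ALL SELECT 2, id FROM table2
                UNION ALL SELECT 3, id FROM table3
                UNION ALL SELECT 4, id FROM table4) t
          GROUP BY id) s
    WHERE MAX(cnt_1, cnt_2, cnt_3, cnt_4) >= ?
    ORDER BY product DESC
    LIMIT 10
""", (2,)).fetchall()

# Rows actually produced by the LEFT JOIN chain from the question.
join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id = t1.id
    LEFT JOIN table3 t3 ON t3.id = t1.id
    LEFT JOIN table4 t4 ON t4.id = t1.id
""").fetchone()[0]
print(suspects)   # [(1, 1, 1, 3, 4, 12)]
print(join_rows)  # 13 (12 rows for id 1, 1 row for id 2)
```

If the real tables show large products, that row multiplication, rather than the join order, may account for much of the runtime.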

Oracle double select issue

So I have these 2 tables on Oracle:
CLIENT
cl_id cl_name
1 John
2 Maria
PAYMENTS
pa_id pa_date pa_status cl_id
1 2017-01-01 1 1
2 2017-01-01 1 2
3 2017-02-01 1 1
4 2017-02-01 1 2
5 2017-03-01 0 1
6 2017-03-01 1 2
I need a select statement that gives me the client ID, NAME and the STATUS of their last payment. So the end result of my select should be:
cl_id cl_name pa_status
1 John 0
2 Maria 1
This is the CLIENT select that works:
select cl_id, cl_name from CLIENT;
This is the last status of the PAYMENT select that works:
select * from (
select pa_status from PAYMENT ORDER BY PA_DATE DESC)
where rownum = 1;
So now, I need to make them work together. I tried 2 ways that didn't work:
select cl_id, cl_name, (select * from (
select pa_status from PAYMENT ORDER BY PA_DATE DESC)
where rownum = 1 and PAYMENT.cl_id = CLIENT.CL_ID) as last_status from CLIENT;
error: invalid identifier
AND this:
select cl_id, cl_name, (select * from (
select pa_status from PAYMENT ORDER BY PA_DATE DESC)
where rownum = 1 ) as last_status from CLIENT;
which doesn't give me any errors, but shows the same last status for every client (John's, which is the last record):
cl_id cl_name last_status
1 John 0
2 Maria 0
Can anyone give me a hint?
Thanks
You need to use an analytic function.
This kind of function lets you split your data into groups (partitions) and rank the rows within each group as you wish.
In your case:
Select * from (
Select c.cl_id, c.cl_name, p.pa_status, row_number () over (partition by p.cl_id order by p.pa_date desc) as rw
From client c join payments p on p.cl_id = c.cl_id)
Inn where inn.rw = 1;
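A runnable check of the ROW_NUMBER approach on the question's sample data, using sqlite3 as a stand-in for Oracle (window functions need SQLite 3.25 or later, which recent Python builds bundle). Dates are stored as ISO text, an assumption that keeps them sortable.

```python
import sqlite3

# The CLIENT and PAYMENTS sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client   (cl_id INTEGER, cl_name TEXT);
    CREATE TABLE payments (pa_id INTEGER, pa_date TEXT, pa_status INTEGER, cl_id INTEGER);
    INSERT INTO client VALUES (1, 'John'), (2, 'Maria');
    INSERT INTO payments VALUES
        (1, '2017-01-01', 1, 1), (2, '2017-01-01', 1, 2),
        (3, '2017-02-01', 1, 1), (4, '2017-02-01', 1, 2),
        (5, '2017-03-01', 0, 1), (6, '2017-03-01', 1, 2);
""")

# Number each client's payments newest-first and keep only row 1.
rows = conn.execute("""
    SELECT cl_id, cl_name, pa_status
    FROM (
        SELECT c.cl_id, c.cl_name, p.pa_status,
               ROW_NUMBER() OVER (PARTITION BY p.cl_id ORDER BY p.pa_date DESC) AS rw
        FROM client c
        JOIN payments p ON p.cl_id = c.cl_id
    ) inn
    WHERE inn.rw = 1
    ORDER BY cl_id
""").fetchall()
print(rows)  # [(1, 'John', 0), (2, 'Maria', 1)]
```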
First, take the max date for each client ID:
Select cl_id, max(pa_date) as pa_date from PAYMENTS group by cl_id
Now join your client table with the subquery above:
select c.cl_id, c.cl_name,
(select pa_status from PAYMENTS t where t.pa_date=p.pa_date and t.cl_id=p.cl_id)
from CLIENT c join (Select cl_id, max(pa_date) as pa_date from PAYMENTS group by cl_id) p on p.cl_id=c.cl_id
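The max-date join, run against the same sample data in sqlite3 (a stand-in for Oracle; dates stored as ISO text). One caveat worth knowing: if a client had two payments on their latest date, Oracle would raise ORA-01427 (single-row subquery returns more than one row) for the scalar subquery, whereas SQLite silently uses the first row.

```python
import sqlite3

# The CLIENT and PAYMENTS sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client   (cl_id INTEGER, cl_name TEXT);
    CREATE TABLE payments (pa_id INTEGER, pa_date TEXT, pa_status INTEGER, cl_id INTEGER);
    INSERT INTO client VALUES (1, 'John'), (2, 'Maria');
    INSERT INTO payments VALUES
        (1, '2017-01-01', 1, 1), (2, '2017-01-01', 1, 2),
        (3, '2017-02-01', 1, 1), (4, '2017-02-01', 1, 2),
        (5, '2017-03-01', 0, 1), (6, '2017-03-01', 1, 2);
""")

# Latest date per client, then the status of the payment made on that date.
rows = conn.execute("""
    SELECT c.cl_id, c.cl_name,
           (SELECT t.pa_status
            FROM payments t
            WHERE t.pa_date = p.pa_date AND t.cl_id = p.cl_id) AS last_status
    FROM client c
    JOIN (SELECT cl_id, MAX(pa_date) AS pa_date
          FROM payments
          GROUP BY cl_id) p ON p.cl_id = c.cl_id
    ORDER BY c.cl_id
""").fetchall()
print(rows)  # [(1, 'John', 0), (2, 'Maria', 1)]
```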
You can use Oracle's KEEP LAST here:
select cl_id, c.cl_name, last_payment.status
from client c
join
(
select
cl_id,
max(pa_status) keep (dense_rank last order by pa_date) as status
from payments
group by cl_id
) last_payment using (cl_id);
(If you want to include clients without payments, change the join to LEFT OUTER JOIN.)
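SQLite has no KEEP clause, so here is a plain-Python sketch of what `max(pa_status) keep (dense_rank last order by pa_date)` computes per cl_id: keep only the rows that share the latest pa_date, then take the MAX of pa_status over them. The function name is hypothetical; the second call adds an invented payment to show how a tie on the last date is resolved.

```python
# Payments from the question: (pa_id, pa_date, pa_status, cl_id).
PAYMENTS = [
    (1, "2017-01-01", 1, 1), (2, "2017-01-01", 1, 2),
    (3, "2017-02-01", 1, 1), (4, "2017-02-01", 1, 2),
    (5, "2017-03-01", 0, 1), (6, "2017-03-01", 1, 2),
]


def keep_last_max_status(rows):
    """Emulate MAX(pa_status) KEEP (DENSE_RANK LAST ORDER BY pa_date) ... GROUP BY cl_id."""
    best = {}  # cl_id -> (latest pa_date seen, max pa_status on that date)
    for _pa_id, pa_date, pa_status, cl_id in rows:
        current = best.get(cl_id)
        if current is None or pa_date > current[0]:
            best[cl_id] = (pa_date, pa_status)  # a later date starts a new "last" group
        elif pa_date == current[0]:
            best[cl_id] = (pa_date, max(current[1], pa_status))  # tie on the last date
    return {cl_id: status for cl_id, (_date, status) in best.items()}


print(keep_last_max_status(PAYMENTS))  # {1: 0, 2: 1}
# Hypothetical extra payment on John's last date: MAX breaks the tie.
print(keep_last_max_status(PAYMENTS + [(7, "2017-03-01", 1, 1)]))  # {1: 1, 2: 1}
```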
This gets the max date for the client
and then gets the highest payment id with that date.
with max_date as (
select max(pa_date) as max_date, cl_id from payments group by cl_id
)
select c.cl_id, c.cl_name, p.pa_status from client c
join payments p
on c.cl_id = p.cl_id
where p.pa_id = (select max(p2.pa_id) from payments p2
join max_date md
on p2.cl_id = md.cl_id
where p.cl_id = p2.cl_id
and p2.pa_date = md.max_date
)

Selecting rows that have exactly the same data as another table

I have these tables:
Products, Articles, Product_Articles
Let's say the product_ids are p1, p2 and the article_ids are a1, a2, a3.
product_articles is:
(p1,a1)
(p1,a2)
(p2,a1)
(p2,a1)
(p2,a2)
(p2,a3)
How can I query for the product_id that has only a1 and a2, nothing less, nothing more?
UPDATED: Try
SELECT p.*
FROM products p JOIN
(
SELECT product_id
FROM product_articles
GROUP BY product_id
HAVING COUNT(*) = SUM(CASE WHEN article_id IN (1, 2) THEN 1 ELSE 0 END)
AND SUM(CASE WHEN article_id IN (1, 2) THEN 1 ELSE 0 END) = 2
) q ON p.product_id = q.product_id
or
SELECT p.*
FROM products p JOIN
(
SELECT product_id, COUNT(*) a_count
FROM product_articles
WHERE article_id IN (1, 2)
GROUP BY product_id
HAVING COUNT(*) = 2
) a ON p.product_id = a.product_id JOIN
(
SELECT product_id, COUNT(*) total_count
FROM product_articles
GROUP BY product_id
) b ON p.product_id = b.product_id
WHERE a.a_count = b.total_count
Here is a SQLFiddle demo for both queries.
This is an example of a "set-within-sets" subquery. I advocate using aggregation with a having clause for the logic, because this is the most general way to express the relationships.
The idea is that you can count the appearances of the articles within a product (in this case) in a way similar to using a where clause. The code is a bit more complex, but it offers flexibility. In your case, this would be:
select pa.product_id
from product_articles pa
group by pa.product_id
having sum(case when pa.article_id = 'a1' then 1 else 0 end) > 0 and
sum(case when pa.article_id = 'a2' then 1 else 0 end) > 0 and
sum(case when pa.article_id not in ('a1', 'a2') then 1 else 0 end) = 0;
The first two clauses count the appearance of the two articles, making sure that there is at least one occurrence of each. The last counts the number of rows without those two articles, making sure there are none.
You can see how this easily generalizes to more articles. Or to queries where you have "a1" and "a2" but not "a3". Or where you have three of four of specific articles, and so on.
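A runnable comparison in sqlite3 (standing in for the original database) of this HAVING approach and the first answer's row-count test, with text IDs 'a1'/'a2' in place of 1 and 2. Products p3 to p5 are hypothetical additions; p4 and p5 contain duplicate rows, like the repeated (p2, a1) pair in the question. The presence-based HAVING is unaffected by duplicates, while "exactly 2 matching rows" misses p4 and wrongly accepts p5. With a unique (product_id, article_id) key, both would agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_articles (product_id TEXT, article_id TEXT);
    INSERT INTO product_articles VALUES
        ('p1', 'a1'), ('p1', 'a2'),
        ('p2', 'a1'), ('p2', 'a1'), ('p2', 'a2'), ('p2', 'a3'),
        ('p3', 'a1'),                              -- hypothetical: no a2
        ('p4', 'a1'), ('p4', 'a1'), ('p4', 'a2'),  -- hypothetical: a1 duplicated
        ('p5', 'a1'), ('p5', 'a1');                -- hypothetical: a1 twice, no a2
""")

# Set-within-sets: each wanted article at least once, no other article at all.
having_ids = [r[0] for r in conn.execute("""
    SELECT pa.product_id
    FROM product_articles pa
    GROUP BY pa.product_id
    HAVING SUM(CASE WHEN pa.article_id = 'a1' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN pa.article_id = 'a2' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN pa.article_id NOT IN ('a1', 'a2') THEN 1 ELSE 0 END) = 0
    ORDER BY pa.product_id
""")]

# The first answer's row-count test, with text IDs.
count_ids = [r[0] for r in conn.execute("""
    SELECT product_id
    FROM product_articles
    GROUP BY product_id
    HAVING COUNT(*) = SUM(CASE WHEN article_id IN ('a1', 'a2') THEN 1 ELSE 0 END)
       AND SUM(CASE WHEN article_id IN ('a1', 'a2') THEN 1 ELSE 0 END) = 2
    ORDER BY product_id
""")]
print(having_ids)  # ['p1', 'p4']
print(count_ids)   # ['p1', 'p5']
```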
I believe this can be done entirely using relational joins, as follows:
SELECT DISTINCT pa1.PRODUCT_ID
FROM PRODUCT_ARTICLES pa1
INNER JOIN PRODUCT_ARTICLES pa2
ON (pa2.PRODUCT_ID = pa1.PRODUCT_ID)
LEFT OUTER JOIN (SELECT *
FROM PRODUCT_ARTICLES
WHERE ARTICLE_ID NOT IN (1, 2)) pa3
ON (pa3.PRODUCT_ID = pa1.PRODUCT_ID)
WHERE pa1.ARTICLE_ID = 1 AND
pa2.ARTICLE_ID = 2 AND
pa3.PRODUCT_ID IS NULL
SQLFiddle here.
The inner join looks for products associated with both articles we care about (articles 1 and 2; with the sample data this produces products 1 and 2). The left outer join looks for products associated with articles we don't care about (any article except 1 and 2), and the WHERE clause then only accepts products that don't have any unwanted articles (i.e. pa3.PRODUCT_ID IS NULL, indicating that no row from pa3 was joined in).
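The same join-based query, run in sqlite3 on the question's data with text IDs instead of the numeric ones used in the fiddle. Product p3 is a hypothetical addition that has a1 but lacks a2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_articles (product_id TEXT, article_id TEXT);
    INSERT INTO product_articles VALUES
        ('p1', 'a1'), ('p1', 'a2'),
        ('p2', 'a1'), ('p2', 'a1'), ('p2', 'a2'), ('p2', 'a3'),
        ('p3', 'a1');  -- hypothetical: has a1 but not a2
""")

rows = conn.execute("""
    SELECT DISTINCT pa1.product_id
    FROM product_articles pa1
    INNER JOIN product_articles pa2
        ON pa2.product_id = pa1.product_id
    LEFT OUTER JOIN (SELECT *
                     FROM product_articles
                     WHERE article_id NOT IN ('a1', 'a2')) pa3
        ON pa3.product_id = pa1.product_id
    WHERE pa1.article_id = 'a1'
      AND pa2.article_id = 'a2'
      AND pa3.product_id IS NULL  -- no unwanted article was joined in
    ORDER BY pa1.product_id
""").fetchall()
print(rows)  # [('p1',)]
```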

Query to exclude row based on another row's filter

I'm using Oracle 10g.
Question: How can I write query to return just ID only if ALL the codes for that ID end in 6? I don't want ID=1 because not all its codes end in 6.
TABLE_A
ID Code
===============
1 100
1 106
2 206
3 316
3 326
4 444
Desired Result:
ID
==
2
3
You simply want each ID where the count of rows for that ID equals the count of rows whose third digit is 6.
SELECT ID
FROM TABLE_A
GROUP BY ID
HAVING COUNT(*) = COUNT(CASE WHEN SUBSTR(code,3,1) = '6' THEN 1 END)
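A runnable check of the HAVING comparison on the question's TABLE_A, with sqlite3 standing in for Oracle 10g and codes stored as text (an assumption). It runs the test both with `SUBSTR(code, 3, 1) = '6'`, which relies on every code being three digits, and with `code LIKE '%6'` from the other answers, which checks the last character regardless of length; on this data both give the same IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, code TEXT);
    INSERT INTO table_a VALUES
        (1, '100'), (1, '106'), (2, '206'), (3, '316'), (3, '326'), (4, '444');
""")

# COUNT(*) counts all rows of an ID; COUNT(CASE ...) counts only rows passing
# the test, because CASE without ELSE yields NULL and COUNT skips NULLs.
query = """
    SELECT id
    FROM table_a
    GROUP BY id
    HAVING COUNT(*) = COUNT(CASE WHEN {test} THEN 1 END)
    ORDER BY id
"""
third_digit = [r[0] for r in conn.execute(query.format(test="SUBSTR(code, 3, 1) = '6'"))]
last_char = [r[0] for r in conn.execute(query.format(test="code LIKE '%6'"))]
print(third_digit, last_char)  # [2, 3] [2, 3]
```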
Try this:
SELECT DISTINCT b.id
FROM (
SELECT id,
COUNT(1) cnt
FROM table_a
GROUP BY id
) a,
(
SELECT id,
COUNT(1) cnt
FROM table_a
WHERE CODE LIKE '%6'
GROUP BY id
)b
WHERE a.id = b.id
AND a.cnt = b.cnt
Alternative using ANALYTIC functions:
SELECT DISTINCT id
FROM
(
SELECT id,
COUNT(1) OVER(PARTITION BY id) cnt,
SUM(CASE WHEN code LIKE '%6' THEN 1 ELSE 0 END) OVER(PARTITION BY id) sm
FROM table_a
)
WHERE cnt = sm
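The analytic variant, run in sqlite3 on the question's TABLE_A (window functions require SQLite 3.25 or later; codes stored as text by assumption). The window aggregates attach each ID's total row count and its count of codes ending in 6 to every row, so no GROUP BY is needed before filtering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, code TEXT);
    INSERT INTO table_a VALUES
        (1, '100'), (1, '106'), (2, '206'), (3, '316'), (3, '326'), (4, '444');
""")

# cnt = rows per ID, sm = rows per ID whose code ends in 6; keep IDs where equal.
rows = conn.execute("""
    SELECT DISTINCT id
    FROM (
        SELECT id,
               COUNT(1) OVER (PARTITION BY id) AS cnt,
               SUM(CASE WHEN code LIKE '%6' THEN 1 ELSE 0 END) OVER (PARTITION BY id) AS sm
        FROM table_a
    )
    WHERE cnt = sm
    ORDER BY id
""").fetchall()
print(rows)  # [(2,), (3,)]
```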
