I have a SELECT query that uses a subquery to produce results:
SELECT
fu.user_name "User name"
, frt.responsibility_name "Responsibility Name"
, furg.start_date "Start Date"
, furg.end_date "End Date"
, fu.last_logon_date "Last Logon Date"
, fr.responsibility_key "Responsibility key"
, fu.email_address "Email Address"
, fu.description "Description"
, m.MANAGER_NAME
, m.MANAGER_EMAIL
FROM
fnd_user_resp_groups_direct furg
, fnd_user fu
, applsys.fnd_responsibility_tl frt
, applsys.fnd_responsibility fr
---------------- this subquery ------------------------
, (select distinct e.employee_number employee_id, e.full_name employee,e.EMAIL_ADDRESS employee_email,
m.employee_number manager_id, m.full_name manager_name, m.EMAIL_ADDRESS as manager_email
from EMPDATA_IMPORT_STG e
,EMPDATA_IMPORT_STG m
where m.employee_number=e.supervisor_name
) m
---------------------------------------
WHERE furg.user_id = fu.user_id
AND furg.responsibility_id = frt.responsibility_id
AND fr.responsibility_id = frt.responsibility_id
AND (to_char(fu.END_DATE) is null OR fu.END_DATE > sysdate)
and fu.email_address=m.EMPLOYEE_EMAIL
and not (frt.responsibility_name like '%iExpenses%' and frt.responsibility_name not like '%iExpenses Setup and Admin%')
and frt.responsibility_name not like '%Expenses Auditor%'
and frt.responsibility_name not like '%Notifications%'
and frt.responsibility_name not like '%Inquiry%' and frt.responsibility_name not like '%INQUIRY%'
and frt.responsibility_name not like '%Self-Service%' and frt.responsibility_name not like '%Self Service%'
and frt.responsibility_name not like '%(Read Only)%' and frt.responsibility_name not like '%Internet Expenses Help Desk%'
and frt.responsibility_name not like '%Employee Opportunities%'
My issue is that the EMPDATA_IMPORT_STG table has over 2 million records, and it has a non-unique index on its columns EMPLOYEE_NUMBER, FILE_TIMESTAMP. I tried to execute this in production: it started executing, and after waiting 4 minutes I killed my session. Just to mention, in the test environment the same query executes in 10 seconds.
How can I speed up this query's execution in production?
Statistics are collected daily in prod, and I'm thinking of creating an index on the email column as well; would that give me a faster response?
Thank you
Because this query is executed once per month, I have created a temporary table that holds those employees, using distinct employee_number from the original one:
create table xx_employees as select DISTINCT employee_number,
full_name,
supervisor_name,
EMAIL_ADDRESS
from EMPDATA_IMPORT_STG ;
This solves my problem; unfortunately, I had a very small time window to provide the results of this query.
My apologies to all once again.
Thank you
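For reference, the employee-to-manager lookup that the subquery performs can be sketched on a toy dataset (SQLite here, purely illustrative; the table name and columns mirror the staging table above, but all row values are invented):

```python
import sqlite3

# Toy stand-in for EMPDATA_IMPORT_STG: each row links an employee to
# their supervisor by employee_number, with duplicate staging rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE empdata_import_stg (
    employee_number TEXT,
    full_name TEXT,
    supervisor_name TEXT,   -- holds the supervisor's employee_number
    email_address TEXT
);
INSERT INTO empdata_import_stg VALUES
    ('100', 'Alice', '200', 'alice@example.com'),
    ('100', 'Alice', '200', 'alice@example.com'),  -- duplicate staging row
    ('200', 'Bob',   NULL,  'bob@example.com');
""")

# Same self-join as the subquery: e = employee row, m = manager row;
# DISTINCT collapses the duplicated staging rows.
rows = conn.execute("""
    SELECT DISTINCT e.full_name AS employee,
                    m.full_name AS manager_name,
                    m.email_address AS manager_email
    FROM empdata_import_stg e
    JOIN empdata_import_stg m
      ON m.employee_number = e.supervisor_name
""").fetchall()
print(rows)  # [('Alice', 'Bob', 'bob@example.com')]
```

Since the outer query joins on fu.email_address = m.employee_email, an index covering the email column (or deduplicating into a small table first, as done above) is what this join pattern tends to benefit from.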
There is a table called USER_SETUP. This table has a column called EMPLOYEE_ID, which is originally VARCHAR. I created a view to convert the VARCHAR data type to NUMBER, using TO_DECIMAL("EMPLOYEE_ID",12,0) AS "EMPLOYEE_ID" (when I use TO_INTEGER("EMPLOYEE_ID") AS "EMPLOYEE_ID", I get an exception in the view).
The view is created using a nested SQL statement [e.g. select * from (select * from table)]. The view generates correct output with correct data. In the view I created 2 columns using CASE expressions based on the inner SQL. See the code of the SQL view below.
Main View code:
CREATE VIEW ECLINIC_KNG.VIEW_USER_SETUP AS
SELECT
USER_ID,
EMPLOYEE_ID,
CASE
WHEN EMPLOYEE_ID < 50000 THEN 'Kuwaiti'
WHEN EMPLOYEE_ID >= 50000 AND LENGTH(EMPLOYEE_ID) <=6 THEN 'Non-Kuwaiti'
ELSE 'NOT KNG'
END AS "KWT_NKWT",
CASE
WHEN LENGTH(EMPLOYEE_ID) <=6 THEN 'KNG'
WHEN LENGTH(EMPLOYEE_ID)=12 THEN 'MOH'
ELSE 'Undefined'
END AS "KNG_MOH",
USER_NAME,
ACTIVE_DATE,
DEACTIVE_DATE,
ACTIVE_STATUS,
USER_CODE,
USER_LABEL,
USER_PASSWORD,
USER_TYPE,
OFFICE_ID,
USER_DESIG,
USER_LICENSE,
USER_SIGN,
CLIMIT_DAYS,
EDIT_BILL_SERVICE,
PASSWORDGROUP_ID,
USERLABEL_AR,
DISCOUNT_APPROVAL,
DISCOUNT_TYPE,
DISCOUNT_MAX,
BILL_CANCELLATION,
BILL_CANCELLATION_DAYS,
DOCTOR_ID
FROM
(SELECT
"USER_ID",
--TO_INTEGER("EMPLOYEE_ID") AS "EMPLOYEE_ID" ,
TO_DECIMAL("EMPLOYEE_ID",12,0) AS "EMPLOYEE_ID",
"USER_NAME",
"ACTIVE_DATE",
"DEACTIVE_DATE",
"ACTIVE_STATUS",
"USER_CODE",
"USER_LABEL",
"USER_PASSWORD",
"USER_TYPE",
"OFFICE_ID",
"USER_DESIG",
"USER_LICENSE",
"USER_SIGN",
"CLIMIT_DAYS",
"EDIT_BILL_SERVICE",
"PASSWORDGROUP_ID",
"USERLABEL_AR",
"DISCOUNT_APPROVAL",
"DISCOUNT_TYPE",
"DISCOUNT_MAX",
"BILL_CANCELLATION",
"BILL_CANCELLATION_DAYS",
"DOCTOR_ID"
FROM
"ECLINIC_KNG"."USER_SETUP"
WHERE
"USER_SETUP"."EMPLOYEE_ID" not in ('undefined', '299450','34(NEW RECRUIT)', 'army', 'MOH
Nurse','NEW RECRUITMENT 380','1111111','0')
AND
"USER_SETUP"."EMPLOYEE_ID" LIKE_REGEXPR '[0-9]'
ORDER BY
"USER_ID");
The 2 new columns created in the view
CASE
WHEN EMPLOYEE_ID < 50000 THEN 'Kuwaiti'
WHEN EMPLOYEE_ID >= 50000 AND LENGTH(EMPLOYEE_ID) <=6 THEN 'Non-Kuwaiti'
ELSE 'NOT KNG'
END AS "KWT_NKWT",
CASE
WHEN LENGTH(EMPLOYEE_ID) <=6 THEN 'KNG'
WHEN LENGTH(EMPLOYEE_ID)=12 THEN 'MOH'
ELSE 'Undefined'
END AS "KNG_MOH"
The issue arises when I filter specific data using these 2 SQL scripts:
--(1)
SELECT * FROM VIEW_USER_SETUP vus WHERE vus.KNG_MOH LIKE 'MOH'; --Issue
--(2)
SELECT * FROM VIEW_USER_SETUP vus WHERE vus.KWT_NKWT LIKE 'NOT KNG'; --Issue
For the first query, the output is as follows:
SAP DBTech JDBC: [339]: invalid number: exception 71000339: SQL Error
For the second query, the output is as follows:
invalid number: [6930] attribute value is not a number;exception 70006930: attribute value is not a number
To temporarily solve (1), I use the code below:
SELECT * FROM VIEW_USER_SETUP vus WHERE vus.KNG_MOH NOT LIKE 'KNG';
Similarly, to temporarily solve (2), I use the same code above, because of the relation between the two columns.
I believe the issue originates from the two CASE expressions in the main view code, because when I use other filters on the view there is no issue.
What is the correction required in the Main View code such that, when I execute --(1) and --(2), I get the required output?
Any help is appreciated.
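One thing worth checking (an observation about the filter, not a confirmed diagnosis of this exact view): in HANA, LIKE_REGEXPR performs partial matching by default, so the pattern '[0-9]' accepts any value that merely contains a digit. A value like 'NEW RECRUITMENT 380' contains digits, so it passes the regex filter yet cannot be converted by TO_DECIMAL. The difference between a containment match and a fully anchored match can be illustrated with ordinary regexes (sample values loosely based on the NOT IN list above):

```python
import re

values = ["12345", "299450", "34(NEW RECRUIT)", "army", "1111111", "123456789012"]

# The view's pattern: matches any string CONTAINING a digit anywhere
# (the partial-match behavior of a bare '[0-9]').
contains_digit = [v for v in values if re.search(r"[0-9]", v)]

# A fully anchored pattern: only strings made up entirely of digits,
# i.e. the ones a numeric conversion can safely handle.
all_digits = [v for v in values if re.fullmatch(r"[0-9]+", v)]

print(contains_digit)  # '34(NEW RECRUIT)' slips through
print(all_digits)      # only convertible values remain
```

In the view this would translate to anchoring the pattern (something like LIKE_REGEXPR '^[0-9]+$'); and if the optimizer still evaluates the CASE columns before the WHERE filter, guarding the conversion itself so it only runs on validated values. Treat the exact HANA syntax here as an assumption to verify against the documentation.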
I have a situation where I have to find records that take more than 24 hours to load into the DW.
For this I have two tables:
Table 1: contains the stats about each and every load
Table 2: contains the stats about when we received each file to load
Now I want only those records which took more than 24 hours to load.
The date on which I received a file is in table 2, whereas the time its load finished is in table 1, so table 2 may have more than one entry for each file.
I have developed the query below, but it's taking a long time:
SELECT
rcd.file_date,
rcd.recived_on as "Date received On",
rcd.loaded_On "Date Processed On",
to_char(rcd.recived_on,'DY') as "Day",
round((rcd.loaded_On - rcd.recived_on)*24,2) as "time required"
FROM (
SELECT
tbl1.file_date,
(SELECT tbl2.recived_on
FROM ( SELECT recived_on
FROM table2
Where fileName = tbl1.feedName
order by recived_on) tbl2
WHERE rownum = 1) recived_on,
tbl1.loaded_On,
to_char(tbl2.recived_on,'DY'),
round((tbl1.loaded_On - tbl2.recived_on)*24,2)
FROM Table1 tbl1 ,
Table1 tbl2
WHERE
tbl1.id=tbl2.id
AND tbl1.FileState = 'Success'
AND trunc(loaded_On) between '25-Feb-2020' AND '03-Mar-2020'
) rcd
WHERE (rcd.loaded_On - rcd.recived_on)*24 > 24;
I think a lot of your problem stems from the subquery in the column list of your inner query. Maybe try using an analytic function instead. Something like this:
SELECT rcd.file_date,
rcd.recived_on AS "Date received On",
rcd.loaded_On "Date Processed On",
to_char(rcd.recived_on, 'DY') AS "Day",
round((rcd.loaded_On - rcd.recived_on) * 24, 2) AS "time required"
FROM (SELECT tbl1.file_date,
MIN(tbl2.recived_on) OVER (PARTITION BY tbl2.filename) AS recived_on,
tbl1.loaded_On
FROM Table1 tbl1
INNER JOIN Table1 tbl2 ON tbl1.id = tbl2.id
WHERE tbl1.FileState = 'Success'
AND trunc(loaded_On) BETWEEN '25-Feb-2020' AND '03-Mar-2020') rcd
WHERE (rcd.loaded_On - rcd.recived_on) * 24 > 24;
Also, you were selecting some columns in the inner query and not using them, so I removed them.
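The analytic approach can be sketched on a toy dataset. This uses SQLite rather than Oracle, with invented file names and plain hour arithmetic instead of Oracle DATE subtraction, so it is illustrative only (SQLite 3.25+ is needed for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loads (
    filename   TEXT,
    recived_on REAL,  -- hours since some epoch, for simple arithmetic
    loaded_on  REAL
);
INSERT INTO loads VALUES
    ('feed_a', 0.0,  30.0),   -- received at hour 0, loaded at hour 30: > 24h
    ('feed_a', 5.0,  30.0),   -- a second receipt of the same file
    ('feed_b', 10.0, 20.0);   -- loaded in 10h: fine
""")

# MIN(...) OVER (PARTITION BY filename) picks the earliest receipt per
# file in one pass, instead of a correlated subquery per row.
rows = conn.execute("""
    SELECT DISTINCT filename, loaded_on - first_received AS hours
    FROM (SELECT filename,
                 loaded_on,
                 MIN(recived_on) OVER (PARTITION BY filename) AS first_received
          FROM loads)
    WHERE loaded_on - first_received > 24
""").fetchall()
print(rows)  # [('feed_a', 30.0)]
```

The window function is computed once per partition during a single scan, which is what removes the per-row subquery cost.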
Full disclosure: this is part of a homework question, but I have tried 6 different versions and I am stuck.
I am trying to find 1 manager every time the query runs, i.e. I put the department id in and 1 name pops out. Currently, I get all the names, multiple times. I have tried nesting with an '=', not nesting, union, intersection, etc. I can get the manager id with a basic query; I just can't get the name. The current version looks like this:
select e.ename
from .emp e
where d.managerid in (select unique d.managerid
from works w, .dept d, emp e1
where d.did=1 and e1.eid=w.eid and d.did=w.did );
I realize it's probably a really basic mistake that I am not seeing; any ideas?
It's not clear what you mean by getting 1 manager each time: should it be a different manager each time, or the same one?
Let's go through your query:
You select all employees from table emp where the manager id is in the next query's result set.
That gets all managers for dept=1. The rest of the tables and conditions have no influence on the result set.
I think did is the primary key of table dept; if so, your query can be rewritten to:
select e.ename
from emp e
where d.managerid in (select unique d.managerid
from dept d
where d.did=1);
but this query returns all employees, not the manager of dept=1.
If you need the manager, you should fetch the employee who is the manager. If eid is the primary key of emp, and managerid is an id from the employee table, you need something like:
select e.ename
from emp e
where e.eid in (select unique d.managerid
from dept d
where d.did=1);
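A toy run of the corrected pattern (SQLite, with invented table contents, since the homework schema isn't fully shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp  (eid INTEGER PRIMARY KEY, ename TEXT);
CREATE TABLE dept (did INTEGER PRIMARY KEY, managerid INTEGER);
INSERT INTO emp  VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO dept VALUES (1, 2), (2, 3);   -- dept 1 is managed by Bob
""")

# Filter emp by its OWN key (e.eid) against the manager id of dept 1,
# so exactly one name comes back per department.
rows = conn.execute("""
    SELECT e.ename
    FROM emp e
    WHERE e.eid IN (SELECT d.managerid FROM dept d WHERE d.did = 1)
""").fetchall()
print(rows)  # [('Bob',)]
```

The original query returned every employee multiple times because the outer WHERE never tied emp's own key to the subquery's result.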
Consider there is a table of job runs history with the following schema:
job_runs
(
run_id integer not null, -- identifier of the run
job_id integer not null, -- identifier of the job
run_number integer not null, -- job run number, run numbers increment for each job
status text not null, -- status of the run (running, completed, killed, ...)
primary key (run_id)
-- ...
)
and it is required to get the last 10 runs with status != 'running' for each job (jobs differ by job_id). To do that I wrote the following query:
SELECT
*
FROM
job_runs AS JR1
WHERE
JR1.run_number IN
(
SELECT
JR2.run_number
FROM
job_runs AS JR2
WHERE
JR2.job_id = JR1.job_id
AND
JR2.status != 'running'
ORDER BY
JR2.run_number
DESC
LIMIT
10
)
It does what I need, but even though there is a multi-column index on the job_id and run_number fields of the job_runs table, the query is slow, because it scans the job_runs table and runs the subquery for each of its rows. The index makes each subquery fast, but the fact that the outer query scans the entire table kills performance. So how can I tune the performance of this query?
Some thoughts:
The number of jobs (distinct job_ids) is small, and if SQLite had a FOR loop it would be easy to loop over all distinct job_ids, run the subquery passing each job id instead of JR1.job_id, then UNION ALL the results.
Important:
Please don't suggest running the loop inside the source code of my application; I need a pure SQL solution.
You could increase the performance of the subquery further by creating a covering index for it:
CREATE INDEX xxx ON job_runs(job_id, run_number, status);
But the biggest performance problem is that the subquery is executed for each row, although you need to run it only for each unique job ID.
So, first, get just the unique job IDs:
SELECT DISTINCT job_id
FROM job_runs
Then, for each of these IDs, determine the tenth largest run number:
SELECT job_id,
(SELECT run_number
FROM job_runs
WHERE job_id = job_ids.job_id
AND status != 'running'
ORDER BY run_number DESC
LIMIT 1 OFFSET 9
) AS first_run_number
FROM (SELECT DISTINCT job_id
FROM job_runs) AS job_ids
But if there are fewer than ten run numbers for a job, the subquery returns NULL, so let's replace that with a small number so that the comparison below (run_number >= first_run_number) works:
SELECT job_id,
IFNULL((SELECT run_number
FROM job_runs
WHERE job_id = job_ids.job_id
AND status != 'running'
ORDER BY run_number DESC
LIMIT 1 OFFSET 9
), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
FROM job_runs) AS job_ids
So now we have the first interesting run for each job.
Finally, join these values back to the original table:
SELECT job_runs.*
FROM job_runs
JOIN (SELECT job_id,
IFNULL((SELECT run_number
FROM job_runs
WHERE job_id = job_ids.job_id
AND status != 'running'
ORDER BY run_number DESC
LIMIT 1 OFFSET 9
), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
FROM job_runs) AS job_ids
) AS firsts
ON job_runs.job_id = firsts.job_id
AND job_runs.run_number >= firsts.first_run_number;
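Putting it together on a toy job_runs table (all values invented) to check the final query's behavior. One detail worth noting: the joined query above does not re-filter status in its outer select, so a still-running run whose run_number qualifies would slip in; the sketch adds that filter back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE job_runs (
    run_id     INTEGER PRIMARY KEY,
    job_id     INTEGER NOT NULL,
    run_number INTEGER NOT NULL,
    status     TEXT NOT NULL
)""")
# The covering index suggested in the answer.
conn.execute("CREATE INDEX xxx ON job_runs(job_id, run_number, status)")

# Job 1: 15 completed runs; job 2: 3 completed runs plus one still running.
run_id = 0
for job_id, n in [(1, 15), (2, 3)]:
    for run_number in range(1, n + 1):
        run_id += 1
        conn.execute("INSERT INTO job_runs VALUES (?, ?, ?, 'completed')",
                     (run_id, job_id, run_number))
conn.execute("INSERT INTO job_runs VALUES (999, 2, 4, 'running')")

rows = conn.execute("""
    SELECT job_runs.job_id, job_runs.run_number
    FROM job_runs
    JOIN (SELECT job_id,
                 IFNULL((SELECT run_number
                         FROM job_runs
                         WHERE job_id = job_ids.job_id
                           AND status != 'running'
                         ORDER BY run_number DESC
                         LIMIT 1 OFFSET 9
                        ), -1) AS first_run_number
          FROM (SELECT DISTINCT job_id FROM job_runs) AS job_ids
         ) AS firsts
      ON job_runs.job_id = firsts.job_id
     AND job_runs.run_number >= firsts.first_run_number
    WHERE job_runs.status != 'running'
    ORDER BY job_runs.job_id, job_runs.run_number
""").fetchall()
print(rows)  # job 1: runs 6..15; job 2: all 3 completed runs
```

Job 1 has 15 eligible runs, so only the last 10 (run numbers 6 through 15) come back; job 2 has fewer than 10, so the IFNULL(-1) fallback keeps all of its completed runs.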
I have a Hive table named "sales" with below structure:
id,ptype,amount,time,date
1,a,12,2240,2013-12-25
1,a,4,1830,2013-12-25
1,b,2,1920,2013-12-25
1,b,3,2023,2013-12-25
2,a,5,1220,2013-12-25
2,a,1,1320,2013-12-25
Below are my queries for the different variables:
Q1: select id,sum(amount) as s_amt from sales group by id;
Q2: select id, sum(amount) as s_a_amt from sales where ptype='a' group by id;
Q3: select id, sum(amount) as s_b_amt from sales where ptype='b' group by id;
As far as I have learned in Hive, we can apply the "union all" option only when the queries have the same column names, or the same schema. Below is the end result I want to achieve using a Hive query:
id,s_amt,s_a_amt,s_b_amt
1,21,16,5
2,6,6,0
Below is one query that I tried, and it executed successfully. But it will be a very painful task to design the same query for more than 300 variables. Is there a more efficient approach for this task, considering we have more than 300 variables? I appreciate your comments!
select t.id,max(t.s_amt) as s_amt,max(t.s_a_amt) as s_a_amt, max(t.s_b_amt) as s_b_amt
from
(select s1.id,sum(amount) as s_amt,0 as s_a_amt,0 as s_b_amt from sales s1 group by id union all
select s2.id, 0 as s_amt, sum(amount) as s_a_amt, 0 as s_b_amt from sales s2 where ptype='a' group by id union all
select s3.id, 0 as s_amt,0 as s_a_amt, sum(amount) as s_b_amt from sales s3 where ptype='b' group by id) t
group by t.id;
The ideal solution would be a
Materialized Query Table (MQT), as IBM calls it.
Summary tables are a special form of MQT, and that's exactly what you need. Quick definition: as the name suggests, an MQT is a simple summary table, materialized on disk.
With MQT support, all you have to do is the below:
CREATE MATERIALIZED QUERY TABLE MQTA AS (
select id, sum(amount) as s_a_amt from sales where ptype='a' group by id
)
DATA INITIALLY DEFERRED
REFRESH DEFERRED
MAINTAINED BY USER
DATA INITIALLY DEFERRED means summary records are not inserted into the table at creation time. REFRESH DEFERRED means the data in the table can be refreshed at any time using the REFRESH TABLE statement. MAINTAINED BY USER means the refresh of this table has to be handled by the user; MAINTAINED BY SYSTEM is another option, in which the system automatically updates the summary table when the base table sees inserts/deletes/updates.
You can then query the MQT directly like a simple table; all the heavy lifting of summarizing records has already run before you query the MQT, so it is much faster.
But AFAIK Hive doesn't support MQTs or summary tables.
Now that you know the concept, you just have to simulate it: create a summary table and insert the summary records (the REFRESH TABLE concept). Load summary values periodically, controlling with some kind of last-load-date field so you pick up only the records created after the last refresh. You can do this with scheduled jobs (Hive scripts).
INSERT INTO TABLE PTYPE_AMOUNT_MQT
select *
from
(select s1.id, sum(amount) as s_amt, 0 as s_a_amt, 0 as s_b_amt from sales s1 where record_create_date > last_refresh_date group by id union all
select s2.id, 0 as s_amt, sum(amount) as s_a_amt, 0 as s_b_amt from sales s2 where ptype='a' and record_create_date > last_refresh_date group by id union all
select s3.id, 0 as s_amt, 0 as s_a_amt, sum(amount) as s_b_amt from sales s3 where ptype='b' and record_create_date > last_refresh_date group by id) t;
It is always good to have audit fields like record_create_date and time. The last_refresh_date is the last time your job ran.
The solution should be:
select id, sum(amount) s_amt,
SUM (CASE WHEN ptype='a' THEN amount
ELSE 0
END) sum_a_amt,
SUM (CASE WHEN ptype='b' THEN amount
ELSE 0
END) sum_b_amt
from sales
group by id;
Please try it and tell me if it works, I cannot test it right now...
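The conditional-aggregation query runs unchanged on most SQL engines; here is a quick check against the sample rows from the question (SQLite used as a stand-in for Hive):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, ptype TEXT, amount INTEGER,
                    time TEXT, date TEXT);
INSERT INTO sales VALUES
    (1, 'a', 12, '2240', '2013-12-25'),
    (1, 'a',  4, '1830', '2013-12-25'),
    (1, 'b',  2, '1920', '2013-12-25'),
    (1, 'b',  3, '2023', '2013-12-25'),
    (2, 'a',  5, '1220', '2013-12-25'),
    (2, 'a',  1, '1320', '2013-12-25');
""")

# One pass over sales: the overall total plus one CASE-guarded sum per ptype.
rows = conn.execute("""
    SELECT id,
           SUM(amount) AS s_amt,
           SUM(CASE WHEN ptype = 'a' THEN amount ELSE 0 END) AS s_a_amt,
           SUM(CASE WHEN ptype = 'b' THEN amount ELSE 0 END) AS s_b_amt
    FROM sales
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 21, 16, 5), (2, 6, 6, 0)]
```

Extending this to 300 variables means 300 CASE branches, but it stays a single scan of the table, which generally beats 300 UNION ALL legs.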
Hive has recently added GROUPING SETS as a new feature (https://issues.apache.org/jira/browse/HIVE-3471). It could be a lot easier to write and read than an MQT. But not everyone knows about this feature, and the use of CASE expressions, as Arnaud has illustrated, is more common in practice.