How to count repeated values in multiple columns with a select - Oracle

I have the position_table table:
   CODE_POSITION  NAME     GRADE  VALIDITY    DATE ELIMINATION  LEVEL
1  AAAA01         MANAGER  10     01/03/2016  31/12/2999        HIGH
2  BBBB01         ANALYST  09     01/03/2016  31/12/2999        LOW
3  CCCC01         STAFF    05     01/03/2016  31/12/2999        HIGH
4  BBBB01         ANALYST  09     01/03/2016  31/12/2999        HIGH
5  AAAA01         MANAGER  10     01/03/2016  31/12/2999        LOW
6  DDDD01         INTERN   01     01/03/2016  31/12/2999        HIGH
7  DDDD01         INTERN   01     01/07/2016  31/12/2999        LOW
I use this query to find and count the same code_position:
select code_position, count(code_position)
from position_table
group by code_position
having count(code_position) > 1;
And this is the result:
CODE_POSITION COUNT(CODE_POSITION)
1 AAAA01 2
2 BBBB01 2
3 DDDD01 2
Note:
The AAAA01 code is repeated twice and both rows have the same validity date and grade.
The BBBB01 code is repeated twice and both rows have the same validity date and grade.
The DDDD01 code is repeated twice but the rows have different validity dates.
Now I need to check which code_position values are repeated and also have the same validity date and grade, like AAAA01 and BBBB01.

You can use more than one column in GROUP BY. If you list several columns, all of them are compared, and rows are grouped together only if they are identical in every listed column.
SELECT
code_position,
COUNT(code_position),
validity,
grade
FROM position_table
GROUP BY
code_position,
validity,
grade
HAVING COUNT(code_position) > 1
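To see the effect of the extra GROUP BY columns, here is a minimal sketch that runs the same query against an in-memory SQLite table from Python. SQLite is only a stand-in for Oracle here, and the table is trimmed to the three columns the query uses; both are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle table; only the columns the
# query touches are created (an assumption made for brevity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE position_table (code_position TEXT, grade TEXT, validity TEXT)"
)
rows = [
    ("AAAA01", "10", "01/03/2016"),
    ("BBBB01", "09", "01/03/2016"),
    ("CCCC01", "05", "01/03/2016"),
    ("BBBB01", "09", "01/03/2016"),
    ("AAAA01", "10", "01/03/2016"),
    ("DDDD01", "01", "01/03/2016"),
    ("DDDD01", "01", "01/07/2016"),
]
conn.executemany("INSERT INTO position_table VALUES (?, ?, ?)", rows)

# Rows fall into the same group only when all three columns match.
duplicates = conn.execute(
    """
    SELECT code_position, COUNT(code_position), validity, grade
    FROM position_table
    GROUP BY code_position, validity, grade
    HAVING COUNT(code_position) > 1
    ORDER BY code_position
    """
).fetchall()

for row in duplicates:
    print(row)
```

DDDD01 drops out because its two rows have different validity dates, so each of them forms a group of one.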

Related

How would I add an artificial termination date to the termination date column based on two different dates for the same patient id

I need a query that compares the two EFFECTIVE dates a patient number has under different HMOs, determines which of the two is later, and populates the TERMINATION date of the older row only, with the last day of the month before the newer effective date. This needs to be done across multiple patient, HMO, effective date combinations in a table.
SELECT * FROM tablename
The output is this:
HMO PATIENT EFFECTIVE TERMINATION
16 221135 01-APR-18
18 221135 01-OCT-17
12 251181 01-SEP-16
16 251181 01-MAR-15
12 271126 01-MAR-15
16 271126 01-DEC-16
12 291141 01-DEC-16
16 291141 01-FEB-19
12 391134 09-MAY-13
16 391134 01-APR-18
What I am trying to do via a query or queries is this:
HMO PATIENT EFFECTIVE TERMINATION
16 221235 01-APR-18
18 221235 01-OCT-17 3/31/2018
12 251381 01-SEP-16
16 251381 01-MAR-15 8/31/2016
12 2711126 01-MAR-15 11/30/2016
16 2711126 01-DEC-16
12 292241 01-DEC-16 1/31/2019
16 292241 01-FEB-19
12 391534 09-MAY-13 3/31/2018
16 391534 01-APR-18
I've tried using a CASE expression, but unsurprisingly it creates four rows per patient/HMO combination, populating two of the rows with dates and leaving two blank:
SELECT DISTINCT
S.HMO
,S.PATIENT
,S.EFFECTIVE
,CASE WHEN S.EFFECTIVE > E.EFFECTIVE THEN LAST_DAY(ADD_MONTHS(S.EFFECTIVE, -1))
WHEN S.EFFECTIVE < E.EFFECTIVE THEN LAST_DAY(ADD_MONTHS(E.EFFECTIVE, -1))
ELSE NULL END AS TERMINATION
FROM tablename S INNER JOIN tablename E ON S.PATIENT=E.PATIENT
WHERE S.PATIENT =221135
Any ideas or advice would be welcome.
With the sample data you posted:
SQL> select * from tablename order by patient, effective;
HMO PATIENT EFFECTIVE TERMINATIO
---------- ---------- ---------- ----------
18 221135 10/01/2017
16 221135 04/01/2018
16 251181 03/01/2015
12 251181 09/01/2016
12 271126 03/01/2015
16 271126 12/01/2016
6 rows selected.
a MERGE like this might do:
SQL> merge into tablename a
2 using (select patient, max(effective) max_effective,
3 min(effective) min_effective
4 from tablename
5 group by patient
6 ) x
7 on (a.patient = x.patient)
8 when matched then update set
9 a.termination = x.max_effective - 1
10 where a.effective = x.min_effective;
3 rows merged.
Result is then
SQL> select * from tablename order by patient, effective;
HMO PATIENT EFFECTIVE TERMINATIO
---------- ---------- ---------- ----------
18 221135 10/01/2017 03/31/2018
16 221135 04/01/2018
16 251181 03/01/2015 08/31/2016
12 251181 09/01/2016
12 271126 03/01/2015 11/30/2016
16 271126 12/01/2016
6 rows selected.
SQL>
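Note that max_effective - 1 lands on the last day of the previous month only because the newer effective dates in the sample all fall on the first of a month. The following Python sketch shows the general rule (the equivalent of LAST_DAY(ADD_MONTHS(..., -1)) from the question). The helper name and the in-memory rows are illustrative assumptions, and it assumes two rows per patient, as in the sample.

```python
from datetime import date, timedelta


def last_day_of_previous_month(d):
    """Hypothetical helper: step back one day from the first of d's month."""
    return d.replace(day=1) - timedelta(days=1)


# (hmo, patient, effective), taken from the sample data in the question
rows = [
    (16, 221135, date(2018, 4, 1)),
    (18, 221135, date(2017, 10, 1)),
    (12, 251181, date(2016, 9, 1)),
    (16, 251181, date(2015, 3, 1)),
    (12, 391134, date(2013, 5, 9)),
    (16, 391134, date(2018, 4, 1)),
]

# Newest effective date per patient (the MAX(effective) of the MERGE).
newest = {}
for hmo, patient, effective in rows:
    newest[patient] = max(newest.get(patient, effective), effective)

# Only the older row of each patient gets a termination date.
result = []
for hmo, patient, effective in rows:
    termination = None
    if effective < newest[patient]:
        termination = last_day_of_previous_month(newest[patient])
    result.append((hmo, patient, effective, termination))

for row in result:
    print(row)
```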

How to store multiple values in a variable to be used in a case statement

I am having this issue and any help would be greatly appreciated.
I have an Oracle db and am working with the following business case:
An employee can work in different job grades during his/her regular hours or in overtime.
I need to calculate an employee's hours per job grade and wage code. The hours and the job grades are in different tables, and the table that has job grades doesn't have hours, only time in and time out. After querying the db I get the following result.
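Emp_ID | Wage Code | Job grade | Hours | Date
-------|-----------|-----------|-------|-----------
1      | 01        |           | 8     | 2021/06/07
1      | 02        | P         | 2     | 2021/06/07
1      | 08        |           | 8     | 2021/06/08
1      | 01        |           | 6     | 2021/06/09
1      | 01        | E         | 8     | 2021/06/09
1      | 01        |           | 8     | 2021/06/10
1      | 01        |           | 8     | 2021/06/11
1      | 02        |           | 9     | 2021/06/11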
Now I get the wrong hours when the employee works in a different job grade.
To overcome this, I need to identify the dates on which the employee worked in a different job grade, so I can use that in a CASE expression.
I used this logic:
On a date on which the employee worked in a different job grade, calculate the hours from table A.
Otherwise, calculate the hours from table B.
The problem is that I can't simply use variables, because there could be multiple dates.
How can I achieve this? Can I use any other logic?
Thanks,
Here are my tables
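TABLE A
Emp_ID | Wage_code | time_in | time_out | Job_grade | Date
-------|-----------|---------|----------|-----------|-----------
01     |           | 8:00    | 16:00    |           | 2021-06-07
01     |           | 16:00   | 18:00    | P         | 2021-06-07
01     |           | 8:00    | 16:00    |           | 2021-06-08
01     |           | 8:00    | 14:00    |           | 2021-06-09
01     |           | 14:00   | 16:00    | E         | 2021-06-09
01     |           | 8:00    | 16:00    |           | 2021-06-10
01     |           | 8:00    | 16:00    |           | 2021-06-11
01     |           | 16:00   | 17:00    |           | 2021-06-11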
This table doesn't store wage codes. An empty job_grade means the employee worked in his/her regular job grade.
TABLE B
Emp_ID | Wage_code | Hours | Date
-------|-----------|-------|-----------
01     | 1         | 8     | 2021-06-07
01     | 2         | 2     | 2021-06-07
01     | 8         | 8     | 2021-06-08
01     | 1         | 8     | 2021-06-09
01     | 1         | 8     | 2021-06-10
01     | 1         | 8     | 2021-06-11
01     | 2         | 2     | 2021-06-11
This table stores wage codes and the hours for each wage code (1=regular, 2=overtime, 8=vacation, etc.), but it does not record job grade changes, only the regular grade.
My query:
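select
A.emp_id,
A.job_grade,
B.Wage_code,
B.Date,
case
-- Oracle treats '' as NULL, so the empty grade is tested with IS NULL
when A.job_grade is null then B.Hours
else
-- time_out minus time_in is a fraction of a day; times 24 gives hours
to_char((A.time_out - A.time_in) * 24, 'fm99.90')
end "Hours"
from A
left join B on A.emp_id=B.emp_id and A.Date=B.Date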
With this query I get the wrong hours when the employee has worked in a different job grade, because the CASE condition only checks whether the job grade is empty and, if so, takes the hours from Table B. But on 06/07, for example, the employee worked in the normal grade as well as in a different job grade.
How can I identify the dates on which the employee worked in a different job grade, so that I can combine that with the job_grade condition in the CASE expression and calculate the hours accurately?
Many thanks for your support!
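One possible approach, sketched here in Python rather than SQL, is to first collect every (employee, date) pair that has at least one row with a non-empty job grade, and then take the hours from Table A for all rows on those dates. The tuple layout and the helper name are assumptions made for illustration. In Oracle the same flag could probably be computed with an analytic COUNT(job_grade) OVER (PARTITION BY emp_id, date) or an EXISTS subquery, but that is not tested here.

```python
from datetime import datetime

# (emp_id, time_in, time_out, job_grade, work_date); None marks an empty grade
table_a = [
    ("01", "8:00", "16:00", None, "2021-06-07"),
    ("01", "16:00", "18:00", "P", "2021-06-07"),
    ("01", "8:00", "16:00", None, "2021-06-08"),
    ("01", "8:00", "14:00", None, "2021-06-09"),
    ("01", "14:00", "16:00", "E", "2021-06-09"),
    ("01", "8:00", "16:00", None, "2021-06-10"),
]

# Every (employee, date) with at least one row in a different job grade.
mixed_days = {(emp, day) for emp, _, _, grade, day in table_a if grade}


def hours_between(time_in, time_out):
    """Hypothetical helper: hours between two H:MM strings on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(time_out, fmt) - datetime.strptime(time_in, fmt)
    return delta.total_seconds() / 3600


# On mixed days, take the hours of every row from Table A;
# all other days would keep the hours from Table B.
hours_from_a = [
    (emp, day, grade, hours_between(t_in, t_out))
    for emp, t_in, t_out, grade, day in table_a
    if (emp, day) in mixed_days
]

for row in hours_from_a:
    print(row)
```

Because the flag is a set of dates rather than a single variable, it handles any number of dates on which the grade changed.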

Reorder factored matrix columns in Power BI

I have a matrix visual in Power BI. The columns are departments and the rows years. The values are counts of people in each department each year. The departments obviously don't have a natural ordering, BUT I would like to reorder them using the total column count for each department in descending order.
For example, if Department C has 100 people total over the years (rows), and all the other departments have fewer, I want Department C to come first.
I have seen other solutions that add an index column, but this doesn't work very well for me because the "count of people" variable is what I want to index by and that doesn't already exist in my data. Rather it's a calculation based on individual people which each have a department and year.
If anyone can point me to an easy way of changing the column ordering/sorting that would be splendid!
| DeptA | DeptB | DeptC
------|-------|-------|-------
1900 | 2 | 5 | 10
2000 | 6 | 7 | 2
2010 | 10 | 1 | 12
2020 | 0 | 3 | 30
------|-------|-------|-------
Total | 18 | 16 | 54
Order: #2 #3 #1
I don't think there is a built-in way to do this like there is for sorting the rows (there should be though, so go vote for a similar idea here), but here's a possible workaround.
I will assume your source table is called Employees and looks something like this:
Department Year Value
A 1900 2
B 1900 5
C 1900 10
A 2000 6
B 2000 7
C 2000 2
A 2010 10
B 2010 1
C 2010 12
A 2020 0
B 2020 3
C 2020 30
First, create a new calculated table like this:
Depts = SUMMARIZE(Employees, Employees[Department], "Total", SUM(Employees[Value]))
This should give you a short table as follows:
Department Total
A 18
B 16
C 54
From this, you can easily rank the totals with a calculated column on this Depts table:
Rank = RANKX('Depts', 'Depts'[Total])
Make sure your new Depts table is related to the original Employees table on the Department column.
Under the Data tab, use Modeling > Sort by Column to sort Depts[Department] by Depts[Rank].
Finally, replace the Employees[Department] with Depts[Department] on your matrix visual, and the columns should appear in descending order of their totals (C, A, B for this sample).
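To check what the SUMMARIZE and RANKX steps should produce, here is a small Python sketch of the same calculation on the sample table. It only mirrors the logic outside Power BI; the variable names are illustrative, and unlike RANKX it does not give tied totals the same rank.

```python
from collections import defaultdict

# (department, year, value), the sample Employees table from above
employees = [
    ("A", 1900, 2), ("B", 1900, 5), ("C", 1900, 10),
    ("A", 2000, 6), ("B", 2000, 7), ("C", 2000, 2),
    ("A", 2010, 10), ("B", 2010, 1), ("C", 2010, 12),
    ("A", 2020, 0), ("B", 2020, 3), ("C", 2020, 30),
]

# Equivalent of the Depts table: one total per department.
totals = defaultdict(int)
for dept, year, value in employees:
    totals[dept] += value

# Equivalent of the Rank column: 1 for the largest total.
ordered = sorted(totals, key=lambda dept: totals[dept], reverse=True)
rank = {dept: position for position, dept in enumerate(ordered, start=1)}

print(dict(totals))
print(ordered)
print(rank)
```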

12 month rolling data from earliest invoice date - Hadoop

Seeking help with the following problem statement.
Input data set:
customer id | invoice date | item id | invoice amount | Comment
------------|--------------|---------|----------------|--------
1           | 10-Jan-2014  | 1       | 10             | Start of 12 month window - 10th Jan 2014 to 10th Jan 2015
1           | 20-Jan-2014  | 2       | 20             | Falls within 12 month window
1           | 21-Aug-2014  | 1       | 10             | Falls within 12 month window
1           | 31-Dec-2014  | 1       | 10             | Falls within 12 month window
1           | 20-Feb-2015  | 1       | 10             | Start of new 12 month window as this is post 10th Jan 2015
1           | 30-Mar-2016  | 1       | 10             | Start of new 12 month window as this is post 20th Feb 2016
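Desired output:
customer id | invoice date | item id | invoice amount | window | sum(amount where item id = 1)
------------|--------------|---------|----------------|--------|------------------------------
1           | 10-Jan-2014  | 1       | 10             | 1      | 10
1           | 20-Jan-2014  | 2       | 20             | 1      | 0
1           | 21-Aug-2014  | 1       | 10             | 1      | 20
1           | 31-Dec-2014  | 1       | 10             | 1      | 30
1           | 20-Feb-2015  | 1       | 10             | 2      | 10
1           | 30-Mar-2016  | 1       | 10             | 3      | 10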
I tried using the following query in Hive to achieve the above output but the challenge is in resetting the next window once we have crossed the 12 month mark. (Please refer to rows 5 and 6 in the input data set). The need is for these records to be considered as start of a new window.
Following Query Used:
SELECT SUM(if(item_id = 1, invoice_amount, 0)) OVER (
PARTITION BY customer_id
ORDER BY invoice_date ASC
RANGE BETWEEN 31556926 PRECEDING AND CURRENT ROW
) FROM INVOICE_DETAILS;
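A fixed RANGE frame slides along with every row, so by itself it cannot restart a window at the first invoice after the previous window has expired. That reset needs a sequential pass over each customer's invoices. The Python sketch below shows the intended logic on the sample data. It assumes the window end date is inclusive and that rows for other items report 0, as in the desired output. It illustrates the rule only and is not a Hive solution.

```python
from datetime import date


def add_one_year(d):
    """Same day one year later; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


# (customer_id, invoice_date, item_id, invoice_amount)
invoices = [
    (1, date(2014, 1, 10), 1, 10),
    (1, date(2014, 1, 20), 2, 20),
    (1, date(2014, 8, 21), 1, 10),
    (1, date(2014, 12, 31), 1, 10),
    (1, date(2015, 2, 20), 1, 10),
    (1, date(2016, 3, 30), 1, 10),
]

state = {}   # customer_id -> [window number, window end, running sum for item 1]
result = []
for customer, inv_date, item, amount in sorted(invoices):
    st = state.get(customer)
    # First invoice, or first invoice after the window expired: start a new window.
    if st is None or inv_date > st[1]:
        st = [st[0] + 1 if st else 1, add_one_year(inv_date), 0]
        state[customer] = st
    if item == 1:
        st[2] += amount
    result.append((customer, inv_date, item, amount, st[0], st[2] if item == 1 else 0))

for row in result:
    print(row)
```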

Active IDs in date ranges

I have a requirement to get the number of customers active in each month, based on their revenue contribution period.
Original Data:
ACCOUNT_ID REVENUE_START_DATE REVENUE_END_DATE
1234 1/14/2010 0:00 4/13/2010 23:59
4567 2/9/2010 0:00 3/8/2010 23:59
1234 5/9/2010 0:00 6/8/2010 23:59
Expected Result
Month Count
Dec-09 0
Jan-10 1
Feb-10 2
Mar-10 2
Apr-10 1
May-10 1
Jun-10 1
Jul-10 0
Aug-10 0
Sep-10
Oct-10
Below is the Oracle code I worked on (with the help of Google), but I am not getting the correct result because of overlapping dates. I would appreciate any help with this. Thanks in advance.
Current Result:
YEAR_ MONTH_ ACT
2010 January 2
2010 February 3
2010 March 3
2010 April 3
ORACLE CODE:
with tab as
(
select distinct ACCOUNT_ID, billing_start_date as revenue_start_date, billing_end_date as revenue_end_date
from accounts
),
year_tab as
(
select
add_months(min_date, level -1) m
from
(
select min(trunc(revenue_start_date,'YYYY')) min_date, add_months(max(trunc(revenue_end_date,'YYYY')), 12) max_date
from tab
)
connect by level <= months_between(max_date, min_date)
)
select to_char(m,'YYYY') year_,
to_char(m,'Month') month_,
nvl(act, 0) act
from year_tab,
(
select m date_,count(*) act
from tab, year_tab
where m between trunc(revenue_start_date,'MM') and trunc(revenue_end_date,'MM')
group by m
) month_tab
where m = date_(+)
order by m;
It's taken me a while to see why you think there is a problem. With the original three rows of data you supplied, running your query gives exactly your 'expected result'. With the 54 rows of data from your CSV file, the result is 48 rows (covering four years), with non-zero totals from January 2010 to January 2013. The first few rows returned are:
YEAR_ MONTH_ ACT
----- ------------------------------------ ----------
2010 January 2
2010 February 3
2010 March 3
2010 April 3
2010 May 2
But that looks correct:
select * from accounts
where not (billing_start_date > date '2010-02-01'
or billing_end_date < date '2010-01-01');
ACCOUNT_ID BILLING_START_DATE BILLING_END_DATE
---------- ------------------ ------------------
1234 09/01/2010 00:00 08/02/2010 23:59
4567 14/01/2010 00:00 13/04/2010 23:59
2 rows selected
select * from accounts
where not (billing_start_date > date '2010-03-01'
or billing_end_date < date '2010-02-01');
ACCOUNT_ID BILLING_START_DATE BILLING_END_DATE
---------- ------------------ ------------------
1234 09/01/2010 00:00 08/02/2010 23:59
4567 14/01/2010 00:00 13/04/2010 23:59
1234 09/02/2010 00:00 08/03/2010 23:59
3 rows selected
select * from accounts
where not (billing_start_date > date '2010-04-01'
or billing_end_date < date '2010-03-01');
ACCOUNT_ID BILLING_START_DATE BILLING_END_DATE
---------- ------------------ ------------------
4567 14/01/2010 00:00 13/04/2010 23:59
1234 09/02/2010 00:00 08/03/2010 23:59
1234 09/03/2010 00:00 08/04/2010 23:59
3 rows selected
But what I think you wanted wasn't really stressed in the question: 'to get number of customers active'. Assuming that by 'customer' you mean unique account IDs, you just need to modify the count:
select m date_,count(distinct account_id) act
from tab, year_tab
...
... which gives the first few rows as:
YEAR_ MONTH_ ACT
----- ------------------------------------ ----------
2010 January 2
2010 February 2
2010 March 2
2010 April 2
2010 May 1
What you were doing wrong was trying to apply the distinct in your tab subquery; but distinct returns distinct rows, and as the dates were different that wasn't actually reducing the number of rows returned.
That still doesn't quite match your expected result, but it does seem to match the data (if my assumption about what you want is right), and it still gives your expected result for your three-row sample.
Another way to write the query, which I find a bit easier to follow, and using ANSI join syntax:
with t as (
select add_months(min_date, level - 1) month_start,
add_months(min_date, level) next_month_start
from (
select trunc(min(billing_start_date),'YYYY') min_date,
add_months(trunc(max(billing_start_date),'YYYY'), 12) max_date
from accounts
)
connect by level <= months_between(max_date, min_date)
)
select to_char(t.month_start,'YYYY') year_,
to_char(t.month_start,'Month') month_,
count(distinct a.account_id) act
from t
left join accounts a on not (billing_start_date > t.next_month_start
or billing_end_date < t.month_start)
group by t.month_start
order by t.month_start;
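The overlap test plus the distinct count can be checked outside the database with a short Python sketch on the three-row sample from the question. The helper names are illustrative assumptions, and the sketch treats a period that starts exactly at the start of the next month as belonging to that next month only, which is slightly stricter than the NOT (... > ...) condition above.

```python
from datetime import datetime

# (account_id, revenue_start, revenue_end), the three-row sample from the question
accounts = [
    (1234, datetime(2010, 1, 14, 0, 0), datetime(2010, 4, 13, 23, 59)),
    (4567, datetime(2010, 2, 9, 0, 0), datetime(2010, 3, 8, 23, 59)),
    (1234, datetime(2010, 5, 9, 0, 0), datetime(2010, 6, 8, 23, 59)),
]


def next_month(start):
    """First day of the month after start."""
    return datetime(start.year + start.month // 12, start.month % 12 + 1, 1)


def month_starts(first, last):
    """Yield the first day of each month from first through last, inclusive."""
    current = first
    while current <= last:
        yield current
        current = next_month(current)


active = {}
for start in month_starts(datetime(2009, 12, 1), datetime(2010, 7, 1)):
    end = next_month(start)
    # A set keeps each account once per month: the COUNT(DISTINCT ...) step.
    ids = {acc for acc, begin, finish in accounts if begin < end and finish >= start}
    active[(start.year, start.month)] = len(ids)

for month, count in active.items():
    print(month, count)
```

Account 1234 has two periods, but it can never be counted twice in one month because the set keeps one entry per account.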
