Active IDs in date ranges - Oracle

I have a requirement to get the number of customers active in each month, based on their revenue contribution period.
Original Data:
ACCOUNT_ID REVENUE_START_DATE REVENUE_END_DATE
1234 1/14/2010 0:00 4/13/2010 23:59
4567 2/9/2010 0:00 3/8/2010 23:59
1234 5/9/2010 0:00 6/8/2010 23:59
Expected Result
Month Count
Dec-09 0
Jan-10 1
Feb-10 2
Mar-10 2
Apr-10 1
May-10 1
Jun-10 1
Jul-10 0
Aug-10 0
Sep-10
Oct-10
Below is the Oracle code I worked on (with help from Google), but I am not getting the correct result due to overlapping dates. I request the experts to help me with this. (Thanks in advance.)
Current Result:
YEAR_ MONTH_ ACT
2010 January 2
2010 February 3
2010 March 3
2010 April 3
ORACLE CODE:
with tab as (
  select distinct ACCOUNT_ID,
         billing_start_date as revenue_start_date,
         billing_end_date   as revenue_end_date
  from   accounts
),
year_tab as (
  select add_months(min_date, level - 1) m
  from (
    select min(trunc(revenue_start_date, 'YYYY')) min_date,
           add_months(max(trunc(revenue_end_date, 'YYYY')), 12) max_date
    from   tab
  )
  connect by level <= months_between(max_date, min_date)
)
select to_char(m, 'YYYY') year_,
       to_char(m, 'Month') month_,
       nvl(act, 0) act
from   year_tab,
       (
         select m date_, count(*) act
         from   tab, year_tab
         where  m between trunc(revenue_start_date, 'MM')
                      and trunc(revenue_end_date, 'MM')
         group  by m
       ) month_tab
where  m = date_(+)
order  by m;

It's taken me a while to see why you think there is a problem. With the original three rows of data you supplied, running your query gives exactly your 'expected result'. With the 54 rows of data from your CSV file, the result is 48 rows (covering four years), with non-zero totals from January 2010 to January 2013. The first few rows returned are:
YEAR_ MONTH_ ACT
----- ------------------------------------ ----------
2010 January 2
2010 February 3
2010 March 3
2010 April 3
2010 May 2
But that looks correct:
select * from accounts
where not (billing_start_date > date '2010-02-01'
or billing_end_date < date '2010-01-01');
ACCOUNT_ID BILLING_START_DATE BILLING_END_DATE
---------- ------------------ ------------------
1234 09/01/2010 00:00 08/02/2010 23:59
4567 14/01/2010 00:00 13/04/2010 23:59
2 rows selected
select * from accounts
where not (billing_start_date > date '2010-03-01'
or billing_end_date < date '2010-02-01');
ACCOUNT_ID BILLING_START_DATE BILLING_END_DATE
---------- ------------------ ------------------
1234 09/01/2010 00:00 08/02/2010 23:59
4567 14/01/2010 00:00 13/04/2010 23:59
1234 09/02/2010 00:00 08/03/2010 23:59
3 rows selected
select * from accounts
where not (billing_start_date > date '2010-04-01'
or billing_end_date < date '2010-03-01');
ACCOUNT_ID BILLING_START_DATE BILLING_END_DATE
---------- ------------------ ------------------
4567 14/01/2010 00:00 13/04/2010 23:59
1234 09/02/2010 00:00 08/03/2010 23:59
1234 09/03/2010 00:00 08/04/2010 23:59
3 rows selected
But what I think you wanted wasn't really stressed in the question: 'to get number of customers active'. Assuming that by 'customer' you mean unique account IDs, you just need to modify the count:
select m date_,count(distinct account_id) act
from tab, year_tab
...
... which gives the first few rows as:
YEAR_ MONTH_ ACT
----- ------------------------------------ ----------
2010 January 2
2010 February 2
2010 March 2
2010 April 2
2010 May 1
What you were doing wrong was trying to apply the distinct in your tab subquery; but distinct returns distinct rows, and as the dates were different that wasn't actually reducing the number of rows returned.
Which still doesn't quite match your expected result, but it does seem to match the data (if my assumption about what you want is right), and it still gives your expected result for your three-row sample.
Another way to write the query, which I find a bit easier to follow, and using ANSI join syntax:
with t as (
  select add_months(min_date, level - 1) month_start,
         add_months(min_date, level) next_month_start
  from (
    select trunc(min(billing_start_date), 'YYYY') min_date,
           add_months(trunc(max(billing_end_date), 'YYYY'), 12) max_date
    from   accounts
  )
  connect by level <= months_between(max_date, min_date)
)
select to_char(t.month_start, 'YYYY') year_,
       to_char(t.month_start, 'Month') month_,
       count(distinct a.account_id) act
from t
left join accounts a
       on not (a.billing_start_date >= t.next_month_start
               or a.billing_end_date < t.month_start)
group by t.month_start
order by t.month_start;

Note that next_month_start is the first instant of the following month, so the start-date comparison needs >= rather than > to avoid counting a billing period that begins exactly at that instant; and the upper bound for the calendar is taken from the latest end date, not the latest start date, so the generated months cover the whole data range.
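The month-overlap test at the heart of both queries can be sketched outside SQL. This is a minimal Python illustration, with hypothetical tuples mirroring the question's three-row sample, of counting distinct active accounts per month:

```python
from datetime import date

# Hypothetical tuples mirroring the question's three-row sample:
# (account_id, revenue_start_date, revenue_end_date)
rows = [
    (1234, date(2010, 1, 14), date(2010, 4, 13)),
    (4567, date(2010, 2, 9),  date(2010, 3, 8)),
    (1234, date(2010, 5, 9),  date(2010, 6, 8)),
]

def active_accounts(rows, year, month):
    """Distinct accounts whose [start, end] range overlaps the given month."""
    lo = date(year, month, 1)
    hi = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    # Same overlap test as the SQL: NOT (starts after the month OR ends before it)
    return {acc for acc, start, end in rows if not (start >= hi or end < lo)}

counts = {(2010, m): len(active_accounts(rows, 2010, m)) for m in range(1, 13)}
```

Running this reproduces the expected result for the sample: one account active in January, two in February and March, one each in April through June, and zero afterwards.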

Related

FIFO inventory aging report using a single query in T-SQL

I've got an inventory transactions table:

Product  Date    Direction  Quantity
-------  ------  ---------  --------
A        Date 1  IN         3
B        Date 2  IN         55.7
A        Date 3  OUT        1
B        Date 3  OUT        8
B        Date 3  IN         2
I can easily get the stock for any date with the following query:
SELECT Product,
       SUM(CASE Direction WHEN 'IN' THEN Quantity ELSE -1 * Quantity END)
FROM   Transactions
WHERE  Date <= '#DateValue#'
GROUP  BY Product;
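The signed-sum logic of that query can be sketched procedurally; a minimal Python illustration, with hypothetical tuples standing in for the Transactions rows:

```python
from datetime import date

# Hypothetical transaction tuples: (product, date, direction, quantity)
transactions = [
    ('A', date(2024, 1, 5), 'IN', 3),
    ('B', date(2024, 1, 9), 'IN', 55.7),
    ('A', date(2024, 2, 1), 'OUT', 1),
    ('B', date(2024, 2, 1), 'OUT', 8),
    ('B', date(2024, 2, 1), 'IN', 2),
]

def stock(transactions, as_of):
    """Net stock per product up to a cutoff date: IN adds, OUT subtracts."""
    totals = {}
    for product, d, direction, qty in transactions:
        if d <= as_of:
            totals[product] = totals.get(product, 0) + (qty if direction == 'IN' else -qty)
    return totals
```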
Now my purpose is to get stocks aged like this, using the FIFO principle:

Product  Total stock  0-30 days  31-60 days  61-90 days  91+ days
-------  -----------  ---------  ----------  ----------  --------
A        3            3          0           0           0
B        34.2         10         14.2        7           3
C        25           20         3           1           1
D        10           2          8           0           0
E        1            0          0           1           0
I am using SQL Server 2016 & SSMS 18.
The solution should be fast as it will be working against a table with 3,000,000+ rows.
A single query is preferred since it will be integrated into an ERP system.
I have yet to find a solution based on a single query after weeks of research. Any help is appreciated. Thanks in advance.
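The FIFO aging described above can at least be sketched procedurally, under the assumption that each OUT consumes the oldest IN lots first and the surviving lots are aged from their IN date. This is a minimal Python illustration with hypothetical data, not a T-SQL answer:

```python
from collections import defaultdict, deque
from datetime import date

def fifo_age_buckets(transactions, as_of):
    """Remaining stock per product, bucketed by lot age (0-30, 31-60, 61-90, 91+).
    FIFO: each OUT consumes the oldest IN lots first, so surviving stock
    is the most recently received."""
    lots = defaultdict(deque)  # product -> deque of [lot_date, remaining_qty]
    for product, d, direction, qty in sorted(transactions, key=lambda t: t[1]):
        if direction == 'IN':
            lots[product].append([d, qty])
        else:
            remaining = qty
            q = lots[product]
            while remaining > 0 and q:
                take = min(q[0][1], remaining)   # drain the oldest lot first
                q[0][1] -= take
                remaining -= take
                if q[0][1] == 0:
                    q.popleft()
    report = {}
    for product, q in lots.items():
        buckets = [0.0, 0.0, 0.0, 0.0]
        for lot_date, qty in q:
            age = (as_of - lot_date).days
            idx = 0 if age <= 30 else 1 if age <= 60 else 2 if age <= 90 else 3
            buckets[idx] += qty
        report[product] = buckets
    return report

# Hypothetical data: two IN lots and one OUT for product A
txns = [
    ('A', date(2024, 1, 1),  'IN',  10),
    ('A', date(2024, 2, 15), 'IN',  5),
    ('A', date(2024, 3, 1),  'OUT', 8),
]
```

With these rows, the OUT of 8 consumes the January lot down to 2, so as of 2024-03-10 product A shows 5 units in the 0-30 bucket and 2 in the 61-90 bucket.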

Oracle Archive and Purge Options

I am trying to figure out what are the best options to perform archive and purge given our situation.
We have roughly 50 million records in, say, Table A. We want to archive data into a target table and then purge that data from the source table. We would like to retain data based on several criteria that overlap with each other. For example, we want to retain the data from the past 5 months in addition to keeping all the records with, say, Indicator='True'. Indicator='True' will likely return records beyond 5 months, which means I have to use an OR condition to capture the data. Based on these conditions, we would need to retain 10 million records and archive/purge 40 million records. I would need to create a process that runs every 6 months to do this.
My question is, what are the most efficient options for me to get this done for both archiving and purging? Would a PROC/bulk delete/insert be my best option?
Partitioning seems to be out of the question, since several conditions overlap with each other.
Use composite partitioning, e.g. range (for your time dimension) and list (to distinguish between the rows that should be kept long-term and those kept for a limited time).
Example
The rows with KEEP_ID='N' should be eliminated after 5 months.
CREATE TABLE tab
( id NUMBER(38,0),
trans_dt DATE,
keep_id VARCHAR2(1)
)
PARTITION BY RANGE (trans_dt) INTERVAL (NUMTOYMINTERVAL(1,'MONTH'))
SUBPARTITION BY LIST (keep_id)
SUBPARTITION TEMPLATE
( SUBPARTITION p_catalog VALUES ('Y'),
SUBPARTITION p_internet VALUES ('N')
)
(PARTITION p_init VALUES LESS THAN (TO_DATE('01-JAN-2019','dd-MON-yyyy'))
);
Populate with sample data for 6 months
insert into tab (id, trans_dt, keep_id)
select rownum, add_months(date'2019-08-01', trunc((rownum-1) / 2)), decode(mod(rownum,2),0,'Y','N')
from dual connect by level <= 12;
select * from tab
order by trans_dt, keep_id;
ID TRANS_DT KEEP_ID
---------- ------------------- -------
1 01.08.2019 00:00:00 N --- this subpartition should be deleted
2 01.08.2019 00:00:00 Y
3 01.09.2019 00:00:00 N
4 01.09.2019 00:00:00 Y
5 01.10.2019 00:00:00 N
6 01.10.2019 00:00:00 Y
7 01.11.2019 00:00:00 N
8 01.11.2019 00:00:00 Y
9 01.12.2019 00:00:00 N
10 01.12.2019 00:00:00 Y
11 01.01.2020 00:00:00 N
12 01.01.2020 00:00:00 Y
Now use partition extended names to reference the subpartition that should be dropped.
Drop subpartition older than 5 months, but only for KEEP_ID = 'N'
alter table tab drop subpartition for (DATE'2019-08-01','N');
New data
ID TRANS_DT KEEP_ID
---------- ------------------- -------
2 01.08.2019 00:00:00 Y
3 01.09.2019 00:00:00 N
4 01.09.2019 00:00:00 Y
.....

How to complete report derived from another query with zeros or nulls

So, I'm really having a hard time with a report.
I need a report grouped by year. For example, we want to show how many cars are sold per year. So, the report has a column Year, the type/model of the car, and the quantity. However, I also want to show a row with null/zero value, so even when no car of a specific type was sold, I want it to show the row, but with 0.
The problem is, this query is based on a lot of views, which shows each transaction. So, my actual query works fine except it doesn't show a type when none of that type was sold in a year.
When I pivot this report using Oracle APEX, it almost works. It shows all the types, but if I filter per year, then they are gone.
I have all the years I need, but I don't have the data for that year. I take all the data from multiple views with the specifics of the sales. Some of the models/types were not sold in some years, so when I query the report, it doesn't show up, which is expected. For example:
What I get is
//YEAR - MODEL - QUANTITY //
2018 - MODEL 1 - 300
2018 - MODEL 2 - 12
2017 - MODEL 1 - 12
2017 - MODEL 2 - 33
2017 - MODEL 3 - 22
What I want
//YEAR - MODEL - QUANTITY //
2018 - MODEL 1 - 300
2018 - MODEL 2 - 12
2018 - MODEL 3 - 0
2017 - MODEL 1 - 12
2017 - MODEL 2 - 33
2017 - MODEL 3 - 22
Any ideas?
You can conjure rows, and outer join to them.
with years as (
  select add_months(date '1980-1-1', (rownum - 1) * 12) dt
  from   dual
  connect by level < 5
)
select y.dt, count(e.hiredate)
from scott.emp e
right outer join years y
  on y.dt = trunc(e.hiredate, 'yy')
group by y.dt
DT COUNT(E.HIREDATE)
------------------- -----------------
01-01-1982 00:00:00 1
01-01-1983 00:00:00 0
01-01-1981 00:00:00 10
01-01-1980 00:00:00 1
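The same conjure-and-outer-join idea in miniature, with hypothetical Python dicts standing in for the grouped query and the generated year rows:

```python
# Hypothetical per-year counts, as a grouped query would return them
# (years with no sales/hires are simply absent from the result):
hired_per_year = {1980: 1, 1981: 10, 1982: 1}

# Conjure the full set of report years, then "outer join":
# years missing from the query result default to 0.
years = range(1980, 1984)
report = {y: hired_per_year.get(y, 0) for y in years}
```

The conjured side drives the output, so every year appears even when the data side contributes nothing, which is exactly what the right outer join achieves in the SQL above.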

Reorder factored matrix columns in Power BI

I have a matrix visual in Power BI. The columns are departments and the rows years. The values are counts of people in each department each year. The departments obviously don't have a natural ordering, BUT I would like to reorder them using the total column count for each department in descending order.
For example, if Department C has 100 people total over the years (rows), and all the other departments have fewer, I want Department C to come first.
I have seen other solutions that add an index column, but this doesn't work very well for me because the "count of people" variable is what I want to index by and that doesn't already exist in my data. Rather it's a calculation based on individual people which each have a department and year.
If anyone can point me to an easy way of changing the column ordering/sorting that would be splendid!
      | DeptA | DeptB | DeptC
------|-------|-------|-------
 1900 |     2 |     5 |    10
 2000 |     6 |     7 |     2
 2010 |    10 |     1 |    12
 2020 |     0 |     3 |    30
------|-------|-------|-------
Total |    18 |    16 |    54
Order:    #2      #3      #1
I don't think there is a built-in way to do this like there is for sorting the rows (there should be though, so go vote for a similar idea here), but here's a possible workaround.
I will assume your source table is called Employees and looks something like this:
Department Year Value
A 1900 2
B 1900 5
C 1900 10
A 2000 6
B 2000 7
C 2000 2
A 2010 10
B 2010 1
C 2010 12
A 2020 0
B 2020 3
C 2020 30
First, create a new calculated table like this:
Depts = SUMMARIZE(Employees, Employees[Department], "Total", SUM(Employees[Value]))
This should give you a short table as follows:
Department Total
A 18
B 16
C 54
From this, you can easily rank the totals with a calculated column on this Depts table:
Rank = RANKX('Depts', 'Depts'[Total])
Make sure your new Depts table is related to the original Employees table on the Department column.
Under the Data tab, use Modeling > Sort by Column to sort Depts[Department] by Depts[Rank].
Finally, replace the Employees[Department] with Depts[Department] on your matrix visual and you should get the following:

PIG Script How to

I am trying to clean up this employee volunteer data. There is no way to track whether an employee is already a registered volunteer, so he can sign up as a new volunteer and will get a new VOLUNTEER_ID. I have a data feed that lets me tie each VOLUNTEER_ID to its EMP_ID. The volunteer data needs to be cleaned up so we can figure out how each employee moved from one volunteer level to another, and when.
The business logic is that when dates overlap, we give the employee the highest level for the timeframe between START_DATE and END_DATE.
I posted an input sample of the data and what the output should be.
Is it possible to do this in a Pig script? Can someone please help me?
INPUT:
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 200 1 A 5/1/2006
10001 100 1 A 1/1/2008
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
10001 300 2 A 12/2/2008
OUTPUT NEEDED:( VOLUNTEER_ID is not needed in output but adding below to show which ID was selected for output and which did not)
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
It seems like you want the row in your data with the earliest start date for each (EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS) combination.
First we add a unix-time column and then find the min of that column (ToUnixTime is only in recent versions of Pig, so you may need to update your version).
data_with_unix = foreach data generate EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS, START_DATE, END_DATE, ToUnixTime((datetime)START_DATE) as unix_time;
grp = group data_with_unix by (EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS);
min_date = foreach grp generate group, MIN(data_with_unix.unix_time);
Then join the start and end dates back into your dataset, since it doesn't look like there is currently a way to convert unix time back to a date.
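The grouping step (keep the earliest START_DATE per group) can be sketched in Python with hypothetical tuples standing in for the relation:

```python
# Hypothetical rows: (emp_id, volunteer_id, v_level, status, start_date)
rows = [
    (10001, 300, 3, 'A', '2008-03-02'),
    (10001, 300, 3, 'A', '2008-06-15'),   # later duplicate of the same group
    (10001, 1001, 2, 'A', '2008-05-01'),
]

# ISO-format date strings sort chronologically, so plain < comparison works here.
earliest = {}
for emp, vol, level, status, start in rows:
    key = (emp, vol, level, status)
    if key not in earliest or start < earliest[key]:
        earliest[key] = start
```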
