12-month rolling data from earliest invoice date (Hadoop)

Seeking help with the following problem statement.
I/P Data Set:
customer_id | invoice_date | item_id | invoice_amount | comment
1 | 10-Jan-2014 | 1 | 10 | Start of 12-month window (10th Jan 2014 to 10th Jan 2015)
1 | 20-Jan-2014 | 2 | 20 | Falls within 12-month window
1 | 21-Aug-2014 | 1 | 10 | Falls within 12-month window
1 | 31-Dec-2014 | 1 | 10 | Falls within 12-month window
1 | 20-Feb-2015 | 1 | 10 | Start of new 12-month window, as this is after 10th Jan 2015
1 | 30-Mar-2016 | 1 | 10 | Start of new 12-month window, as this is after 20th Feb 2016
Desired o/p
customer_id | invoice_date | item_id | invoice_amount | window | sum(amount where item_id = 1)
1 | 10-Jan-2014 | 1 | 10 | 1 | 10
1 | 20-Jan-2014 | 2 | 20 | 1 | 0
1 | 21-Aug-2014 | 1 | 10 | 1 | 20
1 | 31-Dec-2014 | 1 | 10 | 1 | 30
1 | 20-Feb-2015 | 1 | 10 | 2 | 10
1 | 30-Mar-2016 | 1 | 10 | 3 | 10
I tried the following query in Hive to achieve the above output, but the challenge is resetting the window once we have crossed the 12-month mark (see rows 5 and 6 of the input data set). Those records need to be treated as the start of a new window.
Query used:
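SELECT customer_id, invoice_date, item_id, invoice_amount,
  SUM(IF(item_id = 1, invoice_amount, 0)) OVER (
    PARTITION BY customer_id
    ORDER BY invoice_date ASC
    -- 31556926 seconds is about one year, so this assumes invoice_date is a Unix timestamp in seconds.
    -- This frame slides with each row; it does not restart when a 12-month window ends.
    RANGE BETWEEN 31556926 PRECEDING AND CURRENT ROW
  ) AS item1_amount
FROM INVOICE_DETAILS;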
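For comparison, here is a minimal Python sketch of the windowing rule the desired output describes (not Hive code). A `RANGE ... PRECEDING` frame slides along with each row, whereas here each window is anchored at the first invoice that falls after the previous window's end date, so every window start depends on the one before it and the rows have to be walked in order per customer. The names `add_months` and `anchored_windows` are hypothetical, the window end date is assumed to be inclusive (an invoice on 10-Jan-2015 would still belong to the first window), and rows whose item is not 1 report 0, as in the desired output.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day (29 Feb + 12 months -> 28 Feb)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def anchored_windows(rows, months=12, item=1):
    """rows: (customer_id, invoice_date, item_id, invoice_amount), sorted by
    customer_id then invoice_date. Returns each row plus (window, item_sum)."""
    result = []
    customer = None
    for cust, inv_date, item_id, amount in rows:
        if cust != customer:
            # The first invoice of a customer opens window 1.
            customer, window, start, running = cust, 1, inv_date, 0
        elif inv_date > add_months(start, months):
            # The first invoice after the window's end date opens the next window.
            window, start, running = window + 1, inv_date, 0
        if item_id == item:
            running += amount
        # Rows for other items show 0, matching the desired output.
        result.append((cust, inv_date, item_id, amount, window,
                       running if item_id == item else 0))
    return result
```

In Hive, one possible route (an assumption, not tested here) is to run logic like this per customer through a `TRANSFORM` script after `DISTRIBUTE BY customer_id SORT BY customer_id, invoice_date`, since a plain window frame cannot express a start that depends on the previous window.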

Related

Power Query - Merge tables - Sum values with conditions

I'm new to Power Query and I can't figure out how to do the following:
I have two tables:
"REQUESTED"
Date | ReqNumber | Client | SKU | ReqQuantity
Jan-01 | Z10 | 1 | A | 2
Feb-05 | Z11 | 1 | A | 3
"SENT"
Date | Client | SKU | Quantity
Jan-15 | 1 | A | 1
Feb-02 | 1 | A | 3
Mar-10 | 1 | A | 5
What I want is to merge the tables so that I can filter by date and see the corresponding quantities requested and sent.
For example:
If I filter dates between Jan-01 and Jan-31, I should get the following:
"REQUESTED - SENT"
Date | ReqNumber | Client | SKU | ReqQuantity | SentQuantity
Jan-01 | Z10 | 1 | A | 2 | 1
If I filter dates between Jan-01 and Feb-28, I should get the following:
"REQUESTED - SENT"
Date | ReqNumber | Client | SKU | ReqQuantity | SentQuantity
Jan-01 | Z10 | 1 | A | 2 | 2
Feb-05 | Z11 | 1 | A | 3 | 2
If I filter dates between Jan-01 and Mar-15, I should get the following:
"REQUESTED - SENT"
Date | ReqNumber | Client | SKU | ReqQuantity | SentQuantity
Jan-01 | Z10 | 1 | A | 2 | 2
Feb-05 | Z11 | 1 | A | 3 | 3
Is this possible in Power Query?
Thanks!
I believe your question is a simple one that might get passed over because it is simple.
You have two tables, "Sent" and "Requested".
Merge Queries is under the Home tab.
Set "Requested" as the left table and "Sent" as the right table, and select the Date column in both.
In the new Sent column, click the expand icon.
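The expected results suggest that the quantity sent inside the date filter is allocated to requests oldest-first and capped at each request's ReqQuantity: by Feb-28 a total of 4 has been sent, 2 of which fill Z10 and the remaining 2 go to Z11. Note that in this sample no request shares a date with a shipment, so a merge keyed on Date alone returns no matching Sent rows. The Python sketch below (outside Power Query) illustrates that FIFO allocation per Client and SKU; the function name `allocate_sent`, the year used for the sample dates, and applying the same date filter to both tables are all assumptions.

```python
from collections import defaultdict
from datetime import date


def allocate_sent(requested, sent, start, end):
    """requested: (date, req_number, client, sku, req_qty)
    sent: (date, client, sku, qty)
    Returns (date, req_number, client, sku, req_qty, sent_qty) for requests
    dated within [start, end]."""
    # Total quantity sent per (client, sku) inside the date filter.
    available = defaultdict(int)
    for d, client, sku, qty in sent:
        if start <= d <= end:
            available[(client, sku)] += qty
    result = []
    # Oldest requests are filled first (FIFO), each capped at its own quantity.
    for d, req_no, client, sku, req_qty in sorted(requested):
        if not (start <= d <= end):
            continue
        key = (client, sku)
        filled = min(req_qty, available[key])
        available[key] -= filled
        result.append((d, req_no, client, sku, req_qty, filled))
    return result
```

In Power Query the same idea would roughly mean grouping Sent by Client and SKU, then comparing a running total of ReqQuantity against that sum; the exact steps are left open here.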

Calculate total time based on column in Tableau

I have a table like below:
From Date | Issue Id | Issue Id (group) | Status | Till Date
19-07-2021 17:21 | 4 | 4 | Approved | 19-07-2021 17:23
19-07-2021 17:23 | 4 | 4 | In Progress | 19-07-2021 17:23
19-07-2021 17:23 | 4 | 4 | In Review | 19-07-2021 17:25
19-07-2021 17:25 | 4 | 4 | In Progress | 19-07-2021 18:56
19-07-2021 18:56 | 4 | 4 | In Review | 20-07-2021 08:47
20-07-2021 08:47 | 4 | 4 | Resolved | 20-07-2021 14:45
20-07-2021 14:45 | 4 | 4 | Closed | (null)
12-07-2021 10:49 | 4 | 4 | Open | 19-07-2021 17:21
27-04-2016 09:07 | 3 | 3 | Open | 10-01-2017 08:40
10-01-2017 08:40 | 3 | 3 | Closed | (null)
10-01-2017 08:40 | 3 | 3 | Resolved | 10-01-2017 08:40
I need to do the following things:
For Issue Id 4, find the total time (in hours, minutes, seconds or days) spent in a particular status. For example, there are two In Review rows, so the total time is measured from their From Date to their Till Date values: 17:23 (19-07) until 8:47 (20-07).
Calculate the total time an issue spends between In Review and Closed (the Till Date for closed issues is unfortunately null).
Basically, I am trying to create a dashboard that shows, for each issue ID, how long the issue was "In Review" or "In Progress" before it was closed. The dashboard will have "Issue Id" on the X axis and "Total Time for Review" or "Total Time for Progress" on the Y axis. For example, issue 4 was in the "In Progress" state for a total of 1:31:01 hours (17:23 to 17:23 on 19th July and 17:25 to 18:56 on 19th July).
I am trying this:
IF [STATUS] = 'In progress' and [STATUS] = 'Closed'
THEN
DATEDIFF('day',[Date Create],[Till Date])
END
but it says the tables can only be aggregated, and only with the Count function.
Can someone please help? How can I create a calculated field for the above scenarios?
Think of your IF statement as being applied to each row: a single row cannot have a status that is both In Progress and Closed.
I would arrange the text table like this:
Columns: Status
Rows: Issue ID (group) | Issue ID
Text Mark: Calculated Field (Named something like Total Time).
That will group all of the statuses together. You can change the aliases of the status if you want to say "Total Time for ..."
Then your calculated field would be:
DATEDIFF("day", [From Date], [Till Date])
Make sure the pill you drag onto the view is aggregated as SUM. That will collapse everything at the status level and then total the days.
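As a cross-check of the answer's approach (one duration per row, summed per issue and status), here is a small Python sketch using the issue 4 rows from the table. It works in minutes rather than days: `DATEDIFF("day", ...)` would count 0 for a 17:25 to 18:56 stretch on the same day, so a finer date part such as "minute" is likely needed in Tableau too. The function name `total_time_by_status` is hypothetical, and rows with a null Till Date (Closed) are skipped.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%d-%m-%Y %H:%M"  # matches values such as "19-07-2021 17:21"


def total_time_by_status(rows):
    """rows: (from_date, issue_id, status, till_date); till_date may be None.
    Returns {(issue_id, status): total minutes spent in that status}."""
    totals = defaultdict(float)
    for from_s, issue, status, till_s in rows:
        if till_s is None:
            continue  # open-ended status (e.g. Closed) has no duration to add
        delta = datetime.strptime(till_s, FMT) - datetime.strptime(from_s, FMT)
        totals[(issue, status)] += delta.total_seconds() / 60
    return dict(totals)
```

For issue 4 this gives 91 minutes (1:31) for In Progress and 833 minutes for In Review.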

FIFO inventory aging report using a single query in T-SQL

I've got an inventory transactions table:
Product | Date | Direction | Quantity
A | Date 1 | IN | 3
B | Date 2 | IN | 55.7
A | Date 3 | OUT | 1
B | Date 3 | OUT | 8
B | Date 3 | IN | 2
I can easily get the stock for any date with the following query:
SELECT Product,
SUM(CASE Direction WHEN 'IN' THEN Quantity ELSE -1 * Quantity END)
FROM Transactions
WHERE Date <= '#DateValue#'
GROUP BY Product;
Now I want to age the stock like this, using the FIFO principle:
Product | Total stock | 0-30 days | 31-60 days | 61-90 days | 91+ days
A | 3 | 3 | 0 | 0 | 0
B | 34.2 | 10 | 14.2 | 7 | 3
C | 25 | 20 | 3 | 1 | 1
D | 10 | 2 | 8 | 0 | 0
E | 1 | 0 | 0 | 1 | 0
I am using SQL Server 2016 & SSMS 18.
The solution should be fast as it will be working against a table with 3,000,000+ rows.
A single query is preferred since it will be integrated into an ERP system.
I have yet to find a solution based on a single query after weeks of research. Any help is appreciated. Thanks in advance.
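The FIFO rule itself can be sketched in a few lines of Python (not the single T-SQL query requested): OUT movements consume the oldest receipts, so whatever is still on hand is made up of the most recent IN rows. Walking the receipts newest-first until the current stock is covered, and bucketing each slice by its age, gives the report columns. The function name `fifo_aging`, the bucket boundaries taken from the header above, and the sample data used to exercise it are assumptions; with 3,000,000+ rows this logic would normally run inside the database.

```python
from collections import defaultdict
from datetime import date

# (lowest age in days, highest age in days or None, column label)
BUCKETS = [(0, 30, "0-30"), (31, 60, "31-60"), (61, 90, "61-90"), (91, None, "91+")]


def fifo_aging(transactions, as_of):
    """transactions: (product, date, direction, quantity), direction 'IN' or 'OUT'.
    Returns {product: (stock on hand, {bucket label: quantity})}."""
    ins = defaultdict(list)
    stock = defaultdict(float)
    for product, d, direction, qty in transactions:
        if d > as_of:
            continue  # ignore movements after the reporting date
        if direction == "IN":
            ins[product].append((d, qty))
            stock[product] += qty
        else:
            stock[product] -= qty
    report = {}
    for product, on_hand in stock.items():
        row = {label: 0.0 for _, _, label in BUCKETS}
        remaining = on_hand
        # Under FIFO the stock left over comes from the newest receipts.
        for d, qty in sorted(ins[product], reverse=True):
            if remaining <= 0:
                break
            take = min(qty, remaining)
            age = (as_of - d).days
            for low, high, label in BUCKETS:
                if age >= low and (high is None or age <= high):
                    row[label] += take
                    break
            remaining -= take
        report[product] = (on_hand, row)
    return report
```

In T-SQL, one common way to express the same idea is a running total of IN quantities ordered by Date descending (`SUM(Quantity) OVER (PARTITION BY Product ORDER BY Date DESC ROWS UNBOUNDED PRECEDING)`), compared against the current stock to decide how much of each receipt is still on hand.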

Calculate the time difference within the same column in Spotfire

I am a beginner with Spotfire. I have a problem calculating differences between values in the same column. A sample table could look like this:
id | timestamp | state
1 | 7/1/2016 12:00:01 AM | 1
2 | 7/1/2016 12:00:03 AM | 0
3 | 7/1/2016 12:00:04 AM | 1
4 | 7/1/2016 12:00:06 AM | 0
5 | 7/1/2016 12:00:09 AM | 1
6 | 7/1/2016 12:00:10 AM | 0
7 | 7/1/2016 12:00:12 AM | 1
I want to calculate the time difference between consecutive timestamps where the state is 1.
The final table I want is:
id | timestamp | state | time_difference
3 | 7/1/2016 12:00:04 AM | 1 | 3
5 | 7/1/2016 12:00:09 AM | 1 | 5
7 | 7/1/2016 12:00:12 AM | 1 | 3
It seems I need an expression for this calculation, but I have no idea how to compute it within a single column. Could somebody help me?
One more small question: if the timestamp column holds plain numbers, how can I calculate the difference? Is there a function like DateDiff() for that? For example:
id | times | state
1 | 12 | 1
2 | 7 | 0
3 | 10 | 1
4 | 11 | 0
5 | 6 | 1
6 | 9 | 0
7 | 7 | 1
The result should be:
id | times | state | difference
3 | 10 | 1 | -2
5 | 6 | 1 | -4
7 | 7 | 1 | 1
After running the code, I have the following problem:
if a row has the same timestamp as the previous row, the difference repeats the previous value, but the difference for rows with the same timestamp should be 0.
thanks for your help :)
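Outside Spotfire, the comparison the desired output implies (each state-1 row against the previous state-1 row, not against the previous row overall) can be sketched in Python as below. The same function covers both the timestamp and the plain-number case, and two state-1 rows with the same timestamp naturally get a difference of 0. The names `state1_differences` and `to_seconds` are hypothetical, and the rows are assumed to be in the desired order already.

```python
from datetime import datetime


def state1_differences(rows):
    """rows: (id, value, state) in order; value is numeric.
    Returns (id, difference) for every state-1 row after the first, where
    difference = value - value of the previous state-1 row."""
    result = []
    previous = None
    for row_id, value, state in rows:
        if state != 1:
            continue  # only state-1 rows take part in the comparison
        if previous is not None:
            result.append((row_id, value - previous))
        previous = value
    return result


def to_seconds(ts):
    # '7/1/2016 12:00:04 AM' -> seconds since 1970-01-01, ignoring time zones
    parsed = datetime.strptime(ts, "%m/%d/%Y %I:%M:%S %p")
    return (parsed - datetime(1970, 1, 1)).total_seconds()
```

For the numeric sample, `state1_differences` returns the ids 3, 5 and 7 with differences -2, -4 and 1; for timestamps, convert the values with `to_seconds` first.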
Assuming your data is sorted in ascending order by [timestamp] before you import it, you can partition using the Previous function with Over where the [state]=1.
Insert a calculated column with this expression:
If([state]=1,DateDiff("ss",Min([timestamp]) OVER (Previous([timestamp])),[timestamp]))
You will see it populated in your table like the below:
Then, if you only want to see the rows that have the difference you mentioned, right-click your table and choose Properties > Data > Limit data using expression, and insert the expression: [time_difference] > 1
This will result in this table:

Processing time-based values

I have a list of time-based values in the following form:
20/Dec/2011:10:16:29 9
20/Dec/2011:10:16:30 13
20/Dec/2011:10:16:31 13
20/Dec/2011:10:16:32 9
20/Dec/2011:10:16:33 13
20/Dec/2011:10:16:34 14
20/Dec/2011:10:16:35 6
20/Dec/2011:10:16:36 7
20/Dec/2011:10:16:37 16
20/Dec/2011:10:16:38 5
20/Dec/2011:10:16:39 7
20/Dec/2011:10:16:40 15
20/Dec/2011:10:16:41 12
20/Dec/2011:10:16:42 13
20/Dec/2011:10:16:43 11
20/Dec/2011:10:16:44 6
20/Dec/2011:10:16:45 7
20/Dec/2011:10:16:46 9
20/Dec/2011:10:16:47 14
20/Dec/2011:10:16:49 6
20/Dec/2011:10:16:50 11
20/Dec/2011:10:16:51 15
20/Dec/2011:10:16:52 10
20/Dec/2011:10:16:53 16
20/Dec/2011:10:16:54 12
20/Dec/2011:10:16:55 8
The second column contains one value per second, for every second of a complete month. I want to sum these values:
Per minute (seconds 00-59)
Per hour (minutes 00-59)
Per day (hours 00-23)
Sounds like a job for Excel and a pivot table.
The trick is to parse the text date/time into something Excel can work with; splitting it at the first colon does just that. Assuming the value is in cell A2, this formula converts the text into a real date:
=DATEVALUE(LEFT(A2,SEARCH(":",A2)-1))+TIMEVALUE(RIGHT(A2,LEN(A2)-SEARCH(":",A2)))
Then just create Minute, Hour and Day columns where you subtract out that portion of the date. For example, if the date from the above formula is in C2, the following will subtract out the seconds and give you just up to the minute:
=C2-SECOND(C2)/24/60/60
Then repeat the process for the next two columns to give you the hour and the day:
=D2-MINUTE(D2)/24/60
=E2-HOUR(E2)/24
Then all you have to do is create a pivot table on the data with rows Day, Hour, Minute and value Sum(Value).
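The same truncation idea also works outside Excel. This Python sketch parses each line, drops the seconds (or the minutes and seconds, or the whole time of day) in the same way as the Minute, Hour and Day columns above, and sums the values per bucket. The function name `aggregate_by_period` is hypothetical, and it assumes one timestamp and one integer value per line, separated by whitespace.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_period(lines):
    """lines like '20/Dec/2011:10:16:29 9'.
    Returns (per_minute, per_hour, per_day) dicts of summed values."""
    per_minute = defaultdict(int)
    per_hour = defaultdict(int)
    per_day = defaultdict(int)
    for line in lines:
        stamp, value = line.split()
        # %b expects English month abbreviations such as 'Dec' (default C locale).
        ts = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
        v = int(value)
        per_minute[ts.replace(second=0)] += v
        per_hour[ts.replace(minute=0, second=0)] += v
        per_day[ts.date()] += v
    return per_minute, per_hour, per_day
```

For the first three sample lines, the 10:16 minute bucket sums to 9 + 13 + 13 = 35.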
