Is there a way to calculate Custom Week numbers that start from a user's first Transaction Date onwards? The Users (emailId) and TransDate columns may not be in a sorted condition as shown below:
e.g.
+------+-------------+---------------------+
| WkNo | TransDate | emailId |
+------+-------------+---------------------+
| 1 | 2018-Aug-30 | moz.shea#abc.com |
| 1 | 2018-Aug-30 | moz.shea#abc.com |
| 10 | 2018-Nov-07 | moz.shea#abc.com |
| 1 | 2018-Aug-09 | zabi.prado#abc.com |
| 1 | 2018-Aug-09 | zabi.prado#abc.com |
| 6 | 2018-Sep-20 | zabi.prado#abc.com |
| 15 | 2018-Nov-23 | zabi.prado#abc.com |
| 21 | 2018-Dec-31 | zabi.prado#abc.com |
| 1 | 2018-Aug-20 | silo.whitte#abc.com |
| 5 | 2018-Sep-23 | silo.whitte#abc.com |
| 7 | 2018-10-11 | silo.whitte#abc.com |
| 7 | 2018-10-11 | silo.whitte#abc.com |
| 8 | 2018-Oct-14 | silo.whitte#abc.com |
| 9 | 2018-Oct-19 | silo.whitte#abc.com |
| 1 | 2018-Jul-01 | pablo.gucci#abc.com |
| 6 | 2018-Aug-10 | pablo.gucci#abc.com |
| 13 | 2018-Oct-03 | pablo.gucci#abc.com |
+------+-------------+---------------------+
I wrote the following formula, which uses the FILTER function to supply the filtered dates per user to the DATEDIF function. However, I am not getting the desired result shown above.
=ARRAYFORMULA(if(B2:B="","",1 + round(DATEDIF(min(sort(FILTER(B2:B,C2:C=C2:C),1,true)),sort(FILTER(B2:B,C2:C=C2:C),1,true),"D")/7)))
EDIT:
Formula Result:
1
7
7
7
8
10
10
13
13
14
16
16
16
17
19
22
27
Also removed SORT from above formula:
=ARRAYFORMULA(if(B2:B="","",1 + round(DATEDIF(min(sort(FILTER(B2:B,C2:C=C2:C),1,true)),FILTER(B2:B,C2:C=C2:C),"D")/7)))
Formula Result:
10
10
19
7
7
13
22
27
8
13
16
16
16
17
1
7
14
Both formulas run, but they give unexpected results because MIN evaluates to a single date, 2018-Jul-01, instead of an array of minimum dates per user. Where am I going wrong?
Did you try using the MINIFS() function?
For example, with the following layout:
Dates in cells B2:B10
Emails in cells C2:C10
The formula in cell A2 would give the earliest date for the email address in cell C2:
=MINIFS($B$2:$B$10, $C$2:$C$10, "="&C2)
You should be able to use this in your formula to calculate the number of weeks.
For those who might face a similar challenge, here is the answer:
=ARRAYFORMULA(IF(B2:B="","",ROUND((B2:B-VLOOKUP(C2:C,SORT({C2:C,B2:B},2,1),2,0))/7)+1))
The idea is to do a VLOOKUP on column C, passing it a range with the columns switched (email first, date second) and sorted by date in ascending order, so that the first match for each email is that user's earliest date. These first dates are then subtracted from the column B dates to get the desired result, i.e. either days or weeks.
One can remove the +1; I used it only to display the starting week as 1 instead of 0, so the result may differ slightly. Without the +1 the result is accurate, especially if you are doing a cohort analysis:
0
0
10
0
0
6
15
21
0
5
7
7
8
9
0
6
13
As a check, I removed the division by 7 and the +1, then checked the day counts, which are correct.
0
0
69
0
0
42
106
144
0
34
52
52
55
60
0
40
94
Hope this helps.
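For readers who want to check the logic outside the spreadsheet, here is a minimal Python sketch of the same idea: find each user's earliest date, subtract it from every transaction date, divide by 7 and round. The function name `custom_week_numbers` and the `first_week` parameter are made up for this illustration; they are not part of the formula above.

```python
from datetime import date


def custom_week_numbers(rows, first_week=1):
    """rows: list of (email, transaction_date) in any order.

    Returns one week number per row, in input order, counted from
    each user's own earliest transaction date.
    """
    # Earliest date per user (the spreadsheet gets this via VLOOKUP on a sorted range).
    first_seen = {}
    for email, day in rows:
        if email not in first_seen or day < first_seen[email]:
            first_seen[email] = day

    # Whole days divided by 7 can never land exactly on .5, so Python's
    # round-half-to-even gives the same answer as the spreadsheet's ROUND here.
    return [
        round((day - first_seen[email]).days / 7) + first_week
        for email, day in rows
    ]


rows = [
    ("moz", date(2018, 8, 30)),
    ("moz", date(2018, 11, 7)),
    ("zabi", date(2018, 9, 20)),
    ("zabi", date(2018, 8, 9)),
]
print(custom_week_numbers(rows, first_week=0))
```

Passing `first_week=0` corresponds to dropping the +1 from the formula.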
I have a table with numeric data, and I need to build the different combinations of its values.
For example:
| A |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
I need to combine this single column to get the following result:
| A | B | C | D |
| - | - | - | - |
| 1 | | | |
| 1 | 2 | | |
| 1 | 2 | 3 | |
| 1 | 2 | 3 | 4 |
| 1 | 2 | | 4 |
| 1 | | 3 | |
| 1 | | 3 | 4 |
| 1 | | | 4 |
| | 2 | | |
| | 2 | 3 | |
| | 2 | 3 | 4 |
| | 2 | | 4 |
| | | 3 | |
| | | 3 | 4 |
| | | | 4 |
At the end of the table, I have to add one column with the count of the columns that contain data in each row, and another column with the sum of the numbers in that row.
Maybe it sounds very difficult or impossible, but I haven't found a way to make it work.
I have tried a "Cross Join" from SQL but didn't get the expected result.
Help!
In this case, you can solve this by counting in binary from 1 up to the number whose digits are all 1s, with as many digits as there are numbers in the set. For example, the starting set 2, 5, 6, 8 would end with 1111. Each binary digit decides whether the corresponding number is shown in that row. Here's a table of how it would work.
| A |
|---|
| 2 |
| 5 |
| 6 |
| 8 |
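| A | B | C | D | Binary | Row number |
| - | - | - | - | ------ | ---------- |
|   |   |   | 8 | 0001   | 1          |
|   |   | 6 |   | 0010   | 2          |
|   |   | 6 | 8 | 0011   | 3          |
|   | 5 |   |   | 0100   | 4          |
|   | 5 |   | 8 | 0101   | 5          |
|   | 5 | 6 |   | 0110   | 6          |
|   | 5 | 6 | 8 | 0111   | 7          |
| 2 |   |   |   | 1000   | 8          |
| 2 |   |   | 8 | 1001   | 9          |
| 2 |   | 6 |   | 1010   | 10         |
| 2 |   | 6 | 8 | 1011   | 11         |
| 2 | 5 |   |   | 1100   | 12         |
| 2 | 5 |   | 8 | 1101   | 13         |
| 2 | 5 | 6 |   | 1110   | 14         |
| 2 | 5 | 6 | 8 | 1111   | 15         |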
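The binary-counting idea can be sketched in a few lines of Python. This is only an illustration of the technique, not a spreadsheet or SQL solution; the function name `subset_rows` is made up here. It also produces the count and sum columns the question asks for.

```python
def subset_rows(values):
    """Enumerate every non-empty combination of `values` by counting in binary.

    Each row is (cells, count, total): `cells` keeps one slot per input
    value (None where the value is left out), `count` is the number of
    filled slots and `total` is their sum.
    """
    n = len(values)
    rows = []
    for mask in range(1, 2 ** n):  # 0001 ... 1111 for four values
        # The leftmost binary digit controls the first value, as in the table.
        cells = [
            value if mask & (1 << (n - 1 - i)) else None
            for i, value in enumerate(values)
        ]
        chosen = [c for c in cells if c is not None]
        rows.append((cells, len(chosen), sum(chosen)))
    return rows


for cells, count, total in subset_rows([2, 5, 6, 8]):
    print(cells, count, total)
```

The rows come out in binary order (row 1 is 0001, row 15 is 1111), which differs from the order in the question but covers the same 15 combinations.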
I have an Oracle DB View like:
DATE | PRODUCT_NUMBER | PRODUCT_COUNT | PRODUCT_FACTOR
2018-01-01 | 1 | 10 | 3
2018-03-15 | 1 | 8 | 3
2019-02-11 | 1 | 11 | 3
2019-08-01 | 1 | 5 | 3
2019-08-01 | 2 | 20 | 5
2019-08-02 | 2 | 15 | 5
2019-06-01 | 2 | 5 | 5
2020-07-01 | 2 | 30 | 5
2018-07-07 | 3 | 100 | 2
Where,
DATE is the date
PRODUCT_NUMBER is a unique product number
PRODUCT_COUNT is the number of items of that product in the storage facility
PRODUCT_FACTOR is the number of products that fit into a storage rack
I now need to know how much the count changed since the last update for every product number.
Since the first entry has no past date to compare to, the change is undefined and can be something like NULL, NONE, or 0. It doesn't matter as long as I can filter those out later.
Some products only have 1 entry, those should be ignored (nothing to calculate difference on).
End result should be:
DATE | PRODUCT_NUMBER | PRODUCT_COUNT | PRODUCT_FACTOR | PRODUCT_CHANGE | CHANGE_FACTOR
2018-01-01 | 1 | 10 | 3 | NULL | NULL
2018-03-15 | 1 | 8 | 3 | 2 # 10-8 | 6 # 2*3
2019-02-11 | 1 | 11 | 3 | -3 # 8-11 | -9 # 3*-3
2019-08-01 | 1 | 5 | 3 | 6 # 11-5 | 18 # 6*3
2019-08-01 | 2 | 20 | 5 | -15 # 5-20 | -75 # -15*5
2019-08-02 | 2 | 15 | 5 | 5 # 20-15 | 25 # 5*5
2019-06-01 | 2 | 5 | 5 | NULL | NULL
2020-07-01 | 2 | 30 | 5 | -15 # 15-30 | -75 # -15*5
How can I achieve this within Oracle SQL?
The end result is a bit unclear:
Why are the values 15 and 5 compared for product_number 2? 2019-06-01 is earlier than 2019-08-01 and should be the first row.
Why is change_factor on the first row 3 for product 1 but null for product 2?
Why is change_factor for 2019-02-11 calculated as 11 * 0 instead of 0 * 3?
Assuming all of these are typos (I changed 2019-06-01 to 2019-09-01), you can use something like the query below.
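select dt, product_number, product_count, product_factor, product_change,
       product_change * product_factor change_factor
from (
    -- lag() fetches the previous product_count per product, ordered by date;
    -- greatest(..., 0) clamps negative differences to 0 (drop it to keep signed changes)
    select "DATE" dt, product_number, product_count, product_factor,
           greatest(lag(product_count) over (partition by product_number order by "DATE") - product_count, 0) product_change
    from test_tab t1
    -- keep only products that have at least two rows
    where (select count(1) from test_tab t2
           where t1.product_number = t2.product_number and rownum < 3) > 1
)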
fiddle
See also LAG documentation
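To make the LAG logic easier to follow outside the database, here is a small Python sketch of the same steps: group by product, skip products with a single row, order by date, and subtract each count from the previous one. It is an illustration only; the function name `changes_since_last` is made up, and unlike the query above it keeps negative changes instead of clamping them to 0, which is an assumption based on the expected result in the question.

```python
from collections import defaultdict


def changes_since_last(rows):
    """rows: iterable of (iso_date, product_number, product_count, product_factor).

    Returns rows extended with (product_change, change_factor); the first
    row of each product gets None for both, mirroring LAG's NULL.
    """
    by_product = defaultdict(list)
    for row in rows:
        by_product[row[1]].append(row)

    result = []
    for product in sorted(by_product):
        items = by_product[product]
        if len(items) < 2:
            continue  # nothing to compare against, so the product is ignored
        items.sort(key=lambda r: r[0])  # ISO dates sort correctly as strings
        previous = None
        for day, number, count, factor in items:
            change = None if previous is None else previous - count
            scaled = None if change is None else change * factor
            result.append((day, number, count, factor, change, scaled))
            previous = count
    return result


sample = [
    ("2019-02-11", 1, 11, 3),
    ("2018-01-01", 1, 10, 3),
    ("2019-08-01", 1, 5, 3),
    ("2018-03-15", 1, 8, 3),
    ("2018-07-07", 3, 100, 2),
]
for line in changes_since_last(sample):
    print(line)
```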
I have seen many examples about how to sort a DataFrame based on some specific columns.
What I want to achieve is to sort each column of a DataFrame individually, independently of the others. See the example below.
Input
+-----------+-----------+----------------+
| Column1 | Column2 | Column 3 |
+-----------+-----------+----------------+
| 61 | 5 | 9 |
| 14 | 16 | 8 |
| 26 | 27 | 7 |
+-----------+-----------+----------------+
Output
+-----------+-----------+----------------+
| Column1 | Column2 | Column 3 |
+-----------+-----------+----------------+
| 14 | 5 | 7 |
| 26 | 16 | 8 |
| 61 | 27 | 9 |
+-----------+-----------+----------------+
Any clue how I can achieve this?
I am trying to understand the time complexity of the recursive Fibonacci algorithm.
fib(n)
    if (n < 2)
        return n
    return fib(n-1) + fib(n-2)
Not having much mathematical background, I tried computing it by hand. That is, I manually counted the number of steps as n increases, ignoring everything that I think is constant time. Here is how I did it. Say I want to compute fib(5).
n = 0 - just a comparison on an if statement. This is constant.
n = 1 - just a comparison on an if statement. This is constant.
n = 2 - ignoring anything else, this should be 2 steps, fib(1) takes 1 step and fib(0) takes 1 step.
n = 3 - 3 steps now, fib(2) takes two steps and fib(1) takes 1 step.
n = 4 - 5 steps now, fib(3) takes 3 steps and fib(2) takes 2 steps.
n = 5 - 8 steps now, fib(4) takes 5 steps and fib(3) takes 3 steps.
Judging from these, I believe the running time might be fib(n+1). I am not so sure if 1 is a constant factor because the difference between fib(n) and fib(n+1) might be very large.
I've read the following on SICP:
In general, the number of steps required by a tree-recursive process
will be proportional to the number of nodes in the tree, while the
space required will be proportional to the maximum depth of the tree.
In this case, I believe the number of nodes in the tree is fib(n+1). So I am confident I am correct. However, this video confuses me:
So this is a thing whose time complexity is order of actually, it
turns out to be Fibonacci of n. There's a thing that grows exactly as
Fibonacci numbers.
...
That every one of these nodes in this tree has to be examined.
I am absolutely shocked. I've examined all nodes in the tree and there are always fib(n+1) nodes and thus number of steps when computing fib(n). I can't figure out why some people say it is fib(n) number of steps and not fib(n+1).
What am I doing wrong?
In your program, you have these time-consuming actions (sorted by time used per action, quickest first):
Addition
IF (conditional jump)
Return from subroutine
Function call
Let's look at how many of these actions are executed, and let's compare this with n and fib(n):
n | fib | #ADD | #IF | #RET | #CALL
---+-----+------+-----+------+-------
0 | 0 | 0 | 1 | 1 | 0
1 | 1 | 0 | 1 | 1 | 0
For n≥2 you can calculate the numbers this way:
fib(n) = fib(n-1) + fib(n-2)
ADD(n) = 1 + ADD(n-1) + ADD(n-2)
IF(n) = 1 + IF(n-1) + IF(n-2)
RET(n) = 1 + RET(n-1) + RET(n-2)
CALL(n) = 2 + CALL(n-1) + CALL(n-2)
Why?
ADD: One addition is executed directly in the top instance of the program, and the additions inside both subroutine calls need to be executed as well.
IF and RET: Same argument as before.
CALL: Also the same, but you execute two calls in the top instance.
So, this is your list for other values of n:
n | fib | #ADD | #IF | #RET | #CALL
---+--------+--------+--------+--------+--------
0 | 0 | 0 | 1 | 1 | 0
1 | 1 | 0 | 1 | 1 | 0
2 | 1 | 1 | 3 | 3 | 2
3 | 2 | 2 | 5 | 5 | 4
4 | 3 | 4 | 9 | 9 | 8
5 | 5 | 7 | 15 | 15 | 14
6 | 8 | 12 | 25 | 25 | 24
7 | 13 | 20 | 41 | 41 | 40
8 | 21 | 33 | 67 | 67 | 66
9 | 34 | 54 | 109 | 109 | 108
10 | 55 | 88 | 177 | 177 | 176
11 | 89 | 143 | 287 | 287 | 286
12 | 144 | 232 | 465 | 465 | 464
13 | 233 | 376 | 753 | 753 | 752
14 | 377 | 609 | 1219 | 1219 | 1218
15 | 610 | 986 | 1973 | 1973 | 1972
16 | 987 | 1596 | 3193 | 3193 | 3192
17 | 1597 | 2583 | 5167 | 5167 | 5166
18 | 2584 | 4180 | 8361 | 8361 | 8360
19 | 4181 | 6764 | 13529 | 13529 | 13528
20 | 6765 | 10945 | 21891 | 21891 | 21890
21 | 10946 | 17710 | 35421 | 35421 | 35420
22 | 17711 | 28656 | 57313 | 57313 | 57312
23 | 28657 | 46367 | 92735 | 92735 | 92734
24 | 46368 | 75024 | 150049 | 150049 | 150048
25 | 75025 | 121392 | 242785 | 242785 | 242784
26 | 121393 | 196417 | 392835 | 392835 | 392834
27 | 196418 | 317810 | 635621 | 635621 | 635620
You can see that the number of additions is exactly half the number of function calls (you could also read this directly from the code). And if you count the initial program call as the very first function call, then you have exactly the same number of IFs, returns and calls.
So you can combine 1 ADD, 2 IFs, 2 RETs and 2 CALLs into one super-action that needs a constant amount of time.
You can also read from the list that the number of additions is 1 less than fib(n+1) (the 1 can be ignored).
So, the running time is of order fib(n+1).
The ratio fib(n+1) / fib(n) gets closer and closer to Φ as n grows. Φ is the golden ratio, approximately 1.6180339887, which is a constant. Constant factors are ignored in orders, so O(fib(n+1)) is exactly the same as O(fib(n)).
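The counts in the table can be reproduced with a small instrumented version of the function. This is a sketch for checking the numbers, not part of the original program; the names `fib_counted` and `count_ops` are made up here, and the call counter includes the initial call, so it is one higher than the #CALL column.

```python
def fib_counted(n, counts):
    """Recursive Fibonacci that tallies each kind of action in `counts`."""
    counts["call"] += 1  # includes the very first (outer) call
    counts["if"] += 1
    if n < 2:
        counts["ret"] += 1
        return n
    result = fib_counted(n - 1, counts) + fib_counted(n - 2, counts)
    counts["add"] += 1
    counts["ret"] += 1
    return result


def count_ops(n):
    """Return (fib(n), counts) for a fresh run."""
    counts = {"add": 0, "if": 0, "ret": 0, "call": 0}
    value = fib_counted(n, counts)
    return value, counts


value, counts = count_ops(10)
next_value, _ = count_ops(11)
print(value, counts)
print(counts["add"] == next_value - 1)  # additions are one less than fib(n+1)
```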
Now let's look at the space:
It is true that the maximum space needed to process the tree is proportional to the maximum distance between the root and the most distant leaf. This is true because you call f(n-2) only after f(n-1) has returned.
So the space needed by your program is of order n.
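The depth claim can be checked in the same way. The sketch below threads the current depth through the recursion and reports the deepest level reached; the name `fib_depth` and the convention that the outer call sits at depth 1 are assumptions made for this illustration.

```python
def fib_depth(n, depth=1):
    """Return (fib(n), deepest call depth reached while computing it)."""
    if n < 2:
        return n, depth
    # The second call starts only after the first has returned, so the two
    # branches never occupy the stack at the same time.
    left, left_depth = fib_depth(n - 1, depth + 1)
    right, right_depth = fib_depth(n - 2, depth + 1)
    return left + right, max(left_depth, right_depth)


for n in (5, 10, 20):
    print(n, fib_depth(n))
```

The deepest path is the chain n, n-1, ..., 1, so the reported depth grows linearly with n while the number of calls grows like fib(n+1).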
I have a table in SPSS that contains multiple columns, like this:
+--------+-------+-------+-------+-------+
| | Col 1 | Col 2 | Col 3 | Total |
+--------+-------+-------+-------+-------+
| Data 1 | 10 | 1 | 30 | 41 |
| Data 2 | 4 | 10 | 10 | 24 |
| Data 3 | 3 | 40 | 1 | 44 |
| Data 4 | 10 | 5 | 3 | 18 |
+--------+-------+-------+-------+-------+
I want to add a row at the bottom that calculates the total of each column. In the end, it would look something like this:
+--------+-------+-------+-------+-------+
| | Col 1 | Col 2 | Col 3 | Total |
+--------+-------+-------+-------+-------+
| Data 1 | 10 | 1 | 30 | 41 |
| Data 2 | 4 | 10 | 10 | 24 |
| Data 3 | 3 | 40 | 1 | 44 |
| Data 4 | 10 | 5 | 3 | 18 |
| TOTAL | 27 | 56 | 44 | 127 |
+--------+-------+-------+-------+-------+
Does anyone know what I would have to add to my current code to achieve this?
EDIT: Here is my current code:
TEMPORARY.
SELECT IF Remove = 0.
CTABLES
/VLABELS VARIABLES=TBI1 ME1 BFCE1 CFCE1 RTWPM1 VPA1 NPS1 NPA1 PROV
DISPLAY=LABEL
/TABLE TBI1 [C] + ME1 [C] + BFCE1 [C] + CFCE1 [C] + RTWPM1 [C] + VPA1 [C] + NPS1 [C] + NPA1 [C] BY PROV [C]
[COUNT F40.0, ROWPCT.COUNT PCT40.1]
/CATEGORIES VARIABLES=TBI1 ME1 BFCE1 CFCE1 RTWPM1 VPA1 NPS1 NPA1 [1.00] EMPTY=INCLUDE
/CATEGORIES VARIABLES=PROV ORDER=A KEY=VALUE EMPTY=EXCLUDE TOTAL=YES
/TITLES TITLE='Brain Injury' CAPTION='Type of assessment and volume of each service by provider.'.
You can get output like this from the CROSSTABS procedure (Analyze > Descriptive Statistics > Crosstabs)
Here is one option you may want to try.
Create two new variables in the syntax window.
COMPUTE ALL_TAB_COL = 1.
COMPUTE ALL_TAB_ROW = 1.
EXECUTE.
VARIABLE LABELS ALL_TAB_COL 'All Cols'
/ALL_TAB_ROW 'All Rows'.
Go to the main menu: Analyze > Tables > Custom Tables.
From the variable list:
Drag ALL_TAB_COL to the column area
Drag ALL_TAB_ROW to the row area (if needed)
After that, you can drag other column variables to appear after ALL_TAB_COL,
and row variables to appear below ALL_TAB_ROW.
If you choose Count in the Summary Statistics submenu, ALL_TAB_COL shows the column sum of all counts against ALL_TAB_ROW, with further breakdowns by the other variables.