Dividing within a column based on other columns matching - datatable

I am trying to calculate the current measurement in column "Total" minus the lowest measurement previously recorded in column "Total". This applies where the current measurement in column "Total" corresponding to the value in column "Trade" is less than (<) the minimum measurement in column "Total" corresponding to the value in column "Trade", and where two values in the "Subject" column match and two values in the "Procedure" column match. To emphasize, the minimum value must have been recorded previously. If a measurement is less than the current measurement but was not recorded previously (according to the "Date" column), it does not qualify to be subtracted from the current measurement. An example of the output is provided below.
data Have;
/* Columns: Subject Type Date Trade Procedure Total */
/* Date is read with the & modifier, so two spaces must follow each date value */
input Subject Type :$12. Date &:anydtdte. Trade Procedure :$12. Total;
format date yymmdd10.;
datalines;
500 Initial 15 AUG 2017  6 Invasive 20
500 Initial 15 AUG 2017  9 Surface 35
500 Followup 15 AUG 2018  8 Invasive 54
428 Followup 15 AUG 2018  56 Outer 29
765 Seventh 3 AUG 2018  12 Other 13
500 Followup 3 JUL 2018  23 surface 98
428 Initial 3 JUL 2017  34 Outer 10
765 Initial 20 JUL 2019  4 Other 19
610 Third 20 AUG 2019  58 Invasive 66
610 Initial 17 Mar 2018  25 Invasive 17
;
run;
*Example of Output;
Subject Type Date Trade Procedure Total Output
500 Initial 15 AUG 2017 6 Invasive 20 20/20
500 Initial 15 AUG 2017 9 Surface 35 35/35
500 Followup 15 AUG 2018 8 Invasive 54 54/20
428 Followup 15 AUG 2018 56 Outer 29 29/10
765 Seventh 3 AUG 2018 12 Other 13 13/19
500 Followup 3 JUL 2018 23 surface 98 98/35
428 Initial 3 JUL 2017 34 Outer 10 10/10
765 Initial 20 JUL 2019 4 Other 19 19/19
610 Third 20 AUG 2019 58 Invasive 66 66/17
610 Initial 17 Mar 2018 25 Invasive 17 17/17

I'm not sure, but this is the closest I could get to matching your output.
I made a monotonic() variable and then ranked it by the SUBJECT and PROCEDURE variables. I then joined the table to itself using the condition t1.rank_monotonic + 1 = t2.rank_monotonic.
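For comparison, here is a minimal sketch of the rule as it is written in the question, in Python rather than SAS. It does not use the rank-and-self-join approach above. Instead it scans, for each row, the rows with the same Subject and Procedure that have a strictly earlier Date. The function name with_baseline and the case-insensitive comparison of Procedure (so that "surface" matches "Surface") are assumptions. Because it follows the written rule, it does not reproduce the example table for subject 765.

```python
from datetime import date

# Rows from the question: (subject, type, date, trade, procedure, total).
ROWS = [
    (500, "Initial", date(2017, 8, 15), 6, "Invasive", 20),
    (500, "Initial", date(2017, 8, 15), 9, "Surface", 35),
    (500, "Followup", date(2018, 8, 15), 8, "Invasive", 54),
    (428, "Followup", date(2018, 8, 15), 56, "Outer", 29),
    (765, "Seventh", date(2018, 8, 3), 12, "Other", 13),
    (500, "Followup", date(2018, 7, 3), 23, "surface", 98),
    (428, "Initial", date(2017, 7, 3), 34, "Outer", 10),
    (765, "Initial", date(2019, 7, 20), 4, "Other", 19),
    (610, "Third", date(2019, 8, 20), 58, "Invasive", 66),
    (610, "Initial", date(2018, 3, 17), 25, "Invasive", 17),
]


def with_baseline(rows):
    """Pair each Total with the lowest Total recorded earlier for the same
    subject and procedure, falling back to the row's own Total."""
    out = []
    for subject, _type, when, _trade, procedure, total in rows:
        earlier = [
            r[5]
            for r in rows
            if r[0] == subject
            and r[4].lower() == procedure.lower()  # assumption: ignore case
            and r[2] < when  # only strictly earlier dates qualify
        ]
        out.append((total, min(earlier + [total])))
    return out


pairs = with_baseline(ROWS)
for total, baseline in pairs:
    print(f"{total}/{baseline}")
```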

Related

How to subset rows from one dataframe based on matching values from a second smaller data frame in R

I want to select a control group from one data frame based on matching the age from a second data frame. As an example, I have subject.df:
subject.df
id age
1 1 55
2 2 62
3 3 73
4 4 54
5 5 66
I'd like to subset control.df by matching age directly, 1 to 1, against the subject.df data frame.
control.df
id age
6 6 66
7 7 71
8 8 80
9 9 51
10 10 55
11 11 56
12 12 77
13 13 62
14 14 64
15 15 73
16 16 67
17 17 54
18 18 75
19 19 77
20 20 78
21 21 53
22 22 64
23 23 83
24 24 61
25 25 77
I'm fairly new to R. In the past I've used Matlab and in this instance would use a for loop to iterate over the control.df dataframe, but I've been told that R doesn't always like for loops and that it can be computationally difficult in R.
In the end I'll be doing this on a much larger data set where the subject group is around 250 and the control group is more than 40K so I know that 1:1 matching is possible.
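For illustration only, here is a minimal sketch of the exact 1:1 age matching described above, written in Python rather than R. The function name match_controls and the choice to take the first unused control of each age are assumptions, not something specified in the question.

```python
from collections import defaultdict

# (id, age) pairs copied from subject.df and control.df in the question.
subjects = [(1, 55), (2, 62), (3, 73), (4, 54), (5, 66)]
controls = [
    (6, 66), (7, 71), (8, 80), (9, 51), (10, 55), (11, 56), (12, 77),
    (13, 62), (14, 64), (15, 73), (16, 67), (17, 54), (18, 75), (19, 77),
    (20, 78), (21, 53), (22, 64), (23, 83), (24, 61), (25, 77),
]


def match_controls(subjects, controls):
    """Give each subject one unused control with the same age, if any."""
    by_age = defaultdict(list)
    for control_id, age in controls:
        by_age[age].append(control_id)
    matches = {}
    for subject_id, age in subjects:
        if by_age[age]:
            # Removing the control ensures it is used at most once.
            matches[subject_id] = by_age[age].pop(0)
    return matches


matches = match_controls(subjects, controls)
print(matches)
```

Grouping the controls by age once avoids scanning the whole control list for every subject, which matters more as the control group grows.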

Finding cumulative sum of MAX values

I need to calculate the cumulative sum of Max value per period (or per category). See the embedded image.
So, first, I need to find the max value for each category/month per year. Then I want to calculate the cumulative sum of these max values. I tried setting up a max measure, which works fine for the first step (finding the max per category/month for a given year), but I can't find a way to compute the cumulative sum (finding the cumulative max is easy, but it is not what I'm looking for).
Table1
Year Month MonthlyValue MaxPerYear
2016 Jan 10 15
2016 Feb 15 15
2016 Mar 12 15
2017 Jan 22 22
2017 Feb 19 22
2017 Mar 12 22
2018 Jan 5 17
2018 Feb 16 17
2018 Mar 17 17
Desired Output
Year CumSum
2016 15
2017 37
2018 54
This is similar to several earlier questions about subtotaling, but it also includes a cumulative component.
You can do this in two steps. First, calculate a table that gives the max for each year and then use a cumulative total pattern.
CumSum =
VAR Summary =
    SUMMARIZE(
        ALLSELECTED(Table1),
        Table1[Year],
        "Max",
        MAX(Table1[MonthlyValue])
    )
RETURN
    SUMX(
        FILTER(
            Summary,
            Table1[Year] <= MAX(Table1[Year])
        ),
        [Max]
    )
Here's the output:
If you expand to the month level, then it looks like this:
Note that if you only need the subtotal to work leaving each row as a max (15, 22, 17, 54) rather than as a cumulative sum of maxes (15, 37, 54, 54), then you can use a simpler approach:
MaxSum =
SUMX(
    VALUES( Table1[Year] ),
    CALCULATE( MAX( Table1[MonthlyValue] ) )
)
This calculates the max for each year separately and then adds them together.
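To make the arithmetic behind both measures concrete, here is a minimal sketch in Python rather than DAX, using the Table1 rows from the question. The variable names are illustrative assumptions. It only mirrors the two steps (max per year, then a running total or a plain total) and says nothing about how filter context behaves in a real report.

```python
from itertools import accumulate

# (year, month, monthly_value) rows copied from Table1 in the question.
table1 = [
    (2016, "Jan", 10), (2016, "Feb", 15), (2016, "Mar", 12),
    (2017, "Jan", 22), (2017, "Feb", 19), (2017, "Mar", 12),
    (2018, "Jan", 5), (2018, "Feb", 16), (2018, "Mar", 17),
]

# Step 1: the max MonthlyValue for each year.
max_per_year = {}
for year, _month, value in table1:
    max_per_year[year] = max(value, max_per_year.get(year, value))

# Step 2a: cumulative sum of those maxes, in year order.
years = sorted(max_per_year)
cum_sum = dict(zip(years, accumulate(max_per_year[y] for y in years)))

# Step 2b: the simpler variant, a plain sum of the yearly maxes.
max_sum = sum(max_per_year.values())

print(max_per_year)  # {2016: 15, 2017: 22, 2018: 17}
print(cum_sum)       # {2016: 15, 2017: 37, 2018: 54}
print(max_sum)       # 54
```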
External References:
Subtotals and Grand Totals That Add Up “Correctly”
Cumulative Total - DAX Patterns

How to calculate / describe relative position (Rubik's Cube)

This is an algorithmic problem. I can't seem to find a way to compare the relative positions of 2 cubes in a Rubik's Cube.
I've numbered all 20 cubes in my program, and I'm using this coordinate system, but now that I want to model two cubes in relative position I'm having trouble.
For example, say I saw the two cubes I'm watching in position 8 and 10, then later I saw them in position 12 and 13, well in both situations they're both on the same face of the cube, and they're both across from each other, not adjacent. Relatively speaking, that's the same representation of their location.
(By the way I'm only concerned with the "edge cubes" at this point, that's not the corners, so: 8 10 9 11 12 13 14 15 16 17 18 19 positions).
So I thought that if I listed every position in relation to each starting point, using the same algorithm to list each one, then I could compare the indexes, and if they were the same, the relative position would be the same. But I was wrong. I might be on the right track, but it doesn't always work:
08 10 18 16 12 13 14 15 09 11 19 17
09 11 19 17 13 14 15 12 10 08 16 18
10 18 16 08 14 15 12 13 11 09 17 19
11 19 17 09 15 12 13 14 08 10 18 16
12 13 14 15 11 19 17 09 16 08 10 18
13 14 15 12 08 16 18 10 17 09 11 19
14 15 12 13 09 17 19 11 18 10 08 16
15 12 13 14 10 18 16 08 19 11 09 17
16 08 10 18 19 17 09 11 13 12 15 14
17 09 11 19 16 18 10 08 14 13 12 15
18 16 08 10 17 19 11 09 15 14 13 12
19 17 09 11 18 16 08 10 12 15 14 13
Consider the following two positions: cube A is at position 19 and cube B is at 16. They're adjacent on the bottom level. Here's the "19" row and its indices up to 16:
 0  1  2  3  4  5
19 17 09 11 18 16 08 10 12 15 14 13
Now compare that to the relative position of the cube c and d at 13 and 9. C and D are adjacent on the right side, so they should have the same relative position. But my method doesn't determine that.
 0  1  2  3  4  5  6  7  8  9
13 14 15 12 08 16 18 10 17 09 11 19
index 6 is not equal to index 9. Anyway that was my best approach and it took all day to come up with.
Does anyone have any other strategies that come to mind for calculating / expressing relative position between two locations on a cube?
Thanks very much for your help, and consideration on this topic!
There are two problems here:
I think you made a mistake when you calculated the relative positions from cube 13. I get:
 0  1  2  3  4  5  6  7  8  9 10 11
13 14 15 12 17 09 11 19 08 16 18 10
This lines up with the other one, so cube 9 occurs at position 5. Compare this with the first row:
 0  1  2  3  4  5  6
19 17 09 11 18 16 08 10 12 15 14 13
As required, cube 16 also occurs at position 5 (I think you mixed something up in your question. You mention index 6 when you mean 5. You number the indexes up to 6, but at position 6 there is cube 8, not cube 16. Please check that again).
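The comparison above can be sketched as a simple table lookup. This is only an illustration in Python: the name REFERENCE_ORDER and the helper relative_index are assumptions, and the table holds just the two rows discussed here (row 19 from the question and the corrected row 13).

```python
# Position listings keyed by reference cube. Row 19 comes from the question;
# row 13 is the corrected listing given in this answer.
REFERENCE_ORDER = {
    19: [19, 17, 9, 11, 18, 16, 8, 10, 12, 15, 14, 13],
    13: [13, 14, 15, 12, 17, 9, 11, 19, 8, 16, 18, 10],
}


def relative_index(reference, other):
    """Index of `other` in the listing that starts at `reference`."""
    return REFERENCE_ORDER[reference].index(other)


# Cubes 19/16 and 13/9 are both adjacent pairs, so the indexes should agree.
print(relative_index(19, 16))  # 5
print(relative_index(13, 9))   # 5
```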
The second problem is that given only a cube position without a reference cube for the orientation, there are two ways to number the cubes. Since your cube is not colored, you can rotate the cube by 180 degrees and come to another numbering for the reference cubes. Given that the relative positions for cube 19 are correct, I can also number the relative positions for cube 13 like this:
 0  1  2  3  4  5  6  7  8  9 10 11
13 12 15 14 08 16 18 10 17 09 11 19
Note that this is close to your version but indexes 1 to 3 are in a different order. I think you were not consistent in the way you looked at the cube.
The main problem already becomes apparent in this paragraph:
For example, say I saw the two cubes I'm watching in position 8 and
10, then later I saw them in position 12 and 13, well in both
situations they're both on the same face of the cube, and they're both
across from each other, not adjacent. Relatively speaking, that's the
same representation of their location.
For every cube, there are two other cubes being on the same face and across from each other. To eliminate this ambiguity, you have to take orientations into account or reduce the number of relative positions (e.g. index 1 and 3 in your current scheme would denote the same relative position).

Initializing matrix with a function

I'm looking for something in Julia like a comprehension but for a matrix instead of a vector. If I have some single-variable function f(x) and I want an array that is filled with f(i) for i in 1..10, I can do this:
[f(i) for i = 1:10]
If I have some two-variable function g(i,j) and I want a matrix from i=[1,10]; j=[1,10] filled with the function I can do this:
M = zeros(10,10)
for i in 1:10
    for j in 1:10
        M[i,j] = g(i,j)
    end
end
Is there some shortcut that allows me to express that in a shorter way and without wasting time allocating all those zeros?
Just use a multidimensional comprehension directly:
julia> g(x,y) = 2x+y
g (generic function with 1 method)
julia> [g(i,j) for i=1:10, j=1:10]
10x10 Array{Int64,2}:
3 4 5 6 7 8 9 10 11 12
5 6 7 8 9 10 11 12 13 14
7 8 9 10 11 12 13 14 15 16
9 10 11 12 13 14 15 16 17 18
11 12 13 14 15 16 17 18 19 20
13 14 15 16 17 18 19 20 21 22
15 16 17 18 19 20 21 22 23 24
17 18 19 20 21 22 23 24 25 26
19 20 21 22 23 24 25 26 27 28
21 22 23 24 25 26 27 28 29 30
This works for any number of dimensions, by adding variable ranges at the end.

Summing by Column

Suppose we have the following columns:
X Y Z
Category Date Amount
A January 10
A February 20
A March 30
B January 34
B February 45
B March 65
C January 87
C February 98
C March 100
D January 80
D February 90
I want to sum the Amount column by Category and Date. So for Category A, the sum of the amounts would be 10+20+30 = 60 for the dates between January and March. In Oracle BI, how would we do this? Note that some categories might have missing dates, so I want to sum the Amounts for only the available dates between January and March. Category D, for example, has March missing, so the total amount would be 80+90 = 170.
When I do the following, I just get the sum of all the amounts:
sum("Z"."Amount")
If the required result has to be achieved through OBIEE Answer, then it can be done in following way.
Create a table with columns - Category, Date, Amount.
Go to Results tab. Edit view of the table.
Click on Total By icon above Category column. Both After and Report-Based Total (when applicable) should be ticked.
The result will look like this:
Category Date Amount
A January 10
February 20
March 30
A Total 60
B January 34
February 45
March 65
B Total 144
C January 87
February 98
March 100
C Total 285
D January 80
February 90
D Total 170
You can do this quite simply by editing the column formula from within the Criteria. When you look at it to begin, your Amount column formula probably looks something like "Z"."Amount". You can edit this slightly to change the aggregation level:
sum("Z"."Amount" by "X"."Category")
That should give you something like:
Category Date Amount
A Jan 60
A Feb 60
A Mar 60
B Jan 144
B Feb 144
B Mar 144
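As a rough illustration of what aggregating "by Category" means for the data in the question, here is a minimal sketch in Python rather than OBIEE. The variable names are assumptions. It computes one total per category from whatever dates are present and then repeats that total on every row of the category.

```python
from collections import defaultdict

# (category, date, amount) rows copied from the question.
rows = [
    ("A", "January", 10), ("A", "February", 20), ("A", "March", 30),
    ("B", "January", 34), ("B", "February", 45), ("B", "March", 65),
    ("C", "January", 87), ("C", "February", 98), ("C", "March", 100),
    ("D", "January", 80), ("D", "February", 90),
]

# One total per category; missing dates simply contribute nothing.
totals = defaultdict(int)
for category, _date, amount in rows:
    totals[category] += amount

# Repeat the category total on each detail row.
with_totals = [(category, date, totals[category]) for category, date, _ in rows]

for row in with_totals:
    print(*row)
```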
