Get the variable name instead of the value when searching for peak - max

I have the following problem: I have measured cortisol in several test persons and now have a data frame that shows each person's cortisol value at the different measurement times. I would like to determine the maximum value (the peak) for each test person; however, I don't just need the value, but above all the point in time it belongs to, i.e. when the cortisol was at its highest. With the code I have inserted here, I can output the maximum value per person, but I would also like to output the corresponding time. Can you help me with this?
Many thanks.
This is the code I tried:
dataframe %>%
  group_by(vp_code) %>%
  summarise(max = max(Cortisol, na.rm = TRUE)) %>%
  data.frame()
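One possible way to get the time as well (a sketch, not from the original thread; it assumes the time column is called time, so adjust the name to your data):
library(dplyr)

dataframe %>%
  group_by(vp_code) %>%
  summarise(
    max_cortisol = max(Cortisol, na.rm = TRUE),
    # which.max() skips NAs and returns the position of the (first) maximum
    time_of_max  = time[which.max(Cortisol)]
  ) %>%
  data.frame()

Alternatively, slice_max(Cortisol, n = 1, with_ties = FALSE) after group_by() keeps the whole row of each person's peak, including the time column.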

Related

Writing a formula in a cell in Google Sheets that averages the results from a column derived from expected values in multiple columns

I'm an average user of Google Sheets and I've tried writing/looking up the formula I'm going for, but I haven't had any luck yet.
I have a spreadsheet with values in multiple columns, and I need to display in a single cell the average of a certain set of those values, selected according to criteria in other columns.
The flow of information would look something along the lines of:
if value in Column D=L
then
if value in Column J<$1.20
then
Find Avg of all Values in Column N
I'd need the formula to narrow its field of data each time, so the final result is the average of all the values in Column N that have a value in Column J < $1.20 and a value in Column D = L.
I feel like a dummy over here because I just can't narrow down how I should write this flow and get it to work right without adding multiple extra hidden columns. Can anyone help on this one?
I've tried writing the formula multiple different ways but haven't kept it written down to pass on.
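For reference, a conditional average like this can usually be written with AVERAGEIFS, without any helper columns (a sketch; the whole-column ranges are assumptions based on the description above):
=AVERAGEIFS(N:N, D:D, "L", J:J, "<1.2")
This averages Column N over the rows where Column D equals L and Column J is less than 1.20.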

Power Query, avg value based on the values appearing within a specified date range

Context:
I have a data set with the weights of the truck and trailer combinations coming into my site over the span of a few years. I have organized my data by season, as I am trying to show that the truck:trailer combinations in winter are noticeably heavier due to ice, snow, and mud. The theory is: if the tare weight in this season (the weight of the truck after it empties its load) is higher than its average tare weight (which I need to calculate from the data), it can be deduced that the truck:trailer combinations are coming in with extra weight that we pay for in part, since some snow/ice/mud falls off during the trailer-emptying process.
What I've done so far:
I've defined a custom date range for my seasons
I've grouped by Truck:Trailer, using Count to get a duplicates column and All Rows to keep all my details
I've filtered out every combination I've seen fewer than 50 times, as I want good representation for each truck:trailer combo so that I can better emphasize repeated patterns
I've added an index column to better keep track of the individuals before expanding the details
What I need to do:
I only want to work with truck:trailer combinations which have weighed in for all four seasons at least once
I need to find the average tare weight of the truck:trailer combinations over the combined summer and autumn range (the dry time of the year), while preserving the raw tare data for all seasons, as I eventually need to compare the winter tare values to this average.
An example of my data, and what I'd like the result to look like when I'm finished, are shown in the screenshots (pivot chart and query data).
For your first question (all seasons), you can add a column that holds the distinct count of the values in [Season] for each [Driver:Trailer]. Then filter your table on that column, keeping only the 4's. To achieve this, add the following M code to your script in the Advanced Editor, and change the step name after the final "in" to #"DistinctCount Season".
#"DistinctCount Season" = Table.Join(#"insert name previous step","Driver:Trailer",
Table.Group(#"insert name previous step", {"Driver:Trailer"},
{{"DistinctCountSeasons", each Table.RowCount(Table.Distinct(_,"Season")),
type number}}),"Driver:Trailer")
Insert the name of your previous step where indicated.
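The filter step itself is not shown above; it could look something like this (a sketch, reusing the step and column names from the snippet):
#"KeepAllSeasons" = Table.SelectRows(#"DistinctCount Season", each [DistinctCountSeasons] = 4)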
For the second question:
You can use a matrix visual for that in your report. First create a measure:
[AverageTare] = AVERAGE ( 'table'[Tare] )
Then put [Season] on Rows and [AverageTare] on Values. You can create a group (right-click on [Season] in the FIELDS pane) called [DrySeason] to combine the values for Summer and Autumn.
If that doesn't work for you, explore the AVERAGEX function.
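If the visual-level group is not an option, a dedicated measure along these lines may also work (a sketch; the table name and season labels are assumptions):
[DrySeasonAvgTare] =
CALCULATE (
    AVERAGE ( 'table'[Tare] ),
    'table'[Season] = "Summer" || 'table'[Season] = "Autumn"
)
That gives the dry-season baseline to compare the winter tare values against.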
EDIT
In Excel you can use a PivotTable. Put [Season] on Rows and [AverageTare] on Values. Right-click a value in the PivotTable, select Value Field Settings and choose Average. Then select the Seasons you want to group, right-click and select Group.
EDIT 2
To add a column in the Power Query Editor that holds the average [Tare] for the [Season] in each row, add the following steps to your script in the Advanced Editor:
#"GroupedSeasonAvg" = Table.Group(#"Insert name previous step", {"Season"}, {{"AVG", each List.Average([Tare]), type number}}),
#"JoinOnSeason" = Table.NestedJoin(#"Insert name previous step",{"Season"},GroupedSeasonAvg,{"Season"},"AVGGrouped"),
#"ExtractSeasonAVG" = Table.ExpandTableColumn(JoinOnSeason, "AVGGrouped", {"AVG"}, {"SeasonAVG"})
It works something like this:
"GroupedSeasonAvg" : Creates a table with the avereges for each [Season]
"JoinOnSeason": Creates a new column with tables joining the [Season] value for each row to [Season] in the grouped table.
#"ExtractSeasonAVG": Expand each table and keep only [AVG].

How to calculate a value using previous rows' values in Talend

I have a dataset like below.
Dataset:
Now the business logic is to find out the last paid date for each of the loans. I tried using a tMap component that calls a Java routine with a static variable last_paid_dt, which stores the transaction date whenever the daily deposit > 0. However, when the daily deposit is less than 0 the static variable does not get changed. This works fine when the amount paid is 0.
Issue - See the red highlighted values in the table below
When an amount paid is reversed a day or more later, the last paid date should come from the previous non-reversed positive amount. I was not able to get that done.
Also, when a new loan ID starts processing, I need the static variable to be reset, which is not currently happening.
If my current methodology is wrong, please help me do this in a better and more efficient way. Thanks.
Expected output:
First of all, you need to use a Map, with the key being the loanId.
You don't want to overwrite the value, i.e. if the key already exists in your map, do not overwrite it with a new value.
You can use the globalMap if you want; in that case I'd do:
globalMap.get("loan_"+loanId) != null ?
globalMap.put("loand_"+loanId,loanDate) : loanDate
then later:
globalMap.get("loan_"+loanId)
Not elegant, but it works. A more elegant approach would be to define your own map that you put into the globalMap and null out after the process, so you free up the memory. But this all depends on the complexity of your job.
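A rough sketch of that idea as a Talend routine (the class, method and field names are illustrative, not from the original job). It assumes the rows arrive sorted by loan and transaction date, and that a negative daily deposit reverses the most recent payment:

import java.util.ArrayDeque;
import java.util.Date;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class LastPaidDateTracker {

    // One history per loan, so a new loan never sees another loan's dates
    // and no static "reset" is needed when the next loan ID starts.
    private static final Map<String, Deque<Date>> paidDatesByLoan = new HashMap<>();

    // Call once per incoming row; returns the last non-reversed paid date known so far for this loan.
    public static Date lastPaidDate(String loanId, Date transactionDate, double dailyDeposit) {
        Deque<Date> history = paidDatesByLoan.computeIfAbsent(loanId, k -> new ArrayDeque<>());
        if (dailyDeposit > 0) {
            history.push(transactionDate);      // payment: becomes the new last paid date
        } else if (dailyDeposit < 0 && !history.isEmpty()) {
            history.pop();                      // reversal: fall back to the previous payment
        }
        return history.peek();                  // null until the first payment arrives
    }

    // Call at the end of the job (e.g. in a tJava) to free the memory.
    public static void clear() {
        paidDatesByLoan.clear();
    }
}

In the tMap expression you would then map something like LastPaidDateTracker.lastPaidDate(row1.loanId, row1.transactionDate, row1.dailyDeposit) to the output column (row1 being whatever your input flow is called).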

DAX - how to get average value per day where column x value is y?

I'm trying to figure out how to get the average value per day where the type is, for example, "Test". This would show the average value for all rows which have the type "Test", averaged per day, so each day would probably get a different average value. How would this be done in DAX syntax? Below is a combination of SQL and plain text which might help explain what I'm trying to achieve.
select average Values per day from mytable where type = test
I'm providing an example table that might make it easier to understand. It's quite a simple question, but I've failed to find information on how to solve it. Any help or input is highly appreciated, thanks!
You can filter the dataset to calculate the test value average.
AverageTestValues =
CALCULATE ( AVERAGE ( myTable[Values] ), FILTER ( myTable, myTable[Type] = "Test" ) )
This expression calculates the average of the Test values in the current filter context. If you use the measure in a total context, you will get the average of all rows of type Test.
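If you need the per-day averages materialised rather than driven by the visual context, one option is to iterate over the distinct dates (a sketch; it assumes a myTable[Date] column):
AverageTestValuesPerDay =
AVERAGEX (
    VALUES ( myTable[Date] ),
    CALCULATE ( AVERAGE ( myTable[Values] ), myTable[Type] = "Test" )
)
With a date on the rows of a visual, both measures return that day's average of the Test rows; at the total level, this one returns the average of the daily averages.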
Let me know if this helps.

Can I compare values in the same column in adjacent rows in PowerPivot?

I have a PowerPivot table for which I need to be able to determine how long an item was in an Error state. My data set looks something like this:
What I need to be able to do is to look at the values in the ID and State columns, and see if the value in the previous row is ERROR in the State column, and the same in the ID column. If it is, I then need to calculate the difference between the Changed Date values in those two rows.
So, for example, when I got to row 4, I would see that the value in the State column for Row 3, the previous row, is ERROR, and that the value in the ID column in the previous row is the same as the current row, so I would then calculate the difference between the Changed Date values in Row 3 and Row 4 (I don't care about the values in any of the other columns for this particular requirement).
Is there a way to do this in PowerPivot? I've done a fair amount of Internet searching, and it looks like if it can be done, it would use the EARLIER or EARLIEST DAX functions, but I can't find anything that tells me how, or even if, this can be done.
Thanks.
Chris,
I have had similar requirements many times, and after a really long time of trial and error, I finally understood how EARLIER works. It can be very powerful, but also very slow, so always check the performance of your calculations.
To answer your question, you will need to create 4 calculated columns:
1) Item Rank - used for ranking the issues with the same Item ID
=COUNTROWS(FILTER('ID', EARLIER([Item ID]) = [Item ID] && EARLIER([Date]) >= [Date]))
2) Follows Error - to easily find the issue that follows an ERROR issue
=IF([State] = "ERROR", [Item Rank] + 1)
3) Time of Following Issue - simple lookup so that you can calculate the difference
=IF([Follows Error]>0,
LOOKUPVALUE([Date], [User], [User], [Item Rank], [Follows Error]),
BLANK()
)
4) Time Diff - calculation of the time difference for the specific issue
=IF([State]="ERROR",
DAY([Time of Following Issue])-DAY([Date]),
BLANK()
)
With those calculated columns, you can then easily create a PowerPivot table: drag State and Item Id onto the ROWS pane and then simply add Time Diff to Values. You will get an overview of the issues that contain the string "ERROR" and the time it took to resolve them.
This is what it looks like in the PowerPivot window:
And the resulting Pivot table:
You can download my Excel file here (2013).
As I mentioned, be careful with performance, as calculated columns with nested EARLIER and IF conditions can be quite demanding. If there is a smarter way, I would be very happy to see it, but for now this works for me just fine.
Also, keep in mind that all the calculated columns could be nested into one, but I kept them separated to make the formulas easier to understand.
Hope this helps :-)

Resources