How to take, individually, the first and last value from groupby columns?

Inside a for loop, I am pulling the first and last value from multiple columns, so that I can use each of these values to create a scalar value that is then mathematically applied to create a new column.
My workaround was to limit the selection to the head or tail of each column, and then turn it into a numeric type (int or float) by using min() or max():
for title, group in df.groupby('Test'):
    x1 = min(group["Test Reading"].head(1))
    x2 = max(group["Test Reading"].tail(1))
    x3 = min(group["Test Point"].head(1))
    x4 = max(group["Test Point"].tail(1))
    R = (x2 - x1) / (x4 - x3)  # linearization scalar
    group['Test Point Error'] = 100 * (group['Test Reading'] - (group['Test Point'] * R + x1)) / (x2 - x1)
There are other problems with the code, but I've attempted to address those in another question (How do I use .loc with groupby so that creating a new column based on grouped data won't be considered a copy?).

First we group the data by Test and use .agg to get the first and last values of Test Reading and Test Point. We save the grouped df to a new variable, df_agg.
df_agg = df.groupby("Test").agg({"Test Reading": ["first", "last"], "Test Point": ["first", "last"]})
We flatten the MultiIndex in the columns:
df_agg.columns = ["_".join(c) for c in df_agg.columns]
We make your calculation for R:
df_agg["R"] = (df_agg["Test Reading_last"] - df_agg["Test Reading_first"]) / (df_agg["Test Point_last"] - df_agg["Test Point_first"])
We merge the aggregate df back into the original one on the value of Test, so that R is available for each row. Note that merge returns a new DataFrame, so the result has to be assigned:
df = df.merge(df_agg, on="Test")
Now you have the value of R and x1 ... x4 available for all rows and you can run your formula.
df['Test Point Error'] = 100 * (df['Test Reading'] - (df['Test Point'] * df["R"] + df["Test Reading_first"])) / (df["Test Reading_last"] - df["Test Reading_first"])
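The steps above can be put together as one runnable sketch. The sample values are made up for illustration; only the column names ("Test", "Test Reading", "Test Point") come from the question. As an addition that is not part of the answer above, the second half shows groupby(...).transform, which broadcasts each group's first and last value onto every row and so avoids the merge.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Test": ["a", "a", "a", "b", "b", "b"],
    "Test Point": [0.0, 5.0, 10.0, 0.0, 50.0, 100.0],
    "Test Reading": [1.0, 6.5, 11.0, 2.0, 50.0, 102.0],
})

# Approach from the answer: aggregate first/last per group, then merge back.
df_agg = df.groupby("Test").agg(
    {"Test Reading": ["first", "last"], "Test Point": ["first", "last"]}
)
df_agg.columns = ["_".join(c) for c in df_agg.columns]
df_agg["R"] = (df_agg["Test Reading_last"] - df_agg["Test Reading_first"]) / (
    df_agg["Test Point_last"] - df_agg["Test Point_first"]
)
merged = df.merge(df_agg, on="Test")  # "Test" is the index name of df_agg
merged["Test Point Error"] = (
    100
    * (merged["Test Reading"]
       - (merged["Test Point"] * merged["R"] + merged["Test Reading_first"]))
    / (merged["Test Reading_last"] - merged["Test Reading_first"])
)

# Alternative (an assumption, not from the answer): transform broadcasts the
# per-group first/last values to every row, so no merge is needed.
g = df.groupby("Test")
x1 = g["Test Reading"].transform("first")
x2 = g["Test Reading"].transform("last")
x3 = g["Test Point"].transform("first")
x4 = g["Test Point"].transform("last")
r = (x2 - x1) / (x4 - x3)  # linearization scalar, one value per row
df["Test Point Error"] = 100 * (df["Test Reading"] - (df["Test Point"] * r + x1)) / (x2 - x1)

print(df)
```

Both variants should give the same error column. If you only need the scalar inside a loop, group["Test Reading"].iloc[0] and .iloc[-1] return the first and last value directly, without the min()/max() workaround.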

Related

Power BI DAX measure: Count occurrences of a value in a column considering the filter context of the visual

I want to count the occurrences of values in a column. In my case the value I want to count is TRUE().
Let's say my table is called Table and has two columns:
boolean   value
TRUE()    A
FALSE()   B
TRUE()    A
TRUE()    B
All solutions I found so far are like this:
count_true = COUNTROWS(FILTER(Table, Table[boolean] = TRUE()))
The problem is that I still want the visual (a card) that displays the measure to respect the filters coming from the slicers. So if I have a slicer that is set to value = A, the card with the count_true measure should show 2 and not 3.
As far as I understand, the FILTER function always overwrites the visual's filter context.
To further explain my intent: at an earlier point the TRUE/FALSE column had the values 1/0, and I could achieve my goal by just using the SUM function, which does not specify a filter context and simply acts within the visual's filter context.
I think the DAX you gave should work as long as it's a measure, not a calculated column. (Calculated columns cannot read filter context from the report.)
When evaluating the measure,
count_true = COUNTROWS ( FILTER ( Table, Table[boolean] = TRUE() ) )
the first argument inside FILTER is not necessarily the full table but that table already filtered by the local filter context (including report/page/visual filters along with slicer selections and local context from e.g. the rows/columns of a matrix visual).
So if you select Value = "A" via slicer, then the table in FILTER is already filtered to only include "A" values.
I do not know for sure if this will fix your problem, but it is more efficient DAX in my opinion:
count_true = CALCULATE(COUNTROWS(Table), Table[boolean] = TRUE())
If you still have the issue after changing your measure to use this format, you may have an underlying issue with the model. There is also the function KEEPFILTERS that may apply here but I think using KEEPFILTERS is overcomplicating your case.
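The evaluation order described above can be modelled in a few lines of plain Python. This is only an analogy for how the slicer narrows the table before FILTER iterates it, not DAX and not how the engine is implemented; the function name and its slicer_value parameter are hypothetical.

```python
# Sample table from the question.
rows = [
    {"boolean": True, "value": "A"},
    {"boolean": False, "value": "B"},
    {"boolean": True, "value": "A"},
    {"boolean": True, "value": "B"},
]

def count_true(table, slicer_value=None):
    """Model of the measure: the filter context is applied before FILTER runs."""
    # Step 1: the slicer (filter context) narrows the table first.
    visible = [r for r in table if slicer_value is None or r["value"] == slicer_value]
    # Step 2: FILTER only iterates the rows that are still visible.
    return sum(1 for r in visible if r["boolean"])

print(count_true(rows))       # no slicer selection: 3
print(count_true(rows, "A"))  # slicer set to value = A: 2
```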

Simpler alternative to simultaneously Sort and Filter by column in Google Spreadsheets

I have a spreadsheet (here's a copy) with the following (headered) columns:
A: Indices for a list of groceries;
B: Names for the groceries to be indexed by column A;
C: Check column with "x" for inactive items in column B, empty otherwise;
D: Sorting indices that I want to apply to column B;
Currently, I am getting the sorted AND filtered result with this formula:
=SORT(FILTER(B2:B; C2:C = ""); FILTER(D2:D; C2:C = ""); TRUE)
The problem is that I need to apply the filter two times: one for the items and one for the indices, otherwise I get a mismatch between elements for the Sort function.
I feel that this doesn't scale well since it creates duplication.
Is there a way to get the same results with a simpler formula or another arrangement of columns?
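You can join the two columns into a single array and filter that once (in locales that use ; as the argument separator, \ separates the columns inside { }):
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D=""))
To sort by the second column of that array in ascending order:
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D="");2;1)
or maybe: =SORT(FILTER(Itens!B2:B; Itens!D2:D="");2;1)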

PowerPivot DAX Max of two values

I have two columns and I need to extract the maximum value of those two for every row in my table. I have looked at MAX, MAXX and MAXA, but they all take just one column as input.
How would I write the following expression in a calculated column:
=max(
    Table1[Column1],
    Table1[Column2]
)
Actually, you should write the formula exactly as you described:
=max(
    Table1[Column1],
    Table1[Column2]
)
The MAX function in DAX has two forms: one takes a single column, the other takes two scalar expressions.
Instead of MAX, you can just use a simple IF to achieve what you want:
= IF(Table1[Column1] >= Table1[Column2], Table1[Column1], Table1[Column2])

How to filter clickhouse table by array column contents?

I have a clickhouse table that has one Array(UInt16) column. I want to be able to filter results from this table to only get rows where the values in the array column are above a threshold value. I've been trying to achieve this using some of the array functions (arrayFilter and arrayExists) but I'm not familiar enough with the SQL/Clickhouse query syntax to get this working.
I've created the table using:
CREATE TABLE IF NOT EXISTS ArrayTest (
    date Date,
    sessionSecond UInt16,
    distance Array(UInt16)
) Engine = MergeTree(date, (date, sessionSecond), 8192);
Where the distance values will be distances from a certain point at a certain amount of seconds (sessionSecond) after the date. I've added some sample values so the table looks like the following:
Now I want to get all rows which contain distances greater than 7. I found the array operators documentation here and tried the arrayExists function but it's not working how I'd expect. From the documentation, it says that this function "Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0". But when I run the query below I get three zeros returned where I should get a 0 and two ones:
SELECT arrayExists(
    val -> val > 7,
    arrayEnumerate(distance))
FROM ArrayTest;
Eventually I want to perform this select and then join it with the table contents to only return rows that have an exists = 1 but I need this first step to work before that. Am I using the arrayExists wrong? What I found more confusing is that when I change the comparison value to 2 I get all 1s back. Can this kind of filtering be achieved using the array functions?
Thanks
You can use arrayExists in the WHERE clause.
SELECT *
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance) = 1;
Another way is to use ARRAY JOIN, if you need to know which values are greater than 7:
SELECT d, distance, sessionSecond
FROM ArrayTest
ARRAY JOIN distance AS d
WHERE d > 7;
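I think the reason why you get 3 zeros is that arrayEnumerate returns the array indexes (1, 2, ..., length), not the array values. Since none of your rows has more than 7 elements, no index is greater than 7 and arrayExists returns 0 for all the rows. That is also why a threshold of 2 gives all 1s: every row has an index greater than 2.
To make this work with arrayEnumerate, look up the value at each index:
SELECT arrayExists(
    val -> distance[val] > 7,
    arrayEnumerate(distance))
FROM ArrayTest;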
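The difference between testing indexes and testing values can be reproduced in a small Python model. The two helper functions only imitate the ClickHouse functions of the same name, and the sample arrays are hypothetical because the question's sample data did not survive.

```python
def array_enumerate(arr):
    # Imitates ClickHouse arrayEnumerate: 1-based positions [1, 2, ..., length(arr)].
    return list(range(1, len(arr) + 1))

def array_exists(func, arr):
    # Imitates ClickHouse arrayExists: 1 if func is true for any element, else 0.
    return int(any(func(x) for x in arr))

# Hypothetical sample rows of the distance column.
distances = [[1, 2, 3], [5, 8, 9], [4, 10, 6]]

# The query from the question: the lambda sees indexes, never values.
on_indexes = [array_exists(lambda i: i > 7, array_enumerate(d)) for d in distances]

# Testing the values directly, as in the WHERE clause above.
on_values = [array_exists(lambda x: x > 7, d) for d in distances]

# Looking up the value behind each index (ClickHouse arrays are 1-based).
via_lookup = [array_exists(lambda i: d[i - 1] > 7, array_enumerate(d)) for d in distances]

print(on_indexes)  # [0, 0, 0]
print(on_values)   # [0, 1, 1]
print(via_lookup)  # [0, 1, 1]
```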

Core Data: is it possible to create a view like you would with normal SQL?

In the normal SQL world you would use CREATE VIEW ... to define a view on one or more tables, e.g. to get a join that is already grouped. Is that also possible somehow in Core Data?
The reason I'm asking is that I have a table with details. Each detail record has two keys and an amount. Now I need to show the sum of the amounts grouped by the two keys in a table view, i.e. the first key as the section and the second as a normal entry with the summed amount. I thought an FRC would work, but it does not group (add up) the detail records. With a normal fetch request I can group and get everything, but it seems to be a lot of work to handle the sections manually. So I thought the best approach would be to put a view on the table and use the FRC to bring it into the table view. Does that make sense? Any help is very much appreciated.
Example:
I have three fields:
A  X  2
A  X  2
A  Z  3
B  X  2
B  Y  2
B  Y  1
B  Z  8
As a result I need:
Section: A
  X  4
  Z  3
Section: B
  X  2
  Y  3
  Z  8
I am not sure if there is a shorter answer, but here's how you can do it.
I'll assume the first, second and third columns are called firstCol, secondCol and thirdCol.
You can use this predicate to fetch all objects for "A" and put them in resultArray. To cover every section you would loop over the first-column values; for "A" it looks like this:
NSPredicate *aPredicate = [NSPredicate predicateWithFormat:@"firstCol == %@", @"A"];
Then find all the second-column letters for the objects that have A in the first column (resultArray):
NSArray *allLetters = [resultArray valueForKeyPath:@"@distinctUnionOfObjects.secondCol"];
In the case of "A", allLetters will include X and Z. Then loop over allLetters and add up the third column:
for (NSString *letter in allLetters) {
    // Keep only the objects whose second column matches this letter.
    NSPredicate *letterPredicate = [NSPredicate predicateWithFormat:@"secondCol == %@", letter];
    NSArray *matches = [resultArray filteredArrayUsingPredicate:letterPredicate];
    // Sum the third column, e.g. this returns 4 for X in the case of "A".
    NSNumber *sum = [matches valueForKeyPath:@"@sum.thirdCol"];
    // Insert the sum into an array and then a dictionary that can be used as the data source of the table.
}
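The grouping logic itself, independent of Core Data, can be sketched in Python to check the expected sums. The rows are the seven sample records from the question; the variable names are made up, and this is not a Core Data API example.

```python
from collections import defaultdict

# The seven detail records from the question: (first key, second key, amount).
records = [
    ("A", "X", 2), ("A", "X", 2), ("A", "Z", 3),
    ("B", "X", 2), ("B", "Y", 2), ("B", "Y", 1), ("B", "Z", 8),
]

# sections[first key][second key] -> summed amount
sections = defaultdict(lambda: defaultdict(int))
for first, second, amount in records:
    sections[first][second] += amount

# Print one section per first key, one row per second key.
for first in sorted(sections):
    print(f"Section: {first}")
    for second in sorted(sections[first]):
        print(f"  {second} {sections[first][second]}")
```

The nested dictionary has the same shape as the data source the answer suggests building: one entry per section, each holding the summed rows.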
