How do you create a new column based on the max value of one column and the category of another?

I am working with a bunch of data for my job creating status reports on the documents that we are working through that we then assign to an area. We decided to use PowerBI as an interactive way to see where everything is at.
Using Power BI Desktop I've created a new table that excludes documents that are not ready for QC but we have several different statuses. Instead of creating a new table for each status type (since some can be grouped together) I would like to create a new column that has the grouped status value's Max for each area. The higher the Status Value the further it is from being complete.
EX:
Record   Area   Status Value   Max Status Value
152385   A      1              2
354354   B      2              3
131322   B      3              3
132136   A      2              2
213513   A      1              2
351315   B      2              3
If anyone knows how to get the Max Status Value column that would greatly help. I did find another post (https://community.powerbi.com/t5/Desktop/LOOKUPVALUE-return-min-max-of-values-found/td-p/657534) that was similar but I'm still new to DAX and could not figure out how to apply it to my situation.

This post actually helped me answer the question.
https://community.powerbi.com/t5/Power-Query/Maxifs-Power-Query/m-p/1693606
The only difference I made was getting rid of the true/false portion to receive my results. Thus my result was:
Max Status Value =
VAR vMaxVal =
    CALCULATE (
        MAX ( 'Table'[Status Value] ),
        ALLEXCEPT (
            'Table',
            'Table'[Area]
        )
    )
RETURN
    vMaxVal
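For anyone who wants to check the result outside Power BI, here is a minimal Python sketch of what the column computes: the highest Status Value within each Area, repeated on every row of that Area. It is only an illustration of the logic using the rows from the question; the variable names are made up and this is not how DAX evaluates the expression internally.

```python
# Rows from the question: (record, area, status value).
rows = [
    (152385, "A", 1),
    (354354, "B", 2),
    (131322, "B", 3),
    (132136, "A", 2),
    (213513, "A", 1),
    (351315, "B", 2),
]

# Highest status value seen in each area (the "group max").
max_by_area = {}
for _, area, status in rows:
    max_by_area[area] = max(status, max_by_area.get(area, status))

# Attach the area's maximum to every row, as the calculated column does.
with_max = [(rec, area, status, max_by_area[area]) for rec, area, status in rows]

for row in with_max:
    print(row)
```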

Related

How to find the maximum value from a column satisfying two or more IF conditions in DAX

I am a newbie to Power BI and DAX.
I have a dataset as attached. I need to find the maximum value for each person for each week. I have written the formula in Excel.
=MAX(IF(A$2:A$32=A2,IF(D$2:D$32=D2,IF(B$2:B$32=1,C$2:C$32))))
How can I convert it to DAX or write the same formula in Power BI? I tried the DAX code below, but it did not work (the ALLEXCEPT function expects a table).
Weekly Maximum =
CALCULATE ( MAX ( PT[Value] ), ALLEXCEPT ( PT, PT[person], PT[Week],
PT[category] ==1 ) )
Once I calculate this, then I need to calculate the Expected value for each week, that has the maximum value of the previous week * 2.85, as shown in the screenshot. How can I put the previous week's maximum value for this week?
Any corrections/solutions, please?
TIA
The Max Value for Category 1 can be written like this:
= CALCULATE(MAX(PT[Value]),
      ALLEXCEPT(PT, PT[Person], PT[Week]),
      PT[Category] = 1)
(The Category filter doesn't go inside ALLEXCEPT().)
For your Expected Value column, you can do something similar:
= CALCULATE(2.85 * MAX(PT[Value]),
      ALLEXCEPT(PT, PT[Person]),
      PT[Category] = 1,
      PT[Week] = EARLIER(PT[Week]) - 1)
(The EARLIER function gives you the value for the row you are in. The name refers to the earlier row context.)
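To make the two formulas concrete, here is a small Python sketch of the same logic. The rows are hypothetical (the original dataset was only attached as a screenshot), and `weekly_max` and `expected_value` are illustrative names, not anything from Power BI. It assumes weeks are consecutive integers, as the `- 1` in the DAX implies.

```python
# Hypothetical rows: (person, week, category, value).
rows = [
    ("Ann", 1, 1, 10.0),
    ("Ann", 1, 1, 14.0),
    ("Ann", 1, 2, 99.0),  # category 2 is ignored by the filter
    ("Ann", 2, 1, 20.0),
    ("Bob", 1, 1, 7.0),
]

# Weekly maximum per (person, week), restricted to category 1.
weekly_max = {}
for person, week, category, value in rows:
    if category == 1:
        key = (person, week)
        weekly_max[key] = max(value, weekly_max.get(key, value))

def expected_value(person, week):
    """2.85 times the same person's category-1 maximum from the previous week."""
    previous = weekly_max.get((person, week - 1))
    return None if previous is None else 2.85 * previous

print(weekly_max)
print(expected_value("Ann", 2))
```

A person's first week has no previous week, so the sketch returns None there, much as the DAX column would be blank.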

Making DAX code more efficient - counting unique Start dates in overlapping date ranges

I have a table of every product purchased by every client over 25 years. The table contains client#, product, start date, and end date.
The products can be owned by the client for any amount of time (1 day to 100 years). While the client owns products with us, the client is active. If a client ends all products they cease to be a client. I want to count new client starts each year. The problem is, some clients end all products then start purchasing products again years later (but clients always retain the same client#). If the client leaves then rejoins years later, I want to count the client as a new client.
I have created DAX code to do this which works perfectly on a small file, but the code uses up too many resources and so I cannot use it on my data (about 200,000 records). I know my code is HIGHLY INEFFICIENT and could probably be cleaned up...but I am not sure how. Alternately, if I could figure out how to make these columns in PowerQuery, perhaps that would work
Here is how I do it.
1) Add four calculated columns to my table:
VeryFirstStart = Calculate(
    Min('Products'[StartDate]),
    ALLEXCEPT(Products,Products[ClientNumber]))=Products[StartDate]
This flags records that contain the first ever start date of any client.
MaxEndDateofEarlierDates = Calculate(
    Max('Products'[EndDate]),
    Filter(
        Filter(ALLEXCEPT(Products, Products[ClientNumber]), Products[EndDate]),
        Products[StartDate] < EARLIER(Products[StartDate])))
This step blows up my PowerBI - this shows the date of any NEW product purchases where the new start date occurs AFTER an ending date.
Second+Start = And(
    Products[MaxEndDateofEarlierDates]<>BLANK(),
    Products[MaxEndDateofEarlierDates]<Products[StartDate])
This flags records where we want to count the new start date as a new client.
NewStart = OR(Products[Second+Start],Products[VeryFirstStart])
This flags ANY new client start date regardless of whether it was the first or a subsequent one.
Finally I added this measure:
!MemberNewStarts = CALCULATE(
    DISTINCTCOUNT(Products[ClientNumber]),
    FILTER(
        'Products',
        ('Products'[StartDate] <= LASTDATE('DIMDate'[Date]) &&
         'Products'[StartDate] >= FIRSTDATE('DIMDate'[Date]) &&
         Products[NewStart]=TRUE())))
Does anyone have any suggestions about how to achieve this with less resources?
Thanks
Here is some data to try
MemberNumber Product StartDate EndDate Note (not in real data)
1 A 02/02/2003 02/02/2004
1 C 02/02/2009 02/02/2010
2 A 02/02/2001 02/02/2002
2 C 02/02/2001 02/02/2002
2 B 02/02/2005 02/02/2010
3 C 02/02/2002 02/02/2005
3 B 02/02/2002 02/02/2005
3 A 02/02/2003 02/02/2008
4 B 02/02/2002 02/02/2003
4 C 02/02/2003 02/02/2006
5 B 02/02/2003 02/02/2007
5 C 02/02/2005 02/02/2010
5 A 02/02/2005 02/02/2007
6 A 02/02/2001 02/02/2006
6 C 02/02/2003 02/02/2007
7 B 02/02/2001 02/02/2004
7 A 02/02/2001 02/02/2005
7 C 02/02/2005 02/02/2006
8 B 02/02/2002 02/02/2006
8 A 02/02/2004 02/02/2009
note member 1 starts as a new client in 2009 since all previous products ended in 2004 and member 2 starts as a new client in 2005 since all previous products ended in 2002
The desired outcome is:
Start Year 2001 2002 2003 2004 2005 2006 2007 2008
New Clients 3 3 2 0 1 0 0 0
Here's one way of trying to solve it. Let me know if this is any more efficient than yours:
1st New Column:
PreviousHighestFinish:=
Calculate(
    Max(Products[EndDate]),
    ALLEXCEPT(Products,Products[ClientNumber]),
    Products[StartDate] < Earlier(Products[StartDate])
)
This will give you the latest end date where the Client Number matches and the start date is before the current start date. If there is no earlier start date, it returns a blank.
2nd New Column:
NewClientProduct:=
if(Products[StartDate]>Products[PreviousHighestFinish],1,0)
This will give you a 1 for every row where the client has either not been seen before (and the previous column showed blank) or the client has been seen before, but has no current products. The comparison is strict because the desired outcome in the question does not count a product that starts on the same day the previous one ends as a new start (members 4 and 7 in the sample data).
The problem with this column is that if you have a client starting more than one product on the same date, they will show as multiple new clients.
The fix for this is to count up the instances of each client-date combination
3rd New Column:
ClientDateCount:=
CALCULATE(
COUNTROWS(Products),
ALLEXCEPT(Products,Products[ClientNumber],Products[StartDate])
)
This essentially gives the number of times that the client on this row in the table has started a product on this date.
Now divide the 2nd new column by this one
4th New Column:
NewClients:=
DIVIDE(Products[NewClientProduct],Products[ClientDateCount])
And voila: summing NewClients by start year gives the number of new clients per year.
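As a cross-check of the logic outside Power BI, here is a Python sketch that applies the same rule to the sample data from the question: a start counts as new when the client has no product with an earlier start date, or when the start falls strictly after the latest end date among those earlier products. The strict comparison and the function name `new_starts_per_year` are assumptions chosen to reproduce the desired outcome listed in the question; products starting on the same day are counted once.

```python
from collections import Counter
from datetime import date

# Sample data from the question: (client, product, start year, end year).
# Every date in the sample falls on 2 February.
raw = [
    (1, "A", 2003, 2004), (1, "C", 2009, 2010),
    (2, "A", 2001, 2002), (2, "C", 2001, 2002), (2, "B", 2005, 2010),
    (3, "C", 2002, 2005), (3, "B", 2002, 2005), (3, "A", 2003, 2008),
    (4, "B", 2002, 2003), (4, "C", 2003, 2006),
    (5, "B", 2003, 2007), (5, "C", 2005, 2010), (5, "A", 2005, 2007),
    (6, "A", 2001, 2006), (6, "C", 2003, 2007),
    (7, "B", 2001, 2004), (7, "A", 2001, 2005), (7, "C", 2005, 2006),
    (8, "B", 2002, 2006), (8, "A", 2004, 2009),
]
rows = [(c, date(s, 2, 2), date(e, 2, 2)) for c, _, s, e in raw]

def new_starts_per_year(rows):
    """Count new-client starts per year with one sorted pass per client."""
    by_client = {}
    for client, start, end in rows:
        by_client.setdefault(client, []).append((start, end))

    counts = Counter()
    for spans in by_client.values():
        spans.sort()
        latest_end = None  # latest end among products with an earlier start
        i = 0
        while i < len(spans):
            start = spans[i][0]
            j = i
            while j < len(spans) and spans[j][0] == start:
                j += 1  # group products that start on the same day
            if latest_end is None or start > latest_end:
                counts[start.year] += 1
            group_end = max(end for _, end in spans[i:j])
            latest_end = group_end if latest_end is None else max(latest_end, group_end)
            i = j
    return counts

counts = new_starts_per_year(rows)
print([counts[year] for year in range(2001, 2009)])
```

Because each client's rows are sorted once and scanned once, this avoids comparing every row with every other row of the same client.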

How to filter ClickHouse table by array column contents?

I have a ClickHouse table that has one Array(UInt16) column. I want to be able to filter results from this table to only get rows where the values in the array column are above a threshold value. I've been trying to achieve this using some of the array functions (arrayFilter and arrayExists) but I'm not familiar enough with the SQL/ClickHouse query syntax to get this working.
I've created the table using:
CREATE TABLE IF NOT EXISTS ArrayTest (
date Date,
sessionSecond UInt16,
distance Array(UInt16)
) Engine = MergeTree(date, (date, sessionSecond), 8192);
Where the distance values will be distances from a certain point at a certain amount of seconds (sessionSecond) after the date. I've added some sample values so the table looks like the following:
Now I want to get all rows which contain distances greater than 7. I found the array operators documentation here and tried the arrayExists function but it's not working how I'd expect. From the documentation, it says that this function "Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0". But when I run the query below I get three zeros returned where I should get a 0 and two ones:
SELECT arrayExists(
val -> val > 7,
arrayEnumerate(distance))
FROM ArrayTest;
Eventually I want to perform this select and then join it with the table contents to only return rows that have an exists = 1 but I need this first step to work before that. Am I using the arrayExists wrong? What I found more confusing is that when I change the comparison value to 2 I get all 1s back. Can this kind of filtering be achieved using the array functions?
Thanks
You can use arrayExists in the WHERE clause.
SELECT *
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance) = 1;
Another way is to use ARRAY JOIN, if you need to know which values are greater than 7:
SELECT d, distance, sessionSecond
FROM ArrayTest
ARRAY JOIN distance as d
WHERE d > 7
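I think the reason why you get 3 zeros is that arrayEnumerate returns the array indexes (1, 2, ..., length), not the array values. Since none of your rows have more than 7 elements, no index is greater than 7 and arrayExists returns 0 for all the rows. That is also why a threshold of 2 gives all 1s: every array has more than two elements.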
To make this work,
SELECT arrayExists(
val -> distance[val] > 7,
arrayEnumerate(distance))
FROM ArrayTest;
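The difference between testing indexes and testing values can be reproduced in a few lines of Python. This is only a model of the behaviour described above: the arrays are hypothetical (the question's sample rows were posted as an image), and `array_enumerate` and `array_exists` are stand-ins written for this sketch, not ClickHouse client calls.

```python
# Hypothetical distance arrays: one row with no value above 7, two rows with one.
distances = [[1, 2, 3], [4, 9, 2], [8, 1, 5]]

def array_enumerate(arr):
    # Models arrayEnumerate: the 1-based positions, not the values.
    return list(range(1, len(arr) + 1))

def array_exists(pred, arr):
    # Models arrayExists: 1 if any element satisfies the predicate, else 0.
    return 1 if any(pred(x) for x in arr) else 0

# The query from the question: the lambda sees indexes 1..3, never > 7.
on_indexes = [array_exists(lambda i: i > 7, array_enumerate(d)) for d in distances]

# Same query with threshold 2: every array has a third element, so all 1s.
on_indexes_2 = [array_exists(lambda i: i > 2, array_enumerate(d)) for d in distances]

# Testing the values directly gives the expected answer.
on_values = [array_exists(lambda x: x > 7, d) for d in distances]

# Indexing back into the array, as in the corrected query, also works.
via_index = [array_exists(lambda i: d[i - 1] > 7, array_enumerate(d)) for d in distances]

print(on_indexes, on_indexes_2, on_values, via_index)
```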

SOQL - single row per each group

I have the following SOQL query to display List of ABCs in my Page block table.
Public List<ABC__c> getABC(){
    List<ABC__c> ListABC = [Select WB1__c, WB2__c, WB3__c, Number, tentative__c, Actual__c, PrepTime__c, Forecast__c from ABC__c ORDER BY WB3__c];
    return ListABC;
}
As you can see in the above image, WB3 has number of records for A, B and C. But I want to display only 1 record for each WB3 group based on Actual__c. Only latest Actual__c must be displayed for each WB3 Group.
i.e., Ideally I want to display only 3 rows(one each for A,B,C) in this example.
For this, I have used GROUPBY and displayed the result using AggregateResults. Here is the result.
I got the Latest Actual Date for each WB3 as shown above. But the Tentative date is not corresponding to it. The Tentative Date is also the MAX in the list.
Here is the code I used
public List<SiteMonitoringOverview> getSPM(){
    AggregateResult[] AgR = [Select WB_3__c, MAX(Tentaive_Date__c) dtTentativeDate , MAX(Actual_Date__c) LatestCDate FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
    if(AgR.size()>0){
        for(AggregateResult SalesList : AgR){
            CustSumList.add(new SiteMonitoringOverview(String.ValueOf(SalesList.get('WB_3__c')), String.valueOf(SalesList.get('dtTentativeDate')), String.valueOF(SalesList.get('LatestCDate')) ));
        }
    }
    return CustSumList;
}
I am forced to use MAX() for tentative date. I want the corresponding Tentative date of the MAX Actual Date. Not the Max Tentative Date.
For group A, the Tentative Date of Max Actual Date is 12/09/2012. But it is displaying the MAX tentative date: 27/02/2013. It should display 12/09/2012. This is because I am using MAX(Tentative_Date__c) in my code. Every column in the SOQL query must be either GROUPED or AGGREGATED. That's weird.
How do I get the required 3 rows in this example?
Any suggestions? Any different approach (looping within in groups)? how?
Just ran into this issue myself. The solution I came up with only works if you want the oldest or newest record from each grouping. Unfortunately it probably won't work in your case. I'll still leave this here in case it does happen to help someone searching for a solution to this issue.
AggregateResult[] groupedResults = [Select Max(Id), WBS_3__c FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
Calling MAX or MIN on the Id will let you get 1 record per group. You can then query the other fields for those Ids. In my case I just needed 1 record from each group and didn't really care which one it was.
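The "looping within groups" approach the question asks about can be sketched as follows. This is a Python illustration of the idea only, with hypothetical records: fetch the plain (non-aggregated) rows, then keep, for each group, the whole row with the latest actual date, so the tentative date comes from that same row. The same loop could presumably be written in Apex over an ordinary query result, but that translation is not shown or tested here.

```python
from datetime import date

# Hypothetical records: (WB3 group, tentative date, actual date).
records = [
    ("A", date(2013, 2, 27), date(2012, 5, 1)),
    ("A", date(2012, 9, 12), date(2012, 10, 3)),
    ("B", date(2012, 7, 1), date(2012, 7, 15)),
    ("C", date(2012, 8, 20), date(2012, 8, 25)),
    ("C", date(2012, 6, 5), date(2012, 6, 9)),
]

# Keep the full row with the latest actual date for each group.
latest = {}
for record in records:
    group, _, actual = record
    if group not in latest or actual > latest[group][2]:
        latest[group] = record

# One row per group, in group order.
result = [latest[group] for group in sorted(latest)]
for row in result:
    print(row)
```

For group A this keeps the tentative date 12/09/2012 that belongs to the latest actual date, rather than the maximum tentative date 27/02/2013.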

Azure Table Storage - PartitionKey and RowKey selection to use between query

I am a total newbie with Azure! The purpose is to return the rows based on the timestamp stored in the RowKey. As there is a transaction cost with each query, I want to minimize the number of transactions/queries whilst maintaining performance.
These are the proposed Partition and Row Keys:
Partition Key: TextCache_(AccountID)_(ParentMessageId)
Row Key: (DateOfMessage)_(MessageId)
Legend:
AccountId - is an integer
ParentMessageId - The parent messageId if there is one, blank if it is the parent
DateOfMessage - Date the message was created - format will be DateTime.Ticks.ToString("d19")
MessageId - the unique Id of the message
I would like to get back from a single query the rows and any childrows that is > or < DateOfMessage_MessageId
Can this be done via my proposed PartitionKeys and RowKeys?
i.e. (in pseudo code)
var results = ctx.PartitionKey.StartsWith(TextCache_AccountId)
    && ctx.RowKey > (TimeStamp)_MessageId
Secondly, if I have a number of accounts and only want to return the first 10 rows, could it be done via a single query?
i.e. (in pseudo code)
var results = (
    (
        ctx.PartitionKey.StartsWith(TextCache_(AccountId1))
        && ctx.RowKey > (TimeStamp1)_MessageId1
    )
    ||
    (
        ctx.PartitionKey.StartsWith(TextCache_(AccountId2))
        && ctx.RowKey > (TimeStamp2)_MessageId2
    )
    ...
)
.Take(10)
The short answer to your questions is yes, but there are some things you need to watch for.
Azure table storage doesn't have a direct equivalent of .StartsWith(). If you're using the storage library in combination with LINQ you can use .CompareTo() instead (> and < don't translate properly). Because that is a range comparison rather than a prefix match, if you run a search for account 1 and ask the query to return 1000 results, but there are only 600 results for account 1, the last 400 results will be for account 10 (the next account number lexically). So you'll need to be a bit smart about how you deal with your results.
If you padded out the account id with leading 0s you could do something like this (pseudo code here as well)
ctx.PartitionKey > "TextCache_0000000001"
    && ctx.PartitionKey < "TextCache_0000000002"
    && ctx.RowKey > "123465798"
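The effect of padding is easy to see with plain string comparison. The Python sketch below only illustrates lexical ordering and does not call Azure: `partition_key` and `in_account_range` are hypothetical helpers, the 10-digit width matches the pseudo code above, and the `_42` suffix stands in for a ParentMessageId.

```python
def partition_key(account_id, width=10):
    # Zero-pad the account id so string order matches numeric order.
    return f"TextCache_{account_id:0{width}d}"

# Without padding, account 10 sorts between accounts 1 and 2.
unpadded = sorted(f"TextCache_{n}" for n in (1, 2, 10))

# With padding, the keys sort in numeric order.
padded = sorted(partition_key(n) for n in (1, 2, 10))

def in_account_range(key, account_id):
    # Same shape as the pseudo query: strictly between this account's
    # prefix and the next account's prefix.
    return partition_key(account_id) < key < partition_key(account_id + 1)

print(unpadded)
print(padded)
print(in_account_range("TextCache_0000000001_42", 1))
print(in_account_range("TextCache_0000000010_42", 1))
```

With padded ids, the range for account 1 no longer picks up rows that belong to account 10.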
Something else to bear in mind is that queries to Azure Tables return their results in PartitionKey then RowKey order. So in your case messages without a ParentMessageId will be returned before messages with a ParentMessageId. If you're never going to query this table by ParentMessageId I'd move this to a property.
If TextCache_ is just a string constant, it's not adding anything by being included in the PartitionKey unless this will actually mean something to your code when it's returned.
While your second query will run, I don't think it will produce what you're after. If you want the first ten rows in DateOfMessage order, then it won't work (see my point above about sort orders). If you ran this query as it is and account 1 had 11 messages, it would return only the first 10 messages related to account 1, regardless of whether account 2 had an earlier message.
While trying to minimise the number of transactions you use is good practice, don't be too concerned about it. The cost of running your worker/web roles will dwarf your transaction costs. 1,000,000 transactions will cost you $1 which is less than the cost of running one small instance for 9 hours.

Resources