Strange behaviour when using FILTER to filter a different table with no direct relationship? - filter

I have two facts tables, First and Second, and two dimension tables, dimTime and dimColour.
Fact table First looks like this:
and facet table Second looks like this:
Both dim-tables have 1:* relationships to both fact tables and the filtering is one-directional (from dim to fact), like this:
dimColour[Color] 1 -> * First[Colour]
dimColour[Color] 1 -> * Second[Colour]
dimTime[Time] 1 -> * First[Time]
dimTime[Time] 1 -> * Second[Time_]
Adding the following measure, I would expect the FILTER-functuion not to have any affect on the calculation, since Second does not filter First, right?
Test_Alone =
CALCULATE (
SUM ( First[Amount] );
First[Alone] = "Y";
FILTER(
'Second';
'Second'[Colour]="Red"
)
)
So this should evaluate to 7, since only two rows in First have [Alone] = "Y" with values 1 and 6 and that there is no direct relationship between First and Second. However, this evaluates to 6. If I remove the FILTER-function argument in the calculate, it evaluates to 7.
There are thre additional measures in the pbix-file attached which show the same type of behaviour.
How is filtering one fact table which has no direct relationship to a second fact table affecting the calculation done on the second table?
Ziped Power BI-file: PowerBIFileDownload

Evaluating the table reference 'Second' produces a table that includes the columns in both the Second table, as well as those in all the (transitive) parents of the Second table.
In this case, this is a table with all of the columns in dimColour, dimTime, Second.
You can't see this if you just run:
evaluate 'Second'
as when 'evaluate' returns the results to the user, these "Parent Table" (or "Related") columns are not included.
Even so, these columns are certainly present.
When a table is converted to a row context, these related columns become available via RELATED.
See the following queries:
evaluate FILTER('Second', ISBLANK(RELATED(dimColour[Color])))
evaluate 'Second' order by RELATED(dimTime[Hour])
Similarly, when arguments to CALCULATE are used to update the filter context, these hidden "Related" columns are not ignored; hence, they can end up filtering First, in your example. You can see this, by using a function that strips the related columns, such as INTERSECT:
Test_ActuallyAlone = CALCULATE (
SUM ( First[Amount] ),
First[Alone] = "Y",
//This filter now does nothing, as none of the columns in Second
//have an impact on 'SUM ( First[Amount] )'; and the related columns
//are removed by the INTERSECT.
FILTER(
INTERSECT('Second', 'Second')
'Second'[Colour]="Red"
)
)
(See these resources that describe the "Expanded Table"
(this is an alternative but equivalent explanation of this behaviour)
https://www.sqlbi.com/articles/expanded-tables-in-dax/
https://www.sqlbi.com/articles/context-transition-and-expanded-tables/
)

Related

Power BI DAX measure: Count occurences of a value in a column considering the filter context of the visual

I want to count the occurrences of values in a column. In my case the value I want to count is TRUE().
Lets say my table is called Table and has two columns:
boolean value
TRUE() A
FALSE() B
TRUE() A
TRUE() B
All solutions I found so far are like this:
count_true = COUNTROWS(FILTER(Table, Table[boolean] = TRUE()))
The problem is that I still want the visual (card), that displays the measure, to consider the filters (coming from the slicers) to reduce the table. So if I have a slicer that is set to value = A, the card with the count_true measure should show 2 and not 3.
As far as I understand the FILTER function always overwrites the visuals filter context.
To further explain my intent: At an earlier point the TRUE/FALSE column had the values 1/0 and I could achieve my goal by just using the SUM function that does not specify a filter context and just acts within the visuals filter context.
I think the DAX you gave should work as long as it's a measure, not a calculated column. (Calculated columns cannot read filter context from the report.)
When evaluating the measure,
count_true = COUNTROWS ( FILTER ( Table, Table[boolean] = TRUE() ) )
the first argument inside FILTER is not necessarily the full table but that table already filtered by the local filter context (including report/page/visual filters along with slicer selections and local context from e.g. rows/column a matrix visual).
So if you select Value = "A" via slicer, then the table in FILTER is already filtered to only include "A" values.
I do not know for sure if this will fix your problem but it is more efficient dax in my opinion:
count_true = CALCULATE(COUNTROWS(Table), Table[boolean])
If you still have the issue after changing your measure to use this format, you may have an underlying issue with the model. There is also the function KEEPFILTERS that may apply here but I think using KEEPFILTERS is overcomplicating your case.

MDX: add dimension in filter but also show in rows

I have a simple MDX query that filters data to one of two Status values:
SELECT
NON EMPTY
{ [confidentialstring].Members }
ON COLUMNS,
NON EMPTY
{ [APMWORDER].[LEVEL01].Members *
[APMSTDOPR].[LEVEL01].Members *
[AWOUSTAT].[LEVEL01].Members *
[ANOTIFCTN].[LEVEL01].Members
}
ON ROWS
FROM [APM_CP002/APM_CP002_QX006]
WHERE { ( [AWOUSTAT].[E0001] ), ( [AWOUSTAT].[E0002] ) }
But I also need [AWOUSTAT] in my table, to see which status value is actually applicable.
However MDX throws an error if I add it: 'You may not use the same dimension on different axes'. I understand this in principle but not in this application ('filter' is not an axis to me...)
How can I resolve this, without having to create two queries?
Thanx!
Since you cannot have the same dimension on rows and in where clause, and you want to see [AWOUSTAT].[E0001]] , [AWOUSTAT].[E0002] , in your table, I think you could try something like this to start:
SELECT
NON EMPTY
{ [confidentialstring].Members }
ON COLUMNS,
NON EMPTY
{ ( [AWOUSTAT].[E0001]] ), ( [AWOUSTAT].[E0002] ) }
ON ROWS
FROM [APM_CP002/APM_CP002_QX006]
A common technique is to replace the WHERE clause with a FROM clause.
Sample using adventure works:
SELECT
NON EMPTY
{[Measures].[Order Count]} ON COLUMNS
,NON EMPTY
{[Date].[Day of Month].[Day of Month].ALLMEMBERS}
ON ROWS
FROM
(
SELECT
{[Date].[Fiscal Year].&[2011]} ON COLUMNS
FROM [Adventure Works]
)
In the above example, it is equivalent to having a WHERE clause of the Fiscal Year = 2011.
You can see a similar output, whenever you use the SSMS Cube Browser to build queries.
It may look strange to replace WHERE with FROM, BUT, in the in most cases with the Cube, it does provide the required output.

How to perform subquery in FROM clause with Yii 1.1?

I have a fairly large and complex query being generated by Yii. To generate this query we're utilizing CDbCritera::with to eagerly load multiple related models, and we're using multiple scopes to limit the records returned. The query being generated is roughly 700 lines long, but looks something like this:
SELECT `t`.`column1` as `t0_c0`,
`t`.`column2` as `t0_c1`,
`related1`.`column1` as `t1_c0`,
...
`related9`.`column5` as `t9_c4`
FROM `model` `t`
LEFT OUTER JOIN `other_model` `related1`
ON ( `t`.`other_model_id` = `related1`.`id` )
...
LEFT OUTER JOIN `more_models` `related9`
ON ( `t`.`more_models_id` = `related9`.`id` )
WHERE
...big long WHERE clause using all of related1 - related9 to filter model...
LIMIT 10
Our database has a not insignificant amount of data, but not obscene, either. In this case the model table has about 126000 rows, every "related" model is a BELONGS_TO relationship and there is an index on t.XXX_id so the join is fairly trivial. The problem is the complexity of the WHERE clause, possessing multiple COALESCE and IF and CASE clauses. Performing the filter on our 126000 rows is taking 2.6 seconds -- far longer than we would like for an API endpoint.
The WHERE clause is divided into multiple different sections like so:
WHERE
( ... part 1 ... )
AND
( ... part 2 ... )
AND
( ... part 3 ... )
With each part corresponding to one of the scopes, and each part using one or more related models
One of the scopes filters on only a single related model, and in doing so filters our table down from 126000 rows to about 2000 rows. I found experimentally (in MySQL Workbench) that I could get our query from 2.6 seconds to 0.2 seconds by simply doing this:
SELECT `t`.`column1` as `t0_c0`,
`t`.`column2` as `t0_c1`,
`related1`.`column1` as `t1_c0`,
...
`related9`.`column5` as `t9_c4`
FROM
(
SELECT `model`.*
FROM `model`
LEFT OUTER JOIN `other_model`
ON ( `t`.`other_model_id` = `other_model`.`id` )
WHERE
( ... part 1 ... )
) `t`
LEFT OUTER JOIN `other_model` `related1`
ON ( `t`.`other_model_id` = `related1`.`id` )
...
LEFT OUTER JOIN `more_models` `related9`
ON ( `t`.`more_models_id` = `related9`.`id` )
WHERE
( ... part 2 ... )
AND
( ... part 3 ... )
LIMIT 10
This way instead of performing the very complex WHERE clause on all 126000 rows of the original model table, we perform the much simpler (and index-enhanced) WHERE clause on these 126000 rows and then perform the complex WHERE clause on only the 2000 relevant rows. The results of the two queries are identical, but using a subquery in the FROM clause causes it to run 13x faster.
The problem is, I have no idea how to do this in Yii. I know that I can use CDbCommand to build a query and even pass in raw SQL, but what I'll get back is an array of "rows" -- they won't be understood by Yii and properly converted to the right models.
Does Yii's ActiveRecord system have a way to say something like the following?
$criteria = new CDbCriteria;
$criteria->scopes = array("part1");
$subQuery = Model::model()->buildQuery($criteria);
$criteria = new CDbCriteria;
$criteria->scopes = array("part2", "part3");
$fullQuery = $subQuery->findAll($criteria);
Although not a perfect solution, I did find something that's almost as good. Break the original query into two:
Get the IDs or models you wish to select in the FROM subquery
Append a WHERE to the outer query with id in (...)
If anyone is interested I'll hunt down the code I wrote for this to post in the answer as an example, but so far this question has gotten very little attention and once I found a pseudo-decent solution I've moved on.

How to filter clickhouse table by array column contents?

I have a clickhouse table that has one Array(UInt16) column. I want to be able to filter results from this table to only get rows where the values in the array column are above a threshold value. I've been trying to achieve this using some of the array functions (arrayFilter and arrayExists) but I'm not familiar enough with the SQL/Clickhouse query syntax to get this working.
I've created the table using:
CREATE TABLE IF NOT EXISTS ArrayTest (
date Date,
sessionSecond UInt16,
distance Array(UInt16)
) Engine = MergeTree(date, (date, sessionSecond), 8192);
Where the distance values will be distances from a certain point at a certain amount of seconds (sessionSecond) after the date. I've added some sample values so the table looks like the following:
Now I want to get all rows which contain distances greater than 7. I found the array operators documentation here and tried the arrayExists function but it's not working how I'd expect. From the documentation, it says that this function "Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0". But when I run the query below I get three zeros returned where I should get a 0 and two ones:
SELECT arrayExists(
val -> val > 7,
arrayEnumerate(distance))
FROM ArrayTest;
Eventually I want to perform this select and then join it with the table contents to only return rows that have an exists = 1 but I need this first step to work before that. Am I using the arrayExists wrong? What I found more confusing is that when I change the comparison value to 2 I get all 1s back. Can this kind of filtering be achieved using the array functions?
Thanks
You can use arrayExists in the WHERE clause.
SELECT *
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance) = 1;
Another way is to use ARRAY JOIN, if you need to know which values is greater than 7:
SELECT d, distance, sessionSecond
FROM ArrayTest
ARRAY JOIN distance as d
WHERE d > 7
I think the reason why you get 3 zeros is that arrayEnumerate enumerates over the array indexes not array values, and since none of your rows have more than 7 elements arrayEnumerates results in 0 for all the rows.
To make this work,
SELECT arrayExists(
val -> distance[val] > 7,
arrayEnumerate(distance))
FROM ArrayTest;

Most efficient way to filter multiple dimensions

Trying to get the [Number] and [Sum of Time Spent] for all Changes that were open during period 201405.
The best definition of open I can think of is is:
- Changes that were logged before or during the [MonthPeriod], while closed during or after the [MonthPeriod]
SELECT
[Measures].[Sum of Time Spent] ON COLUMNS
,
[FactChange].[Number].[Number] ON ROWS
FROM
[Change Management]
WHERE
(FILTER(
[DimLoggedDate].[MonthPeriod].[MonthPeriod]
,[DimLoggedDate].[MonthPeriod].MEMBERVALUE <= 201405
)
,
FILTER(
[DimClosedDate].[MonthPeriod].[MonthPeriod]
,[DimClosedDate].[MonthPeriod].MEMBERVALUE >= 201405
))
The above query returns a list with all numbers, with a null value when the filters in the WHERE clause don't apply. I would like to remove the NULL items.
Because the query returns ALL Numbers, I wonder if this is the most efficient query to solve the issue. Applying NonEmpty() would remove the numbers, but since all changes are enumerated, isn't this putting more stress on the system than required?
You do it simply by adding "Non empty" in the on Rows clause:
...
non empty [FactChange].[Number].[Number] on Rows
...

Resources