Data structure for user input of complex nested if-then rules

My application requires processing measurement data, in part via logical rules that are unknown at coding time and will be entered manually by the user. An example of such a rule is
IF ( Column_3 < 4.5 ) AND ( ( Column_5 > 3.2 ) OR ( Column_7 <= 0 ) ) THEN Result = 2
where the number of elementary comparisons and the bracketing are not known in advance.
This leads to a design question: what is the most efficient way to let the user enter this information in a GUI, and how can I best represent it in my program so that the whole IF clause can actually be evaluated? I would like to store the rules in an SQL database, so I need a concrete data structure.
Thank you all for your kind help!

Regarding the GUI, I would feel comfortable entering the data in a text-area box. Unless your typical conditions are more than 2-3 lines long, it should be fine.
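If you go the text-area route, the free-text rule has to be parsed before it can be stored. As a rough sketch of one possible approach (in Python; this is not part of the original answer, all names are illustrative, and comparisons are assumed to have the shape column-operator-number), a small recursive-descent parser with AND binding tighter than OR could look like this:

import re

TOKEN = re.compile(r'\s*(\(|\)|<=|>=|<|>|=|AND\b|OR\b|[A-Za-z_]\w*|-?\d+(?:\.\d+)?)')

def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected input: " + text[pos:])
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_expr(tokens):
    # expr := term { OR term } -- OR binds loosest
    node, rest = parse_term(tokens)
    while rest and rest[0] == "OR":
        right, rest = parse_term(rest[1:])
        node = ("OR", node, right)
    return node, rest

def parse_term(tokens):
    # term := factor { AND factor }
    node, rest = parse_factor(tokens)
    while rest and rest[0] == "AND":
        right, rest = parse_factor(rest[1:])
        node = ("AND", node, right)
    return node, rest

def parse_factor(tokens):
    # factor := '(' expr ')' | column op number
    if tokens[0] == "(":
        node, rest = parse_expr(tokens[1:])
        if not rest or rest[0] != ")":
            raise ValueError("missing closing parenthesis")
        return node, rest[1:]
    column, op, number = tokens[0], tokens[1], tokens[2]
    return ("CMP", column, op, float(number)), tokens[3:]

rule = "( Column_3 < 4.5 ) AND ( ( Column_5 > 3.2 ) OR ( Column_7 <= 0 ) )"
tree, _ = parse_expr(tokenize(rule))
# tree == ('AND', ('CMP', 'Column_3', '<', 4.5),
#          ('OR', ('CMP', 'Column_5', '>', 3.2), ('CMP', 'Column_7', '<=', 0.0)))

The resulting tree maps directly onto the tables described next: each CMP node becomes a Base_Conditions row, and each AND/OR node becomes a Logical_conditions row.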
The data structure can be something similar to the design below:

Base_Conditions table
- ID
- Left_operand
- Operator_code (>, =, <, <=)
- Right_operand

Logical_conditions table
- ID
- Left_condition_id
- Left_condition_type ("1" for a base condition, "2" for another logical condition)
- Operator_code (AND/OR)
- Right_condition_id
- Right_condition_type

Rules table
- ID
- Condition_id
- Result_action
To store the example condition in a relational DB, the data would look something like this:
Base_Conditions
[1, Column_3, <, 4.5]
[2, Column_5, >, 3.2]
[3, Column_7, <=, 0]
Logical_conditions
[1, 2, 1, OR, 3, 1]   (base 2 OR base 3)
[2, 1, 1, AND, 1, 2]  (base 1 AND logical 1)
Rules
[1, 2, "Result = 2"]  (when logical condition 2 holds, apply "Result = 2")
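To actually compute the IF clause, a stored rule can be evaluated by walking the Logical_conditions tree recursively. Below is a minimal sketch in Python (my own illustration, not from the answer), with the three tables above held as plain dicts standing in for rows fetched from SQL. Note that the Rules table as designed does not record whether Condition_id refers to a base or a logical condition, so the sketch assumes it is logical:

import operator

# The three tables above as dicts keyed by ID (stand-ins for SQL rows).
base_conditions = {
    1: ("Column_3", "<", 4.5),
    2: ("Column_5", ">", 3.2),
    3: ("Column_7", "<=", 0),
}
logical_conditions = {
    1: (2, 1, "OR", 3, 1),   # (base 2) OR (base 3)
    2: (1, 1, "AND", 1, 2),  # (base 1) AND (logical 1)
}
rules = {1: (2, "Result = 2")}

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "=": operator.eq}

def eval_condition(cond_id, cond_type, row):
    # cond_type follows the schema: 1 = base condition, 2 = logical condition
    if cond_type == 1:
        column, op, value = base_conditions[cond_id]
        return COMPARE[op](row[column], value)
    left_id, left_type, op, right_id, right_type = logical_conditions[cond_id]
    left = eval_condition(left_id, left_type, row)
    right = eval_condition(right_id, right_type, row)
    return (left or right) if op == "OR" else (left and right)

row = {"Column_3": 4.0, "Column_5": 3.0, "Column_7": -1}
cond_id, action = rules[1]
if eval_condition(cond_id, 2, row):  # assumes the rule's condition is logical
    print(action)                    # prints: Result = 2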

Related

Conditional Relate ID with email and name

I'm trying to create a relation from some tables (in Google Sheets) where the ID is the team a person belongs to and the division is their role. (This table of information is in Sheet7.)
Now I'm trying to create a conditional: IF B2 (the value) is lower than 0.8, give me the name and email of the person with the ID and the division corresponding to that value. Here is my formula, but it's not working: =vlookup(A2,if(B2=>0.8;Sheet7!A1:D16;""),2,0). Some help please! (This values table is in Sheet8.)
try in F2:
=INDEX(IF(B2:B5>=0.8, IFNA(VLOOKUP(A2:A,
FILTER({Sheet7!C:C, Sheet7!A:D}, Sheet7!D:D=B1), {2, 3}, 0)), "xxx"))
update:
=INDEX(IFNA(VLOOKUP(INDEX(SORT(QUERY(SPLIT(FLATTEN(
COLUMN(B1:E1)&"×"&B1:E1&A2:A&"×"&SUBSTITUTE(B2:E; "."; ",")*1); "×");
"where Col2 matches '.*\d+$' and Col3 < 0.8"));; 2);
{Sheet7!D:D&Sheet7!C:C\ Sheet7!A:D}; {2\ 3}; 0)))

Strange behaviour when using FILTER to filter a different table with no direct relationship?

I have two facts tables, First and Second, and two dimension tables, dimTime and dimColour.
Fact table First looks like this:
and fact table Second looks like this:
Both dim-tables have 1:* relationships to both fact tables and the filtering is one-directional (from dim to fact), like this:
dimColour[Color] 1 -> * First[Colour]
dimColour[Color] 1 -> * Second[Colour]
dimTime[Time] 1 -> * First[Time]
dimTime[Time] 1 -> * Second[Time_]
Adding the following measure, I would expect the FILTER function not to have any effect on the calculation, since Second does not filter First, right?
Test_Alone =
CALCULATE (
SUM ( First[Amount] );
First[Alone] = "Y";
FILTER(
'Second';
'Second'[Colour]="Red"
)
)
So this should evaluate to 7, since only two rows in First have [Alone] = "Y" (with values 1 and 6) and there is no direct relationship between First and Second. However, it evaluates to 6. If I remove the FILTER argument from CALCULATE, it evaluates to 7.
There are three additional measures in the attached pbix file which show the same type of behaviour.
How is filtering one fact table which has no direct relationship to a second fact table affecting the calculation done on the second table?
Zipped Power BI file: PowerBIFileDownload
Evaluating the table reference 'Second' produces a table that includes the columns in both the Second table, as well as those in all the (transitive) parents of the Second table.
In this case, this is a table with all of the columns in dimColour, dimTime, Second.
You can't see this if you just run:
evaluate 'Second'
as when 'evaluate' returns the results to the user, these "Parent Table" (or "Related") columns are not included.
Even so, these columns are certainly present.
When a table is converted to a row context, these related columns become available via RELATED.
See the following queries:
evaluate FILTER('Second', ISBLANK(RELATED(dimColour[Color])))
evaluate 'Second' order by RELATED(dimTime[Hour])
Similarly, when arguments to CALCULATE are used to update the filter context, these hidden "Related" columns are not ignored; hence, they can end up filtering First, as in your example. You can see this by using a function that strips the related columns, such as INTERSECT:
Test_ActuallyAlone = CALCULATE (
SUM ( First[Amount] ),
First[Alone] = "Y",
//This filter now does nothing, as none of the columns in Second
//have an impact on 'SUM ( First[Amount] )'; and the related columns
//are removed by the INTERSECT.
FILTER(
INTERSECT('Second', 'Second'),
'Second'[Colour]="Red"
)
)
(See these resources that describe the "expanded table" concept, an alternative but equivalent explanation of this behaviour:
https://www.sqlbi.com/articles/expanded-tables-in-dax/
https://www.sqlbi.com/articles/context-transition-and-expanded-tables/
)

How do I create a single measure that slices the same data 3 different ways and unions it?

I've been trying for a day and a half now to figure out how to combine the same data, sliced in different ways, into a single measure. I've broken it into parts, tried to UNION them, tried CALCULATE with IF statements, and even thought I could UNION 3 summary tables to get the right output. I'm stuck using Excel 365 ProPlus (which I believe to be 2016, since Get & Transform and Power Pivot are built in).
The goal: I need to trick a Power Pivot table connected to the data model into displaying (a) a running total, with (b) a total line, and (c) a flat, non-running Goal/Target line, all in the same measure. I've been able to do (a) and (b); however, (c) is elusive.
I tried to calculate the data in stages (the first two steps are shown here), because no matter what I try I can't seem to get two filters to work at the same time:
Occbase:=CALCULATE([Occurrences],
FILTER('Final Dataset',
'Final Dataset'[MainFilter] = ""))
CumOcc:=CALCULATE([Occbase],
FILTER(ALL(DimDate[DateValue]),
DimDate[DateValue] <= MAX(DimDate[DateValue])))
These two measures handle part 1: filter the dataset, then calculate a simple running total from that filtered set. I've tried to do it in a single step, but when the filter works, the running total doesn't:
CombinedMakesRunningTotolStopWorking:=CALCULATE(SUM('Final Dataset'[xOccurrences]), FILTER(
ALL(Dimdate[DateValue]),
DimDate[DateValue] <= MAX(DimDate[DateValue]))
,FILTER(
'Final Dataset',
'Final Dataset'[MainFilter] = ""
|| 'Final Dataset'[Region] = "Ttl Occ MPR" //I couldn't figure out how to calculate on the fly
) //so I generated this total in PowerQuery
)
The SQL dev in me decided to try to pull all three above separately and then use UNION and SUMMARIZE by the date value and the region value but received an even worse result...
TryHarder:=SUMX(UNION(
SUMMARIZE(FILTER('Final Dataset',
'Final Dataset'[Region] = "Ttl Occ MPR"),
[Region],
[DateValue],
"OccurrencesXXX", CALCULATE([Occbase],
FILTER(ALL(DimDate[DateValue]),
DimDate[DateValue] <= MAX(DimDate[DateValue]))))
,
SUMMARIZE(FILTER(ALL('Final Dataset'),
'Final Dataset'[Region] = "PR Occ Goal"),
[Region],
[DateValue],
"OccurrencesXXX", [Occurrences])
,
SUMMARIZE(FILTER('Final Dataset',
'Final Dataset'[MainFilter] = ""),
[Region],
[DateValue],
"OccurrencesXXX", CALCULATE([Occbase],
FILTER(ALL(DimDate[DateValue]),
DimDate[DateValue] <= MAX(DimDate[DateValue]))))
), [OccurrencesXXX])
With the comically defeating result of:
I could give up and just generate a table for each chart in PowerQuery... but would have to generate a ton of tables. I have to assume I'm doing something wrong with scope/context and I have a feeling my C#/SQL mindset is putting me at a huge disadvantage in learning DAX. I'd like to understand what I'm doing wrong and learn the DAX pattern and terminology to fix it.
One way to do this is to set up a table that is not connected to the model, and then use it to determine which value you return. The example below is for a unit of measure (UOM). The idea is that the measure returned depends on the Unit of Measure field, so adding it to the legend of the pivot chart would return unit, case, and ESU volume. It also means you could use a slicer to toggle which fields are returned in the chart.
Volume:=IF( HASONEVALUE( 'Unit of Measure'[UOM] ),
SWITCH(TRUE(),
VALUES('Unit of Measure'[Order]) = 1, [Unit Volume],
VALUES('Unit of Measure'[Order]) = 2, [Case Volume],
VALUES('Unit of Measure'[Order]) = 3, [ESU Volume]
),
[ESU Volume]
)

Performance-wise for Lua table selection

I'm a bit new to Lua. I have a game in which I need to capture entities and insert them into a table. The maximum number of entities that can exist at the same time is 14, so I read that an array-based solution is good.
But I saw that the table size keeps incrementing even after we delete some values: for example, with 10 values in the table, deleting the value at index 9 does not automatically shift the contents down, so inserting value number 11 still grows the table.
Example:
local Table = {"hello", "hello", "hello", "hello", "hello", "hello", "hello", "hello", "hello", "hello"}
-- Current Table size = 10
-- Perform delete at index 9
Table[9] = nil
-- Have new Entity to insert
Table[#Table + 1] = "New Value"
-- The table size will keep growing as the game goes on.
So in this situation, will an array-based table with nil values inside, which grows as new values are inserted, have better performance, or should I move to a table with keys?
Or should I just stick with an array-based table and perform a full cleanup when the table isn't in use?
If you set an element in a table to nil, then that just stays there as a "hole" in your array.
tab = {1, 2, 3, 4}
tab[2] = nil
-- tab == {1, nil, 3, 4}
-- #tab is actually undefined and could be either 1 or 4 (or something completely unexpected)!
What you need to do is set the field to nil, then shift all the following fields to fill that hole. Luckily, Lua has a function for that, which is table.remove(table, index).
tab = {1, 2, 3, 4}
table.remove(tab, 2)
-- tab == {1, 3, 4}
-- #tab == 3
Keep in mind that this can get very slow as there's lots of memory access involved, so don't go applying this solution when you have a few million elements some day :)
While table.remove(Table, 9) will do the job in your case (removing a field from an "array" table and shifting the remaining fields to fill the hole), you should first consider using a "set" table instead.
If you:
- often remove/add elements
- don't care about their order
- often check if table contains a certain element
then the "set" table is your choice. Use it like so
local tab = {
["John"] = true,
["Jane"] = true,
["Bob"] = true,
}
Your elements will be stored as indices in a table.
Remove an element with
tab["Jane"] = nil
Test if table contains an element with
if tab["John"] then
-- tab contains "John"
Advantages compared to an array table:
- it eliminates the performance overhead of removing an element, because the other elements remain intact and no shifting is required
- checking whether an element exists in the table (which I assume is the main purpose of this table) is also faster than with an array table, because it no longer requires iterating over all the elements to find a match; a hash lookup is used instead
Note however that this approach doesn't let you store duplicate values, because tables can't contain duplicate keys. In that case you can use numbers as values to store the number of times each element appears in your set, e.g.
local tab = {
["John"] = 1,
["Jane"] = 2,
["Bob"] = 35,
}
Now you have 1 John, 2 Janes and 35 Bobs
https://www.lua.org/pil/11.5.html

Two (seemingly) identical queries, one is faster, why?

Two seemingly identical queries (as far as a newbie like me can tell), but the first is faster overall in partial template rendering time (nothing else changed but the ids statement). Also, when testing through the Rails console, the latter will visibly run a query while the former will not. I do not understand why, nor why the first statement is a few ms faster than the second, though I can guess it is due to the shorter method chaining needed to get the same result.
UPDATE: My bad. They are not running the same query, but it still is interesting how a select on all columns is faster than a select on one column. Maybe it is a negligible difference compared to the method chaining though.
ids = current_user.activities.map(&:person_id).reverse
SELECT "activities".* FROM "activities" WHERE "activities"."user_id" = 1
SELECT "people".* FROM "people" WHERE "people"."id" IN (1, 4, 12, 15, 3, 14, 17, 10, 5, 6) Rendered activities/_activities.html.haml (7.4ms)
ids = current_user.activities.order('id DESC').select{person_id}.map(&:person_id)
SELECT "activities"."person_id" FROM "activities" WHERE "activities"."user_id" = 1 ORDER BY id DESC
SELECT "people".* FROM "people" WHERE "people"."id" IN (1, 4, 12, 15, 3, 14, 17, 10, 5, 6) Rendered activities/_activities.html.haml (10.3ms)
The purpose of the statement is to retrieve the foreign-key references to people in the order in which they appeared in the activities table (on its PK).
Note: I use Squeel for SQL.
In the first query, you've chained .map and .reverse, while in the second you've used .order('id DESC') and .select{person_id}, which are unnecessary if you just use .reverse.
