DAX IF measure - return fixed value - dax

This should be a very simple requirement. But it seems impossible to implement in DAX.
Data model, User lookup table joined to many "Cards" linked to each user.
I have a measure setup to count rows in CardUser. That is working fine.
<measureA> = count rows in CardUser
I want to create a new measure,
<measureB> = IF(User.boolean = 1,<measureA>, 16)
If User.boolean = 1, I want to return a fixed value of 16. Effectively, bypassing measureA.
I can't simply put User.boolean = 1 in the IF condition, throws an error.
I can modify measureA itself to return 0 if User.boolean = 1
measureA> =
CALCULATE (
COUNTROWS(CardUser),
FILTER ( User.boolean != 1 )
)
This works, but I still can't find a way to return 16 ONLY if User.boolean = 1.

That's easy in DAX, you just need to learn "X" functions (aka "Iterators"):
Measure B =
SUMX( VALUES(User.boolean),
IF(User.Boolean, [Measure A], 16))
VALUES function generates a list of distinct user.boolean values (1, 0 in this case). Then, SUMX iterates this list, and applies IF logic to each record.

Related

Powerquery: passing column value to custom function

I'm struggling on passing the column value to a formula. I tried many different combinations but I only have it working when I hard code the column,
(tbl as table, col as list) =>
let
avg = List.Average(col),
sdev = List.StandardDeviation(col)
in
Table.AddColumn(tbl, "newcolname" , each ([column] - avg)/sdev)
I'd like to replace [column] by a variable. In fact, it's the column I use for the average and the standard deviation.
Please any help.
Thank you
This probably does what you want, called as x= fctn(Source,"ColumnA")
Does the calculations using and upon ColumnA from Source table
(tbl as table, col as text) =>
let
avg = List.Average(Table.Column(tbl,col)),
sdev = List.StandardDeviation(Table.Column(tbl,col))
in Table.AddColumn(tbl, "newcolname" , each (Record.Field(_, col) - avg)/sdev)
Potentially you want this. Does the average and std on the list provided (which can come from any table) and does the subsequent calculations on the named column in the table passed over
called as x = fctn(Source,"ColumnNameInSource",SomeSource[SomeColumn])
(tbl as table, cname as text, col as list) =>
let
avg = List.Average(col),
sdev = List.StandardDeviation(col)
in Table.AddColumn(tbl, "newcolname" , each (Record.Field(_, cname) - avg)/sdev)

How can I increase my python-code speed?

I have a dataframe, df1, that reports courses students have taken, where ID is the student’s id, COURSES is a list of courses taken by the student, and TYPE and MAJOR are student attributes. The dataframe looks like this:
ID COURSES TYPE MAJOR
1 ['Intr To Archaeology', 'Statics', 'Circuits I…] Freshman EEEL
2 ['Signals & Systems I', ‘Instrumentation’…] Transfer EEEL
3 ['Keyboard Competence', 'Elementary … ] Freshman EEEL
4 ['Cultural Anthro', 'Vector Analysis’ … ] Freshma EEEL
I created a new dataframe, df2, that reports a dissimilarity measure for each pair of students based on the courses they’ve taken. df2 looks like this:
I created using the following script, but it runs very slowly (there are thousands of students). Can someone suggest a more efficient way to create df2?
One major problem is that the script below calculates the distance between (student 1 and student 2) and (student 2 and student 1), which is redundant since the distances are the same. However, the condition I created to prevent this:
if (id1 >= id2):
continue
doesn't work.
Entire script:
for id1, student1 in df.iterrows():
for id2, student2 in df.iterrows():
if (id1 >= id2):
continue
ID_1 = student1["ID"]
ID_2 = student2["ID"]
# courses as list strings
s1 = student1["COURSES"]
s2 = student2["COURSES"]
try:
# courses as sets
courses1 = set(ast.literal_eval(s1))
courses2 = set(ast.literal_eval(s2))
distance = float(len(courses1.symmetric_difference(courses2)))/(len(courses1) + len(courses2))
except:
# Some strings seem to have a different format
distance = -1
ID_1_Transfer = 1 if student1["TYPE"] == "Transfer" else 0
ID_2_Transfer = 1 if student2["TYPE"] == "Transfer" else 0
df2= df2.append({'ID_1': ID_1,'ID_2': PIDM_2,'Distance': distance, 'ID_1_Transfer': ID_1_Transfer, 'ID_2_Transfer': ID_2_Transfer}, ignore_index=True)

Oracle Apex get Multiple results from checkboxes

DB = Oracle 11g
Apex = 4.2.6
In the form I have various items which all work great. However I now have a set of check boxes(:P14_DAYS) one for each day of the week.
What I need to do is get all records between :P14_START_DATE :P14_END_DATE, but only within the days select that's checked.
Below is also a sample of the DATE_SETS table
http://i.stack.imgur.com/YAckN.png
so for example
dates 01-AUG-14 - 5-AUG-14 But only require Sundays AND Mondays date would bring back 2 refs.
BEGIN
UPDATE MD_TS_DETAIL
SET job_for = :P14_JOBFORTEM,
job_type_id = :P14_JOB_TYPE_VALUE,
account_id = :P14_ACC_VAL,
qty = :P14_HRS,
rio = :P14_RIO,
post_code = :P14_POSTCODE
WHERE id IN (SELECT D.id
FROM MD_TS_MAST M
LEFT JOIN MD_TS_DETAIL D
ON M.mast_id = D.md_id
LEFT JOIN DATE_SETS
ON ms_date = dt
WHERE eng_id = :P14_ENG_VAL
AND ms_date BETWEEN :P14_START_DATE AND :P14_END_DATE
AND DATE_SETS.col_day = ANY instr(':'||:P14_DAYS||':',Return)
END;
Any help would be much appreciated .
I found this example: http://docs.oracle.com/cd/B31036_01/doc/appdev.22/b28839/check_box.htm#CHDBGDJH
As I can understand, when you choose some values in your checkbox list, item :P14_DAYS receives value, that contains return values of chosen elements of the LOV with delimiters. Then you need to replace this string in your query
AND DATE_SETS.col_day = ANY instr(':'||:P14_DAYS||':',Return)
with
AND instr(':'||:P14_DAYS||':', ':'||DATE_SETS.col_day||':') > 0
Here function instr searches substring DATE_SETS.col_day in string :P14_DAYS. It returns position of substring if substring was found or 0, if not found. Then you compare result of function with 0, and if result > 0, it means that DATE_SETS.col_day is among selected values.

hadoop cascading how to get top N tuples

New to cascading, trying to find out a way to get top N tuples based on a sort/order. for example, I'd like to know the top 100 first names people are using.
here's what I can do similar in teradata sql:
select top 100 first_name, num_records
from
(select first_name, count(1) as num_records
from table_1
group by first_name) a
order by num_records DESC
Here's similar in hadoop pig
a = load 'table_1' as (first_name:chararray, last_name:chararray);
b = foreach (group a by first_name) generate group as first_name, COUNT(a) as num_records;
c = order b by num_records DESC;
d = limit c 100;
It seems very easy to do in SQL or Pig, but having a hard time try to find a way to do it in cascading. Please advise!
Assuming you just need the Pipe set up on how to do this:
In Cascading 2.1.6,
Pipe firstNamePipe = new GroupBy("topFirstNames", InPipe,
new Fields("first_name"),
);
firstNamePipe = new Every(firstNamePipe, new Fields("first_name"),
new Count("num_records"), Fields.All);
firstNamePipe = new GroupBy(firstNamePipe,
new Fields("first_name"),
new Fields("num_records"),
true); //where true is descending order
firstNamePipe = new Every(firstNamePipe, new Fields("first_name", "num_records")
new First(Fields.Args, 100), Fields.All)
Where InPipe is formed with your incoming tap that holds the tuple data that you are referencing above. Namely, "first_name". "num_records" is created when new Count() is called.
If you have the "num_records" and "first_name" data in separate taps (tables or files) then you can set up two pipes that point to those two Tap sources and join them using CoGroup.
The definitions I used were are from Cascading 2.1.6:
GroupBy(String groupName, Pipe pipe, Fields groupFields, Fields sortFields, boolean reverseOrder)
Count(Fields fieldDeclaration)
First(Fields fieldDeclaration, int firstN)
Method 1
Use a GroupBy and group them base on the columns required and u can make use of secondary sorting that is provided by the cascading ,by default it provies them in ascending order ,if we want them in descing order we can do them by reverseorder()
To get the TOP n tuples or rows
Its quite simple just use a static variable count in FILTER and increment it by 1 for each tuple count value increases by 1 and check weather it is greater than N
return true when count value is greater than N or else return false
this will provide the ouput with first N tuples
method 2
cascading provides an inbuit function unique which returns firstNbuffer
see the below link
http://docs.cascading.org/cascading/2.2/javadoc/cascading/pipe/assembly/Unique.html

How can I merge two outputs of two Linq queries?

I'm trying to merge these two object but not totally sure how.. Can you help me merge these two result objects?
//
// Create Linq Query for all segments in "CognosSecurity"
//
var userListAuthoritative = (from c in ctx.CognosSecurities
where (c.SecurityType == 1 || c.SecurityType == 2)
select new {c.SecurityType, c.LoginName , c.SecurityName}).Distinct();
//
// Create Linq Query for all segments in "CognosSecurity"
//
var userListAuthoritative3 = (from c in ctx.CognosSecurities
where c.SecurityType == 3 || c.SecurityType == 0
select new {c.SecurityType , c.LoginName }).Distinct();
I think I see where to go with this... but to answer the question the types of the objects are int, string, string for SecurityType, LoginName , and SecurityName respectively
If you're wondering why I have them broken like this is because I want to ignore one column when doing a distinct. Here are the SQL queries that I'm converting to SQL.
select distinct SecurityType, LoginName, 'Segment'+'-'+SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType =1
select distinct SecurityType, LoginName, 'Business Line'+'-'+SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType =2
select distinct SecurityType, LoginName, SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType in (1,2)
You can't join these because the types are different (first has 3 properties in the resulting type, second has two).
If you can tolerate putting a null value in for the 3rd result of the second query this will help. I would then suggest you just do a userListAuthoritative.concat(userListAuthoritative3 ) BUT I think this will not work as the anonymous types generated by the linq will not be of the same class, even tho the structure is the same. To solve that you can either define a CustomType to encapsulate the tuple and do select new CustomType{ ... } in both queries or postprocess the results using select() in a similar fashion.
Acutally the latter select() approach will also allow you to solve the parameter count mismatch by implementing the select with a null in the post-process to CustomType.
EDIT: According to the comment below once the structures are the same the anonymous types will be the same.
I assume that you want to keep the results distinct:
var merged = userListAuthoritative.Concat(userListAuthoritative3).Distinct();
And, as Mike Q pointed out, you need to make sure that your types match, either by giving the anonymous types the same signature, or by creating your own POCO class specifically for this purpose.
Edit
If I understand your edit, you want your Distinct to ignore the SecurityName column. Is that correct?
var userListAuthoritative = from c in ctx.CognosSecurities
where new[]{0,1,2,3}.Contains(c.SecurityType)
group new {c.SecurityType, c.LoginName, c.SecurityName}
by new {c.SecurityType, c.LoginName}
select g.FirstOrDefault();
I'm not exactly sure what you mean by merge, since you're returning different (anonymous) types from each one. Is there a reason the following doesn't work for you?
var userListAuthoritative = (from c in ctx.CognosSecurities
where (c.SecurityType == 1 || c.SecurityType == 2 || c.SecurityType == 3 || c.SecurityType == 0)
select new {c.SecurityType, c.LoginName , c.SecurityName}).Distinct();
Edit: This assumed they were of the same type -- but they're not.
userListAuthoritative.Concat(userListAuthoritative3);
Try below code, you might need to implement IEqualityComparer<T> in your ctx type.
var merged = userListAuthoritative.Union(userListAuthoritative3);

Resources