How to create a Siddhi app with multiple conditional counts in a window

I have a special situation that I cannot implement with Siddhi options like window, pattern or aggregation functions. The data comes from 2 streams; both use Kafka as the source, and I set the list of topics (p1, p2) in the Siddhi source. I wrote a query that checks 2 rules, (type = "h") and (type = "g"). The Siddhi app must only let through events that match these conditions. I need to aggregate every 10 seconds and produce a result when, within that window, the number of events that match the first condition is 2 and the number of events that match the second condition is 5. How can I do this?

Finally found the solution:
from stream1#window.time(10 seconds)
select type, id, sum(ifThenElse(type == 'h', 1, 0)) as cnt1, sum(ifThenElse(type == 'g', 1, 0)) as cnt2
having cnt1 == 2 and cnt2 == 5
insert all events into stream2;
I tested this code many times with many values (0, 1, 2, ...), and it passed all of them.
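To make the counting logic easier to follow outside Siddhi, here is a minimal Python sketch of the same idea: keep the events of the last 10 seconds and report whenever the window holds exactly 2 events of type 'h' and 5 of type 'g'. The function name matching_windows and the sample events are made up for illustration, and the sketch only re-checks the counts when an event arrives, which is a simplification of how a Siddhi time window behaves.

```python
from collections import deque

def matching_windows(events, window=10.0, need_h=2, need_g=5):
    """Yield (timestamp, h_count, g_count) whenever the window matches both counts."""
    recent = deque()  # (timestamp, type) pairs still inside the window
    for ts, typ in events:
        if typ not in ("h", "g"):
            continue  # only events matching one of the two rules are kept
        recent.append((ts, typ))
        # drop events that are older than the window length
        while recent and recent[0][0] <= ts - window:
            recent.popleft()
        h = sum(1 for _, t in recent if t == "h")
        g = sum(1 for _, t in recent if t == "g")
        if h == need_h and g == need_g:
            yield ts, h, g

# hypothetical (timestamp in seconds, type) events
events = [(0, "h"), (1, "g"), (2, "g"), (3, "x"), (4, "h"),
          (5, "g"), (6, "g"), (7, "g"), (20, "h")]
hits = list(matching_windows(events))
print(hits)
```

With this sample input the only match is reported at timestamp 7, once the fifth 'g' event arrives.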

Related

DAX IF measure - return fixed value

This should be a very simple requirement. But it seems impossible to implement in DAX.
Data model, User lookup table joined to many "Cards" linked to each user.
I have a measure setup to count rows in CardUser. That is working fine.
<measureA> = count rows in CardUser
I want to create a new measure,
<measureB> = IF(User.boolean = 1,<measureA>, 16)
If User.boolean = 1, I want to return a fixed value of 16. Effectively, bypassing measureA.
I can't simply put User.boolean = 1 in the IF condition; it throws an error.
I can modify measureA itself to return 0 if User.boolean = 1
<measureA> =
CALCULATE (
COUNTROWS(CardUser),
FILTER ( User.boolean != 1 )
)
This works, but I still can't find a way to return 16 ONLY if User.boolean = 1.
That's easy in DAX, you just need to learn "X" functions (aka "Iterators"):
Measure B =
SUMX( VALUES(User.boolean),
IF(User.Boolean, [Measure A], 16))
The VALUES function generates a list of the distinct User.boolean values (1 and 0 in this case). Then SUMX iterates over this list and applies the IF logic to each record.
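As a rough analogy only (DAX filter context is richer than this), the iterator pattern can be sketched in Python: collect the distinct flag values, evaluate the per-value logic for each, and sum the results. The sample users and cards and the names measure_a and measure_b are assumptions for illustration.

```python
# hypothetical sample data: users with a boolean flag, and cards linked to users
users = [{"id": 1, "flag": True}, {"id": 2, "flag": False}, {"id": 3, "flag": True}]
cards = [{"user_id": 1}, {"user_id": 1}, {"user_id": 3}, {"user_id": 2}]

def measure_a(flag):
    """Count card rows that belong to users whose flag equals the given value."""
    ids = {u["id"] for u in users if u["flag"] == flag}
    return sum(1 for c in cards if c["user_id"] in ids)

def measure_b():
    """Iterate the distinct flag values (like VALUES) and sum the IF result (like SUMX)."""
    distinct_flags = {u["flag"] for u in users}
    return sum(measure_a(f) if f else 16 for f in distinct_flags)

print(measure_b())
```

Here the flagged users own 3 cards, and the unflagged value contributes the fixed 16, so the total is 19.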

PIG- Aggregations based on multiple columns

My Input data set has 3 columns and schema looks like below:
ActivityDate, EventId, EventDate
Now, using Pig, I need to derive multiple variables like the ones below in one output file:
1) All Event Ids after ActivityDate >= EventDate -30 days
2) All Event Ids after ActivityDate >= EventDate -60 days
3) All Event Ids after ActivityDate >= EventDate -90 days
I have more than 30 variables like this. If it is one variable, we can use simple FILTER to filter the data.
I am thinking about any UDF implementation which takes bag as input and returns count of Event IDs based on above criteria for each parameter.
What is the best way to aggregate the data on multiple columns in pig ?
I would suggest creating another file with all of your thresholds and cross joining with the file.
so you would have a file containing:
30
60
90
etc
read it like this:
grouping = load 'grouping.txt' using PigStorage(',') as (groups:double);
Then do:
data_with_grouping = cross data, grouping;
Then have this binary condition:
data_with_binary_condition = foreach data_with_grouping generate ActivityDate, EventId, EventDate, groups, (ActivityDate >= EventDate - groups ? 1 : 0) as binary_condition;
Now you will have one column with the threshold and one column with a binary variable that tells you whether the ID follows the condition or not.
You can then filter out all of the zeros from binary_condition and group on the groups column:
data_with_binary_condition_filtered = filter data_with_binary_condition by (binary_condition != 0);
grouped_by_threshold = group data_with_binary_condition_filtered by groups;
count_of_IDS = foreach grouped_by_threshold generate group, COUNT(data_with_binary_condition_filtered.EventId);
I hope this works. Obviously, I didn't debug it for you since I don't have your files.
This code will take a tad more time to run, but it will produce the output you need without a UDF.
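The cross-join idea can be sketched in plain Python to check the expected counts on a small sample. Dates are represented as plain day numbers here, and the sample rows and the function name counts_by_threshold are assumptions for illustration. Unlike the filtered Pig version, this sketch also reports thresholds with zero matches.

```python
from itertools import product

# hypothetical rows: (ActivityDate, EventId, EventDate), dates as day numbers
rows = [(100, "e1", 120), (100, "e2", 150), (100, "e3", 185)]
thresholds = [30, 60, 90]

def counts_by_threshold(rows, thresholds):
    """Pair every row with every threshold and count rows that satisfy the condition."""
    counts = {t: 0 for t in thresholds}
    for (activity, _event_id, event), t in product(rows, thresholds):
        if activity >= event - t:  # same test as the binary_condition column
            counts[t] += 1
    return counts

result = counts_by_threshold(rows, thresholds)
print(result)
```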
If I understand your question correctly, you want to divide the difference between EventDate and ActivityDate in 30 days blocks (e.g. 1 to 30, 31 to 60, 61 to 90 and so on) and then count the frequency of each block.
In this case, I would just rearrange the above equation to create the variable 'range' as below:
-- assuming input contains 3 columns: ActivityDate, EventId, EventDate
-- divide the difference between EventDate and ActivityDate by 30 and cast it to int, so that each block is represented by one integer
input1 = FOREACH input GENERATE (int)((EventDate - ActivityDate) / 30) as range;
output1 = GROUP input1 BY range;
output2 = FOREACH output1 GENERATE group AS range, COUNT(input1) as count;
Hope this helps.
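For comparison, the bucketing approach can be sketched in Python as follows. Dates are again plain day numbers, and the sample rows and the name bucket_counts are made up for illustration.

```python
from collections import Counter

# hypothetical rows: (ActivityDate, EventId, EventDate), dates as day numbers
rows = [(100, "e1", 120), (100, "e2", 150), (100, "e3", 185), (100, "e4", 125)]

def bucket_counts(rows, width=30):
    """Map each row to a block index (difference divided by width, truncated) and count per block."""
    return Counter(int((event - activity) / width) for activity, _event_id, event in rows)

buckets = dict(bucket_counts(rows))
print(buckets)
```

Note that this counts each row in exactly one block, whereas the threshold approach counts a row under every threshold it satisfies.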

simpler mdx filter solution

I have a query which returns results for last 12 months, and I need to apply a filter in a way, that only product models with particular measure in last 7 months > 0 (at least in one of these months) are returned.
I could do it in this way:
SELECT {[Measures].[MQ]} ON COLUMNS,
FILTER([dim_ProductModel].[Product Model].members, (([Dim_Date].[Date Full].&[2013-08-01],[Measures].[MQ]) > 0) OR (([Dim_Date].[Date Full].&[2013-09-01],[Measures].[MQ]) > 0) OR ([Dim_Date].[Date Full].&[2013-10-01],[Measures].[MQ]) > 0) * {[Dim_Date].[Date Full].&[2013-08-01]:[Dim_Date].[Date Full].&[2014-02-01]} ON ROWS FROM [cub_dashboard_spares]
(I left out the other OR conditions.) So I would need 6 ORs, which I don't like.
Somehow it is not possible to write the filter the way I would expect (pseudocode):
FILTER([dim_ProductModel].[Product Model].members, (ANY({[Dim_Date].[Date Full].&[2013-08-01]:[Dim_Date].[Date Full].&[2014-02-01]}),[Measures].[MQ]) > 0)
Is there some trick or syntax to avoid using multiple ORs, something like ANY?
Thank you very much in advance for your help.
You could use another Filter() and check if that has at least one result with Count:
FILTER([dim_ProductModel].[Product Model].members,
FILTER({[Dim_Date].[Date Full].&[2013-08-01]:[Dim_Date].[Date Full].&[2014-02-01]},
[Measures].[MQ] > 0
).Count > 0
)
Assuming there are no negative values of MQ, you can also observe that if at least one month has values > 0, then the sum must be > 0, and use
FILTER([dim_ProductModel].[Product Model].members,
Sum({[Dim_Date].[Date Full].&[2013-08-01]:[Dim_Date].[Date Full].&[2014-02-01]},
[Measures].[MQ]
)
> 0
)
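The reasoning behind both variants can be checked with a small Python sketch: for non-negative values, "at least one month is greater than 0" and "the sum over the months is greater than 0" select the same models. The model names and monthly values are invented for illustration.

```python
# hypothetical MQ values per product model for the 7 months in the range
mq = {
    "model_a": [0, 0, 3, 0, 0, 0, 0],
    "model_b": [0, 0, 0, 0, 0, 0, 0],
    "model_c": [1, 0, 0, 0, 0, 0, 2],
}

# nested-filter variant: keep a model if any month has MQ > 0
any_positive = [m for m, vals in mq.items() if any(v > 0 for v in vals)]

# sum variant: valid only under the assumption that MQ is never negative
sum_positive = [m for m, vals in mq.items() if sum(vals) > 0]

print(any_positive, sum_positive)
```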

How to get desired output in row RDLC report

I have the following report, and I can get all the rows except the top one:
SERVER NAME                                 COUNT1   COUNT2   Count+% 7days   Count+% 30days
All Servers (Server1, Server2, Server 3)    7        1,501    500 (20%)       850 (53.3%)
Server 1                                    2        705      200 (28.3%)     350 (49.6%)
Server 2                                    3        396      100 (25.2%)     200 (50.5%)
Server 3                                    2        400      200 (50%)       300 (75%)
I have done the last three rows, but how do I get the top row? How can we sum up all the rows in the top row, as shown above?
The other columns are simple CountDistinct and Count expressions.
For the count and percentage of students:
=Sum(IIf(Fields!Logged7Days.Value = "no", 1, 0)) &" ( "& FormatNumber((Sum(IIf(Fields!Logged7Days.Value = "no", 1, 0)) *100) / count(Fields!f_IdPupil.Value),2) & "%) "
For the next column, simply change the field to the 30-day one. Basically, these columns tell whether the student was present within the last 7 days or 30 days.
SAMPLE DATA:
Thank you
You can use your expression in any table/group header row; it will just be applied in that current scope, e.g. in a group header the aggregate will be applied to all rows in the group, and in a table header it will be applied to all rows in the dataset.
Say I have the following data:
I've created a simple report based on this:
You can see there are two table header rows, one with headings and one with data, and one group header row - the group is based on ServerName.
For the 7 Day Expression column both the fields in the table header row and the group header row have exactly the same expression:
=Sum(IIf(Fields!Logged7Days.Value = "no", 1, 0))
& " ( "
& FormatNumber((Sum(IIf(Fields!Logged7Days.Value = "no", 1, 0)) * 100)
/ count(Fields!f_IdPupil.Value),2) & "%) "
This is just your exact same expression with some minor formatting. A similar expression is applied in the 30 Day Expression column:
=Sum(IIf(Fields!Logged30Days.Value = "no", 1, 0))
& " ( "
& FormatNumber((Sum(IIf(Fields!Logged30Days.Value = "no", 1, 0)) * 100)
/ count(Fields!f_IdPupil.Value),2) & "%) "
You can see the results are as expected at both the group and grand total levels:
All this is to show that you just need to apply the same expression in the different scopes to get your results. If this doesn't look right to you; please let me know what results you expect based on my dataset.
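The scope idea can be illustrated with a short Python sketch: one function plays the role of the expression, and it is applied once per server group and once to the whole dataset. The sample rows and the name count_with_pct are assumptions for illustration, not the report's real data.

```python
from collections import defaultdict

# hypothetical dataset rows: one row per pupil
rows = [
    {"server": "Server 1", "logged7": "no"},
    {"server": "Server 1", "logged7": "yes"},
    {"server": "Server 1", "logged7": "no"},
    {"server": "Server 1", "logged7": "yes"},
    {"server": "Server 2", "logged7": "yes"},
    {"server": "Server 2", "logged7": "yes"},
    {"server": "Server 2", "logged7": "no"},
    {"server": "Server 2", "logged7": "yes"},
]

def count_with_pct(scope_rows):
    """Same logic as the report expression, evaluated over whatever rows are in scope."""
    absent = sum(1 for r in scope_rows if r["logged7"] == "no")
    pct = absent * 100 / len(scope_rows)
    return f"{absent} ( {pct:.2f}%) "

# group scope: one result per server
groups = defaultdict(list)
for r in rows:
    groups[r["server"]].append(r)
per_server = {name: count_with_pct(g) for name, g in groups.items()}

# table scope: the same function over all rows gives the top row
grand_total = count_with_pct(rows)
print(per_server, grand_total)
```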

How can I merge two outputs of two Linq queries?

I'm trying to merge these two objects but am not totally sure how. Can you help me merge these two result objects?
//
// Create Linq Query for all segments in "CognosSecurity"
//
var userListAuthoritative = (from c in ctx.CognosSecurities
where (c.SecurityType == 1 || c.SecurityType == 2)
select new {c.SecurityType, c.LoginName , c.SecurityName}).Distinct();
//
// Create Linq Query for all segments in "CognosSecurity"
//
var userListAuthoritative3 = (from c in ctx.CognosSecurities
where c.SecurityType == 3 || c.SecurityType == 0
select new {c.SecurityType , c.LoginName }).Distinct();
I think I see where to go with this... but to answer the question the types of the objects are int, string, string for SecurityType, LoginName , and SecurityName respectively
If you're wondering why I have them split like this, it is because I want to ignore one column when doing a distinct. Here are the SQL queries that I'm converting to LINQ.
select distinct SecurityType, LoginName, 'Segment'+'-'+SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType =1
select distinct SecurityType, LoginName, 'Business Line'+'-'+SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType =2
select distinct SecurityType, LoginName, SecurityName
FROM [NFPDW].[dbo].[CognosSecurity]
where SecurityType in (1,2)
You can't join these because the types are different (first has 3 properties in the resulting type, second has two).
If you can tolerate putting a null value in for the third property in the results of the second query, this will help. I would then suggest you just do userListAuthoritative.Concat(userListAuthoritative3), BUT I think this will not work, as the anonymous types generated by LINQ will not be of the same class, even though the structure is the same. To solve that, you can either define a CustomType to encapsulate the tuple and do select new CustomType{ ... } in both queries, or post-process the results using Select() in a similar fashion.
Actually, the latter Select() approach will also let you solve the property count mismatch by filling in a null when projecting to CustomType in the post-process step.
EDIT: According to the comment below once the structures are the same the anonymous types will be the same.
I assume that you want to keep the results distinct:
var merged = userListAuthoritative.Concat(userListAuthoritative3).Distinct();
And, as Mike Q pointed out, you need to make sure that your types match, either by giving the anonymous types the same signature, or by creating your own POCO class specifically for this purpose.
Edit
If I understand your edit, you want your Distinct to ignore the SecurityName column. Is that correct?
var userListAuthoritative = from c in ctx.CognosSecurities
where new[]{0,1,2,3}.Contains(c.SecurityType)
group new {c.SecurityType, c.LoginName, c.SecurityName}
by new {c.SecurityType, c.LoginName} into g
select g.FirstOrDefault();
I'm not exactly sure what you mean by merge, since you're returning different (anonymous) types from each one. Is there a reason the following doesn't work for you?
var userListAuthoritative = (from c in ctx.CognosSecurities
where (c.SecurityType == 1 || c.SecurityType == 2 || c.SecurityType == 3 || c.SecurityType == 0)
select new {c.SecurityType, c.LoginName , c.SecurityName}).Distinct();
Edit: This assumed they were of the same type -- but they're not.
userListAuthoritative.Concat(userListAuthoritative3);
Try below code, you might need to implement IEqualityComparer<T> in your ctx type.
var merged = userListAuthoritative.Union(userListAuthoritative3);
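The shape-matching step discussed in the answers can be sketched in Python, where tuples stand in for the anonymous types: pad the two-column results with None so both sequences have the same shape, then concatenate and de-duplicate. The sample values are invented for illustration.

```python
# hypothetical results of the first query: (SecurityType, LoginName, SecurityName)
with_name = [(1, "alice", "Segment-A"), (2, "bob", "Line-B")]

# hypothetical results of the second query: (SecurityType, LoginName)
without_name = [(3, "carol"), (0, "dave"), (3, "carol")]

# give the second result the same three-field shape, with None for the missing name
padded = [(sec_type, login, None) for sec_type, login in without_name]

# concatenate, then drop duplicates while keeping first-seen order
merged = list(dict.fromkeys(with_name + padded))
print(merged)
```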
