Creating advanced SUMIF() calculations in Quicksight - amazon-quicksight

I have a couple of joined Athena tables in Quicksight. The data looks something like this:
Ans_Count | ID | Alias
10 | 1 | A
10 | 1 | B
10 | 1 | C
20 | 2 | D
20 | 2 | E
20 | 2 | F
I want to create a calculated field such that it sums the Ans_Count column based on distinct IDs only. i.e., in the example above the result should be 30.
How do I do that?? Thanks!

Are you looking for the sum before or after applying a filter?
Sumif(Ans_Count,ID) may be what you're looking for.
If you need to always return the result of the sum, regardless of the filter on the visual, look at the sumOver() function.

You can use distinctCountOver at PRE_AGG level to count unique number of values for a given partition. You could use that count to drive the sumIf condition as well.
Example : distinctCountOver(operand, [partition fields], PRE_AGG)
More details about the visual's group-by specification, and an example where there are duplicate IDs, would help in giving a specific solution.
It might even be as simple as minOver(Ans_Count, [ID], PRE_AGG) and using SUM aggregation on top of it in the visual.

If you want another column with the values repeated, use sumOver(Ans_Count, [ID], PRE_AGG). Or, if you want to aggregate via QuickSight, you would use sumOver(sum(Ans_Count), [ID]).

I agree with the above suggestions to use sumOver(sum(Ans_Count), [ID]).
I have yet to understand the use cases for pre_agg, so if anyone has concrete examples please share them!
Another suggestion would be to do the sum over with a partition by in your table (if possible) before uploading the dataset, then check whether the results match QuickSight's aggregations. I find QuickSight can be tricky with calculated fields, aggregations, and nested ifs, so I've been doing calculations in SQL where possible before bringing the data into QuickSight, to get a better grasp of what the outputs should look like. This is obviously an extra step, but it helps in understanding how QuickSight performs its calculations (the documentation doesn't always say much) and in spotting things that don't look right (I've had a few) before you share your analysis with a wider group.
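To check what the calculated field should return, here is a minimal Python sketch of the target from the question: keep one Ans_Count per ID, then sum those values. It only reproduces the arithmetic on the sample rows; it is not QuickSight syntax, and the variable names are made up for illustration.

```python
# Sample rows from the question: (Ans_Count, ID, Alias)
rows = [
    (10, 1, "A"), (10, 1, "B"), (10, 1, "C"),
    (20, 2, "D"), (20, 2, "E"), (20, 2, "F"),
]

# Keep a single Ans_Count per ID. Taking the minimum mirrors the idea behind
# minOver(Ans_Count, [ID], PRE_AGG); any per-ID pick works when the value repeats.
per_id = {}
for ans_count, id_, _alias in rows:
    per_id[id_] = min(ans_count, per_id.get(id_, ans_count))

# Sum over distinct IDs only.
distinct_id_total = sum(per_id.values())
print(distinct_id_total)  # 30
```

If the number you get in the visual is 90 instead of 30, the repeated rows are being summed rather than one value per ID.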

Related

Perform an OR operation on same field from multiple rows SSRS

I am a beginner trying to achieve a simple operation in SSRS using Visual Studio 2019. I have a query which returns a table as follows
ID | Name | Married
1 | Jack | Y
2 | Jack | N
The number of records might vary depending on the number of results. On the report, I want to display only the field 'Married' once. The value of the field will be determined using an OR operation, i.e. if the field 'Married' is 'Y' for any one record, I want to display a 'Y' on the report.
Assuming the Values are either Y or N, you should be able to use something like
=MAX(Fields!Married.Value)
If your report is grouped by, for example, Name, then this will give you the MAX value within each group, which is probably what you want.
If this does not help, edit your question and show:
Your report design
Row Group panel plus details of grouping
A larger sample of data
Expected results from that sample data
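The reason MAX acts like an OR here is that 'Y' sorts after 'N', so the maximum of a group is 'Y' as soon as one row has it. A small Python sketch of that idea, assuming the values are only 'Y' or 'N' and the report is grouped by Name (the grouping is an assumption, as in the answer):

```python
from collections import defaultdict

# Sample rows from the question.
rows = [
    {"ID": 1, "Name": "Jack", "Married": "Y"},
    {"ID": 2, "Name": "Jack", "Married": "N"},
]

# Collect the Married flags per Name (the assumed report group).
flags_by_name = defaultdict(list)
for row in rows:
    flags_by_name[row["Name"]].append(row["Married"])

# "Y" > "N" in string comparison, so max() returns "Y" if any row is "Y".
married_by_name = {name: max(flags) for name, flags in flags_by_name.items()}
print(married_by_name)  # {'Jack': 'Y'}
```

This stops working if the column can hold other values (blanks, lowercase letters), so check the data first.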

Get matching records in Laravel

I'm not sure if this is obvious and I don't see it because it is late here, but right now I'm struggling with the following:
I'm trying to find out if there is a match somewhere. So, profile 2 liked profile 1 and also, profile 1 liked profile 2. That would be a match.
I tried combining arrays, but that went nowhere. ._. How could I achieve this with Laravel queries?
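// Self-join the likes table and keep only rows where the like is returned.
$likey = DB::table('likes AS liker')
->join('likes AS liked', 'liker.liked_id', '=', 'liked.liker_id')
->select('liked.liker_id', 'liked.liked_id')
// whereColumn() compares two columns; where() would bind 'liked.liked_id' as a string value.
->whereColumn('liker.liker_id', '=', 'liked.liked_id')
->get();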
Something along those lines.
EDIT: Just to clarify this solution, so you aren't tempted to copy-paste it without figuring out what just happened here:
We are joining (using INNER JOIN, very important) this table to itself simply because (just as you said) we have to check it twice: first for the liker (the one who liked someone's profile first), then for the liked (the one who responded with a like in return). With that in mind, we join the table by matching liked_id from the first copy against liker_id in the second copy.
Which should give us joined result looking like:
liker.liker_id | liker.liked_id | liked.liker_id | liked.liked_id
-----------------------------------------------------------------
2 | 1 | 1 | 2
1 | 2 | 2 | 1
Mind you, this will give us duplicates! (VERY IMPORTANT).
With that in mind, I would think about redesigning your table. For example, adding a boolean column named "liked_back" will give you much cheaper and cleaner queries than doing whatever this is...
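To see the self-join and the duplicates outside Laravel, here is a rough sketch using Python's built-in sqlite3 module. The table and column names come from the question; the one-way like from profile 3 and the `liker_id < liked_id` filter used to keep one row per pair are my own additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (liker_id INTEGER, liked_id INTEGER)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [(2, 1), (1, 2), (3, 1)],  # 3 -> 1 is a one-way like (made-up row)
)

# Self-join: a row matches when the liked profile has liked the liker back.
# The WHERE clause keeps only one of the two mirrored rows per pair.
matches = conn.execute(
    """
    SELECT liker.liker_id, liker.liked_id
    FROM likes AS liker
    INNER JOIN likes AS liked
        ON liker.liked_id = liked.liker_id
       AND liker.liker_id = liked.liked_id
    WHERE liker.liker_id < liker.liked_id
    ORDER BY liker.liker_id
    """
).fetchall()
print(matches)  # [(1, 2)]
```

Without the WHERE clause the same match comes back twice, once as (1, 2) and once as (2, 1).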

Filter after grouping columns in Power BI

I want to accomplish something easy to understand (and maybe easy to do but I can't find a way...).
I have a table which represents the date when a client has bought something.
Let's have this example:
=============================================
Purchase_id | Purchase_date | Client_id
=============================================
1 | 2016/03/02 | 1
---------------------------------------------
2 | 2016/03/02 | 2
---------------------------------------------
3 | 2016/03/11 | 3
---------------------------------------------
I want to create a single number card showing the average number of purchases made per day.
So for this example, the result would be:
Result = 3 purchases / 2 different days = 1.5
I managed to do it by grouping by Purchase_date in my query, with a new column holding the number of rows.
It gives me the following result:
==================================
Purchase_date | Number of rows
==================================
2016/03/02 | 2
----------------------------------
2016/03/11 | 1
----------------------------------
Then I put the field Number of rows in a single number card, selecting "Average".
I should point out that I am using DirectQuery with SQL Server.
But the problem is that I want to have a filter on the Client_id. And once I do the grouping, I lose this column.
Is there a way to have this Client_id as a parameter?
Maybe even the fact of grouping is not the right solution here.
Thank you in advance.
You can create a measure to calculate this average.
From Power BI's docs:
The calculated results of measures are always changing in response to
your interaction with your reports, allowing for fast and dynamic
ad-hoc data exploration
This means filtering client_id's will change the measure accordingly.
Here is an easy way of defining this measure:
Result = DISTINCTCOUNT(tableName[Purchase_id]) / DISTINCTCOUNT(tableName[Purchase_date])
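As a sanity check on the numbers, here is a small Python sketch of the calculation the question asks for: distinct purchases divided by distinct purchase days, recomputed after an optional Client_id filter. The function name and the `client_ids` parameter are hypothetical; this only mimics how a measure reacts to a filter and is not DAX.

```python
from datetime import date

# Sample rows from the question.
purchases = [
    {"Purchase_id": 1, "Purchase_date": date(2016, 3, 2), "Client_id": 1},
    {"Purchase_id": 2, "Purchase_date": date(2016, 3, 2), "Client_id": 2},
    {"Purchase_id": 3, "Purchase_date": date(2016, 3, 11), "Client_id": 3},
]

def average_purchases_per_day(rows, client_ids=None):
    """Distinct purchases / distinct purchase days, after an optional client filter."""
    if client_ids is not None:
        rows = [r for r in rows if r["Client_id"] in client_ids]
    purchase_ids = {r["Purchase_id"] for r in rows}
    days = {r["Purchase_date"] for r in rows}
    return len(purchase_ids) / len(days) if days else None

print(average_purchases_per_day(purchases))                     # 1.5
print(average_purchases_per_day(purchases, client_ids={1, 3}))  # 1.0
```

Because the filter is applied before counting, the Client_id column does not have to be lost to a grouping step.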

Simplifying a Cascading pipeline used for aggregating sales data

I'm very new to Cascading and Hadoop both, so be gentle... :-D
I think I'm finding myself way over-engineering something. Basically my situation is that I have a pipe delimited file with 9 fields. I want to compute some aggregated statistics over those 9 fields using different groupings. The result should be 10 fields of which only 6 are either counts or sums. So far I'm up to 4 Unique pipes, 4 CountBy pipes, 1 SumBy, 1 GroupBy, 1 Every, 2 Each, 5 CoGroups and a couple others. I'm needing to add another small piece of functionality and the only way I can see to do it is to add in 2 Filters, 2 more CoGroups and 2 more Each pipes. This all seems like way overkill just to compute a few aggregated statistics. So I'm thinking I'm really misunderstanding something.
My input file looks like this:
storeID | invoiceID | groupID | customerID | transaction date | quantity | price | item type | customer type
Item type is either "I", "S" or "G" for inventory, service or group items, customers belong to groups. The rest should be self-explanatory
The result I want is:
project ID | storeID | year | month | unique invoices | unique groups | unique customers | customer visits | inventory type sales | service type sales |
project ID is a constant, customer visits is how many days during the month the customer came in and bought something
The setup that I'm using right now uses a TextDelimited Tap as my source to read the file and passes the records to an Each pipe which uses a DateParser to parse the transaction date and adds in year, month and day fields. So far so good. This is where it gets out of control.
I'm splitting the stream from there up into 5 separate streams to process each of the aggregated fields that I want. Then I'm joining all the results together in 5 CoGroup pipes, sending the result through Insert (to insert the project ID) and writing through a TextDelimited sink Tap.
Is there an easier way than splitting into 5 streams like that? The first four streams do almost the exact same thing just on different fields. For example, the first stream uses a Unique pipe to just get unique invoiceID's then uses a CountBy to count the number of records with the same storeID, year and month. That gives me the number of unique invoices created for each store by year and month. Then there is a stream that does the same thing with groupID and another that does it with customerID.
Any ideas for simplifying this? There must be an easier way.
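For reference, this is roughly the result I'm after, written as a single pass in plain Python rather than Cascading. The sample records are made up, and several details are assumptions: the date is taken to be YYYY-MM-DD, sales are taken to be quantity times price, and customer visits are counted as distinct (customer, day) pairs.

```python
from collections import defaultdict

# Hypothetical records with the nine pipe-delimited fields from the question.
lines = [
    "S1|INV1|G1|C1|2016-03-02|2|5.00|I|retail",
    "S1|INV1|G1|C1|2016-03-02|1|20.00|S|retail",
    "S1|INV2|G1|C2|2016-03-02|3|4.00|I|retail",
    "S1|INV3|G2|C1|2016-03-11|1|10.00|S|retail",
]

# One accumulator per (storeID, year, month) key.
stats = defaultdict(lambda: {
    "invoices": set(), "groups": set(), "customers": set(),
    "visits": set(), "inventory_sales": 0.0, "service_sales": 0.0,
})

for line in lines:
    store, invoice, group, customer, tx_date, qty, price, item_type, _ctype = line.split("|")
    year, month, day = tx_date.split("-")  # assumes YYYY-MM-DD
    s = stats[(store, year, month)]
    s["invoices"].add(invoice)
    s["groups"].add(group)
    s["customers"].add(customer)
    s["visits"].add((customer, day))  # assumption: one visit per customer per day
    amount = int(qty) * float(price)  # assumption: sales = quantity * price
    if item_type == "I":
        s["inventory_sales"] += amount
    elif item_type == "S":
        s["service_sales"] += amount

# Collapse the sets into counts for the final record.
summary = {
    key: {
        "unique_invoices": len(s["invoices"]),
        "unique_groups": len(s["groups"]),
        "unique_customers": len(s["customers"]),
        "customer_visits": len(s["visits"]),
        "inventory_sales": s["inventory_sales"],
        "service_sales": s["service_sales"],
    }
    for key, s in stats.items()
}
print(summary[("S1", "2016", "03")])
```

Every statistic shares the same storeID, year and month key, which is why splitting into five streams and joining them back feels like overkill to me.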

Calculate percentage of total columns based on total column in SSRS Matrix

Looking to add a column in my SSRS Matrix which will give me the percentage from the total column in that row.
I'm using the following expression, but I keep getting 100% for my percentages (I'm assuming this is because the total is evaluated last, so it's just doing Total/Total?).
=FORMAT((Fields!ID.Value/SUM(Fields!ID.Value)), "P")
The field ID is calculated within SQL, not SSRS.
For example
Site | Value 1 | %1 | Value2 | %2 | Total
1 | 20 | 50% | 20 | 50% | 40
This is probably happening because you need to define the right scope for the SUM function:
SUM(Fields!ID.Value,"group_name") instead of plain SUM(Fields!ID.Value)
Updated:
I needed some time to make an example since I didn't have reporting services available the first time I answered you.
You can see the result and the field values
It's hard to provide details without more info on the setup of your groups, but you should look at using the scope option of the aggregate functions like SUM or First:
=SUM(Fields!ID.Value, "NameOfRowGrouping") / SUM(Fields!ID.Value, "TopLevelGroupName")
Also, to keep things clean, you should move the format out of the expression and into either the placeholder properties or the properties of the textbox that contains your value.
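To illustrate why the scope matters, here is a small Python sketch using the numbers from the question's example row. The dictionary stands in for the cells of one row group; it only mimics the arithmetic and says nothing about how SSRS evaluates scopes internally.

```python
# Cells of one matrix row (Site 1), keyed by column, from the question's example.
cells = {"Value 1": 20, "Value 2": 20}

# Sum scoped to the whole row group, i.e. the Total column.
row_total = sum(cells.values())  # 40

# Without a wider scope, each cell is divided by the sum of its own cell: always 1.
unscoped = {col: value / value for col, value in cells.items()}

# With the row-group scope, each cell is divided by the row total.
scoped = {col: value / row_total for col, value in cells.items()}

print({col: f"{share:.0%}" for col, share in unscoped.items()})  # {'Value 1': '100%', 'Value 2': '100%'}
print({col: f"{share:.0%}" for col, share in scoped.items()})    # {'Value 1': '50%', 'Value 2': '50%'}
```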
