AWS QuickSight filtering based on result of a query or other dataset - amazon-quicksight

I want to create an analysis table in AWS Quicksight that shows the quantity sold in a given month and it's subsequent month based on users who made a purchase on the given month.
Let's say I have a dataset called user_orders with the following data:
+---------+----------+------------+
| user_id | quantity | order_date |
+---------+----------+------------+
| 1 | 2 | 2020-04-01 |
+---------+----------+------------+
| 1 | 3 | 2020-04-02 |
+---------+----------+------------+
| 1 | 1 | 2020-05-23 |
+---------+----------+------------+
| 1 | 2 | 2020-06-02 |
+---------+----------+------------+
| 2 | 1 | 2020-05-03 |
+---------+----------+------------+
| 2 | 1 | 2020-05-04 |
+---------+----------+------------+
| 3 | 2 | 2020-04-07 |
+---------+----------+------------+
| 3 | 1 | 2020-04-10 |
+---------+----------+------------+
| 3 | 1 | 2020-06-23 |
+---------+----------+------------+
For example, using the table above I want to be able to show how many quantities sold in April, May, June, and so on (max 12 months) by users who made a purchase in April.
The resulting table should look like this:
+-----------+----------+
| | quantity |
+-----------+----------+
| 04-2020 | 8 |
+-----------+----------+
| 05-2020 | 1 |
+-----------+----------+
| 06-2020 | 3 |
+-----------+----------+
8 sold in April because user_id 1 made 5 purchase and user_id 3 made 3 purchase while user_id 2 did not make any purchase.
There is only 1 item sold in May because only user_id 1 made the purchase in May, but also made a purchase in April. user_id 2 also made a purchase in May but didn't in April so it's not counted.
I can make the table above using PHP and MySQL fairly easily using the following code:
# first get all the user ids who made a purchase in April
$user_ids = sql_query("SELECT DISTINCT user_id FROM user_orders WHERE order_date BETWEEN '2020-04-01' AND '2020-04-30'");
# get the quantity sold for each month by users who made a purchase in April
$purchases = sql_query("SELECT MONTH(order_date), SUM(quantity) FROM user_orders WHERE user_id IN ({$user_ids}) AND order_date BETWEEN '2020-04-01' AND '2021-03-31' GROUP BY MONTH(order_date);")
(Obviously, April is just an example, I'd like to be able to change the starting month dynamically using QuickSight control)
As my above example shown, it requires two queries to perform this analysis. First, is to get the user_ids of the users, and the next is to actually get the quantity sold by the users.
I have been trying to achieve this using Quicksight for the last 3 days but hasn't found any way yet.
I hope someone can point me in the right direction.
Thank you!

You can achieve this by creating a calculated field like this and filtering on it
distinctCountOver(ifelse(truncDate('MM', {order_Date}) = parseDate('2020-04-01'), 1, NULL), [{user_id}], PRE_AGG)
(ofcourse, you can change the parseDate portion to be your date parameter)
Now, lets say the name of the above calculated field is SpecificMonthUser. You can add a filter sum(SpecificMonthUser) != 0.
And then create a pivot table visualization with OrderDate, user id in the rows and sum(quantity) in the values. You should get the desired result.

Related

How can I calculate the total number of timestamps for a given ID and code in Hive

We have code and products across different domains. We are tracking different timestamps at which the products transmit data. This how my data looks:
| timestamp_s | Product | Code | _c3 |
| 2017-01-01 01:18:04.40736 | A | 119 | 1 |
| 2017-01-01 01:18:05.20419 | A | 119 | 1 |
| 2017-01-01 01:18:11.21268 | A | 119 | 1 |
| 2017-01-01 10:48:22.52147 | A | 119 | 1 |
I want to find the total timestamps recorded for a product and code across different timestamps. Meaning like in the above case, the output count of timestamps for 2017-01-01 should be 4 for a unique product A and code 119 combination.
Basically, I want to see the total number of timestamp records for each day (meaning, the total count of all timestamp record(s) for 2017-01-01 as in the above case).
Use simple groupby+count distinct:
select count(distinct timestamp_s) ts_count, Product, Code
from table_name
group by Product, Code
Or if you need to have additional column with count on the same grain (without group by) use analytic function:
select timestamp_s, count(distinct timestamp_s) over(partition by Product, Code) ts_count, Product, Code
from table_name

Show daily count and the total count up to that day in Quicksight

I want to create a table analysis in AWS Quicksight that shows the number of new user per day and also the total number of user that has registered up until that day for the specified month.
The following sample table is what I want to achieve in Quicksight.
It shows the daily register count for March:
+-----------+----------------------+----------------------+
| | Daily Register Count | Total Register Count |
+-----------+----------------------+----------------------+
| March 1st | 2 | 42 |
+-----------+----------------------+----------------------+
| March 2nd | 5 | 47 |
+-----------+----------------------+----------------------+
| March 3rd | 3 | 50 |
+-----------+----------------------+----------------------+
| March 4th | 8 | 58 |
+-----------+----------------------+----------------------+
| March 5th | 2 | 60 |
+-----------+----------------------+----------------------+
The "Total Register Count" column above should show the total count of users registered from the beginning up until March 1st, and then for each row it should be incremented with the value from "Daily Register Count"
I'm absolutely scratching my head trying to implement the "Total Register Count". I have found some form of success using runningSum function however I need to be able to filter my dataset by month, and the runningSum function won't count the number outside of the filtered date.
My dataset is very simple, it looks like this:
+----+-------------+---------------+
| id | email | registered_at |
+----+-------------+---------------+
| 1 | aaa#aaa.com | 2020-01-01 |
+----+-------------+---------------+
| 2 | bbb#aaa.com | 2020-01-01 |
+----+-------------+---------------+
| 3 | ccc#aaa.com | 2020-01-03 |
+----+-------------+---------------+
| 4 | abc#aaa.com | 2020-01-04 |
+----+-------------+---------------+
| 5 | def#bbb.com | 2020-02-01 |
+----+-------------+---------------+
I hope someone can help me with this.
Thank you!
I am new to QuickSight but the way I was able to get Total Register Count is by creating a calculated field called count and assigned it the fixed value of 1.
Then I created a second calculated field "Total Register Count" with the following formula
runningSum(sum(count), [{ registered_at} ASC], [])
It sounds as if the CountOver function would work well for you. You'll need to partition your count by the day of the month (using the extract function). Here is a link related to the CountOver function.
https://docs.aws.amazon.com/quicksight/latest/user/countOver-function.html
This is called a Level Aware Aggregation in QuickSight. Here is additional information on that:
https://docs.aws.amazon.com/quicksight/latest/user/level-aware-aggregations.html
Here is information on the extract function:
https://docs.aws.amazon.com/quicksight/latest/user/extract-function.html
If I were to take a stab at your formula, it would look like this:
countover(ID,[extract('DD',registered_at)],PRE_FILTER)
Your table would have the registered_at field as the date.

Power BI chart with slicers and two bar when the second bar have modificated filter

I have small problem with chart in PowerBi.
I created 4 tables and set up connections between them:
Year
-----------
Id | Value
-----------
Y_19| 2019
Y_20| 2020
Y_21| 2021
Month
-----------
Id | Value
-----------
M_Jan | January
M_Feb | February
M_Mar | March
M_Apr | April
Tax
--------------------------
Id | Value | YearId
--------------------------
Tax_Y_19 | 0.9 | Y_19
Tax_Y_20 | 0.1 | Y_20
Tax_Y_21 | 0.4 | Y_21
Amount
-------------------------------
Id | MonthId | YearId | Value
A_01 | M_Jan | Y_19 | 10000
A_02 | M_Feb | Y_19 | 2000
A_03 | M_Mar | Y_19 | 4000
A_04 | M_Jan | Y_20 | 5000
A_05 | M_Feb | Y_20 | 70000
A_06 | M_Mar | Y_20 | 10000
Then I created slicers where I can choose year (connected column Year[Value]) and created chart with Amount[Value] (y axis), Month[Value] (x axis).
Everything works fine for me, thanks to the relationship of the tables, everything is filtered by year.
But I would like to show the values ​​multiplied by tax if I can somehow make the chart [Tax] * [Amount] - both tables are already filtered by the slicers and when I get it,
I would like to show the bar for Year-1 data on this chart, but I already have a global filter by slicers for a specific year, how could I do it.
I tried to use measure in which I get data from a slicers (Year) using selectedvalue ​​and change the filter on the tables (Tax and Amount) with keepfilter, but measure returns an error that I can't return the table
Your question isn't entirely clear but if your Year table has a relationship with Tax and Amount, then I'm guessing you're after something like this as a measure:
TaxedAmount =
SUMX ( Year, RELATED ( Tax[Value] ) * CALCULATE ( SUM ( Amount[Value] ) ) )

MySQL - Top 5 rank best seller plans or courses

I sell subscriptions of my online course, as well as the courses in retail.
I would bring the "top 5" of best selling plans / courses. For this, I have a table called "subscriptionPlan", which stores the purchased plan ID, or in the case of a course, the course ID, and the amount spent on this transaction. Example:
table subscriptionPlan
sbpId | subId | plaId | couId | sbpAmount
1 | 1 | 1 | 1 | 499.99
2 | 2 | 1 | 2 | 499.99
3 | 3 | 2 | 0 | 899.99
4 | 4 | 1 | 1 | 499.99
Just for educational purposes, plaId = 1 is a plan called "Single Sale" that I created, to maintain the integrity of the DB. When the couId isn't empty, you also have bought a separate course, not a plan where you can attend any course.
My need is: List the top 5 sales. If it is a plan, display the plan name (plan table, column plaTitle). If it is a course, display its name (table course, colna couTitle). This logic that I can't code. I was able to rank a top 5 of PLANS, but it groups the courses, since the GROUP BY is by the ID of the plan. I believe the prank is here, maybe creating an IF / ELSE in this GROUPBY, but I don't know how to do this.
The query that i code, to rank my top 5 plans is:
SELECT sp.plaId, sp.couId, p.plaTitle, p.plaPermanent, c.couTitle, SUM(sbpAmount) AS sbpTotalAmount
FROM subscriptionPlan sp
LEFT JOIN plan p ON sp.plaId = p.plaId
LEFT JOIN course c ON sp.couId = c.couId
GROUP BY sp.plaId
ORDER BY sbpTotalAmount DESC
LIMIT 5
The result that i expected was:
plaId | couId | plaTitle | couTitle | plaPermanent | sbpTotalAmount
1 | 1 | Venda avulsa | Curso 01 | true | 999.98
2 | 0 | Acesso total | null | false | 899.99
3 | 2 | Venda avulsa | Curso 02 | true | 499.99
How could I get into this query formula?
When grouping you can use:
Simple columns, or
Any [complex] expression.
In your case, it seems you need to group by an expression, such as:
GROUP BY CASE WHEN sp.plaId = 1 THEN -1 ELSE sp.couId END
In this case I chose -1 as the grouping for the "Single Plan". You can replace the value for any other that doesn't match any couId.

Hive Contiguous Date Ranges

I am using Hive and I would like to take a table with a historical list of customers, subscription events, and subscription types and summarize by contiguous runs of subscription types for each customer.
Example Input (db.cust_hist):
customer_id | eff_dt | exp_dt | sub_cd | sub_type
---------------------------------------------------------
1 | 02/01/2015 | 03/01/2015 | active | A
1 | 03/01/2015 | 04/01/2015 | active | A
1 | 03/15/2015 | 12/31/9999 | cancel | A
1 | 04/01/2015 | 05/01/2015 | active | A
1 | 05/01/2015 | 06/01/2015 | active | A
1 | 02/01/2015 | 03/01/2015 | active | B
1 | 03/01/2015 | 04/01/2015 | active | B
The sub_cd in this case refers to the type of event that is effective over the date range for that row. For example, the user canceled their A subscription type on 3/15 and resumed on 4/01.
The output I'm trying to get looks like this (db.cust_snapshot):
customer_id | eff_dt | exp_dt | sub_type
------------------------------------------------
1 | 02/01/2015 | 03/15/2015 | A
1 | 04/01/2015 | 06/01/2015 | A
1 | 02/01/2015 | 04/01/2015 | B
and reflects the gap in coverage.
From what I have read in this link from BetterAtOracle (specific to SQL) which does a very good job of laying things out, I need to use row numbers and a lagging window, but I can't seem to apply it to my situation in Hive (perhaps because of the 12/31/9999 notation/subscription code?)
I tried:
SELECT customer_id
, eff_dt
, exp_dt
, sub_cd
, sub_type
, CASE WHEN DATEDIFF(TO_DATE(eff_dt), TO_DATE(lag(exp_dt) OVER (PARTITION BY customer_id, sub_type ORDER BY eff_dt)) <=1 THEN NULL
ELSE row_number() OVER(PARTITION BY customer_id, sub_type ORDER BY eff_dt)
END) as grp
FROM db.cust_hist
ORDER BY TO_DATE(eff_dt)
As you can see, I haven't applied the subscription event code. This sort of gets me there as I can start to see different groups based on subscription type, but I feel like I'm stuck from here on out.
Any help or pointers would be greatly appreciated. Before this task, I never understood the true power of ranks, rows, lag, and other window functions!

Resources