Tableau Help! Adding calculated columns and D day filters - filter

I am fairly new to Tableau. I am working on app health attributes such as installs, uninstalls, reinstalls, logins, and levels such as l1, l2, l3 (that is, the level each user reaches).
There are two things I need to do:
1. Add D filters like
when datediff(days, event_time, install_time) < 2 then D1,
when datediff(days, event_time, install_time) < 7 then D7,
when datediff(days, event_time, install_time) < 30 then D30
else null end as d_filter
so that there is a drop-down filter of D days that applies to all the metrics.
2. Add columns like l1/installs and l2/installs. I tried creating a calculated field, but when I add it to the view it shows 0 values (even when I tried to view the ratio in a new sheet, it showed wrong calculations).
I am attaching screenshots for both. Any help would be greatly appreciated!
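To make the two requirements concrete, here is a minimal Python sketch of the logic, not Tableau syntax. It assumes the day difference is counted from install_time to event_time, that the D windows are cumulative (an event on the install day falls inside D1, D7 and D30 alike), and that the ratio is meant as sum(l1) / sum(installs) computed after aggregation rather than row by row. Dividing row by row is one common reason a ratio field shows 0 or misleading values. The sample rows and the function names are made up for illustration.

```python
from datetime import date

# Thresholds copied from the question: D1 means fewer than 2 days, and so on.
D_WINDOWS = {"D1": 2, "D7": 7, "D30": 30}

def in_window(install_time: date, event_time: date, window: str) -> bool:
    """True when the event falls inside the chosen D window after install."""
    days = (event_time - install_time).days  # assumption: install -> event
    return 0 <= days < D_WINDOWS[window]

def ratio(rows, numerator, denominator, window):
    """Aggregate first, then divide: sum(numerator) / sum(denominator)."""
    kept = [r for r in rows if in_window(r["install_time"], r["event_time"], window)]
    num = sum(r[numerator] for r in kept)
    den = sum(r[denominator] for r in kept)
    return num / den if den else None

# Hypothetical rows, one per user and event day.
rows = [
    {"install_time": date(2023, 1, 1), "event_time": date(2023, 1, 1), "installs": 1, "l1": 1},
    {"install_time": date(2023, 1, 1), "event_time": date(2023, 1, 4), "installs": 0, "l1": 1},
    {"install_time": date(2023, 1, 2), "event_time": date(2023, 1, 2), "installs": 1, "l1": 0},
]

print(ratio(rows, "l1", "installs", "D1"))
print(ratio(rows, "l1", "installs", "D7"))
```

Choosing a different window only changes which rows are kept, so the same selection applies to every metric that is aggregated from those rows.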

Related

Business Objects 4.x: need to combine two queries similar to a UNION

I can't seem to figure out how to combine the result of 2 Business Objects queries.
Both queries return a set of codes and a number of hours. Query 1 can have codes that do not appear in Query 2, and Query 2 can have codes that do not appear in Query 1.
The resulting report should contain all codes from both Query 1 and Query 2, a column with the sum of hours from Query 1 for that code, and a column with the sum of hours from Query 2 for that code. If one of the queries doesn't contain a code, the report should show a blank or a 0 total for it.
Example:
Q1 results:
|Code|Value|
|:---|:----|
|A|15|
|A|17|
|B|12|
|D|22|
|D|35|
|E|16|
|E|9|
|E|11|
Q2 results:
|Code|Value|
|:---|:----|
|A|5|
|A|19|
|B|33|
|C|17|
|C|24|
|E|78|
|E|12|
Report:
|Code|Value1|Value2|
|----|------|------|
|A|32|24|
|B|12|33|
|C| |41|
|D|57| |
|E|36|90|
|Total|137|188|
When I create the Business Object report table as normal, only the values of Query 1 are used, and I miss the row for value C. If I flip the queries around, I miss the row for value D.
How do I set up my report to show all the code values?
Edit: Sorry for the formatting of the tables, in the preview it looks perfect. :(
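
Outside Business Objects, the requested result amounts to a full outer join of the two per-code totals. The Python sketch below only illustrates that target shape using the sample data from the question; it is not Web Intelligence syntax, and the function names are made up for the example.

```python
from collections import defaultdict

q1 = [("A", 15), ("A", 17), ("B", 12), ("D", 22), ("D", 35), ("E", 16), ("E", 9), ("E", 11)]
q2 = [("A", 5), ("A", 19), ("B", 33), ("C", 17), ("C", 24), ("E", 78), ("E", 12)]

def totals(rows):
    """Sum the hours per code for one query result."""
    out = defaultdict(int)
    for code, value in rows:
        out[code] += value
    return dict(out)

def combine(left, right):
    """Full outer join on code: every code from either side, None where missing."""
    t1, t2 = totals(left), totals(right)
    return [(code, t1.get(code), t2.get(code)) for code in sorted(t1.keys() | t2.keys())]

report = combine(q1, q2)
for code, v1, v2 in report:
    # Print a blank where one query has no rows for the code.
    print(code, "" if v1 is None else v1, "" if v2 is None else v2)
```

The key point is that the list of codes comes from the union of both results, not from either query alone.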

How to create a measure to count the number of times an item appears in a filtered list (Power BI, DAX)

Hi, I would like to create a measure in Power BI that returns the number of times the terminal code appears in my list. The list is filtered by a slicer when I select the service code.
Thanks for any help! I've been stuck on this seemingly simple problem for a few days!
If you want to count occurrences within the currently selected filter context, use ALLSELECTED:
CountOfCode = CALCULATE( COUNTROWS(Sheet1), filter(ALLSELECTED(Sheet1), Sheet1[Terminal Code ] = SELECTEDVALUE(Sheet1[Terminal Code ]) ))
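As a rough Python analogue of what the measure is after: restrict the rows to the current slicer selection, then count how many of them share a given terminal code. The field names and sample rows are hypothetical, and this mirrors the idea only, not DAX filter-context semantics.

```python
from collections import Counter

# Hypothetical rows standing in for Sheet1.
sheet1 = [
    {"service_code": "S1", "terminal_code": "T1"},
    {"service_code": "S1", "terminal_code": "T2"},
    {"service_code": "S1", "terminal_code": "T1"},
    {"service_code": "S2", "terminal_code": "T1"},
]

def count_of_code(rows, selected_service):
    """Count each terminal code among the rows kept by the slicer selection."""
    selected = [r for r in rows if r["service_code"] == selected_service]
    return Counter(r["terminal_code"] for r in selected)

counts = count_of_code(sheet1, "S1")
print(counts["T1"], counts["T2"])
```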

PowerBI: Slicer to filter a table Only when more than 1 value is selected

I have a table with 5 categories, with units shown as 2 types, Actual and Budget.
I want to filter this table only when 2 or more values are selected in the slicer. Something like this.
I thought of adding a measure, but I don't know exactly how to write the IF statement.
Measure = IF(COUNTROWS(ALLSELECTED(Report[Shipment Group])) = 1, "Something which would not filter the units", SELECTEDVALUE(Report[Units], SUM(Report[Units])))
I'm not sure if this is the correct approach, and I would like to know if any other approach is possible. Any help would be appreciated. Thank you in advance.
This is a bit of an odd request, but I think I have something that works.
First, you need to create a separate table for your slicer values (or else you can't control filtering how you want). You can hit the new table button and define it as follows:
Groups = VALUES(Report[Shipment Group])
Set your slicer to use Groups[Shipment Group] instead of Report[Shipment Group].
Define your new measure as follows:
Measure = IF(COUNTROWS(ALLSELECTED(Groups[Shipment Group])) = 1,
    SUM(Report[Units]),
    SUMX(FILTER(Report,
            Report[Shipment Group] IN VALUES(Groups[Shipment Group])),
        Report[Units]))
or equivalently
Measure = IF(COUNTROWS(ALLSELECTED(Groups[Shipment Group])) = 1,
    SUM(Report[Units]),
    CALCULATE(SUM(Report[Units]),
        FILTER(Report,
            Report[Shipment Group] IN VALUES(Groups[Shipment Group]))))
Note: Double check that Power BI has not automatically created a relationship between the Groups and Report tables. You don't want that.
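The branching in the measure can be sketched in Python as follows. The rows and the `measure` function are hypothetical stand-ins, and the sketch only computes a grand total; the point is that a disconnected slicer table lets the calculation decide for itself whether the selection is applied.

```python
# Hypothetical rows standing in for the Report table.
report = [
    {"group": "A", "units": 10},
    {"group": "B", "units": 20},
    {"group": "C", "units": 30},
]

def measure(rows, selected_groups):
    """Ignore the slicer when exactly one group is picked, otherwise apply it."""
    if len(selected_groups) == 1:
        return sum(r["units"] for r in rows)  # selection has no effect
    return sum(r["units"] for r in rows if r["group"] in selected_groups)

print(measure(report, {"A"}))        # one value selected: unfiltered total
print(measure(report, {"A", "B"}))   # two values selected: filtered total
```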

How to design querying multiple tags on analytics database

I would like to store custom tags on each user purchase transaction. For example, if a user bought shoes, the tags are "SPORTS", "NIKE", "SHOES", "COLOUR_BLACK", "SIZE_12", ...
These are the tags the seller is interested in querying to understand sales.
My idea is to create a new code whenever a new tag comes in (something like a hash code, but sequential). Codes start with the 26 letters "a" to "z" and then continue with "aa", "ab", "ac", ..., "zz", and so on. All the tags given for one transaction are then kept in a single column called tag (varchar), separated by "|".
Let us assume the mapping is (at application level)
"SPORTS" = a
"TENNIS" = b
"CRICKET" = c
...
...
"NIKE" = z //Brands company
"ADIDAS" = aa
"WOODLAND" = ab
...
...
SHOES = ay
...
...
COLOUR_BLACK = bc
COLOUR_RED = bd
COLOUR_BLUE = be
...
SIZE_12 = cq
...
So when storing the purchase transaction above, the tag will be tag="|a|z|ay|bc|cq|". The seller can then search for the number of SHOES sold by adding the WHERE condition tag LIKE '%|ay|%'. The problem is that I cannot use an index (sort key in Redshift) for a LIKE pattern that starts with "%". How can I solve this, since I might have 100 million records? I don't want a full table scan.
Is there any solution for this?
Update_1:
I have not followed the bridge table (cross-reference table) concept, since I want to perform a GROUP BY on the results after searching for the specified tags. My solution gives only one row when two tags match in a single transaction, but wouldn't a bridge table give me two rows? Then my sum() would be doubled.
I got a suggestion like the one below:
EXISTS (SELECT 1 FROM transaction_tag WHERE tag_id = 'zz' and trans_id = tr.trans_id) in the WHERE clause, once for each tag (note: assumes tr is an alias to the transaction table in the surrounding query)
I have not followed this, since I have to perform AND and OR conditions on the tags, for example: ("SPORTS" AND "ADIDAS"); "SHOE" AND ("NIKE" OR "ADIDAS").
Update_2:
I have not followed the bitfield approach, since I don't know whether Redshift supports it. Also, assuming my system will have a minimum of 3500 tags and I allocate one bit for each, that results in about 437 bytes per transaction, even though a transaction can be given at most 5 tags. Is there any optimisation here?
Solution_1:
I have thought of adding a min value (SMALLINT) and a max value (SMALLINT) alongside the tags column, and applying an index on them.
Something like this:
"SPORTS" = a = 1
"TENNIS" = b = 2
"CRICKET" = c = 3
...
...
"NIKE" = z = 26
"ADIDAS" = aa = 27
So my column values are
`tag="|a|z|ay|bc|cq|"` //sorted?
`minTag=1`
`maxTag=95` //for cq
And query for searching shoe(ay=51) is
maxTag <= 51 AND tag LIKE %|ay|%
And query for searching shoe(ay=51) AND SIZE_12 (cq=95) is
minTag >= 51 AND maxTag <= 95 AND tag LIKE %|ay|%|cq|%
Will this give any benefit? Kindly suggest any alternatives.
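A small Python sketch of the numbering and of the range idea, under stated assumptions. The function names are made up for illustration. It relies on one fact: a row can only contain a code k if minTag <= k <= maxTag, so for the sample row (minTag=1, maxTag=95) a prefilter that keeps rows containing shoe (51) would read minTag <= 51 AND maxTag >= 51. The range test can only rule rows out; the exact tag check is still needed afterwards.

```python
def code_to_int(code: str) -> int:
    """Map a, b, ..., z, aa, ab, ... to 1, 2, ..., 26, 27, 28, ..."""
    n = 0
    for ch in code:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n

def may_contain(min_tag: int, max_tag: int, wanted: int) -> bool:
    """Range prefilter: a row can only hold `wanted` if min <= wanted <= max."""
    return min_tag <= wanted <= max_tag

def matches(tag: str, wanted_codes) -> bool:
    """Exact check, the equivalent of one LIKE '%|code|%' test per wanted code."""
    present = set(tag.strip("|").split("|"))
    return all(code in present for code in wanted_codes)

tag = "|a|z|ay|bc|cq|"
numbers = [code_to_int(c) for c in tag.strip("|").split("|")]
min_tag, max_tag = min(numbers), max(numbers)

print(numbers, min_tag, max_tag)
print(may_contain(min_tag, max_tag, code_to_int("ay")), matches(tag, ["ay", "cq"]))
```

Because most rows will span a wide range of codes, this prefilter may exclude few rows in practice; that is an observation about the sketch, not a measured result.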
You can implement auto-tagging while the files are loaded to S3. Tagging at the DB level is too late in the process; it is tedious and involves a lot of hard-coding.
1. While loading to S3, tag the object using the AWS s3api, for example:
aws s3api put-object-tagging --bucket --key --tagging "TagSet=[{Key=Addidas,Value=AY}]"
capture tags dynamically by sending and as a parameter
2. Load the tags to DynamoDB as a metadata store.
3. Load the data to Redshift using the COPY command from S3.
You can store the tags column as a varchar bit mask, i.e. a strictly defined sequence of 1s and 0s: if a purchase is marked with a tag, its position holds 1, and otherwise 0. Every row then has a sequence of 0s and 1s whose length equals the number of tags you have. This sequence is sortable. You would still need to look into the middle of it, but you know the specific position to look at, so you don't need LIKE, just SUBSTRING. For further optimization, you can convert this bit mask to integer values (unique for each sequence) and match on those, but AFAIK Redshift doesn't support that out of the box yet, so you would have to define the rules yourself.
UPD: It looks like the best option here is to keep tags in a separate table and create an ETL process that unwraps the tags into a tabular structure of order_id, tag_id, distributed by order_id and sorted by tag_id. Optionally, you can create a view that joins this table with the order table. Lookups for orders with a particular tag, and further aggregations of those orders, should then be efficient. There is no silver bullet for optimizing this in a flat table; at least I don't know of one that would not bring a lot of unnecessary complexity compared with the "relational" solution.
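The separate-table layout can be tried out with a small Python sketch. SQLite is used here purely as a stand-in for Redshift, so distribution and sort keys are not modelled, and the table names, column names and sample rows are hypothetical. It shows that testing each tag condition with EXISTS keeps one row per order, so AND/OR combinations do not double the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE order_tag (order_id INTEGER, tag_id TEXT);
INSERT INTO orders VALUES (1, 100), (2, 50), (3, 70);
INSERT INTO order_tag VALUES
  (1, 'SHOES'), (1, 'NIKE'), (1, 'SPORTS'),
  (2, 'SHOES'), (2, 'ADIDAS'),
  (3, 'SPORTS'), (3, 'ADIDAS');
""")

# "SHOES" AND ("NIKE" OR "ADIDAS"): one EXISTS per AND term, IN for the OR group.
shoes_by_brand = conn.execute("""
SELECT COUNT(*), SUM(o.amount)
FROM orders o
WHERE EXISTS (SELECT 1 FROM order_tag t
              WHERE t.order_id = o.order_id AND t.tag_id = 'SHOES')
  AND EXISTS (SELECT 1 FROM order_tag t
              WHERE t.order_id = o.order_id AND t.tag_id IN ('NIKE', 'ADIDAS'))
""").fetchone()

# "SPORTS" AND "ADIDAS"
sports_adidas = conn.execute("""
SELECT COUNT(*), SUM(o.amount)
FROM orders o
WHERE EXISTS (SELECT 1 FROM order_tag t
              WHERE t.order_id = o.order_id AND t.tag_id = 'SPORTS')
  AND EXISTS (SELECT 1 FROM order_tag t
              WHERE t.order_id = o.order_id AND t.tag_id = 'ADIDAS')
""").fetchone()

print(shoes_by_brand, sports_adidas)
```

Because the outer query reads only from the orders table, each order contributes to the aggregate once, however many of its tags match.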

Max/Min for whole sets of records in PIG

I have a set of records that I am loading from a file, and the first thing I need to do is get the max and min of a column.
In SQL I would do this with a subquery like this:
select c.state, c.population,
(select max(c.population) from state_info c) as max_pop,
(select min(c.population) from state_info c) as min_pop
from state_info c
I assume there must be an easy way to do this in PIG as well but I'm having trouble finding it. It has a MAX and MIN function but when I tried doing the following it didn't work:
records=LOAD '/Users/Winter/School/st_incm.txt' AS (state:chararray, population:int);
with_max = FOREACH records GENERATE state, population, MAX(population);
This didn't work. I had better luck adding an extra column with the same value to each row and then grouping them on that column. Then getting the max on that new group. This seems like a convoluted way of getting what I want so I thought I'd ask if anyone knows a simpler way.
Thanks in advance for the help.
As you said you need to group all the data together but no extra column is required if you use GROUP ALL.
Pig
records = LOAD 'states.txt' AS (state:chararray, population:int);
records_group = GROUP records ALL;
with_max = FOREACH records_group
    GENERATE
        FLATTEN(records.(state, population)), MAX(records.population);
Input
CA 10
VA 5
WI 2
Output
(CA,10,10)
(VA,5,10)
(WI,2,10)
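
For readers less familiar with Pig, the same idea can be mirrored in plain Python: compute the aggregate once over all records, then attach it to every original row. This is only an illustration of what GROUP ALL plus FLATTEN achieves, using the sample input above; the variable names are made up, and the min column is added on the assumption that MIN would be used in the same way as MAX.

```python
records = [("CA", 10), ("VA", 5), ("WI", 2)]

# Equivalent of GROUP ... ALL: one aggregate over every record.
max_pop = max(population for _, population in records)
min_pop = min(population for _, population in records)

# Equivalent of FLATTEN: emit each original row with the aggregate attached.
with_max = [(state, population, max_pop) for state, population in records]
with_max_min = [(state, population, max_pop, min_pop) for state, population in records]

for row in with_max:
    print(row)
```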

Resources