Cognos Analytics, multiple columns in crosstab but only one row in measure - business-intelligence

I have a problem where, in a crosstab with multiple columns, the measure is repeated on multiple rows where I would like to have only one.
The crosstab looks like this:
                      |     Amount     |
SITE      | PERSON    |----------------|
----------|-----------|----------------|
SITE1     | James     |       45       |
SITE2     | John      |       34       |
SITE2     | Jones     |       34       |
SITE3     | Jane      |       54       |
----------|-----------|----------------|
TOTAL                 |      167       |
So the first column is the site, the second the people on the site (notice that SITE2 has two people). The structure is simplified, but you get the point.
What I would like to have is the following structure:
                      |     Amount     |
SITE      | PERSON    |----------------|
----------|-----------|----------------|
SITE1     | James     |       45       |
SITE2     | John      |       34       |
SITE2     | Jones     |                |
SITE3     | Jane      |       54       |
----------|-----------|----------------|
TOTAL                 |      133       |
So the measure rows are generated only from the site column, not from site and person columns. This way I can calculate the total amount across sites, not across persons. Currently the duplicate row(s) cause the total value to be higher than it actually is.
Is there a way to achieve this using a crosstab, or do I need to consider some other approach (e.g. a second list to show sites and persons) for this use case?
<--------------------EDIT-------------------->
I have mistakenly explained the amount column in my example. I have a table containing sales events, and the amount measure should actually be the number of sales events per site. So what I'm trying to achieve is to answer a question: for a given type of sales event, list the sites where these sales occurred, the persons working on each of those sites, and the total number of sales events on said site. Basically, I'm fetching all the sales events with some filter (type=something). These sales events have a site where they occurred, and that site has zero to n employees. So there's an inner join between the sales event and site tables, and an outer join between the site and person tables. The SQL query returns data like this:
sales_event_1|site1|James|type1|subtype2
sales_event_2|site2|John|type1|subtype1
sales_event_2|site2|Jones|type1|subtype1
sales_event_3|site2|John|type1|subtype2
sales_event_3|site2|Jones|type1|subtype2
sales_event_4|site3|Jane|type1|subtype1
...
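For illustration, a minimal SQL sketch of the query described above; all table and column names here are assumptions, not the actual model:
-- Sketch of the described query: every sales event is repeated once per person
-- on its site, which is where the duplicate crosstab rows come from.
SELECT se.sales_event_id, s.site_name, p.person_name, se.type, se.subtype
FROM sales_event se
INNER JOIN site s ON s.site_id = se.site_id
LEFT OUTER JOIN person p ON p.site_id = s.site_id
WHERE se.type = 'type1';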
So the crosstab structure is the following:
Rows = site | person
Columns = subtype
Measure = count(distinct [sales_event_id] for [site])
And crosstab looks something like this:
|-----subtype1----|-----subtype2----|-----total----|
SITE-----|---PERSON---|-----------------|-----------------|--------------|
----------------------|-----------------|-----------------|--------------|
SITE1 | James | 35 | 10 | 45 |
SITE2 | John | 20 | 14 | 34 |
SITE2 | Jones | 20 | 14 | 34 |
SITE3 | Jane | 54 | 0 | 54 |
--------------------------|-------------|-----------------|--------------|
TOTAL-----------------|-----------------|-----------------| 133 |
I hope this helps you guys.

Create a new data item
total([Sales] for [Site])
Use that as the metric for the crosstab
Next, click on the metric, and set the property Group Span to be [Site]
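For what it's worth, in SQL terms this data item roughly corresponds to a site-level aggregate attached to every site/person row and then displayed once per site group; a sketch with assumed table and column names:
-- One total per site, joined back to the site/person rows so the crosstab can
-- show it once per site (which is what the Group Span setting controls).
SELECT sp.site_id,
       sp.person_name,
       st.site_total
FROM site_person sp
JOIN (SELECT site_id, SUM(amount) AS site_total
      FROM site_sales
      GROUP BY site_id) st ON st.site_id = sp.site_id;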

It is good that you understand your data well enough to recognize that you are getting the wrong results. It would help you to know that the term for this is double-counting.
In your case, the grain of the amount fact is at the site level. I'm assuming that person is an attribute in the same dimension (the relational thing, not the thing with members, hierarchies, and levels, although that is built on concepts from the relational thing; read Kimball). Your report is trying to project the query below the grain of the fact, so you get double counting.
You ought to have determinants defined in your model (if you are using a Framework Manager package) or column dependency (if you are using a data module). These are things set up to tell the query engine the fact grains and what objects in a dimension are at which grain, to tell the query engine how to aggregate facts in a multi-fact multi-grain situation, and how to deal with attempts to project a query below the grain of a fact.
Because it would be defined in your model, it would be available to every report that you create and every report that ordinary users create, which would be better than trying to create handling for these sorts of situations in every report you create and hoping that your ordinary users know what to do, which they probably won't.
The fact that you don't have determinants set up suggests that your organization's modeller might have let your team down in other ways, for example by not handling role-playing dimensions or disambiguating query paths.
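To make the double-counting concrete, here is a minimal SQL sketch of the two situations (all names are assumptions):
-- Below the grain: the person join repeats every sales event once per person
-- on the site, so a plain count (or sum) per site is inflated.
SELECT s.site_id, COUNT(se.sales_event_id) AS events
FROM sales_event se
INNER JOIN site s ON s.site_id = se.site_id
LEFT JOIN person p ON p.site_id = s.site_id
GROUP BY s.site_id;

-- At the grain: count per site before (or without) the person join, or use a
-- distinct count, and the per-site and grand totals come out right.
SELECT site_id, COUNT(DISTINCT sales_event_id) AS events
FROM sales_event
GROUP BY site_id;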

Related

Filter after grouping columns in Power BI

I want to accomplish something easy to understand (and maybe easy to do but I can't find a way...).
I have a table which represents the date when a client has bought something.
Let's have this example:
Purchase_id | Purchase_date | Client_id
------------|---------------|----------
1           | 2016/03/02    | 1
2           | 2016/03/02    | 2
3           | 2016/03/11    | 3
I want to create a single number card which will show the average number of purchases per day.
So for this example, the result would be:
Result = 3 purchases / 2 different days = 1.5
I managed to do it by grouping my query by Purchase_date, with my new column being the number of rows.
It gives me the following query:
Purchase_date | Number of rows
--------------|---------------
2016/03/02    | 2
2016/03/11    | 1
Then I put the field Number of rows in a single number card, selecting "Average".
I should specify that I am using DirectQuery with SQL Server.
But the problem is that I want to have a filter on the Client_id. And once I do the grouping, I lose this column.
Is there a way to have this Client_id as a parameter?
Maybe even the fact of grouping is not the right solution here.
Thank you in advance.
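For reference, the grouping step described above would look roughly like this in SQL (the table name is an assumption), and it indeed drops Client_id:
-- Grouping by date leaves only the date and the row count, so Client_id is no
-- longer available as a filter.
SELECT Purchase_date, COUNT(*) AS NumberOfRows
FROM Purchases
GROUP BY Purchase_date;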
You can create a measure to calculate this average.
From Power BI's docs:
The calculated results of measures are always changing in response to your interaction with your reports, allowing for fast and dynamic ad-hoc data exploration.
This means filtering client_id's will change the measure accordingly.
Here is an easy way of defining this measure:
Result = DISTINCTCOUNT(tableName[Purchase_id])/DISTINCTCOUNT(tableName[Purchase_date])
(Distinct purchases divided by distinct days, which gives 1.5 for the example above.)

How to handle scenario of same city with multiple names

OK, I have a list with some contacts on it, filled in by the respective persons. But persons living in the same city might write different names for the city (which I don't have any control over, since the names of cities may change with a changing government).
For example:
NAME CITY
John Banaglore
Amit Bengaluru
Here both Bangalore and Bengaluru refer to the same city. How can I make sure (maybe programmatically) that my system does not consider them two different cities but one while traversing the list?
One solution I could think of is to have a notion of unique IDs attached to each city, but that requires recreating the list, and I would also have to train my contacts in the unique-ID notion.
Any thoughts are appreciated.
Please feel free to route this post to any other stackexchange.com site if you think it does not belong here or update the tags.
I would recommend creating a table alias_table which maps city aliases to a single common name:
+------------+-----------+
| city_alias | city_name |
+------------+-----------+
| Banaglore | Bangalore |
| Bengaluru | Bangalore |
| Bangalore | Bangalore |
| Mumbai | Bombay |
| Bombay | Bombay |
+------------+-----------+
When you want to do any manipulation of the table in your OP, you can join the CITY column to the city_alias column above as follows:
SELECT *
FROM name_table nt INNER JOIN alias_table at
ON nt.CITY = at.city_alias
I think the best way is to provide a selection from a list of existing cities and not allow the user to enter it manually.
But if you already have data, it is more reliable to use the alias table proposed by @Tim Biegeleisen.
In addition, some automation could be added, for example to check whether it is safe to ignore a small difference between two names when the difference is not in the first letter, by putting the pair into the alias table marked as a candidate for future review.
Here is an example of why the first letter is excluded:
Kiev = Kyiv
Lviv != Kiev
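As a sketch of that kind of automation (table and column names are assumptions), SOUNDEX can be used as a crude phonetic heuristic to flag candidate aliases for review; it should group Kiev/Kyiv together while keeping Lviv separate, but it also produces false positives, so the output is only a candidate list:
-- Flag pairs of distinct city spellings that sound alike as alias candidates.
SELECT DISTINCT a.CITY AS candidate_alias, b.CITY AS possible_match
FROM contacts a
JOIN contacts b
  ON SOUNDEX(a.CITY) = SOUNDEX(b.CITY)
 AND a.CITY < b.CITY;   -- avoids self-matches and mirrored duplicates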

How to optimize massive amounts of data in mysql tables using a join

Perhaps using a JOIN isn't the best option here, but here's the scenario:
I have two tables, one is for houses, the other for objects in that house.
I have 50 houses and 8000 objects.
Lastly, each object will be either black or white (boolean).
Each object must be associated with each house and each object must be either black or white, which means, through my current design, there are going to be 400,000 records (8,000 ones, 8,000 twos all the way up to 50) in the objects table! Not the best for optimization. And my site turned into geriatric snails smoking ganja when I tried to load the query on my webpage. It died.
The table I have for houses looks like this:
House | Other cols | Other cols
------|------------|-----------
1     |            |
2     |            |
3     |            |
4     |            |
... up to 50
The table I have for objects looks like this:
House_ID | Object | Color
---------|--------|------
1        | 1      | 1
1        | 2      | 1
1        | 3      | 0
1        | 4      | 1
1        | 5      | 0
"House_ID" increments to 2 once "Object" reaches 8,000. This incrementing continues until House_ID reaches 50.
There must be a better way to create an association between the house and the objects where each object must have that specific house ID and it is not quite so taxing on the server.
BTW, I'm using an INNER JOIN to combine both tables. I think this might be wrong, but don't know a way around it. Doing SQL queries in phpMyAdmin.
How would I join or set up my table/queries so that it's not so cumbersome?
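For context, a sketch of the join being described, using the column names from the tables above (table names are assumptions):
-- Without an index on objects.House_ID, MySQL scans all 400,000 object rows
-- for every matching house row.
SELECT h.House, o.Object, o.Color
FROM houses h
INNER JOIN objects o ON o.House_ID = h.House
WHERE h.House = 1;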
You probably need to investigate indexing your tables. This is actually a fairly small data set for what you are doing.
If your table names are houses and objects, try:
CREATE INDEX houses_index ON houses (House)
and
CREATE INDEX house_objects_index ON objects (HouseID,Object)
This will make your queries run MUCH faster, if, as I presume, indexes do not already exist.
(You might also want to keep your column names consistent between tables; calling the field House in one table and HouseID in another is, I think, more confusing than calling it HouseID in both places.)
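To check that the indexes are actually being used, EXPLAIN can be run on the join; a sketch (column names follow the index definitions above):
-- In the output, "key" should show house_objects_index rather than NULL, and
-- "rows" should drop from ~400,000 to roughly the number of objects per house.
EXPLAIN
SELECT h.House, o.Object, o.Color
FROM houses h
INNER JOIN objects o ON o.HouseID = h.House;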

Using HBase for analytics

I'm almost completely new to HBase. I would like to take my current site tracking based on MySQL and put it to HBase because MySQL simply doesn't scale anymore.
I'm totally lost in the first step...
I need to track different actions of users and need to be able to aggregate them by some aspects (date, the country they come from, the product they performed the action with, etc.).
The way I store it currently is that I have a table with a composite PK with all these aspects (country, date, product, ...) and the rest of the fields are counters for actions. When an action is performed, I insert it to the table incrementing the action's column by one (ON DUPLICATE KEY UPDATE...).
*date      | *country | *product | visited | liked | put_to_basket | purchased
2011-11-11 | US       | 123      | 2       | 1     | 0             | 0
2011-11-11 | GB       | 123      | 23      | 10    | 5             | 4
2011-11-12 | GB       | 555      | 54      | 0     | 10            | 2
I have a feeling that this is completely against the HBase way; it also doesn't really scale (with a growing number of keys, inserts get expensive) and isn't really flexible.
How do I track user actions with their attributes effectively in HBase? What should the table(s) look like? Where does MapReduce come into the picture?
Thanks for all suggestions!
Lars George's "HBase: The Definitive Guide" explains, in the introduction chapter, a design very similar to what you want to achieve.
This can be done as follows.
Have a unique row id in HBase as follows:
rowid = date + country + product ---> concatenate these into a single value and use it as the row key.
Then have the counters as columns. So when you get an event, increment the matching counter, e.g.:
if (event == "liked") {
    // increment the "liked" counter column by 1 for the corresponding key combination
    // (in the Java client this maps to Table.increment() or incrementColumnValue())
}
and so on for the other cases.
Hope this helps!!

SSRS Report Based on Query from SSAS Runs much slower than cube browse in SSAS

I have a report in SQL Server Reporting Services (SSRS) that pulls data from a SQL Server Analysis Services (SSAS) cube. The cube has two important dimensions, Time and Activity, which are related (it's a report on activity over time). The activity dimension has a single unique key, and attributes to indicate who performed the activity. Measures are simple counts and percentages of types of activity and their results.
The report looks something like this:
Report for user: xyx
Report Period: 1/1/2011 - 3/1/2011
Type of Activity | Submitted | Completed | Success Rate
Type 1 | 50 | 20 | 40%
--------------------------------------------------------
Type 2 | 50 | 20 | 40%
--------------------------------------------------------
Type 3 | 50 | 20 | 40%
--------------------------------------------------------
Type 4 | 50 | 20 | 40%
--------------------------------------------------------
Type 5 | 50 | 20 | 40%
--------------------------------------------------------
Total | 250 | 100 | 40%
If I browse the cube in SQL Server Management Studio, I get the results in a fraction of a second. In SSRS it takes upwards of 7 minutes to generate. The Execution Log for SSRS shows the time pretty evenly split between retrieval/processing/rendering:
> TimeDataRetrieval   TimeProcessing   TimeRendering
>            170866           142324          154689
I suspect it has to do with how the report is filtered, but I don't know how investigate that.
What should I look at next to figure out why SSRS seems to take so long when doing the browse in SSAS is really fast (and the actual reports aren't much bigger than my sample, 3 more rows and a few more columns)?
Have you compared the queries being generated by SSMS and SSRS to see if they are the same? That would be my next step. SSRS has been known to generate terribly inefficient queries on occasion...when a dataset is built via the drag-n-drop designer.
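It can also help to confirm where the time is going straight from the report server catalog before digging into the queries; a sketch against the standard ExecutionLog3 view (the ReportServer database name may differ on your instance):
-- Times are in milliseconds.
SELECT TOP (20)
       ItemPath,
       TimeStart,
       TimeDataRetrieval,   -- running the dataset (MDX) queries
       TimeProcessing,      -- grouping/aggregation in the report engine
       TimeRendering        -- producing the output format
FROM ReportServer.dbo.ExecutionLog3
ORDER BY TimeStart DESC;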
