Sorting massive data by a parameter in Google Sheets - sorting

I have a massive table with data that looks like this. Let's call it "Initial data":
place | phone number | prize ($)| promotion | client status | personal manager
Every column has its own data, and there can be duplicates.
What's the goal?
To make a new sheet (call it 'Sorted data') with these columns:
sort Parameter | phone number | number of prize places | client status | personal manager | Average prize | average place
We have these sort parameters:
by number of prize places in all the promotions
by status
by personal manager
by average place
by average prize
So when we choose a sort parameter, the data in the other columns is sorted by that parameter.
Any ideas on how this can be done?

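If Sheet1 holds the data (headers in row 1, data from row 2), then in Sheet2, with the name of the column to sort by in B1 and "ascending" or "descending" in B2, use: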
=ARRAYFORMULA({Sheet1!A1:F1; SORT(Sheet1!A2:F, MATCH(B1,
{"place","phone number","prize","promotion","client status","personal manager"}, 0),
IF(B2="ascending", 1, 0))})
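The formula keeps the header row on top and sorts everything below it by the column whose name is typed in B1, in the direction given by B2. As a rough cross-check of that logic outside Sheets, here is a Python equivalent; `sort_by_parameter` and the column labels are illustrative assumptions, not part of the original answer.

```python
# Column names as they appear in the question's header row (illustrative
# labels for B1; the formula's MATCH array plays the same role).
COLUMNS = ["place", "phone number", "prize", "promotion",
           "client status", "personal manager"]


def sort_by_parameter(header, rows, parameter, direction):
    """Rough equivalent of {header; SORT(rows, MATCH(parameter, ...), asc)}."""
    # MATCH(..., 0) is an exact match; list.index raises ValueError when the
    # name is missing, roughly where the spreadsheet would show #N/A.
    col = COLUMNS.index(parameter)
    # IF(B2="ascending", 1, 0): anything other than "ascending" sorts descending.
    ascending = direction == "ascending"
    body = sorted(rows, key=lambda row: row[col], reverse=not ascending)
    return [header] + body
```

As in the formula, the header row is never sorted; only the data rows below it move.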


Cognos Analytics, multiple columns in crosstab but only one row in measure

I have a crosstab with multiple columns on the row edge, and it produces multiple measure rows where I would like only one.
The crosstab looks like this:
                     | Amount |
SITE   | PERSON      |        |
-------+-------------+--------|
SITE1  | James       |     45 |
SITE2  | John        |     34 |
SITE2  | Jones       |     34 |
SITE3  | Jane        |     54 |
-------+-------------+--------|
TOTAL                |    167 |
The first column is the site and the second is the people at that site (note that SITE2 has two people). The structure is simplified, but you get the point.
What I would like to have instead is this structure:
                     | Amount |
SITE   | PERSON      |        |
-------+-------------+--------|
SITE1  | James       |     45 |
SITE2  | John        |     34 |
SITE2  | Jones       |        |
SITE3  | Jane        |     54 |
-------+-------------+--------|
TOTAL                |    133 |
So the measure rows are generated only from the site column, not from site and person columns. This way I can calculate the total amount across sites, not across persons. Currently the duplicate row(s) cause the total value to be higher than it actually is.
Is there a way to achieve this with a crosstab, or do I need another approach for this use case (for example, a second list to show sites and persons)?
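The inflated total can be reproduced with the numbers from the example: the amount belongs to the site, but it is repeated on every person row. A minimal Python sketch (plain lists, nothing Cognos-specific) compares summing every row with summing one amount per site:

```python
# Rows as the crosstab sees them: one row per (site, person), with the
# site-level amount repeated on every person row of that site.
rows = [
    ("SITE1", "James", 45),
    ("SITE2", "John", 34),
    ("SITE2", "Jones", 34),
    ("SITE3", "Jane", 54),
]

# Summing every row counts SITE2's amount once per person: 45+34+34+54.
naive_total = sum(amount for _site, _person, amount in rows)

# Collapsing to one amount per site first (the grain of the amount)
# gives the intended total: 45+34+54.
amount_per_site = {site: amount for site, _person, amount in rows}
site_total = sum(amount_per_site.values())

print(naive_total, site_total)  # 167 133
```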
<--------------------EDIT-------------------->
I explained the amount column incorrectly in my example. I have a table of sales events, and the amount measure should actually be the number of sales events per site. So I'm trying to answer this question: for a given type of sales event, list the sites where these sales occurred, the persons working at each site, and the total number of sales events at that site. In other words, I fetch all the sales events with a filter (type = something). Each sales event has a site where it occurred, and each site has zero to n employees. So there is an inner join between sales event and site, and an outer join between site and person. The SQL query returns data like this:
sales_event_1|site1|James|type1|subtype2
sales_event_2|site2|John|type1|subtype1
sales_event_2|site2|Jones|type1|subtype1
sales_event_3|site2|John|type1|subtype2
sales_event_3|site2|Jones|type1|subtype2
sales_event_4|site3|Jane|type1|subtype1
...
So the crosstab structure is the following:
Rows= site|person
Columns= subtype
measure= count (distinct [sales_event_id] for [site])
And crosstab looks something like this:
                     | subtype1 | subtype2 | total |
SITE   | PERSON      |          |          |       |
-------+-------------+----------+----------+-------|
SITE1  | James       |       35 |       10 |    45 |
SITE2  | John        |       20 |       14 |    34 |
SITE2  | Jones       |       20 |       14 |    34 |
SITE3  | Jane        |       54 |        0 |    54 |
-------+-------------+----------+----------+-------|
TOTAL                |          |          |   133 |
I hope this clarifies things.
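The row fan-out caused by the outer join to the person table can be seen in the six sample rows above (the trailing `...` rows are left out). This Python sketch counts raw rows and distinct events per site, and then shows what happens when each site's count is repeated on every (site, person) row and those rows are summed:

```python
from collections import defaultdict

# The six sample rows from the query: (event, site, person, type, subtype).
# Each event appears once per person working at its site.
rows = [
    ("sales_event_1", "site1", "James", "type1", "subtype2"),
    ("sales_event_2", "site2", "John", "type1", "subtype1"),
    ("sales_event_2", "site2", "Jones", "type1", "subtype1"),
    ("sales_event_3", "site2", "John", "type1", "subtype2"),
    ("sales_event_3", "site2", "Jones", "type1", "subtype2"),
    ("sales_event_4", "site3", "Jane", "type1", "subtype1"),
]

raw_rows = defaultdict(int)         # what a plain row count sees
distinct_events = defaultdict(set)  # like count(distinct event) per site

for event, site, _person, _type, _subtype in rows:
    raw_rows[site] += 1
    distinct_events[site].add(event)

events_per_site = {site: len(ev) for site, ev in distinct_events.items()}
site_total = sum(events_per_site.values())

# Repeating each site's count on every (site, person) row of the crosstab
# and summing those rows counts site2 once per person.
site_person_rows = sorted({(site, person) for _e, site, person, _t, _s in rows})
crosstab_total = sum(events_per_site[site] for site, _person in site_person_rows)

print(dict(raw_rows))   # {'site1': 1, 'site2': 4, 'site3': 1}
print(events_per_site)  # {'site1': 1, 'site2': 2, 'site3': 1}
print(site_total, crosstab_total)  # 4 6
```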
Create a new data item:
total([Sales] for [Site])
Use that as the metric for the crosstab.
Next, select the metric and set its Group Span property to [Site].
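Conceptually, the answer computes one value per site and uses Group Span to show it once per site group rather than on every person row. The Python sketch below imitates that output; it assumes the amount is already a site-level value repeated on each person row, it is not Cognos code, and `render` is a hypothetical helper:

```python
from itertools import groupby

# (site, person, site-level amount), already sorted by site as in the crosstab.
rows = [("SITE1", "James", 45), ("SITE2", "John", 34),
        ("SITE2", "Jones", 34), ("SITE3", "Jane", 54)]


def render(rows):
    """Return (site, person, shown_amount) lines plus the grand total."""
    lines, total = [], 0
    for site, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        site_amount = group[0][2]  # one value per site
        total += site_amount
        for i, (_site, person, _amount) in enumerate(group):
            # Like a group span: show the value only on the first row of the site.
            lines.append((site, person, site_amount if i == 0 else None))
    return lines, total


lines, total = render(rows)
for site, person, amount in lines:
    print(f"{site:6} {person:6} {'' if amount is None else amount}")
print("TOTAL", total)  # TOTAL 133
```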
It is good that you understand your data well enough to recognize that you are getting the wrong results. It may help to know that the term for this is double-counting.
In your case, the grain of the amount fact is the site level. I'm assuming that person is an attribute in the same dimension (in the relational sense, not the sense with members, hierarchies, and levels, although that is built on the relational concepts; see Kimball). Your report projects the query below the grain of the fact, so you get double-counting.
You ought to have determinants defined in your model (if you are using a Framework Manager package) or column dependencies (if you are using a data module). These tell the query engine the grain of each fact and which objects in a dimension are at which grain, so that it knows how to aggregate facts in a multi-fact, multi-grain situation and how to handle attempts to project a query below the grain of a fact.
Because this would be defined in your model, it would be available to every report that you create and every report that ordinary users create. That is better than building handling for these situations into every report and hoping that your ordinary users know what to do, which they probably won't.
The fact that you don't have determinants set up suggests that your organization's modeller may have let your team down in other ways too, for example by not handling role-playing dimensions and disambiguating query paths.

Oracle BI EE filter on same dimension

I am new to OBIEE and would like to create an analysis that places two columns side by side, with figures from the same dimension but different data.
To explain better: let's say that in Dim1 we have Invoices and Payments as members. We also have other dimensions such as Date, Invoice Number, and so on. This is the current output:
Date | Dim1 | Invoice Number | Amount
10/01/17 | Invoice | 1234 | -450
10/02/17 | Payment | 1234 | 450
Instead of creating two reports, one for Invoices and one for Payments, I want a single report with the following output:
Invoice Date | Invoice | Payment date | Payment | Invoice Number | Amount inv | Amount paid
10/01/17 | Invoice | 10/02/17 | Payment | 1234 | -450 | 450
Is this kind of output achievable inside OBIEE?
Thanks!
You are not trying to "filter on same dimension" but you are trying to convert rows into columns.
While it is possible to work around this, it is definitely not recommended! You are working with an analytical system, not Excel.
If this is an actual requirement and not simply "I wish to see it this way", then the best approach is to store the data properly.
The second-best approach is to model it in the RPD with different logical table sources.
The last option, and NOT the one to go for right away, is what you are asking for: doing it in the front end.
Apart from that: It's "analyses" that you are working with in OBI. If you have a "report" then you are in BI Publisher which is a completely different tool.
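To make the rows-to-columns point concrete, here is the reshaping the question asks for, written as a plain Python pivot over the two sample rows. It is not OBIEE logic, only an illustration of the transformation, and it assumes each invoice number has at most one Invoice row and one Payment row:

```python
# Rows as OBIEE currently returns them: (date, Dim1 member, invoice number, amount).
rows = [
    ("10/01/17", "Invoice", "1234", -450),
    ("10/02/17", "Payment", "1234", 450),
]


def pivot_by_invoice(rows):
    """Turn one row per Dim1 member into one row per invoice number."""
    result = {}
    for date, member, invoice_no, amount in rows:
        out = result.setdefault(invoice_no, {"Invoice Number": invoice_no})
        if member == "Invoice":
            out.update({"Invoice Date": date, "Amount inv": amount})
        elif member == "Payment":
            out.update({"Payment date": date, "Amount paid": amount})
    return list(result.values())


print(pivot_by_invoice(rows))
```

Every member that should become its own column needs its own branch, which is one reason the answer prefers fixing this in storage or in the RPD.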

Filter after grouping columns in Power BI

I want to accomplish something easy to understand (and maybe easy to do but I can't find a way...).
I have a table that records the date on which a client bought something.
For example:
=============================================
Purchase_id | Purchase_date | Client_id
=============================================
1 | 2016/03/02 | 1
---------------------------------------------
2 | 2016/03/02 | 2
---------------------------------------------
3 | 2016/03/11 | 3
---------------------------------------------
I want to create a single number card showing the average number of purchases per day.
So for this example, the result would be:
Result = 3 purchases / 2 different days = 1.5
I managed it by grouping my query by Purchase_date, with a new column holding the number of rows.
This gives me the following result:
==================================
Purchase_date | Number of rows
==================================
2016/03/02 | 2
----------------------------------
2016/03/11 | 1
----------------------------------
Then I put the field Number of rows in a single number card, selecting "Average".
I should mention that I am using DirectQuery with SQL Server.
But the problem is that I want to have a filter on the Client_id. And once I do the grouping, I lose this column.
Is there a way to have this Client_id as a parameter?
Maybe grouping is not the right approach here at all.
Thank you in advance.
You can create a measure to calculate this average.
From Power BI's docs:
The calculated results of measures are always changing in response to your interaction with your reports, allowing for fast and dynamic ad-hoc data exploration.
This means that filtering on Client_id will change the measure accordingly.
Here is an easy way of defining this measure:
Result = DISTINCTCOUNT(tableName[Purchase_id]) / DISTINCTCOUNT(tableName[Purchase_date])
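The measure's arithmetic can be checked in Python against the question's three sample rows: distinct purchases divided by distinct days, recomputed after an optional Client_id filter, which is roughly what a slicer does to a measure. `purchases_per_day` is a hypothetical name, and returning 0.0 for an empty selection is a choice made for this sketch only:

```python
from typing import Optional

# The three sample purchases: (Purchase_id, Purchase_date, Client_id).
purchases = [
    (1, "2016/03/02", 1),
    (2, "2016/03/02", 2),
    (3, "2016/03/11", 3),
]


def purchases_per_day(rows, client_id: Optional[int] = None) -> float:
    """Distinct purchases / distinct purchase dates, after an optional filter."""
    if client_id is not None:
        rows = [r for r in rows if r[2] == client_id]
    ids = {r[0] for r in rows}
    days = {r[1] for r in rows}
    # Avoid dividing by zero when the filter leaves no rows.
    return len(ids) / len(days) if days else 0.0


print(purchases_per_day(purchases))               # 1.5
print(purchases_per_day(purchases, client_id=1))  # 1.0
```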

Order By any field in Cassandra

I am researching Cassandra as a possible solution for my upcoming project. The more I research, the more I hear that it is a bad idea to sort on fields that were not set up for sorting when the table was created.
Is it possible to sort on any field? If sorting on fields outside the clustering columns has a performance impact, what is that impact? I need to sort roughly 2 million records in the table.
I keep hearing that it is a bad idea to sort on fields that were not set up for sorting when the table was created.
It's not so much that it's a bad idea. It's just really not possible to make Cassandra sort your data by an arbitrary column. Cassandra requires a query-based modeling approach, and that goes for sort order as well. You have to decide ahead of time the kinds of queries you want Cassandra to support, and the order in which those queries return their data.
Is it possible to sort on any field?
Here's the thing with how Cassandra sorts result sets: it doesn't. Cassandra queries correspond to partition locations, and the data is read off the disk and returned to you. If the data is read in the same order that it was sorted in on disk, the result set will be sorted. On the other hand, if you run a multi-key query or an index-based query that has to jump around to different partitions, chances are that the results will not be returned in any meaningful order.
But if you plan ahead, you can influence the on-disk sort order of your data and then leverage that order in your queries. This is done with a modeling mechanism called a "clustering column." Cassandra allows you to specify multiple clustering columns, but the order they define applies only within a single partition.
So what does that mean? Take this example from the DataStax documentation.
CREATE TABLE playlists (
id uuid,
artist text,
album text,
title text,
song_order int,
song_id uuid,
PRIMARY KEY ((id),song_order))
WITH CLUSTERING ORDER BY (song_order ASC);
With this table definition, I can query a particular playlist by id (the partition key). Within each id, the data will be returned ordered by song_order:
SELECT id, song_order, album, artist, title
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY song_order DESC;
id | song_order | album | artist | title
------------------------------------------------------------------------------------------------------------------
62c36092-82a1-3a00-93d1-46196ee77204 | 4 | No One Rides For Free | Fu Manchu | Ojo Rojo
62c36092-82a1-3a00-93d1-46196ee77204 | 3 | Roll Away | Back Door Slam | Outside Woman Blues
62c36092-82a1-3a00-93d1-46196ee77204 | 2 | We Must Obey | Fu Manchu | Moving in Stereo
62c36092-82a1-3a00-93d1-46196ee77204 | 1 | Tres Hombres | ZZ Top | La Grange
In this example, I only need to specify ORDER BY if I want to switch the sort direction. As the rows are stored in ASCending order, I need to specify DESC to see them in DESCending order. If I were fine with getting the rows back in ASCending order, I wouldn't need to specify ORDER BY at all.
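The behaviour described here (rows kept in clustering order within a partition, read back in that order, and simply read in reverse for DESC) can be modelled with a toy in-memory structure. This Python sketch only illustrates the idea; it ignores how Cassandra actually stores data, and `insert`/`select` are hypothetical helpers:

```python
import bisect

# Toy model only: partition key -> rows kept sorted by the clustering column.
partitions = {}


def insert(partition_key, song_order, title):
    """Write a row; bisect.insort keeps the partition in ascending order."""
    bisect.insort(partitions.setdefault(partition_key, []), (song_order, title))


def select(partition_key, descending=False):
    """Read one partition sequentially; DESC just reads it back to front."""
    rows = partitions.get(partition_key, [])
    return rows[::-1] if descending else list(rows)


pid = "62c36092-82a1-3a00-93d1-46196ee77204"
for order, title in [(3, "Outside Woman Blues"), (1, "La Grange"),
                     (4, "Ojo Rojo"), (2, "Moving in Stereo")]:
    insert(pid, order, title)

print([order for order, _ in select(pid)])                   # [1, 2, 3, 4]
print([order for order, _ in select(pid, descending=True)])  # [4, 3, 2, 1]
```

No sort happens at read time: the order is established on write, which is the point the answer makes about planning the model ahead.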
But what if I want to order by artist? Or album? Or both? Since one artist can have many albums (for this example), we'll modify the PRIMARY KEY definition like this:
PRIMARY KEY ((id),artist,album,song_order)
Running the same query above (minus the ORDER BY) produces this output:
SELECT id, song_order, album, artist, title
FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;
id | song_order | album | artist | title
------------------------------------------------------------------------------------------------------------------
62c36092-82a1-3a00-93d1-46196ee77204 | 3 | Roll Away | Back Door Slam | Outside Woman Blues
62c36092-82a1-3a00-93d1-46196ee77204 | 4 | No One Rides For Free | Fu Manchu | Ojo Rojo
62c36092-82a1-3a00-93d1-46196ee77204 | 2 | We Must Obey | Fu Manchu | Moving in Stereo
62c36092-82a1-3a00-93d1-46196ee77204 | 1 | Tres Hombres | ZZ Top | La Grange
Notice that the rows are now ordered by artist, and then album. If we had two songs from the same album, then song_order would be next.
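The resulting order is a lexicographic comparison of the clustering columns in declaration order. A Python sketch with the four rows from the example reproduces the output above; text is compared as plain Python strings, which should agree with Cassandra's ordering for these ASCII values:

```python
# The four rows of the example playlist: (artist, album, song_order, title).
songs = [
    ("Fu Manchu", "No One Rides For Free", 4, "Ojo Rojo"),
    ("Back Door Slam", "Roll Away", 3, "Outside Woman Blues"),
    ("Fu Manchu", "We Must Obey", 2, "Moving in Stereo"),
    ("ZZ Top", "Tres Hombres", 1, "La Grange"),
]

# PRIMARY KEY ((id), artist, album, song_order): within the partition, rows
# compare by artist first, then album, then song_order.
clustered = sorted(songs, key=lambda s: (s[0], s[1], s[2]))

for artist, album, song_order, title in clustered:
    print(song_order, album, artist, title)
```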
So now you might ask "what if I just want to sort by album, and not artist?" You can sort just by album, but not with this table. You cannot skip clustering keys in your ORDER BY clause. In order to sort only by album (and not artist) you'll need to design a different query table. Sometimes Cassandra data modeling will have you duplicating your data a few times, to be able to serve different queries...and that's ok.
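The "different query table" idea amounts to writing the same rows more than once, each copy clustered for a different query. In this Python sketch the table names `playlists_by_artist` and `playlists_by_album` are hypothetical; the second stands in for a table with a key such as PRIMARY KEY ((id), album, song_order):

```python
# Same rows as the playlist example: (artist, album, song_order, title).
songs = [
    ("Fu Manchu", "No One Rides For Free", 4, "Ojo Rojo"),
    ("Back Door Slam", "Roll Away", 3, "Outside Woman Blues"),
    ("Fu Manchu", "We Must Obey", 2, "Moving in Stereo"),
    ("ZZ Top", "Tres Hombres", 1, "La Grange"),
]

# Hypothetical query tables and the clustering key each one orders by.
CLUSTERING = {
    "playlists_by_artist": lambda s: (s[0], s[1], s[2]),  # artist, album, song_order
    "playlists_by_album": lambda s: (s[1], s[2]),         # album, song_order
}
tables = {name: [] for name in CLUSTERING}


def write(song):
    """Denormalized write: the same row goes into every query table."""
    for name, rows in tables.items():
        rows.append(song)
        rows.sort(key=CLUSTERING[name])


for song in songs:
    write(song)

print([s[1] for s in tables["playlists_by_album"]])
```

The cost is extra writes and storage; the benefit is that each query reads a partition that is already in the order it needs.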
For more detail on how to build data models while leveraging clustering order, check out these two articles on PlanetCassandra:
Getting Started With Time Series Data Modeling - Patrick McFadin
We Shall Have Order! (Disclaimer: I am the author.)

Simplifying a Cascading pipeline used for aggregating sales data

I'm very new to Cascading and Hadoop both, so be gentle... :-D
I think I'm way over-engineering something. Basically, I have a pipe-delimited file with 9 fields, and I want to compute some aggregated statistics over those fields using different groupings. The result should have 10 fields, of which only 6 are counts or sums. So far I'm up to 4 Unique pipes, 4 CountBy pipes, 1 SumBy, 1 GroupBy, 1 Every, 2 Each, 5 CoGroups, and a couple of others. I need to add another small piece of functionality, and the only way I can see to do it is to add 2 Filters, 2 more CoGroups, and 2 more Each pipes. This all seems like overkill just to compute a few aggregated statistics, so I think I'm really misunderstanding something.
My input file looks like this:
storeID | invoiceID | groupID | customerID | transaction date | quantity | price | item type | customer type
Item type is "I", "S", or "G" for inventory, service, or group items; customers belong to groups. The rest should be self-explanatory.
The result I want is:
project ID | storeID | year | month | unique invoices | unique groups | unique customers | customer visits | inventory type sales | service type sales |
Project ID is a constant; customer visits is the number of days during the month on which the customer came in and bought something.
The setup that I'm using right now uses a TextDelimited Tap as my source to read the file and passes the records to an Each pipe which uses a DateParser to parse the transaction date and adds in year, month and day fields. So far so good. This is where it gets out of control.
I'm splitting the stream from there up into 5 separate streams to process each of the aggregated fields that I want. Then I'm joining all the results together in 5 CoGroup pipes, sending the result through Insert (to insert the project ID) and writing through a TextDelimited sink Tap.
Is there an easier way than splitting into 5 streams like that? The first four streams do almost exactly the same thing, just on different fields. For example, the first stream uses a Unique pipe to get unique invoiceIDs, then uses a CountBy to count the records with the same storeID, year, and month. That gives me the number of unique invoices created for each store by year and month. Then there is a stream that does the same thing with groupID and another that does it with customerID.
Any ideas for simplifying this? There must be an easier way.
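For comparison, here is the whole computation as a single pass grouped once by (storeID, year, month), written in plain Python rather than Cascading. It only sketches the idea that one grouping with several accumulators can replace separate streams joined by CoGroups; it is not Cascading API code. Assumptions, also flagged in the comments: the date format `%Y-%m-%d`, sales measured as quantity * price, a visit counted as a distinct (customer, day) pair, and the placeholder project ID `P1`:

```python
from collections import defaultdict
from datetime import datetime

PROJECT_ID = "P1"  # placeholder for the constant project ID


def aggregate(lines):
    """One pass, one grouping key: (storeID, year, month)."""
    acc = defaultdict(lambda: {"invoices": set(), "groups": set(),
                               "customers": set(), "visits": set(),
                               "inventory": 0.0, "service": 0.0})
    for line in lines:
        (store, invoice, group, customer, date_str,
         qty, price, item_type, _customer_type) = line.strip().split("|")
        date = datetime.strptime(date_str, "%Y-%m-%d")  # assumed date format
        a = acc[(store, date.year, date.month)]
        # Sets give the "unique" counts without separate Unique/CountBy pipes.
        a["invoices"].add(invoice)
        a["groups"].add(group)
        a["customers"].add(customer)
        # Assumed definition: a visit is a distinct (customer, day) pair.
        a["visits"].add((customer, date.day))
        amount = float(qty) * float(price)  # assumed: sales = quantity * price
        if item_type == "I":
            a["inventory"] += amount
        elif item_type == "S":
            a["service"] += amount
    return [
        (PROJECT_ID, store, year, month, len(a["invoices"]), len(a["groups"]),
         len(a["customers"]), len(a["visits"]), a["inventory"], a["service"])
        for (store, year, month), a in sorted(acc.items())
    ]
```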
