Is there a way to rank multiple columns in Power Query?

I am setting up a query where I need to rank multiple columns. I was able to sort the first column in descending order and insert an index column. However, I am not able to rank the other columns.
I have included an example below:
Table to show agent performance
Agent | surveys | rank | outcalls | total calls | outcalls/total calls | rank
Dallas | 80% | 1 | 50 | 80 | 62.5% | ?
May | 75% | 2 | 90 | 100 | 90.0% | ?
Summer | 60% | 3 | 60 | 75 | 80.0% | ?
So basically from the example above, I was able to add an index column that ranked the surveys. How can I rank the outcalls/total calls column while still maintaining the rank in the other columns?

In this case, a simple approach would be to sort on outcalls/total calls, add another index column, and then sort on the first rank column if you want to revert to your starting order.
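The same sort/index/re-sort idea can be sketched outside Power Query. Here is a minimal Python illustration (the data and column names come from the example table above; this is only to show the logic, not actual M code):

```python
# Sketch of the ranking logic: sort by a second measure, assign ranks,
# then restore the original order via the first rank column.
rows = [
    {"agent": "Dallas", "surveys": 0.80, "rank": 1, "ratio": 0.625},
    {"agent": "May",    "surveys": 0.75, "rank": 2, "ratio": 0.900},
    {"agent": "Summer", "surveys": 0.60, "rank": 3, "ratio": 0.800},
]

# Step 1: sort descending by outcalls/total calls and add a second rank
# (this is what the extra index column does in Power Query).
rows.sort(key=lambda r: r["ratio"], reverse=True)
for i, r in enumerate(rows, start=1):
    r["rank2"] = i

# Step 2: re-sort by the first rank column to restore the original order.
rows.sort(key=lambda r: r["rank"])
```

After step 2 each row carries both ranks, and the table is back in its original order.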

Related

How to arrange multi partitions in hive?

Say I have an order table, which contains multiple time columns (spend_time, expire_time, withdraw_time).
I usually query the table by each of these columns independently, so how should I create the partitions?
order_no | spend_time | expire_time | withdraw_time | spend_amount
A001 | 2017/5/1 | 2017/6/1 | 2017/6/2 | 100
A002 | 2017/4/1 | 2017/4/19 | 2017/4/25 | 500
A003 | 2017/3/1 | 2017/3/19 | 2017/3/25 | 1000
Usually the business situation is to calculate total spend_amount between certain spend_time or expire_time or withdraw_time, or the combination of the 3.
But the cross combination of the 3 time dimensions (each with about 1000 partitions) can produce a huge number of partitions (1000*1000*1000). Is that OK and efficient?
My solution is to create 3 tables, each partitioned by a different time column. Is this an efficient way to solve this problem?

Get 1 value of each date SSRS

Using SSRS, I have data with duplicate values in Field1. I need to get only one value for each date.
Field1 | Date |
----------------------------------
30 | 01.01.1990 |
30 | 01.01.1990 |
30 | 01.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
50 | 02.01.1990 |
40 | 03.01.1990 |
40 | 03.01.1990 |
40 | 03.01.1990 |
It should be an SSRS expression giving the average value for each date, or maybe there are other solutions to get the requested data via an SSRS expression. The requested data as a table:
30 | 01.01.1990 |
50 | 02.01.1990 |
40 | 03.01.1990 |
Hope for help.
There is no SumDistinct function in SSRS, which is a real gap (although CountDistinct does exist), so you can't achieve what you want the easy way. You have two options:
Implement a new stored procedure with SELECT DISTINCT, returning a reduced set of fields to avoid the repeated data. You then use this stored procedure to build a new dataset for your table. This approach obviously may not be applicable in your case.
The other option is to implement your own function, which saves the state of the aggregation and performs a distinct sum. Take a look at this page, it contains examples of the code you need.
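The stateful distinct-sum idea behind the second option can be sketched in plain Python (SSRS custom code would actually be written in VB.NET, but the logic is the same; the class and method names here are invented):

```python
# Sketch of a stateful "distinct sum": remember which keys have been
# seen and add each key's value only once.
class DistinctSum:
    def __init__(self):
        self.seen = set()
        self.total = 0

    def add(self, value, key):
        # 'key' identifies the group whose value should count once,
        # e.g. the Date column from the report data.
        if key not in self.seen:
            self.seen.add(key)
            self.total += value
        return self.total

# The duplicated rows from the question above.
data = ([(30, "01.01.1990")] * 3
        + [(50, "02.01.1990")] * 4
        + [(40, "03.01.1990")] * 3)

agg = DistinctSum()
for value, date in data:
    agg.add(value, date)
```

The aggregator ends up with one contribution per distinct date (30 + 50 + 40), which is exactly what SumDistinct would do if it existed.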

Using a non-literal value in Apache Derby's OFFSET clause

Using Derby, is it possible to offset by a value from the query rather than an integer literal?
When I run this query, it complains about the value I've given to the offset clause.
select
  PRIZE."NAME" as "Prize Name",
  PRIZE."POSITION" as "Position",
  (select
     PARTICIPANT."NAME"
   from PARTICIPANT
   order by POINTS desc
   offset PRIZE."POSITION" rows fetch next 1 row only -- <-- notice I'm trying to pass in a value to offset by
  ) as "Participant"
from PRIZE
With the expectation that the results would look like this:
| Prize Name | Position | Participant |
|--------------|----------|---------------|
| Gold medal | 1 | Mari Loudi |
| Silver medal | 2 | Keesha Vacc |
| Bronze medal | 3 | Melba Hammit |
| Hundredth | 100 | James Thornby |
The documentation suggests that it's possible to pass in a value from Java code, but I'm trying to use a value from the query itself.
By the way, this is just an example schema to illustrate the point.
I know there are other ways to achieve the ranking, but I'm specifically interested if there's a way to pass values to the offset clause.
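To pin down what result the query above is meant to produce, here is a pure-Python illustration (not Derby, and deliberately sidestepping OFFSET): for each prize, pick the participant whose rank by points, descending, equals the prize's position. The data is invented to match the expected table:

```python
# Illustration of the intended per-prize lookup: rank participants by
# points (descending), then index into the ranking by prize position.
participants = [
    ("Keesha Vacc", 960),
    ("Mari Loudi", 980),
    ("Melba Hammit", 950),
]
prizes = [("Gold medal", 1), ("Silver medal", 2), ("Bronze medal", 3)]

# Equivalent of: ORDER BY POINTS DESC in the inner select.
ranked = [name for name, _ in sorted(participants, key=lambda p: -p[1])]

# Equivalent of offsetting by the prize position within the ranking
# (position 1 maps to the first ranked participant).
result = [
    (prize, pos, ranked[pos - 1] if pos <= len(ranked) else None)
    for prize, pos in prizes
]
```

This is only to make the expected output concrete; it does not answer whether Derby's OFFSET clause can accept a non-literal value.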

ORACLE Performance: (Hash) Join vs Subselect With Order By's in Huge Tables

I've got the following (abstract) problem in ORACLE 11g:
There are two tables called STRONG and WEAK.
The primary key of WEAK is a compound consisting of the primary key of STRONG (called pk) plus an additional column (called fk).
Thus we have a generic 1:N relationship.
However, in the real world application, there is always exactly one row within the N WEAK rows related to an entry of STRONG that has a special relationship.
This is why there is an additional 1:1 relationship realised by adding the column fk to STRONG as well.
Furthermore, it might be worth noting that both tables are huge but well indexed.
The overall picture looks like this:
Now, I have to define a view showing rows of STRONG along with some additional columns linked by that 1:1 relationship. I tried two basic approaches:
Subselects
SELECT
(SELECT some_data1 FROM weak WHERE weak.pk = strong.pk AND weak.fk = strong.fk) some_data1,
(SELECT some_data2 FROM weak WHERE weak.pk = strong.pk AND weak.fk = strong.fk) some_data2
FROM strong
Left Outer Join
SELECT
weak.some_data1,
weak.some_data2
FROM strong
LEFT OUTER JOIN weak ON weak.pk = strong.pk AND weak.fk = strong.fk
I first thought that the "Left Outer Join" way had to be better, and I still think that this is true as long as there is no WHERE/ORDER BY clause. However, in the real-world application, user query dialog inputs are dynamically translated into extensions of the above statements. Typically, the user knows the primary key of STRONG, resulting in queries like this:
SELECT *
FROM the_view
WHERE the_view.pk LIKE '123%' --Or even the exact key
ORDER BY the_view.pk
Using the "Left Outer Join" way, we encountered some very serious performance problems, even though most of these SELECTs only return a few rows. I think what happened is that the hash table did not fit into memory, resulting in far too many I/O events. Thus, we went back to the subselects.
Now, I have a few questions:
Q1
Does Oracle have to compute the entire hash table for every SELECT (with ORDER_BY)?
Q2
Why is the "Subselect"-way faster? Here, it might be worth noting that these columns can appear in the WHERE-clause as well.
Q3
Does it somehow matter that joining the two tables might potentially increase the number of selected rows? If so: can we somehow tell Oracle that this can never happen from a logical perspective?
Q4
In case that the "Left Outer Join"-Way is not a well-performing option: The "Subselect"-way does seem somewhat redundant. Is there a better way?
Thanks a lot!
EDIT
Due to request, I will add an explanation plan of the actual case. However, there are a few important things here:
In the above description, I tried to simplify the actual problem. In the real world, the view is a lot more complex.
One thing I left out due to simplification is that the performance issues mainly occur when using the STRONG => WEAK-Join in a nested join (see below). The actual situation looks like this:
ZV is the name of our target view - the explanation plan below refers to that view.
Z (~3M rows) join T (~1M rows)
T joins CCPP (~1M rows)
TV is a view based on T. Come to think of it... this might be critical. The front end application sort of restricts us in the way we define these views: In ZV, we have to join TV instead of T and we can not implement that T => CCPP-join in TV, forcing us to define the join TV => CCPP as a nested join.
We only encountered the performance problems in our productive environment with lots of users. Obviously, we had to get rid of these problems, so they cannot be reproduced right now. The response times of the statements below are totally fine.
The Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 717K| 73M| | 13340 (2)|
| 1 | HASH JOIN OUTER | | 717K| 73M| 66M| 13340 (2)|
| 2 | VIEW | | 687K| 59M| | 5 (0)|
| 3 | NESTED LOOPS OUTER | | 687K| 94M| | 5 (0)|
| 4 | NESTED LOOPS OUTER | | 1 | 118 | | 4 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID | Z | 1 | 103 | | 3 (0)|
| 6 | INDEX UNIQUE SCAN | SYS_C00245876 | 1 | | | 2 (0)|
| 7 | INDEX UNIQUE SCAN | SYS_C00245876 | 1798K| 25M| | 1 (0)|
| 8 | VIEW PUSHED PREDICATE | TV | 687K| 17M| | 1 (0)|
| 9 | NESTED LOOPS OUTER | | 1 | 67 | | 2 (0)|
| 10 | TABLE ACCESS BY INDEX ROWID| T | 1 | 48 | | 2 (0)|
| 11 | INDEX UNIQUE SCAN | SYS_C00245609 | 1 | | | 1 (0)|
| 12 | INDEX UNIQUE SCAN | SYS_C00254613 | 1 | 19 | | 0 (0)|
| 13 | TABLE ACCESS FULL | CCPP | 5165K| 88M| | 4105 (3)|
--------------------------------------------------------------------------------------------------------------
The real question is: how many records does your query return?
Only 10 records, or 10,000 (or 10M) where you expect to see the first 10 rows quickly?
For the latter case, the subquery solution indeed works better, as you need no sort and you look up the WEAK table only a small number of times.
For the former case (i.e. the number of selected rows in both tables is small), I'd expect an execution plan as follows:
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 336 | 100 (1)| 00:00:02 |
| 1 | SORT ORDER BY | | 4 | 336 | 100 (1)| 00:00:02 |
| 2 | NESTED LOOPS OUTER | | 4 | 336 | 99 (0)| 00:00:02 |
| 3 | TABLE ACCESS BY INDEX ROWID| STRONG | 4 | 168 | 94 (0)| 00:00:02 |
|* 4 | INDEX RANGE SCAN | STRONG_IDX | 997 | | 4 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| WEAK | 1 | 42 | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | WEAK_IDX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("STRONG"."PK" LIKE '1234%')
filter("STRONG"."PK" LIKE '1234%')
6 - access("WEAK"."PK"(+)="STRONG"."PK" AND "WEAK"."FK"(+)="STRONG"."FK")
filter("WEAK"."PK"(+) LIKE '1234%')
If you see a FULL TABLE SCAN on one or the other table, the optimizer's impression could be that the predicate pk LIKE '123%' will return too many records and that index access would be slower.
This could be a good or a bad guess, so you may need to check your table statistics and cardinality estimates.
Some additional information follows
Q1
If Oracle performs a HASH JOIN, the whole data source (typically the smaller one) must be read into memory,
into the hash table. This is the whole table, or the part of it as filtered by the WHERE/ON clause
(in your case, only records with pk LIKE '123%').
Q2
This may only be an impression, as you quickly see the first records. The subquery is performed
only for the first few fetched rows.
To know the exact answer you must examine (or post) the execution plans.
Q3
No, sorry. Joining the two tables never "potentially" increases the number of selected rows; it returns exactly the number of rows
defined by the SQL standard.
It is your responsibility to define the join on a unique / primary key to avoid duplication.
Q4
You may of course select something like some_data1 || '#' || some_data2 in the subquery, but it is your responsibility
to decide whether that is safe.

Select only the rows which timestamp correspond to the current month

I am starting to try some experiments using Google Spreadsheets as a DB, and for that I am collecting data from different sources and inserting it via the Spreadsheets API into a sheet.
Each row has a value (Column B) and a timestamp (Column A).
+---------------------+------+
| ColA | ColB |
+---------------------+------+
| 13/10/2012 00:19:01 | 42 |
| 19/10/2012 00:29:01 | 100 |
| 21/10/2012 00:39:01 | 23 |
| 22/10/2012 00:29:01 | 1 |
| 23/10/2012 00:19:01 | 24 |
| 24/10/2012 00:19:01 | 4 |
| 31/10/2012 00:19:01 | 2 |
+---------------------+------+
What I am trying to do is programmatically add, into a different cell, the sum of all rows in Column B where Column A falls within the current month.
Is there any function I can use for that? Or can anyone point me in the right direction on how to create a custom function which might do something like this? I know how to do this using MySQL, but I couldn't find anything for Google Spreadsheets.
Thanks in advance for any tip in the right direction.
Would native spreadsheet functions do?
=ArrayFormula(SUMIF(TEXT(A:A;"MM/yyyy");TEXT(GoogleClock();"MM/yyyy");B:B))
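The same month-filtering logic can be sketched in Python for clarity (the formula above is the native-spreadsheet equivalent; the data is the example table from the question, and `now` stands in for GoogleClock()):

```python
from datetime import datetime

rows = [
    ("13/10/2012 00:19:01", 42),
    ("19/10/2012 00:29:01", 100),
    ("21/10/2012 00:39:01", 23),
    ("22/10/2012 00:29:01", 1),
    ("23/10/2012 00:19:01", 24),
    ("24/10/2012 00:19:01", 4),
    ("31/10/2012 00:19:01", 2),
]

def month_sum(rows, now):
    # Sum ColB for every row whose ColA timestamp falls in the same
    # month and year as 'now', mirroring the TEXT(...;"MM/yyyy")
    # comparison in the formula above.
    total = 0
    for stamp, value in rows:
        ts = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        if (ts.year, ts.month) == (now.year, now.month):
            total += value
    return total
```

For October 2012, every example row matches, so the sum is the total of Column B; for any other month it is 0.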
