80% Rule Estimation Value in PL/SQL - Oracle

Assume a range of values is inserted into a schema table, and at the end of the month I want to apply the following algorithm to these records (e.g. 2500 rows of numeric values): sort the values ascending (from the smallest to the highest value) and then find the 80% value of the sorted column.
In my example, if each row increases by one starting from 1, the 80% value will be the value at row 2000 (= 2500 - 2500*20/100). This algorithm needs to be implemented in a procedure where the number of rows is not constant; for example, it can vary from 2,500 to 1,000,000 per month.

Hint: You can achieve this using Oracle's analytic functions, which can compute cumulative aggregates. For example, suppose your table looks like this:
MY_TABLE
+----+----------+
| ID | QUANTITY |
+----+----------+
| A  |        1 |
| B  |        2 |
| C  |        3 |
| D  |        4 |
| E  |        5 |
| F  |        6 |
| G  |        7 |
| H  |        8 |
| I  |        9 |
| J  |       10 |
+----+----------+
At each row, you can sum the quantities so far using this:
SELECT
    id,
    quantity,
    SUM(quantity)
        OVER (ORDER BY quantity ROWS UNBOUNDED PRECEDING)
        AS cumulative_quantity_so_far
FROM
    MY_TABLE
Giving you:
+----+----------+----------------------------+
| ID | QUANTITY | CUMULATIVE_QUANTITY_SO_FAR |
+----+----------+----------------------------+
| A  |        1 |                          1 |
| B  |        2 |                          3 |
| C  |        3 |                          6 |
| D  |        4 |                         10 |
| E  |        5 |                         15 |
| F  |        6 |                         21 |
| G  |        7 |                         28 |
| H  |        8 |                         36 |
| I  |        9 |                         45 |
| J  |       10 |                         55 |
+----+----------+----------------------------+
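To get from the running totals to the 80% value the question asks for, here is a minimal sketch (an assumption on my part: the "80% value" is the value sitting at the 80% row position in ascending order, as in the question's 2500-row example). It numbers the rows, counts them, and picks the value at position FLOOR(total * 0.8):
SELECT quantity
FROM
(
    SELECT
        quantity,
        ROW_NUMBER() OVER (ORDER BY quantity) AS rn,
        COUNT(*)     OVER ()                  AS total_rows
    FROM
        MY_TABLE
)
WHERE rn = FLOOR(total_rows * 0.8);
With 2500 rows holding the values 1 to 2500, this returns 2000, matching the example; the row count never appears in the query, so it adapts to any month's volume.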
Hopefully this will help in your work.

Write a query using the percentile_disc function to solve your problem; it does exactly what you describe.
An example would be
select percentile_disc(0.8) within group (order by the_value)
from my_table
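Since the question asks for this to live in a procedure, here is a minimal sketch of wrapping the query up (get_80_percent_value is a hypothetical name; my_table and the_value are the placeholders from the example above):
CREATE OR REPLACE PROCEDURE get_80_percent_value (p_result OUT NUMBER) AS
BEGIN
    -- PERCENTILE_DISC(0.8) returns the first actual value whose cumulative
    -- position in the ascending sort reaches 80%, so no explicit row count is needed.
    SELECT PERCENTILE_DISC(0.8) WITHIN GROUP (ORDER BY the_value)
      INTO p_result
      FROM my_table;
END;
/
Because the percentile is computed relative to however many rows exist, the same procedure works unchanged for 2,500 rows or 1,000,000.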

Related

How to pivot data in Hive?

First, I've checked other topics on the subject, like this one: How to transpose/pivot data in hive? But that doesn't match what I want.
So this is the table I have:
| ID | Day | Status |
|  1 |   1 | A      |
|  2 |  10 | B      |
|  3 | 101 | A      |
|  3 | 322 | B      |
|  3 | 102 | C      |
|  3 | 354 | D      |
And I'd like to concatenate the different Status values for each ID, ordered by Day, in order to have this:
| ID | Status  |
|  1 | A       |
|  2 | B       |
|  3 | A,C,B,D |
The thing is that I don't know how many statuses I can have, so I can't create a fixed number of columns for the days. Since I don't know how many day/status pairs I'll have, I don't know how to adapt the answers from other topics that use group_map or similar.
Thanks for helping me ^^
Use collect_set (for distinct values) or collect_list to aggregate the statuses into an array, and concatenate it using concat_ws:
select ID, concat_ws(',',collect_list(Status)) as Status
from my_table
group by ID;
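collect_list on its own does not promise any particular order, so if the Day ordering matters (as it does here), a commonly used workaround is to pre-sort the rows in a subquery before aggregating. A sketch (my_table is a placeholder; verify the ordering on your Hive version, since it is not formally guaranteed):
select ID, concat_ws(',', collect_list(Status)) as Status
from
(
    select ID, Day, Status
    from my_table
    distribute by ID
    sort by ID, Day
) t
group by ID;
For the sample data this yields A,C,B,D for ID 3, because days 101, 102, 322, 354 carry statuses A, C, B, D.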

How to categorize to age ranges and find the top 3?

I have a Hive table with an age column holding the ages of persons.
I have to count and display the top 3 age categories.
Ex: below 10, 10-15, 15-20, 20-25, 25-30, ...
Which age category appears most often?
Please suggest me a query to do this.
select case
           when age <= 10 then '0-10'
           else concat_ws('-'
                         ,cast(floor(age/5)*5 as string)
                         ,cast((floor(age/5)+1)*5 as string))
       end as age_group
      ,count(*) as cnt
from mytable
group by 1
order by cnt desc
limit 3;
You might need to set this parameter:
set hive.groupby.orderby.position.alias=true;
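If you would rather not rely on that setting, a sketch of the same query with the expression repeated in the GROUP BY instead of the positional alias 1:
select case
           when age <= 10 then '0-10'
           else concat_ws('-',cast(floor(age/5)*5 as string),cast((floor(age/5)+1)*5 as string))
       end as age_group
      ,count(*) as cnt
from mytable
group by case
             when age <= 10 then '0-10'
             else concat_ws('-',cast(floor(age/5)*5 as string),cast((floor(age/5)+1)*5 as string))
         end
order by cnt desc
limit 3;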
Demo
with mytable as
(
    select floor(rand()*100) as age
    from (select 1) x lateral view explode(split(space(100),' ')) pe
)
select case
           when age <= 10 then '0-10'
           else concat_ws('-',cast(floor(age/5)*5 as string),cast((floor(age/5)+1)*5 as string))
       end as age_group
      ,count(*) as cnt
      ,sort_array(collect_list(age)) as age_list
from mytable
group by 1
order by cnt desc
;
+-----------+-----+------------------------------+
| age_group | cnt | age_list |
+-----------+-----+------------------------------+
| 0-10 | 9 | [0,0,1,3,3,6,8,9,10] |
| 25-30 | 9 | [26,26,28,28,28,28,29,29,29] |
| 55-60 | 8 | [55,55,56,57,57,57,58,58] |
| 35-40 | 7 | [35,35,36,36,37,38,39] |
| 80-85 | 7 | [80,80,81,82,82,82,84] |
| 30-35 | 6 | [31,32,32,32,33,34] |
| 70-75 | 6 | [70,70,71,71,72,73] |
| 65-70 | 6 | [65,67,67,68,68,69] |
| 50-55 | 6 | [51,53,53,53,53,54] |
| 45-50 | 5 | [45,45,48,48,49] |
| 85-90 | 5 | [85,86,87,87,89] |
| 75-80 | 5 | [76,77,78,79,79] |
| 20-25 | 5 | [20,20,21,22,22] |
| 15-20 | 5 | [17,17,17,18,19] |
| 10-15 | 4 | [11,12,12,14] |
| 95-100 | 4 | [95,95,96,99] |
| 40-45 | 3 | [41,44,44] |
| 90-95 | 1 | [93] |
+-----------+-----+------------------------------+

Laravel: group by category and select the record with the minimum price

I have models Book and BookCategory.
How do I select the cheapest book in every category?
Book table:
| id | name | price | book_category_id |
|  1 | test |    10 |                1 |
|  2 | test |    15 |                3 |
|  3 | test |    75 |                1 |
|  4 | test |    25 |                2 |
|  5 | test |    19 |                1 |
|  6 | test |    11 |                2 |
|  7 | test |    10 |                1 |
The selection should be:
| id | name | price | book_category_id |
|  1 | test |    10 |                1 |
|  2 | test |    15 |                3 |
|  6 | test |    11 |                2 |
I've tried:
$books = Book::groupBy("book_category_id")->orderBy("price")->get();
But the output is not the minimum-price row.
Any idea?
EDIT:
I found this page:
https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
It has 90% of the solution in SQL:
SELECT *
FROM books
WHERE price = (SELECT MIN(price) FROM books AS u WHERE u.book_category_id = books.book_category_id)
GROUP BY books.book_category_id
How do I convert this to the Laravel query builder?
You need to perform a subquery like in this post. Try this:
$books = Book::from(DB::raw("(SELECT * FROM books ORDER BY price ASC) AS sorted_books"))
    ->groupBy("book_category_id")
    ->get();
(The derived table needs an alias; sorted_books here is arbitrary, but without it MySQL rejects the statement.) Please note that this is a MySQL-only solution, because MySQL lets you select non-aggregated columns that are not in the GROUP BY; it then returns values from the first row it encounters per group, which is what the pre-sort exploits. If you need to do this for another DB, you need to perform an inner join on the subquery.
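If you do need a portable version, a sketch of that inner join in plain SQL (note that it returns every tied minimum-price book per category, so for the sample data both id 1 and id 7 would come back for category 1):
SELECT b.*
FROM books b
INNER JOIN
(
    SELECT book_category_id, MIN(price) AS min_price
    FROM books
    GROUP BY book_category_id
) m ON m.book_category_id = b.book_category_id
   AND b.price = m.min_price;
You could run this raw via DB::select(), or build the subquery with the query builder's joinSub() on newer Laravel versions.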

Top N query performance when accessing a list of IDs

I have a top N query that is giving me problems.
First of all, I have a query like the following:
select /*+ gather_plan_statistics */ * from
(
select rowid
from payer_subscription ps
where ps.subscription_status = :i_subscription_status
and ps.merchant_id = :merchant_id2
order by transaction_date desc
) where rownum <= :i_rowcount;
This query works well. It can very efficiently find me the top 10 rows for a massive data set, using an index on merchant_id, subscription_status, transaction_date.
---------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |      1 |        |     10 |00:00:00.01 |       4 |
|*  1 |  COUNT STOPKEY                 |             |      1 |        |     10 |00:00:00.01 |       4 |
|   2 |   VIEW                         |             |      1 |     11 |     10 |00:00:00.01 |       4 |
|*  3 |    INDEX RANGE SCAN DESCENDING | SODTEST2_IX |      1 |    100 |     10 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------
As you can see, the actual rows (A-Rows) at each step are 10, which is correct.
Now, I have a requirement to get the top N records for a set of merchant_ids, so if I change the query to include two merchant_ids, the performance tanks:
select /*+ gather_plan_statistics */ * from
(
select rowid
from payer_subscription ps
where ps.subscription_status = :i_subscription_status
and (ps.merchant_id = :merchant_id or
ps.merchant_id = :merchant_id2 )
order by transaction_date desc
) where rownum <= :i_rowcount;
--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |             |      1 |        |     10 |00:00:00.17 |     178 |       |       |          |
|*  1 |  COUNT STOPKEY            |             |      1 |        |     10 |00:00:00.17 |     178 |       |       |          |
|   2 |   VIEW                    |             |      1 |    200 |     10 |00:00:00.17 |     178 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY  |             |      1 |    200 |     10 |00:00:00.17 |     178 |  2048 |  2048 | 2048 (0)|
|   4 |     INLIST ITERATOR       |             |      1 |        |  42385 |00:00:00.10 |     178 |       |       |          |
|*  5 |      INDEX RANGE SCAN     | SODTEST2_IX |      2 |    200 |  42385 |00:00:00.06 |     178 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------
Notice now that there are 42K rows coming out of the index range scan - Oracle is no longer stopping the scan when it reaches 10 rows. What I thought would happen is that Oracle would fetch at most 10 rows for each merchant_id, knowing that at most 10 rows are to be returned by the query, then sort those 10 + 10 rows and output the top 10 based on the transaction date, but it refuses to do that.
Does anyone know how I can get the performance of the first query when I need to pass a list of merchants into the query? I could probably get the performance using a UNION ALL, but the list of merchants is variable and could be anywhere from 1 or 2 to several hundred.
You can use the --+ use_concat hint to make Oracle execute the query as if it were a UNION ALL.
From the documentation:
The USE_CONCAT hint instructs the optimizer to transform combined OR-conditions in the WHERE clause of a query into a compound query using the UNION ALL set operator. Without this hint, this transformation occurs only if the cost of the query using the concatenations is cheaper than the cost without them. The USE_CONCAT hint overrides the cost consideration.
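Applied to the query in the question, that might look like this (a sketch):
select /*+ gather_plan_statistics */ * from
(
    select /*+ USE_CONCAT */ rowid
    from payer_subscription ps
    where ps.subscription_status = :i_subscription_status
    and (ps.merchant_id = :merchant_id or
         ps.merchant_id = :merchant_id2 )
    order by transaction_date desc
) where rownum <= :i_rowcount;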
There are many cases where use_concat is ignored.
See: MOS Note: USE_CONCAT hint on different versions (Doc ID 259741.1)
I have had success in 10.2.0.4, 11.2.0.1 with OR_EXPAND where USE_CONCAT will not work.
/*+ OR_EXPAND( alias column_name ) */
Documented here:
http://www.hellodba.com/reader.php?ID=199&lang=EN
I'm not sure if this helps, but you can try to replace the OR operator with IN:
and ps.merchant_id IN (:merchant_id, :merchant_id2)

MySQL equivalent of Oracle's rank()

Oracle has 2 functions - rank() and dense_rank() - which I've found very useful for some applications. I am doing something in MySQL now and was wondering if it has something equivalent to those?
Nothing directly equivalent, but you can fake it with some (not terribly efficient) self-joins. Some sample code from a collection of MySQL query howtos:
SELECT v1.name, v1.votes, COUNT(v2.votes) AS Rank
FROM votes v1
JOIN votes v2 ON v1.votes < v2.votes OR (v1.votes=v2.votes and v1.name = v2.name)
GROUP BY v1.name, v1.votes
ORDER BY v1.votes DESC, v1.name DESC;
+-------+-------+------+
| name | votes | Rank |
+-------+-------+------+
| Green | 50 | 1 |
| Black | 40 | 2 |
| White | 20 | 3 |
| Brown | 20 | 3 |
| Jones | 15 | 5 |
| Smith | 10 | 6 |
+-------+-------+------+
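(If you are on MySQL 8.0 or later, window functions are available natively, so a sketch against the same votes table is simply:
SELECT name, votes,
       RANK()       OVER (ORDER BY votes DESC) AS `rank`,
       DENSE_RANK() OVER (ORDER BY votes DESC) AS `dense_rank`
FROM votes;
Note the backticks: RANK and DENSE_RANK became reserved words in MySQL 8.0.)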
How about this dense_rank implementation in MySQL?
CREATE TABLE `person` (
`id` int(11) DEFAULT NULL,
`first_name` varchar(20) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`gender` char(1) DEFAULT NULL);
INSERT INTO `person` VALUES
(1,'Bob',25,'M'),
(2,'Jane',20,'F'),
(3,'Jack',30,'M'),
(4,'Bill',32,'M'),
(5,'Nick',22,'M'),
(6,'Kathy',18,'F'),
(7,'Steve',36,'M'),
(8,'Anne',25,'F'),
(9,'Mike',25,'M');
The data before dense_rank() looks like this:
mysql> select * from person;
+------+------------+------+--------+
| id | first_name | age | gender |
+------+------------+------+--------+
| 1 | Bob | 25 | M |
| 2 | Jane | 20 | F |
| 3 | Jack | 30 | M |
| 4 | Bill | 32 | M |
| 5 | Nick | 22 | M |
| 6 | Kathy | 18 | F |
| 7 | Steve | 36 | M |
| 8 | Anne | 25 | F |
| 9 | Mike | 25 | M |
+------+------------+------+--------+
9 rows in set (0.00 sec)
The data after dense_rank() looks like this, including the "partition by" functionality:
+------------+--------+------+------+
| first_name | gender | age | rank |
+------------+--------+------+------+
| Anne | F | 25 | 1 |
| Jane | F | 20 | 2 |
| Kathy | F | 18 | 3 |
| Steve | M | 36 | 1 |
| Bill | M | 32 | 2 |
| Jack | M | 30 | 3 |
| Mike | M | 25 | 4 |
| Bob | M | 25 | 4 |
| Nick | M | 22 | 6 |
+------------+--------+------+------+
9 rows in set (0.00 sec)
The query statement is:
select first_name, t1.gender, age, FIND_IN_SET(age, t1.age_set) as rank
from person t2,
     (select gender, group_concat(age order by age desc) as age_set
      from person
      group by gender) t1
where t1.gender = t2.gender
order by t1.gender, rank
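Note that with a tie this actually leaves rank()-style gaps (in the output above, Mike and Bob share rank 4 and Nick then gets 6), because the duplicate age stays in the group_concat list. For true dense_rank() semantics, a small tweak is to deduplicate the ages, a sketch:
select first_name, t1.gender, age, FIND_IN_SET(age, t1.age_set) as rank
from person t2,
     (select gender, group_concat(distinct age order by age desc) as age_set
      from person
      group by gender) t1
where t1.gender = t2.gender
order by t1.gender, rank
Now Nick's 22 is the fifth distinct age in the M list, so he gets rank 5.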
