Trying to figure out top 5 land areas of the 50 states in the U.S - oracle

I have created a table with one column named states and another called land area. I am using Oracle 11g. I have looked at various questions on here and cannot find a solution. Here is what I have tried so far:
SELECT LandAreas, State
FROM ( SELECT LandAreas, State, DENSE_RANK() OVER (ORDER BY State DESC) sal_dense_rank
FROM Map )
WHERE sal_dense_rank >= 5;
This does not return the five largest land areas.
I have also tried this one, but no luck either:
SELECT * FROM (SELECT * FROM Map order by State desc)
where rownum < 5;
Does anyone have any suggestions to get me on the right track?
Here is a sample of the table:
states land areas
michagan 15000
florida 25000
tennessee 10000
alabama 80000
new york 150000
california 20000
oregon 5000
texas 6000
utah 3000
nebraska 1000
Desired output from query:
States land area
new york 150000
alabama 80000
florida 25000
california 20000

Try:
Select * from
(SELECT State, LandAreas FROM Map ORDER BY LandAreas DESC)
where rownum < 6

Use a HAVING clause and count the number of states that are larger:
SELECT m.state, m.landArea
FROM Map m
LEFT JOIN Map m2 on m2.landArea > m.landArea
GROUP BY m.state, m.landArea
HAVING count(*) < 5
ORDER BY m.landArea DESC
This joins each state to every state whose area is greater, then uses a HAVING clause to return only those states where the number of larger states was less than 5.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
The left join is needed for the case of the largest state, which has no other larger state to join to.
The ORDER BY is optional.
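The counting idea can be sketched outside the database. This is a minimal Python illustration, not the Oracle query itself; the sample rows are hypothetical and include a deliberate tie for fifth place.

```python
# Hypothetical sample data: (state, land_area). The two 15000 rows tie on purpose.
rows = [
    ("new york", 150000), ("alabama", 80000), ("florida", 25000),
    ("california", 20000), ("michigan", 15000), ("tennessee", 15000),
    ("texas", 6000),
]

def top_n_by_larger_count(rows, n):
    """Keep rows that have fewer than n strictly larger rows (ties are all kept)."""
    kept = []
    for state, area in rows:
        larger = sum(1 for _, other in rows if other > area)
        if larger < n:
            kept.append((state, area))
    # Mirrors the optional ORDER BY ... DESC.
    return sorted(kept, key=lambda r: r[1], reverse=True)

top5 = top_n_by_larger_count(rows, 5)
for state, area in top5:
    print(state, area)
```

Because both 15000 rows have exactly four larger rows, six rows come back rather than five, which is the tie behaviour described above.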

Try something like this
select m.states,m.landarea
from map m
where (select count('x') from map m2 where m2.landarea > m.landarea) < 5
order by m.landarea desc

There are two bloomers in your posted code.
You need to use landarea in the DENSE_RANK() call. At the moment you're ordering the states in reverse alphabetical order.
Your filter in the outer query is the wrong way around: you're excluding the top four results.
Here is what you need ...
SELECT LandArea, State
FROM ( SELECT LandArea
, State
, DENSE_RANK() OVER (ORDER BY landarea DESC) as area_dr
FROM Maps )
WHERE area_dr <= 5
order by area_dr;
I'm going with the statement in the question that you want the five biggest states, and ignoring the fact that your desired result set has only four rows. Adjust the outer filter as needed.
There are three different functions for deriving top-N result sets: DENSE_RANK, RANK and ROW_NUMBER.
Using ROW_NUMBER will always guarantee you 5 rows in the result set, but you may get the wrong result if there are several states with the same land area (unlikely in this case, but other data sets will produce such clashes). So: 1,2,3,4,5
The difference between RANK and DENSE_RANK is how they handle ties. DENSE_RANK always produces a series of consecutive numbers, regardless of how many rows there are in each rank. So: 1,2,2,3,3,3,4,5
RANK on the other hand will produce a sparse series if a given rank has more than one hit. So: 1,2,2,4,4,4.
Note that each of the example result sets has a different number of rows. Which one is correct? It depends on the precise question you want to ask.
Using a sorted sub-query with the ROWNUM pseudo-column will work like the ROW_NUMBER function, but I prefer using ROW_NUMBER because it is more powerful and more error-proof.
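To make the three numbering schemes concrete, here is a small Python sketch that reproduces them for a list with ties. It only imitates the semantics described above; the function name and the sample values are made up for illustration.

```python
def number_rows(values):
    """Return (value, row_number, rank, dense_rank) tuples, largest value first."""
    ordered = sorted(values, reverse=True)
    out = []
    rank = dense = 0
    prev = None
    for row_number, v in enumerate(ordered, start=1):
        if v != prev:
            rank = row_number   # RANK jumps to the row position after a tie
            dense += 1          # DENSE_RANK advances by exactly one
            prev = v
        out.append((v, row_number, rank, dense))
    return out

# Hypothetical land areas with a two-way and a three-way tie.
areas = [900, 800, 800, 700, 700, 700, 600, 500]
numbered = number_rows(areas)

# Apply the same "<= 5" filter to each numbering.
top_by_row_number = [v for v, rn, _, _ in numbered if rn <= 5]
top_by_rank = [v for v, _, r, _ in numbered if r <= 5]
top_by_dense_rank = [v for v, _, _, d in numbered if d <= 5]

print(len(top_by_row_number), len(top_by_rank), len(top_by_dense_rank))
```

With this data the same filter keeps 5, 6 and 8 rows respectively, so the choice of function changes the answer whenever ties are present.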

Related

Add indicator to top and bottom 10%

I'm trying to capture the average of FIRST_CONTACT_CAL_DAYS but what I would like to do is create an indicator for the top and bottom 10% of values so I can exclude those (outliers) from my average calculation.
I'm not sure how to go about doing this. Any thoughts?
SELECT DISTINCT
TO_CHAR(A.FIRST_ASSGN_DT,'DAY') AS DAY_NUMBER,
A.FIRST_ASSGN_DT,
A.FIRST_CONTACT_DT,
TO_CHAR(A.FIRST_CONTACT_DT,'DAY') AS DAY_NUMBER2,
A.FIRST_CONTACT_DT AS FIRST_PHONE_CONTACT,
A.ID,
ABS(TO_DATE(A.FIRST_CONTACT_DT, 'DD/MM/YYYY') - TO_DATE(A.FIRST_ASSGN_DT, 'DD/MM/YYYY')) AS FIRST_CONTACT_CAL_DAYS
FROM HIST A
LEFT JOIN CONTACTS D ON A.ID = D.ID
WHERE 1=1
You may be looking for something like this. Please adapt to your situation.
I assume you may have more than one "group" or "partition" and you need to compute the average for each group separately, after throwing out the outliers in each partition. (An alternative, which can be easily accommodated by adapting the query below, is to throw out the outliers at the global level, and only then to group and take the average for each group.)
If you don't have any groups, and everything is one big pile of data, it's even easier - you don't need GROUP BY and PARTITION BY.
Then: the function NTILE assigns a bucket number, in this example between 1 and 10, to each row, based on where they fall (first decile, i.e. first 10%, next decile, ... all the way to the last decile). I do this in a subquery. Then in the outer query just filter out the first and last bucket before you group by and you compute the average.
For testing purposes I create three groups with 10,000 random numbers each in a WITH clause - no need to spend any time on that portion of the code, since it is not part of the solution (the SQL code to solve your problem) - it's just a dirty trick to create test data on the fly.
with
inputs ( grp, val ) as (
select ceil(level/10000), dbms_random.value(0, 150)
from dual
connect by level <= 30000
)
select grp, avg(val) as avg_val
from (
select grp, val, ntile(10) over (partition by grp order by val) as bkt
from inputs
)
where bkt between 2 and 9
group by grp
;
GRP AVG_VAL
--- -----------------------
1 75.021614866547043734458
2 74.286117923344418598032
3 75.437412573353736953791
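For readers who want to see what the bucket filter does, here is a rough Python sketch of the same idea for a single group. It assumes NTILE hands the leftover rows to the lowest-numbered buckets, so bucket sizes differ by at most one; the helper names and the data are hypothetical.

```python
def ntile(row_count, buckets):
    """1-based bucket numbers for rows already in sort order, NTILE-style."""
    base, extra = divmod(row_count, buckets)
    numbers = []
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)  # first `extra` buckets get one more row
        numbers.extend([b] * size)
    return numbers

def trimmed_mean(values, buckets=10):
    """Average after dropping the first and last bucket (needs enough rows to keep some)."""
    ordered = sorted(values)
    bkts = ntile(len(ordered), buckets)
    kept = [v for v, b in zip(ordered, bkts) if 2 <= b <= buckets - 1]
    if not kept:
        raise ValueError("not enough rows to trim")
    return sum(kept) / len(kept)

# Hypothetical data: 1..20 plus one obvious outlier.
values = list(range(1, 21)) + [1000]
print(trimmed_mean(values))
```

Here the outlier 1000 lands in bucket 10 and the three smallest values land in bucket 1, so only 4 through 19 contribute to the average.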

MDX rather complicated sorting

I can't find a way to sort my query. This is the simple query:
SELECT {[Measures].[IB]}
ON COLUMNS,
{[Dim_Product_Models_new].[PLA].members } *
{[Dim Dates_new].[Date Full].&[2013-02-01]:[Dim Dates_new].[Date Full].&[2014-01-01]}
ON ROWS
FROM [cub_dashboard_spares]
The thing is, I get a result for 6 PLAs combined across 12 months (72 rows in total), but it is sorted alphabetically by PLA.
What I need is to sort the PLAs based on a measure in the last month (2014-01-01 in this case).
Is there any way to do this so that the grouping (PLAs, dates from 2013-02 to 2013-12) is preserved, and only the order of my PLAs is different? (The PLA with the highest measure in the last month would be first, and so on.)
Thank you very much for any help.
Just put the sorted set on the rows, using the Order function. The third parameter of this function is DESC if you want to sort within each hierarchy level, but still want to get parents before children (like ALL before the single attribute members), or BDESC if you want to sort across all levels.
SELECT {[Measures].[IB]}
ON COLUMNS,
Order({[Dim_Product_Models_new].[PLA].members },
([Measures].[IB], [Dim Dates_new].[Date Full].&[2014-01-01]),
DESC)
*
{[Dim Dates_new].[Date Full].&[2013-02-01]:[Dim Dates_new].[Date Full].&[2014-01-01]}
ON ROWS
FROM [cub_dashboard_spares]
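The effect of ordering the PLA set first and cross joining afterwards can be pictured with a short Python sketch. This is only an analogy for the MDX above: the PLA names, months and measure values are invented.

```python
from itertools import product

# Hypothetical measure values keyed by (PLA, month).
ib = {
    ("A", "2013-12-01"): 5, ("A", "2014-01-01"): 10,
    ("B", "2013-12-01"): 9, ("B", "2014-01-01"): 30,
    ("C", "2013-12-01"): 7, ("C", "2014-01-01"): 20,
}
plas = ["A", "B", "C"]
months = ["2013-12-01", "2014-01-01"]
last_month = months[-1]

# Like Order(...) applied to the PLA set alone: sort by the measure in the last month.
ordered_plas = sorted(plas, key=lambda p: ib[(p, last_month)], reverse=True)

# The cross join then pairs each PLA, in that order, with every month in date order.
rows = list(product(ordered_plas, months))
for pla, month in rows:
    print(pla, month, ib[(pla, month)])
```

Each PLA keeps its block of months in chronological order; only the order of the blocks changes.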
The order function over a crossjoin should preserve the initial order of the first set so reversing the order of the tuple will do the job:
SELECT
{
[Measures].[IB]
} ON COLUMNS,
order(
{[Dim Dates_new].[Date Full].&[2013-02-01]:[Dim Dates_new].[Date Full].&[2014-01-01]} *
{[Dim_Product_Models_new].[PLA].members } ,
[Measures].[IB],
desc
) ON ROWS
FROM [cub_dashboard_spares]
If you want to preserve the order of appearance of the column labels, you can use the Generate function, as in the following example from the AW cube:
SELECT
{[Measures].[Internet Sales Amount]} ON 0
,Generate
(
{[Customer].[Country].&[Australia]:[Customer].[Country].&[United Kingdom]}
,(
Order
(
[Date].[Calendar Year].[Calendar Year].MEMBERS
,(
[Customer].[Country].CurrentMember
,[Measures].[Internet Sales Amount]
)
,DESC
)
,[Customer].[Country].CurrentMember
)
) ON 1
FROM [Adventure Works];

Select Multiple rows into one row in Oracle

I have a table that stores student results for different exams and different exam types (main exam, continuous assessment, course work, etc.). I need to query the table so that I get only one row per exam unit, with the percentage averaged over the number of exams the student sat for. Here is my attempted query:
select stu_reg_no, unit_code,
exam_unit, exam_semester,
student_year,
sum(per_centage_score)/count(per_centage_score) percentage
from student_results_master
group by unit_code, exam_unit,
per_centage_score, stu_reg_no,
exam_semester, student_year;
My result set has two rows for the same exam unit, since one is the main exam and the other is course work. I need my output like this:
E35/1000/2013 TFT001 COMPLEX ANALYSIS 1 1 71.04
E35/1000/2013 TFT002 LINEAR ALGEBRA 1 1 56.25
The percentage for that particular unit is added and divided by the number of exams for that particular unit.
How can I achieve this?
Oracle provides a built-in function for calculating average value for an expression over a set of rows - AVG(). To get the desired output you need to do the following two things:
Replace sum(per_centage_score)/count(per_centage_score) with avg(per_centage_score)
Remove per_centage_score column from the group by clause.
To that end, your query might look like this:
select stu_reg_no
, unit_code
, exam_unit
, exam_semester
, student_year
, avg(per_centage_score) percentage
from student_results_master
group by unit_code
, exam_unit
, stu_reg_no
, exam_semester
, student_year;
Result:
STU_REG_NO UNIT_CODE EXAM_UNIT EXAM_SEMESTER STUDENT_YEAR PERCENTAGE
------------- --------- ---------------- ------------- ------------ ----------
E35/1000/2013 TFT001 COMPLEX ANALYSIS 1 1 71.04
E35/1000/2013 TFT002 LINEAR ALGEBRA 1 1 56.25
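Why dropping per_centage_score from the GROUP BY matters can be shown with a small Python sketch. The rows below are hypothetical (the scores are made up and do not reproduce the figures above); the point is only how the grouping key changes the number of output rows.

```python
from collections import defaultdict

# Hypothetical rows: (stu_reg_no, unit_code, per_centage_score).
results = [
    ("E35/1000/2013", "TFT001", 60.0),  # main exam
    ("E35/1000/2013", "TFT001", 80.0),  # course work
    ("E35/1000/2013", "TFT002", 50.0),
    ("E35/1000/2013", "TFT002", 70.0),
]

def average_by(rows, key):
    """Group rows by key(row) and average the score within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Score included in the key: every distinct score becomes its own group.
with_score = average_by(results, key=lambda r: (r[0], r[1], r[2]))
# Score left out of the key: one group per student and unit.
without_score = average_by(results, key=lambda r: (r[0], r[1]))

print(len(with_score), len(without_score))
```

With the score in the key, each group holds a single value, so the "average" is just that value and nothing is combined.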
Try this:
select stu_reg_no, unit_code, exam_unit, exam_semester, student_year,
(select sum(per_centage_score) from student_results_master t2 where t2.exam_unit = t1.exam_unit)
/(select count(per_centage_score) from student_results_master t2 where t2.exam_unit = t1.exam_unit)
from student_results_master t1
group by unit_code, exam_unit, stu_reg_no, exam_semester, student_year;

How can I limit the numbers of results being grouped in my Group By in Oracle?

I've got a table of a parameters, values, and times at which those values were recorded.
I've got a procedure which takes in a time, and needs to get the average of each parameter's values in the window that is -15/+5 seconds around that time. On top of that, I want to make sure that I take no more than 15 records before the passed-in time, and no more than 5 records after it.
For example, maybe I'm recording values of some parameters every second. If I passed in the time 21:30:30, I'd want to get the values between 21:30:15 and 21:30:35. But if I was recording every half second, I'd actually have more values that fit in that time frame than I want, and that's where my need to limit my results comes in.
I've read this question and this article which seem pretty related to what I'm trying to do, but unfortunately I'm dealing with Oracle and not MySQL, so I can't use "limit".
I've currently got something that looks like this:
with std_values as
(
select
V.ParameterId,
V.NumericValue
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime + 5/86400
)
select
ParameterId,
avg(NumericValue) as NumericValue
from
std_values
group by
ParameterId
pValueSource is just something that lets me filter down which value types I'm looking at, and pSummaryTime is the input time that I'm basing my time frame around. The goal here is to get the 15 records before pSummaryTime that fall within that window, and the 5 after that fall within that window, and use those for the average. Currently I'm not limiting the number of "before" and "after" results, though, so I end up with the average of everything that falls into that time window. And without something like "limit", I'm not sure how to do this in Oracle.
Sounds like you want a moving window aggregate function. This is part of the Analytical functions feature of Oracle.
It's not my strong suit, and since you didn't include sample tables/data to build a test case, I'll just point you to the Oracle documentation, here:
http://docs.oracle.com/cd/B14117_01/server.101/b10736/analysis.htm#i1006709
You probably want something like:
AVG(NumericValue) over (order by pSummaryTime RANGE BETWEEN 15 PRECEDING AND 5 FOLLOWING)
but, like I said, not my strong suit, and totally untested, but, I hope it gets the idea across.
Hope that helps.
Thanks to Mark Bobak's answer getting me on the right track, I ended up with this solution.
with
values_before as
(
select
V.ParameterId,
V.NumericValue,
row_number() over (Partition by V.ParameterId order by V.Time desc) as RowNumber
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time >= pSummaryTime - 15/86400
and V.Time <= pSummaryTime
),
values_after as
(
select
V.ParameterId,
V.NumericValue,
row_number() over (Partition by V.ParameterId order by V.Time asc) as RowNumber -- ascending, so the rows closest after pSummaryTime are numbered first
from
ValuesTable V
where
V.ValueSource = pValueSource
and V.Time <= pSummaryTime + 5/86400
and V.Time > pSummaryTime
),
values_all as
(
select * from values_before where RowNumber <= 15
union all
select * from values_after where RowNumber <= 5
)
select ParameterId, avg(NumericValue) from values_all group by ParameterId
No doubt there's a better way to do this, but it at least seems to be giving the correct result. The key was using an analytical function to set the row number and order for the 15 before and 5 after, and then filtering my results down to just those.
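The row-limiting logic can be checked with a small Python sketch for a single parameter. It assumes the intent is to keep the 15 readings nearest before the summary time and the 5 nearest after it; the function name, its arguments and the sample readings are hypothetical.

```python
def windowed_average(samples, t, before_s=15, after_s=5, max_before=15, max_after=5):
    """samples: list of (time_in_seconds, value) for one parameter."""
    # Readings at or before t, nearest first, capped at max_before.
    before = sorted((s for s in samples if t - before_s <= s[0] <= t),
                    key=lambda s: s[0], reverse=True)[:max_before]
    # Readings strictly after t, nearest first, capped at max_after.
    after = sorted((s for s in samples if t < s[0] <= t + after_s),
                   key=lambda s: s[0])[:max_after]
    chosen = before + after
    return sum(v for _, v in chosen) / len(chosen) if chosen else None

# Hypothetical readings every half second from t=0 to t=40, value equal to the time.
samples = [(i / 2, i / 2) for i in range(81)]
avg = windowed_average(samples, t=30)
print(avg)
```

With half-second readings the time window alone holds 41 rows, but only 20 are averaged: 23.0 through 30.0 before, and 30.5 through 32.5 after.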

How to get records randomly from the oracle database?

I need to select rows randomly from an Oracle DB.
Ex: Assume a table with 100 rows, how I can randomly return 20 of those records from the entire 100 rows.
SELECT *
FROM (
SELECT *
FROM table
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum < 21;
SAMPLE() is not guaranteed to give you exactly 20 rows, but might be suitable (and may perform significantly better than a full query + sort-by-random for large tables):
SELECT *
FROM table SAMPLE(20);
Note: the 20 here is an approximate percentage, not the number of rows desired. In this case, since you have 100 rows, to get approximately 20 rows you ask for a 20% sample.
SELECT * FROM table SAMPLE(10) WHERE ROWNUM <= 20;
This is more efficient as it doesn't need to sort the Table.
SELECT column FROM
( SELECT column, dbms_random.value FROM table ORDER BY 2 )
where rownum <= 20;
In summary, two approaches were introduced:
1) using an ORDER BY DBMS_RANDOM.VALUE clause
2) using the SAMPLE([%]) clause
The first has the advantage of correctness: you will never fail to get a result if one actually exists. With the second, you may get no result even though some rows satisfy the query condition, because rows are discarded during sampling.
The second has the advantage of efficiency: you get the result faster and put a lighter load on your database.
I was warned by a DBA that my query using the first approach put load on the database.
You can choose either approach according to your needs.
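The trade-off between the two approaches can be imitated in Python. This is a sketch under an assumption: percentage sampling is modeled as keeping each row independently with the given probability, which is not necessarily how Oracle implements SAMPLE. The 100-element list stands in for the table.

```python
import random

rng = random.Random(42)            # seeded only to make the sketch repeatable
table = list(range(1, 101))        # stand-in for a 100-row table

# Approach 1: attach a random sort key to every row, sort, keep the first 20.
# Always exactly 20 rows, but every row is read and sorted.
by_sort = [row for _, row in sorted((rng.random(), row) for row in table)][:20]

# Approach 2: keep each row independently with probability 20%.
# No sort, but the number of rows returned varies from run to run.
by_sample = [row for row in table if rng.random() < 0.20]

print(len(by_sort), len(by_sample))
```

The first list always has 20 distinct rows; the second is cheaper to produce but can come back with more or fewer than 20, and in principle with none.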
For huge tables, the standard approach of sorting by dbms_random.value is not efficient, because you need to scan the whole table, and dbms_random.value is a fairly slow function that requires context switches. For such cases, there are 3 additional methods:
1: Use sample clause:
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
for example:
select *
from s1 sample block(1)
order by dbms_random.value
fetch first 1 rows only
i.e. get 1% of all blocks, then sort them randomly and return just 1 row.
2: If you have an index/primary key on a column with a uniform distribution (values spread evenly, without large gaps), you can get the min and max values, generate a random value in this range, and take the first row with a value greater than or equal to that randomly generated value.
Example:
--big table with 1 mln rows with primary key on ID with uniform distribution:
Create table s1(id primary key,padding) as
select level, rpad('x',100,'x')
from dual
connect by level<=1e6;
select *
from s1
where id>=(select
dbms_random.value(
(select min(id) from s1),
(select max(id) from s1)
)
from dual)
order by id
fetch first 1 rows only;
3: get random table block, generate rowid and get row from the table by this rowid:
select *
from s1
where rowid = (
select
DBMS_ROWID.ROWID_CREATE (
1,
objd,
file#,
block#,
1)
from
(
select/*+ rule */ file#,block#,objd
from v$bh b
where b.objd in (select o.data_object_id from user_objects o where object_name='S1' /* table_name */)
order by dbms_random.value
fetch first 1 rows only
)
);
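Method 2 depends on the key values being evenly spread. The following Python sketch, with made-up ids, shows what happens otherwise: a row that sits just after a large gap is picked far more often than the others.

```python
import bisect
import random
from collections import Counter

def pick(ids, rng):
    """ids must be sorted; return the first id >= a random point in [min, max]."""
    target = rng.uniform(ids[0], ids[-1])
    return ids[bisect.bisect_left(ids, target)]

rng = random.Random(0)             # seeded only to make the sketch repeatable
ids = [1, 2, 3, 4, 100]            # hypothetical ids with a large gap before 100
counts = Counter(pick(ids, rng) for _ in range(10_000))
print(counts.most_common())
```

Almost every draw falls in the gap between 4 and 100 and therefore resolves to id 100, so the result is random but far from uniform over the rows.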
To randomly select 20 rows I think you'd be better off selecting the lot of them randomly ordered and selecting the first 20 of that set.
Something like:
Select *
from (select *
from table
order by dbms_random.value) -- you can also use DBMS_RANDOM.RANDOM
where rownum < 21;
Best used for small tables to avoid selecting large chunks of data only to discard most of it.
Here's how to pick a random sample out of each group:
SELECT GROUPING_COLUMN,
MIN (COLUMN_NAME) KEEP (DENSE_RANK FIRST ORDER BY DBMS_RANDOM.VALUE)
AS RANDOM_SAMPLE
FROM TABLE_NAME
GROUP BY GROUPING_COLUMN
ORDER BY GROUPING_COLUMN;
I'm not sure how efficient it is, but if you have a lot of categories and sub-categories, this seems to do the job nicely.
Q. How do you select a random 50% of the records from a table?
When you want a random percentage of the data:
SELECT *
FROM (
SELECT *
FROM table_name
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum <= (select count(*) from table_name) * 50/100;

Resources