Oracle aggregate function to return a random value for a group? - oracle

The standard SQL aggregate function max() will return the highest value in a group; min() will return the lowest.
Is there an aggregate function in Oracle to return a random value from a group? Or some technique to achieve this?
E.g., given the table foo:
group_id value
1 1
1 5
1 9
2 2
2 4
2 8
The SQL query
select group_id, max(value), min(value), some_aggregate_random_func(value)
from foo
group by group_id;
might produce:
group_id max(value), min(value), some_aggregate_random_func(value)
1 9 1 1
2 8 2 4
with, obviously, the last column being any random value in that group.

You can try something like the following
select deptno,max(sal),min(sal),max(rand_sal)
from(
select deptno,sal,first_value(sal)
over(partition by deptno order by dbms_random.value) rand_sal
from emp)
group by deptno
/
The idea is to sort the values within group in random order and pick the first.I can think of other ways but none so efficient.

You might prepend a random string to the column you want to extract the random element from, and then select the min() element of the column and take out the prepended string.
select group_id, max(value), min(value), substr(min(random_value),11)
from (select dbms_random.string('A', 10)||value random_value,foo.* from foo)
In this way you cand avoid using the aggregate function and specifying twice the group by, which might be useful in a scenario where your query is very complicated / or you are just exploring the data and are entering manually queries with a lengthy and changing list of group by columns.

Related

Allow multiple values from SSRS in oracle

I have a query that gets contract_types 1 to 10. This query is being used in an SSRS report to filter out a larger dataset. I am using -1 for nulls and -2 for all.
I would like to know how we would allow multiple values - does oracle concatenate the inputs together so '1,2,3' would be passed in? Say we get select -1,0,1 in SSRS, how could we alter the bottom query to return values?
My query to get ContractTypes:
SELECT
ContractType,
CASE WHEN ContractType = -2 THEN 'All'
WHEN ContractType = -1 THEN'Null'
ELSE to_Char(ContractType)
END AS DisplayFigure
FROM ContractTypes
which returns
ContractType DisplayFig
-1 Null
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
This currently is only returning single values or all, not muliple values:
SELECT *
FROM Employee
WHERE NVL(CONTRACT_TYPE, -1) = :contract_type or :contract_type = -2
I'm assuming we want to do something like:
WHERE NVL(CONTRACT_TYPE, -1) IN (:contract_type)
But this doesn't seem to work.
Data in Employee
Name ContractType
Bob 1
Sue 0
Bill Null
Joe 2
In my report, I want to be able to select contract_type as -1(null),0,1 using the 'allow muliple values' checkbox. At the moment, I can only select either 'all' using my -2 value, or single contract types.
My input would be: contract type = -1,1,2
My output would be Bill, Bob, Joe.
This is how I'm executing my code
I use SSRS with Oracle a lot so I see where you're coming from. Thankfully, they work pretty well together.
First make sure the parameter is set to allow multiple values. This adds a Select All option to your dropdown so you don't have to worry about adding a special case for "All". You'll want to make sure the dataset for the parameter has a row with -1 as the Value and a friendly description for the Label.
Next, the WHERE clause would be just as you mentioned:
WHERE NVL(CONTRACT_TYPE, -1) IN (:contract_type)
SSRS automatically populates the values. There is no XML or string manipulation needed. Keep in mind that this will not work with single-value parameters.
If for some reason this still doesn't work as expected in your environment, there is another workaround you can use which is more universal and works even with ODBC connections.
In the dataset parameter properties, use an expression like this to concatenate the values into a single, comma-separated string:
="," + Join(Parameters!Parameter.Value, ",") + ","
Then use an expression like this in your WHERE clause:
where :parameter like '%,' + Column + ',%'
Obviously, this is less efficient because it most likely won't be using an index, but it works.
I don't know SSRS, but - if I understood you correctly, you'll have to split that comma-separated values list into rows. Something like in this example:
SQL> select *
2 from dept
3 where deptno in (select regexp_substr('&&contract_type', '[^,]+', 1, level)
4 from dual
5 connect by level <= regexp_count('&&contract_type', ',') + 1
6 );
Enter value for contract_type: 10,20,40
DEPTNO DNAME LOC
---------- -------------------- --------------------
20 RESEARCH DALLAS
10 ACCOUNTING NEW YORK
40 OPERATIONS BOSTON
SQL>
Applied to your code:
select *
from employee
where nvl(contract_type, -1) in (select regexp_substr(:contract_type, '[^,]+', 1, level)
from dual
connect by level <= regexp_substr(:contract_type, ',') + 1
)
If you have the comma separated list of numbers and then if you like to split it then, the below seems simple and easy to maintain.
select to_number(column_value) from xmltable(:val);
Inputs: 1,2,3,4
Output:
I guess I understood your problem. If I am correct the below should solve your problem:
with inputs(Name, ContractType) as
(
select 'Bob', 1 from dual union all
select 'Sue', 0 from dual union all
select 'Bill', Null from dual union all
select 'Joe', 2 from dual
)
select *
from inputs
where decode(:ContractType,'-2',-2,nvl(ContractType,-1)) in (select to_number(column_value) from xmltable(:ContractType))
Inputs: -1,1,2
Output:
Inputs: -2
Output:

Filter out duplicate results in where clause in sql server

select * from user where username is equal to 'jw'
Add DISTINCT directly after SELECT, or alternatively, use the GROUP BY clause after your WHERE clause.
ex: GROUP BY works.memb_id
I think I understand what you are trying to do. WorkID, EmployeeID, and MembID all have to be 1:1 and then you want to group on all three fields.
You have asserted that each Memb_id is assigned to a unique employee. If WorkID is not then you will need to use an aggregate function to select one of the records returned after the group. Such as Min(), Max() or a window function like Row_Number and or Rank.
For example
WorkID, EmployeeID, MembID
1 1 1
2 1 1
3 2 2
4 1 1
If you use Min(WorkID), EmployeeID, MembID....Group By EmployeeID, MembID you will get:
WorkID, EmployeeID, MembID
1 1 1
3 2 2
If this is not what you are looking for some sample data as a whole and what results you expect would be necessary.

Count Length and then Count those records.

I am trying to create a view that displays size (char) of LastName and the total number of records whose last name has that size. So far I have:
SELECT LENGTH(LastName) AS Name_Size
FROM Table
ORDER BY Name_Size;
I need to add something like
COUNT(LENGTH(LastName)) AS Students
This is giving me an error. Do I need to add a GROUP BY command? I need the view:
Name_Size Students
3 11
4 24
5 42
SELECT LENGTH(LastName) as Name_Size, COUNT(*) as Students
FROM Table
GROUP BY Name_Size
ORDER BY Name_Size;
You may have to change the group by and order by to LENGTH(LastName) as not all SQL engines let you reference an alias from the select statement in a clause on that same statement.
HTH,
Eric

Collect to a Map in Hive

I have a Hive table such as
id | value
-------------
A 1
A 2
B 3
A 4
B 5
Essentially, I want to mimic Python's defaultdict(list) and create a map with id as the keys and value as the values.
Query:
select COLLECT_TO_A_MAP(id, value)
from table
Output:
{A:[1,2,4], B:[3,5]}
I tried using klout's CollectUDAF() but it appears this will not append the values to an array, it will just update them. Any ideas?
EDIT:
Here is a more detailed description so I can avoid answers referencing that I try functions in the Hive documentation. Suppose I have a table
num |id |value
____________________
1 A 1
1 A 2
1 B 3
2 A 4
2 B 5
2 B 6
What I am looking for is for a UDAF that provides this output
num |new_map
________________________
1 {A:[1,2], B:[3]}
2 {A:[4], B:[5,6]}
To this query
select num
,COLLECT_TO_A_MAP(id, value) as new_map
from table
group by num
There is a workaround to achieve this. It can be mimicked by using Klout's (see above referenced UDAF) CollectUDAF() in a query such as
add jar '~/brickhouse/target/brickhouse-0.6.0.jar'
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select num
,collect(id_array, value_array) as new_map
from (
select collect_list(id) as id_array
,collect_list(value) as value_array
,num
from table
group by num
) A
group by num
However, I would rather not write a nested query.
EDIT #2
(As referenced in my original question) I have already tried using Klout's CollectUDAF(), even in the instance where you pass it two parameter and it creates a map. The output from that is (if applied to the dataset in my 1st edit)
1 {A:2, B:3}
2 {A:4, B:6}
As stated in my original question, it doesn't collect the values to an array it just collects the last one (or updates the array).
Use the collect UDF in Brickhouse (http://github.com/klout/brickhouse )
It is exactly what you need. Brickhouse's 'collect' returns a list if one parameter is used, and a map if two parameters are used.
the CollectUDAF in Brickhouse (http://github.com/klout/brickhouse ) will get you there.
regarding your comment EDIT #2:
first, collect the values to a list, then collect the k,v pairs to a map:
select
num,
collectUDAF(id, values) as new_map
from
(
SELECT
num,
id,
collect_set(value) as values
FROM
tbl
GROUP BY
num,
id
) as sub
GROUP BY
num
will return
num | new_map
________________________
1 {A:[1,2], B:[3]}
2 {A:[4], B:[5,6]}
If you don't care about the order in which the values appear, you could use the collect_set() UDAF that comes with Hive.
SELECT id, collect_set(value) FROM table GROUP BY id;
This should solve your issue.
Your current query groups by num in both the inner and outer query -- you need to group by id in the inner query to accomplish what you're trying to do.
https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CollectUDAF.java#L55
see brickhouse udaf,when args num larger than 1, MapCollectUDAFEvaluator would be used.
add jar */brickhouse.jar ;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select
collect(a,b)
from( select 1232123 a,21 b
union all select 123 a,23 b)a;
result:{1232123:21,123:23}

How to get records randomly from the oracle database?

I need to select rows randomly from an Oracle DB.
Ex: Assume a table with 100 rows, how I can randomly return 20 of those records from the entire 100 rows.
SELECT *
FROM (
SELECT *
FROM table
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum < 21;
SAMPLE() is not guaranteed to give you exactly 20 rows, but might be suitable (and may perform significantly better than a full query + sort-by-random for large tables):
SELECT *
FROM table SAMPLE(20);
Note: the 20 here is an approximate percentage, not the number of rows desired. In this case, since you have 100 rows, to get approximately 20 rows you ask for a 20% sample.
SELECT * FROM table SAMPLE(10) WHERE ROWNUM <= 20;
This is more efficient as it doesn't need to sort the Table.
SELECT column FROM
( SELECT column, dbms_random.value FROM table ORDER BY 2 )
where rownum <= 20;
In summary, two ways were introduced
1) using order by DBMS_RANDOM.VALUE clause
2) using sample([%]) function
The first way has advantage in 'CORRECTNESS' which means you will never fail get result if it actually exists, while in the second way you may get no result even though it has cases satisfying the query condition since information is reduced during sampling.
The second way has advantage in 'EFFICIENT' which mean you will get result faster and give light load to your database.
I was given an warning from DBA that my query using the first way gives loads to the database
You can choose one of two ways according to your interest!
In case of huge tables standard way with sorting by dbms_random.value is not effective because you need to scan whole table and dbms_random.value is pretty slow function and requires context switches. For such cases, there are 3 additional methods:
1: Use sample clause:
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SELECT.html#GUID-CFA006CA-6FF1-4972-821E-6996142A51C6
for example:
select *
from s1 sample block(1)
order by dbms_random.value
fetch first 1 rows only
ie get 1% of all blocks, then sort them randomly and return just 1 row.
2: if you have an index/primary key on the column with normal distribution, you can get min and max values, get random value in this range and get first row with a value greater or equal than that randomly generated value.
Example:
--big table with 1 mln rows with primary key on ID with normal distribution:
Create table s1(id primary key,padding) as
select level, rpad('x',100,'x')
from dual
connect by level<=1e6;
select *
from s1
where id>=(select
dbms_random.value(
(select min(id) from s1),
(select max(id) from s1)
)
from dual)
order by id
fetch first 1 rows only;
3: get random table block, generate rowid and get row from the table by this rowid:
select *
from s1
where rowid = (
select
DBMS_ROWID.ROWID_CREATE (
1,
objd,
file#,
block#,
1)
from
(
select/*+ rule */ file#,block#,objd
from v$bh b
where b.objd in (select o.data_object_id from user_objects o where object_name='S1' /* table_name */)
order by dbms_random.value
fetch first 1 rows only
)
);
To randomly select 20 rows I think you'd be better off selecting the lot of them randomly ordered and selecting the first 20 of that set.
Something like:
Select *
from (select *
from table
order by dbms_random.value) -- you can also use DBMS_RANDOM.RANDOM
where rownum < 21;
Best used for small tables to avoid selecting large chunks of data only to discard most of it.
Here's how to pick a random sample out of each group:
SELECT GROUPING_COLUMN,
MIN (COLUMN_NAME) KEEP (DENSE_RANK FIRST ORDER BY DBMS_RANDOM.VALUE)
AS RANDOM_SAMPLE
FROM TABLE_NAME
GROUP BY GROUPING_COLUMN
ORDER BY GROUPING_COLUMN;
I'm not sure how efficient it is, but if you have a lot of categories and sub-categories, this seems to do the job nicely.
-- Q. How to find Random 50% records from table ?
when we want percent wise randomly data
SELECT *
FROM (
SELECT *
FROM table_name
ORDER BY DBMS_RANDOM.RANDOM)
WHERE rownum <= (select count(*) from table_name) * 50/100;

Resources