Hive QL - Limiting number of rows per each item - hadoop

If I have multiple items listed in a where clause How would one go about limiting the results to N for each item in the list?
EX:
select a_id,b,c, count(*), as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000

Sounds like your question is to get the top N per a_id. You can do this with a window function, introduced in Hive 11. Something like:
SELECT a_id, b, c, count(*) as sumrequests
FROM (
SELECT a_id, b, c, row_number() over (Partition BY a_id) as row
FROM table_name
) rs
WHERE row <= 10000
AND a_id in (1, 2, 3)
GROUP BY a_id, b, c;
This will output up to 10,000 randomly-chosen rows per a_id. You can partition it further if you're looking to group by more than just a_id. You can also use order by in the window functions, there are a lot of examples out there to show additional options.

Related

Merge query to distribute equal number of records between 100 users

I have two tables in Oracle, in first table I have 100 users and in second table I have 100000 records. I want to distribute equal amount of records between them.....
Instead of writing updating and using rownum <= 1000 to distribute data....I want to write merge statement that can divide equal number of records between 100 users.
Table 1
column A Column B column c
1 Pre 90008765
2 Pre 90008766 and so on like this
Table 2
column a Column B column C Column d
1 null null null
2 null null null
And so on will have 100000 records
and between these two tables column a will be common in which we can apply join condition..... please guide me with merge query
If I understand correctly these words "write merge statement that can divide equal number of records between 100 users", you want this:
merge into table2 tgt
using (
select tb.rwd, ta.a
from (select rownum rn, a, b, c, count(1) over () cnt from table1) ta
join (select rowid rwd, rownum rn, a, b, c, d from table2) tb
on mod(ta.rn, cnt) = mod(tb.rn, cnt)) src
on (tgt.rowid = src.rwd)
when matched then update set a = src.a
dbfiddle
This statement assigns rows from T1 to rows in T2 in sequence 1-2-3-...-1-2-3-..., using function mod(). Of course you can update other columns if you need, not only A.

How to write a query to select only groups where certain values are present in all rows?

I need to write a query for Oracle 11g to select only the groups which are highlighted in the image here. Specifically, there are 3 columns -- A, B, and C. Group 1 contains a null value in column C, group 2 contains several nulls in column C.Group 3 does not contain any null values in column C. Group 4 yes contains some null values in column C, and Group 5 does NOT contain any null values in column C.
I need to selects only the groups of rows that do NOT contain any null values in column C -- that would be group 3 rows and group 5 rows. I only need the group number, like return only a 3 and a 5 What SQL functions can I use to write this query? What does this query look like?
something like (pseudo code)
select colA, count(*) As cnt From tblX join to itself on something
Where ...something ...
Group By colA
(or we could skip the grouping and just select 3 and 5 or the three 3's and the three 5's)
The aggregate function COUNT(colname) would only count non-NULL values, so something like this might work:
select A from yourtable
group by A
having count(A) = count(C)
assuming A is never NULL.
You can use a not in clause and subquery
select colA, count(*) As cnt
From tblX
Where colA not in (select distinct colA from tblx where colC is null)
Group By colA

Fetching Lastrow from the table in toad

I have to fetch the first and last row of the table in Toad.
I have used the following query
select * from grade_master where rownum=(select max(rownum) from grade_master)
select * from grade_master where rownum=1
The second query works to fetch the first row. but the first not working. Anyone please help me.
Thanks in advance
Such request makes sense if you specify sort order of the results - there are no such things in database as "first" and "last" rows if sort order is not specified.
SQL> with t as (
2 select 'X' a, 1 b from dual union all
3 select 'C' , 2 from dual union all
4 select 'A' a, 3 b from dual
5 )
6 select a, b, decode(rn, 1, 'First','Last')
7 from (
8 select a, b, row_number() over(order by a) rn,
9 count(*) over() cn
10 from t
11 )
12 where rn in (1, cn)
13 order by rn
14 /
A B DECOD
- ---------- -----
A 3 First
X 1 Last
In oracle the data is not ordered until you specify the order in you sql statement.
So when you do:
select * from grade_master
oracle will give the rows in anyway it want wants.
OTOH if you do
select * from grade_master order by id desc
Then oracle will give the rows back ordered by id descending.
So to get the last row you could do this:
select *
from (select * from grade_master order by id desc)
where rownum = 1
The rownum is determined BEFORE the "order by" clause is assessed, so what this query is doing is ordering the rows descending (the inside query) and then giving this ordered set to the outer query. The outer gets the first row of the set then returns it.

How to use the union operator in oracle

I know that the union operator is used to (for example) return all rows from two tables after eliminating duplicates. Example:
SELECT a_id
FROM a
UNION
SELECT b_id
FROM b;
The result of listing of all elements in A and B eliminating duplicates is {1,2,3,4,5,6,7,8}.
If you joined A and B you would get only {4,5}. You would have to perform a full outer join to get the full list of 1-8. My question is if I wanted to use the union operator to display from a table called employees, the employee_id and job_id ( employee id being a number data type, and job_id being a VARCHAR2 data type) How would I go about doing this?
Would it be something like this: This does not run in oracle obviously,
SELECT employee_id
UNION
SELECT job_id
FROM employees;
If you really wanted to union together all the EMPLOYEE_IDs followed by all the JOB_IDs you'd use
SELECT TO_CHAR(EMPLOYEE_ID) FROM EMPLOYEES
UNION ALL
SELECT JOB_ID FROM EMPLOYEES
If you had rows with EMPLOYEE_IDs of 1, 2, and 3, and those same rows had JOB_IDs of 1, 11, and 111 you'd get a result set of six rows with a single column which would have values of
1
2
3
1
11
111
By using UNION ALL Oracle will allow the duplicates to pass through.
Share and enjoy.

Best practice for pagination in Oracle?

Problem: I need write stored procedure(s) that will return result set of a single page of rows and the number of total rows.
Solution A: I create two stored procedures, one that returns a results set of a single page and another that returns a scalar -- total rows. The Explain Plan says the first sproc has a cost of 9 and the second has a cost of 3.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) AS RowNum, ...
) AS PageResult
WHERE RowNum >= #from
AND RowNum < #to
ORDER BY RowNum
SELECT COUNT(*)
FROM ...
Solution B: I put everything in a single sproc, by adding the same TotalRows number to every row in the result set. This solution feel hackish, but has a cost of 9 and only one sproc, so I'm inclined to use this solution.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) RowNum, COUNT(*) OVER () TotalRows,
WHERE RowNum >= from
AND RowNum < to
ORDER BY RowNum;
Is there a best-practice for pagination in Oracle? Which of the aforementioned solutions is most used in practice? Is any of them considered just plain wrong? Note that my DB is and will stay relatively small (less than 10GB).
I'm using Oracle 11g and the latest ODP.NET with VS2010 SP1 and Entity Framework 4.4. I need the final solution to work within the EF 4.4. I'm sure there are probably better methods out there for pagination in general, but I need them working with EF.
If you're already using analytics (ROW_NUMBER() OVER ...) then adding another analytic function on the same partitioning will add a negligible cost to the query.
On the other hand, there are many other ways to do pagination, one of them using rownum:
SELECT *
FROM (SELECT A.*, rownum rn
FROM (SELECT *
FROM your_table
ORDER BY col) A
WHERE rownum <= :Y)
WHERE rn >= :X
This method will be superior if you have an appropriate index on the ordering column. In this case, it might be more efficient to use two queries (one for the total number of rows, one for the result).
Both methods are appropriate but in general if you want both the number of rows and a pagination set then using analytics is more efficient because you only query the rows once.
In Oracle 12C you can use limit LIMIT and OFFSET for the pagination.
Example -
Suppose you have Table tab from which data needs to be fetched on the basis of DATE datatype column dt in descending order using pagination.
page_size:=5
select * from tab
order by dt desc
OFFSET nvl(page_no-1,1)*page_size ROWS FETCH NEXT page_size ROWS ONLY;
Explanation:
page_no=1
page_size=5
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY - Fetch 1st 5 rows only
page_no=2
page_size=5
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY - Fetch next 5 rows
and so on.
Refrence Pages -
https://dba-presents.com/index.php/databases/oracle/31-new-pagination-method-in-oracle-12c-offset-fetch
https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1#paging
This may help:
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num FROM emp)
WHERE Row_Num BETWEEN 5 and 10;
A clean way to organize your SQL code could be trough WITH statement.
The reduced version implements also total number of results and total pages count.
For example
WITH SELECTION AS (
SELECT FIELDA, FIELDB, FIELDC FROM TABLE),
NUMBERED AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
SELECTION.*
FROM SELECTION)
SELECT
(SELECT COUNT(*) FROM NUMBERED) TOTAL_ROWS,
NUMBERED.*
FROM NUMBERED
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
This code gives you a paged resultset with two more fields:
TOTAL_ROWS with the total rows of your full SELECTION
RN the row number of the record
It requires 2 parameter: :page_size and :page_number to slice your SELECTION
Reduced Version
Selection implements already ROW_NUMBER() field
WITH SELECTION AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
FIELDA,
FIELDB,
FIELDC
FROM TABLE)
SELECT
:page_number PAGE_NUMBER,
CEIL((SELECT COUNT(*) FROM SELECTION ) / :page_size) TOTAL_PAGES,
:page_size PAGE_SIZE,
(SELECT COUNT(*) FROM SELECTION ) TOTAL_ROWS,
SELECTION.*
FROM SELECTION
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
Try this:
select * from ( select * from "table" order by "column" desc ) where ROWNUM > 0 and ROWNUM <= 5;
I also faced a similar issue. I tried all the above solutions and none gave me a better performance. I have a table with millions of records and I need to display them on screen in pages of 20. I have done the below to solve the issue.
Add a new column ROW_NUMBER in the table.
Make the column as primary key or add a unique index on it.
Use the population program (in my case, Informatica), to populate the column with rownum.
Fetch Records from the table using between statement. (SELECT * FROM TABLE WHERE ROW_NUMBER BETWEEN LOWER_RANGE AND UPPER_RANGE).
This method is effective if we need to do an unconditional pagination fetch on a huge table.
Sorry, this one works with sorting:
SELECT * FROM (SELECT ROWNUM rnum,a.* FROM (SELECT * FROM "tabla" order by "column" asc) a) WHERE rnum BETWEEN "firstrange" AND "lastrange";

Resources