Systematic Random Sample - Teradata SQL Ast - random

I need to produce a systematic random sample of (approx) 2500 rows from (approx)230,000 rows with a unique auto generated number on each row.
Is this possible using Teradata SQL ast? (The Sample function produces a simple random sample.)
Thank you for your time.

When there's already a gapless unique row number:
select t.*
from mytable as t
cross join
( select random(1,2500) as rnd ) as dt -- random start row
where rownumber mod 2500 = rnd -- every 2500 rows
Otherwise ROW_NUMBER can be used to create it:
select t.*
from mytable as t
cross join
( select random(1,2500) as rnd ) as dt
qualify ROW_NUMBER() OVER (ORDER BY whatever_determines_your_order) mod 2500 = rnd

select rank() over(order by $primary_index_key), t1.*
FROM
(select * from $table_name
sample 2500) t1
Assistant would do that, and so would any other client.
Same approach can be used to generate winning Powerball numbers.

From your comment, I think that you cannot do it on the fly. You need to generate some table first that would contain Step# and step_value, for example:
[1,2500],[2,5000],...[x,x*2500]
Than you should join this table to your query and limit row numbers by the logic: rn = random_seed + step_value.
It will look like this:
select
from (
select t1.*
, row_number() over("ordering logic") as rn
from my_table
) as t1
, (select random(1,2500) as random_seed) as seed --so it will be generated only once
where exists (
select 1 from sampling_table as t2
where t1.rn = t2.step_value + seed.random_seed
)

Related

How to generate a start_no and end_no of 1000 split from a table where in it contains numbers from 944900000 to 944999999 using oracle sql

The table contains numbers from 944900000 to 944999999 and i want to split these numbers into ranges of 1000 each like
944900000 to 944900999 -- 1000
944901000 to 944901999 -- 1000
..
..
944999000 to 944999999 -- 1000
is there any way to generate this through oracle SQL not with PL/SQL
You could also use this.
--Your data
CREATE TABLE your_table(your_column) AS
SELECT 944900000 + LEVEL - 1
FROM dual
CONNECT BY 944900000 + LEVEL < 944999999 + 2
;
--from 944900000 to 944999999
WITH cte AS (
SELECT your_column, CASE WHEN MOD(your_column, 1000) = 0 THEN your_column END start_range
FROM your_table
)
SELECT start_range, end_range
FROM (
SELECT start_range, CASE WHEN start_range IS NOT NULL THEN LEAD(your_column, 999)OVER(ORDER BY your_column) END end_range
FROM cte
)T
WHERE end_range IS NOT NULL /*because of the last execution of lead function in the inline view t*/
GROUP BY start_range, end_range
ORDER BY 1, 2
;
You could use ROW_NUMBER to number all records according to increasing number value. Then, compute a DENSE_RANK using the "group" of increments of 1000 to which each record belongs.
WITH cte AS (
SELECT t.*, ROW_NUMBER OVER (ORDER BY col) rn
FROM yourTable t
),
cte2 AS (
SELECT t.*, DENSE_RANK() OVER (ORDER BY FLOOR(rn / 1000)) dr
FROM cte t
)
SELECT *
FROM cte2
WHERE dr = 2; -- e.g. for 2nd partition of 1000 records

How to select data from multiple row into one row with multiple column dynamically?

I am trying to select multiple rows of data into one row through multiple columns which will change dynamically.
This is in Oracle database. I want to count repeated work done by the LEAD_TECHNISIAN_ID within a duration. If the difference of last work delivery date and new work receive date is 15 or below 15 then LEAD_TECHNISIAN_ID has one repeated work.
List item
SELECT *
FROM (WITH CTE AS (
SELECT ROW_NUMBER () OVER (ORDER BY ID) AS RW,
RECEIVED_DATE,
DELIVERY_DATE,
SERVICE_NO,
LEAD_TECHNISIAN_ID,
ID,
SERVICE_CENTER
FROM ( SELECT cc.SERVICE_CENTER,
CC.ID,
CC.BARCODE,
TRUNC (cc.CREATED_DATE) RECEIVED_DATE,
TRUNC (CC.DELIVERY_DATE) DELIVERY_DATE,
cc.SERVICE_NO,
CC.LEAD_TECHNISIAN_ID
FROM customer_complains cc
WHERE cc.BARCODE IN (SELECT BARCODE
FROM (SELECT BARCODE,
COUNT (BARCODE)
FROM customer_complains c
WHERE c.BARCODE <> 'UNDEFINE'
AND C.BARCODE = NVL ('351950102757821', BARCODE)
AND c.SEGMENT3 = NVL ('',c.SEGMENT3)
AND c.SEGMENT3 IN (SELECT SEGMENT3
FROM ITEM_MST
WHERE PRODUCT_GROUP = NVL ('',PRODUCT_GROUP))
GROUP BY c.BARCODE
HAVING COUNT (c.BARCODE) >1))
ORDER BY ID DESC)
ORDER BY ID DESC)
SELECT a.id,
a.DELIVERY_DATE,
a.RECEIVED_DATE,
b.RECEIVED_DATE PRE_RCV,
b.DELIVERY_DATE PRE_DEL,
(a.RECEIVED_DATE - b.DELIVERY_DATE) AS DIFF,
a.SERVICE_NO,
a.LEAD_TECHNISIAN_ID,
b.LEAD_TECHNISIAN_ID PRE_TECH --, a.DELIVERY_DATE
FROM CTE a
LEFT JOIN CTE b ON a.RW = b.RW + 1
)
WHERE DIFF <= 15
Here is the output for a specific barcode. but when I try for All the barcode I have in My Customer_complains table. The query provides irrelevant output.
Currently your code is giving numbers 1,2,3,4... to rows irrespective of LEAD_TECHNISIAN_ID and then you are joining it with RW. It will not consider LEAD_TECHNISIAN_ID while giving row numbers.
RW must start with 1 for each LEAD_TECHNISIAN_ID.
You just need to change calculation of RW as following:
ROW_NUMBER () OVER (PARTITION BY LEAD_TECHNISIAN_ID ORDER BY ID) AS RW
Cheers!!

Oracle DB query to get records count and calculate average afterward

Could anyone please help me regarding that Oracle query.
Here is my required output.
Thanks in advanced.from below query return some record i want to Calculate
Average means SUM(COST+DURATION)/No.of row return form that query but below query doing partition based on SITE_ID i dont want any partition,What is want is whatever query return (Sum of all Duration+sum of all Cost/No. of row returns)but without any partition is that any way i can do that
select * from (
SELECT ROW_NUMBER() OVER (ORDER BY 1) RN,
SITE_ID,
CASEID,
WOID,
DURATION,
COST,
(SUM(COST+DURATION) OVER(PARTITION BY SITE_ID))/COUNT as TotalCost from
(
SELECT COUNT(*) OVER () as COUNT,
SITE_ID,
CASEID,
WOID,
ROUND(table1.duration/3600,2) AS DURATION,
ROUND(table2.duration/3600),2) AS COST
FROM table1 ) AVG
WHERE (1=1)
)
where rn between 0 and 1000
On the basis of your revised question what I think you want is the total of COST and DURATION for each SITE, plus an average of these figures derived from the number of entries in table1 and table2. This is what this query does:
select site_id
, duration
, cost
, (duration+cost)/no_of as avg_total_cost
from (
select
table1.site_id
, count(*) as no_of
, sum(round(table1.duration/3600,2)) as duration
, sum(round((table1.labor_rate * table2.duration)/3600,2)) as cost
from table1
inner join table2
on table1.id = table2.id
group by table1.site_id
)
/
Here is a SQL Fiddle demo to prove that this query runs successfully.
Your question appears to reference other tables, site and emp, which I have ignored because you don't provide join conditions for them. Likewise I have ignored the top-n filter; you can add it back it if you want.

Error on Partition over row number

I am creating a sub-query to select distinct entries on a certain column, DIS_COL, then return all other columns for those distinct entries, arbitrarily selecting the first row.
To do this I'm creating a sub-query that selects only first rows using over - partition by, then selecting from that sub-query.
There is an error with my code however; "ORA-00923: FROM keyword not found where expected".
My code is below:
select *
from (
select *,
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE
) as rows
where row_number = 1
AND CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2';
How can I correct my code to achieve the desired result?
I am working on an Oracle database.
Remove as rows. It is not proper syntax for the table/query alias. It is syntax for column alias.
select *
from (
select T.*,
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE t
)
where row_number = 1
AND (CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2');
It's not the ROW_NUMBER, it's the *, Add an alias to the subquery:
select *
from (
select T.*, -- here
row_number() over (partition by DIS_COL order by COL_2) as row_number --ORDER BY FIELD DETERMINES WHICH ROW IS THE FIRST ROW AND THUS WHICH ONE IS SELECTED.
from MY_TABLE
)T as rows -- and here
where row_number = 1
AND CRITERIA_COL = 'CRIT_1'
OR CRITERIA_COL_2 = 'CRIT_2';

Best practice for pagination in Oracle?

Problem: I need write stored procedure(s) that will return result set of a single page of rows and the number of total rows.
Solution A: I create two stored procedures, one that returns a results set of a single page and another that returns a scalar -- total rows. The Explain Plan says the first sproc has a cost of 9 and the second has a cost of 3.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) AS RowNum, ...
) AS PageResult
WHERE RowNum >= #from
AND RowNum < #to
ORDER BY RowNum
SELECT COUNT(*)
FROM ...
Solution B: I put everything in a single sproc, by adding the same TotalRows number to every row in the result set. This solution feel hackish, but has a cost of 9 and only one sproc, so I'm inclined to use this solution.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) RowNum, COUNT(*) OVER () TotalRows,
WHERE RowNum >= from
AND RowNum < to
ORDER BY RowNum;
Is there a best-practice for pagination in Oracle? Which of the aforementioned solutions is most used in practice? Is any of them considered just plain wrong? Note that my DB is and will stay relatively small (less than 10GB).
I'm using Oracle 11g and the latest ODP.NET with VS2010 SP1 and Entity Framework 4.4. I need the final solution to work within the EF 4.4. I'm sure there are probably better methods out there for pagination in general, but I need them working with EF.
If you're already using analytics (ROW_NUMBER() OVER ...) then adding another analytic function on the same partitioning will add a negligible cost to the query.
On the other hand, there are many other ways to do pagination, one of them using rownum:
SELECT *
FROM (SELECT A.*, rownum rn
FROM (SELECT *
FROM your_table
ORDER BY col) A
WHERE rownum <= :Y)
WHERE rn >= :X
This method will be superior if you have an appropriate index on the ordering column. In this case, it might be more efficient to use two queries (one for the total number of rows, one for the result).
Both methods are appropriate but in general if you want both the number of rows and a pagination set then using analytics is more efficient because you only query the rows once.
In Oracle 12C you can use limit LIMIT and OFFSET for the pagination.
Example -
Suppose you have Table tab from which data needs to be fetched on the basis of DATE datatype column dt in descending order using pagination.
page_size:=5
select * from tab
order by dt desc
OFFSET nvl(page_no-1,1)*page_size ROWS FETCH NEXT page_size ROWS ONLY;
Explanation:
page_no=1
page_size=5
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY - Fetch 1st 5 rows only
page_no=2
page_size=5
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY - Fetch next 5 rows
and so on.
Refrence Pages -
https://dba-presents.com/index.php/databases/oracle/31-new-pagination-method-in-oracle-12c-offset-fetch
https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1#paging
This may help:
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num FROM emp)
WHERE Row_Num BETWEEN 5 and 10;
A clean way to organize your SQL code could be trough WITH statement.
The reduced version implements also total number of results and total pages count.
For example
WITH SELECTION AS (
SELECT FIELDA, FIELDB, FIELDC FROM TABLE),
NUMBERED AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
SELECTION.*
FROM SELECTION)
SELECT
(SELECT COUNT(*) FROM NUMBERED) TOTAL_ROWS,
NUMBERED.*
FROM NUMBERED
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
This code gives you a paged resultset with two more fields:
TOTAL_ROWS with the total rows of your full SELECTION
RN the row number of the record
It requires 2 parameter: :page_size and :page_number to slice your SELECTION
Reduced Version
Selection implements already ROW_NUMBER() field
WITH SELECTION AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
FIELDA,
FIELDB,
FIELDC
FROM TABLE)
SELECT
:page_number PAGE_NUMBER,
CEIL((SELECT COUNT(*) FROM SELECTION ) / :page_size) TOTAL_PAGES,
:page_size PAGE_SIZE,
(SELECT COUNT(*) FROM SELECTION ) TOTAL_ROWS,
SELECTION.*
FROM SELECTION
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
Try this:
select * from ( select * from "table" order by "column" desc ) where ROWNUM > 0 and ROWNUM <= 5;
I also faced a similar issue. I tried all the above solutions and none gave me a better performance. I have a table with millions of records and I need to display them on screen in pages of 20. I have done the below to solve the issue.
Add a new column ROW_NUMBER in the table.
Make the column as primary key or add a unique index on it.
Use the population program (in my case, Informatica), to populate the column with rownum.
Fetch Records from the table using between statement. (SELECT * FROM TABLE WHERE ROW_NUMBER BETWEEN LOWER_RANGE AND UPPER_RANGE).
This method is effective if we need to do an unconditional pagination fetch on a huge table.
Sorry, this one works with sorting:
SELECT * FROM (SELECT ROWNUM rnum,a.* FROM (SELECT * FROM "tabla" order by "column" asc) a) WHERE rnum BETWEEN "firstrange" AND "lastrange";

Resources