Query to count distinct values in Oracle db CLOB column - oracle

I would like to query an Oracle DB table for the number of rows containing each distinct value in a CLOB column.
This returns all rows containing a value:
select * from mytable where dbms_lob.instr(mycol,'value') > 0;
Using DBMS_LOB, this returns the number of rows containing that value:
select count(*) from mytable where dbms_lob.instr(mycol,'value') > 0;
But is it possible to query for the number of times (rows in which) each distinct value appears?

Depending on what that column really contains, see whether TO_CHAR helps.
SQL> create table mytable (mycol clob);
Table created.
SQL> insert into mytable
2 select 'Query to count distinct values' from dual union all
3 select 'I have no idea which values are popular' from dual;
2 rows created.
SQL> select count(*), to_char(mycol) toc
2 from mytable
3 where dbms_lob.instr(mycol,'value') > 0
4 group by to_char(mycol);
COUNT(*) TOC
---------- ----------------------------------------
1 Query to count distinct values
1 I have no idea which values are popular
SQL>

If your CLOB values are more than 4000 bytes (and if not, why are they CLOBs?) then it's not perfect - collisions are possible, if unlikely - but you could hash the CLOB values.
If you want to count the number of distinct values:
select count(distinct dbms_crypto.hash(src=>mycol, typ=>2))
from mytable
where dbms_lob.instr(mycol,'value') > 0;
If you want to count how many times each distinct value appears:
select mycol, cnt
from (
select mycol,
count(*) over (partition by dbms_crypto.hash(src=>mycol, typ=>2)) as cnt,
row_number() over (partition by dbms_crypto.hash(src=>mycol, typ=>2) order by null) as rn
from mytable
where dbms_lob.instr(mycol,'value') > 0
)
where rn = 1;
Both are likely to be fairly expensive and slow with a lot of data.
(typ=>2 gives the numeric value for dbms_crypto.hash_md5, as you can't refer to the package constant in a SQL call, at least up to 12cR1...)
Rather more crudely, but possibly significantly quicker, you could base the count on the just the first 4000 characters - which may or may not be plausible for your actual data:
select count(distinct dbms_lob.substr(mycol, 4000, 1))
from mytable
where dbms_lob.instr(mycol,'value') > 0;
select dbms_lob.substr(mycol, 4000, 1), count(*)
from mytable
where dbms_lob.instr(mycol,'value') > 0
group by dbms_lob.substr(mycol, 4000, 1);

Standard Oracle functions do not support distinction of CLOB values. But, if you have access to DBMS_CRYPTO.HASH function, you can compare CLOB hashes instead, and thus, get the desired output:
select myCol, h.num from
myTable t join
(select min(rowid) rid, count(rowid) num
from myTable
where dbms_lob.instr(mycol,'value') > 0
group by DBMS_CRYPTO.HASH(myCol, 3)) h
on t.rowid = h.rid;
Also, note, that there's a very little possibility of hash collision. But if that's ok with you, you can use this approach.

Related

(Oracle)Skip scanning table conditionally

My goal is to get average of val column in table HISTORY only when average of val columns in MYREF is NULL.
Below is my code. Is this correct way to do this? Is there an other efficient way?
Because, actual data in HISOTY are massive and I want to sure that the scanning the HISTROY will never occur if MYREF has value.
CREATE TABLE HISTORY (THEDATE VARCHAR(20),VAL NUMBER);
INSERT INTO HISTORY VALUES('20170101', 3);
INSERT INTO HISTORY VALUES('20200923', 4);
CREATE TABLE MYREF (VAL NUMBER);
INSERT INTO MYREF VALUES( NULL);
WITH MYREF_VAL AS( SELECT AVG(VAL) FROM MYREF)
,HISTORY_CAL AS (SELECT AVG(VAL) FROM HISTORY)
SELECT NVL((SELECT AVG(VAL) FROM MYREF), (SELECT AVG(VAL) FROM HISTORY)) VAL FROM DUAL
--Expected result 3.5 which is correct
Just use coalesce instead, because Oracle uses short-circuit evaluation for coalesce: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions023.htm
WITH MYREF_VAL AS( SELECT AVG(VAL) FROM MYREF)
,HISTORY_CAL AS (SELECT AVG(VAL) FROM HISTORY)
SELECT
COALESCE((SELECT AVG(VAL) FROM MYREF), (SELECT AVG(VAL) FROM HISTORY)) VAL
FROM DUAL;
Simple example:
SQL> select nvl(1, 1/0) a from dual;
select nvl(1, 1/0) a from dual
*
ERROR at line 1:
ORA-01476: divisor is equal to zero
SQL> select coalesce(1, 1/0) a from dual;
A
----------
1

Oracle APEX chart - Invalid data is displayed

I have a gauge chart that displays a value label and a percentage. it is based on a query that returns VALUE and MAX_VALUE. In some cases the query returns 0 as MAX_VALUE and then the chart displays a message 'Invalid data'. I believe this is happening because of division by zero. How to prevent this message from getting displayed and return 0 as both VALUE and MAX_VALUE and 0%?
SELECT NUM_FAILED VALUE, TOTAL_NUM MAX_VALUE
FROM
(
SELECT (
SELECT COUNT(*) FROM (select t1.name, t1.run_id
from Table1 t1
LEFT JOIN Table2 t2 ON t1.ID=t2.ID
WHERE t1.START_DATE > SYSDATE - 1
GROUP BY t1.name, t1.run_id)
) AS TOTAL_NUM,
(
SELECT COUNT(*) FROM (select name, run_id
from Table1 t1
LEFT JOIN Tabe2 t2 ON t1.ID=t2.ID
where t1.run_id IN (SELECT run_id FROM Table1 where err_code > 0)
AND t1.START_DATE > SYSDATE - 1
GROUP BY t1.name, t1.run_id)
) as NUM_FAILED
FROM dual)
Thank you for posting a query.
If you want to avoid 0 as a result, how about DECODE?
SELECT decode(NUM_FAILED, 0, 1E99, NUM_FAILED) VALUE,
decode(TOTAL_NUM , 0, 1E99, TOTAL_NUM ) MAX_VALUE
FROM (the rest of your query goes here)
You didn't say what's going on afterwards (i.e. what exactly causes division by zero - is it NUM_FAILED, or TOTAL_NUM)? The idea is: instead of dividing by zero, divide by a very large value (such as 1E99) and you'll get a very small number, for example:
SQL> select 24/1E99 from dual;
24/1E99
----------
2,4000E-98
SQL>
which is, practically, zero. See if you can do something with such an approach.

How can I update a column with PL\SQL by using a calculated value

I created a dummy database for learning purposes, and I purposefully created some duplicated records in one of the tables. In every case I want to flag one of the duplicated records as Latest='Y', and the other record as 'N', and for every single record the Latest flag would be 'Y'.
I tried to use PlSQL to go through all of my records, but when I try to use the previously calculated value (which would tell that its a duplicated record) it says that:
ORA-06550: line 20, column 17:
PLS-00201: identifier 'COUNTER' must be declared
Here is the statement I try to use:
DECLARE
CURSOR cur
IS
SELECT order_id, order_date, person_id,
amount, successfull_order, country_id, latest, ROWCOUNT AS COUNTER
FROM (SELECT order_id,
order_date,
person_id,
amount,
successfull_order,
country_id,
latest,
ROW_NUMBER () OVER (PARTITION BY order_id, order_date,
person_id, amount, successfull_order, country_id
ORDER BY order_id, order_date,
person_id, amount, successfull_order, country_id) ROWCOUNT
FROM orders) orders
FOR UPDATE OF orders.latest;
rec cur%ROWTYPE;
BEGIN
FOR rec IN cur
LOOP
IF MOD (COUNTER, 2) = 0
THEN
UPDATE orders
SET latest = 'N'
WHERE CURRENT OF cur;
ELSE
UPDATE orders
SET latest = 'Y'
WHERE CURRENT OF cur;
END IF;
END LOOP;
END;
I am new to PlSQL so I tried to modify the statements I found here:
http://www.adp-gmbh.ch/ora/plsql/cursors/for_update.html
What should I change in my statement, or should I use a different approach?
Thanks for your answers in advance!
Botond
Your refer the ROWNUM as COUNTER in your cursor.
While fetching, you should be accessing it from the cursor reference like MOD (rec.COUNTER, 2)
You need to declare the variable COUNTER and then you need to maintain (ie increment) it in your loop.
I suspect that you example is just for learning PL/SQL. However be aware that it's often much more performant to do things with a single SQL statement, as opposed to using cursor loops.
Your issue is that COUNTER is an attribute of the cursor record rec and not a PL/SQL variable. So:
IF MOD (COUNTER, 2) = 0
Should be:
IF MOD (rec.COUNTER, 2) = 0
However, you do not need to use PL/SQL or cursors, it can be done in a single MERGE statement:
Oracle Setup:
CREATE TABLE orders ( order_id, order_date, latest ) AS
SELECT 1, DATE '2017-01-01', CAST( NULL AS CHAR(1) ) FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-02', NULL FROM DUAL UNION ALL
SELECT 1, DATE '2017-01-03', NULL FROM DUAL UNION ALL
SELECT 2, DATE '2017-01-04', NULL FROM DUAL UNION ALL
SELECT 2, DATE '2017-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2017-01-06', NULL FROM DUAL;
Update Statement:
MERGE INTO orders dst
USING ( SELECT ROW_NUMBER() OVER ( PARTITION BY order_id
ORDER BY order_date DESC ) AS rn
FROM orders
) src
ON ( src.ROWID = dst.ROWID )
WHEN MATCHED THEN
UPDATE SET latest = CASE src.rn WHEN 1 THEN 'Y' ELSE 'N' END;
Output:
SELECT * FROM orders;
ORDER_ID ORDER_DATE LATEST
-------- ---------- ------
1 2017-01-01 N
1 2017-01-02 N
1 2017-01-03 Y
2 2017-01-04 Y
2 2017-01-01 N
3 2017-01-06 Y

PL/SQL select the minimum value of a vector

I have a table that looks like this :
CREATE OR REPLACE TYPE tip_orase AS VARRAY(10) of VARCHAR2(50)
/
CREATE table excursie_try (
cod_excursie NUMBER(4),
denumire VARCHAR2(20),
orase tip_orase,
status varchar2(20)
);
And i need to find out 'cod_excursie' of the entry that has in orase the lowest number of entries.
I can do this with a lot of work by counting for each entry the number of cities and selecting a minimum. Then making a query to give 'cod_excursie' of the entry that has the lowest number of entries in orase.
Is there a simpler way ? I tried something like:
select cod_excursie
from excursie_try, (select max(orase.count()) m
from excursie_try) T
where orase.count = T.m
and ROWNUM <= 1;
but it does not work. Any ideas or i have to take the LONG way ?
Try this:
select cod_excursie from (
select et.cod_excursie,
(select count(*) from table(et.orase)) n
from excursie_try et order by 2 desc
) where rownum = 1;
(select count(*) from table(et.orase)) is a single-row subquery, I used TABLE to emulate sql table on varray.
order by 2 desc in the subquery + where rownum = 1 is used for top-N reporting.

Best practice for pagination in Oracle?

Problem: I need write stored procedure(s) that will return result set of a single page of rows and the number of total rows.
Solution A: I create two stored procedures, one that returns a results set of a single page and another that returns a scalar -- total rows. The Explain Plan says the first sproc has a cost of 9 and the second has a cost of 3.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) AS RowNum, ...
) AS PageResult
WHERE RowNum >= #from
AND RowNum < #to
ORDER BY RowNum
SELECT COUNT(*)
FROM ...
Solution B: I put everything in a single sproc, by adding the same TotalRows number to every row in the result set. This solution feel hackish, but has a cost of 9 and only one sproc, so I'm inclined to use this solution.
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY D.ID DESC ) RowNum, COUNT(*) OVER () TotalRows,
WHERE RowNum >= from
AND RowNum < to
ORDER BY RowNum;
Is there a best-practice for pagination in Oracle? Which of the aforementioned solutions is most used in practice? Is any of them considered just plain wrong? Note that my DB is and will stay relatively small (less than 10GB).
I'm using Oracle 11g and the latest ODP.NET with VS2010 SP1 and Entity Framework 4.4. I need the final solution to work within the EF 4.4. I'm sure there are probably better methods out there for pagination in general, but I need them working with EF.
If you're already using analytics (ROW_NUMBER() OVER ...) then adding another analytic function on the same partitioning will add a negligible cost to the query.
On the other hand, there are many other ways to do pagination, one of them using rownum:
SELECT *
FROM (SELECT A.*, rownum rn
FROM (SELECT *
FROM your_table
ORDER BY col) A
WHERE rownum <= :Y)
WHERE rn >= :X
This method will be superior if you have an appropriate index on the ordering column. In this case, it might be more efficient to use two queries (one for the total number of rows, one for the result).
Both methods are appropriate but in general if you want both the number of rows and a pagination set then using analytics is more efficient because you only query the rows once.
In Oracle 12C you can use limit LIMIT and OFFSET for the pagination.
Example -
Suppose you have Table tab from which data needs to be fetched on the basis of DATE datatype column dt in descending order using pagination.
page_size:=5
select * from tab
order by dt desc
OFFSET nvl(page_no-1,1)*page_size ROWS FETCH NEXT page_size ROWS ONLY;
Explanation:
page_no=1
page_size=5
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY - Fetch 1st 5 rows only
page_no=2
page_size=5
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY - Fetch next 5 rows
and so on.
Refrence Pages -
https://dba-presents.com/index.php/databases/oracle/31-new-pagination-method-in-oracle-12c-offset-fetch
https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1#paging
This may help:
SELECT * FROM
( SELECT deptno, ename, sal, ROW_NUMBER() OVER (ORDER BY ename) Row_Num FROM emp)
WHERE Row_Num BETWEEN 5 and 10;
A clean way to organize your SQL code could be trough WITH statement.
The reduced version implements also total number of results and total pages count.
For example
WITH SELECTION AS (
SELECT FIELDA, FIELDB, FIELDC FROM TABLE),
NUMBERED AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
SELECTION.*
FROM SELECTION)
SELECT
(SELECT COUNT(*) FROM NUMBERED) TOTAL_ROWS,
NUMBERED.*
FROM NUMBERED
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
This code gives you a paged resultset with two more fields:
TOTAL_ROWS with the total rows of your full SELECTION
RN the row number of the record
It requires 2 parameter: :page_size and :page_number to slice your SELECTION
Reduced Version
Selection implements already ROW_NUMBER() field
WITH SELECTION AS (
SELECT
ROW_NUMBER() OVER (ORDER BY FIELDA) RN,
FIELDA,
FIELDB,
FIELDC
FROM TABLE)
SELECT
:page_number PAGE_NUMBER,
CEIL((SELECT COUNT(*) FROM SELECTION ) / :page_size) TOTAL_PAGES,
:page_size PAGE_SIZE,
(SELECT COUNT(*) FROM SELECTION ) TOTAL_ROWS,
SELECTION.*
FROM SELECTION
WHERE
RN BETWEEN ((:page_size*:page_number)-:page_size+1) AND (:page_size*:page_number)
Try this:
select * from ( select * from "table" order by "column" desc ) where ROWNUM > 0 and ROWNUM <= 5;
I also faced a similar issue. I tried all the above solutions and none gave me a better performance. I have a table with millions of records and I need to display them on screen in pages of 20. I have done the below to solve the issue.
Add a new column ROW_NUMBER in the table.
Make the column as primary key or add a unique index on it.
Use the population program (in my case, Informatica), to populate the column with rownum.
Fetch Records from the table using between statement. (SELECT * FROM TABLE WHERE ROW_NUMBER BETWEEN LOWER_RANGE AND UPPER_RANGE).
This method is effective if we need to do an unconditional pagination fetch on a huge table.
Sorry, this one works with sorting:
SELECT * FROM (SELECT ROWNUM rnum,a.* FROM (SELECT * FROM "tabla" order by "column" asc) a) WHERE rnum BETWEEN "firstrange" AND "lastrange";

Resources