How to use Scalar Subquery on a single table for lob column? - oracle

I have a below query in Oracle having duplicate rows, where file_data is a BLOB column.
SELECT attachsysfilename, file_seq, version, file_size, lastupddttm, lastupdoprid, file_data
from PS_LP_EX_FILEATTCH
I want to apply distinct clause on top of it to get unique records. But unable to do so because of BLOB column.
Can someone please help in this regards?
How can I use the Scalar subquery on file_data column to get the DISTINCT records from the table?

assuming you have a primaru key for the PS_LP_EX_FILEATTCH table's row you could rey using subquery for an aggreagted result of the related primary key
select t.*, ps.file_data
from (
SELECT min(pk) my_id
attachsysfilename
,file_seq
,version
,file_size
,lastupddttm
,lastupdoprid
from PS_LP_EX_FILEATTCH
group by attachsysfilename
,file_seq
,version
,file_size
,lastupddttm
,lastupdoprid
) t
inner join PS_LP_EX_FILEATTCH ps ON t.my_id = ps.pk

You could use a hash of the BLOB values and group by the hash along with all the other (the non-BLOB) columns, select one pk (or rowid, see discussion below) from each group, for example min(pk) or min(rowid), and then select the corresponding rows from the table.
For hashing you could use ora_hash, but that is only for school work. If this is a serious project, you probably need to use dbms_crypto.hash.
Whether this is a correct solution depends on the possibility of collisions when hashing the BLOB values. In Oracle 11.1 - 11.2 you can use SHA-1 hashes (160 bits); perhaps this is enough to distinguish between your BLOB values. In higher Oracle versions, longer hashes (up to 512 bits in my version, 12.2) are available. Obviously, the longer the hashes, the slower the query - but also the higher the likelihood that you won't incorrectly identify different BLOB values as "duplicates" due to collisions.
Other responders asked about or mentioned a primary key (pk) column or columns in your table. If you have one, you can use it instead of the rowid in my query below - but rowid should work OK for this. (Still, pk is preferred if your table has one.)
dbms_crypto.hash takes an integer argument (1, 2, 3, etc.) for the hashing algorithm to be used. These are defined as named constants in the package. Alas, in SQL you can't reference package constants; you need to find the values beforehand. (Or, in Oracle 12.1 or higher, you can do it on the fly, by including a function in a with clause - but let's keep it simple.)
So, to cover Oracle 11.1 and higher, I'll assume we want to use the SHA-1 algorithm. To find its integer value from the package, I can do this:
begin
dbms_output.put_line(dbms_crypto.hash_sh1);
end;
/
3
PL/SQL procedure successfully completed.
If your Oracle version is higher, you can check for the value of hash_sh256, for example; on my system, it's 4. Remember this number, since we will use it below.
The query is:
select {whatever columns you need, including the BLOB}
from {your table}
where rowid in (
select min(rowid)
from {your table}
group by {the non-BLOB columns},
dbms_crypto.hash({BLOB column}, 3)
)
;
Notice the number 3 used in the hash function - that's the value of dbms_crypto.hash_sh1, which we found earlier.

I used below query to get the distinct rows including BLOB column.
select
attachsysfilename,
file_seq,
version,
lastupddttm,
lastupdoprid,
file_data,
ROW_NUMBER() OVER (PARTITION BY attachsysfilename,file_seq,version,lastupddttm,lastupdoprid ORDER BY attachsysfilename ,file_seq ,version,lastupddttm DESC,lastupdoprid) RNK
from ps_lp_ex_fileattch a
) WHERE RNK=1

Related

MD5 of entire row in oracle

Below is the query to get MD5 of entire row in snowflake
SELECT MD5(TO_VARCHAR(ARRAY_CONSTRUCT(*)))FROM T
taken from here
what is the alternative query in oracle to achieve such requirement without having to put all column names manually.
You may use the packaged function dbms_sqlhash.gethash as described below, but remember:
the package was removed from the documentation (I guess in 11g), so in the recent releases this is an undocumented feature
if you calculate the hash code from more than one row you must define order (order by on a unique key(s)) . Otherwise the calculated hash is not deterministic. (This was probaly the reason of the removal)
the columns with other data types than varchar2 are converted to strings before the hash calculation, so the result is dependent on the NLS setting. You must stabilize the NLS setting to get reproducible results, e.g. with alter session set nls_date_format='dd-mm-yyyy hh24:mi:ss';
The column must be concatenated with some special delimiter (that does not occure in the data) to aviod collision: 'A' || null is the same as null || 'A'. This are unknown internals, so it is rather hard to compare the result MD5 hash with hash calculated on other (non Oracle) data.
You need extra grant to execute the package
Some additional info
Example
select * from tab where x=1;
X Y Z
---------- - -------------------
1 a 13.10.2021 00:00:00
select
dbms_sqlhash.gethash(
sqltext => 'select * from tab where x=1',
digest_type => 2 /* dbms_crypto.hash_md5 */
) MD5
from dual;
MD5
--------------------------------
215A9C4642A3691F951DD8060877D191
Order Independent Hash Code of a Table
Contrary to a file (where the order matter) in a database table the order is not relevant. It would be therefore meaningfull to have a possibility to calculate an order independent hash code of a table.
Unfortunately this feature is currently not available in Oracle, but was implemented as a prototype as described here

Oracle Merge ignore duplicates

I have a table accessible via an oracle database link, which I'm attempting to pull into a local database table because reasons.
MERGE INTO MEMBERSHIPS LOCAL
USING (
SELECT DISTINCT
REMOTE.GROUP_NAME "GROUP_NAME",
REMOTE.USER_ACCOUNT "USERNAME",
REMOTE.SOME_OTHER_COLUMN "COL3"
FROM MEMBERSHIPS#link REMOTE
) REMOTE
ON (
REMOTE.GROUP_NAME = LOCAL.GROUP_NAME AND
REMOTE.USERNAME = LOCAL.USERNAME
)
WHEN MATCHED THEN
UPDATE SET
LOCAL.COL3 = REMOTE.COL3
LOCAL.UPDATED_AT = sysdate
WHEN NOT MATCHED THEN
INSERT (ID, GROUP_NAME, USERNAME, COl3, CREATED_AT, UPDATED_AT)
VALUES (MEMBERSHIPS_SEQ.NEXTVAL, REMOTE.GROUP_NAME, REMOTE.USERNAME, REMOTE.COl3, sysdate, sysdate)
Most unfortunately, the owner of the original database has not lost much sleep worrying about data integrity, so in the 3 millionish rows, there are 71 duplicates, which blows up my Unique Index on Group Name, Username. The merge will process if I remove the uniqueness constraint, however these rows will then blow up on a subsequent execution of the query with a ORA-30926: unable to get a stable set of rows in the source tables.
This is the sort of thing that will get run on a daily basis, so I need to find a way to ignore duplicates
EDIT:
I would have thought the distinct would have took care of the problem for me, but it doesn't. I still get duplicates:
SELECT DISTINCT
REMOTE.GROUP_NAME,
REMOTE.USER_ACCOUNT
COUNT(*)
FROM MEMBERSHIPS#link REMOTE
GROUP BY
REMOTE.GROUP_NAME,
REMOTE.USER_ACCOUNT
HAVING COUNT(*) > 1;
Shows 71 GROUP_NAME/USER_ACCOUNT combinations that are still duplicated
In similar situation you can always try to rank rows to avoid duplicates, instead of DISTINCT.
In this case it would be something like that:
MERGE INTO MEMBERSHIPS LOCAL
USING (
SELECT rank() over(partition by REMOTE.GROUP_NAME
,REMOTE.USER_ACCOUNT
order by NVL(REMOTE.UPDATED_AT,REMOTE.CREATED_AT) DESC NULLS LAST) r,
REMOTE.GROUP_NAME "GROUP_NAME",
REMOTE.USER_ACCOUNT "USERNAME",
REMOTE.SOME_OTHER_COLUMN "COL3"`
FROM MEMBERSHIPS#link REMOTE ) REMOTE
ON (
REMOTE.GROUP_NAME = LOCAL.GROUP_NAME
AND REMOTE.USERNAME = LOCAL.USERNAME
AND REMOTE.r = 1
)
WHEN MATCHED THEN
UPDATE SET
LOCAL.COL3 = REMOTE.COL3
LOCAL.UPDATED_AT = sysdate
WHEN NOT MATCHED THEN
INSERT (ID, GROUP_NAME, USERNAME, COl3, CREATED_AT, UPDATED_AT)
VALUES (MEMBERSHIPS_SEQ.NEXTVAL, REMOTE.GROUP_NAME, REMOTE.USERNAME, REMOTE.COl3, sysdate, sysdate);
I tried to reproduce the problem you describe, using two local tables. With the merge as you've provided it, I have no problems caused by duplicates in the source table (in Oracle 11.2.0.4). If I remove the DISTINCT keyword from the subquery in the USING clause, then I get exactly the problems you have described - constraint violations in the first attempt, or ORA-30926 in the second attempt if I remove the unique constraint.
The two explanations I can think of for this are that (a) you are hitting some bug in Oracle, possibly involving DISTINCT in remote subqueries, or (b) the merge statement you have actually been running doesn't include that DISTINCT. (I also considered the possibility that NULL values might be causing unexpected results from the DISTINCT operations, but I couldn't come up with a way that would happen.)
EDIT:
Another half-thought-out explanation - if the two databases use different character sets, I wonder if it is possible that values that are distinct in the original table are getting converted in transit to identical values?

A fast query that selects the number of rows in each table

I want a query that selects the number of rows in each table
but they are NOT updated statistically .So such query will not be accurate:
select table_name, num_rows from user_tables
i want to select several schema and each schema has minimum 500 table some of them contain a lot of columns . it will took for me days if i want to update them .
from the site ask tom he suggest a function includes this query
'select count(*)
from ' || p_tname INTO l_columnValue;
such query with count(*) is really slow and it will not give me fast results.
Is there a query that can give me how many rows are in table in a fast way ?
You said in a comment that you want to delete (drop?) empty tables. If you don't want an exact count but only want to know if a table is empty you can do a shortcut count:
select count(*) from table_name where rownum < 2;
The optimiser will stop when it reaches the first row - the execution plan shows a 'count stopkey' operation - so it will be fast. It will return zero for an empty table, and one for a table with any data - you have no idea how much data, but you don't seem to care.
You still have a slight race condition between the count and the drop, of course.
This seems like a very odd thing to want to do - either your application uses the table, in which case dropping it will break something even if it's empty; or it doesn't, in which case it shouldn't matter whether it has (presumably redundant) and it can be dropped regardless. If you think there might be confusion, that sounds like your source (including DDL) control needs some work, maybe?
To check if either table in two schemas have a row, just count from both of them; either with a union:
select max(c) from (
select count(*) as c from schema1.table_name where rownum < 2
union all
select count(*) as c from schema2.table_name where rownum < 2
);
... or with greatest and two sub-selects, e.g.:
select greatest(
(select count(*) from schema1.table_name where rownum < 2),
(select count(*) from schema2.table_name where rownum < 2)
) from dual;
Either would return one if either table has any rows, and would only return zero f they were both empty.
Full Disclosure: I had originally suggested a query that specifically counts a column that's (a) indexed and (b) not null. #AlexPoole and #JustinCave pointed out (please see their comments below) that Oracle will optimize a COUNT(*) to do this anyway. As such, this answer has been altered significantly.
There's a good explanation here for why User_Tables shouldn't be used for accurate row counts, even when statistics are up to date.
If your tables have indexes which can be used to speed up the count by doing an index scan rather than a table scan, Oracle will use them. This will make the counts faster, though not by any means instantaneous. That said, this is the only way I know to get an accurate count.
To check for empty (zero row) tables, please use the answer posted by Alex Poole.
You could make a table to hold the counts of each table. Then, set a trigger to run on INSERT for each of the tables you're counting that updates the main table.
You'd also need to include a trigger for DELETE.

Oracle random row from table

I found this solution for selecting a random row from a table in Oracle. Actually sorting rows in a random manner, but you can fetch only the first row for a random result.
SELECT *
FROM table
ORDER BY dbms_random.value;
I just don't understand how it works. After ORDER BY it should be a column used for sorting. I see that "dbms_random.value" returns a value lower than zero. This behavior can be explained or is just like that?
Thanks
you could also think of it like this:
SELECT col1, col2, dbms_random.value
FROM table
ORDER BY 3
In this example the number 3 = the third column
When you order by dbms_random.value, Oracle orders by the expression, not for a column.For every record Oracle calculate a random number, and then order by this number.
In a similar way, is like this:
select * from emp order by upper(ename);
You have an order by based on a function.

What is the dual table in Oracle?

I've heard people referring to this table and was not sure what it was about.
It's a sort of dummy table with a single record used for selecting when you're not actually interested in the data, but instead want the results of some system function in a select statement:
e.g. select sysdate from dual;
See http://www.adp-gmbh.ch/ora/misc/dual.html
As of 23c, Oracle supports select sysdate /* or other value */, without from dual, as has been supported in MySQL for some time already.
It is a dummy table with one element in it. It is useful because Oracle doesn't allow statements like
SELECT 3+4
You can work around this restriction by writing
SELECT 3+4 FROM DUAL
instead.
From Wikipedia
History
The DUAL table was created by Chuck Weiss of Oracle corporation to provide a table for joining in internal views:
I created the DUAL table as an underlying object in the Oracle Data Dictionary. It was never meant to be seen itself, but instead used
inside a view that was expected to be queried. The idea was that you
could do a JOIN to the DUAL table and create two rows in the result
for every one row in your table. Then, by using GROUP BY, the
resulting join could be summarized to show the amount of storage for
the DATA extent and for the INDEX extent(s). The name, DUAL, seemed
apt for the process of creating a pair of rows from just one. 1
It may not be obvious from the above, but the original DUAL table had two rows in it (hence its name). Nowadays it only has one row.
Optimization
DUAL was originally a table and the database engine would perform disk IO on the table when selecting from DUAL. This disk IO was usually logical IO (not involving physical disk access) as the disk blocks were usually already cached in memory. This resulted in a large amount of logical IO against the DUAL table.
Later versions of the Oracle database have been optimized and the database no longer performs physical or logical IO on the DUAL table even though the DUAL table still actually exists.
I think this wikipedia article may help clarify.
http://en.wikipedia.org/wiki/DUAL_table
The DUAL table is a special one-row
table present by default in all Oracle
database installations. It is suitable
for use in selecting a pseudocolumn
such as SYSDATE or USER The table has
a single VARCHAR2(1) column called
DUMMY that has a value of "X"
It's the special table in Oracle. I often use it for calculations or checking system variables. For example:
Select 2*4 from dual prints out the result of the calculation
Select sysdate from dual prints the server current date.
A utility table in Oracle with only 1 row and 1 column. It is used to perform a number of arithmetic operations and can be used generally where one needs to generate a known output.
SELECT * FROM dual;
will give a single row, with a single column named "DUMMY" and a value of "X" as shown here:
DUMMY
-----
X
Kind of a pseudo table you can run commands against and get back results, such as sysdate. Also helps you to check if Oracle is up and check sql syntax, etc.
The DUAL table is a special one-row table present by default in all Oracle database installations. It is suitable for use in selecting a pseudocolumn such as SYSDATE or USER
The table has a single VARCHAR2(1) column called DUMMY that has a value of "X"
You can read all about it in http://en.wikipedia.org/wiki/DUAL_table
DUAL is necessary in PL/SQL development for using functions that are only available in SQL
e.g.
DECLARE
x XMLTYPE;
BEGIN
SELECT xmlelement("hhh", 'stuff')
INTO x
FROM dual;
END;
More Facts about the DUAL....
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1562813956388
Thrilling experiments done here, and more thrilling explanations by Tom
DUAL we mainly used for getting the next number from the sequences.
Syntax : SELECT 'sequence_name'.NEXTVAL FROM DUAL
This will return the one row one column value(NEXTVAL column name).
another situation which requires select ... from dual is when we want to retrieve the code (data definition) for different database objects (like TABLE, FUNCTION, TRIGGER, PACKAGE), using the built in DBMS_METADATA.GET_DDL function:
select DBMS_METADATA.GET_DDL('TABLE','<table_name>') from DUAL;
select DBMS_METADATA.GET_DDL('FUNCTION','<function_name>') from DUAL;
in is true that nowadays the IDEs do offer the capability to view the DDL of a table, but in simpler environments like SQL Plus this can be really handy.
EDIT
a more general situation: basically, when we need to use any PL/SQL procedure inside a standard SQL statement, or when we want to call a procedure from the command line:
select my_function(<input_params>) from dual;
both recipes are taken from the book 'Oracle PL/SQL Recipes' by Josh Juneau and Matt Arena
The DUAL is special one row, one column table present by default in all Oracle databases. The owner of DUAL is SYS.
DUAL is a table automatically created by Oracle Database along with the data functions. It is always used to get the operating systems functions(like date, time, arithmetic expression., etc.)
SELECT SYSDATE from dual;
It's a object to put in the from that return 1 empty row. For example:
select 1 from dual;
returns 1
select 21+44 from dual;
returns 65
select [sequence].nextval from dual;
returns the next value from the sequence.

Resources