Extract TEXT from a CLOB field - oracle

I have a CLOB field in my Oracle Database that store TEXT data in the following format:
__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;
I'm using TOAD and while I am creating the query I can read the CLOB field with the following:
--- To read the CLOB field.
select DBMS_LOB.substr(ADD_INFO_MASTER) from USER
This select return me the CLOB field HUMAN READABLE.
My question is: Is there any way to extract the one single value like ACCOUNT value from the line above?
Keep in mind that this CLOB field can variate and the __17__ACCOUNT= will not be in the same place every time. I need a way to extract to locate the ;;__17__ACCOUNT= (this will be a pattern) and extract the the value 37004968.
It is possible to achieve this while performing a query in TOAD?

If you want to deal with CLOB values larger than 4000 symbols length (Oracle 11g) or 32K length (Oracle 12c) then you must use DBMS_LOB package.
This package contains instr() and substr() functions which operates on LOBs.
In your case query looks like that:
with prm as (
select '__17__ACCOUNT' as fld_start from dual
)
select
dbms_lob.substr(
text,
-- length of substring
(
-- position of delimiter found after start of desired field
dbms_lob.instr(text, ';;', dbms_lob.instr(text, prm.fld_start))
-
-- position of the field description plus it's length
( dbms_lob.instr(text, prm.fld_start) + length(fld_start) + 1 )
),
-- start position of substring
dbms_lob.instr(text,prm.fld_start) + length(fld_start) + 1
)
from
text_table,
prm
Query above uses this setup:
create table text_table(text clob);
insert into text_table(text) values (
'__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;'
);
For everyday use with development tools it may be useful to define a function which returns value of field with desired name and use it instead of writing complicated expressions each time.
E.g. :
create or replace function get_field_from_text(
pi_text in clob,
pi_field_name in varchar2
) return varchar2 deterministic parallel_enable
is
v_start_pos binary_integer;
v_field_start varchar2(4000);
v_field_value varchar2(32767);
begin
if( (pi_text is null) or (pi_field_name is null) ) then
return null;
end if;
v_field_start := pi_field_name || '=';
v_start_pos := dbms_lob.instr(pi_text, v_field_start);
if(v_start_pos is null) then
return null;
end if;
v_start_pos := v_start_pos + length(v_field_start);
v_field_value := dbms_lob.substr(
pi_text,
(dbms_lob.instr(pi_text, ';;', v_start_pos) - v_start_pos),
v_start_pos
);
return v_field_value;
end;
Usage:
select get_field_from_text(text,'__17__OUTPUT_DEVICE_46') from text_table

You could use a regular expression to extract the value:
WITH your_table AS (
SELECT '__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;' clob_field FROM DUAL
)
SELECT REGEXP_SUBSTR(clob_field,'__17__ACCOUNT=.*;;')
FROM your_table
Using that you would get "__17__ACCOUNT=37004968;;". You can easily extract the value with SUBSTR.
I think that in Oracle 11g REGEXP_SUBSTR has extra parameters that would let you extract a certain group within the regular expression.

You can use INSTR and SUBSTR with CLOB datatype:
WITH T1 AS (
SELECT '__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;' TEXT FROM DUAL
)
SELECT SUBSTR(TEXT,
INSTR(TEXT, '__17__ACCOUNT=') + LENGTH('__17__ACCOUNT') + 1, -- find the first position of the value
INSTR (TEXT, ';;', INSTR(TEXT, '__17__ACCOUNT=')) - (INSTR(TEXT, '__17__ACCOUNT=') + LENGTH('__17__ACCOUNT') + 1) -- length to read. Difference between the end position (the first ;; after your placeholder) and the value start position (the same value as above)
)
FROM T1;
However I like the REGEXP solution proposed by pablomatico more.

Related

PL/SQL : Need to compare data for every field in a table in plsql

I need to create a procedure which will take collection as an input and compare the data with staging table data row by row for every field (approx 50 columns).
Business logic :
whenever a staging table column value will mismatch with the corresponding collection variable value then i need to update 'FAIL' into staging table STATUS column and reason into REASON column for that row.
If matched then need to update 'SUCCESS' in STATUS column.
Payload will be approx 500 rows in each call.
I have created below sample script:
PKG Specification :
CREATE OR REPLACE
PACKAGE process_data
IS
TYPE pass_data_rec
IS
record
(
p_eid employee.eid%type,
p_ename employee.ename%type,
p_salary employee.salary%type,
p_dept employee.dept%type
);
type p_data_tab IS TABLE OF pass_data_rec INDEX BY binary_integer;
PROCEDURE comp_data(inpt_data IN p_data_tab);
END;
PKG Body:
CREATE OR REPLACE
PACKAGE body process_data
IS
PROCEDURE comp_data (inpt_data IN p_data_tab)
IS
status VARCHAR2(10);
reason VARCHAR2(1000);
cnt1 NUMBER;
v_eid employee_copy.eid%type;
v_ename employee_copy.ename%type;
BEGIN
FOR i IN 1..inpt_data.count
LOOP
SELECT ec1.eid,ec1.ename,COUNT(*) over () INTO v_eid,v_ename,cnt1
FROM employee_copy ec1
WHERE ec1.eid = inpt_data(i).p_eid;
IF cnt1 > 0 THEN
IF (v_eid=inpt_data(i).p_eid AND v_ename = inpt_data(i).p_ename) THEN
UPDATE employee_copy SET status = 'SUCCESS' WHERE eid = inpt_data(i).p_eid;
ELSE
UPDATE employee_copy SET status = 'FAIL' WHERE eid = inpt_data(i).p_eid;
END IF;
ELSE
NULL;
END IF;
END LOOP;
COMMIT;
status :='success';
EXCEPTION
WHEN OTHERS THEN
status:= 'fail';
--reason:=sqlerrm;
END;
END;
But in this approach i have below mentioned issues.
Need to declare all local variables for each column value.
Need to compare all variable data using 'and' operator. Not sure whether it is correct way or not because if there are 50 columns then if condition will become very heavy.
IF (v_eid=inpt_data(i).p_eid AND v_ename = inpt_data(i).p_ename) THEN
Need to update REASON column when any column data mismatched (first mismatched column name) for that row, in this approach i am not able to achieve.
Please suggest any other good way to achieve this requirement.
Edit :
There is only one table at my end i.e target table. Input will come from any other source as collection object.
REVISED Answer
You could load the the records into t temp table, but unless you want additional processing it's not necessary. AFAIK there is no way to identify the offending column (first one only) without slugging through column-by-column. However, your other concern having to declare a variable is not necessary. You can declare a single variable defined as %rowtype which gives you access to each column by name.
Looping through an array of data to find the occasional error is just bad (imho) with SQL available to eliminate the good ones in one fell swoop. And it's available here. Even though your input is a array we can use as a table by using the TABLE operator, which allows an array (collection) as though it were a database table. So the MINUS operator can till be employed. The following routine will set the appropriate status and identify the first miss matched column for each entry in the input array. It reverts to your original definition in package spec, but replaces the comp_data procedure.
create or replace package body process_data
is
procedure comp_data (inpt_data in p_data_tab)
is
-- define local array to hold status and reason for ecah.
type status_reason_r is record
( eid employee_copy.eid%type
, status employee_copy.status%type
, reason employee_copy.reason%type
);
type status_reason_t is
table of status_reason_r
index by pls_integer;
status_reason status_reason_t := status_reason_t();
-- define error array to contain the eid for each that have a mismatched column
type error_eids_t is table of employee_copy.eid%type ;
error_eids error_eids_t;
current_matched_indx pls_integer;
/*
Helper function to identify 1st mismatched column in error row.
Here is where we slug our way through each column to find the first column
value mismatch. Note: There is actually validate the column sequence, but
for purpose here we'll proceed in the input data type definition.
*/
function identify_mismatch_column(matched_indx_in pls_integer)
return varchar2
is
employee_copy_row employee_copy%rowtype;
mismatched_column employee_copy.reason%type;
begin
select *
into employee_copy_row
from employee_copy
where employee_copy.eid = inpt_data(matched_indx_in).p_eid;
-- now begins the task of finding the mismatched column.
if employee_copy_row.ename != inpt_data(matched_indx_in).p_ename
then
mismatched_column := 'employee_copy.ename';
elsif employee_copy_row.salary != inpt_data(matched_indx_in).p_salary
then
mismatched_column := 'employee_copy.salary';
elsif employee_copy_row.dept != inpt_data(matched_indx_in).p_dept
then
mismatched_column := 'employee_copy.dept';
-- elsif continue until ALL columns tested
end if;
return mismatched_column;
exception
-- NO_DATA_FOUND is the one error that cannot actually be reported in the customer_copy table.
-- It occurs when an eid exista in the input data but does not exist in customer_copy.
when NO_DATA_FOUND
then
dbms_output.put_line( 'Employee (eid)='
|| inpt_data(matched_indx_in).p_eid
|| ' does not exist in employee_copy table.'
);
return 'employee_copy.eid ID is NOT in table';
end identify_mismatch_column;
/*
Helper function to find specified eid in the initial inpt_data array
Since the resulting array of mismatching eid derive from a select without sort
there is no guarantee the index values actually match. Nor can we sort to build
the error array, as there is no way to know the order of eid in the initial array.
The following helper identifies the index value in the input array for the specified
eid in error.
*/
function match_indx(eid_in employee_copy.eid%type)
return pls_integer
is
l_at pls_integer := 1;
l_searching boolean := true;
begin
while l_at <= inpt_data.count
loop
exit when eid_in = inpt_data(l_at).p_eid;
l_at := l_at + 1;
end loop;
if l_at > inpt_data.count
then
raise_application_error( -20199, 'Internal error: Find index for ' || eid_in ||' not found');
end if;
return l_at;
end match_indx;
-- Main
begin
-- initialize status table for each input enter
-- additionally this results is a status_reason table in a 1:1 with the input array.
for i in 1..inpt_data.count
loop
status_reason(i).eid := inpt_data(i).p_eid;
status_reason(i).status :='SUCCESS';
end loop;
/*
We can assume the majority of data in the input array is valid meaning the columns match.
We'll eliminate all value rows by selecting each and then MINUSing those that do match on
each column. To accomplish this cast the input with TABLE function allowing it's use in SQL.
Following produces an array of eids that have at least 1 column mismatch.
*/
select p_eid
bulk collect into error_eids
from (select p_eid, p_ename, p_salary, p_dept from TABLE(inpt_data)
minus
select eid, ename, salary, dept from employee_copy
) exs;
/*
The error_eids array now contains the eid for each miss matched data item.
Mark the status as failed, then begin the long hard process of identifying
the first column causing the mismatch.
The following loop used the nested functions to slug the way through.
This keeps the main line logic clear.
*/
for i in 1 .. error_eids.count -- if all inpt_data rows match then count is 0, we bypass the enttire loop
loop
current_matched_indx := match_indx(error_eids(i));
status_reason(current_matched_indx).status := 'FAIL';
status_reason(current_matched_indx).reason := identify_mismatch_column(current_matched_indx);
end loop;
-- update employee_copy with appropriate status for each row in the input data.
-- Except for any cid that is in the error eid table but doesn't exist in the customer_copy table.
forall i in inpt_data.first .. inpt_data.last
update employee_copy
set status = status_reason(i).status
, reason = status_reason(i).reason
where eid = inpt_data(i).p_eid;
end comp_data;
end process_data;
There are a couple other techniques used you may want to look into if you are not familiar with them:
Nested Functions. There are 2 functions defined and used in the procedure.
Bulk Processing. That is Bulk Collect and Forall.
Good Luck.
ORIGINAL Answer
It is NOT necessary to compare each column nor build a string by concatenating. As you indicated comparing 50 columns becomes pretty heavy. So let the DBMS do most of the lifting. Using the MINUS operator does exactly what you need.
... the MINUS operator, which returns only unique rows returned by the
first query but not by the second.
Using that this task needs only 2 Updates: 1 to mark "fail", and 1 to mark "success". So try:
create table e( e_id integer
, col1 varchar2(20)
, col2 varchar2(20)
);
create table stage ( e_id integer
, col1 varchar2(20)
, col2 varchar2(20)
, status varchar2(20)
, reason varchar2(20)
);
-- create package spec and body
create or replace package process_data
is
procedure comp_data;
end process_data;
create or replace package body process_data
is
package body process_data
procedure comp_data
is
begin
update stage
set status='failed'
, reason='No matching e row'
where e_id in ( select e_id
from (select e_id, col1, col2 from stage
except
select e_id, col1, col2 from e
) exs
);
update stage
set status='success'
where status is null;
end comp_data;
end process_data;
-- test
-- populate tables
insert into e(e_id, col1, col2)
select (1,'ABC','def') from dual union all
select (2,'No','Not any') from dual union all
select (3,'ok', 'best ever') from dual union all
select (4,'xx','zzzzzz') from dual;
insert into stage(e_id, col1, col2)
select (1,'ABC','def') from dual union all
select (2,'No','Not any more') from dual union all
select (4,'yy', 'zzzzzz') from dual union all
select (5,'no e','nnnnn') from dual;
-- run procedure
begin
process_data.comp_date;
end;
-- check results
select * from stage;
Don't ask. Yes, you to must list every column you wish compared in each of the queries involved in the MINUS operation.
I know the documentation link is old (10gR2), but actually finding Oracle documentation is a royal pain. But the MINUS operator still functions the same in 19c;

Alternative to REGEXP_LIKE due to regular expression too long

I am getting an ORA-12733: regular expression too long error when trying to find if certain ids already inside the database.
regexp_like (','||a.IDs||',',',('||replace(b.IDs,',','|')||'),')
a.IDs and b.IDs are in a format of something like id=16069,16070,16071,16072,16099,16100.
i will replace comma with | in b so it will tell me if any of the number is matched. The length of both a.IDs and b.IDs might vary from different queries.
Oracle regexp_like limit is only 512. Anyone know if other possible solutions?
Why on earth do you store list of numbers as string?
Anyway, one possible solution is this one. Create TYPE and FUNCTION like this:
CREATE OR REPLACE TYPE NUMBER_TABLE_TYPE AS TABLE OF NUMBER;
CREATE OR REPLACE FUNCTION SplitArray(LIST IN VARCHAR2, Separator IN VARCHAR2) RETURN NUMBER_TABLE_TYPE IS
OutTable NUMBER_TABLE_TYPE;
BEGIN
IF LIST IS NULL THEN
RETURN NULL;
ELSE
SELECT REGEXP_SUBSTR(LIST, '[^'||Separator||']+', 1, LEVEL)
BULK COLLECT INTO OutTable
FROM dual
CONNECT BY REGEXP_SUBSTR(LIST, '[^'||Separator||']+', 1, LEVEL) IS NOT NULL;
END IF;
IF OutTable.COUNT > 0 THEN
RETURN OutTable;
ELSE
RETURN NULL;
END IF;
END SplitArray;
Then you query for a single number as this:
WHERE 16071 MEMBER OF SplitArray(a.IDs, ',')
or for several numbers as this:
WHERE SplitArray(b.IDs, ',') SUBMULTISET OF SplitArray(a.IDs, ',')
Have a look at Multiset Conditions

Need help in converting the clob to varchar in oracle, I have to use the varchar in case function of oracle

I have a requirement where I have to fetch the data from the clob data type , convert to the varchar2, to make a pivot for oracle 10g.
I am using the following
select max(case
when key='abc'
then dbms_lob.substr(value)
end) as data_abc
from table.
if the value is less than 4000 the above query works fine but if it is more than 4000 it shows and error of buffer limit. On reading few blogs I came to know that dbms_lob.substr() can handle only 4000 characters in sql but can handle up to 32k in a pl/sql statement.
if I write a procedure and run it, it works fine. but i want to use it in a function. below is my function :
create or replace FUNCTION CLOBTOVARCHAR
RETURN varchar2 is out_attribute_var varchar2(32767) ;
BEGIN
FOR i IN (select attribute_Value from car_course_attribute where id=1547156)
LOOP
out_attribute_var := dbms_lob.substr(i.attribute_Value, 32000, 1);
END LOOP;
RETURN out_attribute_var;
EXCEPTION
WHEN OTHERS THEN
raise_application_error(-20001, 'An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
END CLOBTOVARCHAR;
If the data is small it works fine but if the data is bigger than 4k, it gives same error back. now I have two questions:
1) Am I doing right by converting the clob to varchar2 as I want to get pivot
2) Is my function correct?
Unfortunately SQL in Oracle supports varchars up to 4000.
Your function won't work in SQL queries.
You can upgrade to oracle 12c which increases this limit up to 32767 characters.
However there is a simple workaround that works on 11g, here is an example of CLOBs pivot for 3 columns:
SELECT (select val from xx
where rowid = a_rid ) a,
(select val from xx
where rowid = b_rid ) b,
(select val from xx
where rowid = c_rid ) c
from (
select max( case key when 'A' then rowid end ) a_rid,
max( case key when 'B' then rowid end ) b_rid,
max( case key when 'C' then rowid end ) c_rid
from xx
);
Here is SQLFiddle demo with 3 strings, each of them contains 7996 characters.
A result row in this demo is very wide, it has over 150 "horizontal pages"
I am surprised that SQLFiddle can display rows 24K characters wide
A third query in this demo display lengths of pivoted colums, each of them has 7996 characters.
dbms_lob.substr( clob_column, for_how_many_bytes, from_which_byte );
for example:
select dbms_lob.substr( x, 4000, 1 ) from T;
will get me the first 4000 bytes of the clob. Note that when using SQL as I did, the max length is 4000. You can get 32k using plsql:
declare
my_var long;
begin
for x in ( select X from t )
loop
my_var := dbms_lob.substr( x.X, 32000, 1 );
....
This was accurate as of 2016 or so. As of Oracle Database 12c, we now support extended varchars2's - up to a length of now 32k - which may be fewer than 32,000 and change characters if using Multi-Byte character set (2,3,or 4 bytes per character)
Originally answered by Tom here
I have a working scenario for an 8k long string into a varchar2 using just SQL on LiveSQL here
The code for a 12c environment with extended varchars enabled.
create table long_vars2 (a integer, xyz varchar2(8000), clobs clob);
create or replace procedure p( p_x in int, p_new_text in varchar2 )
as
begin
insert into long_vars2 (a, xyz, clobs) values ( p_x, p_new_text, p_new_text );
end;
/
exec p(1, rpad('*',8000,'*') );
select dbms_lob.substr( clobs, 7000, 1 ) from long_vars2;

Oracle: Convert xml entities in a varchar2 field to utf-8 characters

I have a field in a table which holds XML entities for special characters, since the table is in latin-1.
E.g. "Hallöle slovenčina" (the "ö" is in latin-1, but the "č" in "slovenčina" had to be converted to an entity by some application that stores the values into the database)
Now I need to export the table into a utf-8 encoded file by converting the XML entities to their original characters.
Is there a function in Oracle that might handle this for me, or do I really need to create a huge key/value map for that?
Any help is greatly appreciated.
EDIT: I found the function DBMS_XMLGEN.convert, but it only works on <,> and &. Not on &#NNN; :-(
I believe the problem with dbms_xmlgen is that there are technically only five XML entities. Your example has a numeric HTML entity, which corresponds with Unicode:
http://theorem.ca/~mvcorks/cgi-bin/unicode.pl.cgi?start=0100&end=017F
Oracle has a function UNISTR which is helpful here:
select unistr('sloven\010dina') from dual;
I've converted 269 to its hex equivalent 010d in the example above (in Unicode it is U+010D). However, you could pass a decimal number and do a conversion like this:
select unistr('sloven\' || replace(to_char(269, 'xxx'), ' ', '0') || 'ina') from dual;
EDIT: The PL/SQL solution:
Here's an example I've rigged up for you. This should loop over and replace any occurrences for each row you select out of your table(s).
create table html_entities (
id NUMBER(3),
text_row VARCHAR2(100)
);
INSERT INTO html_entities
VALUES (1, 'Hallöle slovenčina Ċ ú');
INSERT INTO html_entities
VALUES (2, 'I like the letter Ċ');
INSERT INTO html_entities
VALUES (3, 'Nothing to change here.');
DECLARE
v_replace_str NVARCHAR2(1000);
v_fh UTL_FILE.FILE_TYPE;
BEGIN
--v_fh := utl_file.fopen_nchar(LOCATION IN VARCHAR2, FILENAME IN VARCHAR2, OPEN_MODE IN VARCHAR2, MAX_LINESIZE IN BINARY_INTEGER);
FOR v_rec IN (select id, text_row from html_entities) LOOP
v_replace_str := v_rec.text_row;
WHILE (REGEXP_INSTR(v_replace_str, '&#[0-9]+;') <> 0) LOOP
v_replace_str := REGEXP_REPLACE(
v_replace_str,
'&#([0-9]+);',
unistr('\' || replace(to_char(to_number(regexp_replace(v_replace_str, '.*?&#([0-9]+);.*$', '\1')), 'xxx'), ' ', '0')),
1,
1
);
END LOOP;
-- utl_file.put_line_nchar(v_fh, v_replace_str);
dbms_output.put_line(v_replace_str);
END LOOP;
--utl_file.fclose(v_fh);
END;
/
Notice that I've stubbed in calls to the UTL_FILE function to write NVARCHAR lines (Oracle's extended character set) to a file on the database server. The dbms_output, while great for debugging, doesn't seem to support extended characters, but this shouldn't be a problem if you use UTL_FILE to write to a file. Here's the DBMS_OUTPUT:
Hallöle slovencina C ú
I like the letter C
Nothing to change here.
You can also just use the internationalization package :
UTL_I18N.unescape_reference ('text')
Works great in changing those html entities to normal characters (such as cleanup after moving a database from iso 8859P1 to UTF-8)
This should probably be done in PL/SQL which I do not know, but I wanted to see how far I could get it with pure SQL. This only replaces the first occurence of the code, so you would have to somehow run it multiple times.
select regexp_replace(s, '&#([0-9]+);', u) from
(select s, unistr('\0' || REPLACE(TO_CHAR(TO_NUMBER(c), 'xxxx'), ' ', '')) u from
(select s, regexp_replace(s, '.*&#([0-9]+);.*', '\1') c from
(select 'Hallöle slovenčina' s from dual)))
Or less readable but more usable:
SELECT
REGEXP_REPLACE(s, '&#([0-9]+);', unistr('\0' || REPLACE(TO_CHAR(TO_NUMBER(regexp_replace(s, '.*?&#([0-9]+);.*$', '\1', 1, 1)), 'xxxx'), ' ', '')), 1, 1)
FROM
(SELECT 'Hallöle slovenčina č Ė' s FROM DUAL)
This (updated) version correctly replaces the first occurrence. You need to apply it until all of them are replaced.

oracle blob text search

Is it possible to search through blob text using sql statement?
I can do select * from $table where f1 like '%foo%' if the f1 is varchar, how about f1 is a blob? Any counter part for this?
This is quite possible and easy to do.
Simply use dbms_lob.instr in conjunction with utl_raw.cast_to_raw
So in your case, if t1 is a BLOB the select would look like:
select *
from table1
where dbms_lob.instr (t1, -- the blob
utl_raw.cast_to_raw ('foo'), -- the search string cast to raw
1, -- where to start. i.e. offset
1 -- Which occurrance i.e. 1=first
) > 0 -- location of occurrence. Here I don't care. Just find any
;
If it is a Word or PDF document, look into Oracle Text.
If you are storing plain text it should be a CLOB, not a BLOB, and then you can still query using LIKE. A BLOB contains binary data that Oracle doesn't know the structure of, so it cannot search it in this way.
This works for CLOBs of any length (at least on Oracle 12C):
SQL> create table t1 (c clob);
Table created.
SQL> declare
2 x clob;
3 begin
4 for i in 1..100 loop
5 x := x || rpad('x', 32767, 'x');
6 end loop;
7 x := x || 'z';
8 for i in 1..100 loop
9 x := x || rpad('x', 32767, 'x');
10 end loop;
11 insert into t1 values (x);
12 end;
13 /
PL/SQL procedure successfully completed.
SQL> select dbms_Lob.getlength(c) from t1 where c like '%z%';
DBMS_LOB.GETLENGTH(C)
---------------------
6553401
Note that there is only one 'z' in that 6,554,401 byte CLOB - right in the middle of it:
SQL> select instr(c, 'z') from t1;
INSTR(C,'Z')
------------
3276701
the below code is to display the details from blob as text using UTL_RAW.CAST_TO_VARCHAR2 function then we use substr function to cut the text from the start of expected data till end. however, you can use instr function, LENGTH function , if you know the location of the data you are looking for
select NVL(SUBSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body),
INSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body), '<ns:xml_element>') + LENGTH('<ns:xml_element>'),
INSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body), '</ns:xml_element>') - (
INSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body), '<ns:xml_element>') + LENGTH('<ns:xml_element>'))),
utl_raw.cast_to_varchar2(DBMS_LOB.SUBSTR(blob_body))
) blob_body
from dual
where SUBSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body),
INSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body), '<ns:xml_element>') + LENGTH('<ns:xml_element>'),
INSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body), '</ns:xml_element>') - (
INSTR(UTL_RAW.CAST_TO_VARCHAR2(blob_body), '<ns:xml_element>') + LENGTH('<ns:xml_element>'))) like '%foo%';
Select * From TABLE_NAME
and dbms_lob.instr("BLOB_VARIABLE_NAME", utl_raw.cast_to_raw('search_text'), 1, 1) > 0

Resources