Oracle: Convert xml entities in a varchar2 field to utf-8 characters

I have a field in a table which holds XML entities for special characters, since the table is in latin-1.
E.g. "Hallöle sloven&#269;ina" (the "ö" is in latin-1, but the "č" in "slovenčina" had to be converted to an entity by some application that stores the values into the database)
Now I need to export the table into a utf-8 encoded file by converting the XML entities to their original characters.
Is there a function in Oracle that might handle this for me, or do I really need to create a huge key/value map for that?
Any help is greatly appreciated.
EDIT: I found the function DBMS_XMLGEN.convert, but it only works on &lt;, &gt; and &amp;, not on &#NNN; :-(

I believe the problem with dbms_xmlgen is that there are technically only five predefined XML entities. Your example has a numeric character reference, whose number is the character's Unicode code point:
http://theorem.ca/~mvcorks/cgi-bin/unicode.pl.cgi?start=0100&end=017F
Oracle has a function UNISTR which is helpful here:
select unistr('sloven\010dina') from dual;
I've converted 269 to its hex equivalent 010d in the example above (in Unicode it is U+010D). However, you could pass a decimal number and do the conversion in SQL. The 'fm0xxx' format pads the result to the four hex digits that UNISTR expects after the backslash:
select unistr('sloven\' || to_char(269, 'fm0xxx') || 'ina') from dual;
EDIT: The PL/SQL solution:
Here's an example I've rigged up for you. It loops over each row you select out of your table(s) and replaces every occurrence it finds.
create table html_entities (
id NUMBER(3),
text_row VARCHAR2(100)
);
INSERT INTO html_entities
VALUES (1, 'Hallöle sloven&#269;ina &#266; &#250;');
INSERT INTO html_entities
VALUES (2, 'I like the letter &#266;');
INSERT INTO html_entities
VALUES (3, 'Nothing to change here.');
DECLARE
v_replace_str NVARCHAR2(1000);
v_fh UTL_FILE.FILE_TYPE;
BEGIN
--v_fh := utl_file.fopen_nchar(LOCATION IN VARCHAR2, FILENAME IN VARCHAR2, OPEN_MODE IN VARCHAR2, MAX_LINESIZE IN BINARY_INTEGER);
FOR v_rec IN (select id, text_row from html_entities) LOOP
v_replace_str := v_rec.text_row;
WHILE (REGEXP_INSTR(v_replace_str, '&#[0-9]+;') <> 0) LOOP
v_replace_str := REGEXP_REPLACE(
v_replace_str,
'&#([0-9]+);',
unistr('\' || to_char(to_number(regexp_replace(v_replace_str, '.*?&#([0-9]+);.*$', '\1')), 'fm0xxx')),
1,
1
);
END LOOP;
-- utl_file.put_line_nchar(v_fh, v_replace_str);
dbms_output.put_line(v_replace_str);
END LOOP;
--utl_file.fclose(v_fh);
END;
/
Notice that I've stubbed in calls to the UTL_FILE package to write NVARCHAR2 lines (Oracle's national character set type) to a file on the database server. DBMS_OUTPUT, while great for debugging, doesn't seem to support extended characters, but this shouldn't be a problem if you use UTL_FILE to write to a file. Here's the DBMS_OUTPUT result:
Hallöle slovencina C ú
I like the letter C
Nothing to change here.
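
The decimal-to-hex step and the replacement itself can be checked outside the database. The following is a minimal Python sketch of the same logic, not Oracle code; the function name decode_numeric_entities is made up for this illustration, and it only handles decimal references of the form &#NNN;.

```python
import re

def decode_numeric_entities(text: str) -> str:
    """Replace decimal references such as &#269; with the character they name.

    Hypothetical helper for illustration; hex references (&#x10d;) and
    named entities (&amp;) are deliberately left untouched.
    """
    def to_character(match: re.Match) -> str:
        code_point = int(match.group(1))  # e.g. 269
        return chr(code_point)            # the character U+010D
    return re.sub(r"&#([0-9]+);", to_character, text)

# 269 in decimal is 010d when written as four hex digits,
# which is the form UNISTR expects after the backslash.
print(format(269, "04x"))
print(decode_numeric_entities("Hallöle sloven&#269;ina &#266; &#250;"))
```

Unlike the PL/SQL loop, the callback here converts every match in a single pass, so no outer loop is needed.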

You can also just use the internationalization package:
UTL_I18N.unescape_reference('text')
It works well for turning those HTML entities back into normal characters (for example, as cleanup after moving a database from ISO 8859-1 to UTF-8).

This should probably be done in PL/SQL, which I do not know, but I wanted to see how far I could get with pure SQL. This only replaces the first occurrence of the code, so you would have to run it multiple times somehow.
select regexp_replace(s, '&#([0-9]+);', u) from
(select s, unistr('\' || TO_CHAR(TO_NUMBER(c), 'fm0xxx')) u from
(select s, regexp_replace(s, '.*&#([0-9]+);.*', '\1') c from
(select 'Hallöle sloven&#269;ina' s from dual)))
Or less readable but more usable:
SELECT
REGEXP_REPLACE(s, '&#([0-9]+);', unistr('\' || TO_CHAR(TO_NUMBER(regexp_replace(s, '.*?&#([0-9]+);.*$', '\1', 1, 1)), 'fm0xxx')), 1, 1)
FROM
(SELECT 'Hallöle sloven&#269;ina &#269; &#278;' s FROM DUAL)
This (updated) version correctly replaces the first occurrence. You need to apply it until all of them are replaced.

Related

LISTAGG 4000 Character Limit - Result of string concatenation is too long [duplicate]

select t.name, listagg(t.text)
from user_source t
group by t.name;
I am trying to execute the code above, but since varchar2 is limited to 4000 characters it throws an error. I tried converting the listagg to XML, but I could not get past the
ORA-64451: Conversion of special character to escaped character failed.
error. I also tried the answers from other posts on various websites, including Stack Overflow.
I do not want to truncate the string, and I can't change the MAX_STRING_SIZE parameter.
The example below throws ORA-64451 as well, and I could not solve the problem.
select rtrim(
xmlagg(
xmlelement(e, to_clob(t.TEXT), '; ').extract('//text()')
).GetClobVal(),
',')
from user_source t;

The best solution I know has been posted in various places online. It basically consists of a few steps:
Create a collection type to store each text value to concatenate:
create or replace type string_array_t as table of VARCHAR2(4000);
Create a PL/SQL function which takes string_array_t as a parameter and returns the concatenated text as a CLOB:
create or replace function
string_array2clob(
p_string_array string_array_t
,p_delimiter varchar2 default ','
) RETURN CLOB IS
v_string CLOB;
BEGIN
-- inside is a loop over p_string_array to concatenate all elements
--
-- below is just a draft because in real you should use a while loop
-- to handle sparse collection and you should put some checks to handle not initialized collection
-- and other important cases
-- furthermore it's better to create additional varchar2 variable as a buffer
-- and convert that buffer to clob when it's full for a better performance
for indx in p_string_array.first..p_string_array.last loop
v_string := v_string || to_clob(p_string_array(indx) || p_delimiter);
end loop;
RETURN substr(v_string, 1, nvl(length(v_string),0) - nvl(length(p_delimiter),0));
END string_array2clob;
/
Aggregate as usual, but use cast and collect instead of listagg, then convert the result to a CLOB with the function from the step above:
select t.name, string_array2clob(cast(collect(t.text order by t.line) as string_array_t ), p_delimiter => chr(10)) as text
from user_source t
group by t.name;
If your query is not just a proof of concept and you are really trying to get the source of some object in the database, then you should read about the dbms_metadata.get_ddl function, which is made for exactly that.

ORA-01461 (with > 4k varchar2) error Only in merge statement. Insert or update works fine

Here is my problem.
I'm on Oracle 11g. I searched a lot but found nothing.
I need to execute DML operations which can contain data longer than 4k characters.
If I use a PL/SQL block directly in Oracle, like the next one, everything works fine:
declare
txtV varchar2(32000);
BEGIN
txtV:= 'MORE THAN 4k CHARS, here only few for readability' ;
Update FD_FILTERDEF
set SQLFILTER = txtV
where id='blabla';
END;
But if I use a MERGE statement, it gives me error ORA-01461:
declare
txtV varchar2(32000);
BEGIN
txtV:= '' ;
MERGE INTO FD_FILTERDEF A
USING ( select txtV C0
from dual) ST
ON (A.CODE = 'bla bla')
WHEN MATCHED THEN
Update set A.SQLFILTER = st.C0
WHEN NOT MATCHED THEN
insert (CODE ,SQLFILTER )
values ('bla bla' , ST.C0 );
END;
Any hint would be appreciated :)

Use this:
create table fd_filterdef
( code varchar2(10) primary key
, sqlfilter clob );
declare
txtv varchar2(32000);
begin
txtv := rpad('select statement, really really long', 5000, ' etc');
merge into fd_filterdef a
using (select 'bla bla' as code from dual) st
on (a.code = st.code)
when matched then
update set a.sqlfilter = txtv
when not matched then
insert (code, sqlfilter)
values (st.code,txtv);
end;
/
select code, length(sqlfilter) from fd_filterdef;
CODE LENGTH(SQLFILTER)
---------- -----------------
bla bla 5000
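Selecting your long variable from dual implicitly casts it to a SQL varchar2, which prior to 12c only holds up to 4000 bytes (and in 12c still does unless MAX_STRING_SIZE is set to EXTENDED). Referencing the PL/SQL variable directly in the update and insert clauses, as above, avoids that cast.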

You can't select a varchar2 longer than 4000 bytes in SQL: https://docs.oracle.com/cd/B28359_01/server.111/b28320/limits001.htm
See this part of your code:
select txtV C0 from dual
In Oracle 12c, however, you can:
https://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF30020

@William, great hint; what you posted works.
But I'm quite sure I had tested a very similar syntax, in which the only difference was in the select statement.
The one you provided works fine; the following one raises the error:
declare
txtV varchar2(32000);
BEGIN
txtV:= '5000 chars ....';
MERGE INTO FD_FILTERDEF A
USING ( select 'not used' Code from dual) ST
ON (A.CODE = 'TESTCODE')
WHEN MATCHED THEN
Update set A.SQLFILTER = txtV --<<<< LOOK HERE: I USE THE DECLARED VARIABLE DIRECTLY, NOT THE ONE FROM THE SELECT STATEMENT
WHEN NOT MATCHED THEN
insert (CODE ,SQLFILTER )
values (st.code , txtV );
END;

@William, OK, maybe I caused some confusion when writing down the scripts. I was surprised by the "PK error" because in my mind I had the script written as follows. I intended not to use the select statement at all, just passing the code inside the insert and update (as follows), because I build the query programmatically and replace placeholders with values.
This way there is no PK error at all. In the previous example there was, of course, because the insert used the code from the query while the match condition used the value "TESTCODE", so it was searching (and updating) for one code but inserting another :P
Sorry for my mistake :)
declare
txtV varchar2(32000);
BEGIN
txtV:= '5000 chars ....';
MERGE INTO FD_FILTERDEF A
USING ( select 'not used' Code from dual) ST
ON (A.CODE = 'TESTCODE')
WHEN MATCHED THEN
Update set A.SQLFILTER = txtV --<<<< LOOK HERE: I USE THE DECLARED VARIABLE DIRECTLY, NOT THE ONE FROM THE SELECT STATEMENT
WHEN NOT MATCHED THEN
insert (CODE ,SQLFILTER )
values ('TESTCODE' , txtV );
END;

Extract TEXT from a CLOB field

I have a CLOB field in my Oracle database that stores text data in the following format:
__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;
I'm using TOAD and while I am creating the query I can read the CLOB field with the following:
--- To read the CLOB field.
select DBMS_LOB.substr(ADD_INFO_MASTER) from USER
This select returns the CLOB field in human-readable form.
My question is: is there any way to extract a single value, such as the ACCOUNT value, from the line above?
Keep in mind that the content of this CLOB field can vary and __17__ACCOUNT= will not be in the same place every time. I need a way to locate ;;__17__ACCOUNT= (this will be the pattern) and extract the value 37004968.
Is it possible to achieve this while performing a query in TOAD?

If you want to deal with CLOB values longer than 4000 characters (Oracle 11g) or 32K (Oracle 12c), you must use the DBMS_LOB package.
This package contains instr() and substr() functions which operate on LOBs.
In your case the query looks like this:
with prm as (
select '__17__ACCOUNT' as fld_start from dual
)
select
dbms_lob.substr(
text,
-- length of substring
(
-- position of delimiter found after start of desired field
dbms_lob.instr(text, ';;', dbms_lob.instr(text, prm.fld_start))
-
-- position of the field description plus its length
( dbms_lob.instr(text, prm.fld_start) + length(fld_start) + 1 )
),
-- start position of substring
dbms_lob.instr(text,prm.fld_start) + length(fld_start) + 1
)
from
text_table,
prm
The query above uses this setup:
create table text_table(text clob);
insert into text_table(text) values (
'__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;'
);
For everyday use with development tools it may be useful to define a function which returns the value of the field with a given name, and to use it instead of writing complicated expressions each time.
E.g.:
create or replace function get_field_from_text(
pi_text in clob,
pi_field_name in varchar2
) return varchar2 deterministic parallel_enable
is
v_start_pos binary_integer;
v_field_start varchar2(4000);
v_field_value varchar2(32767);
begin
if( (pi_text is null) or (pi_field_name is null) ) then
return null;
end if;
v_field_start := pi_field_name || '=';
v_start_pos := dbms_lob.instr(pi_text, v_field_start);
-- dbms_lob.instr returns 0 (not null) when the pattern is not found
if(v_start_pos is null or v_start_pos = 0) then
return null;
end if;
v_start_pos := v_start_pos + length(v_field_start);
v_field_value := dbms_lob.substr(
pi_text,
(dbms_lob.instr(pi_text, ';;', v_start_pos) - v_start_pos),
v_start_pos
);
return v_field_value;
end;
Usage:
select get_field_from_text(text,'__17__OUTPUT_DEVICE_46') from text_table
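
The position arithmetic used in this answer (find the field marker, skip past the "=", read up to the next ";;") is easy to get off by one, so it can help to check it outside the database first. Below is a minimal Python sketch of the same steps; the name get_field and the shortened sample string are assumptions made for the illustration, and Python strings are 0-indexed whereas dbms_lob.instr is 1-indexed.

```python
from typing import Optional

def get_field(text: str, field_name: str) -> Optional[str]:
    """Return the value stored as field_name=value;; or None if absent."""
    marker = field_name + "="
    start = text.find(marker)        # -1 when the marker is missing
    if start == -1:
        return None
    start += len(marker)             # first character of the value
    end = text.find(";;", start)     # first delimiter after the value starts
    if end == -1:
        return None
    return text[start:end]

sample = ("__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;"
          "__17__ACCOUNT=37004968;;")
print(get_field(sample, "__17__ACCOUNT"))
```

Appending "=" to the field name before searching is what keeps a name such as __17__USER_TYPE from matching __17__USER_TYPE_610.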

You could use a regular expression to extract the value:
WITH your_table AS (
SELECT '__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;' clob_field FROM DUAL
)
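SELECT REGEXP_SUBSTR(clob_field,'__17__ACCOUNT=.*?;;')
FROM your_table
Using that you would get "__17__ACCOUNT=37004968;;". The non-greedy .*? stops at the first ";;", so this also works when the field is not the last one in the string. You can then extract the value with SUBSTR.
In Oracle 11g and later, REGEXP_SUBSTR takes an additional subexpression argument that returns just one capture group, for example REGEXP_SUBSTR(clob_field, '__17__ACCOUNT=(.*?);;', 1, 1, NULL, 1).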
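
Whether the wildcard is greedy matters as soon as the field is not the last one in the string. As an illustration outside Oracle, here is a small Python sketch comparing a greedy and a non-greedy capture group; the shortened sample string is made up for the example, and Python's re module is used only to show the matching behaviour.

```python
import re

# ACCOUNT is deliberately NOT the last field in this sample.
sample = "__17__ACCOUNT=37004968;;__17__USER_GROUP=S1BR22;;"

# Greedy: .* runs as far as it can, so the match ends at the LAST ";;".
greedy = re.search(r"__17__ACCOUNT=(.*);;", sample).group(1)

# Non-greedy: .*? stops at the FIRST ";;" after the field name.
lazy = re.search(r"__17__ACCOUNT=(.*?);;", sample).group(1)

print(greedy)
print(lazy)
```

Only the non-greedy form returns the account number on its own here; the greedy form also swallows the following field.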

You can use INSTR and SUBSTR with CLOB datatype:
WITH T1 AS (
SELECT '__99__RU_LOCKED=N;;__99__RU_SUSPENDED=Y;;__17__USER_TYPE=A;;__17__USER_TYPE_610=A;;__17__GUIFLAG=0;;__17__DEFAULT_LANG_610=E;;__17__OUTPUT_DEVICE_46=LOCL;;__17__PRINT_IMMED=G;;__17__DELETE_AFTER_PRINT=D;;__17__CATT=*BLANK;;__17__CATT_46=*;;__17__DEC_FORMAT=*BLANK;;__17__DEC_FORMAT_46=X;;__17__DATE_FORMAT=2;;__17__PARAMETERS=OM_OBJM_NO_DISPLAYX;;__17__MEAS_EASLPFL=0;;__17__USER_GROUP=S1BR22;;__17__VALID_FROM=20080222;;__17__VALID_UNTIL=99991231;;__17__ACCOUNT=37004968;;' TEXT FROM DUAL
)
SELECT SUBSTR(TEXT,
INSTR(TEXT, '__17__ACCOUNT=') + LENGTH('__17__ACCOUNT') + 1, -- find the first position of the value
INSTR (TEXT, ';;', INSTR(TEXT, '__17__ACCOUNT=')) - (INSTR(TEXT, '__17__ACCOUNT=') + LENGTH('__17__ACCOUNT') + 1) -- length to read. Difference between the end position (the first ;; after your placeholder) and the value start position (the same value as above)
)
FROM T1;
However I like the REGEXP solution proposed by pablomatico more.

Oracle PL/SQL function that accepts variable number of string parameters and returns comma separated

I need an Oracle PL/SQL function that accepts a variable number of string parameters and returns those strings comma separated with any null values ignored.
Can't find any examples on google.
So for example I would call:
foo('hello', null, 'world')
and it would return:
'hello, world'
or
foo('hello', 'world')

From your comment I'm assuming you have a row in a table with some null columns, which you now want as a comma-delimited string. There are much easier ways of doing this than a function. Given the following table:
create table the_table (
a varchar2(100)
, b varchar2(100)
, c varchar2(100)
, d varchar2(100)
);
insert into the_table
values ('hello',null,'world', null);
You could do this, which comma-delimits everything and then cleans up after itself:
select regexp_replace(trim(both ',' from a || ',' || b || ',' || c || ',' || d)
, ',{2,}', ',')
from the_table
To explain TRIM() a little further: its default behaviour is to remove leading and trailing spaces; however, it can be used to remove any single leading and/or trailing character using the following syntax:
trim( <TRAILING|LEADING|BOTH trim_character FROM> trim_string )
where
TRAILING|LEADING|BOTH indicates whether you want to remove trailing or leading characters, or both.
trim_character is the character you want to remove
FROM is syntactic sugar to make the entire thing make sense.
If an alternative character is specified then TRIM() does not remove leading and trailing spaces.
For example, the following would both remove leading and trailing semi-colons:
trim(both ';' from ';hello world;')
trim(';' from ';hello world;')
and this would remove leading hashes:
trim(leading '#' from '#hello world')
The documentation describes in more detail all the possible scenarios.

I'm not sure about null values, but I would recommend passing a nested table into the function:
create or replace TYPE "OT_VARCHAR_TABLE" as table of varchar2(200);
create or replace FUNCTION removeNulls(v_in_table OT_VARCHAR_TABLE) RETURN OT_VARCHAR_TABLE IS
....
Inside the function you can then simply filter the values, which is easy to do.
UPD.1
Here is full code:
create or replace FUNCTION foo(v_in_table OT_VARCHAR_TABLE) RETURN VARCHAR2 IS
vRes VARCHAR2(1000);
BEGIN
SELECT
listagg(t.column_value,',') within group (order by t.column_value)
INTO vRes
FROM TABLE (CAST(v_in_table AS OT_VARCHAR_TABLE)) t;
return vRes;
END;
/
And usage:
select foo(OT_VARCHAR_TABLE('cc','aa', null, 'bb', null , null )) from sys.dual;
Result:
aa,bb,cc
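
Note that the result above is sorted because of the "order by" inside listagg, whereas the question asks for the arguments in the order given. As a reference for the requested behaviour, here is a minimal Python sketch; the name join_non_null is made up, and treating empty strings like nulls is an assumption that mirrors how Oracle handles '' in a varchar2.

```python
from typing import Optional

def join_non_null(*values: Optional[str], sep: str = ", ") -> str:
    """Join the given strings in argument order, skipping None and ''."""
    # Oracle treats '' as NULL, so both are filtered out here (assumption
    # made to keep the sketch close to the database behaviour).
    return sep.join(v for v in values if v is not None and v != "")

print(join_non_null("hello", None, "world"))
print(join_non_null("hello", "world"))
```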

Error : ORA-01704: string literal too long

When I try to set a value of more than 4000 characters on a field that has data type CLOB, it gives me this error:
ORA-01704: string literal too long.
Which data type would be applicable if I have to store values of unlimited length? In my case the value happens to be about 15000 characters.
Note: the long string that I am trying to store is encoded in ANSI.

What are you using to operate on the CLOB?
In any case, you can do it with PL/SQL:
DECLARE
str varchar2(32767);
BEGIN
str := 'Very-very-...-very-very-very-very-very-very long string value';
update t1 set col1 = str;
END;
/

Try splitting the string into multiple chunks, as in the query below:
Insert into table (clob_column) values ( to_clob( 'chunk 1' ) || to_clob( 'chunk 2' ) );
It worked for me.

To solve this issue on my side, I had to use a combination of what was already proposed here:
DECLARE
chunk1 CLOB; chunk2 CLOB; chunk3 CLOB;
BEGIN
chunk1 := 'very long literal part 1';
chunk2 := 'very long literal part 2';
chunk3 := 'very long literal part 3';
INSERT INTO table (MY_CLOB)
SELECT ( chunk1 || chunk2 || chunk3 ) FROM dual;
END;
Hope this helps.

Splitting works as long as each chunk stays within 4000 bytes, which depends on the characters you are inserting: with special (multi-byte) characters, a 4000-character chunk can still fail.
The only reliable way is to declare a variable.

Though it's a very old question, I think sharing experience might still help others:
Large text can be saved in a single statement if we break it down into chunks of 4000 bytes/characters and concatenate them using '||'.
Running the following query will tell you:
the required number of chunks containing 4000 bytes
the remaining bytes
Since in the given example you are trying to save text containing 15000 bytes (characters):
select 15000/4000 chunk,mod(15000,4000) remaining_bytes from dual;
Result:
CHUNK REMAINING_BYTES
----- ---------------
3.75 3000
That means you need to concatenate 3 chunks of 4000 bytes and one chunk of 3000 bytes, so it would be like:
INSERT INTO <YOUR_TABLE>
VALUES (TO_CLOB('<1st_4K_bytes>') ||
TO_CLOB('<2nd_4K_bytes>') ||
TO_CLOB('<3rd_4K_bytes>') ||
TO_CLOB('<last_3K_bytes>'));
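
The chunk count is simple integer division, but the split has to respect bytes rather than characters once multi-byte characters are involved. The following Python sketch shows both; the helper name split_by_bytes, the 4000-byte default and the choice of UTF-8 are assumptions for the illustration, so adjust them to the actual database character set.

```python
from typing import List

def split_by_bytes(text: str, limit: int = 4000,
                   encoding: str = "utf-8") -> List[str]:
    """Split text into chunks whose encoded size never exceeds limit bytes.

    Characters are never cut in half: a chunk is closed as soon as the
    next character would push it over the limit.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for ch in text:
        n = len(ch.encode(encoding))
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks

# Same arithmetic as the query: full chunks and remaining bytes.
full_chunks, remaining = divmod(15000, 4000)
print(full_chunks, remaining)

# Single-byte text: three chunks of 4000 characters and one of 3000.
print([len(c) for c in split_by_bytes("a" * 15000)])

# Two-byte characters: only 2000 of them fit into a 4000-byte chunk.
print([len(c) for c in split_by_bytes("č" * 3000)])
```

Each resulting chunk can then be wrapped in its own TO_CLOB('...') call and joined with '||' as in the statement above.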

Create a function that returns a CLOB:
create function ret_long_chars return clob is
begin
return to_clob('put here long characters');
end;
update table set column = ret_long_chars;

INSERT INTO table(clob_column) SELECT TO_CLOB(q'[chunk1]') || TO_CLOB(q'[chunk2]') ||
TO_CLOB(q'[chunk3]') || TO_CLOB(q'[chunk4]') FROM DUAL;

The accepted answer did not work for me in SQL Developer, but a combination of this answer and another one did:
DECLARE
str varchar2(32767);
BEGIN
update table set column = to_clob('Very-very-...-very-very-very-very-very-very long string value');
END;
/
