Following my previous question, I don't seem to be able to convert an HTTP response from ISO-8859-1 to UTF-8.
I am using the APEX_WEB_SERVICE package to make my requests. (I know this package uses UTL_HTTP itself, so usage should be similar.)
What I do:
apex_web_service.g_request_headers(1).name  := 'Content-Type';
apex_web_service.g_request_headers(1).value := 'text/csv';

l_response := apex_web_service.make_rest_request(
    p_url         => MY_URL || '/download_csv',
    p_http_method => 'GET'
);
l_response contains the csv data but all 'é' and 'è' are replaced by '¿':
Type;Groupe Acc¿Code;EOTP autoris¿Familles EOTP autoris¿;Nom;Pr¿m;Adresse
Whereas if I access the link directly, my browser downloads it with proper encoding:
Type;Groupe Accès;Code;EOTP autorisés;Familles EOTP autorisées;Nom;Prénom;Adresse
I tried to convert the response with:
l_response := convert(l_response, 'AL16UTF16', 'WE8ISO8859P1');
But it has absolutely no effect.
The website is https://www.stocknet.fr/ and is in ISO-8859-1.
My Oracle NLS parameters (which I obviously can't modify):
+-------------------------+-----------------------------+
| PARAMETER | VALUE |
+-------------------------+-----------------------------+
| NLS_LANGUAGE | ENGLISH |
| NLS_TERRITORY | UNITED KINGDOM |
| NLS_CURRENCY | £ |
| NLS_ISO_CURRENCY | UNITED KINGDOM |
| NLS_NUMERIC_CHARACTERS | ., |
| NLS_CALENDAR | GREGORIAN |
| NLS_DATE_FORMAT | DD-MON-RR HH24:MI |
| NLS_DATE_LANGUAGE | ENGLISH |
| NLS_CHARACTERSET | WE8MSWIN1252 |
| NLS_SORT | BINARY |
| NLS_TIME_FORMAT | HH24.MI.SSXFF |
| NLS_TIMESTAMP_FORMAT | DD-MON-RR HH24.MI.SSXFF |
| NLS_TIME_TZ_FORMAT | HH24.MI.SSXFF TZR |
| NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH24.MI.SSXFF TZR |
| NLS_DUAL_CURRENCY | € |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 |
| NLS_COMP | BINARY |
| NLS_LENGTH_SEMANTICS | BYTE |
| NLS_NCHAR_CONV_EXCP | FALSE |
+-------------------------+-----------------------------+
At this point, I don't know if there is any way to process data from this website in PL/SQL. Any help, tips or suggestions would be appreciated.
Turns out I had to use UTL_HTTP and DBMS_LOB directly in order for Oracle to handle the characters correctly.
This solved my issue:
DECLARE
    l_clob          CLOB;
    l_http_request  utl_http.req;
    l_http_response utl_http.resp;
    l_text          VARCHAR2(32767);
BEGIN
    dbms_lob.createtemporary(l_clob, false);

    l_http_request  := utl_http.begin_request(my_url || '/download_csv');
    l_http_response := utl_http.get_response(l_http_request);

    -- read the response body in chunks and append them to the temporary CLOB
    BEGIN
        LOOP
            utl_http.read_text(l_http_response, l_text, 32766);
            dbms_lob.writeappend(l_clob, length(l_text), l_text);
        END LOOP;
    EXCEPTION
        WHEN utl_http.end_of_body THEN
            utl_http.end_response(l_http_response);
    END;

    dbms_output.put_line(l_clob); /* => ENCODING IS FINALLY GOOD ! */
    dbms_lob.freetemporary(l_clob);
EXCEPTION
    WHEN OTHERS THEN
        utl_http.end_response(l_http_response);
        dbms_lob.freetemporary(l_clob);
        RAISE;
END;
/
I hope this could help someone else.
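For reference, a variant I believe should also work (I have not run it against this particular site, so treat it as a sketch) is to tell UTL_HTTP explicitly which character set the response body uses, so the conversion to the database character set happens while reading:
DECLARE
    l_clob          CLOB;
    l_http_request  utl_http.req;
    l_http_response utl_http.resp;
    l_text          VARCHAR2(32767);
BEGIN
    dbms_lob.createtemporary(l_clob, false);
    l_http_request  := utl_http.begin_request(my_url || '/download_csv');
    l_http_response := utl_http.get_response(l_http_request);

    -- declare the body charset so read_text converts from ISO-8859-1
    -- to the database character set on the fly
    utl_http.set_body_charset(l_http_response, 'ISO-8859-1');

    BEGIN
        LOOP
            utl_http.read_text(l_http_response, l_text, 32766);
            dbms_lob.writeappend(l_clob, length(l_text), l_text);
        END LOOP;
    EXCEPTION
        WHEN utl_http.end_of_body THEN
            utl_http.end_response(l_http_response);
    END;

    dbms_output.put_line(l_clob);
    dbms_lob.freetemporary(l_clob);
END;
/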
I'm facing a very strange issue where Oracle 12c does not handle 2-byte characters the way Oracle 11g does, which leads to problems with functions like LPAD.
We have two databases, one 11g and one 12c, with identical NLS parameters. While 11g treats Cyrillic characters as 1 byte in functions like LPAD, 12c treats them as 2 bytes, which causes problems: if a value needs to be 40 characters long, every Cyrillic character in it counts as 2 while it is being padded but is displayed as 1 character, so 5 Cyrillic characters LPADded to 40 actually produce a value of length 35.
This behaviour is described in the official Oracle documentation (https://docs.oracle.com/database/121/SQLRF/functions107.htm#SQLRF00663), but it has been documented this way for several versions (including 11g), so it's unclear to me why these two versions behave differently with the same settings and, if this is expected, how to deal with it.
Important notes:
both databases handle European characters (including special characters from some eastern European alphabets, Greek, etc.) and Russian (Cyrillic) characters, so switching the territory to "RUSSIA" is not really an option;
using nvarchar2 instead of varchar2 solves the issue (it switches to the national character set, which is UTF-16), but it would mean converting every varchar2 column in a 4 TB database to nvarchar2, which is quite troublesome and might waste a LOT of space;
the problem occurs in stored procedures working on data already stored in the database, so this doesn't look like a client misconfiguration.
Database properties for NLS parameters (I've removed date and currency formats since they're not really relevant):
+-----------------------------------+------------+------------+
| Parameter | 12c | 11g |
+-----------------------------------+------------+------------+
| NLS_CHARACTERSET | AL32UTF8 | AL32UTF8 |
| NLS_COMP | BINARY | BINARY |
| NLS_DATE_LANGUAGE | AMERICAN | AMERICAN |
| NLS_ISO_CURRENCY | AMERICA | AMERICA |
| NLS_LANGUAGE | AMERICAN | AMERICAN |
| NLS_LENGTH_SEMANTICS | BYTE | BYTE |
| NLS_NCHAR_CHARACTERSET | AL16UTF16 | AL16UTF16 |
| NLS_NCHAR_CONV_EXCP | FALSE | FALSE |
| NLS_NUMERIC_CHARACTERS | ., | ., |
| NLS_RDBMS_VERSION | 12.1.0.2.0 | 11.2.0.4.0 |
| NLS_SORT | BINARY | BINARY |
| NLS_TERRITORY | AMERICA | AMERICA |
+-----------------------------------+------------+------------+
V$PARAMETER values (same parameters, with date formats removed):
+-----------------------------------+----------------+----------------+
| Parameter | 12c | 11g |
+-----------------------------------+----------------+----------------+
| NLS_COMP | BINARY | BINARY |
| NLS_DATE_LANGUAGE | ENGLISH | ENGLISH |
| NLS_ISO_CURRENCY | UNITED KINGDOM | UNITED KINGDOM |
| NLS_LANGUAGE | ENGLISH | ENGLISH |
| NLS_LENGTH_SEMANTICS | CHAR | CHAR |
| NLS_NCHAR_CONV_EXCP | FALSE | FALSE |
| NLS_NUMERIC_CHARACTERS | ., | ., |
| NLS_SORT | BINARY | BINARY |
| NLS_TERRITORY | UNITED KINGDOM | UNITED KINGDOM |
+-----------------------------------+----------------+----------------+
Example from the 12c database:
SELECT 'This is a test данные испытаний' as "Original",
lpad(nvl('This is a test данные испытаний', ' '), 40) as "LPADded",
lpad(nvl('данные испытаний', ' '), 40) as "Cyrillic only",
lpad(nvl('This is a test', ' '), 40) as "Non-cyrillic only",
lpad(nvl(to_nchar('данные испытаний'), ' '), 40) as "NChar cyrillic only",
lpad(nvl(to_nchar('This is a test данные испытаний'), ' '), 40) as "NChar mixed"
FROM dual;
Results:
This is a test данные испытаний (original - 31 chars)
This is a test данные испыта (std lpad - 28 chars)
данные испытаний (std lpad cyrillic only - 25 chars)
This is a test (std lpad non-cyrillic only - 40 chars)
данные испытаний (nchar lpad cyrillic only - 40 chars)
This is a test данные испытаний (nchar lpad mixed - 40 chars)
In the 11g database, all the above (except, of course, the original) have a length of 40 chars.
Thanks
I think the problem is related to ambiguous-width characters in Unicode. You can find a description here:
http://unicode.org/reports/tr11/#Ambiguous
In Oracle, the lengthc function always returns the actual length of the string in characters, while the lengthb function returns the number of bytes it occupies.
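For example, on an AL32UTF8 database like the one in the question, a single Cyrillic letter already shows the difference (a quick check; the expected values simply follow from the 2-byte UTF-8 encoding of 'д'):
select lengthc('д') as char_len, lengthb('д') as byte_len from dual;
-- char_len = 1, byte_len = 2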
A possible solution could be to use the following form. I tried it with UNISTR('\4F4F'), which takes up 2 bytes:
select lpad('pippo'||UNISTR('\4F4F'),10+lengthc(UNISTR('\4F4F')),'x') from dual;
and the displayed length is the desired one.
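If, as in the original question, what matters is the number of characters in the result rather than its display width, another sketch (my own, not part of the answer above) is to build the padding yourself, so LPAD's display-width accounting never comes into play:
with t as (select 'данные испытаний' as s from dual)
select lpad(' ', 40 - length(s)) || s         as padded,
       length(lpad(' ', 40 - length(s)) || s) as char_count
from t;
-- char_count is 40 on both versions, since LENGTH counts characters;
-- unlike LPAD, this sketch does not truncate strings longer than 40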
I'm new to SQL*Plus.
I tried
ALTER SESSION SET NLS_TERRITORY = "United Kingdom";
to change the currency to £. It worked once, but after I changed it to
ALTER SESSION SET NLS_TERRITORY = "Japan";
and then changed it back to
ALTER SESSION SET NLS_TERRITORY = "United Kingdom";
running
SELECT TO_CHAR(1111, 'L9999') FROM DUAL;
outputs #1111.
How can I fix this? Looking at v$nls_parameters, this only happens when I set the territory to the UK.
[TL;DR]
One solution may be to upgrade to Oracle 19.
Another may be, if you are using the Windows command prompt, to run:
set NLS_LANG=ENGLISH_UNITED KINGDOM.we8pc850
before starting SQL*Plus, to set the code page used for displaying characters.
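For completeness, a typical sequence on a Windows client might look like this (the connect string is a placeholder, and code page 850 is used because it matches the WE8PC850 character set named in NLS_LANG):
C:\> chcp 850
C:\> set NLS_LANG=ENGLISH_UNITED KINGDOM.WE8PC850
C:\> sqlplus scott/tiger@mydb

SQL> ALTER SESSION SET NLS_TERRITORY = "UNITED KINGDOM";
SQL> SELECT TO_CHAR(1111, 'FML9999') FROM DUAL;
The point is that the console code page (chcp) and the character-set part of NLS_LANG have to agree; otherwise SQL*Plus cannot display the £ sign correctly.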
This works on Oracle's LiveSQL, which is running Oracle 19c:
ALTER SESSION SET NLS_TERRITORY = "UNITED KINGDOM";
SELECT TO_CHAR(1111, 'FML9999') FROM DUAL;
Which outputs:
TO_CHAR(1111,'FML9999')
-----------------------
£1111
and:
select *
from nls_database_parameters
ORDER BY parameter
outputs:
PARAMETER VALUE
----------------------- ----------------------------
NLS_CALENDAR GREGORIAN
NLS_CHARACTERSET AL32UTF8
NLS_COMP BINARY
NLS_CURRENCY $
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_DUAL_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_LANGUAGE AMERICAN
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_NCHAR_CONV_EXCP FALSE
NLS_NUMERIC_CHARACTERS .,
NLS_RDBMS_VERSION 19.0.0.0.0
NLS_SORT BINARY
NLS_TERRITORY AMERICA
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
and
select *
from nls_session_parameters
ORDER BY parameter
outputs:
PARAMETER VALUE
----------------------- ----------------------------
NLS_CALENDAR GREGORIAN
NLS_COMP BINARY
NLS_CURRENCY £
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_DUAL_CURRENCY €
NLS_ISO_CURRENCY UNITED KINGDOM
NLS_LANGUAGE AMERICAN
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NUMERIC_CHARACTERS .,
NLS_SORT BINARY
NLS_TERRITORY UNITED KINGDOM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH24.MI.SSXFF
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH24.MI.SSXFF TZR
NLS_TIME_FORMAT HH24.MI.SSXFF
NLS_TIME_TZ_FORMAT HH24.MI.SSXFF TZR
LiveSQL here
However, running the same script in db<>fiddle, which is running Oracle 18c, outputs:
| TO_CHAR(1111,'FML9999') |
| :---------------------- |
| #1111 |
and the database NLS parameters are:
PARAMETER | VALUE
:---------------------- | :---------------------------
NLS_CALENDAR | GREGORIAN
NLS_CHARACTERSET | AL32UTF8
NLS_COMP | BINARY
NLS_CURRENCY | $
NLS_DATE_FORMAT | DD-MON-RR
NLS_DATE_LANGUAGE | AMERICAN
NLS_DUAL_CURRENCY | $
NLS_ISO_CURRENCY | AMERICA
NLS_LANGUAGE | AMERICAN
NLS_LENGTH_SEMANTICS | BYTE
NLS_NCHAR_CHARACTERSET | AL16UTF16
NLS_NCHAR_CONV_EXCP | FALSE
NLS_NUMERIC_CHARACTERS | .,
NLS_RDBMS_VERSION | 18.0.0.0.0
NLS_SORT | BINARY
NLS_TERRITORY | AMERICA
NLS_TIMESTAMP_FORMAT | DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT | HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT | HH.MI.SSXFF AM TZR
and the session parameters are:
PARAMETER | VALUE
:---------------------- | :--------------------------
NLS_CALENDAR | GREGORIAN
NLS_COMP | BINARY
NLS_CURRENCY | #
NLS_DATE_FORMAT | DD-MON-RR
NLS_DATE_LANGUAGE | ENGLISH
NLS_DUAL_CURRENCY | ?
NLS_ISO_CURRENCY | UNITED KINGDOM
NLS_LANGUAGE | ENGLISH
NLS_LENGTH_SEMANTICS | BYTE
NLS_NCHAR_CONV_EXCP | FALSE
NLS_NUMERIC_CHARACTERS | .,
NLS_SORT | BINARY
NLS_TERRITORY | UNITED KINGDOM
NLS_TIMESTAMP_FORMAT | DD-MON-RR HH24.MI.SSXFF
NLS_TIMESTAMP_TZ_FORMAT | DD-MON-RR HH24.MI.SSXFF TZR
NLS_TIME_FORMAT | HH24.MI.SSXFF
NLS_TIME_TZ_FORMAT | HH24.MI.SSXFF TZR
Trying various ways to get a £ symbol as the NLS_CURRENCY value:
DECLARE
TYPE StringMap IS TABLE OF VARCHAR2(10) INDEX BY VARCHAR2(20);
a_curr StringMap;
v_idx VARCHAR2(20);
v_stmt VARCHAR2(200);
BEGIN
a_curr('Pound sign') := '£';
a_curr('UNISTR(''\00A3'')') := UNISTR( '\00A3' );
a_curr('CHR(163)') := CHR(163);
v_idx := a_curr.FIRST;
LOOP
EXIT WHEN v_idx IS NULL;
DECLARE
v_stmt VARCHAR2(200) := 'ALTER SESSION SET NLS_CURRENCY = '''|| a_curr(v_idx) ||'''';
BEGIN
DBMS_OUTPUT.PUT_LINE( v_idx );
DBMS_OUTPUT.PUT_LINE( ' ' || v_stmt );
EXECUTE IMMEDIATE v_stmt;
DBMS_OUTPUT.PUT_LINE( ' ' || TO_CHAR(1111, 'FML9999' ) );
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE( 'Error: ' || SQLERRM );
END;
v_idx := a_curr.NEXT(v_idx);
END LOOP;
END;
/
Outputs:
CHR(163)
ALTER SESSION SET NLS_CURRENCY = ''
Error: ORA-12705: Cannot access NLS data files or invalid environment specified
Pound sign
ALTER SESSION SET NLS_CURRENCY = '??'
??1111
UNISTR('\00A3')
ALTER SESSION SET NLS_CURRENCY = '#'
#1111
And none of them have set NLS_CURRENCY to £. In fact, it seems to be very difficult (impossible?) to get a £ character to output in an Oracle db<>fiddle.
db<>fiddle here
However, on a local install of Oracle 11gR2, if I run:
ALTER SESSION SET NLS_TERRITORY = "UNITED KINGDOM";
ALTER SESSION SET NLS_CURRENCY = '£';
SELECT TO_CHAR(1111, 'FML9999') FROM DUAL;
Then the output is:
TO_CHAR(1111,'F
---------------
£1111
Looking at this:
The character sets are the same and there are no obvious differences in settings (and changing NLS_LANGUAGE to AMERICAN makes no difference), but it's possible there is a setting that I've missed;
It may be that the difference is in the front-end from LiveSQL to db<>fiddle (and SQL/PLUS);
It may be in the codepage used by the operating system to display characters - you can see more details for Windows here;
Or it may be an issue with the Oracle version and the handling of £ is "fixed" in Oracle 19 (although since it works for me on Oracle 11g I'm not sure this is the case).
I have a table SOME_TABLE with a field SOME_NUMBER of type number. If I try to convert this field to char, I'm getting an error in some cases.
This select
select id,
SOME_NUMBER,
to_char(SOME_NUMBER,'fm999999990D999999999999','nls_numeric_characters=''.,''') tochar,
dump(SOME_NUMBER)
from SOME_TABLE
where id in (
19876,
19886,
19857,
19792,
19810
);
will result in
+-------+---------------+-------------------------+----------------------------+
| ID | SOME_NUMBER | TOCHAR | DUMP(SOME_NUMBER) |
+-------+---------------+-------------------------+----------------------------+
| 19792 | 0,00000013147 | 0.00000013147 | Typ=2 Len=4: 189,14,15,71 |
| 19810 | 0,0000001387 | ####################### | Typ=2 Len=3: 189,15,244 |
| 19857 | 0,00000011896 | ####################### | Typ=2 Len=4: 189,13,246,61 |
| 19876 | 0,00000012962 | 0.00000012962 | Typ=2 Len=4: 189,13,97,21 |
| 19886 | 0,00000011896 | ####################### | Typ=2 Len=4: 189,13,246,61 |
+-------+---------------+-------------------------+----------------------------+
Furthermore, I've noticed that the internal representation of the erroneous values is different from the one I get with a "direct" select:
select 0.0000001387, dump(0.0000001387) dump from dual union all
select SOME_NUMBER, dump(SOME_NUMBER) from SOME_TABLE where id in (19810);
returns
+--------------+-------------------------+
| 0.0000001387 | DUMP |
+--------------+-------------------------+
| 0,0000001387 | Typ=2 Len=3: 189,14,88 |
| 0,0000001387 | Typ=2 Len=3: 189,15,244 |
+--------------+-------------------------+
and this is true always and only for the records that give the error, so it must be related to the issue.
Is this a bug in how Oracle manages number fields? Or is this an acceptable/correct/possible situation? How could I fix it? If I try to TO_CHAR the field without any format string, the connection drops... that cannot be normal. Is this a known issue?
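A diagnostic sketch, not a confirmed fix: assuming that arithmetic on the value forces Oracle to recompute and renormalize its internal representation (and does not itself raise an error on the corrupt rows), comparing the stored DUMP with the DUMP of a recomputed value may flag the suspect records:
select id,
       dump(SOME_NUMBER)     as stored_dump,
       dump(SOME_NUMBER + 0) as recomputed_dump
from   SOME_TABLE
where  dump(SOME_NUMBER) <> dump(SOME_NUMBER + 0);
Rows returned by this query have a stored representation that differs from the canonical one Oracle would produce itself, which matches what the two DUMP results above already show for id 19810.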
I'm trying to add a constant with the nvarchar2 datatype in a package specification, but after compilation the database stores something like ???. For example, I try to add a constant for the Armenian word մեկ:
x constant nvarchar2(3) default 'մեկ';
Can anyone suggest a solution to this problem, or is it impossible?
I have tested your example on two different databases with different NLS_CHARACTERSET configurations, retrieved with this query:
select *
from v$nls_parameters
where parameter in ('NLS_NCHAR_CHARACTERSET','NLS_CHARACTERSET','NLS_LANGUAGE');
First:
+----+------------------------+-----------+
| id | PARAMETER              | VALUE     |
+----+------------------------+-----------+
| 1  | NLS_LANGUAGE           | AMERICAN  |
| 2  | NLS_CHARACTERSET       | AL32UTF8  |
| 3  | NLS_NCHAR_CHARACTERSET | AL16UTF16 |
+----+------------------------+-----------+
Second:
+----+------------------------+--------------+
| id | PARAMETER              | VALUE        |
+----+------------------------+--------------+
| 1  | NLS_LANGUAGE           | RUSSIAN      |
| 2  | NLS_CHARACTERSET       | CL8MSWIN1251 |
| 3  | NLS_NCHAR_CHARACTERSET | AL16UTF16    |
+----+------------------------+--------------+
And the result is as follows: on the database with charset AL32UTF8 the constant displays correctly, while on CL8MSWIN1251 it displays as question marks ('???').
I haven't changed the character sets on the databases to validate my suggestion, but I suggest you change NLS_CHARACTERSET to AL32UTF8; it should help.
My package for tests:
create or replace package question27577711 is
  x constant nvarchar2(3) default 'մեկ';
  function get_constant_x return nvarchar2;
end question27577711;
/

create or replace package body question27577711 is
  function get_constant_x return nvarchar2
  is
  begin
    return x;
  end get_constant_x;
end question27577711;
/

select question27577711.get_constant_x from dual;
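As an alternative that avoids changing NLS_CHARACTERSET at all (my own sketch, not taken from the answer above): PL/SQL source code is stored in the database character set, so the Armenian literal can be mangled into '???' before it ever reaches the NVARCHAR2 constant. Writing the constant with UNISTR keeps the source ASCII-only, so nothing is lost at compile time. The code points below are the Unicode values for մ, ե and կ; if UNISTR is not directly callable in PL/SQL on your version, the same value can be selected from dual instead.
create or replace package question27577711_alt is
  -- unistr builds the Armenian word մեկ from its Unicode code points
  -- (U+0574, U+0565, U+056F), so the package source itself stays ASCII-only
  x constant nvarchar2(3) default unistr('\0574\0565\056F');
  function get_constant_x return nvarchar2;
end question27577711_alt;
/

create or replace package body question27577711_alt is
  function get_constant_x return nvarchar2 is
  begin
    return x;
  end get_constant_x;
end question27577711_alt;
/

select question27577711_alt.get_constant_x from dual;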
I am migrating a MySQL 5.1 database in Amazon's EC2, and I am having issues with tables that use the longblob datatype for image storage. Basically, after the migration, the data in the longblob column is a different size, apparently because the character encoding is different.
First of all, here is an example of before and after the migration:
Old:
x??]]??}?_ѕ??d??i|w?%?????q$??+?
New:
x��]]����_ѕ��d��i|w�%�����q$��+�
I checked the character set variables on both machines and they are identical. I also checked 'show create table' and the output is identical as well. The clients are both connecting the same way (no SET NAMES, and no character sets specified).
Here is the mysqldump command I used (I tried it without --hex-blob as well):
mysqldump --hex-blob --default-character-set=utf8 --tab=. DB_NAME
Here is how I loaded the data:
mysql DB_NAME --default-character-set=utf8 -e "LOAD DATA INFILE 'EXAMPLE.txt' INTO TABLE EXAMPLE;"
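One way to confirm whether the blob bytes actually changed (rather than just being rendered differently by the client) would be to compare per-row byte lengths and checksums on both servers; the id and image column names below are placeholders for the real ones:
SELECT id, LENGTH(image) AS byte_len, MD5(image) AS checksum
FROM EXAMPLE
ORDER BY id;
If the lengths and checksums match on both machines, the data survived the migration and only the display differs.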
Here are the MySQL character set variables (identical):
Old:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
New:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I'm not sure what else to try to be able to run mysqldump and have the blob data be identical on both machines. Any tips would be greatly appreciated.
The issue seems to be a bug in MySQL (http://bugs.mysql.com/bug.php?id=27724). The solution is not to use mysqldump, but to write your own SELECT INTO OUTFILE script for the tables that have blob data. Here is an example:
SELECT
    COALESCE(column1, @nullval),
    COALESCE(column2, @nullval),
    COALESCE(HEX(column3), @nullval),
    COALESCE(column4, @nullval),
    COALESCE(column5, @nullval)
FROM table
INTO OUTFILE '/mnt/dump/table.txt'
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
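Note that @nullval is a user variable acting as a NULL placeholder; I'm assuming it has been set beforehand to whatever marker you want to use for NULL values, for example:
SET @nullval = 'NULL';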
To load the data:
SET NAMES utf8;
LOAD DATA INFILE '/mnt/dump/table.txt'
INTO TABLE table
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
(column1, column2, @hexcolumn3, column4, column5)
SET column3 = UNHEX(@hexcolumn3);
This loads the blob data correctly.