Euro '€' symbol not inserted correctly during Oracle SQL*Loader load

I'm loading a CSV file using sqlldr.
The file contains the symbol '€', which is loaded into a VARCHAR2 column.
After the load, the database displays '¿' instead of the euro symbol.
I have specified the character set in the control file for the load:
LOAD DATA
CHARACTERSET WE8MSWIN1252
I'm running all of this on a Solaris machine, which by the way can't display the '€' symbol; it gives me a '.' instead when I press the key for €.
We are using the data for BI purposes, so we have to keep the VARCHAR2 column, even though changing its type to NVARCHAR2 inserts the € symbol correctly.
Can you suggest any other solution for the issue?
When I run the locale command on the machine, I get:
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
and my NLS_DATABASE_PARAMETERS are:
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET WE8DEC
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
I have tried setting the NLS_LANG variable, but nothing seems to work.
Regards.

Your database is not configured to support the Euro character in a VARCHAR2 column. Your database's NLS_CHARACTERSET of WE8DEC means that it uses the old DEC MCS character set. That character set long predates the Euro character (it's even older than the ISO 8859-1 character set that also predates the Euro). So you can't validly store a Euro character in a VARCHAR2 column in your database.
You could change the database character set to something that supports the Euro character. There are many such character sets but I would guess that ISO-8859-15, Windows-1252, or UTF-8 would be the easiest migrations. Personally, I'd always prefer Unicode (UTF-8 which is the AL32UTF8 character set in Oracle) but you may have a reason to prefer a single-byte character set. Alternatively, you could declare this column (and any others that need to use the Euro character) to be NVARCHAR2 columns since your national character set supports Unicode. Supporting NVARCHAR2 columns may require changes to your front-end applications. If you can change the database character set, that would be my strong preference over adding NVARCHAR2 columns.
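If you do add NVARCHAR2 columns anyway, here is a minimal sketch of the idea (the table and column names are hypothetical, not from your schema):

-- Hypothetical table; with NLS_NCHAR_CHARACTERSET = AL16UTF16 an NVARCHAR2
-- column can hold the euro sign. The N'...' prefix marks a national
-- character set literal.
CREATE TABLE price_labels (id NUMBER, label NVARCHAR2(100));
INSERT INTO price_labels (id, label) VALUES (1, N'Price: 100 €');
SELECT label FROM price_labels;

One caveat: a string literal inside a SQL statement is still converted through the client and database character sets before Oracle sees it as NCHAR data, so an N'...' literal typed in SQL*Plus can lose the euro anyway; data bound directly (as SQL*Loader and OCI bind variables do) avoids that.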

Related

Difference between NLS_NCHAR_CHARACTERSET and NLS_CHARACTERSET for Oracle

I would like to know the difference between the
NLS_NCHAR_CHARACTERSET and NLS_CHARACTERSET settings in Oracle.
From my understanding, NLS_NCHAR_CHARACTERSET is for NVARCHAR2 data types,
and NLS_CHARACTERSET is for VARCHAR2 data types.
I tried to test this on my development server, where my current CHARACTERSET settings are as follows:
PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET US7ASCII
Then I inserted some Chinese character values into the database. I inserted the characters into a table called data_<seqno> and updated the ADDRESS and ADDRESS_2 columns, which are VARCHAR2 columns. From my understanding, with the current NLS_CHARACTERSET setting of US7ASCII, Chinese characters should not be supported, but they still show up in the database. Does NLS_NCHAR_CHARACTERSET take precedence over this?
Thank You.
In general all your points are correct. NLS_NCHAR_CHARACTERSET defines the character set for NVARCHAR2 (and the other national character types), whereas NLS_CHARACTERSET is used for VARCHAR2.
Why is it possible that you see Chinese characters with US7ASCII?
The reason is that your database character set and your client character set (see your NLS_LANG value) are both US7ASCII. Your database uses US7ASCII, and it "thinks" the client also sends data using US7ASCII. Thus it does not do any conversion of the strings; the data is transferred bit-by-bit from client to server and vice versa.
Because of that, you can use characters which are actually not supported by US7ASCII. Be aware that if your client uses a different character set (e.g. when you use the ODP.NET Managed Driver in a Windows application), the data will be rubbish! You would also hit the same issue if you ever migrate the database character set.
Another note: I don't think you would get the same behavior with other character sets, e.g. if your database and your client both used WE8ISO8859P1. Also be aware that you actually have a wrong configuration: your database uses character set US7ASCII, your NLS_LANG value is also US7ASCII (most likely it is not set at all, and Oracle defaults it to US7ASCII), but the real character set of SQL*Plus, or rather of your cmd.exe terminal, is most likely CP950 or CP936.
If you would like to set everything up properly, you can either set the environment variable NLS_LANG=.ZHT16MSWIN950 (CP936 seems not to be supported by Oracle) or change your code page before running sqlplus.exe with the command chcp 437. With these proper settings you will not see any Chinese characters, as you probably would have expected.
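If you want to see the pass-through effect for yourself, here is a small sketch (the table and column names are hypothetical); in a single-byte character set such as US7ASCII, every byte counts as one character:

SELECT address,
       LENGTH(address)  AS char_count,
       LENGTHB(address) AS byte_count
  FROM customer_data
 WHERE ROWNUM <= 5;
-- In US7ASCII both counts are always equal (one byte = one character), even
-- for Chinese text that really occupies several bytes per character; a
-- multi-byte-aware character set would report char_count < byte_count.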

NLS_NCHAR_CHARACTERSET explanation

I must start by saying that I am not an expert Oracle user, so I have some problems understanding the basics :D. Our application is an MVC one with an NHibernate DB connection. The problem arises when we try to save characters like 'ѼóÂ' into an NVARCHAR2 field: they are saved as question marks '?'. To fix this we changed to a different character set in the database.
Here are our nls_database_parameters at installation:
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET EE8ISO8859P2
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 10.2.0.4.0
Originally NLS_CHARACTERSET was EE8ISO8859P2 and we changed it to AL32UTF8 (works perfectly). The question is: isn't NLS_NCHAR_CHARACTERSET supposed to handle those special characters for fields like NVARCHAR2? If not, can someone please explain its purpose to me?
Edit: NLS_LANG is set to: POLISH_POLAND.AL32UTF8
The national character set was used in earlier times, i.e. before Unicode was available. The main idea was to have a common character set for VARCHAR2/CHAR, where you store language-independent items (including any source code, etc.), and a customer-specific, i.e. language-specific, national character set for NVARCHAR2/NCHAR.
In my opinion there is no reason to use it nowadays, since AL32UTF8 (or any other Unicode encoding) is able to store any character anyway.
Maybe when you work with non-Western languages a national character set like AL16UTF16 is slightly beneficial in terms of storage and efficiency.
Regarding your question: the national character set AL16UTF16 is able to store any Unicode character, so your Polish characters should be no problem. However, maybe your client application (or the selected font) is not able to display such characters.
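One way to tell a storage problem from a display problem is ASCIISTR, which renders every non-ASCII character as its Unicode escape. A sketch, with hypothetical table and column names:

SELECT name_n, ASCIISTR(name_n) AS code_points
  FROM customers
 WHERE ROWNUM <= 5;
-- Polish 'ó' should come back as \00F3. If the column really contains '?'
-- (i.e. the data was already lost at insert time), you will see a literal ?.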

Hardcoded strings are cut in half with node-oracle

I use node-oracle to connect to an Oracle db.
When I select values from tables with Cyrillic data, everything is fine, but if I call a procedure like this:
CREATE OR REPLACE PROCEDURE TEST_ENCODING (CUR OUT SYS_REFCURSOR) AS
BEGIN
  OPEN CUR FOR
    SELECT 'тест' AS hello FROM dual; -- Cyrillic hardcoded text
END TEST_ENCODING;
and then call it from node:
connection.execute("call TEST_ENCODING(:1)", [new oracle.OutParam(oracle.OCCICURSOR)],
function (err, result) {
console.log(result)
}
);
The result is: [ { HELLO: 'те' } ] (the string is cut in half).
The database is configured as follows:
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET CL8MSWIN1251
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 11.2.0.3.0
In my local env: NLS_LANG=AMERICAN_AMERICA.UTF8
(also tried NLS_LANG=RUSSIAN_RUSSIA.UTF8 and RUSSIAN_RUSSIA.AL32UTF8 with same results)
My configuration:
Mac OS X 10.9
Oracle Client 11.2
node 0.10.22
node-oracle 0.3.4
It seems that for now there is no support for encodings other than UTF8 in node-oracle, because node.js doesn't support native encodings (proof).
To handle strings properly you need to set the NLS_LANG parameter on the client to the same value as in the database (CL8MSWIN1251).
So you can choose between two variants:
A) Migrate the database to UTF8 encoding.
B) Patch the node-oracle source to convert strings and CLOBs to UTF8 before returning their content to node.js, and apply the conversion from UTF8 to CL8MSWIN1251 before passing data to Oracle. The OCI interface has functions for such conversions. E.g., for your local purposes it's enough to patch the OBJ_GET_STRING macro in utils.h.
P.S. node-oracle looks very simplistic at the moment, so be prepared for many surprises (e.g. no support for BLOBs and collections, lack of connection settings, and so on).
It could be because your database's primary character set is CL8MSWIN1251, while your local setting specifies UTF8:
NLS_CHARACTERSET CL8MSWIN1251
The NLS_LANG variable specifies how to interpret your local environment:
NLS_LANG = language_territory.charset
The last part of NLS_LANG provides information about the local character set; it lets Oracle know what character set you are using on the client side, so Oracle can do the proper conversion. Probably the values from tables are converted properly, while the character set of the hardcoded value selected from dual is not identified correctly.
Please try setting the NLS_LANG variable to AMERICAN_AMERICA.CL8MSWIN1251 (or RUSSIAN_RUSSIA.CL8MSWIN1251; it doesn't really matter).
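To narrow down where the damage happens, you can also ask the server what it actually received for the hardcoded literal. A sketch using DUMP (format 1016 prints the character set name plus hex bytes), run through the same node-oracle connection:

SELECT DUMP('тест', 1016) AS received FROM dual;
-- In an intact CL8MSWIN1251 round trip, 'тест' is four single-byte
-- characters (f2,e5,f1,f2); fewer or different bytes mean the literal was
-- already mangled by the client-side NLS_LANG conversion.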
Are you sure that your source code is in the UTF-8 character set?
If the problem is only with hardcoded symbols, maybe your GUI for Oracle development does not support UTF-8.
I had a similar problem with special characters like ¥ in my package, and SQL*Plus converted the special characters into something unreadable.

How can I tell if my Oracle system is set to support Unicode or multibyte characters?

I understand that Oracle supports multiple character sets, but how can I determine whether the current 11g system where I work has that functionality enabled?
SELECT *
FROM v$nls_parameters
WHERE parameter LIKE '%CHARACTERSET';
will show you the database and national character set. The database character set controls the encoding of data in CHAR and VARCHAR2 columns. If the database supports Unicode in those columns, the database character set should be AL32UTF8 (or UTF8 in some rare cases). The national character set controls the encoding of data in NCHAR and NVARCHAR2 columns. If the database character set does not support Unicode, you may be able to store Unicode data in columns with these data types, but that generally adds complexity to the system; applications may have to change to support the national character set.
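As a quick probe of national character set support, you can store a code point and read back its escape form. A sketch with a hypothetical table, using UNISTR to build the value from a Unicode code point:

-- Hypothetical probe table.
CREATE TABLE unicode_probe (txt NVARCHAR2(10));
INSERT INTO unicode_probe VALUES (UNISTR('\20AC'));  -- U+20AC, the euro sign
SELECT txt, ASCIISTR(txt) AS escaped FROM unicode_probe;
-- ASCIISTR returning \20AC confirms the NVARCHAR2 column holds the real code
-- point even if the database character set itself has no euro mapping.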
Unicode is a character encoding system that defines every character in most of the spoken languages in the world. Support for Unicode in Oracle Database:
Character Set   Supported in RDBMS Release   Unicode Encoding
AL24UTFFSS      7.2 - 8i                     UTF-8
UTF8            8.0 - 11g                    UTF-8
UTFE            8.0 - 11g                    UTF-EBCDIC
AL32UTF8        9i - 11g                     UTF-8
AL16UTF16       9i - 11g                     UTF-16
To make sure your database is Unicode, check the value of the NLS_CHARACTERSET parameter; it should be AL32UTF8 (or UTF8) from the list above. Note that AL16UTF16 is only valid as the national character set (NLS_NCHAR_CHARACTERSET), not as the database character set.
SQL>
SQL> SELECT * FROM v$nls_parameters WHERE parameter='NLS_CHARACTERSET';
PARAMETER VALUE CON_ID
--------------------------- ------------------- ----------
NLS_CHARACTERSET AL32UTF8 0
To change the value of the parameter, take a full backup first, because the ALTER DATABASE statement cannot be rolled back, and then use the following statements. Note that ALTER DATABASE CHARACTER SET only succeeds when the new character set is a superset of the current one; otherwise a full character set migration is required:
SHUTDOWN IMMEDIATE
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET AL32UTF8;
SHUTDOWN IMMEDIATE;
STARTUP;
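After the final restart you can verify that the change took effect:

SELECT value
  FROM nls_database_parameters
 WHERE parameter = 'NLS_CHARACTERSET';
-- Should now report AL32UTF8.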

Oracle character encoding - specific symbols displayed incorrectly

The web browser displays specific Norwegian symbols incorrectly.
Configuration of the DB:
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET WE8ISO8859P1
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
Is the problem in NLS_CHARACTERSET?
If the database hasn't been set up to use a multi-byte character set, or a single-byte character set compatible with those Norwegian characters, then nothing the client does can fix that.
But you can try setting the environment variable and see whether it works:
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
export NLS_LANG
It looks like the database is set up with a Western European single-byte character set, WE8ISO8859P1.
Modern apps (and web browsers) generally use UTF-8, so there is going to be a loss of information when converting UTF-8 data to a single-byte character set.
I think, in the long run, you'd be best off getting the database converted.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm
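Before converting anything, it is worth checking how the Norwegian characters are actually stored. A sketch with hypothetical table and column names, using DUMP in hex format:

SELECT title, DUMP(title, 16) AS stored_bytes
  FROM pages
 WHERE ROWNUM <= 5;
-- In WE8ISO8859P1 a correctly stored 'æ' is the single byte e6; seeing the
-- byte pair c3,a6 instead means UTF-8 bytes were inserted without
-- conversion, which is a common cause of garbled output in the browser.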
