My test database has an AL32UTF8 encoding, while the production database has a WE8ISO8859P1 encoding. My application is written in .NET and I use the default System.Data.OracleClient.OracleConnection class to make the connection.
I create an IDbCommand and add IDbDataParameter objects when I want to INSERT strings containing non-ASCII symbols.
On the test database everything works well; apparently converting .NET's internal string format to AL32UTF8 works fine. On production, however, it just doesn't work: the .NET internal string representation (which is UTF-16) somehow isn't converted to WE8ISO8859P1.
My question:
Can you specify the database's encoding in the connection string? Or is there another way to tell the driver (System.Data.OracleClient.OracleConnection) that the database expects a specific encoding?
The conversion should happen automatically as long as you don't use characters that cannot be represented in WE8ISO8859P1. If you have such characters, you cannot store them in the database anyway.
If you try to store the Euro sign (€), you're out of luck: it is not part of WE8ISO8859P1.
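If you want to detect such characters up front, the check is not Oracle-specific. A minimal sketch (in Java only for illustration; the same idea applies in .NET) against ISO-8859-1, the character set behind WE8ISO8859P1:

    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;

    public class Iso88591Check {
        public static void main(String[] args) {
            // ISO-8859-1 is the Java name for the encoding behind Oracle's WE8ISO8859P1
            CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder();

            System.out.println(encoder.canEncode("niño")); // true  - ñ exists in ISO-8859-1
            System.out.println(encoder.canEncode("€"));    // false - the Euro sign does not
        }
    }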
What ways are there to convert UTF-8 to UTF-16, perhaps even through UDF functions? More generally, how can UTF-16 be used in Firebird?
You can't do that, because Firebird doesn't support UTF-16 as a character set. If you want to convert data from UTF-8 to UTF-16, you will have to do that in your client application, not in Firebird.
In theory you could do it using a UDF (deprecated) or a UDR (the replacement for UDFs), but from Firebird's perspective the result would just be binary data, so doing this within Firebird wouldn't be very useful in my opinion.
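For what it's worth, the client-side conversion is trivial in most languages. A minimal Java sketch (the input bytes are hard-coded here in place of a value read from a Firebird UTF8 column):

    import java.nio.charset.StandardCharsets;

    public class Utf8ToUtf16 {
        public static void main(String[] args) {
            // Pretend these bytes came out of a Firebird UTF8 column.
            byte[] utf8Bytes = "Grüße".getBytes(StandardCharsets.UTF_8);

            // Decode UTF-8 into Java's internal (UTF-16) string representation ...
            String text = new String(utf8Bytes, StandardCharsets.UTF_8);

            // ... and, only if some external system needs UTF-16 bytes, re-encode:
            byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16LE);

            System.out.println(text + " -> " + utf16Bytes.length + " UTF-16LE bytes");
        }
    }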
Based on our client requirements we configure our Oracle (version 12c) deployments to support single-byte or multi-byte data (through the character set setting). There is a need to cache third-party multi-byte data (JSON) for performance reasons. We found that we could encode the data in UTF-8 and persist it (after converting it to bytes) in a BLOB column of an Oracle table. This is a hack that allows us to store multi-byte data in single-byte deployments. Certain limitations come with this approach:
The data cannot be queried or updated through SQL code (stored procedures).
Search operations using e.g. LIKE operators cannot be performed.
Marshalling and unmarshalling overhead for every operation at the application layer (Java).
Assuming we can live with these limitations, are there any other drawbacks we should be aware of?
Thanks.
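A rough sketch of the approach described above, for reference (the table json_cache, the connect string and the credentials are invented):

    import java.nio.charset.StandardCharsets;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class JsonBlobCache {
        public static void main(String[] args) throws Exception {
            String json = "{\"name\":\"niño\",\"city\":\"北京\"}";

            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "app", "secret")) {

                // Write: encode the JSON explicitly as UTF-8 and store the raw bytes.
                try (PreparedStatement ins = con.prepareStatement(
                        "INSERT INTO json_cache (cache_key, payload) VALUES (?, ?)")) {
                    ins.setString(1, "customer-42");
                    ins.setBytes(2, json.getBytes(StandardCharsets.UTF_8));
                    ins.executeUpdate();
                }

                // Read: fetch the bytes back and decode them with the same charset.
                try (PreparedStatement sel = con.prepareStatement(
                        "SELECT payload FROM json_cache WHERE cache_key = ?")) {
                    sel.setString(1, "customer-42");
                    try (ResultSet rs = sel.executeQuery()) {
                        if (rs.next()) {
                            System.out.println(new String(rs.getBytes(1), StandardCharsets.UTF_8));
                        }
                    }
                }
            }
        }
    }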
Ok, I am summarizing my comments in a proper answer.
You have two possible solutions:
store them in NVARCHAR2/NCLOB columns
re-encode JSON values in order to use only ASCII characters
1. NCLOB/NVARCHAR
The "N" character in "NVARCHAR2" stands for "National": this type of column has been introduced exactly to store characters that can't be represented in the "database character set".
Oracle actually supports TWO character sets:
"Database Character Set" it is the one used for regular varchar/char/clob fields and for the internal data-dictionary (in other words: it is the character set you can use for naming tables, triggers, columns, etc...)
"National Character Sets": the character set used for storing NCLOB/NCHAR/NVARCHAR values, which is supposed to be used to be able to store "weird" characters used in national languages.
Normally the second one is a UNICODE character set, so you can store any kind of data in there, even in older installations
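If the application layer is Java (as mentioned in the question), the driver also has to be told to send the value in the national character set. A minimal JDBC sketch, assuming a hypothetical table json_cache_n with an NCLOB column:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class NationalCharsetInsert {
        // Sketch only; assumes something like:
        //   CREATE TABLE json_cache_n (cache_key VARCHAR2(100), payload NCLOB);
        static void storeJson(Connection con, String key, String json) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO json_cache_n (cache_key, payload) VALUES (?, ?)")) {
                ps.setString(1, key);
                // setNString tells the driver to send the value in the national
                // character set instead of the database character set.
                ps.setNString(2, json);
                ps.executeUpdate();
            }
        }
    }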
2. encode JSON values using only ASCII characters
It is true that the JSON standard is designed with Unicode in mind, but it is also true that it allows characters to be expressed as escape sequences using the hexadecimal representation of their code points, and if you do so for every character with a code point greater than 127, you can express ANY Unicode object using only ASCII characters.
This ASCII JSON string: '{"UnicodeCharsTest":"ni\u00f1o"}' represents the very same object as this one: '{"UnicodeCharsTest" : "niño"}'.
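A minimal sketch of such an escaper in Java (hand-rolled here only to show the principle; many JSON libraries can emit ASCII-only output as a configuration option):

    public class JsonAsciiEscaper {
        // Escape every character above U+007F as a JSON \\uXXXX sequence.
        static String toAscii(String json) {
            StringBuilder sb = new StringBuilder(json.length());
            for (int i = 0; i < json.length(); i++) {
                char c = json.charAt(i);
                if (c > 127) {
                    sb.append(String.format("\\u%04x", (int) c));
                } else {
                    sb.append(c);
                }
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            // Prints {"UnicodeCharsTest":"ni\u00f1o"}
            System.out.println(toAscii("{\"UnicodeCharsTest\":\"niño\"}"));
        }
    }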
Personally I prefer this second approach because it lets me share these JSON strings easily with systems that use antiquated legacy protocols, and it also guarantees that the JSON strings are read correctly by any client regardless of its national settings. (The Oracle client protocol may try to translate strings into the character set used by the client, and that is a complication I don't want to deal with. By the way, this might be the reason for the problems you are experiencing with SQL clients.)
I have an Oracle server with a DAD defined with PlsqlNLSLanguage DANISH_DENMARK.WE8ISO8859P1.
I also have a JavaScript file that is loaded in the browser. The JavaScript file contains the Danish letters æøå. When the .js file is saved as UTF-8 the Danish letters are garbled. When I save the .js file as UTF-8 with BOM or as ANSI, the letters are shown correctly.
I am not sure what is wrong.
Try to set your DAD
PlsqlNLSLanguage DANISH_DENMARK.UTF8
or even better
PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8
When you save your file as ANSI, that typically means "Windows codepage 1252" on Western Windows; see the column "ANSI codepage" in the National Language Support (NLS) API Reference. CP1252 is very similar to ISO-8859-1, see ISO 8859-1 vs. Windows-1252 (it is the German Wikipedia, but that table shows the differences much better than the English one). Hence, for a 100% correct setting you would have to set PlsqlNLSLanguage DANISH_DENMARK.WE8MSWIN1252.
Now, why do you get correct characters when you save your file as UTF-8 with BOM, although there is a mismatch with .WE8ISO8859P1?
When the browser opens the file it first reads the BOM 0xEF,0xBB,0xBF and assumes the file is encoded as UTF-8. However, this may fail in some circumstances, e.g. when you insert text from an input field into the database.
With PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8 you tell the Oracle database: "The web server uses UTF-8." No more, no less (in terms of character set encoding). So, when your database uses character set WE8ISO8859P1, the Oracle driver knows it has to convert the ISO-8859-1 characters coming from the database to UTF-8 for the browser, and vice versa.
I have a legacy database that claims to have collation set to windows-1252 and is storing a text field's contents as
I’d
When it is displayed in a legacy web app it shows as I’d in the browser. The browser reports a page encoding of UTF-8. I can't figure out how that conversion is being done (I'm almost certain it isn't via an on-the-fly search-and-replace). This is a problem for me because I am taking the text field (and many others like it) from the legacy database into a new UTF-8 database. A new web app displays the text from the new database as
I’d
and I would like it to show as I’d. I can't figure out how the legacy app could have achieved this (some fiddling in Ruby hasn't shown me a way to convert the string I’d to I’d).
I've tied myself in a knot here somewhere.
It probably means the previous developer screwed up data insertion (or you're screwing up somewhere). The scenario goes like this:
the database connection is set to latin1
app actually sends UTF-8 to database
database interprets received data as latin1, stores it as such (interprets ’ as ’)
app queries for the data again
database returns ’ encoded in latin1
app interprets the data as UTF-8, resulting in ’
You essentially need to do the same misinterpretation to get good data. Right now you may be querying the database through a UTF-8 connection, so the database returns ’ encoded in UTF-8. What you need to do is query through a latin1 connection and interpret the data as UTF-8 instead.
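In code, the "same misinterpretation" amounts to re-encoding the string with the charset it was wrongly read as and decoding the resulting bytes as UTF-8. A minimal Java sketch (using windows-1252 rather than plain latin1, since that is what your legacy database claims to use and it covers € and ™; the equivalent in Ruby would use String#encode followed by force_encoding):

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class MojibakeFix {
        public static void main(String[] args) {
            String fromDb = "I’d";   // what a UTF-8 connection hands you today

            // Re-encode with the charset the bytes were wrongly read as (windows-1252),
            // then decode those bytes as what they really were all along: UTF-8.
            String fixed = new String(
                    fromDb.getBytes(Charset.forName("windows-1252")),
                    StandardCharsets.UTF_8);

            System.out.println(fixed);   // I’d
        }
    }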
See Handling Unicode Front To Back In A Web App for a more detailed explanation of all this.
How do I import Chinese characters from Excel into Oracle, and also extract Chinese characters from Oracle into Excel?
You would simply need to ensure that you have set Oracle to use a Unicode-compatible character encoding (I recommend UTF-8) and that no tool you are using to do the transfer is non-Unicode-safe.
Without knowing what you're using to do the transfer it's hard to give more specific information about where something is going wrong.
Make sure the database is set up with the correct character set to store Chinese characters properly. If the database can't store them, it is pointless to import them. If the database supports an Asian character set, then set the client character set to a compatible one. Once that is done, the underlying OCI/ODBC layer will handle the translation from one character set to the other, as long as they are compatible.
Check the following.
Database character set (ZHS16GBK)
Set the NLS_LANG environment variable to SIMPLIFIED CHINESE_CHINA.UTF8
Open the Excel file and save it as Unicode text
Open the text file and save it in ANSI format
In PL/SQL Developer there is an option called Text Importer. Using this tool the data can be imported successfully.
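Before importing, it can also help to confirm what the database and national character sets actually are, e.g. with a quick query against NLS_DATABASE_PARAMETERS. A minimal JDBC sketch (connection details are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ShowCharset {
        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT parameter, value FROM nls_database_parameters " +
                     "WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET')")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }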