I have an issue with invalid characters appearing in an Oracle database: the "¿", or upside-down question mark. I know it's caused by UTF-8 characters being inserted into a database whose character set cannot represent them. Examples of characters I am expecting are '–', an en dash that looks like a normal hyphen but isn't, and '’', a right single quotation mark where a plain single quote should be.
Input: "2020-08-31 – 2020-12-31" (the separator looks like a normal hyphen but is an en dash)
Output: "2020-08-31 ¿ 2020-12-31"
I know the primary source of the characters is copy and paste from Office programs like Word, which autocorrects to smart quotes and similar characters. Getting the users to turn off this feature is not a viable solution.
There was a good article about dealing with this, and it offered a number of solutions:
switch the database to UTF-8 - can't do it, for reasons
change VARCHAR2 to NVARCHAR2 - I tried this solution, as it would be the best fit for my needs, but when tested I still ended up with invalid characters, so the column type is not the only point of failure
use BLOB/CLOB - tried it and got page crashes
filter the input - not attractive, as there are a lot of places that would need to be updated (a sketch of this approach appears right after this list)
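For completeness, this is roughly what the filter option looks like. A minimal sketch, assuming only the usual Word punctuation needs to be caught; the class and method names are illustrative, not from any library:

// Sketch only: maps the common Word "smart" punctuation to ASCII equivalents
// before the value reaches JDBC. Extend the table as new characters show up.
public final class SmartCharFilter {

    private SmartCharFilter() {}

    public static String toAscii(String s) {
        if (s == null) return null;
        return s
                .replace('\u2013', '-')   // en dash
                .replace('\u2014', '-')   // em dash
                .replace('\u2018', '\'')  // left single quotation mark
                .replace('\u2019', '\'')  // right single quotation mark
                .replace('\u201C', '"')   // left double quotation mark
                .replace('\u201D', '"');  // right double quotation mark
    }
}

The drawback named above stands: every write path would have to call it.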
So the technologies being used are Eclipse, Spring, JDBC version 4.2, and Oracle 12.
The information is entered into the form, the form gets saved, and it passes from the controller into the DAO; when it's checked there, the information is still correct. It's when it passes from there into JDBC that I lose sight of it, and once it enters the database I can't tell whether it has already been changed or whether that's where it happens, but the database stores the invalid character.
So this is happening somewhere between the insert statement and the database: it's either JDBC or the database, but how do I tell, and how do I fix it? I have changed the field where the information is stored; it was originally a VARCHAR2, but I tried NVARCHAR2 and the stored value is still invalid.
I know that this question is asking about the same topic, but the answers do not help me.
SELECT * FROM V$NLS_PARAMETERS WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
PARAMETER                 VALUE            CON_ID
------------------------  ---------------  ------
NLS_CHARACTERSET          WE8ISO8859P15         0
NLS_NCHAR_CHARACTERSET    AL16UTF16             0
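For reference, WE8ISO8859P15 corresponds to ISO-8859-15, and a quick JDK check (a sketch, nothing Oracle-specific) confirms that the two characters in question are simply not representable in it:

import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Sketch: ISO-8859-15 (Oracle's WE8ISO8859P15) cannot encode U+2013 or U+2019,
// so a VARCHAR2 in this database has to substitute them on conversion.
public class RepresentableCheck {
    public static void main(String[] args) {
        CharsetEncoder enc = Charset.forName("ISO-8859-15").newEncoder();
        System.out.println(enc.canEncode('\u2013')); // false: en dash
        System.out.println(enc.canEncode('\u2019')); // false: right single quotation mark
        System.out.println(enc.canEncode('-'));      // true: plain hyphen
    }
}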
So in the comments I was given a link to something that might help, but it's giving me an error.
@Override
public Long createComment(Comment comment) {
    return jdbcTemplate.execute((Connection connection) -> {
        PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO COMMENTS (ID, CREATOR, CREATED, TEXT) VALUES (?,?,?,?)",
                new int[] { 1 });
        ps.setLong(1, comment.getFileNumber());
        ps.setString(2, comment.getCreator());
        ps.setTimestamp(3, Timestamp.from(comment.getCreated()));
        ((OraclePreparedStatement) ps).setFormOfUse(4, OraclePreparedStatement.FORM_NCHAR);
        ps.setString(4, comment.getText());
        if (ps.executeUpdate() != 1) return null;
        ResultSet rs = ps.getGeneratedKeys();
        return rs.next() ? rs.getLong(1) : null;
    });
}
An exception occurred: org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement cannot be cast to oracle.jdbc.OraclePreparedStatement
Character set WE8ISO8859P15 (i.e. ISO 8859-15) does not contain – or ’, so you cannot store them in a VARCHAR2 field.
Either migrate the database to Unicode (typically AL32UTF8) or use the NVARCHAR2 data type for this column.
I am not familiar with Java but I guess you have to use getNString. The linked document should provide some hints to solve the issue.
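As for the ClassCastException: the Tomcat DBCP pool hands back a wrapper, not the Oracle statement itself. A sketch of two ways around it, not the poster's code; it assumes a plain java.sql.Connection and the same Comment bean as above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Sketch: setNString (standard since JDBC 4.0) marks the parameter as
// national-character data, so no cast to OraclePreparedStatement is needed
// even behind a connection pool.
public Long createComment(Connection connection, Comment comment) throws SQLException {
    try (PreparedStatement ps = connection.prepareStatement(
            "INSERT INTO COMMENTS (ID, CREATOR, CREATED, TEXT) VALUES (?,?,?,?)",
            new int[] { 1 })) {
        ps.setLong(1, comment.getFileNumber());
        ps.setString(2, comment.getCreator());
        ps.setTimestamp(3, Timestamp.from(comment.getCreated()));
        ps.setNString(4, comment.getText()); // replaces setFormOfUse + setString
        if (ps.executeUpdate() != 1) return null;
        try (ResultSet rs = ps.getGeneratedKeys()) {
            return rs.next() ? rs.getLong(1) : null;
        }
    }
}

// If the Oracle-specific API really is needed, unwrap the pooled statement
// instead of casting it:
//   OraclePreparedStatement ops = ps.unwrap(oracle.jdbc.OraclePreparedStatement.class);
//   ops.setFormOfUse(4, OraclePreparedStatement.FORM_NCHAR);

Oracle's driver also has the oracle.jdbc.defaultNChar connection property, which treats all character parameters as national-character data, if every string column involved is NVARCHAR2.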
Related
We have an Oracle database with many records in it. Recently we noticed that we cannot save Persian/Arabic digits in a column of datatype NVARCHAR2; instead of the digits it shows question marks "?".
I went through this to check the character set, using these commands:
SELECT *
from NLS_DATABASE_PARAMETERS
WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
and this command
SELECT USERENV('language') FROM DUAL;
The results are these two respectively:
I also issued this command:
SELECT DUMP(myColumn, 1016) FROM myTable;
And the result is like this:
Typ=1 Len=22 CharacterSet=AL16UTF16: 6,33,6,44,6,27,6,45,0,20,0,3f,0,3f,0,2f,0,3f,0,2f,0,3f
The results seem to be okay, but unfortunately we still cannot save any Persian/Arabic digit in that column; the Persian/Arabic letters are fine, however. Do you know the cause of this problem?
Thank You
USERENV('language') does not return your client character set.
So SELECT USERENV('language') FROM DUAL; is equivalent to
SELECT l.value||'_'||t.value||'.'||c.value
FROM (SELECT * FROM NLS_SESSION_PARAMETERS WHERE PARAMETER = 'NLS_LANGUAGE') l
CROSS JOIN (SELECT * FROM NLS_SESSION_PARAMETERS WHERE PARAMETER = 'NLS_TERRITORY') t
CROSS JOIN (SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER = 'NLS_CHARACTERSET') c;
It is not possible to get the client NLS_LANG by any SQL statement (although there seems to be a quirky workaround: How do I check the NLS_LANG of the client?)
Check your client NLS_LANG setting. It is defined either in the Registry (HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG, or HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG for a 32-bit client on 64-bit Windows) or as an environment variable. The environment variable takes precedence.
Then you must ensure that your client application (you did not tell us which one you are using) uses the same character set as specified in NLS_LANG.
In case your application runs on Java have a look at this: Database JDBC Developer's Guide - Globalization Support
See also OdbcConnection returning Chinese Characters as "?"
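Note also the 0,3f pairs in the DUMP output above: 0x003F is a literal '?', so the question marks were already in the data by the time it was stored; the substitution happened on the way in. That replacement behaviour is easy to reproduce outside Oracle. A sketch, assuming the client character set is the Arabic code page windows-1256 (available in JDKs that ship the extended charsets), which has no Extended Arabic-Indic digits:

import java.nio.charset.Charset;

// Sketch: encoding a character through a charset that cannot represent it
// silently substitutes '?', which is exactly what ends up stored.
public class ReplacementDemo {
    public static void main(String[] args) {
        String persianDigit = "\u06F1"; // EXTENDED ARABIC-INDIC DIGIT ONE
        byte[] b = persianDigit.getBytes(Charset.forName("windows-1256"));
        System.out.printf("%02x%n", b[0]); // prints 3f, i.e. '?'
    }
}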
Do yourself a favor and convert the main character set of your database from AR8MSWIN1256 to AL32UTF8. Most of these problems will simply go away. You can forget about NCHAR and NVARCHAR2. They are not needed anymore.
It's a one-time effort that will pay back a thousand times.
See Character Set Migration for instructions.
Using PL/SQL Developer, I'm able to insert French characters into my Oracle database without any error.
Querying:
SELECT * FROM nls_database_parameters WHERE parameter = 'NLS_NCHAR_CHARACTERSET';
Output: AL16UTF16
But when I retrieve the data with a SELECT statement, it comes back as junk characters. For example:
système gets converted to système, and so on.
Any suggestion/workaround will be appreciated.
The issue was due to different values of NLS_LANGUAGE at the client and the server.
At the server it was: AMERICAN. Use the following query to read the parameters:
SELECT * FROM nls_database_parameters
At the client it was: AMERICAN_AMERICA.WE8MSWIN1252. (In PL/SQL Developer: Help -> About, click the Additional Info button and scroll down.)
Another thing I observed while trying to fix the issue:
The characters were not converted to junk on the first update.
But when I retrieved them (containing non-ASCII characters) and updated again, they were converted to junk characters.
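The système pattern is the classic signature of UTF-8 bytes being decoded as a single-byte Windows code page, and the second-update corruption is the same mistake applied twice. A couple of lines of Java reproduce it (a sketch, nothing Oracle-specific):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the UTF-8 bytes of "système" re-read as windows-1252 yield the
// observed garbage; è (U+00E8) is C3 A8 in UTF-8, which windows-1252
// displays as "Ã" followed by "¨".
public class MojibakeDemo {
    public static void main(String[] args) {
        byte[] utf8 = "système".getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, Charset.forName("windows-1252"));
        System.out.println(misread); // système
    }
}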
We are having problems with text that is encoded in some different ways but kept in a single column in a table. Long story. On MySQL, I can do "select hex(str) from table where ..." and I see the bytes of the string exactly as I set them.
On Oracle, I have a string which starts with the Turkish character İ, the Unicode character 0x0130 "LATIN CAPITAL LETTER I WITH DOT ABOVE". This is in my printed copy of the Unicode Version 2.0 book. In UTF-8, this character is 0xc4b0.
We have very old client apps we need to support. They would send us this text in "windows-1254". We used to just close our eyes, store it, and hand it back later. Now we need the Unicode, or are being given the Unicode.
So I have:
SQL> select id, name from table where that thing;
ID NAME
------ ------------------------
746 Ý
This makes sense because "İ" is 0xdd in windows-1254, and 0xdd in windows-1252 is "Ý". My terminal is presumably set to the usual windows-1252.
But:
SQL> select id, rawtohex(name) from table where that thing;
ID RAWTOHEX(NAME)
------ ------------------------
746 C39D
There seems to be no Oracle equivalent of MySQL's hex(name) function. But I must be missing something. What am I missing here?
My Java code has to take the UTF-8 that I am supplied and save a UTF-8 copy and a windows-1254 copy. The Java code gives me:
bytes (utf8): c4 b0
bytes (1254): dd
Yet, when I save it, the client does not get the correct character. And when I try to see what Oracle has actually stored, I get the garbage seen above. I have no idea where the C39D is coming from. Any suggestions?
We have ojdbc14.jar built into all of our applications and we are connecting to a database that says it is "Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production".
Use the dump function to see how Oracle stores data internally.
You seem to have a misunderstanding of how Oracle treats VARCHAR2 character set conversions: you can't influence how Oracle stores its data physically. (Also, if you haven't already, it's helpful to read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.)
Your client speaks to Oracle only in binary. In fact, all systems exchange information in binary only. To understand each other, both systems must know what language (character set) is being used.
In your case we can reconstruct what happens:
Your client sends the byte dd to Oracle and says it is windows-1252 (instead of 1254).
Oracle looks up its character set table and sees that this data is translated to the symbol Ý in this character set.
Oracle logically stores this information in its table.
Since Oracle is set up in UTF-8, it converts this data to the UTF-8 binary representation of Ý:
SQL> SELECT rawtohex('Ý') FROM dual;
RAWTOHEX('Ý')
--------------
C39D
Oracle stores C39D internally.
As you can see, the problem comes from the first step: it is a setup problem. As long as you don't fix this, the systems won't be able to dialogue successfully.
The conversion is automatic when you use VARCHAR2 because this datatype is a logical text symbol interface (you have next to no control over forcing the actual binary data being stored).
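The whole chain can be confirmed from Java in a few lines (a sketch; the charset names are the standard JDK aliases):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: replays the reconstruction above. The byte DD decoded as
// windows-1252 is "Ý", whose UTF-8 form is C3 9D, exactly what Oracle stored.
public class CharsetChainDemo {
    public static void main(String[] args) {
        byte[] sent = { (byte) 0xDD }; // "İ" in windows-1254
        String seenByOracle = new String(sent, Charset.forName("windows-1252"));
        System.out.println(seenByOracle); // Ý
        for (byte b : seenByOracle.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X", b); // prints C39D
        }
        System.out.println();
    }
}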
I have bytes in UTF-8 to begin with.
String strFromUTF8 = new String(bytes, "UTF8");
byte[] strInOldStyle = strFromUTF8.getBytes("Cp1254");
With MySQL, I am done. I take these bytes, turn them into a hex string, and do an update with unhex(hexStr). This allows me to put the legacy bytes into a varchar column.
With Oracle, I must do:
String again = new String(strInOldStyle, "Cp1252"); // decode as the mislabeled connection charset, giving Ý
byte[] nextOldBytes = again.getBytes("UTF8");       // C3 9D
Now, I can do an update and get the bytes into a varchar2 column with:
update table set colName = UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('hexStr')) where ...
Strange, no? I am sure I have made this more complex than it needed to be.
What we see is this, though,
"İ" in UTF-8 == 0xc4d0
"İ" in Cp1254 == 0xdd == "Ý" in Cp1252
"Ý" in UTF-8 == 0xc3d9
So, if I get the string "İ" and do:
update table set name = UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C39D')) where ...
Then our legacy client gives us a "İ". Yep. It works.
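Put end to end, the round trip the workaround relies on looks like this (a sketch; it assumes the connection is mislabeled as windows-1252 while the legacy client actually speaks windows-1254):

import java.nio.charset.StandardCharsets;

// Sketch of the full round trip: İ -> legacy windows-1254 byte DD ->
// reinterpreted as windows-1252 (the mislabeled connection) -> Ý ->
// UTF-8 bytes C3 9D, which is what HEXTORAW('C39D') writes to the column.
public class LegacyRoundTrip {
    public static void main(String[] args) throws Exception {
        String original = "\u0130";                             // İ
        byte[] legacy = original.getBytes("windows-1254");      // DD
        String mislabeled = new String(legacy, "windows-1252"); // Ý
        byte[] stored = mislabeled.getBytes(StandardCharsets.UTF_8);
        for (byte b : stored) System.out.printf("%02X", b);     // C39D
        System.out.println();
    }
}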
When using the Oracle ascii function:
select ascii('A') from dual;
it returns 65, which is correct.
But when I use:
select ascii('周') from dual;
the result is 55004. Can ascii return values greater than 255? How is that explained?
My Oracle version: Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
My character set: NLS_CHARACTERSET ZHS16GBK
ASCII in the name is a holdover from when Oracle only supported ASCII. It does not mean it only returns ASCII values.
From the docs:
ASCII returns the decimal representation in the database character set of the first character of char.
http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions013.htm#sthref933
So the result depends on the database character set, which can be greater than 255.
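Concretely for this database: 55004 is 0xD6DC, i.e. the two ZHS16GBK bytes of 周 read as a single number. A sketch that checks this from Java (it assumes the JDK ships the extended charsets, which include GBK):

import java.nio.charset.Charset;

// Sketch: combine the two GBK bytes of 周 into one integer; per the question
// this should print 55004 (0xD6DC), matching ASCII('周') on a ZHS16GBK database.
public class GbkAsciiDemo {
    public static void main(String[] args) {
        byte[] b = "\u5468".getBytes(Charset.forName("GBK")); // 周
        int value = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
        System.out.printf("%d (0x%X)%n", value, value);
    }
}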
This may vary with your version of Oracle, but it is probably trying to do you the favor of gracefully handling the non-7-bit value that you are passing (but arguably should not be). The documentation for at least one version discusses some handling of non-ASCII inputs (http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions007.htm), though if you are using a different version of Oracle you may want to refer to the appropriate docs.
If your docs don't say anything more about how it handles non-7-bit characters, then the behavior is probably not well defined (i.e. no guarantee from Oracle), and you may want to consider cleansing your input so that you only call the ASCII function on values that you know to be in the proper input set.
I've run into an odd problem with a legacy ASP application that uses Oracle 9i as its database. We recently had the DBA increase the size of a VARCHAR2 field from VARCHAR2(2000) to VARCHAR2(4000). The problem is that a NULL byte (ASCII 0) is inserted at the 2001st character position whenever the inserted string is longer than 2000 characters. For example, if the application inserts a new record with a string of 2500 characters, then 2501 characters are stored in the VARCHAR2 field, with the NULL byte at the 2001st position. The NULL byte is not part of the original posted data that was saved to the database, and it is causing us some grief. Have any of you ever come across something like this? The application is using the MSDAORA ODBC driver, and I'm thinking that the driver is adding the terminating NULL character to its string buffer, which may have an internal limit of 2000 characters.
Any ideas?
Thanks.
I really don't think it is Oracle doing that. You could always use RTRIM to remove any white space in your insert statement.
Go for a 10046 trace, level 4:
alter session set events '10046 trace name context forever, level 4';
That will create a trace file on the server which will contain the bind values (so you can see whether the null byte is part of the string being pushed into the SQL).
Are there any character set oddities (e.g. are you using a multi-byte character set)? The more oddities in the environment, the more likely you are to find a bug.
A last thought: try a CLOB instead of a VARCHAR2(4000).
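To confirm from the application side where the NUL actually lands, a quick scan over the column is enough. A hedged sketch in JDBC, since Java is the language used elsewhere in this thread; the connection string, table, and column names are placeholders, not from the original post:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: reports every row whose text column contains an embedded NUL byte
// and the 0-based index where it occurs (the post's "2001st character"
// would show up as index 2000).
public class NulByteScan {
    public static void main(String[] args) throws SQLException {
        try (Connection c = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass");
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT id, big_text FROM legacy_table")) {
            while (rs.next()) {
                String v = rs.getString(2);
                int pos = (v == null) ? -1 : v.indexOf('\0');
                if (pos >= 0) {
                    System.out.println("id=" + rs.getLong(1) + " NUL at index " + pos);
                }
            }
        }
    }
}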