Oracle adds NULL Byte (ASCII: 0) to varchar2 string - oracle

I've run into an odd problem with a legacy ASP application that uses Oracle 9i as its database. We recently had the DBA increase the size of a VARCHAR2 field from VARCHAR2(2000) to VARCHAR2(4000). The problem is that a NULL byte (ASCII: 0) is inserted into the 2001 character position in the string even if the string inserted is larger than 2000 characters. For example, if the application inserts a new record into the table with a string of 2500 characters, then 2501 characters are stored in the VARCHAR2 field and the NULL byte is inserted at the 2001 character position. The NULL byte is not part of the original posted data that was saved to the database and is causing us some grief. Have any of you ever come across something like this? The application is using the MSDAORA ODBC driver and I'm thinking that the driver is adding the terminating NULL character to its string buffer which may have a internal limit of 2000 characters.
Any ideas?
Thanks.

I really don't think it is ORACLE doing that. You could always use RTRIM to remove any white space in your insert statement.

Go for a 10046 trace level 4
alter session set events '10046 trace name context forever, level 4';
That will create a trace file on the server which will contain the bind value (so you can see whether the null byte is part of string being pushed into the SQL).
Are there any character set oddities (eg are you using a multi-byte character set) ? The more oddities in the environment, the more likely you are to find a bug.
Last thought is try a CLOB instead of a VARCHAR2(4000)

Related

Spring to Oracle UTF-8 character truncation

I have an issue with invalid characters appearing in an Oracle database. The "¿" or upside-down question mark. I know its caused by UTF8 encoding's being put into the encoding of the Oracle database. Examples of characters I am expecting are ' – ' which looks like a normal hyphen but isnt, and ' ’ ' which should be a normal single quote.
Input: "2020-08-31 – 2020-12-31" - looks like normal hyphen but not
Output: "2020-08-31 ¿ 2020-12-31"
I know the primary source of the characters are copy and paste from office programs like word and its conversion to smart quotes and stuff. Getting the users to turn off this feature is not a viable solution.
There was a good article about dealing with this and they offered a number of solutions:
switch database to UTF8 - cant do it due to reasons
change varchar2 to nvarchar2 - I tried this solution as it would be the best for my needs, but when tested I was still ending up with invalid characters, so its not the only point.
use blob/clob - tried and got page crashes
filter - not attractive as there are a lot of places that would need to be updated.
So the technologies being used are Eclipse, Spring, JDBC version 4.2, and Oracle 12.
So the information is entered into the form, the form gets saved, it passes from the controller into the DAO and when its checked here, the information is correct. Its when it passes here into the JDBC where I lose sight of it, and once it enters the database I cant tell if its already changed or if thats where its happening, but the database stores the invalid character.
So somewhere between the insert statement to the database this is happening. So its either the JDBC or the database, but how to tell and how to fix? I have changed the field where the information is being stored, its originally a varchar2 but I tried nvarchar2 and its still invalid.
I know that this question is asking about the same topic, but the answers do not help me.
SELECT * FROM V$NLS_PARAMETERS WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
PARAMETER VALUE CON_ID
---------------------------------------------------------------- ---------------------------------------------------------------- ----------
NLS_CHARACTERSET WE8ISO8859P15 0
NLS_NCHAR_CHARACTERSET AL16UTF16 0
So in the comments I was given a link to something that might help but its giving me an error.
#Override public Long createComment(Comment comment) {
return jdbcTemplate.execute((Connection connection) -> {
PreparedStatement ps = connection.prepareStatement("INSERT INTO COMMENTS (ID, CREATOR, CREATED, TEXT) VALUES (?,?,?,?)", new int[] { 1 });
ps.setLong(1, comment.getFileNumber());
ps.setString(2, comment.getCreator());
ps.setTimestamp(3, Timestamp.from(comment.getCreated()));
((OraclePreparedStatement) ps).setFormOfUse(4, OraclePreparedStatement.FORM_NCHAR);
ps.setString(4, comment.getText());
if(ps.executeUpdate() != 1) return null;
ResultSet rs = ps.getGeneratedKeys();
return rs.next() ? rs.getLong(1) : null;
});
An exception occurred: org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement cannot be cast to oracle.jdbc.OraclePreparedStatement
Character set WE8ISO8859P15 (resp. ISO 8859-15) does not contain – or ’, so you cannot store them in a VARCHAR2 data field.
Either you migrate the database to Unicode (typically AL32UTF8) or use NVARCHAR2 data type for this column.
I am not familiar with Java but I guess you have to use getNString. The linked document should provide some hints to solve the issue.

Why is the ora-archive-state column a varchar2 4000 chars?

Can someone explain why Oracle made the ora-archive-state column a varchar2 of 4000 chars? When using the in-database archiving feature of Oracle 12c, when the column is 0, the record is visible. When anything other than 0, the record is hidden.
What's the purpose of having the extra 3999 chars when simply setting the column to 1 accomplishes the goal? I'm doubting Oracle is just wasting the space.
Because it allows you to mark "archived" rows differently: you can update ORA_ARCHIVE_STATE to different values, for example: to_char(systimestamp,'yyyy-mm-dd hh24:mi:ssxff')
to set it to the date of archiving. And later you can analyze archived records by this column.
I'm doubting Oracle is just wasting the space.
Varchar2 doesn't waste space. It is variable-length character string. Ie varchar2(4000b) doesn't mean it will use 4000 bytes, or varchar2(4000c) ~ chars. That's just maximum allowed column length

Displaying the hex value of a string from a oracle varchar2?

We are having problems with text that is encoded in some different ways but kept in a single column in a table. Long story. On MySQL, I can do "select hex(str) from table where" and I see the bytes of the string exactly as I set them.
On Oracle, I have a string which starts with the Turkish character İ, which is the Unicode character 0x0130 "LATIN CAPITAL LETTER WITH DOT ABOVE". This is in my printed copy of the Unicode Version 2.0 book. In UTF-8, this character is 0xc4b0.
We have very old client apps we need to support. They would send us this text in "windows-1254". We used to just close our eyes, store it, and hand it back later. Now we need the Unicode, or are being given the Unicode.
So I have:
SQL> select id, name from table where that thing;
ID NAME
------ ------------------------
746 Ý
This makes sense because the "İ" is 0xdd in windows-1254 and 0xdd in wondows-1252 is "Ý". My terminal is presumably set to the usual windows-1252.
But:
SQL> select id, rawtohex(name) from table where that thing;
ID RAWTOHEX(NAME)
------ ------------------------
746 C39D
There seems to be no equivalent to the hex(name) function in MySQL. But I must be missing something. What am I missing here?
My java code has to take the utf8 that I am supplied and save a utf8 copy and a windows-1252 copy. The java code gives me:
bytes (utf8): c4 b0
bytes (1254): dd
Yet, when I save it, the client does not get the correct character. And when I try to see what Oracle has actually stored, i get the garbage seen above. I have no idea where the C39D is coming from. Any suggestions?
We have ojdbc14.jar built into all of our applications and we are connecting to a database that says it is "Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production".
Use the dump function to see how Oracle stores data internally.
You seem to have a misunderstanding on how Oracle treats VARCHAR2 characters set conversions: you can't influence how Oracle stores its data physically. (Also if you haven't already, it's helpful to read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets).
Your client speaks to Oracle only in binary. In fact all systems exchange information in binary only. To understand each others, it is necessary that both systems know what language (character set) is being used.
In your case we can reconstruct what happens:
Your client sends the byte dd to Oracle and says it is windows-1252 (instead of 1254).
Oracle looks up its character set table and sees that this data is translated to the symbol Ý in this character set.
Oracle logically stores this information in its table.
Since Oracle is setup in UTF-8, it converts this data to the UTF-8 binary reprensentation of Ý:
SQL> SELECT rawtohex('Ý') FROM dual;
RAWTOHEX('Ý')
--------------
C39D
Oracle stores C39D internally.
As you can see, the problem comes from the first step: there is a problem of setup. As long as you don't fix this, the systems won't be able to successfully dialogue.
The conversion is automatic when you use VARCHAR2 because this datatype is a logical text symbol interface (you have next to no control over forcing the actual binary data being stored).
I have bytes in UTF-8 to begin.
String strFromUTF8 = new String(bytes, "UTF8");
byte[] strInOldStyle = strFromUTF8.getBytes("Cp1254");
With MySQL, I am done. I takes these bytes, turn them into a hex string and do an update with unhex(hexStr). This allows me to put the legacy bytes into a varchar column.
With Oracle, I must do:
String again = new String(strInOldStyle, "Cp1254");
byte[] nextOldBytes = again.getBytes("UTF8");
Now, I can do an update and get the bytes into a varchar2 column with:
update table set colName = UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('hexStr')) where ...
Strange, no? I am sure I have made this more complex than it needed to be.
What we see is this, though,
"İ" in UTF-8 == 0xc4d0
"İ" in Cp1254 == 0xdd == "Ý" in Cp1252
"Ý" in UTF-8 == 0xc3d9
So, if I get the string "İ" and do:
update table set name = UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C3D9')) where ...
Then our legacy client gives us a "İ". Yep. It works.

Defining a Character Set for a column For oracle database tables

I am running following query in SQL*Plus
CREATE TABLE tbl_audit_trail (
id NUMBER(11) NOT NULL,
old_value varchar2(255) NOT NULL,
new_value varchar2(255) NOT NULL,
action varchar2(20) CHARACTER SET latin1 NOT NULL,
model varchar2(255) CHARACTER SET latin1 NOT NULL,
field varchar2(64) CHARACTER SET latin1 NOT NULL,
stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id NUMBER(11) NOT NULL,
model_id varchar2(65) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (id),
KEY idx_action (action)
);
I am getting following error:
action varchar2(20) CHARACTER SET latin1 NOT NULL,
*
ERROR at line 5:
ORA-00907: missing right parenthesis
Can you suggest what am I missing?
The simple answer is that, unlike MySQL, character sets can't be defined at column (or table) level. Latin1 is not a valid Oracle character set either.
Character sets are consistent across the database and will have been specified when you created the database. You can find your character by querying NLS_DATABASE_PARAMETERS,
select value
from nls_database_parameters
where parameter = 'NLS_CHARACTERSET'
The full list of possible character sets is available for 11g r2 and for 9i or you can query V$NLS_VALID_VALUES.
It is possible to use the ALTER SESSION statement to set the NLS_LANGUAGE or the NLS_TERRITORY, but unfortunately you can't do this for the character set. I believe this is because altering the language changes how Oracle would display the stored data whereas changing the character set would change how Oracle stores the data.
When displaying the data, you can of course specify the required character set in whichever client you're using.
Character set migration is not a trivial task and should not be done lightly.
On a slight side note why are you trying to use Latin 1? It would be more normal to set up a new database in something like UTF-8 (otherwise known as AL32UTF8 - don't use UTF8) or UTF-16 so that you can store multi-byte data effectively. Even if you don't need it now it's wise to attempt - no guarantees in life - to future proof your database with no need to migrate in the future.
If you're looking to specify differing character sets for different columns in a database then the better option would be to determine if this requirement is really necessary and to try to remove it. If it is definitely necessary1 then your best bet might be to use a character set that is a superset of all potential character sets. Then, have some sort of check constraint that limits the column to specific hex values. I would not recommend doing this at all, the potential for mistakes to creep in is massive and it's extremely complex. Furthermore, different character sets render different hex values differently. This, in turn, means that you need to enforce that a column is rendered in a specific character, which is impossible as it falls outside the scope of the database.
1. I'd be interested to know the situation
According to provided DDL statement it's some need to use 2 character sets. The implementation of this functionality in Oracle is different from MySQL and done with n* data types like nvarchar2, nchar... Latin1 is similar to some Western European character set that might be default. So you able to define for example "Latin1" (WE**) and some Unicode (UTF8..).
The NVARCHAR2 datatype was introduced by Oracle for databases that want to use Unicode for some columns while keeping another character set for the rest of the database (which uses VARCHAR2). The NVARCHAR2 is a Unicode-only datatype.
The reason you want to use NVARCHAR2 might be that your DB uses a non-Unicode character and you still want to be able to store Unicode data for some columns.
Columns in your example would be able to store the same data, however the byte storage will be different.

What is the difference between varchar and varchar2 in Oracle?

What is the difference between varchar and varchar2?
As for now, they are synonyms.
VARCHAR is reserved by Oracle to support distinction between NULL and empty string in future, as ANSI standard prescribes.
VARCHAR2 does not distinguish between a NULL and empty string, and never will.
If you rely on empty string and NULL being the same thing, you should use VARCHAR2.
Currently VARCHAR behaves exactly the same as VARCHAR2. However, the type VARCHAR should not be used as it is reserved for future usage.
Taken from: Difference Between CHAR, VARCHAR, VARCHAR2
Taken from the latest stable Oracle production version 12.2:
Data Types
The major difference is that VARCHAR2 is an internal data type and VARCHAR is an external data type. So we need to understand the difference between an internal and external data type...
Inside a database, values are stored in columns in tables. Internally, Oracle represents data in particular formats known as internal data types.
In general, OCI (Oracle Call Interface) applications do not work with internal data type representations of data, but with host language data types that are predefined by the language in which they are written. When data is transferred between an OCI client application and a database table, the OCI libraries convert the data between internal data types and external data types.
External types provide a convenience for the programmer by making it possible to work with host language types instead of proprietary data formats. OCI can perform a wide range of data type conversions when transferring data between an Oracle database and an OCI application. There are more OCI external data types than Oracle internal data types.
The VARCHAR2 data type is a variable-length string of characters with a maximum length of 4000 bytes. If the init.ora parameter max_string_size is default, the maximum length of a VARCHAR2 can be 4000 bytes. If the init.ora parameter max_string_size = extended, the maximum length of a VARCHAR2 can be 32767 bytes
The VARCHAR data type stores character strings of varying length. The first 2 bytes contain the length of the character string, and the remaining bytes contain the string. The specified length of the string in a bind or a define call must include the two length bytes, so the largest VARCHAR string that can be received or sent is 65533 bytes long, not 65535.
A quick test in a 12.2 database suggests that as an internal data type, Oracle still treats a VARCHAR as a pseudotype for VARCHAR2. It is NOT a SYNONYM which is an actual object type in Oracle.
SQL> select substr(banner,1,80) from v$version where rownum=1;
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> create table test (my_char varchar(20));
Table created.
SQL> desc test
Name Null? Type
MY_CHAR VARCHAR2(20)
There are also some implications of VARCHAR for ProC/C++ Precompiler options. For programmers who are interested, the link is at: Pro*C/C++ Programmer's Guide
After some experimentation (see below), I can confirm that as of September 2017, nothing has changed with regards to the functionality described in the accepted answer:-
Rextester demo for Oracle 11g:
Empty strings are inserted as NULLs for both VARCHAR
and VARCHAR2.
LiveSQL demo for Oracle 12c: Same results.
The historical reason for these two keywords is explained well in an answer to a different question.
VARCHAR can store up to 2000 bytes of characters while VARCHAR2 can store up to 4000 bytes of characters.
If we declare datatype as VARCHAR then it will occupy space for NULL values. In the case of VARCHAR2 datatype, it will not occupy any space for NULL values. e.g.,
name varchar(10)
will reserve 6 bytes of memory even if the name is 'Ravi__', whereas
name varchar2(10)
will reserve space according to the length of the input string. e.g., 4 bytes of memory for 'Ravi__'.
Here, _ represents NULL.
NOTE: varchar will reserve space for null values and varchar2 will not reserve any space for null values.
Currently, they are the same. but previously
Somewhere on the net, I read that,
VARCHAR is reserved by Oracle to support distinction between NULL and empty string in future, as ANSI standard prescribes.
VARCHAR2 does not distinguish between a NULL and empty string, and never will.
Also,
Emp_name varchar(10) - if you enter value less than 10 digits then remaining space cannot be deleted. it used total of 10 spaces.
Emp_name varchar2(10) - if you enter value less than 10 digits then remaining space is automatically deleted

Resources