I've installed Oracle Database 10g Express Edition (Universal) with the default settings:
SELECT * FROM NLS_DATABASE_PARAMETERS;
NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
Given that both CHAR and NCHAR data types seem to accept multi-byte strings, what is the exact difference between these two column definitions?
VARCHAR2(10 CHAR)
NVARCHAR2(10)
The NVARCHAR2 datatype was introduced by Oracle for databases that want to use Unicode for some columns while keeping another character set for the rest of the database (which uses VARCHAR2). The NVARCHAR2 is a Unicode-only datatype.
One reason you may want to use NVARCHAR2 is that your DB uses a non-Unicode character set and you still want to be able to store Unicode data for some columns without changing the primary character set. Another reason might be that you want to use two Unicode character sets (for example AL32UTF8 for data that comes mostly from western Europe, AL16UTF16 for data that comes mostly from Asia), because different character sets won't store the same data equally efficiently.
Both columns in your example (Unicode VARCHAR2(10 CHAR) and NVARCHAR2(10)) would be able to store the same data; however, the byte storage will be different. Some strings may be stored more efficiently in one or the other.
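To make the storage difference concrete, here is a small Python sketch (not Oracle code) that compares the encoded size of two sample strings. It assumes that Python's "utf-8" and "utf-16-be" codecs produce the same byte counts as Oracle's AL32UTF8 and AL16UTF16; the function name and the sample strings are made up for illustration.

```python
# Compare how many bytes the same text needs in UTF-8 vs UTF-16.
# Assumption: Python's "utf-8" / "utf-16-be" codecs stand in for
# Oracle's AL32UTF8 / AL16UTF16 (no byte order mark in either case).

def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in both encodings."""
    return {
        "AL32UTF8": len(text.encode("utf-8")),
        "AL16UTF16": len(text.encode("utf-16-be")),
    }

western = "K\u00e4rcher"   # "Kärcher": 7 characters, mostly ASCII
japanese = "こんにちは"      # 5 characters, none of them ASCII

for sample in (western, japanese):
    print(sample, encoded_sizes(sample))
```

The mostly-ASCII string is smaller in UTF-8, while the Japanese string is smaller in UTF-16, which is the trade-off described above.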
Note also that some features won't work with NVARCHAR2, see this SO question:
Oracle Text will not work with NVARCHAR2. What else might be unavailable?
I don't think the answer from Vincent Malgrat is correct. When NVARCHAR2 was introduced a long time ago, nobody was even talking about Unicode.
Initially, Oracle provided VARCHAR2 and NVARCHAR2 to support localization. Common data (including PL/SQL) was held in VARCHAR2, most likely US7ASCII in those days. You could then set NLS_NCHAR_CHARACTERSET individually (e.g. WE8ISO8859P1) for each customer in any country without touching the common part of your application.
Nowadays the default character set is AL32UTF8, which fully supports Unicode. In my opinion there is no longer any reason to use NLS_NCHAR_CHARACTERSET, i.e. NVARCHAR2, NCHAR, NCLOB. Note that more and more native Oracle functions do not support NVARCHAR2, so you should really avoid it. Perhaps the only remaining reason is when you mainly have to support Asian characters, where AL16UTF16 consumes less storage than AL32UTF8.
The NVARCHAR2 data type stores variable-length character data. When you create a table with an NVARCHAR2 column, the maximum size is always in character length semantics, which is the default and only length semantics for the NVARCHAR2 data type.
The NVARCHAR2 data type uses the national character set, AL16UTF16 by default, which encodes Unicode data in UTF-16. AL16UTF16 uses 2 bytes for most characters (supplementary characters take 4 bytes). The maximum byte length of an NVARCHAR2 therefore depends on the configured national character set.
VARCHAR2: the maximum size of a VARCHAR2 can be given in either bytes or characters. A VARCHAR2 column can only store characters in the database character set, while NVARCHAR2 can store virtually any character. In AL32UTF8, a single character may require up to 4 bytes.
By defining the field as:
VARCHAR2(10 CHAR): you tell Oracle it can use enough space to store 10 characters, no matter how many bytes it takes to store each one. A single character may require up to 4 bytes.
NVARCHAR2(10): you tell Oracle it can store 10 characters with 2 bytes per character.
In Summary:
VARCHAR2(10 CHAR) can store a maximum of 10 characters and a maximum of 40 bytes (depends on the configured database character set).
NVARCHAR2(10) can store a maximum of 10 characters and a maximum of 20 bytes (depends on the configured national character set).
Note: Character set can be UTF-8, UTF-16,....
Please have a look at this tutorial for more detail.
Have a good day!
nVarchar2 is a Unicode-only storage.
Though both data types are variable length String datatypes, you can notice the difference in how they store values.
Each character is stored in bytes. Not every script needs the same number of bytes per character: for example, a letter of the English alphabet needs 1 byte, whereas languages like Japanese or Chinese need more than 1 byte to store a character.
When you specify varchar2(10) with the default byte length semantics, you are telling the DB that at most 10 bytes of data will be stored. But when you say nVarchar2(10), it means up to 10 characters will be stored. In this case, you don't have to worry about the number of bytes each character takes.
In Oracle 18c I am not able to create a table with an NVARCHAR2 column with length > 2000:
Error report -
ORA-00910: specified length too long for its datatype
00910. 00000 - "specified length too long for its datatype"
*Cause: for datatypes CHAR and RAW, the length specified was > 2000;
otherwise, the length specified was > 4000.
*Action: use a shorter length or switch to a datatype permitting a
longer length such as a VARCHAR2, LONG CHAR, or LONG RAW
Which is strange, because MAX_STRING_SIZE is STANDARD, so I should be able to store up to 4000.
What should be changed in DB setting to allow it?
With MAX_STRING_SIZE = STANDARD the limit for VARCHAR2, NVARCHAR2, and RAW types in Oracle SQL is 4000 bytes.
When you specify VARCHAR2(size) the size is interpreted as a byte length by default. Therefore you can specify up to VARCHAR2(4000).
When you specify NVARCHAR2(size) the size is interpreted as character length. The relevant character set is the national character set, which is AL16UTF16 by default. And for AL16UTF16 the multiplier to convert character length to byte length is 2. Therefore, you can specify up to NVARCHAR2(2000) because this converts to 4000 bytes.
Setting MAX_STRING_SIZE = EXTENDED increases the capacity of NVARCHAR2 to 32767 bytes, which means you could specify up to NVARCHAR2(16383).
Note that changing MAX_STRING_SIZE in an existing database is more complicated than simply setting a parameter because metadata update is required for some database objects to account for the increased VARCHAR2, NVARCHAR2, and RAW capacity. The process is explained in the Oracle Reference Manual.
Note that Oracle's Autonomous Database Cloud Service on Shared Infrastructure (ADB-S) is set up with MAX_STRING_SIZE = EXTENDED by default!
If you are interested in more details, there are some nuances:
Are you sure you need to use NVARCHAR2 at all? Typically if you choose a database character set that supports Unicode, such as AL32UTF8, then you can avoid NCHAR, NVARCHAR2, and NCLOB data types.
It is possible to specify a VARCHAR2 type with character length semantics, by using the optional CHAR keyword as in VARCHAR2(1000 CHAR) or by setting the NLS_LENGTH_SEMANTICS parameter in the session to CHAR.
Strictly speaking, for NVARCHAR2(size) the size is not counted in characters but in code units of the national character set. Some characters (those outside the Basic Multilingual Plane) use two code units in AL16UTF16.
The national character set is AL16UTF16 by default, but could also be UTF8. The Oracle documentation for the NVARCHAR2 type explains the NVARCHAR2 limits for the different combinations of MAX_STRING_SIZE and national character set:
The minimum value of size is 1. The maximum value is:
16383 if MAX_STRING_SIZE = EXTENDED and the national character set is AL16UTF16
32767 if MAX_STRING_SIZE = EXTENDED and the national character set is UTF8
2000 if MAX_STRING_SIZE = STANDARD and the national character set is AL16UTF16
4000 if MAX_STRING_SIZE = STANDARD and the national character set is UTF8
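The four limits quoted above follow from dividing the byte budget by a per-character-set divisor. The Python sketch below just reproduces that arithmetic; the names BYTE_LIMIT, DIVISOR and max_nvarchar2_size are made up for illustration and are not Oracle identifiers. The last lines also show, under the assumption that Python's "utf-16-be" codec matches AL16UTF16, that a character outside the Basic Multilingual Plane occupies two UTF-16 code units.

```python
# Reproduce the documented NVARCHAR2(size) maxima from the byte budget.
# All names here are illustrative; the numbers come from the list above.

BYTE_LIMIT = {"STANDARD": 4000, "EXTENDED": 32767}

# Divisor implied by the documented limits (not "bytes per character":
# the UTF8 limits equal the byte budget itself).
DIVISOR = {"AL16UTF16": 2, "UTF8": 1}

def max_nvarchar2_size(max_string_size: str, national_charset: str) -> int:
    """Largest size accepted in NVARCHAR2(size) for the given settings."""
    return BYTE_LIMIT[max_string_size] // DIVISOR[national_charset]

for mode in ("STANDARD", "EXTENDED"):
    for charset in ("AL16UTF16", "UTF8"):
        print(mode, charset, max_nvarchar2_size(mode, charset))

# One supplementary character (U+1F600) needs two UTF-16 code units.
emoji = "\U0001F600"
utf16_units = len(emoji.encode("utf-16-be")) // 2
print("UTF-16 code units for U+1F600:", utf16_units)
```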
We cannot store more than 1333 Thai characters from Java code. Is there any possible way around this other than using the CLOB data type in the db? We are using Oracle 11g.
Simply, no (I assume you use the VARCHAR2 data type), except on Oracle 12c with EXTENDED string sizes.
VARCHAR2 columns allow 4000 bytes in normal mode and up to 32767 in extended.
Thai requires multibyte characters, which is why more than 1333 characters can take more than 4000 bytes.
NVARCHAR2 columns allow 2000 characters in normal mode and up to 16383 in extended.
What is the db character set ?
I suspect your scenario is as follows:
AL32UTF8 is the db character set.
The VARCHAR2 column(s) in your table(s) have byte semantics.
The UTF-8 encoding represents each Thai character in 3 bytes, so you hit the length limit at 1333 characters instead of 4000.
You can change the length semantics from byte to char with ALTER TABLE <table> MODIFY (<column> VARCHAR2(n CHAR)); (ref.: see here). Note that with standard string sizes a VARCHAR2 column is still capped at 4000 bytes, whatever the length semantics.
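The figure of 1333 can be checked with a few lines of Python. This is only an arithmetic sketch: it assumes that Python's "utf-8" codec yields the same byte counts as AL32UTF8, and the variable names are made up for illustration.

```python
# Why 1333? A 4000-byte column divided by 3 bytes per Thai character.
VARCHAR2_BYTE_LIMIT = 4000           # VARCHAR2 limit with standard string sizes

thai_char = "\u0e01"                 # THAI CHARACTER KO KAI
bytes_per_char = len(thai_char.encode("utf-8"))
max_thai_chars = VARCHAR2_BYTE_LIMIT // bytes_per_char

print(bytes_per_char, max_thai_chars)

# One more character no longer fits into the byte budget.
too_long = thai_char * (max_thai_chars + 1)
print(len(too_long.encode("utf-8")) > VARCHAR2_BYTE_LIMIT)
```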
For the sake of completeness: in case you are operating with a single-byte db character set like WE8ISO8859P11 (ISO 8859-11, Thai script), characters can be composed from base characters and diacritical marks. In that case you might have success in changing the encoding in the data source to use the code points for composite characters. However, I feel this scenario is unlikely, given that each of your test data characters would have to be composed from three parts to match the observation.
I have a problem inserting special characters (á é í ú or ñ) in a char(1) field.
CREATE TABLE sgc2."tabtest2"(field1 CHAR(1), field2 VARCHAR(1));
INSERT INTO sgc2."tabtest2" values('á', 'á');
ERROR:
Value "á" is too long.. SQLCODE=-433, SQLSTATE=22001, DRIVER=4.13.111
Apparently these characters take two bytes each, and since the field only accepts one byte, the insert fails.
Is there any way to configure the database, to support these special characters taking only 1 byte?
Apparently your database was created with the Unicode codeset, where special characters are represented by multiple bytes. If you only need to represent a limited range of accented characters you can choose one of the supported codesets, specified by ISO-8859, for the corresponding language -- details in the manual. You will have to re-create the database using an appropriate CODESET option, as you cannot change the codeset of an existing database.
However, you should consider changing your tables instead, as Unicode gives you more flexibility. A Unicode database can also be a requirement for certain DB2 features, for example BLU Acceleration.
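The size difference between a Unicode codeset and a single-byte ISO-8859 codeset can be seen outside the database too. The Python sketch below assumes that the database stores these characters as UTF-8 in the Unicode case and uses ISO-8859-1 as one example of a single-byte codeset; the function name is made up for illustration.

```python
# Byte length of the accented characters from the question in two encodings.
# Escapes are used so the characters are guaranteed to be precomposed:
# \u00e1 = á, \u00e9 = é, \u00ed = í, \u00fa = ú, \u00f1 = ñ
SPECIAL_CHARS = "\u00e1\u00e9\u00ed\u00fa\u00f1"

def byte_lengths(ch: str) -> tuple:
    """Return (UTF-8 length, ISO-8859-1 length) of a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("iso-8859-1"))

for ch in SPECIAL_CHARS:
    utf8_len, latin1_len = byte_lengths(ch)
    print(ch, "UTF-8:", utf8_len, "ISO-8859-1:", latin1_len)
```

Each of these characters needs two bytes in UTF-8 but only one in ISO-8859-1, which is why CHAR(1) rejects them in a Unicode database.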
I'm using DBIx::Class to fetch data from Oracle (11.2). When the data is fetched, a value such as "Alfred Kärcher" is returned as "Alfred Karcher". I tried setting NLS_LANG and NLS_NCHAR in %ENV, but there is still no change.
I also used the utf8 module to verify that the data is utf8 encoded.
This looks like the Oracle client library converting the data.
Make sure the database encoding is set to AL32UTF8 and the environment variable NLS_LANG to AMERICAN_AMERICA.AL32UTF8.
It might also be possible by setting the ora_(n)charset parameter instead.
The two links from DavidEG contain all the info that's needed to make it work.
You don't need use utf8; in your script but make sure you set STDOUT to UTF-8 encoding: use encoding 'utf8';
Here the problem is with the column data type you specified for storage.
If your column is defined as VARCHAR2(10), Oracle actually stores up to 10 bytes. For English text 10 bytes means 10 characters, but if the data you insert contains special characters such as umlauts, each of those requires 2 bytes, and you end up with ORA-12899: value too large for column.
So if the data you insert is supplied by users from different countries, use VARCHAR2(10 CHAR).
In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multi-byte character set.
In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.
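The difference between the two declarations can be modelled in a few lines of Python. This is a rough sketch, not Oracle's implementation: it assumes an AL32UTF8 database (approximated with Python's "utf-8" codec), ignores the overall 4000-byte cap, and the function name fits is made up for illustration.

```python
# Rough model of the VARCHAR2 length check under BYTE vs CHAR semantics.
# Assumption: the database character set is AL32UTF8 (UTF-8).

def fits(value: str, size: int, semantics: str = "BYTE") -> bool:
    """Return True if `value` would fit into VARCHAR2(size BYTE|CHAR)."""
    if semantics == "BYTE":
        return len(value.encode("utf-8")) <= size
    if semantics == "CHAR":
        return len(value) <= size
    raise ValueError("semantics must be 'BYTE' or 'CHAR'")

city = "D\u00fcsseldorf"  # "Düsseldorf": 10 characters, 11 bytes in UTF-8

print(fits(city, 10, "BYTE"))  # the umlaut pushes it over 10 bytes
print(fits(city, 10, "CHAR"))  # exactly 10 characters
```

With byte semantics the ten-character value is rejected because the umlaut takes two bytes; with character semantics it fits.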
What is the difference between varchar and varchar2?
For now, they are synonyms.
VARCHAR is reserved by Oracle to support distinction between NULL and empty string in future, as ANSI standard prescribes.
VARCHAR2 does not distinguish between a NULL and empty string, and never will.
If you rely on empty string and NULL being the same thing, you should use VARCHAR2.
Currently VARCHAR behaves exactly the same as VARCHAR2. However, the type VARCHAR should not be used as it is reserved for future usage.
Taken from: Difference Between CHAR, VARCHAR, VARCHAR2
Taken from the latest stable Oracle production version 12.2:
Data Types
The major difference is that VARCHAR2 is an internal data type and VARCHAR is an external data type. So we need to understand the difference between an internal and external data type...
Inside a database, values are stored in columns in tables. Internally, Oracle represents data in particular formats known as internal data types.
In general, OCI (Oracle Call Interface) applications do not work with internal data type representations of data, but with host language data types that are predefined by the language in which they are written. When data is transferred between an OCI client application and a database table, the OCI libraries convert the data between internal data types and external data types.
External types provide a convenience for the programmer by making it possible to work with host language types instead of proprietary data formats. OCI can perform a wide range of data type conversions when transferring data between an Oracle database and an OCI application. There are more OCI external data types than Oracle internal data types.
The VARCHAR2 data type is a variable-length string of characters. If the init.ora parameter max_string_size is standard (the default), the maximum length of a VARCHAR2 is 4000 bytes. If max_string_size = extended, the maximum length of a VARCHAR2 is 32767 bytes.
The VARCHAR data type stores character strings of varying length. The first 2 bytes contain the length of the character string, and the remaining bytes contain the string. The specified length of the string in a bind or a define call must include the two length bytes, so the largest VARCHAR string that can be received or sent is 65533 bytes long, not 65535.
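The layout described for the external VARCHAR type (a 2-byte length followed by the string bytes) can be sketched in Python with the struct module. This only illustrates the layout and the 65533 arithmetic; it is not OCI code, the function names are made up, and using the host's native byte order for the length field is an assumption.

```python
import struct

# Layout sketch: [2-byte length][string bytes].
# Assumption: the length field uses the host's native byte order.
MAX_TOTAL = 65535                          # largest length the quoted text mentions
LENGTH_PREFIX = 2                          # bytes used by the length field
MAX_PAYLOAD = MAX_TOTAL - LENGTH_PREFIX    # 65533, as stated above

def pack_varchar(data: bytes) -> bytes:
    """Prefix `data` with its length as an unsigned 2-byte integer."""
    if len(data) > MAX_PAYLOAD:
        raise ValueError("string too long for a 2-byte length prefix buffer")
    return struct.pack("=H", len(data)) + data

def unpack_varchar(buf: bytes) -> bytes:
    """Read the length prefix and return exactly that many string bytes."""
    (length,) = struct.unpack_from("=H", buf)
    return buf[LENGTH_PREFIX:LENGTH_PREFIX + length]

packed = pack_varchar(b"abc")
print(len(packed), unpack_varchar(packed))
```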
A quick test in a 12.2 database suggests that as an internal data type, Oracle still treats a VARCHAR as a pseudotype for VARCHAR2. It is NOT a SYNONYM which is an actual object type in Oracle.
SQL> select substr(banner,1,80) from v$version where rownum=1;
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
SQL> create table test (my_char varchar(20));
Table created.
SQL> desc test
Name Null? Type
MY_CHAR VARCHAR2(20)
There are also some implications of VARCHAR for Pro*C/C++ Precompiler options. For programmers who are interested, the link is at: Pro*C/C++ Programmer's Guide
After some experimentation (see below), I can confirm that as of September 2017, nothing has changed with regards to the functionality described in the accepted answer:-
Rextester demo for Oracle 11g:
Empty strings are inserted as NULLs for both VARCHAR and VARCHAR2.
LiveSQL demo for Oracle 12c: Same results.
The historical reason for these two keywords is explained well in an answer to a different question.
It is sometimes claimed that VARCHAR can store up to 2000 bytes of characters while VARCHAR2 can store up to 4000 bytes, and that VARCHAR occupies space for NULL values while VARCHAR2 does not. According to that claim,
name varchar(10)
will reserve 6 bytes of memory for the name 'Ravi__', whereas
name varchar2(10)
will reserve space according to the length of the input string, e.g. 4 bytes of memory for 'Ravi__'.
Here, _ represents NULL.
NOTE: in current Oracle releases a VARCHAR column is created as VARCHAR2, so both have the same maximum length and neither reserves space for unused positions.
Currently, they are the same, but previously they differed.
Somewhere on the net, I read that:
VARCHAR is reserved by Oracle to support distinction between NULL and empty string in future, as ANSI standard prescribes.
VARCHAR2 does not distinguish between a NULL and empty string, and never will.
Also,
Emp_name varchar(10) - if you enter a value shorter than 10 characters, the remaining space is not released; the column uses all 10 positions.
Emp_name varchar2(10) - if you enter a value shorter than 10 characters, the remaining space is released automatically.