Defining a character set for a column in Oracle database tables

I am running the following query in SQL*Plus:
CREATE TABLE tbl_audit_trail (
id NUMBER(11) NOT NULL,
old_value varchar2(255) NOT NULL,
new_value varchar2(255) NOT NULL,
action varchar2(20) CHARACTER SET latin1 NOT NULL,
model varchar2(255) CHARACTER SET latin1 NOT NULL,
field varchar2(64) CHARACTER SET latin1 NOT NULL,
stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id NUMBER(11) NOT NULL,
model_id varchar2(65) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (id),
KEY idx_action (action)
);
I am getting the following error:
action varchar2(20) CHARACTER SET latin1 NOT NULL,
*
ERROR at line 5:
ORA-00907: missing right parenthesis
Can you suggest what I am missing?

The simple answer is that, unlike MySQL, Oracle does not let you define character sets at column (or table) level. Latin1 is not a valid Oracle character set name either.
The character set is consistent across the database and will have been specified when the database was created. You can find your character set by querying NLS_DATABASE_PARAMETERS:
select value
from nls_database_parameters
where parameter = 'NLS_CHARACTERSET';
The full list of possible character sets is available for 11g r2 and for 9i or you can query V$NLS_VALID_VALUES.
It is possible to use the ALTER SESSION statement to set the NLS_LANGUAGE or the NLS_TERRITORY, but unfortunately you can't do this for the character set. I believe this is because altering the language changes how Oracle would display the stored data whereas changing the character set would change how Oracle stores the data.
When displaying the data, you can of course specify the required character set in whichever client you're using.
Character set migration is not a trivial task and should not be done lightly.
On a slight side note, why are you trying to use Latin 1? It would be more normal to set up a new database in something like UTF-8 (known in Oracle as AL32UTF8; don't use the older UTF8 character set) so that you can store multi-byte data effectively. UTF-16 (AL16UTF16) is only available as the national character set, not as the database character set. Even if you don't need multi-byte data now, it's wise to attempt (no guarantees in life) to future-proof your database so that you don't need to migrate later.
If you're looking to specify differing character sets for different columns in a database, then the better option would be to determine whether this requirement is really necessary and to try to remove it. If it is definitely necessary [1], then your best bet might be to use a character set that is a superset of all the potential character sets, and then have some sort of check constraint that limits the column to specific hex values. I would not recommend doing this at all; the potential for mistakes to creep in is massive and it's extremely complex. Furthermore, different character sets render the same hex values differently. This, in turn, means that you would need to enforce that a column is rendered in a specific character set, which is impossible as it falls outside the scope of the database.
[1] I'd be interested to know the situation.

Judging by the DDL statement provided, there is a need to use two character sets. Oracle implements this differently from MySQL: it is done with the N* data types such as NVARCHAR2 and NCHAR. Latin1 is similar to some of the Western European character sets, one of which might be your default. So you are able to use, for example, "Latin1" (WE**) for the database character set and some Unicode character set (UTF8..) for the national character set.
The NVARCHAR2 datatype was introduced by Oracle for databases that want to use Unicode for some columns while keeping another character set for the rest of the database (which uses VARCHAR2). The NVARCHAR2 is a Unicode-only datatype.
The reason you might want to use NVARCHAR2 is that your DB uses a non-Unicode character set and you still want to be able to store Unicode data in some columns.
Columns in your example would be able to store the same data, however the byte storage will be different.
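To make the point about byte storage concrete, here is a minimal sketch that encodes the same string three ways. The Python codec names are only stand-ins (an assumption, not Oracle identifiers): latin-1 for a Western European single-byte database character set, utf-8 for AL32UTF8, and utf-16-be for the AL16UTF16 national character set. The sample string is arbitrary.

```python
def storage_bytes(text: str, codec: str) -> int:
    """Return how many bytes `text` occupies when encoded with `codec`."""
    return len(text.encode(codec))


# Assumed stand-ins for Oracle character sets (illustrative only):
# latin-1 ~ single-byte Western European, utf-8 ~ AL32UTF8, utf-16-be ~ AL16UTF16.
sample = "Kärcher"
sizes = {codec: storage_bytes(sample, codec)
         for codec in ("latin-1", "utf-8", "utf-16-be")}
print(sizes)
```

The characters stored are identical in all three cases; only the number of bytes per character changes.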

Related

Declaring a CLOB in an Oracle database with a custom charset

Is it possible to declare a UTF-8 CLOB if the database is set up with the following character sets?
PARAMETER VALUE
NLS_CHARACTERSET CL8ISO8859P5
NLS_NCHAR_CHARACTERSET AL16UTF16
I tried passing a charset name to the declaration, but it looks like it can only accept references to character sets of other objects.
declare
clob_1 clob character set "AL32UTF8";
begin
null;
end;
/
I don't think this is possible, see PL/SQL Language Fundamentals
PL/SQL uses the database character set to represent:
Stored source text of PL/SQL units
Character values of data types CHAR, VARCHAR2, CLOB, and LONG
So, in your case you have to use NCLOB which uses AL16UTF16 or try a workaround with BLOB. However, this might become cumbersome.
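As a rough illustration of why a CL8ISO8859P5 database cannot hold arbitrary text in a CLOB, and what the BLOB workaround amounts to, the sketch below uses Python's iso8859_5 codec as a stand-in for the database character set (an assumption). The sample text is made up: the Cyrillic part fits the character set, the "ö" does not. Encoding to UTF-8 bytes before storing and decoding after reading is what the application would have to do itself for a BLOB.

```python
text = "Привет, wörld"  # hypothetical sample: Cyrillic fits ISO 8859-5, "ö" does not

# Would this value survive in the database character set?
try:
    text.encode("iso8859_5")
    fits_database_charset = True
except UnicodeEncodeError:
    fits_database_charset = False

# BLOB-style workaround: the application encodes and decodes on its own,
# so the database only ever sees opaque bytes.
blob_payload = text.encode("utf-8")
restored = blob_payload.decode("utf-8")

print(fits_database_charset, restored == text)
```

The cumbersome part is that every reader and writer of the BLOB has to agree on the encoding, and SQL string functions no longer work on the column.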
As far as I can tell, you can't do that.
Database character set is defined during database creation (and can't be changed unless you recreate the database) and all character datatype columns store data in that character set.
Perhaps you could try with NCLOB data type, where "N" represents "national character set" and it'll store Unicode character data.
Unicode is a universal encoded character set that can store
information in any language using a single character set

How did Unicode characters end up in the database table column?

Recently I came across a Unicode character (\u2019) in a database table column while parsing it using Python.
Question: What are the reasons that can result in Unicode characters showing up in the database table? Is it a data entry issue?
Appreciate any input.
When you set up your Oracle database you choose a character set, which will be used for the SQL character datatypes (CHAR, VARCHAR2 etc.).
Suppose you have chosen your character set and you have a table with a column of VARCHAR2 type. Suddenly you need to store a string containing non-ASCII symbols that are not supported by your database's character set. You may convert this string into an ASCII string, for example by calling the ASCIISTR function, and store it in your VARCHAR2 column. This is not a good idea, because many SQL built-in functions don't understand an escape such as '\u2019'; they treat it as just 6 separate symbols. That's how Unicode may appear in your table column: ASCIISTR converts non-ASCII symbols into an escaped code-unit representation (Oracle itself writes it as '\2019', which is the same character as '\u2019').
Another option is the special Oracle NCHAR datatypes, which were designed to store Unicode without altering the global database settings.
Here is the link with Oracle documentation: https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch6unicode.htm
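If escaped text like this has already landed in a column, the client has to turn it back into real characters. A minimal sketch in Python, assuming the values contain either the \uXXXX form from the question or the backslash-plus-four-hex-digits form that ASCIISTR emits; the helper name unescape is hypothetical. It does not recombine surrogate pairs and would misread a literal backslash followed by four hex digits in text that was never escaped.

```python
import re

# Matches \uXXXX as well as \XXXX (the ASCIISTR-style escape).
_ESCAPE = re.compile(r"\\u?([0-9A-Fa-f]{4})")


def unescape(text: str) -> str:
    """Replace each four-hex-digit escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = "It\\u2019s"          # what the column actually holds: 6 plain ASCII symbols for the escape
print(len("\\u2019"))       # the escape itself is six characters long
print(unescape(raw))
```

This also shows why string functions miscount: until it is unescaped, the apostrophe occupies six characters rather than one.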

perl DBIx::Class converting values with umlaut

I'm using DBIx::Class to fetch data from Oracle (11.2). When the data is fetched, a value such as "Alfred Kärcher" is returned as "Alfred Karcher". I tried setting NLS_LANG and NLS_NCHAR in %ENV, but there was still no change.
I also used the utf8 module to verify that the data is UTF-8 encoded.
This looks like the Oracle client library converting the data.
Make sure the database encoding is set to AL32UTF8 and the environment variable NLS_LANG to AMERICAN_AMERICA.AL32UTF8.
It might also be possible by setting the ora_(n)charset parameter instead.
The two links from DavidEG contain all the info that's needed to make it work.
You don't need use utf8; in your script, but make sure you set STDOUT to UTF-8 encoding: use encoding 'utf8'; (the encoding pragma is deprecated in newer Perl versions, where binmode(STDOUT, ':encoding(UTF-8)'); is the usual way to do this).
Here the problem is with the data type that you specified for the column.
If your column is declared as VARCHAR2(10), Oracle by default stores up to 10 bytes. For English text, 10 bytes means 10 characters, but if the data you insert contains special characters such as umlauts, each of those requires 2 bytes in a UTF-8 database, and you end up with ORA-12899: value too large for column.
So if the data you are inserting is provided by users from different countries, use VARCHAR2(10 char).
In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multi-byte character set.
In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.
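The difference between the two length semantics can be sketched outside the database. The helper below is hypothetical and assumes a UTF-8 (AL32UTF8-style) database character set; it only mimics the length check, not Oracle's actual behaviour in every edge case.

```python
def fits_varchar2(value: str, limit: int, semantics: str = "byte",
                  codec: str = "utf-8") -> bool:
    """Mimic the VARCHAR2(limit BYTE|CHAR) length check (illustrative only)."""
    if semantics == "char":
        return len(value) <= limit            # count characters
    return len(value.encode(codec)) <= limit  # count encoded bytes


name = "Kärcher AG"  # 10 characters, but "ä" takes 2 bytes in UTF-8

print(fits_varchar2(name, 10, "byte"))  # byte semantics: one byte too many
print(fits_varchar2(name, 10, "char"))  # char semantics: exactly fits
```

With byte semantics this value would trigger ORA-12899; with char semantics it is accepted.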

Oracle SQL Developer client encoding

I read many of the related Stack Overflow topics and spent a whole day googling the following problem, but I haven't found anything that helps, even though the problem doesn't seem to be complicated.
I have an Oracle database.
Let's see the following PL/SQL script:
CREATE TABLE Dummy(
id number(19,0),
tclob clob,
tnclob nclob,
PRIMARY KEY (id));
INSERT INTO dummy (id, tclob, tnclob) VALUES (1, 'ñ$ߤ*>;''<’', 'ñ$ߤ*>;''<’');
SELECT tclob, tnclob FROM dummy;
My problem is that 'ñ' and '’' characters are stored as a question mark.
I also tried to load the previously inserted values through Java, but I get the question marks instead of the special characters.
I created a small Java method which uses OraclePreparedStatement to save test data, and I use the setNString() method to attach the NCLOB data to the query. In this case all characters are displayed fine in Java and also in SQL Developer.
So a possible solution is to use Java to save my data into the DB. However, I have an SQL script thousands of lines long which inserts data, and I don't necessarily want to write the whole thing again in Java.
So the question is: why does SQL Developer break the special characters?
My settings:
SELECT DECODE(parameter, 'NLS_CHARACTERSET', 'CHARACTER SET',
'NLS_LANGUAGE', 'LANGUAGE',
'NLS_TERRITORY', 'TERRITORY') name,
value from v$nls_parameters
WHERE parameter IN ( 'NLS_CHARACTERSET', 'NLS_LANGUAGE', 'NLS_TERRITORY')
Result:
+---------------+--------------+
| NAME | VALUE |
+---------------+--------------+
| LANGUAGE | HUNGARIAN |
| TERRITORY | HUNGARY |
| CHARACTER SET | EE8ISO8859P2 |
+---------------+--------------+
I changed SqlDeveloper/Preferences/Environment/Encoding to UTF-8.
I also changed HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_OraDb11g_home1 value to HUNGARIAN_HUNGARY.UTF8
Update: I tried to insert the data with the following syntaxes:
INSERT INTO dummy (id, tclob, tnclob) VALUES (1, N'ñ$ߤ*>;''<’', N'ñ$ߤ*>;''<’');
INSERT INTO dummy (id, tclob, tnclob) VALUES (1, 'ñ$ߤ*>;''<’', to_nclob('ñ$ߤ*>;''<’'));
Nothing helped.
So what can I do?
On the PC where PLSQL is installed, set the NLS_LANG registry entry to the value that corresponds to the PC's operating system locale (code page).
How to detect operating system language locale?
How to map OS locale to NLS_LANG value?
When using PLSQL, the initial client-language parameter that is required to create an Oracle session is read from the NLS_LANG registry entry.
According to the Oracle documentation, invalid data usually occurs in a database because the NLS_LANG parameter is not set properly on the client.
The NLS_LANG value should reflect the client operating system code page.
For example, in an English Windows environment, the code page is WE8MSWIN1252. When the NLS_LANG parameter is set properly, the database can automatically convert incoming data from the client operating system to its encoding.
When using the Java method, the client-language parameter is taken from the Control Panel, under Regional and Language Options, so things will be OK.
You can try to change the NLS_LANG value on your Windows PC with the regedit tool.
The path is \HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE, where NLS_LANG contains the value of your Oracle client's current encoding.
There is Oracle's list of available encodings:
Commonly Used Values for NLS_LANG

Why is it necessary to specify length of a column in a table

I always wonder why we should limit a column length in a database table to something other than the default.
E.g. I have a column short_name in my table People. The default length for the column is 255 characters, but I restrict it to 100 characters. What difference will it make?
The string will be truncated to the maximum length ( in characters usually ).
The way it is actually implemented is up to the database engine you use.
For example:
CHAR(30) will always use up 30 characters in MySQL, and this allows MySQL to speed up access because it is able to predict the value length without parsing anything;
VARCHAR(30) will reject strings longer than 30 characters with an error in MySQL when strict mode is on; otherwise longer strings are truncated to 30 characters with a warning;
In SQLite, you can store strings in any type of column, ignoring the type.
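The SQLite case is easy to try with Python's built-in sqlite3 module. This is a minimal sketch using an in-memory database and an ordinary (non-STRICT) table; the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declared length 5 and an INTEGER column; SQLite treats both as hints only.
conn.execute("CREATE TABLE people (short_name VARCHAR(5), age INTEGER)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("a much longer name", "unknown"))

row = conn.execute(
    "SELECT short_name, length(short_name), typeof(age) FROM people"
).fetchone()
conn.close()

print(row)
```

The 18-character string is stored in full despite VARCHAR(5), and the non-numeric text in the INTEGER column is kept as text.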
The reason many features of SQL are supported in those database engines, even though they are not utilized or are utilized in different ways, is to maintain compliance with the SQL standard.

Resources