I am getting the following error while loading Japanese data using SQL*Loader. My database is UTF8 (NLS parameters) and my OS supports UTF8.
Record 5: Rejected - Error on table ACTIVITY_FACT, column METADATA.
ORA-12899: value too large for column METADATA (actual: 2624, maximum: 3500)
My Control file:
load data
characterset UTF8
infile '../tab_files/activity_fact.csv' "STR ';'"
APPEND
into table activity_fact
fields terminated by ',' optionally enclosed by '~'
TRAILING NULLCOLS
(metadata CHAR(3500))
My table
create table activity_fact (
metadata varchar2(3500 char)
);
Why is SQL*Loader throwing the wrong exception (actual: 2624, maximum: 3500)? 2624 is less than 3500.
The default length semantics for all datafiles (except UTF-16) is byte. So in your case you have a CHAR of 3500 bytes rather than characters. You have some multi-byte characters in your file, and those 2624 characters are therefore using more than 3500 bytes, hence the (misleading) message.
You can sort this out by using character length semantics instead.
Alter this line in your control file
characterset UTF8
to this
characterset UTF8 length semantics char
and SQL*Loader will then work in characters for CHAR fields (and some others), matching the way you have set up your table: 3500 characters of up to four bytes each.
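For reference, here is your whole control file with that one change applied (a sketch only; everything else is kept as you posted it):
load data
characterset UTF8 length semantics char
infile '../tab_files/activity_fact.csv' "STR ';'"
APPEND
into table activity_fact
fields terminated by ',' optionally enclosed by '~'
TRAILING NULLCOLS
(metadata CHAR(3500))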
See the Utilities Guide on Character Length Semantics for more information
Related
I have a record containing Cyrillic characters along with English characters in MySQL, with datatype varchar(30). I get a "value too large" error while loading it through Informatica 9.6.1 into an Oracle database where the column datatype is varchar2(30). Could anyone explain why this is happening? In both DBs the charset is UTF8.
For example, the data in MySQL is 'Александровском 2022'. Loading the same value into the Oracle DB gives the error below:
ORA-12899: value too large for column "DB"."USER_DETAILS"."AUTHORITY_NAME" (actual: 31, maximum: 30)
In Oracle, you can specify whether your column should have a maximum size of 300 BYTE or 300 CHAR.
You have defined (explicitly or implicitly) your column to have a maximum size of 300 BYTE.
So some of your strings with fewer than 300 characters will require more than 300 bytes, as Cyrillic characters need more than 1 byte in UTF8.
You can change the definition of your column to varchar2(300 CHAR).
If [BYTE|CHAR] is omitted, the DB falls back to the setting defined in NLS_LENGTH_SEMANTICS. This can be set at the database or session level.
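Applied to the column from your error message, a minimal sketch of the fix would be (verify the table and column names against your schema; the second query just shows the current session default):
-- switch the column to character length semantics
ALTER TABLE user_details MODIFY (authority_name VARCHAR2(30 CHAR));
-- check the default length semantics for the current session
SELECT value FROM nls_session_parameters WHERE parameter = 'NLS_LENGTH_SEMANTICS';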
Thai characters are not allowing more than 1333 characters when inserted from Java code. Is there any possible way around this other than using a CLOB data type in the DB? We are using Oracle 11g.
Simply, no (I assume you are using the VARCHAR2 data type), except in Oracle 12c with extended data types (MAX_STRING_SIZE = EXTENDED).
VARCHAR2 columns allow 4000 bytes in normal mode and up to 32767 in extended.
Thai requires multibyte characters, which is why more than 1333 characters can take more than 4000 bytes.
NVARCHAR2 columns allow 2000 characters in normal mode and up to 16383 in extended.
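You can confirm the byte expansion by comparing the character length with the byte length of your Thai strings; a quick sketch (the table and column names here are made up):
-- LENGTH counts characters, LENGTHB counts bytes in the database character set
SELECT length(thai_text) AS char_count,
       lengthb(thai_text) AS byte_count
FROM   documents;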
What is the db character set?
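You can check it with a query against the data dictionary, for example:
SELECT parameter, value
FROM   nls_database_parameters
WHERE  parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');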
I suspect your scenario is as follows:
AL32UTF8 is the db character set.
The varchar2 column(s) in your table(s) have byte semantics.
The UTF8 encoding represents each Thai character in up to 3 bytes; thus you encounter a length limit of 1333 characters (4000 / 3) instead of 4000.
You can change the length semantics from byte to char with ALTER TABLE <table> MODIFY (<column> VARCHAR2(n CHAR));
For the sake of completeness: in case you are operating with a single-byte db character set like WE8ISO8859P11 (ISO 8859-11, Thai script), characters can be composed from base characters and diacritical marks. In that case you might have success changing the encoding in the data source so that it uses the code points for precomposed characters. However, I feel this scenario is unlikely, given that each of your test data characters would have to be composed from three parts to match the observation.
I am getting data from ERP systems in the form of feeds; in particular, one column's length in the feed is only 15.
In the target table the corresponding column is also varchar2(15), but when I try to load the data into the DB, it shows an error like:
ORA-12899: value too large for column emp_name (actual: 16, maximum: 15)
I can't increase the column length since it is a base table in production.
Have a look at this blog; the problem was resolved for me by changing the column datatype from varchar(100) to varchar(100 char). In my case the data contained some umlaut characters.
http://gerardnico.com/wiki/database/oracle/byte_or_character
The usual reason for problems like this is non-ASCII characters that can be represented with one byte in the original database but require two (or more) bytes in the target database (due to different NLS settings).
To ensure your target column is large enough for 15 characters, you can modify it:
ALTER TABLE table_name MODIFY column_name VARCHAR2(15 CHAR)
(note the 15 CHAR - you can also use BYTE; if neither is present, the database uses the NLS_LENGTH_SEMANTICS setting as a default).
To check which values are larger than 15 bytes, you can
create a staging table in the target database with the column length set to 15 CHAR
insert the data from the source table into the staging table
find the offending rows with
SELECT * FROM staging WHERE lengthb(mycol) > 15
(note the use of LENGTHB as opposed to LENGTH - the former returns the length in bytes, whereas the latter returns the length in characters)
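Putting those steps together, a rough sketch could look like this (staging and source_table are placeholder names; 15 mirrors the column length from the question):
-- staging column sized in characters so multi-byte values still fit
CREATE TABLE staging (mycol VARCHAR2(15 CHAR));
-- copy the data over from the source table (placeholder name)
INSERT INTO staging (mycol)
SELECT emp_name FROM source_table;
-- rows whose values need more than 15 bytes in the target character set
SELECT * FROM staging WHERE lengthb(mycol) > 15;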
I found AL32UTF8 to be the only valid setting. This varies from standard UTF8, with a few characters having supplementary bytes, i.e. the characters are about 99% the same. I am guessing you have character conversion problems going on. In other words, the data in table1 was written using one charset, and the new table has a slightly different charset.
If this is true, you have to find the source of the oddball charset, because this will continue to happen.
Solution to:
ORA-12899: VALUE TOO LARGE FOR COLUMN(ACTUAL,MAXIMUM)
If you are facing a problem while reducing the size of a column in a table that already has data longer than the new length, below is a simple script that will work.
ALTER TABLE TABLE_NAME ADD (NEW_COLUMN_NAME DATATYPE(DATASIZE));
UPDATE TABLE_NAME SET NEW_COLUMN_NAME = SUBSTR(OLD_COLUMN_NAME , 1, NEW_LENGTH);
ALTER TABLE TABLE_NAME DROP COLUMN OLD_COLUMN_NAME ;
ALTER TABLE TABLE_NAME RENAME COLUMN NEW_COLUMN_NAME TO OLD_COLUMN_NAME;
Meaning of the statements:
ALTER TABLE TABLE_NAME ADD (NEW_COLUMN_NAME DATATYPE(DATASIZE));
It would just create a new column of the required new length in your existing table.
UPDATE TABLE_NAME SET NEW_COLUMN_NAME = SUBSTR(OLD_COLUMN_NAME , 1, NEW_LENGTH);
It copies the old column's values into the new column, discarding everything beyond the new length.
ALTER TABLE TABLE_NAME DROP COLUMN OLD_COLUMN_NAME ;
It removes the old column, which is now redundant since all the information has been copied into the new column.
ALTER TABLE TABLE_NAME RENAME COLUMN NEW_COLUMN_NAME TO OLD_COLUMN_NAME;
Renaming the new column to the old column name restores the original table structure, except for the new column size, as you wished.
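As a concrete (entirely hypothetical) instance of the same script, shrinking a remarks column to 2000 characters would look like this:
ALTER TABLE employee_feed ADD (remarks_new VARCHAR2(2000 CHAR));
UPDATE employee_feed SET remarks_new = SUBSTR(remarks, 1, 2000);
ALTER TABLE employee_feed DROP COLUMN remarks;
ALTER TABLE employee_feed RENAME COLUMN remarks_new TO remarks;
Keep in mind that the UPDATE silently truncates any value longer than the new length, so make sure that loss is acceptable (or take a backup) before dropping the old column.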
Certainly the cause of the error is that the value is too large for the column's data type. However, sometimes it is not visible at first sight. Besides the "byte versus char" differences mentioned in other answers, there can also be a problem with line terminators.
I was trying to load a CSV file using SQL*Loader in a dockerized Oracle. The foo column of type char(1) was the last column. I got the error ORA-12899: value too large for column foo (actual: 2, maximum: 1) even though all values of the foo column were of length 1. Later I noticed the CSV file had been edited in a Windows editor and accidentally saved with CRLF line terminators. Since Linux in the Docker container expects just LF, the CR was treated as part of the column data.
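If re-saving the file isn't convenient, one workaround is to declare the CRLF pair explicitly as the record terminator in the control file; a minimal sketch (the file, table, and column names are hypothetical):
load data
characterset UTF8
-- declare CR+LF (hex 0d0a) as the record terminator so the trailing CR
-- is not treated as part of the last column
infile 'data.csv' "str x'0d0a'"
truncate
into table demo_table
fields terminated by ','
trailing nullcols
(col_a, col_b, foo CHAR(1))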
This error confused me a little at first.
VARCHAR2(x CHAR) means that the column will hold x characters but not more than can fit into 4000 bytes. Internally, Oracle will set the byte length of the column (DBA_TAB_COLUMNS.DATA_LENGTH) to MIN(x * mchw, 4000), where mchw is the maximum byte width of a character in the database character set. This is 1 for US7ASCII or WE8MSWIN1252, 2 for JA16SJIS, 3 for UTF8, and 4 for AL32UTF8.
For example, a VARCHAR2(3000 CHAR) column in an AL32UTF8 database will be internally defined as having the width of 4000 bytes. It will hold up to 3000 characters from the ASCII range (the character limit), but only 1333 Chinese characters (the byte limit, 1333 * 3 bytes = 3999 bytes). A VARCHAR2(100 CHAR) column in an AL32UTF8 database will be internally defined as having the width of 400 bytes. It will hold up to any 100 Unicode characters.
Reference: https://community.oracle.com/tech/developers/discussion/421117/difference-between-varchar2-4000-byte-varchar2-4000-char
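You can see the byte cap described above directly in the data dictionary; a small sketch for an AL32UTF8 database (the demo table name is made up):
CREATE TABLE char_semantics_demo (c VARCHAR2(3000 CHAR));
-- DATA_LENGTH is the internal byte limit, CHAR_LENGTH the declared character limit
SELECT column_name, data_length, char_length, char_used
FROM   user_tab_columns
WHERE  table_name = 'CHAR_SEMANTICS_DEMO';
-- expected: DATA_LENGTH = 4000, CHAR_LENGTH = 3000, CHAR_USED = 'C'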
I have a database with the below NLS settings
NLS_NCHAR_CHARACTERSET - AL16UTF16
NLS_CHARACTERSET - AL32UTF8
There's a table with a CLOB column storing base64 encoded data.
Since the characters are mostly English letters, I would assume each character takes up only 1 byte, as a CLOB uses the NLS_CHARACTERSET character set for encoding.
With an inline-enabled CLOB column, the CLOB will be stored inline unless it grows to more than 4096 bytes in size. However, when I tried to store a set of data with 2048 characters, I found that it was not stored inline (by checking DBA_TABLES). So does that mean each character is not using only 1 byte? Can anyone elaborate on this?
Another test added:
Create a table with a CLOB column with a chunk size of 8 KB, so that the initial segment size is 65536 bytes.
After inserting a row with 32,768 characters into the CLOB column, the creation of a 2nd extent can be confirmed by querying DBA_SEGMENTS.
http://docs.oracle.com/cd/E11882_01/server.112/e10729/ch6unicode.htm#r2c1-t12
It says:
Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8. This means that the storage space required for an English document doubles when the data is converted.
So it looks like a CLOB internally stores everything as UCS-2 (Unicode), i.e. a fixed 2 bytes per character. Consequently, it can store at most 4096 / 2 = 2048 characters inline.
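For what it's worth, the LOB storage settings (chunk size, whether in-row storage is enabled) can be checked from the data dictionary; a standard query, assuming you own the table:
SELECT table_name, column_name, in_row, chunk
FROM   user_lobs;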
I'm trying to load localized strings from a Unicode (UTF8-encoded) CSV using SQL*Loader into an Oracle database. I've tried all sorts of combinations, but nothing seems to give me the result I'm looking for, which is to keep special Greek characters like Δ from being converted to mojibake or ¿.
My table definition looks like this:
CREATE TABLE "GLOBALIZATIONRESOURCE"
(
"RESOURCETYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"CULTURE" VARCHAR2(20 CHAR) NOT NULL ENABLE,
"KEY" VARCHAR2(128 CHAR) NOT NULL ENABLE,
"VALUE" VARCHAR2(2048 CHAR),
"DESCRIPTION" VARCHAR2(512 CHAR),
CONSTRAINT "PK_GLOBALIZATIONRESOURCE" PRIMARY KEY ("RESOURCETYPE","CULTURE","KEY") USING INDEX TABLESPACE REPSPACE_IX ENABLE
)
TABLESPACE REPSPACE;
I have tried the following configurations in my control file (and actually every permutation I could think of)
load data
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
load data
CHARACTERSET UTF8
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
load data
CHARACTERSET UTF16
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY X'002c' OPTIONALLY ENCLOSED BY X'0022'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
With the first two options, the Unicode characters don't get encoded properly and just show up as upside-down question marks.
If I choose the last option, UTF16, then I get the following error, even though all the data in my fields is much shorter than the specified length.
Field in data file exceeds maximum length
It seems as though every possible combination of CTL file configurations (even setting the byte order to little and big) doesn't work correctly. Can someone please give an example of a configuration (table structure and CTL file) that correctly loads Unicode data from a CSV? Any help would be greatly appreciated.
Note: I've already been to http://docs.oracle.com/cd/B19306_01/server.102/b14215/ldr_concepts.htm and http://docs.oracle.com/cd/B10501_01/server.920/a96652/ch10.htm.
I had the same issue and resolved it with the steps below:
Open the data file in Notepad++, go to the "Encoding" menu, select UTF-8 encoding, and save the file.
Use CHARACTERSET UTF8 in the CTL file and then load the data.
You have two problems:
Character set.
Answer: You can solve this problem by finding your text file's character set (most of the time Notepad++ can do this). After finding the character set, you have to find the corresponding sqlldr character set name. You can find this info at the link https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313
After all of this, your character set problem should be solved.
Despite your actual data length, sqlldr says: Field in data file exceeds maximum length.
Answer: You can solve this problem by adding CHAR(4000) (or whatever the actual maximum length is) to the problematic column; by default, SQL*Loader assumes character fields are at most 255 bytes, regardless of the table's column definition. In my case, the problematic column is the "e" column. An example is below. I solved my problem this way; I hope it helps.
LOAD DATA
CHARACTERSET UTF8
-- This line is comment
-- Turkish charset (for ÜĞİŞ etc.)
-- CHARACTERSET WE8ISO8859P9
-- Character list is here.
-- https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313
INFILE 'data.txt' "STR '~|~\n'"
TRUNCATE
INTO TABLE SILTAB
FIELDS TERMINATED BY '#'
TRAILING NULLCOLS
(
a,
b,
c,
d,
e CHAR(4000)
)
You must ensure that the following charactersets are the same:
db characterset
dump file characterset
the client from which you are doing the import (NLS_LANG)
If the client-side characterset is different, Oracle will attempt to perform character conversion to the native db characterset, and this might not always provide the desired result.
Don't use MS Office to save the spreadsheet as a Unicode .csv.
Instead, use OpenOffice to save it as a Unicode (UTF-8) .csv file.
Then, in the loader control file, add "CHARACTERSET UTF8".
Run Oracle SQL*Loader; this gives me correct results.
There is a range of character set encodings that you can use in the control file while loading data with SQL*Loader.
For Greek characters I believe a Western European character set should do the trick.
LOAD DATA
CHARACTERSET WE8ISO8859P1
or, in the case of MS Word input files with smart characters, try this in the control file:
LOAD DATA
CHARACTERSET WE8MSWIN1252