Oracle Import Error - Field Length

I'm running into errors when importing a table from a database with a Western character set into one with a UTF character set. I know the problem is that some of the data "grows".
For example:
ORA-02374: conversion error loading table "RESULT"
ORA-12899: value too large for column MATRIX_NAME (actual: 11, maximum: 10)
ORA-02372: data for row: MATRIX_NAME :
I know that when the backup was taken in the source database (which uses a Western English character set), the column was defined as varchar(10); when the data is imported into the target database (which uses a UTF character set), some values grow to a length of 11.
Does Oracle have a way to automatically expand the field length (to 11 here) before it imports the data, so that nothing is truncated? If so, how do I do it?
I'm using Oracle 11G.

Yes. Oracle's built-in import / export tool can automatically handle size issues.


ORA-12899: value too large for column

I am getting data from ERP systems in the form of feeds; one particular column in the feed has a length of only 15.
The corresponding column in the target table is also varchar2(15), but when I try to load the feed into the database I get an error like:
ORA-12899: value too large for column emp_name (actual: 16, maximum:
I can't increase the column length, since it is a base table in production.
Have a look at this blog; the problem was resolved for me by changing the column datatype from varchar(100) to varchar(100 char). In my case the data contained some umlaut characters.
The usual reason for problems like this is non-ASCII characters that can be represented with one byte in the original database but require two (or more) bytes in the target database (due to different NLS settings).
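A quick way to see this growth outside the database is to encode the same string in both character sets. The sketch below is in Python and assumes that cp1252 stands in for the Western source character set and UTF-8 for the target; the sample value is made up for illustration.

```python
# Hypothetical 10-character value containing one umlaut ("Oberflaeche" written with a-umlaut).
value = "Oberfl\u00e4che"

chars = len(value)                            # length in characters
western_bytes = len(value.encode("cp1252"))   # single-byte Western encoding (assumed source)
utf8_bytes = len(value.encode("utf-8"))       # multibyte encoding (assumed target)

print(chars, western_bytes, utf8_bytes)
```

The value is 10 characters and 10 bytes in cp1252 but 11 bytes in UTF-8, which is the kind of overflow that a column limited to 10 bytes reports as "actual: 11, maximum: 10".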
To ensure your target column is large enough for 15 characters, you can modify it:
ALTER TABLE table_name MODIFY column_name VARCHAR2(15 CHAR)
(note the 15 CHAR - you can also use BYTE; if neither is present, the database uses the NLS_LENGTH_SEMANTICS setting as a default).
To check which values are larger than 15 bytes, you can:
create a staging table in the target database with the column length set to 15 CHAR,
insert the data from the source table into the staging table,
find the offending rows with
SELECT * FROM staging WHERE lengthb(mycol) > 15
(note the use of LENGTHB as opposed to LENGTH: the former returns the length in bytes, whereas the latter returns the length in characters).
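If the feed can be inspected before it reaches the database, the same byte-versus-character check can be approximated on the client. This is only a sketch: the function name and sample names are hypothetical, and it assumes the target database stores the text as UTF-8.

```python
def offending_values(values, max_bytes, encoding="utf-8"):
    """Return (value, char_length, byte_length) for values longer than max_bytes."""
    result = []
    for v in values:
        byte_len = len(v.encode(encoding))   # roughly what LENGTHB counts
        if byte_len > max_bytes:
            result.append((v, len(v), byte_len))  # len(v) is roughly what LENGTH counts
    return result

# Made-up sample data: the second name is 15 characters but contains three umlauts.
names = ["Alice Smith", "J\u00f6rg M\u00fcller-L\u00fcd"]
print(offending_values(names, 15))
```

Only the second name is reported: it fits a 15 CHAR column, but not a 15 BYTE one.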
I found AL32UTF8 to be the only valid setting. It differs from standard UTF8 in that a few characters have supplementary bytes, i.e. the characters are about 99% the same. I am guessing you have character conversion problems going on. In other words, the data in table1 was written using one charset, and the new table has a slightly different charset.
If this is true, you have to find the source of the oddball charset, because this will continue to happen.
Solution to:
If you run into problems reducing the size of a column in a table that already holds values longer than the new length, a simple four-step script should work.
Meaning of the query:
It creates a new column of the required new length in your existing table.
It discards everything beyond the new length from the old column's values and stores the trimmed values in the new column.
It removes the old column, which is obsolete now that all the information has been copied into the new column.
It renames the new column to the old column's name, which restores the original table structure except for the new column size you wanted.
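The script itself is not shown in the answer. As a rough reconstruction of the four steps described, the Python sketch below only builds the SQL text; it does not run anything. The helper name, the "_new" suffix, and the VARCHAR2 type are all assumptions, and the UPDATE step deliberately truncates data, as the answer describes.

```python
def shrink_column_statements(table, column, new_len, tmp_suffix="_new"):
    """Build the four statements for the add / copy-trimmed / drop / rename approach."""
    tmp = column + tmp_suffix  # hypothetical name for the temporary column
    return [
        # 1. add a new column with the required (smaller) length
        f"ALTER TABLE {table} ADD ({tmp} VARCHAR2({new_len}))",
        # 2. copy the old values, cut off after new_len characters
        f"UPDATE {table} SET {tmp} = SUBSTR({column}, 1, {new_len})",
        # 3. drop the old column
        f"ALTER TABLE {table} DROP COLUMN {column}",
        # 4. give the new column the old name
        f"ALTER TABLE {table} RENAME COLUMN {tmp} TO {column}",
    ]

for stmt in shrink_column_statements("emp", "emp_name", 15):
    print(stmt + ";")
```

The table and column names passed in are placeholders; review the generated statements before running them, since dropped data cannot be recovered from the table afterwards.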
Certainly the cause of the error is that the value is too large for the column's data type. However, sometimes that is not visible at first sight. Besides the "byte versus char" differences mentioned in other answers, line terminators can also be the problem.
I was trying to load a CSV file using SQL*Loader in a dockerized Oracle. The foo column, of type char(1), was the last column. I got ORA-12899: value too large for column foo (actual: 2, maximum: 1) even though all values of the foo column had length 1. Later I noticed the CSV file had been edited in a Windows editor and accidentally saved with CRLF terminators. Since Linux in the Docker container expects just LF, the CR was treated as part of the column data.
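The effect is easy to reproduce without SQL*Loader. The Python sketch below uses made-up CSV content and hypothetical helper names; it splits the file on LF only, as a Linux-side loader would, and measures the last field before and after normalizing the line endings.

```python
def last_field_lengths(data: bytes, delim: bytes = b",") -> list:
    """Length in bytes of the last field of each non-empty, LF-terminated record."""
    return [len(line.split(delim)[-1]) for line in data.split(b"\n") if line]

def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF terminators to plain LF."""
    return data.replace(b"\r\n", b"\n")

# Made-up two-row file saved with Windows (CRLF) terminators; last column is one character.
csv_crlf = b"1,Y\r\n2,N\r\n"

print(last_field_lengths(csv_crlf))                          # the CR counts as data
print(last_field_lengths(normalize_line_endings(csv_crlf)))  # after conversion to LF
```

With CRLF input the last field measures 2 bytes per row; after normalization it measures 1, matching the "actual: 2, maximum: 1" message.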
This error confused me a little bit.
VARCHAR2(x CHAR) means that the column will hold x characters but not
more than can fit into 4000 bytes. Internally, Oracle will set the
byte length of the column (DBA_TAB_COLUMNS.DATA_LENGTH) to MIN(x *
mchw, 4000), where mchw is the maximum byte width of a character in
the database character set. This is 1 for US7ASCII or WE8MSWIN1252, 2
for JA16SJIS, 3 for UTF8, and 4 for AL32UTF8.
For example, a VARCHAR2(3000 CHAR) column in an AL32UTF8 database will
be internally defined as having the width of 4000 bytes. It will hold
up to 3000 characters from the ASCII range (the character limit), but
only 1333 Chinese characters (the byte limit, 1333 * 3 bytes = 3999
bytes). A VARCHAR2(100 CHAR) column in an AL32UTF8 database will be
internally defined as having the width of 400 bytes. It will hold up
to any 100 Unicode characters.
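The arithmetic in the quoted passage can be checked with a few lines of Python. The function names are hypothetical; the 4000-byte limit and the MIN(x * mchw, 4000) rule are taken from the quote.

```python
MAX_VARCHAR2_BYTES = 4000  # byte limit stated in the quoted text

def data_length(x_chars: int, mchw: int) -> int:
    """Internal byte width of VARCHAR2(x CHAR): MIN(x * mchw, 4000)."""
    return min(x_chars * mchw, MAX_VARCHAR2_BYTES)

def max_storable_chars(x_chars: int, bytes_per_char: int) -> int:
    """How many characters of a given byte width fit under both limits."""
    return min(x_chars, MAX_VARCHAR2_BYTES // bytes_per_char)

print(data_length(3000, 4))         # VARCHAR2(3000 CHAR) in AL32UTF8
print(max_storable_chars(3000, 1))  # ASCII-range characters
print(max_storable_chars(3000, 3))  # 3-byte characters, e.g. Chinese
print(data_length(100, 4))          # VARCHAR2(100 CHAR) in AL32UTF8
```

These reproduce the quoted figures: 4000 bytes, 3000 ASCII characters, 1333 three-byte characters, and 400 bytes.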

How did the Unicode characters end up in the database table column?

Recently I came across a Unicode character (\u2019) in a database table column while parsing it using Python.
Question: What reasons can result in Unicode characters showing up in a database table? Is it a data entry issue?
Appreciate any input.
When you set up your Oracle Database you choose a character set, which is used for the SQL character datatypes (char, varchar2, etc.).
Suppose you have chosen your character set and have a table with a VARCHAR2 column. Suddenly you need to store a string containing non-ASCII symbols that the database's character set does not support. You can convert the string to an ASCII string, for example by calling the ASCIISTR function, and store the result in your VARCHAR2 column. That is not a good idea, though, because many SQL built-in functions don't understand the escaped form and treat it as several ordinary symbols. This is how Unicode escapes may appear in your table column: ASCIISTR replaces each non-ASCII symbol with a backslash followed by its hexadecimal UTF-16 code unit, so U+2019 is stored as '\2019'.
Another option is Oracle's special nchar datatypes, which were designed to store Unicode without altering global database settings.
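On the Python side, it can help to look at what \u2019 actually is. The sketch below uses only the standard library. It shows that the character is a typographic apostrophe that exists in the Windows cp1252 code page, which suggests (as an assumption, not a diagnosis of this particular table) that such characters often arrive through text pasted from word processors. The last line uses Python's own backslashreplace handler, which is similar in spirit to ASCIISTR but is not Oracle's output format.

```python
import unicodedata

ch = "\u2019"

name = unicodedata.name(ch)                        # official Unicode character name
cp1252 = ch.encode("cp1252")                       # one byte in the Windows Western code page
utf8 = ch.encode("utf-8")                          # three bytes in UTF-8
escaped = ch.encode("ascii", "backslashreplace")   # Python-style ASCII escape of the character

print(name, cp1252, utf8, escaped, len(escaped))
```

The escaped form is six plain ASCII symbols, which is why string functions that are unaware of the escape miscount it.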
Here is the link with Oracle documentation:

Oracle: possible encoding problems when importing data

On Oracle 11, I dumped my data using exp/imp so it could be migrated to another DB.
I tested importing the dump file into my local database, with no problem at all.
But then my colleague tried the same on his own machine, and some tables couldn't be imported due to the error:
can bind a LONG value only for insert into a LONG column.
I don't have any LONG columns, but I read that this error can also be thrown when a value exceeds the size of a varchar2 column, so I checked the databases' character sets: mine has the default Windows charset and his has a utf8 charset. Do you think the same data is represented with more bytes there, and that this leads to the error?
Do I have to change my database charset and create another dump? I'm looking for a better solution, because the dump also needs to be imported into a customer's database, which I know has a totally different charset.
A Western Windows character set is single-byte by definition. When the target database uses a multibyte character set (utf8), every single character may be converted to 1-3 bytes during the import. So before the import you have to increase the length of every string column to three times its original size. If that exceeds the VARCHAR2 limit of 4000 bytes, use a CLOB instead.
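The sizing rule from this answer can be written down as a small helper. This is a sketch under the answer's own assumptions (a factor of 3 and a 4000-byte VARCHAR2 limit); the function name is hypothetical and the result is only a suggested type string.

```python
def widened_type(byte_len: int, factor: int = 3, limit: int = 4000) -> str:
    """Suggest a column type after multiplying the byte length by `factor`."""
    new_len = byte_len * factor
    # Beyond the VARCHAR2 byte limit, fall back to CLOB as the answer suggests.
    return f"VARCHAR2({new_len})" if new_len <= limit else "CLOB"

print(widened_type(10))    # small column: just tripled
print(widened_type(2000))  # tripling would exceed the limit
```

A varchar2(10) column becomes VARCHAR2(30), while a varchar2(2000) column would need 6000 bytes and is mapped to CLOB.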

Character set in Oracle 11g r2 XE

I have data exported with the exp command from a full Oracle 11gR2 database that uses the AR8MSWIN1256 charset. However, when I import the data into an 11gR2 XE database, I get the error:
row rejected due to ORACLE error 12899
Could the problem be the mismatch in charsets (AL32UTF8 vs AR8MSWIN1256)? If so, is there a solution?
The table almost certainly has BYTE length semantics for its character columns. imp creates the table with the same length semantics as in the source database. So if you want to migrate to a multibyte character set, you need to make sure the length semantics of those columns are changed to CHAR.
The easiest way is to pre-create the tables and make sure your column definitions specify their lengths in characters rather than bytes.
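When pre-creating many tables, the DDL text can be rewritten mechanically. The Python sketch below is a rough helper, not a full SQL parser: it assumes simple column definitions such as VARCHAR2(10) or CHAR(1 BYTE), and the function name and sample DDL are hypothetical.

```python
import re

# Matches VARCHAR2/VARCHAR/CHAR lengths that are unqualified or marked BYTE.
# The leading \b keeps NVARCHAR2/NCHAR from matching; lengths already marked CHAR are skipped.
_BYTE_LEN = re.compile(r"\b(VARCHAR2|VARCHAR|CHAR)\s*\(\s*(\d+)(?:\s+BYTE)?\s*\)", re.IGNORECASE)

def to_char_semantics(ddl: str) -> str:
    """Rewrite byte-length character columns to explicit CHAR length semantics."""
    return _BYTE_LEN.sub(r"\1(\2 CHAR)", ddl)

ddl = "CREATE TABLE result (matrix_name VARCHAR2(10), flag CHAR(1 BYTE), n NUMBER(5), note NVARCHAR2(20))"
print(to_char_semantics(ddl))
```

NUMBER and NVARCHAR2 columns are left alone; the two byte-length columns come out as VARCHAR2(10 CHAR) and CHAR(1 CHAR). Check the output by eye before using it to create tables.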

SSIS Package Troubleshooting

I'm working with an SSIS Package that pulls data from a DB2 source, runs through a conversion process (unicode stuff) and then stores the data in a SQL table. From the error information below, I have been able to determine that there is some kind of special characters in the DB2 file/table. What I do not know is how I can narrow down which specific record has the issue. There are about 200,000 records in the DB2 file and I need to know which one is specifically causing the issue.
Is there a way to query the DB2 source looking for "special characters"? Is there a way to have the SSIS package show me which record it is failing on?
Error: 2009-07-15 01:32:31.19 Code:
0xC020901C Source: Import MY APP Data
DETAIL [2670] Description: There was
an error with output column "COLUMN1"
(2710) on output "OLE DB Source
Output" (2680). The column status
returned was: "Text was truncated or
one or more characters had no match in
the target code page.".
DB2 has a built-in function called HEX() that takes just about any expression of any type and returns a VARCHAR containing the hex representation of each byte. You can also specify any binary value as a literal by prefixing it with x, for example: x'0123456789abcdef'
If the problem comes from a single-byte character, you could find it by building a temp table of all single characters from x'00' to x'ff' and seeing which ones appear in each row of your DB2 data. You could also add some code to the utility that converts the data to Unicode so that it scans the DB2 records for anomalies.
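If the rows can be exported as raw bytes, a scan like the one suggested could look roughly like the Python sketch below. It assumes the data is in a single-byte, ASCII-compatible code page and treats anything outside printable ASCII as suspect; for EBCDIC data the allowed range would have to be different. The function name and sample rows are hypothetical.

```python
def find_suspect_rows(rows, allowed=range(0x20, 0x7F)):
    """Return (row_index, hex_dump, suspect_bytes) for rows with bytes outside `allowed`."""
    allowed_set = set(allowed)
    suspects = []
    for i, raw in enumerate(rows):
        bad = sorted({b for b in raw if b not in allowed_set})
        if bad:
            # raw.hex() gives the same kind of dump as HEX() on the database side
            suspects.append((i, raw.hex(), [f"{b:02x}" for b in bad]))
    return suspects

# Made-up sample rows: the second has a high byte, the third a control character.
rows = [b"ABC", b"caf\xe9", b"tab\there"]
for index, dump, bad in find_suspect_rows(rows):
    print(index, dump, bad)
```

This narrows 200,000 records down to the row numbers and byte values worth inspecting by hand.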
