I am working with an Oracle 12.2 database. The database character set is WE8MSWIN1252 (i.e. an ASCII-based, single-byte character set).
The database contains a table with a CLOB column (according to Oracle SQL Developer). Some values in this column contain non-ASCII characters; I know this because when I apply the ASCIISTR function to the column I can see the escaped non-ASCII character codes.
How is this possible? I thought databases with an ASCII-based character set could only store Unicode in NVARCHAR2, NCLOB, etc.
(I only discovered this when using a linked server to the Oracle database from SQL Server: when I ran an OPENQUERY against the table with the CLOB, it returned ? for the non-ASCII characters. When I changed the OPENQUERY query string to use TO_NCLOB(clob_column), it returned the non-ASCII characters correctly.)
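For reference, the working query looked roughly like this (ORACLE_LINK and my_table are placeholder names for my linked server and table):
SELECT *
FROM OPENQUERY(ORACLE_LINK,
    'SELECT TO_NCLOB(clob_column) AS clob_column FROM my_table');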
Any ideas?
Thanks
From the Wikipedia description of Windows-1252 (Oracle's WE8MSWIN1252):
Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.
So a CLOB in a database with this character set can store strings like éàè, and ASCIISTR returns escaped codes because these characters are not defined in ASCII. For example:
SQL> select asciistr('é') eaccent, asciistr('e') e from dual;
EACCENT E
---------- -
\FFFD\FFFD e
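If you want to confirm what the database is actually configured with, you can query the NLS settings directly, for example:
SELECT parameter, value
FROM nls_database_parameters
WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');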
I'm trying to load a UTF-8 CSV file containing Chinese characters, only to discover that the correct encoding is lost in my table. The table is configured with the utf8 character set.
I'm using a bash script on RHEL 5 with the MySQL command-line client, and my statement is:
LOAD DATA LOCAL INFILE 'file' INTO TABLE `table`
CHARACTER SET utf8
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
Is there something I can do to overcome this?
Recently I managed to do exactly this.
I loaded a text file containing lots of Chinese characters into MySQL:
my text file was encoded in UTF-8,
my table was encoded with utf8,
and I used your statement.
It worked.
I think you should first convert your file to UTF-8 and make sure your table is encoded with utf8.
BTW, adding charset=utf8, as in CREATE TABLE test (column_a varchar(100)) charset=utf8, makes the table use the utf8 encoding.
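If the table already exists with a different character set, you can check and convert it; a minimal sketch, assuming the table is named test as above:
SHOW CREATE TABLE test;                          -- shows the table's current character set
ALTER TABLE test CONVERT TO CHARACTER SET utf8;  -- converts the columns and existing data to utf8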
I am using Oracle 10g. The character sets for the database are as follows: NLS_NCHAR_CHARACTERSET = AL16UTF16, NLS_CHARACTERSET = AL32UTF8.
I have the ® (circled "R") symbol coming in a .txt file in one of the fields, and when that file is loaded into an external table, the symbol is converted to '?'.
Please suggest.
Where do you see the ® being converted to a question mark? It may be a problem with the encoding of whatever you're using to view the data rather than with the table itself. I'd also check the tool you're using to load the database. A UTF-8 database character set certainly supports that character.
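One way to tell whether the problem is in the stored data or only in the display is to look at the bytes Oracle actually stored; a quick check, assuming the column is called some_col in my_table (placeholder names):
SELECT some_col,
       DUMP(some_col, 1016) AS stored_bytes  -- 1016 = hex bytes plus the character set name
FROM my_table;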
I'm trying to load localized strings from a Unicode (UTF-8 encoded) CSV into an Oracle database using SQL*Loader. I've tried all sorts of combinations, but nothing seems to give me the result I'm looking for, which is to have special Greek characters like Δ load correctly instead of being converted to garbled characters or to ¿.
My table definition looks like this:
CREATE TABLE "GLOBALIZATIONRESOURCE"
(
"RESOURCETYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
"CULTURE" VARCHAR2(20 CHAR) NOT NULL ENABLE,
"KEY" VARCHAR2(128 CHAR) NOT NULL ENABLE,
"VALUE" VARCHAR2(2048 CHAR),
"DESCRIPTION" VARCHAR2(512 CHAR),
CONSTRAINT "PK_GLOBALIZATIONRESOURCE" PRIMARY KEY ("RESOURCETYPE","CULTURE","KEY") USING INDEX TABLESPACE REPSPACE_IX ENABLE
)
TABLESPACE REPSPACE;
I have tried the following configurations in my control file (and actually every permutation I could think of)
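-- Option 1: no CHARACTERSET clause (SQL*Loader falls back to the character set in NLS_LANG)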
load data
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
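-- Option 2: CHARACTERSET UTF8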
load data
CHARACTERSET UTF8
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
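-- Option 3: CHARACTERSET UTF16 (the data file must actually be UTF-16 encoded for this to work)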
load data
CHARACTERSET UTF16
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY X'002c' OPTIONALLY ENCLOSED BY X'0022'
TRAILING NULLCOLS
(
"RESOURCETYPE" CHAR(255),
"CULTURE" CHAR(20),
"KEY" CHAR(128),
"VALUE" CHAR(2048),
"DESCRIPTION" CHAR(512)
)
With the first two options, the Unicode characters aren't stored correctly and just show up as upside-down question marks.
If I choose the last option, UTF16, I get the following error even though the data in my fields is much shorter than the lengths specified:
Field in data file exceeds maximum length
It seems as though no combination of CTL file settings (even setting the byte order to little or big endian) works correctly. Can someone please give an example of a configuration (table structure and CTL file) that correctly loads Unicode data from a CSV? Any help would be greatly appreciated.
Note: I've already been to http://docs.oracle.com/cd/B19306_01/server.102/b14215/ldr_concepts.htm and http://docs.oracle.com/cd/B10501_01/server.920/a96652/ch10.htm.
I had the same issue and resolved it with the steps below:
Open the data file in Notepad++, go to the "Encoding" menu, convert the file to UTF-8, and save it.
Use CHARACTERSET UTF8 in the CTL file and then load the data.
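If the file is on a server without a GUI editor, the same conversion can be done from the shell; a minimal sketch, assuming the file is named data.csv and is currently encoded in Windows-1252 (both assumptions):
file -i data.csv                                          # report the current (guessed) encoding
iconv -f WINDOWS-1252 -t UTF-8 data.csv > data_utf8.csv   # re-encode to UTF-8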
You have two problems:
1. The character set.
Answer: You can solve this by finding your text file's character set (most of the time Notepad++ can tell you). After finding it, you have to look up the corresponding SQL*Loader character set name, which you can find here: https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313
That should resolve the character set problem.
2. sqlldr reports "Field in data file exceeds maximum length" even though your actual data is shorter.
Answer: You can solve this by adding CHAR(4000) (or whatever the actual maximum length is) to the problematic column; SQL*Loader's default maximum field length is only 255 bytes, which multi-byte data easily exceeds. In my case the problematic column was the "e" column, as in the example below. I solved my problem this way, hope it helps.
LOAD DATA
CHARACTERSET UTF8
-- This line is comment
-- Turkish charset (for ÜĞİŞ etc.)
-- CHARACTERSET WE8ISO8859P9
-- Character list is here.
-- https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313
INFILE 'data.txt' "STR '~|~\n'"
TRUNCATE
INTO TABLE SILTAB
FIELDS TERMINATED BY '#'
TRAILING NULLCOLS
(
a,
b,
c,
d,
e CHAR(4000)
)
You must ensure that the following character sets are the same:
the database character set
the dump file character set
the client character set from which you are doing the import (NLS_LANG)
If the client-side character set is different, Oracle will attempt to convert the characters to the native database character set, and this might not always produce the desired result.
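For example, on Linux the client character set is taken from NLS_LANG; a minimal sketch, assuming a UTF-8 data file and a control file named load.ctl (placeholder names and credentials):
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8   # client character set used by imp and sqlldr
sqlldr userid=scott/tiger control=load.ctl log=load.log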
Don't use MS Office to save the spreadsheet as a Unicode .csv.
Instead, use OpenOffice to save it as a UTF-8 encoded .csv file.
Then, in the loader control file, add CHARACTERSET UTF8.
Run Oracle SQL*Loader; this gives me correct results.
There is a range of character set encodings you can specify in the control file when loading data with SQL*Loader.
Note that the Western European character sets do not actually include Greek letters; for Greek characters you would need a Greek or Unicode character set such as EL8ISO8859P7 or UTF8. For Western European data you can use:
LOAD DATA
CHARACTERSET WE8ISO8859P1
Or, for MS Word input files with smart quotes and similar characters, try this in the control file:
LOAD DATA
CHARACTERSET WE8MSWIN1252
I have an external table that reads from a fixed-length file. The file is expected to contain special characters. In my case the word containing a special character is "Göteborg". Because "ö" is a special character, it looks like Oracle is counting it as 2 bytes, and that causes the trouble: the subsequent fields in the file get shifted by 1 byte, thereby messing up the data. Has anyone faced this issue before? So far we have tried the following:
Changed the value of NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1
Tried setting the database character set to UTF-8
Tried changing NLS_LENGTH_SEMANTICS to CHAR instead of BYTE using ALTER SYSTEM
Tried changing the external table character set to AL32UTF8
Tried changing the external table character set to UTF-8
Nothing works.
Other details include:
File is UTF-8 encoded
Operating System : RHEL
Database: Oracle 11g
Anything else that I might be missing? Any help will be appreciated. Thanks!
NLS_LENGTH_SEMANTICS only pertains to the creation of new tables.
Below is what I did to fix this very problem. The key access parameters are:
records delimited by newline
CHARACTERSET AL32UTF8
STRING SIZES ARE IN CHARACTERS
i.e.:
ALTER SESSION SET nls_length_semantics = CHAR
/
CREATE TABLE TDW_OWNER.SDP_TST_EXT
(
COST_CENTER_CODE VARCHAR2(10) NULL,
COST_CENTER_DESC VARCHAR2(40) NULL,
SOURCE_CLIENT VARCHAR2(3) NULL,
NAME1 VARCHAR2(35) NULL
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY DBA_DATA_DIR
ACCESS PARAMETERS
( records delimited by newline
CHARACTERSET AL32UTF8
STRING SIZES ARE IN CHARACTERS
logfile DBA_DATA_DIR:'sdp_tst_ext_%p.log'
badfile DBA_DATA_DIR:'sdp_tst_ext_%p.bad'
discardfile DBA_DATA_DIR:'sdp_tst_ext_%p.dsc'
fields
notrim
(
COST_CENTER_CODE CHAR(10)
,COST_CENTER_DESC CHAR(40)
,SOURCE_CLIENT CHAR(3)
,NAME1 CHAR(35)
)
)
LOCATION (DBA_DATA_DIR:'sdp_tst.dat')
)
REJECT LIMIT UNLIMITED
NOPARALLEL
NOROWDEPENDENCIES
/
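Once the external table is defined this way, one quick sanity check is to compare character and byte lengths on a value containing a multi-byte character such as "Göteborg"; for example:
SELECT name1,
       LENGTH(name1)  AS char_len,   -- number of characters
       LENGTHB(name1) AS byte_len    -- number of bytes; larger when multi-byte characters are present
FROM TDW_OWNER.SDP_TST_EXT;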