oracle sqlldr not recognizing special characters - oracle

I am facing a scenario where sqlldr is not recognizing special characters. I usually don't bother about this, as it's not important for me to have the exact same names; however, this led to another issue which is causing the system to malfunction.
unittesting.txt
8888888,John SMITÉ,12345678
unittesting.ctl
load data
CHARACTERSET UTF8
infile 'PATH/unittesting.txt'
INSERT
into table temp_table_name
Fields Terminated By ',' TRAILING NULLCOLS(
ID_NO CHAR(50) "TRIM(:ID_NO)" ,
NAME CHAR(50) "TRIM(:NAME)" ,
ID_NO2 CHAR(50) "TRIM(:ID_NO2)" )
SQLLDR command
sqlldr DB_ID/DB_PASS#TNS
control=PATH/unittesting.ctl
log=PATH/unittesting.log
bad=PATH/unittesting.bad
errors=100000000
OUTPUT from table
|ID_NO |NAME |ID_NO2 |
|8888888 |John SMIT�12345678 | |
Other information about system [RHEL 7.2, Oracle 11G]
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
select userenv('language') from dual
OUTPUT: AMERICAN_AMERICA.AL32UTF8
file -i unittesting.txt
OUTPUT: unittesting.txt: text/plain; charset=iso-8859-1
echo $LANG
OUTPUT: en_US.UTF-8
Edit:
So I tried changing the encoding of my file as advised by [Cyrille MODIANO] and using the converted file. The issue got resolved.
iconv -f iso-8859-1 -t UTF-8 unittesting.txt -o unittesting_out.txt
My challenge now is that I don't know the character set of the incoming files, and they come from many different sources. The output of file -i that I get for my source data file is:
: inode/x-empty; charset=binary
From my understanding, charset=binary means that the character set is unknown. Please advise what I can do in this case; any small advice or idea is much appreciated.
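One pragmatic approach, sketched below under the assumption that GNU `file` and `iconv` are available, is to detect the encoding first and only convert when needed. Two caveats: `file`'s detection is a heuristic, and the inode/x-empty in your output actually indicates that the file you tested was empty, which is why no charset could be reported. The function name to_utf8 and the Latin-1 fallback are my assumptions, not anything from your system:

```shell
# Sketch: normalize an incoming data file to UTF-8 before running sqlldr.
# Assumes GNU `file` and `iconv`; the ISO-8859-1 fallback is a guess that
# fits the sample file shown above, not a universal answer.
to_utf8() {
  in="$1" out="$2"
  enc=$(file -b --mime-encoding "$in")   # e.g. iso-8859-1, utf-8, binary
  case "$enc" in
    utf-8|us-ascii)
      cp "$in" "$out" ;;                 # already loadable with CHARACTERSET UTF8
    binary|unknown*)
      # detection failed (or the file is empty): best-effort fallback
      iconv -f ISO-8859-1 -t UTF-8 "$in" -o "$out" ;;
    *)
      iconv -f "$enc" -t UTF-8 "$in" -o "$out" ;;
  esac
}
```

Feeding the converted file to sqlldr then matches the iconv fix that already worked for you. For truly unknown encodings there is no reliable automatic answer, so quarantining files whose detected encoding is binary may be safer than guessing.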

Related

How to add sysdate from bcp

I have a .csv file with the following sample data format:
REFID|PARENTID|QTY|DESCRIPTION|DATE
AA01|1234|1|1st item|null
AA02|12345|2|2nd item|null
AA03|12345|3|3rd item|null
AA04|12345|4|4th item|null
To load the above file into a table I am using below BCP command:
/bcp $TABLE_NAME in $FILE_NAME -S $DB_SERVER -t "|" -F 1 -U $DB_USERNAME -d $DB_NAME
What I am trying to get is the result below (adding sysdate instead of null from bcp):
AA01|1234|1|1st item|3/16/2020
AA02|12345|2|2nd item|3/16/2020
AA03|12345|3|3rd item|3/16/2020
AA04|12345|4|4th item|3/16/2020
Update: I was able to exclude the header with @Jamie's answer using the -F 1 option, but I am still looking for some help on inserting the date with bcp. I tried looking through some old Q&As, but no luck so far.
To exclude a single header record, you can use the -F option, which tells BCP which line in the file is the first line to begin loading from. For your sample, -F2 should work fine. However, your command has other issues as well.
There is no way to introduce new data using the BCP command as you stated; BCP cannot introduce a date value while copying data into your table. To accomplish this, I suggest either putting a default on your date column, or first loading the raw data into a table without the date column and then introducing the date value as you see fit in later processing.
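If you instead pre-process the file before bcp ever sees it, a minimal sketch could rewrite the null field on the way in. Assumptions: the pipe-delimited layout shown in the question, GNU date, and that the literal null only ever appears in the trailing DATE field; the helper name add_load_date is hypothetical:

```shell
# Hypothetical helper: replace a trailing "null" DATE field with today's
# date in M/D/YYYY form (matching the desired output above).
add_load_date() {
  today=$(date +%-m/%-d/%Y)      # e.g. 3/16/2020 (%-m needs GNU date)
  sed "s#|null\$#|$today#" "$1"  # only touches a trailing |null field
}
```

The rewritten file can then be loaded with the same bcp command; the staging-table-plus-UPDATE approach above remains the more robust option.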

MYSQL bulk insert - Linux

I am trying to load a text file into MySQL, but I get the error below:
Error Code: 1064
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Rank=@Rank' at line 7
LOAD DATA LOCAL INFILE 'F:/keyword/Key_2018-10-06_06-44-09.txt'
INTO TABLE table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\r\n'
IGNORE 0 LINES
(@dump_date,@Rank)
SET dump_date=@dump_date,Rank=@Rank;
But the above query works on the Windows server, while the same query does not work on the Linux server.
I am going to suggest here that you try executing that command from the command line in a single line:
LOAD DATA LOCAL INFILE 'F:/keyword/Key_2018-10-06_06-44-09.txt' INTO TABLE
table FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n' IGNORE 0 LINES
(@dump_date,@Rank) SET dump_date=@dump_date,Rank=@Rank;
For formatting reasons I have added newlines above, but don't do that when you run it from the Linux prompt; just use a single line. The text will wrap around nicely as you type it.
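A sketch of that invocation (connection flags are placeholders; --local-infile must also be enabled on the server). Note that MySQL user variables are written with a leading @, since a literal # starts a comment in MySQL, which by itself can produce a 1064 error:

```shell
# Build the statement as one logical line; the backslash-newlines are
# removed by the shell, so mysql receives a single-line statement.
stmt="LOAD DATA LOCAL INFILE 'F:/keyword/Key_2018-10-06_06-44-09.txt' \
INTO TABLE \`table\` FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n' \
IGNORE 0 LINES (@dump_date,@Rank) SET dump_date=@dump_date, Rank=@Rank;"
# Placeholder connection details:
# mysql --local-infile -u user -p dbname -e "$stmt"
```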

Register/Trademark symbols in vertica

I have a txt file containing some data.
One of the columns contains Register/Trademark/Copyright symbol in it.
For eg, "DataWeb #symphone ®" and "Copyright © technologies"
Now when I load this txt file into the database, all the data gets stored properly except these symbols: ® ©.
Are they supported by Vertica? Is there any way to do this?
Thanks!
Vertica supports Unicode characters encoded in UTF-8. Your message is a little bit vague because it is not clear what your problem is. If I were you, I would double-check that those characters are properly encoded and that your font set is able to visualise them. Here you have a little test...
First let's create a properly UTF-8 encoded file:
$ echo -e "DataWeb #symphone \xc2\xae" > /tmp/test.dat
$ echo -e "Copyright \xc2\xa9 technologies" >> /tmp/test.dat
$ cat /tmp/test.dat
DataWeb #symphone ®
Copyright © technologies
Then let's create/load a table:
$ vsql
SQL> CREATE TABLE public.test ( txt VARCHAR(20) ) ;
SQL> COPY public.test FROM '/tmp/test.dat' ABORT ON ERROR DIRECT;
And, finally, let's query this table:
$ vsql
SQL> SELECT txt FROM public.test ;
txt
---------------------
DataWeb #symphone ®
Copyright © technol
(2 rows)
I'd suggest you run this test from Linux using the vsql command-line interface (avoid Windows and click-click interfaces).

Insert Special characters 'Mongolian tögrög' and symbol '₮' in Oracle Database

I need to insert currency Mongolian tögrög and symbol ₮ to Oracle Database.
The insert query as :
INSERT INTO CURRENCY (CUR_ISO_ID, CUR_ISO_CODE, CUR_DESC, CUR_DECIMAL_PLACE, CUR_SYMBOL)
VALUES (496,'MNT','Mongolian tögrög',2,'₮');
results as:
CUR_ISO_ID | CUR | CUR_DESC | CUR_DECIMAL_PLACE | CUR_SYMBOL |
-----------------------------------------------------------------------
496 | MNT | Mongolian t?gr?g | 2 | . |
Kindly advise how to get the special characters inserted as-is into the database, i.e. the symbol as ₮ rather than ., and the description as Mongolian tögrög rather than Mongolian t?gr?g. Please help.
Before you launch SQL*Plus, enter these commands:
chcp 65001
set NLS_LANG=.AL32UTF8
The first command sets the codepage of cmd.exe to UTF-8.
The second command tells your database: "I am using UTF-8".
Then your SQL should work. I don't think there is any 8-bit Windows codepage 125x which supports the Mongolian tögrög symbol ₮.
See also this post to get some more information: NLS_LANG and others.
Check also this discussion on how to use sqlplus with UTF-8 on the Windows command line; there is an issue when you use UTF-8 at the command line.
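A quick way to verify the codepage claim from a Linux box (GNU iconv assumed; U+20AE is the ₮ symbol): it passes through UTF-8 cleanly but has no mapping in Windows-1252:

```shell
# ₮ is U+20AE, UTF-8 bytes E2 82 AE: valid UTF-8 ...
printf '\342\202\256\n' | iconv -f UTF-8 -t UTF-8 >/dev/null && echo "UTF-8: ok"
# ... but it has no code point in the Windows-1252 codepage:
printf '\342\202\256\n' | iconv -f UTF-8 -t WINDOWS-1252 >/dev/null 2>&1 \
  || echo "WINDOWS-1252: not representable"
```

This is why the symbol arrives as . when the client session is running in an 8-bit codepage.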

sqlite field separator for importing

I just started using SQLite for our log-processing system, where I import a file that has '#' as the field separator into an SQLite database.
If I run the following in SQLite repl
$ sqlite3 log.db
sqlite> .separator "#"
sqlite> .import output log_dump
It works [import was successful]. But if I try to do the same via a bash script
sqlite log.db '.separator "#"'
sqlite log.db '.import output log_dump'
it doesn't. The separator shifts back to '|', and I get an error saying that there are insufficient columns:
output line 1: expected 12 columns of data but found 1
How can I overcome this issue?
You should pass both commands to sqlite in the same invocation; each separate sqlite run starts a fresh session, so the .separator setting from the first command is lost by the time the second runs:
echo -e '.separator "#"\n.import output log_dump' | sqlite log.db
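An equivalent that avoids the echo -e escape handling is a here-document, which also keeps both dot-commands in one session (sqlite3 here; older installs may name the binary sqlite). The sample data and the CREATE TABLE line are only there to make the sketch self-contained; your real output file and log_dump table take their place:

```shell
# Stand-in data file with '#' as the field separator:
printf 'a#b#c\n1#2#3\n' > output
sqlite3 log.db <<'EOF'
CREATE TABLE log_dump (f1, f2, f3);
.separator "#"
.import output log_dump
EOF
```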
