How to make an INFILE in SQLLDR - oracle

Help guys, I need to make a dynamic INFILE. The common INFILE is
INFILE 'yourcsvfile.csv'
so my control file looks like this:
LOAD DATA
INFILE 'yourcsvfile.csv'
append
INTO TABLE table_name
TRUNCATE
FIELDS TERMINATED BY ','
Is it possible to make it like this?
INFILE (sysdate , 'YYYYMMDD') || (_STRING.csv)
meaning I am searching for 20160816_STRING.csv

You can create a Windows batch script which gets the current date in the format you want and then calls the Oracle loader:
set "part1=!date:~10,4!!date:~6,2!/!date:~4,2!"
set "part2=_STRING.csv"
set "yourfile=%part1%%part2%"
#echo LOAD DATA INFILE %yourfile% APPEND INTO TABLE table_name TRUNCATE FIELDS TERMINATED BY ','; | sqlplus username/password#database
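On Unix the same idea could be a small shell script; a minimal sketch, where table_name and the column list (col1, col2) are placeholders for your own:
#!/bin/sh
# Build today's file name, e.g. 20160816_STRING.csv
yourfile="$(date +%Y%m%d)_STRING.csv"
# Generate a control file pointing at today's CSV, then run SQL*Loader
cat > dynamic.ctl <<EOF
LOAD DATA
INFILE '$yourfile'
APPEND
INTO TABLE table_name
FIELDS TERMINATED BY ','
(col1, col2)
EOF
sqlldr username/password@database control=dynamic.ctl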

Related

multi file insert from hive table not working?

Hi, I have 200 GB of data in one of my Hive tables, backed by HBase.
I have to create 142 different files out of that table; currently I am trying with 3 files only.
I want all the queries to run in parallel at the same time.
I was trying a multi-file insert from the Hive table but am getting a parse exception.
This is the query I was trying:
FROM hbase_table_FinancialLineItem
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/Japan.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
select * from hbase_table_FinancialLineItem WHERE FilePartition='Japan'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/SelfSourcedPrivate.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
select * from hbase_table_FinancialLineItem WHERE FilePartition='SelfSourcedPrivate'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/ThirdPartyPrivate.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
select * from hbase_table_FinancialLineItem WHERE FilePartition='ThirdPartyPrivate';
And after running this I was getting the error below:
FAILED: ParseException line 7:9 missing EOF at 'from' near '*'
In Hive's multi-insert syntax the source table is named once, in the leading FROM clause, so each SELECT must not repeat it. The ParseException points at the repeated from hbase_table_FinancialLineItem inside each SELECT; remove it and keep just SELECT * WHERE ... under each INSERT OVERWRITE.
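For reference, a sketch of the corrected multi-insert, keeping the paths and filters from the question:
FROM hbase_table_FinancialLineItem
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/Japan.txt'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
SELECT * WHERE FilePartition='Japan'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/SelfSourcedPrivate.txt'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
SELECT * WHERE FilePartition='SelfSourcedPrivate'
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FinancialLineItem/ThirdPartyPrivate.txt'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
SELECT * WHERE FilePartition='ThirdPartyPrivate';
Because all three outputs share the single leading FROM, Hive scans the table once and writes the files in the same job, which is what the parallelism here amounts to.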

Create temporary file to load in Hive table using stdout redirection

I would like to create a script to load a .tsv file into Hive.
However, since the .tsv file contains a header,
I first have to create a temporary file without it.
In my script.hql I have the following:
DROP TABLE metadata IF EXISTS ;
CREATE TABLE metadata (
id INT,
value STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE ;
! tail -n +2 metadata.tsv > tmp_metadata.tsv ;
LOAD DATA LOCAL 'tmp_metadata.tsv' INTO metadata ;
The problem is that Hive complains about the > that should perform the redirection to the new file, and therefore the script fails.
How can I fix this?
1. Create a new shell script named script.sh and add this to it:
#!/bin/sh
tail -n +2 metadata.tsv > tmp_metadata.tsv
hive -v -f ./script.hql
2. In your script.hql, instead of this:
DROP TABLE metadata IF EXISTS ;
CREATE TABLE metadata (id INT,value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE ;
! tail -n +2 metadata.tsv > tmp_metadata.tsv ;
LOAD DATA LOCAL 'tmp_metadata.tsv' INTO metadata ;
use this:
DROP TABLE IF EXISTS metadata;
CREATE TABLE metadata (
id INT,
value STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE ;
LOAD DATA LOCAL INPATH 'tmp_metadata.tsv' INTO TABLE metadata ;
DROP TABLE metadata IF EXISTS ; is not correct; change it to DROP TABLE IF EXISTS metadata;
and LOAD DATA LOCAL 'tmp_metadata.tsv' INTO metadata ; is not correct either; change it to LOAD DATA LOCAL INPATH 'tmp_metadata.tsv' INTO TABLE metadata ;
3. Now make your shell script executable and execute it:
chmod +x script.sh
./script.sh
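As a side note, if your Hive version is 0.13 or later you may be able to skip the temporary file entirely by telling Hive to ignore the header row; a sketch, assuming the same metadata table:
CREATE TABLE metadata (
id INT,
value STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");
LOAD DATA LOCAL INPATH 'metadata.tsv' INTO TABLE metadata;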

add date time from flat file name cloudera

I started an EC2 cluster on Amazon to install Cloudera. I got it installed and configured, and loaded some of the Wiki Page Views public snapshot into HDFS. The structure of the files is as follows:
projectcode, pagename, pageviews, bytes
The files are named like this:
pagecounts-20090430-230000.gz
where 20090430 is the date and 230000 is the time.
when loading the data from HDFS to Impala, I do it as such:
CREATE EXTERNAL TABLE wikiPgvws
(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/user/hdfs';
One thing I missed is the date and time of each file. The directory:
/user/hdfs
contains multiple pagecount files associated with different dates and times. How can one pull that information and store it in a column when loading into Impala?
I think the thing you are missing is the concept of partitions. If you define the table as partitioned, the data can be divided into different partitions based on the timestamp (in the name) of the file. I was able to work around it in Hive, and I hope you can do the needful (if any) for Impala, as the query syntax is largely the same.
For me, this problem was not solvable using Hive alone, so I mixed bash with Hive scripting and it works fine for me. This is how I wrapped it up:
1. Create the table wikiPgvws with a partition
2. Create the table wikiTmp with the same fields as wikiPgvws, except for the partition
3. For each file:
i. Load the data into wikiTmp
ii. grep the timestamp from the file name
iii. Use sed to replace the placeholders in a predefined hql script file so it loads the data into the actual table, then run it
4. Drop the table wikiTmp and remove tmp.hql
The script is as follows:
#!/bin/bash
hive -e "CREATE EXTERNAL TABLE wikiPgvws(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
PARTITIONED BY(dts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE";
hive -e "CREATE TABLE wikiTmp(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE"
for fileName in $(hadoop fs -ls /user/hdfs/bounty/pagecounts-*.txt | grep -Po '(?<=\s)(/user.*$)')
do
echo "currentFile :$fileName"
dst=$(echo $filename | grep -oE '[0-9]{8}-[0-9]{6}')
echo "currentStamp $dst"
sed "s!sourceFile!'$fileName'!" t.hql > tmp.hql
sed -i "s!targetPartition!$dst!" tmp.hql
hive -f tmp.hql
done
hive -e "DROP TABLE wikiTmp"
rm -f tmp.hql
The hql script consists of just two lines:
LOAD DATA INPATH sourceFile OVERWRITE INTO TABLE wikiTmp;
INSERT OVERWRITE TABLE wikiPgvws PARTITION (dts = 'targetPartition') SELECT w.* FROM wikiTmp w;
Epilogue:
Check whether options equivalent to hive -e and hive -f are available in Impala; without them, this script is of no use to you. The grep commands that fetch the file name and timestamp will also need to be adjusted to your table location and stamp pattern. This is just one way to show how the job can be done; I couldn't find another.
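For what it's worth, impala-shell does appear to have close equivalents; a sketch (flag behaviour may vary by version):
impala-shell -q "SELECT COUNT(*) FROM wikiPgvws"   # run a single query, like hive -e
impala-shell -f tmp.hql                            # run a script file, like hive -f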
Enhancement
If everything works well, consider merging the first two DDLs into another script to make it look cleaner. I'm not sure whether hql script arguments can be used to define partition values, but you could look into them as a replacement for sed.
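One way to replace sed, assuming your Hive version expands ${hiveconf:...} variables inside scripts, is to parameterize t.hql with -hiveconf; a sketch:
-- t.hql, with hiveconf variables in place of the sed placeholders
LOAD DATA INPATH '${hiveconf:sourceFile}' OVERWRITE INTO TABLE wikiTmp;
INSERT OVERWRITE TABLE wikiPgvws PARTITION (dts = '${hiveconf:targetPartition}') SELECT w.* FROM wikiTmp w;
The loop would then call hive -hiveconf sourceFile="$fileName" -hiveconf targetPartition="$dst" -f t.hql, and the two sed lines (and tmp.hql) disappear.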

Multiple Line Variable into SQLPlus from Shell Script

What is the best way to pass multiple values from one variable into separate records in an Oracle database?
I want to take the output from:
hddlist=$(iostat -Dl | awk '{print $1 "=" $NF}')
This returns output like this:
hdisk36=0.8
hdisk37=0.8
hdisk38=0.8
hdisk40=5.5
hdisk52=4.9
I want to insert them into a database like so:
sqlplus -s /nolog <<EOF1
connect / as sysdba
set verify off
insert into my_table ##Single Record Here
EOF1
How can I systematically separate out the values so I can create individual records that look like this:
Disk Value
--------- -------
hdisk36 0.8
hdisk37 0.8
hdisk38 0.8
hdisk40 5.5
hdisk52 4.9
I originally tried a while loop with a counter but could not seem to get it to work. An exact solution would be nice but some directional advice would be just as helpful.
Loop and generate insert statements.
sql=$(iostat -Dl | awk '{print $1 "=" $NF}' | while IFS== read -r k v ; do
printf "insert into mytable (k, v) values ('%s', %s);\n" "$k" "$v"
done)
This output can then be passed to sqlplus, perhaps like this:
sqlplus -s /nolog <<EOF1
connect / as sysdba
set verify off
$sql
EOF1
Although, depending on the line format of iostat, it might be simpler to just omit awk and parse with read directly.
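For instance, something along these lines (a bash sketch, assuming the disk name is the first field and the value of interest is the last one):
sql=$(iostat -Dl | while read -r -a f; do
    [ "${#f[@]}" -ge 2 ] || continue        # skip blank and header lines
    printf "insert into mytable (k, v) values ('%s', %s);\n" \
        "${f[0]}" "${f[${#f[@]}-1]}"
done)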
You can redirect the output to a file and then use an external table.
It should look something like this:
CREATE TABLE hddlist_ext_table (
disk CHAR(16),
value CHAR(3)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER DEFAULT DIRECTORY tab_dir
ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '=')
LOCATION ('your_file_name'));
Then you can either use this table directly for your data or insert-select from it into your table:
insert into my_table
select disk, value from hddlist_ext_table;
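Note that tab_dir here is an Oracle directory object that must exist and point at the directory holding your file; a minimal sketch, where the path and grantee are placeholders:
CREATE OR REPLACE DIRECTORY tab_dir AS '/tmp/hdd';
GRANT READ, WRITE ON DIRECTORY tab_dir TO your_user;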
You can insert multiple rows in a single SQL statement in Oracle like this:
INSERT ALL
INTO mytable (column1, column2, column3) VALUES ('val1.1', 'val1.2', 'val1.3')
INTO mytable (column1, column2, column3) VALUES ('val2.1', 'val2.2', 'val2.3')
INTO mytable (column1, column2, column3) VALUES ('val3.1', 'val3.2', 'val3.3')
SELECT * FROM dual;
If you intend to run this script automatically at intervals and then look at the results for each disk, you will probably need additional columns to hold the date and time.
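For example, the generated inserts could stamp each row as it is created; a one-line sketch reusing the loop above, with a hypothetical sampled_at DATE column:
printf "insert into mytable (k, v, sampled_at) values ('%s', %s, SYSDATE);\n" "$k" "$v"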
You might also look at sqlldr, as you can specify a control file telling it what your data contains, and it will load the data into a table. It is better suited than SQL*Plus if you are loading lots of data.
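A minimal control file for this data might look like the following sketch; hddlist.txt, mytable, and the column names are assumptions:
LOAD DATA
INFILE 'hddlist.txt'
APPEND INTO TABLE mytable
FIELDS TERMINATED BY '='
(k, v)
Saved as hddlist.ctl, it would be run with something like sqlldr username/password@database control=hddlist.ctl.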

Oracle. load data infile error

Table:
CREATE TABLE image_table (
image_id NUMBER(5),
file_name VARCHAR2(30),
image_data BLOB);
SQL:
load data infile * replace into table test_image_table
fields terminated by ','
(
image_id INTEGER(5),
file_name CHAR(30),
image_data LOBFILE (CONSTANT 'C:\img.txt') TERMINATED BY EOF
)
C:\img.txt: 001,C:\1.jpg
Error:
ORA-00928: missing SELECT keyword
00928. 00000 - "missing SELECT keyword"
*Cause:
*Action:
Error at Line: 4 Column: 1
What am I doing wrong?
You want to use SQL*Loader, which is not SQL*Plus. You have to save what you call SQL as a file with the .ctl extension, and call sqlldr:
sqlldr login/password@database control=my_file.ctl
Note that INFILE * means that you must have a BEGINDATA section inside your CTL file.
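Assuming the record 001,C:\1.jpg in C:\img.txt is the data and C:\1.jpg is the image to load, the control file might look like this sketch: the record is inlined after BEGINDATA (required with INFILE *), and the BLOB is read via LOBFILE(file_name) from the path in the second field rather than LOBFILE (CONSTANT ...). Also note the question creates image_table but loads test_image_table; the names should match.
LOAD DATA
INFILE *
REPLACE INTO TABLE test_image_table
FIELDS TERMINATED BY ','
(
image_id INTEGER EXTERNAL(5), -- INTEGER EXTERNAL, not INTEGER, for numbers stored as text
file_name CHAR(30),
image_data LOBFILE(file_name) TERMINATED BY EOF
)
BEGINDATA
001,C:\1.jpg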
It seems like you are trying to use SQL*Plus to run your SQL*Loader control file. Use one of the sqlldr invocations below at your UNIX command line. Don't forget to save your mentioned SQL file as a .ctl file.
sqlldr userid=username/password@server control=loader.ctl
or
sqlldr username/password@server control=loader.ctl
Try this in SQL Developer: host sqlldr username/password control=my_file.ctl
