What is the best way to pass multiple values from one variable into separate records in an Oracle database?
I want to take the output from:
hddlist=$(iostat -Dl | awk '{print $1"="$NF}')
This returns output like this:
hdisk36=0.8
hdisk37=0.8
hdisk38=0.8
hdisk40=5.5
hdisk52=4.9
I want to insert them into a database like so:
sqlplus -s /nolog <<EOF1
connect / as sysdba
set verify off
insert into my_table ##Single Record Here
EOF1
How can I systematically separate out the values so I can create individual records that look like this:
Disk Value
--------- -------
hdisk36 0.8
hdisk37 0.8
hdisk38 0.8
hdisk40 5.5
hdisk52 4.9
I originally tried a while loop with a counter but could not get it to work. An exact solution would be nice, but some directional advice would be just as helpful.
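Loop over the hdisk lines and generate one INSERT statement per line. The disk name must be quoted as a string literal, otherwise Oracle treats hdisk36 as an identifier:
sql=$(iostat -Dl | awk '/^hdisk/ {print $1"="$NF}' | while IFS== read -r k v ; do
printf "insert into mytable (k, v) values ('%s', %s);\n" "$k" "$v"
done)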
This output can then be passed to sqlplus, for example like this:
sqlplus -s /nolog <<EOF1
connect / as sysdba
set verify off
$sql
commit;
EOF1
Depending on the line format of iostat, it might be simpler to omit awk entirely and parse each line with read directly.
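To make the loop's logic explicit, here is a minimal Python sketch of the same transformation. It assumes the `disk=value` lines shown in the question and a hypothetical table `mytable(k, v)`. Each disk name is quoted as a SQL string literal, and each value is checked to be numeric before its INSERT is emitted.

```python
from __future__ import annotations


def build_inserts(text: str, table: str = "mytable") -> list[str]:
    """Turn 'hdisk36=0.8' style lines into one INSERT statement per line.

    The table name and the (k, v) columns are hypothetical placeholders.
    """
    statements = []
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and lines without a key=value pair
        disk, value = line.split("=", 1)
        float(value)  # raises ValueError if the value is not numeric
        # Double any single quotes so the disk name is a valid SQL literal.
        disk_sql = disk.replace("'", "''")
        statements.append(
            f"insert into {table} (k, v) values ('{disk_sql}', {value});"
        )
    return statements


for stmt in build_inserts("hdisk36=0.8\nhdisk40=5.5\n"):
    print(stmt)
```

The generated statements could then be fed to sqlplus in the same way as the shell version's `$sql`.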
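You can redirect the output to a file and then use an external table. This requires an Oracle directory object (tab_dir below) that points to the folder containing the file.
It should look something like this: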
CREATE TABLE hddlist_ext_table (
disk CHAR(16),
value CHAR(8)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER DEFAULT DIRECTORY tab_dir
ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '=')
LOCATION ('your_file_name'));
Then you can either query this table directly or insert-select from it into your own table:
insert into my_table
select disk, value from hddlist_ext_table;
You can insert multiple rows in a single SQL statement in Oracle like this:
INSERT ALL
INTO mytable (column1, column2, column3) VALUES ('val1.1', 'val1.2', 'val1.3')
INTO mytable (column1, column2, column3) VALUES ('val2.1', 'val2.2', 'val2.3')
INTO mytable (column1, column2, column3) VALUES ('val3.1', 'val3.2', 'val3.3')
SELECT * FROM dual;
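When the rows come from the iostat output, you can generate an INSERT ALL statement like the one above instead of typing it. Here is a minimal Python sketch, assuming a hypothetical table `mytable(disk, value)`:

```python
def build_insert_all(rows, table="mytable", columns=("disk", "value")):
    """Build a single Oracle INSERT ALL statement from (disk, value) pairs.

    The table and column names are hypothetical placeholders.
    """
    if not rows:
        raise ValueError("INSERT ALL needs at least one INTO clause")
    cols = ", ".join(columns)
    lines = ["INSERT ALL"]
    for disk, value in rows:
        disk_sql = str(disk).replace("'", "''")  # escape quotes in the literal
        number = float(value)  # reject non-numeric values early
        lines.append(f"  INTO {table} ({cols}) VALUES ('{disk_sql}', {number})")
    # INSERT ALL must end with a subquery; a single row from DUAL is enough.
    lines.append("SELECT * FROM dual;")
    return "\n".join(lines)


print(build_insert_all([("hdisk36", "0.8"), ("hdisk40", "5.5")]))
```

A single statement means a single round trip to the database, which suits a script that runs sqlplus once per iostat sample.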
If you intend to run this script automatically at intervals to then see the results of each disk, you will probably need additional columns to hold the date and time.
You might also look at SQL*Loader (sqlldr): you write a control file that describes what your data contains, and sqlldr then loads the data into a table. It is better suited than SQL*Plus when you are loading a lot of data.
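For the `disk=value` data above, a control file might look like the text produced by this Python helper. The data file name, the table `my_table` and the `(disk, value)` column list are assumptions. Adjust them to match your real file and table.

```python
def make_control_file(data_file, table="my_table"):
    """Return SQL*Loader control-file text for lines like 'hdisk36=0.8'.

    data_file, table and the (disk, value) column list are hypothetical;
    match them to your real file and table.
    """
    return (
        "LOAD DATA\n"
        f"INFILE '{data_file}'\n"
        "APPEND\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY '='\n"
        "(disk, value)\n"
    )


print(make_control_file("hddlist.dat"))
```

After saving the text as, for example, hddlist.ctl, the load would be run with something like `sqlldr username/password@database control=hddlist.ctl`.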
Related
I have an external Hive table on top of a Parquet file.
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
I want to get the row count of the table from a shell script.
I tried the following command:
myVar=$(hive -S -e "select count(*) from parquet_test;")
echo "$myVar"
I added -S to run Hive in silent mode, but I still get the whole MapReduce log along with the count in myVar. How can I get only the count?
I don't have access to any of the configuration files to change the logging level. Is there any other way?
I finally found a workaround.
First write the query result to a local directory, then read the answer from the file there.
The file contains only the result of the query.
hive -S -e "INSERT OVERWRITE LOCAL DIRECTORY '/home/test/result/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select count(*) from parquet_test;"
Then read the file into a variable:
var=$(cat /home/test/result/*)
echo "$var"
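If the count needs to be read back from a program instead of with cat, the step looks roughly like this minimal Python sketch. It assumes the LOCAL DIRECTORY output is on local disk, that Hive names its data files like 000000_0, and that the query returned exactly one row.

```python
import os


def read_count(result_dir):
    """Return the single integer Hive wrote into result_dir.

    Assumes the directory holds one or more data files (for example
    000000_0) and that the query produced exactly one row.
    """
    values = []
    for name in sorted(os.listdir(result_dir)):
        path = os.path.join(result_dir, name)
        # Skip hidden helper files (e.g. .crc checksums) and subdirectories.
        if name.startswith(".") or not os.path.isfile(path):
            continue
        with open(path) as fh:
            values.extend(line.strip() for line in fh if line.strip())
    if len(values) != 1:
        raise ValueError(f"expected one result row, found {len(values)}")
    return int(values[0])
```

Usage would be `count = read_count("/home/test/result/")`. The check for exactly one value catches the case where the directory still holds output from an unrelated query.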
myVar=$(hive -S -e 'select count(*) from parquet_test;')
echo "$myVar"
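`$(...)` captures only standard output. If Hive writes its progress and log messages to standard error (an assumption worth checking on your installation), those messages stay out of the variable. The same separation in Python, as a minimal sketch:

```python
import subprocess


def run_stdout_only(cmd):
    """Run cmd (a list of arguments) and return its stripped standard output.

    Standard error is captured separately and ignored, so log lines written
    there never end up in the returned string.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

For Hive this would be called as `run_stdout_only(["hive", "-S", "-e", "select count(*) from parquet_test;"])`. Passing an argument list avoids the extra quoting layer that eval introduces in the shell version.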
Help, guys: I need to make the INFILE dynamic. The usual INFILE clause is:
INFILE 'yourcsvfile.csv'
LOAD DATA
INFILE 'yourcsvfile.csv'
append
INTO TABLE table_name
TRUNCATE
FIELDS TERMINATED BY ','
Is it possible to do something like this?
INFILE (sysdate , 'YYYYMMDD') || (_STRING.csv)
That is, I'm looking for a file named 20160816_STRING.csv.
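You can create a Windows batch script that builds the file name from the current date and then calls SQL*Loader, passing the file with the data= parameter, which overrides the INFILE in the control file (yourcontrol.ctl stands for your existing control file). The substring offsets below assume a US-style %date% value such as "Tue 08/16/2016"; adjust them for your locale:
set "part1=%date:~10,4%%date:~4,2%%date:~7,2%"
set "part2=_STRING.csv"
set "yourfile=%part1%%part2%"
sqlldr username/password@database control=yourcontrol.ctl data=%yourfile%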
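The same idea as a minimal Python sketch avoids locale-dependent substring offsets because it formats the date explicitly. The connect string and control file name are placeholders, and the sketch assumes sqlldr accepts a data= parameter that overrides the control file's INFILE.

```python
import datetime


def todays_file(suffix="_STRING.csv", day=None):
    """Return a name like 20160816_STRING.csv for day (default: today)."""
    if day is None:
        day = datetime.date.today()
    return day.strftime("%Y%m%d") + suffix


def sqlldr_command(control_file, data_file, connect="username/password@database"):
    """Build a sqlldr argument list whose data= overrides the control file's INFILE.

    control_file and connect are placeholders for your own values.
    """
    return ["sqlldr", connect, f"control={control_file}", f"data={data_file}"]


print(sqlldr_command("yourcontrol.ctl", todays_file()))
```

If sqlldr is on the PATH, the list can be passed straight to `subprocess.run(..., check=True)`.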
I started an EC2 cluster on Amazon to install Cloudera. I got it installed and configured, and loaded some of the Wiki Page Views public snapshot into HDFS. The files are structured like this:
projectcode, pagename, pageviews, bytes
The files are named like this:
pagecounts-20090430-230000.gz
where 20090430 is the date and 230000 is the time.
When loading the data from HDFS into Impala, I do it like this:
CREATE EXTERNAL TABLE wikiPgvws
(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/user/hdfs';
What I'm missing is the date and time of each file. The directory
/user/hdfs
contains multiple pagecount files for different dates and times. How can I extract that information and store it in a column when loading into Impala?
I think what you are missing is the concept of partitions. If you define the table as partitioned, the data can be split into different partitions based on the timestamp in each file's name. I was able to work around this in Hive; I hope you can make whatever adjustments are needed for Impala, since the query syntax is largely the same.
For me, this problem could not be solved with Hive alone, so I combined bash with Hive scripting, and it works fine. This is how I put it together:
1. Create table wikiPgvws with a partition column.
2. Create table wikiTmp with the same fields as wikiPgvws, minus the partition column.
3. For each file:
i. Load the data into wikiTmp.
ii. Extract the timestamp from the file name with grep.
iii. Use sed to replace the placeholders in a predefined HQL script file that loads the data into the actual table, then run it.
4. Drop table wikiTmp and remove tmp.hql.
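Step ii is the only part that touches the file name. Here is a minimal Python sketch of it, using the same pattern as the script's grep. The as_timestamp helper is an optional, hypothetical extra for when you want a real timestamp value instead of a string partition key.

```python
import re
from datetime import datetime

# Same pattern as the script's grep -oE '[0-9]{8}-[0-9]{6}'.
STAMP = re.compile(r"[0-9]{8}-[0-9]{6}")


def partition_value(file_name):
    """Extract 'YYYYMMDD-HHMMSS' from a name like pagecounts-20090430-230000.gz."""
    match = STAMP.search(file_name)
    if match is None:
        raise ValueError(f"no timestamp in {file_name!r}")
    return match.group(0)


def as_timestamp(stamp):
    """Convert '20090430-230000' into '2009-04-30 23:00:00'."""
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S").strftime("%Y-%m-%d %H:%M:%S")
```

Raising on a non-matching name is a deliberate choice. It stops a stray file from being loaded into an empty or wrong partition.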
The script is as follows:
#!/bin/bash
hive -e "CREATE EXTERNAL TABLE wikiPgvws(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
PARTITIONED BY(dts STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE";
hive -e "CREATE TABLE wikiTmp(
project_code varchar(100),
page_name varchar(1000),
page_views int,
page_bytes int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE"
for fileName in $(hadoop fs -ls /user/hdfs/bounty/pagecounts-*.txt | grep -Po '(?<=\s)(/user.*$)')
do
echo "currentFile :$fileName"
dst=$(echo "$fileName" | grep -oE '[0-9]{8}-[0-9]{6}')
echo "currentStamp $dst"
sed "s!sourceFile!'$fileName'!" t.hql > tmp.hql
sed -i "s!targetPartition!$dst!" tmp.hql
hive -f tmp.hql
done
hive -e "DROP TABLE wikiTmp"
rm -f tmp.hql
The HQL script (t.hql) consists of just two lines:
LOAD DATA INPATH sourceFile OVERWRITE INTO TABLE wikiTmp;
INSERT OVERWRITE TABLE wikiPgvws PARTITION (dts = 'targetPartition') SELECT w.* FROM wikiTmp w;
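The two sed calls fill in the template's placeholders. Here is the same substitution as a minimal Python sketch, with the template inlined instead of read from t.hql:

```python
# The two-line template from t.hql, with its placeholders.
T_HQL = (
    "LOAD DATA INPATH sourceFile OVERWRITE INTO TABLE wikiTmp;\n"
    "INSERT OVERWRITE TABLE wikiPgvws PARTITION (dts = 'targetPartition') "
    "SELECT w.* FROM wikiTmp w;\n"
)


def render_hql(template, source_file, partition):
    """Fill in the placeholders the way the two sed commands do.

    sed without the g flag replaces only the first match on each line;
    str.replace replaces every match, which is equivalent here because
    each placeholder appears once.
    """
    return (
        template.replace("sourceFile", f"'{source_file}'")
        .replace("targetPartition", partition)
    )


print(render_hql(T_HQL, "/user/hdfs/bounty/pagecounts-20090430-230000.txt",
                 "20090430-230000"))
```

Rendering into a string instead of a temporary file also removes the need for the final rm of tmp.hql.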
Epilogue:
Check whether options equivalent to hive -e and hive -f are available in Impala (impala-shell provides -q and -f); without them, this script is of no use to you. Also, the grep commands that fetch the file name and timestamp need to be adapted to your table location and timestamp pattern. This is just one way to show how the job can be done; I couldn't find another one.
Enhancement
If everything works, consider moving the first two DDL statements into a separate script to keep things cleaner. I'm not sure whether HQL script variables (for example, hive --hivevar) can be used to define partition values, but they may be worth trying as a replacement for sed.
Using the following command:
insert overwrite local directory '/my/local/filesystem/directory/path'
select * from Emp;
overwrites all existing data in /my/local/filesystem/directory/path with the data from Emp.
What I want is to copy the data from Emp to /my/local/filesystem/directory/path without overwriting what is already there. How can I do that?
Here are my failed attempts:
hive> insert into local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:12 mismatched input 'local' expecting
TABLE near 'into' in insert clause
hive> insert local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:0 cannot recognize input near 'insert'
'local' 'directory' in insert clause
Can you please tell me how to solve this?
To append to a Hive table, you need to use INSERT INTO:
INSERT INTO will append to the table or partition keeping the existing
data intact. (Note: INSERT INTO syntax is only available starting in
version 0.8)
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
However, you can't use this to append to an existing local file, so another option is to use a bash command.
If you have a file called 'export.hql' and in that file your code is:
select * from Emp;
Then your bash command can be:
hive -f 'export.hql' >> localfile.txt
The -f option executes the HQL file, and the >> redirection appends the results to the text file.
EDIT:
The command:
hive -f 'export.hql' > localfile.txt
will write the query results to a new file (or overwrite an existing one) instead of appending.
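The only difference between > and >> is the mode in which the output file is opened. Here is a minimal Python analogue of the two redirections (the function name is hypothetical):

```python
def save_results(lines, path, append=True):
    """Write result lines to path: mode 'a' behaves like >>, mode 'w' like >."""
    mode = "a" if append else "w"
    with open(path, mode) as fh:
        for line in lines:
            fh.write(line + "\n")
```

Calling it twice with append=True accumulates results in the same file. A call with append=False truncates the file first, just as a single > does.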
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-SQLOperations
When using LOCAL, OVERWRITE is also required in your HQL.
For example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out' SELECT * FROM test
Hive documentation lacking again:
I'd like to write the results of a query to a local file as well as the names of the columns.
Does Hive support this?
Insert overwrite local directory 'tmp/blah.blah' select * from table_name;
Also, a separate question: is Stack Overflow the best place to get Hive help? @Nija has been very helpful, but I don't want to keep bothering them...
Try
set hive.cli.print.header=true;
Yes, you can. Put set hive.cli.print.header=true; in a .hiverc file in your home directory, or in any of the other Hive user properties files.
Vague Warning: be careful, since this has crashed queries of mine in the past (but I can't remember the reason).
Indeed, @nija's answer is correct, at least as far as I know. There isn't any way to write the column names when doing an insert overwrite into [local] directory ... (whether you use local or not).
Regarding the crashes described by @user1735861, there is a known bug in Hive 0.7.1 (fixed in 0.8.0) that, after set hive.cli.print.header=true;, causes a NullPointerException for any HQL command or query that produces no output. For example:
$ hive -S
hive> use default;
hive> set hive.cli.print.header=true;
hive> use default;
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Whereas this is fine:
$ hive -S
hive> set hive.cli.print.header=true;
hive> select * from dual;
c
c
hive>
Non-HQL commands are fine, though (set, dfs, !, etc.).
More info here: https://issues.apache.org/jira/browse/HIVE-2334
Hive does support writing to a local directory, and your syntax looks right for it.
Check out the docs on SELECTS and FILTERS for additional information.
I don't think Hive has a way to write the column names to a file for the query you're running. I can't say for sure that it doesn't, but I don't know of a way.
I think the only place better than SO for Hive questions would be the mailing list.
I ran into this problem today and got what I needed by doing a UNION ALL between the original query and a dummy query that creates the header row. I added a sort column to each part, set to 0 for the header and 1 for the data, so I could sort by that field and make sure the header row came out on top.
create table new_table as
select
field1,
field2,
field3
from
(
select
0 as sort_col, --header row gets lowest number
'field1_name' as field1,
'field2_name' as field2,
'field3_name' as field3
from
some_small_table --table needs at least 1 row
limit 1 --only need 1 header row
union all
select
1 as sort_col, --original query goes here
field1,
field2,
field3
from
main_table
) a
order by
sort_col --make sure header row is first
It's a little bulky, but at least you can get what you need with a single query.
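The sort-column trick can be seen more easily on in-memory rows. In this minimal Python sketch the header is deliberately appended last, the way UNION ALL output order is not guaranteed, and sorting on the key alone moves it to the top.

```python
def with_header(header, rows):
    """Mimic the UNION ALL + sort_col trick on in-memory rows.

    Data rows get key 1 and the header gets key 0, deliberately appended
    last; sorting on the key alone moves the header to the top. Python's
    sort is stable, so data rows keep their original relative order.
    """
    tagged = [(1, tuple(row)) for row in rows] + [(0, tuple(header))]
    tagged.sort(key=lambda item: item[0])
    return [row for _, row in tagged]
```

Python's stable sort keeps the data rows in their original order. Hive's ORDER BY on sort_col alone makes no such promise, so add further sort columns to the query if the data row order matters.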
Not a great solution, but here is what I do:
create table test_dat
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;
hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt
cat header.txt 000* > all.dat
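The final cat depends on the shell sorting the 000* glob so the part files come out in order. Here is a minimal Python sketch of the same concatenation. It assumes the header file and the part files are already on local disk in one directory; the function name is hypothetical.

```python
import glob
import os


def concat_with_header(directory, out_path, header_name="header.txt"):
    """Write header_name followed by every 000* part file into out_path.

    Equivalent to `cat header.txt 000* > all.dat` run inside directory,
    assuming the part files have already been copied to local disk.
    """
    parts = sorted(glob.glob(os.path.join(directory, "000*")))
    with open(out_path, "w") as out:
        for path in [os.path.join(directory, header_name)] + parts:
            with open(path) as fh:
                out.write(fh.read())
```

The explicit sorted() call matters because glob.glob, unlike the shell, returns matches in arbitrary order.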
Here's my take on it. Note: I'm not very well versed in bash, so suggestions for improvement are welcome :)
#!/usr/bin/env bash
# works like this:
# ./get_data.sh database.table > data.csv
INPUT=$1
TABLE=${INPUT##*.}
DB=${INPUT%.*}
HEADER=`hive -e "
set hive.cli.print.header=true;
use $DB;
INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
row format delimited
fields terminated by ','
SELECT * FROM $TABLE;"`
HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
echo ${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}
cat $TABLE/*
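The header clean-up in the script assumes Hive prints the column names as table.column. The same two steps (strip the prefix, then turn whitespace into commas) as a minimal Python sketch:

```python
def header_to_csv(header_line, table):
    """Strip 'table.' column prefixes and join the column names with commas.

    Mirrors ${HEADER//$TABLE./} followed by replacing whitespace with commas,
    except that runs of whitespace collapse into a single comma here.
    """
    return ",".join(header_line.replace(f"{table}.", "").split())
```

Collapsing runs of whitespace avoids empty columns in the CSV header if Hive ever separates names with more than one tab or space.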