Hive: How to execute a query from a file and dump the output in HDFS

I can execute a query from a sql file and store the output in a local file using
hive -f /home/Prashasti/test.sql > /home/Prashasti/output.csv
Also, I can store the output of a hive query in hdfs using:
insert overwrite directory 'user/output' select * from folders;
Is there any way I can run the query from a sql file and store the output in hdfs too?

Just modify the sql file and add the insert overwrite directory 'user/output' to the front of the query.
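For example, a minimal sketch of the modified file, reusing the 'user/output' path from above and assuming test.sql originally held only the select over folders:

-- /home/Prashasti/test.sql
insert overwrite directory 'user/output'
select * from folders;

Then run it without redirecting to a local file:

hive -f /home/Prashasti/test.sql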

Related

Exit status of hive queries in .hql file

I have multiple hive queries in hive_queries.hql. I want to keep a log tracking the exit status of the individual queries. Also, if possible, I want to change how the individual queries fetch the data; for example, I want to change the query
"select * from ABC"
to
"load data local inpath '<path>/<folder_name>' select * from ABC"
I want to keep a log tracking the exit status of individual queries
As far as I know, there is no standard way to track the exit status of individual queries run through an .hql file. What you may do:
Output your data into a Hive table.
Check for the _SUCCESS file at the warehouse/output location (if it is an external table or you are using INSERT OVERWRITE) to detect failure.
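For instance, a minimal sketch of that check from the wrapping shell script, assuming the query writes to a hypothetical /user/output directory:

# run the queries, then test for the _SUCCESS marker
hive -f hive_queries.hql
if hadoop fs -test -e /user/output/_SUCCESS; then
    echo "output written" >> hive_queries.log
else
    echo "output missing - treating as failure" >> hive_queries.log
fi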
I want to change how the individual queries fetch the data; for example, I
want to change the query "select * from ABC" to "load data local
inpath '<path>/<folder_name>' select * from ABC"
There is a trick to use hiveconf to achieve this.
Write your query like
${hiveconf:start_tag}
select * from ABC
By this way, basically you are creating a placeholder in the script which may be replaced at runtime. E.g.
if you execute the script as
hive -hiveconf start_tag='' -f my_script.hql
Then your query will be executed as
select * from ABC
if you execute the script as
hive -hiveconf start_tag="load data local inpath '<path>/<folder_name>'" -f my_script.hql
Then your query will be executed as
load data local inpath '<path>/<folder_name>'
select * from ABC

Hadoop backend with millions of records insertion

I am new to hadoop. Can someone please suggest how to upload millions of records to hadoop? Can I do this with hive, and where can I see my hadoop records?
Until now I have used hive to create the database on hadoop, and I am accessing it at localhost:50070. But I am unable to load data from a csv file into hadoop from the terminal, as it gives me this error:
FAILED: Error in semantic analysis: Line 2:0 Invalid path ''/user/local/hadoop/share/hadoop/hdfs'': No files matching path hdfs://localhost:54310/usr/local/hadoop/share/hadoop/hdfs
Can anyone suggest a way to resolve it?
I suppose the data is initially in the local file system.
So a simple workflow could be: load the data from the local file system into the Hadoop file system (HDFS), create a Hive table over it, and then load the data into the Hive table.
Step 1:
# put the files in HDFS
$~ hadoop fs -put /local_path/file_pattern* /path/to/your/HDFS_directory
# check the files
$~ hadoop fs -ls /path/to/your/HDFS_directory
Step 2:
CREATE EXTERNAL TABLE if not exists mytable (
Year int,
name string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as TEXTFILE;
-- display the table structure
describe mytable;
Step 3:
LOAD DATA INPATH '/path/to/your/HDFS_directory'
OVERWRITE INTO TABLE mytable;
-- simple hive statement to fetch the top 10 records
SELECT * FROM mytable limit 10;
You should use LOAD DATA LOCAL INPATH <local-file-path> to load files from a local directory into Hive tables.
If you don't specify LOCAL, the load command will look up the given file path in HDFS instead.
Please refer to the link below:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables
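To make the difference concrete, a short sketch (mytable and the HDFS path come from the answer above; '/local_path/data.csv' is just an illustrative local file):

-- the file is on the machine running the Hive client and is copied into the table
LOAD DATA LOCAL INPATH '/local_path/data.csv' OVERWRITE INTO TABLE mytable;
-- the file is already in HDFS and is moved into the table's location
LOAD DATA INPATH '/path/to/your/HDFS_directory' OVERWRITE INTO TABLE mytable;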

Checking for table existence and loading data into HBase and Hive tables

I have data in HDFS, and I want to load that data into HBase and Hive tables.
I have written a bash shell script that runs a Pig script to load the data from HDFS into HBase and a Hive script to load the data from HDFS into a Hive table, and both are working perfectly fine. My HDFS data files all have the same structure, and I'm loading all of them into a single HBase table and a single Hive table.
Now my question is: suppose I receive some more data files in the HDFS directory and run the shell script again; it will try to create the HBase and Hive tables again with the same names and report that the tables already exist. How can I write the Hive and HBase statements so that they first check for the table's existence, create the table and load the data from HDFS into the HBase and Hive tables the first time, and on later runs just insert the data into the existing tables? They should not overwrite the data that already exists in the tables.
How can this be done?
Below is my script file: myScript.sh
echo "create 'goodtable','gt'" | hbase shell
pig -f a.pig -param input=/user/user/d/
hive -f h.hql
Where a.pig :
G = LOAD '$input' USING PigStorage(',') as (c1:chararray, c2:chararray,c3:chararray,c4:chararray,c5:chararray);
STORE G INTO 'hbase://goodtable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('gt:name gt:state gt:phone_no gt:gender');
h.hql:
create external table hive_table(
id int,
name string,
state string,
phone_no int,
gender string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/user/d/' INTO TABLE hive_table;
I just wanted to add an example for HBase as Hive was already covered before:
if [[ $(echo "exists 'goodtable'" | hbase shell | grep 'not exist') ]];
then
  echo "create 'goodtable','gt'" | hbase shell;
fi
For Hive, you can add the IF NOT EXISTS clause to the CREATE TABLE statement. See the documentation.
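For example, a guarded version of the h.hql from the question, with only IF NOT EXISTS added and the load kept as a plain (non-OVERWRITE) append:

create external table if not exists hive_table(
  id int,
  name string,
  state string,
  phone_no int,
  gender string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/user/d/' INTO TABLE hive_table;

Since LOAD DATA ... INTO TABLE without OVERWRITE only adds the new files, rerunning the script leaves the existing data in place.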
I don't have much experience with HBase, but I believe you can use the exists 'table_name' command to check whether the table exists and then create the table if it doesn't exist. See here
#visakh is correct - you can see if a table exists in HBase by entering the HBase shell and typing: exists '<tablename>'
In order to do this without entering the HBase shell interactively, you can create a simple ruby script such as the following:
exists 'mytable'
exit
Let's say you save this to a file called tabletest.rb. You can then execute this script by calling hbase shell tabletest.rb. This will create the following output, which you can then parse from your shell script:
Table tableisthere does exist
0 row(s) in 0.9830 seconds
OR
Table tableisNOTthere does not exist
0 row(s) in 0.9830 seconds
Adding more details for an 'all in one' script:
Alternatively, you can create a more advanced ruby script that checks for table existence and then creates the table if needed - this is done by calling the HBaseAdmin Java API from within the ruby script.
conf = HBaseConfiguration.new
hbaseAdmin = HBaseAdmin.new(conf)
if !hbaseAdmin.tableExists('mytable')
  hbaseAdmin.createTable('mytable',...)
end

Exporting Data from Hive table to Local Machine File System

Using the following command:
insert overwrite local directory '/my/local/filesystem/directory/path'
select * from Emp;
overwrites all the data already in /my/local/filesystem/directory/path with the data of Emp.
What I want is to just copy the data of Emp to /my/local/filesystem/directory/path without overwriting it; how do I do that?
The following are my failed attempts:
hive> insert into local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:12 mismatched input 'local' expecting
TABLE near 'into' in insert clause
hive> insert local directory '/home/cloudera/Desktop/Sumit' select * from appdata;
FAILED: ParseException line 1:0 cannot recognize input near 'insert'
'local' 'directory' in insert clause
Can you please tell me how I can get this solved?
To append to a Hive table you need to use INSERT INTO:
INSERT INTO will append to the table or partition keeping the existing
data intact. (Note: INSERT INTO syntax is only available starting in
version 0.8)
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
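For instance, a table-to-table append would look like this (Emp_staging is just a hypothetical source table holding the new rows):

INSERT INTO TABLE Emp SELECT * FROM Emp_staging;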
But you can't use this to append to an existing local file, so another option is to use a bash command.
If you have a file called 'export.hql' and in that file your code is:
select * from Emp;
Then your bash command can be:
hive -f 'export.hql' >> localfile.txt
The -f flag executes the hive file, and >> appends the results to the text file.
EDIT:
The command:
hive -f 'export.hql' > localfile.txt
Will save the hive query results to a new file instead of appending.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-SQLOperations
When using 'LOCAL', 'OVERWRITE' is also needed in your hql.
For example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out' SELECT * FROM test

how to append hadoop job id to hive query result file?

I have a hive query that does an insert overwrite to the local file system. My query is as follows:
insert overwrite local directory '/home/test/dds'
select col1, col2 from test_table where query_ymd='2011-05-15' or query_ymd='2011-05-16' or query_ymd='2011-05-17';
It generates 2 files:
.000000_0.crc
000000_0
I would like the output to be:
attempt_201303210330_19069_r_000000_0
attempt_201303210330_19069_r_000000_0.crc
How can I configure the hive server or the query?
One HQL query can launch several MapReduce jobs, not just one, so there is no single job id to name the output file after; you cannot do this.
