I have multiple Hive queries in hive_queries.hql, and I want to keep a log tracking the exit status of each individual query. Also, if possible, I want to change individual queries to fetch the data differently; for example, I want to change the query
"select * from ABC"
to
"load data local inpath '<path>/<folder_name>' select * from ABC"
I want to keep a log tracking the exit status of individual queries
As far as I know, there is no standard way to track the exit status of individual queries run from a .hql file. What you can do:
Write your output to a Hive table.
Check for a _SUCCESS file at the warehouse or output location (if it is an external table or you use INSERT OVERWRITE) to detect failures.
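If one Hive session per query is acceptable, a rough workaround is to split the file on `;` and run each statement with `hive -e`, logging the exit status of each call. This is a minimal sketch under several assumptions: no statement contains a `;` inside a string literal or comment, and no statement relies on session state set by an earlier one (such as `use dbName;` or `set ...;`), because every call starts a fresh session. The function name `run_hql_logged` and the tab-separated log format are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each ';'-terminated statement of an .hql file in its
# own Hive session and append "<n>\t<exit status>\t<statement>" to a log file.
# Assumes no ';' appears inside string literals or comments.
run_hql_logged() {
  local hql_file=$1 log_file=$2
  local n=0 stmt status
  # Read the file one ';'-delimited chunk at a time; the '|| [[ -n $stmt ]]'
  # part keeps a final statement that has no trailing ';'.
  while IFS= read -r -d ';' stmt || [[ -n $stmt ]]; do
    # Put the statement on one line and trim surrounding whitespace.
    stmt=$(printf '%s' "$stmt" | tr '\n' ' ' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    [[ -z $stmt ]] && continue
    n=$((n + 1))
    # </dev/null stops hive from consuming the rest of the .hql file on stdin.
    if hive -S -e "$stmt;" < /dev/null; then
      status=0
    else
      status=$?
    fi
    printf '%d\t%d\t%s\n' "$n" "$status" "$stmt" >> "$log_file"
  done < "$hql_file"
}
```

Usage would be something like `run_hql_logged hive_queries.hql query_status.log`, after which any line whose second column is not `0` points at a failed query.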
I want to change the individual queries to fetch the data such as I want to change the query "select * from ABC" to "load data local inpath '<path>/<folder_name>' select * from ABC"
You can use a hiveconf variable as a placeholder to achieve this.
Write your query like this:
${hiveconf:start_tag}
select * from ABC
This way, you are creating a placeholder in the script that is replaced at run time. For example,
if you execute the script as
hive -hiveconf start_tag= -f my_script.hql
Then your query will be executed as
select * from ABC
if you execute the script as
hive -hiveconf start_tag="load data local inpath '<path>/<folder_name>'" -f my_script.hql
Then your query will be executed as
load data local inpath '<path>/<folder_name>'
select * from ABC
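To check what a given `start_tag` value turns the script into before handing it to Hive, one option is to imitate the substitution locally with bash string replacement. This is only a rough preview for the single `${hiveconf:start_tag}` placeholder, not Hive's own substitution logic; `preview_hql` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an .hql file with every literal occurrence of
# ${hiveconf:start_tag} replaced by the given value, imitating the text
# Hive would end up running for that one placeholder.
preview_hql() {
  local tag=$1 hql_file=$2
  local placeholder='${hiveconf:start_tag}'
  local content result
  content=$(<"$hql_file")
  # Quoting the pattern makes the match literal; quoting the replacement
  # stops bash 5.2+ from treating '&' in it as "the matched text".
  result=${content//"$placeholder"/"$tag"}
  printf '%s\n' "$result"
}
```

For instance, `preview_hql "load data local inpath '<path>/<folder_name>'" my_script.hql` prints the two-line version shown above, and `preview_hql '' my_script.hql` prints the plain select.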
Is there a way to get all schema and table name information in a single Hive command, similar to
SELECT * FROM information_schema.tables
from the PostgreSQL world?
Combining show databases and show tables in a loop is one answer, but I'm looking for a more compact way to get the same result in a single command.
It's been a long time since I worked with Hive queries, but as far as I remember you can use
hive> desc formatted tableName;
or
hive> describe formatted tableName;
It will give you all the relevant information about the table, such as the schema, partition info, and table type (managed table, etc.).
I am not sure whether this is exactly what you are looking for.
Another way to query Hive tables is to write Hive scripts that can be called from the Hadoop terminal (the shell) rather than from the Hive CLI itself.
std]$ cat sample.hql   # or: vi sample.hql
use dbName;
select * from tableName;
desc formatted tableName;
# this hql script can be called from outside the hive terminal
std]$ hive -f sample.hql
Or, without even writing a script file, you can query Hive directly:
std]$ hive -e "use dbName; select * from emp;" > text.txt   # use >> to append instead
At the database level, you can query:
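The loop the question mentions can be written compactly with the same `hive -e` approach. This sketch prints one `database.table` line per table; it assumes the Hive CLI is on the PATH, that your Hive version supports `SHOW TABLES IN <db>`, and that no header row is printed (the CLI default). `list_all_tables` is a made-up name, and since every database costs a separate Hive session it can be slow when there are many databases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<database>.<table>" for every table in Hive
# by looping over SHOW DATABASES and SHOW TABLES IN <db>.
list_all_tables() {
  local db tbl
  # -S keeps the CLI quiet; log output on stderr is discarded.
  hive -S -e 'show databases;' 2>/dev/null | while IFS= read -r db; do
    [[ -z $db ]] && continue
    # </dev/null keeps the inner hive from reading the outer database list.
    hive -S -e "show tables in $db;" 2>/dev/null < /dev/null | while IFS= read -r tbl; do
      if [[ -n $tbl ]]; then
        printf '%s.%s\n' "$db" "$tbl"
      fi
    done
  done
}
```

Redirecting the output, e.g. `list_all_tables > all_tables.txt`, gives roughly what `information_schema.tables` would list, minus the extra metadata columns.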
hive> use dbName;
hive> set hive.cli.print.current.db=true;
hive(dbName)> describe database dbName;
This returns the database's metadata from the metastore (MySQL in this setup).
I want to export Impala query results to a CSV file from a UNIX shell script, but only if the query returns rows. If it returns no rows, the script should instead send a mail saying no records were found.
My current script can export the results to CSV, but when no rows are fetched, a blank CSV file is still generated.
impala_connection="impala-shell -k --ssl -i 1.1.1.1"
mail_id="abc#def.com"
status="Fail"
query="select process_id, batch_id, job_name, status_cd from table_name where status_cd=\"$status\";"
$impala_connection -B -q "$query" -o /local/job_failures.csv '--output_delimiter=,'
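One way to get the described behavior is to let impala-shell write the file as before and only afterwards decide what to do with it: keep it if it contains anything other than whitespace, otherwise delete it and send the mail. This is a sketch, assuming that a query with no rows leaves an empty or whitespace-only file (as the question describes) and that a `mail` command (mailx) is installed; `export_or_mail`, the subject line and the message text are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the question's impala-shell call: keep the CSV
# only if the query returned at least one row, otherwise remove it and mail.
export_or_mail() {
  local query=$1 out_file=$2 mail_to=$3
  # Connection flags copied from the question's impala_connection variable.
  if ! impala-shell -k --ssl -i 1.1.1.1 -B -q "$query" -o "$out_file" '--output_delimiter=,'; then
    echo "impala-shell failed" >&2
    return 1
  fi
  # Any non-whitespace character means at least one row was written.
  if grep -q '[^[:space:]]' "$out_file"; then
    return 0
  fi
  rm -f "$out_file"
  echo "No records found for: $query" | mail -s "No records found" "$mail_to"
}
```

With the question's variables this would be called as `export_or_mail "$query" /local/job_failures.csv "$mail_id"`.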
I have an external hive table on top of a parquet file.
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
I want to get the row count of the table from a shell script.
I tried the following command:
myVar=$(hive -S -e " select count(*) from parquet_test;")
echo $myVar
I added -S to run Hive in silent mode, but I still get the whole MapReduce log along with the count in the myVar variable. How can I get only the count?
I don't have access to any of the configuration files to change the logging level. Is there any other way?
Finally found a workaround.
First write the query result into a directory on the local file system (INSERT OVERWRITE LOCAL DIRECTORY), then read the answer from the file.
The file contains only the result of the query.
(hive -S -e " INSERT OVERWRITE LOCAL DIRECTORY '/home/test/result/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select count(*) from parquet_test;")
Then read the file into a variable:
var=$(cat /home/test/result/*)
echo $var
myVar=$(eval "hive -S -e 'select count(*) from parquet_test;' ")
echo $myVar
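The `eval` is not needed for this. Assuming the Hive CLI writes its progress and log messages to stderr and the query result to stdout (the usual behavior, but worth checking on your version), capturing stdout with `$(...)` and discarding stderr is enough. A sketch under that assumption; `hive_count` is a hypothetical name, and the numeric check guards against a stray line still reaching stdout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the row count of a Hive table, or fail.
# Assumes Hive logs go to stderr and the single result row goes to stdout.
hive_count() {
  local table=$1 result
  result=$(hive -S -e "select count(*) from $table;" 2>/dev/null | tail -n 1)
  # Drop whitespace such as a trailing tab or carriage return.
  result=${result//[[:space:]]/}
  if [[ $result =~ ^[0-9]+$ ]]; then
    printf '%s\n' "$result"
  else
    echo "could not read a count for $table" >&2
    return 1
  fi
}
```

Usage: `myVar=$(hive_count parquet_test) && echo "$myVar"`.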
I can execute a query from a sql file and store the output in a local file using
hive -f /home/Prashasti/test.sql > /home/Prashasti/output.csv
Also, I can store the output of a Hive query in HDFS using:
insert overwrite directory 'user/output' select * from folders;
Is there any way I can run the query from a sql file and store the output in hdfs too?
Just modify the sql file and add the insert overwrite directory 'user/output' to the front of the query.
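If you would rather leave the .sql file untouched, a rough alternative is to build a temporary script that starts with the `insert overwrite directory` clause followed by the file's contents, and run that with `hive -f`. This sketch assumes the .sql file holds exactly one SELECT statement (any `use` or `set` lines would land after the clause and break it); `run_sql_to_hdfs` is a hypothetical name. Keep in mind that `insert overwrite` replaces whatever is already in the target directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the single SELECT in a .sql file with its result
# written to an HDFS directory, without modifying the original file.
run_sql_to_hdfs() {
  local sql_file=$1 hdfs_dir=$2 tmp_hql status
  tmp_hql=$(mktemp) || return 1
  {
    printf "insert overwrite directory '%s'\n" "$hdfs_dir"
    cat "$sql_file"
  } > "$tmp_hql"
  if hive -f "$tmp_hql"; then
    status=0
  else
    status=$?
  fi
  rm -f "$tmp_hql"
  return "$status"
}
```

For example: `run_sql_to_hdfs /home/Prashasti/test.sql /user/output`.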
I have data in HDFS, and I want to load that data into an HBase table and a Hive table.
I have written a bash shell script that runs a Pig script to load the data from HDFS into HBase and a Hive script to load the data from HDFS into a Hive table, and both work fine. All my HDFS data files have the same structure, and I load them all into a single HBase table and a single Hive table.
Now, if more data files arrive in the HDFS directory and I run the shell script again, it tries to create the HBase and Hive tables again with the same names and reports that the table already exists. How can I write the Hive and HBase statements so that they first check whether the table exists? If it does not exist, the script should create it the first time and load the data from HDFS into the HBase and Hive tables. If it already exists, the script should just insert the new data into the existing HBase and Hive tables, without overwriting the data already in them.
How can this be done?
Below is my script file: myScript.sh
echo "create 'goodtable','gt'" | hbase shell
pig -f a.pig -param input=/user/user/d/
hive -f h.hql
where a.pig is:
G = LOAD '$input' USING PigStorage(',') as (c1:chararray, c2:chararray,c3:chararray,c4:chararray,c5:chararray);
STORE G INTO 'hbase://goodtable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('gt:name gt:state gt:phone_no gt:gender');
h.hql:
create external table hive_table(
id int,
name string,
state string,
phone_no int,
gender string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/user/d/' INTO TABLE hive_table;
I just wanted to add an example for HBase as Hive was already covered before:
if echo "exists 'goodtable'" | hbase shell | grep -q 'not exist'; then
  echo "create 'goodtable','gt'" | hbase shell
fi
For Hive, you can add the IF NOT EXISTS clause to the CREATE TABLE statement (see the Hive DDL documentation for CREATE TABLE).
I don't have much experience with HBase, but I believe you can use the exists command to check whether the table exists and then create the table if it doesn't.
@visakh is correct: you can check whether a table exists in HBase by entering the HBase shell and typing: exists '<tablename>'
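Applied to the question's h.hql, that means `create external table if not exists` plus a `LOAD DATA ... INTO TABLE` without `OVERWRITE`, which appends the new files instead of replacing existing data. Because `LOAD DATA INPATH` (without `LOCAL`) moves the source files into the table's location, a rerun only picks up files that arrived since the previous run. The sketch below writes that file from the shell script with a heredoc; the table definition is the question's, while generating the file from the script (and the name `write_h_hql`) is just one illustrative option.

```shell
#!/usr/bin/env bash
# Write an idempotent version of the question's h.hql:
# IF NOT EXISTS skips creation when hive_table is already there, and
# INTO TABLE (no OVERWRITE) appends the new files to the existing data.
write_h_hql() {
  local target=${1:-h.hql}
  cat > "$target" <<'EOF'
create external table if not exists hive_table(
  id int,
  name string,
  state string,
  phone_no int,
  gender string)
row format delimited fields terminated by ','
stored as textfile;
LOAD DATA INPATH '/user/user/d/' INTO TABLE hive_table;
EOF
}
```

In myScript.sh the `hive -f h.hql` line would then become `write_h_hql h.hql && hive -f h.hql`.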
In order to do this without entering the HBase shell interactively, you can create a simple ruby script such as the following:
exists 'mytable'
exit
Let's say you save this to a file called tabletest.rb. You can then execute this script by calling hbase shell tabletest.rb. This will create the following output, which you can then parse from your shell script:
Table tableisthere does exist
0 row(s) in 0.9830 seconds
OR
Table tableisNOTthere does not exist
0 row(s) in 0.9830 seconds
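The parsing step might look like the sketch below: write the two-line script into a temporary directory as tabletest.rb, run it with `hbase shell`, and look for `does exist` in the output (the `does not exist` message does not contain that phrase). It assumes the output wording shown above, which may vary between HBase versions; `hbase_table_exists` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the HBase table exists, fail otherwise,
# by running a generated tabletest.rb through 'hbase shell'.
hbase_table_exists() {
  local table=$1 dir output
  dir=$(mktemp -d) || return 1
  printf "exists '%s'\nexit\n" "$table" > "$dir/tabletest.rb"
  output=$(hbase shell "$dir/tabletest.rb" 2>/dev/null) || true
  rm -rf "$dir"
  # Matches "Table <name> does exist" but not "Table <name> does not exist".
  grep -qF "Table $table does exist" <<< "$output"
}
```

The create step from the question could then be guarded with `if ! hbase_table_exists goodtable; then echo "create 'goodtable','gt'" | hbase shell; fi`.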
For an all-in-one approach:
Alternatively, you can write a more advanced Ruby script that checks whether the table exists and creates it if needed, by calling the HBaseAdmin Java API from within the script.
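include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin

# HBaseConfiguration.create is the factory method; the plain constructor is deprecated
conf = HBaseConfiguration.create
hbaseAdmin = HBaseAdmin.new(conf)
# Create the table only if it is not already there
if !hbaseAdmin.tableExists('mytable')
  hbaseAdmin.createTable('mytable',...)   # table definition arguments elided
end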