I have a Hive script that executes some DML statements, drops some tables, and deletes some files via shell commands. I am firing the script using hive -f myscript.hql.
From within the script I need to remove files from a local directory. I tried !rm /home/myuser/temp_table_id_*; and it throws this error:
rm: cannot remove ‘/home/myuser/temp_table_id_*’: No such file or directory
Command failed with exit code = 1
The * is not being expanded.
Here is a sample script:
--My HQL File--
INSERT OVERWRITE ....
...
..;
DROP TABLE TEMP_TABLE;
!hadoop fs -rm -r /user/myuser/ext_tables/temp_table;
!rm /home/myuser/temp_table_id_*;
CREATE TABLE NEW_TABLE(
....
...
;
I am calling the script with the command: hive -f myscript.hql
The script runs fine until it reaches the line !rm /home/myuser/temp_table_id_*;, where it complains about the *.
When I provide separate file names instead of the *, it works, but I wish to use *.
Try
dfs -rm /home/myuser/temp_table_id_*;
in the HQL. Wildcards work well with Hive dfs commands, because the Hadoop FsShell expands the glob itself; the ! shell escape evidently does not run the command through a shell, so rm receives the literal * (which is exactly what your error message shows).
From the Hive docs:
dfs <dfs command> - Executes a dfs command from the Hive shell.
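One caveat: dfs commands resolve paths against the default filesystem (usually HDFS), and /home/myuser/... looks like a local path. A minimal sketch of the script's cleanup steps, assuming the files sit on the local disk of the machine running Hive, so the file:// scheme is needed to address them:
-- inside myscript.hql
-- glob expansion is handled by the Hadoop FsShell, so * works here
dfs -rm -r /user/myuser/ext_tables/temp_table;
-- address the local filesystem explicitly
dfs -rm file:///home/myuser/temp_table_id_*;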
Related
I am writing a bash script to export a dynamic SQL query into an .hql file in an HDFS directory. I am going to run this bash script through Oozie.
sql_v="select 'create table table_name from user_tab_columns where ...;'"
beeline -u "$sql_v" > local_path
The sql_v variable stores the dynamic CREATE TABLE command that I want to write to an .hql file in an HDFS directory. If I run the above two steps it works, because I am storing the data in a local path; but instead of passing local_path I want to store the SQL in an HDFS directory. Is there a way I can pass an HDFS path instead of local_path, like below? This doesn't work. Can I use any other command instead of beeline to achieve this?
beeline -u "$sql_v" | hdfs dfs -appendToFile -
If the goal is to write the output of beeline to an HDFS file, then either option below should work, since both pipe the standard output of beeline into a hadoop command that reads standard input when the source is given as -.
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -put - /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -appendToFile - /user/userid/file.hql
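A small design note: hadoop fs -put fails if the destination already exists (unless you add -f to overwrite), while hadoop fs -appendToFile appends to the destination and creates it if it is missing, so pick the one that matches the rerun behaviour you want.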
Note:
It's a little unclear from your question and comments why you can't use the suggestion given by @cricket_007:
echo "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
and why you want to go through beeline in particular. If you do, writing to a local file first also works:
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -appendToFile file.hql /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
If an Oozie shell action is used to run the bash script containing sql_v and the beeline command, beeline needs to be present on the node where the shell action runs; if not, you will face a "beeline: command not found" error.
Refer: beeline-command-not-found-error
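For completeness, a minimal end-to-end sketch, assuming a placeholder JDBC URL; --silent=true and --outputformat=tsv2 keep beeline's log chatter and table borders out of the generated file:
#!/bin/bash
# build the dynamic DDL (placeholder query from the question)
sql_v="select 'create table table_name from user_tab_columns where ...;'"
# stream the result straight into HDFS; -f lets -put overwrite an existing file
beeline -u "jdbc:hive2://host:10000/default" --silent=true --outputformat=tsv2 -e "$sql_v" \
  | hadoop fs -put -f - /user/userid/file.hql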
I am trying to execute an .hql file through beeline, with the following contents:
INSERT OVERWRITE DIRECTORY "${hadoop_temp_output_dir}${file_pattern}${business_date}" select data from database.${table}
I am executing the script using the following command:
beeline -u "jdbc:hive2://svr.us.XXXX.net:10000/;principal=hive/svr.us.XXXX.net#NAEAST.COM" --hivevar hadoop_temp_output_dir=/tenants/demo/hive/database/ --hivevar file_pattern=sales --hivevar business_date=20180709 -f beeline_test.hql
I see the variables are not getting substituted when the statements are executed in the Hive environment. What mistake have I made here?
Also, how do I set up an init.hql (for all configurations) and execute it along with this .hql file?
EDIT: I got the answer: I just used double quotes around the variables and corrected a few typos.
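For the init.hql part, beeline (like the Hive CLI) accepts -i to run an initialization file before the main script. A minimal sketch, assuming a hypothetical init.hql with one session setting, and a placeholder myTable for the ${table} variable, which also has to be passed on the command line:
-- init.hql: session-wide settings, executed before the main script
SET hive.exec.dynamic.partition=true;

-- beeline_test.hql: keep the quotes around the substituted path
INSERT OVERWRITE DIRECTORY "${hadoop_temp_output_dir}${file_pattern}${business_date}" select data from database.${table};

beeline -u "jdbc:hive2://svr.us.XXXX.net:10000/;principal=hive/svr.us.XXXX.net#NAEAST.COM" \
  --hivevar hadoop_temp_output_dir=/tenants/demo/hive/database/ \
  --hivevar file_pattern=sales \
  --hivevar business_date=20180709 \
  --hivevar table=myTable \
  -i init.hql -f beeline_test.hql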
I am new to Hive and want to know how to execute Hive commands directly from an .hql file.
As mentioned by @rajshukla4696, either hive -f filename.hql or beeline -f filename.hql will work.
You can also execute queries from the command line via "-e":
hive -e "select * from my_table"
There are plenty of useful command line arguments for Hive that can be found here: Hive Command line Options
hive -f filepath;
Example: hive -f /home/Nitin/Desktop/Hive/script1.hql;
Use hive -f filename.hql;
Remember to terminate your command with ;
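As a minimal sketch, assuming a hypothetical script1.hql, both invocations below run the same statements:
-- script1.hql
CREATE TABLE IF NOT EXISTS demo (id INT, name STRING);
SELECT count(*) FROM demo;

hive -f script1.hql
beeline -u jdbc:hive2://localhost:10000 -f script1.hql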
I have a simple working beeline query below; when I put it in a script it runs, but I want to use a hivevar for the output path. How do I accomplish this? When I put ='path' in my script's .properties file, it does not seem to work. I think I am missing something with these single quotes; I just can't seem to get it to work.
maxValQuery.hql
WORKING: INSERT OVERWRITE DIRECTORY '/user/tmp/maxVal' select max(${hivevar:MAX_VAL_COL}) from ${hivevar:FACT_TABLE};
WANTED: INSERT OVERWRITE DIRECTORY ${hivevar:PATH_ON_HDFS} select max(${hivevar:MAX_VAL_COL}) from ${hivevar:FACT_TABLE};
script.sh
#! /bin/bash
# I want to add --hivevar PATH_ON_HDFS=${maxValPathOnHDFS}
beeline \
-u $hiveServer2 \
--hivevar DATABASE_NAME_ON_HIVE=${dbNameOnHive} \
--hivevar FACT_TABLE=${mainFactTableOnHive} \
--hivevar MAX_VAL_COL=${factTableIncrementalColumn} \
-f ${maxValQueryFile}
script.properties
dbNameOnHive=poc
mainFactTableOnHive=factTable
factTableIncrementalColumn=aTimeColumn
maxValQueryFile=maxValQuery.hql
#maxValPathOnHDFS='/user/tmp/maxVal'
#I believe the problem is the single quotes above; yes, I uncomment this line when I execute :P
Solved: I removed the single quotes from the properties file and added them around the hivevar in the query:
maxValPathOnHDFS=/user/tmp/maxVal in the properties file, and '${hivevar:PATH_ON_HDFS}' in the query.
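Putting it together, a minimal sketch of the corrected files; the single quotes live in the HQL, not in the properties file:
-- maxValQuery.hql
INSERT OVERWRITE DIRECTORY '${hivevar:PATH_ON_HDFS}' select max(${hivevar:MAX_VAL_COL}) from ${hivevar:FACT_TABLE};

# script.properties
maxValPathOnHDFS=/user/tmp/maxVal

# script.sh gains one more --hivevar argument
beeline \
  -u $hiveServer2 \
  --hivevar DATABASE_NAME_ON_HIVE=${dbNameOnHive} \
  --hivevar FACT_TABLE=${mainFactTableOnHive} \
  --hivevar MAX_VAL_COL=${factTableIncrementalColumn} \
  --hivevar PATH_ON_HDFS=${maxValPathOnHDFS} \
  -f ${maxValQueryFile}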
I am trying to write a shell script that opens the grunt shell, runs a Pig file in it, and then copies the output files to the local machine. Is this possible? Any links would be helpful!
You can run a Pig script from the command line:
#> pig -f script.txt
The tail end of the script can execute fs commands to 'get' the data back to the local filesystem:
grunt> fs -get /path/in/hdfs /local/path
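A minimal sketch of the whole flow, assuming a hypothetical script.pig and paths; the fs -get at the end of the Pig script copies the result to the local filesystem, so no interactive grunt session is needed:
-- script.pig
data = LOAD '/path/in/hdfs/input' USING PigStorage(',');
STORE data INTO '/path/in/hdfs/output';
-- last step: pull the output back to the local filesystem
fs -get /path/in/hdfs/output /local/path/output

#!/bin/bash
# run_pig.sh: execute the Pig script non-interactively
pig -f script.pig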