Passing Local Parameters to Hadoop script

To my understanding, the following will result in passing a global Hive variable:
hive -hiveconf DATE='01/01/2000' -f test_script.hql
The variable can then be referenced in the script with:
SELECT * FROM DATETABLE WHERE DATE = ${hiveconf:DATE}
And I know that local variables can be defined in the script and called by doing:
set DATE='01/01/2000'
SELECT * FROM DATETABLE WHERE DATE = ${DATE}
But suppose I want to submit many jobs, each with its own local parameters. How can I pass those parameters from the command line?
The point is to keep one script from picking up the hiveconf:DATE value set by another script submitted in quick succession.
EDIT:
I guess this could work: create a shell script, pass the variables to it, and have it pass them on to the individual queries:
#!/bin/bash
# The date is taken from the first command-line argument.
DATE="$1"
FIRST_QUERY="SELECT * FROM DATETABLE WHERE DATE = '$DATE'"
hive -e "$FIRST_QUERY"
But this seems inefficient; I would still like to know whether the option above is possible.

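I found the --define option (short form -d), which sets a variable for a single hive invocation. The query quotes ${DATE} because the shell strips the quotes from the value given on the command line:
hive -e 'SELECT * FROM DATETABLE WHERE DATE = "${DATE}"' --define DATE='01/01/2000'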
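Each hive invocation is a separate process, so a variable given on its command line should not leak into another job started at the same time. A minimal sketch of submitting several jobs this way, assuming test_script.hql reads the value as '${hivevar:DATE}'; the function name run_for_date is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: run test_script.hql with its own DATE variable.
# --hivevar sets the variable only for this one hive process.
run_for_date() {
    local date_value="$1"
    hive --hivevar DATE="$date_value" -f test_script.hql
}

# Usage (commented out so that sourcing this file does not start Hive):
# for d in 01/01/2000 01/02/2000; do run_for_date "$d" & done; wait
```

Because the value travels as a command-line argument, two jobs started back to back each see only their own DATE.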

Related

Parse Strings in HIVE using Shell

I have a shell script that I use to pass a string variable into Hive in order to filter my observations. I provide both the script and the Hive code below.
In the script, a variable holds a string value that I try to pass into Hive, as in the example below:
Shell Script:
name1='"Maria Nash"'   # single quotes outside, double quotes inside
hive --hiveconf name=${name1} -f t2.hql
Hive code (t2.hql)
create table db.mytable as
SELECT *
FROM db.employees
WHERE emp_name='${hivevar:name}';
Conclusion
To be accurate, the final table is created, but it does not contain any rows. The employees table does contain rows with emp_name "Maria Nash", though.
I think that I either do not pass the string correctly from the shell or do not use the correct syntax for the passed variable in the Hive query.
I would appreciate your help!

You are passing the variable in the hiveconf namespace, but the SQL script reads it from hivevar. Use hiveconf in the script as well:
WHERE emp_name=${hiveconf:name} --hiveconf, not hivevar
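A value that contains a space also has to reach hive as a single argument. A minimal sketch of one way to do that, assuming t2.hql is changed to compare against '${hiveconf:name}' so that the quotes live in the HQL rather than in the value; the function name run_name_filter is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: pass an employee name that may contain spaces.
# Assumes t2.hql contains: WHERE emp_name='${hiveconf:name}'
run_name_filter() {
    local name="$1"
    # The double quotes keep "Maria Nash" together as one argument.
    hive --hiveconf "name=$name" -f t2.hql
}

# Usage (commented out so that sourcing this file does not start Hive):
# run_name_filter 'Maria Nash'
```

Without the double quotes around name=$name, the shell would split the value at the space and hive would receive "Nash" as a separate argument.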

Use of the Hive CLI is deprecated; you can use beeline from a shell script instead.
It should look something like this:
beeline << EOF
!connect jdbc:hive2://host:port/db username password
select *
from db.employees
where emp_name = "${1}"
EOF
assuming that $1 is the input to the script.
This is an example of how to do it rather than a production implementation. Generally:
1) Kerberos would be enabled, so the username and password wouldn't be there and a valid token would be available.
2) You should validate the input parameters.
Given that, you can do it in a single line:
beeline -u jdbc:hive2://hostname:10000 -f {full Path to Script} --hivevar {variable}={value}
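The validation step and the single-line call can be combined in a small wrapper. This is only a sketch: the function name, the allowed character set, the script path /path/to/query.hql, and the variable name emp_name are all hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical wrapper around the one-line beeline call.
# The JDBC URL and the script path are placeholders, not real values.
run_employee_query() {
    local emp_name="$1"
    # Validate the input: reject empty values and anything other than
    # letters, digits, spaces, dots and hyphens.
    case "$emp_name" in
        ''|*[!A-Za-z0-9\ .-]*) echo "invalid employee name" >&2; return 1 ;;
    esac
    beeline -u jdbc:hive2://hostname:10000 -f /path/to/query.hql --hivevar "emp_name=$emp_name"
}

# Usage (commented out so that sourcing this file does not start beeline):
# run_employee_query "$1"
```

Inside the script, the value would then be read as '${hivevar:emp_name}'.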

passing argument from shell script to hive script

My problem can be framed in two ways:
My requirement is to pass an argument from a shell script to a Hive script.
OR
Within one shell script, I need to include a variable's value in a Hive statement.
I'll explain both with an example:
1) Passing an argument from a shell script to HiveQL:
My test Hive QL:
select count(*) from demodb.demo_table limit ${hiveconf:num}
My test shell script:
cnt=1
sh -c 'hive -hiveconf num=$cnt -f countTable.hql'
So basically I want to include the value of cnt in the HQL, which is not happening in this case. I get this error:
FAILED: ParseException line 2:0 mismatched input '<EOF>' expecting Number near 'limit' in limit clause
I'm sure the error means that the variable's value isn't being passed on.
2) Passing the argument directly within the shell script:
cnt=1
hive -e 'select count(*) from demodb.demo_table limit $cnt'
In both cases, I couldn't pass the argument value.
PS: I know that including a limit in a count query seems absurd, but I have rephrased the problem I actually have. The requirement of passing the argument remains.
Any ideas, anyone?
Thanks in advance.

Set the variable this way:
#!/bin/bash
cnt=3
echo "Executing the hive query - starts"
hive -hiveconf num=$cnt -e ' set num; select * from demodb.demo_table limit ${hiveconf:num}'
echo "Executing the hive query - ends"

This works if put in a file named hivetest.sh and invoked with sh hivetest.sh:
cnt=2
hive -e "select * from demodb.demo_table limit $cnt"
Your version uses single quotes, which stop the shell from expanding $cnt; use double quotes instead.
Using double quotes for option 1 also works.
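The difference between the two quoting styles can be checked in the shell alone, without starting Hive. A minimal sketch that only prints the query text the shell would hand over:

```shell
#!/bin/bash
cnt=2

# Single quotes: the shell passes the text through untouched,
# so Hive would receive the literal characters $cnt.
single='select * from demodb.demo_table limit $cnt'

# Double quotes: the shell substitutes the value before Hive is started.
double="select * from demodb.demo_table limit $cnt"

printf '%s\n' "$single"   # select * from demodb.demo_table limit $cnt
printf '%s\n' "$double"   # select * from demodb.demo_table limit 2
```

Printing the query like this before running it is a quick way to see what Hive will actually receive.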

hadoop@osboxes:~$ export val=2;
hadoop@osboxes:~$ hive -e "select * from bms.bms1 where max_seq=$val";
or
vi test.sh
#########
export val=2
hive -e "select * from bms.bms1 where max_seq=$val";
#####################

Try this:
cnt=1
hive -hiveconf number=$cnt -e 'select * from demodb.demo_table limit ${hiveconf:number}'

Hive - How to store a query result in a variable in a Bash script

I need to store the result of a Hive query in a variable whose value will be used later. So, something like:
$var = select col1 from table;
$var_to_used_later = $var;
All of this is part of a bash shell script. How should I form the query to get the desired result?

Hive should provide command-line support for this. I am not familiar with Hive, but I found this page, which you can check: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
Personally, I have used mysql to achieve a similar goal before. The command is:
mysql -u root -p`[script to generate the key]` -N -B -e "use XXXDB; select aaa, bbb, COUNT(*) from xxxtable where some_attribute='$CertainValue';"

I used the method shown here and got it! Instead of calling a file as shown, I run the query directly and use the value stored in the variable.
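Running the query directly and keeping its output in a variable can be done with command substitution. A minimal sketch, assuming the query returns a single value; the function name query_value and the table name some_table are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: run a query and print its result on stdout.
# -S (silent mode) reduces the informational messages Hive prints.
query_value() {
    hive -S -e "$1"
}

# Usage (commented out so that sourcing this file does not start Hive):
# var=$(query_value "select col1 from some_table limit 1")
# var_to_use_later="$var"
```

Command substitution captures only standard output, so messages written to standard error do not end up in the variable.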

How to create a "one-liner" for oracle that includes "set" commands as well as sql statements

I want to execute a dynamic sql containing some set commands. Is it possible to do so without embedding newlines?
set heading off ; set lines 1000 ; select * from my_table;
Note that the above does not work because of the semicolons between the set commands:
SP2-0158: unknown SET option ";"
Update: the whole point of this question is to do it on one line.

The best I have found for my own purposes is to put my standard SET commands in a file called sql_settings.txt, then keep its path in one environment variable and the connect string in another:
sqlsets=/directory/where/sql_settings/stored/sql_settings.txt
db_conn=<ConnectStr>
Then execute a one-liner with a shell here-string:
sqlplus -s $db_conn @$sqlsets <<< "select * from my_table;" | less
(Piping to less keeps the output from cluttering your shell session.)
You could also get fancy and create a shell function, so that you only have to type the SQL query:
function mydb { sqlplus -s $db_conn @$sqlsets <<< "$@;" ; }
Then call it like this (the function appends the semicolon):
mydb 'select * from my_table'

SET is a SQL*Plus directive rather than part of SQL, and several options can be combined in one SET command:
set heading off lines 1000
select * from my_table;

After extensive research, I have concluded that this is not possible with Oracle.

Hive Parameter in hive query

I have two files.
One is named testing.hql:
select dt, '${hiveconf:var}' from temp_table;
The other is named testing.sh:
temp= date --date='yesterday' +%y%m%d
hive -f testing.hql -hiveconf var=$temp
So basically I'm trying to pass a date value to the query so that I can filter the data it processes based on the current date.
I run it with this command:
./testing.sh
This doesn't work. Can someone check where I am making a mistake?
What I eventually want to do is
select jobid from temp_table where dt >= '${hiveconf:var}';
so that the jobids I get are only the ones from yesterday onwards, since the shell script sets the parameter to yesterday's date.
Thanks!
Currently the query outputs an empty string after the dt value.

Figured it out. There were two simple bugs.
1) In a shell script, a command whose output you want to capture has to be wrapped in backticks. So I did
temp=`date --date='yesterday' +%y%m%d`
hive -f testing.hql -hiveconf var=$temp
and it works like a charm.
2) In the query, the parameter must be in double quotes.
select jobid from temp_table where dt >= "${hiveconf:var}";
I hope this helps others who run into the same issue.

There is a space after temp=; removing it should solve the issue:
temp=<blank>date ...

The temp variable should be declared as follows:
temp=$(date --date='yesterday' +%y%m%d)
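The reason the original line fails is that, with a space after the equals sign, the shell runs date with temp set to an empty string for that one command and never assigns its output. A minimal sketch of the corrected assignment, assuming GNU date (the --date option is not available in every date implementation):

```shell
#!/bin/bash
# Command substitution captures the output of date in temp.
# There must be no space on either side of the equals sign.
temp=$(date --date='yesterday' +%y%m%d)   # GNU date; format is yymmdd

printf 'var=%s\n' "$temp"

# The value can then be passed on, for example:
# hive -f testing.hql -hiveconf "var=$temp"
```

Printing the value first makes it easy to confirm that the variable is set before Hive is involved.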

You can use BeeTamer for that. It allows you to store a result (or part of it) in a variable and use that variable later in your code.
BeeTamer is a macro extension for Hive and Impala that extends the functionality of the Apache Hive and Cloudera Impala engines.
select avg(a) from abc;
%capture MY_AVERAGE;
select * from abc2 where avg_var=#MY_AVERAGE#;
Here you save the average value from your query into the macro variable MY_AVERAGE and then reuse it in the second query.
