Set variables in hive scripts using command line - hadoop

I have checked the related thread - How to set variables in HIVE scripts
Inside the Hive shell the variable works fine:
hive> set hivevar:cal_month_end='2012-01-01';
hive> select ${cal_month_end};
But when I run this through command line:
$ hive -e "set hivevar:cal_month_end='2012-01-01';select '${cal_month_end}';"
It keeps giving me the error below:
Error: java.lang.IllegalArgumentException: Can not create a Path from
an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:131)
at org.apache.hadoop.fs.Path.<init>(Path.java:139)
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:110)
at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:463)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

You have to escape a few characters. This works for me:
hive -e "set hivevar:cal_month_end=\'2012-01-01\';select '\${cal_month_end}';"

You have to get the " and ' quoting right.
Use this:
hive -e 'set hivevar:cal_month_end="2012-01-01";select ${cal_month_end};'
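The quoting difference between the two commands can be reproduced with plain echo, independent of Hive. A minimal sketch (using bash -c to show exactly what each form passes along):

```shell
# Inside double quotes the *outer* shell expands ${cal_month_end} before
# hive ever runs; since no shell variable of that name exists, hive
# receives an empty string. Single quotes (or \$) leave the text intact
# for Hive's own variable substitution.
unset cal_month_end
double=$(bash -c 'echo "select ${cal_month_end};"')    # shell substitutes -> empty
single=$(bash -c "echo 'select \${cal_month_end};'")   # passed through literally
echo "$double"   # -> select ;
echo "$single"   # -> select ${cal_month_end};
```

This is why the `-e "..."` form fails while the `-e '...'` form works: only the latter lets Hive see the `${...}` placeholder at all.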

I finally figured out what went wrong. The problem is that on the command line I can't just select a literal; the select needs to come from some table. The below works fine:
$ hive -e "set hivevar:cal_month_end='2012-01-01';select * from foo where start_time > '${cal_month_end}' limit 10"

You can also set variables as an argument of the hive command:
hive --hivevar cal_month_end='2012-01-01' -e "select '${cal_month_end}';"

How to pass multiple parameter in hive script

employee:
(table data shown as an image in the original post)
I want to fetch records of year=2016 by running hive script sample.hql.
use octdb;
select * from '${hiveconf:table}' where year = '${hiveconf:year}';
[cloudera@quickstart ~]$ hive -hiveconf table='employee', year=2016 -f sample.hql
But I am getting the error NoViableAltException(307@[])...
You need to use the --hiveconf option twice:
hive --hiveconf table=employee --hiveconf year=2016 -f sample.hql
You should use --hivevar instead with newer Hive versions. Earlier, developers were able to set configuration using --hiveconf and it was also used for variables. However, later --hivevar was implemented to have separate namespace for variables as mentioned in HIVE-2020.
Use the following with beeline:
beeline --hivevar table=employee --hivevar year=2016 -f sample.hql
With this, in the Hive script file you can access these variables directly or via the hivevar namespace, like below.
select * from ${table};
select * from ${hivevar:table};
Please note that you may need to specify the URL string using the -u <db_URL> option.
After some research I found the correct answer: ${hiveconf:table} should be referenced in the script without quotes.
sample.hql:
use ${hiveconf:database};
select * from ${hiveconf:table} where year = ${hiveconf:year};
Running sample.hql:
[cloudera@quickstart shell]$ hive -hiveconf database=octdb -hiveconf table=employee -hiveconf year=2016 -f sample.hql
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.484 seconds
OK
1 A 2016
2 B 2016
4 D 2016
Time taken: 4.423 seconds, Fetched: 3 row(s)
Passing variables can also be achieved through "hivevar" along with "hiveconf".
Here is the difference:
The hiveconf namespace was added, and --hiveconf should be used to set Hive configuration values.
The hivevar namespace was added, and --hivevar should be used to define user variables.
Using hiveconf will also work, but isn't recommended for variable substitution as hivevar is explicitly created for that purpose.
set hivevar:YEAR=2018;
SELECT * from table where year=${YEAR};
hive --hiveconf var='hello world' -e '!echo ${hiveconf:var};'
-- this will print: hello world

Hive query in Shell Script

I have an external hive table on top of a parquet file.
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
I want to get the row count of the table using a shell script.
I tried the following command:
myVar=$(hive -S -e "select count(*) from parquet_test;")
echo $myVar
I added -S to run hive in silent mode, but I still get the whole MapReduce log along with the count in the myVar variable. How do I get only the count?
I don't have access to any configuration file to enable or disable the logging level. Is there any other way?
I finally found a workaround.
First flush the query result into a file, then read the answer from that file.
The file contains only the result of the query.
(hive -S -e " INSERT OVERWRITE LOCAL DIRECTORY '/home/test/result/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select count(*) from parquet_test;")
Then read the file into a variable:
var=$(hdfs dfs -tail /home/test/result/)
echo $var
Thank you
myVar=$(eval "hive -S -e 'select count(*) from parquet_test;' ")
echo $myVar
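A further option, sketched rather than tested against a real cluster: the Hive CLI writes MapReduce progress messages to stderr and query results to stdout, so a stderr redirect inside the command substitution should leave only the count. The hive call is simulated below with a stand-in function (fake_hive is hypothetical, only there so the redirection itself is visible):

```shell
# Stand-in for: hive -S -e 'select count(*) from parquet_test;'
# It mimics the split: log chatter on stderr, the count on stdout.
fake_hive() {
  echo "Total MapReduce CPU Time Spent: 4 seconds" >&2  # progress -> stderr
  echo "42"                                             # result   -> stdout
}
myVar=$(fake_hive 2>/dev/null)   # capture stdout, discard the log output
echo "$myVar"   # -> 42
```

With the real command this would read `myVar=$(hive -S -e "select count(*) from parquet_test;" 2>/dev/null)`.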

Using beeline to run some sql on remote impala failed

When I use this command I get an error:
$ beeline --silent=true -u 'jdbc:hive2://[ip]:21050/[database];auth=noSasl' -n 'username' -p 'password' -e 'use [database]; create table test_table (id int, name string);'
Error: AnalysisException: Could not resolve table reference: 'arcaccessdenied' (state=HY000,code=0)
How to fix this issue?
After testing, I fixed this issue.
It was because I used the wrong table name.
A "." (dot) is not allowed in the table name.
Right:
impala_test
Wrong:
impala_type.normal_test

How to read specific setting from Hive shell

So you can set a setting in the Hive console with:
hive> set hive.enforce.bucketing=true
And you can view ALL of the settings with:
hive> set
or
hive> set -v
But how do you read the current value of a specified setting from the Hive console?
hive> hive.enforce.bucketing;
NoViableAltException(26@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1074)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
...
FAILED: ParseException line 1:0 cannot recognize input near 'hive' '.' 'enforce'
Right now I'm redirecting hive -e 'set' to a file and then using grep. Is there a better way?
Simply use set and the property name without a value
hive> set mapreduce.input.fileinputformat.split.maxsize;
mapreduce.input.fileinputformat.split.maxsize=256000000
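To use such a lookup from a shell script, the `key=value` line can be trimmed down to just the value. A sketch; the hive call is simulated with a stand-in function (get_conf is hypothetical) so the pipeline itself can be seen working:

```shell
# Stand-in for: hive -S -e 'set mapreduce.input.fileinputformat.split.maxsize;'
# which prints the property as "key=value" on one line.
get_conf() { echo 'mapreduce.input.fileinputformat.split.maxsize=256000000'; }

# cut on '=' drops the key name and keeps only the value
value=$(get_conf | cut -d= -f2)
echo "$value"   # -> 256000000
```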

Hive: writing column headers to local file?

Hive documentation lacking again:
I'd like to write the results of a query to a local file as well as the names of the columns.
Does Hive support this?
Insert overwrite local directory 'tmp/blah.blah' select * from table_name;
Also, separate question: Is StackOverflow the best place to get Hive help? @Nija has been very helpful, but I don't want to keep bothering them...
Try
set hive.cli.print.header=true;
Yes you can. Put the set hive.cli.print.header=true; in a .hiverc file in your home directory or any of the other Hive user properties files.
Vague Warning: be careful, since this has crashed queries of mine in the past (but I can't remember the reason).
Indeed, @nija's answer is correct - at least as far as I know. There isn't any way to write the column names when doing an insert overwrite into [local] directory ... (whether you use local or not).
With regards to the crashes described by @user1735861, there is a known bug in Hive 0.7.1 (fixed in 0.8.0) that, after doing set hive.cli.print.header=true;, causes a NullPointerException for any HQL command/query that produces no output. For example:
$ hive -S
hive> use default;
hive> set hive.cli.print.header=true;
hive> use default;
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Whereas this is fine:
$ hive -S
hive> set hive.cli.print.header=true;
hive> select * from dual;
c
c
hive>
Non-HQL commands are fine though (set, dfs, !, etc.)
More info here: https://issues.apache.org/jira/browse/HIVE-2334
Hive does support writing to the local directory. Your syntax looks right for it as well.
Check out the docs on SELECTS and FILTERS for additional information.
I don't think Hive has a way to write the names of the columns to a file for the query you're running... I can't say for sure it doesn't, but I do not know of a way.
I think the only place better than SO for Hive questions would be the mailing list.
I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column on each section and set the header to 0 and the data to a 1 so I could sort by that field and ensure the header row came out on top.
create table new_table as
select
field1,
field2,
field3
from
(
select
0 as sort_col, --header row gets lowest number
'field1_name' as field1,
'field2_name' as field2,
'field3_name' as field3
from
some_small_table --table needs at least 1 row
limit 1 --only need 1 header row
union all
select
1 as sort_col, --original query goes here
field1,
field2,
field3
from
main_table
) a
order by
sort_col --make sure header row is first
It's a little bulky, but at least you can get what you need with a single query.
Hope this helps!
Not a great solution, but here is what I do:
create table test_dat
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;
hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt
cat header.txt 000* > all.dat
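The header-then-data concatenation step above can be sanity-checked with throwaway files; the paths and contents here are made up purely for illustration:

```shell
# Simulate the layout: one header file plus Hive's 000000_0-style part files.
dir=$(mktemp -d)
printf 'id\tname\n'    > "$dir/header.txt"
printf '1\tA\n2\tB\n'  > "$dir/000000_0"

# Listing header.txt first guarantees the column names land on top.
cat "$dir/header.txt" "$dir"/000* > "$dir/all.dat"
cat "$dir/all.dat"
```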
Here's my take on it. Note: I'm not very well versed in bash, so improvement suggestions are welcome :)
#!/usr/bin/env bash
# works like this:
# ./get_data.sh database.table > data.csv
INPUT=$1
TABLE=${INPUT##*.}
DB=${INPUT%.*}
HEADER=`hive -e "
set hive.cli.print.header=true;
use $DB;
INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
row format delimited
fields terminated by ','
SELECT * FROM $TABLE;"`
HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
echo ${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}
cat $TABLE/*
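The two parameter expansions doing the heavy lifting in that script can be checked in isolation:

```shell
INPUT=database.table
TABLE=${INPUT##*.}   # remove the longest prefix through the last '.' -> "table"
DB=${INPUT%.*}       # remove the shortest suffix from the last '.'   -> "database"
echo "$DB $TABLE"    # -> database table
```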
