How to read specific setting from Hive shell - hadoop

So you can set a setting in the Hive console with:
hive> set hive.enforce.bucketing=true;
And you can view ALL of the settings with:
hive> set;
or
hive> set -v;
But how do you read the current value of a specified setting from the Hive console?
hive> hive.enforce.bucketing;
NoViableAltException(26@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1074)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
...
FAILED: ParseException line 1:0 cannot recognize input near 'hive' '.' 'enforce'
Right now I'm redirecting hive -e 'set' to file and then using grep. Is there a better way?

Simply use set with the property name and no value:
hive> set mapreduce.input.fileinputformat.split.maxsize;
mapreduce.input.fileinputformat.split.maxsize=256000000
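If the goal is to use the value in a shell script rather than read it at the prompt, the same command can be run non-interactively and the value cut out of the key=value line. A minimal sketch, assuming the output has the key=value shape shown above; setting_value is a hypothetical helper name, and the hive call is skipped when no hive binary is on the PATH:

```shell
#!/bin/sh
# setting_value: hypothetical helper; reads "key=value" lines on stdin and
# prints the text after the first '=' of the last non-empty line.
setting_value() {
  awk 'NF { line = $0 } END { sub(/^[^=]*=/, "", line); print line }'
}

# Offline check using the sample line from the answer above.
printf '%s\n' 'mapreduce.input.fileinputformat.split.maxsize=256000000' | setting_value

# Real use: only attempted when the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -S -e 'set hive.enforce.bucketing;' 2>/dev/null | setting_value
fi
```

This avoids dumping every setting to a file and grepping it; only the one property is requested.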

Related

How to pass multiple parameter in hive script

employee:
Table data
I want to fetch records of year=2016 by running hive script sample.hql.
use octdb;
select * from '${hiveconf:table}' where year = '${hiveconf:year}';
[cloudera@quickstart ~]$ hive -hiveconf table='employee', year=2016 -f sample.hql
But I am getting the error NoViableAltException(307@[]).......
You need to use the --hiveconf option twice:
hive --hiveconf table=employee --hiveconf year=2016 -f sample.hql
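To see why the comma-separated form fails, it helps to look at what the shell actually hands to hive. A small sketch; show_args is a hypothetical stand-in for the hive binary that prints each argument it receives on its own line:

```shell
#!/bin/sh
# show_args: hypothetical stand-in for hive; prints every argument in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# The form from the question: the shell removes the quotes, leaves the comma
# attached to the value, and passes year=2016 as a separate bare argument.
show_args -hiveconf table='employee', year=2016

# The corrected form: each variable gets its own option.
show_args --hiveconf table=employee --hiveconf year=2016
```

The first call prints [-hiveconf], [table=employee,] and [year=2016], so the table name arrives with a trailing comma and year=2016 is not attached to any option.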
With newer Hive versions you should use --hivevar instead. Originally --hiveconf was used both to set configuration and to pass variables; --hivevar was added later to give variables a separate namespace, as described in HIVE-2020.
With beeline, use the following:
beeline --hivevar table=employee --hivevar year=2016 -f sample.hql
With this, you can access these variables in the Hive script file either directly or through the hivevar namespace, as below.
select * from ${table};
select * from ${hivevar:table};
Note that you may need to specify the connection URL with the -u <db_URL> option.
After some experimenting I found the answer: ${hiveconf:table} should be written in the script without the surrounding quotes (' ').
sample.hql:
use ${hiveconf:database};
select * from ${hiveconf:table} where year = ${hiveconf:year};
Running sample.hql:
[cloudera@quickstart shell]$ hive -hiveconf database=octdb -hiveconf table=employee -hiveconf year=2016 -f sample.hql
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.484 seconds
OK
1 A 2016
2 B 2016
4 D 2016
Time taken: 4.423 seconds, Fetched: 3 row(s)
Variables can be passed with either hivevar or hiveconf.
Here is the difference:
The hiveconf namespace (--hiveconf) should be used to set Hive configuration values.
The hivevar namespace (--hivevar) should be used to define user variables.
Using hiveconf for variable substitution also works but isn't recommended, since hivevar was created explicitly for that purpose.
set hivevar:YEAR=2018;
SELECT * from table where year=${YEAR};
hive --hiveconf var='hello world' -e '!echo ${hiveconf:var};'
-- this will print: hello world

Hive query in Shell Script

I have an external hive table on top of a parquet file.
CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
I want to get the row count of the table from a shell script.
I tried the following command:
myVar=$(hive -S -e " select count(*) from parquet_test;")
echo $myVar
I added -S to run hive in silent mode, but I still get the whole MapReduce log as well as the count in the myVar variable. How do I get only the count?
I don't have access to any of the configuration files to change the logging level. Is there any other way?
Finally found a workaround.
First write the query result to a local directory, then read the answer back from the file there.
The file contains only the result of the query.
(hive -S -e " INSERT OVERWRITE LOCAL DIRECTORY '/home/test/result/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select count(*) from parquet_test;")
Then read the file into a variable. Because LOCAL DIRECTORY writes to the local filesystem, plain cat is enough:
var=$(cat /home/test/result/*)
echo $var
Thank you
myVar=$(eval "hive -S -e 'select count(*) from parquet_test;' ")
echo $myVar
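Another thing to try, assuming the job log is written to stderr and only the result row to stdout (I have not verified this for every Hive version or logging setup): discard stderr inside the command substitution, and keep only the last line in case anything else slips through. fake_hive_count below is a hypothetical stand-in so the snippet runs without a cluster:

```shell
#!/bin/sh
# fake_hive_count: hypothetical stand-in for
#   hive -S -e 'select count(*) from parquet_test;'
# It writes a log-like line to stderr and the result to stdout.
fake_hive_count() {
  echo "Launching Job 1 out of 1" >&2
  echo "3"
}

# 2>/dev/null drops stderr; tail -n 1 keeps only the last stdout line.
myVar=$(fake_hive_count 2>/dev/null | tail -n 1)
echo "$myVar"
```

With the real CLI the line would be myVar=$(hive -S -e 'select count(*) from parquet_test;' 2>/dev/null | tail -n 1).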

Set variables in hive scripts using command line

I have checked the related thread - How to set variables in HIVE scripts
Inside hive the variable is working fine:
hive> set hivevar:cal_month_end='2012-01-01';
hive> select ${cal_month_end};
But when I run this from the command line:
$ hive -e "set hivevar:cal_month_end='2012-01-01';select '${cal_month_end}';"
It keeps giving me the error below:
Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:131)
at org.apache.hadoop.fs.Path.<init>(Path.java:139)
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:110)
at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:463)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:411)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
You have to escape a few characters. This works for me:
hive -e "set hivevar:cal_month_end=\'2012-01-01\';select '\${cal_month_end}';"
You have to get the double quotes (") and single quotes (') right.
Use this:
hive -e 'set hivevar:cal_month_end="2012-01-01";select ${cal_month_end};'
I finally know what went wrong. The problem is that on the command line I can't just select a bare value; the select needs to be from some table. The command below works fine.
$ hive -e "set hivevar:cal_month_end='2012-01-01';select * from foo where start_time > '${cal_month_end}' limit 10"
You can also set variables as an argument of the hive command. Single-quote the -e string so the shell leaves ${cal_month_end} for Hive to substitute:
hive --hivevar cal_month_end='2012-01-01' -e 'select "${cal_month_end}";'
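The quoting matters because the shell gets to the string before Hive does. A quick sketch that runs without Hive; it assumes the shell itself has no value for cal_month_end, which is simulated here by setting it to the empty string:

```shell
#!/bin/sh
# Simulate a shell in which cal_month_end carries no value.
cal_month_end=''

# Double quotes: the shell expands ${cal_month_end} itself, so the text that
# would reach hive -e has an empty string where the variable was.
double="select '${cal_month_end}';"

# Single quotes: the shell leaves ${cal_month_end} alone, so Hive would see
# the placeholder and could substitute the hivevar.
single='select "${cal_month_end}";'

printf '%s\n' "$double"
printf '%s\n' "$single"
```

Escaping the dollar sign inside double quotes (\${cal_month_end}) has the same effect as the single-quoted form.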

Hive INSERT OVERWRITE to Google Storage as LOCAL DIRECTORY not working

I use the following Hive Query:
hive> INSERT OVERWRITE LOCAL DIRECTORY "gs:// Google/Storage/Directory/Path/Name" row format delimited fields terminated by ','
select * from <HiveDatabaseName>.<HiveTableName>;
I am getting the following error:
"Error: Failed with exception Wrong FS:"gs:// Google/Storage/Directory/PathName", expected: file:///
What could I be doing wrong?
Remove LOCAL from your statement.
See the syntax below:
INSERT OVERWRITE DIRECTORY 'gs://Your_Bucket_Path/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY "\n"
SELECT * FROM YourExistingTable;
There's a bug in Hive, including IIRC Hive 1.2.1, where it uses the configured fs.default.name or fs.defaultFS for its scratchdir even if the table path is in a different filesystem. In your case, it appears you have the out-of-the-box defaults setting fs.defaultFS to file:///, which is why it says "expected: file:///". On a distributed Hadoop cluster, you might see it say "expected: hdfs://..." instead.
You can fix it for the current Hive session by overriding fs.default.name and fs.defaultFS at the prompt:
> set fs.default.name=gs://your-bucket/
> set fs.defaultFS=gs://your-bucket/
You may also want to modify those entries inside your core-site.xml file to point at your GCS location to make it easier.

Hive error: parseexception missing EOF

I am not sure what I am doing wrong here:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE")
LOCATION "/user/hive/test_table";
FAILED: ParseException line 1:107 missing EOF at 'LOCATION' near ')'
while the following query works perfectly fine:
hive> CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
tblproperties ("orc.compress"="NONE");
OK
Time taken: 0.106 seconds
Am I missing something here? Any pointers will help. Thanks!
Try putting LOCATION before tblproperties, as below; it worked for me.
CREATE TABLE default.testtbl(int1 INT,string1 STRING)
stored as orc
LOCATION "/user/hive/test_table"
tblproperties ("orc.compress"="NONE");
It seems even the sample SQL from the book "Programming Hive" got the order wrong. Please refer to the official definition of the create table command:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
@Haiying Wang pointed out that LOCATION must come before tblproperties.
But I think the error also occurs when LOCATION is specified before STORED AS.
It's better to stick to the correct order:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
Refer: Hive Create Table
Check this post:
Loading Data from a .txt file to Table Stored as ORC in Hive
Also check the source files present in the specified directory /user/hive/test_table. If the files are in .txt or some other non-ORC format, you can follow the steps in the above post to resolve the error.
ParseException line lineNumber missing EOF at '.' near 'schemaName':
I got the above error while running the following command from a Linux script to truncate a Hive table:
dse -u username -p password hive -e "truncate table keyspace.tablename;"
Fix:
Issue a use statement first, as a separate command in the same -e string:
dse -u username -p password hive -e "use keyspace; truncate table keyspace.tablename;"
Happy coding!
I got the same error while creating a table in Hive.
I used the drop command to drop the table and then ran my create table command again.
Worked for me.
If you see this error when running HiveQL from a file with "hive -f file.hql", and it points to the first line of a query, the most likely cause is a missing semicolon (;) at the end of the previous query,
since the parser treats the semicolon (;) as the terminator of each query.
For example:
DROP TABLE IF EXISTS default.emp
create table default.emp (
field1 type,
field2 type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://gts-promocube/source-data/Lowes/POS/';
If you save the above in a file and execute it with hive -f, then you'll get the error:
FAILED: ParseException line 2:0 missing EOF at 'CREATE' near emp.
Solution: End the DROP TABLE command above with a semicolon (;).
