Correcting a hive script - hadoop

Suppose I am writing a script, for example to create a table:
hive (test)> create TABLE tlocal
> (id int,
> name string
> addr string);
FAILED: ParseException line 4:5 mismatched input 'addr' expecting ) near 'string' in create table statement.
Here I forgot to add a comma after name string, so I got the error. I want to add the comma after name string and run it again. But, as with SQL, Hive does not let me correct only the wrong part of the script; I have to retype the whole script from the beginning.
How can I do this?

As Andrew suggested, you can write your query in a file and run it using
hive -f <your query file>
Alternatively, you can use Hue, an open-source web interface for Apache Hadoop that provides a SQL editor for Apache Hive.
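For example (a minimal sketch; the file name tlocal.hql is just an illustration), put the corrected statement in a file:
create table tlocal (
id int,
name string,
addr string
);
Then run it from the Linux shell, editing and re-running the file whenever something needs fixing:
hive -f tlocal.hql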

Related

Passing a variable to a Hive script file -- works with integer but not string

I need to pass a variable to an hql file in Hive using PuTTY. I've set up a test scenario. Basically I want to select a row from a table where a value equals the variable. It works when the variable is an integer but not when it is a string.
The hql file /home_dir_users/username/smb_bau/testy.hql has this code in it:
drop table if exists tam_seg.tbl_ppp;
create table tam_seg.tbl_ppp as
select
*
from
tam_seg.1_testy as b
where
b.column_a = ${hivevar:my_var};
tam_seg.1_testy looks like this:
column_a
A
B
C
D
ZZZ
123
I want to use PuTTY to pass the variable my_var to the hql file. It works if I try 123 using this:
hive --hivevar my_var=123 -f /home_dir_users/username/smb_bau/testy.hql
But it doesn't work if I try to select one of the strings. I have tried the below:
hive --hivevar my_var=ZZZ -f /home_dir_users/username/smb_bau/testy.hql
hive --hivevar my_var='ZZZ' -f /home_dir_users/username/smb_bau/testy.hql
my_var='ZZZ'
hive --hivevar my_var=$my_var -f /home_dir_users/username/smb_bau/testy.hql
But every time I get this error message:
FAILED: SemanticException [Error 10004]: Line 9:14 Invalid table alias or column reference 'ZZZ': (possible column names are: column_a)
I have also tried hiveconf, using only one dash before it instead of two, and leaving out hiveconf or hivevar before the variable in the code file.
Any ideas what I am doing wrong?
Many thanks.
OK so it looks like I have found the answer below through trial and error. I am leaving the post here in case any other users new to Hive find this useful.
I put single quotes round the variable in the hql file so it looks like this:
select
*
from
tam_seg.1_testy as b
where
b.column_a = '${hivevar:my_var}';
In a way this maybe seems obvious: I would put single quotes round a string if I weren't using a variable. I guess I had my VBA/SQL Server hat on, where a variable would not have quotes round it even if it were a string, e.g. = strMyVar or = #STR_MY_VAR (otherwise the result would literally be "${hivevar:my_var}" as a string).
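To see why the quotes matter: Hive variable substitution is a plain text replacement, so (sketching what the parser ends up seeing) the WHERE clause expands like this:
where b.column_a = ZZZ      -- without quotes: ZZZ is parsed as a column or table alias, hence the SemanticException
where b.column_a = 'ZZZ'    -- with quotes: ZZZ is a string literal, which is what we want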

Parse Strings in HIVE using Shell

I have a shell script that I use to pass a string variable into Hive, in order to filter my observations. I provide both the script and the Hive code below.
In the following script I have a variable which holds a string value, and I try to pass it into Hive, as in the example below:
Shell Script:
name1='"Maria Nash"' *(I use a single quote first and then a double)*
hive --hiveconf name=${name1} -f t2.hql
Hive code (t2.hql)
create table db.mytable as
SELECT *
FROM db.employees
WHERE emp_name='${hivevar:name}';
Conclusion
To be accurate, the final table is created but it does not contain any observations, even though the employees table does contain observations with emp_name "Maria Nash".
I think that I might not be passing the string correctly from the shell, or that I am not following the correct syntax for handling the passed variable in the Hive query.
I would appreciate your help!
You are passing the variable in the hiveconf namespace, but in the SQL script you are using hivevar; you should use hiveconf there as well:
WHERE emp_name=${hiveconf:name} --hiveconf, not hivevar
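Putting it together (a sketch that keeps the file and object names from the question), t2.hql becomes:
create table db.mytable as
SELECT *
FROM db.employees
WHERE emp_name=${hiveconf:name};
and on the shell side, quoting "${name1}" also protects the space in the name from word splitting:
name1='"Maria Nash"'
hive --hiveconf name="${name1}" -f t2.hql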
Use of the Hive CLI is deprecated; you can use Beeline from a shell script instead.
It should look something like this:
beeline << EOF
!connect jdbc:hive2://host:port/db username password
select *
from db.employees
where emp_name = "${1}"
EOF
assuming that $1 is the input from the script.
This is an example of how to do it rather than a production implementation. Generally, Kerberos would be enabled, so the username and password wouldn't appear there; a valid token would be used instead. You should also validate the input parameters.
That said, you can also do it in a single line:
beeline -u jdbc:hive2://hostname:10000 -f {full Path to Script} --hivevar {variable}={value}
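For example (a sketch; the host, port, script path, and wrapper name are placeholders), a small script that validates its argument before calling Beeline could look like:
#!/bin/bash
# run_query.sh <emp_name>  (illustrative wrapper, not a production setup)
if [ -z "$1" ]; then
  echo "usage: $0 <emp_name>" >&2
  exit 1
fi
beeline -u jdbc:hive2://hostname:10000 -f /path/to/query.hql --hivevar emp_name="$1"
where query.hql refers to the value as ${hivevar:emp_name}.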

Error while exporting the results of a HiveQL query to CSV?

I am a beginner in Hadoop/Hive. I did some research to find a way to export the results of a HiveQL query to CSV.
I am running the below command line in PuTTY:
Hive -e ‘use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;’ > /home/temp.csv;
However below is the error I am getting
ParseException line 1:0 cannot recognize input near 'Hive' '-' 'e'
I would appreciate inputs regarding this.
Run your command from outside the Hive shell, i.e. straight from the Linux shell.
Run it with 'hive' instead of 'Hive'.
Just redirecting your output into a CSV file won't work. You can do:
hive -e 'YOUR QUERY HERE' | sed 's/[\t]/,/g' > sample.csv
as was suggested here: How to export a Hive table into a CSV file?
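Applied to the query from the question (database, table, and output path taken from there), that would be:
hive -e 'use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;' | sed 's/[\t]/,/g' > /home/temp.csv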
AkashNegi's answer will also work for you, though it is a bit longer.
One way I do such things is to create an external table with the schema you want. Then do INSERT INTO TABLE target_table ... Look at the example below:
CREATE EXTERNAL TABLE isvaliddomainoutput (email_domain STRING, `count` BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION "/user/cloudera/am/member_email/isvaliddomain";
INSERT INTO TABLE isvaliddomainoutput
SELECT * FROM member_email WHERE isvalid = 1;
Now go to "/user/cloudera/am/member_email/isvaliddomain" and find your data.
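Because the external table is stored as comma-delimited text, the files under that location are already CSV-shaped; as a sketch (the local file name is arbitrary), you can merge them into a single local file with:
hdfs dfs -getmerge /user/cloudera/am/member_email/isvaliddomain isvaliddomain.csv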
Hope this helps.

Hadoop HCatalog - How to pass a key-value pair

I have a create table script where the table name will be decided at runtime. How do I pass the value to the SQL script?
I'm trying something like this:
hcat -e "create table ${D:TAB_NAME} (name string)" -DTAB_NAME=person
But I keep getting errors.
Can I get the correct syntax?
Try this:
hcat -e 'create table ${hiveconf:TAB_NAME} (name string);' -DTAB_NAME=person2
Here are two things to note:
In the shell, $ triggers variable expansion (even inside double quotes), so your ${D:TAB_NAME} is expanded to nothing before it even reaches the hcat parser. Either escape the $ or use strong quoting with single quotes: ''.
Use hiveconf instead of D for variable substitution, as hcat under the hood still uses Hive to parse commands.
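To illustrate the escaping option from the first point, the same command also works with double quotes if the $ is escaped so the shell leaves it alone:
hcat -e "create table \${hiveconf:TAB_NAME} (name string);" -DTAB_NAME=person2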

Hive argument using Variable Substitution (-d|--define) fails with a string argument

When I run a Hive script with the command
hive -d arg_partition1="p1" -f test.hql
It returns the error
FAILED: SemanticException [Error 10004]: Line 3:36 Invalid table alias or column reference 'p1': (possible column names are: line, partition1)
The script, named test.hql:
DROP TABLE IF EXISTS test;
CREATE EXTERNAL TABLE IF NOT EXISTS test (Line STRING)
PARTITIONED BY (partition1 STRING);
ALTER TABLE test ADD PARTITION (partition1="p1") LOCATION '/user/test/hive_test_data';
SELECT * FROM test WHERE partition1=${arg_partition1};
If I modify the partition to be an integer then it works fine and returns the correct results.
How do I run a Hive script with a string argument?
You'll have to escape your quotes when invoking hive, such as -d arg_partition1=\"p1\" for this to work.
However, I don't see why you'd have to add the quotes to the replacement string in any case. Presumably you know the data types of your fields when writing the query, so if partition1 is a string, include the quotes in the query, such as WHERE partition1="${arg_partition1}"; and if it's an integer, just leave them out entirely.
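For example (a sketch keeping the names from the question), the last line of test.hql becomes:
SELECT * FROM test WHERE partition1="${arg_partition1}";
and the original invocation hive -d arg_partition1="p1" -f test.hql then works as-is, since the shell strips the quotes and the query supplies them.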
