Hadoop Hive: Generate Table Name and Attribute Name using Bash script

In our environment we do not have access to the Hive metastore to query it directly.
I have a requirement to generate table name, column name pairs for a set of tables dynamically.
I was trying to achieve this by running "describe extended $tablename" for every table, writing the output to a file, and picking the table name and column name pairs out of that file.
Is there an easier way to do this than the approach above?
The desired output looks like this:
table1|col1
table1|col2
table1|col3
table2|col1
table2|col2
table3|col1

This script prints the columns of a single table in the desired format. The awk program parses the output of the describe command, keeps only the column name, prefixes it with the table name and a "|", and prints each pair on its own line.
#!/bin/bash
# Set the table name here
TABLE_NAME=your_schema.your_table
# Describe the table; awk stops at the first blank line (which separates the column
# list from partition/detail info) and emits TABLE_NAME|COLUMN_NAME for each column
TABLE_COLUMNS=$(hive -S -e "set hive.cli.print.header=false; describe ${TABLE_NAME};" | awk -v table_name="${TABLE_NAME}" -F " " 'f&&!NF{exit}{f=1}f{printf c table_name "|" toupper($1)}{c="\n"}')
echo "${TABLE_COLUMNS}"
You can easily modify it to generate output for all tables, for example by looping over the result of a show tables command, as in the sketch below.
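A minimal sketch of such a loop, assuming the schema name your_schema and the output file all_columns.txt (both placeholders):
#!/bin/bash
SCHEMA=your_schema
# List every table in the schema, describe each one, and emit TABLE|COLUMN pairs
for TABLE_NAME in $(hive -S -e "use ${SCHEMA}; show tables;"); do
    hive -S -e "set hive.cli.print.header=false; describe ${SCHEMA}.${TABLE_NAME};" \
      | awk -v table_name="${TABLE_NAME}" -F " " 'f&&!NF{exit}{f=1}f{print table_name "|" toupper($1)}'
done > all_columns.txt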
The easier way, if you can get access to it, is to query the metastore database directly.
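For reference, with a MySQL-backed metastore the same pairs can be pulled in one query against the standard metastore tables (TBLS, SDS, COLUMNS_V2). This is only a sketch with hypothetical credentials and database name, since the asker's environment does not allow metastore access:
mysql -u hive_ro -p -D metastore -e "
SELECT t.TBL_NAME, c.COLUMN_NAME
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
ORDER BY t.TBL_NAME, c.INTEGER_IDX;"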

Related

Passing a variable to a Hive script file -- works with integer but not string

I need to pass a variable to an HQL file in Hive using PuTTY. I've set up a test scenario. Basically I want to select a row from a table where a value equals the variable. It works when the variable is an integer but not when it is a string.
The hql file /home_dir_users/username/smb_bau/testy.hql has this code in it:
drop table if exists tam_seg.tbl_ppp;
create table tam_seg.tbl_ppp as
select
*
from
tam_seg.1_testy as b
where
b.column_a = ${hivevar:my_var};
tam_seg.1_testy looks like this:
column_a
A
B
C
D
ZZZ
123
I want to use PuTTY to pass the variable my_var to the hql file. It works if I try 123 using this:
hive --hivevar my_var=123 -f /home_dir_users/username/smb_bau/testy.hql
But it doesn't work if I try to select one of the strings. I have tried the below:
hive --hivevar my_var=ZZZ -f /home_dir_users/username/smb_bau/testy.hql
hive --hivevar my_var='ZZZ' -f /home_dir_users/username/smb_bau/testy.hql
my_var='ZZZ'
hive --hivevar my_var=$my_var -f /home_dir_users/username/smb_bau/testy.hql
But every time I get this error message:
FAILED: SemanticException [Error 10004]: Line 9:14 Invalid table alias or column reference 'ZZZ': (possible column names are: column_a)
I have also tried hiveconf, using only one dash before it instead of two, and leaving out the hiveconf or hivevar prefix before the variable in the code file.
Any ideas what I am doing wrong?
Many thanks.
OK so it looks like I have found the answer below through trial and error. I am leaving the post here in case any other users new to Hive find this useful.
I put single quotes round the variable in the hql file so it looks like this:
select
*
from
tam_seg.1_testy as b
where
b.column_a = '${hivevar:my_var}';
In a way this maybe seems obvious -- I would put single quotes round a string if I weren't using a variable. I guess I had my VBA/SQL Server hat on where a variable would not have quotes round it even if it were a string e.g. = strMyVar or = #STR_MY_VAR (otherwise the result would literally be "${hivevar:my_var}" as a string).
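Since hivevar substitution is purely textual, another option (shown here only as a sketch, not something from the original post) is to leave the HQL unquoted as it originally was and pass the quotes in from the shell instead:
hive --hivevar my_var="'ZZZ'" -f /home_dir_users/username/smb_bau/testy.hql
The value of my_var is then 'ZZZ' including the single quotes, so b.column_a = ${hivevar:my_var} expands to b.column_a = 'ZZZ'.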

String and non-string data getting converted to 'null' for empty fields while exporting into an Oracle table through Hive

I am new to Hadoop and I have a scenario where I have to export a dataset/file from HDFS to an Oracle table using sqoop export. The file has literal 'null' values in it, so the same is getting exported into the table as well. How can I replace 'null' with a blank in the database while exporting?
You can create a TSV file from hive/beeline, and in that process you can have nulls written as blanks with --nullemptystring=true.
Example: beeline -u ${hiveConnectionString} --outputformat=tsv2 --showHeader=false --silent=true --nullemptystring=true --incremental=true -e 'set hive.support.quoted.identifiers=none; select * from someSchema.someTable where whatever > something' > /Your/Local/Location or EdgeNode/exportingfile.tsv
You can then use the created file in the sqoop export for exporting to the Oracle table, as sketched below.
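A rough sketch of that export (connection string, credentials, table name, and HDFS path are all hypothetical). As an alternative to blanking the nulls in the file first, Sqoop's --input-null-string/--input-null-non-string flags can map the literal text 'null' to SQL NULL during the export, which Oracle stores the same as an empty string for VARCHAR2 columns (see the note further down):
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.oracle_password \
  --table TARGET_TABLE \
  --export-dir /user/you/exportingdata \
  --input-fields-terminated-by '\t' \
  --input-null-string 'null' \
  --input-null-non-string 'null'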
You can also replace the nulls with blanks in the file with Unix sed.
Ex: sed -i 's/null//g' /Your/Local/Location or EdgeNode/exportingfile.tsv
In Oracle, empty strings and nulls are treated the same for varchars. That is why Oracle internally converts empty strings into nulls for varchar. When '' is assigned to a char(1) it becomes ' ' (char types are blank-padded strings). See what Tom Kyte says about this: https://asktom.oracle.com/pls/asktom/f?p=100:11:0%3a%3a%3a%3aP11_QUESTION_ID:5984520277372
See this manual: https://www.techonthenet.com/oracle/questions/empty_null.php

How to use current timestamp as filename for Hive output

I'm using this code to write the results of a Hive query to the specified file:
INSERT OVERWRITE DIRECTORY '/user/test.user/test.csv'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '"' STORED AS TEXTFILE
SELECT
...
However, I don't want the filename to be test.csv but rather the Unix timestamp, that is 1517213651.csv or something like that.
I understand I can't use the concat function to manipulate the filename, but that is as far as I got.
How do I get the timestamp of the moment of query execution to be the filename of my output?
EDIT: We're using Cloudera.
Another option is to put the Hive insert inside a shell script. Define a date variable in the script and then use that variable to build the output path.
TIMESTAMP_VAR=$(date +"%Y-%m-%d-%H-%M-%S")
FILENAME_VAR=/user/test/${TIMESTAMP_VAR}.csv
You can manipulate the timestamp layout in numerous ways.
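Putting it together, a minimal sketch of such a wrapper script; the Unix-epoch format matches the 1517213651.csv style asked for, and the database/table in the SELECT is a placeholder:
#!/bin/bash
# Seconds since the epoch, e.g. 1517213651
TIMESTAMP_VAR=$(date +%s)
OUTPUT_DIR=/user/test.user/${TIMESTAMP_VAR}.csv

hive -e "
INSERT OVERWRITE DIRECTORY '${OUTPUT_DIR}'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\"' STORED AS TEXTFILE
SELECT * FROM your_database.your_table"
As in the original query, the .csv path is really a directory that Hive writes its output files into.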
If you are building the path in Talend, you have to add TalendDate.getDate("CCYYMMDD") to the file path:
"/File1/Output_File_" + TalendDate.getDate("CCYYMMDD") + ".csv"

Error while exporting the results of a HiveQL query to CSV?

I am a beginner in Hadoop/Hive. I did some research to find a way to export the results of a HiveQL query to CSV.
I am running the below command line in PuTTY -
Hive -e ‘use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;’ > /home/temp.csv;
However below is the error I am getting
ParseException line 1:0 cannot recognize input near 'Hive' '-' 'e'
I would appreciate inputs regarding this.
Run your command from outside the hive shell, i.e. just from the Linux shell.
Run with 'hive' instead of 'Hive':
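In other words, from the Linux prompt the original command would look something like this (database, table, and path are the asker's own):
hive -e 'use smartsourcing_analytics_prod; select * from solution_archive_data limit 10;' > /home/temp.csv
Note also that the curly quotes in the original command need to be straight single quotes; the shell only treats straight quotes as quoting characters.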
Just redirecting your output into a csv file won't work. You can do:
hive -e 'YOUR QUERY HERE' | sed 's/[\t]/,/g' > sample.csv
as was suggested here: How to export a Hive table into a CSV file?
AkashNegi's answer will also work for you... a bit longer though.
One way I do such things is to create an external table with the schema you want. Then do INSERT INTO TABLE target_table ... Look at the example below:
CREATE EXTERNAL TABLE isvaliddomainoutput (email_domain STRING, `count` BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
LOCATION "/user/cloudera/am/member_email/isvaliddomain";
INSERT INTO TABLE isvaliddomainoutput
SELECT * FROM member_email WHERE isvalid = 1;
Now go to "/user/cloudera/am/member_email/isvaliddomain" and find your data.
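If you then want the result as a single local CSV file, one option (a sketch; the local file name is arbitrary) is to merge the directory's output files with hdfs dfs -getmerge:
hdfs dfs -getmerge /user/cloudera/am/member_email/isvaliddomain isvaliddomain.csv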
Hope this helps.

Apache Pig store delimiters

I'm using Pig Latin to store values from an alias into HDFS. The alias contains a semicolon in one of its fields.
dump A;
(Richard & John, 1993)
(Albert, 1994)
Here is a table that shows the data in HDFS; the semicolon makes 'John' go to the next column.
| Name | Year |
|--------------|------|
| Richard &amp | John |
| Albert | 1994 |
Trying to use store like this is also not working as expected:
STORE A INTO '/user/hive/warehouse/test.db/names' using PigStorage('\t');
but even when telling PigStorage to use tab as the delimiter, the semicolon breaks the table data. How can I fix it?
I just created a file locally, say a.txt, and copied your data into this file.
(Richard & John, 1993)
(Albert, 1994)
Now I see that your data is not tab-delimited, and that's why it splits after the semicolon part. So to solve this problem I just wrote a query like this:
data = load '/home/hduser/Desktop/a.txt' using PigStorage(',');
dump data;
and my output result is this
((Richard & John, 1993))
((Albert, 1994))
I split it using , because your data looks like it uses that delimiter.
Note: I ran this on my local file system, so to run it like this you must start Pig with the command pig -x local and give the relevant path.
It turns out the problem was in the Hive create table statement.
create table test.names
(
name varchar(40),
year varchar(40)
)
row format delimited fields terminated by '\073'
lines terminated by '\n';
The delimiter I used was \073 (semicolon), so changing the PigStorage delimiter had no effect.
I'm now using \072 (colon) and it is working. I think any other delimiter would work as long as it is not a common or possible character in the input data.
