Shell Hive - Identify a column in entire Hive Database - bash

The code below does not enter the loop if I change the first line to hive -S -e 'show databases like 'abc_xyz%''|
Can you please help to fix this issue?
hive -S -e 'show databases'|
while read database
do
eval "hive -S -e 'show tables in $database'"|
while read line
do
if eval "hive -S -e 'describe $database.$line'"| grep -q "<column_name>"; then
output="Required table name: $database.$line"'\n';
else
output=""'\n';
fi
echo -e "$output"
done
done

For Hive < 4.0.0, the wildcards in the show databases pattern can only be '*' for any character(s) or '|' for a choice of alternatives.
For example like this:
show databases like 'abc_xyz*|bcd_xyz*'
The SQL-style patterns '%' for any character(s) and '_' for a single character work only since Hive 4.0.0.
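Putting that together, here is a runnable sketch of the corrected loop. The hive function below is a stub standing in for the real CLI, and my_col is a made-up column name; replace both in real use.

```shell
#!/usr/bin/env bash
# Stub standing in for the real hive CLI so the loop can be tried anywhere;
# delete this function when running against a real cluster.
hive() {
  case $3 in
    "show databases"*) printf 'abc_xyz_a\n' ;;
    "show tables"*)    printf 't1\nt2\n' ;;
    "describe"*)       printf 'id int\nmy_col string\n' ;;
  esac
}

# On Hive < 4.0.0 the pattern uses '*', not the SQL-style '%'
hive -S -e "show databases like 'abc_xyz*'" |
while read -r database; do
  hive -S -e "show tables in $database" |
  while read -r table; do
    # Report the table only when its describe output mentions the column
    if hive -S -e "describe $database.$table" | grep -q "my_col"; then
      echo "Required table name: $database.$table"
    fi
  done
done
# Required table name: abc_xyz_a.t1
# Required table name: abc_xyz_a.t2
```

Note the quoting as well: the question's version nests single quotes ('show databases like 'abc_xyz%''), which the shell splices apart; double-quoting the -e argument avoids that in addition to the wildcard fix.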

Related

Read input from a csv and append those values to a variable using shell script

#!/bin/bash
Dir=""
while read line
do echo "Record is :$line"
Dir+="Dir,$line"
done < dir.csv
echo $Dir
where dir.csv is an input file which includes the following data:
/example/people/
/book/
/book/english/
I would like to append the values of the rows into one variable like
/example/people/,/book/,/book/english/
Is there any easy way to achieve this through a shell script? The above script is showing only the last value, e.g. /book/english/.
I don't see anything in your code that would cause your script to only show the last value.
This may be an illusion: is your dir.csv file CRLF-delimited? (DOS/Windows format) If so, remove the CR that ends each line with a utility like dos2unix or a command like tr -d '\r'.
Some notes though:
In Dir+="Dir,$line", the string Dir should probably be removed (Dir+=",$line").
You probably want to get rid of the initial comma: Dir=${Dir#,}.
All this can be simplified with the single command below:
Dir=$(paste -s -d, dir.csv)
... or, with CRLF line-endings:
Dir=$(tr -d '\r' < dir.csv | paste -s -d,)
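As a quick check of the paste approach, recreating the sample dir.csv from the question in a throwaway directory:

```shell
# Recreate the sample dir.csv from the question in a temp directory
tmpdir=$(mktemp -d)
printf '%s\n' /example/people/ /book/ /book/english/ > "$tmpdir/dir.csv"

# -s joins all lines of the file into one; -d, uses a comma as the separator
Dir=$(paste -s -d, "$tmpdir/dir.csv")
echo "$Dir"   # /example/people/,/book/,/book/english/

rm -r "$tmpdir"
```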
This is easier:
xargs < dir.csv|sed 's/ /,/g'
or if you have CRLF line endings, you can clean those up with:
xargs < dir.csv|tr -d '\r'|sed 's/ /,/g'
Note that the previously proposed tr '\n' ',' < dir.csv can add an extra comma at the end if the CSV ends with a newline.

shell read and pass multiple variables in for or while loop

Below is my input file and the code I am using
FILE :
cat $TESTFILE
2020-01-13,COST_CH_RPT
2018-04-19,LOSS_CH_RPT
CODE :
for i in `cat $TESTFILE`
do
export date=`cat $TESTFILE|cut -d',' -f1`
echo date=$date
export Name=`cat $TESTFILE|cut -d',' -f2`
echo Name=$Name
beeline --outputformat=csv2 --hivevar Name=$Name --hivevar date=$date -u ${beeURL} -f TEST.hql
done
The objective is to run the hql file for every line in the file. The above code does run twice for the two lines, but the variables passed in both runs are the same: the first line of the file.
How can I differentiate the input variables for each run?
As noted in a comment, the current solution re-processes TESTFILE multiple times, incorrectly. A simpler alternative is to use read to loop through the lines:
while IFS=, read -r date Name; do
    # remove 'echo' to actually run the command
    echo beeline --outputformat=csv2 --hivevar Name="$Name" --hivevar date="$date" -u "${beeURL}" -f TEST.hql
done < "$TESTFILE"
It simply iterates over the lines in TESTFILE and executes the beeline command. Use quotes to protect against errors in the input file, in particular spaces, which would break the command line.
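The splitting can be checked without beeline by running the loop against the sample file contents from the question:

```shell
# Verify the IFS=, splitting with the sample lines from the question
tmpfile=$(mktemp)
printf '%s\n' 2020-01-13,COST_CH_RPT 2018-04-19,LOSS_CH_RPT > "$tmpfile"

# Each line is split at the comma into the two variables
while IFS=, read -r date Name; do
  echo "date=$date Name=$Name"
done < "$tmpfile"
# date=2020-01-13 Name=COST_CH_RPT
# date=2018-04-19 Name=LOSS_CH_RPT

rm -f "$tmpfile"
```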

Dropping multiple tables with same prefix in Hive

I have a few tables in Hive that have the same prefix, like below:
temp_table_name
temp_table_add
temp_table_area
There are a few hundred tables like this in my database, along with many other tables.
I want to delete the tables that start with "temp_table".
Do any of you know a query that can do this in Hive?
There is no such thing as regular expressions for a drop query in Hive (or I didn't find them). But there are multiple ways to do it, for example:
With a shell script :
hive -e "show tables 'temp_*'" | xargs -I '{}' hive -e 'drop table {}'
Or by putting your tables in a specific database and dropping the whole database.
Create table temp.table_name;
Drop database temp cascade;
The above solutions are good, but if you have many tables to delete, then running 'hive -e drop table' for each one is slow. So I used this:
hive -e 'use db;show tables' | grep pattern > file.hql
Use the vim editor to open file.hql and run the commands below:
:%s!^!drop table !
:%s!$!;!
then run
hive -f file.hql
This approach will be much faster.
My solution has been to use bash script with the following cmd:
hive -e "SHOW TABLES IN db LIKE 'schema*';" | grep "schema" | sed -e 's/^/hive -e \"DROP TABLE db\./1' | sed -e 's/$/\"/1' > script.sh
chmod +x script.sh
./script.sh
I was able to delete all tables using the following steps in Apache Spark with Scala:
val df = sql("SHOW TABLES IN default LIKE 'invoice*'").select("tableName") // only tables matching a prefix
val df = sql("SHOW TABLES IN default").select("tableName") // or, to drop all tables
val tableNameList: List[String] = df.as[String].collect().toList
val df2 = tableNameList.map(tableName => sql(s"drop table ${tableName}"))
As I had a lot of tables to drop, I used the following command, inspired by HorusH's answer:
hive -e "show tables 'table_prefix*'" | sed -e 's/^/DROP TABLE db_name./' | sed -e 's/$/;/' > script.hql
hive -f script.hql
Try this:
hive -e 'use sample_db;show tables' | xargs -I '{}' hive -e 'use sample_db;drop table {}'
The command below will also work.
hive -e 'show tables' | grep table_prefix | while read line; do hive -e "drop table $line"; done
The fastest solution is a single shell script that takes the pattern as an argument:
./drop_tables.sh pattern
Shell script content:
hive -e 'use db;show tables' | grep $1 | sed 's/^/drop table db./' | sed 's/$/;/' > temp.hql
hive -f temp.hql
rm temp.hql
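Without a Hive connection, the sed transformation used by the script can be verified on its own (the table names below are made up):

```shell
# The same sed pipeline, fed fake table names instead of hive output
printf '%s\n' temp_table_name temp_table_add |
  sed 's/^/drop table db./' | sed 's/$/;/'
# drop table db.temp_table_name;
# drop table db.temp_table_add;
```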

How to get all table definitions in a database in Hive?

I am looking to get all table definitions in Hive. I know that for single table definition I can use something like -
describe <<table_name>>
describe extended <<table_name>>
But I couldn't find a way to get all table definitions. Is there any table in the metastore, similar to INFORMATION_SCHEMA in MySQL, or a command to get all table definitions?
You can do this by writing a simple bash script and some bash commands.
First, write all table names in a database to a text file using:
$hive -e 'show tables in <dbname>' | tee tables.txt
Then create a bash script (describe_tables.sh) to loop over each table in this list:
while read -r line
do
    echo "$line"
    hive -e "describe <dbname>.$line"
done
Then execute the script:
$chmod +x describe_tables.sh
$./describe_tables.sh < tables.txt > definitions.txt
The definitions.txt file will contain all the table definitions.
The above processes work; however, they are slow because a new Hive connection is made for each query. Instead, you can do what I did for the same need, below.
Use one of the above methods to get your list of tables.
Then modify the list to make it a hive query for each table as follows:
describe my_table_01;
describe my_TABLE_02;
So you will have a flat file with all your describe statements. For example, suppose the queries are in a flat file called my_table_description.hql.
Get the output in one scoop as follows:
hive -f my_table_description.hql > my_table_description.output
It is super fast and gets the output in one shot.
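Building that flat file from a table list can itself be scripted. A sketch, assuming the table names are already in tables.txt (the file names here are illustrative):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' my_table_01 my_TABLE_02 > "$tmpdir/tables.txt"

# Turn each table name into a 'describe <name>;' statement
sed 's/^/describe /; s/$/;/' "$tmpdir/tables.txt" > "$tmpdir/my_table_description.hql"
cat "$tmpdir/my_table_description.hql"
# describe my_table_01;
# describe my_TABLE_02;

rm -r "$tmpdir"
```

The resulting .hql file is then run in one shot with hive -f, as above.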
Fetch the list of Hive databases: hive -e 'show databases' > hive_databases.txt
Then echo each table's description:
cat hive_databases.txt | grep -v '^$' | while read LINE;
do
echo "## TableName:" $LINE
eval "hive -e 'show tables in $LINE' | grep -v ^$ | grep -v Logging | grep -v tab_name | tee $LINE.tables.txt"
cat $LINE.tables.txt | while read table
do
echo "### $LINE.$table" > $LINE.$table.desc.md
eval "hive -e 'describe $LINE.$table'" >> $LINE.$table.desc.md
sed -i 's/\t/|/g' ./$LINE.$table.desc.md
sed -i 's/comment/comment\n|:--:|:--:|:--:|/g' ./$LINE.$table.desc.md
done
done

Bash variables not acting as expected

I have a bash script which parses a file line by line, extracts the date using a cut command and then makes a folder using that date. However, it seems like my variables are not being populated properly. Do I have a syntax issue? Any help or direction to external resources is very appreciated.
#!/bin/bash
ls | grep .mp3 | cut -d '.' -f 1 > filestobemoved
cat filestobemoved | while read line
do
varYear= $line | cut -d '_' -f 3
varMonth= $line | cut -d '_' -f 4
varDay= $line | cut -d '_' -f 5
echo $varMonth
mkdir $varMonth'_'$varDay'_'$varYear
cp ./$line'.mp3' ./$varMonth'_'$varDay'_'$varYear/$line'.mp3'
done
You have many errors and non-recommended practices in your code. Try the following:
for f in *.mp3; do
f=${f%%.*}
IFS=_ read _ _ varYear varMonth varDay <<< "$f"
echo $varMonth
mkdir -p "${varMonth}_${varDay}_${varYear}"
cp "$f.mp3" "${varMonth}_${varDay}_${varYear}/$f.mp3"
done
The actual error is that you need to use command substitution. For example, instead of
varYear= $line | cut -d '_' -f 3
you need to use
varYear=$(cut -d '_' -f 3 <<< "$line")
A secondary error there is that $foo | some_command on its own line does not mean that the contents of $foo gets piped to the next command as input, but is rather executed as a command, and the output of the command is passed to the next one.
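A quick demonstration with a made-up file name shaped like the ones in the question (fields 3, 4, 5 hold year, month, day):

```shell
line="morning_show_2021_06_15"
# $( ... ) captures the command's stdout into the variable
varYear=$(cut -d '_' -f 3 <<< "$line")
varMonth=$(cut -d '_' -f 4 <<< "$line")
varDay=$(cut -d '_' -f 5 <<< "$line")
echo "${varMonth}_${varDay}_${varYear}"   # 06_15_2021
```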
Some best practices and tips to take into account:
Use a portable shebang line - #!/usr/bin/env bash (disclaimer: That's my answer).
Don't parse ls output.
Avoid useless uses of cat.
Use More Quotes™
Don't use files for temporary storage if you can use pipes. It is literally orders of magnitude faster, and generally makes for simpler code if you want to do it properly.
If you have to use files for temporary storage, put them in the directory created by mktemp -d. Preferably add a trap to remove the temporary directory cleanly.
There's no need for a var prefix in variables.
grep searches for basic regular expressions by default, so .mp3 matches any single character followed by the literal string mp3. If you want to search for a dot, you need to either use grep -F to search for literal strings or escape the regular expression as \.mp3.
You generally want to use read -r (defined by POSIX) to treat backslashes in the input literally.
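The difference is easy to see with input containing a backslash:

```shell
# Without -r, read interprets backslash escapes and strips the backslash
printf 'a\\b\n' | { read line; echo "$line"; }      # ab
# With -r, the input is taken literally
printf 'a\\b\n' | { read -r line; echo "$line"; }   # a\b
```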
