I am looking to get all table definitions in Hive. I know that for a single table definition I can use something like:
describe <<table_name>>
describe extended <<table_name>>
But I couldn't find a way to get all table definitions. Is there a table in the metastore similar to Information_Schema in MySQL, or is there a command to get all table definitions?
You can do this by writing a simple bash script and some bash commands.
First, write all table names in a database to a text file using:
$ hive -e 'show tables in <dbname>' | tee tables.txt
Then create a bash script (describe_tables.sh) to loop over each table in this list:
#!/bin/bash
# Read table names from stdin, one per line, and describe each one.
while read -r line
do
  echo "$line"
  hive -e "describe <dbname>.$line"
done
Then execute the script:
$ chmod +x describe_tables.sh
$ ./describe_tables.sh < tables.txt > definitions.txt
The definitions.txt file will contain all the table definitions.
The above process works, but it is slow because a new Hive connection is made for each query. Instead, you can do what I did for the same need, described below.
Use one of the above methods to get your list of tables.
Then modify the list to make it a hive query for each table as follows:
describe my_table_01;
describe my_TABLE_02;
You will then have a flat file containing all the describe statements, for example a file called my_table_description.hql.
Get the output in one go as follows:
hive -f my_table_description.hql > my_table_description.output
It is super fast and gets the output in one shot.
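Turning the table list into describe statements can be scripted rather than edited by hand. The following is a minimal sketch, not part of the original answer: the function name make_describe_hql and the database name mydb are made up for illustration, and the table names are piped in as sample data instead of coming from hive.

```shell
# Hypothetical helper: reads table names (one per line) on stdin and
# prints one "describe <db>.<table>;" statement per non-empty line.
make_describe_hql() {
  db="$1"
  while IFS= read -r table; do
    # Skip blank lines in the listing.
    [ -n "$table" ] || continue
    printf 'describe %s.%s;\n' "$db" "$table"
  done
}

# Demo with inline sample names instead of real hive output.
printf 'my_table_01\n\nmy_TABLE_02\n' | make_describe_hql mydb
```

Assuming the table list is in tables.txt, the output could be redirected with make_describe_hql mydb < tables.txt > my_table_description.hql and then run once with hive -f.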
Fetch the list of Hive databases:
hive -e 'show databases' > hive_databases.txt
Write each table's description to a Markdown file:
grep -v '^$' hive_databases.txt | while read -r LINE
do
  echo "## Database: $LINE"
  hive -e "show tables in $LINE" | grep -v '^$' | grep -v Logging | grep -v tab_name | tee "$LINE.tables.txt"
  while read -r table
  do
    echo "### $LINE.$table" > "$LINE.$table.desc.md"
    hive -e "describe $LINE.$table" >> "$LINE.$table.desc.md"
    sed -i 's/\t/|/g' "./$LINE.$table.desc.md"
    sed -i 's/comment/comment\n|:--:|:--:|:--:|/g' "./$LINE.$table.desc.md"
  done < "$LINE.tables.txt"
done
The code below does not enter the loop if I change the first line to hive -S -e 'show databases like 'abc_xyz%''|
Can you please help me fix this issue?
hive -S -e 'show databases'|
while read database
do
eval "hive -S -e 'show tables in $database'"|
while read line
do
if eval "hive -S -e 'describe $database.$line'"| grep -q "<column_name>"; then
output="Required table name: $database.$line"'\n';
else
output=""'\n';
fi
echo -e "$output"
done
done
For Hive versions before 4.0.0, the only wildcards allowed in a show databases pattern are '*' (any characters) and '|' (a choice between alternatives).
For example:
show databases like 'abc_xyz*|bcd_xyz*'
SQL-style patterns, with '%' for any characters and '_' for a single character, work only from Hive 4.0.0 onward.
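There is also a shell quoting issue in the question's modified first line: the inner single quotes close the outer single-quoted string, so the shell removes them and Hive receives the pattern without quotes. A minimal sketch of one way around this is to put the HiveQL in double quotes. The helper name show_databases_query is hypothetical, and the hive call is shown only as a comment because it needs a real cluster.

```shell
# Hypothetical helper: builds the HiveQL text for a database pattern.
# The double-quoted format string lets the single quotes reach Hive intact.
show_databases_query() {
  printf "show databases like '%s'" "$1"
}

# Pattern uses the Hive < 4.0.0 wildcard syntax ('*' and '|').
query="$(show_databases_query 'abc_xyz*')"
echo "$query"

# Assumed usage against a real cluster (not run here):
#   hive -S -e "$query" | while read -r database; do ...; done
```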
Below are my input file and the code I am using.
FILE :
cat $TESTFILE
2020-01-13,COST_CH_RPT
2018-04-19,LOSS_CH_RPT
CODE :
for i in `cat $TESTFILE`
do
export date=`cat $TESTFILE|cut -d',' -f1`
echo date=$date
export Name=`cat $TESTFILE|cut -d',' -f2`
echo Name=$Name
beeline --outputformat=csv2 --hivevar Name=$Name --hivevar date=$date -u ${beeURL} -f TEST.hql
done
The objective is to run the hql file once for every line in the file. The above code runs twice, once for each of the two lines, but the variables passed to both runs are the same: those from the first line of the file.
How can I pass different input variables for each run?
As noted in a comment, the current code re-reads the whole TESTFILE on every iteration, so cut returns the first field of every line rather than of the current line. A simpler alternative is to use read to loop through the lines:
while IFS=, read -r date Name ; do
echo beeline --outputformat=csv2 --hivevar Name="$Name" --hivevar date="$date" -u "${beeURL}" -f TEST.hql
done < "$TESTFILE"
This iterates over the lines of TESTFILE and splits each one at the comma into date and Name. The echo prints each beeline command as a dry run; remove it to execute the command. The quotes protect against problems in the input file, in particular spaces, which would otherwise break the command line.
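To see the per-line splitting without a Hive server, the loop can be tried on the two sample lines from the question. This is only a sketch: print_hivevars is a hypothetical helper that prints the values instead of calling beeline, and a temporary file stands in for $TESTFILE.

```shell
# Stand-in for $TESTFILE, using the two sample lines from the question.
testfile="$(mktemp)"
printf '2020-01-13,COST_CH_RPT\n2018-04-19,LOSS_CH_RPT\n' > "$testfile"

# Hypothetical helper: prints the variables each run would receive,
# instead of invoking beeline.
print_hivevars() {
  while IFS=, read -r date name; do
    printf 'date=%s Name=%s\n' "$date" "$name"
  done < "$1"
}

print_hivevars "$testfile"
rm -f "$testfile"
```

Each line of output carries the values of one input line, which is what the two beeline runs need.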
I want to modify my existing bash script. This is how it looks now:
#! /bin/bash
SAMPLE = myfile.txt
while read SAMPLE
do
name = $SAMPLE
# some other code
done < $SAMPLE
In this case 'myfile.txt' consists of only one column, with all the info I need.
Now I want to modify this script because 'myfile.txt' now contains more columns and more lines than I need.
grep 'TEST' myfile.txt | cut -d "," -f 1
gives me the values I need. But how can I integrate this into my bash script?
You can pipe the output of any command into a while read loop.
Try this:
#! /bin/bash
INPUT=myfile.txt
grep 'TEST' $INPUT |
cut -d "," -f 1 |
while read SAMPLE
do
name=$SAMPLE
# some other code
done
You have to change the input field separator (IFS), which tells read where to split the input line. Then you tell read to read two fields: the one you need and one you do not care about.
#! /bin/bash
SAMPLE=myfile.txt
while IFS=, read SAMPLE dontcare
do
name="$SAMPLE"
# some other code
done < <(grep TEST "$SAMPLE")
By the way: whenever you use read, consider using the -r option, which stops read from treating backslashes in the input as escape characters.
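The effect of -r is easiest to see on input that contains a backslash. The snippet below is an illustrative sketch; the sample value is made up.

```shell
# A sample value containing backslashes, e.g. a Windows-style path.
line='C:\temp\new'

# Without -r, read treats each backslash as an escape character and drops it.
plain="$(printf '%s\n' "$line" | { read value; printf '%s' "$value"; })"

# With -r, the line is stored exactly as written.
raw="$(printf '%s\n' "$line" | { read -r value; printf '%s' "$value"; })"

printf 'without -r: %s\n' "$plain"
printf 'with -r:    %s\n' "$raw"
```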
I need to query a certain value from each file in a directory and put the results in a file. I use this code:
#!/bin/bash
ls -lrt | grep -w "458752" | awk '{print $9}' | sort -V > list
for linename in cat list
do
/d/home/alima0152/Desktop/sqlite3 $linename "select trace_count from volume"; >> trc_count
done
rm list
But I get this error:
file is encrypted or is not a database
The loop iterates over the two literal words cat and list, so sqlite3 tries to open files named cat and list.
To execute something and insert its output, use `...` or $(...):
for linename in $(cat list)
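An alternative that also copes with file names containing spaces is to read the list line by line. The sketch below assumes nothing about the real databases: query_file is a hypothetical stand-in for the sqlite3 call, and the file names are sample data.

```shell
# Sample list file with one name per line (one name contains a space).
list="$(mktemp)"
printf 'vol 1.db\nvol2.db\n' > "$list"

# Hypothetical stand-in for:
#   sqlite3 "$1" "select trace_count from volume"
query_file() {
  printf 'querying %s\n' "$1"
}

# Read the list line by line so each name stays one argument.
run_queries() {
  while IFS= read -r linename; do
    query_file "$linename"
  done < "$1"
}

run_queries "$list"
rm -f "$list"
```

Note also that in the question's loop the ; before >> trc_count ends the sqlite3 command, so the redirect applies to an empty command and the query output is not appended to the file. Dropping the semicolon fixes that.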
I have a few tables in Hive that share the same prefix, like the ones below:
temp_table_name
temp_table_add
temp_table_area
There are a few hundred tables like this in my database, along with many other tables.
I want to delete the tables whose names start with "temp_table".
Does anyone know a query that can do this in Hive?
Hive's drop statement does not accept regular expressions (or at least I didn't find a way). But there are multiple ways to do it, for example:
With a shell script :
hive -e "show tables 'temp_*'" | xargs -I '{}' hive -e 'drop table {}'
Or by putting your tables in a specific database and dropping the whole database.
Create table temp.table_name;
Drop database temp cascade;
The above solutions are good, but if you have many tables to delete, running hive -e 'drop table ...' once per table is slow. So I used this:
hive -e 'use db;show tables' | grep pattern > file.hql
Open file.hql in vim and run the commands below:
:%s!^!drop table !
:%s!$!;!
Then run:
hive -f file.hql
This approach will be much faster.
My solution has been to use a bash script built with the following command:
hive -e "SHOW TABLES IN db LIKE 'schema*';" | grep "schema" | sed -e 's/^/hive -e \"DROP TABLE db\./1' | sed -e 's/$/\"/1' > script.sh
chmod +x script.sh
./script.sh
I was able to delete all tables using following steps in Apache Spark with Scala:
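import spark.implicits._ // needed for .as[String]; already in scope in spark-shell
// Option 1: select only the tables matching a pattern
val df = sql("SHOW TABLES IN default LIKE 'invoice*'").select("tableName")
// Option 2: select every table in the database (use instead of option 1)
// val df = sql("SHOW TABLES IN default").select("tableName")
val tableNameList: List[String] = df.as[String].collect().toList
tableNameList.foreach(tableName => sql(s"DROP TABLE ${tableName}"))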
As I had a lot of tables to drop, I used the following command, inspired by HorusH's answer:
hive -e "show tables 'table_prefix*'" | sed -e 's/^/DROP TABLE db_name./' | sed -e 's/$/;/' > script.sh
hive -f script.sh
Try this:
hive -e 'use sample_db;show tables' | xargs -I '{}' hive -e 'use sample_db;drop table {}'
The command below will also work.
hive -e 'show tables' | grep table_prefix | while read line; do hive -e "drop table $line"; done
The fastest solution, using one shell script:
drop_tables.sh pattern
Shell script content:
hive -e 'use db;show tables' | grep "$1" | sed 's/^/drop table db./' | sed 's/$/;/' > temp.hql
hive -f temp.hql
rm temp.hql
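Because grep matches the pattern anywhere in a table name, it can pick up tables that merely contain the prefix. The sketch below is one possible variation that matches only names starting with the prefix and emits DROP TABLE IF EXISTS. The helper name make_drop_hql and the database name db are illustrative, and the table names are sample data rather than real hive output.

```shell
# Hypothetical helper: reads table names on stdin and prints a DROP
# statement for each name that starts with the given prefix.
make_drop_hql() {
  db="$1"
  prefix="$2"
  while IFS= read -r table; do
    case "$table" in
      "$prefix"*) printf 'DROP TABLE IF EXISTS %s.%s;\n' "$db" "$table" ;;
    esac
  done
}

# Demo with inline names instead of real `show tables` output.
printf 'temp_table_name\nkeep_temp_table\ntemp_table_add\n' | make_drop_hql db temp_table
```

Assuming the same workflow as above, the generated statements could be written to temp.hql, reviewed, and then run once with hive -f temp.hql.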