Dropping multiple tables with same prefix in Hive - hadoop

I have a few tables in Hive that have the same prefix, like below:
temp_table_name
temp_table_add
temp_table_area
There are a few hundred tables like this in my database, along with many other tables.
I want to delete the tables whose names start with "temp_table".
Does anyone know a query that can do this in Hive?

There is no such thing as a regular expression for a drop query in Hive (or I didn't find one). But there are multiple ways to do it, for example:
With a shell script :
hive -e "show tables 'temp_*'" | xargs -I '{}' hive -e 'drop table {}'
Or by putting your temporary tables in a dedicated database and dropping the whole database:
CREATE TABLE temp.table_name ... ;
DROP DATABASE temp CASCADE;
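The xargs pipeline above can be previewed safely by substituting echo for the inner hive call (a dry run; the printf simulates the "show tables" output using the table names from the question):

```shell
# Dry run: echo stands in for the inner `hive -e` call, so nothing is dropped.
# The printf simulates the output of: hive -e "show tables 'temp_*'"
printf '%s\n' temp_table_name temp_table_add temp_table_area \
  | xargs -I '{}' echo "hive -e 'drop table {}'"
```

Once the printed commands look right, remove the echo so the real hive invocation runs.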

The solutions above are good, but if you have many tables to delete, running 'hive -e "drop table"' once per table is slow. So I used this:
hive -e 'use db;show tables' | grep pattern > file.hql
Then open file.hql in the vim editor and run the commands below to turn each table name into a DROP statement:
:%s/^/drop table /
:%s/$/;/
then run
hive -f file.hql
This approach will be much faster.
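If you'd rather skip the interactive vim step, the same file can be generated non-interactively with sed (a sketch; the printf stands in for the real "show tables" output, with made-up table names):

```shell
# Same edit as the two vim substitutions, done non-interactively.
# The printf simulates: hive -e 'use db;show tables' | grep pattern
printf '%s\n' temp_table_name temp_table_add \
  | sed -e 's/^/drop table /' -e 's/$/;/' > file.hql
cat file.hql
```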

My solution has been to use a bash script with the following command:
hive -e "SHOW TABLES IN db LIKE 'schema*';" | grep "schema" | sed -e 's/^/hive -e \"DROP TABLE db\./1' | sed -e 's/$/\"/1' > script.sh
chmod +x script.sh
./script.sh

I was able to delete all tables using following steps in Apache Spark with Scala:
val df = sql("SHOW TABLES IN default LIKE 'invoice*'").select("tableName") // only the tables matching the prefix
// or, to list every table in the database:
// val df = sql("SHOW TABLES IN default").select("tableName")
import spark.implicits._ // needed for df.as[String]
val tableNameList: List[String] = df.as[String].collect().toList
tableNameList.foreach(tableName => sql(s"DROP TABLE ${tableName}"))

As I had a lot of tables to drop, I used the following commands, inspired by HorusH's answer:
hive -e "show tables 'table_prefix*'" | sed -e 's/^/DROP TABLE db_name./' -e 's/$/;/' > script.hql
hive -f script.hql

Try this:
hive -e 'use sample_db;show tables' | xargs -I '{}' hive -e 'use sample_db;drop table {}'

The command below will also work:
hive -e 'show tables' | grep table_prefix | while read line; do hive -e "drop table $line"; done

Fastest solution, through one shell script:
drop_tables.sh pattern
Shell script content:
hive -e 'use db;show tables' | grep $1 | sed 's/^/drop table db./' | sed 's/$/;/' > temp.hql
hive -f temp.hql
rm temp.hql

Related

Shell Hive - Identify a column in entire Hive Database

The code below does not go into the loop if I change the first line to hive -S -e 'show databases like 'abc_xyz%''|
Can you please help me fix this issue?
hive -S -e 'show databases'|
while read database
do
eval "hive -S -e 'show tables in $database'"|
while read line
do
if eval "hive -S -e 'describe $database.$line'"| grep -q "<column_name>"; then
output="Required table name: $database.$line"'\n';
else
output=""'\n';
fi
echo -e "$output"
done
done
For Hive < 4.0.0, wildcards in the show databases pattern can only be '*' for any character(s) or '|' for a choice.
For example:
show databases like 'abc_xyz*|bcd_xyz*'
SQL-style patterns ('%' for any characters, '_' for a single character) only work since Hive 4.0.0, so on older versions a pattern like 'abc_xyz%' typically matches nothing and the loop has no databases to iterate over.

shell read and pass multiple variables in for or while loop

Below is my input file and the code I am using
FILE :
cat $TESTFILE
2020-01-13,COST_CH_RPT
2018-04-19,LOSS_CH_RPT
CODE :
for i in `cat $TESTFILE`
do
export date=`cat $TESTFILE|cut -d',' -f1`
echo date=$date
export Name=`cat $TESTFILE|cut -d',' -f2`
echo Name=$Name
beeline --outputformat=csv2 --hivevar Name=$Name --hivevar date=$date -u ${beeURL} -f TEST.hql
done
The objective is to run the hql file for every line in the input file. The code above does run twice for the two lines, but the variables passed in both runs are the same: the values from the first line of the file.
How can I differentiate the input variables for each run?
As noted in a comment, the current solution re-processes the whole TESTFILE on every iteration, which is incorrect. A simpler alternative is to use 'read' to loop through the lines:
while IFS=, read date Name ; do
echo beeline --outputformat=csv2 --hivevar Name="$Name" --hivevar date="$date" -u "${beeURL}" -f TEST.hql
done < $TESTFILE
This simply iterates over the lines of TESTFILE and prints the beeline command for each one (drop the echo to actually execute it). The quotes protect against problems in the input file, in particular spaces, which would otherwise break the command line.
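A minimal, self-contained demo of the read loop using the sample file contents from the question (echo stands in for the beeline call):

```shell
# Build a sample input file matching the question's format.
printf '2020-01-13,COST_CH_RPT\n2018-04-19,LOSS_CH_RPT\n' > testfile.csv

# IFS=, splits each line on the comma into the two variables.
while IFS=, read -r date Name; do
  echo "date=$date Name=$Name"
done < testfile.csv

rm testfile.csv
```

Each iteration sees only its own line, which is exactly what the original cat/cut approach failed to do.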

ldapsearch script with an input file

I have a file with a list of users. I would like to query our company's ldap to check if the users on my list are still existing accounts on the company's ldap server.
The bash script would essentially use the names from the file to check against LDAP's 'cn' attribute, then output the results to identify which names no longer exist.
It sounds simple, and I'm familiar with basic ldapsearch commands, but I'm not sure how to begin scripting this.
Appreciate all the help!
I have done this exact task and my approach is this: Do the ldapsearch query and get all emails for valid users in ldap. Convert to lower case, sort, remove duplicates and store in a file. Do the same with your list of users you want to check. Then use comm to find any emails that are not in the list from LDAP. This method should be the fastest unless you have a large number of LDAP records. Here is the code:
LDAP_SERVER="ldap://YOUR.LDAP.SERVER:389"
LDAP_USER="QUERY_USER_NAME"
LDAP_PASSWORD="QUERY_PASSWORD"
ldapsearch -E pr=1000/noprompt -LLL -o ldif-wrap=no -x \
-b 'dc=example,dc=com' \
-H $LDAP_SERVER -D $LDAP_USER -w $LDAP_PASSWORD \
'(&(objectCategory=person)(objectClass=user)(Mail=*))' mail |\
awk '/^mail:/{print $2}' |\
tr '[:upper:]' '[:lower:]' |\
sort -u >ldap_emails
cat user_list |\
tr '[:upper:]' '[:lower:]' |\
sort -u >user_emails
comm -13 ldap_emails user_emails
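The comm -13 step is the crux: with both inputs sorted, it prints only the lines unique to the second file, i.e. users from your list that LDAP doesn't know about. A tiny demo with made-up addresses:

```shell
# Both files must be sorted for comm to work correctly.
printf 'alice@example.com\nbob@example.com\n' > ldap_emails
printf 'bob@example.com\ncarol@example.com\n' > user_emails

# -1 suppresses lines only in ldap_emails, -3 suppresses lines in both,
# leaving only lines unique to user_emails: accounts missing from LDAP.
comm -13 ldap_emails user_emails

rm ldap_emails user_emails
```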

How do I convert a tab separated file into comma separated file in Mac OSX using sed?

I am doing a MySQL query from the terminal and trying to convert the output from a tab-separated file to a comma-separated file. I've tried the following with no luck:
... mysql query | sed 's/\t/,/g'
... mysql query | sed 's/\\t/","/g'
... mysql query | sed 's/\\t/\\",\\"/g'
and various combinations of these with no luck.
I was able to find the solution here. You have to insert a literal tab by typing "ctrl+v" and then pressing the "tab" key. The BSD sed that ships with Mac OSX doesn't support the \t escape in patterns.
... mysql query | sed 's/ /,/g'
Another, IMHO simpler, option is to use tr to transliterate tabs into commas, like this:
mysql query | tr '\t' ','
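A quick check of the tr approach with a literal tab-separated line:

```shell
# tr replaces every tab byte with a comma; no regex (and no \t quirks) involved.
printf 'col1\tcol2\tcol3\n' | tr '\t' ','
```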

How to get all table definitions in a database in Hive?

I am looking to get all table definitions in Hive. I know that for single table definition I can use something like -
describe <<table_name>>
describe extended <<table_name>>
But I couldn't find a way to get all table definitions at once. Is there a table in the metastore, similar to information_schema in MySQL, or is there a command to get all table definitions?
You can do this by writing a simple bash script and some bash commands.
First, write all table names in a database to a text file using:
$hive -e 'show tables in <dbname>' | tee tables.txt
Then create a bash script (describe_tables.sh) to loop over each table in this list:
while read line
do
echo "$line"
eval "hive -e 'describe <dbname>.$line'"
done
Then execute the script:
$chmod +x describe_tables.sh
$./describe_tables.sh < tables.txt > definitions.txt
The definitions.txt file will contain all the table definitions.
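The loop in describe_tables.sh can be sanity-checked with echo in place of the hive call (a dry run; "mydb" and the table names below are placeholders):

```shell
# Simulated tables.txt, as produced by: hive -e 'show tables in <dbname>'
printf '%s\n' orders customers > tables.txt

# Dry run: print the hive command that would be executed for each table.
while read -r line; do
  echo "hive -e 'describe mydb.$line'"
done < tables.txt

rm tables.txt
```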
The above process works, but it will be slow because a new Hive connection is made for each query. Instead, you can do what I did for the same need:
Use one of the above methods to get your list of tables.
Then modify the list to make it a hive query for each table as follows:
describe my_table_01;
describe my_TABLE_02;
So you will have a flat file with all the describe statements mentioned above, for example in a file called my_table_description.hql.
Get the output in one pass as follows:
hive -f my_table_description.hql > my_table_description.output
It is super fast and gets the output in one shot.
Fetch the list of Hive databases:
hive -e 'show databases' > hive_databases.txt
Then echo each table's description:
cat hive_databases.txt | grep -v '^$' | while read LINE;
do
echo "## TableName:" $LINE
eval "hive -e 'show tables in $LINE' | grep -v ^$ | grep -v Logging | grep -v tab_name | tee $LINE.tables.txt"
cat $LINE.tables.txt | while read table
do
echo "### $LINE.$table" > $LINE.$table.desc.md
eval "hive -e 'describe $LINE.$table'" >> $LINE.$table.desc.md
sed -i 's/\t/|/g' ./$LINE.$table.desc.md
sed -i 's/comment/comment\n|:--:|:--:|:--:|/g' ./$LINE.$table.desc.md
done
done
