Bash script help/evaluation - bash

I'm trying to learn some scripting, but I can't find a solution for one piece of functionality.
Basically I would like to ask you to evaluate my script, as it's probably possible to reduce its complexity and number of lines.
The purpose of this script is to download random, encrypted MySQL backups from Amazon S3, restore the dump and run some random MySQL queries.
I'm not sure how to email the output of the printf statements - one is for the headers and the second one for the actual data. I've tried to format the output so it looks like below, but I had to exclude the headers from the loop:
Database: Table: Entries:
database1 random_table 0
database2 random_table 0
database3 random_table 0
database4 random_table 0
I would like to include this output in the email and also change the email subject based on the success/failure of the script.
I probably use too many if blocks, and the MySQL queries are probably too complicated.
Script:
#!/usr/bin/env bash
# DB Details:
db_user="user"
db_pass="password"
db_host="localhost"
# Date
date_stamp=$(date +%d%m%Y)
# Initial Setup
data_dir="/tmp/backup"
# Checks
if [ ! -e /usr/bin/s3cmd ]; then
    echo "Required package (http://s3tools.org/s3cmd)"
    exit 2
fi
if [ -e /usr/bin/gpg ]; then
    gpg_key=$(gpg -K | tr -d "{<,>}" | awk '/an@example.com/ { print $4 }')
    if [ "$gpg_key" != "an@example.com" ]; then
        echo "No GPG key"
        exit 2
    fi
else
    echo "No GPG package"
    exit 2
fi
if [ -d $data_dir ]; then
    rm -rf $data_dir/* && chmod 700 $data_dir
else
    mkdir $data_dir && chmod 700 $data_dir
fi
# S3 buckets
bucket_1=s3://test/
# Download backup
for backup in $(s3cmd ls s3://test/ | awk '{ print $2 }')
do
    latest=$(s3cmd ls $backup | awk '{ print $2 }' | sed -n '$p')
    random=$(s3cmd ls $latest | shuf | awk '{ print $4 }' | sed -n '1p')
    s3cmd get $random $data_dir >/dev/null 2>&1
done
# Decrypting Files
for file in $(ls -A $data_dir)
do
    filename=$(echo $file | sed 's/\.e//')
    gpg --out $data_dir/$filename --decrypt $data_dir/$file >/dev/null 2>&1 && rm -f $data_dir/$file
    if [ $? -eq 0 ]; then
        # Decompressing Files
        bzip2 -d $data_dir/$filename
        if [ $? -ne 0 ]; then
            echo "Decompression Failed!"
        fi
    else
        echo "Decryption Failed!"
        exit 2
    fi
done
# MySQL Restore
printf "%-40s%-30s%-30s\n\n" Database: Table: Entries:
for dump in $(ls -A $data_dir)
do
    mysql -h $db_host -u $db_user -p$db_pass < $data_dir/$dump
    if [ $? -eq 0 ]; then
        # Random DBs query
        db=$(echo $dump | sed 's/\.sql//')
        random_table=$(mysql -h $db_host -u $db_user -p$db_pass $db -e "SHOW TABLES" | grep -v 'Tables' | shuf | sed -n '1p')
        db_entries=$(mysql -h $db_host -u $db_user -p$db_pass $db -e "SELECT * FROM $random_table" | grep -v 'id' | wc -l)
        printf "%-40s%-30s%-30s\n" $db $random_table $db_entries
        mysql -h $db_host -u $db_user -p$db_pass -e "DROP DATABASE $db"
    else
        echo "The system was unable to restore backups!"
        rm -rf $data_dir
        exit 2
    fi
done
# Remove backups
rm -rf $data_dir

Move this out of the loop:
random_tables=$(mysql -h $db_host -u $db_user -p$db_pass $db -e "SHOW TABLES" | grep -v 'Tables')
table_nb=$(wc -l <<<"$random_tables")
and use this inside the loop:
random_table=$(sed -n $((RANDOM%table_nb+1))p <<<"$random_tables")
One remark: $? is the status of the latest command executed, so after "&& rm" it will be the status of the rm, not of the decrypt.
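To branch on the decryption status itself rather than on the rm, one option (a minimal sketch reusing the variables from the script above) is to test the gpg command directly and only delete the encrypted file on success:

if gpg --out "$data_dir/$filename" --decrypt "$data_dir/$file" >/dev/null 2>&1; then
    rm -f "$data_dir/$file"    # remove the encrypted copy only after a successful decrypt
    bzip2 -d "$data_dir/$filename" || echo "Decompression Failed!"
else
    echo "Decryption Failed!"
    exit 2
fi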

Related

How to pipe aws s3 cp to gzip to be used with "$QUERY" | psql utility

I have the following command
"$QUERY" | psql -h $DB_HOST -p $DB_PORT -U $DB_USERNAME $DB_NAME
Where $QUERY is a command that loads files from a bucket, unzips them, and puts the data into the database. It looks like the following:
COPY my_table
FROM PROGRAM 'readarray -t files <<<"$(aws s3 ls ${BUCKET_PATH} | tr [:space:] "\n")"; for (( n = ${#files[@]} - 1; n >= 0; n--)); do if [[ ${files[$n]} =~ .csv.gz$ ]]; then aws s3 cp ${BUCKET_PATH}${files[$n]} >(gzip -d -c); break; fi done'
WITH DELIMITER ',' CSV
Here is the formatted bash code:
#!/usr/bin/env bash
raw_files=`aws s3 ls ${BUCKET_PATH} | tr [:space:] "\n"`
readarray -t files <<<"$raw_files"
for (( n = ${#files[@]} - 1; n >= 0; n--)); do
    if [[ ${files[$n]} =~ .csv.gz$ ]]; then
        aws s3 cp ${BUCKET_PATH}${files[$n]} >(gzip -d -c);
        break; # for test purposes, load just one file instead of all of them
    fi
done
aws-CLI version
#: aws --version
#: aws-cli/1.11.13 Python/3.5.2 Linux/4.13.0-43-generic botocore/1.4.70
This script works. But when I try to use it with psql, it fails, and I cannot understand why.
How can I fix it?
Here is a script that loads data from the s3 bucket and merges it into one fat file:
#!/usr/bin/env bash
bucket_path=$1
limit_files=$2
target_file_name=$3
echo "Source bucket $bucket_path"
if [ -z $target_file_name ]; then
    target_file_name="fat.csv.gz"
    echo "Default target file $target_file_name"
fi
echo "Total files $(aws s3 ls $bucket_path | wc -l)"
readarray -t files <<<"$(aws s3 ls $bucket_path | tr [:space:] "\n")"
for (( n = ${#files[@]} - 1, i=1; n >= 0; n--)); do
    if [[ ${files[$n]} =~ .csv.gz$ ]]; then
        aws s3 cp --quiet $bucket_path${files[$n]} >(cat >> "$target_file_name");
        echo "$((i++)), ${files[$n]}, current size: $(du -sh $target_file_name)"
        if [ ! -z $limit_files ] && [ $i -gt $limit_files ]; then
            echo "Final size $(du -sh $target_file_name)"
            exit 0
        fi
    fi
done
exit 0
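For reference, with the positional parameters above the script would be invoked along these lines (the script name and bucket are placeholders of mine):

# merge at most 10 .csv.gz objects from the bucket into combined.csv.gz
./merge_s3_files.sh s3://my-bucket/exports/ 10 combined.csv.gz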
It works correctly.
But when I try to pipe this fat.csv.gz to the psql db using the following code
echo "COPY my_table
FROM PROGRAM 'gzip -d -c fat.csv.gz'
WITH DELIMITER ',' CSV" | psql -h $DB_HOST -p $DB_PORT -U $DB_USERNAME $DB_NAME
I am getting the error:
ERROR: must be superuser to COPY to or from a file
It looks like this is specific to how pg works (I guess it's for security reasons) - link
So the problem now is that I don't know how to rework my script so that it pipes the fat.csv.gz. I cannot get that privilege, so I need to find a workaround.
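The workaround that avoids the superuser requirement is the one the script below ends up using: decompress on the client side and feed the data to COPY ... FROM STDIN through psql's standard input. In its simplest form (a sketch using the names from the failing command) that is:

gzip -d -c fat.csv.gz | \
    psql -h $DB_HOST -p $DB_PORT -U $DB_USERNAME $DB_NAME \
    -c "COPY my_table FROM STDIN WITH DELIMITER ',' CSV"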
I finally wrote the following bash script. It downloads files from s3, merges them into 50MB archives and pipes them to pg in subprocesses. I hope it will be helpful for somebody:
get_current_timestamp() (
date '+%s.%N'
)
execute_sql() (
write_log "Importing data from s3 to pg..."
import_data_from_s3 "$EVENTS_PATH"
write_log "Importing data from s3 to pg...done"
)
columns() (
local columns=`echo "SELECT array_to_string(
array(SELECT column_name::text
FROM information_schema.columns
WHERE table_name ILIKE '${TMP_TABLE}'
AND column_name NOT ILIKE '${DATE_FIELD}'), ',')" | \
psql --tuples-only -h $DB_HOST -p $DB_PORT -U $DB_USERNAME $DB_NAME`
echo -n "${columns}"
)
get_timestamp_difference() (
FROM=$1
TO=$2
echo $FROM $TO | awk '{
diff = $2-$1
if (diff >= 86400) {
printf "%i days ", diff/86400
}
if (diff >= 3600) {
printf "%i hours ", (diff/3600)%24
}
if (diff >= 60) {
printf "%i mins ", (diff/60)%60
}
printf "%f secs", diff%60
}'
)
pretty_size() (
if [ ! -z $1 ]; then
local size=$1;
else
local size=`cat <&0`;
fi
echo "${size}" | \
awk '{ \
split( "B KB MB GB" , v ); \
s=1; \
while( $1>=1024 ) { \
$1/=1024; s++ \
} \
printf "%.1f%s", $1, v[s] \
}' | \
add_missing_eol >&1
)
import_data_from_s3() (
local bucket_path=$1
local limit_files=$2
local target_file_name=$3
write_log "Source bucket $bucket_path"
if [ -z ${target_file_name} ]; then
target_file_name="fat.csv.gz"
write_log "Default target file $target_file_name"
fi
if [ ! -z ${limit_files} ]; then
write_log "Import ${limit_files} files"
else
write_log "Import all files"
fi
write_log "Total files $(aws s3 ls $bucket_path | wc -l)"
readarray -t files <<<"$(aws s3 ls $bucket_path | tr [:space:] "\n")"
write_log "Remove old data files..."
find . -maxdepth 1 -type f -name "*${target_file_name}" -execdir rm -f {} +;
write_log "Remove old data files...done"
TMP_TABLE_COLUMNS=$(columns)
write_log "Importing columns: ${DW_EVENTS_TMP_TABLE_COLUMNS}"
declare -A pids
local total_data_amount=0
local file_size_bytes=0
local size_limit=$((50*1024*1024))
for (( n = ${#files[@]} - 1, file_counter=1, fat_file_counter=1; n >= 0; n--)); do
if [[ ! ${files[$n]} =~ .csv.gz$ ]]; then continue; fi
file="${fat_file_counter}-${target_file_name}"
aws s3 cp --quiet ${bucket_path}${files[$n]} >(cat >> "${file}");
file_size_bytes=$(stat -c%s "$file")
if [ $file_size_bytes -gt $size_limit ]; then
import_zip "${file}" "$(pretty_size ${file_size_bytes})" & pids["${file}"]=$!;
total_data_amount=$((total_data_amount+file_size_bytes))
write_log "Files read: ${file_counter}, total size(zipped): $(pretty_size ${total_data_amount})"
((fat_file_counter++))
fi
# write_log "${file_counter}, ${files[$n]}, current size: $(du -sh $file)"
if [ ! -z ${limit_files} ] && [ ${file_counter} -gt ${limit_files} ]; then
write_log "Final size $(du -sh ${file})"
if [ ! ${pids["${file}"]+0} ]; then
import_zip "${file}" "$(pretty_size ${file_size_bytes})" & pids["${file}"]=$!;
fi
break;
fi
((file_counter++))
done
# import the remaining file, which can be smaller than the size limit
if [ ! ${pids["${file}"]+0} ]; then
import_zip "${file}" "$(pretty_size ${file_size_bytes})" & pids["${file}"]=$!;
fi
write_log "Waiting for all pids: ${pids[*]}"
for pid in ${pids[*]}; do
wait $pid
done
write_log "All sub process have finished. Total size(zipped): $(pretty_size ${total_data_amount})"
)
import_zip() (
local file=$1
local size=$2
local start_time=`get_current_timestamp`
write_log "pid: $!, size: ${size}, importing ${file}...";
gzip -d -c ${file} | \
psql --quiet -h ${DB_HOST} -p ${DB_PORT} -U ${DB_USERNAME} ${DB_NAME} \
-c "COPY ${TMP_TABLE}(${TMP_TABLE_COLUMNS})
FROM STDIN
WITH DELIMITER ',' CSV";
rm $file;
local end_time=`get_current_timestamp`
write_log "pid: $!, time: `get_timestamp_difference ${start_time} ${end_time}`, size: ${size}, importing ${file}...done";
)
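Note that the script references write_log and add_missing_eol, which are not shown above. Minimal placeholder implementations (my guess at their behaviour, not the originals) could be:

write_log() (
    # timestamped progress message to stderr, so it never mixes with data on stdout
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >&2
)
add_missing_eol() (
    # pass stdin through, making sure the last line ends with a newline
    awk '1'
)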

Invalidate metadata gives TableNotFoundException although I can see the table in the list of tables

I am trying to run a script that validates whether all the tables that are supposed to be created as part of my deployment actually exist. This is my script:
set -x
ENV=$1
. /user/setenv.sh
ticket=""
wlf_kinit ticket
echo "Checking if Kerberos Ticket is available.."
klist $ticket
if [ $? -eq 1 ]; then
echo "Kerberos Ticket not found..Exiting"
exit 1
fi
echo "Validating Hadoop tables" > /tmp/psk1/db_validation_log.txt
db_dir=/user/db
cd $db_dir
for current_directory in `find . -maxdepth 1 -type d`
do
#current_directory=`echo $current_directory | awk -F '/' '{print $2}'`
echo $current_directory
if [ "$current_directory" != "." ]; then
current_directory=`echo $current_directory | awk -F '/' '{print $2}'`
if [ "${current_directory:0:1}" = "v" ]; then
dir=$db_dir/$current_directory/views
else
dir=$db_dir/$current_directory/tables
fi
cd $dir
pwd
#echo "Validating tables in "$current_directory >> /tmp/psk1/db_validation_log.txt
find . -name '*.hql' | while read rec; do
echo $rec
tbl_name=`echo $rec | awk -F '/' '{print $2}' | awk -F '.' '{print $1}'`
result=$(impala-shell --quiet --delimited --ssl -i ${impala_host} -ku ${user_id}${impala_realm} -q "set request_pool = ${request_pool}; use $current_directory$ENV; invalidate metadata $tbl_name; show tables like '$tbl_name';") 2>> /tmp/psk1/db_validation_log.txt
#echo $result
if [ $? -eq 0 ]; then
if [ ${result} == ${tbl_name} ]; then
echo "$tbl_name exists" #>> /tmp/psk1/db_validation_log.txt
else
echo "$tbl_name does not exist" #>> /tmp/psk1/db_validation_log.txt
fi
else
echo $current_directory$ENV"."$tbl_name" Query error" >> /tmp/psk1/db_validation_log.txt
impala-shell --quiet --delimited --ssl -i ${impala_host} -ku ${user_id}${impala_realm} -q "set request_pool = ${request_pool}; use $current_directory$ENV; invalidate metadata $tbl_name; show tables like '$tbl_name';" 2>> /tmp/psk1/db_validation_log.txt
fi
done
fi
done
cat /tmp/psk1/db_validation_log.txt | mail -a /tmp/psk1/db_validation_log.txt -s 'Hadoop DB validation completed. Check the attached log.' -r ${from_email} ${to_email} 2>> /tmp/psk1/db_validation_log.txt
kdestroy -c $ticket
This script prints the failed queries to a .txt file and sends it in an email. In the text file I see some of the queries failing with TableNotFoundException. But when I open the impala shell and list out the tables, I am able to see the table in the list. I am not sure what is causing this inconsistency.
Any help would be appreciated. Thank you.

hide output with errors from stdout

I need to hide the error output from this script:
for s in $s_list; do
    if [ "${s}" = "test" ]; then
        db_status=$(mysql -h localhost -P 3306 -u test -ptest -e "show create database test;" | awk {'print $1'} | tail -n 1 )
        db_status_error=$(mysql -h localhost -P 3306 -u test -ptest -e "show create database test;" 2>&1 | awk {'print $1'} | tail -n 1 )
        # echo $db_status_error
        if [ "$db_status" == "test" ]; then
            echo "Database exist, need wait..."
            sleep 2;
        elif [ "$db_status_error" == "ERROR" ] < /dev/null > /dev/null 2>&1 ; then
            echo "Database does not exist"
            sleep 2;
            exit 0
        fi
    fi
done
The result is:
ERROR 1049 (42000) at line 1: Unknown database 'test'
Database does not exist
I just need the line with "Database does not exist".
You need to redirect stderr to /dev/null when you set db_status:
db_status=$(mysql -h localhost -P 3306 -u test -ptest -e "show create database test;" 2> /dev/null | awk {'print $1'} | tail -n 1 )
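The difference between the two redirections matters: 2> /dev/null discards the error text before the pipe, while 2>&1 (used for db_status_error) folds it into the captured output so it can be tested without reaching the terminal. Side by side, with the same query:

# errors discarded: the variable only ever holds real query output
db_status=$(mysql -h localhost -P 3306 -u test -ptest -e "show create database test;" 2> /dev/null | awk '{print $1}' | tail -n 1)
# errors merged into stdout: the variable can contain "ERROR", but nothing is printed
db_status_error=$(mysql -h localhost -P 3306 -u test -ptest -e "show create database test;" 2>&1 | awk '{print $1}' | tail -n 1)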

Bash script help/evaluation

You'll get the best answers if you ask specific questions (rather than "please review my code")...and if you limit each post to a single question. Regarding emailing the output of your printf statements:
You can group statements into a block and then pipe the output of a block into another program. For example:
{
    echo "This is a header"
    echo
    for x in {1..10}; do
        echo "This is row $x"
    done
} | mail -s "Here is my output" lars@example.com
If you want to make the email subject conditional upon the success or failure of something elsewhere in the script, you can (a) save your output to a file, and then (b) email the file after building the subject line:
{
    echo "This is a header"
    echo
    for x in {1..10}; do
        echo "This is row $x"
    done
} > output
if is_success; then
    subject="SUCCESS: Here is your output"
else
    subject="FAILURE: Here are your errors"
fi
mail -s "$subject" lars@example.com < output

Bulk account creation from text file

I have the following script. It reads the first and second fields from users.txt, uses them to generate a username and password, and creates an account for each line. The problem is that the script only creates accounts for the first 2 lines and not for the rest.
#!/bin/bash
FILE=/home/knoppix/users.txt
USERSH=/bin/bash
while IFS=":" read GECOS USRGRP ; do
    groupadd -f $USRGRP
    USERNM="u$(cat /dev/urandom| tr -dc '0-9' | fold -w 6| head -n 1)"
    USERPW=$(cat /dev/urandom| tr -dc 'a-zA-Z0-9' | fold -w 6| head -n 1)
    useradd $USERNM -p $USERPW -g $USRGRP -c "$GECOS,$USRGRP" -d $HOME/$USERNM -s $USERSH -m
    ACCNT=$(grep $USRNM /etc/passwd)
    echo "${tgrn}Account creation successful!${tr}"
    echo "$ACCNT"
    echo "Credentials"
    echo "${tred}Username:${tr} $USERNM ${tred}Password:${tr} $USERPW"
    echo
done < $FILE
#!/bin/bash
while IFS=: read GECOS USRGRP; do
    # your groupadd and useradd commands here
done < /home/knoppix/users.txt
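If the early exit is caused by a command inside the loop swallowing the remaining input lines, a common defensive pattern (just a sketch, not tested against this exact setup) is to read the user list on a dedicated file descriptor so the loop body cannot consume it:

#!/bin/bash
while IFS=":" read -r -u 9 GECOS USRGRP; do
    # groupadd/useradd and the echo reporting go here; they still see the normal stdin,
    # while the user list is delivered on file descriptor 9
    echo "creating account for $GECOS in group $USRGRP"
done 9< /home/knoppix/users.txt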
#!/bin/bash
for line in $file; do
    # make the account
done
rm $file
touch $file
