Is there a way to automate the Redshift VACUUM process through a UDF? - aws-lambda

I have more than 300 tables in Redshift.
Data gets updated on a daily basis, and I just want to know whether I can create a UDF in Redshift to automate the VACUUM process.
I found a link that automates this using Python, but I am not a great Python coder, so I am looking for a solution in SQL script.

Unfortunately, you can't use a UDF for something like this; UDFs are simple input/output functions meant to be used in queries.
Your best bet is to use this open-source tool from AWS Labs: AnalyzeVacuumUtility. The great thing about this tool is that it is smart about only running VACUUM on tables that need it, and it will also run ANALYZE on tables that need it.
It's pretty easy to set up as a cron job. Here is an example of how it can be done:
Pull the amazon-redshift-utils repo in git:
git clone https://github.com/awslabs/amazon-redshift-utils
cd amazon-redshift-utils
Create a script that can be run by cron. In your text editor, create a file called run_vacuum_analyze.sh with the following, and fill in the values for your environment:
export REDSHIFT_USER=<your db user name>
export REDSHIFT_PASSWORD=<your db password>
export REDSHIFT_DB=<your db>
export REDSHIFT_HOST=<your redshift host>
export REDSHIFT_PORT=<your redshift port>
export WORKSPACE=$PWD/src/AnalyzeVacuumUtility
#
# VIRTUALENV
#
rm -rf $WORKSPACE/ve1
virtualenv -p python2.6 "$WORKSPACE/ve1"
# enter the virtualenv
source $WORKSPACE/ve1/bin/activate
#
# DEPENDENCIES
#
pip install PyGreSQL
cd $WORKSPACE/run
#
# RUN IT
#
python analyze-vacuum-schema.py --db $REDSHIFT_DB --db-user $REDSHIFT_USER --db-pwd $REDSHIFT_PASSWORD --db-port $REDSHIFT_PORT --db-host $REDSHIFT_HOST
Then create a cron job that will run this script (in this example, I run it daily at 2:30 AM):
chmod +x run_vacuum_analyze.sh
crontab -e
Add the following entry:
30 2 * * * <path-to-the-cloned-repo>/run_vacuum_analyze.sh

You CANNOT use a UDF for this; UDFs cannot run commands that update data.

Yes, I have created an AWS Lambda function in Java and used a CloudWatch Events rule to schedule it with a cron expression. An AWS Lambda function in Java expects a shaded JAR to be uploaded. I created environment variables in the Lambda function for the Redshift connection properties, which are passed into the Java handler.
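If you go this route, the CloudWatch Events schedule itself can be wired up from the shell. A rough sketch with the AWS CLI; the rule name, function name, region, and account ID below are hypothetical placeholders, not values from the original setup, and the Java handler itself is not shown:
# Hypothetical names/ARNs -- replace with your own. cron(30 2 * * ? *) = daily at 02:30 UTC.
aws events put-rule \
  --name redshift-vacuum-nightly \
  --schedule-expression 'cron(30 2 * * ? *)'
aws lambda add-permission \
  --function-name redshift-vacuum \
  --statement-id redshift-vacuum-nightly \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:<region>:<account-id>:rule/redshift-vacuum-nightly
aws events put-targets \
  --rule redshift-vacuum-nightly \
  --targets 'Id=1,Arn=arn:aws:lambda:<region>:<account-id>:function:redshift-vacuum'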

Now you can use auto vacuum; Redshift provides this option out of the box.
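Automatic vacuum runs in the background, so there is nothing to schedule yourself. If you want to check whether it is keeping up, one rough way (a sketch, assuming psql and placeholder connection details) is to look at the unsorted and stale-stats percentages in svv_table_info:
# Placeholder endpoint/credentials -- 5439 is the default Redshift port.
psql -h <redshift-endpoint> -p 5439 -U <user> -d <db> \
  -c 'select "schema", "table", unsorted, stats_off from svv_table_info order by unsorted desc limit 20;'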

Here is my shell script utility to automate this with better control over the table filters.
https://thedataguy.in/automate-redshift-vacuum-analyze-using-shell-script-utility/
Example Commands:
Run vacuum and Analyze on all the tables.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev
Run vacuum and Analyze on the schema sc1, sc2.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s 'sc1,sc2'
Run vacuum FULL on all the tables in all schemas except the schema sc1, but skip Analyze.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -k sc1 -o FULL -a 0 -v 1
or
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -k sc1 -o FULL -a 0
Run Analyze only on all the tables except the tables tbl1,tbl3.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -b 'tbl1,tbl3' -a 1 -v 0
or
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -b 'tbl1,tbl3' -v 0
Use a password on the command line.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -P bhuvipassword
Run vacuum and analyze on the tables where unsorted rows are greater than 10%.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -v 1 -a 1 -x 10
or
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -x 10
Run the Analyze on all the tables in schema sc1 where stats_off is greater than 5.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -v 0 -a 1 -f 5
Run the vacuum only on the table tbl1 which is in the schema sc1 with the Vacuum threshold 90%.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s sc1 -t tbl1 -a 0 -c 90
Run analyze only the schema sc1 but set the analyze_threshold_percent=0.01
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s sc1 -t tbl1 -a 1 -v 0 -r 0.01
Do a dry run (generate SQL queries) for analyze all the tables on the schema sc2.
./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s sc2 -z 1
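To run the utility on a schedule, a crontab entry like the one in the accepted answer works here too. A sketch, assuming the script lives in /opt/redshift and reusing the -x/-f filters documented above; adjust the path and flags for your environment:
# Daily at 02:30: vacuum tables with >10% unsorted rows, analyze tables with stats_off > 5, keep a log.
30 2 * * * /opt/redshift/vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -x 10 -f 5 >> /var/log/vacuum-analyze.log 2>&1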

Related

Searching for a solution to write a log file with pg_restore

I get a conflict error with -d and -f together; do you have a solution?
pg_restore -d mydb -h myhost --clean --verbose c:\dba\Manager\tk\Tasks\import-tk-21aug2022\budget-app-sara-21.8_updated_withdata -f c:\dba\manager\tk\restore-tk-21aug2022.log
-f is not for a log file. -f says instead of restoring the dump to a database, write a SQL script which will do the restore.
-d says to restore the dump to the given database.
They say to do two different things, so they conflict. You have to decide which one you want to do.
See the pg_restore docs.
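If the goal is simply to keep a log of the restore, keep -d and redirect the --verbose output (which pg_restore writes to stderr) to a file instead of using -f. A sketch reusing the paths from the question:
pg_restore -d mydb -h myhost --clean --verbose c:\dba\Manager\tk\Tasks\import-tk-21aug2022\budget-app-sara-21.8_updated_withdata 2> c:\dba\manager\tk\restore-tk-21aug2022.log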

Pass an unknown number of arguments from the command line to a Makefile

I have a Docker image that I want to run locally, and to make my life easier I am using a Makefile to pass the AWS environment variables.
aws_access_key_id := $(shell aws configure get aws_access_key_id)
aws_secret_access_key := $(shell aws configure get aws_secret_access_key)
aws_region := $(shell aws configure get region)
docker-run:
	docker run -e AWS_ACCESS_KEY_ID="$(aws_access_key_id)" -e AWS_SECRET_ACCESS_KEY="$(aws_secret_access_key)" -e AWS_DEFAULT_REGION="$(aws_region)" --rm mydocker-image
And I need to find a way to do something like this in my terminal
make docker-run -d my_db -s dev -t my_table -u my_user -i URI://redshift
make docker-run --pre-actions "delete from dev.my_table where first_name = 'John'" -s dev -t my_table
make docker-run -s3 s3://temp-parquet/avro/ -s dev -t my_table -u myuser -i URI://redshift
These are the arguments that my Docker container (a Python application using argparse) will accept.
You can't do that, directly. The command line arguments to make are parsed by make, and must be valid make program command line arguments. Makefiles are not shell scripts and make is not a general interpreter: there's no facility for passing arbitrary options to it.
You can do this by putting them into a variable, like this:
make docker-run DOCKER_ARGS="-d my_db -s dev -t my_table -u my_user -i URI://redshift"
make docker-run DOCKER_ARGS="-d my_db -s dev -t my_table"
then use $(DOCKER_ARGS) in your makefile. But that's the only way.
If you want to do argument parsing yourself, you probably don't want a Makefile! You should probably write a Bash script instead.
Example:
#!/usr/bin/env bash
set -euo pipefail
aws_access_key_id="$(aws configure get aws_access_key_id)"
aws_secret_access_key="$(aws configure get aws_secret_access_key)"
aws_region="$(aws configure get region)"
docker run -e AWS_ACCESS_KEY_ID="$aws_access_key_id" -e AWS_SECRET_ACCESS_KEY="$aws_secret_access_key" -e AWS_DEFAULT_REGION="$aws_region" --rm mydocker-image "$@"
Note the "$@" at the end, which passes the arguments from Bash through to the docker command.
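Assuming you save the script above as, say, docker-run.sh (a hypothetical name) and chmod +x it, the invocations from the question become:
./docker-run.sh -d my_db -s dev -t my_table -u my_user -i URI://redshift
./docker-run.sh --pre-actions "delete from dev.my_table where first_name = 'John'" -s dev -t my_table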
You might want to try something like:
$ cat Makefile
all:
	@echo make docker-run -d my_db -s dev -t my_table $${MYUSER+-u "$(MYUSER)"} $${URI+-i "URI://$(URI)"}
$ make
make docker-run -d my_db -s dev -t my_table
$ make MYUSER=myuser URI=redshift
make docker-run -d my_db -s dev -t my_table -u myuser -i URI://redshift

Issue with manually giving password every time with psql

We are trying to migrate data from one Amazon RDS database to an Amazon Aurora Serverless database using PostgreSQL's psql with the COPY command. The script works fine when I run it from an EC2 instance, but I have to enter the passwords for rdswizard and postgres manually on every iteration. I just want to supply the password along with my psql command. How can I give the password with the psql command instead of entering it manually every time?
allSites=(3 5 9 11 29 30 31 32 33 34 37 38 39 40 41 45 46 47 48)
for i in "${allSites[@]}"
do
psql \
-X \
-U rdswizard \
-h my_rds_host_url_goes_here \
-d wizard \
-c "\\copy (select site_id,name,phone from client_${i} where date(create_date) > '2019-09-11' LIMIT 100) to stdout" \
| \
psql \
-X \
-U postgres \
-h my_aurora_serverless_host_url_goes_here \
-d wizard \
-c "\\copy client_${i}(site_id,name,phone) from stdin"
done
Both database hosts are on remote servers, not on my local machine.
You can add the connection details to the ~/.pgpass file to avoid having to type in passwords every time. Make sure the file has -rw------- (0600) permissions.
This file should contain lines of the following format:
hostname:port:database:username:password
The password field from the first line that matches the current connection parameters will be used. Refer to the official documentation.
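For the two hosts in the question, the entries would look something like the sketch below; the passwords and the default PostgreSQL port 5432 are placeholder assumptions, so adjust as needed. PGPASSWORD is an alternative, but since the two psql calls here use different users and hosts, ~/.pgpass is the better fit:
# Append one line per connection; libpq ignores the file if its permissions are looser than 0600.
cat >> ~/.pgpass <<'EOF'
my_rds_host_url_goes_here:5432:wizard:rdswizard:<rdswizard_password>
my_aurora_serverless_host_url_goes_here:5432:wizard:postgres:<postgres_password>
EOF
chmod 0600 ~/.pgpass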

pg_dump: too many command-line arguments when calling from cmd

I'm trying to make a backup to C:\Users\Marko Petričević\Documents\Radni_sati_Backup\proba, where "proba" is the name of the backup file.
My command looks like this:
pg_dump -h 192.168.130.240 -p 5433 -U postgres -F c postgres > C:\Users\Marko Petričević\Documents\Radni_sati_Backup\proba
and then I get an error: "pg_dump: too many command-line arguments (first is "Petričević\Documents\Radni_sati_Backup\proba")"
But, when I write a command like:
pg_dump -h 192.168.130.240 -p 5433 -U postgres -F c postgres >C:\radni_sati_backup\radni_sati_proba
Everything works, and I get the "radni_sati_proba" file in the directory I listed in the command.
Why is this happening?
Found out what the problem was:
pg_dump -h 192.168.130.240 -p 5433 -U postgres -F c postgres > C:\Users\Marko Petričević\Documents\Radni_sati_Backup\proba
needs to be like this:
pg_dump -h 192.168.130.240 -p 5433 -U postgres -F c postgres > "C:\Users\Marko Petričević\Documents\Radni_sati_Backup\proba"
The problem was the space in the path; quoting the path fixes it.
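As an alternative to shell redirection, pg_dump can write the output file itself via its -f/--file option; the path still needs quotes because of the space. A sketch with the same connection details and path:
pg_dump -h 192.168.130.240 -p 5433 -U postgres -F c -f "C:\Users\Marko Petričević\Documents\Radni_sati_Backup\proba" postgres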

Facing an issue in a shell script while executing a query on a remote Postgres database

I am running a shell script on my app server which connects to another machine where the Postgres database is installed. It executes queries, returns a couple of IDs, and stores them into variables. Please find my shell script below.
ssh root@<Remote_HOST> 'bash -s' << EOF
projectid=`/usr/pgsql-9.4/bin/psql $DB_NAME -U $DB_USER -h $DB_HOST -t -c "select projectid from projects where project_Name='$projectName';"`
scenarioid=`/usr/pgsql-9.4/bin/psql $DB_NAME -U $DB_USER -h $DB_HOST -t -c "select scenarioid from scenarios where scenario='$scenario' and projectid='$projectid';"`
EOF
echo $projectid
If I execute the shell script, I get the following error:
/root/test/data.sh: line 62: /usr/pgsql-9.4/bin/psql: No such file or directory
/root/test/data.sh: line 62: /usr/pgsql-9.4/bin/psql: No such file or directory
But on the machine where the database is installed, if I execute the same query, I get proper results. So I am not sure what is wrong; the query is fine and the directory is present. Even after SSHing to the remote host, if I do ls or pwd, I get proper output. I have already exported the database password, so logging in to the database without a password already works fine.
Can someone please tell me what I am missing here?
Finally, I was able to resolve my issue by making changes to the shell script. In the original version, the backticks inside the unquoted here-document were expanded locally on the app server (where /usr/pgsql-9.4/bin/psql does not exist) rather than on the remote host. In the fix, each psql command is sent to the remote host and its output is captured locally with a command substitution around ssh:
projectid=$(ssh root@<Remote_HOST> << EOF
/usr/pgsql-9.4/bin/psql $DB_NAME -U $DB_USER -h $DB_HOST -t -c "select projectid from projects where project_Name='$projectName';"
EOF
)
scenarioid=$(ssh root@<Remote_HOST> << EOF
/usr/pgsql-9.4/bin/psql $DB_NAME -U $DB_USER -h $DB_HOST -t -c "select scenarioid from scenarios where scenario='$scenario' and projectid='$projectid';"
EOF
)
echo "$projectid : $scenarioid"
