Using open database connection to PostgreSQL in BASH? - bash

I have to use BASH to connect to our PostgreSQL 9.1 database server to execute various SQL statements.
We have a performance issue caused by repeatedly opening and closing too many database connections (right now, each statement is sent to a separate psql invocation).
I am looking at the possibility of maintaining an open database connection for a block of SQL statements using named pipes.
The problem I have is that once I open a connection and execute a SQL statement, I don't know when to stop reading from psql. I've thought about parsing the output and looking for a prompt, but I don't know whether that is safe, given that the same characters could appear in the SELECT output.
Does anyone have a suggestion?
Here's a simplified example of what I have thus far...
#!/bin/bash
PIPE_IN=/tmp/pipe.in
PIPE_OUT=/tmp/pipe.out
mkfifo $PIPE_IN $PIPE_OUT
psql -A -t jkim_edr_md_xxx_db < $PIPE_IN > $PIPE_OUT &
exec 5> $PIPE_IN; rm -f $PIPE_IN
exec 4< $PIPE_OUT; rm -f $PIPE_OUT
echo 'SELECT * FROM some_table' >&5
# unfortunately, this loop blocks
while read -u 4 LINE
do
    echo "LINE=$LINE"
done

Use --file=filename to execute a whole batch of statements over a single connection.
Depending on your need for flow control, you may want to use another language with a more flexible DB API (Python would be my choice here, but use whatever works).
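For example, a minimal sketch of the batch approach, reusing the table and database names from the question (the temp-file path is illustrative):
# Sketch: run several statements over a single psql connection.
cat > /tmp/batch.sql <<'EOF'
SELECT * FROM some_table;
SELECT count(*) FROM some_table;
EOF
psql -A -t --file=/tmp/batch.sql jkim_edr_md_xxx_db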

echo >&5 "SELECT * FROM some_table"
should read
echo 'SELECT * FROM some_table' >&5
Putting the redirection operator >& after the arguments to echo is the clearer form (bash accepts it in either position); and also, if you use "" quotes, some punctuation may be treated specially by the shell, causing foul and mysterious bugs later. On the other hand, quoting with ' gets ugly once the SQL itself contains quotes: echo 'SELECT * FROM some_table WHERE foo = '\''Can'\'''\''t read'\''' >&5 …
You probably also want to create these pipes someplace safer than /tmp. There's a big security-hole race condition in which someone else on the host could hijack your connection. Try creating a directory like /var/run/yournamehere/ with 0700 permissions, and create the pipes there, ideally with names like PIPE_IN=/var/run/jinkimsqltool/sql.pipe.in.$$ — $$ will be your process ID, so simultaneously-executed scripts won't clobber one another. (Also, rm -rf should not be needed for a pipe; the -r only gives a clever cracker more room to abuse an escalation of privileges. Plain rm -f is sufficient.)
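A sketch of that layout, assuming the script has permission to create a directory under /var/run (all names are illustrative):
# Illustrative only: directory and pipe names are examples.
PIPE_DIR=/var/run/jinkimsqltool
mkdir -p -m 0700 "$PIPE_DIR"                # private directory
PIPE_IN=$PIPE_DIR/sql.pipe.in.$$            # $$ = this script's PID
PIPE_OUT=$PIPE_DIR/sql.pipe.out.$$
mkfifo -m 0600 "$PIPE_IN" "$PIPE_OUT"       # pipes accessible only to this user
trap 'rm -f "$PIPE_IN" "$PIPE_OUT"' EXIT    # plain rm -f is enough for a fifo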

In psql you can use
\o YOUR_PIPE
SELECT whatever;
\o
which will open, write to, and close the pipe. Your bash-fu seems quite a lot stronger than mine, so I'll let you work out the details :)
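One way this could mesh with the named-pipe setup from the question (a sketch, untested; it keeps the question's exec 5> $PIPE_IN and adds a separate result pipe, so the read loop sees EOF when psql closes it with \o):
# Sketch only: assumes psql is already running with its stdin fed from fd 5.
RESULT_PIPE=/tmp/pipe.result                 # illustrative name
mkfifo "$RESULT_PIPE"
echo "\\o $RESULT_PIPE"          >&5         # route query output to the pipe
echo 'SELECT * FROM some_table;' >&5
echo '\o'                        >&5         # psql closes the pipe -> EOF for the reader
while read -r LINE
do
    echo "LINE=$LINE"
done < "$RESULT_PIPE"                        # loop ends instead of blocking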

Related

Bash commands as variables failing when joining to form a single command

ssh="ssh user#host"
dumpstructure="mysqldump --compress --default-character-set=utf8 --no-data --quick -u user -p database"
mysql=$ssh "$dumpstructure"
$mysql | gzip -c9 | cat > db_structure.sql.gz
This is failing on the third line with:
mysqldump --compress --default-character-set=utf8 --no-data --quick -u user -p database: command not found
I've simplified my actual script for the purpose of debugging this specific error. $ssh and $dumpstructure aren't always joined together in the real script.
Variables are meant to hold data, not commands. Use a function.
mysql () {
    ssh user@host mysqldump --compress --default-character-set=utf8 --no-data --quick -u user -p database
}
mysql | gzip -c9 > db_structure.sql.gz
Arguments to a command can be stored in an array.
# Although mysqldump is the name of a command, it is used here as an
# argument to ssh, indicating the command to run on a remote host
args=(mysqldump --compress --default-character-set=utf8 --no-data --quick -u user -p database)
ssh user@host "${args[@]}" | gzip -c9 > db_structure.sql.gz
Chepner's answer is correct about the best way to do things like this, but the reason you're getting that error is actually even more basic. The line:
mysql=$ssh "$dumpstructure"
doesn't do anything like what you want. Because of the space between $ssh and "$dumpstructure", it'll parse this as VAR=value command, which means it will execute the "mysqldump..." part with the environment variable mysql set to ssh user@host. But it's worse than that: since the double quotes around "$dumpstructure" prevent word splitting, the entire string gets treated as the command name (rather than mysqldump being the command name and the rest being arguments to it).
If this had been the right way to go about building the command, the right way to stick the parts together would be:
mysql="$ssh $dumpstructure"
...so that the whole combined string gets treated as part of the value to assign to mysql. But as I said, you really should use Chepner's approach instead.
Actually, commands stored in variables can also work, in the form `$var` or $($var). If it says "command not found", that could be because the command is not in your PATH, in which case you should give the full path to the command.
So let's set the downvotes aside and talk about this question.
The real problem is mysql=$ssh "$dumpstructure". This executes $dumpstructure with the additional environment variable mysql=$ssh, so we get the "command not found" error. And since mysqldump is located on the remote server, not on this host, it's reasonable that the command is not found.
From this point, let's see how to fix the question.
The OP wants to dump MySQL data from a remote server, which means $dumpstructure should be executed remotely. Look again at the third line, mysql=$ssh "$dumpstructure"; we have figured out that this causes the problem. So what would the correct command be? The simplest fix is mysql="$ssh $dumpstructure", which joins $ssh and $dumpstructure into a single command line in the variable $mysql.
Finally, let's talk about the last command line. I do not agree that variables are meant to hold data, not commands, because a command is also a kind of data; the real problem is how to use it correctly.
The OP's style of command is also supported; at least, it works on bash 4.2.46.
So the real problem is how to use a variable to hold commands, not to bring in a new method for doing it, such as wrapping them in a bash function. For example:
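Here is a minimal illustration of that claim: a simple command really can live in a variable (fragile once quoting or globbing is involved, which is why the function approach is usually recommended; the path is illustrative):
cmd="ls -l /tmp"
$cmd    # word splitting turns the string back into a command plus arguments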
So can anyone tell me why this answer goes unnoticed by readers and gets voted down instead?

Bash script with psql command tells nothing, but doesn't work

I am really confused with this piece of code:
...
COM="psql -d $DBNAME -p $PGPORT -c 'COPY (SELECT * FROM $TABLE_NAME s WHERE cast(s.$COLUMN_NAME as DATE) < DATE '$DATE_STOP' ) TO '$SCRIPTPATH/$ARCHIVE_NAME--$DBNAME' WITH CSV HEADER;'"
su postgres -c '$COM' &> pg_a.log
...
In the psql shell this SQL works fine, but in the script it does not create the archive and reports nothing about mistakes or failures.
Thanks in advance!
You'll get one hint if you replace your "su" command with this:
echo '$COM'
What you'll see is that it prints out $COM -- not an expansion, but the string itself.
You'll probably find a number of other problems with this approach. You're going to have to escape a bunch of characters so the shell doesn't interpret them for you. It's going to be a real pain.
I would put the sql into a file and use the -f option to psql.
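For instance, a sketch of that approach, reusing the question's variables (the temp-file handling is illustrative and untested):
# Sketch only: write the SQL to a file, then run it with psql -f.
SQL_FILE=$(mktemp /tmp/archive_sql.XXXXXX)
chmod 644 "$SQL_FILE"                       # the postgres user must be able to read it
cat > "$SQL_FILE" <<EOF
COPY (SELECT * FROM $TABLE_NAME s
      WHERE cast(s.$COLUMN_NAME AS DATE) < DATE '$DATE_STOP')
TO '$SCRIPTPATH/$ARCHIVE_NAME--$DBNAME' WITH CSV HEADER;
EOF
su postgres -c "psql -d $DBNAME -p $PGPORT -f $SQL_FILE" &> pg_a.log
rm -f "$SQL_FILE"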

Executing PSQL in R on Windows

I have been trying to execute PSQL from system() within R in RStudio. I have PSQL set up in my PATH and can execute PSQL from the command line. I cannot for the life of me figure out the correct method for executing psql from within R on Windows. The code I have was supplied from a Ubuntu environment, and I had not used system() before this; researching this specific issue has been unsuccessful.
The hardest part is not receiving any output after executing system() in R. I have tried a few different settings from looking at ?system, with no luck.
This code should execute a simple SQL statement and pass the output to a local file. Ultimately this will be made more robust and will include dynamic elements in an application; just having the basics working seems like the hardest part.
system(paste("export PGPASSWORD=db_password;psql -h db_host -d db_name -c 'copy(select * from large_table limit 1000) to stdout csv' > C:/temp_data/db_test.dat", sep=""))
I am curious as to whether anyone has a working Windows environment using PSQL in R. My Greenplum server is not local.
My echo %PATH% includes C:\Program Files (x86)\pgAdmin III\1.12
included in both system and user vars.
There are a few problems with your command.
system cannot be used with redirects; you must use shell.
You cannot use single quotes to quote commands in Windows; you must use double quotes.
To concatenate commands, you use the & operator, not ; as in Unix.
So your command would look like (it appears to be necessary to include this in one line):
cmd<-'set PGPASSWORD=db_password& psql -h db_host -d db_name -c "copy(select * from large_table limit 1000) TO STDOUT CSV;" > C:/temp_data/db_test.dat'
shell(cmd)
But have you considered using the RPostgreSQL driver, which is a much simpler, platform-independent way to do your task?
# Load up the driver
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
# Create a connection
con <- dbConnect(drv, dbname="db_name", host='db_host',password='db_password',user='db_user')
# Query the database
db_test=dbGetQuery(con, 'select * from large_table limit 1000')
# Write your file
write.csv(db_test,'C:/temp_data/db_test.dat')

Bash script to start Solr deltaimporthandler

I am after a bash script which I can use to trigger a delta import of XML files via CRON. After a bit of digging and modification I have this:
#!/bin/bash
# Bash to initiate Solr Delta Import Handler
# Setup Variables
urlCmd='http://localhost:8080/solr/dataimport?command=delta-import&clean=false'
statusCmd='http://localhost:8080/solr/dataimport?command=status'
outputDir=.
# Operations
wget -O $outputDir/check_status_update_index.txt ${statusCmd}
2>/dev/null
status=`fgrep idle $outputDir/check_status_update_index.txt`
if [[ ${status} == *idle* ]]
then
wget -O $outputDir/status_update_index.txt ${urlCmd}
2>/dev/null
fi
Can I get any feedback on this? Is there a better way of doing it? Any optimisations or improvements would be most welcome.
This certainly looks usable. Just to confirm, you intend to run this every X minutes from your crontab? That seems reasonable.
The only major quibble (IMHO) is discarding STDERR information with 2>/dev/null. Of course it depends on your expectations for this system. If this is for a paying customer or employer, do you want to have to explain to the boss, "gosh, I didn't know I was getting the error message 'Can't connect to host X' for the last 3 months because we redirect STDERR to /dev/null"? If this is for your own project and you're monitoring the work via other channels, then it's not so terrible, but why not capture STDERR to a file and check whether there were any errors? As a general idea ....
myStdErrLog=/tmp/myProject/myProg.stderr.$(/bin/date +%Y%m%d.%H%M)
wget -O $outputDir/check_status_update_index.txt ${statusCmd} 2> ${myStdErrLog}
if [[ -s ${myStdErrLog} ]] ; then
    mail -s "error on myProg" me@myself.org < ${myStdErrLog}
fi
rm ${myStdErrLog}
Depending on what wget includes in its STDERR output, you may need to filter what is in the StdErrLog to see if there are "real" error messages that you need to have sent to you.
A medium quibble is your use of backticks for command substitution. If you're using dbl-sqr-brackets for evaluations, then why not embrace complete ksh93/bash semantics? The only reason to use backticks is if you think you need to be ultra-backwards-compatible and that you'll be running this script under the Bourne shell (or possibly one of the stripped-down shells like dash). Backticks have been deprecated in ksh since at least 1993. Try
status=$(fgrep idle $outputDir/check_status_update_index.txt)
The $( ... ) form of command substitution makes it very easy to nest multiple cmd-substitutions, i.e. echo $(echo one $(echo two ) ). (A contrived example, as the need to nest cmd-subs is pretty rare; see the sketch below for a more realistic case.)
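For what it's worth, a slightly more realistic nesting might look like this (paths are illustrative):
# Illustrative: bare name of the newest log file, via nested $( ... )
newest=$(basename "$(ls -t /var/log/*.log | head -n 1)")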
Depending on your situation, in a large production environment where new software is installed to version-numbered directories, you might want to construct your paths from variables, i.e.
hostName=localhost
portNum=8080
SOLRPATH=/solr
SOLRCMD='delta-import&clean=false'
urlCmd="http://${hostName}:${portNum}${SOLRPATH}/dataimport?command=${SOLRCMD}"
The final, minor quibble ;-). Are you sure ${status} == *idle* does what you want?
Try using something like
case "${status}" in
*idle* ) .... ;;
* ) echo "unknown status = ${status} or similar" 1>&2 ;;
esac
Yes, your if ... fi certainly works, but if you want to start doing more refined processing of the information that you put in your ${status} variable, then case ... esac is the way to go.
EDIT
I agree with @alinsoar that 2>/dev/null on a line by itself will be a no-op. I assumed that it was a formatting issue, but looking at your code in edit mode I see that it does appear to be on its own line. If you really want to discard STDERR messages, then you need cmd ... 2>/dev/null all on one line, OR, as alinsoar advocates, the shell will accept redirections at the front of the line, but again, all on one line ;-)
IHTH

starting remote script via ssh containing nohup

I want to start a script remotely via ssh like this:
ssh user@remote.org -t 'cd my/dir && ./myscript data my@email.com'
The script does various things which work fine until it comes to a line with nohup:
nohup time ./myprog $1 >my.log && mutt -a ${1%.*}/`basename $1` -a ${1%.*}/`basename ${1%.*}`.plt $2 < my.log 2>&1 &
It is supposed to start the program myprog, redirect its output to my.log, and send an email with some data files created by myprog as attachments and the log as the body. However, when the script reaches this line, ssh outputs:
Connection to remote.org closed.
What is the problem here?
Thanks for any help
Your command runs a pipeline of processes in the background, so the calling script will exit straight away (or very soon afterwards). This will cause ssh to close the connection. That in turn will cause a SIGHUP to be sent to any process attached to the terminal that the -t option caused to be created.
Your time ./myprog process is protected by a nohup, so it should carry on running. But your mutt isn't, and that is likely to be the issue here. I suggest you change your command line to:
nohup sh -c "time ./myprog $1 >my.log && mutt -a ${1%.*}/`basename $1` -a ${1%.*}/`basename ${1%.*}`.plt $2 < my.log 2>&1 " &
so the entire pipeline gets protected. (If that doesn't fix it, it may be necessary to do something with file descriptors - for instance mutt may have other issues with the terminal not being around - or the quoting may need tweaking depending on the parameters - but give that a try for now...)
This answer may be helpful. In summary, to achieve the desired effect, you have to do the following things:
Redirect all I/O on the remote nohup'ed command
Tell your local SSH command to exit as soon as it's done starting the remote process(es).
Quoting the answer I already mentioned, in turn quoting wikipedia:
Nohuping backgrounded jobs is for example useful when logged in via SSH, since backgrounded jobs can cause the shell to hang on logout due to a race condition [2]. This problem can also be overcome by redirecting all three I/O streams:
nohup myprogram > foo.out 2> foo.err < /dev/null &
UPDATE
I've just had success with this pattern:
ssh -f user#host 'sh -c "( (nohup command-to-nohup 2>&1 >output.file </dev/null) & )"'
I managed to solve this for a use case where I need to start backgrounded scripts remotely via ssh, using a technique similar to the other answers here but in a way I feel is simpler and cleaner (at least, it makes my code shorter and -- I believe -- better-looking): explicitly closing all three streams using the stream-close redirection syntax (as discussed at the following locations:
https://unix.stackexchange.com/questions/131801/closing-a-file-descriptor-vs
https://unix.stackexchange.com/questions/70963/difference-between-2-2-dev-null-dev-null-and-dev-null-21
http://www.tldp.org/LDP/abs/html/io-redirection.html#CFD
https://www.gnu.org/software/bash/manual/html_node/Redirections.html
Rather than the more widely used but (IMHO) hackier "redirect to/from /dev/null", resulting in the deceptively simple:
nohup script.sh >&- 2>&- <&-&
2>&1 works just as well as 2>&-, but I feel the latter is ever-so-slightly more clear. ;) Most people might have a space preceding the final "background job" ampersand, but since it is not required (as the ampersand itself functions like a semicolon in normal usage), I prefer to omit it. :)
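Applied to the question's remote invocation, that might look something like the following (a sketch under this answer's assumptions, untested):
# Sketch: background the remote script with all three streams closed,
# so the remote shell can exit and ssh can return promptly.
ssh user@remote.org 'cd my/dir && nohup ./myscript data my@email.com >&- 2>&- <&- &'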
