Passing a parameter to a BigQuery script in shell

I want to pass an argument to a BigQuery script in shell; here is the script I wrote:
#!/bin/bash
bq query --use_legacy_sql=false --destination_table=abc --append 'select * from (select * from `xyz.INFORMATION_SCHEMA.VIEWS` union all select * from `def.VIEWS`) where table_name = "$1"'
When I run this script and pass the argument, I get no errors, but no rows are appended to the table. However, when I hard-code the table_name as rty, the row is appended. What am I missing here?

When you run the script you'll see output like:
Waiting on <BIGQUERY_JOB_ID> ... (0s) Current status: DONE
You can inspect the job in many ways, including with the bq tool:
bq show -j --format=prettyjson <BIGQUERY_JOB_ID>
If you have jq installed (sudo apt install jq) you can get just the translated query with:
bq show -j --format=prettyjson <BIGQUERY_JOB_ID> | jq '.configuration.query.query'
which will get you something similar to:
select * from xyz.INFORMATION_SCHEMA.VIEWS where table_name = \"$1\"
As you can see, the variable was never expanded: single quotes prevent the shell from substituting $1, so no table matches the WHERE filter. To avoid this you can enclose the query in double quotes and use single quotes only for the SQL string literal, like this:
#!/bin/bash
bq query \
--use_legacy_sql=false \
--destination_table=xyz.abc \
--append \
"select * from xyz.INFORMATION_SCHEMA.VIEWS where table_name='$1'"
With the query now in double quotes, back-ticks around table names are interpreted by the shell as command substitution, which produces an error like INFORMATION_SCHEMA.VIEWS: command not found. You can omit the back-ticks or escape them with a backslash:
"select * from \`xyz\`.INFORMATION_SCHEMA.VIEWS where table_name='$1'"

How do I ssh over ssh to machine C and log it in a file stored on machine C?

I use an HPC cluster. The compute nodes can't access the internet; only the frontal (login) node can.
So I want to wrap all the commands that need internet access in order to execute them on the frontal.
E.g. for wget:
#!/bin/bash
ssh frontal /bin/wget "$@"
-> works fine
I need to wrap this bq (Google BigQuery) command:
bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;"
I managed to re-quote the command and launch it successfully from the CLI:
ssh frontal '~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '"'"'2016%'"'"' AND mgrs_tile == '"'"'32ULU'"'"' ORDER BY sensing_time ASC LIMIT 1000;"'
Now I want to write a wrapper named bq able to take the parameters and launch this command through ssh. Here is what I have tried:
#!/bin/bash
set -eu
# all parameters in an array
args=("$@")
# disable globbing (there's a * in the SELECT clause)
set -f
# managing inner quotes
arg2=`echo "${args[2]}" | perl -pe 's/'\''/'\''"'\''"'\''/g'`
# put back double quotes (") suppressed by bash
args="${args[0]} ${args[1]} \"${arg2}\""
# build command with parameters
cmd="~/downloads_and_builds/builds/google-cloud-sdk/bin/bq $args"
echo ""
echo "command without external quotes"
echo "$cmd"
echo ""
echo "testing it ..."
ssh hpc-login1 "$cmd"
echo ""
# wrap the command in single quotes (like on the CLI)
cmd="'"'~/downloads_and_builds/builds/google-cloud-sdk/bin/bq '"$args""'"
echo "command with external quotes"
echo "$cmd"
echo ""
echo "testing it ..."
ssh hpc-login1 $cmd
echo "done"
Here is the output of this script:
$ bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;"
command without external quotes
~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '"'"'2016%'"'"' AND mgrs_tile == '"'"'32ULU'"'"' ORDER BY sensing_time ASC LIMIT 1000;"
testing it ...
Waiting on bqjob_r102b0c22cdd77c2d_000001629b8391a3_1 ... (0s) Current status: DONE
command with external quotes
'~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '"'"'2016%'"'"' AND mgrs_tile == '"'"'32ULU'"'"' ORDER BY sensing_time ASC LIMIT 1000;"'
testing it ...
bash: ~/downloads_and_builds/builds/google-cloud-sdk/bin/bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;": Aucun fichier ou dossier de ce type (in english: no file or directory of this kind)
As you can see, I managed to build a command string identical to the one that works on the CLI, but it doesn't work from my script:
The first attempt succeeds but produces no output (I tried redirecting it to a file: the file was created but is empty).
In the second attempt (with outer single quotes, just like the CLI command that worked), bash treats the whole quoted string as a single word and can't find the command.
Does anybody have an idea how to launch a complex command (with quotes, wildcards, ...) like this one through ssh using a wrapper script?
(i.e. one wrapper named foo able to replace a foo command and execute it correctly through ssh with the arguments provided)
ssh has the same semantics as eval: all arguments are concatenated with spaces and then evaluated as a shell command on the remote side.
You can get execve semantics (like sudo) by having a wrapper escape the arguments:
remotebq() {
ssh yourhost "~/downloads_and_builds/builds/google-cloud-sdk/bin/bq $(printf '%q ' "$@")"
}
This quotes thoroughly and consistently, so you no longer have to worry about adding extra escaping. It runs exactly what you tell it (as long as the remote shell is bash):
remotebq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;"
However, the downside to running exactly what you tell it is that now you need to know exactly what you want to run.
For example, you can no longer pass '~/foo' as an argument because this is not a valid file: ~ is a shell feature and not a directory name, and when it's correctly escaped it will not be replaced by your home directory.
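The effect of printf '%q ' can be checked locally, without ssh; this sketch isolates the quoting step of the wrapper and round-trips it through eval, which is what the remote bash effectively does:

```shell
#!/bin/bash
# printf '%q ' escapes each argument so that a shell parsing the
# result recovers exactly the same words.
quoted=$(printf '%q ' --format=json query "SELECT * WHERE t LIKE '2016%'")
echo "$quoted"

# Round-trip check: re-parsing the quoted string must reproduce the
# original three arguments, spaces and inner quotes included.
eval "set -- $quoted"
echo "argc=$# third=$3"
```

The exact escaped form varies between bash versions (backslashes vs $'...' quoting), but the round-trip is what matters: the remote side always sees the original argument list.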
A basic way to do this is with a shell here-document:
#!/bin/bash
ssh -t server<<'EOF'
bq --format=json query "SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;"
command2
command3
...
EOF
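The quoted delimiter in <<'EOF' matters: it passes the body to ssh verbatim, whereas an unquoted <<EOF lets the local shell expand $, backticks, and backslashes first. A quick local comparison:

```shell
#!/bin/bash
who=world

# Unquoted delimiter: the local shell expands $who before the
# command ever sees the text.
cat <<EOF
hello $who
EOF

# Quoted delimiter: the body is passed through literally.
cat <<'EOF'
hello $who
EOF
```

With a query full of single quotes and wildcards, the quoted form is usually what you want, since nothing needs re-escaping.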
I see you are already using Perl so...
use Net::OpenSSH;

my $host  = 'frontal';   # the login node from the question
my $query = q(SELECT * FROM [bigquery-public-data:cloud_storage_geo_index.sentinel_2_index] WHERE sensing_time LIKE '2016%' AND mgrs_tile == '32ULU' ORDER BY sensing_time ASC LIMIT 1000;);
my $ssh = Net::OpenSSH->new($host);
$ssh->system('bq', '--format=json', 'query', $query)
    or die $ssh->error;
Net::OpenSSH takes care of quoting everything for you.

How to pass timestamp from bash to psql

I am having a problem passing a timestamp parameter to psql. The $since variable can contain any string formatted according to the SQL standard, and I pass it to SQL like this:
First I check that $since is in the correct format (if this fails, the script won't continue):
1) psql --command "SELECT ($since)::TIMESTAMPTZ;"
Second, I use the value in my function (it takes timestamptz as an input parameter):
2) cmd="SELECT myfunc($since);"
psql --command "$cmd" $DBNAME
Works: if since="NOW() - INTERVAL '5 months'"
Does not work: if since="2017-10-23 10:42:48" (it fails with a syntax error on LINE 1: SELECT (2017-10-23 10:42:48)::TIMESTAMPTZ;)
I tried to escape the $since string with ', ", and \ characters, but after many combinations in both bash and SQL I gave up.
What is the correct way to escape in such case?
If you need to cast a string to TIMESTAMPTZ, you need to enclose the value of $since in single quotes ('...'), either when creating the variable:
since="'2017-10-23 10:42:48'"
or when passing it to psql:
since="2017-10-23 10:42:48"
psql --command "SELECT '$since'::TIMESTAMPTZ ;"
If you need to pass either a string or an expression like NOW() - INTERVAL '1 day', it's better to decide about the quoting when assigning the value to the variable:
$ since="'2017-10-23 10:42:48'"
$ psql postgres --command "SELECT $since::TIMESTAMPTZ ;"
timestamptz
------------------------
2017-10-23 10:42:48+02
(1 row)
$ since="(NOW() - INTERVAL '1 day')"
$ psql postgres --command "SELECT $since::TIMESTAMPTZ ;"
timestamptz
-------------------------------
2018-03-24 08:49:24.577356+01
(1 row)
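If the value always comes in as a bare string, the quoting can also be done by a small helper instead of by hand. This is a sketch, not part of psql; sql_quote is a hypothetical function that applies the standard SQL escape (doubling embedded single quotes):

```shell
#!/bin/bash
# Hypothetical helper: wrap a value in single quotes for SQL,
# doubling any single quotes inside it (standard SQL escaping).
sql_quote() {
  local v=${1//"'"/"''"}
  printf "'%s'" "$v"
}

since="2017-10-23 10:42:48"
# Build the statement that would be handed to psql --command:
echo "SELECT $(sql_quote "$since")::TIMESTAMPTZ;"
```

This keeps the assignment free of nested quotes and also survives values that themselves contain a single quote.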

BigQuery BashScript-- Not transferring to the destination

I wrote a simple bash script that takes the results from a query and appends them to an existing table. My script executes, but the data doesn't seem to make it to the destination table. Any idea what I might be doing wrong? Is it possible that I can't use a partition ($) as a destination?
Thank you so much for your help.
#!/bin/bash
bq query \
--destination_table=logs.p_activity_428001$20170803 \
--append_table <<EOF
SELECT
*
FROM log.p_activity_428001
where _PARTITIONTIME = TIMESTAMP('2017-08-03')
EOF
You need to escape the dollar sign. Bash parses $20170803 as the positional parameter $2 followed by the literal digits 0170803 (multi-digit positional parameters need braces), so with no second argument the destination becomes logs.p_activity_4280010170803 rather than the partition you intended. A single backslash will suffice:
#!/bin/bash
bq query \
--destination_table=logs.p_activity_428001\$20170803 \
--append_table <<EOF
SELECT
*
FROM log.p_activity_428001
where _PARTITIONTIME = TIMESTAMP('2017-08-03')
EOF
although single-quoting the whole table name may be more readable:
#!/bin/bash
bq query \
--destination_table='logs.p_activity_428001$20170803' \
--append_table <<EOF
SELECT
*
FROM log.p_activity_428001
where _PARTITIONTIME = TIMESTAMP('2017-08-03')
EOF
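You can see the mangling without running bq at all; with no positional parameters set, $2 expands to nothing and the remaining digits stay literal (plain echo standing in for the flag value):

```shell
#!/bin/bash
set --   # make sure there are no positional parameters

# $20170803 is parsed as ${2}0170803: positional parameter $2
# (empty here) followed by the literal digits 0170803.
echo "logs.p_activity_428001$20170803"

# A backslash (or single quotes) keeps the dollar sign literal.
echo "logs.p_activity_428001\$20170803"
echo 'logs.p_activity_428001$20170803'
```

The first line drops the partition decorator entirely, which is why the original script silently wrote to the wrong place; the other two preserve it.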

MonetDB doesn't recognize function names given to mclient via command line

I am trying to export a few columns from a table as encoded integers.
Basically I want to use a bash script to pass SQL commands to mclient as command-line arguments. My bash script looks like:
#!/bin/bash
dir=`pwd`
for col in occupation native_country martial_status race sex
do
mclient -d adult -s \
"create function encode${col}(s varchar(200)) returns int begin return (select code from ${col}_dict where ${col}=s); end;"
mclient -d adult -s \
"COPY (select encode${col}($col) from adult) INTO '${dir}/${col}.txt' NULL AS '0'"
mclient -d adult -s \
"drop function encode${col}"
done
In each iteration I want to create a SQL function on the fly, use it to encode an attribute and export that to a text file, and finally drop the function.
However, the output strangely contains some garbage characters,
as if mclient can't recognize the function name.
If I remove the second mclient command, the other operations succeed:
operation successful
Function 'X�..X�' not defined
operation successful
operation successful
Function '��"X�.X�' not defined
operation successful
operation successful
Function ' X�.PX�' not defined

Removing leading \n while assigning to a variable

I have a DB2 query in a shell script which returns an integer value, but I am unable to store it in a variable.
temp1= echo db2 -x "select max(id) from work.work_tb"
I get this output when I run it with sh -x test.sh:
db2 -x select max(id) from work.work_tb
echo 50
temp1=
50
So for some reason $temp1 doesn't get the value. I think it's because the db2 query returns a value prefixed with \n. How do I get rid of the newline and store the value in temp1?
No, that's not why. Your line runs echo with the db2 command as its arguments and assigns nothing to temp1. You need command substitution:
temp1=$(db2 -x "select max(id) from work.work_tb")
or using backticks
temp1=`db2 -x "select max(id) from work.work_tb"`
In general, to remove newlines, you can pipe the output through tools like tr or sed:
... | tr -d "\n"
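In fact $(...) already strips trailing newlines on its own, which is why the command-substitution fix above works without tr. A quick check with a stand-in for the db2 call:

```shell
#!/bin/bash
# Stand-in for db2 -x: prints the value followed by a newline.
fake_db2() { printf '50\n'; }

# Command substitution strips all trailing newlines automatically.
temp1=$(fake_db2)
echo "[$temp1]"

# tr -d '\n' also removes newlines embedded in the middle of the
# output, should the command ever produce any.
temp2=$(printf '5\n0\n' | tr -d '\n')
echo "[$temp2]"
```

Both echoes print [50], so tr is only needed if newlines can appear inside the value rather than at its end.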
