BigQuery bash script -- query results not transferring to the destination table - bash

I wrote a simple bash script that takes the results from a query and appends them to an existing table. My script executes, but the data doesn't seem to make it to the destination table. Any idea what I might be doing wrong? Is it possible that I can't use a partition decorator ($) as a destination?
Thank you so much for your help.
#!/bin/bash
bq query \
--destination_table=logs.p_activity_428001$20170803 \
--append_table <<EOF
SELECT
*
FROM log.p_activity_428001
where _PARTITIONTIME = TIMESTAMP('2017-08-03')
EOF

You need to escape the dollar sign: bash parses $20170803 as the positional parameter $2 followed by the literal 0170803, and $2 is empty unless you passed the script at least two arguments, so the partition decorator is silently mangled. A single backslash will suffice:
#!/bin/bash
bq query \
--destination_table=logs.p_activity_428001\$20170803 \
--append_table <<EOF
SELECT
*
FROM log.p_activity_428001
where _PARTITIONTIME = TIMESTAMP('2017-08-03')
EOF
although single-quoting the whole table name may be more readable:
#!/bin/bash
bq query \
--destination_table='logs.p_activity_428001$20170803' \
--append_table <<EOF
SELECT
*
FROM log.p_activity_428001
where _PARTITIONTIME = TIMESTAMP('2017-08-03')
EOF
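You can see what bash does to the unescaped name without calling bq at all; in this sketch, set -- simulates running the script with a single argument, so $2 is unset:

```shell
#!/bin/bash
# Demo of the expansion bug: $20170803 is parsed as ${2}0170803.
set -- onlyonearg   # the script now has $1 but no $2

unquoted=logs.p_activity_428001$20170803
escaped=logs.p_activity_428001\$20170803

echo "$unquoted"   # partition decorator lost: logs.p_activity_4280010170803
echo "$escaped"    # intact: logs.p_activity_428001$20170803
```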

Related

Passing parameter in a BigQuery Script

I want to pass argument to a BigQuery script in shell, here is the example of script I wrote
#!/bin/bash
bq query --use_legacy_sql=false --destination_table=abc --append 'select * from (select * from `xyz.INFORMATION_SCHEMA.VIEWS` union all select * from `def.VIEWS`) where table_name = "$1"'
When I run this script and pass the argument, I do not get any errors, but no row is appended to the table; whereas when I hard-code the table_name as rty, that row is appended. What am I missing here?
When you run the script you'll get a prompt like:
Waiting on <BIGQUERY_JOB_ID> ... (0s) Current status: DONE
You can inspect the job in many ways, including with the bq tool:
bq show -j --format=prettyjson <BIGQUERY_JOB_ID>
If you have jq installed (sudo apt install jq) you can get just the translated query with:
bq show -j --format=prettyjson <BIGQUERY_JOB_ID> | jq '.configuration.query.query'
which will get you something similar to:
select * from xyz.INFORMATION_SCHEMA.VIEWS where table_name = \"$1\"
As you can see, the variable was never expanded: single quotes prevent bash from substituting $1, so the literal string "$1" matches no table in the WHERE filter. To fix this, enclose the query in double quotes and the value in single ones like this:
#!/bin/bash
bq query \
--use_legacy_sql=false \
--destination_table=xyz.abc \
--append \
"select * from xyz.INFORMATION_SCHEMA.VIEWS where table_name='$1'"
If you keep back-ticks inside the double quotes, you can get an INFORMATION_SCHEMA.VIEWS: command not found error, because the shell treats them as command substitution. You can omit them or escape each one with a backslash:
"select * from \`xyz\`.INFORMATION_SCHEMA.VIEWS where table_name='$1'"
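The difference between the two quoting styles can be checked without calling bq at all; a minimal sketch, with a hypothetical view name v and the argument simulated via set --:

```shell
#!/bin/bash
# Simulate running the script with one argument: rty
set -- rty

# Single quotes: bash leaves $1 alone, BigQuery sees the literal text "$1".
single='select * from v where table_name = "$1"'

# Double quotes: bash substitutes the argument before the query is sent.
double="select * from v where table_name='$1'"

echo "$single"   # -> select * from v where table_name = "$1"
echo "$double"   # -> select * from v where table_name='rty'
```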

unable to loop through array in bash for PostgreSQL query export

I have a PostgreSQL query that I'd like to run for multiple geographic areas via a loop. I want to use the elements in the array to modify the query and the name of the csv file where I'm exporting the data to. So in essence, I want the query to run on ...cwa = 'MFR'... and export to hourly_MFR.csv, then run on ...cwa = 'PQR'... and export to hourly_PQR.csv, and so on.
Here's what I have so far. I thought maybe the EOF in the script might be causing problems, but I couldn't figure out how to get the loop to work while maintaining the general format of the script.
Also, the query/script, without the looping (excluding declare, for, do, done statements) works fine.
dbname="XXX"
username="XXXXX"
psql $dbname $username << EOF
declare -a arr=('MFR', 'PQR', 'REV')
for i in "${arr[#]}"
do
\COPY
(SELECT d.woyhh,
COALESCE(ct.ct, 0) AS total_count
FROM
(SELECT f_woyhh(d::TIMESTAMP) AS woyhh
FROM generate_series(TIMESTAMP '2018-01-01', TIMESTAMP '2018-12-31', interval '1 hour') d) d
LEFT JOIN
(SELECT f_woyhh((TIME)::TIMESTAMP) AS woyhh,
count(*) AS ct
FROM counties c
JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
WHERE cwa = $i
GROUP BY 1) ct USING (whh)
ORDER BY 1) TO /var/www/html/GIS/ltg_db/bigquery/hourly_$i.csv CSV HEADER;
done
EOF
Thanks for any help!
I think you are nearly there, you just have to reorder some lines. Try this:
dbname="XXX"
username="XXXXX"
declare -a arr=('MFR' 'PQR' 'REV')   # no commas between bash array elements
for i in "${arr[@]}"
do
psql $dbname $username << EOF
\COPY
(SELECT d.woyhh,
COALESCE(ct.ct, 0) AS total_count
FROM
(SELECT f_woyhh(d::TIMESTAMP) AS woyhh
FROM generate_series(TIMESTAMP '2018-01-01', TIMESTAMP '2018-12-31', interval '1 hour') d) d
LEFT JOIN
(SELECT f_woyhh((TIME)::TIMESTAMP) AS woyhh,
count(*) AS ct
FROM counties c
JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
WHERE cwa = '$i'
GROUP BY 1) ct USING (woyhh)
ORDER BY 1) TO '/var/www/html/GIS/ltg_db/bigquery/hourly_$i.csv' CSV HEADER;
EOF
done
The declare and the for loop belong to the bash script, while everything between << EOF and EOF is the query sent to psql; bash still expands $i inside the unquoted here-document, which is why the substitution works.
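That key point can be verified with cat standing in for psql; in this sketch, whatever is between << EOF and EOF is just stdin data, but $i is substituted by bash first:

```shell
#!/bin/bash
# The heredoc body is data for the command's stdin, yet an unquoted
# EOF delimiter means bash expands $variables inside it.
for i in MFR PQR REV
do
  cat <<EOF
WHERE cwa = '$i'
EOF
done
```

Running this prints three WHERE clauses with MFR, PQR, and REV already filled in, exactly what psql would receive on each loop iteration.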
The solution in @Lienhart Woitok's answer above will definitely work. Note, however, that it has the side effect of opening a new psql database connection (setup, authentication, query, response, teardown) on every iteration of the loop.
In this case you are only running 3 iterations of the loop, so it may not be a significant issue. However, if you expand the usage to run more iterations, you may want to optimize this to only run a single DB connection and batch query it.
To do that, use of a temporary working file to build the SQL commands may be necessary. There are other ways, but this is relatively simple to use and debug:
QUERY_FILE=$(mktemp /tmp/query.XXXXXXX)
# note the use of an array isn't really necessary in this use
# case - and a simple set of values can be used equally as well
CWA="MFR PQR REV"
for i in $CWA
do
cat <<EOF >> $QUERY_FILE
<ADD_YOUR_QUERY_STATEMENTS_HERE>
EOF
done
psql --file=$QUERY_FILE $dbname $username
if (( $? ))
then
echo "query failed (QUERY_FILE: $QUERY_FILE)"
exit 1
else
echo "query succeeded"
rm -f $QUERY_FILE
exit 0
fi
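The exit-status check above relies on bash arithmetic evaluation: (( expr )) succeeds when the expression is non-zero, so (( $? )) is true exactly when the previous command failed. A small standalone check:

```shell
#!/bin/bash
# (( $? )) inverts the usual shell convention: non-zero status -> true.
false
if (( $? )); then status="failed"; else status="succeeded"; fi
echo "$status"   # -> failed

true
if (( $? )); then status="failed"; else status="succeeded"; fi
echo "$status"   # -> succeeded
```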

MonetDB doesn't recognize function names given to mclient via command line

I am trying to export a few columns from a table as encoded integer.
Basically I want to use a bash script to pass the SQL command to mclient as command line argument. My bash script looks like:
#!/bin/bash
dir=`pwd`
for col in occupation native_country martial_status race sex
do
mclient -d adult -s \
"create function encode${col}(s varchar(200)) returns int begin return (select code from ${col}_dict where ${col}=s); end;"
mclient -d adult -s \
"COPY (select encode${col}($col) from adult) INTO '${dir}/${col}.txt' NULL AS '0'"
mclient -d adult -s \
"drop function encode${col}"
done
In each iteration, I want to create a SQL function on the fly. Then use the function to encode an attribute and export it to a text file. And lastly drop the function.
However, the output strangely contains some monster characters,
as if it can't recognize the function name.
If I remove the second mclient command, the other operations are successful.
operation successful
Function 'X�..X�' not defined
operation successful
operation successful
Function '��"X�.X�' not defined
operation successful
operation successful
Function ' X�.PX�' not defined

stop bash from expanding * in sql statement

I have a delete statement delete * from table_name;. Currently the shell expands the * to a list of all files in the current directory. How can I escape it so that the string passed to sqlplus is indeed "delete * from table_name"? I tried \*, '*', and \\*, and none of them work.
The exact script is
#!/bin/bash
table_name = $1
delete_sql= "delete * from $table_name;"
echo $delete_sql > abc.sql
How about
echo "delete * from table_name" | sqlplus
or
echo "delete * from table_name" > mystatements.txt
sqlplus @mystatements.txt
On a side note, you don't need to specify * in a delete statement - all fields in the matching rows are deleted.
You just need to quote the variable (and fix your spacing around the = sign):
#!/bin/bash
table_name=$1
delete_sql="delete * from $table_name;"
echo "$delete_sql" > abc.sql
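The effect of the quotes is easy to demonstrate: an unquoted $delete_sql undergoes word splitting and then glob expansion, so the * is replaced by the filenames in the current directory. A minimal sketch using a throwaway directory:

```shell
#!/bin/bash
# Work in an empty temp directory so the glob result is predictable.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch a.txt b.txt

delete_sql="delete * from my_table;"
echo $delete_sql     # unquoted: delete a.txt b.txt from my_table;
echo "$delete_sql"   # quoted:   delete * from my_table;
```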

Removing leading \n while assigning to a variable

I have a DB2 query in a shell script which return an integer value, but I am unable to store it in a variable.
temp1= echo db2 -x "select max(id) from work.work_tb"
I am getting this output when I run it, sh -x test.sh
db2 -x select max(id) from work.work_tb
echo 50
temp1=
50
So for some reason $temp1 doesn't get the value. I think it's because the db2 query returns the value prefixed with \n. How do I get rid of the newline character and store the value in temp1?
No, that's not why. The line temp1= echo db2 ... doesn't capture anything: it runs echo with temp1 set to the empty string in its environment, and the db2 command is never executed. Use command substitution instead:
temp1=$(db2 -x "select max(id) from work.work_tb")
or using backticks
temp1=`db2 -x "select max(id) from work.work_tb"`
In general, to remove newlines, you can pipe the output through tools like tr or sed:
... | tr -d "\n"
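Note that command substitution already strips trailing newlines on its own; tr -d '\n' is only needed when the output contains embedded newlines too. A quick check with the db2 output simulated by printf:

```shell
#!/bin/bash
# $(...) removes trailing newlines; tr -d '\n' removes all of them.
trailing=$(printf '50\n')               # trailing newline already stripped
flat=$(printf '50\n60\n' | tr -d '\n')  # embedded newline removed too

echo "$trailing"   # -> 50
echo "$flat"       # -> 5060
```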
