I'm trying to automate a BigQuery job in a shell script, but I'm getting errors while trying to do this. I'm reading a local CSV file with two columns line by line and updating values with the following script:
#!/bin/bash
IFS=","
while read f1 f2
do
    echo "From $f1 to $f2"
    bq query --use_legacy_sql=false "UPDATE agendas_usuarios.tb_usuarios SET cargo='${f2}' WHERE cargo='${f1}'"
done < cargos_ps.csv
But I'm getting a syntax error: Unclosed string literal at [1:47].
I've read that shell scripts don't allow single quotes inside double quotes; is that true? If so, what's the best way to do this job in shell? Do I really need to switch to another programming language?
My CSV reading is right: the echo before the bq query prints the values correctly.
I'm not sure what the actual problem is (perhaps the quotes need escaping), but using query parameters means you don't need to inject strings into the query directly, and you can hopefully avoid the issue you're seeing. You'd want something like this:
bq query --use_legacy_sql=false \
    --parameter="cargo:STRING:${f2}" \
    --parameter="target:STRING:${f1}" \
    "UPDATE agendas_usuarios.tb_usuarios SET cargo=@cargo WHERE cargo=@target"
I'm trying to write a systemd service file without resorting to an external script.
I need to query an sqlite database and write the results to a file. But my query uses double quotes, so I need to wrap the query in single quotes, and since systemd doesn't use a shell, I need to invoke one manually. So how do I accomplish this?
ExecStart=sh -c 'sqlite3 dbfile.db 'SELECT "The db value is: "||value FROM table' > output.log'
I have tried escaping the inner single quotes, but for some reason that doesn't work.
Try this:
ExecStart=sh -c 'sqlite3 dbfile.db '\''SELECT "The db value is: "||value FROM table'\'' > output.log'
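The '\'' sequence works by ending the single-quoted string, appending an escaped literal quote, and starting a new single-quoted string; the shell then glues the pieces together. A quick illustration at an ordinary shell prompt:

# The three pieces 'It', \' and 's working' are concatenated by the shell:
echo 'It'\''s working'
# prints: It's working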
I used to use mysql, and double quotes work there as well. You can also give this a shot:
ExecStart=sh -c 'sqlite3 dbfile.db "SELECT \"The db value is: \"||value FROM table" > output.log'
I am working on a really basic script:
1) Grabs account keys from a text file (keyList.txt) --> key format looks like this: 1002000222,1002000400
2) Loops through each key and inserts it (using sed) into SQL queries held in another text file.
3) Query example:
UPSERT INTO ACCT_HIST (ACCT_KEY) SELECT ACCT_KEY FROM ACCT_HIST WHERE ACCT_KEY IN (101000033333) AND REC_ACTV_IND = 'Y' AND DT_KEY < 20191009;
My Bash snippet is below, but to summarize the issue: sed was only replacing the values in the parentheses one key at a time, rather than placing both keys inside the same parentheses. (Edit: the snippet below is now working perfectly.)
#!/bin/bash
now=$(date +"%Y%m%d-%H:%M")
cp acct_transfer_soft_del_list.csv keyList_$now.txt
for key in $(<keyList_$now.txt)
do
    sed "s/([^)]*)/(${key})/3" hbase.txt >> queries_$now.txt
done
hbase.txt holds the queries, but I don't want to permanently change them, so I send the output to queries_$now.txt.
Please note that you have IFS=",".
This is (probably) splitting your key in an unwanted way.
I admit I'm not sure I understood entirely what you need, but I think you can use the first loop to get everything done.
Reusing your code, you can do something like this:
#!/bin/bash
now=$(date +"%Y%m%d-%H:%M")
IFS=","
while read f1 f2
do
    echo "$f1,$f2"
    sed "s/([^)]*)/($f1,$f2)/3" hbase.txt >> queries_$now.txt
done < acct_transfer_soft_del_list.csv > keyList_$now.txt
Anyway, I can't quite follow your while loop: it seems to do a simple copy of your file.
You could avoid it with a plain cp acct_transfer_soft_del_list.csv keyList_$now.txt.
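As a quick illustration of the IFS remark (with a made-up key line in the question's format): with IFS set to a comma, an unquoted expansion is split at each comma, so the two keys end up in separate loop iterations:

IFS=","
line="1002000222,1002000400"
for key in $line    # unquoted, so it is split at each ","
do
    echo "got: $key"
done
# got: 1002000222
# got: 1002000400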
I need to store the result of a Hive query in a variable whose value will be used later. So, something like:
$var = select col1 from table;
$var_to_used_later = $var;
All this is part of a bash shell script. How to form the query so as to get the desired result?
Hive should provide command-line support for this. I'm not familiar with Hive, but I found this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli; you can check whether it works for you.
Personally, I've used mysql to achieve a similar goal before. The command is:
mysql -u root -p`[script to generate the key]` -N -B -e "use XXXDB; select aaa, bbb, COUNT(*) from xxxtable where some_attribute='$CertainValue';"
I used the method shown here and got it working! Instead of calling a file as shown, I run the query directly and use the value stored in the variable.
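For reference, a minimal sketch of capturing a Hive result in a shell variable via command substitution (assuming the hive CLI is on the PATH; -S runs in silent mode so only the query result reaches stdout):

# Capture the single-value result of the query in a variable.
var=$(hive -S -e 'SELECT col1 FROM table LIMIT 1;')
var_to_used_later="$var"
echo "$var_to_used_later"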
I'm trying to make use of the DESCRIBE function via Hive to output the column descriptions of each of the tables out to individual files. I've discovered the -f option so I can just read from a file and write the output back out:
hive -f nameOfSqlQueryFile.sql > out.txt
However, if I open the output file, all the descriptions run together back to back, and it's unclear where one table's description starts and where it ends.
So, I've tried making a batch file that uses -e to describe each of the tables individually and output to a file:
#!/bin/bash
nameArr=( $(hive -e 'show tables;') )
count=0
for i in "${nameArr[@]}"
do
    echo 'Working on table('$count'): '$i
    hive -e 'describe '$i > $i'_.txt'
    count=$(($count+1))
done
However, because this needs to reconnect for each query, it's remarkably slow, taking hours to process several hundred queries.
Does anyone have an idea of how else I might run each of these DESCRIBE functions, and ideally output to separate files?
You can probably use one of these, depending on how you process the output:
Just use the OK line as a separator and search for it using a script.
Use DESCRIBE EXTENDED which adds a line at the end with info on the table, including its location, which can be used to extract the table name (using sed, for example)
If you're just using the output file as a manual reference, insert a SQL statement that prints a separator of your choice between each table, e.g.:
DESCRIBE table;
SELECT '-----------------' FROM table LIMIT 1;
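Building on the separator idea, one way to avoid reconnecting per table is to generate a single script containing every DESCRIBE plus a separator, run it once with -f, and split the combined output afterwards. A rough sketch (file names are placeholders; note the separator row only appears for non-empty tables):

# Generate one .sql file so hive connects only once.
for t in $(hive -e 'show tables;')
do
    echo "SELECT '----- $t -----' FROM $t LIMIT 1;"
    echo "DESCRIBE $t;"
done > all_describes.sql
hive -f all_describes.sql > all_describes.txt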
I have some HTML data stored in text files right now. I recently decided to store the HTML data in a pgsql database instead of flat files. Right now, the 'entries' table contains a 'path' column that points to the file. I have added a 'content' column that should now store the data in the file pointed to by 'path'. Once that is complete, the 'path' column will be deleted. The problem I am having is that the files contain apostrophes that throw my script out of whack. What can I do to correct this issue?
Here is the script:
#!/bin/sh
dbname="myDB"
username="username"
fileroot="/path/to/the/files/*"

for f in $fileroot
do
    psql $dbname $username -c "
    UPDATE entries
    SET content='`cat $f`'
    WHERE id=SELECT id FROM entries WHERE path LIKE '*`$f`';"
done
Note: The logic in the id=SELECT...FROM...WHERE path LIKE "" is not the issue. I have tested this with sample filenames in the pgsql environment.
The problem is that when I cat $f, any apostrophe in the contents of $f closes the SQL string, and I get a syntax error.
For the single quote escaping issue, a reasonable workaround might be to double the quotes, so you'd use:
`sed "s/'/''/g" < "$f"`
to include the file contents instead of using cat. For the second invocation, in the LIKE clause where you appear to have intended to use the file name, use:
${f/"'"/"''"/}
to include the literal string content of $f instead of executing it, and double the quotes. The ${varname/match/replace} expression is bash syntax and may not work in all shells; use:
`echo "$f" | sed "s/'/''/g"`
if you need to worry about other shells.
There are a bunch of other problems in that SQL.
You're trying to execute $f in your second invocation. I'm pretty sure you didn't intend that; I imagine you meant to include the literal string.
Your subquery is also wrong: it lacks parentheses; you need (SELECT ...), not just SELECT.
Your LIKE expression is also probably not doing what you intended; you probably meant % instead of *, since % is the SQL wildcard.
If I also change the backticks to $() (because it's clearer and easier to read, IMO), fix the subquery syntax, add an alias to disambiguate the columns, and pass the statement to psql's stdin as a here-document, the result is:
psql $dbname $username <<__END__
UPDATE entries
SET content='$(sed "s/'/''/g" < "$f")'
WHERE id=(SELECT e.id FROM entries e WHERE e.path LIKE '$(echo "$f" | sed "s/'/''/g")');
__END__
The above assumes you're using a reasonably modern PostgreSQL with standard_conforming_strings = on. If you aren't, change the regexp to escape apostrophes with \ instead of doubling them, and prefix the string with E, so O'Brien becomes E'O\'Brien'. In modern PostgreSQL it'd instead become 'O''Brien'.
In general, I'd recommend using a real scripting language like Perl with DBD::Pg or Python with psycopg to solve scripting problems with databases. Working with the shell is a bit funky. This expression would be much easier to write with a database interface that supported parameterised statements.
For example, I'd write this as follows:
import sys
import psycopg2

try:
    connstr = sys.argv[1]
    filename = sys.argv[2]
except IndexError as ex:
    print("Usage: %s connect_string filename" % sys.argv[0])
    print("Eg: %s \"dbname=test user=fred\" \"some_file\"" % sys.argv[0])
    sys.exit(1)

def load_file(connstr, filename):
    conn = psycopg2.connect(connstr)
    curs = conn.cursor()
    # Parameter order matters: the file contents fill the first %s (content),
    # the file name fills the %s inside the LIKE pattern.
    curs.execute("""
    UPDATE entries
    SET content = %s
    WHERE id = (SELECT e.id FROM entries e WHERE e.path LIKE '%%'||%s);
    """, (open(filename, "rb").read(), filename))
    curs.close()
    conn.commit()  # without a commit, psycopg2 rolls the UPDATE back

if __name__ == '__main__':
    load_file(connstr, filename)
Note that the SQL wildcard % is doubled to escape it, so it results in a single % in the final SQL. That's because psycopg2 uses % as its parameter placeholder marker, so a literal % must be doubled to escape it.
You can trivially modify the above script to accept a list of file names, connect to the database once, and loop over all the file names. That'll be a lot faster, especially if you do it all in one transaction. It's a real pain to do that with psql scripting; you have to use a bash co-process, as shown here ... and it isn't worth the hassle.
In the original post, I made it sound like there were apostrophes in the filename represented by $f. This was NOT the case, so a simple echo "$f" was able to fix my issue.
To make it clearer: the contents of my files were formatted as HTML snippets, typically something like <p>Blah blah <b>blah</b>...</p>. After trying the solution posted by Craig, I realized I had used single quotes in some anchor tags, and I did NOT want to change those to something else. There were only a few files where this occurred, so I just changed those to double quotes by hand. I also realized that instead of escaping the apostrophes, it would be better to convert them to the HTML entity &#39;. Here is the final script that I ended up using:
dbname="myDB"
username="username"
fileroot="/path/to/files/*"
for f in $fileroot
do
psql $dbname $username << __END__
UPDATE entries
SET content='$(sed "s/'/\'/g" < "$f")'
WHERE id=(SELECT e.id FROM entries e WHERE path LIKE '%$(echo "$f")');
__END__
done
The syntax highlighting here might make it look like the syntax is incorrect, but I have verified that it is correct as posted.