I would like to make an assertion about the output of a Postgres query using bash. Concretely, I am writing a bash job that counts the number of rows and, if the count is not equal to zero, does something to raise an alert.
$ psql MY_DATABASE -c "SELECT COUNT(*) WHERE foo=bar"
count
-------
0
(1 row)
In my script, I would like to assert that the output of the above query is zero. However, I am not sure where to begin, because the output is not a number but a formatted multi-line string.
Is there an option in psql that makes it output a single number when counting, or can you think of any other approaches?
I would suggest using a temporary file to capture the output. Once your work is done, delete the temp file.
psql your_database -c "SELECT COUNT(*) AS count FROM table_a WHERE c1=something" -t > assert.tmp
line=$(head -n 1 assert.tmp | tr -d ' ')    # trim the leading whitespace that psql -t leaves
if [ "$line" -gt 0 ]; then
    echo "greater than 0 and the value is $line"
fi
rm assert.tmp
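For the original question, psql can also print the bare number directly, with no temp file at all. A minimal sketch, assuming the real query names a table (my_table and foo here are placeholders, since the question's query has no FROM clause): -t prints tuples only, -A disables aligned output, and -X skips any ~/.psqlrc.
count=$(psql -X -A -t MY_DATABASE -c "SELECT COUNT(*) FROM my_table WHERE foo = 'bar'")
if [ "$count" -ne 0 ]; then
    echo "alert: row count is $count" >&2    # raise your alert here
fi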
Hope it works for you.
Linux, Bash Script.
I have created a query as below, which works
sqlcmd -S [dbname].database.windows.net -d [database name] -U [username] -P [password] -Q "SELECT * FROM dbo.microiopenvpn WHERE mode='create'"
How do I then loop over the results?
The table has name, email_address & mode fields.
I just want the email_address field (for this query)
Figured it out - one way I guess
while IFS="," read -r d1 d2 d3 d4 d5 d6; do
echo "$d1" "$d2" "$d4"
done < sqloutputNew.csv
Get the columns from each row, then iterate - that seems to be the best solution I could find.
The CSV comes from running sqlcmd with the following output options:
-o "sqloutputNew.csv" -W -w 1024 -s"," -h-1
I'm trying to download a website that uses semi-predictable URLs, meaning each URL ends with a random five-character alphanumeric string. I created a file of the candidate strings with crunch, using the command:
crunch 5 5 abcdefghijklmnopqrstuvwxyz123456789 > possible_links
Then I created a bash file to call the lines and wget the links:
#!/bin/bash
FILE=possible_links
while read line; do
wget -q --wait=20 www.ghostbin.com/paste/${line}
done < $FILE
but obviously it is going to go to aaaaa, then aaaab, then aaaac, aaaad, and so on. Is there a way to make it go through the lines in random order?
Use mktemp's --dry-run option:
#!/bin/bash
while true    # or specify a count, e.g. while [ "$count" -le 20 ]
do
    rand_str="$(mktemp --dry-run XXXXX)"    # 5 Xs for five random characters
    wget -q --wait=20 "www.ghostbin.com/paste/${rand_str}"
    # if you use a count, increment it here with ((count++)), else the loop is infinite
done
General solution (for n random characters):
str=$(printf "%-10s" "X")    # here n=10; pads "X" with 9 trailing spaces
while condition
do
    rand_str=$(mktemp --dry-run "${str// /X}")    # replaces each padding space with an X
    # ...
done
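Since the possible_links file from crunch already exists, another sketch worth considering: shuf walks its lines in random order without repeats, so the original list is still covered exhaustively.
shuf possible_links | while read -r line; do
    wget -q --wait=20 "www.ghostbin.com/paste/${line}"
done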
So I have two files, File A and File B. File A is huge (>60 GB): it has 16 fields per line, a mix of numeric and string values, is separated by "|", and has over 600,000,000 lines. Field 3 in this file is the ID; it is a numeric field with varying lengths (e.g., someone's ID can be 1, and someone else's can be 100).
File B just has a bunch of IDs (~1,000,000), and I want to extract all the rows from File A that have an ID that is in File B. I started doing this on Linux with the following code:
sort -k3,3 -t'|' FileA.txt > FileASorted.txt
sort -k1,1 -t'|' FileB.txt > FileBSorted.txt
join -1 3 -2 1 -t'|' FileASorted.txt FileBSorted.txt > merged.txt
The problem I have is that merged.txt is empty (when I know for a fact there are at least 10 matches). I have googled this, and it seems like the issue is that the join field (the ID) is numeric. Some people propose padding the field with zeros, but 1) I'm not entirely sure how to do this, and 2) it seems very slow/time-inefficient.
Any other ideas out there? Or help on how to add the zero-padding only to the relevant field?
I would first sort file b using the unique flag (-u):
sort -u file.b > sortedfile.b
Then loop through sortedfile.b and grep file.a for each entry. In zsh I would do:
foreach C (`cat sortedfile.b`)
    grep "$C" file.a > /dev/null
    if [ $? -eq 0 ]; then
        echo "$C" >> res.txt
    fi
end
Redirect the output from grep to /dev/null and test whether there was a match ($? -eq 0); if so, append (>>) that ID to res.txt.
A single > would overwrite the file. I'm a bit rusty at zsh now, so there might be a typo. You may be using bash, which has a slightly different loop syntax.
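At this scale, though, one grep per ID over a 60 GB file will be very slow. A hedged alternative: a single awk pass, assuming field 3 of FileA.txt is the ID and FileB.txt holds one ID per line; the IDs are compared as plain strings, so no zero-padding is needed. (If you stay with sort and join, note that both are locale-sensitive; running them under LC_ALL=C is a common fix for an empty result.)
awk -F'|' 'NR==FNR { ids[$1]; next } $3 in ids' FileB.txt FileA.txt > merged.txt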
How can I get the row count from all tables using Hive? I am interested in the database name, table name and row count.
You will need to do a
select count(*) from table
for all tables.
To automate this, you can write a small bash script plus a few commands.
First run
$hive -e 'show tables' | tee tables.txt
This stores all the tables in the database in a text file, tables.txt.
Create a bash file (count_tables.sh) with the following contents:
while read line
do
    echo "$line "
    eval "hive -e 'select count(*) from $line'"
done
Now run the following commands.
$chmod +x count_tables.sh
$./count_tables.sh < tables.txt > counts.txt
This creates a text file (counts.txt) with the counts of all the tables in the database.
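If you'd rather skip the intermediate files, the two steps above can be piped together; a sketch reusing only the hive commands already shown:
hive -e 'show tables' | while read -r line
do
    echo "$line"
    hive -e "select count(*) from $line"
done > counts.txt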
A much faster way to get an approximate count of all rows in a table is to run explain on the table. In one of the explain clauses, it shows row counts like below:
TableScan [TS_0] (rows=224910 width=78)
The benefit is that you are not actually spending cluster resources to get that information.
The HQL command is explain select * from table_name; but if the table's statistics have not been computed, the TableScan clause does not show row counts.
select count(*) from table
I think there is no more efficient way.
You can collect statistics on the table by using the Hive ANALYZE command. Hive's cost-based optimizer makes use of these statistics to create an optimal execution plan.
Below is the example of computing statistics on Hive tables:
hive> ANALYZE TABLE stud COMPUTE STATISTICS;
Query ID = impadmin_20171115185549_a73662c3-5332-42c9-bb42-d8ccf21b7221
Total jobs = 1
Launching Job 1 out of 1
…
Table training_db.stud stats: [numFiles=5, numRows=5, totalSize=50, rawDataSize=45]
OK
Time taken: 8.202 seconds
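Once statistics have been computed, the row count can be read back from the table metadata without scanning any data. A small sketch, reusing the table from the example above (numRows shows up among the table parameters):
hive -e 'DESCRIBE FORMATTED training_db.stud' | grep numRows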
Links:
http://dwgeek.com/apache-hive-explain-command-example.html/
You can also set the database in the same command, separating the statements with ;.
hive -e 'use myDatabase;show tables'
Try this to automate it - put it in a shell script, then run bash filename.sh:
hive -e "select count(distinct fieldid) from table1 where extracttimestamp<'2018-04-26'" > sample.out
hive -e "select count(distinct fieldid) from table2 where day='26'" >> sample.out    # append, so both counts land in sample.out
lc=$(cat sample.out | uniq | wc -l)
if [ $lc -eq 1 ]; then
echo "PASS"
else
echo "FAIL"
fi
How do I mention the specific database that it needs to refer to in the below snippet?
while read line
do
    echo "$line "
    eval "hive -e 'select count(*) from $line'"
done
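One way, based on the earlier comments in this thread (myDatabase is a placeholder): add a use statement inside the same -e string, so each count runs against that database.
eval "hive -e 'use myDatabase; select count(*) from $line'"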
Here's a solution I wrote that uses Python:
import os

dictTabCnt = {}

print("=====Finding Tables=====")
tableList = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; show tables;\"").read().split('\n')

print("=====Finding Table Counts=====")
for i in tableList:
    if i != '':    # '<>' is Python 2-only syntax; '!=' works everywhere
        strTemp = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; SELECT COUNT(*) FROM {}\"".format(i)).read()
        dictTabCnt[i] = strTemp

print("=====Table Counts=====")
for table, cnt in dictTabCnt.items():
    print("{}: {}".format(table, cnt))
Thanks to @mukul_gupta for providing the shell script.
However, we are encountering the below error for the same:
"bash syntax error near unexpected token done"
The solution for this is at the below link:
BASH Syntax error near unexpected token 'done'
Also, if anyone needs to select a DB name:
$hive -e 'use databasename;show tables' | tee tables.txt
To pass the DB name in the select statement, give the DB name in the tables list file itself.
I'm trying to use a Bash script to run a large number of calculations (just over 2 million) using a terminal-based program called uvspec. But I've hit a serious barrier following the latest addition to the calculation...
The script opens an input file which has 2e6 lines looking like this:
0 66.3426 -9.999 -9999
0 66.6192 -9.999 -9999
0 61.9212 1.655 1655
0 61.9999 1.655 1655
...
Each of these values represents a different value I want to substitute into the input file (using sed), so I read each line into an array. Many of these lines contain negative values in the 4th column (e.g., -9999), which cause errors in the program, so I would like to omit those lines and write a standard output instead - I'm doing this with the if statement. The problem is that something terribly wrong is coming out of my output, and I'm 99.9% sure the mistake is in the following script, as I'm fairly new to bash.
Can anyone spot anything here that doesn't make sense or is bad syntax?
Any comments on the script in general would also be useful feedback.
cat ".../Maps/dniinput" | while IFS=$' ' read -r -a myArray
do
if [ "${myArray[3]}" -gt 0 ]
then
sed s/TAU/"${myArray[0]}"/ x.template x.template > a.template
sed s/SZA/"${myArray[1]}"/ a.template a.template > b.template
sed s/ALT/"${myArray[2]}"/ b.template b.template > x.inp
../bin/uvspec < x.inp >> dni.out
else
echo "0 -9999" >> dnijul.out
fi
done
Sed can do all three substitutions in one go, and you can pipe the output straight into your analysis program without creating any intermediate a.template or b.template files:
sed -e "s/.../.../" -e "s/.../.../" -e "s/.../.../" x.template | ../bin/uvspec
By the way, you can also get rid of the "cat" at the start, and replace your array with variables whose names better match what they are, if you use a loop like this:
while IFS=$' ' read -r tau sza alt p4
do
    echo $tau $sza $alt $p4
done < a
0 66.3426 -9.999 -9999
0 66.6192 -9.999 -9999
0 61.9212 1.655 1655
0 61.9999 1.655 1655
I named the fourth element "p4" because you refer to the 4th one as the altitude in your comment, but in your code you replace the word "ALT" with the third column - so I am not really sure what your parameters are, but you should hopefully get the idea from the example above.
You might want to combine those "sed" lines into something more like:
sed -e "s/TAU/${myArray[0]}/" -e "s/SZA/${myArray[1]}/" \
-e "s/ALT/${myArray[2]}/" < x.template \
| ../bin/uvspec >> dni.out
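Putting the pieces together, a sketch of the whole loop using the asker's filenames (the input path stays abbreviated as in the question):
while IFS=$' ' read -r tau sza alt p4
do
    if [ "$p4" -gt 0 ]; then
        sed -e "s/TAU/$tau/" -e "s/SZA/$sza/" -e "s/ALT/$alt/" x.template \
            | ../bin/uvspec >> dni.out
    else
        echo "0 -9999" >> dnijul.out
    fi
done < ".../Maps/dniinput"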