Bash - Iterate through SQLite3 DB - bash

Quick overview: I have an sqlite3 db which contains the following structure and data
Id|Name|Value
1|SomeName1|SomeValue1
2|SomeName2|SomeValue2
3|SomeName3|SomeValue3
(continuation of SomeValue3 in here, after ENTER)
The problem is with iterating through the "Value" column. I'm using this code:
records=(`sqlite3 database.db "SELECT Value FROM Values"`)
for record in "${records[@]}"; do
echo $record
done
The problem is there should be three values from that iteration, but it is showing four.
As a result I received:
1 step of loop - SomeValue1
2 step of loop - SomeValue2
3 step of loop - SomeValue3
4 step of loop - (continuation of SomeValue3 in here, after ENTER)
It should end at the third step and just show, with a line break, something like this:
3 step of loop - SomeValue3
(continuation of SomeValue3 in here, after ENTER)
Any suggestions on how I can handle this in bash?
Thank you in advance!

Instead of relying on word splitting to populate an array with the result of a command, it's much more robust to use the readarray builtin, or read a result at a time with a loop. Examples of both follow, using sqlite3's ascii output mode, where rows are separated by the byte 0x1E and columns in the rows by 0x1F. This allows the literal newlines in your data to be easily accepted.
#!/usr/bin/env bash
# The -d argument to readarray and read changes the end-of-line character
# from newline to, in this case, ASCII Record Separator
# Uses the `%q` format specifier to avoid printing the newline
# literally for demonstration purposes.
echo "Example 1"
readarray -d $'\x1E' -t rows < <(sqlite3 -batch -noheader -ascii database.db 'SELECT value FROM "Values"')
for row in "${rows[@]}"; do
printf "Value: %q\n" "$row"
done
echo "Example 2 - multiple columns"
while IFS=$'\x1F' read -d $'\x1E' -ra row; do
printf "Rowid: %d Value: %q\n" "${row[0]}" "${row[1]}"
done < <(sqlite3 -batch -noheader -ascii database.db 'SELECT rowid, value FROM "Values"')
outputs
Example 1
Value: SomeValue1
Value: SomeValue2
Value: $'SomeValue3\nand more'
Example 2 - multiple columns
Rowid: 1 Value: SomeValue1
Rowid: 2 Value: SomeValue2
Rowid: 3 Value: $'SomeValue3\nand more'
See Don't Read Lines With for for more on why your approach is bad.
Since VALUES is a SQL keyword, when using it as a table name (Don't do that!) it has to be escaped by double quotes.

Your problem here is the IFS (internal field separator) in Bash: the newline in the data is treated as a separator, so the for loop counts the continuation as a new record.
Your best option is to remove the linefeed in the SELECT statement from sqlite, e.g.:
records=(`sqlite3 database.db "SELECT replace(Value, '\n', '') FROM Values"`)
for record in "${records[#]}"; do
echo $record
done
Alternatively, you could change the IFS in Bash - but you are relying on linefeed as a separator between records.
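A minimal sketch of that alternative (it still relies on every record ending at a linefeed, so values with embedded newlines will still be split into separate records):
old_IFS=$IFS
IFS=$'\n'
records=( $(sqlite3 database.db "SELECT Value FROM \"Values\"") )
IFS=$old_IFS
for record in "${records[@]}"; do
echo "$record"
done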

Related

Read content of file line by line in unix using 'line'

I have a file - abc, which has the below content -
Bob 23
Jack 44
Rahul 36
I also have a shell script that does the addition of all the numbers here.
The specific line that picks up these numbers is -
while read line
do
num=`echo ${line#* }`
sum=`expr $sum + $num`
count=`expr $count + 1`
done< "$readfile"
I assumed that the code is just picking up the last field from the file, but it's not. If I modify the file like
Bob 23 12
Jack 44 23
Rahul 36 34
The same script fails with syntax error.
NOTE: I know there are other ways to pick up the field value, but I would like to know how this works.
The syntax ${line#* } removes the shortest match from the beginning up to the first space and returns the rest. It worked fine when you had just 2 columns, but it will not work when 3 columns are present, because it returns the last 2 column values, which throws an error when you use them in the sum expression. To see that, just imagine
str='foo bar'
printf '%s\n' "${str#* }"
bar
but imagine the same for 3 fields
str='foo bar foobar'
printf '%s\n' "${str#* }"
bar foobar
To fix that, use the parameter expansion syntax "${str##* }" to remove the longest matching prefix. To fix your script for the example with 3 columns, I would use a script like the one below.
This does a simple input redirection on the file and uses the read command with the default IFS value, which splits on whitespace. So I'm getting only the 3rd field on each line (even if it has more fields); the _ variables mark the fields I'm skipping. You could also use named placeholder variables and use their values in the script.
declare -i sum
while read -r _ _ value _ ; do
((sum+=value))
done < file
printf '%d\n' "$sum"
See Bash - Parameter Expansion (Substring removal) to understand more.
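To see both operators side by side on the same illustrative string:
str='foo bar foobar'
printf '%s\n' "${str#* }"    # shortest prefix removed: bar foobar
printf '%s\n' "${str##* }"   # longest prefix removed: foobar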
You could also use the PE syntax ${line##* } as below,
while read -r line ; do
((sum+=${line##* }))
done < file
[Not relevant to the current question]
If you just want the sum to be computed and are not specifically set on using a bash script for this, you can use a simple Awk command to sum up the values in the 3rd column:
awk '{sum+=$3}END{print sum}' inputfile

How can I select a sqlite column with multiple lines in bash?

I have a sqlite database table with three columns that is storing Name, Location, and Notes. It appears that everything is stored correctly, as when using the sqlite command line I see the correct number of columns and the data is grouped correctly.
The problem comes when using a bash script (this is a requirement) to access the data. The "Notes" column stores data that can potentially be multiple lines (with newlines and such). When I query this table, using something like the following:
stmt="Select name, location, notes from t1"
sqlite3 db "$stmt" | while read ROW;
do
name=`echo $ROW | awk '{split($0,a,"|"); print a[1]}'`
location=`echo $ROW | awk '{split($0,a,"|"); print a[2]}'`
notes=`echo $ROW | awk '{split($0,a,"|"); print a[3]}'`
done
I end up with everything normal until the first newline character in the notes column. After this, each note line is treated as a new row. What would be the correct way to handle this in bash?
Since the data is pipe separated, you can do this (untested): read each line into an array; check the size of the array
if 3 fields, then you have a row from the db, but the notes field may be incomplete. Do something with the previous row, which by now has a complete notes field.
if 1 field found, append the field value to the current notes field.
sqlite3 db "$stmt" | {
full_row=()
while IFS='|' read -ra row; do
if [[ ${#row[@]} -eq 3 ]]; then
# this line contains all 3 fields
if [[ ${#full_row[@]} -eq 0 ]]; then
: # "row" is the first row to be seen, nothing to do here
else
name=${full_row[0]}
location=${full_row[1]}
notes=${full_row[2]}
do_something_with "$name" "$location" "$notes"
#
# not necessary to use separate vars
# do_something_with "${full_row[@]}"
fi
# then store the current row with incomplete notes
full_row=( "${row[#]}" )
else
# only have notes.
full_row[2]+=" "${row[0]}
fi
done
}
You'd better take steps to ensure the notes field does not contain your field separator (|).
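One way to do that (a sketch using SQLite's replace(); the '/' stand-in character is arbitrary) is to strip the separator in the query itself:
stmt="Select name, location, replace(notes, '|', '/') from t1"
The rest of the loop above stays the same.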

Bash Columns SED and BASH Commands without AWK?

I wrote 2 different scripts but I am stuck at the same problem.
The problem is I am making a table from a file ($2) that I get in args, and $1 is the number of columns. It's a little bit hard to explain, but I will show you input and output.
The problem now is that I don't know how I can save every column in a different var so I can build my HTML code later:
#printf <TR><TD>$...</TD><TD>$...</TD><TD>$..</TD></TR><TD>$...
so input look like that :
Name\tSize\tType\tprobe
bla\t4711\tfile\t888888888
abcde\t4096\tdirectory\t5555
eeeee\t333333\tblock\t6666
aaaaaa\t111111\tpackage\t7777
sssss\t44444\tfile\t8888
bbbbb\t22222\tfolder\t9999
Code :
c=1
column=$1
file=$2
echo "$( < $file)"| while read Line ; do
Name=$(sed "s/\\\t/ /g" $file | cut -d' ' -f$c,-$column)
printf "$Name \n"
#let c=c+1
#printf "<TR><TD>$Name</TD><TD>$Size</TD><TD>$Type</TD></TR>\n"
exit 0
done
Output:
Name Size Type probe
bla 4711 file 888888888
abcde 4096 directory 5555
eeeee 333333 block 6666
aaaaaa 111111 package 7777
sssss 44444 file 8888
bbbbb 22222 folder 9999
This is a tailor-made job for awk. See this script:
awk -F'\t' '{printf "<tr>";for(i=1;i<=NF;i++) printf "<td>%s</td>", $i;print "</tr>"}' input
<tr><td>bla</td><td>4711</td><td>file</td><td>888888888</td></tr>
<tr><td>abcde</td><td>4096</td><td>directory</td><td>5555</td></tr>
<tr><td>eeeee</td><td>333333</td><td>block</td><td>6666</td></tr>
<tr><td>aaaaaa</td><td>111111</td><td>package</td><td>7777</td></tr>
<tr><td>sssss</td><td>44444</td><td>file</td><td>8888</td></tr>
<tr><td>bbbbb</td><td>22222</td><td>folder</td><td>9999</td></tr>
In bash:
celltype=th
while IFS=$'\t' read -a columns; do
rowcontents=$( printf '<%s>%s</%s>' "$celltype" "${columns[#]}" "$celltype" )
printf '<tr>%s</tr>\n' "$rowcontents"
celltype=td
done < <( sed $'s/\\\\t/\t/g' "$2")
Some explanations:
IFS=$'\t' read -a columns reads a line from standard input, using only the tab character to separate fields, and putting each field into a separate element of the array columns. We change IFS so that other whitespace, which could occur in a field, is not treated as a field delimiter.
On the first line read from standard input, <th> elements will be output by the printf line. After resetting the value of celltype at the end of the loop body, all subsequent rows will consist of <td> elements.
When setting the value of rowcontents, take advantage of the fact that printf reuses its format string as many times as necessary to consume all of its arguments.
Input is via process substitution from the sed command, which requires a crazy amount of quoting. First, the entire argument is quoted with $'...', which tells bash to replace escaped characters. bash converts this to the literal string s/\\t/^T/g, where I am using ^T to represent a literal ASCII 09 tab character. When sed sees this argument, it performs its own escape replacement, so the search text is a literal backslash followed by a literal t, to be replaced by a literal tab character.
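If you want to verify what sed actually receives, a quick check (assuming GNU cat, where -A shows tabs as ^I and line ends as $) is:
printf '%s\n' $'s/\\\\t/\t/g' | cat -A
s/\\t/^I/g$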
The first argument, the column count, is unnecessary and is ignored.
Normally, you avoid making the while loop part of a pipeline because you set parameters in the loop that you want to use later. Here, all the variables are truly local to the while loop, so you could avoid the process substitution and use a pipeline if you wish:
sed $'s/\\\\t/\t/g' "$2" | while IFS=$'\t' read -a columns; do
...
done

How to extract one column of a csv file

If I have a csv file, is there a quick bash way to print out the contents of only any single column? It is safe to assume that each row has the same number of columns, but each column's content would have different length.
You could use awk for this. Change '$2' to the nth column you want.
awk -F "\"*,\"*" '{print $2}' textfile.csv
Yes. cat mycsv.csv | cut -d ',' -f3 will print the 3rd column.
The simplest way I was able to get this done was to just use csvtool. I had other use cases for csvtool as well, and it handles quotes or delimiters appropriately if they appear within the column data itself.
csvtool format '%(2)\n' input.csv
Replacing 2 with the column number will effectively extract the column data you are looking for.
Landed here looking to extract from a tab separated file. Thought I would add.
cat textfile.tsv | cut -f2 -s
Where -f2 extracts field 2; fields are 1-indexed, so this is the second column.
Here is a csv file example with 2 columns
myTooth.csv
Date,Tooth
2017-01-25,wisdom
2017-02-19,canine
2017-02-24,canine
2017-02-28,wisdom
To get the first column, use:
cut -d, -f1 myTooth.csv
f stands for Field and d stands for delimiter
Running the above command will produce the following output.
Output
Date
2017-01-25
2017-02-19
2017-02-24
2017-02-28
To get the 2nd column only:
cut -d, -f2 myTooth.csv
And here is the output
Output
Tooth
wisdom
canine
canine
wisdom
Another use case:
Your csv input file contains 10 columns and you want columns 2 through 5 and column 8, using comma as the separator.
cut uses -f (meaning "fields") to specify columns and -d (meaning "delimiter") to specify the separator. You need to specify the latter because some files may use spaces, tabs, or colons to separate columns.
cut -f 2-5,8 -d , myvalues.csv
cut is a command utility and here is some more examples:
SYNOPSIS
cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]
I think the easiest is using csvkit:
Gets the 2nd column:
csvcut -c 2 file.csv
However, there's also csvtool, and probably a number of other csv bash tools out there:
sudo apt-get install csvtool (for Debian-based systems)
This would return a column with the first row having 'ID' in it.
csvtool namedcol ID csv_file.csv
This would return the fourth column:
csvtool col 4 csv_file.csv
If you want to drop the header row:
csvtool col 4 csv_file.csv | sed '1d'
First we'll create a basic CSV
[dumb@one pts]$ cat > file
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
Then we get the 1st column
[dumb@one pts]$ awk -F , '{print $1}' file
a
1
a
1
Many answers for this question are great, and some have even looked into the corner cases.
I would like to add a simple answer that can be of daily use, for the everyday files where you don't run into those corner cases (like escaped commas or commas inside quotes).
FS (Field Separator) is the variable whose value defaults to a space, so awk by default splits each line at spaces.
Using BEGIN (executed before reading the input) we can set this variable to anything we want...
awk 'BEGIN {FS = ","}; {print $3}'
The above code will print the 3rd column in a csv file.
The other answers work well, but since you asked for a solution using just the bash shell, you can do this:
AirBoxOmega:~ d$ cat > file #First we'll create a basic CSV
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
a,b,c,d,e,f,g,h,i,k
1,2,3,4,5,6,7,8,9,10
And then you can pull out columns (the first in this example) like so:
AirBoxOmega:~ d$ while IFS=, read -a csv_line;do echo "${csv_line[0]}";done < file
a
1
a
1
a
1
a
1
a
1
a
1
So there's a couple of things going on here:
while IFS=, - this is saying to use a comma as the IFS (Internal Field Separator), which is what the shell uses to know what separates fields (blocks of text). So saying IFS=, is like saying "a,b" is the same as "a b" would be if the IFS=" " (which is what it is by default.)
read -a csv_line; - this is saying read each line, one at a time, split it into an array called "csv_line", and send that to the "do" section of our while loop
do echo "${csv_line[0]}";done < file - now we're in the "do" phase, and we're saying echo the 0th element of the array "csv_line". This action is repeated on every line of the file. The < file part is just telling the while loop where to read from. NOTE: remember, in bash, arrays are 0 indexed, so the first column is the 0th element.
So there you have it, pulling out a column from a CSV in the shell. The other solutions are probably more practical, but this one is pure bash.
You could use GNU Awk, see this article of the user guide.
As an improvement to the solution presented in the article (in June 2015), the following gawk command allows double quotes inside double quoted fields; a double quote is marked by two consecutive double quotes ("") there. Furthermore, this allows empty fields, but even this can not handle multiline fields. The following example prints the 3rd column (via c=3) of textfile.csv:
#!/bin/bash
gawk -- '
BEGIN{
FPAT="([^,\"]*)|(\"((\"\")*[^\"]*)*\")"
}
{
if (substr($c, 1, 1) == "\"") {
$c = substr($c, 2, length($c) - 2) # Get the text within the two quotes
gsub("\"\"", "\"", $c) # Normalize double quotes
}
print $c
}
' c=3 < <(dos2unix <textfile.csv)
Note the use of dos2unix to convert possible DOS style line breaks (CRLF i.e. "\r\n") and UTF-16 encoding (with byte order mark) to "\n" and UTF-8 (without byte order mark), respectively. Standard CSV files use CRLF as line break, see Wikipedia.
If the input may contain multiline fields, you can use the following script. Note the use of special string for separating records in output (since the default separator newline could occur within a record). Again, the following example prints the 3rd column (via c=3) of textfile.csv:
#!/bin/bash
gawk -- '
BEGIN{
RS="\0" # Read the whole input file as one record;
# assume there is no null character in input.
FS="" # Suppose this setting eases internal splitting work.
ORS="\n####\n" # Use a special output separator to show borders of a record.
}
{
nof=patsplit($0, a, /([^,"\n]*)|("(("")*[^"]*)*")/, seps)
field=0;
for (i=1; i<=nof; i++){
field++
if (field==c) {
if (substr(a[i], 1, 1) == "\"") {
a[i] = substr(a[i], 2, length(a[i]) - 2) # Get the text within
# the two quotes.
gsub(/""/, "\"", a[i]) # Normalize double quotes.
}
print a[i]
}
if (seps[i]!=",") field=0
}
}
' c=3 < <(dos2unix <textfile.csv)
There is another approach to the problem. csvquote can output the contents of a CSV file with special characters within fields transformed, so that the usual Unix text-processing tools can be used to select a certain column. For example, the following code outputs the third column:
csvquote textfile.csv | cut -d ',' -f 3 | csvquote -u
csvquote can be used to process arbitrary large files.
I needed proper CSV parsing, not cut / awk and prayer. I'm trying this on a mac without csvtool, but macs do come with ruby, so you can do:
echo "require 'csv'; CSV.read('new.csv').each {|data| puts data[34]}" | ruby
I wonder why none of the answers so far have mentioned csvkit.
csvkit is a suite of command-line tools for converting to and working
with CSV
csvkit documentation
I use it exclusively for csv data management and so far I have not found a problem that I could not solve using csvkit.
To extract one or more columns from a csv file you can use the csvcut utility that is part of the toolbox. To extract the second column use this command:
csvcut -c 2 filename_in.csv > filename_out.csv
csvcut reference page
If the strings in the csv are quoted, add the quote character with the q option:
csvcut -q '"' -c 2 filename_in.csv > filename_out.csv
Install with pip install csvkit or sudo apt install csvkit.
Simple solution using awk. Instead of "colNum" put the number of the column you need to print.
cat fileName.csv | awk -F ";" '{ print $colNum }'
csvtool col 2 file.csv
where 2 is the column you are interested in
you can also do
csvtool col 1,2 file.csv
to do multiple columns
You can't do it without a full CSV parser.
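A quick illustration with a made-up quoted record shows why naive splitting is not enough: the comma inside the quotes is still treated as a field separator.
printf '%s\n' '"Doe, John",42' | cut -d, -f1
"Doe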
If you know your data will not be quoted, then any solution that splits on , will work well (I tend to reach for cut -d, -f1 | sed 1d), as will any of the CSV manipulation tools.
If you want to produce another CSV file, then xsv, csvkit, csvtool, or other CSV manipulation tools are appropriate.
If you want to extract the contents of one single column of a CSV file, unquoting them so that they can be processed by subsequent commands, this Python 1-liner does the trick for CSV files with headers:
python -c 'import csv,sys'$'\n''for row in csv.DictReader(sys.stdin): print(row["message"])'
The "message" inside of the print function selects the column.
If the CSV file doesn't have headers:
python -c 'import csv,sys'$'\n''for row in csv.reader(sys.stdin): print(row[1])'
Python's CSV library supports all kinds of CSV dialects, so if your CSV file uses different conventions, it's possible to support them with relatively little change to the code.
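For example (a sketch, assuming a semicolon-delimited file), csv.reader accepts a delimiter argument:
python -c 'import csv,sys'$'\n''for row in csv.reader(sys.stdin, delimiter=";"): print(row[1])'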
Been using this code for a while, it is not "quick" unless you count "cutting and pasting from stackoverflow".
It uses ${##} and ${%%} operators in a loop instead of IFS. It calls 'err' and 'die', and supports only comma, dash, and pipe as SEP chars (that's all I needed).
err() { echo "${0##*/}: Error:" "$@" >&2; }
die() { err "$@"; exit 1; }
# Return Nth field in a csv string, fields numbered starting with 1
csv_fldN() { fldN , "$1" "$2"; }
# Return Nth field in string of fields separated
# by SEP, fields numbered starting with 1
fldN() {
local me="fldN: "
local sep="$1"
local fldnum="$2"
local vals="$3"
case "$sep" in
-|,|\|) ;;
*) die "$me: arg1 sep: unsupported separator '$sep'" ;;
esac
case "$fldnum" in
[0-9]*) [ "$fldnum" -gt 0 ] || { err "$me: arg2 fldnum=$fldnum must be a number greater than 0."; return 1; } ;;
*) { err "$me: arg2 fldnum=$fldnum must be number"; return 1;} ;;
esac
[ -z "$vals" ] && err "$me: missing arg2 vals: list of '$sep' separated values" && return 1
fldnum=$(($fldnum - 1))
while [ $fldnum -gt 0 ] ; do
vals="${vals#*$sep}"
fldnum=$(($fldnum - 1))
done
echo ${vals%%$sep*}
}
Example:
$ CSVLINE="example,fields with whitespace,field3"
$ for fno in $(seq 3); do echo field$fno: $(csv_fldN $fno "$CSVLINE"); done
field1: example
field2: fields with whitespace
field3: field3
You can also use while loop
IFS=,
while read name val; do
echo "............................"
echo Name: "$name"
done<itemlst.csv

Storing CHAR or CLOB sqlplus columns into a shell script variable

I'm having trouble storing column values into shell script variables when these include white spaces, since all the results are split on whitespaces instead of actual column values.
For example, this is what I got now:
set -A SQL_RESULTS_ARRAY `sqlplus -s un/pass@database << EOF
SET ECHO OFF
SET FEED OFF
SET HEAD OFF
SET SPACE 0
SELECT EMAIL_SUBJECT, MAIL_TO FROM EMAIL_TABLE;
EOF`
echo "${SQL_RESULTS_ARRAY[0]}"
echo "${SQL_RESULTS_ARRAY[1]}"
This doesn't work because the value of EMAIL_SUBJECT is an entire sentence, i.e. "Message subject test", so those echoes just end up printing
Message
subject
Instead of
Message subject test
email1@email.com email2@email.com
Basically, how do I end up with only two items in the array (one per column), instead of five items (one per word)? Is this at all possible with a single connection? (I'd rather not start a new connection per column)
EDIT: Another thing, another one of my CLOB columns is EMAIL_BODY, which can basically be any text-- thus I'd rather not have a preset separator, since EMAIL_BODY can have all sorts of commas, pipes, new lines, etc...
The key you're missing is to set the shell's IFS (internal field separator) to be the same as your query results. Here's a ksh session:
$ results="Message subject test,email1#email.com email2#email.com"
$ set -A ary $results
$ for i in 0 1 2 3 4; do print "$i. ${ary[$i]}"; done
0. Message
1. subject
2. test,email1@email.com
3. email2@email.com
4.
$ IFS=,
$ set -A ary $results
$ for i in 0 1 2 3 4; do print "$i. ${ary[$i]}"; done
0. Message subject test
1. email1@email.com email2@email.com
2.
3.
4.
You'll probably want to do something like this:
results=`sqlplus ...`
old_IFS="$IFS"
IFS=,
set -A SQL_RESULTS_ARRAY $results
IFS="$old_IFS
print "${SQL_RESULTS_ARRAY[0]}"
print "${SQL_RESULTS_ARRAY[1]}"
You may try to set COLSEP and split on its value.
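A rough sketch of that idea, reusing the connection details from the question and assuming '|' never occurs in the data (note that sqlplus pads columns, so you may still need to trim whitespace):
results=`sqlplus -s un/pass@database << EOF
SET ECHO OFF
SET FEED OFF
SET HEAD OFF
SET COLSEP '|'
SELECT EMAIL_SUBJECT, MAIL_TO FROM EMAIL_TABLE;
EOF`
old_IFS="$IFS"
IFS='|'
set -A SQL_RESULTS_ARRAY $results
IFS="$old_IFS"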
Try adding double quotes using string concatenation in the select statement. Array elements that are quoted permit white space (at least in bash).
Read up about bash's "Internal Field Separator", $IFS.
It is set to whitespace by default, which may be causing your problem.
