Bash scripting: how to parse a string separated with :

I have lines that look like these
value: "15"
value: "20"
value: "3"
I get these lines as piped input after grepping:
... | grep value:
What I need is a simple bash script that takes this piped input and produces the sum
15 + 20 + 3
So my command will be:
... | grep value: | calculate_sum_value > /tmp/sum.txt
sum.txt should contain a single number which is the sum.
How can I do this with bash? I have no experience with bash at all.

You could try awk. Something like this should work
... | grep value: | awk '{sum+=$2}END{print sum}'
And you could possibly avoid grep altogether like this
.... | awk '/^value:/{sum+=$2}END{print sum}'
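Update:
The quoted values need one more step: with the default whitespace separator, $2 is "15" including the quotes, which awk evaluates as 0. You can make the " character the field separator with the -F option, so that $2 is the bare number.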
... | awk -F\" '/^value:/{sum+=$2}END{print sum}'
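To get the exact command line from the question, the awk one-liner can be wrapped in a shell function. This is only a sketch: calculate_sum_value is the name the question uses, and the sample input stands in for whatever the real pipeline produces.

```shell
#!/usr/bin/env bash
# calculate_sum_value: read lines such as  value: "15"  on stdin and print their sum.
# Splitting on the double quote makes $2 the bare number between the quotes.
calculate_sum_value() {
    awk -F'"' '/^value:/ { sum += $2 } END { print sum + 0 }'
}

# Sample input standing in for "... | grep value:"
printf 'value: "15"\nvalue: "20"\nvalue: "3"\n' | calculate_sum_value
```

The "+ 0" in the END block makes awk print 0 rather than an empty line when no input line matches.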

My first try was to grab the stuff on the right of the colon and let bash sum it:
$ sum=0
$ cat sample.txt | while IFS=: read key value; do ((sum += value)); done
bash: ((: "15": syntax error: operand expected (error token is ""15"")
bash: ((: "20": syntax error: operand expected (error token is ""20"")
bash: ((: "3": syntax error: operand expected (error token is ""3"")
0
So, have to remove the quotes. Fine, use a fancy Perl regex to extract the first set of digits to the right of the colon:
$ cat sample.txt | grep -oP ':\D+\K\d+'
15
20
3
OK, onwards:
$ cat sample.txt | grep -oP ':\D+\K\d+' | while read n; do ((sum+=n)); done; echo $sum
0
Huh? Oh yeah, running while in a pipeline puts the modifications to sum in a subshell, not in the current shell. Well, do the echo in the subshell too:
$ cat sample.txt | grep -oP ':\D+\K\d+' | { while read n; do ((sum+=n)); done; echo $sum; }
38
That's better, but still the value is not in the current shell. Let's try something trickier
$ set -- $(cat sample.txt | grep -oP ':\D+\K\d+')
$ sum=$(IFS=+; bc <<< "$*")
$ echo $sum
38
And yes, UUOC, but it's a placeholder for whatever the OP's pipeline was.
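Another way to keep the result in the current shell is process substitution, which feeds the loop through a redirection instead of a pipe. A minimal sketch, assuming bash, with printf standing in for the real pipeline and grep -oE '[0-9]+' swapped in for the Perl regex so that it does not depend on grep -P:

```shell
#!/usr/bin/env bash
sum=0
# The while loop runs in the current shell because it reads from a
# process substitution rather than sitting on the right side of a pipe.
while read -r n; do
    (( sum += n ))
done < <(printf 'value: "15"\nvalue: "20"\nvalue: "3"\n' | grep -oE '[0-9]+')
echo "$sum"
```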

Related

How can I use 'echo' output as an operand for the 'seq' command within a terminal?

I have an exercise where I need to sum together every digit up to a given number, like this:
Suppose I have the number 12, I need to do 1+2+3+4+5+6+7+8+9+1+0+1+1+1+2.
(numbers past 9 are split up into their separate digits eg. 11 = 1+1, 234 = 2+3+4, etc.)
I know I can just use:
seq -s '' 12
which outputs 123456789101112, then put a '+' between the digits and pipe the result to 'bc', BUT I specifically have to do:
echo 12 | ...
as the first step (because the online IDE fills it in as the unchangeable first step for every testcase) and when I do this I start to have problems with seq
I tried
echo 12 | seq -s '' $1
### or just ###
echo 12 | seq -s ''
but I can't get it to work: seq just reports a missing operand error (I assume because I'm in the terminal, not a script, so the '12' isn't assigned to $1). Any recommendations on how to get seq to interpret the 12 from echo as its operand, or alternative ways to go?
seq -s '' $(cat)
full solution:
echo "12" | seq -s '' $(cat) | sed 's/./&+/g; s/$/0/' | bc
Or
echo 12 | { echo $(( $({ seq -s '' $(< /dev/stdin); echo; } | sed -E 's/([[:digit:]])/\1+/g; s/$/0/') )); }
without sed:
d=$(echo 12 | { seq -s '' $(< /dev/stdin); echo; }); echo $(( "${d//?/&+}0" ))
echo 12 | awk '{
cnt=0
for(i=1;i<=$1;i++) {
cnt+=i
printf("%s%s",i,i<$1?"+":"=")
}
print cnt
}'
Prints:
1+2+3+4+5+6+7+8+9+10+11+12=78
If it is supposed to be just the digits added up:
echo 12 | awk '{s=""
for(i=1;i<=$1;i++) s=s i
split(s,ch,"")
for(i=1;i<=length(ch); i++) cnt+=ch[i]
print cnt
}'
51
Or as a single pipeline (note that seq itself is not specified by POSIX):
$ echo 12 | seq -s '' "$(cat)" | sed -E 's/([0-9])/\1+/g; s/$/0/' | bc
51
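For comparison, the same digit sum can be computed with bash builtins only, without seq, sed or bc. This is a sketch rather than one of the answers above; digit_sum_upto is a hypothetical name, and it reads the limit from stdin because the exercise fixes echo 12 | as the first step.

```shell
#!/usr/bin/env bash
# digit_sum_upto: read a number N on stdin and print the sum of all digits
# of the numbers 1..N (so 10 contributes 1+0, 11 contributes 1+1, ...).
digit_sum_upto() {
    local limit n i total=0
    read -r limit
    for (( n = 1; n <= limit; n++ )); do
        # Walk over the decimal digits of n, one character at a time.
        for (( i = 0; i < ${#n}; i++ )); do
            (( total += ${n:i:1} ))
        done
    done
    echo "$total"
}

echo 12 | digit_sum_upto
```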

variable error in bash when doing calculation

I assigned output of piping into a variable, but when I try to use the variable to do math, it won't allow me:
%%bash
cd /data/ref/
grep -v ">" EN | wc -c > ref
cat ref
cd /example/
grep -v ">" SR | wc -l > sample
cat sample
echo $((x= cat sample, y= cat ref, u=x/y, z=u*100))
I get this error:
41858
38986
bash: line 7: x= cat sample, y= cat ref, u=x/y, z=u*100: syntax error in expression (error token is "sample, y= cat ref, u=x/y, z=u*100"
You received that error because you passed an invalid arithmetic expression into a bash arithmetic expansion. Only an arithmetic expression is allowed there, not a command such as cat sample. What you are trying to do seems to be this:
ref="$(grep -v ">" /data/ref/EN | wc -c)"
sample="$(grep -v ">" /example/SR | wc -l)"
# this is only integer division
#u=$(( sample / ref ))
#z=$(( 100 * u ))
# to do math calculations, you can use bc
u=$(bc <<< "scale=2; $sample/$ref")
z=$(bc <<< "scale=2; 100*$u")
printf "%d, %d, %.2f, %.2f\n" "$ref" "$sample" "$u" "$z"
so hopefully you get an output like this:
41858, 38986, 0.93, 93.00
Notes:
There is no need to cd before executing a grep, it accepts the full path with the target filename as an argument. So without changing directory, you can grep various locations.
In order to save the output of your command (which is only a number) you don't need to save it in a file and cat the file. Just use the syntax var=$( ) and var will be assigned the output of this command substitution.
Keep in mind that / evaluates to 0 for the division 38986/41858, because $(( )) performs integer division. For calculations with decimals, use bc as in the code above.
To print the results, use the shell builtin printf. Here the last two numbers are formatted with two decimal places.
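If bc is not available, a percentage with two decimals can still be obtained from integer arithmetic by scaling before dividing. A rough sketch, using the two counts from the question's output; percent is a hypothetical helper name, and the result is truncated rather than rounded:

```shell
#!/usr/bin/env bash
# percent PART WHOLE: print PART/WHOLE as a percentage with two decimals,
# using only integer arithmetic (multiply first, then divide).
percent() {
    local part=$1 whole=$2 scaled
    scaled=$(( part * 10000 / whole ))      # percentage times 100
    printf '%d.%02d\n' "$(( scaled / 100 ))" "$(( scaled % 100 ))"
}

percent 38986 41858
```

This prints 93.13 rather than 93.00, because the ratio is not first rounded to two decimals before being multiplied by 100.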

Why does my bash script flag this awk substring command as a syntactic error when it works in the terminal?

I'm trying to extract a list of dates from a series of links using lynx's dump function and piping the output through grep and awk. This operation works successfully in the terminal and outputs dates accurately. However, when it is placed into a shell script, bash claims a syntax error:
Scripts/ETC/PreD.sh: line 18: syntax error near unexpected token `('
Scripts/ETC/PreD.sh: line 18: ` lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt'
For context, this is part of a while-read loop in which $link is being read from a file. Operations undertaken inside this while-loop when the awk command is removed are all successful, as are similar while-loops that include other awk commands.
I know that either I'm misunderstanding how bash handles variable substitution, or how bash handles awk commands, or some combination of the two. Any help would be immensely appreciated.
EDIT: Shellcheck is divided on this, the website version finds no error, but my downloaded version provides error SC1083, which says:
This { is literal. Check expression (missing ;/\n?) or quote it.
A check on the Shellcheck GitHub page provides this:
This error is harmless when the curly brackets are supposed to be literal, in e.g. awk {'print $1'}.
However, it's cleaner and less error prone to simply include them inside the quotes: awk '{print $1}'.
Script follows:
#!/bin/bash
while read -u 4 link
do
IFS=/ read a b c d e <<< "$link"
echo "$e" >> 1.txt
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt
done 4< links.txt
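In the sed command, the expression leaves and re-enters the single quotes ('s/\(.*\),/\1'\ and'/'), which is easy to get wrong. Keep the whole expression inside one pair of quotes: 's/\(.*\),/\1 and/'.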
In the awk script you have a constant zero-length variable.
From gawk manual:
substr(string, start [, length ])
Return a length-character-long substring of string, starting at character number start. The first character of a string is character number one. For example, substr("washington", 5, 3) returns "ing".
If length is not present, substr() returns the whole suffix of string that begins at character number start. For example, substr("washington", 5) returns "ington". The whole suffix is also returned if length is greater than the number of characters remaining in the string, counting from character start.
If start is less than one, substr() treats it as if it was one. (POSIX doesn't specify what to do in this case: BWK awk acts this way, and therefore gawk does too.) If start is greater than the number of characters in the string, substr() returns the null string. Similarly, if length is present but less than or equal to zero, the null string is returned.
I also suggest you combine grep|awk|sed|tr into a single awk script, and debug the awk script with printouts.
From:
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10,length)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
To:
lynx --dump "$link" | awk '/With/{found=1;next} found{found=0; s=substr($0,10); gsub(/ +/," ",s); if (match(s,/.*,/)) s=substr(s,1,RLENGTH-1) " and" substr(s,RLENGTH+1); print s}' >> 2.txt
From:
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10,length)}' >> dates.txt
To:
lynx --dump "$link" | awk '/Date/{print substr($0,10)}' >> dates.txt
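One difference to be aware of: grep -m 1 stops at the first matching line, while an awk pattern prints every match. A small sketch of how awk can stop after the first hit, assuming sample text in place of the lynx output; first_date is a hypothetical name, and the column offset 10 is taken from the question:

```shell
#!/usr/bin/env bash
# first_date: print everything from column 10 of the first line containing
# "Date", then stop reading (the awk counterpart of grep -m 1).
first_date() {
    awk '/Date/ { print substr($0, 10); exit }'
}

# Made-up input: three spaces, "Date:", one space, then the date itself.
printf '   Date: 12 March 2021\n   Date: 13 March 2021\n' | first_date
```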

Bash, adding together the results of a grep

In bash I want to echo some integer value which is the sum of various "grep | wc -l" combinations.
I have tried
echo $( (`grep string file.txt | wc -l`) + (`grep string2 file.txt | wc -l`))
I assumed the return of these greps is just an integer bash would recognise, but it isn't. Where do I need to be explicit, and why?
Simplifying your inner commands (with seq to produce lines) but keeping your same parenthesis, this does not work:
$ echo $( (`seq 5 | wc -l`) + (`seq 10 | wc -l`))
-bash: command substitution: line 1: syntax error near unexpected token `+'
-bash: command substitution: line 1: ` (`seq 5 | wc -l`) + (`seq 10 | wc -l`)'
Arithmetic expansion in Bash is written with two parentheses next to each other:
$ echo $(( 1+2 ))
3
This works:
$ echo $((`seq 5 | wc -l` + `seq 10 | wc -l`))
15
As does the more modern version:
$ echo $(( $(seq 5 | wc -l) + $(seq 10 | wc -l) ))
15
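So your parentheses are spaced incorrectly: $( ( starts a command substitution that contains a subshell, not an arithmetic expansion.
As a side note, you may be able to refactor the two greps into one that produces a single output, with something like:
$ grep -E "string1|string2" file.txt | wc -l
Note that this counts a line matching both strings once, whereas the sum of two separate counts counts it twice.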
As Charles Duffy suggests, with command grouping either in a sub shell or same shell you can combine two greps output in a single stream. Then you do not need the echo or the arithmetic.
Using seq as a simple model for the lines from two different processes, you can do:
$ (seq 10; seq 5) | wc -l # sub shell
15
$ { seq 5; seq 10; } | wc -l # same shell
15
Finally, to the extent you have the "sum of various "grep | wc -l" combinations" you might consider awk as better grep + wc replacement.
You can do:
awk '/string1/{c++} /string2/{c++} END{print c+0}' file.txt
as well as far more complex combinations than you should be doing in Bash alone. It will perform much better if you have many different search strings.
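A quick check of that awk command on made-up input, wrapped in a function for convenience (count_matches is a hypothetical name). It counts a line once per matching pattern, which is the same total that adding the two grep | wc -l results would give:

```shell
#!/usr/bin/env bash
# count_matches [FILE...]: count lines matching string1 plus lines matching
# string2; a line that contains both patterns is counted twice.
count_matches() {
    awk '/string1/ { c++ } /string2/ { c++ } END { print c + 0 }' "$@"
}

# Four sample lines: two contain string1, two contain string2 (one has both).
printf 'string1\nstring2\nstring1 string2\nother\n' | count_matches
```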

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
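The same family of operators can trim from either end, greedily or not. A short illustration using the foo variable from above:

```shell
#!/usr/bin/env bash
foo=1:2:3:4:5

echo "${foo##*:}"   # longest match removed from the front: 5
echo "${foo#*:}"    # shortest match removed from the front: 2:3:4:5
echo "${foo%:*}"    # shortest match removed from the back: 1:2:3:4
echo "${foo%%:*}"   # longest match removed from the back: 1
```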
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep, like this:
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
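For repeated use, the parameter-expansion variant can be wrapped in a small function that takes the delimiter as an argument. A sketch only: last_field is a hypothetical name, and the delimiter defaults to ':' when none is given.

```shell
#!/usr/bin/env bash
# last_field STRING [DELIM]: print the part of STRING after the last DELIM.
# Quoting "$delim" inside the pattern makes it match literally, not as a glob.
last_field() {
    local str=$1 delim=${2:-:}
    printf '%s\n' "${str##*"$delim"}"
}

last_field 1:2:3:4:5       # 5
last_field a/b/c /         # c
last_field no-delimiter    # no-delimiter (string is returned unchanged)
```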
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but still I want to share this one using basename :
basename "$(echo "a:b:c:d:e" | tr ':' '/')"
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
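printf '%s' "a:b:c:d:e" | xargs -d : -n1 | tail -1
First, xargs splits the input on ':' (-d is a GNU xargs option), and -n1 puts each part on its own line. Then tail -1 prints the last part. printf is used instead of echo so that no trailing newline ends up inside the last field, which would make tail -1 print an empty line.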
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
for x in `echo $str | tr ";" "\n"`; do echo $x; done
Improving on @mateusz-piotrowski's and @user3133260's answers:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
First, tr ':' ' ' replaces each ':' with a space.
Then xargs trims and collapses the whitespace.
After that, tr ' ' '\n' replaces the remaining spaces with newlines.
Lastly, tail -1 prints the last string.
For those comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate code (i.e. sys.stdin.read and sys.stdout.write) but requires only the Python standard library.

Resources