How to do operations depending on the presence of a specific string in bash? - bash

I am working with a csv file, so imagine I have this column:
5;10;>11;20;<14
My desired output would be:
5;10;12;20;13
So I would like to add 1 to the values that have the greater-than (>) symbol and subtract 1 from the values that have the less-than (<) symbol, using bash. I tried something weird with sed, but since it treats those values as strings it didn't work out.
Any suggestions?

With awk (tested with GNU awk):
$ awk -F\; -v OFS=\; '
{
for(i = 1; i <= NF; i++) {
if($i ~ /^<[[:digit:]]+$/) {
sub(/^</,"",$i)
$i--
}
else if($i ~ /^>[[:digit:]]+$/) {
sub(/^>/,"",$i)
$i++
}
}
} 1' <<< "5;10;>11;20;<14"
5;10;12;20;13
Warning: use the following if and only if you trust your input file and you are 100% sure it does not contain malicious fields (see the final note).
With GNU sed (and assuming your shell is bash), a bit shorter but also a bit more difficult to understand (as usual with sed):
$ sed -E '
s/<([[:digit:]]+)/$((\1-1))/g
s/>([[:digit:]]+)/$((\1+1))/g
s/.*/printf "%s\n" "&"/e
' <<< "5;10;>11;20;<14"
5;10;12;20;13
That is (where N is a string of digits), substitute all <N with $((N-1)), all >N with $((N+1)), substitute the resulting string S with printf "%s\n" "S", execute it with bash and replace with the output (this is what the e modifier of the substitute command does). In your example the input string successively becomes:
5;10;>11;20;$((14-1))
5;10;$((11+1));20;$((14-1))
printf "%s\n" "5;10;$((11+1));20;$((14-1))"
5;10;12;20;13
The reason why there is a serious security issue here is that if one of your fields is, for instance, $(rm -rf ~/*) it will simply and recursively delete your entire home directory... So, if you do not control the input prefer the awk version.
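If awk is not available and the sed version is too risky, the same field-by-field logic can be sketched in pure bash. This is only an illustration: the function name adjust_fields is made up here, and it assumes ';'-separated fields with no embedded newlines. Nothing from the input is ever evaluated as code, because only fields that consist of < or > followed by digits reach the arithmetic.

```shell
#!/usr/bin/env bash
# adjust_fields is a hypothetical helper, not part of the answers above.
adjust_fields() {
    local IFS=';'                 # split on, and later re-join with, semicolons
    local re_lt='^<([0-9]+)$'     # regexes kept in variables so < and > need no escaping
    local re_gt='^>([0-9]+)$'
    local field
    local -a fields out=()
    read -r -a fields <<< "$1"
    for field in "${fields[@]}"; do
        if [[ $field =~ $re_lt ]]; then
            out+=("$(( 10#${BASH_REMATCH[1]} - 1 ))")   # 10# forces base 10
        elif [[ $field =~ $re_gt ]]; then
            out+=("$(( 10#${BASH_REMATCH[1]} + 1 ))")
        else
            out+=("$field")                             # anything else is copied as-is
        fi
    done
    printf '%s\n' "${out[*]}"     # "${out[*]}" joins with the first character of IFS
}

adjust_fields '5;10;>11;20;<14'   # prints 5;10;12;20;13
```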

5;10;>11;20;<14
|
{m,g}awk '
BEGIN {
_*=(OFS= "") (__-=_^= FS ="("(\
___="\31\17")"|"(____="\16\24")")+"
} {
gsub(";[<>][0-9]+",____ "&" ___)
gsub(____ ";[<>]", "&" ___)
NF
for(_+=(_^=($_=$_)<"")+_;_<=NF;_++) {
if ($_~"^[0-9]+$") {
$_+=__^($(_+__)~"[<]$")
}
} print $(_=_<_) }'
=
5;10;>12;20;<13

Related

Convert a bash array into an awk array

I have an array in bash and want to use this array in an awk script. How can I pass the array from bash to awk?
The keys of the awk array should be the indices of the bash array. For simplicity, we can assume that the bash array is dense, that is, the array is not sparse like a=([3]=x [5]=y).
The elements inside the array can have any value. Besides strange unicode symbols and ascii control characters they may contain spaces or even newlines. Also, there might be empty ("") entries which should be retained. As an example consider the following array:
a=(AB " C D " $'E\nF\tG' "¼ẞ🍕" "")
Extending approach #1 provided by Socowi, it is possible to address the shortcoming that he identified using the awk split function. Note that this solution does not use stdin; the array is passed through a command-line option, leaving awk free to process stdin, files, etc.
The solution converts the bash array 'a' into the awk array 'a', using the intermediate file AVF (a process substitution). This works around the bash limitation that prevents NUL from being stored in a string.
a=(AB " C D " $'E\nF\tG' "¼ẞ🍕" "")
awk -v AVF=<(printf '%s\0' "${a[@]}") '
BEGIN {
# Temporary RS to allow reading the array with a single read.
saveRS=RS
RS=""
getline AV < AVF
RS = saveRS
na=split(AV, a, "\\0")
# Remove trailing empty element (printf add trailing separator).
delete a[na]
na-- ; for (i=1 ; i<=na ; i++ ) print "AV#", i, "=" a[i]
}{
# Use a[x]
}
'
Output:
1 AB
2 C D
3 E
F G
4 ¼ẞ🍕
5
Previous solution: for practical reasons, this version uses the '\001' character as the separator, which makes the script much simpler (any other character sequence known not to appear in the array can be used instead). Bash command substitution does not allow the NUL character, so '\0' cannot be used here. Hopefully this is not a major issue, as the '\001' control character is not normally used in regular files, etc. I believe it is possible to solve this, but I'm not sure how.
The solution will convert the 'a' bash array into the 'a' awk, using intermediate awk variable 'AV'.
a=(AB " C D " $'E\nF\tG' "¼ẞ🍕" "")
awk -v AV="$(printf '%s\1' "${a[@]}")" '
BEGIN {
na=split(AV, a, "\\1")
# Remove trailing empty element (printf add trailing separator).
delete a[na]
na--
for (i=1 ; i<=na ; i++ ) print "AV#", i, "=" a[i]
}
{
# Use a[x]
}
'
Approach 1: Reading in awk
Since the array elements can contain any character but the null byte (\0) we have to delimit them by \0. This is done with printf. For simplicity we assume that the array has at least one entry.
Due to the \0 we can no longer pass the string to awk as an argument but have to use (or emulate) a file instead. We then read that file in awk using \0 as the record separator RS (may require GNU awk).
awk 'BEGIN {RS="\0"} {a[n++]=$0; next}' <(printf %s\\0 "${a[@]}")
This reliably constructs the awk array a from the bash array a. The length of a is stored in n.
This approach is ugly when you actually want to use it. There is no simple step-by-step instruction on how to incorporate this approach into your existing awk script. Normally, your awk script would read another file afterwards, therefore you have to change the record separator RS after the array file was read. This can be done with NR>FNR. However, if your awk script already reads multiple files and relies on something like NR==FNR things get complicated.
Approach 2: Generating awk Code with bash
Instead of parsing the array in awk we hard-code the array by generating awk code. This code will be injected at the beginning of an existing awk script and initialize the array. This approach also supports sparse arrays and associative arrays and should work with all awk versions, not only GNU.
For the code generation we have to correctly quote all strings. For example, the code generator echo "a[0]=${a[0]}" would fail if ${a[0]} was " resulting in the code a[0]=""". POSIX awk supports octal escape sequences (\012) which can encode all bytes. We simply encode everything. That way we cannot forget any special symbols (even though the generated code is a bit inefficient).
octString() {
printf %s "$*" | od -bvAn | tr ' ' '\\' | tr -d '\n'
}
arrayToAwk() {
printf 'BEGIN{'
n=0
for key in "${!a[@]}"; do
printf 'a["%s"]="%s";' "$(octString "$key")" "$(octString "${a[$key]}")"
((n++))
done
echo "n=$n}"
}
The function arrayToAwk converts the bash array a (can be sparse or associative) into a BEGIN block. After inserting the generated code block at the beginning of your existing awk program you can use the awk array a anywhere inside awk without having to adapt anything (assuming that the variable names a and n were unused before). n is the size of the awk array a.
For awk commands of the form awk ... 'program' ... use
awk ... "$(arrayToAwk)"'program' ...
For big arrays this might result in the error Argument list too long. You can circumvent this problem using a program file:
awk ... -f <(arrayToAwk; echo 'program') ...
For awk commands of the form awk ... -f progfile ... use
awk ... -f <(arrayToAwk; cat progfile) ...
I'd like to point out that this can be extremely simple if you do not mind using ARGV and deleting all the non-file arguments. One way:
>cat awk_script.sh
#!/bin/awk -f
BEGIN{
i=1
while(ARGV[i] != "--" && i < ARGC) {
print ARGV[i]
delete ARGV[i]
i++
}
if(i < ARGC)
delete ARGV[i]
} {
print "File 1 contains at 1",$1
}
Then run it with:
>./awk_script.sh "${a[@]}" -- file1
AB
C D
E
F G
¼ẞ�
File 1 contains at 1 a
Obviously I'm missing some symbols.
Note while I like this method it assumes -- is not in the array, as pointed out by Oguz Ismail. They give a great alternate solution of having the first argument the length of your list.
This can be a one liner to where you have
awk 'BEGIN{... get and delete first arguments ...}{process files}END{if wanted}' "${a[@]}" file1 file2...
but will become unreadable very quickly.
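The length-first variant mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name print_array_via_argv and the awk array name arr are made up here, and the BEGIN block merely prints the elements to show that they arrive intact. Because the element count is passed as the first argument, no sentinel such as -- is needed.

```shell
#!/usr/bin/env bash
# print_array_via_argv is a hypothetical name used only for this sketch.
print_array_via_argv() {
    awk 'BEGIN {
        n = ARGV[1] + 0            # first argument: number of array elements
        delete ARGV[1]
        for (i = 1; i <= n; i++) {
            arr[i] = ARGV[i + 1]   # copy the element ...
            delete ARGV[i + 1]     # ... and hide it from file processing
        }
        for (i = 1; i <= n; i++) printf "%d=[%s]\n", i, arr[i]
    }' "$#" "$@"
}

a=(AB " C D " "x=1" "")
print_array_via_argv "${a[@]}"
```

In a real script, file names would follow the array elements on the command line and a main rule would process them after the BEGIN block has removed the array arguments from ARGV.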

Parse out key=value pairs into variables

I have a bunch of different kinds of files I need to look at periodically, and what they have in common is that the lines have a bunch of key=value type strings. So something like:
Version=2 Len=17 Hello Var=Howdy Other
I would like to be able to reference the names directly from awk... so something like:
cat some_file | ... | awk '{print Var, $5}' # prints Howdy Other
How can I go about doing that?
The closest you can get is to parse the variables into an associative array first thing every line. That is to say,
awk '{ delete vars; for(i = 1; i <= NF; ++i) { n = index($i, "="); if(n) { vars[substr($i, 1, n - 1)] = substr($i, n + 1) } } Var = vars["Var"] } { print Var, $5 }'
More readably:
{
delete vars; # clean up previous variable values
for(i = 1; i <= NF; ++i) { # walk through fields
n = index($i, "="); # search for =
if(n) { # if there is one:
# remember value by name. The reason I use
# substr over split is the possibility of
# something like Var=foo=bar=baz (that will
# be parsed into a variable Var with the
# value "foo=bar=baz" this way).
vars[substr($i, 1, n - 1)] = substr($i, n + 1)
}
}
# if you know precisely what variable names you expect to get, you can
# assign to them here:
Var = vars["Var"]
Version = vars["Version"]
Len = vars["Len"]
}
{
print Var, $5 # then use them in the rest of the code
}
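When only one key is needed, the index/substr logic above can be wrapped in a small shell function. This is a minimal sketch, not part of the original answer: the name get_value is an assumption, and the key is passed with -v, so backslash sequences in the key would be interpreted by awk.

```shell
#!/usr/bin/env bash
# get_value KEY: print the value of the first KEY=value word on each input line.
# (get_value is a hypothetical helper name.)
get_value() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")                      # position of the first "="
            if (n && substr($i, 1, n - 1) == key) {
                print substr($i, n + 1)             # everything after the first "="
                next                                # only the first match per line
            }
        }
    }'
}

printf '%s\n' 'Version=2 Len=17 Hello Var=Howdy Other' | get_value Var   # prints Howdy
```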
$ cat file | sed -r 's/[[:alnum:]]+=/\n&/g' | awk -F= '$1=="Var"{print $2}'
Howdy Other
Or, avoiding the useless use of cat:
$ sed -r 's/[[:alnum:]]+=/\n&/g' file | awk -F= '$1=="Var"{print $2}'
Howdy Other
How it works
sed -r 's/[[:alnum:]]+=/\n&/g'
This places each key,value pair on its own line.
awk -F= '$1=="Var"{print $2}'
This reads the key-value pairs. Since the field separator is chosen to be =, the key ends up as field 1 and the value as field 2. Thus, we just look for lines whose first field is Var and print the corresponding value.
Since discussion in commentary has made it clear that a pure-bash solution would also be acceptable:
#!/bin/bash
case $BASH_VERSION in
''|[0-3].*) echo "ERROR: Bash 4.0 required" >&2; exit 1;;
esac
while read -r -a words; do # iterate over lines of input
declare -A vars=( ) # refresh variables for each line
set -- "${words[@]}" # update positional parameters
for word; do
if [[ $word = *"="* ]]; then # if a word contains an "="...
vars[${word%%=*}]=${word#*=} # ...then set it as an associative-array key
fi
done
echo "${vars[Var]} $5" # Here, we use content read from that line.
done <<<"Version=2 Len=17 Hello Var=Howdy Other"
The <<<"Input Here" could also be <file.txt, in which case lines in the file would be iterated over.
If you wanted to use $Var instead of ${vars[Var]}, then substitute printf -v "${word%%=*}" %s "${word#*=}" in place of vars[${word%%=*}]=${word#*=}, and remove references to vars elsewhere. Note that this doesn't allow for a good way to clean up variables between lines of input, as the associative-array approach does.
I will try to explain a very generic way to do this, which you can easily adapt if you want to print out other stuff.
Assume you have a string which has a format like this:
key1=value1 key2=value2 key3=value3
or more generic
key1_fs2_value1_fs1_key2_fs2_value2_fs1_key3_fs2_value3
With fs1 and fs2 two different field separators.
You would like to make a selection or some operations with these values. To do this, the easiest is to store these in an associative array:
array["key1"] => value1
array["key2"] => value2
array["key3"] => value3
array["key1","full"] => "key1=value1"
array["key2","full"] => "key2=value2"
array["key3","full"] => "key3=value3"
This can be done with the following function in awk:
function str2map(str,fs1,fs2,map, n,tmp) {
n=split(str,map,fs1)
for (;n>0;n--) {
split(map[n],tmp,fs2);
map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
delete map[n]
}
}
So, after processing the string, you have the full flexibility to do operations in any way you like:
awk '
function str2map(str,fs1,fs2,map, n,tmp) {
n=split(str,map,fs1)
for (;n>0;n--) {
split(map[n],tmp,fs2);
map[tmp[1]]=tmp[2]; map[tmp[1],"full"]=map[n]
delete map[n]
}
}
{ str2map($0," ","=",map) }
{ print map["Var","full"] }
' file
The advantage of this method is that you can easily adapt your code to print any other key you are interested in, or even make selections based on this, example:
(map["Version"] < 3) { print map["var"]/map["Len"] }
The simplest and easiest way is to use the string substitution like this:
property='my.password.is=1234567890=='
name=${property%%=*}
value=${property#*=}
echo "'$name' : '$value'"
The output is:
'my.password.is' : '1234567890=='
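The choice of %% and a single # matters when the value itself contains = characters. The short sketch below, reusing the property string from above, shows what each of the four related expansions yields:

```shell
#!/usr/bin/env bash
property='my.password.is=1234567890=='

echo "${property%%=*}"   # longest "=*" suffix removed   -> my.password.is
echo "${property%=*}"    # shortest "=*" suffix removed  -> my.password.is=1234567890=
echo "${property#*=}"    # shortest "*=" prefix removed  -> 1234567890==
echo "${property##*=}"   # longest "*=" prefix removed   -> (empty string)
```

So %%=* keeps everything before the first =, and #*= keeps everything after it, which is the split wanted for key=value pairs.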
Yore.
Using bash's set command, we can split the line into positional parameters like awk.
For each word, we'll try to read a name value pair delimited by =.
When we find a value, assign it to the variable named $key using bash's printf -v feature.
#!/usr/bin/env bash
line='Version=2 Len=17 Hello Var=Howdy Other'
set -- $line # unquoted on purpose: word-split into $1, $2, ...
for word in "$@"; do
IFS='=' read -r key val <<< "$word"
test -n "$val" && printf -v "$key" '%s' "$val"
done
echo "$Var $5"
output
Howdy Other
SYNOPSIS
an awk-based solution that doesn't require manually checking the fields to locate the desired key pair :
approach being avoid splitting unnecessary fields or arrays - only performing regex match via function call when needed
only returning FIRST occurrence of input key value. Subsequent matches along the row are NOT returned
i just called it S() cuz it's the closest letter to $
I only included an array (_) of the 3 test values for demo purposes. Those aren't needed. In fact, no state information is being kept at all
caveat being : key-match must be exact - this version of the code isn't for case-insensitive or fuzzy/agile matching
Tested and confirmed working on
- gawk 5.1.1
- mawk 1.3.4
- mawk-2/1.9.9.6
- macos nawk
CODE
# gawk profile, created Fri May 27 02:07:53 2022
{m,n,g}awk '
function S(__,_) {
return \
! match($(_=_<_), "(^|["(_="[:blank:]]")")"(__)"[=][^"(_)"*") \
? "^$" \
: substr(__=substr($-_, RSTART, RLENGTH), index(__,"=")+_^!_)
}
BEGIN { OFS = "\f" # This array is only for testing
_["Version"] _["Len"] _["Var"] # purposes. Feel free to discard at will
} {
for (__ in _) {
print __, S(__) } }'
OUTPUT
Var
Howdy
Len
17
Version
2
So either call the fields in BAU fashion
- $5, $0, $NF, etc
or call S(QUOTED_KEY_VALUE), case-sensitive, like
S("Version") to get back 2.
As a safeguard, to prevent mis-interpreting null strings
or invalid inputs as $0, a non-match returns ^$
instead of empty string
As a bonus, it can safely handle values in multibyte unicode, both for values and even for keys, regardless of whether ur awk is UTF-8-aware or not :
1 ✜
🤡
2 Version
2
3 Var
Howdy
4 Len
17
5 ✜=🤡 Version=2 Len=17 Hello Var=Howdy Other
I know this question is specifically about awk, but I am mentioning this because many people come here for ways to break down name=value pairs (with or without awk).
I found the approach below simple, straightforward, and very effective at handling multiple spaces or commas as well:
Source: http://jayconrod.com/posts/35/parsing-keyvalue-pairs-in-bash
changes="foo=red bar=green baz=blue"
#use below if var is in CSV (instead of space as delim)
changes=`echo $changes | tr ',' ' '`
for change in $changes; do
set -- `echo $change | tr '=' ' '`
echo "variable name == $1 and variable value == $2"
#can assign value to a variable like below
eval my_var_$1=$2;
done

awk substitution ascii table rules bash

I want to perform a hierarchical set of (non-recursive) substitutions in a text file.
I want to define the rules in an ascii file "table.txt" which contains lines of blank space tabulated pairs of strings:
aaa 3
aa 2
a 1
I have tried to solve it with an awk script "substitute.awk":
BEGIN { while (getline < file) { subs[$1]=$2; } }
{ line=$0; for(i in subs)
{ gsub(i,subs[i],line); }
print line;
}
When I call the script giving it the string "aaa":
echo aaa | awk -v file="table.txt" -f substitute.awk
I get
21
instead of the desired "3". Permuting the lines in "table.txt" doesn't help. Who can explain what the problem is here, and how to circumvent it? (This is a simplified version of my actual task, where I have a large file containing ascii encoded phonetic symbols which I want to convert into LaTeX code. The ascii encoding of the symbols contains {$,&,-,%,[a-z],[0-9],...}).
Any comments and suggestions!
PS:
Of course in this application for a substitution table.txt:
aa ab
a 1
an original string "aa" should be converted into "ab" and not "1b". That means a string that was produced by applying a rule must be left untouched.
How to account for that?
The order of the loop for (i in subs) is undefined by default.
In newer versions of awk you can use PROCINFO["sorted_in"] to control the sort order. See section 12.2.1 Controlling Array Traversal and (the linked) section 8.1.6 Using Predefined Array Scanning Orders for details about that.
Alternatively, if you can't or don't want to do that you could store the replacements in numerically indexed entries in subs and walk the array in order manually.
To do that you will need to store both the pattern and the replacement in the value of the array and that will require some care to combine. You can consider using SUBSEP or any other character that cannot be in the pattern or replacement and then split the value to get the pattern and replacement in the loop.
Also note the caveats etc. with getline listed on http://awk.info/?tip/getline and consider not calling it manually but instead using NR==FNR{...} and just listing table.txt as the first file argument to awk.
Edit: Actually, for the manual loop version you could also just keep two arrays one mapping input file line number to the patterns to match and another mapping patterns to replacements. Then looping over the line number array will get you the pattern and the pattern can be used in the second array to get the replacement (for gsub).
Instead of storing the replacements in an associative array, put them in two arrays indexed by integer (one array for the strings to replace, one for the replacements) and iterate over the arrays in order:
BEGIN {i=0; while (getline < file) { subs[i]=$1; repl[i++]=$2}
n = i}
{ for(i=0;i<n;i++) { gsub(subs[i],repl[i]); }
print tolower($0);
}
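Since the loop above applies the rules in file order, one way to make the longest target win is to sort the table by target length before awk reads it. The following is a sketch under assumptions: the helper name sort_rules_longest_first is made up, the rules are whitespace-separated pairs as in table.txt, and sort supports the common (non-POSIX) -s option for a stable sort.

```shell
#!/usr/bin/env bash
# sort_rules_longest_first FILE: print the rules with the longest target first.
# (hypothetical helper, not part of the answer above)
sort_rules_longest_first() {
    awk '{ print length($1) "\t" $0 }' "$1" |   # prefix each rule with its target length
        sort -s -k1,1nr |                       # longest first; ties keep file order
        cut -f2-                                # drop the length prefix again
}
```

In bash, the sorted table could then be handed to the script through a process substitution, for example awk -v file=<(sort_rules_longest_first table.txt) -f substitute.awk. Note that gsub still treats each target as a regular expression, so this ordering alone does not make metacharacters safe.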
It seems like perl's zero-width word boundary is what you want. It's a pretty straightforward conversion from the awk:
#!/usr/bin/env perl
use strict;
use warnings;
my %subs;
BEGIN{
open my $f, '<', 'table.txt' or die "table.txt:$!";
while(<$f>) {
my ($k,$v) = split;
$subs{$k}=$v;
}
}
while(<>) {
while(my($k, $v) = each %subs) {
s/\b$k\b/$v/g;
}
print;
}
Here's an answer pulled from another StackExchange site, from a fairly similar question: Replace multiple strings in a single pass.
It's slightly different in that it does the replacements in inverse order by length of target string (i.e. longest target first), but that is the only sensible order for targets which are literal strings, as appears to be the case in this question as well.
If you have tcc installed, you can use the following shell function, which processes the file of substitutions into a lex-generated scanner which it then compiles and runs using tcc's compile-and-run option.
# Call this as: substitute replacements.txt < text_to_be_substituted.txt
# Requires GNU sed because I was too lazy to write a BRE
substitute () {
tcc -run <(
{
printf %s\\n "%option 8bit noyywrap nounput" "%%"
sed -r 's/((\\\\)*)(\\?)$/\1\3\3/;
s/((\\\\)*)\\?"/\1\\"/g;
s/^((\\.|[^[:space:]])+)[[:space:]]*(.*)/"\1" {fputs("\3",yyout);}/' \
"$1"
printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
} | lex -t)
}
With gcc or clang, you can use something similar to compile a substitution program from the replacement list, and then execute that program on the given text. POSIX-standard c99 does not allow input from stdin, but gcc and clang are happy to do so provided you tell them explicitly that it is a C program (-x c). In order to avoid excess compilations, we use make (which needs to be gmake, GNU make).
The following requires that the list of replacements be in a file with a .txt extension; the cached compiled executable will have the same name with a .exe extension. If the makefile were in the current directory with the name Makefile, you could invoke it as make repl (where repl is the name of the replacement file without a text extension), but since that's unlikely to be the case, we'll use a shell function to actually invoke make.
Note that in the following file, the whitespace at the beginning of each line starts with a tab character:
substitute.mak
.SECONDARY:
%: %.exe
	@$(<D)/$(<F)
%.exe: %.txt
	@{ printf %s\\n "%option 8bit noyywrap nounput" "%%"; \
	sed -r \
	's/((\\\\)*)(\\?)$$/\1\3\3/; #\
	s/((\\\\)*)\\?"/\1\\"/g; #\
	s/^((\\.|[^[:space:]])+)[[:space:]]*(.*)/"\1" {fputs("\3",yyout);}/' \
	"$<"; \
	printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"; \
	} | lex -t | c99 -D_POSIX_C_SOURCE=200809L -O2 -x c -o "$@" -
Shell function to invoke the above:
substitute() {
gmake -f/path/to/substitute.mak "${1%.txt}"
}
You can invoke the above command with:
substitute file
where file is the name of the replacements file. (The filename must end with .txt but you don't have to type the file extension.)
The format of the input file is a series of lines consisting of a target string and a replacement string. The two strings are separated by whitespace. You can use any valid C escape sequence in the strings; you can also \-escape a space character to include it in the target. If you want to include a literal \, you'll need to double it.
If you don't want C escape sequences and would prefer to have backslashes not be metacharacters, you can replace the sed program with a much simpler one:
sed -r 's/([\\"])/\\\1/g' "$<"; \
(The ; \ is necessary because of the way make works.)
a) Don't use getline unless you have a very specific need and fully understand all the caveats, see http://awk.info/?tip/getline
b) Don't use regexps when you want strings (yes, this means you cannot use sed).
c) The while loop needs to constantly move beyond the part of the line you've already changed or you could end up in an infinite loop.
You need something like this:
$ cat substitute.awk
NR==FNR {
if (NF==2) {
strings[++numStrings] = $1
old2new[$1] = $2
}
next
}
{
for (stringNr=1; stringNr<=numStrings; stringNr++) {
old = strings[stringNr]
new = old2new[old]
slength = length(old)
tail = $0
$0 = ""
while ( sstart = index(tail,old) ) {
$0 = $0 substr(tail,1,sstart-1) new
tail = substr(tail,sstart+slength)
}
$0 = $0 tail
}
print
}
$ echo aaa | awk -f substitute.awk table.txt -
3
$ echo aaaa | awk -f substitute.awk table.txt -
31
and adding some RE metacharacters to table.txt to show they are treated just like every other character and showing how to run it when the target text is stored in a file instead of being piped:
$ cat table.txt
aaa 3
aa 2
a 1
. 7
\ 4
* 9
$ cat foo
a.a\aa*a
$ awk -f substitute.awk table.txt foo
1714291
Your new requirement requires a solution like this:
$ cat substitute.awk
NR==FNR {
if (NF==2) {
strings[++numStrings] = $1
old2new[$1] = $2
}
next
}
{
delete news
for (stringNr=1; stringNr<=numStrings; stringNr++) {
old = strings[stringNr]
new = old2new[old]
slength = length(old)
tail = $0
$0 = ""
charPos = 0
while ( sstart = index(tail,old) ) {
charPos += sstart
news[charPos] = new
$0 = $0 substr(tail,1,sstart-1) RS
tail = substr(tail,sstart+slength)
}
$0 = $0 tail
}
numChars = split($0, olds, "")
$0 = ""
for (charPos=1; charPos <= numChars; charPos++) {
$0 = $0 (charPos in news ? news[charPos] : olds[charPos])
}
print
}
$ cat table.txt
1 a
2 b
$ echo "121212" | awk -f substitute.awk table.txt -
ababab

Show different context on different grep keyword?

I know -A -B -C could be used to show context around the grep keyword.
My question is, how to show different context on different keyword?
For example, how do I show -A 5 for cat, -B 4 for dog, and -C 1 for monkey:
egrep -A3 "cat|dog|monkey" <file>
// this just show 3 after lines for each keyword.
i don't think there's any way to do it with a single grep call, but you could run it through grep once for each variable and concatenate the output:
var=$(grep -n -A 5 cat file)$'\n'$(grep -n -B 4 dog file)$'\n'$(grep -n -C 1 monkey file)
var=$(sort -un <(echo "$var"))
now echo "$var" will produce the same output as you would have gotten from your single command, plus line numbers and context indicators (the : prefix indicates a line that matched the pattern exactly, and the - prefix indicates a line being included because of the -A -B and/or -C options).
the reason i included the line numbers thus far is to preserve the order of the results you would have seen had you managed to do this in one statement. if you like them, great, but if not, you can use the following line to cut them out:
var=$(cut -d: -f2- <(echo "$var") | cut -d- -f2-)
this passes it through once to cut the exact matching lines' prefixes, then again to cut the context matches' prefixes.
pretty? no. but it works.
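The same idea can be sketched as a single pipeline. This is an illustration only, with a made-up function name (context_grep) and the three keywords from the question hard-coded. It drops the -- group separators that grep prints between non-adjacent context groups, and it strips the line-number prefix with one sed expression so that a - or : inside the line itself is left alone.

```shell
#!/usr/bin/env bash
# context_grep FILE: cat with 5 lines after, dog with 4 before, monkey with 1 around.
# (hypothetical helper; "|| true" keeps a keyword with no match from being an error)
context_grep() {
    {
        grep -n -A 5 cat "$1" || true
        grep -n -B 4 dog "$1" || true
        grep -n -C 1 monkey "$1" || true
    } |
        grep -v '^--$' |              # remove grep's group separators
        sort -un |                    # file order, one copy per line number
        sed -E 's/^[0-9]+[:-]//'      # strip the "N:" or "N-" prefix
}
```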
I'm afraid grep won't do that. You'll have to use a different tool. Perhaps write your own program.
Something like this would do it:
awk '
BEGIN{ ARGV[ARGC++] = ARGV[1] }
function prtB(nr) { for (i=FNR-nr; i<FNR; i++) print a[i] }
function prtA(nr) { for (i=FNR+1; i<=FNR+nr; i++) print a[i] }
NR==FNR{ a[NR] = $0; next }
/cat/ { print; prtA(5) }
/dog/ { prtB(4); print }
/monkey/ { prtB(1); print; prtA(1) }
' file
check the math on the loops in the functions. You didn't say how you'd want to handle lines that contain monkey AND dog, for example.
EDIT: here's an untested solution that would print the maximum context around any match and let you specify the contexts on the command line and won't use as much memory as the above cheap and cheerful solution:
awk -v cxts="cat:0:5\ndog:4:0\nmonkey:1:1" '
BEGIN{
ARGV[ARGC++] = ARGV[1]
numCxts = split(cxts,cxtsA,RS)
for (i=1;i<=numCxts;i++) {
regex = cxtsA[i]
n = split(regex,rangeA,/:/)
sub(/:[^:]+:[^:]+$/,"",regex)
endA[regex] = rangeA[n]
startA[regex] = rangeA[n-1]
regexA[regex]
}
}
NR==FNR{
for (regex in regexA) {
if ($0 ~ regex) {
start = NR - startA[regex]
end = NR + endA[regex]
for (i=start; i<=end; i++) {
prt[i]
}
}
}
next
}
FNR in prt
' file
Separate the searched for patterns in the cxts variable with whatever your RS value is, newline by default.

How can I bump a version number using bash

I would like to know how to bump the last digit in a version number using bash.
e.g.
VERSION=1.9.0.9
NEXT_VERSION=1.9.0.10
EDIT: The version number will only contain natural numbers.
Can the solution be generic to handle any number of parts in a version number.
e.g.
1.2
1.2.3
1.2.3.4
1.2.3.4.5
TL;DR:
VERSION=1.9.0.9
echo $VERSION | awk -F. '/[0-9]+\./{$NF++;print}' OFS=.
# will print 1.9.0.10
For a detailed explanation, read on.
Let's start with the basic answer by froogz3301:
VERSIONS="
1.2.3.4.4
1.2.3.4.5.6.7.7
1.9.9
1.9.0.9
"
for VERSION in $VERSIONS; do
echo $VERSION | awk -F. '{$NF = $NF + 1;} 1' | sed 's/ /./g'
done
How can we improve on this? Here are a bunch of ideas extracted from the copious set of comments.
The trailing '1' in the program is crucial to its operation, but it is not the most explicit way of doing things. The odd '1' at the end is a boolean value that is true, and therefore matches every line and triggers the default action (since there is no action inside braces after it) which is to print $0, the line read, as amended by the previous command.
Hence, why not this awk command, which obviates the sed command?
awk -F. '{$NF+=1; OFS="."; print $0}'
Of course, we could refine things further — in several stages. You could use the bash '<<<' string redirection operator to avoid the pipe:
awk -F. '...' <<< $VERSION
The next observation would be that given a series of lines, a single execution of awk could handle them all:
echo "$VERSIONS" | awk -F. '/[0-9]+\./{$NF+=1;OFS=".";print}'
without the for loop. The double quotes around "$VERSIONS" preserve the newlines in the string. The pipe is still unnecessary, leading to:
awk -F. '/[0-9]+\./{$NF+=1;OFS=".";print}' <<< "$VERSIONS"
The regex ignores the blank lines in $VERSIONS by only processing lines that contain a digit followed by a dot. Of course, setting OFS in each line is a tad clumsy, and '+=1' can be abbreviated '++', so you could use:
awk -F. '/[0-9]+\./{$NF++;print}' OFS=. <<< "$VERSIONS"
(or you could include 'BEGIN{OFS="."}' in the program, but that is rather verbose).
The '<<<' notation is only supported by Bash and not by Korn, Bourne or other POSIX shells (except as a non-standard extension parallelling the Bash notation). The AWK program is going to be supported by any version of awk you are likely to be able to lay hands on (but the variable assignment on the command line was not supported by old UNIX 7th Edition AWK).
I have come up with this.
VERSIONS="
1.2.3.4.4
1.2.3.4.5.6.7.7
1.9.9
1.9.0.9
"
for VERSION in $VERSIONS; do
echo $VERSION | awk -F. '{$NF = $NF + 1;} 1' | sed 's/ /./g'
done
if [[ "$VERSION" == *.* ]]; then
majorpart="${VERSION%.*}."
else
majorpart=""
fi
minorpart="${VERSION##*.}"
NEXT_VERSION="$majorpart$((minorpart+1))"
Warning: if the minor part of the version number isn't in the expected format (integer, no leading zeros), this may have trouble. Some examples: "1.033" -> "1.28" (since 033 is octal for 27), "1.2.b" -> "1.2.1" (unless b is a defined variable, it'll be treated as 0), "1.2.3a" -> error ("3a" isn't a number). Depending on how many cases you want to cover, this can be made arbitrarily complex.
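One way to cover the cases listed in the warning is to validate the last part and force base-10 arithmetic. The function below is a sketch, not the answer's own code: the name bump_version is an assumption, and it chooses to reject anything that is not a plain string of digits. A leading zero is read as decimal and dropped from the result (1.033 becomes 1.34).

```shell
#!/usr/bin/env bash
# bump_version VERSION: increment the last dot-separated part of VERSION.
# (hypothetical helper name)
bump_version() {
    local version=$1 prefix='' last re='^[0-9]+$'
    if [[ $version == *.* ]]; then
        prefix="${version%.*}."       # everything up to and including the last dot
    fi
    last=${version##*.}               # the part after the last dot
    if ! [[ $last =~ $re ]]; then
        echo "bump_version: last part '$last' is not a natural number" >&2
        return 1
    fi
    echo "$prefix$(( 10#$last + 1 ))" # 10# avoids octal interpretation of 033
}

bump_version 1.9.0.9   # prints 1.9.0.10
```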
Well, Jonathan Leffler already answered the question, however I've generalized the solution to accept an arbitrary diff (passed as an awk parameter versionDiff):
VERSION="1.4.1.2"
awk -v versionDiff="0.1" -F. -f bump.awk OFS=. <<< "$VERSION"
the result will be:
1.5.0.0
as the numbers after last non-zero versionDiff number are zeroed.
and the bump.awk:
/[0-9]+\./ {
n = split(versionDiff, versions, ".")
if(n>NF) nIter=n; else nIter=NF
lastNonzero = nIter
for(i = 1; i <= nIter; ++i) {
if(int(versions[i]) > 0) {
lastNonzero = i
}
$i = versions[i] + $i
}
for(i = lastNonzero+1; i <= nIter; ++i) {
$i = 0
}
print
}

Resources