I have a string of form FOO_123_BAR.bazquux, where FOO and BAR are fixed strings, 123 is a number and bazquux is freeform text.
I need to perform a text transformation on this string: extract 123 and bazquux, increment the number and then arrange them in a different string.
For example, FOO_123_BAR.bazquux ⇒ FOO=124 BAR=bazquux.
(Actual transformation is more complex.)
Naturally, I can do this in a sequence of sed and expr calls, but it's ugly:
shopt -s lastpipe
echo "$in" | sed -r 's|^FOO_([0-9]+)_BAR\.(.+)$|\1 \2|' | read number text
out="FOO=$((number + 1)) BAR=$text"
Is there a more powerful text processing tool that can do the job in a single invocation? If yes, then how?
Edit: I apologize for not making this clearer, but the exact structure of the input and output is only an example. I therefore prefer general solutions that work with any delimiters, or with none at all, rather than solutions that depend on, e.g., the presence of underscores.

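With GNU sed, the e flag of the s command executes the pattern space, as it stands after the substitution, as a shell command and replaces it with that command's output. Because the pattern below matches the whole line, the replacement string is exactly what gets executed.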
$ s='FOO_123_BAR.bazquux'
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\2/e'
FOO=124 BAR=bazquux
To avoid conflict with shell metacharacters, you need to quote the unknown portions:
$ s='FOO_123_BAR.$x(1)'
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\2/e'
sh: 1: Syntax error: "(" unexpected
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\x27\2\x27/e'
FOO=124 BAR=$x(1)

Using any awk in any shell on every UNIX box and assuming none of your substrings contain _ or .:
$ s='FOO_123_BAR.bazquux'
$ echo "$s" | awk -F'[_.]' '{print $1"="$2+1,$3"="$4}'
FOO=124 BAR=bazquux

You may do it with perl:
perl -pe 's|^FOO_([0-9]+)_BAR\.(.+)$|"FOO=" . ($1 + 1) . " BAR=" . $2|e' <<< "$in"
The ($1 + 1) expression increments the number captured in Group 1; the e flag makes Perl evaluate the replacement as code.

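Try the following, written and tested with the shown samples in GNU awk.
1st solution: using the match function with a capture array (the three-argument form of match is a GNU awk extension).
echo "FOO_123_BAR.bazquux" |
awk '
match($0,/^([^_]+)_([0-9]+)_([^.]+)/,array){
  # array[1]=FOO, array[2]=123, array[3]=BAR; the rest follows the dot
  print array[1]"="array[2]+1,array[3] "=" substr($0,RSTART+RLENGTH+1)
}'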
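2nd solution: using field splitting and sub.
echo "FOO_123_BAR.bazquux" |
awk 'BEGIN{FS=OFS="_"} {
  $2++                                        # increment the number; $0 is rebuilt
  sub(/_/,"="); sub(/_/," "); sub(/\./,"=")   # FOO=124 BAR=bazquux
  print
}'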

A pure bash one-liner would be
[[ $s =~ FOO_([0-9]+)_BAR\.(.*) ]] && echo "FOO=$((BASH_REMATCH[1] + 1)) BAR=${BASH_REMATCH[2]}"
assuming the variable s is set to the string that is being parsed before calling that line (s=FOO_123_BAR.bazquux).
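The one-liner above can be wrapped in a function so that a non-matching input is reported through the exit status. This is only a sketch: the name transform is hypothetical, and the 10# prefix is an addition that forces base 10, since bash arithmetic would otherwise read a number with a leading zero (such as 008) as octal.

```bash
#!/usr/bin/env bash
# transform: hypothetical helper name, not part of the original answer.
transform() {
  local s=$1
  local re='^FOO_([0-9]+)_BAR\.(.+)$'
  if [[ $s =~ $re ]]; then
    # 10# forces base 10, so a value such as 008 is not parsed as octal.
    printf 'FOO=%d BAR=%s\n' "$((10#${BASH_REMATCH[1]} + 1))" "${BASH_REMATCH[2]}"
  else
    return 1   # input does not have the expected shape
  fi
}

transform 'FOO_123_BAR.bazquux'   # prints: FOO=124 BAR=bazquux
```

Because the captured text is never passed back to a shell for evaluation, metacharacters in the freeform part, such as $x(1), need no extra quoting.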

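Using variable substitution (this relies on word splitting, so it assumes the input contains no whitespace or glob characters; $((...)) replaces the deprecated $[...] form):
$ raw=(${in//_/ })
$ echo "$raw=$((raw[1]+1)) ${raw[2]//./=}"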
FOO=124 BAR=bazquux


I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here: path=a/b/Test b/c/Test c/d/Test)
test=$(echo $path | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I don't quite know how to use sed for this. Can I somehow change it and use the anchors ^ and $ to match the start and end of the string and add " there?
sed 's/.*/"&"/g ; s/ /", "/g' filename
You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"
awk is easier:
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'
You may just add double quotes if you have a single line text:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
If you have multiple lines use
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
The [[:space:]]\{1,\} POSIX BRE pattern (equal to [[:space:]]+ POSIX ERE) matches one or more whitespace chars and & in the replacement pattern inserts this matched value back in the resulting string.
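For comparison, the same transformation can be sketched in pure bash without sed or awk. The function name quote_words is hypothetical, and the sketch assumes a single line of whitespace-separated words with at least one word in it.

```bash
#!/usr/bin/env bash
# quote_words: hypothetical helper; wraps every word in double quotes
# and joins the words with ", ".
quote_words() {
  local -a words
  local out
  read -r -a words <<< "$1"                 # split on whitespace into an array
  printf -v out '"%s", ' "${words[@]}"      # the format is reused for every word
  printf '%s\n' "${out%, }"                 # drop the trailing ", "
}

quote_words 'a/b/Test b/c/Test c/d/Test'
# prints: "a/b/Test", "b/c/Test", "c/d/Test"
```

Runs of several spaces between words collapse here, because read splits on any amount of whitespace.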

I am trying to use awk to get the name of a file given the absolute path to the file.
For example, when given the input path /home/parent/child/filename I would like to get filename
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard coding $5 which would be incorrect if my input has the following structure:
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?
Use the fact that awk splits the lines in fields based on a field separator, that you can define. Hence, defining the field separator to / you can say:
awk -F "/" '{print $NF}' input
as NF refers to the number of fields of the current record, printing $NF means printing the last one.
So given a file like this:
This would be the output:
$ awk -F"/" '{print $NF}' file
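A small sketch of the $NF approach applied to paths of different depth. The helper name last_field is hypothetical; the two paths are the ones used in this thread.

```bash
#!/usr/bin/env bash
# last_field: hypothetical helper name used only for this illustration.
last_field() {
  # With / as the separator, $NF is whatever follows the last slash.
  awk -F/ '{ print $NF }'
}

printf '%s\n' \
  /home/parent/child/filename \
  /home/parent/child1/child2/filename | last_field
# prints "filename" for both lines
```

Note that a path ending in a slash yields an empty last field, so this prints an empty line for such input.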
In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename
If you're open to a Perl solution, here is one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element in the @F autosplit array
Another option is to use bash parameter substitution.
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}
Like 5 years late, I know, thanks for all the proposals, I used to do this the following way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
Glad to see there are better ways.
This should be a comment on the basename answer, but I don't have enough reputation.
If you do not use double quotes, basename will not work with a path that contains a space character:
$ basename /home/foo/bar foo/bar.png
It works with double quotes:
$ basename "/home/foo/bar foo/bar.png"
file example
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
$ while IFS= read -r b ; do basename "$b" ; done < a
I know I'm like 3 years late on this but....
you should consider parameter expansion, it's built-in and faster.
if your input is in a var, let's say, $var1, just do ${var1##*/}. Look below
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
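The expansions can be sketched as two small functions. The names file_name and dir_name are hypothetical; ${1%/*} is the companion expansion that removes the shortest suffix starting at the last slash.

```bash
#!/usr/bin/env bash
# file_name / dir_name: hypothetical helper names for this sketch.
file_name() { printf '%s\n' "${1##*/}"; }   # strip the longest prefix ending in /
dir_name()  { printf '%s\n' "${1%/*}"; }    # strip the shortest suffix starting with /

file_name '/home/parent/child 1/filename.txt'   # prints: filename.txt
dir_name  '/home/parent/child 1/filename.txt'   # prints: /home/parent/child 1
```

Quoting the expansion keeps paths with spaces intact. Unlike basename, ${var##*/} returns an empty string for a path that ends in a slash, and dir_name as written returns its argument unchanged when it contains no slash.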
you can skip all of that complex regex :
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
2nd to last :
mawk '$!--NF=$NF' FS='/'
3rd last field :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
4th-last :
mawk '$!--NF=$(--NF-!-FS)' FS='/'
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
echo '/home/parent/child1/child2/filename'
major caveat :
- `gawk/nawk` has a slight discrepancy with `mawk` regarding
- how it tracks multiple,
- and potentially conflicting, decrements to `NF`,
- so other than the 1st solution regarding last field,
- the rest for now, are only applicable to `mawk-1/2`
just realized it's much much cleaner this way in mawk/gawk/nawk :
echo '/home/parent/child1/child2/filename' | …
awk ++NF FS='.+/' OFS= # updated such that
# root "/" still gets printed
You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
sed -n 's/.*\/\([^\/]*\)$/\1/p'

According to the manual, the -b option gives the byte offset of each occurrence, but the offset seems to be counted from the beginning of the parsed content rather than from the beginning of each line.
I need to retrieve the position of each matching content returned by grep. I used this line, but it's quite ugly:
grep '<REGEXP>' | while read -r line ; do echo $line | grep -bo '<REGEXP>' ; done
How to get it done in a more elegant way, with a more efficient use of GNU utils?
$ echo "abcdefg abcdefg" > test.txt
$ grep 'efg' test.txt | while read -r line ; do echo "$line" | grep -bo 'efg' ; done
(Indeed, this command line doesn't output the line number, but it's not difficult to add it.)
With any awk (GNU or otherwise) in any shell on any UNIX box:
$ awk -v re='efg' -v OFS=':' '{
    end = 0
    while( match(substr($0,end+1),re) ) {
        print NR, end+=RSTART, substr($0,end,RLENGTH)
        end += RLENGTH-1
    }
}' test.txt
All strings, fields, array indices in awk start at 1, not zero, hence the output not looking like yours since to awk your input string is:
abcdefg abcdefg
rather than:
abcdefg abcdefg
Feel free to change the code above to end+=RSTART-1 and end+=RLENGTH if you prefer 0-indexed strings.
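A sketch of the 0-indexed variant, so that the offsets line up with what grep -b reports within a line. The function name offsets0 and the separate start variable are choices made for this illustration; it assumes the regular expression never matches the empty string, since an empty match would not advance end and the loop would not terminate.

```bash
#!/usr/bin/env bash
# offsets0: hypothetical helper; prints line:offset:match with 0-based offsets.
offsets0() {
  awk -v re="$1" -v OFS=':' '{
    end = 0                                   # characters already consumed
    while (match(substr($0, end + 1), re)) {
      start = end + RSTART - 1                # 0-based offset within the line
      print NR, start, substr($0, start + 1, RLENGTH)
      end = start + RLENGTH                   # continue after this match
    }
  }'
}

echo 'abcdefg abcdefg' | offsets0 'efg'
# prints:
# 1:4:efg
# 1:12:efg
```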
Perl is not a GNU util, but can solve your problem nicely:
perl -nle 'print "$.:$-[0]" while /efg/g'

I am trying to replace a pipe character in a string with its escaped form:
Input: "text|jdbc"
Output: "text\|jdbc"
I tried different things with tr:
echo "text|jdbc" | tr "|" "\\|"
But none of them worked.
Any help would be appreciated.
tr is good for one-to-one mapping of characters (read "translate").
\| is two characters, you cannot use tr for this. You can use sed:
echo 'text|jdbc' | sed -e 's/|/\\|/'
This example replaces one |. If you want to replace multiple, add the g flag:
echo 'text|jdbc' | sed -e 's/|/\\|/g'
An interesting tip by @JuanTomas is to use a different separator character for better readability, for example:
echo 'text|jdbc' | sed -e 's_|_\\|_g'
You can take advantage of the fact that | is a special character in bash, which means the %q modifier used by printf will escape it for you:
$ printf '%q\n' "text|jdbc"
A more general solution that doesn't require | to be treated specially is
$ f="text|jdbc"
$ echo "${f//|/\\|}"
${f//foo/bar} expands f and replaces every occurrence of foo with bar. The operator here is /; when followed by another /, it replaces all occurrences of the search pattern instead of just the first one. For example:
$ f="text|jdbc|two"
$ echo "${f/|/\\|}"
$ echo "${f//|/\\|}"
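The difference between the two forms can be sketched with a pair of functions. The names escape_first and escape_all are hypothetical.

```bash
#!/usr/bin/env bash
# escape_first / escape_all: hypothetical helper names for this sketch.
escape_first() { printf '%s\n' "${1/|/\\|}"; }    # single /: first | only
escape_all()   { printf '%s\n' "${1//|/\\|}"; }   # double //: every |

escape_first 'text|jdbc|two'   # prints: text\|jdbc|two
escape_all   'text|jdbc|two'   # prints: text\|jdbc\|two
```

printf is used instead of echo so that the backslashes are printed literally regardless of the shell's echo settings.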
You can try with awk:
echo "text|jdbc" | awk -F'|' '$1=$1' OFS="\\\|"

So basically something like expr index '0123 some string' '012345789' but reversed.
I want to find the index of the first character that is not one of the given characters...
I'd rather not use RegEx, if it is possible...
You can remove the given characters with tr and pick the first character of what is left:
left=$(tr -d "012345789" <<< "0123_some string"); echo "${left:0:1}"
Once you have that character, find its (1-based) index with expr index:
expr index "0123_some string" "${left:0:1}"
Using GNU awk and FPAT you can do this:
str="0123 some string"
awk -v FPAT='[012345789]+' '{print length($1)}' <<< "$str"
awk -v FPAT='[02345789]+' '{print length($1)}' <<< "$str"
awk -v FPAT='[01345789]+' '{print length($1)}' <<< "$str"
awk -v FPAT='[0123 ]+' '{print length($1)}' <<< "$str"
I know this is in Perl but I got to say that I like it:
$ perl -pe '$i++while s/^\d//;$_=$i' <<< '0123 some string'
In case of 1-based index you can use $. which is initialized at 1 when dealing with single lines:
$ perl -pe '$.++while s/^\d//;$_=$.' <<< '0123 some string'
I'm using \d because I assume that you by mistake left out the number 6 from the list 012345789
Index is currently pointing to the space:
0123 some string
^ this space
Even if shell globbing might look similar, it is not a regex.
It could be done in two steps: cut the string, count characters (length).
a="$1" ### string to process
b='0-9' ### range of characters not desired.
c=${a%%[!$b]*} ### cut the string at the first (not) "$b".
echo "${#c}" ### Print the value of the position index (from 0).
It is written to work on many shells (including bash, of course).
Use as:
$ "0123_some string"
$ "012s3_some string"
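The two steps can also be sketched as a function rather than a script. The name first_not_in is hypothetical; its second argument is the bracket-expression body (for example 0-9), as in the variable b above.

```bash
#!/usr/bin/env bash
# first_not_in: hypothetical helper; prints the 0-based index of the first
# character of $1 that is not in the set $2.
first_not_in() {
  local s=$1 set=$2
  # Remove the longest suffix that starts with a character outside the set.
  local prefix=${s%%[!$set]*}
  printf '%s\n' "${#prefix}"
}

first_not_in '0123_some string' '0-9'    # prints: 4
first_not_in '012s3_some string' '0-9'   # prints: 3
```

If every character belongs to the set, nothing is removed and the result equals the length of the string.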
