Cut a substring in bash - bash

Suppose I have the following string:
some letters foo/substring/goo/some additional letters
I need to extract this substring supposing that foo/ and /goo are constant strings that are known in advance. How can I do that?

This sed one-liner does it.
sed 's#.*foo/##;s#/goo/.*##' file
Besides sed, awk and grep can do the job too. Or with zsh:
kent$ v="some letters foo/substring/goo/some additional letters"
kent$ echo ${${v##*foo/}%%/goo/*}
substring
Note (from a comment by @Nahuel Fouilleul):
in ${var%%/goo/*}, var must be a variable name and can't be the result of another expansion.
So in bash, the line has to be split into two statements:
$ echo $0
bash
$ v="some letters foo/substring/goo/some additional letters"
$ v=${v##*foo/}
$ v=${v%%/goo/*}
$ echo $v
substring
The nested one-liner worked when I ran it in zsh, but it did not work when I tested it in bash.
$ echo $0
-zsh
$ v="some letters foo/substring/goo/some additional letters"
$ echo ${${v##*foo/}%%/goo/*}
substring

With variable expansion
line='some letters foo/substring/goo/some additional letters'
line=${line%%/goo*} # remove suffix /goo*
line=${line##*foo/} # remove prefix *foo/
echo "$line"
or bash regular expression
line='some letters foo/substring/goo/some additional letters'
if [[ $line =~ foo/([^/]*)/goo ]]; then
    echo "${BASH_REMATCH[1]}"
fi
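The two-step expansion can be wrapped in a small function so the markers are not hard-coded. This is only a sketch: the name extract_between and its argument order are made up for illustration, and it assumes the prefix marker appears before the suffix marker in the string.

```shell
# extract_between STRING PREFIX SUFFIX  (hypothetical helper, not from the answers)
extract_between() {
    local s=$1 prefix=$2 suffix=$3
    s=${s##*"$prefix"}   # drop everything up to and including the last PREFIX
    s=${s%%"$suffix"*}   # drop the first remaining SUFFIX and everything after it
    printf '%s\n' "$s"
}

result=$(extract_between 'some letters foo/substring/goo/some additional letters' 'foo/' '/goo/')
echo "$result"
```

Quoting "$prefix" and "$suffix" inside the expansions makes them match literally, so glob characters in the markers are not treated as patterns.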

If you know there are no other / characters in the surrounding letters, you can use cut:
> echo "some letters foo/substring/goo/some additional letters" | cut -d'/' -f2

In terms of readability I think awk is a good solution
echo "some letters foo/substring/goo/some additional letters" | awk -v FS="(foo/|/goo)" '{print $2}'

Related

bash - extract part of variable that starts with digit

I have the following variable 'VAR1' in a bash script:
VAR1="/path/to/file/190909_AAA_ZZZ/"
I now want to create a variable (VAR2) that only contains the part "190909".
I want to do this by extracting the part that starts with any 6 digits (190909) until the next "_"
How can this be achieved?
VAR2 = ${grep ... $VAR1} ???
Please try the following:
VAR1="/path/to/file/190909_AAA_ZZZ/"
[[ $VAR1 =~ ([0-9]{6})_ ]] && VAR2=${BASH_REMATCH[1]}
echo "$VAR2"
Output:
190909
Note that it is not recommended to use uppercase letters for normal variable names.
You may use this sed command:
var1="/path/to/file/190909_AAA_ZZZ/"
var2=$(sed -E 's~.*/([0-9]{6})_.*~\1~' <<< "$var1")
echo "$var2"
190909
RegEx Details:
.*/: Match anything (greedy) till we match /
([0-9]{6}): Match 6 digits and capture it in group #1
_.*: Match _ and everything until end
Replacement is \1 which is to put captured group #1 value back.
pcre grep and perl equivalents:
$ VAR1="/path/to/file/190909_AAA_ZZZ/"
$ grep -oP '[0-9]{6}(?=_)' <<< $VAR1
190909
$ perl -nE 'say for /([0-9]{6}(?=_))/' <<< $VAR1
190909
Explanation on regex: https://regex101.com/r/D2NVOl/1
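If no external tool or regex is wanted, plain parameter expansion may be enough. This is a different technique from the answers above, sketched under the assumption that the six digits always start the last path component; the intermediate variable dir is illustrative.

```shell
var1="/path/to/file/190909_AAA_ZZZ/"
dir=${var1%/}      # strip the trailing slash
dir=${dir##*/}     # keep only the last path component: 190909_AAA_ZZZ
var2=${dir%%_*}    # keep what precedes the first underscore

# Reject anything that is not exactly six digits
case $var2 in
    [0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) var2="" ;;
esac
echo "$var2"
```

The case check stands in for the {6} quantifier of the regex versions; without it, any text before the first underscore would be accepted.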

How can I increment an infix variable in Bash?

I have a string foo-0 that I want to convert to bar1baz, i.e., parse the trailing index, increment it, and add a prefix/suffix. The part before the trailing index (in this case foo-) can also contain numeric characters, but those should not be changed.
I tried the following:
echo foo-0 | cut -d'-' -f 2 | sed 's/.*/bar&baz/'
but that gives me only a partial solution (bar0baz). How can I increment the infix variable?
EDIT: the solutions below only work partially for what I am trying to achieve. This is my fault because I simplified the example above too much for the sake of clarity.
The final goal is to set an environment variable (let's call it MY_ENV) to the output value using bash with the following syntax:
/bin/sh -c "echo $var | ... (some bash magic to replace the trailing index) | ... (some bash magic to set MY_ENV=the output of the pipe)"
Side note: The reason I am using /bin/sh -c "..." is because I want to use the command in a Kubernetes YAML.
Partial solution (using awk)
This works:
echo foo-0 | awk -F- '{print "bar" $2+1 "baz"}'
This doesn't (output is 1baz):
/bin/sh -c "echo foo-0 | awk -F- '{print \"bar\" $2+1 \"baz\"}'"
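The 1baz output is what one would expect if the outer shell expands $2 inside the double quotes before awk ever sees it: $2 is normally empty there, so awk evaluates "bar" +1 "baz". A sketch of one way around it is to escape the dollar sign; the variable out is only there to capture the result.

```shell
# \$2 keeps the outer shell from expanding $2, so awk receives it intact
out=$(/bin/sh -c "echo foo-0 | awk -F- '{print \"bar\" \$2+1 \"baz\"}'")
echo "$out"
```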
Partial solution (using arithmetic context and parameter expansion)
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
This does not work if var contains other numeric characters before the trailing index (e.g. for var=r2a-foo-0).
You may use awk:
awk -F- '{print "bar" $2+1 "baz"}' <<< 'foo-0'
bar1baz
You could use an arithmetic context and parameter expansion:
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
bar1baz
Unrolled, from the inside:
${var//[![:digit:]]} removes all non-digits from var:
$ echo "${var//[![:digit:]]}"
0
$((blah+1)) adds 1 to the variable blah:
$ blah=0
$ echo "$((blah+1))"
1
or, instead of blah, we can use the result of the inner substitution:
$ echo "$(( ${var//[![:digit:]]} + 1 ))"
1
and finally, putting this between bar and baz, you get bar1baz.
Amending for the other case brought up: assuming there might be other digits and we want to increment only the trailing ones, e.g.,
var=2a-foo-21
To do this, we can use nested parameter expansion with extended globs (shopt -s extglob) and the +(pattern) pattern, which matches one or more of pattern. Observe:
$ echo "${var#"${var%%+([[:digit:]])}"}"
21
The outer expansion is ${var#pattern}, which removes the shortest match of pattern from the beginning of $var. For pattern, we use
"${var%%+([[:digit:]])}"
which is "remove the longest match of +([[:digit:]]) (one or more digits) from the end of $var". This leaves us with just the trailing digits, and incrementing them and adding string before and after looks something like this:
$ echo "bar$((${var#"${var%%+([[:digit:]])}"}+1))baz"
bar22baz
This is so unreadable that I'd suggest using regex instead:
$ re='([[:digit:]]+)$'
$ [[ $var =~ $re ]]
$ echo "bar$((${BASH_REMATCH[1]}+1))baz"
bar22baz
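The regex variant can be packaged as a function that also copes with two edge cases the examples do not cover: input without a trailing number, and an index with a leading zero (which bash arithmetic would otherwise read as octal). This is a sketch; bump_trailing_index is a hypothetical name, and the bar/baz wrapper is taken from the question.

```shell
#!/usr/bin/env bash
# bump_trailing_index STRING  (hypothetical helper)
bump_trailing_index() {
    local var=$1 re='([[:digit:]]+)$'
    [[ $var =~ $re ]] || return 1    # no trailing digits: report failure
    # 10# forces base 10, so an index such as 08 is not parsed as octal
    printf 'bar%dbaz\n' "$(( 10#${BASH_REMATCH[1]} + 1 ))"
}

bump_trailing_index foo-0
bump_trailing_index r2a-foo-21
bump_trailing_index job-08
```

Digits earlier in the string, as in r2a-foo-21, are ignored because the pattern is anchored at the end with $.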

How to add a hyphen after every fifth character of a word in bash

Given "ABCDEFGHIJKLMNOPQRSTUVWXY"
How does one achieve this outcome? "ABCDE-FGHIJ-KLMNO-PQRST-UVWXY"
With sed you can do this by first adding a - after every 5 characters, then removing the trailing - at the end of the line:
$ sed -E 's/.{5}/&-/g; s/-$//' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
In extended (-E) mode:
.{5} matches any 5 characters
&- replaces with the whole match (the 5 characters) plus -
Then the second substitution command matches - at the end of the line ($) and replaces with nothing.
With GNU awk, one option would be to use FPAT to define the way the line is interpreted as a series of fields, then add - between each field:
$ awk -v FPAT='.{5}' -v OFS='-' '{ $1 = $1 } 1' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
The field pattern FPAT is defined as any 5 characters and the Output Field Separator OFS is defined as -. $1 = $1 "touches" every line, causing it to be reformatted (without this part, nothing would happen). 1 is the shortest true condition causing each line to be printed.
It's not too difficult to do this in bash either:
#!/bin/bash
input="ABCDEFGHIJKLMNOPQRSTUVWXY"
parts=()
# build an array from slices of length 5
for (( i = 0; i < ${#input}; i += 5 )); do
    parts+=( "${input:i:5}" )
done
# join the array on IFS (use a subshell to avoid modifying IFS for rest of script)
( IFS=-; echo "${parts[*]}" )
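The same slicing idea can be written without an array or a subshell by building the result string directly. A minimal sketch, assuming bash; the function name group_chars and its optional size and separator arguments are invented for illustration.

```shell
#!/usr/bin/env bash
# group_chars STRING [SIZE] [SEP]  (hypothetical helper; defaults: 5 and "-")
group_chars() {
    local input=$1 size=${2:-5} sep=${3:--} out="" i
    for (( i = 0; i < ${#input}; i += size )); do
        # prepend the separator only when something has already been emitted
        out+=${out:+$sep}${input:i:size}
    done
    printf '%s\n' "$out"
}

group_chars "ABCDEFGHIJKLMNOPQRSTUVWXY"
group_chars "ABCDEFG" 3 .
```

Because the separator is added before each slice except the first, there is no trailing separator to strip, even when the length is not a multiple of the group size.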
Could you please try the following:
echo "ABCDEFGHIJKLMNOPQRSTUVWXY" | sed 's/...../&-/g;s/-$//'
A simple solution for only letters will be
sed -E 's/[A-Z]{4}./&-/g' file.txt
The output will be:
ABCDE-FGHIJ-KLMOP-QRSTU-VWXY
If you want to include more than capital letters, use:
sed -E 's/[A-Za-z]{4}./&-/g' file.txt
Try this
#!/bin/bash
s="ABCDEFGHIJKLMNOPQRSTUVWXY"
a=($(echo ${s} | grep -o .))
o=""
i=0
while [[ ${i} -lt ${#a[@]} ]]; do
    o="${o}${a[${i}]}"
    (( i++ ))
    [[ $(( i % 5 )) -eq 0 ]] && [[ ${i} -ne ${#a[@]} ]] && o="${o}-"
done
echo ${o}
exit 0
another solution with fold/paste
$ echo {A..Y} | tr -d ' ' | # this is to generate the string
fold -w5 | paste -sd-
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
This might work for you (GNU sed):
sed 's/.\{5\}\B/&-/g' file
Insert a hyphen every five characters as long as the fifth character is inside a word.
Yet another choice
perl -pe 's/(.{5})(?=.)/$1-/g' file
Match 5 characters that are followed by another character (to avoid the trailing hyphen problem)

How to remove special characters from strings but keep underscores in shell script

I have a string that is something like "info_A!__B????????C_*". I want to remove the special characters from it but keep underscores and letters. I tried the [:word:] character set (ASCII letters and _), but it says "invalid character set". Any idea how to handle this? Thanks.
text="info_!_????????_*"
if [ -z `echo $text | tr -dc "[:word:]"` ]
......
Using bash parameter expansion:
$ var='info_A!__B????????C_*'
$ echo "${var//[^[:alnum:]_]/}"
info_A__BC_
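To use this in a test like the one in the question, the expansion can be put in a small function and its result checked for emptiness. A sketch assuming bash; keep_word_chars and cleaned are illustrative names.

```shell
#!/usr/bin/env bash
# keep_word_chars STRING  (hypothetical helper)
keep_word_chars() {
    local s=$1
    # delete every character that is neither alphanumeric nor an underscore
    printf '%s\n' "${s//[^[:alnum:]_]/}"
}

cleaned=$(keep_word_chars 'info_A!__B????????C_*')
if [ -z "$cleaned" ]; then
    echo "nothing left after cleaning"
else
    echo "$cleaned"
fi
```

Quoting "$cleaned" in the test matters: an unquoted empty result would leave [ -z ] with no operand.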
A sed one-liner would be
sed 's/[^[:alnum:]_]//g' <<< 'info_!????????*'
gives you
info_
An awk one-liner would be
awk '{gsub(/[^[:alnum:]_]/,"",$0)} 1' <<< 'info_!??A_??????*pi9ngo^%$_mingo745'
gives you
info_A_pi9ngo_mingo745
If you don't wish to have numbers in the output then change :alnum: to :alpha:.
My tr doesn't understand [:word:]. I had to do like this:
$ x=$(echo 'info_A!__B????????C_*' | tr -cd '[:alnum:]_')
$ echo $x
info_A__BC_
Not sure if it's a robust way, but it worked for your sample text.
sed one-liner:
echo "SamPlE_#tExT%, really ?" | sed -e 's/[^a-zA-Z_]//g'
SamPlE_tExTreally

bash script command output execution doesn't assign full output when using backticks

I have often used backticks (``) to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
Why doesn't XVAR get all the values?
However, if I use $() to capture the output instead of ``, it works fine. Why are the backticks not working?
Having GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P grep is using Perl compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o prints each match on its own line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
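A likely explanation for the difference between backticks and $(), offered as a sketch rather than a quote from the answers: inside backticks the shell removes the backslash from \$ (and from \` and \\) before running the inner command, even within single quotes. sed then receives s/$/\n/g, which matches the end of the line instead of a literal dollar sign. $(...) passes the text through unchanged. The variables a and b below are illustrative.

```shell
# Backticks: \$ is reduced to $ before printf runs
a=`printf '%s' 'x\$y'`
# $(...): the single-quoted text reaches printf untouched
b=$(printf '%s' 'x\$y')

printf '%s\n' "$a" "$b"
```

The two printed lines differ by the backslash, which is the same transformation that changes the meaning of the sed expression in the question.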
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
    zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
    content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"
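The grep -o route mentioned above is not spelled out in that answer. One possible form that avoids -P, assuming a grep that supports -o together with -E (GNU and BSD grep both do), matches the dollar sign and then strips it with tr:

```shell
XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
# [$] matches a literal dollar sign; tr removes it from each match afterwards
XVAR=$(printf '%s\n' "$XLINE" | grep -oE '[$]ZWP[A-Z_]+' | tr -d '$')
printf '%s\n' "$XVAR"
```

Using $() here sidesteps the backslash handling of backticks altogether, and [$] avoids needing a backslash in the pattern at all.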

Resources