bash - extract part of variable that starts with digit

I have the following variable 'VAR1' in a bash script:
VAR1="/path/to/file/190909_AAA_ZZZ/"
I now want to create a variable (VAR2) that only contains the part "190909".
I want to do this by extracting the part that starts with any 6 digits (190909) and runs until the next "_".
How can this be achieved?
VAR2 = ${grep ... $VAR1} ???

Please try the following:
VAR1="/path/to/file/190909_AAA_ZZZ/"
[[ $VAR1 =~ ([0-9]{6})_ ]] && VAR2=${BASH_REMATCH[1]}
echo "$VAR2"
Output:
190909
Note that uppercase names are conventionally used for environment variables and shell-internal variables, so lowercase names are recommended for ordinary variables.
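If you'd rather avoid a regex, plain parameter expansion can do it too. This is a sketch that assumes the digits come right after the last / and that no _ appears earlier in the path (something like /my_path/... would break it). Unlike the regex, it also doesn't check that the result is 6 digits:

```shell
#!/usr/bin/env bash
var1="/path/to/file/190909_AAA_ZZZ/"
tmp=${var1%%_*}   # cut from the first "_" onwards: /path/to/file/190909
var2=${tmp##*/}   # cut everything up to the last "/": 190909
echo "$var2"      # 190909
```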

You may use this sed command:
var1="/path/to/file/190909_AAA_ZZZ/"
var2=$(sed -E 's~.*/([0-9]{6})_.*~\1~' <<< "$var1")
echo "$var2"
190909
RegEx Details:
.*/: Match anything (greedy) up to the / that precedes the 6 digits
([0-9]{6}): Match 6 digits and capture it in group #1
_.*: Match _ and everything until end
The replacement \1 puts back the value captured in group #1, so only the 6 digits remain.

PCRE grep (GNU grep -P) and Perl equivalents:
$ VAR1="/path/to/file/190909_AAA_ZZZ/"
$ grep -oP '[0-9]{6}(?=_)' <<< "$VAR1"
190909
$ perl -nE 'say for /([0-9]{6}(?=_))/' <<< "$VAR1"
190909
Explanation on regex: https://regex101.com/r/D2NVOl/1
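If only a POSIX sh is available (no [[ =~ ]], no <<<, no grep -P), expr's : operator is another option. It anchors a basic regular expression at the start of the string and prints what the \(...\) group captured. A minimal sketch:

```shell
#!/bin/sh
var1="/path/to/file/190909_AAA_ZZZ/"
# .*/ is greedy, but it backtracks to the "/" that is followed by 6 digits and "_"
var2=$(expr "$var1" : '.*/\([0-9]\{6\}\)_')
echo "$var2"   # 190909
```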

Related

How can I increment an infix variable in Bash?

I have a string foo-0 that I want to convert to bar1baz, i.e., parse the trailing index, increment it, and add a prefix/suffix. The part before the trailing index (in this case foo-) can also contain numeric characters, but those should not be changed.
I tried the following:
echo foo-0 | cut -d'-' -f 2 | sed 's/.*/bar&baz/'
but that gives me only a partial solution (bar0baz). How can I increment the infix variable?
EDIT: the solutions below only work partially for what I am trying to achieve. This is my fault because I simplified the example above too much for the sake of clarity.
The final goal is to set an environment variable (let's call it MY_ENV) to the output value using bash with the following syntax:
/bin/sh -c "echo $var | ... (some bash magic to replace the trailing index) | ... (some bash magic to set MY_ENV=the output of the pipe)"
Side note: The reason I am using /bin/sh -c "..." is because I want to use the command in a Kubernetes YAML.
Partial solution (using awk)
This works:
echo foo-0 | awk -F- '{print "bar" $2+1 "baz"}'
This doesn't (output is 1baz):
/bin/sh -c "echo foo-0 | awk -F- '{print \"bar\" $2+1 \"baz\"}'"
Partial solution (using arithmetic context and parameter expansion)
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
This does not work if var contains other numeric characters before the trailing index (e.g. for var=r2a-foo-0).
You may use awk:
awk -F- '{print "bar" $2+1 "baz"}' <<< 'foo-0'
bar1baz
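About the /bin/sh -c attempt in the question: inside the outer double quotes, $2 is expanded by the calling shell (to an empty string) before awk ever runs. That is why the output is 1baz. Below is a sketch that escapes the $ so awk receives a literal $2. It assumes the command is typed into a shell as shown. In a Kubernetes YAML there is no calling shell to unescape anything, so the escaping there differs. MY_ENV only exists inside that /bin/sh process, so it has to be used within the same -c string (printenv is just a stand-in for whatever command needs it):

```shell
#!/usr/bin/env bash
# \$2 and \" are unescaped by the calling shell, so /bin/sh receives:
#   echo foo-0 | awk -F- '{print "bar" $2+1 "baz"}'
/bin/sh -c "echo foo-0 | awk -F- '{print \"bar\" \$2+1 \"baz\"}'"

# MY_ENV is set and exported inside the /bin/sh process only,
# so use it in the same -c string (printenv is a stand-in here).
/bin/sh -c "MY_ENV=\$(echo foo-0 | awk -F- '{print \"bar\" \$2+1 \"baz\"}'); export MY_ENV; printenv MY_ENV"
```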
You could use an arithmetic context and parameter expansion:
$ var=foo-0
$ echo "bar$((${var//[![:digit:]]}+1))baz"
bar1baz
Unrolled, from the inside:
${var//[![:digit:]]} removes all non-digits from var:
$ echo "${var//[![:digit:]]}"
0
$((blah+1)) adds 1 to the variable blah:
$ blah=0
$ echo "$((blah+1))"
1
or, instead of blah, we can use the result of the inner substitution:
$ echo "$(( ${var//[![:digit:]]} + 1 ))"
1
and finally, putting this between bar and baz, you get bar1baz.
Amending for the other case brought up: assuming there might be other digits and we want to increment only the trailing ones, e.g.,
var=2a-foo-21
To do this, we can use nested parameter expansion with extended globs (shopt -s extglob) and the +(pattern) extended pattern, which matches one or more occurrences of pattern. Observe:
$ echo "${var#"${var%%+([[:digit:]])}"}"
21
The outer expansion is ${var#pattern}, which removes the shortest match of pattern from the beginning of $var. For pattern, we use
"${var%%+([[:digit:]])}"
which is "remove the longest match of +([[:digit:]]) (one or more digits) from the end of $var". This leaves us with just the trailing digits, and incrementing them and adding string before and after looks something like this:
$ echo "bar$((${var#"${var%%+([[:digit:]])}"}+1))baz"
bar22baz
This is so unreadable that I'd suggest using regex instead:
$ re='([[:digit:]]+)$'
$ [[ $var =~ $re ]]
$ echo "bar$((${BASH_REMATCH[1]}+1))baz"
bar22baz
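One caveat with both approaches: bash arithmetic reads a number with a leading zero as octal, so a trailing index such as 08 makes $(( ... )) fail with "value too great for base". Here is a sketch of the regex version wrapped in a function (the name increment_trailing is made up for illustration) that forces base 10 with the 10# prefix. It does not preserve zero padding, so foo-08 becomes bar9baz:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints bar<N+1>baz for an input ending in digits N.
increment_trailing() {
  local re='([[:digit:]]+)$'
  if [[ $1 =~ $re ]]; then
    # 10# forces base 10, so "08" is read as 8 rather than as invalid octal
    echo "bar$(( 10#${BASH_REMATCH[1]} + 1 ))baz"
  else
    return 1   # no trailing digits
  fi
}

increment_trailing 2a-foo-21   # bar22baz
increment_trailing foo-08      # bar9baz
```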

How to add a hyphen after every fifth character of a word in bash

Given "ABCDEFGHIJKLMNOPQRSTUVWXY"
How does one achieve this outcome? "ABCDE-FGHIJ-KLMNO-PQRST-UVWXY"
With sed you can do this by first adding a - after every 5 characters, then removing the trailing - at the end of the line:
$ sed -E 's/.{5}/&-/g; s/-$//' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
In extended (-E) mode:
.{5} matches any 5 characters
&- replaces with the whole match (the 5 characters) plus -
Then the second substitution command matches - at the end of the line ($) and replaces with nothing.
With GNU awk, one option would be to use FPAT to define the way the line is interpreted as a series of fields, then add - between each field:
$ awk -v FPAT='.{5}' -v OFS='-' '{ $1 = $1 } 1' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
The field pattern FPAT is defined as any 5 characters and the Output Field Separator OFS is defined as -. $1 = $1 "touches" every line, causing it to be reformatted (without this part, nothing would happen). 1 is the shortest true condition causing each line to be printed.
It's not too difficult to do this in bash either:
#!/bin/bash
input="ABCDEFGHIJKLMNOPQRSTUVWXY"
parts=()
# build an array from slices of length 5
for (( i = 0; i < ${#input}; i += 5 )); do
    parts+=( "${input:i:5}" )
done
# join the array on IFS (use a subshell to avoid modifying IFS for rest of script)
( IFS=-; echo "${parts[*]}" )
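Here is a variation on the loop above that skips the array and the IFS subshell. It appends each slice and puts the separator only between slices. chunk_join is a hypothetical name, and the sketch assumes a positive width:

```shell
#!/usr/bin/env bash
# Hypothetical helper: chunk_join WIDTH SEPARATOR STRING (WIDTH must be > 0)
chunk_join() {
  local width=$1 sep=$2 s=$3 out="" i
  for (( i = 0; i < ${#s}; i += width )); do
    # ${out:+$sep} expands to the separator only once out is non-empty
    out+=${out:+$sep}${s:i:width}
  done
  printf '%s\n' "$out"
}

chunk_join 5 - ABCDEFGHIJKLMNOPQRSTUVWXY   # ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
```

Because the separator is only added before a slice, there is never a trailing hyphen to strip, whatever the length of the string.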
Could you please try the following:
echo "ABCDEFGHIJKLMNOPQRSTUVWXY" | sed 's/...../&-/g;s/-$//'
A simple solution for letters only would be (the second command removes the hyphen that would otherwise follow a final full group of five):
sed -E 's/[A-Z]{4}./&-/g; s/-$//' file.txt
The output will be:
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
If you want to include more than capital letters, use:
sed -E 's/[A-Za-z]{4}./&-/g; s/-$//' file.txt
Try this:
#!/bin/bash
s="ABCDEFGHIJKLMNOPQRSTUVWXY"
a=($(echo "${s}" | grep -o .))
o=""
i=0
while [[ ${i} -lt ${#a[@]} ]]; do
    o="${o}${a[${i}]}"
    (( i++ ))
    [[ $(( i % 5 )) -eq 0 ]] && [[ ${i} -ne ${#a[@]} ]] && o="${o}-"
done
echo "${o}"
exit 0
Another solution with fold/paste:
$ echo {A..Y} | tr -d ' ' | # this is to generate the string
fold -w5 | paste -sd-
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
This might work for you (GNU sed):
sed 's/.\{5\}\B/&-/g' file
This inserts a hyphen after every five characters as long as the match does not end at a word boundary (\B), so no hyphen is added after the last group.
Yet another choice
perl -pe 's/(.{5})(?=.)/$1-/g' file
Match 5 characters that are followed by another character (to avoid the trailing hyphen problem)

Cut a substring in bash

Suppose I have the following string:
some letters foo/substring/goo/some additional letters
I need to extract this substring supposing that foo/ and /goo are constant strings that are known in advance. How can I do that?
This sed one-liner does it.
sed 's#.*foo/##;s#/goo/.*##' file
Besides sed, awk and grep can do the job too. Or, with zsh:
kent$ v="some letters foo/substring/goo/some additional letters"
kent$ echo ${${v##*foo/}%%/goo/*}
substring
Note (from a comment by @Nahuel Fouilleul): in bash, the var in ${var%%/goo/*} must be a variable name and can't be the result of another expansion, so the nested form above doesn't work there.
In bash, the line has to be split into two statements:
$ echo $0
bash
$ v="some letters foo/substring/goo/some additional letters"
$ v=${v##*foo/}
$ v=${v%%/goo/*}
$ echo $v
substring
The line I executed in zsh worked, but when I tested it in bash, it didn't.
$ echo $0
-zsh
$ v="some letters foo/substring/goo/some additional letters"
$ echo ${${v##*foo/}%%/goo/*}
substring
With parameter expansion:
line='some letters foo/substring/goo/some additional letters'
line=${line%%/goo*} # remove suffix /goo*
line=${line##*foo/} # remove prefix *foo/
echo "$line"
or a bash regular expression:
line='some letters foo/substring/goo/some additional letters'
if [[ $line =~ foo/([^/]*)/goo ]]; then
    echo "${BASH_REMATCH[1]}"
fi
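The parameter expansion version can be wrapped in a small function that takes the delimiters as arguments. In this sketch, between is a hypothetical name. Quoting "$start" and "$end" inside the patterns makes them match literally. Unlike the ##*foo/ above, #*"$start" cuts at the first occurrence of the start delimiter rather than the last:

```shell
#!/usr/bin/env bash
# Hypothetical helper: between START END STRING
between() {
  local start=$1 end=$2 s=$3
  # fail unless START occurs somewhere before END
  [[ $s == *"$start"*"$end"* ]] || return 1
  s=${s#*"$start"}                 # drop everything up to the first START
  printf '%s\n' "${s%%"$end"*}"    # drop everything from the first END on
}

between foo/ /goo 'some letters foo/substring/goo/some additional letters'   # substring
```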
If you know there are no other / characters in the surrounding letters, you can use cut:
> echo "some letters foo/substring/goo/some additional letters" | cut -d'/' -f2
In terms of readability, I think awk is a good solution:
echo "some letters foo/substring/goo/some additional letters" | awk -v FS="(foo/|/goo)" '{print $2}'

bash script command output execution doesn't assign full output when using backticks

I have often used backticks (``) to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and I get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
Why doesn't XVAR get all the values?
However, if I use $() instead of backticks to capture the output, it works fine. Why don't backticks work?
With GNU grep, you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
With -P, grep uses Perl-compatible regular expressions. The key here is the \K escape sequence. Basically, the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ drops the $ from the reported match, while the $ is still required for the whole pattern to match. Call it a poor man's lookbehind if you want; I like it! :)
By the way, grep -o prints each match on a separate line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
    zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
    content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"
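The backtick version differs from $() because of backslash handling. Inside backticks, a backslash followed by $, ` or \ is removed before the command runs, even inside single quotes. $( ... ) leaves the command text alone. So in the backtick version sed receives s/$/\n/g, which matches the end of the line instead of a literal $. No line breaks are inserted at the $ characters, so the greedy .* in the second sed keeps only the last name. A minimal demonstration:

```shell
#!/usr/bin/env bash
a=`echo 's/\$/X/'`     # backticks strip the backslash before $
b=$(echo 's/\$/X/')    # $() keeps the text exactly as written
printf '%s\n' "$a"     # s/$/X/
printf '%s\n' "$b"     # s/\$/X/
```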

How can I capture the text between specific delimiters into a shell variable?

I have a small problem with setting a variable. I have a file with normal text, and somewhere in it there is a pair of brackets [ ] (only one pair in the whole file) with some text between them. I need to capture the text within these brackets in a shell (bash) variable. How can I do that, please?
Bash/sed:
VARIABLE=$(tr -d '\n' < filename | sed -n -e '/\[[^]]/s/^[^[]*\[\([^]]*\)].*$/\1/p')
If that is unreadable, here's a bit of an explanation:
VARIABLE=$(subexpression) Assigns the output of the subexpression to the variable VARIABLE.
tr -d '\n' < filename Reads filename, deletes newline characters, and passes the result to sed's input
sed -n -e 'command' Executes the sed command without printing lines automatically (only the p flag below prints)
/\[[^]]/ Execute the command only on lines which contain a [ followed by a character other than ]
s/ Substitute
^[^[]* Match any non-[ text
\[ Match [
\([^]]*\) Match any non-] text into group 1
] Match ]
.*$ Match the rest of the line
/\1/ Replaces the line with group 1
p Prints the line
May I point out that while most of the suggested solutions might work, there is absolutely no reason why you should fork another shell, and spawn several processes to do such a simple task.
The shell provides you with all the tools you need:
$ var='foo[bar] pinch'
$ var=${var#*[}; var=${var%%]*}
$ echo "$var"
bar
See: http://mywiki.wooledge.org/BashFAQ/073
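The question is about a file rather than a string. The sketch below assumes the file is small enough to hold in a variable. It uses bash's $(< file) to read the file without running cat. The temporary file here only stands in for the real one:

```shell
#!/usr/bin/env bash
file=$(mktemp)   # stand-in for the real input file
printf 'some text\nmore [text between\nbrackets] and the rest\n' > "$file"

var=$(< "$file")   # whole file; trailing newlines are stripped
var=${var#*[}      # drop everything up to and including the first [
var=${var%%]*}     # drop everything from the first ] onwards
printf '%s\n' "$var"
rm -f "$file"
```

This also works when [ and ] are on different lines; the newline between them is kept in var.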
Sed is not necessary:
var=$(grep -Eo '\[.*\]' FILENAME | tr -d '][')
But it only works when the brackets are on the same line.
Using Bash builtin regex matching seems like yet another way of doing it. Keeping the regex in a variable avoids quoting problems with the brackets:
var='foo[bar] pinch'
re='\[([^]]*)\]'
[[ $var =~ $re ]] && var=${BASH_REMATCH[1]}
echo "$var"
Assuming you are asking about a bash variable:
$ export YOUR_VAR=$(perl -ne'print $1 if /\[(.*?)\]/' your_file.txt)
The above works if brackets are on the same line.
What about:
shell_variable=$(sed -ne '/\[/,/\]/{s/^.*\[//;s/\].*//;p;}' $file)
Worked for me on Solaris 10 under Korn shell; should work with Bash too. Replace '$(...)' with back-ticks in Bourne shell.
Edit: worked when given [ on one line and ] on another. For the single line case as well, use:
shell_variable=$(sed -n -e '/\[[^]]*$/,/\]/{s/^.*\[//;s/\].*//;p;}' \
-e '/\[.*\]/s/^.*\[\([^]]*\)\].*$/\1/p' $file)
The first '-e' deals with the multi-line spread; the second '-e' deals with the single-line case. The first '-e' says:
From the line containing an open bracket [ not followed by a close bracket ] on the same line
Until the line containing close bracket ],
substitute anything up to and including the open bracket with an empty string,
substitute anything from the close bracket onwards with an empty string, and
print the result
The second '-e' says:
For any line containing both open bracket and close bracket
Substitute the pattern consisting of 'characters up to and including open bracket', 'characters up to but excluding close bracket' (and remember this), 'stuff from close bracket onwards' with the remembered characters in the middle, and
print the result
For the multi-line case:
$ file=xxx
$ cat xxx
sdsajdlajsdl
asdajsdkjsaldjsal
sdasdsad [aaaa
bbbbbbb
cccc] asdjsalkdjsaldjlsaj
asdjsalkdjlksjdlaj
asdasjdlkjsaldja
$ shell_variable=$(sed -n -e '/\[[^]]*$/,/\]/{s/^.*\[//;s/\].*//;p;}' \
-e '/\[.*\]/s/^.*\[\([^]]*\)\].*$/\1/p' $file)
$ echo $shell_variable
aaaa bbbbbbb cccc
$
And for the single-line case:
$ cat xxx
sdsajdlajsdl
asdajsdkjsaldjsal
sdasdsad [aaaa bbbbbbb cccc] asdjsalkdjsaldjlsaj
asdjsalkdjlksjdlaj
asdasjdlkjsaldja
$
$ shell_variable=$(sed -n -e '/\[[^]]*$/,/\]/{s/^.*\[//;s/\].*//;p;}' \
-e '/\[.*\]/s/^.*\[\([^]]*\)\].*$/\1/p' $file)
$ echo $shell_variable
aaaa bbbbbbb cccc
$
Somewhere about here, it becomes simpler to do the whole job in Perl, slurping the file and editing the result string in two multi-line substitute operations.
var=`grep -e '\[.*\]' test.txt | sed -e 's/.*\[\(.*\)\].*/\1/'`
Two simple steps to extract the text:
split var at [ and get the right part
split var at ] and get the left part
cb0$ var='foo[bar] pinch'
cb0$ var=${var#*[}
cb0$ var=${var%]*} && echo $var
bar
