How to get first and second part of a string in bash on last occurrence of a delimiter - bash

I need to split a string on the last occurrence of a delimiter and get both the parts into two variables.
Input could be
- stringOne_One/stringOne_Two/stringOne_Three
- stringTwo_One/stringTwo_Two
I want to split the string on the last occurrence of the delimiter "/" and get both the first and the last part of the string into two variables.
For the first example output should be
var1=stringOne_One/stringOne_Two
var2=stringOne_Three
For the second example, output should be
var1=stringTwo_One
var2=stringTwo_Two
How do I do this in bash? I would prefer a solution using AWK, but any other method is also acceptable.

Use dirname and basename like so:
my_var='stringOne_One/stringOne_Two/stingOne_Three'
var1=$(dirname "$my_var")
# stringOne_One/stringOne_Two
var2=$(basename "$my_var")
# stingOne_Three

With bash and a regex:
string='stringOne_One/stringOne_Two/stingOne_Three'
[[ "$string" =~ (.*)/(.*) ]]
var1="${BASH_REMATCH[1]}"
var2="${BASH_REMATCH[2]}"
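The regex above works because the first (.*) is greedy, so the "/" it stops at is the last one. A minimal sketch that also checks whether the match succeeded; the function name split_on_last_slash and the fallback for input without a "/" are assumptions, not part of the answer:

```bash
# Hypothetical helper: split "$1" on its last "/" into var1 and var2.
split_on_last_slash() {
  local re='^(.*)/([^/]*)$'
  if [[ $1 =~ $re ]]; then
    var1=${BASH_REMATCH[1]}   # everything before the last "/"
    var2=${BASH_REMATCH[2]}   # everything after the last "/"
  else
    # Assumption: with no "/" present, keep the whole string in var1.
    var1=$1
    var2=''
  fi
}

split_on_last_slash 'stringOne_One/stringOne_Two/stringOne_Three'
echo "$var1 : $var2"
```

Without the check, BASH_REMATCH is simply empty when the string contains no "/", which can be hard to spot later in a script.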

Using parameter expansion:
$ x="stringOne_One/stringOne_Two/stingOne_Three"
$ var1="${x%/*}"
$ var2="${x##*/}"
$ echo "${var1} : ${var2}"
stringOne_One/stringOne_Two : stingOne_Three
$ x="stringTwo_One/stringTwo_Two"
$ var1="${x%/*}"
$ var2="${x##*/}"
$ echo "${var1} : ${var2}"
stringTwo_One : stringTwo_Two
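Since the question asks for AWK, here is one possible sketch. It walks through every occurrence of the delimiter with index() and splits at the last one. The function name split_last and the choice to print the two parts on separate lines are assumptions made for illustration:

```bash
# Hypothetical helper: print the part before the last occurrence of
# delimiter "$2" in "$1" on line 1, and the part after it on line 2.
split_last() {
  awk -v d="$2" '{
    pos = 0
    # Walk forward through every occurrence of d; pos ends at the last one.
    while ((i = index(substr($0, pos + 1), d)) > 0)
      pos += i
    if (pos == 0) {            # delimiter not found
      print $0; print ""
    } else {
      print substr($0, 1, pos - 1)
      print substr($0, pos + length(d))
    }
  }' <<< "$1"
}

x='stringOne_One/stringOne_Two/stringOne_Three'
{ IFS= read -r var1; IFS= read -r var2; } < <(split_last "$x" "/")
echo "$var1 : $var2"
```

For a single string already held in a shell variable, the parameter expansion shown above is shorter and avoids starting an extra process; the awk version is more useful when the same split has to be applied to many lines of input.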

Related

How to prepend to a string that comes out of a pipe

I have two strings saved in a bash variable delimited by :. I want to extract the second string, prepend THIS_VAR= to it, and append the result to a file named saved.txt.
For example if myVar="abc:pqr", THIS_VAR=pqr should be appended to saved.txt.
This is what I have so far,
myVar="abc:pqr"
echo $myVar | cut -d ':' -f 2 >> saved.txt
How do I prepend THIS_VAR=?
printf 'THIS_VAR=%q\n' "${myVar#*:}"
See Shell Parameter Expansion and run help printf.
The more general solution, in addition to @konsolebox's answer, is piping into a compound command, where you can perform arbitrary operations:
echo This is in the middle | {
echo This is first
cat
echo This is last
}
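Both approaches can be applied to the original task of appending to a file. This is a sketch only: it writes to a temporary file (the out_file variable is a stand-in for saved.txt) and uses %s rather than %q so the value is written without shell quoting:

```bash
myVar="abc:pqr"
out_file=$(mktemp)   # stand-in for saved.txt in this sketch

# 1. Parameter expansion: strip everything up to the first ":".
printf 'THIS_VAR=%s\n' "${myVar#*:}" >> "$out_file"

# 2. Pipeline: prepend the prefix inside a compound command.
echo "$myVar" | cut -d ':' -f 2 | { printf 'THIS_VAR='; cat; } >> "$out_file"

cat "$out_file"
# THIS_VAR=pqr
# THIS_VAR=pqr
```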

In bash how can I get the last part of a string after the last hyphen [duplicate]

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
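The same idea works with a different separator via awk's -F option, which is what the "last hyphen" in the title would need. A short sketch; the variable B and its value are made up for illustration:

```bash
A="Some variable has value abc.123"
field_count=$(awk '{print NF}' <<< "$A")     # number of whitespace-separated fields
last_field=$(awk '{print $NF}' <<< "$A")     # contents of the last field

B="qa-build-2024-final"                      # hypothetical hyphen-separated value
after_last_hyphen=$(awk -F- '{print $NF}' <<< "$B")

echo "$field_count | $last_field | $after_last_hyphen"
# 5 | abc.123 | final
```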
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
echo "${A##* }"
abc.123
Shortest match on " " space
echo "${A% *}"
Some variable has value
Shortest match on . dot
echo "${A%.*}"
Some variable has value abc
Longest match on " " space
echo "${A%% *}"
Some
Read more Shell-Parameter-Expansion
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
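The four operators summarised above can be tried side by side on one value. A small sketch; the sample path is made up for illustration:

```bash
path="/usr/bin/archive.tar.gz"   # hypothetical sample path

file=${path##*/}        # longest leading match of "*/"   -> archive.tar.gz
dir=${path%/*}          # shortest trailing match of "/*" -> /usr/bin
stem_short=${file%.*}   # shortest trailing match of ".*" -> archive.tar
stem_long=${file%%.*}   # longest trailing match of ".*"  -> archive
ext_long=${file#*.}     # shortest leading match of "*."  -> tar.gz
ext_short=${file##*.}   # longest leading match of "*."   -> gz

printf '%s\n' "$file" "$dir" "$stem_short" "$stem_long" "$ext_long" "$ext_short"
```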
How do you know where the value begins? If it always starts at the 5th word, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here, this is a very clean method that works on most Unix-like systems, and you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means "match 0 or more instances of the preceding match pattern (which is [^ ])", and the $ means "match the end of the line". So, this matches the last word after the last space through to the end of the line; i.e. abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements being separated by the default IFS (Internal Field Separator) char, which is space:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What also makes this pattern useful is that it lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
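Putting the array steps together, here is a compact sketch. The variable names are illustrative, and the last element is addressed as count-1 rather than -1 as an assumption that the script may also run on an older bash (such as 3.2) where negative subscripts are not available:

```bash
A="Some variable has value abc.123"

# Split on spaces into an array; IFS is changed only for this read.
IFS=" " read -r -a words <<< "$A"

count=${#words[@]}
last=${words[count-1]}                 # last word
all_but_last="${words[*]:0:count-1}"   # remaining words, re-joined with spaces

echo "$last"
echo "$all_but_last"
```

The subscript and the slice length are both evaluated arithmetically, which is why count-1 can be written without $(( )).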
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
With Perl:
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'

Bash matching part of string

Say I have a string like
s1="sxfn://xfn.oxbr.ac.uk:8843/xfn/mech2?XFN=/castor/xf.oxbr.ac.uk/prod/oxbr.ac.uk/disk/xf20.m.ac.uk/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
or
s2="sxfn://xfn.gla.ac.uk:8841/xfn/mech2?XFN=/castor/xf.gla.ac.uk/space/disk1/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
and I want in my script to extract the last part starting from prod/ i.e. "prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst". Note that $s1 contains two occurrences of "prod/".
What is the most elegant way to do this in bash?
Using BASH string manipulations you can do:
echo "prod/${s1##*prod/}"
prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst
echo "prod/${s2##*prod/}"
prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst
With awk (which is a little overpowered for this, but it may be helpful if you have a file full of these strings you need to parse):
echo "sxfn://xfn.gla.ac.uk:8841/xfn/mech2?XFN=/castor/xf.gla.ac.uk/space/disk1/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst" | awk -F'prod/' '{print "prod/" $NF}'
That splits the string on 'prod/', then prints the 'prod/' delimiter followed by the last token in the string ($NF).
sed can do it nicely:
s1="sxfn://xfn.oxbr.ac.uk:8843/xfn/mech2?XFN=/castor/xf.oxbr.ac.uk/prod/oxbr.ac.uk/disk/xf20.m.ac.uk/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
echo "$s1" | sed 's|.*prod/|prod/|'
This relies on the greedy matching of the .* part up front, which consumes everything up to the last "prod/".
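One caveat with the parameter-expansion form: when the string contains no "prod/" at all, ${s##*prod/} returns the string unchanged, so blindly prepending "prod/" would give a wrong result. A guarded sketch; the function name extract_from_last_prod and its non-zero return on a missing marker are assumptions for illustration:

```bash
# Hypothetical helper: print the part of "$1" starting at the last "prod/",
# or return 1 if "prod/" does not occur.
extract_from_last_prod() {
  local s=$1
  case $s in
    *prod/*) printf 'prod/%s\n' "${s##*prod/}" ;;
    *)       return 1 ;;
  esac
}

s1="sxfn://xfn.oxbr.ac.uk:8843/xfn/mech2?XFN=/castor/xf.oxbr.ac.uk/prod/oxbr.ac.uk/disk/xf20.m.ac.uk/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
extract_from_last_prod "$s1"
# prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst
```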

unix shell replace string twice (in one line)

I run a script with the param -A AA/BB. To get an array with AA and BB, I can do this:
AIRLINES_PARAM=(${AIRLINE_OPTION//-A / }) # get rid of the '-A ' at the beginning
LIST=(${AIRLINES_PARAM//\// }) # split by '/'
Can we achieve this in a single line?
Thanks in advance.
One way
IFS=/ read -r -a LIST <<< "${AIRLINE_OPTION//-A /}"
This places the output from the parameter substitution ${AIRLINE_OPTION//-A /} into a "here-string" and uses the bash read built-in to parse this into an array. Splitting by / is achieved by setting the value of IFS to / for the read command.
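A short sketch of the same technique that also shows the IFS change is limited to the read command. The third element CC and the saved_ifs variable are added here purely for illustration:

```bash
AIRLINE_OPTION="-A AA/BB/CC"   # CC added for illustration
saved_ifs=$IFS                 # remembered only to show IFS is untouched afterwards

# Strip the leading "-A " and split the rest on "/" into an array.
IFS=/ read -r -a LIST <<< "${AIRLINE_OPTION#-A }"

printf '%s\n' "${LIST[@]}"
# AA
# BB
# CC

[ "$IFS" = "$saved_ifs" ] && echo "IFS unchanged"
```

Because the assignment prefixes a single command, there is no need to save and restore IFS manually.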
LIST=( $(IFS=/; for x in ${AIRLINE_OPTION#-A }; do printf '%s ' "$x"; done) )
This is a portable solution, but if your read supports -a and portability is not a concern, then you should go for @1_CR's solution.
With awk, for example, you can create an array and store it in LIST variable:
$ LIST=($(awk -F"[\/ ]" '{print $2,$3}' <<< "-A AA/BB"))
Result:
$ echo ${LIST[0]}
AA
$ echo ${LIST[1]}
BB
Explanation
-F"[\/ ]" defines two possible field separators: a space or a slash /.
'{print $2,$3}' prints the 2nd and 3rd fields, separated by a space, based on those separators.

how to chop last n bytes of a string in bash string choping?

for example qa_sharutils-2009-04-22-15-20-39, want chop last 20 bytes, and get 'qa_sharutils'.
I know how to do it in sed, but why $A=${A/.\{20\}$/} does not work?
Thanks!
If your string is stored in a variable called str, then this will give you the substring without the last 20 characters in bash:
${str:0:${#str} - 20}
basically, string slicing can be done using
${[variableName]:[startIndex]:[length]}
and the length of a string is
${#[variableName]}
EDIT:
solution using sed that works on files:
sed 's/.\{20\}$//' < inputFile
similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3
using awk:
echo $str | awk '{print substr($0,1,length($0)-20)}'
or using strings manipulation - echo ${string:position:length}:
echo ${str:0:$((${#str}-20))}
In the ${parameter/pattern/string} syntax in bash, pattern is a path wildcard-style (glob) pattern, not a regular expression. In wildcard syntax a dot . is just a literal dot, and the escaped curly braces and the trailing $ are literal characters too, so that line would only remove the literal text .{20}$ if it occurred in the string; it never means "any 20 characters at the end". Note also that an assignment is written A=..., not $A=....
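To make the difference concrete, the sketch below first shows that the regex-style attempt leaves the string untouched, then removes the last 20 characters once with a glob made of 20 "?" wildcards and once with a real regular expression via [[ =~ ]]. The variable names are illustrative only:

```bash
str="qa_sharutils-2009-04-22-15-20-39"

# Regex-style pattern inside ${var/pattern/}: treated as a glob, matches nothing here.
unchanged=${str/.\{20\}$/}

# Glob equivalent of "any 20 characters at the end": twenty "?" wildcards.
pat=$(printf '?%.0s' {1..20})
chopped_glob=${str%$pat}

# An actual regular expression, using bash's =~ operator.
re='^(.*).{20}$'
[[ $str =~ $re ]] && chopped_regex=${BASH_REMATCH[1]}

echo "$unchanged"
echo "$chopped_glob"
echo "$chopped_regex"
```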
There are several ways to accomplish the basic task.
$ str="qa_sharutils-2009-04-22-15-20-39"
To strip the last 20 characters (this substring selection is zero-based):
$ echo ${str::${#str}-20}
qa_sharutils
The "%" and "%%" operators strip from the right-hand side of the string. For instance, if you want the basename, minus anything that follows the first "-":
$ echo ${str%%-*}
qa_sharutils
This works only if the last 20 bytes are always a date:
$ str="qa_sharutils-2009-04-22-15-20-39"
$ IFS="-"
$ set -- $str
$ echo $1
qa_sharutils
$ unset IFS
Or, when the first dash and everything after it are not needed:
$ echo ${str%%-*}
qa_sharutils
