Extract substring between characters in specific position - shell

I have the following string:
text=\"abcdef\"gfijk\"lmno\"
How can I extract the text between the last two " characters, so that I get only lmno?
I tried to use & but without success:
sub_text=${text&\"*}
echo ${sub_text&\"*}

This is easily done with parameter expansion.
First, delete that final quote with ${var%pattern}, which removes the shortest match for pattern from the end of $var:
result=${text%'"'} # result=\"abcdef\"gfijk\"lmno
Then, delete everything from the beginning up to the last remaining quote with ${var##pattern}, which removes the longest match for pattern from the beginning of $var:
result=${result##*'"'} # result=lmno
...and then you're there:
echo "$result"
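The two steps can be wrapped in a small function. This is only a sketch: the name between_last_quotes is made up for illustration, and it assumes the string ends with a double quote, as in the question.

```shell
# Hypothetical helper: print the text between the last two double quotes.
# Assumes the input ends with a double quote.
between_last_quotes() {
    local s
    s=${1%'"'}        # drop the final quote (shortest match from the end)
    s=${s##*'"'}      # drop everything through the last remaining quote
    printf '%s\n' "$s"
}

text='"abcdef"gfijk"lmno"'
result=$(between_last_quotes "$text")
echo "$result"   # lmno
```

Both expansions happen inside the shell itself, so no external command is started.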

Related

Stripping a string with another string in shell

I am running a shell script and I have the following string:
keystore_location="/mnt/blumeta0/db2/keystore/keystore.p12"
How do I fetch the string before keystore, i.e. /mnt/blumeta0/db2? I know how to strip on a single-character delimiter, but the path before keystore can change. I tried:
arrIN=(${keystore_location//\"keystore\"/ })
You want
arrIN=${keystore_location%%keystore*}
echo "$arrIN"
/mnt/blumeta0/db2/
The %% operator removes the longest match, reading from the right side of the string
Note that there are also the operators
% --- remove shortest match from the right side of the string
# --- remove shortest match starting from the left side of the string
## --- remove longest match starting from the left side of the string.
IHTH
$ keystore_location="/mnt/blumeta0/db2/keystore/keystore.p12"
$ echo "${keystore_location%%/keystore*}"
/mnt/blumeta0/db2
%%/keystore* removes the longest suffix matching /keystore* -which is a glob pattern- from $keystore_location.
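As a quick comparison, here is what the shortest-match and longest-match forms give on the same value. The variable names longest, shortest and no_slash are only illustrative.

```shell
keystore_location="/mnt/blumeta0/db2/keystore/keystore.p12"

# %% removes the longest suffix matching the pattern
longest=${keystore_location%%/keystore*}
# %  removes the shortest suffix matching the pattern
shortest=${keystore_location%/keystore*}
# without the leading / in the pattern, the trailing slash stays
no_slash=${keystore_location%%keystore*}

echo "$longest"    # /mnt/blumeta0/db2
echo "$shortest"   # /mnt/blumeta0/db2/keystore
echo "$no_slash"   # /mnt/blumeta0/db2/
```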

Build a variable made with 2 substrings of another variable in bash

Here is a script I use:
for dir in $(find . -type d -name "single_copy_busco_sequences"); do
    sppname=$(dirname $(dirname $(dirname $dir)) | sed 's#./##g')
    for file in ${dir}/*.faa; do
        name=$(basename $file)
        cp $file /Users/admin/Documents/busco_aa/${sppname}_${name}
        sed -i '' 's#>#>'${sppname}'|#g' /Users/admin/Documents/busco_aa/${sppname}_${name}
        cut -f 1 -d ":" /Users/admin/Documents/busco_aa/${sppname}_${name} > /Users/admin/Documents/busco_aa/${sppname}_${name}.1
    done
done
The sppname variable is something like Genus_species.
Do you know how I could add a line to my script to create a new variable called abbrev which transforms Genus_species into Genspe, i.e. the first 3 letters concatenated with the first 3 letters after the _?
Examples:
Homo_sapiens gives Homsap
Canis_lupus gives Canlup
etc.
Thanks for your help :)
You can achieve this using a regular expression with sed:
echo "Homo_sapiens" | sed -e s'/^\(...\).*_\(...\).*/\1\2/'
Homsap
The pattern reads: start of line, 3 chars (kept in \1), anything, _, 3 chars (kept in \2), anything.
Replace echo "Homo_sapiens" with echo "$sppname" in your script.
PS: this will fail if either word has fewer than 3 chars.
You can do it all with bash built-in parameter expansions. Specifically, string indexes and substring removal.
$ a=Homo_sapiens; prefix=${a:0:3}; a=${a#*_}; postfix=${a:0:3}; echo $prefix$postfix
Homsap
$ a=Canis_lupus; prefix=${a:0:3}; a=${a#*_}; postfix=${a:0:3}; echo $prefix$postfix
Canlup
Using bash built-ins is generally more efficient than spawning separate subshell(s) to invoke utilities to accomplish the same thing.
Explanation
The string index form (bash only) allows you to index characters within a string, e.g.
* ${parameter:offset:length} ## indexes are zero-based, ${a:0:2} is the first 2 chars
where parameter is simply the name of the variable holding the string.
(You can index from the end of a string by using a negative offset preceded by a space or enclosed in parentheses, e.g. a=12345; echo ${a: -3:2} outputs "34".)
prefix=${a:0:3} ## save the first 3 characters in prefix
a=${a#*_} ## remove the front of the string through '_' (see below)
postfix=${a:0:3} ## save the first 3 characters after '_'
The substring removal forms (POSIX) are:
${parameter#word} remove the shortest match of word from the left of parameter
${parameter##word} remove the longest match of word from the left of parameter
and
${parameter%word} remove the shortest match of word from the right of parameter
${parameter%%word} remove the longest match of word from the right of parameter
(word is a glob pattern, so it can contain wildcards such as *)
a=${a#*_} ## trim from left up to (and including) the first '_'
See bash(1) - Linux manual page for full details.
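Putting the pieces together for the loop in the question, a minimal sketch could look like this. The function name make_abbrev is made up for illustration, it needs bash because of the ${var:offset:length} form, and it assumes the name contains an underscore.

```shell
# Hypothetical helper: Genus_species -> first 3 letters of each part.
# Assumes bash and a name of the form Genus_species.
make_abbrev() {
    local name=$1
    local genus=${name%%_*}     # everything before the first '_'
    local species=${name#*_}    # everything after the first '_'
    printf '%s%s\n' "${genus:0:3}" "${species:0:3}"
}

for sppname in Homo_sapiens Canis_lupus; do
    abbrev=$(make_abbrev "$sppname")
    echo "$sppname -> $abbrev"
done
# Homo_sapiens -> Homsap
# Canis_lupus -> Canlup
```

Unlike the sed version, a part shorter than 3 characters is simply used as-is, since ${var:0:3} returns whatever is available.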

Replace Last Occurrence of Substring in String (bash)

From the bash software manual:
${parameter/pattern/string}
The pattern is expanded to produce a
pattern just as in filename expansion. Parameter is expanded and the
longest match of pattern against its value is replaced with string.
... If pattern begins with ‘%’, it must match
at the end of the expanded value of parameter.
And so I've tried:
local new_name=${file/%old/new}
Where file is an absolute file path (/abc/defg/hij) and old and new are variable strings.
However this seems to be trying to match the literal %sb1.
What is the syntax for this?
Expected Output:
Given
old=sb1
new=sb2
Then
/foo/sb1/foo/bar/sb1 should become /foo/sb1/foo/bar/sb2
/foo/foosb1other/foo/bar/foosb1bar should become /foo/foosb1other/foo/bar/foosb2bar
Using only shell-builtin parameter expansion:
src=sb1; dest=sb2
old=/foo/foosb1other/foo/bar/foosb1bar
if [[ $old = *"$src"* ]]; then
  prefix=${old%"$src"*}                  # Extract content before the last instance
  suffix=${old#"$prefix"}                # Extract content *after* our prefix
  new=${prefix}${suffix/"$src"/"$dest"}  # Append unmodified prefix w/ suffix w/ replacement
else
  new=$old
fi
declare -p new >&2
...properly emits:
declare -- new="/foo/foosb1other/foo/bar/foosb2bar"
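The same idea can be packaged as a reusable function and checked against both expected outputs from the question. This is a sketch, not the answer's exact code: the name replace_last is hypothetical, and instead of ${suffix/...} it strips the old string from the front of the suffix, which is equivalent here because the suffix always starts with it.

```shell
# Hypothetical helper: replace the last occurrence of $2 in $1 with $3.
replace_last() {
    local str=$1 src=$2 dest=$3
    local prefix suffix rest
    case $str in
        *"$src"*)
            prefix=${str%"$src"*}     # content before the last occurrence
            suffix=${str#"$prefix"}   # last occurrence plus what follows it
            rest=${suffix#"$src"}     # what follows the last occurrence
            printf '%s\n' "${prefix}${dest}${rest}"
            ;;
        *)
            printf '%s\n' "$str"      # no occurrence: return unchanged
            ;;
    esac
}

replace_last /foo/sb1/foo/bar/sb1 sb1 sb2
# /foo/sb1/foo/bar/sb2
replace_last /foo/foosb1other/foo/bar/foosb1bar sb1 sb2
# /foo/foosb1other/foo/bar/foosb2bar
```

Quoting "$src" inside the patterns makes it match literally, even if it contains glob characters.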

What is the difference between pattern matching operators?

I have some questions about pattern matching operators.
What is the difference between these examples?
$ VAR=/usr/bin/iecset
$ echo ${VAR#*/}
usr/bin/iecset
and
$ VAR=/usr/bin/iecset
$ echo ${VAR##*/}
iecset
and
$ VAR=/usr/bin/iecset
$ echo ${VAR%*/}
/bin/iecset
and
$ VAR=/usr/bin/iecset
$ echo ${VAR%%*/}
The "pattern" here is a glob or extended glob pattern - most people call them wildcards. The characters have a different meaning to those used in Regular Expressions. So * means "zero or more of any character".
${var#pattern} # delete shortest match of pattern from left
${var##pattern} # delete longest match of pattern from left
${var%pattern} # delete shortest match of pattern from right
${var%%pattern} # delete longest match of pattern from right
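Hints:
To delete from the left, the * goes on the left of the pattern; to delete from the right, it goes on the right.
# deletes from the left, as the # is on the left in "We are #1".
% deletes from the right, as the % is on the right in "50%".
Your examples (it is a bad idea to use uppercase variable names, since they can clash with shell and environment variables, so lowercase is used here):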
var=/usr/bin/iecset
Remove the shortest string on the left ending in /
echo ${var#*/}
usr/bin/iecset
There are no characters to the left of the first /, remember that * means zero or more. So the leftmost / is removed.
Remove the longest string on the left ending in /
echo ${var##*/}
iecset
The next two are wrong in your post! To delete from the right, the * should be on the right of the /.
$ echo ${VAR%*/} # WRONG
$ echo ${VAR%%*/} # WRONG
I think you mean:
Delete the shortest string on the right starting with /
var=/usr/bin/iecset
echo ${var%/*}
/usr/bin
Delete the longest string on the right starting with /
echo ${var%%/*}
(blank line)
There are many meta-characters other than * that you can use, such as ? and [...].
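For instance, ? matches exactly one character and [...] matches one character from a set. The file name below is invented purely for illustration.

```shell
file=report_2024.tar.gz   # hypothetical example value

no_last_ext=${file%.*}          # shortest suffix starting with '.'
no_ext=${file%%.*}              # longest suffix starting with '.'
after_digits=${file##*[0-9]}    # longest prefix ending in a digit
skip_two=${file#report_??}      # 'report_' plus any two characters

echo "$no_last_ext"    # report_2024.tar
echo "$no_ext"         # report_2024
echo "$after_digits"   # .tar.gz
echo "$skip_two"       # 24.tar.gz
```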

How do I prevent whitespace from appearing in these bash variables?

I'm reading in values from an .ini file, and sometimes may get trailing or leading whitespace.
How do I amend this first line to prevent that?
db=$(sed -n 's/.*DB_USERNAME *= *\([^ ]*.*\)/\1/p' < config.ini);
echo -"$db"-
Result:
-myinivar -
I need:
-myinivar-
Use parameter expansion.
echo "=${db% }="
You don't need the .* inside the capturing group (or the semicolon at the end of line):
db="$(sed -n 's/.*DB_USERNAME *= *\([^ ]*\).*/\1/p' < config.ini)"
To elaborate:
.* matches anything at all
DB_USERNAME matches that literal string
* (a single space followed by an asterisk) matches any number of spaces
= matches that literal string
* (a single space followed by an asterisk) matches any number of spaces
\( starts the capturing group that is used for \1 later
[^ ] matches anything which is not a space character
* repeats that zero or more times
\) ends the capturing group
.* matches anything at all
Therefore, the result will be all the characters after DB_USERNAME = and any number of spaces, up to the next space or end of line, whichever comes first.
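To try the corrected expression without touching a real config.ini, one can feed it a throwaway file. The file contents here are invented for the test; only the sed expression comes from the answer.

```shell
# Create a temporary ini file with trailing spaces after the value
tmp=$(mktemp)
printf 'DB_USERNAME = myinivar   \nDB_HOST = localhost\n' > "$tmp"

# The capture group stops at the first space, so trailing spaces are dropped
db=$(sed -n 's/.*DB_USERNAME *= *\([^ ]*\).*/\1/p' < "$tmp")
echo "-$db-"   # -myinivar-

rm -f "$tmp"
```

Note that this also cuts a value at its first internal space, so it only suits values without spaces.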
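You can use an unquoted echo to trim whitespace (this relies on word splitting, so it also collapses internal runs of whitespace and expands glob characters such as *):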
db='myinivar '
echo -"$(echo $db)"-
-myinivar-
Use crudini, which handles these ini file edge cases transparently:
db=$(crudini --get config.ini '' DB_USERNAME)
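To get rid of more than one trailing space, use %%, which removes the longest matching pattern from the end of the string. Note that the pattern ' *' matches from the first space onward, so this only works when the value itself contains no spaces: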
echo "=${db%% *}="
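If the value may contain internal spaces, a common idiom strips only the leading and trailing whitespace by nesting two expansions. This is a sketch with a made-up function name, trim, using only POSIX parameter expansion.

```shell
# Hypothetical helper: strip leading and trailing whitespace, keep the rest.
trim() {
    local s=$1
    # ${s%%[![:space:]]*} is the leading whitespace; remove it from the front
    s=${s#"${s%%[![:space:]]*}"}
    # ${s##*[![:space:]]} is the trailing whitespace; remove it from the end
    s=${s%"${s##*[![:space:]]}"}
    printf '%s\n' "$s"
}

db='  myinivar   '
db=$(trim "$db")
echo "-$db-"   # -myinivar-
```

The inner expansion computes exactly the run of whitespace at one end, and the outer one removes that literal string, so nothing in the middle is touched.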

Resources