Build a variable made with 2 substrings of another variable in bash - bash

Here is a script I use:
for dir in $(find . -type d -name "single_copy_busco_sequences"); do
sppname=$(dirname $(dirname $(dirname $dir))| sed 's#./##g');
for file in ${dir}/*.faa; do name=$(basename $file); cp $file /Users/admin/Documents/busco_aa/${sppname}_${name}; sed -i '' 's#>#>'${sppname}'|#g' /Users/admin/Documents/busco_aa/${sppname}_${name}; cut -f 1 -d ":" /Users/admin/Documents/busco_aa/${sppname}_${name} > /Users/admin/Documents/busco_aa/${sppname}_${name}.1;
done;
done
The sppname variable is something like Gender_species.
Do you know how I could add a line to my script to create a new variable called abbrev that transforms Gender_species into Genspe: the first 3 letters concatenated with the first 3 letters after the _?
Examples:
Homo_sapiens gives Homsap
Canis_lupus gives Canlup
etc
Thanks for your help :)

You can achieve this using a regular expression with sed:
echo "Homo_sapiens" | sed -e s'/^\(...\).*_\(...\).*/\1\2/'
Homsap
That is: start of line, 3 chars (kept in \1), anything, _, 3 chars (kept in \2), anything.
Replace "Homo_sapiens" with "$sppname" in your script.
PS: the pattern will not match (and the input is printed unchanged) if either word has fewer than 3 chars

You can do it all with bash built-in parameter expansions. Specifically, string indexes and substring removal.
$ a=Homo_sapiens; prefix=${a:0:3}; a=${a#*_}; postfix=${a:0:3}; echo $prefix$postfix
Homsap
$ a=Canis_lupus; prefix=${a:0:3}; a=${a#*_}; postfix=${a:0:3}; echo $prefix$postfix
Canlup
Using bash built-ins is generally more efficient than spawning separate subshells to invoke utilities that accomplish the same thing.
Explanation
Your string index form (bash only) allows you to index characters from within a string, e.g.
* ${parameter:offset:length} ## indexes are zero based, ${a:0:2} is 1st 2 chars
Where parameter is simply the variable name holding the string.
(you can index from the end of a string by using a negative offset preceded by a space or enclosed in parentheses, e.g. a=12345; echo ${a: -3:2} outputs "34")
prefix=${a:0:3} ## save the first 3 characters in prefix
a=${a#*_} ## remove the front of the string through '_' (see below)
postfix=${a:0:3} ## save the first 3 characters after '_'
Your substring removal forms (POSIX) are:
${parameter#word} trim the shortest match of word from the left of parameter
${parameter##word} trim the longest match of word from the left of parameter
and
${parameter%word} trim the shortest match of word from the right of parameter
${parameter%%word} trim the longest match of word from the right of parameter
(word can contain globbing to expand to a pattern as well)
a=${a#*_} ## trim from left up to (and including) the first '_'
See bash(1) - Linux manual page for full details.
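Applied to the question's loop, the same expansions fit on two lines placed right after sppname is set. This is a minimal sketch: the helper variable rest is hypothetical, and it assumes sppname always contains an underscore and that both parts have at least 3 letters.

```shell
#!/usr/bin/env bash
# In the real script, sppname comes from the dirname/sed pipeline.
sppname=Homo_sapiens
rest=${sppname#*_}                  # species part: text after the first '_'
abbrev=${sppname:0:3}${rest:0:3}    # first 3 letters of each part
echo "$abbrev"                      # Homsap
```

No external command is started, so this costs nothing extra per loop iteration.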

Related

how to find pattern and insert text in middle using shell script

I would like to add a name in the middle of dirPath
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
I want to insert agent_name-2 after /var/azp in dirPath, and store it in a separate variable result like this
result=/var/azp/agent_name-2/1/s
If /var/azp is a hard coded string (i.e. constant), try:
name='agent_name-2'
dirPath='/var/azp/1/s'
result="/var/azp/$name${dirPath#/var/azp}"
Explanation: ${dirPath#/var/azp} removes the string /var/azp from the beginning of the string $dirPath.
Try this:
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
Split dirPath by / and store it in the array dirs.
IFS=/ read -r -a dirs <<< "$dirPath"
Calculate the middle of the array.
middle=$(((${#dirs[@]}+1)/2))
Create two new arrays left and right with the left and right half of the dirs array.
left=("${dirs[@]:0:$middle}")
right=("${dirs[@]:$middle}")
Join the left and right half and put the name in between.
result="$(printf "%s/" "${left[@]}" "$name" "${right[@]}")"
Remove the trailing slash.
result=${result%/}
Bash search-replace
You can use Bash's search and replace syntax ${variable//search/replace}.
prefix='/var/azp'
result=${dirPath//$prefix/$prefix\/$name}
# > /var/azp/agent_name-2/1/s
sed s
If $name doesn't contain any special characters, you could inject it into a sed search-replace:
$ sed "s|/var/azp|\0/$name|" <<< "$dirPath"
/var/azp/agent_name-2/1/s
Then for saving the result to a variable, see How do I set a variable to the output of a command in Bash?
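For that last step, command substitution captures the sed output. A minimal sketch reusing the variables from the question; it swaps \0 for &, which is the POSIX way to refer to the whole match, and it still assumes $name contains no characters that are special to sed (such as | or &).

```shell
#!/usr/bin/env bash
name='agent_name-2'
dirPath='/var/azp/1/s'
# & stands for the matched text (/var/azp); the new name is appended after it
result=$(sed "s|/var/azp|&/$name|" <<< "$dirPath")
echo "$result"    # /var/azp/agent_name-2/1/s
```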

In bash how can I get the last part of a string after the last hyphen [duplicate]

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
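Because NF is an ordinary number, it can also be used in arithmetic to count fields from the end. A small sketch on the same input; the variable names last and second_last are only illustrative.

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"
last=$(echo "$A" | awk '{print $NF}')               # last field
second_last=$(echo "$A" | awk '{print $(NF-1)}')    # field before the last
echo "$last $second_last"                           # abc.123 value
```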
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
echo "${A##* }"
abc.123
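Shortest match of " *" (a space, then anything) removed from the end
echo "${A% *}"
Some variable has value
Shortest match of ".*" (a dot, then anything) removed from the end
echo "${A%.*}"
Some variable has value abc
Longest match of " *" removed from the end
echo "${A%% *}"
Some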
Read more Shell-Parameter-Expansion
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
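The filename forms above can be combined on one value. A short sketch, with a made-up path chosen only for illustration:

```shell
#!/usr/bin/env bash
path=/usr/bin/archive.tar.gz    # hypothetical example path
file=${path##*/}                # archive.tar.gz  (strip containing folders)
dir=${path%/*}                  # /usr/bin        (strip last component)
stem=${file%.*}                 # archive.tar     (strip last extension)
ext=${file##*.}                 # gz              (keep only last extension)
echo "$dir | $file | $stem | $ext"
```

Taking the extension from file rather than from path avoids the my.path/noext pitfall mentioned above.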
How do you know where the value begins? If it always starts at the 5th word, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here, this is a very clean method that works on most Unix-like systems. Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means "match 0 or more instances of the preceding match pattern (which is [^ ])", and the $ means "match the end of the line". So, this matches the last word after the last space through to the end of the line; i.e. abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements separated by the characters in IFS (Internal Field Separator), which by default are space, tab, and newline:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What makes this pattern especially useful is that it also lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
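The two steps can also be written without a negative subscript, which may help on bash releases that predate that feature (such as the 3.2 shipped with macOS). A minimal sketch; the names words, n, last and rest are hypothetical.

```shell
#!/usr/bin/env bash
A="Some variable has value abc.123"
IFS=" " read -r -a words <<< "$A"    # split on spaces into an array
n=${#words[@]}                       # number of words
last=${words[n-1]}                   # last element, no [-1] needed
rest="${words[*]:0:n-1}"             # all but the last, joined by spaces
echo "$last"
echo "$rest"
```

The subscript and the slice bounds are evaluated as arithmetic, so n-1 can be written directly.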
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'

What ##*/ does in bash? [duplicate]

I have a string like this:
/var/cpanel/users/joebloggs:DNS9=domain.example
I need to extract the username (joebloggs) from this string and store it in a variable.
The format of the string will always be the same with exception of joebloggs and domain.example so I am thinking the string can be split twice using cut?
The first split would split by : and we would store the first part in a variable to pass to the second split function.
The second split would split by / and store the last word (joebloggs) into a variable
I know how to do this in PHP using arrays and splits but I am a bit lost in bash.
To extract joebloggs from this string in bash using parameter expansion without any extra processes...
MYVAR="/var/cpanel/users/joebloggs:DNS9=domain.example"
NAME=${MYVAR%:*} # retain the part before the colon
NAME=${NAME##*/} # retain the part after the last slash
echo $NAME
Doesn't depend on joebloggs being at a particular depth in the path.
Summary
An overview of a few parameter expansion modes, for reference...
${MYVAR#pattern} # delete shortest match of pattern from the beginning
${MYVAR##pattern} # delete longest match of pattern from the beginning
${MYVAR%pattern} # delete shortest match of pattern from the end
${MYVAR%%pattern} # delete longest match of pattern from the end
So # means match from the beginning (think of a comment line) and % means from the end. One instance means shortest and two instances means longest.
You can get substrings based on position using numbers:
${MYVAR:3} # Remove the first three chars (leaving 4..end)
${MYVAR::3} # Return the first three characters
${MYVAR:3:5} # The next five characters after removing the first 3 (chars 4-8)
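A quick way to check the three positional forms is to run them on a value whose characters are easy to count. The sample value below is made up for that purpose.

```shell
#!/usr/bin/env bash
MYVAR=abcdefghij        # hypothetical 10-character value
echo "${MYVAR:3}"       # defghij  (skip the first 3)
echo "${MYVAR::3}"      # abc      (first 3 only)
echo "${MYVAR:3:5}"     # defgh    (skip 3, then take 5)
```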
You can also replace particular strings or patterns using:
${MYVAR/search/replace}
The pattern is in the same format as file-name matching, so * (any characters) is common, often followed by a particular symbol like / or .
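A brief sketch of the replace form; the replacement strings are invented for illustration. A single / replaces the first match only, while a doubled // replaces every match.

```shell
#!/usr/bin/env bash
MYVAR="users/joebloggs/domain.example"
first=${MYVAR/joebloggs/janedoe}    # replace the first match of a literal string
all=${MYVAR//\//:}                  # replace every '/' (escaped as \/) with ':'
echo "$first"                       # users/janedoe/domain.example
echo "$all"                         # users:joebloggs:domain.example
```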
Examples:
Given a variable like
MYVAR="users/joebloggs/domain.example"
Remove the path leaving file name (all characters up to a slash):
echo ${MYVAR##*/}
domain.example
Remove the file name, leaving the path (delete shortest match after last /):
echo ${MYVAR%/*}
users/joebloggs
Get just the file extension (remove all before last period):
echo ${MYVAR##*.}
example
NOTE: To do two operations, you can't combine them, but have to assign to an intermediate variable. So to get the file name without path or extension:
NAME=${MYVAR##*/} # remove part before last slash
echo ${NAME%.*} # from the new var remove the part after the last period
domain
Define a function like this:
getUserName() {
echo "$1" | cut -d : -f 1 | xargs basename
}
And pass the string as a parameter:
userName=$(getUserName "/var/cpanel/users/joebloggs:DNS9=domain.example")
echo $userName
What about sed? That will work in a single command:
sed 's#.*/\([^:]*\).*#\1#' <<<$string
The # are being used for regex dividers instead of / since the string has / in it.
.*/ grabs the string up to the last slash.
\( .. \) marks a capture group. This is \([^:]*\).
The [^:] says any character except a colon, and the * means zero or more.
.* means the rest of the line.
\1 means substitute what was found in the first (and only) capture group. This is the name.
Here's the breakdown matching the string with the regular expression:
       /var/cpanel/users/ joebloggs :DNS9=domain.example  joebloggs
sed 's#.*/                \([^:]*\) .*                   #\1       #'
Using a single Awk:
... | awk -F '[/:]' '{print $5}'
That is, using as field separator either / or :, the username is always in field 5.
To store it in a variable:
username=$(... | awk -F '[/:]' '{print $5}')
A more flexible implementation with sed that doesn't require username to be field 5:
... | sed -e 's/:.*//' -e 's?.*/??'
That is, delete everything from : onward, and then delete everything up to the last /. Because it does not depend on the username's position, this is the more robust alternative; sed may also be somewhat faster than awk here.
Using a single sed
echo "/var/cpanel/users/joebloggs:DNS9=domain.example" | sed 's/.*\/\(.*\):.*/\1/'
I like to chain awk commands together, using different delimiters set with the -F argument. First, split the string on /users/ and then on :
txt="/var/cpanel/users/joebloggs:DNS9=domain.com"
echo $txt | awk -F"/users/" '{print$2}' | awk -F: '{print $1}'
$2 gives the text after the delim, $1 the text before it.
I know I'm a little late to the party and there's already good answers, but here's my method of doing something like this.
DIR="/var/cpanel/users/joebloggs:DNS9=domain.example"
echo ${DIR} | rev | cut -d'/' -f 1 | rev | cut -d':' -f1

I want to extract the strings from file name

one_two_three_four_five.rtf
I need five in A variable
I need four in B variable
And remaining in C variable
The string should be read from the end.
Note: C is everything before the second-to-last underscore. It may contain any number of underscores, but all of it should go into the C variable.
Is it possible?
For example using parameter expansion
#!/bin/ksh
string="one_two_three_four_five.rtf"
base=${string%.rtf}
a=${base##*_}; base=${base%_$a}
b=${base##*_}; base=${base%_$b}
c=$base
echo "$a - $b - $c"
s="one_two_three_four_five.rtf"
source <(sed -r 's/(.*)_([^_]*)_([^_]*)[.].*/C="\1"; B="\2";A="\3"/' <<< "${s}")
# Result:
echo "A=$A, B=$B, C=$C"
A=five, B=four, C=one_two_three
Explanation:
sed -r Enables extended regular expressions, so the grouping parentheses need no backslashes
(.*)_ Matches largest string until underscore with the condition that there are underscores left for matching the remaining string
([^_]*) String without underscore
[.] A dot without special meaning
"\1" First remembered string
<<< "${s}" Input for sed is like echo "${s}" | sed ...
<(..) Simulates a file, so sourcing these will execute the commands.
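If executing generated text with source is a concern, the same three groups can be captured with bash's own regex matching instead. This is a sketch of an alternative technique, not the method above; the variable name pat is hypothetical.

```shell
#!/usr/bin/env bash
s="one_two_three_four_five.rtf"
pat='^(.*)_([^_]*)_([^_]*)[.].*$'    # same groups as the sed expression
if [[ $s =~ $pat ]]; then
    C=${BASH_REMATCH[1]}             # everything before the last two fields
    B=${BASH_REMATCH[2]}             # second-to-last field
    A=${BASH_REMATCH[3]}             # last field, without the extension
fi
echo "A=$A, B=$B, C=$C"              # A=five, B=four, C=one_two_three
```

Keeping the pattern in a variable avoids quoting problems inside [[ ... ]].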

How do I prevent whitespace from appearing in these bash variables?

I'm reading in values from an .ini file, and sometimes may get trailing or leading whitespace.
How do I amend this first line to prevent that?
db=$(sed -n 's/.*DB_USERNAME *= *\([^ ]*.*\)/\1/p' < config.ini);
echo -"$db"-
Result;
-myinivar -
I need;
-myinivar-
Use parameter expansion.
echo "=${db% }="
You don't need the .* inside the capturing group (or the semicolon at the end of line):
db="$(sed -n 's/.*DB_USERNAME *= *\([^ ]*\).*/\1/p' < config.ini)"
To elaborate:
.* matches anything at all
DB_USERNAME matches that literal string
" *" (a single space followed by an asterisk) matches any number of spaces
= matches that literal string
" *" (a single space followed by an asterisk) matches any number of spaces
\( starts the capturing group that is used for \1 later
[^ ] matches anything which is not a space character
* repeats that zero or more times
\) ends the capturing group
.* matches anything at all
Therefore, the result will be all the characters after DB_USERNAME = and any number of spaces, up to the next space or end of line, whichever comes first.
You can use echo to trim whitespace:
db='myinivar '
echo -"$(echo $db)"-
-myinivar-
Use crudini which handles these ini file edge cases transparently
db=$(crudini --get config.ini '' DB_USERNAME)
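To get rid of more than one trailing space, use %%, which removes the longest match of the pattern from the end of the string. With the pattern " *" that means everything from the first space onward, so this only suits values that contain no internal spaces: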
echo "=${db%% *}="
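If a value may contain internal spaces, or may carry leading as well as trailing whitespace, the removal forms can be nested so that only the outer whitespace is stripped. A minimal sketch; the function name trim is hypothetical and the sample value is invented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip leading and trailing whitespace only.
trim() {
    local s=$1
    s="${s#"${s%%[![:space:]]*}"}"    # inner part = leading whitespace; remove it
    s="${s%"${s##*[![:space:]]}"}"    # inner part = trailing whitespace; remove it
    printf '%s' "$s"
}

db='  myinivar  '
echo "-$(trim "$db")-"                # -myinivar-
```

The inner expansion isolates the run of whitespace at one end, and the outer expansion removes exactly that run.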

Resources