how to find pattern and insert text in middle using shell script - bash

I would like to add a name in the middle of dirPath
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
I want to insert agent_name-2 after /var/azp in dirPath and store the result in a separate variable, result, like this:
result=/var/azp/agent_name-2/1/s

If /var/azp is a hard coded string (i.e. constant), try:
name='agent_name-2'
dirPath='/var/azp/1/s'
result="/var/azp/$name${dirPath#/var/azp}"
Explanation: ${dirPath#/var/azp} removes the string /var/azp from the beginning of the string $dirPath.
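One caveat: if dirPath does not actually start with /var/azp, ${dirPath#/var/azp} returns it unchanged, and the prefix plus name get glued onto an unrelated path anyway. A minimal sketch that guards against that, assuming you would rather leave such paths untouched (the prefix variable is my addition, not part of the answer above):

```shell
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
prefix='/var/azp'   # assumed fixed prefix, as in the answer above

# Rewrite only when dirPath really starts with "$prefix/"; otherwise keep it as-is.
if [[ $dirPath == "$prefix"/* ]]; then
    result="$prefix/$name${dirPath#"$prefix"}"
else
    result=$dirPath
fi
echo "$result"   # /var/azp/agent_name-2/1/s
```

Quoting "$prefix" inside the expansions makes bash treat it as a literal string rather than a glob pattern.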

Try this:
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
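Split dirPath by / and store it in the array dirs. Because dirPath starts with /, the first element is empty; the join at the end turns it back into the leading /.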
IFS=/ read -r -a dirs <<< "$dirPath"
Calculate the middle of the array.
middle=$(((${#dirs[@]}+1)/2))
Create two new arrays, left and right, with the left and right halves of the dirs array.
left=("${dirs[@]:0:$middle}")
right=("${dirs[@]:$middle}")
Join the left and right halves and put the name in between.
result="$(printf "%s/" "${left[@]}" "$name" "${right[@]}")"
Remove the trailing slash.
result=${result%/}
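Splitting at the middle happens to land right after /var/azp for this particular path, but the split point moves if the path gets deeper. If what you really want is "after the first two components", here is a sketch along the same lines. insert_after_n is a hypothetical helper name, and the path is assumed to be absolute:

```shell
#!/bin/bash
# insert_after_n PATH N NAME: insert NAME after the first N components of PATH.
insert_after_n() {
    local path=$1 n=$2 name=$3
    local -a dirs out
    # Drop the leading '/' so dirs[0] is the first real component ("var").
    IFS=/ read -r -a dirs <<< "${path#/}"
    # Rebuild: first n components, then the new name, then the rest.
    out=("${dirs[@]:0:$n}" "$name" "${dirs[@]:$n}")
    local IFS=/          # "${out[*]}" joins elements with the first char of IFS
    printf '/%s\n' "${out[*]}"
}

result=$(insert_after_n '/var/azp/1/s' 2 'agent_name-2')
echo "$result"   # /var/azp/agent_name-2/1/s
```

Building one array and joining it with IFS=/ also avoids the trailing slash that the printf "%s/" approach has to strip afterwards.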

Bash search-replace
You can use Bash's search-and-replace syntax ${variable//search/replace}, which replaces every match (a single / replaces only the first one).
prefix='/var/azp'
result=${dirPath//$prefix/$prefix\/$name}
# > /var/azp/agent_name-2/1/s
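Note that // rewrites every occurrence of the prefix, not only the leading one. A quick sketch comparing it with the anchored /# form (which only matches at the start of the value), using a made-up path that contains /var/azp twice:

```shell
#!/bin/bash
name='agent_name-2'
prefix='/var/azp'
dirPath='/var/azp/1/var/azp'   # hypothetical path with the prefix appearing twice

all=${dirPath//"$prefix"/"$prefix/$name"}     # every occurrence is rewritten
first=${dirPath/#"$prefix"/"$prefix/$name"}   # only a match at the start is rewritten

echo "$all"     # /var/azp/agent_name-2/1/var/azp/agent_name-2
echo "$first"   # /var/azp/agent_name-2/1/var/azp
```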
sed s
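If $name doesn't contain any characters that are special to sed (such as |, & or \), you could inject it into a sed search-replace (\0, meaning the whole match, is a GNU sed extension; & is the portable equivalent):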
$ sed "s|/var/azp|\0/$name|" <<< "$dirPath"
/var/azp/agent_name-2/1/s
Then for saving the result to a variable, see How do I set a variable to the output of a command in Bash?
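For this particular case, that boils down to wrapping the sed call in command substitution. A sketch that uses & (the portable spelling of "the whole match") instead of the GNU-only \0:

```shell
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
# $(...) captures sed's output; & in the replacement stands for the matched text.
result=$(sed "s|/var/azp|&/$name|" <<< "$dirPath")
echo "$result"   # /var/azp/agent_name-2/1/s
```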

Related

Basic string manipulation from filenames in bash

I have some file names in bash that I acquired with
$ ones=$(find SRR*pass*1*.fq)
$ echo $ones
SRR6301033_pass_1_trimmed.fq
SRR6301034_pass_1_trimmed.fq
SRR6301037_pass_1_trimmed.fq
...
I then converted them into an array so I can iterate over this list and perform some operations on the filenames:
# convert to array
$ ones=(${ones// / })
and the iteration:
for i in $ones;
do
fle=$(basename $i)
out=$(echo $fle | grep -Po '(SRR\d*)')
echo "quants/$out.quant"
done
which produces:
quants/SRR6301033
SRR6301034
...
...
SRR6301220
SRR6301221.quant
However I want this:
quants/SRR6301033.quant
quants/SRR6301034.quant
...
...
quants/SRR6301220.quant
quants/SRR6301221.quant
Could somebody explain why what I'm doing doesn't work and how to correct it?
Why do it in such a roundabout way? You can drop all the unnecessary steps and just use a for loop and built-in parameter expansion to get this done.
# Initialize an empty indexed array
array=()
# Loop over files ending in '.fq'. If there are no such files, the
# pattern *.fq stays unexpanded, the '-f' test fails and 'continue'
# skips it, so nothing is added to the array
for file in *.fq; do
[ -f "$file" ] || continue
# Get the part of filename after the last '/' ( same as basename )
bName="${file##*/}"
# Remove everything from the first '.' onward (strips the extension)
woExt="${bName%%.*}"
# In the resulting string, remove the part after first '_'
onlyFir="${woExt%%_*}"
# Append the result to the array, adding the prefix 'quants/' and the suffix '.quant'
array+=( quants/"$onlyFir".quant )
done
Now print the array to see the result
for entry in "${array[@]}"; do
printf '%s\n' "$entry"
done
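If you want to check the expansions without having the files on disk, here is a sketch that runs the same logic over a hard-coded array of names taken from the question (one given a made-up directory prefix to exercise the basename step). In real use you would fill the array with files=( SRR*pass*1*.fq ) instead. It skips the extension step, which is only safe because these names have an _ before the first .:

```shell
#!/bin/bash
# Sample names from the question; in practice: files=( SRR*pass*1*.fq )
files=(SRR6301033_pass_1_trimmed.fq SRR6301034_pass_1_trimmed.fq sub/SRR6301037_pass_1_trimmed.fq)

out=()
for file in "${files[@]}"; do          # quoted: each name stays one word
    base=${file##*/}                    # same as basename
    out+=("quants/${base%%_*}.quant")   # keep only the text before the first '_'
done
printf '%s\n' "${out[@]}"
```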
Ways your attempt could fail
With ones=$(find SRR*pass*1*.fq) you are storing the results in a plain string variable, not in an array. A string has no way to tell whether its contents are a list of names or a single name that contains spaces.
With echo $ones, i.e. an unquoted expansion, the content is subject to word splitting. You won't see a difference as long as none of your filenames contain spaces, but if one does, parts of that name will be treated as separate files.
The expansion ${ones// / } replaces every space with a space, so it does nothing to convert the string into an array; the array is really produced by word splitting of the unquoted expansion, which has the problem just described.
for i in $ones; is error-prone for the same reasons: filenames with spaces would be split into separate words. Also, once ones is an array, $ones expands only to its first element; to loop over every element, use "${ones[@]}".
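A small sketch that shows the last two points side by side, using two hypothetical names, one of which contains a space. It just counts how many times each loop body runs:

```shell
#!/bin/bash
# Hypothetical file names; the second one contains a space.
ones=('SRR1_pass_1.fq' 'SRR2 copy_pass_1.fq')

# $ones on an array means ${ones[0]}: only the first element is visited.
n_plain=0
for i in $ones; do n_plain=$((n_plain + 1)); done

# Unquoted ${ones[@]} visits every element but splits the one with a space.
n_split=0
for i in ${ones[@]}; do n_split=$((n_split + 1)); done

# Quoted "${ones[@]}" visits each element exactly once, spaces intact.
n_quoted=0
for i in "${ones[@]}"; do n_quoted=$((n_quoted + 1)); done

echo "$n_plain $n_split $n_quoted"   # 1 3 2
```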

Build a variable made of 2 substrings of another variable in bash

Here is a script I use:
for dir in $(find . -type d -name "single_copy_busco_sequences"); do
    sppname=$(dirname $(dirname $(dirname $dir)) | sed 's#./##g')
    for file in ${dir}/*.faa; do
        name=$(basename $file)
        cp $file /Users/admin/Documents/busco_aa/${sppname}_${name}
        sed -i '' 's#>#>'${sppname}'|#g' /Users/admin/Documents/busco_aa/${sppname}_${name}
        cut -f 1 -d ":" /Users/admin/Documents/busco_aa/${sppname}_${name} > /Users/admin/Documents/busco_aa/${sppname}_${name}.1
    done
done
The sppname variable is something like Genus_species.
Do you know how I could add a line to my script to create a new variable called abbrev that transforms Genus_species into Genspe, i.e. the first 3 letters concatenated with the first 3 letters after the _?
Examples:
Homo_sapiens gives Homsap
Canis_lupus gives Canlup
etc
Thanks for your help :)
You can achieve this using a regular expression with sed:
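echo "Homo_sapiens" | sed -e 's/^\(...\).*_\(...\).*/\1\2/'
Homsap
The pattern reads: from the start, capture 3 characters (kept in \1), then anything, an _, capture 3 characters (kept in \2), then anything.
In your script, use echo "$sppname" instead of echo "Homo_sapiens" and assign the output with abbrev=$(...).
PS: if a word has fewer than 3 characters, the pattern won't match and sed will print the name unchanged.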
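To put the result into the abbrev variable the question asks for, capture sed's output with command substitution. A sketch with a hard-coded sample value standing in for the sppname computed in the script (a here-string replaces the echo pipe, which is equivalent here):

```shell
#!/bin/bash
sppname=Homo_sapiens   # sample value; in the script it comes from the dirname/sed line
# Capture sed's output in abbrev via command substitution.
abbrev=$(sed -e 's/^\(...\).*_\(...\).*/\1\2/' <<< "$sppname")
echo "$abbrev"   # Homsap
```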
You can do it all with bash built-in parameter expansions. Specifically, string indexes and substring removal.
$ a=Homo_sapiens; prefix=${a:0:3}; a=${a#*_}; postfix=${a:0:3}; echo $prefix$postfix
Homsap
$ a=Canis_lupus; prefix=${a:0:3}; a=${a#*_}; postfix=${a:0:3}; echo $prefix$postfix
Canlup
Using bash built-ins is generally more efficient than spawning subshells to run external utilities for the same job, since it avoids forking and starting separate programs.
Explanation
The string-index form (bash only) lets you index characters within a string, e.g.
* ${parameter:offset:length} ## indexes are zero based, ${a:0:2} is 1st 2 chars
Where parameter is simply the variable name holding the string.
(you can index from the end of a string by using a negative offset preceded by a space or enclosed in parentheses, e.g. a=12345; echo ${a: -3:2} outputs "34")
prefix=${a:0:3} ## save the first 3 characters in prefix
a=${a#*_} ## remove the front of the string through '_' (see below)
postfix=${a:0:3} ## save the first 3 characters after '_'
The substring-removal forms (POSIX) are:
${parameter#word} remove the shortest match of word from the front of parameter
${parameter##word} remove the longest match of word from the front of parameter
and
${parameter%word} remove the shortest match of word from the end of parameter
${parameter%%word} remove the longest match of word from the end of parameter
(word may contain glob characters such as *, so it can match a pattern rather than a fixed string)
a=${a#*_} ## trim from left up to (and including) the first '_'
See bash(1) - Linux manual page for full details.
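To see the difference between the shortest and longest forms, here they are applied to a sample name with two underscores (Canis_lupus_familiaris is just an illustrative value):

```shell
#!/bin/bash
s=Canis_lupus_familiaris   # sample value with two underscores

echo "${s#*_}"    # shortest '*_' removed from the front: lupus_familiaris
echo "${s##*_}"   # longest '*_' removed from the front:  familiaris
echo "${s%_*}"    # shortest '_*' removed from the end:   Canis_lupus
echo "${s%%_*}"   # longest '_*' removed from the end:    Canis
```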

Adding a comma after $variable

I'm writing a for loop in bash to run a command and I need to add a comma after one of my variables. I can't seem to do this without an extra space added. When I move "," right next to $bams then it outputs *.sorted,
#!/bin/bash
bams=*.sorted
for i in $bams
do echo $bams ","
done;
Output should be this:
'file1.sorted','file2.sorted','file3.sorted'
The eventual end goal is to be able to insert a list of files into a --flag in the format above. Not sure how to do that either.
First, a literal answer (if your goal were to generate a string of the form 'foo','bar','baz', rather than to run a program with a command line equivalent to somecommand --flag='foo','bar','baz', which is quite different):
shopt -s nullglob # generate a null result if no matches exist
printf -v var "'%s'," *.sorted # put list of files, each w/ a comma, in var
echo "${var%,}" # echo contents of var, with last comma removed
Or, if you don't need the literal single quotes (and if you're passing your result to another program on its command line with the single quotes being syntactic rather than literal, you absolutely don't want them):
files=( *.sorted ) # put *.sorted in an array
IFS=, # set the comma character as the field separator (this stays in effect for the rest of the script)
somecommand --flag "${files[*]}" # run your program with the comma-separated list
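Because setting IFS=, at the top level leaks into the rest of the script, you may prefer to confine it to a function. A sketch, where join_by_comma is a hypothetical helper name and the array holds sample names standing in for *.sorted:

```shell
#!/bin/bash
# join_by_comma: print its arguments joined by ','; IFS changes only inside the function.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"   # "$*" joins the arguments with the first char of IFS
}

files=(file1.sorted file2.sorted file3.sorted)   # sample names standing in for *.sorted
list=$(join_by_comma "${files[@]}")
echo "$list"   # file1.sorted,file2.sorted,file3.sorted
```

You would then pass it as somecommand --flag "$list".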
try this -
lst=$( echo *.sorted | sed 's/ /,/g' ) # stack filenames with commas
echo $lst
if you really need the single-ticks around each filename, then
lst="'$( echo *.sorted | sed "s/ /','/g" )'" # commas AND quotes
#!/bin/bash
bams=*.sorted
for i in $bams
do flag+="${flag:+,}'$i'"
done
echo $flag
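The loop above relies on ${flag:+,}, which expands to a comma only when flag is already non-empty, so no comma ends up before the first name. The same idea on a fixed list of sample names (standing in for *.sorted), so you can see the result:

```shell
#!/bin/bash
flag=
for i in file1.sorted file2.sorted file3.sorted; do
    # ${flag:+,} is ',' only once flag has content, so the first item gets no comma.
    flag+="${flag:+,}'$i'"
done
echo "$flag"   # 'file1.sorted','file2.sorted','file3.sorted'
```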

Split filename and get the element between first and last occurrence of underscore

I am trying to split many folder names in a for loop and extract the element between first and last underscore of filename. Filenames can look like ENCSR000AMA_HepG2_CTCF or ENCSR000ALA_endothelial_cell_of_umbilical_vein_CTCF.
My problem is that the folder names differ from each other in the total number of underscores, so I cannot use something like:
IN=$d
folderIN=(${IN//_/ })
tf_name=${folderIN[-1]%/*} #get last element which is the TF name
cell_line=${folderIN[-2]%/*}; #get second last element which is the cell line
dataset_name=${folderIN[0]%/*}; #get first element which is the dataset name
cell_line can be one or more words separated by underscores, but it's always between the first and last underscore.
Any help?
Just do this with a two-step bash parameter expansion; it needs two steps only because bash, unlike zsh, does not support nested parameter expansion.
"${string%_*}" strips everything from the last '_' onward, and "${tempString#*_}" strips everything from the beginning up to and including the first '_'.
string="ENCSR000ALA_endothelial_cell_of_umbilical_vein_CTCF"
tempString="${string%_*}"
printf "%s\n" "${tempString#*_}"
endothelial_cell_of_umbilical_vein
Another example,
string="ENCSR000AMA_HepG2_CTCF"
tempString="${string%_*}"
printf "%s\n" "${tempString#*_}"
HepG2
You can modify this logic to apply on each of the file-names in your folder.
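Applied to the question's variables, here is a sketch that also pulls out the first and last fields with the "longest match" forms. The hard-coded names with a trailing / stand in for a folder glob such as for d in */; that glob is my assumption about how the folders are listed:

```shell
#!/bin/bash
# Sample folder names; in a real run this could be: for d in */; do
for d in ENCSR000AMA_HepG2_CTCF/ ENCSR000ALA_endothelial_cell_of_umbilical_vein_CTCF/; do
    d=${d%/}                  # drop the trailing '/' a */ glob leaves on directories
    dataset_name=${d%%_*}     # everything before the first '_'
    tf_name=${d##*_}          # everything after the last '_'
    tmp=${d%_*}               # drop the last '_' and what follows it
    cell_line=${tmp#*_}       # drop everything up to and including the first '_'
    printf '%s\t%s\t%s\n' "$dataset_name" "$cell_line" "$tf_name"
done
```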
You could also use a regex:
extract_words() {
[[ "$1" =~ ^([^_]+)_(.*)_([^_]+)$ ]] && echo "${BASH_REMATCH[2]}"
}
while read -r from_line
do
extracted=$(extract_words "$from_line")
echo "$from_line" "[$extracted]"
done < list_of_filenames.txt
EDIT: I moved the "extraction" into a standalone bash function for reuse and easy modification in more complex cases, like:
extract_words() {
perl -lnE 'say $2 if /^([^_]+)_(.*)_([^_]+)$/' <<< "$1"
}
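If you also want the dataset and TF names, the same regex already captures them. A sketch that reads all three groups from BASH_REMATCH for one of the question's folder names:

```shell
#!/bin/bash
d=ENCSR000ALA_endothelial_cell_of_umbilical_vein_CTCF
# The greedy (.*) still stops before the last '_', because ([^_]+)$ must match
# a final field that contains no '_'.
if [[ $d =~ ^([^_]+)_(.*)_([^_]+)$ ]]; then
    dataset_name=${BASH_REMATCH[1]}   # ENCSR000ALA
    cell_line=${BASH_REMATCH[2]}      # endothelial_cell_of_umbilical_vein
    tf_name=${BASH_REMATCH[3]}        # CTCF
    printf '%s\t%s\t%s\n' "$dataset_name" "$cell_line" "$tf_name"
fi
```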

Set bash variable equal to result of string where newlines are replaced by spaces

I have a variable equal to a string, which is a series of key/value pairs separated by newlines.
I want to then replace these newline characters with spaces, and set a new variable equal to the result
From various answers on the internet I've arrived at the following:
#test.txt has the content:
#test=example
#what=s0omething
vars="$(cat ./test.txt)"
formattedVars= $("$vars" | tr '\n' ' ')
echo "$taliskerEnvVars"
The problem is that when I try to set formattedVars, it tries to execute the second line:
script.sh: line 7: test=example
what=s0omething: command not found
I just want formattedVars to equal test=example what=s0omething
What trick am I missing?
Change your line to:
formattedVars=$(tr '\n' ' ' <<< "$secretsContent")
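Notice the space after = in your code, which is not permitted in assignment statements. Also, $("$vars" | tr '\n' ' ') tries to run the contents of $vars as a command, which is where "command not found" comes from; the here-string (<<<) feeds the text to tr's standard input instead.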
I see that you are not setting secretsContent in your code, you are setting vars instead.
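Two small sketches for comparison, with the question's sample content put directly in a variable (standing in for $(cat ./test.txt)). One thing to be aware of: the here-string appends a newline, which tr also turns into a space, so the tr version ends with a trailing space. Bash's own ${var//pattern/replacement} with $'\n' avoids that:

```shell
#!/bin/bash
vars=$'test=example\nwhat=s0omething'   # stands in for "$(cat ./test.txt)"

# tr version: <<< adds a final newline, which becomes a trailing space.
with_tr=$(tr '\n' ' ' <<< "$vars")

# Built-in version: replace every newline with a space, no extra process.
builtin=${vars//$'\n'/ }

printf '[%s]\n[%s]\n' "$with_tr" "$builtin"
```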
If possible, use an array to hold contents of the file:
readarray -t vars < ./test.txt # bash 4
or
# bash 3.x
declare -a vars
while IFS= read -r line; do
vars+=( "$line" )
done < ./test.txt
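Then you can do what you need with the array. You can make your space-separated list with
formattedVars="${vars[*]}"
but consider whether you need to. If the goal is to use them as a pre-command modifier (environment assignments for one command), pass them through env, because words that come from an expansion are not recognized as assignments by the shell. For instance,
env "${vars[@]}" my_command arg1 arg2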
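A quick way to see the env form in action, using the sample key/value pairs from the question and sh -c as a stand-in for my_command:

```shell
#!/bin/bash
vars=(test=example what=s0omething)   # sample pairs from the question's test.txt

# env sets the variables only for the command it runs; the single quotes make
# the child shell, not the current one, expand $test and $what.
out=$(env "${vars[@]}" sh -c 'echo "$test $what"')
echo "$out"   # example s0omething
```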
