Extract a certain part of a string in bash with different patterns - bash

I have this file:
CLUSTERS=SP1,SP2,SP3
FNAME_SP1="REWARDS_BTS_SP1_<GTS>.dat"
FNAME_SP2="DUMP_LOG_SP2_<GTS>.dat"
FNAME_SP3="TEST_CASE_TABLE_SP3_<GTS>.dat"
What I want to get from these are:
REWARDS_BTS_SP1_
DUMP_LOG_SP2_
TEST_CASE_TABLE_SP3_
I loop through the CLUSTERS field, get the values, and use it to find the appropriate FNAME_<CLUSTERNAME> value. Basically, the CLUSTERS value are ALWAYS before the _<GTS> part of the string. Any string pattern will do, provided that the CLUSTERS value come before the _<GTS> at the end of the string.
Any suggestions? Here's a part of the script.
function loadClusters() {
for i in `echo ${!CLUSTER*}`
do
CLUSTER=`echo ${i} | grep $1`
if [[ -n ${CLUSTER} ]]; then
CLUSTER=${!i}
break;
fi
done
echo -e ${CLUSTER}
}
function loadClustersCampaign() {
for i in `echo ${!BPOINTS*}`
do
BPOINTS=`echo ${i} | grep $1`
if [[ -n ${BPOINTS} ]]; then
BPOINTS=${!i}
break;
fi
done
for i in `echo ${!FNAME*}`
do
FNAME=`echo ${i} | grep $1`
if [[ -n ${FNAME} ]]; then
FNAME=${!i}
break;
fi
done
echo -e ${BPOINTS}"|"${FNAME}
}
#get clusters
clusters=$(loadClusters $1)
for i in `echo $clusters | sed 's/,/ /g'`
do
file=$(loadClustersCampaign ${i/-/_} | awk -F"|" '{print $2}') ;
echo $file;
#then get the part of the $file variable
done

Fun with Shell Parameter Expansions
You can use matching-prefix notation and indirect expansion to get at the variables you want, and use the "remove suffix" expansion on each result to collect just the portions of the filename that you want. For example:
FNAME_SP1='REWARDS_BTS_SP1_<GTS>.dat'
FNAME_SP2='DUMP_LOG_SP2_<GTS>.dat'
FNAME_SP3='TEST_CASE_TABLE_SP3_<GTS>.dat'
for cluster in "${!FNAME_SP#}"; do
echo ${!cluster%%<GTS>*}
done
This will print out the following:
REWARDS_BTS_SP1_
DUMP_LOG_SP2_
TEST_CASE_TABLE_SP3_
but you could issue any valid shell command inside the loop instead of using echo.
See Also
http://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html

If you like an awk solution for this ,may be below will be useful.
> echo 'FNAME_SP1="REWARDS_BTS_SP1_<GTS>.dat"' | awk -F"<GTS>" '{split($1,a,"=\"");print substr(a[2],2)}'
REWARDS_BTS_SP1_
Furthur more detail below:
> cat temp
LUSTERS=SP1,SP2,SP3
FNAME_SP1="REWARDS_BTS_SP1_<GTS>.dat"
FNAME_SP2="DUMP_LOG_SP2_<GTS>.dat"
FNAME_SP3="TEST_CASE_TABLE_SP3_<GTS>.dat"
> awk -F"<GTS>" '/FNAME_SP/{split($1,a,"=");print substr(a[2],2)}' temp
REWARDS_BTS_SP1_
DUMP_LOG_SP2_
TEST_CASE_TABLE_SP3_
>

Related

Bash (split) file name comparison fails

In my directory I have files (*fastq.gz.fasta) and directories, whose names contain the filenames (*fastq.gz.fasta-blastdb):
IVC6_Meino.clust.gz.fasta-blastdb
IVC5_Mehiv.clust.gz.fasta-blastdb
....
IVC6_Meino.clust.gz.fasta
IVC5_Mehiv.clust.gz.fasta
....
In a bash script I want to compare the filenames with the direcories using the cut option on the latter to extract only the filename part. If those two names match I want to do further stuff (for now echo match or no match respectively).
I have written the following piece of code:
#!/bin/bash
for file in *.fasta
do
for db in *-blastdb
do
echo $file, $db | cut -d '-' -f 1
if [[ $file = "$db | cut -d '-' -f 1" ]]; then
echo "match"
else
echo "no match"
fi
done
done
But it does not detect matches. The output looks like this:
...
IVC6_Meino.clust.gz.fasta, IIIA11_Meova.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC5_Mehiv.clust.gz.fasta
no match
IVC6_Meino.clust.gz.fasta, IVC6_Meino.clust.gz.fasta
no match
The last line should read match as you can see, the strings look the same.
What am i missing?
You can use parameter expansion to do this more easily:
for file in *.fasta
do
for db in *-blastdb
do
echo "$file", "$db"
if [[ "${file%%.fasta}" = "${db%%.fasta-blastdb}" ]]; then
echo "match"
else
echo "no match"
fi
done
done
If you want to fix yours, the problem is the use of $db | cut -d '-' -f 1 With echo it appears that echo is printing the pipe. It isn't. cut is printing. When you do [[ $file = "$db | cut -d '-' -f 1" ]] it is equivalent to [[ $file = [return code from last pipe component] ]]
You need to use the $(..) shell construct to capture the output of the pipe and you need to echo to get the contents of $db to start the pipe. You should quote "$db" so you do not have word splitting or globbing from the contents of the variable.
Like so:
for file in *.fasta
do
for db in *-blastdb
do
ts=$(echo "$db" | cut -d '-' -f 1)
echo "$file", "$ts"
if [[ "$file" = "$ts" ]]; then
echo "match"
else
echo "no match"
fi
done
done # this works I think -- not tested...
Please be careful with your quoting with Bash and liberally use ShellCheck.
The structure you have is also not the most efficient. You will loop over the *-blastdb glob once for every file in *-blastdb. If you have a lot of files, that could get really slow.
To solve that, you could rewrite this loop with Bash arrays (best if you have Bash 4+) or use awk:
ext1=.fasta
ext2=.fasta-blastdb
awk 'FNR==NR{
s=$0
sub("\\"ext1"$","",s)
seen[s]=$0
next}
{
s=$0
sub("\\"ext2"$","",s)
if (s in seen)
print seen[s], $0
}
' ext1="$ext1" ext2="$ext2" <(for fn in *$ext1; do echo "$fn"; done) <(for fn in *$ext2; do echo "$fn"; done)
Each glob is only executing once and awk is using an array to test if the basenames are the same.
Best

How to filter an ordered list stored into a string

Is it possible in bash to filter out a part of a string with another given string ?
I have a fixed list of motifs defined in a string. The order IS important and I want to keep only the parts that are passed as a parameter ?
myDefaultList="s,t,a,c,k" #order is important
toRetains="k,t,c,u" #provided by the user, order is not enforced
retained=filter $myDefaultList $toRetains # code to filter
echo $retained # will print t,c,k"
I can write an ugly method that will use IFS, arrays and loops, but I wonder if there's a 'clever' way to do that, using built-in commands ?
here is another approach
tolines() { echo $1 | tr ',' '\n'; }
grep -f <(tolines "$toRetains") <(tolines "$myDefaultList") | paste -sd,
will print
t,c,k
assign to a variable as usual.
Since you mention in your comments that you are open to sed/awk , check also this with GNU awk:
$ echo "$a"
s,t,a,c,k
$ echo "$b"
k,t,c,u
$ awk -v RS=",|\n" 'NR==FNR{a[$1];next}$1 in a{printf("%s%s",$1,RT)}' <(echo "$b") <(echo "$a")
t,c,k
#!/bin/bash
myDefaultList="s,t,a,c,k"
toRetains="s,t,c,u"
IFS=","
for i in $myDefaultList
do
echo $toRetains | grep $i > /dev/null
if [ "$?" -eq "0" ]
then
retained=$retained" "$i
fi
done
echo $retained | sed -e 's/ /,/g' -e 's/,//1'
I have checked it running for me. Kindly check.

bash: sed: unexpected behavior: displays everything

I wrote what I thought was a quick script I could run on a bunch of machines. Instead it print what looks like might be directory contents in a recursive search:
version=$(mysql Varnish -B --skip-column-names -e "SELECT value FROM sys_param WHERE param='PatchLevel'" | sed -n 's/^.*\([0-9]\.[0-9]*\).*$/\1/p')
if [[ $(echo "if($version == 6.10) { print 1; } else { print 0; }" | bc) -eq 1 ]]; then
status=$(dpkg-query -l | awk '{print $2}' | grep 'sg-status-polling');
cons=$(dpkg-query -l | awk '{print $2}' | grep 'sg-consolidated-poller');
if [[ "$status" != "" && "$cons" != "" ]]; then
echo "about to change /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm"; echo;
cp /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm.bkup;
sed -ir '184s!\x91\x93!\x91\x27--timeout=35\x27\x93!' /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm;
sed -n 183,185p /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm; echo;
else
echo "packages not found. Assumed to be not applicable";
fi
else
echo "This is 4.$version, skipping";
fi
The script is supposed to make sure Varnish is version 4.6.10 and has 2 custom .deb packages installed (not through apt-get). then makes a backup and edits a single line in a perl module from [] to ['--timeout=35']
it looks like its tripping up on the sed replace one liner.
There are two major problems (minor ones addressed in comments). The first is that you use the decimal code for [] instead of the hexa, so you should use \x5b\x5d instead of \x91\x93. The second problem is that if you do use the proper codes, sed will still interpret those syntactically as []. So you can't escape escaping. Here's what you should call:
sed -ri'.bkup' '184s!\[\]![\x27--timeout=35\x27]!' /var/www/Varnish/lib/Extra/SG/ObjectPoller2.pm
And this will create the backup for you (but you should double check).

Bash scripting; confused with for loop

I need to make a for loop that loops for every item in a directory.
My issue is the for loop is not acting as I would expect it to.
cd $1
local leader=$2
if [[ $dOpt = 0 ]]
then
local items=$(ls)
local nitems=$(ls |grep -c ^)
else
local items=$(ls -l | egrep '^d' | awk '{print $9}')
local nitems=$(ls -l | egrep '^d' | grep -c ^)
fi
for item in $items;
do
printf "${CYAN}$nitems\n${NONE}"
let nitems--
if [[ $nitems -lt 0 ]]
then
exit 4
fi
printf "${YELLOW}$item\n${NONE}"
done
dOpt is just a switch for a script option.
The issue I'm having is the nitems count doesn't decrease at all, it's as if the for loop is only going in once. Is there something I'm missing?
Thanks
Goodness gracious, don't rely on ls to iterate over files.
local is only useful in functions.
Use filename expansion patterns to store the filenames in an array.
cd "$1"
leader=$2 # where do you use this?
if [[ $dOpt = 0 ]]
then
items=( * )
else
items=( */ ) # the trailing slash limits the results to directories
fi
nitems=${#items[#]}
for item in "${items[#]}" # ensure the quotes are present here
do
printf "${CYAN}$((nitems--))\n${NONE}"
printf "${YELLOW}$item\n${NONE}"
done
Using this technique will safely handle files with spaces, even newlines, in the name.
Try this:
if [ "$dOpt" == "0" ]; then
list=(`ls`)
else
list=(`ls -l | egrep '^d' | awk '{print $9}'`)
fi
for item in `echo $list`; do
... # do something with item
done
Thanks for all the suggestions. I found out the problem was changing $IFS to ":". While I meant for this to avoid problems with whitespaces in the filename, it just complicated things.

Get any string between 2 string and assign a variable in bash

I cannot get this to work. I only want to get the string between 2 others in bash. Like this:
FOUND=$(echo "If show <start>THIS WORK<end> then it work" | **the magic**)
echo $FOUND
It seems so simple...
sed -n 's/.*<start>\(.*\)<end>.*/\1/p'
This can be done in bash without any external commands such as awk and sed. When doing a regex match in bash, the results of the match are put into a special array called BASH_REMATCH. The second element of this array contains the match from the first capture group.
data="If show <start>THIS WORK<end> then it work"
regex="<start>(.*)<end>"
[[ $data =~ $regex ]] && found="${BASH_REMATCH[1]}"
echo $found
This can also be done using perl regex in grep (GNU specific):
found=$(grep -Po '(?<=<start>).*(?=<end>)' <<< "If show <start>THIS WORK<end> then it work")
echo "$found"
If you have < start > and < end > in your string then this will work. Set the FS to < and >.
[jaypal:~/Temp] FOUND=$(echo "If show <start>THIS WORK<end> then it work" |
awk -v FS="[<>]" '{print $3}')
[jaypal:~/Temp] echo $FOUND
THIS WORK

Resources