bash scripting - removing trailing chars from the end of a string

I have a string like this:
rthbp-0.0.3-sources.jar
I want to get a new string (say, string2) that contains just the version from the original string, i.e. '0.0.3'. The rthbp- at the start will remain constant, but the part after the version's "-" (-sources.jar) may change.
Is it possible in bash to extract just the version info?
I am doing echo ${f:6}, but that only gives me 0.0.3-sources.jar

If you're using bash, you can extract an arbitrary substring by specifying both offset and length:
$ filename=rthbp-0.0.3-sources.jar
$ echo "${filename:6:5}"
#=> 0.0.3
But using exact character offsets like that is fragile. You might want to use something like this:
$ IFS=- read pre version post <<<"$filename"
$ echo "$version"
#=> 0.0.3
Or, somewhat more clunkily:
$ ltrim=${filename%%-*}-
$ rest=${filename#"$ltrim"}
$ version=${rest%%-*}
Or, as others mentioned, you could call out to cut or awk to do the splitting for you.
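Since the question says the rthbp- prefix is fixed, a minimal pure-bash sketch can strip that known prefix and then cut at the next dash, with no character offsets at all. The function name version_from_jar is hypothetical, and the sketch assumes the version itself never contains a dash:

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the version part of names like rthbp-<version>-<anything>.
# Assumes the prefix is exactly "rthbp-" and the version contains no "-".
version_from_jar() {
    local name=$1
    local rest=${name#rthbp-}     # drop the fixed prefix (no-op if it is absent)
    printf '%s\n' "${rest%%-*}"   # drop the longest suffix that starts with "-"
}
```

For example, version_from_jar rthbp-0.0.3-sources.jar prints 0.0.3, and a longer version such as 1.2.10 comes through intact because nothing depends on its length.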

No one has mentioned regular expression matching yet, so I will.
[[ $string1 =~ rthbp-(.*)-sources\.jar ]]
version=${BASH_REMATCH[1]}
(You may want a slightly more general regular expression; this just demonstrates how to match against a regular expression containing a capture group and how to extract the captured value.)
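As one possible "more general" pattern, here is a sketch that accepts any dash-free prefix, captures a dotted numeric version, and reports failure when nothing matches. Keeping the pattern in a variable avoids quoting surprises inside [[ ]]. The function name and the exact pattern are assumptions, not part of the answer above:

```bash
#!/usr/bin/env bash

# Hypothetical helper: capture a dotted numeric version between the first two dashes.
extract_version_re() {
    local re='^[^-]+-([0-9]+(\.[0-9]+)*)-'
    if [[ $1 =~ $re ]]; then
        # Group 1 is the whole version; group 2 is only its last ".N" part.
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1   # no version found
    fi
}
```

The non-zero return status lets a caller write something like version=$(extract_version_re "$f") || echo "no version in $f".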

You can use the read built-in:
s='rthbp-0.0.3-sources.jar'
IFS=- read a ver _ <<< "$s" && echo "$ver"
0.0.3
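One detail worth knowing about the read approach: the last variable receives everything that is left over, dashes included, so a suffix with extra dashes does not disturb the version field. The -r flag just stops read from interpreting backslashes. The sample name with the extra -test part is hypothetical:

```bash
#!/usr/bin/env bash

s='rthbp-0.0.3-test-sources.jar'        # hypothetical name with an extra dash in the suffix
IFS=- read -r name ver rest <<< "$s"    # IFS is changed for this one command only
# name=rthbp, ver=0.0.3, rest=test-sources.jar (the last variable takes the remainder)
printf '%s\n' "$ver"
```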

I would recommend just using cut for this. Define the delimiter as dashes and keep field two:
$ echo "rthbp-0.0.3-sources.jar" | cut -d'-' -f 2
0.0.3
If you want to use pure bash, you can use parameter expansion, but it isn't as clean. Assuming the version always starts in the same place and is the same length, you can use:
$ str="rthbp-0.0.3-sources.jar"
$ echo "${str:6:5}"
0.0.3
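To illustrate why the fixed-offset form depends on that assumption, here is a sketch with a hypothetical two-digit patch version: the 5-character slice silently truncates it, while splitting on the delimiter does not:

```bash
#!/usr/bin/env bash

str="rthbp-0.0.10-sources.jar"            # hypothetical: the version is now 6 characters long
by_offset=${str:6:5}                      # fixed offset and length -> "0.0.1" (wrong)
by_field=$(cut -d'-' -f 2 <<< "$str")     # second dash-separated field -> "0.0.10"
printf 'offset: %s, field: %s\n' "$by_offset" "$by_field"
```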

Another variant using regex matching, with grep -oP (needs a grep with PCRE support, such as GNU grep):
$ echo "rthbp-0.0.3-sources.jar" | grep -oP '([[:digit:]]+\.)+[[:digit:]]+'
0.0.3

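Split on - with awk and print the second field:
echo "rthbp-0.0.3-sources.jar" | awk -F- '{print $2}'
0.0.3
Adding a condition such as $2=="0.0.3" prints the field only when it already equals that value, so it is useful for checking a known version rather than extracting an unknown one:
echo "rthbp-0.0.3-sources.jar" | awk -F- '$2=="0.0.3"{print $2}'
0.0.3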

Related

Extracting a substring until and including a matching word using bash tools

I have file names like these:
func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
From each file name I want to extract the part up to and including the word bold, so that in the end I have:
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
Any ideas how to do that?
The easiest thing to do is to remove bold and everything after it, then append bold again. Obviously, this only works if the terminating string is fixed, as in this case.
$ f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${f%%bold*}"
func/sub-01_task-biommtloc_run-01_
$ echo "${f%%bold*}bold"
func/sub-01_task-biommtloc_run-01_bold
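Wrapped up as a function, the same idea can also guard against names that do not contain bold at all, where blindly appending bold would produce a wrong name. The function name is hypothetical; like the expansion above, this sketch cuts at the first occurrence of bold:

```bash
#!/usr/bin/env bash

# Hypothetical helper: keep everything up to and including the first "bold".
keep_through_bold() {
    local f=$1
    if [[ $f == *bold* ]]; then
        printf '%s\n' "${f%%bold*}bold"   # drop "bold" and everything after, then put "bold" back
    else
        printf '%s\n' "$f"                # no "bold": leave the name unchanged
    fi
}
```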
Is something like this what you want?
echo func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz | sed -e 's#bold_.*$#bold#'
Hope this helps
This is (needlessly) clever: remove the prefix ending with "bold",
and then do some substring index arithmetic based on the length of the suffix that's left over:
$ file=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ echo "$keep"
func/sub-01_task-biommtloc_run-01_bold
If $file does not contain "bold", then ${file#*bold} leaves $file unchanged, so $keep will be empty; in that case we can give it the value of $file:
$ file=foobar
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ : "${keep:=$file}"
$ echo "$keep"
foobar
But seriously, do what chepner suggests.
Using Perl:
> echo "func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz" | perl -pe 's/(.*bold).*/$1/'
func/sub-01_task-biommtloc_run-01_bold
This is similar to glenn's solution, but a bit "less clever" in that it doesn't use substrings, just nested substitutions:
$ while IFS= read -r fname; do echo "${fname%"${fname#*bold}"}"; done < infile
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
The substitution "${fname%"${fname#*bold}"}" says:
Remove "${fname#*bold}" from the end of each filename, where
"${fname#*bold}" is the filename with everything up to and including bold removed from the front
Example for the first filename with explicit intermediate steps:
$ fname=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${fname#*bold}"
_space-T1w_preproc.nii.gz
$ echo "${fname%"${fname#*bold}"}"
func/sub-01_task-biommtloc_run-01_bold
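If the names are already in a bash array rather than a file, the ${f%%bold*} trick can be applied to every element in a single expansion, because ${array[@]%%pattern} strips the pattern from each element separately. This sketch assumes every name contains bold; the array names are hypothetical:

```bash
#!/usr/bin/env bash

names=(
    func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
    func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
    func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
)
# Strip "bold" and everything after it from each element.
trimmed=( "${names[@]%%bold*}" )
# printf repeats its format for every argument, re-appending "bold" to each.
printf '%sbold\n' "${trimmed[@]}"
```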
f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.g
echo "${f//bold*/bold}"
I would recommend using sed for this task. First, put all of your input filenames in a file, say namelist.txt in the current directory. The following works as long as your sed supports extended regular expressions (GNU sed does). The flag for extended regular expressions differs between platforms, so check your sed manual page: GNU sed accepts -r (and -E), while BSD/macOS sed uses -E.
sed -r 's/(sub-01_task-.{1,10}_run-01_bold).+/\1/' namelist.txt

Adding double quotes to beginning, end and around commas in bash variable

I have a shell script that accepts a parameter that is comma delimited,
-s 1234,1244,1567
That value is passed to a JSON field in a curl PUT request. JSON needs the values in "1234","1244","1567" format.
Currently, I am passing the parameter with the quotes already in it:
-s "\"1234\",\"1244\",\"1567\"", which works, but users complain that it's too much typing and hard to get right. So I'd like to take a plain comma-delimited list like the one at the top and add the quotes programmatically.
Basically, I want a parameter to be passed in as 1234,2345 and end up as a variable that is "1234","2345"
I've read that the easiest approach is to use sed, but I'm not really familiar with it, and all of my attempts have failed.
You can do this in BASH:
$> arg='1234,1244,1567'
$> echo "\"${arg//,/\",\"}\""
"1234","1244","1567"
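Since the list arrives through an -s option, the same substitution can be done right where the option is parsed. This is only a sketch: the function name build_ids is hypothetical, it assumes -s is the only option, and the curl call itself is left out. Putting the "," separator in a variable avoids escaping quotes inside the expansion:

```bash
#!/usr/bin/env bash

# Hypothetical option parser: turns "-s 1234,1244,1567" into "1234","1244","1567".
build_ids() {
    local OPTIND=1 opt ids= sep='","'
    while getopts 's:' opt; do
        case $opt in
            s) ids=$OPTARG ;;
            *) return 2 ;;          # unknown option
        esac
    done
    [[ -n $ids ]] || return 1       # -s missing or empty
    # Replace every comma with "," and wrap the whole list in quotes.
    printf '"%s"\n' "${ids//,/$sep}"
}
```

Declaring OPTIND local resets getopts each time the function is called, so it can safely be called more than once in the same script.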
awk to the rescue!
$ awk -F, -v OFS='","' -v q='"' '{$1=$1; print q $0 q}' <<< "1234,1244,1567"
"1234","1244","1567"
or shorter with sed
$ sed -r 's/[^,]+/"&"/g' <<< "1234,1244,1567"
"1234","1244","1567"
translating this back to awk (gensub is a GNU awk extension, so this needs gawk)
$ awk '{print gensub(/([^,]+)/,"\"\\1\"","g")}' <<< "1234,1244,1567"
"1234","1244","1567"
you can use this:
QV=$(echo 1234,2345,56788 | sed -e 's/^/"/' -e 's/$/"/' -e 's/,/","/g')
result:
echo "$QV"
"1234","2345","56788"
This just adds a double quote at the start and at the end, and replaces every comma with quote/comma/quote.
easy to do with sed
$ echo '1234,1244,1567' | sed 's/[0-9]*/"\0"/g'
"1234","1244","1567"
[0-9]* zero or more consecutive digits; since * is greedy, it will try to match as many as possible
"\0" double quote the matched pattern, entire match is by default saved in \0
g global flag, to replace all such patterns
In case \0 isn't recognized by your sed (it is not POSIX), use & instead:
$ echo '1234,1244,1567' | sed 's/[0-9]*/"&"/g'
"1234","1244","1567"
Similar solution with perl
$ echo '1234,1244,1567' | perl -pe 's/\d+/"$&"/g'
"1234","1244","1567"
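Note: using * instead of + with perl also quotes the empty matches right after each number and at the very end of the input; the last "" is printed after the newline, so it appears just before the next shell prompt ($):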
$ echo '1234,1244,1567' | perl -pe 's/\d*/"$&"/g'
"1234""","1244""","1567"""
""$
I think this difference between sed and perl is similar to this question: GNU sed, ^ and $ with | when first/last character matches
Using sed:
$ echo 1234,1244,1567 | sed 's/\([0-9]\+\)/\"\1\"/g'
"1234","1244","1567"
i.e., replace every run of digits with the same run in quotes, using a backreference (\1). Note that \+ is a GNU sed extension.

String substitute in Shell script

I am trying to strip a substring in my shell script. This is what I have:
fileName="Test_VSS_TT.csv.old"
Here I want to remove the string ".csv.old", and my attempt is
test=${fileName%.*}
but I get a "bad substitution" error.
You are looking for test=${fileName%%.*}
See the parameter expansion sections of the bash and zsh manuals.
%.* removes the shortest suffix matching .* (here ".old"), whereas %%.* removes the longest one (".csv.old").
Edit: if sed is available, you could try something like this: echo "filename.txt.bin" | sed "s/\..*//g", which yields filename
Here you go,
$ echo $f
Test_VSS_TT.csv.old
$ test=${f%%.*}
$ echo $test
Test_VSS_TT
%% does a longest match, so it matches from the first dot to the end of the string and removes the matched characters.
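A quick way to see the difference between the shortest and longest forms, and their # / ## prefix counterparts, is to apply all four to the same name:

```bash
#!/usr/bin/env bash

f=Test_VSS_TT.csv.old
shortest_suffix=${f%.*}    # removes the shortest ".*" suffix -> Test_VSS_TT.csv
longest_suffix=${f%%.*}    # removes the longest ".*" suffix  -> Test_VSS_TT
shortest_prefix=${f#*.}    # removes the shortest "*." prefix -> csv.old
longest_prefix=${f##*.}    # removes the longest "*." prefix  -> old
printf '%s\n' "$shortest_suffix" "$longest_suffix" "$shortest_prefix" "$longest_prefix"
```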
If your intention is to extract file name without extension, then how about this?
$ echo ${fileName}
Test_VSS_TT.csv.old
$ test=`echo ${fileName} |cut -d '.' -f1`
$ echo $test
Test_VSS_TT
echo "Test_VSS_TT.csv.old"| awk -F"." '{print $1}'

easy way to split a string like this in bash?

I have strings of the form "temp:10", and I use temp=$(echo $str|awk '{split($0,array,":")} END{print array[1]}') to split them, which is overkill and slow. Is there a simpler way to do this?
Use bash's parameter expansion with suffix removal:
temp=${str%%:*}
There's also the read command:
$ str="temp:10"
$ IFS=: read before after <<< "$str"
$ echo "$before"
temp
$ echo "$after"
10
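Suffix removal has a prefix-removal twin, so both halves can be taken with parameter expansion alone, without starting any process. This sketch assumes the string contains at least one colon; with several colons, key stops at the first one and value keeps the rest:

```bash
#!/usr/bin/env bash

str="temp:10"
key=${str%%:*}     # remove the longest suffix starting with ":" -> temp
value=${str#*:}    # remove the shortest prefix ending with ":"  -> 10
printf '%s=%s\n' "$key" "$value"
```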
If I understand you right, you need the value before the : (temp in this example). If so, you can use the cut command:
cut -d':' -f1 <<< "$str"

shell command to truncate/cut a part of string

I have a file with the contents below. I have a command that prints the version number from it, but I need to truncate the last part of the version.
file.spec:
Version: 3.12.0.2
Command used:
VERSION=($(grep -r "Version:" /path/file.spec | awk '{print ($2)}'))
echo $VERSION
Current output : 3.12.0.2
Desired output : 3.12.0
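There is absolutely no need for external tools like awk, sed etc. for this simple task if your shell supports parameter expansion (any POSIX-compliant shell does). Note that $(<file.spec) is a bash/ksh/zsh shortcut; in a plain POSIX shell use $(cat file.spec) instead. This also assumes file.spec contains only the Version line: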
$ cat file.spec
Version: 3.12.0.2
$ version=$(<file.spec)
$ version="${version#* }"
$ version="${version%.*}"
$ echo "${version}"
3.12.0
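A real .spec file usually has many more lines than this one, so a slightly more defensive pure-bash sketch could scan for the Version: line first. The function name spec_version is hypothetical, and the sketch assumes the version is on a line starting exactly with Version:

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the Version: value from a spec file, minus its last ".N" part.
spec_version() {
    local line v
    while IFS= read -r line; do
        if [[ $line == Version:* ]]; then
            v=${line#Version:}
            v=${v#"${v%%[![:space:]]*}"}   # trim leading whitespace
            printf '%s\n' "${v%.*}"         # drop the last dot and what follows it
            return 0
        fi
    done < "$1"
    return 1                                # no Version: line found
}
```

For example, spec_version /path/file.spec would print 3.12.0 for the file shown above, even if other lines come before or after the Version: line.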
Try this:
VERSION=($(grep -r "Version:" /path/file.spec| awk '{print ($2)}' | cut -d. -f1-3))
cut splits the string on the field delimiter (-d), and -f selects the desired fields (here 1 through 3).
You could use this single awk script awk -F'[ .]' '{print $2"."$3"."$4}':
$ VERSION=$(awk -F'[ .]' '{print $2"."$3"."$4}' /path/file.spec)
$ echo $VERSION
3.12.0
Or this single grep
$ VERSION=$(grep -Po 'Version: \K\d+[.]\d+[.]\d+' /path/file.spec)
$ echo $VERSION
3.12.0
But you don't need grep and awk together here.
If you grep only a single file, -r makes no sense.
Also, based on the output of your command line, this grep should work:
grep -Po '(?<=Version: )(\d+\.){2}\d+' /path/file.spec
gives you:
3.12.0
The \K is also nice: it works whether the prefix has a fixed or variable length (available since PCRE 7.2). Another answer uses it, but I find a look-behind easier to read when the prefix has a fixed length.
