Extracting a substring until and including a matching word using bash tools - bash

I have file names like these:
func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
and from each file name I want to extract the part up to and including the word bold, so that in the end I have:
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
Any ideas how to do that?

The easiest thing to do is to remove bold and everything after it, then append bold again. This only works because the terminating string is fixed, as it is in this case.
$ f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${f%%bold*}"
func/sub-01_task-biommtloc_run-01_
$ echo "${f%%bold*}bold"
func/sub-01_task-biommtloc_run-01_bold
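The same expansion can be applied to a whole list of names. The following is only a sketch: the files array and the third name (which contains no "bold") are assumptions made for illustration. The guard leaves such names unchanged instead of tacking "bold" onto them.

```shell
# Hypothetical sample list; the first two names come from the question,
# the third is made up to show the "no match" case.
files=(
  func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
  func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
  func/sub-01_task-rest_run-01_mask.nii.gz
)
trimmed=()
for f in "${files[@]}"; do
  if [[ $f == *bold* ]]; then
    trimmed+=("${f%%bold*}bold")   # cut at the first "bold", then put it back
  else
    trimmed+=("$f")                # no "bold" in the name: keep it as is
  fi
done
printf '%s\n' "${trimmed[@]}"
```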

Is something like this what you want?
echo func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz | sed -e 's#bold_.*$#bold#'
Hope this helps

This is (needlessly) clever: remove the prefix ending with "bold",
and then do some substring index arithmetic based on the length of the suffix that's left over:
$ file=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ echo "$keep"
func/sub-01_task-biommtloc_run-01_bold
If $file does not contain "bold", then $keep will be empty: we can give it the value of $file if it is empty:
$ file=foobar
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ : ${keep:=$file}
$ echo "$keep"
foobar
But seriously, do what chepner suggests.

Using Perl:
> echo "func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz" | perl -pe 's/(.*bold).*/$1/'
func/sub-01_task-biommtloc_run-01_bold

This is similar to glenn's solution, but a bit "less clever" in that it doesn't use substrings, just nested substitutions:
$ while IFS= read -r fname; do echo "${fname%"${fname#*bold}"}"; done < infile
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
The substitution "${fname%"${fname#*bold}"}" says:
Remove "${fname#*bold}" from the end of each filename, where
"${fname#*bold}" is everything up to and including bold removed from the front of the filename
Example for the first filename with explicit intermediate steps:
$ fname=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${fname#*bold}"
_space-T1w_preproc.nii.gz
$ echo "${fname%"${fname#*bold}"}"
func/sub-01_task-biommtloc_run-01_bold
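If the marker word varies, the nested expansion can be wrapped in a small function. This is a sketch rather than part of the answer above: upto_word is a hypothetical name, and it assumes bash (for local and [[ ]]). It also handles names that do not contain the word, where the nested expansion alone would return an empty string.

```shell
# upto_word NAME WORD (hypothetical helper):
# print NAME up to and including the first occurrence of WORD.
upto_word() {
  local name=$1 word=$2
  local rest=${name#*"$word"}       # what follows the first WORD
  if [[ $rest == "$name" ]]; then   # nothing was removed: WORD is absent
    printf '%s\n' "$name"
  else
    printf '%s\n' "${name%"$rest"}" # strip that remainder from the end
  fi
}

upto_word func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz bold
```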

f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.g
echo "${f//bold*/bold}"

I would recommend using sed for this task. First put all of your input filenames in a file in the current directory; call it namelist.txt. The following will work as long as your sed supports extended regular expressions (most do, GNU sed in particular). Note that the flag for extended regular expressions differs between platforms, so check your sed manual page: GNU sed accepts -r, while BSD and macOS sed use -E (which GNU sed also accepts). Also note that the pattern below hard-codes sub-01 and run-01 and expects a task name of 1 to 10 characters.
bash -c "sed -r 's/(sub-01_task-.{1,10}_run-01_bold).+/\\1/' namelist.txt"
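A looser variant that does not hard-code the subject, task, or run is sketched below. It assumes every name contains _bold, uses -E for extended regular expressions, and reads the names from standard input instead of namelist.txt; the sub-02 name is made up for illustration.

```shell
# Keep everything up to and including the first "_bold" on each line.
names=$(printf '%s\n' \
  func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz \
  func/sub-02_task-rest_run-03_bold_space-T1w_preproc.nii.gz |
  sed -E 's/(_bold).*/\1/')
printf '%s\n' "$names"
```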

Related

Combining Grep and Paste command in bash

very sorry to ask a stupid question but I'm getting crazy with this thing.
So, I'm in bash and I have some files:
ls
a.bed
b.bed
c.bed
All I want to do is create a variable that holds all 3 of them separated by commas. This is the output I'm looking for:
a.bed, b.bed, c.bed
What I'm using for now (but it gives spaces instead of commas) is:
beds=$(ls|grep .bed)
which gives
a.bed b.bed c.bed
Thank you so much
I would use printf and its -v option, followed by a use of parameter expansion.
$ printf -v beds '%s, ' *.bed
$ beds=${beds%, }
The first line produces a.bed, b.bed, c.bed, . The second line trims the trailing , .
If you only need a single-character separator, an alternative is to use an array with IFS:
$ beds=$(a=(*.bed); IFS=,; echo "${a[*]}")
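The IFS approach only uses the first character of IFS as the separator. For a multi-character separator such as ", " the idea can be wrapped in a function. This is a sketch: join_by is a hypothetical name, and the file names are passed literally here so the example does not depend on the current directory.

```shell
# join_by SEP ITEM... (hypothetical helper):
# print the items joined by SEP, which may be longer than one character.
join_by() {
  local sep=$1 out item
  shift
  for item in "$@"; do
    out+="$item$sep"              # append every item followed by SEP
  done
  printf '%s\n' "${out%"$sep"}"   # drop the one trailing SEP
}

beds=$(join_by ', ' a.bed b.bed c.bed)
printf '%s\n' "$beds"
```

In the directory from the question, the call would be join_by ', ' *.bed.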
You can do it with ls 'x' and 'm' options alone:
beds=$(ls -xm *.bed)
echo $beds
a.bed, b.bed, c.bed
Here's one that is a bit wacky:
beds=$( tr \ , <<< $(ls *.bed))
In the example above, we get rid of the newlines in the ls output simply by executing it with $(). Then we use the resulting string as input to tr which replaces all spaces with commas.
My favorite is using the built in -xm parameters in ls, but this particular answer can apply to other executables that do not provide the rich set of output formats that ls does.
Overkill for this specific case but just as an FYI you could do:
$ bedsArr=( *.bed )
$ bedsStr=$( printf '%s, ' "${bedsArr[@]:0:$((${#bedsArr[@]} - 1))}"; printf "%s\n" "${bedsArr[@]: -1:1}" )
$ printf '%s\n' "$bedsStr"
a.bed, b.bed, c.bed

Add prefix to each word of each line in bash

I have a variable called deps:
deps='word1 word2'
I want to add a prefix to each word of the variable.
I tried with:
echo $deps | while read word do \ echo "prefix-$word" \ done
but i get:
bash: syntax error near unexpected token `done'
any help? thanks
With sed :
$ deps='word1 word2'
$ echo "$deps" | sed 's/[^ ]* */prefix-&/g'
prefix-word1 prefix-word2
For well behaved strings, the best answer is:
printf "prefix-%s\n" $deps
as suggested by 123 in the comments to fedorqui's answer.
Explanation:
Without quoting, bash splits the contents of $deps on the characters in $IFS (by default space, tab, and newline) before calling printf.
printf evaluates the pattern for each of the provided arguments and writes the output to stdout.
printf is a shell built-in (at least for bash) and does not fork another process, so this is faster than sed-based solutions.
In another question I came across the markers for the beginning (\<) and end (\>) of a word. With those you can shorten SLePort's solution above somewhat. The approach also extends nicely to appending a suffix, which I needed in addition to the prefix. I couldn't see how to adapt the solution above for that, because the & there also includes any trailing whitespace after the word.
So my solution is this:
$ deps='word1 word2'
# add prefix:
$ echo "$deps" | sed 's/\</prefix-/g'
prefix-word1 prefix-word2
# add suffix:
$ echo "$deps" | sed 's/\>/-suffix/g'
word1-suffix word2-suffix
Explanation: \< matches the beginning of every word, and \> matches the end of each word. You can simply "replace" these by the prefix/suffix, resulting in them being prepended/appended. There is no need to reference them anymore in the replacement, as these are not "real" characters anyway!
You can read the string into an array and then prepend the string to every item:
$ IFS=' ' read -r -a myarray <<< "word1 word2"
$ printf "%s\n" "${myarray[#]}"
word1
word2
$ printf "prefix-%s\n" "${myarray[#]}"
prefix-word1
prefix-word2
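Once the words are in an array, bash can also add the prefix or suffix by itself, using pattern substitution anchored at the start (#) or end (%) of each element with an empty pattern. A short sketch, where the array names are chosen only for illustration:

```shell
deps='word1 word2'
read -r -a words <<< "$deps"

# An empty pattern anchored at the start/end matches the empty string there,
# so the replacement is simply inserted in front of / behind each element.
prefixed=("${words[@]/#/prefix-}")
suffixed=("${words[@]/%/-suffix}")

printf '%s\n' "${prefixed[*]}" "${suffixed[*]}"
```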

How to insert one character in front of a variable using sed

I want to turn this input_variable = 1
into input_variable = 01
From previous posts here I tried this but didn't work:
sed -e "s/\0" <<< "$input_variable"
I get:
Syntax error: redirection unexpected
What do I do wrong?
Thanks!
EDIT
Thanks to Benjamin I found a workaround (I would still like to know why the sed didn't work):
new_variable="0$input_variable"
While it can be done with sed, simple assignment in your script can do exactly what you want done. For example, if you have input_variable=1 and want input_variable=01, you can simply add a leading 0 by assignment:
input_variable="0${input_variable}"
or for additional types of numeric formatting you can use the printf -v option and take advantage of the format-specifiers provided by the printf function. For example:
printf -v input_variable "%02d" $input_variable
will zero-pad input_variable to a length of 2 (or any width you specify with the field-width modifier). You can also just add the leading zero regardless of the width with:
printf -v input_variable "0%s" $input_variable
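One caveat with %02d: bash's printf treats a value with a leading zero as octal, so an input such as 08 is rejected as an invalid octal number. A possible guard, sketched here with a hypothetical helper pad2 and assuming non-negative integer input, is to force base 10 in arithmetic expansion first:

```shell
# pad2 N (hypothetical helper): zero-pad a non-negative integer to 2 digits.
pad2() {
  local padded
  # 10#... makes bash read the digits as decimal even with leading zeros.
  printf -v padded '%02d' "$((10#$1))"
  printf '%s\n' "$padded"
}

pad2 1
pad2 08
pad2 12
```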
sed is an excellent tool, but it isn't really the correct tool for this job.
You didn't close the substitution command. Each substitution command must contain 3 delimiters:
sed -e 's/pattern/replacement/' <<< 'text' # 3 slashes
What you want to do could be done with:
sed -e 's/.*/0&/' <<< $input_variable
EDIT:
You are probably using Ubuntu and stumbled upon dash also known as the Almquist shell, which does not have the <<< redirection operator. The following would be a POSIX-compliant alternative, which works with dash as well:
sed -e 's/.*/0&/' <<~
$input_variable
~
And also this:
echo $input_variable | sed -e 's/.*/0&/'
To have the variable take on the new value, do this:
input_variable=$(echo $input_variable | sed -e 's/.*/0&/')
That, however, is not how you would normally write a shell script. Scripts usually write text to standard output rather than set variables in the calling shell.
So the script, let's call it append_zero.sh, would be:
#!/bin/sh
echo $1 | sed 's/.*/0&/'
and you would execute it like this:
$ input_variable=1
$ input_variable=$(append_zero.sh "$input_variable")
$ echo $input_variable
01
This way you have a working shell script that you can reuse with any Unix system that has a POSIX compliant /bin/sh

how to print user1 from user1#10.129.12.121 using shell scripting or sed

I wanted to print the name from the entire address by shell scripting. So user1#12.12.23.234 should give output "user1" and similarly 11234#12.123.12.23 should give output 11234
Reading from the terminal:
$ IFS=# read user host && echo "$user"
<user1#12.12.23.234>
user1
Reading from a variable:
$ address='user1#12.12.23.234'
$ cut -d# -f1 <<< "$address"
user1
$ sed 's/#.*//' <<< "$address"
user1
$ awk -F# '{print $1}' <<< "$address"
user1
Using shell parameter expansion:
EMAIL='user#server.com'
echo "${EMAIL%#*}"
This is handled by the shell itself rather than by an external program. The ${var%pattern} form is specified by POSIX, so it also works in sh and dash, and it is probably faster since it doesn't fork a process to handle the editing.
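The same kind of expansion can split the address into both parts. A minimal sketch, assuming the address contains the separator (if it does not, both variables simply end up holding the whole string):

```shell
address='user1#12.12.23.234'

user=${address%%#*}   # remove the first separator and everything after it
host=${address#*#}    # remove everything up to and including the first separator

printf 'user=%s host=%s\n' "$user" "$host"
```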
Using sed:
echo "$EMAIL" | sed -e 's/#.*//'
This tells sed to replace the # character and as many characters that it can find after it up to the end of line with nothing, ie. removing everything after the #.
This option is probably better if you have multiple emails stored in a file, then you can do something like
sed -e 's/#.*//' emails.txt > users.txt
Hope this helps =)
I tend to use expr for this kind of thing:
address='user1#12.12.23.234'
expr "$address" : '\([^#]*\)'
This uses expr for its pattern matching and extraction abilities. Translated, the above says: please print the longest prefix of $address that doesn't contain a #.
The expr tool is specified by POSIX, so this should be pretty portable.
As a note, some historical versions of expr will interpret an argument with a leading - as an option. If you care about guarding against that, you can add an extra letter to the beginning of the string, and just avoid matching it, like so:
expr "x$address" : 'x\([^#]*\)'

Removing substring out of string using sed

I am trying to remove substring out of variable using sed like this:
PRINT_THIS="`echo "$fullpath" | sed 's/${rootpath}//' -`"
where
fullpath="/media/some path/dir/helloworld/src"
rootpath=/media/some path/dir
I want to echo just the rest of fullpath, like this (I am using this on a whole bunch of directories, so I need to store it in a variable and do it automatically):
echo "helloworld/src"
using variable it would be
echo "Directory: $PRINT_THIS"
The problem is that I can't get sed to remove the substring. What am I doing wrong? Thanks
You don't need sed for that, bash alone is enough:
$ fullpath="/media/some path/dir/helloworld/src"
$ rootpath="/media/some path/dir"
$ echo "${fullpath#"$rootpath"}"
/helloworld/src
$ echo "${fullpath#"$rootpath"/}"
helloworld/src
$ rootpath=unrelated
$ echo "${fullpath#"$rootpath"/}"
/media/some path/dir/helloworld/src
Check out the String manipulation documentation.
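Since this is meant to run over a whole bunch of directories, the prefix removal can be wrapped in a function. This is only a sketch: strip_root is a hypothetical name, and it assumes bash. It strips the root only when the path really lies below it, and accepts the root with or without a trailing slash.

```shell
# strip_root FULLPATH ROOT (hypothetical helper):
# print FULLPATH relative to ROOT, or unchanged if it is not below ROOT.
strip_root() {
  local full=$1 root=${2%/}          # tolerate a trailing slash on ROOT
  case $full in
    "$root"/*) printf '%s\n' "${full#"$root"/}" ;;
    *)         printf '%s\n' "$full" ;;
  esac
}

fullpath="/media/some path/dir/helloworld/src"
rootpath="/media/some path/dir"
PRINT_THIS=$(strip_root "$fullpath" "$rootpath")
echo "Directory: $PRINT_THIS"
```

Quoting "$root" inside the pattern makes the shell match it literally, so characters such as * or ? in a directory name are not treated as wildcards.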
To use a shell variable in a sed expression, write it like this:
sed "s#$variable##g" FILE
Two things to note:
Use double quotes, because the shell doesn't expand variables inside single quotes.
Use a separator other than /, here #, so that it doesn't conflict with the slashes in your paths.
Ex:
$ rootpath="/media/some path/dir"
$ fullpath="/media/some path/dir/helloworld/src"
$ echo "$fullpath"
/media/some path/dir/helloworld/src
$ echo "$fullpath" | sed "s#$rootpath##"
/helloworld/src
