Add prefix to each word of each line in bash

Add prefix to each word of each line in bash - bash

I have a variable called deps:
deps='word1 word2'
I want to add a prefix to each word of the variable.
I tried with:
echo $deps | while read word do \ echo "prefix-$word" \ done
but i get:
bash: syntax error near unexpected token `done'
any help? thanks

With sed :
$ deps='word1 word2'
$ echo "$deps" | sed 's/[^ ]* */prefix-&/g'
prefix-word1 prefix-word2

For well behaved strings, the best answer is:
printf "prefix-%s\n" $deps
as suggested by 123 in the comments to fedorqui's answer.
Explanation:
Without quoting, bash will split the contents of $deps according to $IFS (which defaults to " \n\t") before calling printf
printf evaluates the pattern for each of the provided arguments and writes the output to stdout.
printf is a shell built-in (at least for bash) and does not fork another process, so this is faster than sed-based solutions.

In another question I just came across the markers for beginning (\<) and end (\>) of words. With those you can shorten the solution of SLePort above somewhat. The solution also nicely extends to appending a suffix, which I needed in addition to the prefix, but couldn't figure out how to use above solution for it, as the & also includes the possible trailing whitespace after the word.
So my solution is this:
$ deps='word1 word2'
# add prefix:
$ echo "$deps" | sed 's/\</prefix-/g'
prefix-word1 prefix-word2
# add suffix:
$ echo "$deps" | sed 's/\>/-suffix/g'
word1-suffix word2-suffix
Explanation: \< matches the beginning of every word, and \> matches the end of each word. You can simply "replace" these by the prefix/suffix, resulting in them being prepended/appended. There is no need to reference them anymore in the replacement, as these are not "real" characters anyway!

You can read the string into an array and then prepend the string to every item:
$ IFS=' ' read -r -a myarray <<< "word1 word2"
$ printf "%s\n" "${myarray[#]}"
word1
word2
$ printf "prefix-%s\n" "${myarray[#]}"
prefix-word1
prefix-word2

Related

Using sed with a regex to replace strings

I want to replace some string which contain specific words with another word.
Here is my code
#!/bin/bash
arr='foo/foo/baz foo/bar/baz foo/baz/baz';
for i in ${arr[#]}; do
echo $i | sed -e 's|foo/(bar\|baz)/baz|test|g'
done
Result
foo/foo/baz
foo/bar/baz
foo/baz/baz
Expected
foo/foo/baz
foo/test/baz
foo/test/baz

There are several things you can improve. The reason you are using the alternate delimiters '|' for the sed substitution expression (to avoid the "picket fence" appearance of \/\/\/ complicates the use of '|' as the OR (alternative) regex component. Choose an alternative delimiter that does not also server as part of the regular expression, '#' works fine.
Next there is no reason to loop, simply use a here string to redirect the contents of arr to sed and place it all in a command substitution with the "%s\n" format specifier to provide the newline separated output. (that's a mouthful, but it is actually nothing more than)
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr))
Example Use/Output
To test it out, just select the expressions above and middle-mouse paste the selection into your terminal, e.g.
$ arr='foo/foo/baz foo/bar/baz foo/baz/baz'
> printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
foo/foo/baz
foo/test/baz
foo/test/baz
Look things over and let me know if you have further questions.

How about something like this:
sed -e 's/\(bar\|baz\)\//test\//g'

Combining Grep and Paste command in bash

very sorry to ask a stupid question but I'm getting crazy with this thing.
So, I'm in bash and I have some files:
ls
a.bed
b.bed
c.bed
all I want to do is create a variable that have all the 3 of them separated with a comma, this is the output I search for:
a.bed, b.bed, c.bed
What I'm using for now (but have spaces instead of commas is):
beds=$(ls|grep .bed)
which have
a.bed b.bed c.bed
Thank you so much

I would use printf and its -v option, followed by a use of parameter expansion.
$ printf -v beds '%s, ' *.bed
$ beds=${beds%, }
The first line produces a.bed, b.bed, c.bed, . The second line trims the trailing , .
If you only need a single-character separator, an alternative is to use an array with IFS:
$ beds=$(a=(*.bed); IFS=,; echo "${a[*]}")

You can do it with ls 'x' and 'm' options alone:
beds=$(ls -xm *.bed)
echo $beds
a.bed, b.bed, c.bed

Here's one that is a bit wacky:
beds=$( tr \ , <<< $(ls *.bed))
In the example above, we get rid of the newlines in the ls output simply by executing it with $(). Then we use the resulting string as input to tr which replaces all spaces with commas.
My favorite is using the built in -xm parameters in ls, but this particular answer can apply to other executables that do not provide the rich set of output formats that ls does.

Overkill for this specific case but just as an FYI you could do:
$ bedsArr=( *.bed )
$ bedsStr=$( printf '%s, ' "${bedsArr[#]:0:$((${#bedsArr[#]} - 1))}"; printf "%s\n" "${bedsArr[#]: -1:1}" )
$ printf '%s\n' "$bedsStr"
a.bed, b.bed, c.bed

Extracting a substring until and including a matching word using bash tools

I have file names like these:
func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
and from each file name I want to extract the part until and including the word bold so that in the end I have:
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
Any ideas how to do that?

The easiest thing to do is to just remove bold and everything after, then replace bold. Obviously, this only works if the terminating string is fixed, as in this case.
$ f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${f%%bold*}"
func/sub-01_task-biommtloc_run-01_
$ echo "${f%%bold*}bold"
func/sub-01_task-biommtloc_run-01_bold

Is something like this what you want?
echo func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz | sed -e 's#bold_.*$#bold#'
Hope this helps

This is (needlessly) clever: remove the prefix ending with "bold"
and then so some substring index arithmetic based on the length of the suffix that's left over:
$ file=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ echo "$keep"
func/sub-01_task-biommtloc_run-01_bold
If $file does not contain "bold", then $keep will be empty: we can give it the value of $file if it is empty:
$ file=foobar
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ : ${keep:=$file}
$ echo "$keep"
foobar
But seriously, do what chepner suggests.

using Perl
> echo "func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz" | perl -e 'while (<>) { $_=~s/(.*bold)(.*)/\1/g; print } '
func/sub-01_task-biommtloc_run-01_bold
>

This is similar to glenn's solution, but a bit "less clever" in that it doesn't use substrings, just nested substitutions:
$ while IFS= read -r fname; do echo "${fname%"${fname#*bold}"}"; done < infile
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
The substitution "${fname%"${fname#*bold}"}" says:
Remove "${fname#*bold}" from the end of each filename, where
"${fname#*bold}" is everything up to and including bold removed from the front of the filename
Example for the first filename with explicit intermediate steps:
$ fname=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${fname#*bold}"
_space-T1w_preproc.nii.gz
$ echo "${fname%"${fname#*bold}"}"
func/sub-01_task-biommtloc_run-01_bold

f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.g
echo "${f//bold*/bold}"

I would recommend using sed for this task. First take all of your input filenames and stick them in a file, call it namelist.txt in the current directory. The following will work, as long as your sed supports extended regular expressions (which most will, particularly GNU sed). Note that the flag for extended regular expressions may differ a bit between platforms, check your sed manual page. On my Linux, it is -r.
bash -c "sed -r 's/(sub-01_task-.{1,10}_run-01_bold).+/\\1/' namelist.txt"

Remove a fixed prefix/suffix from a string in Bash

I want to remove the prefix/suffix from a string. For example, given:
string="hello-world"
prefix="hell"
suffix="ld"
How do I get the following result?
"o-wor"

$ prefix="hell"
$ suffix="ld"
$ string="hello-world"
$ foo=${string#"$prefix"}
$ foo=${foo%"$suffix"}
$ echo "${foo}"
o-wor
This is documented in the Shell Parameter Expansion section of the manual:
${parameter#word}
${parameter##word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches the beginning of the expanded value of parameter, then the result of the expansion is the expanded value of parameter with the shortest matching pattern (the # case) or the longest matching pattern (the ## case) deleted. […]
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the % case) or the longest matching pattern (the %% case) deleted. […]

Using sed:
$ echo "$string" | sed -e "s/^$prefix//" -e "s/$suffix$//"
o-wor
Within the sed command, the ^ character matches text beginning with $prefix, and the trailing $ matches text ending with $suffix.
Adrian Frühwirth makes some good points in the comments below, but sed for this purpose can be very useful. The fact that the contents of $prefix and $suffix are interpreted by sed can be either good OR bad- as long as you pay attention, you should be fine. The beauty is, you can do something like this:
$ prefix='^.*ll'
$ suffix='ld$'
$ echo "$string" | sed -e "s/^$prefix//" -e "s/$suffix$//"
o-wor
which may be what you want, and is both fancier and more powerful than bash variable substitution. If you remember that with great power comes great responsibility (as Spiderman says), you should be fine.
A quick introduction to sed can be found at http://evc-cit.info/cit052/sed_tutorial.html
A note regarding the shell and its use of strings:
For the particular example given, the following would work as well:
$ echo $string | sed -e s/^$prefix// -e s/$suffix$//
...but only because:
echo doesn't care how many strings are in its argument list, and
There are no spaces in $prefix and $suffix
It's generally good practice to quote a string on the command line because even if it contains spaces it will be presented to the command as a single argument. We quote $prefix and $suffix for the same reason: each edit command to sed will be passed as one string. We use double quotes because they allow for variable interpolation; had we used single quotes the sed command would have gotten a literal $prefix and $suffix which is certainly not what we wanted.
Notice, too, my use of single quotes when setting the variables prefix and suffix. We certainly don't want anything in the strings to be interpreted, so we single quote them so no interpolation takes place. Again, it may not be necessary in this example but it's a very good habit to get into.

$ string="hello-world"
$ prefix="hell"
$ suffix="ld"
$ #remove "hell" from "hello-world" if "hell" is found at the beginning.
$ prefix_removed_string=${string/#$prefix}
$ #remove "ld" from "o-world" if "ld" is found at the end.
$ suffix_removed_String=${prefix_removed_string/%$suffix}
$ echo $suffix_removed_String
o-wor
Notes:
#$prefix : adding # makes sure that substring "hell" is removed only if it is found in beginning.
%$suffix : adding % makes sure that substring "ld" is removed only if it is found in end.
Without these, the substrings "hell" and "ld" will get removed everywhere, even it is found in the middle.

I use grep for removing prefixes from paths (which aren't handled well by sed):
echo "$input" | grep -oP "^$prefix\K.*"
\K removes from the match all the characters before it.

Do you know the length of your prefix and suffix? In your case:
result=$(echo $string | cut -c5- | rev | cut -c3- | rev)
Or more general:
result=$(echo $string | cut -c$((${#prefix}+1))- | rev | cut -c$((${#suffix}+1))- | rev)
But the solution from Adrian Frühwirth is way cool! I didn't know about that!

Small and universal solution:
expr "$string" : "$prefix\(.*\)$suffix"

Using the =~ operator:
$ string="hello-world"
$ prefix="hell"
$ suffix="ld"
$ [[ "$string" =~ ^$prefix(.*)$suffix$ ]] && echo "${BASH_REMATCH[1]}"
o-wor

NOTE: Not sure if this was possible back in 2013 but it's certainly possible today (10 Oct 2021) so adding another option ...
Since we're dealing with known fixed length strings (prefix and suffix) we can use a bash substring to obtain the desired result with a single operation.
Inputs:
string="hello-world"
prefix="hell"
suffix="ld"
Plan:
bash substring syntax: ${string:<start>:<length>}
skipping over prefix="hell" means our <start> will be 4
<length> will be total length of string (${#string}) minus the lengths of our fixed length strings (4 for hell / 2 for ld)
This gives us:
$ echo "${string:4:(${#string}-4-2)}"
o-wor
NOTE: the parens can be removed and still obtain the same result
If the values of prefix and suffix are unknown, or could vary, we can still use this same operation but replace 4 and 2 with ${#prefix} and ${#suffix}, respectively:
$ echo "${string:${#prefix}:${#string}-${#prefix}-${#suffix}}"
o-wor

Using #Adrian Frühwirth answer:
function strip {
local STRING=${1#$"$2"}
echo ${STRING%$"$2"}
}
use it like this
HELLO=":hello:"
HELLO=$(strip "$HELLO" ":")
echo $HELLO # hello

How can I read words (instead of lines) from a file?

I've read this question about how to read n characters from a text file using bash. I would like to know how to read a word at a time from a file that looks like:
example text
example1 text1
example2 text2
example3 text3
Can anyone explain that to me, or show me an easy example?
Thanks!

The read command by default reads whole lines. So the solution is probably to read the whole line and then split it on whitespace with e.g. for:
#!/bin/sh
while read line; do
for word in $line; do
echo "word = '$word'"
done
done <"myfile.txt"

The way to do this with standard input is by passing the -a flag to read:
read -a words
echo "${words[#]}"
This will read your entire line into an indexed array variable, in this case named words. You can then perform any array operations you like on words with shell parameter expansions.
For file-oriented operations, current versions of Bash also support the mapfile built-in. For example:
mapfile < /etc/passwd
echo ${MAPFILE[0]}
Either way, arrays are the way to go. It's worth your time to familiarize yourself with Bash array syntax to make the most of this feature.

Ordinarily, you should read from a file using a while read -r line loop. To do this and parse the words on the lines requires nesting a for loop inside the while loop.
Here is a technique that works without requiring nested loops:
for word in $(<inputfile)
do
echo "$word"
done

In the context given, where the number of words is known:
while read -r word1 word2 _; do
echo "Read a line with word1 of $word1 and word2 of $word2"
done
If you want to read each line into an array, read -a will put the first word into element 0 of your array, the second into element 1, etc:
while read -r -a words; do
echo "First word is ${words[0]}; second word is ${words[1]}"
declare -p words # print the whole array
done

In bash, just use space as delimiter (read -d ' '). This method requires some preprocessing to translate newlines into spaces (using tr) and to merge several spaces into a single one (using sed):
{
tr '\n' ' ' | sed 's/ */ /g' | while read -d ' ' WORD
do
echo -n "<${WORD}> "
done
echo
} << EOF
Here you have some words, including * wildcards
that don't get expanded,
multiple spaces between words,
and lines with spaces at the begining.
EOF
The main advantage of this method is that you don't need to worry about the array syntax and just work as with a for loop, but without wildcard expansion.

I came across this question and the proposed answers, but I don't see listed this simple possibile solution:
for word in `cat inputfile`
do
echo $word
done

This can be done using AWK too:
awk '{for(i=1;i<=NF;i++) {print $i}}' text_file

You can combine xargs which reads word delimited by space or newline and echo to print one per line:
<some-file xargs -n1 echo
some-command | xargs -n1 echo
That also works well for large or slow streams of data because it does not need to read the whole input at once.
I’ve used this to read 1 table name at a time from SQLite which prints table names in a column layout:
sqlite3 db.sqlite .tables | xargs -n1 echo | while read table; do echo "1 table: $table"; done

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio