Combining Grep and Paste command in bash

Very sorry to ask a stupid question, but I'm going crazy with this thing.
So, I'm in bash and I have some files:
ls
a.bed
b.bed
c.bed
All I want to do is create a variable that has all 3 of them separated by a comma; this is the output I'm looking for:
a.bed, b.bed, c.bed
What I'm using for now (but it has spaces instead of commas) is:
beds=$(ls|grep .bed)
which gives
a.bed b.bed c.bed
Thank you so much

I would use printf and its -v option, followed by a use of parameter expansion.
$ printf -v beds '%s, ' *.bed
$ beds=${beds%, }
The first line produces a.bed, b.bed, c.bed, . The second line trims the trailing , .
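With the three files above, the result is (assuming only a.bed, b.bed and c.bed match *.bed):
$ echo "$beds"
a.bed, b.bed, c.bed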
If you only need a single-character separator, an alternative is to use an array with IFS:
$ beds=$(a=(*.bed); IFS=,; echo "${a[*]}")

You can do it with ls 'x' and 'm' options alone:
beds=$(ls -xm *.bed)
echo $beds
a.bed, b.bed, c.bed

Here's one that is a bit wacky:
beds=$( tr \ , <<< $(ls *.bed))
In the example above, we get rid of the newlines in the ls output simply by executing it with $(). Then we use the resulting string as input to tr which replaces all spaces with commas.
My favorite is using the built-in -x and -m options of ls, but this particular answer can apply to other executables that do not provide the rich set of output formats that ls does.

Overkill for this specific case but just as an FYI you could do:
$ bedsArr=( *.bed )
$ bedsStr=$( printf '%s, ' "${bedsArr[@]:0:$((${#bedsArr[@]} - 1))}"; printf "%s\n" "${bedsArr[@]: -1:1}" )
$ printf '%s\n' "$bedsStr"
a.bed, b.bed, c.bed

Related

Using sed with a regex to replace strings

I want to replace strings which contain specific words with another word.
Here is my code
#!/bin/bash
arr='foo/foo/baz foo/bar/baz foo/baz/baz';
for i in ${arr[@]}; do
echo $i | sed -e 's|foo/(bar\|baz)/baz|test|g'
done
Result
foo/foo/baz
foo/bar/baz
foo/baz/baz
Expected
foo/foo/baz
foo/test/baz
foo/test/baz
There are several things you can improve. Using '|' as the alternate delimiter for the sed substitution expression (to avoid the "picket fence" appearance of \/\/\/) complicates the use of '|' as the OR (alternation) regex component. Choose an alternative delimiter that does not also serve as part of the regular expression; '#' works fine.
Next, there is no reason to loop; simply use a here string to redirect the contents of arr to sed, and place it all in a command substitution with the "%s\n" format specifier to provide newline-separated output. (That's a mouthful, but it is actually nothing more than:)
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr))
Example Use/Output
To test it out, just select the expressions above and middle-mouse paste the selection into your terminal, e.g.
$ arr='foo/foo/baz foo/bar/baz foo/baz/baz'
> printf "%s\n" $(sed 's#/\(bar\|baz\)/#/test/#g' <<< $arr)
foo/foo/baz
foo/test/baz
foo/test/baz
Look things over and let me know if you have further questions.
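If you do want to keep a loop, here is a minimal sketch of the same per-word substitution (this assumes GNU sed, where \| alternation is supported in basic regular expressions):
arr='foo/foo/baz foo/bar/baz foo/baz/baz'
for i in $arr; do
    echo "$i" | sed 's#/\(bar\|baz\)/#/test/#g'   # '#' delimiter, so no '/' escaping needed
done
which prints the three lines of the expected output.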
How about something like this:
sed -e 's/\(bar\|baz\)\//test\//g'

Extracting a substring until and including a matching word using bash tools

I have file names like these:
func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
and from each file name I want to extract the part until and including the word bold so that in the end I have:
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
Any ideas how to do that?
The easiest thing to do is to just remove bold and everything after it, then append bold back. Obviously, this only works if the terminating string is fixed, as in this case.
$ f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${f%%bold*}"
func/sub-01_task-biommtloc_run-01_
$ echo "${f%%bold*}bold"
func/sub-01_task-biommtloc_run-01_bold
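To run it over all three names from the question in one go, here is a small sketch that feeds them in with a here-document rather than assuming the files exist on disk:
while IFS= read -r f; do
    echo "${f%%bold*}bold"
done <<'EOF'
func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
EOF
which prints the three truncated names listed in the question.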
Is something like this what you want?
echo func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz | sed -e 's#bold_.*$#bold#'
Hope this helps
This is (needlessly) clever: remove the prefix ending with "bold",
and then do some substring index arithmetic based on the length of the suffix that's left over:
$ file=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ echo "$keep"
func/sub-01_task-biommtloc_run-01_bold
If $file does not contain "bold", then $keep will be empty: we can give it the value of $file if it is empty:
$ file=foobar
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ : ${keep:=$file}
$ echo "$keep"
foobar
But seriously, do what chepner suggests.
using Perl
> echo "func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz" | perl -e 'while (<>) { $_=~s/(.*bold)(.*)/\1/g; print } '
func/sub-01_task-biommtloc_run-01_bold
>
This is similar to glenn's solution, but a bit "less clever" in that it doesn't use substrings, just nested substitutions:
$ while IFS= read -r fname; do echo "${fname%"${fname#*bold}"}"; done < infile
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
The substitution "${fname%"${fname#*bold}"}" says:
Remove "${fname#*bold}" from the end of each filename, where
"${fname#*bold}" is everything up to and including bold removed from the front of the filename
Example for the first filename with explicit intermediate steps:
$ fname=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${fname#*bold}"
_space-T1w_preproc.nii.gz
$ echo "${fname%"${fname#*bold}"}"
func/sub-01_task-biommtloc_run-01_bold
f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
echo "${f//bold*/bold}"
This is bash pattern substitution: bold* matches bold and everything after it, and the whole match is replaced with just bold.
I would recommend using sed for this task. First take all of your input filenames and stick them in a file, call it namelist.txt in the current directory. The following will work, as long as your sed supports extended regular expressions (which most will, particularly GNU sed). Note that the flag for extended regular expressions may differ a bit between platforms, check your sed manual page. On my Linux, it is -r.
bash -c "sed -r 's/(sub-01_task-.{1,10}_run-01_bold).+/\\1/' namelist.txt"

Adding double quotes to beginning, end and around comma's in bash variable

I have a shell script that accepts a parameter that is comma delimited,
-s 1234,1244,1567
That is passed to a curl PUT JSON field. The JSON needs the values in a "1234","1244","1567" format.
Currently, I am passing the parameter with the quotes already in it:
-s "\"1234\",\"1244\",\"1567\"", which works, but the users are complaining that it's too much typing and hard to do. So I'd like to just take a comma-delimited list like I had at the top and programmatically stick the quotes in.
Basically, I want a parameter to be passed in as 1234,2345 and end up as a variable that is "1234","2345"
I've read that the easiest approach here is to use sed, but I'm really not familiar with it and all of my efforts are failing.
You can do this in BASH:
$> arg='1234,1244,1567'
$> echo "\"${arg//,/\",\"}\""
"1234","1244","1567"
awk to the rescue!
$ awk -F, -v OFS='","' -v q='"' '{$1=$1; print q $0 q}' <<< "1234,1244,1567"
"1234","1244","1567"
or shorter with sed
$ sed -r 's/[^,]+/"&"/g' <<< "1234,1244,1567"
"1234","1244","1567"
translating this back to awk
$ awk '{print gensub(/([^,]+)/,"\"\\1\"","g")}' <<< "1234,1244,1567"
"1234","1244","1567"
you can use this:
echo QV=$(echo 1234,2345,56788 | sed -e 's/^/"/' -e 's/$/"/' -e 's/,/","/g')
result:
echo $QV
"1234","2345","56788"
just add double quotes at start, end, and replace commas with quote/comma/quote globally.
easy to do with sed
$ echo '1234,1244,1567' | sed 's/[0-9]*/"\0"/g'
"1234","1244","1567"
[0-9]* zero or more consecutive digits; since * is greedy it will try to match as many as possible
"\0" double quote the matched pattern, entire match is by default saved in \0
g global flag, to replace all such patterns
In case, \0 isn't recognized in some sed versions, use & instead:
$ echo '1234,1244,1567' | sed 's/[0-9]*/"&"/g'
"1234","1244","1567"
Similar solution with perl
$ echo '1234,1244,1567' | perl -pe 's/\d+/"$&"/g'
"1234","1244","1567"
Note: Using * instead of + with perl will give
$ echo '1234,1244,1567' | perl -pe 's/\d*/"$&"/g'
"1234""","1244""","1567"""
""$
I think this difference between sed and perl is similar to this question: GNU sed, ^ and $ with | when first/last character matches
Using sed:
$ echo 1234,1244,1567 | sed 's/\([0-9]\+\)/\"\1\"/g'
"1234","1244","1567"
i.e. replace all strings of digits with the same strings of digits quoted, using backreferencing (\1).

Add prefix to each word of each line in bash

I have a variable called deps:
deps='word1 word2'
I want to add a prefix to each word of the variable.
I tried with:
echo $deps | while read word do \ echo "prefix-$word" \ done
but i get:
bash: syntax error near unexpected token `done'
any help? thanks
With sed :
$ deps='word1 word2'
$ echo "$deps" | sed 's/[^ ]* */prefix-&/g'
prefix-word1 prefix-word2
For well-behaved strings, the best answer is:
printf "prefix-%s\n" $deps
as suggested by 123 in the comments to fedorqui's answer.
Explanation:
Without quoting, bash will split the contents of $deps according to $IFS (which defaults to " \n\t") before calling printf
printf evaluates the pattern for each of the provided arguments and writes the output to stdout.
printf is a shell built-in (at least for bash) and does not fork another process, so this is faster than sed-based solutions.
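For example:
$ deps='word1 word2'
$ printf "prefix-%s\n" $deps
prefix-word1
prefix-word2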
In another question I just came across the markers for the beginning (\<) and end (\>) of words. With those you can shorten SLePort's solution above somewhat. The solution also extends nicely to appending a suffix, which I needed in addition to the prefix but couldn't figure out how to do with the above solution, since the & also includes the possible trailing whitespace after the word.
So my solution is this:
$ deps='word1 word2'
# add prefix:
$ echo "$deps" | sed 's/\</prefix-/g'
prefix-word1 prefix-word2
# add suffix:
$ echo "$deps" | sed 's/\>/-suffix/g'
word1-suffix word2-suffix
Explanation: \< matches the beginning of every word, and \> matches the end of each word. You can simply "replace" these by the prefix/suffix, resulting in them being prepended/appended. There is no need to reference them anymore in the replacement, as these are not "real" characters anyway!
You can read the string into an array and then prepend the string to every item:
$ IFS=' ' read -r -a myarray <<< "word1 word2"
$ printf "%s\n" "${myarray[#]}"
word1
word2
$ printf "prefix-%s\n" "${myarray[#]}"
prefix-word1
prefix-word2

Concise and portable "join" on the Unix command-line

How can I join multiple lines into one line, with a separator where the new-line characters were, and avoiding a trailing separator and, optionally, ignoring empty lines?
Example. Consider a text file, foo.txt, with three lines:
foo
bar
baz
The desired output is:
foo,bar,baz
The command I'm using now:
tr '\n' ',' <foo.txt |sed 's/,$//g'
Ideally it would be something like this:
cat foo.txt |join ,
What's:
the most portable, concise, readable way.
the most concise way using non-standard unix tools.
Of course I could write something, or just use an alias. But I'm interested to know the options.
Perhaps a little surprisingly, paste is a good way to do this:
paste -s -d","
This won't deal with the empty lines you mentioned. For that, pipe your text through grep, first:
grep -v '^$' | paste -s -d"," -
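For example, with the foo.txt above (which has no empty lines, but the grep does no harm):
$ grep -v '^$' foo.txt | paste -s -d"," -
foo,bar,baz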
This sed one-liner should work:
sed -e :a -e 'N;s/\n/,/;ba' file
Test:
[jaypal:~/Temp] cat file
foo
bar
baz
[jaypal:~/Temp] sed -e :a -e 'N;s/\n/,/;ba' file
foo,bar,baz
To handle empty lines, you can remove the empty lines and pipe it to the above one-liner.
sed -e '/^$/d' file | sed -e :a -e 'N;s/\n/,/;ba'
How about using xargs?
for your case
$ cat foo.txt | sed 's/$/, /' | xargs
Be careful about the input length limit of the xargs command. (This means a very long input file cannot be handled this way.)
Perl:
cat data.txt | perl -pe 'if(!eof){chomp;$_.=","}'
or yet shorter and faster, surprisingly:
cat data.txt | perl -pe 'if(!eof){s/\n/,/}'
or, if you want:
cat data.txt | perl -pe 's/\n/,/ unless eof'
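For example, assuming data.txt contains the same three lines as foo.txt:
$ perl -pe 's/\n/,/ unless eof' data.txt
foo,bar,baz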
Just for fun, here's an all-builtins solution
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
You can use printf instead of echo if the trailing newline is a problem.
This works by setting IFS, the delimiters that read will split on, to just newline and not other whitespace, then telling read not to stop reading until it reaches a nul (instead of the newline it usually uses), and to add each item read into the array (-a) data. Then, in a subshell, so as not to clobber the IFS of the interactive shell, we set IFS to , and expand the array with *, which delimits each item in the array with the first character in IFS.
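With the foo.txt above, this gives:
$ IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
foo,bar,baz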
I needed to accomplish something similar, printing a comma-separated list of fields from a file, and was happy with piping STDOUT to xargs and ruby, like so:
cat data.txt | cut -f 16 -d ' ' | grep -o "\d\+" | xargs ruby -e "puts ARGV.join(', ')"
I had a log file where some data was broken into multiple lines. When this occurred, the last character of the first line was the semi-colon (;). I joined these lines by using the following commands:
for LINE in `cat $FILE | tr -s " " "|"`
do
if [ $(echo $LINE | egrep ";$") ]
then
echo "$LINE\c" | tr -s "|" " " >> $MYFILE
else
echo "$LINE" | tr -s "|" " " >> $MYFILE
fi
done
The result is a file where lines that were split in the log file were one line in my new file.
A simple way to join the lines with a space in-place using ex (also ignoring blank lines) is:
ex +%j -cwq foo.txt
If you want to print the results to the standard output, try:
ex +%j +%p -scq! foo.txt
To join lines without spaces, use +%j! instead of +%j.
To use a different delimiter, it's a bit more tricky:
ex +"g/^$/d" +"%s/\n/_/e" +%p -scq! foo.txt
where g/^$/d (or v/\S/d) removes blank lines and s/\n/_/ is a substitution which basically works the same as in sed, but for all lines (%). When parsing is done, print the buffer (%p). And finally, -cq! executes the vi q! command, which basically quits without saving (-s is to silence the output).
Please note that ex is equivalent to vi -e.
This method is quite portable, as most Linux/Unix systems ship with ex/vi by default. And it's more compatible than using sed, where the in-place parameter (-i) is not a standard extension and the utility itself is more stream-oriented, therefore not as portable.
POSIX shell:
( set -- $(cat foo.txt) ; IFS=+ ; printf '%s\n' "$*" )
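The same idea gives the comma-separated output asked for if you set IFS to a comma instead:
$ ( set -- $(cat foo.txt) ; IFS=, ; printf '%s\n' "$*" )
foo,bar,baz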
My answer is:
awk '{printf "%s", ","$0}' foo.txt
printf is enough. We don't need -F"\n" to change the field separator.
