Replace '>' with '>\n' in several files in shell/bash - bash

I have several files in a single folder and I want to replace the character > with >\n everywhere in all of those files.
But whatever I do, the \n character does not get added after the > character.
I have tried the following:
echo '>ABCCHACAC' | tr '\>' '>\\n'
echo '>ABCCHACAC' | tr '>' '>\\n'
echo '>ABCCHACAC' | tr '>' '>\n'
echo '>ABCCHACAC' | tr '>' '\>\n'
echo '>ABCCHACAC' | tr '>' '\>\\n'
echo '>ABCCHACAC' | tr '>' '\>\\n'
But I get the same input string as output, whereas the correct output I want is:
>
ABCCHACAC
And I am using this script to do the same thing on many files:
for f in *.txt
do
tr ">" ">\n" < "$f" > $(basename "$f" .txt)_newline_added.txt
done

tr is for one-for-one character replacements, not replacing strings. E.g. if you translate abc with def, it replaces all a with d, all b with e, and all c with f. When the second string is longer than the first, the extra characters are ignored. So tr '>' '>\n' means to replace > with > and ignores \n.
Use sed to perform string replacements.
sed 's/>/>\n/g' "$f" > "$(basename "$f" .txt)_newline_added.txt"

In addition to Barmar's answer, if you're using a BSD based *nix (eg. OS X) you'll either need to include an escaped literal newline, or possibly use tr in addition to sed.
Escaped literal newline:
$ sed 's/^>/>\
/' "$f"
sed with tr:
$ sed 's/^>/>▾/' "$f" | tr '▾' '\n'
↳ Insert newline (\n) using sed

Related

Bash: Replacing "" with newline character, using sed or tr

I'm trying to format output in a way that inserts newline characters after each 'line', with lines denoted by double quotes (""). The quotes themselves are temporary and to be stripped in a later step.
Input:
"a",1,"aa""b",2,"bb"
Output:
a,1,aa
b,2,bb
I've tried:
sed 's/""/\n/'
sed 's/""/\/g'
tr '""' '\n'
But tr seems to replace every quote character and sed seems to insert \n as text instead of a newline. What can I do to make this work?
echo '"a",1,"aa""b",2,"bb"' |awk -v RS='""' '{$1=$1} {gsub(/"/,"")}1'
a,1,aa
b,2,bb
or using sed:
echo '"a",1,"aa""b",2,"bb"' |sed -e 's/""/\n/' -e 's/"//g' # OR sed -e 's/""/\n/;s/"//g'
a,1,aa
b,2,bb
awk solution: Here the default record separator is changed from new line to "". So awk will consider the EOL when it hits "".
sed solution: Here first "" are converted into new line and second replacement is to remove " from each line.
neech#nicolaw.uk:~ $ cat file.txt
"a",1,"aa""b",2,"bb"
neech#nicolaw.uk:~ $ sed 's/""/\n/' file.txt | tr -d '"'
a,1,aa
b,2,bb
You seem to be dealing with POSIX sed, which does not have support for the \n notation. Insert an actual new-line into the pattern, either:
sed 's/""/\
/'
Or:
sed 's/""/\'$'\n''/'
E.g.:
sed 's/""/\
/' | tr -d \"
Output:
a,1,aa
b,2,bb
As suggested by George Vasiliou if you have perl you could use:
> echo '"a",1,"aa""b",2,"bb"' | perl -pe 's/""/"\n"/g;s/"//g'
This avoids the non portable sed problem.
Or for a crappy hack version.
Replace the "" with another character and then use tr (since tr should work with \n) to replace it with \n instead then remove the single " after.
So you can get the "" replaced with newline like this:
sed 's/""/#/g' | tr '#' '\n'
Then the rest follows:
> echo '"a",1,"aa""b",2,"bb"'| sed 's/""/#/g' | tr '#' '\n' | sed 's/\"//g'

removing all characters expect a-z 0-9 in bash

I am trying to remove all characters besides characters a-z and 0-9 in a bash file here is what I have so far:
#!/bin/bash
i=-1
cat rtrans.txt | while read line
do
i=$((i+1))
for word in $line
do
echo "$i $word"|tr A-Z a-z|sed 's/[\._-]//g'
done
done > input1.test
However with sed it seems like I have to input all different non characters I want to remove.
This there a better way of doing this?
You can use a character class
echo "$i $word" | tr A-Z a-z | sed -e 's/[^a-z0-9]//g'
this removes all characters not ^ in [a-z0-9].
If you want to split a file into words and number the lines consecutively, you can also try
tr -s ' \t' '\n' <rtrans.txt | tr A-Z a-z | sed -e 's/[^a-z]//g' | nl -n ln -w1 -s ' '
You can use ${var/Pattern/Replacement} as suggested with bash parameter substitution.
In your case, to remove from $word all characters besides a-z, A-Z and 0-9:
echo "$i ${word//[^a-zA-Z0-9]/}"

How to remove extra spaces in bash?

How to remove extra spaces in variable HEAD?
HEAD=" how to remove extra spaces "
Result:
how to remove extra spaces
Try this:
echo "$HEAD" | tr -s " "
or maybe you want to save it in a variable:
NEWHEAD=$(echo "$HEAD" | tr -s " ")
Update
To remove leading and trailing whitespaces, do this:
NEWHEAD=$(echo "$HEAD" | tr -s " ")
NEWHEAD=${NEWHEAD%% }
NEWHEAD=${NEWHEAD## }
Using awk:
$ echo "$HEAD" | awk '$1=$1'
how to remove extra spaces
Take advantage of the word-splitting effects of not quoting your variable
$ HEAD=" how to remove extra spaces "
$ set -- $HEAD
$ HEAD=$*
$ echo ">>>$HEAD<<<"
>>>how to remove extra spaces<<<
If you don't want to use the positional paramaters, use an array
ary=($HEAD)
HEAD=${ary[#]}
echo "$HEAD"
One dangerous side-effect of not quoting is that filename expansion will be in play. So turn it off first, and re-enable it after:
$ set -f
$ set -- $HEAD
$ set +f
This horse isn't quite dead yet: Let's keep beating it!*
Read into array
Other people have mentioned read, but since using unquoted expansion may cause undesirable expansions all answers using it can be regarded as more or less the same. You could do
set -f
read HEAD <<< $HEAD
set +f
or you could do
read -rd '' -a HEAD <<< "$HEAD" # Assuming the default IFS
HEAD="${HEAD[*]}"
Extended Globbing with Parameter Expansion
$ shopt -s extglob
$ HEAD="${HEAD//+( )/ }" HEAD="${HEAD# }" HEAD="${HEAD% }"
$ printf '"%s"\n' "$HEAD"
"how to remove extra spaces"
*No horses were actually harmed – this was merely a metaphor for getting six+ diverse answers to a simple question.
Here's how I would do it with sed:
string=' how to remove extra spaces '
echo "$string" | sed -e 's/ */ /g' -e 's/^ *\(.*\) *$/\1/'
=> how to remove extra spaces # (no spaces at beginning or end)
The first sed expression replaces any groups of more than 1 space with a single space, and the second expression removes any trailing or leading spaces.
echo -e " abc \t def "|column -t|tr -s " "
column -t will:
remove the spaces at the beginning and at the end of the line
convert tabs to spaces
tr -s " " will squeeze multiple spaces to single space
BTW, to see the whole output you can use cat - -A: shows you all spacial characters including tabs and EOL:
echo -e " abc \t def "|cat - -A
output: abc ^I def $
echo -e " abc \t def "|column -t|tr -s " "|cat - -A
output:
abc def$
Whitespace can take the form of both spaces and tabs. Although they are non-printing characters and unseen to us, sed and other tools see them as different forms of whitespace and only operate on what you ask for. ie, if you tell sed to delete x number of spaces, it will do this, but the expression will not match tabs. The inverse is true- supply a tab to sed and it will not match spaces, even if the number of them is equal to those in a tab.
A more extensible solution that will work for removing either/both additional space in the form of spaces and tabs (I've tested mixing both in your specimen variable) is:
echo $HEAD | sed 's/^[[:blank:]]*//g'
or we can tighten-up #Frontear 's excellent suggestion of using xargs without the tr:
echo $HEAD | xargs
However, note that xargs would also remove newlines. So if you were to cat a file and pipe it to xargs, all the extra space- including newlines- are removed and everything put on the same line ;-).
Both of the foregoing achieved your desired result in my testing.
Try this one:
echo ' how to remove extra spaces ' | sed 's/^ *//g' | sed 's/$ *//g' | sed 's/ */ /g'
or
HEAD=" how to remove extra spaces "
HEAD=$(echo "$HEAD" | sed 's/^ *//g' | sed 's/$ *//g' | sed 's/ */ /g')
I would make use of tr to remove the extra spaces, and xargs to trim the back and front.
TEXT=" This is some text "
echo $(echo $TEXT | tr -s " " | xargs)
# [...]$ This is some text
echo variable without quotes does what you want:
HEAD=" how to remove extra spaces "
echo $HEAD
# or assign to new variable
NEW_HEAD=$(echo $HEAD)
echo $NEW_HEAD
output: how to remove extra spaces

Remove blank spaces with comma in a string in bash shell

I would like to replace blank spaces/white spaces in a string with commas.
STR1=This is a string
to
STR1=This,is,a,string
Without using external tools:
echo ${STR1// /,}
Demo:
$ STR1="This is a string"
$ echo ${STR1// /,}
This,is,a,string
See bash: Manipulating strings.
Just use sed:
echo $STR1 | sed 's/ /,/g'
or pure BASH way::
echo ${STR1// /,}
kent$ echo "STR1=This is a string"|awk -v OFS="," '$1=$1'
STR1=This,is,a,string
Note:
if there are continued blanks, they would be replaced with a single comma. as example above shows.
This might work for you:
echo 'STR1=This is a string' | sed 'y/ /,/'
STR1=This,is,a,string
or:
echo 'STR1=This is a string' | tr ' ' ','
STR1=This,is,a,string
How about
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]]/,/g')"
echo "$StrFix"
**output**
This,is,a,string
If you have multiple adjacent spaces in your string and what to reduce them to just 1 comma, then change the sed to
STR1="This is a string"
StrFix="$( echo "$STR1" | sed 's/[[:space:]][[:space:]]*/,/g')"
echo "$StrFix"
**output**
This,is,a,string
I'm using a non-standard sed, and so have used ``[[:space:]][[:space:]]*to indicate one or more "white-space" characters (including tabs, VT, maybe a few others). In a modern sed, I would expect[[:space:]]+` to work as well.
STR1=`echo $STR1 | sed 's/ /,/g'`

Extract words from files

How can I extract all the words from a file, every word on a single line?
Example:
test.txt
This is my sample text
Output:
This
is
my
sample
text
The tr command can do this...
tr [:blank:] '\n' < test.txt
This asks the tr program to replace white space with a new line.
The output is stdout, but it could be redirected to another file, result.txt:
tr [:blank:] '\n' < test.txt > result.txt
And here the obvious bash line:
for i in $(< test.txt)
do
printf '%s\n' "$i"
done
EDIT Still shorter:
printf '%s\n' $(< test.txt)
That's all there is to it, no special (pathetic) cases included (And handling multiple subsequent word separators / leading / trailing separators is by Doing The Right Thing (TM)). You can adjust the notion of a word separator using the $IFS variable, see bash manual.
The above answer doesn't handle multiple spaces and such very well. An alternative would be
perl -p -e '$_ = join("\n",split);' test.txt
which would. E.g.
esben#mosegris:~/ange/linova/build master $ echo "test test" | tr [:blank:] '\n'
test
test
But
esben#mosegris:~/ange/linova/build master $ echo "test test" | perl -p -e '$_ = join("\n",split);'
test
test
This might work for you:
# echo -e "this is\tmy\nsample text" | sed 's/\s\+/\n/g'
this
is
my
sample
text
perl answer will be :
pearl.214> cat file1
a b c d e f pearl.215> perl -p -e 's/ /\n/g' file1
a
b
c
d
e
f
pearl.216>

Resources