Shell Scripting: Delete all instances of an exact word from file (Not pattern) - bash

I'm trying to delete every instance of a certain word in a file, but I can't stop it from also deleting the pattern inside other words. For example, if I want to remove the word 'the' from the file, it also removes 'the' from 'then' and leaves me with just 'n'.
Right now I have tried:
sed s/"$word"//g -i final_in
And:
sed 's/\<"$word"\>//g' -i final_in
But neither of them have worked. I thought this would be pretty easy to Google, but every solution I find does not work properly.

$ word='the'
$ sed -r "s/\b$word\b//g" << HEREDOC
> Sample text
> therefore
> then
> the sky is blue
> HEREDOC
Sample text
therefore
then
sky is blue
\b matches a word boundary (a GNU sed extension), so 'the' inside 'therefore' and 'then' is left alone.
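To apply the same idea to a file in place, as the question asks, a small wrapper could look like this. It is a sketch that assumes GNU sed (\b and -i without a backup suffix are GNU extensions) and a word with no regex metacharacters or slashes; delete_word is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete whole-word occurrences of WORD from FILE, in place.
# Assumes GNU sed and that WORD contains no regex metacharacters or '/'.
delete_word() {
  local word=$1 file=$2
  # \b only matches between a word character and a non-word character,
  # so "the" inside "then" or "therefore" is left untouched.
  sed -E -i "s/\b${word}\b//g" "$file"
}
```

Usage: delete_word the final_in. Only the word itself is removed; the spaces around it stay as they were.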

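# Problem: a plain substitution also removes 'the' from inside 'then' (this prints 'aaa n bbb')
word='the'
echo 'aaa then bbb' | sed -r "s/$word//g"
# To match the exact word, you can add spaces around it (this misses a word at the very start or end of a line):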
word=then
echo 'aaa then bbb' | sed -e "s/ $word / /g"
# to modify a file
word='the'
sed -r "s/ $word / /g" file.txt > tmp.txt
mv tmp.txt file.txt
# to take punctuation into account (| is used as the s delimiter because / appears inside the brackets):
word=then
echo 'aaa. then, bbb' | sed -e "s|\([:.,;/]\)* *$word *\([:.,;/]\)*|\1 \2|g"
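Padding the word with spaces misses it at the very start or end of a line. A sketch that avoids that, and also avoids the GNU-only \b, treats "start of line or a non-word character" as the boundary and loops until nothing changes. remove_word is a hypothetical name, and the word is assumed to contain no regex metacharacters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove whole-word occurrences of $1 from standard input.
# Portable alternative to GNU \b; assumes $1 has no regex metacharacters or '/'.
remove_word() {
  # (^|[^[:alnum:]_]) and ([^[:alnum:]_]|$) act as word boundaries, and \1\2 puts
  # the neighbouring characters back. Because those characters are consumed,
  # adjacent matches ("the the") need the :a / ta loop, which repeats the
  # substitution until it no longer changes the line.
  sed -E -e ':a' -e "s/(^|[^[:alnum:]_])$1([^[:alnum:]_]|\$)/\1\2/" -e 'ta'
}
```

For example, echo 'aaa. then, bbb' | remove_word then prints "aaa. , bbb", while 'therefore' is left unchanged by remove_word the.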

Related

Sed find and replace expression works with literal but not with variable interpolation

For the following MCVE:
echo "test_num: 0" > test.txt
test_num=$(grep 'test_num:' test.txt | cut -d ':' -f 2)
new_test_num=$((test_num + 1))
echo $test_num
echo $new_test_num
sed -i "s/test_num: $test_num/test_num: $new_test_num/g" test.txt
cat test.txt
echo "sed -i "s/test_num: $test_num/test_num: $new_test_num/g" test.txt"
sed -i "s/test_num: 0/test_num: 1/g" test.txt
cat test.txt
Which outputs
0 # parsed original number correctly
1 # increment the number
test_num: 0 # sed with interpolated variable, does not work
sed -i s/test_num: 0/test_num: 1/g test.txt # interpolated parameter looks right
test_num: 1 # ???
Why does sed -i "s/test_num: $test_num/test_num: $new_test_num/g" test.txt not produce the expected result when sed -i "s/test_num: 0/test_num: 1/g" test.txt works just fine in the above example?
As mentioned in the comments, cut -d ':' -f 2 keeps the space that follows the colon, so ${test_num} holds " 0" rather than "0", and the pattern expands to "test_num:  0" (two spaces), which never matches. Therefore there should be no space between the colon and the variable in your sed expression.
Surrounding the variable name with curly braces {} also helps readability.
sed "s/test_num:${test_num}/test_num: ${new_test_num}/g" test.txt
test_num: 1
If you just want the number in ${test_num}, you can try something like:
grep 'test_num:' test.txt | awk -F ': ' '{print $2}'
awk allows a field separator (-F) longer than one character.
Instead of grep|cut you can also use sed.
#! /bin/bash
exec <<EOF
test_num: 0
EOF
grep 'test_num:' | cut -d ':' -f 2
exec <<EOF
test_num: 0
EOF
sed -n 's/^test_num: //p'
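In a sed regular expression $ means end of line, but inside double quotes the shell expands $test_num before sed ever sees it. Another way to write the command is to close the single quotes around each variable (this only works if the variables contain no whitespace, because they are left unquoted; here ${test_num} would first need its leading space removed):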
sed -i 's/test_num: '$test_num'/test_num: '$new_test_num'/g' test.txt
Another option is to build the sed expression in a variable first. A plain double-quoted assignment is enough to expand the variables; echo is not needed:
sed_command="s/test_num:${test_num}/test_num: ${new_test_num}/g"
sed -i "$sed_command" test.txt
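Putting the fixes together (capture only the digits, then keep the whole sed expression in one double-quoted string), a minimal sketch might look like this. It assumes GNU sed (-i without a backup suffix) and exactly one "test_num:" line in the file; increment_test_num is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bump the number on the "test_num: N" line of FILE in place.
# Assumes GNU sed (BSD sed needs -i '') and a single "test_num:" line.
increment_test_num() {
  local file=$1 n
  # Capture only the digits, so no leading space ends up in $n.
  n=$(sed -n 's/^test_num: *\([0-9][0-9]*\)$/\1/p' "$file")
  [ -n "$n" ] || return 1   # no "test_num:" line found
  # The shell expands ${n} and $((n + 1)); \$ stays a literal end-of-line anchor for sed.
  sed -i "s/^test_num: *${n}\$/test_num: $((n + 1))/" "$file"
}
```

Usage: increment_test_num test.txt turns "test_num: 0" into "test_num: 1".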

How to remove extra spaces in bash?

How to remove extra spaces in variable HEAD?
HEAD="   how to    remove   extra    spaces   "
Result:
how to remove extra spaces
Try this:
echo "$HEAD" | tr -s " "
or maybe you want to save it in a variable:
NEWHEAD=$(echo "$HEAD" | tr -s " ")
Update
To remove leading and trailing whitespaces, do this:
NEWHEAD=$(echo "$HEAD" | tr -s " ")
NEWHEAD=${NEWHEAD%% }
NEWHEAD=${NEWHEAD## }
Using awk:
$ echo "$HEAD" | awk '$1=$1'
how to remove extra spaces
Take advantage of the word-splitting effects of not quoting your variable
$ HEAD=" how to remove extra spaces "
$ set -- $HEAD
$ HEAD=$*
$ echo ">>>$HEAD<<<"
>>>how to remove extra spaces<<<
If you don't want to use the positional parameters, use an array:
ary=($HEAD)
HEAD="${ary[*]}"
echo "$HEAD"
One dangerous side-effect of not quoting is that filename expansion will be in play. So turn it off first, and re-enable it after:
$ set -f
$ set -- $HEAD
$ set +f
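The same trick can be wrapped in a function whose body runs in a subshell, so that set -f and the IFS change cannot leak into the caller. squeeze_spaces is a hypothetical name; this is a sketch, not the answer author's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper. The body is in ( ) so set -f and IFS only apply to a subshell.
squeeze_spaces() (
  set -f               # no filename expansion: a word like * stays a literal *
  IFS=$' \t\n'         # split on runs of spaces, tabs and newlines
  set -- $1            # deliberately unquoted: word splitting drops the extra blanks
  printf '%s\n' "$*"   # "$*" joins the words with the first IFS character, a space
)
```

Usage: HEAD=$(squeeze_spaces "$HEAD").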
This horse isn't quite dead yet: Let's keep beating it!*
Read into array
Other people have mentioned read, but since using unquoted expansion may cause undesirable expansions, all answers using it can be regarded as more or less the same. You could do
set -f
read HEAD <<< $HEAD
set +f
or you could do
read -rd '' -a HEAD <<< "$HEAD" # Assuming the default IFS
HEAD="${HEAD[*]}"
Extended Globbing with Parameter Expansion
$ shopt -s extglob
$ HEAD="${HEAD//+( )/ }" HEAD="${HEAD# }" HEAD="${HEAD% }"
$ printf '"%s"\n' "$HEAD"
"how to remove extra spaces"
*No horses were actually harmed – this was merely a metaphor for getting six+ diverse answers to a simple question.
Here's how I would do it with sed:
string=' how to remove extra spaces '
echo "$string" | sed -e 's/ \{2,\}/ /g' -e 's/^ //' -e 's/ $//'
=> how to remove extra spaces # (no spaces at beginning or end)
The first sed expression replaces any run of two or more spaces with a single space, and the other two remove the single leading and trailing space that may be left over.
echo -e " abc \t def "|column -t|tr -s " "
column -t will:
remove the spaces at the beginning and at the end of the line
convert tabs to spaces
tr -s " " will squeeze multiple spaces to single space
BTW, to see exactly what the output contains, use cat -A (GNU cat), which shows non-printing characters such as tabs (^I) and each end of line ($):
echo -e " abc \t def "|cat - -A
output: abc ^I def $
echo -e " abc \t def "|column -t|tr -s " "|cat - -A
output:
abc def$
Whitespace can take the form of both spaces and tabs. Although both are invisible to us, sed and other tools treat them as different characters and only operate on what you ask for. That is, if you tell sed to delete a number of spaces, it will do so, but the expression will not match tabs. The inverse is also true: a tab in the expression will not match spaces, however many of them there are.
A more extensible solution that removes extra whitespace in the form of spaces, tabs, or both (I've tested mixing both in your sample variable) is:
echo $HEAD | sed 's/^[[:blank:]]*//g'
Or we can tighten up @Frontear's excellent suggestion of using xargs without the tr:
echo $HEAD | xargs
However, note that xargs also removes newlines: if you cat a file and pipe it to xargs, all the extra whitespace, including newlines, is removed and everything ends up on one line ;-). xargs also treats quotes and backslashes specially, so input containing an apostrophe can make it fail.
Both of the foregoing achieved your desired result in my testing.
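Following the point about tabs, a sed-only sketch that squeezes any run of spaces and tabs into one space and then trims both ends could look like this. normalize_blanks is a hypothetical name; [[:blank:]] is the POSIX class for space and tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse runs of spaces/tabs to one space, then trim both ends.
normalize_blanks() {
  printf '%s\n' "$1" |
    sed -e 's/[[:blank:]][[:blank:]]*/ /g' \
        -e 's/^ //' \
        -e 's/ $//'
}
```

Unlike xargs, this leaves quotes, backslashes and newlines inside the value alone.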
Try this one:
echo ' how to remove extra spaces ' | sed -e 's/^ *//' -e 's/ *$//' -e 's/ \{2,\}/ /g'
or
HEAD=" how to remove extra spaces "
HEAD=$(echo "$HEAD" | sed -e 's/^ *//' -e 's/ *$//' -e 's/ \{2,\}/ /g')
I would make use of tr to remove the extra spaces, and xargs to trim the back and front.
TEXT=" This is some text "
echo "$TEXT" | tr -s " " | xargs
# This is some text
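Echoing the variable without quotes does what you want (word splitting drops the extra spaces, though the unquoted expansion is also subject to filename expansion):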
HEAD=" how to remove extra spaces "
echo $HEAD
# or assign to new variable
NEW_HEAD=$(echo $HEAD)
echo $NEW_HEAD
output: how to remove extra spaces

Extract words from files

How can I extract all the words from a file, every word on a single line?
Example:
test.txt
This is my sample text
Output:
This
is
my
sample
text
The tr command can do this...
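tr '[:blank:]' '\n' < test.txt
This asks tr to replace each blank (space or tab) with a newline. The class is quoted so that the shell cannot glob-expand [:blank:] against single-character file names.
The output goes to stdout, but it could be redirected to another file, result.txt:
tr '[:blank:]' '\n' < test.txt > result.txt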
And here the obvious bash line:
for i in $(< test.txt)
do
printf '%s\n' "$i"
done
EDIT: Even shorter:
printf '%s\n' $(< test.txt)
That's all there is to it, no special (pathetic) cases included: multiple consecutive word separators and leading or trailing separators are handled by Doing The Right Thing (TM). One caveat: the unquoted expansion is also subject to filename expansion, so a word such as * is replaced by file names unless you run set -f first. You can adjust what counts as a word separator with the $IFS variable; see the bash manual.
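A sketch of the same idea that turns filename expansion off and keeps the IFS change local by running the function body in a subshell. print_words is a hypothetical name, and $(< file) is a bash feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every whitespace-separated word of FILE on its own line.
print_words() (
  set -f                   # keep words like * literal instead of globbing them
  IFS=$' \t\n'             # word separators; change this to redefine what a word is
  printf '%s\n' $(< "$1")  # unquoted on purpose so the contents are word-split
)
```

Usage: print_words test.txt. An empty file still produces one empty line, because printf applies its format once even with no arguments.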
The above answer doesn't handle multiple spaces and such very well. An alternative would be
perl -p -e '$_ = join("\n",split);' test.txt
which would. E.g.
esben@mosegris:~/ange/linova/build master $ echo "test  test" | tr [:blank:] '\n'
test

test
But
esben@mosegris:~/ange/linova/build master $ echo "test  test" | perl -p -e '$_ = join("\n",split);'
test
test
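Staying with tr, its -s option squeezes each run of repeated newlines in the output into one, which also covers the repeated-separator case (input that starts with a blank still yields one empty first line). words_per_line is a hypothetical name for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate every space/tab on stdin to a newline,
# then squeeze repeated newlines (-s applies to the last set, '\n') into one.
words_per_line() {
  tr -s '[:blank:]' '\n'
}
```

Usage: words_per_line < test.txt > result.txt.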
This might work for you (GNU sed; \s, \+ and \n in the replacement are GNU extensions):
# echo -e "this is\tmy\nsample text" | sed 's/\s\+/\n/g'
this
is
my
sample
text
A Perl answer would be:
pearl.214> cat file1
a b c d e f
pearl.215> perl -p -e 's/ /\n/g' file1
a
b
c
d
e
f
pearl.216>

SED: First and last empty lines not removed

I'm running the following but it's returning with empty lines at the top and bottom of the new file.
How do I output to a new file without these empty lines?
input | sed -E '/^$/d' > file.txt
The following has no effect either.
sed '1d'
sed '$d'
I'm unsure of where the expression has problems.
If you are comfortable using awk then this would work -
awk 'NF' INPUT_FILE > OUTPUT_FILE
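grep . file_name > outfile would do the job for you (it keeps only lines with at least one character, so lines holding just spaces or a carriage return are kept as well).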
This might work for you:
echo -e " \t\r\nsomething\n \t \r\n" | sed '/^\s*$/d' | cat -n
1 something
N.B. This removes all blank lines. To preserve blank lines in the body of a file, use:
echo -e " \t\r\n something\n \nsomething else \n \t \r\n" |
sed ':a;$!{N;ba};s/^\(\s*\n\)*\|\(\s*\n\)*$//g'
something
something else
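The lines that survive /^$/d are likely not empty at all: they may hold spaces, tabs or a carriage return from a CRLF file, as the test input above shows. A portable awk sketch that drops such whitespace-only lines only at the start and end, keeping blank lines in the middle, could be (trim_blank_lines is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove leading and trailing blank lines from stdin.
# A line counts as blank if it holds only spaces, tabs or a carriage return.
trim_blank_lines() {
  awk '
    /^[ \t\r]*$/ {
      if (started) pending = pending $0 "\n"   # possibly interior: hold it back
      next                                     # leading blanks are simply skipped
    }
    {
      printf "%s%s\n", pending, $0             # a non-blank line follows, so flush held blanks
      pending = ""
      started = 1
    }
  '
}
```

Blank lines still held when the input ends are never printed, which is what removes the trailing ones. Usage: input | trim_blank_lines > file.txt.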

Using sed to replace a string with the contents of a variable, even if it's an escape character

I'm using
sed -e "s/\*DIVIDER\*/$DIVIDER/g" to replace *DIVIDER* with a user-specified string, which is stored in $DIVIDER. The problem is that I want users to be able to specify escape sequences as their divider, like \n or \t. When I try this, I just end up with the letter n or t, and so on.
Does anyone have any ideas on how to do this? It would be greatly appreciated!
EDIT: Here's the meat of the script; I must be missing something.
curl --silent "$URL" > tweets.txt
if [[ `cat tweets.txt` == *\<error\>* ]]; then
grep -E '(error>)' tweets.txt | \
sed -e 's/<error>//' -e 's/<\/error>//' |
sed -e 's/<[^>]*>//g' |
head $headarg | sed G | fmt
else
echo $REPLACE | awk '{gsub(".", "\\\\&");print}'
grep -E '(description>)' tweets.txt | \
sed -n '2,$p' | \
sed -e 's/<description>//' -e 's/<\/description>//' |
sed -e 's/<[^>]*>//g' |
sed -e 's/\&amp\;/\&/g' |
sed -e 's/\&lt\;/\</g' |
sed -e 's/\&gt\;/\>/g' |
sed -e 's/\&quot\;/\"/g' |
sed -e 's/\&....\;/\?/g' |
sed -e 's/\&.....\;/\?/g' |
sed -e 's/^ *//g' |
sed -e :a -e '$!N;s/\n/\*DIVIDER\*/;ta' | # Replace newlines with *divider*.
sed -e "s/\*DIVIDER\*/${DIVIDER//\\/\\\\}/g" | # Replace *DIVIDER* with the actual divider.
head $headarg | sed G
fi
The long list of sed lines replaces characters from an XML source, and the last two are the ones that are supposed to replace the newlines with the specified divider. I know it seems redundant to replace a newline with another newline, but it was the easiest way I could come up with to let users pick their own divider. The divider replacement works great with normal characters.
You can use bash to escape the backslash like this:
sed -e "s/\*DIVIDER\*/${DIVIDER//\\/\\\\}/g"
The syntax is ${name/pattern/string}. If pattern begins with /, every occurrence of pattern in name is replaced by string; otherwise only the first occurrence is replaced.
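To see what that expansion actually does, here is a small sketch with an example value (DIVIDER='xx\n' is just a sample input). Each backslash is doubled, so sed then copies the divider literally, backslash included, rather than reading \n as an escape.

```shell
#!/usr/bin/env bash
DIVIDER='xx\n'                       # example user input: x, x, backslash, n
escaped=${DIVIDER//\\/\\\\}          # every \ becomes \\, giving xx\\n
printf '%s\n' "$escaped"
# sed reads \\ as one literal backslash, so the result stays on one line: axx\nb
result=$(echo 'a*DIVIDER*b' | sed "s/\*DIVIDER\*/$escaped/")
printf '%s\n' "$result"
```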
Maybe:
case "$DIVIDER" in
(*\\*) DIVIDER=$(echo "$DIVIDER" | sed 's/\\/\\\\/g');;
esac
I played with this script:
for DIVIDER in 'xx\n' 'xxx\\ddd' "xxx"
do
echo "In: <<$DIVIDER>>"
case "$DIVIDER" in (*\\*) DIVIDER=$(echo "$DIVIDER" | sed 's/\\/\\\\/g');;
esac
echo "Out: <<$DIVIDER>>"
done
Run with 'ksh' or 'bash' (but not 'sh') on MacOS X:
In: <<xx\n>>
Out: <<xx\\n>>
In: <<xxx\\ddd>>
Out: <<xxx\\\\ddd>>
In: <<xxx>>
Out: <<xxx>>
It seems to be a simple substitution:
$ d='\n'
$ echo "a*DIVIDER*b" | sed "s/\*DIVIDER\*/$d/"
a
b
Maybe I don't understand what you're trying to accomplish.
Then maybe this step could take the place of the last two of yours:
sed -n ":a;$ {s/\n/$DIVIDER/g;p;b};N;ba"
Note the space after the dollar sign. It prevents the shell from interpreting "${s..." as a variable name.
And as ghostdog74 suggested, you have way too many calls to sed. You may be able to change a lot of the pipe characters to backslashes (line continuation) and delete "sed" from all but the first one (leave the "-e" everywhere). (untested)
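Along those lines, the tag-stripping and entity-decoding steps from the question could run in one sed process with several -e expressions. This is a sketch: the expressions are simplified versions of the question's (the catch-all &....; rules are left out), and decode_description is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one sed process instead of a pipeline of single-expression seds.
decode_description() {
  sed -e 's/<description>//' -e 's/<\/description>//' \
      -e 's/<[^>]*>//g' \
      -e 's/&amp;/\&/g' \
      -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e 's/^ *//'
}
```

For example, printf '<description>  Tom &amp; Jerry</description>\n' | decode_description prints "Tom & Jerry".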
You just need to escape the escape character. In a sed regular expression:
\\ matches a single literal backslash
\\n matches a backslash followed by n
\n on its own matches a newline
Using FreeBSD sed (e.g. on Mac OS X) you have to preprocess the $DIVIDER user input:
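d='\n'                  # user input as typed: backslash + n (or '\t')
NL=$'\\\n'              # backslash + real newline: how BSD sed expects a newline in the replacement
TAB=$'\\\t'             # backslash + real tab
d="${d/\\n/${NL}}"      # turn the two characters \n into the sed newline sequence
d="${d/\\t/${TAB}}"     # likewise for \t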
echo "a*DIVIDER*b" | sed -E -e "s/\*DIVIDER\*/${d}/"
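If bash is available anyway, another option (a different technique from the sed answers above) is to skip sed for this step: printf -v var '%b' turns \n and \t into real characters, and ${var//pattern/string} does the replacement. replace_divider is a hypothetical name for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every literal *DIVIDER* in $1 with the divider $2,
# interpreting backslash escapes such as \n and \t in $2. Requires bash.
replace_divider() {
  local marker='*DIVIDER*' div result
  printf -v div '%b' "$2"     # '\n' -> newline, '\t' -> tab; no $( ), so a trailing newline survives
  # Quoting "$marker" makes the * characters literal; quoting "$div" stops
  # bash 5.2+ from treating & in the divider as "the matched text".
  result=${1//"$marker"/"$div"}
  printf '%s\n' "$result"
}
```

Usage: replace_divider "$text" "$DIVIDER".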
