How to remove consecutive repeating characters from every line?

I have the below lines in a file
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Profilicollis;Profilicollis_altmani;
Acanthocephala;Eoacanthocephala;Neoechinorhynchida;Neoechinorhynchidae;;;;
Acanthocephala;;;;;;;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Polymorphus;;
and I want to remove the repeated semicolons from all lines so that they look like this (note: some of the lines above also have repeated semicolons in the middle):
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;Profilicollis;Profilicollis_altmani;
Acanthocephala;Eoacanthocephala;Neoechinorhynchida;Neoechinorhynchidae;
Acanthocephala;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;Polymorphus;
I would appreciate if someone could kindly share a bash one-liner to accomplish this.

You can use tr with "squeeze":
tr -s ';' < infile
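As a quick check of the squeeze behaviour, here is a minimal sketch that runs tr -s on one of the sample lines from the question. The variable names sample and squeezed are only for illustration.

```shell
# One of the sample lines from the question (variable names are illustrative).
sample='Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Polymorphus;;'

# -s squeezes each run of the listed character down to a single occurrence.
squeezed=$(printf '%s\n' "$sample" | tr -s ';')
printf '%s\n' "$squeezed"
```

Because tr only reads standard input, the file has to be redirected into it, as in the command above.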

perl -p -e 's/;+/;/g' myfile # writes output to stdout
or
perl -p -i -e 's/;+/;/g' myfile # does an in-place edit

If you want to edit the file itself:
printf "%s\n" 'g/;;/s/;\{2,\}/;/g' w | ed -s foo.txt
If you want to pipe a modified copy of the file to something else and leave the original unchanged:
sed 's/;\{2,\}/;/g' foo.txt | whatever
These replace runs of 2 or more semicolons with single ones.

This can be solved easily with substitutions.
Here is an awk solution that works by playing with the FS/OFS variables:
awk -F';+' -v OFS=';' '$1=$1' file
or
awk -F';+' -v OFS=';' '($1=$1)||1' file
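The difference between the two forms shows up when the first field is empty. The following is a small sketch with made-up input lines (not from the question), assuming an awk that accepts a regular expression as the field separator:

```shell
# Two illustrative lines; the second starts with semicolons, so $1 is empty.
input=$(printf '%s\n' 'x;;y' ';;b;;c')

# Bare assignment as the pattern: a line is printed only if the new $1 is truthy.
plain=$(printf '%s\n' "$input" | awk -F';+' -v OFS=';' '$1=$1')

# "||1" forces the pattern to be true, so every line is printed after the rebuild.
robust=$(printf '%s\n' "$input" | awk -F';+' -v OFS=';' '($1=$1)||1')

printf 'plain:\n%s\nrobust:\n%s\n' "$plain" "$robust"
```

With the bare assignment the second line is silently dropped, which is why the second form is the safer choice for data that may start with the separator.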

Here's a sed version of alaniwi's answer:
sed 's/;\+/;/g' myfile # Write output to stdout
or
sed -i 's/;\+/;/g' myfile # Edit the file in-place
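Note that \+ in a basic regular expression is a GNU extension. As a hedged sketch, the same substitution can be written with only the standard * operator, which should behave the same on other sed implementations:

```shell
# Sample line from the question (variable names are illustrative).
sample='Acanthocephala;;;;;;;'

# ";;*" is one semicolon followed by zero or more, i.e. the same runs ";\+" matches.
portable=$(printf '%s\n' "$sample" | sed 's/;;*/;/g')
printf '%s\n' "$portable"
```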

Related

Linux Shell script Sed inserting

I need to write a shell script to insert a parameter string after every big letter in a file.
$parameter="4"
Example input.txt
AppLe
House
Example output.txt
A4ppL4e
H4ouse
I've tried to use
sed '/[A-Z]/i\$1\'
Can anyone help me?
THX

With a sed that supports the -i option (GNU and busybox sed shown here; BSD sed needs -i '' as two separate arguments):
param=4
sed -i -e 's/\([[:upper:]]\)/\1'"$param"'/g' input.txt
This captures each uppercase letter in the group \(...\) and replaces it, globally and in place, with the captured letter \1 followed by the value of the variable param.
With standard sed you need a temporary file or sponge from the moreutils package:
param=4
sed 's/\([[:upper:]]\)/\1'"$param"'/g' input.txt > temp && mv temp input.txt
param=4
sed 's/\([[:upper:]]\)/\1'"$param"'/g' input.txt | sponge input.txt
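To try the substitution without touching any file, here is a minimal sketch that feeds the two example words from the question through sed on standard input. The variable name result is only for illustration.

```shell
param=4

# The two example words from the question, fed on stdin instead of input.txt.
result=$(printf '%s\n' 'AppLe' 'House' | sed 's/\([[:upper:]]\)/\1'"$param"'/g')
printf '%s\n' "$result"
```

If param could contain a / or an &, it would have to be escaped first, because both are special in the replacement part of the s command.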

Use a file editor like ed to edit files:
printf "%s\n" 'g/[[:upper:]]/s/\([[:upper:]]\)/\1'"$param"'/g' w | ed -s input.txt
or if you like heredocs better
ed -s input.txt <<EOF
g/[[:upper:]]/s/\([[:upper:]]\)/\1${param}/g
w
EOF

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?

To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
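If the differences between the -i variants are a concern, one option is to avoid -i altogether. The following is only a sketch: the function name delete_matching_lines is hypothetical, and it assumes mktemp is available on the system.

```shell
# Hypothetical helper (not from the answers above): delete lines matching a
# basic regular expression without depending on any particular sed -i syntax.
delete_matching_lines() {
    pattern=$1
    file=$2
    tmp=$(mktemp) || return 1   # mktemp is assumed to be available
    # Write the filtered copy first; copy it back only if sed succeeded.
    # Copying with cat (rather than mv) keeps the original permissions and inode.
    sed "/$pattern/d" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway file.
demo=$(mktemp)
printf '%s\n' 'keep this' 'pattern to match here' 'keep this too' > "$demo"
delete_matching_lines 'pattern to match' "$demo"
result=$(cat "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

The pattern is pasted into the sed script as-is, so a pattern containing / would need escaping before being passed in.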

There are many other ways to delete lines with specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /test/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while IFS= read -r line
do
[[ ! $line =~ pattern ]] && printf '%s\n' "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file

You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.

The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile

You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes given Ex command (man ex), same as -c which executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference with sed is that:
sed is a Stream EDitor, not a file editor. (BashFAQ)
Use it for editing files only if you can accept unportable code, I/O overhead and some other bad side effects. Basically, some parameters (such as in-place/-i) are non-standard FreeBSD extensions and may not be available on other operating systems.

I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" "$file"
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is the use of double quotes in "/$pattern/d". The variable is not expanded when we use single quotes.

You can also use this:
grep -v 'pattern' filename
Here -v inverts the match, so only the lines that do not match the pattern are printed.

To get an in-place-like result with grep you can do this:
echo "$(grep -v "pattern" filename)" >filename

I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C; it does not seem to change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s

Delete the matching lines from all files that contain the match:
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'

SED:
'/James\|John/d'
-n '/James\|John/!p'
AWK:
'!/James|John/'
/James|John/ {next;} {print}
GREP:
-v 'James\|John'

perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).

You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove all lines from CREATE PROCEDURE through END ;, including those two lines themselves.
I have cleaned up many SQL files with this sed command.
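To see how an address range behaves, here is a minimal sketch on a made-up SQL fragment (the statements are invented for the demo, only the marker lines matter):

```shell
# A made-up SQL fragment; the procedure body sits between the two marker lines.
sql=$(printf '%s\n' 'SELECT 1;' 'CREATE PROCEDURE demo()' 'BEGIN' 'END ;' 'SELECT 2;')

# The range starts at a line matching the first regex and ends at the next line
# matching the second one; both marker lines are deleted along with the body.
remaining=$(printf '%s\n' "$sql" | sed '/CREATE PROCEDURE.*/,/END ;/d')
printf '%s\n' "$remaining"
```

If the closing marker never appears, the range runs to the end of the input, so it is worth checking the output before overwriting the original file.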

echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt

Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep - w for whole. That is, for example if you want to delete the lines that have number 11, but keep the lines with number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with several patterns on each line that you want to delete from "file":
grep -w -v -f blacklist file
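The three variants can be compared side by side. This is only a sketch: it recreates the small file from the example in a temporary directory, and the blacklist contents (1 and 11) are made up for the demo.

```shell
workdir=$(mktemp -d)
printf '%s\n' 1 11 111 > "$workdir/file"
printf '%s\n' 1 11 > "$workdir/blacklist"   # illustrative blacklist

# Substring match: 11 also occurs inside 111, so both lines are removed.
substring=$(grep -v '11' "$workdir/file")

# Whole-word match: only the line that is exactly the word 11 is removed.
whole_word=$(grep -w -v '11' "$workdir/file")

# Several whole-word patterns read from a file, one per line.
from_list=$(grep -w -v -f "$workdir/blacklist" "$workdir/file")

rm -rf "$workdir"
printf 'substring: %s\n' "$substring"
printf 'whole word:\n%s\n' "$whole_word"
printf 'from list: %s\n' "$from_list"
```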

To show the treated text in the console:
cat filename | sed '/text to remove/d'
To save the treated text into a file:
cat filename | sed '/text to remove/d' > newfile
To append the treated text to an existing file:
cat filename | sed '/text to remove/d' >> newfile
To treat already treated text, in this case to remove more lines from what is left:
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
The | more will show the text in chunks of one page at a time.

Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to delete lines containing a specific string, but the answer seems to presuppose knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself to put a backslash in front of every character that is special in a basic regular expression, and in front of the / delimiter:
sed 's|[][\\.*^$/]|\\&|g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's|[][\\.*^$/]|\\&|g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's|[][\\.*^$/]|\\&|g' <<< "$STRING")/d" FILE
with variations as described elsewhere on this page.
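When the goal is only to drop lines containing a literal string, another option is to skip the escaping step and let grep treat the string as fixed text with -F. A minimal sketch, with made-up file contents:

```shell
STRING='a.b*'
workdir=$(mktemp -d)
printf '%s\n' 'literal a.b* here' 'axbbb' 'other' > "$workdir/FILE"

# -F: STRING is a fixed string, not a regex; -v: print the non-matching lines;
# --: stop option parsing in case STRING starts with a dash.
kept_fixed=$(grep -F -v -- "$STRING" "$workdir/FILE")

# For contrast: without -F the same string is a regex ("a", any character,
# then zero or more "b"), which matches more lines than intended.
kept_regex=$(grep -v -- "$STRING" "$workdir/FILE")

rm -rf "$workdir"
printf 'fixed string:\n%s\n' "$kept_fixed"
printf 'regex: %s\n' "$kept_regex"
```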

cat filename | grep -v "pattern" > filename.1
mv filename.1 filename

You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accommodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF

This solution is for doing the same operation on multiple files.
for file in *.txt; do grep -v "Matching Text" "$file" > temp_file.txt; mv temp_file.txt "$file"; done

I did not find most of the answers useful. If you use vim, this is very easy and straightforward:
:g/<pattern>/d
Source

Remove everything in a pipe delimited file after second-to-last pipe

How can I remove everything in a pipe delimited file after the second-to-last pipe? Like for the line
David|3456|ACCOUNT|MALFUNCTION|CANON|456
the result should be
David|3456|ACCOUNT|MALFUNCTION

Replace |(string without pipe)|(string without pipe) at the end of each line:
sed 's/|[^|]*|[^|]*$//' inputfile
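Here is a quick sketch of that substitution on the line from the question, plus a made-up short line to show that lines with fewer than two pipes are left alone:

```shell
line='David|3456|ACCOUNT|MALFUNCTION|CANON|456'

# "|[^|]*" is a pipe followed by non-pipe characters; two of them anchored at
# the end of the line are exactly the last two fields.
trimmed=$(printf '%s\n' "$line" | sed 's/|[^|]*|[^|]*$//')

# Illustrative edge case: only one pipe, so the pattern cannot match.
short=$(printf '%s\n' 'a|b' | sed 's/|[^|]*|[^|]*$//')

printf '%s\n%s\n' "$trimmed" "$short"
```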

Using awk, something like
awk -F'|' 'BEGIN{OFS="|"}{NF=NF-2; print}' inputfile
David|3456|ACCOUNT|MALFUNCTION
or use cut if you know the total number of columns, i.e. to go from 6 columns to 4:
cut -d'|' -f -4 inputfile
David|3456|ACCOUNT|MALFUNCTION

The command I would use is
cat input.txt | sed -r 's/(.*)\|.*\|.*/\1/' > output.txt

A pure Bash solution:
while IFS= read -r line || [[ -n $line ]] ; do
printf '%s\n' "${line%|*|*}"
done <inputfile
See Reading input files by line using read command in shell scripting skips last line (particularly the answer by Jahid) for details of how the while loop works.
See pattern matching in Bash for information about ${line%|*|*}.
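The choice of % over %% matters here. As a small sketch (the variable names shortest and longest are only for illustration), compare the two expansions on the line from the question:

```shell
line='David|3456|ACCOUNT|MALFUNCTION|CANON|456'

# % removes the shortest suffix matching the glob |*|*, i.e. the last two fields.
shortest=${line%|*|*}

# %% removes the longest matching suffix, which starts at the first pipe.
longest=${line%%|*|*}

printf '%s\n%s\n' "$shortest" "$longest"
```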

replace new lines with commas in shell

I want to replace newlines in text with a comma or space, but leave the last newline unchanged.
I know of this question: How to replace newlines with tab characters? - but it produces a tab at the end instead of a newline.
So far I have come up with:
awk 'NR>1{printf","} {printf $1} END{printf"\n"}'
Is there an easier way to do this? This is not an assignment; I am just curious and want to level up my scripting.

This might work for you:
paste -s -d, file
or
paste -s -d" " file
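A minimal sketch of the same idea reading from standard input (the three words are made-up sample data, and - tells paste to read stdin):

```shell
# -s joins all input lines into one; -d, uses a comma between them.
joined=$(printf '%s\n' root bin daemon | paste -s -d, -)
printf '%s\n' "$joined"
```

paste puts the delimiter only between lines and ends the result with a single newline, which is what the question asks for.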

This should do the job in bash:
s=$(tr '\n' ',' < file); echo "${s%,}"
or perhaps using sed:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g' file

Using perl is not quite as simple as paste, but it does generalize to more cases:
perl -0pe 's/\n(?!$)/,/g' filename # comma
perl -0pe 's/\n(?!$)/ /g' filename # space
The (?!$) is a lookahead assertion that means "not at the end", so the last newline is left alone.

You can use the sed command to remove the trailing ",":
# sed -i 's/,$//g' endmodify.txt
# cat endmodify.txt
root
bin
daemon
adm
lp
sync
games
gopher
Thanks.

Delete all lines beginning with a # from a file

All of the lines with comments in a file begin with #. How can I delete all of the lines (and only those lines) which begin with #? Other lines containing #, but not at the beginning of the line should be ignored.

This can be done with a sed one-liner:
sed '/^#/d'
This says, "find all lines that start with # and delete them, leaving everything else."

I'm a little surprised nobody has suggested the most obvious solution:
grep -v '^#' filename
This solves the problem as stated.
But note that a common convention is for everything from a # to the end of a line to be treated as a comment:
sed 's/#.*$//' filename
though that treats, for example, a # character within a string literal as the beginning of a comment (which may or may not be relevant for your case) (and it leaves empty lines).
A line starting with arbitrary whitespace followed by # might also be treated as a comment:
grep -v '^ *#' filename
if whitespace is only spaces, or
grep -v '^[  ]*#' filename
where the two spaces are actually a space followed by a literal tab character (type "control-v tab").
For all these commands, omit the filename argument to read from standard input (e.g., as part of a pipe).

The opposite of Raymond's solution:
sed -n '/^#/!p'
"don't print anything, except for lines that DON'T start with #"

You can edit your file directly with
sed -i '/^#/ d' filename
If you also want to delete comment lines that start with some whitespace, use
sed -i '/^\s*#/ d' filename
Usually you want to keep the first line of your script if it is a shebang, so sed should not delete lines starting with #!. It should also delete lines that contain only a hash and no text. Putting it all together:
sed -i '/^\s*\(#[^!].*\|#$\)/d' filename
To work with both GNU and BSD sed, you need to add a backup extension to the -i option:
sed -i.bak '/^[[:space:]]*#/ d' "$file"
rm -f "$file.bak"

You can use the following for an awk solution, which prints only the lines that do not start with #:
awk '!/^#/' inputfile

This answer builds upon the earlier answer by Keith.
grep -E -v "^[[:blank:]]*#" should filter out comment lines.
grep -E -v "^[[:blank:]]*(#|$)" should filter out both comments and empty lines, as is frequently useful.
For information about [:blank:] and other character classes, refer to https://en.wikipedia.org/wiki/Regular_expression#Character_classes.
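The following sketch runs the second pattern over a made-up config snippet (the contents are invented for the demo) to show which lines survive:

```shell
# Made-up input: a comment, an indented comment, an empty line, a line with a
# trailing # note, and a tab-indented setting.
config=$(printf '# top comment\n  # indented comment\n\nvalue = 1 # trailing note\n\ttabbed = 2\n')

# Drop lines that are empty or whose first non-blank character is #.
filtered=$(printf '%s\n' "$config" | grep -E -v '^[[:blank:]]*(#|$)')
printf '%s\n' "$filtered"
```

Lines with a # further along, such as the trailing note, are kept, because the pattern only allows blanks before the #.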

If you want to delete the lines that start with a specific word from the file, then do this:
grep -v '^pattern' currentFileName > newFileName && mv newFileName currentFileName
This writes every line that does not start with the pattern into a new file, and then moves the new file over the source/current file.

You might also want to remove empty lines:
sed -E '/(^$|^#)/d' inputfile

Delete all empty lines and also all lines starting with a # after any spaces:
sed -E '/^$|^\s*#/d' inputfile
For example, all 3 of the following lines would be deleted (the leading numbers are just line numbers, not part of the file):
1. # first comment
2.
3. # second comment
After testing the command above, you can use option -i to edit the input file in place.
Just this!

Here it is with a loop over all files with a given extension:
for i in *.filename_extension # the glob avoids parsing ls output
do
echo "$i"
sed -i '/^#/d' "$i"
done
