bash / sed: editing a file - bash

I use sed to remove all lines starting with "HETATM" from the input file, and cat to combine another file with the output received from sed:
sed -i '/^HETATM/ d' file1.pdb
cat file2.pdb file1.pdb > file3.pdb
Is there a way to do this in one line, e.g. using only sed?

If you are willing to consider awk, then it can be done in a single command:
awk 'FNR == NR {print; next} !/^HETATM/' file2.pdb file1.pdb > file3.pdb
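One pitfall of the FNR == NR idiom: if file2.pdb is empty, FNR == NR is still true while awk reads file1.pdb, so every line of file1.pdb is printed unfiltered. A minimal sketch of a sturdier variant compares FILENAME with ARGV[1] instead; the sample records are made up for the demo.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Made-up sample records; the real files are PDB coordinate files.
printf 'HEADER demo\n' > file2.pdb
printf 'ATOM 1\nHETATM 2\nATOM 3\n' > file1.pdb

# FILENAME == ARGV[1] is only true while reading the first file argument,
# even when that file is empty (unlike FNR == NR).
awk 'FILENAME == ARGV[1] {print; next} !/^HETATM/' file2.pdb file1.pdb > file3.pdb
cat file3.pdb
```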

With a cat + grep combination, please try the following code. In short: cat concatenates the output of all the files passed to it, so grep -v first removes all lines starting with HETATM from file1.pdb, the bash process substitution <(...) hands that filtered output to cat as if it were a file, and cat's combined output is written to a new file named file3.pdb.
cat file2.pdb <(grep -v '^HETATM' file1.pdb) > file3.pdb
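Process substitution needs bash, ksh or zsh. A minimal sketch of a POSIX alternative, assuming made-up sample records: a brace group runs cat and grep one after the other and sends both outputs through a single redirection.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Made-up sample records for the demo.
printf 'HEADER demo\n' > file2.pdb
printf 'ATOM 1\nHETATM 2\nATOM 3\n' > file1.pdb

# Both commands write to the same stdout, redirected once into file3.pdb.
{ cat file2.pdb; grep -v '^HETATM' file1.pdb; } > file3.pdb
cat file3.pdb
```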

I'm not sure what you mean by "remove all lines starting from 'HETATM'", but if you mean that no line should be output from the first line that starts with "HETATM" onwards, then your sed expression won't do it: it only removes the lines starting with the pattern and keeps all following lines that do not start with it.
There are ways to get the effect I believe you wanted, possibly even with sed, but I don't know sed all that well. In perl I'd use the range operator with an end expression that is guaranteed never to match (I'm not sure what is guaranteed not to appear in your input; I used "XXX" in this example):
perl -ne 'unless (/^HETATM/../XXX/) { print; }' file1.pdb
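For that reading of the question, sed can do it too: an address range from the first line starting with HETATM to the last line ($) deletes everything from that point on, and with -n, /^HETATM/q stops reading at the first match. A minimal sketch with made-up records:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'ATOM 1\nHETATM 2\nATOM 3\n' > file1.pdb

# Delete from the first line starting with HETATM through the last line.
sed '/^HETATM/,$d' file1.pdb

# Same output, but sed quits at the first HETATM line instead of reading on.
sed -n '/^HETATM/q;p' file1.pdb
```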

mawk '(FNR == NR) < NF' FS='^HETATM' f1 f2

Related

How to write to first n lines of file in bash

I have a file test.txt with the following content:
first
second
AAA
BBB
CCC
DDD
I want to remove the first two lines and add new values in their place.
Once the first two lines are removed, the file should look like this:
AAA
BBB
CCC
DDD
Then the two values should be added as the first and second lines, so the file finally looks like this:
new value at line 1
new value at line 2
AAA
BBB
CCC
DDD
I tried the commands below, but how can I remove the first two lines?
SERVER_HOSTNAME=$(hostname)
SERVER_IP=$(ip -o route get to 8.8.8.8 | sed -n 's/.*src \([0-9.]\+\).*/\1/p')
sed -i "1s/.*/$SERVER_HOSTNAME/" /tmp/test.txt
sed -i "2s/.*/$SERVER_IP/" /tmp/test.txt
My problem is that if I first remove the first two lines and then run the commands above, they replace the new lines 1 and 2 with the values, but I want to add the values at the top so that the existing content shifts down.
Your existing sed already does what you want; it doesn't need you to remove the first two lines yourself first.
$ cat tmp.txt
first
second
AAA
BBB
CCC
DDD
$ SERVER_HOSTNAME=example.local
$ SERVER_IP=127.0.0.1
$ sed -i "1s/.*/$SERVER_HOSTNAME/;2s/.*/$SERVER_IP/" tmp.txt
$ cat tmp.txt
example.local
127.0.0.1
AAA
BBB
CCC
DDD
Don't run sed -i repeatedly. Instead, combine all your commands into one script.
sed -i "1s/.*/$(hostname)/
2s/.*/$(ip -o route get to 8.8.8.8)/
2s/.*src \([0-9.]\+\).*/\1/" /tmp/test.txt
This is rather brittle, though; in particular, it will break if either of the command substitutions produces a slash in its output (and an & or a backslash in the output would also be misinterpreted).
(IIRC ip has options to produce machine-readable output; you should probably look into that instead of replacing out the parts you don't want.)
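If you want to delete, then add, the sed d and a commands do that, respectively. But removing two lines and adding two is obviously equivalent to replacing two. Note that a has to come before d, because d ends the cycle and nothing after it runs for that line:
sed -i "1a\hello
1d
2a\hello
2d" file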
Unfortunately, the a command is poorly standardized (the one-line a\text form used here relies on GNU sed), and it's unclear why two s commands would not work for you, so I'm leaving this as a sketch.
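As a further sketch of the delete-then-add idea: sed's c (change) command replaces a whole address range with a block of text, so both lines can be swapped without any s command, and slashes in the values are harmless (a backslash would still be treated specially). SERVER_HOSTNAME and SERVER_IP here are hypothetical stand-in values.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' first second AAA BBB CCC DDD > test.txt

# Hypothetical values standing in for $(hostname) and the ip output.
SERVER_HOSTNAME=example.local
SERVER_IP=127.0.0.1

# 1,2c replaces lines 1-2 with the text block; each "\" at the end of a
# line continues the block onto the next line. The shell turns "\\" into "\".
sed "1,2c\\
$SERVER_HOSTNAME\\
$SERVER_IP" test.txt > test.tmp && mv test.tmp test.txt

cat test.txt
```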
Could you please try the following, written and tested with the shown samples in GNU awk. It replaces the 1st and 2nd lines with the shell variables' values and saves the result back into Input_file (via a temporary file).
If you want to keep the original 1st and 2nd lines as well as print the new values, remove next from the FNR==1 and FNR==2 blocks.
SERVER_HOSTNAME=$(hostname)
SERVER_IP=$(ip -o route get to 8.8.8.8 | sed -n 's/.*src \([0-9.]\+\).*/\1/p')
awk -v server_ip="$SERVER_IP" -v server_hostname="$SERVER_HOSTNAME" '
FNR==1{ print server_hostname; next}
FNR==2{ print server_ip; next}
1
' Input_file > temp && mv temp Input_file
Explanation for above solution:
awk -v server_ip="$SERVER_IP" -v server_hostname="$SERVER_HOSTNAME" '
##Start the awk program, creating awk variables server_ip and server_hostname from the respective shell variables.
FNR==1{ print server_hostname; next}
##If this is the 1st line, print server_hostname and skip the original line.
FNR==2{ print server_ip; next}
##If this is the 2nd line, print server_ip and skip the original line.
1
##1 prints every other line unchanged.
' Input_file
##The input file name.
You may use this with GNU sed (installed as gsed on some systems):
gsed -i -e "1i\\$SERVER_HOSTNAME\n$SERVER_IP" -e '1,2{d;q;}' /tmp/test.txt
Or with more portable sed syntax (note that -i.bak itself is not POSIX, although GNU and BSD sed both accept it):
sed -i.bak "1,2{d;q;};3i\\
$SERVER_HOSTNAME
3i\\
$SERVER_IP
" /tmp/test.txt
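A sed-free sketch for the same question, assuming only POSIX printf and tail: print the two new lines first, then everything from line 3 onwards, so nothing in the values (slashes, & or backslashes) gets interpreted. The values are hypothetical stand-ins for hostname and ip.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' first second AAA BBB CCC DDD > test.txt

# Hypothetical values; in the question these come from hostname and ip.
SERVER_HOSTNAME=example.local
SERVER_IP=127.0.0.1

# printf '%s\n' prints its arguments verbatim; tail -n +3 starts at line 3.
{ printf '%s\n' "$SERVER_HOSTNAME" "$SERVER_IP"; tail -n +3 test.txt; } > test.tmp &&
  mv test.tmp test.txt
cat test.txt
```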

How to remove consecutive repeating characters from every line?

I have the below lines in a file
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Profilicollis;Profilicollis_altmani;
Acanthocephala;Eoacanthocephala;Neoechinorhynchida;Neoechinorhynchidae;;;;
Acanthocephala;;;;;;;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Polymorphus;;
and I want to remove the repeated semicolons from all lines so they look like below (note: some of the lines above also have repeated semicolons in the middle)
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;Profilicollis;Profilicollis_altmani;
Acanthocephala;Eoacanthocephala;Neoechinorhynchida;Neoechinorhynchidae;
Acanthocephala;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;Polymorphus;
I would appreciate it if someone could share a bash one-liner to accomplish this.
You can use tr with "squeeze":
tr -s ';' < infile
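A minimal sketch running that on the sample lines from the question. tr only reads stdin and writes stdout, so updating the file itself goes through a temporary file (the file names are placeholders).

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
cat > infile <<'EOF'
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Profilicollis;Profilicollis_altmani;
Acanthocephala;Eoacanthocephala;Neoechinorhynchida;Neoechinorhynchidae;;;;
Acanthocephala;;;;;;;
Acanthocephala;Palaeacanthocephala;Polymorphida;Polymorphidae;;Polymorphus;;
EOF

# -s squeezes each run of the listed character into a single one.
tr -s ';' < infile > infile.tmp && mv infile.tmp infile
cat infile
```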
perl -p -e 's/;+/;/g' myfile # writes output to stdout
or
perl -p -i -e 's/;+/;/g' myfile # does an in-place edit
If you want to edit the file itself:
printf "%s\n" 'g/;;/s/;\{2,\}/;/g' w | ed -s foo.txt
If you want to pipe a modified copy of the file to something else and leave the original unchanged:
sed 's/;\{2,\}/;/g' foo.txt | whatever
These replace runs of 2 or more semicolons with single ones.
This could be solved easily with substitutions, but I add an awk solution that plays with the FS/OFS variables:
awk -F';+' -v OFS=';' '$1=$1' file
or, to also print lines whose first field is empty or 0 (which makes $1=$1 false):
awk -F';+' -v OFS=';' '($1=$1)||1' file
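A small sketch of why the ||1 matters, using a made-up line that starts with semicolons: $1=$1 rebuilds the record with OFS, but its value is also used as the pattern, and an empty first field makes it false.

```shell
#!/bin/sh
# A made-up line whose first field is empty (it starts with ;;).
line=';;a;;b'

# Version 1: the assignment's value is "" (false), so nothing is printed.
printf '%s\n' "$line" | awk -F';+' -v OFS=';' '$1=$1'

# Version 2: ||1 forces the pattern true, so the rebuilt record ;a;b prints.
printf '%s\n' "$line" | awk -F';+' -v OFS=';' '($1=$1)||1'
```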
Here's a sed version of alaniwi's answer (\+ is a GNU sed extension):
sed 's/;\+/;/g' myfile # Write output to stdout
or
sed -i 's/;\+/;/g' myfile # Edit the file in-place
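For sed implementations without \+ (for example BSD/macOS sed), a minimal sketch of the portable BRE spelling: ;;* means one semicolon followed by zero or more semicolons, which matches the same runs.

```shell
#!/bin/sh
# ;;* is the POSIX BRE equivalent of GNU's ;\+
printf '%s\n' 'Acanthocephala;;;;;;;' | sed 's/;;*/;/g'
```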

Insert a variable in a text file [duplicate]

I am trying to use the sed -i command to insert a string variable on the 1st line of a text file.
This command works: sed -i '1i header' file.txt
But when I pass a variable, it doesn't work.
Example:
var=$(cat <<-END
This is line one.
This is line two.
This is line three.
END
)
sed -i '1i $var' file.txt # doesn't work
sed -i ’1i $var’ file.txt # doesn't work
Any help with this problem would be appreciated.
Thank you
First, let's define your variable a simpler way:
$ var="This is line one.
This is line two.
This is line three."
Since sed is not good at working with variables, let's use awk. This will place your variable at the beginning of a file:
awk -v x="$var" 'NR==1{print x} 1' file.txt
How it works
-v x="$var"
This defines an awk variable x to have the value of shell variable $var.
NR==1{print x}
At the first line, this tells awk to insert the value of variable x.
1
This is awk's shorthand for print-the-line.
Example
Let's define your variable:
$ var="This is line one.
> This is line two.
> This is line three."
Let's work on this test file:
$ cat File
1
2
This is what the awk command produces:
$ awk -v x="$var" 'NR==1{print x} 1' File
This is line one.
This is line two.
This is line three.
1
2
Changing a file in-place
To change file.txt in place using a recent GNU awk:
awk -i inplace -v x="$var" 'NR==1{print x} 1' file.txt
On macOS, BSD or older GNU/Linux, use:
awk -v x="$var" 'NR==1{print x} 1' file.txt >tmp && mv tmp file.txt
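One caveat with -v: awk processes escape sequences in the assigned value, so a backslash in $var (for example \t) would become a tab. A minimal sketch of a workaround, assuming the variable can be exported: read it from awk's ENVIRON array, which leaves the text untouched. The sample text and file contents are made up.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 > file.txt

# Made-up text containing a backslash; -v would turn "\t" into a tab.
var='This is line one.
This is line two.
Path: C:\temp'
export var   # ENVIRON only sees exported variables

awk 'NR==1{print ENVIRON["var"]} 1' file.txt > tmp && mv tmp file.txt
cat file.txt
```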
Using printf...
$ var="This is line one.
This is line two.
This is line three.
"
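Use cat - to read from stdin and then print into a new file. Move it over the original file if you want to modify it. Passing $var as an argument to printf '%s', rather than as the format string, keeps any % or \ in the text from being interpreted:
$ printf '%s' "$var" | cat - file > newfile && mv newfile file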
Not the best job for sed. What about a simple cat?
cat - file.txt <<EOF > newfile.txt
This is line one.
This is line two.
This is line three.
EOF
# you can add mv, if you really want the original file gone
mv newfile.txt file.txt
And for the original problem: sed does not like newlines and spaces in its 'program', so you need to quote and escape the line breaks:
# this works
sed $'1i "abc\\\ncde"' file.txt
# this does not, executes the `c` command from the second line
sed $'1i "abc\ncde"' file.txt

How to increment number in a file

I have a file, file1.txt, containing a date and a number like this:
2013-12-29,1
I need to increment the number by 1 (1+1=2), so it becomes:
2013-12-29,2
I tried to use sed to do the replacement, and it must use variables only:
oldnum=`cut -d ',' -f2 file1.txt`
newnum=`expr $oldnum + 1`
sed -i 's\$oldnum\$newnum\g' file1.txt
But I get an error from sed syntax, is there any way for this. Thanks in advance.
sed's s command can't use backslashes as delimiters; use forward slashes. There are several interesting issues with your use of '\'s actually, but the quick fix should be this (note the double quotes too, so that the shell expands the variables):
oldnum=`cut -d ',' -f2 file1.txt`
newnum=`expr $oldnum + 1`
sed -i "s/$oldnum\$/$newnum/g" file1.txt
However, I question whether sed is really the right tool for the job in this case. A more complete tool, such as awk, perl or python, might work better in the long run.
Note that I used a $ end-of-line match to ensure you didn't replace 2012 with 2022, which I don't think you wanted.
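A minimal sketch of that quick fix on the sample line. It swaps the backticks and expr for $(...) and shell arithmetic (my substitution, not part of the answer above), adds the comma to the pattern as an extra guard, and writes through a temporary file to sidestep the differences between GNU and BSD sed -i.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '2013-12-29,1\n' > file1.txt

oldnum=$(cut -d ',' -f2 file1.txt)   # 1
newnum=$((oldnum + 1))               # shell arithmetic instead of expr

# Without the anchor, s/1/2/g would also rewrite the 1s in "2013-12-29".
sed "s/,$oldnum\$/,$newnum/" file1.txt > tmp && mv tmp file1.txt
cat file1.txt
```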
Usually I would use awk for jobs like this. The following code might work (it keeps the comma separator):
awk -F',' '{printf("%s,%d\n",$1,$2+1)}' file1.txt
Here is how to do it with awk
awk -F, '{$2=$2+1}1' OFS=, file1.txt
2013-12-29,2
or, more simply (this will fail if the value is -1, because the new value 0 counts as false and the line is not printed):
awk -F, '$2=$2+1' OFS=, file1.txt
To make the change to the file itself, save the output somewhere else (tmp in the example below) and then move it back to the original name:
awk -F, '{$2=$2+1}1' OFS=, file1.txt >tmp && mv tmp file1.txt
Or, using GNU awk 4.1 or later, you can do this to skip the temp file:
awk -i inplace -F, '{$2=$2+1}1' OFS=, file1.txt
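Another single-line way, for a file that contains just one number, would be:
expr `cat /tmp/file 2>/dev/null` + 1 >/tmp/file
This works if the file doesn't exist: the command substitution runs before the redirection, and with GNU expr, expr + 1 prints 1, so the file is created with a value of 1. If the file contains something that is not an integer, expr fails and the file is left empty.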
awk is the best tool for your problem, but you can also do the calculation in the shell.
In case you have more than one row, I am using a loop here:
#!/bin/bash
while IFS=, read -r DATE NUM
do
    printf '%s,%s\n' "$DATE" "$((NUM+1))"
done < file1.txt
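Bash one-liner option with bc. Sample:
$ echo 3 > test
$ echo 1 + $(<test) | bc > test
$ cat test
4
The pipeline form is racy, though: both sides of the pipe start at the same time, so > test can truncate the file before $(<test) reads it. This form is safe, because redirections are processed left to right and the here-string is expanded before > test truncates the file:
bc <<< "1 + $(<test)" > test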

Bash - remove all lines beginning with 'P'

I have a text file that's about 300KB in size. I want to remove all lines from this file that begin with the letter "P". This is what I've been using:
> cat file.txt | egrep -v P*
That isn't outputting anything to the console. I can use cat on the file without any other commands and it prints out fine. My final intention is to:
> cat file.txt | egrep -v P* > new.txt
No error appears, it just doesn't print anything out and if I run the 2nd command, new.txt is empty.
I should say I'm running Windows 7 with Cygwin installed.
Explanation
use ^ to anchor your pattern to the beginning of the line;
delete lines matching the pattern using sed and its d command.
Solution #1
cat file.txt | sed '/^P/d'
Better solution
Use sed-only:
sed '/^P/d' file.txt > new.txt
With awk:
awk '!/^P/' file.txt
Explanation
The condition starts with ! (negation), which negates the pattern that follows;
/^P/ means "match all lines starting with a capital P",
so the negated pattern means "ignore lines starting with a capital P".
Finally, it leverages awk's behavior when the { … } action block is missing, which is to print every record that satisfies the condition.
So, to rephrase, it ignores lines starting with a capital P and prints everything else.
Note
sed is line-oriented and awk is column-oriented. For your case you should use the former; see Edouard Lopez's response.
Use sed with in-place editing (GNU sed; this also works in Cygwin):
sed -i '/^P/d' file.txt
BSD (Mac) sed
sed -i '' '/^P/d' file.txt
Use start of line mark and quotes:
cat file.txt | egrep -v '^P.*'
P* means P zero or more times, which matches every line (even lines without a P), so together with -v it gives you no lines
^P.* means start of line, then P, and any char zero or more times
Quoting is needed to prevent shell expansion.
This can be shortened to
egrep -v ^P file.txt
because .* is not needed; without it quoting is not needed either, and egrep can read the data from the file directly.
As we don't use extended regular expressions, grep will also work fine:
grep -v ^P file.txt
Finally
grep -v ^P file.txt > new.txt
This works:
cat file.txt | egrep -v -e '^P'
-e indicates expression.
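A minimal sketch comparing the approaches on a made-up file. It shows that 'P*' matches every line (it also matches the empty string), which is why grep -v P* printed nothing, and that the grep, sed and awk answers above produce the same result. The pattern is quoted so the shell cannot glob it against file names.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Pear' 'apple' 'Plum' 'banana P' > file.txt

# Counts all 4 lines: P* matches zero Ps anywhere.
grep -c 'P*' file.txt

# The anchored pattern removes only lines that begin with a capital P.
grep -v '^P' file.txt > new.txt
cat new.txt
```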
