How to write to first n lines of file in bash - bash

I have a file test.txt with the following content:
first
second
AAA
BBB
CCC
DDD
I want to remove the first two lines and put new values in their place.
Once the first two lines are removed, the file should look like this:
AAA
BBB
CCC
DDD
Then the two new values should be added as the first and second lines, so the file finally looks like this:
new value at line 1
new value at line 2
AAA
BBB
CCC
DDD
So I tried the below command, but how can I remove the first two lines?
SERVER_HOSTNAME=$(hostname)
SERVER_IP=$(ip -o route get to 8.8.8.8 | sed -n 's/.*src \([0-9.]\+\).*/\1/p')
sed -i "1s/.*/$SERVER_HOSTNAME/" /tmp/test.txt
sed -i "2s/.*/$SERVER_IP/" /tmp/test.txt
My problem is that if I remove the first two lines first and then run the commands above, they replace lines 1 and 2 with the new values. I want to add the values at the top instead, so that the existing content shifts down.

Your existing sed already does what you want; it doesn't need you to remove the first two lines yourself first.
$ cat tmp.txt
first
second
AAA
BBB
CCC
DDD
$ SERVER_HOSTNAME=example.local
$ SERVER_IP=127.0.0.1
$ sed -i "1s/.*/$SERVER_HOSTNAME/;2s/.*/$SERVER_IP/" tmp.txt
$ cat tmp.txt
example.local
127.0.0.1
AAA
BBB
CCC
DDD

Don't run sed -i repeatedly. Instead, combine all your commands into one script.
sed -i "1s/.*/$(hostname)/
2s/.*/$(ip -o route get to 8.8.8.8)/
2s/.*src \([0-9.]\+\).*/\1/" /tmp/test.txt
This is rather brittle, though; in particular, it will break if either of the command substitutions produces a slash in its output.
(IIRC ip has options to produce machine-readable output; you should probably look into that instead of replacing out the parts you don't want.)
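One way to sidestep the slash problem is to keep the values out of the sed script entirely. The following is only a minimal sketch; the helper name replace_first_two is made up for illustration. It writes the two new lines, followed by everything from line 3 onward, to a temporary file and then copies that back over the original.

```shell
# Hypothetical helper, not from the answers above: replace the first two
# lines of a file without pasting the new values into a sed script.
# The body runs in a subshell ( ... ) so its variables stay local.
replace_first_two() (
    file=$1 line1=$2 line2=$3
    tmp=$(mktemp) || exit 1
    # New lines first, then the original file from line 3 onward.
    { printf '%s\n' "$line1" "$line2"; tail -n +3 "$file"; } > "$tmp" &&
        cat "$tmp" > "$file"    # overwrite in place, keeping permissions
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Example call with the variables from the question:
# replace_first_two /tmp/test.txt "$SERVER_HOSTNAME" "$SERVER_IP"
```

Because the values are passed to printf as arguments, slashes, ampersands and backslashes in them are written literally.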
If you want to delete, then add, the sed d and a commands do that, respectively. But removing two and adding two is obviously equivalent to replacing two.
sed -i "1d
1a\hello
2d
2a\hello" file
Unfortunately, the a command is poorly standardized, and it's unclear how two s commands would not work, so I'm leaving this as a sketch.

Could you please try the following, written and tested with the shown samples in GNU awk. It replaces the 1st and 2nd lines with the shell variable values and saves the result back into Input_file.
If you want to keep the original content of the 1st and 2nd lines as well as print the new values, remove next from the FNR==1 and FNR==2 blocks.
SERVER_HOSTNAME=$(hostname)
SERVER_IP=$(ip -o route get to 8.8.8.8 | sed -n 's/.*src \([0-9.]\+\).*/\1/p')
awk -v server_ip="$SERVER_IP" -v server_hostname="$SERVER_HOSTNAME" '
FNR==1{ print server_hostname; next}
FNR==2{ print server_ip; next}
1
' Input_file > temp && mv temp Input_file
Explanation for above solution:
awk -v server_ip="$SERVER_IP" -v server_hostname="$SERVER_HOSTNAME" '
##Starting awk program from here, creating server_ip and server_hostname vars with respective shell vars.
FNR==1{ print server_hostname; next}
##Checking condition if this is 1st line then print server_hostname.
FNR==2{ print server_ip; next}
##Checking condition if this is 2nd line then print server_ip here.
1
##1 will print current line.
' Input_file
##Mentioning Input_file name here.
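To make the effect of next concrete, here is a small sketch that runs both variants on the question's sample data. The scratch file and the placeholder values example.local and 127.0.0.1 are assumptions for the demo only.

```shell
# Sample input from the question, written to a scratch file.
input=$(mktemp)
printf '%s\n' first second AAA BBB CCC DDD > "$input"

# With "next": lines 1 and 2 are replaced by the new values.
replaced=$(awk -v server_hostname="example.local" -v server_ip="127.0.0.1" '
FNR==1 { print server_hostname; next }
FNR==2 { print server_ip; next }
1' "$input")

# Without "next": each new value is printed, then the original line as well.
kept=$(awk -v server_hostname="example.local" -v server_ip="127.0.0.1" '
FNR==1 { print server_hostname }
FNR==2 { print server_ip }
1' "$input")

printf '%s\n' "--- with next ---" "$replaced" "--- without next ---" "$kept"
rm -f "$input"
```

Without next, each new value lands directly above the line it would otherwise have replaced, so "first" and "second" survive.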

You may use this GNU sed:
gsed -i -e "1i\\$SERVER_HOSTNAME\n$SERVER_IP" -e '1,2{d;q;}' /tmp/test.txt
Or using POSIX sed syntax (the -i.bak option itself is an extension, supported by GNU and BSD sed):
sed -i.bak "1,2{d;q;};3i\\
$SERVER_HOSTNAME
3i\\
$SERVER_IP
" /tmp/test.txt

Related

bash / sed : editing of the file

I use sed to remove all lines starting from "HETATM" from the input file, and cat to combine another file with the output received from sed:
sed -i '/^HETATM/ d' file1.pdb
cat file2.pdb file1.pdb > file3.pdb
Is there a way to do it in one line, e.g. using only sed?
If you want to consider awk then it can be done in a single command:
awk 'FNR == NR {print; next} !/^HETATM/' file2.pdb file1.pdb > file3.pdb
With a cat + grep combination, please try the following. cat concatenates the output of everything passed to it, so grep -v first removes all lines starting with HETATM from file1.pdb, its output is handed to cat via process substitution, and cat's output is written to the new file file3.pdb.
cat file2.pdb <(grep -v '^HETATM' file1.pdb) > file3.pdb
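As a quick check that the awk command and the cat + grep idea agree, here is a sketch on made-up input. The file contents are hypothetical, and a command group stands in for the process substitution so the snippet also runs in a plain POSIX shell.

```shell
# Scratch copies of hypothetical inputs; the contents are made up.
dir=$(mktemp -d)
printf '%s\n' 'HEADER demo' 'HETATM kept-from-file2' > "$dir/file2.pdb"
printf '%s\n' 'ATOM 1' 'HETATM 2' 'ATOM 3' > "$dir/file1.pdb"

# awk version from above: file2.pdb verbatim, file1.pdb minus HETATM lines.
awk_out=$(awk 'FNR == NR {print; next} !/^HETATM/' "$dir/file2.pdb" "$dir/file1.pdb")

# Same idea with a command group instead of process substitution.
group_out=$({ cat "$dir/file2.pdb"; grep -v '^HETATM' "$dir/file1.pdb"; })

[ "$awk_out" = "$group_out" ] && echo "outputs match"
rm -rf "$dir"
```

One caveat with the FNR == NR idiom: if the first file is empty, the condition is also true for every line of the second file, so nothing would be filtered.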
I'm not sure what you mean by "remove all lines starting from 'HETATM'", but if you mean that any line that appears in the file after a line that starts with "HETATM" will not be outputted, then your sed expression won't do it - it will just remove all lines starting with the pattern while leaving all following lines that do not start with the pattern.
There are ways to get the effect I believe you wanted, possibly even with sed - but I don't know sed all that well. In perl I'd use the range operator with a guaranteed non-matching end expression (not sure what will be guaranteed for your input, I used "XXX" in this example):
perl -ne 'unless (/^HETATM/../XXX/) { print; }' file1.pdb
mawk '(FNR == NR) < NF' FS='^HETATM' f1 f2

grep matching specific position in lines using words from other file

I have 2 files
file1:
12342015010198765hello
12342015010188765hello
12342015010178765hello
in which each line contains fields at fixed positions; for example, positions 13 - 17 hold the account_id
file2:
98765
88765
which contains a list of account_ids.
In Korn Shell, I want to print the lines from file1 whose positions 13 - 17 match one of the account_ids in file2.
I can't do
grep -f file2 file1
because account_id in file2 can match other fields at other positions.
I have tried using a pattern in file2:
^.{12}98765.*
but it did not work.
Using awk
$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello
How it works
NR==FNR{a[$1]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read so far. Thus, if FNR==NR, we are reading the first file which is file2.
Each ID in file2 is saved in array a. Then, we skip the rest of the commands and jump to the next line.
substr($0,13,5) in a
If we reach this command, we are working on the second file, file1.
This condition is true if the 5 character long substring that starts at position 13 is in array a. If the condition is true, then awk performs the default action which is to print the line.
Using grep
You mentioned trying
grep '^.{12}98765.*' file1
That uses extended regex syntax which means that -E is required. Also, there is no value in matching .* at the end: it will always match. Thus, try:
$ grep -E '^.{12}98765' file1
12342015010198765hello
To get both lines:
$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello
This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.
Using sed with extended regex:
sed -r 's#.*#/^.{12}&/p#' file2 |sed -nr -f- file1
Using Basic regex:
sed 's#.*#/^.\\{12\\}&/p#' file2 |sed -n -f- file1
Explanation:
sed -r 's#.*#/^.{12}&/p#' file2
will generate an output:
/^.{12}98765/p
/^.{12}88765/p
which is then used as a sed script for the next sed after pipe, which outputs:
12342015010198765hello
12342015010188765hello
Using Grep
The most convenient approach is to put each alternative on a separate line of the pattern file.
You can look at this question:
grep multiple patterns single file argument list too long
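The one-pattern-per-line idea can be sketched as follows: turn each account_id from file2 into a pattern anchored at position 13, then hand the pattern file to grep -E -f. The scratch directory and the "patterns" file name are assumptions, and the sketch assumes the IDs contain no regex metacharacters.

```shell
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 12342015010198765hello 12342015010188765hello \
    12342015010178765hello > "$dir/file1"
printf '%s\n' 98765 88765 > "$dir/file2"

# Prefix every ID with an anchor that skips the first 12 characters,
# e.g. 98765 becomes ^.{12}98765 (one pattern per line).
sed 's/^/^.{12}/' "$dir/file2" > "$dir/patterns"

# -E enables the {12} interval syntax; -f reads the patterns from the file.
matches=$(grep -E -f "$dir/patterns" "$dir/file1")
printf '%s\n' "$matches"
rm -rf "$dir"
```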

Extract string between two patterns (inclusive) while conserving the format

I have a file in the following format
cat test.txt
id1,PPLLTOMaaaaaaaaaaaJACK
id2,PPLRTOMbbbbbbbbbbbJACK
id3,PPLRTOMcccccccccccJACK
I am trying to identify and print the string between TOM and JACK including these two strings, while maintaining the first column FS=,
Desired output:
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
So far I have tried gsub:
awk -F"," 'gsub(/.*TOM|JACK.*/,"",$2) && !_[$0]++' test.txt > out.txt
and have the following output
id1 aaaaaaaaaaa
id2 bbbbbbbbbbb
id3 ccccccccccc
As you can see I am getting close but not able to include TOM and JACK patterns in my output. Plus I am also losing the original FS. What am I doing wrong?
Any help will be appreciated.
You are changing a field ($2) which causes awk to reconstruct the record using the value of OFS as the field separator and so in this case changing the commas to spaces.
Never use _ as a variable name. A name with no meaning is only slightly better than a name with the wrong meaning, so pick one that means something; in this case that is probably seen, though it's unclear what you are trying to do with it in this context.
gsub() and sub() do not support capture groups so you either need to use match()+substr():
$ awk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/){$2=substr($2,RSTART,RLENGTH)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
or use GNU awk for the 3rd arg to match()
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=a[0]} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
or for gensub():
$ gawk 'BEGIN{FS=OFS=","} {$2=gensub(/.*(TOM.*JACK).*/,"\\1","",$2)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
The main difference between the match() and gensub() solutions is how they would behave if TOM appeared twice on the line:
$ cat file
id1,PPLLfooTOMbarTOMaaaaaaaaaaaJACK
id2,PPLRTOMbbbbbbbbbbbJACKfooJACKbar
id3,PPLRfooTOMbarTOMcccccccccccJACKfooJACKbar
$
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=a[0]} 1' file
id1,TOMbarTOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACKfooJACK
id3,TOMbarTOMcccccccccccJACKfooJACK
$
$ gawk 'BEGIN{FS=OFS=","} {$2=gensub(/.*(TOM.*JACK).*/,"\\1","",$2)} 1' file
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACKfooJACK
id3,TOMcccccccccccJACKfooJACK
and just to show one way of stopping at the first instead of the last JACK on the line:
$ gawk 'BEGIN{FS=OFS=","} match($2,/TOM.*JACK/,a){$2=gensub(/(JACK).*/,"\\1","",a[0])} 1' file
id1,TOMbarTOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMbarTOMcccccccccccJACK
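The last command relies on GNU awk extensions (the third argument to match() and gensub()). As a rough sketch for other awks, the same "first TOM through the first JACK after it" result can be had with index() and substr() alone; the variable names are mine, and the input is the three-line sample shown above.

```shell
result=$(printf '%s\n' \
    'id1,PPLLfooTOMbarTOMaaaaaaaaaaaJACK' \
    'id2,PPLRTOMbbbbbbbbbbbJACKfooJACKbar' \
    'id3,PPLRfooTOMbarTOMcccccccccccJACKfooJACKbar' |
awk 'BEGIN { FS = OFS = "," }
{
    start = index($2, "TOM")
    if (start) {
        rest = substr($2, start)                # begins with the first TOM
        stop = index(substr(rest, 4), "JACK")   # first JACK after that TOM
        # Keep TOM (3 chars), the text up to JACK, and JACK itself.
        if (stop) $2 = substr(rest, 1, 3 + stop + 3)
    }
    print
}')
printf '%s\n' "$result"
```

Lines without a TOM followed by JACK are printed unchanged, as in the match() versions.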
Use capture groups to save the parts of the line you want to keep. Here's how to do it with sed
sed 's/^\([^,]*,\).*\(TOM.*JACK\).*/\1\2/' <test.txt > out.txt
Do you mean to do the following?
$ cat test.txt
id1,PPLLTOMaaaaaaaaaaaJACKABCD
id2,PPLRTOMbbbbbbbbbbbJACKDFCC
id3,PPLRTOMcccccccccccJACKSDER
$ cat test.txt | sed -e 's/,.*TOM/,TOM/g' | sed -e 's/JACK.*/JACK/g'
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK
$
This should work as long as the TOM and JACK do not repeat themselves.
sed 's/\(.*,\).*\(TOM.*JACK\).*/\1\2/' <oldfile >newfile
Output:
id1,TOMaaaaaaaaaaaJACK
id2,TOMbbbbbbbbbbbJACK
id3,TOMcccccccccccJACK

grep (awk) a file from A to first empty line

I need to grep a file from a line containing Pattern A to the first empty line.
I used awk but I don't know how to code this empty line.
cat ${file} | awk '/Pattern A/,/Pattern B/'
sed might be best:
sed -n '/PATTERN/,/^$/p' file
To avoid printing the empty line:
sed -n '/PATTERN/,/^$/{/^$/d; p}' file
or even better - thanks jthill!:
sed -n '/PATTERN/,/^$/{/./p}' file
Above solutions will give more output than needed if PATTERN appears more than once. For that, it is best to quit after empty line is found, as jaypal's answer suggests:
sed -n '/PATTERN/,/^$/{/^$/q; p}' file
Explanation
^$ matches empty lines, because ^ stands for beginning of line and $ for end of line. So ^$ matches lines with nothing between the beginning and the end of the line.
/PATTERN/,/^$/{/^$/d; p}
/PATTERN/,/^$/ match lines from PATTERN to empty line.
{/^$/d; p} deletes (d) the lines matching ^$ and prints (p) the rest.
{/./p} just prints those lines having at least one character.
With awk you can use:
awk '!NF{f=0} /PATTERN/ {f=1} f' file
As with sed, this prints more than needed if several lines contain PATTERN. To avoid that, exit once the empty line is found:
awk 'f && !NF{exit} /PATTERN/ {f=1} f' file
Explanation
!NF{f=0} if there are no fields (that is, line is empty), unset the flag f.
/PATTERN/ {f=1} if PATTERN is found, set the flag f.
f if flag f is set, this is True, so it performs the default awk behaviour: print the line.
Test
$ cat a
aa
bb
hello
aaaaaa
bbb
ttt
$ awk '!NF{f=0} /hello/ {f=1} f' a
hello
aaaaaa
bbb
$ sed -n '/hello/,/^$/{/./p}' a
hello
aaaaaa
bbb
Using sed:
sed -n '/PATTERN/,/^$/{/^$/q;p;}' file
Using a regex range, you define your range from the PATTERN to a blank line (/^$/). When you encounter a blank line, you quit; otherwise you keep printing.
Using awk:
awk '/PATTERN/{p=1}/^$/&&p{exit}p' file
You enable a flag when you encounter your PATTERN. When you reach a blank line and flag is enabled, you exit. If not, you keep printing.
Another alternative suggested by devnull in the comments is to use pcregrep:
pcregrep -M 'PATTERN(.|\n)*?(?=\n\n)' file
I think this is a nice, readable Perl one-liner:
perl -wne '$f=1 if /Pattern A/; exit if /^\s*$/; print if $f' file
Set the flag $f when the pattern is matched
Exit if a blank line (only whitespace between start and end of line) is found
Print the line if the flag is set
Testing it out:
$ cat file
1
2
Pattern A
3
4
5
6
7
8
9
$ perl -wne '$f=1 if /Pattern A/; exit if /^$/; print if $f' file
Pattern A
3
4
5
6
Alternatively, based on the suggestion by @jaypal, you could do this:
perl -lne '/Pattern A/ .. 1 and !/^$/ ? print : exit' file
Rather than using a flag $f, the range operator .. takes care of this for you. It evaluates to true when "Pattern A" is found on the line and remains true indefinitely. When it is true, the other part will be evaluated and will print until a blank line is found.
Never use
/foo/,/bar/
in awk unless all you will ever want is the text from each "foo" through the next "bar"; a range expression makes trivial jobs marginally briefer, but even slightly more interesting requirements then need a complete rewrite.
Just use:
/foo/{f=1} f{print; if (/bar/) f=0}
or similar instead.
In this case the awk solution is:
awk '/pattern/{f=1} f{print; if (!NF) exit}' file
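To see the quit-on-first-empty-line behaviour, here is a small sketch on made-up input in which "Pattern A" occurs twice. The awk line is a slight variation of the one above that tests for the empty line before printing, so the empty line itself is not output.

```shell
# Hypothetical sample: two "Pattern A" blocks, each ended by an empty line.
sample=$(mktemp)
printf '%s\n' 1 2 'Pattern A' 3 4 '' 5 'Pattern A' 6 '' 7 > "$sample"

# sed: print from the pattern, quit at the first empty line in the range.
sed_out=$(sed -n '/Pattern A/,/^$/{/^$/q;p;}' "$sample")

# awk: set a flag at the pattern, exit at the first line with no fields.
awk_out=$(awk '/Pattern A/{f=1} f{ if (!NF) exit; print }' "$sample")

printf '%s\n' "--- sed ---" "$sed_out" "--- awk ---" "$awk_out"
rm -f "$sample"
```

Both stop after the first block. Note that !NF also treats a whitespace-only line as empty, whereas /^$/ does not.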

Removing lines based on column values read from file

I use the following code to extract lines from input_file with a certain value in the first column. The values on which the extraction of lines is based are in "one_column.txt":
while read file
do
awk -v col="$file" '$1==col {print $0}' input_file >> output_file
done < one_column.txt
My question is, how do I extract the lines where the first column does not match any of the values in one_column.txt? In other words, how do I extract only the remaining lines from input_file that don't end up in output_file?
grep -vf can do it:
grep -vf output_file input_file
grep -f reads its patterns from a file, one per line. grep -v inverts the match, so only lines that match none of the patterns are printed.
Test
$ cat a
hello
good
bye
$ cat b
hello
good
bye
you
all
$ grep -f a b
hello
good
bye
$ grep -vf a b ## opposite
you
all
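Keep in mind that grep -f matches the patterns anywhere in the line, not only in the first column. If an exact first-column comparison is needed, a sketch along the lines of the usual two-file awk idiom could look like this; the sample rows and the array name skip are made up for illustration.

```shell
# Hypothetical sample data: "a" and "b" are the values to exclude.
dir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'ab 3' 'c a' > "$dir/input_file"
printf '%s\n' a b > "$dir/one_column.txt"

# First file: remember each value. Second file: print the lines whose
# first column is not among the remembered values.
remaining=$(awk 'NR == FNR { skip[$1]; next } !($1 in skip)' \
    "$dir/one_column.txt" "$dir/input_file")

printf '%s\n' "$remaining"
rm -rf "$dir"
```

Here "ab 3" and "c a" are kept because their first column is neither a nor b, while grep -vf one_column.txt would drop them since both lines contain an "a".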
