Finding a pattern, then executing a line change only after the pattern - bash

I have a file like this:
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
O 1 2 3 4
H 1 2 3 4
H 1 2 3 4
FRAGNAME=H2ODFT
O 1 2 3 4
H 1 2 3 4
H 1 2 3 4
I want to remove the column containing "1", but only from the lines after the $EFRAG line, and also add a label to each O, H, H atom name. My expected output is:
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
I'm new to coding in bash, and I'm not quite sure where to start.
I was thinking of piping a grep command into a sed command, but I'm not sure what that syntax would look like. I'm also trying to learn awk, but its syntax is even more confusing to me; I'm currently reading a book on its capabilities.
Any thoughts or ideas would be greatly appreciated!

Use the following awk processing:
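awk '/\$EFRAG/ {
    start = 1                  # marker: we are now inside the block to edit
    split("a b c", suf)        # auxiliary array of suffixes
}
start {
    if (/^FRAGNAME/) idx = 1   # a new sub-block restarts the suffix counter
    if (/^[OH] /) {            # atom lines starting with O or H
        $1 = $1 suf[idx++]     # append the label to the atom name
        $2 = ""                # blank the unwanted column
        $0 = $0; $1 = $1       # re-split and rebuild so no double space is left behind
    }
}1' test.txt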
H 1 2 3 4
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4
FRAGNAME=H2ODFT
Oa 2 3 4
Hb 2 3 4
Hc 2 3 4

If ed is available and acceptable, put something like the following in a script file, here called script.ed (name it whatever you like):
/\$EFRAG$/;$g/^O /s/^\([^ ]*\) [^ ]* \(.*\)$/\1a \2/\
/^H /s/^\([^ ]*\) [^ ]* \(.*\)$/\1b \2/\
/^H /s/^\([^ ]*\) [^ ]* \(.*\)$/\1c \2/
,p
Q
Now run
ed -s file.txt < script.ed
Change Q to w if in-place editing is required.
Remove the ,p to silence the output.

This might work for you (GNU sed):
sed -E '1,/\$EFRAG/b;/^O/{N;N;s/^(O) \S+(.*\nH) \S+(.*\nH) \S+/\1a\2b\3c/}' file
Do not process any lines from the start of the file through the first line containing $EFRAG.
After that, whenever a line begins with O, append the next two lines and use pattern matching with back references to drop the second field of each line and add the a, b and c labels.
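The same GNU sed command, spread over several lines with comments, may make those two steps easier to follow. This is only a sketch: sample.txt is a hypothetical scratch file created in a temporary directory so the snippet can run on its own, and both -E and \S assume GNU sed.

```shell
# Build a small sample input (hypothetical file in a temp directory).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.txt" <<'EOF'
H 1 2 3 4
C 1 2 3 4
$END
$EFRAG
COORD=CART
FRAGNAME=H2ODFT
O 1 2 3 4
H 1 2 3 4
H 1 2 3 4
EOF

result=$(sed -E '
  # From line 1 through the first line containing $EFRAG: skip the rest of the script.
  1,/\$EFRAG/b
  # A line starting with O opens an O/H/H group.
  /^O/{
    # Pull the next two lines into the pattern space.
    N
    N
    # Drop the second field of each line and append a, b, c to the atom names.
    s/^(O) \S+(.*\nH) \S+(.*\nH) \S+/\1a\2b\3c/
  }
' "$tmpdir/sample.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Lines up to and including $EFRAG are printed unchanged because the b command with no label jumps to the end of the script.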

Related

How to sort the file based on last column in unix using sort command?

a 1
b 2 4
c 3
d 4 5 7
e 4 6
f 5
How can we produce the output below, sorted by the last column, using sort?
a 1
c 3
b 2 4
f 5
e 4 6
d 4 5 7
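We can achieve the result by using awk to prepend the last field to each line, sorting numerically on it, and then cutting it off again:
$ awk '{print $NF,$0}' file.txt | sort -n | cut -f2- -d' '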
a 1
c 3
b 2 4
f 5
e 4 6
d 4 5 7
Alternatively, try the following. Note that rev reverses every character on the line, so this only sorts correctly while the last column is a single digit.
rev Input_file | sort -nk1.1 | rev

Bash shell: use the shell to convert three formats in one script into another script in a single pass

cat file1.txt
set A B 1
set C D E 2
set E F 3 3 3 3 3 3
cat file2.txt
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;
Please help me convert the format in file1.txt to the format in file2.txt; file2.txt is the desired output. I have included only 3 lines of file1.txt as an example, but in fact there are many command lines that follow these 3 formats, so the shell command should handle any file1.txt whose content uses these 3 formats.
echo "set A B 1
set C D E 2
set E F 3 3 3 3 3 3 " | sed -r 's/set (.) /\1;/;s/([A-Z])*( ([A-Z]))/\1.\3/g;s/([A-Z]) ([0-9])/\1;\2/;s/ ?$/;/'
A;B;1;
C;D.E;2;
E;F;3 3 3 3 3 3;

Find and replace entries in one csv file using another with bash

Main file:
A B
C D
D A
G H
Ref file:
1 A
2 B
3 C
4 D
5 G
6 H
New file:
1 2
3 4
4 1
5 6
I want to do the replacement shown above. How can I do that using awk or some other simple command-line tool?
awk solution:
awk 'NR==FNR{ a[$2]=$1; next }{ $1=a[$1]; $2=a[$2] }1' reffile mainfile
The output:
1 2
3 4
4 1
5 6
a[$2]=$1 - capturing numbers from reffile into array indexed by letters (e.g. a["A"]=1)
$1=a[$1]; $2=a[$2] - replacing letters in mainfile with respective numbers
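A variation on the same two-file idiom, offered as a sketch rather than a replacement: if the main file contains a letter that is missing from the reference file, the command above replaces it with an empty string. Guarding each lookup with `in` leaves unknown values untouched. The temporary file names, the array name map and the extra Z entry below are made up for illustration.

```shell
# Hypothetical sample files; Z has no entry in the reference file.
tmpdir=$(mktemp -d)
printf '%s\n' '1 A' '2 B' '3 C' '4 D' > "$tmpdir/reffile"
printf '%s\n' 'A B' 'C D' 'D Z' > "$tmpdir/mainfile"

result=$(awk '
  NR == FNR { map[$2] = $1; next }     # first file: build letter -> number table
  {
    for (i = 1; i <= NF; i++)          # second file: look at every field
      if ($i in map) $i = map[$i]      # translate only when a mapping exists
  }
  1                                    # print the (possibly modified) line
' "$tmpdir/reffile" "$tmpdir/mainfile")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Looping over all fields also means the command does not need to change if the main file gains a third column.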

Add to the end of a predetermined line using sed in bash

I have a file in the format:
C 1 1 2
H 2 2 1
C 3 1 2
C 3 3 2
H 2 3 1
I need to add " f" to the end of specific lines, for example the third line, so the output would be:
C 1 1 2
H 2 2 1
C 3 1 2 f
C 3 3 2
H 2 3 1
From Googling, it seems that I need to use sed, but I couldn't find any examples on how to do specifically what I want.
Thanks in advance.
You are looking for this article on sed, specifically the section on restricting a command to a line number. An example (note the space before the f, so that " f" rather than "f" is appended):
sed '3 s/$/ f/' < yourFile
awk 'NR==3{$0=$0" f"}1' your_file
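If the line number lives in a shell variable rather than being hard-coded, both commands can take it as a parameter. This is a minimal sketch; the variable name line and the file coords.txt are assumptions made for the example.

```shell
# Hypothetical sample file in a temp directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'C 1 1 2' 'H 2 2 1' 'C 3 1 2' 'C 3 3 2' > "$tmpdir/coords.txt"

line=3

# awk: pass the shell value in with -v and compare it against NR.
with_awk=$(awk -v n="$line" 'NR == n { $0 = $0 " f" } 1' "$tmpdir/coords.txt")

# sed: double quotes let the shell expand ${line}; \$ keeps the regex anchor literal.
with_sed=$(sed "${line}s/\$/ f/" "$tmpdir/coords.txt")

printf '%s\n' "$with_awk"
rm -rf "$tmpdir"
```

Passing the value with -v avoids splicing shell text into the awk program itself.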

Get n last records and change particular columns on them

I have a file like this:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
* a
0 b
I want to delete a and b from the last two records, in the END{} section.
Result:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
How can I get n last lines and change fields on them with awk?
Here's one way using any awk:
awk -v count=$(wc -l <file.txt) 'NR > count - 2 { $2 = "" }1' file.txt
Results:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
Or, to apply the awk operation to every record except the last 2 lines of the input file from a shell script, try ./script.sh file.txt. Contents of script.sh:
command=$(awk -v count=$(wc -l <"$1") 'NR <= count - 2 { $2 = "" }1' "$1")
echo -e "$command"
Results:
1 "45554323" p b
2 "34534567" f a
3 "76546787" u b
2 "56765435" f a
* a
0 b
If you know the value of n, the line number after which you want to delete the last item (column) on each line (here 4), this will work:
awk '{if (NR>4) NF=NF-1}1' data.txt
will give:
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
NF = NF - 1 makes awk treat the line as having one field fewer than it really has, which is why the last column is no longer displayed once that condition is met. NR refers to the current line number in the file being read.
awk can't know the number of lines in a file unless it goes through it once, or is given that information (e.g., wc -l). An alternative approach would be to save the last n lines in a buffer (sort of a sliding window/tape-delay type analogy, you are always printing n lines behind) and then process the final n lines in the END block.
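The buffering idea can be sketched as follows, assuming the goal is to drop the last field from the final n lines. The names n, buf and data.txt are illustrative only, and the shortened sample file is made up so the snippet can run on its own.

```shell
# Hypothetical sample file in a temp directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/data.txt" <<'EOF'
1 2 "45554323" p b
2 2 "34534567" f a
* a
0 b
EOF

result=$(awk -v n=2 '
  NR > n { print buf[NR % n] }     # the line read n records ago cannot be in the tail: print it
  { buf[NR % n] = $0 }             # keep the current line in a ring buffer of size n
  END {
    start = (NR > n) ? NR - n + 1 : 1
    for (i = start; i <= NR; i++) {             # only the last n lines are still buffered
      line = buf[i % n]
      sub(/[ \t]+[^ \t]+[ \t]*$/, "", line)     # strip the last field, if there is more than one
      print line
    }
  }
' "$tmpdir/data.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Output always lags n lines behind the input, so the file is read only once and its length never needs to be known in advance.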
This doesn't exactly answer your question but it produces the output you require:
$ gawk '{if (NF < 3) print $1; else print}' input.txt
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
$ cat file
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
* a
0 b
$ awk 'BEGIN{ARGV[ARGC]=ARGV[ARGC-1]; ARGC++} NR==FNR{nr++; next} FNR>(nr-2) {NF--} 1' file
1 2 "45554323" p b
2 2 "34534567" f a
3 3 "76546787" u b
2 4 "56765435" f a
*
0
or if you don't mind manually specifying the file name twice:
awk 'NR==FNR{nr++; next} FNR>(nr-2) {NF--} 1' file file

Resources