Searching a string and replacing another string above the searched string - shell

I have a file with the lines below
123
456
123
789
abc
efg
xyz
I need to search for abc and replace the nearest 123 above it with 111. The requirement is this: abc occurs only once in the file, but 123 can occur multiple times and can be at any position above abc.
Please help me.
I have tried with below sed command
sed -i.bak "/abc/!{x;1!p;d;};x;s/123/1111" filename
With the above command, 123 is only replaced if it is directly above abc; if 123 is 2 lines above abc, the replacement fails.

There's more than one way to do it. Here's one:
sed -i.bak '1{h;d;};/123/{x;p;d;};/abc/{x;s/123/111/;p;d;};H;${x;p;};d' filename

ed comes in handy for complex editing of files in scripts:
ed -s file <<EOF
/^abc$/;?^123$?;.c
111
.
w
EOF
This: Sets the current line to the first one matching abc (/^abc$/;). Then changes the first line before that point that matches 123 to 111 (?XXX? searches backwards for a matching regular expression, and ?^123$?;. selects that single line for c to change) and finally saves the modified file.

This is a classic case where you keep track of the previous line and change it depending on conditions satisfied by the current line. Generally, such an awk program looks like this:
awk '(FNR==1){prev=$0; next}
(condition_on_$0) { action_on_prev }
{ print prev; prev = $0 }
END { print prev }' file
So in the case of the OP, when 123 sits directly above abc, this would read:
awk '(FNR==1){prev=$0; next}
$0 == "abc" { if (prev == "123") prev = "111" }
{ print prev; prev = $0 }
END { print prev }' file
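The program above only looks at the line directly before abc. A minimal sketch for the case where 123 sits further up, assuming the whole file fits in memory; the helper name replace_above and the awk variables line, last and found are illustrative and not part of the answer:

```shell
# Hypothetical helper: replace the last "123" that precedes the first "abc".
replace_above() {
  awk '
    { line[NR] = $0 }                        # buffer every line
    !found && $0 == "123" { last = NR }      # most recent 123 seen so far
    !found && $0 == "abc" { if (last) line[last] = "111"; found = 1 }
    END { for (i = 1; i <= NR; i++) print line[i] }
  '
}

out=$(printf '%s\n' 123 456 123 789 abc efg xyz | replace_above)
printf '%s\n' "$out"
```

With the sample from the question, only the second 123 (the one nearest to abc) becomes 111.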

This might work for you (GNU sed):
sed -Ez 's/(.*)(\n123.*\nabc)/\1\n111\2/' file
This slurps the file into memory and inserts 111 in front of the last occurrence of 123 before abc.
A less memory intensive solution:
sed -E '/^123$/{:a;N;/\n123$/{h;s///p;g;s/.*\n//;ba};/\nabc$/!ba;s/^/111\n/}' file
This gathers up lines following a line containing 123. If another line containing 123 is encountered it offloads all lines before it and begins gathering lines again. If it finds a line containing abc it inserts 111 at the front of the lines gathered so far.
Another alternative:
sed '/abc/{x;/./{s/^/111\n/p;z};x;b};/123/{x;/./p;x;h;$!d;b};x;/./{x;H;$!d};x' file

$ tac file | awk 'f && sub(/123/,"111"){f=0} /abc/{f=1} 1' | tac
123
456
111
789
abc
efg
xyz

Related

Find a match in a field and print next n fields

BASH noob here.
I have a tab separated file structured like this:
ABC DEF x 123 456
GHI x 678 910
I need to match "x" and print x plus the following two fields:
x 123 456
x 678 910
I've tried a few things but the issue that throws me off is that "x" is not always in the same field. Can please somebody help?
Thanks in advance.
If you are working in bash, then bash provides parameter expansions with substring removal that are built-in. They (along with many more) are:
${var#pattern} Strip shortest match of pattern from front of $var
${var##pattern} Strip longest match of pattern from front of $var
${var%pattern} Strip shortest match of pattern from back of $var
${var%%pattern} Strip longest match of pattern from back of $var
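A quick sketch of the four forms, using a made-up path value (the variable names are only for illustration):

```shell
var="a/b/c.tar.gz"

front_short=${var#*/}    # shortest "*/" from the front removed -> b/c.tar.gz
front_long=${var##*/}    # longest  "*/" from the front removed -> c.tar.gz
back_short=${var%.*}     # shortest ".*" from the back removed  -> a/b/c.tar
back_long=${var%%.*}     # longest  ".*" from the back removed  -> a/b/c

printf '%s\n' "$front_short" "$front_long" "$back_short" "$back_long"
```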
So in your case you want to trim the longest path from the front up to x as the pattern, e.g.
while read line || [ -n "$line" ]; do
echo "x${line##*x}"
done
Where you read each line and then trim from the front until 'x' is found (you remove the 'x' as well), so you simply output "x....." where "....." is the rest of the line (restoring the 'x')
(for large data sets, you would want to use awk or sed for efficiency reasons)
Example Use/Output
Using your sample data in a heredoc, you could do:
$ while read line || [ -n "$line" ]; do
> echo "x${line##*x}"
> done << 'eof'
> ABC DEF x 123 456
> GHI x 678 910
> eof
x 123 456
x 678 910
You can just select-copy/middle-mouse-paste the following in your xterm to test:
while read line || [ -n "$line" ]; do
echo "x${line##*x}"
done << 'eof'
ABC DEF x 123 456
GHI x 678 910
eof
Using grep -o For Simplicity
Your other option is to use grep -o, where the -o option returns only the part of the line matching the expression you provide, so
grep -o 'x.*$' file
Is another simple option, e.g.
$ grep -o 'x.*$' << 'eof'
> ABC DEF x 123 456
> GHI x 678 910
> eof
x 123 456
x 678 910
Let me know if you have any further questions.
In case you need to match only tab separated field x:
pcregrep -o '(^|\t)\Kx(\t|$).*' file
awk 'n=match($0,/(^|\t)x(\t|$)/) {$0=substr($0,n); sub(/^\t/,""); print}' file
To print only the two following fields:
pcregrep -o '(^|\t)\Kx(\t[^\t]*){2}' file
awk 'n=match($0,/(^|\t)x\t[^\t]*\t[^\t]*/) {$0=substr($0,n,RLENGTH); sub(/^\t/,""); print}' file
Could you please try following, written and tested with shown samples in GNU awk.
awk '
match($0,/[[:space:]]+x[[:space:]]+[0-9]+[[:space:]]+[0-9]+$/){
val=substr($0,RSTART,RLENGTH)
sub(/^[[:space:]]+/,"",val)
print val
}
' Input_file
OR to match more than 1 set of digits after x with spaces try following.
awk '
match($0,/[[:space:]]+x[[:space:]]+([0-9]+[[:space:]]+){1,}[0-9]+/){
val=substr($0,RSTART,RLENGTH)
sub(/^[[:space:]]+/,"",val)
print val
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/[[:space:]]+x[[:space:]]+[0-9]+[[:space:]]+[0-9]+$/){ ##Using match function to match regex here.
val=substr($0,RSTART,RLENGTH) ##Creating val which has sub string of matched regex(previous step) length.
sub(/^[[:space:]]+/,"",val) ##Substituting initial space with NULL in val here.
print val ##Printing val here.
}
' Input_file ##mentioning Input_file name here.
I need to match x and print x plus the following two fields:
Using awk without any regex:
awk 'BEGIN {FS=OFS="\t"} {for (i=1; i<=NF; ++i) if ($i == "x") break;
print $i, $(i+1), $(i+2)}' file
x 123 456
x 678 910
Or, using gnu sed:
sed -E 's/(^|.*\t)(x(\t[^\t]+){2}).*/\2/' file
x 123 456
x 678 910
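The regex-free awk loop above prints empty fields when a line has no x field. A possible variant, as a sketch only, that prints nothing for such lines and requires two fields after x; the third input line is invented for the demonstration:

```shell
out=$(printf 'ABC\tDEF\tx\t123\t456\nGHI\tx\t678\t910\nJKL\tMNO\tPQR\n' |
  awk 'BEGIN { FS = OFS = "\t" }
       { # stop at NF-2 so that two fields always follow the match
         for (i = 1; i <= NF - 2; i++)
           if ($i == "x") { print $i, $(i+1), $(i+2); break } }')
printf '%s\n' "$out"
```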
If you want to remove everything before the "x", you can run a sed command like this:
sed 's/^.*x/x/' file.txt
On each line, it finds the text matching the pattern ^.*x and replaces it with x. (Because the pattern is anchored with ^, it can match at most once per line, so the g flag is not needed.)
Breakdown of ^.*x:
^ means beginning of a line
.* a wildcard pattern that matches zero or more characters
x the character "x"
Hence it replaces everything up to and including the last "x" on the line with the new pattern, just "x".
For more info on sed's find and replace command, see https://www.cyberciti.biz/faq/how-to-use-sed-to-find-and-replace-text-in-files-in-linux-unix-shell/.
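One thing to watch with this sed approach: .* is greedy, so the match extends to the last x on the line, not the first. A small sketch with one line from the question and one invented line that contains a later x:

```shell
# Line from the question: only one x, so the result is as expected.
first=$(printf 'ABC\tDEF\tx\t123\t456\n' | sed 's/^.*x/x/')

# Invented line: "box" contains a later x, so everything before it is lost.
second=$(printf 'ABC\tx\t123\tbox\n' | sed 's/^.*x/x/')

printf '%s\n' "$first" "$second"
```

The second result is just x, which is why the field-based answers above are safer when x can appear inside other fields.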

BASH - Split file into several files based on conditions

I have a file (input.txt) with the following structure:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
...
I would like to split this file into multiple files (day.txt; month.txt; ...). Each new text file would contain all "header" lines (the one starting with >) and their content (lines between two header lines).
day.txt would therefore be:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
and month.txt:
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
I cannot use split -l in this case because the amount of lines is not the same for each category (day, month, etc.). However, each sub-category has the same number of lines (=3).
EDIT: As per OP adding 1 more solution now.
awk -F'[>_]' '/^>/{file=$2".txt"} {print > file}' Input_file
Explanation:
awk -F'[>_]' ' ##Creating field separator as > or _ in current lines.
/^>/{ file=$2".txt" } ##Searching a line which starts with > if yes then creating a variable named file whose value is 2nd field".txt"
{ print > file } ##Printing current line to variable file(which will create file name of variable file's value).
' Input_file ##Mentioning Input_file name here.
Following awk may help you on same.
awk '/^>day/{file="day.txt"} /^>month/{file="month.txt"} {print > file}' Input_file
You can set the record separator to > and then just set the file name based on the category given by $1.
$ awk -v RS=">" 'NF {f=$1; sub(/_.*$/, ".txt", f); printf ">%s", $0 > f}' input.txt
$ cat day.txt
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
$ cat month.txt
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
Here's a generic solution for >name_number format
$ awk 'match($0, /^>[^_]+_/){k = substr($0, RSTART+1, RLENGTH-2);
if(!(k in a)){close(op); a[k]; op=k".txt"}}
{print > op}' ip.txt
match($0, /^>[^_]+_/) if line matches >name_ at start of line
k = substr($0, RSTART+1, RLENGTH-2) save the name portion
if(!(k in a)) if the key is not found in array
a[k] add key to array
op=k".txt" output file name
close(op) in case there are too many files to write
print > op print input record to filename saved in op
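Note that the generic solution above switches the output file only the first time a name is seen, so it relies on each category being contiguous in the input. A sketch for interleaved categories, assuming the number of categories is small enough for awk to keep all output files open; the temporary directory and the sample lines are made up for the demonstration:

```shell
workdir=$(mktemp -d)
printf '%s\n' '>day_1' ABC '>month_1' BCD '>day_2' JKL > "$workdir/input.txt"

(
  cd "$workdir" || exit 1
  # Derive the file name from each header; files stay open, so ">" never truncates twice.
  awk '/^>/ { name = substr($0, 2); sub(/_.*/, "", name); out = name ".txt" }
       out != "" { print > out }' input.txt
)

day=$(cat "$workdir/day.txt")
month=$(cat "$workdir/month.txt")
printf '%s\n' "$day" "$month"
```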
Since each subcategory is composed of the same amount of lines, you can use grep's -A / --after flag to specify that number of lines to match after a header.
So if you know in advance the list of categories, you just have to grep the headers of their subcategories to redirect them with their content to the correct file :
lines_by_subcategory=3 # number of lines *after* a subcategory's header
for category in "month" "day"; do
grep ">$category" -A $lines_by_subcategory input.txt >> "$category.txt"
done
Note that this isn't the most efficient solution as it must browse the input once for each category. Other solutions could instead browse the content and redirect each subcategory to their respective file in a single pass.

Replace a word of a line if matched

I am given a file. If a line has "xxx" as its third word then I need to replace it with "yyy". My final output must have all the original lines with the modified lines.
The input file is-
abc xyz mno
xxx xyz abc
abc xyz xxx
abc xxx xxx xxx
The required output file should be-
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
I have tried-
grep "\bxxx\b" file.txt | awk '{if ($3=="xxx") print $0;}' | sed -e 's/[^ ]*[^ ]/yyy/3'
but this gives the output as-
abc xyz yyy
abc xxx yyy xxx
Following simple awk may help you in same.
awk '$3=="xxx"{$3="yyy"} 1' Input_file
Output will be as follows.
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
Explanation: If the 3rd field ($3) equals the string xxx, set $3 to yyy. awk works in condition/action pairs; the trailing 1 is a condition that is always true with no action given, so the default action, printing the current line, runs for every line (with the 3rd field either unchanged or replaced).
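One side effect worth knowing: assigning to a field makes awk rebuild the line with the output field separator (a single space by default), so runs of whitespace collapse on modified lines only. A small sketch with invented, irregularly spaced input:

```shell
out=$(printf 'abc   xyz  xxx\nabc   xyz  mno\n' | awk '$3=="xxx"{$3="yyy"} 1')
printf '%s\n' "$out"
```

The first line comes out as "abc xyz yyy", while the untouched second line keeps its original spacing.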
sed solution:
sed -E 's/^(([^[:space:]]+[[:space:]]+){2})xxx\>/\1yyy/' file
The output:
abc xyz mno
xxx xyz abc
abc xyz yyy
abc xxx yyy xxx
To modify the file inplace add -i option: sed -Ei ....
In general the awk command may look like
awk '{command set 1}condition{command set 2}' file
The command set 1 would be executed for every line while command set 2 will be executed if the condition preceding that is true.
My final output must have all the original lines with the modified lines
In your case
awk 'BEGIN{print "Original File";i=1}
{print}
$3=="xxx"{$3="yyy"}
{rec[i++]=$0}
END{print "Modified File";for(i=1;i<=NR;i++)print rec[i]}' file
should solve that.
Explanation
$3 is the third space-delimited field in awk. If it matches "xxx", then it is replaced. Print the unmodified lines first while storing the modified lines in an array. At the end, print the modified lines. BEGIN and END blocks are executed only at the beginning and the end respectively. NR is the awk built-in variable that holds the number of records processed so far. Since it is used in the END block, it gives us the total number of records.
All good :-)
Ravinder has already provided you with the shortest awk solution possible.
In sed, the following would work:
sed -E 's/^(([^ ]+ ){2})xxx/\1yyy/' file
Or if your sed doesn't include -E, you can use the more painful BRE notation:
sed 's/^\(\([^ ][^ ]* \)\{2\}\)xxx/\1yyy/' file
And if you're in the mood to handle this in bash alone, something like this might work:
while read -r line; do
read -r -a a <<<"$line"
[[ "${a[2]}" == "xxx" ]] && a[2]="yyy"
printf '%s ' "${a[@]}"
printf '\n'
done < input.txt

extract the data between two pattern and save it with different name [duplicate]

Using awk or sed how can I select lines which are occurring between two different marker patterns? There may be multiple sections marked with these patterns.
For example:
Suppose the file contains:
abc
def1
ghi1
jkl1
mno
abc
def2
ghi2
jkl2
mno
pqr
stu
And the starting pattern is abc and ending pattern is mno
So, I need the output as:
def1
ghi1
jkl1
def2
ghi2
jkl2
I am using sed to match the pattern once:
sed -e '1,/abc/d' -e '/mno/,$d' <FILE>
Is there any way in sed or awk to do it repeatedly until the end of file?
Use awk with a flag to trigger the print when necessary:
$ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
def1
ghi1
jkl1
def2
ghi2
jkl2
How does this work?
/abc/ matches lines having this text, as well as /mno/ does.
/abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
/mno/{flag=0} unsets the flag when the text mno is found.
The final flag is a pattern with the default action, which is to print $0: if flag is equal 1 the line is printed.
For a more detailed description and examples, together with cases when the patterns are either shown or not, see How to select lines between two patterns?.
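As a sketch of the two cases (marker lines shown or hidden), here are both flag variants side by side on a shortened, made-up version of the sample; the variable names are only for illustration:

```shell
sample=$(printf '%s\n' abc def1 mno pqr abc def2 mno stu)

# Markers included: set the flag, print, then clear the flag after printing mno.
inclusive=$(printf '%s\n' "$sample" | awk '/abc/{flag=1} flag; /mno/{flag=0}')

# Markers excluded: skip the abc line and clear the flag before printing.
exclusive=$(printf '%s\n' "$sample" | awk '/abc/{flag=1;next} /mno/{flag=0} flag')

printf '%s\n' "$inclusive" "--" "$exclusive"
```

The only difference is where the bare flag pattern sits relative to the statements that set and clear it.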
Using sed:
sed -n -e '/^abc$/,/^mno$/{ /^abc$/d; /^mno$/d; p; }'
The -n option means do not print by default.
The pattern looks for lines containing just abc to just mno, and then executes the actions in the { ... }. The first action deletes the abc line; the second the mno line; and the p prints the remaining lines. You can relax the regexes as required. Any lines outside the range of abc..mno are simply not printed.
This might work for you (GNU sed):
sed '/^abc$/,/^mno$/{//!b};d' file
Delete all lines except for those between lines starting abc and mno
sed '/^abc$/,/^mno$/!d;//d' file
golfs two characters better than ppotong's {//!b};d
The empty forward slashes // mean: "reuse the last regular expression used", and the command does the same as the more understandable:
sed '/^abc$/,/^mno$/!d;/^abc$/d;/^mno$/d' file
This seems to be POSIX:
If an RE is empty (that is, no pattern is specified) sed shall behave as if the last RE used in the last command applied (either as an address or as part of a substitute command) was specified.
From the previous response's links, the one that did it for me, running ksh on Solaris, was this:
sed '1,/firstmatch/d;/secondmatch/,$d'
1,/firstmatch/d: from line 1 until the first time you find firstmatch, delete.
/secondmatch/,$d: from the first occurrence of secondmatch until the end of file, delete.
Semicolon separates the two commands, which are executed in sequence.
something like this works for me:
file.awk:
BEGIN {
record=0
}
/^abc$/ {
record=1
}
/^mno$/ {
record=0;
print "s="s;
s=""
}
!/^(abc|mno)$/ {
if (record==1) {
s = s"\n"$0
}
}
using: awk -f file.awk data...
edit: O_o fedorqui solution is way better/prettier than mine.
Don_crissti's answer from Show only text between 2 matching pattern?
firstmatch="abc"
secondmatch="cdf"
sed "/$firstmatch/,/$secondmatch/!d;//d" infile
which is much more efficient than AWK's application, see here.
perl -lne 'print if((/abc/../mno/) && !(/abc/||/mno/))' your_file
I tried to use awk to print lines between two patterns where pattern2 also matches pattern1. The pattern1 line should also be printed.
e.g.
source
package AAA
aaa
bbb
ccc
package BBB
ddd
eee
package CCC
fff
ggg
hhh
iii
package DDD
jjj
should have an output of
package BBB
ddd
eee
Where pattern1 is package BBB, pattern2 is package \w*. Note that CCC isn't a known value so can't be literally matched.
In this case, neither @scai's awk '/abc/{a=1}/mno/{print;a=0}a' file nor @fedorqui's awk '/abc/{a=1} a; /mno/{a=0}' file works for me.
Finally, I managed to solve it with awk '/package BBB/{flag=1;print;next}/package \w*/{flag=0}flag' file, haha
A little more effort results in awk '/package BBB/{flag=1;print;next}flag;/package \w*/{flag=0}' file, which also prints the pattern2 line, that is,
package BBB
ddd
eee
package CCC
This can also be done with logical operations and increment/decrement operations on a flag:
awk '/mno/&&--f||f||/abc/&&f++' file

Search file, show matches and first line

I've got a semicolon-separated text file, which contains the column headers in the first line:
column1;column2;colum3
foo;123;345
bar;345;23
baz;089;09
Now I want a short command that outputs the first line and the matching line(s). Is there a shorter way than:
head -n 1 file ; cat file | grep bar
This should do the job:
sed -n '1p;2,${/bar/p}' file
where:
1p will print the first line
2,$ will match from second line to the last line
/bar/p will print those lines that match bar
Note that this won't print the header line twice if there's a match in the columns names.
This might work for you:
cat file | awk 'NR<2;$0~v' v=baz
column1;column2;colum3
baz;089;09
Usually cat file | ... is useless but in this case it keeps the file argument out of the way and allows the variable v to be amended quickly.
Another solution:
cat file | sed -n '1p;/foo/p'
column1;column2;colum3
foo;123;345
You can use grouping commands, then pipe to column command for pretty-printing
$ { head -1; grep bar; } <input.txt | column -ts';'
column1 column2 colum3
bar 345 23
What if the first row contains bar too? Then it's printed two times with your version. awk solution:
awk 'NR == 1 { print } NR > 1 && $0 ~ "bar" { print }' FILE
If you want the search string as almost the last item on the line:
awk 'ARGIND > 1 { exit } NR == 1 { print } NR > 1 && $0 ~ ARGV[2] { print }' FILE YOURSEARCHSTRING 2>/dev/null
sed solution:
sed -n '1p;1d;/bar/p' FILE
The advantage of both of them is that each runs as a single process.
head -n 1 file && grep bar file
Maybe there is an even shorter version, but it will get a bit complicated.
EDIT: as per bobah 's comment I have added && between the commands to have only a single error for missing file
Here is the shortest command yet:
awk 'NR==1||/bar/' file
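If the search should only apply to one column, a possible variation (a sketch, assuming semicolon-separated fields as in the sample; the last data row is altered here to contain bar in another column):

```shell
out=$(printf '%s\n' 'column1;column2;colum3' 'foo;123;345' 'bar;345;23' 'baz;089;bar' |
  awk -F';' 'NR == 1 || $1 == "bar"')   # header, plus rows whose first field is exactly bar
printf '%s\n' "$out"
```

Unlike /bar/, the field comparison does not pick up the row that merely contains bar somewhere else.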

Resources