I want to delete all multiline occurrences of a pattern like
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 222
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
I want to delete all portions between START-TAG and END-TAG that contain specific IDs.
So to delete ID: 222 only this would remain:
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
I have a blacklist of IDs that should be removed.
I assume a quite simple multiline sed regex script would do it. Can anyone help?
It is very similar to Question: sed multiline replace but not the same.
You can use the following:
sed '/{START-TAG/{:a;N;/END-TAG}/!ba};/ID: 222/d' data.txt
Breakdown:
/{START-TAG/ { # Match '{START-TAG'
:a # Create label a
N # Read next line into pattern space
/END-TAG}/! # If not matching 'END-TAG}'...
ba # Then goto a
} # End /{START-TAG/ block
/ID: 222/d # If pattern space matched 'ID: 222' then delete it.
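The one-liner relies on GNU sed accepting ; and } directly after a label. A more portable spelling, as a sketch, puts each command in its own -e expression so that the label ends with its expression. The function name delete_block and the sample input are made up for illustration, and the ID is assumed to consist of digits only because it is pasted into the regex as-is.

```shell
# Same script as the one-liner, one -e per command.
# $1 is the ID to remove (assumed to be plain digits).
delete_block() {
  sed -e '/{START-TAG/{' \
      -e ':a' \
      -e 'N' \
      -e '/END-TAG}/!ba' \
      -e '}' \
      -e "/ID: $1/d"
}

# Hypothetical sample input with two blocks; only the 111 block survives.
printf '%s\n' '{START-TAG' 'foo bar' 'ID: 111' 'END-TAG}' \
              '{START-TAG' 'foo bar' 'ID: 222' 'END-TAG}' | delete_block 222
```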
Don't use sed for anything that involves multiple lines; just use awk for a robust, portable solution. Given the sample input from the question you referenced, if the blocks are always separated by blank lines:
$ awk -v RS= -v ORS='\n\n' '!/ID: 222/' file
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
Otherwise:
$ awk '/{START-TAG/{f=1} f{rec=rec $0 ORS} /END-TAG}/{if (rec !~ /ID: 222/) print rec; rec=f=""}' file
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
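The question mentions a whole blacklist of IDs rather than a single one. A minimal sketch of how the second awk approach might be extended, assuming the blacklist is passed as one space-separated string and that each block carries its ID on a line of the form "ID: <number>" (the function name drop_blocks is made up here):

```shell
# Drop every block whose "ID: <n>" line names a blacklisted ID.
# Assumption: the blacklist arrives as one space-separated string in $1.
drop_blocks() {
  awk -v ids="$1" '
    BEGIN { n = split(ids, list, " "); for (i = 1; i <= n; i++) bad[list[i]] = 1 }
    index($0, "{START-TAG") { inblock = 1; rec = ""; drop = 0 }
    !inblock { print; next }                 # lines outside any block pass through
    { rec = rec $0 ORS }                     # buffer the current block
    $1 == "ID:" && ($2 in bad) { drop = 1 }  # remember that this block is unwanted
    index($0, "END-TAG}") { if (!drop) printf "%s", rec; inblock = 0 }
  '
}

# Hypothetical sample input; blocks 222 and 333 are removed.
printf '%s\n' '{START-TAG' 'ID: 111' 'END-TAG}' \
              '{START-TAG' 'ID: 222' 'END-TAG}' \
              '{START-TAG' 'ID: 333' 'END-TAG}' | drop_blocks "222 333"
```

index() is used instead of a regex here so that the braces in the tags never need escaping.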
I have a document whose lines have different lengths:
TITLE
NAME VALUE foo bar foo bar foo bar foo bar foo bar foo bar
NAME VALUE foo2 bar2 foo2 bar2
NAME VALUE foo3 bar3
I want to delete the title and the first two fields, then print a newline after every two fields, like this:
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
My actual output is:
foo bar foo bar foo bar foo bar foo bar foo bar
foo2 bar2 foo2 bar2
foo3 bar3
With this code:
awk -F' ' 'NR>1, NF>2 {
s = ""; for(i = 3; i <= NF; i++) s = s $i " "; print s
}' file_input.txt > file_output.txt
I haven't found a solution.
Can someone help me?
First time on Stack Overflow!
Thank you.
I suggest:
awk 'NR>1{ for(i=3; i<=NF; i=i+2){ print $i,$(i+1) } }' file
Output:
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
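The loop walks the fields two at a time. If a line could ever carry an odd number of trailing fields, $(i+1) would be empty for the last one and that output line would end in a stray separator. A variant that guards against this might look like the following sketch (the function name pairs and the sample line are made up for illustration):

```shell
# Print fields 3..NF in pairs, skipping the title line.
pairs() {
  awk 'NR > 1 {
    for (i = 3; i <= NF; i += 2) {
      if (i < NF) print $i, $(i + 1)   # a complete pair
      else        print $i             # a leftover single field
    }
  }' "$@"
}

# Hypothetical input with an odd number of trailing fields.
printf '%s\n' 'TITLE' 'NAME VALUE foo bar foo bar baz' | pairs
```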
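Another option uses GNU awk with FPAT set to a pattern that matches two fields delimited by one or more whitespace characters, so that each pair becomes one field.
The title is a single word, so the pattern never matches it and nothing is printed for that line; on the other lines the first pair (NAME VALUE) is skipped by starting the loop at 2: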
awk -v FPAT='\\S+\\s+\\S+' '{for(i=2;i<=NF;i++) print $i}' file
Output
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
Or starting from the second line:
awk -v FPAT='\\S+\\s+\\S+' 'NR>1{for(i=2;i<=NF;i++) print $i}' file
A sed alternative, though it won't win on readability compared to awk:
$ sed -E '1d;s/\w+ \w+ //;s/( \w+) /\1\n/g' file
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
I'm trying to prepare a file for a report. What I have is like this:
foo
bar bar oof
bar oof
foo
bar bar
I'm trying to get an output like this:
foo bar bar oof
bar oof
foo bar bar
I wanted to search for a string, in this case 'foo', and within the line where the string is found I have to remove the newline.
I did search but I can only find solutions where 'foo' is also removed. How can I do this?
Using awk:
awk -v search='foo' '$0 ~ search{printf $0; next}1' infile
If your lines have no trailing space before the newline, use printf $0 OFS as below so that the joined lines are separated by a space:
awk -v search='foo' '$0 ~ search{printf $0 OFS; next}1' infile
Test Results:
$ cat infile
foo
bar bar oof
bar oof
foo
bar bar
$ awk -v search='foo' '$0 ~ search{printf $0; next}1' infile
foo bar bar oof
bar oof
foo bar bar
Explanation:
-v search='foo' - sets the variable search
$0 ~ search - true if the current record matches the regexp held in search
{printf $0; next} - prints the current record without the record separator and moves on to the next line
1 - the 1 at the end triggers the default action, which is to print the current record
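One caveat with printf $0: the record is used as the format string, so a line containing % would be misinterpreted. A slightly more defensive sketch passes the record as data to a "%s " format, which also supplies the separating space (the function name join_after is made up here):

```shell
# Join every line matching the regexp in $1 with the line that follows it.
join_after() {
  awk -v search="$1" '$0 ~ search { printf "%s ", $0; next } 1'
}

# Sample data from the question.
printf '%s\n' 'foo' 'bar bar oof' 'bar oof' 'foo' 'bar bar' | join_after foo
```

Note that if the very last line matches, the output ends without a trailing newline, just as with the original command.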
You can do this quite easily with sed, for example:
$ sed '/^foo$/N;s/\n/ /' file
foo bar bar oof
bar oof
foo bar bar
Explanation
/^foo$/ find lines containing only foo
N read/append next line of input into pattern space.
s/\n/ / substitute the '\n' with a space.
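One edge case: if foo happens to be the last line, there is no next line for N to read. GNU sed prints the pattern space in that situation, but as far as I know other implementations may exit without printing it. Guarding N with $! avoids the difference. A sketch, with the function name join_foo made up for illustration:

```shell
# Join a line consisting only of "foo" with the following line,
# but skip N on the last line of input.
join_foo() {
  sed -e '/^foo$/{' -e '$!N' -e 's/\n/ /' -e '}'
}

# Hypothetical input where the last line is a lone foo.
printf '%s\n' 'foo' 'bar bar oof' 'foo' | join_foo
```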
Assume a text file file which contains multiple discrete number ranges, one per line. Each range is preceded by a string (i.e., the range name). The lower and upper bound of each range is separated by a dash. Each number range is succeeded by a semi-colon. The individual ranges are sorted (i.e., range 101-297 comes before 1299-1301) and do not overlap.
$ cat file
foo 101-297;
bar 1299-1301;
baz 1314-5266;
Please note that in the example above the three ranges do not form a continuous range that starts at integer 1.
I believe that awk is the appropriate tool to fill the missing number ranges such that all ranges taken together form a continuous range from {1} to {upper bound of the last range}. If so, what awk command/function would you use to perform the task?
$ cat file | sought_awk_command
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
--
Edit 1: Upon closer evaluation, the code suggested below fails at another simple example.
$ cat example2
foo 101-297;
bar 1299-1301;
baz 1302-1314; # Notice that ranges "bar" and "baz" are continuous to one another
qux 1399-5266;
$ awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' example2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
baz 1302-1314;
new3 1302-1398; # ERROR HERE: Notice that range "new3" has a lower bound derived from the upper bound of "bar" (1301), not of "baz" (1314).
qux 1399-5266;
--
Edit 2: Many thanks to RavinderSingh13 for assistance with solving this question. However, the suggested code still generates output inconsistent with the given objective.
$ cat example3
foo 35025-35144;
bar 35259-35375;
baz 35376-35624;
qux 37911-39434;
$ awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' example3
new1 1-35024;
foo 35025-35144;
new2 35145-35258;
bar 35259-35375;
new3 35376-35375; # ERROR HERE: Notice that range "new3" has been added, even though ranges "bar" and "baz" are contiguous.
baz 35376-35624;
new4 35625-37910;
qux 37911-39434;
Try:
awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' Input_file
EDIT: Adding a non-one-liner version of the same solution, with an explanation.
awk -F'[ -]' ' ###Setting field separator as space, dash here.
$3-Q>1{ ###Checking whether the 3rd field minus variable Q is greater than 1; if yes, perform the following.
print "new"++o,Q+1"-"$3-1";"; ###Printing the string new with an incrementing value of variable o, then Q plus 1, a dash, the current $3 minus 1, and a semicolon.
Q=$4 ###Assigning the 4th field of the current line to variable Q.
}
1 ###printing the current line here.
' Input_file ###Mentioning the Input_file here too.
EDIT2: Adding one more answer as per the OP's condition.
awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' Input_file
This has no problem with ranges that can overlap as you showed in your original example2 where bar 1299-1301; and baz 1301-1314; overlapped at 1301.
$ cat tst.awk
{ split($2,curr,/[-;]/); currStart=curr[1]; currEnd=curr[2] }
currStart > (prevEnd+1) { print "new"++cnt, prevEnd+1 "-" currStart-1 ";" }
{ print; prevEnd=currEnd }
$ awk -f tst.awk file
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
$ awk -f tst.awk example2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
baz 1301-1314;
new3 1315-1398;
qux 1399-5266;
$ awk -f tst.awk example3
new1 1-35024;
foo 35025-35144;
new2 35145-35258;
bar 35259-35375;
baz 35376-35624;
new3 35625-37910;
qux 37911-39434;
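To confirm that a result really forms one continuous range starting at 1, a small checker can help. The sketch below reports every line whose lower bound is not exactly one more than the previous upper bound, which covers both gaps and overlaps. The function name check_ranges is made up here, and the input is assumed to have the same "name lower-upper;" layout as above.

```shell
# Exit status 0 if the ranges run 1..N without gaps or overlaps, 1 otherwise.
check_ranges() {
  awk '
    { split($2, r, /[-;]/); lo = r[1] + 0; hi = r[2] + 0 }
    lo != prev + 1 { printf "not contiguous at: %s\n", $0; bad = 1 }
    { prev = hi }
    END { exit (bad ? 1 : 0) }
  ' "$@"
}

# Expected output from the question; prints "contiguous".
printf '%s\n' 'new1 1-100;' 'foo 101-297;' 'new2 298-1298;' \
              'bar 1299-1301;' 'new3 1302-1313;' 'baz 1314-5266;' |
  check_ranges && echo "contiguous"
```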
$ cat file1
foo 2-100
bar 102-200
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file1
new1 1-1;
foo 2-100
new2 101-101;
bar 102-200
$ cat file2
foo 101-297;
bar 1299-1301;
baz 1314-5266;
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
Explained:
$ awk -F' +|[-;]' ' # FS is ; - or a bunch of spaces
p+1 < $2 { # if p revious $3+1 is still less than new $2
print "new"++q,p+1 "-" $2-1 ";" # print a "new" line
}
p=$3 # set future p and implicit print of record *
' file2 # * as all values are above 0
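The implicit print depends on p=$3 evaluating to a non-zero value, as the last comment notes. If you would rather not rely on that, a sketch with an explicit print behaves the same on the samples above (the function name fill_gaps and the variable names are made up for illustration):

```shell
# Insert "newN" ranges wherever the lower bound leaves a gap after the
# previous upper bound, then print the record itself unconditionally.
fill_gaps() {
  awk -F' +|[-;]' '
    prev + 1 < $2 { n++; print "new" n, (prev + 1) "-" ($2 - 1) ";" }
    { prev = $3; print }
  ' "$@"
}

# Sample data from the question.
printf '%s\n' 'foo 101-297;' 'bar 1299-1301;' 'baz 1314-5266;' | fill_gaps
```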
Please help - I'm stuck.
I need to delete all lines of a text file that do NOT contain foo or Foo, starting from the 2nd line of the file.
infile:
first line
foobar
tree
fish
Foo Bar
Football
Foobar
Street
foo bar
outfile:
first line
foobar
Foo Bar
Football
Foobar
foo bar
I tried the following:
sed '2,$/*.foo.*\|.*Foo.*/!d' -i test.txt
The resulting error is:
sed: -e expression #1, char 4: unknown command: `/'
What's my mistake?
(awk would be a possible alternative, too.)
sed approach:
sed -e '2,${/[fF]oo/!d}' file
-e script (--expression=script)
Add the commands in script to the set of commands to be run while
processing the input.
2,$ - an address range, considers the lines from the second line to the end
[fF] - character class, matches either f or F
!d - deletes the lines that don't match, i.e. those containing neither foo nor Foo
awk 'NR>1 && !/foo|Foo/ {next} 1' oldfile > newfile
Note: If you use csh or tcsh, you need to protect the ! with a backslash:
awk 'NR>1 && \!/foo|Foo/ {next} 1' oldfile > newfile
A good option is to invert your requirements - delete all lines, but print first line and all lines with foo or Foo.
sed -n '1p; /foo\|Foo/p'
This one will print the first line twice if it contains foo or Foo, but that can be easily fixed if needed.
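One possible way to apply that fix, as a sketch: print line 1 and then delete it, so that the pattern test is never reached for the first line. The function name keep_first_and_foo is made up for illustration.

```shell
# Print the first line exactly once, plus any later line with foo or Foo.
keep_first_and_foo() {
  sed -n -e '1p' -e '1d' -e '/[fF]oo/p' "$@"
}

# Hypothetical input whose first line itself contains foo.
printf '%s\n' 'first foo line' 'tree' 'Foo Bar' | keep_first_and_foo
```

The bracket expression [fF]oo is used instead of foo\|Foo because \| in a basic regular expression is a GNU extension.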
An alternative without sed or awk (I don't know whether they are mandatory in your case): first copy the first line of your text file into a new file:
head -n 1 textfile > newfile
Then you can append all the content with foo or Foo to the new file:
grep "foo\|Foo" textfile >> newfile
So you will have the desired content in newfile.
If you want to have it in your original file then you can move it:
mv newfile textfile
If the first line contains foo or Foo it will be printed twice, but since you stated that you want to keep the first line regardless, I assume it contains neither foo nor Foo.
In awk. First some test data:
$ cat file
1 begin
2 asd
3 foo
4 sdf
5 Foo
6 end
The code prints the first record, plus any later record that contains foo or Foo:
$ awk 'NR==1 || /[fF]oo/' file
1 begin
3 foo
5 Foo
Another awk in the forest:
awk '!a++||/[fF]oo/' infile
Here is your code:
sed '2,$/*.foo.*\|.*Foo.*/!d'
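The issue here is that you are combining two forms of line addressing, namely the numeric range (addr1,addr2) and the matching form (/REGEX/), without grouping them: after 2,$ sed expects a command, not another address. In addition, the regular expression starts with *. where .* was presumably intended, and the surrounding .* are unnecessary anyway.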
Here is how I would solve it:
sed '1b; /[fF]oo/!d' infile
Output:
first line
foobar
Foo Bar
Football
Foobar
foo bar
awk '/^[fF]/&&!/fish/' file
first line
foobar
Foo Bar
Football
Foobar
foo bar
Assume a text file that contains specific lines whose word order should be altered. The words (substrings) are delimited by single whitespaces. The lines to be altered can be identified by their first character (e.g., ">").
# cat test.txt
>3 test This is
foo bar baz
foo bar qux
>2 test This is
foo bar baz
>1 test This is
foo bar qux
What command (probably in awk) would you use to apply the same ordering process across all lines starting with the key character?
# cat test.txt | sought_command
>This is test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux
Here's one way you could do it using awk:
awk 'sub(/^>/, "") { print ">"$3, $4, $2, $1; next } 1' file
sub returns true (1) when it makes a substitution. The 1 at the end is the shortest true condition, to trigger the default action { print }.
According to your example, like this:
awk '$1~"^>" {sub(">","",$1);print ">"$3,$4,$2,$1;next} {print}' test.txt
The tool best suited for simple substitutions on individual lines is sed:
$ sed -E 's/>([^ ]+)( [^ ]+ )(.*)/>\3\2\1/' file
>This is test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux
Awk is the right tool for anything more complicated/interesting. Note that unlike the awk solutions you've received so far the above will continue to work if/when you have more than 4 "words" on a line, e.g.:
$ cat file
>3 test Now is the Winter of our discontent
foo bar baz
foo bar qux
>2 test This is
foo bar baz
>1 test This is
foo bar qux
$ sed -E 's/>([^ ]+)( [^ ]+ )(.*)/>\3\2\1/' file
>Now is the Winter of our discontent test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux
$ awk 'sub(/^>/, "") { print ">"$3, $4, $2, $1; next } 1' file
>Now is test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux
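If you prefer to stay with awk, the four-word limitation can be lifted by looping over the remaining fields instead of naming them. A minimal sketch, assuming the first two words always move to the end in swapped order (the function name reorder and the variable names are made up for illustration):

```shell
# On lines starting with ">", move the first two words to the end in
# reverse order and keep everything else in place.
reorder() {
  awk '/^>/ {
    id = substr($1, 2)                 # first word without the leading ">"
    label = $2
    rest = ""
    for (i = 3; i <= NF; i++) rest = rest $i " "
    print ">" rest label, id
    next
  } 1' "$@"
}

# Sample lines from above, including the long one.
printf '%s\n' '>3 test Now is the Winter of our discontent' 'foo bar baz' \
              '>2 test This is' 'foo bar qux' | reorder
```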