How to match and delete some if statements from a file based on pattern matching - shell

I have the following code:
if (temp==1) {
some text
}
some more text
abcdef
if (temp==1) {
some text
}
if (temp2==1) {
some text
}
I need a script or command that deletes all the if (temp==1) statements.
Required output:
some more text
abcdef
if (temp2==1) {
some text
}
What I can already achieve is the following:
grep -zPo "if\ \(temp==1\) (\{([^{}]++)*\})" filename
and I get the following output:
if (temp==1) {
some text
}
if (temp==1) {
some text
}
The perl command gives the same result:
perl -l -0777 -ne "print $& while /if \(temp==1\) (\{([^{}]++|(?1))*\})/g" filename
Now I need to delete the matched lines from the file.
So every if (temp2==1) block must be retained and every if (temp==1) block must be deleted.
How can I do this?

What you're asking for is impossible in general without a parser for whatever language that code is written in, but you can produce the output you want from that specific input using any awk on any UNIX box with:
awk '/if \(temp==1/{f=1} !f; /}/{f=0}' file
if that's all you want.

You probably can use sed to do this:
$ sed '/temp==1/,/}/d' inputfile
some more text
abcdef
if (temp2==1) {
some text
}
The above deletes (with d) all lines from a line matching /temp==1/ through the next line matching /}/, inclusive.
Note: it will not work with nested blocks, which the OP brings up in a comment. For that case, one could do the following:
$ sed '/temp==1/,/}/d;/}/,/}/d' 1.txt
This additionally removes any text between two }s, along with the } lines themselves.
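For nested blocks, counting braces is one way to go further than a line-range match. The following is only a minimal Python sketch, not a parser: it assumes braces never appear inside strings or comments, and the function name drop_if_blocks and its condition parameter are made up for illustration.

```python
import re

def drop_if_blocks(text, condition="temp==1"):
    """Drop each `if (<condition>) {...}` block, following nested braces."""
    start = re.compile(r"if\s*\(\s*" + re.escape(condition) + r"\s*\)")
    kept = []
    skipping = False   # True while inside a block being removed
    opened = False     # True once the block's first "{" has been seen
    depth = 0          # current brace nesting depth inside that block
    for line in text.splitlines():
        if not skipping and start.search(line):
            skipping, opened, depth = True, False, 0
        if skipping:
            opened = opened or "{" in line
            depth += line.count("{") - line.count("}")
            if opened and depth <= 0:
                skipping = False   # the block's closing brace was on this line
            continue               # never keep a line that belongs to the block
        kept.append(line)
    return "\n".join(kept)
```

Calling drop_if_blocks(open("filename").read()) on the sample input should leave only the if (temp2==1) block and the two plain lines, since the pattern for temp==1 does not match temp2==1.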

Related

Split portions from file to separate file in bash

Starting from "Split portion of string in bash" and making some code changes, I managed to achieve the goal.
Now I want to save the text in separate files.
I tried:
awk '/[code]:/{flag=1} flag; /[/code]:/{flag=0}{x="/home/user/split/File"++i".txt";}{print > x;}' /home/user/bigfile.nfo
but I got many files with one line or none (empty 0-byte files).
How do I write all the content between [code] and [/code] to a separate file? I expect as many files to be created as there are blocks between those tags.
Where is the mistake in my code?
The bigfile content:
blavbl
[code]
sdasdasd
asdasd
...
[/code]
line X
line Y
etc
...
[code]
...
test
test
[/code]
blabla
[code]
Single line
[/code]
After running the script I get some files with one line instead of all the text between the tags.
I expect to have:
File1.txt
sdasdasd
asdasd
...
File2.txt
...
test
test
File3.txt
Single line
Etc
A few issues with OP's current code:
the characters [, ] and / have special meaning in awk regex patterns; one solution is to escape said characters when looking for them as literal characters
OP should make sure a file's descriptor is closed once no more output is going to said file (this should keep awk from (potentially) crashing due to 'out of file descriptor' errors)
OP's current patterns include a trailing : but no such character exists in OP's sample input (ie, [code]: will not match [code])
One awk idea:
awk '
/^\[code\]/ { outfile="/home/user/split/File" ++i ".txt"; next }
/^\[\/code\]/ { close(outfile); outfile=""; next }
outfile { print > outfile }
' bigfile.nfo
NOTE: technically ] (sans the escape \) should also work
This generates:
$ head File*.txt
==> File1.txt <==
sdasdasd
asdasd
...
==> File2.txt <==
...
test
test
==> File3.txt <==
Single line
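The same state machine can be sketched in Python, in case awk is not a hard requirement. This is a rough equivalent under the same assumptions as the awk idea (tags sit at the start of their own lines); the names split_code_blocks and write_blocks are hypothetical.

```python
from pathlib import Path

def split_code_blocks(lines):
    """Return one list of lines per [code] ... [/code] block."""
    blocks = []
    current = None                      # None means "not inside a block"
    for line in lines:
        stripped = line.rstrip("\n")
        if stripped.startswith("[code]"):
            current = []                # start collecting a new block
        elif stripped.startswith("[/code]"):
            if current is not None:
                blocks.append(current)  # block finished, keep it
            current = None
        elif current is not None:
            current.append(stripped)
    return blocks

def write_blocks(blocks, out_dir):
    """Write block i to <out_dir>/File<i>.txt, numbering from 1."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, block in enumerate(blocks, start=1):
        (out / f"File{i}.txt").write_text("\n".join(block) + "\n")
```

A call such as write_blocks(split_code_blocks(open("bigfile.nfo")), "/home/user/split") would then create one file per block; each file is written and closed in one step, so no file handles pile up.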

How to replace a whole line (between 2 words) using sed?

Suppose I have text as:
This is a sample text.
I have 2 sentences.
text is present there.
I need to replace whole text between two 'text' words. The required solution should be
This is a sample text.
I have new sentences.
text is present there.
I tried using the command below, but it's not working:
sed -i 's/text.*?text/text\
\nI have new sentence/g' file.txt
With your shown samples, please try the following. sed doesn't support lazy matching in regex. With awk's RS set to the empty string (paragraph mode) you can do the substitution, at least for your shown samples. Create a variable val that holds the new line; a simple sub() call then does the rest to produce your expected output.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file
The above code prints the output to the terminal. Once you are happy with the results and want to save the output into Input_file itself, use the following code.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file > temp && mv temp Input_file
You have already solved your problem using awk, but in case anyone else will be looking for a sed solution in the future, here's a sed script that does what you needed. Granted, the script is using some advanced sed features, but that's the fun part of it :)
replace.sed
#!/usr/bin/env sed -nEf
# This pattern determines the start marker for the range of lines where we
# want to perform the substitution. In our case the pattern is any line that
# ends with "text." — the `$` symbol meaning end-of-line.
/text\.$/ {
# [p]rint the start-marker line.
p
# Next, we'll read lines (using `n`) in a loop, so mark this point in
# the script as the beginning of the loop using a label called `loop`.
:loop
# Read the next line.
n
# If the last read line doesn't match the pattern for the end marker,
# just continue looping by [b]ranching to the `:loop` label.
/^text/! {
b loop
}
# If the last read line matches the end marker pattern, then just insert
# the text we want and print the last read line. The net effect is that
# all the previous read lines will be replaced by the inserted text.
/^text/ {
# Insert the replacement text
i\
I have a new sentence.
# [print] the end-marker line
p
}
# Branch to the end of the script (a bare `b` ends this cycle), so that we
# don't hit the [p]rint command below.
b
}
# Print all other lines.
p
Usage
$ cat lines.txt
foo
This is a sample text.
I have many sentences.
I have many sentences.
I have many sentences.
I have many sentences.
text is present there.
bar
$
$ ./replace.sed lines.txt
foo
This is a sample text.
I have a new sentence.
text is present there.
bar
Substitute
sed -i 's/I have 2 sentences./I have new sentences./g'
sed -i 's/[A-Z]\s[a-z].*/I have new sentences./g'
Insert
sed -i -e '2iI have new sentences.' -e '2d'
I need to replace whole text between two 'text' words.
If I understand correctly, the first text. (with a dot) is at the end of the first line and the second text is at the beginning of the third line. With awk you can get the required output by accumulating the lines in the variable s:
awk -v s='\nI have new sentences.\n' '/text.?$/ {s=$0 s;next} /^text/ {s=s $0;print s;s=""}' file
This is a sample text.
I have new sentences.
text is present there.
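For comparison, the marker logic used by the sed and awk answers can also be written as a short Python sketch. It assumes, as those answers do, that the start marker is a line ending in "text." and the end marker is a line starting with "text"; the function name replace_between and its parameters are made up for illustration.

```python
def replace_between(lines, replacement, start_suffix="text.", end_prefix="text"):
    """Replace every line strictly between a start and an end marker line."""
    out = []
    inside = False                       # True between the two marker lines
    for line in lines:
        if not inside:
            out.append(line)
            if line.endswith(start_suffix):
                inside = True
                out.append(replacement)  # emit the new text once, right away
        elif line.startswith(end_prefix):
            out.append(line)             # keep the end marker itself
            inside = False
        # any other line between the markers is dropped
    return out
```

Note that if a start marker is never followed by an end marker, everything after it is dropped, so this is only suitable when the markers are known to come in pairs.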

How to get paragraphs of text by index number

I am wondering if there is a way to get paragraphs of text (source file would be a pyx file) by number as sed does with lines
sed -n ${i}p
At this moment I'd be interested to use awk with:
awk '/custom-pyx-tag\(/,/\)custom-pyx-tag/'
but I can't find documentation or examples about that.
I'm also trying to trim "\r\n" with gsub(/\r\n/,"; ") in the same awk command, but it doesn't work and I can't really figure out why.
Any hint would be very appreciated, thanks
EDIT:
This is just one example and not my exact need but I would need to know how to do it for a multipurpose project
Let's take the case that I have exported the ID3Tags of a huge collection of audio files and these have been stored in a pyx-like format, so in the end I will have a nice big file with this pattern repeating for each file in the collection:
audio-genre(
blablabla
)audio-genre
audio-artist(
bla.blabla
)audio-artist
audio-album(
bla-bla-bla
)audio-album
audio-track-num(
0x
)audio-track-num
audio-track-title(
bla.bla-bla
)audio-track-title
audio-lyrics(
blablablablabla
bla.bla.bla.bla
blah-blah-blah
blabla-blabla
)audio-lyrics
...
Now if I want to extract the artist of the 1234th audio file I can use:
awk '/audio-artist\(/, /)audio-artist/' | sed '/audio-artist/d' | sed -n 1234p
Since the artist is a single line, it can be picked out with sed, but I don't know how to get an entire paragraph given its index. For example, how could I get the lyrics of the 6543rd file?
In the end it is just a question of whether there is a command equivalent to
sed -n ${num}p
but to be used for paragraphs
awk -v indx=1024 'BEGIN {
RS=""
}
{ split($0,arr,"audio-artist");
for (i=2;i<=length(arr);i=i+2)
{ gsub("[()]","",arr[i]);
arts[cnt+=1]=arr[i]
}
}
END {
print arts[indx]
}' audioartist
One liner:
awk -v indx=1234 'BEGIN {RS=""} NR==1 { split($0,arr,"audio-artist");for (i=2;i<=length(arr);i=i+2) { gsub("[()]","",arr[i]);arts[cnt+=1]=arr[i] } } END { print arts[indx] }' audioartist
Using awk and the file called audioartist, we set the record separator (RS) to "", which puts awk into paragraph mode; assuming the file contains no blank lines, the whole file is consumed as a single record. We then split that record into an array arr, using audio-artist as the separator. We loop through arr starting from index 2 in steps of 2 until the end of the array and strip out the opening and closing brackets, creating another array called arts with an incrementing count as the index and the stripped artist as the value. At the end we print the arts entry specified by the passed indx variable (in this case 1234).
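For multi-line paragraphs such as the lyrics, a regular expression that captures everything between the opening and closing tag may be simpler than splitting. The following Python sketch assumes the tag( and )tag lines look exactly as in the sample; the function name nth_block is hypothetical.

```python
import re

def nth_block(text, tag, n):
    """Return the body of the n-th (1-based) `tag( ... )tag` block in text."""
    if n < 1:
        raise IndexError("n is 1-based")
    pattern = re.compile(
        re.escape(tag) + r"\(\r?\n(.*?)\r?\n\)" + re.escape(tag),
        flags=re.DOTALL,   # let . match newlines so a body can span lines
    )
    bodies = pattern.findall(text)
    return bodies[n - 1]   # raises IndexError when there are fewer than n blocks
```

With the whole file read into text, nth_block(text, "audio-lyrics", 6543) would return the lyrics paragraph of the 6543rd entry as one string, and .replace("\n", "; ") on the result joins its lines on one row.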

Remove the range of lines from file using shell script

My test file test.txt is given below:
destination mailerr { file("/var/log/mail.err" fsync(yes)); };
log { source(src); filter(f_mailerr); destination(mailerr); };
#
# and also all in one file:
#
destination mail { file("/var/log/mail"); };
log { source(src); filter(f_mail); destination(mail); };
destination mailwarn { file("/var/log/mail.warn"); };
log
{
#source(src);
filter(f_mailwarn); destination(mailwarn); };
I want to remove below lines using shell script
log
{
#source(src);
filter(f_mailwarn); destination(mailwarn); };
These lines might come in different structures, such as:
log{ source(src); filter(f_mailwarn); destination(mailwarn); };
(or)
log
{
source(src);
filter(f_mailwarn);
destination(mailwarn);
};
(or)
log
{ source(src); filter(f_mailwarn); destination(mailwarn); };
(or)
# log { source(src); filter(f_mailwarn); destination(mailwarn); };
};
These are the possibilities. I am using sed command:
sed '/log/{:a;/destination(mailwarn);.*}/d;N;ba}' test.txt
But it removes every line, because the very first line already contains "log". "log" occurs many times in this file, so how can I remove only these particular lines using a shell script?
You can try this sed:
sed '/^[# ]*log/{:loop; /destination(.*}/d; /}/{n}; N; b loop;}' file
Explanation:
/^[# ]*log/ - Starts to process the block {} when regex matches.
/destination(.*}/d - Deletes the pattern space when regex matches. Starts a new cycle.
/}/{n} - When it finds }, it prints the pattern space and reads the next line of input (for printing out a non log...destination() line).
N - This appends next line input to pattern space.
b loop - This transfers flow control to loop.
If the file is not too big, use the -z option to process the entire file in one shot (@SLePort notes that this is a GNU-specific option):
sed -z 's/log[ \t\n]*{[^}]*destination(mailwarn);[ \t\n]*};//g' test.txt
log start of match
[ \t\n]*{ zero or more space/tab/newline characters followed by {
to avoid false matching log/mail.warn"); }; if [^{]*{ was used instead
[^}]* zero or more non } characters
destination(mailwarn); string to match
[ \t\n]*}; zero or more space/tab/newline characters followed by };
the matched pattern gets deleted as replacement string is empty
Similar with perl
perl -0777 -pe 's/log\s*\{[^}]*destination\(mailwarn\);\s*};//g' test.txt
Here is my somewhat janky way of getting the job done. It seems to work in all the cases you posted, but is in no way a universal solution to this problem.
tr '\n' '\0' < test.txt | sed 's/log[^{}]*{[^}]*destination(mailwarn);[^}]*};//g' | tr '\0' '\n'
If test.txt contains null characters, it may not work properly. This command essentially looks for log followed at some point by a {, then for destination(mailwarn); before any } is reached, eventually followed by };, and deletes the match if found.
I hope someone else posts a more robust solution to this problem, but this is the only quick solution I was able to come up with.
With sed:
sed '/^#*[[:blank:]]*log/{:a;/};/!{N;ba;s/\n//;};/destination(mailwarn)/d;}' file
This adds the lines from log up to }; to the pattern space, searches it for destination(mailwarn) and, if found, deletes those lines.
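If a scripting language is acceptable, the whole-file substitution from the perl answer carries over to Python almost unchanged. This is a sketch under the same assumption as that answer (no } appears inside the log block before the closing };); the name remove_mailwarn_log is made up for illustration.

```python
import re

# "log", optional whitespace, "{", anything without "}", the mailwarn
# destination, optional whitespace, "};" and at most one trailing newline.
MAILWARN_LOG = re.compile(r"log\s*\{[^}]*destination\(mailwarn\);\s*\};\n?")

def remove_mailwarn_log(text):
    """Delete every log { ... destination(mailwarn); }; statement from text."""
    return MAILWARN_LOG.sub("", text)
```

Because \s* requires only whitespace between log and {, the "log" inside paths such as /var/log/mail.warn is not taken as the start of a block.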

Removing lines between tags in a text file

I have many text files containing annotations. The original text is marked with lines containing the words:
START OF TEXT OF PASSAGE 1
END OF TEXT OF PASSAGE 1
Obviously I can search each document for the phrase START OF TEXT and delete everything up to it. Then search for END OF TEXT and start selecting text for deletion until I get to the next START OF TEXT.
I have come up with this design so far:
#!/bin/bash
a="START OF PROJECT"
b="END OF PROJECT"
while read line; do
if line contains a; do
while read line; do
'if line does not contain b'
'append the line to output.txt'; fi
done
done
fi
done
Perhaps there is an easier way using sed, awk, grep and pipes?
'for every document' 'loop through it doing this' ('find the original text between START and END' | >> output.txt)
Unfortunately I am poor at bash and ignorant of sed/awk.
The reason for this is that I am assembling a huge text document that is a concatenation of thousands of marked up documents – each of which contains some annotated passages.
In Python:
import re
with open('in.txt') as f, open('out.txt', 'w') as output:
    output.write('\n'.join(re.findall(r'START OF TEXT(.*?)END OF TEXT', f.read(), flags=re.DOTALL)))
This reads the input, searches for all matches that begin and end with the necessary markers (re.DOTALL lets . match across line breaks, so a passage can span several lines), captures the text of interest in a group, joins all those groups on a linefeed, and writes that to the result file.
Pretty easy to do with awk. You would create a script (I'll call it yank.awk) containing this:
#!/usr/bin/awk -f
/START OF PROJECT/ { capture = 1; next }
/END OF PROJECT/ { capture = 0 }
capture == 1 { print }
and then, after making it executable (chmod +x yank.awk), run it like so:
./yank.awk in.txt > output.txt
Could also do with sed and grep:
sed -ne '/START OF PROJECT/,/END OF PROJECT/p' in.txt | grep -vE '(START|END) OF PROJECT' > output.txt
(Another Python solution)
You can have itertools.groupby group lines together based on a boolean value - just use a global flag to keep track of whether you are in a block or not, and then use groupby to group the lines that are in or out of blocks. Then just discard the ones that are not blocks:
sample_lines = """
lskdjflsdkjf
sldkjfsdlkjf
START OF TEXT
Asdlkfjlsdkfj
Bsldkjf
Clsdkjf
END OF TEXT
sldkfjlsdkjf
sdlkjfdklsjf
sdlkfjdlskjf
START OF TEXT
Dsdlkfjlsdkfj
Esldkjf
Flsdkjf
END OF TEXT
sldkfjlsdkjf
sdlkjfdklsjf
sdlkfjdlskjf
""".splitlines()
from itertools import groupby
in_block = False
def is_in_block(line):
    global in_block
    if line.startswith("END OF TEXT"):
        in_block = False
    ret = in_block
    if line.startswith("START OF TEXT"):
        in_block = True
    return ret
for lines_are_text, lines in groupby(sample_lines, key=is_in_block):
    if lines_are_text:
        print(list(lines))
gives:
['Asdlkfjlsdkfj', 'Bsldkjf', 'Clsdkjf']
['Dsdlkfjlsdkfj', 'Esldkjf', 'Flsdkjf']
See that first group has the lines that start with A, B, and C, and the second group is made up of those lines starting with D, E, and F.
It sounds like the specific solution you need is:
awk '/END OF TEXT OF PASSAGE/{f=0} f; /START OF TEXT OF PASSAGE/{f=1}' file
See https://stackoverflow.com/a/18409469/1745001 for other ways to select text from files.
Use Perl's Flip-Flop Operator to Print Text Between Markers
Given a corpus like:
START OF TEXT OF PASSAGE 1
foo
END OF TEXT OF PASSAGE 1
START OF TEXT OF PASSAGE 2
bar
END OF TEXT OF PASSAGE 2
you can use the Perl flip-flop operator to process within a range of lines. For example, from the shell prompt:
$ perl -ne 'if (/^START OF TEXT/ ... /^END OF TEXT/) {
next if /^(?:START|END)/;
print;
}' /tmp/corpus
foo
bar
Basically, this short Perl script loops through your input. When it finds your start and end tags, it throws away the tags themselves and prints everything else in between.
Usage Notes
The line breaks between passages in the corpus are for readability. It doesn't matter if your real corpus has no line breaks between passages, so long as the text markers always start at the beginning of the line as shown in your original post. If that assumption doesn't hold true, then you will need to adjust the regular expressions used to identify the start and end of your passages.
You can pass multiple files to the Perl script. Again, it makes no practical difference as long as you don't exceed the length limit of your shell.
If you want the final output to go to somewhere other than standard output, just use shell redirection. For example:
perl -ne 'if (/^START OF TEXT/ ... /^END OF TEXT/) {
next if /^(?:START|END)/;
print;
}' /tmp/file1 /tmp/file2 /tmp/file3 > /tmp/output
You can use sed as follows:
sed -n '/^START OF TEXT/,/^END OF TEXT/{/^\(START\|END\) OF TEXT/!p}' infile
or, with extended regular expressions (-r):
sed -rn '/^START OF TEXT/,/^END OF TEXT/{/^(START|END) OF TEXT/!p}' infile
-n prevents sed from printing as a default. The rest works as follows:
/^START OF TEXT/,/^END OF TEXT/ { # For lines between these two matches
/^\(START\|END\) OF TEXT/!p # If the line does NOT match, print it
}
This works with GNU sed and might require some tweaking to run with other seds.

Resources