Perl program to Remove segment in a file - shell

I have a file which contains the following
blah
blah blah
Zebra
Blah blah blah
Blah
Bleh
Dog
Blag
Noblah
Someblah
I want to remove the block of lines from Zebra through Dog, but only where Zebra occurs first and Dog occurs later.
How can I do this in a Perl script?

As in "How do I remove a specific area of element from an array", this is a use case for the flip-flop (range) operator.
perl -ne 'print unless /Zebra/ .. /Dog/' < input-file

perl -ne'$r ||= /^Zebra$/; print if !$r; $r &&= !/^Dog$/;'
or
perl -ne'print if !( /^Zebra$/ .. /^Dog$/ );'
These assume every Zebra is paired with a Dog, and that the pairs can't be nested.
See Specifying file to process to Perl one-liner.
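The pairing assumption matters: with the flip-flop, a Zebra that is never followed by a Dog removes everything up to the end of the input. As a rough sketch of one way around that (written in awk rather than Perl, and using a made-up sample input), the lines after Zebra can be buffered and only discarded once the matching Dog turns up:

```shell
# Hypothetical sample: the second Zebra has no matching Dog.
result=$(printf '%s\n' blah Zebra Blah Dog Blag Zebra Noblah |
awk '
    !inblock && /^Zebra$/ { inblock = 1; buf = $0; next }   # start buffering at Zebra
    inblock && /^Dog$/    { inblock = 0; buf = ""; next }   # pair complete: discard the buffer
    inblock               { buf = buf "\n" $0; next }       # keep buffering inside the block
                          { print }                         # outside any block: print as-is
    END                   { if (inblock) print buf }        # unpaired Zebra: keep its lines
')
printf '%s\n' "$result"
```

The trade-off is memory: an unpaired Zebra near the top of a large file buffers the rest of the file before it is printed.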

Related

Speed up searching a large file using sed or an alternative

I have several large files in which I need to find a specific string and take everything between the line which contains the string and the next date at the beginning of a line. This file looks like this:
20220520-11:53:01.242: foofoobar
20220520-11:53:01.244: foo_bar blah: this_i_need
what
to
do
20220520-11:53:01.257: blablabla
20220520-11:53:01.257: bla this_i_need bla
20220520-11:53:01.258: barbarfooo
The output I need is this:
20220520-11:53:01.244: foo_bar blah: this_i_need
what
to
do
20220520-11:53:01.257: bla this_i_need bla
Now I'm using sed '/'"$string"'/,/'"$date"'/!d' which works as intended except it also takes the next row with the date even if it doesn't contain the string, but it's not a big problem.
The problem is that it takes a really long time searching the files.
Is it possible to edit the sed command so it will run faster or is there any other option to get a better runtime? Maybe using awk or grep?
EDIT: I forgot to add that the expected results occur multiple times in one file, so exiting after one match is not suitable. I am looping through multiple files in a for loop with the same $string and the same $date. There are a lot of factors slowing the script down that I can't change (extracting files one by one from a 7z archive, then searching and removing them in the same loop).
Using sed you might use:
sed -n '/this_i_need/{:a;N;/\n20220520/!ba;p;q}' file
Explanation
-n Prevent default printing of a line
/this_i_need/ When matching this_i_need
:a Set a label a to be able to jump back to
N pull the next line into the pattern space
/\n20220520/! If not matching a newline followed by the date
ba Jump back to the label (like a loop and process what is after the label again)
p When we do match a newline and the date, then print the pattern space
q Exit sed
Output
20220520-11:53:01.244: foo_bar blah: this_i_need
what
to
do
20220520-11:53:01.257: blablabla
With sed it has to delete all the lines outside the matching ranges from the buffer, which is inefficient when the file is large.
You can instead use awk to output the desired lines directly by setting a flag upon matching the specific string and clearing the flag when matching a date pattern, and outputting the line when the flag is set:
awk '/[0-9]{8}/{f=0}/this_i_need/{f=1}f' file
Demo: https://ideone.com/J2ISVD
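A small variation on the same flag idea, as a sketch only: the date test is anchored to the start of the line and spelled out without the {8} interval (some older awk builds do not support intervals), and the search string is matched literally with index() so regex metacharacters in it are not interpreted. The variable names s and f are illustrative.

```shell
string='this_i_need'
result=$(printf '%s\n' \
    '20220520-11:53:01.242: foofoobar' \
    '20220520-11:53:01.244: foo_bar blah: this_i_need' \
    'what' 'to' 'do' \
    '20220520-11:53:01.257: blablabla' \
    '20220520-11:53:01.257: bla this_i_need bla' \
    '20220520-11:53:01.258: barbarfooo' |
awk -v s="$string" '
    /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-/ { f = 0 }  # any dated line ends the current block
    index($0, s)                                 { f = 1 }  # literal match (re)starts a block
    f                                                       # print while the flag is set
')
printf '%s\n' "$result"
```

On the sample data this prints the five lines the question asks for, and it keeps scanning to the end of the file so repeated matches are all reported.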
You might use the exit statement to tell GNU AWK to stop processing, which should give a speed gain if the lines you are looking for end well before the end of the file. Let the content of file.txt be
20220520-11:53:01.242: foofoobar
20220520-11:53:01.244: foo_bar blah: this_i_need
what
to
do
20220520-11:53:01.257: blablabla
20220520-11:53:01.257: bla this_i_need bla
20220520-11:53:01.258: barbarfooo
then
awk 's&&/^[[:digit:]]{8}.*this_i_need/{print;exit}/this_i_need/{p=1;s=1;next}p&&/^[[:digit:]]{8}/{p=0}p{print}' file.txt
gives output
what
to
do
20220520-11:53:01.257: bla this_i_need bla
Explanation: I use two flag variables, p for printing and s for seen. I instruct GNU AWK to
print the current line and exit if s is set and the line starts with 8 digits followed by zero or more characters followed by this_i_need
set the p flag to 1 (true) and the s flag to 1 (true), then go to the next line, if this_i_need was found in the line
set the p flag to 0 (false) if the p flag is 1 and the line starts with 8 digits
print the current line if the p flag is set to 1
Note that the order of the actions is crucial.
Disclaimer: this solution assumes that a line starting with 8 digits is a line beginning with a date; if that is not the case, adjust the regular expression to your needs.
(tested in gawk 4.2.1)
Assumptions:
start printing when we find the desired string
stop printing when we read a line that starts with any date (ie, any 8-digit string)
One awk idea:
string='this_i_need'
awk -v ptn="${string}" ' # pass bash variable "$string" in as awk variable "ptn"
/^[0-9]{8}/ { printme=0 } # clear printme flag if line starts with 8-digit string
$0 ~ ptn { printme=1 } # set printme flag if we find "ptn" in the current line
printme # only print current line if printme==1
' foo.dat
Or as a one-liner sans comments:
awk -v ptn="${string}" '/^[0-9]{8}/ {printme=0} $0~ptn {printme=1} printme' foo.dat
NOTE: OP can rename the awk variables (ptn, printme) as desired as long as they are not a reserved keyword (see 'Keyword' in awk glossary)
This generates:
20220520-11:53:01.244: foo_bar blah: this_i_need
what
to
do
20220520-11:53:01.257: bla this_i_need bla

Delete the lines from file between pattern match

How can I delete all the lines between two patterns in a file using sed?
Here the patterns are //test and //endtest, and the file content is:
blah blah blah
c
f
f
[
]
//test
all text to be deleted
line1
line2
xyz
amv
{
//endtest
l
dsf
dsfs
Expected result:
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs
This is a common feature of sed:
sed '/^\/\/test$/,/^\/\/endtest/d'
Because / delimits the regex, every literal / inside the regex has to be escaped.
If you want to keep marks (as requested):
sed '/^\/\/test$/,/^\/\/endtest/{//!d}'
Explanation:
Have a look at info sed, search for sed address -> Regexp Addresses and Range Addresses.
Inside { ... }, the empty regex // matches either boundary line of the range:
The empty regular expression '//' repeats the last regular
expression match (the same holds if the empty regular expression is
passed to the 's' command).
! means "not", and d deletes the line.
Alternative: You could write:
sed '/^\/\/\(end\)\?test$/,//{//!d}'
or
sed -E '/^\/\/(end)?test$/,//{//!d}'
These work the same way, but take care: the effect is reversed if a stray //endtest appears before the first opening pattern (//test).
All of this was tested with GNU sed 4.4.
Under macOS (BSD sed)
Under macOS, I successfully dropped the unwanted lines with this syntax:
sed '/^\/\/test$/,/^\/\/endtest/{/^\/\/\(end\)\{0,1\}test$/!d;}'
or
sed -E '/^\/\/test$/,/^\/\/endtest/{/^\/\/(end)?test$/!d;}'
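Since the patterns themselves contain slashes, the escaping can be avoided by choosing another regex delimiter: in sed an address may be written as \cREGEXc for any character c. A minimal sketch of the marker-keeping variant using # as the delimiter (the sample input here is shortened and purely illustrative):

```shell
result=$(printf '%s\n' 'blah' '//test' 'line1' '{' '//endtest' 'dsf' |
sed '\#^//test$#,\#^//endtest$#{\#^//\(end\)\{0,1\}test$#!d;}')
# Inside the range, every line that is not one of the two markers is deleted.
printf '%s\n' "$result"
```

This sticks to basic regular expressions and an explicit inner pattern instead of the empty regex //, which is the form the macOS commands above also rely on.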
With awk:
$ awk '/\/\/endtest/{p=0} !p; /\/\/test/{p = 1}' file
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs
If your data is in a file named d, try GNU sed:
sed -E '/\/\/test/,/\/\/endtest/{/\/\/.*test/!d}' d

How can I wrap text within a multiline table, without losing formatting?

I have data (mostly log, but also user notes) in form of:
[2019_03_10][21:12:55] # Very useful text of hight iportance to demonstrate my question.
[2019_03_10][22:32:55] # Another Text.
[2019_03_10][23:02:22] # blablabal Bla bla, just another long text with linebreak. And this one is just a little longer then those before.
At the moment, I use cat test.txt | column -s '#' -t and that's what I get:
[2019_03_10][21:12:55] Very useful text of hight iportance
to demonstrate my question.
[2019_03_10][22:32:55] Another Text.
[2019_03_10][23:02:22] blablabal Bla bla, just another lon
g text with linebreak. And this one is just a little longer t
hen those before.
That's what I would love to get:
[2019_03_10][21:12:55] Very useful text of hight iportance
                       to demonstrate my question.
[2019_03_10][22:32:55] Another Text.
[2019_03_10][23:02:22] blablabal Bla bla, just another lon
                       g text with linebreak. And this one
                       is just a little longer then those
                       before.
There must be an easy way of doing this, mustn't there?
I'd take advantage of the UNIX command fold so you don't have to reinvent the wheel:
$ cat tst.awk
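{
    beg = end = $0
    sub(/ *#.*/,"",beg)        # beg: the timestamp column before the #
    sub(/[^#]+# */,"",end)     # end: the text after the #
    # wrap the text at 38 columns, breaking at spaces
    cmd = "printf \047" end "\n\047 | fold -sw38"
    while ( (cmd | getline line) > 0 ) {
        print beg, line
        gsub(/./," ",beg)      # blank out the timestamp for continuation lines
    }
    close(cmd)                 # release the pipe so long inputs do not run out of descriptors
}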
$ awk -f tst.awk file
[2019_03_10][21:12:55] Very useful text of hight iportance
to demonstrate my question.
[2019_03_10][22:32:55] Another Text.
[2019_03_10][23:02:22] blablabal Bla bla, just another long
text with linebreak. And this one is
just a little longer then those
before.
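Building a shell command out of the log text means a single quote or a % in the text can break the printf call, and one fold process is started per input line. As an alternative sketch under those concerns, the wrapping can be done inside awk itself. The function name wrap_log and the variable names are made up for illustration, and the width is passed as an argument:

```shell
# wrap_log WIDTH: read "timestamp # text" lines on stdin, wrap the text at WIDTH columns.
wrap_log() {
    awk -v width="$1" '
    {
        hash = index($0, "#")
        if (hash == 0) { print; next }            # no separator: pass the line through
        prefix = substr($0, 1, hash - 1)
        sub(/ *$/, "", prefix)                    # timestamp column without trailing blanks
        n = split(substr($0, hash + 1), word, " ")
        if (n == 0) { print prefix; next }        # nothing after the separator
        line = ""
        for (i = 1; i <= n; i++) {
            if (line != "" && length(line) + 1 + length(word[i]) > width) {
                print prefix " " line             # current line is full: emit it
                gsub(/./, " ", prefix)            # later lines get a blank first column
                line = word[i]
            } else {
                line = (line == "" ? word[i] : line " " word[i])
            }
        }
        print prefix " " line                     # emit the last, partly filled line
    }'
}

wrap_log 38 < test.txt 2>/dev/null || true        # usage, assuming test.txt from the question exists
```

A word longer than the width is placed on a line of its own rather than split, which differs from what fold does.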
Perl script that uses the standard Text::Wrap module:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Text::Wrap;
while (<>) {
    chomp;
    my $line = wrap("", "\t\t\t", $_);
    $line =~ s/ # /\t/;
    say $line;
}
Usage:
$ perl wrap.pl test.txt
[2019_03_10][21:12:55] Very useful text of hight iportance to demonstrate
my question.
One-liner version:
$ perl -MText::Wrap -lpe '$_ = wrap("", "\t\t\t", $_); s/ # /\t/;' test.txt
You can try the following script:
#!/bin/bash
WIDTH=50 # width of column2
# TMPFILE=$(mktemp) # tempfile in /tmp
TMPFILE=$(mktemp -p /dev/shm) # tempfile in shared memory
while read line; do # read all lines from log
column1=${line%%#*} # extract column1
blank=${column1//?/ } # blankline, size len(column1)
column2=${line##*#} # column2, comments
echo $column2 | fmt -$WIDTH > $TMPFILE # format column2
while read line2; do # read new formated lines
echo "$column1" $line2 # write column1 and column2
column1=$blank # blank column1
done < $TMPFILE # read from tempfile
done < "$1" # first arg from commandline
rm $TMPFILE # delete tempfile
USAGE: scriptname logfile
If you have no shared memory you can replace line 4 with line 3.
Using Perl command line. You can change 20 to 30 or 40 to wrap the length you need.
$ perl -ne ' ($x,$y)=/(.+?)#(.+)/ ; print "$x"; $s="";
while($y=~/(.{20,}?\s|.*$)/g) { printf("%s%s\n",$s,$1);$s="\t\t\t" } ' mac.txt
[2019_03_10][21:12:55] Very useful text of
hight iportance to demonstrate
my question.
[2019_03_10][22:32:55] Another Text.
[2019_03_10][23:02:22] blablabal Bla bla, just
another long text with
linebreak. And this one
is just a little longer
then those before.
$
There is an extra newline in that output. To remove it:
$ perl -lne ' ($x,$y)=/(.+?)#(.+)/ ; printf("%s",$x); $s="";
while($y=~/(.{20,}?\s|.*$)/g) { $p=$1; print $s,$p if $p!~/^\s*$/s; $s="\t\t\t" } ' mac.txt
[2019_03_10][21:12:55] Very useful text of
hight iportance to demonstrate
my question.
[2019_03_10][22:32:55] Another Text.
[2019_03_10][23:02:22] blablabal Bla bla, just
another long text with
linebreak. And this one
is just a little longer
then those before.
$
This seems to be a job for sed:
This first script will answer your question, by splitting lines at char 60 (without consideration of spaces).
sed ':a;s/\([^\n]\{60\}\)\([^\n]\+\)/\1\n \2/;ta'
Will output something like:
[2019_03_10][21:12:55] # Very useful text of hight importanc
e to demonstrate my question.
[2019_03_10][22:32:55] Another Text.
[2019_03_10][23:02:22] blablabal Bla bla, just another long
text with linebreak. And this one is
just a lot longer than those before,
by adding unsignificant and useless b
la bla.
The same, but breaking at word boundaries:
sed ':a;/.\{61\}/s/\([^\n]\{1,60\}\) \([^\n]\+\)/\1\n \2/;/\n/!bb;P;D;:b;ta'
or
sed '
:a;
/.\{61\}/s/\([^\n]\{1,60\}\) \([^\n]\+\)/\1\n \2/;
/\n/!bb;
P;
D;
:b;
ta
'
will render:
[2019_03_10][21:12:55] # Very useful text of hight
importance to demonstrate my
question.
[2019_03_10][22:32:55] Another Text.
[2019_03_10][23:02:22] blablabal Bla bla, just another long
text with linebreak. And this one is
just a lot longer then those before,
by adding unsignificant and useless
bla bla.

Use `sed` to replace text in code block with output of command at the top of the code block

I have a markdown file that has snippets of code resembling the following example:
```
$ cat docs/code_sample.sh
#!/usr/bin/env bash
echo "Hello, world"
```
This means there's a file at the location docs/code_sample.sh, whose contents are:
#!/usr/bin/env bash
echo "Hello, world"
I'd like to parse the markdown file with sed (awk or perl works too) and replace the bottom section of the code snippet with whatever the above bash command evaluates to, for example whatever cat docs/code_sample.sh evaluates to.
Perl to the rescue!
perl -0777 -pe 's/(?<=```\n)^(\$ (.*)\n\n)(?^s:.*?)(?=```)/"$1".qx($2)/meg' < input > output
-0777 slurps the whole file into memory
-p prints the input after processing
s/PATTERN/REPLACEMENT/ works similarly to a substitution in sed
/g replaces globally, i.e. as many times as it can
/m makes ^ match start of each line instead of start of the whole input string
/e evaluates the replacement as code
(?<=```\n) means "preceded by three backquotes and a newline"
(?^s:.*?) changes the behaviour of . to match newlines as well, so it matches (frugally because of the *?) the rest of the preformatted block
(?=```) means "followed by three backquotes"
qx runs the parameter in a shell and returns its output
A sed-only solution is easier if you have the GNU version with an e command.
That said, here's a quick, simplistic, and kinda clumsy version I knocked out that doesn't bother to check the values of previous or following lines - it just assumes your format is good, and bulls through without any looping or anything else. Still, for my example code, it worked.
I started by making an a, a b, and an x that is the markup file.
$: cat a
#! /bin/bash
echo "Hello, World!"
$: cat b
#! /bin/bash
echo "SCREW YOU!!!!"
$: cat x
```
$ cat a
foo
bar
" b a z ! "
```
```
$ cat b
foo
bar
" b a z ! "
```
Then I wrote s which is the sed script.
$: cat s
#! /bin/env bash
sed -En '
/^```$/,/^```$/ {
# for the lines starting with the $ prompt
/^[$] / {
# save the command to the hold space
x
# write the ``` header to the pattern space
s/.*/```/
# print the fabricated header
p
# swap the command back in
x
# the next line should be blank - add it to the current pattern space
N
# first print the line of code as-is with the (assumed) following blank line
p
# scrub the $ (prompt) off the command
s/^[$] //
# execute the command - store the output into the pattern space
e
# print the output
p
# put the markdown footer back
s/.*/```/
# and print that
p
}
# for the (to be discarded) existing lines of "content"
/^[^`$]/d
}
' "$@"
It does the job and might get you started.
$: s x
```
$ cat a
#! /bin/bash
echo "Hello, World!"
```
```
$ cat b
#! /bin/bash
echo "SCREW YOU!!!!"
```
Lots of caveats - better to actually check that the $ follows a line of backticks and is followed by a blank line, maybe make sure nothing bogus could be in the file to get executed... but this does what you asked, with (GNU) sed.
Good luck.
A rare case when use of getline would be appropriate:
$ cat tst.awk
state == "importing" {
    while ( (getline line < $NF) > 0 ) {
        print line
    }
    close($NF)
    state = "imported"
}
$0 == "```" { state = (state ? "" : "importing") }
state != "imported" { print }
$ awk -f tst.awk file
See http://awk.freeshell.org/AllAboutGetline for getline uses and caveats.
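The getline version above reads the named file directly, so it covers the cat case only. If the prompt line may hold an arbitrary command, a sketch along the following lines runs the command and splices in its output instead. It executes whatever the markdown file contains, so it is only reasonable for trusted input. The fence is passed in as a variable (built from the octal code 140 for a backtick), and the sample input is invented for the demonstration:

```shell
fence=$(printf '\140\140\140')   # three backticks, spelled in octal

result=$(printf '%s\n' 'intro' "$fence" '$ echo Hello' 'stale output' "$fence" 'outro' |
awk -v fence="$fence" '
    $0 == fence { inblock = !inblock; ran = 0; print; next }   # toggle on every fence line
    inblock && !ran && /^\$ / {
        print                                   # keep the prompt line as-is
        cmd = substr($0, 3)                     # strip the leading "$ "
        while ((cmd | getline line) > 0) print line
        close(cmd)
        ran = 1
        next
    }
    inblock && ran { next }                     # drop the stale block contents
    { print }                                   # everything outside a block
')
printf '%s\n' "$result"
```

Reading the command output through getline keeps it in order with awk's own print output, which system() would not guarantee without an explicit flush.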

How to use :put to append a variable at the end of the current line with vim

I have the following command, which I need to insert in a bash script:
vim file.txt -c ':let var=$foo' -c ':execute "normal! gg/string\<cr>V"' -c ":normal! d" -c ':execute "normal inewstring\<Esc>"' -c ':put =var'
What it does (or what I want it to do) is use the variable foo, which is defined in the script, search for the first appearance of string, select the whole line and delete it, then insert newstring and append the value of foo right after this new string. However, my code always puts the value on the next line, no matter how I change the x value in :[x]put.
As a novice in vim I'm not even sure whether this is an efficient way to achieve my goal, so any suggestion is welcome. Thanks in advance.
Let's say that we have this input file:
$ cat file.txt
blah
string foo
string foo
blah
What I'm expecting to obtain (defining $foo="hello") is:
$ cat file.txt
blah
newstringhello
string foo
blah
I am a big vim fan, but if I were you, I wouldn't do it with vim.
Since you didn't post the example input and the desired output, I can only guess what you want from your description.
Given that we have:
kent$ cat f
blah
string foo bar
string foo bar
blah
And var="hello", the next sed one-liner changes the input file into:
kent$ sed "0,/string/{/string/s/.*/newString/};$ a \\$var" f
blah
newString
string foo bar
blah
hello
However I don't know if it is exactly what you wanted.
Update:
kent$ sed "0,/string/{/string/s/.*/newString$var/}" f
blah
newStringhello
string foo bar
blah
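The 0,/string/ address is a GNU sed extension. Where GNU sed is not available, a rough awk equivalent (a sketch, with the variable names new and done chosen for illustration) replaces only the first matching line:

```shell
foo='hello'
result=$(printf '%s\n' blah 'string foo' 'string foo' blah |
awk -v new="newstring$foo" '
    !done && /string/ { print new; done = 1; next }   # replace only the first matching line
    { print }                                          # pass every other line through
')
printf '%s\n' "$result"
```

Note that awk -v interprets backslash escapes in the assigned value, so a value of foo containing backslashes would need extra care.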

Resources