I want to use find . -name '*.php' -exec COMMAND {} \; on a Debian-based system to delete blocks like this:
<?php
#bVj7Gt#
line1
...
lineX
#/bVj7Gt#
?>
The line after <?php is a hash, six alphanumeric characters, and a hash.
The line before ?> is a hash, a slash, the same six alphanumeric characters, and a hash.
This may or may not be what you're looking for (since you didn't provide sample input/output we could test against), using GNU awk for a multi-char RS:
$ cat file
foo
<?php
#bVj7Gt#
line1
...
lineX
#/bVj7Gt#
?>
bar
$ awk -v RS='<[?]php\n#[[:alnum:]]{6}#.*#/[[:alnum:]]{6}#\n[?]>\n' -v ORS= '1' file
foo
bar
Make it awk -i inplace -v RS=... if you want to do "inplace" editing.
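Before pointing find at real files, the record-separator trick is easy to sanity-check on a throwaway sample (the /tmp path is made up for the demo, and [[:alnum:]]+ stands in for the {6} interval for portability across awk flavors; a multi-char regex RS still needs GNU awk or mawk):

```shell
# Build a throwaway sample file containing one injected block
cat > /tmp/rs_demo.php <<'EOF'
foo
<?php
#bVj7Gt#
line1
lineX
#/bVj7Gt#
?>
bar
EOF

# The whole injected block is consumed as the record separator;
# with an empty ORS nothing is printed in its place
out=$(awk -v RS='<[?]php\n#[[:alnum:]]+#.*#/[[:alnum:]]+#\n[?]>\n' -v ORS= '1' /tmp/rs_demo.php)
printf '%s\n' "$out"
```

Once that prints only foo and bar, the same command (with -i inplace) can go in the -exec part of find.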
I have two files, file1 and file2.
file1 contains only words, one per line, e.g.:
ABC
YUI
GHJ
I8O
..................
file2 contains many paragraphs:
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
...................
I am using the command below to get the lines of file2 that contain a word from file1:
grep -Ff file1 file2
(it outputs the lines of file2 in which any word from file1 is found)
I also need the words from file1 that are not found anywhere in file2, but I am unable to get those un-matched words.
Can anyone help me get the output below?
YUI
I8O
I am looking for a one-liner (via grep, awk, or sed), since I am running this through pssh and can't use a while or for loop.
You can print only the matched parts with -o.
$ grep -oFf file1 file2
ABC
GHJ
Use that output as a list of patterns for a search in file1. Process substitution <(cmd) simulates a file containing the output of cmd. With -v you can print lines that did not match. If file1 contains two lines such that one line is a substring of another line you may want to add -x (only match whole lines) to prevent false positives.
$ grep -vxFf <(grep -oFf file1 file2) file1
YUI
I8O
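The two steps can be checked end to end on the sample data (file paths under /tmp are just for the demo; the <(...) process substitution needs bash or another shell that supports it):

```shell
printf '%s\n' ABC YUI GHJ I8O > /tmp/file1
cat > /tmp/file2 <<'EOF'
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
EOF

# inner grep: words of file1 that occur in file2 (ABC, GHJ)
# outer grep: lines of file1 not exactly matching any of those words
missing=$(grep -vxFf <(grep -oFf /tmp/file1 /tmp/file2) /tmp/file1)
printf '%s\n' "$missing"
```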
Using Perl - both matched and non-matched words in the same one-liner:
$ cat sinw.txt
ABC
YUI
GHJ
I8O
$ cat sin_in.txt
dfghjo ABC kll njjgg bla bla
GHJ njhjckhv chasjvackvh ..
ihbjhi hbhibb jh jbiibi
$ perl -lne '
BEGIN { %x=map{chomp;$_=>1} qx(cat sinw.txt); $w="\\b(?:".join("|",keys %x).")\\b" }
print "$&" and delete($x{$&}) if /$w/;
END { print "\nnon-matched\n".join("\n", keys %x) }
' sin_in.txt
ABC
GHJ
non-matched
I8O
YUI
$
Getting only the non-matched
$ perl -lne '
BEGIN {
%x = map { chomp; $_=>1 } qx(cat sinw.txt);
$w = "\\b(?:" . join("|",keys %x) . ")\\b"
}
delete($x{$&}) if /$w/;
END { print "\nnon-matched\n".join("\n", keys %x) }
' sin_in.txt
non-matched
I8O
YUI
$
Note that even a single use of the $& variable used to be very expensive for the whole program in Perl versions prior to 5.20.
Assuming file1 may hold several words per line:
while read -r line
do
    for word in $line
    do
        if ! grep -q "$word" file2
        then echo "$word not found"
        fi
    done
done < file1
For the un-matched words, here's one GNU awk solution:
awk 'NR==FNR{a[$0];next} !($1 in a)' RS='[ \n]' file2 file1
YUI
I8O
Or !($0 in a), which is the same thing here. Since I set RS='[ \n]', every space acts as a record separator too.
And note that I read file2 first, and then file1.
If file2 could be empty, you should change NR==FNR to a different file-checking method, like ARGIND==1 for GNU awk, or FILENAME=="file2", or FILENAME==ARGV[1], etc.
The same mechanism works for printing only the matched words too:
awk 'NR==FNR{a[$0];next} $0 in a' RS='[ \n]' file2 file1
ABC
GHJ
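Here is the un-matched run as a copy-pasteable check (sample files under /tmp are made up; a regex RS needs GNU awk or mawk):

```shell
printf '%s\n' ABC YUI GHJ I8O > /tmp/file1
printf '%s\n' 'dfghjo ABC kll njjgg bla bla' 'GHJ njhjckhv chasjvackvh ..' > /tmp/file2

# file2 first: every space-or-newline-separated token goes into a[]
# file1 second: print its words that were never seen in file2
out=$(awk 'NR==FNR{a[$0];next} !($0 in a)' RS='[ \n]' /tmp/file2 /tmp/file1)
printf '%s\n' "$out"
```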
I have an HTML file of which I need to get only a specific part. The biggest challenge is that this HTML file has no line breaks, so my grep expression isn't working well.
Here is my HTML file:
<a><p>Test1</p></a><a><p>Test2</p></a>
Note that I have two anchors (<a>) on this line.
I want to get the second anchor and I was trying to get it using:
cat example.html | grep -o "<a.*Test2</p></a>"
Unfortunately, this command returns the whole line, but I want only:
<a><p>Test2</p></a>
I don't know how to do this with grep or sed, I'd really appreciate any help.
With GNU awk for multi-char RS, if it's the second record you want:
$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"} NR==2' file
<a><p>Test2</p></a>
or if it's the record labeled "Test2":
$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"} /<p>Test2<\/p>/' file
<a><p>Test2</p></a>
or:
$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"; FS="</?p>"} $2=="Test2"' file
<a><p>Test2</p></a>
Using Perl:
$ perl -pe '@a = split(m~(?<=</a>)~, $_); $_ = $a[1]' file
<a><p>Test2</p></a>
Breakdown:
perl -pe ' ' # Read line for line into $_
# and print $_ at the end
m~(?<=</a>)~ # Match the position after
# each </a> tag
@a = split( , $_); # Split into array @a
$_ = $a[1] # Take second item
This should do:
grep -o '<a[^>]*><p>Test2</p></a>' example.html
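For a concrete check, assuming the anchors are bare <a> tags as in the question's one-line file (the /tmp name is illustrative):

```shell
printf '%s\n' '<a><p>Test1</p></a><a><p>Test2</p></a>' > /tmp/example.html

# -o prints only the part of the line that matches;
# [^>]* cannot cross the closing > of the first anchor, so only
# the second anchor can satisfy the whole pattern
hit=$(grep -o '<a[^>]*><p>Test2</p></a>' /tmp/example.html)
printf '%s\n' "$hit"
```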
I'm trying to use sed to remove the last occurrence of } from a file. So far I have this:
sed -i 's/\(.*\)}/\1/' file
But sed works line by line, so this removes the last } from every line that has one. If my file looks like this:
foo
bar
}
}
}
that command will remove all 3 of the } characters. How can I limit this to just the last one in the file?
Someone gave me this as a solution:
sed -i '1h;1!H;$!d;g;s/\(.*\)}/\1/' file
I'm just not sure it's as good as an awk solution would be.
sed is an excellent tool for simple substitutions on a single line. For anything else, just use awk, e.g. with GNU awk for gensub() and multi-char RS:
$ cat file1
foo
bar
}
}
}
$
$ cat file2
foo
bar
}}}
$
$ gawk -v RS='^$' -v ORS= '{$0=gensub(/\n?}([^}]*)$/,"\\1","")}1' file1
foo
bar
}
}
$
$ gawk -v RS='^$' -v ORS= '{$0=gensub(/\n?}([^}]*)$/,"\\1","")}1' file2
foo
bar
}}
$
Note that the above will remove the last } char AND a preceding newline if present, as I THINK that's probably what you would actually want. If you want to ONLY remove the } and leave a trailing newline in those cases (as I think all of the currently posted sed solutions would do), then just get rid of \n? from the matching RE:
$ gawk -v RS='^$' -v ORS= '{$0=gensub(/}([^}]*)$/,"\\1","")}1' file1
foo
bar
}
}
$
And if you want to change the original file without manually specifying a tmp file, just use the -i inplace argument:
$ gawk -i inplace -v RS='^$' -v ORS= '{$0=gensub(/}([^}]*)$/,"\\1","")}1' file1
$ cat file1
foo
bar
}
}
$
With a buffer you can modify the file directly:
awk 'BEGIN{file=ARGV[1]}{a[NR]=$0}/}/{skip=NR}END{for(i=1;i<=NR;++i)if(i!=skip)print a[i]>file}' file
Thanks to @jthill for the remark about the one-line-file issue.
sed ':a
$ !{N
ba
}
$ s/}\([^}]*\)$/\1/' YourFile
You need to load the whole file into the buffer first. This does not remove the newline if } is alone on a line.
When I read "do something with the last ...", I think "reverse the file, do something with the first ..., re-reverse the file"
tac file | awk '!seen && /}/ {$0 = gensub(/(.*)}/, "\\1", 1); seen = 1} 1' | tac
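The reverse-edit-reverse idea can be checked on the question's sample. gensub() is GNU-awk-only, so this sketch uses a portable match()/substr() rewrite of the same step (filenames are made up); like the sed answers, it leaves an empty line where a lone } used to be:

```shell
printf '%s\n' foo bar '}' '}' '}' > /tmp/braces.txt

# reversed, the first line containing } holds the file's last };
# delete that single character and pass everything else through
tac /tmp/braces.txt \
  | awk '!seen && match($0, /}[^}]*$/) { $0 = substr($0, 1, RSTART-1) substr($0, RSTART+1); seen = 1 } 1' \
  | tac > /tmp/braces.out

cat /tmp/braces.out
```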
I'm trying to insert a file content before a given pattern
Here is my code:
sed -i "" "/pattern/ {
i\\
r $scriptPath/adapters/default/permissions.xml
}" "$manifestFile"
It adds the path instead of the content of the file.
Any ideas ?
In order to insert text before a pattern, you need to swap the pattern space into the hold space before reading in the file. For example:
sed "/pattern/ {
h
r $scriptPath/adapters/default/permissions.xml
g
N
}" "$manifestFile"
Just remove i\\.
Example:
$ cat 1.txt
abc
pattern
def
$ echo hello > 2.txt
$ sed -i '/pattern/r 2.txt' 1.txt
$ cat 1.txt
abc
pattern
hello
def
I tried Todd's answer and it works great, but I found that the h and g commands can be omitted.
Thanks to this FAQ (found via @vscharf's comments), Todd's answer can be this one-liner:
sed -i -e "/pattern/ {r $file" -e 'N}' $manifestFile
Edit:
If you need here-doc version, please check this.
I got something like this using awk. It looks ugly, but it did the trick in my test:
command:
cat test.txt | awk '
/pattern/ {
line = $0;
while ((getline < "insert.txt") > 0) {print};
print line;
next
}
{print}'
test.txt:
$ cat test.txt
some stuff
pattern
some other stuff
insert.txt:
$ cat insert.txt
this is inserted file
this is inserted file
output:
some stuff
this is inserted file
this is inserted file
pattern
some other stuff
CodeGnome's solution doesn't work if the pattern is on the last line, so I used 3 commands:
sed -i '/pattern/ i\
INSERTION_MARKER
' "$manifestFile"
sed -i "/INSERTION_MARKER/r $scriptPath/adapters/default/permissions.xml" "$manifestFile"
sed -i '/^INSERTION_MARKER$/d' "$manifestFile"
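A sketch of the three-command marker approach with GNU sed and toy files (all names and content are made up; BSD sed would need -i '' instead of bare -i):

```shell
cd "$(mktemp -d)"                      # scratch dir so nothing real is touched
printf '%s\n' abc pattern def > manifest.txt
printf '%s\n' hello world > permissions.xml

# 1: put a marker line before the pattern
sed -i '/pattern/ i\
INSERTION_MARKER
' manifest.txt
# 2: append the file's content after the marker line
sed -i '/INSERTION_MARKER/r permissions.xml' manifest.txt
# 3: delete the marker line
sed -i '/^INSERTION_MARKER$/d' manifest.txt

cat manifest.txt
```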
I have a binary file which I convert into a regular file using hexdump and a few awk and sed commands. The output file looks something like this:
$ cat temp
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000
000000087d3f513000000000000000000000000000000000001001001010f000000000026
58783100b354c52658783100b43d3d0000ad6413400103231665f301010b9130194899f2f
fffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f433031000000000004
6363070000000000000000000000000065450000b4fb6b4000393d3d1116cdcc57e58287d
3f55285a1084b
The temp file has a few eye-catchers (3d3d) which don't repeat that often. They kinda denote the start of a new binary record. I need to split the file based on those eye-catchers.
My desired output is to have multiple files (based on the number of eyecatchers in my temp file).
So my output would look something like this -
$ cat temp1
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e582000000000000000
0000000000087d3f513000000000000000000000000000000000001001001010f00000000
002658783100b354c52658783100b4
$ cat temp2
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc0
15800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000
000000000065450000b4fb6b400039
$ cat temp3
3d3d1116cdcc57e58287d3f55285a1084b
The RS variable in awk is nice for this, allowing you to define the record separator. Thus, you just need to capture each record in its own temp file. The simplest version is:
cat temp |
awk -v RS="3d3d" '{ print $0 > "temp" NR }'
The sample text starts with the eye-catcher 3d3d, so temp1 will be an empty file. Further, the eye-catcher itself won't be at the start of the temp files, as was shown for the temp files in the question. Finally, if there are a lot of records, you could run into the system limit on open files. Some minor complications will bring it closer to what you want and make it safer:
cat temp |
awk -v RS="3d3d" 'NR > 1 { print RS $0 > "temp" (NR-1); close("temp" (NR-1)) }'
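The behavior is easy to verify with toy data standing in for the hexdump output (needs an awk with multi-character RS support, e.g. GNU awk or mawk; directory and names are made up):

```shell
cd "$(mktemp -d)"
printf '3d3dAAA3d3dBBB3d3dCCC' > temp

# record 1 is the (empty) text before the first eye-catcher, so skip it;
# re-prefix each kept record with the separator and close the file right away
awk -v RS='3d3d' 'NR > 1 { f = "temp" (NR-1); print RS $0 > f; close(f) }' temp

cat temp1 temp2 temp3
```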
#!/usr/bin/perl
undef $/;
$_ = <>;
$n = 0;
for $match (split(/(?=3d3d)/)) {
open(O, '>temp' . ++$n);
print O $match;
close(O);
}
This might work:
# sed 's/3d3d/\n&/2g' temp | split -dl1 - temp
# ls
temp temp00 temp01 temp02
# cat temp00
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000000000087d3f513000000000000000000000000000000000001001001010f000000000026 58783100b354c52658783100b4
# cat temp01
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000000000000065450000b4fb6b400039
# cat temp02
3d3d1116cdcc57e58287d3f55285a1084b
EDIT:
If there are newlines in the source file you can remove them first by using tr -d '\n' <temp and then pipe the output through the above sed command. If however you wish to preserve them then:
sed 's/3d3d/\n&/g;s/^\n\(3d3d\)/\1/' temp | csplit -zf temp - '/^3d3d/' {*}
Should do the trick
Mac OS X answer
On Mac OS X, where that nice awk -v RS="pattern" trick doesn't work (the stock awk only honors a single-character RS), here's what I got working:
Given this example concatted.txt
filename=foo bar
foo bar line1
foo bar line2
filename=baz qux
baz qux line1
baz qux line2
use this command (remove comments to prevent it from failing)
# cat: useless use of cat ^__^;
# tr: replace all newlines with delimiter1 (which must not be in concatted.txt) so we have one line of all the next
# sed: replace file start pattern with delimiter2 (which must not be in concatted.txt) so we know where to split out each file
# tr: replace delimiter2 with NULL character since sed can't do it
# xargs: split giant single-line input on NULL character and pass 1 line (= 1 file) at a time to echo into the pipe
# sed: get all but last line (same as head -n -1) because there's an extra since concatted-file.txt ends in a NULL character.
# awk: does a bunch of stuff as the final command. Remember it's getting a single line to work with.
# {replace all delimiter1s in file with newlines (in place)}
# {match regex (sets RSTART and RLENGTH) then set filename to regex match (might end at delimiter1). Note in this case the number 9 is the length of "filename=" and the 2 removes the "§" }
# {write file to filename and close the file (to avoid "too many files open" error)}
cat ../concatted-file.txt \
| tr '\n' '§' \
| sed 's/filename=/∂filename=/g' \
| tr '∂' '\0' \
| xargs -t -0 -n1 echo \
| sed \$d \
| awk '{match($0, /filename=[^§]+§/)} {filename=substr($0, RSTART+9, RLENGTH-9-2)".txt"} {gsub(/§/, "\n", $0)} {print $0 > filename; close(filename)}'
results in these two files named foo bar.txt and baz qux.txt respectively:
filename=foo bar
foo bar line1
foo bar line2
filename=baz qux
baz qux line1
baz qux line2
Hope this helps!
It depends on whether your temp file's content is on a single line or not. Assuming it is, you can go with:
sed 's/\(.\)\(3d3d\)/\1#\2/g' FILE | awk -F "#" '{ for (i=1; i<=NF; i++) { print $i > ("temp" i) } }'
The first sed inserts a # before every eye-catcher that isn't at the start of the line (so # acts as a field separator), then awk splits on # and prints every field to its own file.
If the input file is already split on 3d3d then you can go with:
awk '/^3d3d/ { i++ } { print > ("temp" i) }' temp
HTH
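And when each record already starts on its own line, that last command can be checked the same way (toy data, made-up names; this one works in any POSIX awk):

```shell
cd "$(mktemp -d)"
printf '%s\n' 3d3dAAA 3d3dBBB more 3d3dCCC > temp

# bump the counter at every line that starts with the eye-catcher,
# then route each line to the current temp<i> file
awk '/^3d3d/ { i++ } { print > ("temp" i) }' temp

cat temp2
```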