How to git-apply a git word diff

I needed to edit a messy commit that only changed a word in a few consecutive rows, keeping some of those changes and discarding others.
The changes were easy to see in git diff --word-diff, and in that format I could easily edit the hunks to do what I intended, but now I have a file like this:
diff --git a/cldf/forms.csv b/cldf/forms.csv
index 46c12a4..0374ece 100644
--- a/cldf/forms.csv
+++ b/cldf/forms.csv
@@ -1783,8 +1783,8 @@ ID,Lect_ID,Concept_ID,Form_according_to_Source,Form,Local_Orthography,Segments,C
1782,adan1251-lawah,day,dilɛlɛ,dilɛlɛ,dilele,d i l ɛ l ɛ,Lit. 'all day'.,datasets_Adang_Lawahing_tsv
1783,adan1251-lawah,day,wɛd saha,wɛd_saha,wed saha,w ɛ d _ s a h a,midday' lit. 'hot sun',datasets_Adang_Lawahing_tsv
1784,adan1251-lawah,morning,lalami,lalami,lalami,l a l a m i,,datasets_Adang_Lawahing_tsv
1785,adan1251-lawah,yesterday,ʔu:mi,ʔuːmi,[-umi-]{+'umi+},ʔ uː m i,,datasets_Adang_Lawahing_tsv
1786,adan1251-lawah,day_before_yesterday,ʔotariŋ alumi,ʔotariŋ_alumi,[-otaring-]{+'otaring+} alumi,ʔ o t a r i ŋ _ a l u m i,,datasets_Adang_Lawahing_tsv
1787,adan1251-lawah,tomorrow,dilɛlɛ,dilɛlɛ,dilele,d i l ɛ l ɛ,,datasets_Adang_Lawahing_tsv
1788,adan1251-lawah,day_after_tomorrow,a:lu,aːlu,alu,aː l u,,datasets_Adang_Lawahing_tsv
1789,adan1251-lawah,twilight_dawn,lalami,lalami,lalami,l a l a m i,"(lit, 'early morning')",datasets_Adang_Lawahing_tsv
which I would like to use as a patch for git apply.
However, vanilla git apply words.diff fails with fatal: corrupt patch at line 6 – in a normal diff, that unaffected line would start with a space – and I don't see anything in the git-apply manpage that might make it accept a word-diff file.
How can I convince git apply to take a file of this format as patch? Or how can I easily convert this file into a valid patch?

I couldn't find a working solution, so I've put together a script that converts word-diff into a regular diff that can be applied:
#!/usr/bin/env perl
# convert-word-diff.pl -- rev. 2, this script is licensed under WTFPLv2
my (@minus, @plus);

sub flush_diff {
    print join("", map { "-$_" } @minus);
    print join("", map { "+$_" } @plus);
    @minus = (); @plus = ();
}

while (my $line = <>) {
    if ($line =~ /^(?:index |diff |\+\+\+ |\-\-\- |@@ )/) {
        flush_diff();
        print $line;
        next;
    }
    my $is_diff_line;
    if ($line =~ /\[\-.*\-\]/ || $line =~ /\{\+.*?\+\}/) {
        my $copy = $line;
        $copy =~ s/\[\-(.*?)\-\]\{\+.*?\+\}/$1/g;
        $copy =~ s/\[\-(.*?)\-\] ( )?/ $1 /g;
        $copy =~ s/\{\+.*?\+\} ?//g;
        push(@minus, $copy);
        $copy = $line;
        $copy =~ s/\[\-.*?\-\]//g;
        $copy =~ s/\{\+(.*?)\+\}/$1/g;
        push(@plus, $copy);
        $is_diff_line = 1;
    }
    unless ($is_diff_line) {
        flush_diff();
        print " $line";
    }
}
flush_diff();
Usage:
cat word-diff.txt | perl convert-word-diff.pl | git apply
Hopefully I didn't mess up anything and you are on Linux/Mac and have Perl. :-)
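To make it concrete, here is a sketch of what the conversion should produce for changed line 1785 from the example above (illustrative, not the output of an actual run). The word-diff line
1785,adan1251-lawah,yesterday,ʔu:mi,ʔuːmi,[-umi-]{+'umi+},ʔ uː m i,,datasets_Adang_Lawahing_tsv
becomes a minus/plus pair in regular unified-diff form:
-1785,adan1251-lawah,yesterday,ʔu:mi,ʔuːmi,umi,ʔ uː m i,,datasets_Adang_Lawahing_tsv
+1785,adan1251-lawah,yesterday,ʔu:mi,ʔuːmi,'umi,ʔ uː m i,,datasets_Adang_Lawahing_tsv
while every untouched context line is re-prefixed with a single space, which is the format git apply expects.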

Related

I need some idea on text processing for SRT subtitles

Title says what I really need ATM.
Basically I've created an OCR toolchain based on Tesseract and ImageMagick. I've managed to get it to the point that the output text is very consistent. I'm using this to OCR some old hardsubbed videos and turn them into soft-subbed SRT subs. To take the screenshots for the image input I'm using a modified version of an old shell script I found and rewrote ages ago. Those get fed into a second script that processes them into a form readable by Tesseract. At this point I could easily do the remainder of the work by hand, but I'd like to automate all but the final proofread pass if possible.
Example text (from current project):
03:04.418 Their parents have always written letters thanking us. =
03:05.018 Their parents have always written letters thanking us. =
03:05.619 Their parents have always written letters thanking us. =
03:06.219 Their parents have always written letters thanking us. =
03:06.820 Their parents have always written letters thanking us. =
03:07.421 Their parents have always written letters thanking us. =
03:08.021 Their parents have always written letters thanking us. =
03:08.622 This seminary was highly reeemmended. | am relieved te leave her in your care. =
03:09.222 This seminary was highly reeemmended. | am relieved te leave her in your care. =
03:09.823 This seminary was highly reeemmended. | am relieved te leave her in your care. =
03:10.424 This seminary was highly reeemmended. | am relieved te leave her in your care. =
03:11.024 This seminary was highly reeemmended. | am relieved te leave her in your care. =
03:11.625 This seminary was highly reeemmended. | am relieved te leave her in your care. =
03:12.225 In additien te all the previeus requests se far..."
03:12.826 In additien te all the previeus requests se far..."
03:13.427 In additien te all the previeus requests se far..."
03:14.027 In additien te all the previeus requests se far..."
03:14.628 In additien te all the previeus requests se far..."
Basically I want to match the text, pull the timestamps from the first and last lines of each run, and set them up in SRT format:
1
00:03:04,418 --> 00:03:08,021
Their parents have always written
letters thanking us. =
2
00:03:08,622 --> 00:03:11,625
This seminary was highly reeemmended
| am relieved te leave her in your care. =
3
00:03:12,225 --> 00:03:14,628
In additien te all the previeus requests se far..."
At this point I'm fine with it being a separate script.
Basically sub.txt in, sub.srt out, then do a proofread pass. There is a bit of variability in the detected text, but it's minimal: I is occasionally detected as | or [, and it sometimes mixes up o and e in some odd corner cases.
Edit, February 2 2020: I've made some changes and tweaks to both my shell script and Ivan's to further get what I wanted. I've eliminated the blank sub lines produced by both Ivan's script and mine.
Updated processing and OCR script:
#!/bin/bash -x
cd "$1"
mkdir ocr
for f in *.png ; do
base="$(basename "$f" | cut -d "." -f 1,2)"
echo "$base"
if [[ -z "$2" ]] ; then
# default mode: threshold the subtitle region to clean black-on-white for OCR
convert "$f" -separate -average -crop +0+720 -threshold 11% -fill black -draw 'color 700,10 floodfill' +repage ocr/"$base".png
else
# alternate mode: negate first, for the other subtitle style
convert "$f" -separate -average -crop +0+720 -negate -threshold 15% -fill white -draw 'color 700,10 floodfill' +repage ocr/"$base".png
fi
cd ocr
magick mogrify -pointsize 50 -fill blue -draw 'text 1400,310 "L" ' +repage "$base".png
cd ..
done
cd ocr
for i in *.png ; do
base2="$(basename "$i" | cut -d "." -f 1,2 | cut -d ":" -f 2,3)"
tesseract "$i" stdout -c page_separator='' --psm 6 --oem 1 --dpi 300 | { tr '\n' ' '; tr -s '[:space:]' ' '; echo; } >> text.txt
echo "$base2"" " >> time.txt
done
# pair each timestamp line with the corresponding OCR text line
awk '{printf ("%s", $0); getline < "text.txt"; print $0 }' time.txt >> out.txt
# common OCR fixes: | and [ are misread capital I
sed -i 's/|/I/g' out.txt
sed -i 's/\[/I/g' out.txt
#sed -i 's/L//g' out.txt
#sed -i 's/=//g' out.txt
# strip the two trailing characters
sed -i 's/.$//' out.txt
sed -i 's/.$//' out.txt
# keep only lines that contain at least one letter
sed '/[[:alpha:]]/ !d' out.txt >> sub.txt
exit
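Hypothetical usage (script name assumed): the first argument is the directory containing the captured PNG frames, and passing any second argument switches to the negated threshold for the alternate subtitle style.
./ocr.sh /path/to/frames
./ocr.sh /path/to/frames invert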
The part drawing the blue L is to ensure every line has something in it for timestamp matching.
Updated Ivan's SRT script:
#!/bin/bash -x
sub="$1" # path to sub file
OLD=$IFS # remember current delimiter
IFS=$'\n' # set delimiter to the new line
raw=( $(cat $sub) ) # load sub into raw array
IFS=$OLD # set default delimiter back
reset () {
unset raw[0] # remove 1-st item from array
raw=( "${raw[@]}" ) # rearrange array
}
output () {
printf "00:$time1 --> 00:$time3\n$text1\n\n"
}
speen () {
time3=$time2
reset
test=( "${raw[@]::2}" ) # get two more items
test2=( ${test[0]} ) # split 2-nd item
time2=${test2[0]} # get 2-nd timing
text2=${test2[@]:1} # get 2-nd text
# if only one item in test then this is the end, return
[[ "${test[1]}" ]] || { printf "00:$time1 --> 00:$time2\n$text1\n\n"; raw=; return; }
# compare, speen more if match, print and go further if not
[[ "$text1" == "$text2" ]] && speen || output
}
N=1 # set counter
while [[ "${raw[@]}" ]]; do # loop through data
echo $((N++)) # print and inc counter
test1=( $raw ) # get 1-st item
time1=${test1[0]} # get 1-st timing
text1=${test1[@]:1} # get 1-st text
speen
done
I just added a third time variable to save the old time2 value as time3. Basically, eliminating the blank timestamp line broke his matching. I realized that time2 was the first non-matching timestamp, so I needed to save the one prior from the last loop, thus time3=$time2. Then I reset the time2 value and use the old time2 (now time3) to print the sub string.
Ended with this:
#!/bin/bash
sub=file # path to sub file
OLD=$IFS # remember current delimiter
IFS=$'\n' # set delimiter to the new line
raw=( $(cat $sub) ) # load sub into raw array
IFS=$OLD # set default delimiter back
reset () {
unset raw[0] # remove 1-st item from array
raw=( "${raw[@]}" ) # rearrange array
}
output () {
text1=${text1//|/I} # change | to I in text
text1=${text1//[/I} # change [ to I in text
printf "$time1 --> $time2\n$text1\n\n"
}
speen () {
reset
test=( "${raw[@]::2}" ) # get two more items
test2=( ${test[0]} ) # split 2-nd item
time2=${test2[0]} # get 2-nd timing
text2=${test2[@]:1} # get 2-nd text
# if only one item in test then this is the end, return
[[ "${test[1]}" ]] || { printf "$time1 --> $time2\n$text1\n\n"; raw=; return; }
# compare, speen more if match, print and go further if not
[[ "$text1" == "$text2" ]] && speen || output
}
N=1 # set counter
while [[ "${raw[@]}" ]]; do # loop through data
echo $((N++)) # print and inc counter
test1=( $raw ) # get 1-st item
time1=${test1[0]} # get 1-st timing
text1=${test1[@]:1} # get 1-st text
speen
done
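A hypothetical way to run it (script name assumed; remember that sub is hardcoded at the top, so point it at your OCR text file first):
chmod +x makesrt.sh
./makesrt.sh > sub.srt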

Create asterisk border around output in terminal?

I need to create a border around the output of a command in terminal so that if, for example, the output of a command is this:
Apple
Paper Clip
Water
It will become this:
/==========\
|Apple     |
|Paper Clip|
|Water     |
\==========/
Thanks ahead of time for any and all responses.
-C.L
awk seems like the least insane way to go about this:
command | expand | awk 'length($0) > length(longest) { longest = $0 } { lines[NR] = $0 } END { gsub(/./, "=", longest); print "/=" longest "=\\"; n = length(longest); for(i = 1; i <= NR; ++i) { printf("| %s %*s\n", lines[i], n - length(lines[i]) + 1, "|"); } print "\\=" longest "=/" }'
expand replaces tabs that may be in the output with the appropriate number of spaces to keep the look of it the same (this is to make sure that every byte of output is rendered with the same width). The awk code works as follows:
length($0) > length(longest) { # Remember the longest line
longest = $0
}
{ # also remember all lines in order
lines[NR] = $0
}
END { # when you have everything:
gsub(/./, "=", longest) # build a line of = as long as the longest
# line
print "/=" longest "=\\" # use it to print the top bit
n = length(longest) # format the content with left and right
for(i = 1; i <= NR; ++i) { # delimiters; spacing through printf
printf("| %s %*s\n", lines[i], n - length(lines[i]) + 1, "|")
}
print "\\=" longest "=/" # print bottom bit.
}
The most insane way to do it, and I dare you to dispute this, is with sed:
#!/bin/sed -f
# assemble lines in the hold buffer, preceded by the left delimiter
s/^/| /
1h
1!H
$!d
# make a copy of it in the pattern space
x
h
# isolate the longest line (or rather: a line of = as long as the longest
# line)
s/[^\n]/=/g
:a
/^\(=*\)\n\1/ {
s//\1/
ba
}
//! {
s/\n=*//
ta
}
# build top bit, print it
s,.*,/&\\,
p
# build measuring stick
s,.\(.*\).,=\1,
# for all lines in the output:
:lineloop
# fetch the line
G
s/^\(=*\n\)\([^\n]*\).*/\1\2/
# replace it with = to get a second measuring stick
s/[^\n]/=/g
# fetch another copy of the line
G
s/^\(=*\n=*\n\)\([^\n]*\).*/\1\2/
# inner loop:
:spaceloop
# while the line measuring stick is not as long as the overall measuring
# stick
/^\(=*\)\n\1/! {
# append a = to it and a space to the line for output
s/\n/\n=/
s/$/ /
b spaceloop
}
# once that is done, append the second delimiter
s/$/|/
# remove one measuring stick
s/=*\n//
# put the second behind the actual line
s/\(.*\)\n\(.*\)/\2\n\1/
# print the line
P
# remove it. Only the measuring stick remains and can be reused for the
# next line
s/.*\n//
# do this while there are more lines to be processed
x
/\n/ {
s/[^\n]*\n//
x
b lineloop
}
# then build the bottom bit and print it.
x
s/=/\\/
s/$/\//
Put that in a file foo.sed, use command | expand | sed -f foo.sed. But only do it once to confirm that it works. You don't want to run something like that in production.
Not in the language you were looking for, but succinct and readable:
#!/usr/bin/env ruby
input = STDIN.read.split("\n")
width = input.map(&:size).max + 2
bar = '='*(width-2)
puts '/' + bar + '\\'
input.each {|i| puts "|"+i+" "*(width-i.size-2)+"|" }
puts '\\'+ bar + '/'
You can save it in a file, chmod +x it, and pipe your input into it.
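For example (script name assumed):
printf 'Apple\nPaper Clip\nWater\n' | ./border.rb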
If you "need" to have it in a one-liner:
echo e"Apple\nPaper Clip\nWater" |
ruby -e 'i=STDIN.read.split("\n");w=i.map(&:size).max+2;b="="*(w-2);i.map! {|j| "|"+j+" "*(w-j.size-2)+"|" };i.unshift "/"+b+"\\"; i<<"\\"+b+"/";puts i'

Iterative and conditional deleting of lines in a file

Intro
I have a file named data.dat with the following structure:
1: 67: 1 :s
1: 315: 1 :s
1: 648: 1 :ns
1: 799: 1 :s
1: 809: 1 :s
1: 997: 1 :ns
2: 32: 1 :s
Algorithm
The algorithm that I'm looking for is:
Generate a random number between 1 and the number of lines in the file.
Delete that line if its fourth column is "s".
Otherwise generate another random number, and repeat until the number of lines reaches a certain value.
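A minimal sketch of that loop in bash, assuming GNU shuf and GNU sed are available (both can be installed on OS X) and a hypothetical target of 100 lines:
target=100
while [ "$(wc -l < data.dat)" -gt "$target" ]; do
grep -q ':s[[:space:]]*$' data.dat || break  # stop early if no s rows remain
n=$(shuf -i 1-"$(wc -l < data.dat)" -n 1)    # random line number
if sed -n "${n}p" data.dat | grep -q ':s[[:space:]]*$'; then
sed -i "${n}d" data.dat                      # delete only if the fourth column is s
fi
done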
Technical Concepts
Though the technical concepts are irrelevant to the algorithm itself, I'll try to explain the problem. The data shows the connectivity table of a network. This algorithm allows us to run it over different initial conditions and study general properties of these networks. In particular, because bonds are deleted at random, any behavior common to these networks can be interpreted as a fundamental law.
Update: Another good reason to produce a random number in each step is that after removing a line, the s/ns property of the remaining lines may change.
Code
Here is the code I have until now:
#!/bin/bash
# bash in OSX
While ((#there is at least 1 s in the fourth column)); do
LEN=$(grep -c "." data.dat) # number of lines
RAND=$((RANDOM%${LEN}+1)) # generating random number
if [[awk -F, "NR==$RAND" 'data.dat' | cut -d ':' -f 4- == "s"]]; then
sed '$RANDd' data.txt
else
#go back and produce another random
done
exit
I try to find the fourth column with awk -F, "NR==$RAND" 'data.dat' | cut -d ':' -f 4- and to delete the line with sed '$RANDd' data.txt.
Questions
How should I check that there are still s rows in my file?
I am not sure the condition in the if is correct.
Also, I don't know how to make the loop go back after the else and generate another random number.
Thank you,
I really appreciate your help.
Personally, I would recommend against doing this in bash unless you have absolutely no choice.
Here's another way you could do it in Perl (quite similar in functionality to Alex's answer but a bit simpler):
use strict;
use warnings;

my $filename = shift;
open my $fh, "<", $filename or die "could not open $filename: $!";
chomp (my @lines = <$fh>);

my $sample = 0;
my $max_samples = 10;
while ($sample++ < $max_samples) {
    my $line_no = int rand @lines;
    my $line = $lines[$line_no];
    if ($line =~ /:s\s*$/) {
        splice @lines, $line_no, 1;
    }
}
print "$_\n" for @lines;
Usage: perl script.pl data.dat
Read the file into the array @lines. Pick a random line from the array and if it ends with :s (followed by any number of spaces), remove it. Print the remaining lines at the end.
This does what you want but I should warn you that relying on built-in random number generators in any language is not a good way to arrive at statistically significant conclusions. If you need high-quality random numbers, you should consider using a module such as Math::Random::MT::Perl to generate them, rather than the built-in rand.
#!/usr/bin/env perl
# usage: $ excise.pl < data.dat > smaller_data.dat
my $sampleLimit = 10; # sample up to ten lines before printing output
my $dataRef;
my $flagRef;
while (<>) {
    chomp;
    push (@{$dataRef}, $_);
    push (@{$flagRef}, 1);
}
my $lineCount = scalar @{$dataRef};
my $sampleIndex = 0;
while ($sampleIndex < $sampleLimit) {
    my $sampleLineIndex = int(rand($lineCount));
    my @sampleElems = split(":", $dataRef->[$sampleLineIndex]);
    if ($sampleElems[3] eq "s") {
        $flagRef->[$sampleLineIndex] = 0;
    }
    $sampleIndex++;
}
# print data.dat to standard output, minus any sampled lines that had an 's' in them
foreach my $lineIndex (0..(scalar @{$dataRef} - 1)) {
    if ($flagRef->[$lineIndex] == 1) {
        print STDOUT $dataRef->[$lineIndex]."\n";
    }
}
NumLine=$( grep -c "" data.dat )
# TargetLine: the desired final number of lines; set it before running
while [ ${NumLine} -gt ${TargetLine} ]
do
# echo "Line at start: ${NumLine}"
RndLine=$(( ( ${RANDOM} % ${NumLine} ) + 1 ))
RndValue="$( echo " ${RANDOM}" | sed 's/.*\(.\{6\}\)$/\1/' )"
# on the randomly chosen line: rewrite the second field of ns lines, delete s lines
sed "${RndLine} {
s/^\([^:]*:\)[^:]*\(:.*:ns$\)/\1${RndValue}\2/
t
d
}" data.dat > /tmp/data.dat
mv /tmp/data.dat data.dat
NumLine=$( grep -c "" data.dat )
#cat data.dat
#echo "- Next Iteration -------"
done
Tested on AIX (so not GNU sed). Under Linux, use the --posix option for sed, and you can use -i in place of the temporary file + redirection + move in this case.
Don't forget that RANDOM is NOT truly random, so a study of network behavior based on non-random values might reflect not reality but a specific case.

Using Shell tools (sed | awk... etc) to compute max, min and average field values from a given sample.dat file

I have a sample.dat file which contains experiment values for 10 different fields, recorded over time. Using sed, awk or any other shell tool, I need to write a script that reads in the sample.dat file and, for each field, computes the max, min and average.
sample.dat
field1:experiment1: 10.0
field2:experiment1: 12.5
field1:experiment2: 5.0
field2:experiment2: 14.0
field1:experiment3: 18.0
field2:experiment3: 3.5
Output
field1: MAX = 18.0, MIN = 5.0, AVERAGE = 11.0
field2: MAX = 14.0, MIN = 3.5, AVERAGE = 10.0
awk -F: '
{
sum[$1]+=$3;
if(!($1 in min) || (min[$1]>$3))
min[$1]=$3;
if(!($1 in max) || (max[$1]<$3))
max[$1]=$3;
count[$1]++
}
END {
for(element in sum)
printf("%s: MAX=%.1f, MIN=%.1f, AVARAGE=%.1f\n",
element,max[element],min[element],sum[element]/count[element])
}' sample.dat
Output
field1: MAX=18.0, MIN=5.0, AVERAGE=11.0
field2: MAX=14.0, MIN=3.5, AVERAGE=10.0
Here is a Perl solution I made (substitute the file name for whatever file you use):
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max min sum);

open( my $fh, "<", "sample.dat" ) or die $!;
my %fields;
while (<$fh>) {
    chomp;
    $_ =~ s/\s+//g;
    my @line = split ":";
    push @{ $fields{ $line[0] } }, $line[2];
}
close($fh);

foreach ( keys %fields ) {
    print "$_: MAX="
        . max @{ $fields{$_} };
    print ", MIN="
        . min @{ $fields{$_} };
    print ", AVERAGE="
        . ( (sum @{ $fields{$_} }) / scalar @{ $fields{$_} } ) . "\n";
}
In bash with bc:
#!/bin/bash
declare -A min
declare -A max
declare -A avg
declare -A avgCnt
while read line; do
key="${line%%:*}"
value="${line##*: }"
if [ -z "${max[$key]}" ]; then
max[$key]="$value"
min[$key]="$value"
avg[$key]="$value"
avgCnt[$key]=1
else
larger=`echo "$value > ${max[$key]}" | bc`
smaller=`echo "$value < ${min[$key]}" | bc`
avg[$key]=`echo "$value + ${avg[$key]}" | bc`
((avgCnt[$key]++))
if [ "$larger" -eq "1" ]; then
max[$key]="$value"
fi
if [ "$smaller" -eq "1" ]; then
min[$key]="$value"
fi
fi
done < "$1"
for i in "${!max[#]}"
do
average=`echo "${avg[$i]} / ${avgCnt[$i]}" | bc`
echo "$i: MAX = ${max[$i]}, MIN = ${min[$i]}, AVERAGE = $average"
done
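Hypothetical usage (script name assumed):
chmod +x stats.sh
./stats.sh sample.dat
Note that bc does integer division by default, so the reported average is truncated; prefixing the expression with scale=1, as in echo "scale=1; ${avg[$i]} / ${avgCnt[$i]}" | bc, keeps one decimal place.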
You can make use of this python code :
from collections import defaultdict
d = defaultdict(list)
[d[(line.split(":")[0])].append(float(line.split(":")[2].strip("\n "))) for line in open("sample.dat")]
for f in d: print f, ": MAX=", max(d[f]),", MIN=", min(d[f]),", AVG=", sum(d[f])/float(len(d[f]))
You can use gnu-R for something like this:
echo "1" > foo
echo "2" >> foo
cat foo \
| r -e \
'
f <- file("stdin")
open(f)
v <- read.csv(f,header=F)
write(max(v),stdout())
'
2
For summary statistics,
cat foo \
| r -e \
'
f <- file("stdin")
open(f)
v <- read.csv(f,header=F)
write(summary(v),stdout())
'
# Max, Min, Mean, median, quartiles, deviation, etc.
...
And in json:
... | r -e \
'
library(rjson)
f <- file("stdin")
open(f)
v <- read.csv(f,header=F)
json_summary <- toJSON(summary(v))
write(json_summary,stdout())
'
# same stats
| jq '.Max'
# for maximum
If you are using the Linux command-line environment, then you probably don't want to reinvent wheels; you want to stay vectorized and have clean code that is easy to read and develop and that performs some standard, composable function.
In this case, you don't need an object-oriented language (using Python will tend to induce interface and code bloat, plus iterations with Google, pip, and conda depending on the libs you need, and type conversions you have to code by hand), you don't need verbose syntax, and you probably need to deal with dataframes/vectors/rows/columns of numerical data by default.
You probably also want scripts that can float around your particular machine without issues. If you are on Linux, that probably means: gnu-R. Install dependencies via apt-get.

How to solve bail out error in Unix script invoking nawk? [closed]

The following command is working fine when I am not writing it in a script file, but when I put this command in a script file, it shows an error.
nawk 'c-- >0;$0~s{if(b)for(c=b+1;c >1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=10 a=10 s="string pattern" file
The error is:
nawk: syntax error at source line 1 context is >>> ' <<< missing }
nawk: bailing out at source line 1
One of the comment responses to one of the many requests for 'What does your script look like' is:
#!/bin/ksh
Stringname=$1
directory=$2
d=$3
Command="nawk 'c-- >0;$0~s{if(b)for(c=b+1;c >1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=10 a=10 s=\"$stringname\" $directory"
$Command> $d
Storing the whole command in a string like that is hugely fraught; don't do it! It's unnecessary and very, very hard to get right.
#!/bin/ksh
stringname=$1
directory=$2
d=$3
nawk 'c-- >0;$0~s{if(b)for(c=b+1;c >1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=10 a=10 s="$stringname" "$directory" > "$d"
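A hypothetical invocation (script and file names assumed):
./context.ksh "string pattern" input.log matches.out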
The quickest way to solve the problem of printing N lines before and M lines after a match is to install GNU grep and use:
grep -B $N -A $M 'string pattern' file
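With the b=10 a=10 values from the original nawk command, that is:
grep -B 10 -A 10 'string pattern' file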
Failing that, here's a Perl script I wrote about 5 years ago to do the job. Note that there are some complications if you ask for 10 lines before and 10 lines after a match, and you have:
a match at line 7 (not 10 lines before)
a match at line 30 and another at 35 (need to print lines 20-45)
a match at line 60 where the last line is line 65 (not 10 lines after)
and there are multiple files to process.
This code does handle all that. It can probably be improved.
#!/usr/bin/perl -w
#
# @(#)$Id: sgrep.pl,v 1.6 2007/09/18 22:55:20 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1 Print n1 lines before match
# -f n2 Print n2 lines following match
# -n Print line numbers
# -h Do not print file names
# -H Do print file names
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless @ARGV >= 1;
if ($opts{h} && $opts{H})
{
print STDERR "$0: mutually exclusive options -h and -H specified\n";
exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar @ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my @lines = (); # Accumulated lines
my $tail = 0; # Line number of last line in list
my $tbp_1 = 0; # First line to be printed
my $tbp_2 = 0; # Last line to be printed
# Print lines from @lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
my ($leave) = @_;
while (scalar(@lines) > $leave)
{
my $line = shift @lines;
my $curr = $tail - scalar(@lines);
if ($tbp_1 <= $curr && $curr <= $tbp_2)
{
print "$ARGV:" if $opts{F};
print "$curr:" if $opts{n};
print $line;
}
}
}
# General logic:
# Accumulate each line at end of @lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
# if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
# Add this line to the accumulated lines
push @lines, $_;
$tail = $.;
printf "# array: N = %d, last = $tail: %s", scalar(@lines), $_ if debug > 1;
if (m/$op/o)
{
# This line matches - set range to be printed
my $lo = $. - $before;
$tbp_1 = $lo if ($lo > $tbp_2);
$tbp_2 = $. + $after;
print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
}
# Print out any accumulated lines that need printing
# Leave $before lines in array.
print_leaving($before);
}
continue
{
if (eof)
{
# Print out any accumulated lines that need printing
print_leaving(0);
# Reset for next file
close ARGV;
$tbp_1 = 0;
$tbp_2 = 0;
$tail = 0;
@lines = ();
}
}
I bet you're trying to execute your script as nawk -f file instead of just ./file.
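In other words, with hypothetical names:
nawk -f myscript file      # wrong: asks nawk to parse the ksh script as an awk program
./myscript "string pattern" file out.txt      # right: execute it as a shell script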
