I have a script that searches through a very large number of files, and uses sed to substitute a multiple line pattern. The script is iterative, and it works fine on some iterations but sometimes it causes a segmentation fault.
This is what the script is doing:
Search for files that DON'T contain the string X
Out of these files, search the ones that CONTAIN the string Y
Iterate the returned file list with a for-loop
If the file contents match pattern A, replace pattern A with A_TAG
The same for patterns B,C,D (a file can contain only one of A,B,C,D)
Patterns A,B,C,D are multiline, and they are replaced with two lines. X and Y are single line.
Here's the script. I apologise for the long lines, but I decided not to edit them since they're regex. I did however shorten the regex by replacing strings with "pattern". The replaced contents are NOT the same in every regex, but they don't have any special characters, so I don't think the actual contents are relevant to this question. Besides, the regex has been shown to work, so you probably don't need to fully understand it.
#!/bin/sh
STRING_A="Pattern(\n|.)*Pattern\.\""
A_TAG="\$STRING:A$"
STRING_B="(Pattern(\n|.)*)?(Pattern(\n|.)*)?Pattern(\n|.)*Pattern(\n|.)*Pattern\.((\n|.)*will be met\: http\:\/\/www.foo\.org\/example\/temp\.html\.\n)?"
B_TAG="\$STRING:B$"
STRING_C="(Pattern(\n|.)*)?Pattern(\n|.)*http\:\/\/www\.foo\.org\/bar\/old-foobar\/file\-2\.1\.html\.((\n|.)*Pattern.*Pattern)?"
C_TAG="\$STRING:C$"
STRING_D="(Pattern(\n|.)*)?(Pattern(\n|.)*http\:\/\/www\.foo\.org\/bar\/old-foobar\/file\-2\.1\.html.*|Pattern(\n|.)*Pattern)((\n|.)*http\:\/\/www\.some-site\.org/\.)?"
D_TAG="\$STRING:D$"
## params: #1 file, #2 PATTERN, #3 TAG
multil_sed()
{
echo "In multil_sed"
# -n = silent, -r = extended regex; output is redirected to $1.foo rather than edited in place with -i
sed -nr '
# Sed has a hold buffer that we can use to "keep text in memory".
# Here we copy the line to the buffer if it is the first line of the file,
# or append it if it is not
1h
1!H
# We must first save all lines until the nth line to the hold buffer,
# then we can search for our pattern
60 {
# Then we must use the pattern buffer. Pattern buffer holds text that
# is up for modification. With g we can copy the hold buffer into the pattern space
g
# Now we can just use the substitution command as we normally would. Use # as a delimiter
s#([ \t:#*;/".\\-]*)'"$2"'#\1'"$3"'\
\1$QT_END_LICENSE$#Ig
# Finally print what we did
p
}
' $1 > $1.foo;
echo "Done"
}
for p in $(find . -type f -not -iwholename '*.git*' -exec grep -iL '.*STRING_X.*' {} \; | xargs grep -il -E '.*STRING_Y.*')
do
echo
echo "####################"
echo "Working on file" $p
#Find A
if pcregrep -qiM "$STRING_A" "$p";
then
echo "A"
multil_sed "$p" "$STRING_A" "$A_TAG"
#Find B
elif pcregrep -qiM "$STRING_B" "$p";
then
echo "B"
multil_sed "$p" "$STRING_B" "$B_TAG"
#Find C
elif pcregrep -qiM "$STRING_C" "$p";
then
echo "C"
multil_sed "$p" "$STRING_C" "$C_TAG"
#Find D
elif pcregrep -qiM "$STRING_D" "$p";
then
echo "D"
multil_sed "$p" "$STRING_D" "$D_TAG"
else
echo "No match found"
fi
echo "####################"
done
I should probably note that C is essentially a longer version of D, that has some extra contents before the common part.
What happens is that for some iterations this works OK:
####################
Working on file ./src/listing.txt
A
In multil_sed
Done
####################
and sometimes it doesn't.
####################
Working on file ./src/web/page.html
/home/tekaukor/code/project/tag_adder.sh: line 54: 16904 Segmentation fault (core dumped) pcregrep -qiM "$STRING_A" "$p"
No match found
####################
It's not dependent on which pattern is being searched.
####################
Working on file ./src/test/formatter_test.cpp
/home/tekaukor/code/project/tag_adder.sh: line 54: 18051 Segmentation fault (core dumped) pcregrep -qiM "$STRING_B" "$p"
/home/tekaukor/code/project/tag_adder.sh: line 54: 18053 Segmentation fault (core dumped) pcregrep -qiM "$STRING_C" "$p"
/home/tekaukor/code/project/tag_adder.sh: line 54: 18055 Segmentation fault (core dumped) pcregrep -qiM "$STRING_D" "$p"
No match found
####################
Line 54 points to the line "for p in $(find . -type f -not -iwholename '.git' -exec grep...".
My guess is that sed is causing a buffer overflow, but I haven't found a way to ascertain or fix this.
Bash isn't great about locating the source of a fault in a compound statement so
Line 54 points to the line for p in $(find . -type f ....
is misleading as the error could be anywhere in that for statement block. The error message
Segmentation fault (core dumped) pcregrep -qiM "$STRING_D" "$p"
is much more accurate. The likely cause of the fault is the -M flag combined with unbounded patterns like (.|\n)*. As the pcregrep man page notes:
-M, --multiline
Allow patterns to match more than one line. When this option is given, patterns may usefully contain literal newline characters and internal occurrences of ^ and $ characters. The output for any one match may consist of more than one line. When this option is set, the PCRE library is called in "multiline" mode. There is a limit to the number of lines that can be matched, imposed by the way that pcregrep buffers the input file as it scans it. However, pcregrep ensures that at least 8K characters or the rest of the document (whichever is the shorter) are available for forward matching, and similarly the previous 8K characters (or all the previous characters, if fewer than 8K) are guaranteed to be available for lookbehind assertions.
with emphasis mine. The single pattern fragment .* or (.|\n)* can literally match an entire file, so yes, it will fill up its lookahead buffer not just to the next literal (e.g. http) but until it finds the last such literal, because by default regular expressions seek the longest conforming match.
UPDATE #2: So apparently sed doesn't support non greedy matching, which makes part of my answer invalid. There are ways around this, but I will not include them here as it's far removed from the original question. The answer to this question is using the --disable-stack-for-recursion flag as described below.
The answer by msw helped me in the right direction.
First I changed the regex to be lazy instead of greedy. By default regex is greedy, which (as msw stated) means that a multiline expression with "PATTERN(.|\n)*TEXT" will search through the whole file. By adding "?" after quantifiers (* -> *?) I made the regex lazy, which means that the "(.|\n)*?" in "PATTERN(.|\n)*?TEXT" will stop expanding at the first TEXT.
I also made the optional parts lazy (? -> ??), though I'm not sure if this was necessary.
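To illustrate the greedy behaviour with sed itself (which, as UPDATE #2 says, has no lazy quantifiers), here is a minimal sketch. It assumes the text that ends the match is a single character, so a negated bracket expression can stand in for a lazy quantifier; the sample text is made up and is not one of the real patterns.

```shell
# Hypothetical sample text with more than one closing double quote.
text='Pattern one" middle "two" tail'

# Greedy: .* runs on to the LAST double quote on the line.
greedy=$(printf '%s\n' "$text" | sed -E 's/Pattern.*"/TAG/')

# Bounded: [^"]* cannot cross a double quote, so the match ends at the FIRST one.
bounded=$(printf '%s\n' "$text" | sed -E 's/Pattern[^"]*"/TAG/')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

This trick only helps when the terminator is one character; for a multi-character terminator such as TEXT a different approach is needed.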
However this was not enough. I also had to configure pcregrep to use heap instead of stack memory. I downloaded pcre and configured using the flag --disable-stack-for-recursion. Note that using heap is much slower, so you shouldn't do this if you don't have to.
I'm including a step-by-step in case anyone wanders here with the same problem. Note that I'm still a Linux newb and there's a high chance that I did something unnecessary and/or stupid. The instructions are based on http://www.mail-archive.com/pcre-dev@exim.org/msg00817.html and http://www.linuxfromscratch.org/blfs/view/svn/general/pcre.html
Download pcre from http://downloads.sourceforge.net/pcre/pcre-8.33.tar.bz2
tar jxf pcre-8.33.tar.bz2
cd pcre-8.33
./configure --prefix=/usr --docdir=/usr/share/doc/pcre-8.33 --enable-utf --enable-unicode-properties --enable-pcregrep-libz2 --disable-static --disable-stack-for-recursion
make
sudo make install
There are some additional steps in the provided guide, but I didn't have to do them.
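Before rebuilding PCRE it may be worth trying a larger stack for the pcregrep process, since PCRE's own notes on stack usage mention raising the limit with ulimit -s. The following is only a sketch: with_big_stack is a hypothetical helper name, and whether the hard limit on a given machine is high enough for these patterns is an assumption that has to be tested.

```shell
# Hypothetical helper: run a command in a subshell whose soft stack limit
# has been raised to the hard limit, leaving the calling shell untouched.
with_big_stack() {
    (
        # If the limit cannot be raised, run the command anyway.
        ulimit -S -s "$(ulimit -H -s)" 2>/dev/null || true
        "$@"
    )
}

# In the script from the question the call would look like:
#   if with_big_stack pcregrep -qiM "$STRING_A" "$p"; then ...
```

The subshell keeps the changed limit local to that one command, so the rest of the script runs with its normal settings.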
UPDATE: Making the optional elements lazy (? -> ??) is a mistake, as then they will not be included in the matched pattern if possible.
I have many very large files. Within each file it repeats 3 times. My intent is to delete the first portion of all of them such that only the last two repeats remain.
The code I have loops through the lines and identifies the position of each repeat (via a counter) and saves them as a variable (FIRST and END). My hope is that I would then use: sed -i '${FIRST},${END}d ${i}.log' to cut out that section of the file.
However when I run the code I get an error as follows: sed: -e expression #1, char 3: extra characters after command
Here is the code that reads the files, where "Cite" is the keyword that identifies repeats:
while read -r LINE ; do
((LCOUNT++))
if [[ "$LINE" =~ "Cite" ]] ; then
((CITE++))
if [[ "$CITE" = 1 ]] ; then
FIRST=${LCOUNT}
fi
if [[ "$CITE" = 2 ]] ; then
END=$((LCOUNT - 1))
fi
fi
done < "./${i}.log"
Your command
sed -i '${FIRST},${END}d ${i}.log'
does not make sense. You call sed here with two arguments: The option
-i
and a single string which is literally
${FIRST},${END}d ${i}.log
Since you have used single quotes, no parameter expansion occurs, and the whole piece is passed to sed as a single argument to be interpreted as a sed program. sed tries to read from stdin (since you have not passed a file argument), and the sed program obviously does not make sense.
You could do something like
sed "${FIRST},${END}d" "${i}.log"
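A small self-contained sketch of the difference, using a made-up log file. The values of FIRST and END are set by hand here; in the question they come from the counting loop.

```shell
# Hypothetical sample log: "Cite" marks the start of each repeat.
log=$(mktemp)
printf '%s\n' 'header' 'Cite' 'first repeat' 'Cite' 'second repeat' > "$log"

FIRST=2   # line number of the first "Cite"
END=3     # line before the second "Cite"

# Double quotes let the shell expand the variables before sed sees the
# script; the file name is passed as a separate argument.
result=$(sed "${FIRST},${END}d" "$log")
printf '%s\n' "$result"
rm -f "$log"
```

With GNU sed, adding -i to the same command edits the file in place instead of printing the result.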
A note aside, regarding the title of your post: "numerical variables" do not exist in bash. Every variable is a string. You can do a
typeset -i foo
which makes bash evaluate every assigned value as an arithmetic expression, so the stored string always represents an integer, but it is still a string. For instance,
foo=abc # sets foo to the string 0
foo=00005 # sets foo to the string 5
foo=5a # raises an error
This might work for you (GNU sed):
sed -ni '/Cite/!{p;b};:a;n;//!ba;:b;p;n;bb' file1 file2 ... filen
Turn off implicit printing -n and turn on edit inplace -i.
If a line does not match Cite, print it and repeat.
Otherwise skip the following lines until another match, then print that matching line and the remaining lines until the end of the file.
N.B. The -i option treats each file separately, in the same way the -s option does, but edits the files in place. Test with the -s option first and, once the results are as expected, substitute the -i option.
Attempting to convert a twitter account of over 10K tweets into another format with a bash script on a maxed out MBP 16" running the latest macOS.
After running for several minutes outputting many periods it says, line 43: /bin/ls: Argument list too long. Assuming this issue relates to the number of tweets so while I could attempt to break into small pieces as a last resort, not knowing what the max number to avoid the error is, decided to first search for a solution.
Searched Google and SO and found, "bash: /bin/ls: Argument list too long". If my issue is the same it sounds like replacing "ls" with "find -name" may help. Tried and same error, but perhaps not the correct syntax.
The two lines that use "ls" currently are the following (the first is the one the error currently complains about):
for fileName in `ls ${thisDir}/dotwPosts/p*` ; do
and
printf "`ls ${thisDir}/dotwPosts/p* | wc -l` posts left to import.\n"
Tried changing the first line to (with the error saying /usr/bin/find: Argument list too long).
for fileName in `find -name ${thisDir}/dotwPosts/p*` ; do
May need to provide additional code, but didn't want to make the question too specific to my needs and more general hopefully for others seeing this common error where the other stackoverflow answer didn't seem to apply.
To iterate over files in a directory in bash, print the filenames as a NUL-separated stream and read it. That way you don't need to store all filenames at once in any place:
find "${thisDir}/dotwPosts/" -maxdepth 1 -type f -name 'p*' -print0 |
while IFS= read -d '' -r file; do
printf "%s\n" "$file"
done
To get the count, output a single character for each file and count the characters:
find "${thisDir}/dotwPosts/" -maxdepth 1 -type f -name 'p*' -printf . | wc -c
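Note that -printf is a GNU find extension; the BSD find that ships with macOS does not have it. Here is a sketch of a count that avoids it and relies only on standard printf behaviour. The directory and file names below are made up for the demonstration.

```shell
# Throwaway directory with hypothetical sample files.
dir=$(mktemp -d)
touch "$dir/p1" "$dir/p 2" "$dir/q3"

# printf reuses its format for every argument: '%.0s' consumes one file
# name while printing nothing, and the '.' emits one character per file.
# tr strips the padding that some wc implementations add.
count=$(find "$dir" -maxdepth 1 -type f -name 'p*' -exec printf '%.0s.' {} + | wc -c | tr -d ' ')

printf '%s posts left to import.\n' "$count"
rm -rf "$dir"
```

Because find batches the names itself with {} +, the argument list never grows beyond what the system allows.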
Don't use ` backticks; their use is discouraged (see "discouraged and deprecated syntax" on the Bash Hackers Wiki). Use $(...) instead.
for fileName in $(...) is a common antipattern in bash. If you want to iterate over the output of another command, you should most probably use a while IFS= read -r line loop (see the BashFAQ entry "How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?").
Try this:
for file in "${thisDir}/dotwPosts/p"*
do
# exclude non plain files
[[ -f $file ]] || continue
# do something with "$file"
...
done
I quoted "${thisDir}/dotwPosts/p", so wildcards inside the variable thisDir are not expanded, but paths containing blanks work. If you need wildcards in thisDir to be expanded, remove the quotes.
I have some text files $f resembling the following
function
%blah
%blah
%blah
code here
I want to append the following text before the first empty line:
%
%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
%3.0 Unported License. See notes at the end of this file for more information.
I tried the following:
top=$(cat ./PATH/text.txt)
top="${top//$'\n'/\\n}"
sed -i.bak 's#^$#'"$top"'\\n#' $f
where the second line (I think) preserves the new line in the text and the third line (I think) substitutes the first empty line with the text plus a new empty line.
Two problems:
1- My code appends the following text:
%n%This work is licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike n%3.0 Unported License. See notes
at the end of this file for more information.\n
2- It appends it at end of the file.
Can someone please help me understand the problems with my code?
If you are using GNU sed, following would work.
Use ^$ to find the empty line and then use sed to replace/put the text that you want.
# Define your replacement text in a variable
a="%\n%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike\n%3.0 Unported License. See notes at the end of this file for more information."
Note, $a should include those \n that will be directly interpreted by sed as newlines.
$ sed "0,/^$/s//$a/" inputfile.txt
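In the above syntax, 0,/^$/ is a GNU sed address range that ends at the first line matching ^$ (the first empty line), even if that is line 1. The empty pattern in s// reuses that same regex.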
Output:
function
%blah
%blah
%
%This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
%3.0 Unported License. See notes at the end of this file for more information.
%blah
code here
test
You've included bash and sed tags in your question. Since I can't seem to come up with a way of doing this in sed, here's a bash-only solution. It's likely to perform the worst of all working solutions you might find.
The following works with your sample input:
$ while IFS= read -r x; do [[ -z "$x" ]] && cat boilerplate; printf '%s\n' "$x"; done < src
This will however insert the boilerplate before EVERY blank line, which is probably not what you're after. Instead, we should probably make this more than a one-liner:
#!/usr/bin/env bash
y=true
# IFS= keeps leading and trailing whitespace on each line intact
while IFS= read -r x; do
    if [[ -z "$x" ]] && $y; then
        cat boilerplate
        y=false
    fi
    printf '%s\n' "$x"
done < src
Note that unlike the code in your question, this doesn't store your boilerplate in a variable, it just cats it "at the right time".
Note that this sends the combined output to stdout. If your goal is to modify the original file, you'll need to wrap this in something that moves around temporary files. (Note that sed's -i option also doesn't really edit files in place, it only hides the moving-around-temp-files from you.)
The following alternatives are probably a better idea.
A similar solution to the bash one might be achieved with better performance using awk:
awk 'NR==FNR{b=b $0 ORS;next} /^$/&&!y{printf "%s",b;y++} 1' boilerplate src
This awk solution obviously reads your boilerplate into a variable, though it's not a shell variable.
Notwithstanding non-standard platform-specific extensions, awk does not have any facility for editing files "in place" either. A portable solution using awk would still need to push temp files around.
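A minimal sketch of that temp-file dance around the awk one-liner above. The function name insert_before_first_blank is hypothetical, and the sample files are throwaway ones created just for the demonstration.

```shell
# Hypothetical helper: $1 = boilerplate file, $2 = file to modify.
insert_before_first_blank() {
    tmp=$(mktemp) || return 1
    # Only overwrite the target if awk succeeded. Copying the contents
    # back (instead of mv) keeps the target's permissions.
    awk 'NR==FNR{b=b $0 ORS;next} /^$/&&!y{printf "%s",b;y++} 1' "$1" "$2" > "$tmp" &&
        cat "$tmp" > "$2"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demonstration on throwaway files.
work=$(mktemp -d)
printf '%s\n' '%' '%License line' > "$work/boilerplate"
printf '%s\n' 'function' '%blah' '' 'code here' '' 'more' > "$work/src"
insert_before_first_blank "$work/boilerplate" "$work/src"
result=$(cat "$work/src")
rm -rf "$work"
```

Only the first empty line gets the boilerplate; the second one is left alone.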
And of course, the following old standard of ed is great to keep in your back pocket:
printf 'H\n/^$/\n-\n.r boilerplate\nw\nq\n' | ed src
In bash, of course, you could always use a here-string, which might be clearer:
$ ed src <<< $'H\n/^$/\n-\n.r boilerplate\nw\nq\n'
The ed command is non-stream version of sed. Or rather, sed is the stream version of ed, which has been around since before the dinosaurs and is still going strong.
The commands we're using are separated by newlines and fed to ed's standard input. You can discard stdout if you feel the urge. The commands shown here are:
H - instruct ed to print more useful errors, if it gets any.
/^$/ - search for the first empty line.
- - GO BACK ONE LINE. Awesome, right?
.r boilerplate - Read your boilerplate at the current line,
w - and write the file.
q - Quit.
Note that this does not keep a .bak file. You'll need to do that yourself if you really want one.
And if, as you suggested in comments, the filename you're reading is to be constructed from a variable, note that variable expansion does not happen inside ANSI-C quoting ($' .. '). You can either switch quoting mechanisms mid-script:
ed "$file" <<< $'H\n/^$/\n-\n.r ./TATTOO_'"$currn"$'/top.txt\nw\nq\n'
Or you could put ed script in a variable constructed by printf
printf -v scr 'H\n/^$/\n-\n.r ./TATTOO_%s/top.txt\nw\nq\n' "$currn"
ed "$file" <<< "$scr"
Adding the text to a variable so you can interpolate the variable is wasteful and an unnecessary complication. sed can easily read the contents of a file by itself.
sed -i.bak '1r./PATH/text.txt' "$f"
Unfortunately, this part of sed is poorly standardized, so you may have to experiment a little bit. Some dialects require a newline (perhaps, or perhaps not, preceded by a backslash) before the filename.
sed -i.bak '1r\
./PATH/text.txt' "$f"
(Notice also the double quotes around the file name. You generally always want double quotes around variables which contain file names. More here.)
Adapting the recipe from here we can extend this to apply to the first empty line instead of the first line.
sed -i.bak -e '/^$/!b' -e 'r./PATH/text.txt' -e :a -e '$!{' -e n -e ba -e } "$f"
This adds the boilerplate after the first empty line but perhaps that's acceptable. Refactoring it to replace it or add an empty line after should not be too challenging anyway. (Maybe use sed -n and instead explicitly print everything except the empty line.)
In brief terms, this skips to the end (simply prints) up until we find the first empty line. Then, we read and print the file, and go into a loop which prints the remainder of the file without returning to the beginning of the script.
sed that I think works. Uses files for the extra bit to be inserted.
b='##\n## comment piece\n##'
sed --posix -ne '
1,/^$/ {
/^$/ {
x;
/^true$/ !{
x
s/^$/true/
i\
'"$b"'
};
x;
s/^.*$//
}
}
p
' file1
With the examples using ranges of 1,/^$/, an empty first line would result in the disclaimer being printed twice. To avoid this, I've set it up to put a flag in the hold space ( x; s/^$/true/ ) that I can swap to the pattern space to check whether it's the first blank. Once there's a match for a blank line, i\ inserts the comment ($b) in front of the pattern space.
Thanks to ghoti for the initial plan.
I am currently trying to extract ALL matching expressions from a text which e.g. looks like this and put them into an array.
aaaaaaaaa${bbbbbbb}ccccccc${dddd}eeeee
ssssssssssssssssss${TTTTTT}efhsekfh ej
348653jlk3jß1094utß43t59ßgöelfl,-s-fko
The matching expressions are similar to this: ${}. Beware that I need the full expression, not only the word in between this expression! So in this case the result should be an array which contains:
${bbbbbbb}
${dddd}
${TTTTTT}
Problems I have stumbled upon and couldn't solve:
It should NOT recognize this as a whole
${bbbbbbb}ccccccc${dddd} but each for its own
grep -o is not installed on the old machine, Perl is not allowed either!
Many commands e.g. BASH_REMATCH only deliver the whole line or the first occurrence of the expression, instead of all matching expressions in the line!
The mentioned pattern \${[^}]*} seems to work partly, as it can extract the first occurrence of the expression; however, it always omits the ones following after that if they are in the same text line. What I need is ALL matching expressions found in the line, not only the first one.
You could split the string on any of the characters $,{,}:
$ s='...blaaaaa${blabla}bloooo${bla}bluuuuu...'
$ echo "$s"
...blaaaaa${blabla}bloooo${bla}bluuuuu...
$ IFS='${}' read -ra words <<< "$s"
$ for ((i=0; i<${#words[@]}; i++)); do printf "%d %s\n" $i "${words[i]}"; done
0 ...blaaaaa
1
2 blabla
3 bloooo
4
5 bla
6 bluuuuu...
So if you're trying to extract the words inside the braces:
$ for ((i=2; i<${#words[@]}; i+=3)); do printf "%d %s\n" $i "${words[i]}"; done
2 blabla
5 bla
If the above doesn't suit you, grep will work:
$ echo '...blaaaaa${blabla}bloooo${bla}bluuuuu...' | grep -o '\${[^}]\+}'
${blabla}
${bla}
You still haven't told us exactly what output you want.
Since it bugged me a lot I have asked directly on www.unix.com and was kindly provided with a solution which fits for my ancient shell. So if anyone got the same problem here is the solution:
line='aaaa$aa{yyy}aaa${important}xxxxxxxx${important2}oo{o$}oo$oo${importantstring3}'
IFS=\$ read -a words <<< "$line"
regex='^(\{[^}]+})'
for e in "${words[@]}"; do
if [[ $e =~ $regex ]]; then
echo "\$${BASH_REMATCH[0]}";
fi;
done
which prints then the following - without even getting disturbed by random occurrences of $ and { or } between the syntactically correct expressions:
${important}
${important2}
${importantstring3}
I have updated the full solution after I got another update from the forums: now it also ignores this: aaa$aa{yyy}aaaa - which it previously printed as ${yyy} - but which it should completely ignore as there are characters between $ and {. Now with the additional anchoring on the beginning of the regexp it works as expected.
I just found another issue: theoretically using the above approach I would still get a wrong output if the read line looks like this line='{ccc}aaaa${important}aaa'. The IFS would split it and the REGEX would match {ccc} although this hadn't the $ sign in front. This is suboptimal.
However following approach could solve it: after getting the BASH_REMATCH I would need to do a search in the original line - the one I gave to the IFS - for this exact expression ${ccc} - with the difference, that the $ is included! And only if it finds this exact match, only then, it counts as a valid match; otherwise it should be ignored. Kind of a reverse search method...
Updated - add this reverse search to ignore the trap on the beginning of the line:
pattern="\$${BASH_REMATCH[0]}";
searchresult="";
searchresult=`echo "$line" | grep "$pattern"`;
if [ "$searchresult" != "" ]; then echo "It was found!"; fi;
Negligible issue: If the line looks like this line='{ccc}aaaaaa${ccc}bbbbb' it would recognize the first {ccc} as a valid match (although it isn't) and print it, because the reverse search found the second ${ccc}. Although this is not intended, it's irrelevant for my specific purpose, as it implies that this pattern does in fact exist at least once in the same line.
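For comparison, here is a sketch that avoids both the splitting on $ and the reverse search by walking the line with plain parameter expansion. It needs neither grep -o nor Perl. The function name extract_placeholders is hypothetical, and it is meant to mirror the pattern \${[^}]*} from the question (so an empty ${} would also be reported).

```shell
# Hypothetical helper: print every ${...} found in $1, one per line.
# Uses the global variables rest, open, close and body as scratch space.
extract_placeholders() {
    rest=$1
    open='${'
    close='}'
    while :; do
        # Stop when no further "${" is left.
        case $rest in *"$open"*) ;; *) break ;; esac
        rest=${rest#*"$open"}          # drop everything through the first "${"
        # Stop when the opening "${" is never closed.
        case $rest in *"$close"*) ;; *) break ;; esac
        body=${rest%%"$close"*}        # text up to the first "}"
        printf '%s%s%s\n' "$open" "$body" "$close"
        rest=${rest#*"$close"}         # continue after that "}"
    done
}

extract_placeholders 'aaaaaaaaa${bbbbbbb}ccccccc${dddd}eeeee'
```

Because the search looks for the two characters ${ together, a stray {ccc} at the start of the line or a $aa{yyy} in the middle is never reported. The output can be read into an array with a while read loop.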
I want to include command-parameters-inline comments, e.g.:
sed -i.bak -r \
# comment 1
-e 'sed_commands' \
# comment 2
-e 'sed_commands' \
# comment 3
-e 'sed_commands' \
/path/to/file
The above code doesn't work. Is there a different way for embedding comments in the parameters line?
If you really want comment arguments, can try this:
ls $(
echo '-l' #for the long list
echo '-F' #show file types too
echo '-t' #sort by time
)
This will be equivalent to:
ls -l -F -t
echo is a shell built-in, so it does not execute external commands and is fast enough. But it is crazy anyway.
or
makeargs() { while read line; do echo ${line//#*/}; done }
ls $(makeargs <<EOF
-l # CDEWDWEls
-F #Dwfwef
EOF
)
I'd recommend using longer text blocks for your sed script, i.e.
sed -i.bak '
# comment 1
sed_commands
# comment 2
sed_commands
# comment 3
sed_commands
' /path/to/file
Unfortunately, embedded comments in sed script blocks are not a universally supported feature. The sun4 version would let you put a comment on the first line, but nowhere else. AIX sed either doesn't allow any comments, or uses a different char besides # for comments. Your results may vary.
I hope this helps.
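A small runnable sketch of the same idea, with made-up substitutions in place of sed_commands. It assumes a sed that accepts # comment lines inside the script, which current GNU and BSD versions do.

```shell
result=$(printf 'foo bar\n' | sed '
# comment 1: upper-case the first word
s/foo/FOO/
# comment 2: upper-case the second word
s/bar/BAR/
')
printf '%s\n' "$result"
```

One caveat: if the first two characters of the script are #n, sed treats that line like the -n option, so it is safer not to start a script with a comment that begins with "n".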
You could invoke sed multiple times instead of passing all of the arguments to one process:
sed sed_commands | # comment 1
sed sed_commands | # comment 2
sed sed_commands | # comment 3
sed sed_commands # final comment
It's obviously more wasteful, but you may decide that three extra sed processes are a fair tradeoff for readability and portability (to @shellter's point about support for comments within sed commands). Depends on your situation.
UPDATE: you'll also have to adjust if you originally intended to edit the files in place, as your -i argument implies. This approach would require a pipeline.
There isn't a way to do what you seek to do in shell plus sed. I put the comments before the sed script, like this:
# This is a remarkably straight-forward SED script
# -- When it encounters an end of here-document followed by
# the start of the next here document, it deletes both lines.
# This cuts down vastly on the number of processes which are run.
# -- It also does a substitution for XXXX, because the script which
# put the XXXX in place was quite hard enough without having to
# worry about whether things were escaped enough times or not.
cat >$tmp.3 <<EOF
/^!\$/N
/^!\\ncat <<'!'\$/d
s%version XXXX%version $SOURCEDIR/%
EOF
# This is another entertaining SED script.
# It takes the output from the shell script generated by running the
# first script through the second script and into the shell, and
# converts it back into an NMD file.
# -- It initialises the hold space with --#, which is a marker.
# -- For lines which start with the marker, it adds the pattern space
# to the hold space and exchanges the hold and pattern space. It
# then replaces a version number followed by a newline, the marker
# and a version number by the just the new version number, but
# replaces a version number followed by a newline and just the
# marker by just the version number. This replaces the old version
# number with the new one (when there is a new version number).
# The line is printed and deleted.
# -- Note that this code allows for an optional single word after the
# version number. At the moment, the only valid value is 'binary' which
# indicates that the file should not be version stamped by mknmd.
# -- On any line which does not start with the marker, the line is
# copied into the hold space, and if the original hold space
# started with the marker, the line is deleted. Otherwise, of
# course, it is printed.
cat >$tmp.2 <<'EOF'
1{
x
s/^/--#/
x
}
/^--# /{
H
x
s/\([ ]\)[0-9.][0-9.]*\n--# \([0-9.]\)/\1\2/
s/\([ ]\)[0-9.][0-9.]*\([ ][ ]*[^ ]*\)\n--# \([0-9.][0-9.]*\)/\1\3\2/
s/\([ ][0-9.][0-9.]*\)\n--# $/\1/
s/\([ ][0-9.][0-9.]*[ ][ ]*[^ ]*\)\n--# $/\1/
p
d
}
/^--#/!{
x
/^--#/d
}
EOF
There's another sed script in the file that is about 40 lines long (marked as 'entertaining'), though about half those lines are simply embedded shell script added to the output. I haven't changed the shell script containing this stuff in 13 years because (a) it works and (b) the sed scripts scare me witless. (The NMD format contains a file name and a version number separated by space and occasionally a tag word 'binary' instead of a version number, plus comment lines and blank lines.)
You don't have to understand what the script does - but commenting before the script is the best way I've found for documenting sed scripts.
No.
If you put the \ before the # it will escape the comment character and you won't have a comment anymore.
If you put the \ after the # it will be part of the comment and you won't escape the newline anymore.
A lack of inline comments is a limitation of bash that you would do better to adapt to than try and work around with some of the baroque suggestions already put forth.
Although the thread is quite old, I found it while searching for the same question, and so will others. Here's my solution to this problem:
You need comments, so that if you look at your code at a much later time you will likely get an idea of what you actually did, when you wrote the code. I am just having the same problem while writing my first rsync script, which has lots of parameters which also have side effects.
Group the parameters that belong together by topic and put them into a variable with a corresponding name. This makes it easy to identify what the parameters control. This is your short comment. In addition, you can put a comment above the variable declaration describing how to change the behavior. This is the long-version comment.
Call the application with the corresponding parameter variables.
## Options
# Remove --whole-file for delta transfer
sync_filesystem="--one-file-system \
--recursive \
--relative \
--whole-file"
rsync \
${sync_filesystem} \
${way_more_to_come} \
"${SOURCE}" \
"${DESTIN}"
This gives a good overview, is easy to edit, and works like comments in the parameters. It takes more effort, but the result is of higher quality.
I'll suggest another way that works at least in some instances:
Let's say I have the command:
foo --option1 --option2=blah --option3 option3val /tmp/bar
I can write it this way:
options=(
--option1
--option2=blah
--option3 option3val
)
foo "${options[@]}" /tmp/bar
Now let's say I want to temporarily remove the second option. I can just comment it out:
options=(
--option1
# --option2=blah
--option3 option3val
)
Note that this technique may not work when you need extensive escaping or quoting. I have run into some issues with that in the past, but unfortunately I don't recall the details at the moment :(
For most cases, though, this technique works well. If you need embedded blanks in a parameter, just enclose the string in quotes, as normal.
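A runnable sketch of the array technique with a comment on every option, which is what the question asked for. It needs bash (arrays are not available in plain sh), and show_args is a hypothetical stand-in for the real command.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real command: prints each argument in <>.
show_args() { printf '<%s>' "$@"; printf '\n'; }

options=(
    --option1                  # comment 1: what this option is for
    # --option2=blah           # temporarily disabled
    --option3 "option 3 val"   # value with embedded blanks, quoted as normal
)

# Quoting the expansion keeps each array element as exactly one argument.
out=$(show_args "${options[@]}" /tmp/bar)
printf '%s\n' "$out"
```

Leaving the expansion unquoted would split "option 3 val" into three separate arguments, which may be the kind of quoting trouble mentioned above.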