What would be the sed command for Mac shell scripting that replaces all occurrences of the string "fox" with the entire contents of myFile.txt?
myFile.txt contains HTML with line breaks and all kinds of characters. An example would be:
</div>
</div>
<br>
<div id="container2">
<div class="question" onclick="javascript:show('answer2')";>
Thanks!
EDIT 1
This is my actual code:
sed -i.bkp '/Q/{
s/Q//g
r /Users/ericbrotto/Desktop/question.txt
}' $file
When I run it I get:
sed in place editing only works for regular files.
And in my files the Q is replaced by a ton of Chinese characters (!). Bizarre!
You can use the r command. When you find a 'fox' in the input...
/fox/{
...replace it with nothing...
s/fox//g
...and read in the file to insert:
r f.html
}
If you have a file such as:
$ cat file.txt
the
quick
brown
fox
jumps
over
the lazy dog
fox dog
the result is:
$ sed '/fox/{
s/fox//g
r f.html
}' file.txt
the
quick
brown
</div>
</div>
<br>
<div id="container2">
<div class="question" onclick="javascript:show('answer2')";>
jumps
over
the lazy dog
dog
</div>
</div>
<br>
<div id="container2">
<div class="question" onclick="javascript:show('answer2')";>
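Note that r appends the file after the matching line, so the rest of a line such as "fox dog" stays in front of the inserted HTML. If the contents have to land exactly where "fox" was, even mid-line, one option is awk instead of sed. This is only a sketch: the temporary file names and sample contents are made up for the demo, and it assumes the replacement file is not empty (otherwise the NR == FNR test misfires).

```shell
# Demo data in a scratch directory (hypothetical names and contents).
workdir=$(mktemp -d)
printf 'the quick fox jumps\nno match here\n' > "$workdir/file.txt"
printf '<div id="a&b">\n</div>\n' > "$workdir/f.html"

result=$(awk -v needle="fox" '
    # First file: slurp the replacement text, keeping its line breaks.
    NR == FNR { repl = repl (NR > 1 ? "\n" : "") $0; next }
    # Second file: splice repl in wherever needle occurs. index() does a
    # literal search, so regex characters and & are never interpreted.
    {
        out = ""
        while ((pos = index($0, needle)) > 0) {
            out = out substr($0, 1, pos - 1) repl
            $0 = substr($0, pos + length(needle))
        }
        print out $0
    }
' "$workdir/f.html" "$workdir/file.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Because the search is literal and the replacement is plain concatenation, quotes, slashes and ampersands in the HTML need no escaping.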
EDIT: to alter the file being processed, just pass the -i flag to sed:
sed -i '/fox/{
s/fox//g
r f.html
}' file.txt
Some sed versions (such as mine) require an extension argument for the -i flag; it is used as the suffix of a backup file that keeps the old content of the file:
sed -i.bkp '/fox/{
s/fox//g
r f.html
}' file.txt
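The "in place editing only works for regular files" error from the question typically shows up when the loop variable points at a directory or some other non-regular file. A sketch of a loop that skips those and uses the -i.bkp spelling, which both GNU and BSD sed accept, might look like this. The directory layout and file names below are assumptions made up for the demo.

```shell
# Scratch layout: pages/ holds one file to edit plus a subdirectory.
workdir=$(mktemp -d)
mkdir -p "$workdir/pages/subdir"
printf 'Q\nkeep\n' > "$workdir/pages/page.html"
printf '<p>question</p>\n' > "$workdir/question.txt"

for file in "$workdir"/pages/*; do
    # sed -i refuses directories and other non-regular files, so skip them.
    [ -f "$file" ] || continue
    # Queue question.txt after each line containing Q, then strip the Q.
    sed -i.bkp -e "/Q/r $workdir/question.txt" -e 's/Q//g' "$file"
done

edited=$(cat "$workdir/pages/page.html")
backup=$(cat "$workdir/pages/page.html.bkp")
printf '%s\n' "$edited"
rm -rf "$workdir"
```

The line that held only Q is left behind as an empty line, followed by the inserted file; swap s/Q//g for d if the whole line should go.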
And here is much the same thing as a one-liner, which is also compatible with a Makefile. Note that d deletes the whole matching line rather than just the word:
sed -i -e '/fox/{r f.html' -e 'd}' file.txt
Ultimately what I went with which is a lot simpler than a lot of solutions I found online:
str=xxxx
sed -e "/$str/r FileB" -e "/$str/d" FileA
Supports templating like so:
str=xxxx
sed -e "/$str/r $fileToInsert" -e "/$str/d" $fileToModify
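One caveat with this pattern: $str ends up inside a sed address, so it is read as a regular expression, and a / in it ends the address early. A hedged sketch of escaping the marker first is below. The marker text and file names are invented for the demo, and the escaping only covers basic-regex metacharacters plus the / delimiter.

```shell
workdir=$(mktemp -d)
printf 'top\n{{body/v1.0}}\nbottom\n' > "$workdir/template.txt"
printf 'line A\nline B\n' > "$workdir/insert.txt"

str='{{body/v1.0}}'
# Backslash-escape ] [ \ . * ^ $ and / so the marker matches literally.
escaped=$(printf '%s\n' "$str" | sed 's|[][\.*^$/]|\\&|g')

result=$(sed -e "/$escaped/r $workdir/insert.txt" -e "/$escaped/d" "$workdir/template.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The file names passed to r are still taken verbatim up to the end of that -e argument, so they must not contain newlines.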
Another method (minor variation to other solutions):
If your filenames are also variables (e.g. $file is f.html and the file you are updating is $targetFile):
sed -e "/fox/ {" -e "r $file" -e "d" -e "}" -i "$targetFile"
Related
If I run this code in bash:
echo dog dog dos | sed -r 's:dog:log:'
it gives output:
log dog dos
How can I make it replace all occurrences of dog?
You should add the g modifier so that sed performs a global substitution of the contents of the pattern buffer:
echo dog dog dos | sed -e 's:dog:log:g'
For fantastic documentation on sed, see http://www.grymoire.com/Unix/Sed.html. The global flag is explained here: http://www.grymoire.com/Unix/Sed.html#uh-6
The official documentation for GNU sed is available at http://www.gnu.org/software/sed/manual/
You have to put a g at the end, it stands for "global":
echo dog dog dos | sed -r 's:dog:log:g'
                                     ^
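To see the flags side by side, here is a small sketch. The numeric flag is the standard way to target one specific occurrence; the sample input is made up.

```shell
first=$(echo 'dog dog dog' | sed 's:dog:log:')     # no flag: first match only
all=$(echo 'dog dog dog' | sed 's:dog:log:g')      # g: every match on the line
second=$(echo 'dog dog dog' | sed 's:dog:log:2')   # 2: only the second match

printf '%s\n' "$first" "$all" "$second"
```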
I'm using hxselect to process a HTML file in bash.
In this file there are multiple divs defined with the '.row' class.
In bash I want to extract these 'rows' into an array. (The divs are multilined so simply reading it line-by-line is not suitable.)
Is it possible to achieve this? (With basic tools, awk, grep, etc.)
After assigning rows to an array, I want to further process it:
for row in "${ROWS_EXTRACTED[@]}"; do
PROCESS1 "$row"
PROCESS2 "$row"
done
Thank you!
One possibility would be to put the content of the tags in an array with each item enclosed in quotes. For example:
# Create array with " " as separator
array=`cat file.html | hxselect -i -c -s '" "' 'div.row'`
# Add " to the beginning of the string and remove the last
array='"'${array%'"'}
Then, processing in a for loop
for index in ${!array[*]}; do printf " %s\n\n" "${array[$index]}"; done
If the tags contain the quote character, another solution would be to use a separator character not found in the tags' content (§ in my example):
array=`cat file.html | hxselect -i -c -s '§' 'div.row'`
Then process it with awk:
# Keep only the separators to count them with ${#res}
res="${array//[^§]}"
for (( i=1; i<=${#res}; i++ ))
do
echo "$array" | awk -v i="$i" -F § '{print $i}'
echo "----------------------------------------"
done
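The separator idea can also be done without awk by peeling one chunk at a time off the string with parameter expansion, which keeps multi-line rows intact. This is a sketch only: the raw string stands in for real hxselect output, and process_row is a hypothetical handler. In bash you could append each chunk to an array with rows+=("$row") instead of calling the handler.

```shell
# Stand-in for the output of: hxselect -c -s '§' 'div.row' < file.html
raw=$(printf 'first\nrow§second "quoted" row§third§')

count=0
process_row() {            # hypothetical per-row handler
    count=$((count + 1))
    printf 'row %d: %s\n' "$count" "$1"
}

while [ -n "$raw" ]; do
    case $raw in
        *§*) row=${raw%%§*}; raw=${raw#*§} ;;   # text before the first separator
        *)   row=$raw; raw= ;;                   # trailing text with no separator
    esac
    process_row "$row"
done
```

The first row keeps its embedded newline, and double quotes inside a row need no special treatment.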
The following instructs hxselect to separate matches with a tab, deletes all newlines, and then translates the tab separators to newlines. This enables you to iterate over the divs as lines with read:
#!/bin/bash
divs=$(hxselect -s '\t' .row < "$1" | tr -d '\n' | tr '\t' '\n')
while read -r div; do
echo "$div"
done <<< "$divs"
Given the following test input:
<div class="container">
<div class="row">
herp
derp
</div>
<div class="row">
derp
herp
</div>
</div>
Result:
$ ./test.sh test.html
<div class="row"> herp derp </div>
<div class="row"> derp herp </div>
I'm trying to write a basic script to compile HTML file includes.
The premise goes like this:
I have 3 files
test.html
<div>
#include include1.html
<div>content</div>
#include include2.html
</div>
include1.html
<span>
banana
</span>
include2.html
<span>
apple
</span>
My desired output would be:
output.html
<div>
<span>
banana
</span>
<div>content</div>
<span>
apple
</span>
</div>
I've tried the following:
sed "s|#include \(.*)|$(cat \1)|" test.html >output.html
This returns cat: 1: No such file or directory
sed "s|#include \(.*)|cat \1|" test.html >output.html
This runs but gives:
output.html
<div>
cat include1.html
<div>content</div>
cat include2.html
</div>
Any ideas on how to run cat inside sed using group substitution? Or perhaps another solution.
I wrote this 15-20 years ago to recursively include files and it's included in the article I wrote about how/when to use getline under "Applications" then "d)". I tweaked it now to work with your specific "#include" directive, provide indenting to match the "#include" indentation, and added a safeguard against infinite recursion (e.g. file A includes file B and file B includes file A):
$ cat tst.awk
function read(file,indent) {
if ( isOpen[file]++ ) {
print "Infinite recursion detected" | "cat>&2"
exit 1
}
while ( (getline < file) > 0) {
if ($1 == "#include") {
match($0,/^[[:space:]]+/)
read($2,indent substr($0,1,RLENGTH))
} else {
print indent $0
}
}
close(file)
delete isOpen[file]
}
BEGIN{
read(ARGV[1],"")
exit
}
.
$ awk -f tst.awk test.html
<div>
<span>
banana
</span>
<div>content</div>
<span>
apple
</span>
</div>
Note that if include1.html itself contained a #include ... directive then it'd be honored too, and so on. Look:
$ for i in test.html include?.html; do printf -- '-----\n%s\n' "$i"; cat "$i"; done
-----
test.html
<div>
#include include1.html
<div>content</div>
#include include2.html
</div>
-----
include1.html
<span>
#include include3.html
</span>
-----
include2.html
<span>
apple
</span>
-----
include3.html
<div>
#include include4.html
</div>
-----
include4.html
<span>
grape
</span>
.
$ awk -f tst.awk test.html
<div>
<span>
<div>
<span>
grape
</span>
</div>
</span>
<div>content</div>
<span>
apple
</span>
</div>
With a non-GNU awk I'd expect it to fail after about 20 levels of recursion with a "too many open files" error, so get gawk if you need to go deeper than that; otherwise you'd have to write your own file management code.
If you have GNU sed, you can use the e flag to the s command, which executes the current pattern space as a shell command and replaces it with the output:
$ sed 's/#include/cat/e' test.html
<div>
<span>
banana
</span>
<div>content</div>
<span>
apple
</span>
</div>
Notice that this doesn't take care of indentation, as the included files don't have any. An HTML prettifier like Tidy can help you further for this:
$ sed 's/#include/cat/e' test.html | tidy -iq --show-body-only yes
<div>
<span>banana</span>
<div>
content
</div><span>apple</span>
</div>
GNU sed has a command to read a file, r, but the filename can't be generated on the fly.
As Ed points out in his comment, this is vulnerable to shell command injection: if you have something like
#include $(date)
you'll notice that the date command was actually run. This can be prevented, but the conciseness of the original solution is out the window then:
sed 's|#include \(.*\)|cat "$(/usr/bin/printf "%q" '\''\1'\'')"|e' test.html
This still replaces #include with cat, but additionally wraps the rest of the line into a command substitution with printf "%q", so a line such as
#include include1.html
becomes
cat "$(/usr/bin/printf "%q" 'include1.html')"
before being executed as a command. This expands to
cat include1.html
but if the file were named $(date), it becomes
cat '$(date)'
(note the single quotes), preventing the injected command from being executed.
Because s///e seems to use /bin/sh as its shell, you can't rely on Bash's %q format specification in printf to exist, hence the absolute path to the printf binary. For readability, I've changed the / delimiters of the s command to | (so I don't have to escape \/usr\/bin\/printf).
Lastly, the quoting mess around \1 is to get a single quote into a single quoted string: '\'' becomes '.
You may use this bash script, which uses a regex to detect a line containing #include and grabs the include filename with a capture group:
re="#include +([^[:space:]]+)"
while IFS= read -r line; do
[[ $line =~ $re ]] && cat "${BASH_REMATCH[1]}" || echo "$line"
done < test.html
<div>
<span>
banana
</span>
<div>content</div>
<span>
apple
</span>
</div>
Alternatively you may use this awk script to do the same:
awk '$1 == "#include"{system("cat " $2); next} 1' test.html
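Since system("cat " $2) hands the file name to a shell, an odd file name could be interpreted as a command. A variation that reads the file inside awk with getline avoids the shell entirely. This is a sketch with made-up demo files; it is not recursive and, like the one-liner, it assumes the file name has no spaces.

```shell
workdir=$(mktemp -d)
printf '<div>\n#include include1.html\n<div>content</div>\n</div>\n' > "$workdir/test.html"
printf '<span>\nbanana\n</span>\n' > "$workdir/include1.html"

result=$(cd "$workdir" && awk '
    $1 == "#include" {
        file = $2
        # Copy the included file line by line; no shell is involved.
        while ((getline line < file) > 0) print line
        close(file)
        next
    }
    { print }
' test.html)

printf '%s\n' "$result"
rm -rf "$workdir"
```

A missing include file makes getline return -1, so the directive is silently dropped; add an explicit check if that should be an error.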
There are many examples here and elsewhere on the interwebs for using sed's 'r' to replace a pattern, but it does not seem to work on a range, but maybe I'm just not holding it right.
The following works as expected, deleting BEGIN PATTERN and replacing it with the contents of /tmp/somefile.
sed -n "/BEGIN PATTERN/{ r /tmp/somefile d }" TARGET_FILE
This, however, only replaces END PATTERN with the contents of /tmp/somefile.
sed -n "/BEGIN PATTERN/,/END PATTERN/ { r /tmp/somefile d }" TARGET_FILE
I suppose I could try perl or awk to do this as well, but it seems like sed should be able to do this.
I believe that this does what you want:
sed $'/BEGIN PATTERN/r somefile\n /BEGIN PATTERN/,/END PATTERN/d' file
Or:
sed -e '/BEGIN PATTERN/r somefile' -e '/BEGIN PATTERN/,/END PATTERN/d' file
How it works
/BEGIN PATTERN/r somefile
Whenever BEGIN PATTERN is found, this inserts the contents of somefile.
/BEGIN PATTERN/,/END PATTERN/d
Whenever we are in the range from a line with /BEGIN PATTERN/ to a line with /END PATTERN/, we delete (d) the contents of the pattern space.
Example
Let's consider these two test files:
$ cat file
prelude
BEGIN PATTERN
middle
END PATTERN
afterthought
and:
$ cat somefile
This is
New.
Our command produces:
$ sed $'/BEGIN PATTERN/r somefile\n /BEGIN PATTERN/,/END PATTERN/d' file
prelude
This is
New.
afterthought
This might work for you (GNU sed):
sed -e '/BEGIN PATTERN/,/END PATTERN/{/END PATTERN/!d;r somefile' -e 'd}' file
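Spelled out: inside the range, every line except the END PATTERN line is deleted straight away; on the END PATTERN line the file is queued with r and then that line is deleted too, so the file is emitted exactly once. The sketch below splits the same commands across separate -e options, which should be easier on non-GNU seds, and uses throwaway demo files.

```shell
workdir=$(mktemp -d)
printf 'prelude\nBEGIN PATTERN\nmiddle\nEND PATTERN\nafterthought\n' > "$workdir/file"
printf 'This is\nNew.\n' > "$workdir/somefile"

result=$(sed -e '/BEGIN PATTERN/,/END PATTERN/{' \
             -e '/END PATTERN/!d' \
             -e "r $workdir/somefile" \
             -e 'd' \
             -e '}' "$workdir/file")

printf '%s\n' "$result"
rm -rf "$workdir"
```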
John1024's answer works if BEGIN PATTERN and END PATTERN are different. If this is not the case, the following works:
sed $'/PATTERN/,/PATTERN/d; 1,/PATTERN/ { /PATTERN/r somefile\n }' file
By preserving the pattern:
sed $'/PATTERN/,/PATTERN/ { /PATTERN/!d; }; 1,/PATTERN/ { /PATTERN/r somefile\n }' file
This solution can yield false positives if the pattern is not paired as potong pointed out.
I would like to replace a digit between two HTML tags, but my sed command does not work:
string to replace - <p key=SaveFile>0</p>
new string - <p key=SaveFile>1</p>
Code:
sed -i 's/\<p key\=SaveFile\>0\<\/p\>/<p key=SaveFile>1<\/p>/' newfile.xml
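In GNU sed, \< and \> match word boundaries rather than literal angle brackets, so < and > should not be escaped. It's also easier if you use another delimiter for s, like | or #, so that the / in </p> needs no escaping: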
echo "<p key=SaveFile>0</p>" | sed 's|<p key=SaveFile>0</p>|<p key=SaveFile>1</p>|'
If you want to replace any number between the two tags simply use [0-9]\+ or [0-9]+ (with option -r):
echo "<p key=SaveFile>1234</p>" | sed 's|<p key=SaveFile>[0-9]\+</p>|<p key=SaveFile>1</p>|'
Output:
<p key=SaveFile>1</p>
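The \+ quantifier in a basic regex is a GNU extension, so a sed without it (BSD sed on macOS may be one) needs a different spelling. Two alternatives, as a sketch with a made-up input string: [0-9][0-9]* in a basic regex, or + with -E, which both GNU and BSD sed accept for extended regexes.

```shell
input='<p key=SaveFile>1234</p>'

# Basic regex: "one digit, then zero or more digits" instead of \+
portable=$(printf '%s\n' "$input" | sed 's|<p key=SaveFile>[0-9][0-9]*</p>|<p key=SaveFile>1</p>|')

# Extended regex: -E enables the unescaped + quantifier
extended=$(printf '%s\n' "$input" | sed -E 's|<p key=SaveFile>[0-9]+</p>|<p key=SaveFile>1</p>|')

printf '%s\n' "$portable" "$extended"
```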
Applied to your file:
sed -i 's|<p key=SaveFile>0</p>|<p key=SaveFile>1</p>|' newfile.xml
Or with g:
sed -i 's|<p key=SaveFile>0</p>|<p key=SaveFile>1</p>|g' newfile.xml