Batch: how to filter this HTML text to set a few variables (Windows)

I have a big problem.
I use wget to send a POST request to a website, and I receive an HTML page back.
I need to filter this sample of the HTML:
more code up...
<div id="song_html" class="show1">
<div class="left">
<!-- info mp3 here -->
256 kbps<br />3:21<br />6.13 mb </div>
<div id="right_song">
<div style="font-size:15px;"><b>Marilyn Manson - Tainted Love ( Manson Remix) mp3</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;">Download</div>
<div style="margin-left:8px; float:left; width:27px; text-align:center;">Play</div>
<div style="margin-left:8px; float:left;">Embed</div>
<div style="margin-left:8px; float:left;">Descarga Tono</div>
<div style="clear:both;"></div>
</div>
<div id="player37119" style="float:left; margin-left:10px;" class="player"></div>
</div>
<div style="clear:both;"></div>
</div>
<div style="clear:both;"></div>
</div>
<div id="song_html" class="show2">
<div class="left">
<!-- info mp3 here -->
</div>
<div id="right_song">
<div style="font-size:15px;"><b>Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;">Download</div>
<div style="margin-left:8px; float:left; width:27px; text-align:center;">Play</div>
<div style="margin-left:8px; float:left;">Embed</div>
<div style="margin-left:8px; float:left;">Descarga Tono</div>
<div style="clear:both;"></div>
</div>
<div id="player668416" style="float:left; margin-left:10px;" class="player"></div>
</div>
<div style="clear:both;"></div>
</div>
<div style="clear:both;"></div>
</div>
<div id="morelink" style="margin:10px; text-align:center;">Show More Results</div>
<div id="song_html" class="show3">
<div class="left">
<!-- info mp3 here -->
3:10<br /> </div>
<div id="right_song">
<div style="font-size:15px;"><b>Marilyn Manson - MARILYN MANSON - Rock is Dead mp3</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;">Download</div>
<div style="margin-left:8px; float:left; width:27px; text-align:center;">Play</div>
<div style="margin-left:8px; float:left;">Embed</div>
<div style="margin-left:8px; float:left;">Descarga Tono</div>
<div style="clear:both;"></div>
</div>
<div id="player670124" style="float:left; margin-left:10px;" class="player"></div>
</div>
<div style="clear:both;"></div>
</div>
<div style="clear:both;"></div>
</div>
</div>
</div>
<!-- ================= -->
more code down...
...to set a few variables such as "Name", "Bitrate", "Size" and "Download", and to print all this information in Batch, like this:
1st result:
[Name] Marilyn Manson - Tainted Love ( Manson Remix) mp3
[Info] Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3
2nd result:
[Name] Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3
[Info] NO INFO.
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3
3rd result:
[Name] Marilyn Manson - MARILYN MANSON - Rock is Dead mp3
[Info] Length: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3
I've tried "Findstr", "Find", "SED", "GREP" and "FART", but I can't find the right line and character delimiters to do it...
The only thing I can see that makes it possible is this line:
<!-- ================= -->
I can use it as an end delimiter, because that line marks the end of the MP3s to download and of the info to print...
Can somebody help me?
Thank you.

The Batch file below relies on the fact that the data you want is located a fixed number of lines below the "info mp3 here" line. The data is also extracted based on its position within the line. If some of the data does not follow this rule, the program will need to be modified.
@echo off
setlocal EnableDelayedExpansion
findstr /N /C:"info mp3 here" %1 > "%~N1.tmp"
set lastLine=-1
(for /F "usebackq delims=:" %%a in ("%~N1.tmp") do (
set /A skip=%%a-lastLine
for /L %%i in (1,1,!skip!) do set /P info=
set /P =& set /P name=
for /L %%i in (1,1,4) do set /P download=
set "name=!name:*<b>=!"
for /F "delims=<" %%n in ("!name!") do echo [Name] %%n
set "info=!info:<br />= !"
set "info=!info:</div>=!"
set bitrate=
set lenght=
set size=
set value=
for %%t in (!info!) do (
if not defined value (
set value=%%t
) else (
if %%t equ kbps (
set "bitrate=Bitrate: !value! kbps. "
set value=
) else if %%t equ mb (
set "size=Size: !value! mb."
set value=
) else (
set "lenght=Lenght: !value!. "
set value=%%t
)
)
)
if defined value (
set "lenght=Lenght: !value!. "
)
set info=!bitrate!!lenght!!size!
if not defined info set info=NO INFO.
echo [Info] !info!
set "download=!download:"=$!"
for /F "tokens=4 delims=$" %%d in ("!download!") do echo [Download] %%d
set /A lastline=%%a+6
)) < %1
del "%~N1.tmp"
Output:
[Name] Marilyn Manson - Tainted Love ( Manson Remix) mp3
[Info] Bitrate: 256 kbps. Lenght: 3:21. Size: 6.13 mb.
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3
[Name] Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3
[Info] NO INFO.
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3
[Name] Marilyn Manson - MARILYN MANSON - Rock is Dead mp3
[Info] Lenght: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3

Here's a script that will parse the information you want.
The script takes the name of your HTML file as an argument.
Output is sent to a file whose name is derived by appending '.parsed' to the input file name.
Comments at the top of the script give a little explanation of the patterns being used to locate the requested information in the HTML file.
Replace the two instances of 'TAB' with tab characters and be sure that you retain the single space before each tab.
#!/bin/bash
# Parse HTML with sed, suppressing all unwanted lines
# "Info" lines all start with a number (ignoring whitespace)
# Bitrate and file size can be identified by looking for
# the unit (kbps, mb) immediately following the numeric data
# Length is identified by the colon in the middle of numeric data
# File names are delimited by <b> and </b>
# Lines with the URL all contain Download</a>
# The </a> isn't necessary, but I thought it would be safer to
# include it since one could imagine "Download" appearing in a file name
# Pipe output to Awk for reordering of the parsed lines
# and addition of "NO INFO" lines where necessary
sed -n '
/^[ TAB]*[0-9]/ {
s/^[ TAB]*/[Info] /
s/\([0-9]*:[0-9]*\)[^0-9]*/Length: \1. /
s/\([0-9\.]* .bps\)[^0-9L]*/Bitrate: \1. /
s/\([0-9\.]* .b\)[^p][^0-9LB]*/Size: \1. /
p
}
/<b>/ {
s|</b>.*||
s|.*<b>\(.*\)|[Name] \1|
p
}
\|Download</a>| {
s/^.*\(http:[^"]*\).*/[Download] \1/
p
}' "$1" | awk 'BEGIN { no_info = "[Info] NO INFO.";
info = no_info }
{ if ($1 == "[Name]") name = $0;
else if ($1 == "[Info]") info = $0;
else {
printf("%s\n%s\n%s\n\n", name, info, $0);
info = no_info
} }' > "$1.parsed"
exit 0
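The regrouping done by the awk stage can be tried on its own. The following is only a sketch: the helper name group_records, the song names and the example.com URLs are made up for illustration, and the fallback line uses the same [Info] tag as real info lines.

```shell
#!/bin/sh
# Hypothetical helper mirroring the awk stage: remember the latest [Name]
# and [Info] lines, emit one record at each [Download] line, and fall back
# to "NO INFO." when no [Info] line was seen for that record.
group_records() {
    awk 'BEGIN { no_info = "[Info] NO INFO."; info = no_info }
         $1 == "[Name]" { name = $0; next }
         $1 == "[Info]" { info = $0; next }
         { printf("%s\n%s\n%s\n\n", name, info, $0); info = no_info }'
}

# Lines in the order the sed stage emits them (info, name, download).
# The titles and URLs are placeholders, not data from the question.
records=$(printf '%s\n' \
    '[Info] Length: 3:21.' \
    '[Name] Song A mp3' \
    '[Download] http://example.com/a.mp3' \
    '[Name] Song B mp3' \
    '[Download] http://example.com/b.mp3' | group_records)
printf '%s\n' "$records"
```

The second record has no [Info] line in the input, so it is printed with the fallback line.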

TXR 65 (Runs on Windows; MinGW-compiled .exe available)
@(collect)
<div id="song_html" class="show@nil">
<div class="left">
<!-- info mp3 here -->
@(gather :vars ((bitrate nil) (length nil) (size nil)))
@bitrate kbps@(skip)
@(skip)@{length /\d+:\d\d/}@(skip)
@(skip)@{size /\d+\.\d\d/} mb@(skip)
@(until)
<div id="right_song">
@(end)
@(bind info @(if (or bitrate length size)
(let ((s (make-string-output-stream)))
(if bitrate
(format s "Bitrate: ~a kbps. " bitrate))
(if length
(format s "Length: ~a. " length))
(if size
(format s "Size: ~a mb. " size))
(get-string-from-stream s))
"NO INFO."))
<div id="right_song">
<div style="font-size:15px;"><b>@title</b></div>
<div style="clear:both;"></div>
<div style="float:left;">
<div style="float:left; height:27px; font-size:13px; padding-top:2px;">
<div style="float:left;">Download</div>
@(until)
<!-- ================= -->
@(end)
@(output)
@ (repeat)
[Name] @title
[Info] @info
[Download] @link
@ (end)
@(end)
Run:
$ txr data.txr data.html
[Name] Marilyn Manson - Tainted Love ( Manson Remix) mp3
[Info] Bitrate: 256 kbps. Length: 3:21. Size: 6.13 mb.
[Download] http://rockass.free.fr/video/Marilyn Manson - Taited Love.mp3
[Name] Spaz Marilyn Manson Metric - grow up and blow the great big dj://spaz, marilyn manson mp3
[Info] NO INFO.
[Download] http://spaz.mindstab.net/djspaz_-_grow_up_and_blow_the_great_big_white_nietzche.mp3
[Name] Marilyn Manson - MARILYN MANSON - Rock is Dead mp3
[Info] Length: 3:10.
[Download] http://www.bricbrac.free.fr/Music/01___MARILYN_MANSON___ROCK_.MP3

Related

Windows "Symbol" or WGL4 font code point to UNICODE map?

We use an OCR-like PDF to Word converter, which is the best we could find, but it uses the Symbol font table, where, for example, the degree symbol appears as the code point U+F0B0. That is not a standard UNICODE character (it lies in the Private Use Area), but it has a mapping to the proper UNICODE degree code point U+00B0. In fact, all but one of the Symbol font glyphs have a proper UNICODE character, but I am pulling my hair out because I cannot find any table that shows a simple mapping.
This page http://www.alanwood.net/demos/symbol.html almost has it, but it doesn't actually show the Symbol font code points, but relies on some other mapping which, frankly, I don't understand at all. That same site has related pages but nowhere do I find F0B0 referenced for degree.
I found groff mappings of these special fonts to the old groff abbreviations, and that is the best I can get. In symbol.map I can find a mapping of F0B0 to the abbreviation "de", and in text.map a mapping from 00B0 to "de". So if I were to reshape these two files into relational tables and then join on the abbreviation, I suppose I could create a mapping.
But I am stunned that nobody had to do that before? Anyone?
Ah well, I guess I didn't ask for a dissertation on the first principles of all possible symbolic fonts. I asked about the Windows "Symbol" font specifically: the WGL4 code page, or what I suppose is the "Monotype Symbol" font.
So here is what I did to generate the mapping from these groff font abbreviation maps I pointed to in my question:
wget https://opensource.apple.com/source/groff/groff-39/groff/font/devlj4/generate/symbol.map
sed -e '/^#/d' -e '/^ *$/d' -e 's/[\t ][\t ]*/|/g' symbol.map |cut -d\| -f2,3 |sort -t\| -k2 >symbol.map.dat
wget https://opensource.apple.com/source/groff/groff-39/groff/font/devlj4/generate/text.map
sed -e '/^#/d' -e '/^ *$/d' -e 's/[\t ][\t ]*/|/g' text.map |cut -d\| -f2,3 |sort -t\| -k2 >text.map.dat
wget https://opensource.apple.com/source/groff/groff-39/groff/font/devlj4/generate/special.map
sed -e '/^#/d' -e '/^ *$/d' -e 's/[\t ][\t ]*/|/g' special.map |cut -d\| -f2,3 |sort -t\| -k2 >special.map.dat
cat text.map.dat special.map.dat |sort -t\| -k2 > unicode.map.dat
join -t\| -1 2 -2 2 symbol.map.dat unicode.map.dat
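To see what the join step does without downloading the groff files, here is a small sketch on sample data. Only the "de" pair (F0B0 and 00B0) comes from the question; the other rows and the temporary directory are assumptions added for illustration. LC_ALL=C is set so that sort and join agree on the ordering of the abbreviation column.

```shell
#!/bin/sh
# Sample rows in the "code|abbreviation" layout produced by the sed/cut steps.
# Only the "de" rows are taken from the question; the rest are illustrative.
LC_ALL=C; export LC_ALL   # sort and join must use the same collation order

dir=$(mktemp -d)
printf '%s\n' 'F0B0|de' 'F0B1|+-' | sort -t'|' -k2 > "$dir/symbol.map.dat"
printf '%s\n' '00B0|de' '00B1|+-' '0041|A' | sort -t'|' -k2 > "$dir/unicode.map.dat"

# Join on field 2 of both files; each output line is
# abbreviation|symbol code|unicode code. Unpaired rows ("A") are dropped.
mapping=$(join -t'|' -1 2 -2 2 "$dir/symbol.map.dat" "$dir/unicode.map.dat")
printf '%s\n' "$mapping"
rm -rf "$dir"
```

Both inputs have to be sorted on the join field, which is why every .dat file above is passed through sort -k2 first.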
Then I create an XML mapping table from that, which I use in my XSLT:
join -t\| -1 2 -2 2 symbol.map.dat unicode.map.dat |sed -e 's~\([^|]*\)|\([^|]*\)|\([^|]*\)~<a abb="\1" sym="\&#x\2;" uni="\&#x\3;"/>~'
this creates:
<a abb="" sym="" uni="≃"/>
<a abb="!" sym="" uni="!"/>
<a abb="!=" sym="" uni="≠"/>
<a abb="#" sym="" uni="#"/>
<a abb="%" sym="" uni="%"/>
<a abb="&" sym="" uni="&"/>
<a abb="(" sym="" uni="("/>
<a abb=")" sym="" uni=")"/>
<a abb="**" sym="" uni="∗"/>
<a abb="*A" sym="" uni="A"/>
<a abb="*B" sym="" uni="B"/>
<a abb="*C" sym="" uni="Ξ"/>
<a abb="*D" sym="" uni="∆"/>
<a abb="*E" sym="" uni="E"/>
<a abb="*F" sym="" uni="Φ"/>
<a abb="*G" sym="" uni="Γ"/>
<a abb="*H" sym="" uni="Θ"/>
<a abb="*I" sym="" uni="I"/>
<a abb="*K" sym="" uni="K"/>
<a abb="*L" sym="" uni="Λ"/>
<a abb="*M" sym="" uni="M"/>
<a abb="*N" sym="" uni="N"/>
<a abb="*O" sym="" uni="O"/>
<a abb="*P" sym="" uni="Π"/>
<a abb="*Q" sym="" uni="Ψ"/>
<a abb="*R" sym="" uni="P"/>
<a abb="*S" sym="" uni="Σ"/>
<a abb="*T" sym="" uni="T"/>
<a abb="*U" sym="" uni="Υ"/>
<a abb="*W" sym="" uni="Ω"/>
<a abb="*X" sym="" uni="X"/>
<a abb="*Y" sym="" uni="H"/>
<a abb="*Z" sym="" uni="Z"/>
<a abb="*a" sym="" uni="α"/>
<a abb="*b" sym="" uni="β"/>
<a abb="*c" sym="" uni="ξ"/>
<a abb="*d" sym="" uni="δ"/>
<a abb="*e" sym="" uni=""/>
<a abb="*f" sym="" uni="φ"/>
<a abb="*g" sym="" uni="γ"/>
<a abb="*h" sym="" uni="θ"/>
<a abb="*i" sym="" uni="ι"/>
<a abb="*k" sym="" uni="κ"/>
<a abb="*l" sym="" uni="λ"/>
<a abb="*m" sym="" uni="μ"/>
<a abb="*n" sym="" uni="ν"/>
<a abb="*o" sym="" uni="ο"/>
<a abb="*p" sym="" uni="π"/>
<a abb="*q" sym="" uni="ψ"/>
<a abb="*r" sym="" uni="ρ"/>
<a abb="*s" sym="" uni="σ"/>
<a abb="*t" sym="" uni="τ"/>
<a abb="*u" sym="" uni="υ"/>
<a abb="*w" sym="" uni="ω"/>
<a abb="*x" sym="" uni="χ"/>
<a abb="*y" sym="" uni="η"/>
<a abb="*z" sym="" uni="ζ"/>
<a abb="+-" sym="" uni="±"/>
<a abb="+f" sym="" uni="ϕ"/>
<a abb="+h" sym="" uni="ϑ"/>
<a abb="+p" sym="" uni="ϖ"/>
<a abb="," sym="" uni=","/>
<a abb="->" sym="" uni="→"/>
<a abb="." sym="" uni="."/>
<a abb="/" sym="" uni="/"/>
<a abb="/_" sym="" uni="∠"/>
<a abb="0" sym="" uni="0"/>
<a abb="1" sym="" uni="1"/>
<a abb="2" sym="" uni="2"/>
<a abb="3" sym="" uni="3"/>
<a abb="3d" sym="" uni="∴"/>
<a abb="4" sym="" uni="4"/>
<a abb="5" sym="" uni="5"/>
<a abb="6" sym="" uni="6"/>
<a abb="7" sym="" uni="7"/>
<a abb="8" sym="" uni="8"/>
<a abb="9" sym="" uni="9"/>
<a abb=":" sym="" uni=":"/>
<a abb=";" sym="" uni=";"/>
<a abb="<" sym="" uni="<"/>
<a abb="<-" sym="" uni="←"/>
<a abb="<=" sym="" uni="≤"/>
<a abb="<>" sym="" uni="↔"/>
<a abb="=" sym="" uni="="/>
<a abb="==" sym="" uni="≡"/>
<a abb="=~" sym="" uni="≅"/>
<a abb=">" sym="" uni=">"/>
<a abb=">=" sym="" uni="≥"/>
<a abb="?" sym="" uni="?"/>
<a abb="AN" sym="" uni="∧"/>
<a abb="Ah" sym="" uni="ℵ"/>
<a abb="CL" sym="" uni="♣"/>
<a abb="CR" sym="" uni="↵"/>
<a abb="DI" sym="" uni="♦"/>
<a abb="Eu" sym="" uni="€"/>
<a abb="HE" sym="" uni="♥"/>
<a abb="Im" sym="" uni="ℑ"/>
<a abb="OR" sym="" uni="∨"/>
<a abb="Re" sym="" uni="ℜ"/>
<a abb="SP" sym="" uni="♠"/>
<a abb="[" sym="" uni="["/>
<a abb="]" sym="" uni="]"/>
<a abb="_" sym="" uni="_"/>
<a abb="ap" sym="" uni="~"/>
<a abb="arrowvertbt" sym="" uni="⇓"/>
<a abb="arrowverttp" sym="" uni="⇑"/>
<a abb="c*" sym="" uni="⊗"/>
<a abb="c+" sym="" uni="⊕"/>
<a abb="ca" sym="" uni="∩"/>
<a abb="cu" sym="" uni="∪"/>
<a abb="da" sym="" uni="↓"/>
<a abb="de" sym="" uni="°"/>
<a abb="di" sym="" uni="÷"/>
<a abb="es" sym="" uni="∅"/>
<a abb="f/" sym="" uni="∕"/>
<a abb="fa" sym="" uni="∀"/>
<a abb="fm" sym="" uni="′"/>
<a abb="gr" sym="" uni="∇"/>
<a abb="hA" sym="" uni="⇔"/>
<a abb="ib" sym="" uni="⊆"/>
<a abb="if" sym="" uni="∞"/>
<a abb="integral" sym="" uni="∫"/>
<a abb="ip" sym="" uni="⊇"/>
<a abb="lA" sym="" uni="⇐"/>
<a abb="la" sym="" uni="〈"/>
<a abb="lz" sym="" uni="◇"/>
<a abb="mi" sym="" uni="−"/>
<a abb="mo" sym="" uni="∈"/>
<a abb="mu" sym="" uni="×"/>
<a abb="nb" sym="" uni="⊄"/>
<a abb="nm" sym="" uni="∉"/>
<a abb="no" sym="" uni="¬"/>
<a abb="pd" sym="" uni="∂"/>
<a abb="pl" sym="" uni="+"/>
<a abb="pp" sym="" uni="⊥"/>
<a abb="product" sym="" uni="∏"/>
<a abb="pt" sym="" uni="∝"/>
<a abb="rA" sym="" uni="⇒"/>
<a abb="ra" sym="" uni="〉"/>
<a abb="sb" sym="" uni="⊂"/>
<a abb="sd" sym="" uni="″"/>
<a abb="sp" sym="" uni="⊃"/>
<a abb="st" sym="" uni="∍"/>
<a abb="sum" sym="" uni="∑"/>
<a abb="te" sym="" uni="∃"/>
<a abb="ts" sym="" uni="ς"/>
<a abb="u2026" sym="" uni="…"/>
<a abb="u2320" sym="" uni="⌠"/>
<a abb="u2321" sym="" uni="⌡"/>
<a abb="ua" sym="" uni="↑"/>
<a abb="wp" sym="" uni="℘"/>
<a abb="~=" sym="" uni="≈"/>
or I can also create a lookup string of these invalid UNICODE points and a string of the position-matched proper UNICODE point:
join -t\| -1 2 -2 2 symbol.map.dat unicode.map.dat |sed -e 's~\([^|]*\)|\([^|]*\)|\([^|]*\)~\1|\&#x\2;|\&#x\3;~' > symbol-unicode.map.dat
echo '<a sym="'$(cut -d\| -f2 symbol-unicode.map.dat |tr -d '\n')'" uni="'$(cut -d\| -f3 symbol-unicode.map.dat |tr -d '\n')'"/>'
which gives me:
<a sym=""
uni="≃!≠#%&()∗ABΞ∆EΦΓΘIKΛMNOΠΨPΣTΥΩXHZαβξδφγθικλμνοπψρστυωχηζ±ϕϑϖ,→./∠0123∴456789:;<←≤↔=≡≅>≥?∧ℵ♣↵♦€♥ℑ∨ℜ♠[]_~⇓⇑⊗⊕∩∪↓°÷∅∕∀′∇⇔⊆∞∫⊇⇐〈◇−∈×⊄∉¬∂+⊥∏∝⇒〉⊂″⊃∍∑∃ς…⌠⌡↑℘≈">
By the way, there is a funny thing about the Stack Exchange platform that I can show you the symbols I have here, first the bad ones, which will probably show up all as boxes, unless you tweak your local CSS style="font-family: 'Symbol';":

and now the UNICODE string:
≃!≠#%&()∗ABΞ∆EΦΓΘIKΛMNOΠΨPΣTΥΩXHZαβξδφγθικλμνοπψρστυωχηζ±ϕϑϖ,→./∠0123∴456789:;<←≤↔=≡≅>≥?∧ℵ♣↵♦€♥ℑ∨ℜ♠[]_~⇓⇑⊗⊕∩∪↓°÷∅∕∀′∇⇔⊆∞∫⊇⇐〈◇−∈×⊄∉¬∂+⊥∏∝⇒〉⊂″⊃∍∑∃ς…⌠⌡↑℘≈
Pretty neat that.
Perhaps it can help someone else struggling with the same issue needing a quick practical solution. You're welcome.

Extract part of a curl return in Bash to allocate to a variable

I would like to extract a string value from a web page returned by curl in a Bash script, but I am unsure how to go about this.
The page returned by curl always looks like this:
<head>
<title>UKIPVPN.COM FREE VPN Service</title>
<style type='text/css'>
#button {
width:180px;
height:60px;
font-family:verdana,arial,helvetica,sans-serif;
font-size:20px;
font-weight: bold;
}
</style>
</head>
<br>
<br>
<font color=blue><center> <h1>Welcome to Free UK IP VPN Service</h1> </center></font>
<form method='post' action='http://www.ukipvpn.com'>
<center><input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'></center><br>
<center><input id='button' type='submit' value=' I AGREE ' /><br><br> <h2> Your TOS Let me use the Free VPN Service</h2></center>
</form>
<br><center><font size='2'>No illegal activities allowed. In case of abuse, users' VPN access log is subjected to expose to related authorities.</font></center>
</html>
The value I would like to extract to a variable in Bash is the value='this is the value i am interested in'.
Thanks for any help;
Andy
You could try the below.
$ val=$(curl somelink | grep -oP "name='sessionid'[^<>]*\bvalue\s*=\s*'\K[^']*")
There are some arguments against using regex to parse HTML.
Here's a more robust XPath based version using tidy and xmlstarlet:
var=$(curl someurl |
tidy -asxml 2> /dev/null |
xmlstarlet sel -t -v '//_:input[@name="sessionid"]/@value' 2> /dev/null);
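The -P option used in the grep answer needs a GNU grep built with PCRE support. Where that is not available, a plain sed substitution can do the same job. This is a sketch only: the function name extract_sessionid is made up, and it assumes the input tag keeps the attribute order and single quotes shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the hidden "sessionid" input
# found in the HTML read from standard input.
extract_sessionid() {
    # -n together with the p flag prints only lines where the pattern matched.
    sed -n "s/.*name='sessionid'[^<>]*value='\([^']*\)'.*/\1/p"
}

# Sample line copied from the page shown in the question.
html="<center><input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'></center><br>"

sessionid=$(printf '%s\n' "$html" | extract_sessionid)
echo "$sessionid"
```

With a live page the call would be along the lines of sessionid=$(curl -s someurl | extract_sessionid).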

sed variable substitution not working

This problem's been driving me up the wall for a while now. I've searched a ton, and it seems like nobody else's solutions work for me...
sed "s|{{/each}}| -->\n $photostr |" $1
So I'm trying to end a comment, and slap in my photo string. Here's what $photostr is
<a data-gallery="gallery" href="The_Great_Wave.jpg" title=" The Great Wave off Kanagawa"> <img src="bar" alt=" The Great Wave off Kanagawa"/></a>
<a data-gallery="gallery" href="Mt_Fuji.jpg" title=" Mount Fuji (the highest mountain in Japan)"> <img src="bar" alt=" Mount Fuji (the highest mountain in Japan)"/></a>
<a data-gallery="gallery" href="Beach.jpg" title=" Waves Crashing on the Beach"> <img src="bar" alt=" Waves Crashing on the Beach"/></a>
<a data-gallery="gallery" href="Elephant.jpg" title=" An Elephant in the Serengeti"> <img src="bar" alt=" An Elephant in the Serengeti"/></a>
<a data-gallery="gallery" href="Milky_Way.jpg" title=" The Milky Way Galaxy (contains our Solar System)"> <img src="bar" alt=" The Milky Way Galaxy (contains our Solar System)"/></a>
<a data-gallery="gallery" href="Poppies.jpg" title=" Poppies in Bloom"> <img src="bar" alt=" Poppies in Bloom"/></a>
So it's chock-full of metacharacters, which is why I'm using pipes as delimiters, but I get this error...
sed: -e expression #1, char 70: unterminated `s' command
For input, $1 is a file that has html and some {{metatag}} in it that need appropriate substitutions to make a working webpage. The bit I'm concerned with,
</a>
{{/each}}
</div>
should be turned into...
</a>
-->
<a data-gallery="gallery" href="The_Great_Wave.jpg" title=" The Great Wave off Kanagawa"> <img src="bar" alt=" The Great Wave off Kanagawa"/></a>
<a data-gallery="gallery" href="Mt_Fuji.jpg" title=" Mount Fuji (the highest mountain in Japan)"> <img src="bar" alt=" Mount Fuji (the highest mountain in Japan)"/></a>
<a data-gallery="gallery" href="Beach.jpg" title=" Waves Crashing on the Beach"> <img src="bar" alt=" Waves Crashing on the Beach"/></a>
<a data-gallery="gallery" href="Elephant.jpg" title=" An Elephant in the Serengeti"> <img src="bar" alt=" An Elephant in the Serengeti"/></a>
<a data-gallery="gallery" href="Milky_Way.jpg" title=" The Milky Way Galaxy (contains our Solar System)"> <img src="bar" alt=" The Milky Way Galaxy (contains our Solar System)"/></a>
<a data-gallery="gallery" href="Poppies.jpg" title=" Poppies in Bloom"> <img src="bar" alt=" Poppies in Bloom"/></a>
Encode the line breaks in the value of your variable as \n before you use it:
photostr=$(sed ':a;N;$!ba;s/\n/\\n/g' <<< "$photostr")
sed "s|{{/each}}| -->\n $photostr |" $1
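The error comes from the raw newlines in the variable: the first one ends the s command before its closing delimiter. A variant of the fix that also protects &, backslashes and the | delimiter is sketched below. The helper name escape_sed_replacement and the two-line sample string are assumptions for illustration. Instead of \n, it ends each line with a backslash, which is the portable way to put a newline into a sed replacement.

```shell
#!/bin/sh
# Hypothetical helper: make a multi-line string safe to use as the
# replacement part of  s|...|...|  in sed.
escape_sed_replacement() {
    # 1) put a backslash before every backslash, ampersand and | delimiter;
    # 2) end every line except the last with a backslash so that sed
    #    accepts the embedded newlines.
    printf '%s\n' "$1" | sed -e 's/[\\&|]/\\&/g' -e '$!s/$/\\/'
}

# Made-up sample containing characters that are special in a replacement.
photostr='<a href="a.jpg" title="A & B">a</a>
<a href="b.jpg" title="C | D">b</a>'

escaped=$(escape_sed_replacement "$photostr")

# "\\" followed by a real newline yields backslash-newline inside the script.
result=$(printf '%s\n' '</a>' '{{/each}}' '</div>' | sed "s|{{/each}}|-->\\
$escaped|")
printf '%s\n' "$result"
```

The {{/each}} line is replaced by the comment terminator followed by both anchor lines, with the & and | characters left intact.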

How can I embed some HTML in a shell script?

I want to include some HTML in a shell script. This is what I've tried:
(
echo "<html>
<head>
<title>HTML E-mail</title>
</head>
<body>
<p style="font-family:verdana;color:red;">
This text is in Verdana and red</p>
</body>
</html>"
)>pkll.htm
However, instead of writing the HTML to file, it gives me some errors:
> bash: color:red: command not found bash: > This text is in Verdana and
> red</p </body> </html>: No such file or directory
How can I do this?
A better option would be to use the here document syntax (see this answer):
cat << 'EOF' > pkll.htm
<html>
<head>
<title>HTML E-mail</title>
</head>
<body>
<p style="font-family:verdana;color:red;">
This text is in Verdana and red
</p>
</body>
</html>
EOF
Your attempt failed because the double quotes in the HTML terminate the double quotes you wrapped around it, which causes the <>s to be seen as redirections and the ;s to terminate the echo command.
You could technically have used single quotes:
(
echo '<html>
<head>
<title>HTML E-mail</title>
etc ...'
)>pkll.htm
but then you just have the same problem again if the HTML contains a ', such as an apostrophe or in an attribute. The here document has no such issues.
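The quotes around 'EOF' in the here document above matter: they tell the shell to take the body literally. Without them the body still tolerates double quotes, but $variables are expanded. A small sketch of the difference follows; the variable name title and the shortened HTML are assumptions for illustration.

```shell
#!/bin/sh
title='HTML E-mail'   # hypothetical value used only for this demonstration

# Quoted delimiter: the body is taken literally, so $title stays as typed.
literal=$(cat << 'EOF'
<title>$title</title>
<p style="font-family:verdana;color:red;">red text</p>
EOF
)

# Unquoted delimiter: $title is expanded; the double quotes are still safe.
expanded=$(cat << EOF
<title>$title</title>
<p style="font-family:verdana;color:red;">red text</p>
EOF
)

printf '%s\n' "$literal" "$expanded"
```

Use the quoted form for fixed HTML and the unquoted form when the page should pick up values from shell variables.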
You need to escape the quotes in the HTML, because the argument to echo is itself wrapped in double quotes.
Your shell reads
<html>...<p style="
as the complete quoted string, because the quote after style= closes it. The unquoted text
font-family:verdana;
is then appended to that argument up to the semicolon, which ends the echo command, and everything after it is run as separate commands.
So you need to replace the p tag with
<p style=\"font-family:verdana;color:red;\">
Read the Advanced Bash-Scripting Guide Chapter 19. Here Documents. http://tldp.org/LDP/abs/html/here-docs.html
cat << 'EOF' > pkll.htm
<html>
<head>
<title>HTML E-mail</title>
</head>
<body>
<p style="font-family:verdana;color:red;">
This text is in Verdana and red</p>
</body>
</html>
EOF
You can also use an online tool to do the same thing:
http://togotutor.com/code-to-html/shell-to-html.php

How to preserve gettext() translations?

Suppose I have an example.php file like this:
<p>
<?php echo _('Hello world') ?>
</p>
<p>
<b><?php echo _('the end') ?>
</p>
If I extract strings:
xgettext example.php
I get a messages.po file, which I can open with Poedit, translate, compile to a .mo file, etc. That's OK; the problem comes when I edit my original, already translated example.php:
<p>
<?php echo _('Hello world') ?>
</p>
<p>
<?php echo _('new string') ?>
</p>
<p>
<b><?php echo _('the end') ?>
</p>
I've added a new string, and if I execute xgettext again I get a messages.po file where all strings are empty, so I have to use Poedit and translate all the strings again. How can I reuse my previous translations?
You can merge two po files together with msgmerge. If a source string is unchanged, the merge should work perfectly; if it has changed, you may have to do some work to get it translated again, and of course you will have to translate any entirely new strings.
msgmerge -o results.po my_existing_translations.po untranslated_xgettext_output.po
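The whole round trip can be wrapped in a small function. This is a sketch under assumptions: the function name refresh_translations and the file names messages.pot and es.po are made up, and it uses msgmerge's --update mode, which rewrites the existing .po in place rather than writing a separate results file.

```shell
#!/bin/sh
# Hypothetical helper: re-extract strings from a PHP source file and fold
# them into an existing translation file without losing finished work.
#   $1 = PHP source, $2 = template to (re)generate, $3 = existing .po file
refresh_translations() {
    # Step 1: write the current source strings to a template file.
    xgettext --language=PHP --from-code=UTF-8 -o "$2" "$1" || return 1
    # Step 2: update the .po in place; unchanged msgids keep their msgstr,
    # new msgids are added with an empty msgstr.
    msgmerge --update --quiet "$3" "$2"
}

# Example call with assumed file names:
#   refresh_translations example.php messages.pot es.po
```

After each run, only the entries with an empty msgstr (and any that msgmerge flagged as fuzzy) need attention in Poedit.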

Resources