Decoding Outlook's "J" smilies into text - outlook

Outlook by default transforms ":)" into "J". Googling has revealed that J is the Wingdings character for the smilie, so they appear like normal smilies to Outlook users.
I want to transform these "J" smilies into the UTF-8 smiley characters so the rest of us can see them too. But when looking at the source of an email, the "J" smilie lines look like this:
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">J<o:p></o:p></span></p>
That is, just like normal Outlook formatting that you see on every other line. No font-family other than Calibri and Times New Roman is even mentioned anywhere in the source. So just how is Outlook decoding these J's back into smilies? How do I?
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.dp-replycode
{mso-style-name:dp-replycode;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Some message
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">J<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
....

I'm just an idiot. I was using client data as my source of Outlook emails, since I didn't have an Outlook client readily available. After setting one up and testing for myself, font-family:Wingdings does indeed appear in the source. It turns out the client was just signing his emails with "J"!
Now it's simple enough looking for font-family:Wingdings and converting them into UTF-8 smilies.
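A minimal sketch of that conversion in Python, assuming the smiley arrives as a single letter inside a span whose style names Wingdings. The function name and the "K"/"L" entries are illustrative additions and were not part of the original post; only "J" is discussed above.

```python
import re

# "J" is the Wingdings smiley discussed above. "K" (neutral face) and
# "L" (frowning face) are assumed extras added for illustration.
WINGDINGS_FACES = {"J": "\u263A", "K": "\U0001F610", "L": "\u2639"}

# Matches <span ... font-family:Wingdings ...>X</span> with one letter inside.
# The lookahead skips "Wingdings 2"/"Wingdings 3", which use other glyphs.
WINGDINGS_SPAN = re.compile(
    r"<span\b[^>]*font-family\s*:\s*[\"']?Wingdings(?!\s*\d)[^>]*>"
    r"\s*([A-Za-z])\s*</span>",
    re.IGNORECASE,
)


def decode_wingdings_smilies(html: str) -> str:
    """Replace Wingdings smiley spans with plain Unicode characters."""

    def replace(match):
        face = WINGDINGS_FACES.get(match.group(1))
        # Leave the span untouched when the letter is not a known face.
        return face if face is not None else match.group(0)

    return WINGDINGS_SPAN.sub(replace, html)


if __name__ == "__main__":
    sample = "<p>Thanks <span style='font-family:Wingdings'>J</span></p>"
    print(decode_wingdings_smilies(sample))
```

A regular expression is enough for the simple span shape shown here; mail with nested markup inside the span would need a real HTML parser.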

Related

Outlook Redemption unwrappedRdoMail adds 'B' character to empty HTML paragraph

We are seeing a change to the HTML body of a mail item when unwrapping the RDOMail from the unwrapped RDOStore.
We have tried with the latest Redemption code (5.23).
The code we are using follows:
RDOStores rdoStores;
RDOStore rdoStore, unwrappedRdoStore;
RDOFolder rdoFolder;
RDOMail unwrappedRdoMail;
using (_comObjectHelper.GetMonitor(rdoFolder = mail.Parent))
using (_comObjectHelper.GetMonitor(rdoStore = rdoFolder.Store))
using (_comObjectHelper.GetMonitor(rdoStores = rdoSession.Stores))
using (_comObjectHelper.GetMonitor(unwrappedRdoStore = rdoStores.UnwrapStore(rdoStore)))
using (_comObjectHelper.GetMonitor(unwrappedRdoMail = unwrappedRdoStore.GetMessageFromID(mail.EntryID)))
{
rdoMailAction?.Invoke(unwrappedRdoMail);
unwrappedRdoMail.Save();
}
IRDOMail item HTMLBody from rdoSession.GetRDOObjectFromOutlookObject(_MailItem) (correct):
<html>
<head>
<meta name=\"Generator\" content=\"Microsoft Word 15 (filtered medium)\" />
<style>
<!--\r\n/* Font Definitions */\r\n@font-face\r\n\t{font-family:\"Cambria Math\";\r\n\tpanose-1:2 4 5 3 5 4 6 3 2 4;}\r\n@font-face\r\n\t{font-family:Calibri;\r\n\tpanose-1:2 15 5 2 2 2 4 3 2 4;}\r\n/* Style Definitions */\r\np.MsoNormal, li.MsoNormal, div.MsoNormal\r\n\t{margin:0in;\r\n\tmargin-bottom:.0001pt;\r\n\tfont-size:11.0pt;\r\n\tfont-family:\"Calibri\",sans-serif;}\r\nspan.EmailStyle17\r\n\t{mso-style-type:personal-compose;\r\n\tfont-family:\"Calibri\",sans-serif;\r\n\tcolor:windowtext;}\r\n.MsoChpDefault\r\n\t{mso-style-type:export-only;\r\n\tfont-family:\"Calibri\",sans-serif;}\r\n@page WordSection1\r\n\t{size:8.5in 11.0in;\r\n\tmargin:1.0in 1.0in 1.0in 1.0in;}\r\ndiv.WordSection1\r\n\t{page:WordSection1;}\r\n-->
</style>
</head>
<body lang=\"EN-US\" link=\"#0563C1\" vlink=\"#954F72\">
<div class=\"WordSection1\">
<p class=\"MsoNormal\">test </p>
<p class=\"MsoNormal\"> </p>
<p class=\"MsoNormal\">test </p>
<p class=\"MsoNormal\"> </p>
<p class=\"MsoNormal\">test</p>
<p class=\"MsoNormal\"> </p>
<p class=\"MsoNormal\">test test</p>
</div>
<span/>
</body>
</html>\r\n
RDOMail unwrappedRdoMail item HTMLBody (incorrect - extra 'B' characters on blank paragraphs):
<html>
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=us-ascii\">
<meta name=\"Generator\" content=\"Microsoft Word 15 (filtered medium)\">
<style>
<!--\r\n/* Font Definitions */\r\n@font-face\r\n\t{font-family:\"Cambria Math\";\r\n\tpanose-1:2 4 5 3 5 4 6 3 2 4;}\r\n@font-face\r\n\t{font-family:Calibri;\r\n\tpanose-1:2 15 5 2 2 2 4 3 2 4;}\r\n/* Style Definitions */\r\np.MsoNormal, li.MsoNormal, div.MsoNormal\r\n\t{margin:0in;\r\n\tmargin-bottom:.0001pt;\r\n\tfont-size:11.0pt;\r\n\tfont-family:\"Calibri\",sans-serif;}\r\nspan.EmailStyle17\r\n\t{mso-style-type:personal-compose;\r\n\tfont-family:\"Calibri\",sans-serif;\r\n\tcolor:windowtext;}\r\n.MsoChpDefault\r\n\t{mso-style-type:export-only;\r\n\tfont-family:\"Calibri\",sans-serif;}\r\n@page WordSection1\r\n\t{size:8.5in 11.0in;\r\n\tmargin:1.0in 1.0in 1.0in 1.0in;}\r\ndiv.WordSection1\r\n\t{page:WordSection1;}\r\n-->
</style>
</head>
<body lang=\"EN-US\" link=\"#0563C1\" vlink=\"#954F72\">
<div class=\"WordSection1\">
<p class=\"MsoNormal\">test </p>
<p class=\"MsoNormal\">B </p>
<p class=\"MsoNormal\">test </p>
<p class=\"MsoNormal\">B </p>
<p class=\"MsoNormal\">test</p>
<p class=\"MsoNormal\"> </p>
<p class=\"MsoNormal\">test test</p>
</div>
<span/>
</body>
</html>\r\n
Has anyone seen this behaviour? Any ideas on how to solve this problem?
It turned out there were some funny characters, among them a0, in the received email, which points to an incompatibility between the two mail servers (GSuite and IMAP).
Nothing could be done in this case.
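One plausible reading of the symptom, which the thread does not confirm: a non-breaking space is encoded in UTF-8 as the bytes c2 a0, and if something in the chain reduces those bytes to 7-bit ASCII (the incorrect body declares charset=us-ascii), the result is exactly "B" followed by a space. The helper name below is hypothetical and only illustrates the arithmetic.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as a 7-bit-only transport might."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    nbsp_utf8 = "\u00a0".encode("utf-8")   # non-breaking space as UTF-8
    print(nbsp_utf8.hex())                 # the two bytes, in hex
    print(strip_high_bit(nbsp_utf8))       # what survives a 7-bit reduction
```

Here 0xC2 & 0x7F is 0x42 ("B") and 0xA0 & 0x7F is 0x20 (space), which matches the "B " seen in the blank paragraphs.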

Optimize website to show reader view in Firefox

Firefox 38.0.5 added a "Reader View" to the address bar:
But not all sites get this icon; it only appears when a page with readable content is detected. So how do I enable this for my site?
I tried @media print and an extra stylesheet for print view, but neither has any effect:
<html>
<head>
<style>
@media print { /* no effect: */
.no-print { display:none; }
}
</style>
<!-- no effect either:
<link rel="stylesheet" href="print.css" media="print"><!-- -->
</head><body>
<h1>Some Title</h1>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+should+vanish+in+print+view">
<br><br><br>This is the only text
</body></html>
What code snippets do I have to add to my website's source code so that this book icon becomes visible to visitors of my site?
As the code stands in May '20, the trigger function (isProbablyReaderable) scores only p and pre elements, plus div elements that contain at least one descendant br.
A slight oversimplification of the scoring heuristic is:
For each element in ['p', 'pre', 'div > br']:
    If textContent length is > 140 chars, increase score by sqrt(length - 140)
    If cumulative score > 20, return true
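The simplified heuristic above can be sketched in Python as follows. This is only a model of the summary given here, not Mozilla's actual implementation: it takes the text of the candidate elements as plain strings and ignores the real function's other checks. The function and constant names are illustrative.

```python
import math

MIN_CONTENT_LENGTH = 140  # characters a block needs before it scores
MIN_SCORE = 20            # cumulative score that triggers Reader View


def is_probably_readerable(candidate_texts):
    """Apply the simplified scoring rule to the text of candidate elements.

    candidate_texts: the text content of each p, pre, or br-containing div.
    """
    score = 0.0
    for text in candidate_texts:
        length = len(text.strip())
        if length <= MIN_CONTENT_LENGTH:
            continue  # short blocks contribute nothing
        score += math.sqrt(length - MIN_CONTENT_LENGTH)
        if score > MIN_SCORE:
            return True
    return False


if __name__ == "__main__":
    print(is_probably_readerable(["x" * 600]))             # one long block
    print(is_probably_readerable(["x" * 500, "x" * 500]))  # two medium blocks
    print(is_probably_readerable(["short text"] * 100))    # many short blocks
```

Under this model a single block needs a little over 540 characters to pass on its own, while many short blocks never add up, because each one scores zero.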
You have to add <div> or <p> tags to get a page to initiate Reader View.
I created a simple html that works:
<html>
<head>
<title>Reader View shows only the browser in reader view</title>
</head>
<body>
Everything outside the main div tag vanishes in Reader View<br>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+should+vanish+in+print+view">
<div>
<h1>H1 tags outside of a p tag are hidden in reader view</h1>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+is resized+in+print+view">
<p>
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789 123456
</p>
</div>
</body>
</html>
This is the minimum needed to activate it. The process is somewhat multi-faceted: scores are added up for chunks of text.
For example, you can activate Reader View in forum software by adding a <p> tag around each message block in the view-posts template.
Here are some more details about the mechanism

Simple wkhtmltopdf conversion with framesets creating empty pdf

We need to convert/provide our html-based in-app HelpSystem to an on-disc pdf for the client to view outside of the application.
I'm trying to use wkhtmltopdf with a very basic file (3 frames with links to simple .html files) but getting an empty .pdf when I run the following from the command line:
wkhtmltopdf "C:\Program Files (x86)\wkhtmltopdf\index.html" "c:\delme\test.pdf"
I know frames are somewhat deprecated but it’s what I’ve got to deal with. Are the frames causing the empty pdf?
Index.html:
<html>
<head>
<title>Help</title>
</head>
<frameset cols="28%, 72%">
<frameset rows="8%, 92%">
<frame noresize="noresize" src="Buttons.html" name="UPPERLEFT" />
<frame noresize="noresize" src="mytest2.html" name="LOWERLEFT" />
</frameset>
<frame noresize="noresize" src="mytest.html" name="RIGHT" />
</frameset>
</html>
mytest.html:
<html>
<body>
<p>
<b>This text is bold</b>
</p>
<p>
<strong>This text is strong</strong>
</p>
<p>
<em>This text is emphasized</em>
</p>
<p>
<i>This text is italic</i>
</p>
<p>
<small>This text is small</small>
</p>
<p>This is
<sub>subscript</sub> and
<sup>superscript</sup></p>
</body>
</html>
mytest2.html:
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<h2>The blockquote Element</h2>
<p>The blockquote element specifies a section that is quoted from another source.</p>
<p>Here is a quote from WWF's website:</p>
<blockquote cite="http://www.worldwildlife.org/who/index.html">For 50 years, WWF has been protecting the future of nature. The
world’s leading conservation organization, WWF works in 100 countries and is supported by 1.2 million members in the United
States and close to 5 million globally.</blockquote>
<p>
<b>Note:</b> Browsers usually indent blockquote elements.</p>
<h2>The q Element</h2>
<p>The q element defines a short quotation.</p>
<p>WWF's goal is to:
<q>Build a future where people live in harmony with nature.</q> We hope they succeed.</p>
<p>
<b>Note:</b> Browsers insert quotation marks around the q element.</p>
</body>
</html>
buttons.html:
<html>
<body>
<center>
<table>
<tr>
<td>
<form method="link" action="mytest.html" target="LOWERLEFT">
<input type="submit" value="Contents" />
</form>
</td>
<td>
<form method="link" action="mytest2.html" target="LOWERLEFT">
<input type="submit" value="Index" />
</form>
</td>
</tr>
</table>
</center>
</body>
</html>
Taken from the official wkhtmltopdf issues area from a code project member’s answer; emphasis is mine:
wkhtmltopdf calculates the TOC based on the H* (e.g. H1, H2 and so on)
tags in the supplied documents. It does not recurse into frames and
iframes.. It will nest dependend on the number, to make sure that it
does the right thing, it is good to make sure that you only have
H(n+1) tags under a H(n) tag and not H(n+k) for some k larger
then 1. 2000+ files sounds like a lot. You might run out of memory
while converting the output. If it does not work for you.. you could
try using the --dump-outline switch to dump the outline to a xml file, to see what it
would put into a TOC.
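Since the quote says wkhtmltopdf does not recurse into frames, one possible workaround (not something the thread itself proposes or tests) is to collect the frame sources yourself and pass each page as a separate input, which wkhtmltopdf accepts. A minimal Python sketch, with hypothetical helper names, that only builds the command line:

```python
from html.parser import HTMLParser
from pathlib import Path


class FrameSrcCollector(HTMLParser):
    """Collect the src attribute of every frame and iframe, in order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def build_wkhtmltopdf_command(index_html, base_dir, output_pdf):
    """Return an argument list converting each framed page into one PDF."""
    collector = FrameSrcCollector()
    collector.feed(index_html)
    pages = [str(Path(base_dir) / src) for src in collector.sources]
    return ["wkhtmltopdf", *pages, output_pdf]


if __name__ == "__main__":
    index = '<frameset><frame src="Buttons.html" /><frame src="mytest.html" /></frameset>'
    print(build_wkhtmltopdf_command(index, "help", "test.pdf"))
```

The resulting list can be handed to subprocess.run(cmd, check=True). Each framed page then becomes its own run of pages in the PDF, so the side-by-side frame layout is lost.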

Conflict duplicating element ID's during conditional statements?

<!--[if gte IE 9]><!--><img src="images/logo.svg" onerror="this.src='images/logo.png';this.onerror=null;" id="logo" alt="Sample logo"><!--<![endif]--><!--[if lt IE 9]><img src="images/logo.png" id="logo" alt="Sample logo"><![endif]-->
Will this validate properly, given that the same ID is used twice on the same page, with one of the uses inside a conditional comment?
<!--[if gte IE 9]><img src="images/logo.svg" onerror="this.src='images/logo.png';this.onerror=null;" id="logo" alt="Sample logo"><![endif]--><!--[if lt IE 9]><img src="images/logo.png" id="logo" alt="Sample logo"><![endif]-->
Try this code. I saw some extra tags in your code which I have removed.

Search for string in a webpage with frames

I am trying to search the webpage for particular strings, and I am using the following code:
strPageContent = appIE.document.documentElement.InnerHTML
However, the page is quite complicated and has a lot of frames and framesets, and the command above returns only the contents of some parent tag. How can I access the contents of a particular div element? Please see the code below:
<html>
<head>...</head>
<frameset id="NavContent_Workhorse" frameborder="0" framespacing="0" rows="*,0">
<frameset id="Nav_Content" border="3" frameborder="1" framespacing="3" cols="240,*">
<frame name="nav" src="/interface/sidebar/sidebar.def" scrolling="no">
<frame name="content" src="/interface/home.def" frameborder="1" border="3" marginheight="0" marginwidth="0" scrolling="no">
#document
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang>
<head> … </head>
<body onload="LoadAdd();">
<div class="PageTitle" id="PageTitle">...</div>
<div class="ToolBar" id="PageBody">...</div>
<div id="error" class="none"> … </div>
<div id="content" style="display: block; height: 445px;">
<form id="frmUSR" method="post" target="workhorse" onsubmit="return false;" action="/setup/users_groups/users/insert.sdl?parentid=14">
<div id="wiz_1" class="wiz_vis">
<table class="frmTbl">
<thead class="title">
<tr>
<th class="label">
TEXT THAT I AM LOOKING FOR
</th>
Thanks in advance!
Edit:
I forgot to add that I did try the code below, but I get a null value:
Set div = appIE.document.getElementById("wiz_1")
Edit 2:
The purpose of the script is to automate filling out the user creation forms on my company's system (webpage UI). I don't know why, but I cannot get a reference to anything that is below the main <frameset id="Nav_Content" border="3". I keep getting null values.
You can access the content of an element with a particular ID inside a frame like this:
Set frame = appIE.Document.parentWindow.window.frames("content")
Set div = frame.document.getElementById("wiz_1")
WScript.Echo div.innerHTML
WScript.Echo div.innerText

Resources