Bitbucket ticket markdown: open external link in new window (_blank) [duplicate] - syntax

Is there a way to create a link in Markdown that opens in a new window? If not, what syntax do you recommend to do this? I'll add it to the markdown compiler I use. I think it should be an option.

As far as the Markdown syntax is concerned, if you want to get that detailed, you'll just have to use HTML.
Hello, world!
Most Markdown engines I've seen allow plain old HTML, just for situations like this where a generic text markup system just won't cut it. (The StackOverflow engine, for example.) They then run the entire output through an HTML whitelist filter, regardless, since even a Markdown-only document can easily contain XSS attacks. As such, if you or your users want to create _blank links, then they probably still can.
If that's a feature you're going to be using often, it might make sense to create your own syntax, but it's generally not a vital feature. If I want to launch that link in a new window, I'll ctrl-click it myself, thanks.

Kramdown supports it. It's compatible with standard Markdown syntax, but has many extensions, too. You would use it like this:
[link](url){:target="_blank"}

I don't think there is a markdown feature, although there may be other options available if you want to open links which point outside your own site automatically with JavaScript.
Array.from(javascript.links)
.filter(link => link.hostname != window.location.hostname)
.forEach(link => link.target = '_blank');
jsFiddle.
If you're using jQuery:
$(document.links).filter(function() {
return this.hostname != window.location.hostname;
}).attr('target', '_blank');
jsFiddle.

With Markdown v2.5.2, you can use this:
[link](URL){:target="_blank"}

So, it isn't quite true that you cannot add link attributes to a Markdown URL. To add attributes, check with the underlying markdown parser being used and what their extensions are.
In particular, pandoc has an extension to enable link_attributes, which allow markup in the link. e.g.
[Hello, world!](http://example.com/){target="_blank"}
For those coming from R (e.g. using rmarkdown, bookdown, blogdown and so on), this is the syntax you want.
For those not using R, you may need to enable the extension in the call to pandoc with +link_attributes
Note: This is different than the kramdown parser's support, which is one the accepted answers above. In particular, note that kramdown differs from pandoc since it requires a colon -- : -- at the start of the curly brackets -- {}, e.g.
[link](http://example.com){:hreflang="de"}
In particular:
# Pandoc
{ attribute1="value1" attribute2="value2"}
# Kramdown
{: attribute1="value1" attribute2="value2"}
^
^ Colon

One global solution is to put <base target="_blank">
into your page's <head> element. That effectively adds a default target to every anchor element. I use markdown to create content on my Wordpress-based web site, and my theme customizer will let me inject that code into the top of every page. If your theme doesn't do that, there's a plug-in

Not a direct answer, but may help some people ending up here.
If you are using GatsbyJS there is a plugin that automatically adds target="_blank" to external links in your markdown.
It's called gatsby-remark-external-links and is used like so:
yarn add gatsby-remark-external-links
plugins: [
{
resolve: `gatsby-transformer-remark`,
options: {
plugins: [{
resolve: "gatsby-remark-external-links",
options: {
target: "_blank",
rel: "noopener noreferrer"
}
}]
}
},
It also takes care of the rel="noopener noreferrer".
Reference the docs if you need more options.

For ghost markdown use:
[Google](https://google.com" target="_blank)
Found it here:
https://cmatskas.com/open-external-links-in-a-new-window-ghost/

I'm using Grav CMS and this works perfectly:
Body/Content:
Some text[1]
Body/Reference:
[1]: http://somelink.com/?target=_blank
Just make sure that the target attribute is passed first, if there are additional attributes in the link, copy/paste them to the end of the reference URL.
Also work as direct link:
[Go to this page](http://somelink.com/?target=_blank)

You can do this via native javascript code like so:
var pattern = /a href=/g;
var sanitizedMarkDownText = rawMarkDownText.replace(pattern,"a target='_blank' href=");
JSFiddle Code

In my project I'm doing this and it works fine:
[Link](https://example.org/ "title" target="_blank")
Link
But not all parsers let you do that.

There's no easy way to do it, and like #alex has noted you'll need to use JavaScript. His answer is the best solution but in order to optimize it, you might want to filter only to the post-content links.
<script>
var links = document.querySelectorAll( '.post-content a' );
for (var i = 0, length = links.length; i < length; i++) {
if (links[i].hostname != window.location.hostname) {
links[i].target = '_blank';
}
}
</script>
The code is compatible with IE8+ and you can add it to the bottom of your page. Note that you'll need to change the ".post-content a" to the class that you're using for your posts.
As seen here: http://blog.hubii.com/target-_blank-for-links-on-ghost/

If someone is looking for a global rmarkdown (pandoc) solution.
Using Pandoc Lua Filter
You could write your own Pandoc Lua Filter which adds target="_blank" to all links:
Write a Pandoc Lua Filter, name it for example links.lua
function Link(element)
if
string.sub(element.target, 1, 1) ~= "#"
then
element.attributes.target = "_blank"
end
return element
end
Then update your _output.yml
bookdown::gitbook:
pandoc_args:
- --lua-filter=links.lua
Inject <base target="_blank"> in Header
An alternative solution would be to inject <base target="_blank"> in the HTML head section using the includes option:
Create a new HTML file, name it for example links.html
<base target="_blank">
Then update your _output.yml
bookdown::gitbook:
includes:
in_header: links.html
Note: This solution may also open new tabs for hash (#) pointers/URLs. I have not tested this solution with such URLs.

In Laravel I solved it this way:
$post->text= Str::replace('<a ', '<a target="_blank"', $post->text);
Not works for a specific link. Edit all links in the Markdown text. (In my case it's fine)

I ran into this problem when trying to implement markdown using PHP.
Since the user generated links created with markdown need to open in a new tab but site links need to stay in tab I changed markdown to only generate links that open in a new tab. So not all links on the page link out, just the ones that use markdown.
In markdown I changed all the link output to be <a target='_blank' href="..."> which was easy enough using find/replace.

I do not agree that it's a better user experience to stay within one browser tab. If you want people to stay on your site, or come back to finish reading that article, send them off in a new tab.
Building on #davidmorrow's answer, throw this javascript into your site and turn just external links into links with target=_blank:
<script type="text/javascript" charset="utf-8">
// Creating custom :external selector
$.expr[':'].external = function(obj){
return !obj.href.match(/^mailto\:/)
&& (obj.hostname != location.hostname);
};
$(function(){
// Add 'external' CSS class to all external links
$('a:external').addClass('external');
// turn target into target=_blank for elements w external class
$(".external").attr('target','_blank');
})
</script>

You can add any attributes using {[attr]="[prop]"}
For example [Google] (http://www.google.com){target="_blank"}

For completed alex answered (Dec 13 '10)
A more smart injection target could be done with this code :
/*
* For all links in the current page...
*/
$(document.links).filter(function() {
/*
* ...keep them without `target` already setted...
*/
return !this.target;
}).filter(function() {
/*
* ...and keep them are not on current domain...
*/
return this.hostname !== window.location.hostname ||
/*
* ...or are not a web file (.pdf, .jpg, .png, .js, .mp4, etc.).
*/
/\.(?!html?|php3?|aspx?)([a-z]{0,3}|[a-zt]{0,4})$/.test(this.pathname);
/*
* For all link kept, add the `target="_blank"` attribute.
*/
}).attr('target', '_blank');
You could change the regexp exceptions with adding more extension in (?!html?|php3?|aspx?) group construct (understand this regexp here: https://regex101.com/r/sE6gT9/3).
and for a without jQuery version, check code below:
var links = document.links;
for (var i = 0; i < links.length; i++) {
if (!links[i].target) {
if (
links[i].hostname !== window.location.hostname ||
/\.(?!html?)([a-z]{0,3}|[a-zt]{0,4})$/.test(links[i].pathname)
) {
links[i].target = '_blank';
}
}
}

Automated for external links only, using GNU sed & make
If one would like to do this systematically for all external links, CSS is no option. However, one could run the following sed command once the (X)HTML has been created from Markdown:
sed -i 's|href="http|target="_blank" href="http|g' index.html
This can be further automated by adding above sed command to a makefile. For details, see GNU make or see how I have done that on my website.

If you just want to do this in a specific link, just use the inline attribute list syntax as others have answered, or just use HTML.
If you want to do this in all generated <a> tags, depends on your Markdown compiler, maybe you need an extension of it.
I am doing this for my blog these days, which is generated by pelican, which use Python-Markdown. And I found an extension for Python-Markdown Phuker/markdown_link_attr_modifier, it works well. Note that an old extension called newtab seems not work in Python-Markdown 3.x.

For React + Markdown environment:
I created a reusable component:
export type TargetBlankLinkProps = {
label?: string;
href?: string;
};
export const TargetBlankLink = ({
label = "",
href = "",
}: TargetBlankLinkProps) => (
<a href={href} target="__blank">
{label}
</a>
);
And I use it wherever I need a link that open in a new window.

For "markdown-to-jsx" with MUI v5
This seem to work for me:
import Markdown from 'markdown-to-jsx';
...
const MarkdownLink = ({ children, ...props }) => (
<Link {...props}>{children}</Link>
);
...
<Markdown
options={{
forceBlock: true,
overrides: {
a: {
component: MarkdownLink,
props: {
target: '_blank',
},
},
},
}}
>
{description}
</Markdown>

This works for me: [Page Link](your url here "(target|_blank)")

Related

How to prevent thymeleaf from replacing special characters in HTML attribute values?

I'am using Thymeleaf in combination with a js micro-templating routine, which result in special characters in attribute values. When running Thymeleaf on
<a style="display:<%= x ? 'block' : 'none' %>;">
it creates
<a style="display:<%= x ? 'block' : 'none' %>;">
while I would expect to get exactly the same I put into the processor.
How do I use special characters in HTML attribute values?
Many thanks!
Try to play with
th:utext="#{unescaped text}">
See Thymeleaf doc Unescaped Text
There is some variants to review:
use LEGACYHTML5 option in templateResolver as mentioned above
override template method in Underscore.js
I prefer second approach. Place this code in your initialization script:
var original = _.template;
_.template = function(content) {
// fix operators escaped by Thymeleaf HTML5 validator
content = content.replace(/'/g, "'");
content = content.replace(/</g, "<");
content = content.replace(/>/g, ">");
return original.call(this, content);
};
You can try to surround your code with a CDATA block.
http://www.w3schools.com/xml/xml_cdata.asp
I'm not sure about the other template modes but I know html5 and xml are not going to let you do this since the document thymeleaf would be generating would NOT validate against the doctype. So at least in those modes, I don't think this is possible. (Maybe with a custom dialect?)
So why the desire to use two templating tools on the same page anyway?
As hubbardr already suggested, using LEGACYHTML5 as template mode might help you here.

How to prevent CKEditor replacing spaces with ?

I'm facing an issue with CKEditor 4, I need to have an output without any html entity so I added config.entities = false; in my config, but some appear when
an inline tag is inserted: the space before is replaced with
text is pasted: every space is replaced with even with config.forcePasteAsPlainText = true;
You can check that on any demo by typing
test test
eg.
Do you know how I can prevent this behaviour?
Thanks!
Based on Reinmars accepted answer and the Entities plugin I created a small plugin with an HTML filter which removes redundant entities. The regular expression could be improved to suit other situations, so please edit this answer.
/*
* Remove entities which were inserted ie. when removing a space and
* immediately inputting a space.
*
* NB: We could also set config.basicEntities to false, but this is stongly
* adviced against since this also does not turn ie. < into <.
* #link http://stackoverflow.com/a/16468264/328272
*
* Based on StackOverflow answer.
* #link http://stackoverflow.com/a/14549010/328272
*/
CKEDITOR.plugins.add('removeRedundantNBSP', {
afterInit: function(editor) {
var config = editor.config,
dataProcessor = editor.dataProcessor,
htmlFilter = dataProcessor && dataProcessor.htmlFilter;
if (htmlFilter) {
htmlFilter.addRules({
text: function(text) {
return text.replace(/(\w) /g, '$1 ');
}
}, {
applyToAll: true,
excludeNestedEditable: true
});
}
}
});
These entities:
// Base HTML entities.
var htmlbase = 'nbsp,gt,lt,amp';
Are an exception. To get rid of them you can set basicEntities: false. But as docs mention this is an insecure setting. So if you only want to remove , then I should just use regexp on output data (e.g. by adding listener for #getData) or, if you want to be more precise, add your own rule to htmlFilter just like entities plugin does here.
Remove all but not <tag> </tag> with Javascript Regexp
This is especially helpful with CKEditor as it creates lines like <p> </p>, which you might want to keep.
Background: I first tried to make a one-liner Javascript using lookaround assertions. It seems you can't chain them, at least not yet. My first approach was unsuccesful:
return text.replace(/(?<!\>) (?!<\/)/gi, " ")
// Removes but not <p> </p>
// It works, but does not remove `<p> blah </p>`.
Here is my updated working one-liner code:
return text.replace(/(?<!\>\s.)( (?!<\/)|(?<!\>) <\/p>)/gi, " ")
This works as intended. You can test it here.
However, this is a shady practise as lookarounds are not fully supported by some browsers.
Read more about Assertions.
What I ended up using in my production code:
I ended up doing a bit hacky approach with multiple replace(). This should work on all browsers.
.trim() // Remove whitespaces
.replace(/\u00a0/g, " ") // Remove unicode non-breaking space
.replace(/((<\w+>)\s*( )\s*(<\/\w+>))/gi, "$2<!--BOOM-->$4") // Replace empty nbsp tags with BOOM
.replace(/ /gi, " ") // remove all
.replace(/((<\w+>)\s*(<!--BOOM-->)\s*(<\/\w+>))/gi, "$2 $4") // Replace BOOM back to empty tags
If you have a better suggestion, I would be happy to hear 😊.
I needed to change the regular expression Imeus sent, in my case, I use TYPO3 and needed to edit the backend editor. This one didn't work. Maybe it can help another one that has the same problem :)
return text.replace(/ /g, ' ');

Displaying Docbook section titles in TextMate's function popup

I'm working on a bundle for DocBook 5 XML, which frequently includes content like:
<section>
<title>This is my awesome Java Class called <classname>FunBunny</classname></title>
<para>FunBunny is your friend.</para>
</section>
I want the titles for sections to appear in the function popup at the bottom of the window. I have this partially working using the following bundle items.
Language Grammar:
{ patterns = (
{ name = 'meta.tag.xml.docbook5.title';
match = '<title>(.*?)</title>';
/* patterns = ( { include = 'text.xml'; } ); */
},
{ include = 'text.xml'; },
);
}
Settings/Preferences Item with scope selector meta.tag.xml.docbook5.title:
{ showInSymbolList = 1;
symbolTransformation = 's/^\s*<title\s?.*?>\s*(.*)\s*<\/title>/$1/';
}
The net effect of this is that all title elements in the document are matched and appear in the function popup, excluding the <title></title> tag content based on the symbolTransformation.
I would be happy with this much functionality, since other interesting things (like figures) tend to have formal titles, but there is one issue.
The content of the title tags is not parsed and recognized according to the rest of the text.xml language grammar. The commented-out patterns section in the above language grammar does not have the desired effect of fixing this issue -- it puts everything into the meta.tag.xml.docbook5.title scope.
Is there a way to get what I want here? That is, the content of the title elements, optionally just for section titles, in the function popup and recognized as normal XML content by the parser.
In TextMate grammars you need to use the begin/end type rules instead of match type rules, if you want to 'match within a match'. (You can actually use a match as well, but then you need to use currently undocumented behavior, available only for TextMate 2)
{ patterns = (
{ name = 'meta.tag.xml.docbook5.title';
begin = '<title>';
end = '</title>';
patterns = ( { include = 'text.xml'; } );
},
{ include = 'text.xml'; },
);
}
This has the added benefit of allowing <title>...</title> that span more than one line.

How to make client side I18n with mustache.js

i have some static html files and want to change the static text inside with client side modification through mustache.js.
it seems that this was possible Twitter's mustache extension on github: https://github.com/bcherry/mustache.js
But lately the specific I18n extension has been removed or changed.
I imagine a solution where http:/server/static.html?lang=en loads mustache.js and a language JSON file based on the lang param data_en.json.
Then mustache replaces the {{tags}} with the data sent.
Can someone give me an example how to do this?
You can use lambdas along with some library like i18next or something else.
{{#i18n}}greeting{{/i18n}} {{name}}
And the data passed:
{
name: 'Mike',
i18n: function() {
return function(text, render) {
return render(i18n.t(text));
};
}
}
This solved the problem for me
I don't think Silent's answer really solves/explains the problem.
The real issue is you need to run Mustache twice (or use something else and then Mustache).
That is most i18n works as two step process like the following:
Render the i18n text with the given variables.
Render the HTML with the post rendered i18n text.
Option 1: Use Mustache partials
<p>{{> i18n.title}}</p>
{{#somelist}}{{> i18n.item}}{{/somelist}}
The data given to this mustache template might be:
{
"amount" : 10,
"somelist" : [ "description" : "poop" ]
}
Then you would store all your i18n templates/messages as a massive JSON object of mustache templates on the server:
Below is the "en" translations:
{
"title" : "You have {{amount}} fart(s) left",
"item" : "Smells like {{description}}"
}
Now there is a rather big problem with this approach in that Mustache has no logic so handling things like pluralization gets messy.
The other issue is that performance might be bad doing so many partial loads (maybe not).
Option 2: Let the Server's i18n do the work.
Another option is to let the server do the first pass of expansion (step 1).
Java does have lots of options for i18n expansion I assume other languages do as well.
Whats rather annoying about this solution is that you will have to load your model twice. Once with the regular model and second time with the expanded i18n templates. This is rather annoying as you will have to know exactly which i18n expansions/templates to expand and put in the model (otherwise you would have to expand all the i18n templates). In other words your going to get some nice violations of DRY.
One way around the previous problem is pre-processing the mustache templates.
My answer is based on developingo's. He's answer is very great I'll just add the possibility to use mustache tags in the message keycode. It is really needed if you want to be able the get messages according to the current mustache state or in loops
It's base on a simple double rendering
info.i18n = function(){
return function(text, render){
var code = render(text); //Render first to get all variable name codes set
var value = i18n.t(code)
return render(value); //then render the messages
}
}
Thus performances aren't hit because of mustache operating on a very small string.
Here a little example :
Json data :
array :
[
{ name : "banana"},
{ name : "cucomber" }
]
Mustache template :
{{#array}}
{{#i18n}}description_{{name}}{{/i18n}}
{{/array}}
Messages
description_banana = "{{name}} is yellow"
description_cucomber = "{{name}} is green"
The result is :
banana is yellow
cucomber is green
Plurals
[Edit] : As asked in the comment follows an example with pseudo-code of plural handling for english and french language. Its a very simple and not tested example but it gives you a hint.
description_banana = "{{#plurable}}a {{name}} is{{/plurable}} green" (Adjectives not getting "s" in plurals)
description_banana = "{{#plurable}}Une {{name}} est verte{{/plurable}}" (Adjectives getting an "s" in plural, so englobing the adjective as well)
info.plurable = function()
{
//Check if needs plural
//Parse each word with a space separation
//Add an s at the end of each word except ones from a map of common exceptions such as "a"=>"/*nothing*/", "is"=>"are" and for french "est"=>"sont", "une" => "des"
//This map/function is specific to each language and should be expanded at need.
}
This is quite simple and pretty straightforward.
First, you will need to add code to determine the Query String lang. For this, I use snippet taken from answer here.
function getParameterByName(name) {
var match = RegExp('[?&]' + name + '=([^&]*)')
.exec(window.location.search);
return match && decodeURIComponent(match[1].replace(/\+/g, ' '));
}
And then, I use jQuery to handle ajax and onReady state processing:
$(document).ready(function(){
var possibleLang = ['en', 'id'];
var currentLang = getParameterByName("lang");
console.log("parameter lang: " + currentLang);
console.log("possible lang: " + (jQuery.inArray(currentLang, possibleLang)));
if(jQuery.inArray(currentLang, possibleLang) > -1){
console.log("fetching AJAX");
var request = jQuery.ajax({
processData: false,
cache: false,
url: "data_" + currentLang + ".json"
});
console.log("done AJAX");
request.done(function(data){
console.log("got data: " + data);
var output = Mustache.render("<h1>{{title}}</h1><div id='content'>{{content}}</div>", data);
console.log("output: " + output);
$("#output").append(output);
});
request.fail(function(xhr, textStatus){
console.log("error: " + textStatus);
});
}
});
For this answer, I try to use simple JSON data:
{"title": "this is title", "content": "this is english content"}
Get this GIST for complete HTML answer.
Make sure to remember that other languages are significantly different from EN.
In FR and ES, adjectives come after the noun. "green beans" becomes "haricots verts" (beans green) in FR, so if you're plugging in variables, your translated templates must have the variables in reverse order. So for instance, printf won't work cuz the arguments can't change order. This is why you use named variables as in Option 1 above, and translated templates in whole sentences and paragraphs, rather than concatenating phrases.
Your data needs to also be translated, so the word 'poop', which came from data - somehow that has to be translated. Different languages do plurals differently, as does english, as in tooth/teeth, foot/feet, etc. EN also has glasses and pants that are always plural. Other languages similarly have exceptions and strange idoms. In the UK, IBM 'are' at the trade show whereas in in the US, IBM 'is' at the trade show. Russian has several different rules for plurals depending on if they are people, animals, long narrow objects, etc. In other countries, thousands separators are spaces, dots, or apostrophes, and in some cases don't work by 3 digits: 4 in Japan, inconsistently in India.
Be content with mediocre language support; it's just too much work.
And don't confuse changing language with changing country - Switzerland, Belgium and Canada also have FR speakers, not to mention Tahiti, Haiti and Chad. Austria speaks DE, Aruba speaks NL, and Macao speaks PT.

Image tag not closing with HTMLAgilityPack

Using the HTMLAgilityPack to write out a new image node, it seems to remove the closing tag of an image, e.g. should be but when you check outer html, has .
string strIMG = "<img src='" + imgPath + "' height='" + pubImg.Height + "px' width='" + pubImg.Width + "px' />";
HtmlNode newNode = HtmlNode.Create(strIMG);
This breaks xhtml.
Telling it to output XML as Micky suggests works, but if you have other reasons not to want XML, try this:
doc.OptionWriteEmptyNodes = true;
Edit 1:Here is how to fix an HTML Agilty Pack document to correctly display image (img) tags:
if (HtmlNode.ElementsFlags.ContainsKey("img"))
{ HtmlNode.ElementsFlags["img"] = HtmlElementFlag.Closed;}
else
{ HtmlNode.ElementsFlags.Add("img", HtmlElementFlag.Closed);}
replace "img" for any other tag to fix them as well (input, select, and option come up frequently). Repeat as needed. Keep in mind that this will produce rather than , because of the HAP bug preventing the "closed" and "empty" flags from being set simultaneously.
Source: Mike Bridge
Original answer:
Having just labored over solutions to this issue, and not finding any sufficient answers (doctype set properly, using Output as XML, Check Syntax, AutoCloseOnEnd, and Write Empty Node options), I was able to solve this with a dirty hack.
This will certainly not solve the issue outright for everyone, but for anyone returning their generated html/xml as a string (EG via a web service), the simple solution is to use fake tags that the agility pack doesn't know to break.
Once you have finished doing everything you need to do on your document, call the following method once for each tag giving you a headache (notable examples being option, input, and img). Immediately after, render your final string and do a simple replace for each tag prefixed with some string (in this case "Fix_", and return your string.
This is only marginally better in my opinion than the regex solution proposed in another question I cannot locate at the moment (something along the lines of )
private void fixHAPUnclosedTags(ref HtmlDocument doc, string tagName, bool hasInnerText = false)
{
HtmlNode tagReplacement = null;
foreach(var tag in doc.DocumentNode.SelectNodes("//"+tagName))
{
tagReplacement = HtmlTextNode.CreateNode("<fix_"+tagName+"></fix_"+tagName+">");
foreach(var attr in tag.Attributes)
{
tagReplacement.SetAttributeValue(attr.Name, attr.Value);
}
if(hasInnerText)//for option tags and other non-empty nodes, the next (text) node will be its inner HTML
{
tagReplacement.InnerHtml = tag.InnerHtml + tag.NextSibling.InnerHtml;
tag.NextSibling.Remove();
}
tag.ParentNode.ReplaceChild(tagReplacement, tag);
}
}
As a note, if I were a betting man I would guess that MikeBridge's answer above inadvertently identifies the source of this bug in the pack - something is causing the closed and empty flags to be mutually exclusive
Additionally, after a bit more digging, I don't appear to be the only one who has taken this approach:
HtmlAgilityPack Drops Option End Tags
Furthermore, in cases where you ONLY need non-empty elements, there is a very simple fix listed in that same question, as well as the HAP codeplex discussion here: This essentially sets the empty flag option listed in Mike Bridge's answer above permanently everywhere.
There is an option to turn on XML output that makes this issue go away.
var htmlDoc = new HtmlDocument();
htmlDoc.OptionOutputAsXml = true;
htmlDoc.LoadHtml(rawHtml);
This seems to be a bug with HtmlAgilityPack. There are many ways to reproduce this, for example:
Debug.WriteLine(HtmlNode.CreateNode("<img id=\"bla\"></img>").OuterHtml);
Outputs malformed HTML. Using the suggested fixes in the other answers does nothing.
HtmlDocument doc = new HtmlDocument();
doc.OptionOutputAsXml = true;
HtmlNode node = doc.CreateElement("x");
node.InnerHtml = "<img id=\"bla\"></img>";
doc.DocumentNode.AppendChild(node);
Debug.WriteLine(doc.DocumentNode.OuterHtml);
Produces malformed XML / XHTML like <x><img id="bla"></x>
I have created a issue in CodePlex for this.

Resources