Html Agility Pack and triangular brackets in text

When I load certain HTML files with Html Agility Pack, some tags are left unclosed (judging by the InnerHtml and OuterHtml properties) when the text inside an HTML tag contains angle brackets, like this:
<span class='title'> Launching app results in an error "Activation of http://<server name="">/TemplateBuilder/?language=1033&locale=1033 resulted in exception"</span>
so as output I get
<span class='title'> Launching app results in an error "Activation of http://<server name="">/TemplateBuilder/?language=1033&locale=1033 resulted in exception"
Is there anything that can be done to preserve it? Because the closing </span> is missing from the output, the rest of the HTML is not displayed.
Thank you

This happens because your original HTML is invalid: angle brackets that are part of the text must be written as entities. It should be:
<span class='title'> Launching app results in an error "Activation of http://&lt;server name&gt;/TemplateBuilder/?language=1033&locale=1033 resulted in exception"</span>
Html Agility Pack cannot detect this kind of malformed HTML.
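One way to avoid the problem is to entity-encode the text before it is placed inside the markup, so the parser never sees a stray tag. The following is a minimal sketch in Python using the standard html module, not the Html Agility Pack API; the raw message text (with a literal <server name> placeholder) is an assumption made for illustration.

```python
from html import escape

# Raw text that should be displayed literally inside the <span>.
# The "<server name>" placeholder is an assumed example value.
message = ('Launching app results in an error "Activation of '
           'http://<server name>/TemplateBuilder/?language=1033&locale=1033 '
           'resulted in exception"')

# quote=False leaves the double quotes alone; <, > and & become entities.
safe_text = escape(message, quote=False)

# Only the text is escaped; the surrounding tag stays real markup.
markup = "<span class='title'> " + safe_text + "</span>"
print(markup)
```

With the brackets encoded as &lt; and &gt;, an HTML parser treats them as text, and the closing </span> is matched as expected.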

Related

Implementing target-counter in CssContentPropertyResolver

I am trying to add a table of contents with page numbers to a PDF I am generating using Freemarker -> HTML -> PDF. The HTML/CSS are below
ul.toc a::after {
content: " Page " target-counter(attr(href url), page)
}
<ul class="toc">
<li>First Section</li>
<li>Second Section</li>
</ul>
However when I ran iText7 on this I got this error:
Content property "" Page " target-counter(attr(href url), page)" is either invalid or uses unsupported function."
which comes from CssContentPropertyResolver.java.
Are there any intentions to eventually support this CSS property?
For any document that requires a numbered table of contents, we are forced to use iText5, which does support target-counter. It would be very beneficial if you could reimplement this functionality in iText7.

CKEditor moving br tags

I'm having a problem with CKEditor changing my original paragraph formatting with negative side effects.
I start with a basic paragraph loaded into CKEditor using setData():
<p><span style="font-size:50px">My Text</span></p>
... more document content ...
In the editor, I move the cursor to the end of the phrase "My Text" and press enter (with config.enterMode=CKEDITOR.ENTER_BR setting enabled). Inspecting the markup inside the editor I now see:
<p><span style="font-size:50px">My Text<br><br></span></p>
... more document content ...
Then, when I call getData() to pull the contents from the editor and save the document to a database, the HTML extracted by getData() looks like this:
<p><span style="font-size:50px">My Text</span><br> </p>
... more document content ...
This is a problem because while editing, the <br> tag was inside the <span> and was subject to the 50px font size style. The user saw a 50px blank line before the next piece of document content. After saving the HTML to a database and reloading later the <br> tag is now outside the <span> and is not subject to the 50px font sizing and the blank line appears much smaller than before.
The round trip fidelity of the text formatting is not preserved and the user is frustrated by the results.
Can someone help me understand the results I'm seeing with <br> tags being reformatted and moved around during the editing life cycle, and how I might fix this problem?
Using CKEditor v4.4.1

CKEditor HTML Autocorrection Issue

I have a few lines of HTML in my database that I want to edit in CKEditor. But when I open the content in the editor, the HTML breaks: the markup gets rearranged.
Below is the HTML which is in database:
<span class="sec_title">
<h1><span>Web</span> Engineering</h1>
<hr>
</span>
And when I open it in CKEditor, the HTML looks like this:
<h1><span class="sec_title"><span>Web</span> Engineering</span></h1>
<hr />
Can someone please help me? I tried config.allowedContent = true; but it does not stop CKEditor from making these modifications.
CKEditor works only with valid HTML, and <h1> is not valid content inside a <span>. Quoting CKEditor basic concepts:
CKEditor is not a tool that will let you input invalid HTML code. CKEditor abides by W3C standards so it will modify code if it is invalid.

Dynamically change the font size of a single character in SSRS

Using SSRS in Visual Studio 2012 I currently have the following expression in the report header.
=ReportItems!FirmName.Value
This correctly pulls the Firm name such as Client1, Client2, Client3 etc... from the body of the report.
However, if ReportItems!FirmID = 600, I need the font size of the first character in the firm name to be larger than the other characters.
This is because a particular client has a logo where the first character is larger than the others.
I tried the following expression which I know is wrong but might illustrate what I'm trying to do.
=IIF(ReportItems!FirmID.Value = 600,LEFT(ReportItems!FirmName.Value,1), "18pt", ReportItems!FirmName.Value))
So if Client3 has FirmID 600, the result should look like this:
<html>
<body lang=EN-US style='tab-interval:.5in'>
<div class=WordSection1>
<p class=MsoNormalCxSpFirst>Client1<o:p></o:p></p>
<p class=MsoNormalCxSpMiddle>Client2<o:p></o:p></p>
<p class=MsoNormalCxSpLast><span style='font-size:20.0pt;line-height:115%'>C</span>lient3</p>
</div>
</body>
</html>
I tried Ian's suggestion, which I thought was working, but I cannot get it to work yet. It works in the report body, but there are additional complications. The font size change needs to be in the report header, which requires referencing the report body via ReportItems!. You cannot reference more than one ReportItems! in the report header, or you get an error like the following:
The Value Expression for the textrun Textbox12 refers to more than one report item. An expression in a page header or footer can refer to only one report item.
Yet there is one more problem and I should have clarified this in my original entry. The actual client name is like this.
Client & Associates
Both of the first letters need to be larger, but not the ampersand between them.
<html>
<body lang=EN-US style='tab-interval:.5in'>
<div class=WordSection1>
<p class=MsoNormal><span style='font-size:22.0pt;line-height:115%'>C</span>lient
& <span style='font-size:18.0pt;line-height:115%'>A</span>ssociates</p>
</div>
</body>
</html>
You can do this by creating a couple of text placeholders within the same textbox, splitting the text between them, and applying a font-size expression to the first placeholder only.
See Formatting Text and Placeholders for a good overview.
Take a simple example with your data, displayed in a simple table.
Note there are two <<Expr>> values in the last column - I have added another placeholder in this textbox.
I have split FirmName between these placeholders; in the first:
=Left(Fields!FirmName.Value, 1)
and in the second:
=Right(Fields!FirmName.Value, Len(Fields!FirmName.Value) - 1)
Even though the text is split across two expressions, it looks fine when the report is run.
Since each placeholder can have its own formatting, we can apply an expression like the following to FontSize on the first placeholder only:
=IIf(Fields!FirmID.Value = 600, "15pt", "10pt")
That is, the font size of the first letter is increased for firm 600 only, which gives the required result.
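The logic behind the two placeholders can be sketched outside SSRS. The Python below is only an illustration of the split and the conditional size, not report code; the function name split_runs and its parameters are hypothetical, while the sizes and the FirmID value 600 come from the expressions above.

```python
def split_runs(firm_name, firm_id, big="15pt", normal="10pt"):
    """Mirror the two placeholders: the first character, then the remainder.

    Returns a list of (text, font_size) pairs, one per placeholder.
    """
    # Same condition as =IIf(Fields!FirmID.Value = 600, "15pt", "10pt")
    first_size = big if firm_id == 600 else normal
    # Same split as Left(name, 1) and Right(name, Len(name) - 1)
    return [(firm_name[:1], first_size), (firm_name[1:], normal)]


print(split_runs("Client3", 600))  # [('C', '15pt'), ('lient3', '10pt')]
print(split_runs("Client1", 100))  # [('C', '10pt'), ('lient1', '10pt')]
```

Only the first run ever changes size, which is why the formatting expression is applied to the first placeholder alone.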

How do HtmlAgilityPack extract text from html node whose class attribute appended dynamically

Dear friends, I want to extract the text 平均3.6 星 from this code segment, excerpted from amazon.cn.
<div class="content"><ul>
<li><b>用户评分:</b>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
<a>
<span class="swSprite s_star_3_5 " title="平均3.6 星">
<span>平均3.6 星</span>
</span>
</a>
My problem is that the span's class value "s_star_3_5 " varies with the customer rating and is appended dynamically. I tried doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and //span[@class='swSprite s_star_3_5 '], but the result is either an error or not what I want.
Any suggestions?
First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file and checking whether the markup you are getting is really that code. Sometimes you start parsing a website with HtmlAgilityPack, but the very first problem is that you are not receiving the expected HTML at all. Maybe you are getting a 404 error, a redirection, etc.
I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.
That was the issue in the following questions:
Selecting nodes that have an attribute with spaces using HTMLAgilityPack
XPath Query Problem using HTML Agility Pack
If that doesn't help, post the HTML code and I'll help you ;)
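This works for me:
// Requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(myHtml); // myHtml: path to the HTML file (use LoadHtml for an HTML string)
// Match the span whose class attribute begins with "swSprite"
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
Console.WriteLine("Text=" + node.InnerText.Trim());
and outputs
Text=平均3.6 星
Note that I use the XPath starts-with function.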
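A prefix match depends on swSprite being the first class in the attribute. A check on individual class tokens is less sensitive to order and trailing spaces. The sketch below shows that idea in Python with the standard html.parser module; it is not HtmlAgilityPack code, the class name SpriteTitleFinder is hypothetical, and the fragment is a shortened copy of the markup from the question.

```python
from html.parser import HTMLParser


class SpriteTitleFinder(HTMLParser):
    """Collects the title of every <span> whose class list contains 'swSprite'."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attr_map = dict(attrs)
        # Split on whitespace so "swSprite s_star_3_5 " yields two clean tokens.
        classes = (attr_map.get("class") or "").split()
        if "swSprite" in classes:
            self.titles.append(attr_map.get("title"))


html_fragment = '''
<span class="crAvgStars" style="white-space:no-wrap;">
  <a><span class="swSprite s_star_3_5 " title="平均3.6 星"><span>平均3.6 星</span></span></a>
</span>
'''

finder = SpriteTitleFinder()
finder.feed(html_fragment)
print(finder.titles)
```

The rating-specific token (s_star_3_5 here) can change freely, because only the presence of the swSprite token is tested.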
