extract text without <div> and <p> with xpath - xpath

<tr><td class=term>1st param</td>
<td>PUTIN
<div class='info-icon'>
<a href='#' onmouseover='show_pd(351);' onmouseout='hide_pd(351);' id='info-icon-351'></a>
</div>
<div id='pd-351' style='display: none; position: absolute;'>
<b>СПРАВКА</b>
<br /><br />
<P align=justify><NOBR><STRONG>ABS</STRONG></NOBR>bla-bla-bla text</P>
<P align=justify>bla-bla-bla text 2</P>
<P align=justify>bla-bla-bla text 3</P>
<P align=justify>bla-bla-bla text 4</P>
</div>
</td>
I need extract only "PUTIN".
Now I'm on
//td[#class="term"][contains(text(), "1st param")]/following-sibling::td/[not(self::p)]

With some adjustments to your XML following XPath
//td[#class="term"][contains(text(), "1st param")]/following-sibling::td/node()[1]
has the output PUTIN
Adjustments were to change <td class=term> into <td class="term"> and all <P align=justify> into <P align="justify"> (maybe not necessary for your settings but was required for the XPath evaluator I just used).

Related

MPDF - Laravel , multi footer

I am using : https://github.com/mccarlosen/laravel-mpdf/issues to generate pdf ,
all ok except when i use multi footer ,
now the footer show only on page 1 (MyFooter1) and 2 (MyFooter2).
I need to show staring page2 on all pages (MyFooter2) .I am finding the correct syntax , below my code
I used <sethtmlpagefooter name="MyFooter2" page="ON" value="1" write="true" /> to show the footer , it show only the footer only on the page where it exist .
<!DOCTYPE html>
<html>
<head>
<title>My title</title>
</head>
#include('pdf.css2')
<body>
<htmlpagefooter name="MyFooter1" style="display:none">
<div class="print-footer">
<div class="footer-section-1" >
<div style="width:20% ; float:left; padding-right: 5px;">
My company name
</div>
<div style="width:20% ;float:left; padding-right: 5px;">
<div> <b>T</b><span> : +98 123 123</span></div>
<div> <b>T/F</b><span>: +98 123 123</span></div>
<div> <b>Email</b><span>: cdoom#cc.co</span></div>
</div>
<div style="width:35% ; float:left; padding-right: 5px;">
<div> <b>Head Office</b></div>
<div> <span>test test,</span> </div>
<div> <span>Road, test, </span></div>
<div> <span> USA</span></div>
<div> <span>P.O. Box 211265 </span></div>
</div>
<div style="width:11% ; float:left; padding-right: 5px;">
<div><b>Branches</b></div>
<div><span>B1</span></div>
<div><span>B2</span></div>
</div>
<div style=" float:left; padding-right: 20px;padding-top: 20px;">
WE ARE THE BEST
</div>
</div>
</div>
</htmlpagefooter>
<htmlpagefooter name="MyFooter2" style="display:none">
<div class="print-footer">
My company name
</div>
</htmlpagefooter>
<div class="page1" >
<div style=" text-align: center ;" >
<div class="logo">
#include('pdf.company logo')
</div>
<div class="title" >
<span class="span-title" > <b>INDIVIDUAL OFFER </b> </span>
<div class="div-name" ><span class="span-name" >client </span></div>
<div>
<div class="info" style="width:50% ;float:left; padding-right: 5px;">DATE: December 2, 2020</div>
<div class="info" style="float:right; padding-right: 5px;">REFERENCE NUMBER: vd 123 xl</div>
</div>
</div>
</div>
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<sethtmlpagefooter name="MyFooter1" value="1" />
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<sethtmlpagefooter name="MyFooter2" page="ON" value="1" write="true" />
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
<div class="chapter2">
<div class="message">
<p >
Dear Ms. ,
</p>
<p>
We believe that the following offer is the first step to a long and happy relationship between us. </p>
<p>
Allow us to put your worries aside with our special programs.
</p>
<p>
Best Regards,
</p>
</div>
</div>
</body>
</html>
in CSS , I used this :
#page {
footer: html_MyFooter2;
margin-bottom: 170px;
}
#page :first {
footer: html_MyFooter1;
}
and in HTML I removed :
<sethtmlpagefooter name="MyFooter1" value="1" />
<sethtmlpagefooter name="MyFooter2" value="1" />
This fix the issue and help me ,
refer to :
Mpdf different header for first page
3rd answer (by : Ningappa )

How to concatenate th:text and static content with html tags in Thymeleaf?

<div th:switch="${data.totalPercentage}">
<td th:case="100" th:text="${data.totalPercentage}"/>
<td th:case="*" th:text="${data.totalPercentage + '*'}" class="alert alert-warning font-weight-bold"/>
</div>
Above expression works but I can not concatenate html tag with th:text. I would like to replace * with fontawesome flag icon <i class="fas fa-flag"></i>. Any suggestion?
No real reasons to concatenate here, you should be thinking in terms of HTML.
<div th:switch="${data.totalPercentage}">
<td th:case="100" th:text="${data.totalPercentage}"/>
<td th:case="*" class="alert alert-warning font-weight-bold">
<span th:text="${data.totalPercentage}" /> <i class="fas fa-flag"></i>
</td>
</div>

Please create XPATH for the table element

Could you please create complete XPATH for the following web element
Address 1 (PO Box/Rural Route is ok)
it would be great if you give me how to create the XPATH as well for any HTML or XML code.
HTML code:
<!DOCTYPE html>
<html>
<head>
<body id="myProfile">
<div id="header">
<div class="wrapperMenu">
<div class="wrapperBodyCreditCenter">
<div id="bodyAll2">
<div>
<div class="fsmWideHero">
<div style="display: block; margin-bottom: 10px;">
<div class="fsmSubHeader"> Please update any information to maintain a clean and current profile. Once the correct information is entered, simply click on the save changes button to update your profile. </div>
<span class="validateError"> </span>
<script src="https://staging.privacyguard.com/ClientScript.aspx?script=FormControl%2fdisableOnSubmit.js&utc=633470607600000000" type="text/javascript">
<form action="/secure/MyProfile.aspx" autocomplete="OFF" name="PGUpdateProfileFormControl" onsubmit="disableSubmission(this)" method="post">
<input type="hidden" value="pgupdateprofile" name="cf">
<input id="__FLOWEVENT" type="hidden">
<input type="hidden" value="5sXZAFNB4Co/JF70jyZ1HHJdIMso9YP2jGmMsyuLQCKC2J0lOg2XDVX2FcLJBS2WMnYD55d7UNLPVXW0yHa6f4WVjmWwwvT0KiwnAY+6v0xpWJZ9fuKe5F2h25Z+aZZwdoEpJtmnYVyvq+znPaYwyQqTV0d3M6bWeBKMt6woDqdtRzUEooyojAzyaMiYFyqosVe5cGarbQWvFZiUJqH3wGRawchSjnjzh/Amy2F8YHZlJFJVWYAQ4PUx8E9GQeShHsCMs2Fg3ToiUemtIpgXaCGMnCby+sCEnRW4SPA3P9/Et2VO/YXsCYSwkfsNsqSAWMnYoicNwJy9ALYny1uAUxgBrv4dELuz+St9CcUrB0w=" name="__FlowState__">
<input type="hidden" value="364" name="__fspayload">
<div id="profileFormHeader"> Personal Information </div>
<div id="profileFormBody">
If you wish to change your name, contact the FreeScoresAndMore customer service center at 1-877-787-9002.
<p>
<p> </p>
<table cellspacing="0" cellpadding="2" border="0" summary="">
<tbody>
<tr>
<td width="220" align="left">
<label for="mpadr1">Address 1 (PO Box/Rural Route is ok)</label>
<br>
<input id="mpadr1" type="text" value="123 new address" name="UpdateProfile_Address1" maxlength="25" size="28">
</td>
<td align="left" colspan="2">
<td width="200" valign="top" bgcolor="#FFFFFF" align="left" style="background-color:#EFF0F2;" rowspan="3">
</tr>
<tr>
<tr>
</tbody>
</table>
</div>
<div id="profileFormHeader"> Membership Information </div>
<div id="profileFormBody">
<div id="profileFormHeader"> Subscription Preference </div>
<div id="profileFormBody">
<div style="text-align:center;margin-top:10px;">
</form>
</div>
<input type="hidden" value="7iuanXFUppuCMU7T/6LZi09GsdZ2dCYBLeTL/Wfbclc=" name="authToken">
</div>
</div>
<div class="clear" style="padding-bottom:20px;"></div>
<div id="subFooter">
<div id="mainFooter">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.5.2/jquery.min.js" type="text/javascript">
<script type="text/javascript">
<script type="text/javascript">
</body>
</html>
Thanks In Advance
I don't think there's a magic-ally way of doing things. There's a lot of guides/tutorials out there where you can start learning.
Nonetheless, here's the xpath code that you need //label[#for='mpadr1'] or //label[contains(.,'Address 1')]

Removing the duplicate id in jstl

I am using jstl. In my page there is a loop which creates div's with different ids but i have seen there are some of the div( after loop completion) having the same id which(div) i want to remove or say want to prevent them to be created. I want to achieve it inside the loop itself through jstl. Please help.
Please find the code segment-
<c:forEach var="menuType" items="${menuTypes}" begin="0" varStatus="mCount">
<c:set var="mcnt" value="${mCount.count}"></c:set>
<div class="window" id="window${mcnt}" onmouseout="hideBub('${mcnt}');" style="top:${mcnt*200}px">
<div id="inner-cont${mcnt}" class="inner-cont" align="center" onmouseover="showBubble('${mcnt}');">
${menuType.functionName}(<span id="totRecord"></span>)
<div class="bubble" id="bubble${mcnt}" style="margin-left:50px;" onmouseout="hideBub('${mcnt}');">
<a href="#">
<span>[+] Add New
</span>
</a>
<hr>
<div class="box">
<div class="box-content">
<table width="100%" class="hovertable">
<tbody>
<tr><td></td></tr>
</tbody>
</table>
</div>
</div>
</div>
<div id="dispRec${mcnt}" class="dispRec"></div>
</div>
<div align="center" class="btnContainer" style="width:90px;">
<span class="linkButton">View</span>
<span class="linkButton">Edit</span>
<span class="linkButton">Delete</span>
</div>
<div align="center" class="btnContainer" style="width:47px;float: right">
<span class="linkButton" style="display: inline-block;padding: 1px;margin-right: 2px">
Analysis
</span>
</div>
</div>
</c:forEach>
</div>

Html Agility Pack: make code look neat

Can I use Html Agility Pack to make the output look nicely indented, unnecessary white space stripped?
HAP is not going to give you the results you are after.
Try using a .net wrapper for HtmlTidy such as the one found here
using System;
using System.IO;
using System.Net;
using Mark.Tidy;
namespace CleanupHtml
{
/// <summary>
/// http://markbeaton.com/SoftwareInfo.aspx?ID=81a0ecd0-c41c-48da-8a39-f10c8aa3f931
/// </summary>
internal class Program
{
private static void Main(string[] args)
{
string html =
new WebClient().DownloadString(
"http://stackoverflow.com/questions/2593147/html-agility-pack-make-code-look-neat/2610903#2610903");
using (Document doc = new Document(html))
{
doc.ShowWarnings = false;
doc.Quiet = true;
doc.OutputXhtml = true;
doc.OutputXml = true;
doc.IndentBlockElements = AutoBool.Yes;
doc.IndentAttributes = false;
doc.IndentCdata = true;
doc.AddVerticalSpace = false;
doc.WrapAt = 120;
doc.CleanAndRepair();
string output = doc.Save();
Console.WriteLine(output);
File.WriteAllText("output.htm", output);
}
}
}
}
Results:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Windows (vers 14 October 2008), see www.w3.org" />
<title>
Html Agility Pack: make code look neat - Stack Overflow
</title>
<link rel="stylesheet" href="http://sstatic.net/so/all.css?v=6638" type="text/css" />
<link rel="shortcut icon" href="http://sstatic.net/so/favicon.ico" />
<link rel="apple-touch-icon" href="http://sstatic.net/so/apple-touch-icon.png" />
<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href=
"http://sstatic.net/so/opensearch.xml" />
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js">
</script>
<script type="text/javascript" src="http://sstatic.net/so/js/master.js?v=6523">
</script>
<script type="text/javascript">
//<![CDATA[
var imagePath='http://sstatic.net/so/img/';
//]]>
</script>
<link rel="canonical" href="http://stackoverflow.com/questions/2593147/html-agility-pack-make-code-look-neat" />
<link rel="alternate" type="application/atom+xml" title=
"Feed for question 'Html Agility Pack: make code look neat'" href="/feeds/question/2593147" />
<script src="http://sstatic.net/so/js/question.js?v=6714" type="text/javascript">
</script>
<script type="text/javascript">
//<![CDATA[
var fkey = "b00609a1a5f2966a687eca3f84e4dd64";
$(function() {
vote.init(2593147);
comments.init();
styleCode();
});
//]]>
</script>
</head>
<body>
<noscript>
<div id="noscript-padding"></div></noscript>
<div id="notify-container"></div><script type="text/javascript">
//<![CDATA[
$(function() { notify.showFirstTime(); });
//]]>
</script>
<div class="container">
<div id="header">
<div id="topbar">
<div id="hlinks">
<a href=
"/users/login?returnurl=%2fquestions%2f2593147%2fhtml-agility-pack-make-code-look-neat%2f2610903">login</a>
<span class="lsep">|</span> careers <span class=
"lsep">|</span> about <span class="lsep">|</span> faq
</div>
<div id="hsearch">
<form id="search" action="/search" method="get" name="search">
<div>
<input name="q" class="textbox" tabindex="1" onfocus="if (this.value=='search') this.value = ''" type=
"text" maxlength="80" size="28" value="search" />
</div>
</form>
</div>
</div><br class="cbt" />
<div id="hlogo">
<img src="http://sstatic.net/so/img/logo.png" width="250" height="61" alt="Stack Overflow" />
</div>
<div id="hmenus">
<div class="nav">
<ul>
<li class="youarehere">
Questions
</li>
<li>
Tags
</li>
<li>
Users
</li>
<li>
Badges
</li>
<li>
Unanswered
</li>
</ul>
</div>
<div class="nav" style="float:right">
<ul>
<li style="margin-right:0px">
Ask Question
</li>
</ul>
</div>
</div>
</div>
<div id="content">
<div id="question-header">
<h2>
<a href="/questions/2593147/html-agility-pack-make-code-look-neat" class="question-hyperlink">Html Agility
Pack: make code look neat</a>
</h2>
</div>
<div id="mainbar">
<div id="question" class="">
<div class="everyonelovesstackoverflow">
<script type="text/javascript">
//<![CDATA[
document.write('<s'+'cript lang' + 'uage="jav' + 'ascript" src="http://ads.stackoverflow.com/a.aspx?ZoneID=3&Task=Get&IFR=False&PageID=52405&SiteID=1&Random=' + (+new Date()) + '&Keywords=htmlagilitypack">');
document.write('</'+'scr'+'ipt>');
//]]>
</script> <noscript>
<div>
<a href=
"http://ads.stackoverflow.com/a.aspx?ZoneID=3&Task=Click&Mode=HTML&SiteID=1&PageID=52405">
<img src=
"http://ads.stackoverflow.com/a.aspx?ZoneID=3&Task=Get&Mode=HTML&SiteID=1&PageID=52405"
alt="" /></a>
</div></noscript>
</div>
<table>
<tr>
<td class="votecell">
<div class="vote">
<input type="hidden" value="2593147" /> <img class="vote-up" src=
"http://sstatic.net/so/img/vote-arrow-up.png" width="40" height="25" alt="vote up" title=
"This question is useful and clear (click again to undo)" /> <span class="vote-count-post">1</span>
<img class="vote-down" src="http://sstatic.net/so/img/vote-arrow-down.png" width="40" height="25"
alt="vote down" title="This question is unclear or not useful (click again to undo)" /> <img class=
"vote-favorite" src="http://sstatic.net/so/img/vote-favorite-off.png" width="32" height="31" alt=
"star" title="This is a favorite question (click again to undo)" />
<div class="favoritecount"></div>
</div>
</td>
<td>
<div>
<div class="post-text">
<p>
Can I use Html Agility Pack to make the output look nicely indented, unnecessary white space
stripped?
</p>
</div>
<div class="post-taglist">
<a href="/questions/tagged/htmlagilitypack" class="post-tag" title=
"show questions tagged 'htmlagilitypack'" rel="tag">htmlagilitypack</a>
</div>
<table class="fw">
<tr>
<td class="vt">
<div class="post-menu">
<a id="flag-post-2593147" title="flag this post for serious problems" name=
"flag-post-2593147">flag</a>
</div>
</td>
<td class="post-signature owner">
<div class="user-info">
<div class="user-action-time">
asked <span title="2010-04-07 14:13:47Z" class="relativetime">2 days ago</span>
</div>
<div class="user-gravatar32">
<a href="/users/51795/illdev"><img src=
"http://www.gravatar.com/avatar/52dc0db2cdacc6e9769d074a37466317?s=32&d=identicon&r=PG"
height="32" width="32" alt="" /></a>
</div>
<div class="user-details">
illdev<br />
<span class="reputation-score" title="reputation score">53</span><span title=
"5 bronze badges"><span class="badge3">●</span><span class=
"badgecount">5</span></span>
</div>
</div><br class="cbt" />
<div class="accept-rate cool" title=
"this user has accepted an answer for 2 of 4 eligible questions">
50% accept rate
</div>
</td>
</tr>
</table>
</div>
</td>
</tr>
<tr>
<td class="votecell"></td>
<td>
<div id="comments-2593147" class="comments">
<table>
<tbody>
<tr id="comment-2600849" class="comment">
<td></td>
<td class="comment-text">
<div>
what output? From where? some more details perhaps? – <a href=
"/users/97614/sam-holder" title="1868" class="comment-user">Sam Holder</a> <span class=
"comment-date"><span title="2010-04-07 14:16:41Z">2 days ago</span></span>
</div>
</td>
</tr>
<tr id="comment-2600851" class="comment">
<td></td>
<td class="comment-text">
<div>
<i>(reference)</i> <a href="http://htmlagilitypack.codeplex.com/Wikipage" rel=
"nofollow">htmlagilitypack.codeplex.com/Wikipage</a> – <a href=
"/users/208809/gordon" title="16497" class="comment-user">Gordon</a> <span class=
"comment-date"><span title="2010-04-07 14:16:55Z">2 days ago</span></span>
</div>
</td>
</tr>
<tr id="comment-2624419" class="comment">
<td></td>
<td class="comment-text">
<div>
output = html code output – <a href="/users/51795/illdev" title="53" class=
"comment-user owner">illdev</a> <span class="comment-date"><span title=
"2010-04-10 13:14:42Z">12 secs ago</span></span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</table>
</div>
<div id="answers">
<a name="tab-top" id="tab-top"></a>
<div id="answers-header">
<div id="subheader">
<h2>
2 Answers
</h2>
<div id="tabs">
<a href="/questions/2593147?tab=oldest#tab-top" title=
"Answers in the order they were given">oldest</a> <a href="/questions/2593147?tab=newest#tab-top"
title="Most recent answers first">newest</a> <a class="youarehere" href=
"/questions/2593147?tab=votes#tab-top" title="Answers with the most votes first">votes</a>
</div>
</div>
</div><a name="2610845"></a>
<div id="answer-2610845" class="answer">
<table>
<tr>
<td class="votecell">
<div class="vote">
<input type="hidden" value="2610845" /> <img class="vote-up" src=
"http://sstatic.net/so/img/vote-arrow-up.png" width="40" height="25" alt="vote up" title=
"This answer is useful (click again to undo)" /> <span class="vote-count-post">0</span>
<img class="vote-down" src="http://sstatic.net/so/img/vote-arrow-down.png" width="40" height="25"
alt="vote down" title="This answer is not useful (click again to undo)" />
</div>
</td>
<td>
<div class="post-text">
<p>
A variation of this question has been answered recently
</p>
<ul>
<li>
<a href=
"http://stackoverflow.com/questions/2490765/which-is-the-best-html-tidy-pack-is-there-any-option-in-html-agility-pack-to-mak/2507673#2507673">
http://stackoverflow.com/questions/2490765/which-is-the-best-html-tidy-pack-is-there-any-option-in-html-agility-pack-to-mak/2507673#2507673</a>
</li>
</ul>
<p>
Basically the outcome of this was that while you <strong>can</strong> use HtmlAgilityPack to
clean it up a bit by using the fix nested tags.
</p>
<p>
The best solution is to use something called Tidy which is an application that was originally
created by some developers at w3c and then made open source. Its the engine that powers the w3c
validator as well.
</p>
<p>
This article covers how to use it but you had to sign up (free) to view it:
</p>
<ul>
<li>
<a href="http://www.devx.com/dotnet/Article/20505/1763/" rel=
"nofollow">http://www.devx.com/dotnet/Article/20505/1763/</a>
</li>
</ul>
<p>
It seems like a legit article but its funny because nobody else seems to have covered this
topic in the last six years...
</p>
</div>
<table class="fw">
<tr>
<td class="vt">
<div class="post-menu">
<a href="/questions/2593147/html-agility-pack-make-code-look-neat/2610845#2610845" title=
"permalink to this answer">link</a><span class="lsep">|</span><a id="flag-post-2610845"
title="flag this post for serious problems" name="flag-post-2610845">flag</a>
</div>
</td>
<td align="right" class="post-signature">
<div class="user-info">
<div class="user-action-time">
answered <span title="2010-04-09 20:55:18Z" class="relativetime">16 hours ago</span>
</div>
<div class="user-gravatar32">
<a href="/users/156388/rtpharry"><img src=
"http://www.gravatar.com/avatar/6811db2b37e824fdf6c5c4fcdddd4146?s=32&d=identicon&r=PG"
height="32" width="32" alt="" /></a>
</div>
<div class="user-details">
rtpHarry<br />
<span class="reputation-score" title="reputation score">88</span><span title=
"6 bronze badges"><span class="badge3">●</span><span class=
"badgecount">6</span></span>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="votecell"></td>
<td>
<div id="comments-2610845" class="comments dno">
<table>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</table>
</div>
<div class="everyonelovesstackoverflow">
<script type="text/javascript">
//<![CDATA[
document.write('<s'+'cript lang' + 'uage="jav' + 'ascript" src="http://ads.stackoverflow.com/a.aspx?ZoneID=14&Task=Get&IFR=False&PageID=52405&SiteID=1&Random=' + (+new Date()) + '&Keywords=htmlagilitypack">');
document.write('</'+'scr'+'ipt>');
//]]>
</script> <noscript>
<div>
<a href=
"http://ads.stackoverflow.com/a.aspx?ZoneID=14&Task=Click&Mode=HTML&SiteID=1&PageID=52405">
<img src=
"http://ads.stackoverflow.com/a.aspx?ZoneID=14&Task=Get&Mode=HTML&SiteID=1&PageID=52405"
alt="" /></a>
</div></noscript>
</div><a name="2610903"></a>
<div id="answer-2610903" class="answer">
<table>
<tr>
<td class="votecell">
<div class="vote">
<input type="hidden" value="2610903" /> <img class="vote-up" src=
"http://sstatic.net/so/img/vote-arrow-up.png" width="40" height="25" alt="vote up" title=
"This answer is useful (click again to undo)" /> <span class="vote-count-post">0</span>
<img class="vote-down" src="http://sstatic.net/so/img/vote-arrow-down.png" width="40" height="25"
alt="vote down" title="This answer is not useful (click again to undo)" />
</div>
</td>
<td>
<div class="post-text">
<p>
Output as XHTML and run that through an <a href=
"http://msdn.microsoft.com/en-us/library/system.xml.xmltextwriter.indentation.aspx" rel=
"nofollow">XmlTextWriter</a>
</p>
</div>
<table class="fw">
<tr>
<td class="vt">
<div class="post-menu">
<a href="/questions/2593147/html-agility-pack-make-code-look-neat/2610903#2610903" title=
"permalink to this answer">link</a><span class="lsep">|</span><a id="flag-post-2610903"
title="flag this post for serious problems" name="flag-post-2610903">flag</a>
</div>
</td>
<td align="right" class="post-signature">
<div class="user-info">
<div class="user-action-time">
answered <span title="2010-04-09 21:02:34Z" class="relativetime">16 hours ago</span>
</div>
<div class="user-gravatar32">
<a href="/users/242897/sky-sanders"><img src=
"http://www.gravatar.com/avatar/df4a7fbd8a054fd6193ca0ee62952f1f?s=32&d=identicon&r=PG"
height="32" width="32" alt="" /></a>
</div>
<div class="user-details">
Sky Sanders<br />
<span class="reputation-score" title="reputation score">4,014</span><span title=
"2 silver badges"><span class="badge2">●</span><span class=
"badgecount">2</span></span><span title="14 bronze badges"><span class=
"badge3">●</span><span class="badgecount">14</span></span>
</div>
</div>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="votecell"></td>
<td>
<div id="comments-2610903" class="comments dno">
<table>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</table>
</div>
<form id="post-form" action="/questions/2593147/answer/submit" method="post" name="post-form">
<h2 class="space">
Your Answer
</h2><script src="http://sstatic.net/so/Js/wmd.js?v=6016" type="text/javascript">
</script> <script type="text/javascript">
//<![CDATA[
$(function() {
editorReady(1, heartbeat.answers);
});
//]]>
</script>
<div id="post-editor">
<div id="wmd-container">
<div id="wmd-button-bar"></div>
<textarea id="wmd-input" name="post-text" cols="92" rows="15" tabindex="101">
</textarea>
</div>
<div class="community-option">
<input id="communitymode" name="communitymode" type="checkbox" /> <label for="communitymode" title=
"community owned posts do not generate any reputation for the owner, have a lower reputation barrier for collaborative editing, and show only a revision history instead of a signature block">
community wiki</label>
</div>
<div id="wmd-preview"></div>
<div id="edit-block">
<input id="fkey" name="fkey" type="hidden" value="b00609a1a5f2966a687eca3f84e4dd64" /> <input id=
"author" name="author" type="text" />
</div>
</div>
<div class="form-item">
<table>
<tr>
<td class="vm">
<label for="openid_identifier">OpenID Login</label> <input id="openid_identifier" name=
"openid_identifier" class="openid-identifer" type="text" size="40" maxlength="200" value=""
tabindex="104" />
<div class="form-item-info">
Get an OpenID
</div>
</td>
<td class="orcell">
<div class="orword">
or
</div>
<div class="orline"></div>
</td>
<td class="vm">
<div>
<label for="display-name">Name</label> <input id="display-name" name="display-name" type="text"
size="30" maxlength="30" value="" tabindex="105" />
</div>
<div>
<label for="m-address">Email</label> <input id="m-address" name="m-address" type="text" size=
"40" maxlength="100" value="" tabindex="106" /> <span class="edit-field-overlay" style=
"color:#999; font-weight:normal">never shown</span>
</div>
<div>
<label for="home-page">Home Page</label> <input id="home-page" name="home-page" type="text"
size="40" maxlength="200" value="" tabindex="107" />
</div>
</td>
</tr>
</table>
</div>
<div class="form-submit cbt">
<input id="submit-button" type="submit" value="Post Your Answer" tabindex="110" />
</div>
</form>
<h2 class="space">
Not the answer you're looking for? Browse other questions tagged <a href=
"/questions/tagged/htmlagilitypack" class="post-tag" title="show questions tagged 'htmlagilitypack'" rel=
"tag">htmlagilitypack</a> or ask your own question.
</h2>
</div><img src="/posts/2593147/ivc/1707" class="dno" alt="" />
</div>
A variation of this question has been answered recently
Which is the best HTML tidy pack? Is there any option in HTML agility pack to make HTML webpage tidy?
Basically the outcome of this was that while you can use HtmlAgilityPack to clean it up a bit by using the fix nested tags.
The best solution is to use something called Tidy which is an application that was originally created by some developers at w3c and then made open source. Its the engine that powers the w3c validator as well.
This article covers how to use it but you had to sign up (free) to view it:
http://www.devx.com/dotnet/Article/20505/1763/
It seems like a legit article but its funny because nobody else seems to have covered this topic in the last six years...
See a similar question here: HtmlAgilityPack: how to create indented HTML? and my answer:
No, and it's a "by design" choice.
There is a big difference between XML
(or XHTML, which is XML, not HTML)
where - most of the times -
whitespaces are no specific meaning,
and HTML.
This is not a so minor improvement, as
changing whitespaces can change the
way some browsers render a given HTML
chunk, especially malformed HTML (that
is in general well handled by the
library). And The Html Agility Pack
was designed to minimize the way the
HTML is rendered, not the way the
markup is written.
I'm not saying it's not feasible or
plain impossible. Obviously you can
convert to XML and voilà (and you
could write an extension method to
make this easier) but the rendered
output may be different, in the
general case.

Resources