Parse CData from XML in C# - linq

I am trying to parse XML in which one node's value is a CDATA section. My XML structure is as below.
<node1>
<node2>
<![CDATA[ <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here.. ]]>
</node2>
</node1>
My code is as below. When I parse it, I get the value including the CDATA markers rather than just the content inside the CDATA section. Can you please help me fix this?
XDocument xmlDoc = XDocument.Parse(responseString);
XElement node1Element = xmlDoc.Descendants("node1").FirstOrDefault();
string cdataValue = node1Element.Element("node2").Value;
Actual Output: <![CDATA[ <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here.. ]]>
Expected Output: <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here..
I was not sure whether System.Xml.Linq.XDocument was causing the problem, so I tried the XmlDocument version as below.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(responseString);
XmlNode node = xmlDoc.DocumentElement.SelectSingleNode(@"/node1/node2");
XmlNode childNode = node.ChildNodes[0];
if (childNode is XmlCDataSection)
{}
And the if condition evaluates to false. So it looks like there is something wrong with my XML and it is not actually a valid CDATA section? Please help me fix the problem.
Please let me know if you need more details.

What you're describing should not happen with well-formed input. Getting the Value of an element that has a CDATA section as a child gives you the contents of the CDATA section, that is, the inner text. You should already be getting your expected output.
The only way to get the CDATA node itself is to select that node explicitly:
var cdata = node1Element.Element("node2").FirstNode;
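To make the difference concrete, here is a minimal sketch, assuming a shortened version of the XML from the question with no whitespace around the CDATA section. It shows that Value already returns the unwrapped text, while FirstNode returns the XCData node.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Shortened sample input (an assumption; the question elides the real HTML).
string xml = "<node1><node2><![CDATA[ <P>html goes here</P> ]]></node2></node1>";
XElement node2 = XDocument.Parse(xml).Descendants("node2").First();

// Value concatenates the element's text content, so the CDATA markers are gone.
string text = node2.Value;

// FirstNode is the CDATA node itself; the cast yields null for any other node type.
var cdata = node2.FirstNode as XCData;

Console.WriteLine(text);
Console.WriteLine(cdata != null);
```

If Value still contains the literal "<![CDATA[" text, the input most likely does not contain a real CDATA section.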

I tried your code and the CDATA value is correct... ?!?
How do you fill your responseString? :-)
static void Main(string[] args)
{
string responseString = "<node1>" +
"<node2>" +
"<![CDATA[ <!--###BREAK TYPE=\"TABLE\" ###--> <P><CENTER>... html goes here.. ]]>" +
"</node2>" +
"</node1>";
XDocument xmlDoc = XDocument.Parse(responseString);
XElement node1Element = xmlDoc.Descendants("node1").FirstOrDefault();
string cdataValue = node1Element.Element("node2").Value;
// output: <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here..
}

I solved this case as follows:
XDocument xdoc = XDocument.Parse(vm.Xml);
XNamespace cbc = @"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2";
var list2 =
(from el in xdoc.Descendants(cbc + "Description")
select el).FirstOrDefault();
var queryCDATAXML = (from eel in list2.DescendantNodes()
select eel.Parent.Value.Trim()).FirstOrDefault();

It was because the HTML in the response read through StreamReader was escaped, so "<" had been changed to "&lt;". Hence the CDATA markers were not recognized as a CDATA section and were treated as plain text. So I had to unescape first:
XDocument xmlDoc = XDocument.Parse(HttpUtility.HtmlDecode(responseString));
and that fixed it.
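The effect can be reproduced with a small sketch. The escaped input below is an assumption about what the response looked like (the thread does not show it), and WebUtility.HtmlDecode from System.Net stands in for HttpUtility.HtmlDecode so the snippet does not need System.Web.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Assumed shape of the escaped response: the CDATA markers arrived as entities.
string escaped = "<node1><node2>&lt;![CDATA[ &lt;P&gt;html&lt;/P&gt; ]]&gt;</node2></node1>";

// Parsed as-is, the markers are ordinary text, so Value still contains them.
string before = XDocument.Parse(escaped).Root.Element("node2").Value;

// After decoding, the parser sees a real CDATA section and Value is its content.
string decoded = WebUtility.HtmlDecode(escaped);
string after = XDocument.Parse(decoded).Root.Element("node2").Value;

Console.WriteLine(before);
Console.WriteLine(after);
```

Decoding the whole document is a blunt tool: if other parts of the XML legitimately contain entities such as &amp;lt; or &amp;amp;, decoding them can make the document malformed, so fixing the escaping at its source is usually safer.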

Related

Convert HTML string to HTML and display in DIV

I have an HTML string saved in a database, and I want to display it in a DIV as rendered HTML using JavaScript.
Example:
<p>Dear Friends</p> <h1> You have got invitation </h1>
I have used DOMParser like this
parser = new DOMParser();
htmlDoc = parser.parseFromString(document.getElementById(controlID).innerHTML, "text/html");
console.log(htmlDoc);
document.getElementById("emailBodyArea").innerHTML = parser
but as the result I see [htmlObject]
You don't need a DOMParser at all. Just set the innerHTML of emailBodyArea to the innerHTML of the controlID element, or to any HTML string.
document.getElementById("emailBodyArea").innerHTML = document.getElementById(controlID).innerHTML;

Need help reading XML using LINQ

I'm trying to bind the contents of the following file using LINQ but having issues with the syntax.
<metadefinition>
<page>
<name>home</name>
<metas>
<meta>
<metaname>
title
</metaname>
<metavalue>
Welcome Home
</metavalue>
</meta>
<meta>
<metaname>
description
</metaname>
<metavalue>
Welcome Home Description
</metavalue>
</meta>
</metas>
</page>
<page>
<name>results</name>
<metas>
<meta>
<metaname>
title
</metaname>
<metavalue>
Welcome to Results
</metavalue>
</meta>
</metas>
</page>
</metadefinition>
My query looks like this but as you can see it is missing the retrieval of the metas tag. How do I accomplish this?
var pages = from p in xmlDoc.Descendants(XName.Get("page"))
where p.Element("name").Value == pageName
select new MetaPage
{
Name = p.Element("name").Value,
MetaTags = p.Elements("metas").Select(m => new Tag { MetaName = m.Element("metaname").Value.ToString(),
MetaValue = m.Element("metacontent").Value.ToString()
}).ToList()
};
If <metadefinition> is a root element, then there is no need for iterating over all descendants of the document, that's way too inefficient.
var pages = from p in xmlDoc.Root.Elements("page")
where p.Element("name").Value == pageName
select new MetaPage {
Name = p.Element("name").Value,
MetaTags = p.Element("metas").Elements("meta").Select(m=>new Tag{
MetaName = m.Element("metaname").Value.ToString(),
MetaValue = m.Element("metavalue").Value.ToString()
}).ToList()
};
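A self-contained sketch of the same query follows, assuming a trimmed copy of the file above. Anonymous types stand in for the MetaPage and Tag classes, which the question does not show, and the values are trimmed because the elements in the file contain surrounding whitespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Trimmed copy of the sample file (one page only; an assumption for brevity).
string xml = @"<metadefinition>
  <page>
    <name>home</name>
    <metas>
      <meta><metaname> title </metaname><metavalue> Welcome Home </metavalue></meta>
      <meta><metaname> description </metaname><metavalue> Welcome Home Description </metavalue></meta>
    </metas>
  </page>
</metadefinition>";

XDocument xmlDoc = XDocument.Parse(xml);
string pageName = "home";

// Casting an XElement to string returns null for a missing element instead of throwing.
var tags = xmlDoc.Root.Elements("page")
    .Where(p => (string)p.Element("name") == pageName)
    .SelectMany(p => p.Elements("metas").Elements("meta"))
    .Select(m => new
    {
        MetaName = ((string)m.Element("metaname") ?? "").Trim(),
        MetaValue = ((string)m.Element("metavalue") ?? "").Trim()
    })
    .ToList();

foreach (var t in tags)
    Console.WriteLine($"{t.MetaName}: {t.MetaValue}");
```

Using Elements("metas").Elements("meta") rather than Element("metas").Elements("meta") is a small design choice: a page without a metas element then yields an empty list instead of a NullReferenceException.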

Html tags in xml (rss)

I followed http://damieng.com/blog/2010/04/26/creating-rss-feeds-in-asp-net-mvc to create an RSS feed for my blog. Everything is fine except the HTML tags in the XML document. Typical problem:
&lt;br /&gt;
instead of
<br />
Normally I would use
@Html.Raw()
or
MvcHtmlString()
But how can I fix it in XML document created with SyndicationFeed?
Edit:
Ok, I'm starting to think that my question is pointless.
Should I just leave my RSS as it is?
Inside the XML element, you can wrap the text that contains your HTML in a CDATA section:
<![CDATA[
your html
]]>
I don't recommend doing that, however.
Wrap the text inside a CDATA section:
var xml= '<person><name><![CDATA[<h1>john smith</h1>]]></name></person>',
xmlDoc = $.parseXML( xml ),
$xml = $( xmlDoc ),
$title = $xml.find( "name" );
$($title.text()).appendTo("body");

Html Agility Pack: Setting an HtmlNode's Attribute Value isn't reflected in the HtmlDocument

In Html Agility Pack, when I set an attribute of an HtmlNode, should I see this in the HtmlDocument from which the node was selected?
Let's say that htmlDocument is an HtmlDocument. The simplified code looks like this:
HtmlNode documentNode = htmlDocument.DocumentNode;
HtmlNodeCollection nodeCollection = documentNode.SelectNodes(someXPath);
foreach(var node in nodeCollection)
if(SomeCondition(node))
node.SetAttributeValue("class","something");
Now, I see the class attribute of node change, but I don't see this change reflected in the htmlDocument's HTML.
Actually it was a case of ProgrammerTooStupidException :(
I used a MyHtmlPage class with an Html property and a Document property.
_html = theHtml;
_htmlDocument = new HtmlDocument();
_htmlDocument.LoadHtml(theHtml);
_documentNode = _htmlDocument.DocumentNode;
Now, of course, manipulating the DocumentNode had no effect on the _html value.
Posting this reply to clear the name of HAP.

Using HtmlAgilityPack to modify hyperlink tags

How can I use HtmlAgilityPack to replace all hyperlinks, e.g.:
<a href="url">Link</a>
so that only the value of the href attribute (the URL) is left?
Is this possible?
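Dim doc As New HtmlDocument()
doc.LoadHtml(MyHtml)
' SelectNodes returns Nothing when no node matches, so guard before looping.
Dim links As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
If links IsNot Nothing Then
For Each link As HtmlNode In links
' Replace the whole <a> element in the source string with its href value.
MyHtml = MyHtml.Replace(link.OuterHtml, link.GetAttributeValue("href", ""))
Next
End If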

Resources