Need help reading XML using LINQ - linq

I'm trying to bind the contents of the following file using LINQ, but I'm having issues with the syntax.
<metadefinition>
<page>
<name>home</name>
<metas>
<meta>
<metaname>
title
</metaname>
<metavalue>
Welcome Home
</metavalue>
</meta>
<meta>
<metaname>
description
</metaname>
<metavalue>
Welcome Home Description
</metavalue>
</meta>
</metas>
</page>
<page>
<name>results</name>
<metas>
<meta>
<metaname>
title
</metaname>
<metavalue>
Welcome to Results
</metavalue>
</meta>
</metas>
</page>
</metadefinition>
My query looks like this but as you can see it is missing the retrieval of the metas tag. How do I accomplish this?
var pages = from p in xmlDoc.Descendants(XName.Get("page"))
where p.Element("name").Value == pageName
select new MetaPage
{
Name = p.Element("name").Value,
MetaTags = p.Elements("metas").Select(m => new Tag { MetaName = m.Element("metaname").Value.ToString(),
MetaValue = m.Element("metacontent").Value.ToString()
}).ToList()
};

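If <metadefinition> is the root element, there is no need to iterate over all descendants of the document; reading the <page> children of the root is more direct. Note also that the <meta> elements sit inside <metas>, and that the value element is named metavalue, not metacontent.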
var pages = from p in xmlDoc.Root.Elements("page")
where p.Element("name").Value == pageName
select new MetaPage {
Name = p.Element("name").Value,
MetaTags = p.Element("metas").Elements("meta").Select(m=>new Tag{
MetaName = m.Element("metaname").Value.ToString(),
MetaValue = m.Element("metavalue").Value.ToString()
}).ToList()
};
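The elements in the sample file put their text on separate lines, so Value includes the surrounding line breaks. The following is a minimal sketch of the same query with trimmed values. It assumes C# 9 top-level statements, inlines an abbreviated copy of the XML, and uses anonymous types in place of MetaPage and Tag, whose definitions are not shown in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Abbreviated copy of the question's XML, inlined so the sketch is self-contained.
var xmlDoc = XDocument.Parse(@"
<metadefinition>
  <page>
    <name>home</name>
    <metas>
      <meta>
        <metaname>
          title
        </metaname>
        <metavalue>
          Welcome Home
        </metavalue>
      </meta>
    </metas>
  </page>
  <page>
    <name>results</name>
    <metas>
      <meta>
        <metaname>title</metaname>
        <metavalue>Welcome to Results</metavalue>
      </meta>
    </metas>
  </page>
</metadefinition>");

string pageName = "home"; // hypothetical input for this sketch

var pages = (from p in xmlDoc.Root.Elements("page")
             // The (string) cast yields null instead of throwing when <name> is missing.
             where (string)p.Element("name") == pageName
             select new
             {
                 Name = (string)p.Element("name"),
                 // Elements(...) on a sequence simply yields nothing if <metas> is absent.
                 MetaTags = p.Elements("metas").Elements("meta")
                     .Select(m => new
                     {
                         MetaName = ((string)m.Element("metaname"))?.Trim(),
                         MetaValue = ((string)m.Element("metavalue"))?.Trim()
                     })
                     .ToList()
             }).ToList();

foreach (var page in pages)
    foreach (var tag in page.MetaTags)
        Console.WriteLine($"{page.Name}: {tag.MetaName} = {tag.MetaValue}");
```

With pageName set to "home" this prints one line, "home: title = Welcome Home". Whether to trim is a judgment call; it only matters if the whitespace in the file is not significant.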

Related

Dynamic section name in EvoHtmlToPdf header

Is there a way in EvoHtmlToPdf to display the section/subsection of a page in the header/footer (i.e. the text of the "current" h1/h2 tag)?
With wkhtmltopdf and other tools, it is possible to replace special tags via JavaScript and the HTML header template (as described, for example, in "Dynamic section name in wicked_pdf header").
Unfortunately, such a solution does not seem to work with EvoHtmlToPdf.
Here's the HTML code of my header template:
<html id="headerFooterHtml">
<head>
<script>
function substHeaderFooter(){
var vars={};
var searchString = document.location.search;
var debugMessage = document.getElementById("showJavaScriptWasExecuted");
if (debugMessage)
debugMessage.textContent = "Search string: ["+ searchString + "]";
var search_list = searchString.substring(1).split('&');
for(var i in search_list){
var content=search_list[i].split('=',2);
vars[content[0]] = decodeQueryParam(content[1]);
}
var tags=['section','subsection'];
for(var i in tags){
var name = tags[i],
classElements = document.getElementsByClassName(name);
for(var j=0; j<classElements.length; ++j){
classElements[j].textContent = vars[name];
}
}
}
</script>
</head>
<body id="headerFooterBody" onload="substHeaderFooter()">
<div id="showJavaScriptWasExecuted"></div>
<div id="sections">{section} / {subsection}</div>
</body>
</html>
Resulting header in PDF
I already added the EvoHtmlToPdf PrepareRenderPdfPageDelegate event handler to my code (if that's the way I have to go) but I don't know how to access the section of the current page there...
Thanks in advance for your help!

Why does my header end up way outside the page with TuesPechkin?

I run a conversion like this:
document.Objects.Clear();
document.GlobalSettings.PaperSize = PaperKind.A4;
document.Objects.Add(new ObjectSettings
{
HtmlText = xml,
HeaderSettings = new HeaderSettings { HtmlUrl = headerPath, RightText = "[page]/[sitepages]", ContentSpacing = 10 },
FooterSettings = new FooterSettings { HtmlUrl = headerPath, RightText = "[page]/[sitepages]" },
});
and the HTML is visible in the footer, but in the header it ends up way outside the page. It looks like it tries to put the header on the previous page; that's how far outside it is.
OK, the answer was simple but not obvious. The header HTML is a bit more sensitive, so it needed a <!doctype html> at the top of the file.

Parse CData from XML in C#

I am trying to parse XML that has a CDATA section as the value of one of its nodes. My XML structure is as below.
<node1>
<node2>
<![CDATA[ <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here.. ]]>
</node2>
</node1>
My code is below. When I parse it, I get the response including the CDATA tag rather than the value inside the CDATA tag. Can you please help me fix my problem?
XDocument xmlDoc = XDocument.Parse(responseString);
XElement node1Element = xmlDoc.Descendants("node1").FirstOrDefault();
string cdataValue = node1Element.Element("node2").Value;
Actual Output: <![CDATA[ <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here.. ]]>
Expected Output: <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here..
I was not sure whether System.Xml.Linq.XDocument was causing the problem, so I tried the XmlDocument version below.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(responseString);
XmlNode node = xmlDoc.DocumentElement.SelectSingleNode(@"/node1/node2");
XmlNode childNode = node.ChildNodes[0];
if (childNode is XmlCDataSection)
{}
And my if condition evaluates to false. So it looks like there is something wrong with my XML and it is not actually a valid CDATA section? Please help me fix the problem.
Please let me know if you need more details.
What you're describing will never actually happen. Getting the Value of a node that contains cdata as a child will give you the contents of the cdata, the inner text. You should already be getting your expected output.
The only way you can get the actual cdata node is if you actually get the cdata node.
var cdata = node1Element.Element("node2").FirstNode;
I tried your code and the CDATA value is correct.
How do you fill your responseString?
static void Main(string[] args)
{
string responseString = "<node1>" +
"<node2>" +
"<![CDATA[ <!--###BREAK TYPE=\"TABLE\" ###--> <P><CENTER>... html goes here.. ]]>" +
"</node2>" +
"</node1>";
XDocument xmlDoc = XDocument.Parse(responseString);
XElement node1Element = xmlDoc.Descendants("node1").FirstOrDefault();
string cdataValue = node1Element.Element("node2").Value;
// output: <!--###BREAK TYPE="TABLE" ###--> <P><CENTER>... html goes here..
}
I solved this case as follows:
XDocument xdoc = XDocument.Parse(vm.Xml);
XNamespace cbc = @"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2";
var list2 =
(from el in xdoc.Descendants(cbc + "Description")
select el).FirstOrDefault();
var queryCDATAXML = (from eel in list2.DescendantNodes()
select eel.Parent.Value.Trim()).FirstOrDefault();
It was because StreamReader was escaping the HTML, so "<" was getting changed to "&lt;". Hence it was not recognized as a CDATA tag, and I had to unescape it first:
XDocument xmlDoc = XDocument.Parse(HttpUtility.HtmlDecode(responseString));
and that fixed it.
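The difference between a real CDATA section and escaped CDATA markup can be checked directly. The following is a minimal sketch, assuming C# 9 top-level statements and a shortened, made-up payload. It uses WebUtility.HtmlDecode from System.Net in place of HttpUtility.HtmlDecode so that it does not depend on System.Web.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// A real CDATA section: the parser creates an XCData node.
var real = XDocument.Parse("<node1><node2><![CDATA[<P>html</P>]]></node2></node1>");
XElement realNode2 = real.Root.Element("node2");
Console.WriteLine(realNode2.FirstNode is XCData);   // True
Console.WriteLine(realNode2.Value);                 // <P>html</P>

// The same markup after entity escaping: the parser now sees ordinary text.
string escaped = "<node1><node2>&lt;![CDATA[&lt;P&gt;html&lt;/P&gt;]]&gt;</node2></node1>";
XElement textNode2 = XDocument.Parse(escaped).Root.Element("node2");
Console.WriteLine(textNode2.FirstNode is XCData);   // False
Console.WriteLine(textNode2.Value);                 // <![CDATA[<P>html</P>]]>

// Decoding the entities before parsing restores the CDATA section.
XElement decodedNode2 = XDocument.Parse(WebUtility.HtmlDecode(escaped)).Root.Element("node2");
Console.WriteLine(decodedNode2.FirstNode is XCData); // True
Console.WriteLine(decodedNode2.Value);               // <P>html</P>
```

Decoding the whole response is only safe when nothing else in it relies on escaped entities; if it does, fixing the code that escapes the response in the first place is probably the better route.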

Watir - file_field.exists? comes up as false even though there is an upload form on the page

I am looking at a web page with an overlay that contains a Choose File/Browse button.
Below is a snippet from the page:
<form name = "form1" method = "post" action = "UploadPhoto.aspx?PhotoUploaderFor=1" id = "form1" enctype = "multipart/form-data">
<div>...</div>
<div class = "popup-form photoUploader">
<div class = "group data-row">
<label>...</label>
<input type ="file" name = "FileUpload1" id = "FileUpload1" class = "browse_file">
Watir has file_fields, which can handle file uploads.
I've tried this on a test site (www.tinypic.com) and the controls work fine there:
b.file_fields.exists?
will return true
However, on the page I'm looking at, with the snippet above,
b.file_fields.exists?
returns false
I thought it might be because of the form, so I also tried
b.form(:name => "form1").file_fields.exists?
Which also returns false
If I try to access the button itself directly, this also returns false:
b.element(:xpath => "//input[@name = 'FileUpload1']").exists?
Anyone have any ideas?
EDIT
The form is inside an iframe
<iframe id="Step1_Banner1_Popup_Photo_Photo_Iframe_PhotoUploader" class="photoUploaderFrame" scrolling="no" src="../MSReport3/UserControls/UploadPhoto.aspx?PhotoUploaderFor=1">
#document
<!DOCTYPE html PUBLIC "-//W3C//DTD CHTML 1.0 .....">
<html xmlns = "http://www.w3.org/1999/xhtml">
<head>...</head>
<body>
<code from above goes here>
NOTE: I'm posting the solution from the comments so that (hopefully) this question can be tagged as answered.
The overlay is within a frame, and it's necessary to specify the frame via method-chaining. For example:
b.frame(:id => "Step1_Banner1_Popup_Photo_Photo_Iframe_PhotoUploader").file_field.exists?

Parse HTML doc with HtmlAgilityPack-Xpath, RegExp

I am trying to parse an image URL from HTML with HtmlAgilityPack. The HTML document contains this img tag:
<a class="css_foto" href="" title="Fotka: MyKe015">
<span>
<img src="http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6"
width="176" height="216" alt="Fotka: MyKe015" />
</span>
</a>
I need the src attribute of this img tag, that is: http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6.
What I know:
The src attribute contains a URL that starts with http://213.215.107.125/fotky.
The URL has variable length, and the HTML document also contains other img tags whose URLs start with http://213.215.107.125/fotky.
I know the value of the alt attribute of the img tag (Fotka: MyKe015).
Any advice? I have tried many approaches, but nothing works well.
My latest attempt:
List<string> src;
var req = (HttpWebRequest)WebRequest.Create("http://pokec.azet.sk/myke015");
req.Method = "GET";
using (WebResponse odpoved = req.GetResponse())
{
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.Load(odpoved.GetResponseStream());
var nodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
src = new List<string>(nodes.Count);
if (nodes != null)
{
foreach (var node in nodes)
{
if (node.Id != null)
src.Add(node.Id);
}
}
}
Your XPath selects the img nodes, not the src attributes belonging to them.
Instead of this (which selects all img elements that have a src attribute):
var nodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
Use this (which selects the src attributes of all img elements):
var nodes = htmlDoc.DocumentNode.SelectNodes("//img/@src");
This XPath 1.0 expression selects the src attribute of the img whose alt attribute you already know:
//img[@alt='Fotka: MyKe015']/@src
