I have the following HTML:
<cr:checkboxes name="name1">
<cr:checkbox label="Checkbox 1" />
<cr:checkbox label="Checkbox 2" />
<cr:checkbox label="Checkbox 3" />
</cr:checkboxes>
I am using Html Agility Pack to load the HTML:
var document = new HtmlDocument();
document.LoadHtml(htmlString);
// Select all nodes named `cr:checkboxes`
document.DocumentNode.SelectNodes("//cr:checkboxes");
While selecting, I get this exception:
System.Xml.XPath.XPathException: 'Namespace Manager or XsltContext
needed. This query has a prefix, variable, or user-defined function.'
Typically, in any other XML document, I would solve this by using XmlDocument and registering the namespace with a namespace manager.
How can I select all nodes named cr:checkboxes?
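This XPath avoids the prefix problem, because name() compares the element's qualified name as a plain string, so no namespace manager is needed:
var nodes = document.DocumentNode.SelectNodes("//*[name()='cr:checkboxes']");
I couldn't find a way to use an XmlNamespaceManager with HtmlAgilityPack.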
I am just learning Html Agility Pack and would like to extract a couple of pieces of data from a website. I have very little XPath and Html Agility Pack experience, and I am working on a class project to compare Lowe's and Home Depot prices for a few items.
I want to store the item name and price in strings. The HTML source contains 25 products; one segment of it is posted below.
From that segment I want to end up with string data_price = "14.97"; and string item = "Leaktite 5-Gal. Blue Bucket (Pack of 3)";
Below is a portion of the source code I am working with:
<div class="pod-inner">
<div class="productlist plp-pod__compare">
<div class="checkbox-btn js-podclick-analytics" data-podaction="compare">
<input type="checkbox" data-img="https://images.homedepot-static.com/productImages/8c1c50a0-e17c-4624-9e9e-35653052c1ce/svn/leaktite-paint-buckets-lids-209334-64_400_compressed.jpg" data-uom=" /package" data-price="$14.97" data-title="Leaktite 5-Gal. Blue Bucket (Pack of 3)" value="203924937" id="compare203924937" name="product" autocomplete="off" class="checkbox-btn__input">
So far I have:
HtmlDocument doc = new HtmlDocument();
string home_bucket_url="https://www.homedepot.com/s/5%2520gallon%2520bucket?NCNI-5";
WebClient client = new WebClient();
string home_bucket_raw = client.DownloadString(home_bucket_url);
var findclasses = doc.DocumentNode.Descendants("input type").Where(d => d.Attributes.Contains("checkbox"));
foreach (var x in findclasses)
{
Console.WriteLine(x.ToString());
}
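First debug to make sure you have actually selected the node you need. As posted, doc is never loaded (call doc.LoadHtml(home_bucket_raw) first), and "input type" is not an element name: select Descendants("input") and filter on GetAttributeValue("type", "") == "checkbox". Once x is the right node, simply read the attribute values you need. Something like this:
string data_price = x.Attributes["data-price"].Value.TrimStart('$');
string item = x.Attributes["data-title"].Value;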
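The select-then-read-attributes idea can also be sketched without Html Agility Pack. The following is a minimal illustration using only Python's standard `html.parser`; the `ProductCollector` class and `extract_products` helper are hypothetical names, and the snippet is a shortened copy of the markup from the question, not the live page.

```python
from html.parser import HTMLParser

# Shortened copy of the markup posted in the question.
SNIPPET = '''
<div class="pod-inner">
<div class="checkbox-btn js-podclick-analytics" data-podaction="compare">
<input type="checkbox" data-uom=" /package" data-price="$14.97"
       data-title="Leaktite 5-Gal. Blue Bucket (Pack of 3)"
       value="203924937" id="compare203924937" name="product"
       class="checkbox-btn__input">
</div>
</div>
'''


class ProductCollector(HTMLParser):
    """Collects (title, price) pairs from checkbox inputs."""

    def __init__(self):
        super().__init__()
        self.products = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Select by element name first, then filter on the type attribute.
        if tag != "input" or attributes.get("type") != "checkbox":
            return
        price = attributes.get("data-price")
        title = attributes.get("data-title")
        if price is None or title is None:
            return
        # Drop the leading currency sign so only the number remains.
        self.products.append((title, price.lstrip("$")))


def extract_products(html):
    collector = ProductCollector()
    collector.feed(html)
    collector.close()
    return collector.products


products = extract_products(SNIPPET)
```

The point of the sketch is the order of operations: match the element name, filter on `type`, and only then read `data-price` and `data-title`.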
I am trying to include an HTML view in an XML view, which should not be a problem. Unfortunately, the content of the HTML view is not added to the page, and no error is thrown.
In the XML-View I have a placeholder for the HTML-View:
<html:div
id="helptext">
</html:div>
In the controller of the XML-View I instantiate the HTML-View and add it to the placeholder:
var oController = sap.ui.controller("dividendgrowthtools.view.textviews.dividendcomparehelpDE");
var oTextView = sap.ui.view({
viewName: "dividendgrowthtools.view.textviews.dividendcomparehelpDE",
controller: oController,
type: sap.ui.core.mvc.ViewType.HTML
});
var oHelpText = this.getView().byId("helptext");
oTextView.placeAt(oHelpText.sId);
That's the content of the HTML-View:
<template data-controller-name="dividendgrowthtools.view.textviews.dividendcomparehelpDE">
<p>This is a test.</p>
</template>
Does anybody have an idea what the problem could be?
You need to be aware of what this.getView().byId() does.
SAPUI5 generates a different ID from the one you specified in the XML view ("helptext"), because it prefixes the ID with the ID of the view.
So you need to pass the generated ID. Be aware that these IDs can change when you refactor the XML structure.
I would recommend not using HTML elements in SAPUI5.
Some helpful links:
https://scn.sap.com/thread/3551589
https://plnkr.co/edit/wDBpQuxIWd0WGOoyulN0?p=preview (made by me)
There is text I want to mine at this URL:
http://www.mefik.co.il/provider.asp?provider_id=10757
I'm looking for the element with the class 'big_obj_px_news_page'.
I have tried all kinds of XPath options without success.
Any help?
I suggest you install Firefox with Firebug and FirePath to validate your XPath expressions. Yours were close, but not quite right.
//div[@class='big_obj_px_news_page']
// or, if this div may have more class names:
//div[contains(@class, 'big_obj_px_news_page')]
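The difference between the exact match and the contains() form can be illustrated with a small sketch. It uses Python's standard `xml.etree.ElementTree` rather than Html Agility Pack, and the markup is a made-up stand-in, not the real page. ElementTree supports only a subset of XPath without contains(), so the substring test is written as a Python filter here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; the real page is not reproduced here.
PAGE = """
<html><body>
  <div class="big_obj_px_news_page">only one class</div>
  <div class="wide big_obj_px_news_page">two classes</div>
  <div class="other">unrelated</div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Equivalent of //div[@class='big_obj_px_news_page']: the whole attribute
# value must equal the string, so the div with two classes is missed.
exact = root.findall(".//div[@class='big_obj_px_news_page']")

# Equivalent in spirit to //div[contains(@class, 'big_obj_px_news_page')]:
# a substring test on the attribute value.
containing = [
    div for div in root.iter("div")
    if "big_obj_px_news_page" in div.get("class", "")
]
```

With this input, the exact form finds one div and the substring form finds two, which is why the second XPath is the safer choice when an element may carry several class names.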
I created a unit test with the following code:
using System;
using System.IO;
using HtmlAgilityPack;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Xml;
namespace HtmlAgilityPackTests
{
[TestClass]
public class UnitTest1
{
[TestMethod]
public void TestMethod1()
{
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(File.ReadAllText(AppDomain.CurrentDomain.BaseDirectory + "\\test.html"));
var item = doc.DocumentNode.SelectNodes("//*[contains(@class, 'big_obj_px_news_page')]");
Assert.IsNotNull(item);
}
}
}
This test passes with the exact HTML from the page provided. In your code you wrote var item = doc.DocumentNode.SelectNodes(Xpath). Are you passing the exact XPath string above, or are you trying to use an XPath object?
If you're using an XPath object, it could be that you are setting it up incorrectly. The only other possibility I see is that you are not loading your HTML correctly. In the unit test code above, "test.html" contains the full HTML source of the page you provided and resides in the same directory as the C# source code. In the test.html file properties window in Visual Studio, I've set "Copy to Output Directory" to "Copy if newer". Its build action is "Content".
Perhaps if you describe how you're loading your HTML, we can be of further assistance.
I followed http://damieng.com/blog/2010/04/26/creating-rss-feeds-in-asp-net-mvc to create an RSS feed for my blog. Everything is fine except for the HTML tags in the XML document. The typical problem is that I get
&lt;br /&gt;
instead of
<br />
Normally I would use
@Html.Raw()
or
MvcHtmlString()
But how can I fix it in an XML document created with SyndicationFeed?
Edit:
Ok, I'm starting to think that my question is pointless.
Should I just leave my RSS as it is?
Within the XML element, you can wrap the text containing your HTML in a CDATA section:
<![CDATA[
your html
]]>
I don't recommend doing that, however.
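To see why the escaped form is usually harmless, here is a minimal sketch using Python's standard `xml.etree.ElementTree` (not SyndicationFeed). The `item` and `description` element names are only illustrative. It shows that escaped text and a CDATA section both decode to the same HTML string once the XML is parsed.

```python
import xml.etree.ElementTree as ET

html_body = "First line<br />Second line"

# Build a minimal, hypothetical feed item holding the HTML as plain text.
item = ET.Element("item")
description = ET.SubElement(item, "description")
description.text = html_body

# Serializing escapes the markup; this is what appears in the raw feed.
serialized = ET.tostring(item, encoding="unicode")

# Parsing the feed again restores the original HTML string.
restored = ET.fromstring(serialized).findtext("description")

# The CDATA form of the same item parses to the same string.
cdata_form = (
    "<item><description><![CDATA[First line<br />Second line]]>"
    "</description></item>"
)
from_cdata = ET.fromstring(cdata_form).findtext("description")
```

Under that reading, the `&lt;br /&gt;` seen in the raw feed is just the serialized form of the text, and a consumer that parses the XML gets the markup back either way.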
Wrap the text inside a CDATA section:
var xml= '<person><name><![CDATA[<h1>john smith</h1>]]></name></person>',
xmlDoc = $.parseXML( xml ),
$xml = $( xmlDoc ),
$title = $xml.find( "name" );
$($title.text()).appendTo("body");
In Html Agility Pack, when I set an attribute of an HtmlNode, should I see this in the HtmlDocument from which the node was selected?
Let's say that htmlDocument is an HtmlDocument. The simplified code looks like this:
HtmlNode documentNode = htmlDocument.DocumentNode;
HtmlNodeCollection nodeCollection = documentNode.SelectNodes(someXPath);
foreach(var node in nodeCollection)
if(SomeCondition(node))
node.SetAttributeValue("class","something");
Now, I see the class attribute of node change, but I don't see this change reflected in the htmlDocument's HTML.
Actually, it was a case of ProgrammerTooStupidException :(
I used a MyHtmlPage class with an Html property and a DocumentNode property:
_html = theHtml;
_htmlDocument = new HtmlDocument();
_htmlDocument.LoadHtml(theHtml);
_documentNode = _htmlDocument.DocumentNode;
Now, of course, manipulating the DocumentNode had no effect on the _html value.
Posting this reply to clear the name of HAP.
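The pitfall generalizes beyond Html Agility Pack: a string captured before parsing is a snapshot, while the parsed tree is a separate, mutable object. A minimal sketch of this in Python's standard `xml.etree.ElementTree`, where the `Page` class is a hypothetical stand-in for the MyHtmlPage class described above:

```python
import xml.etree.ElementTree as ET


class Page:
    """Hypothetical stand-in for the MyHtmlPage class described above."""

    def __init__(self, markup):
        self.markup = markup                # snapshot of the input string
        self.root = ET.fromstring(markup)   # separate, mutable tree

    def current_markup(self):
        # Serialize the tree on demand so that edits are always reflected.
        return ET.tostring(self.root, encoding="unicode")


page = Page("<div><p>hello</p></div>")
for node in page.root.iter("p"):
    node.set("class", "something")

stale = page.markup            # still the original string
fresh = page.current_markup()  # includes the new attribute
```

In Html Agility Pack the analogous fix would presumably be to read the markup back from the document (for example via `DocumentNode.OuterHtml`) instead of returning the cached input string.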