HTML Agility Pack - Get all links of a class

I want to get all the links that have a certain class.
An example of the HTML is
<tr>
<td>
<a class="dn-index-link" href="/dailynotes/symbol/659/-1/e-mini-sp500-june-2013">
ES M3
</a>
</td>
<td>
<a href="/dailynotes/symbol/659/-1/e-mini-sp500-june-2013">
E-mini S&P500 June 2013
</a>
</td>
</tr>
If I want to get all the links that have the class
class="dn-index-link"
what would be my XPath and HTML Agility code?
Thanks,
Will.

Code like this in a console application prints the value of the href attribute for every a node (at any level of the document) whose class attribute equals 'dn-index-link':
// Requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load("mytest.htm");
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@class='dn-index-link']");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        Console.WriteLine("node:" + node.GetAttributeValue("href", null));
    }
}
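For comparison, the same attribute predicate can be tried outside HTML Agility Pack. The following is a minimal Python sketch, not part of the original answer. It assumes the fragment is well-formed XML (so the & in the link text is written as &amp;) and relies on the limited XPath subset of the standard-library xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# The table row from the question, made well-formed so an XML parser accepts it.
ROW = """<tr>
  <td><a class="dn-index-link" href="/dailynotes/symbol/659/-1/e-mini-sp500-june-2013">ES M3</a></td>
  <td><a href="/dailynotes/symbol/659/-1/e-mini-sp500-june-2013">E-mini S&amp;P500 June 2013</a></td>
</tr>"""

root = ET.fromstring(ROW)

# [@class='...'] compares the whole attribute value, so an element with
# class="dn-index-link other" would not match this predicate.
links = root.findall(".//a[@class='dn-index-link']")
hrefs = [a.get("href") for a in links]
print(hrefs)
```

Only the first link carries the class, so only its href is collected.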

JMeter WebDriver Sampler - How to Select an Element from a table

I'm using the JMeter WebDriver Sampler and I have a table where each row has a Delete and an Edit button.
The layout of the HTML table is as follows:
<tbody id=table_id>
<tr>
<td> name </td>
<td>
<a class="some random text" href="edit.php/20" role="button">edit </a>
<a class="some random text" onclick="delete20"> Delete </a>
</td>
</tr>
<tr>
<td> name2 </td>
<td>
<a class="some random text" href="edit.php/21" role="button">edit </a>
<a class="some random text" onclick="delete21"> Delete </a>
</td>
</tr>
I'm unsure how to tell the webdriver to click a button based on the product name without an ID to findByElementID. I was thinking maybe I can parse through the table but I'm unsure what the steps are to take that route.
Any advice would be much appreciated!
Thank you!
Identify the product by its name, i.e. name or name2, using the text() function:
//td[text()=' name2 ']
Locate its following sibling:
//td[text()=' name2 ']/following-sibling::td
Locate the button you want to click, i.e. "edit" or "delete":
//td[text()=' name2 ']/following-sibling::td/a[text()=' Delete ']
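The lookup in the steps above (find the row by product name, then the button in the same row) can be sketched outside JMeter as well. This is a minimal Python illustration under assumptions: it parses a well-formed copy of the table with the standard-library ElementTree module, which does not support the following-sibling axis, so the row is walked in plain Python instead. The helper name find_button is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the table from the question (assumed markup).
TABLE = """<tbody id="table_id">
  <tr>
    <td> name </td>
    <td>
      <a class="some random text" href="edit.php/20" role="button">edit </a>
      <a class="some random text" onclick="delete20"> Delete </a>
    </td>
  </tr>
  <tr>
    <td> name2 </td>
    <td>
      <a class="some random text" href="edit.php/21" role="button">edit </a>
      <a class="some random text" onclick="delete21"> Delete </a>
    </td>
  </tr>
</tbody>"""


def find_button(root, product, label):
    """Return the <a> labelled `label` in the row whose first cell is `product`."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Compare trimmed text so the padding spaces in the markup do not matter.
        if cells and (cells[0].text or "").strip() == product:
            for link in row.iter("a"):
                if (link.text or "").strip() == label:
                    return link
    return None


root = ET.fromstring(TABLE)
button = find_button(root, "name2", "Delete")
print(button.get("onclick"))
```

Trimming the text before comparing avoids depending on the exact spaces around the names, which the text()=' name2 ' predicates require.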

How to get between two br tags in xpath?

I have a table with td like this
<td>
<span> Washington US <br>98101 Times Square</span>
</td>
I can get all the elements in the page, but I need to get those two values separately. If that isn't possible I would like to somehow get 98101 Times Square
I have tried doing something like string(//tr[3]//td[2])/ but all I get is the two text joined together.
You can select the text child nodes of the span element with span/text(). Assuming your posted path selects the td containing the span you want, use //tr[3]//td[2]/span/text().
Here is a sample:
$html = <<<EOD
<html>
<body>
<table>
<tr>
<td>1</td>
</tr>
<tr>
<td>2</td>
</tr>
<tr>
<td>3,1</td>
<td>
<span> Washington US <br>98101 Times Square</span>
</td>
</tr>
</table>
</body>
</html>
EOD;
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//tr[3]//td[2]/span/text()');
foreach ($textNodes as $text) {
echo $text->textContent . "\n";
}
Outputs
Washington US
98101 Times Square
Try
td/span/node()[1]
and
td/span/node()[3]
Or
td/span/text()[1]
td/span/text()[2]
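The same split can be seen in Python's standard-library ElementTree, which stores text nodes differently: the text before the first child is the parent's text, and the text after a child is that child's tail. The sketch below is an illustration only and assumes the cell is written as well-formed XML (<br/> instead of <br>).

```python
import xml.etree.ElementTree as ET

# The cell from the question, with <br/> self-closed so it parses as XML.
cell = ET.fromstring("<td><span> Washington US <br/>98101 Times Square</span></td>")
span = cell.find("span")

# span.text is the text before <br/>; each child's tail is the text after it.
parts = [span.text] + [child.tail for child in span]
parts = [p.strip() for p in parts if p and p.strip()]
print(parts)
```

The two entries of parts correspond to text()[1] and text()[2] in the XPath expressions above.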

How to find <span> Element within <tr> by XPath

My HTML Code looks like this:
<html>
<body>
<div>
</div>
<div>
<table>
<tbody id=a>
<tr>
<td>
<div>
<span>
some Text
</span>
</div>
</td>
</tr>
<tr>
<td>
<div>
<span>
some Text2
</span>
</div>
</td>
</tr>
<tr>
<td>
<div>
<span>
some Text3
</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</body>
I'm trying to select each of the span elements by their text. I'm able to select the tbody by id. I tried this:
tbody.FindElement(By.XPath(String.Format(".//span[contains(text(), {0}))]", &var)));
(var = somex0020Text)
but this always returns the first <span> element in my table.
I also tried:
tbody.FindElements(By.XPath(String.Format(".//span[contains(text(), {0}))]", &var)));
which returned a list containing every single <span> element in my table, and I don't know why.
I also don't understand why
tbody.FindElement(By.XPath(String.Format(".//span[text() = {0})]", &var)));
throws an element-not-found exception, when the contains() version returns a <span> element with just the same text.
I tried this XPath:
.//span[contains(text(),'some Text')]
It selects all three span elements, so to pick one I had to refer to the parent row:
for the 1st span: .//tbody[@id='a']//tr[1]//span[contains(text(),'some Text')]
for the 2nd: .//tbody[@id='a']//tr[2]//span[contains(text(),'some Text')]
for the 3rd: .//tbody[@id='a']//tr[3]//span[contains(text(),'some Text')]
This way I can select each span element individually.
You could use jQuery to get all the span elements within the table.
Example:
var items = $('table div span');
items.each(function (x) {
alert(items[x].innerHTML);
});
tbody.FindElement(By.XPath(String.Format(".//span[contains(text(), {0}))]", &var)));
(var = somex0020Text)
but this always returns the first Element in my table.
This is expected behavior in Selenium. As far as I can see, three elements match the XPath in your code, and in that case FindElement returns the first one.
tbody.FindElements(By.XPath(String.Format(".//span[contains(text(), {0}))]", &var)));
which returned a list containing every single Element in my table, and i dont know why.
This is also expected behavior in Selenium. FindElements returns every element that matches the XPath.
So change the value of var to some Text2 or some Text3 to locate the other two elements.
The following XPath will work for some Text2:
.//span[contains(text(), 'some Text2')]
Try with this Xpath $x("//tr//span[contains(.,'some Text')]")
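From what I can see, the trouble is with contains(): all three spans contain the 'some Text' portion.
If you want to match the entire string, you could use .//span[text()='some Text']. If the text is surrounded by whitespace, as in the HTML above, .//span[normalize-space(.)='some Text'] compares the whitespace-normalized value instead.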
Hope this helps, and have fun with web parsing!
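The difference between a substring test and a whole-string test can be checked with a small sketch. This is a minimal Python illustration, assuming a well-formed copy of the tbody from the question. The standard-library ElementTree module has no contains() function, so both tests are written in plain Python.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the tbody from the question, keeping the line breaks
# around each span's text.
TBODY = """<tbody id="a">
  <tr><td><div><span>
    some Text
  </span></div></td></tr>
  <tr><td><div><span>
    some Text2
  </span></div></td></tr>
  <tr><td><div><span>
    some Text3
  </span></div></td></tr>
</tbody>"""

tbody = ET.fromstring(TBODY)
spans = list(tbody.iter("span"))

# Substring test, like contains(text(), 'some Text'): every span qualifies.
substring_hits = [s for s in spans if "some Text" in (s.text or "")]

# Whole-string test on the untrimmed text, like text()='some Text': no match,
# because each text node also holds the surrounding line breaks.
raw_hits = [s for s in spans if (s.text or "") == "some Text"]

# Whole-string test after trimming, like normalize-space(.)='some Text'.
trimmed_hits = [s for s in spans if (s.text or "").strip() == "some Text"]

print(len(substring_hits), len(raw_hits), len(trimmed_hits))
```

The substring test matches all three spans, which is why FindElement always returned the first one, while only the trimmed whole-string test isolates a single span.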

Using linq to remove a href tag inside cdata in xml file

I have the following XML file:
<ab>
<![CDATA[
<table>
<tbody>
<tr>
<th>abcdef</th>
<th>Contact</th>
</tr>
<tr>
<p>
Home
</p>
</tr>
</tbody>
</table>
]]>
</ab>
I am still learning LINQ and want to know if there is an easier way to find all a href = "/1/2/" tags inside the CDATA and remove them. In the example above, it should show just Contact and Home.
void Main()
{
XDocument doc = XDocument.Load("C:\\test.xml");
XDocument xdoc = XDocument.Parse(doc.ToString());
XNode node = xdoc.DescendantNodes().Single(x => x.NodeType == XmlNodeType.CDATA);
if (node.Parent != null)
{
string content = node.Parent.Value.Trim();
IEnumerable<XElement> elements =
XDocument.Parse(content).Descendants().Where(x =>
{
XAttribute xAttribute = x.Attribute("href");
return
xAttribute !=
null && xAttribute.Value == "/1/2";
});
// do something here
}
}
I don't think LINQ is the best way to go about this problem. Personally, I would use a regular expression. Here is an example of how this could be done:
Example: Scanning for HREFs
In general, if you are doing any more intensive HTML processing, an HTML parser such as HtmlAgilityPack is probably the best way to go.
Regex sample code (it strips the href attribute from the text held in input):
Regex hrefRegex = new Regex(@"href=""([^""]*"")", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string output = hrefRegex.Replace(input, new MatchEvaluator(m => string.Empty));
Hope this helps,
Ivan
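The regular-expression idea can be sketched end to end in a few lines. This is a minimal Python illustration under loud assumptions: the anchors in the posted XML did not survive, so the sample below places hypothetical <a href="/1/2/"> tags around Contact and Home, and it keeps the link text while dropping the tags. It is not the C# code from the answer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the position of the <a> tags is an assumption.
SAMPLE = """<ab><![CDATA[
<table><tbody>
<tr><th>abcdef</th><th><a href="/1/2/">Contact</a></th></tr>
<tr><p><a href="/1/2/">Home</a></p></tr>
</tbody></table>
]]></ab>"""

root = ET.fromstring(SAMPLE)
html = root.text  # the parser exposes the CDATA content as plain text

# Match an <a> whose href is /1/2 or /1/2/ and capture the link text.
link = re.compile(r'<a\s+href="/1/2/?"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
cleaned = link.sub(r"\1", html)
print(cleaned)
```

As the answer notes, a regular expression is only reasonable for simple, predictable markup; an HTML parser is the safer choice for anything more involved.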

xpath expression to access tags after script

I am having a problem getting all the HTML tags after the script elements using XPath.
My HTML:
<table dir = "rtl .......">
<tbody>
<script src = "get.aspx?type=js&file=ajax&rev=3"......>
<script language = "JavaScript"......>
<script>..</script>
<tr>
<td id = "jm0x1"some code here...>
<td id = "jm0x2"some code here...>
also a lot of <tr> here....
</tbody>
How can I access all of the (td id = "jm0x..) cells?
this is the page i want to parse: http://kooora.com/?c=6423
Something like this should work:
//td[contains(@id, "jm0x")]
You can then refine the string passed to contains() to match the pattern you want.
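One way to refine the match is to require the id to begin with the prefix, which XPath 1.0 expresses as starts-with(@id, "jm0x"). The sketch below shows the same filter in Python. It is an illustration only: the table is a hypothetical well-formed stand-in for the page, and the filter is written in plain Python because the standard-library ElementTree module does not implement contains() or starts-with().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the table on the page.
PAGE = """<table dir="rtl">
  <tbody>
    <script>var x = 1;</script>
    <tr>
      <td id="jm0x1">first</td>
      <td id="jm0x2">second</td>
      <td id="other">ignored</td>
    </tr>
  </tbody>
</table>"""

root = ET.fromstring(PAGE)

# Keep only the cells whose id begins with the prefix; the script element
# before the rows does not affect the search.
cells = [td for td in root.iter("td") if td.get("id", "").startswith("jm0x")]
ids = [td.get("id") for td in cells]
print(ids)
```

A prefix test is stricter than contains(), which would also accept an id such as "xjm0x1".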
