Unable to set InnerText using Html-Agility-Pack - html-agility-pack

Given an HTML document, I want to identify all the numbers in the document and add custom tags around the numbers.
Right now, I use the following:
HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
MatchCollection numbersColl = Regex.Matches(bodyNode.InnerText, <some regex>);
Once I get the numbersColl, I can traverse through each Match and get the index.
However, I can't change the InnerText since it is read-only.
What I need is that if match.Value = 100 and match.Index=25, I want to replace that 25 with
<span isIdentified='true'> 25 </span>
Any help on this will be greatly appreciated. Currently, since I am not able to modify the inner text, I have to modify the InnerHtml, but some elements might have 25 in their markup, and that should not be touched. How do I identify whether a number is inside an HTML tag? For example, <table border='1'> has 1 in the tag.

Here's what I did to work around the read-only InnerText property of a text node: select the parent node of the text node, note the index of the text node in the parent's child node collection, and then call ReplaceChild(...).
private void WriteText(HtmlNode node, string text)
{
    if (node.ChildNodes.Count > 0)
    {
        // Swap the first child for a new text node
        node.ReplaceChild(htmlDocument.CreateTextNode(text), node.ChildNodes.First());
    }
    else
    {
        node.AppendChild(htmlDocument.CreateTextNode(text));
    }
}
In your case I believe you need to create a new Element node that wraps the text into an HtmlElement and then just use it as a replacement of the Text node.
Or even better, see if you can do something like the answer posted here:
Replacing a HTML div InnerText tag using HTML Agility Pack
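
To illustrate the idea of touching only text and never the markup, here is a minimal sketch in Python's standard library rather than HtmlAgilityPack. The NUMBER pattern, the NumberWrapper class and the wrap_numbers helper are all hypothetical names made up for this example; the pattern merely stands in for <some regex>. The parser hands text to handle_data separately from tags, so a number inside an attribute such as border='1' never reaches the regex.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern; it stands in for <some regex> from the question.
NUMBER = re.compile(r"\d+")


class NumberWrapper(HTMLParser):
    """Re-emits the HTML, wrapping numbers that occur in text content only."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; or &#65; as written
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Start tags, including the numbers in their attributes, pass through unchanged
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text reaches this method, so only text gets wrapped
        self.out.append(
            NUMBER.sub(lambda m: f"<span isIdentified='true'>{m.group()}</span>", data)
        )

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def wrap_numbers(html_text):
    parser = NumberWrapper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(wrap_numbers("<table border='1'><tr><td>Total 25</td></tr></table>"))
```

Note that this sketch also treats the contents of script and style elements as text. The equivalent approach with HtmlAgilityPack would be to visit the text nodes and replace each one, as described above.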

Creating a text node does not do what it should in this case:
myParentNode.AppendChild(D.CreateTextNode("<script>alert('a');</script>"));
Console.Write(myParentNode.InnerHtml);
The result should be something like
<script....
but it is a working script tag even though I add it as text, not as HTML. This is a security issue for me because the text would be input from an anonymous user.
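
One common way to guard against this is to HTML-encode untrusted input before it is placed in the document. The sketch below shows the idea with Python's standard library, not with HtmlAgilityPack; the user_input value is a made-up example. In .NET, System.Net.WebUtility.HtmlEncode plays a comparable role.

```python
from html import escape

# Hypothetical input from an anonymous user
user_input = "<script>alert('a');</script>"

# escape() turns markup characters into entities, so the browser shows
# the text instead of running it
safe = escape(user_input)
print(safe)
```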

Related

How to take xpath to Get Text from class inside th

I have the following XPath:
//table[@class='ui-jqgrid-htable']/thead/tr/th//text()
And I'm trying to get the text from it with the following command:
String LabelName = driver.findElement(By.xpath("//table[@class='ui-jqgrid-htable']/thead/tr/th//text()")).getText();
But it's not printing any text; the result is blank. Could you help me, please?
The text() in your XPath does not qualify as an element. Your element ends at //table[@class='ui-jqgrid-htable']/thead/tr/th. Try using getText() with this XPath.
Also, a table usually has many headers. Using findElement will only return the first one.
If you want to get all headers, use
driver.findElements(By.xpath("//table[@class='ui-jqgrid-htable']/thead/tr/th"))
and loop through the list, calling getText() on each element.
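
The same pattern, selecting the th elements and then reading the text of each one, can be sketched without a browser. This example uses Python's xml.etree.ElementTree instead of Selenium, and the markup is a hypothetical stand-in for the real jqGrid header.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the jqGrid header markup
doc = ET.fromstring(
    "<div><table class='ui-jqgrid-htable'><thead><tr>"
    "<th><div>Name</div></th><th><div>Price</div></th>"
    "</tr></thead></table></div>"
)

# Select the th elements themselves, not their text() nodes
headers = doc.findall(".//table[@class='ui-jqgrid-htable']/thead/tr/th")

# Read the text of each element, descendants included
labels = ["".join(th.itertext()).strip() for th in headers]
print(labels)
```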

How to iterate on select elements with Xpath with one exception?

I want to iterate over each selector found that contains a specific class in order to retrieve all elements within the divs. This works until it reaches one item containing an ID.
for selector in response.xpath("//div[@class='product-list-entry']"):
My best try to get around this is the following code:
for selector in response.xpath("//div[not(@id) and @class='product-list-entry']"):
Both versions lead to only retrieving 5 result sets instead of the full list.
How can I simply ignore the one with the id and iterate on all others?
This should extract the content of the specific divs (for example, the text of the div, the content of a span, and the text of a p element):
def parse(self, response):
    for selector in response.xpath("//div[@id='product-list']"):
        content = selector.xpath(".//div[not(@id)]/text()").extract()
        content2 = selector.xpath(".//div[not(@id)]/span").extract()
        content3 = selector.xpath(".//div[not(@id)]/p/text()").extract()
        content4 = ...
        print(content, content2, content3, ...)
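
As a rough, self-contained illustration of skipping the one div that carries an id, the sketch below uses Python's xml.etree.ElementTree rather than Scrapy. Its XPath subset has no not(), so the exclusion is done in Python; with Scrapy the not(@id) predicate can stay in the XPath. The markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical product list with one entry that carries an id
doc = ET.fromstring(
    "<div id='product-list'>"
    "<div class='product-list-entry'><p>first</p></div>"
    "<div class='product-list-entry' id='special'><p>skip me</p></div>"
    "<div class='product-list-entry'><p>second</p></div>"
    "</div>"
)

# Keep every entry div except those with an id attribute
entries = [
    div
    for div in doc.findall(".//div[@class='product-list-entry']")
    if "id" not in div.attrib
]
texts = [div.findtext("p") for div in entries]
print(texts)
```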

Finding xpath for an element loaded by Ajax-json response using class and text

I have an element like
<td class="google-visualization-table-th gradient google-visualization-table-sorthdr">
Project Name
<span class="google-visualization-table-sortind">▼</span>
</td>
I tried
driver.findElement(By.xpath("//td[contains(@class, 'google-visualization-table-th') and normalize-space(text()) = 'Project Name']"));
But it's not working. Basically, this is the markup for a column header, and I need to recognize each column header and print whether the heading exists or not.
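We do not know which version of XPath you are using, and depending on the exact version, passing text() to a function behaves differently.
text() returns only the text nodes that are direct children of the context node, not those of its descendants. The td here has two of them: one before the span and the whitespace after it.
In XPath 1.0, normalize-space(text()) silently uses only the first of those text nodes. In XPath 2.0, passing more than one node to normalize-space() is an error.
The string value of the element itself, written as ., is more robust: it is the concatenated text of the td and all its descendants, so it also contains the text content of the span (the ▼ character).
Use contains() also in the second half of the predicate:
//td[contains(@class, 'google-visualization-table-th') and contains(.,'Project Name')]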
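
The difference between the text directly inside the td and the string value of the whole element can be checked outside the browser. This sketch uses Python's xml.etree.ElementTree, not Selenium, on the markup from the question; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

td = ET.fromstring(
    '<td class="google-visualization-table-th gradient google-visualization-table-sorthdr">\n'
    "Project Name\n"
    '<span class="google-visualization-table-sortind">▼</span>\n'
    "</td>"
)

# Text that sits directly in the td, before the span
first_text = td.text.strip()

# Text of the td plus all of its descendants, whitespace collapsed
string_value = " ".join("".join(td.itertext()).split())

print(repr(first_text))
print(repr(string_value))
```

An exact comparison against 'Project Name' only holds for the first value, while a contains-style check holds for both.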
It's resolved now: the elements were not being identified because they were loaded inside an iframe. I used the code below to switch to the iframe and then did the usual operations to find the elements.
WebDriverWait wait = new WebDriverWait(driver, 200);
By by = By.id("iframe id");
try {
    wait.until(ExpectedConditions.frameToBeAvailableAndSwitchToIt(by));
} catch (TimeoutException e) {
    // the iframe did not become available within the wait
}
Thanks everyone for the help.

XPath HTML finding nodes

I am using HtmlAgilityPack to try to find HTML 'A' nodes that have a href attribute that contains a certain string, in my case the string '/groups/':
HtmlNodeCollection groups = source.DocumentNode.SelectNodes("//a[contains(@href, '/groups/')]");
Although the source code contains about 20 such nodes my code above is returning none which leads me to believe maybe I'm doing it incorrectly.
Is what I'm doing correct, and if not how can I select nodes that have a certain attribute that has a value that contains a certain string?
Your expression seems correct to me.
You didn't post your source document (or at least a part of it), so I'll be guessing.
The thing is, XPath 1.0 has no built-in case-insensitive comparison. If you have an <a> tag with an href attribute that contains e.g. /Groups/ or /GROUPS/, it won't be matched. There is a workaround for this:
//a[contains(translate(@href, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '/groups/')]
As another option you could use LINQ with StringComparison.OrdinalIgnoreCase:
source.DocumentNode.Descendants("a")
    .Where(a => a.GetAttributeValue("href", string.Empty)
        .IndexOf("/groups/", StringComparison.OrdinalIgnoreCase) != -1
    );
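
The case-insensitive filter can also be sketched in a few lines of Python, using xml.etree.ElementTree rather than HtmlAgilityPack: lower-case the attribute value before testing it, which is what the translate() workaround does inside XPath. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical links with differently cased hrefs
doc = ET.fromstring(
    "<body>"
    "<a href='/Groups/1'>one</a>"
    "<a href='/users/2'>two</a>"
    "<a href='/GROUPS/3'>three</a>"
    "<a>no href</a>"
    "</body>"
)

# A missing href falls back to an empty string, mirroring GetAttributeValue
groups = [a for a in doc.iter("a") if "/groups/" in a.get("href", "").lower()]
hrefs = [a.get("href") for a in groups]
print(hrefs)
```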

how to get attribute values using nokogiri

I have a webpage whose DOM structure I do not know, but I know the text I need to find in that webpage. To get its XPath, I do this:
require 'nokogiri'

doc = Nokogiri::HTML(webpage)
path = []  # collects the XPath of every matching text node
doc.traverse { |node|
  if node.text?
    if node.content == "my text"
      path << node.path
    end
  end
}
puts path
Now suppose I get output like:
html/body/div[4]/div[8]/div/div[38]/div/p/text()
so that later on, when I access this webpage again, I can do this:
doc.xpath("#{path[0]}")
instead of traversing the whole DOM tree every time I want the text.
I want to do some further processing. For that, I need to know which of the element nodes in the above XPath output have attributes associated with them and what their attribute values are. How would I achieve that? The output that I want is
#=> output desired
{ p => p_attr_value , div => div_attr_value , div[38] => div[38]_attr_value.....so on }
I am not having a problem finding the node where "my text" lies. I wanted the full XPath of the "my text" node, which is why I did the whole traversal. Now, having found the full XPath, I want the attributes of each element node that I came across on the way to the "my text" node.
Constraint: I can't use any of the developer tools available in a web browser.
PS: I am a newbie in Ruby and Nokogiri.
To select all attributes of an element that is selected using the XPath expression someExpr, you need to evaluate a new XPath expression:
someExpr/@*
where someExpr must be substituted with the real XPath expression used to select the particular element.
This selects all attributes of all elements (we assume that's just one) that are selected by the XPath expression someExpr.
For example, if the element we want is selected by:
/a/b/c
then all of its attributes are selected by:
/a/b/c/@*
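
To collect the attributes of every element on the way down to the text, one option is to record them while traversing. The sketch below does this in Python with xml.etree.ElementTree, not Nokogiri. The attrs_on_path helper and the sample document are hypothetical, and the helper only compares an element's leading text. Nokogiri exposes comparable information through a node's ancestors and attributes.

```python
import xml.etree.ElementTree as ET


def attrs_on_path(root, text):
    """Return [(tag, attributes), ...] from root down to the element
    whose leading text equals `text`, or None if nothing matches."""

    def walk(elem, trail):
        # Extend the trail with this element and a copy of its attributes
        trail = trail + [(elem.tag, dict(elem.attrib))]
        if (elem.text or "").strip() == text:
            return trail
        for child in elem:
            found = walk(child, trail)
            if found:
                return found
        return None

    return walk(root, [])


# Hypothetical document standing in for the real web page
doc = ET.fromstring(
    "<html><body><div id='main'><div class='box'>"
    "<p title='t'>my text</p>"
    "</div></div></body></html>"
)

for tag, attrs in attrs_on_path(doc, "my text"):
    print(tag, attrs)
```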
