Xpath of a text containing Bold text - xpath

I am trying to click the link for the site www.qualtrapharma.com after searching Google for
"qualtra", but I am having trouble writing the XPath because the <cite> tag contains a <b> tag. Can anyone suggest how to do this?
<div class="f kv" style="white-space:nowrap">
<cite class="vurls">
www.
<b>qualtra</b>
pharma.com/
</cite>
</div>

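You can work around this by using '.' in the XPath. It refers to the current node, and when it is compared with a string, the node's string-value is used: the concatenated text of the node and all of its descendants, including the text inside <b>. The comparison is exact, so any whitespace inside <cite> in the real page would have to appear in the string too.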
The XPath would look like the following:
//cite[.='www.qualtrapharma.com/']
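To see why '.' helps, here is a small sketch using Python's standard xml.etree.ElementTree, whose limited XPath subset supports the [.='text'] predicate (Python 3.7 or later). It assumes the markup from the question with the line breaks removed, so the string-value of <cite> is exactly the URL. It only illustrates the matching rule and is not a Selenium example.

```python
import xml.etree.ElementTree as ET

# Markup from the question, written without line breaks (an assumption)
# so that the string-value of <cite> is exactly 'www.qualtrapharma.com/'.
SNIPPET = (
    '<div class="f kv" style="white-space:nowrap">'
    '<cite class="vurls">www.<b>qualtra</b>pharma.com/</cite>'
    '</div>'
)

root = ET.fromstring(SNIPPET)

# '.' compares the complete text content of <cite>, including the <b> child.
matches = root.findall(".//cite[.='www.qualtrapharma.com/']")

# The string-value is the concatenation of all descendant text.
string_value = "".join(matches[0].itertext())
print(len(matches), string_value)
```

By contrast, the text directly inside <cite> before the <b> child is only 'www.', which is why a test on the first text node alone does not match the full URL.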

Related

Extracting links (get href values) with certain text with Xpath under a div tag with certain class

SO contributors, I am aware of the question "How to obtain href values from a div using xpath?", which deals with one part of my problem, yet for some reason the solution posted there does not work in my case, so I would like to ask for help with two related issues. In the example below, I would like to get the href value of the "more" hyperlink (http://www.thestraddler.com/201715/piece2.php), which is under the div tag with the content class.
<div class="content">
<h3>Against the Renting of Persons: A conversation with David Ellerman</h3>
[1]
</p>
<p>More here.</p>
</div>
In theory I should be able to extract the links under a div tag with
xidel website -e //div[@class="content"]//a/@href
but for some reason it does not work. How can I resolve this and (2nd part) how can I extract the href value of only the "here" hyperlink?
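As a rough illustration of the two selections being asked about, here is a sketch in Python's standard xml.etree.ElementTree rather than xidel. The anchor tags are assumptions, because the links are missing from the snippet above: the first href (http://example.com/first) is hypothetical, and the second is the URL named in the question. The corresponding XPath 1.0 expression for the second part would be along the lines of //div[@class="content"]//a[.="here"]/@href.

```python
import xml.etree.ElementTree as ET

# Assumed markup: both <a> elements are reconstructions for illustration,
# and http://example.com/first is a made-up placeholder.
SNIPPET = (
    '<body>'
    '<div class="content">'
    '<h3>Against the Renting of Persons: A conversation with David Ellerman</h3>'
    '<p><a href="http://example.com/first">[1]</a></p>'
    '<p>More <a href="http://www.thestraddler.com/201715/piece2.php">here</a>.</p>'
    '</div>'
    '</body>'
)

root = ET.fromstring(SNIPPET)

# Part 1: every link under the div with class "content".
all_hrefs = [a.get("href") for a in root.findall(".//div[@class='content']//a")]

# Part 2: only the link whose text is exactly "here".
here_hrefs = [
    a.get("href")
    for a in root.findall(".//div[@class='content']//a[.='here']")
]

print(all_hrefs)
print(here_hrefs)
```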

Trouble accessing a text with XPath query

I have this html snippet
<div id="overview">
<strong>some text</strong>
<br/>
some other text
<strong>more text</strong>
TEXT I NEED IS HERE
<div id="sub">...</div>
</div>
How can I get the text I am looking for (shown in caps)?
I tried this, but I get an error message saying the element cannot be located:
"//div[@id='overview']/strong[position()=2]/following-sibling"
I also tried this, which returns the div with id=sub but not the text (correctly so):
"//div[@id='overview']/*[preceding-sibling::strong[position()=2]]"
Is there any way to get the text other than doing string matching or a regex on the contents of the overview div?
Thanks.
following-sibling is the axis, you still need to specify the actual node (in your example the XPath processor is searching for an element named following-sibling). You separate the axis from the node with ::.
Try this:
//div[@id='overview']/strong[position()=2]/following-sibling::text()[1]
This specifies the first text node after the second strong in the div.
If you always want the text immediately preceding the <div id="sub"> then you could try
//div[@id='sub']/preceding-sibling::text()[1]
That would give you everything between the </strong> and the opening <div ..., i.e. the upper case text plus its leading and trailing new lines and whitespace.
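For comparison, here is a minimal sketch of the same idea in Python's standard xml.etree.ElementTree. Its XPath subset has no text() node test or following-sibling axis, so this uses a different mechanism: the text node that immediately follows an element is exposed as that element's .tail. The markup is the snippet from the question.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring(
    '<div id="overview">\n'
    '<strong>some text</strong>\n'
    '<br/>\n'
    'some other text\n'
    '<strong>more text</strong>\n'
    'TEXT I NEED IS HERE\n'
    '<div id="sub">...</div>\n'
    '</div>'
)

# strong[2] is the second <strong> child, like strong[position()=2].
second_strong = div.find("strong[2]")

# .tail is the text between </strong> and the next element, i.e. the
# node that following-sibling::text()[1] would select.
raw = second_strong.tail
wanted = raw.strip()
print(repr(raw), wanted)
```

As with the XPath version, the raw value keeps the surrounding new lines, so it is trimmed separately.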

How to find xpath expression to select this text

I have this HTML and have tried many times to write a pure XPath for the "sample text" and a separate XPath for the "Author" text, but I can't find any criteria to select them.
<div class="Text">
“sample article here with quotation marks .”
<br/>
―
Author
Please help, this is driving me mad!
Thanks
For the first part, select the div by class, step to the br inside it, and retrieve its preceding-sibling text node:
//div[@class="Text"]/br/preceding-sibling::text()
The second part is easier, just get the text of the <a> tag inside the div:
//div[@class="Text"]/a/text()
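The two selections can be sketched in Python's standard xml.etree.ElementTree as well. This is an analogue, not an XPath evaluation: the text before the first child element is the parent's .text, and the link text is the .text of <a>. The <a> element around "Author" and the closing </div> are assumptions taken from the answer, since they are missing from the snippet in the question.

```python
import xml.etree.ElementTree as ET

# The <a> wrapper and the closing </div> are assumed, not in the question.
div = ET.fromstring(
    '<div class="Text">\n'
    '“sample article here with quotation marks .”\n'
    '<br/>\n'
    '―\n'
    '<a>Author</a>\n'
    '</div>'
)

# Text before <br/>, the node selected by br/preceding-sibling::text().
quote = div.text.strip()

# Text of the <a> element, the node selected by a/text().
author = div.find("a").text.strip()
print(quote)
print(author)
```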

How do HtmlAgilityPack extract text from html node whose class attribute appended dynamically

Dear friends, I want to extract the text 平均3.6 星 from this code segment, excerpted from amazon.cn.
<div class="content"><ul>
<li><b>用户评分:</b>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
<a>
<span class="swSprite s_star_3_5 " title="平均3.6 星">
<span>平均3.6 星</span>
</span>
</a>
My problem is that the span's class value "s_star_3_5 " varies with the customer rating level and is appended dynamically. So I tried doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and //span[@class='swSprite s_star_3_5 '], but the result is either an error or not what I want!
Any suggestions?
First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file to check whether the HTML you're getting is the HTML you expect. Sometimes, when you start parsing a website with HtmlAgilityPack, the very first problem is that you're not actually receiving the right HTML. Maybe you're getting a 404 error, a redirect, etc.
I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.
That was the issue in the following questions:
Selecting nodes that have an attribute with spaces using HTMLAgilityPack
XPath Query Problem using HTML Agility Pack
If that doesn't help, post the HTML code and I'll help you ;)
This works for me:
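HtmlDocument doc = new HtmlDocument();
// Load reads from a file path, stream, or reader; use doc.LoadHtml(...) if myHtml holds the markup itself.
doc.Load(myHtml);
// Match on the fixed prefix of the class attribute so the varying star class does not matter.
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
Console.WriteLine("Text=" + node.InnerText.Trim());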
and outputs
Text=平均3.6 星
Note that I use the XPath starts-with function.
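A related approach, sketched here in Python's standard xml.etree.ElementTree, is to treat the class attribute as a whitespace-separated list and test for the swSprite token, so the match does not depend on swSprite coming first. This is a swapped-in technique, not the starts-with call from the answer. The helper name has_class is hypothetical, and the snippet is trimmed and closed so that it parses as XML. In XPath 1.0 the usual equivalent is contains(concat(' ', normalize-space(@class), ' '), ' swSprite ').

```python
import xml.etree.ElementTree as ET

# Trimmed version of the markup in the question, with closing tags added.
SNIPPET = (
    '<span class="crAvgStars">'
    '<a>'
    '<span class="swSprite s_star_3_5 " title="平均3.6 星">'
    '<span>平均3.6 星</span>'
    '</span>'
    '</a>'
    '</span>'
)

root = ET.fromstring(SNIPPET)


def has_class(elem, name):
    """Return True if name is one of the element's class tokens."""
    return name in elem.get("class", "").split()


# First span carrying the swSprite token, whatever star class follows it.
sprite = next(s for s in root.iter("span") if has_class(s, "swSprite"))

# Concatenate the descendant text and trim it, like InnerText.Trim().
rating = "".join(sprite.itertext()).strip()
print(rating)
```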

Selecting specific using x-path while disregarding certain nodes

I have some html that looks pretty much like this.
<p>
<a img src="img src">
<strong>foo</strong>
<strong>bar</strong>
<strong>baz</strong>
<strong>eek</strong>
This is the text I want to select using xpath.
</p>
How can I select only this particular text node, as indicated above, using XPath?
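Use:
/p/text()[last()]
The expression /p/text() selects all text nodes that are direct children of the p element in the XML posted in the question, and [last()] keeps only the final one.
/p/text()[normalize-space()]
This keeps only the text nodes that are not empty or whitespace-only, which here leaves exactly the text you want. It does not trim the selected node itself; to get the trimmed string, wrap the expression instead: normalize-space(/p/text()[last()]).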
There is a very good tutorial at http://www.w3schools.com/xpath/
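The difference between filtering text nodes and trimming one can be sketched in Python's standard xml.etree.ElementTree. It has no text() node test, so the direct text nodes of <p> are collected by hand from .text and each child's .tail. The garbled <a img ...> line from the question is left out so that the snippet parses.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring(
    "<p>\n"
    "<strong>foo</strong>\n"
    "<strong>bar</strong>\n"
    "<strong>baz</strong>\n"
    "<strong>eek</strong>\n"
    "This is the text I want to select using xpath.\n"
    "</p>"
)

# Direct text nodes of <p>: its leading text plus each child's tail.
text_nodes = [t for t in [p.text] + [child.tail for child in p] if t is not None]

# Like text()[normalize-space()]: drop whitespace-only nodes, no trimming.
non_blank = [t for t in text_nodes if t.strip()]

# Like normalize-space(text()[last()]): trim and collapse whitespace.
wanted = " ".join(text_nodes[-1].split())
print(len(text_nodes), len(non_blank), wanted)
```

The filtered node still carries its surrounding new lines, while the normalized string does not.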
