HtmlUnit - getTextContent() - htmlunit

I'm working with HtmlUnit. I need to get the text content of an HtmlAnchor, but only its own text, without the text of any nested HTML tags.
<a class="subjectPrice" href="http://www.terra.es/?ca=28_s&st=a&c=4" title="Opel Zafira Tourer 2.0 Cdti 165 Cv Excellence 5p. -12">
<span class="old_price">32.679€</span>
24.395€
If I call htmlAnchor.getTextContent() it returns 32.679€ 24.395€, but I only need 24.395€.
Can anybody help me? Thanks.

Just use XPath to get the appropriate DomText node. Evaluating ./text() with the HtmlAnchor as the context node should be enough.
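To illustrate why ./text() helps, here is a minimal sketch that uses the JDK's built-in XPath engine rather than HtmlUnit, so it runs without extra libraries. The class name DirectTextDemo, the helper directText, and the simplified markup (closed anchor, shortened href) are assumptions made for the example. In HtmlUnit the same expression would presumably be passed to the anchor's getByXPath method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectTextDemo {

    // Returns the trimmed text that sits directly inside the first <a>,
    // ignoring text that belongs to nested elements such as <span>.
    static String directText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Locate the anchor first, then evaluate ./text() relative to it.
            Node anchor = (Node) xpath.evaluate("//a", doc, XPathConstants.NODE);
            NodeList texts = (NodeList) xpath.evaluate("./text()", anchor, XPathConstants.NODESET);

            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                sb.append(texts.item(i).getNodeValue());
            }
            return sb.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Simplified, well-formed version of the anchor from the question.
        String xml = "<a class=\"subjectPrice\" href=\"#\">"
                + "<span class=\"old_price\">32.679\u20ac</span>\n24.395\u20ac\n</a>";
        System.out.println(directText(xml));
    }
}
```

Only the text node that is a direct child of the anchor is collected, so the old price inside the span is left out.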

Related

Selecting with Xpath in Scrapy

I'm using Scrapy to scrape some data from a site and I need help using XPath to select "data" from the following.
<span class="result_item"><span class="text3"><span class="header_text3">**data**</span><br />
**data**<br />
**data**</span> <span class="phone_button_out"><span class="phone_button" style="margin-top: 0"
onclick="pageTracker._trackEvent('USDSearch','Call Now!F');phone_win.open('name','**data**',27101650,0)">
Call Now!<br />
</span></span>
What statements can I use to select the necessary data? I hope this isn't a stupid question. If it is, please point me in the right direction.
There are multiple data elements to get in the posted HTML. Assuming that <span class="result_item"> is the parent of the items, you can try the following:
To get header:
//span[@class='result_item']//span[@class='header_text3']/text()
To get anchor link data:
//span[@class='result_item']/a/text()
Also, to help with XPath, install the Firebug add-on in Firefox, then the FirePath add-on for Firebug. Pointing at an element gives you an autogenerated XPath (good for beginners, though it sometimes needs tuning).
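The XPath expressions are not specific to Scrapy, so they can be tried out in any XPath engine. Below is a minimal sketch using the JDK's built-in one. The class ResultItemXPathDemo, the helper texts, and the sample values ("Header", "line one", "line two") are made up for illustration, and the markup is a shortened, well-formed version of the posted snippet. Because header_text3 is nested inside text3 in that snippet, the sketch uses a descendant step (//) to reach it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ResultItemXPathDemo {

    // Evaluates an XPath expression and returns the trimmed, non-empty
    // text of every matching node, in document order.
    static List<String> texts(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);

            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String value = nodes.item(i).getTextContent().trim();
                if (!value.isEmpty()) {
                    out.add(value);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<span class=\"result_item\"><span class=\"text3\">"
                + "<span class=\"header_text3\">Header</span><br/>"
                + "line one<br/>line two</span></span>";

        // Header text, reached through a descendant step.
        System.out.println(texts(xml,
                "//span[@class='result_item']//span[@class='header_text3']/text()"));
        // Loose text lines that sit directly inside the text3 span.
        System.out.println(texts(xml, "//span[@class='text3']/text()"));
    }
}
```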

selenium findElement by another thing than By.id

I'm quite new to this wonderful tool that Selenium is, and I'm trying to write some example tests for my web app (HTML/JS).
I managed to select some (most) elements by their id with the command driver.findElement(By.id("elementId"));
but I'm unable to find some elements that do not have an id attribute.
I tried the following lines without success:
By.cssSelector("//img[@alt='smthg']")
By.xpath("//img[@src='path/to/img'")
a mix of the two above (alt and src in XPath and cssSelector)
The HTML code of this element is
<img src="absolut/path/to/img.png" border="0" onclick="JSfunction(0)" alt="smthg" style="cursor: pointer;">
If somebody could help me, that would be very nice :)
Thanks, and have a good day !
You can use either of the following:
By.cssSelector("img[alt='smthg'][src*='path/to/img']");
or
By.xpath("//img[@alt='smthg' and contains(@src,'path/to/img')]")
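The XPath variant can be checked outside Selenium. The following is a minimal sketch using the JDK's built-in XPath engine, not the Selenium API; the class ImgAttributeXPathDemo, the helper countMatches, and the second img element are assumptions added for the example. It also suggests why an exact match on src would find nothing: the real attribute value is longer than 'path/to/img', so contains() is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ImgAttributeXPathDemo {

    // Counts how many nodes in the document match the given XPath expression.
    static int countMatches(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double count = (Double) xpath.evaluate(
                    "count(" + expr + ")", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first img mirrors the one in the question; the second is a decoy.
        String xml = "<div>"
                + "<img src=\"absolut/path/to/img.png\" alt=\"smthg\"/>"
                + "<img src=\"other.png\" alt=\"other\"/>"
                + "</div>";

        // Partial match on src combined with an exact match on alt.
        System.out.println(countMatches(xml,
                "//img[@alt='smthg' and contains(@src,'path/to/img')]"));
        // Exact match on src compares against the full attribute value.
        System.out.println(countMatches(xml, "//img[@src='path/to/img']"));
    }
}
```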

Selenium Web driver xpath, span locator

Selenium WebDriver. I'm trying to locate "New Article" in the following code. Please note that it is inside an iframe.
<img class="rtbIcon" src="/icons/16/app/shadow/document_add.png" alt="">
<span class="rtbText">New Article</span>
I have tried to locate it with XPath and many other ways, but the following is what I get every time:
Code : driver.findElement(By.xpath("id('RadToolBar1'):div:div:div:ul:li[3]:a:span:span:span:span"));
Result:
The given selector id('RadToolBar1'):div:div:div:ul:li[3]:a:span:span:span:span is either invalid or does not result in a WebElement. The following error occurred:
"New Article" has no name or id, so please help me find a solution.
Your XPath seems to be wrong. The best way to get the XPath for any element on a page is to install the Mozilla add-on Firebug. You can inspect any element with this add-on and copy the correct XPath of the element on the page.
This should be your XPath:
driver.findElement(By.xpath("//*[@class='rtbText']"));
or
driver.findElement(By.linkText("New Article"));
One of these should work. Let me know if you face any problem.

How do HtmlAgilityPack extract text from html node whose class attribute appended dynamically

Dear friends, I want to extract the text 平均3.6 星 from this code segment excerpted from amazon.cn.
<div class="content"><ul>
<li><b>用户评分:</b>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
<a>
<span class="swSprite s_star_3_5 " title="平均3.6 星">
<span>平均3.6 星</span>
</span>
</a>
My problem is that the span's class value "s_star_3_5 " varies with the customer rating level and is appended dynamically. So I tried doc.DocumentNode.SelectSingleNode(" //span[@class='swSprite']").InnerText and //span[@class='swSprite s_star_3_5 '], but the result is an error or not what I want!
Any suggestions?
First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file to check whether the code you're obtaining is that code. Sometimes you start parsing a website with HtmlAgilityPack, but the very first problem is that you're not getting valid HTML at all. Maybe you're getting a 404 error, a redirection, etc.
I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.
That was the issue in the following questions:
Selecting nodes that have an attribute with spaces using HTMLAgilityPack
XPath Query Problem using HTML Agility Pack
If that doesn't help, post the HTML code and I'll help you ;)
This works for me:
HtmlDocument doc = new HtmlDocument();
doc.Load(myHtml);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
Console.WriteLine("Text=" + node.InnerText.Trim());
and outputs
平均3.6 星
Note that I use the XPath starts-with function.
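The difference between an exact class match and starts-with can be seen in a minimal sketch that uses the JDK's built-in XPath engine instead of HtmlAgilityPack. The class ClassPrefixXPathDemo, the helper firstText, and the ASCII sample text "avg 3.6" are assumptions for the example. The third expression is a common alternative, not something from the answers above: it matches swSprite as a whole class token wherever it appears in the attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ClassPrefixXPathDemo {

    // Returns the trimmed text of the first node matching the expression,
    // or an empty string when nothing matches.
    static String firstText(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Class attribute with a variable second token and a trailing space,
        // as in the question.
        String xml = "<a><span class=\"swSprite s_star_3_5 \" title=\"avg 3.6\">"
                + "<span>avg 3.6</span></span></a>";

        // Exact match fails because the attribute holds more than 'swSprite'.
        System.out.println("[" + firstText(xml, "//span[@class='swSprite']") + "]");
        // Prefix match ignores whatever follows the first token.
        System.out.println(firstText(xml, "//span[starts-with(@class, 'swSprite')]"));
        // Token match: works even if swSprite is not the first class.
        System.out.println(firstText(xml,
                "//span[contains(concat(' ', normalize-space(@class), ' '), ' swSprite ')]"));
    }
}
```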

Beginner: Refreshing a part of a site using AJAX

I want to refresh just a part of a site with AJAX. I looked at Prototype.js and found Ajax.Updater, but I am having trouble making it work.
This is part of my page:
<li id="Home">Home</li>
Can anyone tell me how I can use Ajax.Updater(container, url[, options]) and make it work?
I have linked prototype.js in the HTML.
One way to update the content with Prototype is
$('refreshMe').update(_new_ajax_content_);
where the element is, for example, a
<div id="refreshMe">
If you want more specific help, you should post more information/code from your project.
Thank you for the help. I made it work.
Here is what I wanted to do :)
<li id="Servicii"><a href=# onclick="new Ajax.Updater('container', 'servicii.html',
{asynchronous:true});">Servicii
