Trouble grabbing background image URL with IMPORTXML / XPath

I'm trying to scrape some background image URLs into a Google Sheet. Here is an example of the container:
<div class="_rs9 _1xcn">
<div class="_1ue-">
<section class="_4gsw _7of _1ue_" style="background-image: url(https://scontent.x.com/v/t64.5771-25/38974906_464042117451453_1752137156853235712_n.png?_nc_cat=100&_nc_ht=scontent.x.com&oh=c19f15536205be2e1eedb7f7fc7cb61b&oe=5C4442FD)">
<div class="_7p2">
</div>
</section>
I need to get everything from the https up to the question mark after png. I know there's a way to use substring-before/substring-after, but I am having a tough time, particularly with escaping quotes.
Here is my attempt. This just gets me an "#N/A":
=IMPORTXML(B2,"substring-before(substring-after(//section[@class='_4gsw _7of _1ue_']/@style, """"background-image: url(""""), """")"""")")
Could anyone help with the full importxml statement? Much appreciated, thanks.

Your approach was close. Try the following XPath expression:
substring-before(substring-after(//section[@class='_4gsw _7of _1ue_']/@style, 'background-image: url('),'?')
The whole expression could look like this:
=IMPORTXML(B2,"substring-before(substring-after(//section[@class='_4gsw _7of _1ue_']/@style, 'background-image: url('),'?')")
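To see what the nested substring-after/substring-before calls do to the style value, the same two steps can be sketched in Python. This is only an illustration, not part of the spreadsheet formula: the helper names substring_after and substring_before are made up here to mirror the XPath 1.0 functions, and a non-empty marker is assumed.

```python
def substring_after(text: str, marker: str) -> str:
    """Mimic XPath 1.0 substring-after for a non-empty marker:
    the text following the first occurrence, or '' if the marker is absent."""
    _, found, tail = text.partition(marker)
    return tail if found else ""


def substring_before(text: str, marker: str) -> str:
    """Mimic XPath 1.0 substring-before for a non-empty marker:
    the text preceding the first occurrence, or '' if the marker is absent."""
    head, found, _ = text.partition(marker)
    return head if found else ""


# Value of the style attribute on the <section> element from the question.
style = (
    "background-image: url(https://scontent.x.com/v/t64.5771-25/"
    "38974906_464042117451453_1752137156853235712_n.png"
    "?_nc_cat=100&_nc_ht=scontent.x.com"
    "&oh=c19f15536205be2e1eedb7f7fc7cb61b&oe=5C4442FD)"
)

# Step 1 drops everything up to and including "background-image: url(".
# Step 2 keeps only what precedes the first "?".
image_url = substring_before(substring_after(style, "background-image: url("), "?")
print(image_url)
```

The inner call strips the CSS prefix and the outer call cuts the query string, which is why single-quoted string literals inside the double-quoted formula argument are all the quoting that is needed.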

Related

xpath br (line breaks) in p (paragraph)

Below is an example XML.
<p>
Thisisgood
</p>
<p>
Thisisbad
</p>
<p>
This
<br>
is
<br>
acceptable
</p>
<p>
Thisisfine
</p>
I want the result:
Thisisgood
Thisisbad
Thisisacceptable
Thisisfine
I use the XPath //p/text() in a Google spreadsheet (=importXML). This results in:
Thisisgood
Thisisbad
This is acceptable (appearing in different cells)
Thisisfine
What XPath would give me the result I need? Thank you.
You cannot solve this problem using XPath 1.0. Using XPath 2.0, you'd just do a
//p/string-join(text(), '')
but this is not supported by Google Spreadsheet.
I'm pretty sure you can use ARRAYFORMULA and JOIN in Google Spreadsheet, but I cannot help you with that. It would be better to ask a new question with the appropriate tags for Google Spreadsheets, so that people following those tags get notified, and to provide an example spreadsheet using the ImportXML function so people can work with it.
I had the same problem. I used this code
=Trim(JOIN("",L3:X3))
L3:X3 are the cells that hold the separate pieces.
Using //p without text() should be enough to get this: Thisisacceptable
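Outside the spreadsheet, the joining step that XPath 1.0 cannot express is easy to sketch. The following Python example is an assumption-laden illustration (the class name ParagraphText is invented here, and it uses only the standard library's html.parser): it collects the text pieces of each p element, ignores the br tags, and joins the pieces with an empty separator, much like JOIN("", ...) does in the sheet.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of each <p>, ignoring <br> and surrounding whitespace."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # one joined string per <p>
        self._parts = None    # pieces of the <p> being read, None outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            # Join the pieces with an empty separator.
            self.paragraphs.append("".join(self._parts))
            self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data.strip())


html = """
<p>
Thisisgood
</p>
<p>
Thisisbad
</p>
<p>
This
<br>
is
<br>
acceptable
</p>
<p>
Thisisfine
</p>
"""

parser = ParagraphText()
parser.feed(html)
parser.close()
print(parser.paragraphs)
```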

HtmlUnit - getTextContent()

I'm working with HtmlUnit. I need to get the text content of an HtmlAnchor, but only its own text, without the text of any nested HTML tags.
<a class="subjectPrice" href="http://www.terra.es/?ca=28_s&st=a&c=4" title="Opel Zafira Tourer 2.0 Cdti 165 Cv Excellence 5p. -12">
<span class="old_price">32.679€</span>
24.395€
If I execute htmlAnchor.getTextContent(), it returns 32.679€ 24.395€, but I only need 24.395€.
Can anybody help me? Thanks.
Just use XPath to get the appropriate DomText node. Evaluating ./text() with the HtmlAnchor as the context node should be enough.
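The difference between "all descendant text" and "the element's own text nodes" can be shown with a small Python sketch. This is not HtmlUnit code; it uses the standard library's ElementTree on a simplified, well-formed copy of the anchor (the href and title attributes are left out, and the closing tag is added as an assumption).

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the anchor from the question.
anchor = ET.fromstring(
    '<a class="subjectPrice">'
    '<span class="old_price">32.679€</span>'
    "24.395€"
    "</a>"
)

# All descendant text, comparable to what getTextContent() returns.
all_text = "".join(anchor.itertext())

# Only the anchor's own text nodes, comparable to ./text().
# ElementTree stores them in anchor.text and in each child's .tail.
own_text = (anchor.text or "") + "".join(child.tail or "" for child in anchor)

print(all_text)
print(own_text.strip())
```

The second value contains only the current price because the old price lives inside the span, not directly inside the anchor.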

Google Spreadsheet importxml timestamp

I have been trying for over 2 hours to import a timestamp from a zap2it.com link into my Google spreadsheet.
Here is the link I am trying to importxml from.
http://affiliate.zap2it.com/tvlistings/ZCGrid.do?zipcode=78238&lineupId=DISH641:-
Here is what I am trying to import
Here is what I have tried so far
=importxml("http://affiliate.zap2it.com/tvlistings/ZCGrid.do?aid=dish&pkg=8388608&fromProvider=true&zipcode=78238&x=52&y=18"&B1,"//body//div[3]/div/div/div[3]/div/div")
EDIT
I was able to improve the expression and get better results:
//body//div[3]/div/div/div[1]//*
but it shows timestamps from all over the page, which is not exactly what I need.
[The first complication is that the data stream returned from dereferencing that URI is not actually XML; it has several thousand well-formedness errors (unescaped ampersands in URIs, unescaped ampersands and less-than signs in scripts, some embedded HTML, some miscellaneous errors). Since you're not reporting problems from that, however, I'll assume that somewhere between the server and your XPath expression someone is doing some tidying.]
I think you'll get better results if you use the id and class attributes that are extensively used in the document. The material you want looks like this in the source (you can use any browser-based debugging tool to find it; I used the 'Web Inspector' in Safari); I have indented to make the structure more visible, and fixed some well-formedness errors in one of the a elements (missing whitespace between attribute-value pairs).
<div class="zc-tn" id="zc-tn-top">
<div class="zc-tn-i">
<a href="ZCGrid.do?fromTimeInMillis=1355781600000"
class="zc-tn-l"
title="Move the grid three hours earlier"></a>
<div class="zc-tn-c">
<span class="zc-tn-z"
title="Central Standard Time">CST</span>
<div class="zc-tn-t">7:00 PM</div>
<div class="zc-tn-t">7:30 PM</div>
<div class="zc-tn-t">8:00 PM</div>
<div class="zc-tn-t">8:30 PM</div>
<div class="zc-tn-t">9:00 PM</div>
<div class="zc-tn-t">9:30 PM</div>
</div>
<a href="ZCGrid.do?fromTimeInMillis=1355803200000"
class="zc-tn-r"
title="Advance the grid three hours"></a>
</div>
</div>
A simple search verifies that the value zc-tn-top is indeed unique as an ID value in the document. Given that, a simple XPath expression to retrieve all the elements whose display is circled in your image is (assuming xhtml is bound to the XHTML namespace):
//xhtml:div[@id='zc-tn-top']//xhtml:div[@class='zc-tn-t']
It looks from your question as if your XPath evaluator is namespace-challenged or namespace-oblivious, so you may need to write this as
//div[@id='zc-tn-top']//div[@class='zc-tn-t']
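As a quick check of the namespace-free form, here is a sketch in Python using the standard library's ElementTree, which supports this subset of XPath. The markup is a trimmed copy of the fragment above wrapped in a body element; the second block with id zc-tn-bottom is a hypothetical addition, included only to show that anchoring on the id keeps other time cells out of the result.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<body>
  <div class="zc-tn" id="zc-tn-top">
    <div class="zc-tn-i">
      <div class="zc-tn-c">
        <span class="zc-tn-z" title="Central Standard Time">CST</span>
        <div class="zc-tn-t">7:00 PM</div>
        <div class="zc-tn-t">7:30 PM</div>
        <div class="zc-tn-t">8:00 PM</div>
        <div class="zc-tn-t">8:30 PM</div>
        <div class="zc-tn-t">9:00 PM</div>
        <div class="zc-tn-t">9:30 PM</div>
      </div>
    </div>
  </div>
  <!-- Hypothetical second block, not taken from the real page. -->
  <div class="zc-tn" id="zc-tn-bottom">
    <div class="zc-tn-t">ignored</div>
  </div>
</body>
""")

# Anchor on the unique id first, then pick the time cells below it.
cells = page.findall(".//div[@id='zc-tn-top']//div[@class='zc-tn-t']")
times = [cell.text for cell in cells]
print(times)
```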

Trying to create XPath from this HTML snippet

I have played around with writing XPath for a while but am unable to come up with exactly what I want.
I'm trying to write XPath expressions for the links (click1 and click2 in the code snippet below) based on known text (myidentity in the code snippet below). Can someone take a look and suggest a possible solution?
HTML code snippet:
<div class="abc">
<a onclick="mycontroller.goto('xx','yy'); return false;" href="#">
<img src="images/controls/inheritance.gif"/>
</a>
myidentity
<span>
<a onclick="mycontroller.goto('xx','yy'); return false;" href="#">click1</a>
<a onclick="mycontroller.goto('xx','yy'); return false;" href="#">click2</a>
</span>
</div>
You don't need to use XPath here, you could use a CSS locator. These are often faster and more compatible across different browsers.
css=div:contains(myidentity) > span a:nth-child(1) //click1
css=div:contains(myidentity) > span a:nth-child(2) //click2
Note that the > is only required to work around a bug in the CSS locator library used by Selenium.
Hard to say without seeing the rest of the HTML but the following should work:
//div[text()[contains(., "myidentity")]]/span/a
See Macro's answer - this form should be used.
//div[text()[contains(., "myidentity")]]/span/a[2]
The following only works with one section of text in the containing div.
You'll need to select based on the text containing your identity text.
Xpath for click1
//div[contains(text(),"myidentity")]/span/a[1]
Xpath for click2
//div[contains(text(),"myidentity")]/span/a[2]
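The two forms differ because in XPath 1.0 a node-set passed to contains() is converted to the string value of its first node only, so contains(text(), ...) inspects just the first text node of the div, while text()[contains(., ...)] tests every text node. The Python sketch below imitates both checks on the snippet from the question (onclick attributes omitted); the helper text_nodes is an invented name, and ElementTree is used only to expose the text nodes, not to evaluate the XPath itself.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("""
<div class="abc">
<a href="#">
<img src="images/controls/inheritance.gif"/>
</a>
myidentity
<span>
<a href="#">click1</a>
<a href="#">click2</a>
</span>
</div>
""")


def text_nodes(element):
    """Return the element's own non-empty text nodes in document order."""
    pieces = [element.text] + [child.tail for child in element]
    return [piece for piece in pieces if piece]


nodes = text_nodes(div)

# contains(text(), "myidentity"): only the first text node is examined.
first_only = "myidentity" in nodes[0]

# text()[contains(., "myidentity")]: every text node is examined.
any_node = any("myidentity" in node for node in nodes)

links = [a.text for a in div.findall("./span/a")] if any_node else []
print(first_only, any_node, links)
```

In this snippet the first text node of the div is the whitespace before the image link, so the first-node check does not find the identity text while the per-node check does.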

Writing XPath for elements which have no ID or Name in Selenium

I'm trying to automate testing of the code... well, written without testing in mind (no IDs on many elements, and a lot of elements with the same class names). I would appreciate any help (questions are below the code):
<div id="author-taxonomies" class="menu-opened menu-hover-opened-inactive" onmouseover="styleMenuElement(this)" onmouseout="styleMenuElement(this)" onclick="toggleSFGroup(this)">Author</div>
<div id="author-taxonomies-div" class="opened">
<div id="top-level-menu" class="opened">
<div id="top-level-menu-item-1" class="as-master">
<div class="filter-label"> Name</div>
</div>
<div id="top-level-menu-item-1" class="as-slave"
style="top: 525px; left: 34px; z-index: 100; display: none;"> </div>
<div id="top-level-menu-item-2" class="as-master">
<div class="filter-label">Title</div>
</div>
<div id="top-level-menu-item-2" class="as-slave">
<div id="top-level-menu-item-2" class="as-slave-title as-slave-title-subgroup"
>Title</div>
<div id="top-level-menu-item-2" class="as-slave-body"> </div>
<div class="as-slave-buffer"> </div>
</div>
<div id="top-level-menu-item-3" class="as-master">
<div class="filter-label">Location</div>
</div>
<div id="top-level-menu-item-3" class="as-slave"> </div>
</div>
</div>
The question is: how do I refer to particular labels of this menu and their properties with XPath expressions? For example, if I want to:
verify the "Location" label is there
check if "Title" with class "as-slave" is not visible at the moment
It would be something similar to:
//div[@id="top-level-menu-item-3"]/div[@class="filter-label"]
//div[@id="top-level-menu1"] --- and check in code for display: none ... assuming it is Selenium RC you are using
Update: also be sure to install the following Firefox add-on; it is really useful when trying different XPath expressions on a site: https://addons.mozilla.org/en-US/firefox/addon/1095
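To try both checks without a browser, here is a rough Python sketch using the standard library's ElementTree rather than Selenium. The markup is a trimmed copy of the menu from the question, and the helper hidden_inline is an invented name. It only looks at the inline style attribute, so visibility controlled by CSS classes or scripts would not be detected this way. Because the ids are repeated in the markup, the class is added as a second predicate.

```python
import xml.etree.ElementTree as ET

menu = ET.fromstring("""
<div id="top-level-menu" class="opened">
  <div id="top-level-menu-item-1" class="as-master">
    <div class="filter-label"> Name</div>
  </div>
  <div id="top-level-menu-item-1" class="as-slave"
       style="top: 525px; left: 34px; z-index: 100; display: none;"> </div>
  <div id="top-level-menu-item-2" class="as-master">
    <div class="filter-label">Title</div>
  </div>
  <div id="top-level-menu-item-2" class="as-slave"> </div>
  <div id="top-level-menu-item-3" class="as-master">
    <div class="filter-label">Location</div>
  </div>
  <div id="top-level-menu-item-3" class="as-slave"> </div>
</div>
""")


def hidden_inline(element):
    """True if the element's inline style declares display: none."""
    style = element.get("style", "")
    declarations = [part.strip().replace(" ", "") for part in style.split(";")]
    return "display:none" in declarations


# Check 1: the "Location" label exists.
label = menu.find(".//div[@id='top-level-menu-item-3']/div[@class='filter-label']")
location_present = label is not None and (label.text or "").strip() == "Location"

# Check 2: look up the as-slave blocks by id plus class, then inspect the style.
name_slave = menu.find(".//div[@id='top-level-menu-item-1'][@class='as-slave']")
title_slave = menu.find(".//div[@id='top-level-menu-item-2'][@class='as-slave']")

print(location_present, hidden_inline(name_slave), hidden_inline(title_slave))
```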
As a side note: try to avoid using XPath locators in Selenium, if possible. If you have a long XPath expression, it can be up to 20 times slower for Selenium to find the element compared to identifying it using its unique ID. Of course, sometimes there is no alternative to using XPath. However, when you do use it, keep '//' expressions to a minimum, as this is a real performance killer.
If you're just starting with Selenium, download the selenium add-on for Firefox. As you click on DOM elements, Selenium shows you the xpath to access it.
I am currently working on an open source library for generating XPath expressions through a fluent .NET API. The idea is to be able to generate XPath-based Selenium locators without having to know XPath.
Here's an example of how the library can be used in your case:
XPathFinder.Find.Tag("div").With.Attribute("id", "top-level-menu-item-3").And.Child("div").With.Attribute("class", "filter-label").ToXPathExpression();
This will produce the following XPath:
"//div[@id='top-level-menu-item-3']/div[@class='filter-label']"
Check it out at
http://code.google.com/p/xpathitup/
You can use FirePath, which is installed on top of Firebug (both are Firefox plugins). When you get an XPath from it, don't forget to prepend // before using it, either in code or in Selenium IDE. You are not adding it, which is why the expression is unusable. There are two types of XPath: absolute and relative. If you use absolute, it will take care of dynamic IDs, but if you use relative, it will break with each run.

Resources