I've been trying for over 2 hours to import a timestamp from a zap2it.com link into my Google spreadsheet.
Here is the link I am trying to IMPORTXML from:
http://affiliate.zap2it.com/tvlistings/ZCGrid.do?zipcode=78238&lineupId=DISH641:-
Here is what I am trying to import:
Here is what I have tried so far:
=importxml("http://affiliate.zap2it.com/tvlistings/ZCGrid.do?aid=dish&pkg=8388608&fromProvider=true&zipcode=78238&x=52&y=18"&B1,"//body//div[3]/div/div/div[3]/div/div")
EDIT
I was able to improve it and get better results with
//body//div[3]/div/div/div[1]//*
but it returns timestamps from all over the page, not exactly what I need.
[The first complication is that the data stream returned from dereferencing that URI is not actually XML; it has several thousand well-formedness errors (unescaped ampersands in URIs, unescaped ampersands and less-than signs in scripts, some embedded HTML, some miscellaneous errors). Since you're not reporting problems from that, however, I'll assume that somewhere between the server and your XPath expression someone is doing some tidying.]
I think you'll get better results if you use the id and class attributes that are extensively used in the document. The material you want looks like this in the source (you can use any browser-based debugging tool to find it; I used the 'Web Inspector' in Safari); I have indented to make the structure more visible, and fixed some well-formedness errors in one of the a elements (missing whitespace between attribute-value pairs).
<div class="zc-tn" id="zc-tn-top">
<div class="zc-tn-i">
<a href="ZCGrid.do?fromTimeInMillis=1355781600000"
class="zc-tn-l"
title="Move the grid three hours earlier"></a>
<div class="zc-tn-c">
<span class="zc-tn-z"
title="Central Standard Time">CST</span>
<div class="zc-tn-t">7:00 PM</div>
<div class="zc-tn-t">7:30 PM</div>
<div class="zc-tn-t">8:00 PM</div>
<div class="zc-tn-t">8:30 PM</div>
<div class="zc-tn-t">9:00 PM</div>
<div class="zc-tn-t">9:30 PM</div>
</div>
<a href="ZCGrid.do?fromTimeInMillis=1355803200000"
class="zc-tn-r"
title="Advance the grid three hours"></a>
</div>
</div>
A simple search verifies that the value zc-tn-top is indeed unique as an ID value in the document. Given that, a simple XPath expression to retrieve all the elements whose display is circled in your image is (assuming xhtml is bound to the XHTML namespace):
//xhtml:div[@id='zc-tn-top']//xhtml:div[@class='zc-tn-t']
It looks from your question as if your XPath evaluator is namespace-challenged or namespace-oblivious, so you may need to write this as
//div[@id='zc-tn-top']//div[@class='zc-tn-t']
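To check the namespace point without a live page, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports only a limited XPath subset. It assumes the fragment above is wrapped in a hypothetical XHTML html/body root; XHTML_DOC, PLAIN_DOC and time_slots are illustrative names. The prefixed path finds the six time labels, the unprefixed path finds nothing in the namespaced document, and the unprefixed path works once the namespace declaration is removed.

```python
import xml.etree.ElementTree as ET

# The time-navigation fragment from the answer, wrapped in a hypothetical
# XHTML root so that the xhtml prefix has a namespace to bind to.
XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="zc-tn" id="zc-tn-top">
  <div class="zc-tn-i">
    <a href="ZCGrid.do?fromTimeInMillis=1355781600000" class="zc-tn-l"
       title="Move the grid three hours earlier"></a>
    <div class="zc-tn-c">
      <span class="zc-tn-z" title="Central Standard Time">CST</span>
      <div class="zc-tn-t">7:00 PM</div>
      <div class="zc-tn-t">7:30 PM</div>
      <div class="zc-tn-t">8:00 PM</div>
      <div class="zc-tn-t">8:30 PM</div>
      <div class="zc-tn-t">9:00 PM</div>
      <div class="zc-tn-t">9:30 PM</div>
    </div>
    <a href="ZCGrid.do?fromTimeInMillis=1355803200000" class="zc-tn-r"
       title="Advance the grid three hours"></a>
  </div>
</div>
</body></html>"""

# Same markup with the default namespace declaration removed.
PLAIN_DOC = XHTML_DOC.replace(' xmlns="http://www.w3.org/1999/xhtml"', "", 1)

NS = {"xhtml": "http://www.w3.org/1999/xhtml"}


def time_slots(xml_text, namespaced=True):
    """Return the grid's time labels using the prefixed or unprefixed path."""
    root = ET.fromstring(xml_text)
    if namespaced:
        path = ".//xhtml:div[@id='zc-tn-top']//xhtml:div[@class='zc-tn-t']"
        found = root.findall(path, NS)
    else:
        path = ".//div[@id='zc-tn-top']//div[@class='zc-tn-t']"
        found = root.findall(path)
    return [el.text.strip() for el in found]


print(time_slots(XHTML_DOC))
print(time_slots(XHTML_DOC, namespaced=False))
print(time_slots(PLAIN_DOC, namespaced=False))
```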
I am new to web crawling and to XPath, and I am trying to crawl the following website: https://sabobic.blogabet.com/
Basically, I want to extract all "feed_pick_analysis" content, i.e., all the text content that belongs to each post.
I cannot use the statement below, because the ID changes dynamically.
xpath('.//div[@class="feed-pick-title"]/div[@class="col-xs-12 _text-more feed-analysis"]/div[@id="feed_pick_analysis_27759116"]/p').extract()
Thus, I tried to use the following statement:
xpath('.//div[@class="col-xs-12 _text-more feed-analysis"]/div[@contenteditable="false"]/p').extract()
However, I am not getting any data or tags back from the website... What is my mistake?
[EDIT] This is the HTML I am working on:
<div class="col-xs-12 _text-more feed-analysis">
<div contenteditable="false" id="feed_pick_analysis_27759116">
<p>Cant verify asians because nothing is working on this site.<br>
<br>
Game is available in IBC,ISN,SBO<br>
<br>
Game on neutral ground.<br>
<br>
No home advantage for Persipura and thats big minus for them today.<br>
<br>
So Persija will have many fans on the stands, supporting them, so thats more home game for Persija.<br>
<br>
They sign some quality players(Aryanto) and foreigners Xandao and spanish playmaker Tomas who seems to be best player in the league.<br>
<br>
Big value on Persija +0.25 and DNB.<br>
<br>
Fair odds Persija ML #2.10 and dnb #1.50. GL!</p> </div>
<div class="col-xs-12 no-padding margin-top-10">
<small class="last-edit "><em>
last edited: Wed, Sep 11th, 2019, 09:47 </em></small>
</div>
</div>
To make your XPath expression more flexible, you can ignore the number in the last @id value. Also note that your expression was missing a space between _text-more and feed-analysis; it has to be _text-more feed-analysis.
.//div[@class="col-xs-12 _text-more feed-analysis"]/div[contains(@id,"feed_pick_analysis")]/p
I merely removed the first div because it was not part of the sample HTML. Add it back if the expression is not specific enough.
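Scrapy's selectors evaluate full XPath 1.0, so contains(@id, ...) works there directly. Python's standard-library xml.etree.ElementTree has no contains(), so this sketch (a stand-in for illustration, not Scrapy code) matches the class in the path and does the id test in Python. The markup is a simplified, well-formed version of the question's, with br written as br/ and a second, hypothetical post added; pick_analyses is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the question's markup; the second post
# and its id are hypothetical, added to show that the id number is ignored.
HTML = """<div class="feed">
  <div class="col-xs-12 _text-more feed-analysis">
    <div contenteditable="false" id="feed_pick_analysis_27759116">
      <p>Game on neutral ground.<br/>Big value on Persija +0.25 and DNB.</p>
    </div>
  </div>
  <div class="col-xs-12 _text-more feed-analysis">
    <div contenteditable="false" id="feed_pick_analysis_27759999">
      <p>Second pick analysis.</p>
    </div>
  </div>
</div>"""

ANALYSIS_CLASS = "col-xs-12 _text-more feed-analysis"


def pick_analyses(html):
    """Collect the text of every p inside a feed_pick_analysis_* div."""
    root = ET.fromstring(html)
    texts = []
    for div in root.findall(f".//div[@class='{ANALYSIS_CLASS}']/div"):
        # Equivalent of contains(@id, "feed_pick_analysis"), done in Python
        # because ElementTree's XPath subset has no contains().
        if "feed_pick_analysis" in div.get("id", ""):
            for p in div.findall("p"):
                texts.append("".join(p.itertext()).strip())
    return texts


for text in pick_analyses(HTML):
    print(text)
```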
I need the third image with that class and parent. None of these XPaths seem to be valid:
xpath=(//div[@class='itemTileV5'])//img[@class='dealItem']/@src[3]
xpath=(//div[@class='itemTileV5']//img[@class='dealItem'])/@src[3]
xpath=(//div[@class='itemTileV5']//img[@class='dealItem']/@src)[3]
Notice that I moved the parentheses around, and it's always an invalid path. Without parentheses it doesn't work either.
Please help.
<div class="itemTileV5">
<div class="top">
<a href="/Grocery_deals/p_pepperidge-farm-goldfish-variety-pack-bold-mix-29-4-ounce">
<img class="Item" src="https://img.google.com/ai/184x184/dealimage/1493649114.jpg" alt="Pepperidge Farm">
</a>
</div>
</div>
All three of your expressions are valid in all versions of XPath. If you're getting an error, please tell us what it is and which XPath processor generated it.
The first two expressions aren't useful, because @src[3] selects the third attribute named "src", and an element can only have one attribute with a given name.
Your informal requirement "the third image with that class and parent" seems to translate to (//div[@class='itemTileV5']/img[@class='dealItem'])[3]/@src
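The difference between a positional predicate on a step and one on a parenthesized result is easiest to see on a small example. This Python sketch uses the standard library's xml.etree.ElementTree, whose [n] predicate, like XPath's img[n], counts position among same-named children of one parent. ElementTree cannot parse the parenthesized form, so indexing the Python result list stands in for (...)[3]. The page and image URLs are hypothetical, modeled on the question's markup with the image class set to dealItem.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: three tiles, each wrapping exactly one dealItem image.
PAGE = """<body>
  <div class="itemTileV5"><div class="top"><a href="/p1"><img class="dealItem" src="https://example.com/1.jpg"/></a></div></div>
  <div class="itemTileV5"><div class="top"><a href="/p2"><img class="dealItem" src="https://example.com/2.jpg"/></a></div></div>
  <div class="itemTileV5"><div class="top"><a href="/p3"><img class="dealItem" src="https://example.com/3.jpg"/></a></div></div>
</body>"""

root = ET.fromstring(PAGE)

# img[3] means "the third img child of its parent". Every img here is the
# only child of its <a>, so this matches nothing.
per_parent = root.findall(".//div[@class='itemTileV5']//img[3]")

# Taking the third item of the whole result list mirrors XPath's
# (//div[@class='itemTileV5']//img[@class='dealItem'])[3]/@src
all_imgs = root.findall(".//div[@class='itemTileV5']//img[@class='dealItem']")
third_src = all_imgs[2].get("src")

print(len(per_parent), third_src)
```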
<div class="summary-item">
<label >Price</label>
<div class="value">
0.99 GBP
</div>
</div>
<div class="summary-item">
<label >Other info</label>
<div class="value">
All languages
</div>
</div>
I am trying to get the "0.99 GBP" using an XPath expression. So far I have reached the label using this (note that there is another div with class summary-item, so I need to identify it uniquely by the label name Price):
sel.xpath('//*/div[@class="summary-item"]/label[text()="Price"]').extract()
However, I am unable to get from there to the value div. I tried using following-sibling but did not succeed; any help will be appreciated.
The existence of child nodes can be part of the predicate. Put the test for label into a predicate on the parent, either as a separate predicate (adding the target node as well):
//div[@class="summary-item"][label[text()="Price"]]/div[@class="value"]
or joined with and:
//div[@class="summary-item" and label[text()="Price"]]/div[@class="value"]
(Note that you don't need //*/div at the start.)
You could use following-sibling if you wanted; it would look like this:
//div[@class="summary-item"]/label[text()="Price"]/following-sibling::div[@class="value"]
(here the label test isn't part of a predicate).
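A minimal Python sketch of the predicate approach, using the standard library's xml.etree.ElementTree. Its XPath subset has neither text() nor following-sibling, but its [label='Price'] predicate keeps elements that have a label child whose complete text equals "Price", which plays the role of [label[text()="Price"]] here. The wrapping summary element and the value_for helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two summary items from the question, wrapped in a hypothetical root.
SUMMARY = """<div class="summary">
  <div class="summary-item">
    <label >Price</label>
    <div class="value">
      0.99 GBP
    </div>
  </div>
  <div class="summary-item">
    <label >Other info</label>
    <div class="value">
      All languages
    </div>
  </div>
</div>"""


def value_for(label_text, xml_text=SUMMARY):
    """Return the stripped value text for the summary item with this label."""
    root = ET.fromstring(xml_text)
    # Two chained predicates: the class test, then the label-child test.
    path = f".//div[@class='summary-item'][label='{label_text}']/div[@class='value']"
    node = root.find(path)
    return None if node is None else node.text.strip()


print(value_for("Price"))
```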
One more thing to be aware of: using XPath to select HTML classes doesn't work the same as using CSS. XPath only matches the exact attribute string, whereas a CSS class selector also matches when the element is in more than one class. In this case it works out okay, but you should watch out for it. Search Stack Overflow if it becomes an issue; there are a few answers describing it.
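To make the difference concrete, this Python sketch (standard-library xml.etree.ElementTree, hypothetical markup) compares an exact @class comparison with a CSS-style class-token test. In pure XPath 1.0 the usual token test is contains(concat(' ', normalize-space(@class), ' '), ' value '); ElementTree cannot evaluate that, so the token test is done in Python with a hypothetical has_class helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second div carries an extra class, as real pages often do.
DOC = """<div>
  <div class="value">0.99 GBP</div>
  <div class="value highlighted">1.99 GBP</div>
</div>"""

root = ET.fromstring(DOC)

# Exact string comparison, like XPath's [@class="value"]: misses the second div.
exact = [d.text for d in root.findall(".//div[@class='value']")]


def has_class(elem, name):
    """CSS-style test: is `name` one of the whitespace-separated class tokens?"""
    return name in (elem.get("class") or "").split()


# Token comparison, like the CSS selector div.value: finds both divs.
token = [d.text for d in root.iter("div") if has_class(d, "value")]

print(exact, token)
```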
I need to scrape information from a website containing property details.
<div class="inner">
<div class="col">
<h2>House in Digana </h2>
<div class="meta">
<div class="date"></div>
<span class="category">Houses</span>,
<span class="location">Kandy</span>
</div>
</div>
<div class="attr polar">
<span class="data">Rs. 3,600,000</span>
</div>
What is the XPath notation for "Kandy" and "Rs. 3,600,000"?
It is not wise to address text nodes directly using text(), because an element's text can be split across several text nodes (for example, by comments or child elements).
Rather, addressing an element node directly returns the concatenation of all descendant text nodes as the element value, which is what people usually want (and think they are getting when they address text nodes).
The canonical example I use in the classroom is this example of OCR'ed content as XML:
<cost>39<!--that 9 may be an 8-->.22</cost>
The value of the element using the XPath address cost is "39.22", but in XSLT 1.0 the value of the XPath address cost/text() is "39", which is not complete. In XSLT 2.0 (which is how the question is tagged), you get two text nodes, "39" and ".22", which look correct if you concatenate them. But if you pass them to a function requiring a singleton argument, you will get a run-time error. When you address an element, its text is concatenated into a single string, which is suitable for a singleton argument.
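The same effect can be reproduced with Python's standard-library xml.etree.ElementTree, as a rough analogy rather than an XSLT processor. Keeping the comment as a node (insert_comments=True) splits the content into the element text "39" and the comment's tail ".22", which correspond to the two text nodes. string_value is a hypothetical helper that concatenates descendant text while skipping comment content, like the string value of an element in XPath.

```python
import xml.etree.ElementTree as ET

SOURCE = "<cost>39<!--that 9 may be an 8-->.22</cost>"

# Keep the comment as a node so the character data is split around it,
# the way the XPath data model sees it.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
cost = ET.fromstring(SOURCE, parser=parser)


def string_value(elem):
    """Concatenate the element's descendant text, skipping comment content."""
    parts = [elem.text or ""]
    for child in elem:
        if isinstance(child.tag, str):  # a real element, not a comment
            parts.append(string_value(child))
        parts.append(child.tail or "")
    return "".join(parts)


# Analogue of cost/text() when only the first text node is used (XSLT 1.0):
first_text = cost.text
# Analogue of the separate text nodes returned for cost/text() in XSLT 2.0:
fragments = [cost.text] + [child.tail for child in cost if child.tail]
# Analogue of addressing the element itself:
full = string_value(cost)

print(first_text, fragments, full)
```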
I tell students that in all of my professional work there are only very (very!) few times that I ever have to use text() in my stylesheets.
So //span[@class='location' or @class='data'] would find the two fields if those were the only such elements in the entire document. You may need to use ".//span" from a location inside the document tree.
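For the listing in the question, a Python sketch with the standard library's xml.etree.ElementTree might look like this. Its XPath subset has no or operator, so each class is looked up separately with a relative .// path from the listing's root. The question's outer div is unclosed; the sketch assumes it ends after the price block.

```python
import xml.etree.ElementTree as ET

# The question's listing markup, with the unclosed outer div closed.
LISTING = """<div class="inner">
  <div class="col">
    <h2>House in Digana </h2>
    <div class="meta">
      <div class="date"></div>
      <span class="category">Houses</span>,
      <span class="location">Kandy</span>
    </div>
  </div>
  <div class="attr polar">
    <span class="data">Rs. 3,600,000</span>
  </div>
</div>"""

root = ET.fromstring(LISTING)

# Address the span elements themselves and take their full text content,
# rather than picking individual text nodes.
location = "".join(root.find(".//span[@class='location']").itertext())
price = "".join(root.find(".//span[@class='data']").itertext())

print(location, price)
```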
I'm trying to automate testing of code that was, well, written without testing in mind (no IDs on many elements, and a lot of elements with the same class names). I would appreciate any help (questions are below the code):
<div id="author-taxonomies" class="menu-opened menu-hover-opened-inactive" onmouseover="styleMenuElement(this)" onmouseout="styleMenuElement(this)" onclick="toggleSFGroup(this)">Author</div>
<div id="author-taxonomies-div" class="opened">
<div id="top-level-menu" class="opened">
<div id="top-level-menu-item-1" class="as-master">
<div class="filter-label"> Name</div>
</div>
<div id="top-level-menu-item-1" class="as-slave"
style="top: 525px; left: 34px; z-index: 100; display: none;"> </div>
<div id="top-level-menu-item-2" class="as-master">
<div class="filter-label">Title</div>
</div>
<div id="top-level-menu-item-2" class="as-slave">
<div id="top-level-menu-item-2" class="as-slave-title as-slave-title-subgroup"
>Title</div>
<div id="top-level-menu-item-2" class="as-slave-body"> </div>
<div class="as-slave-buffer"> </div>
</div>
<div id="top-level-menu-item-3" class="as-master">
<div class="filter-label">Location</div>
</div>
<div id="top-level-menu-item-3" class="as-slave"> </div>
</div>
</div>
The question is: how do I refer to particular labels of this menu, and their properties, with XPath expressions? For example, if I want to:
verify the "Location" label is there
check if "Title" with class "as-slave" is not visible at the moment
It would be something similar to:
//div[@id="top-level-menu-item-3"]/div[@class="filter-label"]
//div[@id="top-level-menu1"] --- then check in code for display: none (assuming you are using Selenium RC)
Update: also be sure to install the following Firefox add-on; it is really useful when trying different XPath expressions on a site: https://addons.mozilla.org/en-US/firefox/addon/1095
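Outside a browser, the structural half of these checks can be sketched with Python's standard-library xml.etree.ElementTree, using the menu markup above with the Author header and event-handler attributes dropped. Because the id values repeat in this markup, the class test is what picks out the as-slave div. The visibility check only reads the inline style attribute, which is an assumption: a real check in Selenium WebDriver (for example WebElement.is_displayed()) also takes stylesheets and ancestors into account. slave_hidden_inline is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The menu markup from the question, without the Author header div.
MENU = """<div id="author-taxonomies-div" class="opened">
  <div id="top-level-menu" class="opened">
    <div id="top-level-menu-item-1" class="as-master">
      <div class="filter-label"> Name</div>
    </div>
    <div id="top-level-menu-item-1" class="as-slave"
         style="top: 525px; left: 34px; z-index: 100; display: none;"> </div>
    <div id="top-level-menu-item-2" class="as-master">
      <div class="filter-label">Title</div>
    </div>
    <div id="top-level-menu-item-2" class="as-slave">
      <div id="top-level-menu-item-2" class="as-slave-title as-slave-title-subgroup">Title</div>
      <div id="top-level-menu-item-2" class="as-slave-body"> </div>
      <div class="as-slave-buffer"> </div>
    </div>
    <div id="top-level-menu-item-3" class="as-master">
      <div class="filter-label">Location</div>
    </div>
    <div id="top-level-menu-item-3" class="as-slave"> </div>
  </div>
</div>"""

root = ET.fromstring(MENU)

# Same path as the answer's first expression, made relative for ElementTree.
location = root.find(".//div[@id='top-level-menu-item-3']/div[@class='filter-label']")


def slave_hidden_inline(item_id):
    """True if the as-slave div for item_id has display: none in its inline style."""
    slave = root.find(f".//div[@id='{item_id}'][@class='as-slave']")
    if slave is None:
        raise LookupError(f"no as-slave div for {item_id}")
    style = (slave.get("style") or "").replace(" ", "")
    return "display:none" in style


print(location.text, slave_hidden_inline("top-level-menu-item-2"))
```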
As a side note: try to avoid using XPath locators in Selenium, if possible. If you have a long XPath expression, it can be up to 20 times slower for Selenium to find the element compared to identifying it by its unique ID. Of course, sometimes there is no alternative to using XPath. However, when you do use it, keep '//' expressions to a minimum; they are a real performance killer.
If you're just starting with Selenium, download the Selenium IDE add-on for Firefox. As you click on DOM elements, Selenium shows you the XPath to access them.
I am currently working on an open-source library for generating XPath expressions through a fluent .NET API. The idea is to be able to generate XPath-based Selenium locators without having to know XPath.
Here's an example of how the library can be used in your case:
XPathFinder.Find.Tag("div").With.Attribute("id", "top-level-menu-item-3").And.Child("div").With.Attribute("class", "filter-label").ToXPathExpression();
This will produce the following xpath:
"//div[@id='top-level-menu-item-3']/div[@class='filter-label']"
Check it out at
http://code.google.com/p/xpathitup/
You can use FirePath, which installs on top of Firebug (both are Firefox add-ons). When you get an XPath from it, don't forget to prepend // before using it, either in code or in Selenium IDE; if you don't, it is unusable. There are two types of XPath: absolute and relative. An absolute path does not depend on id values, so dynamic ids don't affect it (although it breaks whenever the page layout changes). A relative path that anchors on a dynamic id will break with each run.
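The trade-off can be illustrated with Python's standard-library xml.etree.ElementTree on two hypothetical renders of the same page that differ only in a generated id. The position-based path (written relative to the html root, because ElementTree does not accept a leading / on an element) keeps working, while the path anchored on the first run's id fails on the second render. The flip side is that the positional path would break if the layout changed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical renders of the same page: only the generated id differs.
RUN_1 = '<html><body><div id="widget_123"><span>Total</span></div></body></html>'
RUN_2 = '<html><body><div id="widget_987"><span>Total</span></div></body></html>'

# Position-based path from the <html> root: never mentions the id.
ABSOLUTE = "./body/div[1]/span"
# Relative path anchored on the id recorded during the first run.
BY_ID = ".//div[@id='widget_123']/span"


def finds(path, doc):
    """True if the path selects at least one element in the document."""
    return ET.fromstring(doc).find(path) is not None


for path in (ABSOLUTE, BY_ID):
    print(path, finds(path, RUN_1), finds(path, RUN_2))
```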