Xpath working in firepath but not in scrapy

In Google Finance I want to save the rows of the table containing the companies' info, but the XPath that works in FirePath, i.e. .//*[@id='gf-viewc']/div/div[2]/form/table/tbody/child::*, yields an empty list in Scrapy when I run response.xpath(".//*[@id='gf-viewc']/div/div[2]/form/table/tbody/child::*").extract().
Any idea why?

tbody is something you should exclude from the expression: it is generated by the browser to "support the table structure", so it is usually absent from the raw HTML that Scrapy downloads:
response.xpath(".//*[@id='gf-viewc']/div/div[2]/form/table/child::*").extract()
While this works, I'd improve on locating the table and its rows:
response.css("table.company_results tr")
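To see the effect without a live request, here is a minimal sketch that uses Python's standard-library ElementTree as a stand-in for Scrapy's selectors. The markup is hypothetical (it is not the real Google Finance page) and is kept well-formed so the XML parser accepts it. ElementTree only supports a subset of XPath, so `*` replaces `child::*`.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw response: as sent by the server, the table has no <tbody>.
RAW_HTML = """
<html><body>
  <div id="gf-viewc">
    <table class="company_results">
      <tr><td>Company A</td></tr>
      <tr><td>Company B</td></tr>
    </table>
  </div>
</body></html>
""".strip()

root = ET.fromstring(RAW_HTML)

# The browser-derived path includes tbody, which the raw markup lacks.
with_tbody = root.findall(".//*[@id='gf-viewc']/table/tbody/*")

# Dropping tbody matches the rows that are actually in the source.
without_tbody = root.findall(".//*[@id='gf-viewc']/table/*")

print(len(with_tbody))     # 0
print(len(without_tbody))  # 2
```

The same comparison can be made in `scrapy shell` by running both expressions against the downloaded response.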

Related

Web Crawling using import.io

I am trying to crawl the following website, https://goo.gl/THqDhD, using the import.io tool. I used the connector tool to parse the whole search result for a specific query (including pagination) and successfully selected all the rows in the search result, but I was unable to select the items' image box (as a column).
import.io allows manually overriding the XPath for the selection, so I tried to select the images in the search results using the following XPath:
.//*[@id='container-inner']/div[3]/div[4]/div[*]/div[1]/div/a/
which should represent the columns of the table, but I got the following error:
What you have selected is not within a result
The result here is the previously selected rows, but I inspected the item box and made sure that the selection is inside it. Any help, please?

Extract href in table with importxml in Google spreadsheet

I am trying to pull the href for each row of each table from this website:
http://www.epa.gov/region4/superfund/sites/sites.html#KY
I can pull the table information off using =IMPORTHTML(A1,"table",1) for all 7 tables, but I need the href to the site with the detailed information.
Using =IMPORTXML(A1,"//div[@class='box']") I can pull the information needed from a site like:
http://www.epa.gov/region4/superfund/sites/fedfacs/alarmyaplal.html
but I need to extract the fedfacs/alarmyaplal.html portion for each row on the original page.
I've tried using //@href, but it is not returning any results. I think that is because the data is structured in a table, but I'm stuck on where to go from here.

I'm not sure about any of the Google Spreadsheet functionality, but here's an XPath to select all href attributes of the Kentucky sites (since your first link included the 'ky' anchor):
//body//a[@id='ky']/following-sibling::table[1]/tbody/tr/td[1]/strong/a/@href
This is very specific to the Kentucky table: following-sibling::table[1] means the first table node after, and at the same level as, a[@id='ky'].
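As a rough illustration of what following-sibling::table[1] selects, the sketch below emulates that axis with plain iteration, because Python's standard-library ElementTree does not implement it. The markup and the `hrefs_after_anchor` helper are hypothetical; the real page may nest the anchor differently, and apart from fedfacs/alarmyaplal.html the href values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: each state anchor is followed by its table.
PAGE = """
<body>
  <a id="al"></a>
  <table><tr><td><strong><a href="fedfacs/alarmyaplal.html">AL 1</a></strong></td></tr></table>
  <a id="ky"></a>
  <table>
    <tr><td><strong><a href="sites/ky-one.html">KY 1</a></strong></td><td>x</td></tr>
    <tr><td><strong><a href="sites/ky-two.html">KY 2</a></strong></td><td>y</td></tr>
  </table>
  <table><tr><td><strong><a href="sites/other.html">Other</a></strong></td></tr></table>
</body>
""".strip()


def hrefs_after_anchor(body, anchor_id):
    """Return hrefs from the first <table> that follows <a id=anchor_id>.

    Assumes the anchor is a direct child of `body`.
    """
    siblings = list(body)
    for index, node in enumerate(siblings):
        if node.tag == "a" and node.get("id") == anchor_id:
            # Equivalent of following-sibling::table[1]: the first table
            # after the anchor, at the same level.
            table = next((s for s in siblings[index + 1:] if s.tag == "table"), None)
            if table is None:
                return []
            # First cell of each row, then strong/a, then the href attribute.
            return [a.get("href") for a in table.findall("tr/td[1]/strong/a")]
    return []


body = ET.fromstring(PAGE)
ky_hrefs = hrefs_after_anchor(body, "ky")
print(ky_hrefs)
```

Only the table directly after the 'ky' anchor contributes results; the tables before and after it are ignored.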

better selenium xpath is expecting

I'm trying to create an XPath expression that will work with Selenium, using the following HTML snippet.
The table below contains rows that are each added with a uniquely generated id (in the following snippet that id is 1000).
Selenium created the following expressions when the row with id 1000 was added to the table. However, instead of using the id, I want to build the XPath from the third data element in the row, which is (MyName) in the HTML snippet.

A possible suggestion is to not use xpath whenever possible.
http://saucelabs.com/blog/index.php/2011/05/why-css-locators-are-the-way-to-go-vs-xpath/

You need to convert the places in the XPATH where it refers to the row by its ID to the row's relative position in the table.
In all of your XPATHs, you would change tr[@id='1000'] to tr[3].
Your first example XPATH would look like this:
//tr[3]/td[1]/a[1]/img
Your second example, //tr[@id='1000']/td[1]/span/a/img, would follow similarly:
//tr[3]/td[1]/span/a/img
As would your third:
//tr[3]/td[1]/a[2]/img
Hopefully you are now able to change the rest of them.
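A quick way to check that the id-based and position-based expressions agree is to run both against a small table. This is only a sketch: the markup is hypothetical (the original snippet is not shown here), and it uses the standard-library ElementTree rather than Selenium, with a leading `.` because ElementTree paths are relative to an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: the row with id 1000 happens to be the third row.
TABLE = """
<table>
  <tr id="998"><td><span><a><img alt="a"/></a></span></td><td>x</td><td>Other</td></tr>
  <tr id="999"><td><span><a><img alt="b"/></a></span></td><td>x</td><td>Another</td></tr>
  <tr id="1000"><td><span><a><img alt="c"/></a></span></td><td>x</td><td>MyName</td></tr>
</table>
""".strip()

table = ET.fromstring(TABLE)

# Locate the image through the generated row id...
by_id = table.find(".//tr[@id='1000']/td[1]/span/a/img")

# ...and through the row's position in the table.
by_position = table.find(".//tr[3]/td[1]/span/a/img")

print(by_id is by_position)  # True
```

The positional form only holds while the row stays third in the table; if rows are inserted above it, the index has to change.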

YQL + TABLE + XPATH

I'm working with YQL. I understand how to make a simple query to a web page and select content with xpath.
For example: select * from html where url="http://www.animeclick.it/manga.php?xtit=Ranmaru+XXX" and xpath="/html/body/div/table/tr/td/table/tr/td/div/div/img[contains(@src,'manga')]".
Now, there are limitations to this approach. I can't log in to the site, I can't repeat different information from the page (I know I can make more queries or add a new XPath expression), and I can't format the output
(for example, when a div contains this content:
"<p> Hello <a src="#"> Boy!</a></p>",
I need just the text "Hello boy").
How to use YQL OPEN TABLE for this scope!??!

How to use YQL OPEN TABLE for this scope!??!
Please take the time to have a thorough read through the Creating YQL Open Data Tables chapters in the YQL docs.
In particular, an <execute> block (docs) will enable you to do all of the things that you mentioned above.

Selecting table data from a webpage

I'm trying to get the results from the Empire magazine website (the "Film Reviews (Popular Matches)" table) using YQL, with http://www.empireonline.com/search/default.asp?search=Dragonheart as an example. I'm using Firebug to get the XPath, but it doesn't return any results. This is what I'm using:
select * from html where url='http://www.empireonline.com/search/default.asp?search=cars' and xpath='/html/body/table[3]/tbody/tr[5]/td[2]/table[2]/tbody/tr/td/table[2]/tbody/tr/td/table[2]'
The following query does return results:
select * from html where url='http://www.empireonline.com/search/default.asp?search=cars' and xpath='//table'
But that's a whole lot of data I don't need to chuck about.

You just need to be mindful when crafting the appropriate XPath query. The following gets the link and name of each of the reviews listed in that HTML table by first targeting the "Film Reviews (Popular Matches)" paragraph, then navigating to the list of films.
SELECT href, strong
FROM html
WHERE url = 'http://www.empireonline.com/search/default.asp?search=Thor'
AND xpath = '
//p[.="Film Reviews (Popular Matches)"]
/ancestor::table[1]
/following-sibling::table[1]
//td[2]/a
'
(Try this query in the YQL console.)
