Trying to find the right XPath or import function to get the live conversion amount on this page:
https://www.xe.com/currencyconverter/convert/?Amount=1&From=AUD&To=PHP
I need the converted amount that the page displays, but I am stuck on how to select it.
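Since the converted amount on that page is rendered client-side with JavaScript, a plain importXML fetch will most likely come back empty. A minimal Selenium sketch, assuming the number ends up in a <p> element whose text mentions "Philippine Pesos" (that locator is an assumption - inspect the live page and adjust it):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get("https://www.xe.com/currencyconverter/convert/?Amount=1&From=AUD&To=PHP")
# Wait for the client-side app to render the converted amount before reading it.
# The XPath below is an assumed locator; change it to whatever element wraps the number.
result = WebDriverWait(browser, 10).until(
    EC.visibility_of_element_located((By.XPATH, "//p[contains(text(), 'Philippine Pesos')]"))
)
print(result.text)
browser.quit()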
I'm trying to scrape match statistics from a football match played yesterday at the following URL:
https://www.flashscore.com/match/8S0QVm38/#match-statistics;0
I've written code that just has WebDriver select the stats I want and print them, so I can then see what I want to use. My code is:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
browser.get("https://www.flashscore.com/match/8S0QVm38/#match-statistics;0")
print(browser.find_elements_by_class_name("statText--homeValue"))
A list of elements is printed out and, to be honest, I don't know if this was what I was looking for, because what is returned doesn't show anything I can identify with what I'm looking at in the developer tools.
I'm trying to get all the numbers under Statistics, like Possession and Shots on Target, but print returns a list of xpaths like this, where the session is the same but the element is always different:
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="da88ca87-e318-934e-ba75-dca1d652cd37", element="c53f5f3e-2c89-b34c-a639-ab50fbbf0c33")>,
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="da88ca87-e318-934e-ba75-dca1d652cd37", element="3e422b45-e26d-de44-8994-5f9788462ec4")>,
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="da88ca87-e318-934e-ba75-dca1d652cd37", element="9e110a54-4ecb-fb4b-9d8f-ccd1b210409d")>, <
Anyone know why this is and what I can do to get the actual numbers?
What you're getting are not XPaths but a list of WebElement objects. To get the text from each one, try:
print([node.text for node in browser.find_elements_by_class_name("statText--homeValue")])
You have printed the element objects instead of their actual contents. To get those you have to call .text on each element, like:
elements = browser.find_elements_by_class_name("statText--homeValue")
for element in elements:
    print(element.text)
You can also opt for the list comprehension shown in Andersson's answer.
Hope this helps! Cheers!
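Flashscore fills the statistics in with JavaScript, so the list can come back empty if it is read too early. A small sketch, reusing the WebDriverWait, EC and By imports from the question and assuming the class name still matches, that waits for the values before printing them:

wait = WebDriverWait(browser, 10)
# Wait until the home-value stats exist in the DOM, then print each one's text.
wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "statText--homeValue")))
for element in browser.find_elements_by_class_name("statText--homeValue"):
    print(element.text)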
I'm trying to do web scraping with importXML in a Google Spreadsheet, reading the content of this page:
http://ddp.usach.cl/procesos-de-seleccion-internos
What I need to do is select the list below "Lista de Procesos" and separate it into rows. I went to the page, inspected it, and copied the XPath
//*[@id="node-page-442"]/div[1]/div/div/div/p[5]
Resulting in this code:
=importxml("http://ddp.usach.cl/node/442";"//*[@id='node-page-442']/div[1]/div/div/div/p[7]/text()")
However, when I try to load it I get an #N/A error: "Imported content is empty".
One path to get the nodes following the h4 element with the content "Lista de Procesos" is
//article[@id='node-page-442']/div[contains(@class, 'content')]/div[contains(@class, 'field-name-body')]/div[@class='field-items']/div[contains(@class,'field-item')]/h4[contains(text(), 'Lista de Procesos')]/following-sibling::*
The retrieved children are not structured, but they are complete. If you can use XSLT 2.0, you could structure them with for-each-group and group-starting-with='strong'. But this is only one possibility.
The expression could be reduced to the simple term:
//h4[contains(text(),'Lista de Procesos')]/following-sibling::*
Maybe this suits your needs better.
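Wired into an importXML call it would look like this (a sketch that keeps the semicolon argument separator and the URL from the question):

=importxml("http://ddp.usach.cl/node/442";"//h4[contains(text(),'Lista de Procesos')]/following-sibling::*")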
I'm scraping a website that has JavaScript-based pagination, so I want to extract the page number from the @href attribute. This is what the link looks like:
<a href="javascript:AllerAPage('1', 'element_id');">Page 1</a>
Scrapy selectors support regular expressions:
sel.xpath('//a/@href').re(r"javascript:AllerAPage\('(\d+)',")
Note that the //a/@href XPath expression is just an example - yours may be different.
A demo showing that the regex I've provided works:
>>> import re
>>> s = "javascript:AllerAPage('1', 'element_id');"
>>> re.search(r"javascript:AllerAPage\('(\d+)',", s).group(1)
'1'
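Inside a spider callback the same idea can be applied directly to the response (a sketch; the spider name, start URL and item shape are placeholders):

import scrapy

class PagesSpider(scrapy.Spider):
    name = "pages"
    start_urls = ["http://example.com/list"]  # placeholder start URL

    def parse(self, response):
        # Extract every page number from the javascript: hrefs on the page.
        for number in response.xpath("//a/@href").re(r"javascript:AllerAPage\('(\d+)',"):
            yield {"page": number}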
I have a problem with scraping one website - motoallegro.
I want to get the titles of all the ads on this page.
So I set up this formula in a Google spreadsheet:
=ImportXML("http://allegro.pl/samochody-149?order=qd&string=Primera+GT&search_scope=automotive&department=automotive";"//header/h2/a/span")
This formula always returns an #N/A error: "not received any data as a result of XPath queries".
But if I try to get other data from the same page, for example the H1 text:
=ImportXML("http://allegro.pl/samochody-149?order=qd&string=Primera+GT&search_scope=automotive&department=automotive";"//h1/span")
The result is correct: "Primera GT"
I want to add that the XPath rule //header/h2/a/span IS CORRECT. I tested it with a few Firefox XPath plugins.
Any ideas why the Google spreadsheet ImportXML formula with a correct XPath rule does not return the correct data?
Google seems to strip non-HTML4 tags like <header/> and <section/>. You could use the <div id="listing">...</div> element to access only the headlines you need.
Try this XPath expression:
//div[@id='listing']//h2/a/span
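Plugged into the original formula it would look like this (a sketch, using the same URL and semicolon separator as in the question):

=ImportXML("http://allegro.pl/samochody-149?order=qd&string=Primera+GT&search_scope=automotive&department=automotive";"//div[@id='listing']//h2/a/span")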
I have a portion of HTML like the one below:
<li><label>The Keyword:</label><span>The text</span></li>
I want to get the string "The Keyword: The text".
I know that I can get the XPath of the above HTML using Chrome's inspector or Firefox's Firebug, then hxs.select(xpath).extract(), then strip the HTML tags to get the string. However, that approach is not generic enough, since the XPath is not consistent across different pages.
Hence, I'm thinking of the approach below.
First, search for "The Keyword:" using:
hxs = HtmlXPathSelector(response)
hxs.select('//*[contains(text(), "The Keyword:")]')
When I pprint it, I get some output back:
>>> pprint( hxs.select('//*[contains(text(), "The Keyword:")]') )
<HtmlXPathSelector xpath='//*[contains(text(), "The Keyword:")]' data=u'<label>The Keyword:</label>'>
My question is how to get the wanted string: "The Keyword: The text". I am trying to work out how to determine the XPath; if the XPath is known, then of course I can get the wanted string.
I am open to any solution other than Scrapy's HtmlXPathSelector (e.g. lxml.html might have more features, but I am very new to it).
Thanks.
If I understand your question correctly, "following-sibling" is what you are looking for.
//*[contains(text(), "The Keyword:")]/following-sibling::span/text()
XPath Axes
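To stitch the label and its sibling together into the single string asked for, a minimal sketch using lxml.html (which the question mentions), run against the HTML snippet from the question:

import lxml.html

html = "<li><label>The Keyword:</label><span>The text</span></li>"
tree = lxml.html.fromstring(html)

# Find the label by its text, then grab the text of the following <span>.
label = tree.xpath('//*[contains(text(), "The Keyword:")]')[0]
sibling_text = label.xpath('following-sibling::span/text()')[0]
print(label.text + " " + sibling_text)  # -> The Keyword: The text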