I have worked out an XPath that gets very close to what I need, but it needs some small refinements.
https://www.punters.com.au/form-guide/
I want all URLs from the website for races taking place today, and only in Australia.
These are the xpaths I have now.
This one provides all the races on the page, including all countries racing today - //*[@class='component-wrapper form-guide-index']/table[1]/tbody/tr//td/a/@href
This one provides all races in Australia, but includes races today, tomorrow or any other day on the webpage - //tr[@class="upcoming-race__row"][preceding::tr[@class='upcoming-race__row upcoming-race__row--country'][1][*/.="Australia"]]/td[position()>=2]/a/@href
OK. So this is the related topic :
xpath to obtain texts between 2 tags in IMPORTXML formula
To get the links of all races in Australia today (replace " with ' in GoogleSheets) :
//tr[@class="upcoming-race__row"][preceding::td[@class="upcoming-race__country-title"][1][.="Australia"]][preceding::h2[1][.="Today"]]/td[position()>=2]/a/@href
Alternative XPaths :
//h2[.="Today"]/following::table[1]//tr[@class="upcoming-race__row"][preceding::td[@class='upcoming-race__country-title'][1][.="Australia"]]/td[position()>=2]/a/@href
//div[@class="component-wrapper form-guide-index"]/table[1]//tr[@class="upcoming-race__row"][preceding::td[@class='upcoming-race__country-title'][1][.="Australia"]]/td[position()>=2]/a/@href
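To see what the two preceding::...[1] predicates are doing, here is a rough Python sketch of the same logic using only the standard library. The HTML fragment is a simplified, made-up stand-in for the page (the real markup may differ), and australian_links_today is a hypothetical helper name. ElementTree does not support the preceding axis, so the sketch walks the document in order and remembers the most recent heading and country title instead.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for the form-guide markup (an assumption).
SAMPLE = """
<div class="component-wrapper form-guide-index">
  <h2>Today</h2>
  <table>
    <tr class="upcoming-race__row upcoming-race__row--country">
      <td class="upcoming-race__country-title">Australia</td>
    </tr>
    <tr class="upcoming-race__row">
      <td>Flemington</td>
      <td><a href="/au-today-r1">R1</a></td>
      <td><a href="/au-today-r2">R2</a></td>
    </tr>
    <tr class="upcoming-race__row upcoming-race__row--country">
      <td class="upcoming-race__country-title">New Zealand</td>
    </tr>
    <tr class="upcoming-race__row">
      <td>Ellerslie</td>
      <td><a href="/nz-today-r1">R1</a></td>
    </tr>
  </table>
  <h2>Tomorrow</h2>
  <table>
    <tr class="upcoming-race__row upcoming-race__row--country">
      <td class="upcoming-race__country-title">Australia</td>
    </tr>
    <tr class="upcoming-race__row">
      <td>Randwick</td>
      <td><a href="/au-tomorrow-r1">R1</a></td>
    </tr>
  </table>
</div>
"""


def australian_links_today(markup, day="Today", country="Australia"):
    """Hypothetical helper mirroring the XPath predicates in plain Python."""
    root = ET.fromstring(markup)
    current_day = None      # plays the role of preceding::h2[1]
    current_country = None  # plays the role of preceding::td[...country-title][1]
    links = []
    for el in root.iter():  # iter() visits elements in document order
        cls = el.get("class", "")
        if el.tag == "h2":
            current_day = (el.text or "").strip()
        elif el.tag == "td" and cls == "upcoming-race__country-title":
            current_country = (el.text or "").strip()
        elif el.tag == "tr" and cls == "upcoming-race__row":
            if current_day == day and current_country == country:
                # td[position()>=2]/a/@href: skip the first cell of the row
                for td in el.findall("td")[1:]:
                    links.extend(a.get("href") for a in td.findall("a"))
    return links


print(australian_links_today(SAMPLE))
```

On this sample only the two Australian races under the "Today" heading are returned; the New Zealand row and the "Tomorrow" table are skipped, which is what the combined predicates are meant to achieve.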
Related
I'm trying to bring player data from hltv into Sheets with IMPORTXML but can't get it to work. I've discovered that there are multiple div classes in a row, and inside them there are spans where the actual data is.
I have tried multiple ways to get either all the info together or one item at a time, but I'm starting to run out of options.
For example:
=IMPORTXML("https://www.hltv.org/stats/players/11893/ZywOo","//@class='Statistics-row'//@class='columns'")
Also I have tried to get players from certain country in https://www.hltv.org/stats/players
Can someone help?
An alternative to @Madhurjya's proposal. With the IMPORTFROMWEB add-on you can have :
XPaths used :
//div[@class="statistics"]//span[1]
//div[@class="statistics"]//span[2]
Formula :
=IMPORTFROMWEB(B1;B2:C2)
But also :
Xpaths used :
//a[preceding-sibling::img[@alt="France"]]
//img[@alt="France"]/@alt
Formula :
=IMPORTFROMWEB(B1;B2:C2)
Note : the number of requests is limited. Check the pricing or code your own Google Apps Script.
I am a noob at importXML. The XPath to the number of likes is
//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/a/span
So the formula for scraping the number of likes from this post: https://www.instagram.com/p/BZLli5ll6yz/ should be:
=IMPORTXML("https://www.instagram.com/p/BZLli5ll6yz/", "//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/a/span")
Right? What am I missing?
Make sure that "react-root" inside the XPath is wrapped in single quotes: 'react-root'. Otherwise the inner double quotes end the second argument early; single quotes keep the whole XPath contained within the second argument.
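As a small illustration of the quoting rule, here is a Python sketch that builds the formula text and swaps the double quotes inside the XPath for single quotes. build_importxml is a hypothetical helper written for this example, not part of any Sheets API, and it only addresses the quoting, not whether the page actually returns the element.

```python
def build_importxml(url, xpath):
    """Return the text of an IMPORTXML formula (hypothetical helper)."""
    # Sheets string literals are delimited by double quotes, so attribute
    # values inside the XPath are rewritten to use single quotes.
    safe_xpath = xpath.replace('"', "'")
    return f'=IMPORTXML("{url}", "{safe_xpath}")'


formula = build_importxml(
    "https://www.instagram.com/p/BZLli5ll6yz/",
    '//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/a/span',
)
print(formula)
```

The naive replace would break an XPath whose string literals themselves contain an apostrophe, so treat it as a sketch for simple attribute tests like this one.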
I am a beginner to programming in general and google in particular. I've been trying to get this (what seems to me) simple web query working for a while using the importxml() function. I am trying to pull a reference from a citation generation website, where you search a pubmed ID number (PMID).
The site is https://mickschroeder.com/citation/?q=18515037 where 18515037 is the PMID. This brings up a citation.
Allison MA, Kwan K, Ditomasso D, Wright CM, Criqui MH. The epidemiology of
abdominal aortic diameter. J Vasc Surg. 2008;48(1):121-7.
I did inspect element and got the XPath as:
//*[@id="citation_formatted"]/text()
So I have tried
=importxml(ttps://mickschroeder.com/citation/?q=18515037, "//*[@id="citation_formatted"]/text()")
And it returns #N/A or blank. I've tried taking out the * but can't get it working. Do I need to escape the () in text()? Or do I have the XPath totally wrong? I did search for the answer, but I'm so new that I can't apply those concepts.
Thanks for any help you can give.
I am trying to import both the link to the Google Maps image and the address of the council from https://www.google.com.au/?gws_rd=ssl#q=Albany+City+council+address&time=445678
I have tried all sorts of Xpath expressions and keep getting a result saying the imported results were empty.
For the address I have tried:
//*[@class='_uX kno-fb-ctx']
//div[@class='_eF']
//*[@class='_eF']
//div/div/div/div/div/div/div/div/div/ol/li/div/div/div/ol/li/div
The info I want appears in 3 places on that page - so any Xpath that gets it from one of these locations is what I am looking for:
<div class="_uX kno-fb-ctx" aria-level="3" role="heading" data-hveid="29" data-ved="0CB0QtwcoADAA"><div class="_eF">102 North Road, Yakamia WA 6330</div>
id="lnv_href"></a></div></td><td valign="top" style="color:#222;line-height:1.24">102 North Road, Yakamia WA 6330
<div class="_lR"><div class="_mr"><span style="font-weight:bold">Address:</span> <span>102 North Road, Yakamia WA 6330</span>
Any help would be greatly appreciated.
1) The info you want appears in 2 places on that page; probably the best way is to use an XPath expression with contains().
2) for map use
//*[@id="media_result_group"]/ol/div/div/div/div[2]/div/div[3]/a/img/@src
or simply
//div[@class="rhsg4 rhsmap5col"]/a/img/@src
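The contains() idea from point 1 can be tried offline against the fragment quoted in the question. In XPath the test would look something like //div[contains(@class, 'kno-fb-ctx')]. The sketch below is only an approximation of that: Python's built-in ElementTree does not implement contains(), so the substring check is done in Python instead. text_where_class_contains is a hypothetical helper, and the closing </div> was added so the fragment parses.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the question, with the outer <div> closed so it parses.
SNIPPET = (
    '<div class="_uX kno-fb-ctx" aria-level="3" role="heading" '
    'data-hveid="29" data-ved="0CB0QtwcoADAA">'
    '<div class="_eF">102 North Road, Yakamia WA 6330</div>'
    '</div>'
)


def text_where_class_contains(markup, token):
    """Return the text of every div whose class attribute contains token."""
    root = ET.fromstring(markup)
    results = []
    for el in root.iter("div"):  # includes the root element itself
        if token in el.get("class", ""):
            # itertext() gathers text from the element and its descendants
            results.append("".join(el.itertext()).strip())
    return results


print(text_where_class_contains(SNIPPET, "kno-fb-ctx"))
print(text_where_class_contains(SNIPPET, "_eF"))
```

This only shows that the expression matches the static fragment; it says nothing about whether the same markup is present in what the spreadsheet function actually receives.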
In my opinion, you can't do that because those data from Google result page can't be rendered easily by xpath into google spreadsheet. The reason is they are rendered by javascript (more tech savvy guys will correct me if I am wrong).
The answer marked correct is incorrect. The content I was looking for is plain HTML and can be captured with IMPORTHTML("URL", "table", 4), but I did need to add WA (which stands for Western Australia) to the search string.
I'm trying to scrape data from a website which does not seem to have too many classes in its tags. However, I'm still wondering whether it is possible to scrape today's titles using XPath,
so that it only retrieves the titles from 09/4 - 2015?
url: http://www.hltv.org/?pageid=96
Since the date 10/4 - 2015 is unique, you can locate the b tag node using XPath's contains() (see the HTML here):
//b[contains(., '10/4 - 2015')]
then, based on this node, you go to its parent and siblings, something like this (not tested):
//b[contains(., '10/4 - 2015')]/parent::div/siblings::div
Update
Since the items for the current date are at the bottom, according to the HTML all the following-sibling nodes belong to this date (search for "xpath sibling after"). Note that XPath has no siblings axis; following-sibling is the valid one:
//b[contains(., '10/4 - 2015')]/parent::div/following-sibling::div[@class='newsItem']
See test here. If you want to fetch the divs in between, then explore this.
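For anyone who wants to check the following-sibling idea without the live page, here is a rough Python sketch using only the standard library. The markup is a made-up stand-in for the news list (the real page may be structured differently) and items_after_date is a hypothetical helper. ElementTree has no following-sibling axis, so the siblings are taken by slicing the parent's list of children.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the news list (an assumption about the layout).
SAMPLE = """
<div id="news">
  <div><b>09/4 - 2015</b></div>
  <div class="newsItem">Older story</div>
  <div><b>10/4 - 2015</b></div>
  <div class="newsItem">First story</div>
  <div class="newsItem">Second story</div>
</div>
"""


def items_after_date(markup, date_text):
    """Return the text of newsItem divs that follow the matching date div."""
    root = ET.fromstring(markup)
    children = list(root)
    for index, child in enumerate(children):
        b = child.find("b")  # direct <b> child, as in //b[...]/parent::div
        if b is not None and date_text in (b.text or ""):
            # following-sibling::div[@class='newsItem']
            return [
                "".join(sib.itertext()).strip()
                for sib in children[index + 1:]
                if sib.get("class") == "newsItem"
            ]
    return []


print(items_after_date(SAMPLE, "10/4 - 2015"))
```

Asking for "09/4 - 2015" on this sample returns all three stories, which shows why the approach is only exact when the wanted date is the last one in the list.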