IMPORTXML Google Sheets for every 2nd node? - xpath

I'm having trouble trying to get a value with IMPORTXML in a google spreadsheet ...
I am using as xpath:
//*[contains(@class,"price")], which returns ALL prices posted on a web page without any trouble.
The problem is that within that same class (and, I don't know why, with dynamic IDs!) I have 2 nodes/prices: "Registered Customer Price" and "Non-Customer Price". The latter is the 2nd value and the one I am interested in obtaining.
So, I wanted to apply it like this:
(//*[contains(@class,"price")])[2] and with this, I only get the 2nd price... but of the whole page!
(and not the 2nd price of each and every item!)
I assume it is a "syntax" problem ... but no matter how many times I try it, I don't get the expected result!
Can you give me a hand with this?
Thanks in advance for any suggestion!

Just use :
//div[@class='price-box'][2]//span[@id]
Output :
EDIT : With IMPORTFROMWEB:
//h4[.="Precio unitario por unidad"]/following-sibling::span/span[@id]
EDIT 2 : More robust XPath :
//h4[.="Precio unitario por unidad"]/following-sibling::span[@class="price-excluding-tax"][count(following-sibling::*)=0]/span[@id]
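The difference between indexing the whole node-set and indexing inside each item can be sketched in Python. This is only an illustration: the markup below is hypothetical (two prices per item, the non-customer price second) and is not the real page, and the standard library's ElementTree stands in for the XPath engine used by Google Sheets.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each item holds two prices, customer first, non-customer second.
PAGE = """
<html><body>
  <div class="item"><span class="price">10.00</span><span class="price">12.00</span></div>
  <div class="item"><span class="price">20.00</span><span class="price">24.00</span></div>
  <div class="item"><span class="price">30.00</span><span class="price">36.00</span></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# All prices in document order, comparable to //*[contains(@class,"price")].
all_prices = [e.text for e in root.iter("span") if "price" in e.get("class", "")]

# Comparable to (//*[contains(@class,"price")])[2]:
# the index applies to the whole node-set, so a single value comes back.
second_on_page = all_prices[1]

# Per-item indexing: take the second price inside each item.
second_per_item = []
for item in root.iter("div"):
    prices = [e.text for e in item.iter("span") if "price" in e.get("class", "")]
    if len(prices) >= 2:
        second_per_item.append(prices[1])

# Alternative: keep every 2nd entry of the flat list.
# This matches the per-item result only when every item has exactly two prices.
every_second = all_prices[1::2]

print(second_on_page)
print(second_per_item)
print(every_second)
```

Filtering the flat list is simpler, but it silently shifts if one item has a single price, so the per-item form is the safer of the two.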

try:
=FILTER(IMPORTXML(
"http://www.maxiconsumo.com/sucursal_villa_dominico/comestibles/aceites/aceite-girasol.html";
"//*[contains(@id,'price-including-tax')]"); MOD(ROW(INDIRECT("A1:A"&COUNTA(IMPORTXML(
"http://www.maxiconsumo.com/sucursal_villa_dominico/comestibles/aceites/aceite-girasol.html";
"//*[contains(@id,'price-including-tax')]")))); 2)=0)

Related

How to properly scrape filtered content using an XPath query in Google Sheets?

So, this is about content from a website that I want to get and put into my Google Sheet, but I'm having difficulty understanding the class of the content.
target link: https://www.cnbc.com/quotes/?symbol=XAU=
This number is what I want to get. Picture 1: The part which I want to scrape
And this is what the code looks like in inspector. Picture 2: The code shown in inspector
The target is inside a span element, but its attributes look very complicated to me, so I tried to simplify it using this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span")
Picture 3: List is shown when putting the code
After some tries, I was able to get the right target, but it confuses me. I'm using this formula: =IMPORTXML("https://www.cnbc.com/quotes/?symbol=XAU=","//table[@class='quote-horizontal regular']//tr/td/span[@class='last original'][1]")
Picture 4: The right target is shown when the xpath query is more specified
As you can see in the 2nd picture, 'last original' is not really the full name of the class. When I used 'last original ng-binding' instead, it gave me an error saying the imported content is empty.
So, please correct me if my code is wrong, or if it only worked by accident and there is another, correct way.
How about this answer?
Modified formula 1:
When the class name can be either last original or last original ng-binding, how about the following XPath and formula?
=IMPORTXML(A1,"//span[contains(@class,'last original')][1]")
In this case, the URL https://www.cnbc.com/quotes/?symbol=XAU= is put in cell "A1".
Here, //span[contains(@class,'last original')][1] is used as the XPath. It retrieves the value of the span whose class name includes last original, so it matches both last original and last original ng-binding.
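Why the exact match fails while contains() succeeds can be sketched in Python. The snippet assumes, as one plausible explanation that the thread does not confirm, that the HTML served to IMPORTXML carries the class last original and that ng-binding is only added later in the browser. The markup and the price value are made up for illustration.

```python
import xml.etree.ElementTree as ET

def make_doc(class_value):
    # Hypothetical one-cell table holding the quote value.
    return ET.fromstring(
        f'<table><tr><td><span class="{class_value}">1715.30</span></td></tr></table>'
    )

def exact_match(root, wanted):
    # Comparable to //span[@class='...']: the whole attribute must be equal.
    return [s.text for s in root.iter("span") if s.get("class") == wanted]

def contains_match(root, wanted):
    # Comparable to //span[contains(@class,'...')]: a substring is enough.
    return [s.text for s in root.iter("span") if wanted in s.get("class", "")]

served = make_doc("last original")               # assumed raw HTML
rendered = make_doc("last original ng-binding")  # assumed DOM seen in the inspector

print(exact_match(served, "last original ng-binding"))
print(contains_match(served, "last original"))
print(contains_match(rendered, "last original"))
```

The substring test works against either version of the attribute, which is why it is the more forgiving choice when the inspector and the raw HTML may differ.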
Modified formula 2:
As another XPath, how about the following formula?
=IMPORTXML(A1,"//meta[@itemprop='price']/@content")
It seems that the value is included in the metadata. So this sample retrieves the value from the metadata.
Reference:
IMPORTXML
To complete @Tanaike's answer, two alternatives :
=IMPORTXML(B2;"//span[@class='year high']")
"Year high" seems always equal to the current stock index value.
Or, with value retrieved from the script element :
=IMPORTXML(B2;"substring-before(substring-after(//script[contains(.,'modApi')],'""last\"":\""'),'\')")
Note : I'm based in Europe, so my formulas use ; as the argument separator. Depending on your locale, you may need to replace ; with , in the formulas.
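The substring-after / substring-before pair used on the script element can be mimicked in Python to see what it extracts. The script text below is invented for illustration (the real page content is not shown in the thread). The helper names are hypothetical too, and they follow the XPath 1.0 rule of returning an empty string when the marker is missing.

```python
def substring_after(text, marker):
    # Like XPath substring-after(): the part following the first match, or "".
    _, sep, tail = text.partition(marker)
    return tail if sep else ""

def substring_before(text, marker):
    # Like XPath substring-before(): the part preceding the first match, or "".
    head, sep, _ = text.partition(marker)
    return head if sep else ""

# Hypothetical script content with backslash-escaped quotes.
script = r'modApi({\"symbol\":\"XAU=\",\"last\":\"1715.30\",\"change\":\"2.10\"})'

# Same two markers as in the formula: "last\":\" and then a backslash.
after = substring_after(script, r'"last\":\"')
value = substring_before(after, "\\")

print(value)
```

Both markers are non-empty here, which matters because str.partition raises ValueError on an empty separator.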

importxml to Google Sheets + hltv

I'm trying to bring player data from hltv into a Sheet with importxml but can't get it to work. I've discovered that there are multiple div classes in a row, and inside them there are spans where the actual data is.
I have tried multiple ways to get either all the info together or one item at a time, but I'm starting to run out of options.
For example:
=IMPORTXML("https://www.hltv.org/stats/players/11893/ZywOo","//@class='Statistics-row'//@class='columns'")
I have also tried to get players from a certain country at https://www.hltv.org/stats/players
Can someone help?
Alternative to @Madhurjya's proposal. With the IMPORTFROMWEB add-on you can have :
XPaths used :
//div[@class="statistics"]//span[1]
//div[@class="statistics"]//span[2]
Formula :
=IMPORTFROMWEB(B1;B2:C2)
But also :
XPaths used :
//a[preceding-sibling::img[@alt="France"]]
//img[@alt="France"]/@alt
Formula :
=IMPORTFROMWEB(B1;B2:C2)
Note : the number of requests is limited. Check the pricing or write your own Google Apps Script.

Xpath with contains() in importXML()

I've done this dozens of times, but now I don't get what I'm doing wrong. I want to extract specific records into 2 separate columns (I know that the order will not match), so I use:
//a/@href[contains(.; "github")]
and
//*[contains(text(); "Pricing:")]
But none of them is working. Where is my mistake?
(my sandbox: https://docs.google.com/spreadsheets/d/11Z3xybq_eYQvjn2-UBOomgeJxFrrsFoXKzF9yZSeASM/edit#gid=1841586203 with LT locale)
Damn, those Google Sheets locales! The locale changes the separator for formula arguments, but inside the XPath string the separator is always a comma, so it must be:
//a/@href[contains(., "github")]
and
//*[contains(text(), "Pricing:")]
I'll keep for further reference.
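The rule (the formula separator depends on the locale, the XPath separator never does) can be sketched as a small Python helper that converts a ;-locale formula to a ,-locale one while leaving quoted strings untouched. The function name is hypothetical, and the sketch only handles argument separators: it ignores decimal commas and array literals.

```python
def to_comma_locale(formula: str) -> str:
    """Swap ; for , outside double-quoted strings only."""
    out = []
    in_string = False
    for ch in formula:
        if ch == '"':
            # A doubled quote ("") inside a string toggles twice, so state is preserved.
            in_string = not in_string
        if ch == ";" and not in_string:
            out.append(",")
        else:
            out.append(ch)
    return "".join(out)

# The XPath inside the quotes already uses a comma and is left alone.
formula = '=IMPORTXML(B2;"//a/@href[contains(., \'github\')]")'
print(to_comma_locale(formula))
```

Anything between the double quotes passes through unchanged, so a semicolon that legitimately belongs to the string is not touched either.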

ImportXML Xpath Query Return txt

I need to use Google Spreadsheet ImportXML return a value from this website...
http://www.e-go.com.au/calculatorAPI2?pickuppostcode=2000&pickupsuburb=SYDNEY+CITY&deliverypostcode=4000&deliverysuburb=BRISBANE&type=Carton&width=40&height=35&depth=65&weight=2&items=3
the website simply displays the below in text and code...
error=OK
eta=Overnight
price=64.69
I need to return the value after 'price=' on the last line. Being a newbie, I'm struggling with the XPath query (?) required to make this happen...
=importxml("url",?)
Your help is greatly appreciated.
Thank you in advance.
Regards
first of all, IMPORTXML() won't work because your webpage is not formatted correctly for XML, and google sheets doesn't like it.
All hope is not lost tho, as your output is so simple. you can simply load the whole output using IMPORTDATA() and then process within google sheets
have a look at the output of the following formulae (where the url is stored in A1)
=IMPORTDATA(A1)
=transpose(IMPORTDATA(A1))
=index(IMPORTDATA(A1),3,1) - IF there are always 3 results, and price will always be in the third one this will work
=filter(IMPORTDATA(A1),left(IMPORTDATA(A1),5)="price") - if the price can appear in any of the result lines, but always starting with "price"
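For comparison, the same key=value parsing can be sketched outside Sheets in Python. The response text is copied from the question; the function name is hypothetical, and fetching the URL is left out on purpose so the example stays self-contained.

```python
# Response body as shown in the question.
RESPONSE = "error=OK\neta=Overnight\nprice=64.69"

def parse_key_values(text):
    """Turn lines of the form key=value into a dict, skipping malformed lines."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result

fields = parse_key_values(RESPONSE)
price = float(fields["price"])

print(fields)
print(price)
```

Looking the value up by key mirrors the filter(...) variant: it does not depend on price being on the third line.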

XPath Find full HTML element ID from partial ID

I am looking to write an XPath query to return the full element ID from a partial ID that I have constructed. Does anyone know how I could do this? From the following HTML (I have cut this down to remove work specific content) I am looking to extract f41_txtResponse from putting f41_txt into my query.
<input id="f41_txtResponse" class="GTTextField BGLQSTextField2 txtResponse" value="asdasdadfgasdfg" name="f41_txtResponse" title="" tabindex="21"/>
Cheers
You can use contains to select the element:
//*[contains(@id, 'f41_txt')]
Thanks to Thomas Jung I have been able to figure this out. If I use:
//*[contains(./@id, 'f41_txt')]/@id
This will return just the ID I am looking for.
I suggest not using the numbers from the ID when you compose XPaths from a partial ID. Those numbers represent DYNAMIC elements, and dynamic elements change across deploys / releases of the System Under Test. The purpose is to identify elements UNIQUELY.
Using this, or something like it, may be a better option (you get the idea):
//input[contains(@id, '_txtResponse')]/@id
It worked for me like below
//*[contains(./@id, 'f41_txt')]
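The partial-ID lookup can be tried out in Python with the standard library. The input element is the one from the question; the surrounding form and the second input are assumptions added only so the filter has something to exclude. ElementTree has no contains(), so the substring test is written in plain Python.

```python
import xml.etree.ElementTree as ET

# The first input is from the question; the wrapper and second input are hypothetical.
HTML = (
    '<form>'
    '<input id="f41_txtResponse" class="GTTextField BGLQSTextField2 txtResponse" '
    'value="asdasdadfgasdfg" name="f41_txtResponse" title="" tabindex="21"/>'
    '<input id="f42_btnSubmit"/>'
    '</form>'
)

root = ET.fromstring(HTML)

def ids_containing(root, fragment):
    # Comparable to //*[contains(@id, fragment)]/@id: return the full id values.
    return [e.get("id") for e in root.iter() if fragment in e.get("id", "")]

print(ids_containing(root, "f41_txt"))
print(ids_containing(root, "_txtResponse"))
```

The second call matches on the stable suffix instead of the numeric prefix, which is the variant suggested for IDs whose numbers change between releases.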
