importxml to Google Sheets + hltv - xpath

I'm trying to import player data from HLTV into Google Sheets with IMPORTXML, but I can't get it to work. I've found that each row contains several div elements with classes, and the actual data sits in spans inside them.
I have tried several ways to get either all the info at once or one value at a time, but I'm running out of options.
For example:
=IMPORTXML("https://www.hltv.org/stats/players/11893/ZywOo","//@class='Statistics-row'//@class='columns'")
I have also tried to get the players from a certain country at https://www.hltv.org/stats/players
Can someone help?

An alternative to @Madhurjya's proposal. With the IMPORTFROMWEB add-on you can use:
XPaths used :
//div[@class="statistics"]//span[1]
//div[@class="statistics"]//span[2]
Formula :
=IMPORTFROMWEB(B1;B2:C2)
And for the players of a given country:
XPaths used :
//a[preceding-sibling::img[@alt="France"]]
//img[@alt="France"]/@alt
Formula :
=IMPORTFROMWEB(B1;B2:C2)
Note : the number of requests is limited. Check the pricing or write your own Google Apps Script.

Related

How to get list of all exchanges with xpath to google sheets

I'm trying to get the list of cryptocurrency exchanges from the 2nd page of CoinGecko into my Google Sheet.
The expected result looks like:
Tokenize
Bibox
Vebitcoin
...
I tried it with:
IMPORTXML("https://www.coingecko.com/en/exchanges?page=2", "//*[contains(text(),' exchange')]")
But I get this error:
Imported content is empty.
How about this modified xpath?
Modified xpath:
//span/a[contains(@href,'/en/exchanges/')]
and
//span[@class='pt-2 flex-column']/a[contains(@href,'/en/exchanges/')]
Modified formula
=IMPORTXML(A1,"//span/a[contains(@href,'/en/exchanges/')]")
In this case, the URL of https://www.coingecko.com/en/exchanges?page=2 is put to the cell "A1".
Note:
The list of cryptocurrency exchanges can be retrieved by the modified xpath. But in this case, it seems that the values of Tokenize, Bibox and Vebitcoin are not included.
Reference:
IMPORTXML
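To illustrate what the modified XPath selects, here is a minimal Python sketch using only the standard library. The markup and the exchange names are hypothetical stand-ins (the real CoinGecko page is larger and may be structured differently), and because ElementTree does not support contains(), the href test is done in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not copied from the real page.
SAMPLE = """
<table>
  <tr><td><span class="pt-2 flex-column"><a href="/en/exchanges/exchange_a">Exchange A</a></span></td></tr>
  <tr><td><span class="pt-2 flex-column"><a href="/en/exchanges/exchange_b">Exchange B</a></span></td></tr>
  <tr><td><span><a href="/en/coins/bitcoin">Bitcoin</a></span></td></tr>
</table>
"""

def exchange_names(markup: str) -> list[str]:
    """Mimic //span/a[contains(@href,'/en/exchanges/')] on well-formed markup."""
    root = ET.fromstring(markup)
    return [
        a.text.strip()
        for a in root.findall(".//span/a")  # every <a> that is a direct child of a <span>
        if a.text and "/en/exchanges/" in a.get("href", "")  # the contains() test
    ]

names = exchange_names(SAMPLE)
print(names)
```

The link to /en/coins/bitcoin is skipped because its href does not contain /en/exchanges/, which is the filtering the XPath predicate performs.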
To get the list of all exchanges, you can also use the following formula :
=ARRAYFORMULA(REGEXEXTRACT(QUERY(TRANSPOSE(IMPORTDATA("https://api.coingecko.com/api/v3/search?locale=en&img_path_only=1"
));"select * WHERE Col1 starts with ""name""");"name:""(.+)"""))
The output contains about 8000 elements.

My XPath in Google Sheets IMPORTXML command always returns #N/A

I am trying to scrape some data from the website Sporcle (specifically the Date Earned from one of the Badges) but the XPath that I got from [F12-->right-clicking the element-->Copy-->Copy XPath] does not seem to work with the google sheets command IMPORTXML; all I ever get is #N/A.
=IMPORTXML("https://www.sporcle.com/user/Jimmy/badges/earned/","//*[@id='badge-container']/div[1]/div[3]")
The website uses dynamic rendering, so the classic methods don't work. I see 3 ways to do it :
With IMPORTXML : we retrieve the JSON data from a script element and parse it with formulas.
With IMPORTXML + the ImportJSON script : we retrieve the JSON data from a script element and parse it with the script (cleaner).
With the IMPORTFROMWEB add-on (the number of requests is limited in the "free" plan).
Solution 1 :
First, we extract the JSON data in A1 with IMPORTXML and the following formula :
=IMPORTXML(B1;"substring-before(substring-after(//*[contains(text(),'badge_limiter')],'var badgeList = [{'),'}]')")
Then we parse the data with a combination of multiple formulas. In J2 we write :
=QUERY(ARRAYFORMULA(SPLIT(TRANSPOSE(SPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(REGEXREPLACE(M1;"(""\w+?_\w+?"":)";"");""",";""";");"""";"");"},";"");"{"));";"));"select Col1,Col6")
Solution 2 :
First, we extract the JSON data in A1 with IMPORTXML and the following formula :
=IMPORTXML(B1;"substring-before(substring-after(//*[contains(text(),'badge_limiter')],'var badgeList = '),'}]')")&"}]"
Then we parse the data with the script. Formula used in F1 is :
=ImportJSONFromSheet("Feuille 15";"/badge_name,/earned_date")
Where Feuille 15 is the name of the sheet I'm working with. The rest is to select the columns of interest.
Solution 3 :
XPaths used for badge names and dates :
//td[@class='left-align link-col col-width-1']
//td[@class="col-width-3"]
Then we pass the formula in B5:
=IMPORTFROMWEB(C1;C2:D2;B3:C3)
Note : be sure to set jsRendering to TRUE.
Side note : I'm based in Europe, so you'll probably need to replace ; with , in the formulas.
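For readers who want to see the idea behind Solutions 1 and 2 outside of Sheets, here is a minimal Python sketch: cut the JSON array out of the inline script with the equivalent of substring-after / substring-before, put back the closing "}]", then parse it. The script text is a hypothetical stand-in (only the badge_name and earned_date field names come from the formulas above), and the two helper functions are illustrative, not part of any library.

```python
import json

# Hypothetical inline script content; the real Sporcle page may differ.
SCRIPT = (
    'var badge_limiter = 10; var badgeList = '
    '[{"badge_name": "Starter", "earned_date": "2020-01-02"}, '
    '{"badge_name": "Streak", "earned_date": "2020-03-04"}]; var other = 1;'
)

def substring_after(text: str, marker: str) -> str:
    """Like XPath substring-after: text after the first marker, '' if absent."""
    _, found, tail = text.partition(marker)
    return tail if found else ""

def substring_before(text: str, marker: str) -> str:
    """Like XPath substring-before: text before the first marker, '' if absent."""
    head, found, _ = text.partition(marker)
    return head if found else ""

# Same steps as Solution 2: cut at '}]', then append '}]' again to close the array.
raw = substring_before(substring_after(SCRIPT, "var badgeList = "), "}]") + "}]"
badges = json.loads(raw)

# Keep only the two columns of interest.
rows = [(b["badge_name"], b["earned_date"]) for b in badges]
print(rows)
```

Cutting at the first "}]" only works when no nested array of objects appears earlier in the data, which is the same assumption the formulas make.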

IMPORTXML Google Sheets for every 2nd node?

I'm having trouble getting a value with IMPORTXML in a Google spreadsheet.
I am using this XPath:
//*[contains(@class,"price")]
which returns ALL the prices posted on the web page without any problem.
The problem is that within that same class (and, I don't know why, with dynamic IDs!) there are 2 nodes/prices per item: "Registered Customer Price" and "Non-Customer Price". The second one is the value I am interested in.
So I tried this:
(//*[contains(@class,"price")])[2]
With this I only get the 2nd price of the whole page, not the 2nd price of each item!
I assume it is a syntax problem, but no matter how many times I try, I don't get the expected result.
Can you give me a hand with this?
Thanks in advance for any suggestion!
Just use :
//div[@class='price-box'][2]//span[@id]
EDIT : With IMPORTFROMWEB:
//h4[.="Precio unitario por unidad"]/following-sibling::span/span[@id]
EDIT 2 : More robust XPath :
//h4[.="Precio unitario por unidad"]/following-sibling::span[@class="price-excluding-tax"][count(following-sibling::*)=0]/span[@id]
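The difference between "the 2nd price of the whole page" and "the 2nd price of each item" can be sketched in Python with the standard library. The markup is a hypothetical simplification (two price spans per item, exact class match instead of contains()), so treat it as an illustration of the indexing logic rather than of the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each item carries two prices, the second is the one wanted.
SAMPLE = """
<ul>
  <li class="item"><span class="price">10.00</span><span class="price">12.00</span></li>
  <li class="item"><span class="price">20.00</span><span class="price">24.00</span></li>
  <li class="item"><span class="price">30.00</span><span class="price">36.00</span></li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# All prices in document order, like //*[contains(@class,"price")].
all_prices = [span.text for span in root.findall(".//span[@class='price']")]

# (//*[...])[2] indexes the whole result set: a single value for the page.
second_on_page = all_prices[1]

# Indexing inside each item instead gives one value per item.
second_per_item = [
    li.findall("span[@class='price']")[1].text for li in root.findall("li")
]

# Keeping every second row of the flat list is the same idea as
# filtering on MOD(ROW(...); 2) = 0 in Sheets.
every_second = all_prices[1::2]

print(second_on_page, second_per_item, every_second)
```

The every-second-row approach only matches the per-item result when every item really has exactly two prices; the per-item indexing does not depend on that.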
try:
=FILTER(IMPORTXML(
"http://www.maxiconsumo.com/sucursal_villa_dominico/comestibles/aceites/aceite-girasol.html";
"//*[contains(@id,'price-including-tax')]"); MOD(ROW(INDIRECT("A1:A"&COUNTA(IMPORTXML(
"http://www.maxiconsumo.com/sucursal_villa_dominico/comestibles/aceites/aceite-girasol.html";
"//*[contains(@id,'price-including-tax')]")))); 2)=0)

How to extract the price with importxml google sheets xpath

Good morning,
I can't extract the price on this page with the importxml function:
https://www.t-collector.com/reine?prop%5Bcolor%5D=black&product=26&side=front
I need it to update my google merchant files.
I've tried different formulas like:
=importxml(G2;"//span[@itemprop='price']")
=importxml(G2;"//b[@itemprop='price']/@content")
=importxml(G2;"//b[@itemprop='price'][1]/@content")
=importxml(G2;"//meta[@itemprop='price'][1]/@content")
=importxml("G2";"//span[@itemprop='price']")
but nothing works
Thanks
Sincerely
The website uses dynamic rendering, so a tool such as Selenium would normally be required. But we can still try with Google Sheets, using a custom script to load the JSON data directly.
The script to import JSON data into Google Sheets (credits to Paul Gambill) : https://gist.github.com/paulgambill/cacd19da95a1421d3164
And the data :
https://www.t-collector.com/campaigns/C-PGE7F?format=json&store=tcollectorofficiel
We use SQL-like formulas to keep only the price.
EDIT : Solution with IMPORTXML :
You can use the following formula (tested with 5 shirts) :
=IMPORTXML(A2;"substring-after(substring-before((//script)[6],'"",""category""'),',""price"":""')")
EDIT 2 : Fix to extract the default displayed price in euros :
=IMPORTXML(A2;"substring-after(substring-before(//script[starts-with(.,'var campaignObj')],'"",""gbp""'),'""eur"":""')")
EDIT 3 : To ignore on-sale prices, we can use the following one-liner :
=IF(IMPORTXML(A2;"substring(substring-after(//script[starts-with(.,'var campaignObj')],'""compare_at_prices"":{""eur"":""'),1,1)")=0;IMPORTXML(A2;"substring-after(substring-before(//script[starts-with(.,'var campaignObj')],'"",""gbp""'),'""eur"":""')");IMPORTXML(A2;"substring-before(substring-after(//script[starts-with(.,'var campaignObj')],'""compare_at_prices"":{""eur"":""'),'""')"))
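The branching in that one-liner is easier to follow when written out step by step. Below is a minimal Python sketch of the same logic: read the "eur" value under compare_at_prices, and if it starts with 0 (no sale), return the displayed "eur" price, otherwise return the compare-at price. The two script strings and the key named "prices" are hypothetical samples shaped to fit the markers used in the formula; the real campaignObj may contain other fields.

```python
def substring_after(text: str, marker: str) -> str:
    """Like XPath substring-after: text after the first marker, '' if absent."""
    _, found, tail = text.partition(marker)
    return tail if found else ""

def substring_before(text: str, marker: str) -> str:
    """Like XPath substring-before: text before the first marker, '' if absent."""
    head, found, _ = text.partition(marker)
    return head if found else ""

def pick_price(script: str) -> str:
    """Return the non-discounted price in euros, following the formula's branches."""
    compare = substring_before(
        substring_after(script, '"compare_at_prices":{"eur":"'), '"'
    )
    current = substring_after(substring_before(script, '","gbp"'), '"eur":"')
    # First character "0" means no compare-at price, i.e. the item is not on sale.
    return current if compare[:1] == "0" else compare

# Hypothetical script contents (assumed shapes, not copied from the site).
NOT_ON_SALE = ('var campaignObj = {"name":"Reine","prices":{"eur":"19.90","gbp":"17.50"},'
               '"compare_at_prices":{"eur":"0","gbp":"0"}};')
ON_SALE = ('var campaignObj = {"name":"Reine","prices":{"eur":"14.90","gbp":"13.00"},'
           '"compare_at_prices":{"eur":"19.90","gbp":"17.50"}};')

print(pick_price(NOT_ON_SALE), pick_price(ON_SALE))
```

As in the formula, the test looks only at the first character, so a compare-at price below 1 euro would be mistaken for "no sale".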

ImportXML Xpath Query Return txt

I need to use Google Spreadsheet's ImportXML to return a value from this website:
http://www.e-go.com.au/calculatorAPI2?pickuppostcode=2000&pickupsuburb=SYDNEY+CITY&deliverypostcode=4000&deliverysuburb=BRISBANE&type=Carton&width=40&height=35&depth=65&weight=2&items=3
the website simply displays the below in text and code...
error=OK
eta=Overnight
price=64.69
I need to return the value after 'price=' on the last line. Being a newbie, I'm struggling with the XPath query (?) required to make this happen:
=importxml("url",?)
Your help is greatly appreciated.
Thank you in advance.
Regards
First of all, IMPORTXML() won't work because your page is not well-formed XML, and Google Sheets rejects it.
All hope is not lost though, as your output is so simple. You can load the whole output with IMPORTDATA() and then process it within Google Sheets.
Have a look at the output of the following formulas (where the URL is stored in A1):
=IMPORTDATA(A1)
=transpose(IMPORTDATA(A1))
=index(IMPORTDATA(A1),3,1) - IF there are always 3 results, and price will always be in the third one this will work
=filter(IMPORTDATA(A1),left(IMPORTDATA(A1),5)="price") - if the price can appear in any of the result lines, but always starting with "price"
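The same "load everything, then pick the line that starts with price" idea can be sketched in Python. The response text is copied from the three lines shown in the question; the function name is illustrative, and fetching the URL is left out so the example stays self-contained.

```python
# The plain-text body as shown in the question.
RESPONSE = "error=OK\neta=Overnight\nprice=64.69\n"

def parse_key_values(text: str) -> dict[str, str]:
    """Split each 'key=value' line into a dictionary entry; skip other lines."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # only lines that actually contain '='
            pairs[key.strip()] = value.strip()
    return pairs

fields = parse_key_values(RESPONSE)
price = float(fields["price"])
print(price)
```

Looking the value up by key rather than by position corresponds to the FILTER variant: it keeps working if the order of the lines changes.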
