Good morning,
I can't extract the price on this page with the importxml function:
https://www.t-collector.com/reine?prop%5Bcolor%5D=black&product=26&side=front
I need it to update my google merchant files.
I've tried different formulas like:
=importxml(G2;"//span[@itemprop='price']")
=importxml(G2;"//b[@itemprop='price']/@content")
=importxml(G2;"//b[@itemprop='price'][1]/@content")
=importxml(G2;"//meta[@itemprop='price'][1]/@content")
=importxml("G2";"//span[@itemprop='price']")
but nothing works
Thanks
Sincerely
The website uses dynamic rendering, so a tool such as Selenium would normally be required here. We can still try with Google Sheets, using a custom script to load the JSON data directly.
The script to import JSON data with GoogleSheets (credits to Paul Gambill) : https://gist.github.com/paulgambill/cacd19da95a1421d3164
And the data :
https://www.t-collector.com/campaigns/C-PGE7F?format=json&store=tcollectorofficiel
We use SQL-like formulas to keep only the price.
EDIT : Solution with IMPORTXML :
You can use the following formula (tested with 5 shirts) :
=IMPORTXML(A2;"substring-after(substring-before((//script)[6],'"",""category""'),',""price"":""')")
EDIT 2 : Fix to extract the default displayed price in euros :
=IMPORTXML(A2;"substring-after(substring-before(//script[starts-with(.,'var campaignObj')],'"",""gbp""'),'""eur"":""')")
EDIT 3 : To ignore on-sale prices, we can use the following one-liner :
=IF(IMPORTXML(A2;"substring(substring-after(//script[starts-with(.,'var campaignObj')],'""compare_at_prices"":{""eur"":""'),1,1)")=0;IMPORTXML(A2;"substring-after(substring-before(//script[starts-with(.,'var campaignObj')],'"",""gbp""'),'""eur"":""')");IMPORTXML(A2;"substring-before(substring-after(//script[starts-with(.,'var campaignObj')],'""compare_at_prices"":{""eur"":""'),'""')"))
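To make the nested substring-before / substring-after calls easier to follow, here is a minimal Python sketch of the same logic. The sample script text and the helper names (substring_after, substring_before, regular_price, price_ignoring_sale) are assumptions made for illustration; the real page content may be laid out differently.

```python
def substring_after(text: str, marker: str) -> str:
    """Mimic XPath substring-after(): text after the first marker, '' if absent."""
    _, found, tail = text.partition(marker)
    return tail if found else ""


def substring_before(text: str, marker: str) -> str:
    """Mimic XPath substring-before(): text before the first marker, '' if absent."""
    head, found, _ = text.partition(marker)
    return head if found else ""


# Hypothetical script contents, shaped after the markers used in the formulas.
NOT_ON_SALE = ('var campaignObj = {"prices":{"eur":"19.90","gbp":"17.50"},'
               '"compare_at_prices":{"eur":"0.00","gbp":"0.00"}};')
ON_SALE = ('var campaignObj = {"prices":{"eur":"14.90","gbp":"12.50"},'
           '"compare_at_prices":{"eur":"19.90","gbp":"17.50"}};')


def regular_price(script: str) -> str:
    # Same nesting as EDIT 2: cut before '","gbp"', then keep what follows '"eur":"'.
    return substring_after(substring_before(script, '","gbp"'), '"eur":"')


def price_ignoring_sale(script: str) -> str:
    # Same idea as EDIT 3: if the compare-at price starts with "0", use the
    # displayed price; otherwise use the compare-at (non-sale) price.
    compare_at = substring_before(
        substring_after(script, '"compare_at_prices":{"eur":"'), '"')
    if compare_at[:1] == "0":
        return regular_price(script)
    return compare_at


print(price_ignoring_sale(NOT_ON_SALE))
print(price_ignoring_sale(ON_SALE))
```

Both helpers stop at the first occurrence of the marker, as the XPath functions do, so the order of the keys in the script matters.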
I am trying to scrape some data from the website Sporcle (specifically the Date Earned from one of the Badges) but the XPath that I got from [F12-->right-clicking the element-->Copy-->Copy XPath] does not seem to work with the google sheets command IMPORTXML; all I ever get is #N/A.
=IMPORTXML("https://www.sporcle.com/user/Jimmy/badges/earned/","//*[@id='badge-container']/div[1]/div[3]")
The website uses dynamic rendering, so classic methods don't work. I see 3 ways to do it :
With IMPORTXML : we retrieve the JSON data from a script element and parse it with formulas.
With IMPORTXML+ImportJSON script : we retrieve the JSON data from a script element and parse it with the script (cleaner).
With the IMPORTFROMWEB add-on (the number of requests is limited in the "free" plan).
Solution 1 :
First, we extract the JSON data in A1 with IMPORTXML and the following formula :
=IMPORTXML(B1;"substring-before(substring-after(//*[contains(text(),'badge_limiter')],'var badgeList = [{'),'}]')")
Then we parse the data with a combination of multiple formulas. In J2 we write :
=QUERY(ARRAYFORMULA(SPLIT(TRANSPOSE(SPLIT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(REGEXREPLACE(M1;"(""\w+?_\w+?"":)";"");""",";""";");"""";"");"},";"");"{"));";"));"select Col1,Col6")
Solution 2 :
First, we extract the JSON data in A1 with IMPORTXML and the following formula :
=IMPORTXML(B1;"substring-before(substring-after(//*[contains(text(),'badge_limiter')],'var badgeList = '),'}]')")&"}]"
Then we parse the data with the script. Formula used in F1 is :
=ImportJSONFromSheet("Feuille 15";"/badge_name,/earned_date")
Where Feuille 15 is the name of the sheet I'm working with. The rest is to select the columns of interest.
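The extract-then-parse idea behind Solutions 1 and 2 can be sketched outside the spreadsheet as well. This is only an illustration: the script text below is made up, and only the names badgeList, badge_limiter, badge_name and earned_date come from the formulas above.

```python
import json

# Hypothetical script contents; the real page will contain more fields per badge.
PAGE_SCRIPT = (
    'var badge_limiter = 5; var badgeList = ['
    '{"badge_name":"Early Bird","earned_date":"2020-01-15"},'
    '{"badge_name":"Night Owl","earned_date":"2020-02-03"}'
    ']; var other = 1;'
)


def extract_badges(script_text: str) -> list:
    """Cut out the array assigned to badgeList and parse it as JSON."""
    start_marker = "var badgeList = "
    start = script_text.index(start_marker) + len(start_marker)
    # Keep everything up to and including the first '}]', which mirrors
    # appending "}]" after substring-before(...,'}]') in Solution 2.
    end = script_text.index("}]", start) + len("}]")
    return json.loads(script_text[start:end])


# Keep only the two columns of interest, like "/badge_name,/earned_date".
rows = [(b["badge_name"], b["earned_date"]) for b in extract_badges(PAGE_SCRIPT)]
for name, earned in rows:
    print(name, earned)
```

Parsing the array as JSON avoids the chain of text substitutions needed in Solution 1, which is why the script-based route is the cleaner one.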
Solution 3 :
XPaths used for badge names and dates :
//td[@class='left-align link-col col-width-1']
//td[@class="col-width-3"]
Then we pass the formula in B5:
=IMPORTFROMWEB(C1;C2:D2;B3:C3)
Note : be sure to set jsRendering to TRUE.
Side note : I'm based in Europe, so you'll probably need to replace ; with , in the formulas.
I am trying to extract the prices of some products on the Mercado Libre website.
The problem is that a product sometimes has a discount, and then the price text isn't extracted.
Here is one link with a discount and one without. I want Octoparse to extract the price in both situations.
How can I do it?
LINKS:
https://articulo.mercadolibre.com.mx/MLM-666847965-funda-protector-iphone-7-8-se-2020-supcase-ubstyle-negro-_JM?quantity=1#position=1&type=item&tracking_id=9e0a5e4a-891d-4b89-add3-7aca91d6969a
https://articulo.mercadolibre.com.mx/MLM-721688631-protector-funda-case-rudo-iphone-6-7-8-x-xs-xr-xs-max-11-pro-_JM?quantity=1#searchVariation=43860021612&position=8&type=pad&tracking_id=9e0a5e4a-891d-4b89-add3-7aca91d6969a&is_advertising=true&ad_domain=VQCATCORE_LST&ad_position=8&ad_click_id=YTY0MWNiMWQtMDFmNi00ZGJmLThjZjMtYWM3YWQyZTc3OWNl
I don't know Octoparse well, but if you can specify an XPath manually, you can go with :
(//fieldset[contains(@class,"item-price")]//@content)[last()]
This selects the exact price (with decimals) of each item. The value is taken from the content attribute of the enclosing span element. So in your case :
254.62 and 250.0 will be extracted.
Alternative ways :
A) :
string(//fieldset[contains(@class,"item-price")]//span[@class="price-tag"])
Output :
$ 254 . 62 and $ 250
B) :
(//fieldset[contains(@class,"item-price")]//span[@class="price-tag-fraction"]/text())[last()]
Output :
254 and 250
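Why [last()] handles both the discounted and the regular page can be sketched in Python. The markup below is a simplified assumption, not the real Mercado Libre HTML: it supposes that a discounted item lists the old price first and the current price last. Python's standard ElementTree has no contains(), so the class test is done in plain Python, and last_price is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a discounted and a regular item.
DISCOUNTED = """
<div>
  <fieldset class="item-price item-price--discount">
    <span class="price-tag" content="299.0"><span class="price-tag-fraction">299</span></span>
    <span class="price-tag" content="254.62"><span class="price-tag-fraction">254</span></span>
  </fieldset>
</div>
"""
REGULAR = """
<div>
  <fieldset class="item-price">
    <span class="price-tag" content="250.0"><span class="price-tag-fraction">250</span></span>
  </fieldset>
</div>
"""


def last_price(markup: str):
    """Return the last content attribute found under an item-price fieldset."""
    root = ET.fromstring(markup)
    values = [
        node.get("content")
        for fieldset in root.iter("fieldset")
        if "item-price" in fieldset.get("class", "")
        for node in fieldset.iter()  # the fieldset itself and all descendants
        if node.get("content") is not None
    ]
    return values[-1] if values else None


print(last_price(DISCOUNTED))
print(last_price(REGULAR))
```

Taking the last match means a single expression works whether one or two prices are present.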
I'm trying to bring player data from hltv into Sheets with importxml but can't get it to work. I've discovered that there are multiple div classes in a row, and inside them there are spans where the actual data is.
I have tried multiple ways to get either all the info together or one item at a time, but I'm starting to run out of options.
For example:
=IMPORTXML("https://www.hltv.org/stats/players/11893/ZywOo","//@class='Statistics-row'//@class='columns'")
Also I have tried to get players from certain country in https://www.hltv.org/stats/players
Can someone help?
An alternative to @Madhurjya's proposal. With the IMPORTFROMWEB add-on you can have :
XPaths used :
//div[@class="statistics"]//span[1]
//div[@class="statistics"]//span[2]
Formula :
=IMPORTFROMWEB(B1;B2:C2)
But also :
XPaths used :
//a[preceding-sibling::img[@alt="France"]]
//img[@alt="France"]/@alt
Formula :
=IMPORTFROMWEB(B1;B2:C2)
Note : the number of requests is limited. Check the pricing or write your own Google Apps Script.
I'm having trouble trying to get a value with IMPORTXML in a google spreadsheet ...
I am using as xpath:
//*[contains(@class,"price")] which smoothly returns ALL the prices posted on a web page
The problem is that within that same class (and I don't know why, with dynamic ID's!) I have 2 nodes/prices: "Registered Customer Price" and "Non-Customer Price", which is the 2nd. value ... and the one I am interested in obtaining.
So, I wanted to apply it like this:
(//*[contains(@class,"price")])[2] and with this, I only get the 2nd price... but of the whole page!
(and not the 2nd. price of each and every item!)
I assume it is a "syntax" problem ... but no matter how many times I try it, I don't get the expected result!
Can you give me a hand with this?
Thanks in advance for any suggestion!
Just use :
//div[@class='price-box'][2]//span[@id]
EDIT : With IMPORTFROMWEB:
//h4[.="Precio unitario por unidad"]/following-sibling::span/span[@id]
EDIT 2 : More robust XPath :
//h4[.="Precio unitario por unidad"]/following-sibling::span[@class="price-excluding-tax"][count(following-sibling::*)=0]/span[@id]
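The difference between indexing the whole result set, as in (//x)[2], and a positional predicate applied inside each parent, as in //x[2], can be sketched with Python's standard ElementTree. The markup is a made-up stand-in for the product list, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing: each item holds two prices (customer / non-customer).
LISTING = ET.fromstring("""
<ul>
  <li><span class="price">10.00</span><span class="price">12.00</span></li>
  <li><span class="price">20.00</span><span class="price">24.00</span></li>
  <li><span class="price">30.00</span><span class="price">36.00</span></li>
</ul>
""")

# Like (//span)[2]: collect every span on the page, then take the second one.
all_prices = LISTING.findall(".//span")
second_on_page = [all_prices[1].text]

# Like //span[2]: the position is counted among siblings of each parent,
# so every item contributes its own second price.
second_per_item = [span.text for span in LISTING.findall(".//span[2]")]

print(second_on_page)
print(second_per_item)
```

The parentheses are what change the meaning: without them the index is evaluated once per parent, which is the behaviour wanted here.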
try:
=FILTER(IMPORTXML(
"http://www.maxiconsumo.com/sucursal_villa_dominico/comestibles/aceites/aceite-girasol.html";
"//*[contains(@id,'price-including-tax')]"); MOD(ROW(INDIRECT("A1:A"&COUNTA(IMPORTXML(
"http://www.maxiconsumo.com/sucursal_villa_dominico/comestibles/aceites/aceite-girasol.html";
"//*[contains(@id,'price-including-tax')]")))); 2)=0)
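The FILTER / MOD / ROW combination above keeps the values that land on even row numbers. A minimal Python sketch of that selection, using made-up prices and a hypothetical helper name (even_rows):

```python
def even_rows(values):
    """Keep the items whose 1-based row number is even, like MOD(ROW(...);2)=0."""
    return [value for row, value in enumerate(values, start=1) if row % 2 == 0]


# Hypothetical scrape result: two prices per product, in page order.
scraped = ["10.00", "12.00", "20.00", "24.00", "30.00", "36.00"]
print(even_rows(scraped))
```

This only gives the intended result when every product contributes exactly two values; a product with a single price would shift all the rows after it.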
I need to use Google Spreadsheet's ImportXML to return a value from this website...
http://www.e-go.com.au/calculatorAPI2?pickuppostcode=2000&pickupsuburb=SYDNEY+CITY&deliverypostcode=4000&deliverysuburb=BRISBANE&type=Carton&width=40&height=35&depth=65&weight=2&items=3
the website simply displays the below in text and code...
error=OK
eta=Overnight
price=64.69
I need to return the value after 'price=' on the last line. Being a newbie, I'm struggling with the XPath query (?) required to make this happen...
=importxml("url",?)
Your help is greatly appreciated.
Thank you in advance.
Regards
First of all, IMPORTXML() won't work because the page is not well-formed XML, and Google Sheets can't parse it.
All hope is not lost, though, because the output is so simple: you can load the whole output with IMPORTDATA() and then process it within Google Sheets.
Have a look at the output of the following formulas (where the URL is stored in A1):
=IMPORTDATA(A1)
=transpose(IMPORTDATA(A1))
=index(IMPORTDATA(A1),3,1) - if there are always 3 results and the price is always the third one, this will work
=filter(IMPORTDATA(A1),left(IMPORTDATA(A1),5)="price") - if the price can appear on any of the result lines, but that line always starts with "price"
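The same "find the line that starts with price" idea can be sketched in Python. The response text is copied from the question; parse_key_values is a hypothetical helper name, and no request is made to the real URL.

```python
# Response body as shown in the question.
RESPONSE_TEXT = "error=OK\neta=Overnight\nprice=64.69\n"


def parse_key_values(text: str) -> dict:
    """Turn 'key=value' lines into a dict, skipping lines without '='."""
    pairs = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            pairs[key.strip()] = value.strip()
    return pairs


fields = parse_key_values(RESPONSE_TEXT)
price = float(fields["price"])
print(price)
```

Looking the value up by key, rather than by line position, keeps working if the service ever reorders or adds lines.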