Is it still possible to extract metadata like price from a site using ImportXML in Google Sheets?
I have tried several variations on the following page without success: https://www.officedepot.com/a/products/273646/Office-Depot-White-Copy-Paper-Letter/
=IMPORTXML("https://www.officedepot.com/a/products/273646/Office-Depot-White-Copy-Paper-Letter/","//*[contains(@itemprop,'price')]/@content")
=IMPORTXML("https://www.officedepot.com/a/products/273646/Office-Depot-White-Copy-Paper-Letter/","//meta[@itemprop='price']/@content")
I should be able to use these formulas to return "58.99", but I keep getting an #N/A error.
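For reference, the second XPath selects the content attribute of every meta tag whose itemprop is exactly "price". The Python sketch below reproduces that selection offline with the standard-library HTMLParser, on a hypothetical snippet (not Office Depot's real markup), so the extraction logic can be checked independently of whether the site answers Google's requests. Note that the first formula's contains(@itemprop,'price') would also match a tag such as itemprop="priceCurrency".

```python
from html.parser import HTMLParser


class MetaPriceParser(HTMLParser):
    """Collects the content attribute of every <meta itemprop="price"> tag.

    Equivalent in spirit to the XPath //meta[@itemprop='price']/@content.
    """

    def __init__(self):
        super().__init__()
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; a self-closing
        # <meta ... /> also arrives here, because handle_startendtag
        # delegates to handle_starttag by default.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("itemprop") == "price":
            self.prices.append(attributes.get("content"))


def extract_meta_prices(html):
    parser = MetaPriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical markup for illustration only, not officedepot.com's real page.
SAMPLE_HTML = """
<html><head>
  <meta itemprop="price" content="58.99" />
  <meta itemprop="priceCurrency" content="USD" />
</head><body></body></html>
"""

if __name__ == "__main__":
    print(extract_meta_prices(SAMPLE_HTML))  # ['58.99']
```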
Office Depot seems to block requests coming from Google Sheets.
Some clues:
Get the price directly from the JSON endpoint (we can't use the ImportJSON script to load it directly into Sheets, since those requests are blocked too). Change the product ID in the URL accordingly:
https://www.officedepot.com/mobile/getAjaxPriceListFromService.do?items=273646&mapBySkuId=true&includeOos=true
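Only the items value changes from one product to the next. A small Python sketch of building that URL for another product ID; it does not fetch anything, and the endpoint is simply the one quoted above, which may have changed or may reject non-browser clients.

```python
from urllib.parse import urlencode

PRICE_ENDPOINT = "https://www.officedepot.com/mobile/getAjaxPriceListFromService.do"


def price_list_url(product_id):
    """Builds the price-list URL from the answer for a given product ID."""
    query = urlencode({
        "items": product_id,      # the only part that changes per product
        "mapBySkuId": "true",
        "includeOos": "true",
    })
    return PRICE_ENDPOINT + "?" + query


if __name__ == "__main__":
    print(price_list_url("273646"))
```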
Another option could be to use SerpApi (commercial) + ImportJSON to fetch the product price from Google Shopping.
Or you can use the Google Custom Search API (free) + ImportJSON. Output:
You need an API key and an ImportJSON script (credit to Brad Jasper) to do this. Once you have installed the script and activated your API key, create a search engine. In its settings, define the target website.
Note your search engine ID (cx=XXXXXXXXXX) somewhere. Once this is done, and assuming you have the product URLs in column A, paste in cell B2:
=REGEXEXTRACT(A2;"products\/(\d+)")
This extracts the product ID.
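The same extraction in Python, for anyone preprocessing URLs outside Sheets. It is a sketch using the standard re module; the pattern is the one from the B2 formula, minus the escaped slash, which neither RE2 (used by Sheets) nor Python requires.

```python
import re

# Same idea as REGEXEXTRACT(A2;"products\/(\d+)"): digits right after "products/".
PRODUCT_ID_PATTERN = re.compile(r"products/(\d+)")


def product_id(url):
    """Returns the numeric product ID from a product URL, or None if absent."""
    match = PRODUCT_ID_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(product_id(
        "https://www.officedepot.com/a/products/273646/Office-Depot-White-Copy-Paper-Letter/"
    ))
```

Where REGEXEXTRACT would return #N/A for a URL without a match, this version returns None.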
In cell C2, paste:
="https://customsearch.googleapis.com/customsearch/v1?cx={yoursearchengineID}&key={yourAPIkey}&num=1&fields=items(pagemap(offer(price)))&q="&B2
This builds the API request. Replace {yoursearchengineID} and {yourAPIkey} in the formula with your own search engine ID and API key.
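A Python equivalent of the C2 concatenation, as a sketch: the parameters (cx, key, num, fields, q) are exactly those in the formula, and the values in the usage line are placeholders. urlencode percent-encodes the parentheses of the fields mask, which the formula leaves raw; since they decode back to the same characters, both forms should reach the API as the same query.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://customsearch.googleapis.com/customsearch/v1"


def custom_search_url(search_engine_id, api_key, product_id):
    """Builds the same request as the C2 formula."""
    params = {
        "cx": search_engine_id,                    # your search engine ID
        "key": api_key,                            # your API key
        "num": "1",                                # only the top result
        "fields": "items(pagemap(offer(price)))",  # trim the response to the price
        "q": product_id,                           # the product ID from B2
    }
    return CSE_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(custom_search_url("YOUR_CX", "YOUR_API_KEY", "273646"))
```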
In cell D2, paste:
=QUERY(ImportJSON(C2);"SELECT Col1 label Col1''";1)
This imports the JSON result and cleans it up a bit (the label clause blanks out the header).
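What ImportJSON flattens here is a response that the fields mask restricts to items, then pagemap, then offer, then price. A Python sketch of pulling the first price out of such a response; the sample JSON is hypothetical and only shaped after that mask, not a captured API response. When nothing is found (as can happen with newer products), the function returns None instead of raising.

```python
import json


def first_offer_price(response_text):
    """Returns the first pagemap offer price in a search response, or None."""
    data = json.loads(response_text)
    for item in data.get("items", []):
        for offer in item.get("pagemap", {}).get("offer", []):
            if "price" in offer:
                return offer["price"]  # kept as a string, like in the sheet
    return None


# Hypothetical response shaped by fields=items(pagemap(offer(price))).
SAMPLE_RESPONSE = '{"items": [{"pagemap": {"offer": [{"price": "58.99"}]}}]}'

if __name__ == "__main__":
    print(first_offer_price(SAMPLE_RESPONSE))
```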
Notes: this method may fail for some products (newer ones). I'm based in Europe, so you may need to replace ";" with "," in the formulas.
I've been trying, without success, to use IMPORTXML in Google Sheets to scrape the Advanced Receiving table from https://www.pro-football-reference.com/boxscores/201912290car.htm.
I've tried the XPath copied directly from Chrome's Inspect panel: //*[@id="div_receiving_advanced"]
but I always get the "Imported content is empty" error message.
I'm stumped because it works for the Passing, Rushing, & Receiving table with the XPath: //*[@id="div_player_offense"]
When I use the XPath //*[@id="all_receiving_advanced"], I get the following results.
unparsed results
However, I'd like to parse the data from the 2nd column so it looks like this.
parsed results
Any help would be greatly appreciated.
Since some players have no value in certain columns (e.g., "Rec/Br"), transforming the data returned by IMPORTXML directly produces a scrambled table: empty cells are simply skipped, so later values shift into the wrong columns.
Two solutions:
A) Use the IMPORTFROMWEB add-on (the number of requests is limited on the free plan) with JS rendering enabled and the base selector option, which preserves the row structure. XPath expressions needed for the data:
/th/a
/td[@data-stat="team"]
/td[@data-stat="targets"]
/td[@data-stat="rec"]
/td[@data-stat="rec_yds"]
/td[@data-stat="rec_first_down"]
/td[@data-stat="rec_air_yds"]
/td[@data-stat="rec_air_yds_per_rec"]
/td[@data-stat="rec_yac"]
/td[@data-stat="rec_yac_per_rec"]
/td[@data-stat="rec_broken_tackles"]
/td[@data-stat="rec_broken_tackles_per_rec"]
/td[@data-stat="rec_drops"]
/td[@data-stat="rec_drop_pct"]
For the headers:
//div[@id="div_receiving_advanced"]//th[contains(@class,"poptip")]
For the base selector:
//div[@id="div_receiving_advanced"]//tr[@data-row][not(@class)]
Formula used in C6:
=IMPORTFROMWEB(B1;B2:O2;B3:C4)
Output:
Side note: IMPORTFROMWEB often returns loading errors.
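To illustrate why a base selector keeps the table aligned: each row is selected first, then every field path is evaluated relative to that row, so a missing value becomes an empty (here "0") cell instead of pulling the next value into its column. The Python sketch below applies that idea with xml.etree.ElementTree to a hypothetical, well-formed fragment built from the data-stat names above. It is not how IMPORTFROMWEB works internally, and the real page is not well-formed XML, so ElementTree could not parse it directly. ElementTree's limited XPath has no not(), so that predicate is checked in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; the real page is neither this small nor valid XML.
SAMPLE_TABLE = """
<div id="div_receiving_advanced"><table><tbody>
  <tr data-row="0"><th><a>Player A</a></th>
    <td data-stat="team">CAR</td><td data-stat="rec_broken_tackles">1</td></tr>
  <tr data-row="1"><th><a>Player B</a></th>
    <td data-stat="team">NOR</td><td data-stat="rec_broken_tackles"></td></tr>
  <tr data-row="2" class="thead"><th>Player</th></tr>
</tbody></table></div>
"""

# Field paths relative to each row, like the per-field XPaths in option A.
FIELDS = {
    "player": "th/a",
    "team": "td[@data-stat='team']",
    "broken_tackles": "td[@data-stat='rec_broken_tackles']",
}


def rows_from_base_selector(xml_text):
    root = ET.fromstring(xml_text.strip())
    records = []
    # Base selector: every row carrying data-row; [not(@class)] is applied in Python.
    for row in root.iterfind(".//tr[@data-row]"):
        if "class" in row.attrib:
            continue
        record = {}
        for name, path in FIELDS.items():
            cell = row.find(path)
            text = cell.text if cell is not None else None
            # Missing or empty cells become "0", so the columns never shift.
            record[name] = text if text else "0"
        records.append(record)
    return records


if __name__ == "__main__":
    for record in rows_from_base_selector(SAMPLE_TABLE):
        print(record)
```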
B) Use IMPORTDATA and formulas to build the table. First, load the lines of interest with a filter (QUERY). Then fix the blank-cell problem with SUBSTITUTE. Next, extract the values with REGEXREPLACE, joining the capture groups with ";". Finally, apply a last filter and SPLIT the data to populate the cells.
Formula:
=ARRAYFORMULA(SPLIT(QUERY(ARRAYFORMULA(REGEXREPLACE(ARRAYFORMULA(SUBSTITUTE(QUERY(IMPORTDATA(B3);"select Col1 where Col1 contains 'rec_broken_tackles_per_rec'");"></td>";">0</td>"));".+htm.+?>(.+?)<.+team.+([A-Z]{3}).+targets.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+?rec.+?>(.+?)<.+";"$1;$2;$3;$4;$5;$6;$7;$8;$9;$10;$11;$12;$13;$14"));"select * WHERE NOT Col1 contains '<'");";"))
Output:
In both cases, blank cells are replaced with 0.
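The four steps of the IMPORTDATA formula can be mirrored in Python as a sketch. The input lines are hypothetical (modelled on the data-stat attributes listed in option A, not copied from the site), and the row pattern is simplified to four columns with explicit data-stat anchors rather than the 14 loosely anchored groups of the sheet formula.

```python
import re

MARKER = "rec_broken_tackles_per_rec"

# Simplified row pattern: player, team, targets, broken tackles per reception.
ROW_PATTERN = re.compile(
    r'.+\.htm">(.+?)<.+data-stat="team">([A-Z]{3})<'
    r'.+data-stat="targets">(.+?)<'
    r'.+data-stat="rec_broken_tackles_per_rec">(.+?)<.+'
)


def parse_rows(raw_lines):
    # 1. QUERY ... contains: keep only lines mentioning the marker column.
    lines = [line for line in raw_lines if MARKER in line]
    # 2. SUBSTITUTE: give empty cells an explicit 0 so every group can match.
    lines = [line.replace("></td>", ">0</td>") for line in lines]
    # 3. REGEXREPLACE: rewrite each matching row as "a;b;c;d".
    lines = [ROW_PATTERN.sub(r"\1;\2;\3;\4", line) for line in lines]
    # 4. Drop lines the pattern did not rewrite (they still contain "<"), then SPLIT.
    return [line.split(";") for line in lines if "<" not in line]


# Hypothetical lines, one HTML row per line as IMPORTDATA would return them.
SAMPLE_LINES = [
    '<th aria-label="Broken tackles per reception" '
    'data-stat="rec_broken_tackles_per_rec" scope="col">Rec/Br</th>',
    '<tr><th data-stat="player"><a href="/players/A/PlayA00.htm">Player A</a></th>'
    '<td data-stat="team">CAR</td><td data-stat="targets">7</td>'
    '<td data-stat="rec_broken_tackles_per_rec"></td></tr>',
    '<tr><th data-stat="player"><a href="/players/B/PlayB00.htm">Player B</a></th>'
    '<td data-stat="team">NOR</td><td data-stat="targets">3</td>'
    '<td data-stat="rec_broken_tackles_per_rec">0.5</td></tr>',
    "<p>unrelated line</p>",
]

if __name__ == "__main__":
    for row in parse_rows(SAMPLE_LINES):
        print(row)
```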
My working workbook is here.
EDIT:
For the "Advanced Defense" table with IMPORTDATA:
=ARRAYFORMULA(SPLIT(QUERY(ARRAYFORMULA(REGEXREPLACE(ARRAYFORMULA(SUBSTITUTE(QUERY(IMPORTDATA(B3);"select Col1 where Col1 contains 'def_tgt_yds_per_att'");"></td>";">0</td>"));".+htm.+?>(.+?)<.+team.+([A-Z]{3})<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?def.+?>(.+?)<.+?bli.+?>(.+?)<.+?qb_.+?>(.+?)<.+?qb_.+?>(.+?)<.+?sac.+?>(.+?)<.+?pre.+?>(.+?)<.+?tac.+?>(.+?)<.+?tac.+?>(.+?)<.+?tac.+?>(.+?)<.+";"$1;$2;$3;$4;$5;$6;$7;$8;$9;$10;$11;$12;$13;$14;$15;$16;$17;$18;$19;$20;$21;$22"));"select * WHERE NOT Col1 contains '<'");";"))
Output:
I'm trying to parse this table into Google Sheets: https://exvius.gamepedia.com/Chaining/Bolting_Strike
and to get the title text wherever there are images.
I can't figure out how to get the text from the full table, as well as img/@alt where it's available. I can get the table with
=IMPORTXML("https://exvius.gamepedia.com/Chaining/Bolting_Strike","//table[@class='wikitable']/tbody/tr[position()>=3]")
and only the image texts with
=IMPORTXML("https://exvius.gamepedia.com/Chaining/Bolting_Strike","//table[@class='wikitable']/tbody/tr[position()>=3]/td/a/img/@alt")
But I can't seem to do both. Is that a limitation of IMPORTXML in Google Sheets?
I've tried OR and other Boolean operators with no luck. I also tried axes, but that didn't work for me either.
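In XPath 1.0, two node-sets are combined with the union operator |, not with or (which returns a boolean), so a union of the two expressions above may be worth trying, though it is untested here. To make the goal concrete, here is a Python sketch of the desired output: for each cell, its text plus the alt text of any image inside it. It runs on a hypothetical table snippet, not the wiki's real markup, and assumes tables are not nested.

```python
from html.parser import HTMLParser


class CellTextAndAlt(HTMLParser):
    """Collects, per <td>, its text plus the alt text of any <img> inside it."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append([])
        elif tag == "img" and self._in_td:
            alt = dict(attrs).get("alt")
            if alt:
                self.cells[-1].append(alt)  # image text, in document order

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and data.strip():
            self.cells[-1].append(data.strip())  # ordinary cell text


def cell_values(html):
    parser = CellTextAndAlt()
    parser.feed(html)
    parser.close()
    return [" | ".join(parts) for parts in parser.cells]


# Hypothetical snippet, not the wiki's real markup.
SAMPLE_ROW = (
    '<table class="wikitable"><tr><td>Ability A</td>'
    '<td><a href="#"><img alt="Thunder" src="t.png"></a></td>'
    '<td>x2 <a href="#"><img alt="Water" src="w.png"></a></td></tr></table>'
)

if __name__ == "__main__":
    print(cell_values(SAMPLE_ROW))
```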
I propose something like this:
Sheet
Description:
In B1 we have the url of the webpage.
In B3 we have the following formula to import the first part of the table:
=QUERY(IMPORTHTML(B1;"table";1);"select Col1,Col2,Col3 OFFSET 2";0)
Columns L to O contain the following formulas to get the element names and the ability names (the ability names are used as the key in a VLOOKUP step). There are four formulas because an ability can have two element names. In L3, M3, N3 and O3 we have:
=IMPORTXML(B1;"//td/a[1]/img[@srcset]/ancestor::td[1]/preceding::a[1][@title]")
=IMPORTXML(B1;"//td/a[1]/img[@srcset]/@alt")
=IMPORTXML(B1;"//td/a[2]/img[@srcset]/ancestor::td[1]/preceding::a[1][@title]")
=IMPORTXML(B1;"//td/a[2]/img[@srcset]/@alt")
The formula in E4 is a one-liner that merges the results of two VLOOKUPs, which pair each ability name with its element(s).
=ARRAYFORMULA(REGEXREPLACE(ARRAYFORMULA(IFERROR(VLOOKUP(C4:INDIRECT("C"&COUNTA(C:C)+2);L:M;2;FALSE);"")&"|"&ARRAYFORMULA(IFERROR(VLOOKUP(C4:INDIRECT("C"&COUNTA(C:C)+2);N:O;2;FALSE);"")));"^\||\|$";""))
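The logic of that one-liner, as a Python sketch: two dictionaries stand in for the L:M and N:O lookup ranges (the names in the test data are hypothetical), a missing key yields an empty string as IFERROR(...;"") does, and a leading or trailing pipe is stripped as the REGEXREPLACE does.

```python
import re


def merge_elements(ability_names, first_map, second_map):
    """Pairs each ability with up to two elements, joined by "|"."""
    merged = []
    for name in ability_names:
        first = first_map.get(name, "")    # IFERROR(VLOOKUP(...;L:M;2;FALSE);"")
        second = second_map.get(name, "")  # IFERROR(VLOOKUP(...;N:O;2;FALSE);"")
        combined = first + "|" + second
        # Same pattern as the REGEXREPLACE: drop a leading or trailing "|".
        merged.append(re.sub(r"^\||\|$", "", combined))
    return merged


if __name__ == "__main__":
    print(merge_elements(["A", "B"], {"A": "Fire", "B": "Ice"}, {"A": "Wind"}))
```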
To finish, in H3 we have the last part of the table:
=QUERY(IMPORTHTML(B1;"table";1);"select Col5,Col6 OFFSET 2";0)
The rest (colours, borders, etc.) is standard and conditional formatting.
Side note: I'm based in Europe, so you might have to replace ; with , in the formulas.
Several investments on TIAA.org are not traded on exchanges and are not available via a ticker symbol through, say, GOOGLEFINANCE. For one of these, I would like to scrape the daily price directly from the TIAA.org website into a cell automatically.
In Google Sheets I thought it would be easy enough to use IMPORTHTML with a table, but no luck. I've experimented with IMPORTXML but can't figure out the XPath query for the specific price I'm interested in; I keep ending up with an #N/A cell ("Error: Imported content is empty").
Using this URL:
https://www.tiaa.org/public/investment-performance/tiaavariableannuity/profile?ticker=41091375
Could someone take a look and suggest how I might import the daily price (aka unit value) for QREARX into a Google Sheets cell using an XPath with IMPORTXML, or another method?
Thanks
If you slightly change your URL prefix to https://www.tiaa.org/public/tcfpi/Investment/Portfolio?symbol= and then add your ticker number at the end, as in your example (41091375),
then you can use this IMPORTXML formula to pull in the value you're looking for:
=IMPORTXML("https://www.tiaa.org/public/tcfpi/Investment/Portfolio?symbol=41091375","//*[@class='first']/text()")
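As a sketch of what that XPath does: it selects the direct text of any element whose class attribute is exactly "first" (an element with class="first other" would not match). The Python version below mimics it with HTMLParser on a hypothetical snippet, assuming well-nested markup; the value in it is a placeholder, not real TIAA data, and the page layout may have changed since this answer.

```python
from html.parser import HTMLParser

PORTFOLIO_PREFIX = "https://www.tiaa.org/public/tcfpi/Investment/Portfolio?symbol="
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def portfolio_url(symbol):
    """Prefix from the answer plus the ticker number."""
    return PORTFOLIO_PREFIX + symbol


class ClassFirstText(HTMLParser):
    """Collects direct text children of elements whose class is exactly "first"."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._open = []  # one flag per open element: is its class exactly "first"?

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self._open.append(dict(attrs).get("class") == "first")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Only text whose innermost open element matched, like /text().
        if self._open and self._open[-1] and data.strip():
            self.texts.append(data.strip())


def first_class_texts(html):
    parser = ClassFirstText()
    parser.feed(html)
    parser.close()
    return parser.texts


# Placeholder markup and value, not real TIAA data.
SAMPLE_HTML = '<table><tr><td class="first">123.45</td><td>0.00</td></tr></table>'

if __name__ == "__main__":
    print(portfolio_url("41091375"))
    print(first_class_texts(SAMPLE_HTML))
```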
I've been working on a project where I need to consolidate data from two other sheets in the spreadsheet and filter the result for easy viewing. However, I noticed that when the filter returns no results, I get a #VALUE! error, even when I wrap it in IFERROR.
Link to a sample Google spreadsheet.
There are two classes, and I want to filter out the students who passed in each class and populate the table in the collated sheet.
You should be able to do something like this:
=query({Class1!A2:C; Class2!A2:C}, "where Col3 = 'pass'")
(Change the sheet names and ranges to suit.)
(Also check the formula I entered in A1 of sheet 'JP')
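The formula stacks the two ranges vertically with {range1; range2} and keeps the rows whose third column equals 'pass'. A Python sketch of the same operation on hypothetical rows, for readers who want to see the logic step by step; blank rows from the open-ended A2:C ranges are dropped because their third column is not 'pass'.

```python
def passed_students(class1_rows, class2_rows):
    """Stacks two ranges and keeps rows whose third column is 'pass'.

    Mirrors query({Class1!A2:C; Class2!A2:C}, "where Col3 = 'pass'").
    """
    stacked = list(class1_rows) + list(class2_rows)  # {range1; range2}
    return [row for row in stacked if len(row) >= 3 and row[2] == "pass"]


if __name__ == "__main__":
    class1 = [["Ann", "1A", "pass"], ["Bob", "1A", "fail"]]
    class2 = [["Cid", "1B", "pass"], ["", "", ""]]  # trailing blank row
    print(passed_students(class1, class2))
```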
I recently began working with import.io to scrape various websites for data I use in researching stocks. I created an API that pulls five years of financial data from ADVFN.com, then used the export function within import.io to integrate that API with Google Sheets. It works great. However, I would like to edit the path in the formula so that I can use the API to call up the same data for any stock symbol I enter in Sheet1, cell B1. Below is the formula. The stock symbol here is ATW. As you can see, you need to specify the exchange in the path as well (in this case, NYSE). I want to enter a stock symbol in cell B1 and have the formula search for that symbol and return the data. Any assistance here would be appreciated. Thanks!
=ImportHtml("https://api.import.io/store/data/a5816419-a232-440d-92c8-09bf989ccbca/_query?input/webpage/url=http%3A%2F%2Fwww.advfn.com%2Fstock-market%2FNYSE%2FATW%2Ffinancials%3Fbtn%3Dstart_date%26start_date%3D11%26mode%3Dannual_reports&format=HTML&_user=102fe4dc-b403-423f-89a4-e16151128d92&_apikey=102fe4dcb403423f89a4e16151128d92f6bf183ba6b6e3907e836234a054aef23b1c51201a53e1e8336367c8282bafcabffbdbcf85b0d9de2f0769008b04cf99e0fa46179a9279f4d82bbfebf8b3a660", "table", 1)
Here's one way to do it.
Enter the stock symbol in cell B1. The URL is built in cell C1, and the import function goes in cell A2.
The formula in A2 would be
=ImportHtml(C1, "table", 1)
and the formula in C1 would be
="https://api.import.io/store/data/a5816419-a232-440d-92c8-09bf989ccbca/_query?input/webpage/url=http%3A%2F%2Fwww.advfn.com%2Fstock-market%2FNYSE%2F"&$B$1&"%2Ffinancials%3Fbtn%3Dstart_date%26start_date%3D11%26mode%3Dannual_reports&format=HTML&_user=102fe4dc-b403-423f-89a4-e16151128d92&_apikey=102fe4dcb403423f89a4e16151128d92f6bf183ba6b6e3907e836234a054aef23b1c51201a53e1e8336367c8282bafcabffbdbcf85b0d9de2f0769008b04cf99e0fa46179a9279f4d82bbfebf8b3a660"
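A Python sketch of the same idea: the symbol (and, as an assumption going beyond the answer, the exchange) is inserted into the ADVFN URL, which is then percent-encoded as a whole and embedded in the import.io query. The extractor ID, user ID and API key are hypothetical placeholders, and the endpoint layout is copied from the question; import.io may no longer serve it. Encoding the full inner URL, rather than pasting the symbol into an already-encoded string, keeps a symbol containing reserved characters from breaking the query.

```python
from urllib.parse import quote

# Hypothetical placeholders: substitute your own import.io values.
EXTRACTOR_ID = "YOUR-EXTRACTOR-ID"
USER_ID = "YOUR-USER-ID"
API_KEY = "YOUR-API-KEY"


def advfn_financials_url(exchange, symbol):
    """The ADVFN page the extractor scrapes, before URL-encoding."""
    return (
        f"http://www.advfn.com/stock-market/{exchange}/{symbol}/financials"
        "?btn=start_date&start_date=11&mode=annual_reports"
    )


def import_io_url(exchange, symbol):
    """Embeds the encoded ADVFN URL in the import.io query, like the C1 formula."""
    encoded = quote(advfn_financials_url(exchange, symbol), safe="")
    return (
        f"https://api.import.io/store/data/{EXTRACTOR_ID}/_query"
        f"?input/webpage/url={encoded}&format=HTML"
        f"&_user={USER_ID}&_apikey={API_KEY}"
    )


if __name__ == "__main__":
    print(import_io_url("NYSE", "ATW"))
```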