Importing book names from goodreads.com into Google Sheets with ImportXML gives "Import Internal Error" sometimes - xpath

I have a formula that fetches names of books from goodreads.com:
=IMPORTXML("https://www.goodreads.com/book/show/" & gr_id; "//*[#id='bookTitle']")
where gr_id is a column containing ids of the books. For example when gr_id=23848607, it fetches from URL https://www.goodreads.com/book/show/23848607 and the result is "Warheart".
The formula worked fine some time ago. I did not change anything and now I noticed it stopped working for some of the books (still working for others). Instead of the name of the book now it gives N/A with "Import Internal Error" hint. The ids that do not work are:
48332548
35906922
How to make it work for all books?
There were many questions posted about "Import Internal Error" problems. I tried some solutions including copying the formula to a fresh sheet, but it did not work.
Update: I tried the following different XPath formulas instead of "//*[#id='bookTitle']".
"//h1[#id='bookTitle']"
"//h1"
Those different XPath formulas worked the same as the original XPath formula. They worked correctly for the same ids that the original one did and produced N/As for the same ids that the original one did.
Update: I just re-checked and all my formulas worked correctly for all gr_ids (I had not changed anything since the time when they did not work.) May be someone knows how to prevent them from stopping working in the future.
Update: I re-checked once again. Of all gr_ids only this one was showing N\A now: 35906922. I created an example spreadsheet, because my working spreadsheet contains too many unrelated details, but the problem did not appear in the example spreadsheet. I went back to my working spreadsheet and reloaded it - and the problem disappeared in my working spreadsheet too. Then I added more test data in the example spreadsheet and the following new example gr_ids showed N\A:
48213012
48213092
I tried to make a copy of the example spreadsheet to see if it fixes the problem. The behavior in the copy example spreadsheet was identical to the original example spreadsheet - the problem only with two gr_ids specified above.

if you run full IMPORTXML on those two IDs you can see it won't return anything at all:
=IMPORTXML("https://www.goodreads.com/book/show/48213012-fathers-and-sons", "//*")
which means that Google Sheets can't reach the XML content for some reason (could be something similar to https://stackoverflow.com/a/24891676/5632629)
therefore we can try to read the source code directly with IMPORTDATA where we can find around 70 elements with the same information so we pick one, isolate it and remove HTML tags. then we just wrap the prior formula in IFERROR and force the formula to take a 2nd look if it fails first time. the result is like this:
=IFERROR(IMPORTXML("https://www.goodreads.com/book/show/"&A:A, "//*[#id='bookTitle']"),
REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(
IMPORTDATA("https://www.goodreads.com/book/show/"&A:A), 100, 1),
"select Col1 where Col1 contains '</title>'"), ">(.*) by"))

IMPORTXML() seems to be unreliable. I decided not to use it, because I did not find an acceptable solution to my problem. Instead of using IMPORTXML() I exported my books from goodreads.com to csv file (there is such a feature of goodreads.com) and then imported the csv file into my spreadsheet. This is not be an perfect solution, because I need to re-import every time I need to update the books, but at least it works.

Related

I'm trying to use Data Validation in Google Sheets to only accept a hex color – any help appreciated

I think the title says it all. I have a shared google sheet which I'd like to limit several columns to make sure the information is correctly added as a hexcode.
I've been searching and I just can't seem to find anything, but this code I found for EXCEL may be a starting point:
=AND(LEN(A2)<13,ISERROR(HEX2DEC(A2))=FALSE)
It does not seem to work for Sheets...
try:
=ISNUMBER(HEX2DEC(REGEXEXTRACT(A1, "#(.*)")))*(LEN(A1)<8)

What is the required file format for Google AutoML Datasets?

Whenever I try to upload my dataset to the AutoML Natural Language Web UI, I get the error
Something is wrong, please try again.
The documentation is not very insightful about how my CSV file is supposed to look, but I tried to make a simple sample file just to make sure it works at all, it looks like this:
text,label
asdf,cat
asodlkao,dog
asdkasdsadksafask,cat
waewq23,cat
dads,cat
saiodjas,cat
skdoaskdoas,dog
hgfkgizk,dog
fzdrgbfd,cat
otiujrhzgf,cat
vchztzr,dog
aksodkasodks,dog
sderftz,dog
dsoakd,dog
qweqweqw,cat
asdqweqe,cat
dkawosdkaodk,dog
ewqeweq,cat
fdsffds,dog
bvcghh,cat
rthnghtd,dog
sdkosadkasodk,cat
sdjidghdfig,cat
kfodskdsof,dog
saodsadok,dog
ksaodksaod,dog
vncvb,cat
I chose this formatting according to the Google suggested Syntax
But even with this formatting I still get the same error
I've seen this question Format of the input dataset for Google AutoML Natural Language multi-label text classification but according to the answers there it seems my formatting should work, so I do not know why I get the error
I've just copied the CSV file and uploaded it to my own project and the dataset created worked. One problem is that an extra label was created "label" - this is because the header is not expected to be in the csv file (probably this should get fixed).
Based on that it seems the problem isn't the CSV file format. I would recommend to check if your project is setup correctly. You can open a bug to get someones help. Either you can open a bug in public issue tracker or send feedback using the UI (there is 'Feedback' option in the menu on top right side of the page).
I have found the problem! As Michal K said, there was nothing wrong with the formatting, the real problem was I was not assigned the role of Storage Object Creator, which is necessary because the Data is uploaded in Cloud Storage first

Google Sheets xpath query not working

Hoping someone smarter than me can help me sort this out! I've been stumped for a few days now trying to pull some data from website into Google Sheet using ImportXML with no luck.
I'm looking to import the average odds for various sporting events from the website Oddsportal.com which update and change throughout the day. I'd like my sheet to also update these odds, similar to stock prices.
For example:
http://www.oddsportal.com/search/San+Jose+Sharks/
I would like to pull the Average Odds for Team "1" (+136) into a cell, the odds for Tie "X"(+277) into a cell and Team "2"(+161) into individual cells. Just the odds portion. If it's unable to be pulled from that page it is also listed on http://www.oddsportal.com/hockey/usa/nhl/san-jose-sharks-nashville-predators-6cPaAHOM/ down at the bottom in the Average Odds Row.
This seems simple enough but I just can't seem to get the ImportXML query correct without an error.
I've looked at the page's source code (Ctrl-U). The original html does not contain needed values, they most likely loaded later thru xhr (ajax) call:
So most likely you'll not succeed with mere a request html.
You need to explore Network in the browser DevTools to find out what request is initiated (by JS files) to get needed data. This might be even unique one containing hash signiture, so you'll not reproduce it for future use.
I recommend you to turn to scriping tools for retrieving that info.

Google spreadsheet ImportXML Error:"the XPath query did not return any data"

I continue to get this error when I try to run this XPath query
//div[#iti='0']
on this link (flight search from google)
https://www.google.com/flights/#search;f=LGW;t=JFK;d=2014-05-22;r=2014-05-26
I get something like this:
=ImportXML("https://www.google.fr/flights/#search;f=jfk;t=lgw;d=2014-02-22;r=2014-02-26";"//div[#iti='0']")
I verified and the XPath is correct (I get the answer wanted using XPath helper, the answer wanted are the data relative to the first flight selected).
I guess that it is a problem of syntax, but I tried more or less all the combinations of lower/uppercase, punctuation (replacing ; , ' ") and I tried to link the URI and the XPath query stored in cells, but nothing works.
Any help will be appreciated.
As a matter of fact, maybe it is a bug on the new google sheets or they have changed how the function works. I've activated mine and when I try to use the ImportXML it simply wont work. Since I have some old sheets here (on the old mechanism) they still work normally. If I copy and paste the script from the old to the new one it simply doesn't get any data.
Here a example:
=ImportXML("http://www.nytimes.com/pages/todayspaper/index.html";"//div[#class='columnGroup first']//h3")
If I run this on the old mechanism it works fine, but if I run the same on the new mechanism, first it will exchange my ";" for a "," and then it will bring a "#N/A" with a warning of "Error: Imported XML content cannot be parsed".
Edit (05/05/2015):
I am happy to say that I tested this function again today on the new spreadsheets and they've fixed it. I was checking that every two months and now finally they have solved this issue. The example I've added above is now returning information.
I'm sorry, but you won't be able to easily parse Google result pages. The reason your function throws an error is because the content of the page you see in your browser is generated by javascript, and Google spreadsheet doesn't execute js.
Your ImportXML has the right syntax, it doesn't return anything because the node you're looking for isn't there (importXML Parse Error).
You will have to find another source if you want these result in your spreadsheet. For info some libraries already parse the usual result page (http://www.seerinteractive.com/blog/google-scraper-in-google-docs-update for example, if it still works), but I doubt finding one for your special case will be easy.
This gives the answer (importXML Parse Error), but it's not entirely obvious.
ImportXML doesn't load Javascript. When you're building ImportXML queries on Google results, make sure you're testing against a version of the page that has Javascript turned off. You can do this using the Chrome DevTools.
(But I agree that ImportXML is fickle, idiosyncratic, and generally rage-inducing).

Generating table of contents

I posted this one a couple of months ago on the Mathematica newsgroup, but got no usable response. I thought I'd give SO a try.
The question was: I don't seem to be able to find the method to generate the table of contents of a
Mathematica document I'm working on. Anyone knows this feauture's
hideout?
David Annetts pointed me in the direction of the AuthorTools, an old v5.1
utility package that's still hidden in Mathematica. However, it
doesn't work on my document (v7). Any clue?
Edit
The TOC should contain correct section numbers (if present in the stylesheet) and list page numbers (this requires taking page size settings into account).
Perhaps looking at the code of Yuri Kandrashkin's package, Sidebar, will be useful?

Resources