XPath Query for Google Docs ImportXML - xpath

I'm trying to pull a series of notes out of Salesforce. I only need the body of each note, and I'd rather not copy them manually.
I've put the URLs of the notes into a Google Docs spreadsheet and I'm trying to use the ImportXML function to pull out specific information, but I can't seem to get the XPath query right.
After some attempts of my own and a fair bit of research (I am a complete beginner, so I might just be searching for the wrong things), I came up with an XPath query like so:
//div[@class="pbSubsection"]//td[@class="data2Col"][5]//text
This results in a parsing error.
I also found that I can open up the Note in Chrome and in developer tools, find the table and right-click to select Copy XPath, which gives me:
//*[@id="ep"]/div[2]/div[2]/table/tbody/tr[5]/td[2]
Even if I append //text onto the end. Obviously this is not as fool-proof as I require; is there something I'm missing here or some tool I can use to figure out the problem with these queries? I tried XMLQuire without much luck.
Then again, if some kind soul wants to take a look at the page code (hastily altered to remove sensitive information) and tell me specifically what I'm missing, I'll settle for that:
https://www.dropbox.com/s/peo5i47du1vtsmu/test.html
The text I'm trying to pull is:
teamviewer 12345
Server: Customer Name, ST
Username: administrator
Password: password1
Any ideas? Thanks in advance for your time.

"//div[@class='pbSubsection']//td[@class='data2Col']/text()"
yields
['Connection Details',
'teamviewer 12345 \r',
'\r',
'Server: Customer Name, ST\r',
'Username: administrator\r',
'Password: password1']
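
To try the same attribute test outside Sheets, here is a minimal sketch using Python's standard xml.etree.ElementTree. That module supports only a subset of XPath (there is no text() step), so the text nodes are collected with itertext() instead. The markup is a hand-written guess at the shape of the note table (the labelCol cells and the <br/> separators are assumptions), not the real Salesforce page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the note page; the real markup may differ.
SAMPLE = """<div class="pbSubsection">
  <table>
    <tr><td class="labelCol">Title</td><td class="data2Col">Connection Details</td></tr>
    <tr><td class="labelCol">Body</td><td class="data2Col">teamviewer 12345<br/>Server: Customer Name, ST<br/>Username: administrator<br/>Password: password1</td></tr>
  </table>
</div>"""

root = ET.fromstring(SAMPLE)

# The predicate uses @class; an attribute test is always written with "@".
cells = root.findall(".//td[@class='data2Col']")

# itertext() walks every text node under a cell, much like /text() plus the
# text that follows each <br/>; blank fragments are dropped.
texts = [[t.strip() for t in cell.itertext() if t.strip()] for cell in cells]

for cell_text in texts:
    print(cell_text)
```

If only the note body is wanted, taking the last matching cell (texts[-1] in this sample) avoids depending on a fixed position such as [5].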

Related

Getting a xPath from XML document

I am trying to get some values from an online XML document, but I cannot find the right xpath to navigate to those values. I want to import these values into a Google Spreadsheet document, which requires me to get the exact xpath.
The website is this one, and I am trying to get the "WillPay" values from MeetingInfo Venue=S1, Races RaceNo=1, Pools PoolInfo Pool=WIN, in OddsInfo.
For now, the value of "Number=1" should be 3350 (or something close to this, it changes quite often), and I would like to load all of these values onto the google spreadsheet document.
What I've tried is working out the full path to the attribute. My best attempt is
"/AOSBS_XML/Meetings/MeetingInfo/Races/Pools/PoolInfo/OddsSet/OddsInfo/@WillPay"
but it doesn't work.
I've been stuck on this problem for months now and I've been avoiding it, but realised I can't anymore because it's hindering my work. Please help.
Thanks!
-Brandon
Try using this xpath expression:
//MeetingInfo[@Venue="S1"]/Races//RaceInfo[@RaceNo="1"]//Pools//PoolInfo[@Pool="WIN"]//OddsSet//OddsInfo[@Number="1"]/@WillPay
An alternative:
//OddsInfo[@WillPay][ancestor::PoolInfo[@Pool='WIN'] and ancestor::RaceInfo[@RaceNo='1'] and ancestor::MeetingInfo[@Venue='S1']]
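
The effect of chaining attribute predicates can be checked locally before pasting the expression into Sheets. The sketch below is only an illustration: the XML is an invented sample whose nesting is assumed from the element names in the question (the real feed may differ), and it uses Python's xml.etree.ElementTree, which cannot select an attribute node directly, so the element is located first and the attribute is read with get().

```python
import xml.etree.ElementTree as ET

# Invented sample; element nesting is an assumption based on the question.
SAMPLE = """<AOSBS_XML>
  <Meetings>
    <MeetingInfo Venue="S1">
      <Races>
        <RaceInfo RaceNo="1">
          <Pools>
            <PoolInfo Pool="WIN">
              <OddsSet>
                <OddsInfo Number="1" WillPay="3350"/>
                <OddsInfo Number="2" WillPay="1200"/>
              </OddsSet>
            </PoolInfo>
            <PoolInfo Pool="PLA">
              <OddsSet>
                <OddsInfo Number="1" WillPay="900"/>
              </OddsSet>
            </PoolInfo>
          </Pools>
        </RaceInfo>
      </Races>
    </MeetingInfo>
  </Meetings>
</AOSBS_XML>"""

root = ET.fromstring(SAMPLE)

# Each [@name='value'] predicate narrows the match at that level, so only the
# WIN pool of race 1 at venue S1 is searched for horse number 1.
path = (
    ".//MeetingInfo[@Venue='S1']"
    "//RaceInfo[@RaceNo='1']"
    "//PoolInfo[@Pool='WIN']"
    "//OddsInfo[@Number='1']"
)
odds = root.find(path)
will_pay = odds.get("WillPay") if odds is not None else None
print(will_pay)
```

Without the predicates, the PLA pool's OddsInfo would match as well, which is why a bare path to every OddsInfo returns more values than wanted.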

Importing book names from goodreads.com into Google Sheets with ImportXML gives "Import Internal Error" sometimes

I have a formula that fetches names of books from goodreads.com:
=IMPORTXML("https://www.goodreads.com/book/show/" & gr_id; "//*[@id='bookTitle']")
where gr_id is a column containing ids of the books. For example when gr_id=23848607, it fetches from URL https://www.goodreads.com/book/show/23848607 and the result is "Warheart".
The formula worked fine some time ago. I did not change anything and now I noticed it stopped working for some of the books (still working for others). Instead of the name of the book now it gives N/A with "Import Internal Error" hint. The ids that do not work are:
48332548
35906922
How to make it work for all books?
There were many questions posted about "Import Internal Error" problems. I tried some solutions including copying the formula to a fresh sheet, but it did not work.
Update: I tried the following different XPath formulas instead of "//*[@id='bookTitle']".
"//h1[@id='bookTitle']"
"//h1"
Those different XPath formulas worked the same as the original XPath formula. They worked correctly for the same ids that the original one did and produced N/As for the same ids that the original one did.
Update: I just re-checked and all my formulas worked correctly for all gr_ids (I had not changed anything since the time when they did not work). Maybe someone knows how to prevent them from breaking again in the future.
Update: I re-checked once again. Of all gr_ids, only this one was showing N/A now: 35906922. I created an example spreadsheet, because my working spreadsheet contains too many unrelated details, but the problem did not appear in the example spreadsheet. I went back to my working spreadsheet and reloaded it, and the problem disappeared in my working spreadsheet too. Then I added more test data to the example spreadsheet and the following new example gr_ids showed N/A:
48213012
48213092
I tried to make a copy of the example spreadsheet to see if it fixes the problem. The behavior in the copy example spreadsheet was identical to the original example spreadsheet - the problem only with two gr_ids specified above.
If you run a full IMPORTXML on those two IDs, you can see it won't return anything at all:
=IMPORTXML("https://www.goodreads.com/book/show/48213012-fathers-and-sons", "//*")
This means that Google Sheets can't reach the XML content for some reason (it could be something similar to https://stackoverflow.com/a/24891676/5632629).
Therefore we can try to read the source code directly with IMPORTDATA, where we can find around 70 elements with the same information, so we pick one, isolate it and remove the HTML tags. Then we just wrap the prior formula in IFERROR and force the formula to take a second look if it fails the first time. The result looks like this:
=IFERROR(IMPORTXML("https://www.goodreads.com/book/show/"&A:A, "//*[@id='bookTitle']"),
REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(
IMPORTDATA("https://www.goodreads.com/book/show/"&A:A), 100, 1),
"select Col1 where Col1 contains '</title>'"), ">(.*) by"))
IMPORTXML() seems to be unreliable. I decided not to use it, because I did not find an acceptable solution to my problem. Instead of using IMPORTXML(), I exported my books from goodreads.com to a csv file (goodreads.com has such a feature) and then imported the csv file into my spreadsheet. This is not a perfect solution, because I need to re-import every time I want to update the books, but at least it works.
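
For anyone doing this outside Sheets, the IFERROR idea above (try the structured lookup first, then fall back to the page's title tag) might look roughly like the following Python sketch. It works on an HTML string that is already downloaded; book_title and both sample pages are hypothetical names and data made up for illustration, and the regular expressions are a loose analog of the formula, not a robust HTML parser.

```python
import re
from typing import Optional


def book_title(html: str) -> Optional[str]:
    """Return the book title from page source, or None if neither lookup matches."""
    # Primary lookup: the text right after the tag carrying id="bookTitle".
    primary = re.search(r'id=["\']bookTitle["\'][^>]*>\s*(.*?)\s*<', html, re.S)
    if primary:
        return primary.group(1)
    # Fallback: the <title> tag, cut at the first " by ", similar in spirit to
    # REGEXEXTRACT(..., ">(.*) by") in the formula.
    fallback = re.search(r"<title>\s*(.*?) by ", html, re.S)
    return fallback.group(1) if fallback else None


# Made-up page fragments, for illustration only.
page_with_id = '<h1 id="bookTitle" class="gr-h1">\n  Warheart\n</h1>'
page_title_only = "<html><head><title>Fathers and Sons by Some Author</title></head></html>"

print(book_title(page_with_id))
print(book_title(page_title_only))
```

The fallback cuts at the first " by ", so a title that itself contains " by " would be truncated; that is a limitation of this sketch.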

Xpath implementation in Google Sheets

Xpath newbie question, so forgive me if this seems straight forward, but I really have looked everywhere for the answer!
I'm trying to build a process for extracting all my playlists from Spotify and making it universal, allowing migration across various platforms. I will gladly share once completed as I know many people would find this useful.
I'm unfortunately stumped on trying to extract some data from:
http://musicbrainz.org/ws/2/artist/?query=%22faith%20no%20more%22
I am looking to extract the id from the artist element, which should be b15ebd71-a252-417d-9e1c-3e6863da68f8. I can get this working in BaseX with the following:
declare namespace mmd="http://musicbrainz.org/ns/mmd-2.0#";
declare variable $doc := doc("http://musicbrainz.org/ws/2/artist/?query=%22faith%20no%20more%22");
$doc/mmd:metadata/mmd:artist-list/mmd:artist/@id
However, in Google Sheets using ImportXML, the best I can do is:
=IMPORTXML("http://musicbrainz.org/ws/2/artist/?query=%22faith%20no%20more%22","//@id")
This results in all 3 id results being returned:
b15ebd71-a252-417d-9e1c-3e6863da68f8
489ce91b-6658-3307-9877-795b68554c98
83f22bb6-4631-443c-bace-9fae8540362a
I am completely stumped and any help will be greatly appreciated.
Kind regards,
James
I haven't been able to find any useful documentation on Google's IMPORTXML, but there is no evidence that it provides any way to establish a namespace binding, or that it supports the XPath 2.0 syntax *:metadata to select elements independent of namespace. If that's the case then you may need to resort to the horrible construct *[local-name()='metadata']/*[local-name()='artist-list']/*[local-name()='artist']
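
To illustrate what the local-name() comparison does, here is a small Python sketch that matches elements by name while ignoring their namespace. It is an analog, not a Sheets formula: local_name and children_named are hypothetical helpers, and the XML is a hand-trimmed stand-in for the MusicBrainz response (the assumption that all three ids belong to artist elements is mine).

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the response; only the namespace and ids come from the question.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist-list>
    <artist id="b15ebd71-a252-417d-9e1c-3e6863da68f8"/>
    <artist id="489ce91b-6658-3307-9877-795b68554c98"/>
    <artist id="83f22bb6-4631-443c-bace-9fae8540362a"/>
  </artist-list>
</metadata>"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on tags, like XPath local-name()."""
    return element.tag.rsplit("}", 1)[-1]


def children_named(parent, name):
    """Children of parent whose local name equals name, whatever their namespace."""
    return [child for child in parent if local_name(child) == name]


root = ET.fromstring(SAMPLE)
artist_list = children_named(root, "artist-list")[0]
artists = children_named(artist_list, "artist")

# Taking the first artist mirrors adding a [1] position test after the step.
first_id = artists[0].get("id")
print(first_id)
```

A plain root.find("artist-list") returns None on this sample, because the elements live in the default namespace; that is the same reason an unprefixed XPath step fails to match them.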

Google spreadsheet ImportXML Error:"the XPath query did not return any data"

I continue to get this error when I try to run this XPath query
//div[@iti='0']
on this link (flight search from google)
https://www.google.com/flights/#search;f=LGW;t=JFK;d=2014-05-22;r=2014-05-26
I get something like this:
=ImportXML("https://www.google.fr/flights/#search;f=jfk;t=lgw;d=2014-02-22;r=2014-02-26";"//div[@iti='0']")
I verified and the XPath is correct (I get the answer wanted using XPath helper, the answer wanted are the data relative to the first flight selected).
I guess that it is a problem of syntax, but I tried more or less all the combinations of lower/uppercase, punctuation (replacing ; , ' ") and I tried to link the URI and the XPath query stored in cells, but nothing works.
Any help will be appreciated.
As a matter of fact, maybe it is a bug in the new Google Sheets, or they have changed how the function works. I've activated mine and when I try to use ImportXML it simply won't work. I still have some old sheets here (on the old mechanism) and they work normally. If I copy and paste the formula from the old sheet to the new one, it simply doesn't get any data.
Here is an example:
=ImportXML("http://www.nytimes.com/pages/todayspaper/index.html";"//div[@class='columnGroup first']//h3")
If I run this on the old mechanism it works fine, but if I run the same on the new mechanism, first it will exchange my ";" for a "," and then it will bring a "#N/A" with a warning of "Error: Imported XML content cannot be parsed".
Edit (05/05/2015):
I am happy to say that I tested this function again today on the new spreadsheets and they've fixed it. I was checking that every two months and now finally they have solved this issue. The example I've added above is now returning information.
I'm sorry, but you won't be able to easily parse Google result pages. The reason your function throws an error is because the content of the page you see in your browser is generated by javascript, and Google spreadsheet doesn't execute js.
Your ImportXML has the right syntax, it doesn't return anything because the node you're looking for isn't there (importXML Parse Error).
You will have to find another source if you want these results in your spreadsheet. For info, some libraries already parse the usual result page (http://www.seerinteractive.com/blog/google-scraper-in-google-docs-update for example, if it still works), but I doubt finding one for your special case will be easy.
This gives the answer (importXML Parse Error), but it's not entirely obvious.
ImportXML doesn't load Javascript. When you're building ImportXML queries on Google results, make sure you're testing against a version of the page that has Javascript turned off. You can do this using the Chrome DevTools.
(But I agree that ImportXML is fickle, idiosyncratic, and generally rage-inducing).
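
The point about JavaScript can be reproduced with any parser that reads the raw source without running scripts. In the sketch below, RAW_PAGE is an invented page in which the div with iti='0' only exists after a script runs; ItiFinder is a hypothetical helper built on Python's standard html.parser, used here as a stand-in for what a non-JavaScript fetcher sees.

```python
from html.parser import HTMLParser

# Invented page: the target div is only created when the script executes.
RAW_PAGE = """<html><body>
<div id="root"></div>
<script>
  document.getElementById("root").innerHTML = "<div iti='0'>LGW to JFK</div>";
</script>
</body></html>"""


class ItiFinder(HTMLParser):
    """Records every <div> start tag whose iti attribute equals '0'."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("iti") == "0":
            self.matches.append(attrs)


finder = ItiFinder()
finder.feed(RAW_PAGE)
finder.close()

# Script content is treated as plain text, so the div inside it is never seen.
matches = finder.matches
print(len(matches))
```

A browser extension such as XPath helper runs against the page after scripts have executed, which is why the same query can succeed there and return nothing in ImportXML.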

How to make the cities/countries dropdown like facebook does?

See the screenshot here:
I'd like the user to just type a city or country name and the autocompleter will show suggested items.
How should I start for creating it?
Are there any API(s) or web services for me to call?
Where can I find the database of all cities/countries in the world?
I think this would be the best database for your situation, check it out:
http://www.geodatasource.com/cities-free.html
You first need an autocomplete plugin.
I recommend using the jQuery UI Autocomplete plugin.
The database could, for example, be this one, but try searching a bit for yourself too.
There was already a question on Stack Overflow about a database of cities of the world.
A simple text file with all cities, such as this one, may also work.
There are many such libraries, so you have to choose the right one for you.
My solution may not be the best, but it's a starting point:
Google a list of all countries (ISO standard) and paste it into a txt file. Then you can simply read that file with PHP and create a select menu from its contents.
It does not incorporate the cities, but maybe it helps you in some way.
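
Whatever list is used, the lookup behind an autocompleter is a prefix search. Here is a minimal sketch in Python, assuming the names are already loaded into a list; build_index and suggest are hypothetical helper names, and the five countries are sample data only.

```python
from bisect import bisect_left


def build_index(names):
    """Sort names case-insensitively, keeping the original spelling alongside."""
    return sorted((name.casefold(), name) for name in names)


def suggest(index, prefix, limit=5):
    """Return up to `limit` names that start with `prefix`, ignoring case."""
    key = prefix.casefold()
    # Binary search for the first entry whose folded name is >= the prefix.
    start = bisect_left(index, (key,))
    results = []
    for folded, original in index[start:]:
        if len(results) == limit or not folded.startswith(key):
            break
        results.append(original)
    return results


# Sample data; a real list would be read from the text file or database.
index = build_index(["France", "Finland", "Fiji", "Germany", "Ghana"])
print(suggest(index, "fi"))
print(suggest(index, "g"))
```

Because the index is sorted, all matches for a prefix sit next to each other, so each keystroke costs one binary search plus a short scan rather than a pass over the whole list.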
