I want to write an ImportXML function in a Google Spreadsheet to return the document name of the same spreadsheet. For example, my spreadsheet is titled "Kimchi". I want to return that name in cell "A1" to automate a series of functions within the spreadsheet based on the document name. I'm too lazy to type the value into the cell for each of the hundred or so spreadsheets I'll copy from the original template and rename.
I can't seem to nail a correct query structure.
This bit of XML looked promising, but I can't seem to get the query to pull it:
<span class="docs-title" id="docs-title" role="button"><div class="docs-title-inner" id="docs-title-inner">kimchi</div></span>
I've tried so far...
=ImportXML("SOME URL HERE", "//div[@class=’docs-title-inner’]/@content")
It returns...
Error: Imported Xml content can not be parsed.
I've tried all kinds of variations, some probably equally poorly formed. Following is some of the XML structure that looked juicy:
<html>
<head>
<title>kimchi - Google Sheets</title>
But this XPath query within the ImportXML function didn't work either:
=ImportXML("SOME URL HERE", "/html/head/title")
It returned...
Error: Import Internal Error.
I'm stumped.
Here's the spreadsheet with variations.
PS This ended up working after I shared the document with the world:
=ImportXml("THE URL", "//meta[@itemprop='name']/@content")
You don't have to do any of that.
Go to Tools -> Script editor -> Blank project.
Replace the contents of the edit window with the code below:
// Custom function: returns the name of the spreadsheet that contains the calling cell.
function BookName() {
  return SpreadsheetApp.getActiveSpreadsheet().getName();
}
Press Ctrl-S, put BookName in the name box, click OK, and wait for the yellow "saving" bar to disappear. Then close the tab with the code editor.
In your sheet you can now simply type =BookName() and the cell will display the workbook title.
Today, while experimenting with importXML in Google Sheets, I ran into a problem. I was attempting to import the title header of a USTA tournament page into the Google Sheet, but this did not work: the formula returned only the HTML title of the webpage ('TournamentHome'). Below are the Google Sheet formula and the website used:
Google Sheet and Function:
=importXML(F2, "//html//body[@id='thebody']//div[@id='content']//div[@id='pagetitle']")
Website and Section of Source Code Being Used
The title that I am trying to extract from the website is TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES.
The link to the website is https://m.tennislink.usta.com/tournamenthome?T=225779
update:
=REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
"https://m.tennislink.usta.com/tournamenthome?T=225779"), 555, 1),
"where Col1 contains 'escape'"), "\(""(.*)""\)")
Unfortunately, that won't be possible the way you are trying, because the field you want to scrape is populated by JavaScript, and Google Sheets cannot execute JavaScript when importing. You can test this by disabling JavaScript in your browser for the given link; what remains is exactly what can be imported into Google Sheets.
How about this sample formula? In this formula, the title value is directly retrieved from the script before the value is put to #pagetitle. Please think of this as just one of several answers.
Sample formula:
=REGEXEXTRACT(IMPORTXML(A1,"//div[@class='tournament_search']/script"),"escape\(""([\w\s\S]+)""")
Result:
When https://m.tennislink.usta.com/TournamentHome/tournament.aspx?T=38079 and https://m.tennislink.usta.com/tournamenthome?T=225779 are put in "A1" and "A2", the results are as follows.
Reference:
REGEXEXTRACT
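The extraction step in the sample formula can be sketched outside Sheets as well. The snippet below is only an illustration: `sampleScript` is a hypothetical stand-in for the inline script text (the real page's script may look different), and `extractEscapedTitle` is a name invented for this example. It uses `[^"]+` rather than the formula's greedy `[\w\s\S]+`, so the match stops at the first closing quote.

```javascript
// Hypothetical inline script text; the real page's script may differ.
const sampleScript =
  'document.getElementById("pagetitle").innerHTML = ' +
  'unescape(escape("TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES"));';

// Return the first double-quoted argument passed to escape(...), or null if none is found.
function extractEscapedTitle(scriptText) {
  const match = /escape\("([^"]+)"/.exec(scriptText);
  return match ? match[1] : null;
}

console.log(extractEscapedTitle(sampleScript));
```

Returning null when nothing matches makes a changed page layout easy to detect, instead of silently yielding an empty string.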
I am unable to make the filter formula below work. Notice that I am trying to refer to the sheet "Data" and filter its rows into another sheet. I get an error saying there were no matches.
=FILTER(Data!A3:Data!J, ARRAYFORMULA(REGEXMATCH(Data!J3:Data!J, ".*km.*")))
However, the above formula works when I insert it in the "Data" sheet. Notice that "Data!" is removed as it is in same sheet.
=FILTER(A3:J, ARRAYFORMULA(REGEXMATCH(J3:J, ".*km.*")))
I tried using the below sets with same results.
=FILTER(A3:J, REGEXMATCH(J3:J, ".*km.*"))
=FILTER(Data!A3:Data!J, REGEXMATCH(Data!J3:Data!J, ".*km.*"))
I have no problems filtering dates, numbers from "Data" sheet into my filtering sheet. It is only with "text contains" condition. Any help to resolve this is appreciated.
The formula was wrong. I was using the "Data!" twice. The below code works. Phew! Switching between languages (autohotkey, html, javascript, LibreOffice basic) seems to be taking a toll.
=FILTER(Data!A3:J, REGEXMATCH(Data!J3:J, ".*km.*"))
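For readers who think in code rather than formulas, the logic of the working formula can be sketched roughly as follows. This is an assumption-laden illustration, not the Sheets implementation: `dataRows` is made-up data shortened to three columns (so the text column is index 2 instead of column J), and `filterRows` is a hypothetical helper.

```javascript
// Hypothetical rows standing in for Data!A3:J (shortened to three columns).
const dataRows = [
  ["2020-01-01", 5, "ran 12 km"],
  ["2020-01-02", 7, "rest day"],
  ["2020-01-03", 3, "walked 4 km"],
];

// Keep the rows whose cell at colIndex matches the pattern anywhere in its text.
function filterRows(rows, colIndex, pattern) {
  const regex = new RegExp(pattern);
  return rows.filter((row) => regex.test(String(row[colIndex])));
}

const kmRows = filterRows(dataRows, 2, ".*km.*");
console.log(kmRows.length); // 2
```

In this sketch the pattern `km` alone would select the same rows, because `RegExp.prototype.test` matches anywhere in the string.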
The website is : https://www.futbin.com/18/player/2600/Ayhan/
I inspect the element and get the XPath, which is: //*[@id="ps-lowest-1"]
Then I use:
=IMPORTXML("https://www.futbin.com/18/player/2600/Ayhan/","//*[@id='ps-lowest-1']")
To get the data which should be 2000
But instead it only shows "-" on the sheet. There are no errors; it just doesn't show the data that I want. Is there any way to get the data that I need?
Thanks
The Sheets command importXML reads only the HTML source of the page without executing any JavaScript on it. As you can see yourself by using "view source" in the browser, the source indeed has "-" in that span:
<span class="price_big_right">
<span id="ps-lowest-1">-</span>
</span>
The actual numbers are loaded by some JavaScript file which then inserts them in that span. Neither importXML nor other Sheets functions can retrieve dynamically inserted data.
Sometimes, after inspecting the JS files, one can uncover the URL of source of data and try to import that; but this is a tedious reverse engineering exercise for each particular site.
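To make the point about static source concrete, here is a minimal sketch that reads the span from the HTML quoted above without running any scripts, which is roughly the position importXML is in. `textOfElementById` is a hypothetical helper that uses a naive regular expression instead of a real HTML parser, and it assumes the id contains no regex metacharacters.

```javascript
// The static source quoted in the answer above, before any JavaScript runs.
const staticSource = [
  '<span class="price_big_right">',
  '<span id="ps-lowest-1">-</span>',
  '</span>',
].join("\n");

// Return the text directly inside the first element with the given id, or null.
// Naive by design: fine for this snippet, not a general HTML parser.
function textOfElementById(html, id) {
  const pattern = new RegExp('id="' + id + '"[^>]*>([^<]*)<');
  const match = pattern.exec(html);
  return match ? match[1] : null;
}

console.log(textOfElementById(staticSource, "ps-lowest-1")); // "-"
```

The placeholder "-" is all that exists in the source, so any tool that only reads the source will return it.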
I encounter the error message "imported content is empty" when I use the formula below in Google Sheets.
=IMPORTXML("https://www.moh.gov.sg/content/moh_web/home/pressRoom.html", "//div[@class='article highlight']/h3/a/@title")
I am trying to import the list of press release titles on the webpage.
What am I doing wrong?
The problem in this particular case is not your formula or the XPath; it's that the content is loaded using jQuery, so you need to find the URL that actually serves your content.
I am trying to read mails programmatically in VB6, but I am unable to read mails containing inline images or HTML content such as hyperlinks. Can anyone suggest a way to read this type of mail?
EDIT:
I am not getting any error message, but
nsfDocument.GETITEMVALUE("Body")(0) returns only text.
Images are not shown.
You may want to try a third party API to help, such as the Midas Rich Text C++ API from Genii Software. http://www.geniisoft.com/showcase.nsf/MidasCPP
Or try the code examples shown on this site to gain access to the Notes Document in HTML form: http://searchdomino.techtarget.com/tip/0,289483,sid4_gci1284906,00.html
The GetItemValue method of the Document class returns rich-text item values as an array of strings, with all rich text styling removed. The "body" field in a Notes email is generally rich text. So, you should look into using the GetFirstItem method, instead. That will return a NotesRichTextItem object (for the body field). From that object, you can access the styling of the text, hyperlinks and file attachments, etc. (I do not believe that you can access in-line images at all via the "back-end" COM classes - I think for that, you will need to drop down to use the C API classes).
Here's a quick sample of how to get a NotesRichTextItem handle:
Dim doc As NotesDocument
Dim rtitem As Variant
' ... get the document
Set rtitem = doc.GetFirstItem("Body")
' Only a rich-text item exposes styling, hyperlinks, and attachments
If rtitem.Type = RICHTEXT Then
    ' ... work with rtitem
End If
Here is the doc page for the NotesRichTextItem class:
http://publib-b.boulder.ibm.com/lotus/c2359850.nsf/2e73cbb2141acefa85256b8700688cea/dc72d312572a75818525731b004a5294?OpenDocument
And here is a starting point for the C API docs:
http://www14.software.ibm.com/webapp/download/nochargesearch.jsp?k=ALL&S_TACT=104CBW71&status=Active&q=Lotus+%22C+API%22