Keep having errors with importxml + xpath

I spent hours trying to fix this but can't find where the issue is.
I'm trying to import data into a Google spreadsheet using importxml.
Here is the URL:
http://www.journaldesfemmes.com/maman/creches/3-pom/creche-3098
I'm interested in extracting the email and phone number, for example. I used the Chrome inspector and a few Chrome plugins to copy the XPath. I guess the issue is the XPath. Here is the formula I used in the spreadsheet:
=importxml("http://www.journaldesfemmes.com/maman/creches/3-pom/creche-3098";"/html/body/div[4]/div/div[1]/div[2]/div[1]/div/div/div/div/div[10]/table/tbody/tr[2]/td[2]")
Hope someone can help

Since the data you want is in tables, it might be easier to use importhtml.
The table you want you can get with this:
=IMPORTHTML("http://www.journaldesfemmes.com/maman/creches/3-pom/creche-3098","table",2)
To get just the phone number, wrap the call in INDEX and give the row and column of the cell within the table:
=index(IMPORTHTML("http://www.journaldesfemmes.com/maman/creches/3-pom/creche-3098","table",2),3,2)
The email is:
=index(IMPORTHTML("http://www.journaldesfemmes.com/maman/creches/3-pom/creche-3098","table",2),4,2)
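For readers who want to see what "table 2, row 3, column 2" means outside of Sheets, here is a minimal Python sketch of the same idea. The HTML in SAMPLE is hypothetical and only stands in for the real page; the class and function names are illustrative, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None


# Hypothetical markup: the second table holds the contact details.
SAMPLE = """
<table><tr><td>menu</td></tr></table>
<table>
<tr><th>Field</th><th>Value</th></tr>
<tr><td>Address</td><td>1 Example Street</td></tr>
<tr><td>Phone</td><td>01 23 45 67 89</td></tr>
<tr><td>Email</td><td>contact@example.com</td></tr>
</table>
"""


def index(table, row, col):
    """Mimic INDEX(range, row, col): both positions are 1-based."""
    return table[row - 1][col - 1]


parser = TableCollector()
parser.feed(SAMPLE)
second_table = parser.tables[1]      # like IMPORTHTML(url, "table", 2)
phone = index(second_table, 3, 2)    # row 3, column 2
email = index(second_table, 4, 2)    # row 4, column 2
```

The point of the sketch is only the 1-based addressing: the third argument of IMPORTHTML picks the table, and INDEX then picks one cell from it.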

Related

How to extract a text from a url while conditionally rendering the images using an Arrayformula in Google Sheets?

This formula below does extract the file ID and yields the correct image url. Now, how to fetch the correct url to extract that ID from by verifying a key using VLOOKUP() maybe?
I've seen that REGEXEXTRACT() requires JOIN() and could this be the reason why it doesn't work?
Current formula doesn't populate the rows, but only the one it's sitting in:
=arrayformula(iferror(image("https://drive.google.com/uc?export=view&id="&regexextract(VLOOKUP($F$3:$F$100,$I:$I,2,0),"d/(.+)/view")),""))
Here's a file for tests, if you feel like operating.
Thank you!
Your formula is working! Just replace $I:$I with $I:$J.
The VLOOKUP range has to include both the column to search and the column that holds the results.
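As a rough illustration of what the corrected formula does per row, here is a small Python sketch: look the key up, pull the file ID out of the URL with the same pattern, and build the image URL, falling back to an empty string the way IFERROR does. The LOOKUP contents and key names are made up for the example.

```python
import re

# Hypothetical stand-in for columns I:J (key -> Drive share URL).
LOOKUP = {
    "key-1": "https://drive.google.com/file/d/FILE_ID_ONE/view?usp=sharing",
    "key-2": "https://drive.google.com/file/d/FILE_ID_TWO/view",
}

# Same pattern as in the REGEXEXTRACT call.
FILE_ID = re.compile(r"d/(.+)/view")


def image_url(key):
    """Return the direct image URL for a key, or "" if anything fails."""
    url = LOOKUP.get(key)          # VLOOKUP(key, I:J, 2, 0)
    if url is None:
        return ""                  # IFERROR(..., "")
    match = FILE_ID.search(url)    # REGEXEXTRACT(url, "d/(.+)/view")
    if match is None:
        return ""
    return "https://drive.google.com/uc?export=view&id=" + match.group(1)


keys = ["key-1", "missing", "key-2"]     # stand-in for F3:F100
urls = [image_url(k) for k in keys]      # one result per row
```

With only column I in the range there is no second column to return, which is why every row came back as an error and IFERROR blanked it.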

Import table using IMPORTXML

I am trying to pull the table from https://rotogrinders.com/schedules/nfl into Google Sheets
I tried using ImportHTML("https://rotogrinders.com/schedules/nfl", "table", 1) but it just returns the header:
Time Team Opponent Line Moneyline Over/Under Projected Points Projected Points Change
Using ImportXML, I tried ImportXML("https://rotogrinders.com/schedules/nfl","//tr"), but it returns the same header and no data.
I don't think the tbody needs authentication to access. I logged out, cleared my cache, and even tried on another computer, and still no tbody.
I know it's a table called "tschedules", but I can't get the data.
Is there another part of the XPATH I am missing?
This is the XPATH from google scraper: "//table[1]/tbody/tr[td]"
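When you load this website, the actual table doesn't appear for a few seconds, which suggests the rows are filled in by a script after the page loads. That is why you're not seeing any data: IMPORTHTML and IMPORTXML read the HTML as it is first served and do not wait for scripts to run, so they only see the header. You would need a source that already contains the rows, or some way of fetching the page after the table has loaded.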

Google ImportXML to Scrape Prices off TIAA.org

A number of investments from TIAA.org are not traded on exchanges and not available via a ticker symbol thru say GoogleFinance etc. For one of these I would like to 'scrape' the daily price directly off of TIAA.org website and into a cell auto-magically.
In Google Sheets I thought it would be easy enough using ImportHTML as a table, but no luck. I've experimented with ImportXML but cannot figure out how to write the XPath query for the specific price I'm interested in. I keep ending up with an "N/A" cell ("Error: Imported content is empty").
Using this URL:
https://www.tiaa.org/public/investment-performance/tiaavariableannuity/profile?ticker=41091375
Could someone take a look and suggest how I might import the daily price (aka unit value) for QREARX into a Google sheet cell using an xpath with ImportXML or other method?
Thanks
If you change your URL prefix to https://www.tiaa.org/public/tcfpi/Investment/Portfolio?symbol= and add your ticker number at the end, as in your example (41091375),
then you can use this IMPORTXML formula to pull in the value you're looking for:
=IMPORTXML("https://www.tiaa.org/public/tcfpi/Investment/Portfolio?symbol=41091375","//*[@class='first']/text()")

Extract href in table with importxml in Google spreadsheet

I am trying to pull the href for each row of each table from this website:
http://www.epa.gov/region4/superfund/sites/sites.html#KY
I can pull the table information off using =IMPORTHTML(A1,"table",1) for all 7 tables, but I need the href to the site with the detailed information.
Using =IMPORTXML(A1,"//div[@class='box']") I can pull the information needed from a site like:
http://www.epa.gov/region4/superfund/sites/fedfacs/alarmyaplal.html
but I need to extract the fedfacs/alarmyaplal.html portion for each row on the original page.
I've tried using //@href, but it is not returning any results. I think it is because the data is structured in a table, but I'm stuck on where to go from here.
I'm not sure about any of the Google Spreadsheet functionality, but here's an XPath to select all href attributes of the Kentucky sites (since your first link included the 'KY' anchor):
//body//a[@id='ky']/following-sibling::table[1]/tbody/tr/td[1]/strong/a/@href
This is very specific to the Kentucky table: following-sibling::table[1] means the first table node after, and at the same level as, a[@id='ky'].

How do I return multiple columns of data using ImportXML in Google Spreadsheets?

I'm using ImportXML in a Google Spreadsheet to access the user_timeline method in the Twitter API. I'd like to extract the created_at and text fields from the response and create a two-column display of the results.
Currently I'm doing this by calling the API twice, with
=ImportXML("http://twitter.com/status/user_timeline/matthewsim.xml?count=200","/statuses/status/created_at")
in the cell at the top of one column, and
=ImportXML("http://twitter.com/status/user_timeline/matthewsim.xml?count=200","/statuses/status/text")
in another.
Is there a way for me to create this display with a single call?
ImportXML supports using the xpath | separator to include as many queries as you like.
=ImportXML("http://url"; "//@author | //@catalogid | //@publisherid")
However it does not expand the results into multiple columns. You get a single column of repeating triplets (or however many attributes you've selected) as shown below in column A.
The following is deprecated
2015.06.16: continue is not available in "the new Google Sheets" (see: The Google Documentation for continue).
However you don't need to use the automatically inserted CONTINUE() function to place your results.
=CONTINUE($A$2, (ROW()-ROW($A$2)+1)*$A$1-B$1, 1)
Placed in B2 that should cleanly fill down and right to give you sane column data.
ImportXML is in A2.
A3 and below are how the CONTINUE() functions are automatically filled in.
A1 is the number of attributes.
B1:D1 are the attribute index for their columns.
Another way to convert the rows of =CONTINUE() into columns is to use transpose():
=transpose(importxml("http://url","//a | //b | //c"))
Just concatenate your queries with "|"
=ImportXML("http://twitter.com/status/user_timeline/matthewsim.xml?count=200","/statuses/status/created_at | /statuses/status/text")
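Since the combined query comes back as one flat column of alternating values, the remaining work is regrouping it into rows. A minimal Python sketch of that regrouping follows; the sample values are invented, and to_rows is an illustrative helper, not part of any Sheets API.

```python
def to_rows(flat, width):
    """Split a flat list of repeating groups into rows of `width` items."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# Hypothetical result of the combined query: created_at, text, created_at, text, ...
flat = ["Mon Jan 01", "first tweet", "Tue Jan 02", "second tweet"]

rows = to_rows(flat, 2)
# Each inner list is one status: [created_at, text]
```

This assumes every status contributes exactly one value per query. If a node is missing for some status, the groups shift and the columns no longer line up.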
I posed this question to the Google Support Forum and this was a solution that worked for me:
=ArrayFormula(QUERY(QUERY(IFERROR(IF({1,1,0},IF({1,0,0},INT((ROW(A:A)-1)/2),MOD(ROW(A:A)-1,2)),IMPORTXML("http://example.com","//td/a | //td/a/@href"))),"select min(Col3) where Col3 <> '' group by Col1 pivot Col2",0),"offset 1",0))
Replace the contents of IMPORTXML with your data and query and see if that works for you.
Apparently, this attempts to invoke the IMPORTXML function only once. It's a solution for now, at least.
Here's the full thread.
This is the best solution (NOT MINE) posted in the comments below. To be honest, I'm not sure how it works. Perhaps @Pandora, the original poster, could provide an explanation.
=ArrayFormula(iferror(hlookup(1,{1;ARRAY},(row(A:A)+1)*2-transpose(sort(row(A1:A2)+0,1,0)))))
This is a very ugly solution and doesn't even explain how it works. At least I couldn't get it to work due to multiple errors, such as too many parameters for IF (because an array is used). A shorter solution can be found here: =ArrayFormula(iferror(hlookup(1,{1;ARRAY},(row(A:A)+1)*2-transpose(sort(row(A1:A2)+0,1,0))))) "ARRAY" can be replaced with the IMPORTXML function. This function can be used for as many XPaths as one wants. – Pandora Mar 7 '19 at 15:51
In particular, it would be good to know how to modify the formula to accommodate more columns.
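One way to read the HLOOKUP formula is as index arithmetic. {1;ARRAY} puts a 1 on top of the imported column, so position 2 is the first imported value. For sheet row r, (r+1)*2 minus {2,1} gives positions 2r and 2r+1, i.e. the r-th pair. The Python sketch below reproduces that arithmetic. The generalization to n columns (r*n + 2 minus {n,...,1}) is my own reading of the pattern and has not been checked in Sheets; note that simply swapping 2 for n in (r+1)*2 would skip the first values once n is larger than 2.

```python
def lookup_indices(row, width):
    """1-based positions in {1;ARRAY} that sheet row `row` pulls.

    For width 2 this equals (row+1)*2 - {2,1} from the formula.
    The general form for other widths is an assumption.
    """
    descending = range(width, 0, -1)  # like transpose(sort(row(A1:An),1,0))
    return [row * width + 2 - d for d in descending]


def reshape(array, width):
    """Apply the lookup row by row until the positions run past the data."""
    stacked = [1] + list(array)       # {1;ARRAY}
    rows = []
    row = 1
    while True:
        positions = lookup_indices(row, width)
        if positions[0] > len(stacked):
            break
        # Positions past the end become "", like IFERROR blanking a failed lookup.
        rows.append([stacked[p - 1] if p <= len(stacked) else "" for p in positions])
        row += 1
    return rows


pairs = reshape(["a", "b", "c", "d"], 2)
triples = reshape(["a", "b", "c", "d", "e", "f"], 3)
```

If that reading is right, a three-column version would replace (row(A:A)+1)*2 with row(A:A)*3+2 and A1:A2 with A1:A3, but treat that as a starting point to test rather than a confirmed formula.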

Resources