ImportXML XPath URL in Google Sheets - xpath

I just started using Google Sheets and I wonder whether it is possible to fetch a link using IMPORTXML with an XPath.
https://prnt.sc/orrxpv
In this screenshot you can see the clickable link under the main tournament sub-menu, which changes every week. Currently I copy and paste it manually every week; you can find it in cell A16 of the sheet linked below.
https://prnt.sc/orrx4y
Here is the XPath that I am trying to use (I am a total noob at this). I have been trying to build it into a formula in cell A14 but have failed miserably.
I also wonder whether it is possible to use =CONCATENATE to insert something in the middle of the link,
for example:
https://www.tennisexplorer.com/cincinnati/2019/atp-men/
https://www.tennisexplorer.com/cincinnati/2018/atp-men/
https://www.tennisexplorer.com/cincinnati/2017/atp-men/
I want to change the year in the middle of the link but cannot figure out how :(
Link to the sheet, with edit permission :)
https://docs.google.com/spreadsheets/d/16Y6q2tw26c-nbmqIrXiQ_ZhuDc48oAzkXvOl78kVy00/edit?usp=sharing

To make the year in the link selectable (with the year in cell B1), use:
="https://www.tennisexplorer.com/cincinnati/"&B1&"/atp-men/"
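For anyone who wants to see the same string assembly outside Sheets, here is a minimal Python sketch. The function name build_url and its slug parameter are made up for illustration; only the URL pattern comes from the example links in the question.

```python
BASE = "https://www.tennisexplorer.com"


def build_url(year, slug="cincinnati"):
    # Mirrors ="https://www.tennisexplorer.com/cincinnati/"&B1&"/atp-men/"
    # where B1 holds the year; slug is a hypothetical extra parameter.
    return f"{BASE}/{slug}/{year}/atp-men/"


if __name__ == "__main__":
    for year in (2017, 2018, 2019):
        print(build_url(year))
```

Keeping the fixed parts of the link separate from the year is the same idea as the & concatenation in the formula.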
To get the tournament name (cincinnati), you can do:
=ARRAYFORMULA(QUERY(TO_TEXT(IMPORTXML(
"https://www.tennisexplorer.com/cincinnati/2019/atp-men/",
"//td[@class='t-name']")), "select Col2 where Col1 is null limit 1", 0))
and then the whole formula would be:
="https://www.tennisexplorer.com/"&
ARRAYFORMULA(QUERY(TO_TEXT(IMPORTXML(
"https://www.tennisexplorer.com/live/",
"//td[@class='t-name']")), "select Col2 where Col1 is null limit 1", 0))&"/"&B1&"/atp-men/"

Related

Using Google Sheets for web scraping. Need the correct xpath for IMPORTXML function

I have a Google Sheet containing a list of MPNs (manufacturer part numbers). I am trying to scrape a site called wikiarms for the UPC codes when I have the MPN for an item.
I have the correct formula for doing this on another site.
=IMPORTXML("http://gun.deals/search/apachesolr_search/"&B1,"//dd/a[../../dt[contains(text(),'UPC')]]|//dd/span[../../dt[contains(text(),'UPC')]]")
I am trying to figure out the correct XPath to complete this formula. Some videos I have watched said to open the page in Chrome and use the inspector to select and copy the XPath to complete the IMPORTXML function. I tried this with no luck.
Sample
Visit https://www.wikiarms.com/guns?q=20071
In the table there is a button labelled "available in 6 stores"; click it to reveal the list. The UPC should be listed after the MPN.
If I copy the xpath in Chrome this is the result
/html/body/div[1]/div/div/div[2]/div/div/div[2]/div[2]/table/tbody/tr[2]/td[5]
=IMPORTXML("https://www.wikiarms.com/guns?q="&B2,"xpath here")
What do I have to add at the end of this formula to pull in the UPC code? I will be using this formula to pull in UPC code for about 1000 items.
Thank you for your help.
Using your sample link, try
=IMPORTXML("https://www.wikiarms.com/guns?q=20071","//td[@class='upc']/a/@title")
and see if it works for you.
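To make it clearer what that expression selects (the title attribute of the link inside the td whose class is upc), here is a rough Python sketch using only the standard library. The markup in SAMPLE is a made-up, simplified fragment, not the real wikiarms page, and the UPC value is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE = """<table>
  <tr>
    <td class="mpn">20071</td>
    <td class="upc"><a href="#" title="PLACEHOLDER-UPC">UPC</a></td>
  </tr>
</table>"""


def upc_titles(markup):
    root = ET.fromstring(markup)
    # ElementTree understands the [@class='upc'] predicate but not a
    # trailing /@title step, so the attribute is read with .get().
    return [a.get("title") for a in root.findall(".//td[@class='upc']/a")]


if __name__ == "__main__":
    print(upc_titles(SAMPLE))
```

The point is that the selection depends on the class name rather than on the position of the cell, which is why it tends to be more robust than the long path copied from Chrome.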

How Do I Import Data Into My Google Sheet from a Website Using importXML

Today when experimenting with using importXML in Google Sheets, I ran into a problem. I was attempting to import the title header of a USTA Tournament page into the Google Sheet, however, this did not work as it just resulted in the HTML title of the webpage being displayed ('TournamentHome'). Below is the Google Sheet, and the website that is used:
Google Sheet and Function:
=importXML(F2, "//html//body[@id='thebody']//div[@id='content']//div[@id='pagetitle']")
Website and Section of Source Code Being Used
The title that I am trying to extract from the website is TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES.
The link to the website is https://m.tennislink.usta.com/tournamenthome?T=225779
update:
=REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
"https://m.tennislink.usta.com/tournamenthome?T=225779"), 555, 1),
"where Col1 contains 'escape'"), "\(""(.*)""\)")
Unfortunately, that won't be possible the way you are trying, because the field you are attempting to scrape is controlled by JavaScript, and Google Sheets can't understand or import JS. You can test this by disabling JS for the given link; you will then see exactly what can be imported into Google Sheets.
How about this sample formula? In this formula, the title value is directly retrieved from the script before the value is put to #pagetitle. Please think of this as just one of several answers.
Sample formula:
=REGEXEXTRACT(IMPORTXML(A1,"//div[@class='tournament_search']/script"),"escape\(""([\w\s\S]+)""")
Result:
When https://m.tennislink.usta.com/TournamentHome/tournament.aspx?T=38079 and https://m.tennislink.usta.com/tournamenthome?T=225779 are put in "A1" and "A2", the results are as follows.
Reference:
REGEXEXTRACT
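As a rough illustration of what the regular expression in the sample formula does, here is the same pattern in Python. The SCRIPT_TEXT string is an assumed, simplified stand-in for the script content; the real script on the page may look different.

```python
import re

# Assumed, simplified script text; not copied from the real page.
SCRIPT_TEXT = 'var title = escape("TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES");'

# Same pattern as in the formula, without the doubled quotes Sheets needs.
PATTERN = re.compile(r'escape\("([\w\s\S]+)"')


def extract_title(script_text):
    match = PATTERN.search(script_text)
    # group(1) is the text between escape(" and the last double quote.
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_title(SCRIPT_TEXT))
```

Note that the pattern is greedy, so it runs to the last double quote in the text; if the script contained further quoted strings after the title, a non-greedy group such as (.+?) might be the safer choice.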

xpath query to a specific number on a page

I couldn't manage to write the correct XPath expression to import a number from the following page into a Google Sheets cell. I want to use the IMPORTXML function of Google Sheets.
Any help is highly appreciated.
Source page: https://www.xe.com/currencycharts/?from=USD&to=AED
The number I want to insert into a cell is shown below
Tried
//div[@id='rates_detail_desc']/strong[2]

Import data from ebay to google spreadsheet using IMPORTXML

I'm trying to import a table from "https://www.ebay.com/itm/100-NEW-ALTERNATOR-VW-GOLF-GTI-GL-GLS-1-8T-1-8-2L-99-06-90A-1-YR-WARRANTY-13852/301364941754?fits=Model%3AJetta&hash=item462ac013ba:g:v7oAAOSw~YRagU4N&vxp=mtr" into a Google Spreadsheet using the =IMPORTXML function. The formula I was using is as below:
A1 = https://www.ebay.com/itm/100-NEW-ALTERNATOR-VW-GOLF-GTI-GL-GLS-1-8T-1-8-2L-99-06-90A-1-YR-WARRANTY-13852/301364941754?fits=Model%3AJetta&hash=item462ac013ba:g:v7oAAOSw~YRagU4N&vxp=mtr
A2 = //*[@id="w1-20ctbl"]
A3 = =IMPORTXML(A1,A2)
But it returns nothing and says "Imported content is empty."
Can somebody help me? I'm new to Google Sheets scripting and I'd be really grateful for any help.
Waiting to hear from somebody...
Thanks
You cannot access the pictured table using IMPORTXML or any built-in Google Sheets formula because the table is generated when a user visits the website.
If you look at the page source, you'll see that the table does not exist. IMPORTXML looks at this page source, which is the content before JavaScript rendering by the browser. When you "inspect" an element in your browser, it's inspecting the content after the JavaScript has been rendered.
Unfortunately, there is not a simple way to get the data you're looking for. You'll have to find or build your own scraping tool. Be careful not to violate eBay terms of use or any local laws.
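A small Python sketch of the difference between page source and rendered content, under stated assumptions: STATIC_SOURCE is a made-up, heavily simplified stand-in for what a server might send, where the table is only created later by a script in the browser. It is not the actual eBay markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source: the table itself is absent, only a script
# that would build it in the browser is present.
STATIC_SOURCE = """<html><body>
<div id="content"></div>
<script>/* builds the table with id "w1-20ctbl" in the browser */</script>
</body></html>"""


def has_element_with_id(markup, element_id):
    root = ET.fromstring(markup)
    # Looks for any element carrying the given id attribute.
    return root.find(f".//*[@id='{element_id}']") is not None


if __name__ == "__main__":
    print(has_element_with_id(STATIC_SOURCE, "content"))
    print(has_element_with_id(STATIC_SOURCE, "w1-20ctbl"))
```

A parser that only sees the source finds the static div but not the table, which matches the "Imported content is empty" message.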

Google Spreadsheet xpath scraping

So I'm not a professional programmer, but I'm trying to scrape data off the Reuters homepage and import it into google spreadsheets.
I know that questions about scraping from Reuters have already been answered; however, those didn't help me.
I want data from this page: http://www.reuters.com/finance/stocks/financialHighlights?symbol=9983.T
specifically, if you scroll down, there's a lot of data on the company's financials, packed into tables. I need specific values out of the tables.
So naturally my question to you is, how can I get specific values out of the tables? For instance, I want the first value out of the line that's labelled "Net Profit Margin (TTM)". The value should be 7.30.
So I got the XPath by using the Google Chrome developer tools: right-click on the element and select "copy xpath". Since I'm not a programmer I don't know any other way of arriving at a specific element in the tables.
I tried the following function in google spreadsheets:
=IMPORTXML(URL as written above,"//*[@id='content']/div[2]/div/div[2]/div[1]/div[13]/div[2]/table/tbody/tr[14]/td[2]")
but it returns
"#N/A - Error, imported content is empty"
What can I do to get the value?
The IMPORTXML() function of Google Sheets is known to be incredibly buggy and it is not surprising if people dig up real errors in it. Still, we don't know exactly why your original XPath expression does not work.
I want the first value out of the line that's labelled "Net Profit Margin (TTM)". The value should be 7.30.
The path expression you got from the developer tools heavily relies on positioning, and not at all on actual values.
If you can rely on the text content of the first cell in this row, use
=IMPORTXML("http://www.reuters.com/finance/stocks/financialHighlights?symbol=9983.T","//tr[contains(td[1],'Net Profit Margin (TTM)')]/td[2]")
which means
Select all tr elements where the text content of the first td child element contains "Net Profit Margin (TTM)" and select the second td of that tr.
and the result will be
7.3
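The idea of selecting a row by its label rather than by its position can be sketched in plain Python as well. The table in SAMPLE is invented for illustration (only the label and the 7.30 value come from the question), and the loop returns the first matching row, whereas the XPath selects every matching row.

```python
import xml.etree.ElementTree as ET

# Hypothetical table; the other row and its values are placeholders.
SAMPLE = """<table>
<tr><td>Some Other Ratio (TTM)</td><td>0.00</td><td>0.00</td></tr>
<tr><td>Net Profit Margin (TTM)</td><td>7.30</td><td>0.00</td></tr>
</table>"""


def value_for_label(markup, label):
    root = ET.fromstring(markup)
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Same test as contains(td[1], label): the label must occur in
        # the text of the first cell; the second cell is then returned.
        if len(cells) >= 2 and label in "".join(cells[0].itertext()):
            return cells[1].text
    return None


if __name__ == "__main__":
    print(value_for_label(SAMPLE, "Net Profit Margin (TTM)"))
```

If the page adds or removes rows, the label-based lookup keeps working, while a positional path such as tr[14]/td[2] would silently point at a different row.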

Resources