Google sheet XPath scraping: copy currency exchange rate - xpath

From this URL https://www.xe.com/currencyconverter/convert/?Amount=1&From=MYR&To=INR I want to copy the data into my Google Sheet.
In cell A1 I have https://www.xe.com/currencyconverter/convert/?Amount=1&From=MYR&To=INR
In cell A2 I have =IMPORTXML(A1,"//span[@class='converterresult-toAmount']")
The output I get is #N/A.
Can someone advise me how to do this?

Alternatively, you can use the GOOGLEFINANCE formula to fetch the FX rates from Google Finance directly:
=index(GOOGLEFINANCE("CURRENCY:MYRINR","price",today(),1,"DAILY"),2,2)
This formula returns the daily MYR-to-INR rate for today.
See the GOOGLEFINANCE documentation for the variations you can use to get more or different data.
I wrapped the GOOGLEFINANCE formula in an INDEX function so that it returns only the rate, which you can then use in a multiplication to convert arbitrary amounts. By default, GOOGLEFINANCE returns a table with dates and history.
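To see why INDEX(…, 2, 2) isolates the rate, here is a rough Python model of the small table that GOOGLEFINANCE typically returns for a single day: a header row ("Date", "Close") followed by one data row. The date and rate are made-up placeholders, not real MYR/INR data, and `sheets_index` and `convert` are hypothetical helpers that mimic INDEX's 1-based row/column lookup and the follow-up multiplication.

```python
# Rough model of the table GOOGLEFINANCE("CURRENCY:MYRINR","price",TODAY(),1,"DAILY")
# returns: one header row plus one data row. Both the date and the rate are
# placeholders, not real market data.
PLACEHOLDER_TABLE = [
    ["Date", "Close"],
    ["<some date>", 10.0],
]


def sheets_index(table, row, col):
    """Mimic INDEX(range, row, col): rows and columns are counted from 1."""
    return table[row - 1][col - 1]


def convert(amount, table):
    """Multiply an amount by the single rate cell at row 2, column 2."""
    return amount * sheets_index(table, 2, 2)
```

Row 1 holds only the headers, so row 2, column 2 is the first (and only) "Close" value. That is the number you can multiply by, just as the sheet formula does.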

Unfortunately, that won't be possible, because the site is rendered with JavaScript and Google Sheets can't understand or import JS. You can test this simply by disabling JavaScript for the link: you will see a blank page.
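The "disable JavaScript" test can also be approximated outside the browser by fetching the raw HTML (no scripts are executed) and checking whether the class or value you are targeting appears in it at all. This is a minimal sketch under the assumption that IMPORTXML sees roughly the same static source. Its fetcher may send different headers, so treat the result only as a hint. `fetch_static_html` and `marker_in_html` are hypothetical helper names, and the User-Agent header is an assumption.

```python
from urllib.request import Request, urlopen


def fetch_static_html(url: str, timeout: float = 10.0) -> str:
    """Download the page source as served, without running any JavaScript."""
    # Generic User-Agent; some sites reject requests without one (assumption).
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def marker_in_html(html: str, marker: str) -> bool:
    """True if the marker (for example a class name) appears in the raw source."""
    return marker in html
```

For example, `marker_in_html(fetch_static_html(url), "converterresult-toAmount")` returning False would be consistent with the value being injected by JavaScript, though it does not prove it.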

Related

Google Sheet Formula To Extract Domain From Different Format Site URLs

I have a Google spreadsheet with lists of URLs and a formula to extract the domains from them. But the issue is when a URL has multiple names in the domain. For example
I have attached the link to a sample doc and the two formulas that I tried. These two formulas work perfectly for certain formats but not in other cases. If there is a way to combine these two, or some way to recognize the URL format and choose the best formula to extract the domain, that would be good. I tried but couldn't achieve the desired output. The Google Sheet link is given below.
Sample google sheet
You can get by with just one function, REGEXEXTRACT.
First, we extract the hostname from the URL. To do that, we use the following formula:
=REGEXEXTRACT(A2:A,"(?:www\.)?([\w._\-]{6,})")
Now, we extract the domain from the hostname. You can do it like this:
=REGEXEXTRACT(...hostname... ,"[\w_\-]+\.\w{0,4}\.?\w{0,4}$")
And now we build everything into a single array formula:
=ARRAYFORMULA(if(A2:A<>"",REGEXEXTRACT(REGEXEXTRACT(A2:A,"(?:www\.)?([\w._\-]{6,})"),"[\w_\-]+\.\w{0,4}\.?\w{0,4}$"),))
I don't claim this is the best solution to your task; someone may well be able to suggest something simpler.
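The same two-step extraction can be sketched in Python to experiment with the patterns outside Sheets. This assumes Python's `re` behaves like Google Sheets' RE2 for these particular patterns (they use no backreferences or lookarounds). `extract_domain` is a hypothetical helper name, and the test URLs are arbitrary examples.

```python
import re
from typing import Optional

# The two patterns from the sheet formula, unchanged.
HOST_RE = re.compile(r"(?:www\.)?([\w._\-]{6,})")
DOMAIN_RE = re.compile(r"[\w_\-]+\.\w{0,4}\.?\w{0,4}$")


def extract_domain(url: str) -> Optional[str]:
    """Step 1: pull the hostname. Step 2: keep the domain at the end of it."""
    host = HOST_RE.search(url)
    if host is None:
        return None
    # REGEXEXTRACT returns the capture group when the pattern has one.
    hostname = host.group(1)
    domain = DOMAIN_RE.search(hostname)
    # Without a capture group, REGEXEXTRACT returns the whole match.
    return domain.group(0) if domain else None
```

Like the sheet formula, this is a heuristic: the `\w{0,4}` parts assume short suffixes such as `.com` or `.co.uk`, so longer TLDs may not be handled.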

Google Sheets ImportXML issues

I have a Google Sheet that I'm trying to automate as much as possible for my WoW raid group. What I'm trying to do here is parse some data from WoW's armory to automatically pull a person's item level.
I am having issues pulling from WoW's website directly (https://worldofwarcraft.com/en-us/character/us/sargeras/Beansy), but I can pull the item level from another site (https://raider.io/characters/us/sargeras/beansy). The only difference I can spot is that on the site I can pull from, the value is in a /div/span/b element with class="text-white", while on WoW the information is directly in a div with class="media-text".
WOW Formula =IMPORTXML(C32,"//*[@id='character-profile-mount']/div/div/div[2]/div/div[1]/div[1]/div/div[2]/div[1]/a[1]/div/div[2]")
Raider IO Formula =IMPORTXML(C31,"//*[@id='content']/div/div/div/div[2]/div[1]/div[1]/section/div/div[1]/div/span/b")
WOW Inspect Element <div class="Media-text">184 ilvl</div>
Raider IO Inspect Element <b class="text-white">184</b>
Above are the respective formulas and elements I've used. Raider IO's formula pulls properly and outputs 184. However, WoW's does not pull properly and outputs #N/A.
Does anyone have any ideas on why this might be happening?
Thanks in advance!
I think that https://worldofwarcraft.com/en-us/character/us/sargeras/Beansy builds the values with JavaScript. For example, when the HTML is retrieved from this URL without running JavaScript, Media-text cannot be found in it. On the other hand, https://raider.io/characters/us/sargeras/beansy includes the values in the HTML without JavaScript. I think this is the reason for the difference.
However, when I looked at the HTML of the former URL without JavaScript, I noticed that the value 184 is included in the metadata. So a sample formula that retrieves 184 from the metadata is as follows.
Sample formula:
=REGEXEXTRACT(IMPORTXML(A1,"//meta[@name='description']/@content"),"(\d+) ilvl")
In this formula, the URL https://worldofwarcraft.com/en-us/character/us/sargeras/Beansy is put in cell "A1".
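For testing the same idea against a saved copy of the page source, here is a Python sketch using only the standard library: it reads the `content` of `<meta name="description">` (what the XPath selects) and applies the same `(\d+) ilvl` regex. `MetaDescriptionParser` and `item_level_from_html` are hypothetical names, and the sample description text used in the tests is made up.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description" content="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.description is None:
            attr_map = dict(attrs)
            if attr_map.get("name") == "description":
                self.description = attr_map.get("content")


def item_level_from_html(html: str) -> Optional[str]:
    # Same steps as the sheet formula: meta description, then "(\d+) ilvl".
    parser = MetaDescriptionParser()
    parser.feed(html)
    if parser.description is None:
        return None
    match = re.search(r"(\d+) ilvl", parser.description)
    return match.group(1) if match else None
```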
Also, as an additional modification, regarding your =IMPORTXML(URL,"//*[@id='content']/div/div/div/div[2]/div[1]/div[1]/section/div/div[1]/div/span/b"): in this case, the XPath can be simplified a little as follows.
Modified formula:
=IMPORTXML(A1,"//span[contains(text(),'Item Level')]/b[@class='text-white']")
In this formula, the URL https://raider.io/characters/us/sargeras/beansy is put in cell "A1".
References:
IMPORTXML
REGEXEXTRACT

Incorrect URL/XPath when using Google sheets IMPORTXML

I'm trying to import a search result from Google into my spreadsheet. I've had success with Wikipedia pages, but for some reason Google Search isn't working (it gives a "could not fetch url" error). I'm sure the problem is somewhere in my URL or XPath, but I've tried a variety of things and I'm lost. Here is what I've got:
=IMPORTXML("https://www.google.com/search?q=dom+fera+easy+thing+released", "//div[@class='Z0LcW XcVN5d']")
I'm linking the spreadsheet below as view-only for reference as well. Ultimately the goal is to be able to webscrape release years of songs. I'd appreciate any help!
https://docs.google.com/spreadsheets/d/1bt8MJ23nfGAv6ianaR-sd7DM5DNn98p7zWSG1UzBlEY/edit?usp=sharing
AFAIK, you can't parse results from Google Search in Google Sheets.
Using Discogs, MusicBrainz, All Music, etc. to get the release dates could be useful.
But some of your bands seem to be little-known, so you can use YouTube to fetch the dates instead.
Note: we assume the year of publication on YouTube corresponds to the year of release.
Of course, that's not 100% true. For example, artists can release a video months after the song comes out, or publish nothing on YouTube at all.
So this method will work for a wide range of songs, but not ALL of them. With recent bands and songs, it should be OK.
To do this you can use the YouTube API or IMPORTXML formulas. In both cases, we always take the first result (by relevance order) of the search engine as the source.
You need an API key and an ImportJSON script (credits to Brad Jasper) to use the API method. Once you have installed the script and activated your API key, you can paste in cell B3:
="https://www.googleapis.com/youtube/v3/search?key={yourAPIKey}&part=snippet&type=video&filter=items&regionCode=FR&q="&ENCODEURL(A3)
We generate the url to query with the content you input in column A.
We use "regionCode=FR" since some songs are not available in the US ("i need you FMLYBND"). That way we get the correct release date.
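For checking what URL the B3 formula produces, here is a Python sketch that builds the same query string. `build_search_url` is a hypothetical helper, and `quote(..., safe="")` is used on the assumption that it encodes the search text roughly the way ENCODEURL does (spaces become `%20`). The query parameters are copied from the formula above.

```python
from urllib.parse import quote


def build_search_url(api_key: str, query: str, region: str = "FR") -> str:
    """Rebuild the B3 formula's YouTube Data API search URL."""
    # Only the search text is percent-encoded, as with ENCODEURL(A3) in the sheet.
    encoded_query = quote(query, safe="")
    return (
        "https://www.googleapis.com/youtube/v3/search"
        f"?key={api_key}&part=snippet&type=video&filter=items"
        f"&regionCode={region}&q={encoded_query}"
    )
```

For example, `build_search_url("YOUR_API_KEY", "i need you FMLYBND")` ends with `&q=i%20need%20you%20FMLYBND`.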
In C3, you can paste:
=LEFT(QUERY(ImportJSON(B3);"SELECT Col11 LIMIT 1 label Col11''";1);4)
We parse the JSON, select the column of interest, the line of interest, then we clean the result.
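The C3 formula selects "Col11" by position from ImportJSON's flattened output. The Python sketch below reads the field by name instead, on the assumption that Col11 corresponds to the first result's `snippet.publishedAt` in the YouTube search response. `release_year` is a hypothetical helper, and the sample JSON in the tests is made up.

```python
import json
from typing import Optional


def release_year(response_text: str) -> Optional[str]:
    """Take the first search result's publish date and keep the 4-digit year."""
    data = json.loads(response_text)
    items = data.get("items") or []
    if not items:
        return None
    published = items[0].get("snippet", {}).get("publishedAt", "")
    # Same idea as LEFT(..., 4) in the sheet: keep only the year prefix.
    return published[:4] or None
```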
With the IMPORTXML method, you can paste in E3:
="https://www.youtube.com"&IMPORTXML("https://www.youtube.com/results?search_query="&A3;"(//div[@class='yt-lockup-thumbnail contains-addto'])[3]/a/@href")
We construct the URL from the first search result of the search engine.
In F3, you can paste:
=LEFT(IMPORTXML(E3;"//meta[@itemprop='datePublished']/@content");4)
We parse the previously built URL, then we extract the year of publication.
As you can see, there's a difference in the results on line 5. That's because the song is not available in the US, so the first result returned by the IMPORTXML method differs from the one returned by the API method, which uses the "FR" region flag.
Side note: I'm based in Europe, so my formulas use ";" as the argument separator. If your locale uses ",", replace ";" with "," in the formulas.
Google does not support scraping Google Search results into Google Sheets. This option was disabled 2 years ago, so you will need to use an alternative search engine.

Google Sheets IMPORTXML Text Field from Website

I am trying to dynamically pull in car values for cars matching specific criteria on Kelley Blue Book. I have this IMPORTXML query that has a link to the specific page that shows the trade-in value of the car.
=IMPORTXML("https://www.kbb.com/Api/3.9.462.0/71553/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=trade-in-sell&pricetype=FPP&zipcode=12345&vehicleid=411852&selectedoptions=6762567|true|6762674|false|6762900|false|6762905|false|6762909|false|6762913|false|6762915|true|6762926|false|6762928|false&hideMonthlyPayment=False&condition=verygood&mileage=40000", "//text[@y='-8']")
In this URL, there is a text element with a y coordinate of -8. I was hoping that would be enough to identify the data I want to pull in (the trade-in value). I get the standard "Can't fetch URL" error and can't figure out why.
The issue is not with your XPath "//text[@y='-8']" but with the website itself.
Basically, you have two ways to test whether a website can be scraped:
=IMPORTXML("URL", "//*")
where the XPath //* selects every element, i.e. everything that's possible to scrape,
and the direct source-code scrape method:
=IMPORTDATA("URL")
Sometimes the source code is so large that Google Sheets can't handle it, so the output needs to be restricted a bit, like this:
=ARRAY_CONSTRAIN(IMPORTDATA("URL"), 10000, 10)
Anyway, none of these methods can scrape anything from your URL.
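What ARRAY_CONSTRAIN(IMPORTDATA(…), 10000, 10) does to the imported source can be modelled roughly in Python: split the text into comma-separated rows, then keep at most N rows and M columns. This is only an approximation. `import_data_constrained` is a hypothetical helper, and IMPORTDATA's exact splitting and quoting rules are not reproduced.

```python
import csv
import io
from typing import List


def import_data_constrained(text: str, max_rows: int, max_cols: int) -> List[List[str]]:
    """Split raw text into comma-separated rows, then cap the grid at max_rows x max_cols."""
    rows = list(csv.reader(io.StringIO(text)))
    return [row[:max_cols] for row in rows[:max_rows]]
```

Capping the grid like this keeps a huge page source from producing more cells than the sheet can handle, which is the reason for wrapping IMPORTDATA.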

Importxml google spreadsheet parsing formula error

I tried to use this formula
=ImportXML("http://www.google.com/search?q=philadelphia seo company&num=100", "//h3[@class='r']/a/@href")
from http://www.seerinteractive.com/blog/importxml-cookbook/
and I get a formula error. Do you need to enable something in Google Spreadsheets before using this formula?
You need to encode the search query in "q=philadelphia seo company", meaning all the spaces should be converted to "%20".
The end result should look like this:
=ImportXML("http://www.google.com/search?q=philadelphia%20seo%20company&num=100", "//h3[@class='r']/a/@href")
Also, I use IMPORTXML all the time, and with Google search results you can also use "//cite", depending on how much of the URL you want.
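The %20 encoding described in this answer can be produced programmatically rather than by hand. In this Python sketch, `quote_via=quote` makes `urlencode` emit `%20` for spaces (its default, `quote_plus`, would emit `+`). `google_search_url` is a hypothetical helper, and it only illustrates the encoding, not whether the results can still be imported.

```python
from urllib.parse import quote, urlencode


def google_search_url(query: str, num: int = 100) -> str:
    """Build the search URL with spaces encoded as %20 instead of left raw."""
    params = urlencode({"q": query, "num": num}, quote_via=quote)
    return "http://www.google.com/search?" + params
```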
