Import data to Google sheet from a live website - xpath

I want to import village data from a state government website (Madhya Pradesh, India): http://saara.mp.gov.in/saaraweb/PublicReport/frm_PerWiseGirdavariStatus.aspx
This page shows a report for all the districts of Madhya Pradesh. My target district is "37-सिवनी". The district name is in the first column, and its row position changes with the percentage of work completed by each district.
When I click "37-सिवनी", JavaScript displays all 8 tahsils (towns) of that district. My target town is "06-घंसाैर"; its position also changes with the percentage of work completed by each town. When I click "06-घंसाैर", the list of its villages is shown with the data I want.
I want this data imported into my Google Sheet automatically at a specific time. I tried IMPORTHTML and IMPORTXML, but neither worked because the data is not directly available at the URL above.
Does anyone know how to import data that only appears after clicking a JavaScript element? It would be very helpful for me.
https://docs.google.com/spreadsheets/d/1FfjDh5-z0EIZ5GBOrKElgi3dkRtJX_4BW3imTLUu7ds/edit?usp=sharing
In the "want" sheet you can see the kind of data I want to import; the "got" sheet shows what I got when I tried the IMPORTHTML function.

Google Sheets does not support scraping content generated by JavaScript, so the best you can get is:
=IMPORTHTML("http://saara.mp.gov.in/saaraweb/PublicReport/frm_PerWiseGirdavariStatus.aspx";
"table"; 1)
You can easily test what can be imported by disabling JavaScript for the site: whatever is still displayed is what the import formulas can read.
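The "disable JavaScript" test can also be approximated in code. The following is a minimal Python sketch, not a description of how IMPORTHTML works internally: it counts the `<table>` tags present in raw, unrendered HTML. The names `TableCounter`, `count_static_tables`, and the `SAMPLE_PAGE` markup are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_static_tables(html: str) -> int:
    """Return how many tables exist before any JavaScript has run."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical page: one table is in the markup, the other is built by a script.
SAMPLE_PAGE = """
<html><body>
<table id="districts"><tr><td>37</td></tr></table>
<div id="villages"></div>
<script>
  document.getElementById("villages").innerHTML =
      "<table><tr><td>v1</td></tr></table>";
</script>
</body></html>
"""

print(count_static_tables(SAMPLE_PAGE))  # prints 1
```

Only the table written directly in the markup is counted; the one assembled inside the script is plain text until a browser executes it, which is why an import formula cannot see it.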

Related

Extracting the exhibitor list (company name) using importxml in Google spreadsheet

Website
https://www.sialparis.com/Exhibitors/Catalogue-SIAL-Paris/exhibitors
I used this formula, but it is not working:
=IMPORTXML("https://www.sialparis.com/Exhibitors/Catalogue-SIAL-Paris/exhibitors","//div/ul/li/a/div/h3")
XPath - //*[@id="catalog-v2"]/section[2]/div/div[2]/div/div/ul/li[1]/a/div[2]/h3
Note: If you visit the website above, you will see the exhibitor list. I just want to extract the company names into Google Sheets with IMPORTXML.
I have attached an image for a better understanding.
Please help me extract the exhibitor list.
The import formulas of Google Sheets do not support JavaScript-generated elements. You can check this by temporarily disabling JavaScript for the site; only what remains can be imported.

Google Sheet & App Script : how to get image from a link preview?

I am trying to build an RSS reader like Feedly with Google Sheets and display it through Glide as an app on my mobile phone.
IMPORTFEED() works fine for titles, descriptions, and URLs.
But this function doesn't seem to return pictures, even when they are in the feed (which is not always the case).
So I am looking for a way to extract the main image of a blog post: the one displayed when you hover over a link in a Google Sheets cell.
I would like to get the URL of the image shown in the link preview and put it in another cell.
I tried IMAGE(), and also IMPORTXML for feeds that contain an image (but not all of them do, so I stopped).
Is it possible in Google Sheets to get the main image shown in the link preview?
For instance, one of the blogs I want to extract the main article picture from is Creajv (URL: https://creajv.com/ ; Feed: https://creajv.com/feed/).
So the IMPORTFEED() formula I used in Google Sheets was:
=IMPORTFEED("https://creajv.com/feed/";"items";FALSE;3)
Which stands for:
=IMPORTFEED(...): the function that imports a feed from a URL
"items": the query that returns every item in the feed (other parameters are possible; they are listed in the Google Sheets function documentation)
FALSE: because I don't want the headers to be included
3: because I want only the latest 3 results displayed.
And it displays perfectly: author, description, URL, date.
But after a little digging on Google, I found that IMPORTFEED() basically cannot get images from feeds, even when the blog's author adds them (the author has to add a feature for that).
So I am now looking for another way, other than IMPORTFEED(), to reliably get the main image of a blog post.
I noticed that Google Sheets pulls it instantly when I paste the URL of a blog article into a cell, for instance for Creajv:
Print screen of the image I get when I click in the cell which contains the post URL
So my idea is to pull the author, date, description, etc. with IMPORTFEED (which works perfectly every time) and use a formula on the cell containing the URL to put the URL of the link-preview picture in another cell.
Two other possibilities might involve Google Apps Script:
creating a custom function with Apps Script
or creating a script that puts the image in a cell every time a new row is added via the IMPORTFEED() function.
Functions only, as Apps Script doesn't run on mobile Apps
How about this solution? I checked the website and inspected the image from the thumbnail.
Luckily, the structure is simple:
<div class="article-image">
<img src="https://creajv.com/wp-content/uploads/2020/11/HighresScreenshot00000.png" alt="Concours de Level Design avec Unreal Engine, du 11/11 au 05/12/2020">
</div>
You can get the URL with IMPORTXML and apply IMAGE to it:
=IMAGE(IMPORTXML("https://creajv.com/2020/11/08/concours-de-level-design-avec-unreal-engine/", "//div[@class='article-image']//img/@src"))
Since you are already retrieving the post URL with your previous formula, replace the source URL with the corresponding cell:
=IMAGE(IMPORTXML(C1, "//div[@class='article-image']//img/@src"))
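To make the selection logic of that XPath concrete, here is a minimal Python sketch of the same idea using only the standard library: collect the `src` of every `<img>` nested inside a `<div>` whose class is exactly `article-image`. The names `ArticleImageFinder` and `find_article_images` are hypothetical, and the `example.com` logo in the sample markup is an assumption added to show that images outside the target div are skipped.

```python
from html.parser import HTMLParser


class ArticleImageFinder(HTMLParser):
    """Collects src of <img> tags nested inside <div class="article-image">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the target div (counts nested divs)
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Exact class match, mirroring [@class='article-image'] in the XPath.
            if self.depth or attrs.get("class") == "article-image":
                self.depth += 1
        elif tag == "img" and self.depth and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def find_article_images(html: str) -> list:
    finder = ArticleImageFinder()
    finder.feed(html)
    finder.close()
    return finder.sources


# The div is taken from the answer above; the logo line is a made-up extra.
SAMPLE_POST = """
<img src="https://example.com/logo.png" alt="logo">
<div class="article-image">
<img src="https://creajv.com/wp-content/uploads/2020/11/HighresScreenshot00000.png" alt="Concours de Level Design avec Unreal Engine, du 11/11 au 05/12/2020">
</div>
"""

print(find_article_images(SAMPLE_POST))
```

This only works because the image tag is present in the static HTML of the post; if the page built it with JavaScript, neither this sketch nor IMPORTXML would find it.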

Data table not importing from morningstar

I am trying to take data from https://financials.morningstar.com/ratios/r.html?t=0P0000032S&culture=en&platform=sal and use the values in the table in a Google sheet. This is table1 when I inspect the element but when I use:
=IMPORTHTML("https://financials.morningstar.com/ratios/r.html?t=0P0000032S&culture=en&platform=sal","table",1)
in Google Sheets, it says the imported content is empty. Any help on how to import this data?
I've tried IMPORTHTML with the table numbers I found when inspecting the page.
Unfortunately, that won't be possible because the site's content is generated by JavaScript, and Google Sheets can't execute or import JavaScript. You can test this by disabling JavaScript for the link; you will see a blank page.

How Do I Import Data Into My Google Sheet from a Website Using importXML

Today when experimenting with using importXML in Google Sheets, I ran into a problem. I was attempting to import the title header of a USTA Tournament page into the Google Sheet, however, this did not work as it just resulted in the HTML title of the webpage being displayed ('TournamentHome'). Below is the Google Sheet, and the website that is used:
Google Sheet and Function:
=importXML(F2, "//html//body[@id='thebody']//div[@id='content']//div[@id='pagetitle']")
Website and Section of Source Code Being Used
The title that I am trying to extract from the website is TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES.
The link to the website is https://m.tennislink.usta.com/tournamenthome?T=225779
update:
=REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
"https://m.tennislink.usta.com/tournamenthome?T=225779"), 555, 1),
"where Col1 contains 'escape'"), "\(""(.*)""\)")
Unfortunately, that won't be possible the way you are trying, because the field you are attempting to scrape is populated by JavaScript, and Google Sheets can't execute or import JavaScript. You can test this by disabling JavaScript for the link; you will then see exactly what can be imported into Google Sheets.
How about this sample formula? In this formula, the title value is directly retrieved from the script before the value is put to #pagetitle. Please think of this as just one of several answers.
Sample formula:
=REGEXEXTRACT(IMPORTXML(A1,"//div[@class='tournament_search']/script"),"escape\(""([\w\s\S]+)""")
Result:
When https://m.tennislink.usta.com/TournamentHome/tournament.aspx?T=38079 and https://m.tennislink.usta.com/tournamenthome?T=225779 are put in "A1" and "A2", the results are as follows.
Reference:
REGEXEXTRACT
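The extraction step in the formulas above can be sketched in Python as follows. This is an illustration under assumptions: the `SAMPLE_SCRIPT` line is a hypothetical stand-in for the page's inline script (only the `escape("...")` call is implied by the formulas), and the pattern uses `[^"]+`, which is slightly stricter than the `[\w\s\S]+` in the sample formula because it stops at the first closing quote.

```python
import re
from typing import Optional

# Capture the quoted argument of escape("...") in the script text.
TITLE_PATTERN = re.compile(r'escape\("([^"]+)"')


def extract_title(script_text: str) -> Optional[str]:
    """Return the string passed to escape(), or None if there is no match."""
    match = TITLE_PATTERN.search(script_text)
    return match.group(1) if match else None


# Hypothetical script line; the real page's script may look different.
SAMPLE_SCRIPT = (
    'document.getElementById("pagetitle").innerHTML = '
    'unescape(escape("TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES"));'
)

print(extract_title(SAMPLE_SCRIPT))
```

The approach works because the title is present as a string literal in the static page source, even though the visible heading is only filled in once the script runs.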

Import data from ebay to google spreadsheet using IMPORTXML

I'm trying to import a table from "https://www.ebay.com/itm/100-NEW-ALTERNATOR-VW-GOLF-GTI-GL-GLS-1-8T-1-8-2L-99-06-90A-1-YR-WARRANTY-13852/301364941754?fits=Model%3AJetta&hash=item462ac013ba:g:v7oAAOSw~YRagU4N&vxp=mtr" into a Google Spreadsheet using the IMPORTXML function. The formula I was using is below:
A1 = https://www.ebay.com/itm/100-NEW-ALTERNATOR-VW-GOLF-GTI-GL-GLS-1-8T-1-8-2L-99-06-90A-1-YR-WARRANTY-13852/301364941754?fits=Model%3AJetta&hash=item462ac013ba:g:v7oAAOSw~YRagU4N&vxp=mtr
A2 = //*[@id="w1-20ctbl"]
A3 = =IMPORTXML(A1,A2)
But it returns nothing and says "Imported content is empty."
Can somebody help me? I'm new to Google Sheets scripting, and I'd be really grateful for any help.
Waiting to hear from somebody.
Thanks
You cannot access the pictured table using IMPORTXML or any built-in Google Sheets formula because the table is generated when a user visits the website.
If you look at the page source, you'll see that the table does not exist. IMPORTXML looks at this page source, which is the content before the browser renders any JavaScript. When you "inspect" an element in your browser, you are inspecting the content after the JavaScript has run.
Unfortunately, there is not a simple way to get the data you're looking for. You'll have to find or build your own scraping tool. Be careful not to violate eBay terms of use or any local laws.
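Before building a scraper, it can help to confirm that the element really is missing from the page source. The sketch below, in Python with only the standard library, collects every `id` attribute found in raw HTML so that an XPath such as `//*[@id="w1-20ctbl"]` can be checked against it. The names `IdCollector` and `static_ids` and the `SAMPLE_SOURCE` markup are hypothetical; the sample does not reproduce eBay's actual page.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute present in raw, unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def static_ids(html: str) -> set:
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


# Hypothetical source: the table with the target id is created by a script.
SAMPLE_SOURCE = """
<html><body>
<div id="container"></div>
<script>
  var t = document.createElement("table");
  t.id = "w1-20ctbl";
  document.getElementById("container").appendChild(t);
</script>
</body></html>
"""

ids = static_ids(SAMPLE_SOURCE)
print("w1-20ctbl" in ids, "container" in ids)
```

If the target id is absent from the static source, as in this sample, an import formula has nothing to match and reports empty content.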
