Get data from XML with the Google Sheets IMPORTXML function - XPath

I'm trying to scrape data from a web page with the IMPORTXML function in Google Sheets, but the result is empty.
I tried these formulas:
=IMPORTXML("https://www.futbin.com/20/player/42955/", "//span[@id='ps-lowest-1']/text()")
=INDEX(IMPORTXML("https://www.futbin.com/20/player/42955/", "//div[@class='xbox-lowest-1']"),1,1)
=IMPORTXML("https://www.futbin.com/20/player/42955/", "//*[@id='xbox-lowest-1']")
=IMPORTXML("https://www.futbin.com/20/player/42955/", "//*[@id='xbox-lowest-1']/text()")
Maybe the data is generated afterwards by a script or something else.

You could do:
=QUERY(ARRAY_CONSTRAIN(IMPORTDATA("https://www.futbin.com/20/player/42955/"), 5000, 1),
"where lower(Col1) contains 'lowest'")
But as you can see, there are no numeric values between those tags. The Google Sheets import functions read only the page source as served and do not run JavaScript, so values filled in by a script cannot be scraped this way.
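The effect can be reproduced offline. The sketch below is in Python rather than Sheets, and the markup is a made-up stand-in for the page source (an assumption, not the real futbin HTML). The XPath finds the element, but it has no text because a script would normally fill it in later.

```python
import xml.etree.ElementTree as ET

# Hypothetical static source: the span exists but stays empty until a script runs.
STATIC_SOURCE = "<div><span id='ps-lowest-1'></span><span id='label'>Lowest</span></div>"


def text_by_id(markup, element_id):
    """Return the text of the first element with the given id, or None."""
    root = ET.fromstring(markup)
    node = root.find(f".//*[@id='{element_id}']")
    return None if node is None else node.text


# The first lookup matches an element that carries no text in the static source.
print(text_by_id(STATIC_SOURCE, "ps-lowest-1"))
print(text_by_id(STATIC_SOURCE, "label"))
```

If the value is missing from the raw source like this, no XPath change will help; the data has to come from wherever the script loads it.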

Related

Retrieve specific value from API response in Google Sheets

I'm sorting some of my trading cards into Google Sheets and would like to retrieve the average price of each card using the yugiohprices API (https://yugiohprices.docs.apiary.io/#reference/checking-card-prices/check-price-for-cards-print-tag/check-price-for-card's-print-tag?console=1)
I can link a cell to retrieve the card's data, but I only want the average price value.
Sample sheet:
NAME, SET, AVERAGE_PRICE
JINZO, PSV-000, =IMPORTDATA(CONCATENATE("https://yugiohprices.com/api/price_for_print_tag/",B2))
https://yugiohprices.com/api/price_for_print_tag/PSV-000
{"status":"success","data":{"name":"Blue-Eyes White Dragon","card_type":"monster","property":null,"family":"light","type":"Dragon / Normal","price_data":{"name":"Legend of Blue Eyes White Dragon","print_tag":"LOB-001","rarity":"Ultra Rare","price_data":{"status":"success","data":{"listings":[],"prices":{"high":125000.0,"low":22.22,"average":81.48,"shift":-0.281354736285059,"shift_3":-0.275475724702116,"shift_7":-0.302755433852473,"shift_21":-0.345173993409949,"shift_30":-0.302874743326489,"shift_90":0.0978172999191593,"shift_180":0.886984715145901,"shift_365":-0.525699982536818,"updated_at":"2022-04-11 13:55:38 -0600"}}}}}}
To retrieve a value from JSON data, a custom function written in Google Apps Script might be useful.
Sample script:
Please copy and paste the following script into the script editor of the Spreadsheet and save it. To use it, put =SAMPLE("https://yugiohprices.com/api/price_for_print_tag/PSV-000") in a cell. This retrieves the value of obj.data.price_data.price_data.data.prices.average.
function SAMPLE(url) {
  // Fetch the API response and parse it as JSON.
  const res = UrlFetchApp.fetch(url);
  const obj = JSON.parse(res.getContentText());
  // Walk down to the nested average price.
  return obj.data.price_data.price_data.data.prices.average;
}
Testing:
When this script is run using =SAMPLE("https://yugiohprices.com/api/price_for_print_tag/PSV-000"), the following result is obtained.
Note:
If the above custom function doesn't work, please reopen the Spreadsheet and test it again.
References:
Custom Functions in Google Sheets
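The nested lookup can also be checked offline before wiring it into a sheet. This is a minimal sketch in Python, not Apps Script. It uses a trimmed copy of the response quoted in the question, and the helper name average_price is made up for illustration. Unlike the one-line property chain, it returns None when any level is missing instead of raising an error.

```python
import json

# Trimmed copy of the response shown in the question (only the fields used here).
SAMPLE_RESPONSE = (
    '{"status":"success","data":{"name":"Blue-Eyes White Dragon",'
    '"price_data":{"print_tag":"LOB-001","price_data":{"status":"success",'
    '"data":{"prices":{"high":125000.0,"low":22.22,"average":81.48}}}}}}'
)

# Same path as obj.data.price_data.price_data.data.prices.average
PATH = ("data", "price_data", "price_data", "data", "prices", "average")


def average_price(response_text):
    """Follow PATH through the parsed JSON; return None if any key is absent."""
    node = json.loads(response_text)
    for key in PATH:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(average_price(SAMPLE_RESPONSE))
```

The same guard could be added to the custom function if the API sometimes answers without price data.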

Web Scrape returns N/A, not sure how to keep the data returning

Totally new here, and I figured you guys will know the answer before I can even come near to figuring this out.
I have a Google Form feeding a live Google Sheet, where users submit car reg numbers.
My goal is to have the make and model displayed beside each reg number.
I have implemented an IMPORTXML function. The cell where I expect to see the data loads for a few minutes, then reverts to "N/A", or sometimes doesn't pull the data at all, even though manually visiting the URL does return the data.
The IMPORTXML function uses a cell made up of a URL string plus the reg/VIN entered in the form submission. That cell looks something like "basicvehicledetails.com/reg", and the function returns the elements with the class relevant to make/model in separate cells.
I need the data to stay once it is returned, but I don't know how to do that.
Another option is a car check website that requires a login, where the reg has to be entered and searched before a web page returns in-depth data on the car. Is this something I can export to a Google Sheet or Excel spreadsheet?
I'm really stuck on this one and would really appreciate any help, as updating each car manually is painful.
Try it like this:
=REGEXREPLACE(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
"https://www.motorcheck.ie/free-car-check/?vrm=152D1234"),5000,1),
"where Col1 contains 'dark-left'", 0), "</?\S+[^<>]*>", )
UPDATE:
=IMPORTHTML("https://www.cartell.ie/ssl/servlet/beginStarLookup?registration=152D1234",
"table", 1)
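To see what the tag-stripping pattern in the first formula does, here is a small offline sketch in Python. The two sample lines are made up and are not taken from the real site. It applies the same pattern with Python's re module, which is assumed here to behave like REGEXREPLACE for this pattern.

```python
import re

# Same pattern as in the REGEXREPLACE call above.
TAG_PATTERN = re.compile(r"</?\S+[^<>]*>")


def strip_tags(line):
    """Remove everything the pattern matches from one line of page source."""
    return TAG_PATTERN.sub("", line)


# Hypothetical lines of page source.
with_attribute = '<td class="dark-left">Ford Focus</td>'
without_whitespace = "<td>Ford</td>"

print(repr(strip_tags(with_attribute)))
print(repr(strip_tags(without_whitespace)))
```

In Python at least, the second line comes back empty: with no whitespace anywhere in the line, \S+ can run across the text and into the closing tag, so the whole line is removed. It may be worth checking whether the same happens in your sheet when a cell value has no spaces.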

XPath scrape of website returns incomplete results

I am trying to use the Google Spreadsheet function "importXML" to pull in all links and titles from a Khan Academy Website:
https://www.khanacademy.org/commoncore/grade-HSA-A-SSE
So far I have tried:
=IMPORTXML("https://www.khanacademy.org/commoncore/grade-HSA-A-SSE", "//a[@class='standard-preview']")
It brings in 29 results, but not all of the "a" elements with the class "standard-preview". On the web page there are many more elements with that class than just those 29.
How do I grab all the elements with the class "standard-preview"? Why would my XPath not return some of the values?
My spreadsheet is below:
https://docs.google.com/spreadsheets/d/1pP-WMnoCYzG38VyT_0tYpdblSKjNGvDpa8dRMnraQ7w/edit?usp=sharing
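One possible cause (an assumption, not checked against the actual page) is that [@class='standard-preview'] is an exact string comparison, so links whose class attribute lists additional classes are skipped. The Python sketch below uses made-up markup to show the difference between the exact match and a token-based match.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second link carries an extra class.
SNIPPET = """
<div>
  <a class="standard-preview" href="/a">First</a>
  <a class="standard-preview highlighted" href="/b">Second</a>
  <a class="other" href="/c">Third</a>
</div>
"""

root = ET.fromstring(SNIPPET)

# Exact attribute comparison, as in the formula above.
exact = [a.text for a in root.findall(".//a[@class='standard-preview']")]

# Token-based comparison: split the class attribute on whitespace.
by_token = [
    a.text
    for a in root.iter("a")
    if "standard-preview" in a.get("class", "").split()
]

print(exact)
print(by_token)
```

In XPath 1.0 the token-based test is usually written as //a[contains(concat(' ', normalize-space(@class), ' '), ' standard-preview ')], which may be worth trying in the IMPORTXML call. If the missing links are added by a script after the page loads, neither form will find them.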

Google image search says API no longer available

I am using the Google Image Search API. It was working until yesterday, but this morning it says "This API is no longer available".
Has it been officially shut down, or is there an error on my side?
Request
https://ajax.googleapis.com/ajax/services/search/images?v=1.0&rsz=8&q=cute+kittens
Response
{"responseData": null, "responseDetails": "This API is no longer available.", "responseStatus": 403}
The answer I found was using Google's Custom Search Engine (CSE) API. Note that this is limited to 100 free requests per day.
Creating cx and modifying it to search for images
Create custom search engine at https://cse.google.com/cse/create/new based on your search criteria.
Choose sites to search (leave this blank if you want to search the entire web, otherwise you can enter a site to search in one particular site)
Enter a name and a language for your search engine.
Click "create." You can now find cx in your browser URL.
Under "Modify your search engine," click the "Control Panel" button. In the "edit" section you will find an "Image Search" label with an ON/OFF button, change it to ON. Click "update" to save your changes.
Conducting a search with the API
The API endpoint url is https://www.googleapis.com/customsearch/v1
The following query parameters are used for this API:
q: specifies the search text
num: specifies the number of results. Requires an integer value between 1 and 10 (inclusive)
start: the "offset" for the results, i.e. which result the search should start at. Requires an integer value between 1 and 101.
imgSize: the size of the image. I used "medium"
searchType: must be set to "image"
fileType: specifies the file type for the image. I used "jpg", but you can leave this out if the file extension doesn't matter to you.
key: an API key, obtained from https://console.developers.google.com/
cx: the custom search engine ID from the previous section
Simply make a GET request to the API endpoint listed above, passing these parameters in the query string.
Note: If you set a list of referrers in the search engine settings, visiting the URL via your browser will likely not work. You will need to make an AJAX call (or the equivalent from another language) from a server specified in this list. It will work for only the referrers which were specified in the configuration settings.
Reference:
https://developers.google.com/custom-search/json-api/v1/reference/cse/list
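As a rough illustration of the parameter list above, the Python sketch below assembles the request URL. The function name build_image_search_url and the values API_KEY and CX_ID are placeholders made up for this example. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_image_search_url(api_key, cx, query, num=10, start=1, **extra):
    """Build a Custom Search image query URL from the parameters listed above."""
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = {
        "key": api_key,
        "cx": cx,
        "q": query,
        "searchType": "image",  # required for image results
        "num": num,
        "start": start,
    }
    params.update(extra)  # optional filters such as imgSize or fileType
    return ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; replace with your own key and search engine ID.
url = build_image_search_url(
    "API_KEY", "CX_ID", "cute kittens", imgSize="medium", fileType="jpg"
)
print(url)
```

Letting urlencode handle the escaping avoids hand-building the query string, which matters once the search text contains spaces or other special characters.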
Now you can search images with the Custom Search API.
You can do this in two steps:
Get CUSTOM_SEARCH_ID
Go to https://cse.google.ru/cse/all
Create a new search engine there and enable Image Search for it.
Then get this search engine's ID. To do this, press the Get Code button:
There, find the line with cx = "here will be your CUSTOM_SEARCH_ID":
That's the first step done. Now the second step:
Get SERVER_KEY
Go to the Google Console: https://console.developers.google.com/project
Press the Create project button and enter the name and other required information.
Pick this project and go to Enable APIs.
Now find Custom Search Engine and enable it.
Now go to Credentials and create a new Server Key:
Now we can use Image Search.
Query:
https://www.googleapis.com/customsearch/v1?key=SERVER_KEY&cx=CUSTOM_SEARCH_ID&q=flower&searchType=image&fileType=jpg&imgSize=xlarge&alt=json
Replace SERVER_KEY and CUSTOM_SEARCH_ID with your own values and send this request.
Limit: the free tier allows only 100 searches per day.
If this is just for your own purposes (not for production) and you're not planning to abuse Google Image Search, you can simply extract the first image URL from the Google search results using jsoup.
For example:
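Code to retrieve the image URL of the first thumbnail:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Returns an <img> tag for the first thumbnail, or "" if nothing was found.
public static String FindImage(String question, String ua) {
    String finRes = "";
    try {
        String googleUrl = "https://www.google.com/search?tbm=isch&q=" + question.replace(",", "");
        Document doc1 = Jsoup.connect(googleUrl).userAgent(ua).timeout(10 * 1000).get();
        // The first element carrying a data-src attribute is taken as the first thumbnail.
        Element media = doc1.select("[data-src]").first();
        if (media != null) {
            String finUrl = media.attr("abs:data-src");
            finRes = "<img src=\"" + finUrl.replace("&quot", "") + "\" border=1/>";
        }
    } catch (Exception e) {
        System.out.println(e);
    }
    return finRes;
}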
Guide:
question - image search term
ua - user agent of the browser
After reading several responses, I compiled an answer with images:
Go to https://developers.google.com/custom-search/v1/introduction and click the Get a Key button on that page:
Create or select a project, then click NEXT:
Copy the API key:
To create your CX, go to https://cse.google.com/cse/create/new, enter some random domain like “www.anypage.com” (we will delete it afterwards), select a language, and give your search engine a name. Click the CREATE button.
You will see this page; click Control Panel:
Copy the Search engine ID for later (this is your CX). Afterwards you can set it to search all websites (turn on "Search the entire web", select the random website www.anypage.com and click the Delete button) and turn on Image search. It will then look like this:
Using REST, you can get the results with this example code (searching for flower):
<html lang="pt">
  <head>
    <title>JSON Custom Search API Example</title>
  </head>
  <body>
    <div id="content"></div>
    <script>
      function hndlr(response) {
        console.log(response);
        for (var i = 0; i < response.items.length; i++) {
          var item = response.items[i];
          // in production code, item.htmlTitle should have the HTML entities escaped.
          document.getElementById("content").innerHTML += "<br>" + item.htmlTitle;
        }
      }
    </script>
    <script src="https://www.googleapis.com/customsearch/v1?key=API_KEY&cx=SEARCH_ENGINE_KEY&q=flower&searchType=image&callback=hndlr"></script>
  </body>
</html>
The base code is found here: https://developers.google.com/custom-search/v1/using_rest
After setting your API_KEY (key) and your SEARCH_ENGINE_KEY (cx), the result will look like this:
Thanks to @Vijay Shegokar, @aftamat4ik and @Alladinian
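The comment in the handler about escaping HTML entities can be sketched as follows. This is Python rather than browser JavaScript, and the response dictionary is a made-up sample that only mimics the items list the handler reads. Its title and link fields are assumed for illustration. The sketch also treats a response without items as an empty result instead of failing.

```python
import html

# Hypothetical response with the same overall shape the handler reads (an "items" list).
SAMPLE_RESPONSE = {
    "items": [
        {"title": "Red <flower> & leaf", "link": "https://example.com/red.jpg"},
        {"title": "Blue flower", "link": "https://example.com/blue.jpg"},
    ]
}


def render_titles(response):
    """Join the escaped item titles with <br>, tolerating a missing items list."""
    items = response.get("items", [])
    return "<br>".join(html.escape(item["title"]) for item in items)


print(render_titles(SAMPLE_RESPONSE))
print(repr(render_titles({})))
```

Escaping before inserting into the page keeps a title that contains markup from being interpreted as HTML.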
This is the full URL template to use.
Unnecessary parameters can be left out.
https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&cref={cref?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite={relatedSite?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json
I am using
https://www.googleapis.com/customsearch/v1?key=ap_key&cx=cx&q=hello&searchType=image&imgSize=xlarge&alt=json&num=10&start=1
Change the API URL to
Google Custom Image search
Provide the same parameters along with the API key and CX.
More Info and Explorer
The Yahoo Boss API is a reasonable substitute, although it's not free and the results are not quite as good.
UPDATE: the Yahoo BOSS JSON Search API will be discontinued on March 31, 2016.
SerpAPI lets you search Google Images and returns clean JSON. It integrates with most programming languages: Python, PHP, Java, Go, Node.js...
https://serpapi.com/images-results
Google limits the number of searches per day,
but this service provides unlimited searches...
Looks like we need to implement the Google Custom Search API:
https://developers.google.com/custom-search/
It says so at the top of the page you provided yourself.

Reading RSS returns a different response on localhost

I'm trying to parse an RSS feed. On localhost it brings back the right results, but when I do the same from another server (preproduction) and from live, it returns a list of comments made by users on the hydrapinion website, which is completely unrelated. Have I been spoofed? How can I debug this? It's just an RSS feed and some simple LINQ code!
string bingurl = "http://www.bing.com/search?form=QBRE&filt=rf&qs=n&format=rss&count=10&q=+environment+(site:www.australianit.news.com.au)";
XDocument doc = XDocument.Load(bingurl);
// Take the 10 most recent items, newest first.
IEnumerable<XElement> items = (from i in doc.Descendants("item")
                               orderby DateTime.Parse(i.Element("pubDate").Value) descending
                               select i).Take(10);
rpData.DataSource = items;
rpData.DataBind();
I tried a different combination and got no results at all! Do you think the server settings have anything to do with retrieving RSS results?
I found a decent guide for Bing search, but as it turned out, Bing doesn't bring back decent results! It appears to me that it changes the result set according to where you're calling it from. I tried adding "loc:" to the RSS query; when called from code it returned different results than when called on the Bing website itself. I don't know the algorithm they are using, but it is getting more obvious.
The guide is here:
http://help.live.com/help.aspx?mkt=en-AU&project=a
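One way to narrow this down is to save the raw feed text each server receives and run the same sort on it offline, so the parsing step can be ruled out. The sketch below is a Python counterpart to the LINQ query, run on a small made-up feed (the titles and dates are invented for illustration).

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed text; in practice this would be the saved response body.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Older</title><pubDate>Mon, 04 Jan 2010 10:00:00 GMT</pubDate></item>
<item><title>Newest</title><pubDate>Wed, 06 Jan 2010 09:30:00 GMT</pubDate></item>
<item><title>Middle</title><pubDate>Tue, 05 Jan 2010 23:59:00 GMT</pubDate></item>
</channel></rss>"""


def latest_titles(rss_text, limit=10):
    """Return item titles ordered by pubDate, newest first, at most `limit` of them."""
    root = ET.fromstring(rss_text)
    dated = [
        (parsedate_to_datetime(item.findtext("pubDate")), item.findtext("title"))
        for item in root.iter("item")
    ]
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in dated[:limit]]


print(latest_titles(SAMPLE_RSS))
```

If the saved text already differs between localhost and the other servers, the difference comes from the response itself and not from the query code.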
