Use of indexing JSON response by Google and other search engines?

JSON responses for AJAX are usually not full stories but succinct, structured responses to specific queries. Does Google index them? How important is it that I submit those AJAX URLs to Google, given that the Google crawler will get only the JSON string and not the whole context? Am I missing something?

Related

In Elasticsearch searches, are query string parameters for GET requests and the "Query DSL" for POST requests functionally equivalent?

I'm trying to create a small app that displays some simple visualizations from data indexed on Elasticsearch (on an AWS managed Elasticsearch service).
Since, to the best of my knowledge, the degree of access control that AWS offers over its ES service is based on allowing specific HTTP verbs (GET, POST, etc), to simplify my life and the ES admin's, I'm granting this app "read only" permissions, so only GET and HEAD.
However, I see that for its search API, ES exposes a GET endpoint that works with query string parameters, and a POST endpoint that works with a JSON based "Query DSL". This DSL seems to be the preferred method in all examples I have seen online and in the books.
Given the predominance of the Query DSL throughout the documentation, I was wondering:
Does the Query DSL expose functionality that standard query string parameters don't, or are the two functionally equivalent?
Does the POST search endpoint result in any data actually being POSTed, or is it only a workaround for sending JSON as a query, one that departs slightly from REST conventions?

As per the docs
You can use query parameters to define your search criteria directly in the request URI, rather than in the request body. Request URI searches do not support the full Elasticsearch Query DSL, but are handy for testing.
The GET behavior is slightly confusing, but even Kibana sends a POST in the background when you perform a GET with a body. If you have to use GET, some query results might be unexpected. What's your exact use case? Which queries are we talking about?
FYI more useful info is here and here.
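
To make the difference quoted from the docs concrete, here is a minimal Ruby sketch that builds both kinds of request without sending them. The host, index, and field names are hypothetical, and the aggregation is only one example of something a Query DSL body can express that, as far as I know, the q= URI parameter cannot.

```ruby
require 'json'
require 'uri'

# Hypothetical endpoint and index, for illustration only.
ES_HOST = 'https://my-es-domain.example.com'
INDEX   = 'app-logs'

# URI search: the whole query travels in the q= parameter (query string syntax).
def uri_search_url(host, index, query_string)
  "#{host}/#{index}/_search?#{URI.encode_www_form(q: query_string)}"
end

# Query DSL: a JSON body that can combine a query with, e.g., an aggregation.
def dsl_search_body(field, value, agg_field)
  {
    size: 0,
    query: { match: { field => value } },
    aggs: { per_value: { terms: { field: agg_field } } }
  }
end

url  = uri_search_url(ES_HOST, INDEX, 'status:error')
body = JSON.generate(dsl_search_body('status', 'error', 'service.keyword'))

puts url   # request line for a body-less GET
puts body  # JSON that would be sent as the request body to /app-logs/_search
```

The body would go to the same _search endpoint with a Content-Type of application/json. Nothing is written to the index by such a request; the body only carries the query.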

Google API for the Search Result Events

I'm looking for the correct API for the events that show up in a regular Google Search, the ones that are structured (with name, datetime, location)
Any help or guidance is appreciated
I have tried the Custom Search with no luck, and also the Calendar API (which seems to require a calendar ID, more so for personal calendars or targeted public ones)

We've actually just made an API to scrape the Google event results. You can query it directly like this:
https://serpapi.com/search.json?engine=google_events&q=Events+in+Austin
Or if you are using Ruby, you can do something like this:
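require 'google_search_results'

# Search parameters: the Google Events engine and the query text
params = {
  engine: "google_events",
  q: "Events in Austin",
}

client = GoogleSearchResults.new(params)
# get_hash returns the response as a hash; the events are under :events_results
events_results = client.get_hash[:events_results]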
Some documentation: https://serpapi.com/google-events-api

I had a quick look. While I didn't find a fully programmatic API yet, here are two things that can get you started:
How to search the events page directly: use the following URL schema: https://www.google.com/search?q=cool+conferences&oq=cool+conferences&ibp=htl;events&rciv=evn - replacing "cool+conferences" with any string you like - this can let you create dynamic URLs for event searches.
How to access event metadata for a given page: Google is pushing a standard for structuring data on webpages to support "smart" searches such as those for events. It uses a data format called JSON-LD. More details. If you want to read such metadata from a webpage, here is one scraper I have found that does that: extruct (though I didn't get a chance to test it yet).
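
As a rough illustration of the second point, here is a minimal Ruby sketch that pulls JSON-LD blocks out of a page using only the standard library. It is not extruct (which is a Python tool); the regular expression is a simplification that assumes the metadata sits in script tags of type application/ld+json, and the sample page and event are made up.

```ruby
require 'json'

# Matches <script type="application/ld+json"> ... </script> blocks (simplified).
JSON_LD_SCRIPT = %r{<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi

# Return every JSON-LD object embedded in the HTML, skipping blocks that fail to parse.
def extract_json_ld(html)
  html.scan(JSON_LD_SCRIPT).flat_map do |match|
    begin
      data = JSON.parse(match.first)
      data.is_a?(Array) ? data : [data]
    rescue JSON::ParserError
      []
    end
  end
end

# Made-up sample page, standing in for HTML fetched from a real site.
SAMPLE_HTML = <<~HTML
  <html><head>
  <script type="application/ld+json">
  {"@context": "https://schema.org", "@type": "Event", "name": "Sample Conference",
   "startDate": "2030-01-15T09:00", "location": {"@type": "Place", "name": "Sample Hall"}}
  </script>
  </head><body></body></html>
HTML

events = extract_json_ld(SAMPLE_HTML).select { |item| item['@type'] == 'Event' }
events.each { |e| puts "#{e['name']} at #{e['location']['name']} on #{e['startDate']}" }
```

A real page may nest events inside an @graph array or use other markup (microdata, RDFa), which this sketch does not handle.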
Hope this helps :)

Use of Mechanize

I want to get responses from websites that take a simple input, which is also reflected in a parameter of the URL. Is it better to simply fetch the result by conventional means, for example OpenURI.open_uri(...) with some parameter set, or is it better to use mechanize, extract the form, and get the result through submit?
The mechanize page gives an example of extracting a form and submitting it to get the search result from Google search. However, this much can be done simply as OpenURI.open_uri("http://www.google.com/search?q=...").read. Is there any reason I should try to use one way or the other?

There are lots of sites where it turns out to be easiest to use mechanize. If you need to log in and set a cookie before accessing the data, then mechanize is a simple way of doing this. Similarly, if there are lots of hidden fields that need to be matched (such as a CSRF token), then fetching the page using mechanize and submitting it with the data filled out is often a more foolproof method than crafting the URL yourself.
If it is a simple URI, like google's search pages, then manually constructing it may be simpler.
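
The two approaches can be sketched side by side in Ruby. This is only an outline under assumptions: the mechanize method needs the mechanize gem and network access and is defined but not called here, it simply takes the first form on the page, and the field name passed to it is whatever the target site uses.

```ruby
require 'uri'

# Approach 1: build the URL by hand when the input is just a query parameter.
def search_url(base, params)
  "#{base}?#{URI.encode_www_form(params)}"
end

# Approach 2: let mechanize fetch the page, fill in the form, and submit it,
# so cookies and hidden fields (e.g. a CSRF token) are carried along.
# Assumes the form of interest is the first one on the page.
def submit_first_form(page_url, field_name, value)
  require 'mechanize'
  agent = Mechanize.new
  page  = agent.get(page_url)
  form  = page.forms.first
  form[field_name] = value
  agent.submit(form)
end

url = search_url('http://www.google.com/search', q: 'ruby mechanize')
puts url
# With open-uri loaded, the page could then be fetched with URI.open(url).read
```

URI.encode_www_form takes care of escaping, which is the main thing that goes wrong when a query URL is assembled by string concatenation.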

Subscribing to Yahoo search feed with Google Reader?

Is it possible to subscribe to Yahoo search results with Google Reader through the Reader API?

Yahoo search results can be returned in RSS format, which can then be passed to the Google Reader API for subscription.
The Yahoo RSS feed URL format is (for search term 'nascar'):
http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&query=nascar
Unfortunately it looks like these feeds are not returning results at the moment. It's possible that they've been discontinued.
As an alternative, you can use the Bing search RSS feeds which have the format:
http://www.bing.com/search?format=rss&q=nascar
Bing search results power Yahoo search, so you'll get the same (or close to the same) results either way.

Scraping pages with asynchronous responses with Hpricot

I'm trying to scrape a page but the initial response has nothing in the body as the content is pumped in asynchronously, e.g. the results from a search on the apple website: http://www.apple.com/uk/search/?q=searching+for+something&sec=global
Any ideas on how I can successfully grab the results from the search with hpricot?
Thanks.

When the search page you refer to is loaded, it makes a request via javascript/ajax to some other location, then populates the search results. This is what you're seeing in the page. Hpricot itself can't help you here because it has no way to interpret the javascript that comes with the page in order to fetch the actual search results list.
Now, if what you're interested in is the search results, you'd need to analyze what happens when you open that page and type a search query. Some javascript in the page takes your query and calls (via XMLHttpRequest or similar AJAX techniques) another script on Apple's server. This is the one that actually does the search in a database and returns the result.
I suggest you install Firefox with the Firebug plugin, or find some other way of seeing the actual requests a page and its javascript components send and receive. You'll see that, for the search page you referred to, it fetches two parts. First, the "featured" results, which come from this URL:
http://www.apple.com/global/scripts/search_featured.php?q=mac+mini&section=global&geo=uk
Notice the search string is in the "q" parameter.
Second, a long results list comes from here:
http://www.apple.com/search/service/nph-search10?site=uk_www&filter=1&snum=50&q=mac+mini
Both of these are XML documents; you might have better luck parsing these URLs with Hpricot.
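
Since the search string lives in the "q" parameter, a small helper can point one of these URLs at a different query before fetching it. This is a sketch using only Ruby's standard library; the URL is the one quoted above and may no longer respond, and the helper name is made up for illustration.

```ruby
require 'uri'

# Return a copy of the URL with its "q" parameter replaced by new_query.
def with_search_query(url, new_query)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s).to_h
  params['q'] = new_query
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

results_url = 'http://www.apple.com/search/service/nph-search10?site=uk_www&filter=1&snum=50&q=mac+mini'
new_url = with_search_query(results_url, 'ipod nano')
puts new_url
```

The resulting URL can then be fetched and its XML body handed to the parser, instead of the HTML search page.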
