How to implement related searches with elasticsearch? - elasticsearch

Is it possible to implement "people who searched for this, also searched for" feature with elasticsearch? Just like what we often see in the search result at the bottom of Google search. If possible, how can we achieve it?

The elasticsearch-taste plugin, which is based on Elasticsearch and Mahout, may help you implement this:
https://github.com/codelibs/elasticsearch-taste

Related

google search appliance - explain results

I might be missing something obvious, but is there any way to get insight into why the GSA results for a query are what they are? For example, Lucene searchers have an explain method. Is there anything similar in GSA?
This would be extremely useful when you don't quite understand why you are getting the results you are getting and why they are ordered the way they are.
No. According to all expert reports in the enterprise search domain (e.g. Gartner, but not only), Google has never explained how it ranks search results in GSA.

How to get the correct suggestion for the wrong searched term in ElasticSearch?

I am quite new to ElasticSearch and it has been very nice getting to know it. I have to implement something like what the image attached below suggests. How do I achieve this? I tried looking for it but did not find anything close to it, so I had to post it as a new question.
Does ES have anything to do with it?
Take a look at Suggesters:
"The suggest feature suggests similar looking terms based on a provided text by using a suggester. Parts of the suggest feature are still under development."
You can use this: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters.html
Using the term suggester, it can correct your misspelled word from its dictionary.
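As a rough illustration of the term suggester mentioned above, here is a minimal Python sketch that builds a search request body with a suggest section and reads the top correction for each token out of a response. The field name "title", the suggestion name "spelling", and the sample response are assumptions made up for illustration; the response is hand-written in the documented shape, not output from a real cluster.

```python
def build_term_suggest_body(text, field, name="spelling"):
    """Build a _search request body that asks the term suggester for corrections."""
    return {
        "size": 0,  # only suggestions are wanted here, not search hits
        "suggest": {
            name: {
                "text": text,
                "term": {"field": field},
            }
        },
    }


def best_corrections(response, name="spelling"):
    """Map each input token to its first suggested option, if it has one."""
    corrections = {}
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        if options:
            corrections[entry["text"]] = options[0]["text"]
    return corrections


# Hand-written sample in the shape of a suggest response (illustrative values).
sample_response = {
    "suggest": {
        "spelling": [
            {"text": "elasticsaerch", "offset": 0, "length": 13,
             "options": [{"text": "elasticsearch", "score": 0.84, "freq": 12}]},
            {"text": "tutorial", "offset": 14, "length": 8, "options": []},
        ]
    }
}

body = build_term_suggest_body("elasticsaerch tutorial", field="title")
print(body)
print(best_corrections(sample_response))
```

The body would be sent to the index's _search endpoint with whatever client you already use. Tokens with an empty options list are treated as correctly spelled.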

Google Custom Search API (CSE) - Retrieve only discussions

I would like to use the Google custom search API for searching only in discussions like using the query string &tbm=dsc.
Unfortunately there is no tbm parameter given in the API documentation.
Is it not possible to limit the search results to discussions only?
No, there is currently not a way to do the discussion search with CSE/GSS. The only special search is image which is documented in the API reference. You could use Labels and Refinements to limit your search to specific sites and/or patterns.
Limiting search results for Google Custom Search to only discussion websites is not possible. Just in case, remember that Google Custom Search is for searching over one website or a collection of websites. If your collection is all discussion sites, well, that doesn't seem to be the purpose of Google Custom Search. However, there may be some useful workarounds/solutions.
Workaround 0
Find or generate a collection of discussion sites you're interested in and create a custom search based on that. This would accomplish (almost) the same results you are after.
Workaround 1
You might be able to perform a redirection with refinement labels. This example redirects to a Google Scholar search. You might be able to accomplish the same result using &tbm=dsc.
<CustomSearchEngine>
  <Title>Universities</Title>
  <Context>
    <Facet>
      <FacetItem title="Papers">
        <Label name="papers" mode="FILTER"/>
        <Redirect url="http://scholar.google.com/scholar?q=$q"/>
      </FacetItem>
    </Facet>
  </Context>
</CustomSearchEngine>

How do I see/debug the way Solr finds its results?

Let's say I search for "ABLS" and Solr returns a result that does not make any sense to me.
How can I debug why Solr picked this record to be returned?
debugQuery=true would help you get the detailed score calculation and the explanation for each score.
An overview of the scoring is available at link
For a detailed explanation of the debug information you can refer to Link
You could add debugQuery=true&indent=true to the URL and examine the results. You could also use the analysis tool in Solr. Go to the admin page and click analysis. You would need to read the wiki to understand either of these in more depth.
debugQuery will tell you why your scoring looks the way it does (and how each field contributes).
I would take some of the results that you do not understand and play with them in Solr's analysis tool.
You should find it under:
/admin/analysis.jsp?highlight=on
Alternatively, turn on highlighting to see what is actually matching in your results.
Solr queries are full of short parameters that are hard to read and modify, especially when there are many of them.
After that, it is even harder to debug and understand why one document is more or less relevant than another. The debug explain output is usually a tree too big to fit on one page.
I found this Google Chrome extension useful for viewing the Solr query explain and debug output in a clear manner.
For those who still use a very old version of Solr (3.x), "debugQuery=true" will not return the debug information. You should specify "debugQuery=on" instead.
There are two ways of doing that. First is the query level, which means adding the debugQuery=on to your query. That will include a few things:
parsed query
debug timing information
detailed scoring information, which helps you analyze why a given document was given its score.
The second way is the [explain] transformer, which you add to your fl parameter. For example ...&fl=*,[explain], which will result in your documents having the scoring information as another field.
The scoring information can be quite extensive and will include calculations done by the similarity algorithm. If you would like to learn more about the similarities and the scoring algorithm in Solr, have a look at the talk that my colleague Radu from Sematext and I gave at the Activate conference: https://www.youtube.com/watch?v=kKocQdYGVJM
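To make the two approaches above concrete, here is a small Python sketch that builds a select URL with debugQuery=true, indent=true and the [explain] transformer in fl, plus a helper that pulls the per-document explanations out of a parsed JSON response. The host, port, core name "mycore", and the helper names are assumptions for illustration only; the sketch builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def solr_debug_url(base_url, query, explain_field=True):
    """Build a /select URL that asks Solr for debug and scoring details."""
    params = {
        "q": query,
        "debugQuery": "true",  # very old 3.x installs may need "on" instead
        "indent": "true",
        "wt": "json",
    }
    if explain_field:
        # Adds the scoring explanation to every returned document as a field.
        params["fl"] = "*,[explain]"
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def explanations(response):
    """Return the debug 'explain' section (document id -> explanation), if present."""
    return response.get("debug", {}).get("explain", {})


# Hypothetical core URL; adjust to your own installation.
url = solr_debug_url("http://localhost:8983/solr/mycore", "ABLS")
print(url)
```

Fetching that URL and passing the decoded JSON to explanations() gives one explanation per matching document, which is usually easier to scan than the full debug section.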

Searching a datastore for related topics by keyword

For example, how does StackOverflow decide other questions are similar?
When I typed in the question above and then tabbed to this memo control I saw a list of existing questions which might be the same as the one I am asking.
What technique is used to find similar questions?
I got an email from team@stackoverflow.com on Mar 20 that mentions how it works:
"the 'ask a question' search is exclusively on title and will not match anything in the body. It is a mystery to me why people think it's better."
The last sentence refers to the search bar, which I've found is less useful when I'm trying to find a specific question I've already seen.
I think it's plain old word matching. However, I might add that this feature does not work as well as I would like it to. It's much better to do a Google search with the site:stackoverflow.com prefix than to rely on SO to provide relevant suggestions.
Poorly -- using MS SQL Full Text Search, I believe. You'll have better luck using Lucene, IMO. For more background on the topic see the Wikipedia article on Lucene or the general topic of information retrieval.
The matching program would store an index of all questions. When you ask a question, all keywords in your question are matched against the index. This is similar to Google Search. Lucene open source search can be (and with high probability has been) used for this. Since the results are not quite accurate, I presume they index just the headlines of the questions, as an approximation.
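The keyword matching described above can be sketched in a few lines of Python: an inverted index over question titles, with candidates ranked by how many distinct keywords they share with the draft title. This is only an illustration of the idea, not how Stack Overflow or Lucene actually implements it; the stopword list, function names, and sample titles are assumptions.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "how", "do", "i", "to", "in", "for", "of", "is", "with"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def build_index(titles):
    """Map each keyword to the set of question ids whose title contains it."""
    index = defaultdict(set)
    for qid, title in titles.items():
        for word in tokenize(title):
            index[word].add(qid)
    return index


def similar_questions(index, draft_title, limit=5):
    """Rank questions by the number of distinct keywords shared with the draft."""
    scores = Counter()
    for word in set(tokenize(draft_title)):
        for qid in index.get(word, ()):
            scores[qid] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [qid for qid, _ in ranked[:limit]]


titles = {
    1: "How to implement related searches with elasticsearch?",
    2: "How do I debug the way Solr finds its results?",
    3: "Searching a datastore for related topics by keyword",
}
index = build_index(titles)
print(similar_questions(index, "Find related questions by keyword"))  # [3, 1]
```

Counting shared keywords is the crudest possible score. A search library such as Lucene would also weight rare words more heavily and normalize word forms, which is why "find" and "finds" do not match here.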
The other related keyword is collaborative filtering, the algorithm popularized by Amazon to recommend products based on behavior of other similar customers. In the current case, an alternative algorithm based on collaborative filtering is: keywords are extracted from the question, then tags associated (in the history) with the keywords are found. Questions which have those tags are returned. Well, experiments are needed to see whether it works well at all.
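The tag-based alternative described above (keywords, then historical tags, then questions carrying those tags) might look roughly like the following Python sketch. As the answer itself notes, it is untested as an approach; the data, function names, and the choice of keeping the top two tags are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_keyword_tags(history):
    """history: iterable of (keywords, tags) pairs taken from past questions."""
    keyword_tags = defaultdict(Counter)
    for keywords, tags in history:
        for keyword in keywords:
            keyword_tags[keyword].update(tags)
    return keyword_tags


def suggest_tags(keyword_tags, keywords, top_n=2):
    """Pick the tags most often seen together with the given keywords."""
    votes = Counter()
    for keyword in keywords:
        votes.update(keyword_tags.get(keyword, Counter()))
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


def questions_with_tags(question_tags, tags):
    """Return ids of questions that carry at least one of the given tags."""
    wanted = set(tags)
    return sorted(qid for qid, qtags in question_tags.items() if wanted & set(qtags))


# Made-up history: keywords extracted from each past question, and its tags.
history = [
    (["related", "searches"], ["elasticsearch", "recommendation"]),
    (["debug", "results"], ["solr", "lucene"]),
    (["related", "keyword"], ["search", "recommendation"]),
]
question_tags = {1: history[0][1], 2: history[1][1], 3: history[2][1]}

keyword_tags = build_keyword_tags(history)
tags = suggest_tags(keyword_tags, ["related", "topics"])
print(tags)                                      # ['recommendation', 'elasticsearch']
print(questions_with_tags(question_tags, tags))  # [1, 3]
```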
