How to hightlight matches in Elasticsearch completion suggester - elasticsearch

Is it possible to highlight the matching parts in the results of a completion suggester query?

(Caveat: I am new to Elasticsearch and the answer given relies solely on the two links I posted below. However, these are official Elasticsearch / Lucene sources and they are pretty clear about it)
As of today (2016-06-28), it does not seem to be possible.
This feature would not be implemented in Elasticsearch but in the underlying Lucene. However, it seems to be a complex issue that is not going to be resolved anytime soon.
See:
https://github.com/elastic/elasticsearch/issues/8089
https://issues.apache.org/jira/browse/LUCENE-4518

Related

Can ElasticSearch return relevant passages and not the entire document

I'm looking for a search engine that is able to return just relevant passages as a result and not the entire documents. Is ElasticSearch able to do this?
If you are looking to extract part of a long document, look at Highlighting. Specifically parameters like fragment_size (100 characters by default) or boundary_chars will help you to build that functionality.

Implement Fuzzy Prefix Full-text matching query in Elasticsearch

I understand that Elasticsearch tries to avoid using Fuzzy search with prefix matching, and this is also why it doesn't natively support such feature due to its complexity. However, we have a directory search system that solely relies on elasticsearch as a blackbox search engine, and we need the following logic:
E.g. Say the terms are "Michael Pierce Chem". We want to support full text search on the first two terms (with match query) and we also want to do fuzzy on the last term first and then do a prefix match, as if "Chem" matches "chemistry", "chen", and even "YouTube Chen" due to full-text support.
Please give me some advice on the implementation and design suggestions. Current stack is a NodeJS web app with Elasticsearch.

Elasticsearch: Return same search results regardless of diacritics/accents

I've got a word in the text (e.g. nagymező) and I want to be able to type in the search query nagymező or nagymezo and it should show this text which contains that word in the search results.
How can it be accomplished?
You want to use a Unicode folding strategy, probably the asciifolding filter. I'm not sure which version of Elasticsearch you're on, so here are a couple of documentation links:
asciifolding for ES 2.x (older version, but much more detailed guide)
asciifolding for ES 6.3
The trick is to remove the diacritics when you index them so they don't bother you anymore.
Have a look at ignore accents in elastic search with haystack
and also at https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-analyzers.html (look for 'diacritic' on the page).
Then, just because it will probably be useful to someone one day or the other, know that the regular expression \p{L} will match any Unicode letter :D
Hope this helps,

How do I see/debug the way SOLR find it's results?

Let's say I search for "ABLS" and the SOLR returns a result that to me does not make any sense.
How can I debug why SOLR picked this record to be returned?
debugQuery=true would help you get the detailed score calculation and the explanation for each scores.
An over view of the scoring is available at link
For detailed explaination of the debug information you can refer Link
You could add debugQuery=true&indent=true to the url and examine the results. You could also use the analysis tool in solr. Go to the admin and click analysis. You would need to read the wiki to understand either of these more in depth.
queryDebug will give you knowledge about why your scoring looks like it does (end how every field is relevant).
I will get some results that you are not understand and play with them with Solr's analysis
You should find it under:
/admin/analysis.jsp?highlight=on
Alternatively turn on highlighting over your results to see what is actually matching in your results
Solr queries are full of short parameters, hard to read and modify, especially when the parameters are too many.
And after it is even harder to debug and understand why a document is more or less relevant than another. The debug explain output usually is a three too big to fit in one page.
I found this Google Chrome extension useful to see Solr Query explain and debug in a clear manner.
For those who still use very old version of solr 3.X, "debugQuery=true" will not put the debug information. you should specify "debugQuery=on".
There are two ways of doing that. First is the query level, which means adding the debugQuery=on to your query. That will include a few things:
parsed query
debug timing information
detailed scoring information which helps you with analysis of why a give document is given a score.
In addition to that, you can use the [explain] transformer and add it to your fl parameter. For example ...&fl=*,[explain], which will result in your documents having the scoring information as another field.
The scoring information can be quite extensive and will include calculations done by the similarity algorithm. If you would like to learn more about the similarities and the scoring algorithm in Solr, have a look at this my and my colleague Radu from Sematext talk from the Activate conference: https://www.youtube.com/watch?v=kKocQdYGVJM

Lucene.NET MatchAllDocsQuery doesn't honor document boost?

I have a Lucene index of document, all nearly identical (test 1, test 2, etc.) except that some have a higher boost than others. When using a default query (MatchAllDocsQuery OR .Parse(":") on the query parser) the documents come back in the order they went in every time. By adding a search term ("test" in this case), the document boost is apparent and the documents are sorted according to the boost. I can change the boost levels around and the new order is reflected in the results. All my code is pretty standard fair, I'm using a default Sort() is both cases.
I found that this same bug was reported and fixed in Lucene back in 2005-2006, and I checked my MatchAllDocsQuery.cs file (Lucene .NET 2.9.2) and it seems to have this change present, but the behavior is as described in the ticket above.
Any ideas what I might be doing wrong? Perhaps someone running the Java version has experienced this (or not)? Thanks.
Uh, don't I feel silly now. This is as-designed behavior. I guess. According to Lucene in Action, MatchAllDocsQuery uses a constant for the boost.

Resources