In elasticsearch, is there a way to show which field in a document was the "hit"?

When searching some documents using elasticsearch, I'd like to see which field in the document was the "hit" that flagged it up as a search result. Is there a native way to do this, or do I need to do it in the search client?
E.g:
GET /events/_search?q=nottingham
gives me:
{ // elided
  "hits": [
    {
      "id": 1,
      "name": "Some name",
      "nicknames": ["Nottingham"]
    }
  ]
}
It's obvious from this example that the nickname matched, but can I get Elasticsearch to flag that for me?

Elasticsearch can find and highlight terms from your query in the result fields. See http://www.elasticsearch.org/guide/reference/api/search/highlighting.html for more information. Technically speaking, it's not the same as flagging fields that caused the "hit", but for most practical purposes, it's as useful.
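A minimal sketch against the events index from the question (the query_string query mirrors the ?q= parameter; highlighting is requested for all fields via a wildcard):

GET /events/_search
{
  "query": {
    "query_string": { "query": "nottingham" }
  },
  "highlight": {
    "fields": { "*": {} }
  }
}

Each hit then carries a highlight object keyed by the fields that matched, e.g. "highlight": {"nicknames": ["<em>Nottingham</em>"]}, which effectively flags the matching field.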

Related

Basic elasticsearch autocomplete

I am trying to set up autocomplete for my elasticsearch cluster. The field I want to use is a text field with journal titles. I tried to use the 'standard' completion suggester field type in elasticsearch, but it used too much memory so I had to disable it.
In the meantime I would like to get something basic working, such that someone typing "science" would get a list of suggestions like "science in religion", "science experiments". Then when they type "science in" they would get "science in religion".
I guess this is just a match_phrase query, and I can limit it to the top 10 results? Or is there a way to do term frequency across the index?
You can experiment with match_phrase, match_phrase_prefix, prefix over a keyword field, and edge n-grams as well. Each of these works well for different use cases.
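For instance, a minimal match_phrase_prefix sketch (the journals index and title field are assumptions, not from the question):

GET /journals/_search
{
  "size": 10,
  "query": {
    "match_phrase_prefix": {
      "title": {
        "query": "science in",
        "max_expansions": 10
      }
    }
  }
}

The last term in the query is treated as a prefix, so typing "science in" would match titles such as "science in religion".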

ElasticSearch autocomplete/suggest by token

I want to create search suggestions based on the tokens (and not full documents) that are present in my index.
For example:
I have a simple index for movies in which I have these two documents:
{"name":"Captain America"}
{"name":"American Made"}
If I type "ame" then I should get two suggestions (as tokens)
america
american
Similarly if I type "cap" then I should get "captain" and not "Captain America"
I am having the exact same problem as this post:
https://discuss.elastic.co/t/elasticsearch-autocomplete-suggest-by-token/18392
I have gone through all types of suggesters, and it seems like they are focused on returning whole documents rather than tokens.
Apache Solr serves this requirement through its autosuggest functionality:
For example, if I type "kni" then Solr would return knives, knife and knit as suggestions (based on the tokens coming from the indexed documents):
{
  "responseHeader": {
    "status": 0,
    "QTime": 19
  },
  "spellcheck": {
    "suggestions": [
      "kni", {
        "numFound": 3,
        "startOffset": 0,
        "endOffset": 3,
        "suggestion": ["knives", "knife", "knit"]
      }
    ],
    "collations": [
      "collation", "knives"
    ]
  }
}
One probable solution is mentioned in this StackOverflow thread:
Elasticsearch autocomplete or autosuggest by token
But it relies on explicitly adding all the suggestions in every document. This seems to be a tedious approach.
Please let me know if this can be achieved somehow in a better way.
Thanks in advance.
It won't return a part like "America" when you search for "ame", because the value is stored as "Captain America": you get back the original text that was stored. To get just "America", you would need to store it on its own.
In your case the field name has the value "Captain America". If you apply the text field type to it, Elasticsearch creates tokens for you, such as captain and america. These tokens are created at indexing time to help you with search and auto-suggest, but as the response of a search or auto-suggest you will get the original text.
An alternative is to highlight the matching term, or the matching part of the term, in the original text returned by the auto-suggest.
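If you really want the tokens themselves rather than documents, one possible workaround is a terms aggregation with an include regex over the analyzed tokens. This is only a sketch: it assumes fielddata is enabled on the name text field (which has a real memory cost), or that the tokens are indexed into a separate keyword field:

GET /movies/_search
{
  "size": 0,
  "aggs": {
    "token_suggestions": {
      "terms": {
        "field": "name",
        "include": "ame.*",
        "size": 10
      }
    }
  }
}

With the standard analyzer this would return the lowercased tokens america and american as bucket keys.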

How to query for alternative spellings and representations of words in elasticsearch?

I'm using elasticsearch to query on the theme field in documents. For example:
[
{ theme: 'landcover' },
{ theme: 'land cover' },
{ theme: 'land-cover' },
etc
]
I would like to specify a search of the term landcover that matches all these documents. How do I do this?
So far I've tried using the fuzziness operator in a match search, and also a fuzzy query. However neither of these approaches seems to work, which surprised me because my understanding of fuzzy searches is that they would provide a means of inexact matching.
What am I missing? From the docs I see that fuzziness definitely looks for close approximations to a search term:
When querying text or keyword fields, fuzziness is interpreted as a Levenshtein Edit Distance — the number of one character changes that need to be made to one string to make it the same as another string.
I would consider 'landcover' and 'land cover' to be close. Is this not the case? (This is the first I have heard of Levenshtein edit distance, so I don't know what extra or missing characters mean for this measurement; naively, turning 'landcover' into 'land cover' is a single inserted space, so distance 1.)
An example of a match query where this doesn't seem to work:
{
query: {
match: {
'theme': {
query: 'landcover',
fuzziness: 'AUTO' // I've tried 2, '2', 6, '6', etc.
},
},
},
}
// When the term is 'land-cover' and fuzziness is auto, then 'land cover' is matched. But 'landcover' is not
And an example of a 'fuzzy' query that doesn't seem to work:
{
query: {
fuzzy: {
'theme': {
value: query,
fuzziness: 'AUTO', // Tried other values
},
},
},
}
// When the term is 'land-cover' and fuzziness is auto, then 'landcover' is matched. But 'land cover' is not. So works almost opposite to the match query in this regard
(NOTE - these queries are converted to JSON and do run and return sensible results, just the fuzziness doesn't seem to work as I would have expected)
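One way to see what the match query is actually comparing (a sketch; my-index stands in for whatever the real index is called) is the _analyze API, which shows the tokens a given field's analyzer produces:

GET /my-index/_analyze
{
  "field": "theme",
  "text": "land-cover"
}

With the standard analyzer this returns the tokens land and cover. A match query analyzes its input the same way, while a fuzzy query is term-level and skips analysis, which goes a long way towards explaining the asymmetry between the two queries above.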
Looking around StackOverflow, I see some questions that seem to indicate that querying an index is in some way related to how the index was created, i.e. that I cannot just run ad hoc queries on any existing index and expect results. Is this correct? (Sorry, I'm new to Elasticsearch and I'm querying an index that already exists.)
This answer seems related (how to find near matches for a search term): https://stackoverflow.com/a/55772800/3114742 - it mentions that I should do something referred to as 'field mapping' prior to indexing data, but then the example query doesn't include the fuzziness operator. So in this case I'm confused as to what the fuzziness operator is actually for.
Looking more into the documentation I've found the following:
Elasticsearch uses the concept of an 'index' rather than a database, but from the perspective of someone familiar with CouchDB and MongoDB, which are both JSON stores, there is definitely some similarity between a CouchDB database and an Elasticsearch index, although the Elasticsearch index is not authoritative data storage in itself (it's 'built' from a source of data).
For a given index called, for example, my-index. you can insert JSON strings (documents) into my-index by PUTting to Elasticsearch:
PUT /... '{... json string ...}'
The JSON string can come directly from a JSON store (Mongo, Couch, etc.) or be cobbled together from a variety of sources. I guess.
Elasticsearch will process the document on insert and append to the inverted index. For text fields this means K:V pairs are created from the document's text, with the keys being fragments (tokens) of the text and the values being references to where each fragment occurs in the source (the JSON document).
In other words, when documents are inserted into an Elasticsearch index, their content is 'analyzed' to create the K:V pairs that are added to the index.
I guess, then, that searching Elasticsearch means looking up the search terms among the keys of the index and returning the source documents where a search term is present for a particular field.
So:
Text is analyzed on insertion to an index
Queries are analyzed (using the same analyzer that was used to create the index)
So in my case (as mentioned above) the default analyzer is good enough to create indices that allow for basic fuzzy matching (i.e. in the match query, "land-cover" is matched to "land cover", and in the fuzzy query, "land-cover" is matched to "landcover"). The _analyze output above suggests why these match differently: the match query analyzes its input, so "land-cover" becomes the tokens land and cover, which match "land cover" exactly, while the fuzzy query does not analyze its input, so the whole string "land-cover" is compared against single indexed tokens and lands within edit distance 1 of "landcover".
But to improve the search results, I think I need to adjust the analyzer/tokenizer both when inserting documents into an index and when parsing queries against it.
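A minimal sketch of that idea (the index name themes, the analyzer name, and the subfield name are all hypothetical): a pattern_replace character filter strips spaces and hyphens before a keyword tokenizer, so 'landcover', 'land cover' and 'land-cover' are all indexed, and queried, as the single token landcover:

PUT /themes
{
  "settings": {
    "analysis": {
      "char_filter": {
        "squash_separators": {
          "type": "pattern_replace",
          "pattern": "[\\s-]",
          "replacement": ""
        }
      },
      "analyzer": {
        "joined": {
          "type": "custom",
          "char_filter": ["squash_separators"],
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "theme": {
        "type": "text",
        "fields": {
          "joined": { "type": "text", "analyzer": "joined" }
        }
      }
    }
  }
}

A match query on theme.joined then goes through the same analyzer at search time, so all three spellings find each other without any fuzziness at all.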
My understanding of analysis/tokenization is that this is the configuration by which inverted indexes are built from source documents, i.e. it defines what the keys of the inverted index will be. As far as I can tell there is no magic in searching the index: search terms have to match keys in the inverted index, otherwise there will be no results.
I'm still not entirely sure what fuzziness is doing in this context, though it appears to apply per analyzed token (with AUTO capping the edit distance at 2), which would explain why it cannot bridge 'landcover' to the separate tokens 'land' and 'cover'.
So in short, querying elasticsearch seems to require a 'holistic perspective' over both how source data is indexed, and how queries are designed.
As a disclaimer, though: I'm not exactly an authoritative source on this subject, with less than one day of Elasticsearch experience, so a better answer would still be appreciated!

Cannot use "OR" with "NOT _exists_" in Kibana 6.8.0 search bar

I am trying to create one query in the Kibana search bar to retrieve some specific documents.
The goal is to get the documents that either have the field "myDate" before 2019-10-08 or "myDate" does not exist.
I have documents that meet one or the other condition.
I started by creating this query:
myDate:<=2019-10-08 OR NOT _exists_:myDate
But no documents were returned.
Since it did not work, I tried some other ways I found online:
myDate:<=2019-10-08 OR NOT (_exists_:myDate)
myDate:<=2019-10-08 OR !(_exists_:myDate)
myDate:<=2019-10-08 OR NOT (myDate:*)
But still, no results.
When I use either part of the "OR" condition on its own, it works perfectly: I get either the documents that have myDate <= 2019-10-08 or the ones that do not have a "myDate" field filled.
But when I try with both conditions, I get no documents.
I have to find these documents using only the search bar, not an Elasticsearch REST query or Kibana filters.
Thank you for your help :)
The query below works. Use the Inspect button in Kibana to see which query is actually being fired, and make sure you are using the correct index pattern as well.
(myDate:<=2019-12-31) OR (NOT _exists_:myDate)
Take a look at the Query DSL documentation on boolean operators for a better understanding of the different use cases.
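For reference, the Query DSL that such a search-bar query translates to looks roughly like this (a sketch; my-index stands in for the real index pattern):

GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        { "range": { "myDate": { "lte": "2019-10-08" } } },
        { "bool": { "must_not": { "exists": { "field": "myDate" } } } }
      ],
      "minimum_should_match": 1
    }
  }
}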

Elasticsearch autocomplete and searching against multiple term fields

I'm integrating Elasticsearch into an asset-tracking application. When I set up the mapping initially, I envisioned the 'brand' field being a single-term field like 'Hitachi' or 'Ford'. Instead, I'm finding that the brand field in the actual data contains multiple terms like "MB 7 A/B", "B-7" or even "Brush Bull BB72X".
I now have an autocomplete component configured to autocomplete against an edgeNGram field and perform the actual search against an nGram field. It's completely useless the way I set it up, because users expect the search results to be restricted to what the autocomplete matches.
Any suggestions on the best way to set up my mapping to support autocomplete and subsequent searches against a multiple-term field like this? I'm considering a terms query against a keyword field, or possibly a match query with 'and' as the operator? I also have to deal with hyphens like "B-7".
You can use the phrase suggester. The suggesters guide is here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters.html
The phrase suggester guide is here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html
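A minimal request sketch (the assets index and brand field are assumptions based on the question; in practice the phrase suggester usually performs best over a shingled subfield):

POST /assets/_search
{
  "suggest": {
    "brand_suggest": {
      "text": "brush bull bb72",
      "phrase": {
        "field": "brand",
        "size": 5
      }
    }
  }
}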
