How can I query Elasticsearch to output the exact position of a searched keyword or sentence?

I indexed several documents into my Elasticsearch cluster and queried it using some keywords and sentences. The output from my query displayed the entire documents where the sentences or keywords were found.
I want the query to display just the paragraph where the sentence or keyword was found, and also show the page number it was found on.

You can use the highlighting functionality together with source filtering, so the response shows only the field that matched and hides the remaining fields.
You can set _source to false so it returns only the highlighted field. If you want to search on one field and highlight on a different field, you can set require_field_match to false. Please refer to the Elasticsearch highlighting documentation for more details.
GET /_search
{
  "_source": false,
  "query": {
    "match": { "content": "kimchy" }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "content": {}
    }
  }
}
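Note that highlighting returns matching fragments, but Elasticsearch has no built-in notion of a page number; that information has to be present in the index. A minimal sketch of one common workaround, assuming you split each source document into one Elasticsearch document per page (the my_docs index and page_number field are hypothetical names, not part of the original question):

PUT /my_docs/_doc/1
{
  "page_number": 12,
  "content": "text of page 12 of the source document"
}

GET /my_docs/_search
{
  "_source": ["page_number"],
  "query": {
    "match": { "content": "kimchy" }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

Each hit then carries the page it came from in _source.page_number, and the highlight section contains only the fragments (roughly, the paragraphs) where the term matched.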

Related

Retrieve distinct values for search as you type in Elasticsearch

We have a field title and the type is search_as_you_type,
{
  "mappings": {
    "properties": {
      "title": {
        "type": "search_as_you_type"
      }
    }
  }
}
and when we are searching
{
  "query": {
    "match_phrase_prefix": {
      "title": "red"
    }
  }
}
we are getting duplicate results:
red car
red icecream
red car
This is because we have documents with the same title values.
Is there a way to indicate that results must have distinct values?
You can check whether a terms aggregation on your title field works or not in the case of search_as_you_type by following the example given in this SO answer. You can also check this blog, which explains how to get unique values from Elasticsearch.
Also, make sure the documents appearing in your results really are the same document, and not different documents that happen to have the same values.
Edit: as discussed in the comments, the completion suggester turned out to be more useful in this case, as it deals with duplicates, and it solved the issue.
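For reference, a minimal sketch of the completion-suggester approach mentioned in the edit, assuming a hypothetical index named products and Elasticsearch 6.1+ (where the skip_duplicates option is available):

PUT /products
{
  "mappings": {
    "properties": {
      "title_suggest": {
        "type": "completion"
      }
    }
  }
}

POST /products/_search
{
  "suggest": {
    "title_suggestions": {
      "prefix": "red",
      "completion": {
        "field": "title_suggest",
        "skip_duplicates": true
      }
    }
  }
}

With skip_duplicates set to true, "red car" is returned only once even when several documents share that title.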

How to boost Elasticsearch results based on another field?

Kinda simple use case, but I cannot come up with a good solution.
Basically I have two indexed fields: content and keywords (keyword tokenizer), where content is a long text field and keywords contains important terms within that content. When I query with some long text, I have to boost those results based on the keywords present in the matching document.
I tried querying the complete text on both the content and keywords fields, but it is too slow, or it throws a too_many_clauses error for text with more than 40 words.
{"query": {
"match": {
"keywords": {
"query": "some long text",
"analyzer": "custom_analyzer"
}
}
}}
Is there any better way? Would percolator work here?
I can relate this to my application, which is similar to Stack Overflow: it consists of questions and answers, and each question has a subject, body, tags, etc.
The subject here corresponds to your keywords field and the body to your content field. Normally the subject contains the important keywords about the post, which is also the case for you.
Now, coming to the solution part:
We solve it by querying both the subject and body indexed fields, but boosting subject by a factor of 15, which is configurable.
ES query which we use:
{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": [ "subject^15", "message" ]
    }
  }
}
This Elasticsearch doc also has a similar example, where a subject field is boosted by a factor of 3 in a multi_match query.
Let me know if you have any questions.

ElasticSearch: Using match_phrase for all fields

As a user of ElasticSearch 5, I have been using something like this to search for a given phrase in all fields:
GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "_all": "this is a phrase"
    }
  }
}
Now, the _all field is going away, and match_phrase does not seem to work like query_string, where you can simply use something like this to run a search for all fields:
"query": {
"query_string": {
"query": "word"
}
}
What is the alternative for an exact phrase search across all fields without using the _all field, starting with version 6.0?
I have many fields per document so specifying all of them in the query is not really a solution for me.
You can find the answer in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
It says:
Use a custom field and the mapping copy_to parameter
So you have to create a custom field in the mapping and copy all other fields into it with copy_to.
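A minimal sketch of that approach, assuming two example fields (title and body) and a custom catch-all field named all_fields (all hypothetical names, not from the original question):

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "all_fields"
      },
      "body": {
        "type": "text",
        "copy_to": "all_fields"
      },
      "all_fields": {
        "type": "text"
      }
    }
  }
}

GET /my_index/_search
{
  "query": {
    "match_phrase": {
      "all_fields": "this is a phrase"
    }
  }
}

Every field that should participate in the phrase search needs its own copy_to entry, so new fields have to be added to the mapping explicitly.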

Elasticsearch 6.2: terms query require lowercase input when searching on keyword

I've created an example index, with the following mapping:
{
  "_doc": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "status": { "type": "keyword" }
    }
  }
}
And indexed a document:
{"status": "CMP"}
When searching the documents with this status with a terms query, I find no results:
{
  "query": {
    "terms": { "status": ["CMP"] }
  }
}
However, if I make the same query by putting the input in lowercase, I will find my document:
{
  "query": {
    "terms": { "status": ["cmp"] }
  }
}
Why is that? Since I'm searching on a keyword field, the indexed content should not be analyzed and should match the uppercase value...
@Oliver Charlesworth No - in Elastic 6.x you can continue to use a keyword datatype while lowercasing your text with a normalizer (see the docs). However, in either case you have to change your index mapping and reindex your docs.
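A minimal sketch of the normalizer approach mentioned above, assuming a hypothetical index named my_index and the 6.x mapping syntax used elsewhere in this question; the normalizer lowercases the value both at index time and when term-level queries are parsed:

PUT /my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "status": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}

After reindexing, a terms query for either "CMP" or "cmp" would match, because the normalizer is applied to the query terms as well as to the indexed values.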
The index and mapping creation and the search were part of a test suite. It seems that the setup part of the test suite was not executed, and the mapping was not applied to the index.
The index was then using dynamically mapped default types instead of the explicit mapping, resulting in the use of text fields instead of keyword fields.
After changing the setup method of the automated tests, the mappings are well applied to the index, and the uppercase values for the status "CMP" are now matching documents.
The symptoms you're seeing shouldn't occur unless something else is wrong.
A keyword field is not analysed, so your index should contain only CMP. A terms query is also not analysed, so your index is searched only for CMP. Hence there should be a match.
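One way to verify what actually got indexed is the _analyze API, pointed at the field so that it uses whatever analyzer or normalizer the mapping really assigned (my_index is a hypothetical name):

GET /my_index/_analyze
{
  "field": "status",
  "text": "CMP"
}

If the explicit mapping was applied, status is a keyword field and the response contains the single token CMP; if dynamic mapping kicked in instead, status is a text field and the standard analyzer returns the lowercased token cmp, which is exactly the symptom described above.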

In Elasticsearch match query how to deal with slash

I have a match query searching for a type of doc:
{
  "query": {
    "bool": {
      "should": {
        "match": {
          "ph1_enc": "EAAQnb1kMr/e2/ADqo"
        }
      }
    }
  }
}
"EAAQnb1kMr/e2/ADqo" is the string i'm trying to match, however in the search results I can see multiple records with substring "/e2/" are also returned.
Looks like "/e2/" is indexed separately, so that this could happen.I thought the match query is to do full-text match... Is it because I missed something when creating the template? Any idea?
Add-on instead of reindex, how to modify the query to match the exact value in the query?
Which analyzer did you set in the mapping to index your data?
If you are using the default one (the standard analyzer), then according to the documentation it uses the standard tokenizer, which also splits the text on slashes ('/'). The documentation links here for more information about the tokenizer.
So it will index the words 'EAAQnb1kMr', 'e2', and 'ADqo'. Your query value is analyzed the same way the field was indexed, which is why documents containing 'e2' are also returned.
If you don't need to tokenize the 'ph1_enc' field, you can just set its type in the mapping to 'keyword'.
"properties": {
"ph1_enc": {
"type": "keyword"
}
}
That will not analyze the field, and it will match exactly when you query.
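Regarding the addendum: there is no query-only way to get a true exact match against an already-analyzed text field. Without reindexing, the closest approximation (a sketch, not a full fix) is a match query with "operator": "and", which at least requires all three tokens to be present instead of any one of them:

{
  "query": {
    "match": {
      "ph1_enc": {
        "query": "EAAQnb1kMr/e2/ADqo",
        "operator": "and"
      }
    }
  }
}

This still does not enforce token order or adjacency; match_phrase would additionally enforce those, but an exact whole-value match really needs the keyword mapping above.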
I hope that it helps.