More like this query with ElasticsearchTemplate - elasticsearch

In a "More like this" query we can specify not only documents but also arbitrary text:
{
"more_like_this" : {
"fields" : ["title", "description"],
"like" : [
{
"_index" : "imdb",
"_type" : "movies",
"_id" : "1"
},
"and potentially some more text here as well"
],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
Is it possible, using Spring Data and ElasticsearchTemplate, to pass both a set of documents and a chunk of text to the query, as in the example above? Reading https://docs.spring.io/spring-data/elasticsearch/docs/current/api/ it seems that the MoreLikeThisQuery class does not support that. Am I right? If so, is there any other way to build that query?
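Whatever the client, what finally has to reach Elasticsearch is the JSON body above. As a client-agnostic sketch (plain Python, no Spring Data API involved), the hypothetical helper `build_mlt_query` below assembles a `like` array that mixes document references and free text, wrapped in a top-level `query` so it is a complete search body. How you then send such a raw body from Spring Data (for example through a string-based query, if your version offers one) is an assumption to check against your version's API.

```python
import json


def build_mlt_query(fields, docs, texts, min_term_freq=1, max_query_terms=12):
    """Hypothetical helper: build a search body whose more_like_this
    `like` array mixes document references and free-text strings."""
    like = []
    for doc in docs:
        # Document references are objects with _index / _type / _id;
        # keys set to None are dropped (e.g. _type on 7.x indices).
        like.append({key: value for key, value in doc.items() if value is not None})
    # Plain strings can sit in the same array as document references.
    like.extend(texts)
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


body = build_mlt_query(
    fields=["title", "description"],
    docs=[{"_index": "imdb", "_type": "movies", "_id": "1"}],
    texts=["and potentially some more text here as well"],
)
print(json.dumps(body, indent=2))
```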


How to make Elastic Engine understand a field is not to be analyzed for an exact match?

This question follows on from my previous post, where exact search did not work with either Match or MatchPhrasePrefix.
Then I found a similar post here in which the search field is set to not_analyzed in the mapping definition (answer by @Russ Cam).
But I am using the following packages, which might be why that solution did not work for me:
package id="Elasticsearch.Net" version="7.6.0" targetFramework="net461"
package id="NEST" version="7.6.0" targetFramework="net461"
If I pass "SOME", it matches both "SOME" and "SOME OTHER LOAN", which should not happen (see "product value" in my earlier post).
How can I do the same using NEST 7.6.0?
I don't know what your current mapping looks like, and I'm not familiar with NEST, but I will explain
How to make Elastic Engine understand a field is not to be analyzed for an exact match?
with an example using the Elasticsearch DSL.
For an exact (case-sensitive) match, all you need to do is define the field type as keyword. A keyword field is indexed as-is, without applying any analyzer, which makes it ideal for exact matching.
PUT test
{
"mappings": {
"properties": {
"field1": {
"type": "keyword"
}
}
}
}
Now let's index some documents:
POST test/_doc/1
{
"field1":"SOME"
}
POST test/_doc/2
{
"field1": "SOME OTHER LOAN"
}
For exact matching we can use a term query. Let's search for "SOME"; we should get only document 1.
GET test/_search
{
"query": {
"term": {
"field1": "SOME"
}
}
}
Output:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"field1" : "SOME"
}
}
]
}
}
So the crux is to map the field as keyword and use a term query.
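To see why "SOME" also matched "SOME OTHER LOAN" on an analyzed field, here is a deliberately simplified Python model (an approximation, not Lucene's actual code). A text field is assumed to be tokenized by splitting on non-alphanumeric characters and lowercasing, roughly what the standard analyzer does for plain ASCII text, while a keyword field stores the whole value as one term. A match query (default OR operator) hits if any query token is among the field's terms; a term query on a keyword field hits only on an identical value.

```python
import re


def standard_like_terms(value):
    """Rough stand-in for the standard analyzer: split on
    non-alphanumeric characters and lowercase each token."""
    return {token.lower() for token in re.split(r"[^0-9A-Za-z]+", value) if token}


def keyword_terms(value):
    """A keyword field indexes the whole value, unchanged, as one term."""
    return {value}


def match_hits(docs, query, analyze):
    # Simplified match query: a document hits if any analyzed query
    # token is among the document's indexed terms.
    query_terms = analyze(query)
    return [doc_id for doc_id, value in docs.items() if analyze(value) & query_terms]


def term_hits_on_keyword(docs, value):
    # A term query is not analyzed; on a keyword field it needs an exact match.
    return [doc_id for doc_id, stored in docs.items() if value in keyword_terms(stored)]


docs = {"1": "SOME", "2": "SOME OTHER LOAN"}
print(match_hits(docs, "SOME", standard_like_terms))  # ['1', '2']
print(term_hits_on_keyword(docs, "SOME"))             # ['1']
print(term_hits_on_keyword(docs, "some"))             # [] (case-sensitive)
```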

Get from ElasticSearch why a result is a hit

In the Elasticsearch query below, I search for the word Balances in two fields, name and notes:
GET /_search
{ "query": {
"multi_match": { "query": "Balances",
"fields": ["name","notes"]
}
}
}
Here is the result, which matched on the name field:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.673515,
"hits" : [
{
"_index" : "idx",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.673515,
"_source" : {
"name" : "Deposits checking accounts balances",
"notes" : "These are the notes",
"@timestamp" : "2019-04-18T21:05:00.387Z",
"id" : 25,
"@version" : "1"
}
}
]
}
Now I want to know in which field Elasticsearch found the value. I could inspect the result and check whether the searched text is in name or notes, but that does not work for a fuzzy search.
Can Elasticsearch tell me in which field the text was found, and also provide a snippet with five words to the left and right of the match, to show the user why the result is a hit?
What I want is similar to the way Google highlights the matched text in bold within a phrase.
I think the two solutions in "Find out which fields matched in a multi match query" are still valid:
Use highlighting to find it.
Split the query into multiple named match queries.
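Both options can be sketched as request bodies built in Python. The field names and search text come from the question; the tag choice, fragment size and the `_match` naming scheme are illustrative assumptions. Note that Elasticsearch's `fragment_size` counts characters, not words, so "five words on each side" can only be approximated, or cut client-side from the returned fragment.

```python
import json

# Field names and search text taken from the question.
FIELDS = ["name", "notes"]
TEXT = "Balances"

# Option 1: highlighting. Each hit gets a "highlight" object keyed by the
# fields that actually matched, with the matched terms wrapped in tags.
highlight_body = {
    "query": {"multi_match": {"query": TEXT, "fields": FIELDS}},
    "highlight": {
        "pre_tags": ["<b>"],
        "post_tags": ["</b>"],
        # fragment_size is measured in characters, not words.
        "fields": {field: {"fragment_size": 60, "number_of_fragments": 1} for field in FIELDS},
    },
}

# Option 2: one named match query per field inside bool/should. Each hit
# then lists the names of the clauses it satisfied in "matched_queries".
named_body = {
    "query": {
        "bool": {
            "should": [
                {"match": {field: {"query": TEXT, "_name": f"{field}_match"}}}
                for field in FIELDS
            ]
        }
    }
}


def matched_fields(hit):
    """Return the fields a hit matched on: highlight keys if present,
    otherwise the names from matched_queries minus the _match suffix."""
    if "highlight" in hit:
        return sorted(hit["highlight"])
    suffix = "_match"
    return sorted(
        name[: -len(suffix)] if name.endswith(suffix) else name
        for name in hit.get("matched_queries", [])
    )


print(json.dumps(named_body, indent=2))
```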

How do you bulk index documents into the default mapping of ElasticSearch?

The documentation for ElasticSearch 5.5 offers no examples of how to use the bulk operation to index documents into the default mapping of an index. It also gives no indication why this is not possible, unless I'm missing that somewhere else in the documentation.
The ES 5.5 documentation gives one explicit example of bulk indexing:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
But it also says that
The endpoints are /_bulk, /{index}/_bulk, and {index}/{type}/_bulk.
When the index or the index/type are provided, they will be used by
default on bulk items that don’t provide them explicitly.
So the middle endpoint is valid, which implies to me either a) that you have to explicitly provide a type in the metadata for each indexed document, or b) that you can index documents into the default mapping ("_default_").
But I can't get this to work.
I've tried the /myindex/bulk endpoint with no type specified in the metadata.
I've tried it with "_type": "_default_" specified.
I've tried /myindex/_default_/bulk.
This has nothing to do with the _default_ mapping; it is about falling back to the default index and type that you specify in the URL. You can do the following:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
However, the following snippet is exactly equivalent:
POST /test/type1/_bulk
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
And you can mix the two:
POST foo/bar/_bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
In this example, one document would be indexed into foo and one into test.
Hope this makes sense.
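The fallback rule in that answer can be modelled in a few lines. The hypothetical `resolve_bulk_targets` below is not an Elasticsearch API; it only mirrors the behaviour described above: `_index`/`_type` in an action line win, and the values from the URL fill in whatever is missing. It assumes every action is followed by a source line (true for index and create, not for delete) and uses the mixed example above.

```python
import json


def resolve_bulk_targets(ndjson_body, url_index=None, url_type=None):
    """Model of how a _bulk request picks each item's index and type:
    metadata in the action line wins, the URL supplies the defaults."""
    lines = [line for line in ndjson_body.strip().splitlines() if line.strip()]
    targets = []
    # Assumes action and source lines alternate (index/create operations).
    for action_line in lines[0::2]:
        action = json.loads(action_line)
        (op, meta), = action.items()
        targets.append((
            op,
            meta.get("_index", url_index),
            meta.get("_type", url_type),
            meta.get("_id"),
        ))
    return targets


body = """
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
"""
for target in resolve_bulk_targets(body, url_index="foo", url_type="bar"):
    print(target)
# ('index', 'test', 'type1', '1')
# ('index', 'foo', 'bar', '1')
```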

Elasticsearch: more_like_this query returns no hits

I am trying to find documents similar to one document in my Elasticsearch sandbox (the document with id '4' in this case), based on a single field (the 'town' field in this case).
So I wrote this query, which returns no hits:
GET _search
{
"query": {
"more_like_this" : {
"fields" : ["town"],
"docs" : [
{
"_index" : "app",
"_type" : "house",
"_id" : "4"
}
],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
In my dataset, document #4 is located in a town named 'Paris'. When I run the following query, document #4 appears in the hits along with many other results:
GET _search
{
"query": {
"match": { "town": "Paris" }
}
}
I don't understand why the 'more_like_this' query does not return results whereas there are other documents that have a field with the same value.
I should add that I checked the _index, _type and _id values using a '"match_all": {}' query.
My query follows the second example in this official Elasticsearch resource: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/query-dsl-mlt-query.html
What's wrong with my 'more_like_this' query ?
I am assuming you have only a small number of documents.
In that case, set min_doc_freq to 1 (it defaults to 5, so terms that appear in fewer than five documents are ignored) and try again.
Also, use POST for the search:
POST _search
{
"query": {
"more_like_this" : {
"fields" : ["town"],
"docs" : [
{
"_index" : "app",
"_type" : "house",
"_id" : "4"
}
],
"min_term_freq" : 1,
"max_query_terms" : 12,
"min_doc_freq" : 1
}
}
}
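To make the effect of these thresholds concrete, here is a simplified Python model of how more_like_this picks terms from the source document: a term is kept only if it occurs at least `min_term_freq` times in the source document and in at least `min_doc_freq` documents, and at most `max_query_terms` of the highest-weighted terms survive. The defaults mirror the documented Elasticsearch defaults (2, 5 and 25). The real implementation uses Lucene's own weighting and further limits such as word length, so the idf formula here is a simplification, and the frequencies below are made-up illustration values, not data from the question.

```python
import math


def select_mlt_terms(term_freqs, doc_freqs, num_docs,
                     min_term_freq=2, min_doc_freq=5, max_query_terms=25):
    """Simplified model of more_like_this term selection.

    term_freqs: term -> count in the source document's field
    doc_freqs:  term -> number of documents containing the term
    """
    candidates = []
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        if df == 0 or tf < min_term_freq or df < min_doc_freq:
            continue  # dropped before the query is even built
        # Rarer terms get a higher weight (simplified idf formula).
        idf = 1.0 + math.log(num_docs / df)
        candidates.append((tf * idf, term))
    candidates.sort(reverse=True)
    return [term for _, term in candidates[:max_query_terms]]


# Illustrative numbers only: "paris" occurs once in the source document's
# town field and appears in 3 documents of a 10-document index.
tf = {"paris": 1}
df = {"paris": 3}
print(select_mlt_terms(tf, df, num_docs=10, min_term_freq=1, max_query_terms=12))
# [] because 3 < the default min_doc_freq of 5
print(select_mlt_terms(tf, df, num_docs=10, min_term_freq=1, min_doc_freq=1, max_query_terms=12))
# ['paris']
```

With no surviving terms there is nothing to search for, which is consistent with an empty result rather than an error.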

Elasticsearch: Nested query under a boolean 'should' not returning results

I'm running the following query (it has been shortened for clarity):
body : {
query : {
bool : {
must : [
{
match : {
active : 1
}
},
],
should : [
{
term : {
apply : '2'
}
},
{
nested : {
path : 'items',
query : {
terms : {
'items.product' : ["1","2"]
}
}
}
}
],
minimum_should_match : 1
}
}
}
};
When I run this query, I don't get back the documents that match the nested query in the should clause; I only get documents matching the first condition. What am I doing wrong? Why can't the terms query test the field against an array of input values and return results?
When I change the nested query to a match_all or match the items.product field to an exact value, I do get results.
Replacing the current nested query with the following (while everything else stays the same) gives me no results either:
nested : {
path : 'items',
query : {
bool : {
must : [
{
terms : {
'items.product' : ["1","2"],
minimum_should_match : 1
}
},
]
}
}
}
Any help would be greatly appreciated - this has been driving me crazy for a couple days now!
EDITED to include discussion of the index mapping
Given that the terms condition expects a non-analyzed field (per the documentation here), I would recommend you verify that your index has a mapping that specifically makes it so. For instance:
{"mappings" : {
"your_doc_type" : {
"properties" : {
"items" : {
"type" : "nested",
"properties" : {
"product" : {"type" : "string", "index" : "not_analyzed"},
...
... Other properties of the nested object
...
}
},
...
... Mappings for the other fields in your document type
...
}
}
}
}
That should enable the terms to do what they are supposed to do when checking items.product.
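The `string` type with `"index": "not_analyzed"` belongs to Elasticsearch 2.x and earlier; from 5.0 on, the equivalent is the `keyword` type. As a sketch assuming a 7.x cluster (no mapping-type level), the same nested mapping could be built in Python like this. Only the `items.product` part comes from the answer; your other fields would still need to be added.

```python
import json

# Assumes Elasticsearch 7.x: no mapping type level, and keyword instead
# of "string" + "not_analyzed".
mapping = {
    "mappings": {
        "properties": {
            "items": {
                "type": "nested",
                "properties": {
                    # keyword fields are indexed verbatim, which is what a
                    # terms query on items.product needs.
                    "product": {"type": "keyword"},
                },
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```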
My earlier suspicion was that there is something else in your query (min_score perhaps) that is filtering out results based upon score, and that threshold is weeding out the documents that match the items.product condition but not the apply condition due to the underlying Lucene scoring model. In other words, if all other things are equal for documents meeting only one item of the should query, the ones that meet the "apply":"2" condition will score higher than the documents for which items.product is 1 or 2. This was my empirical observation querying a trivially small test set of data with your query.
Test data set:
{"active":1, "apply":"2", "items" : [{"product": "3"}]}
{"active":0, "apply":"2", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "3"}]}
{"active":1, "apply":"3", "items" : [{"product": "1"}]}
{"active":1, "apply":"3", "items" : [{"product": "2"}]}
Based on the conditions in your query, we should see three documents returned - the first, fourth, and fifth documents.
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1rIJ0nixSnh_cG",
"_score" : 0.731233,
"_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cK",
"_score" : 0.4601705,
"_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cJ",
"_score" : 0.35959372,
"_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]
The expected documents came back, but you can see that the first document (for which apply is 2, meeting the first criterion of the should query) scored much higher.
If your intent is for these conditions to not affect the scoring of the documents but to use them instead as simple inclusion/exclusion criteria, you may want to switch to a filtered query. Something like:
{
"query" : {"filtered" : {
"query" : {"match_all" : {}},
"filter" : {"bool" : {
"must" : [
{"term" : {"active" : 1}}
],
"should" : [
{"term" : {"apply" : "2"}},
{"nested" : {
"path": "items",
"query" : {
"terms" : {"items.product" : ["1", "2"]}
}
}}
]
}}
}}
}
Since these conditions are now specified as a filter, they should not affect the scoring of the returned documents; they only determine whether a document qualifies for the result set at all (the scores are calculated independently of the conditions above). Using this filtered query, the results from my small test data set are:
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1rIJ0nixSnh_cG",
"_score" : 1.0,
"_source":{"active":1, "apply":"2", "items" : [{"product": "3"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cK",
"_score" : 1.0,
"_source":{"active":1, "apply":"3", "items" : [{"product": "2"}]}
}, {
"_index" : "test",
"_type" : "test",
"_id" : "AUtrND1sIJ0nixSnh_cJ",
"_score" : 1.0,
"_source":{"active":1, "apply":"3", "items" : [{"product": "1"}]}
} ]
The scores are now identical for all returned documents, without regard for which part of the should was satisfied.
Note that the query property above is match_all - if you had other conditions in your query that are not represented in the original question, then you would need to modify this accordingly.
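The `filtered` query was deprecated in Elasticsearch 2.0 and removed in 5.0. Assuming a 5.x or later cluster, the same logic can be written as a `bool` query whose conditions sit in a `filter` clause, with `match_all` under `must` standing in for the original `match_all` query. The sketch below builds it in Python; the helper name is hypothetical. `minimum_should_match` is set explicitly because its implicit default for `should` clauses alongside a `must` clause has changed between versions.

```python
import json


def build_filter_only_query(active, apply_value, products):
    """Hypothetical helper: bool/filter equivalent of the filtered query
    above (ES 5+). Clauses under "filter" do not contribute to scoring."""
    return {
        "query": {
            "bool": {
                # Mirrors the match_all of the filtered query; replace it
                # with your own scoring query if you have one.
                "must": [{"match_all": {}}],
                "filter": {
                    "bool": {
                        "must": [{"term": {"active": active}}],
                        "should": [
                            {"term": {"apply": apply_value}},
                            {
                                "nested": {
                                    "path": "items",
                                    "query": {"terms": {"items.product": products}},
                                }
                            },
                        ],
                        # Require at least one should clause explicitly.
                        "minimum_should_match": 1,
                    }
                },
            }
        }
    }


print(json.dumps(build_filter_only_query(1, "2", ["1", "2"]), indent=2))
```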
