Search across a searchable field in Elasticsearch - elasticsearch

I'm looking for a way to search across a tokenized field in Elasticsearch so that, instead of returning the documents that match my search, it returns the unique set of field values that matched best.
{
"id": 1,
"brand": [
"word1",
"another"
]
},
{
"id": 2,
"brand": [
"word2",
"word3",
"yet_another"
]
}
So when searching for wo, I would receive a list of the words word1, word2 and word3, scored of course.
Should I create a new index for that with these values?
Is there a way I can do that work by reusing the tokenization of my index?

Related

Elasticsearch filtering with input array where

Our requirement is to filter objects on an array field by passing an input array to Elasticsearch. An object matches when its mentions array is any combination of elements from the input array.
Small example
data:[
{"name": "xxxx", "mentions": ["X", "Y"]},
{"name": "yyyy", "mentions": ["K", "L", "M"]},
{"name": "zzz", "mentions": ["X", "L"]},
]
Input: [X, Y, K, L]
Output:[
{"name": "xxxx", "mentions": ["X", "Y"]},
{"name": "zzz", "mentions": ["X", "L"]}
]
Objects must be filtered on the mentions field: every member of the mentions array must be in the given input array. If any member is not, the object is ignored.
Neither a terms query nor a bool query with must clauses solves our problem.
A very simplistic solution is to use a regular expression in a regexp query.
Your query would look like this (note the must_not clause and the negated character class):
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "mentions": "[^XYKL]"
          }
        }
      ]
    }
  }
}
Square brackets [...] match any one of the characters listed inside them.
What I've done is simply put the negation character ^ inside the brackets and wrap that regex in the must_not clause of a bool query, which should give you what you are looking for.
The query returns only documents whose mentions values are all among X, Y, K and L. A document containing any other value is not returned. Because a regexp query has to match the entire term, this pattern only covers single-character values like the ones in your example.
Note that I'm assuming the field mentions is of type keyword.
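To make the logic of that query easier to check, here is a minimal Python sketch that emulates it on the sample data in memory. It does not talk to Elasticsearch; the helper name filter_by_allowed is made up for illustration, and re.fullmatch is used on the assumption that it mimics the whole-term matching of a regexp query on a keyword field.

```python
import re

# Hypothetical helper: emulates the must_not + regexp query on in-memory data.
def filter_by_allowed(docs, pattern="[^XYKL]"):
    forbidden = re.compile(pattern)
    # fullmatch mimics the regexp query, which has to match a whole term.
    # A document is dropped as soon as one of its values is "forbidden".
    return [
        d for d in docs
        if not any(forbidden.fullmatch(value) for value in d["mentions"])
    ]

data = [
    {"name": "xxxx", "mentions": ["X", "Y"]},
    {"name": "yyyy", "mentions": ["K", "L", "M"]},
    {"name": "zzz", "mentions": ["X", "L"]},
]

result = filter_by_allowed(data)
print([d["name"] for d in result])  # ['xxxx', 'zzz']
```

The document yyyy is excluded because M is not in the allowed set, which is the behaviour asked for in the question.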

Create a keyword field concatenated of other fields

I've got an index with a mapping of 3 fields. Let's say f1, f2 and f3.
I want a new keyword field with the concatenation of the values of f1, f2 and f3 to be able to aggregate by it to avoid having lots of nested loops when checking the search results.
I've seen that this could be achieved by source transformation, but that feature was removed in Elasticsearch 5.
Elasticsearch version used: 6.5
Q: How can I achieve the concatenation in Elasticsearch 6.5?
There was indeed source transformation prior to ES 5, but as of ES 5 there is now a more powerful feature called ingest nodes which will allow you to easily achieve what you need:
First, define an ingest pipeline using a set processor that concatenates the three fields (called field1, field2 and field3 here) into a new field4:
PUT _ingest/pipeline/concat
{
"processors": [
{
"set": {
"field": "field4",
"value": "{{field1}} {{field2}} {{field3}}"
}
}
]
}
You can then index a document using that pipeline:
PUT index/doc/1?pipeline=concat
{
"field1": "1",
"field2": "2",
"field3": "3"
}
And the indexed document will look like:
{
"field1": "1",
"field2": "2",
"field3": "3",
"field4": "1 2 3"
}
Just make sure to create the index with the appropriate mapping for field4 prior to indexing the first document.
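As a rough sketch of that last step, the Python snippet below only builds and prints the two request bodies; it does not send anything. The index name index, the type name doc and the field name field4 are taken from the example above, while the helper names and the aggregation name by_concat are made up. It assumes the 6.x mapping layout, where a type name sits under mappings.

```python
import json

# Assumed names: type "doc" and field "field4" come from the example above.
def concat_field_mapping(type_name="doc", field="field4"):
    # Elasticsearch 6.x expects a mapping type name under "mappings".
    # keyword keeps the concatenated value as one untokenized term.
    return {"mappings": {type_name: {"properties": {field: {"type": "keyword"}}}}}

def terms_aggregation(field="field4", size=10):
    # "size": 0 at the top level skips the hits and returns only the buckets.
    return {
        "size": 0,
        "aggs": {"by_concat": {"terms": {"field": field, "size": size}}},
    }

mapping_body = concat_field_mapping()
agg_body = terms_aggregation()

print("PUT index")
print(json.dumps(mapping_body, indent=2))
print("POST index/_search")
print(json.dumps(agg_body, indent=2))
```

With field4 mapped as keyword, a terms aggregation returns one bucket per distinct concatenation, which should remove the need for nested loops over the search results.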

Spring Data MongoDB with text index: difference between matchingany and matchingphrase

I am using MongoDB and Spring for an application
I am using a text index on my collection.
I found two methods:
matchingAny
matchingPhrase
But I am unable to understand the difference.
Please help me to understand them.
If you want a match on multiple words forming a phrase, use matchingPhrase. If you want a match on at least one word in a list of words, use matchingAny.
For example, given these documents (and assuming the title attribute is text-indexed):
{ "id": 1, "title": "The days of the week"}
{ "id": 2, "title": "Once a week"}
{ "id": 3, "title": "Once a month"}
matchingAny("Once") will match the documents with id=2 and id=3
matchingAny("month", "foo", "bar") will match the document with id=3
matchingPhrase("The days of the week") will match the document with id=1
More details in the docs.
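As far as I understand it, both methods end up in the $search string of a MongoDB $text query: matchingAny adds the words separated by spaces (MongoDB treats them as alternatives), while matchingPhrase wraps the phrase in double quotes. The Python sketch below only illustrates that difference; text_search_string is a made-up helper, not part of Spring Data or any MongoDB driver.

```python
# Hypothetical helper: builds the $search string the two methods are assumed
# to produce. Plain words are space-separated, phrases are double-quoted.
def text_search_string(words=(), phrases=()):
    terms = list(words) + ['"%s"' % phrase for phrase in phrases]
    return " ".join(terms)

# Roughly what matchingAny("month", "foo", "bar") is assumed to translate to
any_query = {"$text": {"$search": text_search_string(words=["month", "foo", "bar"])}}

# Roughly what matchingPhrase("The days of the week") is assumed to translate to
phrase_query = {"$text": {"$search": text_search_string(phrases=["The days of the week"])}}

print(any_query)
print(phrase_query)
```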

Elasticsearch: how to know which field the results are sorted by?

In Elasticsearch, is there any way to check which field the results are sorted by? I want something like inner-hits for sort clause.
Imagine that your documents have this kind of form:
{"numerals" : [ // nested
{"key": "point", "value": 30},
{"key": "points", "value": 200},
{"key": "score", "value": 20},
{"key": "scores", "value": 40}
]
}
and you sort the results by:
{"numerals.value": {
"nested_path": "numerals",
"nested_filter": {
"match": {
"numerals.key": "score"}}}}
Now I have no idea how to tell which nested entry the results were actually sorted by: it's probably scores in this document, but perhaps score in the others? There are two problems: 1. You can use neither inner-hits nor highlighting for the nested fields. 2. Even if you could, it wouldn't solve the issue when there are multiple matching candidates.
The question is about sorting by fields that are inside nested objects.
So this is what the documentation at
https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-sorting.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html#_nested_sorting_example
says:
Elasticsearch first restricts the nested documents with the "nested_filter" query and then sorts the same way as it does for multi-valued fields.
It behaves as if the root document had a single multi-valued field containing exactly the values of the nested objects that passed the filter
(in your example only one value remains: 20).
If you want to be sure which value is used for sorting, add a "mode" parameter:
"min", "max", "sum", "avg" or "median"
If you do not specify the "mode" parameter, then according to the corresponding issue the minimum value is picked for "asc" order and the maximum value for "desc" order:
By default when sorting on a multi-valued field the lowest or highest
value will be picked from the field values depending on the sort
order.
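Here is a small Python sketch of how I understand the sort value is derived, using the document from the question. It is an in-memory emulation, not Elasticsearch code: nested_sort_value is a made-up name, and the filter is emulated with exact equality on key, which assumes that score and scores are indexed as different terms.

```python
from statistics import median

# The sort modes listed above, mapped to plain Python functions.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "median": median,
}

# Hypothetical helper: emulates nested_filter followed by the sort mode.
def nested_sort_value(doc, key, order="asc", mode=None):
    # Step 1: keep only the nested objects that pass the filter.
    values = [n["value"] for n in doc["numerals"] if n["key"] == key]
    if not values:
        return None
    # Step 2: without an explicit mode, use min for asc and max for desc.
    if mode is None:
        mode = "min" if order == "asc" else "max"
    return MODES[mode](values)

doc = {"numerals": [
    {"key": "point", "value": 30},
    {"key": "points", "value": 200},
    {"key": "score", "value": 20},
    {"key": "scores", "value": 40},
]}

# A made-up document with two candidates for the same key.
multi = {"numerals": [
    {"key": "score", "value": 20},
    {"key": "score", "value": 40},
]}

print(nested_sort_value(doc, "score"))            # 20
print(nested_sort_value(multi, "score", "desc"))  # 40
```

When several nested objects pass the filter, as in multi, the mode alone decides which value is used, so setting it explicitly is the only way to know for sure what the results were sorted by.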

Elasticsearch boost but only one occurrence of term per field

I'm currently sending the following query to ElasticSearch:
{
"size": 100,
"query": {
"function_score": {
"query": {
"simple_query_string": {
"query": "term1",
"fields": ["field1^10", "field2^5"]
}
}
}
}
}
Now imagine I have two documents.
Document1 contains one occurrence of "term1" on field1
Document2 contains three occurrences of "term1" on field2
What I get: Elastic returns Document2 above Document1
What I want: Document1 above Document2.
To achieve this, Elastic should not take the number of occurrences of "term1" into account, only whether it appears at all. What should I change in my query?
There seem to be two options to keep Elastic from giving more weight based on the number of occurrences of a term.
The first one is to map the fields to disable term frequency (TF): https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html#tfidf
The second one is to use the Constant Score Query: https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html
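Both options can be sketched as request bodies. The Python snippet below only builds and prints them, so nothing here is tied to a particular client library. The field names and boosts come from the question, the helper names are made up, and using "index_options": "docs" on a text field is my assumption about how to disable term frequencies in the mapping.

```python
import json

# Option 1 (assumed mapping fragment): "docs" indexes only document numbers,
# so term frequencies and positions are not stored for these fields.
def no_tf_properties(fields):
    return {"properties": {f: {"type": "text", "index_options": "docs"} for f in fields}}

# Option 2: one constant_score clause per field, each with a fixed boost.
def constant_score_query(term, field_boosts, size=100):
    should = [
        {"constant_score": {"filter": {"match": {field: term}}, "boost": boost}}
        for field, boost in field_boosts.items()
    ]
    return {"size": size, "query": {"bool": {"should": should}}}

mapping_fragment = no_tf_properties(["field1", "field2"])
body = constant_score_query("term1", {"field1": 10, "field2": 5})

print(json.dumps(mapping_fragment, indent=2))
print(json.dumps(body, indent=2))
```

With the second body, a match in field1 should contribute a fixed 10 and a match in field2 a fixed 5, however often the term occurs, so Document1 would rank above Document2. Keep in mind that without positions, phrase queries on those fields no longer work with the first option.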
