I have an index with "name" and "description" fields, and I am running a Boolean query against it. When the search term is present in both fields, the document scores higher than documents where the term appears in only one of them.
What I want is for them to score equally: a document with the term in either name or description should get the same score as a document with the term in both.
Is this possible?
Here is an example:
{
"name": "xyz",
"description": "abc xyz"
},
{
"name": "abc",
"description": "xyz pqr"
},
{
"name": "xyz",
"description": "abc pqr"
}
If the user searches for the term "xyz", I want all three documents above to have the same score, since each of them contains "xyz" in name, in description, or in both.
You can use a filtered query for this. Filters are not scored, so every matching document gets the same score. The query below searches for the term "xyz":
POST <index name>/<type>/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"should": [
{
"term": {
"name": "xyz"
}
},
{
"term": {
"description": "xyz"
}
}
]
}
}
}
}
}
I think you can either:
transform your query into a filter. Filters do not affect the score (and are faster than queries),
or wrap your query in a "constant score query"; see: http://www.elasticsearch.org/guide/reference/query-dsl/constant-score-query/
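As a rough illustration of the constant score option, here is a minimal Python sketch that builds such a request body as a plain dict. The helper name constant_score_any_field and the boost value are made up for this example. It may also be the better route on newer clusters, since the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0.

```python
import json


def constant_score_any_field(term, fields, boost=1.0):
    """Build a constant_score request body (hypothetical helper).

    Every document that has `term` in at least one of `fields`
    receives the same score, equal to `boost`.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        # In a bool with only should clauses,
                        # at least one clause has to match.
                        "should": [{"term": {field: term}} for field in fields]
                    }
                },
                "boost": boost,
            }
        }
    }


body = constant_score_any_field("xyz", ["name", "description"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. All three example documents should then come back with an identical score.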
I'm looking for a query that "must match a document with a keyword if that keyword occurs in the search phrase, and otherwise finds documents that don't contain the keyword".
Imagine an index of store products that can be either "regular" or "have something unusual"; when a product is regular, you don't mention that in the search phrase.
For example, if we have these products:
"Nike T-Shirt" (attributes: [])
"Adidas T-Shirt" (attributes: ["collectible"])
If a user searches for "t-shirt", we don't want them to find any collectible items. But when a user searches for "collectible t-shirt", we want them to find only collectible items. There can be several keywords of this kind.
Example:
I have some documents:
[
{
"id": 1,
"name": "First document",
"variants": ["red", "big"]
},
{
"id": 2,
"name": "Second document",
"variants": ["red"]
},
{
"id": 3,
"name": "Third entry",
"variants": ["green", "big"]
}
]
And I have two search phrases that I convert to terms queries:
With a keyword (big) occurrence:
{
"query": {
"bool" : {
"must": {
"match": {
"name": {
"query": "document"
}
}
},
"??? must or must_not ???" : {
"terms": {
"variants": ["some", "big", "search", "phrase"]
}
}
}
}
}
Without a keyword occurrence:
{
"query": {
"bool" : {
"must": {
"match": {
"name": {
"query": "document"
}
}
},
"??? must or must_not ???" : {
"terms": {
"variants": ["some", "search", "phrase"]
}
}
}
}
}
With the first search I want Elasticsearch to return only documents id: 1 and 3, and with the second search only document id: 2.
Using bool.must.terms.variants: ["some", "big", "search", "phrase"] would return one document I'm looking for, but using bool.must.terms.variants: ["some", "search", "phrase"] would return no documents.
On the other hand, if I replace must with should, I get both documents correctly ordered by score, but I need to match only the one document that follows the rule above.
Sorry, this may not answer your question. Since I cannot post comments yet, I'm posting this as an answer.
I don't think you can express that logic in a single query. The logic you describe has two steps:
Find records that match the variants.
If no records are returned, find records that don't match the variants.
You need the result of the first step to evaluate the second step.
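The two steps above could be sketched on the client side roughly like this. It is only an outline: run_query stands for whatever function sends a request body to Elasticsearch and returns the list of hits, and is not part of any client library.

```python
from typing import Callable, Dict, List

Query = Dict[str, object]


def two_step_search(
    run_query: Callable[[Query], List[dict]],
    with_keyword: Query,
    without_keyword: Query,
) -> List[dict]:
    """Run the keyword query first; fall back only if it finds nothing."""
    hits = run_query(with_keyword)
    if hits:
        return hits
    # The second step depends on the (empty) result of the first one.
    return run_query(without_keyword)


# Stand-in for a real search call, used only to show the control flow.
def fake_run_query(query: Query) -> List[dict]:
    return [] if query.get("label") == "with" else [{"_id": "2"}]


print(two_step_search(fake_run_query, {"label": "with"}, {"label": "without"}))
```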
As far as I understand, an Elasticsearch query is single-step. The query is distributed to all shards holding the data; each shard searches independently and just returns its result, i.e. it does not coordinate with other shards to check whether they have matches.
Maybe you can try something with aggregations.
As @dna01 mentioned, you need to send two consecutive requests: the first one to find documents that match the keyword and then, if nothing is found, a second one to find documents that don't match the keyword.
You can avoid the extra latency of the second request by using the Multi Search API: just send both searches in a single request.
Request body example (let the search phrase be "some big search phrase" and the keyword be "big"):
{ }
{ "query": { "bool": {"must": [{"match": {"name": "document" }}, {"terms": {"variants": ["some", "big", "search", "phrase"]}}] } } }
{ }
{ "query": { "bool": {"must": [{"match": {"name": "document" }}, {"terms": {"variants": ["some", "big", "search", "phrase"]}}], "must_not": [{"terms": {"variants": ["big"]}}] } } }
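For illustration, here is a small Python sketch of how such a body could be assembled and how the combined response could be consumed. The helper names build_msearch_body and first_nonempty are invented for this example. It assumes the usual _msearch conventions: newline-delimited JSON ending with a newline, and responses returned in request order.

```python
import json


def build_msearch_body(searches):
    """Serialize (header, body) pairs as newline-delimited JSON."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi search body has to end with a newline.
    return "\n".join(lines) + "\n"


def first_nonempty(msearch_response):
    """Return the hits of the first search that found anything."""
    for response in msearch_response["responses"]:
        hits = response.get("hits", {}).get("hits", [])
        if hits:
            return hits
    return []


name_clause = {"match": {"name": "document"}}
terms_clause = {"terms": {"variants": ["some", "big", "search", "phrase"]}}

keyword_search = {"query": {"bool": {"must": [name_clause, terms_clause]}}}
fallback_search = {
    "query": {
        "bool": {
            "must": [name_clause, terms_clause],
            "must_not": [{"terms": {"variants": ["big"]}}],
        }
    }
}

msearch_body = build_msearch_body([({}, keyword_search), ({}, fallback_search)])
print(msearch_body, end="")
```

Putting the keyword search first means first_nonempty prefers its hits and only falls through to the second search when the first one is empty.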
I'm using the following query, but documents that repeat a subset of the typed words score higher than documents that match the entire sentence.
For example:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "test in maths",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
A document whose field value is: test test test
scores higher than one whose field value is: test in maths
How can I get a higher score for a match on the exact words rather than for repeated words?
Thanks in advance.
If you want to search for exact sentences/phrases, you should use the match_phrase query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).
You can add a should clause containing a match_phrase query to your current query to boost the score of exact phrases.
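A minimal sketch of that combination, written as a Python dict builder. The function name with_phrase_boost and the boost value of 2.0 are arbitrary choices for this example, and the base query is a trimmed copy of the one in the question.

```python
import json


def with_phrase_boost(base_query, field, phrase, boost=2.0):
    """Wrap `base_query` so that exact phrase matches on `field` score higher.

    The must clause still decides which documents match; the should
    clause only adds to the score of documents containing the phrase.
    """
    return {
        "query": {
            "bool": {
                "must": [base_query],
                "should": [
                    {"match_phrase": {field: {"query": phrase, "boost": boost}}}
                ],
            }
        }
    }


base = {
    "multi_match": {
        "query": "test in maths",
        "fields": ["title"],
        "minimum_should_match": "75%",
        "type": "most_fields",
    }
}

request_body = with_phrase_boost(base, "title", "test in maths")
print(json.dumps(request_body, indent=2))
```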
You can use a match_phrase query for an exact match. match_phrase matches only when the query terms occur in the field in the same order as in the query.
e.g.
{
"query": {
"bool": {
"must": [{
"match_phrase": {
"title": "test in maths"
}
}]
}
}
}
Editing after comment:
Use
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text",
"index_options": "docs"
}
}
}
}
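and then you can use a normal match query. Elasticsearch won't take repetition of words in the title field into account, because with index_options set to docs only the document number is indexed, not term frequencies. Note that positions are not indexed either, so phrase queries on this field will no longer work.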
This drives me crazy. I have no clue why Elasticsearch does not return any results.
I indexed the documents like this:
PUT /customer/person-test/1?pretty
{
"name": "John Doe",
"personId": 153,
"houseHoldId": 6191136,
"quarter": "2016_Q1"
}
PUT /customer/person-test/2?pretty
{
"name": "John Doe",
"personId": 153,
"houseHoldId": 6191136,
"quarter": "2016_Q2"
}
and when I query like this, it does not return anything:
GET /customer/person-test/_search
{
"query": {
"bool": {
"must" : [
{
"term": {
"name": "John Doe"
}
},
{
"term": {
"quarter": "2016_Q1"
}
}
]
}
}
}
I copied this query from "A simple AND query with Elasticsearch".
I just want to get the person with "John Doe" AND "2016_Q1". Why does this not work?
You should use match instead of term :
GET /customer/person-test/_search
{
"query": {
"bool": {
"must" : [
{
"match": {
"name": "John Doe"
}
},
{
"match": {
"quarter": "2016_Q1"
}
}
]
}
}
}
Explanation
Why doesn’t the term query match my document ?
String fields can be of type text (treated as full text, like the body
of an email), or keyword (treated as exact values, like an email
address or a zip code). Exact values (like numbers, dates, and
keywords) have the exact value specified in the field added to the
inverted index in order to make them searchable.
However, text fields are analyzed. This means that their values are
first passed through an analyzer to produce a list of terms, which are
then added to the inverted index.
There are many ways to analyze text: the default standard analyzer
drops most punctuation, breaks up text into individual words, and
lower cases them. For instance, the standard analyzer would turn the
string “Quick Brown Fox!” into the terms [quick, brown, fox].
This analysis process makes it possible to search for individual words
within a big block of full text.
The term query looks for the exact term in the field’s inverted
index — it doesn’t know anything about the field’s analyzer. This
makes it useful for looking up values in keyword fields, or in numeric
or date fields. When querying full text fields, use the match query
instead, which understands how the field has been analyzed.
...
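To make the difference concrete, here is a toy Python model of what happens. It is a deliberately crude approximation: the real standard analyzer uses Unicode text segmentation, while this sketch just lowercases runs of word characters, and all function names are invented for the example.

```python
import re


def rough_standard_analyze(text):
    """Crude stand-in for the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_query_matches(indexed_text, term):
    """A term query compares the raw term against the indexed tokens."""
    return term in rough_standard_analyze(indexed_text)


def match_query_matches(indexed_text, query_text):
    """A match query analyzes the query text first (default operator: OR)."""
    indexed_tokens = set(rough_standard_analyze(indexed_text))
    return any(t in indexed_tokens for t in rough_standard_analyze(query_text))


print(rough_standard_analyze("Quick Brown Fox!"))
print(term_query_matches("John Doe", "John Doe"))
print(term_query_matches("John Doe", "john"))
print(match_query_matches("John Doe", "John Doe"))
```

In this model the term "John Doe" never equals either of the indexed tokens john and doe, which is the same reason the term query in the question comes back empty.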
It's not working because you are using the default standard analyzer for 'name' and 'quarter'.
You have two more options:
1) Change the mapping:
"name": {
"type": "string",
"index": "not_analyzed"
},
"quarter": {
"type": "string",
"index": "not_analyzed"
}
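2) Lowercase your values and query one token at a time, since the standard analyzer splits the text into words and applies the Lower Case Token Filter by default ("John Doe" is indexed as the two terms john and doe):
{
"query": {
"bool": {
"must" : [
{
"term": {
"name": "john"
}
},
{
"term": {
"name": "doe"
}
},
{
"term": {
"quarter": "2016_q1"
}
}
]
}
}
}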
I have this query that returns documents if the word "mumbai" appears anywhere in the title.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"title": "mumbai"
}
}
}
}
}
So the result contains...
mumbai
mumbai ports
financial capital mumbai
I need to return only the document whose title is exactly "mumbai", not the documents where the word mumbai appears alongside other words. Only the first result is correct. How do I discard the other results?
Update
This query works as expected: it lists the sort value 58 (an arbitrary value) if the match is exact.
curl -XPOST "localhost:9200/enwiki_content/page/_search?pretty" -d'
{
"fields": "title",
"query": {
"match": {"title": "Mumbai"}
},
"sort": {
"_script": {
"script": "_source.title == \"Mumbai\" ? \"58\": \"78\";",
"type": "string"
}
}
}'
I need to return only the titles where the match is exactly Mumbai (and hence the sort value is 58). How do I filter on that, or add the script to the "fields" parameter?
To get mumbai to match only documents that contain mumbai and nothing else, you'll have to store a token count field alongside the field you are searching on.
This token count field contains the number of tokens in the field. Using it, you can match mumbai on your title field and match the token_count field against the number of tokens in mumbai (which is one).
Note that the token_count field in the other documents will be greater than 1.
For reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/token-count.html
Note: If you are using stopwords, then you need to know about the other caveats related to token count. You can find the information in the above link.
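A sketch of how the mapping and query might look, expressed as Python dicts. It assumes a recent Elasticsearch version with typeless mappings and a text field. The sub-field name title.length and the helper exact_tokens_query are choices made for this example only.

```python
import json

# Mapping with a token_count sub-field next to the analyzed title.
title_mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "length": {"type": "token_count", "analyzer": "standard"}
                },
            }
        }
    }
}


def exact_tokens_query(field, text, token_count):
    """Match all tokens of `text` and require the field to have no others."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: {"query": text, "operator": "and"}}}],
                # Filter clauses restrict the results without affecting scores.
                "filter": [{"term": {field + ".length": token_count}}],
            }
        }
    }


mumbai_only = exact_tokens_query("title", "mumbai", 1)
print(json.dumps(mumbai_only, indent=2))
```

With the three example titles, only "mumbai" has a token count of 1, so "mumbai ports" and "financial capital mumbai" would be filtered out.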
Try the term query. It does an exact match against the indexed term:
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "mumbai"
}
}
]
}
}
}
Note that the term query treats Mumbai and mumbai as different words, so one will not match the other.
Second option:
If you can change the mapping, you can set the title field to not_analyzed.
Third option:
A match query with the analyzer option:
{
"query": {
"match": {
"title": {
"query": "mumbai",
"analyzer": "keyword"
}
}
}
}
In Elasticsearch, how do I sort documents based on which fields a phrase is found in, using the following order of fields?
Search Phrase: Miami
Fields: Title, Content, Topics
If the phrase is found in Title, Content and Topics, the document should show before documents where the phrase is found only in Content.
Maybe there is a way to say:
if the phrase is found in Title, then weight 2
if the phrase is found in Content, then weight 1.5
if the phrase is found in Topics, then weight 1
and the result would be sum(weight) combined with _score.
My current query looks like this:
{
"index": "abc",
"type": "mydocuments",
"body": {
"query": {
"multi_match": {
"query": "miami",
"type": "phrase",
"fields": [
"title",
"content",
"topics",
"destinations"
]
}
}
}
}
You can use boosting on fields with the caret ^ notation to score them higher than other matching fields:
{
"index": "abc",
"type": "mydocuments",
"body": {
"query": {
"multi_match": {
"query": "miami",
"type": "phrase",
"fields": [
"title^10",
"content^3",
"topics",
"destinations"
]
}
}
}
}
Here I have applied a weight of 10 to title and a weight of 3 to content. Documents are returned in decreasing _score order, so you need to boost the fields that you consider more important. The boost values are up to you and may take some trial and error before documents come back in your preferred order.
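If the weights live in application code, the fields list can be generated rather than typed by hand. Below is a small Python sketch using the weights from the question. The helper names are made up, and a weight of 1 is left without a caret since that is the default boost. Keep in mind that a phrase-type multi_match takes its score from the best matching field, so the boosts influence ordering but are not literally summed.

```python
import json


def boosted_fields(weights):
    """Turn {"title": 2, "content": 1.5} into ["title^2", "content^1.5"]."""
    return [
        name if weight == 1 else f"{name}^{weight:g}"
        for name, weight in weights.items()
    ]


def phrase_multi_match(phrase, weights):
    """Build a phrase-type multi_match query with per-field boosts."""
    return {
        "query": {
            "multi_match": {
                "query": phrase,
                "type": "phrase",
                "fields": boosted_fields(weights),
            }
        }
    }


miami_query = phrase_multi_match("miami", {"title": 2, "content": 1.5, "topics": 1})
print(json.dumps(miami_query, indent=2))
```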