Elasticsearch with AND query in DSL - elasticsearch

This is driving me crazy. I have no idea why Elasticsearch does not return any results.
I indexed two documents like this:
PUT /customer/person-test/1?pretty
{
"name": "John Doe",
"personId": 153,
"houseHoldId": 6191136,
"quarter": "2016_Q1"
}
PUT /customer/person-test/2?pretty
{
"name": "John Doe",
"personId": 153,
"houseHoldId": 6191136,
"quarter": "2016_Q2"
}
When I query like this, nothing is returned:
GET /customer/person-test/_search
{
"query": {
"bool": {
"must" : [
{
"term": {
"name": "John Doe"
}
},
{
"term": {
"quarter": "2016_Q1"
}
}
]
}
}
}
I copied this query from "A simple AND query with Elasticsearch".
I just want to get the person with name "John Doe" AND quarter "2016_Q1". Why doesn't this work?

You should use match instead of term:
GET /customer/person-test/_search
{
"query": {
"bool": {
"must" : [
{
"match": {
"name": "John Doe"
}
},
{
"match": {
"quarter": "2016_Q1"
}
}
]
}
}
}
Explanation
Why doesn’t the term query match my document?
String fields can be of type text (treated as full text, like the body
of an email), or keyword (treated as exact values, like an email
address or a zip code). Exact values (like numbers, dates, and
keywords) have the exact value specified in the field added to the
inverted index in order to make them searchable.
However, text fields are analyzed. This means that their values are
first passed through an analyzer to produce a list of terms, which are
then added to the inverted index.
There are many ways to analyze text: the default standard analyzer
drops most punctuation, breaks up text into individual words, and
lower cases them. For instance, the standard analyzer would turn the
string “Quick Brown Fox!” into the terms [quick, brown, fox].
This analysis process makes it possible to search for individual words
within a big block of full text.
The term query looks for the exact term in the field’s inverted
index — it doesn’t know anything about the field’s analyzer. This
makes it useful for looking up values in keyword fields, or in numeric
or date fields. When querying full text fields, use the match query
instead, which understands how the field has been analyzed.
...
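The difference described above can be imitated in a few lines of Python. This is only a rough sketch: `standard_analyze` is a hypothetical stand-in for the standard analyzer (a regular expression plus lowercasing, not the real Unicode segmentation), and the "index" is a plain dictionary. It is meant to show why a term lookup for "John Doe" finds nothing while a match lookup does.

```python
import re


def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase.

    The word pattern keeps underscores inside a token, so "2016_Q1"
    stays one term in this sketch.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs, field):
    """Map each analyzed term of `field` to the ids of documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for term in standard_analyze(doc[field]):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value):
    """term: look the value up exactly as given, with no analysis."""
    return index.get(value, set())


def match_query(index, text):
    """match: analyze the query text first, then look up each term (OR semantics)."""
    hits = set()
    for term in standard_analyze(text):
        hits |= index.get(term, set())
    return hits


docs = {
    1: {"name": "John Doe", "quarter": "2016_Q1"},
    2: {"name": "John Doe", "quarter": "2016_Q2"},
}
name_index = build_inverted_index(docs, "name")
quarter_index = build_inverted_index(docs, "quarter")

print(standard_analyze("Quick Brown Fox!"))    # ['quick', 'brown', 'fox']
print(term_query(name_index, "John Doe"))      # set(): no such term was indexed
print(match_query(name_index, "John Doe"))     # {1, 2}
print(term_query(quarter_index, "2016_Q1"))    # set(): the indexed term is lowercase
# bool/must over two match clauses behaves like an intersection:
print(match_query(name_index, "John Doe") & match_query(quarter_index, "2016_Q1"))  # {1}
```

The real analyzer handles many more cases (Unicode word boundaries, maximum token length, and so on), so treat the output only as an illustration of the principle.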

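It's not working because you are using the default standard analyzer for 'name' and 'quarter'.
You have two more options:
1) Change the mapping so that the fields are not analyzed (this is the pre-5.0 syntax; on Elasticsearch 5.0 and later the equivalent is "type": "keyword"):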
"name": {
"type": "string",
"index": "not_analyzed"
},
"quarter": {
"type": "string",
"index": "not_analyzed"
}
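2) Keep the mapping and query for the terms the standard analyzer actually indexed. It splits values into words and applies the Lower Case Token Filter, so "John Doe" is indexed as the two terms john and doe, while "2016_Q1" stays a single term, 2016_q1:
{
"query": {
"bool": {
"must" : [
{
"term": {
"name": "john"
}
},
{
"term": {
"name": "doe"
}
},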
{
"term": {
"quarter": "2016_q1"
}
}
]
}
}
}

Related

Must match document with keyword if it occurs, must match other document if keyword doesn't occur

I'm looking for a query that must match documents with a keyword if that keyword occurs in the search phrase, and otherwise must match only documents that do not have the keyword.
Imagine an index of store products that can be either "regular" or "have something unusual"; when a product is regular, you don't add that to the search phrase.
For example, if we have these products:
"Nike T-Shirt" (attributes: [])
"Adidas T-Shirt" (attributes: ["collectible"])
If a user searches for "t-shirt", we don't want them to find any collectible items. But when a user searches for "collectible t-shirt", we want them to find only collectible items. There can be multiple keywords of this kind.
Example:
I have some documents:
[
{
"id": 1,
"name": "First document",
"variants": ["red", "big"]
},
{
"id": 2,
"name": "Second document",
"variants": ["red"]
},
{
"id": 3,
"name": "Third entry",
"variants": ["green", "big"]
}
]
And I have two search phrases that I convert to terms queries:
With a keyword (big) occurrence:
{
"query": {
"bool" : {
"must": {
"match": {
"name": {
"query": "document"
}
}
},
"??? must or must_not ???" : {
"terms": {
"variants": ["some", "big", "search", "phrase"]
}
}
}
}
}
Without a keyword occurrence:
{
"query": {
"bool" : {
"must": {
"match": {
"name": {
"query": "document"
}
}
},
"??? must or must_not ???" : {
"terms": {
"variants": ["some", "search", "phrase"]
}
}
}
}
}
Now, for the first search I want Elasticsearch to return only documents id: 1 and 3, and for the second search only document id: 2.
Using bool.must.terms.variants: ["some", "big", "search", "phrase"] would return one document I'm looking for, but using bool.must.terms.variants: ["some", "search", "phrase"] would return no documents.
On the other hand, if I replace must with should, I get both documents correctly ordered by score, but I must match only one document that follows the above rule.
Sorry, this may not answer your question. Since I cannot create comments yet, I'm posting this.
I don't think you can express that logic with a single query. The logic you describe has two steps:
1) Find records that match the variants.
2) If no records are returned, find records that don't match the variants.
You need the result of the first step to evaluate the second step.
As far as I understand, an Elasticsearch query is a single step: the query is distributed to all shards holding the data, and each shard searches independently and simply returns its results. It does not coordinate with other shards to check whether they have matches.
Maybe you can try something with aggregations.
As #dna01 mentioned, you need to send two consecutive requests: the first one to find documents that match the keyword, then, if nothing is found, the second one to find documents that don't match the keyword.
You can avoid the extra latency added by the second request by using the Multi Search API.
Just send both searches in a single request.
Request body example (let the search phrase be "some big search phrase" and the keyword be "big"):
{ }
{ "query": { "bool": {"must": [{"match": {"name": "document" }}, {"terms": {"variants": ["some", "big", "search", "phrase"]}}] } } }
{ }
{ "query": { "bool": {"must": [{"match": {"name": "document" }}], "must_not": [{"terms": {"variants": ["big"]}}] } } }
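A minimal sketch of the client side of this approach, using only the standard library. The function names and the `keywords` parameter (the list of known special keywords) are hypothetical, and the fallback search here keeps only the name match and excludes the keywords. The body would be sent to the `_msearch` endpoint, and the `responses` array of the reply passed to `first_non_empty`.

```python
import json


def build_msearch_body(name_text, phrase_terms, keywords):
    """Return an NDJSON body with two searches: keyword match first, fallback second."""
    with_keyword = {"query": {"bool": {"must": [
        {"match": {"name": name_text}},
        {"terms": {"variants": phrase_terms}},
    ]}}}
    without_keyword = {"query": {"bool": {
        "must": [{"match": {"name": name_text}}],
        "must_not": [{"terms": {"variants": keywords}}],
    }}}
    # Each search is an (empty) header line followed by its body line.
    lines = [{}, with_keyword, {}, without_keyword]
    # _msearch expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


def first_non_empty(responses):
    """Return the hits of the first search that found anything, else an empty list."""
    for response in responses:
        hits = response["hits"]["hits"]
        if hits:
            return hits
    return []


body = build_msearch_body("document", ["some", "big", "search", "phrase"], ["big"])
print(body)

# Shape of the "responses" array in a Multi Search reply, reduced to what is used here.
fake_responses = [
    {"hits": {"hits": []}},
    {"hits": {"hits": [{"_id": "2"}]}},
]
print(first_non_empty(fake_responses))  # [{'_id': '2'}]
```

Both searches always run on the server, so this trades a little extra work for one round trip instead of two.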

Elasticsearch template in Logstash doesn't apply the mapping and fields can't be sorted

I want to sort data via the Elasticsearch REST client. Below is my template in Logstash:
{
"index_patterns": ["index_name"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"_doc": {
"properties": {
"int_var": {
"type": "keyword"
}
}
}
}
}
}
When I try to query it with the code below:
{
"size": 100,
"query": {
"bool": {
"must": {
"match": {
"match_field": user_request
}
}
}
},
"sort": [
{"int_var": {"order": "asc"}}
]
}
I get this error:
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true
How can I solve this? Thanks for answering.
Here's the documentation regarding field data and how to enable it as long as you are aware of the performance impacts.
When ingested into Elasticsearch, field values are tokenized based on their data type.
Text fields are broken into tokens; with the default analyzer they are split on word boundaries and lowercased. For example, "quick brown fox" creates three tokens: 'quick', 'brown', and 'fox'. A search for any of these three words will match.
Keyword fields, on the other hand, create a single token from the entire value. For example, "quick brown fox" is a single token, 'quick brown fox'. Searching for anything that is not exactly 'quick brown fox' will produce no matches.
Unless you scrubbed your query before you posted it here, you need to modify the field name under match to be the actual field name, like below.
{
"size": 100,
"query": {
"bool": {
"must": {
"match": {
"int_var": "whatever value you are searching for"
}
}
}
},
"sort": [
{"int_var": {"order": "asc"}}
]
}
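The error message says the sorted field is of type text, which suggests that the keyword mapping from the template was not applied to the index that is being queried. One way to check is to fetch the index mapping (GET index_name/_mapping) and inspect the field. The sketch below works on an already parsed mapping response; the helper names are hypothetical, and the sample dictionaries are assumptions about what such a response might look like, not output from the asker's cluster.

```python
def field_mapping(mapping_response, index, field):
    """Return the mapping dict of a top-level field, or None if it is unmapped."""
    mappings = mapping_response[index]["mappings"]
    # Older versions nest "properties" under a type name such as "_doc";
    # newer versions put "properties" directly under "mappings".
    if "properties" not in mappings and len(mappings) == 1:
        mappings = next(iter(mappings.values()))
    return mappings.get("properties", {}).get(field)


def sort_target(field, mapping):
    """Suggest a field name that can be sorted on, or None if there is none."""
    if mapping is None:
        return None
    if mapping.get("type") != "text" or mapping.get("fielddata"):
        return field
    # Dynamically mapped strings usually get a keyword sub-field.
    for sub_name, sub_mapping in mapping.get("fields", {}).items():
        if sub_mapping.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None


# Assumed response when the template was NOT applied (dynamic mapping of a string).
dynamic = {"index_name": {"mappings": {"properties": {
    "int_var": {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
}}}}
# Assumed response when the template's keyword mapping was applied.
templated = {"index_name": {"mappings": {"_doc": {"properties": {
    "int_var": {"type": "keyword"},
}}}}}

print(sort_target("int_var", field_mapping(dynamic, "index_name", "int_var")))    # int_var.keyword
print(sort_target("int_var", field_mapping(templated, "index_name", "int_var")))  # int_var
```

If the check reports a text field, sorting on the keyword sub-field (when one exists) or fixing the template and reindexing are the usual alternatives to enabling fielddata.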

increase score of query where all text match and not repeating words

I'm using the following query, but it gives a higher score to documents that repeat a subset of the typed words than to documents that match the entire sentence.
For example:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "test in maths",
"fuzziness": "3",
"fields": [
"title"
],
"minimum_should_match": "75%",
"type": "most_fields"
}
}
}
}
}
A field value of "test test test" gets a higher score than a field value of "test in maths".
How can I get a higher score for an exact match of the words rather than for repeated words?
Thanks in advance.
If you want to search for exact sentences/phrases, you should use the match_phrase query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).
You can add a should clause containing the match_phrase query to your current query to boost the score of exact phrases.
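As a sketch of that suggestion, the helper below wraps an existing query in a bool query and adds a should clause with match_phrase. The function name and the boost value of 2.0 are arbitrary choices for illustration; the multi_match part is copied from the question.

```python
import json


def with_phrase_boost(base_query, field, phrase, boost=2.0):
    """Keep `base_query` as the required part and reward exact phrase matches."""
    return {
        "query": {
            "bool": {
                # Documents still have to satisfy the original query...
                "must": base_query,
                # ...and get extra score if the exact phrase also occurs.
                "should": [
                    {"match_phrase": {field: {"query": phrase, "boost": boost}}}
                ],
            }
        }
    }


multi_match = {
    "multi_match": {
        "query": "test in maths",
        "fuzziness": "3",
        "fields": ["title"],
        "minimum_should_match": "75%",
        "type": "most_fields",
    }
}

request_body = with_phrase_boost(multi_match, "title", "test in maths")
print(json.dumps(request_body, indent=2))
```

Because the phrase clause sits in should, it never filters documents out; it only adds to the score of those that contain the words in order.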
You can use the match_phrase query for an exact match. match_phrase matches only when the query terms occur in the field in the given order.
E.g.:
{
"query": {
"bool": {
"must": [{
"match_phrase": {
"title": "test in maths"
}
}]
}
}
}
Edit after a comment:
Use the following mapping:
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text",
"index_options": "docs"
}
}
}
}
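Then you can use a normal match query; Elasticsearch won't take repeated words in the title field into account, because with "index_options": "docs" term frequencies are not indexed. Note that positions are not indexed either, so phrase queries such as match_phrase can no longer be run against this field.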

I search for all fields using Elasticsearch, do you know which field matched?

I'm searching across all fields with Elasticsearch and want to know which fields matched.
PUT my_index/user/1
{
"first_name": "John",
"last_name": "Smith",
"date_of_birth": "1970-10-24"
}
GET my_index/_search
{
"query": {
"match": {
"_all": "john 1970"
}
}
}
In the above example, "john 1970" is searched for in all fields.
Since the indexed document matches on "first_name" and "date_of_birth", it is returned as a result.
How can I tell that it matched "first_name" and "date_of_birth"?
The thing is that _all is a field into which all values from all other fields are copied at indexing time. Concretely, when you index your document, what ES conceptually sees is this (though the source is not modified to contain _all and _all itself is not stored, just indexed):
{
"first_name": "John",
"last_name": "Smith",
"date_of_birth": "1970-10-24",
"_all": "john smith 1970 10 24"
}
So if you match against _all then the only field that can match is _all itself, there's no way to "reverse-engineer" which field contained which matching value solely based on _all.
What you can do, however, is to use another feature called highlighting. Since the _all field is not stored it cannot be highlighted but the other fields can, so you can highlight which original fields match which values:
{
"query": {
"match": {
"_all": "john 1970"
}
},
"highlight": {
"fields": {
"*": {
"require_field_match": false
}
}
}
}
In the response, you'll see something like this which shows that first_name matches the query.
"highlight": {
"first_name": [
"<em>John</em>"
]
}
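On the client side, the matched field names can then be read from the highlight section of each hit. A minimal sketch, where `matched_fields` is a hypothetical helper and the sample response is reduced to the parts that are used:

```python
def matched_fields(search_response):
    """Map each hit's _id to the sorted names of the fields that were highlighted."""
    result = {}
    for hit in search_response["hits"]["hits"]:
        # Hits without any highlighted field have no "highlight" key at all.
        result[hit["_id"]] = sorted(hit.get("highlight", {}))
    return result


sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "date_of_birth": "1970-10-24",
                },
                "highlight": {"first_name": ["<em>John</em>"]},
            }
        ]
    }
}

print(matched_fields(sample_response))  # {'1': ['first_name']}
```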

OR query with elasticsearch

I have an index with "name" and "description" fields. I am running a Boolean query against my index. Sometimes the term is present in both the name and description fields; in that case, documents in which both fields contain the search term are scored higher than those in which only the name or only the description contains it.
What I want is to score them equally, so that documents with the term in either name or description get the same score as documents with the term in both.
Is it possible?
Here is the example:
{
"name": "xyz",
"description": "abc xyz"
},
{
"name": "abc",
"description": "xyz pqr"
},
{
"name": "xyz",
"description": "abc pqr"
}
If the user searches for the term "xyz", I want all three documents above to have the same score, since all of them contain the term "xyz" in the name, the description, or both.
You can use a Filtered Query for this. Filters are not scored. See the query below for searching the term "xyz":
POST <index name>/<type>/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"should": [
{
"term": {
"name": "xyz"
}
},
{
"term": {
"description": "xyz"
}
}
]
}
}
}
}
}
I think you can either:
transform your query into a filter. Filters do not affect the score (and are faster than queries)
or wrap your query in a "Constant score query"; see: http://www.elasticsearch.org/guide/reference/query-dsl/constant-score-query/
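The second option can be sketched as a small request builder. The filtered query was removed in Elasticsearch 5.0, so on newer versions constant_score (or a bool query with a filter clause) is the form to use. The function name is hypothetical; the term clauses follow the answer above and assume the indexed terms are lowercase.

```python
import json


def same_score_query(term, fields):
    """Build a constant_score request: any field may match, every hit scores the same."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        # In filter context, a bool with only should clauses
                        # requires at least one of them to match.
                        "should": [{"term": {field: term}} for field in fields]
                    }
                }
            }
        }
    }


request_body = same_score_query("xyz", ["name", "description"])
print(json.dumps(request_body, indent=2))
```

With this request, all three example documents would match and receive the same score, regardless of how many fields contain "xyz".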

Resources