must_not combined with AND in Elasticsearch

I have four fields in an Elasticsearch mapping:
date
status
type
createdAt
Now I need to fetch all the rows where
date = today
status = "confirmed"
and type is not equal to "def".
However, a row with
type = "def"
is acceptable, but only when its createdAt field is not equal to today.
My current query looks like this:
{
must: [
{ "bool":
{
"must": [
{"term": {"date": 'now/d'}},
{"term": {"status": 'confirmed'}},
]
}
}
],
mustNot: [
{"match": {'createdAt': 'now/d'}},
{"match":{"type": "def"}}
]
}
The rows where type is not equal to "def" are fetched.
However, if a row has type=def AND a createdAt of any date but today, the row doesn't show up.
What am I doing wrong?

This query should work.
{
"query": {
"bool": {
"must": [
{ "term": {"date": "now/d" } },
{ "term": {"status": "confirmed" } }
],
"must_not": {
"bool": {
"must": [
{ "match": { "createdAt": "now/d" } },
{ "match": { "type": "def" } }
]
}
}
}
}
}
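I believe your version is not working because every clause listed in must_not is applied on its own: a document is excluded as soon as any one of them matches. Your query therefore drops every row with type "def" as well as every row created today, instead of only the rows where both conditions hold. Wrapping the two conditions in a single bool/must inside must_not excludes only the rows that match both.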
https://www.elastic.co/guide/en/elasticsearch/guide/current/bool-query.html#_controlling_precision
All the must clauses must match, and all the must_not clauses must not match, but how many should clauses should match? By default, none of the should clauses are required to match, with one exception: if there are no must clauses, then at least one should clause must match.
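To make the difference concrete, here is a small sketch in plain Python (not an Elasticsearch client call). The four sample rows and the helper names is_def and created_today are made up for illustration; the two list comprehensions only mimic the boolean logic of the two must_not shapes.

```python
# Toy model of the two must_not shapes; sample rows are hypothetical.
docs = [
    {"id": 1, "type": "abc", "created_today": False},
    {"id": 2, "type": "def", "created_today": False},  # wanted: keep
    {"id": 3, "type": "def", "created_today": True},   # wanted: exclude
    {"id": 4, "type": "abc", "created_today": True},
]

def is_def(doc):
    return doc["type"] == "def"

def created_today(doc):
    return doc["created_today"]

# Two separate must_not clauses: a row is dropped if EITHER clause matches.
separate = [d["id"] for d in docs if not is_def(d) and not created_today(d)]

# One must_not clause wrapping bool/must: a row is dropped only if BOTH match.
combined = [d["id"] for d in docs if not (is_def(d) and created_today(d))]

print(separate)  # [1]
print(combined)  # [1, 2, 4]
```

Row 2 (type "def", created on another day) only survives in the second shape, which is the behaviour the question asks for.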

Assuming a setup like this:
PUT twitter
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"_doc": {
"properties": {
"date": {
"type": "date",
"format": "epoch_millis"
},
"createdAt": {
"type": "date",
"format": "epoch_millis"
},
"status": {
"type": "keyword"
},
"type": {
"type": "keyword"
}
}
}
}
}
and a sample doc like this (adjust values to test the query):
POST twitter/_doc/1
{
"date": 1536562800000, // 2018-09-10T07:00:00Z, i.e. the start of September 10, 2018 in UTC-7; replace with a timestamp from TODAY
"createdAt": 1536562799999,
"status": "confirmed",
"type": "def"
}
the following query should work:
GET twitter/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now/d",
"lte": "now/d"
}
}
},
{
"term": {
"status": "confirmed"
}
}
],
"must_not": [
{
"bool": {
"must": [
{
"range": {
"createdAt": {
"gte": "now/d",
"lte": "now/d"
}
}
},
{
"term": {
"type": "def"
}
}
]
}
}
]
}
}
}
}
}
This query runs in filter context, which I think is better for this scenario because it doesn't calculate a score. If you do want scores, just remove the outer bool and filter wrappers so that the inner bool sits directly under query.

Related

Multi match query with terms lookup searching multiple indices elasticsearch 6.x

All,
I am working on building a NEST 6.x query that takes a search term and looks in different fields in different indices.
This is what I have so far, but it is not returning the results I am expecting.
Please see the details below.
Indices used
dev-sample-search
user-agents-search
The search should work as follows:
The value in the query field (27921093) is searched against the fields agentNumber, customerName, fileNumber and documentid (these are all analyzed fields).
The search should limit the documents to the agentNumbers the user sampleuser#gmail.com has access to (sample data for user-agents-search is added below).
agentNumber, customerName, fileNumber, documentid and status are part of the index dev-sample-search.
The status field is defined as a keyword.
The fields in the user-agents-search index are all keywords.
Sample user-agents-search index data:
{
"id": "sampleuser#gmail.com",
"user": "sampleuser#gmail.com",
"agentNumber": [
"123.456.789",
"1011.12.13.14"
]
}
Sample dev-sample-search index data:
{
"agentNumber": "123.456.789",
"customerName": "Bank of america",
"fileNumber":"test_file_1123",
"documentid":"1234456789"
}
GET dev-sample-search/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "best_fields",
"query": "27921093",
"operator": "and",
"fields": [
"agentNumber",
"customerName",
"fileNumber",
"documentid^10"
]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"terms": {
"agentNumber": {
"index": "user-agents-search",
"type": "_doc",
"user": "sampleuser#gmail.com",
"path": "agentNumber"
}
}
},
{
"bool": {
"must_not": [
{
"term": {
"status": {
"value": "pending"
}
}
},
{
"term": {
"status": {
"value": "cancelled"
}
}
},
{
"term": {
"status": {
"value": "app cancelled"
}
}
}
],
"should": [
{
"term": {
"status": {
"value": "active"
}
}
},
{
"term": {
"status": {
"value": "terminated"
}
}
}
]
}
}
]
}
}
]
}
}
}
I see a couple of things that you may want to look at:
In the terms lookup query, "user": "sampleuser#gmail.com" should be "id": "sampleuser#gmail.com", since the lookup fetches the document by its id.
If at least one should clause in the filter clause has to match, set "minimum_should_match": 1 on the bool query containing the should clauses.
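A rough sketch of the filter with both suggestions applied, written as a Python dict so it can be dumped to JSON. The function name build_filter is hypothetical, and I have collapsed the separate term clauses into terms arrays for brevity, which is my own simplification rather than something the question requires.

```python
import json

def build_filter(user_id):
    """Return the filter clause with both suggested changes applied."""
    return {
        "bool": {
            "must": [
                {
                    "terms": {
                        "agentNumber": {
                            "index": "user-agents-search",
                            "type": "_doc",
                            "id": user_id,  # was "user" in the question
                            "path": "agentNumber",
                        }
                    }
                },
                {
                    "bool": {
                        "must_not": [
                            {"terms": {"status": ["pending", "cancelled", "app cancelled"]}}
                        ],
                        "should": [
                            {"terms": {"status": ["active", "terminated"]}}
                        ],
                        # Require at least one should clause to match.
                        "minimum_should_match": 1,
                    }
                },
            ]
        }
    }

print(json.dumps(build_filter("sampleuser#gmail.com"), indent=2))
```

The printed JSON can be pasted under "filter" in the search request from the question.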

Elasticsearch connect range and term to same array item

I have a user document with a field called experiences which is an array of objects, like:
{
"experiences": [
{
"end_date": "2017-03-02",
"is_valid": false
},
{
"end_date": "2015-03-02",
"is_valid": true
}
]
}
With this document I have to find users that have an experience whose end_date is within the last year and whose is_valid is true.
At the moment I have a bool query with two must clauses: a range on end_date and a term on is_valid.
{
"query": {
"bool": {
"must": [
{
"term": {
"experiences.is_valid": true
}
},
{
"range": {
"experiences.end_date": {
"gte": "now-1y",
"lte": "now"
}
}
}
]
}
}
}
The result is that this user is selected, because one experience has an end_date in the last year (the first one) and another experience has is_valid set to true.
Of course this is not what I need: end_date and is_valid must refer to the same object. How can this be done in Elasticsearch?
Mapping:
"experiences": {
"properties": {
"comment": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"end_date": {
"type": "date"
},
"id": {
"type": "long"
},
"is_valid": {
"type": "boolean"
},
"start_date": {
"type": "date"
}
}
}
You need to change the type of experiences to the nested data type.
Then apply a nested query:
{
"query": {
"nested": {
"path": "experiences",
"query": {
"bool": {
"must": [
{
"term": {
"experiences.is_valid": true
}
},
{
"range": {
"experiences.end_date": {
"gte": "now-1y",
"lte": "now"
}
}
}
]
}
}
}
}
}
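This is needed because of the way arrays of objects are flattened in Elasticsearch: with the default object type, the values of all objects are merged into multi-valued fields, so the link between end_date and is_valid inside one object is lost.
See the Elasticsearch documentation on the nested data type for more.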
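The flattening can be imitated in a few lines of plain Python. This is only a model of the matching logic, not how Elasticsearch stores anything internally; the fixed date window below is a made-up stand-in for "last year" so that the result is reproducible.

```python
from datetime import date

user = {"experiences": [
    {"end_date": date(2017, 3, 2), "is_valid": False},
    {"end_date": date(2015, 3, 2), "is_valid": True},
]}

# Hypothetical fixed window standing in for now-1y .. now.
start, end = date(2016, 6, 1), date(2017, 6, 1)

def in_window(d):
    return start <= d <= end

# Object type: values end up in independent multi-valued fields.
flat = {
    "experiences.end_date": [e["end_date"] for e in user["experiences"]],
    "experiences.is_valid": [e["is_valid"] for e in user["experiences"]],
}
flattened_match = (any(in_window(d) for d in flat["experiences.end_date"])
                   and any(flat["experiences.is_valid"]))

# Nested type: both conditions are checked against the same object.
nested_match = any(in_window(e["end_date"]) and e["is_valid"]
                   for e in user["experiences"])

print(flattened_match)  # True
print(nested_match)     # False
```

The flattened check matches because the date comes from the first experience and the flag from the second, which is exactly the unwanted hit described in the question.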

Function Score On Nested Object

I have this index blog with the following settings and mappings.
PUT /blog
{
"settings": {
"index": {
"number_of_shards": "1"
}
},
"mappings": {
"post": {
"_all": {
"enabled": false
},
"properties": {
"title": {
"type": "string"
},
"content": {
"type": "string"
},
"visitor": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"last_visit": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
}
}
}
I want to rank my posts based on relevancy and the visitor's last visit. I tried this query without success. It seems like the gauss function cannot get the value of the visitor's last_visit. How can I get this to work?
POST /blog/post/_search
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"visitor.last_visit": {
"origin": "now/d",
"offset": "3d",
"scale": "4d",
"decay": 0.5
}
},
"filter": {
"nested": {
"path": "visitor",
"query": {
"term": {
"visitor.id": "1"
}
}
}
}
}
]
}
}
}
Here is a query with a match for a name that uses a nested object that I had for a particular use case. I didn't use any date fields, but as I said, it does use a nested object. I used relevancy of distance along with a text match, so it's similar.
I used the answer to the question "Scoring documents by text match and distance" to structure my query, as it matched what I was trying to do.
GET dev_search_core_data/_search?size=200
{
"query": {
"bool": {
"should": [
{
"match": {
"NAME": "Amy Smith"
}
},
{
"bool": {
"must": [
{
"function_score": {
"query": {
"nested": {
"path": "LOCATION",
"query": {
"term": {
"LOCATION.SOME_IND": {
"value": true
}
}
}
}
},
"functions": [
{
"gauss": {
"LOCATION.COORDINATES": {
"origin": "-118.309, 34.041",
"scale": "50km",
"offset": "10km",
"decay": 0.5
}
}
}
]
}
}
]
}
}
]
}
}
}
I think the problem is with the structure of your query. Whenever I have problems, I run this command first to validate my queries and rule out syntax issues.
GET dev_search_core_data/_validate/query?explain
This was the result:
{
"valid": true,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"explanations": [
{
"index": "dev_search_core_data_b",
"valid": true,
"explanation": "filtered((NAME:amy NAME:smith) (+function score (ToParentBlockJoinQuery (filtered(LOCATION.SOME_IND:true)->random_access(_type:_LOCATION)),function=org.elasticsearch.index.query.functionscore.DecayFunctionParser$GeoFieldDataScoreFunction#274227b9)))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter#1012ada6)"
}
]
}
I also looked at the docs for an in-depth explanation of how the function score worked. You don't mention your version, but I'm using ES 1.6.
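For reference, here is a small Python sketch of the gauss decay curve as I understand it from the function score documentation, using the offset, scale and decay values from the question. The function name gauss_decay and the use of plain day counts instead of real dates are my own simplifications, and this does not address how the nested field is read.

```python
import math

def gauss_decay(distance_days, offset_days=3.0, scale_days=4.0, decay=0.5):
    """Gauss decay score for a value `distance_days` away from the origin."""
    # sigma^2 is chosen so that the score equals `decay` at offset + scale.
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    # Distances inside the offset are not penalised at all.
    effective = max(0.0, abs(distance_days) - offset_days)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))

for days in (0, 3, 7, 14):
    print(days, round(gauss_decay(days), 4))
```

A visit 7 days from the origin (offset 3d plus scale 4d) gets a score of 0.5, and anything within the 3 day offset keeps the full score of 1.0.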

Elastic search DSL Syntax equivalence for SQL statement

I'm trying to replicate the below query logic in an elastic search query but something's not right.
Basically, the query below returns one doc. I'd like either the first condition to apply ("name": "iphone") OR the more complex second one: (username = 'gogadget' AND status_type = '1' AND created_time between 4532564 AND 64323238). The nested bool must inside the should is meant to take care of the more complex condition. I should still see 1 doc if I change the outer match from "name": "iphone" to "name": "wrong value", but I get nothing when I do that. I'm not sure what is wrong.
The SQL Query is here below.
SELECT * from data_points
WHERE name = 'iphone'
OR
(username = 'gogadget' AND status_type = '1' AND created_time between 4532564 AND 64323238)
{
"size": 30,
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": "1",
"should": [
{
"bool": {
"must": [
{
"match": {
"username": "gogadget"
}
},
{
"terms": {
"status_type": [
"3",
"4"
]
}
},
{
"range": {
"created_time": {
"gte": 20140712,
"lte": 1405134711
}
}
}
]
}
}
],
"must": [],
"must_not": []
}
},
{
"match": {
"name": "iphone"
}
}
]
}
}
}
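A should clause gives you the OR: a document is returned when at least one of the should clauses matches.
You don't need to wrap the OR conditions in a must; doing so requires both of them to match.
The query should look like this: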
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"match": {
"username": "gogadget"
}
}, {
"terms": {
"status_type": [
"3",
"4"
]
}
}, {
"range": {
"created_time": {
"gte": 20140712,
"lte": 1405134711
}
}
}]
}
}, {
"match": {
"name": "iphone"
}
}]
}
}
}
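As a quick sanity check of the logic (plain Python, not a real search), here is a made-up document that satisfies the complex condition from the query but not the name match. The field values are hypothetical and chosen only to fit the ranges used in the query.

```python
# Hypothetical document: matches the username/status/range branch only.
doc = {"name": "wrong value", "username": "gogadget",
       "status_type": "3", "created_time": 20140800}

name_matches = doc["name"] == "iphone"
complex_matches = (doc["username"] == "gogadget"
                   and doc["status_type"] in ("3", "4")
                   and 20140712 <= doc["created_time"] <= 1405134711)

# Original shape: both entries sit in an outer "must", so they are AND-ed.
must_shape = complex_matches and name_matches

# Suggested shape: both entries sit in "should", so either one is enough.
should_shape = complex_matches or name_matches

print(must_shape, should_shape)  # False True
```

This mirrors the symptom in the question: with the outer must, changing the name to a wrong value makes the document disappear even though the other branch still matches.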

Elastic Search Relevance for query based on most matches

I have the following mapping:
"posts": {
"properties":{
"prop1": {
"type": "nested",
"properties": {
"item1": {
"type": "string",
"index": "not_analyzed"
},
"item2": {
"type": "string",
"index": "not_analyzed"
},
"item3": {
"type": "string",
"index": "not_analyzed"
}
}
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
Consider objects indexed like the following for this mapping:
{
"name": "Name1",
"prop1": [
{
"item1": "val1",
"item2": "val2",
"item3": "val3"
},
{
"item1": "val1",
"item2": "val5",
"item3": "val6"
}
]
}
And another object
{
"name": "Name2",
"prop1": [
{
"item1": "val2",
"item2": "val7",
"item3": "val8"
},
{
"item1": "val12",
"item2": "val9",
"item3": "val10"
}
]
}
Now say I want to search for documents whose prop1.item1 value is either "val1" or "val2". I also want the results sorted so that a document with both val1 and val2 gets a higher score than one with only one of "val1" or "val2".
I have tried the following query, but it doesn't seem to score based on the number of matches:
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"nested": {
"path": "prop1",
"filter": {
"or": [
{
"and": [
{"term": {"prop1.item1": "val1"}},
{"term": {"prop1.item2": "val2"}}
]
},
{
"and": [
{"term": {"prop1.item1": "val1"}},
{"term": {"prop1.item2": "val5"}}
]
},
{
"and": [
{"term": {"prop1.item1": "val12"}},
{"term": {"prop1.item2": "val9"}}
]
}
]
}
}
}
}
}
}
Although it should return both documents, the first document should have a higher score because it contains two of the things in the filter, whereas the second contains only one.
Can someone help with the right query to get results sorted by the number of matches?
The biggest problem with your query is that you are using a filter, so no score is calculated. Then you use a match_all query, which gives all documents a score of 1. Replace the filtered query with a query and use the bool query instead of the bool filter.
Hope that helps.
Scores aren't calculated on filters; use a nested query instead:
{
"query": {
"nested": {
"score_mode": "sum",
"path": "prop1",
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"match": {
"prop1.item1": "val1"
}
},
{
"match": {
"prop1.item2": "val2"
}
}]
}
},
{
"bool": {
"must": [{
"match": {
"prop1.item1": "val1"
}
},
{
"match": {
"prop1.item2": "val5"
}
}]
}
},
{
"bool": {
"must": [{
"match": {
"prop1.item1": "val12"
}
},
{
"match": {
"prop1.item2": "val9"
}
}]
}
}]
}
}
}
}
}
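To see why score_mode sum ranks the first document higher, here is a rough Python model using the two sample documents from the question. It assumes every matching nested object contributes a flat score of 1.0, which is a simplification; real scores depend on the similarity model. The helper name nested_sum_score is made up.

```python
docs = {
    "Name1": [{"item1": "val1", "item2": "val2"},
              {"item1": "val1", "item2": "val5"}],
    "Name2": [{"item1": "val2", "item2": "val7"},
              {"item1": "val12", "item2": "val9"}],
}

# The three (item1, item2) pairs from the should clauses.
wanted = [("val1", "val2"), ("val1", "val5"), ("val12", "val9")]

def nested_sum_score(nested_objects):
    # Assumption: each matching nested object scores exactly 1.0.
    return sum(1.0 for o in nested_objects
               if (o["item1"], o["item2"]) in wanted)

ranking = sorted(docs, key=lambda name: nested_sum_score(docs[name]),
                 reverse=True)
print(ranking)  # ['Name1', 'Name2']
```

Name1 has two matching nested objects and Name2 has one, so summing the per-object scores puts Name1 first.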
