ElasticSearch Return only if Nested Date in range and occurs more than X times - elasticsearch

I am trying to write a query that returns only the documents if they have X number of visits in a specific data range. Ie Only return guid-docA if it has 2 visits that are between "2017-01-01" and "2017-12-31"
**Example Data Documents**
{
"_id": "guid-docA"
"visits": [
{
"name": "Sarah",
"visitDate": "2013-02-27T00:00:00.000Z"
},
{
"name": "John",
"visitDate": "2017-02-27T00:00:00.000Z"
},
{
"name": "Jim",
"visitDate": "2017-12-27T00:00:00.000Z"
}
]
},
{
"_id": "guid-docB"
"visits": [
{
"name": "Brian",
"visitDate": "2013-02-27T00:00:00.000Z"
},
{
"name": "Kerri",
"visitDate": "2016-02-27T00:00:00.000Z"
},
{
"name": "Julia",
"visitDate": "2017-12-27T00:00:00.000Z"
}
]
}
I have tried script filter and can return the data if Visits count is greater than 2 but I haven't figured out how to get it if they have more than 1 (or X) visits in given year.
"filter": {
"script": {
"script": "_source.visits.size() > 1"
}
}
Not sure how to apply logic to it for the specified count.
Thank you in Advance.

First of all you need to be sure that visits field is nested Nested datatype
If so you can use a nested bool query with must clause and minimum_shold_mathc
Something like this
"query": {
"nested": {
"path": "visits",
"query": {
"bool": {
"must": [
{
"range": {
"visits.visitDate": {
"gte": "2017-01-01",
"lte": "2017-12-31"
}
}
}
],
"minimum_number_should_match": 2
}
}
}
}

Related

Elasticsearch is setting a 1.0 score to all results

I have the following json in Kibana's console:
GET /sgn-audit/_search
{
"from": "0",
"size": "10",
"sort": [
{
"updated_at": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"owner_type": "taskStatus"
}
},
{
"match": {
"owner_id": "1"
}
},
{
"match": {
"tenant_id": "1"
}
},
{
"range": {
"updated_at": {
"gte": "2021-02-11T00:00:00Z",
"lte": "2021-02-11T23:59:59Z"
}
}
}
]
}
}
}
However all results are returning with a 1.0 score and even results that don't match the queries above return as well, what should I do?
This bewildered me a few times too! Simply remove the whitespace between the URI and the search body:
GET /sgn-audit/_search
<---
{
...

Multi match query with terms lookup searching multiple indices elasticsearch 6.x

All,
I am working on building a NEST 6.x query that takes a serach term and looks in different fields in different indices.
This is the one I got so far but is not returning any results that I am expecting.
Please see the details below
Indices used
dev-sample-search
user-agents-search
The way the search should work is as follows.
The value in the query field(27921093) is searched against the
fields agentNumber, customerName, fileNumber, documentid(These are all
analyzed fileds).
The search should limit the documents to the agentNumbers the user
sampleuser#gmail.com has access to( sample data for
user-agents-search) is added below.
agentNumber, customerName, fileNumber, documentid and status are
part of the index dev-sample-search.
status field is defined as a keyword.
The fields in the user-agents-search index are all keywords
Sample user-agents-search index data:
{
"id": "sampleuser#gmail.com"",
"user": "sampleuser#gmail.com"",
"agentNumber": [
"123.456.789",
"1011.12.13.14"
]
}
Sample dev-sample-search index data:
{
"agentNumber": "123.456.789",
"customerName": "Bank of america",
"fileNumber":"test_file_1123",
"documentid":"1234456789"
}
GET dev-sample-search/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"multi_match": {
"type": "best_fields",
"query": "27921093",
"operator": "and",
"fields": [
"agentNumber",
"customerName",
"fileNumber",
"documentid^10"
]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"terms": {
"agentNumber": {
"index": "user-agents-search",
"type": "_doc",
"user": "sampleuser#gmail.com",
"path": "agentNumber"
}
}
},
{
"bool": {
"must_not": [
{
"terms": {
"status": {
"value": "pending"
}
}
},
{
"term": {
"status": {
"value": "cancelled"
}
}
},
{
"term": {
"status": {
"value": "app cancelled"
}
}
}
],
"should": [
{
"term": {
"status": {
"value": "active"
}
}
},
{
"term": {
"status": {
"value": "terminated"
}
}
}
]
}
}
]
}
}
]
}
}
}
I see a couple of things that you may want to look at:
In the terms lookup query, "user": "sampleuser#gmail.com", should be "id": "sampleuser#gmail.com",.
If at least one should clause in the filter clause should match, set "minimum_should_match" : 1 on the bool query containing the should clause

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstading on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure if you need the entire record or not, this solution gives every record that has language_id: 1 and has an office.data.office.id: 1 value.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index for proofing against false hits, one with different language_id and one with different office ids and only the matching one returned.
If you only need the office data, then that's a bit different but still solvable.

ElasticSearch - Get only matching nested objects with All Top level fields in search response

let say I have following Document:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
},
{
name: 'xyz',
surname: 'wef'
},
{
name: 'defg',
surname: 'pqr'
}
]
}
I want to Get only matching nested objects with All Top level fields in search response.
I mean If I search/filter for users with name 'abc', I want below response
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
}
]
}
How can I do that?
Reference : select matching objects from array in elasticsearch
If you're ok with having all root fields except the nested one and then only the matching inner hits in the nested field, then we can re-use the previous answer like this by specifying a slightly more involved source filtering parameter:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": { <---- this is where the magic happens
"_source": [
"name", "surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
Maybe late, I use nested sorting to limit element on my nested relation, here a example :
"sort": {
"ouverture.periodesOuvertures.dateDebut": {
"order": "asc",
"mode": "min",
"nested_filter": {
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
"nested_path": "ouverture.periodesOuvertures"
}
},
Since 5.5 ES (I think) you can use filter on nested query.
Here a example of nested query filter I use:
{
"nested": {
"path": "ouverture.periodesOuvertures",
"query": {
"bool": {
"must": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
],
"filter": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
]
}
}
}
}
Hope this can help ;)
Plus if you ES is not in the last version (5.5) inner_hits could slow your query Including inner hits drastically slows down query results
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-inner-hits.html#nested-inner-hits-source
"inner_hits": {
"_source" : false,
"stored_fields" : ["name", "surname"]
}
but you may need to change mapping to set those fields as "stored_fields" , otherwise you can use
"inner_hits": {}
to get a result that not that perfect.
You can make such a request, but the response will have internal fields starting with _
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {},
"query": {
"bool": {
"must": [
{ "match": { "users.name": "abc" }}
]
}
}
}
}
}
In one of my projects, My expectation was to retrieve unique conversation messages text(inner fields like messages.text) having specific tags. So instead of using inner_hits, I used aggregation like below,
final NestedAggregationBuilder aggregation = AggregationBuilders.nested("parentPath", "messages").subAggregation(AggregationBuilders.terms("innerPath").field("messages.tag"));
final NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.addAggregation(aggregation).build();
final Aggregations aggregations = elasticsearchOperations.search(searchQuery, Conversation.class).getAggregations();
final ParsedNested parentAgg = (ParsedNested) aggregations.asMap().get("parentPath");
final Aggregations childAgg = parentAgg.getAggregations();
final ParsedStringTerms childParsedNested = (ParsedStringTerms) childAgg.asMap().get("innerPath");
// Here you will get unique expected inner fields in key part.
Map<String, Long> agg = childParsedNested.getBuckets().stream().collect(Collectors.toMap(Bucket::getKeyAsString, Bucket::getDocCount));
I use the following body to get that result (I have set the full path to the values):
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
Also another way exists:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": false,
"docvalue_fields": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
See results in inner_hits of the result hits.
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/inner-hits.html#nested-inner-hits-source

Combining and with or conditions

I stuck with a query which has to combine some conditions.
this properties of the catalog are the following
_id:integer
parentID: integer
path: string
level: integer
i have absolutely no clue how to combine them, so that the query returns what I need.
a) _id has to be one of a given list ("_id": ["7","10"]) OR
b) parentID has to be of a given integer ("_parentID": "1") OR
c) path has to match a special pattern ("regexp": {"path": "/foobar.*"}) AND level has be between two integer ("range": {"level": {"gte": 2, "lte": 3 } })
Additionaly all entries have to be from one defined catalog
I will not write down all my attempts. I tried to use bool query with must and should, but this does not apply c):
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"type": {
"value": "category"
}
}
],
"should": [
{
"regexp": {
"path": "/foobar.*"
}
},
{
"range": {
"level": {
"gte": 2,
"lte": 3
}
}
},
{
"term": {
"_id": [
"7",
"10"
]
}
}
]
}
}
}
}
}
what is the best way to combine and and or conditions? i am kind of lost.
I think this should be pretty darn close to what you need.
GET devdev/alert/_search
{
"filter": {
"or": {
"filters": [
{
"terms": {
"_id": [
"eee75eJpRua4HasVzz0PeA",
"VALUE2"
]
}
},
{
"term": {
"_parentID": "SE.SE.0000"
}
},
{
"and": {
"filters": [
{
"term": {
"regexp": "foobar"
}
},
{
"range": {
"level": {
"from": 2,
"to": 3
}
}
}
]
}
}
]
}
}
}

Resources