Conditions on array of objects in ElasticSearch

I'm still new to ES, so apologies if this is obvious. Let's say I have a document with a structure like this:
{..., 'objectArray': [{'a': 3, 'b': 2}, {'a': 0, 'b': 4}]}
in which the objectArray property is an array of objects. How would I query for documents in this index that have a single object within objectArray with a = 3 and b = 4? The above document would not be in the result set, but
{..., 'objectArray': [{'a': 3, 'b': 2}, {'a': 3, 'b': 4}]}
would be.
If you could show an example in NEST, or just name the type of query so I can read about it, that would be awesome. Thanks so much!

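Assuming objectArray is mapped as type nested, you can use a nested query. With the default object mapping the array is flattened, so the link between a and b inside one object is lost and the first document would match as well.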
Mapping
PUT idx_nested
{
"mappings": {
"properties": {
"objectArray":{
"type": "nested"
}
}
}
}
Documents
POST idx_nested/_doc
{
"objectArray": [
{
"a": 3,
"b": 2
},
{
"a": 0,
"b": 4
}
]
}
POST idx_nested/_doc
{
"objectArray": [
{
"a": 3,
"b": 2
},
{
"a": 3,
"b": 4
}
]
}
Query
GET idx_nested/_search
{
"query": {
"nested": {
"path": "objectArray",
"query": {
"bool": {
"must": [
{
"term": {
"objectArray.a": {
"value": 3
}
}
},
{
"term": {
"objectArray.b": {
"value": 4
}
}
}
]
}
}
}
}
}
Response:
"hits": [
{
"_index": "idx_nested",
"_id": "4j5VxoQBiZR2Tvxo_zXz",
"_score": 2,
"_source": {
"objectArray": [
{
"a": 3,
"b": 2
},
{
"a": 3,
"b": 4
}
]
}
}
]
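
To see why the nested mapping matters here, the following Python sketch mimics the two matching behaviours in plain code. It only illustrates the semantics and is not how Elasticsearch is implemented; the function names matches_flattened and matches_nested are made up for this example.

```python
def matches_flattened(doc, conditions):
    """Mimic a plain object array: each field is checked against the
    values collected from all objects, so pairs can mix objects."""
    objects = doc["objectArray"]
    return all(
        any(obj.get(field) == value for obj in objects)
        for field, value in conditions.items()
    )


def matches_nested(doc, conditions):
    """Mimic a nested query: all conditions must hold on the same object."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in doc["objectArray"]
    )


doc1 = {"objectArray": [{"a": 3, "b": 2}, {"a": 0, "b": 4}]}
doc2 = {"objectArray": [{"a": 3, "b": 2}, {"a": 3, "b": 4}]}
conditions = {"a": 3, "b": 4}

print(matches_flattened(doc1, conditions))  # True: a=3 and b=4 come from different objects
print(matches_nested(doc1, conditions))     # False
print(matches_nested(doc2, conditions))     # True
```

The first result is the false positive the question wants to avoid; the nested variant only accepts the second document.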

Related

Interval search for messages in Elasticsearch

I need to split the messages found by a search into intervals. Can this be done with Elasticsearch?
For example, there are 10 messages and they need to be divided into 3 intervals. It should look like this:
[0,1,2,3,4,5,6,7,8,9] => {[0,1,2], [3,4,5,6], [7,8,9]}
I'm only interested in the start of each interval and its size, for example: {[count 3, min 0], [count 4, min 3], [count 3, min 7]}
Example:
PUT /test_index
{
"mappings": {
"properties": {
"id": {
"type": "long"
}
}
}
}
POST /test_index/_doc/0
{
"id": 0
}
POST /test_index/_doc/1
{
"id": 1
}
POST /test_index/_doc/2
{
"id": 2
}
POST /test_index/_doc/3
{
"id": 3
}
POST /test_index/_doc/4
{
"id": 4
}
POST /test_index/_doc/5
{
"id": 5
}
POST /test_index/_doc/6
{
"id": 6
}
POST /test_index/_doc/7
{
"id": 7
}
POST /test_index/_doc/8
{
"id": 8
}
POST /test_index/_doc/9
{
"id": 9
}
The values need to be divided into 3 intervals with (roughly) the same number of elements in each interval:
{
...
"aggregations": {
"result": {
"buckets": [
{
"min": 0.0,
"doc_count": 3
},
{
"min": 3.0,
"doc_count": 4
},
{
"min": 7.0,
"doc_count": 3
}
]
}
}
}
There is a similar aggregation, variable_width_histogram:
GET /test_index/_search?size=0
{
"aggs": {
"result": {
"variable_width_histogram": {
"field": "id",
"buckets": 3
}
}
},
"query": {
"match_all": {}
}
}
But variable_width_histogram groups documents by how close their id values are, not by the number of documents in each bucket.

Assuming your mapping is like:
{
"some_numeric_field" : {"type" : "integer"}
}
Then you can build histograms out of it with fixed interval sizes:
POST /my_index/_search?size=0
{
"aggs": {
"some_numeric_field": {
"histogram": {
"field": "some_numeric_field",
"interval": 7
}
}
}
}
Results:
{
...
"aggregations": {
"some_numeric_field": {
"buckets": [
{
"key": 0.0,
"doc_count": 7
},
{
"key": 7.0,
"doc_count": 7
},
{
"key": 14.0,
"doc_count": 7
}
]
}
}
}
To get the individual values inside each bucket, add a sub-aggregation such as "top_hits" or "terms".
Without knowing more about your data, I cannot help much further.
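
A fixed-interval histogram only yields equal counts when the values are evenly spread. If the sorted values can be fetched to the client, one possible workaround is to cut the sorted list into near-equal chunks there. The Python sketch below is an illustration under that assumption; the helper equal_count_intervals is hypothetical and not an Elasticsearch feature. For the ten ids from the question it reproduces the requested counts and minimums.

```python
def equal_count_intervals(values, k):
    """Split sorted values into k contiguous chunks of near-equal size
    and report each chunk's minimum and element count."""
    ordered = sorted(values)
    n = len(ordered)
    # Cut points near i*n/k, computed with integer arithmetic.
    cuts = [(i * n + k // 2) // k for i in range(k + 1)]
    return [
        {"min": ordered[start], "doc_count": end - start}
        for start, end in zip(cuts, cuts[1:])
        if end > start  # skip empty chunks when k > n
    ]


print(equal_count_intervals(range(10), 3))
# [{'min': 0, 'doc_count': 3}, {'min': 3, 'doc_count': 4}, {'min': 7, 'doc_count': 3}]
```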

How to return 0 for requested data in aggregation if no documents matched

I have an index of users with structure:
User
book_ids:[] //array of book ids
books : [{
book_id:
name:
}] //array of books
I want to create a query that returns a map from book id to the number of users who have read that book.
The result should also include books that have not been read by any user.
Here is a very simplified version of the query:
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"books",
"query": {
"bool": {
"must": {
"terms": {
"books.book_id": [100,200] //book ids that provided as a parameter
}
}
}
}
}
}
]
}
},
"aggs":{
"books":{
"terms":{
"field":"book_ids",
"include":[100,200] //book ids that provided as a parameter
}
}
},
"size":0
}
The result of the query will be
buckets: [
{key: 100, doc_count: 53}
]
So there are 53 users who have read the book with id 100, but no user has read the book with id 200 (which is why it is missing from the response).
The question is: how can I change the query to get the following result?
buckets: [
{key: 100, doc_count: 53},
{key: 200, doc_count: 0}
]

A terms aggregation doesn't add a bucket to the result if the given term does not exist in the index.
You can use a filters aggregation for this purpose:
{
"query": {
...
},
"aggs": {
"books": {
"filters": {
"filters": {
"100": { "match": { "book_ids": 100 } },
"200": { "match": { "book_ids": 200 } }
}
}
}
},
"size": 0
}
To reproduce:
# index some book ids; id 5 is deliberately missing
POST /_bulk
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [1, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [4, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [6, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [7, 2, 3] }
GET /72201832/_search
{
"size": 0,
"aggs": {
"books": {
"filters": {
"filters": {
"1": { "term": {"book_ids": "1"} },
"2": { "term": {"book_ids": "2"} },
"3": { "term": {"book_ids": "3"} },
"4": { "term": {"book_ids": "4"} },
"5": { "term": {"book_ids": "5"} },
"6": { "term": {"book_ids": "6"} },
"7": { "term": {"book_ids": "7"} }
}
}
}
}
}
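
Because the book ids arrive as a parameter, the filters aggregation body can be generated instead of written by hand. The Python sketch below is one way this might look; build_filters_agg is a hypothetical helper, the default field name book_ids is taken from the question, and the query part of the original request is left out.

```python
import json


def build_filters_agg(book_ids, field="book_ids"):
    """Build a filters aggregation with one named term filter per id,
    so every requested id gets a bucket even when nothing matches."""
    return {
        "size": 0,
        "aggs": {
            "books": {
                "filters": {
                    "filters": {
                        str(book_id): {"term": {field: book_id}}
                        for book_id in book_ids
                    }
                }
            }
        },
    }


print(json.dumps(build_filters_agg([100, 200]), indent=2))
```

The resulting dictionary can be sent as the search body by whichever client is in use.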

elasticSearch count multiple

I have two profiles, "A" and "B", and both have events in Elasticsearch.
This is what the data looks like, for example:
{hits: [
{tag:"A"},
{tag:"B"},
{tag:"B"}
]}
I want to count how many events have tag "A" and how many have tag "B" in one request.
I've tried this, but it counts them together as 3, whereas I want A:1 and B:2:
GET forensics/_count
{
"query": {
"terms": {
"waas_tag": ["A","B"]
}
}
}

You can use the term vectors API to get information about the terms of a particular field in a single document.
Here is a working example with index data and the response.
Index data:
{
"waas_tag": [
{
"tag": "A"
},
{
"tag": "B"
},
{
"tag": "B"
}
]
}
Term vectors API request (replace <index> with your index name):
GET <index>/_termvectors/1?fields=waas_tag.tag
Response:
"term_vectors": {
"waas_tag.tag": {
"field_statistics": {
"sum_doc_freq": 2,
"doc_count": 1,
"sum_ttf": 3
},
"terms": {
"a": {
"term_freq": 1, // note this
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 1
}
]
},
"b": {
"term_freq": 2, // note this
"tokens": [
{
"position": 101,
"start_offset": 2,
"end_offset": 3
},
{
"position": 202,
"start_offset": 4,
"end_offset": 5
}
]
}
}
}
}

In the end I found a solution that uses _msearch instead of _count. Each empty header line {} means "use the index from the URL", i.e. {"index": "forensics"}, and each query has to stay on a single line:
GET forensics/_msearch
{}
{"query":{"term":{"waas_tag":"A"}}}
{}
{"query":{"bool":{"must":[{"term":{"waas_tag":"B"}},{"range":{"@timestamp":{"gte":"now-20d","lt":"now/s"}}}]}}}

You can use a filters aggregation to get the count for each tag in a single query, without using the _msearch endpoint. This query should work:
{
"size": 0,
"aggs": {
"counts": {
"filters": {
"filters": {
"CountA": {
"term": {
"waas_tag": "A"
}
},
"CountB": {
"term": {
"waas_tag": "B"
}
}
}
}
}
}
}
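
On the client side, the per-tag counts can then be read from the named buckets. The Python sketch below assumes the keyed bucket layout that a named filters aggregation returns; counts_from_filters_response is a hypothetical helper and sample_response is hand-written from the three events in the question, not real output.

```python
def counts_from_filters_response(response, agg_name="counts"):
    """Map each named filter bucket to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}


# Hand-written sample in the shape of a named filters aggregation response,
# using the three events from the question (one "A", two "B").
sample_response = {
    "aggregations": {
        "counts": {
            "buckets": {
                "CountA": {"doc_count": 1},
                "CountB": {"doc_count": 2},
            }
        }
    }
}

print(counts_from_filters_response(sample_response))  # {'CountA': 1, 'CountB': 2}
```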

Search for documents with the same value in Elasticsearch

I have a schema that looks something like this:
{
"mappings": {
"entity": {
"properties": {
"a": {
"type": "text"
},
"b": {
"type": "text"
}
}
}
}
}
I want to find all values of b belonging to entities whose value of a is shared by two or more entities. For example, querying against:
[{"a": "a1", "b": "b1"},
{"a": "a1", "b": "b2"},
{"a": "a2", "b": "b3"}]
Should return b1 and b2.

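You can do a terms aggregation on the a field with a min_doc_count of 2 and then add a top_hits sub-aggregation to retrieve the matching b values. Note that a terms aggregation needs an aggregatable field, so with the text mapping above, a would need a keyword sub-field or fielddata enabled: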
{
"size": 0,
"aggs": {
"dups": {
"terms": {
"field": "a",
"min_doc_count": 2
},
"aggs": {
"b_hits": {
"top_hits": {
"_source": "b"
}
}
}
}
}
}
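
The logic of this aggregation can be pictured in plain Python: group by a, keep groups with at least two members, and list their b values. This is only a sketch of the idea on the sample data from the question; b_values_with_shared_a is a made-up name. Unlike this sketch, top_hits returns only three hits per bucket by default unless size is set.

```python
from collections import defaultdict


def b_values_with_shared_a(entities, min_doc_count=2):
    """Group entities by a and keep b values of groups with enough members."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["a"]].append(entity["b"])
    return [b for bs in groups.values() if len(bs) >= min_doc_count for b in bs]


entities = [
    {"a": "a1", "b": "b1"},
    {"a": "a1", "b": "b2"},
    {"a": "a2", "b": "b3"},
]
print(b_values_with_shared_a(entities))  # ['b1', 'b2']
```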

elastic search: check for two values (term) in same field (nested) give no result (with one value gives results)

I've got a problem with ES when I try to check for 2 (or more) values that exist in a nested document.
First, here is the data I put into ES, followed by the exact case that did not work.
Mapping
PUT /test
{
"mappings": {
"doc": {
"properties": {
"attributes": {
"type": "nested"
}
}
}
}
}
Test data
POST /test/doc/1
{ "attributes": [{"id": 1}, {"id": 2}, {"id": 3}] }
POST /test/doc/2
{ "attributes": [{"id": 3}, {"id": 5}] }
POST /test/doc/3
{ "attributes": [{"id": 5}] }
Request
POST /test/doc/_search
{
"query": {
"nested": {
"path": "attributes",
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"attributes.id": 3
}
}
]
}
}
}
}
}
}
}
Result that works (only one attribute requested)
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "doc",
"_id": "2",
"_score": 1,
"_source": {
"attributes": [
{
"id": 3
},
{
"id": 5
}
]
}
},
{
"_index": "test",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"attributes": [
{
"id": 1
},
{
"id": 2
},
{
"id": 3
}
]
}
}
]
}
}
Now I try to check against 2 attribute ids and get an empty result.
Request (2 attributes)
POST /test/doc/_search
{
"query": {
"nested": {
"path": "attributes",
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"attributes.id": 3
}
},
{
"term": {
"attributes.id": 5
}
}
]
}
}
}
}
}
}
}
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
As the result of the single-attribute request shows, there is a document with attribute ids 3 and 5 (document 2), yet the result is now empty.
EDIT:
The solution to my problem was to not use a nested object!
PUT /test
{
"mappings": {
"doc": {
"properties": {
"attributes": {
"type": "integer"
}
}
}
}
}
POST /test/doc/1
{ "attributes": [1, 2, 3] }
POST /test/doc/2
{ "attributes": [3, 5] }
POST /test/doc/3
{ "attributes": [5] }
POST /test/doc/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"attributes": 3
}
},
{
"term": {
"attributes": 5
}
}
]
}
}
}

This is the correct behavior for a nested mapping. With a nested mapping, each nested object is indexed as a separate hidden document, and the query is evaluated against each nested object individually, not against the whole collection. Your query therefore asks for a single attribute whose id is both 3 and 5, which can never match. For your scenario it is better to use an inner object mapping. This article explains when inner objects and nested objects should be used, based on a very similar example: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
Below is a rough picture of how the data is stored for an inner object and for a nested object. People often use a nested mapping for collections without knowing the consequences of that decision, so I think you should rethink your approach.
An inner object will generate something like this:
attributes.id [1,2,3]
attributes.id [3,5]
attributes.id [5]
whereas nested will generate something like this:
attributes.id [{"id": 1}, {"id": 2}, {"id": 3}]
attributes.id [{"id": 3}, {"id": 5}]
attributes.id [{"id": 5}]
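
The difference can be mimicked in a few lines of Python. The sketch below is not Elasticsearch code and both function names are made up. The first function models one nested query with several must terms, as in the failing request. The second models checking each id independently, which is what the flat integer mapping does and, as far as I know, also what wrapping each term in its own nested query inside a bool must would do.

```python
def any_object_has_all_ids(attributes, ids):
    """One nested query with several must terms: a single nested object
    would need to carry every id at once."""
    return any(all(obj["id"] == i for i in ids) for obj in attributes)


def each_id_in_some_object(attributes, ids):
    """One check per id: each id may match a different object."""
    return all(any(obj["id"] == i for obj in attributes) for i in ids)


docs = {
    1: [{"id": 1}, {"id": 2}, {"id": 3}],
    2: [{"id": 3}, {"id": 5}],
    3: [{"id": 5}],
}

print([k for k, attrs in docs.items() if any_object_has_all_ids(attrs, [3, 5])])  # []
print([k for k, attrs in docs.items() if each_id_in_some_object(attrs, [3, 5])])  # [2]
```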
