How to write a query on an item in a collection in Elasticsearch - elasticsearch

Here is my problem.
I have documents with many states, and I'm looking for a given state at a given time. Sample:
{
"_index" : "toto-2022",
"_type" : "_doc",
"_id" : "9eba6fbbe4284f9d92ad183f5a4bab43",
"_score" : 1.0,
"_source" : {
"conversationId" : "9eba6fbbe4284f9d92ad183f5a4bab43",
"@timestamp" : "2022-11-10T14:17:53.372+0100",
"etats" : [
{
"date" : "2022-11-09T12:32:30.091+0100",
"niveauVisibilite" : 1,
"etat" : "ROUTEE",
"commentaire" : "Routée",
"rang" : 1
},
{
"date" : "2022-11-10T07:07:55.351+0100",
"niveauVisibilite" : 1,
"etat" : "TERMINEE",
"commentaire" : "Terminée",
"rang" : 2
}
]
}
},
{
"_index" : "toto-2022",
"_type" : "_doc",
"_id" : "476ffa93b550497da26380348d97e199",
"etats" : [
{
"date" : "2022-11-10T08:03:17.869+0100",
"niveauVisibilite" : 1,
"etat" : "ROUTEE",
"commentaire" : "Routée",
"rang" : 1
},
{
"date" : "2022-11-10T07:05:23.669+0100",
"niveauVisibilite" : 1,
"etat" : "TERMINEE",
"commentaire" : "Terminée",
"rang" : 2
}
]
}
I only want the document that has the state ROUTEE on 2022-11-09.
My query is:
GET /toto-2022/_search
{
"query":{
"bool":{
"must": [
{
"match": {"etats.etat": "ROUTEE"}
}
],
"filter":[
{"range" : {"etats.date" : {"gte" : "2022-11-09T00:00:00.000+0100","lt" : "2022-11-09T23:59:59.000+0100"}}}
]
}
}
}
But both documents are selected.
I would like to get only the first document.
Thank you for your response.

First, you need to define etats as a nested field. Elasticsearch flattens arrays of objects, so the relationship between the fields of each object in the array is lost. A nested field preserves each object in the array as a separate hidden document. You can read about nested fields here.
Once the mapping is changed, you can use a nested query to get the data. The inner_hits option returns the matched inner documents.
{
"query": {
"nested": {
"path": "etats",
"inner_hits": {},
"query": {
"bool": {
"filter":[
{"range" : {"etats.date" : {"gte" : "2022-11-09T00:00:00.000+0100","lt" : "2022-11-09T23:59:59.000+0100"}}}
],
"must": [
{
"match": {"etats.etat": "ROUTEE"}
}
]
}
}
}
}
}
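To see why the object (flattened) mapping can select too much, here is a minimal Python sketch that imitates the two matching behaviours in memory. It does not talk to Elasticsearch; the document names doc_a and doc_b and the dates of doc_b are hypothetical, chosen so that doc_b has a state on 2022-11-09 that is not its ROUTEE state.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse(ts):
    """Parse timestamps written like 2022-11-09T12:32:30.091+0100."""
    return datetime.strptime(ts, FMT)


START = parse("2022-11-09T00:00:00.000+0100")
END = parse("2022-11-09T23:59:59.000+0100")


def in_range(ts):
    return START <= parse(ts) < END


# Hypothetical data: doc_a is ROUTEE on 2022-11-09, doc_b is only TERMINEE on that day.
docs = {
    "doc_a": [
        {"etat": "ROUTEE", "date": "2022-11-09T12:32:30.091+0100"},
        {"etat": "TERMINEE", "date": "2022-11-10T07:07:55.351+0100"},
    ],
    "doc_b": [
        {"etat": "ROUTEE", "date": "2022-11-08T08:03:17.869+0100"},
        {"etat": "TERMINEE", "date": "2022-11-09T07:05:23.669+0100"},
    ],
}


def flattened_match(etats):
    # Object mapping: values are pooled per field name, so each condition
    # may be satisfied by a different object of the array.
    return any(e["etat"] == "ROUTEE" for e in etats) and any(
        in_range(e["date"]) for e in etats
    )


def nested_match(etats):
    # Nested mapping: both conditions must hold on the same object.
    return any(e["etat"] == "ROUTEE" and in_range(e["date"]) for e in etats)


flattened_hits = [name for name, etats in docs.items() if flattened_match(etats)]
nested_hits = [name for name, etats in docs.items() if nested_match(etats)]

print(flattened_hits)  # ['doc_a', 'doc_b']
print(nested_hits)     # ['doc_a']
```

With the pooled values, doc_b passes because its ROUTEE state and its 2022-11-09 date come from two different objects; the nested variant rejects it.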

Related

How to get different aggregations for different params in bool in Elasticsearch

Document structure:
{
"_index" : "admin",
"_type" : "_doc",
"_id" : "9rEy94EB7k-V-3UYmchn",
"_source" : {
"entity_title" : "Title CPP",
"entity_type" : "type1",
"entity_score" : 185346,
"entity" : {
"customer_id" : "cid1",
"customer_name" : "cname1",
}
}
}
{
"_index" : "admin",
"_type" : "_doc",
"_id" : "9rEy94EB7k-V-3UYmchn",
"_source" : {
"entity_title" : "Title APP",
"entity_type" : "type1",
"entity_score" : 12,
"entity" : {
"customer_id" : "cid2",
"customer_name" : "cname2"
}
}
}
My query
GET /admin/_search
{
"size": 0,
"query" : {
"bool" : {
"should" : [
{
"query_string" : {"default_field" : "entity_title", "query" : "app*"}
},
{
"fuzzy": {"entity_title": {"value": "app"}}
}
]
}
},
"aggs": {
"by_entity_type": {
"terms":{
"field":"entity_type",
"size": 4 <total number of entity types>
},
"aggs": {
"by_top_score":{"top_hits":{"size":10, "sort": {"entity_score": {"order" : "desc", "mode" : "avg"}}}}
}
}
}
}
I need to:
Aggregate all search results by entity_type.
Sort the results of the matched query (query_string) by _score.
Sort the results of the fuzzy search by entity_score.
Please help me fetch this, either as separate aggregations or in the same one.
Thanks.

Bool Filter not showing just the filtered data in Elastic Search

I have an index "tag_nested" which holds data of the following type:
{
"jobid": 1,
"table_name": "table_A",
"Tags": [
{
"TagType": "WorkType",
"Tag": "ETL"
},
{
"TagType": "Subject Area",
"Tag": "Telecom"
}
]
}
When I filter the data on "Tag" and "TagType" with the following query:
POST /tag_nested/_search
{
"query": {
"bool": {
"must": {"match_all": {}},
"filter": [
{"term": {
"Tags.Tag.keyword": "ETL"
}},
{"term": {
"Tags.TagType.keyword": "WorkType"
}}
]
}
}
}
It gives me the following output. The query does filter out the documents that don't match, but for each matching document it shows all of the "Tags" instead of just the one that matched the filter:
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
},
{
"TagType" : "Subject Area",
"Tag" : "Telecom"
}
]
}
}
Instead of the above result, I want my output to look like this:
{
"_index" : "tag_nested",
"_type" : "_doc",
"_id" : "9",
"_score" : 1.0,
"_source" : {
"jobid" : 1,
"table_name" : "table_A",
"Tags" : [
{
"TagType" : "WorkType",
"Tag" : "ETL"
}
]
}
}
This has already been answered here, here and here.
TL;DR: you'll need to make your Tags field of type nested, reindex your data, and use inner_hits to fetch only the applicable tag group.
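As a rough sketch of what that TL;DR could look like, the Python snippet below only builds and prints the two request bodies; it does not send them. It assumes a typeless (7.x-style) mapping and that the Tag and TagType strings keep their dynamically created keyword sub-fields, so both are assumptions to check against your own index.

```python
import json

# Assumed mapping body for a new index: Tags becomes a nested field.
# Sub-fields (Tag, TagType) are left to dynamic mapping here.
mapping_body = {
    "mappings": {
        "properties": {
            "Tags": {"type": "nested"}
        }
    }
}

# Nested query: both term filters must match inside the same Tags object.
# inner_hits asks for the matching Tags objects to be returned per hit.
search_body = {
    "_source": ["jobid", "table_name"],  # leave the full Tags array out of _source
    "query": {
        "nested": {
            "path": "Tags",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"Tags.Tag.keyword": "ETL"}},
                        {"term": {"Tags.TagType.keyword": "WorkType"}},
                    ]
                }
            },
            "inner_hits": {},
        }
    },
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The matching tag objects are then found under the inner_hits section of each hit rather than in _source, so the client has to read them from there.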

Elasticsearch returns 0.0 for metrics sum aggregation

Elasticsearch returns 0.0 for a sum metrics aggregation. The expected output is the sum of the metric probe_http_duration_seconds.
Elasticsearch version: 7.1.1
Query used for aggregation:
GET some_metric/_search
{
"query": {
"bool": {
"must": [
{
"range": { "time": { "gte" : "now-1m", "lt": "now" } }
},
{
"match": {"name": "probe_http_duration_seconds"}
},
{
"match": {"labels.instance": "some-instance"}
}
]
}
},
"aggs" : {
"sum_is" : { "sum": { "field" : "value" } }
}
}
The above query returns documents, followed by:
"aggregations" : {
"sum_is" : {
"value" : 0.0
}
}
Each document in the index looks like:
{
"_index" : "some_metric-2019.12.03-000004",
"_type" : "_doc",
"_id" : "_wCjz24Bk6FPpmW1lC31",
"_score" : 5.3475914,
"_source" : {
"name" : "probe_http_duration_seconds",
"time" : 1575441630181,
"value" : 0,
"labels" : {
"__name__" : "probe_http_duration_seconds",
"app" : "some-events",
"i" : "some_metric",
"instance" : "some-instance",
"job" : "someproject-k8s-service",
"kubernetes_name" : "some-events",
"kubernetes_namespace" : "deploytest",
"phase" : "connect",
"t" : "type",
"v" : "1"
}
}
}
When I change must to should in the query, I get:
"aggregations" : {
"sum_is" : {
"value" : 1.5389155527088604E16
}
}
The index dynamic mapping looks something like this:
"mappings" : {
"dynamic_templates" : [
{
"strings" : {
"unmatch" : "*seconds*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
},
{
"to_float" : {
"match" : "*seconds*",
"mapping" : {
"type" : "float"
}
}
}
],
However, our requirement is to get results that match all of the clauses in the query.
For metrics aggregations, Elasticsearch converts everything to double, but that still doesn't explain a result of zero.
Any pointers would be helpful. Thanks for your attention.
NOTE: I see that the value field is zero in the example document. Maybe I made a mistake while drafting/editing.
Below is the result for the past 2 minutes. It shows that the value field actually holds floats.
Query:
GET some_metric/_search?size=3
{
"_source": ["value"],
"query": {
"bool": {
"must": [
{
"range": { "time": { "gte" : "now-2m", "lt": "now" } }
},
{
"match": {"name": "probe_http_duration_seconds"}
},
{
"match": {"labels.instance": "some-instance"}
}
]
}
}
}
Result:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 14.551308,
"hits" : [
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "7oog0G4Bk6EPplW1ibD1",
"_score" : 14.551308,
"_source" : {
"value" : 0.040022423
}
},
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "74og0G4Bk6EPplW1ibD1",
"_score" : 14.551308,
"_source" : {
"value" : 3.734E-5
}
},
{
"_index" : "some_metric-2019.12.04-000005",
"_type" : "_doc",
"_id" : "A4og0G4Bk6EPplW1ibH1",
"_score" : 14.551308,
"_source" : {
"value" : 0.015694122
}
}
]
}
}
What you see is just what you indexed in the source document; ES never modifies your source document. However, if the value field is mapped as long, as I suspect, ES indexes each float value as a long and not as a float.
This usually happens when the very first document to be indexed has an integer value, such as 0.
You can either reindex your data with the proper mapping or, since you have time-based indexes, just modify the dynamic template so that tomorrow's index is created correctly.
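The effect can be imitated in a few lines of Python. This is only a sketch: it assumes the value field was dynamically mapped as long and that fractions are truncated when a float is indexed into it (the default coerce behaviour). The template name value_as_float at the end is hypothetical, shown as one possible shape for the extra dynamic template.

```python
# The three values returned in _source by the question's second query.
source_values = [0.040022423, 3.734e-05, 0.015694122]

# What a long field would hold for them if fractions are truncated at index time.
indexed_as_long = [int(v) for v in source_values]

print(indexed_as_long)       # [0, 0, 0]
print(sum(indexed_as_long))  # 0, which the sum aggregation reports as 0.0
print(sum(source_values))    # the sum one would expect from a float field

# Hypothetical extra dynamic template that maps the "value" field to float,
# so that a first document with value 0 no longer produces a long mapping.
value_template = {
    "value_as_float": {
        "match": "value",
        "mapping": {"type": "float"},
    }
}
```

Since _source is returned untouched, the search hits still show the original floats even though the aggregation works on the truncated indexed values.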

range query not working as intended [elasticsearch]

I am executing a simple range query, but an empty result is returned, even though I know that there are many records/documents that satisfy the query.
Below are the 3 queries I have tried
(the third one is the intended query).
1)
"query": {
"range" : {
"endTime" : {
"gte" : 1559076400.0
}
}
}
2)
"query": {
"bool": {
"must": [
{"range" : {
"endTime" : {
"gte" : 1559076401.0
}
}
}
]
}
}
3)
"query": {
"bool": {
"filter": [
{"range" : {
"startTime" : {
"gt" : 1356873300.0
}
}
},
{"range" : {
"endTime" : {
"gte" : 1559076401.0
}
}
}
]
}
}
All 3 queries return an empty response.
I hope you can help. Thank you.
In an Elasticsearch index, you need to define the field mappings as date or numeric types before inserting data, so that range searches can be applied.
Alternatively, keep dynamic mapping on so that Elasticsearch can identify the field types automatically from the inserted data.
In the latter case, do check the auto-generated mappings on your index.
Also check the date/timestamp format.
Steps to check mappings:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html
Since you are using epoch time, you need to specify that in the mapping. This is what I did. Basically, the mapping and the way you store the data matter here. I am not sure whether we can store the data in any format we want and query it using any other format. I will do some more research and update the answer if that can be done.
1) Created the mapping, to show how the endTime mapping is done
2) Inserted a few sample documents
3) Queried the documents using epoch time, the way you wanted
Mapping
PUT so_test24
{
"mappings" : {
"_doc" : {
"properties" : {
"id" : {
"type" : "long"
},
"endTime" : {
"type" : "date",
"format": "epoch_millis"
}
}
}
}
}
Inserting the documents
POST /so_test24/_doc
{
"id": 1,
"endTime": "1546300800"
}
POST /so_test24/_doc
{
"id": 2,
"endTime": "1514764800"
}
POST /so_test24/_doc
{
"id": 3,
"endTime": "1527811200"
}
POST /so_test24/_doc
{
"id": 4,
"endTime": "1535760000"
}
The search Query
GET /so_test24/_search
{
"query": {
"range": {
"endTime": {"gte": "1532883892"}
}
}
}
The result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "so_test24",
"_type" : "_doc",
"_id" : "uFIq42sB4TH56W1h-jGu",
"_score" : 1.0,
"_source" : {
"id" : 1,
"endTime" : "1546300800"
}
},
{
"_index" : "so_test24",
"_type" : "_doc",
"_id" : "u1Iq42sB4TH56W1h-zEK",
"_score" : 1.0,
"_source" : {
"id" : 4,
"endTime" : "1535760000"
}
}
]
}
}
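One detail worth checking in this example: the mapping declares epoch_millis, while the sample values have the size of epoch seconds. The range query still returns the expected documents because the stored values and the query bound use the same unit, but the dates Elasticsearch derives from them differ. The Python sketch below shows the two readings of one sample value; if your data really is in seconds, the built-in epoch_second format is probably the better fit.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# endTime of the first sample document.
value = 1546300800

# Reading of the value under "format": "epoch_millis".
as_millis = EPOCH + timedelta(milliseconds=value)

# Reading of the value under "format": "epoch_second".
as_seconds = EPOCH + timedelta(seconds=value)

print(as_millis.date())   # 1970-01-18
print(as_seconds.date())  # 2019-01-01
```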

Elasticsearch - Conditional nested fetching

I have index mapping:
{
"dev.directory.3" : {
"mappings" : {
"profile" : {
"properties" : {
"email" : {
"type" : "string",
"index" : "not_analyzed"
},
"events" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "integer"
},
"name" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
}
}
with data:
"hits" : [ {
"_index" : "dev.directory.3",
"_type" : "profile",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"email" : "test@dummy.com",
"events" : [
{
"id" : 111,
"name" : "ABC"
},
{
"id" : 222,
"name" : "DEF"
}
]
}
}]
I'd like to get back only the matched nested elements instead of the whole events array. Is this possible in ES?
Example query:
{
"nested" : {
"path" : "events",
"query" : {
"bool" : {
"filter" : [
{ "match" : { "events.id" : 222 } }
]
}
}
}
}
E.g., if I query for events.id=222, there should be only a single element in the returned result list.
What would be the best strategy to achieve this?
You can use inner_hits to get only the nested records that matched the query.
{
"query": {
"nested": {
"path": "events",
"query": {
"bool": {
"filter": [
{
"match": {
"events.id": 222
}
}
]
}
},
"inner_hits": {}
}
},
"_source": false
}
I am also excluding the source so that only the nested hits are returned.

Resources