Is it possible to use a query result in another query in Elasticsearch? - elasticsearch

I have two queries that I want to combine. The first one returns a document with some fields.
Now I want to use one of these fields in a second query, without having to issue two separate queries.
Is there a way to combine them to accomplish this?
This is the first query:
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"field1": "9419"
}
},
{
"match": {
"field2": "5387"
}
}
],
"filter": [
{
"range": {
"timestamp": {
"time_zone": "+00:00",
"gte": "2020-10-24 10:16",
"lte": "2020-10-24 11:16"
}
}
}
]
}
},
"size" : 1
}
And this is the response returned:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 109,
"relation": "eq"
},
"max_score": 3.4183793,
"hits": [
{
"_index": "file",
"_type": "_doc",
"_id": "UBYCkgsEzLKoXh",
"_score": 3.4183793,
"_source": {
"data": {
"session": "123456789"
}
}
}
]
}
}
I want to use that "data.session" value in another query, without copying the value from the first query's result into it by hand:
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"data.session": "123456789"
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "asc"
}
}
]
}

If you mean using the result of the first query as an input to the second query, that is not possible in Elasticsearch. But if you share your query and use case, we might be able to suggest a better way.

Elasticsearch does not support subqueries or inner queries.
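Since one request cannot feed its result into another, a common workaround is to chain the two requests on the client. The following is a minimal sketch in Python, assuming the first response has already been parsed into a dict. The helper names `extract_session` and `build_session_query` are hypothetical, and actually sending the requests to the cluster is left out.

```python
from typing import Any, Dict, Optional


def extract_session(response: Dict[str, Any]) -> Optional[str]:
    """Return data.session from the first hit, or None if there are no hits."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[0].get("_source", {}).get("data", {}).get("session")


def build_session_query(session: str) -> Dict[str, Any]:
    """Build the second query from the question with the session filled in."""
    return {
        "_source": {"includes": ["data.session"]},
        "query": {"bool": {"must": [{"match": {"data.session": session}}]}},
        "sort": [{"timestamp": {"order": "asc"}}],
    }


# Trimmed copy of the first response shown in the question.
first_response = {
    "hits": {
        "hits": [
            {
                "_id": "UBYCkgsEzLKoXh",
                "_source": {"data": {"session": "123456789"}},
            }
        ]
    }
}

session = extract_session(first_response)
# Only build the follow-up request when the first query returned a hit.
second_query = build_session_query(session) if session is not None else None
```

The body in `second_query` can then be sent as the second search request, so the session value never has to be typed in by hand.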

Related

Elasticsearch - How do I search on 2 fields? One must be null and the other must match search text

I am trying to do a search on Elasticsearch 6.8.
I don't have control over the Elasticsearch instance, meaning I cannot control how the data is indexed.
The data is structured like this when I run a match_all search:
{ "took": 4,
"timed_out": false,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 15.703552,
"hits": [ {
"_index": "(removed index)",
"_type": "_doc",
"_id": "******** (Removed id)",
"_score": 15.703552,
"_source": {
"VCompany": {
"cvrNummer": 12345678,
"penheder": [
{
"pNummer": 1234567898,
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
}
],
"vMetadata": {
"nyesteNavn": {
"navn": "company1",
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
},
}
}
}
}
}]
The JSON might not be fully complete because I removed some unneeded data. What I am trying to do is search for documents where "vCompany.vMetaData.nyesteNavn.gyldigTil" is null and where "vCompany.vMetaData.nyesteNavn.navn" matches a text string.
I tried something like this:
{
"query": {
"bool": {
"must": [
{"match": {"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"}}
],
"should": {
"terms": {
"Vrvirksomhed.penheder.periode.gyldigTil": null
}
}
}
}
}
To check whether a field is null, use a must_not clause with an exists query, as shown below. This query returns documents where the name matches company1 and the Vrvirksomhed.penheder.periode.gyldigTil field is null or missing.
{
"query": {
"bool": {
"must": [
{
"match": {
"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"
}
}
],
"must_not": [
{
"exists": {
"field": "Vrvirksomhed.penheder.periode.gyldigTil"
}
}
]
}
}
}
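The same pattern can be wrapped in a small helper so that the field being checked is a parameter, which is convenient here because the question text and the answer name different gyldigTil paths. This is only a sketch in Python; the function name `build_match_and_missing_query` is hypothetical, and the field paths passed in are the ones used in the answer above.

```python
from typing import Any, Dict


def build_match_and_missing_query(
    match_field: str, text: str, null_field: str
) -> Dict[str, Any]:
    """Match `text` on one field and require another field to have no value."""
    return {
        "query": {
            "bool": {
                # The document must match the search text.
                "must": [{"match": {match_field: text}}],
                # must_not + exists keeps documents where the field is null or absent.
                "must_not": [{"exists": {"field": null_field}}],
            }
        }
    }


query = build_match_and_missing_query(
    "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn",
    "company1",
    "Vrvirksomhed.penheder.periode.gyldigTil",
)
```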

Elasticsearch separate aggregation based on values from first

I'm using Elasticsearch 6.8.8 and trying to aggregate the number of entities and relationships over a given time period.
Here is the data structure with example values from the index:
date           entityOrRelationshipId  startId  endId    type
==================================================================
DATETIMESTAMP  ENT1_ID                 null     null     ENTITY
DATETIMESTAMP  ENT2_ID                 null     null     ENTITY
DATETIMESTAMP  ENT3_ID                 null     null     ENTITY
DATETIMESTAMP  REL1_ID                 ENT1_ID  ENT2_ID  RELATIONSHIP
DATETIMESTAMP  REL2_ID                 ENT3_ID  ENT1_ID  RELATIONSHIP
etc.
For a given entity ID, I want to get the top 50 relationships. I have started with the following query.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "2020-04-01T00:00:00.000+00:00",
"lt": "2020-04-28T00:00:00.000+00:00"
}
}
}
]
}
},
"aggs": {
"my_rels": {
"filter": {
"bool": {
"must": [
{
"term": {
"type": "RELATIONSHIP"
}
},
{
"bool": {
"should": [
{
"term": {"startId": "ENT1_ID"}
},
{
"term": {"endId": "ENT1_ID"}
}
]
}
}
]
}
},
"aggs": {
"my_rels2": {
"terms": {
"field": "entityOrRelationshipId",
"size": 50
},
"aggs": {
"my_rels3": {
"top_hits": {
"_source": {
"includes": ["startId","endId"]
},
"size": 1
}
}
}
}
}
}
}
}
This produces the following results:
{
"took": 54,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 93122,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"my_rels": {
"doc_count": 332,
"my_rels2": {
"doc_count_error_upper_bound": 6,
"sum_other_doc_count": 259,
"buckets": [
{
"key": "REL1_ID",
"doc_count": 47,
"my_rels3": {
"hits": {
"total": 47,
"max_score": 1.0,
"hits": [
{
"_index": "trends",
"_type": "trend",
"_score": 1.0,
"_source": {
"endId": "ENT2_ID",
"startId": "ENT1_ID"
}
}
]
}
}
},
{
"key": "REL2_ID",
"doc_count": 26,
"my_rels3": {
"hits": {
"total": 26,
"max_score": 1.0,
"hits": [
{
"_index": "trends",
"_type": "trend",
"_score": 1.0,
"_source": {
"endId": "ENT1_ID",
"startId": "ENT3_ID"
}
}
]
}
}
}
]
}
}
}
}
This lists the top 50 relationships. For each relationship it lists the relationship ID, the count and the entity ids (startId, endId). What I would like to do now is produce another aggregation of entity counts for those distinct entities. Ideally this would not be a nested aggregation but a separate one using the rel ids identified in the first aggregation.
Is that possible to do in this query?
Unfortunately, you cannot aggregate over the results of top_hits in Elasticsearch.
There is a GitHub issue about this.
You can place other aggregations at the same level as top_hits, but you cannot nest a sub-aggregation below top_hits.
A sibling aggregation at the same level looks like this:
"aggs": {
"top_hits_agg": {
"top_hits": {
"size": 10,
"_source": {
"includes": ["score"]
}
}
},
"avg_agg": {
"avg": {
"field": "score"
}
}
}
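Because the top_hits output cannot feed another aggregation, one option is to collect the entity IDs on the client and send a follow-up request. The sketch below is in Python and assumes the response from the question has been parsed into a dict with the aggregation names my_rels, my_rels2 and my_rels3. The helper names and the `entity_counts` aggregation are hypothetical, and it assumes that "entity counts" means the number of ENTITY documents per entityOrRelationshipId. The date range filter is omitted for brevity.

```python
from typing import Any, Dict, List


def collect_entity_ids(response: Dict[str, Any]) -> List[str]:
    """Gather the distinct startId/endId values from the top_hits buckets."""
    ids = set()
    buckets = response["aggregations"]["my_rels"]["my_rels2"]["buckets"]
    for bucket in buckets:
        for hit in bucket["my_rels3"]["hits"]["hits"]:
            source = hit["_source"]
            ids.add(source["startId"])
            ids.add(source["endId"])
    return sorted(ids)


def build_entity_count_query(entity_ids: List[str]) -> Dict[str, Any]:
    """Build a second request that counts ENTITY documents for the given IDs."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": "ENTITY"}},
                    {"terms": {"entityOrRelationshipId": entity_ids}},
                ]
            }
        },
        "aggs": {
            "entity_counts": {
                "terms": {
                    "field": "entityOrRelationshipId",
                    # One bucket per requested entity; keep size positive.
                    "size": max(len(entity_ids), 1),
                }
            }
        },
    }


# Trimmed copy of the aggregation response shown in the question.
first_response = {
    "aggregations": {
        "my_rels": {
            "my_rels2": {
                "buckets": [
                    {
                        "key": "REL1_ID",
                        "my_rels3": {"hits": {"hits": [
                            {"_source": {"endId": "ENT2_ID", "startId": "ENT1_ID"}}
                        ]}},
                    },
                    {
                        "key": "REL2_ID",
                        "my_rels3": {"hits": {"hits": [
                            {"_source": {"endId": "ENT1_ID", "startId": "ENT3_ID"}}
                        ]}},
                    },
                ]
            }
        }
    }
}

entity_ids = collect_entity_ids(first_response)
follow_up = build_entity_count_query(entity_ids)
```

This costs a second round trip, but it keeps the entity counts out of the nested relationship aggregation, which is what the question asked for.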

Aggregations and filters in Elastic - find the last hits and filter them afterwards

I'm working with Elasticsearch (5.6) and trying to find a way to retrieve the top documents per category.
I have an index with the following kind of documents:
{
"#timestamp": "2018-03-22T00:31:00.004+01:00",
"statusInfo": {
"status": "OFFLINE",
"timestamp": 1521675034892
},
"name": "myServiceName",
"id": "xxxx",
"type": "Http",
"key": "key1",
"httpStatusCode": 200
}
What I'm trying to do with these is retrieve the last document (#timestamp-based) per name (my categories), see whether its statusInfo.status is OFFLINE or UP, and get these results into the hits part of a response so I can put them in a Kibana count dashboard or somewhere else (a REST-based tool I do not control and can't modify myself).
Basically, I want to know how many of my services (name) are OFFLINE (statusInfo.status) in their last update (#timestamp), for monitoring purposes.
I'm stuck at the "get how many of my services" part.
My query so far:
GET actuator/_search
{
"size": 0,
"aggs": {
"name_agg": {
"terms": {
"field": "name.raw",
"size": 1000
},
"aggs": {
"last_document": {
"top_hits": {
"_source": ["#timestamp", "name", "statusInfo.status"],
"size": 1,
"sort": [
{
"#timestamp": {
"order": "desc"
}
}
]
}
}
}
}
},
"post_filter": {
"bool": {
"must_not": {
"term": {
"statusInfo.status.raw": "UP"
}
}
}
}
}
This provides the following response:
{
"all_the_meta":{...},
"hits": {
"total": 1234,
"max_score": 0,
"hits": []
},
"aggregations": {
"name_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "myCategory1",
"doc_count": 225,
"last_document": {
"hits": {
"total": 225,
"max_score": null,
"hits": [
{
"_index": "myIndex",
"_type": "Http",
"_id": "dummy id",
"_score": null,
"_source": {
"#timestamp": "2018-04-06T00:06:00.005+02:00",
"statusInfo": {
"status": "UP"
},
"name": "myCategory1"
},
"sort": [
1522965960005
]
}
]
}
}
},
{other_buckets...}
]
}
}
}
Removing the size makes the result contain ALL of the documents, which is not what I need; I only need the content of each bucket (each one contains one document).
Removing the post filter does not appear to change much.
I think this would be feasible in Oracle SQL with an OVER (PARTITION BY ...) clause, followed by a condition.
Does anybody know how this could be achieved?
If I understand you correctly, you are looking for the latest doc that has a status of OFFLINE in each group (grouped by name)? In that case you can try the query below, and the number of items in the bucket list should give you the "how many are down" count (for UP you would change the term in the filter).
NOTE: this was done on the latest version, so it uses the keyword field instead of raw.
POST /index/_search
{
"size": 0,
"query":{
"bool":{
"filter":{
"term": {"statusInfo.status.keyword": "OFFLINE"}
}
}
},
"aggs":{
"services_agg":{
"terms":{
"field": "name.keyword"
},
"aggs":{
"latest_doc":{
"top_hits": {
"sort": [
{
"#timestamp":{
"order": "desc"
}
}
],
"size": 1,
"_source": ["#timestamp", "name", "statusInfo.status"]
}
}
}
}
}
}
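Note that filtering on OFFLINE before aggregating returns the latest OFFLINE document per service, which is not necessarily that service's most recent document. If the goal is "services whose last update is OFFLINE", one alternative is to run the aggregation from the question without a status filter and check the status on the client. A minimal sketch in Python, assuming the parsed response uses the aggregation names name_agg and last_document from the question; the function name `offline_services` and the second bucket in the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def offline_services(response: Dict[str, Any]) -> List[str]:
    """Return the names whose most recent document has status OFFLINE."""
    offline = []
    for bucket in response["aggregations"]["name_agg"]["buckets"]:
        hits = bucket["last_document"]["hits"]["hits"]
        # top_hits was requested with size 1, sorted by timestamp descending,
        # so the first hit is the latest document for this name.
        if hits and hits[0]["_source"]["statusInfo"]["status"] == "OFFLINE":
            offline.append(bucket["key"])
    return offline


# First bucket is trimmed from the question; the second one is hypothetical.
sample_response = {
    "aggregations": {
        "name_agg": {
            "buckets": [
                {
                    "key": "myCategory1",
                    "last_document": {"hits": {"hits": [
                        {"_source": {"statusInfo": {"status": "UP"}, "name": "myCategory1"}}
                    ]}},
                },
                {
                    "key": "myCategory2",
                    "last_document": {"hits": {"hits": [
                        {"_source": {"statusInfo": {"status": "OFFLINE"}, "name": "myCategory2"}}
                    ]}},
                },
            ]
        }
    }
}

down = offline_services(sample_response)
down_count = len(down)
```

This does not help when the consuming tool only reads the hits section, so it is a fallback for cases where a small amount of client code is acceptable.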

Aggregation on geo_point elasticsearch

Is there a way to aggregate on a geo_point field and receive the actual lat/lon?
All I managed to do is get the geohash.
What I did so far:
Created the index:
PUT geo_test
{
"mappings": {
"sharon_test": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
Added X docs with different lat/lon values:
POST geo_test/sharon_test
{
"location": {
"lat": 45,
"lon": -7
}
}
Ran this aggregation:
GET geo_test/sharon_test/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"aggs": {
"locationsAgg": {
"geohash_grid": {
"field": "location",
"precision" : 12
}
}
}
}
I got this result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "geo_test",
"_type": "sharon_test",
"_id": "fGb4uGEBfEDTRjcEmr6i",
"_score": 1,
"_source": {
"location": {
"lat": 41.12,
"lon": -71.34
}
}
},
{
"_index": "geo_test",
"_type": "sharon_test",
"_id": "oWb4uGEBfEDTRjcE7b6R",
"_score": 1,
"_source": {
"location": {
"lat": 4,
"lon": -7
}
}
}
]
},
"aggregations": {
"locationsAgg": {
"buckets": [
{
"key": "ebenb8nv8nj9",
"doc_count": 1
},
{
"key": "drm3btev3e86",
"doc_count": 1
}
]
}
}
}
I want to know if I can do one of these two things:
1. Convert the "key", which is currently a geohash, back to the source lat/lon
2. Show the lat/lon in the aggregation in the first place
Thanks!
P.S.
I also tried the other geo aggregations, but all they give me is the number of docs that fit my aggregation conditions; I need the actual values.
E.g.
I wanted this aggregation to return all the locations I had in my index, but it only returned the count:
GET geo_test/sharon_test/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"aggs": {
"distanceRanges": {
"geo_distance": {
"field": "location",
"origin": "50.0338, 36.2242 ",
"unit": "meters",
"ranges": [
{
"key": "All Locations",
"from": 1
}
]
}
}
}
}
To get the exact location you will need to decode the geohash, but you can use a geo_bounds sub-aggregation inside the geohash_grid aggregation to get a bounding box of the points in each cell, which narrows it down precisely:
GET geo_test/sharon_test/_search
{
"query":{
"bool":{
"must":[
{
"match_all":{
}
}
]
}
},
"aggs":{
"locationsAgg":{
"geohash_grid":{
"field":"location",
"precision":12
},
"aggs":{
"cell":{
"geo_bounds":{
"field":"location"
}
}
}
}
}
}
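Decoding the bucket key on the client is straightforward. The sketch below implements the standard geohash decoding in plain Python (base32 characters, bits alternating between longitude and latitude) and returns the center of the cell. The function name `decode_geohash` is hypothetical, and a dedicated geohash library could be used instead.

```python
from typing import Tuple

# Geohash base32 alphabet (no a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash: str) -> Tuple[float, float]:
    """Return the (lat, lon) center of the cell described by a geohash."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash.lower():
        value = _BASE32.index(char)
        # Each character carries 5 bits, most significant bit first.
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


# Bucket keys taken from the aggregation response in the question.
lat1, lon1 = decode_geohash("drm3btev3e86")  # close to lat 41.12, lon -71.34
lat2, lon2 = decode_geohash("ebenb8nv8nj9")  # close to lat 4, lon -7
```

At precision 12 a cell is far smaller than a meter, so the decoded center is effectively the indexed point; at lower precisions it is only the center of a larger cell.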

Multiple Match Phrase Prefixes Return Zero Results In Elasticsearch

I have the following Elasticsearch, version 2.3, query which produces zero results.
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
Output from above query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Output of above query with _explain
{
"_index": "index_name",
"_type": "doc_type",
"_id": "_explain",
"_version": 4,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
However, when I run either of the following queries, I get results, including the one document that matches both parts of the query above. If I include the full phone number in the combined query, the document also appears in the results.
Phone numbers are stored as strings without any formatting, e.g. "1234567890".
Is there any reason why the query with two prefixes returns zero results?
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
}
]
}
}
}
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
I was able to get the results I wanted by changing the phone number query to a regexp query instead of a match_phrase_prefix query.
{
"query": {
"bool": {
"must": [
{
"regexp": {
"phone": "123[0-9]+"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}

Resources