Is it possible to add data in Elastic Search from a filter? - elasticsearch

I have an API backed by Elasticsearch. Depending on the login/password, a different filter is applied automatically.
The Elasticsearch index contains:
"organisation.id"
"organisation.name"
"organisation.country"
"shop.id"
"shop.name"
"shop.address"
"creationdatetime"
This would be a sample filter:
{
"_source":{
"includes":[
"organisation.id",
"organisation.name",
"shop.id",
"shop.name",
"creationdatetime"
],
"excludes": [
"shop.address",
"organisation.country"
]
},
"from":"0",
"size":"500",
"sort":{"creationdatetime":"asc"},
"query":{
"bool":{
"must":{
"match":{
"shop.sharedwith":"client1"
}
},
"filter":{
"range":{
"creationdatetime":{
"gte":"2020-01-01"
}
}
}
}
}
}
Output would be
{
"total": 2,
"from": "0",
"size": "10",
"hops": [
{
"organisation": [
{
"name": "A1",
"id": "0001-A1"
}
],
"shop": [
{
"name": "A1Shop",
"id": "0001-0001-A1"
}
]
}
]
}
I would like to add a "version" and "filtername" to the output... coming from the filter itself.
Exactly this:
{
"total": 2,
"from": "0",
"size": "10",
"version": "1.0.0.0", // -------------------------------NEW FIELD
"filtername": "filter01", // -------------------------------NEW FIELD
"hops": [
{
"organisation": [
{
"name": "A1",
"id": "0001-A1"
}
],
"shop": [
{
"name": "A1Shop",
"id": "0001-0001-A1"
}
]
}
]
}
Is it possible to add those two extra outputs from the filter itself?

This is not directly possible, but there's a workaround using a top_hits aggregation in combination with aggregation metadata:
GET _search
{
"size": 0, // no need for the standard hits b/c of our `top_hits`
"query": {
"match_all": {} // your actual query
},
"aggs": {
"my_hits": {
"top_hits": {
"size": 10,
"_source": {
"includes": [
"organisation.id",
"organisation.name",
"shop.id",
"shop.name",
"creationdatetime"
],
"excludes": [
"shop.address",
"organisation.country"
]
}
},
"meta": { // custom key-value pairs
"version": "1.0.0.0",
"filtername": "filter01"
}
}
}
}
resulting in
{
...
"aggregations": {
"my_hits": {
"meta": {
"version": "1.0.0.0",
"filtername": "filter01"
},
"hits": {
... // the actual docs
}
}
}
}
It's also worth looking at named queries although their use here is very loosely applicable.
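On the API side, the metadata can then be lifted out of the aggregation and merged into the response envelope. The following is a minimal sketch, assuming the aggregation is named my_hits as above. `build_body` and `reshape` are hypothetical helper names, and the envelope keys simply mirror the output format from the question:

```python
def build_body(query, source_includes, meta, size=10):
    """Wrap a query in a top_hits aggregation that carries custom metadata."""
    return {
        "size": 0,  # the documents come back through top_hits instead
        "query": query,
        "aggs": {
            "my_hits": {
                "top_hits": {"size": size, "_source": {"includes": source_includes}},
                "meta": meta,  # echoed back unchanged in the response
            }
        },
    }


def reshape(response, agg_name="my_hits"):
    """Flatten the aggregation result into the envelope from the question."""
    agg = response["aggregations"][agg_name]
    total = agg["hits"]["total"]
    if isinstance(total, dict):  # newer versions return {"value": n, "relation": ...}
        total = total["value"]
    envelope = {"total": total}
    envelope.update(agg.get("meta", {}))  # version, filtername, ...
    envelope["hops"] = [hit["_source"] for hit in agg["hits"]["hits"]]
    return envelope
```

Note that the size of top_hits is limited (in recent versions, to 100 by default through the index.max_inner_result_window setting), so a page size of 500 as in the question would likely require raising that limit.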

Related

Cannot seem to use must and must_not together in an elastic search query

If I run the following query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "boxing",
"fuzziness": 2,
"minimum_should_match": 2
}
}
],
"must_not": [
{
"terms_set": {
"allowedCountries": {
"terms": ["gb", "mx"],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
],
"filter": [
{
"range": {
"expireTime": {
"gt": 1674061907954
}
}
},
{
"term": {
"region": {
"value": "row"
}
}
},
{
"term": {
"sourceType": {
"value": "article"
}
}
}
]
}
}
}
against an index with articles that look like:
{
"_index": "content-items-v10",
"_type": "_doc",
"_id": "e7hm75ui4dma1mm4j8q5v7914",
"_score": 4.3724976,
"_source": {
"allowedCountries": ["gb", "ie"],
"body": "Both Joshua Buatsi and Craig Richards join The DAZN Boxing Show ahead of their clash at London's O2 Arena. Matchroom's Eddie Hearn also gives his take on the night, as well as Chantelle Cameron previewing her contest with Victoria Noelia Bustos.",
"competitions": [
{
"id": "8lo6205qyio0fksjx9glqbdhj",
"name": "Buatsi v Richards"
}
],
"contestants": [
{
"id": "7rq59j3eiamxlm12vhxcsgujj",
"name": "Joshua Buatsi"
},
{
"id": "boby9oqe23g6qyuwphrxh8su5",
"name": "Craig Richards"
}
],
"countries": [
{
"id": "7yasa43laq1nb2e6f8bfuvxed",
"name": "World"
},
{
"id": "258l9t5sm55592i08mdpqzr3t",
"name": "United Kingdom"
}
],
"dotsLastUpdateTime": 1673979749396,
"expireTime": 4800000000000,
"fixtureDate": {},
"headline": "Buatsi vs. Richards: Preview",
"id": "e7hm75ui4dma1mm4j8q5v7914",
"importance": 0,
"languageKeys": ["en"],
"languages": ["en"],
"lastUpdateTime": {
"ts": 1653088281000,
"iso8601": "2022-05-20T23:11:21.000Z"
},
"promoImageUrl": null,
"publication": {
"typeId": "1plcw0iyhx9vn1fcanbm2ja3rf",
"typeName": "Shoulder"
},
"publishedTime": {
"ts": 1653088281000,
"iso8601": "2022-05-20T23:11:21.000Z"
},
"region": "row",
"shortHeadline": null,
"sourceType": "article",
"sports": [
{
"id": "2x2oqzx60orpoeugkd754ga17",
"name": "Boxing"
}
],
"teaser": "",
"thumbnailImageUrl": "https://images.daznservices.com/di/library/babcock_canada/45/3e/the-dazn-boxing-show-20052022_xc4jbfqi022l1shq9lu641h9e.png?t=-477976832",
"translations": {}
}
}
I get the following validation error from elasticsearch:
{
"ok": false,
"errors": {
"validation": [
{
"message": "\"query.bool.must_not\" is not allowed",
"path": [
"query",
"bool",
"must_not"
],
"type": "object.unknown",
"context": {
"child": "must_not",
"label": "query.bool.must_not",
"value": [
{
"terms_set": {
"allowedCountries": {
"terms": [
"gb",
"mx"
],
"minimum_should_match_script": {
"source": "2"
}
}
}
}
],
"key": "must_not"
}
}
]
},
"correlationId": "d29e9275-9ab3-4ff8-944d-852b98d4b503"
}
And I cannot figure out what the issue might be! From the elastic docs it should be OK.
I'm using ElasticSearch 7.9.3 running in a local docker container.
I'm hoping someone out there will give me a clue!
Cheers!
I would expect this to just work.
I'm trying to filter out articles that have both of the country codes gb and mx in the field allowedCountries.
I can include them easily enough in the results when I add the terms_set query to the bool.must section of the query.
It works well; you just need to enclose your query in the query section:
{
"query": { <--- add this
"bool": { <--- your query starts here
"must": [
...
Thank you for responding!
I was helping with a system I did not have full context on - it turns out there is a proxy in the mix with validation that was blocking the must_not query. So, with the proxy fixed, it now works.
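For reference, the intended effect of the must_not clause can be sketched in plain Python. This is only an emulation of the terms_set semantics with a constant minimum of 2, not an Elasticsearch call; the helper names and the second document are hypothetical:

```python
def terms_set_matches(field_values, terms, minimum_should_match):
    """Emulate terms_set: true when enough of `terms` occur in the field."""
    return len(set(field_values) & set(terms)) >= minimum_should_match


def must_not_terms_set(docs, field, terms, minimum_should_match):
    """Keep only the documents that the must_not clause would not exclude."""
    return [
        doc for doc in docs
        if not terms_set_matches(doc.get(field, []), terms, minimum_should_match)
    ]


docs = [
    {"id": "a", "allowedCountries": ["gb", "ie"]},  # values from the sample article
    {"id": "b", "allowedCountries": ["gb", "mx"]},  # hypothetical second article
]
kept = must_not_terms_set(docs, "allowedCountries", ["gb", "mx"], 2)
print([doc["id"] for doc in kept])  # ['a']
```

Only the article that lists both gb and mx is dropped; an article with just one of the two codes stays in the results.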

elasticsearch Saved Search with Group by

index_name: my_data-2020-12-01
ticket_number: T123
ticket_status: OPEN
ticket_updated_time: 2020-12-01 12:22:12
index_name: my_data-2020-12-01
ticket_number: T124
ticket_status: OPEN
ticket_updated_time: 2020-12-01 12:32:11
index_name: my_data-2020-12-02
ticket_number: T123
ticket_status: INPROGRESS
ticket_updated_time: 2020-12-02 12:33:12
index_name: my_data-2020-12-02
ticket_number: T125
ticket_status: OPEN
ticket_updated_time: 2020-12-02 14:11:45
I want to create a saved search that groups by the ticket_number field and returns one document per ticket with the latest ticket status (ticket_status). Is it possible?
You can simply query again. I am assuming you are using Kibana for visualization purposes. In your query, you need to filter on ticket_number and sort by ticket_updated_time.
Working example
Index mapping
{
"mappings": {
"properties": {
"ticket_updated_time": {
"type": "date"
},
"ticket_number" :{
"type" : "text"
},
"ticket_status" : {
"type" : "text"
}
}
}
}
Index sample docs
{
"ticket_number": "T123",
"ticket_status": "OPEN",
"ticket_updated_time": "2020-12-01T12:22:12"
}
{
"ticket_number": "T123",
"ticket_status": "INPROGRESS",
"ticket_updated_time": "2020-12-02T12:33:12"
}
Now as you can see, both the sample documents belong to the same ticket_number with different status and updated time.
Search query
{
"size" : 1, // return only the most recent document; without this, older documents for the same ticket (with earlier statuses) are returned as well
"query": {
"bool": {
"filter": [
{
"match": {
"ticket_number": "T123"
}
}
]
}
},
"sort": [
{
"ticket_updated_time": {
"order": "desc"
}
}
]
}
And search result
"hits": [
{
"_index": "65180491",
"_type": "_doc",
"_id": "2",
"_score": null,
"_source": {
"ticket_number": "T123",
"ticket_status": "INPROGRESS",
"ticket_updated_time": "2020-12-02T12:33:12"
},
"sort": [
1606912392000
]
}
]
If you need to group by the ticket_number field, you can use an aggregation as well:
Index Mapping:
{
"mappings": {
"properties": {
"ticket_updated_time": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
Search Query:
{
"size": 0,
"aggs": {
"unique_id": {
"terms": {
"field": "ticket_number.keyword",
"order": {
"latestOrder": "desc"
}
},
"aggs": {
"latestOrder": {
"max": {
"field": "ticket_updated_time"
}
}
}
}
}
}
Search Result:
"buckets": [
{
"key": "T125",
"doc_count": 1,
"latestOrder": {
"value": 1.606918305E12,
"value_as_string": "2020-12-02 14:11:45"
}
},
{
"key": "T123",
"doc_count": 2,
"latestOrder": {
"value": 1.606912392E12,
"value_as_string": "2020-12-02 12:33:12"
}
},
{
"key": "T124",
"doc_count": 1,
"latestOrder": {
"value": 1.606825931E12,
"value_as_string": "2020-12-01 12:32:11"
}
}
]
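The buckets above carry only the latest timestamp per ticket, not the status. What the question asks for, one document per ticket_number with the most recent ticket_status, amounts to the grouping below. It is shown as a plain-Python sketch over the sample documents from the question, not as an Elasticsearch call, and `latest_per_ticket` is a hypothetical helper name:

```python
from datetime import datetime


def latest_per_ticket(docs):
    """Return the most recently updated document for each ticket_number."""
    latest = {}
    for doc in docs:
        updated = datetime.strptime(doc["ticket_updated_time"], "%Y-%m-%d %H:%M:%S")
        current = latest.get(doc["ticket_number"])
        if current is None or updated > current[0]:
            latest[doc["ticket_number"]] = (updated, doc)
    return {number: doc for number, (_, doc) in latest.items()}


docs = [
    {"ticket_number": "T123", "ticket_status": "OPEN", "ticket_updated_time": "2020-12-01 12:22:12"},
    {"ticket_number": "T124", "ticket_status": "OPEN", "ticket_updated_time": "2020-12-01 12:32:11"},
    {"ticket_number": "T123", "ticket_status": "INPROGRESS", "ticket_updated_time": "2020-12-02 12:33:12"},
    {"ticket_number": "T125", "ticket_status": "OPEN", "ticket_updated_time": "2020-12-02 14:11:45"},
]
for number, doc in sorted(latest_per_ticket(docs).items()):
    print(number, doc["ticket_status"])
```

To have Elasticsearch return that document itself, one option is a top_hits sub-aggregation with size 1, sorted by ticket_updated_time descending, inside the terms aggregation.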

Distinct values from array-field matching filter in Elasticsearch 2.4

In short: I want to look up distinct values in some field of the document, BUT only those matching some filter. The problem is with array fields.
Imagine there are the following documents in ES 2.4:
[
{
"states": [
"Washington (US-WA)",
"California (US-CA)"
]
},
{
"states": [
"Washington (US-WA)"
]
}
]
I'd like my users to be able to lookup all possible states via typeahead, so I have the following query for the "wa" user request:
{
"query": {
"wildcard": {
"states.raw": "*wa*"
}
},
"aggregations": {
"typed": {
"terms": {
"field": "states.raw"
},
"aggregations": {
"typed_hits": {
"top_hits": {
"_source": { "includes": ["states"] }
}
}
}
}
}
}
states.raw is a sub-field with the not_analyzed option.
This query works pretty well unless I have an array of values like in the example - it returns both Washington and California. I do understand why it happens (query and aggregations are working on top of the document and the document contains both, even though only one option matched the filter), but I really want to only see Washington and don't want to add another layer of filtering on the application side for the ES results.
Is there a way to do so via a single ES 2.4 request?
You could use the "Filtering Values" feature (see https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html#_filtering_values_2).
So, your request could look like:
POST /index/collection/_search?size=0
{
"aggregations": {
"typed": {
"terms": {
"field": "states.raw",
"include": ".*wa.*" // You need to carefully quote the "wa" string because it'll be used as part of RegExp
},
"aggregations": {
"typed_hits": {
"top_hits": {
"_source": { "includes": ["states"] }
}
}
}
}
}
}
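Because the include value is interpreted as a regular expression, user input should be escaped before it is placed between the `.*` parts. A minimal sketch, assuming the reserved characters listed in the Elasticsearch regular expression syntax documentation (including the optional operators); `escape_regexp` and `include_pattern` are hypothetical helper names:

```python
# Characters with special meaning in Elasticsearch (Lucene) regular expressions,
# including those used by the optional operators.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_regexp(text):
    """Backslash-escape every reserved character so it matches literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def include_pattern(user_input):
    """Build a 'contains' pattern for the terms aggregation's include option."""
    return ".*" + escape_regexp(user_input) + ".*"


print(include_pattern("wa"))      # .*wa.*
print(include_pattern("US-WA)"))  # .*US-WA\).*
```

Since states.raw is not analyzed, the pattern is matched against the stored value as-is, including its letter case.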
I can't help pointing out, though, that a wildcard query with a leading wildcard is not the best solution. Please do consider using ngrams for this:
PUT states
{
"settings": {
"analysis": {
"filter": {
"ngrams": {
"type": "nGram",
"min_gram": "2",
"max_gram": "20"
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"filter": [
"standard",
"lowercase",
"ngrams"
],
"tokenizer": "standard"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"location": {
"properties": {
"states": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
},
"ngrams": {
"type": "string",
"analyzer": "ngram_analyzer"
}
}
}
}
}
}
}
}
}
POST states/doc/1
{
"text":"bla1",
"location": [
{
"states": [
"Washington (US-WA)",
"California (US-CA)"
]
},
{
"states": [
"Washington (US-WA)"
]
}
]
}
POST states/doc/2
{
"text":"bla2",
"location": [
{
"states": [
"Washington (US-WA)",
"California (US-CA)"
]
}
]
}
POST states/doc/3
{
"text":"bla3",
"location": [
{
"states": [
"California (US-CA)"
]
},
{
"states": [
"Illinois (US-IL)"
]
}
]
}
And the final query:
GET states/_search
{
"query": {
"term": {
"location.states.ngrams": {
"value": "sh"
}
}
},
"aggregations": {
"filtering_states": {
"terms": {
"field": "location.states.raw",
"include": ".*sh.*"
},
"aggs": {
"typed_hits": {
"top_hits": {
"_source": {
"includes": [
"location.states"
]
}
}
}
}
}
}
}
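To see why the term query for "sh" finds the Washington documents, it helps to look at what the ngram analyzer puts in the index. The sketch below is a rough plain-Python approximation of that analyzer (word splitting, lowercasing, ngrams of length 2 to 20), not the actual Lucene implementation; `ngrams` and `index_terms` are hypothetical helper names:

```python
import re


def ngrams(token, min_gram=2, max_gram=20):
    """All substrings of the lowercased token with min_gram..max_gram characters."""
    token = token.lower()
    return {
        token[i:i + n]
        for n in range(min_gram, min(max_gram, len(token)) + 1)
        for i in range(len(token) - n + 1)
    }


def index_terms(text):
    """Approximate the terms indexed for one value of location.states.ngrams."""
    terms = set()
    for token in re.findall(r"\w+", text):  # crude stand-in for the standard tokenizer
        terms |= ngrams(token)
    return terms


print("sh" in index_terms("Washington (US-WA)"))  # True
print("sh" in index_terms("California (US-CA)"))  # False
```

A term query is not analyzed, so the typed text has to be lowercased by the application before it is sent.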

Aggregation on geo_point in Elasticsearch

Is there a way to aggregate on a geo_point field and receive the actual lat/long?
All I managed to do is get the geohash.
What I did so far:
Creating the index:
PUT geo_test
{
"mappings": {
"sharon_test": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
Adding X docs with different lat/long:
POST geo_test/sharon_test
{
"location": {
"lat": 45,
"lon": -7
}
}
Running this agg:
GET geo_test/sharon_test/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"aggs": {
"locationsAgg": {
"geohash_grid": {
"field": "location",
"precision" : 12
}
}
}
}
I got this result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "geo_test",
"_type": "sharon_test",
"_id": "fGb4uGEBfEDTRjcEmr6i",
"_score": 1,
"_source": {
"location": {
"lat": 41.12,
"lon": -71.34
}
}
},
{
"_index": "geo_test",
"_type": "sharon_test",
"_id": "oWb4uGEBfEDTRjcE7b6R",
"_score": 1,
"_source": {
"location": {
"lat": 4,
"lon": -7
}
}
}
]
},
"aggregations": {
"locationsAgg": {
"buckets": [
{
"key": "ebenb8nv8nj9",
"doc_count": 1
},
{
"key": "drm3btev3e86",
"doc_count": 1
}
]
}
}
}
I want to know if I can do one of these two things:
1. Convert the "key", which is currently a geohash, to the source's lat/long
2. Show the lat/long in the aggregation in the first place
Thanks!
P.S.
I also tried the other geo aggregations, but all they give me is the number of docs that fit my agg's conditions; I need the actual values.
E.g.
I wanted this aggregation to return all the locations I had in my index, but it only returned the count:
GET geo_test/sharon_test/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"aggs": {
"distanceRanges": {
"geo_distance": {
"field": "location",
"origin": "50.0338, 36.2242 ",
"unit": "meters",
"ranges": [
{
"key": "All Locations",
"from": 1
}
]
}
}
}
}
You can actually use a geo_bounds sub-aggregation inside the geohash_grid to get a bounding box that narrows it down precisely, but to get the exact location you will need to decode the geohash:
GET geo_test/sharon_test/_search
{
"query":{
"bool":{
"must":[
{
"match_all":{
}
}
]
}
},
"aggs":{
"locationsAgg":{
"geohash_grid":{
"field":"location",
"precision":12
},
"aggs":{
"cell":{
"geo_bounds":{
"field":"location"
}
}
}
}
}
}
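Decoding the bucket key on the client is straightforward. Below is a minimal sketch of the standard geohash decoding (base-32 characters, bits alternating between longitude and latitude, starting with longitude); `decode_geohash` is a hypothetical helper name, and it returns the centre of the geohash cell rather than the originally indexed point:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Return (lat, lon) of the centre of the cell described by the geohash."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


for key in ("ebenb8nv8nj9", "drm3btev3e86"):  # bucket keys from the question
    print(key, decode_geohash(key))
```

At precision 12 the cell is tiny, so the decoded centres land very close to the indexed points (lat 4, lon -7 and lat 41.12, lon -71.34).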

ElasticSearch Aggregation on nested field with bucketing on parent id

Following is my doc structure
'Order': {
u'properties': {
u'order_id': {u'type': u'integer'},
'Product': {
u'properties': {
u'product_id': {u'type': u'integer'},
u'product_category': {'type': 'text'},
},
u'type': u'nested'
}
}
}
Doc1
"Order": {
"order_id": "1",
"Product": [
{
"product_id": "1",
"product_category": "category_1"
},
{
"product_id": "2",
"product_category": "category_2"
},
{
"product_id": "3",
"product_category": "category_2"
},
]
}
Doc2
"Order": {
"order_id": "2",
"Product": [
{
"product_id": "4",
"product_category": "category_1"
},
{
"product_id": "1",
"product_category": "category_1"
},
{
"product_id": "2",
"product_category": "category_2"
},
]
}
I want to get following output
"aggregations": {
"Order": [
{
"order_id": "1",
"category_counts": [
{
"category_1": 1
},
{
"category_2": 2
}
]
},
{
"order_id": "2",
"category_counts": [
{
"category_1": 2
},
{
"category_2": 1
}
]
}
]
}
I tried using nested aggregation
"aggs": {
"Product-nested": {
"nested": {
"path": "Product"
},
"aggs": {
"category_counts": {
"terms": {
"field": "Product.product_category"
}
}
}
}
}
It does not give output for each order but gives combined output for all orders
{
"Product-nested": {
"category_counts": [
"category_1": 3,
"category_2": 3
]
}
}
I have two questions:
1. How do I get the desired output in the above scenario?
2. What if, instead of a single product_category, I have an array of product_categories? How would we achieve the same in that scenario?
I am using elasticsearch >= 5.0
I have an idea, but I don't think it's the best one.
You can make a terms aggregation on the "order_id" field, then a nested sub-aggregation on "Product" with a terms aggregation on "Product.product_category".
Something like this:
{
"aggs": {
"all-order-id": {
"terms": {
"field": "order_id",
"size": 10
},
"aggs": {
"Product-nested": {
"nested": {
"path": "Product"
},
"aggs": {
"all-products-in-order-id": {
"terms": {
"field": "Product.product_category"
}
}
}
}
}
}
}
}
Sorry it looks a bit messy; I'm not so good with this answer editor.
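The response to such a query nests the category buckets under each order bucket, so a small client-side step is needed to reach the shape asked for in the question. A minimal sketch, assuming the aggregation names used above; `category_counts_per_order` is a hypothetical helper, and the sample response is hand-written for illustration, not real Elasticsearch output:

```python
def category_counts_per_order(response):
    """Flatten terms > nested > terms buckets into one entry per order."""
    orders = []
    for order in response["aggregations"]["all-order-id"]["buckets"]:
        categories = order["Product-nested"]["all-products-in-order-id"]["buckets"]
        orders.append({
            "order_id": order["key"],
            "category_counts": {c["key"]: c["doc_count"] for c in categories},
        })
    return orders


# Hand-written response fragment matching Doc1 and Doc2 from the question.
sample = {
    "aggregations": {
        "all-order-id": {
            "buckets": [
                {"key": 1, "doc_count": 1, "Product-nested": {
                    "doc_count": 3,
                    "all-products-in-order-id": {"buckets": [
                        {"key": "category_2", "doc_count": 2},
                        {"key": "category_1", "doc_count": 1},
                    ]},
                }},
                {"key": 2, "doc_count": 1, "Product-nested": {
                    "doc_count": 3,
                    "all-products-in-order-id": {"buckets": [
                        {"key": "category_1", "doc_count": 2},
                        {"key": "category_2", "doc_count": 1},
                    ]},
                }},
            ]
        }
    }
}
print(category_counts_per_order(sample))
```

The same reshaping applies if product_category holds an array of values, since a terms aggregation counts each value of a multi-valued field separately.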
