Elasticsearch Groovy script not working as expected

Here is the partial mapping of my listing index on Elasticsearch 2.5 (I know I have to upgrade to a newer version and start using Painless; let's keep that aside for this question):
"name": { "type": "string" },
"los": {
"type": "nested",
"dynamic": "strict",
"properties": {
"start": { "type": "date", "format": "yyyy-MM" },
"max": { "type": "integer" },
"min": { "type": "integer" }
}
}
I have only one document in my index, and it is as follows:
{
  "name": "foobar",
  "los": [
    { "max": 12, "start": "2018-02", "min": 1 },
    { "max": 8,  "start": "2018-03", "min": 3 },
    { "max": 10, "start": "2018-04", "min": 2 },
    { "max": 12, "start": "2018-05", "min": 1 }
  ]
}
I have a Groovy script in my Elasticsearch query as follows:
los_map = [doc['los.start'], doc['los.max'], doc['los.min']].transpose()
return los_map.size()
This script ALWAYS returns 0, which should not be possible: I have one document, as mentioned above (even if I add multiple documents it still returns 0), and the los field is guaranteed to be present in every document, with multiple objects in it. So it seems the transpose I am doing is not working correctly?
I also tried changing the line los_map = [doc['los.start'], doc['los.max'], doc['los.min']].transpose() to los_map = [doc['los'].start, doc['los'].max, doc['los'].min].transpose(), but then I get the error "No field found for [los] in mapping with types [listing]".
Does anyone have any idea how to get the transpose to work?
By the way, if you are curious, my complete script is as follows:
losMinMap = [:]
losMaxMap = [:]
los_map = [doc['los.start'], doc['los.max'], doc['los.min']].transpose()
los_map.each { st, mx, mn ->
    losMinMap[st] = mn
    losMaxMap[st] = mx
}
return los_map['2018-05']
Thank you in advance.
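One likely explanation (an assumption based on how nested fields are indexed, not something verified on your cluster): los is a nested type, so its values are stored in separate hidden documents and are not reachable through the root document's doc values. That would make doc['los.start'], doc['los.max'] and doc['los.min'] all empty lists, so their transpose is an empty list and size() is 0, no matter how many documents you index. A minimal Groovy sketch that reads the objects from _source instead, assuming your script context has _source access:

// build one [start, max, min] triple per entry in los, straight from _source
losMap = _source.los.collect { [it.start, it.max, it.min] }
return losMap.size()

Note also that the complete script ends with return los_map['2018-05'], while the maps being filled are losMinMap and losMaxMap, so you probably want losMaxMap['2018-05'] or losMinMap['2018-05'] there.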

Related

Elasticsearch alias not being created on index creation

I'm using the go-elasticsearch API in my application to create indices in an Elastic.co cloud cluster. The application dynamically creates an index with a template and then starts indexing documents. The template includes an alias name and looks like this:
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "created_at": { "type": "date" },
      "updated_at": { "type": "date" },
      "status": { "type": "keyword" }
    }
  },
  "aliases": {
    "rollout-nodes-f0776f0": {}
  }
}
The name of the alias can change, so we pass it to the template when we create a new index. This is done with the Create Index API in Go:
indexTemplate := getIndexTemplate()
res, err := n.client.Indices.Create(
    indexName,
    n.client.Indices.Create.WithBody(indexTemplate),
    n.client.Indices.Create.WithContext(ctx),
    n.client.Indices.Create.WithTimeout(time.Second),
)
In my testing, this code works against localhost (without security enabled) but not against the cluster on Elastic.co: the index is created, but the alias is not.
I think the problem is related to either the API key permissions or some configuration on the server, but I have not yet been able to find which permission I'm missing.
For more context, this is the API Key I'm using:
{
  "id": "fakeID",
  "name": "index-service-key",
  "creation": 1675350573126,
  "invalidated": false,
  "username": "fakeUser",
  "realm": "cloud-saml-kibana",
  "metadata": {},
  "role_descriptors": {
    "logstash_writer": {
      "cluster": [
        "monitor",
        "transport_client",
        "read_ccr",
        "read_ilm",
        "manage_index_templates"
      ],
      "indices": [
        {
          "names": ["*"],
          "privileges": ["all"],
          "allow_restricted_indices": false
        }
      ],
      "applications": [],
      "run_as": [],
      "metadata": {},
      "transient_metadata": {
        "enabled": true
      }
    }
  }
}
Any ideas? I know I can use the POST _aliases API instead, but the aliases section of the index creation call should work too.
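Two things worth checking (suggestions, not a confirmed diagnosis). First, with go-elasticsearch, err only covers transport-level failures; if the cluster rejects all or part of the request (for example, for missing privileges), the details come back in the HTTP response body, which the snippet above never inspects. A minimal sketch that surfaces such errors, reusing the res and err from the call above:

if err != nil {
    log.Fatalf("transport error: %s", err)
}
defer res.Body.Close()
if res.IsError() { // any 4xx/5xx, e.g. a security_exception on the alias
    log.Fatalf("index creation failed: %s", res.String())
}

Second, an API key can never exceed the privileges of the user who created it: its effective permissions are the intersection of the key's role_descriptors and the owner's own roles. So it is worth verifying that the cloud-saml-kibana user itself is allowed to manage aliases on those indices.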

Merge / flatten sub aggs into main agg

Is there a way in Elasticsearch to get the results back in a sort of flattened form when there are multiple child/sub aggregations?
For instance, currently I am trying to get back all product types and their status (online / offline).
This is what I end up with:
aggs
[
  { key: SuperProduct, doc_count: 3, subagg: [
      { status: online, doc_count: 1 },
      { status: offline, doc_count: 2 }
    ]
  },
  { key: SuperProduct2, doc_count: 10, subagg: [
      { status: online, doc_count: 7 },
      { status: offline, doc_count: 3 }
    ]
  }
]
Charting libraries tend to like it flattened, so I was wondering if Elasticsearch could provide it in this sort of manner:
[
{ products_key: 'SuperProduct', status_key:'online', doc_count:1},
{ products_key: 'SuperProduct', status_key:'offline', doc_count:2},
{ products_key: 'SuperProduct2', status_key:'online', doc_count:7},
{ products_key: 'SuperProduct2', status_key:'offline', doc_count:3}
]
Thanks
It is possible with a composite aggregation, which you can use to link two terms aggregations:
POST /i/_search
{
  "size": 0,
  "aggregations": {
    "distribution": {
      "composite": {
        "sources": [
          { "product": { "terms": { "field": "product.keyword" } } },
          { "status": { "terms": { "field": "status.keyword" } } }
        ]
      }
    }
  }
}
This results in the following structure:
{
  "aggregations": {
    "distribution": {
      "after_key": {
        "product": "B",
        "status": "online"
      },
      "buckets": [
        { "key": { "product": "A", "status": "offline" }, "doc_count": 3 },
        { "key": { "product": "A", "status": "online" },  "doc_count": 2 },
        { "key": { "product": "B", "status": "offline" }, "doc_count": 1 },
        { "key": { "product": "B", "status": "online" },  "doc_count": 4 }
      ]
    }
  }
}
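Note the after_key in the response: a composite aggregation returns its buckets page by page. A sketch of fetching the next page against the same hypothetical index, passing the previous response's after_key as after:

POST /i/_search
{
  "size": 0,
  "aggregations": {
    "distribution": {
      "composite": {
        "sources": [
          { "product": { "terms": { "field": "product.keyword" } } },
          { "status": { "terms": { "field": "status.keyword" } } }
        ],
        "after": { "product": "B", "status": "online" }
      }
    }
  }
}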
If for any reason the composite aggregation doesn't fulfill your needs, you can create (via copy_to or by concatenation) or simulate (via scripted fields) a field that uniquely identifies the bucket. In our project we went with concatenation (partly out of the necessity to collapse on this field), e.g. {"bucket": "SuperProductA:online"}, which results in dirtier output (you'll have to decode that field back or use top hits to get the original values) but still does the job.

Elasticsearch average time difference aggregate query

I have documents in Elasticsearch where each document looks something like this:
{
  "id": "T12890ADSA12",
  "status": "ENDED",
  "type": "SAMPLE",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "events": [
    { "event": "STARTED",    "version": 1, "timestamp": "2020-04-30T13:41:25.862Z" },
    { "event": "INPROGRESS", "version": 2, "timestamp": "2020-05-14T17:03:09.137Z" },
    { "event": "INPROGRESS", "version": 3, "timestamp": "2020-05-17T17:03:09.137Z" },
    { "event": "ENDED",      "version": 4, "timestamp": "2020-05-29T18:18:08.483Z" }
  ],
  "createdAt": "2020-04-30T13:41:25.862Z"
}
Now I want to write a query that fetches all documents of type "SAMPLE" and gives me the average time between STARTED and ENDED across those documents, e.g. the average of (2020-05-29T18:18:08.483Z - 2020-04-30T13:41:25.862Z, ...). Assume that the STARTED and ENDED events are each present exactly once in the events array. Is there any way I can do that?
You can do something like this. The query selects the documents of type SAMPLE and status ENDED (to make sure there is an ENDED event). Then the avg aggregation uses scripting to gather the STARTED and ENDED timestamps and subtract them, returning the number of days:
POST test/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status.keyword": "ENDED" } },
        { "term": { "type.keyword": "SAMPLE" } }
      ]
    }
  },
  "aggs": {
    "duration": {
      "avg": {
        "script": "Map findEvent(List events, String type) {return events.find(it -> it.event == type);} def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp); def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp); return ChronoUnit.DAYS.between(started, ended);"
      }
    }
  }
}
The script looks like this:
Map findEvent(List events, String type) {
    return events.find(it -> it.event == type);
}
def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp);
def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp);
return ChronoUnit.DAYS.between(started, ended);
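One caveat: ChronoUnit.DAYS.between truncates to whole days. If you need finer granularity, a sketch of a drop-in replacement for the last line, computing fractional days from milliseconds:

return ChronoUnit.MILLIS.between(started, ended) / 86400000.0; // 86400000 ms per day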

Spring MongoDB Criteria API: check two values on the same nested element

I have the following query:
Criteria crit = Criteria.where("nestedObj.date").lt(LocalDate.now())
    .and("nestedObj.active").is(true)
    .and("someId").is(null)
    .and("somethingElse").exists(false);
How can I make sure that nestedObj.active and nestedObj.date are checked on the same nestedObj?
I only want this to match if a document has a nestedObj that is active AND has a date older than today.
Example:
If the nestedObj array on a document looks like this, the query should match:
[
  {
    "nestedObj": {
      "active": "true",
      "date": "2010-29-10"
    }
  },
  {
    "nestedObj": {
      "active": "false",
      "date": "2010-29-10"
    }
  },
  {
    "nestedObj": {
      "active": "true",
      "date": "2022-29-10"
    }
  }
]
But if it looks like this, it shouldn't:
[
  {
    "nestedObj": {
      "active": "false",
      "date": "2010-29-10"
    }
  },
  {
    "nestedObj": {
      "active": "true",
      "date": "2022-29-10"
    }
  }
]
Check the $elemMatch operator in the MongoDB docs: https://docs.mongodb.com/manual/reference/operator/query/elemMatch/
For instance (note that elemMatch has to target the array field itself, not a field inside it):
where("nestedObj").elemMatch(where("attribute1").is("value1").and("attribute2").regex("(?i).*$something.*"))

Children are not mapping properly to parents in Elasticsearch

"chods": {
"mappings": {
"chod": {
"properties": {
"state": {
"type": "text"
}
}
},
"chods": {},
"variant": {
"_parent": {
"type": "chod"
},
"_routing": {
"required": true
},
"properties": {
"percentage": {
"type": "double"
}
}
}
}
},
When I execute:
PUT /chods/variant/565?parent=36442
{ // some data }
It returns:
{
  "_index": "chods",
  "_type": "variant",
  "_id": "565",
  "_version": 6,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}
But when I run this query:
GET /chods/variant/565?parent=36442
It returns the variant with parent=36443:
{
  "_index": "chods",
  "_type": "variant",
  "_id": "565",
  "_version": 7,
  "_routing": "36443",
  "_parent": "36443",
  "found": true,
  "_source": {
    ...
  }
}
Why does it return parent 36443 and not 36442?
When I tried to reproduce this with your steps, I got the expected result (_parent = 36442). I noticed that after your PUT of the document with parent 36442 the output is "_version": 6, while your GET of the document returns "_version": 7. Is it possible that you posted another version of the document in between?
I also noticed that GET /chods/variant/565?parent=36443 does not actually filter by the parent id - the query parameter is disregarded. If you actually want to filter by parent id, this is the query you're looking for:
GET /chods/_search
{
  "query": {
    "parent_id": {
      "type": "variant",
      "id": "36442"
    }
  }
}
As @fylie pointed out, the main problem is that if you reuse the same document id, your document gets overridden by the last version - sort of.
Let's say that we have the index /tests and a type "a" which is a child of type "test", and we run the following commands:
PUT /tests/a/50?parent=25
{
  "item": "C"
}

PUT /tests/a/50?parent=26
{
  "item": "D"
}

PUT /tests/a/50?parent=50
{
  "item": "E",
  "item2": "F"
}
What will the result be? It can result in the creation of one to three documents.
Since the parent id determines the routing, if all three requests route to the same shard, you will end up with one document that has three versions. If they route to three different shards, you will end up with three separate documents.
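Because the routing is derived from the parent, a direct GET also needs the matching routing value to land on the right shard. A sketch against the hypothetical index above:

GET /tests/a/50?routing=25

If the three PUTs landed on different shards, each routing value can return a different document with the same id.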
