Transform an array into an object in JSONata - jsonata

Here's my input data:
$array := [
{
"thing": "one",
"stuff": {
"type": "car",
"engine": "vrooom"
}
},
{
"thing": "two",
"stuff": {
"type": "truck",
"engine": "vrooom"
}
},
{
"thing": "three",
"stuff": {
"type": "car",
"engine": "vroom"
}
}
];
Ultimately I want the output to look like this:
{
"car": [
{
"thing": "one",
"stuff": {
"type": "car",
"engine": "vrooom"
}
},
{
"thing": "three",
"stuff": {
"type": "car",
"engine": "vroom"
}
}
],
"truck": [
{
"thing": "two",
"stuff": {
"type": "truck",
"engine": "vrooom"
}
}
]
}
I've tried this:
$reduce($array, function($accumulator, $value, $index){
$merge([$accumulator, {$value.stuff.type: [$accumulator.$value, $value]}])
}, {} )
but the output looks like this:
{
"car": [
{
"thing": "three",
"stuff": {
"type": "car",
"engine": "vroom"
}
},
{
"thing": "three",
"stuff": {
"type": "car",
"engine": "vroom"
}
}
],
"truck": [
{
"thing": "two",
"stuff": {
"type": "truck",
"engine": "vrooom"
}
},
{
"thing": "two",
"stuff": {
"type": "truck",
"engine": "vrooom"
}
}
]
}
I suspect it has something to do with $accumulator.$value, but I'm not sure how to get the syntax right.
Here's a playground to try: https://try.jsonata.org/gM7o-EOV4

Try this:
${
stuff.type: [$]
}
See https://try.jsonata.org/VoX6C9N8V
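For readers less familiar with JSONata, the expression above uses the object constructor as a grouping operator: items that produce the same key (stuff.type) are collected together, and wrapping the value in [$] keeps every group an array even when it holds a single item. The Python sketch below mirrors that behaviour on the question's input. It is only an illustration of the grouping logic, not how JSONata is implemented, and the function name group_by_type is hypothetical.

```python
from collections import defaultdict

# Input data copied from the question.
array = [
    {"thing": "one", "stuff": {"type": "car", "engine": "vrooom"}},
    {"thing": "two", "stuff": {"type": "truck", "engine": "vrooom"}},
    {"thing": "three", "stuff": {"type": "car", "engine": "vroom"}},
]


def group_by_type(items):
    """Group items by stuff.type, keeping each group as a list (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        # Same key -> same bucket; the item itself is appended unchanged.
        groups[item["stuff"]["type"]].append(item)
    return dict(groups)


result = group_by_type(array)
print(result)
```

Each item lands in exactly one bucket, which is the step the $reduce attempt got wrong: it looked up $accumulator.$value instead of the bucket for the current type.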

Order documents by multiple geolocations

I am new to Elasticsearch and I am trying to create an index for companies that have multiple branches in a city.
Each branch has its own geolocation point.
A company document looks like this:
{
"company_name": "Company X",
"branch": [
{
"address": {
// ... other fields
"location": "0.0000,1.1111"
}
}
]
}
The index has the following mapping:
{
"companies": {
"mappings": {
"dynamic_templates": [
{
"ids": {
"match": "id",
"match_mapping_type": "long",
"mapping": {
"type": "long"
}
}
},
{
"company_locations": {
"match": "location",
"match_mapping_type": "string",
"mapping": {
"type": "geo_point"
}
}
}
],
"properties": {
"branch": {
"properties": {
"address": {
"properties": {
// ...
"location": {
"type": "geo_point"
},
// ...
}
},
}
}
}
}
}
}
I have indexed the following documents in Elasticsearch:
{
"company_name": "Company #1",
"branch": [
{
"address": {
"location": "39.615,19.8948"
}
}
]
}
and
{
"company_name": "Company #2",
"branch": [
{
"address": {
"location": "39.586,19.9028"
}
},
{
"address": {
"location": "39.612,19.9134"
}
},
{
"address": {
"location": "39.607,19.8946"
}
}
]
}
Here is my problem: when I run the following search query, Company #2 is returned first, even though the geo-distance sort uses the exact location of Company #1:
GET companies/_search
{
"fields": [
"company_name",
"branch.address.location"
],
"_source": false,
"sort": [
{
"_geo_distance": {
"branch.address.location": {
"lon": 39.615,
"lat": 19.8948
},
"order": "asc",
"unit": "km"
}
}
]
}
Am I doing something wrong? Is there a way to sort the search results using this method?
Please keep in mind that if I search with a geolocation that is closer to one of the branches of Company #2, I need Company #2 to come first.
Finally, if my setup isn't right for this requirement and there is another way to achieve the same result with a different document structure, please let me know. I am still at the beginning of the project, so it is easy to adapt to whatever is more appropriate.
The documentation here says "Geopoint expressed as a string with the format: "lat,lon"."
Your location is stored as "location": "39.615,19.8948", so the latitude comes first. The lat and lon values in your query look swapped; it should probably be:
"branch.address.location": {
"lat": 39.615,
"lon": 19.8948
}
My Tests:
PUT idx_test
{
"mappings": {
"properties": {
"branch": {
"properties": {
"address": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
}
POST idx_test/_doc/1
{
"company_name": "Company #1",
"branch": [
{
"address": {
"location": "39.615,19.8948"
}
}
]
}
POST idx_test/_doc/2
{
"company_name": "Company #2",
"branch": [
{
"address": {
"location": "39.586,19.9028"
}
},
{
"address": {
"location": "39.612,19.9134"
}
},
{
"address": {
"location": "39.607,19.8946"
}
}
]
}
Search by location "39.607,19.8946" (a Company #2 branch):
GET idx_test/_search?
{
"fields": [
"company_name",
"branch.address.location"
],
"_source": false,
"sort": [
{
"_geo_distance": {
"branch.address.location": {
"lat": 39.607,
"lon": 19.8946
},
"order": "asc",
"unit": "km"
}
}
]
}
Response:
"hits": [
{
"_index": "idx_test",
"_id": "2",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.9028,
39.586
],
"type": "Point"
},
{
"coordinates": [
19.9134,
39.612
],
"type": "Point"
},
{
"coordinates": [
19.8946,
39.607
],
"type": "Point"
}
],
"company_name": [
"Company #2"
]
},
"sort": [
0
]
},
{
"_index": "idx_test",
"_id": "1",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.8948,
39.615
],
"type": "Point"
}
],
"company_name": [
"Company #1"
]
},
"sort": [
0.8897252783915647
]
}
]
Search by location "39.615,19.8948" (the Company #1 branch):
GET idx_test/_search?
{
"fields": [
"company_name",
"branch.address.location"
],
"_source": false,
"sort": [
{
"_geo_distance": {
"branch.address.location": {
"lat": 39.615,
"lon": 19.8948
},
"order": "asc",
"unit": "km"
}
}
]
}
Response:
"hits": [
{
"_index": "idx_test",
"_id": "1",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.8948,
39.615
],
"type": "Point"
}
],
"company_name": [
"Company #1"
]
},
"sort": [
0
]
},
{
"_index": "idx_test",
"_id": "2",
"_score": null,
"fields": {
"branch.address.location": [
{
"coordinates": [
19.9028,
39.586
],
"type": "Point"
},
{
"coordinates": [
19.9134,
39.612
],
"type": "Point"
},
{
"coordinates": [
19.8946,
39.607
],
"type": "Point"
}
],
"company_name": [
"Company #2"
]
},
"sort": [
0.8897285575578558
]
}
]
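The sort values in the responses above can be approximated outside Elasticsearch. The Python sketch below parses the "lat,lon" strings, computes haversine distances, and ranks each company by its nearest branch, which is how an ascending _geo_distance sort treats a multi-valued field by default. The Earth radius constant and the helper names (parse_geo_point, haversine_km, sort_by_nearest_branch) are assumptions made for illustration; Elasticsearch's own distance calculation may differ in the last digits.

```python
import math

# Assumption: mean Earth radius in km; Elasticsearch's constant may differ slightly.
EARTH_RADIUS_KM = 6371.0088

# Branch locations copied from the test documents, as "lat,lon" strings.
companies = {
    "Company #1": ["39.615,19.8948"],
    "Company #2": ["39.586,19.9028", "39.612,19.9134", "39.607,19.8946"],
}


def parse_geo_point(text):
    """Parse a "lat,lon" string into a (lat, lon) tuple of floats."""
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sort_by_nearest_branch(origin):
    """Order company names by the distance from origin to their closest branch."""
    def nearest(name):
        return min(haversine_km(origin, parse_geo_point(p)) for p in companies[name])
    return sorted(companies, key=nearest)


# Correct order (lat, lon): Company #1 comes first.
print(sort_by_nearest_branch((39.615, 19.8948)))
# Swapped values, as in the original query: Company #2 comes first.
print(sort_by_nearest_branch((19.8948, 39.615)))
```

With the swapped coordinates the origin lies far to the south-east of every branch, so the ranking is decided by which branch happens to sit slightly further south and east, which explains why Company #2 was returned first in the question.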

Need Nested sorting elastic search query

Sample result that needs to be sorted (the name values are always the same; only value changes):
{
"sampleJson": {
"loop1": {
"userId": "1",
"loop2": {
"loop3": [
{
"name": "hindi",
"value": "abc"
},
{
"name": "telugu",
"value": "xyz"
}
]
}
},
"loop1": {
"userId": "2",
"loop2": {
"loop3": [
{
"name": "hindi",
"value": "def"
},
{
"name": "telugu",
"value": "ghi"
}
]
}
},
"loop1": {
"userId": "1",
"loop2": {
"loop3": [
{
"name": "hindi",
"value": "jkl"
},
{
"name": "telugu",
"value": "mno"
}
]
}
}
}
}
After sorting in Elasticsearch, the result should look like this:
{
"sampleJson": {
"loop1": {
"userId": "1",
"loop2": {
"loop3": [
{
"name": "hindi",
"value": "jkl"
},
{
"name": "telugu",
"value": "mno"
}
]
}
},
"loop1": {
"userId": "1",
"loop2": {
"loop3": [
{
"name": "hindi",
"value": "abc"
},
{
"name": "telugu",
"value": "xyz"
}
]
}
}
}
}
The Elasticsearch sort below is not working, and I'm not sure what is wrong with it. Any help would be appreciated.
{
"from": 0,
"size": 25,
"query": {
"bool": {
"must": [
{
"match": {
"userId": "1"
}
}
]
}
},
"sort": [
{
"sampleJson.loop1.loop2.loop3.value": {
"order":"asc",
"nested": {
"path": "sampleJson",
"filter": {
"terms" : {
"sampleJson.loop1.loop2.loop3.name": [
"telugu"
]
}
}
}
}
}
]
}

Elasticsearch 7.8 Nested Aggregation not returning correct data

I have been struggling for a week to get correct data out of an Elasticsearch nested aggregation. Below are my index mapping and two sample documents. What I want to do is:
Match all documents where the field xforms.sentence.tokens.value equals 24
Within the matched set of documents, count the matches grouped by xforms.sentence.tokens.tag, considering only tokens where xforms.sentence.tokens.value equals 24
So, for the documents inserted below, the output I expect is:
{"JJ": 1, "NN": 1}
{
"_doc": {
"_meta": {},
"_source": {},
"properties": {
"originalText": {
"type": "text"
},
"testDataId": {
"type": "text"
},
"xforms": {
"type": "nested",
"properties": {
"sentence": {
"type": "nested"
},
"predicate": {
"type": "nested"
}
}
},
"corpusId": {
"type": "text"
},
"row": {
"type": "text"
},
"batchId": {
"type": "text"
},
"processor": {
"type": "text"
}
}
}
}
The sample documents are as follows:
{
"_id": "28",
"_source": {
"testDataId": "5e97e9bef033448b893e485baa0fdf15",
"originalText": "Some text with the word 24",
"xforms": [{
"sentence": {
"tokens": [{
"lemma": "Some",
"index": 1,
"after": " ",
"tag": "JJ",
"value": "Some"
},
{
"lemma": "text",
"index": 2,
"after": " ",
"tag": "NN",
"value": "text"
},
{
"lemma": "with",
"index": 3,
"after": " ",
"tag": "NN",
"value": "with"
},
{
"lemma": "the",
"index": 4,
"after": "",
"tag": "CD",
"value": "the"
},
{
"lemma": "word",
"index": 5,
"after": " ",
"tag": "CC",
"value": "word"
},
{
"lemma": "24",
"index": 6,
"after": " ",
"tag": "JJ",
"value": "24"
}
],
"type": "RAW"
},
"originalSentence": "Some text with the word 24 in it",
"id": "e724611d8c024bcb8f0158b60e3df87e"
}]
}
},
{
"_id": "56",
"_source": {
"testDataId": "5e97e9bef033448b893e485baa0fad15",
"originalText": "24 word",
"xforms": [{
"sentence": {
"tokens": [{
"lemma": "24",
"index": 1,
"after": " ",
"tag": "NN",
"value": "24"
},
{
"lemma": "word",
"index": 2,
"after": " ",
"tag": "JJ",
"value": "word"
}
],
"type": "RAW"
},
"originalSentence": "24 word",
"id": "e724611d8c024bcb8f0158b60e3d123"
}]
}
}
Expanding on @Gibbs's answer: @N Kiram, you'll need to set the tokens as nested too:
{
"xforms":{
"type":"nested",
"properties":{
"sentence":{
"type":"nested",
"properties":{
"tokens":{ <----
"type":"nested"
}
}
},
"predicate":{
"type":"nested"
}
}
}
}
Then and only then will your aggs yield the correct counts:
{
"aggregations":{
"xforms":{
"doc_count":8,
"inner":{
"doc_count":2,
"tag_count":{
"doc_count_error_upper_bound":0,
"sum_other_doc_count":0,
"buckets":[
{
"key":"JJ",
"doc_count":1
},
{
"key":"NN",
"doc_count":1
}
]
}
}
}
}
}
Side note: you'll have to reindex in order for the changed mapping to apply.
{
"aggs": {
"xforms": {
"nested": { //Nested aggregation
"path": "xforms.sentence"
},
"aggs": {
"inner": { //Counting only within the matching doc
"filter": {
"bool": {
"filter": { //Filtering docs with value=24
"terms": {
"xforms.sentence.tokens.value": [
"24"
]
}
}
}
},
"aggs" : {
"tag_count":{ //On filtered doc, doing terms aggregation on tag's keyword version as tag is of type text
"terms":{
"field":"xforms.sentence.tokens.tag.keyword"
}
}
}
}
}
}
}
}
It produces the output below:
"aggregations": {
"xforms": {
"doc_count": 2,
"inner": {
"doc_count": 2,
"tag_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "JJ",
"doc_count": 2
},
{
"key": "NN",
"doc_count": 2
},
{
"key": "CC",
"doc_count": 1
},
{
"key": "CD",
"doc_count": 1
}
]
}
}
}
}
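The two sets of bucket counts in this thread differ because of how the tokens are indexed. When tokens is not mapped as nested, the tokens of a sentence are flattened together, so a sentence containing any token with value 24 contributes all of its tags. When tokens is nested, each token is matched on its own. The Python sketch below simulates both behaviours on the (value, tag) pairs from the two sample documents. It is a simplified model for illustration, not Elasticsearch code, and the function names are hypothetical.

```python
from collections import Counter

# (value, tag) pairs taken from the two sample documents, keyed by _id.
docs = {
    "28": [("Some", "JJ"), ("text", "NN"), ("with", "NN"),
           ("the", "CD"), ("word", "CC"), ("24", "JJ")],
    "56": [("24", "NN"), ("word", "JJ")],
}


def count_tags_flattened(docs, value):
    """Model of non-nested tokens: a matching sentence contributes every distinct tag once."""
    counts = Counter()
    for tokens in docs.values():
        if any(v == value for v, _ in tokens):
            counts.update({tag for _, tag in tokens})
    return dict(counts)


def count_tags_nested(docs, value):
    """Model of nested tokens: only tokens that match the value themselves are counted."""
    counts = Counter()
    for tokens in docs.values():
        counts.update(tag for v, tag in tokens if v == value)
    return dict(counts)


print(count_tags_flattened(docs, "24"))
print(count_tags_nested(docs, "24"))
```

The flattened model reproduces the JJ/NN/CC/CD buckets, while the nested model gives the expected one JJ and one NN.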

Jolt Transform to Array with Parent Attributes

I am using NiFi Jolt Processor to transform some JSON data.
My JSON field genre contains shared attributes and an array that contains the list of actual genre names.
I need to transform the "genre" attribute into an array of "genres", where each entry contains the common attributes plus one of the genre names.
I have the following input JSON:
{
"programIdentifier": "9184663",
"programInstance": {
"genre": {
"source": "GN",
"locked": false,
"lastModifiedDate": 1527505462094,
"lastModifiedBy": "Some Service",
"genres": [
"Miniseries",
"Drama"
]
}
}
}
I have tried the following spec:
[{
"operation": "shift",
"spec": {
"programIdentifier": ".&",
"genre": {
"source": "genres[].source.value",
"locked": "genres[].locked",
"lastModifiedDate": "genres[].lastModifiedDate",
"lastModifiedBy": "genres[].lastModifiedBy",
"genres": {
"*": "genres[&0].name"
}
}
}
}]
This is my expected output:
{
"programIdentifier": "9184663",
"programInstance": {
"genres": [
{
"source": {
"value": "GN"
},
"locked": false,
"lastModifiedDate": 1527505462094,
"lastModifiedBy": "Some Service",
"name": "Miniseries"
},
{
"source": {
"value": "GN"
},
"locked": false,
"lastModifiedDate": 1527505462094,
"lastModifiedBy": "Some Service",
"name": "Drama"
}
]
}
}
But it's coming out as:
{
"programIdentifier": "9184663",
"programInstance": {
"genres": [
{
"source": {
"value": "GN"
},
"name": "Miniseries"
}, {
"locked": false,
"name": "Drama"
}, {
"lastModifiedDate": 1527505462094
}, {
"lastModifiedBy": "Some Service"
}]
}
}
Is this what you want to achieve?
[
{
"operation": "shift",
"spec": {
"programIdentifier": ".&",
"programInstance": {
"genre": {
"genres": {
"*": {
"@2": {
"source": "programInstance.genres[&2].source[]",
"locked": "programInstance.genres[&2].locked",
"lastModifiedDate": "programInstance.genres[&2].lastModifiedDate",
"lastModifiedBy": "programInstance.genres[&2].lastModifiedBy"
},
"@": "programInstance.genres[&1].name"
}
}
}
}
}
}
]
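To make the target shape concrete, here is the same fan-out written as a small Python sketch: the shared attributes of genre are copied into one object per genre name. It follows the expected output from the question (with source wrapped as {"value": ...}), which may differ slightly from what the spec above emits for source. It is an illustration of the intended transformation, not Jolt, and the function name expand_genres is hypothetical.

```python
# Input copied from the question.
program = {
    "programIdentifier": "9184663",
    "programInstance": {
        "genre": {
            "source": "GN",
            "locked": False,
            "lastModifiedDate": 1527505462094,
            "lastModifiedBy": "Some Service",
            "genres": ["Miniseries", "Drama"],
        }
    },
}


def expand_genres(program):
    """Build one genres entry per genre name, repeating the shared attributes."""
    genre = program["programInstance"]["genre"]
    entries = []
    for name in genre["genres"]:
        entries.append({
            "source": {"value": genre["source"]},  # wrapped as in the expected output
            "locked": genre["locked"],
            "lastModifiedDate": genre["lastModifiedDate"],
            "lastModifiedBy": genre["lastModifiedBy"],
            "name": name,
        })
    return {
        "programIdentifier": program["programIdentifier"],
        "programInstance": {"genres": entries},
    }


output = expand_genres(program)
print(output)
```

The key point is that every entry is built per array index, which is what the question's first spec missed: its genres[] targets appended a new element for each shared attribute instead of writing them into the element for the current genre.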

Elasticsearch - How to get min/max/avg of set of nested documents

Given the following mapping and documents in Elasticsearch, how would I get the min/max/avg of a set of nested documents that match a certain condition? For instance, how would I get the min age of pets that are dogs? My filter returns the correct people (those who have dogs), but how do I make the min aggregation run against only the matching nested documents?
(1) Mapping
{
"myIndex": {
"mappings": {
"person": {
"properties": {
"name": {
"type": "string"
},
"pets": {
"type": "nested",
"properties": {
"age": {
"type": "long"
},
"name": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
}
}
(2) Data
{
"name": "bob",
"pets": [
{
"type": "dog",
"name": "wolfie",
"age": 20
},
{
"type": "cat",
"name": "kitty",
"age": 6
}
]
}
{
"name": "bill",
"pets": [
{
"type": "fish",
"name": "goldie",
"age": 2
},
{
"type": "cat",
"name": "meowie",
"age": 18
}
]
}
(3) Query and aggregation
{
"query": {
"filtered": {
"filter": {
"nested": {
"path": "pets",
"filter" : {
"terms": {
"pets.type": ["dog"]
}
}
}
}
}
},
"aggs": {
"minage": {
"nested": {
"path": "pets"
},
"aggs": {
"minage": {
"min": {
"field": "age"
}
}
}
}
}
}
I think you can get what you want with a combination of filter aggregation and the nested filter's join option.
This code worked for me:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"person": {
"properties": {
"name": {
"type": "string"
},
"pets": {
"type": "nested",
"properties": {
"age": {
"type": "long"
},
"name": {
"type": "string"
},
"type": {
"type": "string"
}
}
}
}
}
}
}
PUT /test_index/person/1
{
"name": "bob",
"pets": [
{
"type": "dog",
"name": "wolfie",
"age": 20
},
{
"type": "cat",
"name": "kitty",
"age": 6
}
]
}
PUT /test_index/person/2
{
"name": "bill",
"pets": [
{
"type": "fish",
"name": "goldie",
"age": 2
},
{
"type": "cat",
"name": "meowie",
"age": 18
}
]
}
PUT /test_index/person/3
{
"name": "john",
"pets": [
{
"type": "dog",
"name": "oldie",
"age": 25
}
]
}
POST /test_index/_search?search_type=count
{
"aggs": {
"minage_1": {
"nested": {
"path": "pets"
},
"aggs": {
"minage_2": {
"filter": {
"nested": {
"path": "pets",
"filter": {
"terms": {
"pets.type": [
"dog"
]
}
},
"join": false
}
},
"aggs": {
"min_age_3": {
"min": {
"field": "age"
}
}
}
}
}
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"minage_1": {
"doc_count": 5,
"minage_2": {
"doc_count": 2,
"min_age_3": {
"value": 20
}
}
}
}
}
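The counts in the response can be checked by hand. The Python sketch below models the three aggregation levels on the sample data: the nested step sees every pet as its own document, the filter step keeps only the dogs, and the min step runs on what is left. It is a plain-Python model of the logic for illustration, not an Elasticsearch client call, and the function name min_age_by_type is hypothetical.

```python
# Documents copied from the answer's test index.
people = [
    {"name": "bob", "pets": [
        {"type": "dog", "name": "wolfie", "age": 20},
        {"type": "cat", "name": "kitty", "age": 6},
    ]},
    {"name": "bill", "pets": [
        {"type": "fish", "name": "goldie", "age": 2},
        {"type": "cat", "name": "meowie", "age": 18},
    ]},
    {"name": "john", "pets": [
        {"type": "dog", "name": "oldie", "age": 25},
    ]},
]


def min_age_by_type(people, pet_type):
    """Return (nested doc count, filtered doc count, min age) for one pet type."""
    # "nested" step: every pet becomes its own document.
    all_pets = [pet for person in people for pet in person["pets"]]
    # "filter" step: keep only pets of the requested type.
    matching = [pet for pet in all_pets if pet["type"] == pet_type]
    # "min" step: aggregate over the filtered pets only.
    min_age = min((pet["age"] for pet in matching), default=None)
    return len(all_pets), len(matching), min_age


print(min_age_by_type(people, "dog"))
```

Filtering at the pet level, rather than at the person level as in the original query, is what keeps kitty's age of 6 out of the result for bob.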
