Data Operation - Select (Json Array) - power-automate

I have a JSON Array with the following structure:
{
"InvoiceNumber": "11111",
"AccountName": "Hospital",
"items": {
"item": [
{
"Quantity": "48.000000",
"Rate": "0.330667",
"Total": "15.87"
},
{
"Quantity": "1.000000",
"Rate": "25.000000",
"Total": "25.00"
}
]
}
}
I would like to use Data Operation "Select" to select invoice numbers with invoice details.
Select:
From body('Parse_Json')?['invoices']?['invoice']
Key: Invoice Number;Map:item()['InvoiceNumber'] - this line works
Key: Rate; Map: item()['InvoiceNumber']?['items']?['item']?['Rate']- this line doesnt work.
The error message says "Array elements can only be selected using an integer index". Is it possible to select the Invoice Number AND all the invoice details such as rate etc.? Thank you in advance! Also, I am trying not to use "Apply to each"

You have to use a loop in some form, the data resides in a array. The only way you can avoid looping is if you know that the number of items in the array will always be of a certain length.
Without looping, you can't be sure that you've processed each item.
To answer your question though, if you want to select a specific item in an array, as the error describes, you need to provide the index.
This is the sort of expression you need. In this one, I am selecting the item at position 1 (arrays start at 0) ...
body('Parse_JSON')?['items']?['item'][1]['rate']
Using your JSON ...
You can always extract just the items object individually but you'll still need to loop to process each item IF the length is never a static two items (for example).
To extract the items, you select the object from the dynamic content ...
Result ...

Related

Kibana scripted field which loops through an array

I am trying to use the metricbeat http module to monitor F5 pools.
I make a request to the f5 api and bring back json, which is saved to kibana. But the json contains an array of pool members and I want to count the number which are up.
The advice seems to be that this can be done with a scripted field. However, I can't get the script to retrieve the array. eg
doc['http.f5pools.items.monitor'].value.length()
returns in the preview results with the same 'Additional Field' added for comparison:
[
{
"_id": "rT7wdGsBXQSGm_pQoH6Y",
"http": {
"f5pools": {
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
},
"pool.MemberCount": [
7
]
},
If I try
doc['http.f5pools.items']
Or similar I just get an error:
"reason": "No field found for [http.f5pools.items] in mapping with types []"
Googling suggests that the doc construct does not contain arrays?
Is it possible to make a scripted field which can access the set of values? ie is my code or the way I'm indexing the data wrong.
If not is there an alternative approach within metricbeats? I don't want to have to make a whole new api to do the calculation and add a separate field
-- update.
Weirdly it seems that the number values in the array do return the expected results. ie.
doc['http.f5pools.items.ratio']
returns
{
"_id": "BT6WdWsBXQSGm_pQBbCa",
"pool.MemberCount": [
1,
1
]
},
-- update 2
Ok, so if the strings in the field have different values then you get all the values. if they are the same you just get one. wtf?
I'm adding another answer instead of deleting my previous one which is not the actual question but still may be helpful for someone else in future.
I found a hint in the same documentation:
Doc values are a columnar field value store
Upon googling this further I found this Doc Value Intro which says that the doc values are essentially "uninverted index" useful for operations like sorting; my hypotheses is while sorting you essentially dont want same values repeated and hence the data structure they use removes those duplicates. That still did not answer as to why it works different for string than number. Numbers are preserved but strings are filters into unique.
This “uninverted” structure is often called a “column-store” in other
systems. Essentially, it stores all the values for a single field
together in a single column of data, which makes it very efficient for
operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is
enabled by default. Doc values are created at index-time: when a field
is indexed, Elasticsearch adds the tokens to the inverted index for
search. But it also extracts the terms and adds them to the columnar
doc values.
Some more deep-dive into doc values revealed it a compression technique which actually de-deuplicates the values for efficient and memory-friendly operations.
Here's a NOTE given on the link above which answers the question:
You may be thinking "Well that’s great for numbers, but what about
strings?" Strings are encoded similarly, with the help of an ordinal
table. The strings are de-duplicated and sorted into a table, assigned
an ID, and then those ID’s are used as numeric doc values. Which means
strings enjoy many of the same compression benefits that numerics do.
The ordinal table itself has some compression tricks, such as using
fixed, variable or prefix-encoded strings.
Also, if you dont want this behavior then you can disable doc-values
OK, solved it.
https://discuss.elastic.co/t/problem-looping-through-array-in-each-doc-with-painless/90648
So as I discovered arrays are prefiltered to only return distinct values (except in the case of ints apparently?)
The solution is to use params._source instead of doc[]
The answer for why doc doesnt work
Quoting below:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
Doc-values can only return "simple" field values like numbers, dates,
geo- points, terms, etc, or arrays of these values if the field is
multi-valued. It cannot return JSON objects
Also, important to add a null check as mentioned below:
Missing fields
The doc['field'] will throw an error if field is
missing from the mappings. In painless, a check can first be done with
doc.containsKey('field')* to guard accessing the doc map.
Unfortunately, there is no way to check for the existence of the field
in mappings in an expression script.
Also, here is why _source works
Quoting below:
The document _source, which is really just a special stored field, can
be accessed using the _source.field_name syntax. The _source is loaded
as a map-of-maps, so properties within object fields can be accessed
as, for example, _source.name.first.
.
Responding to your comment with an example:
The kyeword here is: It cannot return JSON objects. The field doc['http.f5pools.items'] is a JSON object
Try running below and see the mapping it creates:
PUT t5/doc/2
{
"items": [
{
"monitor": "default"
},
{
"monitor": "default"
}
]
}
GET t5/_mapping
{
"t5" : {
"mappings" : {
"doc" : {
"properties" : {
"items" : {
"properties" : {
"monitor" : { <-- monitor is a property of items property(Object)
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}

Elasticsearch performance impact on choosing mapping structure for index

I am receiving data in a format like,
{
name:"index_name",
status: "good",
datapoints: [{
paramType: "ABC",
batch: [{
time:"timestamp1<epoch in sec>",
value: "123"
},{
time:"timestamp2<epoch in sec>",
value: "123"
}]
},
{
paramType: "XYZ",
batch: [{
time:"timestamp1<epoch in sec>",
value: "123"
},{
time:"timestamp2<epoch in sec>",
value: "124"
}]
}]
}
I would like to store the data into elasticsearch in such a way that I can query based on a timerange, status or paramType.
As mentioned here, I can define datapoints or batch as a nested data type which will allow to index object inside the array.
Another way, I can possibly think is by dividing the structure into separate documents. e.g.
{
name : "index_name",
status: "good",
paramType:"ABC",
time:"timestamp<epoch in sec>",
value: "123"
}
which one will be the most efficient way?
if I choose the 2nd way, I know there may be ~1000 elements in the batch array and 10-15 paramsType array, which means ~15k documents will be generated and 15k*5 fields (= 75K) key values pair will be repeated in the index?
Here this explains about the advantage and disadvantage of using nested but no performance related stats provided. in my case, there won't be any update in the inner object. So not sure which one will be better. Also, I have two nested objects so I would like to know how can I query if I use nested for getting data between a timerange?
Flat structure will perform better than nested. Nested queries are slower compared to term queries ; Also while indexing - internally a single nested document is represented as bunch of documents ; just that they are indexed in same block .
As long as your requirements are met - second option works better.

Protocol buffers Fieldmask on Collections within resource

If I want to update the "amount" field within a particular element inside "f_units" collection in the below resource (protocol buffer), how will the FieldMask look like to update the amount field? Does the field mask operate on array index for collections?
{
"f_sel": {
"f_units": [
{
"id": "1",
"amount": {
"coefficient": 1000,
"exponent": -2
}
},
{
"id": "2",
"amount": {
"coefficient": 2000,
"exponent": -2
}
}
]
}
}
Will it be "f_sel.f_units.0.amount" ? How can I update the amount using FieldMask?
As far as I know, there is no way to replace individual elements of a repeated field with an index in a FieldMask.
Instead, you'd update the amount field for the element within f_units you wish to change and set the FieldMask to
"f_sel.f_units"
It would be slightly more efficient to only have to send a delta to the original list, but it would be hard to prevent bugs. For example, what if the proto was modified in the meantime and the specified index (presuming there was a way to specify one) for the repeated field was no longer in range?
As an aside, Google does propose the concept of MergeOptions which defines semantics for how repeated fields are to be handled when merging. Currently, it appears they intend for you either to replace the repeated field in its entirety or append to the end of the destination field. Both of these merging strategies avoid the aforementioned bug that could be caused by specifying an invalid index.

ElasticSearch Score Function Depending on Neighbor Documents

I have an ElasticSearch index with 2 mappings (types).
In the app I need to display a paginated feed containing items of both types.
Currently the items are sorted just by creation date, but I also want to have control on how the items alternate with each other on the page.
For example, I want to set a rule for sequence "3 items of type A, 1 item of type B, and so on".
I need it to make sure items of both types are displayed on each page and equally distributed across the pages.
But as far as I see it's not possible to access another documents in custom score function script.
Of course it's easy to implement directly in the app logic, but it's not clear how to implement pagination using this way.
Any ideas on how to achieve that?
I don't think you can do this.
One approach (that doesn't work) is to keep a global variable in a script and to increment that once every document is being returned/processed. And then to take this number, divide it by 3 and get the modulo number. Based on this number, to sort the docs. But "global" variables are not possible in sripts.
The only two approaches that I can think of is to use a script to generate a random number and based on that to sort. In this way, you get some chances to have a "mixed list of types.
Or, if you want the smallest deterministic way of sorting the docs, still in a script take the ID of the document (you said is a number) modulo 3 it and use the value to sort.
For the random approach:
"sort": [
{
"date": {
"order": "desc"
}
},
{
"_script": {
"script": "Math.random()",
"type": "number",
"order": "asc"
}
}
]

couchdb view/reduce. sometimes you can return values, sometimes you cant..?

This is on a recent version of couchbase server.
The end goal is for the reduce/groupby to aggregate the values of the duplicate keys in to a single row with an array value.
view result with no reduce/grouping (in reality there are maybe 50 rows like this emitted):
{
"total_rows": 3,
"offset": 0,
"rows": [
{
"id": "1806a62a75b82aa6071a8a7a95d1741d",
"key": "064b6b4b-8e08-4806-b095-9e59495ac050",
"value": "1806a62a75b82aa6071a8a7a95d1741d"
},
{
"id": "47abb54bf31d39946117f6bfd1b088af",
"key": "064b6b4b-8e08-4806-b095-9e59495ac050",
"value": "47abb54bf31d39946117f6bfd1b088af"
},
{
"id": "ed6a3dd3-27f9-4845-ac21-f8a5767ae90f",
"key": "064b6b4b-8e08-4806-b095-9e59495ac050",
"value": "ed6a3dd3-27f9-4845-ac21-f8a5767ae90f"
}
}
with reduce + group_level=1:
function(keys,values,re){
return values;
}
yields an error from couch with the actual 50 or so rows from the real view (even fails with fewer view rows). couch says something about the data not shrinking rapidly enough. However this same type of thing works JUST FINE when the view keys are integers and there is a small amount of data.
Can someone please explain the difference to me?
Reduce values need to remain as small as possible, due to the nature of how they are stored in the internal b-tree data format. There's a little bit of information in the wiki about why this is.
If you want to identify unique values, this needs to be done in your map function. This section on the same wiki page shows you one method you can use to do so. (I'm sure there are others)
I am almost always going to be querying this view with a "key" parameter, so there really is no need to aggregate values via couch, it can be easily and efficiently done in the app.

Resources