How to optimize expression to avoid evaluation timeout? - jsonata

When working with a limited dataset, my JSONata expression works as intended, but with a larger one I keep getting "Expression evaluation timeout: Check for infinite loop". Is there any way to optimize this expression to avoid the timeout?
Check try.jsonata.org/ryGcRwxkr for an example with a working dataset.
Please try pasting this json.
{
  "type": "FeatureCollection",
  "features": $map($[0].features, function($v){
    {
      "type": $v.type,
      "geometry": $v.geometry,
      "properties": $merge([$v.properties, {"fecha": $$[1][tm=$v.properties.tm].fecha}])
    }
  })
}
Thanks in advance!
Update
I found it is a limitation of the JSONata Exerciser. Anyway, I'd still like to optimize the expression because it is very resource-demanding. Thanks again!
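One way to reduce the work (a rough sketch, untested against your data; the $tbl variable name is just illustrative) is to build a lookup object keyed by tm once, instead of filtering $$[1] inside every iteration of $map:
(
  /* build a { "tm": fecha } lookup once, then read from it per feature */
  $tbl := $merge($$[1].{ $string(tm): fecha });
  {
    "type": "FeatureCollection",
    "features": $map($[0].features, function($v){
      {
        "type": $v.type,
        "geometry": $v.geometry,
        "properties": $merge([$v.properties, {"fecha": $lookup($tbl, $string($v.properties.tm))}])
      }
    })
  }
)
This turns the nested scan (features × lookup rows) into a single pass over each, which should scale much better on larger datasets.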

Related

Azure Data Factory REST API paging with Elasticsearch

While developing a pipeline that will use Elasticsearch as a source, I ran into an issue related to paging. I am using the Elasticsearch SQL API. Basically, I started by making the request in Postman, and it works well. The request body looks like this:
{
  "query": "SELECT Id,name,ownership,modifiedDate FROM \"core\" ORDER BY Id",
  "fetch_size": 20,
  "cursor": ""
}
After the first run, the response body contains a cursor string, which is a pointer to the next page. If I send the request in Postman and provide the cursor value from the previous request, it returns the data for the second page, and so on. I am trying to achieve the same result in Azure Data Factory. For this I am using a copy activity, which stores the response to Azure blob storage. The source setup is as follows:
copy activity source configuration
This is the expression for the body:
{
  "query": "SELECT Id,name,ownership,modifiedDate FROM \"#{variables('TableName')}\" WHERE ORDER BY Id",
  "fetch_size": #{variables('Rows')},
  "cursor": ""
}
I have no idea how to correctly set up the pagination rule. The pipeline works properly, but only for the first request. I've tried setting up a Headers.cursor rule with the expression $.cursor, but this setup leads to an infinite loop and the pipeline fails with the Elasticsearch restriction.
I've also tried reading the documentation at https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support but it seems pretty limited in terms of usage examples and difficult to understand.
Could somebody help me understand how to build the pipeline with paging?
The response with the cursor looks like:
{
  "columns": [
    { "name": "companyId", "type": "integer" },
    { "name": "name", "type": "text" },
    { "name": "ownership", "type": "keyword" },
    { "name": "modifiedDate", "type": "datetime" }
  ],
  "rows": [
    [2, "mic Inc.", "manufacture", "2021-03-31T12:57:51.000Z"]
  ],
"cursor": "g/WuAwFaAXNoRG5GMVpYSjVWR2hsYmtabGRHTm9BZ0FBQUFBRUp6VGxGbUpIZWxWaVMzcGhVWEJITUhkbmJsRlhlUzFtWjNjQUFBQUFCQ2MwNWhaaVIzcFZZa3Q2WVZGd1J6QjNaMjVSVjNrdFptZDP/////DwQBZgljb21wYW55SWQBCWNvbXBhbnlJZAEHaW50ZWdlcgAAAAFmBG5hbWUBBG5hbWUBBHRleHQAAAABZglvd25lcnNoaXABCW93bmVyc2hpcAEHa2V5d29yZAEAAAFmDG1vZGlmaWVkRGF0ZQEMbW9kaWZpZWREYXRlAQhkYXRldGltZQEAAAEP"
}
I finally found the solution; hopefully it will be useful for the community.
Basically, the solution needs to be split into the following steps.
Step 1: Make the first request as in the question description and stage the file to blob storage.
Step 2: Read the blob file, get the cursor value, and set it to a variable.
Step 3: Keep requesting data with a changed body:
{"cursor" : "#{variables('cursor')}" }
The pipeline looks like this:
pipeline
The pagination configuration looks like this:
pagination
It is a workaround, as the server ignores this header, but we need something that allows sending the request in a loop.
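For step 2, one possible approach (a sketch; the GetCursor activity name and its dataset are hypothetical) is a Lookup activity that reads the staged blob JSON with "First row only" enabled, followed by a Set Variable activity whose value expression is:
@activity('GetCursor').output.firstRow.cursor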

How to convert json to collection in power apps

I have a Power App that uses a flow from Power Automate.
My flow does an HTTP GET and responds with JSON to Power Apps, like below.
Here is the JSON as text:
{"value": "[{\"dataAreaId\":\"mv\",\"AccountNum\":\"100000\",\"Name\":\"*****L FOOD AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100001\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100014\",\"Name\":\"****(SEB)\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100021\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100029\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500100\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500210\",\"Name\":\"****\"}]"}
But when I try to convert this JSON to a collection, it doesn't behave like a list.
It just seems like text. Here is how I try to bind the list.
How can I create a collection from the JSON to bind to the gallery view?
I found the solution. I finally created a collection from the response of the flow.
The flow's name is GetVendors.
The response of the flow looks like this:
{"value": "[{\"dataAreaId\":\"mv\",\"AccountNum\":\"100000\",\"Name\":\"*****L FOOD AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100001\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100014\",\"Name\":\"****(SEB)\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100021\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100029\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500100\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500210\",\"Name\":\"****\"}]"}
The code below creates a list from this response:
ClearCollect(_vendorData, MatchAll(GetVendors.Run(_token.value).value, "\{""dataAreaId"":""(?<dataAreaId>[^""]*)"",""AccountNum"":""(?<AccountNum>[^""]*)"",""Name"":""(?<Name>[^""]*)""\}"));
And I could bind AccountNum and Name from the _vendorData collection to the gallery view.
In my case I had the same issue, but I couldn't manage to get data into the _vendorData collection because the MatchAll regex part was not working correctly, even though I had exactly the same scenario.
My solution was to modify the flow itself, returning a Response action instead of "Respond to a Power App or flow", so basically I could return the full response from the HTTP action.
This caused me some issues as well, because when I generated the schema from a sample I could not register the flow in the Power App; it failed with the error "Failed during http send request".
The solution was to manually review the response schema and change all column types to one of the following three, because others are not supported: string, integer, or boolean. Object and array can be set only on top-level items, never on children, so if you have anything other than those three, replace it with string. And no property can be left with an undefined type.
Basically, I like this solution even more, because in Power Apps itself you do not need to do any conversion: simply use the data as is, because an array is already recognized as a collection and all the properties are already named for you.
An example Response step schema is below.
{
  "type": "object",
  "properties": {
    "PropertyOne": {
      "type": "string"
    },
    "PropertyTwo": {
      "type": "integer"
    },
    "PropertyThree": {
      "type": "boolean"
    },
    "PropertyFour": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "PropertyArray1": {
            "type": "string"
          },
          "PropertyArray2": {
            "type": "integer"
          },
          "PropertyArray3": {
            "type": "boolean"
          }
        }
      }
    }
  }
}
It is easy now: Power Apps introduced the ParseJSON function, which converts a string to a collection easily.
Table(ParseJSON(JSONString));
In the gallery, map columns like ThisItem.Value.ColumnName.
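Applied to the flow above (a sketch that assumes the flow is still called GetVendors and still returns the JSON array as a string in the value property), the collection could be built like this:
ClearCollect(
    _vendorData,
    ForAll(
        Table(ParseJSON(GetVendors.Run(_token.value).value)),
        {
            dataAreaId: Text(ThisRecord.Value.dataAreaId),
            AccountNum: Text(ThisRecord.Value.AccountNum),
            Name: Text(ThisRecord.Value.Name)
        }
    )
);
Text() is needed because ParseJSON returns untyped values; once the collection is built this way, the gallery can bind to ThisItem.AccountNum and ThisItem.Name directly.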

Script fields in nested objects, specifically geo_shapes

Part of my document mapping consists of the mapping below:
"locations": {
"type": "nested",
"properties": {
"point": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "100m"
}
}
}
When I attempt to issue a script_field as part of a query, Elasticsearch returns an error:
failed to run inline script [doc['locations.point'].distanceInMiles(53.4791,-2.2441)] using lang [groovy]
With a reason of:
failed to find field data builder for field locations.point, and type geo_shape
I'm assuming this is because the field is nested: it has a few (geo) points inside the field and the search matches on any one of them. However, as it's nested, the context of the path locations.point is obviously wrong; it needs to be something like locations.point[10] (for the 11th one, perhaps - this depends on the context of the matched item in the query).
So, does anyone know a way to perform this properly? Is there a special operator I can give the script so that it knows it needs to look at the matched point from the field?
Thanks in advance.
Turns out it's actually not possible to do this with geo_shapes.

How to use the NiFi JoltTransformJSON spec?

I wish to write a JoltTransformJSON spec that converts the input below to the output below.
I have tried to use map to List and other syntax, but have not been successful so far.
Expected input:
{
"params": "sn=GH6747246T4JLR6AZ&c=QUERY_RECORD&p=test_station_name&p=station_id&p=result&p=mac_addresss"
}
Expected output:
{
  "queryType": "scan",
  "dataSource": "xyz",
  "resultFormat": "list",
  "columns": ["test_station_name", "station_id", "result", "mac_address"],
  "intervals": ["2018-01-01/2018-02-09"],
  "filter": {
    "type": "selector",
    "dimension": "sn",
    "value": "GH6747246T4JLR6AZ"
  }
}
Except for the content inside columns and the dimension and value attributes, the rest of the fields are hardcoded.
As all of the data is contained in a single JSON key/value, I don't think JoltTransformJSON is the best option here. I actually think writing a simple script in Python/Groovy/Ruby to split the querystring value and write it out as JSON is easier and less complicated to maintain. I would recommend Groovy specifically (you can use the specialized ExecuteGroovyScript processor), as it is the most performant & robust in Apache NiFi and has excellent JSON handling.
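For reference, a rough, untested sketch of that approach for ExecuteGroovyScript, using the standard scripting session API (the hardcoded dataSource and intervals values are simply copied from the expected output above, and the p values are passed through verbatim):
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def json = new JsonSlurper().parse(inputStream)
    // "sn=...&c=QUERY_RECORD&p=a&p=b" -> [["sn", "..."], ["c", "QUERY_RECORD"], ["p", "a"], ["p", "b"]]
    def pairs = json.params.tokenize('&').collect { it.tokenize('=') }
    def result = [
        queryType   : 'scan',
        dataSource  : 'xyz',
        resultFormat: 'list',
        // every repeated p parameter becomes a column, in order
        columns     : pairs.findAll { it[0] == 'p' }.collect { it[1] },
        intervals   : ['2018-01-01/2018-02-09'],
        filter      : [
            type     : 'selector',
            dimension: 'sn',
            value    : pairs.find { it[0] == 'sn' }?.getAt(1)
        ]
    ]
    outputStream.write(JsonOutput.toJson(result).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)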

Dynamic Template not working for short, byte & float

I am trying to create a template, and in the template I am trying to achieve dynamic mapping.
Here is what I wrote. In 6.2.1, only boolean, date, double, long, object, and string are automatically detected, and I am facing issues mapping float, short & byte.
Here, if I index 127, it will be mapped to short via short_fields, which is fine, but when I index something like 325566, I get the exception Numeric value (325566) out of range of Java short. I want to suppress this and let long_fields take care of it, so that it is mapped to long. I have tried coerce: false and ignore_malformed: true; none of them worked as expected.
"dynamic_templates": [
{
"short_fields": {
"match": "*",
"match_mapping_type": "long",
"mapping": {
"type": "short",
"doc_values": true
}
}
},
{
"long_fields": {
"match": "*",
"match_mapping_type": "long",
"mapping": {
"type": "long",
"doc_values": true
}
}
},
{
"byte_fields": {
"match": "*",
"match_mapping_type": "byte",
"mapping": {
"type": "byte",
"doc_values": true
}
}
}
]
Unfortunately, it is not possible to make Elasticsearch choose the smallest data type possible for you. There are plenty of workarounds, but let me first explain why it does not work.
Why it does not work?
Dynamic mapping templates allow you to override the default dynamic type matching in three ways:
by matching the name of the field,
by matching the type Elasticsearch has guessed for you,
and by a path in the document.
Elasticsearch picks the first matching rule that works. In your case, the first rule, short_fields, always works for any integer, because it accepts any field name and the guessed type long.
That's why it works for 127 but doesn't work for 325566.
To illustrate this point better, let's change "match_mapping_type" in the first rule like this:
"match_mapping_type": "short",
Elasticsearch does not accept it and returns an error:
{
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [doc]: No field type matched on [short], \
possible values are [object, string, long, double, boolean, date, binary]"
}
But how can we make Elasticsearch pick the right types?
Here are some of the options.
Define strict mapping manually
This gives you full control over the selection of types.
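For example, a minimal sketch of such a mapping (the field names are illustrative; in 6.x the properties still sit under a mapping type such as doc):
"mappings": {
  "doc": {
    "properties": {
      "customerAge": { "type": "byte" },
      "revenue": { "type": "long" }
    }
  }
}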
Use the default long
Postpone "shrinking" data until it starts being a performance problem.
In fact, using smaller data types will only affect searching/indexing performance, not the storage required. As long as you are fine with dynamic mappings, Elasticsearch manages them for you pretty well.
Mark field names with type information
Since Elasticsearch is not able to tell a byte from a long, you can determine the type beforehand and add type information to the field name, like customerAge_byte or revenue_long.
Then you will be able to use a prefix/suffix match like this:
{
  "bytes_as_longs": {
    "match_mapping_type": "long",
    "match": "*_byte",
    "mapping": {
      "type": "byte"
    }
  }
}
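With that convention, a document like the one below (values illustrative) gets customerAge_byte mapped to byte by the rule above, while revenue_long falls through to the default long mapping, assuming no other *_long rule is added:
{
  "customerAge_byte": 34,
  "revenue_long": 325566
}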
Please choose the approach that fits your needs best.
Why Elasticsearch takes longs
The reason why Elasticsearch takes longs for any integer input probably comes from the JSON definition of a number type (as shown at json.org): it is not possible to tell whether a number like 0 or 1 is actually an integer or a long across the entire dataset. Elasticsearch has to guess the correct type from the first example it sees, and it takes the safest shot possible.
Hope that helps!
