Azure Data Factory REST API paging with Elasticsearch - elasticsearch

During developing pipeline which will use Elasticsearch as a source I faced with issue related paging. I am using SQL Elasticsearch API. Basically, I've started to do request in postman and it works well. The body of request looks following:
{
"query":"SELECT Id,name,ownership,modifiedDate FROM \"core\" ORDER BY Id",
"fetch_size": 20,
"cursor" : ""
}
After first run in response body it contains cursor string which is pointer to next page. If in postman I send the request and provide cursor value from previous request it return data for second page and so on. I am trying to archive the same result in Azure Data Factory. For this I using copy activity, which store response to Azure blob. Setup for source is following.
copy activity source configuration
This is expression for body
{
"query": "SELECT Id,name,ownership,modifiedDate FROM \"#{variables('TableName')}\" WHERE ORDER BY Id","fetch_size": #{variables('Rows')}, "cursor": ""
}
I have no idea how to correctly setup pagination rule. The pipeline works properly but only for the first request. I've tried to setup Headers.cursor and expression $.cursor but this setup leads to an infinite loop and pipeline fails with the Elasticsearch restriction.
I've also tried to read document at https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support but it seems pretty limited in terms of usage examples and difficult for understanding.
Could somebody help me understand how to build the pipeline with paging abilities utilization?
Responce with the cursor looks like:
{
"columns": [
{
"name": "companyId",
"type": "integer"
},
{
"name": "name",
"type": "text"
},
{
"name": "ownership",
"type": "keyword"
},
{
"name": "modifiedDate",
"type": "datetime"
}
],
"rows": [
[
2,
"mic Inc.",
"manufacture",
"2021-03-31T12:57:51.000Z"
]
],
"cursor": "g/WuAwFaAXNoRG5GMVpYSjVWR2hsYmtabGRHTm9BZ0FBQUFBRUp6VGxGbUpIZWxWaVMzcGhVWEJITUhkbmJsRlhlUzFtWjNjQUFBQUFCQ2MwNWhaaVIzcFZZa3Q2WVZGd1J6QjNaMjVSVjNrdFptZDP/////DwQBZgljb21wYW55SWQBCWNvbXBhbnlJZAEHaW50ZWdlcgAAAAFmBG5hbWUBBG5hbWUBBHRleHQAAAABZglvd25lcnNoaXABCW93bmVyc2hpcAEHa2V5d29yZAEAAAFmDG1vZGlmaWVkRGF0ZQEMbW9kaWZpZWREYXRlAQhkYXRldGltZQEAAAEP"
}

I finally find the solution, hopefully, it will be useful for the community.
Basically, what needs to be done it is split the solution into four steps.
Step 1 Make the first request as in the question description and stage file to blob.
Step 2 Read blob file and get the cursor value, set it to variable
Step 3 Keep requesting data with a changed body
{"cursor" : "#{variables('cursor')}" }
Pipeline looks like this:
pipeline
Configuration of pagination looks following
pagination . It is a workaround as the server ignores this header, but we need to have something which allows sending a request in loop.

Related

Is there a way to write an Expression in Power Automate to retrieve item from SurveyMonkey?

There is no dynamic content you can get from the SurveyMonkey trigger in Power Automate except for the Analyze URL, Created Date, and Link. Is it possible I could retrieve the data with an expression so I could add fields to SharePoint or send emails based on answers to questions?
For instance, here is some JSON data for a county multiple choice field, that I would like to know the county so I can have the email sent to the correct person:
{
"id": "753498214",
"answers": [
{
"choice_id": "4963767255",
"simple_text": "Williamson"
}
],
"family": "single_choice",
"subtype": "menu",
"heading": "County where the problem is occurring:"
}
And basically, a way to create dynamic fields from the content so it would be more usable?
I am a novice so your answer will have to assume I know nothing!
Thanks for considering the question.
Overall, anything I have tried is unsuccessful!
I was able to get an answer on Microsoft Power Users support.
Put this data in compose action:
{
"id": "753498214",
"answers": [
{
"choice_id": "4963767255",
"simple_text": "Williamson"
}
],
"family": "single_choice",
"subtype": "menu",
"heading": "County where the problem is occurring:"
}
Then these expressions in additional compose actions:
To get choice_id:
outputs('Compose')?['answers']?[0]?['choice_id']
To get simple_text:
outputs('Compose')?['answers']?[0]?['simple_text']
Reference link here where I retrieved the answer is here.
https://powerusers.microsoft.com/t5/General-Power-Automate/How-to-write-an-expression-to-retrieve-answer/m-p/1960784#M114215

How to query arbitrary fields in Kibana?

We're importing logs that contain the the full request/response for a given endpoint. Using the escLayout c# library. The import is working fine, however the 'structured' part of the log is stored under metadata, like so:
"metadata": {
"event": {
"controller": "main",
"method": "GetData",
"request": {
"userId": 1,
"clientTypeId": 2
},
"response": {
"marketOpen": true,
"price": 18.56
}
}
}
How do I go about querying this metadata as the fields do not appear in the 'Lens' page.
Is it a case of creating an index of some description? There are a lot of different (and occasionally large) data sets so this seems really impactable.
Is querying 'ad-hoc' data like this not a good use of Kibana? Should I look elsewhere, say Grafana before I go too far down the Elastic road?
Note: We're on Elastic 8.2.0

how can I compose a bot to iterate trough a xml json object?

I am using the composer to publish a bot to fetch data from an azure storage table.
In short, the bot composer needs to construct a bot to iterate through an XML deserialized JSON object returned by the azure storage rest API.
In my code generated by the composer, the bot does a "set property" step immediately following the successful return of the REST API (storage table query). Given the deserialized object returned by the storage REST API, how should the "set property" statement be constructed so the bot can print our the individual data field,
Another way to phrase the question: how can I use the composer to construct the bot to iterate through a returned deserialized object (coded in XML JSON format)?
Where can I find a document that can shed some light on this matter?
Is there any place I can find a good example? Can it be done via composer?
Thanks in advance.
Yes, it can be done. If the API returns XML, make sure you configure your api call to ask for content type application/xml.
Then you can use use the xPath built in function. Make note that it will return an array if results in more than value matches the expression, in which you can use the foreach function to iterate over it with. I needed to run the nightly build of Composer (with bot-builder 4.12.0) to get it to work for me. See here for some more info:
https://github.com/microsoft/botbuilder-js/pull/3093
Here's an example that worked for me:
"actions": [
{
"$kind": "Microsoft.SendActivity",
"$designer": {
"id": "rGv7XC"
},
"activity": "${SendActivity_rGv7XC()}"
},
{
"$kind": "Microsoft.HttpRequest",
"$designer": {
"id": "TDA1wO"
},
"method": "GET",
"url": "http://www.geoplugin.net/xml.gp?ip=157.54.54.128",
"resultProperty": "dialog.api_response",
"contentType": "application/xml"
},
{
"$kind": "Microsoft.SetProperty",
"$designer": {
"id": "ipNhfY"
},
"property": "dialog.timezone",
"value": "=xPath(dialog.api_response.content,'/geoPlugin/geoplugin_timezone/text()')"
},
{
"$kind": "Microsoft.SendActivity",
"$designer": {
"id": "DxohEx"
},
"activity": "${SendActivity_DxohEx()}"
}
]
You can (if needed/you wish) use the json and jPath built in functions to convert xml to json and then query with. Something like:
${json(user.testXml)} and then
${jPath(user.testJson , "automobiles")}

How to convert json to collection in power apps

I have a power app that using the flow from power automate.
My flow is doing an HTTP get and respond a JSON to power apps like below.
Here is the JSON as text:
{"value": "[{\"dataAreaId\":\"mv\",\"AccountNum\":\"100000\",\"Name\":\"*****L FOOD AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100001\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100014\",\"Name\":\"****(SEB)\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100021\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100029\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500100\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500210\",\"Name\":\"****\"}]"}
But when I try to convert this JSON to the collection, It doesn't behave like a list.
It just seems like a text. Here is how I try to bind the list.
How can I create a collection from JSON to bind to the gallery view?
I found the solution. I finally create a collection from the response of flow.
The flow's name is GetVendor.
The response of flow is like this :
{"value": "[{\"dataAreaId\":\"mv\",\"AccountNum\":\"100000\",\"Name\":\"*****L FOOD AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100001\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100014\",\"Name\":\"****(SEB)\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100021\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"100029\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500100\",\"Name\":\"**** AB\"},{\"dataAreaId\":\"mv\",\"AccountNum\":\"500210\",\"Name\":\"****\"}]"}
Below code creates a list from this response :
ClearCollect(_vendorData, MatchAll(GetVendors.Run(_token.value).value, "\{""dataAreaId"":""(?<dataAreaId>[^""]*)"",""AccountNum"":""(?<AccountNum>[^""]*)"",""Name"":""(?<Name>[^""]*)""\}"));
And I could bind the accountnum and name from _vendorDatra collection to the gallery view
In my case I had the same issue as you, but couldn't manage to get data into _vendorData collection, because MatchAll regex part was not working correctly, even if I had exactly the same scenario and I could not make it work.
My solution was to modify the flow itself, where I returned Response instead of Respond to a Power app or Flow, so basically I could return full request from Http.
This caused me some issues also, because when I generated schema from sample I could not register the flow to the powerapp with the error Failed during http send request.
The solution was to manually review the response schema and change all column types to one of the following three, because other are not supported: string, integer or boolean. Object and array can be set only on top level items, but never on children, so if you have anything else than my mentioned three, replace it to string. And no property can be left with undefined type.
Basically I like this solution even more, because in powerapps itself you do not need to do any conversion or anything - simply use the data as is, because it is already recognized as collection in case of array and you have all the properties already named for you.
Response step schema example is below.
{
"type": "object",
"properties": {
"PropertyOne": {
"type": "string"
},
"PropertyTwo": {
"type": "integer"
},
"PropertyThree": {
"type": "boolean"
},
"PropertyFour": {
"type": "array",
"items": {
"type": "object",
"properties": {
"PropertyArray1": {
"type": "string"
},
"PropertyArray1": {
"type": "integer"
},
"PropertyArray1": {
"type": "boolean"
}
}
}
It is easy now.
Power Apps introduced ParseJSON function which helps converting string to collection easily.
Table(ParseJSON(JSONString));
In gallery, map columns like - ThisItem.Value.ColumnName

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster setup with EFK stack for logging, as described here. I have never worked with one of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect this /\"plz\":\"[0-9]{5}\"/ to deliver me the result, but I get 0 hits (time interval is set correctly). Even if this regexp matches, I would only find the log entry where this is contained and not just the specifc value. How do I go on here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{POSINT:plz}"
]
}
}
]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
"plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
"index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
"index_patterns": ["project.foo*"],
"settings": {
"index.default_pipeline": "parse-plz"
}
}
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{POSINT:plz}"
]
}
},
{
"script": {
"lang": "painless",
"source": "ctx.location = params[ctx.plz];",
"params": {
"12345": {"lat": 42.36, "lon": 7.33}
}
}
}
]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
"plz": "12345",
"location": {
"lat": 42.36,
"lon": 7.33
}
}
So to sum it all up, it all boils down to only two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1

Resources