NiFi: QueryElasticSearchHttp: Matching Objects with Expression Language? - elasticsearch

I have a NiFi flow where I bring in some data through an API, which I easily route into ElasticSearch.
I have a second flow where I need to use the processor QueryElasticSearchHttp. Within that processor, I have a JSON query:
{
"query" : {
"bool" : {
"must" : [
{ "match" : { "Example1" : ${Example1} } },
{ "match" : { "ExampleCode" : ${ExampleCode} } }
]
}
}
}
I am trying to match on the objects Example1 and ExampleCode and return the whole column. I've tried injecting Expression Language into the query, but it does not work, and I cannot find an example of how to match on an entire object.
I have also tried wrapping the placeholders in quotation marks, like so: "${}".
I get the error:
ERROR [Timer-Driven Process Thread-7] ...from Elasticsearch due to
Elasticsearch returned code 400 with message Bad Request, transferring
flow file to failure:
org.apache.nifi.processors.elasticsearch.UnretryableException:
Elasticsearch returned code 400 with message Bad Request
The attributes are routed correctly and appear where they need to appear.
What is the correct way to format this? Thanks
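A plausible reason the unquoted version fails is that Expression Language substitution is plain text replacement: a string attribute value lands in the query unquoted, which is not valid JSON. The Python sketch below imitates that with string.Template (whose ${name} placeholders happen to look like EL) and contrasts it with building the query as a data structure, so every value is quoted and escaped. The attribute values are hypothetical. Inside NiFi, the rough equivalent is to quote the placeholder and escape it with the escapeJson() EL function, e.g. "${Example1:escapeJson()}". Separately, as far as I know, QueryElasticSearchHttp's Query property expects a Lucene-style query string (such as Example1:foo AND ExampleCode:bar) rather than a JSON request body, which would explain why even the quoted version returns 400.

```python
import json
from string import Template

# Hypothetical flowfile attribute values (not taken from the question).
attributes = {"Example1": "some value", "ExampleCode": "ABC-123"}

# The question's query; string.Template fills ${...} by plain text substitution, much like EL does.
QUERY_TEMPLATE = """{
  "query": {"bool": {"must": [
    {"match": {"Example1": ${Example1}}},
    {"match": {"ExampleCode": ${ExampleCode}}}
  ]}}
}"""

# The same query with the placeholders wrapped in quotes.
QUOTED_TEMPLATE = QUERY_TEMPLATE.replace("${Example1}", '"${Example1}"').replace(
    "${ExampleCode}", '"${ExampleCode}"'
)


def naive_render(template, attrs):
    """Substitute attribute values as raw text, without quoting or escaping."""
    return Template(template).substitute(attrs)


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def build_query(attrs):
    """Build the bool/must query as data, so json.dumps quotes and escapes every value."""
    must = [{"match": {field: attrs[field]}} for field in ("Example1", "ExampleCode")]
    return json.dumps({"query": {"bool": {"must": must}}})


print(is_valid_json(naive_render(QUERY_TEMPLATE, attributes)))   # False: values are unquoted
print(is_valid_json(naive_render(QUOTED_TEMPLATE, attributes)))  # True for this particular value
print(build_query(attributes))
```

Quoting alone is not enough in general: a value containing a double quote or backslash still breaks the quoted template, which is what escaping (escapeJson() in NiFi, json.dumps here) guards against.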

You can try the JsonQueryElasticsearch processor instead, which takes a query written in the Elasticsearch JSON query DSL. A demo flow is shown below:
[Screenshots: overall flow; Start step; Query ES step; the JSON query; Store Result step; the controller service]
Output of Run:
$ cd /var/so_out
$ cat 17973351988502 | python -m json.tool
[
{
"_id": "o002",
"_index": "office",
"_score": 0.26742277,
"_source": {
"description": "Shows and events",
"name": "Tom",
"title": "Marketing Manager"
},
"_type": "doc"
}
]
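To work with the stored result programmatically rather than just pretty-printing it with json.tool, a small Python helper can pull out each hit's _source. This assumes, as in the output above, that the stored file holds a JSON array of hits; the function name is hypothetical.

```python
import json


def sources_from_result(text):
    """Return the _source document of each hit in a stored result (a JSON array of hits)."""
    hits = json.loads(text)
    return [hit["_source"] for hit in hits]


# Usage against the file written by the flow above:
# with open("/var/so_out/17973351988502", encoding="utf-8") as f:
#     for source in sources_from_result(f.read()):
#         print(source["name"], "-", source["title"])
```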

Related

Calculate field data size and store to other field at indexing time ElasticSearch 7.17

I am looking for a way to store the size of a field (bytes) in a new field of a document.
I.e. when a document is created with a field message that contains the value hello, I want another field message_size_bytes written that in this example has the value 5.
I am aware of the possibilities using _update_by_query and _search using scripting fields, but I have so much data that I do not want to calculate the sizes while querying but at index time.
Is there a way to do this with Elasticsearch 7.17 alone? I do not have access to the data before it is passed to Elasticsearch.
You can use an ingest pipeline with a Script processor.
Create the pipeline with the following command:
PUT _ingest/pipeline/calculate_bytes
{
"processors": [
{
"script": {
"description": "Calculate bytes of message field",
"lang": "painless",
"source": """
if (ctx['message'] != null) {
ctx['message_size_bytes'] = ctx['message'].length();
}
"""
}
}
]
}
After creating the pipeline, you can reference it by name when indexing data, as below (the same works from Logstash, Java, or any other client). If you cannot change how clients send documents, you can instead make it the index's default pipeline through the index.default_pipeline index setting.
POST 74906877/_doc/1?pipeline=calculate_bytes
{
"message":"hello"
}
Result:
"hits": [
{
"_index": "74906877",
"_id": "1",
"_score": 1,
"_source": {
"message": "hello",
"message_size_bytes": 5
}
}
]
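One caveat: Painless String.length() counts UTF-16 code units (characters, for most text), not bytes. For ASCII text such as "hello" the two agree, but they diverge for accented letters, CJK text, or emoji, so the field above is really a character count. The Python sketch below (the sample strings are illustrative) compares what length() returns with the UTF-8 size in bytes:

```python
def size_metrics(text):
    """Compare the UTF-16 code-unit count (what Java/Painless String.length() returns)
    with the UTF-8 encoded size in bytes."""
    return {
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


for sample in ["hello", "héllo", "日本", "😀"]:
    print(repr(sample), size_metrics(sample))
```

If the true byte size matters, the script needs an encoding-aware computation; check the Painless API reference for 7.17 to see which String methods are allowlisted for that.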

Trigger an action for each hit of Elasticsearch query in Kibana Monitor

Is it possible to trigger an action for each hit of a given query in a Kibana Monitor? I would like to use a foreach loop to do this as demonstrated here. However, it's unclear how to implement this on the Kibana Monitor page. On the page there is an input field for Trigger Conditions but I'm unsure how to format the foreach within it or if this is supported.
Consider using Elasticsearch Watcher (requires at least a Gold license): https://www.elastic.co/guide/en/elasticsearch/reference/current/how-watcher-works.html
Watcher runs on a fixed interval and performs a query against the indices you configure. You define a condition (e.g. the number of hits is greater than 5), and when it evaluates to true, an action is performed. Elasticsearch allows you to use multiple actions. For example, you can use a webhook and receive the data from the last watch run (Watcher can also transform the data before the action runs). If you don't have a Gold license, you can mimic Watcher's behavior with a script or program that uses the Elasticsearch Search API.
Here is a simple example of a watch that checks an index named test every minute and sends a webhook with the entire search payload if at least one document was updated in the last minute.
{
"trigger" : {
"schedule" : { "interval" : "1m" }
},
"input" : {
"search" : {
"request" : {
"indices" : [ "test" ],
"body" : {
"query" : {
"bool": {
"must": {
"range": {
"updatedAt": {
"gte": "now-1m"
}
}
}
}
}
}
}
}
},
"condition" : {
"compare" : { "ctx.payload.hits.total" : { "gt" : 0 }}
},
"actions" : {
"sample_webhook" : {
"webhook" : {
"method" : "POST",
"url": "http://b4022015b928.ngrok.io/UtilsService/api/elasticHandler/watcher",
"body" : "{{#toJson}}ctx.payload{{/toJson}}",
"auth": {
"basic": {
"user": "user",
"password": "pass"
}
}
}
}
}
}
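The no-Gold-license option mentioned above (mimicking Watcher with the Search API) can be sketched in Python using only the standard library. This is a minimal illustration, not a replacement for Watcher: the Elasticsearch URL and webhook URL are hypothetical, the index name and updatedAt field mirror the example watch, no authentication is sent, and it makes one webhook POST per hit, which is the per-hit behavior the question asks about.

```python
import json
import time
import urllib.request

# Assumptions (hypothetical): adjust to your cluster and receiver.
ES_URL = "http://localhost:9200"
INDEX = "test"
WEBHOOK_URL = "http://localhost:8080/elastic-hook"
INTERVAL_SECONDS = 60


def post_json(url, payload):
    """POST a JSON body and return the raw response text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def build_search_body():
    """Same query as the example watch: documents updated in the last minute."""
    return {"query": {"bool": {"must": {"range": {"updatedAt": {"gte": "now-1m"}}}}}}


def run_once(post=post_json):
    """Run the search once and send one webhook call per hit; returns the hit count."""
    result = json.loads(post(f"{ES_URL}/{INDEX}/_search", build_search_body()))
    hits = result.get("hits", {}).get("hits", [])
    for hit in hits:
        post(WEBHOOK_URL, hit)
    return len(hits)


def poll_forever():
    """Crude scheduler standing in for the watch's 1m interval trigger."""
    while True:
        run_once()
        time.sleep(INTERVAL_SECONDS)
```

Call poll_forever(), or run run_once() from cron. Because fixed one-minute windows drift with processing time, a sturdier version would remember the last updatedAt it processed so hits are neither missed nor sent twice.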
An alternative is Kibana Alerts and Actions:
https://www.elastic.co/guide/en/kibana/current/alerting-getting-started.html
This feature is slightly different from Watcher, but it also lets you perform actions based on a query against Elasticsearch. It is part of Kibana only, as opposed to Watcher, which is part of Elasticsearch (though Watcher is accessible from Kibana Stack Management).

Is there a way to update a document with a Painless script without changing the order of unaffected fields?

I'm using Elasticsearch's Update by Query API to update some documents with a Painless script like this (the actual query is more complicated):
POST ts-scenarios/_update_by_query?routing=test
{
"query": {
"term": { "routing": { "value": "test" } }
},
"script": {
"source": """ctx._source.tagIDs = ["5T8QLHIBB_kDC9Ugho68"]"""
}
}
This works, except that upon reindexing, other fields get reordered, including some classes which are automatically (de)serialized using JSON.NET's type handling. That means a document with the following source before the update:
{
"routing" : "testsuite",
"activities" : [
{
"$type" : "Test.Models.SomeActivity, Test"
},
{
"$type" : "Test.Models.AnotherActivity, Test",
"CustomParameter" : 1,
"CustomSetting" : false
}
]
}
ends up as
{
"routing" : "testsuite",
"activities" : [
{
"$type" : "Test.Models.SomeActivity, Test"
},
{
"CustomParameter" : 1,
"CustomSetting" : false,
"$type" : "Test.Models.AnotherActivity, Test"
}
],
"tagIDs" : [
"5T8QLHIBB_kDC9Ugho68"
]
}
which JSON.NET can't deserialize. Is there a way I can tell the script (or the Update by Query API) not to change the order of those other fields?
In case it matters, I'm using Elasticsearch OSS version 7.6.1 on macOS. I haven't checked whether an Ingest pipeline would work here, as I'm not familiar with them.
(It turns out I can make the deserialization more flexible by setting the MetadataPropertyHandling property to ReadAhead, as mentioned here. That works, but as mentioned it may hurt performance and there might be other situations where field order matters. Technically, it shouldn't; JSON isn't XML, but there are always edge cases where it does matter.)

Elastic Search | How to get original search query with corresponding match value

I'm using Elasticsearch as the search engine for a human resources database.
The user submits a competence (e.g. 'disruption'), and Elasticsearch returns all users ordered by best match.
I have configured the field 'competences' to use synonyms, so 'innovation' would match 'disruption'.
I want to show the user (who is performing the search) how a particular search result matched the search query. For this I use the Explain API.
The query works as expected and returns an _explanation to each hit.
Details (simplified a bit) for a particular hit could look like the following:
{
description: "weight(Synonym(skills:innovation skills:disruption))",
value: 3.0988
}
Problem: I cannot see the original search term in the _explanation. (In the example above, I can see that the query matched 'innovation' or 'disruption', but I need to know which skill the user actually searched for.)
Question: Is there any way to solve this (for example, by passing a custom 'description' containing the original search term into the _explanation)?
Expected Result:
{
description: "weight(Synonym(skills:innovation skills:disruption))",
value: 3.0988,
customDescription: 'innovation'
}
Maybe you can put the original query in the _name field, as explained in https://qbox.io/blog/elasticsearch-named-queries:
GET /_search
{
"query": {
"query_string" : {
"default_field" : "skills",
"query" : "disruption",
"_name": "disruption"
}
}
}
You can then find the original query in the matched_queries section of each hit in the response:
{
"_index": "testindex",
"_type": "employee",
"_id": "2",
"_score": 0.19178301,
"_source": {
"skills": "disruption"
},
"matched_queries": [
"disruption"
]
}
Combine this with explain, and I think it should work.
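When a user searches for several skills at once, the same idea extends to one named clause per term. The Python sketch below builds such a query; it deliberately uses match clauses (which also accept _name) instead of query_string, so raw user input isn't parsed as query syntax. The function names and the second search term are hypothetical. Each hit's matched_queries should then list exactly the terms the user typed that matched, including matches that came through a synonym.

```python
import json


def build_named_skill_query(terms, field="skills"):
    """One match clause per searched term, each tagged with _name set to the raw term."""
    should = [{"match": {field: {"query": term, "_name": term}}} for term in terms]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


def searched_terms(hit):
    """The original terms that matched this hit, read from matched_queries."""
    return hit.get("matched_queries", [])


print(json.dumps(build_named_skill_query(["disruption", "leadership"]), indent=2))
```

Sending this body with explain enabled gives both pieces per hit: _explanation for the scoring detail and matched_queries for what the user actually typed.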

Problems accessing _source fields with a dot in the name when creating Slack action for Elasticsearch Watcher

I am trying to create a Slack action with a dynamic attachment. My _source looks like this:
{
"user.url": "https://api.github.com/users/...",
"user.gists_url": "https://api.github.com/users/.../gists{/gist_id}",
"user.repos_url": "https://api.github.com/users/.../repos",
"date": "2018-04-27T14:34:10Z",
"user.followers_url": "https://api.github.com/users/.../followers",
"user.following_url": "https://api.github.com/users/.../following{/other_user}",
"user.id": 123456,
"user.avatar_url": "https://avatars0.githubusercontent.com/u/123456?v=4",
"user.events_url": "https://api.github.com/users/.../events{/privacy}",
"user.site_admin": false,
"user.html_url": "https://github.com/...",
"user.starred_url": "https://api.github.com/users/.../starred{/owner}{/repo}",
"user.received_events_url": "https://api.github.com/users/.../received_events",
"metric": "stars",
"user.login": "...",
"user.type": "User",
"user.subscriptions_url": "https://api.github.com/users/.../subscriptions",
"user.organizations_url": "https://api.github.com/users/.../orgs",
"user.gravatar_id": ""
}
And here is my Slack action:
"actions": {
"notify-slack": {
"throttle_period_in_millis": 240000,
"slack": {
"account": "monitoring",
"message": {
"from": "Elasticsearch Watcher",
"to": [
"#watcher"
],
"text": "We have {{ctx.payload.new.hits.total}} new stars! And {{ctx.payload.old.hits.total}} in total.",
"dynamic_attachments" : {
"list_path" : "ctx.payload.new.hits.hits",
"attachment_template" : {
"title" : "{{_source.[\"user.login\"]}}",
"text" : "Users Count: {{count}}",
"color" : "{{color}}"
}
}
}
}
}
I can't seem to figure out how to access my _source fields since they have dots in them. I have tried:
"{{_source.[\"user.login\"]}}"
"{{_source.user.login}}"
"{{_source.[user.login]}}"
"{{_source.['user.login']}}"
The answer to my question is that you can't access _source keys with dots in them directly using Mustache; you must transform your data first.
Update:
I was able to get this working by using a transform to build a new object. Mustache might not be able to access fields with dots in their names, but Painless can! I added this transform to my Slack action:
"transform" : {
"script" : {
"source" : "['items': ctx.payload.new.hits.hits.collect(user -> ['userName': user._source['user.login']])]",
"lang" : "painless"
}
}
Now, in the Slack action's dynamic attachments, I can access the items array:
"dynamic_attachments" : {
"list_path" : "ctx.payload.items",
"attachment_template" : {
"title" : "{{userName}}",
"text" : "{{_source}}"
}
}
Old Answer:
Watcher uses Mustache for templating, and Mustache treats each dot in a name as a step into a nested object, so it can't access fields whose names themselves contain dots.
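Both halves of this answer can be illustrated in Python. mustache_lookup imitates how Mustache resolves a dotted name (each dot is a step into a nested object), which is why _source.user.login finds nothing when the key is literally "user.login"; to_items mirrors the Painless transform above. The login value is a hypothetical sample. Note that each transformed item only carries userName, so a template such as {{_source}} inside the attachment will render empty unless the transform also copies that data into the item.

```python
def mustache_lookup(context, dotted_name):
    """Resolve a dotted name the way Mustache does: one nested-dict step per dot."""
    value = context
    for part in dotted_name.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # Mustache renders a missing value as an empty string
        value = value[part]
    return value


def to_items(payload):
    """Python mirror of the Painless transform: one {'userName': ...} item per hit."""
    return {
        "items": [
            {"userName": hit["_source"]["user.login"]}
            for hit in payload["new"]["hits"]["hits"]
        ]
    }


hit = {"_source": {"user.login": "octocat", "user.id": 123456}}  # hypothetical sample hit
print(mustache_lookup(hit, "_source.user.login"))  # None: looks for _source -> user -> login
print(to_items({"new": {"hits": {"hits": [hit]}}}))
```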
