NiFi LookupRecord for multiple fields

I have a requirement to filter records from a source file when the value of one of a few attributes (e.g., Emp_number, Manager_id, associate_num) matches the list of values in a lookup file (e.g., the column named ID in a CSV lookup file). For this I have used the NiFi LookupRecord processor (version 1.5). I added a dynamic property named "key" to the processor and gave it the RecordPath value /Emp_number, and that works fine. But if I add additional dynamic properties such as "key1": /Manager_id and "key2": /associate_num, it does not filter the records whose Manager_id and associate_num values match the lookup file.
As per the definition in the NiFi documentation, my understanding is that multiple-field lookups are supported:
"The "coordinates" to use for looking up a value in the Lookup Service are defined by adding a user-defined property. Each property that is added will have an entry added to a Map, where the name of the property becomes the Map Key and the value returned by the RecordPath becomes the value for that key."
Could anyone please help me understand what I am missing here?

The LookupRecord processor takes all of the user-defined properties, creates a coordinates Map from them, and then calls a LookupService.
The LookupService has a method like:
lookup(final Map<String, Object> coordinates)
Each implementation of LookupService may work differently and may or may not accept multiple coordinates.
There is also a method on the service, Set<String> getRequiredKeys(), where the service declares the keys that are expected to be present in the coordinates Map.
For example, CSVRecordLookupService returns a set containing the single string "key", so that service only supports a single coordinate.
I haven't gone through all the services, but many of them only support a single coordinate called "key".
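To make that concrete, here is a paraphrased sketch of the contract described above (the interface mirrors the two methods quoted earlier; the single-key example at the end illustrates how a "key"-only service behaves and is not code copied from NiFi):

import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Paraphrased sketch of the org.apache.nifi.lookup.LookupService contract.
interface LookupService<T> {
    // LookupRecord builds the coordinates Map from its user-defined properties
    // and passes it here.
    Optional<T> lookup(Map<String, Object> coordinates) throws LookupFailureException;

    // The service declares which keys it expects to find in the coordinates Map.
    Set<String> getRequiredKeys();
}

// Stub so the sketch is self-contained; NiFi provides the real exception class.
class LookupFailureException extends Exception { }

// A single-coordinate service such as CSVRecordLookupService effectively does
// this, so only the property named "key" participates in the lookup and extra
// properties like "key1" or "key2" cannot add further match conditions:
class SingleKeyService {
    public Set<String> getRequiredKeys() {
        return Collections.singleton("key");
    }
}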

Related

How to handle saving logs with very diverse structures to Elasticsearch?

My log POCO has several fixed properties, like user id and timestamp, plus a flexible data bag property, which is a JSON representation of any kind of extra information I'd like to add to the log. This means the property names inside this data bag could be anything, which brings up two questions:
How can I configure the mapping so that the data bag property, which is of type string, is mapped to a JSON object during indexing instead of being treated as a plain string?
With the data bag object having arbitrary property names, meaning the overall document type could have a huge number of properties, would this hurt search performance?
To translate the data from a string to JSON you can use an ingest pipeline with the JSON processor:
https://www.elastic.co/guide/en/elasticsearch/reference/master/json-processor.html
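A minimal sketch of such a pipeline, assuming the string property is called "databag" (the field and pipeline names here are placeholders, not something from the question):

PUT _ingest/pipeline/parse-databag
{
  "description": "Parse the string 'databag' field into a structured JSON object",
  "processors": [
    {
      "json": {
        "field": "databag",
        "target_field": "databag_parsed"
      }
    }
  ]
}

Documents indexed through this pipeline get a databag_parsed object field whose subfields are mapped and searchable individually.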
It depends on your queries. If you use free-text search across all fields, then yes, a huge number of fields will slow the query. If you query specific fields, like "field":"value", then the number of fields is not a problem for searches. You can find additional information about query optimization here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/tune-for-search-speed.html#search-as-few-fields-as-possible
And the question is: what do you mean by "huge number"? 1,000? 10,000? 100,000? As part of the optimization I recommend using dynamic templates with a definition that maps each string field into the index as "keyword" only, rather than the default text plus keyword multi-field. This setting cuts the number of fields in half.
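A minimal sketch of such a dynamic template (the index name and the template name are placeholders):

PUT logs
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}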

Tag data with lookup table values

I’m trying to tag my data according to a lookup table.
The lookup table has these fields:
• Key - represents the field name in the data I want to tag.
In the real data the field is a subfield of the "Headers" field.
An example for the "Key" field:
"*Server*" (* is a wildcard).
• Value - represents the wanted value of the field mentioned above.
The value in the lookup table is only part of the string in the real data value.
An example for the “Value” field:
“Avtech”.
• Vendor - the value I want to add to the real data if a combination of field and value is found in a document.
An example of this combination in the real data:
"Headers.Server : Linux/2.x UPnP/1.0 Avtech/1.0"
A match of that document with the lookup table will be:
Key = Server (with wildcards on both sides)
Value = Avtech (with wildcards on both sides)
Vendor = Avtech
So basically I'll need to add a field to that document with the value "Avtech".
The subfields in "Headers" are dynamic fields that change from document to document.
If a match is not found, I'll need to set the tag field to the value "Unknown".
I've tried to use the enrich processor, using the lookup table as the source data, with "Value" as the match field and "Vendor" as the enrich field.
In the enrich processor I didn't know how to reference the field, since it's dynamic, and I wanted to search for the value anywhere in the "Headers" subfields.
Also, I don't think there will be a match between the "Value" in the lookup table and the value of the Headers subfield, since the "Value" field in the lookup table is a substring with wildcards on both sides.
I could use some help accomplishing what I'm trying to do, and in particular with how to search with wildcards inside an enrich processor.
Or, if you have another idea besides the enrich processor, such as a parent-child or terms-lookup mechanism, that would work too.
Thanks!
Adi.
There are two ways to accomplish this:
Using a combination of Logstash & Elasticsearch
Using only the Elasticsearch ingest node
Constraint: you need to know the position at which the vendor term occurs in the Header field.
Approach 1
If you do, you can use the GROK filter to extract the term and then, based on the term found, do a lookup to get the corresponding value.
Reference
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_static.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_streaming.html
Approach 2
Create an index consisting of KV pairs. In the ingest node, create a pipeline which consists of a Grok processor followed by the Enrich processor. The Grok would work the same way as in Approach 1, and you seem to have already got the Enrich part working. A rough sketch is given after the reference below.
Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
If you are able to isolate the subfield within the Header where the term of interest is present, then it would make things easier for you.
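That rough sketch, as one ingest pipeline, might look like the following. It assumes the vendor token is the last product name in Headers.Server and that an enrich policy named "vendor-policy" keyed on that token already exists; the pipeline name, policy name, and the vendor_token field are all invented for illustration:

PUT _ingest/pipeline/tag-vendor
{
  "processors": [
    {
      "grok": {
        "field": "Headers.Server",
        "patterns": ["%{GREEDYDATA} %{WORD:vendor_token}/%{NOTSPACE}"],
        "ignore_failure": true
      }
    },
    {
      "enrich": {
        "policy_name": "vendor-policy",
        "field": "vendor_token",
        "target_field": "Vendor",
        "ignore_missing": true
      }
    },
    {
      "set": {
        "if": "ctx.Vendor == null",
        "field": "Vendor",
        "value": "Unknown"
      }
    }
  ]
}

For a header like "Linux/2.x UPnP/1.0 Avtech/1.0" the grok pattern captures "Avtech" into vendor_token, which turns the substring problem into an exact match the enrich processor can handle. Note that enrich copies the whole matched lookup document under target_field, so you may want a rename or script processor afterwards to keep only the Vendor value.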

Get all nested arrays from ElasticSearch document and sort it

I have a mapping similar to this: JsonExample
This represents a changelog structure, so each document represents an object with certain properties. These properties can change at certain dates. To limit space, the document keeps track of the changes instead of storing a whole copy of the object for, say, a single updated field.
The goal is to query a certain object based on a given date. The result should be the object with the properties on that given date. So later changes should be discarded and only the changes matching or most recent to the given date should be returned.
So with a nested query I can retrieve the whole object. Now I only want those nested properties to be returned, sorted closest to the given date, so I can easily find the properties at that date.
Is there any way to do this with only Elasticsearch queries/filters, without parsing the returned JSON and sorting it afterwards with, for example, Java?
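For reference, the kind of query being described would look roughly like the sketch below, assuming the nested field is called "properties" with a "date" subfield (both names invented for illustration). inner_hits accepts its own sort and size, which is what returns only the matching nested entries ordered closest to the given date:

GET changelog/_search
{
  "query": {
    "nested": {
      "path": "properties",
      "query": {
        "range": { "properties.date": { "lte": "2021-06-01" } }
      },
      "inner_hits": {
        "size": 100,
        "sort": [ { "properties.date": { "order": "desc" } } ]
      }
    }
  },
  "_source": false
}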

Map JSON string property to ES object

I have a process that imports some of the data from external sources to elasticsearch. I use C# and NEST client.
Some of the classes have string properties that contain JSON. The same property may contain a different JSON schema depending on the source.
I want to index and analyze json objects in these properties.
I tried object type mapping using [ElasticProperty(Type=FieldType.Object)] but it doesn't seem to help.
What is the right way to index and analyze these strings?
E.g. I import objects like the one below and then want to query all START events of customer 9876 that have status rejected. I then want to see how they distribute over a period of time (using Kibana).
var e = new Event() { id = 123, source = "test-log", input = "{type:'START',params:[{name:'customerid',value:'9876'},{name:'region',value:'EU'}]}", result = "{status:'rejected'}" };

Binding multiple select where option values may contain commas in Spring 3

We are having an issue with binding a multiple select element when the option values contain commas. We have tried binding to both a String and to a List<String>, but have issues with both.
When a multiple select element is posted, the value of each selected option is passed in a separate request parameter, all with the same name. For example, if the select element name is "code", the parameters might look like this:
code=ABC
code=A,B
code=XYZ
When binding to a String, Spring will automatically join these values into a comma-separated string. That is obviously an issue if one or more of the values contains a comma.
When binding to a List<String>, things work fine when multiple options are selected. In that case, Spring creates a List with an entry for each selected option. But if only one option is selected, Spring assumes the value is a comma-separated list and will split it into multiple entries.
Is there a way to tell Spring to use a different character than a comma when binding to a String? Is there a way to tell Spring not to split a single value when binding to a List<String>? Or is there another way to deal with this?
I believe this thread is related to your issue: How to prevent parameter binding from interpreting commas in Spring 3.0.5?. This Spring issue may also be helpful: https://jira.springsource.org/browse/SPR-7963
The solution provided at https://stackoverflow.com/a/5239841/1259928, which details how to create a conversion service that uses a different string separator and wire it into the Spring config, should do the trick.
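As a minimal sketch of that idea, assuming Spring 3.x (the class name here is invented): a Converter that wraps a single raw value instead of letting Spring's default String-to-Collection conversion split it on commas. Registered on a ConversionServiceFactoryBean passed to <mvc:annotation-driven conversion-service="..."/>, it takes precedence for single-value submissions, while multi-value submissions still bind element by element:

import java.util.Collections;
import java.util.List;
import org.springframework.core.convert.converter.Converter;

// Hypothetical converter: keep a single posted value intact instead of
// splitting it on commas when binding to List<String>.
public class NoSplitStringToListConverter implements Converter<String, List<String>> {
    @Override
    public List<String> convert(String source) {
        return Collections.singletonList(source);
    }
}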
