Apache NiFi: PutElasticsearchHttp is not working, with blank error - elasticsearch

I currently have Elasticsearch version 6.2.2 and Apache NiFi version 1.5.0 running on the same machine. I'm trying to follow the NiFi example at https://community.hortonworks.com/articles/52856/stream-data-into-hive-like-a-king-using-nifi.html, except instead of storing to Hive, I want to store to Elasticsearch.
Initially I tried using the PutElasticsearch5 processor, but I was getting the following error from Elasticsearch:
Received message from unsupported version: [5.0.0] minimal compatible version is: [5.6.0]
When I Googled this error message, the consensus seemed to be to use the PutElasticsearchHttp processor instead. My NiFi flow looks like this: [flow screenshot]
And the configuration for the PutElasticsearchHttp processor: [configuration screenshot]
When the flowfile gets to the PutElasticsearchHttp processor, the following error shows up:
PutElasticSearchHttp failed to insert StandardFlowFileRecord into Elasticsearch due to , transferring to failure.
It seems like the reason is blank/null. There also wasn't anything in the Elasticsearch log.
After the ConvertAvroToJSON, the data is a JSON array with all of the entries on a single line. Here's a sample value:
{"City": "Athens",
"Edition": 1896,
"Sport": "Aquatics",
"sub_sport": "Swimming",
"Athlete": "HAJOS, Alfred",
"country": "HUN",
"Gender": "Men",
"Event": "100m freestyle",
"Event_gender": "M",
"Medal": "Gold"}
Any ideas on how to debug/solve this problem? Do I need to create anything in Elasticsearch first? Is my configuration correct?

I was able to figure it out. After the ConvertAvroToJSON, the flow file was a single line that contained a JSON array of JSON objects. Since I wanted to store the individual records, I needed a SplitJson processor. Now my NiFi flow looks like this: [updated flow screenshot]
The configuration of the SplitJson processor looks like this: [SplitJson configuration screenshot]
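For reference, a minimal SplitJson configuration for a top-level JSON array would be the following; this JsonPath expression is an assumption based on the sample data, not taken from the original screenshot:
JsonPath Expression: $.*
Each array element then becomes its own flow file, which PutElasticsearchHttp can index as a single document.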

The index name cannot contain the / character. Try a valid index name, e.g. sports.
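A quick way to verify the name from the command line, assuming Elasticsearch on its default port (sports and the sample document are placeholders):
curl -X PUT 'http://localhost:9200/sports'
curl -X POST 'http://localhost:9200/sports/_doc' -H 'Content-Type: application/json' -d '{"City": "Athens"}'
If both calls succeed, the same index name should work in the processor's Index property.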

I had a similar flow, wherein changing the Type to _doc did the trick after including the SplitJson processor.
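For reference, the relevant PutElasticsearchHttp properties would then look roughly like this (the values are assumptions matching this thread, not taken from anyone's actual configuration):
Elasticsearch URL: http://localhost:9200
Index: sports
Type: _doc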

Related

How to cast a field in Elasticsearch pipelines / Painless script

I have an application which logs level as an integer. I am using Filebeat to send the logs to ES. I have set level as a string in the ES index, which works for most of the applications. But when Filebeat receives an integer, the indexing of course fails with:
"type":"illegal_argument_exception","reason":"field [level] of type [java.lang.Integer] cannot be cast to [java.lang.String]"
In my document: "level":30
I added a Script step in my ingestion pipeline, but I can't manage to make it work: either I get a compilation error, or the script somehow fails and nothing at all gets indexed.
Some very basic script I tried:
if (doc['level'].value == 30) {
doc['level'].value = 'info';
}
Any idea on how to handle this in ES pipelines?
Regards
The best way is to transform the data before sending it to ES.
You can use a processor in Filebeat to transform your data:
https://www.elastic.co/guide/en/beats/filebeat/current/defining-processors.html
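For example, a minimal filebeat.yml snippet using the convert processor (a sketch; the convert processor is only available in newer Beats versions, and the field name here simply matches the document in the question):
processors:
  - convert:
      fields:
        - {from: "level", type: "string"}
      ignore_missing: true
      fail_on_error: false
Alternatively, if you do want to handle it in the ingest pipeline after all, note that ingest scripts operate on ctx rather than doc, so the check would look something like if (ctx.level == 30) { ctx.level = 'info'; } (again a sketch, not a full pipeline definition).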

Filebeat - what is the configuration json.message_key for?

I am working on a Filebeat project for indexing logs in JSON format.
I see in the configuration that there is the option json.message_key: message
I don't really understand what this is for; if I remove it, I see no change.
Can someone explain it to me?
Logs are in this format:
{"appName" : "blala", "version" : "1.0.0", "level":"INFO", "message": "log message"}
message is the default key for the raw content line.
So if you remove it from the config, Filebeat will still use message, and apply grok on it.
If you change it to "not-a-message", you should see a difference. But you should not do that, as every automation depends on it.
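For context, a minimal input configuration for logs in that shape (the paths and the keys_under_root choice are assumptions, not taken from the original question):
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.json
    json.keys_under_root: true
    json.add_error_key: true
    json.message_key: message
json.message_key tells Filebeat which JSON field holds the raw log line (the key that line filtering and multiline settings are applied to), while keys_under_root places the other fields (appName, version, level) at the top level of the event.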

logstash 5 ruby filter

I've recently upgraded an ELK-stack cluster from an old version to 5.1, and although everything looks great, I have an exception occurring frequently in the Logstash log, which looks like this:
logstash.filters.ruby Ruby exception occurred: Direct event field references (i.e. event['field']) have been disabled in favor of using event get and set methods (e.g. event.get('field')). Please consult the Logstash 5.0 breaking changes documentation for more details.
The filter I have looks like this:
filter {
ruby {
init => "require 'time'"
code => "event.cancel if event['#timestamp'] < Time.now-(4*86400)"
}
}
Any suggestions ?
The exception contains the answer:
Direct event field references (i.e. event['field']) have been disabled in favor of using event get and set methods (e.g. event.get('field')).
From that, it seems like event.get('@timestamp') is now preferred over event['@timestamp'].
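Putting that together, the filter becomes something like the sketch below; note that in Logstash 5 event.get('@timestamp') returns a LogStash::Timestamp, so .time is used here to get a Ruby Time for the comparison:
filter {
  ruby {
    init => "require 'time'"
    code => "event.cancel if event.get('@timestamp').time < Time.now - (4 * 86400)"
  }
}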

Elasticsearch and Spark

I am trying to set up Spark and Elasticsearch using the elasticsearch-spark library with the sbt artifact: "org.elasticsearch" %% "elasticsearch-spark" % "2.3.2". When I try to configure Elasticsearch with this code:
val sparkConf = new SparkConf().setAppName("test").setMaster("local[2]")
.set("es.index.auto.create", "true")
.set("es.resource", "test")
.set("es.nodes", "test.com:9200")
I keep getting the error illegal character for all of the set statements above. Anyone know the issue?
You must have copied the code from a website or blog post. It contains unreadable characters that are actually giving you trouble.
Simple solution: delete all the content, retype it manually line by line, and run it. Let me know if you face any problems again; I will help you out.
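One quick way to confirm this (an assumption about the cause, not something from the original thread) is to scan the source file for non-ASCII bytes, e.g. with GNU grep; Test.scala is a hypothetical file name:
grep -nP '[^\x00-\x7F]' Test.scala
Smart quotes pasted from a web page are a common source of the illegal character compile error.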
You might want to set http.publish_host in your elasticsearch.yml to HOST_NAME. The es-hadoop connector sniffs the nodes via the _nodes/transport API, so it checks what the published HTTP address is.
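In elasticsearch.yml that is a single line, with HOST_NAME being a name the Spark machines can resolve:
http.publish_host: HOST_NAME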

debugging elasticsearch

I'm using Tire and Elasticsearch. The service has started on port 9200. However, it was returning two errors:
"org.elasticsearch.search.SearchParseException: [countries][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"query_string":{"query":"name:"}}}]]"
and
"Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'name:': Encountered "<EOF>" at line 1, column 5."
So I reinstalled Elasticsearch and the service container. The service starts fine.
Now, when I search using Tire, I get no results where results should appear, and I don't receive any error messages.
Does anybody have any idea how I might find out what is wrong, let alone fix it?
First of all, you don't need to reindex anything in the usual cases. It depends on how you installed and configured Elasticsearch, but when you install and upgrade e.g. with Homebrew, the data is persisted safely.
Second, no need to reinstall anything. The error you're seeing means just what it says on the tin: a SearchParseException, i.e. your query is invalid:
{"query":{"query_string":{"query":"name:"}}}
Notice that you didn't pass any query string for the name qualifier. You have to pass something, e.g.:
{"query":{"query_string":{"query":"name:foo"}}}
or, in Ruby terms:
Tire.index('test') { query { string "name:hey" } }
See this update to the Railscasts episode on Tire for an example how to catch errors due to incorrect Lucene queries.
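If the query term comes from user input, it may also help to guard against blank values before searching. A sketch, where name is a hypothetical variable holding the user's input:
Tire.search('countries') { query { string "name:#{name}" } } unless name.to_s.strip.empty?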
