How to save nested JSON data into Elasticsearch from Apache NiFi?

When I try to insert a geo_shape type entry into Elasticsearch from Apache NiFi, it fails. The shape is just a nested JSON field; for example, in Apache NiFi my FlowFile has this nested content for geo_shape:
{
"location": "{\"type\":\"polygon\",\"coordinates\":[[[3.042514,41.79673582],[3.04182089,41.79738937],[3.04299467,41.79763732],[3.042514,41.79673582]]]}"
}
In Elasticsearch the field is specified as follows:
"location": {
"type": "geo_shape"
}
When I execute PutElasticsearch1.3, I get the following error:
MapperParsingException failed to parse [location]: nested - shape must
be an object consisting of type and coordinates
How can I parse this nested JSON string so that it is saved correctly to Elasticsearch from Apache NiFi?
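For reference, the error means Elasticsearch expects the geo_shape value to be an actual JSON object, not an escaped string. A document body like the following sketch (same coordinates, but with location emitted as an object) would satisfy the geo_shape parser, so in NiFi the FlowFile content needs to be transformed so that the string is parsed into an object before PutElasticsearch runs:
{
  "location": {
    "type": "polygon",
    "coordinates": [
      [
        [3.042514, 41.79673582],
        [3.04182089, 41.79738937],
        [3.04299467, 41.79763732],
        [3.042514, 41.79673582]
      ]
    ]
  }
}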

Related

JSON Schema (or IntelliJ plugin) for JSON object required by Elasticsearch Create Index API

Is there an available JSON schema for the request object required by the Elasticsearch Create Index API?
I didn't find any schema in the JSON Schema Store.
I want to validate the JSON object in IntelliJ IDEA and get assisted editing.
Alternatively, is there an IntelliJ plugin with built-in support for editing these files? I did not find such support in any of the existing Elasticsearch* plugins.
You can use the create index API to add a new index to an Elasticsearch cluster. When creating an index, you can specify the following:
Settings for the index
Mappings for fields in the index
Index aliases
The JSON object used to create an Elasticsearch index looks like this:
{
  "settings": {
    ...
  },
  "mappings": {
    ...
  },
  "aliases": {
    ...
  }
}
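As a concrete illustration (the index name, field names, and alias below are made up; this assumes a recent Elasticsearch where mappings are typeless), a full create-index request might look like:
curl -XPUT 'http://localhost:9200/my-index' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "created": { "type": "date" }
    }
  },
  "aliases": {
    "my-index-alias": {}
  }
}'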

Timestamp mapping in Spark to Elasticsearch

I am writing logs to Elasticsearch using Spark. The logs are in JSON format and have a timestamp field, for example:
{ "timestamp": "2016-11-02 21:16:06.116" }
When I write the JSON logs to the Elastic index, the timestamp is analyzed as a string instead of a date. I tried setting the property in SparkConf using sparkConf.set("es.mapping.timestamp", "timestamp"), but it throws the following error at runtime: org.apache.spark.util.TaskCompletionListenerException: failed to parse timestamp [2016-11-03 15:46:55.1155]
You can change the timestamp format:
2016-11-02 21:16:06.116 -> 2016-11-02T21:16:06.116
Inserting 2016-11-02T21:16:06.116 into Elastic works for me.
The type's properties mapping for the date field looks like:
"create_time": {
  "format": "strict_date_optional_time||epoch_millis",
  "type": "date"
}

By Default couchbaseCheckPoint type created for document in elasticsearch sync with couchbase

I am new to Elasticsearch and Couchbase and I am using both in my project. My requirement is to sync a Couchbase bucket with Elasticsearch indices by using Couchbase XDCR.
In Couchbase, the bucket name is "Employee" and the structure of one of its documents is:
{
  "empName": "Stev Jobs",
  "dept": "IT",
  "company": "xxxx",
  "salary": "30000",
  "country": "USA"
}
I created an index in Elasticsearch named employee and also created a cluster reference in Couchbase pointing to the Elasticsearch cluster.
After setting all this up, I started replication between the employee bucket in Couchbase and the employee index in Elasticsearch. It created documents in Elasticsearch, but the index contains more documents than the Couchbase bucket.
My Couchbase bucket employee has 182 records, but the Elasticsearch employee index shows 1025 docs.
And in Couchbase the sync shows some errors like the ones below:
2015-05-22 09:07:44 [Vb Rep] Error replicating vbucket 98. Please see logs for details.
2015-05-22 09:07:44 [Vb Rep] Error replicating vbucket 697. Please see logs for details.
In Elasticsearch, the documents in my employee index look like this:
{
  "_index": "employee",
  "_type": "couchbaseCheckpoint",
  "_id": "vbucket921UUID",
  "_score": 1,
  "_source": {
    "doc": {
      "uuid": "ec88aeb16c00427698f079d8a3fa7097"
    }
  }
}
And I wrote a search query like the one below, which I run in http://127.0.0.1:9200/_plugin/head/ against:
http://127.0.0.1:9200/employee/_search/
{
  "query": {
    "match": {
      "query": "ec88aeb16c00427698f079d8a3fa7097",
      "fields": ["uuid"]
    }
  }
}
It gives this error:
{
"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[J2CjiG2vQqqrG2h5jlsudg][couchrecords][0]: SearchParseException[[couchrecords][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][1]: SearchParseException[[couchrecords][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][2]: SearchParseException[[couchrecords][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][3]: SearchParseException[[couchrecords][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }{[J2CjiG2vQqqrG2h5jlsudg][couchrecords][4]: SearchParseException[[couchrecords][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"query":{"match":{"query":"ec88aeb16c00427698f079d8a3fa7097","fields":["uuid"]}}}]]]; nested: QueryParsingException[[couchrecords] [match] query parsed in simplified form, with direct field name, but included more options than just the field name, possibly use its 'options' form, with 'query' element?]; }]",
"status": 400
}
The couchbaseCheckpoint document is used by the plugin to save state for each vBucket, so that it can emulate the XDCR protocol correctly. That's why there are 1025 of them - 1024 vBuckets plus one global state doc.
The fact that you have 1025 docs in Elasticsearch means that you ONLY got the state docs and none of the actual docs got replicated. Did you set up mappings in Elasticsearch like it says in the installation guide? This looks like a problem with indexing, so the Elasticsearch log will have meaningful errors that tell you why it can't index any of your documents; the Couchbase log only tells you that it couldn't replicate something.
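As a side note on the failing search: with the match query, the field name is the key and the search text is the value. Assuming the uuid you want lives under doc.uuid (as in the checkpoint document shown above), the corrected query would be:
{
  "query": {
    "match": {
      "doc.uuid": "ec88aeb16c00427698f079d8a3fa7097"
    }
  }
}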

How to reject "invalid" documents in ElasticSearch

We're currently using a Couchbase Plugin (transport-couchbase) to transport and index the data into ElasticSearch (http://docs.couchbase.com/couchbase-elastic-search/)
I've taken a look at ElasticSearch's mapping documentation here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
My understanding is that if you rely on defaults for ElasticSearch, once a document gets indexed, ElasticSearch will create a dynamic mapping for that document type. This is what we've defaulted to.
We ran into issues where after adding a specific document type, and when the transport plugin inserts an "invalid" document (the document's field type is now different -- from string -> array), ElasticSearch throws an exception and essentially breaks the replication from Couchbase to ElasticSearch. The exception looks like this:
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: unknown property
[xyz]
java.lang.RuntimeException: indexing error MapperParsingException[failed to parse
[doc.myfield]]; nested: ElasticsearchIllegalArgumentException[unknown property[xyz]]
Is there a way we can configure ElasticSearch so that "invalid" documents simply get filtered without throwing exception and breaking the replication?
Thanks.
One relevant knob is the dynamic setting on the type's mapping: "strict" makes Elasticsearch reject documents that contain fields not in the mapping, while false makes it silently ignore such fields instead of failing. For example:
{
  "tweet": {
    "dynamic": "strict",
    "properties": {
      "message": { "type": "string", "store": true }
    }
  }
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
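If the goal is to keep replication running rather than fail loudly, a sketch (index and type names below are placeholders) of switching the type to silently ignore unmapped fields via the put-mapping API:
curl -XPUT 'http://localhost:9200/myindex/_mapping/tweet' -d '{
  "tweet": {
    "dynamic": false,
    "properties": {
      "message": { "type": "string", "store": true }
    }
  }
}'
Note that this only covers fields that are absent from the mapping; a value whose type conflicts with an already-mapped field will still fail to index.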

Logstash/ElasticSearch: guesses wrong for datatype for field

The log files I'm trying to import into Logstash contain a field that sometimes looks like a date/time and sometimes does not. Unfortunately, the first occurrence looked like a date/time and someone (logstash or elasticsearch) decided to define the field as a date/time. When trying to import a later log record, Elasticsearch has an exception:
Failed to execute [index ...]
org.elasticsearch.index.mapper.MapperParsingException: Failed to parse [#fields.field99]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:320)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:587)
...
Caused by: java.lang.IllegalArgumentException: Invalid format: "empty"
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:747)
...
Question: How do I tell logstash/elasticsearch to not define this field as a date/time? I would like all the fields from my log (except the one explicit timestamp field) to be defined as just text.
Question: it appears that logstash gives up trying to import records from the log file after seeing this one that elasticsearch throws an exception on. How can I tell logstash to ignore this exception and keep trying to import the other records from the log file?
I found the answer to my first question myself.
Before adding data through Logstash, I had to set the defaults for Elasticsearch to treat the field as "string" instead of "date".
I did this by creating a defaults.js file like this:
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [{
        "fields_template": {
          "mapping": { "type": "string" },
          "path_match": "#fields.*"
        }
      }]
    }
  }
}
and telling Elasticsearch to use it before adding any data through Logstash:
curl -XPUT 'http://localhost:9200/_template/template_logstash/' -d @defaults_for_elasticsearch.js
Hope this helps someone else.
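As a quick sanity check (the template name matches the curl command above), you can confirm the template was registered before sending data through Logstash:
curl -XGET 'http://localhost:9200/_template/template_logstash?pretty'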
