Apache NiFi: Put data in Elasticsearch with _parent

I'm working with Apache NiFi, trying to put data into Elasticsearch using the PutElasticsearch processor. It worked pretty well until I tried to add a child/parent relation in Elasticsearch, and therefore a parent in my put request.
Here is my Elasticsearch mapping:
"mappings": {
"myparent": {
},
"mychild": {
"_parent": {
"type": "myparent"
},
"properties": {
"attr1": {
"type": "string"
},
"attr2": {
"type": "date",
"format": "dateOptionalTime"
},
"attr3": {
"type": "string"
}
}
}
}
Here is how I manually insert data into the "mychild" type:
POST /myindex/mychild/1?parent=[IDParent]
{
  "attr1": "02020",
  "attr2": "2016-10-10",
  "attr3": "toto"
}
I didn't find how to specify the parent ID.
Is there any way of doing it with PutElasticsearch, other than using the InvokeHTTP processor?
Thank you.

This is not possible today (NiFi 1.1.0 and below) with the PutElasticsearch processors, so InvokeHTTP is your best option for now. I have written NIFI-3284 to cover this improvement.
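In the meantime, a rough sketch of the InvokeHTTP workaround (the id and parentId flowfile attributes are assumptions; adjust them to your flow):

HTTP Method: POST
Remote URL:  http://localhost:9200/myindex/mychild/${id}?parent=${parentId}

With the JSON document as the FlowFile content, this issues the same request as the manual POST shown in the question, including the parent query parameter.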

Related

How to create a multi-type index in Elasticsearch?

Several pages in the Elasticsearch documentation mention how to query a multi-type index, but I failed to create one in the first place.
Here is my minimal example (on an Elasticsearch 6.x server):
PUT /myindex
{
  "settings": {
    "number_of_shards": 1
  }
}
PUT /myindex/people/123
{
  "first name": "John",
  "last name": "Doe"
}
PUT /myindex/dog/456
{
  "name": "Rex"
}
Index creation and the first insert went well, but the insert attempt for the dog type failed:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Rejecting mapping update to [myindex] as the final mapping would have more than 1 type: [people, dog]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Rejecting mapping update to [myindex] as the final mapping would have more than 1 type: [people, dog]"
  },
  "status": 400
}
But this is exactly what I'm trying to do, buddy! Having "more than 1 type" in my index.
Do you know what I have to change in my calls to achieve this?
Many thanks.
Multiple mapping types are not supported from Elasticsearch 6.0.0 onwards. See breaking changes for details.
You can still effectively use multiple types by implementing your own custom type field.
For example:
{
  "mappings": {
    "doc": {
      "properties": {
        "type": {
          "type": "keyword"
        },
        "first_name": {
          "type": "text"
        },
        "last_name": {
          "type": "text"
        }
      }
    }
  }
}
This is described in removal of types.
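For example, indexing and searching would then look something like this (a sketch against the mapping above; the doc type name and the field values are illustrative):

PUT /myindex/doc/123
{
  "type": "people",
  "first_name": "John",
  "last_name": "Doe"
}
GET /myindex/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "type": "people"
        }
      }
    }
  }
}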

Elasticsearch Mapping - Rename existing field

Is there any way I can rename a field in an existing Elasticsearch mapping without having to add a new field?
If so, what's the best way to do it in order to avoid breaking the existing mapping?
E.g. from fieldCamelcase to fieldCamelCase:
{
  "myType": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "date_optional_time"
      },
      "fieldCamelcase": {
        "type": "string",
        "index": "not_analyzed"
      },
      "field_test": {
        "type": "double"
      }
    }
  }
}
You could do this by creating an ingest pipeline that contains a rename processor, in combination with the Reindex API.
PUT _ingest/pipeline/my_rename_pipeline
{
  "description": "describe pipeline",
  "processors": [
    {
      "rename": {
        "field": "fieldCamelcase",
        "target_field": "fieldCamelCase"
      }
    }
  ]
}
POST _reindex
{
  "source": {
    "index": "source"
  },
  "dest": {
    "index": "dest",
    "pipeline": "my_rename_pipeline"
  }
}
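Before running the reindex, you can dry-run the pipeline with the simulate API to check that the rename behaves as expected (a quick sketch with a sample document):

POST _ingest/pipeline/my_rename_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "fieldCamelcase": "some value"
      }
    }
  ]
}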
Note that you need to be running Elasticsearch 5.x in order to use ingest. If you're running < 5.x then you'll have to go with what @Val mentioned in his comment :)
In ES versions above 5 (where the missing query has been removed), you can update the field name using the _update_by_query API. Example:
POST http://localhost:9200/INDEX_NAME/_update_by_query
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "NEW_FIELD_NAME"
        }
      }
    }
  },
  "script": {
    "inline": "ctx._source.NEW_FIELD_NAME = ctx._source.OLD_FIELD_NAME; ctx._source.remove(\"OLD_FIELD_NAME\");"
  }
}
First of all, you must understand how Elasticsearch and Lucene store data: in immutable segments (you can easily read about this on the Internet).
So any solution will remove/create documents and change the mapping, or create a new index and therefore a new mapping as well.
The easiest way is to use the update by query API: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-update-by-query.html
POST /XXXX/_update_by_query
{
  "query": {
    "missing": {
      "field": "fieldCamelCase"
    }
  },
  "script": {
    "inline": "ctx._source.fieldCamelCase = ctx._source.fieldCamelcase; ctx._source.remove(\"fieldCamelcase\");"
  }
}
Starting with ES 6.4 you can use "Field Aliases", which give you the functionality you're looking for with close to zero work or resources.
Do note that aliases can only be used for searching - not for indexing new documents.
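For reference, an alias could be declared in the mapping along these lines (a minimal sketch, assuming a single _doc mapping type on ES 6.4+):

PUT myindex/_mapping/_doc
{
  "properties": {
    "fieldCamelCase": {
      "type": "alias",
      "path": "fieldCamelcase"
    }
  }
}

Searches and aggregations on fieldCamelCase then transparently hit the existing fieldCamelcase field, while indexing keeps using the original field name.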

Parsing exception when creating index and mapping at once

I am receiving an exception when trying to create an index along with a mapping. I am issuing a PUT to my local Elasticsearch instance (v. 5.1.1) at http://127.0.0.1:9200/indexname with the following body:
{
  "settings": {
    "index": {
      "number_of_replicas": "1",
      "number_of_shards": "1"
    }
  },
  "mappings": {
    "examplemapping": {
      "properties": {
        "titel": {
          "type": "text",
          "index": false
        },
        "body": {
          "type": "text"
        },
        "room": {
          "type": "text",
          "index": false
        },
        "link": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}
I receive the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "No handler for type [text] declared on field [body]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [examplemapping]: No handler for type [text] declared on field [body]",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "No handler for type [text] declared on field [body]"
    }
  },
  "status": 400
}
From the documentation on index creation, it should be possible to create an index and one or more mappings at the same time:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
I have read https://www.elastic.co/guide/en/elasticsearch/reference/current/string.html and believe that I am correctly using the new datatype, but the exception suggests otherwise.
Any help is greatly appreciated.
Resolution
Thanks to a comment by Val I was pointed in the right direction. Indeed I was not using version 5.1.1, but version 2.4.3.
So why the confusion? Well, I have been running both versions (not at once), and started and stopped them using the respective bat scripts:
call es-2.4.3/bin/service.bat start
call es-5.1.1/bin/elasticsearch-service.bat start
It seems that even though I have been running the latter, it was still ES 2.4.3 that was started. This is probably caused by the logic inside the bat script.
Going forward I will keep in mind to check the version response from the service itself, and I'm going to have to find a proper setup to run multiple versions of Elasticsearch.
Thanks for the answers.
I tried your settings on Elasticsearch 5.0.0 and it worked fine. Output of GET indexname:
{
  "indexname": {
    "aliases": {},
    "mappings": {
      "examplemapping": {
        "properties": {
          "body": {
            "type": "text"
          },
          "link": {
            "type": "text",
            "index": false
          },
          "room": {
            "type": "text",
            "index": false
          },
          "titel": {
            "type": "text",
            "index": false
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1488892255496",
        "number_of_shards": "1",
        "number_of_replicas": "1",
        "uuid": "GugRGgllQbCadCTj5oq4ow",
        "version": {
          "created": "5000099"
        },
        "provided_name": "testo"
      }
    }
  }
}
Also, please note that I would definitely recommend setting a different value than "number_of_shards": "1". As a rule of thumb, consider that Elasticsearch allocates 1 thread per shard, so the bigger a shard becomes, the slower text search will be. Bear in mind as well that some overhead comes with allocating more shards, so don't "overallocate". See this post and this one for more details.
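For example, the index could be created with a few more primary shards (the value 3 here is purely illustrative; the right number depends on your data volume and hardware):

PUT /indexname
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1"
    }
  }
}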

Elasticsearch indexing homogenous objects under dynamic keys

The kind of document we want to index and query contains variable keys that are grouped under a common root key, as follows:
{
  "articles": {
    "0000000000000000000000000000000000000001": {
      "crawled_at": "2016-05-18T19:26:47Z",
      "language": "en",
      "tags": [
        "a",
        "b",
        "d"
      ]
    },
    "0000000000000000000000000000000000000002": {
      "crawled_at": "2016-05-18T19:26:47Z",
      "language": "en",
      "tags": [
        "b",
        "c",
        "d"
      ]
    }
  },
  "articles_count": 2
}
We want to be able to ask: what documents contain articles with tags "b" and "d", with language "en"?
The reason why we don't use a list for articles is that Elasticsearch can efficiently and automatically merge documents with partial updates. The challenge, however, is to index the objects inside under the variable keys. One possible way we tried is to use dynamic_templates, as follows:
{
  "sources": {
    "dynamic": "strict",
    "dynamic_templates": [
      {
        "article_template": {
          "mapping": {
            "fields": {
              "crawled_at": {
                "format": "dateOptionalTime",
                "type": "date"
              },
              "language": {
                "index": "not_analyzed",
                "type": "string"
              },
              "tags": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          },
          "path_match": "articles.*"
        }
      }
    ],
    "properties": {
      "articles": {
        "dynamic": false,
        "type": "object"
      },
      "articles_count": {
        "type": "integer"
      }
    }
  }
}
However, this dynamic template fails: when documents are inserted, the following can be found in the logs:
[2016-05-30 17:44:45,424][WARN ][index.codec] [node] [main] no index mapper found for field: [articles.0000000000000000000000000000000000000001.language] returning default postings format
The same goes for the two other fields. When I try to query for the existence of a certain article, or even of articles, it doesn't return any document (no error, but empty hits):
curl -LsS -XGET 'localhost:9200/main/sources/_search' -d '{"query":{"exists":{"field":"articles"}}}'
When I query for the existence of articles_count, it returns everything. Is there a minor error in what we are trying to achieve, for example in the schema: the definition of articles as a property, or in the dynamic template? What about the types and dynamic false? The path seems correct. Maybe it is not possible to define templates for objects under variable keys, but it should be according to the documentation.
Otherwise, what alternatives are possible without changing the document, if possible?
Notes: we have other types in the same index main that also have these fields, like language; I don't know whether that could have an influence. The version of ES we are using is 1.7.5 (we cannot upgrade to 2.x for now).

Replacing (Bulk Update) Nested documents in Elasticsearch

I have an Elasticsearch index with vacation rentals (100K+), each including a property with nested documents for availability dates (1000+ per 'parent' document). Periodically (several times daily), I need to replace the entire set of nested documents for each property, to have fresh availability data per vacation rental property; however, Elasticsearch's default behavior is to merge nested documents.
Here is a snippet of the mapping (availability dates in the "bookingInfo"):
{
  "vacation-rental-properties": {
    "mappings": {
      "property": {
        "dynamic": "false",
        "properties": {
          "bookingInfo": {
            "type": "nested",
            "properties": {
              "avail": {
                "type": "integer"
              },
              "datum": {
                "type": "date",
                "format": "dateOptionalTime"
              },
              "in": {
                "type": "boolean"
              },
              "min": {
                "type": "integer"
              },
              "out": {
                "type": "boolean"
              },
              "u": {
                "type": "integer"
              }
            }
          }
          // this part left out
        }
      }
    }
  }
}
Unfortunately, our current underlying business logic does not allow us to replace or update parts of the "bookingInfo" nested documents; we need to replace the entire array of nested documents. With the default behavior, updating the 'parent' doc merely adds new nested docs to the "bookingInfo" (unless they already exist, in which case they're updated), leaving the index with a lot of old dates that should no longer be there (if they're in the past, they're not bookable anyway).
How do I go about making the update call to ES?
Currently I am using a bulk call such as (two lines for each doc):
{ "update" : {"_id" : "abcd1234", "_type" : "property", "_index" : "vacation-rental-properties"} }
{ "doc" : {"bookingInfo" : ["all of the documents here"]} }
I have found this question that seems related, and wonder if the following will work (after first enabling scripts via script.inline: on in the config file for version 1.6+):
curl -XPOST localhost:9200/the-index-and-property-here/_update -d '{
  "script": "ctx._source.bookingInfo = updated_bookingInfo",
  "params": {
    "updated_bookingInfo": {"field": "bookingInfo"}
  }
}'
How do I translate that to a bulk call for the above?
Using Elasticsearch 1.7, this is the way I solved it. I hope it can be of help to someone as a future reference.
{ "update": { "_id": "abcd1234", "_retry_on_conflict" : 3} }\n
{ "script" : { "inline": "ctx._source.bookingInfo = param1", "lang" : "js", "params" : {"param1" : ["All of the nested docs here"]}}\n
...and so on for each entry in the bulk update call.
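For completeness, a full bulk call could then look something like this (a sketch; the ids and the nested document values are purely illustrative, and note that the bulk body must end with a newline):

curl -XPOST 'localhost:9200/vacation-rental-properties/property/_bulk' -d '
{ "update": { "_id": "abcd1234", "_retry_on_conflict": 3 } }
{ "script": { "inline": "ctx._source.bookingInfo = param1", "lang": "js", "params": { "param1": [ { "datum": "2016-10-10", "avail": 1, "in": true, "out": false, "min": 1, "u": 0 } ] } } }
{ "update": { "_id": "efgh5678", "_retry_on_conflict": 3 } }
{ "script": { "inline": "ctx._source.bookingInfo = param1", "lang": "js", "params": { "param1": [ { "datum": "2016-10-11", "avail": 0, "in": false, "out": true, "min": 2, "u": 1 } ] } } }
'

Each document gets one action line and one script line, and the same param1 array replaces bookingInfo wholesale for that document.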
