Multi-Cluster Insertion with Spring Data Elasticsearch - spring-boot

I am trying to insert records into multiple clusters (not nodes). I am using spring-data-elasticsearch (version 2.0.4). I have tried creating two ElasticsearchOperations instances in my configuration with different bean names, and I insert through those objects with the index(IndexQuery indexQuery) method. But when I insert, I am not able to keep the mapping of the fields (non-analyzed field types). Can someone please help me with how to keep the mapping when inserting an entity into Elasticsearch?

Finally I was able to achieve it. I created two instances of ElasticsearchTemplate based on separate client instances, instead of ElasticsearchOperations, and was able to insert into the two different clusters. A configuration sketch follows.
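For reference, here is a minimal sketch of such a configuration, assuming the Elasticsearch 2.x TransportClient that spring-data-elasticsearch 2.0.4 is built against; the cluster names, hosts, and bean names are illustrative:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;

@Configuration
public class MultiClusterConfig {

    @Bean(name = "clusterOneTemplate")
    public ElasticsearchTemplate clusterOneTemplate() throws UnknownHostException {
        return new ElasticsearchTemplate(buildClient("cluster-one", "host-one"));
    }

    @Bean(name = "clusterTwoTemplate")
    public ElasticsearchTemplate clusterTwoTemplate() throws UnknownHostException {
        return new ElasticsearchTemplate(buildClient("cluster-two", "host-two"));
    }

    // One TransportClient per cluster; each template talks only to its own cluster.
    private TransportClient buildClient(String clusterName, String host)
            throws UnknownHostException {
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", clusterName)
                .build();
        return TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName(host), 9300));
    }
}
```

To keep the mapping (e.g. fields annotated with @Field(index = FieldIndex.not_analyzed)), push it explicitly before the first insert; MyEntity here is a placeholder for your own entity class:

```java
@Autowired
@Qualifier("clusterOneTemplate")
private ElasticsearchTemplate clusterOne;

public void insert(IndexQuery indexQuery) {
    clusterOne.createIndex(MyEntity.class);  // ensure the index exists
    clusterOne.putMapping(MyEntity.class);   // apply the @Field definitions
    clusterOne.index(indexQuery);
}
```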

Related

Fetching index data from Elasticsearch using Spring Data Elasticsearch

I have Java code that connects to Elasticsearch using Spring Data Elasticsearch and fetches all the index data by connecting to the repository and executing the findAll() method. The data received from ES is processed by a separate application. When new data is inserted into Elasticsearch, I have the following questions:
1. How can I fetch only the newly inserted data programmatically?
2. Apart from using DSL queries, is there a way to asynchronously get the new records as and when new data is inserted into Elasticsearch?
I don't want to execute the findAll() method again, because it returns the entire data set (including the previously processed records).
Any help on this is much appreciated.
You will need to add a field (I call it createdAt here) to your entities that contains the timestamp when your application inserts into Elasticsearch. One possibility would be to use the auditing support of Spring Data Elasticsearch to have the value set automatically, or you can set the value in your application. If the data is inserted by some other application, you need to make sure that it contains a timestamp in a format that matches the field type definition of this field in your application.
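A minimal sketch of such an entity, assuming a current Spring Data Elasticsearch (4.x) with auditing enabled via @EnableElasticsearchAuditing; the index and class names are illustrative:

```java
import java.time.Instant;

import org.springframework.data.annotation.CreatedDate;
import org.springframework.data.annotation.Id;
import org.springframework.data.domain.Persistable;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "articles")
public class Article implements Persistable<String> {

    @Id
    private String id;

    // Set automatically on insert when auditing is enabled
    @CreatedDate
    @Field(type = FieldType.Date, format = DateFormat.epoch_millis)
    private Instant createdAt;

    @Override
    public String getId() {
        return id;
    }

    // Auditing needs to know whether an entity is new
    @Override
    public boolean isNew() {
        return id == null || createdAt == null;
    }

    public Instant getCreatedAt() {
        return createdAt;
    }
    // remaining accessors omitted
}
```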
Then you'd need to define a method in your repository like
SearchHits<T> findByCreatedAtAfter(Timestamp referenceValue);
As for getting a notification in some form when new data is inserted: I'm not aware that Elasticsearch offers something like that. You will probably need to regularly call the method that retrieves the data.
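A sketch of that polling approach, assuming the Article entity and repository method shown above (here with Instant as the reference type instead of Timestamp) and @EnableScheduling on the application:

```java
import java.time.Instant;

import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

interface ArticleRepository extends ElasticsearchRepository<Article, String> {
    SearchHits<Article> findByCreatedAtAfter(Instant referenceValue);
}

@Component
public class NewDataPoller {

    private final ArticleRepository repository;
    private Instant lastSeen = Instant.EPOCH; // the first run fetches everything

    public NewDataPoller(ArticleRepository repository) {
        this.repository = repository;
    }

    @Scheduled(fixedDelay = 30_000) // poll every 30 seconds
    public void pollForNewDocuments() {
        SearchHits<Article> hits = repository.findByCreatedAtAfter(lastSeen);
        for (SearchHit<Article> hit : hits) {
            Article article = hit.getContent();
            // hand only the newly inserted document to the processing step here
            if (article.getCreatedAt().isAfter(lastSeen)) {
                lastSeen = article.getCreatedAt();
            }
        }
    }
}
```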

Migrating data from RDBMS to Elasticsearch using Apache NiFi

We are trying to migrate data from an RDBMS to Elasticsearch using Apache NiFi. We have created pipelines in NiFi and are able to transfer data, but we are facing some issues and wanted to check whether someone has already gotten past them.
Please provide input on the items below.
1. How can we avoid auto-generating _id in Elasticsearch? We want this to be set from a DB column. We tried providing the column name in the "Identifier Record Path" property of the PutElasticsearchHttpRecord processor but got an error that the attribute name is not valid. Can you please let us know the acceptable format?
2. How can we load nested objects into the index using NiFi? We are looking to maintain one-to-many relationships in the index using nested objects but were unable to find a configuration to do so. Are there any processors to do this in NiFi? Please let us know.
Thanks in advance!
It needs to be a RecordPath statement like /myidfield
You need to manually create the nested fields in Elasticsearch. This is not a NiFi thing but how Elasticsearch works; if you were to post a document with cURL, you would run into the same issue.
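To illustrate, a hedged sketch of creating such a nested mapping up front with the Elasticsearch Java high-level REST client (index and field names are hypothetical); NiFi then writes into the pre-mapped index:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

public class NestedMappingSetup {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Declare the one-to-many side as "nested" before any documents
            // arrive; dynamic mapping would otherwise create plain objects.
            String mapping = "{"
                    + "\"properties\": {"
                    + "  \"customerName\": { \"type\": \"text\" },"
                    + "  \"orders\": {"
                    + "    \"type\": \"nested\","
                    + "    \"properties\": {"
                    + "      \"orderId\": { \"type\": \"long\" },"
                    + "      \"amount\":  { \"type\": \"double\" }"
                    + "    }"
                    + "  }"
                    + "}}";
            client.indices().create(
                    new CreateIndexRequest("customers").mapping(mapping, XContentType.JSON),
                    RequestOptions.DEFAULT);
        }
    }
}
```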

Update indices in Elasticsearch on adding new documents to my database

I'm new to Elasticsearch but have had to work with it. I have successfully set it up using Logstash to connect it to my Oracle database (one particular table). Now if new records are added to one of the tables in my Oracle database (which I built the index on), what should be done?
I have thought of two solutions:
1. Re-build the indices by running the Logstash conf file.
2. On insert into the table, also POST to Elasticsearch.
The first solution is not working as it should. I mean that if 'users' is the table that I have updated with new records, then on re-building the indices (for the 'users' table) in Elasticsearch, the new records should also be reflected in the Logstash get query.
The first solution would help as a POC.
So, Any help is appreciated.
Thank you Val for pointing me in the right direction.
However, the first brute-force solution turned out to be about changing the document type in the Logstash conf file:
{"document_type":"same_type"}
This must be consistent with the previously used type. The first time, I had run it with a different type (Same_type); after adding the new records, I used same_type, so Elasticsearch threw an exception for the conflicting mapping (multiple mapping rejection).
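For reference, the relevant output section of such a pipeline file might look like this (hosts and index are illustrative); the document_type value must stay identical across runs:

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]
    index         => "users"
    document_type => "same_type"   # must match the type used on the first run
  }
}
```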
For further clarification, I looked it up here.
Thank you guys.

Elasticsearch best practices

1) We are fairly new to Elasticsearch. In our Spring Boot application, we use Spring Data Elasticsearch with an in-memory node client. Inserts/updates/deletes happen on our primary relational database (DB2), and we use Elasticsearch solely for handling searches. We have a synchronizing mechanism to keep Elasticsearch up to date with the latest changes.
2) In production, we have 4 instances of our application running. To synchronize the in-memory elastic store across all 4 servers, we have a JMS topic in place where all the DB2 updates are posted. The application has a topic listener that consumes any DB changes posted to this JMS topic and updates the in-memory elastic store.
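For concreteness, such a listener might look roughly like the sketch below, assuming Spring JMS with a configured message converter and a recent Spring Data Elasticsearch; the destination name and the CustomerDocument type are illustrative:

```java
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class Db2ChangeListener {

    private final ElasticsearchOperations operations;

    public Db2ChangeListener(ElasticsearchOperations operations) {
        this.operations = operations;
    }

    @JmsListener(destination = "db2.changes")
    public void onDb2Change(CustomerDocument changed) {
        // mirror the DB2 change into this instance's local Elasticsearch store
        operations.save(changed);
    }
}
```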
Question:
i) Is the above an ideal way to implement Elasticsearch in your application? If not, what else would you recommend?
ii) Any Elasticsearch best practices that you can point us to?
Thanks Much!
1- In prod, choose 3 master and 4 data nodes. Always use an odd number of servers in total.
2- Define your mappings and index in advance; don't rely on the auto-create option.
You should define the data types explicitly.
Define amounts as scaled_float with a scaling factor of 100.
Numeric fields should be defined as long so that you can use range ('between') queries, sorting, and aggregations on them.
Choose carefully between the keyword and text field types; use text only where it is necessary.
3- Define an external version if you update the same record again and again, to avoid updating with stale data; see the sketch below.
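A rough sketch of points 2 and 3 with the Java high-level REST client; the index and field names are illustrative, and the external version stands in for a row version from DB2:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.VersionType;

public class ExplicitMappingExample {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Point 2: create the index with an explicit mapping
            // instead of relying on auto-create.
            String mapping = "{"
                    + "\"properties\": {"
                    + "  \"orderId\": { \"type\": \"long\" },"
                    + "  \"amount\":  { \"type\": \"scaled_float\", \"scaling_factor\": 100 },"
                    + "  \"status\":  { \"type\": \"keyword\" },"
                    + "  \"comment\": { \"type\": \"text\" }"
                    + "}}";
            client.indices().create(
                    new CreateIndexRequest("orders").mapping(mapping, XContentType.JSON),
                    RequestOptions.DEFAULT);

            // Point 3: carry an external version so a stale update
            // can never overwrite a newer document.
            IndexRequest write = new IndexRequest("orders")
                    .id("42")
                    .source("orderId", 42, "amount", 19.99, "status", "OPEN")
                    .versionType(VersionType.EXTERNAL)
                    .version(7L);
            client.index(write, RequestOptions.DEFAULT);
        }
    }
}
```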

Is it possible for an Elasticsearch index to have a primary key comprised of multiple fields?

I have a multi-tenant system, whereby each tenant gets their own Mongo database within a MongoDB deployment.
However, for Elasticsearch indexing, this all goes into one Elasticsearch instance via Mongoosastic, tagged with a TenantDB to keep the data separated when searching.
Currently we have some of the same _ids reused across the multiple databases in test data for various config collections (different document content, same _id). However, this is causing a problem when syncing to Elasticsearch: although the documents are in separate databases, when they come into Elasticsearch with the same type and ID, one of them gets dropped.
Is it possible to specify both the ID and TenantDB as the primary key?
Solution 1: You can search across multiple indices in Elasticsearch, but if you cannot separate your index per database, you can do the following: while syncing your data to Elasticsearch, use a pattern to build the Elasticsearch document _id. For example, from mongoDb1 use mdb1_{mongo_id}, from mongoDb2 use mdb2_{mongo_id}, etc. This keeps your _ids unique, provided you don't reuse an id within the same Mongo database.
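A minimal sketch of Solution 1; the index/field names and the IndexRequest usage are illustrative:

```java
import org.elasticsearch.action.index.IndexRequest;

public class TenantIdMapper {

    // Prefix the Mongo _id with the tenant database name so identical
    // Mongo _ids from different tenant databases never collide in Elasticsearch.
    IndexRequest toIndexRequest(String tenantDb, String mongoId, String payloadJson) {
        String elasticId = tenantDb + "_" + mongoId; // e.g. "mdb1_507f1f77bcf86cd799439011"
        return new IndexRequest("configs")
                .id(elasticId)
                .source("tenantDb", tenantDb, "payload", payloadJson);
    }
}
```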
Solution 2: Separate your indices.
