Update indices in Elasticsearch when adding new documents to my Oracle database

I'm new to Elasticsearch but have to work with it. I have successfully set it up using Logstash to connect it to my Oracle database (one particular table). Now, if new records are added to the table I built the index on, what should be done?
I have thought of two solutions:
1. Re-build the index by re-running the Logstash conf file.
2. On insert into the table, also POST the new record to Elasticsearch.
The first solution is not working as it should. I mean that if 'users' is the table that I have updated with new records, then after re-building the index for the 'users' table, the new records should also be reflected in the query results, but they are not.
The first solution would be enough as a POC.
So, any help is appreciated.

Thank you Val for pointing me in the right direction.
However, the fix for the first brute-force solution was the document type setting in the Logstash conf file:
document_type => "same_type"
This must be consistent with the previously used type. The first time, I had run it with a different type ("Same_type"); after adding new records, I used "same_type", so Elasticsearch threw an exception for the conflicting mapping types.
For further clarification, I looked it up here.
Thank you guys.
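For reference, re-running the whole conf re-imports everything; the Logstash jdbc input can instead pick up only the rows added since the last run, using a tracking column. A minimal sketch, assuming the users table has a numeric primary key column id; the connection settings and paths are placeholders, not the asker's actual values:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"   # placeholder
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    # only fetch rows added since the last run
    statement => "SELECT * FROM users WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    schedule => "*/5 * * * *"   # poll every 5 minutes
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "users"
    document_id => "%{id}"   # reuse the DB key so re-imports overwrite rather than duplicate
  }
}
```

Setting document_id from the database key also makes re-imports idempotent, which sidesteps the duplicate problem entirely.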

Related

How can I remove only the data from an Elasticsearch index, not the complete index

I have one ELK index that I am using to drive a visual dashboard.
My requirement is to empty or remove the data only, not the index itself. How can I achieve this? I have googled a lot, but I only find solutions that remove the index; I need to remove just the data so that the index remains.
I want to achieve this dynamically from the command prompt.
You can simply delete all the data in the index if there's not too much of it:
POST my-index/_delete_by_query?q=*&wait_for_completion=false
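Because wait_for_completion=false makes the call asynchronous, Elasticsearch responds immediately with a task id rather than a result; you can poll that task to see when the deletion finishes (the task id below is made up for illustration):

```
GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345
```

Once the task reports completed, the index still exists with its mappings and settings, but its documents are gone.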

Migrating data from RDBMS to Elasticsearch using Apache NiFi

We are trying to migrate data from an RDBMS to Elasticsearch using Apache NiFi. We have created pipelines in NiFi and are able to transfer data, but we are facing some issues and wanted to check whether someone has already got past them.
Please provide inputs on the items below.
1. How to avoid auto-generating _id in Elasticsearch? We want this to be set from a DB column. We tried providing the column name in the "Identifier Record Path" attribute of the PutElasticsearchHttpRecord processor but got an error that the attribute name is not valid. What is the acceptable format?
2. How to load nested objects into the index using NiFi? We want to maintain one-to-many relationships in the index using nested objects but were unable to find a configuration for this. Is there a NiFi processor that does this? Please let us know.
Thanks in Advance!
1. It needs to be a RecordPath statement like /myidfield.
2. You need to manually create the nested fields in Elasticsearch first. This is not a NiFi thing, but how Elasticsearch works; if you were to post a document with cURL, you would run into the same issue.
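To illustrate the second point: the nested structure has to exist in the index mapping before NiFi writes to it. A sketch of such a mapping, assuming Elasticsearch 7+ (no mapping types); the index and field names here are invented for the example:

```
PUT orders
{
  "mappings": {
    "properties": {
      "customer_id": { "type": "keyword" },
      "items": {
        "type": "nested",
        "properties": {
          "sku": { "type": "keyword" },
          "qty": { "type": "integer" }
        }
      }
    }
  }
}
```

Without the explicit "type": "nested", Elasticsearch maps arrays of objects as flattened object fields, and the one-to-many relationship between the inner fields is lost.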

How to make Logstash replace old data?

I have an Oracle DB. Logstash retrieves data from Oracle and puts it into Elasticsearch.
But when Logstash runs its scheduled export every 5 minutes, Elasticsearch fills up with duplicates, because the old data is still there. This is to be expected: Oracle's state barely changes during those 5 minutes; let's say 2-3 rows are added and 4-5 deleted.
How can we replace the old data with the new data, without duplicates?
For example:
1. Delete the whole old index;
2. Create a new index with the same name and the same configuration (nGram settings and mapping);
3. Add all the new data;
4. Wait 5 minutes and repeat.
It's pretty easy: create a new index for each import and apply the mappings, then switch your alias to the most recent index afterwards. Remove old indices if needed. Your current data will always be searchable while the most recent data is being indexed.
Here are the sources you'll probably need to read:
Use aliases (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html) to point to the most current data when searching in Elasticsearch (by the way, it's always a good idea to have aliases in place).
Use the rollover API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-rollover-index.html) to create a new index for each import run; note the alias handling here too.
Use index templates (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html) to automatically apply the mappings/settings to your newly created indices.
Shrink, close and/or delete old indices to keep your cluster handling only the data you really need. Have a look at Curator (https://github.com/elastic/curator) as a standalone tool.
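The alias switch can be done atomically in a single call, so searches never see a half-imported index. A sketch, assuming the per-import indices are named data-000001 and data-000002 and searches go through the alias data (all names are placeholders):

```
POST _aliases
{
  "actions": [
    { "remove": { "index": "data-000001", "alias": "data" } },
    { "add":    { "index": "data-000002", "alias": "data" } }
  ]
}
```

Both actions are applied together, so there is no moment when the alias points at no index or at both.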
Alternatively, you can use a fingerprint/hash of each document, or a hash of the unique fields in each document, as the document id, so that every import overwrites the same documents with updated ones in place, while new documents are still added as well.
But this approach will not cope with data deleted from Oracle.
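In Logstash, the fingerprint filter can generate such an id; a sketch assuming the unique fields are first_name and last_name (field, index, and host names are placeholders):

```
filter {
  fingerprint {
    # hash the fields that uniquely identify a row
    source => ["first_name", "last_name"]
    concatenate_sources => true
    method => "SHA256"
    target => "[@metadata][doc_id]"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my-index"
    document_id => "%{[@metadata][doc_id]}"   # same input fields => same id => overwrite
  }
}
```

Storing the hash under [@metadata] keeps it out of the indexed document while still making it available to the output.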

Documents in Elasticsearch getting deleted automatically?

I'm creating an index through Logstash and pushing data to it from a MySQL database. But what I noticed in Elasticsearch was that once the whole dataset is uploaded, it starts deleting some of the docs. The total number of docs is 160,729. Without the scheduler it works fine.
I added the cron scheduler in order to check whether new rows have been added to the table. Can that be the issue?
My Logstash conf looks like this.
Where am I going wrong? Or is this behavior normal?
Any help is appreciated.
The docs.deleted number doesn't mean that your documents are being deleted, but simply that existing documents are being "updated" and the older version of the updated document is marked as deleted in the process.
Those documents marked as deleted will be eventually cleaned up as Lucene merges segments in the background.
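You can watch this counter through the cat indices API; and if you really want to expunge the deleted docs early, a force merge will do it, though in normal operation it is best left to Lucene's background merges:

```
GET _cat/indices/my-index?v&h=index,docs.count,docs.deleted

POST my-index/_forcemerge?only_expunge_deletes=true
```

As long as docs.count stays at the expected total, the growing docs.deleted value is just the overhead of updates, not data loss.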

Reindexing Elasticsearch or updating indices?

I am new to Elasticsearch and can't figure out how to update an Elasticsearch index, type, or document without deleting and reindexing; or is reindexing the best way to achieve this?
So if I have products in my SQL product table, is it better to delete the product type and reindex it, or even to reindex the entire DB as an index in Elasticsearch? What is the best approach and how can I achieve it?
I would like to do it with NEST preferably, but if it is easier, plain Elasticsearch works for me as well.
Thanks
This can be a real challenge! Historic records in Elasticsearch will need to be reindexed when the template changes; new records will automatically be formatted according to the template you specify.
This link has helped us a lot:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
You'll want to make sure the Logstash filter is set up to match the fields in your template.
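For the historic records, the _reindex API copies documents from the old index into a new one created with the updated mapping; the index names below are placeholders:

```
POST _reindex
{
  "source": { "index": "products_v1" },
  "dest":   { "index": "products_v2" }
}
```

Create products_v2 with the new mapping first, then run the copy; searching through an alias lets you cut over to the new index without changing client code.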
