How to update an index/indice in Elasticsearch? - elasticsearch

I've already got my index (response_summary) created using logstash, which puts data into the index from a MySQL database.
My concern here is, how will I be able to update the index manually whenever a new set of records are being added to the database without deleting and recreating the index yet again.
Or is there a way that it can be done automatically, whenever a db change is done?
Any help could be appreciated.

No way with ES. There were the rivers in ES, but they were removed in ES 2.0. The alternative is the Logstash JDBC input plugin to automatically pick up changes based on a defined schedule.
For doing the same with files, you have the LS file input plugin which is tailing the files to pick up the new changes and, also, to keep track of where it left off in case LS is restarted.

Related

Copy documents in another index on creation in Elasticsearch

We want to keep track of all the changes of a document, so we want to store all the document versions in separate index.
Is there a way when a new document is added or changes to send the entire document in another index? Maybe there is a processor for this use case?
As far as I know, Elasticsearch as such supports only version numbers but there is no way to trace back to previous version.
You could maintain version history in a separate elastic index
Whenever you update main_index ensure that you update main_index as well
POST main_index/_doc/doc_id
POST main_index/_doc/doc_id_version
Maybe you can configure logstash to do this...not sure

How to make Logstash replace old data?

I have an Oracle DB. Logstash retrieves data from Oracle and puts it to ElasticSearch.
But when Logstash makes its planned export every 5 minutes, ElasticSearch fills with copies because the old data still exists. This is an obvious situation. Oracle's condition has hardly changed during these 5 minutes. Let's say 2-3 rows added and 4-5 deleted.
How can we replace old data with new without copies?
For example:
Delete the whole old index;
Create new index with the same name and make the same configuration (nGram configuration and mapping);
Add all new data;
Wait for 5 minutes and repeat.
It's pretty easy: create a new index for each import and apply the mappings, switch your alias afterwards to the most recent index. Remove old indices if needed. Your current data will be always searchable while indexing the most recent data.
Here are the sources you'll probably need to read:
Use aliases (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html) to point to the most current data when searching in elasticsearch (BTW it's always a good idea to have aliases in place).
Use rollover api (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-rollover-index.html) to create a new index for each import run - note the alias handling here too.
Use index templates (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html) to automatically apply the mappings/settings for your newly created indices.
Shrink, close and/or delete old indices to keep your cluster handling data you really need. Have a look at the curator (https://github.com/elastic/curator) as standalone tool.
You just need to use the fingerprint/hash of each document, or a hash of the unique fields in each document, as the document id, so that every time you can overwrite the same documents with the updated ones, in place, while adding new documents as well.
But this approach will not work with deleting data from oracle.
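The fingerprint-as-document-id approach from the last answer, as an added example that is not part of any answer on this page. The index name `orders`, the key field `order_id` and the sample rows are constructed, and a plain dict stands in for the cluster so the script runs offline.

```python
import hashlib
import json

def doc_id(row, key_fields):
    """Deterministic _id: SHA-256 over the row's unique fields, in a fixed order."""
    material = json.dumps([[f, row[f]] for f in sorted(key_fields)], separators=(",", ":"))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

def bulk_body(rows, index, key_fields):
    """NDJSON body for POST /_bulk; an 'index' action on an existing _id overwrites it."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id(row, key_fields)}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

def apply_bulk(store, body):
    """Stand-in for the cluster (assumption): apply index actions to a dict keyed by _id."""
    lines = body.splitlines()
    for action, source in zip(lines[0::2], lines[1::2]):
        store[json.loads(action)["index"]["_id"]] = json.loads(source)
    return store

def stale_ids(store, rows, key_fields):
    """IDs still in the index whose source row is gone from the database."""
    return sorted(set(store) - {doc_id(r, key_fields) for r in rows})

# Constructed sample data: two scheduled exports five minutes apart.
RUN_1 = [{"order_id": 1, "status": "new"}, {"order_id": 2, "status": "new"}]
RUN_2 = [{"order_id": 1, "status": "shipped"}, {"order_id": 3, "status": "new"}]

def demo():
    store = {}
    apply_bulk(store, bulk_body(RUN_1, "orders", ["order_id"]))
    apply_bulk(store, bulk_body(RUN_2, "orders", ["order_id"]))
    return store

if __name__ == "__main__":
    store = demo()
    print(len(store), "documents after two runs")
    print("rows deleted at the source but still indexed:", stale_ids(store, RUN_2, ["order_id"]))
```

Order 1 is overwritten in place and order 3 is added, while order 2, deleted at the source, stays in the index until something removes it.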

Documents in elasticsearch getting deleted automatically?

I'm creating an index through logstash and pushing data to it from a MySQL database. But what I noticed in elasticsearch was once the whole data is uploaded, it starts deleting some of the docs. The total number of docs is 160729. Without the scheduler it works fine.
I inserted the cron scheduler in order to check whether new rows have been added to the table. Can that be the issue?
My logstash conf looks like this.
Where am I going wrong? Or is this behavior common?
Any help could be appreciated.
The docs.deleted number doesn't mean that your documents are being deleted, but simply that existing documents are being "updated" and the older version of the updated document is marked as deleted in the process.
Those documents marked as deleted will be eventually cleaned up as Lucene merges segments in the background.

Elastic Search index

I am uploading data to elasticSearch using batch process. I am getting data once in a day from third party which need to be uploaded in elasticSearch.
My question is can I maintain past, current & future version of index in elasticSearch?
Below is my thinking:
If the batch process succeeds:
1. Upload the data in future version of index.
2. Copy the data of current version of index to past.
3. Copy future version of index data to current version.
If the batch process fails:
1. Do nothing and continue with the current version of index.
Can anyone please help me with this?
This is usually done with aliases. E.g.
Alias pointing to working yesterday's index:
working_index -> index_2016_12_01
Create new index_2016_12_02, upload data, if everything is ok switch alias (Alias API allows transactional changes.)
working_index -> index_2016_12_02
Then you can archive or delete or just leave untouched the old index
Always perform all the queries against alias instead of real index name.
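The alias switch described in this answer, and in the earlier answer about creating a new index per import, as an added example that is not from either answer. It builds the `POST /_aliases` request body with a `remove` and an `add` action and applies it to an in-memory alias table; the document-count comparison used as the success check is an assumption.

```python
import json

def swap_actions(alias, current_index, new_index):
    """Body for POST /_aliases: remove and add are sent as one request, applied together."""
    actions = []
    if current_index is not None:
        actions.append({"remove": {"index": current_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}

def apply_actions(aliases, body):
    """Stand-in for the cluster (assumption): apply all actions to a copy and return it."""
    updated = {name: set(indices) for name, indices in aliases.items()}
    for action in body["actions"]:
        (op, spec), = action.items()
        targets = updated.setdefault(spec["alias"], set())
        if op == "remove":
            targets.discard(spec["index"])
        elif op == "add":
            targets.add(spec["index"])
    return updated

def run_import(aliases, alias, new_index, expected_docs, indexed_docs):
    """Switch the alias only after a complete load; a failed batch leaves it untouched."""
    if indexed_docs != expected_docs:
        return aliases, None  # keep serving yesterday's index
    current = next(iter(aliases.get(alias, ())), None)
    body = swap_actions(alias, current, new_index)
    return apply_actions(aliases, body), body

if __name__ == "__main__":
    # Alias and index names are the ones used in the answer; the counts are constructed.
    state = {"working_index": {"index_2016_12_01"}}
    state, body = run_import(state, "working_index", "index_2016_12_02", 1000, 1000)
    print(json.dumps(body, indent=2))
    print({name: sorted(indices) for name, indices in state.items()})
```

The previous index is only detached from the alias, not deleted, so it remains available as the "past" version.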

Elasticsearch : How to get all indices that ever existed

Is there a way to find out the names of all the indices ever created? Even after the index might have been deleted. Does elastic store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.
