I'm creating an index through Logstash and pushing data into it from a MySQL database. But what I noticed in Elasticsearch was that once all the data is uploaded, it starts deleting some of the docs. The total number of docs is 160,729. Without the scheduler it works fine.
I added the cron schedule in order to check whether new rows have been added to the table. Could that be the issue?
My Logstash conf looks like this.
Where am I going wrong? Or is this behavior common?
Any help would be appreciated.
The docs.deleted number doesn't mean that your documents are being deleted, but simply that existing documents are being "updated" and the older version of the updated document is marked as deleted in the process.
Those documents marked as deleted will be eventually cleaned up as Lucene merges segments in the background.
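If you want to verify that, the cat indices API shows both counts side by side (your_index below is just a placeholder for your actual index name):

curl -XGET 'http://localhost:9200/_cat/indices/your_index?v&h=index,docs.count,docs.deleted'

docs.count is the number of live documents, while docs.deleted only counts the old versions that are still waiting to be merged away.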
Related
Our log processing went a bit haywire and we ended up with 2-5 copies of some logs in our Elasticsearch cluster. Fortunately, each one has an md5 hash stored with it in an indexed field (which is its own problem, but for another day).
So, in theory, we should be able to delete all but one copy of every document by using that field. However, I can't figure out how to do that. Any suggestions?
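In case it helps, here is a rough sketch of one way to do it with the Python client; the index name ("logs") and the field name ("md5") are assumptions, so adjust them to your setup. It scrolls over all documents, keeps the first copy seen for each hash, and bulk-deletes the rest:

    # Sketch: keep one document per md5 hash, delete the remaining copies.
    # Index name "logs" and field name "md5" are placeholders.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan, bulk

    es = Elasticsearch("http://localhost:9200")

    seen = set()        # hashes for which we already kept one copy
    to_delete = []      # delete actions for the duplicate copies

    for hit in scan(es, index="logs",
                    query={"query": {"match_all": {}}, "_source": ["md5"]}):
        h = hit["_source"].get("md5")
        if h is None:
            continue
        if h in seen:
            to_delete.append({"_op_type": "delete",
                              "_index": hit["_index"],
                              "_id": hit["_id"]})
        else:
            seen.add(h)

    # Delete the duplicates in bulk (count them first on real data).
    bulk(es, to_delete)

Note that this keeps every hash in memory, so for a very large index you would want to batch the deletes or find the duplicate hashes with an aggregation first.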
I've already got my index (response_summary) created using Logstash, which puts data into the index from a MySQL database.
My concern here is: how will I be able to update the index manually whenever a new set of records is added to the database, without deleting and recreating the index yet again?
Or is there a way that it can be done automatically, whenever the database changes?
Any help would be appreciated.
There's no way to do this with ES alone. ES used to have rivers, but they were removed in ES 2.0. The alternative is the Logstash JDBC input plugin, which automatically picks up changes based on a defined schedule.
For doing the same with files, you have the LS file input plugin, which tails the files to pick up new changes and also keeps track of where it left off in case LS is restarted.
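For reference, a pipeline along those lines might look like the sketch below. The connection details, table, and column names are placeholders; the parts that matter are the schedule, the :sql_last_value tracking so only new rows are read, and a stable document_id so re-ingested rows update existing documents instead of piling up as duplicates (which is also what produces the docs.deleted counts discussed above):

    input {
      jdbc {
        jdbc_driver_library => "/path/to/mysql-connector-java.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
        jdbc_user => "user"
        jdbc_password => "password"
        schedule => "* * * * *"              # run every minute
        statement => "SELECT * FROM my_table WHERE id > :sql_last_value"
        use_column_value => true
        tracking_column => "id"
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "response_summary"
        document_id => "%{id}"               # stable id, so re-reads become updates
      }
    }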
From reading this article (Lucene's Handling of Deleted Documents), I understand that deleted documents in Elasticsearch are simply marked as deleted, such that they may remain on the disk for some time afterwards.
I was therefore wondering if there was a way to recover deleted documents in Elasticsearch?
Deleted documents and old document versions are totally removed by the segment merging process :(
This is the moment when those old deleted documents are purged from the filesystem. Deleted documents (or old versions of updated documents) are not copied over to the new bigger segment.
https://www.elastic.co/guide/en/elasticsearch/guide/current/merge-process.html
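You can watch the effect per segment with the cat segments API, and on ES 2.1+ you can also ask for the deleted documents to be expunged explicitly (your_index is a placeholder; older versions used the _optimize endpoint instead of _forcemerge):

curl -XGET 'http://localhost:9200/_cat/segments/your_index?v&h=segment,docs.count,docs.deleted'
curl -XPOST 'http://localhost:9200/your_index/_forcemerge?only_expunge_deletes=true'

Neither of these brings the deleted documents back, of course; they only show or speed up the cleanup.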
Is there a way to find out the names of all the indices ever created, even after an index has been deleted? Does Elasticsearch store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were ever created, even ones that have since been deleted. Not sure this plugin works with the latest versions of ES, though.
I'm having trouble storing documents in a 3-node Elasticsearch cluster that was previously able to store documents. I use the Java API to send bulks of documents to Elasticsearch, which are accepted (no failures in the BulkResponse object) AND Elasticsearch shows heavy indexing activity. However, the number of documents does not increase and I assume that none of them are stored.
I've looked into Elasticsearch logs (of all three nodes) but I see no errors or warnings.
Note: I've had to restart two nodes previously, but search/query is working perfectly. (The count in the image starts at ~17:00, as I installed the Marvel plugin at that time.)
What can I do to solve or debug the problem?
Sorry for this point-blank code blindness on my part! I forgot to skip the cursor when reading from MongoDB and therefore re-inserted the same 1000 documents into Elasticsearch thousands of times!
Lesson learned: if this problem occurs, check that you are selecting the correct documents in your database and that those documents are not already stored in ES.
Side note on Marvel: it would be great if this could be indicated in some way, e.g. with a chart of "updated documents" (I rechecked and could not find one).
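For anyone who runs into something similar, here is the pattern that avoids it, sketched in Python for brevity (my actual code was Java; the database, collection, and index names are placeholders): page through MongoDB by the last seen _id instead of re-reading from the start, and use a stable Elasticsearch document id so an accidental re-read turns into an update rather than a new document.

    # Illustrative sketch only; names like "mydb", "logs" and the batch size are assumptions.
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    mongo = MongoClient("mongodb://localhost:27017")
    coll = mongo["mydb"]["logs"]
    es = Elasticsearch("http://localhost:9200")

    last_id = None
    while True:
        # Advance past documents we have already read instead of
        # fetching the same first batch on every iteration.
        query = {"_id": {"$gt": last_id}} if last_id is not None else {}
        batch = list(coll.find(query).sort("_id", 1).limit(1000))
        if not batch:
            break
        last_id = batch[-1]["_id"]

        # Reuse the Mongo _id as the ES document id: re-sending the same
        # document then shows up as an update, not as a new document.
        actions = [
            {
                "_index": "logs",
                "_id": str(doc["_id"]),
                "_source": {k: v for k, v in doc.items() if k != "_id"},
            }
            for doc in batch
        ]
        bulk(es, actions)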