Tombstone deletion in Couchbase - couchbase-lite

I have set up auto-compaction in Couchbase with the metadata purge interval = 1 day.
I can see that compaction is running as expected, but the tombstones are not being deleted from Couchbase.
Can anyone help with this?
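
In case it helps, here is a minimal sketch of setting the metadata purge interval through Couchbase Server's auto-compaction REST endpoint. The host, credentials, and fragmentation threshold are placeholders, and you should verify the endpoint and parameter names against your server version's documentation. Also keep in mind that tombstones older than the purge interval are only dropped the next time compaction actually runs, not exactly when the interval elapses.

```python
import requests

# Placeholder cluster address and admin credentials -- replace with your own.
BASE = "http://localhost:8091"
AUTH = ("Administrator", "password")

# Cluster-wide auto-compaction settings. purgeInterval is the metadata
# purge interval in days (fractions are allowed, e.g. 0.5 = 12 hours).
# Tombstones older than this are only removed when compaction next runs.
resp = requests.post(
    f"{BASE}/controller/setAutoCompaction",
    auth=AUTH,
    data={
        "databaseFragmentationThreshold[percentage]": 30,  # assumed threshold
        "parallelDBAndViewCompaction": "false",
        "purgeInterval": 1,
    },
)
resp.raise_for_status()
print("auto-compaction settings updated")
```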

Related

Is it possible to restore data from Elasticsearch after deleting indices?

I am doing local development on my box and I have deleted some indices that were useful. I wasn't taking any snapshots. Is it possible to restore those indices if I ran the delete command about 5-6 hours ago and it acknowledged with true?
No, it is not possible. If you deleted an index and do not have snapshots of it, you cannot recover the data.
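
For next time: taking regular snapshots makes deleted indices recoverable. Here is a minimal sketch against a local cluster, assuming a filesystem repository (the location must be listed under path.repo in elasticsearch.yml on every node; the repository and snapshot names are placeholders):

```python
import requests

ES = "http://localhost:9200"  # assumed local dev cluster

# Register a filesystem snapshot repository.
requests.put(f"{ES}/_snapshot/my_backup", json={
    "type": "fs",
    "settings": {"location": "/mnt/backups/es"},
}).raise_for_status()

# Take a snapshot of all indices and wait for it to finish.
requests.put(
    f"{ES}/_snapshot/my_backup/snapshot_1",
    params={"wait_for_completion": "true"},
).raise_for_status()

# Later, a deleted index can be restored from the snapshot:
# POST /_snapshot/my_backup/snapshot_1/_restore
```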

How to stop auto reindexing in elastic search if any update happens?

I have a large Elasticsearch use case with millions of records in it.
I will be updating the records frequently, say 1000 records per hour.
I don't want Elasticsearch to reindex on every update.
I am planning to reindex on a weekly basis.
Any idea how to stop the auto-reindex on update?
Or any other suggestion is welcome. Thanks in advance :)
Elasticsearch (ES) updates an existing doc in the following manner:
1. Delete the old doc.
2. Index a new doc with the changes applied to it.
According to the ES docs:
In Elasticsearch, this lightweight process of writing and opening a new segment is called a refresh. By default, every shard is refreshed automatically once every second. This is why we say that Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within 1 second.
Note that these changes will not be visible/searchable until ES commits/flushes them to the disk cache and disk. This is controlled by the soft commit (the ES refresh interval, 1 second by default) and the hard commit (which actually writes documents to disk, preventing them from being lost permanently; it is a costlier affair than a soft commit).
You need to tune your ES refresh interval and do proper load testing, as setting it very low or very high each has its own pros and cons.
For example, setting it very low (say 1 second) when you have many updates happening incurs a performance hit and might crash your system. Setting it very high (say 1 hour) means you no longer have NRT (near-real-time) search, and during that window the in-memory buffer could again hold millions of docs (depending on your app), which can cause an out-of-memory error; committing such a large buffer is also a very costly affair.
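
As an illustration, here is a sketch of tuning the refresh interval per index via the settings API; the cluster address and index name are placeholders:

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address
INDEX = "my_index"            # hypothetical index name

# Raise the refresh interval so updates are batched into fewer, larger
# segment writes. "30s" is a common compromise; "-1" disables automatic
# refresh entirely, in which case you must refresh manually.
requests.put(f"{ES}/{INDEX}/_settings", json={
    "index": {"refresh_interval": "30s"}
}).raise_for_status()

# To force a refresh manually, e.g. after a weekly bulk update:
requests.post(f"{ES}/{INDEX}/_refresh").raise_for_status()
```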

Documents in elasticsearch getting deleted automatically?

I'm creating an index through Logstash and pushing data to it from a MySQL database. But what I noticed in Elasticsearch was that once all the data is uploaded, it starts deleting some of the docs. The total number of docs is 160729. Without the scheduler it works fine.
I added the cron scheduler in order to check whether new rows have been added to the table. Can that be the issue?
My Logstash conf looks like this.
Where am I going wrong? Or is this behavior normal?
Any help would be appreciated.
The docs.deleted number doesn't mean that your documents are being deleted, but simply that existing documents are being "updated" and the older version of the updated document is marked as deleted in the process.
Those documents marked as deleted will eventually be cleaned up as Lucene merges segments in the background.
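
To see this for yourself, you can read the doc stats for the index; a small sketch assuming a local cluster and a placeholder index name:

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address
INDEX = "my_index"            # hypothetical index name

# docs.count is the number of live documents; docs.deleted counts old
# versions of updated documents that Lucene has not merged away yet.
stats = requests.get(f"{ES}/{INDEX}/_stats/docs").json()
docs = stats["indices"][INDEX]["primaries"]["docs"]
print(f"live docs: {docs['count']}, deleted (pending merge): {docs['deleted']}")
```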

Solr (re)indexing database

I've looked at several other questions related to mine, but so far I haven't found a solution to my issue.
Here is the situation:
A database with table table_x
A cronjob which checks every 2 minutes to index newly added/updated content in table_x using Solr
Extra information about the cronjob and table_x
- The cronjob checks a field in table_x to determine whether a row has to be indexed with Solr or not
- table_x contains over 400k records
We want Solr to reindex the whole of table_x, but there are (we think) two things that are not clear to us:
- What will happen when Solr is indexing all 400k records and the cronjob detects more records to be indexed?
- What will happen when a search query is performed on the website while Solr is indexing all 400k records?
Is there someone who can answer this for me?
Kind regards,
Pim
The question has two parts.
What happens to indexing when changes are detected while the initial indexing is going on?
You can make the second cron trigger wait until the first run has completed (see the sketch after this answer). This is more of an application question, and depends on how you want to handle it.
How are queries affected by new indexing or indexing in progress?
Which version of Solr are you using? You can use Near Real Time (NRT) search to see the changes before hard commits.
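
As a sketch of the first point, one common way to make overlapping cron runs wait (or skip) is an advisory file lock around the indexing job. This example is POSIX-only and the lock path is arbitrary:

```python
import fcntl
import sys

# The second cron invocation exits immediately if the first still holds
# the lock, so two indexing runs never overlap.
LOCK_PATH = "/tmp/solr_indexer.lock"  # arbitrary placeholder path

lock_file = open(LOCK_PATH, "w")
try:
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    print("previous indexing run still in progress, skipping")
    sys.exit(0)

# ... run the indexing of table_x here ...
```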

Use Elasticsearch as backup store

My application receives and parses thousands of small JSON snippets, each about ~1 KB, every hour. I want to create a backup of all incoming JSON snippets.
Is it a good idea to use Elasticsearch to back up these snippets in an index with e.g. "number_of_replicas": 4? I have never read of anyone using Elasticsearch for this.
Is my data safe in Elasticsearch when I use a cluster of servers and replicas, or should I use another store for this use case?
(Writing to the local file system isn't safe, as our hard disks crash often. At first I thought about using HDFS, but it isn't made for small files.)
First you need to understand the difference between replicas and backups.
A replica is an additional copy of the data at run time. It increases availability and failover support, but it won't protect against accidental deletion of data.
A backup is a copy of the whole data set at backup time. It is used to restore the data when the system crashes.
Elasticsearch for backup... it's not a good idea. Elasticsearch is a search engine, not a DB. If you have not configured your ES cluster carefully, you may end up losing data.
So, in my opinion:
To store JSON objects we have lots of DBs. For example, MongoDB is a NoSQL DB. We can easily configure it with more replicas, which means high availability of data and failover support. As you asked, it's also open source and more reliable.
For more info about MongoDB, see https://www.mongodb.org/
Update:
In Elasticsearch, if you create an index with more shards, it will be distributed among the nodes; if a node fails and you have no replicas, the data on it is lost. But in MongoDB, more nodes means each MongoDB node contains its own copy of the data, so if one MongoDB node fails we can retrieve our data from the replica nodes. We need to be more careful about replica setup and shard allocation in Elasticsearch, whereas in MongoDB it's easier to set up and a good architecture too.
Note: I didn't say storing data in Elasticsearch is not safe. I mean that, compared to MongoDB, it's more difficult to configure and maintain replicas in Elasticsearch.
Hope it helps!
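
For completeness, here is a minimal sketch of the index setup the question describes, against an assumed local cluster with a placeholder index name. Note that with "number_of_replicas": 4 the cluster needs at least five data nodes for all copies to be assigned, and replicas still don't protect against an accidental delete of the index itself, so pair this with regular snapshots.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# Create the snippets index with extra replicas, as the question suggests.
# Replicas protect against node failure, not against accidental deletes.
requests.put(f"{ES}/snippets", json={
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 4,
    }
}).raise_for_status()
```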
