Getting old indexed Elasticsearch data

I have a lot of data indexed in my Elasticsearch.
I deleted the elasticsearch folder, extracted a fresh zip of Elasticsearch again, and started the Elasticsearch server.
To my surprise, after starting the new Elasticsearch server I found all the old data again, and this problem keeps recurring.
Can anyone please help me? I don't want all the old data to show up indexed in Elasticsearch.
Regards

Given the cluster health response, it's not a problem with multiple nodes running on the same cluster, as suggested by Igor. I'd suggest you check the Java processes that are running: you might have a stray Elasticsearch process hanging somewhere that keeps writing to that folder.
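As a quick check, a minimal sketch assuming a single node answering on the default localhost:9200 (adjust host and port for your setup):

    # List any Elasticsearch/Java processes that might still be running
    ps aux | grep -i elasticsearch | grep -v grep
    # Or use the JDK's jps tool, if installed
    jps -l
    # Ask the node that answers on port 9200 which data path it is actually writing to
    curl 'localhost:9200/_nodes/settings?pretty'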

Related

ELK logstash can't create index in ES

After following this tutorial (https://www.bmc.com/blogs/elasticsearch-logs-beats-logstash/) to use Logstash to analyze some log files, my index was created fine the first time. Then I wanted to re-index new files with new filters and new directories, so I deleted the index via "curl -XDELETE", and now when I restart Logstash and Filebeat the index is not created anymore. I don't see any errors while launching the components.
Do I need to delete something else in order to re-create my index?
OK, since my guess (see comments) was correct, here's the explanation:
To avoid reading and publishing the lines of a file over and over again, Filebeat uses a registry to store the current state of the harvester:
The registry file stores the state and location information that Filebeat uses to track where it was last reading.
As you stated, Filebeat successfully harvested the files and sent the lines to Logstash, and Logstash published the events to Elasticsearch, which created the desired index. Since Filebeat updated its registry, no more lines had to be harvested, and thus no events were published to Logstash again, even after you deleted the index. When you inserted some new lines, Filebeat reopened the harvester and published only the new lines (those after the "registry checkpoint") to Logstash.
The default location of the registry file is ${path.data}/registry (see Filebeat's Directory Layout Overview).
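If you want to see what Filebeat has recorded, you can inspect the registry directly. A sketch assuming the Linux package default data path of /var/lib/filebeat (adjust to your path.data; depending on the Filebeat version the registry is a single JSON file or a directory):

    # The registry lists each harvested file with the byte offset Filebeat has reached
    cat /var/lib/filebeat/registry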
... maybe the curl api call is not the best solution to restart the index
This has nothing to do with deleting the index. Deleting the index happens inside Elasticsearch; Filebeat has no clue about your actions in Elasticsearch.
Q: Is there a way to re-create an index based on old logs?
Yes, there are a few approaches you should take into consideration:
You can use the reindex API, which copies documents from one index to another (a sketch follows this list). You can update the documents while reindexing them into the new index.
In contrast to reindex, you can use the update by query API to update documents that will remain in the original index.
Lastly, you could of course delete the registry file (also sketched below). However, this could cause data loss; for development purposes I guess that's fine.
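A minimal reindex sketch; the index names old-logs and new-logs are placeholders, and the Content-Type header is needed on ES 6+:

    curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d '
    {
      "source": { "index": "old-logs" },
      "dest":   { "index": "new-logs" }
    }'

And the development-only registry reset; the registry path assumes the Linux package default, and Filebeat should be stopped first so it does not rewrite the file on shutdown:

    sudo service filebeat stop
    # Remove the registry so Filebeat forgets its harvesting state and re-reads the files
    sudo rm -rf /var/lib/filebeat/registry
    sudo service filebeat start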
Hope I could help you.

Elasticsearch Shard Location

I am trying to set up an Elasticsearch cluster and have a question that's bothering me. I am transitioning from MarkLogic to Elasticsearch, and MarkLogic has the concept of storing data on a different disk rather than on the same disk where the software itself is installed. I know how to do this in MarkLogic but somehow cannot find anything on it for Elasticsearch. Can anyone point me to a document that can help me configure my shards on a different disk, one where Elasticsearch itself is not installed?
Thanks,
S.
You simply need to change the path.data setting in your elasticsearch.yml configuration file:
path:
  data:
    - /mnt/hda1
    - /mnt/hda2
    - /mnt/hda3
You can use a single location or several; if you list several, ES will spread your index data across those locations. Note that all data pertaining to a given shard will always be located at the same path location.
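Once the node is running you can confirm where the shards actually landed; a quick check assuming a local node on the default port:

    # Show every shard together with the node that holds it
    curl 'localhost:9200/_cat/shards?v'
    # Show per-node disk usage for the configured data paths
    curl 'localhost:9200/_cat/allocation?v'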

How to Analyze logs from multiple sources in ELK

I have started working on ELK recently and have a question regarding the handling of multiple types of logs.
I have two sets of logs on my server that I want to analyse: one from my Android application and the other from my website. I have successfully transferred the logs from this server via Filebeat to the ELK server.
I have created two filters, one for each type of log, and have successfully imported these logs into Logstash and then Kibana.
This link helped me do the above:
https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-centos-7
The above link directs you to use the logs in the filebeat index in Kibana and start analysing (which I did successfully for one type of log). But the problem I am facing is that since these two kinds of logs are very different, they need to be analysed differently. How do I do this in Kibana? Should I create multiple Filebeat indexes there and import them, should it be just one single index, or is there some other way? I am not very clear on this (I could not find much documentation), so I would appreciate some help and guidance here.
Elasticsearch organizes data by index and type. Elastic used to compare these to SQL concepts, but now offers a new explanation.
Since you say that the logs are very different, Elastic's guidance is to use different indexes.
In Kibana, each visualization is tied to an index. If you have one panel from each index, you can show them both on the same dashboard.
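Before creating the index patterns in Kibana, it's worth confirming that each log type really lands in its own index; a quick check assuming a local node (the index names you see will depend on your Logstash output configuration):

    # List all indexes with their document counts
    curl 'localhost:9200/_cat/indices?v'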

How to reset replication stream between couchbase and elasticsearch

I have a Couchbase cluster set up as the primary source for data. From this, a subset of the data is synced to an Elasticsearch cluster via the Couchbase Transport Plugin for Elasticsearch (https://github.com/couchbaselabs/elasticsearch-transport-couchbase), which sets up an XDCR stream from Couchbase to Elasticsearch.
Due to some issues with the Elasticsearch cluster, all data needs to be synced again from Couchbase to Elasticsearch. I have tried recreating the XDCR stream, but that does not seem to help, as it only copies a very small subset of the documents. Is there a way to achieve this?
Additional details
Couchbase version: 3.1.0
Number of Couchbase documents: 50K+
Documents synced to Elasticsearch: around 700 (expected 20K+)
If a document in Couchbase is modified, it is successfully synced to Elasticsearch
The issue you're experiencing is likely in one of the following: XDCR, the Couchbase Transport Plugin for Elasticsearch, or Elasticsearch itself.
Start by checking for XDCR errors. You can find your XDCR logs using these instructions. Be aware that the Transport Plugin uses XDCR v1 and almost everything else in Couchbase uses v2.
Consult the troubleshooting advice for the Couchbase Transport Plugin for Elasticsearch. The instructions should work for you even though they are from the 4.0 docs.
Pay attention to how your documents are being mapped to Elasticsearch. You mention that you're expecting only a subset of documents to be synced to Elasticsearch, so it's possible that you have lost a setting or misconfigured something. You can enable logging and observe a small set of test data. At TRACE level, you should be able to see each document that is inspected.
If all of that fails, make sure the basics are working by indexing the beer sample dataset, following the directions in the Couchbase docs. ES is probably not the issue, but testing with a fresh ES instance will rule out problems on that side.
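One cheap way to watch progress while you test is to compare the Couchbase bucket's item count with the document count on the Elasticsearch side; a sketch assuming ES on localhost:9200 and a placeholder index name:

    # Count the documents that have made it across the XDCR stream so far
    curl 'localhost:9200/my-couchbase-index/_count?pretty'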

How to recover data from a renamed Elasticsearch cluster?

I have just spent the best part of 12 hours indexing 70 million documents into Elasticsearch (1.4) in a single-node, single-server setup on an EC2 Ubuntu 14.04 box. This completed successfully; however, before taking a snapshot of my server I thought it would be wise to rename the cluster to prevent it from accidentally joining production boxes in the future. What a mistake that was! After renaming it in the elasticsearch.yml file and restarting the ES service, my indexes have disappeared.
I saw the data was still present in the data dir under the old cluster name. I tried stopping ES, moving the data manually in the filesystem, and then starting the ES service again, but still no luck. I then tried renaming back to the old cluster name and putting everything back in place, and still nothing. The data is still there, all 44 GB of it, but I have no idea how to get it back. I have spent the past 2 hours searching, and all I can seem to find is advice on how to restore from a snapshot, which I don't have. Any advice would be hugely appreciated; I really hope I haven't lost a day's work. I will never rename a cluster again!
Thanks in advance.
I finally fixed this on my own: I stopped the cluster, deleted the nodes directory that had been created for the new cluster name, copied my old nodes directory over (being sure to respect the old structure exactly), chowned the folder to the elasticsearch user just in case, started the cluster back up, and breathed a huge sigh of relief to see 72 million documents!
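For anyone hitting the same problem, a minimal sketch of those steps; the paths and cluster names are assumptions for a package install with path.data under /var/lib/elasticsearch (ES 1.x keeps data under <path.data>/<cluster.name>/nodes), so adjust them to your layout:

    sudo service elasticsearch stop
    # Remove the empty nodes directory that the renamed cluster created
    sudo rm -rf /var/lib/elasticsearch/new_cluster_name/nodes
    # Copy the old cluster's nodes directory over, preserving the structure exactly
    sudo cp -a /var/lib/elasticsearch/old_cluster_name/nodes /var/lib/elasticsearch/new_cluster_name/
    # Make sure the elasticsearch user owns everything
    sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/new_cluster_name
    sudo service elasticsearch start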
