Multiple nodes in a single Elasticsearch server

I am seeing multiple nodes on a single Elasticsearch server, even though I specified there should be only one. This server is used to ingest Logstash logs.

You have probably connected the Logstash instances to Elasticsearch as client nodes (the legacy "node" protocol) rather than over HTTP. As you can see, there is only one data node in the screenshot. This way each Logstash instance joins the cluster as an Elasticsearch node, but it never receives index requests because it is configured with node.data: false and node.master: false.
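For reference, this is roughly what the legacy Logstash output looks like in each mode (a sketch assuming Logstash 1.x syntax; the host name is a placeholder):

output {
  elasticsearch {
    host => "es-data1.example.com"  # placeholder address of a cluster node
    protocol => "node"   # joins the cluster as a non-data, non-master client node
    # protocol => "http" # use HTTP instead to keep Logstash out of the node list
  }
}

With protocol => "http" the Logstash instances stop appearing as extra nodes in the cluster.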

Related

Logstash - is Pull possible?

We are trying to build an Elasticsearch data collector. The Elasticsearch cluster should receive data from different servers. These servers are at other locations (and networks) than the Elasticsearch cluster. The clients are connected to the cluster via one-way VPN connections.
As a first attempt we installed Logstash on each client server to collect the data, filter it, and send it to the cluster. In a test environment this was no problem. The problem now is that the Logstash on the client tries to establish a connection to Elasticsearch, and this attempt is blocked by the firewall. It is, however, possible to open a connection from the cluster side to each client and receive the data. What we need is a way to get the data from Logstash such that we open the connection and pull the data from Logstash (PULL). Is there a way to do this without changing the VPN configuration?
Logstash pushes events; if your Logstash instances can't initiate the connection to the Elasticsearch nodes, you will need something in the middle or you will have to allow the traffic through the firewall/VPN.
For example, you can have an Elasticsearch instance to which the Logstash servers can push data, and then another Logstash in your main cluster environment with a pipeline whose input reads from that intermediate Elasticsearch. This way the data is pulled from the Elasticsearch in the middle.
edit:
As I said in the comment, you need something like the architecture in the image.
There, your servers send data to a Logstash instance, and that Logstash has an output pointing to an Elasticsearch instance, so it initiates the connection and pushes the data.
On your main cluster side, where you have your Elasticsearch cluster and a one-way VPN that can only initiate connections outward, you run another Logstash. This Logstash has an input that queries the outside Elasticsearch node, pulling the data in.
In the Logstash pipeline you can use an elasticsearch input, which queries an Elasticsearch node and then sends the documents it receives through your filters and outputs:
input {
  # Pull from the Elasticsearch node in the middle (host and index are examples)
  elasticsearch {
    hosts => ["middle-es:9200"]
    index => "logstash-*"
    schedule => "* * * * *"  # re-run the query every minute; without it the input runs only once
  }
}
filter {
  # your filters
}
output {
  elasticsearch {
    hosts => ["es-node1:9200", "es-node2:9200"]  # your cluster nodes
  }
}
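One detail worth noting, as a hedged aside: the elasticsearch input can expose each document's metadata via its docinfo option, which lets the final output write every event back under its original index name and document id. A sketch (host names are placeholders):

input {
  elasticsearch {
    hosts => ["middle-es:9200"]
    index => "logstash-*"
    docinfo => true  # stores _index/_type/_id under [@metadata]
  }
}
output {
  elasticsearch {
    hosts => ["es-node1:9200"]
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}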
Is it clear now?

Will the Elasticsearch REST client find the cluster nodes automatically?

I am using the REST client of Elasticsearch, and the Elasticsearch cluster will have several nodes.
If I connect to one node, will the client automatically find the other nodes of the cluster while processing requests and load balance across them? Or do I need to take care of that when creating the REST client?
If "sniffer" is active, before the request (in background) the client will update your internal list of node data. It works for default client and Rest Client.
The master node (defined in the cluster configuration) will do the load balancing.
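For illustration, here is a minimal sketch using the low-level Java REST client together with its optional sniffer module (this assumes the org.elasticsearch.client:elasticsearch-rest-client-sniffer dependency is on the classpath; the host name is a placeholder):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.Sniffer;

public class SnifferExample {
    public static void main(String[] args) throws Exception {
        // Seed the client with one known node; requests are round-robined
        // across every host the client currently knows about.
        RestClient client = RestClient.builder(
                new HttpHost("es-node1.example.com", 9200, "http")).build();

        // The sniffer polls the cluster in the background and refreshes
        // the client's host list (every 5 minutes by default).
        Sniffer sniffer = Sniffer.builder(client)
                .setSniffIntervalMillis(60000) // refresh every minute instead
                .build();

        // ... use client.performRequest(...) as usual ...

        sniffer.close(); // close the sniffer before the client
        client.close();
    }
}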

What happens if logstash sends data to elasticsearch at a rate faster than it can index?

So I have multiple hosts with Logstash installed on each one. Logstash on each host reads the log files generated by that host and sends the data to my single AWS Elasticsearch cluster.
Now consider a scenario where large quantities of logs are being generated by every host at the same time. Since Logstash is installed on each host and just forwards the data to the ES cluster, I assume that even if my Elasticsearch cluster is not able to index it all, my hosts won't be affected. Are the logs simply lost in such a scenario?
Can my host machines be affected in any way?
In short, you may lose some logs on the host machines; that is why message queues like Kafka are used as a buffer between the shippers and the cluster: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html#deploying-message-queueing
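As a sketch of that pattern (broker address, topic name, and Elasticsearch endpoint are placeholders; the syntax assumes the Kafka plugins shipped with recent Logstash versions), each host would ship to Kafka, and a separate indexing Logstash would consume from Kafka at whatever rate the cluster can sustain:

# Shipper pipeline on each host
output {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topic_id => "logs"
  }
}

# Indexer pipeline in front of the cluster
input {
  kafka {
    bootstrap_servers => "kafka1:9092"
    topics => ["logs"]
    group_id => "logstash-indexer"
  }
}
output {
  elasticsearch {
    hosts => ["my-es-endpoint:9200"]
  }
}

The queue decouples shipping from indexing: a slow cluster shows up as consumer lag in Kafka rather than as dropped events on the hosts.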

Elasticsearch 0.90.0: Putting existing server (with some data) into cluster

I am using Elasticsearch 0.90.0 in my production environment. Currently I am using only a single server (along with the jetty plugin to restrict write access).
Now I want a cluster of two servers consisting of the old server (with data) and one new server. I need my data on both servers, so that if either of them fails the data can be fetched from the other. What should I do? Will the normal configuration work? Can I copy the data folder from one server to the other and expect it to work properly when placed in a cluster? Or should I clone the elasticsearch folder itself onto my second machine?
By normal configuration, I mean this:
cluster.name: elasticsearch
node.name: "John"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: []
Please help!
You should not have to copy data manually.
If you kept the default settings, adding one more node to the cluster should trigger:
the creation of one replica for each existing shard, allocated on your new node
the rebalancing of primary shards between your two nodes
You will have to add the addresses of your nodes as unicast hosts, since you have disabled multicast discovery.
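For example, the unicast host list on both nodes could look like this (the IP addresses are placeholders):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.10", "192.168.1.11"]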
I recommend running some tests with Elasticsearch instances on your personal computer first in order to verify this behavior.

Logstash cluster output to Elasticseach cluster without multicast

I want to run logstash -> elasticsearch with high availability and cannot find an easy way to achieve it. Please review how I see it and correct me:
Goal:
5 machines each running elasticsearch united into a single cluster.
5 machines each running logstash server and streaming data into elasticsearch cluster.
N machines under monitoring each running lumberjack and streaming data into logstash servers.
Constraint:
It is supposed to run on a PaaS (CoreOS/Docker), so multicast discovery does not work.
Solution:
Lumberjack allows you to specify a list of logstash servers to forward data to. Lumberjack randomly selects a target server and switches to another one if that server goes down. It works.
I can use the zookeeper discovery plugin to construct the elasticsearch cluster. It works.
With multicast, each logstash server discovers and joins the elasticsearch cluster. Without multicast, I can only specify a single elasticsearch host in the output, but that is not highly available. I want to output to the cluster, not to a single host that can go down.
Question:
Is it realistic to add a zookeeper discovery plugin to logstash's embedded elasticsearch? How?
Is there an easier (natural) solution for this problem?
Thanks!
You could potentially run a separate (non-embedded) Elasticsearch instance within the Logstash container, but configure that Elasticsearch not to store data; you could even make these the master nodes:
node.data: false
node.master: true
You could then add your Zookeeper plugin to all the Elasticsearch instances so they form a single cluster.
Logstash then logs over HTTP to the local Elasticsearch, which works out which of the five data-storing nodes should actually index the data.
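The Logstash side of that setup could look like this (a sketch in Logstash 1.x syntax, from the era of this question):

output {
  elasticsearch {
    host => "localhost"   # the co-located non-data client node
    protocol => "http"
  }
}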
Alternatively, this question explains how to get plugins working with the embedded version of Elasticsearch: Logstash output to Elasticsearch on AWS EC2
