How to get hostname from Elasticsearch Python module?

Using Elasticsearch, how do we get the working Elastic host?
es = Elasticsearch()
health = es.cluster.health()
The statement above will print the health of an Elasticsearch host, but how do we get the working host from it?

Use hosts = es.transport.hosts to get the list of hosts.
You can also use something like:
con = elasticsearch.connection.RequestsHttpConnection(**hosts[0])
con.perform_request(...)
But even better, you can do something like:
es.transport.perform_request('GET', '/_cat/tasks')
in order to perform any request on a healthy host from the list of configured hosts.
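For pre-8.x clients, a minimal runnable sketch along these lines (assuming elasticsearch-py 7.x and a default localhost cluster) lists the live connections and issues a request through the transport:
from elasticsearch import Elasticsearch

es = Elasticsearch()  # defaults to localhost:9200

# each connection object in the pool carries the full base URL of a node
for conn in es.transport.connection_pool.connections:
    print(conn.host)  # e.g. http://localhost:9200

# the transport picks a healthy node from the pool for any request
print(es.transport.perform_request('GET', '/_cat/tasks'))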

For newer versions (>= 8.x) this is possible by accessing .transport.node_pool.all():
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# get a list of all configured node URLs
nodes = [node.base_url for node in es.transport.node_pool.all()]

Related

How to add dynamic hosts in Elasticsearch and logstash

I have a prototype working for me, with devices sending logs and logstash parsing them and putting them into elasticsearch.
Logstash output code:
output {
  if [type] == "json" {
    elasticsearch {
      hosts => ["host1:9200","host2:9200","host3:9200"]
      index => "index-metrics-%{+xxxx.ww}"
    }
  }
}
Now my question is:
I will be putting this solution into production. For simplicity, assume that I have one cluster with 5 nodes in it.
So I know I can give an array of 5 node IPs / hostnames in the elasticsearch output plugin and it will round-robin to distribute the data.
How can I avoid putting all my node IPs / hostnames into the logstash config file?
As the system goes into production I don't want to manually go into each logstash instance and update these hosts.
What are the best practices one should follow in this case?
My requirement is:
I want to run my ES cluster and add / remove / update any number of nodes at any time. All of my logstash instances need to keep sending data regardless of changes on the ES side.
Thanks.
If you want to add/remove/update hosts, you will need to run sed or some other kind of string replacement before the service starts. Logstash configs are "compiled" at startup and cannot be changed at runtime.
hosts => [$HOSTS]
...
$ HOSTS="\"host1:9200\",\"host2:9200\""
$ sed "s/\$HOSTS/$HOSTS/g" $config
Your other option is to use environment variables for the dynamic portion, but that won't allow you to use a dynamic number of hosts.
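As an alternative to sed, a small render script along these lines (the file names are hypothetical; it assumes a HOSTS environment variable holding the comma-separated, quoted host list) could do the same substitution at startup:
import os

TEMPLATE = "logstash-indexer.conf.template"  # hypothetical template containing the literal $HOSTS placeholder
TARGET = "logstash-indexer.conf"             # hypothetical rendered config path

hosts = os.environ["HOSTS"]  # e.g. '"host1:9200","host2:9200"'

with open(TEMPLATE) as f:
    rendered = f.read().replace("$HOSTS", hosts)

with open(TARGET, "w") as f:
    f.write(rendered)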

Bind Elastic-Search to localhost as well as an IP address

modules-network in the Elasticsearch documentation says that it can bind to more than one network address by specifying an array of IP addresses in network.bind_host.
I put the following in my config/elasticsearch.yaml:
# Used a real IP address in the below settings
network.bind_host: ["10.10.10.10","_local_"]
network.publish_host: 10.10.10.10
But it does not work and I get the following error:
failed to send join request to master [{Perseus}{AB0...B-kw}{10.10.10.172}{10.10.10.172:9300}],
reason [RemoteTransportException[[Perseus][10.10.10.172:9300][internal:discovery/zen/join]];
nested: ConnectTransportException[[Glitch][10.10.10.164:9300]
connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[Connection refused: /10.10.10.164:9300];
Any ideas what I am doing wrong?
All my other Elasticsearch nodes are running happily on other machines.
I want to bind all Elasticsearch nodes to their respective IP address as well as _local_ (localhost), so that I can run some storm-jobs on each of the machines whose EsBolts feed only into localhost.
That way, none of my EsBolts need to round-robin the feed to several Elasticsearch nodes.
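For reference, a per-node sketch of what the docs describe (the addresses are placeholders; each node binds both its routable address and the loopback, while publishing only the routable one so that peers can connect back to it on port 9300, which is where the join request above is being refused):
# config/elasticsearch.yml on each node
network.bind_host: ["10.10.10.10", "_local_"]
network.publish_host: 10.10.10.10  # must be reachable from the other nodes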

Ganglia seeing nodes but not metrics

I have a hadoop cluster with 7 nodes, 1 master and 6 core nodes. Ganglia is set up on each machine, and the web front end correctly shows 7 hosts.
But it only shows metrics from the master node (which runs both gmetad and gmond). The other nodes have the same gmond.conf file as the master node, and the web front end clearly sees them. I don't understand how ganglia can recognize 7 hosts but only show metrics from the box with gmetad.
Any help would be appreciated. Is there a quick way to see if those nodes are even sending data? Or is this a networking issue?
Update #1: when I telnet into a gmond host machine that is not the master node and look at port 8649, I see the XML but no data. When I telnet to 8649 on the master machine, I see XML and data. Any suggestions of where to go from here?
Set this in the gmond.conf file of every node you want to monitor:
send_metadata_interval = 15 // or some similar value
Now all the nodes and their metrics are shown in the master (gmetad).
This extra configuration is necessary if you are running in unicast mode, i.e., if you are specifying a host in udp_send_channel rather than mcast_join. In multicast mode, the gmond daemons can query each other at any time, so proactive sending of monitoring data is not required.
In the gmond configuration, ensure all of the following is provided:
cluster {
  name = "my cluster" ## Cluster name - is this the same name as given in the gmetad conf?
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  #mcast_join = 239.2.11.71 ## Comment this out
  host = 192.168.1.10 ## IP address/hostname of the gmetad node
  port = 8649
  ttl = 1
}
/* comment out this block itself
udp_recv_channel {
  ...
}
*/
tcp_accept_channel {
  port = 8649
}
Save and quit, then restart your gmond daemon. Then run netcat against port 8649 (e.g. "netcat localhost 8649"). Are you able to see XML with metrics now?
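To script that check, a short sketch like the following (the host and port are assumptions for a locally reachable gmond) pulls the XML dump from tcp_accept_channel and reports whether any metric elements are present:
import socket

HOST, PORT = "localhost", 8649  # assumed gmond endpoint

# gmond dumps its XML state to anyone who connects to tcp_accept_channel
with socket.create_connection((HOST, PORT), timeout=5) as sock:
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)

xml = b"".join(chunks).decode("utf-8", errors="replace")
print("metrics present" if "<METRIC " in xml else "hosts only, no metrics")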

logstash with elasticsearch_http

Apparently the logstash OnDemand account does not work when I wanted to post an issue.
Anyway, I have a logstash setup with redis, elasticsearch, and kibana. My logstash instances are collecting logs from several files and putting them into redis just fine.
Logstash version 1.3.3
Elasticsearch version 1.0.1
The only thing I have set in elasticsearch_http for logstash is the host name. The whole setup seems to glue together just fine.
The problem is that elasticsearch_http is not consuming the redis entries as they come. What I have seen by running it in debug mode is that it flushes about 100 entries every 1 min (the default values of flush_size and idle_flush_time). As I understand the documentation, however, it should force a flush in case the 100 flush_size is not reached within the idle window (for example, if we had only 10 messages in the last 1 min). But it seems to work the other way: it's flushing about 100 messages every 1 min only. I changed the size to 2000 and it flushes 2000 every minute or so.
Here is my logstash-indexer.conf
input {
  redis {
    host => "1xx.xxx.xxx.93"
    data_type => "list"
    key => "testlogs"
    codec => json
  }
}
output {
  elasticsearch_http {
    host => "1xx.xxx.xxx.93"
  }
}
Here is my elasticsearch.yml
cluster.name: logger
node.name: "logstash"
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.unicast.hosts: ["1xx.xxx.xxx.93:9300"]
discovery.zen.ping.multicast.enabled: false
#discovery.zen.ping.unicast.enabled: true
network.bind_host: 1xx.xxx.xxx.93
network.publish_host: 1xx.xxx.xxx.93
The indexer, elasticsearch, redis, and kibana are on same server. The log collection from file is done on another server.
So I'm going to suggest a couple of different approaches to solve your problem. Logstash, as you are discovering, can be a bit quirky, so I've found these approaches useful in dealing with its unexpected behavior.
1. Use the elasticsearch output instead of elasticsearch_http. You can get the same functionality by using the elasticsearch output with protocol set to http. The elasticsearch output is more mature (milestone 2 vs milestone 3) and I've seen this change make a difference before.
2. Set idle_flush_time and flush_size explicitly. There have been issues with Logstash defaults previously, and I've found it a lot safer to set them explicitly. idle_flush_time is in seconds, flush_size is the number of records to flush. (A sketch combining this with the first point follows this list.)
3. Upgrade to a more recent version of logstash. There is enough of a change in how logstash is deployed with version 1.4.X (http://logstash.net/docs/1.4.1/release-notes) that I'd bite the bullet and upgrade. It's also significantly easier to get attention if you still have a problem with the most recent stable major release.
4. Make certain your Redis version matches those supported by your logstash version.
5. Experiment with setting the batch, batch_events, and batch_timeout values for the Redis output. You are using the list data_type; list supports various batch options, and as with some other parameters it's best not to assume the defaults are always being set correctly.
6. Do all of the above. In addition to trying the first set of suggestions, I'd try them all together in various combinations, keeping careful records of each test run. It seems obvious, but between all the variations above it's easy to lose track, so try to change only one variable at a time.
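For the first two points, the output block might look something like this (a sketch only; the numeric values are assumptions to tune for your load, not recommendations):
output {
  elasticsearch {
    host => "1xx.xxx.xxx.93"
    protocol => "http"    # same HTTP transport as elasticsearch_http
    flush_size => 2000    # records per bulk flush (assumed value)
    idle_flush_time => 5  # seconds before a partial batch is flushed (assumed value)
  }
}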

Connecting to elasticsearch cluster in NEST

Let's assume I have several elasticsearch machines in a cluster: 192.168.1.1, 192.168.1.2 and 192.168.1.3
Any of the machines can go down. It doesn't look like NEST supports providing a range of IPs to try to connect to.
So how do I make sure I connect to any of the available machines from NEST? Just try to open a connection to one, and if TryConnect doesn't work, try another?
You can run a local ES instance on your application server (e.g. your web server) and configure it to work as a load balancer:
Set node.client: true (or node.master: false and node.data: false) in this local ES config to make it a load balancer. This means the node will not become master and will not contain data.
Configure it to join the cluster (your 3 nodes don't need to know about this ES).
Configure NEST to use the local ES as your search server.
Then this ES becomes part of your cluster, and will distribute your requests to suitable nodes.
If you don't want a "load balancer", then you have to check manually on the client side to determine which node is alive.
Since you have a small set of nodes, you can use a StaticConnectionPool:
var uri1 = new Uri("http://192.168.1.1:9200"); // Uri requires a scheme; port 9200 assumed
var uri2 = new Uri("http://192.168.1.2:9200");
var uri3 = new Uri("http://192.168.1.3:9200");
var uris = new List<Uri> { uri1, uri2, uri3 };
var connectionPool = new StaticConnectionPool(uris);
var connectionSettings = new ConnectionSettings(connectionPool); // <-- needs to be reused
var client = new ElasticClient(connectionSettings);
An important point to keep in mind is to reuse the same ConnectionSettings instance when creating a new elastic client, since elasticsearch caches are per ConnectionSettings. See this GitHub post:
...In any case its important to share the same ConnectionSettings
instance across any elastic client you instantiate. ElasticClient can
be a singleton or not as long as each instance shares the same
ConnectionSettings instance.
All of our caches are per ConnectionSettings, this includes
serialization caches.
Also a single ConnectionSettings holds a single IConnectionPool and
IConnection something you definitely want to reuse across requests.
I would set up one of the nodes as a load balancer, meaning that the URL you are calling should always be up.
Though if you increase the number of replicas you can call any of the nodes by URL and still access the same data. Elasticsearch does not care which one you access while in a cluster. So you could build your own range of IPs in your application.
