We know each ES node exposes an HTTP port (9200 by default) for CRUD operations, and that there are three types of ES nodes: master, data and client.
My question is: does the same HTTP port play a different role depending on the node type, or put another way, do the ports behave slightly differently?
I ask because I have sometimes seen different results when querying the HTTP port on different node types.
Something seems very odd when I call the master's HTTP port. Have you run into this situation? Is it because of some cached data? Thanks!
I want a graph where every recent IP that requested my webserver is shown with its total request count. Is something like this doable? Can I add a query and remove it afterwards via Prometheus?
Technically, yes. You will need to:
Expose some metric (probably a counter) in your server - say, requests_count, with a label; say, ip
Whenever you receive a request, increment the metric with the label set to the requester's IP (see the sketch after this list)
In Grafana, graph the metric, likely summing it by the IP address to handle the case where you have several horizontally scaled servers handling requests: sum(your_prometheus_namespace_requests_count) by (ip)
Set the Legend of the graph in Grafana to {{ ip }} to 'name' each line after the IP address it represents
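To make the first two steps concrete, here is a minimal sketch using Python's prometheus_client; the metric name requests_count and the ip label come from the steps above, while the ports and handler wiring are assumptions for the example:

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from prometheus_client import Counter, start_http_server

    # Counter with an 'ip' label, as described in the steps above.
    REQUESTS = Counter("requests_count", "Requests received, by client IP", ["ip"])

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Increment the series whose label matches the requester's IP.
            REQUESTS.labels(ip=self.client_address[0]).inc()
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")

    if __name__ == "__main__":
        start_http_server(8000)  # exposes /metrics for Prometheus to scrape
        HTTPServer(("", 8080), Handler).serve_forever()  # the web server itself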
However, every distinct label value a metric has causes a whole new time series to exist in the Prometheus time-series database; you can think of a metric like requests_count{ip="192.168.0.1"}=1 as being somewhat similar to requests_count_ip_192_168_0_1{}=1 in terms of how it consumes memory. Each series currently held in the Prometheus TSDB head takes something on the order of 3kB to exist. That means that if you're handling millions of requests, you're going to swamp Prometheus' memory with gigabytes of data from this one metric alone. A more detailed explanation of this issue exists in this other answer: https://stackoverflow.com/a/69167162/511258
With that in mind, this approach would make sense if you know for a fact to expect a small set of IP addresses (maybe on an internal intranet, or an application you distribute to a small number of known clients), but if you are planning to deploy to the web, this would be a very easy way for people to (most likely unknowingly) crash your monitoring systems.
You may want to investigate an alternative -- for example, Grafana is capable of ingesting data from some common log aggregation platforms, so perhaps you can do some structured (e.g. JSON) logging, hold that in e.g. Elasticsearch, and then create a graph from the data held within that.
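If you go the logging route, the application side can be as simple as emitting one JSON object per request; a minimal sketch, with hypothetical field names, leaving the actual shipping (Filebeat, Fluentd, etc.) to your aggregation setup:

    import json
    import logging
    import time

    logger = logging.getLogger("access")
    logger.addHandler(logging.StreamHandler())  # in practice, write to a file your shipper tails
    logger.setLevel(logging.INFO)

    def log_request(ip: str, path: str) -> None:
        # One JSON object per line is easy for log aggregators to parse.
        logger.info(json.dumps({"ts": time.time(), "ip": ip, "path": path}))

A Grafana query against the aggregated logs can then count requests per IP without creating one Prometheus time series per address.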
Let's say I have an HTTP/2 service that keeps a list of users and each user's hair color, both in memory and in a database.
Now I want to scale this up to multiple nodes; however, I do not want the same user to be in two different servers' memory. Each server should handle its own specific users. This means I need to inform the load balancer where each user is being handled. When scaling down, I need to signal that those users are no longer assigned anywhere and can be routed to any server, or by a given rule, e.g. to the server with the least memory in use.
Would anyone know if the ALB load balancer supports that? One path I was considering is query-string-parameter-based routing, where the request itself carries something like destination_node = (int)user_id % 4, in case I had 4 nodes for instance. This worked well in a proof of concept, but it leads to a few issues:
The service itself would need to know how many instances there are to balance.
I could not guarantee even balancing; it is basically luck-based balancing.
What would be the preferred approach for this, or what is a common way of solving this problem? Does AWS ELB support this out of the box? I was trying to avoid writing my own balancer: a middleware that keeps track of which servers handle which users and whose responsibility would be to distribute the requests among those servers.
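For illustration, the modulo scheme from the proof of concept might look roughly like this (a hypothetical sketch; the names and the query parameter are mine, not part of any AWS API):

    NUM_NODES = 4  # issue 1: the caller has to know the current instance count

    def destination_node(user_id: int) -> int:
        # Pins each user to one node; breaks down when NUM_NODES changes,
        # and (issue 2) only balances well if user ids are evenly distributed.
        return user_id % NUM_NODES

    # The result is sent as a query-string parameter for an ALB rule to match:
    #   GET /users/123?node=3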
In the AWS Application Load Balancer (ALB) it is possible to write routing rules on:
Host Header
HTTP Header
HTTP Request Method
Path Pattern
Query String
Source IP
But at the moment there is no way to route under dynamic conditions.
If it is possible to group your data, I would prefer a path pattern like
/users/blond/123
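As a sketch, such a static path-pattern rule could be created with boto3 roughly like this (the ARNs, priority and pattern are placeholders; you would need one rule per target group):

    import boto3

    elbv2 = boto3.client("elbv2")

    # Forward /users/blond/* to the target group responsible for those users.
    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:...",  # placeholder
        Priority=10,
        Conditions=[{
            "Field": "path-pattern",
            "PathPatternConfig": {"Values": ["/users/blond/*"]},
        }],
        Actions=[{
            "Type": "forward",
            "TargetGroupArn": "arn:aws:elasticloadbalancing:...",  # placeholder
        }],
    )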
Since I haven't found an answer to this on the net, I'm trying it here:
I was wondering how SNMP gets its traffic data on a router.
I am currently monitoring a router in 2 different ways:
- with SNMP, which seems to give me the exact number of octets going through the router,
- with a custom data-flow collector (a bit complicated; think of it as NetFlow or sFlow), which gives me data only when a flow closes (I guess that's how it works; if I'm wrong, tell me).
So how does SNMP do that? Does it have a poller on the port, or does it just access something in the hardware?
SNMP is just a protocol, which in particular defines a data model to represent the agent's status and configuration; there is no particular technology behind the curtain. Often routers have an internal infrastructure that collects data and sends it to the manager.
The underlying operating system keeps the counters for incoming octets and so on. The SNMP agent on the device usually reads the counters directly and returns the values to you via standard messages.
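For example, reading the standard IF-MIB octet counter for one interface is a single SNMP GET; a minimal sketch with Python's pysnmp, where the router address and community string are assumptions:

    from pysnmp.hlapi import (
        getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
        ContextData, ObjectType, ObjectIdentity,
    )

    # Read ifInOctets for interface index 1 via SNMP v2c.
    errorIndication, errorStatus, errorIndex, varBinds = next(getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),      # assumed community string
        UdpTransportTarget(("192.0.2.1", 161)),  # assumed router address
        ContextData(),
        ObjectType(ObjectIdentity("IF-MIB", "ifInOctets", 1)),
    ))

    if errorIndication:
        print(errorIndication)
    else:
        for varBind in varBinds:
            print(" = ".join(x.prettyPrint() for x in varBind))

The returned value is a cumulative counter, so a monitoring system polls it periodically and graphs the rate of change, rather than waiting for flows to close.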
However, I'm not familiar with the flow approach, so I cannot answer the other half of your question.
I have a couple of Elasticsearch questions regarding client nodes:
1. Can I say that any node, as long as it has the HTTP port open, can be treated as a "client" node, because we can search/index through it?
2. Actually we treat a node as a client node when master=false and data=false. If I set up 10 client nodes, do I need to do the routing on my side? I mean, if I specify clientOne:9200 in my code as the ES portal, would clientOne forward HTTP requests to the other client nodes? Otherwise clientOne would be under very high pressure. I.e., do the client nodes communicate with each other?
3. When I specify client nodes in the ES cluster, should I close the other nodes' HTTP ports, since we only query the client nodes?
4. Do you think it is necessary to set up both a data node and a client node on the same machine, or should the data node just act as a client node as well, given it is on the same machine anyway?
5. If the ES cluster will be heavily/frequently indexed but searched less, do I even need client nodes, given that client nodes are good for gathering data? Is that right?
6. For general search/index purposes, should I use the HTTP port or the TCP port? What is the difference from a client's perspective?
1. Yes, you can send queries via HTTP to any node that has port 9200 open.
2. With node.data: false and node.master: false, you get a "client node". These are useful for offloading indexing and search traffic from your data nodes. If you have 10 of them, you would want to put a load balancer in front of them.
3. Closing the data nodes' HTTP port (http.enabled: false) would keep them from serving client requests (probably good), though it would also prevent you from curl'ing them directly for stats, etc.
4. Client nodes are useful (see #2), so I wouldn't route traffic directly to your data nodes. Whether you run both a client and a data node on the same piece of hardware depends on the spec of that machine (do you have sufficient RAM, etc.).
5. Client nodes are also useful for indexing, because they know which data node should receive the data for storage. If you sent an indexing request to a random data node instead, the odds are high that it would have to redirect the request to another node, which wastes time and resources if you could have used client nodes.
6. Having your clients join the cluster might give them access to more information about the cluster, but using HTTP gives them a more generic "black box" interface. With HTTP, you also don't have to keep your clients at the same version as your ES nodes.
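For reference, the settings mentioned in #2 and #3 live in elasticsearch.yml; a minimal sketch using the pre-5.x setting names quoted in this answer:

    # elasticsearch.yml on a dedicated client node
    node.master: false
    node.data: false

    # optionally, on the data nodes, to stop them serving HTTP requests (see #3)
    http.enabled: false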
Hope that helps.
I am new to Elasticsearch. I have a cluster with 3 nodes on the same machine. To access each node I have a separate URL, since the port changes (localhost:9200, localhost:9201, localhost:9202).
Now suppose my node 1 (i.e. the master node) dies. Elasticsearch handles the situation very well and makes node 2 the master, but how does my application know that a node died and that it should now hit node 2 on port 9201?
Is there a way to always hit a single URL and have it internally figure out which node to hit?
Thanks,
Pratz
The client searches for nodes using a discovery module. The name of the cluster in your client's configuration is important to get this working.
With a correct configuration (on both the client and the cluster) you can bring a single node down without any negative effect on your client.
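For an HTTP client, a common equivalent is to give the client every node address and let it sniff the cluster state; a minimal sketch with the official Python client, using the pre-8.x elasticsearch-py sniffing options:

    from elasticsearch import Elasticsearch

    # List all nodes; the client fails over and re-discovers live nodes itself.
    es = Elasticsearch(
        ["localhost:9200", "localhost:9201", "localhost:9202"],
        sniff_on_start=True,            # fetch the current node list at startup
        sniff_on_connection_fail=True,  # re-sniff when a node stops responding
        sniffer_timeout=60,             # refresh the node list every minute
    )

    print(es.cluster.health())

With this in place, losing node 1 just means the client retries against 9201/9202, so your application never needs to hard-code a single node's URL.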
See the following links:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html