I have a ton of services: Node(s), MySQL(s), Redis(s), Elastic(s)...
I want to monitor how they connect to each other: connection rate, number of live connections, and so on (e.g. Node1 creates 30 connections per second to Node2/MySQL/Redis), similar to the HAProxy stats page.
Currently I have two options:
HAProxy (proxy): I want to use a single HAProxy service to achieve this, but it seems very hard to use ACLs to detect which connections should be forwarded to which service.
ELK (log center): I would need to create log files on each service (Node, MySQL, Redis...) and then show them in the log center. That looks like a ton of work without a built-in feature like the HAProxy stats page.
How should I do this? Is a log center a good fit for this case?
The problem
I think your problem is not collecting and pipelining the statistics into Elasticsearch, but rather the amount of work needed to extract metrics from your services, since most of them do not expose metric files/logs.
You'd need to export the metrics with some custom script, write them to a log, capture that log with Filebeat, stream it to Logstash for text processing and metric extraction so it is indexed in a way you can run analytics on, and then send it on to Elasticsearch.
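To make that pipeline concrete, here is a minimal sketch of the kind of custom export script this path implies, assuming a local Redis reachable via the redis-py package; the field names and the log path are invented for illustration:

```python
import json
import time

import redis  # pip install redis

# Poll Redis's native statistics API (INFO) and append one JSON line per
# sample, so Filebeat can pick up the file and ship it onward.
r = redis.Redis(host="localhost", port=6379)

while True:
    info = r.info()  # the INFO command, parsed into a dict by redis-py
    sample = {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "service": "redis",
        "connected_clients": info["connected_clients"],
        "total_connections_received": info["total_connections_received"],
    }
    # /var/log/redis_metrics.log is an arbitrary path chosen for this sketch.
    with open("/var/log/redis_metrics.log", "a") as f:
        f.write(json.dumps(sample) + "\n")
    time.sleep(10)
```

You would need to write and maintain something like this for every service you run, which is exactly the work the exporters below save you.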
My take on the answer
At least for the 3 services you've referenced, there are Prometheus exporters readily available, and you can find them here. The exporters are simple processes that query your services' native statistics APIs and expose a Prometheus metrics endpoint for Prometheus to scrape (poll).
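For intuition, here is a hedged sketch of what an exporter boils down to, using the official prometheus_client package and redis-py; the metric name is invented, and in practice the ready-made redis_exporter does this for you:

```python
import time

import redis  # pip install redis
from prometheus_client import Gauge, start_http_server  # pip install prometheus-client

# Illustrative metric name; the real redis_exporter defines its own.
CONNECTED_CLIENTS = Gauge(
    "demo_redis_connected_clients",
    "Number of client connections currently open on Redis",
)

if __name__ == "__main__":
    start_http_server(9121)  # expose http://host:9121/metrics for Prometheus to scrape
    r = redis.Redis(host="localhost", port=6379)
    while True:
        # Query the service's native statistics API and update the gauge.
        CONNECTED_CLIENTS.set(r.info()["connected_clients"])
        time.sleep(15)
```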
After you have Prometheus scraping the metrics, you can display them in dashboards via Grafana (the de facto visualization layer for Prometheus) or bulk export your metrics wherever you want (Elasticsearch, etc.) for visualization and exploration.
Conclusion
The benefits of this approach:
- Prometheus can auto-discover new nodes you add to your network
- Readily available exporters for HAProxy, Redis, and MySQL for Prometheus
- No code needed: each exporter requires only minimal configuration specific to the monitored technology, it can easily be containerized and deployed if your environment is container oriented, and otherwise you just run each exporter on the correct machine
- Prometheus is very, very easy to deploy
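Once Prometheus is scraping, pulling a metric back out programmatically (for bulk export or an ad-hoc check) is a single call to its HTTP query API. A hedged sketch, assuming Prometheus on localhost:9090 and a Redis exporter publishing a connected-clients metric under the name used below:

```python
import requests  # pip install requests

# /api/v1/query is Prometheus's instant-query endpoint; adjust the metric
# name to whatever your exporter actually exposes.
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "redis_connected_clients"},
    timeout=5,
)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    instance = result["metric"].get("instance", "unknown")
    _timestamp, value = result["value"]
    print(f"{instance}: {value} connected clients")
```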
Use the ELK stack (Elasticsearch, Logstash, and Kibana) together with Filebeat:
Filebeat: ships the log file content to Logstash.
Logstash: scans, filters, and forwards the needed content to Elasticsearch.
Elasticsearch: works as the database, storing the content from Logstash as JSON documents.
Kibana: lets you search for the information you need; you can also plot graphs and other visuals from the relevant data.
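If you ever need the same lookup outside Kibana, it is just a query against Elasticsearch's _search API. A minimal sketch with requests, assuming Elasticsearch on localhost:9200 and an index pattern named filebeat-* (adjust both to your setup):

```python
import requests  # pip install requests

# Kibana's search bar ultimately issues queries like this one;
# here we look for "error" lines from the last 15 minutes.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "error"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-15m"}}}],
        }
    },
    "size": 10,
}

resp = requests.post("http://localhost:9200/filebeat-*/_search", json=query, timeout=10)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("message", ""))
```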
Related
As the title says, I need to get data from the nginx access log, process it, and store it in a DB. Does anyone have any ideas about this? Thank you for reading this post.
You should not be storing nginx logs in the DB and trying to read them through Laravel; it will very quickly cause performance and storage issues, especially in production. Another issue: if you have several servers, how would you aggregate all the logs?
Common practice is to use a NoSQL store for such tasks. You can set up another dedicated server to which you export all your logs and analyze them there. You install a shipper on every one of your servers, point it at your log file, and it exports the logs to the central log server. You can set this up yourself with something like the ELK stack, using Filebeat and Logstash for the shipping and parsing.
Better still would be to use one of the managed services out there such as GCP Logging, Splunk, etc. You have to pay for them, but they offer a lot of benefits. Splunk provides you with a forwarder; with GCP you could use fluentd. If you are using containers, you can also set up a fluentd container and shared volumes to export the logs.
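To illustrate the parsing step that Logstash (via a grok filter) or fluentd performs on nginx access logs, here is a rough sketch that turns one line of nginx's default "combined" log format into a structured record; the regex is a simplification, not a complete grammar:

```python
import re

# Rough pattern for nginx's default "combined" access-log format; a Logstash
# grok filter (%{COMBINEDAPACHELOG}) does essentially the same job.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

def parse_access_line(line):
    """Turn one access-log line into a dict ready to ship or index."""
    m = LINE_RE.match(line)
    if not m:
        return None
    record = m.groupdict()
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    return record

sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"')
print(parse_access_line(sample))
```

From there you would bulk-index the resulting documents into the central log store rather than stuffing raw lines into your application database.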
When it comes to centralized logging tools, I see a lot of comparisons of ELK vs EFK vs Loki vs others.
But I have a hard time finding information about "ELG": ELK (or EFK) but with Grafana instead of Kibana.
I know Grafana can use Elasticsearch as a datasource, so it should technically work. But how well does it work? Are there any drawbacks compared to using Kibana? Maybe there are more existing dashboards for Kibana than for Grafana when it comes to logs?
I am asking because I would like a single UI for both my metrics dashboards and my logs dashboards.
Kibana is part of the stack, so it is deeply integrated with Elasticsearch, and you get a lot of pre-built dashboards and apps inside Kibana, like SIEM and Observability. If you use Filebeat, Metricbeat, or any other Beat to collect data, those ship dashboards for a lot of systems, services, and devices, so it is pretty easy to visualize your data without much work; basically you just need to follow the documentation.
But if you have data that doesn't fit one of the pre-built dashboards, or you want more flexibility and to create your own dashboards, Kibana needs more work than Grafana. Kibana also only works with Elasticsearch, so if you have other datasources you would need to get that data into Elasticsearch first. On the other hand, if you want map visualizations, Kibana's Maps app is pretty good.
The Grafana plugin for Elasticsearch has some small bugs, but overall it works fine, and it will probably improve since Elastic and Grafana have partnered to improve the plugin.
So: if all your data is in Elasticsearch, use Kibana; if you have different datasources, use Grafana.
I am trying out ELK to visualise my log files. I have tried different setups:
1) Logstash file input plugin: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
2) Logstash Beats input plugin (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html) with the Filebeat Logstash output (https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html)
3) Filebeat Elasticsearch output: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
Can someone list their differences and when to use which setup? If this is not the right place to ask, please point me to the right one, like Super User, DevOps, or Server Fault.
1) To use the Logstash file input you need a Logstash instance running on the machine from which you want to collect the logs. If the logs are on the same machine where you are already running Logstash this is not a problem, but if the logs are on remote machines, a Logstash instance is not always recommended because it needs more resources than Filebeat.
2 and 3) For collecting logs on remote machines Filebeat is recommended, since it needs fewer resources than a Logstash instance. Use the Logstash output if you want to parse your logs, add or remove fields, or do some enrichment on your data; if you don't need any of that, you can use the Elasticsearch output and send the data directly to Elasticsearch.
That is the main difference: if your logs are on the same machine where you are running Logstash, you can use the file input; if you need to collect logs from remote machines, use Filebeat and send to Logstash when you want to transform your data, or directly to Elasticsearch when you don't.
Another advantage of using Filebeat, even on the Logstash machine, is that if your Logstash instance is down, you won't lose any logs: Filebeat keeps track of its position in each file and will resend the events, whereas with the file input you can lose events in some cases.
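A minimal sketch of the idea behind that guarantee, assuming a single log file and a small "registry" file holding the byte offset; this is only the concept (remember how far you shipped, advance only after a successful send), not Filebeat's actual implementation:

```python
import time

LOG_PATH = "/var/log/myapp.log"          # hypothetical log file
REGISTRY_PATH = "/var/tmp/myapp.offset"  # hypothetical registry file

def read_offset():
    try:
        with open(REGISTRY_PATH) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0

def save_offset(offset):
    with open(REGISTRY_PATH, "w") as f:
        f.write(str(offset))

def ship(line):
    # Placeholder for "send to Logstash/Elasticsearch"; raises on failure.
    print("shipping:", line.rstrip())

while True:
    with open(LOG_PATH) as f:
        f.seek(read_offset())
        while True:
            line = f.readline()
            if not line:
                break
            ship(line)
            # Advance the recorded offset only after a successful send, so a
            # crash or a downed Logstash means the line is re-sent, not lost.
            save_offset(f.tell())
    time.sleep(5)
```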
An additional point for large-scale applications: if you have many Beats (Filebeat, Heartbeat, Metricbeat...) instances, you don't want them all opening connections and sending data directly to the Elasticsearch instance at the same time.
Having too many concurrent indexing connections may result in a high bulk queue, poor responsiveness, and timeouts. For that reason, in most cases the common setup is to place Logstash between the Beats instances and Elasticsearch to control the indexing.
For even larger systems, the common setup adds a buffering message queue (Apache Kafka, RabbitMQ, or Redis) between Beats and Logstash for resiliency, to avoid congestion on Logstash during event spikes.
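To make the buffering layer concrete, a small sketch with Redis as the queue (the same idea applies to Kafka or RabbitMQ); the list name and payload are invented, and in a real deployment Filebeat's Redis output and Logstash's Redis input play these two roles:

```python
import json

import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)
QUEUE = "log-buffer"  # arbitrary list name for this sketch

def publish(event):
    """Producer side: what a Beat shipping into the buffer effectively does."""
    r.rpush(QUEUE, json.dumps(event))

def consume_one(timeout=5):
    """Consumer side: what Logstash draining the buffer effectively does."""
    item = r.blpop(QUEUE, timeout=timeout)  # blocks until an event or the timeout
    if item is None:
        return None
    _key, payload = item
    return json.loads(payload)

publish({"service": "node1", "message": "connection opened"})
print(consume_one())
```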
(Logz.io, from which the original figures were taken, also has a good article on this topic.)
I'm not really familiar with (2).
But Logstash (1) is usually a good choice when you want to take some content, play around with it using input, filter, and output plugins, match it to your analyzers, and then send it to Elasticsearch.
Ex. You point Logstash at your MySQL database; Logstash takes a row, modifies the data (maybe does some math on it, concatenates some fields, and cuts out some words), and then sends it to Elasticsearch as processed data.
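A toy Python sketch of that kind of row reshaping (the field names are invented; in Logstash itself this would live in a filter block such as mutate or ruby):

```python
# Hypothetical row as it might come out of MySQL.
row = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "price_cents": 1999,
    "notes": "ship today TODO remove",
}

def transform(row):
    """The kind of reshaping a Logstash filter does before indexing."""
    return {
        "full_name": f"{row['first_name']} {row['last_name']}",    # concat some fields
        "price_eur": row["price_cents"] / 100,                     # do some math on it
        "notes": row["notes"].replace("TODO remove", "").strip(),  # cut out some words
    }

doc = transform(row)  # `doc` is what would be sent on to Elasticsearch
print(doc)
```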
As for Filebeat (2), it's a perfect choice for picking up already-processed data and passing it on to Elasticsearch.
Logstash (as the name suggests) is mostly good for log files and the like; usually you only make small changes to those.
Ex. I have some log files on my servers (including error logs, syslogs, process logs...).
Logstash watches those files, automatically picks up new lines added to them, and sends those to Elasticsearch.
Then you can filter in Elasticsearch and find what's important to you.
P.S.: Logstash also has a really good way of load balancing when sending lots of data to ES.
You can now use Filebeat to send logs either directly to Elasticsearch or to Logstash (there is no per-host Logstash agent, but you still need a Logstash server in the latter case, of course).
The main advantage is that Logstash lets you custom-parse each line of the logs, whereas Filebeat alone simply ships the raw line and there is not much separation into fields.
Elasticsearch will still index and store the data.
Besides the standard ELK goal of gathering application log data, I want to leverage this stack for additional data collection such as JVM metrics (via JMX) and the host's CPU/RAM/disk/network utilization.
The most suitable option, I thought, is Metricbeat, but I doubt whether Metricbeat alone is enough for the purposes described above.
Since I am aiming for a minimal stack of things to configure, will Metricbeat + Elasticsearch + Kibana be enough for collecting app logs, app JVM metrics, and the host's hardware utilization, or are there more suitable alternatives?
UPDATE
Oh, I see now that I also need Filebeat besides Metricbeat for gathering app logs.
Is there any out-of-the-box single solution that combines the Filebeat and Metricbeat agents?
Currently Filebeat and Metricbeat are separate binaries and you need to run both:
Filebeat to collect your logs (and potentially parse them with an Elasticsearch ingest node).
Metricbeat with the system module for CPU/RAM/disk/network; we also have a JMX/Jolokia module for that functionality.
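For a sense of what the system module reports, here is a hedged sketch that gathers the same kind of host metrics with psutil; Metricbeat collects all of this natively, so you would not normally write it yourself:

```python
import json
import time

import psutil  # pip install psutil

def host_sample():
    """Roughly the CPU/RAM/disk/network numbers Metricbeat's system module ships."""
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage("/")
    net = psutil.net_io_counters()
    return {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_used_percent": mem.percent,
        "disk_used_percent": disk.percent,
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }

print(json.dumps(host_sample(), indent=2))
```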
I'm planning on using Elasticsearch to log all my application activities (like an audit log).
Considering that I have direct control over the application, should I push the data directly into Elasticsearch using its REST API, or should I use Logstash to feed the data into Elasticsearch?
Is there any reason I should use Logstash when I can directly push data into Elasticsearch? It's an additional layer to manage.
If you need to parse different log formats (event log, syslog, and so on), support different transports (UDP, TCP, and so on), or multiple outputs, use Logstash. If HTTP is good for you and you collect logs from only one application, use ES directly. Logstash is an additional tool. Details are here.
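For reference, pushing an audit event straight from the application is just an HTTP call to Elasticsearch's document API. A minimal sketch with requests, assuming Elasticsearch on localhost:9200 and an index named audit-log; for real volume you would batch with the _bulk API or use an official client:

```python
import requests  # pip install requests

event = {
    "@timestamp": "2023-10-10T13:55:36Z",
    "user": "alice",
    "action": "login",
    "outcome": "success",
}

# POST to /<index>/_doc lets Elasticsearch assign the document ID.
resp = requests.post("http://localhost:9200/audit-log/_doc", json=event, timeout=5)
resp.raise_for_status()
print(resp.json()["_id"])
```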