Can Logstash perform statistical analysis on the data coming from Filebeat?

OK, my problem is whether it is possible to use Logstash to perform statistical analysis on the collected log data. I currently use Filebeat to collect nginx logs into the ES cluster and attach the required labels to these logs. My plan is to read these logs back from the ES cluster and write a program that computes statistics on them, such as the traffic in a certain region over a period of time. Now I want to know whether the logs collected by Filebeat can be passed to Logstash for this kind of statistical analysis.
After a short period of research, I haven't found that Logstash has this capability. I hope you can help me. Thanks.
In short: I want to know whether Logstash can provide the functions I need.

Logstash basically ingests, transforms, and ships your data regardless of format or complexity: it can derive structure from unstructured data with grok, decipher geo coordinates from IP addresses, anonymize or exclude sensitive fields, and ease overall processing. I don't know your specific use case, but for analyzing statistical data you can use Kibana. You don't even need Logstash if your Elasticsearch cluster has a node with the ingest node role.
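For example, a minimal sketch of a Logstash filter block along these lines could parse nginx access logs and add geo information before they reach Elasticsearch. The COMBINEDAPACHELOG pattern matches the default nginx access log format; the field names follow the classic (pre-ECS) pattern layout, so adjust for your Logstash version:
filter {
  # Parse the raw nginx access log line into structured fields
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # Look up geo coordinates for the client IP (field produced by the grok pattern)
  geoip {
    source => "clientip"
  }
}
With a geoip-enriched field like geoip.region_name in place, "traffic in a certain region over a period of time" becomes a simple Kibana aggregation rather than a custom program.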

Related

Running awk in Logstash

I do not have the ability to do much but receive unstructured syslogs from Kafka, which have been produced with Logstash.
When I attach Logstash as a consumer, these syslogs are all over the place and contain half a dozen patterns or more which vary wildly. This seems better suited to being streamed through an awk filter, since the programmatic approach to parsing the incoming messages is actually quite straightforward with such a tool.
Does anyone have any input on how one could attach a consumer to a Kafka topic, process the incoming logs, and ship them in an intelligent way towards an Elasticsearch cluster?
Try using grok expressions in your Logstash config to parse the logs: https://logz.io/blog/logstash-grok/ . This should allow you to filter, transform, or drop data.
Or use something like Cribl between Kafka and Elastic: https://docs.cribl.io/stream/about/
Note on the Cribl page how Kafka is listed among the supported sources and Elastic among the supported destinations. This should allow you to transform your data before ingesting it into Elastic.
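A minimal sketch of such a Logstash pipeline, assuming a topic named syslog and a local broker and cluster (all names are placeholders). Since the messages mix several patterns, grok can try them in order and tag anything that matches none:
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["syslog"]
  }
}
filter {
  # Try each pattern in order; events matching none get a _grokparsefailure tag,
  # which you can use to route or fix the stragglers
  grok {
    match => { "message" => [ "%{SYSLOGLINE}", "%{COMMONAPACHELOG}" ] }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "syslog-%{+YYYY.MM.dd}"
  }
}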

What are the differences between different Elastic data stream types?

The Elastic docs mention that Elastic data streams support the following types: logs, metrics and synthetics. What are the differences between these types?
I tested storing some data as the logs and metrics types separately and I don't see any difference when querying the data. Are both types interchangeable, or are they stored differently?
Those are different types of data sets collected by the new Elastic Agent and Fleet integration:
The logs type is for logs data, i.e. what Filebeat used to send to Elasticsearch.
The metrics type is for metric data, i.e. what Metricbeat used to send to Elasticsearch.
The synthetics type is for uptime and status check data, i.e. what Heartbeat used to send to Elasticsearch.
Now, with Fleet, all the Beats have been refactored into a single agent called Elastic Agent which can do all of that, so instead of having to install all the *Beats, you just need to install that agent and enable/disable/configure whatever type of data you want to gather and index into Elasticsearch. All of that through a nice, powerful and centralized Kibana UI.
Beats are now simply Elastic Agent modules that you can enable or disable, and they all write their data into indices that follow a new taxonomy and naming scheme based on those types, which are nothing more than a generic way of describing the nature of the data they contain, i.e. logs, metrics, synthetics, etc.
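For illustration, the backing data streams follow the naming scheme <type>-<dataset>-<namespace>, so the type is just the first segment of the name (the dataset and namespace values below are made up):
logs-nginx.access-default
metrics-system.cpu-default
synthetics-http-default
This is also why the logs and metrics types look interchangeable at query time: the type mainly drives the naming and which built-in index templates and default mappings apply, not how you query the documents.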

How can I find the most used query from Elasticsearch?

I have an Elasticsearch cluster running on an AWS Elasticsearch instance. It has been up and running for a few months. I'd like to know the most-used query requests over the last few months. Does Elasticsearch save all queries somewhere I can search? Or do I have to programmatically save the requests for analysis?
As far as I'm aware, Elasticsearch doesn't by default save a record or frequency histogram of all queries. However, there's a way you can have it log all queries, and then ship the logs somewhere to be aggregated and searched for the top results (incidentally, this is something you could use Elasticsearch for :D). Sadly, you'll only be able to track queries after you configure this; I doubt you'll find any record of your historical queries from the last few months.
To do this, you'd take advantage of Elasticsearch's slow query log. The default thresholds are designed to only log slow queries, but if you set those thresholds to 0s then Elasticsearch will log every query as a slow query, giving you a record of all of them. See the link above for detailed instructions; you can set this for a whole cluster in your yaml configuration file like
index.search.slowlog.threshold.fetch.debug: 0s
or set it dynamically per-index with
PUT /<my-index-name>/_settings
{
  "index.search.slowlog.threshold.query.debug": "0s"
}
To be clear, the log level you choose doesn't strictly matter, but using debug for this allows you to keep logging actually slow queries at the more serious levels like info and warn, which you might find useful.
I'm not familiar with how to configure an AWS Elasticsearch cluster, but as the above are core Elasticsearch settings in all the versions I'm aware of, there should be a way to do it.
Happy searching!
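Once the slow logs are shipped back into Elasticsearch, finding the most-used queries is a single terms aggregation. A sketch, assuming you've indexed the slow log lines into indices matching slowlogs-* and that the query body landed in a keyword-mapped field called source (both names are assumptions about your ingest setup):
POST /slowlogs-*/_search
{
  "size": 0,
  "aggs": {
    "top_queries": {
      "terms": { "field": "source.keyword", "size": 10 }
    }
  }
}
The top buckets returned are then your most frequent query bodies.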

Can Beats update existing documents in Elasticsearch?

Consider the following use case:
I want the information from one particular log line to be indexed into Elasticsearch, as a document X.
I want the information from some log line further down the log file to be indexed into the same document X (not overriding the original, just adding more data).
The first part I can obviously achieve with Filebeat.
For the second, does anyone have any idea how to approach it? Could I still use Filebeat plus some pipeline on an ingest node, for example?
Clearly, I can use the ES API to update said document, but I was looking for a solution that doesn't require changes to my application; rather, one where everything can be achieved using the log files alone.
Thanks in advance!
No, this is not something that Beats were intended to accomplish. Enrichment like you describe is one of the things that Logstash can help with.
Logstash has an Elasticsearch input that would allow you to retrieve data from ES and use it in the pipeline for enrichment. And the Elasticsearch output supports upsert operations (update if exists, insert new if not). Using both those features you can enrich and update documents as new data comes in.
You might want to consider ingesting the log lines as-is into Elasticsearch. Then, using Logstash, build a separate entity-specific index driven by the data from the logs.
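A minimal sketch of the upsert side of this, assuming each log line carries some entity_id field you can use as the document ID (the field and index names are hypothetical):
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "entities"
    # Route all lines about the same entity to the same document
    document_id => "%{entity_id}"
    # Update the document if it exists, insert it if it doesn't
    action => "update"
    doc_as_upsert => true
  }
}
Later log lines about the same entity then merge their fields into the existing document X instead of creating a new one.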

Indexing logs with es-hadoop

I am new to Elasticsearch and want to index my website logs, which are stored on HDFS, for fast querying.
I have a well-structured pipeline which runs a script every 20 minutes to ingest the data into HDFS.
I want to integrate Elasticsearch with it, so that it also indexes these logs based on particular field(s), thereby giving faster query results via Spark SQL.
So, my question is, can I index my data based on particular field(s) only?
Also, my logs are saved in the Avro file format. Does ES provide a way to directly index Avro-serialized data, or do I need to convert it into some other format?
Thank you in advance.
I would suggest you look at the Elasticsearch, Logstash and Kibana (ELK) stack, which should be good enough to fulfill your requirement. Putting the data on HDFS and then using ES would be additional overhead.
Instead, you can use Logstash to pump data into ES, index on whatever fields you wish to query, and build simple dashboards in less than 10 minutes of exercise. Take a look at this tutorial for a step-by-step guide:
http://hadooptutorials.co.in/tutorials/elasticsearch/log-analytics-using-elasticsearch-logstash-kibana.html
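On the "particular field(s) only" part: one simple approach is to drop everything except the fields you want to query before the events reach ES. A sketch using Logstash's prune filter, where the field names are placeholders for whatever your website logs actually contain:
filter {
  # Keep only the fields we want to query; everything else is dropped
  prune {
    whitelist_names => [ "^timestamp$", "^url$", "^status$", "^client_ip$" ]
  }
}
For the Avro part: ES itself expects JSON, but Logstash has an avro codec (logstash-codec-avro) that can decode Avro-serialized events given a schema, so a separate conversion step may not be needed.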
