Elasticsearch - missing data

I have been planning to use ELK for our production environment and seem to be running into a weird problem.
While loading a sample of the production log file, I realized that there is a huge mismatch between the number of events published by Filebeat and what we see in Kibana. My first suspicion was Filebeat, but I could verify that all the events were successfully received by Logstash.
I also checked Logstash (by enabling debug mode) and could see that all the events were received and processed successfully (I am using the date and json filters).
But when I search in Kibana, I see only a fraction of the logs actually published (e.g. only 16,000 out of 350K). There is no exception or error in either the Logstash or the Elasticsearch logs.
So far I have tried wiping all the data by doing the following:
Stopped all processes for ES, Logstash and Kibana.
Deleted all the index files, cleared the cache and deleted the mappings.
Stopped Filebeat and deleted its registry files (since it is running on Windows).
Restarted Elasticsearch, Logstash and Filebeat (in that order).
But the results are the same: I get only 2 out of 8 records with the shortened file, and an even smaller fraction with the full file.
I tried increasing the time window in Kibana to 10 years (:)) to see whether the events were being assigned to the wrong year, but got nothing.
I have read almost all the threads related to missing data, but nothing seems to work.
Any pointers would help!
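One way to narrow down where the events go missing is to compare the number of lines in the source file with the document count that Elasticsearch itself reports through its _count API. The sketch below is only an illustration of that check, not a diagnosis. It assumes that Elasticsearch listens on http://localhost:9200, that the events land in indices matching logstash-* and that every non-empty line in the file is one event; all three are assumptions to adjust to your setup.

```python
import json
import urllib.request


def count_events(lines):
    """Count non-empty lines, assuming one event per line."""
    return sum(1 for line in lines if line.strip())


def parse_count_response(body):
    """Extract the document count from a _count API response body."""
    return json.loads(body)["count"]


def fetch_indexed_count(base_url="http://localhost:9200", index="logstash-*"):
    """Ask Elasticsearch how many documents the index pattern holds.

    base_url and index are assumptions; change them to match your cluster.
    """
    with urllib.request.urlopen(f"{base_url}/{index}/_count") as resp:
        return parse_count_response(resp.read().decode("utf-8"))


def missing_events(published, indexed):
    """Return how many events were published but are not in the index."""
    return max(published - indexed, 0)
```

With the numbers from the question, missing_events(350_000, 16_000) returns 334000. If the count from Elasticsearch matches the file but Kibana still shows fewer events, the search side (time range, index pattern) is the more likely place to look; if the count itself is low, the events are probably being dropped or overwritten before or during indexing.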

Related

Difference between using Filebeat and Logstash to push log file to Elasticsearch

I am trying out the ELK stack to visualise my log file. I have tried different setups:
Logstash file input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html
Logstash Beats input plugin https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html with Filebeat Logstash output https://www.elastic.co/guide/en/beats/filebeat/current/logstash-output.html
Filebeat Elasticsearch output https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html
Can someone list their differences and when to use which setup? If this is not the right place to ask, please point me to the right one, such as Super User, DevOps or Server Fault.
1) To use the Logstash file input, you need a Logstash instance running on the machine from which you want to collect the logs. If the logs are on the same machine where you already run Logstash, this is not a problem. But if the logs are on remote machines, running a Logstash instance there is not always recommended, because it needs more resources than Filebeat.
2 and 3) For collecting logs on remote machines, Filebeat is recommended since it needs fewer resources than a Logstash instance. Use the Logstash output if you want to parse your logs, add or remove fields, or enrich your data. If you don't need anything like that, you can use the Elasticsearch output and send the data directly to Elasticsearch.
This is the main difference. If your logs are on the same machine where you run Logstash, you can use the file input. If you need to collect logs from remote machines, you can use Filebeat and send the data to Logstash if you want to transform it, or directly to Elasticsearch if you don't.
Another advantage of using Filebeat, even on the Logstash machine, is that you won't lose any logs if your Logstash instance is down, because Filebeat will resend the events. With the file input you can lose events in some cases.
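The rules of thumb in this answer can be condensed into a small decision helper. This is only a sketch of the reasoning above; the function name, the parameter names and the returned labels are made up for illustration.

```python
def choose_setup(logs_on_logstash_host, needs_transformation):
    """Pick a shipping setup following the rules of thumb in the answer.

    Both parameters are booleans; all names and labels are hypothetical.
    """
    if logs_on_logstash_host:
        # Logstash can read the files itself when they are local.
        return "logstash file input"
    if needs_transformation:
        # Remote logs that need parsing or enrichment go through Logstash.
        return "filebeat -> logstash -> elasticsearch"
    # Remote logs that need no processing can be indexed directly.
    return "filebeat -> elasticsearch"
```

The helper deliberately ignores the last point of the answer: even when the logs are local to Logstash, Filebeat may still be preferable if losing events during a Logstash outage is a concern.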
An additional point for large-scale applications: if you have a lot of Beat instances (Filebeat, Heartbeat, Metricbeat...), you would not want all of them to open connections and send data directly to the Elasticsearch instance at the same time.
Too many concurrent indexing connections may result in a long bulk queue, poor responsiveness and timeouts. For that reason, the common setup in most cases is to place Logstash between the Beat instances and Elasticsearch to control the indexing.
For larger-scale systems, the common setup is to put a buffering message queue (Apache Kafka, RabbitMQ or Redis) between Beats and Logstash for resiliency, to avoid congestion on Logstash during event spikes.
Figures are captured from Logz.io. They also have a good article on this topic.
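The idea of a buffer between the shippers and the indexer can be shown with a toy example. In the sketch below an in-process queue.Queue stands in for Kafka, RabbitMQ or Redis, and drain_in_batches is a hypothetical stand-in for the consumer that issues bulk requests; neither reflects how those systems or Logstash are actually implemented.

```python
import queue


def drain_in_batches(buffer, batch_size):
    """Pull everything currently in the buffer, grouped into bulk-sized batches."""
    batches = []
    batch = []
    while True:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        # Keep the last, partially filled batch.
        batches.append(batch)
    return batches


# A burst of events from many shippers lands in the buffer first...
buffer = queue.Queue()
for event_id in range(5):
    buffer.put({"id": event_id})

# ...and the indexer consumes it at its own pace, two events per bulk request.
batches = drain_in_batches(buffer, batch_size=2)
```

The producers only ever talk to the buffer, so a spike fills the queue instead of opening more concurrent indexing connections.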
I am not really familiar with (2).
But Logstash (1) is usually a good choice when you want to take some content, play around with it using input/output filters, match it to your analyzers, and then send it to Elasticsearch.
Example: you point Logstash at your MySQL database; it takes a row and modifies the data (maybe does some math on it, concatenates some fields and cuts out some words), then sends it to Elasticsearch as processed data.
As for Filebeat (2), it is a good choice for picking up already processed data and passing it to Elasticsearch.
Logstash (as the name clearly states) is mostly good for log files and the like. Usually you can make small changes to those.
Example: I have some log files on my servers (including errors, syslogs, process logs...).
Logstash listens to those files, automatically picks up new lines added to them and sends those to Elasticsearch.
Then you can filter things in Elasticsearch and find what is important to you.
P.S. Logstash has a really good way of load balancing large volumes of data to ES.
You can now use Filebeat to send logs directly to Elasticsearch or to Logstash (without a Logstash agent on the source machine, but you still need a Logstash server, of course).
The main advantage is that Logstash allows you to custom-parse each line of the logs, whereas Filebeat alone simply sends the log line without much separation into fields.
Elasticsearch will still index and store the data.

Send multiple logs from filebeat to logstash in a timely manner

I have a server where all logs are present in a directory.
These files are separated by date. How can I set up Filebeat so that all log files from this directory are sent to another server (and shown in Kibana), arriving there in chronological order in a single file?
For example, on server A I have 40 log files for the last 40 days.
I want these 40 logs in chronological order, from oldest to newest, in a single file on the other server.
In addition, the file with today's date keeps being updated with new logs.
I have configured Filebeat and Logstash so that the sync is maintained, but the logs do not arrive in chronological order, which causes problems for some of my processing logic.
glob pattern
/directory to logs/*.log
If you are asking how to remotely sync a set of log files to a single file in time-sorted order using Filebeat and Logstash, then...
If you set harvester_limit to 1, so that only one file is processed at a time, then I think you can use scan.order and scan.sort to get Filebeat to send the data in the right order. Logstash is more of a problem. In the current version you can disable the Java execution engine ('pipeline.java_execution: false' in logstash.yml) and set '--pipeline.workers 1', in which case Logstash will preserve order.
In future releases I do not foresee Elastic maintaining two execution engines, so once the Ruby execution engine is retired it will not be possible to prevent events from being re-ordered in the pipeline (the Java engine routinely re-orders events in the pipeline in reproducible but unpredictable ways).
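If the pipeline cannot guarantee ordering, the ordering the question asks for can also be produced by a script wherever the dated files are available. The sketch below does not configure Filebeat or Logstash; it is a plain Python alternative that sorts files by the date in their name and appends them to one file. The naming scheme (a YYYY-MM-DD date somewhere in the file name, e.g. app-2024-01-31.log) is an assumption, since the question does not show the actual names.

```python
import os
import re
from datetime import date

# Hypothetical naming scheme: a YYYY-MM-DD date somewhere in the file name.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def file_date(path):
    """Return the date embedded in the file name, or None if there is none."""
    match = DATE_IN_NAME.search(os.path.basename(path))
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def oldest_first(paths):
    """Sort dated log files chronologically, skipping names without a date."""
    dated = [(file_date(p), p) for p in paths]
    return [p for _, p in sorted(pair for pair in dated if pair[0] is not None)]


def merge_in_order(paths, out_path):
    """Append each file to out_path, oldest file first."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in oldest_first(paths):
            with open(path, encoding="utf-8") as src:
                for line in src:
                    # Make sure a file without a trailing newline does not
                    # glue its last line to the next file's first line.
                    out.write(line if line.endswith("\n") else line + "\n")
```

This only orders whole files; it relies on the lines inside each file already being in time order, and it does not handle the file for today that is still being written.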

ELK Stack: Data not appearing in Kibana

I'm new to the ELK stack so I'm not sure what the problem is. I have a configuration file (see screenshot, it's based on the elasticsearch tutorial):
Configuration File
Logstash is able to read the logs (it says Pipeline main started), but when the configuration file is run, Elasticsearch doesn't react. I can search through the files.
However, when I open Kibana, it says no results found. I checked and made sure that my time range covers the full day.
Any help would be appreciated!
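One thing worth ruling out in a situation like this is that the events carry timestamps outside the range selected in Kibana. The sketch below builds a search body that asks Elasticsearch for the oldest and newest timestamp in an index, using min and max aggregations. It assumes the events use the default @timestamp date field; the index name and how the request is sent are left to you.

```python
import json


def timestamp_range_query(field="@timestamp"):
    """Build a search body that returns the oldest and newest value of a date field."""
    return {
        "size": 0,  # no hits needed, only the aggregations
        "aggs": {
            "oldest": {"min": {"field": field}},
            "newest": {"max": {"field": field}},
        },
    }


def parse_range(response_body):
    """Pull the two timestamps (as strings) out of the search response.

    Returns None for a bound when the index holds no documents.
    """
    aggs = json.loads(response_body)["aggregations"]
    return (
        aggs["oldest"].get("value_as_string"),
        aggs["newest"].get("value_as_string"),
    )
```

The body returned by timestamp_range_query() can be posted to the _search endpoint of the index in question; if the reported range lies outside the day selected in Kibana, widening the time picker should make the documents appear.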

How to watch the logstash log?

For distributed, structured logging in my enterprise application, I use Logstash for log aggregation and Elasticsearch as log storage. I have clear control over pushing logs from my application to Logstash. From Logstash to Elasticsearch, on the other hand, I have very little control.
Suppose my Elasticsearch goes down for some stupid reason: the Logstash log (/var/log/logstash/logstash.log) records the reason clearly, like the following.
Attempted to send a bulk request to Elasticsearch configured at '["http://localhost:9200/"]', but Elasticsearch appears to be unreachable or down! {:client_config=>{:hosts=>["http://localhost:9200/"], :ssl=>nil, :transport_options=>{:socket_timeout=>0, :request_timeout=>0, :proxy=>nil, :ssl=>{}}, :transport_class=>Elasticsearch::Transport::Transport::HTTP::Manticore, :logger=>nil, :tracer=>nil, :reload_connections=>false, :retry_on_failure=>false, :reload_on_failure=>false, :randomize_hosts=>false}, :error_message=>"Connection refused", :class=>"Manticore::SocketException", :level=>:error}
How can I get notified of the error-level logs from Logstash?
This should be doable with the following 3 steps:
1) It depends on how you want to get notified. If an email is sufficient, you could use the Logstash email output plugin.
But there are many more output plugins available.
2) To restrict the output to certain events, you can use a conditional in your Logstash config (the example is taken from the Elastic support site):
output {
  # Conditionals go inside the output section, not around it.
  if [level] == "ERROR" {
    ...
  }
}
The if clause is not limited to the level field of your JSON; you can apply it to any of your JSON fields, which makes it more powerful.
3) To make this work (and not run into a logging cycle), you need to do one of the following:
Start a second Logstash instance on your system (just observing the Logstash ERROR log), which should be okay from what is written here.
Or build a more complicated configuration using just one Logstash instance. This configuration has to forward log statements from YOUR application to Elasticsearch, while log statements from the Logstash ERROR log are forwarded to, e.g., the Logstash email output plugin.
Side note: you may want to have a look at Filebeat, which works very well with Logstash (it is from Elastic as well) and is even more lightweight than Logstash. It allows things like include_lines: ["^ERR", "^WARN"] in your configuration.
To receive input from Filebeat, you will have to adapt its config to send data to Logstash, and on the Logstash side you will have to activate and use the Beats input plugin described here.
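To make the filtering idea concrete, here is a small Python sketch of what such a line filter does. It mimics the effect of the include_lines patterns above with plain regular expressions and adds a check for the ":level=>:error" marker visible in the Logstash log line from the question. It is an illustration only, not Filebeat's or Logstash's implementation, and the function names are made up.

```python
import re

# Same patterns as in the include_lines example above.
INCLUDE_LINES = [re.compile(r"^ERR"), re.compile(r"^WARN")]


def keep_line(line, patterns=INCLUDE_LINES):
    """Return True if the line matches at least one include pattern."""
    return any(pattern.search(line) for pattern in patterns)


def is_logstash_error(line):
    """Detect error-level entries in the Logstash log format shown in the question."""
    return ":level=>:error" in line
```

A notifier would then act only on lines for which one of these checks is true, for example by handing them to whatever sends the email.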

How to Analyze logs from multiple sources in ELK

I have started working on ELK recently and have a question regarding the handling of multiple types of logs.
I have two sets of logs on my server that I want to analyse, one from my Android application and the other from my website. I have successfully transferred logs from this server via Filebeat to the ELK server.
I have created one filter for each type of log and have successfully imported these logs into Logstash and then Kibana.
This link helped do the above stuff.
https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-centos-7
The above link directs you to use the logs in the filebeat index in Kibana and start analysing (I did this successfully for one type of log). The problem I am facing is that, since these two kinds of logs are very different, they need to be analysed differently. How do I do this in Kibana? Should I create multiple filebeat indexes there and import them, should it be just one single index, or is there some other way? I am not very clear on this (I could not find much documentation), so I would appreciate your help and guidance.
Elasticsearch organizes data by index and type. Elastic used to compare these to SQL concepts, but now offers a new explanation.
Since you say that the logs are very different, Elastic's guidance is that you should use different indexes.
In Kibana, a visualization is tied to an index. If you have one panel from each index, you can show them both on the same dashboard.
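As a small illustration of "different indexes", the sketch below derives a separate daily index name for each log source. The source labels (android, website) and the name-plus-date scheme are assumptions made for this example; the actual routing would be done in the Logstash output configuration.

```python
from datetime import date


def index_name(source, day):
    """Build a per-source daily index name such as 'android-2024.01.31'.

    The source labels and the naming scheme are assumptions for illustration.
    """
    return f"{source}-{day:%Y.%m.%d}"
```

With names like these, one Kibana index pattern per source (for example android-* and website-*) keeps the two kinds of logs apart, while panels built on each can still sit on the same dashboard.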
