I am working with Filebeat and Logstash to upload logs to Elastic (all are 7.3-oss version).
My log file contains billions of rows, yet Elasticsearch only shows 10K documents.
I added a second output,
stdout { codec => rubydebug }
to print events to the screen; it looks like all the data is arriving from Filebeat, but for some reason Logstash only uploads 10,000 docs.
I also tried removing the JSON filter in Logstash, but the issue still occurs.
Filebeat config
filebeat.inputs:
  - type: log
    paths:
      - \\some-path\my.json
output.logstash:
  hosts: ["localhost:5044"]
Logstash pipeline
input {
  beats {
    port => 5044
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => [ "machine-name:9200" ]
  }
}
logstash.yml is empty, as in the default installation.
I found that it was my search that caused the confusion.
According to
https://www.elastic.co/guide/en/elasticsearch/reference/7.3/search-request-body.html#request-body-search-track-total-hits,
Elasticsearch simply doesn't return the accurate hit count by default (it just states that it is greater than 10,000).
Changing my search query to
GET logstash-*/_search
{
"track_total_hits": true
}
returned the right size.
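For reference, this is roughly what the capped default looks like in a 7.x response (the numbers are illustrative); with "track_total_hits": true the relation becomes "eq" and value holds the exact count:
"hits" : {
  "total" : {
    "value" : 10000,
    "relation" : "gte"
  },
  ...
}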
Related
I have a Spring Boot application that writes logs to a file.
I also have Elasticsearch running (in Docker), plus Kibana and Logstash (not in Docker).
This is my Logstash config:
input {
  file {
    type => "java"
    path => "C:\Users\user\Documents\logs\semblogs.log"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout {
    codec => rubydebug
  }
}
Elasticsearch is up and running. When I check for data in the index that was created, like this:
http://localhost:9200/logstash-2019.11.04-000001/_search
it shows:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
In Kibana I also can't create an index pattern; it says there is no data in Elasticsearch.
I suspect that Logstash is not sending anything to Elasticsearch, but I don't know why. There ARE logs in the log file from the app...
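One thing worth checking, offered only as a sketch: the file input keeps its read position in a sincedb, so start_position => "beginning" is ignored for files it has already seen, and on Windows the path option is generally happier with forward slashes. A minimal debug pipeline along those lines (sincedb_path => "NUL" is a Windows-specific assumption that throws the position tracking away):
input {
  file {
    type => "java"
    # forward slashes are the safer choice for the file input on Windows
    path => "C:/Users/user/Documents/logs/semblogs.log"
    start_position => "beginning"
    # "NUL" is Windows' /dev/null, so the file is re-read from the start on every run
    sincedb_path => "NUL"
  }
}
output {
  # stdout only, to confirm events are actually being read before involving Elasticsearch
  stdout {
    codec => rubydebug
  }
}
If events show up on stdout with this pipeline but still don't reach Elasticsearch, the problem is on the output side rather than in the file input.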
How do I avoid duplicate documents in Elasticsearch?
The Elasticsearch index docs count (20,010,253) doesn't match the log line count (13,411,790).
From the file input plugin documentation:
File rotation is detected and handled by this input, regardless of whether the file is rotated via a rename or a copy operation.
NiFi: a real-time NiFi pipeline copies logs from the NiFi server to the ELK server. NiFi uses rolling log files.
Log line count on the ELK server:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
elasticsearch index docs count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
logstash input conf file:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
  file {
    path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
    type => "test_4"
    sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
  }
}
filter {
  if [type] == "test_4" {
    grok {
      match => {
        "message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
      }
    }
  }
}
output {
  if [type] == "test_4" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test_4"
    }
  }
  else {
    stdout {
      codec => rubydebug
    }
  }
}
You can use the fingerprint filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html
This can e.g. be used to create consistent document ids when inserting
events into Elasticsearch, allowing events in Logstash to cause
existing documents to be updated rather than new documents to be
created.
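A minimal sketch of that approach applied to the test_4 pipeline above: hash the raw message into [@metadata][fingerprint] and use it as the Elasticsearch document id, so a re-read of the same log line overwrites the existing document instead of creating a duplicate (the source field and hashing method here are assumptions; pick whatever uniquely identifies a line):
filter {
  fingerprint {
    # hash the original log line; add more fields if a line alone is not unique
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}
output {
  elasticsearch {
    hosts => "ip:9200"
    index => "test_4"
    # identical events map to the same _id, so duplicates become updates
    document_id => "%{[@metadata][fingerprint]}"
  }
}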
Using ELK stack, is it possible to generate reports on existing dump of logs?
For example:
I have some 2 GB of Apache access logs and I want to have the dashboard reports showing:
All requests with status code 400
All requests matching a pattern like "GET http://example.com/abc/.*"
I'd appreciate any example links.
Yes, it is possible. You should:
Install and set up the ELK stack.
Install Filebeat and configure it to harvest your logs and forward the data to Logstash (a minimal filebeat.yml sketch is shown after the Kibana queries below).
In Logstash, listen for Beats input, use grok to process/break up your data, and forward it to Elasticsearch, something like:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "filebeat-logstash-%{+YYYY.MM.dd}"
  }
}
In Kibana, set up your index patterns and query for the data, e.g.
response: 400
verb: GET AND message: "http://example.com/abc/"
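For the Filebeat step, a minimal filebeat.yml sketch, assuming Apache access logs under /var/log/apache2/ and Logstash listening on the default Beats port (both the path and the host are assumptions; adjust them to your setup):
filebeat.inputs:
  - type: log
    paths:
      # assumed location of the Apache access logs
      - /var/log/apache2/access.log*
output.logstash:
  # the beats input from the Logstash pipeline above
  hosts: ["localhost:5044"]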
I have a log file created in an S3 bucket every minute.
The data is "\x01"-delimited, and one of the columns is a timestamp field.
I want to load this data into Elasticsearch.
I tried using the following Logstash config, but it doesn't seem to work; I don't see any output. I took some reference from http://brewhouse.io/blog/2014/11/04/big-data-with-elk-stack.html
Logstash config file is as follows:
input {
  s3 {
    bucket => "mybucketname"
    credentials => [ "accesskey", "secretkey" ]
  }
}
filter {
  csv {
    columns => [ "col1", "col2", "#timestamp" ]
    separator => "\x01"
  }
}
output {
  stdout { }
}
How do I modify this file to pick up the new file that arrives every minute?
I would then eventually want to connect Kibana to Elasticsearch to visualize the changes.
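For the "new file every minute" part specifically, the s3 input re-checks the bucket on an interval, so a sketch along these lines should pick up each new object as it appears (the region, prefix, and the access_key_id/secret_access_key options are assumptions based on a recent logstash-input-s3 plugin):
input {
  s3 {
    bucket            => "mybucketname"
    access_key_id     => "accesskey"
    secret_access_key => "secretkey"
    # assumed; set to the bucket's actual region
    region            => "us-east-1"
    # assumed key prefix; optional
    prefix            => "logs/"
    # re-list the bucket every 60 seconds for new objects
    interval          => 60
  }
}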
Just use logstash-forwarder to send the files from S3; you will have to generate certificates for authorization.
There is a really nice tutorial: https://www.digitalocean.com/community/tutorials/how-to-use-logstash-and-kibana-to-centralize-logs-on-centos-7
If you are getting I/O errors, you may be able to solve them by setting the cluster name.
Inside logstash.conf:
output {
  elasticsearch {
    host => "127.0.0.1"
    cluster => "CLUSTER_NAME"
  }
}
Inside elasticsearch.yml:
cluster.name: CLUSTER_NAME
If you are having problems generating certificates, you can generate them using this:
https://raw.githubusercontent.com/driskell/log-courier/develop/src/lc-tlscert/lc-tlscert.go
I also found a better init.d script for logstash-forwarder on CentOS:
http://smuth.me/posts/centos-6-logstash-forwarder-init-script.html
I just altered my logstash-elasticsearch setup to include RabbitMQ, since I wasn't able to get messages into Logstash fast enough over a TCP connection. Now it is blazing fast, as Logstash reads from the queue, but I do not see the messages coming through into Kibana. One error shows the timestamp field missing. I used the head plugin to view the data, and it is odd:
_index    _type  _id                      ▼_score  #version  #timestamp
pt-index  logs   Bv4Kp7tbSuy8YyNi7NEEdg   1        1         2014-03-27T12:37:29.641Z
This is what my conf file looks like now, and below is what it looked like before:
input {
  rabbitmq {
    queue => "logstash_queueII"
    host => "xxx.xxx.x.xxx"
    exchange => "logstash.dataII"
    vhost => "/myhost"
  }
}
output {
  elasticsearch {
    host => "xxx.xxx.xx.xxx"
    index => "pt-index"
    codec => "json_lines"
  }
}
This is what it was before RabbitMQ:
input {
  tcp {
    codec => "json_lines"
    port => "1516"
  }
}
output {
  elasticsearch {
    embedded => "true"
  }
}
Now the only change I made was to create a specific index in Elasticsearch and have the data indexed there, but it seems the format of the message has changed. They are still JSON messages with 2-3 fields, but I'm not sure what Logstash is reading or changing from RabbitMQ. I can see data flowing into the histogram, but the fields are gone.
"2014-03-18T14:32:02" "2014-03-18T14:36:24" "166" "google"
These are the fields I would expect. Like I said, all this worked before I made the change.
I have seen examples of similar configurations, but they do not use the "json_lines" output codec going into Elasticsearch. The output codec adjusts the formatting of the data as it leaves Logstash, which I do not believe is necessary here. Try deleting the codec and see what Logstash is outputting by adding a file output; be sure to capture only a short sample...
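A sketch of that debugging step, assuming a path you can write to: drop the codec from the elasticsearch output and add a file output alongside it, so you can inspect exactly what Logstash is emitting.
output {
  elasticsearch {
    # no codec here; let the elasticsearch output use its default
    host  => "xxx.xxx.xx.xxx"
    index => "pt-index"
  }
  file {
    # writes a copy of each event to disk for inspection (assumed path)
    path => "/tmp/logstash-debug.log"
  }
}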