Filebeat index is getting created but with 0 documents - elasticsearch

I am trying to index my custom log file using Filebeat. I am successfully running Filebeat with pre-built modules like mysql, nginx etc., but when I try to use it with my application-specific log file, the index is created with 0 documents.
I could not find anywhere in the Filebeat documentation whether any specific steps need to be taken to ensure indexing takes place for custom log files.
I did not get any error when I set up Filebeat or when running Filebeat after setup.
Below is the filebeat.yml:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /Applications/MAMP/htdocs/247around-adminp-aws/application/logs/log-2020-12-21.log
  include_lines: ['^INFO', '^ERROR']
  fields:
    app_id: crm

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml

setup.template.settings:
  index.number_of_shards: 1

setup.kibana:

output.elasticsearch:
  hosts: ["localhost:9200"]

processors:
As can be seen, it is mostly the default .yml file with very minor changes.
My custom log file log-2020-12-21.php is:
INFO - 2020-12-21 15:10:26 --> index Logging details have been captured for employee. Details are : Array
INFO - 2020-12-21 15:10:36 --> editpartner partner_id:1
INFO - 2020-12-21 15:10:36 --> SELECT DISTINCT service_id, brand, active
ERROR - 2020-12-21 15:10:36 --> Query error: Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'boloaaka.collateral.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
INFO - 2020-12-21 15:10:36 --> Database Error: A Database Error Occurred<br/>Array
ERROR - 2020-12-21 15:10:54 --> Query error: Expression #5 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'boloaaka.service_centres.district' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
INFO - 2020-12-21 15:10:54 --> Database Error: A Database Error Occurred<br/>Array
INFO - 2020-12-21 23:53:21 --> Loginindex
INFO - 2020-12-21 23:54:50 --> Loginindex
INFO - 2020-12-21 23:55:42 --> Loginindex
INFO - 2020-12-21 23:56:24 --> Loginindex
The index is getting created, but with 0 documents.
Log file showing the logs for filebeat setup and the subsequent Filebeat run:
https://pastebin.com/TK6uYXuq
Please help:
1. Why are there no error messages if something is wrong that prevents documents from being indexed? I would expect some error if things are not right.
2. How should I index my log file?
3. Where should I add a pattern for my log file (like key-value pairs) that would help me search the documents for relevant values later on?
Thanks for your help.

In your Filebeat configuration, are you sure you are referring to the exact file where your logs are stored? The paths setting in your filebeat.yml refers to a .log file, while the custom log file you've pasted is log-2020-12-21.php. Try changing paths to match this .php extension instead.
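For example, the input from your question with only the extension changed (a minimal sketch; everything else stays as you had it):

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /Applications/MAMP/htdocs/247around-adminp-aws/application/logs/log-2020-12-21.php
  include_lines: ['^INFO', '^ERROR']
  fields:
    app_id: crm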
If Filebeat correctly picks this file up, you should see something like the line below in your Filebeat logs:
INFO log/harvester.go:287 Harvester started for file: /Applications/MAMP/htdocs/247around-adminp-aws/application/logs/log-2020-12-21.php

Related

Duplicate and missing log entries with FluentBit and ES

We're using FluentBit to ship microservice logs into ES and recently found an issue on one of the environments: some log entries are duplicated (up to several hundred times) while other entries are missing in ES/Kibana but can be found in the microservice's container (kubectl logs my-pod -c my-service).
Each duplicate log entry has a unique _id and _fluentBitTimestamp so it really looks like the problem is on FluentBit's side.
FluentBit version is 1.5.6, the configuration is:
[SERVICE]
    Flush 1
    Daemon Off
    Log_Level info
    Log_File /fluent-bit/log/fluent-bit.log
    Parsers_File /fluent-bit/etc/parsers.conf
    Parsers_File /fluent-bit/etc/parsers_java.conf

[INPUT]
    Name tail
    Path /home/xng/log/*.log
    Exclude_Path /home/xng/log/*.zip
    Parser json
    Buffer_Max_Size 128k

[FILTER]
    Name record_modifier
    Match *
    Record hostname ${HOSTNAME}

[OUTPUT]
    Name es
    Match *
    Host es-logging-service
    Port 9210
    Type flink-logs
    Logstash_Format On
    Logstash_Prefix test-env-logstash
    Time_Key _fluentBitTimestamp
Any help would be much appreciated.
We had the same problem.
Can you try the following in your configuration:
Write_operation upsert
So if a log entry has a duplicate _id, it will be updated instead of created.
Please note that Id_Key or Generate_ID is required in the update and upsert scenarios (a sketch of the adjusted [OUTPUT] section follows the docs link below).
https://docs.fluentbit.io/manual/pipeline/outputs/elasticsearch#write_operation
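A minimal sketch of the adjusted [OUTPUT] section (values taken from the question; Generate_ID is used here as one of the two required id options, Id_Key would work as well if your records already carry an id field):

[OUTPUT]
    Name es
    Match *
    Host es-logging-service
    Port 9210
    Type flink-logs
    Logstash_Format On
    Logstash_Prefix test-env-logstash
    Time_Key _fluentBitTimestamp
    Write_Operation upsert
    Generate_ID On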

Spring boot logging file name

🐞 Bug report
logging:
  level:
    com.zaxxer.hikari: DEBUG
    org.springframework: INFO
    org.kafka.test: TRACE
  file: "logs/%d{yyyy-MM-dd HH_mm_ss} pid-${PID}.log"
  pattern.console: "%d{HH:mm:ss} - %msg%n"
Hello, please help with the file name. The time format does not work well.
I expected to see one file named "2020-02-07 10_38_40 pid-17996.log", but I got two files and the file names are bad.
Please do not advise using logback-spring.xml; I configure logging through the .yml file.

Logstash not running config

I'm using Filebeat on the client side > Logstash on the server side > Elasticsearch on the server side.
Filebeat on the client side works properly and sends the file, but the configuration I've made for Logstash returns a failure:
[WARN ] 2019-12-18 14:53:30.987 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[FATAL] 2019-12-18 14:53:31.341 [LogStash::Runner] runner - Logstash could not be started because there is already another instance using the configured data directory. If you wish to run multiple instances, you must change the "path.data" setting.
[ERROR] 2019-12-18 14:53:31.364 [LogStash::Runner] Logstash - java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
Here is my config file:
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}] %{WORD:test}\[%{NUMBER:nom}]\[%{DATA:tes}\] %{DATA:module_name}\: %{WORD:method}%{GREEDYDATA:log_message}" }
  }
}

output {
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "test_log_pbx"
  }
}
Command to run my Logstash config:
/usr/share/logstash/bin/logstash -f logstash.conf
When I run a config test, it returns:
Thread.exclusive is deprecated, use Thread::Mutex
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2019-12-18 14:59:53.300 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2019-12-18 14:59:56.566 [LogStash::Runner] Reflections - Reflections took 139 ms to scan 1 urls, producing 20 keys and 40 values
Configuration OK
[INFO ] 2019-12-18 14:59:57.923 [LogStash::Runner] runner - Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash
Please help me, I don't know what's wrong.
A Logstash instance is already running, so you cannot run another instance. If you run Logstash as a service, you should stop the service first. If you want to run multiple instances, you should modify pipelines.yml (or give each instance its own path.data, as the FATAL message suggests).
If you want to learn more about pipelines.yml, I put the link below.
https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
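For reference, a minimal pipelines.yml sketch with two pipelines (the ids and config paths here are made up for illustration):

- pipeline.id: beats-pipeline
  path.config: "/etc/logstash/conf.d/logstash.conf"
- pipeline.id: second-pipeline
  path.config: "/etc/logstash/conf.d/other.conf"

Alternatively, a second instance can be started with its own data directory, e.g. by appending --path.data /some/other/dir to the command line.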

Packetbeat throws Bulk item insert failed error

Packetbeat throws the following error:
Bulk item insert failed
When the following processor is added to packetbeat.yml:
processors.include_fields.fields: ["http.request.body"]
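For reference, the same processor written out in the usual list form would look like this (a sketch):

processors:
  - include_fields:
      fields: ["http.request.body"]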
Error log
2018-06-04T00:37:40.893+0530 ERROR pipeline/output.go:92 Failed to publish events: temporary bulk send failure
2018-06-04T00:37:40.893+0530 DEBUG [elasticsearch] elasticsearch/client.go:666 ES Ping(url=http://localhost:9200)
2018-06-04T00:37:40.894+0530 DEBUG [elasticsearch] elasticsearch/client.go:689 Ping status code: 200
2018-06-04T00:37:40.894+0530 INFO elasticsearch/client.go:690 Connected to Elasticsearch version 6.2.2
2018-06-04T00:37:40.894+0530 DEBUG [elasticsearch] elasticsearch/client.go:708 HEAD http://localhost:9200/_template/packetbeat-6.2.4 <nil>
2018-06-04T00:37:40.895+0530 INFO template/load.go:73 Template already exists and will not be overwritten.
2018-06-04T00:37:40.896+0530 DEBUG [elasticsearch] elasticsearch/client.go:303 PublishEvents: 1 events have been published to elasticsearch in 1.245631ms.
2018-06-04T00:37:40.896+0530 DEBUG [elasticsearch] elasticsearch/client.go:507 Bulk item insert failed (i=0, status=500): {"type":"string_index_out_of_bounds_exception","reason":"String index out of range: 0"}
Environment: elasticsearch version - 6.2.4
packetbeat version - 6.2.4
I managed to find the root cause of this error. It occurred after adding the following to
packetbeat.yml:
index: "packetbeat-%{[beat.version]}-%{+yyyy.MM.dd.HH}"
When I removed it, the problem disappeared. It seems to be a bug with custom index naming.
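If a custom index name is needed, the Beats documentation also expects a matching template name and pattern to be set alongside it; a minimal sketch (hosts value taken from the log above, template values are examples):

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "packetbeat-%{[beat.version]}-%{+yyyy.MM.dd.HH}"

setup.template.name: "packetbeat"
setup.template.pattern: "packetbeat-*"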

Hive issue using yarn

I am running Hive SQL on YARN.
It throws an error with a join condition. I am able to create external as well as internal tables, but it fails to create a table when I use the command
create table as AS SELECT name from student.
When running the same query through the Hive CLI it works fine, but with the Spring job it throws this error:
2016-03-28 04:26:50,692 [Thread-17] WARN
org.apache.hadoop.hive.shims.HadoopShimsSecure - Can't fetch tasklog:
TaskLogServlet is not supported in MR2 mode.
Task with the most failures(4):
-----
Task ID:
task_1458863269455_90083_m_000638
-----
Diagnostic Messages for this Task:
AttemptID:attempt_1458863269455_90083_m_000638_3 Timed out after 1 secs
2016-03-28 04:26:50,842 [main] INFO
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application
application_1458863269455_90083
2016-03-28 04:26:50,849 [main] ERROR com.mapr.fs.MapRFileSystem - Failed to
delete path maprfs:/home/pro/amit/warehouse/scratdir/hive_2016-03-28_04-
24-32_038_8553676376881087939-1/_task_tmp.-mr-10003, error: No such file or
directory (2)
2016-03-28 04:26:50,852 [main] ERROR org.apache.hadoop.hive.ql.Driver -
FAILED: Execution Error, return code 2 from
As per my findings, I think there is some issue with the scratch dir.
Kindly suggest if anyone has faced the same issue.
This issue occurs if the recursive directory does not exist; Hive does not automatically create directories recursively.
Please check the existence of the directories down to the child/table level from the root, as in the sketch below.
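For example, assuming the scratch dir path from the error log in the question, the missing path could be created recursively like this (a sketch; adjust the path and permissions to your setup):

hadoop fs -mkdir -p /home/pro/amit/warehouse/scratdir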
I faced a similar issue while running the below Hive query
select * from <db_name>.<internal_tbl_name> where <field_name_of_double_type> in (<list_of_double_values>) order by <list_of_order_fields> limit 10;
I performed an explain on the above statement and below was the result.
fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2d00001590]: it still exists.
2017-05-08 04:26:37,969 WARN [41289638-cd53-4d4b-88c9-3359e9ec99e2 main] fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2700001591]: it still exists.
Time taken: 0.886 seconds, Fetched: 24 row(s)
And I checked the logs through
yarn logs -applicationId application_1458863269455_90083
The error happened after a MapR upgrade done by the admin team. It is probably due to some upgrade or installation issue together with the Tez configuration (as suggested by line 873 in the log below). Or perhaps the Hive query does not syntactically support the Tez optimization; I say so because another Hive query on an external table runs fine in my case. I have to check a bit deeper though.
Though not sure, the error line in the logs that looks most relevant is as follows:
2017-05-08 00:01:47,873 [ERROR] [main] |web.WebUIService|: Tez UI History URL is not set
Solution:
It is probably happening due to some open files or applications that are using some resources. Please check https://unix.stackexchange.com/questions/11238/how-to-get-over-device-or-resource-busy
1. You can run explain <your_Hive_statement>.
2. In the resulting execution plan, you can come across the filenames/dirs that the Hive execution engine fails to delete, e.g.
2017-05-08 04:26:37,969 WARN [41289638-cd53-4d4b-88c9-3359e9ec99e2 main] fs.FileUtil: Failed to delete file or dir [/hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/.nfs0000000057b93e2d00001590]: it still exists.
3. Go to the path given in step 2, e.g. /hdfs/Hadoop_Misc_Logs/Edge01/local_scratch/<hive_username>/41289638-cd53-4d4b-88c9-3359e9ec99e2/hive_2017-05-08_04-26-36_658_6626096693992380903-1/
4. In the path from step 3, running ls -a or lsof +D /path will show the open process IDs blocking the files from being deleted.
5. If you run ps -ef | grep <pid>, you get
hive_username <pid> 19463 1 05:19 pts/8 00:00:35 /opt/mapr/tools/jdk1.7.0_51/jre/bin/java -Xmx256m -Dhiveserver2.auth=PAM -Dhiveserver2.authentication.pam.services=login -Dmapr_sec_enabled=true -Dhadoop.login=maprsasl -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.id.str=hive_username -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dlog4j.configurationFile=hive-log4j2.properties -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/opt/mapr/hive/hive-2.1/bin/../conf/parquet-logging.properties -Dhadoop.security.logger=INFO,NullAppender -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.saslprovider=com.mapr.security.maprsasl.MaprSaslProvider -Djavax.net.ssl.trustStore=/opt/mapr/conf/ssl_truststore org.apache.hadoop.util.RunJar /opt/mapr/hive/hive-2.1//lib/hive-cli-2.1.1-mapr-1703.jar org.apache.hadoop.hive.cli.CliDriver
CONCLUSION:
The Hive CliDriver in the process listing clearly shows that running "Hive on Spark" (or managed) tables through the Hive CLI is no longer supported from Hive 2.0 onwards and is going to be deprecated going forward. You have to use HiveContext in Spark for running Hive queries (a sketch is below). But you can still run queries on Hive external tables through the Hive CLI.
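A minimal sketch of running the same kind of query through Spark's HiveContext (PySpark, Spark 1.x-style API as referenced above; in Spark 2+ a SparkSession with enableHiveSupport() replaces it; the app name and target table name here are made up):

from pyspark import SparkContext
from pyspark.sql import HiveContext

# Set up a Spark context and a Hive-aware SQL context
sc = SparkContext(appName="hive-ctas-example")  # hypothetical app name
hive_ctx = HiveContext(sc)

# Run the CTAS statement through Spark instead of the Hive CLI
hive_ctx.sql("CREATE TABLE student_names AS SELECT name FROM student")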
