JDBC input Logstash plugin fetches data from MySQL multiple times - elasticsearch

The Logstash JDBC input plugin fetches data from MySQL multiple times and keeps creating documents in Elasticsearch.
For 600 rows in MySQL, it creates 8581812 documents in Elasticsearch.
I have created multiple config files, one per MySQL table, and put them in the /etc/logstash/conf.d folder.
I start the Logstash service with sudo systemctl start logstash
and run the following command to execute the files:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/spt_audit_event.conf
Data is fetched successfully.
input {
  jdbc {
    jdbc_driver_library => "/usr/share/jdbc_driver/mysql-connector-java-5.1.47.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://:3306/"
    jdbc_user => ""
    jdbc_password => ""
    statement => "select * from spt_identity"
  }
}
output {
  elasticsearch {
    "hosts" => "localhost:9200"
    "index" => ""
  }
  stdout {}
}
Actual Results
The number of documents in Elasticsearch keeps increasing and has reached 8581812, but there are only 600 rows in the MySQL table.
Is this a bug in the plugin, or am I doing something wrong?

You need to specify a unique ID for Elasticsearch.
To avoid duplication issues in Elasticsearch, you need to set a unique ID for the documents.
Modify logstash.conf by adding "document_id" => "%{studentid}" to the output, like below:
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "localhost:9200"
    "index" => "test-migrate"
    "document_id" => "%{studentid}"
  }
}
In your case it won't be studentid but something else; find the column that uniquely identifies your rows and add it to your configuration.
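For example, applied to the original config above (the primary-key column of spt_identity is assumed here to be called id; use whichever column is unique in your table), the output would look roughly like this:
output {
  elasticsearch {
    hosts => "localhost:9200"
    # index name left blank as in the original question
    index => ""
    # assumed primary-key column of spt_identity; replace with your unique column
    document_id => "%{id}"
  }
  stdout {}
}
With a stable document_id, re-running the pipeline overwrites the same documents instead of creating new ones.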

Related

Logstash Elasticsearch plugin. Compare results from two sources

I have two deployed Elasticsearch clusters. The data should supposedly be the same in both clusters. My main aim is to compare the _source field of each Elasticsearch document between the source and target ES clusters.
I created a Logstash config that defines an Elasticsearch input plugin, which runs over each document in the source cluster; then, using the elasticsearch filter, it should look up the target Elasticsearch cluster, query it for the document by the _id taken from the source cluster, and match the results of the _source field for both documents.
Could you please help me implement such a config?
input {
  elasticsearch {
    hosts => ["source_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    index => "my_index_pattern"
  }
}
filter {
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
  elasticsearch {
    hosts => ["target_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    query => ???????
    match _source field ????
  }
}
output {
  stdout { codec => rubydebug }
}
Maybe print some results of comparison...
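There is no accepted answer recorded here, but a rough sketch of one possible approach follows. It assumes the source document's _id is exposed through the input's docinfo option, and that you compare a handful of named fields (my_field is a placeholder) rather than the whole _source; the elasticsearch filter's query and fields options do the per-document lookup in the target cluster:
input {
  elasticsearch {
    hosts => ["source_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    index => "my_index_pattern"
    # expose _id/_index of each source document under [@metadata][doc]
    docinfo => true
    docinfo_target => "[@metadata][doc]"
  }
}
filter {
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
  elasticsearch {
    hosts => ["target_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    # look up the same document in the target cluster by its _id
    query => "_id:%{[@metadata][doc][_id]}"
    # copy a field of interest from the target hit into the event (placeholder name)
    fields => { "my_field" => "target_my_field" }
  }
  # tag events whose source and target values differ
  if [my_field] != [target_my_field] {
    mutate { add_tag => ["mismatch"] }
  }
}
output {
  stdout { codec => rubydebug }
}
Comparing the entire _source generically would need something like a ruby filter; the sketch above only covers explicitly named fields.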

Unable to sync data from MySQL to Logstash

My logstash.conf file is
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:8889/optjobs"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "root"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/Users/ajoshi31/mysql-connector-java-5.1.17.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # our query
    statement => "SELECT * FROM candidates INNER JOIN candidate_skills ON candidate_skills.candidate_id = candidates.id"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "localhost:9200"
    "index" => "optjobsprd"
    "document_type" => "data"
  }
}
With this config file, on running Logstash with
$ logstash -f logstash.conf
I get the errors below:
Thread.exclusive is deprecated, use Thread::Mutex
Sending Logstash logs to /usr/local/Cellar/logstash/7.5.2/libexec/logs which is now configured via log4j2.properties
[2020-04-09T11:47:14,307][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-04-09T11:47:14,549][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.5.2"}
[2020-04-09T11:47:17,657][INFO ][org.reflections.Reflections] Reflections took 122 ms to scan 1 urls, producing 20 keys and 40 values
[2020-04-09T11:47:18,565][ERROR][logstash.outputs.elasticsearch] Unknown setting '"document_type"' for elasticsearch
[2020-04-09T11:47:18,568][ERROR][logstash.outputs.elasticsearch] Unknown setting '"hosts"' for elasticsearch
[2020-04-09T11:47:18,572][ERROR][logstash.outputs.elasticsearch] Unknown setting '"index"' for elasticsearch
[2020-04-09T11:47:18,590][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:mai
Once I remove hosts, document_type and index, Logstash runs, connects to MySQL and executes the query, but I cannot create the index or update the data from MySQL.
The option names must not be quoted. Remove the quotes, like this:
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "optjobsprd"
    document_type => "data"
  }
}
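As a side note, and assuming the target cluster is also on the 7.x line: mapping types are deprecated in Elasticsearch 7, so document_type can usually be dropped altogether, for example:
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "optjobsprd"
    # document_type omitted; mapping types are deprecated in Elasticsearch 7.x
  }
}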

Delete data or documents from Elasticsearch using Logstash

I am trying to delete Elasticsearch data or documents using a Logstash configuration, but the delete does not seem to work.
I am using Logstash version 5.6.8.
Below is the logstash configuration file:
```
input {
  jdbc {
    # db configuration
    '''
    statement => "select * from table"
  }
}
output {
  elasticsearch {
    action => "delete"
    hosts => "localhost"
    index => "myindex"
    document_type => "doctype"
    document_id => "%{id}"
  }
  stdout { codec => json_lines }
}
```
But the above configuration deletes the documents whose IDs are present in my DB table, and does not delete the IDs that are no longer present.
When I sync from the DB to Elasticsearch using Logstash, I expect rows deleted from the DB to be synced as well, so that the two stay consistent.
I also tried the configuration below, but I get an error:
```
input {
  jdbc {
    # db configuration
    '''
    statement => "select * from table"
  }
}
output {
  elasticsearch {
    action => "delete"
    hosts => "localhost"
    index => "myindex"
    document_type => "doctype"
  }
  stdout { codec => json_lines }
}
```
Error in logstash console:
"current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/interval.rb:89:in sleep'"}]}}
[2019-12-27T16:30:16,087][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>9, "stalling_thread_info"=>{"other"=>[{"thread_id"=>22, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/interval.rb:89:insleep'"}]}}
[2019-12-27T16:30:18,623][ERROR][logstash.outputs.elasticsearch] Encountered a retryable error. Will Retry with exponential backoff {:code=>400, :url=>"http://localhost:9200/_bulk"}
[2019-12-27T16:30:21,086][WARN ][logstash.shutdownwatcher ] {"inflight_count"=>9, "stalling_thread_info"=>{"other"=>[{"thread_id"=>22, "name"=>"[main]>worker0", "current_call"=>"[...]/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/interval.rb:89:in `sleep'"}]}}
Can someone tell me how to delete documents and keep the DB data in sync, i.e. how to handle deleted records in Elasticsearch?
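No answer is recorded for this one, but a common workaround, sketched below under the assumption that the table can carry a soft-delete flag column (here called is_deleted, which is not in the original question), is to keep deleted rows in the query result and route them to a delete action with a conditional output:
input {
  jdbc {
    # db configuration as in the question
    # keep soft-deleted rows in the result set so Logstash still sees them
    statement => "select id, is_deleted, other_columns from table"
  }
}
output {
  # rows flagged as deleted become delete requests
  if [is_deleted] == 1 {
    elasticsearch {
      action => "delete"
      hosts => "localhost"
      index => "myindex"
      document_type => "doctype"
      document_id => "%{id}"
    }
  } else {
    elasticsearch {
      hosts => "localhost"
      index => "myindex"
      document_type => "doctype"
      document_id => "%{id}"
    }
  }
  stdout { codec => json_lines }
}
If the schema cannot carry such a flag, the usual alternative is to rebuild the index periodically rather than delete individual documents.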

SQL Server sync with Elasticsearch through Logstash - retransfer does not happen

Background:
We are doing a POC of syncing SQL Server error-log data to Elasticsearch (ES) to build a dashboard in Kibana.
I used Logstash with the JDBC input plugin to move SQL Server table data to ES, and it succeeded. The log table had around 5000 records, and each one was moved to ES.
Problem Statement:
For testing, I deleted the index from ES that had earlier been synced by Logstash and ran Logstash again with the same input config file, but no records were moved. If I add a new record to the SQL Server table, that is reflected, but the older records (5000) are not re-imported.
Config
Below is my config file used to sync
input {
  jdbc {
    #https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-record_last_run
    jdbc_connection_string => "jdbc:sqlserver://localhost:40020;database=application;user=development;password=XXX$"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => nil
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "C:\Program Files (x86)\sqljdbc6.2\enu\sqljdbc4-3.0.jar"
    # The name of the driver class for SqlServer
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # Query for testing purpose
    schedule => "* * * * *"
    last_run_metadata_path => "C:\Software\ElasticSearch\logstash-6.4.0\.logstash_jdbc_last_run"
    record_last_run => true
    clean_run => true
    statement => "select * from Log"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "application_log"
    # document_id is a unique id; it has to be provided during sync, else we may get duplicate entries in the Elasticsearch index.
    document_id => "%{Id}"
  }
}
Please help me out and explain what went wrong.
Logstash Version: 6.4.0
Elasticsearch Version: 6.3.1
Thanks in advance
I found and resolved this issue.
The problem I found is that field names are case sensitive; they are only accepted in lowercase.
Below is the change I made, and it works fine for me.
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "application_log"
    # document_id is a unique id; it has to be provided during sync, else we may get duplicate entries in the Elasticsearch index.
    document_id => "%{id}"
  }
}
and no change in the input section.
Thanks for your support.
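For context, the JDBC input lowercases column names by default (its lowercase_column_names option defaults to true), which is likely why only lowercase field references such as %{id} resolve. A minimal sketch, assuming you wanted to keep the original column casing instead, would disable that explicitly:
input {
  jdbc {
    # ...connection and driver settings as above...
    # defaults to true, so a SQL Server column named "Id" arrives in the event as "id"
    lowercase_column_names => false
  }
}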

Exception when executing JDBC query in Logstash .conf file

MySQL 5.6.35
RHEL 7
Logstash version 6.2.4
MySQL driver: mysql-connector-java-8.0.11.jar
I am getting the following error when I run my logstash script
[2018-06-18T14:39:49,395][WARN ][logstash.inputs.jdbc ] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::JavaLang::IllegalArgumentException: MONTH>}
[2018-06-18T14:39:49,667][INFO ][logstash.pipeline ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x2937cfc6#/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:247 run>"}
It is always in the same group of records, so I am sure it is a data issue.
The fact that the error mentions IllegalArgumentException: MONTH makes me think it is a date issue, but I can't see any obvious problems in the data, and the error message does not give me enough information to home in any further on the error. All my empty dates are null rather than '0000-00-00'.
Does anyone know of any other logs that would help me identify the error?
This is the script in my conf.d folder:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/hawk_aol"
    #useCursorFetch=true&
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => ""
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/apps/elasticsearch/drivers/mysql-connector-java-8.0.11.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # our query
    statement => "SELECT * FROM weekly_data"
    #jdbc_fetch_size => "5000"
    jdbc_paging_enabled => true
    jdbc_page_size => 100000
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "localhost:9200"
    "index" => "aol"
    "document_type" => "weekly_data"
    "document_id" => "%{id}"
  }
}
I solved my own problem. I thought the issue was related to dates, and I had been searching my database for dates with a value of 0000-00-00 and converting them to null; however, there was one date with a value of 2016-00-00. Once I cleared that to null, everything started running as expected.
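If anyone hits the same symptom, one way to hunt for such rows is a one-off query that looks for zero month or day parts. This is only a sketch: the date column name week_date is an assumption about the weekly_data schema, and only id is selected so the driver never has to convert the bad date value itself.
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/hawk_aol"
    jdbc_user => "root"
    jdbc_password => ""
    jdbc_driver_library => "/apps/elasticsearch/drivers/mysql-connector-java-8.0.11.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # list ids of rows whose date has a zero month or day part (column name is assumed)
    statement => "SELECT id FROM weekly_data WHERE MONTH(week_date) = 0 OR DAY(week_date) = 0"
  }
}
output {
  stdout { codec => json_lines }
}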
