I was working on inserting indexes to elasticsearch using logstash. My conf looks like this:
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/sample"
jdbc_user => "root"
jdbc_password => ""
jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mysql-connector-java-5.1.48.jar"
jdbc_driver_class => " com.mysql.jdbc.Driver"
tracking_column => "time"
use_column_value => true
statement => "SELECT time, firstname, lastname, email FROM sample.sample where time > :sql_last_value;"
schedule => " * * * * * *"
}
}
output {
elasticsearch {
document_id=> "%{time}"
document_type => "doc"
"hosts" => ["myhost"]
"index" => "sample"
}
stdout{
codec => rubydebug
}
}
When I execute this file using logstash, I got query:
SELECT time, firstname, lastname, email FROM sample.sample where time > 351
But the time column is currenttimestamp in MySQL database which I'm tracking, and I think sql_last_value contains the value of this field. Am I doing something incorrect related to sql_last_value?
What I was doing wrong:
we need to specify tracking_column_type if the tracking_column is not of type int. In my case, tracking_column is of type timestamp. So, I need to add:
tracking_column_type =>"timestamp"
That solved my problem.
Related
How to insert multiple table values into each table?
Using logstash, I want to put multiple tables as elasticsearch.
I used logstash several times using jdbc
but only one value is saved in one table.
I tried to answer the stackoverflow, but I couldn't solve it.
-> multiple inputs on logstash jdbc
This is my confile code.
This code is the code that I executed by myself.
input {
jdbc {
jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.23.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/db_name?useSSL=false&user=root&password=1234"
jdbc_user => "root"
jdbc_password => "1234"
schedule => "* * * * *"
statement => "select * from table_name1"
tracking_column => "table_name1"
use_column_value => true
clean_run => true
}
jdbc {
jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.23.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/db_name?useSSL=false&user=root&password=1234"
jdbc_user => "root"
jdbc_password => "1234"
schedule => "* * * * *"
statement => "select * from table_name2"
tracking_column => "table_name2"
use_column_value => true
clean_run => true
}
jdbc {
jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.23.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/db_name?useSSL=false&user=root&password=1234"
jdbc_user => "root"
jdbc_password => "1234"
schedule => "* * * * *"
statement => "select * from table_name3"
tracking_column => "table_name3"
use_column_value => true
clean_run => true
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "aws_05181830_2"
document_type => "%{type}"
document_id => "{%[#metadata][document_id]}"
}
stdout {
codec => rubydebug
}
}
problem
1. If you look at the picture, save only one value in one table
2. When a new table comes, the existing table value disappears.
My golas
How to save properly without duplicate data in each table?
You are setting the document_id of the document in elasticsearch using
document_id => "{%[#metadata][document_id]}"
This is not a valid sprintf reference, so it uses the literal value {%[#metadata][document_id]}. As a result, every document you index overwrites the previous document. I suggest you remove this option.
Is there any way I can configure logstash so that it picks up delta records real time automatically. If not then is there any opensource plugin/tool available to achieve this? Thanks for the help.
Try the below configuration for the MSSQL server. You need to schedule it like below by adding the schedule period, a statement which would the query to fetch the data from your database
input {
jdbc {
jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=test"
# The user we wish to execute our statement as
jdbc_user => "sa"
jdbc_password => "sasa"
# The path to our downloaded jdbc driver
jdbc_driver_library => "C:\Users\abhijitb\.m2\repository\com\microsoft\sqlserver\mssql-jdbc\6.2.2.jre8\mssql-jdbc-6.2.2.jre8.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
#clean_run => true
schedule => "* * * * *"
#query
statement => "SELECT * FROM Student where studentid > :sql_last_value"
use_column_value => true
tracking_column => "studentid"
}
}
output {
#stdout { codec => json_lines }
elasticsearch {
"hosts" => "localhost:9200"
"index" => "student"
"document_type" => "data"
"document_id" => "%{studentid}"
}
}
I am trying to index data from mysql db to elasticsearch using logstash. Logstash is running without errors but the problem is, it indexing only one row from my SELECT query.
Below are the versions of softwares I am using:
elastic search : 2.4.1
logstash: 5.1.1
mysql: 5.7.17
jdbc_driver_library: mysql-connector-java-5.1.40-bin.jar
I am not sure if this is because logstash and elasticsearch versions are different.
Below is my pipeline configuration:
input {
jdbc {
jdbc_driver_library => "mysql-connector-java-5.1.40-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
jdbc_user => "user"
jdbc_password => "password"
schedule => "* * * * *"
statement => "SELECT * FROM employee"
use_column_value => true
tracking_column => "id"
}
}
output {
elasticsearch {
index => "logstash"
document_type => "sometype"
document_id => "%{uid}"
hosts => ["localhost:9200"]
}
}
It seems like the tracking_column (id) which you're using in the jdbc plugin and the document_id (uid) in the output is different. What if you have both of them same since it'll be easy to get all the records by id and push them into ES using the same id as well which could look more understandable:
document_id => "%{id}" <-- make sure you've got the exact spellings
And also please try adding this following line to your jdbc input after tracking_column:
tracking_column_type => "numeric"
Additionally to make sure that you don't have the .logstash_jdbc_last_run file existing when you're running the logstash file include the following line as well:
clean_run => true
So this is how your jdbc input should look like:
jdbc {
jdbc_driver_library => "mysql-connector-java-5.1.40-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
jdbc_user => "user"
jdbc_password => "password"
schedule => "* * * * *"
statement => "SELECT * FROM employee"
use_column_value => true
tracking_column => "id"
tracking_column_type => "numeric"
clean_run => true
}
Other than that the conf seems to be fine, unless you're willing to have :sql_last_value where if you only wanted to update the newly added records in your database table. Hope it helps!
I have a log stash running pulling records from postgresql and creating documents in elastic search, but whenever i am trying to update a record in postgres the same is not getting reflected in elastic search, here is my INPUT & OUTPUT configs let me know if i am missing anything here,
input {
jdbc {
# Postgres jdbc connection string to our database, mydb
jdbc_connection_string => "jdbc:postgresql://127.0.0.1:5009/data"
# The user we wish to execute our statement as
jdbc_user => "data"
jdbc_password=>"data"
# The path to our downloaded jdbc driver
jdbc_driver_library => "/postgresql-9.4.1209.jar"
# The name of the driver class for Postgresql
jdbc_driver_class => "org.postgresql.Driver"
#sql_log_level => "debug"
jdbc_paging_enabled => "true"
jdbc_page_size => "5000"
schedule => "* * * * *"
# our query
clean_run => true
last_run_metadata_path => "/logstash/.test_metadata"
#use_column_value => true
#tracking_column => id
statement => "SELECT id,name,update_date from data where update_date > :sql_last_value"
}
}
output {
elasticsearch{
hosts => ["127.0.0.1"]
index => "test_data"
action => "index"
document_type => "data"
document_id => "%{id}"
upsert => ' {
"name" : "%{data.name}",
"update_date" : "%{data.update_date}"
} '
}
}
I think you need to track a date/timestamp column instead of the id column. If you have an UPDATE_DATE column that changes on each update, that would be good.
Your SELECT statement will only grab new records (i.e. id > last_id) and if you update a record, its id won't change, hence that updated record won't be picked up by the jdbc input the next time it runs.
I am using logstash jdbc to keep the things syncd between mysql and elasticsearch. Its working fine for one table. But now I want to do it for multiple tables. Do I need to open multiple in terminal
logstash agent -f /Users/logstash/logstash-jdbc.conf
each with a select query or do we have a better way of doing it so we can have multiple tables being updated.
my config file
input {
jdbc {
jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
jdbc_user => "root"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table1"
}
}
output {
elasticsearch {
index => "testdb"
document_type => "table1"
document_id => "%{table_id}"
hosts => "localhost:9200"
}
}
You can definitely have a single config with multiple jdbc input and then parametrize the index and document_type in your elasticsearch output depending on which table the event is coming from.
input {
jdbc {
jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
jdbc_user => "root"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table1"
type => "table1"
}
jdbc {
jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
jdbc_user => "root"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table2"
type => "table2"
}
# add more jdbc inputs to suit your needs
}
output {
elasticsearch {
index => "testdb"
document_type => "%{type}" # <- use the type from each input
hosts => "localhost:9200"
}
}
This will not create duplicate data. and compatible logstash 6x.
# YOUR_DATABASE_NAME : test
# FIRST_TABLE : place
# SECOND_TABLE : things
# SET_DATA_INDEX : test_index_1, test_index_2
input {
jdbc {
# The path to our downloaded jdbc driver
jdbc_driver_library => "/mysql-connector-java-5.1.44-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# Postgres jdbc connection string to our database, YOUR_DATABASE_NAME
jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
# The user we wish to execute our statement as
jdbc_user => "root"
jdbc_password => ""
schedule => "* * * * *"
statement => "SELECT #slno:=#slno+1 aut_es_1, es_qry_tbl.* FROM (SELECT * FROM `place`) es_qry_tbl, (SELECT #slno:=0) es_tbl"
type => "place"
add_field => { "queryFunctionName" => "getAllDataFromFirstTable" }
use_column_value => true
tracking_column => "aut_es_1"
}
jdbc {
# The path to our downloaded jdbc driver
jdbc_driver_library => "/mysql-connector-java-5.1.44-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# Postgres jdbc connection string to our database, YOUR_DATABASE_NAME
jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
# The user we wish to execute our statement as
jdbc_user => "root"
jdbc_password => ""
schedule => "* * * * *"
statement => "SELECT #slno:=#slno+1 aut_es_2, es_qry_tbl.* FROM (SELECT * FROM `things`) es_qry_tbl, (SELECT #slno:=0) es_tbl"
type => "things"
add_field => { "queryFunctionName" => "getAllDataFromSecondTable" }
use_column_value => true
tracking_column => "aut_es_2"
}
}
# install uuid plugin 'bin/logstash-plugin install logstash-filter-uuid'
# The uuid filter allows you to generate a UUID and add it as a field to each processed event.
filter {
mutate {
add_field => {
"[#metadata][document_id]" => "%{aut_es_1}%{aut_es_2}"
}
}
uuid {
target => "uuid"
overwrite => true
}
}
output {
stdout {codec => rubydebug}
if [type] == "place" {
elasticsearch {
hosts => "localhost:9200"
index => "test_index_1_12"
#document_id => "%{aut_es_1}"
document_id => "%{[#metadata][document_id]}"
}
}
if [type] == "things" {
elasticsearch {
hosts => "localhost:9200"
index => "test_index_2_13"
document_id => "%{[#metadata][document_id]}"
# document_id => "%{aut_es_2}"
# you can set document_id . otherwise ES will genrate unique id.
}
}
}
If you need to run more than one pipeline in the same process, Logstash provides a way to do this through a configuration file called pipelines.yml and using multiple pipelines
multiple pipeline
Using multiple pipelines is especially useful if your current configuration has event flows that don’t share the same inputs/filters and outputs and are being separated from each other using tags and conditionals.
more helpfull resource