Why is logstash merging tables? - elasticsearch

I've got two configuration files, each pointing at a different table of the database, but in Elasticsearch the two tables end up merged: rows from both tablea and tableb are copied as typea, and the same rows from tablea and tableb are copied again as typeb.
But I configured it so that tablea is indexed as typea and tableb as typeb.
Here's my first configuration file (filea):
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://db:3306/nombase"
    jdbc_user => "root"
    jdbc_password => "root"
    schedule => "* * * * *"
    statement => "SELECT * FROM `tablecandelete`"
  }
}
## Add your filters / logstash plugins configuration here
output {
  stdout { codec => json_lines }
  if [deleted] == 1 {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "nombase"
      document_type => "tablecandelete"
      ## TO PREVENT DUPLICATE ITEMS
      document_id => "tablecandelete-%{id}"
      action => "delete"
    }
  } else {
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "nombase"
      document_type => "tablecandelete"
      ## TO PREVENT DUPLICATE ITEMS
      document_id => "tablecandelete-%{id}"
    }
  }
}
Here's my second configuration file (fileb):
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://db:3306/nombase"
    jdbc_user => "root"
    jdbc_password => "root"
    schedule => "* * * * *"
    statement => "SELECT * FROM `nomtable`"
  }
}
## Add your filters / logstash plugins configuration here
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "nombase"
    document_type => "nomtable"
    ## TO PREVENT DUPLICATE ITEMS
    document_id => "nomtable-%{id}"
  }
}
Does anybody have an idea? Thanks for any response/help

While you do have two different configuration files, logstash doesn't treat them as independent -- you could have all of your inputs in one file, all of your outputs in another file.
To separate things out, you have to add type => something_unique on each of your inputs and then surround the remaining code in the config file with
if [type] == "something_unique" {
  # filters / outputs for that input go here
}
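Applied to the two files above, that could look roughly like this (a sketch built from the original configs; the type values are just arbitrary unique labels). Note that type also ends up as a regular field on the indexed documents; it can be removed in a mutate filter if that is unwanted.
# filea.conf
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://db:3306/nombase"
    jdbc_user => "root"
    jdbc_password => "root"
    schedule => "* * * * *"
    statement => "SELECT * FROM `tablecandelete`"
    type => "tablecandelete"   # unique label for this input
  }
}
output {
  if [type] == "tablecandelete" {
    stdout { codec => json_lines }
    if [deleted] == 1 {
      elasticsearch {
        hosts => "elasticsearch:9200"
        index => "nombase"
        document_type => "tablecandelete"
        document_id => "tablecandelete-%{id}"
        action => "delete"
      }
    } else {
      elasticsearch {
        hosts => "elasticsearch:9200"
        index => "nombase"
        document_type => "tablecandelete"
        document_id => "tablecandelete-%{id}"
      }
    }
  }
}

# fileb.conf
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-5.1.42-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://db:3306/nombase"
    jdbc_user => "root"
    jdbc_password => "root"
    schedule => "* * * * *"
    statement => "SELECT * FROM `nomtable`"
    type => "nomtable"   # unique label for this input
  }
}
output {
  if [type] == "nomtable" {
    stdout { codec => json_lines }
    elasticsearch {
      hosts => "elasticsearch:9200"
      index => "nombase"
      document_type => "nomtable"
      document_id => "nomtable-%{id}"
    }
  }
}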

Related

How to insert multiple table values into each table?
Using Logstash, I want to index multiple tables into Elasticsearch.
I have run Logstash with the jdbc input several times, but only one value ends up saved per table.
I tried to follow this Stack Overflow answer, but I couldn't solve it:
-> multiple inputs on logstash jdbc
This is my config file; it is the code that I executed myself.
input {
  jdbc {
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.23.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/db_name?useSSL=false&user=root&password=1234"
    jdbc_user => "root"
    jdbc_password => "1234"
    schedule => "* * * * *"
    statement => "select * from table_name1"
    tracking_column => "table_name1"
    use_column_value => true
    clean_run => true
  }
  jdbc {
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.23.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/db_name?useSSL=false&user=root&password=1234"
    jdbc_user => "root"
    jdbc_password => "1234"
    schedule => "* * * * *"
    statement => "select * from table_name2"
    tracking_column => "table_name2"
    use_column_value => true
    clean_run => true
  }
  jdbc {
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-8.0.23.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/db_name?useSSL=false&user=root&password=1234"
    jdbc_user => "root"
    jdbc_password => "1234"
    schedule => "* * * * *"
    statement => "select * from table_name3"
    tracking_column => "table_name3"
    use_column_value => true
    clean_run => true
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "aws_05181830_2"
    document_type => "%{type}"
    document_id => "{%[@metadata][document_id]}"
  }
  stdout {
    codec => rubydebug
  }
}
Problem
1. Only one value ends up saved in each table.
2. When a new table comes in, the existing table's values disappear.
My goals
How do I save the data properly, without duplicates, for each table?
You are setting the document_id of the document in elasticsearch using
document_id => "{%[@metadata][document_id]}"
This is not a valid sprintf reference, so it uses the literal value {%[@metadata][document_id]}. As a result, every document you index overwrites the previous document. I suggest you remove this option.
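For illustration, a corrected output could look like the sketch below. The first form simply drops the option as suggested; the second assumes you add a distinct type to each jdbc input (the inputs above do not set one) and that every table has an id column, so those names are placeholders:
# Option 1: let Elasticsearch generate the _id
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "aws_05181830_2"
  }
  stdout { codec => rubydebug }
}

# Option 2: build a collision-free id and reference it with valid %{...} syntax
filter {
  mutate {
    # assumes each jdbc input sets type => "table_name1" etc. and each row has an id column
    add_field => { "[@metadata][document_id]" => "%{type}-%{id}" }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "aws_05181830_2"
    document_id => "%{[@metadata][document_id]}"
  }
  stdout { codec => rubydebug }
}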

Logstash Input -> JDBC in some properties or parameterizable file?

I am using Logstash to ingest data into Elasticsearch. I am using the jdbc input, and I really need to parameterize the jdbc input settings, such as the connection string, password, etc., since I have 10 .conf files and each one has 30 jdbc inputs and 30 outputs inside.
Since each file has the same settings, I would like to know whether it is possible to do something generic, or to reference that information from somewhere.
I have this 30 times:...
input {
  # Number 1
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/ifxjdbc-4.50.3.jar"
    jdbc_driver_class => "com.informix.jdbc.IfxDriver"
    jdbc_connection_string => "jdbc:informix-sqli://xxxxxxx/schema:informixserver=server"
    jdbc_user => "xxx"
    jdbc_password => "xxx"
    schedule => "*/1 * * * *"
    statement => "SELECT * FROM public.test ORDER BY id ASC"
    tags => "001"
  }
  # Number 2
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/ifxjdbc-4.50.3.jar"
    jdbc_driver_class => "com.informix.jdbc.IfxDriver"
    jdbc_connection_string => "jdbc:informix-sqli://xxxxxxx/schema:informixserver=server"
    jdbc_user => "xxx"
    jdbc_password => "xxx"
    schedule => "*/1 * * * *"
    statement => "SELECT * FROM public.test2 ORDER BY id ASC"
    tags => "002"
  }
  [.........]
  # Number X
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/ifxjdbc-4.50.3.jar"
    jdbc_driver_class => "com.informix.jdbc.IfxDriver"
    jdbc_connection_string => "jdbc:informix-sqli://xxxxxxx/schema:informixserver=server"
    jdbc_user => "xxx"
    jdbc_password => "xxx"
    schedule => "*/1 * * * *"
    statement => "SELECT * FROM public.testx ORDER BY id ASC"
    tags => "00x"
  }
}
filter {
  mutate {
    add_field => { "[@metadata][mitags]" => "%{tags}" }
  }
  # Number 1
  if "001" in [@metadata][mitags] {
    mutate {
      rename => [ "codigo", "[properties][codigo]" ]
    }
  }
  # Number 2
  if "002" in [@metadata][mitags] {
    mutate {
      rename => [ "codigo", "[properties][codigo]" ]
    }
  }
  [......]
  # Number x
  if "002" in [@metadata][mitags] {
    mutate {
      rename => [ "codigo", "[properties][codigo]" ]
    }
  }
  mutate {
    remove_field => [ "@version","@timestamp","tags" ]
  }
}
output {
  # Number 1
  if "001" in [@metadata][mitags] {
    # For ELK
    elasticsearch {
      hosts => "localhost:9200"
      index => "001"
      document_type => "001"
      document_id => "%{id}"
      manage_template => true
      template => "/home/user/logstash/templates/001.json"
      template_name => "001"
      template_overwrite => true
    }
  }
  # Number 2
  if "002" in [@metadata][mitags] {
    # For ELK
    elasticsearch {
      hosts => "localhost:9200"
      index => "002"
      document_type => "002"
      document_id => "%{id}"
      manage_template => true
      template => "/home/user/logstash/templates/002.json"
      template_name => "002"
      template_overwrite => true
    }
  }
  [....]
  # Number x
  if "00x" in [@metadata][mitags] {
    # For ELK
    elasticsearch {
      hosts => "localhost:9200"
      index => "002"
      document_type => "00x"
      document_id => "%{id}"
      manage_template => true
      template => "/home/user/logstash/templates/00x.json"
      template_name => "00x"
      template_overwrite => true
    }
  }
}
You will still need one jdbc input for each query you need to do, but you can improve your filter and output blocks.
In your filter block you are using the field [@metadata][mitags] to filter your inputs, but you are applying the same mutate filter to every one of them. If that is the case you don't need the conditionals; the same mutate filter can be applied to all your inputs without any filtering.
Your filter block could be reduced to something like this:
filter {
  mutate {
    add_field => { "[@metadata][mitags]" => "%{tags}" }
  }
  mutate {
    rename => [ "codigo", "[properties][codigo]" ]
  }
  mutate {
    remove_field => [ "@version","@timestamp","tags" ]
  }
}
In your output block you only use the tag to change the index, document_type and template. You don't need conditionals for that; you can use the value of the field as a parameter.
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "%{[@metadata][mitags]}"
    document_type => "%{[@metadata][mitags]}"
    document_id => "%{id}"
    manage_template => true
    template => "/home/user/logstash/templates/%{[@metadata][mitags]}.json"
    template_name => "iol-fue"
    template_overwrite => true
  }
}
But this only works if you have a single value in the field [@metadata][mitags], which seems to be the case.
EDIT:
Edited just for history reasons: as noted in the comments, the template option does not allow dynamic parameters, because the template is only loaded when Logstash is starting. The other options work fine.
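For the parameterization part of the question, one option worth noting (a sketch, assuming Logstash 5.x or later and that the variables are exported in the environment Logstash runs under; the variable names here are made up) is that Logstash config files can reference environment variables, so the shared connection details can live in one place instead of being repeated in every jdbc block:
# e.g. export INFORMIX_CONN, INFORMIX_USER and INFORMIX_PASS before starting Logstash
jdbc {
  jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/ifxjdbc-4.50.3.jar"
  jdbc_driver_class => "com.informix.jdbc.IfxDriver"
  jdbc_connection_string => "${INFORMIX_CONN}"
  jdbc_user => "${INFORMIX_USER}"
  jdbc_password => "${INFORMIX_PASS}"
  schedule => "*/1 * * * *"
  statement => "SELECT * FROM public.test ORDER BY id ASC"
  tags => "001"
}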

Logstash error: Rejecting mapping update to [db] as the final mapping would have more than 1 type: [meeting_invities, meetingroom]

Rejecting mapping update to [db] as the final mapping would have more than 1 type: [meeting_invities, meetingroom]
Below is my logstash-mysql.conf. I have to use multiple tables in the jdbc input. Please advise.
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/db"
    jdbc_user => "root"
    jdbc_password => "pwd"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * FROM meeting"
    tags => "dat_meeting"
  }
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/db"
    jdbc_user => "root"
    jdbc_password => "pwd"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT * FROM meeting_invities;"
    tags => "dat_meeting_invities"
  }
}
output {
  stdout { codec => json_lines }
  if "dat_meeting" in [tags] {
    elasticsearch {
      hosts => "localhost:9200"
      index => "meetingroomdb"
      document_type => "meeting"
    }
  }
  if "dat_meeting_invities" in [tags] {
    elasticsearch {
      hosts => "localhost:9200"
      index => "meetingroomdb"
      document_type => "meeting_invities"
    }
  }
}
Your elasticsearch outputs use the document_type option with two different values. This option sets the _type of the indexed documents, and since version 6.x you can only have one type per index.
The option is deprecated in version 7.x and will be removed in future versions, since Elasticsearch is going typeless.
Since Elasticsearch won't allow you to index more than one type, you need to check the type of the first document you indexed; that is the type Elasticsearch will use for every future document. Use that value in both document_type options.
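As an illustration (a sketch of the output block above, assuming the type already present in meetingroomdb is meeting; check your index mapping to confirm which one it actually is):
output {
  stdout { codec => json_lines }
  if "dat_meeting" in [tags] {
    elasticsearch {
      hosts => "localhost:9200"
      index => "meetingroomdb"
      document_type => "meeting"   # one single type for every document in this index
    }
  }
  if "dat_meeting_invities" in [tags] {
    elasticsearch {
      hosts => "localhost:9200"
      index => "meetingroomdb"
      document_type => "meeting"   # same value, otherwise the mapping update is rejected
    }
  }
}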

Logstash reading SQL Server data real time

Is there any way I can configure Logstash so that it picks up delta records in real time automatically? If not, is there any open-source plugin/tool available to achieve this? Thanks for the help.
Try the configuration below for MSSQL Server. You need to schedule it by adding a schedule period and a statement, which is the query that fetches the data from your database.
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=test"
    # The user we wish to execute our statement as
    jdbc_user => "sa"
    jdbc_password => "sasa"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "C:\Users\abhijitb\.m2\repository\com\microsoft\sqlserver\mssql-jdbc\6.2.2.jre8\mssql-jdbc-6.2.2.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    #clean_run => true
    schedule => "* * * * *"
    #query
    statement => "SELECT * FROM Student where studentid > :sql_last_value"
    use_column_value => true
    tracking_column => "studentid"
  }
}
output {
  #stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "student"
    document_type => "data"
    document_id => "%{studentid}"
  }
}
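If updated rows need to be picked up as well, not only new ones, a common variation is to track a last-modified timestamp instead of the id (a sketch; the updated_at column and the metadata path are assumptions, they are not part of the original example):
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=test"
    jdbc_user => "sa"
    jdbc_password => "sasa"
    jdbc_driver_library => "C:\Users\abhijitb\.m2\repository\com\microsoft\sqlserver\mssql-jdbc\6.2.2.jre8\mssql-jdbc-6.2.2.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    schedule => "* * * * *"
    # assumes the Student table has an updated_at column maintained on every change
    statement => "SELECT * FROM Student WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    # file where the last seen value is persisted between runs (path is an example)
    last_run_metadata_path => "C:/logstash/.jdbc_last_run_student"
  }
}
The output block stays the same; keeping document_id => "%{studentid}" means updated rows overwrite their existing documents instead of creating duplicates.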

multiple inputs on logstash jdbc

I am using the Logstash jdbc input to keep things synced between MySQL and Elasticsearch. It's working fine for one table, but now I want to do it for multiple tables. Do I need to open multiple terminals, each running
logstash agent -f /Users/logstash/logstash-jdbc.conf
with its own select query, or is there a better way of doing it so that multiple tables are kept updated?
My config file:
input {
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table1"
  }
}
output {
  elasticsearch {
    index => "testdb"
    document_type => "table1"
    document_id => "%{table_id}"
    hosts => "localhost:9200"
  }
}
You can definitely have a single config with multiple jdbc inputs and then parameterize the index and document_type in your elasticsearch output depending on which table the event is coming from.
input {
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table1"
    type => "table1"
  }
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table2"
    type => "table2"
  }
  # add more jdbc inputs to suit your needs
}
output {
  elasticsearch {
    index => "testdb"
    document_type => "%{type}" # <- use the type from each input
    hosts => "localhost:9200"
  }
}
This will not create duplicate data, and it is compatible with Logstash 6.x.
# YOUR_DATABASE_NAME : test
# FIRST_TABLE : place
# SECOND_TABLE : things
# SET_DATA_INDEX : test_index_1, test_index_2
input {
  jdbc {
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/mysql-connector-java-5.1.44-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL jdbc connection string to our database, YOUR_DATABASE_NAME
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "SELECT @slno:=@slno+1 aut_es_1, es_qry_tbl.* FROM (SELECT * FROM `place`) es_qry_tbl, (SELECT @slno:=0) es_tbl"
    type => "place"
    add_field => { "queryFunctionName" => "getAllDataFromFirstTable" }
    use_column_value => true
    tracking_column => "aut_es_1"
  }
  jdbc {
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/mysql-connector-java-5.1.44-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL jdbc connection string to our database, YOUR_DATABASE_NAME
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => ""
    schedule => "* * * * *"
    statement => "SELECT @slno:=@slno+1 aut_es_2, es_qry_tbl.* FROM (SELECT * FROM `things`) es_qry_tbl, (SELECT @slno:=0) es_tbl"
    type => "things"
    add_field => { "queryFunctionName" => "getAllDataFromSecondTable" }
    use_column_value => true
    tracking_column => "aut_es_2"
  }
}
# install the uuid plugin: 'bin/logstash-plugin install logstash-filter-uuid'
# The uuid filter allows you to generate a UUID and add it as a field to each processed event.
filter {
  mutate {
    add_field => {
      "[@metadata][document_id]" => "%{aut_es_1}%{aut_es_2}"
    }
  }
  uuid {
    target => "uuid"
    overwrite => true
  }
}
output {
  stdout { codec => rubydebug }
  if [type] == "place" {
    elasticsearch {
      hosts => "localhost:9200"
      index => "test_index_1_12"
      #document_id => "%{aut_es_1}"
      document_id => "%{[@metadata][document_id]}"
    }
  }
  if [type] == "things" {
    elasticsearch {
      hosts => "localhost:9200"
      index => "test_index_2_13"
      document_id => "%{[@metadata][document_id]}"
      # document_id => "%{aut_es_2}"
      # you can set document_id, otherwise ES will generate a unique id.
    }
  }
}
If you need to run more than one pipeline in the same process, Logstash provides a way to do this through a settings file called pipelines.yml, using multiple pipelines.
Using multiple pipelines is especially useful if your current configuration has event flows that don't share the same inputs/filters and outputs and are being separated from each other using tags and conditionals.
See the Logstash documentation on multiple pipelines for more details; a minimal example follows below.
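A minimal pipelines.yml along those lines might look like this (a sketch; the pipeline ids and config paths are placeholders for your own files):
# config/pipelines.yml -- one isolated pipeline per table/config file
- pipeline.id: table1
  path.config: "/etc/logstash/conf.d/table1.conf"
- pipeline.id: table2
  path.config: "/etc/logstash/conf.d/table2.conf"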
