Need clarification on sql_last_value used in Logstash configuration - elasticsearch

Hi All i am using below code for indexing data from MSSql server to elasticsearch but i am not clear about this sql_last_value.
input {
jdbc {
jdbc_driver_library => ""
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://xxxx:1433;databaseName=xxxx;"
jdbc_user => "xxxx"
jdbc_paging_enabled => true
tracking_column => modified_date
tracking_column_type => "timestamp"
use_column_value => true
jdbc_password => "xxxx"
clean_run => true
schedule => "*/1 * * * *"
statement => "Select * from [dbo].[xxxx] where modified_date >:sql_last_value"
}
}
filter {
if [is_deleted] {
mutate {
add_field => {
"[#metadata][elasticsearch_action]" => "delete"
}
}
mutate {
remove_field => [ "is_deleted","#version","#timestamp" ]
}
} else {
mutate {
add_field => {
"[#metadata][elasticsearch_action]" => "index"
}
}
mutate {
remove_field => [ "is_deleted","#version","#timestamp" ]
}
}
}
output {
elasticsearch {
hosts => "xxxx"
user => "xxxx"
password => "xxxx"
index => "xxxx"
action => "%{[#metadata][elasticsearch_action]}"
document_type => "_doc"
document_id => "%{id}"
}
stdout { codec => rubydebug }
}
Where this sql_last_value stored and how to view that physically?
Is it possible to set a customized value to sql_last_value?
Could any one please clarify on above queries?

The sql_last_value is stored in the file called .logstash_jdbc_last_run and according to the docs it is stored in $HOME/.logstash_jdbc_last_run. The file itself contains the timestamp of the last run and it can be set to a specific value.
You should define the last_run_metadata_path parameter for each single jdbc_input_plugin and point to a more specific location, as all running jdbc_input_plugin instances will share the same .logstash_jdbc_last_run file by default and potentially lead into unwanted results.

Related

Elasticsearch array of an object using logstash

I have a mysql database working as a primary database and i'm ingesting data into elasticsearch from mysql using logstash. I have successfully indexed the users table into elasticsearch and it is working perfectly fine however, my users table has fields interest_id and interest_name which contains the ids and names of user interests as follows:
"interest_id" : "1,2",
"interest_name" : "Business,Farming"
What i'm trying to achieve:
I want to make an object of interests and this object should contain array of interest ids and interests_names like so:
interests : {
[
"interest_name" : "Business"
"interest_id" : "1"
],
[
"interest_name" : "Farming"
"interest_id" : "2"
]
}
Please let me know if its possible and also what is the best approach to achieve this.
My conf:
input {
jdbc {
jdbc_driver_library => "/home/logstash-7.16.3/logstash-core/lib/jars/mysql-connector-java-
8.0.22.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/"
jdbc_user => "XXXXX"
jdbc_password => "XXXXXXX"
sql_log_level => "debug"
clean_run => true
record_last_run => false
statement_filepath => "/home/logstash-7.16.3/config/queries/query.sql"
}
}
filter {
mutate {
remove_field => ["#version", "#timestamp",]
}
}
output {
elasticsearch {
hosts => ["https://XXXXXXXXXXXX:443"]
index => "users"
action => "index"
user => "XXXX"
password => "XXXXXX"
template_name => "myindex"
template => "/home/logstash-7.16.3/config/my_mapping.json"
template_overwrite => true
}
}
I have tried doing this by creating a nested field interests in my mapping and then adding mutate filer in my conf file like this:
mutate {
rename => {
"interest_id" => "[interests][interest_id]"
"interest_name" => "[interests][interest_name]"
}
With this i'm only able to get this output:
"interests" : {
"interest_id" : "1,2",
"interest_name" : "Business,Farming"
}

Logstash Input -> JDBC in some properties or parameterizable file?

I am using logstash to ingest elasticsearch. I am using input jdbc, and I am urged by the need to parameterize the inputt jdbc settings, such as the connection string, pass, etc, since I have 10 .conf files where each one has 30 jdbc and 30 output inside.
So, since each file has the same settings, would you like to know if it is possible to do something generic or reference that information from somewhere?
I have this 30 times:...
input {
# Number 1
jdbc {
jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/ifxjdbc-4.50.3.jar"
jdbc_driver_class => "com.informix.jdbc.IfxDriver"
jdbc_connection_string => "jdbc:informix-sqli://xxxxxxx/schema:informixserver=server"
jdbc_user => "xxx"
jdbc_password => "xxx"
schedule => "*/1 * * * *"
statement => "SELECT * FROM public.test ORDER BY id ASC"
tags => "001"
}
# Number 2
jdbc {
jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/ifxjdbc-4.50.3.jar"
jdbc_driver_class => "com.informix.jdbc.IfxDriver"
jdbc_connection_string => "jdbc:informix-sqli://xxxxxxx/schema:informixserver=server"
jdbc_user => "xxx"
jdbc_password => "xxx"
schedule => "*/1 * * * *"
statement => "SELECT * FROM public.test2 ORDER BY id ASC"
tags => "002"
}
[.........]
# Number X
jdbc {
jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/ifxjdbc-4.50.3.jar"
jdbc_driver_class => "com.informix.jdbc.IfxDriver"
jdbc_connection_string => "jdbc:informix-sqli://xxxxxxx/schema:informixserver=server"
jdbc_user => "xxx"
jdbc_password => "xxx"
schedule => "*/1 * * * *"
statement => "SELECT * FROM public.testx ORDER BY id ASC"
tags => "00x"
}
}
filter {
mutate {
add_field => { "[#metadata][mitags]" => "%{tags}" }
}
# Number 1
if "001" in [#metadata][mitags] {
mutate {
rename => [ "codigo", "[properties][codigo]" ]
}
}
# Number 2
if "002" in [#metadata][mitags] {
mutate {
rename => [ "codigo", "[properties][codigo]" ]
}
}
[......]
# Number x
if "002" in [#metadata][mitags] {
mutate {
rename => [ "codigo", "[properties][codigo]" ]
}
}
mutate {
remove_field => [ "#version","#timestamp","tags" ]
}
}
output {
# Number 1
if "001" in [#metadata][mitags] {
# Para ELK
elasticsearch {
hosts => "localhost:9200"
index => "001"
document_type => "001"
document_id => "%{id}"
manage_template => true
template => "/home/user/logstash/templates/001.json"
template_name => "001"
template_overwrite => true
}
}
# Number 2
if "002" in [#metadata][mitags] {
# Para ELK
elasticsearch {
hosts => "localhost:9200"
index => "002"
document_type => "002"
document_id => "%{id}"
manage_template => true
template => "/home/user/logstash/templates/002.json"
template_name => "002"
template_overwrite => true
}
}
[....]
# Number x
if "00x" in [#metadata][mitags] {
# Para ELK
elasticsearch {
hosts => "localhost:9200"
index => "002"
document_type => "00x"
document_id => "%{id}"
manage_template => true
template => "/home/user/logstash/templates/00x.json"
template_name => "00x"
template_overwrite => true
}
}
}
You will still need one jdbc input for each query you need to do, but you can improve your filter and output blocks.
In your filter block you are using the field [#metadata][mitags] to filter your inputs but you are applying the same mutate filter to each one of the inputs, if this is the case you don't need the conditionals, the same mutate filter can be applied to all your inputs if you don't filter it.
Your filter block could be resumed to something as this one.
filter {
mutate {
add_field => { "[#metadata][mitags]" => "%{tags}" }
}
mutate {
rename => [ "codigo", "[properties][codigo]" ]
}
mutate {
remove_field => [ "#version","#timestamp","tags" ]
}
}
In your output block you use the tag just to change the index, document_type and template, you don't need to use conditionals to that, you can use the value of the field as a parameter.
output {
elasticsearch {
hosts => "localhost:9200"
index => "%{[#metadata][mitags]}"
document_type => "%{[#metadata][mitags]}"
document_id => "%{id}"
manage_template => true
template => "/home/unitech/logstash/templates/%{[#metadata][mitags]}.json"
template_name => "iol-fue"
template_overwrite => true
}
}
But this only works if you have a single value in the field [#metadata][mitags], which seems to be the case.
EDIT:
Edited just for history reasons, as noted in the comments, the template config does not allow the use of dynamic parameters as it is only loaded when logstash is starting, the other configs works fine.

Trying to get the data from oracle database through logstash but data is not coming to elasticsearch

I am trying to get the data of oracle database through logstash but data is not coming to elasticsearch. I am not sure where I missed it. I didn't see any error on logstash log file. Below are the logstash conf file.
input {
jdbc {
jdbc_validate_connection => "true"
jdbc_connection_string => "jdbc:oracle:thin:#//server:1521/db"
jdbc_user => "user"
jdbc_password => "pass"
jdbc_driver_library => "/etc/logstash/files/ojdbc7.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_paging_enabled => "true"
schedule => "* * * * *"
statement_filepath => "/etc/logstash/files/keycount.sql"
use_column_value => "true"
tracking_column => "timestamp"
last_run_metadata_path => "/etc/logstash/files/.logstash_jdbc_last_run"
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index => "keyinventory-%{+YYYY}"
}
stdout{
codec => rubydebug
}
}
Please, someone, help me.

Logstash Scheduling first run

I have a logstash pipeline running every 5 minutes with below jdbc input config, issue is upon starting the pipeline first time, it also waits for 5 minutes and then start scheduling. Is there any way to specify that we query/statement is executed as soon as the logstash pipeline is started instead of waiting on first 5 minutes too?
input {
jdbc {
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://${DB_HOST}/${DB_NAME}?useSSL=false"
jdbc_user => "${DB_USER_NAME}"
jdbc_password => "${DB_PASSWORD}"
schedule => "*/5 * * * *"
statement => "Select * from students"
}
}
The other workaround to this is to have two jdbc inputs, one for startup and one for schedule. It requires a bit of copy/paste but is not too bad.
input {
jdbc {
id => "index_name_startup"
jdbc_connection_string => "${JDBC_STRING}"
jdbc_user => "${JDBC_USER}"
jdbc_password => "${JDBC_PASSWORD}"
jdbc_driver_library => "/opt/logstash/postgresql-42.2.5.jre7.jar"
jdbc_driver_class => "org.postgresql.Driver"
add_field => { "[#metadata][project_id]" => "index_name" }
statement_filepath => "/mnt/elastic-search-config/sql-scripts/assets-index_name.sql"
}
jdbc {
id => "index_name"
jdbc_connection_string => "${JDBC_STRING}"
jdbc_user => "${JDBC_USER}"
jdbc_password => "${JDBC_PASSWORD}"
jdbc_driver_library => "/opt/logstash/postgresql-42.2.5.jre7.jar"
jdbc_driver_class => "org.postgresql.Driver"
add_field => { "[#metadata][project_id]" => "index_name" }
statement_filepath => "/mnt/elastic-search-config/sql-scripts/assets-index_name.sql"
schedule => "0 * * * *"
}
}
filter {
mutate {
gsub => [
"name", "_", " ",
"name", "-", " "
]
}
}
output {
if [#metadata][project_id] == "index_name" {
elasticsearch {
index => "index_name"
document_id => "%{geom_id}"
hosts => "localhost:9200"
template_name => "assets_template"
id => "index_name_es"
}
}
}
No, that is not the way rufus cron schedules work (and that is what the jdbc input uses). There is an open issue that includes a link to a patch that adds this.

multiple inputs on logstash jdbc

I am using logstash jdbc to keep the things syncd between mysql and elasticsearch. Its working fine for one table. But now I want to do it for multiple tables. Do I need to open multiple in terminal
logstash agent -f /Users/logstash/logstash-jdbc.conf
each with a select query or do we have a better way of doing it so we can have multiple tables being updated.
my config file
input {
jdbc {
jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
jdbc_user => "root"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table1"
}
}
output {
elasticsearch {
index => "testdb"
document_type => "table1"
document_id => "%{table_id}"
hosts => "localhost:9200"
}
}
You can definitely have a single config with multiple jdbc input and then parametrize the index and document_type in your elasticsearch output depending on which table the event is coming from.
input {
jdbc {
jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
jdbc_user => "root"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table1"
type => "table1"
}
jdbc {
jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
jdbc_user => "root"
jdbc_password => "password"
schedule => "* * * * *"
statement => "select * from table2"
type => "table2"
}
# add more jdbc inputs to suit your needs
}
output {
elasticsearch {
index => "testdb"
document_type => "%{type}" # <- use the type from each input
hosts => "localhost:9200"
}
}
This will not create duplicate data. and compatible logstash 6x.
# YOUR_DATABASE_NAME : test
# FIRST_TABLE : place
# SECOND_TABLE : things
# SET_DATA_INDEX : test_index_1, test_index_2
input {
jdbc {
# The path to our downloaded jdbc driver
jdbc_driver_library => "/mysql-connector-java-5.1.44-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# Postgres jdbc connection string to our database, YOUR_DATABASE_NAME
jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
# The user we wish to execute our statement as
jdbc_user => "root"
jdbc_password => ""
schedule => "* * * * *"
statement => "SELECT #slno:=#slno+1 aut_es_1, es_qry_tbl.* FROM (SELECT * FROM `place`) es_qry_tbl, (SELECT #slno:=0) es_tbl"
type => "place"
add_field => { "queryFunctionName" => "getAllDataFromFirstTable" }
use_column_value => true
tracking_column => "aut_es_1"
}
jdbc {
# The path to our downloaded jdbc driver
jdbc_driver_library => "/mysql-connector-java-5.1.44-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# Postgres jdbc connection string to our database, YOUR_DATABASE_NAME
jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
# The user we wish to execute our statement as
jdbc_user => "root"
jdbc_password => ""
schedule => "* * * * *"
statement => "SELECT #slno:=#slno+1 aut_es_2, es_qry_tbl.* FROM (SELECT * FROM `things`) es_qry_tbl, (SELECT #slno:=0) es_tbl"
type => "things"
add_field => { "queryFunctionName" => "getAllDataFromSecondTable" }
use_column_value => true
tracking_column => "aut_es_2"
}
}
# install uuid plugin 'bin/logstash-plugin install logstash-filter-uuid'
# The uuid filter allows you to generate a UUID and add it as a field to each processed event.
filter {
mutate {
add_field => {
"[#metadata][document_id]" => "%{aut_es_1}%{aut_es_2}"
}
}
uuid {
target => "uuid"
overwrite => true
}
}
output {
stdout {codec => rubydebug}
if [type] == "place" {
elasticsearch {
hosts => "localhost:9200"
index => "test_index_1_12"
#document_id => "%{aut_es_1}"
document_id => "%{[#metadata][document_id]}"
}
}
if [type] == "things" {
elasticsearch {
hosts => "localhost:9200"
index => "test_index_2_13"
document_id => "%{[#metadata][document_id]}"
# document_id => "%{aut_es_2}"
# you can set document_id . otherwise ES will genrate unique id.
}
}
}
If you need to run more than one pipeline in the same process, Logstash provides a way to do this through a configuration file called pipelines.yml and using multiple pipelines
multiple pipeline
Using multiple pipelines is especially useful if your current configuration has event flows that don’t share the same inputs/filters and outputs and are being separated from each other using tags and conditionals.
more helpfull resource

Resources