I have a log stash running pulling records from postgresql and creating documents in elastic search, but whenever i am trying to update a record in postgres the same is not getting reflected in elastic search, here is my INPUT & OUTPUT configs let me know if i am missing anything here,
input {
jdbc {
# Postgres jdbc connection string to our database, mydb
jdbc_connection_string => "jdbc:postgresql://127.0.0.1:5009/data"
# The user we wish to execute our statement as
jdbc_user => "data"
jdbc_password=>"data"
# The path to our downloaded jdbc driver
jdbc_driver_library => "/postgresql-9.4.1209.jar"
# The name of the driver class for Postgresql
jdbc_driver_class => "org.postgresql.Driver"
#sql_log_level => "debug"
jdbc_paging_enabled => "true"
jdbc_page_size => "5000"
schedule => "* * * * *"
# our query
clean_run => true
last_run_metadata_path => "/logstash/.test_metadata"
#use_column_value => true
#tracking_column => id
statement => "SELECT id,name,update_date from data where update_date > :sql_last_value"
}
}
output {
elasticsearch{
hosts => ["127.0.0.1"]
index => "test_data"
action => "index"
document_type => "data"
document_id => "%{id}"
upsert => ' {
"name" : "%{data.name}",
"update_date" : "%{data.update_date}"
} '
}
}
I think you need to track a date/timestamp column instead of the id column. If you have an UPDATE_DATE column that changes on each update, that would be good.
Your SELECT statement will only grab new records (i.e. id > last_id) and if you update a record, its id won't change, hence that updated record won't be picked up by the jdbc input the next time it runs.
Related
I was working on inserting indexes to elasticsearch using logstash. My conf looks like this:
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/sample"
jdbc_user => "root"
jdbc_password => ""
jdbc_driver_library => "/usr/share/logstash/logstash-core/lib/jars/mysql-connector-java-5.1.48.jar"
jdbc_driver_class => " com.mysql.jdbc.Driver"
tracking_column => "time"
use_column_value => true
statement => "SELECT time, firstname, lastname, email FROM sample.sample where time > :sql_last_value;"
schedule => " * * * * * *"
}
}
output {
elasticsearch {
document_id=> "%{time}"
document_type => "doc"
"hosts" => ["myhost"]
"index" => "sample"
}
stdout{
codec => rubydebug
}
}
When I execute this file using logstash, I got query:
SELECT time, firstname, lastname, email FROM sample.sample where time > 351
But the time column is currenttimestamp in MySQL database which I'm tracking, and I think sql_last_value contains the value of this field. Am I doing something incorrect related to sql_last_value?
What I was doing wrong:
we need to specify tracking_column_type if the tracking_column is not of type int. In my case, tracking_column is of type timestamp. So, I need to add:
tracking_column_type =>"timestamp"
That solved my problem.
I am trying to update ElasticSearch indexes with the data stored into a SQL DataBase in way that every row added into the DB are added automatically into ElasticSearch.
I tried to set the Primary Key of the DB as the _id field of ElasticSearch in way that every time the schedule launches Logstash (once a minute), the documents that were alredy in ElasticSearch doesn't get re-added.
This is my Logstash .conf file:
input {
jdbc {
jdbc_connection_string => "JDBC-Connection-String"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_user => "JDBC-Connection-User"
jdbc_driver_library => "JDBC-Driver-Path"
statement => "SELECT MyCol1 MyCol2 FROM MyTable"
use_column_value => true
tracking_column => "MyCol1"
tracking_column_type => "numeric"
clean_run => true
schedule => "*/1 * * * *"
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "MyIndex"
document_id => "%{MyCol1}"
}
stdout { }
}
After Logstash finishes I find only 1 document with "_id": "%{MyCol1}" into ElasticSearch, why can't Logstash take the id value properly?
P.S. MyCol1 is Primary Key of Mytable
few things to keep in mind.
value in document_id must be part of query.
the id is case sensitive. so use exact name..
clean_run=>false
use :sql_last_value to identify the column which you want to be taken care to identify new record.
JDBC Plugin Polymorphic Index
Hi, we have a table that's polymorphic to items and we'd like to find a way to update different indexes in one logstash config.
Table Structure
Below is an example table. The item_type column denotes the type (such as Pen, Post, Collection), the item_id is a foreign key to the item in our DB, and the score is calculated on a cron and updated every once in a while, which updates our updated_at column.
popularity_scores
Process
Using logstash jdbc plugins, we'd like to query the data, then push it to ES. However, I don't see a way (other than a logstash config and sql query for each item type) to dynamically push updates to indexes. In a perfect world, we'd like to take input from the table above (see input code below)
input
input {
jdbc {
jdbc_driver_library => "/usr/share/logstash/bin/mysql-connector-java-8.0.15.jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
# useCursorFetch needed cause jdbc_fetch_size not working??
# https://discuss.elastic.co/t/logstash-jdbc-plugin/84874/2
# https://stackoverflow.com/a/10772407
jdbc_connection_string => "jdbc:mysql://${CP_LS_SQL_HOST}:${CP_LS_SQL_PORT}/${CP_LS_SQL_DB}?useCursorFetch=true&autoReconnect=true&failOverReadOnly=false&maxReconnects=10"
statement => "select * from view_elastic_popularity_all where updated_at > :sql_last_value"
jdbc_user => "${CP_LS_SQL_USER}"
jdbc_password => "${CP_LS_SQL_PASSWORD}"
jdbc_fetch_size => "${CP_LS_FETCH_SIZE}"
last_run_metadata_path => "/usr/share/logstash/cp/last_run_files/last_run_popularity_live"
jdbc_page_size => '10000'
use_column_value => true
tracking_column => 'updated_at'
tracking_column_type => 'timestamp'
schedule => "* * * * *"
}
}
Then run update queries to ES via an output plugin (see output code below)
output
output {
elasticsearch {
index => "HOW_DO_WE_DYNAMICALLY_SET_INDEX_BASED_ON_ITEM_TYPE?"
document_id => "%{id}"
hosts => ["${CP_LS_ES_HOST}:${CP_LS_ES_PORT}"]
user => "${CP_LS_ES_USER}"
password => "${CP_LS_ES_PASSWORD}"
}
}
Help?
We can't be the first company with this problem. How would we structure the output?
You can dynamically set the name of the index by using a field in the event message in the same way that you dynamically setup the document_id.
output {
elasticsearch {
index => "%{item_type}"
document_id => "%{id}"
hosts => ["${CP_LS_ES_HOST}:${CP_LS_ES_PORT}"]
user => "${CP_LS_ES_USER}"
password => "${CP_LS_ES_PASSWORD}"
}
}
Is there any way I can configure logstash so that it picks up delta records real time automatically. If not then is there any opensource plugin/tool available to achieve this? Thanks for the help.
Try the below configuration for the MSSQL server. You need to schedule it like below by adding the schedule period, a statement which would the query to fetch the data from your database
input {
jdbc {
jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=test"
# The user we wish to execute our statement as
jdbc_user => "sa"
jdbc_password => "sasa"
# The path to our downloaded jdbc driver
jdbc_driver_library => "C:\Users\abhijitb\.m2\repository\com\microsoft\sqlserver\mssql-jdbc\6.2.2.jre8\mssql-jdbc-6.2.2.jre8.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
#clean_run => true
schedule => "* * * * *"
#query
statement => "SELECT * FROM Student where studentid > :sql_last_value"
use_column_value => true
tracking_column => "studentid"
}
}
output {
#stdout { codec => json_lines }
elasticsearch {
"hosts" => "localhost:9200"
"index" => "student"
"document_type" => "data"
"document_id" => "%{studentid}"
}
}
I am trying to index data from mysql db to elasticsearch using logstash. Logstash is running without errors but the problem is, it indexing only one row from my SELECT query.
Below are the versions of softwares I am using:
elastic search : 2.4.1
logstash: 5.1.1
mysql: 5.7.17
jdbc_driver_library: mysql-connector-java-5.1.40-bin.jar
I am not sure if this is because logstash and elasticsearch versions are different.
Below is my pipeline configuration:
input {
jdbc {
jdbc_driver_library => "mysql-connector-java-5.1.40-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
jdbc_user => "user"
jdbc_password => "password"
schedule => "* * * * *"
statement => "SELECT * FROM employee"
use_column_value => true
tracking_column => "id"
}
}
output {
elasticsearch {
index => "logstash"
document_type => "sometype"
document_id => "%{uid}"
hosts => ["localhost:9200"]
}
}
It seems like the tracking_column (id) which you're using in the jdbc plugin and the document_id (uid) in the output is different. What if you have both of them same since it'll be easy to get all the records by id and push them into ES using the same id as well which could look more understandable:
document_id => "%{id}" <-- make sure you've got the exact spellings
And also please try adding this following line to your jdbc input after tracking_column:
tracking_column_type => "numeric"
Additionally to make sure that you don't have the .logstash_jdbc_last_run file existing when you're running the logstash file include the following line as well:
clean_run => true
So this is how your jdbc input should look like:
jdbc {
jdbc_driver_library => "mysql-connector-java-5.1.40-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
jdbc_user => "user"
jdbc_password => "password"
schedule => "* * * * *"
statement => "SELECT * FROM employee"
use_column_value => true
tracking_column => "id"
tracking_column_type => "numeric"
clean_run => true
}
Other than that the conf seems to be fine, unless you're willing to have :sql_last_value where if you only wanted to update the newly added records in your database table. Hope it helps!