I need to get data from a PostgreSQL DB and index it into Elasticsearch.
https://www.elastic.co/blog/logstash-jdbc-input-plugin
When I run /opt/logstash-2.3.3/bin/logstash -v -f es_table.logstash.conf
I receive the following error:
Pipeline aborted due to error
{:exception=>#<LogStash::ConfigurationError: org.postgres.Driver not loaded.
Are you sure you've included the correct jdbc driver in :jdbc_driver_library?>, :backtrace=>["/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.0.2/lib/logstash/plugin_mixins/jdbc.rb:156:in `prepare_jdbc_connection'", "/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.0.2/lib/logstash/plugin_mixins/jdbc.rb:148:in `prepare_jdbc_connection'", "/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.0.2/lib/logstash/inputs/jdbc.rb:167:in `register'", "/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:330:in `start_inputs'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:329:in `start_inputs'", "/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:180:in `start_workers'", "/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:136:in `run'", "/opt/logstash-2.3.3/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/agent.rb:473:in `start_pipeline'"], :level=>:error}
Here is a piece of my Logstash configuration:
input {
  jdbc {
    jdbc_user => 'user'
    jdbc_driver_class => 'org.postgresql.Driver'
    jdbc_connection_string => 'jdbc:postgresql://1.1.1.1:5432/db'
    lowercase_column_names => false
    clean_run => false
    jdbc_driver_library => '/usr/share/java/postgresql-jdbc4.jar'
    jdbc_password => 'pass'
    jdbc_validate_connection => true
    jdbc_page_size => 1000
    jdbc_paging_enabled => true
    statement => 'SELECT * FROM "table"'
    type => 'table'
  }
...
The jdbc4 driver exists. I tried jdbc3 too without success.
ls /usr/share/java | grep postgresql-jdbc
postgresql-jdbc3-9.2.jar
postgresql-jdbc3.jar
postgresql-jdbc4-9.2.jar
postgresql-jdbc4.jar
The Driver class is inside:
jar tf /usr/share/java/postgresql-jdbc4.jar | grep -i driver
org/postgresql/Driver$1.class
org/postgresql/Driver$ConnectThread.class
org/postgresql/Driver.class
org/postgresql/util/PSQLDriverVersion.class
META-INF/services/java.sql.Driver
Port 5432 is open:
telnet 192.168.109.108 5432
Trying 192.168.109.108...
Connected to 192.168.109.108.
Escape character is '^]'.
Authentication to the DB works.
The problem was a typo in the driver class name.
I wrote jdbc_driver_class => 'org.postgres.Driver'
but the correct name is jdbc_driver_class => 'org.postgresql.Driver'.
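For reference, a minimal corrected input block (connection details as in the question) looks like this:
input {
  jdbc {
    jdbc_driver_library => '/usr/share/java/postgresql-jdbc4.jar'
    # fully qualified class name; note "postgresql", not "postgres"
    jdbc_driver_class => 'org.postgresql.Driver'
    jdbc_connection_string => 'jdbc:postgresql://1.1.1.1:5432/db'
    jdbc_user => 'user'
    jdbc_password => 'pass'
    statement => 'SELECT * FROM "table"'
  }
}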
I resolved this issue by following the workaround suggested in this issue.
Reason:
This is a known problem that we have with the module changes in JDK 9 (Jigsaw). The classloaders have seen some changes, and a workaround we added before to some driver loading is now failing. The jdbc input has the same failure on JDK 11 (9+). We are working on a fix.
Workaround that worked for me:
An "extreme" work around is to copy the driver file to /logstash-core/lib/jars/ directory. These jar get added to the correct JDK classpath as logstash is started via java.
Related
We are trying to migrate around 3 million records from Oracle to Elasticsearch using Logstash.
We are applying a couple of jdbc_streaming filters as part of our Logstash script: one to load related nested objects and another to run a hierarchical query to load data into another nested object in the index.
We are able to index 0.4 million records in 24 hours. The total size occupied by 0.4 million records is around 300 MB.
We tried multiple approaches to migrate the data quickly from Oracle into Elasticsearch but were not able to achieve the desired results.
Please find below the approaches we tried:
1. In the Logstash script, we used the jdbc_fetch_size, jdbc_page_size, jdbc_paging_enabled and clean_run parameters, set pipeline workers to 20, and set the pipeline batch size to 125 in the logstash.yml file.
2. On the Elasticsearch side, we set the number of replicas to 0 and the refresh interval to -1, tried increasing the value of the indices.memory.index_buffer_size parameter, and increased the number of watcher queues in the elasticsearch.yml file.
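Roughly, the Logstash settings from point 1 above looked like this (option names are from the Logstash docs; the jdbc values are illustrative since we varied them):
# logstash.yml
pipeline.workers: 20
pipeline.batch.size: 125

# jdbc input fragment (values illustrative)
jdbc {
  ...
  jdbc_paging_enabled => true
  jdbc_page_size => 100000
  jdbc_fetch_size => 10000
}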
We basically googled around and followed various suggestions from this site and others, but nothing has worked so far.
We are using a single-node Elasticsearch setup, and neither the DB nor the Elasticsearch node is on the machine from which we run the Logstash script.
Please find the Logstash config file below:
input {
  jdbc {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    statement => "select * from "
  }
}
filter {
  jdbc_streaming {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    #statement => "select claimnumber,claimtype,is_active from claim where policynumber = :policynumber"
    parameters => {"policynumber" => "policynumber"}
    target => "nested node"
  }
  stdout { codec => json }
}
filter {
  jdbc_streaming {
    jdbc_driver_library => "LIB"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "connection url"
    jdbc_user => "user"
    jdbc_password => "pwd"
    statement => "select listagg(column name,'/' ) within group(order by column name) from
                  where LEVEL > 1
                  start with =:
                  connect by prior = "
    parameters => {"p1" => "p1"}
    target => "nested node1"
  }
}
output {
  elasticsearch {
    hosts => [""]
    index => "<index_name>"
    document_id => "%{doc_id}"
  }
}
Can you please help us identify bottlenecks and suggest how to increase indexing performance?
Thank You
I'm having trouble importing HSQLDB database content using Logstash's JDBC input plugin.
The problem occurs when I try to fetch a column that is of type ARRAY.
Please note that if I try to fetch non-array columns, it works just fine.
I get the following error message from Logstash:
[WARN ][logstash.inputs.jdbc ] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::OrgLogstash::MissingConverterException: Missing Converter handling for full class name=org.hsqldb.jdbc.JDBCArray, simple name=JDBCArray>}
[INFO ][logstash.pipeline ] Pipeline has terminated {:pipeline_id=>"hsql", :thread=>"#<Thread:0x7b626752 run>"}
Please find below the input part of the Logstash conf file (PLATFORM_DESTINATION_CANDIDATES is the name of a column in a table):
input {
  jdbc {
    jdbc_driver_library => "hsqldb_2.5.0.jar"
    jdbc_driver_class => "org.hsqldb.jdbc.JDBCDriver"
    jdbc_connection_string => "jdbc:hsqldb:hsql://localhost/probe"
    jdbc_user => "SA"
    statement => "SELECT PLATFORM_DESTINATION_CANDIDATES FROM PUBLIC.MESSAGES_SENT"
    connection_retry_attempts => 10
  }
}
Did any of you encounter this kind of problem, and how did you solve it?
Thanks.
OS: Windows 10
Logstash version: 6.3.1
HSQLDB driver version: 2.5.0 (LINK)
I do not know if it is the best solution, but I managed to solve my issue. Here is how.
I replaced the line:
statement => "SELECT PLATFORM_DESTINATION_CANDIDATES FROM PUBLIC.MESSAGES_SENT"
with:
statement => "SELECT concat_ws('', PLATFORM_DESTINATION_CANDIDATES, '') AS str_platforms FROM PUBLIC.MESSAGES_SENT"
This puts into the string field str_platforms data that looks like: ARRAY[1,2,3,4]
With the following ruby filter, I then strip the unwanted characters (ARRAY[ and ]) from the field:
ruby {
  # keep only what sits between "ARRAY[" and "]", e.g. "1,2,3,4"
  code => "event.set('listRxUnits', event.get('str_platforms').split('ARRAY[')[1].split(']')[0])"
}
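If an actual array is needed rather than a comma-separated string, a further split on ',' could be added (untested sketch, using the same field name as above):
ruby {
  # turn "1,2,3,4" into ["1", "2", "3", "4"]
  code => "event.set('listRxUnits', event.get('listRxUnits').split(','))"
}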
I am trying to move SQL Server table records to Elasticsearch via Logstash; it's basically a synchronization. But I am getting an "unknown error" from Logstash. I have provided my configuration file as well as the error log.
Configuration:
input {
  jdbc {
    # https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-record_last_run
    jdbc_connection_string => "jdbc:sqlserver://localhost-serverdb;database=Application;user=dev;password=system23$"
    jdbc_user => nil
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "C:\Program Files (x86)\sqljdbc6.2\enu\sqljdbc4-3.0.jar"
    # The name of the driver class for SQL Server
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    # Executes every minute.
    schedule => "* * * * *"
    # Executes at minute 0 of every hour, i.e. hourly.
    #schedule => "0 * * * *"
    last_run_metadata_path => "C:\Software\ElasticSearch\logstash-6.4.0\.logstash_jdbc_last_run"
    #record_last_run => false
    #clean_run => true
    # Query for testing purposes
    statement => "Select * from tbl_UserDetails"
  }
}
output {
  elasticsearch {
    hosts => ["10.187.144.113:9200"]
    index => "tbl_UserDetails"
    # document_id is a unique id; it has to be provided during sync, otherwise we may get duplicate entries in the Elasticsearch index.
    document_id => "%{Login_User_Id}"
  }
}
Error Log:
[2018-09-18T21:04:32,171][ERROR][logstash.outputs.elasticsearch] An unknown error occurred sending a bulk request to Elasticsearch. We will retry indefinitely {
  :error_message=>"\"\\xF0\" from ASCII-8BIT to UTF-8",
  :error_class=>"LogStash::Json::GeneratorError",
  :backtrace=>[
    "C:/Software/ElasticSearch/logstash-6.4.0/logstash-core/lib/logstash/json.rb:27:in `jruby_dump'",
    "C:/Software/ElasticSearch/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.2.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:119:in `block in bulk'",
    "org/jruby/RubyArray.java:2486:in `map'",
    "C:/Software/ElasticSearch/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.2.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:119:in `block in bulk'",
    "org/jruby/RubyArray.java:1734:in `each'",
    "C:/Software/ElasticSearch/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.2.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:117:in `bulk'",
    "C:/Software/ElasticSearch/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.2.0-java/lib/logstash/outputs/elasticsearch/common.rb:275:in `safe_bulk'",
    "C:/Software/ElasticSearch/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.2.0-java/lib/logstash/outputs/elasticsearch/common.rb:180:in `submit'",
    "C:/Software/ElasticSearch/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.2.0-java/lib/logstash/outputs/elasticsearch/common.rb:148:in `retrying_submit'",
    "C:/Software/ElasticSearch/logstash-6.4.0/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.2.0-java/lib/logstash/outputs/elasticsearch/common.rb:38:in `multi_receive'",
    "org/logstash/config/ir/compiler/OutputStrategyExt.java:114:in `multi_receive'",
    "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:97:in `multi_receive'",
    "C:/Software/ElasticSearch/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:372:in `block in output_batch'",
    "org/jruby/RubyHash.java:1343:in `each'",
    "C:/Software/ElasticSearch/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:371:in `output_batch'",
    "C:/Software/ElasticSearch/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:323:in `worker_loop'",
    "C:/Software/ElasticSearch/logstash-6.4.0/logstash-core/lib/logstash/pipeline.rb:285:in `block in start_workers'"
  ]}
[2018-09-18T21:05:00,140][INFO ][logstash.inputs.jdbc ] (0.008273s) Select *
from tbl_UserDetails
Logstash version: 6.4.0
Elasticsearch version: 6.3.1
Thanks in advance.
You have a character '\xF0' in the database which is causing this issue. This '\xF0' is probably the first byte of a multibyte character, but since Ruby is trying to decode it as ASCII-8BIT, it treats each byte as a separate character.
You may try using columns_charset to set the proper charset: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-columns_charset
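A minimal sketch (the column name is illustrative; use whichever column holds the non-ASCII data):
input {
  jdbc {
    ...
    # decode this column as UTF-8 instead of ASCII-8BIT (column name illustrative)
    columns_charset => { "user_name" => "UTF-8" }
  }
}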
The above issue is resolved.
Thanks for your support, guys.
The change I made: under input -> jdbc I added the two properties below
input {
  jdbc {
    tracking_column => "login_user_id"
    use_column_value => true
  }
}
and under output -> elasticsearch I changed these two properties
output {
  elasticsearch {
    document_id => "%{login_user_id}"
    document_type => "user_details"
  }
}
The main takeaway here is that all of these values should be written in lowercase.
In my Logstash pipeline I want to download the most recent data from a database, using :sql_last_value in a query and the tracking_column option in the conf file. I've set
last_run_metadata_path because I have 2 pipelines for the same table, but Logstash saved the last date only once (or stopped saving new dates), and now I can see in the logs that it runs queries with the same :sql_last_value from the metadata file.
This is what my conf file looks like; it has many jdbc inputs, and one of them is below:
jdbc {
  jdbc_driver_library => "/opt/logstash/lib/ojdbc8.jar"
  jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
  jdbc_connection_string => ""
  jdbc_user => ""
  jdbc_password => ""
  schedule => "*/15 * * * *"
  statement_filepath => "/etc/logstash/queries/UAT/transactions_UAT.sql"
  use_column_value => true
  tracking_column => "sys_created_on"
  tracking_column_type => "timestamp"
  last_run_metadata_path => "/etc/logstash/conf.d/lastrun_metadata/transactions_uat_metadata"
  tags => ["transactions_uat"]
}
Content of the metadata file:
--- 2018-05-26 08:41:55.000000000 -04:00
I can see in the logs that Logstash always uses the same date from the metadata file and never updates it:
select * from snc_uat.syslog_transaction0007
where "sys_created_on" >= TIMESTAMP '2018-05-26 08:41:55.000000 -04:00'
Logstash is working and downloading recent data, but it unnecessarily reprocesses data that already exists. Why is Logstash not updating the metadata?
This is because your comparison operator is greater-than-or-equal-to (>=). Please change it to > and it will fix your problem.
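For reference, a sketch of the adjusted query (table and column names taken from the log above):
select * from snc_uat.syslog_transaction0007
where "sys_created_on" > :sql_last_value
order by "sys_created_on" -- ordering by the tracking column keeps the saved :sql_last_value consistent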
Hope it helps.
I am using Logstash on Windows. I was not able to install the JDBC input plugin, so I downloaded the zip file manually and placed the logstash folder from the plugin into my logstash-1.5.2 folder.
The folder structure is "D:\elastic search\logstash-1.5.2\lib\logstash\inputs\jdbc.rb".
My conf file:
input {
  jdbc {
    jdbc_driver_library => "D:/elastic search/logstash-1.5.2/lib/mysql-connector-java-5.1.13-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    jdbc_user => "root"
    jdbc_password => ""
    statement => "SELECT * from data"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    embedded => true
    index => "bike"
    type => "bikeapp"
    cluster => "trailcluster"
    protocol => "http"
    port => "9200"
  }
}
When I run Logstash I get the error:
D:\elastic search\logstash-1.5.2\bin>logstash -f logtest.conf
io/console not supported; tty will not be manipulated
jdbc plugin doesn't have a version. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:warn}
You are using a deprecated config setting "type" set in elasticsearch. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. You can achieve this same behavior with the new conditionals, like: `if [type] == "sometype" { elasticsearch { ... } }`. If you have any questions about this, please visit the #logstash channel on freenode irc. {:name=>"type", :plugin=><LogStash::Outputs::ElasticSearch --->, :level=>:warn}
LoadError: no such file to load -- sequel
require at org/jruby/RubyKernel.java:1072
require at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/polyglot-0.3.5/lib/polyglot.rb:65
prepare_jdbc_connection at D:/elastic search/logstash-1.5.2/lib/logstash/plugin_mixins/jdbc.rb:65
register at D:/elastic search/logstash-1.5.2/lib/logstash/inputs/jdbc.rb:144
start_inputs at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:148
each at org/jruby/RubyArray.java:1613
start_inputs at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:147
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:80
synchronize at org/jruby/ext/thread/Mutex.java:149
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:80
execute at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/agent.rb:150
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/runner.rb:91
call at org/jruby/RubyProc.java:271
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/runner.rb:96
call at org/jruby/RubyProc.java:271
initialize at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/stud-0.0.20/lib/stud/task.rb:12
After adding the jar file to the plugin folder, go to that folder in the CMD prompt and install the plugin into Logstash using the commands below.
Run in an installed Logstash:
Build your plugin gem:
gem build logstash-input-jdbc.gemspec
Install the plugin from the Logstash home:
bin/plugin install /your/local/plugin/logstash-input-jdbc.gem
Finally, start Logstash and test the plugin using your existing configuration.
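Putting those steps together, roughly (the plugin source path and built gem name are illustrative, this assumes a Ruby/JRuby gem command is available, and on Windows the plugin script may be bin\plugin.bat):
REM 1. Build the plugin gem from the plugin source directory (path illustrative)
cd D:\plugins\logstash-input-jdbc
gem build logstash-input-jdbc.gemspec

REM 2. Install the built gem from the Logstash home
cd "D:\elastic search\logstash-1.5.2"
bin\plugin install D:\plugins\logstash-input-jdbc\logstash-input-jdbc-<version>.gem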