I want some data in a postgresql database to be indexed to an elasticsearch index. To do so I decided to use Logstash.
I installed Logstash and JDBC.
I perform the following config:
input {
jdbc {
jdbc_connection_string => "jdbc:postgresql://localhost:5432/product_data"
jdbc_user => "postgres"
jdbc_password => "<my_password>"
jdbc_driver_class => "org.postgresql.Driver"
schedule => "* * * * *" # cronjob schedule format (see "Helpful Links")
statement => "SELECT * FROM public.vendor_product" # the PG command for retrieving the documents IMPORTANT: no semicolon!
jdbc_paging_enabled => "true"
jdbc_page_size => "300"
}
}
output {
# used to output the values in the terminal (DEBUGGING)
# once everything is working, comment out this line
stdout { codec => "json" }
# used to output the values into elasticsearch
elasticsearch {
hosts => ["localhost:9200"]
index => "vendorproduct"
document_id => "document_%id"
doc_as_upsert => true # upserts documents (e.g. if the document does not exist, creates a new record)
}
}
As a test I scheduled this to run every minute. To run my test I did:
logstash.bat -f logstash_postgre_ES.conf --debug
On my console I get:
...
[2022-04-04T16:10:07,065][DEBUG][logstash.agent ] Starting puma
[2022-04-04T16:10:07,081][DEBUG][logstash.agent ] Trying to start WebServer {:port=>9600}
[2022-04-04T16:10:07,190][DEBUG][logstash.api.service ] [api-service] start
[2022-04-04T16:10:07,834][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
C:/Users/Admin/Desktop/Elastic_search/logstash-6.8.23/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
[2022-04-04T16:10:09,577][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[2022-04-04T16:10:09,948][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2022-04-04T16:10:09,951][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2022-04-04T16:10:11,717][DEBUG][logstash.pipeline ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0xaf6cee2 sleep>"}
[2022-04-04T16:10:14,595][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
[2022-04-04T16:10:14,960][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2022-04-04T16:10:14,961][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2022-04-04T16:10:16,742][DEBUG][logstash.pipeline ] Pushing flush onto pipeline {:pipeline_id=>"main", :thread=>"#<Thread:0xaf6cee2 sleep>"}
[2022-04-04T16:10:19,604][DEBUG][logstash.instrument.periodicpoller.cgroup] One or more required cgroup files or directories not found: /proc/self/cgroup, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/cpu
The last part gets printed every 2 seconds or so; to me it looks as if Logstash is still waiting to start, even though I let it run for several minutes and it kept printing the same lines. In Kibana I checked whether my index had been created, but that wasn't the case.
The logstash-plain.log file shows the same output as the console.
Why is no index created and filled with the PostgreSQL data?
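For reference, a jdbc input for PostgreSQL normally also points at the driver jar explicitly, and the config above has no jdbc_driver_library at all. A minimal sketch of such an input (the jar path is a placeholder; everything else is taken from the question):
input {
  jdbc {
    # placeholder path to a locally downloaded PostgreSQL JDBC driver jar
    jdbc_driver_library => "C:/path/to/postgresql-42.x.x.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/product_data"
    jdbc_user => "postgres"
    jdbc_password => "<my_password>"
    statement => "SELECT * FROM public.vendor_product"
  }
}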
Related
I'm processing 500,000 records from a Postgres database into Elasticsearch using Logstash, but it takes 40 minutes to complete the process. I want to reduce the processing time, so I changed pipeline.batch.size: 1000 and pipeline.batch.delay: 50 in the logstash.yml file and increased the heap space from 1 GB to 2 GB in the jvm.options file, but the records still take the same amount of time to process.
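For reference, a minimal sketch of the settings described above, using the values from the question (logstash.yml and jvm.options live in the Logstash config directory):
# logstash.yml
pipeline.batch.size: 1000
pipeline.batch.delay: 50

# jvm.options
-Xms2g
-Xmx2g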
Conf file
input {
jdbc {
jdbc_driver_library => "C:\Users\Downloads\elk stack/postgresql-42.3.1.jar"
jdbc_driver_class => "org.postgresql.Driver"
jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
jdbc_user => "postgres"
jdbc_password => "postgres123"
statement => "SELECT * FROM jolap.order_desk_activation"
}
}
output {
elasticsearch {
hosts =>["http://localhost:9200/"]
index => "test-powerbi-transformed"
document_type => "_doc"
}
stdout {}
}
The problem is not the Logstash pipeline or the batch size. As suggested above, you need to pull the volume out of the database in less time.
This can be achieved using "parallel hints", which make the query much faster because it starts using more of the database server's processor cores (don't forget to consult your DBA before applying this). Once you start getting the volume of records in less time, you can scale your Logstash or tweak the pipeline settings.
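As an illustration only, an Oracle-style parallel hint would be embedded directly in the SQL of the jdbc statement; note that PostgreSQL (which the question uses) ignores such hints unless something like the pg_hint_plan extension is installed, so treat this as a sketch of the idea rather than a drop-in fix:
statement => "SELECT /*+ PARALLEL(4) */ * FROM jolap.order_desk_activation"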
Refer to this link.
#file:db.conf
input {
jdbc {
jdbc_driver_library => ""
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "jdbc:oracle:thin:#abcd.klm.uvw:1521/qtp1"
jdbc_user =>"user_wew"
jdbc_password => "password_wew"
statement => "select col1, col2, col3, col4, col5, col6, countid,max(version) as mv from master_object_table where version >:sql_last_value group by countid"
schedule => "* * * * *"
last_run_metadata_path => "C:/ES1/ELK_stack_7.4.2/logstash-7.4.2/logstash-7.4.2/Master_refresh_a.txt"
use_column_value => true
tracking_column => "version"
}
}
filter {
mutate {
convert => {
"countid" => "string"
}
}
}
output {
elasticsearch {
hosts => "localhost:9200"
index =>"refresh_index_a"
document_id =>"%{countid}"
#document_type => "_doc"
}
file {
path => "C:\\ES1\\ELK_stack_7.4.2\\logstash-7.4.2\\logstash-7.4.2\\bin\\logstashESRecordsIngestionDetails_refresh_a.txt"
codec => rubydebug
}
stdout { codec => rubydebug }
}
Above is my Logstash config file. I want to run this Logstash pipeline 24/7, and since it is ingesting live data into the ES index, how can I handle the case where the machine it runs on shuts down? Please suggest. Is there any way that, if one server goes down, the Logstash on another node will continue the work?
As per the documentation
Logstash is horizontally scalable and can form groups of nodes running
the same pipeline. Logstash’s adaptive buffering capabilities will
facilitate smooth streaming even through variable throughput loads. If
the Logstash layer becomes an ingestion bottleneck, simply add more
nodes to scale out. Here are a few general recommendations:
Beats should load balance across a group of Logstash nodes.
A minimum of two Logstash nodes are recommended for high availability.
It’s common to deploy just one Beats input per Logstash node, but multiple
Beats inputs can also be deployed per Logstash node to expose
independent endpoints for different data sources.
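As an illustration of the load-balancing recommendation above, a Beats shipper (Filebeat here; the hostnames are placeholders) can be pointed at several Logstash nodes so that the remaining node keeps receiving data if one goes down:
# filebeat.yml
output.logstash:
  hosts: ["logstash-node1:5044", "logstash-node2:5044"]
  loadbalance: true
With a jdbc input like the one above there is no Beats layer, so this only shows the general pattern from the documentation.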
Currently
I have completed the above task by using one log file and passing the data with Logstash into one index in Elasticsearch:
yellow open logstash-2016.10.19 5 1 1000807 0 364.8mb 364.8mb
What I actually want to do
I have the following log files, which are named according to year, month, and date:
MyLog-2016-10-16.log
MyLog-2016-10-17.log
MyLog-2016-10-18.log
MyLog-2016-11-05.log
MyLog-2016-11-02.log
MyLog-2016-11-03.log
I would like to tell Logstash to read them by year, month, and date and create the following indices:
yellow open MyLog-2016-10-16.log
yellow open MyLog-2016-10-17.log
yellow open MyLog-2016-10-18.log
yellow open MyLog-2016-11-05.log
yellow open MyLog-2016-11-02.log
yellow open MyLog-2016-11-03.log
Could I please have some guidance on how to go about doing this?
Thank you.
It is as simple as that:
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "MyLog-%{+YYYY-MM-DD}.log"
}
}
If the lines in the file contain datetime info, you should be using the date{} filter to set @timestamp from that value. If you do this, you can use the output format that @Renaud provided, "MyLog-%{+YYYY.MM.dd}".
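A minimal sketch of that date filter, assuming the timestamp has already been parsed into a field called logdate with the format shown (both the field name and the format string are assumptions):
filter {
  date {
    # reads the assumed "logdate" field and writes the parsed value to @timestamp
    match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
  }
}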
If the lines don't contain the datetime info, you can use the input's path for your index name, e.g. "%{path}". To get just the basename of the path:
mutate {
gsub => [ "path", ".*/", "" ]
}
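Putting the two together, the Elasticsearch output could then use the stripped path as the index name (sketch only; keep in mind that Elasticsearch index names have to be lowercase, which names like MyLog-2016-10-16.log are not):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # after the gsub above, %{path} is just the file's basename
    index => "%{path}"
  }
}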
Won't this configuration in the output section be sufficient for your purpose?
output {
elasticsearch {
embedded => false
host => localhost
port => 9200
protocol => http
cluster => 'elasticsearch'
index => "syslog-%{+YYYY.MM.dd}"
}
}
I have 2 Linux boxes set up: one box contains a component that generates logs and has Logstash installed to ship those logs, and the other box has Redis, Elasticsearch, and Logstash; here Logstash acts as the indexer that groks the data.
Now my problem is that on the first box the component generates a new log file every day, and the only difference in the log file names is the date.
like
counters-20151120-0.log
counters-20151121-0.log
counters-20151122-0.log
and so on. I have included the following in my Logstash shipper conf file:
file {
path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
type => "rg_counters"
}
And in my Logstash indexer, I have the following to catch those log files:
if [type] == "rg_counters" {
grok{
match => ["message", "%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{HOUR}:%{MINUTE}:%{SECOND}\s*(?<counters_raw_data>[0-9\-A-Z]*)\s*(?<counters_operation_type>[\-A-Z]*)\s*%{GREEDYDATA:counters_extradata}"]
}
}
output {
elasticsearch { host => ["elastichost1","elastichost1" ] port => "9200" protocol => "http" }
stdout { codec => rubydebug }
}
Please note that this is a working setup and other types of log files are getting transferred and processed successfully, so there is no issue with the setup itself.
The problem is: how do I process these log files, which have the date in their file names?
Any help here?
Thanks in advance!!
Based on the comments...
Instead of trying to use regexp patterns in your path:
path => "/opt/data/logs/counters-%{YEAR}%{MONTHNUM}%{MONTHDAY}*.log"
just use glob patterns:
path => "/opt/data/logs/counters-*.log"
Logstash will remember which files (inodes) it has seen before.
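A minimal sketch of the shipper input with the glob pattern (sincedb_path is an optional extra shown here only to make that bookkeeping explicit; its path is a placeholder):
input {
  file {
    path => "/opt/data/logs/counters-*.log"
    type => "rg_counters"
    # optional, placeholder path: where Logstash records which files/offsets it has already read
    # sincedb_path => "/var/lib/logstash/sincedb_counters"
  }
}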
I am using Logstash on Windows. I was not able to install the JDBC input plugin, so I downloaded the zip file manually and placed the logstash folder from the plugin into my logstash-1.5.2 folder.
The folder structure is "D:\elastic search\logstash-1.5.2\lib\logstash\inputs\jdbc.rb".
My conf file:
input {
jdbc {
jdbc_driver_library => "D:/elastic search/logstash-1.5.2/lib/mysql-connector-java-5.1.13-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
jdbc_user => "root"
jdbc_password => ""
statement => "SELECT * from data"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
embedded => true
index => "bike"
type => "bikeapp"
cluster =>"trailcluster"
protocol => "http"
port => "9200"
}
}
When I run Logstash I get the error:
D:\elastic search\logstash-1.5.2\bin>logstash -f logtest.conf
io/console not supported; tty will not be manipulated
jdbc plugin doesn't have a version. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:warn}
You are using a deprecated config setting "type" set in elasticsearch. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. You can achieve this same behavior with the new conditionals, like: `if [type] == "sometype" { elasticsearch { ... } }`. If you have any questions about this, please visit the #logstash channel on freenode irc. {:name=>"type", :plugin=><LogStash::Outputs::ElasticSearch -- ->, :level=>:warn}
LoadError: no such file to load -- sequel
require at org/jruby/RubyKernel.java:1072
require at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/polyglot-0.3.5/lib/polyglot.rb:65
prepare_jdbc_connection at D:/elastic search/logstash-1.5.2/lib/logstash/plugin_mixins/jdbc.rb:65
register at D:/elastic search/logstash-1.5.2/lib/logstash/inputs/jdbc.rb:144
start_inputs at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:148
each at org/jruby/RubyArray.java:1613
start_inputs at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:147
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:80
synchronize at org/jruby/ext/thread/Mutex.java:149
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/pipeline.rb:80
execute at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/agent.rb:150
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/runner.rb:91
call at org/jruby/RubyProc.java:271
run at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.2.2-java/lib/logstash/runner.rb:96
call at org/jruby/RubyProc.java:271
initialize at D:/elastic search/logstash-1.5.2/vendor/bundle/jruby/1.9/gems/stud-0.0.20/lib/stud/task.rb:12
After adding the jar file to the plugin folder, just go to that folder path in a CMD prompt and install the plugin into Logstash using the commands below.
Run in an installed Logstash :
Build your plugin gem
gem build logstash-input-jdbc.gemspec
Install the plugin from the Logstash home
bin/plugin install /your/local/plugin/logstash-input-jdbc.gem
Finally, start Logstash and test the plugin with the configuration you have been using.
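On Windows, the same two steps might look like this (the plugin source folder and the gem version are placeholders; the Logstash home matches the path from the question):
cd D:\path\to\logstash-input-jdbc
gem build logstash-input-jdbc.gemspec
cd "D:\elastic search\logstash-1.5.2"
bin\plugin.bat install D:\path\to\logstash-input-jdbc\logstash-input-jdbc-X.Y.Z.gem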