How to correctly implement elasticsearch on top of sql db datasource [duplicate] - elasticsearch

In one of my projects, I am planning to use Elasticsearch with MySQL.
I have successfully installed Elasticsearch and can manage indices in it on its own, but I don't know how to feed it from MySQL.
I have read a couple of documents, but I am still confused and don't have a clear picture.

As of ES 5.x, this is available out of the box through the Logstash JDBC input plugin.
It periodically imports data from the database and pushes it to the ES server.
You create a simple import file like the one below (which is also described here) and run it with Logstash. Logstash supports running this script on a schedule.
# file: contacts-index-logstash.conf
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "pswd"
    schedule => "* * * * *"
    jdbc_validate_connection => true
    jdbc_driver_library => "/path/to/latest/mysql-connector-java-jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    statement => "SELECT * from contacts where updatedAt > :sql_last_value"
  }
}
output {
  elasticsearch {
    # "hosts" replaces the old "host" and "protocol" options, which the 5.x output no longer accepts;
    # the port defaults to 9200 when omitted
    hosts => ["ES_NODE_HOST"]
    index => "contacts"
    document_type => "contact"
    document_id => "%{id}"
  }
}
# "* * * * *" -> run every minute
# sql_last_value is a built-in parameter; its initial value is Thursday, 1 January 1970,
# or 0 if use_column_value is true and tracking_column is set
You can download the MySQL connector jar from Maven here.
If the index does not exist in ES when this script runs, it will be created automatically, just as with a normal POST call to Elasticsearch.
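To make the incremental import more concrete, here is a minimal Python sketch of what one scheduled run amounts to: select the rows changed since the last run, then emit one bulk "index" action per row, keyed by the row id. It uses an in-memory SQLite table as a stand-in for MySQL, and it remembers the newest updatedAt value (the use_column_value style of tracking). The sample rows and the helper names fetch_changed and to_bulk_body are assumptions for illustration; they are not part of Logstash.

```python
import json
import sqlite3


def fetch_changed(conn, last_value):
    """Return rows whose updatedAt is newer than last_value (like :sql_last_value)."""
    cur = conn.execute(
        "SELECT id, name, updatedAt FROM contacts WHERE updatedAt > ? ORDER BY updatedAt",
        (last_value,),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def to_bulk_body(rows, index="contacts"):
    """Build an Elasticsearch _bulk request body (NDJSON), one index action per row."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical sample data standing in for the MySQL "contacts" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, updatedAt TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [(1, "Ann", "2020-01-01 10:00:00"), (2, "Bob", "2020-01-02 09:30:00")],
)

last_value = "1970-01-01 00:00:00"  # initial value, before any run has happened
rows = fetch_changed(conn, last_value)
body = to_bulk_body(rows)
if rows:
    last_value = rows[-1]["updatedAt"]  # remember the newest value for the next run
```

Because the document id is taken from the row id, re-importing an updated row overwrites the existing document instead of creating a duplicate, which is what document_id => "%{id}" achieves in the Logstash config.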

I was finally able to find the answer, so I am sharing my findings.
To use Elasticsearch with MySQL you need the Java Database Connectivity (JDBC) importer. With the JDBC drivers you can sync your MySQL data into Elasticsearch.
I am using Ubuntu 14.04 LTS. You need to install Java 8 to run Elasticsearch, as it is written in Java.
The following steps install Elasticsearch 2.2.0 and elasticsearch-jdbc 2.2.0. Please note that the two versions have to match.
After installing Java 8, install Elasticsearch 2.2.0 as follows:
# cd /opt
# wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.2.0/elasticsearch-2.2.0.deb
# sudo dpkg -i elasticsearch-2.2.0.deb
This installs Elasticsearch in /usr/share/elasticsearch/ and places its configuration files in /etc/elasticsearch.
Now let's do some basic configuration. The config file is /etc/elasticsearch/elasticsearch.yml.
You can open it for editing with
nano /etc/elasticsearch/elasticsearch.yml
and change the cluster name and node name.
For example:
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: servercluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: vps.server.com
#
# Add custom attributes to the node:
#
# node.rack: r1
Now save the file and start Elasticsearch:
/etc/init.d/elasticsearch start
To test whether ES is installed, run the following:
curl -XGET 'http://localhost:9200/?pretty'
If you get a response like the following, Elasticsearch is installed :)
{
"name" : "vps.server.com",
"cluster_name" : "servercluster",
"version" : {
"number" : "2.2.0",
"build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
"build_timestamp" : "2016-01-27T13:32:39Z",
"build_snapshot" : false,
"lucene_version" : "5.4.1"
},
"tagline" : "You Know, for Search"
}
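Since the two versions have to match, a small check like the following could compare the version reported by the curl call with the importer version before going further. This is only a sketch: it assumes that the importer version is the Elasticsearch version plus a fourth component (for example 2.2.0.0 for 2.2.0), which is my reading of the version numbers used in this answer rather than a documented rule, and the function name importer_matches is hypothetical.

```python
import json


def importer_matches(es_info_json, importer_version):
    """True if the importer's first three version components equal the ES version."""
    es_version = json.loads(es_info_json)["version"]["number"]
    return importer_version.split(".")[:3] == es_version.split(".")[:3]


# Trimmed copy of the kind of response Elasticsearch returns on its root URL.
sample = '{"name": "vps.server.com", "version": {"number": "2.2.0"}}'

ok = importer_matches(sample, "2.2.0.0")        # same major.minor.patch
mismatch = importer_matches(sample, "2.3.3.1")  # importer built for another ES release
```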
Now let's install elasticsearch-jdbc.
Download it from http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.2.0.0/elasticsearch-jdbc-2.2.0.0-dist.zip, extract it in /etc/elasticsearch/, and also create a "logs" folder there (the path of the logs folder should be /etc/elasticsearch/logs).
I have a MySQL database named "ElasticSearchDatabase" containing a table named "test" with the fields id, name, and email.
cd /etc/elasticsearch
Then run the following:
echo '{
"type":"jdbc",
"jdbc":{
"url":"jdbc:mysql://localhost:3306/ElasticSearchDatabase",
"user":"root",
"password":"",
"sql":"SELECT id as _id, id, name,email FROM test",
"index":"users",
"type":"users",
"autocommit":"true",
"metrics": {
"enabled" : true
},
"elasticsearch" : {
"cluster" : "servercluster",
"host" : "localhost",
"port" : 9300
}
}
}' | java -cp "/etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/lib/*" -"Dlog4j.configurationFile=file:////etc/elasticsearch/elasticsearch-jdbc-2.2.0.0/bin/log4j2.xml" "org.xbib.tools.Runner" "org.xbib.tools.JDBCImporter"
Now check whether the MySQL data was imported into ES:
curl -XGET http://localhost:9200/users/_search/?pretty
If all goes well, you will see all your MySQL data in JSON format.
If there is an error, you will find it in the /etc/elasticsearch/logs/jdbc.log file.
Caution:
In older versions of ES, the elasticsearch-river-jdbc plugin was used. It is completely deprecated in the latest versions, so do not use it.
I hope this saves you some time :)
Any further thoughts are appreciated.
Reference URL: https://github.com/jprante/elasticsearch-jdbc
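Hand-writing the JSON definition inside an echo command makes quoting mistakes easy. As a hedged alternative, the definition could be built as a Python dictionary and serialized with the json module, then piped to the importer the same way. The field names below are copied from the definition in this answer; the helper name build_definition and its defaults are assumptions for illustration.

```python
import json


def build_definition(db_url, user, password, sql, index, doc_type,
                     cluster, host="localhost", port=9300):
    """Build an importer definition with the same fields as the echo example."""
    return {
        "type": "jdbc",
        "jdbc": {
            "url": db_url,
            "user": user,
            "password": password,
            "sql": sql,
            "index": index,
            "type": doc_type,
            "autocommit": "true",
            "metrics": {"enabled": True},
            "elasticsearch": {"cluster": cluster, "host": host, "port": port},
        },
    }


definition = build_definition(
    db_url="jdbc:mysql://localhost:3306/ElasticSearchDatabase",
    user="root",
    password="",
    sql="SELECT id as _id, id, name,email FROM test",
    index="users",
    doc_type="users",
    cluster="servercluster",
)

# json.dumps guarantees valid JSON; this text is what would be sent to the importer's stdin.
text = json.dumps(definition, indent=2)
```

Note that the "cluster" value has to be the cluster.name set in elasticsearch.yml, and port 9300 is the transport port, not the 9200 HTTP port used by curl.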

The Logstash JDBC plugin will do the job:
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => "factweavers"
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/comp/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # run every minute (the cron expression needs five fields)
    schedule => "* * * * *"
    # our query
    statement => "SELECT * FROM testtable WHERE Date > :sql_last_value ORDER BY Date"
    # track the newest Date value seen instead of the time of the last run
    use_column_value => true
    # given in lowercase because the plugin lowercases column names by default
    tracking_column => "date"
    # assumes Date is a date/datetime column; the default tracking type is numeric
    tracking_column_type => "timestamp"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9200"
    index => "test-migrate"
    document_type => "data"
    document_id => "%{personid}"
  }
}
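The schedule option takes a cron-like expression, and a miscounted field is an easy mistake to make. The following Python sketch only checks the field count and names the five standard fields; it does not validate the values, and the scheduler used by Logstash may accept additional forms (such as an extra seconds field), so treat it as a rough sanity check. The function name parse_cron is hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr):
    """Split a five-field cron expression into named fields, or raise ValueError."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


every_minute = parse_cron("* * * * *")   # all five fields are wildcards
every_five = parse_cron("*/5 * * * *")   # step value in the minute field
```

An expression with only four fields, such as "* * * *", raises ValueError here.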

To make this simpler, I have created a PHP class to set up MySQL with Elasticsearch. Using my class, you can sync your MySQL data to Elasticsearch and also perform full-text search. You just need to set your SQL query and the class will do the rest for you.

Related

Error while connecting Logstash and Elasticsearch

I am very new to ELK. I installed ELK version 5.6.12 on a CentOS server. Elasticsearch and Kibana work fine, but I cannot connect Logstash to Elasticsearch.
I have set the environment variables as
export JAVA_HOME=/usr/local/jdk1.8.0_131
export PATH=/usr/local/jdk1.8.0_131/bin:$PATH
I run a simple test:
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost:9200 protocol => "http" port => "9200" } }'
I get this error:
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /etc/logstash/logstash.yml/log4j2.properties. Using default config which logs errors to the console
The simple example mentioned in the official Logstash documentation works, as follows:
$bin/logstash -e 'input { stdin { } } output { stdout {} }'
Hello
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
The stdin plugin is now waiting for input:
{
"@version" => "1",
"host" => "localhost",
"@timestamp" => 2018-11-01T04:44:58.648Z,
"message" => "Hello"
}
What could be the problem?

Logstash Elastic Cloud 401 Unauthorized error

Official logstash elastic cloud module
Official doc for starting with
My logstash.yml looks like:
cloud.id: "Test:testkey"
cloud.auth: "elastic:password"
Each line has 2 spaces in front and no space at the end, and the values are within double quotes.
This is all I have in logstash.yml, nothing else.
And I am getting:
[2018-08-29T12:33:52,112][WARN ][logstash.outputs.elasticsearch] Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"https://myserverurl:12345/", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::BadResponseCodeError, :error=>"Got response code '401' contacting Elasticsearch at URL 'https://myserverurl:12345/'"}
And the my_config_file_name.conf looks like:
input{jdbc{...jdbc here... This works, as I see data in windows console}}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => ["myserverurl:12345"]
    index => "my_index"
    # document_id => "%{brand}"
  }
}
What I am doing is running bin/logstash from the Windows cmd.
It loads data from the database that I have configured in the input of the conf file and then shows me the error. I want to index my data from MySQL to Elasticsearch on Cloud. I took the 14-day trial and created a test index for learning purposes, as I later have to deploy it.
My Pipeline looks like:
- pipeline.id: my_id
  path.config: "./config/conf_file_name.conf"
  pipeline.workers: 1
If the logs don't include sensitive data, I can also provide them.
Basically, I want to sync (on a schedule) my MySQL data with Elasticsearch on cloud, i.e. AWS.
The output should be:
elasticsearch {
  hosts => ["https://yourhost:yourport/"]
  user => "elastic"
  password => "password"
  # protocol => https
  # port => "yourport"
  index => "test_index"
  # document_id => "%{table_id}"
}
Lines starting with # are comments,
as stated at: Configuring logstash with elastic cloud docs
The documentation provided while deploying the app does not give a config for jdbc. With jdbc you also need to set user and password, even if they are defined in the settings file, i.e. logstash.yml.
Also, if you created your API key in the web UI, you will not be able to get the values needed to configure Logstash. You must use the Dev Tools console found at /app/dev_tools#/console with something like this:
POST /_security/api_key
{
"name": "logstash"
}
of which the output is something like:
{
"id": "<id value>",
"name": "logstash",
"api_key": "<api key>",
"encoded": "<encoded api key>"
}
And in your logstash pipeline config you use the values like this:
output {
  elasticsearch {
    cloud_id => "<cloud id>"
    api_key => "<id value>:<api key>"
    data_stream => true
    ssl => true
  }
  stdout { codec => rubydebug }
}
Note the combined "api_key" value separated by ":". Also, you can find the "cloud id" under your "Deployments" menu option.
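The relationship between the three values in the API response can be shown in a few lines of Python. As far as I know from the Elasticsearch API key documentation, the "encoded" field is simply the Base64 encoding of "id:api_key", while the Logstash api_key option expects the plain "id:api_key" form. The sample values and function names below are placeholders for illustration, not real credentials or library calls.

```python
import base64


def logstash_api_key(key_id, api_key):
    """Join id and api_key with a colon, the form used in the pipeline config."""
    return f"{key_id}:{api_key}"


def encoded_form(key_id, api_key):
    """Base64 of 'id:api_key', which is what the 'encoded' response field holds."""
    return base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")


# Placeholder values standing in for "<id value>" and "<api key>".
combined = logstash_api_key("my-id", "my-secret")
encoded = encoded_form("my-id", "my-secret")

# Decoding the encoded form gives back the combined value.
decoded = base64.b64decode(encoded).decode("utf-8")
```

So if only the "encoded" value was saved, Base64-decoding it recovers the string to put in the api_key option.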
I had the same issue in my dev environment. After scouring Google for hours, I understood that by default, when you install Logstash, X-Pack is installed. The doc https://www.elastic.co/guide/en/logstash/current/setup-xpack.html states:
"X-Pack is an Elastic Stack extension that provides security, alerting, monitoring, machine learning, pipeline management, and many other capabilities."
As I don't need X-Pack in my dev environment while streaming to Elasticsearch, I had to disable it by setting ilm_enabled to false in the output section of my indexing configuration file.
output {
elasticsearch {
hosts => [.. ]
ilm_enabled => false
}
}
The link below may help:
https://discuss.opendistrocommunity.dev/t/logstash-oss-with-non-removable-x-pack/655/3

Elasticsearch-6.2.4 logstash-6.2.4 migration error from MySQL to ElasticSearch

Hi, please have a look at the issue below. I have no idea how to fix it.
I've downloaded Elasticsearch 6.2.4 and Logstash 6.2.4 on a Windows machine.
I'm trying to import data from MySQL to Elasticsearch using Logstash, but I'm getting the error below:
C:\logstash-6.2.4\bin>logstash -f logstash.conf
Error: Could not find or load main class Files\Apache
Here are the steps I'm following:
First I started Elasticsearch, which is running perfectly on port 9200.
Then I added the script below to logstash.yml, which has all the migration instructions.
# ------------ MySQL to ElasticSearch -------------
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost:3306/MySQL_ElasticSearch_Demo"
# The user we wish to execute our statement as
jdbc_user => "root"
jdbc_password => "root"
# The path to our downloaded jdbc driver
jdbc_driver_library => "C:\mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
# our query
statement => "SELECT * FROM user"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
"hosts" => "localhost:9200"
"index" => "users"
"document_type" => "usersData"
}
}
I'm trying to run Logstash from the command prompt using the command below:
C:\logstash-6.2.4\bin>logstash -f logstash.conf
Error: Could not find or load main class Files\Apache
Any help will be much appreciated. Thanks in advance!

How to connect Cassandra with Logstash input?

Logstash.conf
input { tcp { port => 7199 } } output { elasticsearch { hosts => ["localhost"] } }
Cassandra is running on port 7199 and a JHipster application is running on localhost:8080.
We are unable to send data to Logstash from my_application.
No log4j2 file found.
I think you can use the JDBC plugin:
https://github.com/logstash-plugins/logstash-input-jdbc
input {
jdbc {
jdbc_connection_string => "jdbc:cassandra://hostname:XXXX" # Your port
jdbc_user => "user" # The user value
jdbc_password => "password" # The password
jdbc_driver_library => "$PATH/cassandra_driver.jar" # Jar path
jdbc_driver_class => "org.apache.cassandra.cql.jdbc.CassandraDriver" # Driver
statement => "SELECT * FROM keyspace.my_table" # Your query
}
}
I had the same issue. It was solved by downloading a Cassandra JDBC driver from DatabaseSchema.
Also, when you want to add the jar files, add them in
logstashFolder/logstash-core/lib/jar
There seems to be a bug in Logstash that makes it look only in this path for external jar files.
Also, if some jar files are duplicated, use the latest ones.

Logstash not starting up

I am trying to start Logstash 5.4 on my Linux RHEL 6 server, but I'm getting the following message:
WARNING: Default JAVA_OPTS will be overridden by the JAVA_OPTS defined in the environment. Environment JAVA_OPTS are -Xms1G .Xmx64G
Error: Could not find or load main class .Xmx64G
Following is my logstash.conf, in which I'm trying to ingest data from SQL Server:
input {
jdbc {
jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-5.1.42-bin.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://9.37.92.72:1433;databaseName=KaiserPermanente;"
jdbc_user => "sa"
jdbc_password => "passw0rd!"
statement => "select * from IEVDIncident ;"
}
}
output {
elasticsearch {
hosts => "http://localhost:9200"
index => "kaiserpermanente"
}
stdout { codec => json_lines }
}
Please tell me how I can resolve this. Thanks.
It seems you have an environment variable JAVA_OPTS with the value -Xms1G .Xmx64G, which overrides the Logstash defaults. Because .Xmx64G does not start with a dash, the java launcher reads it as the main class name. Change the variable to -Xms1G -Xmx64G, i.e. replace the . with -.
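A quick way to spot this kind of typo is to split the variable the way a shell would and flag every token that does not start with a dash. The Python sketch below does just that; the function name suspicious_java_opts is hypothetical, and the check is deliberately naive: options that take a separate argument (such as -cp followed by a path) would be flagged too.

```python
import shlex


def suspicious_java_opts(java_opts):
    """Return tokens that do not start with '-' and so would not be read as JVM options."""
    return [tok for tok in shlex.split(java_opts) if not tok.startswith("-")]


bad = suspicious_java_opts("-Xms1G .Xmx64G")   # the value from the warning message
good = suspicious_java_opts("-Xms1G -Xmx64G")  # the corrected value
```

With the value from the question, the flagged token is exactly the ".Xmx64G" that appears in "Could not find or load main class .Xmx64G".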
