Logstash from SQL Server to Elasticsearch character encoding problem - elasticsearch

I am using ELK stack v8.4.1 and trying to integrate data between SQL Server and Elasticsearch via Logstash. My source table includes Turkish characters (collation SQL_Latin1_General_CP1_CI_AS). When Logstash writes these characters to Elasticsearch, the Turkish characters are converted to '?'. For example, 'Şükrü' => '??kr?'. (I used ELK stack v7.* before and didn't have this problem.)
This is my config file:
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://my-sql-connection-info;encrypt=false;characterEncoding=utf8"
    jdbc_user => "my_sql_user"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "my_path\mssql-jdbc-11.2.0.jre11.jar"
    statement => [ "Select id,name,surname FROM ELK_Test" ]
    schedule => "*/30 * * * * *"
  }
  stdin {
    codec => plain { charset => "UTF-8" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test_index"
    document_id => "%{id}"
    user => "logstash_user"
    password => "password"
  }
  stdout { codec => rubydebug }
}
I tried with and without a filter that forces the encoding to UTF-8, but nothing changes.
filter {
  ruby {
    code => 'event.set("name", event.get("name").force_encoding(::Encoding::UTF_8))'
  }
}
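Note that force_encoding only relabels the bytes without transcoding them, so it cannot recover characters that were already replaced with '?' upstream. A minimal debugging sketch (my own addition, reusing the field names from the query above) that records which encoding Logstash actually sees on the field:

filter {
  ruby {
    code => '
      name = event.get("name")
      # Store the encoding Ruby reports for the field (e.g. "UTF-8" or "ASCII-8BIT")
      # so you can tell whether the corruption happens before or inside Logstash.
      event.set("name_encoding", name.encoding.name) unless name.nil?
    '
  }
}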
Below is my Elasticsearch result:
{
  "_index": "test_index",
  "_id": "2",
  "_score": 1,
  "_source": {
    "name": "??kr?",
    "@version": "1",
    "id": 2,
    "surname": "?e?meci",
    "@timestamp": "2022-09-16T13:02:00.254013300Z"
  }
}
BTW, the console output is correct:
{
    "name" => "Şükrü",
    "@version" => "1",
    "id" => 2,
    "surname" => "Çeşmeci",
    "@timestamp" => 2022-09-16T13:32:00.851877400Z
}
I tried to insert sample data from the Kibana Dev Tools console and the data was inserted without a problem. Can anybody help, please? What could be wrong? What can I check?

The solution was changing the JDK version. I replaced the bundled OpenJDK with Oracle JDK 19 and the problem was solved.
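For anyone hitting the same symptom: Logstash 8.x lets you point at a different JDK by setting the LS_JAVA_HOME environment variable instead of using the bundled one. Another workaround sometimes suggested for this kind of JDBC character corruption is pinning the JVM default charset; this is my own hedged sketch, not something confirmed by the asker:

# config/jvm.options (append this flag, then restart Logstash)
# Force the JVM default charset so strings coming back from the JDBC driver
# are interpreted as UTF-8 rather than the platform default.
-Dfile.encoding=UTF-8

If the charset flag alone doesn't help, switching the JDK as described above is the fix that worked here.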

Related

Nested objects in ElasticSearch with MySQL data

I am new to ES and am trying to load data from MySQL to Elasticsearch using the Logstash JDBC input.
In my situation I want to use column values as field names. Please see 'new' and 'hex' in the output data: I want the 'id' values to become field names.
MySQL data:
cid  id   color     new   hex      create          modified
1    101  100 euro  abcd  #86c67c  5/5/2016 15:48  5/13/2016 14:15
1    102  100 euro  1234  #fdf8ff  5/5/2016 15:48  5/13/2016 14:15
output needed:
{
  "_index": "colors_hexa",
  "_type": "colors",
  "_id": "1",
  "_version": 218,
  "found": true,
  "_source": {
    "cid": 1,
    "color": "100 euro",
    "new": {
      "101": "abcd",
      "102": "1234"
    },
    "hex": {
      "101": "#86c67c",
      "102": "#fdf8ff"
    },
    "created": "2016-05-05T10:18:51.000Z",
    "modified": "2016-05-13T08:45:30.000Z",
    "@version": "1",
    "@timestamp": "2016-05-14T01:30:00.059Z"
  }
}
Logstash config:
input {
  jdbc {
    jdbc_driver_library => "/etc/logstash/mysql/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
    jdbc_user => "root"
    jdbc_password => "*****"
    schedule => "* * * * *"
    statement => "select cid,id,color, new ,hexa_value ,created,modified from colors_hex_test order by cid"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}
output {
  elasticsearch {
    index => "colors_hexa"
    document_type => "colors"
    document_id => "%{cid}"
    hosts => "localhost:9200"
  }
}
Can anyone please help with a filter for this data? The 'new' and 'hex' fields are the issue here. I am trying to combine two records into a single document.
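No answer is recorded in this thread. One possible approach (my own sketch, not from the original post) is the logstash-filter-aggregate plugin, assuming rows arrive ordered by cid as in the query above and using its column names:

filter {
  aggregate {
    task_id => "%{cid}"
    code => "
      # collect all rows for one cid into a single map, keyed by the 'id' column
      map['cid']   ||= event.get('cid')
      map['color'] ||= event.get('color')
      map['new']   ||= {}
      map['hex']   ||= {}
      map['new'][event.get('id').to_s] = event.get('new')
      map['hex'][event.get('id').to_s] = event.get('hexa_value')
      event.cancel  # drop the per-row event; only the aggregated map is emitted
    "
    push_previous_map_as_event => true
    timeout => 5
  }
}

The aggregate filter only works correctly with a single pipeline worker (-w 1, or pipeline.workers: 1); otherwise rows for the same cid can be processed out of order.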

load array data mysql to ElasticSearch using logstash jdbc

Hi, I am new to ES and I am trying to load data from MySQL to Elasticsearch.
I am getting the error below when trying to load data in array format; any help is appreciated.
Here is the MySQL data; I need array data for the 'new' and 'hex' value columns:
cid  color     new   hex      create          modified
1    100 euro  abcd  #86c67c  5/5/2016 15:48  5/13/2016 14:15
1    100 euro  1234  #fdf8ff  5/5/2016 15:48  5/13/2016 14:15
Here is the Logstash config:
input {
  jdbc {
    jdbc_driver_library => "/etc/logstash/mysql/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
    jdbc_user => "root"
    jdbc_password => "*****"
    schedule => "* * * * *"
    statement => "select cid,color, new as 'cvalue.new',hexa_value as 'cvalue.hexa',created,modified from colors_hex_test order by cid"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}
output {
  elasticsearch {
    index => "colors_hexa"
    document_type => "colors"
    document_id => "%{cid}"
    hosts => "localhost:9200"
  }
}
I need array data for cvalue (new, hexa) like:
{
  "_index": "colors_hexa",
  "_type": "colors",
  "_id": "1",
  "_version": 218,
  "found": true,
  "_source": {
    "cid": 1,
    "color": "100 euro",
    "cvalue": {
      "new": "1234",
      "hexa_value": "#fdf8ff"
    },
    "created": "2016-05-05T10:18:51.000Z",
    "modified": "2016-05-13T08:45:30.000Z",
    "@version": "1",
    "@timestamp": "2016-05-14T01:30:00.059Z"
  }
}
This is the error I am getting while running Logstash:
"status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"Field name [cvalue.hexa] cannot contain '.'"}}}, :level=>:warn}
You can't use a field name containing a '.'. But you can try to add:
filter {
  mutate {
    rename => { "new" => "[cvalue][new]" }
    rename => { "hexa" => "[cvalue][hexa]" }
  }
}
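For the renames to find anything, the SQL statement also has to return plain column names rather than the dotted aliases (the 'cvalue.new' alias is what triggers the mapper_parsing_exception in the first place). A hedged sketch of a consistent pair, assuming the aliases are changed to new and hexa:

statement => "select cid, color, new, hexa_value as hexa, created, modified from colors_hex_test order by cid"

filter {
  mutate {
    # move the flat columns under the nested cvalue object
    rename => {
      "new"  => "[cvalue][new]"
      "hexa" => "[cvalue][hexa]"
    }
  }
}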

failed action with response of 400, dropping action - logstash

I'm trying to use Elasticsearch + Logstash (JDBC input).
My Elasticsearch seems to be OK. The problem seems to be in Logstash (the elasticsearch output plugin).
My logstash.conf:
input {
  jdbc {
    jdbc_driver_library => "C:\DEV\elasticsearch-1.7.1\plugins\elasticsearch-jdbc-1.7.1.0\lib\sqljdbc4.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=dbTest"
    jdbc_user => "user"
    jdbc_password => "pass"
    schedule => "* * * * *"
    statement => "SELECT ID_RECARGA as _id FROM RECARGA where DT_RECARGA >= '2015-09-04'"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
  }
}
filter {
}
output {
  elasticsearch {
    host => "localhost"
    protocol => "http"
    index => "test_index"
    document_id => "%{objectId}"
  }
  stdout { codec => rubydebug }
}
When I run Logstash:
C:\DEV\logstash-1.5.4\bin>logstash -f logstash.conf
I get this result:
failed action with response of 400, dropping action: ["index", {:_id=>"%{objectId}", :_index=>"parrudo", :_type=>"logs", :_routing=>nil}, #<LogStash::Event:0x5d4c2abf @metadata_accessors=#<LogStash::Util::Accessors:0x900a6e7 @store={"retry_count"=>0}, @lut={}>, @cancelled=false, @data={"_id"=>908026, "@version"=>"1", "@timestamp"=>"2015-09-04T21:19:00.322Z"}, @metadata={"retry_count"=>0}, @accessors=#<LogStash::Util::Accessors:0x4929c6a4 @store={"_id"=>908026, "@version"=>"1", "@timestamp"=>"2015-09-04T21:19:00.322Z"}, @lut={"type"=>[{"_id"=>908026, "@version"=>"1", "@timestamp"=>"2015-09-04T21:19:00.322Z"}, "type"], "objectId"=>[{"_id"=>908026, "@version"=>"1", "@timestamp"=>"2015-09-04T21:19:00.322Z"}, "objectId"]}>>] {:level=>:warn}
{
    "_id" => 908026,
    "@version" => "1",
    "@timestamp" => "2015-09-04T21:19:00.322Z"
}
{
    "_id" => 908027,
    "@version" => "1",
    "@timestamp" => "2015-09-04T21:19:00.322Z"
}
{
    "_id" => 908028,
    "@version" => "1",
    "@timestamp" => "2015-09-04T21:19:00.323Z"
}
{
    "_id" => 908029,
    "@version" => "1",
    "@timestamp" => "2015-09-04T21:19:00.323Z"
}
In Elasticsearch the index was created but doesn't have any docs.
I'm using Windows and MS SQL Server.
Elasticsearch version: 1.7.1
Logstash version: 1.5.4
Any idea?
Thanks!
All right... after looking for these errors in the Elasticsearch log I realized that the problem was the alias in my SQL statement:
statement => "SELECT ID_RECARGA as _id FROM RECARGA where DT_RECARGA >= '2015-09-04'"
For some reason I don't understand it wasn't being processed, so I just removed the alias from my query and everything seems to be right now.
thanks!
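A plausible reading of the 400, for anyone landing here: aliasing the column to _id puts a reserved metadata name inside the document body, and the output's document_id => "%{objectId}" never resolved anyway because no objectId field existed (the warning shows the literal "%{objectId}"). A hedged sketch of a consistent pairing between the query and the output:

# alias the key column to an ordinary field name...
statement => "SELECT ID_RECARGA as objectId FROM RECARGA where DT_RECARGA >= '2015-09-04'"

# ...and reference that same field in the elasticsearch output:
document_id => "%{objectId}"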

Dynamic elasticsearch index_type using logstash

I am working on storing data in Elasticsearch, using Logstash, from a RabbitMQ server.
My Logstash command looks like:
logstash -e 'input {
  rabbitmq {
    exchange => "redwine_log"
    key => "info.redwine"
    host => "localhost"
    durable => true
    user => "guest"
    password => "guest"
  }
}
output {
  elasticsearch {
    host => "localhost"
    index => "redwine"
  }
}
filter {
  json {
    source => "message"
    remove_field => [ "message" ]
  }
}'
But I need Logstash to put the data into different types in the Elasticsearch cluster. What I mean by type is:
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "logstash-2014.11.19",
"_type": "logs",
"_id": "ZEea8HBOSs-QwH67q1Kcaw",
"_score": 1,
"_source": {
"context": [],
"level": 200,
"level_name": "INFO",
This is part of a search result, where you can see that Logstash by default creates a type named "logs" (_type: "logs"). In my project I need the type to be dynamic and created based on the input data.
For example: my input data looks like
{
  "data": "some data",
  "type": "type_1"
}
and I need Logstash to create a new type in Elasticsearch with the name "type_1".
I tried using grok but couldn't meet this specific requirement with it.
It worked for me this way:
elasticsearch {
  host => "localhost"
  index_type => "%{type}"
}
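For reference, index_type is the option name in the Logstash 1.x elasticsearch output used here. On later Logstash releases the equivalent option (hedged, since this thread only covers 1.x) is document_type, which also accepts a field reference; mapping types themselves were removed in Elasticsearch 7+:

elasticsearch {
  hosts => ["localhost:9200"]
  index => "redwine"
  # document_type replaces index_type on newer Logstash versions;
  # on Elasticsearch 7+ types no longer exist, so this is usually omitted.
  document_type => "%{type}"
}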

Logstash and ElasticSearch filter Date @timestamp issue

I'm trying to index some data from a file into Elasticsearch by using Logstash.
If I'm not using the date filter to replace @timestamp, everything works very well, but when I'm using the filter I do not get all the data.
I can't figure out why there is a difference between the Logstash command line output and Elasticsearch in the @timestamp value.
Logstash conf
filter {
  mutate {
    replace => {
      "type" => "dashboard_a"
    }
  }
  grok {
    match => [ "message", "%{DATESTAMP:Logdate} \[%{WORD:Severity}\] %{JAVACLASS:Class} %{GREEDYDATA:Stack}" ]
  }
  date {
    match => [ "Logdate", "dd-MM-yyyy hh:mm:ss,SSS" ]
  }
}
Logstash Command line trace
{
    "@timestamp" => "2014-08-26T08:16:18.021Z",
    "message" => "26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
    "@version" => "1",
    "host" => "bts10d1",
    "path" => "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
    "type" => "dashboard_a",
    "Logdate" => "26-08-2014 11:16:18,021",
    "Severity" => "DEBUG",
    "Class" => "com.fnx.snapshot.mdb.SnapshotMDB",
    "Stack" => " - SnapshotMDB Ctor is called\r"
}
ElasticSearch result
{
  "_index": "logstash-2014.08.28",
  "_type": "dashboard_a",
  "_id": "-y23oNeLQs2mMbyz6oRyew",
  "_score": 1,
  "_source": {
    "@timestamp": "2014-08-28T14:31:38.753Z",
    "message": "15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
    "@version": "1",
    "host": "bts10d1",
    "path": "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
    "type": "dashboard_a",
    "tags": ["_grokparsefailure"]
  }
}
Please make sure all your logs are in the same format!
You can see in the Logstash command line trace that the log is:
26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r
But in Elasticsearch the log is:
15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r
The two logs have different times and their formats are not the same. The second one has no date information, so it causes the grok parsing error (note the _grokparsefailure tag); with no Logdate field extracted, the date filter never runs and @timestamp stays at the ingest time. Go check the original logs, or provide a sample of the original logs for further discussion if all of them are supposed to be in the same format.
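A quick way to isolate the offending lines while debugging (my own sketch, not part of the original answer) is to route events whose message failed the grok match to a separate output:

output {
  if "_grokparsefailure" in [tags] {
    # print only the events whose message did not match the grok pattern,
    # so the malformed source lines can be inspected directly
    stdout { codec => rubydebug }
  }
}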
