Nested objects in ElasticSearch with MySQL data

Nested objects in ElasticSearch with MySQL data - elasticsearch

I am new to ES and trying to load data from MYSQL to Elasticsearch using logstash jdbc.
In my situation I want to use column values as field names, Please see new & hex in output data, I want 'id' values as field names.
Mysql data
cid id color new hex create modified
1 101 100 euro abcd #86c67c 5/5/2016 15:48 5/13/2016 14:15
1 102 100 euro 1234 #fdf8ff 5/5/2016 15:48 5/13/2016 14:15
output needed
{
"_index": "colors_hexa",
"_type": "colors",
"_id": "1",
"_version": 218,
"found": true,
"_source": {
"cid": 1,
"color": "100 euro",
"new" : {
"101": "abcd",
"102": "1234",
}
"hex" : {
"101": "#86c67c",
"102": "#fdf8ff",
}
"created": "2016-05-05T10:18:51.000Z",
"modified": "2016-05-13T08:45:30.000Z",
"#version": "1",
"#timestamp": "2016-05-14T01:30:00.059Z"
}
}
Logstash config:
input {
jdbc {
jdbc_driver_library => "/etc/logstash/mysql/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
jdbc_user => "root"
jdbc_password => "*****"
schedule => "* * * * *"
statement => "select cid,id,color, new ,hexa_value ,created,modified from colors_hex_test order by cid"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
}
}
output {
elasticsearch {
index => "colors_hexa"
document_type => "colors"
document_id => "%{cid}"
hosts => "localhost:9200"
}
}
Can anyone please help with filter tag for this data, 'new' & 'hex' fields are the issue here. I m trying to convert two records to single document.

Related

Logstash from SQL Server to Elasticsearch character encoding problem

I am using ELK stack v8.4.1 and trying to integrate data between SQL Server and Elasticsearch via Logstash. My source table includes Turkish characters (collation SQL_Latin1_General_CP1_CI_AS). When Logstash writes these characters to Elasticsearch, it converts the Turkish characters to '?'. For example 'Şükrü' => '??kr?'. (I used before ELK stack v7.* and didn't have that problem)
This is my config file:
input {
jdbc
{
jdbc_connection_string => "jdbc:sqlserver://my-sql-connection-info;encrypt=false;characterEncoding=utf8"
jdbc_user => "my_sql_user"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_driver_library => "my_path\mssql-jdbc-11.2.0.jre11.jar"
statement => [ "Select id,name,surname FROM ELK_Test" ]
schedule => "*/30 * * * * *"
}
stdin {
codec => plain { charset => "UTF-8"}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "test_index"
document_id => "%{id}"
user => "logstash_user"
password => "password"
}
stdout { codec => rubydebug }
}
I tried with and without filter to force encoding to UTF-8 but doesn't change.
filter {
ruby {
code => 'event.set("name", event.get("name").force_encoding(::Encoding::UTF_8))'
}
}
Below is my Elasticsearch result:
{
"_index": "test_index",
"_id": "2",
"_score": 1,
"_source": {
"name": "??kr?",
"#version": "1",
"id": 2,
"surname": "?e?meci",
"#timestamp": "2022-09-16T13:02:00.254013300Z"
}
}
BTW console output results are correct.
{
"name" => "Şükrü",
"#version" => "1",
"id" => 2,
"surname" => "Çeşmeci",
"#timestamp" => 2022-09-16T13:32:00.851877400Z
}
I tried to insert sample data from Kibana Dev Tool and the data was inserted without a problem. Does anybody help, please? What can be wrong? What can I check?

The solution is changing the JDK version. I changed the embedded OpenJDK with Oracle JDK-19 and the problem was solved.

Filter jdbc data in Logstash

In my DB, I've data in below format:
But in ElasticSearch I want to push data with respect to item types. So each record in ElasticSearch will list all item names & its values per item type.
Like this:
{
"_index": "daily_needs",
"_type": "id",
"_id": "10",
"_source": {
"item_type: "10",
"fruits": "20",
"veggies": "32",
"butter": "11",
}
}
{
"_index": "daily_needs",
"_type": "id",
"_id": "11",
"_source": {
"item_type: "11",
"hair gel": "50",
"shampoo": "35",
}
}
{
"_index": "daily_needs",
"_type": "id",
"_id": "12",
"_source": {
"item_type: "12",
"tape": "9",
"10mm screw": "7",
"blinker fluid": "78",
}
}
Can I achieve this in Logstash?
I'm new into Logstash, but as per my understanding it can be done in filter. But I'm not sure which filter to use or do I've to create a custom filter for this.
Current conf example:
input {
jdbc {
jdbc_driver_library => "ojdbc6.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
jdbc_connection_string => "myjdbc-configs"
jdbc_user => "dbuser"
jdbc_password => "dbpasswd"
schedule => "* * * * *"
statement => "SELECT * from item_table"
}
}
filter {
## WHAT TO WRITE HERE??
}
output {
elasticsearch {
hosts => [ "http://myeshost/" ]
index => "myindex"
}
}
Kindly suggest. Thank you.

You can achieve this using aggregate filter plugin. I have not tested below, but should give you an idea.
filter {
aggregate {
task_id => "%{item_type}" #
code => "
map['Item_type'] = event.get('Item_type')
map[event.get('Item_Name')] = map[event.get('Item_Value')]
"
push_previous_map_as_event => true
timeout => 3600
timeout_tags => ['_aggregatetimeout']
}
if "aggregated" not in [tags] {
drop {}
}
}
Important Caveats for using aggregate filter:
The sql query MUST order the results by Item_Type, so the events are not out of order.
Column names in sql query should match the column names in the filter map[]
You should use ONLY ONE worker thread for aggregations otherwise events may be processed out of sequence and unexpected results will occur.

Logstash 2.3.4 How to load nested document in elasticsearch using logstash-jdbc plugin

I am currently using elasticsearch 2.3.4 and logstash 2.3.4 to load relational data from Oracle db into my elasticsearch index using logstash-jdbc plugin. As suggested in various posts, I am using aggregate filter for this. Still I am not able to load the inner nested object in the document. The values are not getting mapped to fields and are displayed as NULL.
I have two related entities with following data:
CREATE TABLE DEPARTMENT (
id NUMBER PRIMARY KEY,
name VARCHAR2(4000) NOT NULL
)
CREATE TABLE EMPLOYEE (
id NUMBER PRIMARY KEY,
name VARCHAR2(4000) NOT NULL,
departmentid NUMBER,
CONSTRAINT EMPLOYEE_FK FOREIGN KEY (departmentid) REFERENCES DEPARTMENT(id)
)
insert into DEPARTMENT values (1, 'dept1');
insert into DEPARTMENT values (2, 'dept2');
insert into DEPARTMENT values (3, 'dept3');
insert into DEPARTMENT values (4, 'dept4');
insert into EMPLOYEE values (1, 'emp1', 1);
insert into EMPLOYEE values (2, 'emp2', 1);
insert into EMPLOYEE values (3, 'emp3', 1);
insert into EMPLOYEE values (4, 'emp4', 2);
insert into EMPLOYEE values (5, 'emp5', 2);
insert into EMPLOYEE values (6, 'emp6', 3);`
Here is my mapping:
{
"mappings": {
"departments": {
"properties": {
"id": {
"type": "integer"
},
"deptName": {
"type": "string"
},
"employee_details": {
"type": "nested",
"properties": {
"empId": {
"type": "integer"
},
"empName": {
"type": "string"
}
}
}
}
}
}
}
And this is my logstash configuration:
input{
jdbc{
jdbc_validate_connection => true
jdbc_connection_string => "jdbc:oracle:thin:#host:port:db"
jdbc_user => "user"
jdbc_password => "pwd"
jdbc_driver_library => "../vendor/jar/ojdbc14.jar"
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
statement => "SELECT
department.id AS id,
department.name AS deptName,
employee.id AS empId,
employee.name AS empName
FROM department LEFT JOIN employee
ON department.id = employee.departmentid
ORDER BY id"
}
}
filter{
aggregate {
task_id => "%{id}"
code => "
map['id'] = event['id']
map['deptName'] = event['deptName'] #solution - deptName should be in smaller case and other fields too.
map['employee_details'] ||= []
map['employee_details'] << {'empId' => event['empId], 'empName' => event['empName'] }
"
push_previous_map_as_event => true
timeout => 5
timeout_tags => ['aggregated']
}
}
output{
stdout{ codec => rubydebug }
elasticsearch{
action => "index"
index => "my_index"
document_type => "departments"
document_id => "%{id}"
hosts => "localhost:9200"
}
}
When i perform a XGET on all documents:
curl -XGET 'localhost:9200/my_index/_search/?pretty=true&q=:
The values are not mapped to fields and displayed as NULL:
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "departments",
"_id": "2",
"_score": 1,
"_source": {
"id": 2,
"deptName": null,
"employee_details": [
{
"empId": null,
"empName": null
},
{
"empId": null,
"empName": null
}
],
"#version": "1",
"#timestamp": "2019-05-14T10:47:33.477Z",
"tags": [
"aggregated"
]
}
},
{
"_index": "my_index",
"_type": "departments",
"_id": "4",
"_score": 1,
"_source": {
"id": 4,
"deptname": "dept4",
"empid": null,
"empname": null,
"#version": "1",
"#timestamp": "2019-05-14T10:47:33.367Z",
"deptName": null,
"employee_details": [
{
"empId": null,
"empName": null
}
]
}
},
{
"_index": "my_index",
"_type": "departments",
"_id": "1",
"_score": 1,
"_source": {
"id": 1,
"deptName": null,
"employee_details": [
{
"empId": null,
"empName": null
},
{
"empId": null,
"empName": null
},
{
"empId": null,
"empName": null
}
],
"#version": "1",
"#timestamp": "2019-05-14T10:47:33.477Z",
"tags": [
"aggregated"
]
}
},
{
"_index": "my_index",
"_type": "departments",
"_id": "3",
"_score": 1,
"_source": {
"id": 3,
"deptName": null,
"employee_details": [
{
"empId": null,
"empName": null
}
],
"#version": "1",
"#timestamp": "2019-05-14T10:47:33.492Z",
"tags": [
"aggregated"
]
}
}
]
}
}
rubydebug suggests the values are set to 'nil'. Could anyone please help me with what I am doing wrong here?
Here is a snippet from stdout for document with id = 1:
{
"id" => 1.0,
"deptname" => "dept1",
"empid" => 1.0,
"empname" => "emp1",
"#version" => "1",
"#timestamp" => "2019-05-14T12:32:14.272Z"
}
{
"id" => 1.0,
"deptname" => "dept1",
"empid" => 2.0,
"empname" => "emp2",
"#version" => "1",
"#timestamp" => "2019-05-14T12:32:15.272Z"
}
{
"id" => 1.0,
"deptname" => "dept1",
"empid" => 3.0,
"empname" => "emp3",
"#version" => "1",
"#timestamp" => "2019-05-14T12:32:15.272Z"
}
{
"id" => 1.0,
"deptName" => nil,
"employee_details" => [
[0] {
"empId" => nil,
"empName" => nil
},
[1] {
"empId" => nil,
"empName" => nil
},
[2] {
"empId" => nil,
"empName" => nil
}
],
"#version" => "1",
"#timestamp" => "2019-05-14T12:32:15.381Z",
"tags" => [
[0] "aggregated"
]
}

Following code works for me .
input {
jdbc{
jdbc_validate_connection => true
jdbc_connection_string => "----/employees"
jdbc_user => "---"
jdbc_password => "--"
jdbc_driver_library => "/home/ilsa/mysql-connector-java-5.1.36-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
statement => "SELECT
e.emp_no as employee_number,
birth_date, first_name, last_name, gender, hire_date, t.title AS titlename,
t.from_date AS titlefrom_date, t.to_date AS titleto_date, d.dept_no AS departmentnumber,
ds.dept_name AS departmentname, d.from_date AS departmentfrom_date, d.to_date AS departmentto_date
FROM employees e
LEFT JOIN(titles t, dept_emp d, departments ds)
ON(e.emp_no = t.emp_no AND e.emp_no = d.emp_no AND d.dept_no = ds.dept_no AND t.from_date < d.to_date AND t.to_date > d.from_date)
ORDER BY e.emp_no ASC"
}
}
filter {
aggregate {
task_id => "%{employee_number}"
code => "
map['employee_number'] = event.get('employee_number')
map['birth_date'] = event.get('birth_date')
map['first_name'] = event.get('first_name')
map['last_name'] = event.get('last_name')
map['gender'] = event.get('gender')
map['hire_date'] = event.get('hire_date')
map['roles'] ||= []
map['roles'] << {
'title.name' => event.get('titlename'),
'title.from_date' => event.get('titlefrom_date'),
'title.to_date' => event.get('titleto_date'),
'department.number' => event.get('departmentnumber'),
'department.name' => event.get('departmentname'),
'department.from_date' => event.get('departmentfrom_date'),
'department.to_date' => event.get('departmentto_date')
}
event.cancel()"
push_previous_map_as_event => true
timeout => 30
}
}
output {
stdout{ codec => rubydebug }
elasticsearch{
action => "index"
index => "employees"
document_type => "employee"
document_id => "%{employee_number}"
hosts => "localhost:9200"
}
}

You can also try to make use of jdbc streaming in logstash filter plugin.
Check this post
Inserting Nested Objects using Logstash
For example, I am taking Stackoverflow Posts and Users as an example. Here Post is parent table and it is associated with Users table on OwnerUserId. So my plugin configuration is
input {
jdbc {
jdbc_driver_library => "/usr/share/logstash/javalib/mssql-jdbc-8.2.2.jre11.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://host.docker.internal;database=StackOverflow2010;user=pavan;password=pavankumar#123"
jdbc_user => "pavan"
jdbc_password => "pavankumar#123"
statement => "select top 500 * from StackOverflow2010.dbo.Posts p "
}
}
filter{
jdbc_streaming {
jdbc_driver_library => "/usr/share/logstash/javalib/mssql-jdbc-8.2.2.jre11.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_connection_string => "jdbc:sqlserver://host.docker.internal;database=StackOverflow2010;user=pavan;password=pavankumar#123"
jdbc_user => "pavan"
jdbc_password => "pavankumar#123"
statement => "select * from StackOverflow2010.dbo.Users u where u.Id = :owneruserid"
parameters => {"owneruserid" => "owneruserid"}
target => "user_details"
}
}
output {
elasticsearch {
hosts => ["http://elasticsearch:9200", "http://elasticsearch:9200"]
index => "stackoverflow_top_user"
}
stdout {
codec => rubydebug
}
}

Load data from MySQL TO Elasticsearch

Hi I am Using the following scirp file in lostash 2.X version I have over 186000 records in MySQL database table,but while running this .conf file only one document is loading in elastic search index.
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://localhost/elasticsearch"
jdbc_user => "root"
jdbc_password => "empower"
#jdbc_validate_connection => true
jdbc_driver_library => "/home/wtc082/Documents/com.mysql.jdbc_5.1.5.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
statement => "SELECT * FROM index_part_content_local;"
#schedule => "* * * * *"
#codec => "json"
}
}
output {
elasticsearch {
index => "mysqltest"
document_type => "mysqltest_type"
document_id => "%{id}"
hosts => "localhost:9200"
}
}
When i use this query only one document is index
GET mysqltest/_search
{
"query": {
"match_all": {}
}
}
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "mysqltest",
"_type": "mysqltest_type",
"_id": "%{id}",
"_score": 1,
"_source": {
"partnum": "",
"property1": "",
"property2": "",
"color": "",
"size": "",
"dim": "",
"thumburl": "",
"catid": "6575",
"subcatid": "3813",
"termid": "31999",
"longdesc": "<ul><li>Equipment and Parts<li>GC32-XD Parts<li>D/V Lock Plate Screw</ul>",
"hier1desc": "Heavy Duty Tools / Equipment",
"hier2desc": "Other Heavy Duty Equipment",
"hier3desc": "Hose Crimping Equipment & Accessories",
"aaiabrandid": "BBSC",
"aaiabrandname": "Gates",
"brandimageurl": "es-logo-sm.jpg",
"linecode": "GAT",
"descrp": "D/V Lock Plate Screw",
"#version": "1",
"#timestamp": "2016-12-20T09:16:40.075Z"
}
}
]
}
}

Ok, as you can see the ID of your document is the verbatim value "%{id}", which means that apparently you don't have any id column in your database and all records from your database are indexed under the same document id, hence why you only see one document.
In your elasticsearch output, you need to make sure to use a field that is the primary key of your table
document_id => "%{PRIMARY_KEY}"
Fix that and that will work.

load array data mysql to ElasticSearch using logstash jdbc

Hi i am new to ES and i m trying to load data from 'MYSQL' to 'Elasticsearch'
I am getting below error when trying to loadata in array format, any help
Here is mysql data, need array data for new & hex value columns
cid color new hex create modified
1 100 euro abcd #86c67c 5/5/2016 15:48 5/13/2016 14:15
1 100 euro 1234 #fdf8ff 5/5/2016 15:48 5/13/2016 14:15
Here us the logstash config
input {
jdbc {
jdbc_driver_library => "/etc/logstash/mysql/mysql-connector-java-5.1.39-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
jdbc_user => "root"
jdbc_password => "*****"
schedule => "* * * * *"
statement => "select cid,color, new as 'cvalue.new',hexa_value as 'cvalue.hexa',created,modified from colors_hex_test order by cid"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
}
}
output {
elasticsearch {
index => "colors_hexa"
document_type => "colors"
document_id => "%{cid}"
hosts => "localhost:9200"
Need array data for cvalue (new, hexa) like
{
"_index": "colors_hexa",
"_type": "colors",
"_id": "1",
"_version": 218,
"found": true,
"_source": {
"cid": 1,
"color": "100 euro",
"cvalue" : {
"new": "1234",
"hexa_value": "#fdf8ff",
}
"created": "2016-05-05T10:18:51.000Z",
"modified": "2016-05-13T08:45:30.000Z",
"#version": "1",
"#timestamp": "2016-05-14T01:30:00.059Z"
}
}
this is the error i m getting while running logstash
"status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"Field name [cvalue.hexa] cannot contain '.'"}}}, :level=>:warn}

You cant give a field name with .. But you can try to add:
filter {
mutate {
rename => { "new" => "[cvalue][new]" }
rename => { "hexa" => "[cvalue][hexa]" }
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Nested objects in ElasticSearch with MySQL data - elasticsearch

Related

Logstash from SQL Server to Elasticsearch character encoding problem

Filter jdbc data in Logstash

Logstash 2.3.4 How to load nested document in elasticsearch using logstash-jdbc plugin

Load data from MySQL TO Elasticsearch

load array data mysql to ElasticSearch using logstash jdbc

Categories

Resources