Elasticsearch: how to use elasticsearch-river-jdbc to keep in sync with MySQL

I am working with elasticsearch-river-jdbc. When I update the MySQL database, I want my Elasticsearch data to update automatically. This is the code I used to create the river:
curl -XPUT '127.0.0.1:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/myapp_development",
        "user" : "root",
        "password" : "",
        "sql" : "select * from users",
        "autocommit" : "true"
    }
}'
But when I update MySQL, nothing in the Elasticsearch data changes.
So what am I doing wrong?

Simply add 'schedule' as documented here:
elasticsearch-river-jdbc#time-scheduled-execution-of-jdbc-river
{
    "type" : "jdbc",
    "schedule" : "0 0-59 0-23 ? * *",
    "jdbc" : [ {
        "url" : "jdbc:mysql://localhost:3306/ZZZZ",
        "user" : "root",
        "password" : "ZZZ",
        "sql" : "Select …"
    } ]
}
(This one will get updates every minute)
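Applied to the river from the question, the full creation call would look roughly like this (a sketch that only adds the schedule parameter from above to the question's own jdbc block; the cron expression fires every minute):
curl -XPUT '127.0.0.1:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "schedule" : "0 0-59 0-23 ? * *",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/myapp_development",
        "user" : "root",
        "password" : "",
        "sql" : "select * from users"
    }
}'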

Related

Elasticsearch JDBC importer not importing entry correctly

Having the following mapping:
curl -XPUT 'localhost:9200/borrador' -d '{
    "mappings": {
        "item": {
            "dynamic": "strict",
            "properties" : {
                "body" : { "type": "string" },
                "source_id" : { "type": "integer" }
            }
        }
    }
}'
I'm trying to import my DB to Elasticsearch using the Elasticsearch-JDBC importer.
This is the script I'm using:
#!/bin/sh
bin=/usr/share/elasticsearch/elasticsearch-jdbc-2.1.1.2/bin
lib=/usr/share/elasticsearch/elasticsearch-jdbc-2.1.1.2/lib
echo "Indexando base de datos..."
echo '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mydbip/mydbname",
        "user" : "username",
        "password" : "pw",
        "sql" : "select source_id, body, id as _id from table_name",
        "index" : "borrador",
        "type" : "item"
    }
}' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter
Most of the rows of the table are indexed correctly, but the following row from that DB is giving me an error and it's not indexing correctly:
This is the error that shows up:
[ERROR][org.xbib.elasticsearch.helper.client.BulkTransportClient][elasticsearch[importer][listener][T#1]]
bulk [957] failed with 1 failed items, failure message = failure in bulk execution:
[3499]: index [borrador], type [item], id [14327140], message [MapperParsingException[failed to parse [body]]; nested: IllegalArgumentException[unknown property [records]];]
As you can see, this particular row contains a JSON-formatted string ({"format":"MS Excel","price":"750","records":"577","recordType":"records"}<!-- com -->) instead of the plain strings the other, correctly indexed entries have.
What is happening? I would like to store that value as a normal string. Is it a problem with the mapping, i.e. is the value being read as JSON or something? Even if I remove the "dynamic": "strict", or the entire mapping, it still gives me the error. Thanks in advance.
By default the JDBC importer tries to detect JSON strings in your data and will parse them. You need to modify the configuration of your importer with the detect_json setting and set it to false:
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mydbip/mydbname",
        "user" : "username",
        "password" : "pw",
        "sql" : "select source_id, body, id as _id from table_name",
        "index" : "borrador",
        "type" : "item",
        "detect_json" : false   <--- add this
    }
}
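After re-running the importer with detect_json disabled, the problematic row should index, and the body field should come back as one plain string. A quick check, reusing the document id from the error message above:
curl 'localhost:9200/borrador/item/14327140?pretty'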

ES update giving bad request

I am trying to update the following document:
$ curl http://192.168.0.108:9200/customer/customer/AVGbTCQ2XioLLLZULBAD?pretty
{
"_index" : "customer",
"_type" : "customer",
"_id" : "AVGbTCQ2XioLLLZULBAD",
"_version" : 15,
"found" : true,
"_source":{"age":0,"n":"Abhishek Gupta","id":"AVGbTCQ2XioLLLZULBAD"}
}
According to the update API, my update request is:
$ curl -XPOST http://192.168.0.108:9200/customer/customer/AVGbTCQ2XioLLLZULBAD/_update -d '{
> "script": {
> "inline": "ctx._source.n = name",
> "params": {
> "name": "Elas"
> }
> }
> }'
{"error":"ActionRequestValidationException[Validation Failed: 1: script or doc is missing;]","status":400}
Why am I getting a bad request?
ES version details:
$ curl http://192.168.0.108:9200/
{
"status" : 200,
"name" : "Vance Astro",
"cluster_name" : "dexter-elasticsearch",
"version" : {
"number" : "1.7.1",
"build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
"build_timestamp" : "2015-07-29T09:54:16Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
},
"tagline" : "You Know, for Search"
}
Scripted updates in ES 1.7.1 are slightly different from ES 2.x: there is no inline parameter. Instead, you write it like this:
curl -XPOST http://192.168.0.108:9200/customer/customer/AVGbTCQ2XioLLLZULBAD/_update -d '{
    "script": "ctx._source.n = name",
    "params": {
        "name": "Elas"
    }
}'
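If the update succeeds, fetching the document again (the same GET as at the top of the question) should show the new value of n and an incremented _version:
curl http://192.168.0.108:9200/customer/customer/AVGbTCQ2XioLLLZULBAD?pretty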

Elasticsearch CSV river module not working

I am trying to index a CSV file in Elasticsearch:
curl -XPUT localhost:9200/_river/my_csv_river/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/tmp",
        "filename_pattern" : ".*\\.csv$",
        "first_line_is_header" : "true",
        "field_separator" : ",",
        "field_id" : "_id"
    },
    "index" : {
        "index" : "my_csv_data_1",
        "type" : "csv_type_1",
        "bulk_size" : 100,
        "bulk_threshold" : 10
    }
}'
After indexing, searching http://localhost:9200/my_csv_data_1/_search gives:
{
    "error": "IndexMissingException[[my_csv_data_1] missing]",
    "status": 404
}
Any thoughts, or did I miss anything?
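A couple of quick checks, assuming the CSV river plugin is actually installed on the node: read the river definition back to confirm it was stored, and list the indices to see whether the river created anything at all. If my_csv_data_1 never appears, the river most likely never started, and the node log should say why.
curl 'localhost:9200/_river/my_csv_river/_meta?pretty'
curl 'localhost:9200/_cat/indices?v'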

Elasticsearch jdbc sort by _id

I want to make my results sortable by _id so I need to add
"_id" : {
"index" : "not_analyzed",
"store" : true
}
to my index, correct?
Here is my script for the JDBC feeder:
#!/bin/bash
# Directory of the JDBC importer -> important, adjust this!
export JDBC_IMPORTER_HOME=~/Downloads/elasticsearch-jdbc-1.6.0.0
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://xxxxxxxxx",
        "user" : "xxxx",
        "password" : "xxxxxxx",
        "index" : "xxxxxxxx",
        "type" : "xxxxxx",
        "sql" : "SELECT xxxxxxxxx"
    }
}' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter
I added
"index_settings" : {
"_id" : {
"index" : "not_analyzed",
"store" : true
}
}
after
"password" : "xxxxxxx",
"index" : "xxxxxxxx",
but it doesn't work.
Where do I add this setting?
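One hedged suggestion, assuming this is the xbib elasticsearch-jdbc 1.6.x feeder described in its README: index_settings is for index-level settings, while field mappings for the target type go into a separate type_mapping parameter inside the "jdbc" object. A sketch of where that would sit (the xxxxxx placeholders are the elided values from the question, and the type name under type_mapping must match the "type" value):
echo '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://xxxxxxxxx",
        "user" : "xxxx",
        "password" : "xxxxxxx",
        "index" : "xxxxxxxx",
        "type" : "xxxxxx",
        "sql" : "SELECT xxxxxxxxx",
        "type_mapping" : {
            "xxxxxx" : {
                "_id" : { "index" : "not_analyzed", "store" : true }
            }
        }
    }
}' | java ... (piped to org.xbib.tools.JDBCImporter exactly as in the script above)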

Two Elasticsearch JDBC rivers: index data count does not match database data count

The table agent_task_base has 12000000 rows
curl -XPUT 'localhost:9200/_river/myjdbc_river1/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "...",
        "user" : "...",
        "password" : "...",
        "sql" : "select * from agenttask_base where status=1",
        "index" : "my_jdbc_index1",
        "type" : "my_jdbc_type1"
    }
}'
curl -XPUT 'localhost:9200/_river/myjdbc_river2/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "...",
        "user" : "...",
        "password" : "..",
        "sql" : "select * from agenttask_base where status=1",
        "index" : "my_jdbc_index2",
        "type" : "my_jdbc_type2"
    }
}'
The two rivers execute together, but the final result is:
my_jdbc_index1 has 10000000+ rows
my_jdbc_index2 has 11000000+ rows
Why?
There is an issue on the elasticsearch-river-jdbc GitHub (#143) which describes the same problem you described above. Try reducing the max bulk requests and let Elasticsearch index again.
For more details see: https://github.com/jprante/elasticsearch-river-jdbc/issues/143#issuecomment-29550301
I hope this helps.
I just figured this out after much trial and error, as I was experiencing the same issue. What worked for me was defining the JDBC river parameters bulk_size and max_bulk_requests:
curl -XPUT 'localhost:9200/_river/myjdbc_river1/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "...",
        "user" : "...",
        "password" : "...",
        "sql" : "select * from agenttask_base where status=1",
        "index" : "my_jdbc_index1",
        "type" : "my_jdbc_type1",
        "bulk_size" : 160,
        "max_bulk_requests" : 5
    }
}'
A bulk size of 160 seemed to be my magic number; a bulk size of 500 was too high for my local install and would return a java.sql exception about closing the database connection, but was fine in my web server environment.
Bottom line: you can tinker with these numbers to tune performance, but by setting them you should see your index doc count match your SQL result count.
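A simple way to verify that the counts line up after a run (index names as in the question) is to compare the _count API against the SQL row count:
curl 'localhost:9200/my_jdbc_index1/_count?pretty'
curl 'localhost:9200/my_jdbc_index2/_count?pretty'
# compare with: SELECT COUNT(*) FROM agenttask_base WHERE status=1;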
