Elasticsearch JDBC importer not importing entry correctly - jdbc

I have the following mapping:
curl -XPUT 'localhost:9200/borrador' -d '{
  "mappings": {
    "item": {
      "dynamic": "strict",
      "properties" : {
        "body" : { "type": "string" },
        "source_id" : { "type": "integer" }
      }
    }
  }
}'
I'm trying to import my DB to Elasticsearch using the Elasticsearch-JDBC importer.
This is the script I'm using:
#!/bin/sh
bin=/usr/share/elasticsearch/elasticsearch-jdbc-2.1.1.2/bin
lib=/usr/share/elasticsearch/elasticsearch-jdbc-2.1.1.2/lib
echo "Indexando base de datos..."
echo '{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mydbip/mydbname",
"user" : "username",
"password" : "pw",
"sql" : "select source_id, body, id as _id from table_name",
"index" : "borrador",
"type" : "item"
}
}' | java \
-cp "${lib}/*" \
-Dlog4j.configurationFile=${bin}/log4j2.xml \
org.xbib.tools.Runner \
org.xbib.tools.JDBCImporter
Most of the rows in the table are indexed correctly, but one particular row gives an error and does not index correctly.
This is the error that shows up:
[ERROR][org.xbib.elasticsearch.helper.client.BulkTransportClient][elasticsearch[importer][listener][T#1]]
bulk [957] failed with 1 failed items, failure message = failure in
bulk execution:
[3499]: index [borrador], type [item], id [14327140], message [MapperParsingException[failed to parse [body]]; nested:
IllegalArgumentException[unknown property [records]];]
As you can see, this particular row contains a JSON-formatted string ({"format":"MS Excel","price":"750","records":"577","recordType":"records"}<!-- com -->) instead of the plain string that the other, correctly indexed entries have.
What is happening? I would like to store that value as a plain string. Is it a mapping problem, with the value being read as JSON? Even if I remove "dynamic": "strict", or the entire mapping, I still get the error. Thanks in advance.

By default, the JDBC importer tries to detect JSON strings in your data and parses them. You need to modify your importer configuration and set detect_json to false:
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mydbip/mydbname",
    "user" : "username",
    "password" : "pw",
    "sql" : "select source_id, body, id as _id from table_name",
    "index" : "borrador",
    "type" : "item",
    "detect_json" : false    <--- add this
  }
}

Related

Getting unknown setting [index._id] error while adding data to Elasticsearch

I have created a mapping eventlog in Elasticsearch 5.1.1. I added it successfully; however, while adding data under it, I am getting an illegal_argument_exception with the reason unknown setting [index._id]. The result of getting the indices is: yellow open eventlog sX9BYIcOQLSKoJQcbn1uxg 5 1 0 0 795b 795b
My mapping is:
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "datetime" : { "type" : "date" },
        "ip" : { "type" : "ip" },
        "country" : { "type" : "keyword" },
        "state" : { "type" : "keyword" },
        "city" : { "type" : "keyword" }
      }
    }
  }
}
and I am adding the data using
curl -u elastic:changeme -XPUT 'http://localhost:8200/eventlog' -d '{"index":{"_id":1}}
{"datetime":"2016-03-31T12:10:11Z","ip":"100.40.135.29","country":"US","state":"NY","city":"Highland"}';
If I don't include the {"index":{"_id":1}} line, I get an illegal_argument_exception with the reason unknown setting [index.apiKey].
The problem arose from sending the data from the command line as a string. Keeping the data in a JSON file and sending it as binary solved it. The correct command is:
curl -u elastic:changeme -XPUT 'http://localhost:8200/eventlog/_bulk?pretty' --data-binary @eventlogs.json
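For reference, a bulk file such as eventlogs.json pairs each action line with the document source on the next line and must end with a trailing newline. A minimal sketch, assuming a hypothetical type name logs (any concrete type will inherit the _default_ mapping above):
{"index":{"_type":"logs","_id":1}}
{"datetime":"2016-03-31T12:10:11Z","ip":"100.40.135.29","country":"US","state":"NY","city":"Highland"}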

Kibana Create Index Pattern : strange behaviour of wildcard

I have just one index in Elasticsearch, named aa-bb-YYYY-MM.
Documents in this index contain a field I want to use as the date field.
Those documents have been inserted from a custom script (not using logstash).
When creating the index pattern in kibana:
If I enter aa-bb-*, the date field is not found.
If I enter aa-*, the date field is not found.
If I enter aa*, the date field is found, and I can create the index pattern.
But I really need to group indices by the first two "dimensions". I tried using "_" instead of "-", with the same result.
Any idea what is going on?
It's working for me. I'm on the latest build of the 5.0 release branch (just past the beta1 release). I don't know what version you're on.
I created this index and added 2 docs:
curl --basic -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09' -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "test" : {
      "properties" : {
        "date" : { "type" : "date" },
        "action" : {
          "type" : "text",
          "analyzer" : "standard",
          "fields" : {
            "raw" : { "type" : "text", "index" : "not_analyzed" }
          }
        },
        "myid" : { "type" : "integer" }
      }
    }
  }
}'
curl -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09/test/1' -d '{
  "date" : "2015-08-23T00:01:00",
  "action" : "start",
  "myid" : 1
}'
curl -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09/test/2' -d '{
  "date" : "2015-08-23T14:02:30",
  "action" : "stop",
  "myid" : 1
}'
and I was able to create the index pattern with aa-bb-*
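If the pattern still doesn't pick up the field, it can help to check how the date field is actually mapped in the indices the pattern matches. A quick check, assuming the index naming and credentials above:
curl -XGET 'http://elastic:changeme@localhost:9200/aa-bb-*/_mapping/field/date?pretty'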

How to tell ElasticSearch to create nested fields

I'm putting data into ES and checking the mapping that is created.
I'm executing this:
curl -XPOST 'http://localhost:9200/testnested2/type1/0' -d '{
"p1": ["1","2","3","4"],
"users" : {
"first" : "John",
"last" : "Sm11ith"
}
}'
and this is the schema it created:
{
  "testnested2" : {
    "mappings" : {
      "type1" : {
        "properties" : {
          "p1" : { "type" : "string" },
          "users" : {
            "properties" : {
              "first" : { "type" : "string" },
              "last" : { "type" : "string" }
            }
          }
        }
      }
    }
  }
}
I was wondering whether it's possible to tell it that "users" is nested, or whether I have to create the mapping myself.
I would like ES to create a schema like this:
curl -XPOST http://180.5.5.93:9200/testnested3 -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "type1" : {
      "properties" : {
        "propiedad1" : { "type" : "string" },
        "users" : {
          "type" : "nested",
          "include_in_parent" : true,
          "properties" : {
            "first" : { "type" : "string" },
            "last" : { "type" : "string" }
          }
        }
      }
    }
  }
}'
By default, the dynamic mapping feature of Elasticsearch will map users as an object instead of nested.
If you want to tune this behavior, you have to explicitly define a users attribute as nested in either:
the type1 mapping
the _default_ mapping of your index. This way, for any type created in the index, the users attribute will automatically be set to nested (see the default mapping documentation), as sketched below
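A minimal sketch of the second option, assuming a hypothetical new index named testnested4 (existing indices keep their current mappings):
curl -XPUT 'http://localhost:9200/testnested4' -d '{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "users" : {
          "type" : "nested",
          "include_in_parent" : true
        }
      }
    }
  }
}'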

Two elasticsearch jdbc river, index data count not match database data count

The table agent_task_base has 12000000 rows
curl -XPUT 'localhost:9200/river/myjdbc_river1/meta' -d '{
"type" : "jdbc",
"jdbc" : {
"url" : "...",
"user" : "...",
"password" : "...",
"sql" : "select * from agenttask_base where status=1",
"index" : "my_jdbc_index1",
"type" : "my_jdbc_type1"
}
}'
curl -XPUT 'localhost:9200/river/myjdbc_river2/meta' -d '{
"type" : "jdbc",
"jdbc" : {
"url" : "...",
"user" : "...",
"password" : "..",
"sql" : "select * from agenttask_base where status=1",
"index" : "my_jdbc_index2",
"type" : "my_jdbc_type2"
}
}'
The two rivers execute together, but the final result is:
my_jdbc_index1 has 10000000+ rows
my_jdbc_index2 has 11000000+ rows
Why?
There is an issue on GitHub for elasticsearch-river-jdbc (#143) which describes the same problem you describe above. Try reducing the max bulk requests and let Elasticsearch index again.
For more details see: https://github.com/jprante/elasticsearch-river-jdbc/issues/143#issuecomment-29550301
I hope this will help.
I just figured this out after much trial and error, as I was experiencing the same issue.
What worked for me was defining the JDBC river parameters bulk_size and max_bulk_requests:
curl -XPUT 'localhost:9200/river/myjdbc_river1/meta' -d '{
"type" : "jdbc",
"jdbc" : {
"url" : "...",
"user" : "...",
"password" : "...",
"sql" : "select * from agenttask_base where status=1",
"index" : "my_jdbc_index1",
"type" : "my_jdbc_type1",
"bulk_size" : 160,
"max_bulk_requests" : 5
}
}'
A bulk size of 160 seemed to be my magic number; a bulk size of 500 was too high for my local install and would return a java.sql exception closing the database connection, but was fine for my web server environment.
The bottom line is that you can tinker with these numbers to tune performance, but by setting them you should see your index doc count match your SQL result count.

Issues when replicating from couchbase bucket to elasticsearch index?

This issue seems to be related to using XDCR in Couchbase. If I have the following simple objects:
1: { "name" : "Mark", "age" : 30}
2: { "name" : "Bill", "age" : "forty"}
and set up an Elasticsearch index with the following mapping:
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
  "couchbaseDocument" : {
    "dynamic_templates": [
      {
        "store_generic": {
          "match": "*",
          "mapping": {
            "store": "yes"
          }
        }
      }
    ]
  }
}'
I can then add the two objects to this index using the REST API
curl -XPUT localhost:9200/test/couchbaseDocument/1 -d '{
"name" : "Mark",
"age" : 30
}'
curl -XPUT localhost:9200/test/couchbaseDocument/2 -d '{
"name" : "Bill",
"age" : "forty"
}'
They are now both searchable (despite the fact that "age" is a long for one and a string for the other).
If, however, I store these two objects in a Couchbase bucket (rather than straight into Elasticsearch) and set up XDCR, the first object replicates fine but the second fails with the following error:
failed to execute bulk item (index) index {[test][couchbaseDocument][2], source[{"doc":{"name":"Bill","age":"forty"},"meta":{"id":"2","rev":"8-00000b9360d0a0bf0000000000000000","expiration":0,"flags":0}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [doc.age]
I can't figure out why it works via the REST API but not when Couchbase replicates the same objects.
I followed the answer and used the following mapping to get things to work via XDCR:
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
  "couchbaseDocument" : {
    "properties" : {
      "doc" : {
        "properties" : {
          "name" : { "type" : "string", "store" : "yes" },
          "age" : { "type" : "string", "store" : "yes" }
        }
      }
    }
  }
}'
Now all the objects (despite having different types for the same fields) are replicated and searchable. I don't think there was any need to include the dynamic_templates approach I initially tried. The mapping works.
It's something you have to solve on the Elasticsearch side.
If the same field name can contain both numeric and string values, you should create a mapping first which says that age is a string.
That way Elasticsearch won't try to auto-guess the type for this field.
Hope this helps.
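As an illustrative check (reusing the index and type names from the example above), you can verify the mapping that is actually in place once it has been defined explicitly:
curl -XGET 'http://localhost:9200/test/_mapping/couchbaseDocument?pretty'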
