Issues when replicating from Couchbase bucket to Elasticsearch index?

This issue seems to be related to using XDCR in Couchbase. Suppose I have the following simple objects:
1: { "name" : "Mark", "age" : 30}
2: { "name" : "Bill", "age" : "forty"}
and set up an Elasticsearch index like this:
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"dynamic_templates": [
{
"store_generic": {
"match": "*",
"mapping": {
"store": "yes"
}
}
}
]
}
}'
I can then add the two objects to this index using the REST API
curl -XPUT localhost:9200/test/couchbaseDocument/1 -d '{
"name" : "Mark",
"age" : 30
}'
curl -XPUT localhost:9200/test/couchbaseDocument/2 -d '{
"name" : "Bill",
"age" : "forty"
}'
They are now both searchable (despite the fact that "age" is a long for one and a string for the other).
If, however, I store these two objects in a Couchbase bucket (rather than sending them straight to Elasticsearch) and set up XDCR, the first object replicates fine but the second fails with the following error:
failed to execute bulk item (index) index {[test][couchbaseDocument][2], source[{"doc":{"name":"Bill","age":"forty"},"meta":{"id":"2","rev":"8-00000b9360d0a0bf0000000000000000","expiration":0,"flags":0}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [doc.age]
I can't figure out why it works via the REST API but not when couchbase replicates the same objects.
I followed the answer and used the following mapping to get things to work via XDCR
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"properties" : {
"doc": {
"properties" : {
"name" : {"type" : "string", "store" : "yes"},
"age" : {"type" : "string", "store" : "yes"}
}
}
}
}
}'
Now all the objects (despite having different types for the same fields) are replicated and searchable. I don't think there was any need to include the dynamic_templates approach I initially tried. The mapping works.

This is something you have to solve on the Elasticsearch side.
If the same field name can contain both numeric and string values, create a mapping first that declares age as a string, so that Elasticsearch won't try to guess the type of this field.
Hope this helps.
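The error log in the question shows the replicated source wrapped in an envelope with "doc" and "meta" keys, which is why the failing field is doc.age rather than age. Below is a minimal Python sketch of that shape and of a mapping body that declares the fields under "doc". The envelope layout is inferred from the log line only, and the helper names wrap_for_replication and string_mapping_under_doc are hypothetical; they are not part of any Couchbase or Elasticsearch API.

```python
import json


def wrap_for_replication(doc, doc_id):
    """Mimic the envelope seen in the error log: the document sits under "doc"."""
    # Only "id" is reproduced; the log also shows rev, expiration and flags.
    return {"doc": doc, "meta": {"id": doc_id}}


def string_mapping_under_doc(type_name, fields):
    """Build a mapping body declaring the given fields as strings under "doc"."""
    field_props = {name: {"type": "string", "store": "yes"} for name in fields}
    return {type_name: {"properties": {"doc": {"properties": field_props}}}}


if __name__ == "__main__":
    wrapped = wrap_for_replication({"name": "Bill", "age": "forty"}, "2")
    # The field Elasticsearch sees is wrapped["doc"]["age"], i.e. doc.age.
    print(json.dumps(wrapped))
    print(json.dumps(string_mapping_under_doc("couchbaseDocument", ["name", "age"]), indent=2))
```

The second helper produces the same structure as the mapping that the asker reports working, so a top-level "age" mapping alone would not cover the replicated documents.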

Related

Getting unknown setting [index._id] error while adding data to Elasticsearch

I have created a mapping eventlog in Elasticsearch 5.1.1. I added it successfully; however, while adding data under it, I get an Illegal_argument_exception with reason unknown setting [index._id]. The result of listing the indices is: yellow open eventlog sX9BYIcOQLSKoJQcbn1uxg 5 1 0 0 795b 795b
My mapping is:
{
"mappings" : {
"_default_" : {
"properties" : {
"datetime" : {"type": "date"},
"ip" : {"type": "ip"},
"country" : { "type" : "keyword" },
"state" : { "type" : "keyword" },
"city" : { "type" : "keyword" }
}
}
}
}
and I am adding the data using
curl -u elastic:changeme -XPUT 'http://localhost:8200/eventlog' -d '{"index":{"_id":1}}
{"datetime":"2016-03-31T12:10:11Z","ip":"100.40.135.29","country":"US","state":"NY","city":"Highland"}';
If I don't include the {"index":{"_id":1}} line, I get Illegal_argument_exception with reason unknown setting [index.apiKey].
The problem arose from sending the data on the command line as a string. Keeping the data in a JSON file and sending it as binary solved it. The correct command is:
curl -u elastic:changeme -XPUT 'http://localhost:8200/eventlog/_bulk?pretty' --data-binary @eventlogs.json
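The _bulk endpoint expects newline-delimited JSON: an action line followed by a source line for each document, with a newline after the last line. As a rough sketch, a file like eventlogs.json could be generated as follows. The helper name build_bulk_body is hypothetical, and the action line simply mirrors the {"index":{"_id":1}} line from the question.

```python
import json


def build_bulk_body(docs):
    """Return a bulk body: one action line and one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    event = {
        "datetime": "2016-03-31T12:10:11Z",
        "ip": "100.40.135.29",
        "country": "US",
        "state": "NY",
        "city": "Highland",
    }
    with open("eventlogs.json", "w", encoding="utf-8") as handle:
        handle.write(build_bulk_body([(1, event)]))
```

Sending the file with --data-binary keeps those newlines intact, whereas curl's -d strips newlines when it reads data from a file.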

Kibana Create Index Pattern : strange behaviour of wildcard

I have just one index in Elasticsearch, named aa-bb-YYYY-MM.
Documents in this index contain a field I want to use as the date field.
Those documents have been inserted by a custom script (not using Logstash).
When creating the index pattern in Kibana:
If I enter aa-bb-*, the date field is not found.
If I enter aa-*, the date field is not found.
If I enter aa*, the date field is found, and I can create the index pattern.
But I really need to group indexes by the first two "dimensions". I tried using "_" instead of "-", with the same result.
Any idea what is going on?
It's working for me. I'm on the latest build of the 5.0 release branch (just past the beta1 release). I don't know what version you're on.
I created this index and added 2 docs:
curl --basic -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09' -d '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"test" : {
"properties" : {
"date" : { "type" : "date"},
"action" : {
"type" : "text",
"analyzer" : "standard",
"fields": {
"raw" : { "type" : "text", "index" : "not_analyzed" }
}
},
"myid" : { "type" : "integer"}
}
}
}
}'
curl -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09/test/1' -d '{
"date" : "2015-08-23T00:01:00",
"action" : "start",
"myid" : 1
}'
curl -XPUT 'http://elastic:changeme@localhost:9200/aa-bb-2016-09/test/2' -d '{
"date" : "2015-08-23T14:02:30",
"action" : "stop",
"myid" : 1
}'
and I was able to create the index pattern with aa-bb-*

Elasticsearch mapping not working as expected

Having the following mapping:
curl -X PUT 'localhost:9200/cambio_indice?pretty=true' -d '{
"mappings" : {
"el_tipo" : {
"properties" : {
"name" : { "type" : "string" },
"age" : { "type" : "integer" },
"read" : { "type" : "integer" }
}}}}'
If I index the following document, it works perfectly even though it doesn't match the mapping (read is missing); ES doesn't complain.
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/1?pretty=true' -d '{
"name" : "Eduardo Inda",
"age" : 23
}'
And if I add the following entry, it also works.
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/2?pretty=true' -d '{
"jose" : "stuff",
"ramon" : 23,
"garcia" : 1
}'
It seems that the mapping is not taking effect on the documents I'm adding. Am I doing something wrong when I try to map my type?
This is the default behaviour of Elasticsearch and is desirable in most of the cases. But for your case, if you do not want to allow indexing of fields not defined in your mapping, you need to update the mapping and set its "dynamic" property to "strict". Basically, your mapping definition should look like below:
{
"mappings": {
"el_tipo": {
"dynamic": "strict",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
},
"read": {
"type": "integer"
}
}
}
}
}
Then if you try to index fields like "jose", "ramon" or "garcia", Elasticsearch will throw an exception with an appropriate message saying that the dynamic addition of these fields is prohibited.
As per the ES documentation:
By default, Elasticsearch provides automatic index and mapping when data is added under an index that has not been created before. In other words, data can be added into Elasticsearch without the index and the mappings being defined a priori. This is quite convenient since Elasticsearch automatically adapts to the data being fed to it - moreover, if certain entries have extra fields, Elasticsearch schema-less nature allows them to be indexed without any issues.
So new fields added by you will get automatically added to your mappings.
See this for more info
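To make the effect of "dynamic": "strict" concrete, here is a small client-side illustration in Python. It is only a sketch of the idea (top-level fields only), not how Elasticsearch implements the check; the function names and the error wording are hypothetical.

```python
def unmapped_fields(properties, document):
    """Return the document's top-level fields that the mapping does not declare."""
    return sorted(field for field in document if field not in properties)


def check_strict(properties, document):
    """Reject a document with undeclared fields, as a strict mapping would."""
    extra = unmapped_fields(properties, document)
    if extra:
        raise ValueError("fields not declared in the mapping: %s" % ", ".join(extra))


if __name__ == "__main__":
    el_tipo = {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "read": {"type": "integer"},
    }
    # A missing field ("read") is fine; only undeclared fields are a problem.
    check_strict(el_tipo, {"name": "Eduardo Inda", "age": 23})
    print(unmapped_fields(el_tipo, {"jose": "stuff", "ramon": 23, "garcia": 1}))
```

The first document from the question passes because omitting a mapped field is always allowed; the second one is the kind of document a strict mapping rejects.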

Issues mapping a document with ElasticSearch

I have a document that I was hoping to store in ElasticSearch and run queries against, but I think the document structure is possibly badly formed, and as such I won't be able to run effective queries.
The document is trying to be generic and as such, has a set of repeating structures.
For example:
description : [
{ type : "port", value : 1234 },
{ type : "ipaddress", value : "192.168.0.1" },
{ type : "path", value : "/app/index.jsp app/hello.jsp" },
{ type : "upsince", value : "2014-01-01 12:00:00" },
{ type : "location", value : "-40, 70" }
]
Note: I've simplified the example, as in the real document the repeating structure has about 7 fields, of which 3 fields will explicitly identify the "type".
From the above example I can't see how I can write a mapping, as the "value" could be any of:
Integer
IP Address
A field that needs to be tokenized by only whitespace
A datetime
A GEO Point
The only solution I can see is to convert the document into another format that maps more easily onto ElasticSearch?
This case is somewhat described here: http://www.found.no/foundation/beginner-troubleshooting/#keyvalue-woes
You can't have different kinds of values in the same field. What you can do is to have different fields like location_value, timestamp_value, and so on.
Here's a runnable example: https://www.found.no/play/gist/ad90fb9e5210d4aba0ee
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"mappings": {
"type": {
"properties": {
"description": {
"type": "nested",
"properties": {
"integer_value": {
"type": "integer"
},
"type": {
"type": "string",
"index": "not_analyzed"
},
"timestamp_value": {
"type": "date"
}
}
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"description":[{"type":"port","integer_value":1234},{"type":"upsince","timestamp_value":"2014-01-01T12:00:00"}]}
'
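The conversion from the question's generic { type, value } entries to one field per kind of value could be done before indexing. The Python sketch below assumes a lookup table from "type" to field name: integer_value and timestamp_value come from the mapping above, location_value is mentioned in the answer, and ip_value and path_value are hypothetical names that would also need entries in the mapping.

```python
# Which typed field each kind of entry should use.
FIELD_FOR_TYPE = {
    "port": "integer_value",
    "upsince": "timestamp_value",
    "location": "location_value",
    "ipaddress": "ip_value",    # hypothetical field name
    "path": "path_value",       # hypothetical field name
}


def to_typed_entry(entry):
    """Move the generic "value" into a field named after its kind."""
    field = FIELD_FOR_TYPE[entry["type"]]
    value = entry["value"]
    if field == "timestamp_value":
        # Turn "2014-01-01 12:00:00" into the ISO form used in the bulk example.
        value = value.replace(" ", "T")
    return {"type": entry["type"], field: value}


def to_typed_description(description):
    """Convert every entry of a description array."""
    return [to_typed_entry(entry) for entry in description]


if __name__ == "__main__":
    original = [
        {"type": "port", "value": 1234},
        {"type": "upsince", "value": "2014-01-01 12:00:00"},
    ]
    print(to_typed_description(original))
```

An unknown "type" raises a KeyError here, which is a deliberate choice in this sketch so that new kinds of values are noticed rather than silently indexed into the wrong field.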
You're going to save yourself a lot of headaches if you convert the documents like this first:
{
"port": 1234,
"ipaddress" : "192.168.0.1" ,
"path" : "/app/index.jsp app/hello.jsp",
"upsince" : "2014-01-01 12:00:00",
"location" : "-40, 70"
}
Elasticsearch is designed to be flexible when it comes to fields and values, so it can already deal with pretty much any key/value combination you throw at it.
Optionally you can include the original document in a field that's explicitly stored but not indexed, in case you need the original document returned in your queries.
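A minimal Python sketch of this flattening, assuming each "type" occurs at most once per document (a repeated type would overwrite the earlier value). The field name "original" for the preserved source is hypothetical.

```python
import json


def flatten_description(description, keep_original=False):
    """Turn [{type, value}, ...] into a flat {type: value} document."""
    flat = {entry["type"]: entry["value"] for entry in description}
    if keep_original:
        # Keep the source structure as a JSON string; "original" is a made-up name.
        flat["original"] = json.dumps(description)
    return flat


if __name__ == "__main__":
    description = [
        {"type": "port", "value": 1234},
        {"type": "ipaddress", "value": "192.168.0.1"},
        {"type": "path", "value": "/app/index.jsp app/hello.jsp"},
        {"type": "upsince", "value": "2014-01-01 12:00:00"},
        {"type": "location", "value": "-40, 70"},
    ]
    print(json.dumps(flatten_description(description), indent=2))
```

With flat keys, each field can then get its own type in the mapping (integer, ip, date, geo point) instead of sharing one "value" field.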

Elasticsearch index aliases (w/ routing) and parent/child docs

I'm trying to set up an index with the following characteristics:
The index houses data for many projects. Most work is project-specific, so I set up aliases for each project, using project_id as the routing field. (And as an associated term filter.)
The data in question have a parent/child structure. For simplicity, let's call the doc types "mama" and "baby."
So we create the index and aliases:
curl -XDELETE http://localhost:9200/famtest
curl -XPOST http://localhost:9200/famtest -d '
{ "mappings" :
{ "mama" :
{ "properties" :
{ "project_id" : { "type" : "string", "index" : "not_analyzed" } }
},
"baby" :
{ "_parent" :
{ "type" : "mama" },
"properties" :
{ "project_id" : { "type" : "string", "index" : "not_analyzed" } }
}
}
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family1",
"index": "famtest",
"routing": "100",
"filter":
{ "term": { "project_id": "100" } }
}
} ]
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family2",
"index": "famtest",
"routing": "200",
"filter":
{ "term": { "project_id": "200" } }
}
} ]
}'
Now let's make some mamas:
curl -XPOST localhost:9200/family1/mama/1 -d '{ "name" : "Family 1 Mom", "project_id" : "100" }'
curl -XPOST localhost:9200/family2/mama/2 -d '{ "name" : "Family 2 Mom", "project_id" : "200" }'
These documents are now available via /familyX/_search. So now we want to add a baby:
curl -XPOST 'localhost:9200/family1/baby/1?parent=1' -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
Unfortunately, ES doesn't like that:
{"error":"ElasticSearchIllegalArgumentException[Alias [family1] has index routing associated with it [100], and was provided with routing value [1], rejecting operation]","status":400}
So... any idea how to use alias routing and still set the parent id? If I understand this right, it shouldn't be a problem: all project operations ("family1", in this case) go through the alias, so parent and child docs will wind up on the same shard anyway. Is there some alternative way to set the parent id, without interfering with the routing?
Thanks. Please let me know if I can be more specific.
Interesting question! As you already know, the parent id is used for routing too, since children must be indexed in the same shard as their parent documents. What you're trying to do is fine, since parent and children would fall into the same family, and thus into the same shard anyway, because you configured the routing in the family alias.
But I'm afraid the parent id takes priority over the routing defined in the alias and would overwrite it; since that is not allowed, you get the error. In fact, if you try again and provide the routing explicitly in your index request, it works:
curl -XPOST 'localhost:9200/family1/baby/1?parent=1&routing=100' -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
I would file a GitHub issue with a curl recreation.
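If child documents are indexed from a script, the workaround amounts to always sending both parameters. A small Python sketch that builds such a request URL follows; the helper name child_index_url is hypothetical, and the routing value has to be the same one configured on the alias.

```python
from urllib.parse import urlencode


def child_index_url(base, alias, doc_type, doc_id, parent_id, routing):
    """Build an index URL that sets the parent and repeats the alias routing."""
    query = urlencode({"parent": parent_id, "routing": routing})
    return "%s/%s/%s/%s?%s" % (base, alias, doc_type, doc_id, query)


if __name__ == "__main__":
    print(child_index_url("localhost:9200", "family1", "baby", "1", "1", "100"))
```

For the values used above this yields the same URL as the working curl command.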
