Elasticsearch mapping not working as expected - elasticsearch

I have the following mapping:
curl -X PUT 'localhost:9200/cambio_indice?pretty=true' -d '{
  "mappings" : {
    "el_tipo" : {
      "properties" : {
        "name" : { "type" : "string" },
        "age" : { "type" : "integer" },
        "read" : { "type" : "integer" }
      }
    }
  }
}'
If I index the following document, it works perfectly even though it doesn't match the mapping (read is missing), and ES doesn't complain:
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/1?pretty=true' -d '{
"name" : "Eduardo Inda",
"age" : 23
}'
And if I add the following entry, it also works.
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/2?pretty=true' -d '{
"jose" : "stuff",
"ramon" : 23,
"garcia" : 1
}'
It seems that the mapping is not taking effect on the documents I'm adding. Am I doing something wrong when I try to map my type?

This is the default behaviour of Elasticsearch, and it is desirable in most cases. But if you do not want to allow indexing of fields not defined in your mapping, you need to update the mapping and set its "dynamic" property to "strict". Your mapping definition should look like this:
{
"mappings": {
"el_tipo": {
"dynamic": "strict",
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
},
"read": {
"type": "integer"
}
}
}
}
}
Then if you try to index fields like "jose", "ramon" or "garcia", Elasticsearch will throw an error with an appropriate message saying that the dynamic addition of these fields is prohibited.
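For example, you can tighten the existing index with the put-mapping API and watch a stray field get rejected (a quick sketch against the index above; the exact exception text varies by ES version):
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/_mapping' -d '{
  "el_tipo" : {
    "dynamic" : "strict"
  }
}'
curl -X PUT 'localhost:9200/cambio_indice/el_tipo/3?pretty=true' -d '{
  "jose" : "stuff"
}'
# => StrictDynamicMappingException: mapping set to strict, dynamic
#    introduction of [jose] within [el_tipo] is not allowed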

As per the ES documentation:
By default, Elasticsearch provides automatic index and mapping creation when data is added under an index that has not been created before. In other words, data can be added into Elasticsearch without the index and the mappings being defined a priori. This is quite convenient since Elasticsearch automatically adapts to the data being fed to it - moreover, if certain entries have extra fields, Elasticsearch's schema-less nature allows them to be indexed without any issues.
So any new fields you add will automatically be added to your mapping.
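You can verify this by fetching the mapping after indexing the second document (output abridged):
curl -X GET 'localhost:9200/cambio_indice/_mapping?pretty=true'
# "jose", "ramon" and "garcia" now appear under the "properties" of
# "el_tipo", alongside "name", "age" and "read"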
See the dynamic mapping documentation for more info.

Related

Elasticsearch: match exact keywords with special characters

I am storing tags as an array of keywords:
...
Tags: {
type: "keyword"
},
...
Resulting in arrays like this:
Tags: [
"windows",
"opengl",
"unicode",
"c++",
"c",
"cross-platform",
"makefile",
"emacs"
]
I thought that since I am using the keyword type I could easily do exact term searches, as it is not supposed to use any analyser.
Apparently I was wrong! This gives me results:
body.query.bool.must.push({term: {"_all": "c"}}); # 38 results
But this doesn't:
body.query.bool.must.push({term: {"_all": "c++"}}); # 0 results
Although there are obviously instances of this tag, as seen above.
If I use body.query.bool.must.push({match: {"_all": search}}); instead (using match instead of term), then "c" and "c++" return the exact same results, which is wrong as well.
The problem here is that you are using the _all field, which uses an analyzer (standard by default). Make a small test with your data to be sure:
Test 1:
curl -X POST http://127.0.0.1:9200/script/test/_search \
-d '{
"query": {
"term" : { "_all": "c++"}
}
}'
Test 2:
curl -X POST http://127.0.0.1:9200/script/test/_search \
-d '{
"query": {
"term" : { "tags": "c++"}
}
}'
In my test the second query returns documents, the first one does not.
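That is because the standard analyzer strips the "++" at index time, so "c" and "c++" are stored as the same term in _all. You can confirm this with the _analyze API (a quick check, using the JSON body form of ES 5.x):
curl -X GET 'http://127.0.0.1:9200/_analyze' -d '{
  "analyzer": "standard",
  "text": "c++"
}'
# => a single token "c"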
Do you really need to search over multiple fields? If so, you can override the default analyzer of the _all field - for a quick test I created an index with settings like this:
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"test" : {
"_all" : {"type" : "string", "index" : "not_analyzed", "analyzer" : "keyword"},
"properties": {
"tags": {
"type": "keyword"
}
}
}
}
}
Or you can create a custom _all field.
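A minimal sketch of that approach, assuming ES 5.x (the all_fields name is made up): copy every searchable field into one keyword field and run your term queries against it.
{
  "mappings": {
    "test": {
      "properties": {
        "tags": {
          "type": "keyword",
          "copy_to": "all_fields"
        },
        "all_fields": {
          "type": "keyword"
        }
      }
    }
  }
}
Then body.query.bool.must.push({term: {"all_fields": "c++"}}); matches exactly.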
Solutions like the multi_match query, which let you define a list of fields to search over, would rather behave like your example with body.query.bool.must.push({match: {"_all": search}});.

copy_to and custom analyzer not working

(I'm doing this with a fresh copy of Elasticsearch 1.5.2)
I've defined a custom analyzer and it's working:
curl -XPUT 127.0.0.1:9200/test -d '{
"settings": {
"index": {
"analysis": {
"tokenizer": {
"UrlTokenizer": {
"type": "pattern",
"pattern": "https?://([^/]+)",
"group": 1
}
},
"analyzer": {
"accesslogs": {
"tokenizer": "UrlTokenizer"
}
}
}
}
}
}'; echo
curl '127.0.0.1:9200/test/_analyze?analyzer=accesslogs&text=http://192.168.1.1/123?a=2#1111' | json_pp
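This should return a single token, the host extracted by the pattern group (response abridged):
{
  "tokens" : [ { "token" : "192.168.1.1", ... } ]
}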
Now I apply it to an index:
curl -XPUT 127.0.0.1:9200/test/accesslogs/_mapping -d '{
"accesslogs" : {
"properties" : {
"referer" : { "type" : "string", "copy_to" : "referer_domain" },
"referer_domain": {
"type": "string",
"analyzer": "accesslogs"
}
}
}
}'; echo
From the mapping I can see both of them are applied.
Now I try to insert some data:
curl 127.0.0.1:9200/test/accesslogs/ -d '{
"referer": "http://192.168.1.1/aaa.php",
"response": 100
}';echo
But the copy_to field, aka referer_domain, was not generated, and if I try to add a field with that name, the tokenizer is not applied either.
Any ideas?
copy_to works, but you are assuming that since you don't see the field being generated, it doesn't exist.
When you get your document back (with GET /test/accesslogs/1 for example), you don't see the field under _source. _source contains the original document that was indexed, and you didn't index any referer_domain field, just referer and response. This is the reason why you don't see it.
But Elasticsearch does create that field in the inverted index. You can query on it, aggregate on it, or retrieve it if you stored it.
Let me illustrate: you can query that field and you will get results back based on it. If you really want to see what has been stored in the inverted index, you can do this:
GET /test/accesslogs/_search
{
"fielddata_fields": ["referer","response","referer_domain"]
}
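For example, a term query against the copied field finds the document indexed above (a sketch; the UrlTokenizer extracts the host, so 192.168.1.1 is the indexed term):
GET /test/accesslogs/_search
{
  "query": {
    "term": { "referer_domain": "192.168.1.1" }
  }
}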
You can also retrieve that field if you stored it:
"referer_domain": {
"type": "string",
"analyzer": "accesslogs",
"store" : true
}
with this:
GET /test/accesslogs/_search
{
"fields": ["referer","response","referer_domain"]
}
In conclusion, copy_to modifies the indexed document, not the source document. You can query your documents having that field and it will work because the query looks at the inverted index. If you want to retrieve that field you need to store it, as well. But you will not see that field in the _source field because _source is the initial document that has been indexed. And the initial document doesn't contain referer_domain.

How to create alias for dynamic fields in elasticsearch dynamic templates?

I am using Elasticsearch 1.0.2 and a sample dynamic template in my index. Is there any way to derive the field's index name from a part of the dynamic field name?
This is my template
{"dynamic_templates":[
"dyn_string_fields": {
"match": "dyn_string_*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index" : "analyzed",
"index_name": "{name}"
}
}
}]}
The dynamic templates work and I am able to add fields. Our goal is to add fields with the "dyn_string_" prefix, but when searching it should be just the field name without the "dyn_string_" prefix. I tested using match_mapping_type alone to add fields, but that allows any field to be added. Does anyone have any suggestions?
I looked at the Elasticsearch API and there is a transform feature in 1.3 which allows modifying the document before insertion (unfortunately I will not be able to upgrade to that version).
In a single template, several aliases can be set. For a quick illustration, have a look at this dummy example:
curl -XPUT localhost:9200/_template/test_template -d '
{
"template" : "test_*",
"settings" : {
"number_of_shards" : 4
},
"aliases" : {
"name_for_alias" : {}
},
"mappings" : {
"type" : {
"properties" : {
"id" : {
"type" : "integer",
"include_in_all" : false
},
"test_user_id" : {
"type" : "integer",
"include_in_all" : false
}
}
}
}
}
'
There "name_for_alias" is you simple alias. As parameter there can be defined preset filters if you want use alias for filtering data.
More information can be found here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html

Issues when replicating from couchbase bucket to elasticsearch index?

This issue seems to be related to using XDCR in Couchbase. If I have the following simple objects
1: { "name" : "Mark", "age" : 30}
2: { "name" : "Bill", "age" : "forty"}
and set up an Elasticsearch index as follows
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"dynamic_templates": [
{
"store_generic": {
"match": "*",
"mapping": {
"store": "yes"
}
}
}
]
}
}'
I can then add the two objects to this index using the REST API
curl -XPUT localhost:9200/test/couchbaseDocument/1 -d '{
"name" : "Mark",
"age" : 30
}'
curl -XPUT localhost:9200/test/couchbaseDocument/2 -d '{
"name" : "Bill",
"age" : "forty"
}'
They are now both searchable (despite the fact that "age" is a long for one and a string for the other).
If, however, I store these two objects in a Couchbase bucket (rather than straight in Elasticsearch) and set up XDCR, the first object replicates fine but the second fails with the following error:
failed to execute bulk item (index) index {[test][couchbaseDocument][2], source[{"doc":{"name":"Bill","age":"forty"},"meta":{"id":"2","rev":"8-00000b9360d0a0bf0000000000000000","expiration":0,"flags":0}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [doc.age]
I can't figure out why it works via the REST API but not when Couchbase replicates the same objects.
I followed the answer and used the following mapping to get things to work via XDCR:
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"properties" : {
"doc": {
"properties" : {
"name" : {"type" : "string", "store" : "yes"},
"age" : {"type" : "string", "store" : "yes"}
}
}
}
}
}'
Now all the objects (despite having different types for the same fields) are replicated and searchable. I don't think there was any need to include the dynamic_templates approach I initially tried. The mapping works.
It's something you have to solve on the Elasticsearch side.
If the same field name can contain both numeric values and string values, you should create a mapping first which says that age is a string, so Elasticsearch won't try to auto-guess the type for this field.
Hope this helps.
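You can reproduce the underlying problem without Couchbase at all (a sketch against a hypothetical guess_test index; error output abridged):
# The first document makes ES dynamically map "age" as a numeric type
curl -XPUT 'localhost:9200/guess_test/doc/1' -d '{ "age": 30 }'
# The second document now fails to parse, just like the XDCR bulk item
curl -XPUT 'localhost:9200/guess_test/doc/2' -d '{ "age": "forty" }'
# => MapperParsingException: failed to parse [age]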

Issues mapping a document with ElasticSearch

I have a document that I was hoping to store in Elasticsearch and run queries against, but I think the document structure is possibly badly formed and as such I won't be able to run effective queries.
The document is trying to be generic and as such, has a set of repeating structures.
For example:
description : [
{ type : "port", value : 1234 }.
{ type : "ipaddress", value : "192.168.0.1" },
{ type : "path", value : "/app/index.jsp app/hello.jsp" },
{ type : "upsince", value : "2014-01-01 12:00:00" },
{ type : "location", value : "-40, 70" }
]
Note: I've simplified the example; in the real document the repeating structure has about 7 fields, of which 3 explicitly identify the "type".
From the above example I can't see how I can write a mapping, as the "value" could be any of:
Integer
IP Address
A field that needs to be tokenized by only whitespace
A datetime
A GEO Point
Is the only solution to convert the document into another format that maps more easily to Elasticsearch?
This case is somewhat described here: http://www.found.no/foundation/beginner-troubleshooting/#keyvalue-woes
You can't have different kinds of values in the same field. What you can do is to have different fields like location_value, timestamp_value, and so on.
Here's a runnable example: https://www.found.no/play/gist/ad90fb9e5210d4aba0ee
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"mappings": {
"type": {
"properties": {
"description": {
"type": "nested",
"properties": {
"integer_value": {
"type": "integer"
},
"type": {
"type": "string",
"index": "not_analyzed"
},
"timestamp_value": {
"type": "date"
}
}
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"description":[{"type":"port","integer_value":1234},{"type":"upsince","timestamp_value":"2014-01-01T12:00:00"}]}
'
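To query those nested entries, pair the type with the matching typed value field, for example (a sketch; not part of the original gist):
# Find entries where type is "port" and the port is 1234
curl -XPOST "$ELASTICSEARCH_ENDPOINT/play/type/_search?pretty" -d '{
  "query": {
    "nested": {
      "path": "description",
      "query": {
        "bool": {
          "must": [
            { "term": { "description.type": "port" } },
            { "term": { "description.integer_value": 1234 } }
          ]
        }
      }
    }
  }
}'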
You're going to save yourself a lot of headaches if you convert the documents to something like this first:
{
"port": 1234,
"ipaddress" : "192.168.0.1" ,
"path" : "/app/index.jsp app/hello.jsp",
"upsince" : "2014-01-01 12:00:00",
"location" : "-40, 70"
}
Elasticsearch is designed to be flexible when it comes to fields and values, so it can already deal with pretty much any key/value combination you throw at it.
Optionally you can include the original document in a field that's explicitly stored but not indexed, in case you need the original document returned in your queries.
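A mapping for such a field might look like this (a sketch in ES 1.x syntax; the field name original is made up), after which you can retrieve it by listing it under "fields" in your search request:
"original": {
  "type": "string",
  "index": "no",
  "store": true
}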
