Elasticsearch "strict" mapping not working for fields with null values - elasticsearch

I have an index for which I have set the mapping to "dynamic":"strict".
As expected, for the most part, if a field that is not listed in the mapping is introduced, Elasticsearch will reject it.
However, I am finding that any unmapped field with a null value is not caught and makes it into my index. Here is what my mapping looks like:
{
"myindex": {
"mappings": {
"mystuff": {
"dynamic": "strict",
"_id": {
"store": true,
"index": "not_analyzed"
},
"_timestamp": {
"enabled": true,
"store": true
},
"_index": {
"enabled": true
},
"_type": {
"store": true
},
"properties": {
"entitlements": {
"type": "nested",
"properties": {
"accountNumber": {
"type": "string",
"index": "not_analyzed"
},
"active": {
"type": "string",
"index": "not_analyzed"
},
"assetEndDate": {
"type": "date",
"format": "date_time_no_millis"
}
}
}
}
}
}
}
}
EDIT (including example scenarios)
With the mapping above, here are the scenarios I am seeing:
1) When posting a valid document (one that follows the mapping): 200 OK.
posted document:
{
"entitlements": [
{
"accountNumber": "123213",
"active": "true",
"assetEndDate": "2016-10-13T00:00:00Z"
}
]
}
elasticsearch response:
{
"_index": "myindex",
"_type": "mystuff",
"_id": "5",
"_version": 1,
"created": true
}
2) When posting an invalid document (one that does not follow the mapping): 400 StrictDynamicMappingException.
posted document:
{
"entitlements": [
{
"accountNumber":"123213",
"XXXXactive": "true",
"assetEndDate": "2016-10-13T00:00:00Z"
}
]
}
elasticsearch response:
{
"error": "StrictDynamicMappingException[mapping set to strict, dynamic introduction of [XXXXactive] within [entitlements] is not allowed]",
"status": 400
}
3) When posting an invalid document (one that does not follow the mapping) where the unmapped field's value is null: 200 OK.
posted document:
{
"entitlements": [
{
"accountNumber":"123213",
"XXXXactive": null,
"assetEndDate": "2016-10-13T00:00:00Z"
}
]
}
elasticsearch response:
{
"_index": "myindex",
"_type": "mystuff",
"_id": "7",
"_version": 1,
"created": true
}
4) When posting an invalid document containing multiple unmapped fields, all with null values: 200 OK.
posted document:
{
"entitlements": [
{
"accountNumber":"123213",
"XXXXactive": null,
"assetEndDate": "2016-10-13T00:00:00Z",
"THIS_SHOULD_NOT_BE_HERE": null
}
]
}
elasticsearch response:
{
"_index": "myindex",
"_type": "mystuff",
"_id": "9",
"_version": 1,
"created": true
}
It is the 3rd and 4th scenarios that I am concerned about.
It looks like this issue (or one very similar) was raised on the Elasticsearch GitHub repository and has since been closed. However, the problem still appears to be present in version 1.7.
This is being seen locally, as well as on indexes I have deployed with AWS Elasticsearch Service.
Am I making a mistake somewhere, or has anyone found a solution to this problem?
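Until strict mapping catches null-valued unknown fields, one mitigation is to validate documents client-side before posting. The sketch below is only illustrative and rests on assumptions not in the question (a hand-maintained field whitelist, a Python indexing client); it is not an Elasticsearch feature:

```python
# Client-side guard: detect unknown keys (even null-valued ones) before
# posting, since "dynamic": "strict" lets null values through unchecked.
# The whitelist below mirrors the mapping from the question.

ALLOWED = {
    "entitlements": {"accountNumber", "active", "assetEndDate"},
}

def unknown_fields(doc):
    """Return the set of keys not present in the expected schema."""
    bad = set()
    for key, value in doc.items():
        if key not in ALLOWED:
            bad.add(key)
        elif isinstance(value, list):
            # nested objects: check each element against the nested whitelist
            for item in value:
                bad |= {k for k in item if k not in ALLOWED[key]}
    return bad

doc = {"entitlements": [{"accountNumber": "123213", "XXXXactive": None}]}
print(unknown_fields(doc))  # {'XXXXactive'}
```

Rejecting the document when `unknown_fields` is non-empty gives you the behavior you expected from strict mode, regardless of whether the offending value is null.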

Related

Username search in Elasticsearch

I want to implement a simple username search within Elasticsearch. I don't want weighted username searches yet, so I expected it wouldn't be too hard to find resources on how to do this. But in the end I came across NGrams and a lot of outdated Elasticsearch tutorials, and I completely lost track of the best practice for this.
This is my current setup, but it is really bad because it matches so many unrelated usernames:
{
"settings": {
"index" : {
"max_ngram_diff": "11"
},
"analysis": {
"analyzer": {
"username_analyzer": {
"tokenizer": "username_tokenizer",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"username_tokenizer": {
"type": "ngram",
"min_gram": "1",
"max_gram": "12"
}
}
}
},
"mappings": {
"properties": {
"_all" : { "enabled" : false },
"username": {
"type": "text",
"analyzer": "username_analyzer"
}
}
}
}
I am using the newest Elasticsearch and I just want to query similar/exact usernames. I have a user DB and users should be able to search for each other, nothing too fancy.
If you want to search for exact usernames, then you can use the term query.
The term query returns documents that contain an exact term in a provided field. If you have not defined any explicit index mapping, you need to add .keyword to the field name; the .keyword sub-field uses the keyword analyzer instead of the standard analyzer, so the whole value is indexed as a single term.
There is no need to use an n-gram tokenizer if you want to search for the exact term.
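To see why the n-gram setup matches so many unrelated usernames: with min_gram 1, every single character becomes its own token, so any username sharing even one letter with the query matches. A rough sketch of what such a tokenizer emits (illustrative only, not Elasticsearch's actual implementation):

```python
def ngrams(text, min_gram=1, max_gram=12):
    """Emit all character n-grams, roughly what an ngram tokenizer produces
    (defaults mirror the tokenizer settings from the question)."""
    text = text.lower()
    out = []
    for n in range(min_gram, max_gram + 1):
        out.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return out

print(ngrams("jack", max_gram=2))  # ['j', 'a', 'c', 'k', 'ja', 'ac', 'ck']
```

Every one of those single-character tokens ends up in the index, which is why the analyzer matches nearly everything.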
Adding a working example with index data, index mapping, search query, and search result:
Index Mapping:
{
"mappings": {
"properties": {
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Index Data:
{
"username": "Jack"
}
{
"username": "John"
}
Search Query:
{
"query": {
"term": {
"username.keyword": "Jack"
}
}
}
Search Result:
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Jack"
}
}
]
Edit 1:
To match similar terms, you can use the fuzziness parameter along with the match query:
{
"query": {
"match": {
"username": {
"query": "someting",
"fuzziness":"auto"
}
}
}
}
Search Result will be
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "3",
"_score": 0.6065038,
"_source": {
"username": "something"
}
}
]
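fuzziness: "auto" is based on Levenshtein edit distance (for terms of 6 or more characters it allows up to 2 edits). A small sketch of the distance calculation shows why "someting" matches "something" above:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute),
    the measure behind Elasticsearch's fuzziness parameter."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("someting", "something"))  # 1 -> within AUTO's limit of 2
```

One missing "h" is one edit, so the query term falls inside the allowance and the document matches.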

How to index a field in elasticsearch but not store it in _source?

I have a collection of documents with a text field "features", and would like to make this field indexed (so documents can be searched through the field) but not stored (in order to save disk space).
How to index a field in elasticsearch like this "features" field but not store it in _source?
The following index mapping will index the field value but not store it.
Index Mapping:
{
"mappings": {
"properties": {
"features": {
"type": "text",
"index": true,
"store": false
}
}
}
}
Index Data:
{
"features": "capacity"
}
Search Query:
{
"stored_fields": [
"features"
]
}
Search Result:
"hits": [
{
"_index": "67155998",
"_type": "_doc",
"_id": "1",
"_score": 1.0
}
]
UPDATE 1:
When a field is indexed, you can run queries against it. When a field is stored, its contents can be returned when the document matches.
But if you also want the content of the field not to be displayed in the _source, then you need to disable the _source field.
Modify your index mapping as follows:
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"features": {
"type": "text",
"index": true,
"store": false
}
}
}
}
Search Query:
{
"query":{
"match":{
"features":"capacity"
}
}
}
Search Result:
"hits": [
{
"_index": "67155998",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821
}
]
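Conceptually, "indexed but _source disabled" means the terms still live in the inverted index (so matching works) while the original text is gone (so nothing can be returned). A toy model of that trade-off, purely illustrative and not Elasticsearch internals:

```python
# Toy inverted index: documents stay searchable by term, but once source
# storage is dropped, only the matching ids can be returned.

docs = {1: "large capacity drive", 2: "compact design"}

inverted = {}  # term -> set of doc ids
for doc_id, text in docs.items():
    for term in text.split():
        inverted.setdefault(term, set()).add(doc_id)

docs.clear()  # simulate "_source": {"enabled": false}

print(inverted.get("capacity", set()))  # {1}: the match is still found...
print(docs.get(1))                      # None: ...but the text is unrecoverable
```

This is exactly the shape of the search result above: a hit with an _id and _score but no _source body.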

ElasticSearch NEST Reindex, edit name fields

I have an index with nested objects, something like:
"_index": "originindex",
"_source": {
"message": "",
"environment": "",
"nestedObj": {
"field1": "field1",
"field2": 1 },
"anotherfield": 1}
And I want to reindex it to something like:
"_index": "newindex",
"_source": {
"message": "",
"nestedObj-field1":"field1",
"nestedObj-field2": 1 ,
"anotherfield": 1}
I'm new to all of this. I'm using NEST on .NET 4.5; it offers a Reindex API, but I don't know how to use it for this purpose.
Thank you!
POST _reindex
{
"source": {
"index": "originindex"
},
"dest": {
"index": "newindex"
},
"script": {
"source": "ctx._source['nestedObj-field1'] = ctx._source.nestedObj.field1; ctx._source['nestedObj-field2'] = ctx._source.nestedObj.field2; ctx._source.remove('nestedObj')"
}
}
Just make sure your mappings are in place on the dest index before you execute this.
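If you would rather do the reshaping outside Elasticsearch (pull documents with NEST, transform, then bulk-index into the new index), the flattening the script performs can be sketched like this; the function name and separator are illustrative, and the field names are the ones from the question:

```python
def flatten_nested(source, key="nestedObj", sep="-"):
    """Move a nested object's fields to the top level as 'nestedObj-<field>'
    keys, mirroring what the reindex script does server-side."""
    out = dict(source)
    nested = out.pop(key, {}) or {}
    for field, value in nested.items():
        out[f"{key}{sep}{field}"] = value
    return out

doc = {"message": "", "nestedObj": {"field1": "field1", "field2": 1}, "anotherfield": 1}
print(flatten_nested(doc))
# {'message': '', 'anotherfield': 1, 'nestedObj-field1': 'field1', 'nestedObj-field2': 1}
```

Doing it server-side with _reindex avoids shipping every document over the wire, so prefer the script for large indexes.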

I can't get nested stored_fields

All,
I'm using Elasticsearch 5.0 and I have the following mapping:
{
"mappings": {
"object":{
"properties":{
"attributes":{
"type":"nested",
"properties":{
"name": { "type": "keyword", "store":true},
"value": { "type": "text", "store":true },
"converted": {"type": "double", "store":true},
"datetimestamp": { "type": "date", "store":true}
}
}
}
}
}
}
Then I add one document:
{
"attributes":[
{"name":"attribute_string", "value":"string_value","converted":null,"datetimestamp":null},
{"name":"attribute_double", "value":"1234.567","converted":1234.567,"datetimestamp":null},
{"name":"attribute_datetime", "value":"2015-01-01T12:10:30Z","converted":null,"datetimestamp":"2015-01-01T12:10:30Z"}
]
}
When I query with "stored_fields", I don't get the fields in the results:
_search
{
"stored_fields":["attributes.converted"]
}
Results:
{
"_index": "test_index",
"_type": "object",
"_id": "1",
"_score": 1
}
But when I use "_source": ["attributes.converted"], I do get a result:
{
"_index": "test_index",
"_type": "object",
"_id": "1",
"_score": 1,
"_source": {
"attributes": [
{ "converted": null },
{ "converted": 1234.567 },
{ "converted": null }
]
}
}
What is the proper way to use stored_fields?
Does using "_source" affect performance compared to the "stored_fields" approach?
If the "_source" approach is as fast as "stored_fields", should I remove "store": true from the fields?
Thank you.
You're using nested types, so use inner_hits.
In the nested case, documents are returned based on matches in nested inner objects.
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html
As per the Elasticsearch docs:
On its own, stored_fields cannot be used to load fields in nested objects — if a field contains a nested object in its path, then no data will be returned for that stored field. To access nested fields, stored_fields must be used within an inner_hits block.

Elasticsearch display number value as string

I found the data below in Elasticsearch, and it confused me:
{
"_index": "statsd-2015.09.24",
"_type": "counter",
"_id": "AU__eqbL4jN5fst_IEyK",
"_score": 1,
"_source": {
"ns": "statsd",
"grp": "bad_lines_seen",
"tgt": "",
"act": "",
"val": 0,
"#timestamp": 1443072093000
}
},
{
"_index": "statsd-2015.09.24",
"_type": "counter",
"_id": "AU__fKQM4jN5fst_IEy_",
"_score": 1,
"_source": {
"ns": "statsd",
"grp": "bad_lines_seen",
"tgt": "",
"act": "",
"val": "0",
"#timestamp": "1443072852000"
}
}
Why are the fields val and #timestamp displayed in different formats?
In the first document they are numbers.
In the second document they are strings.
They are in the same index and type, whose mapping is:
{
"statsd-2015.09.24": {
"mappings": {
"counter": {
"properties": {
"#timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"act": {
"type": "string",
"index": "not_analyzed"
},
"grp": {
"type": "string",
"index": "not_analyzed"
},
"ns": {
"type": "string",
"index": "not_analyzed"
},
"tgt": {
"type": "string",
"index": "not_analyzed"
},
"val": {
"type": "long"
}
}
}
}
}
}
How is this possible?
You have found differences between field types in the _source of the documents. The _source is the original JSON submitted to Elasticsearch to index a document.
However, it isn't what is actually indexed: it is just stored for other purposes (more information can be found in the documentation).
In your case, val is mapped as a long field, so the string value provided in the second document is in fact parsed to its long value.
You can check this easily: try to index a document like this:
{
"ns": "statsd",
"grp": "bad_lines_seen",
"tgt": "",
"act": "",
"val": "abc",
"#timestamp": "1443072852000"
}
You'll get the following parsing error:
MapperParsingException[failed to parse [val]]; nested: NumberFormatException[For input string: \"abc\"];
So, to answer your question: the types of your values differ in _source, but both values are indexed as long after parsing.
For additional information:
_source output: {"#timestamp": "2019-03-21T10:52:35.435Z"}
Index mapping: "#timestamp": {"type": "date"}
_source output: {"#timestamp": 1443072852000}
Index mapping: "#timestamp": {"type": "date", "format": "epoch_millis"}
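The coercion described above can be mimicked in a couple of lines: both 0 and "0" parse to the same long, while "abc" fails, which is exactly the MapperParsingException case (a simplified analogy, not Elasticsearch's parser):

```python
def coerce_long(value):
    """Mimic how a long-mapped field treats its input: parse numbers and
    numeric strings to the same long, or fail on non-numeric input."""
    return int(value)

print(coerce_long(0) == coerce_long("0"))  # True: both index as the long 0
try:
    coerce_long("abc")
except ValueError as e:
    print(f"parse failure, like MapperParsingException: {e}")
```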