Elasticsearch displays number value as string

I found the below data in Elasticsearch, and it confused me:
{
  "_index": "statsd-2015.09.24",
  "_type": "counter",
  "_id": "AU__eqbL4jN5fst_IEyK",
  "_score": 1,
  "_source": {
    "ns": "statsd",
    "grp": "bad_lines_seen",
    "tgt": "",
    "act": "",
    "val": 0,
    "#timestamp": 1443072093000
  }
},
{
  "_index": "statsd-2015.09.24",
  "_type": "counter",
  "_id": "AU__fKQM4jN5fst_IEy_",
  "_score": 1,
  "_source": {
    "ns": "statsd",
    "grp": "bad_lines_seen",
    "tgt": "",
    "act": "",
    "val": "0",
    "#timestamp": "1443072852000"
  }
}
Why are the fields val and #timestamp displayed in different formats?
In the first document they are numbers; in the second document they are strings.
They are in the same index and type, whose mapping is:
{
  "statsd-2015.09.24": {
    "mappings": {
      "counter": {
        "properties": {
          "#timestamp": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "act": {
            "type": "string",
            "index": "not_analyzed"
          },
          "grp": {
            "type": "string",
            "index": "not_analyzed"
          },
          "ns": {
            "type": "string",
            "index": "not_analyzed"
          },
          "tgt": {
            "type": "string",
            "index": "not_analyzed"
          },
          "val": {
            "type": "long"
          }
        }
      }
    }
  }
}
How is this possible?

You have found differences between field types in the _source of the documents. The _source is the original JSON submitted to Elasticsearch to index a document.
However, it isn't what is actually indexed: it is just stored for other purposes (more information can be found in the documentation).
In your case, val is mapped as a long field, so the string value provided in the second document is in fact parsed to its long value.
You can check this easily: try to index a document like this:
{
  "ns": "statsd",
  "grp": "bad_lines_seen",
  "tgt": "",
  "act": "",
  "val": "abc",
  "#timestamp": "1443072852000"
}
You'll get the following parsing error:
MapperParsingException[failed to parse [val]]; nested: NumberFormatException[For input string: \"abc\"];
So, to answer your question: the field types differ in _source, but both values are indexed as long after parsing.
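You can also see the coercion from the query side. A quick sketch (reusing the index name from the question; the bounds are illustrative): a range query on val matches the second document even though its _source holds the string "0", because what is compared is the indexed long value.
GET statsd-2015.09.24/_search
{
  "query": {
    "range": {
      "val": {
        "gte": 0,
        "lte": 0
      }
    }
  }
}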

For additional information:
With the index mapping "#timestamp": {"type": "date"}, the output is {"#timestamp": "2019-03-21T10:52:35.435Z"}.
With the index mapping "#timestamp": {"type": "date", "format": "epoch_millis"}, the output is {"#timestamp": 1443072852000}.
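As a side note, if you want a single date field to accept both epoch milliseconds and ISO strings, date formats can be combined with ||. A minimal sketch (written in modern typeless mapping syntax, unlike the typed mapping above):
PUT statsd-2015.09.24
{
  "mappings": {
    "properties": {
      "#timestamp": {
        "type": "date",
        "format": "epoch_millis||strict_date_optional_time"
      }
    }
  }
}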

Related

How to search over all fields and return every document containing that search in elasticsearch?

I have a problem regarding searching in Elasticsearch.
I have an index with multiple documents with several fields. I want to be able to search over all the fields with one query and have it return all the documents that contain the value specified in the query. I found that using simple_query_string worked well for this. However, it does not return consistent results. In my index I have documents with several fields that contain dates. For example:
"revisionDate" : "2008-01-01T00:00:00",
"projectSmirCreationDate" : "2008-07-01T00:00:00",
"changedDate" : "1971-01-01T00:00:00",
"dueDate" : "0001-01-01T00:00:00",
Those are just a few examples. However, when I search, for example:
GET new_document-20_v2/_search
{
  "size": 1000,
  "query": {
    "simple_query_string": {
      "query": "2008"
    }
  }
}
It only returns two documents. This is a problem, because many more documents than two contain the value "2008" in their fields.
I also have problems searching file names.
In my index there are fields that contain file names like this:
"fileName" : "testPDF.pdf",
"fileName" : "demo.pdf",
"fileName" : "demo.txt",
When I query:
GET new_document-20_v2/_search
{
  "size": 1000,
  "query": {
    "simple_query_string": {
      "query": "demo"
    }
  }
}
I get no results.
But if I query:
GET new_document-20_v2/_search
{
  "size": 1000,
  "query": {
    "simple_query_string": {
      "query": "demo.txt"
    }
  }
}
I get the proper result.
Is there a better way to search across all documents and fields than what I did? I want it to return all the documents matching the query, not just two or zero.
Any help would be greatly appreciated.
Elasticsearch uses the standard analyzer if no analyzer is specified. Since no analyzer is specified on "fileName", demo.txt gets tokenized to:
{
  "tokens": [
    {
      "token": "demo.txt",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
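You can reproduce this token output yourself with the _analyze API; a quick sketch:
GET _analyze
{
  "analyzer": "standard",
  "text": "demo.txt"
}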
Now, when you search for demo it will not return any result, but searching for demo.txt will.
You can instead use a wildcard query to search for documents having demo in fileName:
{
  "query": {
    "wildcard": {
      "fileName": {
        "value": "demo*"
      }
    }
  }
}
The search result will be:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"fileName": "demo.pdf"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"fileName": "demo.txt"
}
}
]
Since revisionDate, projectSmirCreationDate, changedDate, and dueDate are all of type date, you cannot do a partial search on these dates.
You can use multi-fields to add one more field (of text type) to each of the above fields. Modify your index mapping as shown below:
{
  "mappings": {
    "properties": {
      "changedDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      },
      "projectSmirCreationDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      },
      "dueDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      },
      "revisionDate": {
        "type": "date",
        "fields": {
          "raw": {
            "type": "text"
          }
        }
      }
    }
  }
}
Index Data:
{
  "revisionDate": "2008-02-01T00:00:00",
  "projectSmirCreationDate": "2008-02-01T00:00:00",
  "changedDate": "1971-01-01T00:00:00",
  "dueDate": "0001-01-01T00:00:00"
}
{
  "revisionDate": "2008-01-01T00:00:00",
  "projectSmirCreationDate": "2008-07-01T00:00:00",
  "changedDate": "1971-01-01T00:00:00",
  "dueDate": "0001-01-01T00:00:00"
}
Search Query:
{
  "query": {
    "multi_match": {
      "query": "2008"
    }
  }
}
Search Result:
"hits": [
{
"_index": "67303015",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"revisionDate": "2008-01-01T00:00:00",
"projectSmirCreationDate": "2008-07-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
},
{
"_index": "67303015",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156,
"_source": {
"revisionDate": "2008-02-01T00:00:00",
"projectSmirCreationDate": "2008-02-01T00:00:00",
"changedDate": "1971-01-01T00:00:00",
"dueDate": "0001-01-01T00:00:00"
}
}
]
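If you would rather not depend on multi_match's default behavior of searching all fields, you can also name the raw subfields explicitly. A sketch based on the mapping above:
GET 67303015/_search
{
  "query": {
    "multi_match": {
      "query": "2008",
      "fields": [
        "revisionDate.raw",
        "projectSmirCreationDate.raw",
        "changedDate.raw",
        "dueDate.raw"
      ]
    }
  }
}
This restricts matching to the text subfields while the date fields themselves stay strongly typed.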

How to index a field in elasticsearch but not store it in _source?

I have a collection of documents with a text field "features", and I would like to make this field indexed (so documents can be searched through it) but not stored (in order to save disk space).
How can I index a field like this "features" field in Elasticsearch but not store it in _source?
The following index mapping will index the field value but not store it.
Index Mapping:
{
  "mappings": {
    "properties": {
      "features": {
        "type": "text",
        "index": true,
        "store": false
      }
    }
  }
}
Index Data:
{
  "features": "capacity"
}
Search Query:
{
  "stored_fields": [
    "features"
  ]
}
Search Result:
"hits": [
{
"_index": "67155998",
"_type": "_doc",
"_id": "1",
"_score": 1.0
}
]
UPDATE 1:
When a field is indexed, you can run queries against it. When a field is stored, its contents can be returned when the document matches.
But if you also want the content of the field not to be displayed in _source, then you need to disable the _source field.
Modify your index mapping as follows:
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "features": {
        "type": "text",
        "index": true,
        "store": false
      }
    }
  }
}
Search Query:
{
  "query": {
    "match": {
      "features": "capacity"
    }
  }
}
Search Result:
"hits": [
{
"_index": "67155998",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821
}
]
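Note that disabling _source entirely has side effects (for example, update, reindex, and highlighting rely on it). A gentler alternative, sketched below for the same goal, is to exclude just that field from _source while leaving it indexed:
PUT 67155998
{
  "mappings": {
    "_source": {
      "excludes": [
        "features"
      ]
    },
    "properties": {
      "features": {
        "type": "text"
      }
    }
  }
}
The excluded field is still searchable, but its value can no longer be returned or reindexed from _source.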

How does type ahead in ElasticSearch work on multiple words and partial text match

I would like to explain with an example.
Documents in my Elasticsearch dataset have a field 'product_name'.
One document has product_name = 'Anmol Twinz Biscuit'.
When the user types (a) 'Anmol Twin', (b) 'Twin Anmol', (c) 'Twinz Anmol', or (d) 'Anmol Twinz', I want this specific record returned as a search result.
However, this works only if I specify the complete words in the search query. Partial matches are not working; thus (a) and (b) do not return the desired result.
Mapping defined (obtained by a _mapping query):
{
  "sbis_product_idx": {
    "mappings": {
      "items": {
        "properties": {
          "category_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "product_company": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "product_id": {
            "type": "long"
          },
          "product_name": {
            "type": "text"
          },
          "product_price": {
            "type": "float"
          },
          "suggest": {
            "type": "completion",
            "analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": true,
            "max_input_length": 50
          }
        }
      }
    }
  }
}
Query being used:
{
  "_source": "product_name",
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Twin Anmol",
      "fields": [ "product_name", "product_company" ],
      "operator": "and"
    }
  }
}
The document in ES:
{
  "_index": "sbis_product_idx",
  "_type": "misc",
  "_id": "107996",
  "_version": 1,
  "_score": 0,
  "_source": {
    "suggest": {
      "input": [
        "Anmol",
        "Twinz",
        "Biscuit"
      ]
    },
    "category_name": "Other Product",
    "product_company": "Anmol",
    "product_price": 30,
    "product_name": "Anmol Twinz Biscuit",
    "product_id": 107996
  }
}
Result:
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
Is there a mistake in my query or mapping?
I just created the index with your mapping, indexed the ES doc given in your example, and changed the operator in your query from and to or; it then returns results for all 4 query combinations.
Find my query below:
{
  "_source": "product_name",
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Anmol Twinz",
      "fields": [ "product_name", "product_company" ],
      "operator": "or" --> changed it to `or`
    }
  }
}
With the and operator, your query requires every term in the search string to match, and some of them, like Twin, are not complete tokens in ES; hence you were not getting results for them. When you change the operator to or, the query matches if any of the tokens is present.
Note: if you want to match on partial tokens like Twin or Twi, you need to use n-gram tokens, as explained in the official ES docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html), and that is a completely different design.
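For reference, a minimal sketch of such an n-gram design using an edge_ngram tokenizer (the index name, analyzer names, and gram sizes are illustrative; the typeless mapping syntax assumes Elasticsearch 7+):
PUT products_ngram
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "edge_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [ "letter", "digit" ]
        }
      },
      "analyzer": {
        "edge_analyzer": {
          "tokenizer": "edge_tokenizer",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "analyzer": "edge_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
With this setup, a query for 'Twin Anmol' can match 'Anmol Twinz Biscuit', because each indexed word contributes prefix tokens such as tw, twi, twin, twinz.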

I can't get nested stored_fields

Hi all.
I'm using Elasticsearch 5.0 and I have the following mapping:
{
  "mappings": {
    "object": {
      "properties": {
        "attributes": {
          "type": "nested",
          "properties": {
            "name": { "type": "keyword", "store": true },
            "value": { "type": "text", "store": true },
            "converted": { "type": "double", "store": true },
            "datetimestamp": { "type": "date", "store": true }
          }
        }
      }
    }
  }
}
Then I add one document:
{
  "attributes": [
    { "name": "attribute_string", "value": "string_value", "converted": null, "datetimestamp": null },
    { "name": "attribute_double", "value": "1234.567", "converted": 1234.567, "datetimestamp": null },
    { "name": "attribute_datetime", "value": "2015-01-01T12:10:30Z", "converted": null, "datetimestamp": "2015-01-01T12:10:30Z" }
  ]
}
When I query with "stored_fields", I don't get the fields in the results:
_search
{
  "stored_fields": [ "attributes.converted" ]
}
Results:
{
  "_index": "test_index",
  "_type": "object",
  "_id": "1",
  "_score": 1
}
But when I use "_source": ["attributes.converted"], I get a result:
{
  "_index": "test_index",
  "_type": "object",
  "_id": "1",
  "_score": 1,
  "_source": {
    "attributes": [
      { "converted": null },
      { "converted": 1234.567 },
      { "converted": null }
    ]
  }
}
What is the proper way to use stored_fields?
Does usage of "_source" affect performance compared to the "stored_fields" approach?
If the "_source" approach is as fast as "stored_fields", shall I remove "store": true for the fields?
Thank you.
You're using nested types, so use inner_hits.
In the nested case, documents are returned based on matches in nested inner objects.
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html
As per the Elasticsearch docs:
On its own, stored_fields cannot be used to load fields in nested objects — if a field contains a nested object in its path, then no data will be returned for that stored field. To access nested fields, stored_fields must be used within an inner_hits block.
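A minimal sketch of that pattern for the mapping above (the match_all here is just a placeholder for a real nested query):
GET test_index/_search
{
  "query": {
    "nested": {
      "path": "attributes",
      "query": { "match_all": {} },
      "inner_hits": {
        "stored_fields": [ "attributes.converted" ]
      }
    }
  }
}
Each hit should then carry an inner_hits section whose nested hits include the stored attributes.converted values.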

Elasticsearch "strict" mapping not working for fields with null values

I have an index for which I have set the mapping to "dynamic": "strict".
As expected, for the most part, if a field that is not listed in the mapping is introduced, Elasticsearch will reject it.
However, I am finding that any field with a null value is not caught and makes it into my index. Here is what my mapping looks like:
{
  "myindex": {
    "mappings": {
      "mystuff": {
        "dynamic": "strict",
        "_id": {
          "store": true,
          "index": "not_analyzed"
        },
        "_timestamp": {
          "enabled": true,
          "store": true
        },
        "_index": {
          "enabled": true
        },
        "_type": {
          "store": true
        },
        "properties": {
          "entitlements": {
            "type": "nested",
            "properties": {
              "accountNumber": {
                "type": "string",
                "index": "not_analyzed"
              },
              "active": {
                "type": "string",
                "index": "not_analyzed"
              },
              "assetEndDate": {
                "type": "date",
                "format": "date_time_no_millis"
              }
            }
          }
        }
      }
    }
  }
}
EDIT (including example scenarios)
With the mapping above, here are the scenarios I am seeing:
1) When posting a valid document (one that follows the mapping), 200 OK.
posted document:
{
  "entitlements": [
    {
      "accountNumber": "123213",
      "active": "true",
      "assetEndDate": "2016-10-13T00:00:00Z"
    }
  ]
}
elasticsearch response:
{
  "_index": "myindex",
  "_type": "mystuff",
  "_id": "5",
  "_version": 1,
  "created": true
}
2) When posting an invalid document (one that does not follow the mapping), 400 StrictDynamicMappingException.
posted document:
{
"entitlements": [
{
"accountNumber":"123213",
"XXXXactive": "true",
"assetEndDate": "2016-10-13T00:00:00Z"
}
]
}
elasticsearch response:
{
  "error": "StrictDynamicMappingException[mapping set to strict, dynamic introduction of [XXXXactive] within [entitlements] is not allowed]",
  "status": 400
}
3) When posting an invalid document (one that does not follow the mapping) with a value that is null for the invalid field, 200 OK.
posted document:
{
  "entitlements": [
    {
      "accountNumber": "123213",
      "XXXXactive": null,
      "assetEndDate": "2016-10-13T00:00:00Z"
    }
  ]
}
elasticsearch response:
{
  "_index": "myindex",
  "_type": "mystuff",
  "_id": "7",
  "_version": 1,
  "created": true
}
4) When posting an invalid document with two unmapped fields whose values are null, 200 OK.
posted document:
{
  "entitlements": [
    {
      "accountNumber": "123213",
      "XXXXactive": null,
      "assetEndDate": "2016-10-13T00:00:00Z",
      "THIS_SHOULD_NOT_BE_HERE": null
    }
  ]
}
}
elasticsearch response:
{
  "_index": "myindex",
  "_type": "mystuff",
  "_id": "9",
  "_version": 1,
  "created": true
}
It is the 3rd and 4th scenarios that I am concerned about.
It looks like this issue (or one very similar) was raised on the Elasticsearch git repository here and has since been closed. However, the problem still appears to be an issue in version 1.7.
This is happening locally, as well as on indexes I have deployed with AWS Elasticsearch Service.
Am I making a mistake somewhere, or has anyone found a solution to this problem?
