I am working on a Facebook comments dashboard built from the Facebook Graph API, using Elasticsearch 5 and Kibana 5. I added some analyzed fields and they appear in the Discover section of Kibana, but when I go to the Visualize section I can't find those fields.
My Facebook comments index:
PUT fb_comments
{
"settings": {
"analysis": {},
"mapping.ignore_malformed": true
},
"mappings": {
"fb_comment": {
"dynamic_templates": [
{
"created_time": {
"match": "created_time",
"mapping": {
"type": "date",
"format": "epoch_second"
}
}
},
{
"message": {
"match": "message",
"mapping": {
"type": "string",
"analyzer": "simple"
}
}
},
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
The analyzed field message appears in Discover.
But the same field does not appear in the Visualize section.
I think it might be related to a memory limitation: according to the Kibana 5 documentation, analyzed fields may require more memory.
I checked my memory and it is indeed used at its maximum capacity.
I finally found the solution.
In Elasticsearch 2.x we had the string type, and you then specified an analyzer if you wanted the field analyzed. In Elasticsearch 5.x there are two types instead: keyword, which is aggregatable and not analyzed, and text, which is analyzed but not aggregatable by default. So if you want a field that is both analyzed and aggregatable, add the property "fielddata": true to it, and it will be both analyzed and aggregatable.
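For example, a minimal sketch of the updated mapping in ES 5.x syntax (the index, type, and field names follow the question; the analyzer is an assumption based on the dynamic template above):

PUT fb_comments/_mapping/fb_comment
{
  "properties": {
    "message": {
      "type": "text",
      "analyzer": "simple",
      "fielddata": true
    }
  }
}

After refreshing the index pattern in Kibana, message should show up as an aggregatable field in the Visualize section.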
I am pretty new to Elasticsearch. I have a dataset with multiple fields like name, product_info, description, etc. When searching for a document, the search term can come from any of these fields (let us call them the "search core fields").
If I start storing the data in Elasticsearch, should I derive a field that is a concatenation of all the "search core fields", and then index only this field?
I came across the _all mapping concept and am a little confused. Does it do the same thing?
No, you don't need to create any new field with concatenated terms.
You can just use _all with a match query to search for text in any field.
And yes, _all does exactly that: it searches the text across all fields.
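A minimal sketch of such a query (the index name is illustrative; this only applies to versions where _all is still available):

POST my-index/_search
{
  "query": {
    "match": {
      "_all": "some search text"
    }
  }
}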
The _all field was removed in ES 7 (and is already disabled for indices created in 6.x), so it only works on older versions. The main reason for its removal is that it used too much storage space.
However, you can define your own "all" field using the copy_to feature. You basically specify in your mapping which fields should be copied to your custom all field, and then you can search on that field.
You can define your mapping like this:
PUT my-index
{
"mappings": {
"properties": {
"name": {
"type": "text",
"copy_to": "custom_all"
},
"product_info": {
"type": "text",
"copy_to": "custom_all"
},
"description": {
"type": "text",
"copy_to": "custom_all"
},
"custom_all": {
"type": "text"
}
}
}
}
PUT my-index/_doc/1
{
"name": "XYZ",
"product_info": "ABC product",
"description": "this product does blablabla"
}
And then you can search on your "all" field like this:
POST my-index/_search
{
"query": {
"match": {
"custom_all": {
"query": "ABC",
"operator": "and"
}
}
}
}
Discover: The length of [message] field of [-CSnZmwB_xkQcDCOrP1V] doc of [prod_logs] index has exceeded [1000000] - maximum allowed to be analyzed for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] index level setting. For large texts, indexing with offsets or term vectors is recommended!
I get the above error in Kibana. I am using ELK version 7.2.0. Answers/suggestions are most welcome.
You should change your mapping. If you cannot update your existing mapping, create a new temporary index and add term_vector to your big text field:
PUT new_index
{
  "mappings": {
    "properties": {
      "sample_field": {
        "type": "text",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}
Then reindex your data into the new index:
POST /_reindex
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}
Then use the "unified" type in your highlight query, like this:
"highlight": {
  "fields": {
    "textString": {
      "type": "unified"
    }
  }
}
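Alternatively, as the error message itself suggests, you can raise the limit on the existing index via the index.highlight.max_analyzed_offset setting (the value below is only an example; larger values mean more analysis work at highlight time):

PUT prod_logs/_settings
{
  "index.highlight.max_analyzed_offset": 2000000
}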
In my query I have the following filter:
"term": {
"language": "en-us"
}
And it's not returning any results, even though there are a lot of docs with "language" = "en-us" and this field is defined correctly in the mapping. When I change the filter, for example to:
"term": {
"isPublic": true
}
Then it correctly filters by the "isPublic" field.
My suspicion is that the field named "language" is somehow treated as special. Maybe it's a reserved keyword in the ES query DSL? I can't find anything about it in the docs.
ES v2.4.0
Mapping of document:
"mappings": {
"contributor": {
"_timestamp": {},
"properties": {
"createdAt": {
"type": "date",
"format": "epoch_millis||dateOptionalTime"
},
"displayName": {
"type": "string"
},
"followersCount_en_us": {
"type": "long"
},
"followersCount_zh_cn": {
"type": "long"
},
"id": {
"type": "long"
},
"isPublic": {
"type": "boolean"
},
"language": {
"type": "string"
},
"photoUrl": {
"type": "string",
"index": "not_analyzed"
},
"role": {
"type": "string",
"store": true
},
"slug": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
The field name language is nothing special. The explanation should all be in the mapping. Several possible causes come to mind:
query analyzer != index analyzer
the analyzer first splits the value into two tokens, en and us, and then throws away short tokens, which would leave both the query and the index empty :-)
the field is not indexed, just stored.
the - is not a normal ASCII dash in the index or the query. I have seen crazy things happen when people paste queries from a word processor: quotes are no longer straight quotes, dashes become ndash or mdash, ü is not one character but a combined character.
EDIT after mapping was added to the question:
The string type is analyzed with the Standard Analyzer, which splits text into tokens, in particular at dashes, so the field contains two tokens, "en" and "us". Your search is a term query, which should probably be called a token query, because it queries exactly that: the token as you write it, "en-us". But this token does not exist in the field.
Two ways to remedy this:
(1) set the field to not_analyzed and keep the query as is
(2) change the query to a match query.
I would rather use (1), since the language field content is something like an ID and should not be analyzed.
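A sketch of option (1) in ES 2.4 syntax; since an analyzed string field cannot be changed in place, you would create a new index with this mapping and reindex (the index name is illustrative):

PUT contributors_v2
{
  "mappings": {
    "contributor": {
      "properties": {
        "language": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

With that mapping the original term filter on "en-us" matches. Option (2), against the existing mapping, would simply be:

"match": {
  "language": "en-us"
}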
More about the topic: "Why doesn’t the term query match my document?" on https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-term-query.html
I have an index with a text field.
"state": {
"type": "text"
}
Now suppose there are two documents:
"state": "vail"
and
"state": "eagle vail"
For one of my requirements:
- I need to do a term-level query, such that if I type "vail", the search results should only return states with exactly "vail" and not "eagle vail".
But there is another requirement, for a different search on the same index:
- I need to do a match query for full-text search, such that if I type "vail", "eagle vail" should be returned as well.
So my question is: how do I do both term-level and full-text search on this field? For the term-level query I would have to set it to the "keyword" type so that it won't be analyzed.
You can use the "multi-field" feature to achieve this. Here is a mapping:
{
"mappings": {
"my_type": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
In this case state will act as a text field (tokenized), whereas state.raw will be a keyword (single token). When indexing a document you only set state; state.raw is populated automatically.
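A sketch of the two queries against this mapping (index name illustrative). The term-level query on the keyword sub-field only matches documents whose state is exactly "vail":

POST my_index/_search
{
  "query": {
    "term": {
      "state.raw": "vail"
    }
  }
}

while the full-text match query on the analyzed field matches "eagle vail" as well:

POST my_index/_search
{
  "query": {
    "match": {
      "state": "vail"
    }
  }
}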
I have a large Elasticsearch database full of records that each have a Name field, which is a single word. I would like to be able to page through the results (sorted by Name) starting at a particular letter. For example, I want to be able to start showing results where Name starts with the letter 'J', and then page through all the remaining results.
This is how Name is currently mapped:
"Name": {
"type": "multi_field",
"fields": {
"name_exact": {
"type": "string",
"index": "not_analyzed"
},
"name_simple": {
"type": "string",
"analyzer": "simple"
},
"name_snow": {
"type": "string",
"analyzer": "snowball"
}
}
}
Is there a query that will let me do this?
You can use a prefix filter (cached by default) or prefix query (not cacheable).
Note that the query string itself is not analyzed.
If you want analysis on the query string, you should change your mapping and add an edge-ngram analyzed field; you can then use it with a match query.
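A sketch of such a prefix query on the not_analyzed sub-field, sorted so you can page through the results with from/size (the sub-field path Name.name_exact is an assumption based on the multi_field mapping above):

POST my-index/_search
{
  "query": {
    "prefix": {
      "Name.name_exact": "J"
    }
  },
  "sort": [
    { "Name.name_exact": "asc" }
  ],
  "from": 0,
  "size": 20
}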