I mapped a field in Elasticsearch so that it gets analyzed with an edge n-gram tokenizer (min_gram 2):
"google.title.#t": {
"type": "string",
"index_analyzer": "edge_2gram_body_analyzer",
"search_analyzer": "standard"
}
When I get the mapping, it seems healthy. I would expect this:
POST myIndex/_analyze?field=google.title.#t
{"test"}
to return the tokens:
te, tes, test
Yet it does not; it returns the single token "test" instead: it is defaulting to the standard analyzer.
Now, when I remove the # from the key (google.title.t), it works. Is there a way I can escape the # at mapping time? What are the other forbidden characters?
This is because "#" in a URL needs to be URL-encoded.
Example:
POST myIndex/_analyze?field=google.title.%23t&text=test
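On newer Elasticsearch versions (5.x and later) you can sidestep the URL-encoding issue entirely by passing the parameters in the request body instead of the query string; a minimal sketch using the index and field names from the question:
POST myIndex/_analyze
{
  "field": "google.title.#t",
  "text": "test"
}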
I have run into a brick wall with searching my logged events. I am using an Elasticsearch solution: Filebeat to load messages from logs into Elasticsearch, and Kibana as the front end.
I currently log the message into a field message and the exception stacktrace (if present) into error.message. So a logged event's snippet may look like:
{
  "message": "Thrown exception: CustomException (Exception for testing purposes)",
  "error": {
    "message": "com.press.controller.CustomException: Exception for testing purposes\n at com.press.controller....<you get the idea at this point>"
  }
}
Of course there are other fields like timestamp, but those are not important. What is important is this:
When I search message : customException, I can find the events I logged. When I search error.message : customException, I do not get the events. I need to be able to full-text search all fields.
Is there a way to tell Elasticsearch to enable full-text search on these fields?
And why is full-text search enabled on the message field by default? None of my colleagues are aware of any indexing command being run on the field in the console after deployment, and our privileges do not allow me or other team members to run indexing or analysis commands on any field. So it has to be somewhere in the config.
So far I have been unable to find a solution. Please push me in the right direction.
Edit:
The config of the fields is as follows:
We use a modified ECS, and both message fields are declared as
level: core
type: text
in fields.yml.
In Filebeat, the config snippet is as follows:
filebeat.inputs:
- type: log
  enabled: true
  paths: .....
...
...
processors:
- rename:
    fields:
      - from: "msg"
        to: "message"
      - from: "filepath"
        to: "log.file.name"
      - from: "ex"
        to: "error.message"
    ignore_missing: true
    fail_on_error: true
logging.level: debug
logging.to_files: true
For security requirements, I cannot disclose the full files. Also, I had to type all the snippets by hand, so any misspellings are probably my fault.
Thanks
The problem is with the analyzer associated with your field. By default, text fields in Elasticsearch use the standard analyzer, which doesn't create separate tokens when the text contains a dot: for example, foo.bar results in just one token, foo.bar. If you want both foo and bar to match in foo.bar, you need to generate two tokens, foo and bar.
What you need is a custom analyzer that creates tokens as above, since your error.message text contains dots. This is shown in the example below:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": ["replace_dots"]
        }
      },
      "char_filter": {
        "replace_dots": {
          "type": "mapping",
          "mappings": [
            ". => \\u0020"
          ]
        }
      }
    }
  }
}
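Note that defining the analyzer alone is not enough; it also has to be attached to the field. A minimal sketch of how that might look for the error.message field from the question, on recent (typeless-mapping) versions; the exact mapping depends on your ECS template, so treat this as an assumption:
PUT /my_index/_mapping
{
  "properties": {
    "error": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}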
POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "foo.bar"
}
The above example creates two tokens, foo and bar, and the same should happen for you when you create your index and test it with these APIs.
Let me know if you face any issues with it.
Elasticsearch indexes all fields by default; since you did not define an explicit mapping here, all fields should be indexed by default.
Also, in your case I doubt the data is getting into Elasticsearch properly, as the log doesn't look like valid JSON.
Do you see proper logs in Kibana? If yes, please share a sample log/screenshot.
I have been trying to implement snowball-analyzer-like functionality (stemming) on one of my doc fields, which is of type keyword. For example, plurals should be treated exactly like their singulars so that the results are the same for both.
Initially, I struggled to set an analyzer on my field, only to discover that fields of type keyword cannot have analyzers, only normalizers. So I tried setting a normalizer with snowball on those fields, but it seems my normalizer does not allow the snowball filter (maybe normalizers don't support the snowball filter).
I can't change the type of the field. I want to achieve functionality where, if my input text matches restaurants, it is treated the same as restaurant and gives the same results, so that I don't have to add restaurants as a keyword to that field.
Can we achieve this through normalizers? I have gone through the Elastic documentation and various posts but got no clue. Below is how I tried setting the normalizer, with the response from my Elasticsearch server.
PUT localhost:9200/db110/_settings
{
  "analysis": {
    "normalizer": {
      "snowball_normalizer": {
        "filter": ["lowercase", "snowball"]
      }
    },
    "filter": {
      "snow": {
        "type": "snowball",
        "language": "English"
      }
    }
  }
}
Response
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Custom normalizer [snowball_normalizer] may not use filter [snowball]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Custom normalizer [snowball_normalizer] may not use filter [snowball]"
  },
  "status": 400
}
You can't do that! Snowball is a stemmer, used for full-text search on the text datatype: it is a token filter, which manipulates every single token. With the keyword datatype you create a single token from the entire content of the field, so how could a stemmer work on a keyword field? Using a stemmer without tokens makes no sense. Normalizers for keyword fields only support character-level filters such as lowercase and asciifolding. Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/normalizer.html
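If you can recreate the index, a common workaround is to keep the keyword field but add a text multi-field with a snowball analyzer and run stemmed searches against that sub-field. A minimal sketch for recent versions, assuming a field called my_field (the index, field, and analyzer names here are made up, not from the question):
PUT /db110_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_snowball": {
          "tokenizer": "standard",
          "filter": ["lowercase", "snowball"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "keyword",
        "fields": {
          "stemmed": {
            "type": "text",
            "analyzer": "english_snowball"
          }
        }
      }
    }
  }
}
With this mapping, a match query on my_field.stemmed for restaurants also matches documents containing restaurant, while my_field itself keeps its exact keyword behavior.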
My Elasticsearch queries are not working properly because sometimes (not always) my stored data have spaces substituted with underscores (_). When users search with spaces, they don't get the entries with underscores in the results.
For example, if users search for the string annoying problem, they get nothing, because annoying_problem is the string stored in the index.
I have many similar problems with other characters as well, such as Ø being replaced with o in the data used to populate my index.
How should I solve this?
Try using stopwords:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": ["_"]
        }
      }
    }
  }
}
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/using-stopwords.html
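To check what the analyzer actually emits for your stored strings before relying on it in queries, you can run them through the _analyze API (my_index here stands for whatever index you applied the settings to):
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "annoying_problem"
}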
I need help correcting a Kibana field. When I try to visualize the fields, Kibana shows me the following warning:
Careful! The field contains analyzed strings. Analyzed strings are highly unique and can use a lot of memory to visualize. Values such as foo-bar will be broken into foo and bar. See Core Mapping Types for more information on setting this field as not analyzed.
Elasticsearch's default dynamic mapping is to analyze any string field (break the field into tokens; for instance, aaa-bbb-ccc will be broken down into aaa, bbb and ccc).
If you do not want that behavior, you must change the mapping settings before any documents are pushed into the index.
You have two options to do that:
Change the mapping for a particular index using the mapping API, in a static or dynamic way (dynamic means that the mapping will also be applied to fields that do not yet exist in the index); see the sketch after this list.
You can change the behavior of any index matching a name pattern, using the template API.
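For the first option, a minimal sketch of a static mapping applied at index-creation time (the index name and the hostname field are made up for illustration):
PUT /myindciesprefix-2016.01.01
{
  "mappings": {
    "_default_": {
      "properties": {
        "hostname": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}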
This example shows a template that changes the mapping for any index whose name matches the pattern "myindciesprefix*", marking every string field in every type as not_analyzed and making sure "timestamp" is mapped as a date (good for cases in which the timestamp is represented as a number of seconds since 1970):
{
  "template": "myindciesprefix*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        {
          "timestamp_field": {
            "match": "timestamp",
            "mapping": {
              "type": "date"
            }
          }
        }
      ]
    }
  }
}
Really, you don't have a problem; it is only an informational message. But if you don't want analyzed fields, you must indicate, when you build your index in Elasticsearch, that a given field is not analyzed.
I am a bit confused about Bool Query vs. Finding Exact Values in Elasticsearch. Specifically, I have a title_field and a post_field that I want to search on. All of my other fields I only use to look up whether a value exists, or how many times it occurs (like url or username, which must match exactly).
So I can see from the docs that I can do a multi_match query on title_field and post_field.
But what about the other fields that I want exact responses from? Do I do a boolean query (using must)? Or do I need to remap all of those fields as not_analyzed? Or do I need to map them as not_analyzed first and then do a boolean query?
Indeed, you should map the fields you want to do exact matches on as not_analyzed, which means they are treated as a single token instead of being broken into several tokens.
Then you should use a term query or filter to match exactly against that token; see the sketch below. If you are using filters, you can also use and, or, and not filters (more convenient than bool).
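A sketch in the pre-2.0 filtered-query style that matches this answer's era, combining a full-text multi_match on the analyzed fields with an exact term filter (the index name, search words, and username value are made up):
GET /my_index/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "some search words",
          "fields": ["title_field", "post_field"]
        }
      },
      "filter": {
        "term": { "username": "someuser" }
      }
    }
  }
}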
Since mapping all fields is a bit tedious, you could instead use a dynamic template to map all string fields as not_analyzed, and then simply add a mapping for those fields you do want analyzed (see the snippet after the template):
"dynamic_templates": [
{
"non_analyzed_string": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
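Then, next to the dynamic template, explicitly map just the fields you do want analyzed; a sketch, assuming the field names from the question:
"properties": {
  "title_field": { "type": "string" },
  "post_field": { "type": "string" }
}
String fields are analyzed by default, so simply omitting "index": "not_analyzed" for these two is all that is needed.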