Why is Elasticsearch keyword search not working? - elasticsearch

I use NLog to write log messages to Elasticsearch. The index mapping is as follows:
"mappings": {
"logevent": {
"properties": {
"#timestamp": {
"type": "date"
},
"MachineName": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"level": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
},
"message": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
}
}
}
}
}
I was able to get results using a text search:
GET /webapi-2022.07.28/_search
{
  "query": {
    "match": {
      "message": "ERROR"
    }
  }
}
Result:
"hits" : [
  {
    "_index" : "webapi-2022.07.28",
    "_type" : "logevent",
    "_id" : "IFhYQoIBRhF4cR9wr-ja",
    "_score" : 4.931916,
    "_source" : {
      "@timestamp" : "2022-07-28T01:07:58.8822339Z",
      "level" : "Error",
      "message" : """2022-07-28 09:07:58.8822|ERROR|AppSrv.Filter.AccountAuthorizeAttribute|[KO17111808]-[172.10.2.200]-[ERROR]-"message"""",
      "MachineName" : "WIN-EPISTFOBD41"
    }
  }
  // ...
]
But when I search on the keyword sub-field, I get nothing:
GET /webapi-2022.07.28/_search
{
  "query": {
    "term": {
      "message.keyword": "ERROR"
    }
  }
}
I tried both term and match queries; the result is the same.

This is happening because the message field does not contain just ERROR: the whole log line is stored in the .keyword sub-field as a single, unanalyzed token. A term query on message.keyword only matches when the query string equals the entire stored value, so for a substring like ERROR you need the full-text search on the analyzed message field; the .keyword field is only useful for exact searches.
If your message field contained only the string ERROR, then searching on message.keyword would produce results. You can test this yourself by indexing a sample document.
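As a quick illustration of the exact-match behavior, a term query on the .keyword sub-field only returns the document when the entire stored value is supplied, e.g. with the log line from the result above (assuming it stays under the 256-character ignore_above limit):
GET /webapi-2022.07.28/_search
{
  "query": {
    "term": {
      "message.keyword": """2022-07-28 09:07:58.8822|ERROR|AppSrv.Filter.AccountAuthorizeAttribute|[KO17111808]-[172.10.2.200]-[ERROR]-"message""""
    }
  }
}
For substring-style matching against the raw value, a wildcard query on message.keyword (e.g. "*ERROR*") would also work, though it is slow on large indices; the match query on the analyzed message field remains the right tool here.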

Related

Count total number of words of all documents pointing to specific fields

Someone asked this question before, but no one seems to have answered it or suggested possible ways to solve it: https://discuss.elastic.co/t/count-the-number-of-words-in-the-field-elastic-search-6-2/121373
Now, I'm trying to produce a report from Elasticsearch that counts the number of WORDS / TOKENS in two specific fields, title and content.
Is there a proper aggregation for this?
For example, I have this query:
GET web/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [
              "title",
              "content"
            ],
            "query": "(\"Hello\") AND (\"World\")"
          }
        },
        {
          "range": {
            "pub_date": {
              "from": 1569456000,
              "to": 1570060800
            }
          }
        }
      ]
    }
  }
}
This query, for example, matched 23 DOCUMENTS. I want a response telling me how MANY words those 23 documents contain, based on the title and content fields.
I would leverage the token_count data type. In your index, you can add a sub-field of type token_count to your title and content fields, like this:
PUT web
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      },
      "content": {
        "type": "text",
        "fields": {
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
Then, in order to find out the number of tokens, you can simply run a sum aggregation on the .length sub-field, like this:
POST web/_search
{
  "size": 0,
  "aggs": {
    "title_tokens": {
      "sum": {
        "field": "title.length"
      }
    },
    "content_tokens": {
      "sum": {
        "field": "content.length"
      }
    }
  }
}
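Note that with "size": 0 and no query, the sums run over every document in the index. To limit the token counts to just the 23 documents matched by the original search, the same aggregations can be combined with that query; a sketch reusing the bool query from the question:
POST web/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [ "title", "content" ],
            "query": "(\"Hello\") AND (\"World\")"
          }
        },
        {
          "range": {
            "pub_date": {
              "from": 1569456000,
              "to": 1570060800
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "title_tokens": {
      "sum": { "field": "title.length" }
    },
    "content_tokens": {
      "sum": { "field": "content.length" }
    }
  }
}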
I am using the data type called token_count. It calculates and stores the count of tokens for each text value; this count can then be used to get the token count of the fields.
PUT index18
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      },
      "content": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "length": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
Data:
"hits" : [
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "edJPtW0BVHM68p7X-Wlu",
"_score" : 1.0,
"_source" : {
"title" : "Mayor Isko"
}
},
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "etJQtW0BVHM68p7XGmmr",
"_score" : 1.0,
"_source" : {
"title" : "Isko"
}
}
]
Query:
GET index18/_search
{
  "query": { "match_all": {} },
  "aggs": {
    "WordCount": {
      "sum": {
        "field": "title.length"
      }
    }
  }
}
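One caveat: token_count values are computed at index time, so if the length sub-fields are added to an existing index through a mapping update, documents indexed before the change will have no values for them until they are reindexed. A minimal sketch that rebuilds existing documents in place (assuming the updated mapping has already been applied):
POST index18/_update_by_query?conflicts=proceed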

Custom indexing template is not being applied

I have a project where I am to analyze and visualize access log data. I use Logstash to send data to Elasticsearch and then visualize some stuff with Kibana.
Everything has worked fine until I discovered that I needed the Path Hierarchy Analyzer to show what I want to. I now have a custom template (JSON) and changed the out section of my Logstash configuration. But when I index data, my template is not being applied.
(Version 5.2 of Elasticsearch and Logstash; I can't update since that is the version in use at the place where I work.)
My JSON file is valid. As far as the input and filters go, my Logstash configuration is fine, too. I guess I made a mistake in the output.
I already tried setting manage_template to false. I also tried template_overwrite => "false" just for the sake of it.
I tried creating the index first (Kibana Dev Tools) and populating it after: I created the index template and then the index. That way my template was applied, and when I created the index pattern, everything seemed correct. Then I indexed one of my log files and ended up with a Courier Fetch error. http://localhost:9200/_all/_mapping?pretty=1 showed me that while indexing my data a default template had been used instead of my custom one; nothing was different from before adding the custom template.
I searched the web and read everything I could find on Stack Overflow and in the Elastic forum about custom templates not being applied, and tried out all the solutions provided there; that is why I ended up opting for a custom template saved locally and providing the path in my Logstash output. But I am all out of ideas now.
This is the output section of my Logstash configuration:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    template => "/etc/logstash/conf.d/template.json"
    index => "beam-%{+YYYY.MM.dd}"
    manage_template => "true"
    template_overwrite => "true"
    document_type => "beamlogs"
  }
  stdout {
    codec => rubydebug
  }
}
And this is my custom template:
{
  "template": "beam_custom",
  "index_patterns": "beam-*",
  "order": 5,
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },
  "mappings": {
    "beamlogs": {
      "properties": {
        "object": {
          "type": "text",
          "fields": {
            "tree": {
              "type": "text",
              "analyzer": "custom_path_tree"
            },
            "tree_reversed": {
              "type": "text",
              "analyzer": "custom_path_tree_reversed"
            }
          }
        },
        "referral": {
          "type": "text",
          "fields": {
            "tree": {
              "type": "text",
              "analyzer": "custom_path_tree"
            },
            "tree_reversed": {
              "type": "text",
              "analyzer": "custom_path_tree_reversed"
            }
          }
        },
        "@timestamp": {
          "type": "date"
        },
        "action": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "datetime": {
          "type": "date",
          "format": "time_no_millis",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "info": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "page": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "path": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "result": {
          "type": "long"
        },
        "s_direct": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "s_limit": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "s_mobile": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "s_terms": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "size": {
          "type": "long"
        },
        "sort": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
After indexing my data, this is part of what I get from http://localhost:9200/_all/_mapping?pretty=1:
"datetime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"object" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
datetime should not have the type text. But worse than that, fields like object.tree are not even created.
I really don't care about the wrong mapping for datetime, but I need to get the Path Hierarchy Analyzer to work. I just don't know what to do anymore.
So, what I just tried was creating the index template in Kibana:
PUT _template/beam_custom
(followed by what is in my template.json)
I then checked if the template was created.
GET _template/beam_custom
The output was this:
{
  "beam_custom": {
    "order": 100,
    "template": "beam_custom",
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "custom_path_tree_reversed": {
              "tokenizer": "custom_hierarchy_reversed"
            },
            "custom_path_tree": {
              "tokenizer": "custom_hierarchy"
            }
          },
          "tokenizer": {
            "custom_hierarchy": {
              "type": "path_hierarchy",
              "delimiter": "/"
            },
            ...
So I guess creating the template worked.
Then I created an index
PUT beam-2019-07-15
But when I checked the index, I got this:
{
  "beam-2019.07.15": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1563044670605",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "rGzplctSQDmrI_NSlt47hQ",
        "version": {
          "created": "5061699"
        },
        "provided_name": "beam-2019.07.15"
      }
    }
  }
}
Shouldn't the index pattern have been recognized? I think this is the heart of the problem. I thought that my template would have been used and the output should have been something like this instead:
{
  "beam-2019.07.15": {
    "aliases": {},
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "action": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          ...
Why doesn't it recognize the pattern?
So, I found the mistake.
When I looked up how to build my own template, at some point I looked at the documentation for the current version. But in 5.2, the "index_patterns" setting doesn't exist yet.
"template": "beam_custom",
"index_patterns": "beam-*",
So this, of course, doesn't work.
Instead, I dropped the "index_patterns" line and defined my pattern in the template parameter.
"template": "beam-*",
//rest
This fixed the problem. After that, my pattern was recognized.
Yet I am facing a different problem now. The Path Hierarchy Analyzer is not working properly. object.tree and the rest of the fields I want are not being created.
GET beam-*/_search
{
  "query": {
    "term": {
      "object.tree": "/belletristik/"
    }
  }
}
yields nothing, though I should have a few hundred hits. Looking at my data, there are no analyzed fields for my paths. Any ideas?
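One way to check whether the custom analyzer actually made it onto the index is the _analyze API; a minimal sketch, assuming the index name from above (the sample path is made up):
POST beam-2019.07.15/_analyze
{
  "analyzer": "custom_path_tree",
  "text": "/belletristik/romane"
}
If the index was created from the template, this should return the path prefixes as separate tokens; if it fails with an unknown-analyzer error, the template's settings were not applied when the index was created.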

Is it impossible to index a document where a property has multiple fields, one of them being a completion type with contexts?

Here is my mapping (some fields renamed/removed); I'm using ES 6.0:
{
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "completion",
              "contexts": [
                {
                  "name": "user_id",
                  "type": "category"
                }
              ]
            }
          }
        },
        "user_id": {
          "type": "integer"
        }
      }
    }
  }
}
Now when I try to index a document with
PUT usernames/_doc/1
{
  "username": "JOHN",
  "user_id": 1
}
OR
PUT usernames/_doc/1
{
  "username": {
    "input": "JOHN",
    "contexts": {
      "user_id": 1
    }
  },
  "user_id": 1
}
The first doesn't index with context and the second just fails. I've attempted to add a path like so,
{
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "completion",
              "contexts": [
                {
                  "name": "user_id",
                  "type": "category",
                  "path": "user_id"
                }
              ]
            }
          }
        },
        "user_id": {
          "type": "integer"
        }
      }
    }
  }
}
And attempting to index again:
PUT usernames/_doc/1
{
  "username": "JOHN",
  "user_id": 1
}
But it just throws a "context must be a keyword or text" error. Do I have to give up and make a totally new property, username-autocomplete, instead? Or is there some magical way where I can have a context completion suggester and another field on the same property, and be able to index like I would any other multi-field property?
The second approach is the right one (i.e. with the path inside the context), but you need to set the user_id field as a keyword and it will work:
{
  "mappings": {
    "_doc": {
      "properties": {
        "username": {
          "type": "keyword",
          "fields": {
            "suggest": {
              "type": "completion",
              "contexts": [
                {
                  "name": "user_id",
                  "type": "category",
                  "path": "user_id"
                }
              ]
            }
          }
        },
        "user_id": {
          "type": "keyword" <--- change this
        }
      }
    }
  }
}
Then you can index your document without creating an additional field, like this:
PUT usernames/_doc/1
{
  "username": "JOHN",
  "user_id": "1" <--- wrap in double quotes
}

Elasticsearch - Can't search using suggestion field (“is not a completion suggest field”)

I'm completely new to Elasticsearch, and I'm trying to use the completion suggester on an existing field called "identity.full_name" (index = "search", type = "person").
I followed the steps below to change the mapping of the field.
1)
POST /search/_close
2)
POST search/person/_mapping
{
  "person": {
    "properties": {
      "identity.full_name": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
3)
POST /search/_open
When I check the mappings at this point, using
GET search/_mapping/person/field/identity.full_name
I get the result,
{
  "search": {
    "mappings": {
      "person": {
        "identity.full_name": {
          "full_name": "identity.full_name",
          "mapping": {
            "full_name": {
              "type": "text",
              "fields": {
                "completion": {
                  "type": "completion",
                  "analyzer": "simple",
                  "preserve_separators": true,
                  "preserve_position_increments": true,
                  "max_input_length": 50
                },
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                },
                "suggest": {
                  "type": "completion",
                  "analyzer": "simple",
                  "preserve_separators": true,
                  "preserve_position_increments": true,
                  "max_input_length": 50
                }
              }
            }
          }
        }
      }
    }
  }
}
This suggests that the field has been updated to be a completion field.
However, when I query to check whether this works, using
GET search/person/_search
{
  "suggest": {
    "person-suggest": {
      "prefix": "EMANNUEL",
      "completion": {
        "field": "identity.full_name"
      }
    }
  }
}
It is giving me the error "Field [identity.full_name] is not a completion suggest field"
I'm not sure why I'm getting this error. Is there anything else I can try?
Sample data:
{
  "_index": "search",
  "_type": "person",
  "_id": "3106105149",
  "_score": 1,
  "_source": {
    "identity": {
      "id": "3106105149",
      "first_name": "FLORENT",
      "last_name": "TEBOUL",
      "full_name": "FLORENT TEBOUL"
    }
  }
}
{
  "_index": "search",
  "_type": "person",
  "_id": "125296353",
  "_score": 1,
  "_source": {
    "identity": {
      "id": "125296353",
      "first_name": "CHRISTINA",
      "last_name": "BHAN",
      "full_name": "CHRISTINA K BHAN"
    }
  }
}
So when I do a GET based on the prefix "CHRISTINA":
GET search/person/_search
{
  "suggest": {
    "person-suggest": {
      "prefix": "CHRISTINA",
      "completion": {
        "field": "identity.full_name.suggest"
      }
    }
  }
}
I'm getting all the results like a match_all query.
You should use it like this, pointing the suggester at the suggest sub-field:
GET search/person/_search
{
  "suggest": {
    "person-suggest": {
      "prefix": "EMANNUEL",
      "completion": {
        "field": "identity.full_name.suggest"
      }
    }
  }
}
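As for the follow-up observation that this returns all documents: a _search body containing only a suggest section still runs a default match_all for the hits part of the response, which is likely why every document came back. A sketch that suppresses the regular hits and keeps only the suggestions, by adding "size": 0:
GET search/person/_search
{
  "size": 0,
  "suggest": {
    "person-suggest": {
      "prefix": "CHRISTINA",
      "completion": {
        "field": "identity.full_name.suggest"
      }
    }
  }
}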
Mapping for GET search/_mapping/person/field/identity.full_name
{
  "search" : {
    "mappings" : {
      "person" : {
        "identity.full_name" : {
          "full_name" : "identity.full_name",
          "mapping" : {
            "full_name" : {
              "type" : "text",
              "fields" : {
                "suggest" : {
                  "type" : "completion",
                  "analyzer" : "simple",
                  "preserve_separators" : true,
                  "preserve_position_increments" : true,
                  "max_input_length" : 50
                }
              }
            }
          }
        }
      }
    }
  }
}

Elasticsearch: multiple languages in two fields when the query's language is unknown or mixed

I am new to Elasticsearch, and I am not sure how to proceed in my situation.
I have the following mapping:
{
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "properties": {
            "en": {
              "type": "string",
              "analyzer": "english"
            },
            "ar": {
              "type": "string",
              "analyzer": "arabic"
            }
          }
        },
        "keyword": {
          "properties": {
            "en": {
              "type": "string",
              "analyzer": "english"
            },
            "ar": {
              "type": "string",
              "analyzer": "arabic"
            }
          }
        }
      }
    }
  }
}
A sample document may have two languages for the same field of the same book. Here are two example documents:
{
  "title": {
    "en": "hello",
    "ar": "مرحبا"
  },
  "keyword": {
    "en": "world",
    "ar": "عالم"
  }
}
{
  "title": {
    "en": "Elasticsearch"
  },
  "keyword": {
    "en": "full-text index"
  }
}
When I know which language is used in the query, I can build the query as follows (here, English):
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en" ]
}
}
Based on my current document mapping, how can I build a query if
the query language is unknown or
is mixed with English and Arabic?
Thanks for any input!
Regards.
p.s. I am also open to any improvement to the above mapping.
the query language is unknown
You can use the same multi_match query, but on all of the fields. For example, assuming you are using the keyword analyzer:
"query": {
"multi_match" : {
"query" : "keywords",
"fields" : [ "title.en", "keyword.en", "title.ar", "keyword.ar" ]
}
}
is mixed with English and Arabic
You need to change the analyzer to standard and then you can perform the same query.
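If re-mapping is not an option, multi_match also accepts a query-time analyzer parameter, so the standard analyzer can be applied per query instead; a sketch for the mixed-language case, using words from the sample documents above:
"query": {
  "multi_match": {
    "query": "hello مرحبا",
    "fields": [ "title.en", "keyword.en", "title.ar", "keyword.ar" ],
    "analyzer": "standard"
  }
}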
Thanks
