Only getting results when elasticsearch is case sensitive - elasticsearch

I currently have a problem with my multi match search in ES.
Its simple like that: If I'm searching for the City "Sachsen", I'm getting results.
If I'm searching for "sachsen" (lowercase), I'm getting no results.
how to avoid this?
QUERY with no results
{
"match" : {
"City" : {
"query" : "sachsen"
}
}
My analyzer is analyzer_keyword. Should I have anything add ?
MAPPING
City: {
type: "string",
analyzer: "analyzer_keyword"
}

Your analyzer_keyword analyzer is most probably of type keyword which means you can only perform exact matches on it.
It's standard practice to apply multiple "variants" of a field, one of which is going to match lowercase, possibly ascii-tokenized characters (think München -> munchen) and one which will not be tokenized in any way (this is what you have in your analyzer_keyword).
Since you intend to search the lowercase version of Sachsen, your mapping could look something like
PUT sachsen
{
"mappings": {
"properties": {
"City": {
"type": "keyword", <----
"fields": {
"standard": {
"type": "text",
"analyzer": "standard" <----
}
}
}
}
}
}
After indexing a doc
POST sachsen/_doc
{
"City": "Sachsen"
}
The following will work for exact matches:
GET sachsen/_search
{
"query": {
"match": {
"City": "Sachsen"
}
}
}
and this for lowercase
GET sachsen/_search
{
"query": {
"match": {
"City.standard": "sachsen"
}
}
}
Note that I'm using the default, standard analyzer here but you can choose any one you deem appropriate.

Related

elastic search for string with quotes and colon

I have a log entry in my elastic search results that looks like this
'source_volid': None
or
'source_volid': '71509e33-3a1f-4c5a-a0a3-e785ff92xxxx'
I happen to only want to find the entries with the uuid as a value
I can't get es to match anything other than source_volid, not matter what the search string and escapes are
I've tried
"source_volid': '"
"source_volid': '"
"source_volid\': \'"
and many others
no matter what it always matches
source_volid
which gives me every entry in the log, which includes the None values.
Tldr;
When search for source_volid: "" it is going to match every document.
Solution
Use the keyword in conjunction with a bool + must_not query this should do the trick.
POST /_bulk
{"index":{"_index":"73887408"}}
{"source_volid":""}
{"index":{"_index":"73887408"}}
{"source_volid":"some data"}
GET /73887408/_search
{
"query": {
"bool": {
"must_not": [
{
"term": {
"source_volid.keyword": ""
}
}
]
}
}
}
ps: Make sure that you index the field as a keyword
Here is the mapping I have.
{
"73887408": {
"mappings": {
"properties": {
"source_volid": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}

Kibana - missing text highlighting for multi-field mapping

I am experimenting with ECS - Elastic Common Schema.
We need to highlight text search for the field error.stack_trace . This field is a multi-field mapped defined here
I just did a simple test running Elasticsearch and Kibana 7.17.4 one field defined as multi-field and one with single field.
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": { "type": "text" },
"stack_trace02": {
"fields": {
"text": {
"type": "text"
}
},
"type": "wildcard"
}
}
}
}
POST simple-index-01/_doc
{
"#timestamp" : "2022-06-07T08:21:05.000Z",
"stack_trace01": "java.lang.NullPointerException: null",
"stack_trace02": "java.lang.NullPointerException: null"
}
Is it a Kibana expected behavior not to highlight multi-fields?
wildcard type will be not available to search using full text query as mentioned in documentaion (it is part of keyword type family):
The wildcard field type is a specialized keyword field for
unstructured machine-generated content you plan to search using
grep-like wildcard and regexp queries.
So when you try below query it will not return result and this is the reason why it is not highlghting your stack_trace02 field in discover.
POST simple-index-01/_search
{
"query": {
"match": {
"stack_trace02": "null"
}
}
}
But below query will give result:
{
"query": {
"wildcard": {
"stack_trace02": {
"value": "*null*"
}
}
}
}
You can create index mapping something like below and your parent type field should text type:
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": {
"type": "text"
},
"stack_trace02": {
"fields": {
"text": {
"type": "wildcard"
}
},
"type": "text"
}
}
}
}
You can now use stack_trace02.wildcard when you want to search wildcard type of query.
There is already open issue on similar behaviour but it is not for wildcard type.

elasticsearch query string(full text search) which can't be searched

we have a document below. I can't searched with financialmarkets. but it can be searched with industry_icon_financialmarkets.png. Can anyone tell me what is the reason?
content is the text type field.
document:
{
"title":"test",
"content":"industry_icon_financialmarkets.png"
}
Query:
{
"from": 0,
"size": 2,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "\"industry_icon_financialmarkets.png\""
}
}
]
}
}
}
The default analyzer for text field is standard which won't break industry_icon_financialmarkets into tokens using _ as a delimiter. I would suggest you to use simple analyzer instead which will breaks text into terms whenever it encounters a character which is not a letter.
You can also add sub-field of type keyword to retain the original value.
So the mapping of the field should be:
{
"content": {
"type": "text",
"analyzer": "simple",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
At the time of creating index, we should have our own mapping for each fields based on its type to get the expected result.
Mapping
PUT relevance
{"mapping":{"ID":{"type":"long"},"title":
{"type":"keyword","analyzer":"my_analyzer"},
"content":
{"type":"string","analyzer":"my_analyzer","search_analyzer":"my_analyzer"}},
"settings":
{"analysis":
{"analyzer":
{"my_analyzer":
{"tokenizer":"my_tokenizer"}},
"tokenizer":
{"my_tokenizer":
{"type":"ngram","min_gram":3,"max_gram":30,"token_chars":
["letter","digit"]
}
}
},"number_of_shards":5,"number_of_replicas":2
}
}
Then start inserting documents,
POST relevance/_doc/1
{
"name": "1elastic",
"content": "working fine" //replace special characters with space using program before inserting into ES index.
}
Query
GET relevance/_search
{"size":20,"query":{"bool":{"must":[{"match":{"content":
{"query":"fine","fuzziness":1}}}]}}}

Elasticsearch: Is there a way to exclude synomyms from highlighting?

I'm trying to exclude synonyms from highlighting. I created a copy of my current analyzer with a synonym filter. So for each field I now have an analyzer and a search_analyzer. The search analyzer is the new analyzer with all the same filters plus the synonym filter.
Any ideas? I am using elasticsearch 5.2
Mapping:
"mappings": {
"doc": {
"properties": {
"body": {
"type": "text",
"analyzer": "custom_analyzer",
"search_analyzer": "custom_analyzer_with_synonyms",
"fields": {
"plain": {
"type": "text",
"analyzer": "standard"
}
}
}
}
}
Search Query:
{
"query": {
"match": {
"body": "something"
}
},
"highlight": {
"pre_tags": "<strong>",
"post_tags": "<strong>",
"fields" : {
"body.plain" : {
"number_of_fragments": 1,
"require_field_match": false
}
}
}
}
I am not sure about the reason behind the problem. I'd have thought that simply highlighting on a non-synonym-analyzed field would have done it. But according to the comments, it is still highlighting the synonyms. There are 2 possible reasons i can think of: (I haven't looked into the highlighter source code)
It could be because of the multi-word synonym problem mentioned in this link: https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-word-synonyms.html It could be fixed now since the link is old. If not, it could be causing the highlighter to look at wrong position offsets.
And/Or, it could also be because of not using the highlight field in the query. The highlighter might be simply using the tokens emitted from the searched field's analyzer (which would contain synonyms) and looking for those tokens in the highlighted field.
If it's the 1st problem, you could try to change your synonyms to use simple contraction. See: https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html#synonyms-contraction But, it has its own problems with the frequencies of uncommon words and could be a lot of work.
Fixing for the second case would be to use the "body.plain" field in the query, but you cannot do that since it affects your scores. In that case, specifying a different query for the highlighter (so that scores are not affected) on the non-synonym field does the trick. It works even if the 1st case is the problem too since we are not using synonyms in the highlight field.
So your query should look something like this:
{
"query": {
"match": {
"body": "something"
}
},
"highlight": {
"pre_tags": "<strong>",
"post_tags": "<strong>",
"fields" : {
"body.plain" : {
"number_of_fragments": 1,
"highlight_query": {
"match": {"body.plain": "something"}
}
}
}
}
}
See: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search-request-highlighting.html#_highlight_query

Proper way to query documents using an 'uppercase' token filter

We have an ElasticSearch index with some fields that use custom analyzers. One of the analyzers includes an uppercase token filter in order to get rid of case sensitivity while making queries (e.g. we want "ball" to also match "Ball" or "BALL")
The issue here is when doing regular expressions, the pattern is matched against the term in the index which is all uppercase. So "app*" won't match "Apple" in our index, because behind the scenes its really indexed as "APPLE".
Is there a way to get this to work without doing some hacky things outside of ES?
I might play around with "query_string" instead and see if that has any different results.
This all depends on the type of the query you are using. If that type will use the analyzer of the field itself to analyze the input string then it should be fine.
If you are using the regexp query, this one will NOT analyze the input string, so if you pass app.* to it, it will stay the same and this is what it will user for search.
But, if you use properly the query_string query that one should work:
{
"settings": {
"analysis": {
"analyzer": {
"my": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"uppercase"
]
}
}
}
},
"mappings": {
"test": {
"properties": {
"some_field": {
"type": "text",
"analyzer": "my"
}
}
}
}
}
And the query itself:
{
"query": {
"query_string": {
"query": "some_field:app*"
}
}
}
To make sure it's doing what I think it is, I always use the _validate api:
GET /_validate/query?explain&index=test
{
"query": {
"query_string": {
"query": "some_field:app*"
}
}
}
which will show what ES is doing to the input string:
"explanations": [
{
"index": "test",
"valid": true,
"explanation": "some_field:APP*"
}
]

Resources