elasticsearch query string(full text search) which can't be searched - elasticsearch

we have a document below. I can't searched with financialmarkets. but it can be searched with industry_icon_financialmarkets.png. Can anyone tell me what is the reason?
content is the text type field.
document:
{
"title":"test",
"content":"industry_icon_financialmarkets.png"
}
Query:
{
"from": 0,
"size": 2,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "\"industry_icon_financialmarkets.png\""
}
}
]
}
}
}

The default analyzer for text field is standard which won't break industry_icon_financialmarkets into tokens using _ as a delimiter. I would suggest you to use simple analyzer instead which will breaks text into terms whenever it encounters a character which is not a letter.
You can also add sub-field of type keyword to retain the original value.
So the mapping of the field should be:
{
"content": {
"type": "text",
"analyzer": "simple",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}

At the time of creating index, we should have our own mapping for each fields based on its type to get the expected result.
Mapping
PUT relevance
{"mapping":{"ID":{"type":"long"},"title":
{"type":"keyword","analyzer":"my_analyzer"},
"content":
{"type":"string","analyzer":"my_analyzer","search_analyzer":"my_analyzer"}},
"settings":
{"analysis":
{"analyzer":
{"my_analyzer":
{"tokenizer":"my_tokenizer"}},
"tokenizer":
{"my_tokenizer":
{"type":"ngram","min_gram":3,"max_gram":30,"token_chars":
["letter","digit"]
}
}
},"number_of_shards":5,"number_of_replicas":2
}
}
Then start inserting documents,
POST relevance/_doc/1
{
"name": "1elastic",
"content": "working fine" //replace special characters with space using program before inserting into ES index.
}
Query
GET relevance/_search
{"size":20,"query":{"bool":{"must":[{"match":{"content":
{"query":"fine","fuzziness":1}}}]}}}

Related

Kibana - missing text highlighting for multi-field mapping

I am experimenting with ECS - Elastic Common Schema.
We need to highlight text search for the field error.stack_trace . This field is a multi-field mapped defined here
I just did a simple test running Elasticsearch and Kibana 7.17.4 one field defined as multi-field and one with single field.
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": { "type": "text" },
"stack_trace02": {
"fields": {
"text": {
"type": "text"
}
},
"type": "wildcard"
}
}
}
}
POST simple-index-01/_doc
{
"#timestamp" : "2022-06-07T08:21:05.000Z",
"stack_trace01": "java.lang.NullPointerException: null",
"stack_trace02": "java.lang.NullPointerException: null"
}
Is it a Kibana expected behavior not to highlight multi-fields?
wildcard type will be not available to search using full text query as mentioned in documentaion (it is part of keyword type family):
The wildcard field type is a specialized keyword field for
unstructured machine-generated content you plan to search using
grep-like wildcard and regexp queries.
So when you try below query it will not return result and this is the reason why it is not highlghting your stack_trace02 field in discover.
POST simple-index-01/_search
{
"query": {
"match": {
"stack_trace02": "null"
}
}
}
But below query will give result:
{
"query": {
"wildcard": {
"stack_trace02": {
"value": "*null*"
}
}
}
}
You can create index mapping something like below and your parent type field should text type:
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": {
"type": "text"
},
"stack_trace02": {
"fields": {
"text": {
"type": "wildcard"
}
},
"type": "text"
}
}
}
}
You can now use stack_trace02.wildcard when you want to search wildcard type of query.
There is already open issue on similar behaviour but it is not for wildcard type.

Why fuzzy query returns a match but query with fuzziness doesn't on the same input?

I created the following index in Elasticsearch:
PUT /my-index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "whitespace",
"filter": ["lowercase", "3_5_edgegrams"]
}
},
"filter": {
"3_5_edgegrams": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 10
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Then I inserted the following document:
{
"name": "Nuvus Gro Corp"
}
When I make the following query (let's call it fuzzy_query):
GET /my-index/_search
{
"query": {
"fuzzy": {
"name": {
"value": "qnuv"
}
}
}
}
I get a match for the above document.
When I make the query (let's call the query match_with_fuzziness):
GET /my-index/_search
{
"query": {
"match": {
"name": {
"query": "qnuv",
"fuzziness": "AUTO"
}
}
}
}
I don't get a match. If I make the following query:
GET /my-index/_search
{
"query": {
"match": {
"name": {
"query": "nuvq",
"fuzziness": "AUTO"
}
}
}
}
I again get a match. I don't understand why when I make the match_with_fuzziness query I don't get any matches.
EDIT: I analyzed the queries with Kibana Profiler and according to the profiler match_with_fuzziness is a SynonymQuery Synonym(name:qnu name:qnuv) query while fuzzy_query is a BoostQuery (name:nuv)^0.6666666
Very similar problem to the one explained in your other question.
The problem is that you haven't specified a specific search_analyzer, so at search time qnuv and nuvq also get analyzed by my_analyzer and edge-ngramed as well, hence the match you're receiving.
If we check the first query, since you're using the fuzzy query, qnuv (the search term) will match nuv (the first indexed edge-ngramed token) with a distance of 1 (i.e. the first q is "tolerated"), which is what the fuzzy query does by default (with "fuzziness: AUTO")
In the third query, nuv (the first edge-ngramed token of the search term) will match nuv (the first indexed edge-ngramed token).
The case of the second query is a bit special and I'm referencing below how the fuzziness parameter works in the context of match queries
Fuzzy matching is not applied to terms with synonyms or in cases where the analysis process produces multiple tokens at the same position. Under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.
The part in bold is what applies to your case. Since the search term qnuv is analyzed by my_analyzer, it produces the two tokens qnu and qnuv at the same position and that does not support fuzzy matching.
You need to change your mapping to this one instead and it will work the way you expect, i.e. all three queries will return your document:
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "standard" <---- add this line
}
}
}

Elasticsearch template in Logstash doesn't mapping and not able to sort fields

I want to sort datas via elasticsearch rest client, below is my template in logstash
{
"index_patterns": ["index_name"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"_doc": {
"properties": {
"int_var": {
"type": "keyword"
}
}
}
}
}
}
When I try to reach, with the below code
{
"size": 100,
"query": {
"bool": {
"must": {
"match": {
"match_field": user_request
}
}
}
},
"sort": [
{"int_var": {"order": "asc"}}
]
}
I've got this error
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true
How can i solve this ? Thanks for answering
Here's the documentation regarding field data and how to enable it as long as you are aware of the performance impacts.
When ingested into Elasticsearch, field values are tokenized based on their data type.
Text fields are broken into tokens delimited by whitespace. I.E. "quick brown fox" creates three tokens: 'quick', 'brown', and 'fox'. If you perform a search for any of these three words, you will generate matches.
Keyword fields, on the other hand, create a single token of the entire value. I.E. "quick brown fox" is a single token, 'quick brown fox'. Searching for anything that is not exactly 'quick brown fox' will generate no matches.
Unless you scrubbed your query before you posted it here, you need to modify the field name under match to be the actual field name, like below.
{
"size": 100,
"query": {
"bool": {
"must": {
"match": {
"int_var": "whatever value you are searching for"
}
}
}
},
"sort": [
{"int_var": {"order": "asc"}}
]
}

Only getting results when elasticsearch is case sensitive

I currently have a problem with my multi match search in ES.
Its simple like that: If I'm searching for the City "Sachsen", I'm getting results.
If I'm searching for "sachsen" (lowercase), I'm getting no results.
how to avoid this?
QUERY with no results
{
"match" : {
"City" : {
"query" : "sachsen"
}
}
My analyzer is analyzer_keyword. Should I have anything add ?
MAPPING
City: {
type: "string",
analyzer: "analyzer_keyword"
}
Your analyzer_keyword analyzer is most probably of type keyword which means you can only perform exact matches on it.
It's standard practice to apply multiple "variants" of a field, one of which is going to match lowercase, possibly ascii-tokenized characters (think München -> munchen) and one which will not be tokenized in any way (this is what you have in your analyzer_keyword).
Since you intend to search the lowercase version of Sachsen, your mapping could look something like
PUT sachsen
{
"mappings": {
"properties": {
"City": {
"type": "keyword", <----
"fields": {
"standard": {
"type": "text",
"analyzer": "standard" <----
}
}
}
}
}
}
After indexing a doc
POST sachsen/_doc
{
"City": "Sachsen"
}
The following will work for exact matches:
GET sachsen/_search
{
"query": {
"match": {
"City": "Sachsen"
}
}
}
and this for lowercase
GET sachsen/_search
{
"query": {
"match": {
"City.standard": "sachsen"
}
}
}
Note that I'm using the default, standard analyzer here but you can choose any one you deem appropriate.

Proper way to query documents using an 'uppercase' token filter

We have an ElasticSearch index with some fields that use custom analyzers. One of the analyzers includes an uppercase token filter in order to get rid of case sensitivity while making queries (e.g. we want "ball" to also match "Ball" or "BALL")
The issue here is when doing regular expressions, the pattern is matched against the term in the index which is all uppercase. So "app*" won't match "Apple" in our index, because behind the scenes its really indexed as "APPLE".
Is there a way to get this to work without doing some hacky things outside of ES?
I might play around with "query_string" instead and see if that has any different results.
This all depends on the type of the query you are using. If that type will use the analyzer of the field itself to analyze the input string then it should be fine.
If you are using the regexp query, this one will NOT analyze the input string, so if you pass app.* to it, it will stay the same and this is what it will user for search.
But, if you use properly the query_string query that one should work:
{
"settings": {
"analysis": {
"analyzer": {
"my": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"uppercase"
]
}
}
}
},
"mappings": {
"test": {
"properties": {
"some_field": {
"type": "text",
"analyzer": "my"
}
}
}
}
}
And the query itself:
{
"query": {
"query_string": {
"query": "some_field:app*"
}
}
}
To make sure it's doing what I think it is, I always use the _validate api:
GET /_validate/query?explain&index=test
{
"query": {
"query_string": {
"query": "some_field:app*"
}
}
}
which will show what ES is doing to the input string:
"explanations": [
{
"index": "test",
"valid": true,
"explanation": "some_field:APP*"
}
]

Resources