handling both exact and partial search on the same search string - elasticsearch

I want to define the schema which can tackle the partial as well as the exact search for the same search value.
The exact search should always return the "exact match", ES should not break the search string into tokens in this case.

For partial match data type of the property should be text and for exact it should be keyword. For having the feasibility to have both partial and exact search without having to index the data to different properties you can leverage using fields. What it does is that it helps to index same data into different ways.
So, lets say you want to index name of persons, and have the ability for partial and exact search. In such case the mapping would be:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Lets index a few docs:
PUT test/_doc/1
{
"name": "Nishant Saini"
}
PUT test/_doc/2
{
"name": "Nishant Kumar"
}
For partial search we have to query name field and it is of type text.
GET test/_doc/_search
{
"query": {
"query_string": {
"query": "Nishant Saini",
"field": [
"name"
]
}
}
}
The above query will return both docs (1 and 2) because one token i.e. Nishant appears in both the document for field name.
For exact search we need to query on name.keyword. To perform exact match we can use term query as below:
{
"query": {
"term": {
"name.keyword": "Nishant Saini"
}
}
}
This would match doc 1 only.

Related

how to write Elastic search query for exact match for a string

I am using kibanna
I am trying to put filter on a field container_name = "armenian"
but I have other container names with following names
armenian_alpha
armenian_beta
armenian_gama
armenian1
armenian2
after putting the filter , search query in kibanna becomes
{
"query": {
"match": {
"container_name": {
"query": "armenian",
"type": "phrase"
}
}
}
}
But the output searches logs for all containers , as I can see the Elastic search query is using a pattern matching
How can I put an exact match with the string provided and avoid the rest ?
You can try out with term query. Do note that it is case sensitive by default unless you specify with case_insensitive equals to true. Also, if your container_name is a text field type instead of keyword field type, do add the .keyword after the field name. Otherwise, ignore the .keyword.
Example:
GET /_search
{
"query": {
"term": {
"container_name.keyword": {
"value": "armenian"
}
}
}
}
Link here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
I would recommend using a direct wildcard in query or wildcard as follow
GET /_search
{
"query": {
"match": {
"container_name": {
"query": "*armenian",
"type": "phrase"
}
}
}
}
GET /_search
{
"query": {
"wildcard": {
"container_name": {
"value": "*armenian"
}
}
}
}
With *armenian you are ensuring that armenian comes at the end.

Kibana - missing text highlighting for multi-field mapping

I am experimenting with ECS - Elastic Common Schema.
We need to highlight text search for the field error.stack_trace . This field is a multi-field mapped defined here
I just did a simple test running Elasticsearch and Kibana 7.17.4 one field defined as multi-field and one with single field.
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": { "type": "text" },
"stack_trace02": {
"fields": {
"text": {
"type": "text"
}
},
"type": "wildcard"
}
}
}
}
POST simple-index-01/_doc
{
"#timestamp" : "2022-06-07T08:21:05.000Z",
"stack_trace01": "java.lang.NullPointerException: null",
"stack_trace02": "java.lang.NullPointerException: null"
}
Is it a Kibana expected behavior not to highlight multi-fields?
wildcard type will be not available to search using full text query as mentioned in documentaion (it is part of keyword type family):
The wildcard field type is a specialized keyword field for
unstructured machine-generated content you plan to search using
grep-like wildcard and regexp queries.
So when you try below query it will not return result and this is the reason why it is not highlghting your stack_trace02 field in discover.
POST simple-index-01/_search
{
"query": {
"match": {
"stack_trace02": "null"
}
}
}
But below query will give result:
{
"query": {
"wildcard": {
"stack_trace02": {
"value": "*null*"
}
}
}
}
You can create index mapping something like below and your parent type field should text type:
PUT simple-index-01
{
"mappings": {
"properties": {
"stack_trace01": {
"type": "text"
},
"stack_trace02": {
"fields": {
"text": {
"type": "wildcard"
}
},
"type": "text"
}
}
}
}
You can now use stack_trace02.wildcard when you want to search wildcard type of query.
There is already open issue on similar behaviour but it is not for wildcard type.

elasticsearch query string(full text search) which can't be searched

we have a document below. I can't searched with financialmarkets. but it can be searched with industry_icon_financialmarkets.png. Can anyone tell me what is the reason?
content is the text type field.
document:
{
"title":"test",
"content":"industry_icon_financialmarkets.png"
}
Query:
{
"from": 0,
"size": 2,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "\"industry_icon_financialmarkets.png\""
}
}
]
}
}
}
The default analyzer for text field is standard which won't break industry_icon_financialmarkets into tokens using _ as a delimiter. I would suggest you to use simple analyzer instead which will breaks text into terms whenever it encounters a character which is not a letter.
You can also add sub-field of type keyword to retain the original value.
So the mapping of the field should be:
{
"content": {
"type": "text",
"analyzer": "simple",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
At the time of creating index, we should have our own mapping for each fields based on its type to get the expected result.
Mapping
PUT relevance
{"mapping":{"ID":{"type":"long"},"title":
{"type":"keyword","analyzer":"my_analyzer"},
"content":
{"type":"string","analyzer":"my_analyzer","search_analyzer":"my_analyzer"}},
"settings":
{"analysis":
{"analyzer":
{"my_analyzer":
{"tokenizer":"my_tokenizer"}},
"tokenizer":
{"my_tokenizer":
{"type":"ngram","min_gram":3,"max_gram":30,"token_chars":
["letter","digit"]
}
}
},"number_of_shards":5,"number_of_replicas":2
}
}
Then start inserting documents,
POST relevance/_doc/1
{
"name": "1elastic",
"content": "working fine" //replace special characters with space using program before inserting into ES index.
}
Query
GET relevance/_search
{"size":20,"query":{"bool":{"must":[{"match":{"content":
{"query":"fine","fuzziness":1}}}]}}}

How to store URLs in Elasticsearch for fast access?

I would like to use Elasticsearch for a website and need a fast way to retrieve documents via the page's URL strings - actually paths (e.g. /shoes/sneakers/nike). The paths are unique.
Following solutions come to my mind:
Store as string, indexed, not analyzed
Store in the _id field
Which one would be the better solution and are there maybe better methods?
Thanks!
You can store the url field as keyword datatype and use the below query to get the results.
https://www.elastic.co/guide/en/elasticsearch/reference/master/keyword.html
POST index/_search
{
"query": {
"term": {
"url": "/shoes/sneakers/nike"
}
}
}
if you store it as text data type then elasticsearch will automatically create a keyword field for you and you can use the below query to get the results
mapping created by Elasticsearch
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
query to search
POST index/_search
{
"query": {
"term": {
"url.keyword": "/shoes/sneakers/nike"
}
}
}
{
"query": {
"match_phrase": {
"url": "/shoes/sneakers/nike"
}
}
}
You need to store url field with their values. This match_phrase will solve your problem.

Elastic search highlight not working

I am new to elastic search and i am trying to highlight the matched keywords
GET /{index}/_search
{
"query": {
"match": {
"_all": "first"
}
},
"highlight": {
"fields": {
"*": {}
},
"require_field_match": false
}
}
My output is a nested object.I also tried without "require_field_match" parameter
You can use one of the 2 methods mentioned in below link to search and highlight on all fields
A field can only be used for highlighting if the original string value
is available, either from the _source field or as a stored field.
The _all field is not present in the _source field and it is not
stored or enabled by default, and so cannot be highlighted. There are
two options. Either store the _all field or highlight the original
fields.
Highlight all fields
you can't produce a highlight with a search from the _all field.
You have to search in an actual field for it to work:
GET /{index}/_search
{
"query": {
"match": {
"title": "first"
}
},
"highlight": {
"fields": {
"title": {}
}
}
}

Resources