Elasticsearch - how to search exact string match with special character (-) in the JSON document

I have been using ES 5.x, and this is my sample data set (JSON):
{"id":"1"}
{.... "company" : "HCL-US",....}
{"id":"2"}
{.... "company" : "HCL",....}
{"id":"3"}
{.... "company" : "HCL-IND",....}
{"id":"4"}
{.... "company" : "HCL-AUS",....}
How can I search for the documents that belong to "HCL-US"? I tried the query _search?q=company:"HCL-US", but it returns all the HCL* results. How can I match an exact string that contains a special character?

You can use a term query, which matches an exact term. Assuming company is a text field created by the default dynamic mapping, you also get a keyword version of it (company.keyword), so the following query should do what you need:
{
  "query": {
    "term": {
      "company.keyword": {
        "value": "HCL-US"
      }
    }
  }
}

1/ You can specify a whitespace analyzer in the mapping for the company field. This analyzer splits text only on whitespace, while the standard analyzer splits on non-alphanumeric characters such as the hyphen, so "HCL-US" is indexed as the two terms "hcl" and "us".
The standard analyzer is the one used when no analyzer is defined.
2/ Or you can query company.keyword, a keyword sub-field that dynamic mapping creates automatically for string fields since 5.x. This field is not analyzed, so you can safely use a term query on it for exact matching.
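The difference between these options can be illustrated with a small Python sketch that does not talk to Elasticsearch at all. It is a toy model under stated assumptions: standard_like is only a rough approximation of the standard analyzer for plain ASCII values (split on non-alphanumeric characters, then lowercase), and the helper names are hypothetical. It shows why a match-style full-text search for "HCL-US" also returns the other HCL companies, while a term query against an unanalyzed keyword value, or a whitespace-analyzed field, matches only "HCL-US".

```python
import re

# Sample "company" values from the question, keyed by document id.
DOCS = {"1": "HCL-US", "2": "HCL", "3": "HCL-IND", "4": "HCL-AUS"}


def standard_like(text):
    # Rough approximation of the standard analyzer for ASCII input:
    # split on runs of non-alphanumeric characters, then lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def whitespace(text):
    # Whitespace analyzer: split on whitespace only, keep case and hyphens.
    return text.split()


def keyword(text):
    # keyword field: the whole value is indexed as a single, unchanged term.
    return [text]


def term_query(analyzer, value):
    # term query: the value is NOT analyzed; it must equal one indexed term.
    return sorted(d for d, company in DOCS.items() if value in analyzer(company))


def match_query(analyzer, text):
    # match query (default OR operator): the query text is analyzed too,
    # and a document matches if it shares at least one term with it.
    query_terms = set(analyzer(text))
    return sorted(d for d, company in DOCS.items() if query_terms & set(analyzer(company)))


print(match_query(standard_like, "HCL-US"))  # ['1', '2', '3', '4']: "hcl" is in every doc
print(term_query(standard_like, "HCL-US"))   # []: the text field holds "hcl" and "us"
print(term_query(keyword, "HCL-US"))         # ['1']
print(term_query(whitespace, "HCL-US"))      # ['1']
```

The real standard analyzer follows Unicode word-boundary rules, so this approximation only holds for simple values like the ones in the question.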

Related

Elastic search wildcard search space issue

Consider an index field "ProductName" with the value "dove 3.75oz". When a user searches for "dove 3.75oz", the bool query below retrieves the document correctly:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75oz"}}}]}}
If the user searches for "dove 3.75 oz" (with a space between "3.75" and "oz"), the same bool query fails to retrieve the document:
{"bool":{"must":[{"wildcard":{"ProductName":{"value":"dove"}}},{"wildcard":{"ProductName":{"value":"3.75 oz"}}}]}}
Question: How can I design a wildcard query that works both with and without spaces? Please share an example.
Text field values are broken into tokens by default before they are stored, so something like "hello man" is saved as the separate terms hello and man because of the space between them. That is exactly why this will not work with a wildcard query:
{"wildcard":{"ProductName":{"value":"3.75 oz"}}}
A wildcard query is matched against single tokens only. For wildcard queries over values that contain spaces, you can use the special wildcard field type (available since Elasticsearch 7.9).
If you do not want to reindex your data, try a phrase search like:
"match_phrase": {
  "ProductName": {
    "query": "3.75 oz"
  }
}
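To illustrate why the space breaks the wildcard query, here is a minimal Python sketch. It assumes, as a simplification, that the field is tokenized on whitespace only, and it uses the standard library's fnmatchcase as a stand-in for Elasticsearch's * and ? wildcards; the function names are hypothetical. On a text field the pattern is compared with each term separately, whereas a keyword or wildcard field compares it with the whole value.

```python
from fnmatch import fnmatchcase


def terms_of(text):
    # Simplification: split on whitespace only (a real analyzer may also
    # lowercase, which does not matter for these values).
    return text.split()


def wildcard_on_text_field(value, pattern):
    # On a text field the pattern is compared with each indexed term separately.
    return any(fnmatchcase(term, pattern) for term in terms_of(value))


def wildcard_on_whole_value(value, pattern):
    # A keyword or wildcard field keeps the whole value, so the pattern
    # is compared with the complete string, spaces included.
    return fnmatchcase(value, pattern)


print(wildcard_on_text_field("dove 3.75oz", "3.75oz"))       # True
print(wildcard_on_text_field("dove 3.75 oz", "3.75 oz"))     # False: no single term contains a space
print(wildcard_on_whole_value("dove 3.75 oz", "*3.75 oz*"))  # True
```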

Elastic Search - Conditional field query if no match found for another field

Is it possible to run a conditional field query when no match is found for another field?
For example, if I have three fields in the index, local_rating, global_rating and default_rating, I need to check local_rating first; if there is no match, then try global_rating, and finally default_rating.
Is this possible with one query, or is there another way to achieve this?
Thanks in advance.
I'm not sure whether Elasticsearch has an existing feature that fulfils your exact requirement, but you can try a multi_match query over those fields with per-field boosting: individual fields can be boosted with the caret (^) notation. I also don't know whether boosting works with numeric values.
GET /_search
{
  "query": {
    "multi_match": {
      "query": 10,
      "fields": ["local_rating^6", "global_rating^3", "default_rating"]
    }
  }
}
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#field-boost
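A rough Python model of how the boosted multi_match above could rank documents may help. It assumes the default best_fields type, where a document's score is that of its best-matching field, and it further assumes that each matching rating field contributes a constant score equal to its boost; real Elasticsearch scores depend on the field type and similarity. The documents and helper names are hypothetical. Note that boosting only orders the results: unlike a strict fallback, documents that match only global_rating or default_rating are still returned.

```python
# Per-field boosts from the query: "local_rating^6", "global_rating^3", "default_rating".
BOOSTS = {"local_rating": 6.0, "global_rating": 3.0, "default_rating": 1.0}


def best_fields_score(doc, value):
    # best_fields with tie_breaker 0: keep only the best-scoring matching field.
    # Assumption: a matching field scores exactly its boost.
    scores = [boost for field, boost in BOOSTS.items() if doc.get(field) == value]
    return max(scores, default=0.0)


def rank(docs, value):
    # Sort by descending score, then drop documents where no field matched.
    scored = sorted(docs, key=lambda d: best_fields_score(d, value), reverse=True)
    return [d["id"] for d in scored if best_fields_score(d, value) > 0]


# Hypothetical documents.
docs = [
    {"id": "c", "default_rating": 10},
    {"id": "a", "local_rating": 10, "global_rating": 10},
    {"id": "d", "local_rating": 7},
    {"id": "b", "global_rating": 10},
]

print(rank(docs, 10))  # ['a', 'b', 'c']
```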

ElasticSearch Search query is not case sensitive

I am running a search query, and it works fine for an exact search, but if the user enters the text in lowercase or uppercase, it does not work, as ElasticSearch is case insensitive.
Example:
{
  "query": {
    "bool": {
      "should": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "city": "pune"
        }
      }
    }
  }
}
It works fine when the city is exactly "pune", but if we change the text to "PUNE", it does not work.
"ElasticSearch is case insensitive."
That is not quite right: whether a search is case-sensitive in Elasticsearch depends on how the field is mapped and analyzed. A JSON string property will be mapped as a text datatype by default (with a keyword datatype sub-field, or multi-field, which I'll explain shortly).
A text datatype has the notion of analysis associated with it: at index time, the string input is fed through an analysis chain, and the resulting terms are stored in an inverted index data structure for fast full-text search. For a text field where you haven't specified an analyzer, the default analyzer is used, which is the Standard Analyzer. One of the components of the Standard Analyzer is the Lowercase token filter, which lowercases tokens (terms).
When it comes to querying Elasticsearch through the search API, there are many different types of query to fit pretty much any use case. One family of queries, such as match and multi_match, are full-text queries. These queries analyze the query input at search time and compare the resulting terms with the terms stored in the inverted index. By default, the analyzer used at search time is the same one the field uses at index time, so here it is also the Standard Analyzer.
Another family of queries, such as term, terms and prefix, are term-level queries. These queries do not analyze the query input, so the input is compared as-is with the terms stored in the inverted index.
In your example, your term query on the "city" field finds no matches when capitalized because it searches a text field whose input was lowercased by analysis at index time. With the default mapping, this is where the keyword sub-field could help. A keyword datatype does not undergo analysis (well, it has a type of analysis through normalizers), so it can be used for exact matching, as well as for sorting and aggregations. To use it, you would just need to target the "city.keyword" field. An alternative approach is to change the analyzer used by the "city" field to one that does not use the Lowercase token filter; this would require you to reindex all documents in the index.
Elasticsearch analyzes text fields with the standard analyzer, which lowercases their terms, unless you define a custom mapping.
Exact values (like numbers, dates, and keywords) have the exact value specified in the field added to the inverted index in order to make them searchable.
However, text fields are analyzed. This means that their values are first passed through an analyzer to produce a list of terms, which are then added to the inverted index. There are many ways to analyze text: the default standard analyzer drops most punctuation, breaks up text into individual words, and lower cases them.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
So if you want to use a term query, analyze the term yourself before querying, or, in this case, simply lowercase it.
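The advice above can be sketched in a few lines of Python, again without a cluster. This is a toy model: standard_like is a hypothetical, rough approximation of the standard analyzer, and the single "Pune" document is assumed. It shows that an unanalyzed term query for "PUNE" misses the lowercased indexed term, while lowercasing on the client, or using an analyzed match query, finds it.

```python
import re


def standard_like(text):
    # Rough approximation of the standard analyzer: split on
    # non-alphanumeric characters and lowercase each token.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


# Terms stored in the inverted index for a document whose city is "Pune".
indexed_terms = standard_like("Pune")


def term_query(value):
    # term query: no analysis, exact comparison with the indexed terms.
    return value in indexed_terms


def match_query(text):
    # match query: the input goes through the same analyzer first.
    return any(t in indexed_terms for t in standard_like(text))


print(term_query("PUNE"))          # False
print(term_query("PUNE".lower()))  # True: lowercased on the client
print(match_query("PUNE"))         # True
```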
To solve this issue, I created a custom normalizer and updated the mapping to use it.
Before that, we have to delete the index and create it again.
First, delete the index:
DELETE http://localhost:9200/users
Now create the index again:
PUT http://localhost:9200/users
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "user": {
      "properties": {
        "city": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}
(The "user" mapping type is Elasticsearch 6.x and earlier syntax; from 7.0 on, put "properties" directly under "mappings".)
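The effect of this normalizer can be approximated in Python. This is a sketch under assumptions: lowercase_normalizer here is a hypothetical stand-in for the custom normalizer above, and it covers only accented Latin letters, a subset of what asciifolding handles; the city values are made up. The Elasticsearch keyword field documentation states that the normalizer is applied both at index time and, in recent versions, to term-level queries such as term at search time, which is why "PUNE" then matches while the comparison stays an exact, whole-value match.

```python
import unicodedata


def lowercase_normalizer(value):
    # Approximates "lowercase" followed by "asciifolding": NFKD decomposition
    # plus dropping combining marks only handles accented Latin letters.
    lowered = value.lower()
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# keyword + normalizer: each whole value becomes one normalized term at index time.
indexed = {lowercase_normalizer(city) for city in ["Pune", "Mumbai", "Bogotá"]}


def term_query(value):
    # The same normalizer is applied to the query value before comparing.
    return lowercase_normalizer(value) in indexed


print(term_query("PUNE"))    # True
print(term_query("pune"))    # True
print(term_query("BOGOTA"))  # True
print(term_query("Pun"))     # False: still an exact, whole-value match
```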

elastic search fetch the exact match first followed by others

I am a newbie to Elasticsearch.
I have an education index in ES.
index creation
When I search for 'btech' with a match query:
"match" : { "name" : "btech" }
the result is:
result json object
But I need the exact match, btech, as the first document, followed by the remaining documents.
What do I have to change in my index creation for that?
Can anybody please help me?
You can use a term query:
"term" : { "name" : "btech" }
Or a regexp query:
"regexp" : { "name" : "btech" }
You are using the text type; make sure to check the keyword type too. From the documentation:
If you need to index structured content such as email addresses, hostnames, status codes, or tags, it is likely that you should rather use a keyword field.
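Why the keyword type matters here can be shown with a small Python model. The names, helper functions and the analyzer approximation are hypothetical, and Python's re stands in for Lucene regular expressions, which are likewise anchored to the whole term (hence fullmatch). On a text field, both the term and the regexp query compare against individual analyzed terms, so "btech computer science" still matches; only a keyword field compares the whole value.

```python
import re

# Hypothetical values of the "name" field.
NAMES = {"1": "btech", "2": "btech computer science", "3": "mtech"}


def text_terms(value):
    # Rough approximation of the standard analyzer used by a text field.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def term_on_text(value):
    # term query on the text field: compared with each analyzed term.
    return sorted(d for d, n in NAMES.items() if value in text_terms(n))


def term_on_keyword(value):
    # term query on a keyword field: compared with the whole, unanalyzed value.
    return sorted(d for d, n in NAMES.items() if n == value)


def regexp_on_text(pattern):
    # Regular expressions are anchored, so fullmatch is applied per term.
    return sorted(d for d, n in NAMES.items()
                  if any(re.fullmatch(pattern, t) for t in text_terms(n)))


print(term_on_text("btech"))     # ['1', '2']
print(regexp_on_text("btech"))   # ['1', '2']
print(term_on_keyword("btech"))  # ['1']
```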

Elasticsearch match query and tokenization

I wrote the following query on a field that is tokenized on whitespace:
"match": {
  "field": {
    "query": "bora"
  }
}
I have two documents in my index that match the query: one with "bora" in that field, and another with "bora bora".
My problem is that the "bora bora" document ends up with a better score than the other, which is not the required behaviour.
Do you see a way to run the same query but prioritize the records that do not repeat the searched word?
I can't update the index / remove the tokenization.
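The observed ranking is consistent with BM25, the default similarity since Elasticsearch 5.0, where a higher term frequency raises the score even after length normalization. The sketch below uses the classic BM25 term-frequency component with the default k1 = 1.2 and b = 0.75, assumes the index holds only these two documents (average field length 1.5), and omits the IDF factor because it is identical for both documents. The exact constants in Lucene's implementation vary between versions, but the ordering in this example does not change.

```python
def bm25_tf_norm(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # Term-frequency part of classic BM25; k1 and b are Elasticsearch's defaults.
    # The IDF factor is omitted because it is the same for both documents.
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))


avg_len = (1 + 2) / 2  # field lengths: "bora" = 1 term, "bora bora" = 2 terms
single = bm25_tf_norm(tf=1, doc_len=1, avg_doc_len=avg_len)
repeated = bm25_tf_norm(tf=2, doc_len=2, avg_doc_len=avg_len)

print(round(single, 3), round(repeated, 3))  # 1.158 1.257
```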
