Elasticsearch match query with comma value not working - elasticsearch

Hi, we want to support both partial search and exact match for one field, category.
Here is the mapping for category; we achieved this with fields.raw:
"category": {
"properties": {
"name": {
"type": "string",
"analyzer": "autocomplete",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Everything is working as expected; I am able to do both exact and partial search.
But when the data contains a comma (","), the exact match does not work.
I am searching on category.name.raw, which is a not_analyzed field:
{ "query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "",
"type": "cross_fields",
"fields": [
"filed1",
"field2^12"
]
}
},
{
"match": {
"category.name.raw": " Poverty, Poor and Hunger"
}
}
]
}
}
}}}
I am not getting any results and I am not sure what I am doing wrong. Please help me fix this.
Thanks in advance.

Try using the analyzer below:
"lower_whitespace" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "whitespace"
}
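For reference, a minimal sketch of wiring this analyzer into the index settings and pointing the raw sub-field at it (the my_index/my_type names are placeholders; the field layout is taken from the question):
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lower_whitespace": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "category": {
          "properties": {
            "name": {
              "type": "string",
              "analyzer": "autocomplete",
              "search_analyzer": "standard",
              "fields": {
                "raw": {
                  "type": "string",
                  "analyzer": "lower_whitespace"
                }
              }
            }
          }
        }
      }
    }
  }
}
With this, a match query on category.name.raw is analyzed with the same lower_whitespace analyzer at search time, so tokens like "poverty," (comma included) line up between index and query time.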
For more details on tokenizers, see:
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/analysis-whitespace-analyzer.html
Also, it seems you are using an old version of Elasticsearch; upgrading would be a good idea.

The problem is this clause:
{
  "match": {
    "category.name.raw": " Poverty, Poor and Hunger"
  }
}
While the targeted field is mapped as not_analyzed (the equivalent of keyword in newer versions of Elasticsearch), the input of a match query is analyzed. I think it will inherit the standard analyzer defined as the search_analyzer on category.name.
If you need an exact match, use a term query instead of the match query.
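For example, the offending clause could become the following (a sketch; a term query performs no analysis at all, so the value must equal the indexed term exactly - including the leading space shown in the question, if that space really exists in the data):
{
  "term": {
    "category.name.raw": "Poverty, Poor and Hunger"
  }
}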

Related

Elasticsearch synonyms that include spaces, commas and parentheses

I'm attempting to configure Elasticsearch (version 6.4) so it's possible to do full text search on documents that may contain chemical names using a number of chemical synonyms. The synonym terms can:
be multi-word (i.e. contain spaces)
contain hyphens
contain parentheses
contain commas
Can anyone help me come up with a configuration that meets these requirements?
The index config I have at the moment looks like this:
PUT /documents
{
  "settings": {
    "analysis": {
      "analyzer": {
        "chemical_synonyms": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "chem_synonyms"]
        },
        "lower": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      },
      "filter": {
        "chem_synonyms": {
          "type": "synonym_graph",
          "synonyms": [
            "N\\,N-Bis(2-hydroxyethyl)amine, Niax DEOA-LF, 111-42-2"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            },
            "raw": {
              "type": "text",
              "analyzer": "lower"
            }
          }
        }
      }
    }
  }
}
This config contains a single line of Solr-style synonyms. In reality there are more and they come from a file, but the gist is the same.
Assume I have three documents:
PUT /documents/doc/1
{"text": "N,N-Bis(2-hydroxyethyl)amine"}
PUT /documents/doc/2
{"text": "Niax DEOA-LF"}
PUT /documents/doc/3
{"text": "111-42-2"}
If I run a search using this config:
POST /documents/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_operator": "AND",
            "type": "cross_fields",
            "query": "\"N,N-Bis(2-hydroxyethyl)amine\""
          }
        },
        {
          "query_string": {
            "default_operator": "AND",
            "default_field": "*.raw",
            "analyzer": "chemical_synonyms",
            "query": "\"N,N-Bis(2-hydroxyethyl)amine\""
          }
        }
      ]
    }
  }
}
I would expect it to match all three documents; however, it currently does not match document 2. Changing the query to "111-42-2" also fails to match document 2. Searching for "Niax DEOA-LF" correctly matches all three.
How can I change my index config or my search query (or both) so that a search for any one of these synonym terms matches all documents that contain any of the others? Normal full-text search must also continue to work, so the changes can't break standard text searching of non-synonym terms.

How do I search for a partial accented keyword in elasticsearch?

I have the following elasticsearch settings:
"settings": {
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":["lowercase", "asciifolding"]
}
}
}
}
}
The above works fine for the following keywords:
Beyoncé
Céline Dion
The above data is stored in elasticsearch as beyonce and celine dion respectively.
I can search for Celine or Celine Dion without the accent and I get the same results. However, the moment I search for Céline, I don't get any results. How can I configure elasticsearch to search for partial keywords with the accent?
The query body looks like:
{
  "track_scores": true,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": ["name"],
            "type": "phrase",
            "query": "Céline"
          }
        }
      ]
    }
  }
}
and the mapping is
"mappings" : {
"artist" : {
"properties" : {
"name" : {
"type" : "string",
"fields" : {
"orig" : {
"type" : "string",
"index" : "not_analyzed"
},
"simple" : {
"type" : "string",
"analyzer" : "analyzer_keyword"
}
},
}
I would suggest this mapping and then go from there:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "whitespace",
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "analyzer_keyword"
        }
      }
    }
  }
}
Confirm that the same analyzer is getting used at query time. Here are some possible reasons why that might not be happening:
you specify a separate analyzer at query time on purpose that is not performing similar analysis (see the sketch after this list)
you are using a term or terms query, for which no analyzer is applied (see Term Query and the section titled "Why doesn't the term query match my document?")
you are using a query_string query (e.g. see Simple Query String Query) - I have found that specifying multiple fields with different analyzers doesn't work well, so I needed to split the fields into separate queries and set the analyzer parameter on each (working with version 2.0)
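As a sanity check for the first point, you can force the expected analyzer explicitly at query time - a minimal sketch against the name field, assuming the mapping suggested above where analyzer_keyword uses the whitespace tokenizer (the match query accepts an analyzer parameter):
{
  "query": {
    "match": {
      "name": {
        "query": "Céline",
        "analyzer": "analyzer_keyword"
      }
    }
  }
}
If this returns the document while your original query does not, the mismatch is in which analyzer runs at query time.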

Exact match in Elasticsearch query

I want to exactly match the string ":Feed:" in a message field and go back a day to pull all such records. The JSON I have seems to also match the plain word "feed". I am not sure where I am going wrong. Do I need to add "constant_score" to this query JSON? The JSON I currently have is shown below:
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": ["message"],
            "query": "\\:Feed\\:"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "now-1d",
              "lte": "now"
            }
          }
        }
      ]
    }
  }
}
As stated here: Finding Exact Values - since the field was analyzed at index time, you have no way of exact-matching its tokens (the ":" characters are stripped by the analyzer). For the tokens to be searchable, the mapping should be not_analyzed and the data needs to be re-indexed.
If you want to be able to easily match only ":feed:" inside the message field, you may want to customize an analyzer which doesn't strip ":" so you can query the field with a simple match query instead of escaped wildcard characters.
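A minimal sketch of such an analyzer, assuming the message values are whitespace-separated (the my_index/my_type names and the keep_colons analyzer name are illustrative, not from the question):
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keep_colons": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "keep_colons"
        }
      }
    }
  }
}
The whitespace tokenizer splits only on whitespace, so ":feed:" survives as a single lowercased token, and { "match": { "message": ":Feed:" } } will then match it exactly.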
I was not able to do this with query_string, but I managed it by creating a custom normalizer and then using a match or term query.
The following steps worked for me.
Create a custom normalizer (available from v5.2):
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": ["lowercase"]
}
}
}
}
Create a mapping with type "keyword"
{
  "mappings": {
    "default": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "normalize": {
              "type": "keyword",
              "normalizer": "my_normalizer"
            },
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
Use a match or term query:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title.normalize": "string to match"
          }
        }
      ]
    }
  }
}
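A term variant of the same query looks like the sketch below; note that a term query bypasses query-time analysis entirely, so the value you pass must already be in its normalized (here: lowercased) form:
{
  "query": {
    "term": {
      "title.normalize": "string to match"
    }
  }
}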
Use match_phrase:
GET /_search
{
  "query": {
    "match_phrase": {
      "message": "7000-8900"
    }
  }
}
In Java, use matchPhraseQuery of QueryBuilders:
QueryBuilders.matchPhraseQuery(fieldName, searchText);
Simple & sweet solution: use a term query:
GET /_search
{
  "query": {
    "term": {
      "message.keyword": "7000-8900"
    }
  }
}
Use a term query instead of match_phrase: match_phrase matches against the analyzed tokens of the stored sentence, so it is not an exact match of the whole value, while a term query against the keyword sub-field compares the entire string.

Why does a match_phrase_prefix query return wrong results with different lengths of phrase?

I have a very simple query:
POST /indexX/document/_search
{
  "query": {
    "match_phrase_prefix": {
      "surname": "grab"
    }
  }
}
with mapping:
"surname": {
"type": "string",
"analyzer": "polish",
"copy_to": [
"full_name"
]
}
and the index definition (I use the Stempel (Polish) Analysis plugin for Elasticsearch):
POST /indexX
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis/synonyms.txt"
          },
          "polish_stop": {
            "type": "stop",
            "stopwords_path": "analysis/stopwords.txt"
          },
          "polish_my_stem": {
            "type": "stemmer",
            "rules_path": "analysis/stems.txt"
          }
        },
        "analyzer": {
          "polish_with_synonym": {
            "tokenizer": "standard",
            "filter": [
              "synonym",
              "lowercase",
              "polish_stop",
              "polish_stem",
              "polish_my_stem"
            ]
          }
        }
      }
    }
  }
}
For this query I get zero results. When I change the phrase to GRA or GRABA, it returns 1 result (GRABARZ is the surname). Why is this happening?
I tried max_expansions with values as high as 1200 and that didn't help.
At first glance, your analyzer stems the search term ("grab") and renders it unusable ("grabić").
Without going into details on how to resolve this, please consider getting rid of the Polish analyzer here. We are talking about people's names, not "ordinary" Polish words.
I have seen different techniques used in this case: multi-field searches, fuzzy searches, phonetic searches, dedicated plugins.
Some links:
https://www.elastic.co/blog/multi-field-search-just-got-better
http://www.basistech.com/fuzzy-search-names-in-elasticsearch/
https://www.found.no/play/gist/6c6434c9c638a8596efa
But I guess in the case of Polish surnames some kind of prefix query on a non-analyzed field would suffice, as sketched below...
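A sketch of that idea: add a not_analyzed sub-field (the raw name is my own, not from the question) and run the prefix query against it. Note a prefix query is not analyzed, so the casing of the input must match the stored values; for case-insensitive prefixes, use a lowercasing analyzer with a keyword tokenizer instead of not_analyzed.
"surname": {
  "type": "string",
  "analyzer": "polish",
  "copy_to": [
    "full_name"
  ],
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
POST /indexX/document/_search
{
  "query": {
    "prefix": {
      "surname.raw": "GRAB"
    }
  }
}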

analyzer for spelling mistakes

I have saved the user inputs directly in Elasticsearch. The name field has various spelling combinations for the same student:
PrabhuNath Prasad
PrabhuNathPrasad
Prabhu NathPrasad
Prabhu Nath Prashad
PrabhuNath Prashad
PrabhuNathPrashad
Prabhu NathPrashad
The real name of the student is "Prabhu Nath Prasad", and when I search by that name, I should get all the above results back. Is there any analyzer in Elasticsearch that can take care of this?
You could do that with a custom analyzer. This is my setup:
POST name_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "space_removal"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "space_removal": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "variation": {
              "type": "string",
              "analyzer": "my_custom_analyzer"
            }
          }
        }
      }
    }
  }
}
I have mapped name with both the standard analyzer and a custom analyzer that uses the keyword tokenizer and lowercase filter, along with a char_filter that removes spaces and joins the string. This char_filter lets us query the different variations effectively.
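You can verify what the custom analyzer produces with the _analyze API - a quick sketch (body syntax shown for 5.x and later; on older versions pass analyzer and text as URL parameters):
GET name_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Prabhu Nath Prasad"
}
The Prasad variants all reduce to the single token prabhunathprasad and the Prashad variants to prabhunathprashad - one edit apart, which is why the fuzzy clause below catches them all.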
I inserted all 7 combinations you gave into the index. This is my query:
GET name_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "Prabhu Nath Prasad"
          }
        },
        {
          "match": {
            "name.variation": {
              "query": "Prabhu Nath Prasad",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
This handles all your possibilities, and it will also bring back prabhu, prasad, etc.
Hope this helps!
There is no analyzer for that; however, what you can look into is "fuzzy" matching.
In your query, specify the fuzziness, which can help you get the above records.
I suggest you go through the links below:
https://www.elastic.co/blog/found-fuzzy-search
https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzziness.html
This will help you achieve what you want.
Also, there won't be any straightforward way to get the record if the user typed "PrabhuNath", because Elasticsearch will treat it as a single token; however, you can use a "phrase_prefix" query, which helps fetch records while the user is typing - see the sketch below.
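A minimal match_phrase_prefix sketch for that as-you-type case (the name field is from the question; the partial input is illustrative):
{
  "query": {
    "match_phrase_prefix": {
      "name": "Prabhu Na"
    }
  }
}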
Your query will look like this to catch basic spelling mistakes:
{
  "query": {
    "match": {
      "name": {
        "query": "PrabhuNath Prasad",
        "fuzziness": 2
      }
    }
  }
}
