Here are my settings:
{
"countries": {
"aliases": {},
"mappings": {
"country": {
"properties": {
"countryName": {
"type": "string"
}
}
}
},
"settings": {
"index": {
"creation_date": "1472140045116",
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonym.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
}
}
},
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "7-fKyD9aR2eG3BwUNdadXA",
"version": {
"created": "2030599"
}
}
},
"warmers": {}
}
}
My synonym.txt file is in the config folder inside the main elasticsearch folder.
Here is my query:
query: {
query_string: {
fields: ["countryName"],
default_operator: "AND",
query: searchInput,
analyzer: "synonym"
}
}
The words in synonym.txt are: us, u.s., united states.
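That is, the file consists of a single comma-separated equivalence group on one line (the default Solr synonym format):
us, u.s., united states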
This doesn't work. What's interesting is that search works as normal except when I enter any of the words from synonym.txt. For example, typing us into the search normally returns results; with this analyzer, us gives me nothing back.
I've closed and reopened the index on my ES server, and it still doesn't work.
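For reference, the close/open cycle I mean is roughly:
POST /countries/_close
POST /countries/_open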
EDIT
An example of a document:
{
"_index": "countries",
"_type": "country",
"_id": "57aabeb80057405968de152b",
"_score": 1,
"_source": {
"countryName": "United States"
}
}
Example of searchInput (this is coming from the front-end):
united states
EDIT #2:
Here is my updated index config file:
{
"countries": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"number_of_shards": "5",
"creation_date": "1472219634083",
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonym.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "whitespace"
}
}
},
"country": {
"properties": {
"countryName": {
"type": "string",
"analyzer": "synonym"
},
"number_of_replicas": "1",
"uuid": "50ZwpIVFTqeD_rJxlmd59Q",
"version": {
"created": "2030599"
}
}
},
"warmers": {}
}
}
}
}
When I add documents and then search those documents, the synonym analyzer still does not work for me.
EDIT #3
Here are 2 documents in the index:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [{
"_index": "stocks",
"_type": "stock",
"_id": "2",
"_score": 1,
"_source": {
"countryName": "United States"
}
}, {
"_index": "stocks",
"_type": "stock",
"_id": "1",
"_score": 1,
"_source": {
"countryName": "Canada"
}
}]
}
}
You are close, but I suggest reading this section of the documentation thoroughly to better understand how this functionality works.
As a solution:
PUT /countries
{
"mappings": {
"country": {
"properties": {
"countryName": {
"type": "string",
"analyzer": "synonym"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonym.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"lowercase",
"synonym"
],
"tokenizer": "whitespace"
}
}
}
}
}
You need to delete the index and create it again with the mapping above.
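Concretely, that is something along the lines of:
DELETE /countries
followed by the PUT /countries request above, and then re-indexing your documents.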
Then use this query:
"query": {
"query_string": {
"fields": [
"countryName"
],
"default_operator": "AND",
"query": "united states"
}
}
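To check that the synonym filter is really being applied, you can also run the _analyze API against the recreated index; a rough sketch:
GET /countries/_analyze
{
  "analyzer": "synonym",
  "text": "united states"
}
The returned token stream should then contain the expanded synonym terms (for example us) in addition to united and states.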
Have you deleted/recreated the index after pushing the txt file?
I think you should remove the "synonyms": "" entry if you are using "synonyms_path".
I think I should explain my problem with an example:
Assume that I've created an index with a synonym analyzer and declared that "laptop", "phone" and "tablet" are similar words that can be generalized as "mobile":
PUT synonym
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": [
"phone, tablet, laptop => mobile"
]
}
}
}
}
},
"mappings": {
"synonym" : {
"properties" : {
"field1" : {
"type" : "text",
"analyzer": "synonym",
"search_analyzer": "synonym"
}
}
}
}
}
Now I am creating some docs:
PUT synonym/synonym/1
{
"field1" : "phone"
}
PUT synonym/synonym/2
{
"field1" : "tablet"
}
PUT synonym/synonym/3
{
"field1" : "laptop"
}
Now when I run a match query for laptop, tablet or phone, the result is always:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.2876821,
"hits": [
{
"_index": "synonym",
"_type": "synonym",
"_id": "2",
"_score": 0.2876821,
"_source": {
"field1": "tablet"
}
},
{
"_index": "synonym",
"_type": "synonym",
"_id": "1",
"_score": 0.18232156,
"_source": {
"field1": "phone"
}
},
{
"_index": "synonym",
"_type": "synonym",
"_id": "3",
"_score": 0.18232156,
"_source": {
"field1": "laptop"
}
}
]
}
}
You can see that the score of tablet is always higher even when I search for laptop.
I know that is because I declared them as similar words.
However, I am trying to figure out how I can query so that the document containing the actual search term appears first in the result list, ahead of the documents that only match through synonyms.
It can be done by boosting, but there must be a simpler approach.
Multi-fields to your rescue.
Index field1 in two ways: once with the synonym analyzer and once with the standard analyzer.
Now you can simply use a bool should query to add up the scores of a match on field1 (synonym) and a match on field1.raw (standard).
So, your mappings should be like so:
PUT synonym
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2,
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms": [
"phone, tablet, laptop => mobile"
]
}
}
}
}
},
"mappings": {
"synonym": {
"properties": {
"field1": {
"type": "text",
"analyzer": "synonym",
"search_analyzer": "synonym",
"fields": {
"raw": {
"type": "text",
"analyzer": "standard"
}
}
}
}
}
}
}
And you can query using:
GET synonym/_search?search_type=dfs_query_then_fetch
{
"query": {
"bool": {
"should": [
{
"match": {
"field1": "tablet"
}
},
{
"match": {
"field1.raw": "tablet"
}
}
]
}
}
}
Notice: I've used search_type=dfs_query_then_fetch. Since you're testing on 3 shards and have very few documents, the scores you're getting aren't what they should be. This is because the frequencies are calculated per shard. You can use dfs_query_then_fetch while testing but it is discouraged for production. See: https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch
I am using Elasticsearch 5.4 and want to query across an index containing documents of multiple types (type a and type b). Below are example documents from the index:
Documents:
{
"_index": "test",
"_type": "a",
"_id": "1",
"_source": {
"id": "1",
"name": "john-usa-soccer",
"class": "5",
"lastseen": "2017-07-05",
"a_atts": {
"lastname": "tover",
"hobby": "soccer",
"country": "usa"
}
}
}
{
"_index": "test",
"_type": "b",
"_id": "2",
"_source": {
"id": "2",
"name": "john-usa",
"class": "5",
"lastseen": "2017-07-05",
"b_atts": {
"lastname": "kaml",
"hobby": "baseball",
"country": "usa"
}
}
}
Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "ngram",
"min_gram": "3",
"max_gram": "3",
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"a": {
"dynamic_templates": [
{
"strings": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "my_ngram_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"suggest": {
"type": "completion",
"analyzer": "simple"
},
"analyzer1": {
"type": "text",
"analyzer": "simple"
},
"analyzer2": {
"type": "text",
"analyzer": "standard"
}
}
}
}
}
]
},
"b": {
"dynamic_templates": [
{
"strings": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "text",
"analyzer": "my_ngram_analyzer",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
},
"suggest": {
"type": "completion",
"analyzer": "simple"
},
"analyzer1": {
"type": "text",
"analyzer": "simple"
},
"analyzer2": {
"type": "text",
"analyzer": "standard"
}
}
}
}
}
]
}
}
}
My query is to search for all documents that contain 'john' in any field of any type and highlight the fields where the match was found. The query is constructed as per the Elastic documentation. My schema mapping configures my_ngram_analyzer, instead of the default analyzer, for all string fields.
Query: http://localhost:9200/student/_search
{
"query": {
"bool": {
"should": [
{ "match": { "_all": "john"} }
]
}
},
"highlight": {
"fields": {
"name": {
"require_field_match": false
},
"a_atts.lastname":{
"require_field_match": false
},
"a_atts.hobby":{
"require_field_match": false
},
"a_atts.country":{
"require_field_match": false
}
}
}
}
Response:
{
"took": 79,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.17669111,
"hits": [
{
"_index": "student",
"_type": "a",
"_id": "AV1WjBeYEZrDBYsdGMtY",
"_score": 0.17669111,
"_source": {
"name": "john-usa-soccer",
"class": "5",
"lastseen": "2017-07-05",
"a_atts": {
"lastname": "tover",
"hobby": "soccer",
"country": "usa"
}
}
},
{
"_index": "student",
"_type": "b",
"_id": "AV1WjHFxEZrDBYsdGMtZ",
"_score": 0.17669111,
"_source": {
"name": "john-usa",
"class": "5",
"lastseen": "2017-07-05",
"b_atts": {
"lastname": "kaml",
"hobby": "baseball",
"country": "usa"
}
}
}
]
}
}
However, executing the above query against the index returns the matched documents with their _source content, but no highlight field. The response is missing the following:
"highlight": {
"name": [
"<em>john</em>-usa-soccer"
]
}
How can I get the highlight section returned in the results?
I got the highlighter to work by following the answer provided in this link.
"highlight": {
"fields": {
"*": {}
},
"require_field_match": false
}
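For completeness, combining it with the original query gives a full request along these lines:
POST http://localhost:9200/student/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "_all": "john" } }
      ]
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    },
    "require_field_match": false
  }
}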
When I search with an odd number of words using my shingle analyzer, no results show up.
'new delhi abcd' does not return any results, but 'new delhi abcd xyz' does.
My analyzers look like this:
{
"search": {
"settings": {
"index": {
"analysis": {
"filter": {
"my_shingle_filter": {
"max_shingle_size": "2",
"min_shingle_size": "2",
"output_unigrams": "false",
"type": "shingle"
},
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": "1",
"max_gram": "40"
}
},
"analyzer": {
"my_shingle_analyzer": {
"filter": [
"lowercase",
"my_shingle_filter"
],
"type": "custom",
"tokenizer": "whitespace"
},
"shingle_with_autocomplete": {
"filter": [
"lowercase",
"my_shingle_filter",
"autocomplete_filter"
],
"type": "custom",
"tokenizer": "whitespace"
}
}
}
}
}
}
}
The mapping for the type is shown below:
{
"search": {
"mappings": {
"address": {
"properties": {
"full_address": {
"type": "text",
"norms": false,
"fields": {
"edge_n_grams": {
"type": "text",
"norms": false,
"analyzer": "autocomplete"
},
"shingles": {
"type": "text",
"norms": false,
"analyzer": "shingle_with_autocomplete"
},
"synonym": {
"type": "text",
"norms": false,
"analyzer": "synonym_autocomplete"
}
},
"analyzer": "whitespace"
}
}
}
}
}
}
Query 1
GET search/address/_search?_source=full_address
{
"query": {
"match": {
"full_address.shingles": {
"query": "new delhi",
"analyzer": "my_shingle_analyzer"
}
}
}
}
Results
{
...
"hits": {
"total": 21801,
"max_score": 8.015874,
"hits": [{
"_index": "search",
"_type": "address",
"_id": "581c50297fd84ecc35420570",
"_score": 8.015874,
"_source": {
"full_address": "new delhi nagar palika adarsh vidyalay new delhi nagar palika adarsh vidyalay tilak lane tilak marg area new delhi delhi 110001"
}
}, {
"_index": "search",
"_type": "address",
"_id": "581c502a7fd84ecc35422010",
"_score": 7.013889,
"_source": {
"full_address": "kingdom hall of jehovant witness gyan bharti public school saket new delhi new delhi delhi 110017"
}
},
...
]
}
}
Query 2
GET search/address/_search?_source=full_address
{
"query": {
"match": {
"full_address.shingles": {
"query": "new delhi nag",
"analyzer": "my_shingle_analyzer"
}
}
}
}
Results
{
...,
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
'new delhi nagar pal' returns proper results.
Any help/insights would be appreciated.
This is a bug in ES 5.2.x. The issue is with the shingle filter in the new version.
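If you want to see what is happening to the odd word, the _analyze API shows the tokens each analyzer emits; a rough sketch:
GET search/_analyze
{
  "analyzer": "my_shingle_analyzer",
  "text": "new delhi nag"
}
Comparing that output with the same text run through shingle_with_autocomplete (the analyzer used for full_address.shingles) shows which query shingles have no counterpart in the index.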
I have my analyzers set like this:
"analyzer": {
"edgeNgram_autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "autocomplete"]
},
"full_name": {
"filter":["standard","lowercase","asciifolding"],
"type":"custom",
"tokenizer":"standard"
}
}
My filter:
"filter": {
"autocomplete": {
"type": "edgeNGram",
"side":"front",
"min_gram": 1,
"max_gram": 50
}
}
Name field analyzer:
"textbox": {
"_parent": {
"type": "document"
},
"properties": {
"text": {
"fields": {
"text": {
"type":"string",
"analyzer":"full_name"
},
"autocomplete": {
"type": "string",
"index_analyzer": "edgeNgram_autocomplete",
"search_analyzer": "full_name",
"analyzer": "full_name"
}
},
"type":"multi_field"
}
}
}
Put all together, this makes up my mapping for the docstore index:
PUT http://localhost:9200/docstore
{
"settings": {
"analysis": {
"analyzer": {
"edgeNgram_autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "autocomplete"]
},
"full_name": {
"filter":["standard","lowercase","asciifolding"],
"type":"custom",
"tokenizer":"standard"
}
},
"filter": {
"autocomplete": {
"type": "edgeNGram",
"side":"front",
"min_gram": 1,
"max_gram": 50
}
}
}
},
"mappings": {
"space": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"document": {
"_parent": {
"type": "space"
},
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"textbox": {
"_parent": {
"type": "document"
},
"properties": {
"bbox": {
"type": "long"
},
"text": {
"fields": {
"text": {
"type":"string",
"analyzer":"full_name"
},
"autocomplete": {
"type": "string",
"index_analyzer": "edgeNgram_autocomplete",
"search_analyzer": "full_name",
"analyzer":"full_name"
}
},
"type":"multi_field"
}
}
},
"entity": {
"_parent": {
"type": "document"
},
"properties": {
"bbox": {
"type": "long"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Add a space to hold all docs:
POST http://localhost:9200/docstore/space
{
"name": "Space 1"
}
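Documents and textboxes then go in as children, with an explicit parent (and, for the grandchild textbox, routing) parameter; roughly like this, with placeholder ids:
POST http://localhost:9200/docstore/document?parent=<spaceId>
{
  "name": "1-a1-1001.pdf"
}
POST http://localhost:9200/docstore/textbox?parent=<documentId>&routing=<spaceId>
{
  "bbox": [8750, 5362, 9291, 5445],
  "text": ["Sample Project"]
}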
When the user enters the word proj, this should return all of these text values:
SampleProject
Sample Project
Project Name
myProjectname
firstProjectName
my ProjectName
But it returns nothing.
My query:
POST http://localhost:9200/docstore/textbox/_search
{
"query": {
"match": {
"text": "proj"
}
},
"filter": {
"has_parent": {
"type": "document",
"query": {
"term": {
"name": "1-a1-1001.pdf"
}
}
}
}
}
If I search for project, I get:
{ "took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 3.0133555,
"hits": [
{
"_index": "docstore",
"_type": "textbox",
"_id": "AVRuV2d_f4y6IKuxK35g",
"_score": 3.0133555,
"_routing": "AVRuVvtLf4y6IKuxK33f",
"_parent": "AVRuV2cMf4y6IKuxK33g",
"_source": {
"bbox": [
8750,
5362,
9291,
5445
],
"text": [
"Sample Project"
]
}
},
{
"_index": "docstore",
"_type": "textbox",
"_id": "AVRuV2d_f4y6IKuxK35Y",
"_score": 2.4106843,
"_routing": "AVRuVvtLf4y6IKuxK33f",
"_parent": "AVRuV2cMf4y6IKuxK33g",
"_source": {
"bbox": [
8645,
5170,
9070,
5220
],
"text": [
"Project Name and Address"
]
}
}
]
}
}
Maybe my edgeNGram filter is not suited for this?
I am specifying:
"side": "front"
Should I do it differently?
Does anyone know what I am doing wrong?
The problem is the name of the analyzer setting on the autocomplete field.
Change:
"index_analyzer": "edgeNgram_autocomplete"
To:
"analyzer": "edgeNgram_autocomplete"
And also search as @Andrei Stefan showed in his answer:
POST http://localhost:9200/docstore/textbox/_search
{
"query": {
"match": {
"text.autocomplete": "proj"
}
}
}
And it will work as expected!
I have tested your configuration on Elasticsearch 2.3.
By the way, the multi_field type is deprecated.
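A rough equivalent with the current multi-field syntax is a plain string field with sub-fields, for example:
"text": {
  "type": "string",
  "analyzer": "full_name",
  "fields": {
    "autocomplete": {
      "type": "string",
      "analyzer": "edgeNgram_autocomplete",
      "search_analyzer": "full_name"
    }
  }
}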
Hope I have managed to help :)
Your query should actually try to match on text.autocomplete and not text:
"query": {
"match": {
"text.autocomplete": "proj"
}
}
I have the following Elasticsearch configuration:
PUT /my_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
},
"snow_filter" : {
"type" : "snowball",
"language" : "English"
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"snow_filter",
"autocomplete_filter"
]
}
}
}
}
}
PUT /my_index/_mapping/my_type
{
"my_type": {
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name": {
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "snowball"
},
"not": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "name": "Brown foxes" }
{ "index": { "_id": 2 }}
{ "name": "Yellow furballs" }
{ "index": { "_id": 3 }}
{ "name": "my discovery" }
{ "index": { "_id": 4 }}
{ "name": "myself is fun" }
{ "index": { "_id": 5 }}
{ "name": ["foxy", "foo"] }
{ "index": { "_id": 6 }}
{ "name": ["foo bar", "baz"] }
I am trying to get a search to return only item 6, which has a name of "foo bar", and I am not quite sure how. This is what I am doing right now:
GET /my_index/my_type/_search
{
"query": {
"match": {
"name": {
"query": "foo b"
}
}
}
}
I know it comes down to how the tokenizer splits the words, but I'm sort of lost on how to be both flexible and strict enough to match this. I am guessing I need a multi-field in my mapping of name, but I am not sure. How can I fix the query and/or my mapping to satisfy my needs?
You're already close. Since your edge_ngram analyzer generates tokens of a minimum length of 1, and your query gets tokenized into "foo" and "b", and the default match query operator is "or", your query matches each document that has a term starting with "b" (or "foo"), three of the docs.
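A quick way to see that tokenization is the _analyze API; a sketch (snowball is the field's search analyzer):
GET /my_index/_analyze?analyzer=snowball&text=foo+b
which should return the two tokens foo and b.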
Using the "and" operator seems to do what you want:
POST /my_index/my_type/_search
{
"query": {
"match": {
"name": {
"query": "foo b",
"operator": "and"
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.4451914,
"hits": [
{
"_index": "test_index",
"_type": "my_type",
"_id": "6",
"_score": 1.4451914,
"_source": {
"name": [
"foo bar",
"baz"
]
}
}
]
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/4f6fb7c1fdc6942023091ee1433d7490e04e7dea