Elasticsearch neglecting special characters - elasticsearch

Word search in Elasticsearch is working fine, but it seems to ignore all special characters. For example, I have the data "(123) apple" and "123 pear", but when I query "(123)", I expect "(123) apple" to appear first instead of "123 pear". I have tried changing the tokenizer from the standard tokenizer to the whitespace tokenizer, but it is still not working. Kindly advise. Thanks!
Data:
(123) apple
123 pear
Query: "(123)"
Expected:
(123) apple
123 pear
Actual result:
123 pear
(123) apple

I tried the whitespace tokenizer and it worked:
PUT /index25
{
"mappings": {
"properties": {
"message":{
"type": "text",
"analyzer": "my_analyzer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
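To see why switching tokenizers changes the results, here is a rough Python approximation of the two analysis chains. It is only a sketch: the real standard tokenizer follows Unicode text segmentation (UAX #29), which the `\w+` regex below merely approximates for this simple ASCII input, and `standard_like` / `whitespace_lowercase` are hypothetical helper names.

```python
import re

def standard_like(text):
    """Rough stand-in for standard tokenizer + lowercase: keeps runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def whitespace_lowercase(text):
    """Mimics the whitespace tokenizer + lowercase filter: splits on whitespace only."""
    return [t.lower() for t in text.split()]

for doc in ["(123) apple", "123 pear"]:
    print(doc, standard_like(doc), whitespace_lowercase(doc))

# Terms produced for the query "(123)" by each chain
print(standard_like("(123)"), whitespace_lowercase("(123)"))
```

With the standard-like chain both documents contain the term `123`, so the query "(123)" cannot tell them apart. With the whitespace chain the first document holds the term `(123)` while the second holds `123`, which matches the responses below: "(123)" finds only "(123) apple" and "123" finds only "123 pear".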
Data:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 1.0,
"_source" : {
"message" : "123 pear"
}
},
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 1.0,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "(123)"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cYC70m0BD5PlkoxX9-3n",
"_score" : 0.47000363,
"_source" : {
"message" : "(123) apple"
}
}
]
Query:
GET index25/_search
{
"query": {
"match": {
"message": "123"
}
}
}
Response:
[
{
"_index" : "index25",
"_type" : "_doc",
"_id" : "cIC70m0BD5PlkoxX1O0B",
"_score" : 0.9808292,
"_source" : {
"message" : "123 pear"
}
}
]

Related

Synonyms relevance issue in Elasticsearch

I am trying to configure synonyms in Elasticsearch and have done a sample configuration as well, but I am not getting the expected relevance when I search the data.
Below is the index mapping configuration:
PUT /test_index
{
"settings": {
"index": {
"analysis": {
"filter": {
"my_synonyms": {
"type": "synonym",
"synonyms": [
"mind, brain",
"brainstorm,brain storm"
]
}
},
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase"
]
},
"my_search_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_synonyms"
]
}
}
}
}
},
"mappings": {
"properties": {
"my_field": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Below is the sample data I have indexed:
POST test_index/_bulk
{ "index" : { "_id" : "1" } }
{"my_field": "This is a brainstorm" }
{ "index" : { "_id" : "2" } }
{"my_field": "A different brain storm" }
{ "index" : { "_id" : "3" } }
{"my_field": "About brainstorming" }
{ "index" : { "_id" : "4" } }
{"my_field": "I had a storm in my brain" }
{ "index" : { "_id" : "5" } }
{"my_field": "I envisaged something like that" }
Below is the query I am trying:
GET test_index/_search
{
"query": {
"match": {
"my_field": {
"query": "brainstorm",
"analyzer": "my_search_analyzer"
}
}
}
}
Current Result:
"hits" : [
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.8185701,
"_source" : {
"my_field" : "A different brain storm"
}
},
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.4100728,
"_source" : {
"my_field" : "I had a storm in my brain"
}
},
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.90928507,
"_source" : {
"my_field" : "This is a brainstorm"
}
}
]
I expect the document that matches the query exactly to be on top, and documents that match only through synonyms to come with a lower score.
So here I expect the document with the value "This is a brainstorm" to be at position one.
Could you please suggest how I can achieve this?
I have tried applying boosting and weighting as well, but with no luck.
Thanks in advance!
Elasticsearch "replaces" every instance of a synonym with all the other synonyms, and does so at both index and search time (unless you provide a separate search_analyzer), so you lose the exact token. To keep this information, use a subfield with the standard analyzer, then use a multi_match query to match either the synonyms or the exact value, and boost the exact field.
I got an answer on the Elastic Forum here and have copied it below for quick reference.
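A minimal sketch of that suggestion, written as Python dicts you could send as request bodies. Assumptions: the subfield name `exact` is hypothetical, the analysis settings from the question (`my_synonyms`, `my_search_analyzer`) already exist, and the boost of 3 is an arbitrary starting value to tune.

```python
import json

# Mapping: synonym-aware main field plus an exact (non-synonym) subfield.
mappings = {
    "properties": {
        "my_field": {
            "type": "text",
            "analyzer": "my_search_analyzer",  # synonyms applied when indexing and searching
            "fields": {
                # hypothetical subfield name; keeps the original, unexpanded tokens
                "exact": {"type": "text", "analyzer": "standard"}
            },
        }
    }
}

def boosted_query(text, exact_boost=3):
    """multi_match over both fields; most_fields adds the per-field scores,
    so a document that also matches the exact subfield ranks higher."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": ["my_field", f"my_field.exact^{exact_boost}"],
            }
        }
    }

print(json.dumps({"mappings": mappings}, indent=2))
print(json.dumps(boosted_query("brainstorm"), indent=2))
```

`most_fields` is chosen here because it sums the scores of the fields that match; the default `best_fields` would keep only the highest one, which weakens the effect of the exact-field boost.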
Hello there,
Since you are indexing synonyms into your inverted index, brain storm and brainstorm are all different tokens after the analyzer does its thing. So at query time Elasticsearch uses your analyzer to create the tokens brain, storm and brainstorm from your query, and multiple tokens match documents 2 and 4. Document 2 has fewer words, so tf/idf-style scoring ranks it higher of the two, while document 1 only matches brainstorm.
You can also see what your analyzer does to your input with this:
POST test_index/_analyze
{
"analyzer": "my_search_analyzer",
"text": "I had a storm in my brain"
}
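The explanation above can be checked with a toy model in Python. This is not Lucene's synonym filter: it only expands the query term with the single-word tokens of its synonym group (without chaining further expansions) and counts how many query terms each document contains, using a regex as a rough stand-in for the standard tokenizer. `expand_query` and `tokens` are hypothetical helpers.

```python
import re

# The question's synonym rules, as groups of equivalent phrases
SYNONYMS = [{"mind", "brain"}, {"brainstorm", "brain storm"}]

def expand_query(term):
    """Return the query term plus the words of every phrase in its synonym group."""
    expanded = {term}
    for group in SYNONYMS:
        if term in group:
            for phrase in group:
                expanded.update(phrase.split())
    return expanded

def tokens(text):
    # rough stand-in for my_analyzer (standard tokenizer + lowercase) at index time
    return set(re.findall(r"\w+", text.lower()))

docs = {
    "1": "This is a brainstorm",
    "2": "A different brain storm",
    "3": "About brainstorming",
    "4": "I had a storm in my brain",
}

query_terms = expand_query("brainstorm")
matches = {doc_id: len(query_terms & tokens(text)) for doc_id, text in docs.items()}
print(sorted(query_terms))
print(matches)
```

Documents 2 and 4 each contain two of the expanded terms (brain, storm) while document 1 contains only one (brainstorm), which is why they outrank it; document 3 matches nothing because "brainstorming" is a different token.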
After some experimenting: you should change your index analyzer to my_analyzer:
PUT /test_index
{
"settings": {
"index": {
"analysis": {
"filter": {
"my_synonyms": {
"type": "synonym",
"synonyms": [
"mind, brain",
"brainstorm,brain storm"
]
}
},
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase"
]
},
"my_search_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_synonyms"
]
}
}
}
}
},
"mappings": {
"properties": {
"my_field": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Then you want to boost your exact matches while still getting hits from the my_search_analyzer tokens, so I have changed your query a bit:
GET test_index/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"my_field": {
"query": "brainstorm",
"analyzer": "my_search_analyzer"
}
}
},
{
"match_phrase": {
"my_field": {
"query": "brainstorm"
}
}
}
]
}
}
}
result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 2.3491273,
"hits" : [
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 2.3491273,
"_source" : {
"my_field" : "This is a brainstorm"
}
},
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.8185701,
"_source" : {
"my_field" : "A different brain storm"
}
}
]
}
}
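Because a bool query adds up the scores of the should clauses that match, the new ranking can be checked with a little arithmetic. This sketch assumes the synonym clause still scores document 1 at 0.90928507, as in the original query (document 2's unchanged 1.8185701 suggests the term statistics are the same); under that assumption the difference is what the match_phrase clause contributes.

```python
# Scores copied from the two responses above.
synonym_clause = {"1": 0.90928507, "2": 1.8185701}  # original query: synonym match only
bool_total = {"1": 2.3491273, "2": 1.8185701}       # bool query with both should clauses

# bool sums the matching should clauses, so the difference is the match_phrase contribution
phrase_clause = {doc: round(bool_total[doc] - synonym_clause[doc], 6) for doc in bool_total}
print(phrase_clause)
```

Document 1 gains about 1.44 from the exact phrase clause, while document 2 gains nothing, which is enough to move "This is a brainstorm" to the top.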

Why can't I get Elasticsearch's completion suggester to sort based on a field?

I'm trying to get autocomplete suggestions from Elasticsearch, but sorted by an internal popularity score that I supply in the data, so that the most popular ones show at the top. My POST looks like this:
curl "http://penguin:9200/node/_search?pretty" --silent --show-error \
--header "Content-Type: application/json" \
-X POST \
-d '
{
"_source" : [
"name",
"popular_score"
],
"sort" : [ "popular_score" ],
"suggest" : {
"my_suggestion" : {
"completion" : {
"field" : "searchbar_suggest",
"size" : 10,
"skip_duplicates" : true
},
"text" : "f"
}
}
}
'
I get back valid autocomplete suggestions, but they aren't sorted by the popular_score field:
{
...
"suggest" : {
"my_suggestion" : [
{
"text" : "f",
"offset" : 0,
"length" : 1,
"options" : [
{
"text" : "2020 Fact Longlist",
"_index" : "node",
"_type" : "_doc",
"_id" : "245105",
"_score" : 1.0,
"_source" : {
"popular_score" : "35",
"name" : "2020 Fact Longlist"
}
},
{
"text" : "Fable",
"_index" : "node",
"_type" : "_doc",
"_id" : "125903",
"_score" : 1.0,
"_source" : {
"popular_score" : "69.33333333333333333333333333333333333333",
"name" : "Fable"
}
},
{
"text" : "Fables",
"_index" : "node",
"_type" : "_doc",
"_id" : "172986",
"_score" : 1.0,
"_source" : {
"popular_score" : "24",
"name" : "Fables"
}
}
...
]
}
]
}
}
My mappings are:
{
"mappings": {
"properties": {
"nodeid": {
"type": "integer"
},
"name": {
"type": "text",
"copy_to": "searchbar_suggest"
},
"popular_score": {
"type": "float"
},
"searchbar_suggest": {
"type": "completion"
}
}
}
}
What am I doing wrong?
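For context: the completion suggester ranks its options by the weight stored with each input (every option above has `_score` 1.0, the default), and the request-level `sort` applies to search hits, not to suggestions. A minimal sketch of indexing the popularity as that weight, assuming you drop `copy_to` from `name` and populate `searchbar_suggest` yourself; `suggest_doc` is a hypothetical helper, and rounding the float score to an integer is an assumption about acceptable precision.

```python
import json

def suggest_doc(nodeid, name, popular_score):
    """Build a document whose completion input carries an integer weight.

    popular_score may arrive as a string, as in the _source above.
    """
    weight = round(float(popular_score))  # completion weights are integers
    return {
        "nodeid": nodeid,
        "name": name,
        "popular_score": float(popular_score),
        "searchbar_suggest": {"input": [name], "weight": weight},
    }

doc = suggest_doc(125903, "Fable", "69.33333333333333333333333333333333333333")
print(json.dumps(doc, indent=2))
```

If two decimals of the score matter for ranking, one option is to multiply by 100 before rounding, as long as the result stays within the integer range.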

filter with special character in ElasticSearch 6.0.0

I am trying to filter all data that contains special characters like '#', '.', '/', etc., but I have not been able to succeed.
I want to fetch the cities that contain # or a dot (.), so I need a query whose output contains only the documents with the special character.
I am quite new to Elasticsearch queries, so please help me.
Thanks
Below are the documents in the index:
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [
{
"_index" : "student",
"_type" : "data",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "Mirja",
"city" : "pune # bandra",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Rohan",
"city" : "BBSR /. patia",
"contact number" : 9723124343
}
},
{
"_index" : "student",
"_type" : "data",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Diya",
"city" : "pune_bandra",
"contact number" : 9723124343
}
}
]
}
You need to check the analyzer on your city field. If it is the standard analyzer, it removes special characters when creating tokens. Instead, use the mapping below on the city field and search with a regular match query:
PUT test_index
{
"mappings": {
"data": {
"properties": {
"city": {
"type": "text",
"analyzer": "custom_analyzer"
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "whitespace"
}
}
}
}
}
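A rough Python sketch of what a match query can find once `city` uses this whitespace-plus-lowercase analyzer. It is an approximation (the real analysis runs inside Elasticsearch), and `whitespace_tokens` / `match_ids` are hypothetical helpers. It also shows a caveat: match queries compare whole tokens, so "." alone does not find "BBSR /. patia", whose token is "/.".

```python
def whitespace_tokens(text):
    # approximates custom_analyzer: whitespace tokenizer + lowercase filter
    return [t.lower() for t in text.split()]

# city values from the documents above, keyed by _id
cities = {"2": "pune # bandra", "4": "BBSR /. patia", "1": "pune_bandra"}

def match_ids(query):
    """Return ids whose city shares at least one token with the query (OR semantics)."""
    q = set(whitespace_tokens(query))
    return sorted(doc_id for doc_id, city in cities.items() if q & set(whitespace_tokens(city)))

print(match_ids("#"))
print(match_ids("."))
print(match_ids("/."))
```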

Elasticsearch Highlights document randomly

When I run this search:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query":"Rose wa",
"fuzziness": "AUTO"
}
}
}
]
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
Then the following documents are returned:
{
"_index" : "product",
"_type" : "_doc",
"_id" : "52486",
"_score" : 19.770897,
"_source" : {
"category_code" : "personalcare",
"name" : "Nivea Rosewater Face Wash (100 Ml)",
"category_name" : "Personal Care "
}
},
{
"_index" : "product",
"_type" : "_doc",
"_id" : "120830",
"_score" : 17.775726,
"_source" : {
"category_code" : "beverages",
"name" : "3 Roses 500G",
"category_name" : "Beverages"
},
"highlight" : {
"name" : [
"3 <em>Roses</em> 500G"
]
}
},
Why do some documents have highlighted fields while others don't?
And how do I ensure that a highlight is always present for matched documents?

Returning all documents when query string is empty

Say I have the following mapping:
{
'properties': {
'title': {'type': 'text'},
'created': {'type': 'text'}
}
}
Sometimes the user will query by created, and sometimes by title and created. In both cases I want the query JSON to be as similar as possible. What's a good way to create a query that filters only by created when the user is not using the title to query?
I tried something like:
{
bool: {
must: [
{range: {created: {gte: '2010-01-01'}}},
{query: {match_all: {}}}
]
}
}
But that didn't work. What would be the best way of writing this query?
Your query didn't work because created is of type text, not date, and range queries on string dates will not work as expected. You should change the type in your mappings from text to date and reindex your data.
Follow this to reindex your data (with the new mappings) step by step.
Now, if I understand correctly, you want a generic query that filters on title and/or created depending on the user input.
In this case, my suggestion is to use a query_string query.
An example (version 7.4.x):
Mappings (note that created is now of type date instead of text)
PUT my_index
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"created": {
"type": "date"
}
}
}
}
Index a few documents
PUT my_index/_doc/1
{
"title":"test1",
"created": "2010-01-01"
}
PUT my_index/_doc/2
{
"title":"test2",
"created": "2010-02-01"
}
PUT my_index/_doc/3
{
"title":"test3",
"created": "2010-03-01"
}
Search Query (created)
GET my_index/_search
{
"query": {
"query_string": {
"query": "created:>=2010-02-01",
"fields" : ["created"]
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "test2",
"created" : "2010-02-01"
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "test3",
"created" : "2010-03-01"
}
}]
Search Query (title)
GET my_index/_search
{
"query": {
"query_string": {
"query": "test2",
"fields" : ["title"]
}
}
}
Results
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9808292,
"_source" : {
"title" : "test2",
"created" : "2010-02-01"
}
}
]
Search Query (title and created)
GET my_index/_search
{
"query": {
"query_string": {
"query": "(created:>=2010-02-01) AND test3"
}
}
}
Results
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808292,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.9808292,
"_source" : {
"title" : "test3",
"created" : "2010-03-01"
}
}
]
The fields parameter of query_string can list both fields. If you omit fields, the query is applied to all fields in your mappings.
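To keep the request as similar as possible whatever the user fills in, the query_string text can be assembled from whichever inputs are present. A minimal sketch, assuming a hypothetical `build_query` helper; unlike the last example above, it scopes the title term with `title:(...)`, and it inserts user text verbatim, so reserved query_string characters in real input would need escaping.

```python
def build_query(title=None, created_gte=None):
    """Build one request-body shape whether or not the user supplied a title."""
    clauses = []
    if created_gte:
        clauses.append(f"created:>={created_gte}")
    if title:
        clauses.append(f"title:({title})")
    if not clauses:
        # nothing to filter on: return every document
        return {"query": {"match_all": {}}}
    return {"query": {"query_string": {"query": " AND ".join(clauses)}}}

print(build_query(created_gte="2010-02-01"))
print(build_query(title="test3", created_gte="2010-02-01"))
print(build_query())
```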
Hope this helps
