I want to pass a list of emails in an Elasticsearch query, so I tried the query below, but it returned no results.
{
"query": {
"terms": {
"email": [ "andrew#gmail.com", "michel#gmail.com" ]
}
}
}
When I used id instead of email, it worked!
{
"query": {
"terms": {
"id": [ 43, 67 ]
}
}
}
Could you please explain what's wrong with my email query and how to make it work?
If you want email addresses to be recognized as single tokens, you should use the uax_url_email tokenizer.
UAX URL Email Tokenizer
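To see why the original terms query finds nothing, it can help to model what ends up in the index. The sketch below is a rough Python approximation, not Elasticsearch code. It assumes the email field is a text field analyzed with the default standard analyzer; `standard_like_tokens` is a hypothetical stand-in that only mimics how the address is broken at the separator between user and domain. A terms query does not analyze its values, so each one must equal an indexed term exactly.

```python
import re

def standard_like_tokens(text):
    """Crude stand-in for the standard analyzer: lowercase, then keep runs of
    letters/digits, allowing dots inside a run (so "gmail.com" stays whole)."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def single_token(text):
    """Stand-in for an index that stores the whole value as one term."""
    return [text]

def terms_lookup(indexed_terms, values):
    """A terms query does no analysis: each value must equal an indexed term."""
    return [v for v in values if v in indexed_terms]

address = "andrew#gmail.com"
split_index = set(standard_like_tokens(address))
whole_index = set(single_token(address))

print(sorted(split_index))                     # the address was split in two
print(terms_lookup(split_index, [address]))    # nothing equals the full address
print(terms_lookup(whole_index, [address]))    # the full address is a term
```

Under these assumptions the numeric id query works because a number is indexed as a single value, while the address is not.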
A working example:
Mappings
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_email_analyzer": {
"type": "custom",
"tokenizer": "my_tokenizer",
"filter": ["lowercase", "stop"]
}
},
"tokenizer": {
"my_tokenizer":{
"type": "uax_url_email"
}
}
}
},
"mappings": {
"properties": {
"email": {
"type": "text",
"analyzer": "my_email_analyzer",
"search_analyzer": "my_email_analyzer",
"fields": {
"keyword":{
"type":"keyword"
}
}
}
}
}
}
Index a few documents
POST my_index/_doc/1
{
"email":"andrew#gmail.com"
}
POST my_index/_doc/2
{
"email":"michel#gmail.com"
}
Search Query
GET my_index/_search
{
"query": {
"multi_match": {
"query": "andrew#gmail.com michel#gmail.com",
"fields": ["email"]
}
}
}
Results
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.6931472,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"email" : "andrew#gmail.com"
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6931472,
"_source" : {
"email" : "michel#gmail.com"
}
}
]
}
Another option is to query the keyword sub-field (email.keyword), which stores each address as a single unanalyzed term.
Search Query
GET my_index/_search
{
"query": {
"terms": {
"email.keyword": [
"andrew#gmail.com",
"michel#gmail.com"
]
}
}
}
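One practical difference between the two options is case handling. The analyzer defined above ends with a lowercase filter, while a keyword field stores the value exactly as it was sent (assuming no normalizer is configured on it). The following Python model is only an illustration of that difference; both function names are hypothetical.

```python
def keyword_terms_match(stored_value, query_values):
    """keyword field without a normalizer: exact, case-sensitive comparison."""
    return stored_value in query_values

def lowercased_match(stored_value, query_values):
    """Model of an analysis chain with a lowercase filter applied to both the
    indexed value and the query text."""
    return stored_value.lower() in {v.lower() for v in query_values}

stored = "andrew#gmail.com"
print(keyword_terms_match(stored, ["andrew#gmail.com"]))   # same case
print(keyword_terms_match(stored, ["Andrew#Gmail.com"]))   # different case
print(lowercased_match(stored, ["Andrew#Gmail.com"]))      # case is normalized
```

If users may type addresses in mixed case, the keyword route needs the values normalized before indexing and querying.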
In my opinion using the uax_url_email tokenizer is a better solution.
Hope this helps
I would like to apply an analyzer that satisfies the search below. Let's take an example.
I have stored sentences like the following as specializations in OpenSearch:
Cardiologist Doctor.
Cardiac surgeon.
neuro surgeon.
cardiac specialist.
nursing care
Anatomy.
Anaesthesiology.
So, if I search for cardiac surgeon, the result should be ['cardiologist', 'cardiac surgeon', 'cardiac specialist'] and it should not return 'neuro surgeon' or 'nursing care'.
Also, if I search for anatomy, the result should be ['anatomy'] and it should not return Anaesthesiology.
I have tried an ngram_filter, but when I search for cardiologist it returns both cardiologist and nursing care instead of cardiologist only.
"ngram_filter": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 15
},
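A likely reason for the extra hit can be shown with a small Python model. It assumes the edge_ngram filter is applied both at index time and at search time (that is, no separate search_analyzer is set); the `analyze` helper is a hypothetical simplification that lowercases, splits on whitespace and expands each word into prefixes of 3 to 15 characters.

```python
def edge_ngrams(token, min_gram=3, max_gram=15):
    """Prefixes of `token` that are min_gram..max_gram characters long."""
    upper = min(max_gram, len(token))
    return {token[:n] for n in range(min_gram, upper + 1)}

def analyze(text):
    """Lowercase, split on whitespace, expand every word into edge n-grams."""
    grams = set()
    for word in text.lower().split():
        grams |= edge_ngrams(word.strip("."))
    return grams

query_grams = analyze("cardiologist")
for doc in ["Cardiologist Doctor.", "nursing care", "neuro surgeon."]:
    shared = sorted(query_grams & analyze(doc))
    print(doc, "->", shared)
```

In this model "nursing care" shares the gram "car" with the query, because "care" and "cardiologist" start with the same three letters. Analyzing the query without the n-gram filter would avoid that overlap, at the cost of not matching related words such as cardiac.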
My suggestion is to use synonyms:
PUT synonyms
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonyms_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonyms_filter"
]
}
},
"filter": {
"synonyms_filter": {
"type": "synonym",
"synonyms": [
"cardiac surgeon, cardiologist, cardiac surgeon, cardiac specialist"
]
}
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"search_analyzer": "synonyms_analyzer"
}
}
}
}
POST _bulk
{ "index" : { "_index" : "synonyms", "_id" : "1"}}
{ "name" : "Cardiac surgeon" }
{ "index" : { "_index" : "synonyms", "_id" : "2"}}
{ "name" : "Cardiologist Doctor" }
{ "index" : { "_index" : "synonyms", "_id" : "3"}}
{ "name" : "neuro surgeon" }
{ "index" : { "_index" : "synonyms", "_id" : "4"}}
{ "name" : "cardiac specialist" }
{ "index" : { "_index" : "synonyms", "_id" : "5"}}
{ "name" : "nursing care" }
{ "index" : { "_index" : "synonyms", "_id" : "6"}}
{ "name" : "Anatomy" }
{ "index" : { "_index" : "synonyms", "_id" : "7"}}
{ "name" : "Anaesthesiology" }
GET synonyms/_search
{
"query": {
"match": {
"name": "cardiac surgeon"
}
}
}
Hits:
"hits": [
{
"_index": "synonyms",
"_id": "1",
"_score": 13.066887,
"_source": {
"name": "Cardiac surgeon"
}
},
{
"_index": "synonyms",
"_id": "4",
"_score": 7.9681025,
"_source": {
"name": "cardiac specialist"
}
},
{
"_index": "synonyms",
"_id": "2",
"_score": 1.567127,
"_source": {
"name": "Cardiologist Doctor"
}
}
]
I used Elasticsearch a few years ago (version 6.4.0), and at that time it was not possible to highlight a "copy_to" field. I would like to know whether this is possible now.
Yes, highlighting can be enabled on a copy_to field in the latest version.
Please check the example below, which I have tried.
Index Mapping:
PUT my-index-000001
{
"mappings": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
Index a document:
PUT my-index-000001/_doc/1
{
"first_name": "John",
"last_name": "Smith"
}
Query:
GET my-index-000001/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
},
"highlight": {
"fields": {
"full_name": {}
}
}
}
Result:
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "John",
"last_name" : "Smith"
},
"highlight" : {
"full_name" : [
"<em>Smith</em>",
"<em>John</em>"
]
}
}
]
Update 1: Search using the copy_to field and highlight the match in a particular field
In the example below, the search runs on the full_name field, which is the copy_to target, and the highlight is applied to the first_name field.
Query:
GET my-index-000001/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
},
"highlight": {
"require_field_match": "false",
"fields": {
"first_name": {}
}
}
}
Result:
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "John",
"last_name" : "Smith"
},
"highlight" : {
"first_name" : [
"<em>John</em>"
]
}
}
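The behaviour in both results can be pictured with a toy Python model. This is only a sketch under simplifying assumptions, not how Elasticsearch implements it: `index_document` and `highlight` are hypothetical helpers, and the highlighter here only wraps whole values that equal a query term. It shows that copy_to builds an extra searchable field while _source stays unchanged, and that with require_field_match disabled the query terms can be highlighted in a field other than the one that was searched.

```python
def index_document(source, copy_to):
    """Build the searchable fields: every source field, plus copy targets that
    collect values from the fields pointing at them. `source` is not modified."""
    fields = {name: [value] for name, value in source.items()}
    for name, target in copy_to.items():
        if name in source:
            fields.setdefault(target, []).append(source[name])
    return fields

def highlight(field_values, query_terms):
    """Wrap each value that equals a query term (case-insensitive) in <em> tags."""
    wanted = {term.lower() for term in query_terms}
    return [f"<em>{value}</em>" for value in field_values if value.lower() in wanted]

source = {"first_name": "John", "last_name": "Smith"}
fields = index_document(source, {"first_name": "full_name", "last_name": "full_name"})
terms = "John Smith".split()

print(fields["full_name"])                     # values collected by copy_to
print(highlight(fields["first_name"], terms))  # highlight on a different field
print("full_name" in source)                   # the copy target is not in _source
```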
I have an index set up like so:
PUT items
{
"settings": {
"index": {
"sort.field": ["popularity", "title_keyword"],
"sort.order": ["desc", "asc"]
},
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "autocomplete",
"filter": [
"lowercase"
]
},
"autocomplete_search": {
"tokenizer": "lowercase"
}
},
"tokenizer": {
"autocomplete": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 15,
"token_chars": [
"letter"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
},
"title_keyword": {
"type": "keyword"
},
"popularity": {
"type": "integer"
},
"visibility": {
"type": "keyword"
}
}
}
}
With the following data:
POST items/_doc/1
{
"title": "The Arbor",
"popularity": 5,
"title_keyword": "The Arbor",
"visibility": "public"
}
POST items/_doc/2
{
"title": "The Canon",
"popularity": 10,
"title_keyword": "The Canon",
"visibility": "public"
}
POST items/_doc/3
{
"title": "The Brew",
"popularity": 15,
"title_keyword": "The Brew",
"visibility": "public"
}
I run this query on the data:
GET items/_search
{
"size": 3,
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "the",
"operator": "and"
}
}
},
{
"match": {
"visibility": "public"
}
}
]
}
},
"highlight": {
"pre_tags": ["<mark>"],
"post_tags": ["</mark>"],
"fields": {
"title": {}
}
}
}
It seems to match the records correctly on the word "the", but the sorting does not seem to work. I would expect the results to be sorted by popularity as defined, returning The Arbor, The Brew, The Canon in that order, but the results I get are as follows:
{
"took" : 11,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.27381438,
"hits" : [
{
"_index" : "items",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.27381438,
"_source" : {
"title" : "The Brew",
"popularity" : 15,
"title_keyword" : "The Brew",
"visibility" : "public"
},
"highlight" : {
"title" : [
"<mark>The</mark> Brew"
]
}
},
{
"_index" : "items",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.26392496,
"_source" : {
"title" : "The Arbor",
"popularity" : 5,
"title_keyword" : "The Arbor",
"visibility" : "public"
},
"highlight" : {
"title" : [
"<mark>The</mark> Arbor"
]
}
},
{
"_index" : "items",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.26392496,
"_source" : {
"title" : "The Canon",
"popularity" : 10,
"title_keyword" : "The Canon",
"visibility" : "public"
},
"highlight" : {
"title" : [
"<mark>The</mark> Canon"
]
}
}
]
}
}
Does defining the sort fields and orders when creating the index, under the settings, automatically sort the results? It seems to be sorting by score and not the popularity. If I include the sort options in the query it gives me the correct results back:
GET items/_search
{
"size": 3,
"sort": [
{
"popularity": {
"order": "desc"
}
},
{
"title_keyword": {
"order": "asc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "the",
"operator": "and"
}
}
},
{
"match": {
"visibility": "public"
}
}
]
}
},
"highlight": {
"pre_tags": ["<mark>"],
"post_tags": ["</mark>"],
"fields": {
"title": {}
}
}
}
I read that including the sort in the query like this is inefficient and that it should be defined in the settings instead. Am I missing something when creating the index to make it sort by popularity by default? Does including the sort options in the query result in inefficient queries? Or do I actually need to include it in every query?
Hopefully this makes sense! Thanks
Index sorting defines how the segments inside each shard are sorted; it is not related to the ordering of search results, so you still need to specify the sort in the query. A sorted index is useful if you often run searches sorted by the same criteria, because the index sort can then speed up those searches.
If your search has a different sort than the index or no sort at all, the index sort is not relevant.
Please see the documentation for index sorting and especially the part that explains how index sorting is used.
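The difference can be illustrated with the three hits from the question. The Python sketch below is not Elasticsearch code; it just reorders the returned titles, scores and popularity values to show what each request asks for. Ties are left in their listed order here, which is an assumption of the sketch.

```python
hits = [
    {"title": "The Brew", "popularity": 15, "score": 0.27381438},
    {"title": "The Arbor", "popularity": 5, "score": 0.26392496},
    {"title": "The Canon", "popularity": 10, "score": 0.26392496},
]

# No "sort" in the request: hits come back by descending relevance score,
# regardless of the index sort settings.
by_score = sorted(hits, key=lambda h: -h["score"])

# Explicit "sort" in the request: popularity desc, then title_keyword asc.
by_popularity = sorted(hits, key=lambda h: (-h["popularity"], h["title"]))

print([h["title"] for h in by_score])
print([h["title"] for h in by_popularity])
```

With popularity descending, the explicit sort yields The Brew, The Canon, The Arbor.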
I'm having trouble setting up a search_as_you_type field with highlighting following the guide here https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-as-you-type.html
I'll leave a series of commands to reproduce what I'm seeing. Hopefully somebody can weigh in on what I'm missing :)
create mapping
PUT /test_index
{
"mappings": {
"properties": {
"plain_text": {
"type": "search_as_you_type",
"index_options": "offsets",
"term_vector": "with_positions_offsets"
}
}
}
}
insert document
POST /test_index/_doc
{
"plain_text": "This is some random text"
}
search for document
GET /test_index/_search
{
"query": {
"multi_match": {
"query": "rand",
"type": "bool_prefix",
"fields": [
"plain_text",
"plain_text._2gram",
"plain_text._3gram",
"plain_text._index_prefix"
]
}
},
"highlight" : {
"fields" : [
{
"plain_text": {
"number_of_fragments": 1,
"no_match_size": 100
}
}
]
}
}
response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test_index",
"_type" : "_doc",
"_id" : "rLZkjm8BDC17cLikXRbY",
"_score" : 1.0,
"_source" : {
"plain_text" : "This is some random text"
},
"highlight" : {
"plain_text" : [
"This is some random text"
]
}
}
]
}
}
The response I get back does not have the highlighting I expect.
Ideally the highlight would be: This is some <em>rand</em>om text
In order to achieve highlighting of n-grams (chars) you'll need:
a custom ngram tokenizer. By default the maximum allowed difference between min_gram and max_gram is 1, so in my example highlighting works only for search terms of length 3 or 4. You can change this and create more n-grams by setting a higher value for index.max_ngram_diff.
a custom analyzer based on the custom tokenizer
a "plain_text.ngrams" sub-field in the mapping that uses this analyzer
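Why only 3- and 4-character terms can be highlighted with min_gram 3 and max_gram 4 is easy to see in a small Python model. It is a sketch under simplifying assumptions: the helpers are hypothetical, grams are taken over the whole lowercased text (spaces included), and the query is treated as a single term that must equal an indexed gram, as it would be with a standard search analyzer.

```python
def ngrams_with_offsets(text, min_gram=3, max_gram=4):
    """Every lowercase substring of length min_gram..max_gram with its offsets."""
    lowered = text.lower()
    for start in range(len(lowered)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end <= len(lowered):
                yield lowered[start:end], start, end

def highlight(text, query):
    """Wrap the first n-gram equal to the lowercased query in <em> tags."""
    for gram, start, end in ngrams_with_offsets(text):
        if gram == query.lower():
            return f"{text[:start]}<em>{text[start:end]}</em>{text[end:]}"
    return None  # no indexed gram equals the query, so nothing is highlighted

text = "This is some random text"
print(highlight(text, "rand"))   # 4 characters: an indexed gram
print(highlight(text, "ran"))    # 3 characters: an indexed gram
print(highlight(text, "rando"))  # 5 characters: longer than max_gram
```

The offsets stored with each gram are what allow the highlighter to mark part of a word rather than the whole word.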
Here's the configuration:
{
"settings": {
"analysis": {
"analyzer": {
"partial_words" : {
"type": "custom",
"tokenizer": "ngrams",
"filter": ["lowercase"]
}
},
"tokenizer": {
"ngrams": {
"type": "ngram",
"min_gram": 3,
"max_gram": 4
}
}
}
},
"mappings": {
"properties": {
"plain_text": {
"type": "text",
"fields": {
"shingles": {
"type": "search_as_you_type"
},
"ngrams": {
"type": "text",
"analyzer": "partial_words",
"search_analyzer": "standard",
"term_vector": "with_positions_offsets"
}
}
}
}
}
}
the query:
{
"query": {
"multi_match": {
"query": "rand",
"type": "bool_prefix",
"fields": [
"plain_text.shingles",
"plain_text.shingles._2gram",
"plain_text.shingles._3gram",
"plain_text.shingles._index_prefix",
"plain_text.ngrams"
]
}
},
"highlight" : {
"fields" : [
{
"plain_text.ngrams": { }
}
]
}
}
and the result:
"hits": [
{
"_index": "test_index",
"_type": "_doc",
"_id": "FkHLVHABd_SGa-E-2FKI",
"_score": 2,
"_source": {
"plain_text": "This is some random text"
},
"highlight": {
"plain_text.ngrams": [
"This is some <em>rand</em>om text"
]
}
}
]
Note: in some cases, this config might be expensive for memory usage and storage.
I have Elasticsearch version 7.3 and two indexes, profiles and purchases.
Here are their mappings:
\purchases
{
"purchases": {
"mappings": {
"properties": {
"product": {
"type": "keyword"
},
"profile": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"profiles": "purchases"
}
}
}
}
}
}
\profiles
{
"profiles": {
"mappings": {
"properties": {
"user": {
"type": "keyword"
}
}
}
}
}
I added one profile with user:abc, _id:1 and two purchases this way:
{
"profile": {"name": "profiles", "parent": "1"},
"product" : "tomato"
}
{
"profile": {"name": "profiles", "parent": "1"},
"product" : "tomato 2"
}
Then I run a search query on purchases:
{
"query": {
"has_parent": {
"parent_type": "profiles",
"query": {
"query_string": {
"query": "user:abc"
}
}
}
}
}
I get an empty result. What is wrong?
As stated in the documentation of the join datatype, you cannot create parent-child relationships across multiple indices:
The join datatype is a special field that creates parent/child relation within documents of the same index.
If you would like to use the join datatype, you have to model it in one index.
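The effect of that restriction can be sketched in Python with plain lists standing in for indices. This is a toy model, not the real join implementation: `has_parent` is a hypothetical helper that, like the query, only looks for parents among the documents of the index it is run against.

```python
def has_parent(index_docs, parent_type, parent_filter):
    """Return child documents whose parent, found in the SAME document list,
    has the given relation name and satisfies parent_filter."""
    parent_ids = {
        doc["_id"] for doc in index_docs
        if doc.get("profile") == parent_type and parent_filter(doc)
    }
    return [
        doc for doc in index_docs
        if isinstance(doc.get("profile"), dict)
        and doc["profile"]["parent"] in parent_ids
    ]

profiles = [{"_id": "1", "user": "abc", "profile": "profiles"}]
purchases = [
    {"_id": "2", "product": "tomato", "profile": {"name": "purchases", "parent": "1"}},
    {"_id": "3", "product": "tomato 2", "profile": {"name": "purchases", "parent": "1"}},
]

def is_abc(doc):
    return doc.get("user") == "abc"

print(len(has_parent(purchases, "profiles", is_abc)))             # separate indices
print(len(has_parent(profiles + purchases, "profiles", is_abc)))  # one shared index
```

Searching the purchases list alone finds no parent to match, while the combined list returns both purchases.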
UPDATE
This is what your mapping and the indexing of the documents would look like:
PUT profiles-purchases-index
{
"mappings": {
"properties": {
"user":{
"type": "keyword"
},
"product":{
"type": "keyword"
},
"profile":{
"type": "join",
"relations":{
"profiles": "purchases"
}
}
}
}
}
Index parent document:
PUT profiles-purchases-index/_doc/1
{
"user": "abc",
"profile": "profiles"
}
Index child documents:
PUT profiles-purchases-index/_doc/2?routing=1
{
"product": "tomato",
"profile":{
"name": "purchases",
"parent": 1
}
}
PUT profiles-purchases-index/_doc/3?routing=1
{
"product": "tomato 2",
"profile":{
"name": "purchases",
"parent": 1
}
}
Run Query:
GET profiles-purchases-index/_search
{
"query": {
"has_parent": {
"parent_type": "profiles",
"query": {
"match": {
"user": "abc"
}
}
}
}
}
Response:
{
...
"hits" : [
{
"_index" : "profiles-purchases-index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"product" : "tomato",
"profile" : {
"name" : "purchases",
"parent" : 1
}
}
},
{
"_index" : "profiles-purchases-index",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"product" : "tomato 2",
"profile" : {
"name" : "purchases",
"parent" : 1
}
}
}
]
}
}
Notice that you have to set the routing parameter when indexing the child documents. Please refer to the documentation for the details.
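The reason for the routing parameter is that a child has to be stored on the same shard as its parent. The Python sketch below illustrates the idea only: `NUM_SHARDS` is a hypothetical shard count, and crc32 is a stand-in for the hash function Elasticsearch actually applies to the routing value.

```python
import zlib

NUM_SHARDS = 5  # hypothetical shard count, for illustration only

def shard_for(routing):
    """Pick a shard from the routing value. crc32 is only a stand-in for the
    hash function that Elasticsearch uses internally."""
    return zlib.crc32(str(routing).encode("utf-8")) % NUM_SHARDS

parent = {"_id": "1"}  # with no explicit routing, a document is routed by its _id
children = [
    {"_id": "2", "routing": "1"},  # ?routing=1, as in the requests above
    {"_id": "3", "routing": "1"},
]

parent_shard = shard_for(parent["_id"])
same_shard = all(shard_for(child["routing"]) == parent_shard for child in children)
print(same_shard)
```

Because the children are routed with the parent's id, they always hash to the parent's shard, whatever the shard count is.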