Using Email tokenizer in Elasticsearch

I tried some examples from the Elasticsearch documentation and from Google, but nothing helped me figure this out.
My sample data is just a few blog posts. I am trying to find all posts with a given email address. When I use "email":"someone" I see all the posts matching someone, but when I change it to someone@gmail.com nothing shows up!
"hits": [
{
"_index": "blog",
"_type": "post",
"_id": "2",
"_score": 1,
"_source": {
"user": "sreenath",
"email": "someone#gmail.com",
"postDate": "2011-12-12",
"body": "Trying to figure out this",
"title": "Elastic search testing"
}
}
]
When I use the GET query shown below, I see all posts matching someone@anything.com. But I want to change this:
{ "term" : { "email" : "someone" }} to { "term" : { "email" : "someone@gmail.com" }}
GET blog/post/_search
{
"query" : {
"filtered" : {
"filter" : {
"and" : [
{ "term" :
{ "email" : "someone" }
}
]
}
}
}
}
I ran the following curl -XPUT, but it did not help:
curl -XPUT localhost:9200/test/ -d '
{
"settings" : {
"analysis" : {
"filter" : {
"email" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"([^#]+)",
"(\\p{L}+)",
"(\\d+)",
"#(.+)"
]
}
},
"analyzer" : {
"email" : {
"tokenizer" : "uax_url_email",
"filter" : [ "email", "lowercase", "unique" ]
}
}
}
}
}
'

You have created a custom analyzer for email addresses, but you are not using it. You need to declare the email field in your mapping type to actually use that analyzer, as shown below. Also make sure to create the right index with that analyzer, i.e. blog and not test:
curl -XPUT localhost:9200/blog/ -d '{
"settings" : {
"analysis" : {
"filter" : {
"email" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"([^#]+)",
"(\\p{L}+)",
"(\\d+)",
"#(.+)"
]
}
},
"analyzer" : {
"email" : {
"tokenizer" : "uax_url_email",
"filter" : [ "email", "lowercase", "unique" ]
}
}
}
},
"mappings": { <--- add this
"post": {
"properties": {
"email": {
"type": "string",
"analyzer": "email"
}
}
}
}
}
'
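Once the index is created, you can check that the analyzer produces the expected tokens with the _analyze API (the ES 1.x query-string form is used here, matching the filtered query syntax above):
curl -XGET 'localhost:9200/blog/_analyze?analyzer=email&text=someone@gmail.com&pretty'
This should return someone@gmail.com (the preserved original) plus someone, gmail, com and gmail.com, which is why a term query on the full address can now match.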

Related

ELK bool query with match and prefix

I'm new to ELK. I have a problem with the following search query:
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"should" : [
{"match" : {"cn" : "franc"}},
{"prefix" : {"srt" : "99889300200"}}
]
}
}
}
'
I need to find all documents that satisfy either condition: field "cn" contains "franc" OR field "srt" starts with "99889300200".
Index mapping:
{
"commsrch" : {
"mappings" : {
"properties" : {
"addr" : {
"type" : "text",
"index" : false
},
"cn" : {
"type" : "text",
"analyzer" : "compname"
},
"srn" : {
"type" : "text",
"analyzer" : "srnsrt"
},
"srt" : {
"type" : "text",
"analyzer" : "srnsrt"
}
}
}
}
}
Index settings:
{
"commsrch" : {
"settings" : {
"index" : {
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"number_of_shards" : "1",
"provided_name" : "commsrch",
"creation_date" : "1675079141160",
"analysis" : {
"filter" : {
"ngram_filter" : {
"type" : "ngram",
"min_gram" : "3",
"max_gram" : "4"
}
},
"analyzer" : {
"compname" : {
"filter" : [
"lowercase",
"stop",
"ngram_filter"
],
"type" : "custom",
"tokenizer" : "whitespace"
},
"srnsrt" : {
"type" : "custom",
"tokenizer" : "standard"
}
}
},
"number_of_replicas" : "1",
"uuid" : "C15EXHnaTIq88JSYNt7GvA",
"version" : {
"created" : "8060099"
}
}
}
}
}
The query works properly with only one condition: if the query has only the "match" condition, the result has the proper document count, and likewise if it has only the "prefix" condition.
With the two conditions "match" and "prefix" together, the result contains only documents that correspond to the "prefix" condition.
I can't find any limitation on mixing "prefix" and "match" in the Elasticsearch docs, but there does seem to be some problem. Please help me find where the problem is.
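(One way to narrow this down is to inspect what the ngram-based compname analyzer actually indexes for the match side, using the _analyze API; a diagnostic sketch against the settings above:)
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X GET "https://localhost:9200/commsrch/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "compname",
  "text": "franc"
}
'
With min_gram 3 and max_gram 4 this returns fra, fran, ran, ranc and anc. match analyzes its query text with the same analyzer, while prefix does not analyze its input at all and runs directly against the indexed terms, which can make the two clauses behave very differently on the same field.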
Continuing my experiments, I have one more problem.
Example:
Source data:
1st document cn field: "put stone is done"
2nd document cn field: "job one or two"
Mapping and index settings the same as described in my first post
Request:
{
"query": {
"bool": {
"should" : [
{"match" : {"cn" : "one"}},
{"prefix" : {"cn" : "one"}}
]
}
}
}
As I understand it, the first document got the higher score because it has more repeats of "one". But I need high scores for documents that have at least one word in field "cn" starting with the string "one". I have experimented with this query:
{
"query": {
"bool": {
"should": [
{"match": {"cn": "one"}},
{
"constant_score": {
"filter": {
"prefix": {
"cn": "one"
}
},
"boost": 100
}
}
]
}
}
}
But it doesn't work properly. What's wrong with my query?
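A plausible cause, given the ngram settings above: the indexed grams of "stone" and "done" both include the standalone term one (s-t-o-n-e yields sto, ston, ton, tone, one), so the prefix query matches the first document as well, and the constant_score boost lifts both documents equally. One possible workaround is a subfield of cn that is not ngrammed, with the prefix run against it; the subfield name cn.words and the built-in simple analyzer are assumptions for illustration, not part of the original mapping:
curl --insecure -H "Authorization: ApiKey $ESAPIKEY" -X PUT "https://localhost:9200/commsrch/_mapping?pretty" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "cn": {
      "type": "text",
      "analyzer": "compname",
      "fields": {
        "words": { "type": "text", "analyzer": "simple" }
      }
    }
  }
}
'
After an _update_by_query to populate the new subfield, {"prefix": {"cn.words": "one"}} matches only documents with a whole word starting with "one", i.e. the second document.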

Query Elasticsearch index for words with and without accent

I query for the word "café" and get 20 articles. Then I repeat the search with the word "cafe" and only get 3 articles. So I'm looking for a way to handle words containing accented letters the same way as their unaccented forms.
My problem is also that the index is already filled, so I have to modify an existing system. I'm using Elasticsearch 6.5.
I found some useful information and went through the following steps:
Setting up folding analyzer
curl -H "Content-Type: application/json" --user <user:pass> -XPUT http://localhost/test/_settings?pretty -d '{
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "standard",
"filter": [ "lowercase", "asciifolding" ]
}
}
}
}'
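(Note that analyzers can only be added to an existing index while it is closed; if the _settings call above is rejected, close the index first and reopen it afterwards, using the same host and auth as above:)
curl --user <user:pass> -XPOST http://localhost/test/_close
# ... apply the _settings call from above ...
curl --user <user:pass> -XPOST http://localhost/test/_open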
Modify existing mapping for the content field
curl -H "Content-Type: application/json" --user <user:pass> -XPUT http://localhost/test/mytype/_mapping -d '{
"properties" : {
"content" : {
"type" : "text",
"fields" : {
"folded" : {
"type" : "text",
"analyzer" : "folding"
}
}
}
}
}'
Do the search
curl -H "Content-Type: application/json" --user <user:pass> -XGET http://localhost/test/_search -d '{
"query" : {
"bool" : {
"must" : [
{
"query_string" : {
"query" : "cafe"
}
}
]
}
},
"size" : 10,
"from" : 0
}'
But the effect is the same as before: I only find the articles with "cafe", not the articles with "café". Is there something I'm missing?
Great start! You have created a new analyzer and changed your mapping; however, you now also need to reindex your data in order to fill in the new content.folded field.
You can do it very easily by calling the update by query endpoint like this:
curl --user <user:pass> -XPOST http://localhost/test/_update_by_query
In your search query you should reference content.folded: the folding analyzer is assigned to content.folded, not content.
After a mapping update you will have to reindex your data in order to apply the change (see the Reindex documentation for a step-by-step guide).
A working example:
Mappings
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"content": {
"type": "text",
"fields": {
"folded": {
"type": "text",
"analyzer": "folding"
}
}
}
}
}
}
}
Inserting few documents
POST my_index/_doc/1
{
"content":"café"
}
POST my_index/_doc/2
{
"content":"cafe"
}
Search Query
GET my_index/_search
{
"query": {
"match": {
"content.folded": "cafe"
}
}
}
Results
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.18232156,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.18232156,
"_source" : {
"content" : "café"
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.18232156,
"_source" : {
"content" : "cafe"
}
}
]
}
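If you also want plain matches on the original field to score, a multi_match across both fields is one option (a sketch, not the only way to combine them):
GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "cafe",
      "fields": [ "content", "content.folded" ]
    }
  }
}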
Hope this helps

How to create and add custom analyser in Elastic search?

I have a batch of "smartphones" products in my ES and I need to be able to query them with the text "smart phone". So I'm looking into the compound word token filter. Specifically, I'm planning to use a custom filter like this:
curl -XPUT 'localhost:9200/_all/_settings' -d '
{
"analysis" : {
"analyzer":{
"second":{
"type":"custom",
"tokenizer":"standard",
"filter":["myFilter"]
}
},
"filter": {
"myFilter" :{
"type" : "dictionary_decompounder",
"word_list": ["smart", "phone"]
}
}
}
}
'
Is this the correct approach? Also, I'd like to ask how I can create and add the custom analyzer to ES. I looked into several links but couldn't figure out how to do it; I guess I'm looking for the correct syntax.
Thank you
EDIT
I'm running version 1.4.5.
I verified that the custom analyser was added successfully:
{
"test_index" : {
"settings" : {
"index" : {
"creation_date" : "1453761455612",
"analysis" : {
"filter" : {
"myFilter" : {
"type" : "dictionary_decompounder",
"word_list" : [ "smart", "phone" ]
}
},
"analyzer" : {
"second" : {
"type" : "custom",
"filter" : [ "lowercase", "myFilter" ],
"tokenizer" : "standard"
}
}
},
"number_of_shards" : "5",
"number_of_replicas" : "1",
"version" : {
"created" : "1040599"
},
"uuid" : "xooKEdMBR260dnWYGN_ZQA"
}
}
}
}
Your approach looks good. I would also consider adding the lowercase token filter, so that even Smartphone (notice the uppercase 'S') will be split into smart and phone.
Then you could create the index with the analyzer like this:
curl -XPUT 'localhost:9200/your_index' -d '
{
"settings": {
"analysis": {
"analyzer": {
"second": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"myFilter"
]
}
},
"filter": {
"myFilter": {
"type": "dictionary_decompounder",
"word_list": [
"smart",
"phone"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "second"
}
}
}
}
}
'
Here you are creating an index named your_index with a custom analyzer named second, and applying that analyzer to the name field.
You can check whether the analyzer works as expected with the analyze API, like this:
curl -XGET 'localhost:9200/your_index/_analyze' -d '
{
"analyzer" : "second",
"text" : "LG Android smartphone"
}'
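If everything is wired up correctly, this should return the tokens lg, android, smartphone, smart and phone: dictionary_decompounder keeps the original token and adds any dictionary subwords it finds. A trimmed sketch of the expected response (offsets and positions omitted):
{
  "tokens" : [
    { "token" : "lg" },
    { "token" : "android" },
    { "token" : "smartphone" },
    { "token" : "smart" },
    { "token" : "phone" }
  ]
}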
Hope this helps!!

Elasticsearch "failed to find analyzer"

I have created a synonym analyser on an index:
curl http://localhost:9200/test_index/_settings?pretty
{
"test_index" : {
"settings" : {
"index" : {
"creation_date" : "1429175067557",
"analyzer" : {
"search_synonyms" : {
"filter" : [ "lowercase", "search_synonym_filter" ],
"tokenizer" : "standard"
}
},
"uuid" : "Zq6Id8xsRWGofJrNCb7M8w",
"number_of_replicas" : "1",
"analysis" : {
"filter" : {
"search_synonym_filter" : {
"type" : "synonym",
"synonyms" : [ "sneakers,pumps" ]
}
}
},
"number_of_shards" : "5",
"version" : {
"created" : "1050099"
}
}
}
}
}
But when I try to use it with the mapping:
curl -XPUT 'http://localhost:9200/test_index/_mapping/product_catalog?pretty' -H "Content-Type: application/json" \
-d '{"product_catalog": {"properties" : {"name": {"type": "string", "include_in_all": true, "analyzer":"search_synonyms"} }}}'
I get the error:
{
"error" : "MapperParsingException[Analyzer [search_synonyms] not found for field [name]]",
"status" : 400
}
I have also tried to just check the analyser with:
curl 'http://localhost:9200/test_index/_analyze?analyzer=search_synonyms&pretty=1&text=pumps'
but still get an error:
ElasticsearchIllegalArgumentException[failed to find analyzer [search_synonyms]]
Any ideas? I may be missing something, but I can't think what.
The analyzer element has to be inside your analysis component. Change your index creation settings as follows:
{
"settings": {
"index": {
"creation_date": "1429175067557",
"uuid": "Zq6Id8xsRWGofJrNCb7M8w",
"number_of_replicas": "0",
"analysis": {
"filter": {
"search_synonym_filter": {
"type": "synonym",
"synonyms": [
"sneakers,pumps"
]
}
},
"analyzer": {
"search_synonyms": {
"filter": [
"lowercase",
"search_synonym_filter"
],
"tokenizer": "standard"
}
}
},
"number_of_shards": "5",
"version": {
"created": "1050099"
}
}
}
}
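With the analyzer correctly nested under analysis, both the mapping call and the _analyze check from the question should succeed once the index is recreated; for example:
curl 'http://localhost:9200/test_index/_analyze?analyzer=search_synonyms&text=pumps&pretty'
should now return both pumps and sneakers as tokens, since the synonym filter expands each term of the pair to the other.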

ElasticSearch : search and return nested type

I am pretty new to Elasticsearch and I am having trouble using nested mappings and queries.
I have added the following data structure to my index:
{
"_id": "3",
"_rev": "6-e9e1bc15b39e333bb4186de05ec1b167",
"skuCode": "test",
"name": "Dragon vol. 1",
"pages": [
{
"id": "1",
"tags": [
{
"name": "dragon"
},
{
"name": "japonese"
}
]
},
{
"id": "2",
"tags": [
{
"name": "tagforanotherpage"
}
]
}
]
}
The index mapping is defined as below:
{
"metabook" : {
"metabook" : {
"properties" : {
"_rev" : {
"type" : "string"
},
"name" : {
"type" : "string"
},
"pages" : {
"type" : "nested",
"properties" : {
"tags" : {
"properties" : {
"name" : {
"type" : "string"
}
}
}
}
},
"skuCode" : {
"type" : "string"
}
}
}
}
}
My goal is to search for all pages containing a specific tag and return the book object with the filtered page list (I would like ES to return only the pages that match the given tag). Something like this (ignoring the second page):
{
"_id": "3",
"_rev": "6-e9e1bc15b39e333bb4186de05ec1b167",
"skuCode": "test",
"name": "Dragon vol. 1",
"pages": [
{
"id": "1",
"tags": [
{
"name": "dragon"
},
{
"name": "japonese"
}
]
}
]
}
Here is the query I actually use :
{
"from": 0,
"size": 10,
"query" : {
"nested" : {
"path" : "pages",
"score_mode" : "avg",
"query" : {
"term" : { "tags.name" : "japonese" }
}
}
}
}
But it actually returns an empty result. What am I doing wrong ? Maybe I should index my "pages" directly instead of books ? What am I missing ?
Thank you in advance !
Sadly, you can't get back only parts of a document. If the document matches a query, you will get the whole thing back: the root and all nested docs. If you want to get only parts back, then you could look at using parent/child docs.
Also, you aren't seeing any hits because you have a small syntax error in the nested query. Look closely at the field name:
{
"from": 0,
"size": 10,
"query" : {
"nested" : {
"path" : "pages",
"score_mode" : "avg",
"query" : {
"term" : { "pages.tags.name" : "japonese" }
}
}
}
}
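(As an aside: on Elasticsearch 1.5 or later, the nested query also accepts inner_hits, which reports the specific nested pages that matched alongside the full document; a minimal sketch:)
{
  "query" : {
    "nested" : {
      "path" : "pages",
      "query" : {
        "term" : { "pages.tags.name" : "japonese" }
      },
      "inner_hits" : {}
    }
  }
}
The matching nested documents then appear under inner_hits in each hit, though the _source itself is still returned whole.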
If you need help with parent/child docs, feel free to ask! (There should be examples if you do a Google search.)
Good luck!
