Given the following document,
{
"domain": "www.example.com",
"tag": [
{
"name": "IIS"
},
{
"name": "Microsoft ASP.NET"
}
]
}
When I run a query for asp or asp.net, I would like the Microsoft ASP.NET document to appear in the result set.
So I need a lowercase analyzer that also removes the . character from the word delimiters. I tried the following mapping:
curl -XPUT http://localhost:9200/tag-test -d '{
"settings": {
"analysis": {
"filter": {
"domain_filter": {
"type": "word_delimiter",
"type_table": [". => ALPHANUM", ": => ALPHANUM"]
}
},
"analyzer": {
"domain_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": ["lowercase", "domain_filter"]
}
}
}
},
"mappings": {
"assets": {
"properties": {
"domain": {
"type": "string",
"analyzer": "domain_analyzer"
},
"tag": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"analyzer": "domain_analyzer"
}
}
}
}
}
}
}'; echo
Then I tried the following queries, all of which yield an empty result:
tag.name:asp
tag.name:asp.net
tag.name:*asp*
I'm using a query_string query:
curl http://localhost:9200/tag-test/_search?q=tag.name:asp
Any ideas?
First of all, the query_string query has no support for nested documents, so unless you use include_in_parent: true in your mapping (which flattens the nested field into an array in the parent document), query_string will never work here.
Secondly, with your analyzer, asp.net is indexed as a single term in Elasticsearch, which means query_string will work with tag.name:asp.net and tag.name:*asp*. I recommend avoiding the leading wildcard, though, since it is expensive.
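You can verify the indexed terms with the _analyze API (a quick check against the index created above):
curl -XGET 'http://localhost:9200/tag-test/_analyze?analyzer=domain_analyzer' -d 'Microsoft ASP.NET'
This should return the two terms microsoft and asp.net, confirming that the dot is kept inside the token.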
So, in the end your test should be:
PUT /tag-test
{
"settings": {
"analysis": {
"filter": {
"domain_filter": {
"type": "word_delimiter",
"type_table": [
". => ALPHANUM",
": => ALPHANUM"
]
}
},
"analyzer": {
"domain_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"domain_filter"
]
}
}
}
},
"mappings": {
"assets": {
"properties": {
"domain": {
"type": "string",
"analyzer": "domain_analyzer"
},
"tag": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string",
"analyzer": "domain_analyzer"
}
}
}
}
}
}
}
Notice "include_in_parent": true in the mapping for tag.
Then the query should be:
curl -XGET "http://localhost:9200/tag-test/_search?q=tag.name:asp*"
curl -XGET "http://localhost:9200/tag-test/_search?q=tag.name:asp.net"
Related
How can I force Elasticsearch query_string to recognize '#' as a simple character?
Assume I have an index to which I added a few documents with this bulk statement:
POST test/item/_bulk
{"index":{}}
{"text": "john.doe#gmail.com"}
{"index":{}}
{"text": "john.doe#outlook.com"}
{"index":{}}
{"text": "john.doe#gmail.com, john.doe#outlook.com"}
{"index":{}}
{"text": "john.doe[at]gmail.com"}
{"index":{}}
{"text": "john.doe gmail.com"}
I want this search:
GET test/item/_search
{
"query":
{
"query_string":
{
"query": "*#gmail.com",
"analyze_wildcard": "true",
"allow_leading_wildcard": "true",
"default_operator": "AND"
}
}
}
to return only the first and third documents.
I tried three kinds of mapping.
First I tried:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"email_analyzer": {
"tokenizer": "email_tokenizer"
}
},
"tokenizer": {
"email_tokenizer": {
"type": "uax_url_email"
}
}
}
},
"mappings": {
"item": {
"properties": {
"text": {
"type": "string",
"analyzer": "email_analyzer"
}
}
}
}
}
Then I tried:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "whitespace"
}
}
}
},
"mappings": {
"item": {
"properties": {
"text": {
"type": "string",
"analyzer": "my_analyzer"
}
}
}
}
}
And I also tried this one:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "whitespace"
}
}
}
},
"mappings": {
"item": {
"properties": {
"text": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
None of the above worked; in fact, they all returned all the documents.
Is there an analyzer/tokenizer/parameter that will make Elasticsearch treat the '#' sign like any other character?
This works with your last setting, where the text field is set to not_analyzed:
GET test/item/_search
{
"query":
{
"wildcard":
{
"text": "*#gmail.com*"
}
}
}
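This should return exactly the first and third documents, since only their stored values contain the substring #gmail.com.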
With a not_analyzed field you should use term-level queries rather than full-text queries: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/term-level-queries.html
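For example, an exact lookup on the not_analyzed field could use a term query (a minimal sketch):
GET test/item/_search
{
"query": {
"term": {
"text": "john.doe#gmail.com"
}
}
}
This would return only the first document, because a term query requires the stored value to match exactly.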
I've got an Elasticsearch v5 index set up for mapping config hashes to URLs.
{
"settings": {
"analysis": {
"analyzer": {
"url-analyzer": {
"type": "custom",
"tokenizer": "url-tokenizer"
}
},
"tokenizer": {
"url-tokenizer": {
"type": "path_hierarchy",
"delimiter": "/"
}
}
}
},
"mappings": {
"route": {
"properties": {
"uri": {
"type": "string",
"index": "analyzed",
"analyzer": "url-analyzer"
},
"config": {
"type": "object"
}
}
}
}
}
I would like to match the longest path prefix with the highest score, so that given the documents
{ "uri": "/trousers/", "config": { "foo": 1 }}
{ "uri": "/trousers/grey", "config": { "foo": 2 }}
{ "uri": "/trousers/grey/lengthy", "config": { "foo": 3 }}
when I search for /trousers, the top result should be /trousers/, and when I search for /trousers/grey/short the top result should be /trousers/grey.
Instead, I'm finding that the top result for /trousers is /trousers/grey/lengthy.
How can I index and query my documents to achieve this?
I have one solution, after drinking on it: what if we treat the URI in the index as a keyword, but still use the PathHierarchyTokenizer on the search input?
Now we store the following docs:
/trousers
/trousers/grey
/trousers/grey/lengthy
When we submit a query for /trousers/grey/short, the search analyzer can build the input tokens [/trousers, /trousers/grey, /trousers/grey/short].
The first two of our documents will match, and we can trivially select the longest match using a custom sort.
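To sanity-check this (a sketch, assuming an index named routes created with the mapping below), the _analyze API should show the prefix tokens:
POST /routes/_analyze
{
"analyzer": "uri-query",
"text": "/trousers/grey/short"
}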
Now our mapping document looks like this:
{
"settings": {
"analysis": {
"analyzer": {
"uri-analyzer": {
"type": "custom",
"tokenizer": "keyword"
},
"uri-query": {
"type": "custom",
"tokenizer": "uri-tokenizer"
}
},
"tokenizer": {
"uri-tokenizer": {
"type": "path_hierarchy",
"delimiter": "/"
}
}
}},
"mappings": {
"route": {
"properties": {
"uri": {
"type": "text",
"fielddata": true,
"analyzer": "uri-analyzer",
"search_analyzer": "uri-query"
},
"config": {
"type": "object"
}
}
}
}
}
and our query, sorting by URI length descending so that the longest matching prefix comes first, looks like this:
{
"sort": {
"_script": {
"script": "doc.uri.length",
"order": "asc",
"type": "number"
}
},
"query": {
"match": {
"uri": {
"query": "/trousers/grey/lengthy",
"type": "boolean"
}
}
}
}
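With this in place, a query for /trousers/grey/short should match only /trousers and /trousers/grey, and the descending length sort puts /trousers/grey, the longest prefix, on top.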
Hello all, I am facing two problems in ES.
I have a city, 'New York', in ES, and I want to write a term filter that returns a document only when the given string exactly matches "New York". What happens instead is that the filter returns "New York" when matching "New" or "York", but returns nothing for "New York" itself. Which analyzer or tokenizer should I use in the mapping?
Here are the settings and mapping:
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"address": {
"properties": {
"city": { "type": "string", "analyzer": "synonym" }
}
}
}
}
}
The second problem: when I use a wildcard query with lowercase input, for example "new*", ES does not return anything, but with uppercase input, for example "New*", it returns "New York". I want to write my city mapping so that searching with lowercase or uppercase returns the same thing. I have seen ignore_case and set it inside the synonym filter, but I am still not able to search with both lowercase and uppercase:
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true // See here
}
I believe you didn't provide enough details, but hoping that my attempt will prompt follow-up questions from you, I will post what I believe is a step forward.
The mapping:
PUT test
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym"
]
},
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
},
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "synonyms.txt",
"ignore_case": true
}
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"address": {
"properties": {
"city": {
"type": "string",
"analyzer": "synonym",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
},
"raw_ignore_case": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
}
}
Test data:
POST /test/restaurant/1
{
"address": {"city":"New York"}
}
POST /test/restaurant/2
{
"address": {"city":"new york"}
}
Query for the first problem:
GET /test/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"address.city.raw": "New York"
}
}
}
}
}
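This should return only document 1, since the raw subfield stores the exact value "New York" and the term filter matches it verbatim.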
Query for the second problem:
GET /test/restaurant/_search
{
"query": {
"query_string": {
"query": "address.city.raw_ignore_case:new*"
}
}
}
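To see why the raw_ignore_case field matches regardless of case, you can inspect the analyzer output (a quick check):
curl -XGET 'http://localhost:9200/test/_analyze?analyzer=keyword_lowercase' -d 'New York'
This should produce the single term new york, so both new* and New* wildcard inputs match it, given that query_string lowercases wildcard terms by default (lowercase_expanded_terms).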
I am trying to set up stemming in my ES mapping. I pass the name of the stemming analyzer (de_analyzer) through the indexed document.
I observe that the mapping below properly adds the stemmed terms to the index, but now I can no longer search for the unstemmed terms; no matches are returned. It seems that only the stemmed terms are indexed?
This is the index template showing the filter, index analyzer, search analyzer, and field configuration.
What am I overlooking?
Thanks!
{
"globalfashionmonitor": {
"template": "myindex*",
"settings": {
"index.number_of_shards": 5,
"default_search": "analyzer_search",
"analysis": {
"filter": {
"de_stem_filter": {
"type": "stemmer",
"name": "minimal_german"
}
},
"analyzer": {
"analyzer_search": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_folding"
]
},
"de_analyzer": {
"type": "custom",
"filter": [
"icu_normalizer",
"de_stop_filter",
"de_stem_filter",
"icu_folding"
],
"tokenizer": "icu_tokenizer"
}
}
}
},
"mappings": {
"items": {
"_analyzer": {
"path": "use_analyzer"
},
"properties": {
"summarizedArticle": {
"fields": {
"stemmed": {
"index_analyzer": "de_analyzer",
"type": "string",
"index": "analyzed"
}
},
"type": "string"
}
}
}
}
}
}
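A quick way to confirm what actually lands in the stemmed field is the _analyze API (a sketch; the index name myindex and the sample word are placeholders):
curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=de_analyzer' -d 'Kleider'
If only stemmed forms come back, unstemmed searches have to target the parent summarizedArticle field (which uses the default analyzer) rather than the stemmed subfield.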
I really thought I had this working, but I'm actually having issues. I have a dynamic template set up to match nested documents. I set up my mappings like so:
curl -XPUT 'http://localhost:9200/test/' -d '{
"mappings": {
"Item": {
"dynamic_templates": [
{
"metadata_template": {
"match_mapping_type": "string",
"path_match": "metadata.*",
"mapping": {
"type": "multi_field",
"fields": {
"{name}": {
"type": "{dynamic_type}",
"index": "analyzed"
},
"standard": {
"type": "{dynamic_type}",
"index": "analyzed",
"analyzer" : "standard"
}
}
}
}
}
]
}
},
"settings": {
"analysis": {
"filter": {
"my_ngram": {
"max_gram": 10,
"min_gram": 1,
"type": "nGram"
},
"lb_stemmer": {
"type": "stemmer",
"name": "english"
}
},
"analyzer": {
"default_index": {
"filter": [
"standard",
"lowercase",
"asciifolding",
"my_ngram"
],
"type": "custom",
"tokenizer": "keyword"
},
"default_search": {
"filter": [
"standard",
"lowercase"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
}'
My expectation is that every field starting with "metadata." gets stored twice: once analyzed with the default analyzer, and once with the standard analyzer under the suffix ".standard". Am I completely misunderstanding this?
I add an item:
curl -XPUT localhost:9200/test/Item/1 -d '{
"name" : "test",
"metadata" : {
"strange_tag" : "CLEAN_2C_abcdefghij_07MAY2005_AB"
}
}'
This query works great:
{
"query": {
"match": {
"metadata.strange_tag": {
"query": "CLEAN_2C_abcdefghij_07MAY2005_AB",
"type": "boolean"
}
}
}
}
But searching for the word CLEAN, or clean, doesn't return any results. I expected that field to have gone through the ngram tokenizer. Does anyone have a suggestion for what I'm doing wrong?
Looks like I was incorrectly creating my NGRAM analyzer. Here is a working example:
curl -XDELETE 'localhost:9200/test'
curl -XPUT 'localhost:9200/test' -d '{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer",
"filter": [
"standard",
"lowercase",
"asciifolding"
]
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "3",
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"Item": {
"dynamic_templates": [
{
"metadata_template": {
"match_mapping_type": "string",
"path_match": "*",
"mapping": {
"type": "multi_field",
"fields": {
"{name}": {
"type": "{dynamic_type}",
"index": "analyzed",
"analyzer" : "my_ngram_analyzer"
},
"standard": {
"type": "{dynamic_type}",
"index": "analyzed",
"analyzer": "standard"
}
}
}
}
}
]
}
}
}'
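To verify (a sketch reusing the earlier test document), index an item and search the ngram-analyzed field for a fragment:
curl -XPUT 'localhost:9200/test/Item/1' -d '{
"name" : "test",
"metadata" : {
"strange_tag" : "CLEAN_2C_abcdefghij_07MAY2005_AB"
}
}'
curl -XGET 'localhost:9200/test/_search' -d '{
"query": {
"match": {
"metadata.strange_tag": "clean"
}
}
}'
Searching for clean (or CLEAN) should now return the document, since both the indexed value and the query text pass through the same ngram analyzer.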