I really thought I had this working, but I'm actually having issues. I have a dynamic template set up to match nested documents. I set up my mappings like so:
curl -XPUT 'http://localhost:9200/test/' -d '{
"mappings": {
"Item": {
"dynamic_templates": [
{
"metadata_template": {
"match_mapping_type": "string",
"path_match": "metadata.*",
"mapping": {
"type": "multi_field",
"fields": {
"{name}": {
"type": "{dynamic_type}",
"index": "analyzed"
},
"standard": {
"type": "{dynamic_type}",
"index": "analyzed",
"analyzer" : "standard"
}
}
}
}
}
]
}
},
"settings": {
"analysis": {
"filter": {
"my_ngram": {
"max_gram": 10,
"min_gram": 1,
"type": "nGram"
},
"lb_stemmer": {
"type": "stemmer",
"name": "english"
}
},
"analyzer": {
"default_index": {
"filter": [
"standard",
"lowercase",
"asciifolding",
"my_ngram"
],
"type": "custom",
"tokenizer": "keyword"
},
"default_search": {
"filter": [
"standard",
"lowercase"
],
"type": "custom",
"tokenizer": "standard"
}
}
}
}
}'
My expectation is that every field starting with "metadata." gets stored twice: once analyzed with my default (ngram) analyzer, and once under the ".standard" suffix analyzed with the standard analyzer. Am I completely misunderstanding this?
I add an item:
curl -XPUT localhost:9200/test/Item/1 -d '{
"name" : "test",
"metadata" : {
"strange_tag" : "CLEAN_2C_abcdefghij_07MAY2005_AB"
}
}'
This query works great:
{
"query": {
"match": {
"metadata.strange_tag": {
"query": "CLEAN_2C_abcdefghij_07MAY2005_AB",
"type": "boolean"
}
}
}
}
But searching for the word CLEAN, or clean, doesn't return any results. I expect that field to have gone through the ngram filter. Does anyone have a suggestion for what I'm doing wrong?
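One way to check what actually gets indexed is to run the _analyze API against the field (a sketch, using the index and field names from above):
curl -XGET 'localhost:9200/test/_analyze?field=metadata.strange_tag' -d 'CLEAN_2C_abcdefghij_07MAY2005_AB'
If the response only contains whole-string tokens instead of short grams, the ngram filter never ran.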
Looks like I was incorrectly creating my nGram analyzer. Here is a working example:
curl -XDELETE 'localhost:9200/test'
curl -XPUT 'localhost:9200/test' -d '{
"settings": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer",
"filter": [
"standard",
"lowercase",
"asciifolding"
]
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "3",
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"Item": {
"dynamic_templates": [
{
"metadata_template": {
"match_mapping_type": "string",
"path_match": "*",
"mapping": {
"type": "multi_field",
"fields": {
"{name}": {
"type": "{dynamic_type}",
"index": "analyzed",
"analyzer" : "my_ngram_analyzer"
},
"standard": {
"type": "{dynamic_type}",
"index": "analyzed",
"analyzer": "standard"
}
}
}
}
}
]
}
}
}'
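To verify the new analyzer end to end, a quick sketch (re-indexing the same sample document from the question and searching the ngram-analyzed field; the refresh parameter just makes the document searchable immediately):
curl -XPUT 'localhost:9200/test/Item/1?refresh=true' -d '{
  "name" : "test",
  "metadata" : {
    "strange_tag" : "CLEAN_2C_abcdefghij_07MAY2005_AB"
  }
}'
curl -XGET 'localhost:9200/test/Item/_search' -d '{
  "query": { "match": { "metadata.strange_tag": "clean" } }
}'
With the 2-3 character grams produced by my_ngram_tokenizer, a search for clean (or CLEAN) now matches.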
I'm using ES 6.4 as an AWS service. Here is my mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "my_tokenizer"
}
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 20,
"token_chars": [
"letter"
]
}
}
},
"mappings": {
"tsetse": {
"properties": {
"id": {
"type": "integer"
},
"user_id": {
"type": "integer"
},
"description": {
"type": "text",
"analyzer": "my_analyzer"
},
"type": {
"type": "integer"
}
}
}
}
}
The index has a record with description = "greatest performance on earth". When I search, it always works on a complete word (earth or performance) but does not return results for great or perf. What am I missing?
Here is the updated mapping with edge_ngram:
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"tsetse": {
"properties": {
"id": {
"type": "integer"
},
"user_id": {
"type": "integer"
},
"description": {
"type": "text",
"analyzer": "my_analyzer"
},
"type": {
"type": "integer"
}
}
}
}
}
Gist script - https://gist.github.com/swati-patil/0b1cea74fc52b1b96d44ad239ad2580d
Thanks,
Thanks for the Gist. I can see you're not creating your index correctly:
you're using POST instead of PUT
you're specifying a type where you shouldn't
there are two closing curly braces that you need to remove at the end
Do it like this instead:
# first delete your index
curl -XDELETE 'my-instance-us-east1.amazonaws.com/my_index'
# then create it correctly
curl -XPUT "my-instance-us-east1.amazonaws.com/my_index" -H 'Content-Type: application/json' -d '{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
}'
# then analyze works
curl -XPOST my-instance-us-east1.amazonaws.com/my_index/_analyze -H 'Content-Type: application/json' -d '{
"analyzer": "my_analyzer",
"text": "Greatest performance on earth"
}'
Then index your documents and run your queries, they will both work.
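For completeness, a minimal follow-up sketch (my_index, my_type, the text field and the document id are just the names assumed from the mapping above):
curl -XPUT "my-instance-us-east1.amazonaws.com/my_index/my_type/1?refresh=true" -H 'Content-Type: application/json' -d '{
  "text": "greatest performance on earth"
}'
curl -XPOST "my-instance-us-east1.amazonaws.com/my_index/_search" -H 'Content-Type: application/json' -d '{
  "query": { "match": { "text": "perf" } }
}'
Because autocomplete_filter produces edge grams of length 1 to 20, partial terms like perf and great now match.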
I am searching for a phrase in an email body. I need the results filtered exactly: if I search for 'Avenue New', it should return only results that contain the phrase 'Avenue New', not 'Avenue Street', 'Park Avenue', etc.
My mapping looks like this:
{
"exchangemailssql": {
"aliases": {},
"mappings": {
"email": {
"dynamic_templates": [
{
"_default": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"doc_values": true,
"type": "keyword"
}
}
}
],
"properties": {
"attachments": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"body": {
"type": "text",
"analyzer": "keylower",
"fielddata": true
},
"count": {
"type": "short"
},
"emailId": {
"type": "long"
}
}
}
},
"settings": {
"index": {
"refresh_interval": "3s",
"number_of_shards": "1",
"provided_name": "exchangemailssql",
"creation_date": "1500527793230",
"analysis": {
"filter": {
"nGram": {
"min_gram": "4",
"side": "front",
"type": "edge_ngram",
"max_gram": "100"
}
},
"analyzer": {
"keylower": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
},
"email": {
"filter": [
"lowercase",
"unique",
"nGram"
],
"type": "custom",
"tokenizer": "uax_url_email"
},
"full": {
"filter": [
"lowercase",
"snowball",
"nGram"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "0",
"uuid": "2XTpHmwaQF65PNkCQCmcVQ",
"version": {
"created": "5040099"
}
}
}
}
}
I have given the search query like:
{
"query": {
"match_phrase": {
"body": "Avenue New"
}
},
"highlight": {
"fields" : {
"body" : {}
}
}
}
The problem here is that you're tokenizing the full body content using the keyword tokenizer, i.e. it gets indexed as one big lowercase string and you cannot search inside of it.
If you simply change the analyzer of your body field to standard instead of keylower, you'll find what you need using the match_phrase query.
"body": {
"type": "text",
"analyzer": "standard", <---change this
"fielddata": true
},
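To see the difference, compare the two analyzers with the _analyze API (a sketch; the index name is taken from the mapping above, and an analyzer change on an existing field means the data has to be reindexed):
curl -XPOST 'localhost:9200/exchangemailssql/_analyze' -d '{
  "analyzer": "keylower",
  "text": "123 Avenue New York"
}'
curl -XPOST 'localhost:9200/exchangemailssql/_analyze' -d '{
  "analyzer": "standard",
  "text": "123 Avenue New York"
}'
keylower returns the whole input as a single token ("123 avenue new york"), so a phrase like "Avenue New" can never match; standard returns the individual tokens 123, avenue, new, york with positions, which is exactly what match_phrase needs.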
I use Elasticsearch 2.3.5. I want to add my custom analyzer to the mapping while creating the index.
PUT /library
{
"settings": {
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
},
"mappings": {
"book": {
"properties": {
"Id": {
"type": "long",
"search_analyzer": "search_term_analyzer",
"index_analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
},
"Title": {
"type": "string",
"search_analyzer": "search_term_analyzer",
"index_analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
}
}
}
}
}
I took this template example from the official guide:
{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"type1" : {
"properties" : {
"field1" : { "type" : "string", "index" : "not_analyzed" }
}
}
}
}
But I get an error trying to execute the first part of the code. Here is the error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "analyzer [search_term_analyzer] not found for field [Title]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [book]: analyzer [search_term_analyzer] not found for field [Title]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "analyzer [search_term_analyzer] not found for field [Title]"
}
},
"status": 400
}
I can make it work if I put my mappings inside of settings, but I think that is the wrong way. Next, I try to find my book using part of a title. I have the "King Arthur" book, for example. My query looks like this:
POST /library/book/_search
{
"query": {
"match": {
"Title": "kin"
}
}
}
Nothing is found. What am I doing wrong? Could you help me? It seems my analyzer and tokenizer don't work. How can I get the terms "k", "i", "ki", "king", etc.? I think I have only two terms right now: 'king' and 'arthur'.
You have misplaced the search_term_analyzer; it should be inside the analyzer section:
PUT /library
{
"settings": {
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [
"letter",
"digit"
]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
},
"mappings": {
"book": {
"properties": {
"Id": {
"type": "long", <---- you probably need to make this a string or remove the analyzers
"search_analyzer": "search_term_analyzer",
"analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
},
"Title": {
"type": "string",
"search_analyzer": "search_term_analyzer",
"analyzer": "index_ngram_analyzer",
"term_vector":"with_positions_offsets"
}
}
}
}
}
Also make sure to use analyzer instead of index_analyzer; the latter has been deprecated in ES 2.x.
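A quick way to verify (a sketch using the analyzer and field names from the mapping above):
GET /library/_analyze?analyzer=index_ngram_analyzer&text=King
This should return the 1-15 character grams of "king" (k, ki, kin, king, i, in, ...). Once the index is recreated with these settings and the book is indexed again, the partial-title query from the question works:
POST /library/book/_search
{
  "query": {
    "match": {
      "Title": "kin"
    }
  }
}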
I'm using Elasticsearch 1.5.2 and I'm trying to implement an edge_ngram autocomplete search. I have the following mapping:
curl -XPUT 'localhost:8080/users' -d '{
"settings": {
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
},
"analyzer": {
"edge_ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"edge_ngram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
},
"mappings": {
"user": {
"_all": {
"type":"string",
"index_analyzer": "edge_ngram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"id":{
"type": "integer",
"index": "no",
"include_in_all":false
},
"email": {
"type": "string"
},
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
}
}
}
}
}
}'
I then index a "user" document:
curl -XPUT 'localhost:8080/users/user/1' -d '{
"email": "a.smith#gmail.com",
"firstName": "Alexander",
"lastName": "Smith"
}'
When I run the following query nothing is returned:
curl -XGET 'localhost:8080/users/_search' -d '{
"query": {
"match":{
"_all":{
"query": "ale",
"operator":"and"
}
}
}
}'
Why is the _all match query not matching on the user document?
You can achieve the functionality of autocomplete by edge_ngram without overriding the _all field analysis. This is done by changing the names of the analyzers you have defined to default_index and default_search (you can alias them to reflect your original names ("edge_ngram_analyzer" and "whitespace_analyzer") if you want). Here is your configuration with the relevant changes:
curl -XPUT 'localhost:8080/users' -d '{
"settings": {
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
},
"analyzer": {
"default_index": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding",
"edge_ngram_filter"
]
},
"default_search": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"user": {
"properties": {
"id":{
"type": "integer",
"index": "no",
"include_in_all":false
},
"email": {
"type": "string"
},
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
}
}
}
}
}'
Hope I have managed to help :)
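With those default analyzer names in place, the original query from the question should now return the document, since the _all field is indexed through default_index (edge grams) and searched through default_search (a sketch, reusing the request from above):
curl -XGET 'localhost:8080/users/_search' -d '{
  "query": {
    "match": {
      "_all": {
        "query": "ale",
        "operator": "and"
      }
    }
  }
}'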
The char_filter section of the Elasticsearch mapping documentation is kind of vague, and I'm having a lot of difficulty understanding if and how to use the mapping char filter: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-mapping-charfilter.html
Basically the data we are storing in the index are IDs of type String that look like this: "008392342000". I want to be able to search for such IDs even when the query terms contain a hyphen or a trailing space, like this: "008392342-000 ".
How would you advise I set up the analyzer?
Currently this is the definition of the field:
"mappings": {
"client": {
"properties": {
"ucn": {
"type": "multi_field",
"fields": {
"ucn_autoc": {
"type": "string",
"index": "analyzed",
"index_analyzer": "autocomplete_index",
"search_analyzer": "autocomplete_search"
},
"ucn": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Here are the settings for the index, containing the analyzers etc.
"settings": {
"analysis": {
"filter": {
"autocomplete_ngram": {
"max_gram": 15,
"min_gram": 1,
"type": "edge_ngram"
},
"ngram_filter": {
"type": "nGram",
"min_gram": 2,
"max_gram": 8
}
},
"analyzer": {
"lowercase_analyzer": {
"filter": [
"lowercase"
],
"tokenizer": "keyword"
},
"autocomplete_index": {
"filter": [
"lowercase",
"autocomplete_ngram"
],
"tokenizer": "keyword"
},
"ngram_index": {
"filter": [
"ngram_filter",
"lowercase"
],
"tokenizer": "keyword"
},
"autocomplete_search": {
"filter": [
"lowercase"
],
"tokenizer": "keyword"
},
"ngram_search": {
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
},
"index": {
"number_of_shards": 6,
"number_of_replicas": 1
}
}
}
You haven't provided your actual analyzers, what data goes in and what your expectations are, but based on the info you provided I would start with this:
{
"settings": {
"analysis": {
"char_filter": {
"my_mapping": {
"type": "mapping",
"mappings": [
"-=>"
]
}
},
"analyzer": {
"autocomplete_search": {
"tokenizer": "keyword",
"char_filter": [
"my_mapping"
],
"filter": [
"trim"
]
},
"autocomplete_index": {
"tokenizer": "keyword",
"filter": [
"trim"
]
}
}
}
},
"mappings": {
"test": {
"properties": {
"ucn": {
"type": "multi_field",
"fields": {
"ucn_autoc": {
"type": "string",
"index": "analyzed",
"index_analyzer": "autocomplete_index",
"search_analyzer": "autocomplete_search"
},
"ucn": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
The char_filter replaces - with nothing (-=>). I would also use the trim filter to get rid of any trailing or leading whitespace. I don't know what your autocomplete_index analyzer looks like, so I just used a keyword one.
Testing the analyzer with GET /my_index/_analyze?analyzer=autocomplete_search&text= 0123-34742-000 results in:
"tokens": [
{
"token": "012334742000",
"start_offset": 0,
"end_offset": 17,
"type": "word",
"position": 1
}
]
which means it does eliminate the - and the whitespace.
And the typical query would be:
{
"query": {
"match": {
"ucn.ucn_autoc": " 0123-34742-000 "
}
}
}
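Putting it together, an end-to-end sketch (my_index, the test type, and the document value are assumptions carried over from the _analyze test above):
curl -XPUT 'localhost:9200/my_index/test/1?refresh=true' -d '{
  "ucn": "008392342000"
}'
curl -XGET 'localhost:9200/my_index/_search' -d '{
  "query": {
    "match": {
      "ucn.ucn_autoc": " 008392342-000 "
    }
  }
}'
The autocomplete_search analyzer strips the hyphen via the char_filter and trims the surrounding spaces, so the query term ends up identical to the keyword token produced at index time and the document matches.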