Elasticsearch context suggester stopwords

Is there a way to analyze a field that is passed to the context suggester?
If, say, I have this in my mapping:
mappings: {
  myitem: {
    properties: {
      title: {type: 'string'},
      content: {type: 'string'},
      user: {type: 'string', index: 'not_analyzed'},
      suggest_field: {
        type: 'completion',
        payloads: false,
        context: {
          user: {
            type: 'category',
            path: 'user'
          }
        }
      }
    }
  }
}
and I index this doc:
POST /myindex/myitem/1
{
  title: "The Post Title",
  content: ...,
  user: 123,
  suggest_field: {
    input: "The Post Title",
    context: {
      user: 123
    }
  }
}
I would like to analyze the input first, split it into separate words, run it through lowercase and stop words filters so that the context suggester actually gets
suggest_field: {
  input: ["post", "title"],
  context: {
    user: 123
  }
}
I know I can pass an array into the suggest field, but I would like to avoid lowercasing the text, splitting it, and running the stop words filter in my application before passing it to ES. If possible, I would rather have ES do this for me. I did try adding an index_analyzer to the field mapping, but that didn't seem to achieve anything.
OR, is there another way to get autocomplete suggestions for words?

Okay, so this is pretty involved, but I think it does what you want, more or less. I'm not going to explain the whole thing, because that would take quite a bit of time. However, I will say that I started with this blog post and added a stop token filter. The "title" field has sub-fields (what used to be called a multi_field) that use different analyzers, or none. The query contains a couple of terms aggregations. Also notice that the aggregation results are filtered by the match query to only return results relevant to the text query.
Here is the index setup (spend some time looking through this; if you have specific questions I will try to answer them but I encourage you to go through the blog post first):
DELETE /test_index

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        },
        "stop_filter": {
          "type": "stop"
        }
      },
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "stop_filter",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "stop_filter"
          ]
        },
        "stopword_only_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "asciifolding",
            "stop_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "string",
          "index_analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            },
            "stopword_only": {
              "type": "string",
              "analyzer": "stopword_only_analyzer"
            }
          }
        }
      }
    }
  }
}
Then I added a few docs:
PUT /test_index/_bulk
{"index": {"_index":"test_index", "_type":"doc", "_id":1}}
{"title": "The Lion King"}
{"index": {"_index":"test_index", "_type":"doc", "_id":2}}
{"title": "Beauty and the Beast"}
{"index": {"_index":"test_index", "_type":"doc", "_id":3}}
{"title": "Alladin"}
{"index": {"_index":"test_index", "_type":"doc", "_id":4}}
{"title": "The Little Mermaid"}
{"index": {"_index":"test_index", "_type":"doc", "_id":5}}
{"title": "Lady and the Tramp"}
Now I can search the documents with word prefixes if I want (or the full words, capitalized or not), and use aggregations to return both the intact titles of the matching documents, as well as intact (non-lowercased) words, minus the stopwords:
POST /test_index/_search?search_type=count
{
  "query": {
    "match": {
      "title": {
        "query": "mer king",
        "operator": "or"
      }
    }
  },
  "aggs": {
    "word_tokens": {
      "terms": { "field": "title.stopword_only" }
    },
    "intact_titles": {
      "terms": { "field": "title.raw" }
    }
  }
}
...
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "intact_titles": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "The Lion King",
          "doc_count": 1
        },
        {
          "key": "The Little Mermaid",
          "doc_count": 1
        }
      ]
    },
    "word_tokens": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "The",
          "doc_count": 2
        },
        {
          "key": "King",
          "doc_count": 1
        },
        {
          "key": "Lion",
          "doc_count": 1
        },
        {
          "key": "Little",
          "doc_count": 1
        },
        {
          "key": "Mermaid",
          "doc_count": 1
        }
      ]
    }
  }
}
Notice that "The" gets returned. This seems to be because the default _english_ stopwords only contain "the". I didn't immediately find a way around this.
Here is the code I used:
http://sense.qbox.io/gist/2fbb8a16b2cd35370f5d5944aa9ea7381544be79
Let me know if that helps you solve your problem.

You can set up an analyzer which does this for you.
If you follow the tutorial called You Complete Me, there is a section about stopwords.
There has been a change in how Elasticsearch works since that article was written: the standard analyzer no longer does stopword removal, so you need to use the stop analyzer instead.
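You can see the difference with the _analyze API (a quick sketch; the hotel name is made up, and depending on your version the text goes in the request body or a query parameter):
curl -X GET 'localhost:9200/_analyze?analyzer=standard&text=The+Hotel+Marriot'
curl -X GET 'localhost:9200/_analyze?analyzer=stop&text=The+Hotel+Marriot'
The standard analyzer keeps the token the, while the stop analyzer drops it and returns only hotel and marriot.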
The mapping (note the stop analyzer set as both index_analyzer and search_analyzer on name_suggest; this is the difference from the article):
curl -X DELETE localhost:9200/hotels
curl -X PUT localhost:9200/hotels -d '
{
  "mappings": {
    "hotel" : {
      "properties" : {
        "name" : { "type" : "string" },
        "city" : { "type" : "string" },
        "name_suggest" : {
          "type" : "completion",
          "index_analyzer" : "stop",
          "search_analyzer" : "stop",
          "preserve_position_increments": false,
          "preserve_separators": false
        }
      }
    }
  }
}'
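Index a sample hotel so the suggester below has something to return (the values are made up):
curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" : "The Marriot",
  "city" : "Berlin",
  "name_suggest" : "The Marriot"
}'
Because the stop analyzer drops "the" and preserve_position_increments is false, the leading stopword doesn't block prefix matching on m.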
Getting suggestion
curl -X POST localhost:9200/hotels/_suggest -d '
{
  "hotels" : {
    "text" : "m",
    "completion" : {
      "field" : "name_suggest"
    }
  }
}'
Hope this helps. I have spent a long time looking for this answer myself.

Related

No match on document if the search string is longer than the search field

I am looking for a title that is stored in a document as
"Police diaries : stefan zweig"
When I search "Police", I get the result. But when I search "Policeman", I do not get the result.
Here is the query:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "fields": [
              "title",
              omitted as irrelevant...
            ],
            "query": "Policeman",
            "fuzziness": "1.5",
            "prefix_length": "2"
          }
        }
      ],
      "must": {
        omitted as irrelevant...
      }
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
and here is the mapping
{
  "books": {
    "mappings": {
      "book": {
        "_all": {
          "analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer"
        },
        "properties": {
          "title": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword"
              },
              "sort": {
                "type": "text",
                "analyzer": "to order in another language, (creates a string with symbols)",
                "fielddata": true
              }
            }
          }
        }
      }
    }
  }
}
It should be noted that I have documents with a title like "some title", which get hits if I search for "someone title". I can't figure out why the police book is not showing up.

Your question has two parts:
1. You want a search for policeman to match the title containing police.
2. You want to know why "someone title" matches the "some title" document, and accordingly you expect the first search to match as well.
Let me first explain why the second query matches while the first one doesn't, and then I'll tell you how to make the first one work.
Your document containing some title creates the tokens below, which you can verify with the Analyze API:
POST /_analyze
{
  "text": "some title",
  "analyzer": "standard" --> default analyzer for text fields
}
Generated tokens:
{
  "tokens": [
    {
      "token": "some",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "title",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
Now when you search for someone title using the match query, the query text is analyzed with the same analyzer that was used on the field at index time. So it creates the two tokens someone and title, and the match query matches on the title token, which is the reason the document comes back in your search result. You can also use the Explain API to see in detail how it matches internally.
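You can see this with the same Analyze API (a quick check; the standard analyzer is what the text field uses by default):
POST /_analyze
{
  "text": "someone title",
  "analyzer": "standard"
}
This returns the tokens someone and title; the shared title token is what produces the hit.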
How to return the police title when searching for policeman
You need to make use of the synonym token filter, as shown in the example below.
Index Def
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonyms": {
          "filter": [
            "lowercase",
            "synonym_filter"
          ],
          "tokenizer": "standard"
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms" : ["policeman => police"] --> note this
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "dialog": {
        "type": "text",
        "analyzer": "synonyms"
      }
    }
  }
}
Index sample doc
{
  "dialog" : "police"
}
Search query having term policeman
{
  "query": {
    "match" : {
      "dialog" : {
        "query" : "policeman"
      }
    }
  }
}
And search result
"hits": [
{
"_index": "so_syn",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"dialog": "police" --> note source has `police` only.
}
}
]
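To double-check that the synonym fires, you can run the Analyze API against the index's analyzer (the index name so_syn is taken from the result above):
POST /so_syn/_analyze
{
  "analyzer": "synonyms",
  "text": "policeman"
}
This should return the single token police, confirming the contraction policeman => police.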

How to exclude asterisks while searching with analyzer

I need to search by an array of values, and each value can be either simple text or text with asterisks (*).
For example:
["MYULTRATEXT"]
And I have the next index (I have a really big index, so I will simplify it):
................
{
  "settings": {
    "analysis": {
      "char_filter": {
        "asterisk_remove": {
          "type": "pattern_replace",
          "pattern": "(\\d+)*(?=\\d)",
          "replacement": "1$"
        }
      },
      "analyzer": {
        "custom_search_analyzer": {
          "char_filter": [
            "asterisk_remove"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "keyword",
          "search_analyzer": "custom_search_analyzer"
        },
......................
And all the data in the index is stored with asterisks *, e.g.:
curl -X PUT "localhost:9200/locations/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "name" : "MY*ULTRA*TEXT"
}'
I need exactly the same name value returned when I search by the string MYULTRATEXT:
curl -XPOST 'localhost:9200/locations/_search?pretty' -d '
{
  "query": { "terms": { "name": ["MYULTRATEXT"] } }
}'
It should return MY*ULTRA*TEXT, but it does not work, and I can't find a workaround. Any thoughts?
I tried pattern_replace, but it seems like I am doing something wrong or missing something here. So I need to replace every * with an empty string while searching.
There appears to be a problem with the regex you provided and the replacement pattern.
I think what you want is:
"char_filter": {
"asterisk_remove": {
"type": "pattern_replace",
"pattern": "(\\w+)\\*(?=\\w)",
"replacement": "$1"
}
}
Note the following changes:
\d => \w (match word characters instead of only digits)
escape * since asterisks have a special meaning in regexes
1$ => $1 ($<GROUPNUM> is how you reference captured groups)
To see how Elasticsearch will analyze the text against an analyzer, or to check that you defined an analyzer correctly, Elasticsearch has the ANALYZE API endpoint that you can use: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
If you try this API with your current definition of custom_search_analyzer, you will find that "MY*ULTRA*TEXT" is analyzed to "MY*ULTRA*TEXT" and not "MYULTRATEXT" as you intend.
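For completeness, here is a minimal sketch (the index name is made up) that applies the corrected pattern and verifies it with the ANALYZE API:
PUT /analyzer_test
{
  "settings": {
    "analysis": {
      "char_filter": {
        "asterisk_remove": {
          "type": "pattern_replace",
          "pattern": "(\\w+)\\*(?=\\w)",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "custom_search_analyzer": {
          "type": "custom",
          "char_filter": ["asterisk_remove"],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
POST /analyzer_test/_analyze
{
  "analyzer": "custom_search_analyzer",
  "text": "MY*ULTRA*TEXT"
}
This time the output should be the single token MYULTRATEXT.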
I have a personal app that I use to more easily interact with and visualize the results of the ANALYZE API. I tried your example and you can find it here: Elasticsearch Analysis Inspector.
This might help you - your regex pattern is the issue. You want to replace all * occurrences with an empty string, and the pattern below will do the trick:
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "my_analyzer",
          "search_analyzer": "my_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "asterisk_remove": {
          "type": "pattern_replace",
          "pattern": "(?<=\\w)(\\*)(?=\\w)",
          "replacement": ""
        }
      },
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "asterisk_remove"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  }
}
Analyze query
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": ["MY*ULTRA*TEXT"]
}
Results of analyze query
{
  "tokens": [
    {
      "token": "myultratext",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    }
  ]
}
Post a document
POST my_index/doc/1
{
  "name" : "MY*ULTRA*TEXT"
}
Search query
GET my_index/_search
{
  "query": {
    "match": {
      "name": "MYULTRATEXT"
    }
  }
}
Or
GET my_index/_search
{
  "query": {
    "match": {
      "name": "myultratext"
    }
  }
}
Results search query
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "MY*ULTRA*TEXT"
        }
      }
    ]
  }
}
Hope it helps

Elastic synonym usage in aggregations

Situation:
Elastic version used: 2.3.1
I have an elastic index configured like so:
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "british,english",
            "queen,monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}
Which is great: when I query the documents using the term "english" or "queen", I get all documents matching british and monarch. But when I use a synonym term in a filter aggregation, it doesn't work. For example:
In my index I have 5 documents; 3 of them have monarch, 2 of them have queen.
POST /my_index/_search
{
  "size": 0,
  "query": {
    "match": {
      "status.synonym": {
        "query": "queen",
        "operator": "and"
      }
    }
  },
  "aggs": {
    "status_terms": {
      "terms": { "field": "status.synonym" }
    },
    "monarch_filter": {
      "filter": { "term": { "status.synonym": "monarch" } }
    }
  },
  "explain": 0
}
The result produces:
Total hits: 5 doc count (as expected, great!)
Status terms: 5 doc count for queen (as expected, great!)
Monarch filter: 0 doc count
I have tried different synonym filter configurations:
queen,monarch
queen,monarch => queen
queen,monarch => queen,monarch
But the above hasn't changed the results. I wanted to conclude that maybe synonyms work at query time only, but if the terms aggregation is working, why shouldn't the filter? Hence I think it's my synonym filter configuration that is wrong. A more extensive synonym filter example can be found here.
QUESTION:
How to use/configure synonyms in filter aggregation?
Example to replicate the case above:
1. Create and configure index:
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "wlh,wellhead=>wellwell"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}
PUT my_index/_mapping/job
{
  "properties": {
    "title": {
      "type": "string",
      "analyzer": "my_synonyms"
    }
  }
}
2. Put two documents:
PUT my_index/job/1
{
  "title": "wellhead smth else"
}
PUT my_index/job/2
{
  "title": "wlh other stuff"
}
3. Execute a search on wlh, which should return 2 documents, with a terms aggregation that should have 2 documents for wellwell and a filter that shouldn't have a 0 count:
POST my_index/_search
{
  "size": 0,
  "query": {
    "match": {
      "title": {
        "query": "wlh",
        "operator": "and"
      }
    }
  },
  "aggs": {
    "wlhAggs": {
      "terms": { "field": "title" }
    },
    "wlhFilter": {
      "filter": { "term": { "title": "wlh" } }
    }
  },
  "explain": 0
}
The results of this query are:
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "wlhAggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "wellwell",
          "doc_count": 2
        },
        {
          "key": "else",
          "doc_count": 1
        },
        {
          "key": "other",
          "doc_count": 1
        },
        {
          "key": "smth",
          "doc_count": 1
        },
        {
          "key": "stuff",
          "doc_count": 1
        }
      ]
    },
    "wlhFilter": {
      "doc_count": 0
    }
  }
}
And that's my problem: the wlhFilter should have at least 1 doc count in it.
I'm short on time, so if needed I can elaborate a bit more at a later time today/tomorrow. But the following should work:
DELETE /my_index

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "british,english",
            "queen,monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_synonyms",
          "fielddata": true
        }
      }
    }
  }
}

POST my_index/test/1
{
  "title": "the british monarch"
}

GET my_index/_search
{
  "query": {
    "match": {
      "title": "queen"
    }
  }
}

GET my_index/_search
{
  "query": {
    "match": {
      "title": "queen"
    }
  },
  "aggs": {
    "queen_filter": {
      "filter": {
        "term": {
          "title": "queen"
        }
      }
    },
    "monarch_filter": {
      "filter": {
        "term": {
          "title": "monarch"
        }
      }
    }
  }
}
Could you share the mapping you have defined for your status.synonym field?
EDIT: V2
The reason why your filter's output is 0 is that a term filter in Elasticsearch never goes through an analysis phase; it's meant for exact matches.
The token wlh in your aggregation will not be translated to wellwell, meaning that it doesn't occur in the inverted index, because during index time your wlh was translated into wellwell.
In order to achieve what you want, you will have to index the data into a separate field and adjust your filter accordingly.
You could try something like:
DELETE my_index

PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "wlh,wellhead=>wellwell"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "job": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "synonym": {
              "type": "string",
              "analyzer": "my_synonyms"
            }
          }
        }
      }
    }
  }
}

PUT my_index/job/1
{
  "title": "wellhead smth else"
}

PUT my_index/job/2
{
  "title": "wlh other stuff"
}

POST my_index/_search
{
  "size": 0,
  "query": {
    "match": {
      "title.synonym": {
        "query": "wlh",
        "operator": "and"
      }
    }
  },
  "aggs": {
    "wlhAggs": {
      "terms": {
        "field": "title.synonym"
      }
    },
    "wlhFilter": {
      "filter": {
        "term": {
          "title": "wlh"
        }
      }
    }
  }
}
Output:
{
  "aggregations": {
    "wlhAggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "wellwell",
          "doc_count": 2
        },
        {
          "key": "else",
          "doc_count": 1
        },
        {
          "key": "other",
          "doc_count": 1
        },
        {
          "key": "smth",
          "doc_count": 1
        },
        {
          "key": "stuff",
          "doc_count": 1
        }
      ]
    },
    "wlhFilter": {
      "doc_count": 1
    }
  }
}
Hope this helps!!
So with the help of Byron Voorbach below and his comments, this is my solution:
I have created a separate field which I use the synonym analyser on, as opposed to having a property field (mainfield.property).
And most importantly, the problem was that my synonyms were contracted! I had, for example, british,english => uk. Changing that to british,english,uk solved my issue, and the filter aggregation is returning the right number of documents.
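In filter terms, that means using the expanded form rather than the contracted one; a sketch of the corrected configuration:
"filter": {
  "my_synonym_filter": {
    "type": "synonym",
    "synonyms": [
      "british,english,uk" --> expansion instead of british,english => uk
    ]
  }
}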
Hope this helps someone, or at least points in the right direction.
Edit:
Oh lord, praise the documentation! I completely fixed my issue with the filters (plural!) aggregation (link here). In the filters configuration I specified a match type of query and it worked! I ended up with something like this:
"aggs" : {
"messages" : {
"filters" : {
"filters" : {
"status" : { "match" : { "cats.saurus" : "monarch" }},
"country" : { "match" : { "cats.saurus" : "british" }}
}
}
}
}
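For reference, embedded in a full request it might look like this (a sketch; cats.saurus is the field from the snippet above):
POST /my_index/_search
{
  "size": 0,
  "aggs" : {
    "messages" : {
      "filters" : {
        "filters" : {
          "status" : { "match" : { "cats.saurus" : "monarch" }},
          "country" : { "match" : { "cats.saurus" : "british" }}
        }
      }
    }
  }
}
Because match queries are analyzed, the synonym filter is applied to the aggregation input as well, unlike the term filter.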

Elasticsearch on multiple fields with partial and full matches

Our Account model has a first_name, last_name and an ssn (social security number).
I want to do partial matches on first_name and last_name, but an exact match on ssn. I have this so far:
settings analysis: {
  filter: {
    substring: {
      type: "nGram",
      min_gram: 3,
      max_gram: 50
    },
    ssn_string: {
      type: "nGram",
      min_gram: 9,
      max_gram: 9
    }
  },
  analyzer: {
    index_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    search_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    ssn_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["ssn_string"]
    }
  }
}
mapping do
  [:first_name, :last_name].each do |attribute|
    indexes attribute, type: 'string',
      index_analyzer: 'index_ngram_analyzer',
      search_analyzer: 'search_ngram_analyzer'
  end
  indexes :ssn, type: 'string', index: 'not_analyzed'
end
My search is as follows:
query: {
  multi_match: {
    fields: ["first_name", "last_name", "ssn"],
    query: query,
    type: "cross_fields",
    operator: "and"
  }
}
So this works:
Account.search("erik").records.to_a
and even (for Erik Smith):
Account.search("erik smi").records.to_a
and the ssn:
Account.search("111112222").records.to_a
but not:
Account.search("erik 111112222").records.to_a
Any idea if I am indexing or querying wrong?
Thank you for any help!
Does it have to be done with a single query string? If not, I would do something like this:
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "_all": {
        "enabled": true,
        "index_analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": {
        "first_name": {
          "type": "string",
          "include_in_all": true
        },
        "last_name": {
          "type": "string",
          "include_in_all": true
        },
        "ssn": {
          "type": "string",
          "index": "not_analyzed",
          "include_in_all": false
        }
      }
    }
  }
}
Notice the use of the _all field. I included first_name and last_name in _all, but not ssn, and ssn is not analyzed at all since I want to do exact matches against it.
I indexed a couple of documents for illustration:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"first_name":"Erik","last_name":"Smith","ssn":"111112222"}
{"index":{"_id":2}}
{"first_name":"Bob","last_name":"Jones","ssn":"123456789"}
Then I can query for the partial names, and filter by the exact ssn:
POST /test_index/doc/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "_all": {
            "query": "eri smi",
            "operator": "and"
          }
        }
      },
      "filter": {
        "term": {
          "ssn": "111112222"
        }
      }
    }
  }
}
And I get back what I'm expecting:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.8838835,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.8838835,
        "_source": {
          "first_name": "Erik",
          "last_name": "Smith",
          "ssn": "111112222"
        }
      }
    ]
  }
}
If you need to be able to do the search with a single query string (no filter), you could include ssn in the _all field as well, but with this setup it will also match on partial strings (like 111112), so that may not be what you want.
If you only want to match prefixes (i.e., search terms that match the beginnings of words), you should use edge ngrams, as sketched below.
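For example, an edge ngram version of the filter above might look like this (a sketch; the gram sizes are illustrative):
"edge_ngram_filter": {
  "type": "edge_ngram",
  "min_gram": 2,
  "max_gram": 20
}
Swapping this in for ngram_filter would let "eri" match "Erik", but would stop "rik" from matching.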
I wrote a blog post about using ngrams which might help you out a little: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch
Here is the code I used for this answer. I tried a few different things, including the setup I posted here, and another including ssn in _all, but with edge ngrams. Hope this helps:
http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f

Elasticsearch multi_field type search and sort issue

I'm having an issue with the multi_field mapping type in one of my indexes and I am not sure what the issue is. I use a very similar mapping in another index and I don't have these issues. The ES version is 0.90.12.
I have a mapping that looks like this:
{
  "settings": {
    "index": {
      "number_of_shards": 10,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "production": {
      "properties": {
        "production_title": {
          "type": "multi_field",
          "fields": {
            "production_title_edgengram": {
              "type": "string",
              "index": "analyzed",
              "index_analyzer": "autocomplete_index",
              "search_analyzer": "autocomplete_search"
            },
            "production_title": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
The .yml looks like this:
index:
  mapper:
    dynamic: true
  analysis:
    analyzer:
      autocomplete_index:
        tokenizer: keyword
        filter: ["lowercase", "autocomplete_ngram"]
      autocomplete_search:
        tokenizer: keyword
        filter: lowercase
      ngram_index:
        tokenizer: keyword
        filter: ["ngram_filter"]
      ngram_search:
        tokenizer: keyword
        filter: lowercase
    filter:
      autocomplete_ngram:
        type: edgeNGram
        min_gram: 1
        max_gram: 15
        side: front
      ngram_filter:
        type: nGram
        min_gram: 2
        max_gram: 8
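(As a sanity check, the analyzers can be queried directly; a sketch in 0.90.x style, with the text passed as a query parameter. Node-level analysis settings only take effect after a restart.)
curl -XGET 'http://localhost:9200/productionindex/_analyze?analyzer=autocomplete_index&text=Hoodoo+Love'
If the analyzer isn't registered, this returns an error instead of tokens.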
So doing this:
curl -XGET 'http://localhost:9200/productionindex/production/_search' -d '{
  "sort": [
    {
      "production_title": "asc"
    }
  ],
  "size": 1
}'
and
curl -XGET 'http://localhost:9200/productionindex/production/_search' -d '{
  "sort": [
    {
      "production_title": "desc"
    }
  ],
  "size": 1
}'
I end up with the exact same result somewhere in the middle of the alphabet:
"production_title": "IL, 'Hoodoo Love'"
However, if I do this:
{
  "query": {
    "term": {
      "production_title": "IL, 'Hoodoo Love'"
    }
  }
}
I get zero results.
Furthermore, if I do this:
{
  "query": {
    "match": {
      "production_title_edgengram": "Il"
    }
  }
}
I also get zero results.
If I don't use multi_field and I separate them out, I can then search on them fine (term and autocomplete), but I still can't sort. When indexing, I am only sending production_title to the multi_field.
Does anyone have any idea what is going on here?
Below please find the explain (last result only for brevity)
{
  "_shard": 6,
  "_node": "j-D2SYPCT0qZt1lD1RcKOg",
  "_index": "productionindex",
  "_type": "production",
  "_id": "casting_call.productiondatetime.689",
  "_score": null,
  "_source": {
    "venue_state": "WA",
    "updated_date": "2014-03-10T12:08:13.927273",
    "django_id": 689,
    "production_types": [
      69,
      87,
      89
    ],
    "production_title": "WA, 'Footloose'"
  },
  "sort": [
    null
  ],
  "_explanation": {
    "value": 1.0,
    "description": "ConstantScore(cache(_type:audition)), product of:",
    "details": [
      {
        "value": 1.0,
        "description": "boost"
      },
      {
        "value": 1.0,
        "description": "queryNorm"
      }
    ]
  }
}
from this curl:
curl -XPOST 'http://localhost:9200/productionindex/production/_search?pretty=true&explain=true' -d '{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "production_title": {
        "order": "desc"
      }
    }
  ]
}'
