Elasticsearch is not sorting the results

I'm having a problem with an Elasticsearch query.
I want to be able to sort the results, but Elasticsearch is ignoring the sort tag. Here is my query:
{
  "sort": [
    { "title": { "order": "desc" } }
  ],
  "query": {
    "term": { "title": "pagos" }
  }
}
However, when I remove the query part and send only the sort tag, it works.
Can anyone point me to the correct way?
I also tried with the following query, which is the complete query that I have:
{
  "sort": [
    { "title": { "order": "asc" } }
  ],
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "Pagos",
              "boost": 9
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "Pagos",
              "boost": 5
            }
          }
        },
        {
          "match": {
            "keywords": {
              "query": "Pagos",
              "boost": 3
            }
          }
        },
        {
          "match": {
            "owner": {
              "query": "Pagos",
              "boost": 2
            }
          }
        }
      ]
    }
  }
}
Settings
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 15,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "asciifolding"]
        },
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
Mappings
{
  "objects": {
    "properties": {
      "id":          { "type": "string", "index": "not_analyzed" },
      "type":        { "type": "string" },
      "title":       { "type": "string", "boost": 9, "analyzer": "autocomplete", "search_analyzer": "standard" },
      "owner":       { "type": "string", "boost": 2 },
      "description": { "type": "string", "boost": 4 },
      "keywords":    { "type": "string", "boost": 1 }
    }
  }
}
Thanks in advance!

The "title" field in your document is an analyzed string field, which makes it effectively multivalued: Elasticsearch splits the contents of the field into tokens and stores them separately in the index.
You probably want to sort the "title" field alphabetically on the first term, then on the second term, and so forth, but Elasticsearch doesn't have this information at its disposal at sort time.
You can therefore change the mapping of the "title" field from:
{
  "title": {
    "type": "string", "boost": 9, "analyzer": "autocomplete", "search_analyzer": "standard"
  }
}
into a multifield mapping like this:
{
  "title": {
    "type": "string", "boost": 9, "analyzer": "autocomplete", "search_analyzer": "standard",
    "fields": {
      "raw": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
Now execute your search against the analyzed "title" field and sort on the not_analyzed "title.raw" field:
{
  "sort": [
    { "title.raw": { "order": "desc" } }
  ],
  "query": {
    "term": { "title": "pagos" }
  }
}
It is beautifully explained here: String Sorting and Multifields
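To see why the sort looks broken, it helps to simulate what the analyzed field actually gives Elasticsearch to sort on. A rough Python sketch (the titles and the one-line "analyzer" are made up for illustration; Elasticsearch, by default, sorts a multi-valued field on its highest token for desc and its lowest for asc):

```python
def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase + whitespace split."""
    return text.lower().split()

# Hypothetical document titles.
titles = ["Pagos", "Aviso zulu"]

# Sorting on a not_analyzed (raw) value behaves as a user expects:
raw_desc = sorted(titles, key=str.lower, reverse=True)

# Sorting on the analyzed field compares one token per document instead
# (here: the highest token, mimicking the default "desc" sort mode):
token_desc = sorted(titles, key=lambda t: max(analyze(t)), reverse=True)

print(raw_desc)    # whole-title alphabetical order
print(token_desc)  # ordered by each title's highest token -- surprising
```

Here `"Aviso zulu"` jumps ahead of `"Pagos"` under the token-based sort because its token `zulu` outranks `pagos`, which is exactly the kind of "ignored sort" behavior described in the question.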

Related

Elasticsearch returning unexpected results

I used the following mapping.
I have modified the english analyzer to use an ngram filter as follows, so that I can search under the following scenarios:
1] partial search and special-character search
2] taking advantage of the language analyzers
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_ngram": {
          "type": "custom",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer",
            "ngram_filter"
          ],
          "tokenizer": "whitespace"
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 25
        }
      }
    }
  },
  "mappings": {
    "movie": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "en": {
              "type": "string",
              "analyzer": "english_ngram"
            }
          }
        }
      }
    }
  }
}
Indexed my data as follows:
PUT http://localhost:9200/movies/movie/1
{
"title" : "$peci#l movie"
}
Query as follows:
{
  "query": {
    "multi_match": {
      "query": "$peci#44 m11ov",
      "fields": ["title.en"],
      "operator": "and",
      "type": "most_fields",
      "minimum_should_match": "75%"
    }
  }
}
In the query I am searching for the string "$peci#44 m11ov"; ideally I should not get results for this.
Is anything wrong here?
This is a result of ngram tokenization. When you tokenize the string $peci#l movie, your analyzer produces tokens like $, $p, $pe, and so on. Your query also produces most of these tokens, although these partial matches will have a lower score than a complete match. If it's critical for you to exclude these false positives, you can try to set a threshold using the min_score option: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-min-score.html
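To make the overlap concrete, here is a rough Python stand-in for the edge_ngram filter from the mapping (min_gram 1, max_gram 25), ignoring the stemming and stop filters in the real analyzer. It shows how many tokens the indexed title and the "wrong" query actually share:

```python
def edge_ngrams(text, min_gram=1, max_gram=25):
    """Simplified edge_ngram filter: every prefix of each whitespace token."""
    tokens = set()
    for word in text.lower().split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.add(word[:n])
    return tokens

indexed = edge_ngrams("$peci#l movie")   # the stored document title
query = edge_ngrams("$peci#44 m11ov")    # the query string

# The two strings share several prefix tokens, so the document matches
# even though neither full word matches.
print(sorted(indexed & query))
```

With `minimum_should_match: 75%` counted against these tiny tokens, enough of them overlap (`$`, `$p`, ..., `$peci#`, `m`) for the document to qualify as a hit.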

Elastic Search FVH highlights the smallest matching token

Settings:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "idx_analyzer_ngram": {
          "type": "custom",
          "filter": [
            "lowercase",
            "asciifolding",
            "edgengram_filter_1_32"
          ],
          "tokenizer": "ngram_alltokenchar_tokenizer_1_32"
        },
        "ngrm_srch_analyzer": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
      },
      "tokenizer": {
        "ngram_alltokenchar_tokenizer_1_32": {
          "token_chars": [
            "letter",
            "whitespace",
            "punctuation",
            "symbol",
            "digit"
          ],
          "min_gram": "1",
          "type": "nGram",
          "max_gram": "32"
        }
      }
    }
  }
}
Mappings:
{
  "properties": {
    "TITLE": {
      "type": "string",
      "fields": {
        "untouched": {
          "index": "not_analyzed",
          "type": "string"
        },
        "ngramanalyzed": {
          "search_analyzer": "ngrm_srch_analyzer",
          "index_analyzer": "idx_analyzer_ngram",
          "type": "string",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}
Query:
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "have some ha",
          "fields": [
            "TITLE.ngramanalyzed"
          ],
          "default_operator": "and"
        }
      }
    }
  },
  "highlight": {
    "fields": {
      "TITLE.ngramanalyzed": {}
    }
  }
}
I have document indexed with TITLE have some happy meal. When I search have some, I am able to get proper highlights.
<em>have</em> <em>some</em> happy meal
As I type more, have some ha, the highlight results are not as expected:
<em>ha</em>ve <em>some</em> <em>ha</em>ppy meal
The word have gets only partially highlighted, as ha.
I would expect the highlighter to prefer the longest matching token: with ngrams of min size = 1, I get a highlight of one or more characters even when there is another matching token of 4 or 5 characters (for example, have should also be highlighted along with ha).
I am not able to find any solution for this. Please suggest.

Highlight with fuzziness and ngram

I guess the title of the topic spoils it enough :D
I use edge_ngram and highlighting to build an autocomplete search. I have added fuzziness to the query to allow users to misspell their search, but it breaks the highlighting a bit.
When I write Sport, this is what I get:
<em>Spor</em>t
<em>Spor</em>t mécanique
<em>Spor</em>t nautique
I guess it's because it matches with the token spor generated by the ngram tokenizer.
The query:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "sport",
              "operator": "and",
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match_phrase_prefix": {
            "name.raw": {
              "query": "sport"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "term_vector": "with_positions_offsets"
      }
    }
  }
}
And the mapping:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "partialAnalyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": ["asciifolding", "lowercase"]
        },
        "keywordAnalyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["asciifolding", "lowercase"]
        },
        "searchAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase"]
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "15",
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  },
  "mappings": {
    "place": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "partialAnalyzer",
          "search_analyzer": "searchAnalyzer",
          "term_vector": "with_positions_offsets",
          "fields": {
            "raw": {
              "type": "string",
              "analyzer": "keywordAnalyzer"
            }
          }
        }
      }
    }
  }
}
I tried adding a new match clause without fuzziness to the query, hoping to match the exact keyword before the fuzzy match, but it changed nothing:
"match": {
  "name": {
    "query": "sport",
    "operator": "and"
  }
}
Any idea how I can handle this?
Regards, Raphaël
You could do that with highlight_query, I guess.
Try this in your highlight section:
"highlight": {
  "fields": {
    "name": {
      "term_vector": "with_positions_offsets",
      "highlight_query": {
        "match": {
          "name.raw": {
            "query": "spotr",
            "fuzziness": 2
          }
        }
      }
    }
  }
}
I hope it helps.
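For intuition about why "fuzziness": 2 lets the misspelling spotr still match (and highlight) sport: the two terms are within edit distance 2 of each other. A plain Levenshtein distance is enough to see it; note that Elasticsearch's fuzzy matching actually uses a Damerau-Levenshtein variant, where the tr/rt transposition counts as a single edit.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("spotr", "sport"))  # 2 -- within the allowed fuzziness
```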

Trying to form an Elasticsearch query for autocomplete

I've read a lot and it seems that using EdgeNGrams is a good way to go for implementing an autocomplete feature for search applications. I've already configured the EdgeNGrams in my settings for my index.
PUT /bigtestindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "standard", "stop", "kstem", "ngram" ]
        }
      },
      "filter": {
        "edgengram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "highlight": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"],
        "fields": {
          "title.autocomplete": {
            "number_of_fragments": 1,
            "fragment_size": 250
          }
        }
      }
    }
  }
}
So if I have the EdgeNGram filter configured in my settings, how do I add it to the search query?
What I have so far is a match query with highlighting:
GET /bigtestindex/doc/_search
{
"query": {
"match": {
"content": {
"query": "thing and another thing",
"operator": "and"
}
}
},
"highlight": {
"pre_tags" : ["<em>"],
"post_tags" : ["</em>"],
"field": {
"_source.content": {
"number_of_fragments": 1,
"fragment_size": 250
}
}
}
}
How would I add autocomplete to the search query using EdgeNGrams configured in the settings for the index?
UPDATE
For the mapping, would it be ideal to do something like this:
"title": {
  "type": "string",
  "index_analyzer": "autocomplete",
  "search_analyzer": "standard"
},
Or do I need to use multi_field type:
"title": {
  "type": "multi_field",
  "fields": {
    "title": {
      "type": "string"
    },
    "autocomplete": {
      "analyzer": "autocomplete",
      "type": "string",
      "index": "not_analyzed"
    }
  }
},
I'm using ES 1.4.1 and want to use the title field for autocomplete purposes. Which approach is right?
Short answer: you need to use it in a field mapping. As in:
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "stop",
            "kstem",
            "ngram"
          ]
        }
      },
      "filter": {
        "edgengram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field1": {
          "type": "string",
          "index_analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
For a bit more discussion, see:
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
and
http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch
Also, I don't think you want the "highlight" section in your index definition; that belongs in the query.
EDIT: Upon trying out your code, there are a couple of problems with it. One was the highlight issue I already mentioned. Another is that you named your filter "edgengram", even though it is of type "ngram" rather than type "edgeNGram", but then you referenced the filter "ngram" in your analyzer, which will use the default ngram filter, which probably doesn't give you what you want. (Hint: you can use term vectors to figure out what your analyzer is doing to your documents; you probably want to turn them off in production, though.)
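The naming mixup matters because the two filter types emit quite different tokens. A quick Python sketch of the distinction (simplified, illustrative parameters matching the min_gram 2 / max_gram 15 in the question): an ngram filter emits substrings starting at every position, while an edgeNGram filter only emits prefixes, which is usually what autocomplete wants.

```python
def ngram(word, lo=2, hi=15):
    """Simplified ngram filter: substrings of every length at every offset."""
    return [word[i:i + n]
            for n in range(lo, min(hi, len(word)) + 1)
            for i in range(len(word) - n + 1)]

def edge_ngram(word, lo=2, hi=15):
    """Simplified edgeNGram filter: prefixes only."""
    return [word[:n] for n in range(lo, min(hi, len(word)) + 1)]

print(ngram("hello"))       # includes 'el', 'll', 'llo', ... from mid-word
print(edge_ngram("hello"))  # only 'he', 'hel', 'hell', 'hello'
```

Mid-word grams like 'el' are why the plain ngram filter can surface matches that feel irrelevant for an autocomplete box.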
So what you actually want is probably something like this:
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "stop",
            "kstem",
            "edgengram_filter"
          ]
        }
      },
      "filter": {
        "edgengram_filter": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "string",
          "index_analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
When I indexed these two docs:
POST test_index/doc/_bulk
{"index":{"_id":1}}
{"content":"hello world"}
{"index":{"_id":2}}
{"content":"goodbye world"}
And ran this query (there was an error in your "highlight" block as well; it should say "fields" rather than "field"):
POST /test_index/doc/_search
{
"query": {
"match": {
"content": {
"query": "good wor",
"operator": "and"
}
}
},
"highlight": {
"pre_tags": [
"<em>"
],
"post_tags": [
"</em>"
],
"fields": {
"content": {
"number_of_fragments": 1,
"fragment_size": 250
}
}
}
}
I get back this response, which seems to be what you're looking for, if I understand you correctly:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2712221,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.2712221,
        "_source": {
          "content": "goodbye world"
        },
        "highlight": {
          "content": [
            "<em>goodbye</em> <em>world</em>"
          ]
        }
      }
    ]
  }
}
Here is some code I used to test it out:
http://sense.qbox.io/gist/3092992993e0328f7c4ee80e768dd508a0bc053f

ElasticSearch filter on array of terms is filtering out all terms

In my application, users belong to a list of roles, and objects have a list of roles associated with them to determine visibility. I'm trying to create a query that ensures the user belongs to at least one of the roles required by the object.
Here is my index configuration:
{
  "settings": {
    "analysis": {
      "filter": {
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 12,
          "token_chars": []
        }
      },
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "team": {
      "dynamic": "strict",
      "properties": {
        "id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "object": {
          "type": "string",
          "index": "not_analyzed"
        },
        "roles": {
          "type": "string",
          "index": "not_analyzed"
        },
        "name": {
          "type": "string",
          "index_analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer"
        },
        "text": {
          "type": "string",
          "index_analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer"
        }
      }
    }
  }
}
Here is some sample data that I have indexed:
(verified via localhost:9200/index/_search?q=name:employee_1&pretty)
{
  "id": "lsJ17K4sgQVfd",
  "roles": ["OwnerslsJ17K21px6VX", "AdminslsJ17K21px6VX"],
  "object": "contact",
  "name": "employee_1",
  "text": "lsJ17K4sgQVfd employee_1 employee_1 employee_1#lsj17k1nysk75.com"
}
Here is my query that I am trying to execute to find that same contact:
{
  "_source": ["id", "object", "name"],
  "size": 30,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": {
            "multi_match": {
              "query": "employee_1",
              "type": "cross_fields",
              "operator": "or",
              "fields": ["name^2", "text"],
              "minimum_should_match": "50%",
              "fuzziness": "AUTO"
            }
          },
          ...,
          "minimum_should_match": 1
        }
      },
      "filter": {
        "terms": {
          "roles": [ "AdminslsJ17K21px6VX", "lsJ17K3gHCH4P" ]
        }
      }
    }
  },
  "suggest": {
    "text": "employee_1",
    "text_suggestion": {
      "term": {
        "size": 3,
        "field": "name",
        "sort": "score",
        "suggest_mode": "missing",
        "prefix_length": 1
      }
    }
  }
}
If I remove the filter clause then I get results, but as soon as I add it back everything gets filtered out again. What is the right way to express that I want the results to have at least one role in common?
The query above works as expected; the only problem was that my test case was executing before the index was fully populated. Adding a short wait before making the first query solved the problem.
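For completeness, the filter semantics the question relies on are plain set intersection: a terms filter matches a document when its "roles" array shares at least one value with the supplied list. A tiny sketch, using the role strings from the question:

```python
# Role values copied from the question's sample document and query.
doc_roles = {"OwnerslsJ17K21px6VX", "AdminslsJ17K21px6VX"}
query_roles = {"AdminslsJ17K21px6VX", "lsJ17K3gHCH4P"}

# The terms filter matches when the intersection is non-empty.
matches = bool(doc_roles & query_roles)
print(matches)  # True: "AdminslsJ17K21px6VX" appears in both
```

As for the timing issue: in tests, an explicit index refresh (the _refresh endpoint) is usually a more reliable fix than a fixed sleep, since newly indexed documents only become visible to search after a refresh.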
