ElasticSearch sort query alphabetically with score - sorting

Is it possible to have your query results sorted alphabetically via some change in the index settings or mapping?
I know that there is a "sort" option, but that removes the score, and I'd like the score to also be considered in addition to the alphabetical order of the term.
For example, if results "A" and "Z" both have a score of 2 and "C" has a score of 1, I want the order to be:
A
Z
C
Is this possible?
These are my current index settings and mapping:
{
  "users": {
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "shingle_filter": {
              "max_shingle_size": "2",
              "min_shingle_size": "2",
              "output_unigrams": "true",
              "type": "shingle"
            },
            "edgeNGram_filter": {
              "type": "nGram",
              "min_gram": "1",
              "max_gram": "20"
            }
          },
          "analyzer": {
            "autocomplete_query_analyzer": {
              "filter": [
                "standard",
                "asciifolding",
                "lowercase"
              ],
              "tokenizer": "standard"
            },
            "autocomplete_index_analyzer": {
              "filter": [
                "standard",
                "asciifolding",
                "lowercase",
                "shingle_filter",
                "edgeNGram_filter"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    }
  }
}
{
  "users": {
    "mappings": {
      "data": {
        "properties": {
          "name": {
            "type": "string",
            "analyzer": "autocomplete_index_analyzer",
            "search_analyzer": "autocomplete_query_analyzer"
          }
        }
      }
    }
  }
}
Is it possible to add something to the index settings so results are automatically sorted alphabetically?

The example in Elasticsearch's documentation shows that you can sort by multiple values, including _score. Try something like:
"sort": [
  { "date": { "order": "desc" }},
  { "_score": { "order": "desc" }}
]
I'm not sure which field you're trying to sort by, so you'll need to adjust the example above according to your needs.
Let me know if this works for you :)
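For the question as asked (score first, alphabetical as the tie-breaker), a sketch could look like the one below. Note that name.keyword assumes a keyword sub-field on name, which the mapping above does not define; sorting directly on an analyzed string field is unreliable, so you may need to add such a sub-field first:
{
  "sort": [
    { "_score": { "order": "desc" } },
    { "name.keyword": { "order": "asc" } }
  ]
}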

Related

Elasticsearch Only Searching From Start

Currently, Elasticsearch is only searching through the mapped items from the beginning of the string instead of throughout the string.
I have a custom analyzer, as well as a custom edge ngram tokenizer.
I am currently using bool queries from within JavaScript to search the index.
Index
{
  "homestead_dev_index": {
    "aliases": {},
    "mappings": {
      "elasticprojectnode": {
        "properties": {
          "archived": {
            "type": "boolean"
          },
          "id": {
            "type": "text",
            "analyzer": "full_name"
          },
          "name": {
            "type": "text",
            "analyzer": "full_name"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "homestead_dev_index",
        "creation_date": "1535439085947",
        "analysis": {
          "analyzer": {
            "full_name": {
              "filter": [
                "standard",
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "tokenizer": "mytok"
            }
          },
          "tokenizer": {
            "mytok": {
              "type": "edge_ngram",
              "min_gram": "3",
              "max_gram": "10"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "iCa7qKJVRU-_MA8sCYIAXw",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}
Query Body
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": this.searchString } },
        { "match": { "id": this.searchString } }
      ]
    }
  },
  "highlight": {
    "pre_tags": ["<b style='background-color:yellow'>"],
    "post_tags": ["</b>"],
    "fields": {
      "name": {},
      "id": {}
    }
  }
}
Example
If I have projects with the names "Road - Area 1", "Road - Area 2" and "Sub-area 5 - Road" and the user searches for "Road", only "Road - Area 1" and "Road - Area 2" will display with the word "Road" highlighted in yellow.
The code needs to pick up the final project as well.
I seem to have figured it out.
In the original description I was using the edge_ngram tokenizer when I should have been using the ngram tokenizer.
Found on: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html#_partial_word_tokenizers
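For reference, applied to the index above the fix amounts to changing only the tokenizer type (a sketch; all other settings stay the same, and the index must be recreated for the change to take effect):
"tokenizer": {
  "mytok": {
    "type": "ngram",
    "min_gram": "3",
    "max_gram": "10"
  }
}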

Elasticsearch 6.3.2 - how to search part-words using all fields on this index?

Do you have any idea how to create this search request? I need multi-match combined with partial-word searching.
{
  "shop_index": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "shop_index",
        "creation_date": "1534235625279",
        "analysis": {
          "filter": {
            "nGram_filter": {
              "token_chars": [
                "letter",
                "digit",
                "punctuation",
                "symbol"
              ],
              "min_gram": "2",
              "type": "nGram",
              "max_gram": "20"
            }
          },
          "analyzer": {
            "nGram_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding",
                "nGram_filter"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "whitespace_analyzer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "SBB9u344RVGm1QQUo-rVMg",
        "version": {
          "created": "6030299"
        }
      }
    }
  }
}
Mapping
{
  "shop_index": {
    "mappings": {
      "products": {
        "properties": {
          "html_keywords": {
            "type": "text"
          },
          "html_title": {
            "type": "text"
          },
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}
I want partial-word searches to match a phrase like "HOUSE":
typing -> "ho" -> show me "HOUSE"
typing -> "hou" -> show me "HOUSE"
typing -> "use" -> show me "HOUSE"
typing -> "se" -> show me "HOUSE"
You can achieve this by using a fuzzy query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html. Also consider combining it with slop (how far apart terms are allowed to be) and using multi_match to search in multiple fields.
Sample query:
{
  "query": {
    "multi_match": {
      "fields": ["field1", "field2"],
      "query": "hous",
      "slop": 3,
      "fuzziness": "AUTO"
    }
  }
}

Why does my Elasticsearch multi-match query look only for prefixes?

I am trying to write an Elasticsearch multi-match query (with the Java API) to create a "search-as-you-type" program. The query is applied to two fields, title and description, which are analyzed as ngrams.
My problem is, it seems that Elasticsearch tries to find only words beginning like my query. For instance, if I search for "nut", then it matches with documents featuring "nut", "nuts", "Nutella", etc, but it does not match documents featuring "walnut", which should be matched.
Here are my settings:
{
  "index": {
    "analysis": {
      "analyzer": {
        "edgeNGramAnalyzer": {
          "tokenizer": "edgeTokenizer",
          "filter": [
            "word_delimiter",
            "lowercase",
            "unique"
          ]
        }
      },
      "tokenizer": {
        "edgeTokenizer": {
          "type": "edgeNGram",
          "min_gram": "3",
          "max_gram": "8",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}
Here is the relevant part of my mapping:
{
  "content": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "edgeNGramAnalyzer",
        "fields": {
          "sort": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "edgeNGramAnalyzer",
        "fields": {
          "sort": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
And here is my query :
new MultiMatchQueryBuilder(query).field("title", 3).field("description", 1).fuzziness(0).tieBreaker(1).minimumShouldMatch("100%")
Do you have any idea what I could be doing wrong?
That's because you're using an edgeNGram tokenizer instead of an nGram one. The former only indexes prefixes, while the latter indexes prefixes, suffixes, and sub-parts of your data.
Change your analyzer definition to this instead and it should work as expected:
{
  "index": {
    "analysis": {
      "analyzer": {
        "edgeNGramAnalyzer": {
          "tokenizer": "edgeTokenizer",
          "filter": [
            "word_delimiter",
            "lowercase",
            "unique"
          ]
        }
      },
      "tokenizer": {
        "edgeTokenizer": {
          "type": "nGram", <---- change this
          "min_gram": "3",
          "max_gram": "8",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}
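One caveat worth adding (a common pattern, not part of the original answer): once the data is nGram-indexed, you usually don't want the query string chopped into ngrams as well, so a separate search_analyzer on the field can help. A sketch against the mapping above, where "standard" is only an assumed choice of query-time analyzer:
"title": {
  "type": "text",
  "analyzer": "edgeNGramAnalyzer",
  "search_analyzer": "standard"
}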

Highlight part of word with ngram and whitespace analyzers

I have an elasticsearch index with the following data:
"The A-Team" (as an example)
My index settings are:
"index": {
  "number_of_shards": "1",
  "provided_name": "tyh.tochniyot",
  "creation_date": "1481039136127",
  "analysis": {
    "analyzer": {
      "whitespace_analyzer": {
        "type": "whitespace"
      },
      "ngram_analyzer": {
        "type": "custom",
        "tokenizer": "ngram_tokenizer"
      }
    },
    "tokenizer": {
      "ngram_tokenizer": {
        "type": "ngram",
        "min_gram": "3",
        "max_gram": "7"
      }
    }
  },
When I search for:
_search
{
  "from": 0,
  "size": 20,
  "track_scores": true,
  "highlight": {
    "fields": {
      "*": {
        "fragment_size": 100,
        "number_of_fragments": 10,
        "require_field_match": false
      }
    }
  },
  "query": {
    "match": {
      "_all": {
        "query": "Tea"
      }
    }
  }
}
I expect to get the highlight result:
"highlight": {
  "field": [
    "The A-<em>Tea</em>m"
  ]
}
But I don't get any highlights at all.
The reason I am using whitespace for search and ngram for indexing is that I don't want the search phase to break up the word I am searching for; e.g., if I search for "Team" it would otherwise match "Tea", "eam", and "Team".
Thank you
The problem was that my analyzer and search analyzer were running on the _all field.
When I placed the analyzer attributes on the specific fields, the highlighting started working.
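A sketch of what that change looks like, using the two analyzers defined in the settings above (the field name title is only an assumption, since the original mapping isn't shown):
"properties": {
  "title": {
    "type": "text",
    "analyzer": "ngram_analyzer",
    "search_analyzer": "whitespace_analyzer"
  }
}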

ElasticSearch indexing so query returns contains

I've been trying to create my own index for users, where queries are matched against the "name" field.
This is my current index settings:
{
  "users": {
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "shingle_filter": {
              "max_shingle_size": "2",
              "min_shingle_size": "2",
              "output_unigrams": "true",
              "type": "shingle"
            },
            "edgeNGram_filter": {
              "type": "edgeNGram",
              "min_gram": "1",
              "max_gram": "20"
            }
          },
          "analyzer": {
            "autocomplete_query_analyzer": {
              "filter": [
                "standard",
                "asciifolding",
                "lowercase"
              ],
              "tokenizer": "standard"
            },
            "autocomplete_index_analyzer": {
              "filter": [
                "standard",
                "asciifolding",
                "lowercase",
                "shingle_filter",
                "edgeNGram_filter"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    }
  }
}
and my mapping:
{
  "users": {
    "mappings": {
      "data": {
        "properties": {
          "name": {
            "type": "string",
            "analyzer": "autocomplete_index_analyzer",
            "search_analyzer": "autocomplete_query_analyzer"
          }
        }
      }
    }
  }
}
Right now my problem is that search queries do not return results that contain the term. For example, if I have a user "David", the queries "Da", "Dav", "Davi", etc. will return the user, but searches for "vid" or "avid" will not return any results.
Is this because of some value I'm missing in the settings?
You need to use nGram instead of edgeNGram. So simply change this
"edgeNGram_filter": {
  "type": "edgeNGram",
  "min_gram": "1",
  "max_gram": "20"
}
into this
"edgeNGram_filter": {
  "type": "nGram", <--- change here
  "min_gram": "1",
  "max_gram": "20"
}
Note that you need to wipe your index, recreate it, and then populate it again.
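After reindexing, you can sanity-check the filter with the _analyze API; a sketch (with the nGram filter in place, the output for "David" should now include inner substrings such as "avi" and "vid"):
POST /users/_analyze
{
  "analyzer": "autocomplete_index_analyzer",
  "text": "David"
}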
