Elasticsearch: sorting documents based on the fields in which the phrase is found - elasticsearch

In Elasticsearch, how do I sort documents according to which fields a phrase is found in, using the following order of fields?
Search phrase: Miami
Fields: Title, Content, Topics
A document in which the phrase is found in Title, Content, and Topics should appear before documents in which the phrase is found only in Content.
Maybe there is a way to say:
if phrase found in Title then weight 2
if phrase found in Content then weight 1.5
if phrase found in Topics then weight 1
and the final score would be sum(weight) combined with _score.
My current query looks like this:
{
  "index": "abc",
  "type": "mydocuments",
  "body": {
    "query": {
      "multi_match": {
        "query": "miami",
        "type": "phrase",
        "fields": [
          "title",
          "content",
          "topics",
          "destinations"
        ]
      }
    }
  }
}

You can boost fields with the caret (^) notation so that matches in them score higher than matches in other fields:
{
  "index": "abc",
  "type": "mydocuments",
  "body": {
    "query": {
      "multi_match": {
        "query": "miami",
        "type": "phrase",
        "fields": [
          "title^10",
          "content^3",
          "topics",
          "destinations"
        ]
      }
    }
  }
}
Here I have applied a boost of 10 to title and a boost of 3 to content; topics and destinations keep the default boost of 1. Documents are returned in decreasing _score order, so boost the fields that you consider more important. The boost values are up to you and may take some trial and error before documents come back in your preferred order.
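As a rough illustration of the two ideas in this thread, the Python sketch below only builds request bodies as dictionaries and does not talk to a cluster. The helper names and the `FIELD_WEIGHTS` table are hypothetical; the weights are the ones proposed in the question. The first helper produces the caret-boosted multi_match shown above. The second is an alternative that was not part of the answer: it wraps one match_phrase per field in a constant_score clause, so each matching field contributes exactly its weight and the bool query adds them up, which replaces the relevance score with the sum of weights rather than combining the two.

```python
import json

# Hypothetical weights taken from the question; tune them for your own data.
FIELD_WEIGHTS = {"title": 2.0, "content": 1.5, "topics": 1.0}


def boosted_multi_match(phrase, weights):
    """Build a multi_match phrase query with a caret boost on every field."""
    fields = [f"{name}^{boost:g}" for name, boost in weights.items()]
    return {
        "query": {
            "multi_match": {"query": phrase, "type": "phrase", "fields": fields}
        }
    }


def summed_weight_query(phrase, weights):
    """Build a bool/should query where each matching field adds a fixed weight."""
    clauses = [
        {
            "constant_score": {
                # The phrase match runs in filter context, so it is not scored;
                # the clause contributes only its boost when it matches.
                "filter": {"match_phrase": {name: phrase}},
                "boost": boost,
            }
        }
        for name, boost in weights.items()
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(json.dumps(boosted_multi_match("miami", FIELD_WEIGHTS), indent=2))
    print(json.dumps(summed_weight_query("miami", FIELD_WEIGHTS), indent=2))
```

Either dictionary would go under the "body" key of the request shown above. Whether the summed-weight variant orders documents the way you want depends on your Elasticsearch version and data, so treat it as a starting point.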

Related

Fuzzy score to include only best match found in query text - Elasticsearch

Let's say I have multiple words in the query text that are close to the word "Raquel", which is indexed in field1.
The problem is that the score increases as the number of terms in the query text increases.
To elaborate:
if query text is "Raquel", score is 5 (let's assume)
if query text is "Raquei", score is 4.9 (let's assume)
if query text is "Raquel Raquei Raque", the score increases to, let's say, 15. I need this score to be 5: just the best score among all the scores evaluated for the individual query terms against a specific term in a field. Is there any way I can achieve this?
Here's the query:
"query": {
  "bool": {
    "must": [{
      "multi_match": {
        "query": "Raquel Raquei Raque",
        "fields": ["filed1", "filed2"],
        "fuzziness": "AUTO",
        "minimum_should_match": "1"
      }
    }]
  }
}
Mappings for the fields used in query:
"filed1": {
  "type": "text",
  "analyzer": "standard_rebuilt",
  "index_options": "docs"
},
"filed2": {
  "type": "text",
  "analyzer": "standard_rebuilt",
  "index_options": "docs"
}
where standard_rebuilt uses a unique-word token filter.
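One possible direction, offered as an untested sketch and not as an answer from the thread: split the query text into terms, issue one fuzzy multi_match per term, and wrap them in a dis_max query with a tie_breaker of 0, so that a document's score is the best single sub-query score instead of a sum. The helper name `best_term_fuzzy_query` is hypothetical, and whether the resulting scores match the numbers assumed above depends on the analyzer and index statistics.

```python
def best_term_fuzzy_query(text, fields, fuzziness="AUTO"):
    """Build a dis_max query that keeps only the best-scoring query term."""
    # Split on whitespace and drop repeated terms while keeping their order.
    terms = list(dict.fromkeys(text.split()))
    per_term = [
        {"multi_match": {"query": term, "fields": fields, "fuzziness": fuzziness}}
        for term in terms
    ]
    # tie_breaker 0.0 means the other matching sub-queries add nothing to the score.
    return {"query": {"dis_max": {"queries": per_term, "tie_breaker": 0.0}}}


if __name__ == "__main__":
    import json

    body = best_term_fuzzy_query("Raquel Raquei Raque", ["filed1", "filed2"])
    print(json.dumps(body, indent=2))
```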

How to force certain fields in multi_match to have an exact match

I am trying to match the title of a product listing to a database of known products. My first idea was to put the known products and their metadata into elasticsearch and try to find the best match with multi_match. My current query is something like:
{
  "query": {
    "multi_match": {
      "query": "Men's small blue cotton pants SKU123",
      "fields": ["sku^2", "title", "gender", "color", "material", "size"],
      "type": "cross_fields"
    }
  }
}
The problem is that sometimes it returns products with the wrong color. Is there a way I could modify the above query to only score items in my index whose color field equals a word that exists in the query string? I am using Elasticsearch 5.1.
If you want Elasticsearch to score only items that meet certain criteria, then you need to use the terms query in a filter context.
Since the terms query does not analyze its input, you'll have to do that yourself. Something simple would be to tokenize by whitespace, lowercase the tokens, and generate a query that looks like this:
{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "color": ["men's", "small", "blue", "cotton", "pants", "sku123"]
        }
      },
      "must": {
        "multi_match": {
          "query": "Men's small blue cotton pants SKU123",
          "fields": [
            "sku^2",
            "title",
            "gender",
            "material",
            "size"
          ],
          "type": "cross_fields"
        }
      }
    }
  }
}
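The tokenize-and-lowercase step described above could be done client-side along these lines. This is a minimal sketch: the function name `color_filtered_query` is hypothetical, and it assumes the color field is indexed so that each color is a single lowercase term.

```python
import json


def color_filtered_query(text, color_field="color"):
    """Build the bool query above from a raw product-listing title."""
    # Whitespace tokenization plus lowercasing, as suggested in the answer.
    tokens = [token.lower() for token in text.split()]
    return {
        "query": {
            "bool": {
                # Filter context: restricts the candidates without affecting _score.
                "filter": {"terms": {color_field: tokens}},
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": ["sku^2", "title", "gender", "material", "size"],
                        "type": "cross_fields",
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = color_filtered_query("Men's small blue cotton pants SKU123")
    print(json.dumps(body, indent=2))
```

Punctuation is left attached to the tokens here (for example "men's"); if the color values in the index are analyzed differently, the tokenization would need to mirror that.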

How to add fuzziness to search query in elasticsearch?

I'm trying to implement fuzziness on a particular field in a cross-fields query. It's a bit difficult though.
So the query should:
Match phrases across fields.
Match an exact match against partNumber and barcode (no fuzziness)
Match fuzzy terms against title and subtitle.
The query that I have so far is below. Note that the fuzziness isn't working at all.
This should match one result, which has "Amazing t-Shirt" in the title and "Blue" in the subtitle (note the spelling error in the query).
Is it possible to implement the fuzziness at the index mapping level instead? Title and subtitle are quite short in the data set - maybe 30 - 40 characters combined maximum.
Otherwise how can I add fuzziness to the title and subtitle in the query?
{
  "query": {
    "multi_match": {
      "query": "Bleu Amazing T-Shirt",
      "fuzziness": "auto",
      "operator": "and",
      "fields": [
        "identity.partNumber^4",
        "identity.altIdentifier^4",
        "identity.barcode",
        "identity.mpn",
        "identity.ppn",
        "descriptions.title",
        "descriptions.subtitle"
      ],
      "type": "cross_fields"
    }
  },
  "fields": [
    "identity.partNumber",
    "identity.barcode",
    "identity.ppn",
    "descriptions.title",
    "descriptions.subtitle"
  ]
}
Fuzzy search doesn't seem to be supported with cross_fields; there were a few related issues. So instead of a cross_fields search, I copied the title and subtitle to a new field at index time and split the query as shown below. It seems to work for my test cases, at least.
"query": {
  "bool": {
    "should": [
      {
        "multi_match": {
          "query": "{{searchTerm}}",
          "operator": "and",
          "fields": [
            "identity.partNumber^4",
            "identity.altIdentifier^4",
            "identity.barcode",
            "identity.mpn",
            "identity.ppn"
          ],
          "type": "best_fields"
        }
      },
      {
        "match": {
          "fuzzyFields": {
            "query": "{{searchTerm}}",
            "operator": "and",
            "fuzziness": "auto"
          }
        }
      }
    ]
  }
}
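The answer does not show how the combined field was populated. One way to do it, sketched here as an assumption and not as the author's actual mapping, is the copy_to mapping parameter, which copies a field's value into another field at index time. The helper name is hypothetical, `fuzzyFields` is the field name used in the query above, and the field type is a parameter because "text" only exists in Elasticsearch 5.x and later (older versions use "string").

```python
import json


def fuzzy_copy_mapping(target="fuzzyFields", field_type="text"):
    """Build mapping properties that copy title and subtitle into one field."""
    return {
        "properties": {
            "descriptions": {
                "properties": {
                    # copy_to duplicates the value into the target field at index time.
                    "title": {"type": field_type, "copy_to": target},
                    "subtitle": {"type": field_type, "copy_to": target},
                }
            },
            # The combined field that the fuzzy match query searches.
            target: {"type": field_type},
        }
    }


if __name__ == "__main__":
    print(json.dumps(fuzzy_copy_mapping(), indent=2))
```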

elasticsearch scoring unique terms vs ngram terms

I've figured out how to return results for a partial word using ngrams, but now I'd like to arrange (score or sort) my results so that whole-term matches come first and partial-term matches second.
For example, the user searches a movie db for 'we'. I want 'we are marshall' and similar titles to show up at the top, and not 'north by northwest' (the 'we' is in 'northwest').
Currently, this is my mapping for the title field:
"title": {
  "type": "string",
  "analyzer": "ngramAnalyer",
  "fields": {
    "term": {
      "type": "string",
      "analyzer": "fullTermCaseInsensitive"
    },
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
I've created a multifield where ngramAnalyzer is a custom ngram analyzer, term uses a keyword tokenizer with a standard filter, and raw is not_analyzed.
My query is as follows:
"query": {
  "function_score": {
    "functions": [
      {
        "script_score": {
          "script": "_score * (1+ (1 / doc['salesrank'].value) )"
        }
      }
    ],
    "query": {
      "bool": {
        "must": [
          {
            "match_phrase": {
              "title": {
                "query": "we",
                "max_expansions": 10
              }
            }
          }
        ],
        "should": {
          "term": {
            "title.term": {
              "value": "we",
              "boost": 10
            }
          }
        }
      }
    }
  }
}
I'm basically requiring that the ngram must match, and that the term 'we' should match and, if it does, be boosted.
This isn't working, of course.
Any ideas?
Edit
To add further complexity: how would I match first on the exact title, then on a custom score?
I've taken some stabs at it, but nothing seems to work.
For example:
input: 'game'
results should be ordered by exact match 'game'
followed by a custom score based on a sales rank (integer)
so that the next results after 'game' might be something like 'hunger games'
What about a bool query that combines two boosted clauses, where the first matches the full term with a 10x boost factor and the other matches the ngram field with the standard boost factor?
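A sketch of that suggestion, with some assumptions spelled out: the helper name and the `exact_field` parameter are hypothetical, and the default reuses the title.term subfield from the question. Because that subfield uses a keyword tokenizer, it only matches when the whole title equals the input (the 'game' case in the edit); ranking 'we are marshall' above 'north by northwest' would need a subfield that indexes whole words instead.

```python
import json


def exact_then_ngram_query(text, exact_field="title.term", ngram_field="title",
                           exact_boost=10.0):
    """Build a bool query that boosts full-term matches over ngram matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Full-term clause: weighted by exact_boost when it matches.
                    {"match": {exact_field: {"query": text, "boost": exact_boost}}},
                    # Ngram clause: keeps the default boost.
                    {"match": {ngram_field: {"query": text}}},
                ],
                # A document must match at least one of the two clauses.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(exact_then_ngram_query("we"), indent=2))
```

This bool query could then take the place of the inner "query" of the function_score above, so that the sales-rank script still adjusts the final score.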

OR query with elasticsearch

I have an index with "name" and "description" fields. I am running a Boolean query against my index. Sometimes the term is present in both the name and description fields; in that case, the documents in which both fields contain the search term score higher than the ones in which only the name or only the description contains it.
What I want is for them to score equally, so that a document with the term in either name or description has the same score as a document with the term in both.
Is it possible?
Here is the example:
{
  "name": "xyz",
  "description": "abc xyz"
},
{
  "name": "abc",
  "description": "xyz pqr"
},
{
  "name": "xyz",
  "description": "abc pqr"
}
If the user searches for the term "xyz", I want all three documents above to have the same score, since every document contains "xyz" in name, in description, or in both.
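You can use a filtered query for this, since filters are not scored. (The filtered query belongs to Elasticsearch 1.x; it was deprecated in 2.0 and removed in 5.0, where a bool query with a filter clause does the same job.) See the query below for searching the term "xyz":
POST <index name>/<type>/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "should": [
            {
              "term": {
                "name": "xyz"
              }
            },
            {
              "term": {
                "description": "xyz"
              }
            }
          ]
        }
      }
    }
  }
}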
I think you can either:
transform your query into a filter. Filters do not affect the score (and are faster than queries)
or wrap your query in a "Constant score query" - see: http://www.elasticsearch.org/guide/reference/query-dsl/constant-score-query/
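The constant score option could look roughly like the request body built below. This is a sketch: the helper name is hypothetical, and it uses the "filter" key of constant_score, which should be checked against the version you run. Every document that contains the term in at least one of the fields receives the same score, equal to the boost.

```python
import json


def same_score_query(term, fields=("name", "description"), boost=1.0):
    """Build a constant_score query that matches the term in any given field."""
    should = [{"term": {field: term}} for field in fields]
    return {
        "query": {
            "constant_score": {
                # A bool with only should clauses requires at least one to match.
                "filter": {"bool": {"should": should}},
                "boost": boost,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(same_score_query("xyz"), indent=2))
```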
