How can I score Elasticsearch matches for particular field names higher when using a full text search on _all?

I've set up an index that has many types representing user data, such as ShoppingList, Playlist, etc. Each type has an "identity_id" field holding the user's unique identifier. I use the following query to search across all types and fields for a user (for a search function on a website):
GET _search
{
"query": {
"filtered": {
"query": {
"match_phrase_prefix": {
"_all": "awesome"
}
},
"filter": {
"match": {
"identity_id": 1
}
}
}
}
}
My questions are:
Is there a way to give a higher score to matches on fields that have "name" in the field name? For example, the ShoppingList type will have a shopping_list_name field, and I want a match on that field to score higher than a match on its other fields.
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?

How about this query that boosts certain fields:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name",
"field*"
]
}
},
"functions": [
{
"weight": 2,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name"
]
}
}
},
{
"weight": 1,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"field*"
]
}
}
}
]
}
}
}
The query above boosts (weight: 2) matches on the *_name fields and applies no boosting to the fields called field*.
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?
Regarding this question: that's more complicated, and you also need to consider how many users you have, the hardware resources the cluster has, the structure of the data, the queries used, etc.
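The answer's query leaves out the identity_id filter from the question. As a rough sketch (not part of the original answer), the function_score query could be wrapped in a bool query with a term filter; the filtered query used in the question was deprecated in Elasticsearch 2.0 and removed in 5.0. The helper name build_user_search is hypothetical, and Python is used only to make the structure easy to check. The weight 1 function is left out here on the assumption that multiplying by 1 leaves the score unchanged.

```python
import json


def build_user_search(text, identity_id):
    """Build a search body that boosts *_name matches and filters by user.

    Hypothetical helper: the field patterns come from the answer above,
    the bool/filter wrapper is an assumption for Elasticsearch 2.0+.
    """
    # Matches on any *_name field get their score multiplied by 2.
    name_match = {"multi_match": {"query": text, "fields": ["*_name"]}}
    # The main query searches both the name fields and the other fields.
    any_match = {"multi_match": {"query": text, "fields": ["*_name", "field*"]}}
    scored = {
        "function_score": {
            "query": any_match,
            "functions": [{"weight": 2, "filter": name_match}],
        }
    }
    return {
        "query": {
            "bool": {
                "must": scored,
                # Filter context: restricts to one user without affecting scores.
                "filter": {"term": {"identity_id": identity_id}},
            }
        }
    }


body = build_user_search("awesome", 1)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET _search.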

Related

Elasticsearch Multi match and exact matches

My knowledge of Elasticsearch is a bit limited, so what I want to do might not even be possible.
Say I have an e-commerce site where I want to search freely on the article names and other fields, but I also want to search on exact article codes. Is this possible in the same query?
Example:
{
"articlecode": "v400",
"name": "Earplugs for humans"
}
{
"articlecode": "b6655",
"name": "Hammer 400"
}
So can a query be written that combines both multi_match and terms, so that if I search for '400' I get two results, but if I search for 'v400' I get just one result because it is an exact match on the "articlecode" field?
Below is our current query, where I have an ngram on the "name" field and use a term query on the language field.
{
"size": 10,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "v400",
"fields": [
"articlecode^10",
"name^7"
]
}
},
"filter": {
"term": {
"IdLang.keyword": "sv"
}
}
}
}
}
Have you considered using query_string instead of multi_match? Then you can use wildcards in your search:
{
"size": 10,
"query": {
"bool": {
"must": {
"query_string": {
"query": "*v400",
"fields": [
"articlecode^10",
"name^7"
]
}
}
}
}
}
If you want to match 400 anywhere in the two fields, you can use *400*, or use only a leading or only a trailing wildcard, depending on what you want.
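To make the three wildcard variants concrete, here is a minimal sketch that builds the query_string body for each one. The function names and the mode labels are made up for illustration, and the term is not escaped, so it assumes the input contains no query_string reserved characters. Keep in mind that leading wildcards can be slow on large indices.

```python
def wildcard_pattern(term, mode="contains"):
    """Wrap a term in query_string wildcards.

    mode (hypothetical labels):
      "contains" -> *term*   (term anywhere in the value)
      "suffix"   -> *term    (value ends with term)
      "prefix"   -> term*    (value starts with term)
    """
    patterns = {"contains": "*{}*", "suffix": "*{}", "prefix": "{}*"}
    if mode not in patterns:
        raise ValueError("unknown mode: " + mode)
    return patterns[mode].format(term)


def build_wildcard_search(term, mode="contains", size=10):
    """Build the request body from the answer with the chosen wildcard form."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "query_string": {
                        "query": wildcard_pattern(term, mode),
                        "fields": ["articlecode^10", "name^7"],
                    }
                }
            }
        },
    }


print(build_wildcard_search("400")["query"]["bool"]["must"]["query_string"]["query"])
```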

Filter results from Elasticsearch if only a specific field matches

I'm using the following query for searching across multiple fields:
{
"query": {
"multi_match": {
"query": "italian sports car",
"fields": ["car_name", "car_brand", "car_description", "car_country"],
"type": "most_fields"
}
}
}
In this example, I'm looking for sports cars made in Italy (hence the car_country field). However, this will return all the cars made in Italy even if they are not sports cars. I want car_country to be just an auxiliary search field, so I don't want hits when the only matched field is car_country. Is this possible? I know I can set a lower score for that field, but I want hits with only this matching field to be completely ignored.
There are different ways to handle this problem, depending on the scoring and other behavior you require from your results. For instance:
Use a bool query with two parts:
Must query - include queries that must match for the document to be in the result set.
Should query - include queries that should match (and affect scoring) but do not decide whether a document is in the result set.
Add the multi_match query without the car_country field to the must clause, and a match query on the car_country field to the should clause.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "italian sports car",
"fields": [
"car_name",
"car_brand",
"car_description"
],
"type": "most_fields"
}
}
],
"should": [
{
"match": {
"car_country": {
"query": "italian sports car"
}
}
}
]
}
}
}
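If the same pattern is needed for other field sets, the split between required and auxiliary fields can be generated. This is only a sketch; build_aux_field_search and its parameter names are hypothetical and not part of any Elasticsearch client.

```python
def build_aux_field_search(text, required_fields, aux_field):
    """Build a bool query where only required_fields decide whether a document matches.

    aux_field sits in "should", so it can raise the score of a hit
    but cannot produce a hit on its own while a "must" clause is present.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": list(required_fields),
                            "type": "most_fields",
                        }
                    }
                ],
                "should": [{"match": {aux_field: {"query": text}}}],
            }
        }
    }


body = build_aux_field_search(
    "italian sports car",
    ["car_name", "car_brand", "car_description"],
    "car_country",
)
```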

Elastic Search Query (a like x and y) or (b like x and y)

Some background info: In the example below, the user searched for "HTML CSS". I split the search string into words and created the SQL query shown below.
Now I am trying to make an elastic search query that has the same logic as the following SQL query:
SELECT
title, description
FROM `classes`
WHERE
(`title` LIKE '%html%' AND `title` LIKE '%css%') OR
(description LIKE '%html%' AND description LIKE '%css%')
Currently I'm halfway there, but I can't seem to get it right yet.
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "html"
}
},
{
"term": {
"title": "css"
}
}
]
}
},
"_source": [
"title"
],
"size": 30
}
Now I need to find out how to add the following logic:
OR (description LIKE '%html%' AND description LIKE '%css%')
One important point is that I only want to fetch documents that have both words in either the title or the description. I don't want to fetch documents that have only one of the words.
I will update questions as I find more info.
Update: The chosen answer also provides a way to boost scoring based on the field.
Can you try the following query? You can use should to express the OR operation.
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": { // Go for term if your field is not analyzed
"title": {
"query": "html css",
"operator": "and",
"boost" : 2
}
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"description": {
"query": "html css",
"operator": "and"
}
}
}
]
}
}
],
"minimum_should_match": 1
}
},
"_source": [
"title",
"description"
]
}
Hope this helps!!
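The same "all words in title OR all words in description" shape extends to any number of fields. Below is a small sketch that generates one match clause per field; the helper name and the boosts argument are assumptions for illustration. It drops the inner bool/must wrapper, since each should entry holds only a single match clause.

```python
def all_terms_in_any_field(text, fields, boosts=None):
    """Match documents where at least one field contains every word of text.

    boosts is an optional {field: boost} mapping (hypothetical parameter).
    """
    boosts = boosts or {}
    should = []
    for field in fields:
        # "operator": "and" requires every analyzed term to be present in this field.
        clause = {"query": text, "operator": "and"}
        if field in boosts:
            clause["boost"] = boosts[field]
        should.append({"match": {field: clause}})
    return {
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        "_source": list(fields),
    }


body = all_terms_in_any_field("html css", ["title", "description"], {"title": 2})
```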
I feel the most appropriate query in this case is multi_match.
The multi_match query is a convenient way of running the same query on
multiple fields.
So your query can be written as:
GET /_search
{
"_source": ["title", "description"],
"query": {
"multi_match": {
"query": "html css",
"fields": ["title^2", "description"],
"operator":"and"
}
}
}
_source limits each returned hit so that only the fields listed in the array
are shown in the results.
^2 boosts the title field by a factor of 2.
operator: and makes sure that all terms in the query must match
in one of the fields.
From the Elasticsearch 5.2 docs:
One option is to use the nested datatype instead of the object datatype.
More details here: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/nested.html
Hope this helps

Search in every field with a fixed parameter

This is perhaps a basic question: I need to search in every indexed field while requiring a specific fixed value for another field.
How can I do it?
Currently I have a simple: query( "aValue", array_of_models )
I tried many options without success, for example:
query({
"query": {
"bool": {
"query": "aValue",
"filter": {
"term": {
"published": "true"
}
}
}
}
})
I would prefer not to specify the fields to search in, because I use the same search params for different models.
I found a solution; perhaps it's not optimized, but it works:
{
"query": {
"bool": {
"should": [
{
"match": {
"_all": "aValue"
}
}
],
"filter": {
"term": {
"published": true
}
}
}
}
}
I'm not sure I understood your intention correctly.
The _all field is enabled by default (in Elasticsearch versions before 6.0). So if you have no special mapping, every indexed field value is added as a text string to the _all field.
You can use either of these:
Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
Simple Query String Query, https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
A simple query like this should work for you:
GET my_index/_search
{
"query": {
"simple_query_string": {
"query": "aValue",
"fields": []
}
}
}
Both query types have parameters that should cover your use case, IMHO.
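The query above does not include the fixed published value from the question. One possible way to combine the two, as a sketch only: put simple_query_string in the must clause of a bool query and the term in its filter clause. The helper name is hypothetical, and leaving out "fields" relies on the index's default search field, whose behavior depends on the Elasticsearch version and index settings.

```python
import json


def build_published_search(text, published=True):
    """Search all default fields for text, restricted to a fixed published value.

    Assumption: no "fields" list is given, so Elasticsearch falls back
    to the index's default field setting.
    """
    return {
        "query": {
            "bool": {
                "must": {"simple_query_string": {"query": text}},
                # A real boolean, as in the asker's working solution.
                "filter": {"term": {"published": published}},
            }
        }
    }


print(json.dumps(build_published_search("aValue")))
```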

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However, I also need to be able to search with must using two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow,
I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
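Because the default behavior of should depends on its surroundings, one option is to state minimum_should_match explicitly in the inner bool query. The following sketch builds the query above that way; any_of and build_fruit_search are hypothetical helper names.

```python
def any_of(field, values):
    """Build a bool query that matches when field matches at least one of values."""
    return {
        "bool": {
            "should": [{"match": {field: value}} for value in values],
            # Explicit, so the OR semantics do not rely on context-dependent defaults.
            "minimum_should_match": 1,
        }
    }


def build_fruit_search(ids, color):
    """Require one of the ids; a matching color only raises the score."""
    return {
        "query": {
            "bool": {
                "must": any_of("fruit.id", ids),
                "should": {"match": {"fruit.color": color}},
            }
        }
    }


body = build_fruit_search(["1", "2"], "yellow")
```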
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id; a match query analyzes the input and searches inside the id or text.
Ex:
id = '4'
id = '44'
A search using a match query with id = 4 can return both 4 and 44, since it matches 4 in both. This is where the terms query comes into play.
The same search using a terms query will return 4 only.
So the accepted answer is wrong. Use @Rahul's answer. There is just one more thing you need to do: instead of text, you need to index the field as a keyword.
Example of indexing a field as both text and keyword (the mapping is for a flat field; for a nested field, change it accordingly):
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as-is might not work, because the field may be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So it would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer will return false results. Please use @Rahul's answer with the corresponding mapping.
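As a variation on the query above (a sketch, not something the answers propose): since the id restriction does not need to contribute to the score, the terms clause could go into the filter clause instead of must. The helper below is hypothetical and assumes the id.keyword sub-field from the mapping shown earlier; ids are converted to strings because the mapping stores them as keyword values.

```python
def build_exact_id_search(ids, color, id_field="id.keyword"):
    """Restrict hits to exact ids via a filter; color only affects ranking.

    Assumes id_field is a keyword (not analyzed) field.
    """
    return {
        "from": 0,
        "size": 20,
        "query": {
            "bool": {
                # Filter context: exact, unscored match on any of the ids.
                "filter": [{"terms": {id_field: [str(i) for i in ids]}}],
                "should": [{"match": {"color": color}}],
            }
        },
    }


body = build_exact_id_search([1, 2], "yellow")
```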
