Filter results from Elasticsearch if only a specific field matches - elasticsearch

I'm using the following query for searching across multiple fields:
{
"query": {
"multi_match": {
"query": "italian sports car",
"fields": ["car_name", "car_brand", "car_description", "car_country"],
"type": "most_fields"
}
}
}
In this example, I'm looking for sports cars made in Italy (hence the car_country field). However, this will return all the cars made in Italy even if they are not sports cars. I want car_country to be just an auxiliary search field, so I don't want hits when the only matched field is car_country. Is this possible? I know I can set a lower score for that field, but I want hits with only this matching field to be completely ignored.

There are different ways to handle this, depending on the scoring you need from your results. For instance:
Use a bool query with two parts:
Must query - contains the queries that must match for a document to be in the result set.
Should query - contains queries that should match (and affect scoring) but do not decide whether a document is in the result set.
Put the multi_match query, without the car_country field, in the must clause, and a match query on the car_country field in the should clause.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "italian sports car",
"fields": [
"car_name",
"car_brand",
"car_description"
],
"type": "most_fields"
}
}
],
"should": [
{
"match": {
"car_country": {
"query": "italian sports car"
}
}
}
]
}
}
}
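The same request body can also be assembled programmatically. The following is a minimal Python sketch that only builds the body as a dictionary; the function name build_aux_field_query and its parameters are made up for illustration, and sending the body to Elasticsearch is left out.

```python
import json


def build_aux_field_query(text, primary_fields, aux_field):
    """Build a bool query in which only primary_fields decide whether a document matches.

    aux_field goes into the should clause, so it can raise the score of a hit
    but cannot produce a hit by itself.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": list(primary_fields),
                            "type": "most_fields",
                        }
                    }
                ],
                "should": [{"match": {aux_field: {"query": text}}}],
            }
        }
    }


if __name__ == "__main__":
    body = build_aux_field_query(
        "italian sports car",
        ["car_name", "car_brand", "car_description"],
        "car_country",
    )
    print(json.dumps(body, indent=2))
```

Because the bool query has a must clause, the should clause is optional for matching: a document that matches only car_country is not returned.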

Related

Elasticsearch exact search query

I'm using query string to search on documents in my index.
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "table test",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
The problem is that it also returns longer strings that contain the search keywords. I want only the strings that match the exact phrase.
For example, the documents table test 1, table test 12 and table test are in my index. When I search for table test, I want it to return only table test.
I also tried a term query, but it could not handle the space character between the words.
How can I handle this?
If your mapping was generated by Elasticsearch (dynamic mapping), then every text field has a corresponding .keyword sub-field, so you can run a term query against it (note the .keyword suffix in the field name):
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
If you don't have a .keyword field, you have to create a keyword field and run the term query against it; term queries are meant for exact (keyword) searches.
You can use Match Phrase Query as Amit suggested in another answer.
Also, if you want to use only Query String type of query then you can give your query in double quotes as shown below:
GET my_index/_search
{
"query": {
"bool": {
"must": [{
"query_string": {
"query": "\"table test\"",
"default_field": "table.name",
"default_operator":"AND"
}
}]
}
}
}
Updated:
If you want an exact match on the entire field, you can use a term query on the .keyword field:
{
"query": {
"term": {
"table.name.keyword": {
"value": "table test",
"boost": 1.0
}
}
}
}
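The two approaches above, a quoted query_string phrase and a term query on the .keyword sub-field, can be generated with small helpers. This is a minimal sketch that only builds the request bodies; the function names are hypothetical, and the .keyword sub-field is assumed to exist as in a dynamically generated mapping.

```python
def quoted_phrase_query(phrase, field):
    """Build a query_string body that searches for phrase as a phrase in field."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query": {
            "query_string": {
                "query": '"' + escaped + '"',
                "default_field": field,
            }
        }
    }


def exact_value_query(value, text_field):
    """Build a term query on the .keyword sub-field of text_field."""
    return {"query": {"term": {text_field + ".keyword": {"value": value}}}}
```

Note that the phrase query still matches table test 1 and table test 12, because the phrase occurs inside them. Only the term query on the keyword field compares the whole stored value.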

multi_match vs should match vs must query_string in ElasticSearch

I tried these three types of queries in Elasticsearch and am wondering which one is the most suitable (most accurate and most efficient). Basically, one person can have multiple sets of names (an array). Names are split into firstname, surname and middlename. Some persons have just a firstname and a surname. The input parameter is fullname (a combination of firstname, surname and middlename in one string). Fuzzy logic is added. One difference I noticed is the score.
This is the score of the first result returned.
first query: 17.41911
second query: 24.332222
third query: 21.200104
Does this mean that the second query is the most accurate query for this requirement?
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "David Bill Gonzalo~",
"fields": [
"nameDetails.name.nameValue.firstName",
"nameDetails.name.nameValue.surname",
"nameDetails.name.nameValue.middleName"
]
}
}
]
}
}
}
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"nameDetails.name.nameValue.firstName": "David Bill Gonzalo~"
}
},
{
"match": {
"nameDetails.name.nameValue.surname": "David Bill Gonzalo~"
}
},
{
"match": {
"nameDetails.name.nameValue.middleName": "David Bill Gonzalo~"
}
}
]
}
}
}
GET /person/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"fields": [
"nameDetails.name.nameValue.firstName",
"nameDetails.name.nameValue.surname",
"nameDetails.name.nameValue.middleName"
],
"query": "David Bill Gonzalo~"
}
}
]
}
}
}
First Query:
The multi-match query allows us to run a query on multiple fields. It is an extension of the match query.
In the first query you have not specified a type parameter, so best_fields is used by default. This finds all the documents that match the query, but _score is calculated only from the best-matching field.
To know more about the types of multi-match queries, refer to this part of the documentation.
Second Query:
This is a bool query with several should clauses. The scores of all matching should clauses are added up to calculate the final score.
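The difference between the first two queries lies in how the per-field scores are combined. The sketch below is a simplified illustration, not Elasticsearch code: the function names are hypothetical, the per-field scores are made-up numbers, and only the combination step is modelled (the per-field relevance scoring itself is left out).

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way best_fields does: the best field wins.

    With a non-zero tie_breaker, the other matching fields add a fraction
    of their scores on top of the best one.
    """
    matching = [s for s in field_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


def bool_should_score(clause_scores):
    """Combine scores the way bool/should does: matching clauses are summed."""
    return sum(s for s in clause_scores if s > 0)


if __name__ == "__main__":
    scores = [8.0, 6.5, 2.0]  # made-up scores for firstName, surname, middleName
    print(best_fields_score(scores))   # 8.0
    print(bool_should_score(scores))   # 16.5
```

This is why the second query reports a higher _score for the same document. The number reflects the way the scores are combined, so scores from differently shaped queries cannot be compared to decide which query is more accurate.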
Third Query:
In the third query, query_string is running against multiple fields.
As with the first query, you have not specified a type parameter, so best_fields is used by default. This finds all the documents that match the query, but _score is calculated only from the best-matching field.
Since you are querying multiple fields with the same query string, i.e. "David Bill Gonzalo~", in my opinion you should use a multi-match query. You can also use multi-match queries with different options, such as boosting one or more fields or setting the type parameter.
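A multi-match query with such options could be built as in the following sketch. The function name, the boost values and the choice of options are assumptions for illustration. The fuzziness parameter is included because, as far as I know, a trailing ~ is only interpreted as a fuzzy operator by query_string, while match and multi_match take fuzziness as a separate parameter.

```python
def name_multi_match(full_name, boosts, match_type="best_fields", fuzziness=None):
    """Build a multi_match body with per-field boosts and an optional fuzziness.

    boosts maps a field name to its boost; a boost of 1 is written without
    the ^ suffix.
    """
    fields = [f if b == 1 else f"{f}^{b}" for f, b in boosts.items()]
    multi_match = {"query": full_name, "fields": fields, "type": match_type}
    if fuzziness is not None:
        multi_match["fuzziness"] = fuzziness
    return {"query": {"multi_match": multi_match}}


if __name__ == "__main__":
    body = name_multi_match(
        "David Bill Gonzalo",
        {
            "nameDetails.name.nameValue.firstName": 1,
            "nameDetails.name.nameValue.surname": 2,  # hypothetical boost
            "nameDetails.name.nameValue.middleName": 1,
        },
        fuzziness="AUTO",
    )
    print(body)
```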

How to force certain fields in multi_match to have exact match

I am trying to match the title of a product listing to a database of known products. My first idea was to put the known products and their metadata into elasticsearch and try to find the best match with multi_match. My current query is something like:
{
"query": {
"multi_match" : {
"query": "Men's small blue cotton pants SKU123",
"fields": ["sku^2","title","gender","color", "material","size"],
"type" : "cross_fields"
}
}
}
The problem is that sometimes it will return products with the wrong color. Is there a way I could modify the above query to only score items in my index that have a color field equal to a word that exists in the query string? I am using Elasticsearch 5.1.
If you want elasticsearch to score only items that meet certain criteria then you need to use the terms query in a filter context.
Since the terms query does not analyze your query, you'll have to do that yourself. Something simple would be to tokenize by whitespace and lowercase and generate a query that looks like this:
{
"query": {
"bool": {
"filter": {
"terms": {
"color": ["men's", "small", "blue", "cotton", "pants", "sku123"]
}
},
"must": {
"multi_match": {
"query": "Men's small blue cotton pants SKU123",
"fields": [
"sku^2",
"title",
"gender",
"material",
"size"
],
"type": "cross_fields"
}
}
}
}
}
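The whitespace-and-lowercase tokenization described above can be done on the client before the request is sent. This is a minimal sketch; the function name is hypothetical, and it assumes the color field is indexed as lowercase terms so that the lowercased tokens can match.

```python
def color_filtered_query(text, scored_fields, filter_field="color"):
    """Build a bool query that filters on filter_field and scores on scored_fields."""
    # The terms query is not analyzed, so tokenize by whitespace and lowercase here.
    tokens = text.lower().split()
    return {
        "query": {
            "bool": {
                "filter": {"terms": {filter_field: tokens}},
                "must": {
                    "multi_match": {
                        "query": text,
                        "fields": list(scored_fields),
                        "type": "cross_fields",
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = color_filtered_query(
        "Men's small blue cotton pants SKU123",
        ["sku^2", "title", "gender", "material", "size"],
    )
    print(body["query"]["bool"]["filter"])
```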

How can I score Elasticsearch matches for particular field names higher when using a full text search on _all?

I've set up an index that has many types representing user data such as a ShoppingList, Playlist, etc. Each type has an "identity_id" field for the user's unique identifier. I use the following query to search across all types and fields for a user (for a search function in a website):
GET _search
{
"query": {
"filtered": {
"query": {
"match_phrase_prefix": {
"_all": "awesome"
}
},
"filter": {
"match": {
"identity_id": 1
}
}
}
}
}
My questions are:
Is there a way to give a higher score to matches on fields that have "name" in the field name? For example, the ShoppingList type will have a shopping_list_name field, and I want a match on that to be higher than its other fields.
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?
How about this query that boosts certain fields:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name",
"field*"
]
}
},
"functions": [
{
"weight": 2,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name"
]
}
}
},
{
"weight": 1,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"field*"
]
}
}
}
]
}
}
}
The query above boosts matches on the *_name fields (weight: 2) and applies no extra boost to matches on the fields called field* (weight: 1).
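If more field patterns need their own weight, the function_score body can be generated from a mapping of pattern to weight. This is a sketch under that assumption; the function name is hypothetical and only the request body is built.

```python
def weighted_fields_query(text, pattern_weights):
    """Build a function_score query with one weight function per field pattern.

    pattern_weights maps a field name pattern (for example "*_name") to the
    weight applied to documents that match text in those fields.
    """
    functions = [
        {
            "weight": weight,
            "filter": {"multi_match": {"query": text, "fields": [pattern]}},
        }
        for pattern, weight in pattern_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {"query": text, "fields": list(pattern_weights)}
                },
                "functions": functions,
            }
        }
    }


if __name__ == "__main__":
    print(weighted_fields_query("awesome", {"*_name": 2, "field*": 1}))
```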
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?
Regarding this question: it is more complicated, and you also need to consider how many users you have, the hardware resources of the cluster, the structure of the data, the queries used, etc.

ElasticSearch HasChild Query

In my ElasticSearch instance, I have two types in a single index. Think of them as "Profile" and "ProfileMetadata". There may be many ProfileMetadata items pointing to a single Profile.
Profile contains basic user info. Say firstname. ProfileMetadata contains metadata for the user, say "Tags".
What I want to be able to do, is run a single query that may look like the following. "Firstname NOT tag". The user would type this into the search bar. It would be a single search bar to search across both types at once.
The two queries are below :
Profile Query
GET _search
{
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"PersonalDetail.FirstName",
"PersonalDetail.LastName",
"PersonalDetail.Email"
],
"query": "John Smith NOT tag"
}
}
}
}
}
ProfileMetadata Query
GET _search
{
"query": {
"filtered": {
"query": {
"has_child":
{
"type": "ProfileMetadata",
"query":
{
"query_string": {
"fields": [
"Tags"
],
"query": "John Smith NOT Tag"
}
}
}
}
}
}
}
Is there any way to combine these queries, so that we get all John Smiths without that particular tag? I am using NEST in C#, and at the moment I am taking both of these queries (in NEST form) and using an OR between them, which isn't working as I need it to. So I'm trying to break it down into pure ES form first.
Maybe you could use only the second query, which returns all the matching parent documents, and then apply a filter to it that represents your first query.
That way you would not need an OR between two queries, and you might gain performance by running a single query plus a filter.
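One way to express "all John Smiths without that tag" as a single request is a bool query that requires the name match on the parent and excludes parents that have a child with the tag. This is a variation on the suggestion above, not the exact form it describes. The function name is hypothetical, the sketch assumes the search input has already been split into a name part and an excluded tag, and it assumes a bool query is acceptable in place of the filtered query used in the question.

```python
def parents_without_tag(name_text, name_fields, child_type, tag_field, excluded_tag):
    """Build a bool query: parents matching name_text, minus those with a tagged child."""
    return {
        "query": {
            "bool": {
                # The parent document itself must match the name.
                "must": [
                    {"query_string": {"fields": list(name_fields), "query": name_text}}
                ],
                # Drop parents that have at least one child carrying the tag.
                "must_not": [
                    {
                        "has_child": {
                            "type": child_type,
                            "query": {"match": {tag_field: excluded_tag}},
                        }
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    print(
        parents_without_tag(
            "John Smith",
            ["PersonalDetail.FirstName", "PersonalDetail.LastName"],
            "ProfileMetadata",
            "Tags",
            "tag",
        )
    )
```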
