Finding matches for two connected fields with fuzziness - elasticsearch

I'm trying to search for a specific person by given name and surname. I think the best option for searching two fields simultaneously is a bool query:
{
"query":{
"bool":{
"must":[
{"match": {"name":"Martin"}},
{"match": {"surname":"Mcfly"}}
]
}
}
}
But bool queries don't seem to support fuzziness. What could I do to find the person "Marty Mcfly", since the query above doesn't find this match? I would also like to be able to find someone like "Marty J. Mcfly", if that's possible.

bool is just a wrapper that combines clauses with AND/OR/NOT/FILTER logic; fuzziness is configured on the queries inside it.
In your case it would make sense to use a multi_match query:
{
"query":{
"bool":{
"must":[
{
"multi_match":{
"query":"Marty J. Mcfly",
"operator": "and",
"fields":[
"name",
"surname"
]
}
}
]
}
}
}
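This searches both the name and surname fields. Note that with the default multi_match type (best_fields), "operator": "and" is applied per field, so all terms have to be found in the same field for a document to match.
Updated, with fuzziness set on a separate match clause for each field: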
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "Martin",
"operator": "and",
"fuzziness": 1
}
}
},
{
"match": {
"surname": {
"query": "Mcfly",
"operator": "and",
"fuzziness": 1
}
}
}
]
}
}
}
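One thing worth checking is how many edits separate the search term from the stored name. The sketch below is a plain-Python Levenshtein distance, not Elasticsearch's own implementation (which by default also counts a transposition of two adjacent characters as a single edit); the function name is made up for illustration. It suggests that "martin" and "marty" are two edits apart, so a fuzziness of 1 would miss that pair, while 2 would cover it.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + cost,         # substitute ca -> cb (or keep)
            ))
        prev = curr
    return prev[-1]


# Terms are compared in lowercase here, assuming the field's analyzer lowercases them.
print(edit_distance("martin", "marty"))   # 2
print(edit_distance("mcfly", "mcfly"))    # 0
```

If the name field really holds "Marty" and the query text is "Martin", raising the fuzziness on that clause may therefore be needed.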

Related

Filtering and matching with an elasticsearch query

I am having trouble applying a secondary filter to my elasticsearch query below. Only the first filter is matching. I want both filters to apply to the query.
"query": {
"bool": {
"must": [
{
"bool": {
"filter": {
"range": {
"#timestamp": {
"gte": "2019-03-12",
"lte": "2019-03-13"
}
}
}
}
},
{
"bool": {
"filter": {
"bool": {
"must": {
"match": {
"msg_text": "foo AND bar"
}
}
}
}
}
}
]
}
}
Below are two solutions: the first uses a Match Query, the second uses a Query String query.
I'm assuming the msg_text field is of type text.
The difference is that query_string uses a parser that interprets operators such as AND and OR in the text you pass.
A match query, on the other hand, analyzes the text and constructs a bool query from the resulting terms. You don't write operators in the text, and if you do, they are not interpreted as operators.
You can read more about both queries in the Elasticsearch documentation.
1. Using Match Query
POST <your_index_name>/_search
{
"query":{
"bool":{
"filter":{
"bool":{
"must":[
{
"range":{
"#timestamp":{
"gte":"2019-03-12",
"lte":"2019-03-13"
}
}
},
{
"match":{
"msg_text":"foo bar"
}
}
]
}
}
}
}
}
2. Using Query String (more fields can be added to the fields array, separated by commas)
POST <your_index_name>/_search
{
"query":{
"bool":{
"filter":{
"bool":{
"must":[
{
"range":{
"#timestamp":{
"gte":"2019-03-12",
"lte":"2019-03-13"
}
}
},
{
"query_string":{
"fields": ["msg_text"],
"query":"foo AND bar"
}
}
]
}
}
}
}
}
Technically nothing is wrong with the structure of your query, in the sense that it would run, but I hope this answer is clear, simplifies the query, and helps you understand what it does.
Let me know if it helps!
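One detail that may matter here: a match query combines its terms with OR by default, so "foo bar" matches documents containing either word. If both words are required, the match query accepts "operator": "and". The following is a minimal sketch that builds such a request body in Python; the function name and its parameters are hypothetical, and the field names are simply the ones used in the question.

```python
import json


def build_filtered_search(ts_field, start, end, text_field, text):
    """Range filter plus a match clause that requires every term to be present."""
    return {
        "query": {
            "bool": {
                # Filter context: both clauses must match, no scoring is done.
                "filter": [
                    {"range": {ts_field: {"gte": start, "lte": end}}},
                    {"match": {text_field: {"query": text, "operator": "and"}}},
                ]
            }
        }
    }


body = build_filtered_search(
    "#timestamp", "2019-03-12", "2019-03-13", "msg_text", "foo bar"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST <your_index_name>/_search.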

fuzziness in bool query with multimatch elasticsearch

I am using Elasticsearch version 6.3.0. I want to use fuzziness along with multi_match, but there seems to be no option for that. Can anybody provide a solution? Thanks in advance.
Query :
{ "query": {
"bool": {
"must": [
{"function_score": {
"query": {
"multi_match": {
"query": "local",
"fields": [
"user.name^3",
"main_product"
],
"type": "phrase"
}
}
}}
],
"filter": {
"geo_distance": {
"distance": "1000km",
"user.geolocation": {
"lat": 25.55,
"lon": -84.44
}
}
}
}
} }
Looking at your existing query, you are after a mix of:
Boosting based on field
Multifield match
Phrase Matching
Fuzzy Matching
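If you do not need phrase matching, you can simply add "fuzziness": "AUTO" or a number such as "fuzziness": 1, depending on your requirement, to the multi_match in your existing query and you'd get what you are looking for. The fuzziness parameter cannot be combined with "type": "phrase", so that setting has to go.
Fuzzy without Phrase: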
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"function_score":{
"query":{
"multi_match":{
"query":"local",
"fields":[
"user.name^3",
"main_product"
],
"fuzziness":"AUTO"
}
}
}
}
],
"filter":{
"geo_distance":{
"distance":"1000km",
"user.geolocation":{
"lat":25.55,
"lon":-84.44
}
}
}
}
}
}
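For reference, "AUTO" derives the allowed number of edits from the length of each term. The sketch below mirrors the documented default thresholds (terms of up to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). The helper name is hypothetical; it only restates that rule and is not part of any client library.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed by "AUTO" (equivalent to "AUTO:3,6") for a single term."""
    length = len(term)
    if length < low:
        return 0      # very short terms must match exactly
    if length < high:
        return 1      # medium-length terms tolerate one edit
    return 2          # longer terms tolerate two edits


for term in ["jo", "local", "elastic"]:
    print(term, auto_fuzziness(term))
```

So for the query term "local" (five characters), "AUTO" behaves like "fuzziness": 1.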
Fuzzy with Phrase:
In this case, you need to make use of Span Queries.
I've left out the filtering part for the sake of simplicity and came up with the query below. Let's say I am searching for the phrase pearl jam.
POST <your_index_name>/_search
{
"query":{
"function_score":{
"query":{
"bool":{
"should":[
{
"bool":{
"boost":3,
"must":[
{
"span_near":{
"clauses":[
{
"span_multi":{
"match":{
"fuzzy":{
"user.name":"pearl"
}
}
}
},
{
"span_multi":{
"match":{
"fuzzy":{
"user.name":"jam"
}
}
}
}
],
"slop":0,
"in_order":true
}
}
]
}
},
{
"bool":{
"boost":1,
"must":[
{
"span_near":{
"clauses":[
{
"span_multi":{
"match":{
"fuzzy":{
"main_product":"pearl"
}
}
}
},
{
"span_multi":{
"match":{
"fuzzy":{
"main_product":"jam"
}
}
}
}
],
"slop":0,
"in_order":true
}
}
]
}
}
]
}
}
}
}
}
What this does is a fuzzy phrase match for pearl jam across multiple fields, with a different boost for each field.
Setting slop: 0 and in_order: true makes the span_near behave like a phrase match over the words specified in the clauses.
Let me know if you have any queries.
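Writing one span_multi clause per word and per field by hand gets long quickly, so the same structure could be generated. The following is only a sketch under a few assumptions: the helper names are made up, words are split on whitespace, and they are lowercased because the fuzzy query is not analyzed while the indexed terms are assumed to be lowercase.

```python
import json


def fuzzy_phrase(field: str, phrase: str, boost: float) -> dict:
    """Boosted bool wrapping a span_near with one fuzzy clause per word."""
    clauses = [
        {"span_multi": {"match": {"fuzzy": {field: word}}}}
        for word in phrase.lower().split()
    ]
    return {
        "bool": {
            "boost": boost,
            "must": [
                # slop 0 + in_order keeps the words adjacent and in sequence
                {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}
            ],
        }
    }


def fuzzy_phrase_search(phrase: str, boosts: dict) -> dict:
    """Request body with one fuzzy phrase clause per field in `boosts`."""
    should = [fuzzy_phrase(field, phrase, boost) for field, boost in boosts.items()]
    return {"query": {"function_score": {"query": {"bool": {"should": should}}}}}


body = fuzzy_phrase_search("pearl jam", {"user.name": 3, "main_product": 1})
print(json.dumps(body, indent=2))
```

With those arguments the output has the same shape as the query above: two should clauses, boosted 3 and 1, each holding a span_near over "pearl" and "jam".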
What makes you think there is no option for fuzziness on a multi-match query?
For example, with the data below:
http://localhost:9200/question_1/doc/_bulk
{"index":{}}
{"name" : "John Lazy", "text": "lazzi"}
{"index":{}}
{"name" : "John Lassi", "text": "lasso"}
{"index":{}}
{"name" : "Joan Labbe", "text": "lazzy"}
And this query:
http://localhost:9200/question_1/_search
{
"query": {
"multi_match" : {
"query" : "lazi",
"fields" : [ "name", "text" ],
"fuzziness": 1
}
}
}
Then I get one result, but if I change the fuzziness parameter to 2 I'll get three results.

Elasticsearch get all parents with no children

Originally I was trying to get a list of parents with the single most recent child for each of them. I figured out how to do that with the following query:
{"query":
{"has_child":
{"inner_hits":
{"name": "latest", "size": 1, "sort":
[{"started_at": {"order": "desc"}}]
},
"type": "child_type",
"query": {"match_all": {}}
}
}
}
But the problem is — the results do not include parents with no children. Adding min_children: 0 doesn't help either. So I thought I could make a query for all parents with no children and combine those two in a single OR query. But I'm having trouble building such a query. Would appreciate any suggestions.
Here is your query:
{
"query":{
"bool":{
"should":[
{
"bool":{
"must_not":[
{
"has_child":{
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
},
{
"has_child":{
"inner_hits":{
"name":"latest",
"size":1, "sort":[{"started_at": {"order": "desc"}}]
},
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
}
}
Another point: using only must_not with has_child returns not just the parents without children but all the child documents as well, because none of them have children either.
So one more restriction should be added to the bool query:
{
"query":{
"bool": {
"must_not": [
{
"has_child": {
"type": "<child-type>",
"query": {
"match_all": {}
}
}
}
],
"should": [
{
"term": {
"<the join field>": {
"value": "<parent-type>"
}
}
}
]
}
}
}
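To make the placeholders concrete, here is a small sketch that fills in the template above. The function name and the example values ("my_join_field", "parent_type", "child_type") are hypothetical. Because the bool query has no must or filter clause, its single should clause is effectively required, which is what restricts the result to parent documents.

```python
import json


def parents_without_children(join_field: str, parent_type: str, child_type: str) -> dict:
    """Request body: documents of the parent relation that have no children."""
    return {
        "query": {
            "bool": {
                # Exclude every document that has at least one child.
                "must_not": [
                    {"has_child": {"type": child_type, "query": {"match_all": {}}}}
                ],
                # Keep only parent documents (children have no children either).
                "should": [
                    {"term": {join_field: {"value": parent_type}}}
                ],
            }
        }
    }


body = parents_without_children("my_join_field", "parent_type", "child_type")
print(json.dumps(body, indent=2))
```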

Elasticsearch insensitive search by term

I have the following search query:
{
"query":{
"bool":{
"must":[
{
"term":{
"cardrecord.fields.name.raw":"HERE_IS_SOME_NAME"
}
}
],
"must_not":[
],
"should":[
]
}
},
"from":0,
"size":50,
"sort":[
],
"facets":{
}
}
How can I modify the query for a case-insensitive search by term? I can add more description if needed.
All string fields are analyzed with the Standard Analyzer by default. If "index": "not_analyzed" is specified in the mapping, the field is not analyzed and its value is indexed as a single exact term.
The Standard Analyzer lowercases the input and splits it into tokens on whitespace and most punctuation. Note that underscores do not split words under its word-boundary rules, so HERE_IS_SOME_NAME is indexed as the single token here_is_some_name, whereas a value such as "Here is some name" would produce the tokens here, is, some and name (minus any stop words the analyzer is configured to remove).
The same thing happens at search time when a match query is run against an analyzed field: the query text is split into tokens, and documents containing those tokens in that field are returned. P.S.: a separate analyzer can also be configured for searching.
So a match query finds all documents that contain the analyzed tokens, which is why it can return more documents than an exact comparison would.
A term query does not analyze its input; it looks for an exact, case-sensitive match against the indexed terms. On an analyzed field it will not match HERE_IS_SOME_NAME, because the indexed tokens have already been lowercased.
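The difference between the two queries can be illustrated with a toy model in plain Python. This is not how Elasticsearch is implemented: analyze below is a deliberately simplified stand-in for an analyzer (lowercase, then split on non-word characters), and all three function names are made up for the illustration.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def term_query(indexed_terms, value):
    """term: the value is compared as-is with the indexed terms."""
    return value in indexed_terms


def match_query(indexed_terms, text):
    """match: the text is analyzed first; any shared token counts as a hit."""
    return any(token in indexed_terms for token in analyze(text))


analyzed_field = analyze("Here Is Some Name")      # what an analyzed field stores
print(analyzed_field)                              # ['here', 'is', 'some', 'name']
print(term_query(analyzed_field, "Name"))          # False: indexed token is lowercase
print(term_query(analyzed_field, "name"))          # True
print(match_query(analyzed_field, "SOME NAME"))    # True: query text is lowercased too

raw_field = ["Here Is Some Name"]                  # what a not_analyzed field stores
print(term_query(raw_field, "Here Is Some Name"))  # True: exact value
print(term_query(raw_field, "here is some name"))  # False: case differs
```

In this model, a term lookup is case-sensitive against whatever was stored, so case-insensitive behaviour has to come from how the field was indexed.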
Follow these steps for your requirement:
{
"mappings": {
"my_type": {
"properties": {
"cardrecord.fields.name.raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Apply this mapping to the my_type type of your index, as shown above. You need to create a new index with the new mapping, though, since the mapping of an existing field cannot be changed in place.
Then try running your search query in your question.
Adding detailed sequence of query:
mapping:
{
"mappings": {
"my_type": {
"properties": {
"cardrecord.fields.name.raw": {
"type": "string",
"index": "not_analyzed",
"store": "true"
}
}
}
}
}
Indexing document:
{
"cardrecord.fields.name.raw": "HERE_IS_SOME_NAME"
}
search query:
{
"query": {
"bool": {
"must": [
{
"term": {
"cardrecord.fields.name.raw": "HERE_IS_SOME_NAME"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 50,
"sort": [],
"facets": {}
}
Use a filter instead of a query; this reduces the amount of processing considerably, since filters skip relevance scoring:
{
"filter":{
"bool":{
"must":[
{
"term":{
"cardrecord.fields.name.raw":"HERE_IS_SOME_NAME"
}
}
],
"must_not":[
],
"should":[
]
}
},
"from":0,
"size":50,
"sort":[
],
"facets":{
}
}
Try using a match query:
{
"query":{
"bool":{
"must":[
{
"match":{
"cardrecord.fields.name.raw":"HERE_IS_SOME_NAME"
}
}
],
"must_not":[
],
"should":[
]
}
},
"from":0,
"size":50,
"sort":[
],
"facets":{
}
}
You can use a match query, but you need to match the cardrecord.fields.name field, because the raw subfield is probably not_analyzed and thus won't work for case-insensitive matching.
{
"query": {
"bool": {
"must": [
{
"match": {
"cardrecord.fields.name": "HERE_IS_SOME_NAME"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 50,
"sort": [],
"facets": {}
}

multiple search conditions in one query in es and distinguish the items according to the conditions

For one case I need to put multiple search conditions in one query to reduce the number of queries we need.
However, I need to distinguish the returning items based on the conditions.
Currently I achieve this with a function score query: each condition is assigned a score, and I differentiate the results based on those scores.
However, the performance is not that good. In addition, we now need to get the doc count for each condition.
So is there any way to do this? I'm thinking of using an aggregation, but I'm not sure whether that would work.
Thanks!
update:
curl -X GET 'localhost:9200/locations/_search?fields=_id&from=0&size=1000&pretty' -d '{
"query":{
"bool":{
"should":[
{
"filtered":{
"filter":{
"bool":{
"must":[{"term":{"city":"new york"}},{"term":{"state":"ny"}}]
}
}
}
},
{
"filtered":{
"filter":{
"bool":{
"must":[{"term":{"city":"los angeles"}},{"term":{"state":"ca"}}]
}
}
}
}
]
}
}}'
To answer the first part of your question, named queries are the best fit.
For example:
{
"query": {
"bool": {
"should": [
{
"match": {
"field1": {
"query": "qbox",
"_name": "firstQuery"
}
}
},
{
"match": {
"field2": {
"query": "hosted Elasticsearch",
"_name": "secondQuery"
}
}
}
]
}
}
}
This will return an additional field called matched_queries for each hit which will have the information on queries matched for that document.
You can find more info on named queries in the Elasticsearch documentation.
However, this information can't be used for aggregation.
So you need to handle the second part of your question separately.
A filter aggregation for each query type would be the ideal solution here.
For example:
{
"query": {
"bool": {
"should": [
{
"match": {
"text": {
"query": "qbox",
"_name": "firstQuery"
}
}
},
{
"match": {
"source": {
"query": "elasticsearch",
"_name": "secondQuery"
}
}
}
]
}
},
"aggs": {
"firstQuery": {
"filter": {
"term": {
"text": "qbox"
}
}
},
"secondQuery": {
"filter": {
"term": {
"source": "elasticsearch"
}
}
}
}
}
You can find more on the filter aggregation in the Elasticsearch documentation.
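On the client side, the matched_queries field of each hit can be used to sort the returned documents into their conditions. The sketch below assumes the search response has already been parsed into a Python dict; the function name and the sample response are made up for illustration. It only sees the hits on the returned page, so it does not replace the filter aggregations when total counts per condition are needed.

```python
from collections import defaultdict


def group_hits_by_query(response: dict) -> dict:
    """Map each query name to the ids of the returned hits that matched it."""
    groups = defaultdict(list)
    for hit in response.get("hits", {}).get("hits", []):
        for name in hit.get("matched_queries", []):
            groups[name].append(hit["_id"])
    return dict(groups)


# Hypothetical, trimmed-down response body for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["firstQuery"]},
            {"_id": "2", "matched_queries": ["firstQuery", "secondQuery"]},
            {"_id": "3", "matched_queries": ["secondQuery"]},
        ]
    }
}

print(group_hits_by_query(sample_response))
```

A document that satisfies several conditions, like "2" in the sample, appears under each matching name.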

Resources