Elasticsearch case-insensitive search by term - elasticsearch

I have the following search query:
{
"query":{
"bool":{
"must":[
{
"term":{
"cardrecord.fields.name.raw":"HERE_IS_SOME_NAME"
}
}
],
"must_not":[
],
"should":[
]
}
},
"from":0,
"size":50,
"sort":[
],
"facets":{
}
}
How can I modify the query to make the term search case-insensitive? I can add more details if needed.

By default, all string fields are analyzed with the Standard Analyzer. If "index":"not_analyzed" is specified in the mapping, the field is not analyzed and its value is indexed as a single, unmodified term.
The Standard Analyzer lowercases the input and splits it into tokens on word boundaries such as whitespace and most punctuation. Note that the underscore is not treated as a word boundary, so HERE_IS_SOME_NAME is indexed as the single token here_is_some_name, while a value such as HERE IS SOME NAME is split into the tokens here, is, some and name. Whether stop words such as is are dropped depends on the analyzer's stop-word configuration.
The same thing happens to the search text when you run a match query against the "cardrecord.fields.name.raw" field: the text is split into tokens (using the Standard Analyzer) and the query looks for documents that contain those tokens in that field. P.S.: a separate analyzer can also be configured for searching.
So a match query searches for all documents that contain any of the resulting tokens, which is why you may get additional documents.
A term query does not analyze its input; it looks for an exact, case-sensitive match against the indexed tokens. On an analyzed field it will not match HERE_IS_SOME_NAME, because the indexed tokens are already lowercased.
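The difference between the two query types can be sketched in a few lines of Python. This is only a toy model, not Elasticsearch code: analyze is a hypothetical stand-in that lowercases and splits on whitespace, which roughly approximates the Standard Analyzer for simple input such as Some Name.

```python
def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def term_matches(indexed_tokens, term):
    """A term query compares the search term with the indexed tokens as-is."""
    return term in indexed_tokens


def match_matches(indexed_tokens, text):
    """A match query analyzes the search text first, then looks up each token."""
    return any(token in indexed_tokens for token in analyze(text))


indexed = analyze("Some Name")            # what ends up in the index
print(indexed)                            # ['some', 'name']
print(term_matches(indexed, "Some"))      # False: the index holds 'some'
print(term_matches(indexed, "some"))      # True
print(match_matches(indexed, "SOME"))     # True: the query text is lowercased too
```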
Follow these steps for your requirement:
{
"mappings": {
"my_type": {
"properties": {
"cardrecord.fields.name.raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Apply the mapping above to the type my_type in your index. You will need to create a new index with the new mapping, though, because an update to an existing field's mapping might not take effect.
Then try running your search query in your question.
Adding detailed sequence of query:
mapping:
{
"mappings": {
"my_type": {
"properties": {
"cardrecord.fields.name.raw": {
"type": "string",
"index": "not_analyzed",
"store": "true"
}
}
}
}
}
Indexing document:
{
"cardrecord.fields.name.raw": "HERE_IS_SOME_NAME"
}
search query:
{
"query": {
"bool": {
"must": [
{
"term": {
"cardrecord.fields.name.raw": "HERE_IS_SOME_NAME"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 50,
"sort": [],
"facets": {}
}

Use a filter instead of a query. Filters skip relevance scoring, which reduces the amount of processing considerably:
{
"filter":{
"bool":{
"must":[
{
"term":{
"cardrecord.fields.name.raw":"HERE_IS_SOME_NAME"
}
}
],
"must_not":[
],
"should":[
]
}
},
"from":0,
"size":50,
"sort":[
],
"facets":{
}
}

Try using a match query:
{
"query":{
"bool":{
"must":[
{
"match":{
"cardrecord.fields.name.raw":"HERE_IS_SOME_NAME"
}
}
],
"must_not":[
],
"should":[
]
}
},
"from":0,
"size":50,
"sort":[
],
"facets":{
}
}

You can use a match query, but you need to match the cardrecord.fields.name field, because the raw subfield is probably not_analyzed and thus won't work for case-insensitive matching.
{
"query": {
"bool": {
"must": [
{
"match": {
"cardrecord.fields.name": "HERE_IS_SOME_NAME"
}
}
],
"must_not": [],
"should": []
}
},
"from": 0,
"size": 50,
"sort": [],
"facets": {}
}
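Another option that none of the answers above spells out is to index the whole value as one lowercased token, so that an exact term lookup becomes case-insensitive. The sketch below builds such an index body as a Python dict. The analyzer name lowercase_keyword and the field name name_lower are made up for illustration, and the mapping uses the same old-style "string" type as the rest of this thread; the keyword tokenizer and lowercase token filter are built in.

```python
import json

# Hypothetical analyzer name; "keyword" and "lowercase" are built-in components.
ANALYZER = "lowercase_keyword"

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER: {
                    "type": "custom",
                    "tokenizer": "keyword",    # keep the whole value as one token
                    "filter": ["lowercase"],   # ...but lowercase it
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                # Hypothetical field name used only for this sketch.
                "name_lower": {"type": "string", "analyzer": ANALYZER}
            }
        }
    },
}


def case_insensitive_term(field, value):
    """Term queries are not analyzed, so lowercase the value on the client."""
    return {"query": {"term": {field: value.lower()}}}


print(json.dumps(case_insensitive_term("name_lower", "HERE_IS_SOME_NAME")))
```

A match query on the same field should work as well, since it runs the search text through the field's analyzer before looking it up.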

Related

How to join ElasticSearch query with multi_match, boosting, wildcard and filter?

I'm trying to achieve these goals:
Filter results with a bool query, like "status=1"
Filter results with a bool range query, like "distance: gte 10 AND lte 60"
Filter results that match at least one int value from an int array
Search for words in many fields while calculating the document score. Some fields need wildcards, some need boosting, like importantfield^2, somefield*, someotherfield^0.75
All of the points above are joined by the AND operator. All terms within one point are joined by the OR operator.
I have written the query below, but the wildcards are not working: searching for "abc" doesn't find "abcd" in the "name" field.
How can I solve this?
{
"filtered": {
"query": {
"multi_match": {
"query": "John Doe",
"fields": [
"*name*^1.75",
"someObject.name",
"tagsArray",
"*description*",
"ownerName"
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"status": 2
}
},
{
"bool": {
"should": [
{
"term": {
"someIntsArray": 1
}
},
{
"term": {
"someIntsArray": 5
}
}
]
}
},
{
"range": {
"distanceA": {
"lte": 100
}
}
},
{
"range": {
"distanceB": {
"gte": 50,
"lte": 100
}
}
}
]
}
}
}
}
Mappings:
{
"documentId": {
"type": "integer"
},
"ownerName": {
"type": "string",
"index": "not_analyzed"
},
"description": {
"type": "string"
},
"status": {
"type": "byte"
},
"distanceA": {
"type": "short"
},
"createdAt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"distanceB": {
"type": "short"
},
"someObject": {
"properties": {
"someObject_id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"someIntsArray": {
"type": "integer"
},
"tags": {
"type": "string",
"index": "not_analyzed"
}
}
You can use Query String if you want to apply wildcards to multiple fields and, at the same time, apply different boost values to individual fields.
Below is how your query would look:
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"abc*",
"fields":[
"*name*^1.75",
"someObject.name",
"tagsArray",
"*description*",
"ownerName"
]
}
}
],
"filter":{
"bool":{
"must":[
{
"term":{
"status":"2"
}
},
{
"bool":{
"minimum_should_match":1,
"should":[
{
"term":{
"someIntsArray":1
}
},
{
"term":{
"someIntsArray":5
}
}
]
}
},
{
"range":{
"distanceA":{
"lte":100
}
}
},
{
"range":{
"distanceB":{
"gte": 50,
"lte":100
}
}
}
]
}
}
}
}
}
Note that for the field someIntsArray, I've made use of "minimum_should_match":1 so that you won't end up with documents that'd have neither of those values.
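As a minimal sketch of that "at least one value" condition, the hypothetical helper below builds the same bool/should clause for any list of values. The second helper expresses the same condition with a single terms query, which matches documents containing any of the listed values. Both helper names are made up for this example.

```python
def any_of(field, values):
    """bool/should with minimum_should_match=1: at least one value must match."""
    return {
        "bool": {
            "minimum_should_match": 1,
            "should": [{"term": {field: value}} for value in values],
        }
    }


def any_of_terms(field, values):
    """The same condition written as one terms query."""
    return {"terms": {field: list(values)}}


print(any_of("someIntsArray", [1, 5]))
print(any_of_terms("someIntsArray", [1, 5]))
```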
Updated Answer:
Going by your updated comment, you can use query_string for the fields that need wildcard search and a simple match query with boosting for the others, as shown below. Include both queries (you can add more match queries depending on your requirement) in a combined should clause. That way you control where the wildcard query is used and where it is not.
{
"query":{
"bool":{
"should":[
{
"query_string":{
"query":"joh*",
"fields":[
"name^2"
]
}
},
{
"match":{
"description":{
"query":"john",
"boost":15
}
}
}
],
"filter":{
"bool":{
"must":[
{
"term":{
"status":"2"
}
},
{
"bool":{
"minimum_should_match":1,
"should":[
{
"term":{
"someIntsArray":1
}
},
{
"term":{
"someIntsArray":5
}
}
]
}
},
{
"range":{
"distanceA":{
"lte":100
}
}
},
{
"range":{
"distanceB":{
"lte":100
}
}
}
]
}
}
}
}
}
Let me know if this helps

fuzziness in bool query with multimatch elasticsearch

I am using Elasticsearch version 6.3.0. I want to use fuzziness along with multi_match, but there seems to be no option for that. Can anybody provide a solution? Thanks in advance.
Query :
{ "query": {
"bool": {
"must": [
{"function_score": {
"query": {
"multi_match": {
"query": "local",
"fields": [
"user.name^3",
"main_product"
],
"type": "phrase"
}
}
}}
],
"filter": {
"geo_distance": {
"distance": "1000km",
"user.geolocation": {
"lat": 25.55,
"lon": -84.44
}
}
}
}
} }
Looking at your existing query, you are looking for a mix of:
Boosting based on field
Multifield match
Phrase Matching
Fuzzy Matching
If you don't need phrase matching, you can simply add "fuzziness": "AUTO", "fuzziness": 1, or whatever value suits your requirement to your existing query (without "type": "phrase"), and you'd get what you are looking for.
Fuzzy without Phrase
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"function_score":{
"query":{
"multi_match":{
"query":"local",
"fields":[
"user.name^3",
"main_product"
],
"fuzziness":"AUTO"
}
}
}
}
],
"filter":{
"geo_distance":{
"distance":"1000km",
"user.geolocation":{
"lat":25.55,
"lon":-84.44
}
}
}
}
}
}
Fuzzy with Phrase:
In this case, you need to make use of Span Queries.
I've left out the filtering part for the sake of simplicity and came up with the query below. Let's say that I am searching for the phrase pearl jam.
POST <your_index_name>/_search
{
"query":{
"function_score":{
"query":{
"bool":{
"should":[
{
"bool":{
"boost":3,
"must":[
{
"span_near":{
"clauses":[
{
"span_multi":{
"match":{
"fuzzy":{
"user.name":"pearl"
}
}
}
},
{
"span_multi":{
"match":{
"fuzzy":{
"user.name":"jam"
}
}
}
}
],
"slop":0,
"in_order":true
}
}
]
}
},
{
"bool":{
"boost":1,
"must":[
{
"span_near":{
"clauses":[
{
"span_multi":{
"match":{
"fuzzy":{
"main_product":"pearl"
}
}
}
},
{
"span_multi":{
"match":{
"fuzzy":{
"main_product":"jam"
}
}
}
}
],
"slop":0,
"in_order":true
}
}
]
}
}
]
}
}
}
}
}
So what I am doing is performing boosting based on fields in multi-field phrase with fuzzy match for phrase called pearl jam.
Having slop: 0 and in_order:true would enable me to do phrase match for the words I've specified in the clauses.
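Since the two span_near blocks differ only in field and boost, they can be generated. The following is a rough Python sketch, with fuzzy_phrase being a hypothetical helper name. It lowercases the words because fuzzy is a term-level query and is not analyzed, which assumes the fields were indexed with a lowercasing analyzer.

```python
def fuzzy_phrase(field, phrase, boost=1):
    """Build a boosted span_near clause with one fuzzy span per word."""
    clauses = [
        {"span_multi": {"match": {"fuzzy": {field: word}}}}
        for word in phrase.lower().split()
    ]
    return {
        "bool": {
            "boost": boost,
            "must": [
                # slop 0 + in_order: words must be adjacent and in this order
                {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}
            ],
        }
    }


query = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        fuzzy_phrase("user.name", "pearl jam", boost=3),
                        fuzzy_phrase("main_product", "pearl jam", boost=1),
                    ]
                }
            }
        }
    }
}
```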
Let me know if you have any queries.
What makes you think there is no option for fuzziness on a multi-match query?
For example, with the data below:
http://localhost:9200/question_1/doc/_bulk
{"index":{}}
{"name" : "John Lazy", "text": "lazzi"}
{"index":{}}
{"name" : "John Lassi", "text": "lasso"}
{"index":{}}
{"name" : "Joan Labbe", "text": "lazzy"}
And this query:
http://localhost:9200/question_1/_search
{
"query": {
"multi_match" : {
"query" : "lazi",
"fields" : [ "name", "text" ],
"fuzziness": 1
}
}
}
Then I get one result, but if I change the fuzziness parameter to 2 I'll get three results.
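Those counts can be checked by hand with edit distances. The sketch below is a plain Levenshtein implementation in Python over the three sample documents; it is only a model of what fuzziness means, not how Elasticsearch evaluates the query (Elasticsearch also counts a transposition of two adjacent characters as a single edit by default, which makes no difference for these words). Tokens are produced by lowercasing and splitting on whitespace, which is an assumption about the analyzer.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


DOCS = [
    {"name": "John Lazy", "text": "lazzi"},
    {"name": "John Lassi", "text": "lasso"},
    {"name": "Joan Labbe", "text": "lazzy"},
]


def matching_docs(query, max_edits):
    """Names of documents with at least one token within max_edits of the query."""
    hits = []
    for doc in DOCS:
        tokens = (doc["name"] + " " + doc["text"]).lower().split()
        if any(edit_distance(query, token) <= max_edits for token in tokens):
            hits.append(doc["name"])
    return hits


print(matching_docs("lazi", 1))  # ['John Lazy']
print(matching_docs("lazi", 2))  # ['John Lazy', 'John Lassi', 'Joan Labbe']
```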

In ElasticSearch, how do I filter the nested documents in my result?

Suppose, in ElasticSearch 5, I have data with nesting like:
{"number":1234, "names": [
{"firstName": "John", "lastName": "Smith"},
{"firstName": "Al", "lastName": "Jones"}
]},
...
And I want to query for hits with number 1234 but return only the names that match "lastName": "Jones", so that my result omits names that don't match. In other words, I want to get back only part of the matching document, based on a term query or similar.
A simple nested query won't do, as such would be filtering top-level results. Any ideas?
{ "query" : { "bool": { "filter":[
{ "term": { "number":1234} },
???? something with "lastName": "Jones" ????
] } } }
I want back:
hits: [
{"number":1234, "names": [
{"firstName": "Al", "lastName": "Jones"}
]},
...
]
The hits section returns _source, which is exactly the document you indexed.
You are right that a nested query filters top-level results, but with inner_hits it will also show you which nested objects caused each top-level document to be returned, and this is exactly what you need.
The names field can be excluded from the top-level hits using the _source parameter.
{
"_source": {
"excludes": ["names"]
},
"query":{
"bool":{
"must":[
{
"term":{
"number":{
"value":"1234"
}
}
},
{
"nested":{
"path":"names",
"query":{
"term":{
"names.lastName":"Jones"
}
},
"inner_hits":{
}
}
}
]
}
}
}
So now top-level documents are returned without names field, and you have an additional inner_hits section with the names that match.
You should treat nested objects as part of a top-level document.
If you really need them to be separate - consider parent/child relations.
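If the client wants the shape shown in the question, the matching names can be folded back into each hit after the search. Here is a small Python sketch; the response dict is a hand-written, heavily abbreviated example of a search response (real responses carry more keys), and merge_inner_hits is a hypothetical helper name.

```python
# Abbreviated, hand-written example of a response to the query above.
response = {
    "hits": {
        "hits": [
            {
                "_source": {"number": 1234},
                "inner_hits": {
                    "names": {
                        "hits": {
                            "hits": [
                                {"_source": {"firstName": "Al", "lastName": "Jones"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}


def merge_inner_hits(response, path="names"):
    """Return each top-level _source with only the matching nested objects."""
    docs = []
    for hit in response["hits"]["hits"]:
        doc = dict(hit["_source"])
        inner = hit.get("inner_hits", {}).get(path, {})
        doc[path] = [h["_source"] for h in inner.get("hits", {}).get("hits", [])]
        docs.append(doc)
    return docs


print(merge_inner_hits(response))
# [{'number': 1234, 'names': [{'firstName': 'Al', 'lastName': 'Jones'}]}]
```

Keep in mind that inner_hits returns only a few nested matches per document by default, so its size option may need to be raised if a document can contain many matching names.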
Try something like this
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{ "term": { "number": 1234 } },
{
"nested": {
"path": "something",
"query": {
"term": {
"something.lastName": "Jones"
}
},
"inner_hits" : {}
}
}
]
}
}
}
}
}
I used this reference.
Similar but a bit different: put the nested query in the should clause and then look at inner_hits for the names. This returns the top-level document, and inner_hits will contain any matching names.
{
"_source": {
"excludes": ["names"]
},
"query":{
"bool":{
"must":[
{
"term":{
"number":{
"value":"1234"
}
}
}
],
"should": [
{
"nested":{
"path":"names",
"query":{
"term":{
"names.lastName":"Jones"
}
},
"inner_hits":{
}
}
}
]
}
}
}

Elasticsearch get all parents with no children

Originally I've been trying to get a list of parents and a single most recent child for each one of them. I've figured how to do that with the following query
{"query":
{"has_child":
{"inner_hits":
{"name": "latest", "size": 1, "sort":
[{"started_at": {"order": "desc"}}]
},
"type": "child_type",
"query": {"match_all": {}}
}
}
}
But the problem is that the results do not include parents with no children. Adding min_children: 0 doesn't help either. So I thought I could write a query for all parents with no children and combine the two in a single OR query, but I'm having trouble building such a query. I would appreciate any suggestions.
Here is your query:
{
"query":{
"bool":{
"should":[
{
"bool":{
"must_not":[
{
"has_child":{
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
},
{
"has_child":{
"inner_hits":{
"name":"latest",
"size":1, "sort":[{"started_at": {"order": "desc"}}]
},
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
}
}
Another point: using only must_not with has_child returns not just the parents without children but all the child documents as well, because none of them have children either.
So another restriction should be added to the bool query:
{
"query":{
"bool": {
"must_not": [
{
"has_child": {
"type": "<child-type>",
"query": {
"match_all": {}
}
}
}
],
"should": [
{
"term": {
"<the join field>": {
"value": "<parent-type>"
}
}
}
]
}
}
}
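The reason for the extra restriction can be seen in a toy in-memory model. This is plain Python, not Elasticsearch; the documents and the type and parent keys are invented for the illustration.

```python
# Invented sample data: two parents, one child that belongs to p1.
DOCS = [
    {"id": "p1", "type": "parent"},
    {"id": "p2", "type": "parent"},
    {"id": "c1", "type": "child", "parent": "p1"},
]


def has_child(doc, docs):
    """True if some document names this one as its parent."""
    return any(other.get("parent") == doc["id"] for other in docs)


# must_not has_child alone: children slip through, since they have no children.
no_children = [d["id"] for d in DOCS if not has_child(d, DOCS)]

# must_not has_child plus a restriction to the parent type.
childless_parents = [
    d["id"] for d in DOCS if not has_child(d, DOCS) and d["type"] == "parent"
]

print(no_children)        # ['p2', 'c1']
print(childless_parents)  # ['p2']
```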

Multi_match and match queries together

I have the following queries in elastic search :
{
"query": {
"multi_match": {
"query": "bluefin bat",
"type": "phrase",
"fields": [
"title^5",
"body.value"
]
}
},
"highlight": {
"fields": {
"body.value": {
"number_of_fragments": 3
}
}
},
"fields": [
"title",
"id"
]
}
I have tried using "dis_max" but then two of my fields have to be searched for the same query.
The remaining match query has a different query text.
The remaining match query is like this:
{
"query": {
"match": {
"ingredients": "key1, key2",
"analyzer": "keyword_analyzer"
}
}
}
How can I integrate these two queries without using dis_max to join them?
I figured out the answer. multi_match internally applies:
"dis_max"
Hence, you cannot apply dis_max together with multi_match.
What I could do instead is use a bool query to solve this type of problem.
I could apply should, which corresponds to a boolean OR, or must, which is equivalent to AND.
So this is how I modified my query:
{
"query": {
"bool":{
"should": [
{"multi_match":
{"query": "SOME_QUERY",
"type": "phrase",
"fields": ["title^5","body"]
}
},
{
"match":{
"labels" :{
"query": "SOME_QUERY",
"analyzer": "keyword_analyzer"
}
}
},
{
"match":{
"displayName" :{
"query": "SOME_QUERY",
"fuzziness": "AUTO"
}
}
}
],
"minimum_number_should_match": "50%"
}
},
"fields": ["title","id","labels","displayName","username"],
"highlight": {
"fields": {
"body.storage.value": {
"number_of_fragments": 3}
}
}
}
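The should/must choice described above can be wrapped in two small helpers. This is a Python sketch with made-up helper names (bool_or, bool_and); it uses the minimum_should_match parameter name rather than the older minimum_number_should_match spelling seen in the query above.

```python
def bool_or(*queries, minimum_should_match=1):
    """Combine queries with OR semantics: should clauses in a bool query."""
    return {
        "bool": {
            "should": list(queries),
            "minimum_should_match": minimum_should_match,
        }
    }


def bool_and(*queries):
    """Combine queries with AND semantics: must clauses in a bool query."""
    return {"bool": {"must": list(queries)}}


phrase = {
    "multi_match": {
        "query": "bluefin bat",
        "type": "phrase",
        "fields": ["title^5", "body.value"],
    }
}
ingredients = {
    "match": {
        "ingredients": {"query": "key1, key2", "analyzer": "keyword_analyzer"}
    }
}

either = {"query": bool_or(phrase, ingredients)}   # at least one must match
both = {"query": bool_and(phrase, ingredients)}    # both must match
```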
I hope this helps someone in future.

Resources