Fuzziness in date type field - elasticsearch

I have a date field in my mapping and I want to run a fuzzy search on it.
Below is my query:
GET _search
{
"query": {
"fuzzy": {
"death_date": {
"value": "3548"
}
}
}
}
The current result does not match my expectation.
Although I have a document with the value 3548, its score is lower than that of the value 3549, which appears at the top of the results.
I have changed my query to include a range clause, as suggested:
GET _search
{
"query" : {
"bool": {
"must":
{
"match": {
"marriages.marriage_year": "1630"
}
},
"should":
{
"match": {
"first_name":
{ "query" : "mary",
"fuzziness":"2"
}
}
},
"must":
{
"range" : {
"marriages.marriage_year": {
"gt" : "1620",
"lte" : "1740"
}
}
}
}
}
}
It returns documents with marriages.marriage_year = "1630" and Mary as first_name with the highest score. I also want to include documents with marriages.marriage_year between 1620 and 1740, which are not shown in the results. It shows data only for marriage_year 1630.

The fuzzy query is meant for string fields, to accommodate typing errors. It returns results based on edit distance, so it doesn't make sense to use it on numeric fields: 1000 and 9000 have an edit distance of only 1, yet they are far apart numerically. You can use a range query as suggested by Russ or, if you care about edit distance rather than range, index the value as a string field and run the fuzzy query on that.
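To make the edit-distance point concrete, here is a minimal sketch in Python. It computes a plain Levenshtein distance, which is an assumption for illustration and not a copy of Elasticsearch's internal implementation; the function name edit_distance is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


# Numerically far apart, yet only one edit away from each other.
print(edit_distance("1000", "9000"))
# Numerically adjacent, also one edit away.
print(edit_distance("3548", "3549"))
```

Both pairs are one edit apart, which is why a fuzzy match on the string form says nothing about how close two years or numbers actually are.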

Related

Elasticsearch: How to filter results with a specific word in a value using elasticsearch

I need to add a parameter to my search that filters results containing a specific word in a value. The query is searching for user history records and contains a url key. I need to filter out /history and any other url containing that string.
Here's my current query:
GET /user_log/_search
{
"size" : 50,
"query": {
"match": {
"user_id": 56678
}
}
}
Here's an example of a record, boiled down to just the value we're looking at:
"_source": {
"url": "/history?page=2&direction=desc",
},
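How can the parameters of the search be changed to filter out this result?
You can use the filter parameter of the bool query in Elasticsearch (note the filter clause in the query below).
If your url field is of type keyword, you can use the query below. Keep in mind that filter keeps only the documents that match its clause, so to exclude them the same clause belongs under must_not instead. Also, a term query on a keyword field matches the whole value exactly, so "/history" does not match "/history?page=2&direction=desc".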
{
"query": {
"bool": {
"must": {
"match": {
"user_id": 56678
}
},
"filter": {
"term": {
"url": "/history"
}
}
}
}
}
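As a hedged sketch of the exclusion the question asks for, the request body could be built as below. It assumes url is mapped as a keyword field and uses a wildcard query under must_not; the helper name build_exclude_history_query is hypothetical, and a pattern with a leading wildcard can be slow on large indices.

```python
import json


def build_exclude_history_query(user_id: int, size: int = 50) -> dict:
    """Build a search body that keeps one user's records but drops every
    record whose url contains "/history" (assumes url is a keyword field)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Same match clause as in the original query.
                "must": {"match": {"user_id": user_id}},
                # must_not removes matching documents instead of keeping them.
                "must_not": {"wildcard": {"url": {"value": "*/history*"}}},
            }
        },
    }


query = build_exclude_history_query(56678)
print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the body of GET /user_log/_search.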
I found a way to solve my specific issue. Instead of filtering on the url I'm filtering on a different value. Here's what I'm using now:
{
"size" : 50,
"query": {
"bool" : {
"must" : {
"match" : { "user_id" : 56678 }
},
"must_not": {
"match" : { "controller": "History" }
}
}
}
}
I'm still going to leave this question open for a while to see if anyone has other ways of solving the original problem.

Boost result which has the current date in between dates

My mapping has two properties:
"news_from_date" : {
"type" : "string"
},
"news_to_date" : {
"type" : "string"
},
Search results have the properties news_from_date, news_to_date
curl -X GET 'http://172.2.0.5:9200/test_idx1/_search?pretty=true' 2>&1
Result:
{
"news_from_date" : "2022-05-30 00:00:00",
"news_to_date" : "2022-06-23 00:00:00"
}
Question is: How can I boost all results with the current date being in between their "news_from_date"-"news_to_date" interval, so they are shown as highest ranking results?
TL;DR
First off, if you are going to work with dates, you should probably use one of the date types provided by Elasticsearch.
There are many ways to approach your problem: using Painless, using a scoring function, or even more classical query types.
Using Should
The bool query type has multiple clauses:
must
filter
must_not
should
The should clause allows optional clauses to be factored into the final score.
So you can go with the following (news_from_date on or before now, news_to_date on or after now):
GET _search
{
"query": {
"bool": {
"should": [
{
"range": {
"news_from_date": {
"lte": "now"
}
}
},
{
"range": {
"news_to_date": {
"gte": "now"
}
}
}
]
}
}
}
Be aware that:
You can use the minimum_should_match parameter to specify the number or percentage of should clauses returned documents must match.
If the bool query includes at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.
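The condition the question asks for is news_from_date <= now <= news_to_date. The sketch below mimics, in plain Python, how many of two such should clauses a document would match at a given moment. It is an illustration only: the function name matched_should_clauses is hypothetical, and the date format is taken from the sample result in the question.

```python
from datetime import datetime

# Matches values such as "2022-05-30 00:00:00" from the sample result.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def matched_should_clauses(doc: dict, now: datetime) -> int:
    """Count how many of the two range conditions hold for this document."""
    start = datetime.strptime(doc["news_from_date"], DATE_FORMAT)
    end = datetime.strptime(doc["news_to_date"], DATE_FORMAT)
    has_started = start <= now   # range on news_from_date: lte now
    has_not_ended = end >= now   # range on news_to_date: gte now
    return int(has_started) + int(has_not_ended)


doc = {
    "news_from_date": "2022-05-30 00:00:00",
    "news_to_date": "2022-06-23 00:00:00",
}
print(matched_should_clauses(doc, datetime(2022, 6, 1)))  # inside the interval
print(matched_should_clauses(doc, datetime(2022, 7, 1)))  # after the interval
```

A document whose interval contains the current date satisfies both conditions, while one outside it still satisfies one of them, so the "current" documents rank higher rather than being the only hits.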
Using a script
As described in the documentation, you can write a custom function to score your documents according to your own business rules.
The script uses Painless, a scripting language with Java-like syntax:
GET /_search
{
"query": {
"function_score": {
"query": {
"match": { "message": "elasticsearch" }
},
"script_score": {
"script": {
"source": "Math.log(2 + doc['my-int'].value)"
}
}
}
}
}

What is the difference between should and boost final score calculation?

I'm a little confused about the difference between should and boost in the final score calculation.
When a bool query has a must clause, the should clauses act as a boost factor: none of them have to match, but if they do, the relevancy score for that document is boosted and it appears higher in the results.
So, if we have:
one query which contains must and should clauses
vs
a second query which contains a must clause and a boosting clause
Is there a difference?
When do you recommend using must and should vs must and boosting clauses in a query?
You can read the documentation of the bool query; there is a huge difference between should and boost.
Both should and must contribute to the _score of the document and, as the documentation puts it:
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
Boost, on the other hand, is a parameter with which you can increase the weight of a clause by a value you choose. Let me explain that with an example.
Index sample docs
POST _doc/1
{
"brand" : "samsung",
"name" : "samsung phone"
}
POST _doc/2
{
"brand" : "apple",
"name" : "apple phone"
}
Boolean Query using should without boost
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple"
}
}
}
]
}
}
}
Search result showing score
"max_score": 1.3862942,
Now in same query use boost of factor 10
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple",
"boost": 10
}
}
}
]
}
}
}
Query result showing the considerably higher score due to the boost
"max_score": 7.624619,
In short, when you want matches on a particular clause to count for more, you can additionally pass the boost parameter; it is applied on top of the normal score calculated for that should or must clause.
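The two reported scores are consistent with simple arithmetic. The sketch below assumes that each match clause contributes the same base score for the apple document (0.6931471, inferred here as 1.3862942 / 2 and not stated in the answer) and that boost multiplies the score of its own clause. The function name bool_should_score is hypothetical.

```python
# Assumption: per-clause score inferred as half of the reported 1.3862942.
BASE_CLAUSE_SCORE = 0.6931471


def bool_should_score(clause_scores, boosts):
    """Sum the scores of the matching clauses, each multiplied by its boost."""
    return sum(score * boost for score, boost in zip(clause_scores, boosts))


clause_scores = [BASE_CLAUSE_SCORE, BASE_CLAUSE_SCORE]  # name clause, brand clause
without_boost = bool_should_score(clause_scores, [1, 1])
with_boost = bool_should_score(clause_scores, [1, 10])   # boost 10 on brand

print(round(without_boost, 6), round(with_boost, 6))
```

Under these assumptions the unboosted sum lands on the first reported max_score and the boosted sum on the second, to within rounding.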

Elastic Search query_shard_exception failed to execute query on datetime field

Recently I started using Kibana to get data from Elasticsearch.
There is a document like this:
{ "_index" : "impasti",
"_type" : "impasti",
"_id" : "2019-01-02T15:25:20",
"_score" : 1.9806902,
"_source" : {
"sensor" : "Temperature",
"mac_address" : "",
"time" : "2019-01-02T14:25:19.728709Z",
"unit" : "'C",
"value" : 20.937
}},
I am trying to get the data by the time field, either within a datetime range or at an exact datetime.
But when I run this query
POST /impasti/impasti/_search
{"query":{
"query_string": {
"default_field": "time",
"query": "2019-01-02T14:25:19.728709Z"
}
}
}
the response is an error like this:
"type": "query_shard_exception", "reason": "Failed to parse query [2019-01-02T14:25:19.728709Z]",
Where is the mistake?
Thanks guys
The error states that the query parser could not parse the bare timestamp (the colons in it are reserved characters in the query_string syntax), so you need to do something more for Elasticsearch to treat the value as a date.
Moreover, date fields are usually queried over a particular range. Below is how it can be done with a range.
Using query_string:
POST your_index_name/_search
{
"query": {
"query_string": {
"default_field": "time",
"query": "time:[2019-01-02T14:25:19.728709Z TO 2019-01-02T14:25:19.728709Z]"
}
}
}
Generally the syntax is [min TO max] for finding documents in a specified time range; if you want to find documents for one exact date, use the same date for both min and max.
Using Range Bool Query:
POST your_index_name/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"mydate": {
"gte": "2019-01-02T14:25:19.728709Z",
"lte": "2019-01-02T14:25:19.728709Z"
}
}
}
]
}
}
}
The above is an example of a range query using the query DSL.
Alternatively, you can use a simple term query in the query DSL to find an exact date; this works because the date is stored internally as a long value.
POST your_index_name/_search
{
"query": {
"term": {
"mydate": "2019-01-02T14:25:19.728709Z"
}
}
}
Note: Elasticsearch internally stores date values as a long datatype, as mentioned in the documentation.
Could you update the question with the query that gives you this error? I can quickly check and let you know.
Sure!
This is the query:
POST /impasti/impasti/_search
{
"query":{
"filtered": {
"query": {
"query_string": {
"default_field": "time",
"query": "time:[2019-01-02T14:20:19.728709Z TO 2019-02-02T14:25:19.728709Z]"
}
},
"filter": {
"term":{ "sensor": "temperature" }
}
}
}
}
It gets this error:
no [query] registered for [filtered]
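This error is consistent with the filtered query having been removed in Elasticsearch 5.0, where a bool query with must and filter clauses takes its place. The sketch below shows that rewrite on a Python dict; the helper name filtered_to_bool is hypothetical and the code only transforms the request body, it does not contact a cluster.

```python
import json


def filtered_to_bool(query: dict) -> dict:
    """Rewrite {"filtered": {"query": q, "filter": f}} as
    {"bool": {"must": q, "filter": f}}."""
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]      # scoring part
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]   # non-scoring part
    return {"bool": bool_query}


old_query = {
    "filtered": {
        "query": {
            "query_string": {
                "default_field": "time",
                "query": "time:[2019-01-02T14:20:19.728709Z TO 2019-02-02T14:25:19.728709Z]",
            }
        },
        "filter": {"term": {"sensor": "temperature"}},
    }
}

new_body = {"query": filtered_to_bool(old_query)}
print(json.dumps(new_body, indent=2))
```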

Must match multiple values

I have a query that works fine when I need the property of a document to match just one value.
However, I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow, I will get both if I have 1 and 2 in the must clause.
But if I have just 1, I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
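The matching and ranking behavior described above can be mimicked in a few lines of Python. This is a simplified model and not Elasticsearch scoring: the scores 1.0 and 2.0 and the sample fruits are made up for illustration, and the function name search is hypothetical.

```python
def search(docs, allowed_ids, preferred_color):
    """Keep documents whose id is any of allowed_ids (the nested should, OR)
    and rank the ones with preferred_color higher (the outer should)."""
    hits = []
    for doc in docs:
        # Required part: at least one of the id alternatives must match.
        if doc["id"] not in allowed_ids:
            continue
        score = 1.0  # simplified score for the required part
        # Optional part: a matching color only adds to the score.
        if doc["color"] == preferred_color:
            score += 1.0
        hits.append((score, doc["name"]))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [name for _, name in hits]


fruits = [
    {"id": "5", "name": "green apple", "color": "green"},
    {"id": "1", "name": "banana", "color": "yellow"},
    {"id": "2", "name": "lemon", "color": "yellow"},
    {"id": "3", "name": "pineapple", "color": "yellow"},
]

print(search(fruits, {"1", "2"}, "yellow"))
print(search(fruits, {"1"}, "yellow"))
print(search(fruits, {"5", "1"}, "yellow"))
```

The last call shows the boost effect: the green apple is still returned because its id is allowed, but the yellow banana ranks above it.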
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id; a match query searches the analyzed contents of the id or text instead.
Ex:
id = '4'
id = '44'
Depending on how the field is analyzed, a search using a match query with id = 4 can return both 4 and 44, since 4 matches in both. This is where the terms query comes into play.
The same search using a terms query will return 4 only.
So the accepted answer is wrong. Use @Rahul's answer. There is just one more thing you need to do: instead of text, you need to index the field as a keyword.
Example of indexing a field both as text and as keyword (the mapping is for a flat field; for a nested field change it accordingly):
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using @Rahul's answer as is may not work, because the field might be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer can return false results. Please use @Rahul's answer with the corresponding mapping.

Resources