Optimize MLT elasticsearch query - performance

I want to run a more_like_this (MLT) query, so I use this (with the Python wrapper for Elasticsearch):
{
"query": {
"more_like_this": {
"fields": ["title", "content"],
"docs": [
{
"_index": "kavosh",
"_type": "articles",
"_id": str(news_id)
}
]
}
},
"size": 1,
}
But I get many timeouts, so I decided to restrict the MLT check to the last week. (Is that effective?) For example, by adding this:
{
"range": {
"publication_date": {
"lte": "now",
"gte": "now-1w"
}
}
}
How can I apply this filter to the MLT query, and do you have any suggestions for optimizing the query?

You can use the query below:
{
"query": {
"filtered": {
"query": {
"more_like_this": {
"fields": [
"title",
"content"
],
"docs": [
{
"_index": "kavosh",
"_type": "articles",
"_id": str(news_id)
}
]
}
},
"filter": {
"range": {
"publication_date": {
"lte": "now",
"gte": "now-1w"
}
}
}
}
}
}
Hope it helps.
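On newer Elasticsearch versions the filtered query is no longer available (it was deprecated in 2.0 and removed in 5.0), and the same restriction is usually written as a bool query with the range in its filter clause. Below is a minimal Python sketch of that form, not a definitive implementation. The function name build_mlt_body and its window and doc_type parameters are hypothetical; the index and field names come from the question, and the like parameter is assumed to be available in place of docs on your version.

```python
def build_mlt_body(news_id, index="kavosh", window="1w", doc_type=None):
    """Build a more_like_this search body limited to recently published docs."""
    like_doc = {"_index": index, "_id": str(news_id)}
    if doc_type is not None:
        # Assumption: only older clusters that still use mapping types need this key.
        like_doc["_type"] = doc_type
    return {
        "query": {
            "bool": {
                "must": {
                    "more_like_this": {
                        "fields": ["title", "content"],
                        "like": [like_doc],
                    }
                },
                # The range runs in filter context, so it does not affect scoring.
                "filter": {
                    "range": {
                        "publication_date": {
                            "gte": "now-" + window,
                            "lte": "now",
                        }
                    }
                },
            }
        },
        "size": 1,
    }


body = build_mlt_body(42)
```

The resulting dict can be passed to the client's search call in the same way as the original query. Whether the one-week window actually removes the timeouts depends on the data, so it is worth measuring.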

Related

Elastic Search Query on String Array Field

I'm working with Elasticsearch and facing an issue with an array field. I have an index named test-index with the following mapping.
{
"test-index": {
"mappings": {
"properties": {
"courses": {
"type": "keyword"
}
}
}
}
}
My Elasticsearch documents look like this.
"hits": [
{
"_index": "test-index",
"_id": "1ac:0000000000_1",
"_score": 1,
"_source": {
"courses": [
"Course-1A",
"Course-1B",
"Course-1C",
"Course-1D",
"Course-1E",
"Course-1F"
]
}
},
{
"_index": "test-index",
"_id": "1ac:0000000000_2",
"_score": 1,
"_source": {
"courses": [
"Course-2A",
"Course-2B",
"Course-2C",
"Course-1A"
]
}
}
]
The document _id is my student ID. I want the results with the highest relevance at the top and the lowest at the bottom.
For example, if I search for courses ["Course-2A","Course-2B","Course-1C"], then user 1ac:0000000000_2 should appear at the top and user 1ac:0000000000_1 at the bottom.
I've tried the following queries.
GET test-index/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"courses": [
"Course-1A",
"Course-2A",
"Course-2B"
]
}
}
]
}
}
}
This returns user 1ac:0000000000_1 at the top and the other at the bottom.
GET test-index/_search
{
"query": {
"bool": {
"should": [
{
"term": {
"courses": "Course-1A"
}
},
{
"term": {
"courses": "Course-2A"
}
},
{
"term": {
"courses": "Course-2B"
}
}
],
"minimum_should_match": "70%"
}
}
}
This gives me roughly the results I want, but I'm not sure how it will behave on a larger dataset.
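One possible way to make the ranking depend only on how many of the requested courses a student has is to wrap each term in a constant_score query, so that every matching course adds the same amount to the score. The Python sketch below builds such a query from a list. It assumes all courses should count equally; the function name build_course_query and the min_match parameter are hypothetical.

```python
def build_course_query(courses, min_match=1):
    """Build a bool/should query where each matching course adds 1.0 to the score."""
    should = [
        {
            "constant_score": {
                # The term runs as a filter, so term rarity does not change the score.
                "filter": {"term": {"courses": course}},
                "boost": 1.0,
            }
        }
        for course in courses
    ]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": min_match,
            }
        }
    }


query = build_course_query(["Course-2A", "Course-2B", "Course-1C"])
```

With the two sample documents, user 1ac:0000000000_2 matches two of these courses and user 1ac:0000000000_1 matches one, so the first would be expected to rank higher.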

elasticsearch find doc by time with datetime field

I'm trying to retrieve all documents whose date falls between two dates and whose time falls between two hours.
I can't get the query to work.
Is it possible? If so, how?
[
{
"_index": "a1",
"_type": "_doc",
"_id": "50c09e31-1fad-4d25-ab9d-35154a1b765b",
"_score": 5.0,
"_source":
{
"start_at": "2022-06-23 14:00",
"end_at": "2022-06-23 14:15",
...
}
},
{
"_index": "a1",
"_type": "_doc",
"_id": "d96ba291-63de-422a-9123-3d1a1d573861",
"_score": 5.0,
"_source":
{
"start_at": "2022-06-24 16:30",
"end_at": "2022-06-24 17:00",
...
}
}
]
GET /a1/_search?pretty
{
"query": {
"bool": {
"must": [
{
"range": {
"start_at": {
"gte": "2022-06-20",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"start_at": {
"lt": "2022-06-27",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"start_at": {
"gte": "14:00",
"format": "HH:mm"
}
}
},
{
"range": {
"start_at": {
"lt": "18:00",
"format": "HH:mm"
}
}
}
]
}
},
"size": 10
}
Thanks.
The immediate solution would be to use a query similar to this one but change the script part to:
doc['start_at'].value.getHourOfDay() ...
Since scripting can be bad for performance, a better solution would be to index the hours into a dedicated field and then perform a range query on it.
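The dedicated-field approach can be sketched in Python as follows. This is only an illustration under stated assumptions: the field name start_hour and both function names are hypothetical, the hour is derived on the client before indexing, and the date strings are assumed to use the "yyyy-MM-dd HH:mm" layout shown in the question.

```python
from datetime import datetime

# Layout of the start_at values in the sample documents.
DATE_FORMAT = "%Y-%m-%d %H:%M"


def with_start_hour(source):
    """Return a copy of the document with the hour of start_at in its own field."""
    enriched = dict(source)
    enriched["start_hour"] = datetime.strptime(source["start_at"], DATE_FORMAT).hour
    return enriched


def build_query(date_from, date_to, hour_from, hour_to):
    """Combine a date range on start_at with an hour range on start_hour."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            "start_at": {
                                "gte": date_from,
                                "lt": date_to,
                                "format": "yyyy-MM-dd",
                            }
                        }
                    },
                    # Plain integer range, no script needed at query time.
                    {"range": {"start_hour": {"gte": hour_from, "lt": hour_to}}},
                ]
            }
        },
        "size": 10,
    }


doc = with_start_hour({"start_at": "2022-06-23 14:00", "end_at": "2022-06-23 14:15"})
query = build_query("2022-06-20", "2022-06-27", 14, 18)
```

This only covers whole hours. A bound such as 14:30 would need a finer-grained field, for example minutes since midnight.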

elasticsearch - fuzziness with bool_prefix type

I have the following query:
{
size: 6,
query: {
multi_match: {
query,
type: 'bool_prefix',
fields: ['recommendation', 'recommendation._2gram', 'recommendation._3gram'],
},
},
highlight: {
fields: {
recommendation: {},
},
},
}
I want to add fuzziness: 1 to this query, but it has issues with type: 'bool_prefix'. I need type: 'bool_prefix' to remain because it's integral to how the query works, but I'd also like to add some fuzziness. Any ideas?
As mentioned in the official ES documentation of bool_prefix:
The fuzziness, prefix_length, max_expansions, fuzzy_rewrite, and
fuzzy_transpositions parameters are supported for the terms that are
used to construct term queries, but do not have an effect on the
prefix query constructed from the final term.
Adding a working example with index mapping, data, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"recommendation": {
"type": "search_as_you_type",
"max_shingle_size": 3
}
}
}
}
Index Data:
{
"recommendation":"good things"
}
{
"recommendation":"good"
}
Search Query:
You can add the fuzziness parameter with bool_prefix, as shown below:
{
"size": 6,
"query": {
"multi_match": {
"query": "goof q",
"type": "bool_prefix",
"fields": [
"recommendation",
"recommendation._2gram",
"recommendation._3gram"
],
"fuzziness": 1
}
},
"highlight": {
"fields": {
"recommendation": {}
}
}
}
Search Result:
"hits": [
{
"_index": "65817192",
"_type": "_doc",
"_id": "2",
"_score": 1.1203322,
"_source": {
"recommendation": "good things"
},
"highlight": {
"recommendation": [
"<em>good</em> things"
]
}
},
{
"_index": "65817192",
"_type": "_doc",
"_id": "1",
"_score": 0.1583319,
"_source": {
"recommendation": "good"
},
"highlight": {
"recommendation": [
"<em>good</em>"
]
}
}
]
I ended up combining an additional fuzzy query with the multi_match query in a bool query. In your case it would look like this:
{
"size": 6,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "goof q",
"type": "bool_prefix",
"fields": [
"recommendation",
"recommendation._2gram",
"recommendation._3gram"
]
}
},
{
"fuzzy": {
"recommendation": {
"value": "goof q",
"fuzziness": "AUTO"
}
}
}
]
}
},
"highlight": {
"fields": {
"recommendation": {}
}
}
}
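A variation on the combined query above, sketched in Python: the fuzzy query is term-level and does not analyze its input, so a multi-word string such as "goof q" is compared as a single term. The sketch swaps in a match query with fuzziness, which does analyze the text. This is a different technique from the one in the answer, and the function name build_prefix_fuzzy_query is hypothetical.

```python
def build_prefix_fuzzy_query(text, field="recommendation", size=6):
    """Combine a bool_prefix multi_match with an analyzed fuzzy match."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "type": "bool_prefix",
                            # Subfields created by the search_as_you_type mapping.
                            "fields": [field, field + "._2gram", field + "._3gram"],
                        }
                    },
                    # Swapped-in clause: match analyzes the text before fuzzy matching.
                    {"match": {field: {"query": text, "fuzziness": "AUTO"}}},
                ]
            }
        },
        "highlight": {"fields": {field: {}}},
    }


query = build_prefix_fuzzy_query("goof q")
```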

Elasticsearch - Trouble querying for exact date with range query

I have the following mapping definition in my events index:
{
"events": {
"mappings": {
"properties": {
"data": {
"properties": {
"reportDate": {
"type": "date",
"format": "M/d/YYYY"
}
}
}
}
}
}
And an example doc:
{
"_index": "events",
"_type": "_doc",
"_id": "12345",
"_version": 1,
"_seq_no": 90,
"_primary_term": 1,
"found": true,
"_source": {
"data": {
"reportDate": "12/4/2018"
}
}
}
My goal is to query for docs with an exact data.reportDate of 12/4/2018, but when I run this query:
{
"query": {
"range": {
"data.reportDate": {
"lte": "12/4/2018",
"gte": "12/4/2018",
"format": "M/d/YYYY"
}
}
}
}
I instead get all of the docs that have a data.reportDate that is in the year 2018, not just 12/4/2018. I've tried setting relation to CONTAINS and WITHIN with no luck. Any ideas?
You need to change your date format from M/d/YYYY to M/d/yyyy. Refer to the official ES documentation to learn more about date formats, and to the documentation on the difference between yyyy and YYYY:
yyyy specifies the calendar year whereas YYYY specifies the year (of
“Week of Year”)
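The distinction between a calendar year and a week-based year can be illustrated outside Elasticsearch. The short Python sketch below uses ISO 8601 week rules as a stand-in, which is an assumption: the Java pattern letter Y follows locale-dependent week rules that may differ. The function name calendar_and_week_year is hypothetical.

```python
from datetime import date


def calendar_and_week_year(d):
    """Return (calendar year, ISO week-based year) for a date."""
    # isocalendar() yields (ISO year, ISO week number, ISO weekday).
    return d.year, d.isocalendar()[0]


print(calendar_and_week_year(date(2018, 12, 4)))   # (2018, 2018)
print(calendar_and_week_year(date(2018, 12, 31)))  # (2018, 2019)
```

Monday, 31 December 2018 already belongs to week 1 of 2019, so the two notions of year disagree near year boundaries.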
Adding a working example with index mapping, data, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"data": {
"properties": {
"reportDate": {
"type": "date",
"format": "M/d/yyyy"
}
}
}
}
}
}
Index Data:
{
"data": {
"reportDate": "12/3/2018"
}
}
{
"data": {
"reportDate": "12/4/2018"
}
}
{
"data": {
"reportDate": "12/5/2018"
}
}
Search Query:
{
"query": {
"bool": {
"must": {
"range": {
"data.reportDate": {
"lte": "12/4/2018",
"gte": "12/4/2018"
}
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "65312594",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"data": {
"reportDate": "12/4/2018"
}
}
}
]

Using a Kibana view query from application

I used the following filter and then searched for query string using Lucene to get the view that I was looking for.
{
"query": {
"match": {
"eventSource": {
"query": "ec2.amazonaws.com",
"type": "phrase"
}
}
}
}
I do not want to return event names that start with the word Describe or Get. All other event names from the ec2 event source should be returned.
!(eventName.keyword: Describe* OR eventName.keyword: Get* )
The question is how to combine these 2 search requests into one?
I need to use that query from my application.
Update:
The Inspect menu of Kibana Discover tab generates this query. I am just trying to rewrite query_string part with usual match or match_phrase using boolean OR clause.
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "!(eventName.keyword: Describe* OR eventName.keyword: Get* )",
"analyze_wildcard": true
}
},
{
"match_phrase": {
"eventSource": {
"query": "ec2.amazonaws.com"
}
}
},
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2020-07-09T08:39:15.947Z",
"lte": "2020-07-24T08:39:15.947Z"
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
}
You can use the bool query's must_not clause to exclude the documents you don't want in your search results. You can add as many must_not clauses as you need, and it can all be done in a single query.
Please refer to the example in the same link for more info. I created a sample locally to show you the correct query. Note that instead of a wildcard I am using the prefix query, which is better and serves your use case.
Create index mapping
{
"mappings": {
"properties": {
"eventName": {
"type": "keyword"
}
}
}
}
Index sample doc
{
"eventName" : "Describe the events"
}
{
"eventName" : "the Describe events"
}
{
"eventName" : "Get the event"
}
{
"eventName" : "event Get"
}
Now the search query to get only the 2nd and 4th docs, according to your requirement:
{
"query": {
"bool": {
"must_not": [
{
"prefix": {
"eventName": "Desc"
}
},
{
"prefix": {
"eventName": "Get"
}
}
]
}
}
}
Search result
"hits": [
{
"_index": "ngramkey",
"_type": "_doc",
"_id": "2",
"_score": 0.0,
"_source": {
"eventName": "the Describe events"
}
},
{
"_index": "ngramkey",
"_type": "_doc",
"_id": "4",
"_score": 0.0,
"_source": {
"eventName": "event Get"
}
}
]
As suggested by the user "Opster Elasticsearch Ninja", I have merged the must_not boolean query like this:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must_not": [
{
"prefix": {
"eventName.keyword": "Desc"
}
},
{
"prefix": {
"eventName.keyword": "Get"
}
}
]
}
},
{
"match_phrase": {
"eventSource": {
"query": "ec2.amazonaws.com"
}
}
},
{
"range": {
"@timestamp": {
"format": "strict_date_optional_time",
"gte": "2020-07-09T08:39:15.947Z",
"lte": "2020-07-24T08:39:15.947Z"
}
}
}
],
"filter": [],
"should": [],
"must_not": []
}
}
}
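Since the question asks for a query that can be used from an application, here is a minimal Python sketch that builds an equivalent body. It is a flattened variant, not the exact query above: the exclusions sit in the top-level must_not and the other conditions in filter, which skips scoring. The function name build_event_query is hypothetical, and the time field is assumed to be the Kibana default "@timestamp".

```python
def build_event_query(source, excluded_prefixes, start, end):
    """Match one event source in a time window, excluding some event-name prefixes."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {"eventSource": source}},
                    {
                        "range": {
                            # Assumption: "@timestamp" is the time field name.
                            "@timestamp": {
                                "format": "strict_date_optional_time",
                                "gte": start,
                                "lte": end,
                            }
                        }
                    },
                ],
                # One prefix clause per excluded event-name prefix.
                "must_not": [
                    {"prefix": {"eventName.keyword": prefix}}
                    for prefix in excluded_prefixes
                ],
            }
        }
    }


query = build_event_query(
    "ec2.amazonaws.com",
    ["Describe", "Get"],
    "2020-07-09T08:39:15.947Z",
    "2020-07-24T08:39:15.947Z",
)
```

Prefix queries on a keyword field are case-sensitive by default, so "Describe" and "describe" are treated as different prefixes.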
