Is it ok to use only filter query in elastic search - elasticsearch

I have to query Elasticsearch for some data, and all my filters are drop-down values, so they are exact matches only. I thought of using only a filter clause, without any must or match query. Is there any problem with this approach?
In the example below, I am trying to get the last 15 minutes of data where l1 is any one of ("XYZ", "CFG") and l2 is any one of ("ABC", "CDE").
My query looks like this:
{
"size": 20,
"sort": [
{
"eventTs": "desc"
}
],
"query": {
"bool": {
"filter": [
{
"range": {
"eventTs": {
"gte": "now-15m",
"lte": "now",
"format": "epoch_millis",
"boost": 1
}
}
},
{
"terms": {
"l1": [
"XYZ","CFG"
]
}
},
{
"terms": {
"l2":[
"ABC","CDE"
]
}
}
]
}
}
}

If you don't need _score, which ranks the matching documents by relevance, you can use filter alone. Filter clauses run faster because score calculation is skipped, and frequently used filters can be cached as well.
For an in-depth understanding of these concepts, read the Elasticsearch documentation on query and filter context.
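As a minimal sketch of the filter-only approach, the Python helper below builds the same kind of request body from drop-down selections. The name build_filter_query and its parameters are hypothetical; the result is a plain dict that would be sent as the _search request body by whichever client is in use.

```python
import json


def build_filter_query(selections, time_field="eventTs", window="15m", size=20):
    """Build a filter-only bool query (hypothetical helper, not a library API).

    selections maps a field name to the list of exact values allowed for it.
    """
    # Every clause goes into "filter", so no relevance score is computed.
    filters = [{"range": {time_field: {"gte": f"now-{window}", "lte": "now"}}}]
    for field, values in selections.items():
        if values:  # skip drop-downs with nothing selected
            filters.append({"terms": {field: list(values)}})
    return {
        "size": size,
        "sort": [{time_field: "desc"}],
        "query": {"bool": {"filter": filters}},
    }


body = build_filter_query({"l1": ["XYZ", "CFG"], "l2": ["ABC", "CDE"]})
print(json.dumps(body, indent=2))
```

Skipping empty selections means an unused drop-down simply adds no terms clause instead of matching nothing.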

Related

Elasticsearch:How to fetch all the records with most recent entry at first

How can I get all the records from Elasticsearch with the most recent entry first?
For example, if I have 5 libraries with ids from 1 to 5, how can I get the complete list of books from library 5, sorted with the latest book entered first?
Here is my sample query, which contains nested fields:
http://localhost:9200/library*/_search
{
"size": 1000,
"_source": [
"library.bookname","library.author"
],
"query": {
"bool": {
"must": [
{
"match": {
"library.id": 5
}
}
]
}
}
}
You can use the sort option in the request body. To sort by the latest book entered, set the order option to desc. For this, you need a timestamp field, or a similar field whose values reflect the order in which documents were entered.
Let's say you have a date field named timestamp that records when each book was entered; then you can sort the result like this:
"sort": { "timestamp": { "order": "desc" } }
So your sample query will look like:
http://localhost:9200/library*/_search
{
"size": 1000,
"sort": { "timestamp": { "order": "desc" } },
"_source": [
"library.bookname","library.author"
],
"query": {
"bool": {
"must": [
{
"match": {
"library.id": 5
}
}
]
}
}
}
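If the request body is assembled in code, the sort clause can be added without touching the rest of the query. This is only a sketch: the helper name with_latest_first is hypothetical, and it assumes the index really has a sortable date field called timestamp.

```python
import copy


def with_latest_first(body, date_field="timestamp"):
    """Return a copy of a search body sorted newest-first on date_field."""
    sorted_body = copy.deepcopy(body)  # leave the caller's dict untouched
    sorted_body["sort"] = [{date_field: {"order": "desc"}}]
    return sorted_body


original = {
    "size": 1000,
    "_source": ["library.bookname", "library.author"],
    "query": {"bool": {"must": [{"match": {"library.id": 5}}]}},
}
latest_first = with_latest_first(original)
```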

Give higher priority to specific ranges in elasticsearch

Here is a sample document from my index
{
"name" : "Neil Buckinson",
"insuranceType" : "personal",
"premiumAmount": 4000,
"age": 36
}
I want to give the documents with premium amount 3500 to 4500 more priority than others. How can I do that in elasticsearch?
I would say a better approach would be a function_score query with filter functions:
{
"query": {
"function_score": {
"functions": [
{
"weight": 100,
"filter": {
"range": {
"premiumAmount": {
"gte": 3500,
"lte": 4500
}
}
}
}
]
}
}
}
That's quite simple. Just use a bool query and add your condition in a should clause.
The bool query takes a more-matches-is-better approach, so the score
from each matching must or should clause will be added together to
provide the final _score for each document.
GET /index/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"should": [
{
"range": {
"premiumAmount": {
"gte": 3500,
"lte": 4500,
"boost": 2
}
}
}
]
}
}
}
Just place your current query in the must clause instead of match_all.
should means that the condition is optional, but a document that matches it gets a higher score.
That's exactly what you're looking for.
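A small sketch of the same idea in Python: it wraps any existing query in a bool query and adds the optional, boosted range as a should clause. The helper name boost_range and the sample base query are hypothetical; the boost is set inside the field object, which is the form the range query documentation shows.

```python
def boost_range(base_query, field, low, high, boost=2.0):
    """Wrap base_query so documents with field in [low, high] score higher."""
    return {
        "bool": {
            # The original query still decides which documents match.
            "must": [base_query],
            # The range is optional; matching it only adds to the score.
            "should": [
                {"range": {field: {"gte": low, "lte": high, "boost": boost}}}
            ],
        }
    }


query = boost_range(
    {"match": {"insuranceType": "personal"}}, "premiumAmount", 3500, 4500
)
```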

Elasticsearch: Limit filtered query to 5 items per type per day

I'm using Elasticsearch to gather data for the front page of my event portal. The current query is as follows:
{
"query": {
"function_score": {
"filter": {
"and": [
{
"geo_distance": {
"distance": "50km",
"location": {
"lat": 50.78,
"lon": 6.08
},
"_cache": true
}
},
{
"or": [
{
"and": [
{
"term": {
"type": "event"
}
},
{
"range": {
"datetime": {
"gt": "now"
}
}
}
]
},
{
"not": {
"term": {
"type": "event"
}
}
}
]
}
]
},
"functions": [
...
]
}
}
}
So basically all items within a 50km distance that are either future events or other types. Other types could be status, photo, video, soundcloud, etc. All these items have a datetime field and a parent field that indicates which account the item belongs to. There are some functions after the filter for scoring objects based on their distance and age.
Now my question:
Is there a way to filter the query to get only the first (or even better, the highest-scored) 5 items per type per account per day?
Currently I have accounts which upload 20 images at the same time. This is too much to display on the front page.
I thought about using filter scripts in a post_filter, but I am not very familiar with this topic.
Any ideas?
many thanks in advance
DTFagus
I solved it this way:
"aggs": {
"byParent": {
"terms": {
"field": "parent_id"
},
"aggs": {
"byType": {
"terms": {
"field": "type"
},
"aggs": {
"perDay": {
"date_histogram" : {
"field" : "datetime",
"interval": "day"
},
"aggs": {
"topHits": {
"top_hits": {
"size": 5,
"_source": {
"include": ["path"]
}
}
}
}
}
}
}
}
}
}
Unfortunately there is no pagination for aggregations (or the other way around: the pagination of the query is not applied to them). So I will get the paginated query results and the aggregation over all hits, and intersect the two arrays in JS. That does not sound very efficient, but I currently have no better idea. Anyone?
The only way around this that I see would be to index all data into two indices: one containing all data and one with only the top 5 per day per type per account. This would be less time-consuming to query, but more time- and storage-consuming while indexing :/
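The intersection step described above could look roughly like this. It is a sketch in Python rather than JS, the helper names are hypothetical, and it assumes the response uses the aggregation names from the request (byParent, byType, perDay, topHits) with the usual buckets and hits.hits layout.

```python
def top_hit_ids(aggregations):
    """Collect the _id of every top_hits document in the nested buckets."""
    ids = set()
    for parent in aggregations["byParent"]["buckets"]:
        for type_bucket in parent["byType"]["buckets"]:
            for day in type_bucket["perDay"]["buckets"]:
                for hit in day["topHits"]["hits"]["hits"]:
                    ids.add(hit["_id"])
    return ids


def keep_top_hits(page_hits, aggregations):
    """Keep only the paginated hits that also appear in the top_hits buckets."""
    allowed = top_hit_ids(aggregations)
    return [hit for hit in page_hits if hit["_id"] in allowed]


# Made-up sample data shaped like a search response, for illustration only.
sample_aggs = {
    "byParent": {"buckets": [{
        "key": "acc1",
        "byType": {"buckets": [{
            "key": "photo",
            "perDay": {"buckets": [{
                "key_as_string": "2014-05-01",
                "topHits": {"hits": {"hits": [{"_id": "a"}, {"_id": "b"}]}},
            }]},
        }]},
    }]}
}
page = [{"_id": "a"}, {"_id": "c"}, {"_id": "b"}]
filtered = keep_top_hits(page, sample_aggs)
```

Collecting the ids into a set first keeps the filtering pass linear in the page size, and the page order (and therefore the score order) is preserved.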
You can limit the number of results returned by your query using the "size" parameter. If you set size to 5, you will get the first 5 results returned by your query.
Check the documentation: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/pagination.html
Hope this helps!

search for a certain text between within a range of a certain timestamp with Elasticsearch

I have worked with Elasticsearch and have done some research on the Internet on how to query data containing a certain text and how to query data within a timestamp range, using the Elasticsearch PHP client API. Now I would like to combine these two queries into one: search for a certain text within a certain timestamp range. Can someone please tell me how to do that using the Elasticsearch PHP client API? Thanks in advance! I have searched the Internet but still cannot combine these two queries into one :-(
Here is an example of a bool query. The logic is that the record must fall within the date range and should also contain the text in the textfield field; the should clause raises the score of matching records without being required. Alternatively, you could put both query conditions in the must clause.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"must": [
{
"range": {
"datefield": {
"gte": "from",
"lte": "to"
}
}
}
],
"should": [
{
"match": {
"textfield": {
"query": "Name",
"boost": 10
}
}
}
]
}
}
}
UPDATE - or, if the record must match both conditions:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"must": [
{
"range": {
"datefield": {
"gte": "from",
"lte": "to"
}
}
},
{
"match": {
"textfield": {
"query": "Name",
"boost": 10
}
}
}
]
}
}
}
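The request body is the same JSON whichever client sends it, so the combination can be sketched as a plain dict. The version below is in Python, not PHP, and is a variation on the answer rather than a copy of it: the text match stays in must, while the date range moves to a filter clause because it does not need to contribute to the score. This assumes a version whose bool query accepts filter; the helper name and the sample dates are hypothetical.

```python
def text_within_range(text_field, text, date_field, start, end, size=20):
    """Build a search body: text must match, dates only restrict the set."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                # Scored clause: how well the text matches drives relevance.
                "must": [{"match": {text_field: {"query": text}}}],
                # Unscored clause: only documents inside the range qualify.
                "filter": [{"range": {date_field: {"gte": start, "lte": end}}}],
            }
        },
    }


body = text_within_range(
    "textfield", "Name", "datefield", "2015-01-01", "2015-01-31"
)
```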

ElasticSearch Delete Query - Filter with term and range

I have the following query that I am trying to use to delete data from an ElasticSearch index.
{
"filter": {
"and": [
{
"range": {
"Time": {
"from": "20120101T000000",
"to": "20120331T000000"
}
}
},
{
"term": {
"Source": 1
}
}
]
}
}
I have tried to delete documents based on this query. This query returns results fine when I run it against the index. But when I try to run a delete command against the index, nothing happens.
I am not sure whether I am constructing the query wrong or whether something else is the problem.
You're using only a filter while the delete by query API probably needs a query. You can convert your filter to a query using a filtered query like this:
{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter": {
"and": [
{
"range": {
"Time": {
"from": "20120101T000000",
"to": "20120331T000000"
}
}
},
{
"term": {
"Source": 1
}
}
]
}
}
}
Otherwise you could convert your filter to a query using a bool query with two must clauses, so that you don't need a filtered query anymore. Anyway, I guess the filter approach is better since filters are faster.
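The bool alternative with two must clauses might be built like this. It is a sketch only: the helper name is hypothetical, and gte/lte are used in place of from/to on the assumption that both bounds are meant to be inclusive.

```python
def bool_delete_query(time_field, start, end, source_field, source_value):
    """Express the range + term filter as a bool query with two must clauses."""
    return {
        "bool": {
            "must": [
                {"range": {time_field: {"gte": start, "lte": end}}},
                {"term": {source_field: source_value}},
            ]
        }
    }


query = bool_delete_query(
    "Time", "20120101T000000", "20120331T000000", "Source", 1
)
```

Running the same query as a search first, and checking the hit count, is a cheap way to confirm what a delete would remove.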
