Delete by Query with Sort in Elasticsearch - elasticsearch

I want to delete the most recent item in my Elasticsearch index, sorted by myDateField, which is a date type. Is that possible? I want something like this query, but it would delete all matching items even though I have size set to 1.
{
  "query" : {
    "match_all" : {}
  },
  "size" : "1",
  "sort" : [
    {
      "myDateField" : {
        "order" : "desc"
      }
    }
  ]
}

Delete by query does not appear to support sorting.
If you try it with delete by query, you get the error: request does not support [sort]. I couldn't find any documentation stating that the "sort" parameter is unsupported in delete by query.
I have one idea for how to do it, though I don't know whether it's the best way:
Step 1: Run a normal search with your conditions and sorting, and collect the ids of the matching documents.
Step 2: Build a bulk request that deletes, by id, the documents you retrieved in Step 1.
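The two steps above could be sketched roughly as follows. This is only a sketch in plain Python that builds the request bodies and sends nothing: the function names, the sample hit, and the index name my_index are made up for illustration, and the bulk action shape assumes a recent Elasticsearch version without mapping types (older versions would also need a _type in each action).

```python
import json


def build_latest_search_body(date_field, size=1):
    """Step 1: search body that returns the newest `size` documents by date_field."""
    return {
        "query": {"match_all": {}},
        "size": size,
        "sort": [{date_field: {"order": "desc"}}],
        "_source": False,  # only the hit metadata (_index, _id) is needed
    }


def build_bulk_delete_payload(hits):
    """Step 2: turn search hits into an NDJSON body for the _bulk endpoint.

    Each delete action is one JSON line, and the body must end with a newline.
    """
    lines = [
        json.dumps({"delete": {"_index": hit["_index"], "_id": hit["_id"]}})
        for hit in hits
    ]
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical hit, shaped like response["hits"]["hits"] from the Step 1 search.
sample_hits = [{"_index": "my_index", "_id": "42", "_score": None, "sort": [1700000000000]}]

search_body = build_latest_search_body("myDateField")
bulk_payload = build_bulk_delete_payload(sample_hits)
print(json.dumps(search_body))
print(bulk_payload, end="")
```

With a single hit, a plain delete of that one document by id would work just as well; the bulk body only pays off when Step 1 returns several documents.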

Related

Is it possible to check that specific data matches the query without loading it to the index?

Imagine that I have a specific data string and a specific query. The simple way to check that the query matches the data is to load the data into the Elastic index and run the online query. But can I do it without putting it into the index?
Maybe there are some open-source libraries that implement the Elasticsearch functionality offline, so I can call something like getScore(data, query)? Or is it possible to implement this using specific API endpoints?
Thanks in advance!
What you can do is to leverage the percolator type.
What this allows you to do is to store the query instead of the document and then test whether a document would match the stored query.
For instance, you first create an index with a field of type percolator that will contain your query (you also need to add in the mapping any field used by the query so ES knows what their types are):
PUT my_index
{
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "message": {
        "type": "text"
      }
    }
  }
}
Then you can index a real query, like this:
PUT my_index/_doc/match_value
{
  "query" : {
    "match" : {
      "message" : "bonsai tree"
    }
  }
}
Finally, you can use the percolate query to check whether the query you've just stored would match a given document:
GET /my_index/_search
{
  "query" : {
    "percolate" : {
      "field" : "query",
      "document" : {
        "message" : "A new bonsai tree in the office"
      }
    }
  }
}
So all you need to do is to only store the query (not the documents), and then you can use the percolate query to check if the documents would have been selected by the query you stored, without having to store the documents themselves.

Elasticsearch - How to delete a list of documents?

I have an array of _id values.
On this page I found out how to retrieve a list of documents from it:
GET ads/_mget
{
  "ids": [
    "586213440e7d2c7f10fe2574",
    "586213440e7d2c7f10fe2575",
    "586213450e7d2c7f10fe2576",
    "586213450e7d2c7f10fe2577"
  ]
}
This works and returns a list of 4 full documents, as expected.
(sidenote)
I find it weird to have to write "ids" in the query, when it actually acts on the "_id" field.
(end sidenote)
Now I can't figure out how to DELETE these documents from the same _id list.
I tried DELETE ads/_mget but I get an error: No handler found for uri [/ads/_mget] and method [DELETE]
I tried _mdelete instead of _mget, but it doesn't seem to exist.
I also tried:
DELETE ads
{
  "ids": [
    "586213440e7d2c7f10fe2574",
    "586213440e7d2c7f10fe2575",
    "586213450e7d2c7f10fe2576",
    "586213450e7d2c7f10fe2577"
  ]
}
...but this... just deletes EVERYTHING and I have to reindex the database.
You can always use the Delete By Query API and supply a payload like:
POST ads/_delete_by_query
{
  "query" : {
    "terms" : {
      "_id" : [
        "586213440e7d2c7f10fe2574",
        "586213440e7d2c7f10fe2575",
        "586213450e7d2c7f10fe2576",
        "586213450e7d2c7f10fe2577"
      ]
    }
  }
}
For more information about the terms query, see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
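When the id list comes from application code, the payload above can be generated instead of written by hand. The following is a minimal sketch in plain Python that only builds the request bodies. The helper name and the chunk size are illustrative assumptions, not part of any Elasticsearch client; splitting into chunks is simply a way to keep each terms query small when the list is long.

```python
import json


def build_delete_by_ids_bodies(ids, chunk_size=1000):
    """Yield one _delete_by_query body per chunk of unique ids."""
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep first-seen order
    for start in range(0, len(unique_ids), chunk_size):
        chunk = unique_ids[start:start + chunk_size]
        yield {"query": {"terms": {"_id": chunk}}}


ids = [
    "586213440e7d2c7f10fe2574",
    "586213440e7d2c7f10fe2575",
    "586213450e7d2c7f10fe2576",
    "586213450e7d2c7f10fe2577",
]

# chunk_size=3 is used only to make the chunking visible with four ids.
bodies = list(build_delete_by_ids_bodies(ids, chunk_size=3))
for body in bodies:
    # Each body would be sent as: POST ads/_delete_by_query
    print(json.dumps(body))
```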

Ordering term aggregation buckets by sub-aggregation result values

I have two questions about the query seen on this capture:
1) How do I order by the value of the sum_category field in the results? I use respsize again in the query, but it's not correct, as you can see below.
2) Even though I only make an aggregation, why do all the documents come back with the result? If I run a GROUP BY query in SQL, it returns only the grouped data, but Elasticsearch returns all documents as if I had made a normal search query. How do I skip them?
Try this:
{
  "query" : {
    "match_all" : {}
  },
  "size" : 0,
  "aggs" : {
    "categories" : {
      "terms" : {
        "field" : "category",
        "size" : 999999,
        "order" : {
          "sum_category" : "desc"
        }
      },
      "aggs" : {
        "sum_category" : {
          "sum" : {
            "field" : "respsize"
          }
        }
      }
    }
  }
}
1). See the note in (2) for what your sort is doing. To order the categories by the value of sum_category, see the order portion. There appears to be an old, closed issue related to that (https://github.com/elastic/elasticsearch/issues/4643), but it worked fine for me with v1.5.2 of Elasticsearch.
2). Although you do not specify a match_all query, I think that's probably what you are getting results for, so the sort you specified is actually being applied to those results. To not get them back, I added the size: 0 portion.
Do you want buckets for all the categories? I noticed you do not have a size specified for the main aggregation. That's what the size: 999999 portion is for.
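To make the intent of the order portion concrete, here is a small plain-Python model of what the request asks for: group by category, sum respsize per group, and sort the groups by that sum in descending order. It is only an illustration of the semantics, not how Elasticsearch computes it, and the documents and function name are made up.

```python
from collections import defaultdict


def terms_ordered_by_sum(docs, group_field, sum_field, size=10):
    """Group docs by group_field, sum sum_field per group, sort by that sum descending."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        key = doc[group_field]
        totals[key] += doc[sum_field]
        counts[key] += 1
    buckets = [
        {"key": key, "doc_count": counts[key], "sum_category": {"value": totals[key]}}
        for key in totals
    ]
    buckets.sort(key=lambda b: b["sum_category"]["value"], reverse=True)
    return buckets[:size]


# Made-up documents; only the field names come from the question.
docs = [
    {"category": "images", "respsize": 500},
    {"category": "html", "respsize": 120},
    {"category": "images", "respsize": 250},
    {"category": "css", "respsize": 300},
    {"category": "html", "respsize": 30},
]

buckets = terms_ordered_by_sum(docs, "category", "respsize")
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["sum_category"]["value"])
```

Note that the buckets come back ordered by the summed value rather than by doc_count, which is the default ordering the question was running into.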

Queries vs Filters - Order of execution

I've read this question, and a colleague of mine made me doubt my understanding:
In a filtered query, when is the filter applied? Before or after executing the query? When is the result cached?
If the filter is applied beforehand, wouldn't it be a good thing to duplicate the query part in the filters?
If the filter is applied afterward, then I'm having trouble understanding what is cached.
Luckily, ES provides two types of filters for you to work with:
{
  "query" : {
    "field" : { "title" : "Catch-22" }
  },
  "filter" : {
    "term" : { "year" : 1961 }
  }
}
{
  "query": {
    "filtered" : {
      "query" : {
        "field" : { "title" : "Catch-22" }
      },
      "filter" : {
        "term" : { "year" : 1961 }
      }
    }
  }
}
In the first case, filters are applied to all documents found by the query. In the second case, the documents are filtered before the query runs. This yields better performance.
Quoted from: http://www.packtpub.com/elasticsearch-server-for-fast-scalable-flexible-search-solution/book
As for the cache, I'm not sure how the caching mechanism for filters works.
My guess would be:
First case: since the filter runs against the set of results returned by the query, the cache is specific to that result set.
Second case: the filter is applied first and its cache is stored for the indices you searched, so it is more reusable because it does not depend on the content of the query, but it comes at a larger memory cost and a slower first query (before the cache is generated).
Let me explain how a search query is executed.
First, there is always a complete set of documents that you want to search.
If a filter is included with the search query, it simply makes that set smaller; in other words, a filter acts as a cached result for that same filter.
Your query text is then run against this smaller set.
As for your doubt: duplicating the query in the filters would only add overhead to the cache mechanism. There are many guidelines on what to include in a filter and what to leave out; it all comes down to relevancy.
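One note for readers on newer versions: the filtered query used in the quoted example was deprecated in Elasticsearch 2.0 and removed in 5.0, and the same intent is now usually written as a bool query with a filter clause. Below is a minimal sketch of that rewrite in plain Python. The helper name is made up, and match stands in for the older field query from the quote, which is an assumption on my part rather than an exact equivalent.

```python
import json


def filtered_to_bool(body):
    """Rewrite {"query": {"filtered": {"query": q, "filter": f}}} as a bool query."""
    filtered = body["query"]["filtered"]
    bool_query = {"filter": filtered["filter"]}  # non-scoring, cacheable condition
    if "query" in filtered:
        bool_query["must"] = filtered["query"]  # scoring part of the request
    rewritten = {key: value for key, value in body.items() if key != "query"}
    rewritten["query"] = {"bool": bool_query}
    return rewritten


old_body = {
    "query": {
        "filtered": {
            # match is used here in place of the older "field" query (assumption)
            "query": {"match": {"title": "Catch-22"}},
            "filter": {"term": {"year": 1961}},
        }
    }
}

new_body = filtered_to_bool(old_body)
print(json.dumps(new_body, indent=2))
```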

Register and call query in ElasticSearch

Is it possible to register queries (like the percolate process) and call them by name to execute them?
I am building an application that lets the user save a search query associated with a label. I would like to save the query generated by the filter in ES.
If I save the query in an index, I have to call ES first to retrieve the query, extract the field containing the query, and then call ES again to execute it. Can I do it in one call?
The other solution is to register queries (labels) with _percolator, using an identifier of the user:
/_percolate/transaction/user1_label1
{
  "userId": "user1",
  "query": {
    "term": { "field1": "foo" }
  }
}
and when there is a new document, use the percolator in non-indexing mode (filtered by userId) to retrieve which queries match, then update the document by adding a field "label":["user1_label1", "user1_label2"], and finally index the document. So the labelling is done at indexing time.
What do you think?
Thanks in advance.
Try Filter Aliases.
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    {
      "add" : {
        "index" : "the_real_index",
        "alias" : "user1",
        "filter" : { "term" : { "field1" : "foo" } }
      }
    }
  ]
}'

Resources