Elasticsearch function for deleting all records except a specific set - elasticsearch

I want to delete all records from a given index except for a specific set. Is there a function for this in Elasticsearch?
I tried the delete_by_query API but could not get it to work as desired. Below is a snippet of what I tried. I basically want to pass an array of ids instead of only one id at a time.
POST /myindex/_delete_by_query
{
"query": {
"bool": {
"must_not": [
{
"match": {
"id": {
"query": [12345,67890]
}
}
}
]
}
}
}
I am new to Elasticsearch, but in SQL terms I want something like the following query:
DELETE FROM <my-index> WHERE <id> NOT IN (<listOfIds>)

Good start! You can do it the way you suggest, using a terms query, which accepts an array of values:
POST /myindex/_delete_by_query
{
"query": {
"bool": {
"must_not": [
{
"terms": {
"id": [
12345,
67890
]
}
}
]
}
}
}
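
If the list of ids comes from application code, the request body can be built programmatically. The following is a minimal sketch in plain Python; the helper name delete_all_except and the guard against an empty list are assumptions added for illustration, not part of any Elasticsearch client. The resulting dictionary would be sent as the body of POST /myindex/_delete_by_query.

```python
import json


def delete_all_except(ids, field="id"):
    """Build a _delete_by_query body matching every document whose
    `field` is NOT one of `ids` (hypothetical helper)."""
    ids = list(ids)
    if not ids:
        # must_not with an empty terms list would match, and delete, everything.
        raise ValueError("refusing to build a query that would delete every document")
    return {"query": {"bool": {"must_not": [{"terms": {field: ids}}]}}}


body = delete_all_except([12345, 67890])
print(json.dumps(body, indent=2))
```

It is usually worth running the same query against _search first to check which documents would be removed.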

Related

ElasticSearch / OpenSearch term search with logical OR

I have been scratching my head for a while looking at the OpenSearch documentation and Stack Overflow questions. How can I do something like this:
Select documents WHERE studentId in [1234, 5678] OR applicationId in [2468, 1357].
As long as studentId exactly matches one of the supplied values, or applicationId exactly matches one of the supplied values, then that document should be included in the response.
When I want to search for multiple values for a single field and get an exact match the following works:
{
"must":[
{
"terms": {
"studentId":["1234", "5678"]
}
}
]
}
This will find me exact matches on studentId in [1234, 5678].
If I try to add the condition to also look for (logical or) applicationId in [2468, 1357] then the following will not work:
{
"must":[
{
"terms": {
"studentId":["1234", "5678"]
}
},
{
"terms": {
"applicationId":["2468", "1357"]
}
}
]
}
because this will do a logical and on the two queries. I want logical or.
I cannot use should because this returns irrelevant results. The following does not work for me:
{
"should":[
{
"terms": {
"studentId":["1234", "5678"]
}
},
{
"terms": {
"applicationId":["2468", "1357"]
}
}
]
}
This seems to return all results, ranked by relevance. I find that the returned results do not actually match, despite the fact that this is a terms search.
Can you try the following query? It wraps each terms clause in its own bool query inside a top-level should:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"terms": {
"studentId":["1234", "5678"]
}
}
]
}
},
{
"bool": {
"must": [
{
"terms": {
"applicationId":["2468", "1357"]
}
}
]
}
}
]
}
}
}
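
The same "field A in [...] OR field B in [...]" shape can be generated from a mapping of field names to allowed values. This is a sketch under assumptions: the helper name any_field_in is hypothetical, and minimum_should_match is set explicitly to 1 so that the "at least one clause must match" requirement does not depend on defaults.

```python
def any_field_in(field_values):
    """Build a bool query that matches documents where ANY of the given
    fields exactly matches one of its listed values (hypothetical helper)."""
    should = [
        {"terms": {field: list(values)}}
        for field, values in field_values.items()
    ]
    return {
        "query": {
            "bool": {
                "should": should,
                # Require at least one should clause to match (logical OR).
                "minimum_should_match": 1,
            }
        }
    }


or_query = any_field_in({
    "studentId": ["1234", "5678"],
    "applicationId": ["2468", "1357"],
})
```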

Elastic search query using python list

How do I pass a list as query string to match_phrase query?
This works:
{"match_phrase": {"requestParameters.bucketName": {"query": "xxx"}}},
This does not:
{
"match_phrase": {
"requestParameters.bucketName": {
"query": [
"auditloggingnew2232",
"config-bucket-123",
"web-servers",
"esbck-essnap-1djjegwy9fvyl",
"tempexpo",
]
}
}
}
match_phrase simply does not support multiple values.
You can either use a should query:
GET _search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"requestParameters.bucketName": {
"query": "auditloggingnew2232"
}
}
},
{
"match_phrase": {
"requestParameters.bucketName": {
"query": "config-bucket-123"
}
}
}
]
},
...
}
}
or, as @Val pointed out, a terms query:
{
"query": {
"terms": {
"requestParameters.bucketName": [
"auditloggingnew2232",
"config-bucket-123",
"web-servers",
"esbck-essnap-1djjegwy9fvyl",
"tempexpo"
]
}
}
}
that functions like an OR on exact terms.
I'm assuming that 1) the bucket names in question are unique and 2) that you're not looking for partial matches. If that's the case, plus if there are barely any analyzers set on the field bucketName, match_phrase may not even be needed! terms will do just fine. The difference between term and match_phrase queries is nicely explained here.
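
Since the question starts from a Python list, here is a minimal sketch that turns such a list into either of the two query shapes above. The function name bucket_query and its exact flag are assumptions for illustration; the output is a plain dictionary to pass as the search body.

```python
def bucket_query(field, values, exact=True):
    """Build a query matching any of `values` on `field` (hypothetical helper).

    exact=True  -> a single terms query (OR on exact terms)
    exact=False -> one match_phrase clause per value inside bool/should
    """
    values = list(values)
    if exact:
        return {"query": {"terms": {field: values}}}
    clauses = [{"match_phrase": {field: {"query": v}}} for v in values]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


buckets = ["auditloggingnew2232", "config-bucket-123", "web-servers"]
terms_body = bucket_query("requestParameters.bucketName", buckets)
phrase_body = bucket_query("requestParameters.bucketName", buckets, exact=False)
```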

How To Combine Multiple Queries In ElasticSearch

I am encountering an issue trying to correctly combine elastic search queries, in SQL my query would look something like this:
Select * from ABPs where (PId = 10 and PUId = 1130) or (PId = 30 and PUId = 2000) or (PlayerID = '12345')
I can achieve each of these by themselves and get correct results.
Query A) (PId = 10 and PUId = 1130)
translates to
{
"query": {
"bool": {
"must": [
{
"term": {
"PId": "1366"
}
},
{
"term": {
"PUId": "10"
}
}
]
}
}
}
Query B) (PId = 30 and PUId = 2000)
translates the same as above just with different values
Query C) (PlayerID = '12345')
translates to
{
"query": {
"match": {
"PlayerUuid": "62fe0832-7881-477c-88bb-9cbccdbfb3c3"
}
}
}
I have been trying to figure out how to get all of these into the same ES search query and I am just not having any luck at all and was hoping someone with more extensive ES experience would be able to give me a hand.
You can use a bool query with should (logical OR) and must (logical AND) clauses.
Below is the ES query representation of the clause Select * from ABPs where (PId = 10 and PUId = 1130) or (PId = 30 and PUId = 2000) or (PlayerID = '12345')
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"PId": "10"
}
},
{
"term": {
"PUId": {
"value": "1130"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"PId": "30"
}
},
{
"term": {
"PUId": "2000"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"PlayerId": "12345"
}
}
]
}
}
]
}
}
}
Note that I'm assuming the fields PId, PUId and PlayerId are all of type keyword.
Wrap all your queries into a should-clause of a bool-query which you put in the filter-clause of another top-level bool-query.
Pseudo-code (as I’m typing on a cell phone):
"bool": {
"filter": {
"bool": {
"should": [
{query1},
{query2},
{query3}
]
}
}
}
A bool query made up of only should clauses requires that at least one of the should clauses matches (minimum_should_match defaults to 1 in such a scenario).
Update with the actual query (additional explanation):
POST <your_index_name>/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"should": [
{"bool": {"must": [ {"term": {"PId": "10"}},{"term": {"PUId": "1130"}} ]}},
{"bool": {"must": [ {"term": {"PId": "30"}},{"term": {"PUId": "2000"}} ]}},
{"term": {"PlayerId": "12345"}}
]
}
}
}
}
}
The example above wraps your actual bool query in the filter clause of another top-level bool query to follow best practices and get better performance: whenever you don't care about the score, especially for exact-match queries, you should put them into filter clauses. For those queries Elasticsearch does not calculate a score and can therefore potentially cache the results for even better performance.
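
For a variable number of "(a AND b) OR (c AND d) OR ..." groups, the nested structure can be generated rather than written by hand. The sketch below is an illustration in plain Python; the helper name or_of_ands and its input format (one dict of field-to-value pairs per AND group) are assumptions, and it presumes keyword fields as noted above.

```python
def or_of_ands(groups):
    """Build a filtered bool query from a list of AND groups joined by OR.

    Each group is a dict of field -> exact value (hypothetical helper).
    """
    should = []
    for group in groups:
        if not group:
            raise ValueError("each AND group needs at least one field")
        terms = [{"term": {field: value}} for field, value in group.items()]
        # A single condition needs no bool wrapper; several are ANDed with must.
        should.append(terms[0] if len(terms) == 1 else {"bool": {"must": terms}})
    return {
        "query": {
            "bool": {
                "filter": {
                    "bool": {"should": should, "minimum_should_match": 1}
                }
            }
        }
    }


abp_query = or_of_ands([
    {"PId": "10", "PUId": "1130"},
    {"PId": "30", "PUId": "2000"},
    {"PlayerId": "12345"},
])
```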

How to transform a Kibana query to `elasticsearch_dsl` query

I have a query
GET index/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"key1": "value"
}
},
{
"wildcard": {
"key2": "*match*"
}
}
]
}
}
}
I want to make the same call with elasticsearch_dsl package
I tried with
s = Search(index=index).query({
"bool": {
"should": [
{
"match": {
"key1": "value"
}
},
{
"wildcard": {
"key2": "*match*"
}
}
]
}
})
s.using(self.client).scan()
But the results are not the same. Am I missing something here?
Is there a way to represent my query with elasticsearch_dsl?
I also tried this, but got no results:
s = Search(index=index).query('wildcard', key2='*match*').query('match', key1=value)
s.using(self.client).scan()
It seems to me that you forgot the stars in the query.
s = Search(index=index).query('wildcard', key='*match*').query('match', key=value)
This query worked for me:
s = (
    Search(index=index)
    .query('match', key1=value)
    .query('wildcard', key2='*match*')
    .source(fields)
)
Also, in my experience, if a key contains an underscore (like key_1), Elasticsearch behaves differently and the query returns results that do not match it. So try to choose keys that do not have underscores.

Elasticsearch : filter results based on the date range

I'm using Elasticsearch 6.6 and trying to extract multiple records based on multiple values (email_address) passed to a bool query, restricted to a date range. For example, I want to extract information about a few employees based on their email_address (annie@test.com, charles@test.com, heman@test.com) from a given project_date onward (2019-01-01).
I used a should expression, but it pulls all the records in the date range, including other employees' records from project_date 2019-01-01 onward.
{
"query": {
"bool": {
"should": [
{ "match": { "email_address": "annie@test.com" }},
{ "match": { "email_address": "chalavadi@test.com" }}
],
"filter": [
{ "range": { "project_date": { "gte": "2019-08-01" }}}
]
}
}
}
I also tried a must expression but got no results. Could you please help me find employees by their email_address within the date range?
Thanks in advance.
should (OR) clauses are optional
Quoting from this article.
"In a query, if must and filter queries are present, the should query occurrence then helps to influence the score. However, if bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document."
So in your query, should only influences the score and does not actually filter documents. You must wrap the should in a must, or move it into a filter (if scoring is not required).
GET employeeindex/_search
{
"query": {
"bool": {
"filter": {
"range": {
"projectdate": {
"gte": "2019-01-01"
}
}
},
"must": [
{
"bool": {
"should": [
{
"term": {
"email.raw": "abc@text.com"
}
},
{
"term": {
"email.raw": "efg@text.com"
}
}
]
}
}
]
}
}
}
You can also replace the should clause with a terms clause, as in @AlwaysSunny's answer.
You can do this more concisely by putting terms and range clauses together inside filter. Your existing query doesn't work as expected because of the should clause: when a filter is also present, should clauses become optional and only affect scoring. Read more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
{
"query": {
"bool": {
"filter": [
{
"terms": {
"email_address.keyword": [
"annie@test.com", "chalavedi@test.com"
]
}
},
{
"range": {
"project_date": {
"gte": "2019-08-01"
}
}
}
]
}
}
}
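
When the email list and start date come from application code, the filter can be assembled as below. This is a minimal sketch: the helper name emails_since is hypothetical, and the default field names (email_address.keyword, project_date) are taken from the query above and assume that mapping exists in your index.

```python
from datetime import date


def emails_since(emails, since,
                 email_field="email_address.keyword",
                 date_field="project_date"):
    """Build a bool/filter query: email is one of `emails` AND the date
    is on or after `since` (an ISO yyyy-mm-dd string). Hypothetical helper."""
    # fromisoformat raises ValueError on malformed dates before any request is sent.
    since_iso = date.fromisoformat(since).isoformat()
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {email_field: list(emails)}},
                    {"range": {date_field: {"gte": since_iso}}},
                ]
            }
        }
    }


employee_query = emails_since(["annie@test.com", "charles@test.com"], "2019-01-01")
```

Both clauses sit in filter context, so every returned document must satisfy both, and no relevance score is computed.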

Resources