How to delete multiple documents by ID in Elasticsearch? - elasticsearch

I am trying to delete a short list of documents in one swoop on Elasticsearch 2.4, and I can't seem to give it a query that results in >0 documents getting deleted.
id_list = ["AWeKNmt5qJi-jqXwc6qO", "AWeKT7ULqJi-jqXwc6qS"] #example
# The following does not delete any document (despite these ids being valid)
delres = es.delete_by_query("my_index", doc_type="my_doctype", body={
"query": {
"terms": {
"_id": id_list
}
}
})
If I go one by one, they get deleted just fine, which seems to point to my query being the problem.
for the_id in id_list:
es.delete("my_index", doc_type="my_doctype", id=the_id)
I've also tried the ids query instead of terms, but that also does not delete anything.
es.delete_by_query(..., body={"query": {"ids": {"values": id_list}}})
What am I missing?

delete_by_query was deprecated in ES 1.5.3, removed in ES 2.0, and reintroduced in ES 5.0. From https://www.elastic.co/guide/en/elasticsearch/reference/1.7/docs-delete-by-query.html:
Delete by Query will be removed in 2.0: it is problematic since it silently forces a refresh which can quickly cause OutOfMemoryError during concurrent indexing, and can also cause primary and replica to become inconsistent. Instead, use the scroll/scan API to find all matching ids and then issue a bulk request to delete them.
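Since the IDs are already known here, the scroll/scan step can be skipped and the deletes sent in one bulk request. A minimal sketch, assuming the elasticsearch-py client and its `helpers.bulk` function; the names `build_delete_actions` and `bulk_delete_by_ids` are made up for this illustration:

```python
def build_delete_actions(index, doc_type, ids):
    """Yield one bulk 'delete' action per document ID."""
    for doc_id in ids:
        yield {
            "_op_type": "delete",  # tells the bulk helper to delete, not index
            "_index": index,
            "_type": doc_type,
            "_id": doc_id,
        }


def bulk_delete_by_ids(es, index, doc_type, ids):
    """Send all deletes in a single bulk request (hypothetical helper)."""
    # Imported lazily so the action builder above works without the client installed.
    from elasticsearch import helpers

    # raise_on_error=False collects failures (e.g. already-missing IDs)
    # in the returned error list instead of raising.
    return helpers.bulk(
        es, build_delete_actions(index, doc_type, ids), raise_on_error=False
    )
```

Something like `bulk_delete_by_ids(es, "my_index", "my_doctype", id_list)` should then return a tuple of the success count and a list of errors.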

Related

Elastic GET by ID query on rollover alias fails with "Alias [...] has more than one indices associated with it..."

Our new rollover indices just rolled over. Now this query...
GET http://my.elastic/system-logs/_doc/7e8017d8-0cb8-4b9e-b021-b2a4b4ac71c7
...fails with this:
"Alias [system-logs] has more than one indices associated with it [[system-logs-000002, system-logs-000001]], can't execute a single index op"
But doing the same thing with _search works fine:
GET http://my.elastic/system-logs/_search/
{
"query": {
"bool": {
"must": [{"term": {"_id": "a1906f52-3957-4f4b-9b40-531422e3a04e"}}]
}
}
}
The exception comes from this code, which looks like there is an allowAliasesToMultipleIndices setting for this, but I haven't been able to find a place to set it.
We're on Elastic 6.8.
In the first HTTP request, you are trying to fetch a document by ID through a name that is actually an alias for more than one index.
That's the problem.
Reason:
_doc is a mapping type in Elasticsearch, which segregates documents within a single index (mapping types are deprecated). A GET by ID is a single-index operation, as the error message says, so it cannot look across the indices behind the alias.
Instead, use a _search request with one of the supported queries, as in your second example (term, terms, match, query_string, simple_query_string).
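For illustration, the search-based lookup could be built like this with the Python client. This is only a sketch: `build_id_search_body` is a hypothetical helper, and it uses the `ids` query rather than the `term` query from the question.

```python
def build_id_search_body(doc_ids):
    """Build a _search body that matches documents by _id."""
    return {"query": {"ids": {"values": list(doc_ids)}}}


# Assumed usage with an elasticsearch-py client named `es`:
#   es.search(index="system-logs", body=build_id_search_body(
#       ["7e8017d8-0cb8-4b9e-b021-b2a4b4ac71c7"]))
# A search runs against every index behind the alias, unlike GET by ID.
```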

How to query parent/child relation using matching _version?

I'm having this join datatype
"Review_Sentence": {
"type": "join",
"relations": {
"Review": "Sentence"
}
},
If I have a review v1 like this
review_v1
sentence1_v1
sentence2_v1
sentence3_v1
and later someone updates it and removes the last sentence
review_v2
sentence1_v2
sentence2_v2
then in Elasticsearch I still have sentence3_v1 referring to the same review, so the query will return something like this
review_v2
sentence1_v2
sentence2_v2
sentence3_v1
How can I make sure the child _version is the same as the parent _version? I tried to use an external _version, but Elasticsearch gives me the latest _version for each _id, regardless of whether the parent and child _version values match.
So far my workaround is to delete all children of that review and insert the new ones, but this introduces a latency that I would like to get rid of.
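The workaround described above could look roughly like this. It is a sketch of the existing approach, not a fix for the latency. It assumes the child relation is named "Sentence" as in the mapping shown, and uses the `parent_id` query; the helper name is hypothetical.

```python
def build_delete_children_body(review_id, child_relation="Sentence"):
    """Build a delete-by-query body matching every child of one parent."""
    return {
        "query": {
            "parent_id": {
                "type": child_relation,  # child relation name from the join field
                "id": review_id,         # _id of the parent Review document
            }
        }
    }


# Assumed usage with an elasticsearch-py client named `es`
# (the index name "reviews" is a placeholder):
#   es.delete_by_query(index="reviews", body=build_delete_children_body("review-1"))
# followed by re-indexing the current sentences.
```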

Application-side Joins Elasticsearch

I have two indexes in Elasticsearch, a system index, and a telemetry index. I'd like to perform queries and aggregations on the telemetry index using filters from the systems index. The systems index is relatively small and only receives new documents occasionally, but the telemetry index is much larger and is constantly receiving new documents. This seems like an ideal situation for using an application-side join.
I tried emulating the example query at the previous link, but it turns out the filtered query is deprecated as of ES 5.0. (Why is this example in the current documentation?!)
Here are my queries:
GET /system/_search
{
"query": {
"match": {
"name": "George's system"
}
}
}
GET /telemetry/_search
{
"query": {
"bool":{
"must": {
"multi_match": {
"operator": "and",
"fields": ["systemId"]
, [1] }
}
}
}
}
}
The second one fails with a json_parse_exception because for some reason it doesn't like the [ ] characters after "fields".
Can anyone provide a simple example of using application-side joins?
Once such a query is defined (perhaps in Kibana's Dev Tools console) is there a way to visualize it in Kibana?
Elasticsearch has no way to execute two nested queries the way a relational database does, where one query uses the response of another. The application-side join example means that you actually make two queries (two separate requests to Elasticsearch) from the application side.
With the first query, you get the list of IDs you need to filter on.
With the second query, you pass that list of IDs to the terms filter.
This works as long as the number of systemId values stays within the terms query limit (1024 in older versions; newer versions make it configurable through index.max_terms_count).
Because a single query cannot express this join, you can't visualize it in Kibana.
In that case you have to sacrifice a little space and add the system fields you filter on to the telemetry mapping.
Good Luck!
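The two-request flow described in this answer can be sketched with the Python client as follows. All function names are made up for this example, and it assumes that the telemetry field systemId holds the _id of the matching document in the system index.

```python
def extract_ids(search_response):
    """Collect the _id of every hit in a search response."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def build_telemetry_query(system_ids):
    """Filter telemetry documents whose systemId is one of the given IDs."""
    return {"query": {"bool": {"filter": {"terms": {"systemId": system_ids}}}}}


def application_side_join(es, system_name):
    """Run both requests; `es` is assumed to be an elasticsearch-py client."""
    # Request 1: find the matching systems.
    systems = es.search(
        index="system", body={"query": {"match": {"name": system_name}}}
    )
    # Request 2: use their IDs as a terms filter on the telemetry index.
    return es.search(
        index="telemetry", body=build_telemetry_query(extract_ids(systems))
    )
```

Aggregations can be added to the body of the second request in the usual way.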

How does Elasticsearch find document content by doc ID?

There are many articles about the inverted index and posting lists in Elasticsearch, but I did not find any article that explains how Elasticsearch finds document content by doc ID.
Could anyone explain this to me?
thx.
Ragav is correct. However, I do have a bit to add that may help you work with document Ids.
When you index documents that don't have an ID, an ID is generated for you by Elasticsearch. That field name is "_id".
If you know the Id value of the document you wish to find, you can simply perform the query like this:
GET my_index/_search
{
"query": {
"terms": {
"_id": [ "1", "2" ]
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
The above query would return documents that have _id equal to 1 OR 2.
As Ragav said in his answer, if you created documents in the way described with ID 1 or 2, the sample query above, which I pulled from the Elasticsearch documentation, would return them.
Hope this helps.
Elasticsearch is built on top of Lucene.
When you index a new document into Elasticsearch, it indexes _index, _type and _id as part of the document along with the actual content (_source).
So, when you try to get a document using the get API _index/_type/_id, it is basically converted into a query which searches for doc matching the _index, _type and the _id.
This is how Elasticsearch is able to return you the document.

Unable to loop through array field ES 6.1

I'm facing a problem in ElasticSearch 6.1 that I cannot solve and I don't know why. I have read the docs several times and maybe I'm missing something.
I have a scripted query that needs to do some calculation before deciding whether a record is available or not.
Here is the script:
https://gist.github.com/dunice/a3a8a431140ec004fdc6969f77356fdf
What I'm trying to do is loop through an array field with the following source:
"unavailability": [
{
"starts_at": "2018-11-27T18:00:00+00:00",
"local_ends_at": "2018-11-27T15:04:00",
"local_starts_at": "2018-11-27T13:00:00",
"ends_at": "2018-11-27T20:04:00+00:00"
},
{
"starts_at": "2018-12-04T18:00:00+00:00",
"local_ends_at": "2018-12-04T15:04:00",
"local_starts_at": "2018-12-04T13:00:00",
"ends_at": "2018-12-04T20:04:00+00:00"
}
]
When the script is executed it throws the error: No field found for [unavailability] in mapping with types [aircraft]
Is there any clue to make it work?
Thanks
UPDATE
Query:
https://gist.github.com/dunice/3ccd7d83ca6ddaa63c11013b84e659aa
UPDATE 2
Mapping:
https://gist.github.com/dunice/f8caee114bbd917115a21b8b9175a439
Data example:
https://gist.github.com/dunice/8ad0602bc282b4ca19bce8ae849117ad
You cannot access an array present in the source document via doc_values (i.e. doc). You need to directly access the source document via the _source variable instead, like this:
for(int i = 0; i < params._source['unavailability'].length; i++) {
Note that depending on your ES version, you might want to try ctx._source or just _source instead of params._source.
I solved my use case with a different approach.
Instead of having a field that is an array of objects, as unavailability was, I decided to create two fields that are arrays of datetimes:
unavailable_from
unavailable_to
My script walks through the first field, then checks the second at the same position.
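As a rough illustration of that restructuring, the two parallel arrays could be produced at indexing time like this. The helper name is hypothetical, and the choice of starts_at and ends_at (rather than the local_ variants) is an assumption.

```python
def split_unavailability(unavailability):
    """Turn a list of period objects into two parallel datetime arrays."""
    return {
        # Position i in both lists belongs to the same period.
        "unavailable_from": [period["starts_at"] for period in unavailability],
        "unavailable_to": [period["ends_at"] for period in unavailability],
    }
```

The resulting dict can be merged into the document before it is indexed, so the script only needs to read two plain date fields.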
UPDATE
The direct access to _source is disabled by default:
https://github.com/elastic/elasticsearch/issues/17558
