Elasticsearch update by query - update only document with minimum timestamp - elasticsearch

I am new to Elasticsearch and I am trying to get the first document from my index and update its list of objects.
I tried several queries, e.g.:
POST localhost:9200/test-index/_update_by_query
{
  "size": 1,
  "sort": [{ "timestamp": "asc" }],
  "script": {
    "inline": "ctx._source.addresses.add(params.address)",
    "params": {
      "address": {
        "street": "Yemen Road",
        "number": 15,
        "county": "Yemen"
      }
    }
  }
}
But this updates all my documents.
What is the fastest way to do this?
Thank you in advance!

I got the following answer from the Elasticsearch community:
Update by query does not support a size. You would need to run a query first using size and sorting, and then use the update API on that single document.
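A sketch of that two-step approach, assuming a reasonably recent Elasticsearch version (adjust the endpoints if your cluster still uses mapping types); the index, sort field and address values are taken from the question, and <doc_id> is just a placeholder for whatever _id the first request returns:
# Step 1: find the document with the minimum timestamp (only its _id is needed)
GET test-index/_search
{
  "size": 1,
  "sort": [{ "timestamp": "asc" }],
  "_source": false
}

# Step 2: update that single document by its _id
# (<doc_id> stands for the _id returned by step 1, not a real value)
POST test-index/_update/<doc_id>
{
  "script": {
    "source": "ctx._source.addresses.add(params.address)",
    "lang": "painless",
    "params": {
      "address": {
        "street": "Yemen Road",
        "number": 15,
        "county": "Yemen"
      }
    }
  }
}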

Related

Reindexing elastic-search documents into another index by changing the routing key to a combination of two field values

I have an existing Elasticsearch index with the following document structure, without a routing_key:
{
  "_id",
  "field1",
  "field2"
}
I need to migrate the data into a new index. The structure of the index remains the same, with an added routing_key. The routing key needs to be updated to "field1_field2". Is there a simple Kibana script to migrate the data to the new index?
A combination of a simple Painless script and the Elasticsearch Reindex API can be used to achieve this.
POST _reindex
{
  "source": {
    "index": "{old_index_name}",
    "size": {batch_size}
  },
  "dest": {
    "index": "{new_index_name}"
  },
  "script": {
    "lang": "painless",
    "inline": "ctx._routing = ctx._source.field1 + '_' + ctx._source.field2"
  }
}
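To check the result, you can run a quick search against the new index after the reindex finishes; each hit should now carry a _routing metadata field with the combined value (a sanity check, not required):
GET {new_index_name}/_search
{
  "query": { "match_all": {} },
  "size": 5
}
# each hit should contain "_routing": "<field1 value>_<field2 value>"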

Duplicate a document on elasticsearch

I need to clone the content of a document in my Elasticsearch index (into the same index) by using the Kibana console. I need exactly the same fields in the _source of the document (of course, the copy will have another id). I tried to:
1. GET the document
2. Create a new empty instance of the document
3. Update the new document by manually copying the properties of the result of (1):
POST /blog/post/VAv2FWoBKgnBpki61WiD/_update
{
  "doc": {
    "content": "..." ...
But the problem is that the fields contain very long values, and sometimes I get an error since the strings do not seem to be escaped when I manually copy them from the Kibana interface.
I searched the documentation but I cannot find a query to duplicate a document, and it is quite a common thing to do, I think...
Any clue?
Make use of the Reindex API. Here is what you can do.
Summary of steps:
1. Create a destination_index (dummy). Make sure the mapping is identical to that of source_index.
2. Using the Reindex API, reindex that particular document from source_index to destination_index. During this operation, update the _id (I've mentioned the script).
3. Reindex this document back from destination_index to source_index.
Reindex Query
Step 1: Copy document from source_index to destination_index. (With the script)
POST _reindex
{
  "source": {
    "index": "source_index",
    "query": {
      "match": {
        "_id": "1"
      }
    }
  },
  "dest": {
    "index": "destination_index"
  },
  "script": {
    "inline": "ctx._id=2",
    "lang": "painless"
  }
}
Note how I've added a script in the above query that changes the _id of the document (_id is set to 2). Your destination_index will have all the fields with exactly the same values as the source, except for the _id field.
Step 2: Copy that document from destination_index to source_index
POST _reindex
{
  "source": {
    "index": "destination_index",
    "query": {
      "match": {
        "_id": "2"
      }
    }
  },
  "dest": {
    "index": "source_index"
  }
}
Now search the source_index; it will have two documents with different _ids (_id=1 and _id=2) with exactly the same content.
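Once you have verified the copy, the dummy index is no longer needed and can be dropped (assuming nothing else uses it):
DELETE destination_index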
Hope this helps!

Elasticsearch upsert based on query

Two years ago someone asked how to do upserts when you don't know a document's id. The (unaccepted) answer referenced the feature request
that resulted in the _update_by_query API.
However, _update_by_query does not allow insertion if no hits exist, so it is not really an upsert, but just another way to do an update.
Is there a way to do an upsert without an _id yet? I know that my query will always return one or zero results. Or am I forced to do multiple requests (and maintain the uniqueness myself)?
This doesn't seem to be possible right now. _update provides an upsert attribute, but unfortunately this doesn't work with _update_by_query. The following just gives you an error along the lines of Unknown key for a START_OBJECT in [upsert].
POST website/doc/_update_by_query?conflicts=proceed
{
  "query": {
    "term": {
      "url": "http://foo.com"
    }
  },
  "script": {
    "inline": "ctx._source.views+=1",
    "lang": "painless"
  },
  "upsert": {
    "views": 1,
    "url": "http://foo.com"
  }
}
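A common workaround, assuming your unique key can double as the document id (here I am assuming the url-encoded URL is used as the _id; that is not part of the original question), is to skip the query entirely and call the plain Update API with upsert on that deterministic _id:
# hypothetical: the url-encoded URL serves as the document _id
POST website/doc/http%3A%2F%2Ffoo.com/_update
{
  "script": {
    "inline": "ctx._source.views += 1",
    "lang": "painless"
  },
  "upsert": {
    "views": 1,
    "url": "http://foo.com"
  }
}
This way Elasticsearch maintains the uniqueness for you and a single request behaves as a true upsert.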
Separately, without knowing the in_stock value in each document, you can still reduce its count by 1 across all of them:
POST products/_update_by_query
{
  "script": {
    "source": "ctx._source.in_stock--"
  },
  "query": {
    "match_all": {}
  }
}

Elasticsearch: document size and query performance

I have an ES index with medium-sized documents (15-30 MB, more or less).
Each document has a boolean field, and most of the time users just want to know whether a specific document ID has that field set to true.
Will document size affect the performance of this query?
"size": 1,
"query": {
"term": {
"my_field": True
}
},
"_source": [
"my_field"
]
And will a "size": 0 query result in better time performance?
Adding "size":0 to your query, you will avoid some net transfer this behaviour will improve your performance time.
But as I understand your case of use, you can use count
An example query:
curl -XPOST 'http://localhost:9200/test/_count' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id": "xxxxx"
          }
        },
        {
          "term": {
            "bool_field": true
          }
        }
      ]
    }
  }
}'
With this query you only check whether the total is non-zero, so you will know if the doc with that id has the bool field set to true/false, depending on the value you specify for bool_field in the query. This will be quite fast.
Considering that Elasticsearch indexes your fields, the document size will not be a big problem for performance. Using size 0 doesn't affect the query performance inside Elasticsearch, but it does improve the time to retrieve the result because of the reduced network transfer.
If you just want to check one boolean field for a specific document, you can simply use the Get API to fetch the document, retrieving only the field you want to check, like this:
curl -XGET 'http://localhost:9200/my_index/my_type/1000?fields=my_field'
In this case Elasticsearch will just retrieve the document with _id = 1000 and the field my_field, so you can check the boolean value.
{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1000",
  "_version": 9,
  "found": true,
  "fields": {
    "my_field": [
      true
    ]
  }
}
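Note that the fields URL parameter above belongs to older Elasticsearch versions; on more recent releases the same idea is expressed with source filtering (parameter names vary slightly by version, so treat this as a sketch):
curl -XGET 'http://localhost:9200/my_index/_doc/1000?_source_includes=my_field'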
Looking at your question, I see that you haven't mentioned the Elasticsearch version you are using. I would say there are a lot of factors that affect the performance of an Elasticsearch cluster.
However, assuming it is the latest Elasticsearch and considering that you are after a single value, the best approach is to change your query into a non-scoring, filtering query. Filters are quite fast in Elasticsearch and very easily cached. Making a query non-scoring avoids the scoring phase entirely (calculating relevance, etc.).
To do this:
GET localhost:9200/test_index/test_partition/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "my_field": true
        }
      }
    }
  }
}
Note that we are using the search API. The constant_score is used to convert the term query into a filter, which should be inherently fast.
For more information, please refer to Finding exact values.
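If you also want to restrict the check to a single document ID, as in the question, a bool filter combining an ids query with the term keeps everything non-scoring (a sketch; the id value 1000 is only an example):
GET localhost:9200/test_index/test_partition/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "ids": { "values": ["1000"] } },
            { "term": { "my_field": true } }
          ]
        }
      }
    }
  }
}
# hits.total > 0 means that document has my_field set to true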

retrieving the multivalued array from elasticsearch

I have trouble getting the facets from my index.
Basically I want to get the details of a particular facet, say "Company", in a separate array.
I tried many queries, but they all return the entire facet under the facet array. How can I get only a particular facet in the facet array?
My index is https://gist.github.com/4015817
Please help me, I am badly stuck here.
Considering how complex your data structure is, the simplest way to extract this information might be using script fields:
curl "localhost:9200/index/doc/_search?pretty=true" -d '{
"query" : {
"match_all" : {
}
},
"script_fields": {
"entity_facets": {
"script": "result=[];foreach(facet : _source.Categories.Types.Facets) {if(facet.entity==entity) result.add(facet);} result",
"params": {
"entity": "Country"
}
},
"first_facet": {
"script": "_source.Categories.Types.Facets[0]"
}
}
}'
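The script above uses the old MVEL syntax. On current Elasticsearch versions, where scripting defaults to Painless, a roughly equivalent script field might look like this (a sketch that assumes the same _source structure as the gist):
curl -H 'Content-Type: application/json' "localhost:9200/index/_search?pretty=true" -d '{
  "query": { "match_all": {} },
  "script_fields": {
    "entity_facets": {
      "script": {
        "lang": "painless",
        "source": "def result = []; for (facet in params._source.Categories.Types.Facets) { if (facet.entity == params.entity) { result.add(facet); } } return result;",
        "params": { "entity": "Country" }
      }
    }
  }
}'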
