I have an Elasticsearch server which I'm accessing from a Java server using the Jest client, and I was looking for the best way to update multiple fields of a document each time.
I have looked at the documentation so far, and I have found that there are two ways of doing it:
Partial update via a script: I don't think it is suitable for updating multiple fields (because I don't know which fields were modified).
Whole document update: re-indexing the whole document.
My question is: how can I update the whole document, given that Jest provides only update via a script?
Is the best way to delete the document and then index the updated version?
I already answered this in the GitHub issue you also opened, but again:
You should use the second way you linked (whole document update). There is no special API for it; it's just a regular index request, so you can simply send your index request against the id of the document you want to update.
For example, assuming you have the document below already indexed in Elasticsearch within index people, type food, id 9:
{"user": "kramer", "fav_food": "jello"}
Then you would do:
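import io.searchbox.client.JestResult;
import io.searchbox.core.Index;

// The full new version of the document: any field left out here is removed from the stored copy
String source = "{\"user\": \"kramer\", \"fav_food\": \"pizza\"}";
// client is an already configured JestClient; execute() may throw IOException
JestResult result = client.execute(
    new Index.Builder(source)
        .index("people")
        .type("food")
        .id("9") // id() expects a String
        .build()
);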
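On the delete question: no separate delete is needed, because an index request with the same id replaces the stored version. When only some fields change, one way to build the complete source is to read the current document, overlay the modified fields and serialize the result. A minimal sketch with plain Java maps (Java 9+ for Map.of); the class and method names are hypothetical, and the hand-rolled serializer only handles flat string fields, so a real project would use a proper JSON library instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the full replacement source for an index request.
class WholeDocSource {

    // Overlays the changed fields on a copy of the current source; untouched fields are kept.
    static Map<String, String> merge(Map<String, String> current, Map<String, String> changes) {
        Map<String, String> merged = new LinkedHashMap<>(current);
        merged.putAll(changes);
        return merged;
    }

    // Minimal serializer for flat, string-valued documents (escapes backslashes and quotes only).
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (sb.length() > 1) sb.append(", ");
            sb.append(quote(e.getKey())).append(": ").append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> current = new LinkedHashMap<>();
        current.put("user", "kramer");
        current.put("fav_food", "jello");
        Map<String, String> changes = Map.of("fav_food", "pizza");
        // Prints {"user": "kramer", "fav_food": "pizza"}
        System.out.println(toJson(merge(current, changes)));
    }
}
```

The printed string is the same source used in the Jest example, ready to pass to new Index.Builder(source). The read step is still required, since the index request needs every field you want to keep.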
Related
I have an index which contains data as follows:
{
"some_field": string, -- exists in my database
"some_other_field": string, -- exists in my database
"another_field": string -- does NOT exist in my database
}
I have a script which grabs data from a database and performs a bulk insert. However, only some of the fields come from the database, as shown above.
If a document already exists, I still want to update the fields that come from the database, but without overwriting or deleting the field that does not come from the database.
I am using the bulk API to do this; however, I lose all data in another_field when running the script. Looking at the bulk docs, I can't find any option to simply update an existing doc.
I am unable to share the script, but I hope this is enough information to shed some light on possible solutions.
TL;DR
Yes, use the index action; as the docs explain:
(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
But make sure to provide the _id of the document in case of an update.
To understand
I created a toy example to reproduce the behaviour:
# post a single document
POST /71177773/_doc
{
"some_field": "data",
"some_other_field": "data"
}
GET /71177773/_search
# try to "update" without providing an id
POST /_bulk
{"index":{"_index":"71177773"}}
{"some_field":"data","some_other_field":"data","another_field":"data"}
# 2 Documents exist now
GET /71177773/_search
# Try the same command, but provide the id of the first document
POST /_bulk
{"index":{"_index":"71177773", "_id": "<Id of the document>"}}
{"some_field":"data","some_other_field":"data","another_field":"data"}
# It seems it worked
GET /71177773/_search
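One caveat for the original problem: as the quoted docs say, the index action replaces the whole document, so if the body you send does not contain another_field (as with rows coming from the database), that field is dropped. The bulk API also has an update action that merges a partial document into the stored source, and with doc_as_upsert set to true it creates the document when the id does not exist yet. The sketch below only builds that NDJSON request body, assuming Elasticsearch 7.x or later (no mapping types); the class name, the use of the database primary key as _id, and the flat string fields are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for a _bulk body that merges fields into existing documents
// instead of replacing them (assumes Elasticsearch 7.x+, i.e. no mapping types).
class BulkPartialUpdate {

    // rowsById maps a document id (assumed: the database primary key) to the fields read
    // from the database; fields not listed here, such as another_field, are left untouched.
    static String body(String index, Map<String, Map<String, String>> rowsById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, String>> row : rowsById.entrySet()) {
            // Action line: "update" merges the partial doc into the stored _source
            sb.append("{\"update\":{\"_index\":").append(quote(index))
              .append(",\"_id\":").append(quote(row.getKey())).append("}}\n");
            // Source line: doc_as_upsert indexes the doc as-is if the id does not exist yet
            sb.append("{\"doc\":").append(toJson(row.getValue()))
              .append(",\"doc_as_upsert\":true}\n");
        }
        // Every line, including the last, ends with a newline as _bulk requires
        return sb.toString();
    }

    // Minimal serializer for flat, string-valued documents
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("some_field", "data");
        row.put("some_other_field", "data");
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("1", row);
        System.out.print(body("71177773", rows));
    }
}
```

Sent as POST /_bulk, this leaves another_field on existing documents as it was, while still creating documents that are new.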
If your question was:
Is Elasticsearch smart enough to recognise that I want to update an existing document without my providing the id?
I am afraid it is not: without an _id, the index action always creates a new document, as the toy example shows.
I want to keep a record of pages requested by users. Additionally, I have to store the count for each page requested. I currently store users' page visits by updating an index in Elasticsearch.
I do this by updating a document similar to:
{
  "userid": "id1234",
  "url": "website.com/url-1",
  "count": 23
}
Here, the count of 23 is the total number of times the URL was requested by the user with id 'id1234'.
To achieve this, I retrieve the document, increment the current count, and push it again. My question is: is it possible to do this with a single query?
I saw a similar approach using scripts here.
Can we do this without scripts?
Elasticsearch is not well suited to updates: any update, including one like this, is carried out internally by deleting the old version of the document and indexing the whole new document again.
Probably the closest thing here is the partial update feature.
Here is an example from the documentation:
POST /metrics/users/1/_update
{
"script" : "ctx._source.count+=1"
}
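To get a single request per page view, a scripted update can be combined with an upsert document: the first request creates the counter and later ones increment it server-side, with no get/increment/re-push round trip. The sketch below only builds the request path and body. It assumes Elasticsearch 7.x or later, where the endpoint is POST /metrics/_update/<id> (the example above uses the older typed form), and Java 10+ for URLEncoder.encode with a Charset. The id scheme (userid and url joined by "|") and the class and method names are hypothetical. The update API's retry_on_conflict query parameter can be added so concurrent increments of the same document are retried instead of failing with a version conflict.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the update API: one request that increments a page counter,
// creating the document on the first view. Send as POST /metrics/_update/<id>,
// optionally with ?retry_on_conflict=3 to retry concurrent increments.
class PageViewCounter {

    // Assumed id scheme: one document per (userid, url) pair, URL-encoded for the request path
    static String pathId(String userId, String url) {
        return URLEncoder.encode(userId + "|" + url, StandardCharsets.UTF_8);
    }

    static String incrementBody(String userId, String url, int by) {
        return "{"
            // Runs server-side only when the document already exists
            + "\"script\":{\"source\":\"ctx._source.count += params.n\",\"lang\":\"painless\","
            + "\"params\":{\"n\":" + by + "}},"
            // Indexed as-is (without running the script) when the document does not exist yet
            + "\"upsert\":{\"userid\":" + quote(userId) + ",\"url\":" + quote(url)
            + ",\"count\":" + by + "}"
            + "}";
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println("POST /metrics/_update/" + pathId("id1234", "website.com/url-1"));
        System.out.println(incrementBody("id1234", "website.com/url-1", 1));
    }
}
```

This is still a script-based update under the hood, so it does not avoid the cost of reindexing; it only removes the client-side read and the race between reading and writing.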
But you already mentioned this in your question (the link to the relevant documentation is available here).
And even with scripts, the problem remains that updates are relatively slow.
I am using the olivere/elastic library for Elasticsearch in my Go app. I have a list of values for a particular field (say fieldA) of Elasticsearch documents, and I want to update a particular field of all documents matched by a search on fieldA.
This question, Updating a record in ElasticSearch using olivere/elastic in google go, explains the update part.
But in my case I don't have the ids of the documents to be updated. So either I make a search call to retrieve the document ids and then update them, or is there another way I am missing? Thanks in advance.
If you need to update a list of documents, you can use the Update By Query API. The unit tests give you a hint of what the syntax looks like. However, if you have individual values for individual documents, I guess there's no other way than updating them one by one, and the fastest way to do that is with the Bulk API.
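When every matching document gets the same new value, an Update By Query request can select the documents with a terms query on fieldA and set the other field with a Painless script, so no ids are needed on the client side. The sketch below builds the JSON body for POST /<index>/_update_by_query, assuming Elasticsearch 6.x or later script syntax and that fieldA is mapped as a keyword field (a terms query does not analyze its input). fieldA comes from the question; the target field name, the class name and the string-only values are hypothetical. Adding conflicts=proceed to the request URL makes the operation continue past version conflicts instead of aborting.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical builder for POST /<index>/_update_by_query: sets targetField to newValue on
// every document whose matchField holds one of the given values.
class UpdateByQueryBody {

    static String body(String matchField, List<String> values, String targetField, String newValue) {
        String terms = values.stream().map(UpdateByQueryBody::quote).collect(Collectors.joining(","));
        return "{"
            // terms query: matches documents where matchField equals any of the values
            + "\"query\":{\"terms\":{" + quote(matchField) + ":[" + terms + "]}},"
            // Painless script: field name and value are passed as params, not concatenated in
            + "\"script\":{\"source\":\"ctx._source.put(params.field, params.value)\","
            + "\"lang\":\"painless\",\"params\":{\"field\":" + quote(targetField)
            + ",\"value\":" + quote(newValue) + "}}"
            + "}";
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(body("fieldA", List.of("a", "b"), "fieldB", "updated"));
    }
}
```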
I have been evaluating Elasticsearch 5.1.1, and my data upload happens via NEST. I used two different types and different index names while testing. Now that I have a better understanding of the API, I have settled on a type, deleted all the indices and created a new one.
My documents have their own ID, and I have fluent code as follows:
config.InferMappingFor<SearchFriendlyIssue>(ib => ib.IdProperty(p => p.Id));
When I upload documents, the API comes back with "Updated". This is strange, since I have just created a new index. What is worse, my new index contains only one document. I expected a "Created" response. The code to add data follows the API documentation:
var searchObject = new SearchFriendlyIssue(issue);
var response = Client.Index(searchObject, idx => idx.Index(Index));
Console.WriteLine(response.Result.ToString());
I think I am missing something about how types and indices interact. How do I get rid of my unreachable documents? More specifically, how do I get them into my index so they can be deleted or dealt with?
Looks like the assumption I had unreachable documents was wrong. Instead, the declaration for the ID property wasn't working, and I was overwriting the same document over and over again. My bad!
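The symptom in the question (every response says "Updated" and the index holds a single document) is what you get when every request carries the same id. The toy in-memory model below is a simulation only, not NEST or Elasticsearch code; FakeIndex and the id functions are hypothetical. It shows that a constant id yields one "created" result followed by updates and a single stored document, while distinct ids yield one document each.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy simulation of index-by-id semantics (not NEST or Elasticsearch code).
class FakeIndex {
    private final Map<String, String> docs = new HashMap<>();

    // A new id is "created"; an existing id has its document replaced ("updated").
    String index(String id, String source) {
        return docs.put(id, source) == null ? "created" : "updated";
    }

    int count() {
        return docs.size();
    }

    // Indexes every source, using idOf to decide each document's id; returns per-request results.
    static List<String> indexAll(FakeIndex idx, List<String> sources, Function<String, String> idOf) {
        List<String> results = new ArrayList<>();
        for (String source : sources) {
            results.add(idx.index(idOf.apply(source), source));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> issues = List.of("issue-1", "issue-2", "issue-3");

        FakeIndex broken = new FakeIndex();
        // Id mapping not applied: every document ends up with the same id
        // Prints [created, updated, updated], docs=1
        System.out.println(indexAll(broken, issues, s -> "same-id") + ", docs=" + broken.count());

        FakeIndex fixed = new FakeIndex();
        // Each document uses its own id
        // Prints [created, created, created], docs=3
        System.out.println(indexAll(fixed, issues, s -> s) + ", docs=" + fixed.count());
    }
}
```

A quick check along the same lines in the real application is to log the id each index request actually sends before uploading a full batch.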
I need to add a new field to ALL documents in an index without pulling each document down and pushing it back up (that would take about a day). Is it possible to use the _bulk API to achieve this?
I have also researched the update_by_query plugin, and it seems it would take just as long as pulling the documents down and pushing them back myself.
Yes, the bulk API supports updates, which can add a new field using a partial document or a script. To iterate through your document ids, do a scan and scroll with the fields parameter set to an empty array.
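Once the ids have been collected from the scroll, the updates can be sent in batches rather than in one huge request. A minimal sketch, assuming the index is named in the URL (POST /<index>/_bulk, or POST /<index>/<type>/_bulk on versions with mapping types), so each action line only carries _id; the field name, value, batch size and class name are hypothetical. Each update sends only the new field as a partial document, so existing fields stay as they are.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits ids into chunks and builds one _bulk body per chunk,
// each adding the same field to every document via a partial-document update.
class BulkAddField {

    static List<String> bodies(List<String> ids, String field, String value, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            StringBuilder sb = new StringBuilder();
            for (String id : ids.subList(start, Math.min(start + batchSize, ids.size()))) {
                // Action line: index (and type) come from the request URL
                sb.append("{\"update\":{\"_id\":").append(quote(id)).append("}}\n");
                // Partial doc: only the new field is sent; other fields are untouched
                sb.append("{\"doc\":{").append(quote(field)).append(':')
                  .append(quote(value)).append("}}\n");
            }
            out.add(sb.toString());
        }
        return out;
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        for (String body : bodies(List.of("1", "2", "3"), "new_field", "default", 2)) {
            System.out.print(body);
            System.out.println("---");
        }
    }
}
```

Later Elasticsearch versions also ship Update By Query in core, which runs the same kind of partial update server-side without collecting the ids first.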