Update builder gives a late response when there are multiple versions in Elasticsearch

Project: Spring Boot
I'm updating my Elasticsearch document in the following way:
@Override
public Document update(DocumentDTO document) {
    try {
        Document doc = documentMapper.documentDTOToDocument(document);
        Optional<Document> fetchDocument = documentRepository.findById(document.getId());
        if (fetchDocument.isPresent()) {
            fetchDocument.get().setTag(doc.getTag());
            Document result = documentRepository.save(fetchDocument.get());
            final UpdateRequest updateRequest = new UpdateRequest(Constants.INDEX_NAME, Constants.INDEX_TYPE, document.getId().toString());
            updateRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
            updateRequest.doc(jsonBuilder().startObject().field("tag", doc.getTag()).endObject());
            UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
            log.info("ES result : " + updateResponse.status());
            return result;
        }
    } catch (Exception ex) {
        log.info(ex.getMessage());
    }
    return null;
}
Using this, my document is updated successfully and the version is incremented, but once the version goes past 20+ it takes a long time to retrieve the data (around 14 sec).
I'm still confused about how versioning works. What happens in the update and delete scenarios? At search time, does Elasticsearch process all versions of the data and send back the latest one? Is that so?

Elasticsearch internally uses Lucene, which stores data in immutable segments. As these segments are immutable, every update in Elasticsearch internally marks the old document as deleted (a soft delete) and inserts a new document (with a new version).
The old documents are cleaned up later during a background segment-merge process.
A newly updated document should be searchable within 1 second (the default refresh interval), but this can be disabled or changed, so please check this setting on your index. I can see you are using the WAIT_UNTIL (wait_for) refresh policy in your code; remove it and you should see the updated document quickly, provided you have not changed the default refresh_interval.
Note: both the update and delete operations work similarly here. The only difference is that a delete does not create a new document; the old document is soft-deleted and later permanently removed during a segment merge.
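
To illustrate, here is a minimal sketch, assuming the same RestHighLevelClient, Constants, and tag field as in the question: it issues the update without the WAIT_UNTIL policy and reads back the index's refresh_interval setting so you can confirm the default is still in effect.

import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.indices.GetSettingsRequest;
import org.elasticsearch.client.indices.GetSettingsResponse;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

// Update without a refresh policy: the call returns as soon as the document
// is written, and the default 1s refresh makes it searchable shortly after.
UpdateRequest updateRequest = new UpdateRequest(Constants.INDEX_NAME, Constants.INDEX_TYPE, id)
        .doc(jsonBuilder().startObject().field("tag", tag).endObject());
client.update(updateRequest, RequestOptions.DEFAULT);

// Check whether refresh_interval was changed; null means the 1s default applies.
GetSettingsResponse settings = client.indices()
        .getSettings(new GetSettingsRequest().indices(Constants.INDEX_NAME), RequestOptions.DEFAULT);
log.info("refresh_interval: " + settings.getSetting(Constants.INDEX_NAME, "index.refresh_interval"));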

Related

How can I enable automatic slicing on Elasticsearch operations like UpdateByQuery or Reindex using the Nest client?

I'm using the Nest client to programmatically execute requests against an Elasticsearch index. I need to use the UpdateByQuery API to update existing data in my index. To improve performance on large data sets, the recommended approach is to use slicing. In my case I'd like to use the automatic slicing feature documented here.
I've tested this out in the Kibana dev console and it works beautifully. I'm struggling with how to set this property in code through the NEST client interface. Here's a code snippet:
var request = new Nest.UpdateByQueryRequest(indexModel.Name);
request.Conflicts = Elasticsearch.Net.Conflicts.Proceed;
request.Query = filterQuery;
// TODO Need to set slices to auto but the current client doesn't allow it and the server
// rejects a value of 0
request.Slices = 0;
var elasticResult = await _elasticClient.UpdateByQueryAsync(request, cancellationToken);
The comments on that property indicate that it can be set to "auto", but it expects a long, so that's not possible.
// Summary:
// The number of slices this task should be divided into. Defaults to 1, meaning
// the task isn't sliced into subtasks. Can be set to `auto`.
public long? Slices { get; set; }
Setting it to 0 just throws an error on the server. Has anyone else tried doing this? Is there some other way to configure this behavior? Other APIs seem to have the same problem, like ReindexOnServerAsync.
This was a bug in the spec and an unfortunate consequence of generating this part of the client from the spec.
The spec has been fixed, and the change will be reflected in a future version of the client. For now, though, it can be set with the following:
var request = new Nest.UpdateByQueryRequest(indexModel.Name);
request.Conflicts = Elasticsearch.Net.Conflicts.Proceed;
request.Query = filterQuery;
((IRequest)request).RequestParameters.SetQueryString("slices", "auto");
var elasticResult = await _elasticClient.UpdateByQueryAsync(request, cancellationToken);

Solr partial update is not working

I'm trying to get Solr partial update working with Spring Data.
SolrTemplate solr; // autowired

public void updateEntry(final Entry entry) {
    PartialUpdate update = new PartialUpdate("id", entry.getId());
    update.add("foo_field", "hohoho");
    solr.saveBean(update);
    solr.commit();
}
However, nothing happens to the document with that "id"; the field doesn't change. Deletion and creation are working. Any ideas would be appreciated.
I'm doing some partial updates for my current project with Spring Data Solr and they work fine. Here's the code I have:
PartialUpdate partialUpdate = new PartialUpdate("id", "yourId");
partialUpdate.setValueOfField("fieldToUpdate", "value");
solrTemplate.saveBean(partialUpdate);
solrTemplate.commit();
Be sure to have the proper Solr config for your core (https://wiki.apache.org/solr/Atomic_Updates):
you will need to activate the updateLog in your solrconfig.xml, and in your schema.xml you will need to define the _version_ field.
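
For reference, the relevant configuration looks roughly like this (a sketch based on the Atomic Updates wiki page; adjust the log directory and field type to your core):

<!-- solrconfig.xml: enable the transaction log required for atomic updates -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>

<!-- schema.xml: the _version_ field used by the update log -->
<field name="_version_" type="long" indexed="true" stored="true"/>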

Nest - Reindexing

Elasticsearch released their new Reindex API in Elasticsearch 2.3.0. Does the current version of NEST (2.1.1) make use of this API yet? If not, are there plans to do so?
I am aware that the current version has a reindex method, but it forces you to create the new index. For my use case, the index already exists.
Any feedback/insights will be greatly appreciated. Thanks!
This kind of question is best asked on the GitHub issues for NEST, since the committers on the project will be able to answer it best :)
A commit went in on 6 April to map the new Reindex API available in Elasticsearch 2.3.0, along with other features like the Task Management API and Update By Query. This made its way into NEST 2.3.0.
NEST 2.x already contains a helper for doing reindexing that uses scan/scroll under the covers and returns an IObservable<IReindexResponse<T>> that can be used to observe progress:
public class Document {}

var observable = client.Reindex<Document>("from-index", "to-index", r => r
    // settings to use when creating to-index
    .CreateIndex(c => c
        .Settings(s => s
            .NumberOfShards(5)
            .NumberOfReplicas(2)
        )
    )
    // query to optionally limit documents re-indexed from from-index to to-index
    .Query(q => q.MatchAll())
    // the number of documents to reindex in each request.
    // NOTE: the number of documents in each request will actually be
    // NUMBER * NUMBER OF SHARDS IN from-index,
    // since reindex uses scan/scroll
    .Size(100)
);
ExceptionDispatchInfo e = null;
var waitHandle = new ManualResetEvent(false);

var observer = new ReindexObserver<Document>(
    onNext: reindexResponse =>
    {
        // do something with the notification, e.g. log total progress
    },
    onError: exception =>
    {
        e = ExceptionDispatchInfo.Capture(exception);
        waitHandle.Set();
    },
    completed: () =>
    {
        // maybe log completion, refresh the index, etc.
        waitHandle.Set();
    }
);

observable.Subscribe(observer);

// wait for the handle to be signalled
waitHandle.WaitOne();

// throw the exception if one was captured
e?.Throw();
Take a look at the ReIndex API tests for some ideas.
The new Reindex API is named client.ReIndexOnServer() in the client to differentiate it from the existing observable implementation.

Setting Time To Live (TTL) from Java - sample requested

EDIT:
This is basically what I want to do, only in Java.
Using Elasticsearch, we add documents to an index by passing IndexRequest items to a BulkRequestBuilder.
I would like the documents to be dropped from the index after some time has passed (time to live / TTL).
This can be done either by setting a default for the index or on a per-document basis. Either approach is fine by me.
The code below is an attempt to do it per document. It does not work; I think that's because TTL is not enabled for the index. Either show me what Java code I need to add to enable TTL so the code below works, or show me different code that enables TTL and sets a default TTL value for the index in Java. I know how to do it from the REST API, but I need to do it from Java code, if at all possible.
logger.debug("Indexing record ({}): {}", id, map);
final IndexRequest indexRequest = new IndexRequest(_indexName, _documentType, id);
final long debug = indexRequest.ttl();
if (_ttl > 0) {
    indexRequest.ttl(_ttl);
    System.out.println("Setting TTL to " + _ttl);
    System.out.println("IndexRequest now has ttl of " + indexRequest.ttl());
}
indexRequest.source(map);
indexRequest.operationThreaded(false);
bulkRequestBuilder.add(indexRequest);
}

// execute and block until done.
BulkResponse response;
try {
    response = bulkRequestBuilder.execute().actionGet();
Later I check in my unit test by polling this method, but the document count never goes down.
public long getDocumentCount() throws Exception {
    Client client = getClient();
    try {
        client.admin().indices().refresh(new RefreshRequest(INDEX_NAME)).actionGet();
        ActionFuture<CountResponse> response = client.count(new CountRequest(INDEX_NAME).types(DOCUMENT_TYPE));
        CountResponse countResponse = response.get();
        return countResponse.getCount();
    } finally {
        client.close();
    }
}
After a LONG day of googling and writing test programs, I came up with a working example of how to use TTL and basic index/object creation from the Java API. Frankly, most of the examples in the docs are trivial, and some JavaDoc and end-to-end examples would go a LONG way toward helping those of us who use the non-REST interfaces.
Ah well.
Code here: Adding mapping to a type from Java - how do I do it?
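
For reference, the piece that usually needs to be in place first is the _ttl field in the type mapping. A rough sketch with the legacy (pre-5.x) transport client, reusing the INDEX_NAME and DOCUMENT_TYPE constants from above (note that _ttl was removed in later Elasticsearch versions):

import org.elasticsearch.common.xcontent.XContentBuilder;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

// Enable _ttl on the type mapping (required before per-document ttl() takes
// effect) and give the index a default TTL of 60 seconds.
XContentBuilder mapping = jsonBuilder()
    .startObject()
        .startObject(DOCUMENT_TYPE)
            .startObject("_ttl")
                .field("enabled", true)
                .field("default", "60s")
            .endObject()
        .endObject()
    .endObject();

client.admin().indices()
        .preparePutMapping(INDEX_NAME)
        .setType(DOCUMENT_TYPE)
        .setSource(mapping)
        .execute()
        .actionGet();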

Can I switch use of 'entities.SingleOrDefault' ==> 'entities.Find' without hazards?

In my WCF service's business logic, in most of the places where I need to locate an entity, I use this syntax:
public void UpdateUser(Guid userId, String notes)
{
    using (ProjEntities entities = new ProjEntities())
    {
        User currUser = entities.SingleOrDefault(us => us.Id == userId);
        if (currUser == null)
            throw new Exception("User with ID " + userId + " was not found");
    }
}
I have recently discovered that the DbContext has the Find method, and I understand I can now do this:
public void UpdateUser(Guid userId, String notes)
{
    using (ProjEntities entities = new ProjEntities())
    {
        User currUser = entities.Find(userId);
        if (currUser == null)
            throw new Exception("User with ID " + userId + " was not found");
    }
}
Note: the 'userId' property is the primary key for the table.
I read that when using the Find method, Entity Framework first checks whether the entity is already in local memory, and if so brings it from there; otherwise a trip is made to the database (vs. SingleOrDefault, which always makes a trip to the database).
I was wondering: if I now convert all my uses of SingleOrDefault to Find, is there any potential for danger?
Is there a chance I could get some old data that has not been updated if I use Find and it fetches the data from memory instead of the database?
What happens if I have the user in memory and someone changes the user in the database? Won't it be a problem if I always use this in-memory replica instead of always fetching the latest one from the database?
Is there a chance I could get some old data that has not been updated if I use Find and it fetches the data from memory instead of the database?
I think you have sort of answered your own question here. Yes, there is a chance that using Find you could end up with an entity that is out of sync with your database, because your context holds a local copy.
There isn't much more anyone can tell you without knowing more about your specific application: do you keep a context alive for a long time, or do you open it, do your updates, and close it? Obviously, the longer you keep your context around, the more susceptible you are to retrieving an out-of-date entity.
I can think of two strategies for dealing with this. The first is outlined above: open your context, do what you need, and then dispose of it:
using (var ctx = new MyContext())
{
    var entity = ctx.EntitySet.Find(123);
    // Do something with your entity here...
    ctx.SaveChanges();
}
Secondly, you could retrieve the DbEntityEntry for your entity and use the GetDatabaseValues method to update it with the values from the database. Something like this:
var entity = ctx.EntitySet.Find(123);
// This could be a cached version, so refresh it with the current database values.
var entry = ctx.Entry(entity);
entry.CurrentValues.SetValues(entry.GetDatabaseValues());
