Nest - Reindexing - elasticsearch

Elasticsearch released its new Reindex API in Elasticsearch 2.3.0. Does the current version of NEST (2.1.1) make use of this API yet? If not, are there plans to do so?
I am aware that the current version has a reindex method, but it forces you to create the new index. For my use case, the index already exists.
Any feedback or insights will be greatly appreciated. Thanks!

This kind of question is best asked on the GitHub issues for NEST, since the committers on the project will be best placed to answer :)
A commit went in on 6 April to map the new Reindex API available in Elasticsearch 2.3.0, along with other features like the Task Management API and Update By Query. This made its way into NEST 2.3.0.
NEST 2.x already contains a helper for reindexing that uses scan/scroll under the covers and returns an IObservable<IReindexResponse<T>> that can be used to observe progress:
using System.Runtime.ExceptionServices;
using System.Threading;
using Nest;
public class Document {}
var observable = client.Reindex<Document>("from-index", "to-index", r => r
    // settings to use when creating to-index
    .CreateIndex(c => c
        .Settings(s => s
            .NumberOfShards(5)
            .NumberOfReplicas(2)
        )
    )
    // query to optionally limit documents re-indexed from from-index to to-index
    .Query(q => q.MatchAll())
    // the number of documents to reindex in each request.
    // NOTE: The number of documents in each request will actually be
    // NUMBER * NUMBER OF SHARDS IN from-index
    // since reindex uses scan/scroll
    .Size(100)
);
ExceptionDispatchInfo e = null;
var waitHandle = new ManualResetEvent(false);
var observer = new ReindexObserver<Document>(
    onNext: reindexResponse =>
    {
        // do something with notification. Maybe log total progress
    },
    onError: exception =>
    {
        e = ExceptionDispatchInfo.Capture(exception);
        waitHandle.Set();
    },
    completed: () =>
    {
        // Maybe log completion, refresh the index, etc..
        waitHandle.Set();
    }
);
observable.Subscribe(observer);
// wait for the handle to be signalled
// (ManualResetEvent exposes WaitOne(); Wait() belongs to ManualResetEventSlim)
waitHandle.WaitOne();
// throw the exception if one was captured
e?.Throw();
Take a look at the ReIndex API tests for some ideas.
The new Reindex API is exposed as client.ReindexOnServer() in the client to differentiate it from the existing observable implementation.
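For orientation, the server-side Reindex API mentioned above takes a JSON body that names a source index and a destination index, and the destination may already exist. Below is a minimal, client-agnostic sketch in Java of that request. The class and method names are hypothetical, the index names are the ones from the example above, and nothing is sent to a cluster; it only formats the request.

```java
// Hypothetical helper: builds the JSON body for POST /_reindex.
// It only formats strings; sending them to a cluster is left out.
class ReindexRequestSketch {

    // Body accepted by the Reindex API: a source index and a destination index.
    // Assumes index names need no JSON escaping.
    static String body(String sourceIndex, String destIndex) {
        return "{\"source\":{\"index\":\"" + sourceIndex + "\"},"
             + "\"dest\":{\"index\":\"" + destIndex + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println("POST /_reindex");
        System.out.println(body("from-index", "to-index"));
    }
}
```

Because the copy runs on the server, there is no scan/scroll loop in the client and no index creation step unless you want one.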

Related

How can I enable automatic slicing on Elasticsearch operations like UpdateByQuery or Reindex using the Nest client?

I'm using the Nest client to programmatically execute requests against an Elasticsearch index. I need to use the UpdateByQuery API to update existing data in my index. To improve performance on large data sets, the recommended approach is to use slicing. In my case I'd like to use the automatic slicing feature documented here.
I've tested this out in the Kibana dev console and it works beautifully. I'm struggling with how to set this property in code through the Nest client interface. Here's a code snippet:
var request = new Nest.UpdateByQueryRequest(indexModel.Name);
request.Conflicts = Elasticsearch.Net.Conflicts.Proceed;
request.Query = filterQuery;
// TODO Need to set slices to auto but the current client doesn't allow it and the server
// rejects a value of 0
request.Slices = 0;
var elasticResult = await _elasticClient.UpdateByQueryAsync(request, cancellationToken);
The comments on that property indicate that it can be set to "auto", but it expects a long so that's not possible.
// Summary:
// The number of slices this task should be divided into. Defaults to 1, meaning
// the task isn't sliced into subtasks. Can be set to `auto`.
public long? Slices { get; set; }
Setting to 0 just throws an error on the server. Has anyone else tried doing this? Is there some other way to configure this behavior? Other APIs seem to have the same problem, like ReindexOnServerAsync.
This was a bug in the spec and an unfortunate consequence of generating this part of the client from the spec.
The spec has been fixed and the change will be reflected in a future version of the client. For now, though, it can be set with the following:
var request = new Nest.UpdateByQueryRequest(indexModel.Name);
request.Conflicts = Elasticsearch.Net.Conflicts.Proceed;
request.Query = filterQuery;
((IRequest)request).RequestParameters.SetQueryString("slices", "auto");
var elasticResult = await _elasticClient.UpdateByQueryAsync(request, cancellationToken);
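At the REST level, the workaround above simply puts slices=auto on the query string of the _update_by_query call. Here is a small client-agnostic sketch in Java of that URL. The helper and its validation rule are assumptions for illustration: it accepts "auto" or a positive integer, based only on the observation in this thread that the server rejected 0.

```java
// Hypothetical helper: formats the REST path for _update_by_query with slicing.
class UpdateByQueryUrlSketch {

    // Accepts "auto" or a positive integer (assumption drawn from this thread,
    // where the server rejected 0); anything else is refused here.
    static String slicesValue(String slices) {
        if ("auto".equals(slices)) {
            return slices;
        }
        int n;
        try {
            n = Integer.parseInt(slices);
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("slices must be \"auto\" or a positive integer");
        }
        if (n < 1) {
            throw new IllegalArgumentException("slices must be \"auto\" or a positive integer");
        }
        return Integer.toString(n);
    }

    // conflicts=proceed matches the Conflicts.Proceed setting in the snippet above.
    static String path(String index, String slices) {
        return "/" + index + "/_update_by_query?conflicts=proceed&slices=" + slicesValue(slices);
    }

    public static void main(String[] args) {
        System.out.println("POST " + path("my-index", "auto"));
    }
}
```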

Update Builder gives late response when multiple versions are there in Elasticsearch?

Project : Spring Boot
I'm updating my Elasticsearch document in the following way:
@Override
public Document update(DocumentDTO document) {
    try {
        Document doc = documentMapper.documentDTOToDocument(document);
        Optional<Document> fetchDocument = documentRepository.findById(document.getId());
        if (fetchDocument.isPresent()) {
            fetchDocument.get().setTag(doc.getTag());
            Document result = documentRepository.save(fetchDocument.get());
            final UpdateRequest updateRequest = new UpdateRequest(Constants.INDEX_NAME, Constants.INDEX_TYPE, document.getId().toString());
            updateRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
            updateRequest.doc(jsonBuilder().startObject().field("tag", doc.getTag()).endObject());
            UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
            log.info("ES result : "+ updateResponse.status());
            return result;
        }
    } catch (Exception ex) {
        log.info(ex.getMessage());
    }
    return null;
}
With this, my document is updated successfully and its version is incremented, but once the version goes past 20 it takes a long time to retrieve the data (around 14 seconds).
I'm still confused about how versioning works. How does it behave for updates and deletes? At search time, does Elasticsearch process every version of the data and return the latest one?
Elasticsearch internally uses Lucene, which stores data in immutable segments. Because these segments are immutable, every update in Elasticsearch internally marks the old document as deleted (a soft delete) and inserts a new document (with a new version).
The old document is cleaned up later by a background segment merging process.
A newly updated document should become searchable within 1 second (the default refresh interval), but that interval can be changed or disabled, so please check the refresh_interval setting on your index. I can see you are setting the refresh policy to WAIT_UNTIL (the wait_for parameter) in your code; please remove it, and the updated document should show up quickly as long as you have not changed the default refresh_interval.
Note: update and delete operations work similarly. The only difference is that a delete does not create a new document; the old document is marked as soft-deleted and is permanently removed later during a segment merge.
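To make the soft-delete behaviour concrete, here is a toy model in Java. It is not Lucene's real data structure, and every name in it is hypothetical: an append-only list stands in for segments, an update marks the live copy as deleted and appends a new copy with a higher version, a delete only sets the marker, and merge() drops the marked copies.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: an append-only store that mimics soft deletes and merging.
class SegmentStoreSketch {

    static final class Entry {
        final String id;
        final String source;
        final long version;
        boolean deleted; // soft-delete marker; the only field that ever changes

        Entry(String id, String source, long version) {
            this.id = id;
            this.source = source;
            this.version = version;
        }
    }

    private List<Entry> entries = new ArrayList<>();

    // Index or update: mark the live copy (if any) deleted, then append a new copy.
    long index(String id, String source) {
        long version = 1;
        for (Entry e : entries) {
            if (e.id.equals(id) && !e.deleted) {
                e.deleted = true;
                version = e.version + 1;
            }
        }
        entries.add(new Entry(id, source, version));
        return version;
    }

    // Delete: only mark the live copy; no new copy is written.
    void delete(String id) {
        for (Entry e : entries) {
            if (e.id.equals(id)) {
                e.deleted = true;
            }
        }
    }

    // Lookup: soft-deleted copies are skipped, so only the latest copy is visible.
    String get(String id) {
        for (Entry e : entries) {
            if (e.id.equals(id) && !e.deleted) {
                return e.source;
            }
        }
        return null;
    }

    // Merge: rewrite the store without soft-deleted copies, reclaiming space.
    void merge() {
        List<Entry> live = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.deleted) {
                live.add(e);
            }
        }
        entries = live;
    }

    int storedCopies() {
        return entries.size();
    }

    public static void main(String[] args) {
        SegmentStoreSketch store = new SegmentStoreSketch();
        store.index("1", "{\"tag\":\"a\"}");
        long version = store.index("1", "{\"tag\":\"b\"}");
        System.out.println("version=" + version + " copies=" + store.storedCopies());
        store.merge();
        System.out.println("copies after merge=" + store.storedCopies());
    }
}
```

In this model a search never has to compare versions: older copies are already marked and skipped, and they only cost disk space until a merge removes them.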

Reindexing using NEST V5.4 - ElasticSearch

I'm quite new to ElasticSearch. I'm trying to reindex an index in order to rename it. I'm using NEST API v5.4.
I saw this example:
var reindex =
    elasticClient.Reindex<Customer>(r =>
        r.FromIndex("customers-v1")
            .ToIndex("customers-v2")
            .Query(q => q.MatchAll())
            .Scroll("10s")
            .CreateIndex(i =>
                i.AddMapping<Customer>(m =>
                    m.Properties(p =>
                        p.String(n => n.Name(name => name.Zipcode).Index(FieldIndexOption.not_analyzed))))));
Source: http://thomasardal.com/elasticsearch-migrations-with-c-and-nest/
However, I can't reproduce this using NEST 5.4. I think that example targets version 2.4.
I checked the breaking changes documentation and tried reindexing based on this entry:
Source: https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/nest-breaking-changes.html
public method Nest.ReindexDescriptor..ctor Declaration changed (Breaking)
2.x: public .ctor(IndexName from, IndexName to)
5.x: public .ctor()
var reindex = new client.Reindex(oldIndexName, newIndexName);
But this did not work either.
I also searched the documentation, but I didn't find any C# code, just JSON.
Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
Can someone give me an example of how to reindex using NEST 5.4 in C#?
Thanks in advance!
After searching for 2 long days, I found a way to reindex an index. I'm posting my solution here in case it helps someone with the same problem.
Nest Version - 5.4
var reindex = client.Reindex<object>(r => r
    .BackPressureFactor(10)
    // ScrollAll - scroll through all the documents of the old index, keeping the scroll alive for 1 minute
    .ScrollAll("1m", 2, s => s
        .Search(ss => ss
            .Index(oldIndexName)
            .AllTypes())
        // there needs to be some degree of parallelism for this to work
        .MaxDegreeOfParallelism(4))
    .CreateIndex(c => c
        // New index here
        .Index(newIndexName)
        .Settings(
            // settings go here
        )
        .Mappings(
            // mappings go here
        ))
    .BulkAll(b => b
        // New index here!
        .Index(newIndexName)
        .Size(100)
        .MaxDegreeOfParallelism(2)
        .RefreshOnCompleted()));
The Reindex method returns a cold IObservable on which you have to call .Subscribe() to kick off everything.
So, you need to add this to your code:
var o = new ReindexObserver(
    onError: e => { /* do something */ },
    onCompleted: () => { /* do something */ });
reindex.Subscribe(o);
Useful links to check this are:
Documentation
Issue 2660 on GitHub
Issue 2771 on GitHub
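If "cold" is unfamiliar: a cold observable does no work when it is created, only when something subscribes. The following toy sketch in Java illustrates just that idea. It is a hypothetical stand-in, not how NEST implements its observable, and the batch counter is made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy cold source: constructing it does no work; each call to subscribe()
// runs the whole pipeline for that observer.
class ColdSourceSketch {
    private final int batches;
    int runs = 0; // how many times the work has actually been started

    ColdSourceSketch(int batches) {
        this.batches = batches;
    }

    void subscribe(Consumer<Integer> onNext, Runnable onCompleted) {
        runs++;
        for (int batch = 1; batch <= batches; batch++) {
            onNext.accept(batch); // one notification per processed batch
        }
        onCompleted.run();
    }

    public static void main(String[] args) {
        ColdSourceSketch reindex = new ColdSourceSketch(3);
        System.out.println("runs before subscribe: " + reindex.runs);
        List<Integer> seen = new ArrayList<>();
        reindex.subscribe(seen::add, () -> System.out.println("completed after " + seen));
        System.out.println("runs after subscribe: " + reindex.runs);
    }
}
```

This is why the Reindex call on its own appears to do nothing: until Subscribe is called, no scroll or bulk request is issued.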

NEST: How can I do different operations and mapping types in one bulk request?

I have a list of "event" objects. Each event has its operation (delete, update, index, etc), its mapping type (document, folder, etc.), and the actual content to be indexed into Elasticsearch, if any. I don't know what any of these operations will be in advance. How can I use NEST to dynamically choose the bulk operation and mapping type for each of these events?
The Bulk method on ElasticClient should fit your requirements.
You can pass various bulk operations into the BulkRequest. This is a simple usage:
var bulkRequest = new BulkRequest();
bulkRequest.Operations = new List<IBulkOperation>
{
    new BulkCreateDescriptor<Document>().Id(1).Document(new Document{}),
    new BulkDeleteDescriptor<Document>().Id(2)
};
var bulkResponse = client.Bulk(bulkRequest);
Hope it helps.
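Since the operations are not known in advance, the list has to be built by dispatching on each event at runtime. The client-agnostic Java sketch below shows that dispatch against the newline-delimited body the Bulk API expects: one action line per event, then a payload line for index, create and update, and none for delete. The Event and Op types and the index name are hypothetical stand-ins for your own, _type only applies to Elasticsearch versions that still have mapping types, and no JSON escaping is done.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Client-agnostic sketch: turns a mixed list of events into a Bulk API body.
class BulkBodySketch {

    enum Op { INDEX, CREATE, UPDATE, DELETE }

    static final class Event {
        final Op op;
        final String type; // mapping type, e.g. "document" or "folder"
        final String id;
        final String json; // document source as JSON; null for deletes

        Event(Op op, String type, String id, String json) {
            this.op = op;
            this.type = type;
            this.id = id;
            this.json = json;
        }
    }

    static String toBulkBody(String index, List<Event> events) {
        StringBuilder body = new StringBuilder();
        for (Event e : events) {
            // Action line: the operation name is chosen per event at runtime.
            body.append("{\"").append(e.op.name().toLowerCase(Locale.ROOT))
                .append("\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(e.type)
                .append("\",\"_id\":\"").append(e.id).append("\"}}\n");
            // Payload line: full source for index/create, partial doc for update.
            switch (e.op) {
                case INDEX:
                case CREATE:
                    body.append(e.json).append('\n');
                    break;
                case UPDATE:
                    body.append("{\"doc\":").append(e.json).append("}\n");
                    break;
                case DELETE:
                    break; // deletes carry no payload line
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(Op.INDEX, "document", "1", "{\"title\":\"a\"}"),
            new Event(Op.UPDATE, "folder", "7", "{\"name\":\"b\"}"),
            new Event(Op.DELETE, "document", "2", null));
        System.out.print(toBulkBody("events", events));
    }
}
```

The same shape carries over to NEST: loop over the events, pick the matching IBulkOperation for each one, and add it to bulkRequest.Operations before calling client.Bulk.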

How to present NEST query results?

I want to return NEST query results as console output.
My query is:
private static void PerformTermQuery(string query)
{
    var result =
        client.Search<Post>(s => s
            .Query(p => p.Term(q => q.PostText, query)));
}
What I am getting is an object with 2 documents. How do I "unpack" it to show the documents as JSON (full or partial) in the console?
Assuming you are using version 1.3.1 of NEST, you can:
get raw JSON response using result.RequestInformation.ResponseRaw.Utf8String()
parse JSON to get _source
include/exclude _source properties using SearchSourceDescriptor on SearchDescriptor
var result =
    client.Search<Post>(s => s
        .Query(p => p.Term(q => q.PostText, query)).Source(...));
For NEST / Elasticsearch 5.x, result.RequestInformation is no longer available. Instead, you can access the raw request and response data by first disabling direct streaming on the request:
var results = elasticClient.Search<MyObject>(s => s
    .Index("myindex")
    .Query(q => q
        ...
    )
    .RequestConfiguration(rc => rc
        .DisableDirectStreaming()
    )
);
After you've disabled direct streaming, you can access results.ApiCall.ResponseBodyInBytes (if you look at this property without disabling direct streaming, it will be null):
string rawResponse = Encoding.UTF8.GetString(results.ApiCall.ResponseBodyInBytes);
This probably has a performance impact, so I would avoid using it in production. You can also disable direct streaming at the connection / client level if you need it across all your queries. Take a look at the documentation for more information.
