Elasticsearch upgrade path from transport client to high level REST client - elasticsearch

What is the upgrade path for an application using the Elasticsearch native Java client API (TransportClient) to move to using the high-level REST client for Java?
Documentation (preliminary?) seems to indicate:
The Java High Level REST Client depends on the Elasticsearch core
project. It accepts the same request arguments as the TransportClient
and returns the same response objects.
(Source: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/5.x/java-rest-high.html)
But I am not entirely clear what this means. Will I be able to switch my entire codebase over to the high level REST client without rewriting my queries, or other client-type operations? It doesn't seem like the REST client implements the Client interface. That may make sense from a decoupling point of view.
What I need to know is whether I should be building my own abstraction around client operations, or whether HighLevelRestClient will be basically implementing the Client interface already.
Should I continue, for the time being, to write code against the TransportClient API or will that code all need to be rewritten when TransportClient is deprecated?
Note that I am looking at the high-level REST client, not the low-level REST client.

The high level REST client doesn't implement the Client interface. The plan is described in this blogpost that I wrote a while ago.
We are also in the process of writing documentation, which will contain a page with instructions on how to migrate from the transport client.
The new client reuses the requests and responses of the existing transport client, but the client object itself is not compatible. For instance, the following:
IndexRequest indexRequest = new IndexRequest("index", "type", "id");
indexRequest.source("field", "value");
IndexResponse indexResponse = transportClient.index(indexRequest).get();
will become something like:
IndexRequest indexRequest = new IndexRequest("index", "type", "id");
indexRequest.source("field", "value");
IndexResponse indexResponse = restHighLevelClient.index(indexRequest);
Async requests are called slightly differently: in the new client we went for a separate method whose name ends with the "Async" suffix. You would go from the following:
transportClient.index(indexRequest, new ActionListener<IndexResponse>() {
    @Override
    public void onResponse(IndexResponse indexResponse) {
        // called when the operation is successfully completed
    }
    @Override
    public void onFailure(Exception e) {
        // called on failure
    }
});
to the following:
restHighLevelClient.indexAsync(indexRequest, new ActionListener<IndexResponse>() {
    @Override
    public void onResponse(IndexResponse indexResponse) {
        // called when the operation is successfully completed
    }
    @Override
    public void onFailure(Exception e) {
        // called on failure
    }
});
Unfortunately the Client#prepare* methods won't be available in the high level client, so something like:
IndexResponse indexResponse = transportClient.prepareIndex("index", "type", "id").setSource("field", "value").get();
needs to be migrated to the above, using ActionRequests rather than ActionRequestBuilders. We are making this change because there was always confusion between requests and builders in the transport client: two ways of doing exactly the same thing. The new client will have a single way to provide requests.
If you want to have a look at the current documentation, it is already live although work in progress: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high.html .
The High Level REST Client will replace the Transport Client, although its first upcoming release will only support the index, bulk, get, delete, update, search, search scroll and clear scroll APIs. Support for the missing APIs will come next; we are also open to contributions from users, as usual.
The Transport Client will soon be deprecated, so I would advise moving over to the High Level REST Client as soon as possible. It shouldn't be a huge change, and it will pay off as we improve the client over time; already going through REST is a great improvement.

Related

coordinating multiple outgoing requests in a reactive manner

This is more of a best-practice question.
In my current system (a monolith), a single incoming HTTP API request might need to gather similarly structured data from several backend sources, aggregate it, and only then return the data to the client in the API response.
In the current implementation I simply use a thread pool to send all requests to the backend sources in parallel, and a countdown latch of sorts to know when all requests have returned.
I am trying to figure out the best practice for transforming the above using reactive stacks like Vert.x/Quarkus. I want to keep the service reactive: it accepts this API call, calls multiple (similar) backend sources via HTTP, and aggregates the data.
I can roughly guess that I can use something like RESTEasy Reactive for the incoming request and maybe the MP HTTP client for the backend requests (not sure if it's reactive), but I am not sure what can replace my thread pool to execute things in parallel, or what the best way is to aggregate the data that returns.
I assume that with a reactive HTTP client I can invoke all the backend sources in a loop and, because it's reactive, it will 'feel' like parallel work. And maybe the returned data should be aggregated via the stream API (to join streams of data)? But TBH I am not sure.
I know it's a long question, but some pointers would be great.
thanks!
You can drop the thread pool; you don't need it to invoke your backend services in parallel.
Yes, the MP RestClient is reactive. Let's say you have this service which invokes a backend to get a comic villain:
@RegisterRestClient(configKey = "villain-service")
public interface VillainService {
    @GET
    @Path("/")
    @NonBlocking
    @CircuitBreaker
    Uni<Villain> getVillain();
}
And a similar one for heroes, HeroService. You can inject them in your endpoint class, retrieve a villain and a hero, and then compute the fight:
@Path("/api")
public class Api {
    @RestClient
    VillainService villains;
    @RestClient
    HeroService heroes;
    @Inject
    FightService fights;
    @GET
    public Uni<Fight> fight() {
        Uni<Villain> villain = villains.getVillain();
        Uni<Hero> hero = heroes.getRandomHero();
        return Uni.combine().all().unis(hero, villain).asTuple()
            .chain(tuple -> {
                Hero h = tuple.getItem1();
                Villain v = tuple.getItem2();
                return fights.computeResult(h, v);
            });
    }
}
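For the case raised in the question, where the number of similar backend sources is not fixed, the same "start everything, then combine" shape can be sketched with nothing but the JDK. This is only an analogue of the Mutiny code above, not Quarkus API: CompletableFuture stands in for Uni, and the Aggregator class, the fetch method and the source names are hypothetical placeholders for real non-blocking HTTP calls.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical helper; the names are illustrative only.
class Aggregator {

    // Stand-in for a non-blocking backend call. A real implementation
    // would issue an HTTP request with a reactive client instead.
    static CompletableFuture<List<String>> fetch(String source) {
        return CompletableFuture.supplyAsync(() -> List.of(source + "-a", source + "-b"));
    }

    static CompletableFuture<List<String>> fetchAll(List<String> sources) {
        // Start every request first; nothing waits for a result here.
        List<CompletableFuture<List<String>>> pending =
                sources.stream().map(Aggregator::fetch).collect(Collectors.toList());

        // allOf completes once every request has completed, so the join()
        // calls inside thenApply return immediately.
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> pending.stream()
                        .flatMap(f -> f.join().stream())
                        .collect(Collectors.toList()));
    }
}
```

The results are aggregated in the order of the input list, regardless of which backend answers first. No countdown latch is needed: the combined future plays that role.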

Start processing Flux response from server before completion: is it possible?

I have 2 Spring-Boot-Reactive apps, one server and one client; the client calls the server like so:
Flux<Thing> things = thingsApi.listThings(5);
And I want to have this as a list for later use:
// "extractContent" operation takes 1.5s per "thing"
List<String> thingsContent = things.map(ThingConverter::extractContent)
    .collect(Collectors.toList())
    .block();
On the server side, the endpoint definition looks like this:
@Override
public Mono<ResponseEntity<Flux<Thing>>> listThings(
    @NotNull @Valid @RequestParam(value = "nbThings") Integer nbThings,
    ServerWebExchange exchange
) {
    // "getThings" operation takes 1.5s per "thing"
    Flux<Thing> things = thingsService.getThings(nbThings);
    return Mono.just(new ResponseEntity<>(things, HttpStatus.OK));
}
The signature comes from the Open-API generated code (Spring-Boot server, reactive mode).
What I observe: the client jumps to things.map immediately but only starts processing the Flux after the server has finished sending all the "things".
What I would like: the server should send the "things" as they are generated so that the client can start processing them as they arrive, effectively halving the processing time.
Is there a way to achieve this? I've found many tutorials online for the server part, but none with a java client. I've heard of server-sent events, but can my goal be achieved using a "classic" Open-API endpoint definition that returns a Flux?
The problem seemed too complex to fit a minimal viable example in the question body; full code available for reference on Github.
EDIT: redirect link to main branch after merge of the proposed solution
I've got it running by changing 2 points:
First: I've changed the content type of the response of your /things endpoint, to:
content:
text/event-stream
Don't forget to also change the default response; otherwise the client will expect the type application/json and will wait for the whole response.
Second: I've changed the return of ThingsService.getThings to this.getThingsFromExistingStream (the method you commented out).
I pushed my changes to a new branch fix-flux-response on your Github, so you can test them directly.

Spring Boot Webflux/Netty - Detect closed connection

I've been working with spring-boot 2.0.0.RC1 using the webflux starter (spring-boot-starter-webflux). I created a simple controller that returns an infinite Flux. I would like the Publisher to do its work only while there is a client (Subscriber). Let's say I have a controller like this one:
@RestController
public class Demo {
    @GetMapping(value = "/")
    public Flux<String> getEvents(){
        return Flux.create((FluxSink<String> sink) -> {
            while(!sink.isCancelled()){
                // TODO e.g. fetch data from somewhere
                sink.next("DATA");
            }
            sink.complete();
        }).doFinally(signal -> System.out.println("END"));
    }
}
Now, when I try to run that code and access the endpoint http://localhost:8080/ with Chrome, then I can see the data. However, once I close the browser the while-loop continues since no cancel event has been fired. How can I terminate/cancel the streaming as soon as I close the browser?
From this answer I quote that:
Currently with HTTP, the exact backpressure information is not
transmitted over the network, since the HTTP protocol doesn't support
this. This can change if we use a different wire protocol.
I assume that, since backpressure is not supported by the HTTP protocol, it means that no cancel request will be made either.
Investigating a little bit further, by analyzing the network traffic, showed that the browser sends a TCP FIN as soon as I close the browser. Is there a way to configure Netty (or something else) so that a half-closed connection will trigger a cancel event on the publisher, making the while-loop stop?
Or do I have to write my own adapter similar to org.springframework.http.server.reactive.ServletHttpHandlerAdapter where I implement my own Subscriber?
Thanks for any help.
EDIT:
An IOException will be raised on the attempt to write data to the socket if there is no client, as you can see in the stack trace.
But that's not good enough: it might take a while before the next chunk of data is ready to send, and it therefore takes the same amount of time to detect that the client is gone. As pointed out in Brian Clozel's answer, it is a known issue in Reactor Netty. I tried to use Tomcat instead by adding the dependency to the pom.xml, like this:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</dependency>
Although it replaces Netty and uses Tomcat instead, it does not seem reactive due to the fact that the browser does not show any data. However, there is no warning/info/exception in the console. Is spring-boot-starter-webflux as of this version (2.0.0.RC1) supposed to work together with Tomcat?
Since this is a known issue (see Brian Clozel's answer), I ended up using one Flux to fetch my real data and having another one in order to implement some sort of ping/heartbeat mechanism. As a result, I merge both together with Flux.merge().
Here you can see a simplified version of my solution:
@RestController
public class Demo {
    public interface Notification{}
    public static class MyData implements Notification{
        …
        public boolean isEmpty(){…}
    }
    @GetMapping(value = "/", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<? extends Notification>> getNotificationStream() {
        return Flux.merge(getEventMessageStream(), getHeartbeatStream());
    }
    private Flux<ServerSentEvent<Notification>> getHeartbeatStream() {
        return Flux.interval(Duration.ofSeconds(2))
            .map(i -> ServerSentEvent.<Notification>builder().event("ping").build())
            .doFinally(signalType -> System.out.println("END"));
    }
    private Flux<ServerSentEvent<MyData>> getEventMessageStream() {
        return Flux.interval(Duration.ofSeconds(30))
            .map(i -> {
                // TODO e.g. fetch data from somewhere,
                // if there is no data return an empty object
                return data;
            })
            .filter(data -> !data.isEmpty())
            .map(data -> ServerSentEvent
                .builder(data)
                .event("message").build());
    }
}
I wrap everything up as ServerSentEvent<? extends Notification>. Notification is just a marker interface. I use the event field of the ServerSentEvent class to distinguish between data and ping events. Since the heartbeat Flux sends events constantly and at short intervals, the time it takes to detect that the client is gone is at most the length of that interval. Remember, I need that because it might take a while before I get some real data that can be sent and, as a result, it might also take a while before it detects that the client is gone. Like this, it will detect that the client is gone as soon as it can't send the ping (or possibly the message event).
One last note on the marker interface, which I called Notification. It is not really necessary, but it gives some type safety. Without it, we could write Flux<ServerSentEvent<?>> instead of Flux<ServerSentEvent<? extends Notification>> as the return type of the getNotificationStream() method. Another possibility would be to make getHeartbeatStream() return Flux<ServerSentEvent<MyData>>. However, that would allow any object to be sent, which I don't want. As a consequence, I added the interface.
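The reasoning behind the heartbeat can be shown without Spring or Reactor: a sender only learns that the peer is gone when a write fails, so writing something small at a fixed interval bounds the detection time by that interval. The following is a minimal sketch under that assumption. HeartbeatWriter and FailingStream are hypothetical names, and FailingStream is only a test double standing in for a socket whose peer disappears.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Spring or Reactor: writes a ping and
// reports whether the write was still accepted.
class HeartbeatWriter {
    private final OutputStream out;
    private boolean clientGone = false;

    HeartbeatWriter(OutputStream out) {
        this.out = out;
    }

    // Returns false once a write has failed. A failed write is the only
    // point at which the closed connection becomes visible to the sender.
    synchronized boolean beat() {
        if (clientGone) {
            return false;
        }
        try {
            out.write("event:ping\n\n".getBytes(StandardCharsets.UTF_8));
            out.flush();
            return true;
        } catch (IOException e) {
            clientGone = true;
            return false;
        }
    }

    synchronized boolean isClientGone() {
        return clientGone;
    }
}

// Test double: accepts a fixed number of writes, then fails like a closed socket.
class FailingStream extends OutputStream {
    private int remaining;

    FailingStream(int writesBeforeFailure) {
        this.remaining = writesBeforeFailure;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (remaining-- <= 0) {
            throw new IOException("peer closed the connection");
        }
    }
}
```

If beat() were scheduled every two seconds, as the ping interval in the solution above is, the slow data source could be stopped at most about two seconds after the client disconnects, instead of waiting for the next real message.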
I'm not sure why this behaves like this, but I suspect it is because of the choice of generation operator. I think using the following would work:
return Flux.interval(Duration.ofMillis(500))
.map(input -> {
return "DATA";
});
According to Reactor's reference documentation, you're probably hitting the key difference between generate and push (I believe a quite similar approach using generate would probably work as well).
My comment was referring to the backpressure information (how many elements a Subscriber is willing to accept), but the success/error information is communicated over the network.
Depending on your choice of web server (Reactor Netty, Tomcat, Jetty, etc), closing the client connection might result in:
a cancel signal being received on the server side (I think this is supported by Netty)
an error signal being received by the server when it's trying to write on a connection that's been closed (I believe the Servlet spec does not provide that callback and we're missing the cancel information).
In short: you don't need to do anything special, it should be supported already, but your Flux implementation might be the actual problem here.
Update: this is a known issue in Reactor Netty

Netty: How to add websocket handshake and framing while still supporting native socket

To me it looks like there is no out-of-the-box support for mixed WebSocket/native socket in Netty 4. I'm using a custom binary protocol on my server, and it is supposed to support both native sockets and WebSockets on the same port. Here is what I'm trying in my ServerInitializer:
@Override
public void initChannel(SocketChannel ch) {
    System.out.println("channel initialized");
    ChannelPipeline pipeline = ch.pipeline();
    pipeline.addLast(new HttpServerCodec());
    pipeline.addLast(new HttpObjectAggregator(65536));
    // client decoders cannot be singleton....
    pipeline.addLast(new WebSocketDecoder(), new ClientCommandDecoder());
    pipeline.addLast(this.webSocketEncoder, this.serverCommandEncoder);
    pipeline.addLast(this.roomHandler);
}
The WebSocketDecoder is taken from the examples; however, it seems to use a handshaker that handles only FullHttpRequests, which makes the use of HttpObjectAggregator mandatory.
However, neither HttpServerCodec nor HttpObjectAggregator seems to pass the input data along if it is not an HTTP request. So here is what I wonder:
Can I write custom implementations of these classes and override their logic so that the input data is passed along when it comes from a native socket rather than a WebSocket?
Or can I somehow detect whether the input data comes from a WebSocket and branch into two different flows (one with HTTP support, the other without)?
You will need to adjust the pipeline on the fly depending on your input.
Please check our PortUnification example...
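The detection step itself can be sketched in plain Java. A WebSocket connection always opens with an HTTP GET upgrade request, so the first few bytes are enough to choose a flow. ProtocolSniffer below is a hypothetical helper, not Netty API, and it assumes the custom binary protocol never opens a connection with the ASCII bytes "GET ".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the detection step only; not Netty API.
class ProtocolSniffer {

    enum Protocol { WEBSOCKET_HANDSHAKE, NATIVE_BINARY, UNDECIDED }

    private static final byte[] HTTP_GET = "GET ".getBytes(StandardCharsets.US_ASCII);

    // Compares the bytes received so far with the start of an HTTP GET request.
    // UNDECIDED means fewer than four bytes have arrived and all of them match,
    // so the caller should wait for more input before choosing a pipeline.
    static Protocol detect(byte[] firstBytes) {
        int n = Math.min(firstBytes.length, HTTP_GET.length);
        for (int i = 0; i < n; i++) {
            if (firstBytes[i] != HTTP_GET[i]) {
                return Protocol.NATIVE_BINARY;
            }
        }
        return n == HTTP_GET.length ? Protocol.WEBSOCKET_HANDSHAKE : Protocol.UNDECIDED;
    }
}
```

In Netty, a check like this would typically live in a decoder placed first in the pipeline which, once it has decided, adds either the HTTP/WebSocket handlers or the binary decoders and then removes itself. That is the approach the PortUnification example takes.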

Heavy REST Application

I have an Enterprise Service Bus (ESB) that posts data to microservices (MCS) via REST. I use Spring to do this. The main problem is that I have 6 microservices that run one after another, so the flow looks like this: MCS1 -> ESB -> MCS2 -> ESB -> ... -> MCS6
So my problem looks like this (ESB):
@RequestMapping(value = "/rawdataservice/container", method = RequestMethod.POST)
@Produces(MediaType.APPLICATION_JSON)
public void rawContainer(@RequestBody Container c)
{
    // Here i want to do something to directly send a response and afterwards execute the
    // heavy code
    // In the heavy code is a postForObject to the next Microservice
}
And the Service does something like this:
@RequestMapping(value = "/container", method = RequestMethod.POST)
public void addDomain(@RequestBody Container container)
{
    heavyCode();
    RestTemplate rt = new RestTemplate();
    rt.postForObject("http://134.61.64.201:8080/rest/rawdataservice/container", container, Container.class);
}
But I don't know how to do this. I looked at the postForLocation method, but I don't think it would solve the problem.
EDIT:
I have a chain of microservices. The first microservice waits for a response from the ESB. Before responding, the ESB posts to another microservice and waits for its response, and the next one does the same as the first one. So the problem is that the first microservice is blocked until the complete microservice route has finished.
ESB Route
Maybe a picture could help. 1.rawdataService 2.metadataservice 3.syntaxservice 4.semantik
// Here i want to do something to directly send a response and afterwards execute the
// heavy code
The usual spelling of that is to use the data from the HTTP request to create a Runnable that knows how to do the work, and dispatch that Runnable to an executor service for later processing. In much the same way, you can copy the data you need into a queue, which is polled by other threads ready to complete the work.
The http request handler then returns as soon as the executor service/queue has accepted the pending work. The most common implementation is to return a "202 Accepted" response, including in the Location header the url for a resource that will allow the client to monitor the work in progress, if desired.
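A framework-free sketch of that hand-off follows. AsyncAcceptor, the Accepted record and the /jobs/ path are hypothetical names chosen for illustration; in a Spring controller the returned value would be translated into a 202 response with a Location header.

```java
import java.net.URI;
import java.util.UUID;
import java.util.concurrent.ExecutorService;

// Hypothetical value object: the status code and Location the handler would return.
record Accepted(int status, URI location) {}

// Hypothetical helper; the names are illustrative only.
class AsyncAcceptor {
    private final ExecutorService executor;

    AsyncAcceptor(ExecutorService executor) {
        this.executor = executor;
    }

    // Queues the heavy work and returns immediately. The caller gets its
    // answer as soon as the executor has accepted the task, not when the
    // task has finished.
    Accepted accept(Runnable heavyWork) {
        String jobId = UUID.randomUUID().toString();
        executor.submit(heavyWork);
        // Assumed URL layout for a resource where progress could be monitored.
        return new Accepted(202, URI.create("/jobs/" + jobId));
    }
}
```

With this shape, each microservice in the chain is released as soon as the next hop has queued the work, so the first service no longer waits for the whole route. The executor should be shut down when the application stops.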
In Spring, it might be ResponseEntity that manages the codes for you. For instance
ResponseEntity.accepted()....
See also:
How to respond with HTTP 400 error in a Spring MVC @ResponseBody method returning String?
REST - Returning Created Object with Spring MVC
From the caller's point of view, it would invoke RestTemplate.postForLocation, receive a URI, and throw away that URI, because the microservice only needs to know that the work has been accepted.
Side note: in the long term, you are probably going to want to be able to correlate the activities of the different micro services, especially when you are troubleshooting. So make sure you understand what Gregor Hohpe has to say about correlation identifiers.

Resources