Avoid using scroll in ReactiveElasticsearchTemplate, or clear/close the scroll after use

We have a Spring WebFlux application that queries Elasticsearch using ReactiveElasticsearchTemplate like this:
final inline fun <reified R> getSearchMonoList(
indexName: String,
query: NativeSearchQuery,
metricName: String
): Mono<List<SearchHit<R>>> {
val startTime = getCurrentTime()
recordThroughput(metricName, THROUGHPUT)
return when (indexName == EMPTY_STRING) {
true -> getEsClientTemplate().search(query, R::class.java).collectList()
else -> getEsClientTemplate().search(query, R::class.java, IndexCoordinates.of(indexName)).collectList()
}.doOnError {
recordThroughput(metricName, FAILED_THROUGHPUT)
}.doFinally {
getEsClientTemplate()
recordTime(metricName, startTime)
}
}
The client configuration is:
public ReactiveElasticsearchClient reactiveElasticsearchClient() {
ClientConfiguration clientConfiguration = ClientConfiguration.builder()
.connectedTo(urls.toArray(String[]::new))
.withWebClientConfigurer(webClient -> {
final ExchangeStrategies exchangeStrategies = ExchangeStrategies.builder()
.codecs(configurer -> configurer.defaultCodecs()
.maxInMemorySize(-1))
.build();
return webClient.mutate().exchangeStrategies(exchangeStrategies).build();
})
.build();
return ReactiveRestClients.create(clientConfiguration);
}
Our problem
Whenever the template queries Elasticsearch, the request body is {"from":0,"size":10000}, i.e. it asks for 10,000 records, which is far too many for a term query.
We also do not want to use scroll, because this API is very heavily used and scrolling exhausts the limit on open scroll contexts.
We have fixed the size in the query builder by using a Pageable:
NativeSearchQueryBuilder().withQuery(boolQueryBuilders).withPageable(PageRequest.of(0,1))
I am aware of the Spring documentation in which the following was the suggested solution:
SearchScrollHits<SampleEntity> scroll = template.searchScrollStart(1000, searchQuery, SampleEntity.class, index);
String scrollId = scroll.getScrollId();
List<SampleEntity> sampleEntities = new ArrayList<>();
while (scroll.hasSearchHits()) {
sampleEntities.addAll(scroll.getSearchHits());
scrollId = scroll.getScrollId();
scroll = template.searchScrollContinue(scrollId, 1000, SampleEntity.class);
}
template.searchScrollClear(scrollId);
However, we cannot update the libraries in production at the moment because of DMZ restrictions. We are using Spring Data Elasticsearch 4.1.1.
How can I disable scroll, or clear the scroll after use? Any help would be appreciated.
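For reference, the from/size window that a page request stands for can be sketched in plain Java. This only illustrates the arithmetic (offset = page × size) behind PageRequest.of(0,1). `PageWindow` and `toRequestBody` are hypothetical names, not Spring Data Elasticsearch API, and the sketch says nothing about how the template chooses between a plain search and a scroll.

```java
final class PageWindow {
    // Hypothetical helper, not part of Spring Data: maps a zero-based page
    // and a page size to the "from"/"size" pair of an Elasticsearch request body.
    static String toRequestBody(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        long from = (long) page * size; // offset of the first hit on this page
        return "{\"from\":" + from + ",\"size\":" + size + "}";
    }

    public static void main(String[] args) {
        System.out.println(toRequestBody(0, 1));  // {"from":0,"size":1}
        System.out.println(toRequestBody(2, 50)); // {"from":100,"size":50}
    }
}
```

With page 0 and size 1 the request window shrinks from 10,000 hits to a single hit.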

Related

Springboot webflux reactor delete items from mongoDB

I use Spring Boot with MongoDB, and I am a beginner with WebFlux. I wrote code for CRUD operations. When I delete by IDs in the controller, the code does not work because count always returns 0. Can anyone help me?
@ApiOperation(value = "Delete multi cities")
@DeleteMapping
public Mono<ResponseEntity<AtomicInteger>> deleteByIds(@RequestBody @NotNull Set<String> ids) {
AtomicInteger count = new AtomicInteger(0);
Flux.fromIterable(ids)
.flatMap((id) -> {
return cityService.findById(id)
.flatMap((c) -> {
count.getAndAdd(1);
return cityService.deleteById(c.getId());
});
});
log.debug("count = {}", count);
return Mono.just(ResponseEntity.ok(count));
}
The Flux is never subscribed to, so nothing in the pipeline runs.
Try this instead:
return Flux.fromIterable(ids)
.flatMap((id) -> {
return cityService.findById(id)
.flatMap((c) -> {
count.getAndAdd(1);
return cityService.deleteById(c.getId());
});
})
.then(Mono.defer(() -> {
log.debug("count = {}", count);
return Mono.just(ResponseEntity.ok(count));
}));
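
The same "nothing runs until something consumes it" behaviour can be seen with plain JDK streams. This is only an analogy, assuming that `java.util.stream` laziness is a fair stand-in for an unsubscribed Flux; it is not Reactor code, and `LazyPipelineDemo` and its methods are made-up names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

final class LazyPipelineDemo {
    // Builds a pipeline but never calls a terminal operation, so the
    // side effect in peek() never runs (like a Flux nobody subscribes to).
    static int countWithoutTerminalOp(List<String> ids) {
        AtomicInteger count = new AtomicInteger(0);
        Stream<String> pipeline = ids.stream().peek(id -> count.incrementAndGet());
        return count.get();
    }

    // Same pipeline, but forEach() consumes it, so every element is counted.
    static int countWithTerminalOp(List<String> ids) {
        AtomicInteger count = new AtomicInteger(0);
        ids.stream().peek(id -> count.incrementAndGet()).forEach(id -> { });
        return count.get();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "c");
        System.out.println(countWithoutTerminalOp(ids)); // 0
        System.out.println(countWithTerminalOp(ids));    // 3
    }
}
```

Returning the Flux chain from the controller plays the role of the terminal operation here: WebFlux subscribes to whatever the handler returns.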

NoNodeAvailableException[None of the configured nodes were available:

If I do not set the size, I get 10 hits:
SearchResponse sr = client.prepareSearch("xxx").setTypes("xxx")
.setQuery(rangeQueryBuilder)
.setQuery(queryBuilder)
But when I set the size to more than 12:
SearchResponse sr = client.prepareSearch("xxx").setTypes("xxx")
.setSize(13)
.setQuery(rangeQueryBuilder)
.setQuery(queryBuilder)
I get this problem:
NoNodeAvailableException[None of the configured nodes were available: [{gw_172.28.236.85:40001}{oHcfPhqFQDSW4opwUuzCpA}{P1GbtDqrRda4nlbRRBmW1Q}{172.28.236.85}{172.28.236.85:40101}{xpack.installed=true},
My Java connection code:
public static TransportClient client() throws UnknownHostException {
if (client != null) {
return client;
}
synchronized (esConnection_old.class) {
if (client == null) {
Settings settings = Settings.builder().put("cluster.name", ClusterName)
.put("client.transport.sniff", false)
.put(SecurityKey, basicAuthHeaderValue(SecurityUser, SecurityPassword))
.build();
client = new PreBuiltTransportClient(settings);
String[] oneInstance = GatewayIpPorts.split(",");
for (String item : oneInstance) {
String[] ipPort = item.split(":");
client.addTransportAddresses(new TransportAddress(InetAddress.getByName(ipPort[0]), Integer.parseInt(ipPort[1])));
}
return client;
}
return client;
}
}
Normally this exception occurs when Elasticsearch needs to perform an action on a node (shard allocation, indexing, or searching data) and cannot find a node that can serve the request.
You can look at the NoNodeAvailableException code and trace back from it. I looked at the latest code and could not find "None of the configured nodes were available:" for the search action you are trying to perform.
Please provide your Elasticsearch version, and confirm whether this exception occurs only because the size parameter is greater than 10.

Spring reactive : mixing RestTemplate & WebClient

I have two endpoints : /parent and /child/{parentId}
I need to return a list of all Child objects.
public class Parent {
private long id;
private Child child;
}
public class Child {
private long childId;
private String someAttribute;
}
However, the call to /child/{parentId} is quite slow, so I'm trying to do this:
1. Call /parent to get 100 parents, using the asynchronous RestTemplate
2. For each parent, call /child/{parentId} to get the details
3. Add the result of each call to /child/{parentId} to resultList
4. When all 100 calls to /child/{parentId} are done, return resultList
I use a wrapper class since most endpoints return JSON in this format:
{
"next": "String",
"data": [
// parent goes here
]
}
So I wrap it in this class:
public class ResponseWrapper<T> {
private List<T> data;
private String next;
}
I wrote this code, but resultList is always returned empty.
What is the correct way to achieve this?
public List<Child> getAllParents() {
var endpointParent = StringUtils.join(HOST, "/parent");
var resultList = new ArrayList<Child>();
var responseParent = restTemplate.exchange(endpointParent, HttpMethod.GET, httpEntity,
new ParameterizedTypeReference<ResponseWrapper<Parent>>() {
});
responseParent.getBody().getData().stream().forEach(parent -> {
var endpointChild = StringUtils.join(HOST, "/child/", parent.getId());
// async call due to slow endpoint child
webClient.get().uri(endpointChild).retrieve()
.bodyToMono(new ParameterizedTypeReference<ResponseWrapper<Child>>() {
}).map(wrapper -> wrapper.getData()).subscribe(children -> {
children.stream().forEach(child -> resultList.add(child));
});
});
return resultList;
}
Calling subscribe on a reactive type starts the processing but returns immediately; you have no guarantee at that point that the processing is done. So by the time your snippet reaches return resultList, the WebClient is probably still busy fetching things.
You're better off discarding the async RestTemplate (which is now deprecated in favour of WebClient) and building a single pipeline like this:
public List<Child> getAllParents() {
var endpointParent = StringUtils.join(HOST, "/parent");
// Fetch the parents reactively; the typed reference keeps the generic Parent type
Flux<Parent> parents = webClient.get().uri(endpointParent)
.retrieve().bodyToMono(new ParameterizedTypeReference<ResponseWrapper<Parent>>() {
})
.flatMapMany(wrapper -> Flux.fromIterable(wrapper.getData()));
// Fetch the children of each parent concurrently, then block once for the combined list
return parents.flatMap(parent -> {
var endpointChild = StringUtils.join(HOST, "/child/", parent.getId());
return webClient.get().uri(endpointChild).retrieve()
.bodyToMono(new ParameterizedTypeReference<ResponseWrapper<Child>>() {
}).flatMapMany(wrapper -> Flux.fromIterable(wrapper.getData()));
}).collectList().block();
}
By default, the parents.flatMap operator processes elements with some concurrency (the default is Queues.SMALL_BUFFER_SIZE, which is 256 unless overridden). You can choose a different value by calling the variant of the Flux.flatMap operator that takes a concurrency argument.
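
The difference between "fire the calls and return" and "return once every call is done" can also be sketched with the JDK alone. This is an analogy using CompletableFuture rather than WebClient or Reactor. `FanOutDemo`, `fetchChildren`, and `fetchAll` are hypothetical names, and the sleep merely simulates the slow /child/{parentId} endpoint.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class FanOutDemo {
    // Hypothetical stand-in for the slow /child/{parentId} call.
    static List<String> fetchChildren(long parentId) {
        try {
            Thread.sleep(50); // simulate latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.of("child-of-" + parentId);
    }

    // Starts one async call per parent, at most `concurrency` at a time,
    // and returns only after every call has completed.
    static List<String> fetchAll(List<Long> parentIds, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<CompletableFuture<List<String>>> futures = parentIds.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> fetchChildren(id), pool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join) // blocks until that call is done
                    .flatMap(List::stream)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [child-of-1, child-of-2, child-of-3]
        System.out.println(fetchAll(List.of(1L, 2L, 3L), 2));
    }
}
```

Unlike Flux.flatMap, which emits results as they arrive, this sketch joins the futures in submission order, so the result order matches the parent order.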

Validation Webflux ReactiveMongo

I want to ask about WebFlux with Spring Data Reactive MongoDB: how do you check whether data already exists? I have been searching for 3 days but have not found a way.
public Mono<TenantRegistrationModel> registration(TenantRegistrationModel registrationModel1) {
registrationModel1.beforeRegistration();
return Mono.just(registrationModel1).flatMap(registrationModel -> {
TenantEntity tenantEntity = new TenantEntity();
tenantEntity.setFullName(registrationModel.getFullName());
tenantEntity.setEmail(registrationModel.getEmail());
tenantEntity.setCountry(registrationModel.getCountry());
tenantEntity.setCity(registrationModel.getCity());
BCryptPasswordEncoder passwordEncoder = new BCryptPasswordEncoder();
tenantEntity.setPassword(passwordEncoder.encode(registrationModel.getPassword()));
tenantEntity.setUpdatedAt(new Date());
tenantEntity.setCreatedAt(new Date());
Mono<TenantEntity> tenantEntityMono = this.tenantService.save(tenantEntity);
return this.tenantRepository.findByEmail(registrationModel.getEmail()).next().doOnNext(tenantEntity1 -> {
if (tenantEntity1.getEmail() != null)
throw new ApiExceptionUtils("email already eixst", HttpStatus.UNPROCESSABLE_ENTITY.value(), StatusCodeUtils.VALIDATION_FAIL);
}).switchIfEmpty(tenantEntityMono).map(tenantEntity1 -> {
registrationModel.setId(tenantEntity1.getId());
return registrationModel;
});
});
}
I want to validate two queries from reactive MongoDB and compare the results.

Elastic Search and Twitter Data example

I am learning Elasticsearch by following a tutorial that uses tweets from Twitter as example data. The method searchForTweets returns the example data as tweetJsonList. I am trying to save it in the index "tweets_juan" with type "tweet". The application runs without problems, but when I search all documents using (http://localhost:9200/tweets_juan/tweet/_search?q=*:*) I do not find anything. Could you please help me work out what is happening here?
public class App
{
@SuppressWarnings("unchecked")
public static void main( String[] args ) throws TwitterException, UnknownHostException
{
System.out.println( "Hello World!" );
List<String> tweetJsonList = searchForTweets();
Client client = TransportClient.builder().build()
.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
String index = "tweets_juan";
client.admin().indices()
.create(new CreateIndexRequest(index))
.actionGet();
save(client, tweetJsonList, index);
searchExample(client);
}
public static void save(Client client, List<String> tweetJsonList, String index) {
BulkRequestBuilder bulkRequestBuilder = client.prepareBulk().setRefresh(true);
for (String data : tweetJsonList) {
String indexName = index;
String typeName = "tweet";
String json = new Gson().toJson(data);
System.out.println("Juan Debug:" + data);
bulkRequestBuilder.add(client.prepareIndex(indexName, typeName).setSource(json));
}
bulkRequestBuilder.execute().actionGet();
}
public static void searchExample(Client client) {
BoolQueryBuilder queryBuilder = QueryBuilders
.boolQuery()
.must(termsQuery("text", "Baloncesto"));
SearchResponse searchResponse = client.prepareSearch("tweets_juan")
.setQuery(queryBuilder)
.setSize(25)
.execute()
.actionGet();
}
public static List searchForTweets() throws TwitterException {
Twitter twitter = new TwitterFactory().getInstance();
Query query = new Query("mundial baloncesto");
List tweetList = new ArrayList<>();
for (int i = 0; i < 10; i++) {
QueryResult queryResult = twitter.search(query);
tweetList.addAll(queryResult.getTweets());
if (!queryResult.hasNext()) {
break;
}
query = queryResult.nextQuery();
}
Gson gson = new Gson();
return (List) tweetList.stream().map(gson::toJson).collect(Collectors.toList());
}
}
You need to put more information before anyone can answer your question.
Since you are not using an explicit mapping, your fields are analyzed by default, so your text field is tokenized into multiple terms.
Use a "match all" query to see what data has been indexed.
A term query is for exact matches (including exact case), and you are running a term query on the analyzed field "text", which will not work.
Try a match or match phrase query on the text field and see if you get any results back.
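
To see why the case of the term matters, here is a rough plain-Java model of the behaviour described above. It assumes a much simplified analyzer (split on non-alphanumeric characters, then lowercase); it is not Elasticsearch's actual analysis chain, and `AnalyzedFieldDemo` and its methods are invented for illustration.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class AnalyzedFieldDemo {
    // Very rough model of a default text analyzer: split on anything that is
    // not a letter or digit, then lowercase each token.
    static Set<String> analyze(String text) {
        return Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    // Term-style lookup: the query term is compared as-is with the indexed tokens.
    static boolean termMatches(String indexedText, String term) {
        return analyze(indexedText).contains(term);
    }

    // Match-style lookup: the query text is analyzed first, then any token may match.
    static boolean matchMatches(String indexedText, String queryText) {
        Set<String> indexed = analyze(indexedText);
        return analyze(queryText).stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        String tweet = "Mundial de Baloncesto 2019";
        System.out.println(termMatches(tweet, "Baloncesto"));  // false
        System.out.println(termMatches(tweet, "baloncesto"));  // true
        System.out.println(matchMatches(tweet, "Baloncesto")); // true
    }
}
```

In this model the indexed token is "baloncesto", so the capitalised term "Baloncesto" finds nothing, while the match-style lookup lowercases the query first and succeeds.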
