I am using RestHighLevelClient to fetch documents from ES storage.
.....
RestHighLevelClient client = new RestHighLevelClient(restClientBuilder);
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(60L));
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.size(100);
sourceBuilder.query(QueryBuilders.matchQuery("id", id));
SearchRequest searchRequest = new SearchRequest("my-index");
searchRequest.scroll(scroll);
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
Currently this fetches all fields, but I want to fetch only one specific field.
How can this be done using RestHighLevelClient?
You need to use source filtering and pass the array of field names that you want to fetch. The request below, adapted from your example, fetches only the title field and excludes everything else from the response.
RestHighLevelClient client = new RestHighLevelClient(restClientBuilder);
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(60L));
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.size(100);
String[] includeFields = new String[] {"title"};
sourceBuilder.fetchSource(includeFields, null);
sourceBuilder.query(QueryBuilders.matchQuery("id", id));
SearchRequest searchRequest = new SearchRequest("my-index");
searchRequest.scroll(scroll);
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
I am trying to get all the documents from multiple indexes with the Scroll API, but it doesn't return all of them. I found a similar question, but there the OP was clearly missing the first set of documents. Link to the question: Elasticsearch Search Scroll API doesn't retrieve all the documents from an index
Here is my code:
//Code to get indexes
for (String indexName : indexNames) {
final Scroll scroll = new Scroll(TimeValue.timeValueSeconds(45L));
SearchRequest searchRequest = new SearchRequest(indexName);
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder query = QueryBuilders.boolQuery()
.filter(QueryBuilders.termQuery(sourceId, 2))
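// date bounds must be quoted strings; unquoted, Java evaluates them as integer subtraction
.filter(QueryBuilders.rangeQuery(date).gte("01-05-2021").lte("31-05-2021").format("dd-MM-yyyy"));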
searchSourceBuilder.query(query);
searchSourceBuilder.size(10000);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();
List<Model> model = new ArrayList<>();
while(searchHits != null && searchHits.length > 0) {
for (SearchHit document : searchHits){
//add document to model list created above
} //end of for loop
// insert model list to database
SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
searchScrollRequest.scroll(scroll);
searchResponse = client.scroll(searchScrollRequest, RequestOptions.DEFAULT);
scrollId = searchResponse.getScrollId();
searchHits = searchResponse.getHits().getHits();
} //end of while loop
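ClearScrollRequest clear = new ClearScrollRequest();
clear.addScrollId(scrollId);
client.clearScroll(clear, RequestOptions.DEFAULT); // actually send the request so the search context is released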
} //end of for loop at the top
I should get 115 million documents in total, but more than 2 million are missing. I have checked my code repeatedly but cannot see what I am missing.
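A side note on date bounds in range queries: if a bound is written as a bare literal such as 01-05-2021 rather than a quoted string, Java evaluates it as integer subtraction (01 and 05 are octal literals), so the query receives a number instead of a date. The sketch below only illustrates that arithmetic and one way to produce the dd-MM-yyyy strings; the class and method names are hypothetical, and whether this relates to the missing documents is not established here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateBoundSketch {
    // Hypothetical helper: formats a date as a dd-MM-yyyy string,
    // the shape used for the bounds in the question.
    static String bound(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .format(DateTimeFormatter.ofPattern("dd-MM-yyyy"));
    }

    // What the unquoted literal evaluates to: 01 and 05 are octal
    // integer literals, so this is simply 1 - 5 - 2021.
    static int unquotedLiteral() {
        return 01-05-2021;
    }

    public static void main(String[] args) {
        System.out.println(unquotedLiteral());
        System.out.println(bound(2021, 5, 1));
        System.out.println(bound(2021, 5, 31));
    }
}
```

Passing the formatted strings to gte/lte, together with a matching format on the range query, keeps the bounds as dates rather than integers.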
We started using the Elasticsearch high level client recently, and we use the Scroll API to fetch large sets of data from ES. We see a recurring pattern of high CPU utilization that repeats every 30 minutes, and we have no clue what is going on. We also see an exception in the Elasticsearch logs:
[2021-05-12T04:19:29,516][DEBUG][o.e.a.s.TransportSearchScrollAction] [node-2] [93486247] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [node-3][10.160.86.222:7550][indices:data/read/search[phase/query/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [93486247]
at org.elasticsearch.search.SearchService.getExecutor(SearchService.java:496) ~[elasticsearch-6.8.9.jar:6.8.9]
at org.elasticsearch.search.SearchService.runAsync(SearchService.java:373) ~[elasticsearch-6.8.9.jar:6.8.9]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:435) ~[elasticsearch-6.8.9.jar:6.8.9]
at org.elasticsearch.action.search.SearchTransportService$8.messageReceived(SearchTransportService.java:376) ~[elasticsearch-6.8.9.jar:6.8.9]
at org.elasticsearch.action.search.SearchTransportService$8.messageReceived(SearchTransportService.java:373) ~[elasticsearch-6.8.9.jar:6.8.9]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:250) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.9.jar:6.8.9]
The high level client code we use is the usual code given in the official documentation:
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest(INDEX_NAME);
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if (StringUtils.isNotBlank(keyword)) {
LOG.info("Searching for keyword: {}", keyword);
boolQueryBuilder.must(QueryBuilders.multiMatchQuery(keyword, INDEXED_FIELDS));
}
if(StringUtils.isNotBlank(param1)) {
boolQueryBuilder.filter(QueryBuilders.termQuery("param1", param1));
}
if(Objects.nonNull(param1)) {
boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
}
if(Objects.nonNull(param1)) {
boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
}
if(Objects.nonNull(param1)) {
boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
}
if(Objects.nonNull(param1)) {
boolQueryBuilder.filter(QueryBuilders.termsQuery("param1", param1));
}
searchSourceBuilder.query(boolQueryBuilder);
searchRequest.source(searchSourceBuilder);
List<Object1> statuses = new ArrayList<>();
try {
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
SearchHit[] searchHits = searchResponse.getHits().getHits();
while (searchHits != null && searchHits.length > 0) {
for (SearchHit hit : searchHits) {
Object1 agent = JsonUtil.parseJson(hit.getSourceAsString(),
Object1.class);
statuses.add(agent);
}
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(scroll);
searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
scrollId = searchResponse.getScrollId();
searchHits = searchResponse.getHits().getHits();
}
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
boolean succeeded = clearScrollResponse.isSucceeded();
I am using the Scroll API to get more than 10,000 documents from our Elasticsearch cluster. However, whenever the code tries to query past 10k, I get the error below:
Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
This is my code:
try {
// 1. Build Search Request
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest(eventId);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(queryBuilder);
searchSourceBuilder.size(limit);
searchSourceBuilder.profile(true); // used to profile the execution of queries and aggregations for a specific search
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take
if(CollectionUtils.isNotEmpty(sortBy)){
for (int i = 0; i < sortBy.size(); i++) {
String sortByField = sortBy.get(i);
String orderByField = orderBy.get(i < orderBy.size() ? i : orderBy.size() - 1);
SortOrder sortOrder = (orderByField != null && orderByField.trim().equalsIgnoreCase("asc")) ? SortOrder.ASC : SortOrder.DESC;
if(keywordFields.contains(sortByField)) {
sortByField = sortByField + ".keyword";
} else if(rawFields.contains(sortByField)) {
sortByField = sortByField + ".raw";
}
searchSourceBuilder.sort(new FieldSortBuilder(sortByField).order(sortOrder));
}
}
searchSourceBuilder.sort(new FieldSortBuilder("_id").order(SortOrder.ASC));
if (includes != null) {
String[] excludes = {""};
searchSourceBuilder.fetchSource(includes, excludes);
}
if (CollectionUtils.isNotEmpty(aggregations)) {
aggregations.forEach(searchSourceBuilder::aggregation);
}
searchRequest.scroll(scroll);
searchRequest.source(searchSourceBuilder);
SearchResponse resp = null;
try {
resp = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = resp.getScrollId();
SearchHit[] searchHits = resp.getHits().getHits();
// Pagination - will continue to call ES until there are no more pages
while(searchHits != null && searchHits.length > 0){
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(scroll);
resp = client.scroll(scrollRequest, RequestOptions.DEFAULT);
scrollId = resp.getScrollId();
searchHits = resp.getHits().getHits();
}
// Clear scroll request to release the search context
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
} catch (Exception e) {
String msg = "Could not get search result. Exception=" + ExceptionUtilsEx.getExceptionInformation(e);
throw new Exception(msg);
I am implementing the solution from this link: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search-scroll.html
Can anyone tell me what I am doing wrong and what I need to do to get past 10,000 documents with the Scroll API?
If processing one batch takes longer than the scroll keep-alive, the search context expires before the next scroll request arrives. Increase the scroll time in this line so that the scroll context doesn't disappear after 1 minute:
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L));
And remove this one:
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take
I want to search for all nearby locations, given a latitude/longitude pair, in my Elasticsearch index, which stores latitude and longitude in two separate fields.
How can I do this?
You will have to use geoDistanceQuery. Please find a snippet of code below (written in Java, using the Elasticsearch REST High Level Client).
FYI : a complete tutorial is available on my website : www.ictdynamic.be -> ElasticSearch 6 – Spatial Queries with RestHighLevelClient and Java – Part 1: geoDistanceQuery
public Set<?> geoDistanceQuery(String index, String nameGeoPointField, double lat, double lon, double distance, EsQuery esQuery) throws IOException {
Date startDate = new Date();
Set<Object> objectsWithinDistance = new LinkedHashSet<>();
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
QueryBuilder geoDistanceQueryBuilder = QueryBuilders
.geoDistanceQuery(nameGeoPointField)
.point(lat, lon)
.distance(distance, DistanceUnit.KILOMETERS);
BoolQueryBuilder boolQuery = getBooleanQueryWithConditions(esQuery);
QueryBuilder completeQuery = QueryBuilders
.boolQuery()
.must(boolQuery)
.filter(geoDistanceQueryBuilder);
sourceBuilder.query(completeQuery).size(SIZE_ES_QUERY);
SearchRequest searchRequest = new SearchRequest(index)
.source(sourceBuilder.sort(SortBuilders.geoDistanceSort(nameGeoPointField, lat, lon)
.order(SortOrder.ASC)
.unit(DistanceUnit.KILOMETERS)));
SearchResponse searchResponse = restClient.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
for (SearchHit hit : hits.getHits()) {
objectsWithinDistance.add(GeoService.getObjectFrom_ES_Hit(hit, nameGeoPointField));
}
return timedReturn(LOGGER, new Object() {}.getClass().getEnclosingMethod().getName(), startDate.getTime(), objectsWithinDistance);
}
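For intuition about what a distance filter computes, here is a minimal plain-Java sketch of the haversine great-circle distance. It is not what Elasticsearch runs internally; the class name, method names and the 6371 km mean Earth radius are assumptions made for illustration. Note that geoDistanceQuery operates on a single geo_point field, so latitude and longitude stored as two separate numeric fields would most likely need to be mapped into one geo_point first; a check like this one could serve as a client-side stand-in for small result sets in the meantime.

```java
final class HaversineSketch {
    // Mean Earth radius in kilometres (an approximation).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon pairs given in degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when the second point lies within maxKm of the first.
    static boolean withinKm(double lat1, double lon1, double lat2, double lon2, double maxKm) {
        return distanceKm(lat1, lon1, lat2, lon2) <= maxKm;
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator.
        System.out.println(distanceKm(0.0, 0.0, 0.0, 1.0));
    }
}
```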
I have the following:
final duration = (jsonBuilder()
.startObject()
.field('start', new DateTime(testResult.startTime, dateTimeZone))
.field('end', new DateTime(testResult.endTime, dateTimeZone))
.endObject())
client.prepareIndex('builds', 'test')
.setSource(jsonBuilder()
.startObject()
.field("duration", duration)
.endObject())
SearchResponse searchResponse = client.prepareSearch('builds')
.setQuery(boolQuery()
.must(termQuery('_type', 'test')))
.execute()
.actionGet()
final source = searchResponse.hits.hits[0].source as Map<String, Object>
How do I retrieve the values of duration.start and duration.end from here?
Try 1..!
SearchHit[] searchHits = searchResponse.getHits().getHits();
Map<String, Object> s = searchHits[0].sourceAsMap();
// dates come back from _source as ISO-8601 strings, not as Date objects
Map<String, Object> duration = (Map<String, Object>) s.get("duration");
String start = (String) duration.get("start");
String end = (String) duration.get("end");
Try 2..!
SearchHit[] searchHits = searchResponse.getHits().getHits();
StringBuilder builder = new StringBuilder();
int length = searchHits.length;
builder.append("[");
for (int i = 0; i < length; i++) {
if (i == length - 1) {
builder.append(searchHits[i].getSourceAsString());
} else {
builder.append(searchHits[i].getSourceAsString());
builder.append(",");
}
}
builder.append("]");
String result= builder.toString();
This returns a string containing a valid JSON array. Parse it with a JSON parser and read the values as you would from any other JSON document.
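The joining loop in Try 2 can be written more compactly with java.util.StringJoiner, which handles the separators and the surrounding brackets. This is a sketch only: the class and method names are hypothetical, and plain strings stand in for the values that getSourceAsString() would return for each hit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

final class JsonArrayJoinSketch {
    // Joins already-serialized JSON documents into one JSON array string.
    // An empty list yields "[]".
    static String toJsonArray(List<String> jsonDocs) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (String doc : jsonDocs) {
            joiner.add(doc);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for the source strings of two search hits.
        System.out.println(toJsonArray(Arrays.asList("{\"a\":1}", "{\"b\":2}")));
    }
}
```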
The problem is that field() doesn't recognize XContentBuilder as a value despite what http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/index_.html implies. From the source code for XContentBuilder, it's unclear to me how to use field with XContentBuilder.
It's easy enough to use a Map as a value, though.
final duration = [
'start': new DateTime(testResult.startTime, dateTimeZone),
'end': new DateTime(testResult.endTime, dateTimeZone)]
client.prepareIndex('builds', 'test')
.setSource(jsonBuilder()
.startObject()
.field("duration", duration)
.endObject())
SearchResponse searchResponse = client.prepareSearch('builds')
.setQuery(boolQuery()
.must(termQuery('_type', 'test')))
.execute()
.actionGet()
final source = searchResponse.hits.hits[0].source
assertThat(source.duration.start, equalTo('1970-01-01T00:00:00.001Z'))
assertThat(source.duration.end, equalTo('1970-01-01T00:00:00.002Z'))
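Since the values come back as ISO-8601 strings, as the assertions above show, they can be converted to java.time.Instant when real timestamps are needed. The following plain-Java sketch walks a nested source map of that shape; the class and method names are hypothetical, and a hand-built map stands in for the hit's source.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class DurationFieldsSketch {
    // Reads source[parent][child] and parses it as an ISO-8601 instant.
    @SuppressWarnings("unchecked")
    static Instant nestedInstant(Map<String, Object> source, String parent, String child) {
        Map<String, Object> nested = (Map<String, Object>) source.get(parent);
        return Instant.parse((String) nested.get(child));
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the _source of one hit.
        Map<String, Object> duration = new HashMap<>();
        duration.put("start", "1970-01-01T00:00:00.001Z");
        duration.put("end", "1970-01-01T00:00:00.002Z");
        Map<String, Object> source = new HashMap<>();
        source.put("duration", duration);

        System.out.println(nestedInstant(source, "duration", "start").toEpochMilli());
        System.out.println(nestedInstant(source, "duration", "end").toEpochMilli());
    }
}
```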