Elasticsearch indexing with BulkRequestBuilder slowing down - performance

Hi all Elasticsearch masters.
I have millions of records to index via the Elasticsearch Java API.
My Elasticsearch cluster has three nodes (1 master + 2 data nodes).
My code snippet is below.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "MyClusterName").build();
TransportClient client = new TransportClient(settings);
String hostname = "myhost ip";
int port = 9300;
client.addTransportAddress(new InetSocketTransportAddress(hostname, port));

BulkRequestBuilder bulkBuilder = client.prepareBulk();
BufferedReader br = new BufferedReader(new InputStreamReader(new DataInputStream(new FileInputStream("my_file_path"))));

long bulkBuilderLength = 0;
String readLine = "";
String index = "my_index_name";
String type = "my_type_name";
String id = "";

while ((readLine = br.readLine()) != null) {
    id = somefunction(readLine);
    String json = new ObjectMapper().writeValueAsString(readLine);
    bulkBuilder.add(client.prepareIndex(index, type, id)
            .setSource(json));
    bulkBuilderLength++;
    if (bulkBuilderLength % 1000 == 0) {
        logger.info("##### " + bulkBuilderLength + " data indexed.");
        BulkResponse bulkRes = bulkBuilder.execute().actionGet();
        if (bulkRes.hasFailures()) {
            logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
        }
    }
}
br.close();

if (bulkBuilder.numberOfActions() > 0) {
    logger.info("##### " + bulkBuilderLength + " data indexed.");
    BulkResponse bulkRes = bulkBuilder.execute().actionGet();
    if (bulkRes.hasFailures()) {
        logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
    }
    bulkBuilder = client.prepareBulk();
}
It works fine at first, but performance slows down rapidly after a few thousand documents.
I've already tried changing the index settings "refresh_interval" to -1 and "number_of_replicas" to 0.
However, the performance keeps degrading in the same way.
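For reference, this is roughly how I applied those settings through the Java admin API; treat it as a sketch, the index name is the same placeholder used in the snippet above.

// Sketch: setting refresh_interval and number_of_replicas before the bulk load.
client.admin().indices().prepareUpdateSettings("my_index_name")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.refresh_interval", "-1")   // no periodic refresh during the bulk load
                .put("index.number_of_replicas", 0)    // no replicas while indexing
                .build())
        .execute().actionGet();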
If I monitor the cluster status using bigdesk, the GC value reaches 1 every second, as in the screenshot below.
Can anyone help me?
Thanks in advance.
=================== UPDATED ===========================
Finally, I've solved this problem. (See the answer).
The cause of the problem is that I forgot to recreate a new BulkRequestBuilder after each bulk execution.
No performance degradation occurred after I changed my code snippet as below.
Thank you very much.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "MyClusterName").build();
TransportClient client = new TransportClient(settings);
String hostname = "myhost ip";
int port = 9300;
client.addTransportAddress(new InetSocketTransportAddress(hostname, port));

BulkRequestBuilder bulkBuilder = client.prepareBulk();
BufferedReader br = new BufferedReader(new InputStreamReader(new DataInputStream(new FileInputStream("my_file_path"))));

long bulkBuilderLength = 0;
String readLine = "";
String index = "my_index_name";
String type = "my_type_name";
String id = "";

while ((readLine = br.readLine()) != null) {
    id = somefunction(readLine);
    String json = new ObjectMapper().writeValueAsString(readLine);
    bulkBuilder.add(client.prepareIndex(index, type, id)
            .setSource(json));
    bulkBuilderLength++;
    if (bulkBuilderLength % 1000 == 0) {
        logger.info("##### " + bulkBuilderLength + " data indexed.");
        BulkResponse bulkRes = bulkBuilder.execute().actionGet();
        if (bulkRes.hasFailures()) {
            logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
        }
        bulkBuilder = client.prepareBulk(); // This line is my mistake and the solution !!!
    }
}
br.close();

if (bulkBuilder.numberOfActions() > 0) {
    logger.info("##### " + bulkBuilderLength + " data indexed.");
    BulkResponse bulkRes = bulkBuilder.execute().actionGet();
    if (bulkRes.hasFailures()) {
        logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
    }
    bulkBuilder = client.prepareBulk();
}

The problem here is that you don't create a new bulk request after executing the previous one.
That means you keep re-sending the same accumulated documents again and again, so each bulk request grows larger and larger.
BTW, look at the BulkProcessor class. It is definitely better to use.
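A rough sketch of what that could look like with the same 1.x transport client; it reuses client, br, logger and the other variables from your snippet, and the thresholds are just illustrative:

// Sketch: let BulkProcessor handle batching and flushing instead of counting documents yourself.
BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        logger.info("##### executing bulk of " + request.numberOfActions() + " actions");
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        if (response.hasFailures()) {
            logger.error("##### Bulk failure: " + response.buildFailureMessage());
        }
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        logger.error("##### Bulk failed entirely", failure);
    }
})
        .setBulkActions(1000)        // flush every 1000 documents
        .setConcurrentRequests(1)    // allow one bulk in flight while the next one is built
        .build();

ObjectMapper mapper = new ObjectMapper();
while ((readLine = br.readLine()) != null) {
    id = somefunction(readLine);
    bulkProcessor.add(new IndexRequest(index, type, id).source(mapper.writeValueAsString(readLine)));
}
br.close();
bulkProcessor.close();  // flushes any remaining documents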

Related

ElasticSearch Scroll API not going past 10000 limit

I am using the Scroll API to get more than 10,000 documents from our Elasticsearch; however, whenever the code tries to query past 10k, I get the error below:
Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
This is my code:
try {
// 1. Build Search Request
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest(eventId);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(queryBuilder);
searchSourceBuilder.size(limit);
searchSourceBuilder.profile(true); // used to profile the execution of queries and aggregations for a specific search
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take
if(CollectionUtils.isNotEmpty(sortBy)){
for (int i = 0; i < sortBy.size(); i++) {
String sortByField = sortBy.get(i);
String orderByField = orderBy.get(i < orderBy.size() ? i : orderBy.size() - 1);
SortOrder sortOrder = (orderByField != null && orderByField.trim().equalsIgnoreCase("asc")) ? SortOrder.ASC : SortOrder.DESC;
if(keywordFields.contains(sortByField)) {
sortByField = sortByField + ".keyword";
} else if(rawFields.contains(sortByField)) {
sortByField = sortByField + ".raw";
}
searchSourceBuilder.sort(new FieldSortBuilder(sortByField).order(sortOrder));
}
}
searchSourceBuilder.sort(new FieldSortBuilder("_id").order(SortOrder.ASC));
if (includes != null) {
String[] excludes = {""};
searchSourceBuilder.fetchSource(includes, excludes);
}
if (CollectionUtils.isNotEmpty(aggregations)) {
aggregations.forEach(searchSourceBuilder::aggregation);
}
searchRequest.scroll(scroll);
searchRequest.source(searchSourceBuilder);
SearchResponse resp = null;
try {
resp = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = resp.getScrollId();
SearchHit[] searchHits = resp.getHits().getHits();
// Pagination - will continue to call ES until there are no more pages
while(searchHits != null && searchHits.length > 0){
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(scroll);
resp = client.scroll(scrollRequest, RequestOptions.DEFAULT);
scrollId = resp.getScrollId();
searchHits = resp.getHits().getHits();
}
// Clear scroll request to release the search context
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
} catch (Exception e) {
String msg = "Could not get search result. Exception=" + ExceptionUtilsEx.getExceptionInformation(e);
throw new Exception(msg);
}
I am implementing the solution from this link: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-search-scroll.html
Can anyone tell me what I am doing wrong and what I need to do to get past 10,000 with the scroll api?
If processing one batch of hits takes longer than the scroll keep-alive, the scroll context expires before the next scroll request. Change this line to make sure the scroll context doesn't disappear after only 1 minute:
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L));
And remove this one:
searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // optional parameter that controls how long the search is allowed to take

Why are Elasticsearch Scan/Scroll results so different?

I'm performing a scan/scroll to remap an index in my cluster (v2.4.3) and I'm having trouble understanding the results. In the head plugin my original index has this size/doc count:
size: 1.74Gi (3.49Gi)
docs: 708,108 (1,416,216)
If I perform a _reindex command on this index I get a new index with the same number of docs and the same size.
But if I perform a scan/scroll to copy the index I end up with many more records in my new index. I'm in the middle of the process right now and here is the current state of the new index:
size: 1.81Gi (3.61Gi)
docs: 6,492,180 (12,981,180)
Why are there so many more documents in the new index versus the old one? The mapping file declares 13 nested objects but I did not change the number of nested objects between the two indices.
Here is my scan/scroll code:
SearchResponse response = client.prepareSearch("nas")
.addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
.setScroll(new TimeValue(120000))
.setQuery(matchAllQuery())
.setSize(pageable.getPageSize()).execute().actionGet();
while (true) {
if (response.getHits().getHits().length <= 0) break; // break out of the loop when no more hits are returned
long startTime = System.currentTimeMillis();
List<IndexQuery> indexQueries = new ArrayList<>();
Arrays.stream(response.getHits().getHits()).forEach(hit -> {
NasProduct nasProduct = null;
try {
nasProduct = objectMapper.readValue(hit.getSourceAsString(), NasProduct.class);
} catch (IOException e) {
logger.error("Problem parsing nasProductJson json: || " + hit.getSourceAsString() + " ||", e);
}
if (nasProduct != null) {
IndexQuery indexQuery = new IndexQueryBuilder()
.withObject(nasProduct)
.withId(nasProduct.getProductKey())
.withIndexName(name)
.withType("product")
.build();
indexQueries.add(indexQuery);
}
});
elasticsearchTemplate.bulkIndex(indexQueries);
logger.info("Index updated update count: " + indexQueries.size() + " duration: " + (System.currentTimeMillis() - startTime) + " ms");
response = client.prepareSearchScroll(response.getScrollId())
.setScroll(new TimeValue(120000))
.execute().actionGet();
}

How to set Encoding in MQ headers using Beanshell in Jmeter

I am developing a test script to put a message onto a queue using IBM MQ API 8.0. I am using JMeter 3.1 and Beanshell Sampler for this (see code below).
The problem I am having is setting the "Encoding" field in the MQ headers. I've tried different methods as per API documentation, but nothing worked for me.
Has anyone faced this issue?
Thanks in advance!
Code below:
try {
MQEnvironment.hostname = _hostname;
MQEnvironment.channel = _channel;
MQEnvironment.port = _port;
MQEnvironment.userID = "";
MQEnvironment.password = "";
log.info("Using queue manager: " + _qMgr);
MQQueueManager _queueManager = new MQQueueManager(_qMgr);
int openOptions = CMQC.MQOO_OUTPUT + CMQC.MQOO_FAIL_IF_QUIESCING + CMQC.MQOO_INQUIRE + CMQC.MQOO_BROWSE
+ CMQC.MQOO_SET_IDENTITY_CONTEXT;
log.info("Using queue: " + _queueName + ", openOptions: " + openOptions);
MQQueue queue = _queueManager.accessQueue(_queueName, openOptions);
log.info("Building message...");
MQMessage sendmsg = new MQMessage();
sendmsg.clearMessage();
// Set MQ MD Headers
sendmsg.messageType = CMQC.MQMT_DATAGRAM;
sendmsg.replyToQueueName = _queueName;
sendmsg.replyToQueueManagerName = _qMgr;
sendmsg.userId = MQuserId;
sendmsg.setStringProperty("BAH_FR", fromBIC); // from /AppHdr/Fr/FIId/FinInstnId/BICFI
sendmsg.setStringProperty("BAH_TO", toBIC); // from /AppHdr/To/FIId/FinInstnId/BICFI
sendmsg.setStringProperty("BAH_MSGDEFIDR", "pacs.008.001.05"); // from /AppHdr/MsgDefIdr
sendmsg.setStringProperty("BAH_BIZSVC", "cus.clear.01-" + bizSvc); // from /AppHdr/BizSvcr
sendmsg.setStringProperty("BAH_PRTY", "NORM"); // priority
sendmsg.setStringProperty("userId", MQuserId); // user Id
sendmsg.setStringProperty("ConnectorId", connectorId);
sendmsg.setStringProperty("Roles", roleId);
MQPutMessageOptions pmo = new MQPutMessageOptions(); // accept the defaults, same as MQPMO_DEFAULT constant
pmo.options = CMQC.MQOO_SET_IDENTITY_CONTEXT; // set identity context by userId
// Build message
String msg = "<NS1> .... </NS1>";
// MQRFH2 Headers
sendmsg.format = CMQC.MQFMT_STRING;
//sendmsg.encoding = CMQC.MQENC_INTEGER_NORMAL | CMQC.MQENC_DECIMAL_NORMAL | CMQC.MQENC_FLOAT_IEEE_NORMAL;
sendmsg.encoding = 546; // encoding - 546 Windows/Linux
sendmsg.messageId = msgID.getBytes();
sendmsg.correlationId = CMQC.MQCI_NONE;
sendmsg.writeString(msg);
String messageIdBefore = new String(sendmsg.messageId, "UTF-8");
log.info("Before put, messageId=[" + messageIdBefore + "]");
int depthBefore = queue.getCurrentDepth();
log.info("Queue Depth=" + depthBefore);
log.info("Putting message on " + _queueName + ".... ");
queue.put(sendmsg, pmo);
int depthAfter = queue.getCurrentDepth();
log.info("Queue Depth=" + depthAfter);
log.info("**** Done");
String messageIdAfter = new String(sendmsg.messageId, "UTF-8");
log.info("After put, messageId=[" + messageIdAfter + "]");
log.info("Closing connection...");
} catch (Exception e) {
log.info("\\nFAILURE - Exception\\n");
StringWriter errors = new StringWriter();
e.printStackTrace(new PrintWriter(errors));
log.error(errors.toString());
}
I think you are overthinking the problem. If you are not doing some sort of weird manual character/data conversion, then you should be using:
sendmsg.encoding = MQC.MQENC_NATIVE;
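For context, a minimal sketch of that change applied to the message setup from the question, assuming the CMQC constants class used there also exposes MQENC_NATIVE (it is the same constant as MQC.MQENC_NATIVE):

// Sketch: let the MQ classes pick the platform-native numeric encoding instead of hard-coding 546.
sendmsg.format = CMQC.MQFMT_STRING;
sendmsg.encoding = CMQC.MQENC_NATIVE;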

Issue in reading google text document

I was able to get a handle to the Google text document I needed. I am now stuck on how to read its contents.
My code looks like:
GoogleOAuthParameters oauthParameters = new GoogleOAuthParameters();
oauthParameters.setOAuthConsumerKey(Constants.CONSUMER_KEY);
oauthParameters.setOAuthConsumerSecret(Constants.CONSUMER_SECRET);
oauthParameters.setOAuthToken(Constants.ACCESS_TOKEN);
oauthParameters.setOAuthTokenSecret(Constants.ACCESS_TOKEN_SECRET);
DocsService client = new DocsService("sakshum-YourAppName-v1");
client.setOAuthCredentials(oauthParameters, new OAuthHmacSha1Signer());
URL feedUrl = new URL("https://docs.google.com/feeds/default/private/full/");
DocumentQuery dquery = new DocumentQuery(feedUrl);
dquery.setTitleQuery("blood_donor_verification_template_dev");
dquery.setTitleExact(true);
dquery.setMaxResults(10);
DocumentListFeed resultFeed = client.getFeed(dquery, DocumentListFeed.class);
System.out.println("feed size:" + resultFeed.getEntries().size());
String emailBody = "";
for (DocumentListEntry entry : resultFeed.getEntries()) {
System.out.println(entry.getPlainTextContent());
emailBody = entry.getPlainTextContent();
}
Please note that entry.getPlainTextContent() does not work; it throws an exception saying the object is not of TextContent type.
Finally, I solved it as follows:
StringBuilder emailBody = new StringBuilder(); // append() below needs a StringBuilder, not the plain String used earlier
for (DocumentListEntry entry : resultFeed.getEntries()) {
String docId = entry.getDocId();
String docType = entry.getType();
URL exportUrl =
new URL("https://docs.google.com/feeds/download/" + docType
+ "s/Export?docID=" + docId + "&exportFormat=html");
MediaContent mc = new MediaContent();
mc.setUri(exportUrl.toString());
MediaSource ms = client.getMedia(mc);
InputStream inStream = null;
try {
inStream = ms.getInputStream();
int c;
while ((c = inStream.read()) != -1) {
emailBody.append((char)c);
}
} finally {
if (inStream != null) {
inStream.close();
}
}
}

ORA-01460: unimplemented or unreasonable

I am trying to run this query against an Oracle database, but unfortunately I receive this error. Please help me :(
java.sql.SQLException: ORA-01460: unimplemented or unreasonable conversion requested
Right now that problem is solved and I have another exception:
I changed this line
pstmt.setBinaryStream(7, fis, (int) file.length());
with
pstmt.setBinaryStream(7, fis, (long) file.length());
Exception in thread "AWT-EventQueue-0" java.lang.AbstractMethodError: oracle.jdbc.driver.OraclePreparedStatement.setBinaryStream(ILjava/io/InputStream;J)V
For a text file there is no issue, but when I try to upload a JPG file I receive this error.
PreparedStatement pstmt =
conn.prepareStatement("INSERT INTO PM_OBJECT_TABLE( " +
"N_ACTIVITY_ID, V_NAME,N_SIZE, D_MODIFY,N_CATEGORY, N_NODE_ID ,O_OBJECT) " +
" VALUES ( ? , ? , ? , ? , ? , ? ,?)");
pstmt.setLong(1, N_ACTIVITY_ID);
pstmt.setString(2, file.getName());
pstmt.setLong(3, file.length());
java.util.Date date = new java.util.Date();
java.sql.Date sqlDate = new java.sql.Date(date.getTime());
pstmt.setDate(4,sqlDate);
pstmt.setInt(5, N_CATEGORY);
pstmt.setLong(6, N_NODE_ID);
pstmt.setBinaryStream(7, fis, (int) file.length());
pstmt.executeUpdate();
java.lang.AbstractMethodError: com.mysql.jdbc.ServerPreparedStatement.setBinaryStream(ILjava/io/InputStream;J)V
To fix this problem you need to change the call to setBinaryStream so the last parameter is passed as an integer instead of a long.
I found that quote in a blog while facing the same problem.
As it says, PreparedStatement.setBinaryStream() has three overloads, and you should use setBinaryStream(columnIndex, InputStream, int) with an int length; otherwise you may get this error.
I also experienced this issue with code that was working and then I got this error suddenly.
I am running Netbeans 8.0.2 with Glassfish 3
In the GlassFish\Glassfish\libs folder I had two ojdbc files: ojdbc6.jar and ojdbc14.jar.
It seems that even though ojdbc6 was included in the project libraries, ojdbc14 was also being loaded.
I stopped GlassFish, renamed ojdbc14.jar to ojdbc14.jar.bak, then did a clean and build and redeployed the project.
Problem fixed.
I solved my problem using one of the previous suggestions:
public String insertBineryToDB(long N_ACTIVITY_ID,int N_CATEGORY,long N_NODE_ID ,FileInputStream fis , java.io.File file) {
Statement statement;
try {
//conn.close();
// N_ACTIVITY_ID, V_NAME,N_SIZE, D_MODIFY,N_CATEGORY, N_NODE_ID ,O_OBJECT
PreparedStatement pstmt =
conn.prepareStatement("INSERT INTO PM_OBJECT_TABLE( " +
"N_ACTIVITY_ID, V_NAME,N_SIZE, D_MODIFY,N_CATEGORY, N_NODE_ID ,O_OBJECT) " +
" VALUES ( ? , ? , ? , ? , ? , ? ,empty_blob())");
InputStream bodyIn = fis;
pstmt.setLong(1, N_ACTIVITY_ID);
pstmt.setString(2, file.getName());
pstmt.setLong(3, file.length());
java.util.Date date = new java.util.Date();
java.sql.Date sqlDate = new java.sql.Date(date.getTime());
pstmt.setDate(4,sqlDate);
pstmt.setInt(5, N_CATEGORY);
pstmt.setLong(6, N_NODE_ID);
//pstmt.setBinaryStream(7, bodyIn,(int) file.length());
pstmt.executeUpdate();
conn.commit();
PreparedStatement stmt2 = conn.prepareStatement(" select O_OBJECT from PM_OBJECT_TABLE where N_ACTIVITY_ID = ? for update ");
stmt2.setLong(1, N_ACTIVITY_ID);
ResultSet rset = stmt2.executeQuery();
FileInputStream inputFileInputStream = new FileInputStream(file);
rset.next();
BLOB image = ((OracleResultSet) rset).getBLOB("O_OBJECT");
int bufferSize;
byte[] byteBuffer;
int bytesRead = 0;
int bytesWritten = 0;
int totBytesRead = 0;
int totBytesWritten = 0;
bufferSize = image.getBufferSize();
byteBuffer = new byte[bufferSize];
OutputStream blobOutputStream = image.getBinaryOutputStream();
while ((bytesRead = inputFileInputStream.read(byteBuffer)) != -1) {
// After reading a buffer from the binary file, write the contents
// of the buffer to the output stream using the write()
// method.
blobOutputStream.write(byteBuffer, 0, bytesRead);
totBytesRead += bytesRead;
totBytesWritten += bytesRead;
}
inputFileInputStream.close();
blobOutputStream.close();
conn.commit();
rset.close();
stmt2.close();
String output = "Wrote file " + file.getName() + " to BLOB column." +
totBytesRead + " bytes read." +
totBytesWritten + " bytes written.\n";
return output;
} catch (Exception e) {
e.printStackTrace();
return "Wrote file " + file.getName() + " to BLOB column failed." ;
}
}
Use java.sql.PreparedStatement.setBinaryStream(int parameterIndex, InputStream x) -- 2 parameters, not 3.
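Applied to the insert from the question, that would look roughly like this (same variable names as above):

// Sketch: the two-argument overload lets the driver read the stream to its end,
// so no length argument (and no int/long cast) is needed.
pstmt.setBinaryStream(7, fis);

Note that this overload is part of JDBC 4.0, so the driver on the classpath has to implement it; the AbstractMethodError above suggests an older, pre-JDBC-4 driver (such as ojdbc14) was being picked up, which is consistent with the GlassFish answer.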
