Silent failures when indexing in Elasticsearch

I'm using Elasticsearch 6.4. We index about 100M documents with a Node.js loader using client version 15.2.0.
The results are odd, because after every indexing run we get a different number of documents.
The code builds a batch and, once it reaches a certain size, loads it into Elasticsearch with the bulk API. To improve performance we disable the refresh. If the bulk is rejected, we wait 20 seconds and try again.
We also check response.errors (true/false), assuming that response.errors = false means there are no failures.
Here is the code:
if (i % options.batchSize === 0) {
    var previous_start = new Date();
    //sleep.msleep(options.slowdown);

    async.waterfall([
        function (callback) {
            client.bulk(
                {
                    refresh: "false", // we do refresh only at the end
                    //requestTimeout: 200000,
                    body: batch
                },
                function (err, resp) {
                    if (err) {
                        console.log(err.message);
                        throw err;
                    } else if (resp.errors) {
                        console.log('Bulk is rejected... let\'s meditate');
                        // let's just wait and re-send the bulk request with increased
                        // timeout to be on the safe side
                        console.log("Waiting for 20 seconds");
                        sleep.msleep(20000); // -> this is blocking... time for elasticsearch to do whatever it does
                                             // and be in a better mood to accept this bulk
                        client.bulk(
                            {
                                refresh: "false",
                                //requestTimeout: 200000,
                                body: batch
                            },
                            function (err, resp) {
                                if (err) {
                                    console.log(err.message);
                                    throw err;
                                } else if (resp.errors) {
                                    console.log(resp);
                                    throw resp;
                                    // alternative would be to block again and resend
                                }
                                console.log("bulk is finally ingested...");
                                let theEnd = new Date();
                                return callback(null, theEnd);
                            });
                    } else {
                        let theEnd = new Date();
                        return callback(null, theEnd);
                    }
                });
        },
        function (end, callback) {
            let total_time = (end - start) / 1000;
            let intermediate_time = (end - previous_start) / 1000;
            indexed += options.batchSize;
            console.log('Loaded %s records in %d s (%d record/s)', indexed, total_time, options.batchSize / intermediate_time);
            return callback(null, total_time);
        }
    ],
    function (err, total_time) {
        if (err)
            console.log(err);
    });

    batch = [];
    i = 0;
}
});
It looks like we have some silent failures. Has anybody seen the same issue? Any suggestions?
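For reference, resp.errors is only a top-level flag meaning "at least one item failed"; the per-item outcome is reported in resp.items. A minimal sketch of surfacing those item errors, assuming the same callback-style client as above:
// sketch: log every item of a bulk response that carries an error
// (each item is keyed by its action type: index, create, update or delete)
function logBulkItemErrors(resp) {
    if (!resp || !resp.errors) return;
    resp.items.forEach(function (item) {
        var action = item.index || item.create || item.update || item.delete;
        if (action && action.error) {
            console.log('Failed doc %s (status %s):', action._id, action.status, action.error);
        }
    });
}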
Moreover, when calling http://localhost:9200/_cat/indices?v, I get strange values in the docs.deleted column. What does that column mean?
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open index_name Cqc2ABpRRs23P1DKlgaKJQ 5 0 96450728 340 24.8gb 24.8gb
sometimes this number changes during indexing, like:
green open index_name WsJPPQLcTuuiw37Vv0lfhA 5 0 21958048 6594 6.6gb 6.6gb
(then this number decreases)
I could not find any explanation for it... any help on this?
Thank you in advance

Do you create a new index each time, or do you reuse an existing one?
For the deleted docs, do you let ES generate the _id or do you force it?
What is your batchSize?
To index faster you could also turn off replicas during indexing (if you have any, of course):
settings => NumberOfReplicas(0)
index_buffer_size could be updated too.
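For example, a minimal sketch of dropping replicas with the same elasticsearch-js client (the index name is a placeholder; indices.memory.index_buffer_size is a node-level setting and would go in elasticsearch.yml instead):
// sketch: turn replicas off before the bulk load, restore them afterwards
client.indices.putSettings({
    index: 'index_name',
    body: { index: { number_of_replicas: 0 } }
}, function (err) {
    if (err) console.log(err.message);
    // ...run the bulk load, then put number_of_replicas back to its original value
});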

Related

Is it possible to keep executing a test if an element wasn't found with Cypress

I have a case where I need to wait for an element (an advert); if it's visible, the test needs to click it, but if the element isn't found after a timeout, the test needs to keep executing.
How can I handle this situation with Cypress?
The way Cypress says to check for a conditional element is Element existence
cy.get('body').then(($body) => {
    const modal = $body.find('modal')
    if (modal.length) {
        modal.click()
    }
})
Most likely you put that at the top of the test, and it runs too soon (there's no retry timeout).
You can add a wait of, say, 30 seconds, but then the test is delayed every time.
It's better to poll recursively:
const clickModal = (selector, attempt = 0) => {
    if (attempt === 100) return // whole 30 seconds is up, give up silently
    cy.get('body').then(($body) => {
        const modal = $body.find(selector)
        if (modal.length) {
            modal.click() // found it, click and stop polling
        } else {
            cy.wait(300) // wait in small chunks
            clickModal(selector, attempt + 1)
        }
    })
}

clickModal('modal')
Intercept the advert
Best of all, if you can find the URL for the advert in the network tab, use cy.intercept() to catch it and stub it out so the modal never displays.
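For example, a minimal sketch (the URL pattern is hypothetical; use whatever the advert request looks like in your network tab):
// stub the advert request so the modal never gets its content
cy.intercept('GET', '**/ads/**', { statusCode: 204, body: '' }).as('advert')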
I tried the above solution, but it seems that in some cases the $body parameter did not contain the necessary element, because it was not loaded yet when we invoked cy.get('body'). So I found another solution, using jQuery via Cypress; here it is:
let counter = 0;
const timeOut: number = Cypress.config('defaultCommandTimeout');

// busy-wait helper; note that this blocks the test runner thread
const sleep = (milliseconds) => {
    const date = Date.now();
    let currentDate = null;
    do {
        currentDate = Date.now();
    } while (currentDate - date < milliseconds);
};

// `element` (a selector) and `elementName` are assumed to be defined elsewhere in the test
while (true) {
    if (Cypress.$(element).length > 0 && Cypress.$(element).is(':visible')) {
        Cypress.$(element).click();
        break;
    } else {
        sleep(500);
        counter = counter + 500;
        if (counter >= timeOut) {
            cy.log(elementName + ' was not found after timeout');
            break;
        }
    }
}

How to prevent a memory leak when calling a promise in a loop

I seem to get an error
ActivityManager: Process nl.xxxx.yyyy(pid 21526) has
died: fore TOP (2411,292) ActivityManager: setHasOverlayUi called on
unknown pid: 21526
when I call the function below in a loop. After about 200 calls the error appears. It does not look like a timing problem, because even if I call the function with a timeout of 3 seconds (after the promise has returned) it dies after about 200 cycles. I tried putting all variables outside the function and setting the variables to null, but nothing seems to help. I have googled my fingers to the bone, but nothing came up... has anybody got a clue what I need to do?
function testrun(f)
{
    // initially called with f = 0
    if (f > 500) return; // limit test to 500 cycles
    console.log("fire :" + f); // show the cycle you're in
    getMyThumb("/storage/emulated/0/DCIM/Screenshots/Screenshot_20190710-092009_ScanApp.jpg")
        .then(thumb =>
        {
            console.log(thumb);
            setTimeout(() =>
            {
                testrun(f + 1);
            }, 5000); // tried setting the timeout from 100 ms to 5 seconds per cycle... all bug out at ca. 200 cycles
        });
    global.gc(); // tested with/without garbage collection
}
global.getMyThumb = function name(filepath)
{
    return new Promise((resolve, reject) =>
    {
        global.gc(); // tried with/without garbage collection here
        // note: imageSource, test, base64JPEG and img are assigned without var/let, so they become globals
        imageSource = imageSourceModule.fromFile(filepath);
        try
        {
            var mutable = BitmapFactory.makeMutable(imageSource);
            var ThumbBitmap = BitmapFactory.asBitmap(mutable).dispose((bmp) =>
            {
                var optisizestring = "25,25";
                test = bmp.resize(optisizestring);
                base64JPEG = test.toBase64(BitmapFactory.OutputFormat.JPEG, 75);
                img = imageSource.fromBase64(base64JPEG);
                resolve("data:image/png;base64," + base64JPEG);
                global.gc(); // tried with/without garbage collection here
            });
        }
        catch (ex) { console.log("errds " + ex); resolve(null); }
    });
}

How to retry failures with $q.all

I have some code that saves data using Breeze and reports progress over multiple saves; it is working reasonably well.
However, sometimes a save will time out, and I'd like to retry it once automatically. (Currently the user is shown an error and has to retry manually.)
I am struggling to find an appropriate way to do this, as I am confused by promises, so I'd appreciate some help.
Here is my code:
//I'm using Breeze, but because the save takes so long, I
//want to break the changes down into chunks and report progress
//as each chunk is saved....
var surveys = EntityQuery
    .from('PropertySurveys')
    .using(manager)
    .executeLocally();

var promises = [];
var fails = [];
var so = new SaveOptions({ allowConcurrentSaves: false });
var count = 0;

//...so I iterate through the surveys, creating a promise for each survey...
for (var i = 0, len = surveys.length; i < len; i++) {
    var query = EntityQuery.from('AnsweredQuestions')
        .where('PropertySurveyID', '==', surveys[i].ID)
        .expand('ActualAnswers');
    var graph = manager.getEntityGraph(query);
    var changes = graph.filter(function (entity) {
        return !entity.entityAspect.entityState.isUnchanged();
    });
    if (changes.length > 0) {
        promises.push(manager
            .saveChanges(changes, so)
            .then(function () {
                //reporting progress
                count++;
                logger.info('Uploaded ' + count + ' of ' + promises.length);
            },
            function () {
                //could I retry the fail here?
                fails.push(changes);
            }
        ));
    }
}

//....then I use $q.all to execute the promises
return $q.all(promises).then(function () {
    if (fails.length > 0) {
        //could I retry the fails here?
        saveFail();
    }
    else {
        saveSuccess();
    }
});
Edit
To clarify why I have been attempting this:
I have an http interceptor that sets a timeout on all http requests. When a request times out, the timeout is adjusted upwards, the user is displayed an error message, telling them they can retry with a longer wait if they wish.
Sending all the changes in one http request is looking like it could take several minutes, so I decided to break the changes down into several http requests, reporting progress as each request succeeds.
Now, some requests in the batch might timeout and some might not.
Then I had the bright idea that I would set a low timeout for the http request to start with and automatically increase it. But the batch is sent asynchronously with the same timeout setting and the time is adjusted for each failure. That is no good.
To solve this I wanted to move the timeout adjustment after the batch completes, then also retry all requests.
To be honest I'm not so sure an automatic timeout adjustment and retry is such a great idea in the first place. And even if it was, it would probably be better in a situation where http requests were made one after another - which I've also been looking at: https://stackoverflow.com/a/25730751/150342
Orchestrating retries downstream of $q.all() is possible but would be very messy indeed. It's far simpler to perform retries before aggregating the promises.
You could exploit closures and retry-counters, but it's cleaner to build a catch chain:
function retry(fn, n) {
    /*
     * Description: perform an arbitrary asynchronous function,
     * and, on error, retry up to n times.
     * Returns: promise
     */
    var p = fn(); // first try
    for (var i = 0; i < n; i++) {
        p = p.catch(function (error) {
            // possibly log error here to make it observable
            return fn(); // retry
        });
    }
    return p;
}
Now, amend your for loop:
use Function.prototype.bind() to define each save as a function with bound-in parameters.
pass that function to retry().
push the promise returned by retry().then(...) onto the promises array.
var query, graph, changes, saveFn;
for (var i = 0, len = surveys.length; i < len; i++) {
    query = ...; // as before
    graph = ...; // as before
    changes = ...; // as before
    if (changes.length > 0) {
        saveFn = manager.saveChanges.bind(manager, changes, so); // this is what needs to be tried/retried
        promises.push(retry(saveFn, 1).then(function () {
            // as before
        }, function () {
            // as before
        }));
    }
}
return $q.all(promises)... // as before
EDIT
It's not clear why you might want to retry downstream of $q.all(). If it's a matter of introducing some delay before retrying, the simplest way would be to do so within the pattern above.
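For example, a rough sketch of adding a delay inside that catch chain (assuming AngularJS's $timeout is injected alongside $q):
function retryWithDelay(fn, n, delayMs) {
    var p = fn(); // first try
    for (var i = 0; i < n; i++) {
        p = p.catch(function (error) {
            // $timeout returns a promise, so the retry stays chained after the delay
            return $timeout(angular.noop, delayMs).then(fn);
        });
    }
    return p;
}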
However, if retrying downstream of $q.all() is a firm requirement, here's a cleanish recursive solution that allows any number of retries, with minimal need for outer vars:
var surveys = //as before
var limit = 2;

function save(changes) {
    return manager.saveChanges(changes, so).then(function () {
        return true; // true signifies success
    }, function (error) {
        logger.error('Save Failed');
        return changes; // retry (subject to limit)
    });
}

function saveChanges(changes_array, tries) {
    tries = tries || 0;
    if (tries >= limit) {
        throw new Error('After ' + tries + ' tries, ' + changes_array.length + ' changes objects were still unsaved.');
    }
    if (changes_array.length > 0) {
        logger.info('Starting try number ' + (tries + 1) + ' comprising ' + changes_array.length + ' changes objects');
        return $q.all(changes_array.map(save)).then(function (results) {
            var successes = results.filter(function (item) { return item === true; });
            var failures = results.filter(function (item) { return item !== true; });
            logger.info('Uploaded ' + successes.length + ' of ' + changes_array.length);
            return saveChanges(failures, tries + 1); // recursive call
        });
    } else {
        return $q.when(); // return a resolved promise
    }
}
//using reduce to populate an array of changes
//the second parameter passed to the reduce method is the initial value
//for memo - in this case an empty array
var changes_array = surveys.reduce(function (memo, survey) {
    //memo is the return value from the previous call to the function
    var query = EntityQuery.from('AnsweredQuestions')
        .where('PropertySurveyID', '==', survey.ID)
        .expand('ActualAnswers');
    var graph = manager.getEntityGraph(query);
    var changes = graph.filter(function (entity) {
        return !entity.entityAspect.entityState.isUnchanged();
    });
    if (changes.length > 0) {
        memo.push(changes);
    }
    return memo;
}, []);

return saveChanges(changes_array).then(saveSuccess, saveFail);
Progress reporting is slightly different here. With a little more thought it could be made more like in your own answer.
This is a very rough idea of how to solve it.
var promises = [];
var LIMIT = 3; // 3 tries per promise

data.forEach(function (chunk) {
    promises.push(tryOrFail({
        chunk: chunk,
        tries: 0
    }));
});

function tryOrFail(data) {
    if (data.tries === LIMIT) return $q.reject();
    ++data.tries;
    return processChunk(data.chunk)
        .catch(function () {
            //Some error handling here
            return tryOrFail(data);
        });
}

$q.all(promises) //...
Two useful answers here, but having worked through this I have concluded that immediate retries are not really going to work for me.
I want to wait for the first batch to complete, then if the failures are because of timeouts, increase the timeout allowance, before retrying failures.
So I took Juan Stiza's example and modified it to do what I want, i.e. retry failures with $q.all.
My code now looks like this:
var surveys = //as before
var successes = 0;
var retries = 0;
var failedChanges = [];

//The saveChanges also keeps a track of retries, successes and fails
//it resolves first time through, and rejects second time
//it might be better written as two functions - a save and a retry
function saveChanges(data) {
    if (data.retrying) {
        retries++;
        logger.info('Retrying ' + retries + ' of ' + failedChanges.length);
    }
    return manager
        .saveChanges(data.changes, so)
        .then(function () {
            successes++;
            logger.info('Uploaded ' + successes + ' of ' + promises.length);
        },
        function (error) {
            if (!data.retrying) {
                //store the changes and resolve the promise
                //so that saveChanges can be called again after the call to $q.all
                failedChanges.push(data.changes);
                return; //resolved
            }
            logger.error('Retry Failed');
            return $q.reject();
        });
}

//using map instead of a for loop to call saveChanges
//and store the returned promises in an array
var promises = surveys.map(function (survey) {
    var changes = //as before
    return saveChanges({ changes: changes, retrying: false });
});

logger.info('Starting data upload');
return $q.all(promises).then(function () {
    if (failedChanges.length > 0) {
        var retries = failedChanges.map(function (data) {
            return saveChanges({ changes: data, retrying: true });
        });
        return $q.all(retries).then(saveSuccess, saveFail);
    }
    else {
        saveSuccess();
    }
});

Update records in Parse with Geopoints

I have about 600,000 records that I uploaded through the data uploader in CSV format. My longitude and latitude columns are separate. I'm trying to modify the class in Cloud Code with this script. Sometimes it updates, and other times there is an error. Can someone help me with this script, or is there a way to do this that I'm not aware of?
Parse.Cloud.job("CreatePoints", function (request, status) {
    // Set up to modify user data
    Parse.Cloud.useMasterKey();
    var recordsUpdated = 0;

    // Query for all objects with GeoPoint location null
    var query = new Parse.Query("Class");
    query.doesNotExist("location");
    query.each(function (object) {
        var location = {
            latitude: object.get("latitude"),
            longitude: object.get("longitude")
        };
        if (!location.latitude || !location.longitude) {
            return Parse.Promise.error("There was an error.");
        }
        recordsUpdated += 1;
        if (recordsUpdated % 100 === 0) {
            // Set the job's progress status
            status.message(recordsUpdated + " records updated.");
        }
        // Update to GeoPoint
        object.set("location", new Parse.GeoPoint(location));
        return object.save();
    }).then(function () {
        // Set the job's success status
        status.success("Migration completed successfully.");
    }, function (error) {
        // Set the job's error status
        console.log(error);
        status.error("Uh oh, something went wrong!");
    });
});
As per the comments, your issue is that some of the Class members have no longitude or latitude.
You could change your query to only process those that have both values:
var query = new Parse.Query("Class");
query.doesNotExist("location");
query.exists("longitude");
query.exists("latitude");
query.each(function (object) {
    // etc
Then you no longer need to check for them being empty and no longer need to return a Parse.Promise.error(), so you should no longer hit your error.
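A rough sketch of the simplified callback under that query, reusing the fields from the question:
query.each(function (object) {
    // both fields are guaranteed to exist by the query constraints above
    object.set("location", new Parse.GeoPoint({
        latitude: object.get("latitude"),
        longitude: object.get("longitude")
    }));
    recordsUpdated += 1;
    if (recordsUpdated % 100 === 0) {
        status.message(recordsUpdated + " records updated.");
    }
    return object.save();
})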

Less CSS and local storage issue

I'm using LESS CSS (more exactly less.js), which seems to use localStorage under the hood. I had never seen an error like this before while running my app locally, but now I get "Persistent storage maximum size reached" on every page display, just above the link to the unique .less file of my app.
This only happens with Firefox 12.0 so far.
Is there any way to solve this?
P.S.: mainly inspired by Calculating usage of localStorage space, this is what I ended up doing (this is based on Prototype and depends on a custom trivial Logger class, but this should be easily adapted in your context):
"use strict";
var LocalStorageChecker = Class.create({
testDummyKey: "__DUMMY_DATA_KEY__",
maxIterations: 100,
logger: new Logger("LocalStorageChecker"),
analyzeStorage: function() {
var result = false;
if (Modernizr.localstorage && this._isLimitReached()) {
this._clear();
}
return result;
},
_isLimitReached: function() {
var localStorage = window.localStorage;
var count = 0;
var limitIsReached = false;
do {
try {
var previousEntry = localStorage.getItem(this.testDummyKey);
var entry = (previousEntry == null ? "" : previousEntry) + "m";
localStorage.setItem(this.testDummyKey, entry);
}
catch(e) {
this.logger.debug("Limit exceeded after " + count + " iteration(s)");
limitIsReached = true;
}
}
while(!limitIsReached && count++ < this.maxIterations);
localStorage.removeItem(this.testDummyKey);
return limitIsReached;
},
_clear: function() {
try {
var localStorage = window.localStorage;
localStorage.clear();
this.logger.debug("Storage clear successfully performed");
}
catch(e) {
this.logger.error("An error occurred during storage clear: ");
this.logger.error(e);
}
}
});
document.observe("dom:loaded",function() {
var checker = new LocalStorageChecker();
checker.analyzeStorage();
});
P.P.S.: I haven't measured the performance impact on the UI yet, but a decorator could be created to perform the storage test only every X minutes (storing the last execution timestamp in local storage, for instance).
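Something along these lines, for instance (the key name and interval are arbitrary placeholders):
// sketch: only run the check if the last one is older than X minutes,
// keeping the last-run timestamp in localStorage itself
var CHECK_TS_KEY = "__STORAGE_CHECK_TS__";
var CHECK_INTERVAL_MS = 10 * 60 * 1000; // 10 minutes

function analyzeStorageThrottled(checker) {
    var last = parseInt(window.localStorage.getItem(CHECK_TS_KEY), 10) || 0;
    if (Date.now() - last < CHECK_INTERVAL_MS) {
        return false; // checked recently, skip this time
    }
    window.localStorage.setItem(CHECK_TS_KEY, String(Date.now()));
    return checker.analyzeStorage();
}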
Here is a good resource for the error you are running into.
http://www.sitepoint.com/building-web-pages-with-local-storage/#fbid=5fFWRXrnKjZ
It gives some insight into the fact that localStorage only has so much room and that you can max it out in each browser. Look into removing some data from localStorage to resolve your problem.
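For instance, a rough sketch of estimating how much is in localStorage before deciding what to remove:
// sketch: sum up key + value lengths to get a rough idea of usage
var usedChars = 0;
for (var i = 0; i < localStorage.length; i++) {
    var key = localStorage.key(i);
    usedChars += key.length + (localStorage.getItem(key) || "").length;
}
// strings are UTF-16, so roughly 2 bytes per character
console.log("localStorage usage: ~" + ((usedChars * 2) / 1024).toFixed(1) + " KB");
// then free space, e.g. localStorage.removeItem("some-stale-key");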
Less.js persistently caches content that is @import-ed. You can use the script below to clear that cached content: call destroyLessCache('/path/to/css/') and it will clear your localStorage of CSS files that have been cached.
function destroyLessCache(pathToCss) { // e.g. '/css/' or '/stylesheets/'
    if (!window.localStorage || !less || less.env !== 'development') {
        return;
    }
    var host = window.location.host;
    var protocol = window.location.protocol;
    var keyPrefix = protocol + '//' + host + pathToCss;

    for (var key in window.localStorage) {
        if (key.indexOf(keyPrefix) === 0) {
            delete window.localStorage[key];
        }
    }
}
