Update records in Parse with GeoPoints - parse-platform

I have about 600,000 records I uploaded through the data uploader in CSV format. My longitude and latitude columns are separate. I'm trying to modify the class in Cloud Code with this script. It updates sometimes, and other times there is an error. Can someone help me with this script, or is there a way to do this that I'm not aware of?
Parse.Cloud.job("CreatePoints", function(request, status) {
// Set up to modify user data
Parse.Cloud.useMasterKey();
var recordsUpdated = 0;
// Query for all objects with GeoPoint location null
var query = new Parse.Query("Class");
query.doesNotExist("location");
query.each(function(object) {
var location = {
latitude: object.get("latitude"),
longitude: object.get("longitude")
};
if (!location.latitude || !location.longitude) {
return Parse.Promise.error("There was an error.");
}
recordsUpdated += 1;
if (recordsUpdated % 100 === 0) {
// Set the job's progress status
status.message(recordsUpdated + " records updated.");
}
// Update to GeoPoint
object.set("location", new Parse.GeoPoint(location));
return object.save();
}).then(function() {
// Set the job's success status
status.success("Migration completed successfully.");
}, function(error) {
// Set the job's error status
console.log(error);
status.error("Uh oh, something went wrong!");
})
});

As per the comments, your issue is that some of the Class members have no longitude or latitude.
You could change your query to only process those that have both values:
var query = new Parse.Query("Class");
query.doesNotExist("location");
query.exists("longitude");
query.exists("latitude");
query.each(function(object) {
// etc
Then you no longer need to check for the values being empty and no longer need to return a Parse.Promise.error(), so you should no longer hit your error.
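Putting that together, a minimal sketch of the revised job could look like the following (it keeps the "Class", "latitude" and "longitude" names from the question; nothing else is assumed):

Parse.Cloud.job("CreatePoints", function(request, status) {
    Parse.Cloud.useMasterKey();
    var recordsUpdated = 0;
    var query = new Parse.Query("Class");
    query.doesNotExist("location");   // only rows that have no GeoPoint yet...
    query.exists("latitude");         // ...and that actually have both coordinates
    query.exists("longitude");
    query.each(function(object) {
        // build the GeoPoint straight from the two numeric columns
        object.set("location", new Parse.GeoPoint({
            latitude: object.get("latitude"),
            longitude: object.get("longitude")
        }));
        recordsUpdated += 1;
        if (recordsUpdated % 100 === 0) {
            status.message(recordsUpdated + " records updated.");
        }
        return object.save();
    }).then(function() {
        status.success("Migration completed successfully.");
    }, function(error) {
        console.log(error);
        status.error("Uh oh, something went wrong!");
    });
});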

Related

Updating an entry in PersistentVector not working NEAR Protocol

I'm trying to update the status of a job object. I get the "success" message returned, but the value is not updating. Am I missing something?
@nearBindgen
export class Contract {
    private jobs: PersistentVector<Job> = new PersistentVector<Job>('jobs');
    ......

    @mutateState()
    cancelJob(jobTitle: string): string {
        for (let i = 0; i < this.jobs.length; i++) {
            if (this.jobs[i].title == jobTitle) {
                this.jobs[i].status = "Cancelled";
                return "success"
            }
        }
        return "not found";
    }
And I'm calling it like that:
near call apptwo.msaudi.testnet cancelJob '{\"jobTitle\":\"title2\"}' --account-id=msaudi.testnet
It's not enough to update the entry when you fetch it. You need to update the storage on the contract as well; write it back in, so to speak.
This isn't enough:
this.jobs[i].status = "Cancelled";
You need to add it back in:
if (this.jobs[i].title == jobTitle) {
    const job: Job = this.jobs[i]; // Need an intermediate object in memory
    job.status = "Cancelled";
    this.jobs.replace(i, job); // Update storage with the new job.
    return "success"
}
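In context, the whole method would then look roughly like this (same names as in the question):

@mutateState()
cancelJob(jobTitle: string): string {
    for (let i = 0; i < this.jobs.length; i++) {
        if (this.jobs[i].title == jobTitle) {
            const job: Job = this.jobs[i];  // copy the element into memory
            job.status = "Cancelled";
            this.jobs.replace(i, job);      // write it back so the change persists in storage
            return "success";
        }
    }
    return "not found";
}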

Exception: Service invoked too many times for one day: urlfetch

I created a script in Google Sheets, which is working well but after a while I'm getting the following error:
Exception: Service invoked too many times for one day: urlfetch
I think I called the function about 200-300 times in the day; from what I checked, that should be below the limit.
I read we can use a cache to avoid this issue, but I'm not sure how to use it in my code.
function scrapercache(url) {
    var result = [];
    var description;
    var options = {
        'muteHttpExceptions': true,
        'followRedirects': false,
    };
    try {
        // trim url to prevent (rare) errors
        url.toString().trim();
        var r = UrlFetchApp.fetch(url, options);
        var c = r.getResponseCode();
        // check for meta refresh if 200 ok
        if (c == 200) {
            var html = r.getContentText();
            var $ = Cheerio.load(html); // make sure this lib is added to your project!
            // meta description
            if ($('meta[name=description]').attr("content")) {
                description = $('meta[name=description]').attr("content").trim();
            }
        }
        result.push([description]);
    }
    catch (error) {
        result.push(error.toString());
    }
    finally {
        return result;
    }
}
How can I use a cache like this to enhance my script, please?
var cache = CacheService.getScriptCache();
var result = cache.get(url);
if (!result) {
    var response = UrlFetchApp.fetch(url);
    result = response.getContentText();
    cache.put(url, result, 21600);
}
Answer:
You can implement CacheService and PropertiesService together and only retrieve the URL again after a specified amount of time.
Code Change:
Be aware that the additional calls to retrieve the cache and properties will slow your function down, especially if you are doing this a few hundred times.
As cache values can be a maximum of 100 KB, we will use CacheService to keep track of which URLs have already been retrieved, and PropertiesService to store the data.
You can edit your try block like so:
var cache = CacheService.getScriptCache();
var properties = PropertiesService.getScriptProperties();
try {
    let res = cache.get(url);
    if (!res) {
        // trim url to prevent (rare) errors
        url.toString().trim();
        var r = UrlFetchApp.fetch(url, options);
        var c = r.getResponseCode();
        // check for meta refresh if 200 ok
        if (c == 200) {
            var html = r.getContentText();
            cache.put(url, "cached", 21600);
            properties.setProperty(url, html);
            var $ = Cheerio.load(html); // make sure this lib is added to your project!
            // meta description
            if ($('meta[name=description]').attr("content")) {
                description = $('meta[name=description]').attr("content").trim();
            }
        }
        result.push([description]);
    }
}
catch (error) {
    result.push(error.toString());
}
finally {
    return result;
}
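The block above only handles a cache miss; on a hit it leaves result empty. If you also want to serve the previously scraped description when the URL is already cached, one possible extension (my own sketch, not part of the original answer) is to read the stored HTML back from PropertiesService inside the same try block:

    if (res) {
        // cache hit: re-use the HTML stored in script properties earlier
        var cachedHtml = properties.getProperty(url);
        if (cachedHtml) {
            var $cached = Cheerio.load(cachedHtml);
            if ($cached('meta[name=description]').attr("content")) {
                description = $cached('meta[name=description]').attr("content").trim();
            }
            result.push([description]);
        }
    }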
References:
Class CacheService | Apps Script | Google Developers
Class Cache | Apps Script | Google Developers
Class PropertiesService | Apps Script | Google Developers
Related Questions:
Service invoked too many times for one day: urlfetch

silent failures when indexing in elasticsearch

I'm using Elasticsearch 6.4. We index about 100M documents with a Node.js loader using the client 15.2.0.
The results are weird: after every indexing run we get a different number of documents.
The code builds up a batch and, once it reaches a certain size, loads it into Elasticsearch with the bulk API. To be more performant we disable the refresh. If the bulk is rejected, we wait 20 seconds and try again.
We also checked that response.errors is true/false, assuming that response.errors = false means there are no failures.
Here is the code:
if (i % options.batchSize === 0) {
    var previous_start = new Date();
    //sleep.msleep(options.slowdown);
    async.waterfall([
        function (callback) {
            client.bulk(
                {
                    refresh: "false", // we do refresh only at the end
                    //requestTimeout: 200000,
                    body: batch
                },
                function (err, resp) {
                    if (err) {
                        console.log(err.message);
                        throw err;
                    } else if (resp.errors) {
                        console.log('Bulk is rejected... let\'s meditate');
                        // let's just wait and re-send the bulk request with increased
                        // timeout to be on the safe side
                        console.log("Waiting for 20 seconds");
                        sleep.msleep(20000); // -> this is blocking... time for elasticsearch to do whatever it does
                                             // and be in a better mood to accept this bulk
                        client.bulk(
                            {
                                refresh: "false",
                                //requestTimeout: 200000,
                                body: batch
                            },
                            function (err, resp) {
                                if (err) {
                                    console.log(err.message);
                                    throw err;
                                } else if (resp.errors) {
                                    console.log(resp);
                                    throw resp;
                                    // alternative would be to block again and resend
                                }
                                console.log("bulk is finally ingested...");
                                let theEnd = new Date();
                                return callback(null, theEnd);
                            });
                    } else {
                        let theEnd = new Date();
                        return callback(null, theEnd);
                    }
                });
        },
        function (end, callback) {
            let total_time = (end - start) / 1000;
            let intermediate_time = (end - previous_start) / 1000;
            indexed += options.batchSize;
            console.log('Loaded %s records in %d s (%d record/s)', indexed, total_time, options.batchSize / intermediate_time);
            return callback(null, total_time);
        }
    ],
    function (err, total_time) {
        if (err)
            console.log(err);
    });
    batch = [];
    i = 0;
}
});
It looks like we have some silent failures. Has anybody had the same issue? Any suggestions?
Moreover, when calling http://localhost:9200/_cat/indices?v, I get strange results in the docs.deleted column. What does that column mean?
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open index_name Cqc2ABpRRs23P1DKlgaKJQ 5 0 96450728 340 24.8gb 24.8gb
sometimes this number changes during indexing, like:
green open index_name WsJPPQLcTuuiw37Vv0lfhA 5 0 21958048 6594 6.6gb 6.6gb
(then this number decreases)
I could not find any explanation about it... any help on this?
Thank you in advance
Do you create a new index each time, or do you use an existing one?
For the deleted docs: do you let ES generate the _id, or do you force it?
What is your batchSize?
To index faster you could also turn off replicas during indexing (if you have any, of course):
settings => NumberOfReplicas(0)
index_buffer_size could be updated too.
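One more note (mine, not from the original answer): with the bulk API, response.errors = true only means that at least one item in the batch failed; the individual failures are reported per item in response.items. A small sketch of inspecting them, assuming the same callback shape as in the question's client.bulk calls:

function logBulkItemErrors(resp) {
    // each entry in resp.items is keyed by its action name (index, create, update, delete)
    resp.items.forEach(function (item, n) {
        var action = Object.keys(item)[0];
        var info = item[action];
        if (info.error) {
            // a document that was rejected or failed: log its position, status and reason
            console.log('item %d failed (status %d): %s', n, info.status, JSON.stringify(info.error));
        }
    });
}

Logging (or re-queuing) these items instead of only looking at response.errors is usually enough to make the "silent" failures visible.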

How to retry failures with $q.all

I have some code that saves data using Breeze and reports progress over multiple saves; it is working reasonably well.
However, sometimes a save will time out, and I'd like to retry it once automatically. (Currently the user is shown an error and has to retry manually.)
I am struggling to find an appropriate way to do this, as I am confused by promises, so I'd appreciate some help.
Here is my code:
//I'm using Breeze, but because the save takes so long, I
//want to break the changes down into chunks and report progress
//as each chunk is saved....
var surveys = EntityQuery
    .from('PropertySurveys')
    .using(manager)
    .executeLocally();
var promises = [];
var fails = [];
var so = new SaveOptions({ allowConcurrentSaves: false });
var count = 0;
//...so I iterate through the surveys, creating a promise for each survey...
for (var i = 0, len = surveys.length; i < len; i++) {
    var query = EntityQuery.from('AnsweredQuestions')
        .where('PropertySurveyID', '==', surveys[i].ID)
        .expand('ActualAnswers');
    var graph = manager.getEntityGraph(query);
    var changes = graph.filter(function (entity) {
        return !entity.entityAspect.entityState.isUnchanged();
    });
    if (changes.length > 0) {
        promises.push(manager
            .saveChanges(changes, so)
            .then(function () {
                //reporting progress
                count++;
                logger.info('Uploaded ' + count + ' of ' + promises.length);
            },
            function () {
                //could I retry the fail here?
                fails.push(changes);
            }
        ));
    }
}
//....then I use $q.all to execute the promises
return $q.all(promises).then(function () {
    if (fails.length > 0) {
        //could I retry the fails here?
        saveFail();
    }
    else {
        saveSuccess();
    }
});
Edit
To clarify why I have been attempting this:
I have an http interceptor that sets a timeout on all http requests. When a request times out, the timeout is adjusted upwards and the user is shown an error message telling them they can retry with a longer wait if they wish.
Sending all the changes in one http request is looking like it could take several minutes, so I decided to break the changes down into several http requests, reporting progress as each request succeeds.
Now, some requests in the batch might time out and some might not.
Then I had the bright idea that I would set a low timeout for the http request to start with and automatically increase it. But the batch is sent asynchronously with the same timeout setting and the time is adjusted for each failure. That is no good.
To solve this I wanted to move the timeout adjustment after the batch completes, then also retry all requests.
To be honest I'm not so sure an automatic timeout adjustment and retry is such a great idea in the first place. And even if it was, it would probably be better in a situation where http requests were made one after another - which I've also been looking at: https://stackoverflow.com/a/25730751/150342
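For reference, the kind of interceptor described above might look roughly like this; it is only a sketch, and names such as app, timeoutInterceptor and the starting value of timeoutMs are assumptions rather than anything taken from the question:

app.factory('timeoutInterceptor', function ($q) {
    var timeoutMs = 10000; // assumed initial per-request timeout
    return {
        request: function (config) {
            config.timeout = timeoutMs; // every $http request gets the current allowance
            return config;
        },
        responseError: function (rejection) {
            // status <= 0 is a rough heuristic for timed-out/aborted requests
            if (rejection.status <= 0) {
                timeoutMs *= 2; // allow a longer wait next time
            }
            return $q.reject(rejection); // still surface the error to the caller
        }
    };
});
// registered in a config block:
// app.config(function ($httpProvider) { $httpProvider.interceptors.push('timeoutInterceptor'); });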
Orchestrating retries downstream of $q.all() is possible but would be very messy indeed. It's far simpler to perform retries before aggregating the promises.
You could exploit closures and retry-counters, but it's cleaner to build a catch chain:
function retry(fn, n) {
    /*
     * Description: perform an arbitrary asynchronous function,
     * and, on error, retry up to n times.
     * Returns: promise
     */
    var p = fn(); // first try
    for (var i = 0; i < n; i++) {
        p = p.catch(function (error) {
            // possibly log error here to make it observable
            return fn(); // retry
        });
    }
    return p;
}
Now, amend your for loop:
Use Function.prototype.bind() to define each save as a function with bound-in parameters.
Pass that function to retry().
Push the promise returned by retry().then(...) onto the promises array.
var query, graph, changes, saveFn;
for (var i = 0, len = surveys.length; i < len; i++) {
    query = ...; // as before
    graph = ...; // as before
    changes = ...; // as before
    if (changes.length > 0) {
        saveFn = manager.saveChanges.bind(manager, changes, so); // this is what needs to be tried/retried
        promises.push(retry(saveFn, 1).then(function () {
            // as before
        }, function () {
            // as before
        }));
    }
}
return $q.all(promises)... // as before
EDIT
It's not clear why you might want to retry downstream of $q.all(). If it's a matter of introducing some delay before retrying, the simplest way would be to do so within the pattern above.
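For example, a delayed variant of retry() could look like this (a sketch; it assumes $timeout is injected alongside $q):

function retryWithDelay(fn, n, delayMs) {
    var p = fn(); // first try
    for (var i = 0; i < n; i++) {
        p = p.catch(function (error) {
            // wait delayMs, then retry
            return $timeout(angular.noop, delayMs).then(fn);
        });
    }
    return p;
}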
However, if retrying downstream of $q.all() is a firm requirement, here's a cleanish recursive solution that allows any number of retries, with minimal need for outer vars:
var surveys = //as before
var limit = 2;

function save(changes) {
    return manager.saveChanges(changes, so).then(function () {
        return true; // true signifies success
    }, function (error) {
        logger.error('Save Failed');
        return changes; // retry (subject to limit)
    });
}

function saveChanges(changes_array, tries) {
    tries = tries || 0;
    if (tries >= limit) {
        throw new Error('After ' + tries + ' tries, ' + changes_array.length + ' changes objects were still unsaved.');
    }
    if (changes_array.length > 0) {
        logger.info('Starting try number ' + (tries + 1) + ' comprising ' + changes_array.length + ' changes objects');
        return $q.all(changes_array.map(save)).then(function (results) {
            var successes = results.filter(function (item) { return item === true; });
            var failures = results.filter(function (item) { return item !== true; });
            logger.info('Uploaded ' + successes.length + ' of ' + changes_array.length);
            return saveChanges(failures, tries + 1); // recursive call
        });
    } else {
        return $q.when(); // return a resolved promise
    }
}
//using reduce to populate an array of changes
//the second parameter passed to the reduce method is the initial value
//for memo - in this case an empty array
var changes_array = surveys.reduce(function (memo, survey) {
    //memo is the return value from the previous call to the function
    var query = EntityQuery.from('AnsweredQuestions')
        .where('PropertySurveyID', '==', survey.ID)
        .expand('ActualAnswers');
    var graph = manager.getEntityGraph(query);
    var changes = graph.filter(function (entity) {
        return !entity.entityAspect.entityState.isUnchanged();
    });
    if (changes.length > 0) {
        memo.push(changes);
    }
    return memo;
}, []);
return saveChanges(changes_array).then(saveSuccess, saveFail);
Progress reporting is slightly different here. With a little more thought it could be made more like in your own answer.
This is a very rough idea of how to solve it.
var promises = [];
var LIMIT = 3; // 3 tries per promise.
data.forEach(function (chunk) {
    promises.push(tryOrFail({
        chunk: chunk,
        tries: 0
    }));
});

function tryOrFail(data) {
    if (data.tries === LIMIT) return $q.reject();
    return processChunk(data.chunk)
        .catch(function () {
            //Some error handling here
            ++data.tries;
            return tryOrFail(data);
        });
}

$q.all(promises) //...
Two useful answers here, but having worked through this I have concluded that immediate retries are not really going to work for me.
I want to wait for the first batch to complete, then, if the failures are because of timeouts, increase the timeout allowance before retrying the failures.
So I took Juan Stiza's example and modified it to do what I want, i.e. retry failures with $q.all.
My code now looks like this:
var surveys = //as before
var successes = 0;
var retries = 0;
var failedChanges = [];

//The saveChanges also keeps a track of retries, successes and fails
//it resolves first time through, and rejects second time
//it might be better written as two functions - a save and a retry
function saveChanges(data) {
    if (data.retrying) {
        retries++;
        logger.info('Retrying ' + retries + ' of ' + failedChanges.length);
    }
    return manager
        .saveChanges(data.changes, so)
        .then(function () {
            successes++;
            logger.info('Uploaded ' + successes + ' of ' + promises.length);
        },
        function (error) {
            if (!data.retrying) {
                //store the changes and resolve the promise
                //so that saveChanges can be called again after the call to $q.all
                failedChanges.push(data.changes);
                return; //resolved
            }
            logger.error('Retry Failed');
            return $q.reject();
        });
}

//using map instead of a for loop to call saveChanges
//and store the returned promises in an array
var promises = surveys.map(function (survey) {
    var changes = //as before
    return saveChanges({ changes: changes, retrying: false });
});

logger.info('Starting data upload');
return $q.all(promises).then(function () {
    if (failedChanges.length > 0) {
        var retries = failedChanges.map(function (data) {
            return saveChanges({ changes: data, retrying: true });
        });
        return $q.all(retries).then(saveSuccess, saveFail);
    }
    else {
        saveSuccess();
    }
});

How to copy latitude and longitude fields into a GeoPoint on Parse.com

I have imported a 16,000 row csv into a Parse.com class. Now I need to convert the latitude and longitude fields into a Parse.com GeoPoint data type field.
I have added the GeoPoint field and now must run an update to copy the lat/lng data from each row into their respective GeoPoint field.
Where in the API or Parse.com UI does one accomplish this? I can't seem to find it. I know it's possible, because what sane team of developers would omit such a feature?
The solution is indeed to create a job using Cloud Code. Here is an example in JavaScript:
Parse.Cloud.job("airportMigration", function(request, status) {
// Set up to modify user data
Parse.Cloud.useMasterKey();
var recordsUpdated = 0;
// Query for all airports with GeoPoint location null
var query = new Parse.Query("Airports");
query.doesNotExist("location");
query.each(function(airport) {
var location = {
latitude: airport.get("latitude_deg"),
longitude: airport.get("longitude_deg")
};
if (!location.latitude || !location.longitude) {
return Parse.Promise.error("There was an error.");
// return Parse.Promise.resolve("I skipped a record and don't care.");
}
recordsUpdated += 1;
if (recordsUpdated % 100 === 0) {
// Set the job's progress status
status.message(recordsUpdated + " records updated.");
}
// Update to GeoPoint
airport.set("location", new Parse.GeoPoint(location));
return airport.save();
}).then(function() {
// Set the job's success status
status.success("Migration completed successfully.");
}, function(error) {
// Set the job's error status
console.log(error);
status.error("Uh oh, something went wrong.");
});
});
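Note that returning Parse.Promise.error() from the each() callback aborts the whole job at the first record with missing coordinates. If you would rather skip such records and keep going, as the commented-out line hints, the check could instead return a resolved promise:

    if (!location.latitude || !location.longitude) {
        // skip this record; each() continues with the next one
        return Parse.Promise.as();
    }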
