This answer from Parse says:
You can call destroy() on any ParseObject from Cloud Code to delete them. Deleting, as well as creating or updating, multiple objects from Cloud Code is not recommended, however.
Why? The answerer doesn't say, and it seems like Cloud Code would be exactly the place to bulk update/delete objects. Does he mean Cloud Code functions as opposed to a Cloud background job? Or am I missing some other way to delete objects in Parse?
The linked answer was written before the launch of Background Jobs, which have a longer time limit.
Cloud Functions have a 15-second maximum run time. This is why you need to be a little conservative about how many operations you perform in a single cloud function.
Now, Background Jobs are the recommended path for maintenance-type processes. https://parse.com/docs/cloud_code_guide#jobs
They have a 15-minute time limit and, if you're clever about it, can be used to handle lots of work at near-real-time speeds. For example: https://gist.github.com/gfosco/131974d200c5e9fc6c94
I know there's a question with the same title, but my question is a little different: I have a Lambda API, saveInputAPI(), that saves a value into a specified field. Users can invoke this API with different parameters, for example:
saveInput({"adressType",1}); //adressType is a DB field.
or
saveInput({"name","test"}) //name is a DB field.
This is hosted on AWS, so I'm also using API Gateway. The problem is that sometimes an error like this happens:
As you can see, API No. 19 was invoked first but finished later
(10:10:16:828) -> (10:10:18:060)
While API No.18 was invoked later but finished sooner...
(10:10:17:611) -> (10:10:17:861)
This causes a lot of problems in my project, and sometimes the delay between two API calls is up to 10 seconds. The front end acts independently, so users don't know what happens behind the scenes: they think they have set addressType to 1, but in reality addressType is still 2. The project is large, and I cannot change the design of using a single API to update DB values. Is there any way to fix this problem? I'd really appreciate any ideas. Thanks.
If database updates can't simply be skipped when the stored last-updated timestamp is more recent than the source event's timestamp, then API Gateway and Lambda need to be decoupled:
API Gateway writes each request to an SQS FIFO queue.
A Lambda consumes the queue and processes the requests.
This ensures that older events are processed first.
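As a rough illustration of why a queue with a single in-order consumer removes the race, here is a minimal in-memory sketch in Go. The buffered channel only stands in for the FIFO queue, and the Update type and Drain function are hypothetical names made up for this example; it does not use the AWS SDK or show real SQS or Lambda configuration.

```go
package main

import "fmt"

// Update is a hypothetical request body: one field name and its new value.
type Update struct {
	Field string
	Value string
}

// Drain applies queued updates one at a time, in the order they were
// enqueued, and returns the final value of every field.
func Drain(queue <-chan Update) map[string]string {
	state := make(map[string]string)
	for u := range queue {
		// A single consumer reads in queue order, so a later update
		// can never be overwritten by an earlier one.
		state[u.Field] = u.Value
	}
	return state
}

func main() {
	// The buffered channel stands in for the FIFO queue (an assumption
	// made for this sketch only).
	queue := make(chan Update, 8)
	queue <- Update{Field: "addressType", Value: "2"}
	queue <- Update{Field: "addressType", Value: "1"}
	close(queue)

	fmt.Println(Drain(queue)["addressType"]) // prints 1
}
```

The ordering guarantee here is only as good as the order in which requests reach the queue, which is the same assumption the answer makes.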
AWS Lambda is asynchronous by design, so trying to make it synchronous and predictable is largely wasted effort.
If your concern is preventing "old" data (in the scheduling sense) from overwriting "fresh" data, then consider timestamping each write and applying a constraint such as: "to overwrite the target data, the source timestamp has to be later than the timestamp of the target data".
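The timestamp constraint can be sketched as follows. This is a minimal in-memory example in Go, not code for any particular database: the Store type, the Save and Get methods, and the millisecond timestamps supplied by the caller are all assumptions made for illustration. In a real database the comparison and the write would have to happen as one atomic conditional update.

```go
package main

import (
	"fmt"
	"sync"
)

// field holds a value together with the timestamp (milliseconds since the
// Unix epoch) of the request that wrote it.
type field struct {
	value     string
	updatedAt int64
}

// Store is a hypothetical in-memory stand-in for the database table.
type Store struct {
	mu     sync.Mutex
	fields map[string]field
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{fields: make(map[string]field)}
}

// Save applies the write only if sentAt is strictly newer than the timestamp
// already stored for that field. It reports whether the write was applied.
func (s *Store) Save(name, value string, sentAt int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.fields[name]; ok && sentAt <= cur.updatedAt {
		return false // stale request: a newer value is already stored
	}
	s.fields[name] = field{value: value, updatedAt: sentAt}
	return true
}

// Get returns the current value of a field, or "" if it was never set.
func (s *Store) Get(name string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.fields[name].value
}

func main() {
	s := NewStore()
	// The request sent later (t=200) happens to arrive first.
	fmt.Println(s.Save("addressType", "1", 200)) // true
	// The older request (t=100) arrives afterwards and is rejected.
	fmt.Println(s.Save("addressType", "2", 100)) // false
	fmt.Println(s.Get("addressType"))            // 1
}
```

With this guard the order in which the Lambda invocations finish no longer matters, as long as each request carries the time at which the user actually made the change.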
I'm creating a new service whose database entries (Mongo) have a state field that I need to update based on the current time. For instance, if the start time was set to two hours from now, I need to change the state from CREATED -> STARTED in the database when that time arrives, and there can be multiple such states.
Approaches I've thought of:
Keep querying database entries that are <= current time and then change their states accordingly. This causes extra reads for no reason and half the time empty reads, and it will get complicated fast with more states coming in.
I write a job scheduler (I am using Go, so that wouldn't be hard) and schedule all the jobs, but I might lose queue data in case of a panic/crash.
I use a product like Celery; I have found a Go implementation of it: https://github.com/gocelery/gocelery
Another task scheduler I've found is on Google Cloud https://cloud.google.com/solutions/reliable-task-scheduling-compute-engine, but I don't want to get stuck in proprietary technologies.
I wanted to use some PubSub service for this, but I couldn't find one that has delayed messages (if that's a thing). My problem is mainly not being able to find an actual name for this problem, to be able to search for it properly, I've even tried searching Microsoft docs. If someone can point me in the right direction or if any of the approaches I've written are the ones I should use, please let me know, that would be a great help!
UPDATE:
Found one more solution by Netflix, for the same problem
https://medium.com/netflix-techblog/distributed-delay-queues-based-on-dynomite-6b31eca37fbc
I think you are right in that the problem you are trying to solve is the job or task scheduling problem.
One approach that many companies use is the system you are proposing: jobs are inserted into a datastore with a time to execute at and then that datastore can be polled for jobs to be run. There are optimizations that prevent extra reads like polling the database at a regular interval and using exponential back-off. The advantage of this system is that it is tolerant to node failure and the disadvantage is added complexity to the system.
Looking around, in addition to the one you linked (https://github.com/gocelery/gocelery) there are other implementations of this model (https://github.com/ajvb/kala or https://github.com/rakanalh/scheduler were ones I found after a quick search).
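To make the datastore-polling model concrete, here is a minimal sketch in Go. Everything in it is an assumption for illustration: JobStore is an in-memory stand-in for the real datastore, and Job, ClaimDue, and NextDelay are hypothetical names. With several workers, the claim step would need to be an atomic operation in the datastore itself.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// Job is a hypothetical record: an entry ID and the Unix time (in seconds)
// at which its state should change.
type Job struct {
	ID    string
	RunAt int64
}

// JobStore is an in-memory stand-in for the datastore.
type JobStore struct {
	mu   sync.Mutex
	jobs []Job
}

// Add inserts a job into the store.
func (s *JobStore) Add(j Job) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.jobs = append(s.jobs, j)
}

// ClaimDue removes and returns every job with RunAt <= now, oldest first.
func (s *JobStore) ClaimDue(now int64) []Job {
	s.mu.Lock()
	defer s.mu.Unlock()
	var due, pending []Job
	for _, j := range s.jobs {
		if j.RunAt <= now {
			due = append(due, j)
		} else {
			pending = append(pending, j)
		}
	}
	s.jobs = pending
	sort.Slice(due, func(a, b int) bool { return due[a].RunAt < due[b].RunAt })
	return due
}

// NextDelay implements a simple exponential back-off: the delay doubles
// after an empty poll (capped at max) and resets to min once work is found.
func NextDelay(cur, min, max time.Duration, foundWork bool) time.Duration {
	if foundWork {
		return min
	}
	if cur*2 > max {
		return max
	}
	return cur * 2
}

func main() {
	store := &JobStore{}
	now := time.Now().Unix()
	store.Add(Job{ID: "a", RunAt: now})        // due immediately
	store.Add(Job{ID: "b", RunAt: now + 3600}) // due in an hour

	delay := 100 * time.Millisecond
	for i := 0; i < 3; i++ { // a real service would loop until shutdown
		due := store.ClaimDue(time.Now().Unix())
		for _, j := range due {
			fmt.Println("CREATED -> STARTED:", j.ID)
		}
		delay = NextDelay(delay, 100*time.Millisecond, time.Second, len(due) > 0)
		time.Sleep(delay)
	}
}
```

Because the jobs live in the datastore rather than in process memory, a crashed poller can be restarted and will pick up whatever is overdue.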
The other approach you described, scheduling jobs in process, is very simple in Go because parked goroutines are extremely cheap: you just spawn one goroutine per job and let it wait. The downside is that if the process dies, the job is lost.
go func() {
	// Park this goroutine until the scheduled time (requires the "time" package).
	time.Sleep(time.Until(expirationTime))
	// do work here.
}()
A final approach that I have seen but wouldn't recommend is the callback model (something like https://gitlab.com/andreynech/dsched). This is where your service calls to another service (over http, grpc, etc.) and schedules a callback for a specific time. The advantage is that if you have multiple services in different languages, they can use the same scheduler.
Overall, before you decide on a solution, I would consider some trade-offs:
How acceptable is job loss? If it's ok that some jobs are lost a small percentage of the time, maybe an in-process solution is acceptable.
How long will jobs be waiting? If it's longer than the shutdown period of your host, maybe a datastore based solution is better.
Will you need to distribute job load across multiple machines? If you need to distribute the load, sharding and scheduling are tricky things and you might want to consider using a more off-the-shelf solution.
Good luck! Hope that helps.
I am looking for some high level guidance on an architecture. I have a provider writing "transactions" to a Kinesis pipe (about 1MM/day). I need to pull those transactions off, one at a time, validating data, hitting other SOAP or Rest services for additional information, applying some business logic, and writing the results to S3.
One approach that has been proposed is to use a Spark job that runs forever, pulling data and processing it within the Spark environment. The benefits were enumerated as shareable cached data, availability of SQL, and in-house knowledge of Spark.
My thought was to have a series of Lambda functions that would process the data. As I understand it, I can have a Lambda watching the Kinesis pipe for new data. I want to run the pulled data through a bunch of small steps (lambdas), each one doing a single step in the process. This seems like an ideal use of Step Functions. With regards to caches, if any are needed, I thought that Redis on ElastiCache could be used.
Can this be done using a combination of Lambda and Step Functions (using lambdas)? If it can be done, is it the best approach? What other alternatives should I consider?
This can be achieved using a combination of Lambda and Step Functions. As you described, the lambda would monitor the stream and kick off a new execution of a state machine, passing the transaction data to it as an input. You can see more documentation around kinesis with lambda here: http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html.
The state machine would then pass the data from one Lambda function to the next, where the data will be processed and written to S3. You would need to contact AWS for an increase to the default limit of 2 StartExecution API calls per second, since 1MM/day averages roughly 11.6 executions per second (1,000,000 / 86,400 seconds).
Hope this helps!
I've started using Realm as the storage layer for my app. This is the scenario I am trying to solve.
Scenario: I get a whole bunch of data from the server. I convert each piece of data into an RLMObject. I want to just "save" to persistent storage at the end. In between, I want the RLMObjects I have created to be reflected when I do a query.
I don't see a solution for this in Realm. It looks like the only way is to write each object back into the Realm DB after it is created. The documentation also says that writes are expensive. Is there any way around this?
To reduce the overhead, I guess I could maintain a list of the objects created and write all of them in one transaction. That still seems like a lot of work. Is that how it is intended to be used?
You can create the objects as standalone objects without adding them to the Realm, and then add them all in a single transaction (which is very efficient) at the end.
Check out the documentation about creating objects here: https://realm.io/docs/objc/latest/#creating-objects
There is also an example of adding objects in bulk here, where they get added in chunks so that other threads can observe the changes as they happen: https://realm.io/docs/objc/latest/#using-a-realm-across-threads
I need to know the relative position of an object in a list. Let's say I need to know the position of a certain wine among all wines added to the database, based on the votes received from users. The app should be able to receive the ranking position as an object property when retrieving a "wine" class object.
This should be easy to do on the backend side, but I've looked at Cloud Code and it seems it is only able to execute code before or after saving or deleting, not before reading and returning a response.
Is there any way to do this? Any workaround?
Thanks.
I think you would have to write a Cloud function to perform this calculation for a particular wine.
https://www.parse.com/docs/cloud_code_guide#functions
This would be a function you would call manually. You would have to provide the "wine" object or objectId as a parameter and then have your cloud function return the value you need. Keep in mind there are limitations on cloud functions. Read the documentation about time limits. You also don't want to make too many API calls every time you run this. It sounds like your computation could be fairly heavy if your dataset is large and you aren't caching at least some of the information.
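The answer does not spell out the calculation itself. One common definition, assumed here, is that a wine's position is one plus the number of wines with strictly more votes, which on a backend usually amounts to a single count query. The sketch below shows only that logic in Go over an in-memory slice; the Wine type and Rank function are hypothetical and are not Parse Cloud Code.

```go
package main

import "fmt"

// Wine is a hypothetical record with a vote count.
type Wine struct {
	ID    string
	Votes int
}

// Rank returns the 1-based position of the wine with the given ID when wines
// are ordered by votes, highest first. Wines with equal votes share the same
// position. It returns 0 if no wine has that ID.
func Rank(wines []Wine, id string) int {
	votes, found := 0, false
	for _, w := range wines {
		if w.ID == id {
			votes, found = w.Votes, true
			break
		}
	}
	if !found {
		return 0
	}
	// Position = 1 + number of wines with strictly more votes.
	rank := 1
	for _, w := range wines {
		if w.Votes > votes {
			rank++
		}
	}
	return rank
}

func main() {
	wines := []Wine{
		{ID: "merlot", Votes: 12},
		{ID: "rioja", Votes: 30},
		{ID: "malbec", Votes: 12},
		{ID: "syrah", Votes: 5},
	}
	fmt.Println(Rank(wines, "rioja"))  // 1
	fmt.Println(Rank(wines, "malbec")) // 2
	fmt.Println(Rank(wines, "syrah"))  // 4
}
```

Counting the wines ahead of the target avoids sorting or fetching the whole list, which matters given the time limits mentioned above.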