How to implement saveEventually in CloudKit - parse-platform

One of the truly great Parse features is PFObject's saveEventually method. From Parse’s doc:
saveEventually
Saves this object to the server at some unspecified time in the future, even if Parse is currently inaccessible.
Basically, it saves the object locally and keeps trying to push it to Parse whenever it detects a connection.
How can someone implement the same functionality using CloudKit?

In CloudKit you have to do everything yourself.
You could put the object in a queue (held in memory and persisted to a file in case of an app restart). When the object has been saved to CloudKit, you can remove it from the queue.
You could create a special queue object that would contain the actual data plus some extra information like timestamps and retry count.
In your AppDelegate's application(_:didFinishLaunchingWithOptions:) you should read the queue from the file and continue processing.
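The queue described above could look roughly like this. It is a minimal, language-neutral sketch in Python rather than Swift, and nothing in it is a CloudKit API: SaveEventuallyQueue, the JSON file and the save callback are all hypothetical names, and it assumes a failed save surfaces as an OSError.

```python
import json
import time
from pathlib import Path


class SaveEventuallyQueue:
    """Pending saves kept in memory and mirrored to a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload whatever was still pending when the app last stopped.
        self.items = json.loads(self.path.read_text()) if self.path.exists() else []

    def _persist(self):
        self.path.write_text(json.dumps(self.items))

    def enqueue(self, record):
        # Wrap the payload with the extra bookkeeping fields.
        self.items.append({"record": record, "queued_at": time.time(), "retries": 0})
        self._persist()

    def flush(self, save):
        """Try each pending item once; keep the ones whose save fails."""
        remaining = []
        for item in self.items:
            try:
                save(item["record"])
            except OSError:  # assumption: network failures surface as OSError
                item["retries"] += 1
                remaining.append(item)
        self.items = remaining
        self._persist()
```

You would call flush at launch and whenever connectivity returns; the retry count and timestamp let you give up on items that are too old or have failed too often.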

Related

MassTransit MessageData Management

I have started to make greater use of the message data feature of MassTransit and am getting to the point of needing to manage the message data in the store, i.e. remove old data.
The obvious choice is to have some outside process tidy up data, but clearly a scheduled (or not) clean up could remove data still in use or referenced by error or dead letter queues.
Ideally I would like to limit stored message data retention to messages only in error or dead letter queues, and automatically remove data for messages that have been successfully processed.
What would be the best approach to achieve this with MassTransit? Perhaps a middleware approach or similar, and if so, what is the correct way to do it?
Manual cleanup is recommended, using whatever makes sense for the repository in use. Because messages may still be in queues, or in error/dead-letter queues as you pointed out, it is really up to the development/operations team to know when the right time is to remove older message data.
I'd suggest monitoring and managing the error/dead-letter queues more aggressively, keeping them empty. And then, just figure a good timeframe to delete old message data - one week, ten days, whatever - and deal with it that way.
I have had a backlog item to come up with a way to automatically manage message data, but since message data can be forwarded (using the same stored data) either via publish or send, there is no good way to track references.
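The time-based cleanup suggested above can be sketched as follows. This assumes the message data lives as plain files under a directory; it is not a MassTransit API, and delete_old_message_data and its parameters are hypothetical names.

```python
import time
from pathlib import Path


def delete_old_message_data(root, max_age_days, now=None):
    """Delete files under root that were last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    # Materialise the listing first so deletions do not disturb the directory walk.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run on a schedule with a retention window longer than anything you expect to sit in the error/dead-letter queues; for a database or blob-store repository the same idea becomes a delete filtered by creation time.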

Spring-integration: keep a context for a Message through a chain

I am using spring-integration, and I have messages that go through an int:chain with multiple elements: int:service-activator, int:transformers, etc. In the end, a message is sent to another app's REST endpoint. There is also an errorHandler that will save any Exception in a text file.
For administration purposes, I would like to keep some information about what happened in the chain (e.g. "this DB call returned this", "during this transformation, this rule was applied", etc.). This would be equivalent to a log file, but bound to a Message. Of course there is already a logger, but in the end, I need to create (either after the REST call is made, or when an error occurs) a file for this specific Message with the data.
I was wondering if there was some kind of "context" for the Message that I could call from any part of the chain, and where I could store stuff. I didn't find anything in the official documentation, but I'm not really sure what to look for.
I've been thinking about putting it all in the Message itself, but:
It's an immutable object, so I would need to rebuild it each time I want to add something to its header (or the payload).
I wouldn't be able to retrieve any new data from the error handler in case of Exception, because it takes the original message.
I can't really add it to the payload object because some native transformers/service-activators are directly using it (and that would also mean rewriting a lot of code ...)
I've also been thinking of some kind of "thread-bound" bean that would act as a context for each Message, but I see too many problems arising from this.
Maybe I'm wrong about some of these ideas. Anyway, I just need a way to keep data through multiple elements of a Spring Integration chain and also be able to access it in the error handler.
Add a header, e.g. a map or list, and add to it in each stage.
The framework does something similar when message history is enabled.
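To illustrate why a list header gets around the immutability concern, here is a small Python stand-in. None of these names are Spring Integration APIs; Message, with_trace and run_chain are hypothetical. The message itself cannot be reassigned, but the list stored in its headers is shared by every copy, so the error handler sees the entries even though it is handed the message as it entered the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    payload: str
    headers: dict


def with_trace(message):
    """Return a copy of the message carrying an empty 'trace' list header."""
    return Message(message.payload, dict(message.headers, trace=[]))


def lookup(message):
    message.headers["trace"].append("DB call returned 3 rows")
    return Message(message.payload.upper(), message.headers)


def transform(message):
    message.headers["trace"].append("rule X applied")
    raise ValueError("transformation failed")


def run_chain(message, stages, on_error):
    original = with_trace(message)
    current = original
    try:
        for stage in stages:
            current = stage(current)
    except Exception as exc:
        # The handler gets the message as it entered the chain; the trace list
        # is shared, so entries added by later stages are still visible.
        on_error(original, exc)
    return current
```

Calling run_chain(Message("abc", {}), [lookup, transform], handler) hands the handler the original payload together with both trace entries.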

ES,CQRS messaging flow

I am trying to understand ES+CQRS and which tech stack can be used.
As per my understanding flow should be as below.
1. The UI sends a request to the Controller (HTTP adapter).
2. The Controller calls the application service, passing the Request Object as a parameter.
3. The application service creates a Command from the Request Object passed from the controller.
4. The application service passes this Command to the Message Consumer.
5. The Message Consumer publishes the Command to the message broker (RabbitMQ).
6. Two subscribers will be listening for the above command:
a. One subscriber will rebuild the Aggregate from the event store and apply the command; the generated event will then be stored in the event store.
b. Another subscriber will be at the VIEW end and will populate data in the view database/cache.
Kindly confirm whether my understanding is correct.
I think you've gotten a bit tangled in your middleware.
As a rule, CQRS means that the writes happen to one data model, and reads in another. So the views aren't watching commands, they are watching the book of record.
So in the subscriber that actually processes the command, the command handler will load the current state from the book of record into memory, update the copy in memory according to the domain model, and then replace the state in the book of record with the updated version.
Having updated the book of record, we can now trigger a refresh of the data model that backs the view; no business logic is run here, this is purely a transform of the data from the model we use for writes to the model we use for reads.
When we add event sourcing, this pattern is the same -- the distinction is that the data model we use for writes is a history of events.
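A minimal sketch of that command-handling pattern with an event-sourced write model. The bank-account domain, the event names and the dict standing in for the event store are all assumptions made for illustration.

```python
def apply(balance, event):
    """Fold one event into the in-memory state (here: an account balance)."""
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrawn":
        return balance - event["amount"]
    return balance


def handle_withdraw(event_store, stream_id, amount):
    history = event_store.get(stream_id, [])
    balance = 0
    for event in history:  # load the current state from the book of record
        balance = apply(balance, event)
    if amount > balance:  # the domain rule runs against the in-memory copy
        raise ValueError("insufficient funds")
    new_event = {"type": "Withdrawn", "amount": amount}
    # Updating the book of record means appending to the history.
    event_store[stream_id] = history + [new_event]
    return new_event
```

Note that the view is not involved at all here; it is refreshed afterwards from the stored events.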
How is atomicity achieved between writing data to the event store and writing data to the view model?
It's not -- we don't try to make those two actions atomic.
How do we handle the case where the event is stored in the event store but the system crashes before we send the event to the message queue?
The key idea is to realize that we typically build new views by reading events out of the event store; not by reading the events out of the message queue. The events in the queue just tell us that an update is available. In the absence of events appearing in the message queue, we can still poll the event store watching for updates.
Therefore, if the event store is unreachable, you just leave the stale copy of the view in place, and wait for the system to recover.
If the event store is reachable, but the message queue isn't, then you update the view (if necessary) on some predetermined schedule.
This is where the eventual consistency part comes in. Given a successful write into the event store, we are promising that the effects of that write will be visible in a finite amount of time.
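A sketch of a view that catches up from the event store rather than from the queue. The Projection class and the list standing in for the event log are hypothetical; the point is only that the view remembers how far it has read, so it does not matter whether catch_up is triggered by a queue message or by a timer.

```python
class Projection:
    """Read model built by reading the event log from a remembered position."""

    def __init__(self):
        self.position = 0  # number of events already applied
        self.totals = {}   # the view: events counted per type

    def catch_up(self, event_log):
        # Apply only the events this view has not seen yet.
        for event in event_log[self.position:]:
            self.totals[event["type"]] = self.totals.get(event["type"], 0) + 1
            self.position += 1
        return self.position
```

If a notification is lost because of a crash, the next scheduled catch_up still picks up the stored event, which is what makes the view eventually consistent.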

CQRS+ES: Client log as event

I'm developing a small CQRS+ES framework and building applications with it. In my system, I need to log some client actions and use them for analytics and statistics, and maybe in the future do something in the domain with them. For example, a client (on the web) downloads some resource(s) and I need to save the date, time, type (download, partial, ...), region or country (maybe from the IP), etc. After that, in some view, the client can see the download count or some complex report. I'm not sure how to implement this feature.
The first solution creates an analytic context and some aggregate; on each client action I send a command like IncreaseDownloadCounter(resourceId), then handle the command, raise domain events and update the view. But in this scenario the download has already occurred before I send the command, so it is not really a command, and on the other side version conflicts increase.
The second solution is raising an event from the client side and updating the view model based on it, but with this type of handling my event is not stored in the event store, because it is not raised by a command and never changes any domain context. If I do store it in the event store, there is no aggregate to handle it after it is fetched for some other use.
The third solution is raising an event from the client side and storing it in another database, maybe with a special table for each type of event. But handled this way I have multiple event stores with different schemas, which makes it difficult to recreate view models and to trace events when recreating context state, so if in the future I add a domain that uses this type of event, the events will be difficult to use.
What is the best approach and solution for this scenario?
First solution creates analytic context and some aggregate
Unquestionably the wrong answer; the event has already happened, so it is too late for the domain model to complain.
What you have is a stream of events. Putting them in the same event store that you use for your aggregate event streams is fine. Putting them in a separate store is also fine. So you are going to need some other constraint to make a good choice.
Typically, reads vastly outnumber writes, so one concern might be that these events are going to saturate the domain store. That might push you towards storing these events separately from your data model (prior art: we typically keep the business data in our persistent book of record, but the sequence of http requests received by the server is typically written instead to a log...)
If you are supporting an operational view, push on the requirement that the state be recovered after a restart. You might be able to get by with building your view off of an in memory model of the event counts, and use something more practical for the representations of the events.
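One possible shape for that, sketched under assumptions: client events are appended to their own log file (separate from the domain event store), and the download counts shown in the view are kept in memory and rebuilt from the log after a restart. DownloadLog and its field names are hypothetical.

```python
import json
from collections import Counter


class DownloadLog:
    def __init__(self, path):
        self.path = path
        self.counts = Counter()  # in-memory model backing the view

    def record(self, resource_id, kind, country):
        """Append the client event to the log and bump the in-memory count."""
        event = {"resource": resource_id, "kind": kind, "country": country}
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(event) + "\n")
        self.counts[resource_id] += 1

    def rebuild(self):
        """Recompute the counters after a restart by replaying the log."""
        self.counts = Counter()
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                self.counts[json.loads(line)["resource"]] += 1
```

No aggregate or version check is involved, since the event has already happened and there is nothing for the domain model to reject.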
Thanks for your complete answer. So I should create something like the ES schema without some fields (aggregate name or type, version, etc.) and collect client events in that repository; some offline process then reads them and updates the read model, or creates commands to do something in the domain space.
Something like that, yes. If the view for the client doesn't actually require any validation by your model at all, then building the read model from the externally provided events is fine.
Are you recommending saving some claim or authorization token of the user and sender app for validation in another process?
Maybe, maybe not. The token describes the authority of the event; our own event handler is the authority for the command(s) that is/are derived from the events. It's an interesting question that probably requires more context -- I'd suggest you open a new question on that point.

Check if S3 file has been modified

How can I use a shell script to check if an Amazon S3 file (a small .xml file) has been modified? I'm currently using curl to check every 10 seconds, but it's making many GET requests.
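#!/bin/sh
# Poll the file every 10 seconds and report when its content changes.
while true; do
  # -s: no progress output; -o: write the response body to file.xml
  curl -s -o "file.xml" "s3.aws.amazon.com/bucket/file.xml"
  if cmp -s "file.xml" "current.xml"
  then
    echo "no change"
  else
    echo "file changed"
    cp "file.xml" "current.xml"
  fi
  sleep 10
done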
Is there a better way to check every 10 seconds that reduces the number of GET requests? (This is built on top of a rails app so i could possibly build a handler in rails?)
Let me start by first telling you some facts about S3. You might know this, but in case you don't, you might see that your current code could have some "unexpected" behavior.
S3 and "Eventual Consistency"
S3 provides "eventual consistency" for overwritten objects. From the S3 FAQ, you have:
Q: What data consistency model does Amazon S3 employ?
Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
Eventual consistency for overwrites means that, whenever an object is updated (i.e., whenever your small XML file is overwritten), clients retrieving the file MAY see the new version, or they MAY see the old version. For how long? For an unspecified amount of time. It typically achieves consistency in much less than 10 seconds, but you have to assume that it will, eventually, take more than 10 seconds to achieve consistency. More interestingly (sadly?), even after a successful retrieval of the new version, clients MAY still receive the older version later.
One thing that you can be assured of is: if a client starts downloading a version of the file, it will download that entire version (in other words, there's no chance that you would receive, for example, the first half of the XML file as the old version and the second half as the new version).
With that in mind, notice that your script could fail to identify the change within your 10-second timeframe: you could make multiple requests, even after a change, until your script downloads a changed version. And even then, after you detect the change, it is (unfortunately) entirely possible that the next request would download the previous (!) version, and trigger yet another "change" in your code, then the next would give the current version, and trigger yet another "change" in your code!
If you are OK with the fact that S3 provides eventual consistency, there's a way you could possibly improve your system.
Idea 1: S3 event notifications + SNS
You mentioned that you thought about using SNS. That could definitely be an interesting approach: you could enable S3 event notifications and then get a notification through SNS whenever the file is updated.
How do you get the notification? You would need to create a subscription, and here you have a few options.
Idea 1.1: S3 event notifications + SNS + a "web app"
If you have a "web application", i.e., anything running on a publicly accessible HTTP endpoint, you could create an HTTP subscriber, so SNS will call your server with the notification whenever it happens. This might or might not be possible or desirable in your scenario.
Idea 2: S3 event notifications + SQS
You could create a message queue in SQS and have S3 deliver the notifications directly to the queue. This would also be possible as S3 event notifications + SNS + SQS, since you can add a queue as a subscriber to an SNS topic (the advantage being that, in case you need to add functionality later, you could add more queues and subscribe them to the same topic, therefore getting "multiple copies" of the notification).
To retrieve the notification you'd make a call to SQS. You'd still have to poll, i.e., have a loop and call GET on SQS (which costs about the same as S3 GETs, or maybe a tiny bit more depending on the region). The slight difference is that you could reduce the total number of requests a bit: SQS supports long-polling requests of up to 20 seconds. You make the GET call on SQS and, if there are no messages, SQS holds the request for up to 20 seconds, returning immediately if a message arrives, or returning an empty response if no messages are available within those 20 seconds. So you would send only 1 GET every 20 seconds and still get faster notifications than you currently have. You could potentially halve the number of GETs you make (once every 10s to S3 vs once every 20s to SQS).
Also, you could choose to use one single SQS queue to aggregate all changes to all XML files, or multiple SQS queues, one per XML file. With a single queue, you would greatly reduce the overall number of GET requests. With one queue per XML file, you could potentially "halve" the number of GET requests compared to what you have now.
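The long-polling loop body described above might look like this in Python. It is a sketch: sqs is assumed to be a boto3 SQS client (for example boto3.client("sqs")) passed in by the caller, and queue_url is a placeholder for your own queue.

```python
def wait_for_change(sqs, queue_url):
    """Long-poll the queue once; return the bodies of any messages received."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=20,       # long polling: SQS holds the request up to 20 s
        MaxNumberOfMessages=10,
    )
    bodies = []
    # The "Messages" key is absent when nothing arrived within the wait time.
    for message in response.get("Messages", []):
        bodies.append(message["Body"])
        # Delete after reading so the same notification is not delivered again.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return bodies
```

Calling this in a loop gives at most one request per 20 seconds while the queue is empty, and returns as soon as a notification arrives.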
Idea 3: S3 event notifications + AWS Lambda
You can also use a Lambda function for this. This could require some more changes in your environment: you wouldn't use a shell script to poll, but S3 can be configured to call a Lambda function for you in response to an event, such as an update to your XML file. You could write your code in Java, JavaScript or Python (some people devised some "hacks" to use other languages as well, including Bash).
The beauty of this is that there's no more polling, and you don't have to maintain a web server (as in "idea 1.1"). Your code "simply runs", whenever there's a change.
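A minimal sketch of such a function in Python. The handler name and what it does with the change (here it only prints) are assumptions; the record layout follows the S3 event notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def changed_objects(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    return [
        (record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


def handler(event, context):
    # Entry point configured for the Lambda function.
    changes = changed_objects(event)
    for bucket, key in changes:
        print(f"object changed: s3://{bucket}/{key}")
    return {"changed": len(changes)}
```

In place of the print you would do whatever your polling script did on "file changed", for example call an endpoint of the Rails app.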
Notice that, no matter which one of these ideas you use, you still have to deal with eventual consistency. In other words, you'd know that a PUT/POST has happened, but once your code sends a GET, you could still receive the older version...
Idea 4: Use DynamoDB instead
If you have the ability to make a more structural change on the system, you could consider using DynamoDB for this task.
The reason I suggest this is because DynamoDB supports strong consistency, even for updates. Notice that it's not the default - by default, DynamoDB operates in eventual consistency mode, but the "retrieval" operations (GetItem, for example), support fully consistent reads.
Also, DynamoDB has what we call "DynamoDB Streams", which is a mechanism that allows you to get a stream of changes made to any (or all) items on your table. These notifications can be polled, or they can even be used in conjunction with a Lambda function, that would be called automatically whenever a change happens! This, plus the fact that DynamoDB can be used with strong consistency, could possibly help you solve your problem.
In DynamoDB, it's usually a good practice to keep the records small. You mentioned in your comments that your XML files are about 2kB - I'd say that could be considered "small enough" so that it would be a good fit for DynamoDB! (the reasoning: DynamoDB reads are typically calculated as multiples of 4kB; so to fully read 1 of your XML files, you'd consume just 1 read; also, depending on how you do it, for example using a Query operation instead of a GetItem operation, you could possibly be able to read 2 XML files from DynamoDB consuming just 1 read operation).
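The read arithmetic in the parenthesis can be checked with a few lines of Python. This is a sketch for strongly consistent reads only, with hypothetical function names: GetItem rounds each item up to the next 4 kB on its own, while Query adds up the sizes of the items it returns and rounds the total.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read unit covers up to 4 kB


def get_item_read_units(item_sizes):
    """Strongly consistent GetItem: each item is rounded up on its own."""
    return sum(math.ceil(size / READ_UNIT_BYTES) for size in item_sizes)


def query_read_units(item_sizes):
    """Strongly consistent Query: sizes are summed first, then rounded up."""
    return math.ceil(sum(item_sizes) / READ_UNIT_BYTES)
```

For two 2 kB XML documents, get_item_read_units([2048, 2048]) gives 2 and query_read_units([2048, 2048]) gives 1, which is the "2 files for 1 read" case mentioned above.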
Some references:
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
http://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
I can think of another way by using S3 Versioning; this would require the least amount of changes to your code.
Versioning is a means of keeping multiple variants of an object in the same bucket.
This would mean that every time a new file.xml is uploaded, S3 will create a new version.
In your script, instead of getting the object and comparing it, get the HEAD of the object which contains the VersionId field. Match this version with the previous version to find out if the file has changed.
If the file has indeed changed, get the new file and save its version ID locally, so that next time you can use it to check whether an even newer version has been uploaded.
Note 1: You will still be making lots of calls to S3, but instead of fetching the entire file every time, you are only fetching the metadata of the file which is much faster and smaller in size.
Note 2: However, if your aim was to reduce the number of calls, the easiest solution I can think of is using lambdas. You can trigger a lambda function every time a file is uploaded that then calls the REST endpoint of your service to notify you of the file change.
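The HEAD-based check could be sketched like this in Python. It assumes the object is readable over plain HTTPS without signing and that versioning is enabled on the bucket, so the response carries an x-amz-version-id header; the function names are hypothetical.

```python
import urllib.request


def fetch_version_id(url):
    """Send a HEAD request and return the object's version id, if present."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("x-amz-version-id")


def check_once(url, last_seen, fetch=fetch_version_id):
    """Return (changed, version_id), comparing against the last version seen."""
    version_id = fetch(url)
    changed = version_id is not None and version_id != last_seen
    return changed, version_id
```

Only when changed is true do you issue the full GET; store the returned version id and pass it as last_seen on the next round.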
You can use the --exact-timestamps option of aws s3 sync; see the AWS CLI reference:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Instead of using versioning, you can simply compare the ETag of the file, which is available in the response headers and is similar to the MD5 hash of the file. (It is exactly the MD5 hash if the file is small, i.e. less than 4 MB, or sometimes even larger. Otherwise, it is the MD5 hash of a list of binary hashes of blocks.)
With that said, I would suggest you look at your application again and ask if there is a way you can avoid this critical path.
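For a small file like this one, the ETag comparison can be done against the copy you already have. A minimal sketch, assuming the object was uploaded in a single request so that its ETag is the plain MD5 of the content; matches_etag is a hypothetical helper.

```python
import hashlib


def matches_etag(local_bytes, etag_header):
    """True if the local content hashes to the ETag returned by S3."""
    # The ETag header value is wrapped in double quotes.
    return hashlib.md5(local_bytes).hexdigest() == etag_header.strip('"')
```

An ETag that ends in a dash and a number comes from a multipart upload and will not match a plain MD5, so treat that case as "unknown" rather than "changed".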
