I came across this question in my AWS study:
A user is designing a new service that receives location updates from 3,600 rental cars every second. The cars' locations need to be uploaded to an Amazon S3 bucket. Each location must also be checked for distance from the original rental location. Which services will process the updates and automatically scale?
Options:
A. Amazon EC2 and Amazon EBS
B. Amazon Kinesis Firehose and Amazon S3
C. Amazon ECS and Amazon RDS
D. Amazon S3 events and AWS Lambda
My question is: how can Option D be used as the solution? Or should I use Firehose to ingest (capture and transform) the data into S3?
Thanks.
I would choose Option B:
a) It provides a service to ingest data directly into S3.
b) Firehose supports a Lambda data transformation, which can compute the distance from the original rental location before the record lands in S3 (sketched below).
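As a rough illustration, a Firehose transformation Lambda in Java might look like the following, assuming each record carries a JSON location update. The parsing, field names, and class names here are placeholders rather than a definitive implementation, and it needs the aws-lambda-java-core and aws-lambda-java-events dependencies.

// Minimal sketch of a Firehose transformation Lambda, assuming JSON location records.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisFirehoseEvent;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class LocationTransformHandler
        implements RequestHandler<KinesisFirehoseEvent, LocationTransformHandler.Response> {

    // Shape Firehose expects back: one entry per input record.
    public static class TransformedRecord {
        public String recordId;
        public String result;   // "Ok", "Dropped" or "ProcessingFailed"
        public String data;     // base64-encoded transformed payload
    }

    public static class Response {
        public List<TransformedRecord> records = new ArrayList<>();
    }

    @Override
    public Response handleRequest(KinesisFirehoseEvent event, Context context) {
        Response response = new Response();
        for (KinesisFirehoseEvent.Record record : event.getRecords()) {
            String payload = StandardCharsets.UTF_8.decode(record.getData()).toString();

            // Placeholder: parse the car's current and original coordinates from the
            // payload, then enrich the record with the computed distance.
            double distanceKm = haversineKm(0.0, 0.0, 0.0, 0.0);
            String enriched = payload.trim() + " distanceKm=" + distanceKm + "\n";

            TransformedRecord out = new TransformedRecord();
            out.recordId = record.getRecordId();
            out.result = "Ok";
            out.data = Base64.getEncoder()
                    .encodeToString(enriched.getBytes(StandardCharsets.UTF_8));
            response.records.add(out);
        }
        return response;
    }

    // Great-circle distance in kilometres between two latitude/longitude points.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a));
    }
}

The handler must echo every recordId back with a result of "Ok", "Dropped", or "ProcessingFailed" and return the transformed payload base64-encoded; Firehose then delivers the enriched records to the S3 bucket.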
Problem
Is there a way to achieve transactionality between S3 and another database like ElasticSearch?
What I'm trying to do is to upload an object to S3 and save its identifier to ElasticSearch in an atomic way.
For the backend, where the logic lives, we are using Java with Spring Boot.
From AWS docs
I saw that this is a common pattern recommended by AWS, but they mention that you need to handle the failures on your own:
"You can also store the item as an object in Amazon Simple Storage Service (Amazon S3) and store the Amazon S3 object identifier in your DynamoDB item."
"DynamoDB doesn't support transactions that cross Amazon S3 and DynamoDB. Therefore, your application must deal with any failures, which could include cleaning up orphaned Amazon S3 objects."
Ref: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-use-s3-too.html
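There is no cross-service transaction, so the usual approach is the compensating-action pattern those docs hint at: write to S3 first, then index the identifier, and delete the S3 object if the indexing step fails (and keep a reconciliation job for the cases where even the cleanup fails). Below is a minimal sketch with the AWS SDK for Java v2; saveIdentifierToElasticsearch is a hypothetical placeholder for your Elasticsearch client call.

// Sketch of the compensating-cleanup pattern with the AWS SDK for Java v2.
// saveIdentifierToElasticsearch(...) is hypothetical -- replace it with your ES client call.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class S3ThenIndexWriter {

    private final S3Client s3 = S3Client.create();

    public void store(String bucket, String key, byte[] content) {
        // 1. Upload the object to S3 first.
        s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                RequestBody.fromBytes(content));
        try {
            // 2. Then persist the identifier in Elasticsearch.
            saveIdentifierToElasticsearch(key);
        } catch (RuntimeException e) {
            // 3. Compensating action: best-effort cleanup of the orphaned S3 object.
            //    If this delete also fails, a periodic reconciliation job is still needed.
            s3.deleteObject(DeleteObjectRequest.builder().bucket(bucket).key(key).build());
            throw e;
        }
    }

    private void saveIdentifierToElasticsearch(String key) {
        // Hypothetical placeholder for the Elasticsearch index call.
        throw new UnsupportedOperationException("wire up your Elasticsearch client here");
    }
}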
I have a high volume of data in my Oracle database. I want to migrate it to an AWS S3 bucket. I cannot find good documentation for this. Please share if someone has already done it.
Thanks
You can use AWS Data Pipeline
[Copied from above link]
With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.
I also found some code on GitHub to back up Oracle data to S3: link
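If you prefer a do-it-yourself route instead of Data Pipeline, a rough sketch is to read from Oracle over JDBC and upload the result to S3 with the AWS SDK for Java v2. The JDBC URL, table, columns, and bucket below are placeholders, and a genuinely large table would need paging and multipart uploads rather than a single in-memory buffer.

// Minimal sketch: export rows from Oracle via JDBC and upload them to S3 as CSV.
// Connection details, table, column names, and bucket are placeholders.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OracleToS3Export {
    public static void main(String[] args) throws Exception {
        String jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"; // placeholder
        StringBuilder csv = new StringBuilder("id,name\n");

        try (Connection conn = DriverManager.getConnection(jdbcUrl, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM my_table")) {
            while (rs.next()) {
                csv.append(rs.getLong("id")).append(',')
                   .append(rs.getString("name")).append('\n');
            }
        }

        // Upload the CSV to S3. Large tables should be streamed in chunks
        // (multipart upload) instead of buffered in memory like this.
        try (S3Client s3 = S3Client.create()) {
            s3.putObject(PutObjectRequest.builder()
                            .bucket("my-export-bucket")      // placeholder bucket
                            .key("exports/my_table.csv")
                            .build(),
                    RequestBody.fromString(csv.toString()));
        }
    }
}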
I have a Java microservice that runs in a Docker container on an EC2 instance.
It has to get notified when a file is dropped into an S3 bucket. We have an SNS topic and an SQS queue connected to the S3 bucket. How can I connect the microservice to the SNS/SQS? If there is a better way for the Java microservice to get notified when a file is dropped into the S3 bucket, please let me know.
The AWS SDK for Java is pretty good.
You can either:
write an HTTP endpoint that SNS can post to (see http://docs.aws.amazon.com/sns/latest/dg/SendMessageToHttp.example.java.html), or
poll an SQS queue that is subscribed to the SNS topic (see https://github.com/aws/aws-sdk-java/blob/master/src/samples/AmazonSimpleQueueService/SimpleQueueServiceSample.java); a minimal polling sketch follows below.
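For the SQS route, a long-polling loop with the AWS SDK for Java v2 could look like this sketch. The queue URL is a placeholder, and the message body will contain the SNS envelope unless raw message delivery is enabled on the subscription.

// Minimal SQS long-polling sketch with the AWS SDK for Java v2.
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class S3NotificationPoller {
    public static void main(String[] args) {
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/s3-events"; // placeholder
        try (SqsClient sqs = SqsClient.create()) {
            while (true) {
                ReceiveMessageRequest request = ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .maxNumberOfMessages(10)
                        .waitTimeSeconds(20)   // long polling
                        .build();
                for (Message message : sqs.receiveMessage(request).messages()) {
                    // Hand the notification to your service logic here.
                    System.out.println("Received: " + message.body());

                    // Delete the message once it has been processed successfully.
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .receiptHandle(message.receiptHandle())
                            .build());
                }
            }
        }
    }
}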
Yes, this is one use case of AWS Lambda:
As an event-driven compute service where AWS Lambda runs your code in response to events, such as changes to data in an Amazon S3 bucket or an Amazon DynamoDB table.
http://docs.aws.amazon.com/lambda/latest/dg/welcome.html
Since it runs your code, you are free to write something that sends a request to your microservice.
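As an illustration only, a Lambda handler that receives the S3 event and forwards the bucket and key to a microservice over HTTP might look like this (Java 11+, aws-lambda-java-core and aws-lambda-java-events on the classpath; the service URL and payload shape are placeholders).

// Sketch: an S3-triggered Lambda that forwards the bucket/key to a microservice.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class S3ToMicroserviceHandler implements RequestHandler<S3Event, String> {

    private final HttpClient http = HttpClient.newHttpClient();

    @Override
    public String handleRequest(S3Event event, Context context) {
        for (var record : event.getRecords()) {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey();

            // Forward the event to the microservice (placeholder URL and payload shape).
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://my-service.internal/files"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(
                            "{\"bucket\":\"" + bucket + "\",\"key\":\"" + key + "\"}"))
                    .build();
            try {
                http.send(request, HttpResponse.BodyHandlers.ofString());
            } catch (Exception e) {
                throw new RuntimeException("Failed to notify microservice", e);
            }
        }
        return "OK";
    }
}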
I do some scientific calculations and I have intermediate results on each iteration, so I think I can use spot instances to reduce the cost of processing.
How can I save intermediate results on each iteration?
How can I automatically restart the instance from the last checkpoint when it is terminated?
When the spot price of an Amazon EC2 instance rises above your bid price, your Amazon EC2 instance is terminated. A 2-minute notice is provided via the metadata interface. You can use this notice as a trigger for saving your work, or you could simply save work at regular intervals regardless of the notice period.
Do not save your work "locally", since the Amazon EBS volumes will either be deleted (e.g. the boot volume) or disconnected (e.g. data volumes). I would recommend that you save your work in a persistent datastore, such as a database or Amazon S3.
One option would be to save files to your local disk, but use the AWS Command-Line Interface (CLI) to copy the files to Amazon S3 using the aws s3 sync command.
Then, if you have configured a persistent spot instance, simply copy the files from Amazon S3 when the new Amazon EC2 spot instance is started.
See:
Spot Instance Interruptions
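To make the "save on the 2-minute notice" idea concrete, a background process can poll the instance metadata for the spot interruption notice and push a checkpoint to S3 when it appears. Below is a minimal sketch with the AWS SDK for Java v2; the bucket, key, and checkpoint path are placeholders, and IMDSv2 would additionally require fetching a session token first.

// Sketch: poll the EC2 instance metadata for a spot interruption notice and, when one
// appears, upload the checkpoint file to S3.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SpotCheckpointWatcher {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest probe = HttpRequest.newBuilder()
                .uri(URI.create("http://169.254.169.254/latest/meta-data/spot/instance-action"))
                .GET()
                .build();

        try (S3Client s3 = S3Client.create()) {
            while (true) {
                // 404 means no interruption is scheduled; 200 means the 2-minute notice is active.
                HttpResponse<String> response = http.send(probe, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200) {
                    s3.putObject(PutObjectRequest.builder()
                                    .bucket("my-checkpoint-bucket")          // placeholder
                                    .key("checkpoints/latest.dat")
                                    .build(),
                            RequestBody.fromFile(Path.of("/tmp/checkpoint.dat"))); // placeholder
                    break;
                }
                Thread.sleep(5_000); // poll every 5 seconds
            }
        }
    }
}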
I have a large set of data to be analyzed and I am planning to use Amazon EC2 for the computation, so I am wondering where I can store the data for computing.
There is a lot of lingo in the Amazon world.
You can either store the data on an EBS volume attached to your EC2 instance, or, if it is in MySQL format or a simple format, you could consider storing it in Amazon's managed MySQL service, RDS.
EC2 instances can be backed either by S3 (instance store) or by EBS volumes. If you want rapid access to your data, choose an EC2 instance backed by Amazon Elastic Block Store (EBS). EBS gives you the flexibility to use any database or data structure you want.