I am looking to run a batch script on files that are uploaded from my website (one at a time) and return the resulting file produced by that batch script. The website is hosted in a shared Linux environment, so I cannot run the batch script on the server.
It sounds like something I could accomplish with Amazon S3 and AWS Lambda, but I was wondering whether there are any other services out there that would allow me to accomplish the same task.
I would recommend that you look into S3 Events and Lambda.
Using S3 events, you can trigger a Lambda function on puts and deletes in an S3 bucket, and depending on your "batch file" task you may be able to achieve your goal purely in Lambda.
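For the purely-in-Lambda route, here is a minimal sketch of what a Java handler might look like, assuming the aws-lambda-java-events library and the AWS SDK for Java; the output prefix and the transform step are just placeholders for whatever your batch logic does:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class ProcessUploadHandler implements RequestHandler<S3Event, String> {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    @Override
    public String handleRequest(S3Event event, Context context) {
        // Each record describes one object that was put into the bucket.
        event.getRecords().forEach(record -> {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey();

            // Read the uploaded file, transform it, and write the result
            // under an output prefix; replace transform() with your own logic.
            String input = s3.getObjectAsString(bucket, key);
            String output = transform(input);
            s3.putObject(bucket, "results/" + key, output);
        });
        return "done";
    }

    private String transform(String input) {
        return input.toUpperCase(); // placeholder for the real batch logic
    }
}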
If you cannot use Lambda to replace the functionality of your batch file you can try the following:
If you need the batch process to run on a specific instance, take a look at Amazon SQS. You can have the S3-event-triggered Lambda create a work item in SQS, and your instance can regularly poll SQS for work to process.
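A sketch of the hand-off side of that, assuming the AWS SDK for Java; the queue URL and message format below are made up, the instance only needs enough in the message to fetch the object from S3:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class WorkItemPublisher {
    // Hypothetical queue; substitute your own queue URL.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/batch-work";

    private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

    /** Called from the S3-triggered Lambda with the uploaded object's location. */
    public void publish(String bucket, String key) {
        // The polling instance only needs the bucket and key to do the work.
        sqs.sendMessage(QUEUE_URL, bucket + "/" + key);
    }
}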
If you need something a bit more real-time, you could use Amazon SNS for a push rather than a pull approach to the above.
If you don't need the file to be processed by a specific instance, but you do have to run a batch file against it, perhaps you can have your S3-event-triggered Lambda launch an instance with a UserData script that prepares the server as needed, downloads the S3 file, runs the batch process against it, and finally self-terminates by looking up its own instance ID via the EC2 metadata service and calling the TerminateInstances API method.
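A rough sketch of the self-termination step in Java, assuming the instance's role allows ec2:TerminateInstances (note that newer instances which enforce IMDSv2 additionally require a session token for the metadata call):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.TerminateInstancesRequest;

public class SelfTerminate {
    public static void main(String[] args) throws Exception {
        // Look up this instance's ID from the EC2 instance metadata service.
        URL metadata = new URL("http://169.254.169.254/latest/meta-data/instance-id");
        String instanceId;
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(metadata.openStream()))) {
            instanceId = reader.readLine();
        }

        // Ask EC2 to terminate this instance once the batch work is done.
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
        ec2.terminateInstances(new TerminateInstancesRequest().withInstanceIds(instanceId));
    }
}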
Here is some related reading to assist with the above approaches:
Amazon SQS
https://aws.amazon.com/documentation/sqs/
Amazon SNS
https://aws.amazon.com/documentation/sns/
AWS Lambda
https://aws.amazon.com/documentation/lambda/
Amazon S3 Event Notifications
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
EC2 UserData
http://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-instance-metadata.html#instancedata-add-user-data
EC2 Metadata Service
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html#instancedata-data-retrieval
AWS Tools for Powershell Cmdlet Reference
http://docs.aws.amazon.com/powershell/latest/reference/Index.html
I have a Java microservice that runs in a Docker container on an EC2 instance.
It has to get notified when a file is dropped into an S3 bucket. We have an SNS topic and an SQS queue connected to the S3 bucket. How can I connect the microservice to the SNS/SQS? If there is a better way for the Java microservice to be notified when a file is dropped into the S3 bucket, please let me know.
The AWS SDK for Java is pretty good.
You can either:
write an HTTP endpoint that SNS can post to (see http://docs.aws.amazon.com/sns/latest/dg/SendMessageToHttp.example.java.html)
or
poll an SQS queue that is subscribed to the SNS topic (see https://github.com/aws/aws-sdk-java/blob/master/src/samples/AmazonSimpleQueueService/SimpleQueueServiceSample.java).
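For the SQS route, a minimal polling loop in the microservice could look something like the following; the queue URL is a placeholder, and if SNS sits in front of the queue the S3 notification will be nested inside the SNS envelope in the message body:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class S3NotificationPoller {
    // Placeholder queue URL; use the queue wired to your S3 bucket/SNS topic.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/s3-uploads";

    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        while (true) {
            // Long polling (20s) avoids hammering SQS with empty receives.
            ReceiveMessageRequest request = new ReceiveMessageRequest(QUEUE_URL)
                    .withWaitTimeSeconds(20)
                    .withMaxNumberOfMessages(10);
            for (Message message : sqs.receiveMessage(request).getMessages()) {
                // The body contains the S3 event (possibly wrapped by SNS).
                System.out.println("Received: " + message.getBody());
                sqs.deleteMessage(QUEUE_URL, message.getReceiptHandle());
            }
        }
    }
}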
Yes, this is one use case of AWS Lambda:
As an event-driven compute service where AWS Lambda runs your code in response to events, such as changes to data in an Amazon S3 bucket or an Amazon DynamoDB table.
http://docs.aws.amazon.com/lambda/latest/dg/welcome.html
Since it runs your code, you are free to write something that makes a request to your microservice.
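As an illustrative sketch only (the service URL and JSON payload are assumptions), a Java handler could simply forward the S3 object's location to your microservice over HTTP:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

public class NotifyMicroservice implements RequestHandler<S3Event, Void> {
    // Hypothetical endpoint exposed by your microservice.
    private static final String SERVICE_URL = "http://my-service.example.com/files";

    @Override
    public Void handleRequest(S3Event event, Context context) {
        event.getRecords().forEach(record -> {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey();
            try {
                // POST the object's location to the service as a small JSON body.
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(SERVICE_URL).openConnection();
                conn.setDoOutput(true);
                conn.setRequestMethod("POST");
                conn.setRequestProperty("Content-Type", "application/json");
                String body = "{\"bucket\":\"" + bucket + "\",\"key\":\"" + key + "\"}";
                try (OutputStream out = conn.getOutputStream()) {
                    out.write(body.getBytes(StandardCharsets.UTF_8));
                }
                context.getLogger().log("Microservice responded " + conn.getResponseCode());
            } catch (Exception e) {
                throw new RuntimeException(e); // surface the failure so Lambda can retry
            }
        });
        return null;
    }
}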
As part of my CD pipeline in snap-ci.com, I want to start the instances in my AWS opsworks stack before deploying the application.
As starting the hosts takes a certain amount of time (after the command has already returned), I need to poll until the instances are running (using the describe-instances command in the AWS CLI). That command returns a full JSON response, and one of its fields contains the status of the instance (e.g. "running").
I am new to shell scripting and the AWS CLI and would appreciate some pointers. I am aware that I could also use the AWS SDKs to program this in Java, but that would require deploying that program to the snap-ci hosts first, which sounds complex as well.
The AWS CLI has support for wait commands; these block until the condition you specify is met, such as an instance being ready.
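For example, assuming you can get hold of the underlying EC2 instance IDs of your OpsWorks instances (the IDs below are placeholders), the first command blocks until they report as running, and the second shows the --query style for pulling just the state out of describe-instances:

aws ec2 wait instance-running --instance-ids i-0123456789abcdef0 i-0abcdef1234567890

aws ec2 describe-instances --instance-ids i-0123456789abcdef0 --query 'Reservations[].Instances[].State.Name' --output text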
The Advanced Usage of the AWS CLI talk from re:Invent 2014 shows how to use waiters (18:55), queries, profiles, and other tips for using the CLI.
I am looking to have multiple Amazon EC2 instances use the same data store. Amazon does not condone mounting an S3 Bucket as a file system, so I am trying to avoid this solution. Is there a way to synchronize an EBS volume with S3 or would it be best to use rsync and cron?
Do you really have to have the files locally available on EBS? What if, instead, you served them to yourself via CloudFront and restricted the permissions so that only your instances (or only your security group) could see the files?
Come Fall 2015 you'll be able to use Amazon Elastic File System (EFS) for this. But until then, I would suppose the next best thing is to use the AWS CLI to sync down from S3 to your volume:
aws s3 sync s3://my-bucket/my/folder /mnt/my-ebs/
After the initial run, that sync command is surprisingly fast, so from there you could just schedule it with cron to run hourly or so.
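For example, a crontab entry along these lines (paths as above) would re-sync every hour, assuming aws is on cron's PATH and the instance has credentials available:

0 * * * * aws s3 sync s3://my-bucket/my/folder /mnt/my-ebs/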
I have a Java process which is an executable jar file. This Java process reads in a file, performs some processing, and then writes the output to a new file. I'd like to expose this process via a cloud service, but I'm not sure what to use.
Is there a Heroku or Amazon setup I could use for this ?
Using Amazon, could I upload the file to be processed to Amazon Simple Storage Service (S3), trigger the processing job, and then expose the results of this job via a web service?
I'm just enquiring about high-level options, as I'm not sure where to begin.
Where do these input files come from? What services consume them afterwards?
One possible workflow on AWS.
1. Modify your jar to be able to read from S3 and write to S3.
2. Create a web service in this jar with a simple API that takes an S3 location and gives back an S3 location where the output will go (a rough sketch of this appears after the workflow below).
3. Run this jar on an EC2 host, possibly using Elastic Beanstalk.
The workflow would be something like:
1. Upload an input file to S3.
2. Call the web service, providing the S3 URL of the input file.
3. The web service does the job, writes the output file to another S3 bucket, and gives back the URL.
This can be modified in all sorts of ways, but it's difficult to give more guidance without knowing more of your requirements.
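As a rough sketch of step 2 of the first list, assuming the AWS SDK for Java (the bucket layout, key names and the process() step are placeholders for whatever your jar already does with local files):

import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

public class S3JobRunner {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    /**
     * Downloads the input object, runs the existing file-based processing,
     * uploads the result, and returns its S3 location.
     */
    public String run(String bucket, String inputKey) throws Exception {
        File input = File.createTempFile("input", ".dat");
        File output = File.createTempFile("output", ".dat");

        s3.getObject(new GetObjectRequest(bucket, inputKey), input);
        process(input, output);                       // your existing jar logic
        String outputKey = "results/" + inputKey;
        s3.putObject(bucket, outputKey, output);

        return "s3://" + bucket + "/" + outputKey;
    }

    private void process(File in, File out) {
        // placeholder for the file-in / file-out processing the jar already does
    }
}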
I have cron jobs which create Hive instances using elastic-mapreduce to run my queries, which process raw data from an Amazon S3 folder and push the data to the database.
After the cron job runs, it creates an Amazon EC2 instance (a VM). If there are any errors in the job, the logs are stored on that VM and will vanish once the EC2 instance is shut down. Is there any way to send those logs by email, or to another server, so I can investigate them even after the VM is shut down?
You can have a look at logrotate: it can compress your log files periodically, you can choose which log files to rotate, and it can also mail the compressed file to a desired email address. http://geekospace.com/using-logrotate-to-rotate-log-files/ (Disclaimer: I wrote this article.)
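As an illustrative logrotate stanza only (the log path and address are placeholders, and the exact location of the Hive logs on your node may differ), you could compress the job logs daily and have the just-rotated copy mailed out:

/mnt/var/log/hive/*.log {
    daily
    rotate 7
    compress
    missingok
    mailfirst
    mail ops@example.com
}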