How to trigger an event to monitor dataproc job status in cloud function? - aws-lambda

I am looking for a way to track my Dataproc Spark job status from a Google Cloud Function in real time. I am not aware of any Google Cloud service that fits this situation, similar to AWS EventBridge (CloudWatch Events), where a Lambda is triggered when a Glue job's state changes.
Please help me to solve this problem.
Any comments will be appreciated.
Thanks in advance :)

To track the real-time status of a Dataproc Spark job from a Google Cloud Function, there are several options: use Cloud Pub/Sub to trigger a Cloud Function after a job completes, use Cloud Storage events to trigger the Cloud Function when a job status is written to a bucket, use Cloud Logging to trigger a Cloud Function based on certain log entries, or poll the Dataproc Jobs API for the job status from the Cloud Function. The appropriate option will depend on the specific use case.
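As a rough sketch of the last option (assuming the google-cloud-dataproc Python client library is available to the function; the project, region, and job ID below are placeholders), a Cloud Function could poll the Dataproc Jobs API like this:

from google.cloud import dataproc_v1

def check_dataproc_job(request):
    # HTTP-triggered Cloud Function entry point (the name is arbitrary).
    # The regional endpoint must match the region the job runs in.
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )
    job = client.get_job(
        project_id="my-project",            # placeholder project
        region="us-central1",               # placeholder region
        job_id=request.args.get("job_id"),  # job ID passed in the request
    )
    # job.status.state is an enum such as PENDING, RUNNING, DONE, ERROR or CANCELLED.
    return job.status.state.name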

Related

How to do performance testing of a lambda which is triggered by event bridge?

I have a lambda that is triggered automatically whenever an event is dropped in the event bus to which my lambda is connected. How can I performance test it to see how it performs if 500 events are dropped at a time?
Also, I know AWS has some built-in metrics like Lambda execution time, X-Ray tracing, etc. Can anyone let me know how to use them for my use case?
If by "eventbus" you mean AWS EventBus which is a part of Amazon EventBrigde my expectation is that the easiest would be using PutEvents API endpoint, you can come up with a JSON payload having 500 events or make 500 separate calls with 1 event or any combination you can think of.
Be aware that the AWS API requests need to be signed to the load testing tool you choose must have the possibility to calculate this signature. A guide for Apache JMeter: How to Handle Dynamic AWS SigV4 in JMeter
With regards to metrics - check out AWS CloudWatch
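Going back to the PutEvents approach: if you prefer scripting the load rather than using JMeter, a minimal Python sketch with boto3 could look like this (the bus name and payload are placeholders; PutEvents accepts at most 10 entries per call, so the 500 events are sent in batches):

import json
import boto3

events = boto3.client("events")

entries = [
    {
        "Source": "load.test",              # placeholder source
        "DetailType": "LoadTestEvent",
        "Detail": json.dumps({"index": i}),
        "EventBusName": "my-event-bus",     # placeholder bus name
    }
    for i in range(500)
]

# PutEvents accepts at most 10 entries per request, so send in batches of 10.
for start in range(0, len(entries), 10):
    response = events.put_events(Entries=entries[start:start + 10])
    if response.get("FailedEntryCount"):
        print("Some entries failed:", response["Entries"])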

Using Spring or Lambda for bulk event trigger

Looking for some help on an application design. I am using spring framework and hosting application in AWS.
I am working on an enterprise Java web application that is supposed to handle events when their trigger time is reached. For example, consumers can set an event to begin on 12/20/22 at 07:35 AM, and the system is supposed to send a notification when that time is reached.
I can store these events in a database along with their trigger time and set up a Spring scheduler (@Scheduled) to run every minute and process events whose trigger time has been reached. My only concern with this approach is that there could be hundreds or thousands of events to trigger in any given minute, and they might not all be processed within one minute.
Is there any alternate way to design this? I don't know if Spring offers a feature where I could create these events and the framework triggers them when their trigger time is reached. That way, I could stay away from managing the scheduling and triggering part.
I am using AWS to host this application, so another option I'm considering is creating an AWS Lambda for every such event and letting AWS manage the triggering part. That way, I can stay away from managing the triggers.
Let me know your views, or if you came across similar problems, how you resolved them.
You can consider using spring-cloud-dataflow to manage this as tasks and streams.
You create a custom batch application that uses @Scheduled to check your database for events that are due and then sends them to a stream. You can use Spring Integration APIs to interact with RabbitMQ or Kafka topics.
The event should contain enough information to be processed downstream.
You then have a stream application that produces the content and sends it via email, or passes it on to a separate stream app that sends the email.
https://dataflow.spring.io/docs/stream-developer-guides/programming-models/
The flow will look something like:
:mail_events | message-processor | message-sender
You will configure the mail_events property to match the topic created and configured for your mail-event-batch application.
You can use Spring Cloud Data Flow to manage the mail-event-batch application as well.
You can scale each application https://dataflow.spring.io/docs/recipes/scaling/

Is there a good pattern to send a message between AWS Lambdas

My use case is the following: I have 5 Lambdas that need to talk to each other. I've heard it can be done with SNS, but also with SNS plus SQS. What is the difference, and why not just call the Lambdas directly from one another?
It's possible to design durable and scalable applications using the AWS SNS-SQS pattern. You can do this by having an SNS topic to which Lambda A publishes; the topic then delivers directly to an SQS queue. That way, if you have a high volume of messages, they will be processed sequentially from the queue.
Be aware that SNS and SQS can deliver a message more than once.
For more info check the article here:
https://aws.amazon.com/blogs/compute/designing-durable-serverless-apps-with-dlqs-for-amazon-sns-amazon-sqs-aws-lambda/
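For illustration, a minimal Python sketch of the consuming Lambda, assuming the SQS queue is configured as the function's event source and the SNS subscription does not use raw message delivery (so each SQS record body is an SNS envelope):

import json

def handler(event, context):
    for record in event["Records"]:            # one entry per SQS message in the batch
        envelope = json.loads(record["body"])  # SNS envelope delivered via SQS
        message = envelope.get("Message", record["body"])
        # Remember that SNS/SQS deliver at-least-once, so processing must be idempotent.
        process(message)

def process(message):
    # Placeholder for the actual business logic.
    print("processing:", message)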
You can also use AWS Step Functions, a serverless function orchestrator that makes it easy to sequence AWS Lambda functions and multiple AWS services.
You can check out getting started guide here - https://docs.aws.amazon.com/step-functions/latest/dg/getting-started.html

Best way to schedule one-time events in serverless environments

Example use case
Send the user a notification 2 hours after signup.
Options considered
setTimeout(() => { /* send notification */ }, 2*60*60*1000); is not an option in serverless environments since the function terminates after execution (so it has to be stateless).
CloudWatch Events can schedule Lambda invocations using cron expressions, but this was designed for repetitive invocations (there's a limit of 100 rules per region).
I have not seen scheduling options in AWS SNS/SQS or GCP Pub/Sub. Are there alternatives with scheduling?
I want to avoid (if possible) setting up a dedicated message broker (overkill) or stateful/non-serverless instance - is there a serverless way to do this?
I can queue the events in a database and invoke a lambda function every minute to poll the database for events to execute in that minute... is there a more elegant solution?
Use AWS Step Functions; they are like serverless functions but don't have the 15-minute limit that AWS Lambda does. You can design a workflow in Step Functions that integrates with API Gateway, Lambda, and SNS to send email and text notifications as follows:
Create a REST API via API Gateway that will invoke a Lambda function, passing in, for example, the destination address (email, phone #) of the SNS notification, when it should be sent, and the notification method (e.g. email, text, etc.).
The Lambda function, on invocation, will start the Step Function, passing in the data (the Lambda is needed because API Gateway currently can't invoke Step Functions directly).
The Step Function is basically a workflow: you can define states for waiting (like waiting for the specified time to send the notification, e.g. 30 seconds) and states for invoking other Lambda functions that can use SNS to send out email and/or text notifications.
A rudimentary example is provided by AWS with their Task Timer example.
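As a rough sketch (the state machine ARN and input fields are placeholders, not part of the AWS example), the Lambda behind the API Gateway endpoint could start the Step Functions execution like this:

import json
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:notify",  # placeholder
        input=json.dumps({
            "destination": body.get("destination"),     # email address or phone number
            "method": body.get("method"),               # e.g. "email" or "text"
            "waitSeconds": body.get("waitSeconds", 30)  # consumed by a Wait state in the workflow
        }),
    )
    return {"statusCode": 202, "body": json.dumps({"status": "scheduled"})}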
Things are coming on GCP for doing this, but not very soon. Therefore, today, the solution is to poll a database.
You can do that with Datastore/Firestore with the execution datetime indexed (to avoid reading all the documents each minute). But be careful of traffic spikes; you could create a hotspot.
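A minimal sketch of that polling approach (the collection and field names are placeholders; it assumes the google-cloud-firestore client and an index on the execution time field):

from datetime import datetime, timezone
from google.cloud import firestore

db = firestore.Client()

def run_due_events(event, context):
    # Invoked every minute, e.g. by Cloud Scheduler via Pub/Sub.
    now = datetime.now(timezone.utc)
    due = db.collection("scheduled_events").where("execute_at", "<=", now).stream()
    for doc in due:
        send_notification(doc.to_dict())  # placeholder for the real work
        doc.reference.delete()            # or mark as processed so it is not picked up again

def send_notification(data):
    print("notify:", data)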
You can use Cloud Scheduler on Google Cloud Platform. As stated in the official documentation:
Cloud Scheduler is a fully managed enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch, big data jobs, cloud infrastructure operations, and more. You can automate everything, including retries in case of failure to reduce manual toil and intervention. Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.
Here you can check a quickstart for using it with Pub/Sub and Cloud Functions.
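For reference, the Cloud Function side of that quickstart can be as small as the following Python sketch: Cloud Scheduler publishes to a Pub/Sub topic on a cron schedule, and the function is triggered by that topic (names and payload are placeholders):

import base64

def scheduled_handler(event, context):
    # Pub/Sub-triggered Cloud Function entry point (1st gen signature).
    payload = ""
    if event.get("data"):
        payload = base64.b64decode(event["data"]).decode("utf-8")
    # Do the scheduled work here, e.g. send the delayed notification.
    print("Triggered by Cloud Scheduler with payload:", payload)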

What are the drawbacks of SQS poller which AWS Lambda removes?

I have an architecture which looks like as follows:-
Multiple SNS -> (AWS Lambda or SQS with Poller)??? -> DynamoDB
So, basically, either AWS Lambda or SQS with a poller is subscribed to multiple SNS topics, and that thing pushes the data to DynamoDB.
But this ? thing does a lot of transformation of the message in between. So, for such a case, I can either use AWS Lambda or SQS with a poller. With AWS Lambda, I can do the transformation in the Lambda function, and with SQS with a poller, I can do the transformation in the poller. With AWS Lambda, I see one problem: the code would become quite large as the transformation is quite complex (it has a lot of rules), so I am thinking of using SQS. But before finalising on SQS, I wanted to know the drawbacks of SQS that AWS Lambda removes.
Please help. Let me know if you need further information.
Your question does not contain much detail, so I shall attempt to interpret your needs.
Option 1: SQS Polling
Information is sent to an Amazon SNS topic
An SQS queue is subscribed to the SNS topic
An application running on Amazon EC2 instance(s) regularly polls the SQS queue to ask for a message
If a message is available, the data in the message is transformed and saved to an Amazon DynamoDB table
This approach is good if the transformation takes a long time to process. The number of EC2 instances can be scaled based upon the amount of work in the queue. Multiple messages can be received at the same time. It is a traditional message-based approach.
Option 2: Using Lambda
Information is sent to an Amazon SNS topic
An AWS Lambda function is subscribed to the SNS topic
A Lambda function is invoked when a message is sent to the SNS topic
The Lambda function transforms the data in the message and saves it to an Amazon DynamoDB table
AWS Lambda functions were limited to five minutes of execution time when this was written (the limit is now 15 minutes), so this approach will only work if the transformation process can be completed within that timeframe.
No servers are required because Lambda will automatically run multiple functions in parallel. When no work is to be performed, no Lambda functions execute and there is no compute charge.
Between the two options, using AWS Lambda is much more efficient and scalable, but it might vary depending upon your specific workload.
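A minimal Python sketch of Option 2 (the table name, attribute names, and transformation are placeholders for your own rules):

import json
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # placeholder table name

def handler(event, context):
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])  # SNS delivers the message in the event
        table.put_item(Item=transform(message))

def transform(message):
    # Placeholder for the (potentially complex) transformation rules.
    return {"id": message["id"], "payload": json.dumps(message)}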
We can now use SQS messages to trigger AWS Lambda Functions.
28 JUN 2018: AWS Lambda Adds Amazon Simple Queue Service to Supported Event Sources
Moreover, you are no longer required to run a message polling service or create an SQS-to-SNS mapping.
The AWS Serverless Application Model (SAM) supports a new event source as follows:
Type: SQS
Properties:
  Queue: arn:aws:sqs:us-west-2:213455678901:test-queue
  BatchSize: 10
The AWS Console also supports configuring SQS as a Lambda event source.
Further details:
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
