I created an MSK cluster and sent a message from my producer.
It works well.
Now I want my consumer to receive that message.
The consumer is a Python file, and the Python file is on an EC2 instance.
How can I run the Python file so that my consumer receives the message?
I can run the file manually, but I want it to run automatically after the producer sends a message.
Do you initiate a consumer in the Python file, or is the Python file the business logic you want to execute on each Kafka message? It is not clear from your question.
If it's the former, you basically need to run the script when the EC2 instance is bootstrapping. You can do that using user data, which is a bash script that runs when the instance is initialized.
If it's the latter, you should initiate a consumer and pass the handler from the Python script to the consumer poll loop.
Since you are running MSK, you can use a Python Lambda function with MSK as the event source, without writing any Kafka consumer code or running an EC2 machine.
https://docs.aws.amazon.com/lambda/latest/dg/with-msk.html
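For illustration, a minimal sketch of such a handler could look like the following, assuming the standard MSK event shape in which record values arrive base64-encoded (the processing logic is a placeholder):

import base64

def lambda_handler(event, context):
    # The MSK event source delivers batches keyed by "topic-partition"
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            # Record values are base64-encoded by the event source
            payload = base64.b64decode(record["value"]).decode("utf-8")
            print(f"Received from {topic_partition}: {payload}")
            # ... your business logic here ...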
If you must use EC2, then you should run your consumer indefinitely, and have it wait for any message, not be dependent upon any specific producer action.
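For example, a long-running consumer could look roughly like this minimal sketch using the kafka-python package (the broker address, topic name, and group id are placeholders):

from kafka import KafkaConsumer  # pip install kafka-python

# Placeholders: substitute your MSK bootstrap brokers and topic
consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers=["b-1.example.kafka.us-east-1.amazonaws.com:9092"],
    group_id="my-consumer-group",
    auto_offset_reset="earliest",
)

# Blocks forever, waking up whenever the producer publishes something
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
    # ... handle the message here ...

You would typically start a script like this from user data or a systemd unit so that it keeps running after you log out of the instance.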
I'm currently building a CD pipeline that replaces an existing Google Cloud Dataflow streaming pipeline with a new one using bash commands. The old and new jobs have the same name. I wrote the bash commands like this:
gcloud dataflow jobs drain "${JOB_ID}" --region asia-southeast2 && \
gcloud dataflow jobs run NAME --other-flags
The problem with this is that the first command doesn't wait until the job finishes draining, so the second command throws an error because of the duplicated job name.
Is there a way to wait until the Dataflow job finishes draining? Or is there a better way?
Thanks!
Seeing as this post hasn't garnered any attention, I will post my comment as an answer:
Dataflow jobs are asynchronous to the gcloud dataflow jobs commands, so when you use && the only thing you'll be waiting on is for the command itself to finish. Since that command only starts the process (be it draining a job or running one), it returns before the job/drain actually completes.
There are a couple of ways you could wait for the job/drain to finish, both having some added cost:
You could use a Pub/Sub step as part of a larger Dataflow job (think of it as a parent to the jobs you are draining and running, with those jobs sending a message to Pub/Sub when their status changes) - you may find the cost of Pub/Sub [here].
You could set up a loop that repeatedly checks the status of the job you're draining/running, likely inside a bash script (a rough sketch follows below). That can be a bit more tedious and isn't as neat as a listener, and it requires your own machine/connection to stay up, or a GCE instance.
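As an illustration of the polling approach, the loop could shell out to gcloud and watch the job state until the drain completes (a sketch only; the region, job ID, polling interval, and exact state strings are assumptions to verify against your gcloud version):

import subprocess
import time

def wait_for_drain(job_id, region="asia-southeast2", poll_seconds=30):
    # Poll the Dataflow job state until it is no longer draining
    while True:
        state = subprocess.run(
            ["gcloud", "dataflow", "jobs", "describe", job_id,
             "--region", region, "--format", "value(currentState)"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print("Current state:", state)
        if state != "JOB_STATE_DRAINING":
            return state  # e.g. JOB_STATE_DRAINED once the drain has finished
        time.sleep(poll_seconds)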
I have 100 servers, with Kafka deployed on some of the hosts, and the Kafka service is started by the kafka user. I want to use a shell script to find the Kafka machines and consume a message to ensure that the service is available.
"Find out the Kafka machine" - I'm not sure what this means.
"ensure that the service is available" - Ideally, you'd use something like Nagios or Consul for this, not a consumer. Otherwise, kcat -L (list cluster metadata) or kcat -C (consume) are two popular CLI ways to check/consume from brokers that don't require extra dependencies.
Beyond that, your commands wouldn't be checking a specific broker, only the cluster as a whole.
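If you do end up scripting the check yourself instead of using kcat, a rough Python equivalent of the metadata probe might look like this (a sketch using the kafka-python package; the broker address is a placeholder):

from kafka import KafkaConsumer  # pip install kafka-python
from kafka.errors import KafkaError

def broker_is_available(bootstrap="localhost:9092"):
    # Roughly what `kcat -L` does: connect and request cluster metadata
    try:
        consumer = KafkaConsumer(bootstrap_servers=[bootstrap])
        consumer.topics()  # forces a metadata round trip
        consumer.close()
        return True
    except KafkaError:
        return False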
I have a .bat file on a Windows EC2 instance that I would like to run every day.
Is there any way to schedule the instance to run this file every day and then shut down the EC2 instance, without manually going to the EC2 management console and launching the instance?
There are two requirements here:
Start the instance each day at a particular time (this is an assumption based on your desire to shut down the instance each day, so something needs to turn it back on)
Run the script and then shutdown
Option 1: Start & Stop
Amazon CloudWatch Events can perform a task on a given schedule, such as once-per-day. While it has many in-built capabilities, it cannot natively start an instance. Therefore, configure it to trigger an AWS Lambda function. The Lambda function can start the instance with a single API call.
When the instance starts up, use the normal Windows OS capabilities to run your desired program, eg: Automatically run program on Windows Server startup
When the program has finished running, it should issue a command to the Windows OS to shutdown Windows. The benefit of doing it this way (instead of trying to schedule a shutdown) is that the program will run to completion before any shutdown is activated. Just be sure to configure the EC2 instance to Stop on Shutdown (which is the default behaviour).
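To illustrate the scheduled Lambda in Option 1, the function body can be as small as this sketch (the instance ID is a placeholder):

import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder

def lambda_handler(event, context):
    ec2 = boto3.client("ec2")
    # One API call is enough; the instance's own startup tasks do the rest
    ec2.start_instances(InstanceIds=[INSTANCE_ID])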
Option 2: Launch & Terminate
Instead of starting and stopping an instance, you could instead launch a new instance using an Amazon CloudWatch Events schedule.
Pass the desired PowerShell script to run in the instance's User Data. This script can install and run software.
When the script has finished, it should call the Windows OS command to shutdown Windows. However, this time configure Terminate on Shutdown so that the instance is terminated (deleted). This is fine because the above schedule will launch a new instance next time.
The benefit of this method is that the software configuration, and what should be run each time, can be fully configured via the User Data script, rather than having to start the instance, log in, change the scripts, then shut down. There is no need to keep an instance around just to be Stopped for most of the day.
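As a sketch of how the scheduled Lambda for Option 2 might launch such an instance (the AMI, instance type, and script contents are placeholders):

import boto3

# Placeholder User Data: install/run your software, then shut Windows down
USER_DATA = """<powershell>
# ... install software, run the job, then call Stop-Computer ...
</powershell>"""

def lambda_handler(event, context):
    ec2 = boto3.client("ec2")
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI
        InstanceType="t3.medium",          # placeholder type
        MinCount=1,
        MaxCount=1,
        UserData=USER_DATA,
        # Terminate (not Stop) when the script shuts Windows down
        InstanceInitiatedShutdownBehavior="terminate",
    )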
Option 3: Rethink your plan and Go Serverless!
Instead of using an Amazon EC2 instance to run a script, investigate the ability to run an AWS Lambda function instead. The Lambda function might be able to do all the processing you desire, without having to launch/start/stop/terminate instances. It is also cheaper!
Some limitations might preclude this option (eg a maximum run-time of 15 minutes and 512MB of temporary disk space by default) but it should be the first option you explore rather than starting/stopping an Amazon EC2 instance.
I have set up a job in Laravel that is time consuming, so the user can upload a file and exit, and it works just fine when I run php artisan queue:listen or queue:work.
But that doesn't work when I exit the terminal. What do I need to do to have it run automatically?
I've tried Amazon SQS, but that seems useless because I can queue the job and that's about it; it doesn't have an option to set an endpoint to hit when a job is received.
I know there is iron.io, but that's outside of my budget.
Below is my code to push the job to the database:
public function queue()
{
    // Look up the record and dispatch the time-consuming work onto the queue
    $user = Property::find(1);
    $this->dispatch(new SendReportEmail($user));
}
I cannot say Amazon SQS is useless.
You can add a job to your scheduled jobs in Laravel and use that to receive jobs from Amazon SQS. Each SQS message can carry a reference to the file/row to be processed, so the scheduled job can read the payload and process it accordingly.
For help, here is a tutorial on setting up a queue listener for SQS via Laravel.
I can't get the information out of the documentation. Can anyone tell me how Spring-XD executes jobs? Does it assign a job to a certain container so that the job is only executed on the container it is deployed to, or is each job execution assigned to a different container? Can I somehow control that a certain job may be executed in parallel (with different arguments) while others may not?
Thanks!
Peter
I am sure you would have seen some of the documentation here:
https://github.com/spring-projects/spring-xd/wiki/Batch-Jobs
To answer your questions:
Can anyone tell me how Spring-XD executes jobs? Does it assign a job to a certain container so that the job is only executed on the container it is deployed to, or is each job execution assigned to a different container?
After you create a new job definition using this:
xd>job create dailyfeedjob --definition "myfeedjobmodule" --deploy
the batch job module myfeedjobmodule gets deployed into the XD container. Once deployed, a job-launching queue is set up in the message broker (Redis, Rabbit, or local) with the name job:dailyfeedjob. Since this queue is bound to the job module deployed in the XD container, a request message sent to this queue is picked up by the job module deployed inside that specific container.
Now, you can send the job-launching request message (with job parameters) into the job:dailyfeedjob queue by simply setting up a stream that sends a message into this queue. For example, a trigger (fixed-delay, cron, or date trigger) could do that. There is also a job launch command in the shell, which launches the job only once.
This section would explain it more: https://github.com/spring-projects/spring-xd/wiki/Batch-Jobs#launching-a-job
Hence, the job is launched (every time it receives the job-launching request) only inside the container where the job module is deployed, and you can expect the original Spring Batch flow when the job is executed (refer to the shell documentation for all the job-related commands).
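For example, a cron-based launch can be wired up as a stream whose sink is the job's launch queue, roughly like this (illustrative only; check the trigger module options against your XD version):
xd>stream create --name dailyfeedtrigger --definition "trigger --cron='0 0 2 * * *' > queue:job:dailyfeedjob" --deploy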
Can I somehow control that a certain job may be executed in parallel (with different arguments) while others may not?
If it is the same job definition with different job parameters, then it would go to the same container where the job module is deployed.
But, you can still create a new job definition with the same batch job module.
xd>job create myotherdailyfeedjob --definition "myfeedjobmodule" --deploy
The only difference is that it will be under that namespace, and the job-launching queue name would be job:myotherdailyfeedjob. It all depends on how you want to organize running your batch jobs.
Also, for parallel processing of batch jobs you can use:
http://docs.spring.io/spring-batch/reference/html/scalability.html
and XD provides single-step partitioning support for running batch jobs:
Include this in your job module:
<import resource="classpath:/META-INF/spring-xd/batch/singlestep-partition-support.xml"/>
with partitioner and tasklet beans defined.
You can try out some of the XD batch samples from here:
https://github.com/spring-projects/spring-xd-samples