MassTransit.ConfigurationException: The job service must be configured prior to configuring a job consumer, using either ConfigureJobServiceEndpoints - masstransit

I need some help incorporating the JobService into my current MassTransit setup. My project works great with a plain MassTransit consumer, but I have some long-running processes and don't want to hold up RabbitMQ, so I tried to adopt the JobService. I followed the JobService example but ran into this error:
MassTransit.ConfigurationException: The job service must be configured prior to configuring a job consumer, using either ConfigureJobServiceEndpoints or ConfigureJobService
at MassTransit.Configuration.JobConsumerMessageConnector`2.ConnectConsumer(IConsumePipeConnector consumePipe, IConsumerFactory`1 consumerFactory, IConsumerSpecification`1 specification) in /_/src/MassTransit/Consumers/Configuration/JobConsumerMessageConnector.cs
This is my setup:
services.AddMassTransit(busConfig =>
                {
                    if (isReceiver)
                    {
                        busConfig.AddConsumer<JobItemPayloadConsumer>(cfg =>
                        {
                            cfg.Options<JobOptions<JobItemPayLoad>>(options => options
                                .SetConcurrentJobLimit(10));
                        });
                    }
                    busConfig.AddSagaRepository<JobSaga>()
                       .EntityFrameworkRepository(r =>
                       {
                           r.ExistingDbContext<JobServiceSagaDbContext>();
                           r.UseSqlServer();
                       });
                    busConfig.AddSagaRepository<JobTypeSaga>()
                                .EntityFrameworkRepository(r =>
                                {
                                    r.ExistingDbContext<JobServiceSagaDbContext>();
                                    r.UseSqlServer();
                                });
                    busConfig.AddSagaRepository<JobAttemptSaga>()
                                .EntityFrameworkRepository(r =>
                                {
                                    r.ExistingDbContext<JobServiceSagaDbContext>();
                                    r.UseSqlServer();
                                });
                    busConfig.SetKebabCaseEndpointNameFormatter();
                    busConfig.UsingRabbitMq((context, cfg) =>
                    {
                        var host = GetServiceBusHostingUri(config, env, busConfig);
                        InitializeMassTransitBus(config, busConfig, context, cfg, host);
                        cfg.ServiceInstance(instance =>
                        {
                            instance.ConfigureJobService();
                            instance.ConfigureJobServiceEndpoints();
                            // configure the job consumer on the job service endpoints
                            instance.ConfigureEndpoints(context);
                        });
                        busConfig
                            .ConfigureSagas(config, host)
                            .ConfigureEndpoints(cfg, config, host)
                        ;
                        cfg.ConfigureEndpoints(context);
                        if (isReceiver)
                        {
                            busConfig
                                .ConfigureReceivers(cfg, config, context);
                        }
                    });
                });
Thank you

Never mind, I figured out the issue: the ServiceInstance has to be registered before ConfigureEndpoints(context) is called.

Related

Reset Spring Boot Kafka Stream Application on modifying topics

I'm using spring-kafka to run a Kafka Streams application in Spring Boot using StreamsBuilderFactoryBean. I changed the number of partitions in some of the topics from 100 to 20 by deleting and recreating them, but now on running the application I get the following error:
Existing internal topic MyAppId-KSTREAM-AGGREGATE-STATE-STORE-0000000092-changelog has invalid partitions: expected: 20; actual: 100. Use 'kafka.tools.StreamsResetter' tool to clean up invalid topics before processing.
I couldn't access the class kafka.tools.StreamsResetter, and tried calling StreamsBuilderFactoryBean.getKafkaStreams().cleanUp(), but it gave a NullPointerException. How do I do the said cleanup?
The relevant documentation is here.
Step 1: Local Cleanup
For Spring Boot with StreamsBuilderFactoryBean, the first step can be done by simply adding a CleanupConfig to the constructor:
// Before
new StreamsBuilderFactoryBean(new KafkaStreamsConfiguration(config));
// After
new StreamsBuilderFactoryBean(new KafkaStreamsConfiguration(config), new CleanupConfig(true, true));
This enables calling the KafkaStreams.cleanUp() method both before start() and after stop().
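In a Spring Boot application where the factory bean is auto-configured, one way to apply this is to override the default StreamsBuilderFactoryBean bean. The following is only a minimal sketch, assuming spring-kafka's CleanupConfig and default bean name; the configuration class name is illustrative, and depending on your Boot version the @EnableKafkaStreams annotation may already be applied by auto-configuration:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafkaStreams;
import org.springframework.kafka.annotation.KafkaStreamsDefaultConfiguration;
import org.springframework.kafka.config.KafkaStreamsConfiguration;
import org.springframework.kafka.config.StreamsBuilderFactoryBean;
import org.springframework.kafka.core.CleanupConfig;

@Configuration
@EnableKafkaStreams
public class StreamsCleanupConfig {

    // Override the default factory bean so local state is cleaned up on start and on stop
    @Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_BUILDER_BEAN_NAME)
    public StreamsBuilderFactoryBean streamsBuilderFactoryBean(KafkaStreamsConfiguration config) {
        return new StreamsBuilderFactoryBean(config, new CleanupConfig(true, true));
    }
}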
Step 2: Global Cleanup
For step two, with all instances of the application stopped, simply use the tool as explained in the documentation:
# In kafka directory
bin/kafka-streams-application-reset.sh --application-id "MyAppId" --bootstrap-servers 1.2.3.4:9092 --input-topics x --intermediate-topics first_x,second_x,third_x --zookeeper 1.2.3.4:2181
What this does:
For any specified input topics: Reset the application’s committed consumer offsets to "beginning of the topic" for all partitions (for consumer group application.id).
For any specified intermediate topics: Skip to the end of the topic, i.e. set the application’s committed consumer offsets for all partitions to each partition’s logSize (for consumer group application.id).
For any internal topics: Delete the internal topic (this will also delete the corresponding committed offsets).

storm rebalance command not updating the number of workers for a topology

I tried executing the following command for storm 1.1.1:
storm rebalance [topologyName] -n [number_of_workers]
The command runs successfully but the number of workers remains unchanged. I tried reducing the number of workers too; that also didn't work.
I have no clue what's happening. Any pointer will be helpful.
FYI: I have implemented custom scheduling. Is it because of that?
You can always check Storm's source code behind that CLI, or code the rebalance yourself (tested against 1.0.2):
RebalanceOptions rebalanceOptions = new RebalanceOptions();
rebalanceOptions.set_num_workers(newNumWorkers);
// Obtain a Nimbus client from the cluster config, then request the rebalance
Nimbus.Client client = NimbusClient.getConfiguredClient(Utils.readStormConfig()).getClient();
client.rebalance("foo", rebalanceOptions);

Hive execution hook

I need to hook custom execution code into Apache Hive. Please let me know if somebody knows how to do it.
The current environment I am using is given below:
Hadoop : Cloudera version 4.1.2
Operating system : Centos
Thanks,
Arun
There are several types of hooks, depending on the stage at which you want to inject your custom code:
Driver run hooks (Pre/Post)
Semantic analyzer hooks (Pre/Post)
Execution hooks (Pre/Failure/Post)
Client statistics publisher
If you run a script, the processing flow looks as follows:
1. Driver.run() takes the command
2. HiveDriverRunHook.preDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
3. Driver.compile() starts processing the command: creates the abstract syntax tree
4. AbstractSemanticAnalyzerHook.preAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
5. Semantic analysis
6. AbstractSemanticAnalyzerHook.postAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
7. Create and validate the query plan (physical plan)
8. Driver.execute(): ready to run the jobs
9. ExecuteWithHookContext.run() (HiveConf.ConfVars.PREEXECHOOKS)
10. ExecDriver.execute() runs all the jobs
    - For each job, at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval, ClientStatsPublisher.run() is called to publish statistics (HiveConf.ConfVars.CLIENTSTATSPUBLISHERS)
    - If a task fails: ExecuteWithHookContext.run() (HiveConf.ConfVars.ONFAILUREHOOKS)
11. Finish all the tasks
12. ExecuteWithHookContext.run() (HiveConf.ConfVars.POSTEXECHOOKS)
13. Before returning the result: HiveDriverRunHook.postDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
14. Return the result.
For each hook I indicated the interface you have to implement; in parentheses is the corresponding configuration property key you have to set in order to register your class at the beginning of the script.
E.g.: setting the PreExecution hook (the 9th stage of the workflow):
HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks :
set hive.exec.pre.hooks=com.example.MyPreHook;
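For reference, a hypothetical minimal implementation of such a pre-execution hook might look like the following; com.example.MyPreHook is just the illustrative class name used in the set command above, and the interface is the Hive 0.11-era ExecuteWithHookContext:
package com.example;

import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

public class MyPreHook implements ExecuteWithHookContext {
    @Override
    public void run(HookContext hookContext) throws Exception {
        // Called before each query executes; the context exposes the query plan and conf
        System.out.println("Pre-exec hook, query: " + hookContext.getQueryPlan().getQueryStr());
    }
}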
Unfortunately these features aren't really documented, but you can always look into the Driver class to see the evaluation order of the hooks.
Remark: I assumed Hive 0.11.0 here; I don't think the Cloudera distribution differs (too much).
A good start: http://dharmeshkakadia.github.io/hive-hook/ (there are examples there).
Note: the Hive CLI shows the hook messages on the console; if you execute from Hue, add a logger and you can see the results in the HiveServer2 role log.

Circular queue on beanstalkd

I'm using beanstalkc, a Python wrapper for the beanstalkd application.
What I'd like to do is have the producer put some jobs (e.g. 'a', 'b', 'c', 'd') once, and have the consumer get the jobs continually (e.g. 'a', 'b', 'c', 'd', 'a', 'b', ...).
In the consumer I get the jobs with job.reserve(). I thought the solution was just reserving the jobs without deleting them, but after I ran some consumer processes I got a TIMEOUT ERROR.
I'm clearly doing something wrong, but I couldn't find a way to "re-queue" the jobs the consumers use.
I think this could be a solution:
Producer:
queue.put('a', priority=0)
Consumer:
job = queue.reserve()
# do something with job.body
new_priority = job.stats()['pri'] + 1
job.release(priority=new_priority)
Why not just, when you've completed a particular job, and after you've released it, put another copy of the same job you've just finished back into the queue?
You'd otherwise be trying to get it to do something that it's not designed to do.

Run @Scheduled task only on one WebLogic cluster node?

We are running a Spring 3.0.x web application (.war) with a nightly @Scheduled job in a clustered WebLogic 10.3.4 environment. However, as the application is deployed to each node (using the deployment wizard in the AdminServer's web console), the job is started on each node every night, thus running multiple times concurrently.
How can we prevent this from happening?
I know that libraries like Quartz allow coordinating jobs inside a clustered environment by means of a database lock table, and I could even implement something like this myself. But since this seems to be a fairly common scenario, I wonder whether Spring already comes with an option to easily circumvent this problem without having to add new libraries to my project or put in manual workarounds.
We are not able to upgrade to Spring 3.1 with configuration profiles, as mentioned here
Please let me know if there are any open questions. I also asked this question on the Spring Community forums. Thanks a lot for your help.
We only have one task that sends a daily summary email. To avoid extra dependencies, we simply check whether the hostname of each node corresponds with a configured system property.
private boolean isTriggerNode() {
    try {
        String triggerHostname = System.getProperty("trigger.hostname");
        String hostName = InetAddress.getLocalHost().getHostName();
        return hostName.equals(triggerHostname);
    } catch (UnknownHostException e) {
        return false;
    }
}

public void execute() {
    if (isTriggerNode()) {
        // send email
    }
}
We are implementing our own synchronization logic using a shared lock table inside the application database. This allows all cluster nodes to check whether a job is already running before actually starting it.
Be careful: with a solution based on your own synchronization logic and a shared lock table, you always have the concurrency issue of two cluster nodes reading and writing the table at the same time.
It is best to perform the following steps in one DB transaction (a rough sketch follows below):
- read the value in the shared lock table
- if no other node holds the lock, take it
- update the table to indicate that you have taken the lock
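As a rough sketch (not the poster's actual code), the single-transaction check could be done with plain JDBC against a hypothetical JOB_LOCK(JOB_NAME, LOCKED_UNTIL) table that is pre-populated with one row per job:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import javax.sql.DataSource;

public class JobLockDao {
    private final DataSource dataSource;

    public JobLockDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Returns true if this node acquired the lock and should run the job. */
    public boolean tryAcquire(String jobName, long lockForMillis) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            // Read the current lock row; FOR UPDATE blocks other nodes until we commit or roll back
            try (PreparedStatement select = con.prepareStatement(
                    "SELECT LOCKED_UNTIL FROM JOB_LOCK WHERE JOB_NAME = ? FOR UPDATE")) {
                select.setString(1, jobName);
                ResultSet rs = select.executeQuery();
                Timestamp now = new Timestamp(System.currentTimeMillis());
                if (rs.next() && rs.getTimestamp(1) != null && rs.getTimestamp(1).after(now)) {
                    con.rollback(); // another node already holds the lock
                    return false;
                }
            }
            // Take the lock by updating the row inside the same transaction
            try (PreparedStatement update = con.prepareStatement(
                    "UPDATE JOB_LOCK SET LOCKED_UNTIL = ? WHERE JOB_NAME = ?")) {
                update.setTimestamp(1, new Timestamp(System.currentTimeMillis() + lockForMillis));
                update.setString(2, jobName);
                update.executeUpdate();
            }
            con.commit();
            return true;
        }
    }
}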
I solved this problem by making one of the boxes the master:
basically, set an environment variable on one of the boxes, like master=true,
and read it in your Java code through System.getenv("master").
If it is present and true, run your code.
A basic snippet:
@Scheduled(cron = "...") // fill in your schedule
void process() {
    boolean master = Boolean.parseBoolean(System.getenv("master"));
    if (master) {
        // your logic
    }
}
You can try using the TimerManager (Job Scheduler in a clustered environment) from WebLogic as the TaskScheduler implementation (TimerManagerTaskScheduler). It should work in a clustered environment.
Andrea
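A rough sketch of that wiring, assuming Spring's commonj support (spring-context-support) and a TimerManager bound in JNDI; the JNDI name below is illustrative:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.commonj.TimerManagerTaskScheduler;

@Configuration
public class SchedulerConfig {

    // Looks up WebLogic's commonj TimerManager via JNDI and exposes it as a Spring TaskScheduler
    @Bean
    public TimerManagerTaskScheduler taskScheduler() {
        TimerManagerTaskScheduler scheduler = new TimerManagerTaskScheduler();
        scheduler.setTimerManagerName("java:comp/env/tm/MyTimerManager");
        scheduler.setResourceRef(true);
        return scheduler;
    }
}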
I've recently implemented a simple annotation library, dlock, to execute a scheduled task only once over multiple nodes. You can simply do something like below.
@Scheduled(cron = "59 59 8 * * *" /* Every day at 8:59:59am */)
@TryLock(name = "emailLock", owner = NODE_NAME, lockFor = TEN_MINUTE)
public void sendEmails() {
    List<Email> emails = emailDAO.getEmails();
    emails.forEach(email -> sendEmail(email));
}
See my blog post about using it.
You don't need to synchronize your job start using a DB.
On a WebLogic application you can get the instance name where the application is running:
String serverName = System.getProperty("weblogic.Name");
Simply put a condition to execute the job:
if (serverName.equals(".....")) {
    // execute my job
}
If you want to bounce your job from one machine to the other, you can get the current day of the year: if it is odd, execute the job on one machine; if it is even, execute it on the other. This way you load a different machine every day.
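A tiny sketch of that alternation (the server names are hypothetical; use whatever identifies your nodes):
// Alternate the job between two nodes based on day-of-year parity
int dayOfYear = java.util.Calendar.getInstance().get(java.util.Calendar.DAY_OF_YEAR);
boolean evenDay = dayOfYear % 2 == 0;
String serverName = System.getProperty("weblogic.Name");
boolean runHere = evenDay ? "node-a".equals(serverName) : "node-b".equals(serverName);
if (runHere) {
    // execute my job
}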
We can make the other machines in the cluster not run the batch job by using the following cron string. It will not run until 2099.
0 0 0 1 1 ? 2099
