How to run multiple Apache Ignite nodes on the same machine?

I want to run multiple Ignite nodes on the same VM. Suppose their addresses are localhost:port, with the ports forming a series. I also want my Java client application to connect to the nodes.
Can you provide a simple and beginner-level guide to achieve this? The ones I tried are overwhelming.

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class MultipleIgnites {
    public static void main(String[] args) throws Exception {
        // Each node needs a unique instance name when several run in one JVM
        Ignition.start(new IgniteConfiguration().setIgniteInstanceName("s1")
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(new DataRegionConfiguration().setPersistenceEnabled(true))));
        Ignition.start(new IgniteConfiguration().setIgniteInstanceName("s2")
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(new DataRegionConfiguration().setPersistenceEnabled(true))));
        Ignition.start(new IgniteConfiguration().setIgniteInstanceName("s3")
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(new DataRegionConfiguration().setPersistenceEnabled(true))));
    }
}
This will start three of them, connected in one cluster.

See this documentation section that shows how to start isolated clusters in the same environment.

Related

Way to determine Kafka Topic for @KafkaListener on application startup?

We have 5 topics and we want to have a service that scales for example to 5 instances of the same app.
This would mean that i would want to dynamically (via for example Redis locking or similar mechanism) determine which instance should listen to what topic.
I know that we could have 1 topic that has 5 partitions - and each node in the same consumer group would pick up a partition. Also if we have a separately deployed service we can set the topic via properties.
The issue is that those two are not suitable for our situation and we want to see if it is possible to do that via what i explained above.
@PostConstruct
private void postConstruct() {
    // Do logic via Redis locking or something to determine the topic
    dynamicallyDeterminedVariable = // SOME LOGIC
}
@KafkaListener(topics = "{dynamicallyDeterminedVariable}")
void listener(String data) {
    LOG.info(data);
}
Yes, you can use SpEL for the topic name:
#{@someOtherBean.whichTopicToUse()}

Apache Ignite CollisionSpi configuration

I have a requirement along the lines of "only allow cache updates on the same cache to run in sequence". Our client node is written in .NET.
Every cache has an affinity key, and we use computeJob.AffinityCallAsync("cacheName", "affinityKey", job) to submit the compute job for execution.
If I use CollisionSpi, can I achieve "sync jobs running on the same node for the same cache"? What configuration do I need?
Do I need to write the same configuration for all nodes (server and client)? I saw that CollisionSpi has no implementation for .NET, so what can I do for the .NET client node?
Wrap your job logic in a lock to make it run in sequence:
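using Apache.Ignite.Core.Compute;

public class MyJob : IComputeFunc<string>
{
    // Shared by all MyJob instances in this process, so jobs run one at a time per node
    private static readonly object SyncRoot = new object();

    public string Invoke()
    {
        lock (SyncRoot)
        {
            // Update cache
            return "done"; // placeholder result
        }
    }
}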
Notes:
ICache.Invoke may be a better fit for your use case
The requirement for sequential update sounds weird and may cause suboptimal performance: Ignite caches are safe to update concurrently. Please make sure this requirement makes sense.
UPDATE
Adding a lock ensures that only one update happens at a time on a given node. Other nodes may still perform updates in parallel, and the order of updates is not guaranteed either.
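The single static lock in the answer serializes every job on a node, whichever cache it touches. If the requirement really is per cache, a lock keyed by cache name may be enough. Below is a minimal sketch of that idea in Java (the answer's code is C#, but the pattern is the same). PerCacheLocks and runExclusive are hypothetical names, not part of the Ignite API, and the lock still only covers a single JVM, i.e. one node.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper: serializes work per cache name inside one JVM (one node).
final class PerCacheLocks {
    private static final ConcurrentMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    // Runs the action while holding the lock that belongs to cacheName.
    static <T> T runExclusive(String cacheName, Supplier<T> action) {
        // computeIfAbsent creates the lock atomically the first time a name is seen
        ReentrantLock lock = LOCKS.computeIfAbsent(cacheName, name -> new ReentrantLock());
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Jobs for different caches can then run concurrently on the same node, while jobs for the same cache name queue behind each other.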

How to execute long running/polling operations in Eclipse Vert.x

I have a scenario where we need to keep polling a database table for all active users and perform an API call to fetch any unread emails from their inboxes. My approach is to use two verticles, one for polling and another for fetching emails for a user. When the first verticle finds a user, it sends a message (userId) to the second verticle through the event bus to fetch emails. That way, I can increase the number of instances of the second verticle when there are lots of users.
I found the following two ways to poll the database for active users and then perform an API call for each user.
vertx.setPeriodic
vertx.executeBlocking
But the manual mentions that for long-running or polling tasks, it's better to create an application-managed thread to handle the task.
Is my approach for the problem correct, or is there a better approach to solve the problem at hand?
If I go through an application managed thread, can you please help illustrate with an example.
Thanks.
You can create a dedicated worker thread pool for that, and run your periodic tasks on it:
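import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class PeriodicWorkerExample {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // Deploy as a worker verticle on its own single-thread pool named "periodic"
        vertx.deployVerticle(new MyPeriodicWorker(), new DeploymentOptions()
            .setWorker(true)
            .setWorkerPoolSize(1)
            .setWorkerPoolName("periodic"));
    }
}

class MyPeriodicWorker extends AbstractVerticle {
    @Override
    public void start() {
        // Prints the name of the executing thread once per second
        vertx.setPeriodic(1000, (r) -> {
            System.out.println(Thread.currentThread().getName());
        });
    }
}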
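The question also asks what an application-managed thread could look like. Here is a minimal sketch using only java.util.concurrent, with no Vert.x API involved. ActiveUserPoller, pollOnce and the thread name are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical poller that owns its own thread instead of borrowing one from a Vert.x pool.
final class ActiveUserPoller implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "active-user-poller");
                t.setDaemon(true); // do not keep the JVM alive just for polling
                return t;
            });

    // Runs pollOnce repeatedly, waiting delayMillis after each run finishes.
    void start(Runnable pollOnce, long delayMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all further runs
                e.printStackTrace();
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In the scenario from the question, pollOnce would presumably query the table of active users and send each userId over the event bus to the verticle that fetches emails. Because scheduleWithFixedDelay waits for one run to finish before timing the next, a slow query cannot cause overlapping polls.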

Leader election initialisation for multiple roles in clustered environment

I am currently working with an implementation based on:
org.springframework.integration.support.leader.LockRegistryLeaderInitiator
Having multiple candidate roles so that only one application instance within the cluster is elected as leader for each role. During initialisation of the cluster if autoStartup property is set to true the first application instance that is initialised will be elected as leader for all roles. This is something that we want to avoid and instead have a fair distribution of the lead roles across the cluster.
One possible solution on the above might be that when the cluster is ready and properly initialised then invoke an endpoint that will execute:
lockRegistryLeaderInitiator.start()
For all instances in the cluster, so that the election process starts and the roles are fairly distributed across instances. One drawback is that this needs to be part of the deployment process, which adds some complexity.
What is the proposed best practice on the above? Are there any plans for additional features related? For example to autoStartup the leader election only when X application instances are available?
I suggest you take a look at the Spring Cloud Bus project. I don't know its details, but it looks like your idea of setting autoStartup = false for all the LockRegistryLeaderInitiator instances and starting them via some distributed event is the way to go.
I'm not sure what we can do for you from the Spring Integration perspective, but this does not feel like its responsibility; all the coordination and rebalancing should be done via some other tool. Fortunately, all our Spring projects can be used together as a single platform.
I think with the Bus you can even track the number of instances that have joined the cluster and decide yourself when and how to publish a StartLeaderInitiators event.
It would be relatively easy with the Zookeeper LeaderInitiator because you could check in zookeeper for the instance count before starting it.
It's not so easy with the lock registry because there's no inherent information about instances; you would need some external mechanism (such as zookeeper, in which case, you might as well use ZK).
Or, you could use something like Spring Cloud Bus (with RabbitMQ or Kafka) to send a signal to all instances that it's time to start electing leadership.
I found a very simple approach to do this.
You could add a scheduled task to each node that periodically tries to yield leaderships if the node holds too many of them.
For example, if you have N nodes and 2*N roles and you want to achieve completely fair leadership distribution (each node tries to hold only two leaderships) you can use something like this:
@Component
@RequiredArgsConstructor
public class FairLeaderDistributor {
    private final List<LeaderInitiator> initiators;

    @Scheduled(fixedDelay = 300_000) // once per 5 minutes
    public void yieldExcessLeaderships() {
        initiators.stream()
            .map(LeaderInitiator::getContext)
            .filter(Context::isLeader)
            .skip(2) // keep only 2 leaderships
            .forEach(Context::yield);
    }
}
Once all nodes are up, you will eventually get a completely fair leadership distribution.
You can also implement dynamic distribution based on current active node count if you use Zookeeper LeaderInitiator implementation.
Current number of participants can be easily retrieved from Curator LeaderSelector::getParticipants method.
You can get LeaderSelector with reflection from LeaderInitiator.leaderSelector field.
@Slf4j
@Component
@RequiredArgsConstructor
public class DynamicFairLeaderDistributor {
    final List<LeaderInitiator> initiators;

    @SneakyThrows
    private static int getParticipantsCount(LeaderInitiator leaderInitiator) {
        Field field = LeaderInitiator.class.getDeclaredField("leaderSelector");
        field.setAccessible(true);
        LeaderSelector leaderSelector = (LeaderSelector) field.get(leaderInitiator);
        return leaderSelector.getParticipants().size();
    }

    @Scheduled(fixedDelay = 5_000)
    public void yieldExcessLeaderships() {
        int rolesCount = initiators.size();
        if (rolesCount == 0) return;
        int participantsCount = getParticipantsCount(initiators.get(0));
        if (participantsCount == 0) return;
        int maxLeadershipsCount = (rolesCount - 1) / participantsCount + 1;
        log.info("rolesCount={}, participantsCount={}, maxLeadershipsCount={}", rolesCount, participantsCount, maxLeadershipsCount);
        initiators.stream()
            .map(LeaderInitiator::getContext)
            .filter(Context::isLeader)
            .skip(maxLeadershipsCount)
            .forEach(Context::yield);
    }
}
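The expression (rolesCount - 1) / participantsCount + 1 is integer ceiling division: it gives the smallest per-node quota that still covers every role. A small standalone sketch of that arithmetic, with no Spring dependencies (LeadershipQuota and its method names are hypothetical):

```java
// Hypothetical helper mirroring the maxLeadershipsCount arithmetic.
final class LeadershipQuota {
    // Ceiling of rolesCount / participantsCount for positive inputs.
    static int maxPerNode(int rolesCount, int participantsCount) {
        if (rolesCount <= 0 || participantsCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return (rolesCount - 1) / participantsCount + 1;
    }

    // Number of leaderships a node should yield, given how many it currently holds.
    static int excess(int held, int rolesCount, int participantsCount) {
        return Math.max(0, held - maxPerNode(rolesCount, participantsCount));
    }

    public static void main(String[] args) {
        System.out.println(maxPerNode(6, 3)); // prints 2
        System.out.println(maxPerNode(7, 3)); // prints 3
        System.out.println(excess(6, 6, 3));  // prints 4
    }
}
```

With 6 roles and 3 nodes, the first node to start holds all 6 leaderships and would yield 4 of them, leaving them for the other two nodes to pick up.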

Laravel Scheduling in clustered environment

I am working with scheduling in Laravel 5.3. Previously, I was using one server to host the Laravel application. Now that I am using two servers to run the Laravel app, how do I ensure that both servers are not running the same jobs at the same time?
Recently, I saw an Event method called "withoutOverlapping()". See https://laravel.com/docs/5.3/scheduling#preventing-task-overlaps
In my case, withoutOverlapping() cannot help me as I am working in a clustered environment.
Are there any workarounds or suggestions regarding this?
First of all, decide whether it is critical to avoid running a task multiple times.
For example, if your app uses a task to do some sort of cleanup, there is almost no drawback to running it on every server (who cares if you try to delete messages that are more than 10 minutes old twice?).
If it is absolutely critical to run every task only once, you'll need to define a "main server" that executes tasks and a secondary server that just answers requests but does not perform any task. This is quite trivial, as you just have to give every environment a different name in your .env and test against that when you define the scheduler tasks.
This is the easiest way. Seriously, don't bother building a database locking mechanism or whatever to synchronise tasks across servers. Even operating systems struggle to properly manage synchronisation between threads on the same machine, so why would you want to implement the same across different machines?
Here's what I've done when I ran into the same problems with load balancing:
use Exception;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\Redis;

// Must be declared abstract because it contains abstract methods
abstract class MutexCommand extends Command {
    private $hash = null;

    public function cleanup() {
        if (is_string($this->hash)) {
            Redis::del($this->hash);
            $this->hash = null;
        }
    }

    protected abstract function generateHash();
    protected abstract function handleInternal();

    public final function handle() {
        register_shutdown_function([$this, "cleanup"]);
        try {
            $this->hash = $this->generateHash();
            // Set a value atomically if it does not exist. Fails if it already exists.
            // Essentially, setnx is the mechanism to acquire the lock
            if (!Redis::setnx($this->hash, true)) {
                $this->hash = null; // Prevent it from being cleaned up
                throw new Exception("Already running");
            }
            $this->handleInternal();
        } finally {
            $this->cleanup();
        }
    }
}
Then you can write your commands:
class ThisShouldNotOverlap extends MutexCommand {
public function generateHash() {
return "Unique key for mutex, you can just use the class name if you want by doing return static::class";
}
public function handleInternal() { /* do stuff */ }
}
Then whenever you try to run the same command on multiple instances one would successfully acquire the "lock" and the others should fail.
Of course this assumes that you are using a non-clustered redis cache.
If you are not using Redis, there are probably similar locking mechanisms you can implement in other caches. If you are using a clustered Redis, you may need to use the RedLock locking mechanism.
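The core of the mutex is the "set if absent" step: only the first caller creates the key, and everyone else sees that it already exists. The following Java sketch illustrates just those semantics with an in-memory map standing in for Redis. It is not a Redis client; SetIfAbsentMutex and its methods are hypothetical names, and a real deployment would talk to a shared Redis instance instead.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the Redis key space; putIfAbsent plays the role of SETNX.
final class SetIfAbsentMutex {
    private final ConcurrentMap<String, String> keys = new ConcurrentHashMap<>();

    // Returns true only for the first caller; later callers find the existing key.
    boolean tryAcquire(String key, String owner) {
        return keys.putIfAbsent(key, owner) == null;
    }

    // Removes the key only if it is still held by this owner.
    boolean release(String key, String owner) {
        return keys.remove(key, owner);
    }
}
```

Storing an owner value, rather than just true, is a design choice in this sketch: it lets release refuse to delete a lock that belongs to another server.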
Essentially, no: there is no built-in way in Laravel to know whether another Laravel app has the same job on its job dispatcher.
There are a few options for working around this:
Create an intermediate app that manages the jobs from the other apps.
Allow only one app to dispatch jobs.
Use worker queues. There are some packages for this; I would recommend using Laravel 5 with WebSockets and asynchronous queues.
First of all, the Laravel scheduler isn't designed to work in a clustered environment. It was never intended to work that way.
I would suggest having a dedicated cron instance that manages your Laravel scheduler jobs.
