Hazelcast persisting and loading data on all nodes - caching

I have a 2-node distributed cache setup which needs persistence configured on both members.
I have MapStore and MapLoader implemented, and the same code is deployed on both nodes.
The MapStore and MapLoader work absolutely fine on a single-member setup, but after another member joins, the MapStore and MapLoader continue to run only on the first member, and all inserts or updates made by the second member are persisted to disk via the first member.
My requirement is that each member should be able to persist to disk independently, so that the distributed cache is backed up on all members and not just the first member.
Is there a setting I can change to achieve this?
Here is my Hazelcast Spring configuration.
@Bean
public HazelcastInstance hazelcastInstance(H2MapStorage h2mapStore) throws IOException {
    MapStoreConfig mapStoreConfig = new MapStoreConfig();
    mapStoreConfig.setImplementation(h2mapStore);
    mapStoreConfig.setWriteDelaySeconds(0);
    YamlConfigBuilder configBuilder = null;
    if (new File(hazelcastConfiglocation).exists()) {
        configBuilder = new YamlConfigBuilder(hazelcastConfiglocation);
    } else {
        configBuilder = new YamlConfigBuilder();
    }
    Config config = configBuilder.build();
    config.setProperty("hazelcast.jmx", "true");
    MapConfig mapConfig = config.getMapConfig("requests");
    mapConfig.setMapStoreConfig(mapStoreConfig);
    return Hazelcast.newHazelcastInstance(config);
}
Here is my Hazelcast YAML config. It is placed in /opt/hazlecast.yml, which is picked up by my Spring config above.
hazelcast:
  group:
    name: tsystems
  management-center:
    enabled: false
    url: http://localhost:8080/hazelcast-mancenter
  network:
    port:
      auto-increment: true
      port-count: 100
      port: 5701
    outbound-ports:
      - 0
    join:
      multicast:
        enabled: false
        multicast-group: 224.2.2.3
        multicast-port: 54327
      tcp-ip:
        enabled: true
        member-list:
          - 192.168.1.13
The entire code is available here:
https://bitbucket.org/samrat_roy/hazelcasttest/src/master/

This might just be bad luck and low data volumes, rather than an actual error.
On each node, try running the localKeySet() method and printing the results.
This will tell you which keys are on which node in the cluster. The node that owns key "X" will invoke the map store for that key, even if the update was initiated by another node.
If you have low data volumes, it may not be a 50/50 data split. At an extreme, 2 data records in a 2-node cluster could have both data records on the same node.
If you have 1,000 data records, it's pretty unlikely that they'll all be on the same node.
So the other thing to try is to add more data and update all of it, to see if both nodes participate.
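For example, you could run something like this on each member and compare the output (a minimal sketch; hazelcastInstance is the bean from the question's configuration and "requests" is the map name used there):
// Minimal sketch: print the keys this member owns for the "requests" map.
// Run the same snippet on every member and compare the output.
IMap<String, Object> requests = hazelcastInstance.getMap("requests");
System.out.println("Keys owned by this member: " + requests.localKeySet());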

OK, after struggling a lot I noticed a teeny tiny but critical detail:
Datastore needs to be a centralized system that is accessible from all Hazelcast members. Persistence to a local file system is not supported.
This is absolutely in line with what I was observing.
https://docs.hazelcast.org/docs/latest/manual/html-single/#loading-and-storing-persistent-data
However, don't be discouraged; I found out that I could use event listeners to do the same thing I needed to do.
@Component
public class HazelCastEntryListner
        implements EntryAddedListener<String, Object>, EntryUpdatedListener<String, Object>, EntryRemovedListener<String, Object>,
        EntryEvictedListener<String, Object>, EntryLoadedListener<String, Object>, MapEvictedListener, MapClearedListener {

    @Autowired
    @Lazy
    private RequestDao requestDao;
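(The snippet above omits the listener method bodies. A minimal sketch of how the add/update/remove callbacks might delegate to the DAO follows; the RequestDao method names are assumptions, not taken from the linked repository.)
    @Override
    public void entryAdded(EntryEvent<String, Object> event) {
        // persist the new entry to this node's embedded database (DAO method name assumed)
        requestDao.save(event.getKey(), event.getValue());
    }

    @Override
    public void entryUpdated(EntryEvent<String, Object> event) {
        // overwrite the stored value with the latest one (DAO method name assumed)
        requestDao.save(event.getKey(), event.getValue());
    }

    @Override
    public void entryRemoved(EntryEvent<String, Object> event) {
        // remove the entry from this node's embedded database (DAO method name assumed)
        requestDao.delete(event.getKey());
    }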
I created this class and hooked it into the config like so:
MapConfig mapConfig = config.getMapConfig("requests");
mapConfig.addEntryListenerConfig(new EntryListenerConfig(entryListner, false, true));
return Hazelcast.newHazelcastInstance(config);
This worked flawlessly; I am able to replicate data to the embedded database on each node.
My use case was to cover HA failover edge cases. During HA failover, the slave node needed to know the working memory of the active node.
I am not using Hazelcast as a cache; rather, I am using it as a data-syncing mechanism.

Related

Clear remote Redis cache in Spring Boot application

I'm using Spring Boot 2.3 and the default cache mechanism configured via app.properties.
I defined all values:
spring.cache.type = redis
spring.redis.host = host
spring.redis.port = port
spring.redis.timeout = 4000
spring.redis.password = psw
spring.cache.redis.time-to-live = 28800000
I take advantage of the cache in a Spring repository, for example:
@Cacheable(cacheNames = "contacts")
@Override
Page<Contact> findAll(Specification specification, Pageable pageable);
It works as expected. However, the Redis cluster is shared by a couple of my applications, and I need the second application to be able to remove some or all keys in Redis.
Application A1 takes advantage of the cache and puts keys into it. Application A2 needs to clear some or all of those keys.
In A2 I did:
cacheManager.getCacheNames().forEach(cacheName -> cacheManager.getCache(cacheName).clear());
but of course the list of cache names is empty, because this app doesn't add keys to the cache and, in any case, it doesn't have the same keys as A1.
I need to list the remote keys and then clear them. Is there a simple way to do this without using the Spring Data Redis library?
You could define a separate prefix for your entire cache in Redis, something like a namespace for your cache entries.
Afterwards you can flush all keys in this namespace.
Note: ensure that the CacheManager has only a Redis cache and doesn't have an in-memory cache (L1).
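As an illustration (a sketch, not the definitive answer): with Spring's default key layout, entries of the "contacts" cache are stored under the "contacts::" prefix, so A2 can SCAN for that pattern and delete the matches with a plain Jedis client, without Spring Data Redis. The host, password, and prefix below are assumptions taken from the question's placeholders.
// Sketch: delete all keys of the "contacts" cache from another application using plain Jedis.
try (Jedis jedis = new Jedis("host", 6379)) {
    jedis.auth("psw");
    String cursor = ScanParams.SCAN_POINTER_START;
    ScanParams params = new ScanParams().match("contacts::*").count(500);
    do {
        ScanResult<String> result = jedis.scan(cursor, params);
        List<String> keys = result.getResult();
        if (!keys.isEmpty()) {
            jedis.del(keys.toArray(new String[0]));
        }
        cursor = result.getCursor();
    } while (!"0".equals(cursor));
}
SCAN is used instead of KEYS so the command does not block the Redis cluster shared by the other applications.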

Changing hazelcast configuration at runtime

Is it possible to change Hazelcast configuration at runtime and, if so, what parameters are modifiable?
It seems to be possible using Hazelcast Management Center, but I can't find any examples/references in the official docs/forums.
Might be a bit late to answer your question but better late than never :)
You can modify some of the map config properties after the map has been created using the MapService:
HazelcastInstance instance = Hazelcast.newHazelcastInstance();
// create map
IMap<String, Integer> myMap = instance.getMap("myMap");
// create a new map config
MapConfig newMapConfig = instance.getConfig().getMapConfig("myMap").setAsyncBackupCount(1);
// submit the new map config to the map service
MapService mapService = (MapService)(((AbstractDistributedObject)instance.getDistributedObject(MapService.SERVICE_NAME, "")).getService());
mapService.getMapServiceContext().getMapContainer("myMap").setMapConfig(newMapConfig);
Note that this API is not visible/documented so it might not work in future versions.
We are using this in our application when we need to insert several million entries in a distributed map at startup. Disabling the backup cut the insertion time by 30%. After the data are inserted, we enable the backup.
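A rough sketch of that startup flow, reusing the mapService handle obtained above (the map name "myMap" and the millionsOfEntries variable are placeholders, and this relies on the same undocumented API):
// 1. Switch backups off before the bulk load
MapConfig noBackups = instance.getConfig().getMapConfig("myMap")
        .setBackupCount(0).setAsyncBackupCount(0);
mapService.getMapServiceContext().getMapContainer("myMap").setMapConfig(noBackups);

// 2. Bulk-insert the data while no backup copies are maintained
myMap.putAll(millionsOfEntries);

// 3. Re-enable the backup once the load is finished
MapConfig withBackup = instance.getConfig().getMapConfig("myMap").setAsyncBackupCount(1);
mapService.getMapServiceContext().getMapContainer("myMap").setMapConfig(withBackup);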
The Hazelcast internals are not really designed to be modifiable. What do you want to modify?

Connecting to elasticsearch cluster in NEST

Let's assume I have several elasticsearch machines in a cluster: 192.168.1.1, 192.168.1.2 and 192.168.1.3
Any of the machines can go down. It doesn't look like NEST supports providing a range of IPs to try to connect.
So how do I make sure I connect to any of the available machines from NEST? Just try to open a connection to one and, if TryConnect didn't work, try another?
You can run a local ES instance on your application server (e.g. your web server) and configure it to work as a load balancer:
- Set node.client: true (or node.master: false and node.data: false) in this local ES config to make it a load balancer. This means the node will neither become master nor hold data.
- Configure it to join the cluster (your 3 nodes don't need to know about this ES).
- Configure NEST to use the local ES as your search server.
This ES then becomes part of your cluster and will distribute your requests to suitable nodes.
If you don't want a "load balancer", then you have to manually check on the client side to determine which node is alive.
Since you have a small set of nodes, you can use a StaticConnectionPool:
var uri1 = new Uri("http://192.168.1.1:9200");
var uri2 = new Uri("http://192.168.1.2:9200");
var uri3 = new Uri("http://192.168.1.3:9200");
var uris = new List<Uri> { uri1, uri2, uri3 };
var connectionPool = new StaticConnectionPool(uris);
var connectionSettings = new ConnectionSettings(connectionPool); // <-- needs to be reused
var client = new ElasticClient(connectionSettings);
An important point to keep in mind is to reuse the same ConnectionSettings when creating a new ElasticClient, since the client's caches are per ConnectionSettings. See this GitHub post:
...In any case its important to share the same ConnectionSettings
instance across any elastic client you instantiate. ElasticClient can
be a singleton or not as long as each instance shares the same
ConnectionSettings instance.
All of our caches are per ConnectionSettings, this includes
serialization caches.
Also a single ConnectionSettings holds a single IConnectionPool and
IConnection something you definitely want to reuse across requests.
I would set up one of the nodes as a load balancer, meaning that the URL you are calling should always be up.
Though if you increase the number of replicas, you can call any of the nodes by URL and still access the same data. Elasticsearch does not care which one you access while in a cluster, so you could build your own range of IPs in your application.

Get Hbase region size via API

I am trying to write a balancer tool for HBase which could balance regions across RegionServers for a table by region count and/or region size (sum of store file sizes). I could not find any HBase API class which returns the region size or related info. I have already checked a few of the classes which could be used to get other table/region info, e.g. org.apache.hadoop.hbase.client.HTable and HBaseAdmin.
I am thinking another way this could be implemented is by using one of the Hadoop classes which return the size of directories in the filesystem; for example, org.apache.hadoop.fs.FileSystem lists the files under a particular HDFS path.
Any suggestions?
I use this to do managed splits of regions, but you could leverage it to load-balance on your own. I also load-balance myself to spread the regions (of a given table) evenly across our nodes so that MR jobs are evenly distributed.
Perhaps the code-snippet below is useful?
final HBaseAdmin admin = new HBaseAdmin(conf);
final ClusterStatus clusterStatus = admin.getClusterStatus();
for (ServerName serverName : clusterStatus.getServers()) {
    final HServerLoad serverLoad = clusterStatus.getLoad(serverName);
    for (Map.Entry<byte[], HServerLoad.RegionLoad> entry : serverLoad.getRegionsLoad().entrySet()) {
        final String region = Bytes.toString(entry.getKey());
        final HServerLoad.RegionLoad regionLoad = entry.getValue();
        long storeFileSize = regionLoad.getStorefileSizeMB();
        // other useful things in regionLoad if you like
    }
}
What's wrong with the default Load Balancer?
From the Wiki:
The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via hbase.balancer.period and defaults to 300000 (5 minutes).
If you really want to do it yourself, you could indeed use the Hadoop API and, more specifically, the FileStatus class. This class acts as an interface to represent the client-side information for a file.
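For instance, here is a rough sketch of summing the on-disk size per region with FileSystem/FileStatus; the /hbase/&lt;table&gt;/&lt;encoded-region&gt; directory layout and the table name are assumptions that depend on your HBase version:
// Rough sketch: total store file size per region directory of one table.
Configuration conf = HBaseConfiguration.create();
FileSystem fs = FileSystem.get(conf);
Path tableDir = new Path("/hbase/myTable"); // table name and layout are assumptions
for (FileStatus regionDir : fs.listStatus(tableDir)) {
    if (!regionDir.isDirectory()) {
        continue; // skip files such as .tableinfo
    }
    // ContentSummary.getLength() sums the size of every file below the region directory
    long regionSizeBytes = fs.getContentSummary(regionDir.getPath()).getLength();
    System.out.println(regionDir.getPath().getName() + " -> " + regionSizeBytes + " bytes");
}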

Run #Scheduled task only on one WebLogic cluster node?

We are running a Spring 3.0.x web application (.war) with a nightly @Scheduled job in a clustered WebLogic 10.3.4 environment. However, as the application is deployed to each node (using the deployment wizard in the AdminServer's web console), the job is started on each node every night, thus running multiple times concurrently.
How can we prevent this from happening?
I know that libraries like Quartz allow coordinating jobs inside a clustered environment by means of a database lock table, and I could even implement something like this myself. But since this seems to be a fairly common scenario, I wonder if Spring doesn't already come with an option to easily circumvent this problem without having to add new libraries to my project or put in manual workarounds.
We are not able to upgrade to Spring 3.1 with configuration profiles, as mentioned here
Please let me know if there are any open questions. I also asked this question on the Spring Community forums. Thanks a lot for your help.
We only have one task that sends a daily summary email. To avoid extra dependencies, we simply check whether the hostname of each node corresponds with a configured system property.
private boolean isTriggerNode() {
    try {
        String triggerHostname = System.getProperty("trigger.hostname");
        String hostName = InetAddress.getLocalHost().getHostName();
        return hostName.equals(triggerHostname);
    } catch (UnknownHostException e) {
        // if the local hostname cannot be resolved, don't trigger the job on this node
        return false;
    }
}

public void execute() {
    if (isTriggerNode()) {
        // send email
    }
}
We are implementing our own synchronization logic using a shared lock table inside the application database. This allows all cluster nodes to check whether a job is already running before actually starting it.
Be careful: with the solution of implementing your own synchronization logic using a shared lock table, you always have the concurrency issue of two cluster nodes reading/writing to the table at the same time.
It is best to perform the following steps in one DB transaction (a sketch is shown after the list):
- read the value in the shared lock table
- if no other node is holding the lock, take the lock
- update the table to indicate that you have taken the lock
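A minimal sketch of those three steps with plain JDBC; the table name, columns, job name, and lock duration are all assumptions, and dataSource is the application's DataSource:
// Minimal sketch: take the lock inside one transaction so only one node runs the job.
// Table and columns (SCHEDULED_LOCK, JOB_NAME, LOCKED_UNTIL) are assumptions,
// and the SCHEDULED_LOCK row for the job is assumed to exist already.
try (Connection con = dataSource.getConnection()) {
    con.setAutoCommit(false);
    boolean lockedByOtherNode = false;
    try (PreparedStatement select = con.prepareStatement(
            "SELECT LOCKED_UNTIL FROM SCHEDULED_LOCK WHERE JOB_NAME = ? FOR UPDATE")) {
        select.setString(1, "nightlySummary");
        try (ResultSet rs = select.executeQuery()) {
            if (rs.next()) {
                Timestamp lockedUntil = rs.getTimestamp(1);
                lockedByOtherNode = lockedUntil != null
                        && lockedUntil.after(new Timestamp(System.currentTimeMillis()));
            }
        }
    }
    if (lockedByOtherNode) {
        con.rollback();   // another node holds the lock; skip this run
    } else {
        try (PreparedStatement update = con.prepareStatement(
                "UPDATE SCHEDULED_LOCK SET LOCKED_UNTIL = ? WHERE JOB_NAME = ?")) {
            update.setTimestamp(1, new Timestamp(System.currentTimeMillis() + 10 * 60 * 1000));
            update.setString(2, "nightlySummary");
            update.executeUpdate();
        }
        con.commit();     // lock taken and committed; safe to run the job now
    }
}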
I solved this problem by making one of the boxes the master.
Basically, set an environment variable on one of the boxes, like master=true,
and read it in your Java code through System.getenv("master").
If it's present and true, then run your code.
Basic snippet:
@Scheduled
void process() {
    boolean master = Boolean.parseBoolean(System.getenv("master"));
    if (master) {
        // your logic
    }
}
You can try using TimerManager (the Job Scheduler in a clustered environment) from WebLogic as the TaskScheduler implementation (TimerManagerTaskScheduler). It should work in a clustered environment.
Andrea
I've recently implemented a simple annotation library, dlock, to execute a scheduled task only once over multiple nodes. You can simply do something like below.
@Scheduled(cron = "59 59 8 * * *" /* every day at 8:59:59 am */)
@TryLock(name = "emailLock", owner = NODE_NAME, lockFor = TEN_MINUTE)
public void sendEmails() {
    List<Email> emails = emailDAO.getEmails();
    emails.forEach(email -> sendEmail(email));
}
See my blog post about using it.
You don't need to synchronize your job start using a DB.
In a WebLogic application you can get the name of the server instance where the application is running:
String serverName = System.getProperty("weblogic.Name");
Simply put a condition on executing the job:
if (serverName.equals(".....")) {
    // execute my job
}
If you want to bounce your job from one machine to the other, you can get the current day of the year; if it is odd you execute the job on one machine, and if it is even you execute it on the other one.
This way you load a different machine every day.
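A minimal sketch of that alternation (the two server names are placeholders):
// Alternate the nightly job between two named WebLogic servers by day of year.
String serverName = System.getProperty("weblogic.Name");
int dayOfYear = Calendar.getInstance().get(Calendar.DAY_OF_YEAR);
boolean oddDay = dayOfYear % 2 == 1;
if ((oddDay && "node1".equals(serverName)) || (!oddDay && "node2".equals(serverName))) {
    // execute my job
}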
We can make the other machines in the cluster not run the batch job by using the following cron string. It will not run until 2099.
0 0 0 1 1 ? 2099
