How to run multiple Apache Ignite nodes on the same JVM?

This is my Java implementation of Ignite as a caching layer.
public static void main(String[] args) throws IOException {
    Properties conf = getConfiguration();

    IgniteConfiguration cfg = new IgniteConfiguration();

    CacheConfiguration configuration = new CacheConfiguration();
    configuration.setName("ignt");

    DataSource dataSource = new DataSource();
    dataSource.setContactPoints(conf.getProperty("cass.contactPoints"));
    RoundRobinPolicy robinPolicy = new RoundRobinPolicy();
    dataSource.setLoadBalancingPolicy(robinPolicy);
    dataSource.setReadConsistency("ONE");
    dataSource.setWriteConsistency("ONE");
    dataSource.setProtocolVersion(4);
    dataSource.setPort(9042);

    configuration.setWriteThrough(true);
    configuration.setReadThrough(true);
    configuration.setWriteBehindEnabled(true);
    configuration.setWriteBehindFlushFrequency(30000);

    String persistenceSettingsXml = FileUtils.readFileToString(new File(conf.getProperty("ignite.persistenceSettings")), "utf-8");
    KeyValuePersistenceSettings persistenceSettings = new KeyValuePersistenceSettings(persistenceSettingsXml);
    System.out.println(persistenceSettings.getKeyspace());

    CassandraCacheStoreFactory cacheStoreFactory = new CassandraCacheStoreFactory();
    cacheStoreFactory.setDataSource(dataSource);
    cacheStoreFactory.setPersistenceSettings(persistenceSettings);

    configuration.setCacheStoreFactory(cacheStoreFactory);
    cfg.setCacheConfiguration(configuration);

    cfg.setGridName("g1");
    Ignite ignite = Ignition.getOrStart(cfg);
    System.out.println(cfg.getNodeId());

    cfg.setGridName("g2");
    Ignite igTwo = Ignition.getOrStart(cfg);
}
Is there a way to run multiple nodes (on localhost) from the same JVM program? If it is not possible to run multiple nodes from the same Java program, is there a way to run all the nodes from the command prompt separately and then connect to them from the Java application?

Yes, you can; in the process of running Java tests we routinely start dozens of Ignite instances in the same JVM. They are lightweight and they start up pretty fast.
You just need to make sure each IgniteConfiguration gets a different igniteInstanceName. Also note that you can't reuse the same IgniteConfiguration object when starting both instances: create a factory method that builds two IgniteConfiguration copies, one for each instance, as in the sketch below.
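A minimal sketch of that approach, assuming Ignite 2.x (setIgniteInstanceName replaces the older setGridName used in the question); the class and method names are illustrative, and the cache/Cassandra store settings from the question are omitted for brevity:

// Sketch: one factory method, two separate IgniteConfiguration instances, two nodes in the same JVM.
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class TwoNodesInOneJvm {

    // Build a fresh configuration for every node; never reuse the same object.
    private static IgniteConfiguration buildConfiguration(String instanceName) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setIgniteInstanceName(instanceName); // use setGridName(...) on Ignite versions before 2.0
        // set the cache configuration / Cassandra cache store factory here, as in the question
        return cfg;
    }

    public static void main(String[] args) {
        Ignite g1 = Ignition.start(buildConfiguration("g1"));
        Ignite g2 = Ignition.start(buildConfiguration("g2"));

        System.out.println(g1.cluster().localNode().id());
        System.out.println(g2.cluster().localNode().id());
    }
}

For the second part of the question: the server nodes can also be started separately from the command line (ignite.sh/ignite.bat), and the Java application can then join them by starting a node with cfg.setClientMode(true).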

Related

How to create an Apache Spark standalone cluster for integration testing using Testcontainers?

Does anyone know how to create an Apache Spark cluster for integration testing using Testcontainers (https://www.testcontainers.org/)?
Any running example, please? I am struggling to find one.
I was able to create this kind of integration test using the GenericContainer class and the bitnami/spark image. The code is below (I wrote it for a library that writes a dataframe to AWS SQS).
The idea is to create a Spark container (in this case not a cluster, just the master node), copy all the files needed to run the test (some Python files and all the dependencies), issue the spark-submit command, and check the final state (a message in Localstack's SQS service in another container).
@Testcontainers
public class SparkIntegrationTest {

    private static Network network = Network.newNetwork();

    @Container
    public LocalStackContainer localstack = new LocalStackContainer(DockerImageName.parse("localstack/localstack:0.12.13"))
            .withNetwork(network)
            .withNetworkAliases("localstack")
            .withServices(SQS);

    @Container
    public GenericContainer spark = new GenericContainer(DockerImageName.parse("bitnami/spark:3.1.2"))
            .withCopyFileToContainer(MountableFile.forHostPath("build/resources/test/.", 0744), "/home/")
            .withCopyFileToContainer(MountableFile.forHostPath("build/libs/.", 0555), "/home/")
            .withNetwork(network)
            .withEnv("AWS_ACCESS_KEY_ID", "test")
            .withEnv("AWS_SECRET_KEY", "test")
            .withEnv("SPARK_MODE", "master");

    @Test
    public void shouldPutASQSMessageInLocalstackUsingSpark() throws IOException, InterruptedException {
        String expectedBody = "my message body"; // the same value in resources/sample.txt

        AmazonSQS sqs = AmazonSQSClientBuilder.standard()
                .withEndpointConfiguration(localstack.getEndpointConfiguration(SQS))
                .withCredentials(localstack.getDefaultCredentialsProvider())
                .build();
        sqs.createQueue("my-test");

        org.testcontainers.containers.Container.ExecResult lsResult =
                spark.execInContainer("spark-submit",
                        "--jars", "/home/spark-aws-messaging-0.3.1.jar,/home/deps/aws-java-sdk-core-1.12.12.jar,/home/deps/aws-java-sdk-sqs-1.12.12.jar",
                        "--master", "local",
                        "/home/sqs_write.py",
                        "/home/sample.txt",
                        "http://localstack:4566");

        System.out.println(lsResult.getStdout());
        System.out.println(lsResult.getStderr());
        assertEquals(0, lsResult.getExitCode());

        String queueUrl = sqs.getQueueUrl("my-test").getQueueUrl()
                .replace("localstack", localstack.getContainerIpAddress());
        List<Message> messages = sqs.receiveMessage(queueUrl)
                .getMessages();

        assertEquals(expectedBody, messages.get(0).getBody());
    }
}
There's still a drawback: it's a black box, so I can't measure code coverage.
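If an actual standalone cluster is needed rather than a single master node, a worker container can be added on the same Docker network. The snippet below is only a sketch layered on top of the code above: it relies on the bitnami/spark image's SPARK_MODE and SPARK_MASTER_URL environment variables, assumes the master container is given the network alias spark-master, and spark-submit would then use --master spark://spark-master:7077 instead of --master local.

// Sketch only: a worker node joining the master above. The master container would also
// need .withNetworkAliases("spark-master") so the worker can resolve it by that name.
@Container
public GenericContainer sparkWorker = new GenericContainer(DockerImageName.parse("bitnami/spark:3.1.2"))
        .withNetwork(network)
        .withEnv("SPARK_MODE", "worker")
        .withEnv("SPARK_MASTER_URL", "spark://spark-master:7077");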

How to initialize SpringContext once and share across tasks?

I am trying to initialize a Spring context in my Spark application. I want the context on my slave nodes as well, and I want to reuse the beans. Here is the code for that:
shipperRD2.foreach(shipper -> {
    AmazonS3 amazonS3Client = AmazonS3ClientBuilder.standard().build();
    FileSystemXmlApplicationContext context2 = new FileSystemXmlApplicationContext("https://s3.console.aws.amazon.com/s3/object/spring-configuration/app-context.xml");
    PersistenceWrapper persistenceWrapper = context2.getBean(PersistenceWrapper.class);
});
However, this leads to a context refresh every time a new task runs on a slave node. Is there any way to avoid this behavior? Basically, I want to initialize the context only on the first task run and reuse that context in the subsequent tasks.
As mentioned by Jacek, I tried the singleton pattern and it worked.
public class SpringInit {

    private static FileSystemXmlApplicationContext context = new FileSystemXmlApplicationContext(fileName);

    private SpringInit() {
    }

    public static FileSystemXmlApplicationContext getInstance() {
        return context;
    }
}
From the Spark job:
shipperRD2.foreach(shipper -> {
    FileSystemXmlApplicationContext context = SpringInit.getInstance();
    PersistenceWrapper persistenceWrapper = context.getBean(PersistenceWrapper.class);
});
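A variation on the same idea, if you prefer the context to be created lazily on first use rather than when the class is loaded (a sketch; the configuration path is a hypothetical placeholder, not taken from the question):

import org.springframework.context.support.FileSystemXmlApplicationContext;

public class SpringInit {

    // Hypothetical placeholder; point this at your actual Spring XML file.
    private static final String FILE_NAME = "/path/to/app-context.xml";

    private SpringInit() {
    }

    // Initialization-on-demand holder: CONTEXT is built only when getInstance() is first called.
    private static class Holder {
        private static final FileSystemXmlApplicationContext CONTEXT =
                new FileSystemXmlApplicationContext(FILE_NAME);
    }

    public static FileSystemXmlApplicationContext getInstance() {
        return Holder.CONTEXT;
    }
}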

JDBC connection pool using ThreadPoolExecutor in Spring Boot

I have an application that runs through multiple databases and, for each database, runs select queries on all tables and dumps the results to Hadoop.
My design is to create one DataSource connection at a time and use the connection pool obtained to run the select queries in multiple threads. Once done with this DataSource, close the connection and create a new one.
Here is the async code:
@Component
public class MySampleService {

    private final static Logger LOGGER = Logger.getLogger(MySampleService.class);

    @Async
    public Future<String> callAsync(JdbcTemplate template, String query) throws InterruptedException {
        try {
            List<Map<String, Object>> rows = template.queryForList(query);
            // process the results
            return new AsyncResult<String>("success");
        } catch (Exception ex) {
            return new AsyncResult<String>("failed");
        }
    }
}
Here is the caller
public String taskExecutor() throws InterruptedException, ExecutionException {
    Future<String> asyncResult1 = mySampleService.callAsync(jdbcTemplate, query1);
    Future<String> asyncResult2 = mySampleService.callAsync(jdbcTemplate, query2);
    Future<String> asyncResult3 = mySampleService.callAsync(jdbcTemplate, query3);
    Future<String> asyncResult4 = mySampleService.callAsync(jdbcTemplate, query4);

    LOGGER.info(asyncResult1.get());
    LOGGER.info(asyncResult2.get());
    LOGGER.info(asyncResult3.get());
    LOGGER.info(asyncResult4.get());

    // now all threads finished, close the connection
    jdbcTemplate.getConnection().close();
}
I am wondering whether this is the right way to do it, or whether there is an existing/optimized out-of-the-box solution I am missing. I can't use spring-data-jpa since my queries are complex.
Thanks
Spring Boot docs:

Production database connections can also be auto-configured using a pooling DataSource. Here’s the algorithm for choosing a specific implementation:

1. We prefer the Tomcat pooling DataSource for its performance and concurrency, so if that is available we always choose it.
2. Otherwise, if HikariCP is available we will use it.
3. If neither the Tomcat pooling DataSource nor HikariCP are available and if Commons DBCP is available we will use it, but we don’t recommend it in production.
4. Lastly, if Commons DBCP2 is available we will use it.

If you use the spring-boot-starter-jdbc or spring-boot-starter-data-jpa ‘starters’ you will automatically get a dependency to tomcat-jdbc.

So you should be provided with sensible defaults.
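If you still want one pool per database, created and torn down as the application walks through the databases (the design described in the question), a sketch of that could look like the following. It assumes HikariCP is on the classpath; the class and method names are illustrative, not part of the question's code.

// Sketch: build a pooled DataSource per database, hand a JdbcTemplate to the async
// workers, and close the pool once all queries for that database have finished.
// Assumes HikariCP is available (the default pool in recent Spring Boot versions;
// older versions prefer tomcat-jdbc, as quoted above).
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class PerDatabaseDump {

    public void dumpDatabase(String jdbcUrl, String user, String password) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setUsername(user);
        config.setPassword(password);
        config.setMaximumPoolSize(4); // one connection per concurrent query

        try (HikariDataSource dataSource = new HikariDataSource(config)) {
            JdbcTemplate template = new JdbcTemplate(dataSource);
            // submit the async select queries with this template and wait for the Futures;
            // the pool is closed automatically when the try-with-resources block ends
        }
    }
}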

Understanding HBase Java Client

I started HBase a few days back and am going through all the material available online.
I have installed and configured HBase, and the shell commands are working fine.
I got an example of a Java client to get data from an HBase table, and it executed successfully, but I could not understand how it works. Nowhere in the code have we mentioned the port or host of the HBase server, so how is it able to fetch the data from the table?
This is my code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RetriveData {

    public static void main(String[] args) throws IOException {
        // Instantiating Configuration class
        Configuration config = HBaseConfiguration.create();

        // Instantiating HTable class
        @SuppressWarnings({ "deprecation", "resource" })
        HTable table = new HTable(config, "emp");

        // Instantiating Get class
        Get g = new Get(Bytes.toBytes("1"));

        // Reading the data
        Result result = table.get(g);

        // Reading values from Result class object
        byte[] value = result.getValue(Bytes.toBytes("personal data"), Bytes.toBytes("name"));
        byte[] value1 = result.getValue(Bytes.toBytes("personal data"), Bytes.toBytes("city"));

        // Printing the values
        String name = Bytes.toString(value);
        String city = Bytes.toString(value1);
        System.out.println("name: " + name + " city: " + city);
    }
}
The output looks like:

name: raju city: hyderabad
I agree with Binary Nerd's answer, and I am adding some more information for better understanding.
Your question:

"I could not understand how it works. Nowhere in the code have we mentioned the port or host of the HBase server, so how is it able to fetch the data from the table?"
Since you are executing this program inside the cluster:

// Instantiating Configuration class
Configuration config = HBaseConfiguration.create();

all the cluster properties are picked up from the cluster's own configuration, because you are running the HBase Java client program on a node that already has those settings.
Now try it as shown below (run the same program from a remote machine, for example Eclipse on Windows, to see the difference between what you did earlier and this approach).
public static Configuration configuration; // this is a class variable

static { // fill clusternode1, clusternode2, clusternode3 from your cluster
    configuration = HBaseConfiguration.create();
    configuration.set("hbase.zookeeper.property.clientPort", "2181");
    configuration.set("hbase.zookeeper.quorum", "clusternode1,clusternode2,clusternode3");
    configuration.set("hbase.master", "clusternode1:60000");
}
Hope this helps you to understand.
If you look at the source code for HBaseConfiguration on GitHub, you can see what it does when you call create().
public static Configuration create() {
    Configuration conf = new Configuration();
    // In case HBaseConfiguration is loaded from a different classloader than
    // Configuration, conf needs to be set with appropriate class loader to resolve
    // HBase resources.
    conf.setClassLoader(HBaseConfiguration.class.getClassLoader());
    return addHbaseResources(conf);
}

Followed by:

public static Configuration addHbaseResources(Configuration conf) {
    conf.addResource("hbase-default.xml");
    conf.addResource("hbase-site.xml");
    checkDefaultsVersion(conf);
    HeapMemorySizeUtil.checkForClusterFreeMemoryLimit(conf);
    return conf;
}
So it's loading the configuration from your HBase configuration files, hbase-default.xml and hbase-site.xml, which are picked up from the classpath.
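As an aside (an illustrative sketch, not part of the original answer): if those files are not on the client machine's classpath, the same Hadoop Configuration API also accepts an explicit path, which is an alternative to setting the ZooKeeper quorum properties by hand as in the static block above.

// Sketch: loading hbase-site.xml from an explicit path instead of the classpath.
// The /etc/hbase/conf path is hypothetical; adjust it to wherever your cluster config lives.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ExplicitHBaseConfig {
    public static Configuration load() {
        Configuration config = HBaseConfiguration.create();
        config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
        return config;
    }
}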

Set heartbeatIntervalSeconds using Spring XML

I am using spring-data-cassandra v1.3.2 in my project.
Is it possible to set heartbeatIntervalSeconds using the Spring XML configuration file?
I am getting 4 lines of heartbeat DEBUG logs every 30 seconds in my application logs, and I am not sure how to avoid them.
Unfortunately, no.
After reviewing the SD Cassandra CassandraCqlClusterParser class, it is apparent that you can specify both "local" and "remote" connection pooling options, however, neither handler handles all the Cassandra Java driver "pooling options" appropriately (such as heartbeatIntervalSeconds).
It appears several other options are missing as well: idleTimeoutSeconds, initializationExecutor, poolTimeoutMillis, and protocolVersion.
Equally unfortunate is it appears the SD Cassandra PoolOptionsFactoryBean does not support these "pooling options" either.
However, not all is lost.
While your SD Cassandra application may resolve its configuration primarily from XML, that does not preclude you from using a combination of Java config and XML.
For instance, you could use a Spring Java config class to configure your cluster and express your PoolingOptions in Java config...
@Configuration
@ImportResource("/class/path/to/cassandra/config.xml")
class CassandraConfig {

    @Bean
    PoolingOptions poolingOptions() {
        PoolingOptions poolingOptions = new PoolingOptions();
        poolingOptions.setHeartbeatIntervalSeconds(30);
        poolingOptions.setIdleTimeoutSeconds(300);
        poolingOptions.setMaxConnectionsPerHost(50);
        poolingOptions.set...
        return poolingOptions;
    }

    @Bean
    CassandraClusterFactoryBean cluster() {
        CassandraClusterFactoryBean cluster = new CassandraClusterFactoryBean();
        cluster.setContactPoints("..");
        cluster.setPort(1234);
        cluster.setPoolingOptions(poolingOptions());
        cluster.set...
        return cluster;
    }
}
Hope this helps.
As an FYI, you may want to upgrade to the "current" Spring Data Cassandra version, 1.4.1.RELEASE.
Sadly, the answer is no. It's not possible to configure the heartbeat interval using XML configuration. Only the following local/remote properties can be configured in PoolingOptions:
min-simultaneous-requests
max-simultaneous-requests
core-connections
max-connections
If you switch to Java-based configuration, then you're able to configure PoolingOptions by extending AbstractClusterConfiguration:
#Configuration
public class MyConfig extends AbstractClusterConfiguration {
#Override
protected PoolingOptions getPoolingOptions() {
PoolingOptions poolingOptions = new PoolingOptions();
poolingOptions.setHeartbeatIntervalSeconds(10);
return poolingOptions
}
}