ClassNotFound error for Ignite User Defined Function in Flink Cluster - cluster-computing

I am trying to cache data streamed by Apache Flink into an Apache Ignite cache. I also want to run a query that uses a user-defined function. As per the Ignite documentation, I am using the cacheConf.setSqlFunctionClasses(GetCacheKey.class) setting while declaring the cache. The class declaration is as follows:
public static class GetCacheKey implements Serializable {
    @QuerySqlFunction
    public static long getCacheKey(int mac, long local) {
        long key = (local << 5) + mac;
        return key;
    }
}
When I run the code locally with Apache Flink, it works. But when I execute the code on a Flink cluster, I get an error that the GetCacheKey class is not found. What could be the reason behind this?

Please check whether GetCacheKey.class is on the classpath of the Ignite nodes.

The Flink directory must be available on every worker under the same path. You can use a shared NFS directory, or copy the entire Flink directory to every worker node.
Also ensure the Ignite libraries are present on the worker nodes' classpath.
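For illustration, a minimal sketch of the cache declaration (the cache name and key/value types below are placeholders, not taken from the question). The class passed to setSqlFunctionClasses must also be deployed to every Ignite server node, for example in its libs directory, because the SQL engine loads it on the server side rather than on the Flink workers:
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class UdfCacheSetup {
    public static void main(String[] args) {
        // Register the class holding @QuerySqlFunction methods on the cache configuration.
        CacheConfiguration<Long, String> cacheConf = new CacheConfiguration<>("sensorCache");
        cacheConf.setSqlFunctionClasses(GetCacheKey.class);
        cacheConf.setIndexedTypes(Long.class, String.class); // enables SQL on this cache

        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache(cacheConf);
            // getCacheKey(mac, local) is now usable in SQL queries on this cache,
            // but only if GetCacheKey is also on the classpath of every server node.
        }
    }
}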

Related

How to restart an Ignite server with Spring config?

I have Ignite server nodes in my application with the following configuration, and this application is clustered, hence there can be multiple Ignite servers.
The Ignite config looks like this:
@Bean
public Ignite igniteInstance(JdbcIpFinderDialect ipFinderDialect, DataSource dataSource) {
    IgniteConfiguration cfg = new IgniteConfiguration();
    cfg.setGridLogger(new Slf4jLogger());
    cfg.setMetricsLogFrequency(0);

    TcpDiscoverySpi discoSpi = new TcpDiscoverySpi()
        .setIpFinder(new TcpDiscoveryJdbcIpFinder(ipFinderDialect)
            .setDataSource(dataSource)
            .setInitSchema(false));
    cfg.setDiscoverySpi(discoSpi);

    cfg.setCacheConfiguration(cacheConfigurations.toArray(new CacheConfiguration[0]));
    cfg.setFailureDetectionTimeout(igniteFailureDetectionTimeout);
    return Ignition.start(cfg);
}
But at some point, after running for a day or so, Ignite falls over with errors along the following lines:
o.a.i.spi.discovery.tcp.TcpDiscoverySpi : Node is out of topology (probably, due to short-time network problems
o.a.i.i.m.d.GridDiscoveryManager : Local node SEGMENTED: TcpDiscoveryNode [id=db3eb958-df2c-4211-b2b4-ba660bc810b0, addrs=[10.0.0.1], sockAddrs=[sd-9fdb-a8cb.nam.nsroot.net/10.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1612755975209, loc=true, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]
ROOT : Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SEGMENTATION, err=null]]
o.a.i.i.p.failure.FailureProcessor : Ignite node is in invalid state due to a critical failure.
ROOT : Stopping local node on Ignite failure: [failureCtx=FailureContext [type=SEGMENTATION, err=null]]
o.a.i.i.m.d.GridDiscoveryManager : Node FAILED: TcpDiscoveryNode [id=4d84f811-1c04-4f80-b269-a0003fbf7861, addrs=[10.0.0.1], sockAddrs=[sd-dc95-412b.nam.nsroot.net/10.0.0.1:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1612707966704, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]
o.a.i.i.p.cache.GridCacheProcessor : Stopped cache [cacheName=cacheOne]
o.a.i.i.p.cache.GridCacheProcessor : Stopped cache [cacheName=cacheTwo]
Whenever my application's client nodes try to write to the server cache, they fail with an error:
java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): cacheOne
I am looking for a way to restart my Ignite server node if it fails due to such SEGMENTATION faults. Some suggestions say that I should implement AbstractFailureHandler and pass that implementation to setFailureHandler, but I could not find any examples.
You cannot restart an Ignite server node, so if you're using it in a Spring context you need a new context (which usually means restarting the application).
A client node will try to reconnect, but if it can't, the same applies.
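Regarding the AbstractFailureHandler question: one common workaround, sketched below under the assumption that an external supervisor (systemd, Kubernetes, a wrapper script) restarts the process, is a handler that halts the JVM on segmentation so that a fresh application, with a fresh Spring context and a new Ignite node, comes up. The class name and exit code are illustrative:
import org.apache.ignite.Ignite;
import org.apache.ignite.failure.AbstractFailureHandler;
import org.apache.ignite.failure.FailureContext;
import org.apache.ignite.failure.FailureType;

public class ExitOnSegmentationFailureHandler extends AbstractFailureHandler {

    @Override
    protected boolean handle(Ignite ignite, FailureContext failureCtx) {
        if (failureCtx.type() == FailureType.SEGMENTATION) {
            // Halt the JVM (skipping shutdown hooks) so the process supervisor
            // brings up a clean application and therefore a new Ignite node.
            Runtime.getRuntime().halt(2);
        }
        return true; // any other critical failure also invalidates this node
    }
}
It would then be registered on the configuration shown above with cfg.setFailureHandler(new ExitOnSegmentationFailureHandler()); before Ignition.start(cfg).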

How to run Hadoop as part of the test suite of a Spring application?

I would like to set up a simple "Hello, World!" to get an understanding of how to use basic Hadoop functionality such as storing/reading files using HDFS.
Is it possible to:
Run an embedded Hadoop as part of my application?
Run an embedded Hadoop as part of my tests?
I would like to put together a minimal Spring Boot setup for this. What is the minimal Spring configuration required? There are sufficient examples illustrating how to read/write files using HDFS, but I still haven't been able to work out what I need as Spring configuration. It's a bit hard to figure out which libraries one really needs, as it seems that the Spring Hadoop examples are out of date. Any help would be much appreciated.
You can easily use the Hadoop FileSystem API with any local POSIX filesystem, without a Hadoop cluster.
The Hadoop API is very generic and provides many concrete implementations for different storage systems such as HDFS, S3, Azure Data Lake Store, etc.
You can embed HDFS within your application (i.e. run the Namenode and Datanodes within a single JVM process), but this is only reasonable for tests.
There is the Hadoop Minicluster, which you can start from the command line (CLI MiniCluster) or via the Java API in your unit tests with the MiniDFSCluster class found in the hadoop-minicluster package.
You can start the mini cluster with Spring by making a separate configuration for it and using it as @ContextConfiguration with your unit tests.
@org.springframework.context.annotation.Configuration
public class MiniClusterConfiguration {

    @Bean(name = "temp-folder", initMethod = "create", destroyMethod = "delete")
    public TemporaryFolder temporaryFolder() {
        return new TemporaryFolder();
    }

    @Bean
    public Configuration configuration(final TemporaryFolder temporaryFolder) {
        final Configuration conf = new Configuration();
        conf.set(
            MiniDFSCluster.HDFS_MINIDFS_BASEDIR,
            temporaryFolder.getRoot().getAbsolutePath()
        );
        return conf;
    }

    @Bean(destroyMethod = "shutdown")
    public MiniDFSCluster cluster(final Configuration conf) throws IOException {
        final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
            .clusterId(String.valueOf(this.hashCode()))
            .build();
        cluster.waitClusterUp();
        return cluster;
    }

    @Bean
    public FileSystem fileSystem(final MiniDFSCluster cluster) throws IOException {
        return cluster.getFileSystem();
    }

    @Bean
    @Primary
    @Scope(BeanDefinition.SCOPE_PROTOTYPE)
    public Path temp(final FileSystem fs) throws IOException {
        final Path path = new Path("/tmp", UUID.randomUUID().toString());
        fs.mkdirs(path);
        return path;
    }
}
You will inject FileSystem and a temporary Path into your tests, and as I've mentioned above, there's no difference from an API standpoint whether it's a real cluster, a mini cluster, or the local filesystem. Note that there is a startup cost, so you'll likely want to annotate your tests with @DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD) in order to prevent a cluster restart for each test.
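A hedged sketch of what such a test could look like (the test class name and file contents are illustrative), wiring in the configuration above via @ContextConfiguration and injecting the FileSystem and temporary Path beans:
import static org.junit.Assert.assertEquals;

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = MiniClusterConfiguration.class)
public class HdfsSmokeTest {

    @Autowired
    private FileSystem fs;  // backed by the MiniDFSCluster bean

    @Autowired
    private Path temp;      // fresh /tmp/<uuid> directory per injection

    @Test
    public void writesAndReadsBackAFile() throws IOException {
        Path file = new Path(temp, "hello.txt");
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("Hello, World!");
        }
        try (FSDataInputStream in = fs.open(file)) {
            assertEquals("Hello, World!", in.readUTF());
        }
    }
}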
If you want this code to run on Windows, you will need a compatibility layer called winutils (which makes it possible to access the Windows filesystem in a POSIX-like way).
You have to point the HADOOP_HOME environment variable to it and, depending on the version, load its shared library:
String HADOOP_HOME = System.getenv("HADOOP_HOME");
System.setProperty("hadoop.home.dir", HADOOP_HOME);
System.setProperty("hadoop.tmp.dir", System.getProperty("java.io.tmpdir"));
final String lib = String.format("%s/lib/hadoop.dll", HADOOP_HOME);
System.load(lib);

Spring with Apache Beam

I want to use Spring with Apache Beam, running on the Google Cloud Dataflow runner. The Dataflow job should be able to use the Spring application context while executing the pipeline steps. I want to use Spring features in my Apache Beam pipeline for DI and other things. After hours of searching, I couldn't find any post or documentation that shows Spring integration with Apache Beam. So, if anyone has tried Spring with Apache Beam, please let me know.
In the main class I have initialised the Spring application context, but it is not available during the execution of the pipeline steps. I get a NullPointerException for autowired beans. I guess the problem is that the context is not available to the worker threads at runtime.
public static void main(String[] args) {
    initSpringApplicationContext();
    GcmOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(GcmOptions.class);
    Pipeline pipeline = Pipeline.create(options);
    // pipeline definition
}
I want to inject the Spring application context into each of the ParDo functions.
The problem here is that the ApplicationContext is not available on any worker, because the main method is only called when constructing the job and not on any worker machine. Therefore, initSpringApplicationContext is never called on any worker.
I've never tried to use Spring within Apache Beam, but I guess moving initSpringApplicationContext into a static initializer block will lead to your expected result.
public class ApplicationContextHolder {

    private static final ApplicationContext CTX;

    static {
        CTX = initApplicationContext();
    }

    public static ApplicationContext getContext() {
        return CTX;
    }
}
Please be aware that this alone shouldn't be considered a best practice for using Spring within Apache Beam, since it doesn't integrate well with Apache Beam's lifecycle. For example, if an error happens during the initialization of the application context, it will surface in the first place where the ApplicationContextHolder is used. Therefore, I'd recommend extracting initApplicationContext out of the static initializer block and calling it explicitly with regard to Apache Beam's lifecycle. The setup phase would be a good place for this.
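As a hedged illustration of that last point (EnrichFn and MyService are made-up names, not from the question), a DoFn can resolve its Spring-managed dependencies in its @Setup method, which Beam calls once per DoFn instance on the worker before any bundle is processed:
import org.apache.beam.sdk.transforms.DoFn;
import org.springframework.context.ApplicationContext;

public class EnrichFn extends DoFn<String, String> {

    private transient MyService service; // hypothetical Spring-managed bean

    @Setup
    public void setup() {
        // Runs on the worker, so the context is created (or looked up) where it is needed.
        ApplicationContext ctx = ApplicationContextHolder.getContext();
        service = ctx.getBean(MyService.class);
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(service.transform(c.element()));
    }
}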

Enable Master/Replica operations with spring-boot-starter-data-redis-reactive

I'm using spring-boot-starter-data-redis-reactive and the @SpringBootApplication annotation to auto-configure the Redis connection. I have set up a Redis cluster with 1 master and 2 slaves. I have the following config in the application.properties file:
spring.redis.cluster.nodes=master-node:6379,slave1-node:6379,slave2-node:6379
I want to configure it so that all writes go to the master and all reads go to the slaves (slave preferred).
I found that it is using the Lettuce driver under the hood. In order to achieve this, I need to add .readFrom(SLAVE_PREFERRED) to the LettuceClientConfiguration. Looking at org\springframework\boot\autoconfigure\data\redis\LettuceConnectionConfiguration.class, I don't see a way to add this config. Any idea how to achieve this?
You need to use a LettuceClientConfigurationBuilderCustomizer:
@Bean
public LettuceClientConfigurationBuilderCustomizer lettuceClientConfigurationBuilderCustomizer() {
    return builder -> builder.readFrom(ReadFrom.REPLICA);
}
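If reads should prefer replicas but still fall back to the master when no replica is available (the "slave preferred" behaviour asked about above, SLAVE_PREFERRED being the older Lettuce name), a hedged variant of the same bean would use ReadFrom.REPLICA_PREFERRED:
import io.lettuce.core.ReadFrom;
import org.springframework.boot.autoconfigure.data.redis.LettuceClientConfigurationBuilderCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RedisReadFromConfig {

    @Bean
    public LettuceClientConfigurationBuilderCustomizer lettuceClientConfigurationBuilderCustomizer() {
        // REPLICA_PREFERRED: read from replicas when possible, otherwise from the master.
        return builder -> builder.readFrom(ReadFrom.REPLICA_PREFERRED);
    }
}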

Programmatically develop a JGroups channel for Infinispan in a cluster

I'm working with Infinispan 8.1.0.Final and WildFly 10 in a cluster setup.
Each server is started by running:
C:\wildfly-10\bin\standalone.bat --server-config=standalone-ha.xml -b 10.09.139.215 -u 230.0.0.4 -Djboss.node.name=MyNode
I want to use Infinispan in distributed mode in order to have a distributed cache. But due to mandatory requirements I need to build a JGroups channel that dynamically reads some properties from a file.
This channel is necessary for me to build a cluster group based on TYPE and NAME (for example Type1-MyCluster). Each server that wants to join a cluster has to use the related channel.
Searching the net, I have found some code like the one below:
public class JGroupsChannelServiceActivator implements ServiceActivator {

    @Override
    public void activate(ServiceActivatorContext context) {
        stackName = "udp";
        try {
            channelServiceName = ChannelService.getServiceName(CHANNEL_NAME);
            createChannel(context.getServiceTarget());
        } catch (IllegalStateException e) {
            log.log(Level.INFO, "channel seems to already exist, skipping creation and binding.");
        }
    }

    void createChannel(ServiceTarget target) {
        InjectedValue<ChannelFactory> channelFactory = new InjectedValue<>();
        ServiceName serviceName = ChannelFactoryService.getServiceName(stackName);
        ChannelService channelService = new ChannelService(CHANNEL_NAME, channelFactory);
        target.addService(channelServiceName, channelService)
            .addDependency(serviceName, ChannelFactory.class, channelFactory)
            .install();
    }
}
I have created the META-INF/services/....JGroupsChannelServiceActivator file.
When I deploy my war into the server, the operation fails with this error:
"{\"WFLYCTL0180: Services with missing/unavailable dependencies\" => [\"jboss.jgroups.channel.clusterWatchdog is missing [jboss.jgroups.stack.udp]\"]}"
What am I doing wrong?
How can I build a channel the way I need?
How can I tell Infinispan to use that channel for distributed caching?
The approach you found is implementation-dependent and might cause a lot of problems during upgrades. I wouldn't recommend it.
Let me check if I understand your problem correctly: you need to be able to create a JGroups channel manually because you use some custom properties for it.
If that is the case, you could obtain a JGroups channel as suggested here. But then you obtain a JChannel instance which is already connected (so this might be too late for your case).
Unfortunately, since WildFly manages the JChannel (it is required for clustering sessions, EJBs etc.), the only way to get full control of the JChannel creation process is to use Infinispan in embedded (library) mode. This would require adding infinispan-embedded to your WAR dependencies. After that, you can initialize it similarly to this test.
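As a rough sketch of that embedded-mode suggestion (the stack file name, cluster name, and cache name below are placeholders, not taken from the question), the cache manager can be pointed at a custom JGroups stack file and a cluster name built from your TYPE and NAME:
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class EmbeddedCacheBootstrap {
    public static void main(String[] args) {
        GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
        global.transport()
              .clusterName("Type1-MyCluster")                      // cluster group derived from TYPE and NAME
              .addProperty("configurationFile", "my-jgroups.xml"); // custom JGroups stack read from a file

        ConfigurationBuilder cache = new ConfigurationBuilder();
        cache.clustering().cacheMode(CacheMode.DIST_SYNC);          // distributed cache mode

        DefaultCacheManager manager = new DefaultCacheManager(global.build());
        manager.defineConfiguration("myDistributedCache", cache.build());
        manager.getCache("myDistributedCache").put("key", "value");
    }
}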
