Access to ZooKeeper from within Storm Bolt

Context: I would like to share configuration information among bolts. Rather than passing it via configuration files, I would like to load it into ZooKeeper so that a bolt can read it from there when it comes up.
My preference is to use the same ZooKeeper instance as Storm, so the question is: how does one access the Storm ZooKeeper from within a bolt?
I have looked at the Javadocs and am afraid the path does not seem obvious.

Here is how I am using ZooKeeper in Storm through the Curator API:
    // Prefer the transactional ZooKeeper settings; fall back to Storm's own.
    List<String> servers = (List<String>) conf.get(Config.TRANSACTIONAL_ZOOKEEPER_SERVERS);
    Long port = (Long) conf.get(Config.TRANSACTIONAL_ZOOKEEPER_PORT);
    if (servers == null || port == null) {
        servers = (List<String>) conf.get(Config.STORM_ZOOKEEPER_SERVERS);
        port = (Long) conf.get(Config.STORM_ZOOKEEPER_PORT);
    }
    // Only the first configured server is used here.
    String connectionString = servers.get(0) + ":" + port.toString();
    curatorFramework = CuratorFrameworkFactory.builder()
        .connectString(connectionString)
        .namespace(config.getNamespace())
        .retryPolicy(new ExponentialBackoffRetry(1000, 3))
        .build();
    // The client must be started before it can be used.
    curatorFramework.start();
conf is the configuration map passed to each spout and bolt in the open or prepare method. namespace is a string identifying the path you will read from and write to; it is an attempt to keep all of your interactions with ZooKeeper separate from what Storm is doing.
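A ZooKeeper connection string may also list every server of the ensemble as comma-separated host:port pairs. The following is a minimal sketch of building such a string, not part of the original answer. It assumes that the Storm constants resolve to the keys "transactional.zookeeper.servers", "transactional.zookeeper.port", "storm.zookeeper.servers" and "storm.zookeeper.port", and it uses a plain java.util.Map as a stand-in for the Storm configuration; the names ZkConnect and connectString are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ZkConnect {
    // Builds "host1:port,host2:port" from a Storm-style configuration map.
    // The key names are assumed values of the Storm Config constants.
    @SuppressWarnings("unchecked")
    static String connectString(Map<String, Object> conf) {
        List<String> servers = (List<String>) conf.get("transactional.zookeeper.servers");
        Number port = (Number) conf.get("transactional.zookeeper.port");
        if (servers == null || port == null) {
            // Fall back to the ZooKeeper ensemble that Storm itself uses.
            servers = (List<String>) conf.get("storm.zookeeper.servers");
            port = (Number) conf.get("storm.zookeeper.port");
        }
        if (servers == null || servers.isEmpty() || port == null) {
            throw new IllegalStateException("No ZooKeeper servers configured");
        }
        final long p = port.longValue();
        return servers.stream()
                .map(host -> host + ":" + p)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("storm.zookeeper.servers", Arrays.asList("zk1", "zk2"));
        conf.put("storm.zookeeper.port", 2181L);
        System.out.println(connectString(conf)); // prints zk1:2181,zk2:2181
    }
}
```

The resulting string can be passed to connectString(...) on the Curator builder in place of the single-server value.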

Related

Leader election initialisation for multiple roles in clustered environment

I am currently working with an implementation based on:
org.springframework.integration.support.leader.LockRegistryLeaderInitiator
There are multiple candidate roles, so that only one application instance within the cluster is elected as leader for each role. During initialisation of the cluster, if the autoStartup property is set to true, the first application instance that is initialised will be elected as leader for all roles. This is something that we want to avoid; instead we want a fair distribution of the lead roles across the cluster.
One possible solution might be to wait until the cluster is ready and properly initialised, and then invoke an endpoint that will execute:
lockRegistryLeaderInitiator.start()
on all instances in the cluster, so that the election process starts and the roles are fairly distributed across instances. One drawback is that this needs to be part of the deployment process, which adds some complexity.
What is the proposed best practice here? Are there any plans for related features, for example to auto-start the leader election only when X application instances are available?
I suggest you take a look at the Spring Cloud Bus project. I don't know its details, but it looks like your idea of setting autoStartup = false for all the LockRegistryLeaderInitiator instances and starting them via some distributed event is the way to go.
I'm not sure what we can do for you from the Spring Integration perspective; this really does not feel like its responsibility, and all the coordination and rebalancing should be done via some other tool. Fortunately, all our Spring projects can be used together as a single platform.
I think with the Bus you can even track the number of instances that have joined the cluster and decide yourself when and how to publish a StartLeaderInitiators event.
It would be relatively easy with the Zookeeper LeaderInitiator because you could check in zookeeper for the instance count before starting it.
It's not so easy with the lock registry because there's no inherent information about instances; you would need some external mechanism (such as zookeeper, in which case, you might as well use ZK).
Or, you could use something like Spring Cloud Bus (with RabbitMQ or Kafka) to send a signal to all instances that it's time to start electing leadership.
I found a very simple approach to do this.
You could add a scheduled task to each node which periodically tries to yield leaderships if the node holds too many of them.
For example, if you have N nodes and 2*N roles and you want to achieve a completely fair leadership distribution (each node tries to hold only two leaderships), you can use something like this:
    @Component
    @RequiredArgsConstructor
    public class FairLeaderDistributor {
        private final List<LeaderInitiator> initiators;

        @Scheduled(fixedDelay = 300_000) // once per 5 minutes
        public void yieldExcessLeaderships() {
            initiators.stream()
                .map(LeaderInitiator::getContext)
                .filter(Context::isLeader)
                .skip(2) // keep only 2 leaderships
                .forEach(Context::yield);
        }
    }
Once all nodes are up, you will eventually get a completely fair leadership distribution.
You can also implement dynamic distribution based on the current active node count if you use the Zookeeper LeaderInitiator implementation.
The current number of participants can be easily retrieved from Curator's LeaderSelector::getParticipants method.
You can get the LeaderSelector via reflection from the LeaderInitiator.leaderSelector field.
    @Slf4j
    @Component
    @RequiredArgsConstructor
    public class DynamicFairLeaderDistributor {
        final List<LeaderInitiator> initiators;

        // Reads the private leaderSelector field to count the current participants.
        @SneakyThrows
        private static int getParticipantsCount(LeaderInitiator leaderInitiator) {
            Field field = LeaderInitiator.class.getDeclaredField("leaderSelector");
            field.setAccessible(true);
            LeaderSelector leaderSelector = (LeaderSelector) field.get(leaderInitiator);
            return leaderSelector.getParticipants().size();
        }

        @Scheduled(fixedDelay = 5_000)
        public void yieldExcessLeaderships() {
            int rolesCount = initiators.size();
            if (rolesCount == 0) return;
            int participantsCount = getParticipantsCount(initiators.get(0));
            if (participantsCount == 0) return;
            // Ceiling of rolesCount / participantsCount
            int maxLeadershipsCount = (rolesCount - 1) / participantsCount + 1;
            log.info("rolesCount={}, participantsCount={}, maxLeadershipsCount={}", rolesCount, participantsCount, maxLeadershipsCount);
            initiators.stream()
                .map(LeaderInitiator::getContext)
                .filter(Context::isLeader)
                .skip(maxLeadershipsCount)
                .forEach(Context::yield);
        }
    }
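The expression (rolesCount - 1) / participantsCount + 1 used for maxLeadershipsCount is an integer ceiling division: it gives the largest number of roles any single node should hold when the roles are spread as evenly as possible. The small sketch below only illustrates that arithmetic; the class and method names (FairShare, maxLeaderships) are hypothetical and no Spring or Curator classes are involved.

```java
class FairShare {
    // Ceiling of rolesCount / participantsCount using integer arithmetic only.
    // Returns 0 when there is nothing to distribute or nobody to distribute to.
    static int maxLeaderships(int rolesCount, int participantsCount) {
        if (rolesCount <= 0 || participantsCount <= 0) {
            return 0;
        }
        return (rolesCount - 1) / participantsCount + 1;
    }

    public static void main(String[] args) {
        System.out.println(maxLeaderships(6, 3)); // 2: six roles split evenly over three nodes
        System.out.println(maxLeaderships(7, 3)); // 3: one node has to take the extra role
        System.out.println(maxLeaderships(2, 5)); // 1: more nodes than roles
    }
}
```

With 7 roles on 3 nodes, for instance, a node holding 4 leaderships would yield one of them on the next scheduled run.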

How does the HDFS Client know the block size while writing?

The HDFS Client is outside the HDFS cluster. When the HDFS Client writes a file to Hadoop, it splits the file into blocks and then writes the blocks to the DataNodes.
The question here is: how does the HDFS Client know the block size? The block size is configured in the NameNode and the HDFS Client has no idea about it, so how does it split the file into blocks?
HDFS is designed so that the block size for a particular file is part of the file's metadata.
What does this mean?
The client can tell the NameNode that it will put data into HDFS with a particular block size.
The client has its own hdfs-site.xml that can contain this value, and it can also specify it on a per-request basis using the -Ddfs.blocksize parameter.
If the client configuration does not define this parameter, it defaults to the org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT value, which is 128 MB.
The NameNode can return an error to the client if it specifies a block size that is smaller than dfs.namenode.fs-limits.min-block-size (1 MB by default).
There is nothing magical in this: the NameNode knows nothing about the data and lets the client decide the optimal splitting, as well as define the replication factor for the blocks of a file.
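To make the effect of the block size concrete, here is a minimal arithmetic sketch of how many blocks a file of a given length occupies and how long the last block is. It is plain Java with no Hadoop classes; the names BlockMath, blockCount and lastBlockLength are hypothetical, and the 128 MB figure is simply the default mentioned above.

```java
class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks needed for a file of fileLength bytes (ceiling division).
    static long blockCount(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (fileLength <= 0) {
            return 0;
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    // Length in bytes of the final block, which is usually only partially filled.
    static long lastBlockLength(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (fileLength <= 0) {
            return 0;
        }
        long remainder = fileLength % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long file = 300 * MB;
        long block = 128 * MB;
        System.out.println(blockCount(file, block));           // 3
        System.out.println(lastBlockLength(file, block) / MB);  // 44
    }
}
```

A 300 MB file written with a 128 MB block size therefore ends up as two full blocks plus one 44 MB block.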
In simple words: when you deploy a client, the server (NameNode) URI is placed into the client configuration, or you download the configuration and replace it in the client manually. So whenever the client requests info, it goes to the NameNode and fetches the required info, or places new data on the DataNodes.
P.S: Client = EdgeNode
Some more details below (from Hadoop: The Definitive Guide, 4th edition):
"The client creates the file by calling create() on DistributedFileSystem (step 1 in
Figure 3-4). DistributedFileSystem makes an RPC call to the namenode to create a new
file in the filesystem’s namespace, with no blocks associated with it (step 2). The
namenode performs various checks to make sure the file doesn’t already exist and that the
client has the right permissions to create the file. If these checks pass, the namenode
makes a record of the new file; otherwise, file creation fails and the client is thrown an
IOException. The DistributedFileSystem returns an FSDataOutputStream for the client
to start writing data to. Just as in the read case, FSDataOutputStream wraps a
DFSOutputStream, which handles communication with the datanodes and namenode.
As the client writes data (step 3), the DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue."
Adding more info in response to comment on this post:
Here is a sample client program to copy a file to HDFS (Source-Hadoop Definitive Guide)
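    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.util.Progressable;

    public class FileCopyWithProgress {
        public static void main(String[] args) throws Exception {
            String localSrc = args[0];
            String dst = args[1];
            InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(dst), conf);
            OutputStream out = fs.create(new Path(dst), new Progressable() {
                public void progress() {
                    System.out.print(".");
                }
            });
            // Copies with a 4096-byte buffer and closes both streams when done.
            IOUtils.copyBytes(in, out, 4096, true);
        }
    }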
If you look at the create() method implementations in the FileSystem class, they pass getDefaultBlockSize() as one of the arguments, which in turn fetches the value from the configuration, which in turn comes from the cluster configuration files (the same ones the NameNode uses) deployed on the client.
This is how the client gets to know the block size configured on the Hadoop cluster.
Hope this helps

What needs to be changed when we switch Spark from Standalone to Yarn-Client?

Currently we have a program which is a web service that receives SQL queries and uses SQLContext to respond. The program currently runs in standalone mode; we set spark.master to a specific URL. The structure is something like this:
    object SomeApp extends App {
      val conf = new SparkConf().setMaster("spark://10.21.173.181:7077")
      val sc = new SparkContext(conf)
      val sqlContext = new SQLContext(sc)
      while (true) {
        val query = Listen_to_query()
        val response = sqlContext.sql(query)
        send(response)
      }
    }
Now we are going to move the system to Spark on YARN, and it seems that we should use spark-submit to submit jobs to YARN. It would be strange to deploy such a "service" on YARN, since it won't stop like ordinary "jobs". But we don't know how to separate the "jobs" from our program.
Do you have any suggestions? Thank you!
If you just want to submit your jobs to YARN, you can simply change the master parameter. However, it sounds like you are looking for a long-running shared SparkContext, and there are a few options for something like this, for example https://github.com/spark-jobserver/spark-jobserver and https://github.com/ibm-et/spark-kernel.

Storm - topology to topology

Is it possible, and is it a good idea, to emit tuples from one topology to another topology?
Let's say that in one topology a specific bolt stores tuples into a DB. In another topology I don't want to duplicate or re-create the same bolt for storing the tuples. So can I emit from this second topology to the first topology's bolt?
-Hariprasad
While you cannot directly pass tuples from one topology to another, you can use a queuing system such as Apache Kafka to accomplish what you described. Storm has a Kafka spout packaged in its latest releases.
The setup requires two Storm topologies (A and B) and one Kafka topic. Let's call it "transfer".
Within topology A, from which you want to send data to topology B, use a Kafka producer:
[The kafka initialization code is taken directly from the docs: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example and obviously needs to be customized for your kafka installation.]
    public void execute(Tuple input) {
        ...
        // For brevity the producer is created per tuple; in practice you would
        // normally create it once (for example in prepare()) and reuse it.
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "example.producer.SimplePartitioner");
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        Producer<String, String> producer = new Producer<String, String>(config);
        String msg = ...
        // "ip" is the message key used in the linked Kafka example; substitute your own key.
        KeyedMessage<String, String> data =
            new KeyedMessage<String, String>("transfer", ip, msg);
        producer.send(data);
        producer.close();
    }
From Topology B, you create a Kafka Spout when you initialize your topology:
    BrokerHosts hosts = new ZkHosts(zkConnString);
    SpoutConfig spoutConfig = new SpoutConfig(hosts, topicName, "/" + topicName,
        UUID.randomUUID().toString());
    spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
    KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
    // Now it's just like any other spout; "kafka-spout" is an arbitrary component id
    topologyBuilder.setSpout("kafka-spout", kafkaSpout);
That requires running kafka, of course (check out https://kafka.apache.org/08/quickstart.html).
[Edit: Reading your question again: it sounds like you have a reusable component (save tuple) that you want to call from two different topologies and you are trying to call one from the other. Another approach is to offload this task to a third topology devoted to handling saving tuples and just create kafka messages of the items that need to be persisted within your topologies. In this way, ALL events to save-tuple will be handled the same way.]
This is currently not supported; you cannot pass tuples from one topology to another.
Based on your use case, why don't you use another bolt (within the same topology) subscribed to the DB bolt, instead of running a separate topology?

Testing connection to HDFS

In order to test the connection to HDFS from a Java program, is it sufficient to rely on FileSystem.get(configuration), or should additional sanity checks be done (for example, some file-based operations like list, copy, delete)?
FileSystem.get(Configuration) creates a DistributedFileSystem object, which in turn relies on a DFSClient to talk to the NameNode. Buried deep down in the source (1.0.2 is the version I'm looking through) is a call to create an RPC for the NameNode, which in turn creates a Proxy for the ClientProtocol interface.
When this proxy is created (org.apache.hadoop.ipc.RPC.getProxy(Class<? extends VersionedProtocol>, long, InetSocketAddress, UserGroupInformation, Configuration, SocketFactory, int)), a call is made to ensure the server and client both talk the same 'version', so this check confirms that a NameNode is running at the configured address:
    VersionedProtocol proxy =
        (VersionedProtocol) Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class[] { protocol },
            new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout));
    long serverVersion = proxy.getProtocolVersion(protocol.getName(),
                                                  clientVersion);
    if (serverVersion == clientVersion) {
        return proxy;
    } else {
        throw new VersionMismatch(protocol.getName(), clientVersion,
                                  serverVersion);
    }
Of course, whether the NameNode has sufficient datanodes running to perform some actions (such as create / open files) is not reported by this version match check.
