Storm Program not running - apache-storm

So I was trying to learn Apache Storm and was using the tutorialspoint guide as a reference for my first Storm program (https://www.tutorialspoint.com/apache_storm/apache_storm_quick_guide.htm).
I do not get the call log count output as expected; instead, my ZooKeeper shuts down.
My topology is:
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class logAnalyserStorm {
    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        config.setDebug(true);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("call-log-reader-spout", new FakeCallLogGeneratorSpout(), 100);
        builder.setBolt("call-log-creator-bolt", new callLogCreatorBolt()).shuffleGrouping("call-log-reader-spout");
        builder.setBolt("call-log-counter-bolt", new callLogCounterBolt()).fieldsGrouping("call-log-creator-bolt", new Fields("call"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("logAnalyserStorm", config, builder.createTopology());
        Thread.sleep(10000);
        cluster.killTopology("logAnalyserStorm");
        cluster.shutdown();
    }
}
The error is:
20680 [Thread-10] INFO o.a.s.event - Event manager interrupted
20683 [main] INFO o.a.s.testing - Shutting down in process zookeeper
20683 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] INFO o.a.s.s.o.a.z.s.NIOServerCnxnFactory - NIOServerCnxn factory exited run method

I realized that my Nimbus was not running. Ugh. Thank you.

Change Thread.sleep(10000); to Thread.sleep(60000); so the local cluster has more time to run the topology before it is killed.
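In context, a minimal sketch of that change in the main method above (60 seconds is just the suggested value; adjust as needed):
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("logAnalyserStorm", config, builder.createTopology());
Thread.sleep(60000); // was 10000; the longer wait gives the spout and bolts time to emit and count the call logs before the kill
cluster.killTopology("logAnalyserStorm");
cluster.shutdown();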

Related

Kafka Stream: org.apache.kafka.clients.consumer.ConsumerConfig.addDeserializerToConfig

I'm learning Kafka Streams and I'm getting an error; I have tried a few things but nothing works.
Input : value_1, value_2, value_3 ...............
public static void main(String[] args) throws InterruptedException {
    String host = "127.0.0.1:9092";
    String consumer_group = "firstGroup1";
    String topic = "test1";

    // create properties
    Properties properties = new Properties();
    properties.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, host);
    properties.setProperty(StreamsConfig.APPLICATION_ID_CONFIG, consumer_group);
    properties.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class.getName());
    properties.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class.getName());

    // create a topology
    StreamsBuilder builder = new StreamsBuilder();

    // input topic
    KStream<String, String> inputtopic = builder.stream(topic);

    // filter the value
    KStream<String, String> filtered_stream = inputtopic.filter((k, v) ->
            v.equalsIgnoreCase("value_5") || v.equalsIgnoreCase("value_7") || v.equalsIgnoreCase("value_9"));
    filtered_stream.foreach((k, v) -> System.out.println(v));

    // output topic set
    filtered_stream.to("prime_value");

    // build a topology
    KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), properties);

    // start our stream system
    kafkaStreams.start();
}
Error message
1800 [main] INFO org.apache.kafka.streams.processor.internals.assignment.AssignorConfiguration - stream-thread [firstGroup1-d1244e8e-dbc1-4139-8876-ca75cb89c609-StreamThread-1-consumer] Cooperative rebalancing enabled now
1852 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version: 3.0.0
1852 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 8cb0a5e9d3441962
1852 [main] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1641673582569
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.ConsumerConfig.addDeserializerToConfig(Ljava/util/Map;Lorg/apache/kafka/common/serialization/Deserializer;Lorg/apache/kafka/common/serialization/Deserializer;)Ljava/util/Map;
at org.apache.kafka.streams.processor.internals.StreamThread$InternalConsumerConfig.<init>(StreamThread.java:537)
at org.apache.kafka.streams.processor.internals.StreamThread$InternalConsumerConfig.<init>(StreamThread.java:535)
at org.apache.kafka.streams.processor.internals.StreamThread.<init>(StreamThread.java:527)
at org.apache.kafka.streams.processor.internals.StreamThread.create(StreamThread.java:406)
at org.apache.kafka.streams.KafkaStreams.createAndAddStreamThread(KafkaStreams.java:897)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:887)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:783)
at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:693)
at com.example.kafkastreams.main(kafkastreams.java:47)
Line 47 is: KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), properties);
It looks like you have mismatched versions of kafka-clients and kafka-streams - they must be the same version. You can check which versions actually end up on the classpath with mvn dependency:tree.
When using Spring Boot, you should not add versions for the Kafka dependencies; Boot will bring in the correct versions of both libraries.

Client is not connected to any Elasticsearch nodes in Flink

I am using Flink 1.1.2 and have added the Elasticsearch dependency in Maven as follows:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch2_2.10</artifactId>
    <version>1.2.0</version>
</dependency>
My program contains the following code, which reads data from Kafka and inserts it into Elasticsearch:
public class ReadFromKafka {
    public static void main(String[] args) throws Exception {
        // create execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("zookeeper.connect", "localhost:2181");
        properties.setProperty("group.id", "test");

        DataStream<JoinedStreamEvent> message = env.addSource(new FlinkKafkaConsumer09<JoinedStreamEvent>("test",
                new JoinSchema(), properties));
        System.out.println("reading from kafka");
        message.print();

        Map<String, String> config = new HashMap<>();
        config.put("bulk.flush.max.actions", "1"); // flush inserts after every event
        config.put("cluster.name", "elasticsearch_amar"); // default cluster name

        List<InetSocketAddress> transports = new ArrayList<>();
        // set default connection details
        transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));

        message.addSink(new ElasticsearchSink<>(config, transports, new ElasticInserter()));
        env.execute();
    } //main

    public static class ElasticInserter implements ElasticsearchSinkFunction<JoinedStreamEvent> {
        @Override
        public void process(JoinedStreamEvent record, RuntimeContext runtimeContext, RequestIndexer requestIndexer) {
            Map<String, Integer> json = new HashMap<>();
            json.put("Time", record.getPatient_id());
            json.put("heart Rate ", record.getHeartRate());
            json.put("resp rete", record.getRespirationRate());
            IndexRequest rqst = Requests.indexRequest()
                    .index("nyc-places")        // index name
                    .type("popular-locations")  // mapping name
                    .source(json);
            requestIndexer.add(rqst);
        } //process
    } //ElasticInserter
} //ReadFromKafka
I have installed Elasticsearch using Homebrew and then started it using the elasticsearch command.
However, when I start my program I get the "Client is not connected to any Elasticsearch nodes" error.
My reputation is below 50, so I can not comment, but I have a couple of suggestions:
First check whether ES is up at all, see "Can't Connect to Elasticsearch (through Curl)".
I recommend using a Docker container to start ES, e.g. docker run -d --name es -p 9200:9200 elasticsearch:2 -Des.network.host=0.0.0.0
BTW, you can also set network.host: 0.0.0.0 in the ES config file elasticsearch.yml.

DRPC Server error in storm

I am trying to execute the code below and am getting an error. Not sure if I am missing something here. Also, where would I see the output?
Error
java.lang.RuntimeException: No DRPC servers configured for topology
at backtype.storm.drpc.DRPCSpout.open(DRPCSpout.java:79)
at storm.trident.spout.RichSpoutBatchTriggerer.open(RichSpoutBatchTriggerer.java:58)
at backtype.storm.daemon.executor$fn__5802$fn__5817.invoke(executor.clj:519)
at backtype.storm.util$async_loop$fn__442.invoke(util.clj:434)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:744)
Code:
----
package com.**.trident.storm;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import storm.kafka.*;
import storm.trident.*;
import backtype.storm.*;

public class EventTridentDrpcTopology
{
    private static final String KAFKA_SPOUT_ID = "kafkaSpout";
    private static final Logger log = LoggerFactory.getLogger(EventTridentDrpcTopology.class);

    public static StormTopology buildTopology(OpaqueTridentKafkaSpout spout) throws Exception
    {
        TridentTopology tridentTopology = new TridentTopology();
        TridentState ts = tridentTopology.newStream("event_spout", spout)
                .name(KAFKA_SPOUT_ID)
                .each(new Fields("mac_address"), new SplitMac(), new Fields("mac"))
                .groupBy(new Fields("mac"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("maccount"))
                .parallelismHint(4);

        tridentTopology
                .newDRPCStream("mac_count")
                .each(new Fields("args"), new SplitMac(), new Fields("mac"))
                .stateQuery(ts, new Fields("mac"), new MapGet(), new Fields("maccount"))
                .each(new Fields("maccount"), new FilterNull())
                .aggregate(new Fields("maccount"), new Sum(), new Fields("sum"));

        return tridentTopology.build();
    }

    public static void main(String[] str) throws Exception
    {
        Config conf = new Config();
        BrokerHosts hosts = new ZkHosts("xxxx:2181,xxxx:2181,xxxx:2181");
        String topic = "event";
        //String zkRoot = topologyConfig.getProperty("kafka.zkRoot");
        String consumerGroupId = "StormSpout";

        DRPCClient drpc = new DRPCClient("xxxx", 3772);

        TridentKafkaConfig tridentKafkaConfig = new TridentKafkaConfig(hosts, topic, consumerGroupId);
        tridentKafkaConfig.scheme = new SchemeAsMultiScheme(new XScheme());
        OpaqueTridentKafkaSpout opaqueTridentKafkaSpout = new OpaqueTridentKafkaSpout(tridentKafkaConfig);

        StormSubmitter.submitTopology("event_trident", conf, buildTopology(opaqueTridentKafkaSpout));
    }
}
You have to configure the locations of the DRPC servers and launch them.
See Remote mode DRPC on http://storm.apache.org/releases/0.10.0/Distributed-RPC.html
1. Launch DRPC server(s)
2. Configure the locations of the DRPC servers
3. Submit DRPC topologies to Storm cluster
Launching a DRPC server can be done with the storm script and is just like launching Nimbus or the UI:
bin/storm drpc
Next, you need to configure your Storm cluster to know the locations of the DRPC server(s). This is how DRPCSpout knows from where to read function invocations. This can be done through the storm.yaml file or the topology configurations. Configuring this through the storm.yaml looks something like this:
drpc.servers:
  - "drpc1.foo.com"
  - "drpc2.foo.com"

How to run the word count job on hadoop yarn from Java code?

I have a requirement like below:
There is a 30-node Hadoop YARN cluster, and a client machine for job submission.
Let's use the wordcount MR example, since it's world famous. I'd like to submit and run the wordcount MR job from a Java method.
So what code is required to submit the job? Is there anything specific to configure on the client machine?
Hadoop should be present on your client machine, with the same configuration as the other machines in your Hadoop cluster.
To submit the MR job from a Java method, you can use Java's ProcessBuilder and pass it the hadoop command that launches your wordcount example, as sketched below.
The command and the application-specific requirements for wordcount can be found here.
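A minimal sketch of that approach, assuming a hypothetical examples jar location and HDFS paths; substitute whatever your distribution and job actually use:
import java.io.IOException;

public class WordCountSubmitter {
    public static void main(String[] args) throws IOException, InterruptedException {
        // build the same command line you would run in the client machine's shell
        ProcessBuilder pb = new ProcessBuilder(
                "hadoop", "jar", "/path/to/hadoop-mapreduce-examples.jar",  // placeholder jar path
                "wordcount", "/user/me/input", "/user/me/output");          // placeholder HDFS paths
        pb.inheritIO();                   // forward the job client's console output to this process
        Process process = pb.start();
        int exitCode = process.waitFor(); // 0 means the hadoop command (and hence the job) succeeded
        System.out.println("hadoop exited with code " + exitCode);
    }
}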
You should make a class that implements Tool. An example here:
public class AggregateJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(getClass());
        job.setJobName(getClass().getSimpleName());

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(ProjectionMapper.class);
        job.setCombinerClass(LongSumReducer.class);
        job.setReducerClass(LongSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int rc = ToolRunner.run(new AggregateJob(), args);
        System.exit(rc);
    }
}
This example was obtained from here. As @hamsa-zafar already says, the client machine should have the Hadoop configuration present, just like any other node in the cluster.
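If the client machine cannot simply pick up the cluster's config files, a hedged sketch of pointing the client-side Configuration at the cluster programmatically before running the Tool above; the host names and ports are placeholders for your actual NameNode and ResourceManager addresses:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://namenode-host:8020");                 // placeholder NameNode address
conf.set("yarn.resourcemanager.address", "resourcemanager-host:8032"); // placeholder ResourceManager address
conf.set("mapreduce.framework.name", "yarn");                          // submit to YARN rather than running locally
int rc = ToolRunner.run(conf, new AggregateJob(), new String[] {"/user/me/input", "/user/me/output"});
System.exit(rc);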

How to use JobControl in hadoop

I want to merge two files into one.
I made two mappers to read, and one reducer to join.
JobConf classifiedConf = new JobConf(new Configuration());
classifiedConf.setJarByClass(myjob.class);
classifiedConf.setJobName("classifiedjob");
FileInputFormat.setInputPaths(classifiedConf,classifiedInputPath );
classifiedConf.setMapperClass(ClassifiedMapper.class);
classifiedConf.setMapOutputKeyClass(TextPair.class);
classifiedConf.setMapOutputValueClass(Text.class);
Job classifiedJob = new Job(classifiedConf);
//first mapper config
JobConf featureConf = new JobConf(new Configuration());
featureConf.setJobName("featureJob");
featureConf.setJarByClass(myjob.class);
FileInputFormat.setInputPaths(featureConf, featuresInputPath);
featureConf.setMapperClass(FeatureMapper.class);
featureConf.setMapOutputKeyClass(TextPair.class);
featureConf.setMapOutputValueClass(Text.class);
Job featureJob = new Job(featureConf);
//second mapper config
JobConf joinConf = new JobConf(new Configuration());
joinConf.setJobName("joinJob");
joinConf.setJarByClass(myjob.class);
joinConf.setReducerClass(JoinReducer.class);
joinConf.setOutputKeyClass(Text.class);
joinConf.setOutputValueClass(Text.class);
Job joinJob = new Job(joinConf);
//reducer config
//JobControl config
joinJob.addDependingJob(featureJob);
joinJob.addDependingJob(classifiedJob);
secondJob.addDependingJob(joinJob);
JobControl jobControl = new JobControl("jobControl");
jobControl.addJob(classifiedJob);
jobControl.addJob(featureJob);
jobControl.addJob(secondJob);
Thread thread = new Thread(jobControl);
thread.start();
while (jobControl.allFinished()) {
    jobControl.stop();
}
But I get this message:
WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Can anyone help, please?
Which version of Hadoop are you using? Does the warning stop the program?
You don't need to use setJarByClass(). See my snippet below; I can run it without that method:
JobConf job = new JobConf(PageRankJob.class);
job.setJobName("PageRankJob");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(PageRankMapper.class);
job.setReducerClass(PageRankReducer.class);
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
JobClient.runJob(job);
You should implement your Job this way:
public class MyApp extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);

        // Process custom command-line options
        Path in = new Path(args[1]);
        Path out = new Path(args[2]);

        // Specify various job-specific parameters
        job.setJobName("my-app");
        job.setInputPath(in);
        job.setOutputPath(out);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Submit the job, then poll for progress until the job is complete
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Let ToolRunner handle generic command-line options
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(res);
    }
}
This comes straight out of Hadoop's documentation here.
So basically your job needs to inherit from Configured and implement Tool. This will force you to implement run(). Then start your job from your main class using ToolRunner.run(<your job>, <args>) and the warning will disappear.
You need to have this line in the driver: job.setJarByClass(MapperClassName.class);
