Getting backtype.storm.generated.InvalidTopologyException: null. If I comment out the call to setBolt it runs - apache-storm

If I comment out builder.setBolt, then it runs correctly. Please tell me where I am going wrong.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 3);
//Calling setBolt
builder.setBolt("exc", new ExclaimBolt(),3)
.allGrouping("words");
Config conf = new Config();
conf.setDebug(false);
//conf.setNumWorkers(2);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());

I had not declared the output fields in the Spout.
After declaring them, it works fine.
Code example (in your spout class):
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("msg")); // field name depends on your project
}

Related

Apache Storm: Topology submission exception: [x] subscribes from non-existent stream

Sorry if this question has already been answered; I tried to find a solution but had no success. There are some similar questions, but none of the ones I've seen helped. I have the following problem:
603 [main] WARN b.s.StormSubmitter - Topology submission exception:
Component: [escribirFichero] subscribes from non-existent stream:
[default] of component [buscamosEnKlout]
Exception in thread "main" java.lang.RuntimeException:
InvalidTopologyException(msg:Component:
[escribirFichero] subscribes from non-existent stream:
[default] of component [buscamosEnKlout])
I can't understand why I get this exception. I declare the bolt "buscamosEnKlout" before I use "escribirFichero". After my topology I'll include the essential lines of the bolts. I know the spout is OK, because I verified it with a trial-and-error approach.
The code of my topology is:
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.stats.RollingWindow;
import backtype.storm.topology.BoltDeclarer;
import backtype.storm.topology.TopologyBuilder;
import bolt.*;
import spout.TwitterSpout;
import twitter4j.FilterQuery;
public class TwitterTopologia {
private static String consumerKey = "xxx1";
private static String consumerSecret = "xxx2";
private static String accessToken = "yyy1";
private static String accessTokenSecret="yyy2";
public static void main(String[] args) throws Exception {
/**************** SETUP ****************/
String remoteClusterTopologyName = null;
if (args!=null) { ... }
TopologyBuilder builder = new TopologyBuilder();
FilterQuery tweetFilterQuery = new FilterQuery();
tweetFilterQuery.track(new String[]{"Vacaciones","Holy Week", "Semana Santa","Holidays","Vacation"});
tweetFilterQuery.language(new String[]{"en","es"});
TwitterSpout spout = new TwitterSpout(consumerKey, consumerSecret, accessToken, accessTokenSecret, tweetFilterQuery);
KloutBuscador buscamosEnKlout = new KloutBuscador();
FileWriterBolt fileWriterBolt = new FileWriterBolt("idUsuarios.txt");
builder.setSpout("spoutLeerTwitter",spout,1);
builder.setBolt("buscamosEnKlout",buscamosEnKlout,1).shuffleGrouping("spoutLeerTwitter");
builder.setBolt("escribirFichero",fileWriterBolt,1).shuffleGrouping("buscamosEnKlout");
Config conf = new Config();
conf.setDebug(true);
if (args != null && args.length > 0) {
conf.setNumWorkers(3);
StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
}
else {
conf.setMaxTaskParallelism(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("twitter-fun", conf, builder.createTopology());
Thread.sleep(460000);
cluster.shutdown();
}
}
}
Bolt "KloutBuscador", alias "buscamosEnKlout", is the next code:
String text = tuple.getStringByField("id");
String cadenaUrl;
cadenaUrl = "http://api.klout.com/v2/identity.json/twitter?screenName=";
cadenaUrl += text.replaceAll("\\[", "").replaceAll("\\]","");
cadenaUrl += "&key=" + kloutKey;
URL url = new URL(cadenaUrl);
HttpURLConnection c = (HttpURLConnection) url.openConnection();
...
c.setRequestMethod("GET");
c.setRequestProperty("Content-length", "0");
c.setUseCaches(false);
c.setAllowUserInteraction(false);
c.connect();
int status = c.getResponseCode();
StringBuilder sb = new StringBuilder();
switch (status) {
case 200:
case 201:
BufferedReader br = new BufferedReader(new InputStreamReader(c.getInputStream()));
String line;
while ((line = br.readLine()) != null) sb.append(line + "\n");
br.close();
}
JSONObject jsonResponse = new JSONObject(sb.toString());
//getJSONArray("id");
String results = jsonResponse.toString();
_collector.emit(new Values(text,results));
And the second bolt, fileWriterBolt, registered as "escribirFichero", is the following:
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
_collector = outputCollector;
try {
writer = new PrintWriter(filename, "UTF-8");...}...}
public void execute(Tuple tuple) {
writer.println((count++)+":::"+tuple.getValues());
//+"+++"+tweet.getUser().getId()+"__FINAL__"+tweet.getUser().getName()
writer.flush();
// Confirm that this tuple has been treated.
//_collector.ack(tuple);
}
If I skip the Klout bolt and only write the result of the spout, it works. I don't understand why the Klout bolt causes this failure.
Your buscamosEnKlout bolt needs to declare the format of the tuples it will emit, as well as which streams it will emit to. You most likely haven't implemented declareOutputFields correctly in that bolt. It should contain something like declarer.declare(new Fields("your-text-field", "your-results-field")).
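For example, a minimal sketch of declareOutputFields for that bolt; the field names here are placeholders chosen to match the _collector.emit(new Values(text, results)) call shown above:
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // two fields, matching the two values emitted by the bolt
    declarer.declare(new Fields("text", "kloutResult"));
}
With that in place, "escribirFichero" can subscribe to the default stream of "buscamosEnKlout" and read the fields by name with tuple.getStringByField(...).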

Client is not connected to any Elasticsearch nodes in Flink

I am using Flink 1.1.2 and have added the Elasticsearch dependency in Maven as follows:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-elasticsearch2_2.10</artifactId>
<version>1.2.0</version>
</dependency>
My program contains the following code, which reads data from Kafka and inserts it into Elasticsearch:
public class ReadFromKafka {
public static void main(String[] args) throws Exception {
// create execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("group.id", "test");
DataStream<JoinedStreamEvent> message = env.addSource(new FlinkKafkaConsumer09<JoinedStreamEvent>("test",
new JoinSchema(), properties));
System.out.println("reading form kafka ");
message.print();
Map<String, String> config = new HashMap<>();
config.put("bulk.flush.max.actions", "1"); // flush inserts after every event
config.put("cluster.name", "elasticsearch_amar"); // default cluster name
List<InetSocketAddress> transports = new ArrayList<>();
// set default connection details
transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
message.addSink(new ElasticsearchSink<>(config,transports,new ElasticInserter()));
env.execute();
} //main
public static class ElasticInserter implements ElasticsearchSinkFunction<JoinedStreamEvent>{
@Override
public void process(JoinedStreamEvent record, RuntimeContext runtimeContext, RequestIndexer requestIndexer) {
Map<String, Integer> json = new HashMap<>();
json.put("Time", record.getPatient_id());
json.put("heart Rate ", record.getHeartRate());
json.put("resp rete", record.getRespirationRate());
IndexRequest rqst = Requests.indexRequest()
.index("nyc-places") // index name
.type("popular-locations") // mapping name
.source(json);
requestIndexer.add(rqst);
} //process
} //ElasticInserter
} //ReadFromKafka
I have installed Elasticsearch using Homebrew and then started it with the elasticsearch command.
However, when I start my program I get the error from the question title: the client is not connected to any Elasticsearch nodes.
My reputation is below 50, so I cannot comment, but I have a few suggestions:
First, check whether ES is up; see Can't Connect to Elasticsearch (through Curl).
I recommend using a Docker container to start ES, e.g. docker run -d --name es -p 9200:9200 elasticsearch:2 -Des.network.host=0.0.0.0
Alternatively, you can set the es.network.host value to 0.0.0.0 in the ES config elasticsearch.yml:
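For reference, the relevant elasticsearch.yml lines might look like the sketch below; the cluster.name line is an assumption and should match the "cluster.name" value passed to the sink config in the question:
network.host: 0.0.0.0
cluster.name: elasticsearch_amar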

Spark Streaming using Spring Boot

Spring Boot is new to me, especially as a non-web project. Please guide me on how to write Spark Streaming code in Spring Boot. I have already worked on a plain Java Spark project and want to convert it into a Spring Boot non-web application. Any help or suggestions would be appreciated.
Here is my Spark config
@Bean
public SparkConf sparkConf() {
SparkConf sparkConf = new SparkConf();
sparkConf.set("spark.app.name", "SparkReceiver"); //The name of application. This will appear in the UI and in log data.
//conf.set("spark.ui.port", "7077"); //Port for application's dashboard, which shows memory and workload data.
sparkConf.set("dynamicAllocation.enabled","false"); //Which scales the number of executors registered with this application up and down based on the workload
//conf.set("spark.cassandra.connection.host", "localhost"); //Cassandra Host Adddress/IP
sparkConf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer"); //For serializing objects that will be sent over the network or need to be cached in serialized form.
sparkConf.set("spark.driver.allowMultipleContexts", "true");
sparkConf.setMaster("local[4]");
return sparkConf;
}
@Bean
public JavaSparkContext javaSparkContext() {
return new JavaSparkContext(sparkConf());
}
@Bean
public SparkSession sparkSession() {
return SparkSession
.builder()
.sparkContext(javaSparkContext().sc())
.appName("Java Spark SQL basic example")
.getOrCreate();
}
@Bean
public JavaStreamingContext javaStreamingContext(){
return new JavaStreamingContext(sparkConf(), new Duration(2000));
}
Here is my testing class
@Autowired
private JavaSparkContext sc;
@Autowired
private SparkSession session;
public void testMessage() throws InterruptedException{
JavaStreamingContext jsc = new JavaStreamingContext(sc, new Duration(2000));
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("zookeeper.connect", "localhost:2181"); //Make all kafka data for this cluster appear under a particular path.
kafkaParams.put("group.id", "testgroup"); //String that uniquely identifies the group of consumer processes to which this consumer belongs
kafkaParams.put("metadata.broker.list", "localhost:9092"); //Producer can find a one or more Brokers to determine the Leader for each topic.
kafkaParams.put("serializer.class", "kafka.serializer.StringEncoder"); //Serializer to use when preparing the message for transmission to the Broker.
kafkaParams.put("request.required.acks", "1"); //Producer to require an acknowledgement from the Broker that the message was received.
Set<String> topics = Collections.singleton("16jnfbtopic");
//Create an input DStream for Receiving data from socket
JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils.createDirectStream(jsc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
kafkaParams, topics);
//Create JavaDStream<String>
JavaDStream<String> msgDataStream = directKafkaStream.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
//Create JavaRDD<Row>
msgDataStream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
@Override
public void call(JavaRDD<String> rdd) {
JavaRDD<Row> rowRDD = rdd.map(new Function<String, Row>() {
@Override
public Row call(String msg) {
Row row = RowFactory.create(msg);
return row;
}
});
//Create Schema
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Message", DataTypes.StringType, true)});
Dataset<Row> msgDataFrame = session.createDataFrame(rowRDD, schema);
msgDataFrame.show();
}
});
jsc.start();
jsc.awaitTermination();
While running this app I am getting an error; please guide me.
Here is my error log
Eclipse Error Log
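Since the question is about running Spark Streaming inside a non-web Spring Boot application, here is a minimal hedged sketch of an entry point that could drive the beans defined above. The class name is made up and the web(false) call is the Spring Boot 1.x way of disabling the embedded server; this is an illustration, not code from the original post.
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.builder.SpringApplicationBuilder;

@SpringBootApplication
public class SparkStreamingApp implements CommandLineRunner {

    @Autowired
    private JavaStreamingContext jsc; // the javaStreamingContext() bean from the config above

    public static void main(String[] args) {
        new SpringApplicationBuilder(SparkStreamingApp.class)
                .web(false) // non-web application (Spring Boot 1.x API)
                .run(args);
    }

    @Override
    public void run(String... args) throws Exception {
        // build the Kafka DStream against jsc here (as in the testing class above),
        // then start the streaming context and block
        jsc.start();
        jsc.awaitTermination();
    }
}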

saveAsHadoopDataset never closes connection to ZooKeeper

I am using the code below to write to HBase:
jsonDStream.foreachRDD(new Function<JavaRDD<String>, Void>() {
@Override
public Void call(JavaRDD<String> rdd) throws Exception {
DataFrame jsonFrame = sqlContext.jsonRDD(rdd);
DataFrame selecteFieldFrame = jsonFrame.select("id_str","created_at","text");
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "d-9543");
config.set("zookeeper.znode.parent","/hbase-unsecure");
config.set("hbase.zookeeper.property.clientPort", "2181");
final JobConf jobConfig=new JobConf(config,SveAsHadoopDataSetExample.class);
jobConfig.setOutputFormat(TableOutputFormat.class);
jobConfig.set(TableOutputFormat.OUTPUT_TABLE,"tableName");
selecteFieldFrame.javaRDD().mapToPair(new PairFunction<Row, ImmutableBytesWritable, Put>() {
@Override
public Tuple2<ImmutableBytesWritable, Put> call(Row row) throws Exception {
// TODO Auto-generated method stub
return convertToPut(row);
}
}).saveAsHadoopDataset(jobConfig);
return null;
}
});
But when I look at zkDump in ZooKeeper, the connections keep increasing.
Any suggestions/pointers will be of great help!
I had the same problem; it is an HBase bug, and I fixed it like this:
change org.apache.hadoop.hbase.mapred.TableOutputFormat to org.apache.hadoop.hbase.mapreduce.TableOutputFormat,
and use org.apache.hadoop.mapreduce.Job, not org.apache.hadoop.mapred.JobConf.
Here is a sample:
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", zk_hosts)
conf.set("hbase.zookeeper.property.clientPort", zk_port)
conf.set(TableOutputFormat.OUTPUT_TABLE, "TABLE_NAME")
val job = Job.getInstance(conf)
job.setOutputFormatClass(classOf[TableOutputFormat[String]])
formatedLines.map{
case (a,b, c) => {
val row = Bytes.toBytes(a)
val put = new Put(row)
put.setDurability(Durability.SKIP_WAL)
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("node"), Bytes.toBytes(b))
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("topic"), Bytes.toBytes(c))
(new ImmutableBytesWritable(row), put)
}
}.saveAsNewAPIHadoopDataset(job.getConfiguration)
this may help you!
https://github.com/hortonworks-spark/shc/pull/20/commits/2074067c42c5a454fa4cdeec18c462b5367f23b9
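Applied to the Java code in the question, the same fix might look roughly like the sketch below (an untested assumption based on the answer above; pairRDD stands for the JavaPairRDD<ImmutableBytesWritable, Put> produced by the mapToPair call):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat; // new-API output format
import org.apache.hadoop.mapreduce.Job;                     // instead of JobConf

Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "d-9543");
config.set("zookeeper.znode.parent", "/hbase-unsecure");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set(TableOutputFormat.OUTPUT_TABLE, "tableName");

Job job = Job.getInstance(config);
job.setOutputFormatClass(TableOutputFormat.class);

// pairRDD is the JavaPairRDD<ImmutableBytesWritable, Put> built with mapToPair(...)
pairRDD.saveAsNewAPIHadoopDataset(job.getConfiguration());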

Is the generic option -D not supported in Hadoop 0.20.2?

I am trying to set a configuration property from the console using the -D generic option.
Here is my console input:
$ hadoop jar hadoop-0.20.2/gtee.jar dd.MaxTemperature -D file.pattern=2007.* /inputdata /outputdata
but when I cross-verified from the code with
Configuration conf;
System.out.println(conf.get("file.pattern"));
the result is null. What would be the problem here? Why is the value of the property "file.pattern" not being displayed? Can anyone please help me.
Thanks
EDITED SECTION:
Driver Code:
public int run(String[] args) throws Exception {
Job job = new Job();
job.setJarByClass(MaxTemperature.class);
job.setJobName("MaxTemperature");
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
// System.out.println(conf.get("file.pattern"));
if (fs.exists(new Path(args[1]))) {
fs.delete(new Path(args[1]), true);
}
System.out.println(conf.get("file.pattern"));
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileInputFormat.setInputPathFilter(job, RegexExcludePathFilter.class);
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapperClass(MapMapper.class);
job.setCombinerClass(Mapreducers.class);
job.setReducerClass(Mapreducers.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int xx = ToolRunner.run(new Configuration(),new MaxTemperature(), args);
System.exit(xx);
}
path filter implementation:
public static class RegexExcludePathFilter extends Configured implements
PathFilter {
//String pattern = "2007.[0-1]?[0-2].[0-9][0-9].txt" ;
Configuration conf;
Pattern pattern;
@Override
public boolean accept(Path path) {
Matcher m = pattern.matcher(path.toString());
return m.matches();
}
@Override
public void setConf(Configuration conf) {
this.conf = conf;
pattern = Pattern.compile(conf.get("file.pattern"));
System.out.println(pattern);
}
}
To confirm: the -D option is supported in version 0.20.2; however, it requires you to implement the Tool interface to read variables from the command line.
Configuration conf = new Configuration(); // this is the issue
// When implementing Tool, use this instead:
Configuration conf = this.getConf();
You are passing it with a space in between; that's not how you should do it. Instead try:
-Dfile.pattern=2007.*
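Putting both points together, a hedged sketch of the corrected run() method (assuming MaxTemperature extends Configured and implements Tool, which the ToolRunner.run call above suggests):
public int run(String[] args) throws Exception {
    Configuration conf = this.getConf();          // picks up -Dfile.pattern=2007.* parsed by GenericOptionsParser
    Job job = new Job(conf);                      // hand the same conf to the Job
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("MaxTemperature");
    System.out.println(conf.get("file.pattern")); // should now print the pattern instead of null
    // ... rest of the job setup as in the original run() method ...
    return job.waitForCompletion(true) ? 0 : 1;
}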

Resources