Kryo serialization error in Spark job - hadoop

I want to use Kryo serialization in a Spark job.
public class SerializeTest {

    public static class Toto implements Serializable {
        private static final long serialVersionUID = 6369241181075151871L;
        private String a;

        public String getA() {
            return a;
        }

        public void setA(String a) {
            this.a = a;
        }
    }

    private static final PairFunction<Toto, Toto, Integer> WRITABLE_CONVERTOR = new PairFunction<Toto, Toto, Integer>() {
        private static final long serialVersionUID = -7119334882912691587L;

        @Override
        public Tuple2<Toto, Integer> call(Toto input) throws Exception {
            return new Tuple2<Toto, Integer>(input, 1);
        }
    };

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SerializeTest");
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        conf.registerKryoClasses(new Class<?>[]{Toto[].class});
        JavaSparkContext context = new JavaSparkContext(conf);

        List<Toto> list = new ArrayList<Toto>();
        list.add(new Toto());
        JavaRDD<Toto> cursor = context.parallelize(list, list.size());
        JavaPairRDD<Toto, Integer> writable = cursor.mapToPair(WRITABLE_CONVERTOR);
        writable.saveAsHadoopFile(args[0], Toto.class, Integer.class, SequenceFileOutputFormat.class);

        context.close();
    }
}
But I get this error:
java.io.IOException: Could not find a serializer for the Key class: 'com.test.SerializeTest.Toto'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1179)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1094)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/09/21 17:49:14 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: Could not find a serializer for the Key class: 'com.test.SerializeTest.Toto'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
Thanks.

This error is related neither to Spark nor to Kryo.
When using Hadoop output formats you need to make sure your key and value are instances of Writable. Hadoop doesn't use Java serialization by default (and you don't want to use it either, because it's very inefficient).
You can check the io.serializations property in your configuration; you'll see the list of serializers in use, including org.apache.hadoop.io.serializer.WritableSerialization.
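For instance, a quick way to print that property from the driver (a minimal sketch; hadoopConfiguration() is the accessor on JavaSparkContext):
// prints the configured serializers, e.g. org.apache.hadoop.io.serializer.WritableSerialization, ...
org.apache.hadoop.conf.Configuration hadoopConf = context.hadoopConfiguration();
System.out.println(hadoopConf.get("io.serializations"));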
To fix this issue your Toto class must implement Writable. The same issue exists with Integer; use IntWritable instead.
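A minimal sketch of what that could look like (untested; the field layout is taken from the question, imports go at file level):
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public static class Toto implements Writable {
    private String a;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(a == null ? "" : a);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        a = in.readUTF();
    }

    public String getA() { return a; }
    public void setA(String a) { this.a = a; }
}
The pair function then becomes a PairFunction<Toto, Toto, IntWritable> emitting new Tuple2<Toto, IntWritable>(input, new IntWritable(1)), and the save call becomes writable.saveAsHadoopFile(args[0], Toto.class, IntWritable.class, SequenceFileOutputFormat.class).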

Related

Swagger codegen - change api and controller class name

I am trying to generate REST code for Spring using Swagger. Currently the generated API is V10Api.java and the controller is V10ApiController.java. I want a custom prefix, as in 'ReadApi.java' and 'ReadApiController.java'.
I looked at the solution here to implement this, so my code was:
public class ReadApiSpringCodeGen extends SpringCodegen
{
    static {
        PREFIX = "Read"; //compile error at PREFIX
    }
}
It gives a compilation error at PREFIX, so I am guessing PREFIX is not in the superclass.
I modified the class to override the toApiName() method:
public class ReadApiSpringCodeGen extends SpringCodegen
{
    @Override
    public String toApiName(String name) {
        System.out.println("Name in is [" + name + "]");
        if (name.length() == 0) {
            return "DefaultApi";
        }
        name = sanitizeName(name);
        return camelize(name) + "Read";
    }

    public static void main(String[] args)
    {
        System.out.println("Main called");
    }
}
When I ran the code generator as:
${JAVA_HOME}/bin/java -cp .:./swagger-codegen-cli-2.2.1.jar \
-jar swagger-codegen-cli-2.2.1.jar generate \
-i Read.yaml \
-l com.foo.swag.codegen.swagger.ReadApiSpringCodeGen \
....
I get the error:
Exception in thread "main" java.lang.RuntimeException: Can't load config class with name com.foo.swag.codegen.swagger.ReadApiSpringCodeGen Available: android
aspnet5
async-scala
cwiki
csharp
cpprest
.....
at io.swagger.codegen.CodegenConfigLoader.forName(CodegenConfigLoader.java:31)
at io.swagger.codegen.config.CodegenConfigurator.toClientOptInput(CodegenConfigurator.java:353)
at io.swagger.codegen.cmd.Generate.run(Generate.java:221)
at io.swagger.codegen.SwaggerCodegen.main(SwaggerCodegen.java:36)
Caused by: java.lang.ClassNotFoundException: com.foo.swag.codegen.swagger.ReadApiSpringCodeGen
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at io.swagger.codegen.CodegenConfigLoader.forName(CodegenConfigLoader.java:29)
... 3 more
You have to override the method apiFilename - something like this:
@Override
public String apiFilename(final String templateName, final String tag) {
    final String pathWithFile = super.apiFilename(templateName, tag);
    final String pathWithoutFileExtension = pathWithFile.substring(0, pathWithFile.lastIndexOf('.')); // without .java
    final int index = pathWithoutFileExtension.lastIndexOf('.');
    final String className = ".Read" + pathWithoutFileExtension.substring(index + 1) + ".java";
    return pathWithoutFileExtension.substring(0, pathWithoutFileExtension.lastIndexOf('.')) + className;
}
I fixed it by extending SpringCodegen class and overriding toApiName() method.
public class ReadApiSpringCodeGen extends SpringCodegen
{
    @Override
    public String toApiName(String name) {
        return "CustomReadApi";
    }
}
Works perfectly. Thanks for all the clues.
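One note on the ClassNotFoundException above: when the JVM is started with -jar, the -cp option is silently ignored, so the custom codegen class never makes it onto the classpath. Invoking the CLI through its main class instead (io.swagger.codegen.SwaggerCodegen, visible in the stack trace) should let the custom class load, e.g.:
${JAVA_HOME}/bin/java -cp .:./swagger-codegen-cli-2.2.1.jar \
    io.swagger.codegen.SwaggerCodegen generate \
    -i Read.yaml \
    -l com.foo.swag.codegen.swagger.ReadApiSpringCodeGen \
    ....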

Define prefix root node for Jackson Serialization/Deserialization YAML document to POJO with Prefix

I found https://github.com/FasterXML/jackson-dataformat-yaml to deserialize/serialize YAML files. However, I'm having a hard time deserializing/serializing the following:
I want to define a prefix at which the actual document is parsed into a POJO, similar to selecting a subtree of the document.
I want to define a POJO that represents the simple object representation instead of creating multiple objects.
The Error "Unrecognized field "spring" (class ConfigServerProperties), not marked as ignorable (one known property: "repos"])" is shown. But I don't know how to represent the prefix "spring.cloud.config.server.git" to be the root element of the POJO.
Document
spring:
  cloud:
    config:
      server:
        git:
          repos:
            publisher:
              uri: 'https://github.company.com/toos/spring-cloud-config-publisher-config'
              cloneOnStart: true
              username: myuser
              password: password
              pullOnRequest: false
              differentProperty: My Value
            config_test_server_config:
              uri: 'https://github.company.com/mdesales/config-test-server-config'
              cloneOnStart: true
              username: 226b4bb85aa131cd6393acee9c484ec426111d16
              password: ""
              completelyDifferentProp: this is a different one
For this document, the requirements are as follows:
* I want to define the prefix as "spring.cloud.config.server.git".
* I want to create a POJO that represents the object.
POJO
I created the following POJOs to represent this.
ConfigServerProperties: represents the top pojo containing the list of repos.
ConfigServerOnboard: represents each of the elements of the document.
The properties are stored in a map, so that we can add as many different properties as needed.
Each class is as follows:
public class ConfigServerProperties {

    private Map<String, ConfigServerOnboard> repos;

    public void setRepos(Map<String, ConfigServerOnboard> repos) {
        this.repos = repos;
    }

    public Map<String, ConfigServerOnboard> getRepos() {
        return this.repos;
    }
}
The second class is as follows:
public class ConfigServerOnboard {

    private Map<String, String> properties;

    public Map<String, String> getProperties() {
        return properties;
    }

    public void setProperties(Map<String, String> properties) {
        this.properties = properties;
    }
}
Deserialize
The deserialization strategy I tried is as follows:
public static ConfigServerProperties parseProperties(File filePath)
        throws JsonParseException, JsonMappingException, IOException {
    ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
    JsonNodeFactory jsonNodeFactory = new JsonNodeFactory(false);
    jsonNodeFactory.textNode("spring.cloud.config");
    // tried to use this attempting to get the prefix
    mapper.setNodeFactory(jsonNodeFactory);
    ConfigServerProperties user = mapper.readValue(filePath, ConfigServerProperties.class);
    return user;
}
Error Returned
Exception in thread "main" com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "spring" (class com.company.platform.config.onboarding.files.config.model.ConfigServerProperties), not marked as ignorable (one known property: "repos"])
at [Source: /tmp/config-server-onboards.yml; line: 3, column: 3] (through reference chain: com.company.platform.config.onboarding.files.config.model.ConfigServerProperties["spring"])
at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:62)
at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:834)
at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:1094)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1470)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1448)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:282)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:140)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3798)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2740)
at com.company.platform.config.onboarding.files.config.model.ConfigServerProperties.parseProperties(ConfigServerProperties.java:37)
at com.company.platform.config.onboarding.files.config.model.ConfigServerProperties.main(ConfigServerProperties.java:42)
Edit 1: Looking for a possible Spring Boot solution
I'm open to solutions using Spring Boot's ConfigurationProperties("spring.cloud.config.server.git"). That way, we could have the following:
#ConfigurationProperties("spring.cloud.config.server.git")
public class Configuration {
private Map<String, Map<String, String>> repos = new LinkedHashMap<String, new HashMap<String, String>>();
// getter/setter
}
Questions
How to set the root element of the document?
Deserialization must read the document and produce instances of the POJOs.
Serialization must produce the same document with updated values.
I had to come up with the following:
Create 6 classes, each of them holding the property required for one segment of the prefix "spring.cloud.config.server.git":
SpringCloudConfigSpring.java
SpringCloudConfigCloud.java
SpringCloudConfigConfig.java
SpringCloudConfigServer.java
SpringCloudConfigGit.java
The holder of all of them is SpringCloudConfigFile.java.
The holder and all the other classes have a reference to the next property, which has a reference to the next, and so on, each with their own setter/getter methods as usual.
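For completeness, a sketch of the holder itself, inferred from the getter chain used in the verification below:
public class SpringCloudConfigFile {
    private SpringCloudConfigSpring spring;

    public SpringCloudConfigSpring getSpring() {
        return spring;
    }

    public void setSpring(SpringCloudConfigSpring spring) {
        this.spring = spring;
    }
}
The first link of the chain then looks like this: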
public class SpringCloudConfigSpring {

    private SpringCloudConfigCloud cloud;

    public SpringCloudConfigCloud getCloud() {
        return cloud;
    }

    public void setCloud(SpringCloudConfigCloud cloud) {
        this.cloud = cloud;
    }
}
The map representation itself was easy to implement.
For the last class I used a TreeMap to keep the keys sorted, with an inner map representing any property that may be added, without changing the representation.
public class SpringCloudConfigGit {

    TreeMap<String, Map<String, Object>> repos;

    public TreeMap<String, Map<String, Object>> getRepos() {
        return repos;
    }

    public void setRepos(TreeMap<String, Map<String, Object>> repos) {
        this.repos = repos;
    }
}
Results
I created the verification as follows:
public static void main(String[] args) throws JsonParseException, JsonMappingException, IOException {
    File config = new File("/tmp/config-server-onboards.yml");
    SpringCloudConfigFile props = ConfigServerProperties.parseProperties(config);
    props.getSpring().getCloud().getConfig().getServer().getGit().getRepos().forEach((appName, properties) -> {
        System.out.println("################## " + appName + " #######################3");
        System.out.println(properties);
        if (appName.equals("github_pages_reference")) {
            properties.put("name", "Marcello");
            properties.put("cloneOnStart", true);
        }
        System.out.println("");
    });
    saveProperties(new File(config.getAbsoluteFile().getParentFile(), "updated-config-onboards.yml"), props);
}
The output is as follows:
################## config_onboarding #######################3
{uri=https://github.company.com/servicesplatform-tools/spring-cloud-config-onboarding-config, cloneOnStart=true, username=226b4bb85aa131cd6393acee9c484ec426111d16, password=, pullOnRequest=false}
################## config_test_server_config #######################3
{uri=https://github.company.com/rlynch2/config-test-server-config, cloneOnStart=true, username=226b4bb85aa131cd6393acee9c484ec426111d16, password=, pullOnRequest=false}
################## github_pages_reference #######################3
{uri=https://github.company.com/servicesplatform-tools/spring-cloud-config-reference-service-config, cloneOnStart=true, username=226b4bb85aa131cd6393acee9c484ec426111d16, password=, pullOnRequest=false}
There are obvious improvements required:
* I'd like to have a solution with a single class;
* I'd like to have an ObjectMapper method that specifies the "subtree" of the YAML object tree that I'd like to parse (a possible sketch appears at the end of this answer);
* Maybe a more sophisticated Spring Boot-like @ConfigurationProperties("spring.cloud.config.server.git") would help.
Helper methods for loading and saving the state of these instances:
Load Method
public static SpringCloudConfigFile parseProperties(File filePath)
        throws JsonParseException, JsonMappingException, IOException {
    ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
    SpringCloudConfigFile file = mapper.readValue(filePath, SpringCloudConfigFile.class);
    return file;
}
Save Properties
public static void saveProperties(File filePath, SpringCloudConfigFile file) throws JsonGenerationException, JsonMappingException, IOException {
    ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
    mapper.writeValue(filePath, file);
}
File Saved
It maintained the sorted keys as implemented.
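On the "subtree" wish above: one possible direction (a sketch, untested) is to read the whole document as a tree and bind only the node at the prefix, using Jackson's JSON Pointer support. This would bind straight into the original ConfigServerProperties class and make the six wrapper classes unnecessary:
public static ConfigServerProperties parseSubtree(File filePath) throws IOException {
    ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
    JsonNode root = mapper.readTree(filePath);
    // JSON Pointer form of the prefix "spring.cloud.config.server.git"
    JsonNode gitNode = root.at("/spring/cloud/config/server/git");
    return mapper.treeToValue(gitNode, ConfigServerProperties.class);
}
Note that saving would then need to re-wrap the subtree under the prefix, which this sketch does not cover.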

Caching Java 8 Optional with Spring Cache

I have a method:
#Cacheable(key = "#jobId")
public Optional<JobInfo> getJobById(String jobId) {
log.info("Querying for job " + jobId);
counterService.increment("queryJobById");
Job job = jobsRepository.findOne(jobId);
if (job != null) {
return Optional.of(createDTOFromJob(job));
}
return Optional.empty();
}
When I try to retrieve the cached item I get the following exception:
2016-01-18 00:01:10 ERROR [trace=,span=] http-nio-8021-exec-2 [dispatcherServlet]:182 - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.data.redis.serializer.SerializationException: Cannot serialize; nested exception is org.springframework.core.serializer.support.SerializationFailedException: Failed to serialize object using DefaultSerializer; nested exception is java.lang.IllegalArgumentException: DefaultSerializer requires a Serializable payload but received an object of type [java.util.Optional]] with root cause
java.lang.IllegalArgumentException: DefaultSerializer requires a Serializable payload but received an object of type [java.util.Optional]
Just implement the Serializable interface in your DTO
@Document(collection = "document_name")
public class Document implements Serializable {

    private static final long serialVersionUID = 7156526077883281623L;
Spring supports caching Optional. The issue is your Redis serializer (probably JdkSerializationRedisSerializer). It uses Java serialization, which requires the classes to be Serializable. You can solve this by configuring the RedisCacheManager to use another serializer that doesn't have this limitation. For example, you can use Kryo (com.esotericsoftware:kryo:3.0.3):
@Bean
RedisCacheManager redisCacheManager(RedisTemplate<Object, Object> redisOperations) {
    // redisOperations will be injected if it is configured as a bean, or create it: new RedisTemplate()...
    redisOperations.setDefaultSerializer(new RedisSerializer<Object>() {
        // use a pool because Kryo instances are not thread safe
        KryoPool kryoPool = new KryoPool.Builder(Kryo::new).build();

        @Override
        public byte[] serialize(Object o) throws SerializationException {
            ByteBufferOutput output = new ByteBufferOutput();
            Kryo kryo = kryoPool.borrow();
            try {
                kryo.writeClassAndObject(output, o);
            } finally {
                kryoPool.release(kryo);
                output.close();
            }
            return output.toBytes();
        }

        @Override
        public Object deserialize(byte[] bytes) throws SerializationException {
            if (bytes.length == 0) return null;
            Kryo kryo = kryoPool.borrow();
            Object o;
            try {
                o = kryo.readClassAndObject(new ByteBufferInput(bytes));
            } finally {
                kryoPool.release(kryo);
            }
            return o;
        }
    });

    RedisCacheManager redisCacheManager = new RedisCacheManager(redisOperations);
    redisCacheManager.setCachePrefix(new DefaultRedisCachePrefix("app"));
    redisCacheManager.setTransactionAware(true);
    return redisCacheManager;
}
Note that this is just an example; I didn't test this implementation. But I use the Kryo serializer in production in the same manner for Redis caching with Spring.
This is because your serialized object does not implement RedisSerializer; alternatively, you can extend the JdkSerializationRedisSerializer class, which already implements RedisSerializer.
Example code:
import org.springframework.data.redis.serializer.JdkSerializationRedisSerializer;
import org.springframework.data.redis.serializer.RedisSerializer;
import org.springframework.data.redis.serializer.SerializationException;

public class YourDTOObject extends JdkSerializationRedisSerializer implements Serializable {

    private static final long serialVersionUID = 1L;
    ....
}
For more details and the underlying principles, please visit my blog.

apache storm, load balance, json

I am using Kafka with Storm; Kafka sends/emits JSON strings to Storm. In Storm, I want to distribute the load to a couple of workers based on a key/field in the JSON. How can I do that? In my case, it is the groupid field in the JSON string.
For example, I have JSON like this:
{groupid: 1234, userid: 145, comments:"I want to distribute all this group 1234 to one worker", size:50,type:"group json"}
{groupid: 1235, userid: 134, comments:"I want to distribute all this group 1234 to another worker", size:90,type:"group json"}
{groupid: 1234, userid: 158, comments:"I want to be sent to same worker as group 1234", size:50,type:"group json"}
I tried to use the following code:
1. TopologyBuilder builder = new TopologyBuilder();
2. builder.setSpout(SPOUTNAME, kafkaSpout, 1);
3. builder.setBolt(MYDISTRIBUTEDWORKER, new DistributedBolt()).setFieldsGroup(SPOUTNAME,new Fields("groupid")); <---???
I am wondering how to put arguments in the setFieldsGroup method on line 3. Could someone give me a hint?
Juhani
==Testing using storm 0.9.4 ============
=============source codes==============
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class KafkaBoltMain {
    private static final String SPOUTNAME = "TopicSpout";
    private static final String ANALYSISBOLT = "AnalysisWorker";
    private static final String CLIENTID = "Storm";
    private static final String TOPOLOGYNAME = "LocalTopology";

    private static class AppAnalysisBolt extends BaseRichBolt {
        private static final long serialVersionUID = -6885792881303198646L;
        private OutputCollector _collector;
        private long groupid = -1L;
        private String log = "test";

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        public void execute(Tuple tuple) {
            List<Object> objs = tuple.getValues();
            int i = 0;
            for (Object obj : objs) {
                System.out.println("" + i + "th object's value is:" + obj.toString());
                i++;
            }
            // _collector.emit(new Values(groupid, log));
            _collector.ack(tuple);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("groupid", "log"));
        }
    }

    public static void main(String[] args) {
        String zookeepers = null;
        String topicName = null;
        if (args.length == 2) {
            zookeepers = args[0];
            topicName = args[1];
        } else if (args.length == 1 && args[0].equalsIgnoreCase("help")) {
            System.out.println("xxxx");
            System.exit(0);
        } else {
            System.out.println("You need to have two arguments: kafka zookeeper:port and topic name");
            System.out.println("xxxx");
            System.exit(-1);
        }

        SpoutConfig spoutConfig = new SpoutConfig(new ZkHosts(zookeepers),
                topicName,
                "", // zookeeper root path for offset storing
                CLIENTID);
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout(SPOUTNAME, kafkaSpout, 1);
        builder.setBolt(ANALYSISBOLT, new AppAnalysisBolt(), 2)
                .fieldsGrouping(SPOUTNAME, new Fields("groupid"));

        // Configuration
        Config conf = new Config();
        conf.setDebug(false);
        // Topology run
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(TOPOLOGYNAME, conf, builder.createTopology());
    }
}
==================================================
When I start to submit the topology (local cluster), it gives the following error:
11658 [SyncThread:0] INFO org.apache.storm.zookeeper.server.ZooKeeperServer - Established session 0x14d097d338c0009 with negotiated timeout 20000 for client /127.0.0.1:34656
11658 [main-SendThread(localhost:2000)] INFO org.apache.storm.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:2000, sessionid = 0x14d097d338c0009, negotiated timeout = 20000
11659 [main-EventThread] INFO org.apache.storm.curator.framework.state.ConnectionStateManager - State change: CONNECTED
12670 [main] INFO backtype.storm.daemon.supervisor - Starting supervisor with id ccc57de0-29ff-4cb4-89de-fea1ea9b6e28 at host storm-VirtualBox
12794 [main] WARN backtype.storm.daemon.nimbus - Topology submission exception. (topology name='LocalTopology') #<InvalidTopologyException InvalidTopologyException(msg:Component: [AnalysisWorker] subscribes from stream: [default] of component [TopicSpout] with non-existent fields: #{"groupid"})>
12800 [main] ERROR org.apache.storm.zookeeper.server.NIOServerCnxnFactory - Thread Thread[main,5,main] died
backtype.storm.generated.InvalidTopologyException: null
at backtype.storm.daemon.common$validate_structure_BANG_.invoke(common.clj:178) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.common$system_topology_BANG_.invoke(common.clj:307) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.nimbus$fn__4290$exec_fn__1754__auto__$reify__4303.submitTopologyWithOpts(nimbus.clj:948) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.nimbus$fn__4290$exec_fn__1754__auto__$reify__4303.submitTopology(nimbus.clj:966) ~[storm-core-0.9.4.jar:0.9.4]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_80]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_80]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.5.1.jar:na]
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.5.1.jar:na]
at backtype.storm.testing$submit_local_topology.invoke(testing.clj:264) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.LocalCluster$_submitTopology.invoke(LocalCluster.clj:43) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.LocalCluster.submitTopology(Unknown Source) ~[storm-core-0.9.4.jar:0.9.4]
at com.callstats.stream.analyzer.KafkaBoltMain.main(KafkaBoltMain.java:94) ~[StreamAnalyzer-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
I'm not sure which version of Storm you are using; as of 0.9.4, your requirement can be implemented as follows.
builder.setBolt(MYDISTRIBUTEDWORKER, new DistributedBolt()).fieldsGrouping(SPOUTNAME, new Fields("groupid"));
In DistributedBolt, declare the output fields:
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("groupid", "log"));
}
Somewhere in its execute method, you will call
collector.emit(new Values(groupid, log));
and then tuples which have the same groupid will be delivered to the same instance of the next bolt.
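One caveat worth adding: StringScheme declares a single output field (named "str", to my knowledge), so a fieldsGrouping on "groupid" directly from the spout cannot work, which matches the InvalidTopologyException above. A sketch (untested) of an intermediate bolt, here given the hypothetical name JsonParseBolt and assuming Jackson is on the classpath, that parses the JSON and emits groupid for the downstream grouping (the Storm imports from the question's code are reused):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonParseBolt extends BaseRichBolt {
    private OutputCollector _collector;
    private ObjectMapper mapper;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        mapper = new ObjectMapper();
    }

    public void execute(Tuple tuple) {
        try {
            // StringScheme emits the raw Kafka message under the field "str"
            String json = tuple.getStringByField("str");
            JsonNode node = mapper.readTree(json);
            _collector.emit(new Values(node.get("groupid").asLong(), json));
            _collector.ack(tuple);
        } catch (Exception e) {
            _collector.fail(tuple);
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("groupid", "log"));
    }
}
Wired in between the spout and the analysis bolt:
builder.setSpout(SPOUTNAME, kafkaSpout, 1);
builder.setBolt("JsonParser", new JsonParseBolt(), 2).shuffleGrouping(SPOUTNAME);
builder.setBolt(ANALYSISBOLT, new AppAnalysisBolt(), 2)
        .fieldsGrouping("JsonParser", new Fields("groupid"));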

Camel's BridgePropertyPlaceholderConfigurer is not working when using Java config

I'm using Spring Java config and writing a console application with a few Camel routes. I have several property sources in my app, so I use two PropertyPlaceholderConfigurers:
@Configuration
@Import(CamelConfig.class)
@ComponentScan(basePackageClasses = {App.class})
public class Config
{
    final static String ENV = System.getProperty("ENV");

    @Bean
    public static BridgePropertyPlaceholderConfigurer properties()
    {
        final BridgePropertyPlaceholderConfigurer result = new BridgePropertyPlaceholderConfigurer();
        result.setOrder(0);
        result.setIgnoreUnresolvablePlaceholders(true);
        result.setLocations(new ClassPathResource("a/b/c/environments/base.properties"),
                new ClassPathResource("a/b/c/environments/" + ENV + "/env.properties"));
        return result;
    }

    @Bean
    public static BridgePropertyPlaceholderConfigurer dlqAppProperties()
    {
        final YamlPropertiesFactoryBean yaml = new YamlPropertiesFactoryBean();
        final BridgePropertyPlaceholderConfigurer result = new BridgePropertyPlaceholderConfigurer();
        yaml.setResources(new ClassPathResource("app.yaml"));
        result.setOrder(1);
        result.setIgnoreUnresolvablePlaceholders(true);
        result.setProperties(yaml.getObject());
        return result;
    }
}
As per this doc, I'm using the BridgePropertyPlaceholderConfigurer class to make Spring properties available in Camel. Its config is simple too:
@Configuration
public class CamelConfig extends SingleRouteCamelConfiguration
{
    @Override
    protected CamelContext createCamelContext() throws Exception
    {
        final SpringCamelContext result = new SpringCamelContext(getApplicationContext());
        return result;
    }

    @Override
    protected void setupCamelContext(CamelContext camelContext) throws Exception
    {
    }

    @Bean
    @Override
    public RouteBuilder route()
    {
        return (new Routes()).builder();
    }
}
Test route (Scala DSL) is simple too:
class Routes extends RouteBuilder {
    "timer://{{foo}}?period=2s" ==> {
        process((exchange) => {
            exchange.getIn.setBody("test")
        })
        to("log:test")
    }
}
But the context does not start, failing with the following exception:
Exception in thread "main" org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'camelContext' defined in class path resource [a/b/c/config/CamelConfig.class]: Invocation of init method failed; nested exception is org.apache.camel.FailedToCreateRouteException: Failed to create route route1: Route(route1)[[From[timer://{{foo}}?period=2s]] -> [process[... because of Failed to resolve endpoint: timer://{{foo}}?period=2s due to: PropertiesComponent with name properties must be defined in CamelContext to support property placeholders.
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1566)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:539)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:303)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:299)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:755)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:757)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:480)
at org.springframework.context.annotation.AnnotationConfigApplicationContext.<init>(AnnotationConfigApplicationContext.java:84)
at a.b.c.App.main(App.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: org.apache.camel.FailedToCreateRouteException: Failed to create route route1: Route(route1)[[From[timer://{{foo}}?period=2s]] -> [process[... because of Failed to resolve endpoint: timer://{{foo}}?period=2s due to: PropertiesComponent with name properties must be defined in CamelContext to support property placeholders.
at org.apache.camel.model.RouteDefinition.addRoutes(RouteDefinition.java:182)
at org.apache.camel.impl.DefaultCamelContext.startRoute(DefaultCamelContext.java:770)
at org.apache.camel.impl.DefaultCamelContext.startRouteDefinitions(DefaultCamelContext.java:1914)
at org.apache.camel.impl.DefaultCamelContext.doStartCamel(DefaultCamelContext.java:1670)
at org.apache.camel.impl.DefaultCamelContext.doStart(DefaultCamelContext.java:1544)
at org.apache.camel.spring.SpringCamelContext.doStart(SpringCamelContext.java:179)
at org.apache.camel.support.ServiceSupport.start(ServiceSupport.java:61)
at org.apache.camel.impl.DefaultCamelContext.start(DefaultCamelContext.java:1512)
at org.apache.camel.spring.SpringCamelContext.maybeStart(SpringCamelContext.java:228)
at org.apache.camel.spring.SpringCamelContext.afterPropertiesSet(SpringCamelContext.java:104)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1625)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1562)
... 16 more
Caused by: org.apache.camel.ResolveEndpointFailedException: Failed to resolve endpoint: timer://{{foo}}?period=2s due to: PropertiesComponent with name properties must be defined in CamelContext to support property placeholders.
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:477)
at org.apache.camel.util.CamelContextHelper.getMandatoryEndpoint(CamelContextHelper.java:63)
at org.apache.camel.model.RouteDefinition.resolveEndpoint(RouteDefinition.java:192)
at org.apache.camel.impl.DefaultRouteContext.resolveEndpoint(DefaultRouteContext.java:106)
at org.apache.camel.impl.DefaultRouteContext.resolveEndpoint(DefaultRouteContext.java:112)
at org.apache.camel.model.FromDefinition.resolveEndpoint(FromDefinition.java:72)
at org.apache.camel.impl.DefaultRouteContext.getEndpoint(DefaultRouteContext.java:88)
at org.apache.camel.model.RouteDefinition.addRoutes(RouteDefinition.java:890)
at org.apache.camel.model.RouteDefinition.addRoutes(RouteDefinition.java:177)
... 27 more
Caused by: java.lang.IllegalArgumentException: PropertiesComponent with name properties must be defined in CamelContext to support property placeholders.
at org.apache.camel.impl.DefaultCamelContext.resolvePropertyPlaceholders(DefaultCamelContext.java:1121)
at org.apache.camel.impl.DefaultCamelContext.getEndpoint(DefaultCamelContext.java:475)
... 35 more
Looks like the bridge does not work (but I definitely can use placeholders in Spring). What can be the problem?
Looks like if you want to use BridgePropertyPlaceholderConfigurer, you need to instantiate Camel contexts with CamelContextFactoryBean, which has an initPropertyPlaceholder method:
@Override
protected void initPropertyPlaceholder() throws Exception {
    super.initPropertyPlaceholder();
    Map<String, BridgePropertyPlaceholderConfigurer> beans = applicationContext.getBeansOfType(BridgePropertyPlaceholderConfigurer.class);
    if (beans.size() == 1) {
        // setup properties component that uses this beans
        BridgePropertyPlaceholderConfigurer configurer = beans.values().iterator().next();
        String id = beans.keySet().iterator().next();
        LOG.info("Bridging Camel and Spring property placeholder configurer with id: " + id);
        // get properties component
        PropertiesComponent pc = getContext().getComponent("properties", PropertiesComponent.class);
        // replace existing resolver with us
        configurer.setResolver(pc.getPropertiesResolver());
        configurer.setParser(pc.getPropertiesParser());
        String ref = "ref:" + id;
        // use the bridge to handle the resolve and parsing
        pc.setPropertiesResolver(configurer);
        pc.setPropertiesParser(configurer);
        // and update locations to have our as ref first
        String[] locations = pc.getLocations();
        String[] updatedLocations;
        if (locations != null && locations.length > 0) {
            updatedLocations = new String[locations.length + 1];
            updatedLocations[0] = ref;
            System.arraycopy(locations, 0, updatedLocations, 1, locations.length);
        } else {
            updatedLocations = new String[]{ref};
        }
        pc.setLocations(updatedLocations);
    } else if (beans.size() > 1) {
        LOG.warn("Cannot bridge Camel and Spring property placeholders, as exact only 1 bean of type BridgePropertyPlaceholderConfigurer"
                + " must be defined, was {} beans defined.", beans.size());
    }
}
Well, the problem now is to have two bridges, but that's another story..
I had the same problem. Here's what worked for me (inspired by the initPropertyPlaceholder() method):
import org.apache.camel.component.properties.PropertiesComponent;
import org.apache.camel.spring.javaconfig.CamelConfiguration;
import org.apache.camel.spring.spi.BridgePropertyPlaceholderConfigurer;

@Configuration
@ComponentScan
public class AwesomeConfig extends CamelConfiguration {

    private static final String PROPERTIES_BEAN_NAME = "springProperties";

    @Resource(name = PROPERTIES_BEAN_NAME)
    private BridgePropertyPlaceholderConfigurer springProperties;

    @Bean(PROPERTIES_BEAN_NAME)
    public static BridgePropertyPlaceholderConfigurer springProperties() throws Exception {
        BridgePropertyPlaceholderConfigurer configurer = new BridgePropertyPlaceholderConfigurer();
        configurer.setSystemPropertiesMode(BridgePropertyPlaceholderConfigurer.SYSTEM_PROPERTIES_MODE_OVERRIDE);
        String defaultPropertiesPath = buildProperties().getProperty("properties.path");
        String propertiesPath = System.getProperty(PROPERTY_FILE_SYSTEM_PROPERTY, defaultPropertiesPath);
        configurer.setLocations(new ClassPathResource("META-INF/application.properties"));
        return configurer;
    }

    @Bean
    public PropertiesComponent camelProperties() throws Exception {
        PropertiesComponent camelProperties = new PropertiesComponent();
        springProperties.setParser(camelProperties.getPropertiesParser());
        springProperties.setResolver(camelProperties.getPropertiesResolver());
        camelProperties.setSystemPropertiesMode(springProperties.getSystemPropertiesMode());
        camelProperties.setPropertiesResolver(springProperties);
        camelProperties.setPropertiesParser(springProperties);
        camelProperties.setLocation("ref:" + PROPERTIES_BEAN_NAME);
        return camelProperties;
    }

    @Override
    protected void setupCamelContext(CamelContext camelContext) throws Exception {
        camelContext.addComponent("properties", camelProperties());
    }
}
And here's how I use it:
import org.apache.camel.spring.javaconfig.Main;

public class AwesomeMain extends Main {

    public AwesomeMain() {
        setConfigClass(AwesomeConfig.class);
    }

    public static void main(String... args) throws Exception {
        AwesomeMain main = new AwesomeMain();
        instance = main;
        main.run(args);
    }
}
Try to rename your first BridgePropertyPlaceholderConfigurer bean (method's name in your case).
Look at what I have hacked up. I haven't fully tested it, but wanted to share; it should work with Spring 5.x. It basically copies all of the Environment into Camel's properties, so I don't use Camel's "bridge" at all. One thing I am not sure about yet is whether I have to put them into the "initial" or the "override" properties:
@Configuration
public static class CamelConfig extends CamelConfiguration {

    @Autowired
    private ConfigurableEnvironment environment;

    @Bean
    ... some beans ...

    //@Bean -- haven't yet found out if we need it as a bean ...
    private PropertiesComponent camelProperties() throws Exception {
        PropertiesComponent camelProperties = new PropertiesComponent();
        // just brutally copy all the properties from the environment
        HashSet<String> propertyNames = new HashSet<String>(100);
        for (PropertySource ps : environment.getPropertySources()) {
            if (ps instanceof MapPropertySource) {
                MapPropertySource mps = (MapPropertySource) ps;
                propertyNames.addAll(Arrays.asList(mps.getPropertyNames()));
            }
        }
        Properties allProps = new Properties();
        for (String prop : propertyNames) {
            allProps.setProperty(prop, environment.getProperty(prop));
        }
        camelProperties.setInitialProperties(allProps);
        // TODO: check if this is better or worse
        //camelProperties.setOverrideProperties(allProps);
        return camelProperties;
    }

    @Override
    protected void setupCamelContext(CamelContext camelContext) throws Exception {
        ... some configs. ...
        camelContext.addComponent("properties", camelProperties());
    }
}
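With the properties bridged (or copied) this way, placeholders such as {{foo}} in endpoint URIs should resolve. For example, assuming foo=myTimer is defined in one of the property sources, the route from the question would resolve to timer://myTimer?period=2s; a sketch in Java DSL:
// inside a RouteBuilder.configure(); assumes foo is defined in a bridged/copied property source
from("timer://{{foo}}?period=2s")
    .setBody(constant("test"))
    .to("log:test");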
