I am trying to add elements to TitanDB using Spark Streaming (collecting messages from a Kafka queue), but it seems to be harder than expected.
Here is the definition of the Titan connection:
val confPath: String = "titan-cassandra-es-spark.properties"
val conn: TitanModule = new TitanModule(confPath)
TitanModule is a Serializable class that configures the TitanDB connection:
...
val configurationFilePath: String = confFilePath
val configuration = new PropertiesConfiguration(configurationFilePath)
val gConn: TitanGraph = TitanFactory.open(configuration)
...
When I execute the Spark Streaming job that collects messages (JSON) from a Kafka queue, it receives the message, but as soon as it tries to add it to TitanDB it blows up with the following stack trace.
Do you know whether adding data to TitanDB from Spark Streaming is feasible?
Do you know what the solution for this could be?
18:03:50,596 ERROR JobScheduler:95 - Error running job streaming job 1464624230000 ms.0
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:911)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
at salvob.SparkConsumer$$anonfun$main$1.apply(SparkConsumer.scala:200)
at salvob.SparkConsumer$$anonfun$main$1.apply(SparkConsumer.scala:132)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: org.apache.commons.configuration.PropertiesConfiguration
Serialization stack:
- object not serializable (class: org.apache.commons.configuration.PropertiesConfiguration, value: org.apache.commons.configuration.PropertiesConfiguration@2cef9ce8)
- field (class: salvob.TitanModule, name: configuration, type: class org.apache.commons.configuration.PropertiesConfiguration)
- object (class salvob.TitanModule, salvob.TitanModule@20d984db)
- field (class: salvob.SparkConsumer$$anonfun$main$1$$anonfun$apply$3, name: conn$1, type: class salvob.TitanModule)
- object (class salvob.SparkConsumer$$anonfun$main$1$$anonfun$apply$3, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 28 more
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:911)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
at salvob.SparkConsumer$$anonfun$main$1.apply(SparkConsumer.scala:200)
at salvob.SparkConsumer$$anonfun$main$1.apply(SparkConsumer.scala:132)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException: org.apache.commons.configuration.PropertiesConfiguration
Serialization stack:
- object not serializable (class: org.apache.commons.configuration.PropertiesConfiguration, value: org.apache.commons.configuration.PropertiesConfiguration#2cef9ce8)
- field (class: salvob.TitanModule, name: configuration, type: class org.apache.commons.configuration.PropertiesConfiguration)
- object (class salvob.TitanModule, salvob.TitanModule#20d984db)
- field (class: salvob.SparkConsumer$$anonfun$main$1$$anonfun$apply$3, name: conn$1, type: class salvob.TitanModule)
- object (class salvob.SparkConsumer$$anonfun$main$1$$anonfun$apply$3, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 28 more
Spark Streaming produces RDDs. Processing of the data inside the RDDs happens on the worker nodes. The code you write inside rdd.map() is serialised, along with the objects referenced inside that block, and sent to the worker nodes for processing.
So the ideal way to use the graph instance through Spark is the following:
streamRdd.map(kafkaTuple => {
    // create graph instance
    // use graph instance to add / modify graph
    // close graph instance
})
But this will create a new graph instance for each row. As an optimisation, you can create one graph instance per partition instead:
rdd.foreachPartition((rddRows: Iterator[kafkaTuple]) => {
    // create the Titan instance on the worker, e.g. from the properties-file path
    val graph: TitanGraph = TitanFactory.open(confPath)
    val trans: TitanTransaction = graph.newTransaction()
    rddRows.foreach(graphVertex => {
        // do graph insertion in the above transaction
    })
    trans.commit()
    graph.close()
})
graph.newTransaction() here helps with multi-threaded graph updates; otherwise you will get lock exceptions.
The only caveat is that, according to what I have read so far, there is no direct support for multi-node updates. From what I saw, a Titan transaction updates HBase with a lock whenever it tries to modify a vertex, so other partitions will fail when they try to do any updates. You will have to build an external synchronisation mechanism, or repartition your RDD into a single partition and then use the above code to do the updates.
Make sure that all classes that could be passed to other slave machines are Serializable; this is quite important. Do not initialise any variables outside of these passed classes.
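For instance, here is a minimal sketch (my own Java rendering, not the asker's actual TitanModule) of a connection holder that only serialises the properties-file path and opens the non-serializable TitanGraph lazily on whichever JVM actually uses it:

import java.io.Serializable;
import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;

// Only the String path travels with the closure; the graph (and its
// PropertiesConfiguration) is created on the worker the first time it is needed.
public class TitanConnection implements Serializable {

    private final String confFilePath;      // serializable state
    private transient TitanGraph graph;     // never serialised

    public TitanConnection(String confFilePath) {
        this.confFilePath = confFilePath;
    }

    public synchronized TitanGraph graph() {
        if (graph == null) {
            // TitanFactory.open(String) accepts a path to a properties file
            graph = TitanFactory.open(confFilePath);
        }
        return graph;
    }
}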
I have used Apache Spark (not Streaming) with Titan and it worked well. It wasn't easy to get right, since Titan itself depends on a specific version of Spark, so there were some dependency conflicts. This is the only version that worked for me:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.2.2</version>
</dependency>
This is how I started the cluster.
SparkConf conf = new SparkConf()
        .setAppName(AbstractSparkImporter.class.getCanonicalName())
        .setMaster("spark_cluster_name");
this.sc = new JavaSparkContext(conf);
this.numPartitions = new Integer(num);
Then parse the data
JavaRDD<T> javaRDD = initRetriever(); // init JavaRDD
javaRDD.foreachPartition(iter -> {
    Graph graph = initGraph();
    Parser<T> parser = initParser(graph);
    while (iter.hasNext()) {
        try {
            parser.parse(iter); // extends Serializable!
        } catch (Exception e) {
            logger.error("Failed in importing all vertices ", e);
            graph.tx().rollback();
        }
    }
    graph.tx().commit();
});
I might be able to release this module on GitHub if necessary.
Related
I have a consumer defined as below. It reads an Avro message from a topic and builds a state store of aggregated data, which is also Avro.
@Bean
public Consumer<KStream<String, InputEvent>> avroTest() {
    Serde<OutputEvent> serdeOutEvent = new SpecificAvroSerde<>(schemaRegistryClient);
    return st -> st.groupByKey().aggregate(OutputEvent::new, (key, currentEvent, outputEvent) -> {
        // aggregate here
        return outputEvent;
    }, Materialized.with(new Serdes.StringSerde(), serdeOutEvent)).toStream();
}
The function is able to read messages from the topic and create the first aggregated result, but when it tries to store it in the state store, it receives a 404 because the schema is not present in the Schema Registry.
Exception in thread "odoAvroTest-e4ef8e3e-ea1e-458c-b309-b2afefbeacec-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=odometer, partition=0, offset=0, stacktrace=org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema: {"type":"record","name": "" .... }
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:226)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:252)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:319)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:307)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getIdFromRegistry(CachedSchemaRegistryClient.java:165)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getId(CachedSchemaRegistryClient.java:297)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:73)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53)
at io.confluent.kafka.streams.serdes.avro.SpecificAvroSerializer.serialize(SpecificAvroSerializer.java:65)
at io.confluent.kafka.streams.serdes.avro.SpecificAvroSerializer.serialize(SpecificAvroSerializer.java:38)
at org.apache.kafka.streams.state.internals.ValueAndTimestampSerializer.serialize(ValueAndTimestampSerializer.java:59)
at org.apache.kafka.streams.state.internals.ValueAndTimestampSerializer.serialize(ValueAndTimestampSerializer.java:50)
at org.apache.kafka.streams.state.internals.ValueAndTimestampSerializer.serialize(ValueAndTimestampSerializer.java:27)
at org.apache.kafka.streams.state.StateSerdes.rawValue(StateSerdes.java:192)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.put(MeteredKeyValueStore.java:166)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl$KeyValueStoreReadWriteDecorator.put(ProcessorContextImpl.java:486)
at org.apache.kafka.streams.kstream.internals.KStreamAggregate$KStreamAggregateProcessor.process(KStreamAggregate.java:103)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:201)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:180)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:133)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:87)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:363)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:199)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:425)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:912)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:819)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:788)
Do let me know if there are additional config tweaks necessary to make this work. When I change the input to a HashMap and/or a simple POJO and use JsonSerde, the code seems to work and creates the aggregation.
The issue here is the Schema Registry configuration needed by the Avro serde. When you set the value serde in Materialized.with(), you have to pass the Schema Registry config to that serde.
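A minimal sketch of what that configuration can look like (the registry URL below is a placeholder, not taken from the question):

import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig;
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;
import org.apache.kafka.common.serialization.Serde;
import java.util.Collections;
import java.util.Map;

// Configure the value serde with the Schema Registry location before handing it
// to Materialized.with(); serdes you create yourself are not configured by Kafka Streams.
Map<String, String> serdeConfig = Collections.singletonMap(
        AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,   // "schema.registry.url"
        "http://localhost:8081");                                  // placeholder URL
Serde<OutputEvent> serdeOutEvent = new SpecificAvroSerde<>(schemaRegistryClient);
serdeOutEvent.configure(serdeConfig, false);                       // isKey = false: value serde

// then, as in the question:
// Materialized.with(new Serdes.StringSerde(), serdeOutEvent)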
I have created a serde for consuming from Kafka as follows:
import org.apache.kafka.connect.json.JsonDeserializer;
import org.apache.kafka.connect.json.JsonSerializer;
final Deserializer<JsonNode> jsonDeserializer = new JsonDeserializer();
final Serializer<JsonNode> jsonSerializer = new JsonSerializer();
final Serde<JsonNode> jsonNodeSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, JsonNode> eventStream = builder
        .stream("my-test-1",
                Consumed.with(Serdes.String(), jsonNodeSerde));
but I still receive a serialization error:
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.StringSerializer / value: org.apache.kafka.common.serialization.ByteArraySerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: com.fasterxml.jackson.databind.node.ObjectNode). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
Since Consumed.with() is already provided, why is the default serde still used? According to the answer written here, this should work, shouldn't it?
https://stackoverflow.com/a/48832957/3952994
Yes, the problem is that your data doesn't match the serdes.
A serializer (key: org.apache.kafka.common.serialization.StringSerializer /
value: org.apache.kafka.common.serialization.ByteArraySerializer)
is not compatible to the actual key or value type
(key type: java.lang.String /
value type: com.fasterxml.jackson.databind.node.ObjectNode).
However, the error message says the problem occurs when data is serialized, i.e. when Kafka Streams attempts to write data somewhere.
Your code snippet with Consumed, on the other hand, is about deserializing and thus reading data. Therefore the problem is not caused by the code snippet you shared in your question, but by code that is presumably further down in your Java file, which is not shown in your question. (By the way, it would have helped if you had provided the full stack trace of the error.)
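For example, if that stream is later written back out to a topic, the serdes have to be specified on the write path as well; a sketch under that assumption (the output topic name is made up):

import org.apache.kafka.streams.kstream.Produced;

// Without Produced.with(...), Kafka Streams falls back to the default serdes from the
// stream configuration (here a ByteArraySerializer), which is what the error reports.
eventStream.to("my-output-topic", Produced.with(Serdes.String(), jsonNodeSerde));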
I'm encountering a really weird problem when trying to use the @SqsListener annotation from the Spring Cloud module.
Here's my listener method:
@SqsListener(value = "myproject-dev-au-error-queue")
public void listenPhoenix(String message) throws IOException {
    logger.info(message);
}
However, once I run the project, it starts reading messages from the queue and fails with the following error:
Exception in thread "simpleMessageListenerContainer-4" Exception in thread "simpleMessageListenerContainer-6" Exception in thread "simpleMessageListenerContainer-9" Exception in thread "simpleMessageListenerContainer-10" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1931)
at org.springframework.cloud.aws.messaging.core.QueueMessageUtils.getNumberValue(QueueMessageUtils.java:93)
at org.springframework.cloud.aws.messaging.core.QueueMessageUtils.getMessageAttributesAsMessageHeaders(QueueMessageUtils.java:80)
at org.springframework.cloud.aws.messaging.core.QueueMessageUtils.createMessage(QueueMessageUtils.java:56)
at org.springframework.cloud.aws.messaging.listener.SimpleMessageListenerContainer$MessageExecutor.getMessageForExecution(SimpleMessageListenerContainer.java:375)
at org.springframework.cloud.aws.messaging.listener.SimpleMessageListenerContainer$MessageExecutor.run(SimpleMessageListenerContainer.java:336)
at org.springframework.cloud.aws.messaging.listener.SimpleMessageListenerContainer$SignalExecutingRunnable.run(SimpleMessageListenerContainer.java:392)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The problematic part is the numberType variable assignment in the QueueMessageUtils class of the spring-cloud-aws-messaging module:
private static Object getNumberValue(MessageAttributeValue value) {
    String numberType = value.getDataType().substring("Number".length() + 1);
    try {
        Class<? extends Number> numberTypeClass = Class.forName(numberType).asSubclass(Number.class);
        return NumberUtils.parseNumber(value.getStringValue(), numberTypeClass);
    } catch (ClassNotFoundException var3) {
        throw new MessagingException(String.format("Message attribute with value '%s' and data type '%s' could not be converted into a Number because target class was not found.", value.getStringValue(), value.getDataType()), var3);
    }
}
Has anyone seen this before and if so is there a way to fix this?
P.S.: Since I don't really care about the message attributes, I wouldn't mind if they were completely ignored.
Thanks in advance.
The exception in the given code is thrown by the following line:
String numberType = value.getDataType().substring("Number".length() + 1);
The AWS Java SDK documentation explains how the getDataType() function works (see this link). According to the documentation, it will return one of the following values:
String
Number
Binary
Amazon SQS supports the following logical data types: String, Number,
and Binary. For the Number data type, you must use StringValue.
Now, when you call value.getDataType(), it will return one of the above values. Assuming it is "Number", the code tries to take a substring of it starting at index 7 (since "Number".length() + 1 = 7).
But "Number" is only 6 characters long, so there is no such index in the String returned by value.getDataType(), and substring() throws a java.lang.StringIndexOutOfBoundsException.
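A minimal reproduction of the failing call, using the plain "Number" data type:

// "Number".length() + 1 == 7, but the string is only 6 characters long, so
// substring(7) computes 6 - 7 = -1 internally and throws
// java.lang.StringIndexOutOfBoundsException: String index out of range: -1
String numberType = "Number".substring("Number".length() + 1);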
As a solution for this, you can simply use the following instead of taking the substring:
String numberType = value.getDataType();
I have also faced the same issue. When publishing messages with the SQS Extended Client Library, it automatically adds an attribute named SQSLargePayloadSize with the data type Number to the message, and that is what causes the problem. The exception is resolved by updating the dependency from
implementation group: 'org.springframework.cloud', name: 'spring-cloud-aws-messaging', version: '2.2.6.RELEASE'
to any recent version of awspring's spring-cloud-aws-messaging
implementation group: 'io.awspring.cloud', name: 'spring-cloud-aws-messaging', version: '2.4.1'
Make sure all the packages are imported from io.awspring.cloud.
I'm trying to make a property's getter in Kotlin depend on the enum constant it is called on. This is what I have so far:
enum class Endpoint {
    EVENTS, GAMES;

    val baseUrl = "https://www.example.com/api"

    val path: String
        get() = when (this) {
            EVENTS -> "$baseUrl/events"
            GAMES -> "$baseUrl/games"
        }
}
Called like this:
print(Endpoint.EVENTS.path)
While this compiles without any problem, as soon as I run it I get a NullPointerException with the error Attempt to invoke virtual method 'java.lang.Object [...].Endpoint[].clone()' on a null object reference
I'm not sure what I'm doing wrong or what the proper way to accomplish what is stated above is.
EDIT: Full log of the exception:
05-09 22:51:33.793 15673-15673/com.filippovigani.eventvods E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.filippovigani.eventvods, PID: 15673
java.lang.ExceptionInInitializerError
at com.filippovigani.eventvods.networking.Endpoint.getPath(Endpoint.kt:21)
at com.filippovigani.eventvods.networking.Endpoint.<init>(Endpoint.kt:25)
at com.filippovigani.eventvods.networking.Endpoint.<clinit>(Endpoint.kt)
at com.filippovigani.eventvods.networking.EventvodsApi$Companion.getEvents(EventvodsApi.kt:8)
at com.filippovigani.eventvods.MainActivity.onCreate(MainActivity.kt:19)
at android.app.Activity.performCreate(Activity.java:5990)
at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1106)
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2278)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2387)
at android.app.ActivityThread.access$800(ActivityThread.java:151)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1303)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:135)
at android.app.ActivityThread.main(ActivityThread.java:5254)
at java.lang.reflect.Method.invoke(Native Method)
at java.lang.reflect.Method.invoke(Method.java:372)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:903)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:698)
Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'java.lang.Object com.filippovigani.eventvods.networking.Endpoint[].clone()' on a null object reference
at com.filippovigani.eventvods.networking.Endpoint.values(Endpoint.kt)
at com.filippovigani.eventvods.networking.Endpoint$WhenMappings.<clinit>(Unknown Source)
at com.filippovigani.eventvods.networking.Endpoint.getPath(Endpoint.kt:21)
at com.filippovigani.eventvods.networking.Endpoint.<init>(Endpoint.kt:25)
at com.filippovigani.eventvods.networking.Endpoint.<clinit>(Endpoint.kt)
at com.filippovigani.eventvods.networking.EventvodsApi$Companion.getEvents(EventvodsApi.kt:8)
at com.filippovigani.eventvods.MainActivity.onCreate(MainActivity.kt:19)
at android.app.Activity.performCreate(Activity.java:5990)
at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1106)
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2278)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2387)
at android.app.ActivityThread.access$800(ActivityThread.java:151)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1303)
at android.os.Handler.dispatchMessage(Handler.java:102)
at android.os.Looper.loop(Looper.java:135)
at android.app.ActivityThread.main(ActivityThread.java:5254)
at java.lang.reflect.Method.invoke(Native Method)
at java.lang.reflect.Method.invoke(Method.java:372)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:903)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:698)
I can't reproduce your error; the code works fine for me. Nevertheless, I think the solution is a bit too complex. Why not use a constructor argument to provide the constant-specific value:
enum class Endpoint(service: String) {
    EVENTS("/events"), GAMES("/games");

    private val baseUrl = "https://www.example.com/api"
    val path: String = baseUrl + service
}
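With this approach, path is computed eagerly from the constructor argument, so no when expression (and therefore no generated Endpoint$WhenMappings class) is involved at all, which side-steps the initialization problem described in the next answer.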
The exception indicates you're trying to access path within the constructor of the enum. This causes a problem because the enum is not yet ready for use during construction.
This means the following chain of calls results in failure:
Init Endpoint
Init Endpoint.EVENTS
Call to Endpoint.getPath() (this call does not appear in the code you posted)
getPath() uses Endpoint$WhenMappings, which begins initialization of that class
Endpoint$WhenMappings calls Endpoint.values(), but since the enum instances are still being initialized, the values array cannot be provided and null is returned
To ensure the array is not modified, WhenMappings clones and caches it; since the array is null here, which should never happen outside of initialization, this causes the NPE
Simply put, your code relies on a class that requires the enum to be fully initialized before it can be used. Since your code does not show how you are calling Endpoint.EVENTS.path, this is all that can be said about it.
I'm building a custom tokenizer in response to this: Performance of doc_values field vs analysed field
None of this API appears to be documented (?), so I'm going off code samples from other plugins/tokenizers, but when I restart Elasticsearch after deploying my tokenizer I get this error constantly in the logs:
[2017-09-20 08:45:37,412][WARN ][indices.cluster ] [Samuel Silke] [[storm-crawler-2017-09-11][3]] marking and sending shard failed due to [failed to create index]
[storm-crawler-2017-09-11] IndexCreationException[failed to create index]; nested: CreationException[Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error];
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.common.inject.CreationException: Guice creation errors:
1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
at _unknown_
1 error
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:360)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:172)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:157)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
... 9 more
My tokenizer is built for v2.3.4, and the TokenizerFactory looks like this:
public class UrlTokenizerFactory extends AbstractTokenizerFactory {

    @Inject
    public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings.getSettings(), name, settings);
    }

    @Override
    public Tokenizer create() {
        return new UrlTokenizer();
    }
}
I genuinely don't know what I'm doing wrong. Have I deployed it incorrectly? It appears to be using my classes according to the logs...
I've only deployed it to one of my es nodes (4-node cluster). The /_cat/plugins?v endpoint gives this:
name component version type url
Samuel Silke urltokenizer 2.3.4.0 j
As there's little or no documentation of this process, I've got this far by copying constructs created in plugins by other people.
The error I'm seeing doesn't make sense. My TokenizerFactory looks just like everyone else's for this version of Elasticsearch. What am I doing wrong or, possibly, not doing that I should be doing to make this work?
It turns out I was missing an Environment parameter in the constructor. It should have been this:
public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
    ...
I found a similar one here in the end: https://github.com/codelibs/elasticsearch-analysis-kuromoji-neologd/blob/2.3.x/src/main/java/org/codelibs/elasticsearch/kuromoji/neologd/index/analysis/KuromojiTokenizerFactory.java
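For reference, a sketch of the corrected factory: it is just the class from the question with the extra Environment parameter added, nothing else changes (presumably the injector expects this exact parameter set for tokenizer factories, as in the Kuromoji example linked above):

public class UrlTokenizerFactory extends AbstractTokenizerFactory {

    @Inject
    public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, Environment env,
                               @Assisted String name, @Assisted Settings settings) {
        // env is not used here; it only has to be present for the constructor to be resolved
        super(index, indexSettings.getSettings(), name, settings);
    }

    @Override
    public Tokenizer create() {
        return new UrlTokenizer();
    }
}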