I am using the low level processor API with state stores. Up to 0.10.0.1 it was working fine, but I have upgraded Kafka Streams and I am getting the below error. I figured out that this is due to the changelog and it is looking at the record context:
java.lang.IllegalStateException: This should not happen as timestamp() should only be called while a record is processed
! at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.timestamp(AbstractProcessorContext.java:150)
! at org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(StoreChangeLogger.java:60)
! at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:47)
! at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueStore.put(ChangeLoggingKeyValueStore.java:66)
! at org.apache.kafka.streams.state.internals.MeteredKeyValueStore$2.run(MeteredKeyValueStore.java:67)
#Override
public void process(String arg0, List<Data> data {
data.forEach((x) -> {
String rawKey = x.getId();
Data data = kvStore.get(rawKey);
long bytesize = data == null ? 0 : data.getVolume();
x.addVolume(bytesize);
kvStore.put(rawKey, x);
});
}
public void start() {
builder = new KStreamBuilder();
storeSupplier = Stores.create(getKVStoreName()).withKeys(getProcessorKeySerde()).withValues(getProcessorValueSerde()).persistent().build();
builder.addStateStore(storeSupplier);
stream = builder.stream(Serdes.String(), serde(),getTopicName());
processStream(stream);
streams = new KafkaStreams(builder, props);
streams.cleanUp();
streams.start();
}
#Override
public void init(ProcessorContext context) {
super.init(context);
this.context = context;
this.context.schedule(timeinterval);
this.kvStore = (KeyValueStore) context.getStateStore(getKVStoreName());
}
Exceptions like this may come up when using the same instance of the Processor across multiple streams threads or partitions.
Ensure that you are returning a new instance to the ProcessorSupplier:
new ProcesorSupplier(() -> new Processor(...
The same applies to Transformer and TransformerSupplier as well.
To broadly quote the documentation:
Creating a single Processor/Transformer object and returning the same object reference in ProcesorSupplier/TransformerSupplier#get() would be a violation of the supplier pattern and leads to runtime exceptions.
Related
*** Update: I have changed my approach as described in my answer to the question, due to which the original issue reported becomes moot. ***
I'm trying to develop a Nifi application that provides a WebSocket interface to Kakfa. I could not accomplish this using the standard Nifi components as I have tried below (it may not make sense but intuitively this is what I want to accomplish):
I have now created a custom Processor "ReadFromKafka" that I intend to use as shown in the image below. "ReadFromKafka" would use the same implementation as the standard "PutWebSocket" component but would read messages from a Kafka Topic and send as response to the WebSocket client.
I have provided a code snippet of the implementation below:
#SystemResourceConsideration(resource = SystemResource.MEMORY)
public class ReadFromKafka extends AbstractProcessor {
public static final PropertyDescriptor PROP_WS_SESSION_ID = new PropertyDescriptor.Builder()
.name("websocket-session-id")
.displayName("WebSocket Session Id")
.description("A NiFi Expression to retrieve the session id. If not specified, a message will be " +
"sent to all connected WebSocket peers for the WebSocket controller service endpoint.")
.required(true)
.addValidator(StandardValidators.NON_BLANK_VALIDATOR)
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
.defaultValue("${" + ATTR_WS_SESSION_ID + "}")
.build();
public static final PropertyDescriptor PROP_WS_CONTROLLER_SERVICE_ID = new PropertyDescriptor.Builder()
.name("websocket-controller-service-id")
.displayName("WebSocket ControllerService Id")
.description("A NiFi Expression to retrieve the id of a WebSocket ControllerService.")
.required(true)
.addValidator(StandardValidators.NON_BLANK_VALIDATOR)
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
.defaultValue("${" + ATTR_WS_CS_ID + "}")
.build();
public static final PropertyDescriptor PROP_WS_CONTROLLER_SERVICE_ENDPOINT = new PropertyDescriptor.Builder()
.name("websocket-endpoint-id")
.displayName("WebSocket Endpoint Id")
.description("A NiFi Expression to retrieve the endpoint id of a WebSocket ControllerService.")
.required(true)
.addValidator(StandardValidators.NON_BLANK_VALIDATOR)
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
.defaultValue("${" + ATTR_WS_ENDPOINT_ID + "}")
.build();
public static final PropertyDescriptor PROP_WS_MESSAGE_TYPE = new PropertyDescriptor.Builder()
.name("websocket-message-type")
.displayName("WebSocket Message Type")
.description("The type of message content: TEXT or BINARY")
.required(true)
.addValidator(StandardValidators.NON_BLANK_VALIDATOR)
.defaultValue(WebSocketMessage.Type.TEXT.toString())
.expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
.build();
public static final Relationship REL_SUCCESS = new Relationship.Builder()
.name("success")
.description("FlowFiles that are sent successfully to the destination are transferred to this relationship.")
.build();
public static final Relationship REL_FAILURE = new Relationship.Builder()
.name("failure")
.description("FlowFiles that failed to send to the destination are transferred to this relationship.")
.build();
private static final List<PropertyDescriptor> descriptors;
private static final Set<Relationship> relationships;
static{
final List<PropertyDescriptor> innerDescriptorsList = new ArrayList<>();
innerDescriptorsList.add(PROP_WS_SESSION_ID);
innerDescriptorsList.add(PROP_WS_CONTROLLER_SERVICE_ID);
innerDescriptorsList.add(PROP_WS_CONTROLLER_SERVICE_ENDPOINT);
innerDescriptorsList.add(PROP_WS_MESSAGE_TYPE);
descriptors = Collections.unmodifiableList(innerDescriptorsList);
final Set<Relationship> innerRelationshipsSet = new HashSet<>();
innerRelationshipsSet.add(REL_SUCCESS);
innerRelationshipsSet.add(REL_FAILURE);
relationships = Collections.unmodifiableSet(innerRelationshipsSet);
}
#Override
public Set<Relationship> getRelationships() {
return relationships;
}
#Override
public final List<PropertyDescriptor> getSupportedPropertyDescriptors() {
return descriptors;
}
#Override
public void onTrigger(final ProcessContext context, final ProcessSession processSession) throws ProcessException {
final FlowFile flowfile = processSession.get();
if (flowfile == null) {
return;
}
final String sessionId = context.getProperty(PROP_WS_SESSION_ID)
.evaluateAttributeExpressions(flowfile).getValue();
final String webSocketServiceId = context.getProperty(PROP_WS_CONTROLLER_SERVICE_ID)
.evaluateAttributeExpressions(flowfile).getValue();
final String webSocketServiceEndpoint = context.getProperty(PROP_WS_CONTROLLER_SERVICE_ENDPOINT)
.evaluateAttributeExpressions(flowfile).getValue();
final String messageTypeStr = context.getProperty(PROP_WS_MESSAGE_TYPE)
.evaluateAttributeExpressions(flowfile).getValue();
final WebSocketMessage.Type messageType = WebSocketMessage.Type.valueOf(messageTypeStr);
if (StringUtils.isEmpty(sessionId)) {
getLogger().debug("Specific SessionID not specified. Message will be broadcast to all connected clients.");
}
if (StringUtils.isEmpty(webSocketServiceId)
|| StringUtils.isEmpty(webSocketServiceEndpoint)) {
transferToFailure(processSession, flowfile, "Required WebSocket attribute was not found.");
return;
}
final ControllerService controllerService = context.getControllerServiceLookup().getControllerService(webSocketServiceId);
if (controllerService == null) {
getLogger().debug("ControllerService is NULL");
transferToFailure(processSession, flowfile, "WebSocket ControllerService was not found.");
return;
} else if (!(controllerService instanceof WebSocketService)) {
getLogger().debug("ControllerService is not instance of WebSocketService");
transferToFailure(processSession, flowfile, "The ControllerService found was not a WebSocket ControllerService but a "
+ controllerService.getClass().getName());
return;
}
...
processSession.getProvenanceReporter().send(updatedFlowFile, transitUri.get(), transmissionMillis);
processSession.transfer(updatedFlowFile, REL_SUCCESS);
processSession.commit();
} catch (WebSocketConfigurationException|IllegalStateException|IOException e) {
// WebSocketConfigurationException: If the corresponding WebSocketGatewayProcessor has been stopped.
// IllegalStateException: Session is already closed or not found.
// IOException: other IO error.
getLogger().error("Failed to send message via WebSocket due to " + e, e);
transferToFailure(processSession, flowfile, e.toString());
}
}
private FlowFile transferToFailure(final ProcessSession processSession, FlowFile flowfile, final String value) {
flowfile = processSession.putAttribute(flowfile, ATTR_WS_FAILURE_DETAIL, value);
processSession.transfer(flowfile, REL_FAILURE);
return flowfile;
}
}
I have deployed the custom processor and when I connect to it using the Chrome "Simple Web Socket Client" I can see the following message in the logs:
ControllerService found was not a WebSocket ControllerService but a com.sun.proxy.$Proxy75
I'm using the exact same code as in PutWebSocket and can't figure out why it would behave any different when I use my custom Processor. I have configured "JettyWebSocketServer" as the ControllerService under "ListenWebSocket" as shown in the image below.
Additional exception details seen in the log are provided below:
java.lang.ClassCastException: class com.sun.proxy.$Proxy75 cannot be cast to class org.apache.nifi.websocket.WebSocketService (com.sun.proxy.$Proxy75 is in unnamed module of loader org.apache.nifi.nar.InstanceClassLoader #35c646b5; org.apache.nifi.websocket.WebSocketService is in unnamed module of loader org.apache.nifi.nar.NarClassLoader #361abd01)
I ended up modifying my flow to utilize out-of-box ListenWebSocket, PutWebSocket Processors, and a custom "FetchFromKafka" Processor that is a modified version of ConsumeKafkaRecord. With this I'm able to provide a WebSocket interface to Kafka. I have provided a screenshot of the updated flow below. More work needs to be done with the custom Processor to support multiple sessions.
I've imlemented my own Transformer, where I'm accumulating events in my StateStore (into the internal changelog kafka topic), and once it accumulates X events, I emmit one accumulated event down the stream.
Everything is perfect, but I don't want accumulated events to be stuck in the pipe if there are no new events are posted for a while.
That's why I decided to use Punctuator:
#Override
public void init(ProcessorContext context) {
this.context = context;
this.kvStore = (KeyValueStore<Patient, Long>) this.context.getStateStore(stateStore);
this.context.schedule(Duration.ofMinutes(180), PunctuationType.WALL_CLOCK_TIME, (timestamp) -> {
KeyValueIterator<Event, Long> iter = kvStore.all();
while (iter.hasNext()) {
KeyValue<Event, Long> entry = iter.next();
...
}
iter.close();
context.commit();
});
}
The problem is that the entries in the StateStore do not contain time sent information. Is there a way to access that? Do I need to implement my own custom StateStore in order to make it work?
Thank you!
When trying to implement a Unit-test in a spring-boot application, I can't retrieve a ConsumerRecord, though a custom Serializer using an own POJO is working. I checked it with the kafka-console-consumer, where a new message is each and every time I run the test generated and appears on the console.
What do I have to do to get the record instead of a null?
#RunWith(SpringRunner.class)
#SpringBootTest
#DisplayName("Testing GlobalMessageTest")
#DirtiesContext
public class NumberPlateSenderTest {
private static Logger log = LogManager.getLogger(NumberPlateSenderTest.class);
#Autowired
KafkaeskAdapterApplication kafkaeskAdapterApplication;
#Autowired
private NumberPlateSender numberPlateSender;
private KafkaMessageListenerContainer<String, NumberPlate> container;
private BlockingQueue<ConsumerRecord<String, NumberPlate>> records;
private static final String SENDER_TOPIC = "numberplate_test_topic";
#ClassRule
public static KafkaEmbedded embeddedKafka = new KafkaEmbedded(1, true, SENDER_TOPIC);
#Before
public void setUp() throws Exception {
// set up the Kafka consumer properties
Map<String, Object> consumerProperties = KafkaTestUtils.consumerProps("sender", "false", embeddedKafka);
consumerProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
consumerProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, NumberPlateDeserializer.class);
// create a Kafka consumer factory
DefaultKafkaConsumerFactory<String, NumberPlate> consumerFactory =
new DefaultKafkaConsumerFactory<>(consumerProperties);
// set the topic that needs to be consumed
ContainerProperties containerProperties = new ContainerProperties(SENDER_TOPIC);
// create a Kafka MessageListenerContainer
container = new KafkaMessageListenerContainer<>(consumerFactory, containerProperties);
// create a thread safe queue to store the received message
records = new LinkedBlockingQueue<>();
// setup a Kafka message listener
container.setupMessageListener((MessageListener<String, NumberPlate>) record -> {
log.info("Message Listener received message='{}'", record.toString());
records.add(record);
});
// start the container and underlying message listener
container.start();
// wait until the container has the required number of assigned partitions
ContainerTestUtils.waitForAssignment(container, embeddedKafka.getPartitionsPerTopic());
}
#DisplayName("Should send a Message to a Producer and retrieve it")
#Test
public void TestProducer() throws InterruptedException {
//Test instance of Numberplate to send
NumberPlate localNumberplate = new NumberPlate();
byte[] bytes = "0x33".getBytes();
localNumberplate.setImageBlob(bytes);
localNumberplate.setNumberString("ABC123");
log.info(localNumberplate.toString());
//Send it
numberPlateSender.sendNumberPlateMessage(localNumberplate);
//Retrieve it
ConsumerRecord<String, NumberPlate> received = records.poll(3, TimeUnit.SECONDS);
log.info("Received the following content of ConsumerRecord: {}", received);
if (received == null) {
assert false;
} else {
NumberPlate retrNumberplate = received.value();
Assert.assertEquals(retrNumberplate, localNumberplate);
}
}
#After
public void tearDown() {
// stop the container
container.stop();
}
}
The complete code can be seen at my github repository.
I read a load of different SO questions and searched the web, but can't find an approach what is wrong with my code. Other users posted similar problems but to no avail.
The kafka version which runs on my Craptop is kafka_2.11-1.0.1
The springframework kafka Client is of version 2.1.5.RELEASE
Your problem that you start consumer against embedded Kafka, but send data to the real one. I don't know what is your goal, but I made it working against an embedded Kafka like this:
#BeforeClass
public static void setup() {
System.setProperty("kafka.bootstrapAddress", embeddedKafka.getBrokersAsString());
}
I override your kafka.bootstrapAddress configuration property for the producer with the broker address provided by the embedded Kafka.
In this case I fail with the:
java.lang.AssertionError: expected: dev.semo.kafkaeskadapter.models.NumberPlate<NumberPlate{numberString='ABC123', imageBlob=[48, 120, 51, 51]}> but was: dev.semo.kafkaeskadapter.models.NumberPlate<NumberPlate{numberString='ABC123', imageBlob=[48, 120, 51, 51]}>
Expected :dev.semo.kafkaeskadapter.models.NumberPlate<NumberPlate{numberString='ABC123', imageBlob=[48, 120, 51, 51]}>
Actual :dev.semo.kafkaeskadapter.models.NumberPlate<NumberPlate{numberString='ABC123', imageBlob=[48, 120, 51, 51]}>
But that's just because you use this assertion:
Assert.assertEquals(retrNumberplate, localNumberplate);
Meanwhile your NumberPlate doesn't provide a proper equals() implementation. Something like this:
#Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
NumberPlate that = (NumberPlate) o;
return Objects.equals(numberString, that.numberString) &&
Arrays.equals(imageBlob, that.imageBlob);
}
#Override
public int hashCode() {
int result = Objects.hash(numberString);
result = 31 * result + Arrays.hashCode(imageBlob);
return result;
}
Thank you for providing the whole project to play and reproduce! With the "question-answer-question-answer" game we would spend too much time here :-).
For one of my Kafka streams apps, I need to use the features of both DSL and Processor API. My streaming app flow is
source -> selectKey -> filter -> aggregate (on a window) -> sink
After aggregation I need to send a SINGLE aggregated message to the sink. So I define my topology as below
KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> source = builder.stream(source_stream);
source.selectKey(new MyKeyValueMapper())
.filterNot((k,v) -> k.equals("UnknownGroup"))
.process(() -> new MyProcessor());
I define a custom StateStore and register it with my processor as below
public class MyProcessor implements Processor<String, String> {
private ProcessorContext context = null;
Serde<HashMapStore> invSerde = Serdes.serdeFrom(invJsonSerializer, invJsonDeserializer);
KeyValueStore<String, HashMapStore> invStore = (KeyValueStore) Stores.create("invStore")
.withKeys(Serdes.String())
.withValues(invSerde)
.persistent()
.build()
.get();
public MyProcessor() {
}
#Override
public void init(ProcessorContext context) {
this.context = context;
this.context.register(invStore, false, null); // register the store
this.context.schedule(10 * 60 * 1000L);
}
#Override
public void process(String partitionKey, String message) {
try {
MessageModel smb = new MessageModel(message);
HashMapStore oldStore = invStore.get(partitionKey);
if (oldStore == null) {
oldStore = new HashMapStore();
}
oldStore.addSmb(smb);
invStore.put(partitionKey, oldStore);
} catch (Exception e) {
e.printStackTrace();
}
}
#Override
public void punctuate(long timestamp) {
// processes all the messages in the state store and sends single aggregate message
}
#Override
public void close() {
invStore.close();
}
}
When I run the app, I get java.lang.NullPointerException
Exception in thread "StreamThread-18" java.lang.NullPointerException
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.flush(MeteredKeyValueStore.java:167)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:332)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:252)
at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:446)
at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:434)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:422)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:340)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
Any idea what's going wrong here?
You need to register you store outside of you processor using StreamsBuilder (or KStreamBuilder in older releases). First you create the store, than you registers it to StreamsBuilder (KStreamBuilder), and when you add the processor you provide the store name to connect the processor and the store.
StreamsBuilder builder = new StreamsBuilder();
// create store
StoreBuilder storeBuilder = Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore("invStore"),
Serdes.String(),
invSerde));
// register store
builder.addStateStore(storeBuilder);
KStream<String, String> source = builder.stream(source_stream);
source.selectKey(new MyKeyValueMapper())
.filterNot((k,v) -> k.equals("UnknownGroup"))
.process(() -> new MyProcessor(), "invStore"); // connect store to processor by providing store name
// older API:
KStreamBuilder builder = new KStreamBuilder();
// create store
StateStoreSupplier storeSupplier = (KeyValueStore)Stores.create("invStore")
.withKeys(Serdes.String())
.withValues(invSerde)
.persistent()
.build();
// register store
builder.addStateStore(storeSupplier);
KStream<String, String> source = builder.stream(source_stream);
source.selectKey(new MyKeyValueMapper())
.filterNot((k,v) -> k.equals("UnknownGroup"))
.process(() -> new MyProcessor(), "invStore"); // connect store to processor by providing store name
I'm implementing an IBackingMap for my Trident topology to store tuples to ElasticSearch (I know there are several implementations for Trident/ElasticSearch integration already existing at GitHub however I've decided to implement a custom one which suits my task better).
So my implementation is a classic one with a factory:
public class ElasticSearchBackingMap implements IBackingMap<OpaqueValue<BatchAggregationResult>> {
// omitting here some other cool stuff...
private final Client client;
public static StateFactory getFactoryFor(final String host, final int port, final String clusterName) {
return new StateFactory() {
#Override
public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
ElasticSearchBackingMap esbm = new ElasticSearchBackingMap(host, port, clusterName);
CachedMap cm = new CachedMap(esbm, LOCAL_CACHE_SIZE);
MapState ms = OpaqueMap.build(cm);
return new SnapshottableMap(ms, new Values(GLOBAL_KEY));
}
};
}
public ElasticSearchBackingMap(String host, int port, String clusterName) {
Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", clusterName).build();
// TODO add a possibility to close the client
client = new TransportClient(settings)
.addTransportAddress(new InetSocketTransportAddress(host, port));
}
// the actual implementation is left out
}
You see it gets host/port/cluster name as input params and creates an ElasticSearch client as a member of the class BUT IT NEVER CLOSES THE CLIENT.
It is then used from within a topology in a pretty familiar way:
tridentTopology.newStream("spout", spout)
// ...some processing steps here...
.groupBy(aggregationFields)
.persistentAggregate(
ElasticSearchBackingMap.getFactoryFor(
ElasticSearchConfig.ES_HOST,
ElasticSearchConfig.ES_PORT,
ElasticSearchConfig.ES_CLUSTER_NAME
),
new Fields(FieldNames.OUTCOME),
new BatchAggregator(),
new Fields(FieldNames.AGGREGATED));
This topology is wrapped into some public static void main, packed in a jar and sent to Storm for execution.
The question is, should I worry about closing the ElasticSearch connection or it is Storm's own business? If it is not done by Storm, how and when in the topology's lifecycle I should do that?
Thanks in advance!
Okay, answering my own question.
First of all, thanks again #dedek for suggestions and reviving the ticket in Storm's Jira.
Finally, since there's no official way to do that, I've decided to go for cleanup() method of Trident's Filter. So far I've verified the following (for Storm v. 0.9.4):
With LocalCluster
cleanup() gets called on cluster's shutdown
cleanup() DOESN'T get called when killing the topology, this shouldn't be a tragedy, very likely one won't use LocalCluster for real deployments anyway
With a real cluster
it gets called when the topology is killed as well as when the worker is stopped using pkill -TERM -u storm -f 'backtype.storm.daemon.worker'
it doesn't get called if the worker is killed with kill -9 or when it crashes or - sadly - when the worker dies due to an exception
In overall that gives more or less decent guarantee of cleanup() to get called, provided you'll be careful with exception handling (I tend to add 'thundercatches' to every of my Trident primitives anyway).
My code:
public class CloseFilter implements Filter {
private static final Logger LOG = LoggerFactory.getLogger(CloseFilter.class);
private final Closeable[] closeables;
public CloseFilter(Closeable... closeables) {
this.closeables = closeables;
}
#Override
public boolean isKeep(TridentTuple tuple) {
return true;
}
#Override
public void prepare(Map conf, TridentOperationContext context) {
}
#Override
public void cleanup() {
for (Closeable c : closeables) {
try {
c.close();
} catch (Exception e) {
LOG.warn("Failed to close an instance of {}", c.getClass(), e);
}
}
}
}
However would be nice if some day hooks for closing connections become a part of the API.