ConcurrentModificationException using Collectors.toSet() - java-8

I have a Set<Object> that I want to filter by class to obtain a Set<Foo> (i.e., the subset of elements that are instanceof Foo). To do this with Java 8 I wrote:
Set<Foo> filtered = initialSet.parallelStream()
        .filter(x -> (x instanceof Foo))
        .map(x -> (Foo) x)
        .collect(Collectors.toSet());
This is throwing a ConcurrentModificationException:
java.util.ConcurrentModificationException
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1388)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
The problem is apparently the collector but I have no clue as to why.

Found (and fixed) the problem. The issue is the initialSet. This is obtained by a call to a 3rd-party library (in this case a JGraphT DirectedPseudoGraph instance). The underlying graph was being modified on another thread, and since the initialSet is returned by reference, the result is the ConcurrentModificationException.
The real problem is therefore not the stream processing but using a graph that returned a Set that wasn't thread-safe. The solution was to use AsSynchronizedGraph.
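For reference, a minimal sketch of the fix, assuming a JGraphT version where AsSynchronizedGraph is available in org.jgrapht.graph.concurrent (the vertex and edge types here are placeholders):

import java.util.Set;
import java.util.stream.Collectors;
import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.graph.DirectedPseudograph;
import org.jgrapht.graph.concurrent.AsSynchronizedGraph;

// Wrap the underlying graph so that access to its vertex/edge sets is
// synchronized with modifications made on other threads.
Graph<Object, DefaultEdge> graph =
        new AsSynchronizedGraph<>(new DirectedPseudograph<Object, DefaultEdge>(DefaultEdge.class));

// Streaming over the set returned by the wrapper no longer races with writers.
Set<Foo> filtered = graph.vertexSet().parallelStream()
        .filter(v -> v instanceof Foo)
        .map(v -> (Foo) v)
        .collect(Collectors.toSet());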

Related

How can I configure the late acceptance algorithm?

I am encountering some problems when it comes to modifying the applied algorithms:
First, it seems I cannot change the late acceptance algorithm configuration. This is my local search configuration, which works nicely when I specify neither an acceptor nor a forager:
<localSearch>
  <localSearchType>LATE_ACCEPTANCE</localSearchType>
  <acceptor>
    <lateAcceptanceSize>5000</lateAcceptanceSize>
    <entityTabuRatio>0.1</entityTabuRatio>
    <!--valueTabuSize>10000</valueTabuSize-->
  </acceptor>
  <forager>
    <acceptedCountLimit>1</acceptedCountLimit>
  </forager>
  <termination>
    <minutesSpentLimit>10</minutesSpentLimit>
    <bestScoreLimit>[0/11496/11496]hard/[0]soft</bestScoreLimit>
  </termination>
</localSearch>
I get this error message as part of the output
java.lang.IllegalArgumentException: The localSearchType (LATE_ACCEPTANCE) must not be configured if the acceptorConfig (AcceptorConfig()) is explicitly configured.
at org.optaplanner.core.config.localsearch.LocalSearchPhaseConfig.buildAcceptor(LocalSearchPhaseConfig.java:182)
at org.optaplanner.core.config.localsearch.LocalSearchPhaseConfig.buildDecider(LocalSearchPhaseConfig.java:132)
at org.optaplanner.core.config.localsearch.LocalSearchPhaseConfig.buildPhase(LocalSearchPhaseConfig.java:117)
at org.optaplanner.core.config.localsearch.LocalSearchPhaseConfig.buildPhase(LocalSearchPhaseConfig.java:54)
at org.optaplanner.core.config.solver.SolverConfig.buildPhaseList(SolverConfig.java:364)
at org.optaplanner.core.config.solver.SolverConfig.buildSolver(SolverConfig.java:270)
at org.optaplanner.core.impl.solver.AbstractSolverFactory.buildSolver(AbstractSolverFactory.java:61)
at com.gmv.g2gmps.CBand.App.main(App.java:75)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
at java.lang.Thread.run(Thread.java:748)
Finally, I was told that configuring LATE_ACCEPTANCE and lateAcceptanceSize at the same time is incompatible. Therefore I set the late acceptance size as the only parameter (i.e., late acceptance configured via the acceptor) instead of writing it down as the local search type:
<localSearch>
  <acceptor>
    <lateAcceptanceSize>5000</lateAcceptanceSize>
    <entityTabuRatio>0.1</entityTabuRatio>
    <!--valueTabuSize>10000</valueTabuSize-->
  </acceptor>
  <forager>
    <acceptedCountLimit>1</acceptedCountLimit>
  </forager>
  <termination>
    <minutesSpentLimit>10</minutesSpentLimit>
    <bestScoreLimit>[0/11496/11496]hard/[0]soft</bestScoreLimit>
  </termination>
</localSearch>

How do I set up a State Store for a Transformer

I'm trying to create a Transformer, and I'm running into problems with the initialization of its StateStore. I looked at the example in How to register a stateless processor (that seems to require a StateStore as well)?
and it makes sense, but I'm trying something different:
KeyValueBytesStoreSupplier groupToKVStore_supplier =
    Stores.persistentKeyValueStore(state_store_name);
StoreBuilder<KeyValueStore<G, KeyValue<K, V>>> groupToKVStore_builder =
    Stores.keyValueStoreBuilder(groupToKVStore_supplier, Gserde, KVserde);
stream_builder.addStateStore(groupToKVStore_builder);
My intention is to use a String as the State Store key and a KeyValue as the State Store value. Is the formulation above correct? I'm asking because when the stream containing my Transformer is starting up, it throws an exception that says:
Caused by: org.apache.kafka.streams.errors.TopologyBuilderException: Invalid topology building: Processor KSTREAM-TRANSFORM-0000000001 has no access to StateStore state_store_1582785598
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:72)
at com.ui.streaming.processors.sort.WindowedTimeSorter.init(WindowedTimeSorter.java:135)
at org.apache.kafka.streams.kstream.internals.KStreamTransform$KStreamTransformProcessor.init(KStreamTransform.java:51)
at org.apache.kafka.streams.processor.internals.ProcessorNode$2.run(ProcessorNode.java:54)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208)
at org.apache.kafka.streams.processor.internals.ProcessorNode.init(ProcessorNode.java:10
Per Matthias' suggestion, I added a StateStore name argument to the transform invocation in my Stream, and that appears to get us past the error shown above. However, we then get the following exception:
ERROR stream-thread [A.Completely.Different.appID-b04af4b4-fdbb-4353-9aa5-6d71f7c22f9e-StreamThread-1] Failed to process stream task 0_1 due to the following error: (org.apache.kafka.streams.processor.internals.AssignedStreamsTasks:105)
java.lang.IllegalStateException: This should not happen as timestamp() should only be called while a record is processed
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.timestamp(AbstractProcessorContext.java:153)
at org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(StoreChangeLogger.java:59)
at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:69)
at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:29)
at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:198)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:117)
at com.ui.streaming.processors.sort.WindowedTimeSorter.transform(WindowedTimeSorter.java:167)
at com.ui.streaming.processors.sort.WindowedTimeSorter.transform(WindowedTimeSorter.java:1)
at org.apache.kafka.streams.kstream.internals.KStreamTransform$KStreamTransformProcessor.process(KStreamTransform.java:56)
Alas, things are still not quite right: First off, my Transformer's init method is being called three times; it should only be called once, right? Second, I'm getting a runtime error in my Transformer's transform method the first time it tries to store something into the StateStore:
INFO stream-thread [A.Completely.Different.appID-7dc67466-20f4-4e6c-8a69-bc0710a42f3c-StreamThread-1] Shutdown complete (org.apache.kafka.streams.processor.internals.StreamThread:1124)
Exception in thread "A.Completely.Different.appID-7dc67466-20f4-4e6c-8a69-bc0710a42f3c-StreamThread-1" java.lang.IllegalStateException: This should not happen as timestamp() should only be called while a record is processed
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.timestamp(AbstractProcessorContext.java:153)
at org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(StoreChangeLogger.java:59)
at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:69)
at org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:29)
at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:198)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:117)
at com.ui.streaming.processors.sort.WindowedTimeSorter.transform(WindowedTimeSorter.java:155)
Just adding the store to the topology is not sufficient. You additionally need to connect the store to the transformer by passing the store name into transform():
stream.transform(..., state_store_name);
Update:
For the second exception, I assume that you don't return a new object when TransformerSupplier#get() is called, but you return the same object each time. As the "supplier pattern" suggests, you need to create a new object each time #get() is called (otherwise, a supplier would not make sense and it would be possible to hand in a single object directly). Compare the FAQ: https://docs.confluent.io/current/streams/faq.html#why-do-i-get-an-illegalstateexception-when-accessing-record-metadata
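To make both points concrete, here is a minimal sketch, assuming WindowedTimeSorter is your Transformer implementation, has a no-argument constructor, and fetches its store in init() via context.getStateStore(state_store_name):

// Register the store on the builder, as in the question ...
stream_builder.addStateStore(groupToKVStore_builder);

// ... then connect it by name AND hand in a supplier that returns a fresh
// Transformer instance on every get() call:
stream.transform(WindowedTimeSorter::new, state_store_name);

// Note: Kafka Streams creates one Transformer per stream task, so init()
// typically runs once per task (e.g., three times for three input partitions).

// Anti-pattern that leads to the IllegalStateException above, because a
// single instance is shared by all tasks:
// WindowedTimeSorter shared = new WindowedTimeSorter();
// stream.transform(() -> shared, state_store_name);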

Project Reactor: possibly misleading documentation about error handling

I am reading the Reactor reference documentation about error handling and something seems wrong. For example, this section about the fallback method:
Flux.just("key1", "key2")
.flatMap(k -> callExternalService(k))
.onErrorResume(e -> getFromCache(k));
But the onErrorResume() lambda takes only one parameter, e (the error throwable). How is k (the previous value emitted by the flux) referenced here?
There are other similar code snippets in the docs. Am I reading this wrong?
Or, if the documentation is indeed incorrect, how can I actually handle this case: recovering from an error by executing an alternative path with the previous value?
Yes, I think you found a bug in the documentation.
If you want to use k, the call to onErrorResume must happen inside the argument to flatMap, like so:
Flux.just("key1", "key2")
.flatMap(k -> callExternalService(k)
.onErrorResume(e -> getFromCache(k))
);
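For illustration, here is a self-contained variant with hypothetical stubs for callExternalService and getFromCache; only the lookup for key2 fails, so only that one falls back to the cache:

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class FallbackExample {

    public static void main(String[] args) {
        Flux.just("key1", "key2")
            .flatMap(k -> callExternalService(k)
                .onErrorResume(e -> getFromCache(k)))   // k is still in scope here
            .subscribe(System.out::println);            // prints service:key1, cache:key2
    }

    // Hypothetical stub: pretend the external call fails for key2.
    static Mono<String> callExternalService(String k) {
        return "key2".equals(k)
            ? Mono.error(new RuntimeException("service down"))
            : Mono.just("service:" + k);
    }

    // Hypothetical stub: the cache lookup always succeeds.
    static Mono<String> getFromCache(String k) {
        return Mono.just("cache:" + k);
    }
}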
Regarding your comment: it is not possible to have the value being processed as part of the onErrorXXX methods, because the error in question might not have happened while a value was being processed. It might, for example, have happened while handling backpressure (i.e., requesting more elements) or while subscribing.

Publish-Subscribe Channels Both Going to Kafka Result in Duplicate KafkaProducerContexts

I am attempting to use Spring Integration to send data from one channel to two different Kafka queues after those same data go through different transformations on the way to their respective queues. The problem is I apparently have duplicate producer contexts, and I don't know why.
Here is my flow configuration:
flow -> flow
    .channel("firstChannel")
    .publishSubscribeChannel(Executors.newCachedThreadPool(), s -> s
        .subscribe(f -> f
            .transform(firstTransformer::transform)
            .channel(MessageChannels.queue(50))
            .handle(Kafka.outboundChannelAdapter(kafkaConfig)
                    .addProducer(firstMetadata(), brokerAddress),
                e -> e.id("firstKafkaOutboundChannelAdapter")
                    .autoStartup(true)
                    .poller(p -> p.fixedDelay(1000, TimeUnit.MILLISECONDS).receiveTimeout(0).taskExecutor(taskExecutor))
                    .get())
        )
        .subscribe(f -> f
            .transform(secondTransformer::transform)
            .channel(MessageChannels.queue(50))
            .handle(Kafka.outboundChannelAdapter(kafkaConfig)
                    .addProducer(secondMetadata(), brokerAddress),
                e -> e.id("secondKafkaOutboundChannelAdapter")
                    .autoStartup(true)
                    .poller(p -> p.fixedDelay(1000, TimeUnit.MILLISECONDS).receiveTimeout(0).taskExecutor(taskExecutor))
                    .get())
        ));
The exception is this:
Could not register object [org.springframework.integration.kafka.support.KafkaProducerContext#3163987e] under bean name 'not_specified': there is already object [org.springframework.integration.kafka.support.KafkaProducerContext#15f193b8] bound
I have tried using different kafkaConfig objects, but that hasn't helped. Meanwhile, the ProducerMetadata instances are distinct as you can see from the different first parameters to addProducer. Those provide the names of the respective destination queues among other metadata.
It sounds like there are some implicit bean definitions that are being created that conflict with each other.
How can I resolve this exception with the two KafkaProducerContexts?
You should not use .get() on those KafkaProducerMessageHandlerSpec instances; let the Framework work out the environment for you.
The issue is that KafkaProducerMessageHandlerSpec implements ComponentsRegistration, and nothing takes care of the:
public Collection<Object> getComponentsToRegister() {
    this.kafkaProducerContext.setProducerConfigurations(this.producerConfigurations);
    return Collections.<Object>singleton(this.kafkaProducerContext);
}
after a manual .get() invocation.
I agree, this is somewhat inconvenient and we should find a better solution for end applications, but for now there is no choice other than to follow the Spec style for the Framework components, like Kafka.outboundChannelAdapter().
Hope I am clear.
UPDATE
OK, it's definitely an issue on our side. And we will fix it soon:
https://jira.spring.io/browse/INTEXT-216
https://jira.spring.io/browse/INTEXT-217
Meanwhile the workaround for you is like this:
KafkaProducerContext kafkaProducerContext = (KafkaProducerContext) kafkaProducerMessageHandlerSpec.getComponentsToRegister().iterator().next();
kafkaProducerContext.setBeanName(null);
Where you should move
Kafka.outboundChannelAdapter(kafkaConfig)
.addProducer(firstMetadata(), brokerAddress)
to a separate private method in order to get access to that kafkaProducerContext, as sketched below.
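Putting the workaround together, a sketch of such a private method (the method name is illustrative, it relies only on the getComponentsToRegister() and setBeanName(null) calls shown above, and an equivalent method is needed for the second adapter):

private KafkaProducerMessageHandlerSpec firstKafkaAdapter() {
    KafkaProducerMessageHandlerSpec spec =
            Kafka.outboundChannelAdapter(kafkaConfig)
                    .addProducer(firstMetadata(), brokerAddress);
    // Clear the bean name on the implicitly registered producer context so the
    // two adapters no longer collide under 'not_specified'.
    KafkaProducerContext kafkaProducerContext =
            (KafkaProducerContext) spec.getComponentsToRegister().iterator().next();
    kafkaProducerContext.setBeanName(null);
    return spec;
}

The flow would then call .handle(firstKafkaAdapter(), e -> e.id("firstKafkaOutboundChannelAdapter") ...) instead of building the adapter inline.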

Unknown error in XPath

One of our web applications, which has been running in our production environment for a long time, has recently started hitting a weird error when there is a high volume of transactions. We couldn't figure out exactly what the root cause of the problem is, but we found some similar issues in the previous version, WebSphere 6, related to a bug in the Xalan version used by the app server. Our application server is actually WebSphere 7, which is supposed to have that fixed, and besides, it's not using Xalan under the hood anymore. Our application doesn't have a Xalan jar embedded either.
To fix it, we just restart the application itself.
One important note is that the Document is being cached (docs.get(tableName)) and reused to execute the XPath evaluation. We did it to avoid the cost of parsing the Document every time.
The app code is
Document doc = null;
try {
    doc = docs.get(tableName);
    if (doc == null)
        return null;

    XPathFactory xFactory = XPathFactory.newInstance();
    XPath xpath = xFactory.newXPath();
    XPathExpression expr = xpath.compile(toUpper(xPathQuery));
    Object result = expr.evaluate(doc, XPathConstants.NODESET);
    return (NodeList) result;
} catch (XPathExpressionException e) {
    logger.error("Error executing XPath", e);
}
And the error stack is here
javax.xml.transform.TransformerException: Unknown error in XPath.
at java.lang.Throwable.<init>(Throwable.java:67)
at javax.xml.transform.TransformerException.<init>(Unknown Source)
at org.apache.xpath.XPath.execute(Unknown Source)
at org.apache.xpath.jaxp.XPathExpressionImpl.evaluate(Unknown Source)
Caused by: java.lang.NullPointerException
at org.apache.xerces.dom.ElementNSImpl.getPrefix(Unknown Source)
at org.apache.xml.dtm.ref.dom2dtm.DOM2DTM.processNamespacesAndAttributes(Unknown Source)
at org.apache.xml.dtm.ref.dom2dtm.DOM2DTM.nextNode(Unknown Source)
at org.apache.xml.dtm.ref.DTMDefaultBase._nextsib(Unknown Source)
at org.apache.xml.dtm.ref.DTMDefaultBase.getNextSibling(Unknown Source)
at org.apache.xml.dtm.ref.DTMDefaultBaseTraversers$ChildTraverser.next(Unknown Source)
at org.apache.xpath.axes.AxesWalker.getNextNode(Unknown Source)
at org.apache.xpath.axes.AxesWalker.nextNode(Unknown Source)
at org.apache.xpath.axes.WalkingIterator.nextNode(Unknown Source)
at org.apache.xpath.axes.NodeSequence.nextNode(Unknown Source)
at org.apache.xpath.axes.NodeSequence.runTo(Unknown Source)
at org.apache.xpath.axes.NodeSequence.setRoot(Unknown Source)
at org.apache.xpath.axes.LocPathIterator.execute(Unknown Source)
... 16 more
This is the similar issue that I mentioned:
http://www-01.ibm.com/support/docview.wss?uid=swg1PK42574
Thanks.
Many people don't realise that the DOM is not thread-safe (even if you are only doing reads). If you are caching a DOM object, make sure you synchronize all access to it.
Frankly, this makes DOM unsuited to this kind of application. I know I'm biased, but I would suggest switching to Saxon, whose native tree implementation is not only far faster than DOM, but also thread-safe. One user recently tweeted about getting a 100x performance improvement when they made this switch.
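A minimal sketch of that synchronization applied to the code above, using the cached Document itself as the lock (this only helps if every reader, and any code that mutates the cached Document, synchronizes on the same object):

Document doc = docs.get(tableName);
if (doc == null)
    return null;

try {
    XPathFactory xFactory = XPathFactory.newInstance();
    XPath xpath = xFactory.newXPath();
    XPathExpression expr = xpath.compile(toUpper(xPathQuery));

    // The DOM is not thread-safe even for concurrent reads, so the evaluation
    // against the shared Document must hold the lock.
    synchronized (doc) {
        return (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
    }
} catch (XPathExpressionException e) {
    logger.error("Error executing XPath", e);
    return null;
}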

Resources