CloudLoggerFactory's Sanitized Logger shows CRLF Injection vulnerability in Veracode Scan - s4sdk

We are using S4 SDK's CloudLoggerFactory to log exceptions throughout our application. For a class "SampleClass", we create a logger like this:
private static final Logger logger = CloudLoggerFactory.getSanitizedLogger(SampleClass.class, "(END)");
and call it for an Exception e:
logger.error(e.getMessage(), e);
A Veracode scan has shown this logging line to be vulnerable to CRLF Injection. To my understanding, the getSanitizedLogger in conjunction with the "(END)" argument should solve this issue. Can you provide some insight into this matter, please?
Thank you in advance!

Actually we plan to remove the log sanitizing feature in the upcoming major release.
We have come to the conclusion that it actually gives a false sense of security and that it should be addressed on the logger implementation level instead, which we cannot do on SDK level as we only rely on the Slf4j abstraction.
(Disclaimer: I'm one of the SAP Cloud SDK developers.)

Update: As Sander mentioned in his answer below we dropped the CloudLoggerFactory starting with version 3.0.0 of the SAP Cloud SDK.
Our reasoning behind this is that we cannot change the Logger implementation used by every library our consumers might use in their application. This means we are not able to add the token mentioned below to all log messages of the consumer, which reduces its effectiveness tremendously.
Therefore we decided to drop the CloudLoggerFactory and advise the consumer to configure their logging implementation in such a way that this token is automatically added. On this level it is possible to have this token at the end of every log message, allowing for automated tests on forged logs.
What the sanitized logger is supposed to do is to make log forging identifiable. To allow this it does the following:
This logger has your provided class (SampleClass.class in your case) as the logger name. This name will be placed in the printed output depending on the configuration of your logger implementation. This is the default behavior of SLF4J.
It adds (END OF LOG ENTRY) (or your provided token) at the end of every log message created with this logger. If this token is already encountered in your log message it is replaced with (MESSAGE MIGHT BE FORGED!), as that would be an indicator that some input tried to tamper with your log messages.
Both of these properties allow you to identify whether a log message is actually valid or was created via Log Forging.
To see that, have a look at the following example, at first with the "unsanitized" logger:
final Logger logger = CloudLoggerFactory.getLogger(SampleClass.class);
logger.error("Some valid first message");
logger.info("Something still valid\n[main] ERROR very.important.class Major Database Error!");
logger.error("Some valid last message");
On my machine the output of this looks like
[main] ERROR com.sap.sandbox.SampleClass - Some valid first message
[main] INFO com.sap.sandbox.SampleClass - Something still valid
[main] ERROR very.important.class Major Database Error!
[main] ERROR com.sap.sandbox.SampleClass - Some valid last message
So there is no chance to identify that something is wrong with those messages.
Therefore, if you use CloudLoggerFactory.getSanitizedLogger instead of CloudLoggerFactory.getLogger you get the following log output:
[main] ERROR com.sap.sandbox.SampleClass - Some valid first message (END OF LOG ENTRY)
[main] INFO com.sap.sandbox.SampleClass - Something still valid
[main] ERROR very.important.class Major Database Error! (END OF LOG ENTRY)
[main] ERROR com.sap.sandbox.SampleClass - Some valid last message (END OF LOG ENTRY)
Here you can see that one of the messages from the SampleClass, which should actually end with the token, ends without one. Therefore you can deduce that there is some error in the log and you need to investigate this issue further.
So much for the Log Forging aspect, which is the actual attack the sanitized logger makes identifiable.
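The following is an added illustration of the token mechanism described above; it is not the SAP Cloud SDK's code. sanitize() mirrors the described behaviour (replace a token that is already inside the message, then append the real token), and findSuspiciousLines() is the kind of automated check on the printed output that the answer alludes to. The console layout "[thread] LEVEL logger - message" and the logger name com.sap.sandbox.SampleClass are taken from the sample output above; the input messages are constructed for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Added example, NOT the SDK's implementation: re-creates the end-of-entry token idea
 * and a check that flags printed log lines which do not carry the token.
 */
public final class LogForgingDemo {

    static final String END_TOKEN = "(END OF LOG ENTRY)";
    static final String TAMPER_MARK = "(MESSAGE MIGHT BE FORGED!)";

    /** What the sanitized logger does to a message before handing it to SLF4J. */
    static String sanitize(String message) {
        // A token that is already inside the message would let attacker-controlled text
        // look like a complete entry, so it is replaced before the real token is appended.
        return message.replace(END_TOKEN, TAMPER_MARK) + " " + END_TOKEN;
    }

    // Assumed console layout "[thread] LEVEL logger.name - message", as in the output above.
    private static final Pattern LINE = Pattern.compile("^\\[[^\\]]+\\] (\\w+) (\\S+) - (.*)$");

    /**
     * Every line attributed to loggerName must end with the token. A line that names the
     * logger but lacks the token was cut off by an injected line break; a line containing
     * TAMPER_MARK carried a fake token in its input.
     */
    static List<String> findSuspiciousLines(List<String> lines, String loggerName) {
        List<String> suspicious = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // e.g. stack-trace continuation lines: no logger name to check against
            }
            String logger = m.group(2);
            String message = m.group(3);
            if (logger.equals(loggerName) && !message.endsWith(END_TOKEN)) {
                suspicious.add(line);
            } else if (message.contains(TAMPER_MARK)) {
                suspicious.add(line);
            }
        }
        return suspicious;
    }

    /** Simulates the appender: the sanitized message is printed verbatim, line breaks included. */
    static List<String> print(String level, String message) {
        String entry = "[main] " + level + " com.sap.sandbox.SampleClass - " + sanitize(message);
        return List.of(entry.split("\n"));
    }

    public static void main(String[] args) {
        // Constructed inputs: the second and third carry untrusted text with an embedded newline;
        // the third additionally tries to supply its own end token.
        String untrusted = "Something still valid\n[main] ERROR very.important.class Major Database Error!";
        String fakedToken = "Looks finished " + END_TOKEN + "\n[main] ERROR very.important.class Disk full";

        List<String> lines = new ArrayList<>();
        lines.addAll(print("ERROR", "Some valid first message"));
        lines.addAll(print("INFO", untrusted));
        lines.addAll(print("INFO", fakedToken));
        lines.addAll(print("ERROR", "Some valid last message"));

        lines.forEach(System.out::println);
        System.out.println("Suspicious lines:");
        findSuspiciousLines(lines, "com.sap.sandbox.SampleClass")
                .forEach(l -> System.out.println("  " + l));
    }
}
```

Running main() prints the four entries as six physical lines and then flags the two SampleClass lines that were cut short by the injected newline (the second of them also showing the replaced token). The injected "very.important.class" lines themselves are not parsed, because they do not follow the layout; the check relies on the genuine entries being incomplete, which is exactly the property the token provides.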
Regarding the CRLF injection issue: this heavily depends on the further usage of the created log output:
If you store the log messages in a database there needs to be some way to prevent SQL injection.
If you watch the log files with a web-based log analyzer there needs to be some way to prevent XSS.
...
If we escaped for all of those potential use cases, it would actually make just reading the log files with an editor, which is imo the most common use case, much more complicated.
So you would need to decide whether for your case this is an actual issue or just a false positive.
Another point is that also all your other dependencies would need to escape their log messages for your use case. This means an easier and overarching solution would be to configure that on the actual logger implementation, e.g. for Logback: https://logback.qos.ch/manual/layouts.html#replace.
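For the Logback route recommended in the answer, here is an added logback.xml fragment (not from the answer or the SDK) that uses the %replace conversion word from the linked manual section to collapse CR/LF inside the message and then appends a fixed end-of-entry token at the layout level, so every library's messages get the same treatment. The appender name and token text are choices made for this example.

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <!-- %replace(p){regex, replacement}: any run of CR/LF inside the message becomes one space.
           The token after it is appended by the layout, not by application code, so it is present
           on every entry regardless of which library produced it. -->
      <pattern>[%thread] %-5level %logger - %replace(%msg){'[\r\n]+', ' '} (END OF LOG ENTRY)%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```

One caveat when checking such output: Logback still appends the stack trace of a logged exception after the pattern, so trace continuation lines will not carry the token and a checker has to treat them as continuation lines rather than as forged entries.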

Related

Multithreaded Use of Spring Pulsar

I am working on a project to read from our existing ElasticSearch instance and produce messages in Pulsar. If I do this in a highly multithreaded way without any explicit synchronization, I get many occurrences of the following log line:
Message with sequence id X might be a duplicate but cannot be determined at this time.
That is produced from this line of code in the Pulsar Java client:
https://github.com/apache/pulsar/blob/a4c3034f52f857ae0f4daf5d366ea9e578133bc2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L653
When I add a synchronized block to my method, synchronizing on the pulsar template, the error disappears, but my publish rate drops substantially.
Here is the current working implementation of my method that sends Protobuf messages to Pulsar:
public <T extends GeneratedMessageV3> CompletableFuture<MessageId> persist(T o) {
    var descriptor = o.getDescriptorForType();
    PulsarPersistTopicSettings settings = pulsarPersistConfig.getSettings(descriptor);
    MessageBuilder<T> messageBuilder = Optional.ofNullable(pulsarPersistConfig.getMessageBuilder(descriptor))
            .orElse(DefaultMessageBuilder.DEFAULT_MESSAGE_BUILDER);
    Optional<ProducerBuilderCustomizer<T>> producerBuilderCustomizerOpt =
            Optional.ofNullable(pulsarPersistConfig.getProducerBuilder(descriptor));
    PulsarOperations.SendMessageBuilder<T> sendMessageBuilder;
    sendMessageBuilder = pulsarTemplate.newMessage(o)
            .withSchema(Schema.PROTOBUF_NATIVE(o.getClass()))
            .withTopic(settings.getTopic());
    producerBuilderCustomizerOpt.ifPresent(sendMessageBuilder::withProducerCustomizer);
    sendMessageBuilder.withMessageCustomizer(mb -> messageBuilder.applyMessageBuilderKeys(o, mb));
    synchronized (pulsarTemplate) {
        try {
            return sendMessageBuilder.sendAsync();
        } catch (PulsarClientException re) {
            throw new PulsarPersistException(re);
        }
    }
}
The original version of the above method did not have the synchronized(pulsarTemplate) { ... } block. It performed faster, but generated a lot of logs about duplicate messages, which I knew to be incorrect. Adding the synchronized block got rid of the log messages, but slowed down publishing.
What are the best practices for multithreaded access to the PulsarTemplate? Is there a better way to achieve very high throughput message publishing?
Should I look at using the reactive client instead?
EDIT: I've updated the code block to show the minimum synchronization necessary to avoid the log lines, which is just synchronizing during the .sendAsync(...) call.
Your usage w/o the synchronized should work. I will look into that though to see if I see anything else going on. In the meantime, it would be great to give the Reactive client a try.
This issue was initially tracked here, and the final resolution was that it was an issue that has been resolved in Pulsar 2.11.
Please try updating to Pulsar 2.11.

Logging Microprofile fault tolerance events

I am working on a Quarkus app that uses the smallrye microprofile fault tolerance implementation.
We have configured fault tolerance on the client definitions via the annotations API (@Retry, @Bulkhead, etc.) and it seems to work but we don't get any sort of feedback about what is happening. Ideally we would like to get some sort of callback but even just having logs would help out in the first step.
The rest clients look something like this:
@RegisterRestClient(configKey = "foo-backend")
@Path("/backend")
interface FooClient {
    @POST
    @Retry(maxRetries = 4, delay = 900)
    @ExponentialBackoff
    @Timeout(value = 3000)
    fun getUser(payload: GetFooUserRequest): GetFooUserResponse
}
Looking at the logs, even though we trace all communication, I cannot see any event even if I manually stop foo-backend and start it again before the retries run out.
Our logging config looks like this right now but still nothing
quarkus.rest-client.logging.scope=request-response
quarkus.rest-client.logging.body-limit=2048
quarkus.log.category."org.jboss.resteasy.reactive.client.logging".level=DEBUG
Is there a way to get callbacks when a fault tolerance event happens? Or a setting which logs them out? I also would be interested in knowing when our Circuit Breakers are triggered or when a Bulkhead fills up. Logging them would be good enough for now but ideally I would like to somehow listen for them.
You can enable DEBUG logging for the io.smallrye.faulttolerance category, and you should get all the information you need.
Specifically for circuit breakers, you can register state change listeners for circuit breakers that have been given a name using @CircuitBreakerName -- just inject CircuitBreakerMaintenance and use onStateChange. See https://smallrye.io/docs/smallrye-fault-tolerance/5.6.0/usage/extra.html#_circuit_breaker_maintenance
There's unfortunately nothing similar for bulkheads yet.
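As an added example (not from the question or from SmallRye itself), this is what the state-change listener from the answer above looks like in a Quarkus application, written in Java rather than the question's Kotlin. It assumes the smallrye-fault-tolerance extension, a circuit breaker named "foo-backend" via @CircuitBreakerName on the guarded REST client method (the circuit breaker thresholds and the name are chosen for the example), and the request/response types from the question. Whether the CDI and JAX-RS imports are javax.* or jakarta.* depends on the Quarkus version in use.

```java
import io.quarkus.runtime.StartupEvent;
import io.smallrye.faulttolerance.api.CircuitBreakerMaintenance;
import io.smallrye.faulttolerance.api.CircuitBreakerName;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;
import org.jboss.logging.Logger;

// Same client as in the question, plus a named circuit breaker (values chosen for the example).
@RegisterRestClient(configKey = "foo-backend")
@Path("/backend")
interface FooClient {
    @POST
    @Retry(maxRetries = 4, delay = 900)
    @CircuitBreaker(requestVolumeThreshold = 4, failureRatio = 0.5, delay = 5000)
    @CircuitBreakerName("foo-backend") // hypothetical name; must match the listener below
    GetFooUserResponse getUser(GetFooUserRequest payload);
}

/** Logs every CLOSED -> OPEN -> HALF_OPEN transition of the "foo-backend" breaker. */
@ApplicationScoped
public class CircuitBreakerStateLogger {

    private static final Logger LOG = Logger.getLogger(CircuitBreakerStateLogger.class);

    @Inject
    CircuitBreakerMaintenance maintenance;

    // Register once at startup; the callback then fires on every state change.
    void registerListener(@Observes StartupEvent ev) {
        LOG.infof("circuit breaker 'foo-backend' starts in state %s",
                maintenance.currentState("foo-backend"));
        maintenance.onStateChange("foo-backend",
                state -> LOG.warnf("circuit breaker 'foo-backend' changed to %s", state));
    }
}
```

Together with quarkus.log.category."io.smallrye.faulttolerance".level=DEBUG from the answer, this gives both the per-invocation retry/timeout trace and an explicit, alertable log line whenever the breaker opens or closes, which is usually what an operations team wants to monitor.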

How to show complete log messages on Spring-boot

Most of the messages I looked at in the history are about disabling certain aspects of the log. I'd like the opposite. I'm seeing lots of messages like:
DEBUG o.s.w.s.m.m.a.HttpEntityMethodProcessor.traceDebug (91) - Writing ["<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2. (truncated)...]"
I'd like to see the entire, non-truncated, (in this case) RSS feed. Any idea how I can persuade Spring/Logback/console/maven to do this?
Bonus question -- how would I write a test to verify that the logs are actually not truncated? I don't have them persisted in any way, just on the console. Many thanks!
With DEBUG log level, Spring only logs the truncated data. With TRACE log level, Spring logs the complete data.
You could configure something like
logging.level.org.springframework.web.servlet.mvc.method.annotation.HttpEntityMethodProcessor = TRACE
To test the output written to the log (like System.out), have a look into OutputCapture.
@ExtendWith(OutputCaptureExtension.class)
class OutputCaptureTests {
    @Test
    void testName(CapturedOutput output) {
        System.out.println("Hello World!");
        assertThat(output).contains("World");
    }
}
You could also set the TRACE logging level on the HttpLogging entity like this:
logging.level.org.springframework.web.HttpLogging = TRACE

How to log performance for each node in a route in camel in the correct order of invocation and not in the order of completion?

I have a simple route like this
from("file:data/inbox?noop=true").transform().body().to("file:data/outbox").bean(UpdateInventory.class);
from("direct:update").to("file:data/anotherbox").to("direct:newupdate");
from("direct:newupdate").to("file:data/newbox");
And the output I am expecting is
-file://data/inbox
--transform[simple{body}] 15 ms
--file:data/outbox 5
---bean[com.classico.sample.UpdateInventory#3a469fea 19
---file:data/anotherbox 6
----direct:newupdate 5
-----file:data/newbox 4
I tried using an EventNotifier and, when the ExchangeCompletedEvent is received, I fetched the message history. But since the second exchange is completed first, the message history is showing up in the reverse order of invocation. Is it possible to store all the message histories in a collection and print them in reverse order, or is there any event that is suitable for this?
if (event instanceof ExchangeCompletedEvent) {
    ExchangeCompletedEvent exchangeCompletedEvent = (ExchangeCompletedEvent) event;
    Exchange exchange = exchangeCompletedEvent.getExchange();
    String routeId = exchange.getFromRouteId();
    List<MessageHistory> list = exchange.getProperty(Exchange.MESSAGE_HISTORY, List.class);
    for (MessageHistory history : list) {
        String id = history.getNode().getId();
        String label = URISupport.sanitizeUri(history.getNode().getLabel());
        log.info(String.format(MESSAGE_HISTORY_OUTPUT, routeId, id, label, history.getElapsed()));
    }
}
You can use JMX to get all those details for each processor.
There is also a dumpRouteStatsAsXml operation on each route / camelContext that can output an XML file of the route(s) with all performance stats.
We use this in the hawtio web console to list this kind of information.
http://hawt.io/
Also the Camel Karaf / Jolokia Commands uses this as well
https://github.com/apache/camel/blob/master/platforms/commands/commands-core/src/main/java/org/apache/camel/commands/ContextInfoCommand.java#L175
And in next release of Camel you can also easier get the various processor mbeans, from CamelContext if you know their id, using
https://github.com/apache/camel/blob/master/camel-core/src/main/java/org/apache/camel/CamelContext.java#L545
Then you can use the getters on the mbean to get the performance stats.
The event notifier which was suggested is also great, but the events are on a higher level, although you get an event for sending to an endpoint, such as to some external system, which often is enough to capture details about. For low level details as asked here, you need to use the JMX stats.
Ohh, I forgot to tell you about the message history EIP, which also has a trace of how the message was routed with time taken stats as well.
http://camel.apache.org/message-history.html
That is maybe also just what you need, then you can get that information from the exchange as shown on that link.
I would suggest using the camel EventNotifier. You can find documentation on how to use it here:
http://camel.apache.org/eventnotifier-to-log-details-about-all-sent-exchanges.html

spring integration message released twice from aggregator

I have a spring integration flow that starts with a channel inboundadapter and picks up files and passes them through the system as messages.
After a few components, the messages are aggregated at an "Aggregator" from where they are released based on release strategies or by group timeout of 30 sec.
The downstream processing has another bunch of components till the final one.
The problem I am facing is this,
When I send 33 files which create 33 "groups/buckets" based on correlation IDs, aggregated at the "Aggregator", some of the files or messages seems to be "released" twice. The reason I conclude that is because I have a channel interceptor which shows a few messages passing through the "released" channel (appearing right after the aggregator) a second time, after completing the downstream processing successfully, the first time. Additionally, this behavior causes my application to not find a file and throw an exception which I see. This leads me to conclude that the message bucket/group/corrID is somehow being "Released" twice.
I have tried to debug this many ways , but essentially, I want to know how a corrID/bucket after being released and having successfully gone through all downstream components in a single thread, can be "released" again.
My question is, how can I debug this? I want to know what is making this message/bucket re-appear in the aggregator.
My aggregator is as follows,
<int:aggregator id="bufferedFiles" input-channel="inQueueForStage"
    output-channel="released" expire-groups-upon-completion="true"
    send-partial-result-on-expiry="true" release-strategy="releaseHandler"
    release-strategy-method="canRelease"
    group-timeout-expression="size() > 0 ? T(com.att.datalake.ifr.loader.utils.MessageUtils).getAggregatorTimeout(one, #sourceSnapshot) : -1">
    <int:poller fixed-delay="${files.pickup.delay:3000}"
        max-messages-per-poll="${num.files.pickup.per.poll:10}"
        task-executor="executor" />
</int:aggregator>
Explanation of aggregator: The size()>0 applies to EACH correlation bucket. Each of the 33 files I am sending will spawn/generate/create a new bucket because of the file name, so the aggregator will have 33 buckets/groups/corrIds, each bucket will contain only one file.
So the aggregator SPEL expression simply says that if there no release strategies, then release the bucket/group after 30 secs if the group indeed has at least some files.
My Channel inbound adapter is as follows:
<int-file:inbound-channel-adapter id="files"
    channel="dispatchFiles" directory="${source.dir}" scanner="directoryScanner">
    <int:poller fixed-delay="${files.pickup.delay:3000}"
        max-messages-per-poll="${num.files.pickup.per.poll:10}" />
</int-file:inbound-channel-adapter>
Logs
Here is the log of the message completing the flow the first time. The completion time invoked suggests reaching the last component, a "completionHandler" SA.
Explanation of Log: "cor" is the bucket/corrId that is being released twice. The reason I get the final exception is because during the first time, the file is removed from that original location and processed. So the second time around when this erroneous release happens, there is nothing to process there.
From the pictures it can be seen that the first batch/corrId/bucket is processed and finished around 11:09, and the second one is started around 11:10
An important point I noticed is that this behavior only happens when I have a global channel interceptor in which I am doing somewhat long processing. When this interceptor is commented out, the errors go away.
Question:
is it possible for aggregator to double release a batch/corrId under any circumstance? How can I make aggregator emit any logs?
Thanks
Edit 10:15pm
My channel following the aggregator has an interceptor as follows,
public Message<?> preSend(Message<?> message, MessageChannel channel) {
    LOGGER.info("******** Releasing from aggregator(interceptor) , corrID:{} at time:{} ********", MessageUtils.getCorrelationId(message), new Date());
    finalReporter.callback(channel.toString(), message);
    return message;
}
From Aggregator down to the final completionHandler SA, I have single threaded processing
Aggregator -> releasedChannel -> some SA1 -> some channel -> ..... -> completionChannel->completeSA
When I run for 33 partitions, let's follow corrId = "alh" The first time it is released, it looks like following,
What it shows is that thread-5 released it and it should process all the downstream components. But it leaves it mid-way and starts doing other things and is picked up again by a different thread a little later as follows,
That seems/seemed to be the problem,
Solution Update:
I did the following 3 things to sort of work around it, at the moment:
For some reason, my interceptors were doing return super.preSend(message, channel) instead of simply return message. I changed it to the latter.
I had global channel interceptors; I removed the global ones and kept individual ones.
If the channel interceptors had any issues before returning, would that cause a new release?
Although I still see the above scenario depicted in pictures, I am not getting double processing attempts and as such it avoids the errors. I am still trying to make sense out of this.
I understand it's too specific and difficult to explain; still thanks for the time and comments...
However, yes. I think #GaryRussell is right: since you use expire-groups-upon-completion="true" some partial groups may be released by group-timeout-expression and the new messages with the same correlationId will form a new group, which is released by the next group-timeout. Your size() > 0 isn't good too. It means that it is going to release partial group after that group-timeout. Maybe size() > 1? The group can't be size() == 0 though. Because it is created on the first message, so, if group exists, it contains at least one message. Yes, group can be empty, but in that case the aggregator should be marked with expire-groups-upon-completion="false". In that case it is marked as completed and doesn't allow new messages.
After struggling with debugging and various blind scenarios, I believe that at least I have a workaround and a possible root cause. I will try to outline all the things that I modified,
Root Cause:
My interceptors were calling a Common class with a common callback method. This method, based on the channel name from which the request was coming from, would decide the appropriate action to take. The actions were essentially collecting data, incrementing counters and persisting to database some information.
It seems that some of them were having errors and consequently, the thread was dying and message re-released. I am not entirely sure about it and please correct me if that's not the case.
But after I fixed those errors, the re-release issue seems to have subsided or vanished altogether.
The reason it was hard to diagnose was because I could not see those errors thrown during callback method invocations; may be I was catching them or may be they were lost.
I also found that the issue was only on any channel interceptors AFTER the aggregator. Interceptors before the aggregator did not present any issues; may be because they were simpler...
To debug,
I removed the interceptors and made the callback directly from various components (SAs), removed global interceptors and tried to add individual interceptors for specific channels.
Thanks for all the help.
