TPL BufferBlock default constructor: value of DataflowBlockOptions (task-parallel-library)

If you use the default constructor to construct a TPL BufferBlock, are the DataflowBlockOptions unbounded? In other words, what is the BoundedCapacity of the BufferBlock?
As stated in this SO answer, it's not possible to query or change the option values of a BufferBlock after construction.

You have two options to find this out: read the docs, or create a BufferBlock yourself and inspect it.
From Introduction to TPL Dataflow:
The majority of the dataflow blocks included in System.Threading.Tasks.Dataflow.dll support the specification of a bounded capacity.
This is the limit on the number of items the block may be storing and have in flight at any one time. By default, this value is initialized to DataflowBlockOptions.Unbounded (-1), meaning that there is no limit.
However, a developer may explicitly specify an upper bound. If a block is already at its capacity when an additional message is offered to it, that message will be postponed.
Also, from MSDN:
DataflowBlockOptions is mutable and can be configured through its properties.
When specific configuration options are not set, the following defaults are used:
TaskScheduler: TaskScheduler.Default
MaxMessagesPerTask: DataflowBlockOptions.Unbounded (-1)
CancellationToken: CancellationToken.None
BoundedCapacity: DataflowBlockOptions.Unbounded (-1)
Dataflow blocks capture the state of the options at their construction.
Subsequent changes to the provided DataflowBlockOptions instance should not affect the behavior of a dataflow block.
You can always view the private members from the debugger.
You may also try to get/set them via reflection, but this is really not recommended.

Related

Are txs in the same block ordered?

Suppose I have 2 txs, A and B, where B depends on A succeeding.
If both A & B are included in the same block/slot, this implies there may be some partial ordering between txs in this block that needs to be enforced for them to succeed.
Does this mean that the leader tries to order txs in the block when proposing? If yes, is this ordering of txs surfaced at the RPC level (e.g. maybe a "tx slot" inside the block)?
The leader validator that produces the block does in fact order the transactions in whichever order it wants, based on account read/write locks, affected programs, etc. Typically, it will go first-come, first-served, but in time, as MEV is added to validators, this will no longer be the case, and they can enforce their own constraints.
At the RPC level, the order of transactions in the block is the order in which they were executed. The explorer surfaces this with a "#" column, e.g. https://explorer.solana.com/block/137445155?filter=all
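For illustration, here's a minimal sketch (plain JDK 11+ HTTP client, no extra dependencies) that fetches that same block from the public mainnet JSON-RPC endpoint; the "transactions" array in the response preserves execution order, which is what the explorer's "#" column shows. Note that very old slots may already be pruned by public RPC nodes.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BlockOrder {
    public static void main(String[] args) throws Exception {
        // getBlock returns the block's transactions in execution order;
        // the array index corresponds to the explorer's "#" column.
        String body = "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"getBlock\","
                    + "\"params\":[137445155,{\"transactionDetails\":\"full\","
                    + "\"maxSupportedTransactionVersion\":0}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.mainnet-beta.solana.com"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The "transactions" array in the JSON result is ordered as executed.
        System.out.println(response.body());
    }
}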

Confusion about MQ Put options (IBM MQ)

I am trying to understand some of the IBM MQ put options:
I have used https://www.ibm.com/docs/en/ibm-mq/9.2?topic=interfaces-mqputmessageoptionsnet-class for documentation.
It seems that MQPMO_ASYNC_RESPONSE and MQPMO_SYNC_RESPONSE are in fact mutually exclusive, yet these options have two different IDs (bit 12 and bit 13). What does MQ do when both options are set, or neither one is set?
It seems that MQPMO_SYNCPOINT and MQPMO_NO_SYNCPOINT are in fact mutually exclusive, yet these options have two different IDs (bit 2 and bit 3). What does MQ do when both options are set, or neither one is set?
MQPMO_RESPONSE_AS_Q_DEF makes things even more confusing for me. As I understand from the documentation, this bit defers control of whether requests are synchronous or asynchronous to the queue definition, therefore ignoring the options set with MQPMO_ASYNC_RESPONSE and MQPMO_SYNC_RESPONSE. But the documentation states the following:
For an MQDestination.put call, this option takes the put response type from DEFPRESP attribute of the queue.
For an MQQueueManager.put call, this option causes the call to be made synchronously.
And the documentation for DEFPRESP at https://www.ibm.com/docs/en/ibm-mq/9.1?topic=queues-defpresp-mqlong states this:
The default put response type (DEFPRESP) attribute defines the value used by applications when the PutResponseType within MQPMO has been set to MQPMO_RESPONSE_AS_Q_DEF. This attribute is valid for all queue types.
The value is one of the following:
SYNC
The put operation is issued synchronously returning a response.
ASYNC
The put operation is issued asynchronously, returning a subset of MQMD fields.
But the other documentation says setting this option makes the call synchronous.
So in short: What happens with the seemingly mutually exclusive options and what does the MQPMO_RESPONSE_AS_Q_DEF really do?
If you combine option flags that are not supposed to be combined, such as MQPMO_SYNCPOINT and MQPMO_NO_SYNCPOINT, you will be returned MQRC_OPTIONS_ERROR on the MQPUT call. You can see the documentation for this in IBM Docs here.
You are correct, the use of MQPMO_RESPONSE_AS_Q_DEF tells the queue manager to take the value for Put Response from the queue attribute DEFPRESP. The queue manager will look up the queue definition, and if it is SYNC it will effectively use MQPMO_SYNC_RESPONSE and if it is ASYNC it will effectively use MQPMO_ASYNC_RESPONSE.
The documentation you pointed us to states the following:
MQC.MQPMO_RESPONSE_AS_Q_DEF
For an MQDestination.put call, this option takes the put response type from DEFPRESP attribute of the queue.
For an MQQueueManager.put call, this option causes the call to be made synchronously.
I don't know why it would be different depending on the class used, but I can tell you that if MQPMO_RESPONSE_AS_Q_DEF makes it to the queue manager, it will be changed as described above. This documentation suggests that the MQQueueManager class is changing it itself, which is an odd decision.
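To make the flag arithmetic concrete, here is a small sketch using the IBM MQ classes for Java (the MQPMO_* flags are the same MQI constants the .NET client exposes; the queue manager and queue names are hypothetical):

import com.ibm.mq.MQException;
import com.ibm.mq.MQMessage;
import com.ibm.mq.MQPutMessageOptions;
import com.ibm.mq.MQQueue;
import com.ibm.mq.MQQueueManager;
import com.ibm.mq.constants.CMQC;

public class PutOptionsDemo {
    public static void main(String[] args) throws Exception {
        MQQueueManager qmgr = new MQQueueManager("QM1");        // hypothetical qmgr
        MQQueue queue = qmgr.accessQueue("DEV.QUEUE.1",         // hypothetical queue
                                         CMQC.MQOO_OUTPUT);

        MQMessage msg = new MQMessage();
        msg.writeString("hello");

        MQPutMessageOptions pmo = new MQPutMessageOptions();

        // Mutually exclusive flags combined: the put is rejected.
        pmo.options = CMQC.MQPMO_SYNCPOINT | CMQC.MQPMO_NO_SYNCPOINT;
        try {
            queue.put(msg, pmo);
        } catch (MQException e) {
            // Expect reason code 2046 (MQRC_OPTIONS_ERROR).
            System.out.println("Reason code: " + e.reasonCode);
        }

        // Defer the put response type to the queue's DEFPRESP attribute.
        pmo.options = CMQC.MQPMO_RESPONSE_AS_Q_DEF | CMQC.MQPMO_NO_SYNCPOINT;
        queue.put(msg, pmo); // SYNC or ASYNC depending on the queue definition

        queue.close();
        qmgr.disconnect();
    }
}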

Kafka Streams - custom SessionWindows suppression with session size constraints

I am currently working on a Kafka Streams solution for retrieving user browsing sessions using SessionWindows. My topology looks like:
builder
.stream(...)
.map(... => (newKey, value))
.groupByKey(...)
.windowedBy(SessionWindows.`with`(INACTIVITY_GAP).grace(GRACE))
.aggregate(... into list of events)
.suppress(Suppressed.untilWindowCloses(unbounded()))
This simple scenario works well for me; however, I need to add some additional checks to the suppression logic. Namely, I would like to force-flush all the sessions that exceed a given size (e.g. all the sessions that contain more than 1000 events produced within the inactivity gap). My question is how this could be implemented.
I know that the .suppress() method does not accept any custom implementation of Suppressed. Therefore I was thinking about replacing the .suppress() with .transform() and a custom Transformer with a SessionStore inside, which could do the suppression logic and also apply these additional checks. However, I am having a hard time when it comes to adding/deleting entries in the store and implementing the basic "untilWindowCloses" suppression by myself: I could probably do the periodic flush through ProcessorContext.schedule(), but the SessionStore does not provide the possibility to iterate through all keys.
Is this a good direction? Are there any other ways to add size constraints to the sessions?
This might be what you're looking for:
org.apache.kafka.streams.kstream.Suppressed#untilTimeLimit
org.apache.kafka.streams.kstream.Suppressed.BufferConfig#maxRecords
.suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(WAIT_UNTIL), Suppressed.BufferConfig.maxRecords(1000)))
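Putting the suggestion together, a sketch of the topology in Java (topic names, durations, and the count() stand-in for your event-list aggregate are assumptions). As I read the API, maxRecords(...) yields an eager buffer config that emits early once the buffer is full, though note the bound applies to the whole suppression buffer rather than to a single session's event count:

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.kstream.Suppressed;

public class SessionTopology {
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .windowedBy(SessionWindows.with(Duration.ofMinutes(30)) // inactivity gap
                                         .grace(Duration.ofMinutes(5)))
               .count() // stand-in for the event-list aggregate
               // maxRecords(1000) is an eager buffer config: once the buffer
               // holds 1000 records it emits early instead of waiting for
               // the time limit.
               .suppress(Suppressed.untilTimeLimit(Duration.ofMinutes(30),
                         Suppressed.BufferConfig.maxRecords(1000)))
               .toStream((windowedKey, value) -> windowedKey.key())
               .to("sessions-out", Produced.with(Serdes.String(), Serdes.Long()));
        return builder;
    }
}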

Read ruleset topic/partition by multiple Kafka Streams instances of the same app

I have a Kafka Streams app that does some processing on a main event topic, and a side topic that is used to apply a ruleset to the main event topic.
Until now the app ran as a single instance: when a rule was applied, a static variable was set so that the other processing operator (the main topic consumer) would continue operating, evaluating rules as expected. This was necessary since the rule stream is written to a single partition depending on the rule key literal, e.g. <"MODE", value>, and that way (through the static variable) all the other tasks involved were made aware of the change.
When deploying the application to multiple nodes, however, this approach cannot work: with a single consumer group (across e.g. two app instances), only one instance would set its static variable to the correct value, and the other instance would never consume that rule value. (Setting each instance to a different group id would have the unwanted side effect of consuming the main topic twice.)
On the other hand, making the rule topic a global table would mean the main processing operator has to query the global table every time it consumes an event, in order to retrieve the latest rules.
Is it possible to use some sort of global table listener that executes callback code and sets a static variable when a value is introduced in that topic?
Is there a better/alternative approach to resolve this issue?
Instead of a GlobalKTable, you can fall back to addGlobalStore(), which allows you to execute custom code.
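A minimal sketch of that approach with the Kafka Streams 2.x Processor API (the topic and store names are hypothetical, and RuleCache stands in for whatever callback/static state you want to update):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class RuleStoreSetup {
    public static void addRuleStore(StreamsBuilder builder) {
        builder.addGlobalStore(
            // Global stores must have logging disabled; the source topic
            // itself serves as the backup.
            Stores.keyValueStoreBuilder(
                    Stores.inMemoryKeyValueStore("rules-store"),
                    Serdes.String(), Serdes.String())
                  .withLoggingDisabled(),
            "rules-topic",
            Consumed.with(Serdes.String(), Serdes.String()),
            () -> new Processor<String, String>() {
                private KeyValueStore<String, String> store;

                @Override
                @SuppressWarnings("unchecked")
                public void init(ProcessorContext context) {
                    store = (KeyValueStore<String, String>) context.getStateStore("rules-store");
                }

                @Override
                public void process(String key, String value) {
                    store.put(key, value);        // keep the global store in sync
                    RuleCache.update(key, value); // hypothetical hook: notify this instance
                }

                @Override
                public void close() { }
            });
    }
}

Because every instance of the app consumes the entire rules topic into its global store, the process() hook runs on every node, which is exactly what the static-variable approach needed.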

Significance of boolean direct in OutputFieldsDeclarer.declare(boolean direct, Fields fields)

Looking at the OutputFieldsDeclarer class, I see there is an overloaded declare(...) method with a boolean flag direct.
If I use the method declare(Fields fields), it sets this boolean flag to false.
I am not sure how Storm interprets this boolean field internally while processing with Spouts and Bolts.
Can somebody explain to me the significance of this flag?
If you declare a direct stream (i.e., set the flag to true), you need to emit tuples via the collector.emitDirect(...) methods (collector.emit(...) is not allowed for direct streams). Those methods require you to specify the ID of the consumer task that should receive the tuple.
Furthermore, when connecting a consumer to a direct stream, you need to specify
builder.setBolt(....).directGrouping("direct-emitting-bolt", "direct-stream-Id");
All other connection patterns are not allowed on direct streams.
Direct streams have the advantage that you get fine-grained control over the data distribution from producer to consumer. You can implement any imaginable distribution pattern. Of course, direct streams are much more difficult to handle. For example, you need to know the task IDs of the subscribed consumers (those can be looked up in the TopologyContext provided via Bolt.prepare).
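To illustrate, a sketch of a bolt that declares a direct stream and routes each tuple to an explicitly chosen consumer task (the component and stream IDs are hypothetical):

import java.util.List;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class DirectEmittingBolt extends BaseRichBolt {
    private OutputCollector collector;
    private List<Integer> consumerTasks;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Look up the task IDs of the consumer subscribed to our direct
        // stream ("consumer-bolt" is a hypothetical component ID).
        this.consumerTasks = context.getComponentTasks("consumer-bolt");
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getString(0);
        // Any distribution pattern works; here we hash the key to pick
        // exactly one target task.
        int target = consumerTasks.get(Math.floorMod(word.hashCode(), consumerTasks.size()));
        collector.emitDirect(target, "direct-stream-id", input, new Values(word));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // 'true' marks the stream as direct.
        declarer.declareStream("direct-stream-id", true, new Fields("word"));
    }
}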
