Reactive programming in a synchronous environment: performance vs. dependency management

I understand that Reactive Programming shines in asynchronous environments (web requests, heavy IO, multi-threading, background tasks). However, in the synchronous world, I have found that Reactive Programming still gives a great benefit: it frees the programmer from the burden of dependency management. I am writing a desktop app in C# that acts like a spreadsheet: lots of inputs, calculations on those inputs, and outputs. I am using Rx.NET and enjoying the free dependency management it gives me: when an input changes, I do not need to know which calculations need to be redone or which UI needs to be updated. However, as more synchronous/sequential calculation gets involved, the performance hit from using observables grows. Consider these two ways of coding:
private static void async_world()
{
    Subject<string> a_ob = new Subject<string>();
    IObservable<string> A_ob = a_ob.Select(str =>
    {
        return my_to_upper(str);
    });
    IObservable<string> AA_ob = A_ob.Select(str => $"{str}{str}");
    IObservable<string> AAA_ob = A_ob.Select(str => $"{str}{str}{str}");
    IObservable<string> AA_AAA_ob = Observable.CombineLatest(AA_ob, AAA_ob,
        (AA, AAA) =>
        {
            return $"{AA}_{AAA}";
        });
    AA_AAA_ob.Subscribe(str => Console.Out.WriteLine(str));
    a_ob.OnNext("a");
}
private static void sync_world()
{
    Subject<string> a_ob = new Subject<string>();
    IObservable<string> result_ob = a_ob.Select(str =>
    {
        var upper = my_to_upper(str);
        var AA = $"{upper}{upper}";
        var AAA = $"{upper}{upper}{upper}";
        return $"{AA}_{AAA}";
    });
    result_ob.Subscribe(str => Console.Out.WriteLine(str));
    a_ob.OnNext("a");
}
assuming my_to_upper() is a slow process:
private static string my_to_upper(string str)
{
    Console.Out.WriteLine($"{str}.ToUpper...");
    // Busy-wait to simulate an expensive computation
    for (int i = 0; i < 1000000; i++)
    {
        for (int j = 0; j < 2000; j++)
        {
        }
    }
    Console.Out.WriteLine($"{str}.ToUpper...done");
    return str.ToUpper();
}
For async_world(), my_to_upper() is executed twice compared to sync_world(). It would be nice if, when data arrives (on each OnNext call), A_ob performed its calculation once, "cached" the result of my_to_upper(), and passed it on to AA_ob and AAA_ob.
So my question is: is this a trade-off we have to make: let the computer manage dependencies automatically for us at the cost of performance, or manage dependencies manually to gain better performance?

It is possible to 'cache' the result with the various Publish overloads.
Here's a published form of async_world using Publish + RefCount:
private static void async_world_publish_refcount()
{
    Subject<string> a_ob = new Subject<string>();
    IObservable<string> A_ob = a_ob
        .Select(str => my_to_upper(str)) // same function call as the original async_world, just without braces
        .Publish()
        .RefCount();
    IObservable<string> AA_ob = A_ob.Select(str => $"{str}{str}");
    IObservable<string> AAA_ob = A_ob.Select(str => $"{str}{str}{str}");
    IObservable<string> AA_AAA_ob = Observable.CombineLatest(AA_ob, AAA_ob,
        (AA, AAA) => $"{AA}_{AAA}");
    AA_AAA_ob.Subscribe(str => Console.Out.WriteLine(str));
    a_ob.OnNext("a");
}
And here's a published form without RefCount:
private static void async_world_publish_only()
{
    Subject<string> a_ob = new Subject<string>();
    IObservable<string> AA_AAA_ob = a_ob
        .Select(str => my_to_upper(str)) // same function call as the original async_world, just without braces
        .Publish(A_ob =>
            Observable.CombineLatest(
                A_ob.Select(str => $"{str}{str}"),
                A_ob.Select(str => $"{str}{str}{str}"),
                (AA, AAA) => $"{AA}_{AAA}"
            )
        );
    AA_AAA_ob.Subscribe(str => Console.Out.WriteLine(str));
    a_ob.OnNext("a");
}
The Publish form without RefCount is probably preferred if you're a fan of functional, reactive-y things: it effectively caches an observable's result values and makes them available inside the limited lambda scope, where you produce a selector that decides what to do with them. You could, if necessary, nest Publish lambdas to access multiple "cached" observable values.
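For instance, a minimal sketch of such nesting (source_a and source_b are hypothetical input observables; my_to_upper is the slow function from the question):
// Nested Publish lambdas give the innermost selector access to both
// "cached" observables at once, each computed only once per value.
IObservable<string> combined = source_a
    .Select(str => my_to_upper(str))
    .Publish(a => source_b
        .Select(str => my_to_upper(str))
        .Publish(b => Observable.CombineLatest(a, b, (x, y) => $"{x}_{y}")));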
Publish + RefCount, in contrast, promotes statement-y, iterative-looking pipelines, and is generally less recommended.
You can read more about Publish, RefCount and hot vs cold observables here.

Related

How to optimize merging elements in parallel

We have following problem. We parsing files (producer) and convert the data into a c# data format. Afterwards we need to merge all of this data together.
As this can be done in parallel we startet to implement a producer consumer pattern but stucking a bit in how to merge the results in a optimized manner.
Producer produces 5 data elements (named as follows):
1, 2, 3, 4, 5
The merges will be done as follows, but the order does not matter; as soon as two elements have been created, they can be merged.
Example:
(1)and(2), (3)and(4), (12)and(34), (1234)and(5)
Data data = new Data();
BlockingCollection<Data> collection = new BlockingCollection<Data>();
Task consumer = Task.Factory.StartNew(() =>
{
    while (!collection.IsCompleted)
    {
        var item = collection.Take();
        data.Merge(item);
    }
});
Task producer = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(files, file =>
    {
        collection.Add(new Data(file));
    });
    collection.CompleteAdding();
});
Task.WaitAll(consumer, producer);
// here we have the data merged from all files
return data;
This code works but has a problem: in our case the producer is much faster than the consumer, so we need parallel consumers that wait for two items to be in the queue, take them, merge them together, and put the result back in the queue. Is there any known pattern for such a merge problem?
We found quite a nice solution for this:
BlockingCollection<Data> collection = new BlockingCollection<Data>();
List<Task<Data>> consumers = new List<Task<Data>>();
for (int i = 0; i < 5; i++)
{
    Task<Data> consumer = Task.Factory.StartNew(() =>
    {
        // Each consumer folds whatever items it takes into a partial result
        Data result = null;
        foreach (Data item in collection.GetConsumingEnumerable())
        {
            if (result == null)
            {
                result = item;
            }
            else
            {
                result.Merge(item);
            }
        }
        return result;
    });
    consumers.Add(consumer);
}
Task producer = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(files, file =>
    {
        collection.Add(new Data(file));
    });
    collection.CompleteAdding();
});
Task.WaitAll(consumers.Cast<Task>().Concat(new[] { producer }).ToArray());
// Merge the per-consumer partial results sequentially at the end
List<Data> datas = consumers.Select(t => t.Result).Where(t => t != null).ToList();
Data finalResult = datas.First();
foreach (Data toBeMerged in datas.Skip(1))
{
    finalResult.Merge(toBeMerged);
}
return finalResult;

ElasticSearch NEST 5.6.1 Query for unit test

I wrote a bunch of queries for Elasticsearch and I wanted to write unit tests for them. Using this post (moq an elastic connection) I was able to perform general mocking, but when I tried to view the JSON being generated from my query I couldn't get at it in any way.
I tried to follow this post (elastic query moq), but it is only relevant to older versions of NEST, because the ConnectionStatus and RequestInformation members are no longer available on an ISearchResponse object.
My test looks as follows:
[TestMethod]
public void VerifyElasticFuncJson()
{
    // Arrange
    var elasticService = new Mock<IElasticService>();
    var elasticClient = new Mock<IElasticClient>();
    var client = new ElasticClient();
    var searchResponse = new Mock<ISearchResponse<ElasticLog>>();
    elasticService.Setup(es => es.GetConnection())
        .Returns(elasticClient.Object);
    elasticClient.Setup(ec => ec.Search(It.IsAny<Func<SearchDescriptor<ElasticLog>, ISearchRequest>>()))
        .Returns(searchResponse.Object);
    // Act
    var service = new ElasticCusipInfoQuery(elasticService.Object);
    var FindFunc = service.MatchCusip("CusipA", HostName.GSMSIMPAPPR01, LogType.Serilog);
    var con = GetConnection();
    var search = con.Search<ElasticLog>(sd => sd
        .Type(LogType.Serilog)
        .Index("logstash-*")
        .Query(q => q
            .Bool(b => b
                .Must(FindFunc)
            )
        )
    );
    // HERE I want to get the JSON and assert it looks as expected
}
Is there any other way to achieve what I ask?
The best way to do this would be to use InMemoryConnection to capture the request bytes and compare them to the expected JSON; this is what the unit tests for NEST themselves do. Something like:
private static void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var connectionSettings = new ConnectionSettings(pool, new InMemoryConnection())
        .DefaultIndex("default")
        .DisableDirectStreaming();
    var client = new ElasticClient(connectionSettings);
    // Act
    var searchResponse = client.Search<Question>(s => s
        .Query(q => (q
            .Match(m => m
                .Field(f => f.Title)
                .Query("Kibana")
            ) || q
            .Match(m => m
                .Field(f => f.Title)
                .Query("Elasticsearch")
                .Boost(2)
            )) && +q
            .Range(t => t
                .Field(f => f.Score)
                .GreaterThan(0)
            )
        )
    );
    var actual = searchResponse.RequestJson();
    var expected = new
    {
        query = new
        {
            @bool = new
            {
                must = new object[]
                {
                    new
                    {
                        @bool = new
                        {
                            should = new object[]
                            {
                                new { match = new { title = new { query = "Kibana" } } },
                                new { match = new { title = new { query = "Elasticsearch", boost = 2d } } }
                            }
                        }
                    },
                    new
                    {
                        @bool = new
                        {
                            filter = new[]
                            {
                                new { range = new { score = new { gt = 0d } } }
                            }
                        }
                    }
                }
            }
        }
    };
    // Assert
    Console.WriteLine(JObject.DeepEquals(JToken.FromObject(expected), JToken.Parse(actual)));
}
public static class Extensions
{
    public static string RequestJson(this IResponse response) =>
        Encoding.UTF8.GetString(response.ApiCall.RequestBodyInBytes);
}
I've used an anonymous type for the expected JSON as it's easier to work with than an escaped JSON string.
One thing to note is that Json.NET's JObject.DeepEquals(...) will return true even when a JSON object has repeated keys, so long as the last key/value matches. You're not likely to encounter this if you're only serializing NEST searches, but it's something to be aware of.
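As a quick illustration of that quirk (a standalone snippet of my own; it relies on Json.NET's default behavior of letting the last duplicate key win when parsing):
// The duplicate "x" keys collapse to the last value on parse,
// so DeepEquals against { x = 2 } reports true.
var withDuplicates = JToken.Parse("{ \"x\": 1, \"x\": 2 }");
var lastValueOnly = JToken.FromObject(new { x = 2 });
Console.WriteLine(JToken.DeepEquals(withDuplicates, lastValueOnly)); // true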
If you're going to have many tests checking serialization, you'll want to create a single instance of ConnectionSettings and share it across them all, so that you can take advantage of its internal caches and your tests run quicker than if you instantiated a new instance in each test.
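A minimal sketch of that sharing (the static holder class is my own invention; the settings mirror the example above):
public static class SharedTestClient
{
    // One ConnectionSettings instance for the whole test run,
    // so its internal caches are reused across tests.
    private static readonly ConnectionSettings Settings =
        new ConnectionSettings(
                new SingleNodeConnectionPool(new Uri("http://localhost:9200")),
                new InMemoryConnection())
            .DefaultIndex("default")
            .DisableDirectStreaming();

    public static readonly ElasticClient Instance = new ElasticClient(Settings);
}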

Running async function in parallel using LINQ's AsParallel()

I have a Document DB repository class that has one get method like below:
private static DocumentClient client;
public async Task<TEntity> Get(string id, string partitionKey = null)
{
    try
    {
        RequestOptions requestOptions = null;
        if (partitionKey != null)
        {
            requestOptions = new RequestOptions { PartitionKey = new PartitionKey(partitionKey) };
        }
        var result = await client.ReadDocumentAsync(
            UriFactory.CreateDocumentUri(DatabaseId, CollectionId, id),
            requestOptions);
        return (TEntity)(dynamic)result.Resource;
    }
    catch (DocumentClientException e)
    {
        // Have logic for different exceptions actually
        throw;
    }
}
I have two collections - Collection1 and Collection2. Collection1 is non-partitioned whereas Collection2 is partitioned.
On the client side, I create two repository objects, one for each collection.
private static DocumentDBRepository<Collection1Item> collection1Repository = new DocumentDBRepository<Collection1Item>("Collection1");
private static DocumentDBRepository<Collection2Item> collection2Repository = new DocumentDBRepository<Collection2Item>("Collection2");
List<Collection1Item> collection1Items = await collection1Repository.GetItemsFromCollection1(); // Selects first forty documents based on time
List<UIItem> uiItems = new List<UIItem>();
foreach (var item in collection1Items)
{
    var collection2Item = await collection2Repository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
    uiItems.Add(new UIItem
    {
        ItemId = item.ItemId,
        Collection1Reference = item.Id,
        TargetId = item.TargetId,
        Collection2Reference = item.Collection2Reference,
        Value = collection2Item.Value
    });
}
This works fine. But since it is happening sequentially with foreach, I wanted to do those Get calls in parallel. When I do it in parallel as below:
ConcurrentBag<UIItem> uiItems = new ConcurrentBag<UIItem>();
collection1Items.AsParallel().ForAll(async item =>
{
    var collection2Item = await collection2Repository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
    uiItems.Add(new UIItem
    {
        ItemId = item.ItemId,
        Collection1Reference = item.Id,
        TargetId = item.TargetId,
        Collection2Reference = item.Collection2Reference,
        Value = collection2Item.Value
    });
});
It doesn't work and uiItems is always empty.
You don't need AsParallel or Parallel.For to run async operations concurrently; if they are truly asynchronous, they already run concurrently. The problem with ForAll is that it takes an Action<T>, so your async lambda compiles to an async void delegate, and the call returns before any of the awaits have completed, which is why uiItems is still empty when you look at it.
You could collect the task returned from each operation and simply await Task.WhenAll() on all the tasks. If you modify your lambda to create and return a UIItem, the result of await Task.WhenAll() will be a collection of UIItems. There is no need to modify global state from inside the concurrent operations.
For example:
var itemTasks = collection1Items.Select(async item =>
{
    var collection2Item = await collection2Repository.Get(item.Collection2Reference, item.TargetId);
    return new UIItem
    {
        ItemId = item.ItemId,
        Collection1Reference = item.Id,
        TargetId = item.TargetId,
        Collection2Reference = item.Collection2Reference,
        Value = collection2Item.Value
    };
});
var results = await Task.WhenAll(itemTasks);
A word of caution though - this will fire all Get operations concurrently. That may not be what you want, especially when calling a service with rate limiting.
Try simply starting the tasks and awaiting all of them at the end. That results in the operations running concurrently.
var tasks = collection1Items.Select(async item =>
{
    //var collection2Item = await collection2Repository.Get...
    return new UIItem
    {
        //...
    };
});
var uiItems = await Task.WhenAll(tasks);
PLINQ is useful for CPU-bound work on in-memory constructs, using as many threads as possible, whereas async-await is for releasing threads while accessing external resources; if you mix the two you can end up with strange results.
I would like to share a solution for an issue I saw in some comments: if you're worried about hitting a rate limit and want to throttle the concurrency yourself, you can do something like this using SemaphoreSlim.
var nbCores = Environment.ProcessorCount;
var semaphore = new SemaphoreSlim(nbCores, nbCores);
var processTasks = items.Select(async x =>
{
    // Wait for a free slot before starting the work
    await semaphore.WaitAsync();
    try
    {
        await ProcessAsync();
    }
    finally
    {
        semaphore.Release();
    }
});
await Task.WhenAll(processTasks);
In this example, I call "ProcessAsync" concurrently, but limited to {processor count} concurrent operations.
Hope that helps someone.
NB: You could of course set the nbCores variable to whatever value satisfies your own constraints.
NB 2: This example fits some use cases, not all of them. For a big load of tasks I would highly suggest looking into TPL Dataflow.

BroadcastBlock missing items

I have a list of project numbers that I need to process. A project can have about 8000 items, and I need to get the data for each item in the project and then push this data to a list of servers. Can anybody please tell me the following:
1) I have 1000 items in iR but only 998 were written to the servers. Did I lose items by using BroadcastBlock?
2) Am I doing the await on all the ActionBlocks correctly?
3) How do I make the database call async?
Here is the database code:
public MemcachedDTO GetIR(MemcachedDTO dtoItem)
{
    string[] Tables = new string[] { "iowa", "la" };
    using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["test"].ConnectionString))
    {
        using (SqlCommand command = new SqlCommand("test", connection))
        {
            DataSet Result = new DataSet();
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.Add("@ProjectId", SqlDbType.VarChar);
            command.Parameters["@ProjectId"].Value = dtoItem.ProjectId;
            connection.Open();
            Result.EnforceConstraints = false;
            Result.Load(command.ExecuteReader(CommandBehavior.CloseConnection), LoadOption.OverwriteChanges, Tables);
            dtoItem.test = Result;
        }
    }
    return dtoItem;
}
Update:
I have updated the code to the below. It just hangs when I run it, and only writes a quarter of the data to the servers. Can you please let me know what I am doing wrong?
public static ITargetBlock<T> CreateGuaranteedBroadcastBlock<T>(IEnumerable<ITargetBlock<T>> targets, DataflowBlockOptions options)
{
    var targetsList = targets.ToList();
    var block = new ActionBlock<T>(
        async item =>
        {
            foreach (var target in targetsList)
            {
                await target.SendAsync(item);
            }
        }, new ExecutionDataflowBlockOptions
        {
            CancellationToken = options.CancellationToken
        });
    block.Completion.ContinueWith(task =>
    {
        // Propagate completion (or the fault) to every connected target
        foreach (var target in targetsList)
        {
            if (task.Exception != null)
                target.Fault(task.Exception);
            else
                target.Complete();
        }
    });
    return block;
}
[HttpGet]
public async Task<HttpResponseMessage> ReloadItem(string projectQuery)
{
    try
    {
        var linkCompletion = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 2
        };
        var cts = new CancellationTokenSource();
        var dbOptions = new DataflowBlockOptions { CancellationToken = cts.Token };
        IList<string> projectIds = projectQuery.Split(',').ToList();
        IEnumerable<string> serverList = ConfigurationManager.AppSettings["ServerList"].Split(',').Cast<string>();
        var iR = new TransformBlock<MemcachedDTO, MemcachedDTO>(
            dto => dto.GetIR(dto), new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });
        List<ActionBlock<MemcachedDTO>> actionList = new List<ActionBlock<MemcachedDTO>>();
        List<MemcachedDTO> dtoList = new List<MemcachedDTO>();
        foreach (string pid in projectIds)
        {
            IList<MemcachedDTO> dtoTemp = new List<MemcachedDTO>();
            dtoTemp = MemcachedDTO.GetItemIdsByProject(pid);
            dtoList.AddRange(dtoTemp);
        }
        foreach (string s in serverList)
        {
            var action = new ActionBlock<MemcachedDTO>(
                async dto => await PostEachServerAsync(dto, s, "setitemcache"));
            actionList.Add(action);
        }
        var bBlock = CreateGuaranteedBroadcastBlock(actionList, dbOptions);
        foreach (MemcachedDTO d in dtoList)
        {
            await iR.SendAsync(d);
        }
        iR.Complete();
        iR.LinkTo(bBlock);
        await Task.WhenAll(actionList.Select(action => action.Completion).ToList());
        return Request.CreateResponse(HttpStatusCode.OK, new { message = projectIds.ToString() + " reload success" });
    }
    catch (Exception ex)
    {
        return Request.CreateResponse(HttpStatusCode.InternalServerError, new { message = ex.Message.ToString() });
    }
}
1) I have 1000 items in iR but only 998 were written to the servers. Did I lose items by using BroadcastBlock?
Yes: in the code below you set BoundedCapacity to one, and if at any time your BroadcastBlock cannot pass along an item, it will drop it. Additionally, a BroadcastBlock will only propagate completion to one target block, so do not use PropagateCompletion = true here. If you want all blocks to complete, you need to handle completion manually, by setting a ContinueWith on the BroadcastBlock's Completion that passes completion to all of the connected targets, as in the sketch after the quoted code.
var action = new ActionBlock<MemcachedDTO>(dto => PostEachServerAsync(dto, s, "set"),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3, BoundedCapacity = 1 });
broadcast.LinkTo(action, linkCompletion);
actionList.Add(action);
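Here is a sketch of that manual completion handling, mirroring the ContinueWith inside CreateGuaranteedBroadcastBlock above (broadcast and actionList are the names from the question's code):
// Pass completion, or the fault, on to every linked ActionBlock once
// the BroadcastBlock itself has completed.
broadcast.Completion.ContinueWith(task =>
{
    foreach (IDataflowBlock action in actionList)
    {
        if (task.IsFaulted)
            action.Fault(task.Exception);
        else
            action.Complete();
    }
});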
Option: instead of the BroadcastBlock, use a properly bounded BufferBlock. When your downstream blocks are bounded to one item, they cannot receive additional items until they finish processing what they have. That allows the BufferBlock to offer its items to another, possibly idle, ActionBlock.
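A sketch of that option, reusing the MemcachedDTO, serverList, and PostEachServerAsync names from the question (the capacities are illustrative):
// A bounded BufferBlock hands each item to whichever server block is free,
// rather than copying every item to every server.
var buffer = new BufferBlock<MemcachedDTO>(new DataflowBlockOptions { BoundedCapacity = 10 });
foreach (string s in serverList)
{
    var worker = new ActionBlock<MemcachedDTO>(
        dto => PostEachServerAsync(dto, s, "setitemcache"),
        new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
    buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });
}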
When you add items into a throttled flow, i.e. a flow with a BoundedCapacity less than Unbounded, you need to use the SendAsync method, or at least handle the return value of Post. I'd recommend simply using SendAsync:
foreach (MemcachedDTO d in dtoList)
{
await iR.SendAsync(d);
}
That will force your method signature to become:
public async Task<HttpResponseMessage> ReloadItem(string projectQuery)
2) Am I doing the await on all the ActionBlocks correctly?
The previous change will permit you to lose the blocking Wait call in favor of an await Task.WhenAll, going from:
iR.Complete();
actionList.ForEach(x => x.Completion.Wait());
To:
iR.Complete();
await bufferBlock.Completion.ContinueWith(tsk => actionList.ForEach(x => x.Complete()));
await Task.WhenAll(actionList.Select(action => action.Completion).ToList());
3) How do I make the database call async?
I'm going to leave this open because it should be a separate question unrelated to TPL Dataflow, but in short: use an async API to access your DB, and async will naturally grow through your code base. This should get you started.
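For illustration, a sketch of GetIR rewritten against the async ADO.NET surface (OpenAsync and ExecuteReaderAsync; otherwise the same logic as the question's code):
public async Task<MemcachedDTO> GetIRAsync(MemcachedDTO dtoItem)
{
    string[] tables = new string[] { "iowa", "la" };
    using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["test"].ConnectionString))
    using (SqlCommand command = new SqlCommand("test", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.Add("@ProjectId", SqlDbType.VarChar).Value = dtoItem.ProjectId;
        await connection.OpenAsync();
        DataSet result = new DataSet { EnforceConstraints = false };
        using (var reader = await command.ExecuteReaderAsync(CommandBehavior.CloseConnection))
        {
            // DataSet.Load accepts any IDataReader, including SqlDataReader
            result.Load(reader, LoadOption.OverwriteChanges, tables);
        }
        dtoItem.test = result;
        return dtoItem;
    }
}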
BufferBlock vs BroadcastBlock
After re-reading your previous question and the answer from @VMAtm, it seems you want each item sent to all five servers; in that case you will need a BroadcastBlock. You would use a BufferBlock to distribute messages relatively evenly across a flexible pool of servers, where any one of them could handle a given message. Nonetheless, you will still need to take control of propagating completion and faults to all the connected ActionBlocks by awaiting the completion of the BroadcastBlock.
To Prevent BroadcastBlock Dropped Messages
In general you have two options: set your ActionBlocks to be unbounded, which is their default value:
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3, BoundedCapacity = DataflowBlockOptions.Unbounded });
Or broadcast messages yourself with a construct of your own making, like the CreateGuaranteedBroadcastBlock above. Here is an example implementation from @i3arnon, and another from @svick.

How to route, group, or otherwise split up messages into consistent sets using TPL Dataflow

I'm new to TPL Dataflow and I'm looking for a construct that will allow splitting up a list of source messages for evenly distributed parallel processing, while maintaining the order of the messages through the individual pipelines. Is there a specific block or concept within the Dataflow API that can be used to accomplish this, or is it more a matter of providing glue code or custom blocks between existing blocks?
For those familiar with Akka.NET, I'm looking for functionality similar to the ConsistentHashing router, which allows sending messages to a single router that then forwards them on to individual routees to be handled.
Synchronous example:
var count = 100000;
var processingGroups = 5;
var source = Enumerable.Range(1, count);
// Distribute source elements consistently and evenly into a specified number of groups (e.g. 5)
var distributed = source.GroupBy(s => s % processingGroups);
// Within each of the 5 processing groups, go through each item and add 3 to it
var transformed = distributed.Select(d => d.Select(i => i + 3).ToArray());
List<int[]> result = transformed.ToList();
Check.That(result.Count).IsEqualTo(processingGroups);
for (int i = 0; i < result.Count; i++)
{
    var outputGroup = result[i];
    var expectedRange = Enumerable.Range(i + 1, count / processingGroups).Select((e, index) => e + (index * (processingGroups - 1)) + 3);
    Check.That(outputGroup).ContainsExactly(expectedRange);
}
In general I don't think what you're looking for comes pre-made in Dataflow the way it does with a ConsistentHashing router. However, by adding an id to the pieces of data you wish to flow, you can process them in any order, in parallel, and reorder them when the processing finishes.
public class Message
{
    public int MessageId { get; set; }
    public int GroupId { get; set; }
    public int Value { get; set; }
}
public class MessageProcessing
{
    public void abc()
    {
        var count = 10000;
        var groups = 5;
        var source = Enumerable.Range(0, count);
        // buffer all input
        var buffer = new BufferBlock<IEnumerable<int>>();
        // split each input enumerable into processing groups
        var messageProducer = new TransformManyBlock<IEnumerable<int>, Message>(ints =>
            ints.Select((i, index) => new Message() { MessageId = index, GroupId = index % groups, Value = i }).ToList());
        // process each message; one action block may process any group id in any order
        var processMessage = new TransformBlock<Message, Message>(msg =>
        {
            msg.Value++;
            return msg;
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = groups
        });
        // output of processed message values
        int[] output = new int[count];
        // insert messages into the array in the order they started in
        var regroup = new ActionBlock<Message>(msg => output[msg.MessageId] = msg.Value,
            new ExecutionDataflowBlockOptions()
            {
                MaxDegreeOfParallelism = 1
            });
    }
}
In the example, the GroupId of a message isn't used, but it could be used in a more complete example for coordinating groups of messages. Also, handling follow-up posts to the BufferBlock could be done by changing the output array to a List and setting up a corresponding list element each time an enumerable of integers is posted to the buffer block. Depending on your exact use, you may need to support multiple consumers of the output, and this can be folded back into the flow.
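Note that the snippet above only declares the blocks; they still need to be linked and fed. A minimal wiring sketch (my own addition, assuming completion should propagate down the chain):
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
buffer.LinkTo(messageProducer, linkOptions);
messageProducer.LinkTo(processMessage, linkOptions);
processMessage.LinkTo(regroup, linkOptions);
buffer.Post(source);       // post the whole enumerable as one message
buffer.Complete();
regroup.Completion.Wait(); // afterwards, output[] holds the values in input order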
You can dynamically create a pipeline by linking the blocks to each other based on a predicate:
var count = 100;
var processingGroups = 5;
var source = Enumerable.Range(1, count);
var buffer = new BufferBlock<int>();
var consumer1 = new ActionBlock<int>(i => { });
var consumer2 = new ActionBlock<int>(i => { });
var consumer3 = new ActionBlock<int>(i => { });
var consumer4 = new ActionBlock<int>(i => { Console.WriteLine(i); });
var consumer5 = new ActionBlock<int>(i => { });
buffer.LinkTo(consumer1, i => i % 5 == 1);
buffer.LinkTo(consumer2, i => i % 5 == 2);
buffer.LinkTo(consumer3, i => i % 5 == 3);
buffer.LinkTo(consumer4, i => i % 5 == 4);
buffer.LinkTo(consumer5);
foreach (var i in source)
{
    buffer.Post(i);
    // consider the async option if you're able to use it:
    // await buffer.SendAsync(i);
}
buffer.Complete();
Console.ReadLine();
The code above will write only the numbers from the 4th group, processing the other groups silently, but I hope you get the idea. It is general practice to link at least one consumer without a filter, so messages are not left stranded when no consumer accepts them; if you don't have a default handler, you can use NullTarget, which simply ignores every message it receives:
buffer.LinkTo(DataflowBlock.NullTarget<int>());
The downside of this approach is a continuation of its advantages: you have to provide the predicates yourself, as there are no built-in structures for this. However, it can still be done.
