Using ILogger in parallel - async-await

In the following code, the program terminates before the log has been fully flushed:
class Program
{
static void Main(string[] args)
{
var count = -1;
var tasks = new ConcurrentBag<Task>();
var services = new ServiceCollection();
services.AddLogging(configure => configure.AddConsole());
var serviceProvider = services.BuildServiceProvider();
var logger = serviceProvider.GetService<ILogger<Program>>();
if (logger is null)
throw new NullReferenceException();
Parallel.ForEach(Enumerable.Range(0, 1000), i =>
{
tasks.Add(Task.Run(() => logger.LogInformation(Interlocked.Increment(ref count).ToString())));
//tasks.Add(Task.Run(() => Console.WriteLine(Interlocked.Increment(ref count).ToString())));
});
Task.WhenAll(tasks).Wait();
}
}
Even though 1,000 log statements should be printed, only the following is flushed:
info: Net5.Program[0]
6
info: Net5.Program[0]
2
info: Net5.Program[0]
5
info: Net5.Program[0]
1
info: Net5.Program[0]
7
info: Net5.Program[0]
4
info: Net5.Program[0]
3
info: Net5.Program[0]
0
info: Net5.Program[0]
8
info: Net5.Program[0]
10
C:\Users\MyName\source\repos\ConsoleApp\Net5\bin\Debug\net5.0\Net5.exe (process 28048) exited with code 0.
Press any key to close this window . . .
When using Console.WriteLine (commented out in the code above), however, all lines are printed, which leads me to suspect that the console logger is asynchronous behind the scenes.
How can I ensure that all log statements are awaited without resorting to a manual delay?
Edit:
I settled on the following code, which did the trick:
class Program
{
static void Main(string[] args)
{
var count = -1;
var services = new ServiceCollection();
services.AddLogging(configure => configure.AddConsole());
using var serviceProvider = services.BuildServiceProvider();
var logger = serviceProvider.GetRequiredService<ILogger<Program>>();
var tasks = Enumerable.Range(0, 50000)
.AsParallel()
.Select(_ => Task.Run(() => logger.LogInformation(Interlocked.Increment(ref count).ToString())));
Task.WhenAll(tasks).Wait();
}
}
Key differences:
The service provider is disposed, which fixed the log flushing issue.
Get the ILogger instance with GetRequiredService to avoid having to null-check.
Use Enumerable.Range(...).AsParallel().Select(...) instead of Parallel.ForEach to stick with Tasks all the way.

If you dispose of the service provider, it will dispose of all singleton services and that will flush the logger.
There's also no need to use Parallel.ForEach. In fact, using Parallel.ForEach with async/await is a code smell. It's almost always wrong.
And, since you tagged the question .NET 5.0, you can change your code to:
class Program
{
static async Task Main()
{
var count = -1;
var tasks = new ConcurrentBag<Task>();
var services = new ServiceCollection();
services.AddLogging(configure => configure.AddConsole());
using var serviceProvider = services.BuildServiceProvider();
var logger = serviceProvider.GetRequiredService<ILogger<Program>>();
for (var i = 0; i < 1000; i++)
{
tasks.Add(Task.Run(() => logger.LogInformation(Interlocked.Increment(ref count).ToString())));
}
await Task.WhenAll(tasks);
}
}

Related

Why does my for loop increment past where it should stop?

I am attempting to increase the speed at which files in my application download by downloading them in parallel. Previously I was downloading them sequentially and it worked fine but when I attempted to download them in parallel I ran into unexplained issues.
Here is my method in which I downloaded the files in sequence:
public IActionResult DownloadPartFiles([FromBody] FileRequestParameters parameters)
{
List<InMemoryFile> files = new List<InMemoryFile>();
for (int i = 0; i < parameters.FileNames.Length; i++)
{
InMemoryFile inMemoryFile = GetInMemoryFile(parameters.FileLocations[i], parameters.FileNames[i]).Result;
files.Add(inMemoryFile);
}
byte[] archiveFile = null;
using (MemoryStream archiveStream = new MemoryStream())
{
using (ZipArchive archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, true))
{
foreach (InMemoryFile file in files)
{
ZipArchiveEntry zipArchiveEntry = archive.CreateEntry(file.FileName, CompressionLevel.Optimal);
using (MemoryStream originalFileStream = new MemoryStream(file.Content))
using (Stream zipStream = zipArchiveEntry.Open())
{
originalFileStream.CopyTo(zipStream);
}
}
}
archiveFile = archiveStream.ToArray();
}
return File(archiveFile, "application/octet-stream");
}
Here is the method changed to download the files in parallel:
public async Task<IActionResult> DownloadPartFiles([FromBody] FileRequestParameters parameters)
{
List<Task<InMemoryFile>> fileTasks = new List<Task<InMemoryFile>>();
for (int i = 0; i < parameters.FileNames.Length; i++)
{
if(i == parameters.FileNames.Length - 1)
{
int breakpoint = 0;
}
if(i == parameters.FileNames.Length)
{
int breakpoint = 0;
}
fileTasks.Add(Task.Run(() => GetInMemoryFile(parameters.FileLocations[i], parameters.FileNames[i])));
}
InMemoryFile[] fileResults = await Task.WhenAll(fileTasks);
byte[] archiveFile = null;
using (MemoryStream archiveStream = new MemoryStream())
{
using (ZipArchive archive = new ZipArchive(archiveStream, ZipArchiveMode.Create, true))
{
foreach (InMemoryFile file in fileResults)
{
ZipArchiveEntry zipArchiveEntry = archive.CreateEntry(file.FileName, CompressionLevel.Optimal);
using (MemoryStream originalFileStream = new MemoryStream(file.Content))
using (Stream zipStream = zipArchiveEntry.Open())
{
originalFileStream.CopyTo(zipStream);
}
}
}
archiveFile = archiveStream.ToArray();
}
return File(archiveFile, "application/octet-stream");
}
Here is the method that does the actual downloading:
private async Task<InMemoryFile> GetInMemoryFile(string fileLocation, string fileName)
{
InMemoryFile file;
using (HttpClient client = new HttpClient())
using (HttpResponseMessage response = await client.GetAsync(fileLocation))
{
byte[] fileContent = await response.Content.ReadAsByteArrayAsync();
file = new InMemoryFile(fileName, fileContent);
}
return file;
}
Now the issue I run into is that, after I changed DownloadPartFiles to get all the files in parallel, my for loop seems to run past its stop condition. For example, if parameters.FileNames.Length returns 12, the loop should not run when i = 12; it should exit. However, in my testing it continues to run when i = 12 and, as one might expect, I get an out-of-bounds error. I set breakpoints in my code to make sure it was actually running past the stop condition, and more weird behavior arose. In my for loop I included two if statements with breakpoint variables to break on. It always breaks when i should be on its last iteration, but never breaks when i is one past the expected last iteration; it seems to skip that breakpoint. It also runs fine if I step through the code while debugging, but throws the out-of-bounds error when I let it run normally.
I'm not sure why this is happening, but I am still new to asynchronous programming, so maybe it's just an oversight somewhere. Let me know if I need to explain anything further.
I made a critical mistake in that I tried to wrap an asynchronous method (my GetInMemoryFile method) in Task.Run(), which is meant for wrapping synchronous methods to make them run asynchronously. This caused the weird behavior.
So in short I changed
fileTasks.Add(Task.Run(() => GetInMemoryFile(parameters.FileLocations[i], parameters.FileNames[i])));
To
fileTasks.Add(GetInMemoryFile(parameters.FileLocations[i], parameters.FileNames[i]));
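For anyone hitting the same symptom, the deferred lambda is also why the index ran out of bounds: a C# for loop has a single loop variable, and the lambda passed to Task.Run captures that variable rather than its value at the current iteration. A minimal, self-contained sketch of the pitfall (illustrative code, not the original download logic):
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class CaptureDemo
{
    static async Task Main()
    {
        var tasks = new List<Task>();
        for (int i = 0; i < 3; i++)
        {
            // The lambda captures the variable i itself, not its current value,
            // so by the time Task.Run executes the delegate, i may already be 3.
            tasks.Add(Task.Run(() => Console.WriteLine(i)));
        }
        await Task.WhenAll(tasks); // typically prints "3" three times instead of 0, 1, 2
    }
}
Calling GetInMemoryFile(...) directly, as in the fixed line above, evaluates the indexer immediately during that iteration, so the out-of-range access disappears.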

Google API Client for .NET: How to implement Exponential Backoff

I've created a method to add members in a Batch Request to a google group using .NET core and google's .NET client library. The code looks like this:
private void InitializeGSuiteDirectoryService()
{
_directoryServiceCredential = GoogleCredential
.FromJson(GlobalSettings.Instance.GSuiteSettings.Credentials)
.CreateScoped(_scopes)
.CreateWithUser(GlobalSettings.Instance.GSuiteSettings.User);
_directoryService = new DirectoryService(new BaseClientService.Initializer()
{
HttpClientInitializer = _directoryServiceCredential,
ApplicationName = _applicationName
});
}
public OperationResult<int> AddGroupMembers(Group group, IEnumerable<Member> members)
{
var result = new OperationResult<int>();
var memberList = members.ToList();
var batchRequestCount = 0;
if (memberList.Any())
{
var request = new BatchRequest(_directoryService);
foreach (var member in memberList)
{
batchRequestCount++;
request.Queue<Members>(_directoryService.Members.Insert(member, group.Id), (content, error, i, message) =>
{
if (message.IsSuccessStatusCode)
{
//log OK
}
else
{
// Implement Exponential backoff only on the request that failed.
}
});
if (batchRequestCount == 30 || member.Equals(memberList.Last()))
{
request.ExecuteAsync().Wait();
request = new BatchRequest(_directoryService); //Clear queue
}
}
}
return result;
}
The logic works fine if the number of members is small; however, when the member count is, say, 100 (the maximum number of users in my Google test instance), I get an error from Google that reads "quotaExceeded". According to Google's documentation, the limit for a batch request on their Admin SDK is 1000, and I've set my logic to execute when we reach a limit of 30.
The QUESTION is: how do I implement error handling to retry whenever I get this error? Their documentation suggests implementing 'exponential backoff' when a response contains a 'retry-able error' (I don't see this when I inspect my response).
So here's what I ended up doing to implement exponential backoff on my call to add members to a G Suite group. Since I'm using .NET Core, I was able to use Polly, a resilience and transient-fault-handling library that offers this functionality out of the box. There may be some need for refactoring, but here's what the code looks like for now:
public OperationResult<int> AddGroupMembers(Group group, IEnumerable<Member> members)
{
var result = new OperationResult<int>();
var memberList = members.ToList();
var batchRequestCount = 0;
if (memberList.Any())
{
var request = new BatchRequest(_directoryService);
foreach (var member in memberList)
{
retryRequest = false; // This variable needs to be declared at the class level to guarantee the value is available to the original thread running the process.
batchRequestCount++;
request.Queue<Members>(_directoryService.Members.Insert(member, group.Id), (content, error, i, message) =>
{
// If error code is 'quotaExceeded' retry the request ( You can add as many error codes as you'd like to retry here)
if (error.Code == 403)
{
retryRequest = true;
}
});
// Execute batch request to add members in batches of 30 member max
if (batchRequestCount == 30 || member.Equals(memberList.Last()))
{
// Below is what the code to retry using polly looks like
var response = Policy
.HandleResult<HttpResponseMessage>(message => message.StatusCode == HttpStatusCode.Conflict)
.WaitAndRetry(new[]
{
TimeSpan.FromSeconds(1),
TimeSpan.FromSeconds(2),
TimeSpan.FromSeconds(4)
}, (results, timeSpan, retryCount, context) =>
{
// Log Warn saying a retry was required.
})
.Execute(() =>
{
var httpResponseMsg = new HttpResponseMessage();
// Execute batch request Synchronously
request.ExecuteAsync().Wait();
if (retryRequest)
{
httpResponseMsg.StatusCode = HttpStatusCode.Conflict;
retryRequest = false;
}
else
{
httpResponseMsg.StatusCode = HttpStatusCode.OK;
}
return httpResponseMsg;
});
if (response.IsSuccessStatusCode)
{
// Log info
}
else
{
// Log warn
}
batchRequestCount = 0;
request = new BatchRequest(_directoryService);
batchCompletedCount++;
}
}
}
return result;
}
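As an aside, if you'd rather compute the exponential delays than list them out, Polly's WaitAndRetry also accepts a sleep-duration provider. A hedged sketch using the same Conflict-based signalling as above (names are illustrative):
var retryPolicy = Policy
    .HandleResult<HttpResponseMessage>(message => message.StatusCode == HttpStatusCode.Conflict)
    .WaitAndRetry(4,
        retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt - 1)), // 1s, 2s, 4s, 8s
        (result, timeSpan, retryCount, context) =>
        {
            // Log a warning that attempt number 'retryCount' will run after 'timeSpan'.
        });
The policy's Execute call is then used exactly as in the code above.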

BroadcastBlock missing items

I have a list of project numbers that I need to process. A project could have about 8000 items, and I need to get the data for each item in the project and then push that data to a list of servers. Can anybody please tell me the following:
1) I have 1000 items in iR but only 998 were written to the servers. Did I lose items by using BroadcastBlock?
2) Am I doing the await on all ActionBlocks correctly?
3) How do I make the database call async?
Here is the database code
public MemcachedDTO GetIR(MemcachedDTO dtoItem)
{
string[] Tables = new string[] { "iowa", "la" };
using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["test"].ConnectionString))
{
using (SqlCommand command = new SqlCommand("test", connection))
{
DataSet Result = new DataSet();
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add("#ProjectId", SqlDbType.VarChar);
command.Parameters["#ProjectId"].Value = dtoItem.ProjectId;
connection.Open();
Result.EnforceConstraints = false;
Result.Load(command.ExecuteReader(CommandBehavior.CloseConnection), LoadOption.OverwriteChanges, Tables);
dtoItem.test = Result;
}
}
return dtoItem;
}
Update:
I have updated the code to the below. It just hangs when I run it and only writes about a quarter of the data to the servers. Can you please let me know what I am doing wrong?
public static ITargetBlock<T> CreateGuaranteedBroadcastBlock<T>(IEnumerable<ITargetBlock<T>> targets, DataflowBlockOptions options)
{
var targetsList = targets.ToList();
var block = new ActionBlock<T>(
async item =>
{
foreach (var target in targetsList)
{
await target.SendAsync(item);
}
}, new ExecutionDataflowBlockOptions
{
CancellationToken = options.CancellationToken
});
block.Completion.ContinueWith(task =>
{
foreach (var target in targetsList)
{
if (task.Exception != null)
target.Fault(task.Exception);
else
target.Complete();
}
});
return block;
}
[HttpGet]
public async Task< HttpResponseMessage> ReloadItem(string projectQuery)
{
try
{
var linkCompletion = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2
};
var cts = new CancellationTokenSource();
var dbOptions = new DataflowBlockOptions { CancellationToken = cts.Token };
IList<string> projectIds = projectQuery.Split(',').ToList();
IEnumerable<string> serverList = ConfigurationManager.AppSettings["ServerList"].Split(',').Cast<string>();
var iR = new TransformBlock<MemcachedDTO, MemcachedDTO>(
dto => dto.GetIR(dto), new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });
List<ActionBlock<MemcachedDTO>> actionList = new List<ActionBlock<MemcachedDTO>>();
List<MemcachedDTO> dtoList = new List<MemcachedDTO>();
foreach (string pid in projectIds)
{
IList<MemcachedDTO> dtoTemp = new List<MemcachedDTO>();
dtoTemp = MemcachedDTO.GetItemIdsByProject(pid);
dtoList.AddRange(dtoTemp);
}
foreach (string s in serverList)
{
var action = new ActionBlock<MemcachedDTO>(
async dto => await PostEachServerAsync(dto, s, "setitemcache"));
actionList.Add(action);
}
var bBlock = CreateGuaranteedBroadcastBlock(actionList, dbOptions);
foreach (MemcachedDTO d in dtoList)
{
await iR.SendAsync(d);
}
iR.Complete();
iR.LinkTo(bBlock);
await Task.WhenAll(actionList.Select(action => action.Completion).ToList());
return Request.CreateResponse(HttpStatusCode.OK, new { message = projectIds.ToString() + " reload success" });
}
catch (Exception ex)
{
return Request.CreateResponse(HttpStatusCode.InternalServerError, new { message = ex.Message.ToString() });
}
}
1) I have 1000 items in iR but only 998 were written to the servers. Did I lose items by using BroadcastBlock?
Yes. In the code below you set BoundedCapacity to one; if at any time your BroadcastBlock cannot pass along an item, it will drop it. Additionally, a BroadcastBlock will only propagate completion to one TargetBlock, so do not use PropagateCompletion = true here. If you want all blocks to complete, you need to handle completion manually. This can be done with a ContinueWith on the BroadcastBlock's Completion that passes completion to all of the connected targets.
var action = new ActionBlock<MemcachedDTO>(dto => PostEachServerAsync(dto, s, "set"), new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3, BoundedCapacity = 1 });
broadcast.LinkTo(action, linkCompletion);
actionList.Add(action);
Option: instead of the BroadcastBlock, use a properly bounded BufferBlock. When your downstream blocks are bounded to one item, they cannot receive additional items until they finish processing what they have. That allows the BufferBlock to offer its items to another, possibly idle, ActionBlock.
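A rough sketch of that load-balancing alternative (illustrative names, not the original code); each item goes to whichever server block is currently free, rather than to all of them:
var buffer = new BufferBlock<MemcachedDTO>(new DataflowBlockOptions { BoundedCapacity = 100 });
var workers = new List<ActionBlock<MemcachedDTO>>();
foreach (string s in serverList)
{
    var worker = new ActionBlock<MemcachedDTO>(
        dto => PostEachServerAsync(dto, s, "setitemcache"),
        new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
    buffer.LinkTo(worker); // a full worker declines the offer, so the buffer tries an idle one
    workers.Add(worker);
}
// Completion still has to be handed on manually, as discussed elsewhere in this answer.
buffer.Completion.ContinueWith(t => workers.ForEach(w => w.Complete()));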
When you add items into a throttled flow, i.e. a flow with a BoundedCapacity less than Unbounded, you need to use the SendAsync method or at least handle the return value of Post. I'd recommend simply using SendAsync:
foreach (MemcachedDTO d in dtoList)
{
await iR.SendAsync(d);
}
That will force your method signature to become:
public async Task<HttpResponseMessage> ReloadItem(string projectQuery)
2) Am I doing the await on all ActionBlocks correctly?
The previous change will permit you to lose the blocking Wait calls in favor of an awaited Task.WhenAll, going from:
iR.Complete();
actionList.ForEach(x => x.Completion.Wait());
To:
iR.Complete();
await bufferBlock.Completion.ContinueWith(tsk => actionList.ForEach(x => x.Complete()));
await Task.WhenAll(actionList.Select(action => action.Completion).ToList());
3) How do I make the database call async?
I'm going to leave this open because it should be a separate question unrelated to TPL Dataflow, but in short: use an async API to access your DB, and async will naturally grow through your code base. This should get you started.
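For illustration only, here is a hedged sketch of what an async version of the GetIR call above could look like using the async ADO.NET APIs (OpenAsync/ExecuteReaderAsync). Note that DataSet.Load has no async overload, so the reader is still consumed synchronously once the data starts arriving:
public async Task<MemcachedDTO> GetIRAsync(MemcachedDTO dtoItem)
{
    string[] tables = { "iowa", "la" };
    using (var connection = new SqlConnection(ConfigurationManager.ConnectionStrings["test"].ConnectionString))
    using (var command = new SqlCommand("test", connection) { CommandType = CommandType.StoredProcedure })
    {
        command.Parameters.Add("@ProjectId", SqlDbType.VarChar).Value = dtoItem.ProjectId;
        await connection.OpenAsync();
        var result = new DataSet { EnforceConstraints = false };
        using (var reader = await command.ExecuteReaderAsync(CommandBehavior.CloseConnection))
        {
            // DataSet.Load is synchronous; only the connection open and the
            // initial query execution are awaited here.
            result.Load(reader, LoadOption.OverwriteChanges, tables);
        }
        dtoItem.test = result;
    }
    return dtoItem;
}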
BufferBlock vs BroadcastBlock
After re-reading your previous question and the answer from @VMAtm, it seems you want each item sent to all five servers; in that case you will need a BroadcastBlock. You would use a BufferBlock to distribute the messages relatively evenly across a flexible pool of servers, each of which could handle any message. Nonetheless, you will still need to take control of propagating completion and faults to all of the connected ActionBlocks by awaiting the completion of the BroadcastBlock.
To Prevent the BroadcastBlock from Dropping Messages
In general you have two options. The first is to leave your ActionBlocks unbounded, which is their default:
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3, BoundedCapacity = DataflowBlockOptions.Unbounded });
The second is to broadcast the messages yourself from a block of your own construction. There is an example implementation from @i3arnon, and another from @svick.

Generics around Entity Framework DbContext cause performance degradation?

I wrote a simple import/export application that transforms data from source to destination using Entity Framework and AutoMapper. It basically:
selects a batch (batchSize) of records from the source table
'maps' the data from the source entity to the destination entity
adds the new destination entities to the destination table and saves the context
I move around 500k records in under 5 minutes. After I refactored the code using generics, the performance dropped drastically to 250 records in 5 minutes.
Are my delegates that return DbSet<T> properties on the DbContext causing these problems? Or is something else going on?
Fast non-generic code:
public class Importer
{
public void ImportAddress()
{
const int batchSize = 50;
int done = 0;
var src = new SourceDbContext();
var count = src.Addresses.Count();
while (done < count)
{
using (var dest = new DestinationDbContext())
{
var list = src.Addresses.OrderBy(x => x.AddressId).Skip(done).Take(batchSize).ToList();
list.ForEach(x => dest.Address.Add(Mapper.Map<Addresses, Address>(x)));
done += batchSize;
dest.SaveChanges();
}
}
src.Dispose();
}
}
(Very) slow generic code:
public class Importer<TSourceContext, TDestinationContext>
where TSourceContext : DbContext
where TDestinationContext : DbContext
{
public void Import<TSourceEntity, TSourceOrder, TDestinationEntity>(Func<TSourceContext, DbSet<TSourceEntity>> getSourceSet, Func<TDestinationContext, DbSet<TDestinationEntity>> getDestinationSet, Func<TSourceEntity, TSourceOrder> getOrderBy)
where TSourceEntity : class
where TDestinationEntity : class
{
const int batchSize = 50;
int done = 0;
var ctx = Activator.CreateInstance<TSourceContext>();
//Does this getSourceSet delegate cause problems perhaps?
//Added this
var set = getSourceSet(ctx);
var count = set.Count();
while (done < count)
{
using (var dctx = Activator.CreateInstance<TDestinationContext>())
{
var list = set.OrderBy(getOrderBy).Skip(done).Take(batchSize).ToList();
//Or is the db-side paging mechanism broken by the getSourceSet delegate?
//Added this
var destSet = getDestinationSet(dctx);
list.ForEach(x => destSet.Add(Mapper.Map<TSourceEntity, TDestinationEntity>(x)));
done += batchSize;
dctx.SaveChanges();
}
}
ctx.Dispose();
}
}
The problem is the sheer number of invocations of the Func delegates that you're doing. Cache the resulting values in variables and it'll be fine.

collection processing in BackgroundWorker

I'm trying to make my ListBox, which is bound to an ObservableCollection, more efficient, so I implemented a BackgroundWorker to run the DB query. Within this BackgroundWorker I then want to add 3 entries to the UI every 70 ms or so, so that the UI does not get blocked when there is a larger number of entries (let's say 100).
Here is the code:
void updateTMWorker_DoWork(object sender, DoWorkEventArgs e)
{
var MessagesInDB = from MessageViewModel tm in MessagesDB.Messages
where tm.Type.Equals(_type)
orderby tm.Distance
select tm;
// Execute the query and place the results into a collection.
Dispatcher.BeginInvoke(() => { MessagesClass.Instance.Messages = new ObservableCollection<MessageViewModel>(); });
Collection<MessageViewModel> tempM = new Collection<MessageViewModel>();
int tempCounter = 0;
foreach (MessageViewModel mToAdd in MessagesInDB)
{
if (MessagesClass.Instance.Messages.IndexOf(mToAdd) == -1)
{
tempM.Add(mToAdd);
tempCounter = tempCounter + 1;
}
if (tempCounter % 3 == 0)
{
tempCounter = 0;
Debug.WriteLine("SIZE OF TEMP:" + tempM.Count());
Dispatcher.BeginInvoke(() =>
{
// add 3 messages at once
MessagesClass.Instance.Messages.Add(tempM[0]);
MessagesClass.Instance.Messages.Add(tempM[1]);
MessagesClass.Instance.Messages.Add(tempM[2]);
});
tempM = new Collection<MessageViewModel>();
Thread.Sleep(70);
}
}
// finish off the rest
Dispatcher.BeginInvoke(() =>
{
for (int i = 0; i < tempM.Count(); i++)
{
MessagesClass.Instance.Messages.Add(tempM[i]);
}
});
}
The output is:
SIZE OF TEMP:3
A first chance exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
in the line: MessagesClass.Instance.Messages.Add(tempM[0]); where the code tries to access the first element of tempM
Any hints whats wrong? Why can't I access the tempM elements, although the collection size is > 0?
You're forgetting about thread synchronization. Look at your code:
1: Debug.WriteLine("SIZE OF TEMP:" + tempM.Count());
Dispatcher.BeginInvoke(() =>
{
// add 3 messages at once
3: MessagesClass.Instance.Messages.Add(tempM[0]);
MessagesClass.Instance.Messages.Add(tempM[1]);
MessagesClass.Instance.Messages.Add(tempM[2]);
});
2: tempM = new Collection<MessageViewModel>();
tempM will already have been replaced with a new, empty collection by the time MessagesClass.Instance.Messages.Add(tempM[0]); runs on the UI thread, which is why you get the ArgumentOutOfRangeException. So, use some sort of synchronization object, for example:
EventWaitHandle Wait = new AutoResetEvent(false);
Debug.WriteLine("SIZE OF TEMP:" + tempM.Count());
Dispatcher.BeginInvoke(() =>
{
// add 3 messages at once
MessagesClass.Instance.Messages.Add(tempM[0]);
MessagesClass.Instance.Messages.Add(tempM[1]);
MessagesClass.Instance.Messages.Add(tempM[2]);
Wait.Set();
});
// wait while tempM is not in use anymore
Wait.WaitOne();
tempM = new Collection<MessageViewModel>();
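An alternative that avoids blocking the worker thread on a wait handle is to give the dispatcher its own reference to the current batch before starting a new one. A minimal sketch of that variant:
// Capture the current batch in a local variable so the UI callback keeps using
// the old collection even after the worker starts filling a fresh one.
var batch = tempM;
Dispatcher.BeginInvoke(() =>
{
    foreach (MessageViewModel m in batch)
    {
        MessagesClass.Instance.Messages.Add(m);
    }
});
tempM = new Collection<MessageViewModel>();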
