Running async function in parallel using LINQ's AsParallel() - async-await

I have a Document DB repository class that has one get method like below:
private static DocumentClient client;
public async Task<TEntity> Get(string id, string partitionKey = null)
{
try
{
RequestOptions requestOptions = null;
if (partitionKey != null)
{
requestOptions = new RequestOptions { PartitionKey = new PartitionKey(partitionKey) };
}
var result = await client.ReadDocumentAsync(
UriFactory.CreateDocumentUri(DatabaseId, CollectionId, id),
requestOptions);
return (TEntity)(dynamic)result.Resource;
}
catch (DocumentClientException e)
{
// Have logic for different exceptions actually
throw;
}
}
I have two collections - Collection1 and Collection2. Collection1 is non-partitioned whereas Collection2 is partitioned.
On the client side, I create two repository objects, one for each collection.
private static DocumentDBRepository<Collection1Item> collection1Repository = new DocumentDBRepository<Collection1Item>("Collection1");
private static DocumentDBRepository<Collection2Item> collection2Repository = new DocumentDBRepository<Collection2Item>("Collection2");
List<Collection1Item> collection1Items = await collection1Repository.GetItemsFromCollection1(); // Selects first forty documents based on time
List<UIItem> uiItems = new List<UIItem>();
foreach (var item in collection1Items)
{
var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
uiItems.Add(new UIItem
{
ItemId = item.ItemId,
Collection1Reference = item.Id,
TargetId = item.TargetId,
Collection2Reference = item.Collection2Reference,
Value = collection2Item.Value
});
}
This works fine. But since it is happening sequentially with foreach, I wanted to do those Get calls in parallel. When I do it in parallel as below:
ConcurrentBag<UIItem> uiItems = new ConcurrentBag<UIItem>();
collection1Items.AsParallel().ForAll(async item => {
var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
uiItems.Add(new UIItem
{
ItemId = item.ItemId,
Collection1Reference = item.Id,
TargetId = item.TargetId,
Collection2Reference = item.Collection2Reference,
Value = collection2Item.Value
});
}
);
It doesn't work and uiItems is always empty.

You don't need Parallel.For to run async operations concurrently. If they are truly asynchronous they already run concurrently.
You could collect the task returned from each operation and simply call await Task.WhenAll() on all the tasks. If you modify your lambda to create and return a UIItem, the result of await Task.WhenAll() will be a collection of UIItems. No need to modify global state from inside the concurrent operations.
For example:
var itemTasks = collection1Items.Select(async item =>
{
var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId);
return new UIItem
{
ItemId = item.ItemId,
Collection1Reference = item.Id,
TargetId = item.TargetId,
Collection2Reference = item.Collection2Reference,
Value = collection2Item.Value
}
});
var results= await Task.WhenAll(itemTasks);
A word of caution though - this will fire all Get operations concurrently. That may not be what you want, especially when calling a service with rate limiting.

Try simply starting tasks and waiting for all of them at the end. That would result in parallel execution.
var tasks = collection1Items.Select(async item =>
{
//var collection2Item = await storageRepository.Get...
return new UIItem
{
//...
};
});
var uiItems = await Task.WhenAll(tasks);
PLINQ is useful when working with in-memory constructs and using as many threads as possible, but if used with the async-await technique (which is for releasing threads while accessing external resources), you can end up with strange results.

I would like to share a solution for an issue i saw in some comments.
If you're scared about thread rate limit, and you want to limit this by yourself, you can do something like this, using SemaphoreSlim.
var nbCores = Environment.ProcessorCount;
var semaphore = new SemaphoreSlim(nbCores, nbCores);
var processTasks = items.Select(async x =>
{
await semaphore.WaitAsync();
try
{
await ProcessAsync();
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(processTasks);
In this example, i called concurrently my "ProcessAsync" but limited to {processor number} concurrent processes.
Hope that's help someone.
NB : You could set the "nbCores" variable as a proper value that satisfy your code condition, of course.
NB 2 : This example is for some use cases, not all of them. I would highly suggest with a big load of task to refer to TPL programming

Related

How to locking database record for IsolationLevel RepeatableRead in ABP UnitOfWork?

It's better if this issue is explained with an example. I have a database table VoucherNumber with an int column named [TotalNumberRecord]. It has only a record with the initial value of TotalNumberRecord == 0.
In my VoucherNumberAppService.cs, there are the following 2 methods
public async void TestIncrementA(string id)
{
using var unitOfWork = _unitOfWorkManager.Begin(isolationLevel: System.Data.IsolationLevel.RepeatableRead);
var query = await _voucherNumberService.GetQueryableAsync();
query = query.Where(p => p.Id == id);
var sections = await AsyncExecuter.FirstOrDefaultAsync(query);
sections.TotalNumberRecord++;
await Task.Delay(5000);
await _voucherNumberService.UpdateAsync(sections);
await unitOfWork.CompleteAsync();
}
public async void TestIncrementB(string id)
{
using var unitOfWork = _unitOfWorkManager.Begin(isolationLevel: System.Data.IsolationLevel.RepeatableRead);
var query = await _voucherNumberService.GetQueryableAsync();
query = query.Where(p => p.Id == id);
var sections = await AsyncExecuter.FirstOrDefaultAsync(query);
sections.TotalNumberRecord++;
await _voucherNumberService.UpdateAsync(sections);
await unitOfWork.CompleteAsync();
}
The 2 methods are essentially the same which increment the value of the column TotalNumberRecord by one except that the first method delays the thread.
Now I run 2 methods
TestIncrementA(id);
TestIncrementB(id);
I would expect the value of TotalNumberRecord in my database to be 2 now since it's been incremented twice. However it's only 1.
Is there something that I'm overlooking?

what is the faster and best way to make a few million SOAP requests and save the results into SqlDb

I have a million records in my table. I want to call a soap service and i need to do process in all the records in less than one hour. and besides i should update my table , insert the requests and responses in my other tables. but the code below works on less than 10 records every time i run my app.
I know My code is wrong,, I want to know what is the best way to do it.
static async Task Send( )
{
var results = new ConcurrentDictionary<string, int>();
using (AppDbContext entities = new AppDbContext())
{
var List = entities.Request.Where(x => x.State == RequestState.InitialState).ToList();
Parallel.ForEach(Enumerable.Range(0, List.Count), async index =>
{
var selected = List.FirstOrDefault();
List.Remove( selected );
var res1 = await DoAsyncJob1(selected); ///await
// var res = CallService(selected);
var res2 = await DoAsyncJob2(selected); ///await
var res3 = await DoAsyncJob3(selected); ///await
// var responses = await Task.WhenAll(DoAsyncJob1, DoAsyncJob2, DoAsyncJob3);
// results.TryAdd(index.ToString(), res);
});
}
}
static async Task<int> DoAsyncJob1(Request item)
{
using (AppDbContext entities = new AppDbContext())
{
var bReq = new BankRequest();
bReq.Amount = Convert.ToDecimal(item.Amount);
bReq.CreatedAt = DateTime.Now;
bReq.DIBAN = item.DIBAN;
bReq.SIBAN = item.SIBAN;
entities.BankRequest.Add(bReq);
entities.SaveChanges();
}
return item.Id;
}
static async Task<int> DoAsyncJob2(Request item)
{
using (AppDbContext entities = new AppDbContext())
{
}
return item.Id;
}
static async Task<int> DoAsyncJob3(Request item)
{
using (AppDbContext entities = new AppDbContext())
{
}
return item.Id;
}
Maybe the below lines are wrong :
var selected = List.FirstOrDefault();
List.Remove( selected );
Thanks in advance..
First, it is a bad practice to use async-await within Parallel.For - you introduce only more load to the task scheduler and more overhead.
Second, you are right:
var selected = List.FirstOrDefault();
List.Remove( selected );
is very, very wrong. Your code will behave in a totally unpredictable way, due to the race conditions.

Insert data from server to SQLite Database

I am inserting 3000 plus data from my server to my SQLite Database. The problem is the inserting process is very slow. Is there a better way to insert the data efficiently and effectively? What I am doing is I converted the data I got from my server to JSON Object and insert it one-by-one. I know what I am doing is inefficient. How can I fix this?
public class AndroidSQLiteDb : ISQLiteDB
{
public SQLiteAsyncConnection GetConnection()
{
var dbFileName = "backend.db3";
var documentsPath = System.Environment.GetFolderPath(System.Environment.SpecialFolder.Personal);
var path = Path.Combine(documentsPath, dbFileName);
return new SQLiteAsyncConnection(path);
}
}
public async void FirstSyncContacts(string host, string database, string contact)
{
try
{
var db = DependencyService.Get<ISQLiteDB>();
var conn = db.GetConnection();
var sql = "SELECT * FROM tblContacts WHERE Coordinator = '" + contact + "'";
var getContacts = conn.QueryAsync<ContactsTable>(sql);
var resultCount = getContacts.Result.Count;
var current_datetime = DateTime.Now.ToString("yyyy-MM-dd hh:mm:00");
//Check if the retailer has been sync
if (resultCount < 1)
{
try
{
syncStatus.Text = "Syncing Retailer";
var link = Constants.requestUrl + "Host=" + host + "&Database=" + database + "&Contact=" + contact + "&Request=9DpndD";
string contentType = "application/json";
JObject json = new JObject
{
{ "ContactID", contact }
};
HttpClient client = new HttpClient();
var response = await client.PostAsync(link, new StringContent(json.ToString(), Encoding.UTF8, contentType));
if (response.IsSuccessStatusCode)
{
var content = await response.Content.ReadAsStringAsync();
if (content != "")
{
var contactsresult = JsonConvert.DeserializeObject<List<ContactsData>>(content);
foreach (var item in contactsresult)
{
// update only the properties that you have to...
item.LastSync = Convert.ToDateTime(current_datetime);
item.ServerUpdate = Convert.ToDateTime(item.ServerUpdate);
item.MobileUpdate = Convert.ToDateTime(item.MobileUpdate);
}
await conn.InsertAsync(contactsresult);
}
}
//Proceed to next function
FirstSyncRetailerGroup(host, database, contact);
}
catch (Exception ex)
{
Console.Write("Syncing Retailer Error " + ex.Message);
}
}
//If not get the retailer
else
{
SyncContacts(host, database, contact);
}
}
catch (Exception ex)
{
Console.Write("Syncing Retailer Error " + ex.Message);
}
}
Use the non-async Insert in one background thread, instead of 3000 separate async calls...
Re-use the List from your DeserializeObject step instead of creating new local objects that will just be thrown away on each loop iteration.
No need to assign all those json properties (item.XXX) to another local variable, just update the properties of each existing ContactsData as needed before inserting it into the DB.
Example using SQLiteConnection:
// Use the non-async version of SQLiteConnection
var conn = new SQLiteConnection(dbPath, true, null);
// code removed for example...
await System.Threading.Tasks.Task.Run(() =>
{
var contactsresult = JsonConvert.DeserializeObject<List<ContactsData>>(content);
// start a transaction block so all 3000 records are committed at once.
conn.BeginTransaction();
// Use `foreach` in order shortcut the need to retrieve the object from the list via its index
foreach (var item in contactsresult)
{
// update only the properties that you have to...
item.LastSync = Convert.ToDateTime(current_datetime);
item.ServerUpdate = Convert.ToDateTime(item.ServerUpdate);
item.MobileUpdate = Convert.ToDateTime(item.MobileUpdate);
conn.Insert(item);
}
conn.Commit();
});
Example using SQLiteAsyncConnection:
var db = DependencyService.Get<ISQLiteDB>();
var conn = db.GetConnection();
~~~
var contactsresult = JsonConvert.DeserializeObject<List<ContactsData>>(content);
foreach (var item in contactsresult)
{
// update only the properties that you have to...
item.LastSync = Convert.ToDateTime(current_datetime);
item.ServerUpdate = Convert.ToDateTime(item.ServerUpdate);
item.MobileUpdate = Convert.ToDateTime(item.MobileUpdate);
}
conn.InsertAsync(contactsresult); // Insert the entire list at once...
I had the same problem so even the answer is some years late, maybe can be usefull for somebody.
This is how I did.
First: I get the data all from server as json
var response = await client.GetAsync("your_server_url");
var content = await response.Content.ReadAsStringAsync();
ResponseData = JsonConvert.DeserializeObject<DataModel>(content);
Second: Save data to database
await conn.InsertAllAsync(ResponseData)
But in my case, because our app works offline data, I first insert all data in a temp table, then I get all new records comparing main table with temp table.
NewDataFromTemp = await conn.QueryAsync<DataModel>("SELECT * FROM [TableTemp] t WHERE t.[TABLE_ID] NOT IN (SELECT g.[TABLE_ID] FROM [MainTable] g)");
And insert new records in Main table
await conn.InsertAllAsync(NewDataFromTemp)
Then I check for updated records
UpdatedDataFromTemp = await conn.QueryAsync<DataModel>("SELECT t.* FROM [TableTemp] t, [MainTable] o WHERE t.[TABLE_ID]=o.[TABLE_ID] AND t.[TABLE_UPDATED]>o.[TABLE_UPDATED]");
And update all record in main table
await conn.UpdateAllAsync(UpdatedDataFromTemp);
I use logical delete so when updating the logical delete will be updated too.

Populate SQLite database from generic "list of items"

I use a SQLite database to populate a listview with a generic list of TodoItems:
ListView.ItemsSource = await App.Database.GetItemsAsync("SELECT * FROM [TodoItem]");
I can read data of the same kind from a database source using the HttpClient class as well via
public async Task<List<TodoItem>> ReadDataAsync ()
....
Items = new List<TodoItem> ();
....
var content = await response.Content.ReadAsStringAsync ();
Items = JsonConvert.DeserializeObject <List<TodoItem>> (content);
....
return Items;
whereafter I can use this as a data source as well:
ListView.ItemsSource = await App.ReadDataAsync();
Both ways work well.
What I want now is combine both routines to accomplish a code which checks whether we have an online connection, if yes, drop the database and populate it using the result of ReadDataAsync from above, if no, leave the database alone.
I found no way to directly assign a List to my database, overwriting the whole contents, I think of something like:
App.Database.Items = await App.ReadDataAsync();
but SQLite doesn't seem to expose the data store directly. How can I accomplish this?
For android for example, you could try: There's no need to drop the database.
//Check for internet connection
var connectivityManager =
(ConnectivityManager)this.GetSystemService("connectivity");
var activeConnection = connectivityManager.ActiveNetworkInfo;
if ((activeConnection != null) && activeConnection.IsConnected)
{
//Make call to the API
var list = await App.ReadDataAsync();
//Update or Insert local records
foreach (var item in list)
{
var success = UpdateOrInsert(item);
//Do something after - like retry incase of failure
}
}
The UpdatedOrInsert method can look like:
private bool UpdateOrInsert(TodoItem item)
{
var rowsAffected = App.Database.Update(item);
if (rowsAffected == 0)
{
// The item doesn't exist in the database, therefore insert it
rowsAffected = App.Database.Insert(item);
}
var success = rowsAffected > 0;
return success;
}

BroadcastBlock missing items

I have a list of project numbers that I need to process. A project could have about 8000 items and I need to get the data for each item in the project and then push this data into a list of servers. Can anybody please tell me the following..
1) I have 1000 items in iR but only 998 were written to the servers. Did I loose items by using broadCastBlock?
2) Am I doing the await on all actionBlocks correctly?
3) How do I make the database call async?
Here is the database code
public MemcachedDTO GetIR(MemcachedDTO dtoItem)
{
string[] Tables = new string[] { "iowa", "la" };
using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["test"].ConnectionString))
{
using (SqlCommand command = new SqlCommand("test", connection))
{
DataSet Result = new DataSet();
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add("#ProjectId", SqlDbType.VarChar);
command.Parameters["#ProjectId"].Value = dtoItem.ProjectId;
connection.Open();
Result.EnforceConstraints = false;
Result.Load(command.ExecuteReader(CommandBehavior.CloseConnection), LoadOption.OverwriteChanges, Tables);
dtoItem.test = Result;
}
}
return dtoItem;
}
Update:
I have updated the code to the below. It just hangs when I run it and only writes 1/4 of the data to the server? Can you please let me know what I am doing wrong?
public static ITargetBlock<T> CreateGuaranteedBroadcastBlock<T>(IEnumerable<ITargetBlock<T>> targets, DataflowBlockOptions options)
{
var targetsList = targets.ToList();
var block = new ActionBlock<T>(
async item =>
{
foreach (var target in targetsList)
{
await target.SendAsync(item);
}
}, new ExecutionDataflowBlockOptions
{
CancellationToken = options.CancellationToken
});
block.Completion.ContinueWith(task =>
{
foreach (var target in targetsList)
{
if (task.Exception != null)
target.Fault(task.Exception);
else
target.Complete();
}
});
return block;
}
[HttpGet]
public async Task< HttpResponseMessage> ReloadItem(string projectQuery)
{
try
{
var linkCompletion = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2
};
var cts = new CancellationTokenSource();
var dbOptions = new DataflowBlockOptions { CancellationToken = cts.Token };
IList<string> projectIds = projectQuery.Split(',').ToList();
IEnumerable<string> serverList = ConfigurationManager.AppSettings["ServerList"].Split(',').Cast<string>();
var iR = new TransformBlock<MemcachedDTO, MemcachedDTO>(
dto => dto.GetIR(dto), new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });
List<ActionBlock<MemcachedDTO>> actionList = new List<ActionBlock<MemcachedDTO>>();
List<MemcachedDTO> dtoList = new List<MemcachedDTO>();
foreach (string pid in projectIds)
{
IList<MemcachedDTO> dtoTemp = new List<MemcachedDTO>();
dtoTemp = MemcachedDTO.GetItemIdsByProject(pid);
dtoList.AddRange(dtoTemp);
}
foreach (string s in serverList)
{
var action = new ActionBlock<MemcachedDTO>(
async dto => await PostEachServerAsync(dto, s, "setitemcache"));
actionList.Add(action);
}
var bBlock = CreateGuaranteedBroadcastBlock(actionList, dbOptions);
foreach (MemcachedDTO d in dtoList)
{
await iR.SendAsync(d);
}
iR.Complete();
iR.LinkTo(bBlock);
await Task.WhenAll(actionList.Select(action => action.Completion).ToList());
return Request.CreateResponse(HttpStatusCode.OK, new { message = projectIds.ToString() + " reload success" });
}
catch (Exception ex)
{
return Request.CreateResponse(HttpStatusCode.InternalServerError, new { message = ex.Message.ToString() });
}
}
1) I have 1000 items in iR but only 998 were written to the servers. Did I loose items by using broadCastBlock?
Yes in the code below you set BoundedCapacity to one, if at anytime your BroadcastBlock cannot pass along an item it will drop it. Additionally a BroadcastBlock will only propagate Completion to one TargetBlock, do not use PropagateCompletion=true here. If you want all blocks to complete you need to handle Completion manually. This can be done by setting the ContinueWith on the BroadcastBlock to pass Completion to all of the connected targets.
var action = new ActionBlock<MemcachedDTO>(dto => PostEachServerAsync(dto, s, "set"), new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3, BoundedCapacity = 1 });
broadcast.LinkTo(action, linkCompletion);
actionList.Add(action);
Option: Instead of the BroadcastBlock use a properly bounded BufferBlock. When your downstream blocks are bound to one item they cannot receive additional items until they finish processing what they have. That will allow the BufferBlock to offer its items to another, possibly idle, ActionBlock.
When you add items into a throttled flow, i.e. a flow with a BoundedCapacity less than Unbounded. You need to be using the SendAsync method or at least handling the return of Post. I'd recommend simply using SendAsync:
foreach (MemcachedDTO d in dtoList)
{
await iR.SendAsync(d);
}
That will force your method signature to become:
public async Task<HttpResponseMessage> ReloadItem(string projectQuery)
2) Am I doing the await on all actionBlocks correctly?
The previous change will permit you to loose the blocking Wait call in favor of a await Task.WhenAlll
iR.Complete();
actionList.ForEach(x => x.Completion.Wait());
To:
iR.Complete();
await bufferBlock.Completion.ContinueWith(tsk => actionList.ForEach(x => x.Complete());
await Task.WhenAll(actionList.Select(action => action.Completion).ToList());
3) How do I make the database call async?
I'm going to leave this open because it should be a separate question unrelated to TPL-Dataflow, but in short use an async Api to access your Db and async will naturally grow through your code base. This should get you started.
BufferBlock vs BroadcastBlock
After re-reading your previous question and the answer from #VMAtm. It seems you want each item sent to All five servers, in that case you will need a BroadcastBlock. You would use a BufferBlock to distribute the messages relatively evenly to a flexible pool of servers that each could handle a message. None the less, you will still need to take control of propagating completion and faults to all the connected ActionBlocks by awaiting the completion of the BroadcastBlock.
To Prevent BroadcastBlock Dropped Messages
In general you two options, set your ActionBlocks to be unbound, which is their default value:
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3, BoundedCapacity = Unbounded });
Or broadcast messages your self from any variety of your own construction. Here is an example implementation from #i3arnon. And another from #svick

Resources