How should I handle very large projections in an event-sourcing context? - event-sourcing

I wanted to explore the implications of event sourcing versus active record.
Suppose I have events with payloads like this:
{
  "type": "userCreated",
  "id": "4a4cf26c-76ec-4a5a-b839-10cadd206eac",
  "name": "Alice",
  "passwordHash": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
}
... and ...
{
  "type": "userDeactivated",
  "id": "39fd0e9a-1025-42e6-8793-ed5bfa236f40"
}
I can compute the current state of my system with a reducer like this:
const activeUsers = new Map();

for (const event of events) {
  // userCreated
  if (event.payload.type === 'userCreated') {
    const { id, name, passwordHash } = event.payload;
    if (!activeUsers.has(id)) {
      activeUsers.set(id, { name, passwordHash });
    }
  }

  // userDeactivated
  if (event.payload.type === 'userDeactivated') {
    const { id } = event.payload;
    if (activeUsers.has(id)) {
      activeUsers.delete(id);
    }
  }
}
However, I cannot have my entire user table in a single Map.
So it seems I need a reducer for each user:
const userReducer = id => // filter events by user id...
But this will lead to slow performance, because I need to run a reducer over all of the events for every user I look up.
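Concretely, such a per-user reducer would be something like this (just a sketch):

const userReducer = id => events
  .filter(e => e.payload.id === id) // every event for this one user
  .reduce((user, e) => {
    if (e.payload.type === 'userCreated') {
      const { name, passwordHash } = e.payload;
      return { name, passwordHash };
    }
    if (e.payload.type === 'userDeactivated') {
      return null; // deactivated
    }
    return user;
  }, null);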
I could also shard the users by a function of their id:
const shard = nShards => id => {
  // Simple 32-bit string hash of the user id.
  let hash = 0;
  for (let i = 0; i < id.length; i++) {
    const chr = id.charCodeAt(i);
    hash = ((hash << 5) - hash) + chr;
    hash |= 0; // convert to 32-bit integer
  }
  // The hash can be negative, so take its absolute value before the modulo.
  return Math.abs(hash) % nShards;
};
Then the maps will be less enormous.
How is this problem typically solved in event-sourcing models?

As I understand it, you think you need to replay all the events through a reducer in order to query all the users, correct?
This is where CQRS comes into play, together with read models/denormalizers.
What almost everyone does is maintain a read model (stored, for example, in a SQL database or something else that is good at querying data). This read model is continuously updated as new events are created.
When you need to query all users, you query this read model instead of replaying all the events.
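As a rough sketch of that idea (using an in-memory Map as a stand-in for the real read store, and a hypothetical eventStore.subscribe API):

// Sketch only: project each event into a simple read model as it is appended.
const usersReadModel = new Map(); // stand-in for a SQL table, Redis hash, etc.

function project(event) {
  const { type, id, name, passwordHash } = event.payload;
  if (type === 'userCreated' && !usersReadModel.has(id)) {
    usersReadModel.set(id, { name, passwordHash });
  } else if (type === 'userDeactivated') {
    usersReadModel.delete(id);
  }
}

// Hypothetical subscription API: the projector sees each event exactly once.
// eventStore.subscribe(project);

// Queries hit the read model directly; no replay at query time:
// const allActiveUsers = [...usersReadModel.values()];

A full replay of the event log is then only needed when the projection logic itself changes and the read model has to be rebuilt.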

Related

2 items added to DynamoDB when I run putItem

I am working on a bookmark skill for Alexa to teach myself DynamoDB. I've got over various hurdles and can now write to my table. The issue is that whenever I call putItem it adds two items. I'm trying to store the userID (the partition key in DynamoDB), the timestamp of the request (as a string, which is the sort key), the title of a book and the page the user is on. This issue only started after I moved to a composite key, but I think I need both of these fields to a) get a unique primary key and b) be able to find the last item saved by a user.
Here's my intent code in Lambda:
'addBookmark': function() {
    // Delegate to Alexa to collect all the required slot values
    var filledSlots = delegateSlotCollection.call(this);

    // Get slot values as variables
    var userID = this.event.session.user.userId;
    var pageNumber = this.event.request.intent.slots.pageNumber.value;
    var bookTitle = this.event.request.intent.slots.bookTitle.value;

    // DynamoDB expects the timestamp as a string, so we convert it
    var timeStamp = Date.now().toString();

    var params = {
        TableName: 'bookmarkV6',
        Item: {
            'userID': { S: userID },
            'timeStamp': { S: timeStamp },
            'bookTitle': { S: bookTitle },
            'pageNumber': { N: pageNumber },
        }
    };

    // Call DynamoDB to add the item to the table
    ddb.putItem(params, function(err, data) {
        if (err) {
            console.log("Error", err);
        } else {
            console.log("Success", data);
        }
    });

    const speechOutput = "OK, I've made a note that you're on page " + pageNumber + " of " + bookTitle + ".";
    this.response.cardRenderer("Bookmark", "Page " + pageNumber + " of " + sentenceCase(bookTitle) + "\n \n" + stringToDate(timeStamp));
    this.response.speak(speechOutput);
    this.emit(':responseReady');
},
The "duplicate" items have slightly different timestamp values.
I am also having the same issue. It happens when delegate slot collection is used, but I have not been able to solve it. I have delegate slot confirmation for 6 slots, and when I have given all 6 slot values I end up with 7 records in the table.
In the delegateSlotCollection() function, return "COMPLETED" in the else block, and in your addBookmark intent check as shown below after your delegateSlotCollection.call method:
var filledSlots = delegateSlotCollection.call(this);
if (filledSlots === 'COMPLETED') {
    // place all your save-to-DynamoDB logic here
}
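The underlying cause is that Alexa invokes the handler once per dialog turn while it is still collecting slots, so the putItem call runs on every turn. A rough sketch of the helper change (your delegateSlotCollection from the Alexa samples may differ in detail):

function delegateSlotCollection() {
    if (this.event.request.dialogState !== 'COMPLETED') {
        // Still collecting slots: hand the dialog back to Alexa and return nothing,
        // so the intent handler skips the DynamoDB write on this turn.
        this.emit(':delegate');
    } else {
        // All slots are filled: tell the caller it is safe to save.
        return 'COMPLETED';
    }
}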

Running async function in parallel using LINQ's AsParallel()

I have a Document DB repository class that has one get method like below:
private static DocumentClient client;

public async Task<TEntity> Get(string id, string partitionKey = null)
{
    try
    {
        RequestOptions requestOptions = null;
        if (partitionKey != null)
        {
            requestOptions = new RequestOptions { PartitionKey = new PartitionKey(partitionKey) };
        }
        var result = await client.ReadDocumentAsync(
            UriFactory.CreateDocumentUri(DatabaseId, CollectionId, id),
            requestOptions);
        return (TEntity)(dynamic)result.Resource;
    }
    catch (DocumentClientException e)
    {
        // Have logic for different exceptions actually
        throw;
    }
}
I have two collections - Collection1 and Collection2. Collection1 is non-partitioned whereas Collection2 is partitioned.
On the client side, I create two repository objects, one for each collection.
private static DocumentDBRepository<Collection1Item> collection1Repository = new DocumentDBRepository<Collection1Item>("Collection1");
private static DocumentDBRepository<Collection2Item> collection2Repository = new DocumentDBRepository<Collection2Item>("Collection2");
List<Collection1Item> collection1Items = await collection1Repository.GetItemsFromCollection1(); // Selects first forty documents based on time
List<UIItem> uiItems = new List<UIItem>();
foreach (var item in collection1Items)
{
    var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
    uiItems.Add(new UIItem
    {
        ItemId = item.ItemId,
        Collection1Reference = item.Id,
        TargetId = item.TargetId,
        Collection2Reference = item.Collection2Reference,
        Value = collection2Item.Value
    });
}
This works fine. But since it is happening sequentially with foreach, I wanted to do those Get calls in parallel. When I do it in parallel as below:
ConcurrentBag<UIItem> uiItems = new ConcurrentBag<UIItem>();
collection1Items.AsParallel().ForAll(async item =>
{
    var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId); // TargetId is my partition key for Collection2
    uiItems.Add(new UIItem
    {
        ItemId = item.ItemId,
        Collection1Reference = item.Id,
        TargetId = item.TargetId,
        Collection2Reference = item.Collection2Reference,
        Value = collection2Item.Value
    });
});
It doesn't work and uiItems is always empty.
You don't need Parallel.For to run async operations concurrently. If they are truly asynchronous they already run concurrently.
You could collect the task returned from each operation and simply call await Task.WhenAll() on all the tasks. If you modify your lambda to create and return a UIItem, the result of await Task.WhenAll() will be a collection of UIItems. No need to modify global state from inside the concurrent operations.
For example:
var itemTasks = collection1Items.Select(async item =>
{
    var collection2Item = await storageRepository.Get(item.Collection2Reference, item.TargetId);
    return new UIItem
    {
        ItemId = item.ItemId,
        Collection1Reference = item.Id,
        TargetId = item.TargetId,
        Collection2Reference = item.Collection2Reference,
        Value = collection2Item.Value
    };
});
var results = await Task.WhenAll(itemTasks);
A word of caution though - this will fire all Get operations concurrently. That may not be what you want, especially when calling a service with rate limiting.
Try simply starting tasks and waiting for all of them at the end. That would result in parallel execution.
var tasks = collection1Items.Select(async item =>
{
    //var collection2Item = await storageRepository.Get...
    return new UIItem
    {
        //...
    };
});
var uiItems = await Task.WhenAll(tasks);
PLINQ is useful when working with in-memory constructs, where it uses as many threads as possible; but if it is combined with the async-await technique (which is about releasing threads while accessing external resources), you can end up with strange results.
I would like to share a solution for an issue I saw in some comments.
If you are worried about rate limiting and want to limit the concurrency yourself, you can do something like this using SemaphoreSlim:
var nbCores = Environment.ProcessorCount;
var semaphore = new SemaphoreSlim(nbCores, nbCores);
var processTasks = items.Select(async x =>
{
    await semaphore.WaitAsync();
    try
    {
        await ProcessAsync();
    }
    finally
    {
        semaphore.Release();
    }
});
await Task.WhenAll(processTasks);
In this example, ProcessAsync is called concurrently but limited to {processor count} concurrent operations.
Hope that helps someone.
NB: You can of course set the "nbCores" variable to whatever value satisfies your own constraints.
NB 2: This example fits some use cases, not all of them. For a large load of tasks I would highly suggest looking at the TPL (Task Parallel Library) instead.

display calculated data with CouchDB and PouchDB

I'm trying to understand how to return calculated data on docs using CouchDB and PouchDB.
Say I have two types of docs on my CouchDB: Blocks and Reports.
A Report consists of: report_id, block_id and date.
A Block consists of: block_id and name.
I'd like to calculate, for each block, its last report_id (the id of the most recent report) and return it with the block's doc.
Is there a way to achieve that?
I'm assuming that a View of some type will do the trick but I can't figure it out.
You can do this with map/reduce functions in CouchDB.
Let's say you have these documents:
{
  "_id": "report_1",
  "type": "report",
  "block_id": "block_1",
  "date": "1500325245"
}
{
  "_id": "report_2",
  "type": "report",
  "block_id": "block_1",
  "date": "1153170045"
}
You would like to get the report with the highest timestamp (in this case, report_1).
We start by creating a map function that maps the report documents with the block_id as the key and the timestamp + report id as the value for the reduce function.
Map:
function (doc) {
    if (doc.type == "report")
        emit(doc.block_id, { date: doc.date, report: doc._id });
}
Then we create a reduce function. When rereduce is false, we simply return the values. When rereduce is true, we find the maximum timestamp and return the report id associated with it.
Reduce function:
function (keys, values, rereduce) {
    if (rereduce) {
        var max = 0;
        var maxReportId = -1;
        // On rereduce, each element of `values` is an array produced by a
        // previous (non-rereduce) pass, so walk the nested arrays.
        for (var i = 0; i < values.length; i++) {
            for (var j = 0; j < values[i].length; j++) {
                var val = values[i][j];
                if (parseInt(val.date, 10) > max) {
                    max = parseInt(val.date, 10);
                    maxReportId = val.report;
                }
            }
        }
        // We return the report id of the most recent report.
        return maxReportId;
    } else {
        return values;
    }
}
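With the view saved in a design document, you then query it with grouping enabled so you get one row per block. From PouchDB that could look roughly like this (the design document name "reports" and view name "latest_report" are just placeholders):

// Sketch: query the CouchDB view from PouchDB, grouped by block_id.
var db = new PouchDB('http://localhost:5984/mydb');

db.query('reports/latest_report', { group: true }).then(function (result) {
    // Each row has a block_id as its key and the reduce output for that
    // block (its most recent report) as its value.
    result.rows.forEach(function (row) {
        console.log('block ' + row.key + ' -> ' + JSON.stringify(row.value));
    });
});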

How to route, group, or otherwise split up messages into consistent sets using TPL Dataflow

I'm new to TPL Dataflow and I'm looking for a construct that will allow splitting a list of source messages up for evenly distributed parallel processing, while maintaining the order of the messages through the individual pipelines. Is there a specific block or concept within the Dataflow API that can be used to accomplish this, or is it more a matter of providing glue code or custom blocks between existing blocks?
For those familiar with Akka.NET, I'm looking for functionality similar to the ConsistentHashing router, which allows sending messages to a single router that then forwards them on to individual routees to be handled.
Synchronous example:
var count = 100000;
var processingGroups = 5;
var source = Enumerable.Range(1, count);
// Distribute source elements consistently and evenly into a specified set of groups (e.g. 5).
var distributed = source.GroupBy(s => s % processingGroups);
// Within each of the 5 processing groups, go through each item and add 3 to it
var transformed = distributed.Select(d => d.Select(i => i + 3).ToArray());
List<int[]> result = transformed.ToList();
Check.That(result.Count).IsEqualTo(processingGroups);
for (int i = 0; i < result.Count; i++)
{
    var outputGroup = result[i];
    var expectedRange = Enumerable.Range(i + 1, count / processingGroups).Select((e, index) => e + (index * (processingGroups - 1)) + 3);
    Check.That(outputGroup).ContainsExactly(expectedRange);
}
In general I don't think what you're looking for comes pre-made in Dataflow the way it does with a ConsistentHashing router. However, by adding an id to the pieces of data you wish to flow, you can process them in any order and in parallel, and reorder them when the processing finishes.
public class Message
{
    public int MessageId { get; set; }
    public int GroupId { get; set; }
    public int Value { get; set; }
}

public class MessageProcessing
{
    public void abc()
    {
        var count = 10000;
        var groups = 5;
        var source = Enumerable.Range(0, count);

        //buffer all input
        var buffer = new BufferBlock<IEnumerable<int>>();

        //split each input enumerable into processing groups
        var messageProducer = new TransformManyBlock<IEnumerable<int>, Message>(ints =>
            ints.Select((i, index) => new Message() { MessageId = index, GroupId = index % groups, Value = i }).ToList());

        //process each message, one action block may process any group id in any order
        var processMessage = new TransformBlock<Message, Message>(msg =>
        {
            msg.Value++;
            return msg;
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = groups
        });

        //output of processed message values
        int[] output = new int[count];

        //insert messages into the array in the order they started in
        var regroup = new ActionBlock<Message>(msg => output[msg.MessageId] = msg.Value,
            new ExecutionDataflowBlockOptions()
            {
                MaxDegreeOfParallelism = 1
            });

        //link the blocks into a pipeline and propagate completion downstream
        var linkOptions = new DataflowLinkOptions() { PropagateCompletion = true };
        buffer.LinkTo(messageProducer, linkOptions);
        messageProducer.LinkTo(processMessage, linkOptions);
        processMessage.LinkTo(regroup, linkOptions);

        //feed the source into the pipeline
        buffer.Post(source);
    }
}
In the example the GroupId of a message isn't used, but it could be used in a more complete example for coordinating groups of messages. Also, handling follow-up posts to the BufferBlock could be done by changing the output array to a List and setting up a corresponding list element each time an enumerable of integers is posted to the buffer block. Depending on your exact use, you may need to support multiple users of the output, and this can be folded back into the flow.
You can dynamically create a pipeline by linking the blocks to each other based on a predicate:
var count = 100;
var processingGroups = 5;
var source = Enumerable.Range(1, count);
var buffer = new BufferBlock<int>();
var consumer1 = new ActionBlock<int>(i => { });
var consumer2 = new ActionBlock<int>(i => { });
var consumer3 = new ActionBlock<int>(i => { });
var consumer4 = new ActionBlock<int>(i => { Console.WriteLine(i); });
var consumer5 = new ActionBlock<int>(i => { });
buffer.LinkTo(consumer1, i => i % 5 == 1);
buffer.LinkTo(consumer2, i => i % 5 == 2);
buffer.LinkTo(consumer3, i => i % 5 == 3);
buffer.LinkTo(consumer4, i => i % 5 == 4);
buffer.LinkTo(consumer5);
foreach (var i in source)
{
    buffer.Post(i);
    // consider the async option if you are able to use it:
    // await buffer.SendAsync(i);
}
buffer.Complete();
Console.ReadLine();
The code above will write out only the numbers from the 4th group, processing the other groups silently, but I hope you get the idea. It is general practice to link at least one consumer without a filter so that messages are not dropped when no other consumer accepts them; if you don't have a default handler, you can use NullTarget<int>, which simply ignores all the messages it receives:
buffer.LinkTo(DataflowBlock.NullTarget<int>());
The downside is the flip side of its advantage: you have to provide the predicates yourself, as there is no built-in structure for this. However, it can still be done.

How can I change the dropdownlist's options?

I'm a newbie in ASP.NET, so I'm hoping you could give me some help with my dropdownlist bound to a table.
Here's the scenario:
I have a table Account with fields UserId, UserName and Type. The Type field contains 3 values: 'S', 'A', and 'U'. Each user has his own Type. I have a dropdownlist named 'ddlType' which is already bound to the Account table. However, I want the options of the dropdownlist to be displayed as 'Stakeholder', 'Approver', and 'User' instead of displaying only letters/initials. Since I would prefer not to make any changes in the database, how can I change those options in code-behind?
Here's my code:
public void BindControls(int selectedUserId)
{
    DataTable dtAccount = null;
    try
    {
        dtAccount = LogBAL.GetAccountDetails(selectedUserId);
        if (dtAccount.Rows.Count > 0)
        {
            lblUserId.Text = dtAccount.Rows[0]["UserId"].ToString();
            txtUserName.Text = dtAccount.Rows[0]["UserName"].ToString();
            ddlType.SelectedValue = dtAccount.Rows[0]["Type"].ToString();
        }
    }
    catch (Exception ex)
    {
        throw ex;
    }
    finally
    {
        dtAccount.Dispose();
    }
}
Any help from you guys will be appreciated. Thanks in advance! :D
You can bind the DropDownList in code-behind.
Get your 'Type' data into an array. Make the array two-dimensional and assign the values to it.
Once it has the data, your array will look like this:
string[,] Types = { { "Stakeholder", "S" }, { "Approver", "A" }, { "User", "U" } };
Now assign the values to the DropDownList:
int rows = Types.GetUpperBound(0);
int columns = Types.GetUpperBound(1);
ddlType.Items.Clear();
for (int currentRow = 0; currentRow <= rows; currentRow++)
{
    ListItem li = new ListItem();
    for (int currentColumn = 0; currentColumn <= columns; currentColumn++)
    {
        if (currentColumn == 0)
        {
            li.Text = Types[currentRow, currentColumn];
        }
        else
        {
            li.Value = Types[currentRow, currentColumn];
        }
    }
    ddlType.Items.Add(li);
}
I haven't tested it but hopefully it will work.
