In a Twitter-like application, when someone posts a tweet, the system iterates over all of that user's followers and adds a copy of the tweet to each follower's timeline. I need something similar. What is the best way to insert a tweet ID into, say, 10/100/1000 followers' timelines, assuming I have a list of follower IDs?
I am doing this within Azure WebJobs using Azure Redis. A WebJob is created automatically for every tweet received in the queue, so I may have around 16 jobs running simultaneously, each going through followers and inserting tweets. If 99% of the inserts succeed, the job should not stop because one or a few have failed; I need to continue but log the failure.
Question: Should I do CreateBatch like below? If I need to retrieve latest tweets first in reverse chronological order is below fine? performant?
var tasks = new List<Task>();
var batch = _cache.CreateBatch();
//loop start
tasks.Add(batch.ListRightPushAsync("follower_id", "tweet_id"));
//loop end
batch.Execute();
await Task.WhenAll(tasks.ToArray());
a) But how do I catch if something fails? try catch?
b) how do I check in a batch for a total # in each list and pop one out if it reaches a certain #? I want to do a LeftPop if the list is > 800. Not sure how to do it all inside the batch.
Please point me to a sample or let me have a snippet here. Struggling to find a good way. Thank you so much.
UPDATE
Does this look right based on @marc's comments?
var tasks = new List<Task>();
followers.ForEach(f =>
{
    var key = f.FollowerId;
    var task = _cache.ListRightPushAsync(key, value);
    task.ContinueWith(t =>
    {
        if (t.Result > 800) _cache.ListLeftPopAsync(key).Wait();
    });
    tasks.Add(task);
});
Task.WaitAll(tasks.ToArray());
CreateBatch probably doesn't do what you think it does. What it does is defer a set of operations and ensure they get sent contiguously relative to a single connection - there are some occasions this is useful, but not all that common - I'd probably just send them individually if it was me. There is also CreateTransaction (MULTI/EXEC), but I don't think that would be a good choice here.
That depends on whether you care about the data you're popping. If not: I'd send a LTRIM, [L|R]PUSH pair - to trim the list to (max-1) before adding. Another option would be Lua, but it seems overkill. If you care about the old data, you'll need to do a range query too.
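As a rough illustration of the LTRIM-then-push pair, here is a minimal Python sketch. It does not talk to Redis: `FakeRedis` is a hypothetical in-memory stand-in that models only the semantics of LTRIM, RPUSH and LRANGE, and `fan_out`, `latest_first` and `MAX_TIMELINE` are names invented for this example. It also shows one possible way to log a failed insert for one follower and carry on with the rest.

```python
import logging

MAX_TIMELINE = 800  # cap mentioned in the question


class FakeRedis:
    """Hypothetical in-memory stand-in modelling LTRIM, RPUSH and LRANGE."""

    def __init__(self):
        self.lists = {}

    def _slice(self, key, start, stop):
        # Redis ranges are inclusive and accept negative offsets from the tail.
        items = self.lists.get(key, [])
        n = len(items)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = max(n + stop, -1)
        return items[start:stop + 1]

    def ltrim(self, key, start, stop):
        self.lists[key] = self._slice(key, start, stop)

    def lrange(self, key, start, stop):
        return list(self._slice(key, start, stop))

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])


def fan_out(redis, follower_ids, tweet_id, max_len=MAX_TIMELINE):
    """Push tweet_id to every follower's list; return the followers that failed.

    Assumes max_len >= 2.
    """
    failed = []
    for follower_id in follower_ids:
        try:
            # Keep only the newest (max_len - 1) entries, then append the new one.
            redis.ltrim(follower_id, -(max_len - 1), -1)
            redis.rpush(follower_id, tweet_id)
        except Exception:
            # One bad follower should not stop the rest: log it and move on.
            logging.exception("insert failed for follower %s", follower_id)
            failed.append(follower_id)
    return failed


def latest_first(redis, follower_id, count):
    """Newest entries sit at the tail after RPUSH, so read the tail and reverse."""
    return list(reversed(redis.lrange(follower_id, -count, -1)))
```

Because the trim runs before the push, the list never grows past the cap and nothing has to be popped or inspected afterwards. That only fits the case where the trimmed entries can be discarded.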
I am currently making a turn based strategy game with laravel (mysql DB with InnoDB) engine and want to make sure that I don't have bugs due to race conditions, duplicate requests, bad actors etc...
Because these kind of bugs are hard to test, I wanted to get some clarification.
Many actions in the game can only occur once per turn, like buying a new unit. Here is a simplified bit of code for purchasing a unit.
$player = Player::find($player_id);
if ($player->gold >= $unit_price && $player->has_purchased == false) {
    $player->has_purchased = true;
    $player->gold -= $unit_price;
    $player->save();
    $unit = new Unit();
    $unit->player_id = $player->id;
    $unit->save();
}
So my concern is that two threads could both make it past the if statement and then execute the block of code at the same time.
Is this a valid concern?
And would the solution be to wrap everything in a database transaction like https://betterprogramming.pub/using-database-transactions-in-laravel-8b62cd2f06a5 ?
This means that a good portion of my code will be wrapped around database transactions because I have a lot of instances that are variations of the above code for different actions.
Also there is a situation where multiple users will be able to update a value in the database so I want to avoid a situation where 2 users increment the value at the same time and it only gets incremented once.
Since you are using Laravel, presumably to develop a web-based game, you can expect multiple concurrent connections. A transaction is just one part of the equation. Transactions ensure operations are performed atomically: in your case, a transaction ensures that the player save and the unit save either both succeed or both fail, so you won't end up with the money deducted but no unit granted.
However, there is another facet to this. If there is a real possibility of two separate requests for the same player arriving concurrently, you may also hit a race condition, because a transaction is not a lock and two transactions can run at the same time. The implication (in your case) is that two checks run against the same player to ensure enough gold is available, both succeed, and both deduct the same gold, yet two distinct units are granted at the end (i.e. item duplication). To avoid this, use a lock to prevent other threads from obtaining the same player row/model. Your full code would be:
DB::transaction(function () use ($player_id, $unit_price) {
    // Lock the player row until the transaction commits or rolls back.
    $player = Player::where('id', $player_id)->lockForUpdate()->first();
    if ($player->gold >= $unit_price && $player->has_purchased == false) {
        $player->has_purchased = true;
        $player->gold -= $unit_price;
        $player->save();
        $unit = new Unit();
        $unit->player_id = $player->id;
        $unit->save();
    }
});
This will ensure any other threads trying to retrieve the same player will need to wait until the lock is released (which will happen at the end of the first request).
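To make the effect of the lock concrete, here is a small Python analogy rather than Laravel code. A `threading.Lock` stands in for the row lock taken by `lockForUpdate()`, and `Player`, `purchase` and the `units` list are names invented for this sketch. The check and the update happen while the lock is held, so only one of the competing requests can pass the check.

```python
import threading


class Player:
    """Toy player record; the lock plays the role of the database row lock."""

    def __init__(self, gold):
        self.gold = gold
        self.has_purchased = False
        self.lock = threading.Lock()


def purchase(player, unit_price, units):
    """Check-then-act while holding the lock, so the check cannot go stale."""
    with player.lock:
        if player.gold >= unit_price and not player.has_purchased:
            player.has_purchased = True
            player.gold -= unit_price
            units.append("unit")
            return True
        return False
```

Without the `with player.lock:` line, two threads could both read `has_purchased` as false before either one writes it, which is the duplication scenario described above.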
There are more nuances to deal with here as well, such as a player sending a duplicate request by double-clicking, and that can get a bit more complex.
For your purchase system, it's advisable to use DB::transaction, since it protects you from inconsistent records. Check out the Laravel docs for more information: https://laravel.com/docs/9.x/database#database-transactions As for reactive data you need to keep track of, simply bind a variable to that data in your front end, then use the variable to update your DB records.
Use a transaction when you need to abort if any exception or error occurs: if an exception is thrown, nothing is saved and every operation in the transaction is rolled back. I recommend using transactions wherever you can. The basic format is:
DB::beginTransaction();
try {
    // database actions like create, update etc.
    DB::commit(); // finally commit to database
} catch (\Exception $e) {
    DB::rollback(); // roll back if any error occurs
    // something went wrong
}
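The same commit-or-roll-back shape can be tried out locally with a minimal Python sketch using the standard `sqlite3` module. This is an illustration under assumptions, not Laravel code: the `players` and `units` tables, `make_db`, `buy_unit` and the `fail_after_debit` switch are invented for the example. It shows that when the second write fails, the first one is undone as well.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one player holding 100 gold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, gold INTEGER)")
    conn.execute("CREATE TABLE units (id INTEGER PRIMARY KEY, player_id INTEGER)")
    conn.execute("INSERT INTO players (id, gold) VALUES (1, 100)")
    conn.commit()
    return conn


def buy_unit(conn, player_id, unit_price, fail_after_debit=False):
    """Debit the player and insert the unit as one all-or-nothing step."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE players SET gold = gold - ? WHERE id = ?",
                (unit_price, player_id),
            )
            if fail_after_debit:
                raise RuntimeError("simulated failure before the unit is saved")
            conn.execute("INSERT INTO units (player_id) VALUES (?)", (player_id,))
        return True
    except RuntimeError:
        return False


def gold_of(conn, player_id):
    return conn.execute(
        "SELECT gold FROM players WHERE id = ?", (player_id,)
    ).fetchone()[0]
```

Calling `buy_unit(conn, 1, 30, fail_after_debit=True)` leaves the player's gold unchanged, because the debit is rolled back together with the failed insert.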
See the laravel docs here
I am using Lambdas and an SQS queue to delete data from DynamoDB. When I was developing this, I found that the only way to delete data from DynamoDB is to gather the items you want to delete and delete them in batches.
At my current organization, most of the infrastructure is serverless. Hence, I decided to build this piece following a serverless, event-driven architecture as well.
In a nutshell, I post a message on the SQS queue to delete the items under a particular partition. Once this message invokes my Lambda, I perform a listing call to DynamoDB for 1000 items and do the following:
Grab the cursor from this listing call, and post another message to grab next 1000 items from this cursor.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
const dbClient = new DynamoDBClient(config);
const records = dbClient.query(...fetchFirst1000ItemsForPrimaryKey);
postMessageToFetchNextItems();
From the fetched 1000 items:
I create batches of 20 items and issue a set of messages for another Lambda to delete these items. Batches of 20 items are posted for deletion until all 1000 have been posted.
for (let i = 0; i < 1000; i += 20) {
    // slice takes (start, end), so the end index has to move with i
    const itemsToDelete = records.slice(i, i + 20);
    postItemsForDeletion(itemsToDelete);
}
Another lambda gets these items and just deletes them:
dbClient.send(new BatchWriteItemCommand([itemsForDeletion]))
The listing lambda receives a call to read items from the next cursor and the above steps get repeated.
This all happens in parallel. Get items, post message to grab next 1000 items, post messages for deletion of items.
While this looks good on paper, it doesn't delete all records from DynamoDB. There is no set pattern; some items always remain in the table. I am not entirely sure what is happening, but my theory is that running deletion and listing in parallel is causing the issue.
I was unable to find any documentation to verify my theory and hence this question here.
A batch write items call will return a list of unprocessed items. You should check for that and retry them.
Look at the docs for https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-dynamodb/classes/batchwriteitemcommand.html and search for UnprocessedItems.
Fundamentally, a batch write items call is not a transactional write. It's possible for some item writes to succeed while others fail. It's on you to check for failures and retry them. I'm sorry I don't have a link for good sample code.
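The check-and-retry loop might look roughly like the following Python sketch. It is deliberately independent of any SDK: `batch_write` is a hypothetical callable that takes a list of keys and returns the ones that were not processed (with the real API, that information comes back in UnprocessedItems), and `delete_all` with its parameters is invented for this example. The default batch size of 25 reflects the per-call limit of BatchWriteItem.

```python
import time


def delete_all(keys, batch_write, batch_size=25, max_attempts=5, base_delay=0.0):
    """Delete keys in batches, retrying whatever each call leaves unprocessed.

    Returns the keys that were still unprocessed after max_attempts tries.
    """
    leftover = []
    for start in range(0, len(keys), batch_size):
        pending = keys[start:start + batch_size]
        for attempt in range(max_attempts):
            # batch_write reports the keys it did NOT manage to delete.
            pending = batch_write(pending)
            if not pending:
                break
            # Back off a little longer before each retry.
            time.sleep(base_delay * 2 ** attempt)
        leftover.extend(pending)
    return leftover
```

Anything returned by `delete_all` has exhausted its retries and would need to be logged or re-queued rather than silently dropped.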
So I have this requirement, that takes in one document, and from that needs to create one or more documents in the output.
In the course of this, it needs to determine whether the document is already there, because different operations apply in the create and update scenarios.
In straight code, this would be simple (conceptually)
InputData in = <something>
if (getItemFromExternalSystem(in.key1) == null) {
createItemSpecificToKey1InExternalSystem(in.key1);
}
if (getItemFromExternalSystem(in.key2) == null) {
createItemSpecificToKey2InExternalSystem(in.key1, in.key2);
}
createItemFromInput(in.key1,in.key2, in.moreData);
In effect a kind of "ensure this data is present".
However, how would I go about achieving this in IIB? If I used a subflow for the get/create cycle, the result of its last operation would be returned from the subflow as the new "message" of the flow. But I don't care about the value from the "ensure data present" subflow. I need to keep working on my original message, while still waiting for the different subflows to finish before I run my final "createItem".
You can use Aggregation nodes. For example, use 3 flows:
the first propagates your original message to the third;
the second invokes the operations createItemSpecificToKey1InExternalSystem and createItemSpecificToKey2InExternalSystem;
the third aggregates the results of the first and second and invokes createItemFromInput.
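The fan-out and fan-in shape of that suggestion can be sketched in Python. This is only a conceptual model, not IIB or ESQL: a plain dict named `store` stands in for the external system, and `ensure_present` and `process` are hypothetical names. The point it illustrates is that the branch results are waited for and then ignored, while the original message is carried through to the final step.

```python
from concurrent.futures import ThreadPoolExecutor


def ensure_present(store, key, create):
    """Create the item only if the external system does not have it yet."""
    if key not in store:
        store[key] = create(key)


def process(message, store):
    """Run the 'ensure' branches, wait for them, then use the original message."""
    with ThreadPoolExecutor() as pool:
        branches = [
            pool.submit(ensure_present, store, message["key1"],
                        lambda key: {"kind": "key1 item"}),
            pool.submit(ensure_present, store, message["key2"],
                        lambda key: {"kind": "key2 item"}),
        ]
        for branch in branches:
            branch.result()  # wait for completion; the return value is ignored
    # The original message is still in hand here, untouched by the branches.
    store[(message["key1"], message["key2"])] = message["moreData"]
    return message
```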
Have you considered using the Collector node? It will collect your records into N 'collections', and then you can iterate over the collections and output one document per collection.
I have a Spark Streaming application that is processing a stream of website click events. Each event has a property containing a GUID that identifies the user session that the event belongs to.
My application is counting up the number of events that occurred for each session, using windowing:
def countEvents(kafkaStream: DStream[(String, Event)]): DStream[(String, Session)] = {
  // Get a list of the session GUIDs from the events
  val sessionGuids = kafkaStream
    .map(_._2)
    .map(_.getSessionGuid)
  // Count up the GUIDs over our sliding window
  val sessionGuidCountsInWindow = sessionGuids.countByValueAndWindow(Seconds(60), Seconds(1))
  // Create new session objects with the event count
  sessionGuidCountsInWindow
    .map({
      case (guidS, eventCount) =>
        guidS -> new Session().setGuid(guidS).setEventCount(eventCount)
    })
}
My understanding was that the countByValueAndWindow function is only counting the values in the DStream on which the function is called. In other words, in the code above, the call to countByValueAndWindow should return the event counts only for the session GUIDs in the sessionGuids DStream on which we're calling that function.
But I'm observing something different; the call to countByValueAndWindow is returning counts for session GUIDs that are not in sessionGUIDs. It appears to be returning counts for session GUIDs that were processed in previous batches. Am I just misunderstanding how this function works? I haven't been able to find anything in the way of useful documentation online.
A colleague of mine who is much more versed in the ways of Spark than I am helped me with this. Apparently I was misunderstanding how the countByValueAndWindow function works. I thought it would only return counts for values in the current batch of the DStream on which the function is called. In fact, it returns counts for all values across the entire window. To address my issue, I simply perform a join between my input DStream and the DStream resulting from the countByValueAndWindow operation. That way I only end up with results for values in my input DStream.
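The difference between the window-wide counts and the joined result can be seen in a plain Python model. This is not Spark code: batches are modelled as lists of session GUIDs, the window is measured in batches instead of seconds, and `windowed_counts` is a name made up for this sketch.

```python
from collections import Counter, deque


def windowed_counts(batches, window):
    """For each batch, yield (counts over the window, counts joined to the batch)."""
    recent = deque(maxlen=window)  # the last `window` batches
    for batch in batches:
        recent.append(batch)
        # Window-wide counts include GUIDs that arrived in earlier batches.
        counts = Counter(guid for b in recent for guid in b)
        # Joining on the current batch keeps only the GUIDs seen just now.
        joined = {guid: counts[guid] for guid in set(batch)}
        yield dict(counts), joined
```

With a window of two batches and the input `[["a", "a", "b"], ["b"], ["c"]]`, the second step still reports a count for "a" in the window-wide result, while the joined result contains only "b".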
I have users connecting to a Node.js server, and when they join, I add them into a Lobby (essentially a queue). Any time there are 2 users in the lobby, I want them to pair off and be removed from the lobby. So essentially, it's just a simple queue.
I started off by trying to implement this with a Lobby.run method, which has an infinite loop (started within a process.nextTick call), and any time there are at least two entries in the queue, I remove them from the queue. However, I found that this was eating all my memory and that infinite loops like this are generally ill-advised.
I'm now assuming that emitting events via EventEmitter is the way to go. However, my concern is with synchronization. Let's assume my Lobby is pretty simple:
Lobby = {
    users: []
  , join: function (user) {
        this.users.push(user);
        emitter.emit('lobby.join', user);
    }
  , leave: function (user) {
        var index = this.users.indexOf(user);
        if (index === -1) return; // user is not in the lobby
        this.users.splice(index, 1);
        emitter.emit('lobby.leave', user);
    }
};
Now essentially I assume I want to watch for users joining the lobby and pair them up, maybe something like this:
Lobby = {
    ...
  , run: function () {
        emitter.on('lobby.join', function (user) {
            // TODO: determine if this.users contains other users,
            // pair them off, and remove them from the array
        });
    }
}
As I mentioned, this does not account for synchronization. Multiple users can join the lobby at the same time, and so the event listener might pair up a single user with multiple other users instead of just one.
Can someone with more Node.js experience tell me if I am right to be concerned with this event-based approach? Any insight for improvement on this approach would be much appreciated.
You don't need to worry about this. Node.js runs your JavaScript on a single thread, so no two pieces of your code execute at the same time. Once a block of code starts running, no other code (including event handlers) can run until that block finishes. In particular, if you define this empty loop in your app:
while(true) { }
then your server hangs: no other code will ever fire and no other request will ever be handled. So be careful with blocks of code, and make sure each one eventually ends.
Back to the question: in your case it is impossible for multiple users to be paired with the same user, provided the handler pairs users and removes them from the array synchronously, because nothing else can run in between.
On the other hand this only applies to one instance of Node.JS. If you want to scale it to many machines, then obviously you will have to implement some locking mechanism (which ensures that no other process can work with the data at the same time).
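For a single instance, the pairing handler can stay very small. The sketch below models it in Python rather than Node.js, with a hypothetical `Lobby` class and `on_pair` callback. It relies on the same property described above: `join` runs to completion before anything else touches the list, so each user is removed from the queue in the same step in which they are paired.

```python
class Lobby:
    """Minimal pairing queue; on_pair is called once for every pair formed."""

    def __init__(self, on_pair):
        self.users = []
        self.on_pair = on_pair

    def join(self, user):
        self.users.append(user)
        # Pair and remove in one uninterrupted step, so nobody is paired twice.
        while len(self.users) >= 2:
            first = self.users.pop(0)
            second = self.users.pop(0)
            self.on_pair(first, second)

    def leave(self, user):
        # Ignore users who are no longer waiting (for example, already paired).
        if user in self.users:
            self.users.remove(user)
```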