Redis clients can update the cache simultaneously, causing wrong state to be saved - caching

I have been building a simple application that uses Redis as a cache to store data for a game in which each user has a score, and after a user completes a task that user's score is updated.
My problem is that when a user completes a task, his score is updated by replacing the previous value in Redis with the new one (in my case the entire room object is replaced with a new one, even though only the score of a player inside the room has changed).
The thing is, if multiple users complete a task at the same time, they will each send their new record to Redis at the same time, and only the last write will be kept.
For example:
In the redis cache this is the starting value: { roomId: "...", score:[{ "player1": 0 }, { "player2": 0 }] }
Player 1 completes a task and sends:
{ roomId: "...", score:[{ "player1": 1 }, { "player2": 0 }] }
At the same time Player 2 completes a task and sends:
{ roomId: "...", score:[{ "player1": 0 }, { "player2": 1 }] }
In the Redis cache, say the value received from Player 1 is saved first and then the value from Player 2, which means the new value in the cache will be:
{ roomId: "...", score:[{ "player1": 0 }, { "player2": 1 }] }
This is wrong, because the correct value would be: { roomId: "...", score:[{ "player1": 1 }, { "player2": 1 }] }, where both changes are present.
At the moment I am also using a pub/sub system to keep track of changes, so that they are reflected to every server and to each user connected to those servers.
What can I do to fix this? For reference consider the following image as the architecture of the system:

The issue appears to be that you are interleaving one read/write set of operations with others, which leads to using stale data while updating keys. Fortunately, the fix is (relatively) easy: just combine your read/write chunk of operations into a single atomic unit, using either a Lua script, a transaction or, even easier, a single RedisJSON command.
Here is an example using RedisJSON. First, prepare the JSON key/document that will hold all the scores for the room, using the JSON.SET command:
> JSON.SET room:foo $ '{ "roomId": "foo", "score": [] }'
OK
After that, use the JSON.ARRAPPEND command whenever you need to append an item to the score array:
> JSON.ARRAPPEND room:foo $.score '{ "player1": 123 }'
1
...
> JSON.ARRAPPEND room:foo $.score '{ "player2": 456 }'
2
Getting back the whole JSON document is as easy as running:
> JSON.GET room:foo
"{\"roomId\":\"foo\",\"score\":[{\"player1\":123},{\"player2\":456}]}"

Related

Azure Data Factory Until loop slow to end/terminate

We have an Until loop in an ADFv2 pipeline.
The time it takes to stop/terminate once the expression condition is met seems to correlate with the length of time the Until loop takes to complete its activities.
This particular Until loop performs a lot of activities and can take anywhere between 90 and 120 minutes to complete, so it takes almost as long to end/terminate (break out of the loop).
If I "hack" it so that it only performs a handful of activities it will quickly end and break once it's finished and the expression to terminate is met.
It's like a spinning wheel that keeps spinning even after the power is turned off. The momentum that was built up while connected takes a while to slow down and eventually stop.
Is this a known issue? How can I troubleshoot the exact cause here, or fix it?
Incorrect wiring of the activities nested inside the Until loop could cause this.
Here is an Until component with some activities after it:
[Image: an Until activity followed by other activities]
Inside the Until there are some activities.
Correct:
[Image: activity arrangement inside the Until - correct]
Incorrect (slow to end/terminate):
[Image: activity arrangement inside the Until - incorrect]
Why?
In the incorrect case, the last activity, "If waiting", depends on three activities, which is probably why its behavior was counterintuitive.
// pay attention to "dependsOn"
{
    "name": "If waiting",
    "type": "IfCondition",
    "dependsOn": [
        {
            "activity": "Set loop_waiting_refresh_status to True",
            "dependencyConditions": [
                "Succeeded"
            ]
        },
        {
            "activity": "WeChatEP_Notifier Info Get new bearer",
            "dependencyConditions": [
                "Succeeded"
            ]
        },
        {
            "activity": "Set loop_waiting_refresh_status to False",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    ...
}
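For illustration only, a corrected wiring would make the final "If waiting" activity depend on just the single activity it actually follows. A rough, abridged sketch (not taken from the original pipeline):
{
    "name": "If waiting",
    "type": "IfCondition",
    "dependsOn": [
        {
            "activity": "Set loop_waiting_refresh_status to False",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    ...
}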

Finding out on which data path shard is located in Elasticsearch

I have multiple path.datas configured for my Elasticsearch cluster.
The official documentation states that only a single path is used for a single shard, so a shard is never split across multiple paths.
I'd like to find a way to determine which path on which node is used for a specific shard (primary or replica), e.g. index my-index, primary shard 0 → node RQzJvAgLTDOnEnmIjYU9FA, path /mnt/data1. I tried /_nodes, /_stats, /_segments and /_shard_stores, but there are no references to paths.
You can find that info using the indices stats API by specifying the level=shards parameter
GET index/_stats?level=shards
will return a structure like this
"indices": {
"listings-master": {
"primaries": {
...
},
"total": {
...
},
"shards": {
"0": [
{
"shard_path": {
"state_path": "/app/data/nodes/0",
"data_path": "/app/data/nodes/0",
"is_custom_data_path": false
},
...
}
...
Not easily, but with a small Python script over the saved stats output I got the info I want. Here is the script:
import json

with open('shard.json') as json_file:
    data = json.load(json_file)

print(data.keys())
data = data['indices']
for indice in data:
    # print(indice)
    d1 = data[indice]
    shards = d1['shards']
    # print(shards, type(shards), shards.keys())
    for nshard in shards.keys():
        shard = shards[nshard]
        # print(shard, type(shard))
        for elt in shard:
            path = elt['shard_path']['data_path']
            node = elt['routing']['node']
            # print(repr(elt['shard_path']['data_path']))
            # print("=========================")
            print(indice, '\t', nshard, '\t', node, '\t', path)
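Here shard.json is assumed to be the stats response saved to a file beforehand, for example (host and port are placeholders):
curl -s 'http://localhost:9200/_stats?level=shards' > shard.json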
Then you obtain output like:
log-2020.11.06 1 oxx /datassd/elasticsearch/nodes/0
log-2020.11.06 0 oxx /datassd/elasticsearch/nodes/0
log-2020.11.05 1 oxx /datassd/elasticsearch/nodes/0

Aws lambda code explanation

Can anybody please explain how the below code works?
"def lambda_handlerOut(event, context):
if len(event) > 0:
success=1
print("length of event outside for--"+str(len(event)))
for record in event['Records']:
print("length of event--"+str(len(event)))
bucket=record['s3']['bucket']['name']
key=record['s3']['object']['key']
print("Bucket--"+bucket)
print("File that triggered this event--"+key)
Thanks in advance.
Regards,
Eleena Jose
This is a Lambda that receives S3 events - for example a PutObject request that creates a new file.
The method is the standard Python function - take a look at the Lambda Function Handler Docs for more details.
The structure of the event is defined here, but basically there are some number of Records that are iterated through and, for each record, the bucket and key are extracted and printed.
So, in more detail (comments above the line they reference):
# standard lambda event handler definition
def lambda_handlerOut(event, context):
    # make sure that something was given - likely unneeded
    if len(event) > 0:
        success=1
        print("length of event outside for--"+str(len(event)))
        # loop through each record in Records
        for record in event['Records']:
            print("length of event--"+str(len(event)))
            # take a look at the event structure - just extracting parts
            bucket=record['s3']['bucket']['name']
            # key is the object name - that is, the file
            key=record['s3']['object']['key']
            print("Bucket--"+bucket)
            print("File that triggered this event--"+key)
EDIT
As I linked to above, the data in the event object looks something like:
{
    "Records":[
        {
            "eventVersion":"2.0",
            "eventSource":"aws:s3",
            "awsRegion":"us-east-1",
            "eventTime":"1970-01-01T00:00:00.000Z",
            "eventName":"ObjectCreated:Put",
            "userIdentity":{
                "principalId":"AIDAJDPLRKLG7UEXAMPLE"
            },
            "requestParameters":{
                "sourceIPAddress":"127.0.0.1"
            },
            "responseElements":{
                "x-amz-request-id":"C3D13FE58DE4C810",
                "x-amz-id-2":"FMyUVURIY8/IgAtTv8xRjskZQpcIZ9KG4V5Wp6S7S/JRWeUWerMUE5JgHvANOjpD"
            },
            "s3":{
                "s3SchemaVersion":"1.0",
                "configurationId":"testConfigRule",
                "bucket":{
                    "name":"mybucket",
                    "ownerIdentity":{
                        "principalId":"A3NL1KOZZKExample"
                    },
                    "arn":"arn:aws:s3:::mybucket"
                },
                "object":{
                    "key":"HappyFace.jpg",
                    "size":1024,
                    "eTag":"d41d8cd98f00b204e9800998ecf8427e",
                    "versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko",
                    "sequencer":"0055AED6DCD90281E5"
                }
            }
        }
    ]
}
So, as an example, bucket=record['s3']['bucket']['name'] starts by getting the s3 record from the data which leaves:
"s3":{
"s3SchemaVersion":"1.0",
"configurationId":"testConfigRule",
"bucket":{
"name":"mybucket",
"ownerIdentity":{
"principalId":"A3NL1KOZZKExample"
},
"arn":"arn:aws:s3:::mybucket"
},
"object":{
"key":"HappyFace.jpg",
"size":1024,
"eTag":"d41d8cd98f00b204e9800998ecf8427e",
"versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko",
"sequencer":"0055AED6DCD90281E5"
}
}
From there, it gets the bucket stanza:
"bucket":{
"name":"mybucket",
"ownerIdentity":{
"principalId":"A3NL1KOZZKExample"
},
"arn":"arn:aws:s3:::mybucket"
}
and lastly, the name:
"name":"mybucket"
This is assigned to the variable bucket which is printed out later. The key (which is the file name in this example) works the same way but gets different parts of the event.
Does that make sense now?

Append an array to a json using jq in BASH

I have a json that looks like this:
{
    "failedSet": [],
    "successfulSet": [{
        "event": {
            "arn": "arn:aws:health:us-east-1::event/AWS_RDS_MAINTENANCE_SCHEDULED_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx",
            "endTime": 1502841540.0,
            "eventTypeCategory": "scheduledChange",
            "eventTypeCode": "AWS_RDS_MAINTENANCE_SCHEDULED",
            "lastUpdatedTime": 1501208541.93,
            "region": "us-east-1",
            "service": "RDS",
            "startTime": 1502236800.0,
            "statusCode": "open"
        },
        "eventDescription": {
            "latestDescription": "We are contacting you to inform you that one or more of your Amazon RDS DB instances is scheduled to receive system upgrades during your maintenance window between August 8 5:00 PM and August 15 4:59 PM PDT. Please see the affected resource tab for a list of these resources. \r\n\r\nWhile the system upgrades are in progress, Single-AZ deployments will be unavailable for a few minutes during your maintenance window. Multi-AZ deployments will be unavailable for the amount of time it takes a failover to complete, usually about 60 seconds, also in your maintenance window. \r\n\r\nPlease ensure the maintenance windows for your affected instances are set appropriately to minimize the impact of these system upgrades. \r\n\r\nIf you have any questions or concerns, contact the AWS Support Team. The team is available on the community forums and by contacting AWS Premium Support. \r\n\r\nhttp://aws.amazon.com/support\r\n"
        }
    }]
}
I'm trying to add a new key/value under successfulSet[].event (key name affectedEntities) using jq. I've seen some examples, like here and here, but none of those answers really show how to add one key with a possibly multi-valued array (I say "possibly" because as of now AWS returns one value for the affected entity, but if there are more, I'd like to list them all).
EDIT: The value of the new key that I want to add is stored in a variable called $affected_entities and a sample of that value looks like this:
[
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
]
The value could look like this:
[
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",
...
...
...
]
You can use this jq command:
jq '.successfulSet[].event += { "new_key" : "new_value" }' file.json
EDIT:
Try this:
jq --argjson argval "$new_value" '.successfulSet[].event += { "affected_entities" : $argval }' file.json
Test:
sat~$ new_value='[
"arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
]'
sat~$ jq --argjson argval "$new_value" '.successfulSet[].event += { "affected_entities" : $argval }' file.json
Note that --argjson works with jq 1.5 and above.
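With the sample value above, the event object in the output should then carry the new array, roughly like this (abridged):
{
    "failedSet": [],
    "successfulSet": [{
        "event": {
            ...
            "statusCode": "open",
            "affected_entities": [
                "arn:aws:acm:us-east-1:xxxxxxxxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
            ]
        },
        "eventDescription": { ... }
    }]
}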

Indexing tuples from storm to elasticsearch with elasticsearch-hadoop library does not work

I want to index documents into Elasticsearch from Storm, but I couldn't get any document to be indexed into Elasticsearch.
In my topology I have a KafkaSpout that emits JSON like this { "tweetId": 1, "text": "hello" } to an EsBolt, which is a native bolt from the elasticsearch-hadoop library that writes the Storm tuples to Elasticsearch (the doc is here: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/storm.html).
These are the configs for my EsBolt:
Map conf = new HashMap();
conf.put("es.nodes","127.0.0.1");
conf.put("es.port","9200");
conf.put("es.resource","twitter/tweet");
conf.put("es.index.auto.create","no");
conf.put("es.input.json", "true");
conf.put("es.mapping.id", "tweetId");
EsBolt elasticsearchBolt = new EsBolt("twitter/tweet", conf);
The first two configurations have these values by default, but I chose to set them explicitly. I have also tried without them, getting the same result.
And this is how I build my topology:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(TWEETS_DATA_KAFKA_SPOUT_ID, kafkaSpout, kafkaSpoutParallelism)
.setNumTasks(kafkaSpoutNumberOfTasks);
builder.setBolt(ELASTICSEARCH_BOLT_ID, elasticsearchBolt, elasticsearchBoltParallelism)
.setNumTasks(elasticsearchBoltNumberOfTasks)
.shuffleGrouping(TWEETS_DATA_KAFKA_SPOUT_ID);
return builder.createTopology();
Before I run the topology locally I create the "twitter" index in Elasticsearch and a mapping "tweet" for this index.
This is what I get if I retrieve the mapping for my newly created type (curl -XGET 'http://localhost:9200/twitter/_mapping/tweet'):
{
    "twitter": {
        "mappings": {
            "tweet": {
                "properties": {
                    "text": {
                        "type": "string"
                    },
                    "tweetId": {
                        "type": "string"
                    }
                }
            }
        }
    }
}
I run the topology locally and this is what I get in my console when processing a tuple:
Processing received message FOR 6 TUPLE: source: tweets-data-kafka-spout:9, stream: default, id: {-8010897758788654352=-6240339405307942979}, [{"tweetId":"1","text":"hello"}]
Emitting: elasticsearch-bolt __ack_ack [-8010897758788654352 -6240339405307942979]
TRANSFERING tuple TASK: 2 TUPLE: source: elasticsearch-bolt:6, stream: __ack_ack, id: {}, [-8010897758788654352 -6240339405307942979]
BOLT ack TASK: 6 TIME: TUPLE: source: tweets-data-kafka-spout:9, stream: default, id: {-8010897758788654352=-6240339405307942979}, [{"tweetId":"1","text":"hello"}]
Execute done TUPLE source: tweets-data-kafka-spout:9, stream: default, id: {-8010897758788654352=-6240339405307942979}, [{"tweetId":"1","text":"hello"}] TASK: 6 DELTA:
So the tuples seem to be processed. However, I don't have any documents indexed in Elasticsearch.
I suppose I am doing something wrong when I set the configurations for EsBolt, maybe missing a configuration or something.
Documents will only be indexed once you reach the flush size, specified by es.storm.bolt.flush.entries.size.
Alternatively, you may set a tick frequency that triggers a queue flush:
config.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 5);
By default, es-hadoop flushes on tick, as per the es.storm.bolt.tick.tuple.flush parameter.
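As a sketch, both options can be set on the bolt configuration from the question (the values below are just examples):
// flush to Elasticsearch after every entry - handy while testing
conf.put("es.storm.bolt.flush.entries.size", "1");
// and/or have Storm emit tick tuples so the bolt flushes periodically
Config config = new Config();
config.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 5);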
I also had the same issue, but when I looked at the es-hadoop documentation I found it was because I had not set the frequency that triggers a queue flush. I then added a configuration (es.storm.bolt.flush.entries.size) to my Storm topology and it worked fine. But when we set a value for Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, it throws an exception: java.lang.RuntimeException: java.lang.NullPointerException in the bolt's execute function. We then used debug mode to test my topology, and I found that the input tuple in the bolt's execute doesn't contain any entries, yet this empty tuple is what gets triggered.
That's what confuses me. Shouldn't the tuple be emitted according to the configured time, even if it is empty, once Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS is set? I think this is a bug.
For more information see: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/storm.html
