Here is an example of what I want my data structure to look like:
[games]:
    [game_1]:
        players: 10
        maxPlayers: 24
        state: "PLAYING"
        currentMap: "Example Map"
    [game_2]:
        players: 0
        maxPlayers: 24
        state: "LOBBY"
        currentMap: "None"
I am using the Java implementation of Redis (Jedis) to cache data from registered game servers. A proxy server connects all the registered game servers together; however, the proxy cannot relay data such as this. Therefore, I have gone with the Redis approach and integrated it into the core plugin which is shared across all of the game servers. The lobby server will be able to access the data from Redis to display live game statistics to the players. How would I go about structuring this? I'm fairly new to Redis and hours of searching did not help. Please do explain this in simple terms.
I would like to be able to make method calls to get the data for a specific game and iterate through all cached games in the Redis database. (e.g. games: {"game_1", "game_2"})
One way to go about this would be:
1. Store your games in Hashes - a Hash per game. The Hash's key name should be your game's identifier, and the fields/values in it should correspond to your game's properties. In Redis lingo that would be:
HMSET game_1 players 10 maxPlayers 24 state PLAYING currentMap "Example Map"
2. To update a game, simply call HSET game_1 players 11 to touch a specific property.
3. You can read a specific game's property, several properties, or all of its properties with HGET, HMGET and HGETALL respectively.
4. Getting all the games can be done in one of two ways:
a. Use SCAN to retrieve the key names in your database and call the APIs from step 3 on each. Pro: no additional maintenance; con: could be slower.
b. Maintain a Set that consists of your game identifiers (key names) - to fetch all games, get the contents of that Set and, for each of its members, use the APIs from step 3. Pro: better performance for fetching all games; con: requires maintaining another data structure (the Set).
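A minimal Jedis sketch of that layout (the class name, method names and the "games" index key are just illustrative choices, not anything prescribed by Redis or Jedis):

import redis.clients.jedis.Jedis;

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class GameCache {

    // Name of the Set that indexes all game keys (an arbitrary choice)
    private static final String GAMES_INDEX = "games";

    private final Jedis jedis;

    public GameCache(Jedis jedis) {
        this.jedis = jedis;
    }

    // Store (or overwrite) a game as a Hash and register its key in the index Set
    public void saveGame(String gameId, int players, int maxPlayers, String state, String currentMap) {
        Map<String, String> fields = new HashMap<>();
        fields.put("players", String.valueOf(players));
        fields.put("maxPlayers", String.valueOf(maxPlayers));
        fields.put("state", state);
        fields.put("currentMap", currentMap);
        jedis.hmset(gameId, fields);       // HMSET game_1 players 10 maxPlayers 24 ...
        jedis.sadd(GAMES_INDEX, gameId);   // keep the index Set in sync
    }

    // Touch a single property, e.g. updateField("game_1", "players", "11")
    public void updateField(String gameId, String field, String value) {
        jedis.hset(gameId, field, value);  // HSET game_1 players 11
    }

    // Read all properties of one game
    public Map<String, String> getGame(String gameId) {
        return jedis.hgetAll(gameId);      // HGETALL game_1
    }

    // Iterate over every cached game via the index Set
    public Map<String, Map<String, String>> getAllGames() {
        Map<String, Map<String, String>> result = new HashMap<>();
        Set<String> gameIds = jedis.smembers(GAMES_INDEX);   // SMEMBERS games
        for (String gameId : gameIds) {
            result.put(gameId, getGame(gameId));
        }
        return result;
    }
}

When a game is removed, delete its Hash (DEL) and also SREM its identifier from the games Set, so the index does not drift out of sync.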
I was wondering if there is any way of attaching state to individual rooms - can that be done?
Say I create a room room_name:
socket.join("room_name")
How can I assign an object, array, variable(s) to that specific room? I want to do something like:
io.adapter.rooms["room_name"].max = maxPeople
Where I give the room room_name a state variable max, and max is assigned the value of the variable maxPeople, which is, say, of type int. max stores the maximum number of people allowed to join that specific room; other rooms can be assigned different max values.
Well, there is an object (internal to socket.io) that represents a room. It is stored in the adapter. You can see the basic definition of the object here in the source. So, if you reached into the adapter and got the room object for a given name, you could add data to it, and it would stay there as long as the room didn't get removed.
But, that's a bit dangerous for a couple reasons:
If there's only one connection in the room and that user disconnects and reconnects (like say because they navigate to a new page), the room object may get destroyed and then recreated and your data would be lost.
Direct access to the Room object and the list of rooms is not in the public socket.io interface (as far as I know). The adapter is replaceable, and when you're doing things like using the Redis adapter with clustering, it may work differently (in fact it probably does, because the list of rooms is centralized in a Redis database). The non-public interface is also subject to change in future versions of socket.io (and socket.io has been known to rearrange some internals from time to time).
So, if this were my code, I'd just create my own data structure to keep room specific info.
When you add someone to a room, you make sure your own room object exists and is initialized properly with the desired data. It would probably work well to use a Map object with room name as key and your own Room object as value. When you remove someone from a room, you can clean up your own data structure if the room is empty.
You could even make your own room object be the central API you use for joining or leaving a room; it would then maintain its own data structure and also call socket.io to do the join or leave there. This would centralize the maintenance of your own data structure when anyone joins or leaves a room. It would also allow you to pre-create your room objects with their own properties before there are any users in them (if you needed/wanted to do that), which is something socket.io will not do.
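To illustrate the shape of such a registry (sketched in Java only to keep the examples in this write-up in one language - in a Node/socket.io app you would write the same structure in JavaScript, with join/leave also calling socket.join/socket.leave; every name below is made up):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-room state kept outside socket.io
class RoomInfo {
    int max;                                     // e.g. maximum people allowed in the room
    final Set<String> members = new HashSet<>(); // socket/user ids currently in the room
}

public class RoomRegistry {

    private final Map<String, RoomInfo> rooms = new HashMap<>();

    // Pre-create a room with its own properties, even before anyone joins
    public RoomInfo createRoom(String roomName, int max) {
        RoomInfo info = rooms.computeIfAbsent(roomName, name -> new RoomInfo());
        info.max = max;
        return info;
    }

    // Called alongside the socket.io join; returns false if the room is full
    public boolean join(String roomName, String memberId) {
        RoomInfo info = rooms.computeIfAbsent(roomName, name -> new RoomInfo());
        if (info.max > 0 && info.members.size() >= info.max) {
            return false;
        }
        info.members.add(memberId);
        return true;
    }

    // Called alongside the socket.io leave; clean up when the room is empty
    public void leave(String roomName, String memberId) {
        RoomInfo info = rooms.get(roomName);
        if (info == null) return;
        info.members.remove(memberId);
        if (info.members.isEmpty()) {
            rooms.remove(roomName);
        }
    }
}

The registry is the single place that knows about per-room state such as max, so nothing is ever stored on socket.io's internal Room objects.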
Two General Problems - EventStore and persistence layer?
I would like to understand how the industry actually deals with these problems!
Say microservice 1 persists object X into database A. At the same time, for microservice 2 to feed on the data from microservice 1, microservice 1 writes the same object X to an event store B.
Now, the question I have is, where do I write object X first?
1. Database A first and then event store B: is it fair to roll back the operation at the app level if database A is down? Also, what should the error handling look like if database A is online and has persisted object X, but event store B is down?
2. What should the error handling look like if we go the other way around from point 1?
I do understand that in today's world of distributed, highly available systems, a system going down is unlikely. But it can happen. I want to understand what needs to be done when either the database or the event store system/cluster is down.
In general you want to avoid relying on a two-phase commit of the kind you describe.
In general (presuming an event-sourced system; not sure if that's implicit in your question or an option for you - perhaps SqlStreamStore might be relevant in your context?), this is typically managed by having consumers project from a single authoritative set of events on a pull basis - each downstream that needs to act on events being written maintains a pointer to how far it has got in projecting events from the base stream, and restarts from there if interrupted.
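A minimal sketch of that checkpointing idea (every interface and type below is invented for illustration; no particular event store client is assumed):

import java.util.List;

// Hypothetical abstractions - replace with your event store / database clients
interface EventStream {
    List<Event> readFrom(long position, int batchSize); // events with position >= position
}

interface CheckpointStore {
    long load();             // last successfully projected position (0 if none)
    void save(long position);
}

record Event(long position, String type, String payload) {}

public class Projector {

    private final EventStream stream;
    private final CheckpointStore checkpoints;

    public Projector(EventStream stream, CheckpointStore checkpoints) {
        this.stream = stream;
        this.checkpoints = checkpoints;
    }

    // Pull-based projection loop: apply each event, then advance the checkpoint.
    // If the process dies, it restarts from the last saved position.
    public void runOnce() {
        long position = checkpoints.load();
        List<Event> batch = stream.readFrom(position, 100);
        for (Event event : batch) {
            apply(event);                          // e.g. update a read model / database A
            checkpoints.save(event.position() + 1);
        }
    }

    private void apply(Event event) {
        System.out.println("projecting " + event.type() + " at " + event.position());
    }
}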
First of all, an Event store is a type of persistence which stores the application's state as a series of events, as opposed to a flat persistence that stores only the last projected state.
Say microservice 1 persists object X into database A. At the same time, for microservice 2 to feed on the data from microservice 1, microservice 1 writes the same object X to an event store B.
You are trying to have two sources of truth that must be kept in sync by some sort of distributed transaction which is not very scalable.
This is an unusual way of using an Event store. In general, an Event store is the canonical source of information, the single source of truth. You are trying to use it as a communication channel. The Event store is the persistence of an event-sourced Aggregate (see Domain-Driven Design).
I see two options:
you could refactor your architecture and make object X an event-sourced entity whose persistence is the Event store. Then have a Read-model subscribe to the Event store and build a flat representation of object X that is persisted in database A. In other words, write first to the Event store and then to database A (but in an eventually consistent manner!). This is a big jump, and you should really think about whether you want to go event-sourced.
you could use CQRS without Event sourcing. This means that after every modification, object X emits one or more Domain events that are persisted in database A in the same local transaction as object X itself. Microservice 2 could then subscribe to database A to get the emitted events; the actual subscription mechanism depends on the type of database.
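A rough JDBC sketch of the second option (CQRS without Event sourcing): the connection string, table and column names are all invented, and the only point is that the entity row and the domain event it emits are written in the same local transaction:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ObjectXRepository {

    // Persist the entity and the domain event it emitted in ONE local transaction,
    // so database A never holds one without the other.
    public void save(String objectId, String state, String eventType, String eventPayload) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/dbA", "user", "secret")) {
            conn.setAutoCommit(false);
            try {
                try (PreparedStatement updateEntity = conn.prepareStatement(
                        "UPDATE object_x SET state = ? WHERE id = ?")) {
                    updateEntity.setString(1, state);
                    updateEntity.setString(2, objectId);
                    updateEntity.executeUpdate();
                }
                try (PreparedStatement insertEvent = conn.prepareStatement(
                        "INSERT INTO domain_events (aggregate_id, type, payload) VALUES (?, ?, ?)")) {
                    insertEvent.setString(1, objectId);
                    insertEvent.setString(2, eventType);
                    insertEvent.setString(3, eventPayload);
                    insertEvent.executeUpdate();
                }
                conn.commit();   // both rows or neither
            } catch (SQLException e) {
                conn.rollback(); // nothing is half-written if either statement fails
                throw e;
            }
        }
    }
}

Microservice 2 (or a relay process) can then poll the domain_events table and dispatch new rows - essentially the transactional outbox pattern.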
I have a feeling you are using the event store as a communication channel instead of as a database. If you want microservice 2 to feed on the data from microservice 1, then you should communicate via REST services.
Of course, relying on REST services might make you less resilient to outages. In that case, using a piece of technology dedicated to communication would be the right way to go. (I'm thinking MQ/Topics, such as RabbitMQ, Kafka, etc.)
Then, once your services are talking to each other, you will still need to persist your data... but only at one single location.
Therefore, you will need to define where you want to store the data.
Ask yourself:
Who will have the governance of the data persistence?
Is it Microservice1? If so, then every time Microservice2 needs to read the data, it will make a REST call to Microservice1.
Is it the other way around? Microservice2 has the governance of the data, and Microservice1 consumes it?
It could be a third microservice that you haven't even created yet. It depends on how you applied your separation of concerns.
Let's take an example :
Microservice1's responsibility is to process our data and export it to PDF and other formats
Microservice2's responsibility is to expose a service for a legacy partner, that requires our data to be returned in a very proprietary representation.
Who is going to store the data here?
Microservice1 should not be the one to persist the data: its job is only to convert the data to other formats. If it requires some data, it will fetch it from the service that has governance of the data.
Microservice2 should not be the one to persist the data. After all, maybe we have a number of other Microservices similar to this one, but for other partners, with different proprietary formats.
If there is a service where you can do CRUD operations, this is your guy. If you don't have such a service, maybe you can find an existing microservice that wouldn't have conflicting responsibilities.
For instance: if I have a Microservice3 that makes sure that every time my ObjectX is changed, it sends a PDF representation of it to some address and notifies all my partners that the data is out of date. In that scenario, this microservice looks like a good candidate to become the "governor of the data" for this part of the domain, and to be the one-stop shop for writing/reading in the database.
I want to plan a solution that manages enriched data in my architecture.
To be clearer, I have dozens of microservices.
Let's say: Country, Building, Floor, Worker.
Each runs over a separate NoSQL data store.
When I get the data from the Worker service, I also want to present the floor name (the floor the worker is working on), the building name and the country name.
Solution 1.
The client will query all the microservices.
Problem - multiple requests, and the client has to be aware of the structure.
I know multiple requests shouldn't bother me, but I believe that returning a JSON describing the entity in one single call is better.
Solution 2.
Create an orchestration that retrieves the data from multiple services.
Problem - if the data (entity names, for example) is not stored in the same document in the DB it is very hard to sort and filter by these fields.
Solution 3.
Before saving the entity, e.g. a worker, call all the other services and fill in the related data (building name, country name).
Problem - when the building name is changed, the change is not reflected in the worker service.
Solution 4.
(This is the best one I can come up with).
Create a process that subscribes to a broker and receives all entity changes.
For each changed entity it updates all the relevant entities.
When an entity changes, let's say building name changes, it updates all the documents that hold the building name.
Problem:
Each service has to know what can be updated.
When a trailing update happens, it shouldn't publish to the broker again (a recursive update), so this adds complexity to the microservices.
Solution 5.
Keeping everything normalized. Filter and sort in Elasticsearch.
Problem: keeping normalized data in ES is too expensive performance-wise
One thing I saw Netflix do (which I like) is create intermediary services for stuff like this. So maybe a new intermediary service could call the other services to gather all the data and then create the unified output with the Country, Building, Floor and Worker.
You can even go one step further and try to come up with a scheme for providing as input which resources you want to include in the output.
So I guess this closely matches your Solution 2. I notice that you mention for Solution 2 that there are concerns with sorting/filtering in the DBs. I think that if you are using NoSQL then it has to be for a reason, and more often than not the reason is performance. If this was done wrong then, yes, you will have problems, but if all the appropriate searchable fields are properly keyed and indexed (as @Roman Susi mentioned in his bullet points 1 and 2), then I don't see this being a problem. This service will only be as fast as the culmination of your other services and data stores, so they have to be fast.
Now you keep your individual microservices as they are, keep the client calling one service, and encapsulate the complexity of merging the data into this new service.
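A sketch of such an intermediary service (all service URLs and field names below are assumptions; it presumes Java 11+ HttpClient and Jackson for JSON handling):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WorkerAggregator {

    private final HttpClient http = HttpClient.newHttpClient();
    private final ObjectMapper mapper = new ObjectMapper();

    // Fetch a worker and enrich it with floor, building and country data
    // gathered from the other microservices, returning one unified JSON document.
    public JsonNode enrichedWorker(String workerId) throws Exception {
        ObjectNode worker = (ObjectNode) get("http://worker-service/workers/" + workerId);

        JsonNode floor = get("http://floor-service/floors/" + worker.get("floorId").asText());
        JsonNode building = get("http://building-service/buildings/" + floor.get("buildingId").asText());
        JsonNode country = get("http://country-service/countries/" + building.get("countryId").asText());

        // Replace the bare ids with the objects the client actually wants to display
        worker.set("floor", floor);
        worker.set("building", building);
        worker.set("country", country);
        return worker;
    }

    private JsonNode get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return mapper.readTree(response.body());
    }
}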
This is the video that I saw this in (https://www.youtube.com/watch?v=StCrm572aEs)... it's a long video but very informative.
It is hard to advise at the Solution N level, but certain problems can be avoided by following these points:
Use globally unique identifiers for entities, for example by using some kind of URI as key values.
The global ids also simplify updates, because you can track what has actually changed - the name or the entity itself (an entity has a one-to-one relation with its global URI).
The CAP theorem says you can choose only two of consistency, availability and partition tolerance. Do you want a CA architecture? Or CP? Or maybe AP? This will strongly affect the way you distribute data.
For "sort and filter" there is MapReduce approach, which can distribute the load of figuring out those things.
Think carefully about the balance of normalization/denormalization. If your services operate on URIs, you can have a service which turns URIs into labels (names, descriptions, etc.), but you do not need to keep the redundant information everywhere and update it. Do not optimize prematurely; try to keep the data normalized as long as possible. This way, the worker may not even need the building name, only its global id, and the microservice looks up the metadata from another microservice.
In other words, minimize the number of keys, shared between services, as part of separation of concerns.
Focus on the underlying model, not the JSON to and from. Right modelling of the data in your system(s) gains you more than saving JSON calls.
As for NoSQL, take a look at the Riak database: it has adjustable CAP properties, IIRC. Even if you do not use it as such, reading its documentation may help you come up with a suitable architecture for your distributed microservices system. (Of course, this applies if you have an essentially parallel system.)
First of all, thanks for your question. It is similar to the Main Problem Of Document DBs: how to sort a collection by a field from another collection? I have my own answer for that, so I'll try to comment on all your solutions:
Solution 1: It is good if the client wants to work with Countries/Buildings/Floors independently. But it does not solve the problem you mentioned in Solution 2 - sorting 10k workers by building is going to be slow.
Solution 2: Similar to Solution 1, if all the client wants is a list of enriched workers without knowing how to combine it from multiple pieces.
Solution 3: As you said, unacceptable because of inconsistent data.
Solution 4: It will work, most of the time. But:
Huge data duplication. If you have 20 entities, you are going to have 20x the data.
Large complexity. 20 entities -> 20 different procedures to update related data.
High coupling. All your services must know about each other. A data model change will propagate to every service because of the update procedures.
Questionable eventual consistency. It can be done so data will be consistent after failures but it is not going to be easy
Solution 5: Kind of the answer :-)
But - you do not want everything in there. Keep separate services that serve separate entities, and build other services on top of them.
If client wants enriched data - build service that returns enriched data, as in Solution 2.
If the client wants to display a list of enriched data with filtering and sorting - build a service that provides enriched data with filtering and sorting capability! Likely, the implementation of such a service will contain an ES instance holding cached and indexed data from the lower-level services. The point here is that ES does not have to contain everything or be shared between every service - it is up to you to decide the right balance between performance and infrastructure resources.
This is a case where Linked Data can help you.
Basically, the floor attribute of the worker would be a URI (a link) to the floor itself, and any other linked data should be expressed as URIs as well.
Modeled with some JSON-LD it would look like this:
worker = {
  '@id': '/workers/87373',
  name: 'John',
  floor: {
    '@id': '/floors/123'
  }
}

floor = {
  '@id': '/floors/123',
  level: 12,
  building: { '@id': '/buildings/87' }
}

building = {
  '@id': '/buildings/87',
  name: "John's home",
  city: { '@id': '/cities/908' }
}
This way all the client has to do is prepend the BASE URL (like api.example.com) to the @id and make a simple GET call.
To remove the burden of the extra calls from the client (in case it's a slow mobile device), we use the gateway pattern with microservices. The gateway can expand those links with very little effort and augment the returned object. It can also make multiple calls in parallel.
So the gateway will make a GET /floors/123 call and replace the floor object on the worker with the reply.
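A rough sketch of that expansion step in the gateway (the base URL and the use of Java's HttpClient plus Jackson are assumptions; the idea is just to follow the @id and swap the stub for the full object):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LinkExpandingGateway {

    private static final String BASE_URL = "https://api.example.com"; // assumed base

    private final HttpClient http = HttpClient.newHttpClient();
    private final ObjectMapper mapper = new ObjectMapper();

    // Expand one linked field on a resource: follow its '@id' and replace the
    // stub ({'@id': '/floors/123'}) with the full object returned by that URI.
    public void expandLink(ObjectNode resource, String field) throws Exception {
        JsonNode stub = resource.get(field);
        if (stub == null || stub.get("@id") == null) {
            return; // nothing to expand
        }
        JsonNode full = get(BASE_URL + stub.get("@id").asText());
        resource.set(field, full);
    }

    private JsonNode get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return mapper.readTree(response.body());
    }
}

For the example above, the gateway would call expandLink(worker, "floor"), then expand the floor's building link, and so on; independent links can be expanded in parallel.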
I am really new to Redis and have been using it with my Ruby on Rails (Rails 2.3, Ruby 1.8.7) application via the redis gem, as a key-value store for simple tagging functionality. I recently realized that I could use it to maintain a user activity feed as well.
The thing is, I need the tagging data (stored as key => Sets) in memory - it's extremely important for determining the results of tagging-related operations - whereas for the activity feed the data could be deleted on a first-in, first-out basis, assuming I store X activities for every user.
Is it possible to namespace the Redis data sets and have one remain permanently in memory while the other stays in memory only temporarily? What is the general approach when one uses unrelated data sets that need different durations of survival in memory?
Would really appreciate any help on this.
You do not need to define a specific namespace for this. With Redis, you can use the EXPIRE command to set a timeout on a key by key basis.
The general policy regarding key expiration is defined in the configuration file:
# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached? You can select among five behavior:
#
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key accordingly to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys-random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
#
For your purpose, the volatile-lru policy should be set.
You just have to call EXPIRE on the keys you want to be volatile, and let Redis evict them. However please note it is difficult to guarantee that the oldest keys will be evicted first once the timeout has been triggered. More explanations here.
For your specific use case however, I would not use key expiration but rather try to simulate capped collections. If the activity feed for a given user is represented as a list of objects, it is easy to LPUSH the activity objects, and use LTRIM to limit the size of the list. You get FIFO behavior and keep memory consumption under control for free.
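In Jedis, a minimal sketch of that capped-list idea (the key prefix and the cap of 100 are just examples):

import redis.clients.jedis.Jedis;

import java.util.List;

public class ActivityFeed {

    private static final int MAX_ACTIVITIES = 100; // the X most recent activities per user

    private final Jedis jedis;

    public ActivityFeed(Jedis jedis) {
        this.jedis = jedis;
    }

    // Push the newest activity to the head of the list, then trim the tail:
    // the list never grows beyond MAX_ACTIVITIES entries (FIFO eviction for free).
    public void addActivity(String userId, String activityJson) {
        String key = "activities:" + userId;
        jedis.lpush(key, activityJson);             // LPUSH activities:42 "{...}"
        jedis.ltrim(key, 0, MAX_ACTIVITIES - 1);    // LTRIM activities:42 0 99
    }

    // Read the most recent activities, newest first
    public List<String> recentActivities(String userId, int count) {
        return jedis.lrange("activities:" + userId, 0, count - 1);  // LRANGE
    }
}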
UPDATE:
Now, if you really need to isolate data, you have two main possibilities with Redis:
using two distinct databases. Redis databases are identified by an integer, and you can have several of them per instance. Use the SELECT command to switch between databases. Databases can be used to isolate data, but not to assign them different properties (like an expiration policy, for instance).
using two distinct instances. An empty Redis instance is a very light process, so several of them can be started without any problem. It is actually the best and most scalable way to isolate data with Redis. Each instance can have its own policies (including its eviction policy). The clients should open as many connections as there are instances.
But again, you do not need to isolate data to implement your eviction policy requirements.
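If you do decide you want the isolation anyway, here is a quick Jedis illustration of both options (the ports, database indexes and keys are arbitrary):

import redis.clients.jedis.Jedis;

public class IsolationExamples {
    public static void main(String[] args) {
        // Option 1: two logical databases inside one Redis instance
        Jedis jedis = new Jedis("localhost", 6379);
        jedis.select(0);                              // SELECT 0 - e.g. the tagging data
        jedis.sadd("tag:ruby", "post:1");
        jedis.select(1);                              // SELECT 1 - e.g. the activity feeds
        jedis.lpush("activities:42", "logged in");
        jedis.close();

        // Option 2: two separate Redis instances, each with its own eviction policy
        Jedis tagging = new Jedis("localhost", 6379); // e.g. maxmemory-policy noeviction
        Jedis feeds = new Jedis("localhost", 6380);   // e.g. maxmemory-policy volatile-lru
        tagging.close();
        feeds.close();
    }
}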
I'm making this game where I'm trying to "pair people". So I have this database where I add people when they want to join a game. And when two people want to join a game, I redirect them to the game.
But I wanted to do this with Ajax (which I'm new to), so that it continually checks the database to see whether a new person has joined. I thought this would be a good method:
new Ajax.PeriodicalUpdater('products', '/some_url',
{
    method: 'get',
    insertion: Insertion.Top,
    frequency: 1,
    decay: 2
});
But then he reminded me that this would open and close the database connection in vain a great many times. Is there a better solution?
Correct, using the database is going to increase its load. Depending on how much load you have, it might take more than 1 second to return, in which case the application will eat itself.
Another option would be to keep something in memory on the server side. In PHP there is no 'app server' concept, so DB is a good place, or something like memcache.
Also, you need to think about transactionality. What if 3 people end up in the game?
If people can register for a game only while the application is active, why save this information to the database at all?
Serving information from a URL doesn't require loading it from the database; it could come from a repository of a different type (e.g. memory).
If you have only one application server, it is fine to store the people list in a shared memory object; for an application farm, different rules may apply - depending on your configuration and the session storage used.