Categorize Key-Value Pairs/Alternative to HashMap of HashMap - data-structures

I'm creating a generic matchmaking server. Basically, there is an enum that defines the possible game modes a player can queue into (not really a single queue, since some game modes, like ranked, sort their queue). When a player decides to join a queue, their connection ID is added to a HashMap (currently still a Vec, but I realized a HashMap is better) of IDs and their respective data. That HashMap is in turn stored in another HashMap keyed by game mode.
But when I looked into removing the user when they leave, I realized that I would have to iterate over the game modes and try to remove the user from each (at least until I find them).
This isn't too much of a problem, since it happens under the hood; this is more my curiosity at work. I just wonder if there is a data structure that uses unique key-value pairs but also provides views of the data by category, something that would look like ViewHashMap<K, V, C>. Or is there something better for this purpose, or should I organize my data differently?
Code example of what I'm doing:
if let Some(queue) = queues.get_mut(game_mode) {
    queue.insert(id, data);
}

// and, on leave: scan every game mode's queue until the player is found
for queue in queues.values_mut() {
    if queue.remove(&id).is_some() {
        break;
    }
}
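One common alternative, rather than a special container, is to keep a second HashMap as a reverse index from connection ID to game mode, so removal no longer needs to scan every mode. A minimal sketch of that idea (the type and field names here are made up for illustration):

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum GameMode { Casual, Ranked }

struct PlayerData; // stand-in for the real per-connection data

#[derive(Default)]
struct Queues {
    by_mode: HashMap<GameMode, HashMap<u64, PlayerData>>, // the per-mode "views"
    mode_of: HashMap<u64, GameMode>,                       // reverse index: id -> mode
}

impl Queues {
    fn join(&mut self, mode: GameMode, id: u64, data: PlayerData) {
        self.by_mode.entry(mode).or_default().insert(id, data);
        self.mode_of.insert(id, mode);
    }

    fn leave(&mut self, id: u64) -> Option<PlayerData> {
        let mode = self.mode_of.remove(&id)?; // O(1), no iteration over game modes
        self.by_mode.get_mut(&mode)?.remove(&id)
    }
}

The cost is that join and leave must keep both maps in sync, which is easy to guarantee if they are only ever touched through methods like these.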

Related

socket.io, how to attach state to a socket room

I was wondering if there is any way of attaching state to individual rooms. Can that be done?
Say I create a room room_name:
socket.join("room_name")
How can I assign an object, array, variable(s) to that specific room? I want to do something like:
io.adapter.rooms["room_name"].max = maxPeople
Where I give the room room_name a state variable max, and max is assigned the value of the variable maxPeople (say, an int). max stores the maximum number of people allowed to join that specific room. Other rooms could be assigned different max values.
Well, there is an object (internal to socket.io) that represents a room. It is stored in the adapter. You can see the basic definition of the object here in the source. So, if you reached into the adapter and got the room object for a given name, you could add data to it and it would stay there as long as the room didn't get removed.
But, that's a bit dangerous for a couple reasons:
If there's only one connection in the room and that user disconnects and reconnects (like say because they navigate to a new page), the room object may get destroyed and then recreated and your data would be lost.
Direct access to the Room object and the list of rooms is not part of the public socket.io interface (as far as I know). The adapter is a replaceable component, and when you're doing things like using the redis adapter with clustering it may work differently (in fact it probably does, because the list of rooms is centralized in a redis database). The non-public interface is also subject to change in future versions of socket.io (and socket.io has been known to rearrange some internals from time to time).
So, if this were my code, I'd just create my own data structure to keep room specific info.
When you add someone to a room, you make sure your own room object exists and is initialized properly with the desired data. It would probably work well to use a Map object with room name as key and your own Room object as value. When you remove someone from a room, you can clean up your own data structure if the room is empty.
You could even make your own room object the central API you use for joining or leaving a room; it would then maintain its own data structure and also call socket.io to do the join or leave. This would centralize the maintenance of your own data structure whenever anyone joins or leaves a room. It would also allow you to pre-create your room objects with their own properties before there are any users in them (if you needed/wanted to do that), which is something socket.io will not do.
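A minimal sketch of that wrapper idea, assuming socket.io's standard socket.join/socket.leave calls; the roomState map, the max option and the function names are made up for illustration:

// roomName -> { max, members: Set of socket ids }
const roomState = new Map();

function joinRoom(socket, roomName, maxPeople) {
  let state = roomState.get(roomName);
  if (!state) {
    state = { max: maxPeople, members: new Set() };
    roomState.set(roomName, state);
  }
  if (state.members.size >= state.max) {
    return false; // room is full
  }
  state.members.add(socket.id);
  socket.join(roomName);
  return true;
}

function leaveRoom(socket, roomName) {
  socket.leave(roomName);
  const state = roomState.get(roomName);
  if (state) {
    state.members.delete(socket.id);
    if (state.members.size === 0) {
      roomState.delete(roomName); // or keep it, if you pre-create rooms
    }
  }
}

Because all joins and leaves go through these two functions, the per-room state never depends on socket.io's internal adapter, and you decide when it gets cleaned up.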

Can a React-Redux app really scale as well as, say Backbone? Even with reselect. On mobile

In Redux, every change to the store triggers a notify on all connected components. This makes things very simple for the developer, but what if you have an application with N connected components, and N is very large?
Every change to the store, even if unrelated to the component, still runs a shouldComponentUpdate with a simple === test on the reselected paths of the store. That's fast, right? Sure, maybe once. But N times, for every change? This fundamental change in design makes me question the true scalability of Redux.
As a further optimization, one can batch all notify calls using _.debounce. Even so, having N === tests for every store change and handling other logic, for example view logic, seems like a means to an end.
I'm working on a health & fitness social mobile-web hybrid application with millions of users and am transitioning from Backbone to Redux. In this application, a user is presented with a swipeable interface that allows them to navigate between different stacks of views, similar to Snapchat, except each stack has infinite depth. In the most popular type of view, an endless scroller efficiently handles the loading, rendering, attaching, and detaching of feed items, like a post. For an engaged user, it is not uncommon to scroll through hundreds or thousands of posts, then enter a user's feed, then another user's feed, etc. Even with heavy optimization, the number of connected components can get very large.
Now on the other hand, Backbone's design allows every view to listen precisely to the models that affect it, reducing N to a constant.
Am I missing something, or is Redux fundamentally flawed for a large app?
This is not a problem inherent to Redux IMHO.
By the way, instead of trying to render 100k components at the same time, you should try to fake it with a lib like react-infinite or something similar, and only render the visible (or nearly visible) items of your list. Even if you succeed in rendering and updating a 100k-item list, it's still not performant and it takes a lot of memory. Here is some advice from LinkedIn.
This answer assumes that you still want to render 100k updatable items in your DOM, and that you don't want 100k listeners (store.subscribe()) to be called on every single change.
2 schools
When developing a UI app in a functional way, you basically have two choices:
Always render from the very top
It works well but involves more boilerplate. It's not exactly the suggested Redux way, but it is achievable, with some drawbacks. Notice that even if you manage to have a single Redux connection, you still have to call a lot of shouldComponentUpdate in many places. If you have an infinite stack of views (like a recursion), you will have to render all the intermediate views as virtual DOM as well, and shouldComponentUpdate will be called on many of them. So this is not really more efficient, even if you have a single connect.
If you don't plan to use the React lifecycle methods but only use pure render functions, then you should probably consider other similar options that will only focus on that job, like deku (which can be used with Redux)
In my own experience, doing so with React is not performant enough on older mobile devices (like my Nexus 4), particularly if you link text inputs to your atom state.
Connecting data to child components
This is what react-redux suggests by using connect. So when the state changes and the change only concerns a deeper child, you only render that child, and you do not have to render the top-level components every time, like the context providers (redux/intl/custom...) or the main app layout. You also avoid calling shouldComponentUpdate on other children because it's already baked into the listener. Calling a lot of very fast listeners is probably faster than rendering intermediate React components every time, and it also removes a lot of props-passing boilerplate, so for me it makes sense when used with React.
Also notice that identity comparison is very fast and you can do a lot of them easily on every change. Remember Angular's dirty checking: some people did manage to build real apps with that! And identity comparison is much faster.
Understanding your problem
I'm not sure I understand your whole problem perfectly, but I gather that you have views with something like 100k items in them, and you wonder whether you should use connect on all of those 100k items, because calling 100k listeners on every single change seems costly.
This problem seems inherent to the nature of doing functional programming with the UI: the list was updated, so you have to re-render the list, but unfortunately it is a very long list and that seems inefficient... With Backbone you could hack something to only render the child. Even if you render that child with React, you would trigger the rendering in an imperative way instead of just declaring "when the list changes, re-render it".
Solving your problem
Obviously, connecting the 100k list items seems convenient but is not performant, because it means calling 100k react-redux listeners, even if they are fast.
Now, if you connect the big list of 100k items instead of each item individually, you only call a single react-redux listener, and then have to render that list in an efficient way.
Naive solution
Iterating over the 100k items to render them, leading to 99999 items returning false in shouldComponentUpdate and a single one re-rendering:
list.map(item => this.renderItem(item))
Performant solution 1: custom connect + store enhancer
The connect method of React-Redux is just a Higher-Order Component (HOC) that injects the data into the wrapped component. To do so, it registers a store.subscribe(...) listener for every connected component.
If you want to connect 100k items of a single list, it is a critical path of your app that is worth optimizing. Instead of using the default connect you could build your own one.
Store enhancer
Expose an additional method store.subscribeItem(itemId,listener)
Wrap dispatch so that whenever an action related to an item is dispatched, you call the registered listener(s) of that item.
A good source of inspiration for this implementation can be redux-batched-subscribe.
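A rough sketch of such an enhancer; the subscribeItem name and the convention that item-related actions carry action.meta.itemId are assumptions for illustration, not part of Redux:

// Hypothetical store enhancer adding per-item subscriptions
function itemSubscribeEnhancer(createStore) {
  return (reducer, preloadedState) => {
    const store = createStore(reducer, preloadedState);
    const itemListeners = new Map(); // itemId -> Set of listeners

    function subscribeItem(itemId, listener) {
      if (!itemListeners.has(itemId)) itemListeners.set(itemId, new Set());
      itemListeners.get(itemId).add(listener);
      return () => itemListeners.get(itemId).delete(listener); // unsubscribe
    }

    function dispatch(action) {
      const result = store.dispatch(action);
      // Assumed convention: item-related actions carry the item id in action.meta
      const itemId = action.meta && action.meta.itemId;
      if (itemId !== undefined && itemListeners.has(itemId)) {
        itemListeners.get(itemId).forEach(listener => listener());
      }
      return result;
    }

    return { ...store, dispatch, subscribeItem };
  };
}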
Custom connect
Create a Higher-Order component with an API like:
Item = connectItem(Item)
The HOC can expect an itemId property. It can use the Redux enhanced store from the React context and then register its listener: store.subscribeItem(itemId,callback). The source code of the original connect can serve as base inspiration.
The HOC will only trigger a re-rendering if the item changes
Related answer: https://stackoverflow.com/a/34991164/82609
Related react-redux issue: https://github.com/rackt/react-redux/issues/269
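Here is a rough sketch of what such a connectItem HOC could look like, assuming the subscribeItem enhancer above and the legacy context API that react-redux used at the time; the itemsById state shape is made up for illustration:

import React from "react";
import PropTypes from "prop-types";

function connectItem(WrappedComponent) {
  class ConnectedItem extends React.Component {
    componentDidMount() {
      // Re-render only when this particular item is touched by an action
      this.unsubscribe = this.context.store.subscribeItem(this.props.itemId, () =>
        this.forceUpdate()
      );
    }
    componentWillUnmount() {
      this.unsubscribe();
    }
    render() {
      const item = this.context.store.getState().itemsById[this.props.itemId];
      return <WrappedComponent {...this.props} item={item} />;
    }
  }
  ConnectedItem.contextTypes = { store: PropTypes.object.isRequired };
  return ConnectedItem;
}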
Performant solution 2: listening for events inside child components
It is also possible to listen to Redux actions directly in components, using redux-dispatch-subscribe or something similar, so that after the first list render, you listen for updates directly in the item component and override the original data passed down by the parent list.
class MyItemComponent extends Component {
  state = {
    itemUpdated: undefined, // will hold the locally updated item, if any
  };
  componentDidMount() {
    this.unsubscribe = this.props.store.addDispatchListener(action => {
      const isItemUpdate =
        action.type === "MY_ITEM_UPDATED" &&
        action.payload.item.id === this.props.itemId;
      if (isItemUpdate) {
        this.setState({itemUpdated: action.payload.item});
      }
    });
  }
  componentWillUnmount() {
    this.unsubscribe();
  }
  render() {
    // Initially use the data provided by the parent, but once it's been
    // updated by a dispatched action, use the updated data
    const item = this.state.itemUpdated || this.props.item;
    return (
      <div>
        {...}
      </div>
    );
  }
}
In this case redux-dispatch-subscribe may not be very performant, as you would still create 100k subscriptions. You'd rather build your own optimized middleware, similar to redux-dispatch-subscribe, with an API like store.listenForItemChanges(itemId), storing the item listeners in a map for fast lookup of the correct listeners to run...
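For example, a middleware along these lines could keep the per-item listeners in a Map; the listenForItemChanges name and the action.payload.item.id convention are assumptions for illustration:

const itemListeners = new Map(); // itemId -> Set of listeners

export function listenForItemChanges(itemId, listener) {
  if (!itemListeners.has(itemId)) itemListeners.set(itemId, new Set());
  itemListeners.get(itemId).add(listener);
  return () => itemListeners.get(itemId).delete(listener); // unsubscribe
}

export const itemChangeMiddleware = store => next => action => {
  const result = next(action);
  const item = action.payload && action.payload.item;
  if (item && itemListeners.has(item.id)) {
    // Only the listeners registered for this particular item are notified
    itemListeners.get(item.id).forEach(listener => listener(action));
  }
  return result;
};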
Performant solution 3: vector tries
A more performant approach would consider using a persistent data structure like a vector trie:
If you represent your 100k-item list as a trie, each intermediate node has the possibility to short-circuit the rendering sooner, which makes it possible to avoid a lot of shouldComponentUpdate calls in the children.
This technique can be used with ImmutableJS and you can find some experiments I did with ImmutableJS: React performance: rendering big list with PureRenderMixin
It has drawbacks, however, as libs like ImmutableJS do not yet expose public/stable APIs to do that (issue), and my solution pollutes the DOM with some useless intermediate <span> nodes (issue).
Here is a JsFiddle that demonstrates how an ImmutableJS list of 100k items can be rendered efficiently. The initial rendering is quite long (but I guess you don't initialize your app with 100k items!), but afterwards you can notice that each update only leads to a small number of shouldComponentUpdate calls. In my example I only update the first item every second, and even though the list has 100k items, it only requires something like 110 calls to shouldComponentUpdate, which is much more acceptable! :)
Edit: it seems ImmutableJS is not so great at preserving its immutable structure on some operations, like inserting/deleting items at a random index. Here is a JsFiddle that demonstrates the performance you can expect according to the operation on the list. Surprisingly, if you want to append many items at the end of a large list, calling list.push(value) many times seems to preserve the tree structure much better than calling list.concat(values).
By the way, it is documented that the List is efficient when modifying the edges. I don't think this poor performance when adding/removing at a given index is related to my technique, but rather to the underlying ImmutableJS List implementation.
Lists implement Deque, with efficient addition and removal from both the end (push, pop) and beginning (unshift, shift).
This may be a more general answer than you're looking for, but broadly speaking:
The recommendation from the Redux docs is to connect React components fairly high in the component hierarchy (see this section). This keeps the number of connections manageable, and you can then just pass updated props into the child components.
Part of the power and scalability of React comes from avoiding the rendering of invisible components. For example, instead of setting an invisible class on a DOM element, in React we just don't render the component at all. Re-rendering components that haven't changed isn't much of a problem either, since the virtual DOM diffing process optimizes the low-level DOM interactions.

Micro Services and noSQL - Best practice to enrich data in micro service architecture

I want to plan a solution that manages enriched data in my architecture.
To be clearer, I have dozens of microservices,
let's say Country, Building, Floor, Worker,
each running on a separate NoSQL data store.
When I get the data from the Worker service, I also want to present the floor name (the floor the worker works on), the building name and the country name.
Solution 1.
The client queries all the microservices.
Problem: multiple requests, and the client has to be aware of the structure.
I know multiple requests shouldn't bother me, but I believe that returning a JSON describing the entity in one single call is better.
Solution 2.
Create an orchestration that retrieves the data from multiple services.
Problem: if the data (entity names, for example) is not stored in the same document in the DB, it is very hard to sort and filter by these fields.
Solution 3.
Before saving the entity, e.g. a worker, call all the other services and fill in the related data (building name, country name).
Problem: when the building name is changed, the change is not reflected in the Worker service.
Solution 4.
(This is the best one I can come up with.)
Create a process that subscribes to a broker and receives all entity changes.
For each entity, it updates all the relevant entities.
When an entity changes, let's say a building name, it updates all the documents that hold the building name.
Problems:
Each service has to know what can be updated.
When a trailing update happens, it shouldn't publish to the broker again (to avoid recursive updates), which adds complexity to the microservices.
Solution 5.
Keep everything normalized. Filter and sort in Elasticsearch.
Problem: keeping normalized data in ES is too expensive performance-wise.
One thing I saw Netflix do (which I like) is create intermediary services for stuff like this. So maybe a new intermediary service that can call the other services to gather all the data, then create the unified output with the Country, Building, Floor and Worker.
You can even go one step further and try to come up with a scheme for providing as input which resources you want to include in the output.
So I guess this closely matches your Solution 2. I notice that you mention for Solution 2 that there are concerns with sorting/filtering in the DBs. I think that if you are using NoSQL then it has to be for a reason, and more often than not the reason is performance. I think if this were done wrong then yeah, you would have problems, but if all the appropriate searchable fields are properly keyed and indexed (as @Roman Susi mentioned in his bullet points 1 and 2) then I don't see this being a problem. Yes, this service will only be as fast as the culmination of your other services and data stores, so they have to be fast.
Now you keep your individual microservices as they are, keep the client calling one service, and encapsulate the complexity of merging the data into this new service.
This is the video I saw this in (https://www.youtube.com/watch?v=StCrm572aEs)... it's a long video but very informative.
It is hard to advise at the level of Solution N, but certain problems can be avoided by following these suggestions:
Use globally unique identifiers for entities, for example by giving key values some kind of URI.
The global IDs also simplify updates, because you can track what has actually changed: the name or the entity itself (each entity has a one-to-one relation with its global URI).
The CAP theorem says you can choose only two of consistency, availability and partition tolerance. Do you want a CA architecture? Or CP? Or maybe AP? This will strongly affect the way you distribute data.
For "sort and filter" there is the MapReduce approach, which can distribute the load of figuring out those things.
Think carefully about the balance of normalization / denormalization. If your services operate on URIs, you can have a service which turns URIs into labels (names, descriptions, etc.), but you do not need to keep the redundant information everywhere and update it. Do not optimize prematurely; try to keep data normalized as long as possible. This way, a worker may not even need the building name, only its global ID, and the microservice looks up the metadata from another microservice.
In other words, minimize the number of keys, shared between services, as part of separation of concerns.
Focus on the underlying model, not the JSON going back and forth. Modelling the data in your system(s) correctly gains you more than saving JSON calls.
As for NoSQL, take a look at the Riak database: it has adjustable CAP properties, IIRC. Even if you do not use it as such, reading its documentation may help you come up with a suitable architecture for your distributed microservices system. (Of course, this applies if you have an essentially parallel system.)
First of all, thanks for your question. It is similar to the main problem of document DBs: how do you sort a collection by a field from another collection? I have my own answer for that, so I'll try to comment on all your solutions:
Solution 1: It is good if the client wants to work with Countries/Buildings/Floors independently. But it does not solve the problem you mentioned in Solution 2 - sorting 10k workers by building is going to be slow.
Solution 2: Similar to Solution 1, if all the client wants is a list of enriched workers without knowing how to combine it from multiple pieces.
Solution 3: As you said, unacceptable because of inconsistent data.
Solution 4: Going to work, most of the time. But:
Huge data duplication. If you have 20 entities, you are going to have 20x the data.
Large complexity. 20 entities -> 20 different procedures to update related data.
High coupling. All your services must know about each other. A data model change will propagate to every service because of the update procedures.
Questionable eventual consistency. It can be done so that data will be consistent after failures, but it is not going to be easy.
Solution 5: Kind of the answer :-)
But you do not want everything in there. Keep separate services that serve separate entities, and build other services on top of them.
If the client wants enriched data, build a service that returns enriched data, as in Solution 2.
If the client wants to display a list of enriched data with filtering and sorting, build a service that provides enriched data with filtering and sorting capability! Likely the implementation of such a service will contain an ES instance holding cached and indexed data from the lower-level services. The point here is that ES does not have to contain everything or be shared between every service - it is up to you to decide the right balance between performance and infrastructure resources.
This is a case where Linked Data can help you.
Basically the Floor attribute for the worker would be a URI (a link) to the floor itself, and any other linked data should be expressed as URIs as well.
Modeled with some JSON-LD it would look like this:
worker = {
  '#id': '/workers/87373',
  name: 'John',
  floor: {
    '#id': '/floors/123'
  }
}

floor = {
  '#id': '/floors/123',
  level: 12,
  building: { '#id': '/buildings/87' }
}

building = {
  '#id': '/buildings/87',
  name: "John's home",
  city: { '#id': '/cities/908' }
}
This way all the client has to do is append the BASE URL (like api.example.com) to the #id and make a simple GET call.
To remove the burden of extra calls from the client (in case it's a slow mobile device), we use the gateway pattern with microservices. The gateway can expand those links with very little effort and augment the returned object. It can also make multiple calls in parallel.
So the gateway will make a GET /floors/123 call and replace the floor object on the worker with the reply.
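A rough sketch of how such a gateway could expand the '#id' links; the expandLinks helper, the base URL and the use of Node's global fetch (Node 18+) are assumptions for illustration:

const BASE_URL = 'https://api.example.com';

// Replace each linked field ({ '#id': '/floors/123' }) with the fetched object
async function expandLinks(entity, fields) {
  const expanded = { ...entity };
  await Promise.all(fields.map(async field => {
    const link = entity[field] && entity[field]['#id'];
    if (link) {
      const res = await fetch(BASE_URL + link);
      expanded[field] = await res.json();
    }
  }));
  return expanded;
}

// e.g. in the gateway's /workers/:id handler:
// const worker = await (await fetch(BASE_URL + '/workers/87373')).json();
// const enriched = await expandLinks(worker, ['floor']);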

How to use std::unique_ptr for a std::list variable

I have a consumer-producer situation where I am constantly pushing data into a list, if it doesn't already exist there, and then every few seconds I package and send the data to a server.
At the moment I have a thread that sleeps for some number of seconds, wakes up, sets a flag so nothing is added to the list, does the packaging, deletes the processed items from the list, and then allows the program to start adding to the list again.
This was fine for a prototype but now I need to make it work better in a more realistic situation.
So, I want to have the producer get the information, and when the size is large enough or enough time elapses pass the list to a thread to process.
I want to pass a reference to the list, and unique_ptr would be beneficial because once it is moved, the producer thread can just create a new list and, for all practical purposes, keep working with it as before.
But when I tried to change my list from
typedef list<string> STRINGQUEUE;
STRINGQUEUE newMachineQueue;
to
std::unique_ptr<STRINGQUEUE> newMachineQueue;
Then I get errors that insert is not a member of std::unique_ptr.
I don't think I want to use newMachineQueue.get() and then do my operations as I believe I lose the benefits of unique_ptr then.
So, how can I use unique_ptr on a list and be able to call the methods in the list?
Just use it like you would use a pointer.
newMachineQueue->insert(...);
You might be interested in the documentation.
You also don't need to use a unique_ptr; you can just move the list into the consumer and reassign a new one to the variable.
void consumer(std::list<std::string> list) {
    // accept by value!
}

std::list<std::string> machineQueue;

// hand-off to consumer
consumer(std::move(machineQueue));
machineQueue = std::list<std::string>{}; // new list
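If you do want the unique_ptr version, a minimal sketch could look like this (the process function and the batch contents are made up for illustration):

#include <list>
#include <memory>
#include <string>
#include <utility>

using STRINGQUEUE = std::list<std::string>;

// Takes ownership of a finished batch (e.g. runs on the sender thread)
void process(std::unique_ptr<STRINGQUEUE> batch) {
    // package and send *batch to the server...
}

int main() {
    auto newMachineQueue = std::make_unique<STRINGQUEUE>();
    newMachineQueue->push_back("machine-1");                 // use -> to call list members
    newMachineQueue->insert(newMachineQueue->end(), "machine-2");

    process(std::move(newMachineQueue));                     // hand off ownership
    newMachineQueue = std::make_unique<STRINGQUEUE>();       // start a fresh, empty list
    return 0;
}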

Duplicates when linkswalking riak using ripple

I'm working on a project where I use Riak with Ripple, and I've stumbled on a problem.
For some reason I get duplicates when link-walking a structure of links. When I link-walk using curl I don't get the duplicates, as far as I can see.
The difference between my curl based link-walk
curl -v http://127.0.0.1:8098/riak/users/2306403e5177b4716da9df93b67300824aa2fd0e/_,projects,0/_,tasks,1
and my ruby ripple/riak-client based link walk
result = Riak::MapReduce.new(self.robject.bucket.client).
add(self.robject.bucket,self.key).
link(Riak::WalkSpec.new({:key => 'projects'})).
link(Riak::WalkSpec.new({:key => 'tasks', :bucket=>'tasks'})).
map("function(v){ if(!JSON.parse(v.values[0].data).completed) {return [v];} else { return [];} }", {:keep => true}).run
is, as far as I can tell, the map at the end.
However, the result of the map/reduce contains several duplicates, and I can't wrap my head around why. For now I've settled for removing the duplicates based on the key, but I wish the Riak result didn't contain duplicates in the first place, since it seems wasteful to remove them at the end.
I've tried the following:
Making sure there are no duplicates in the link sets of my Ripple objects.
Loading the data without the map/reduce; the link walk still contains duplicate keys.
Any help is appreciated.
What you're running into here is an interesting side-effect/challenge of Map/Reduce queries.
M/R queries don't have any notion of read quorum values, and they necessarily have to hit every object (within the limitations of input filtering, of course) on every node.
Which means, when N > 1, the queries have to hit every copy of every object.
For example, let's say N=3, as per the default. That means that for each written object there are 3 copies, one on each of 3 different nodes.
When you issue a read for an object (let's say with the default quorum value of R=2), the coordinating node (which received the read request from your client) contacts all 3 nodes (and potentially receives 3 different values, 3 different copies of the object).
It then checks to make sure that at least 2 of those copies have the same values (to satisfy the R=2 requirement), returns that agreed-upon value to the requesting client, and discards the other copies.
So, in regular operations (reads/writes, but also link walking), the coordinating node filters out the duplicates for you.
Map/Reduce queries don't have that luxury. They don't really have quorum values associated with them -- they are made to iterate over every (relevant) key and object on all the nodes. And because the M/R code runs on each individual node (close to the data) instead of just on the coordinating node, it can't really filter out any duplicates intrinsically. One of the things M/R is designed for, for example, is to update (or delete) all of the copies of the objects on all the nodes. So, each Map phase (in your case above) runs on every node, returns the matched 'completed' values for each copy, and ships the results back to the coordinating node to return to the client. And since it's very likely that your N > 1, there are going to be duplicates in the result set.
Now, you can probably filter out the duplicates explicitly by writing code in the Reduce phase that checks whether a key is already present and rejects duplicates if it is, as sketched below.
But honestly, if I were in your situation, I would just filter out the duplicates in Ruby on the client side rather than mess with the reduce code.
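For completeness, a rough sketch of what such a Reduce function could look like; it assumes the map phase returns whole Riak objects that still carry a key field, as in your query above, and deduplicating by key keeps the phase safe to re-run:

function(values) {
  var seen = {};
  var out = [];
  for (var i = 0; i < values.length; i++) {
    var k = values[i].key; // assumes each value still carries its Riak key
    if (!seen[k]) {
      seen[k] = true;
      out.push(values[i]);
    }
  }
  return out;
}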
Anyways, I hope that sheds some light on this mystery.
