How would bounded context help when doing microservices? - microservices

So at first I thought we would have a 1:1 relation between a service and a bounded context. I thought strategic design in DDD would help decompose the domain into several services and reduce a lot of complexity, but I was wrong. Actually, you can have many services inside a bounded context, not just one. So how does a bounded context help when doing microservices, since you still have those messy services, just inside a bounded context with a specific ubiquitous language?


Documenting the aggregates & relations between microservices

In my organization we're trying to design our microservices based on the Bounded Context (BC) pattern (part of Domain-driven design). While we're doing this we also try to use another DDD pattern called the Context Mapping, to better identify the various contexts in the application, their boundaries and the relations between them.
All of this can be done on a whiteboard or in some online drawing tool. However, I'm looking for a way to generate a complete picture of the various services, what aggregates they contain and potentially the relations between such aggregates (as the same User in one BC might be a Customer in another). A good example is figure 4-10 in here. The generation should ideally be based on some DSL or script which we would maintain, as this kind of work is fairly high-level and context boundaries don't change very often. For example, a team adds a new aggregate or starts keeping a copy of an aggregate from another service, they update the script/DSL and regenerate the diagram.
Solutions I've looked at so far:
Context mapper - it doesn't visualise the aggregates in each BC/service, nor does it show relations
C4 model, Level 2 - we already use it, so it could be fairly easy to add a textual list of aggregates per container, but it's not what it's intended for (and the visualisation is not optimal)
ddd bounded context/microservice canvas - it's too detailed and can't really be used to look at the big picture
I'm wondering how and if this is done in other organizations, and I'm looking for suggestions for tooling that would help.
I think the format used for event storming sessions might be worth a look in your case. Once done, it covers all domain events, commands, actors, read models, policies and external systems. It also illustrates the bounded contexts in which the aggregates, events, etc. live. An example can be found here:
I know the format is mainly used for domain exploration but from my experience, if done nicely (e.g. using some tool like Miro, Lucid, or the like) it also provides good documentation and overview of what's going on in your system.
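For the script-driven diagram the question asks about, one possible approach is to keep the contexts and aggregates in a small data structure and emit Graphviz DOT text from it. The following is only a minimal sketch: the context names, aggregates, the relation label and the helper names (`to_dot`, `node_id`) are all hypothetical, and the output still has to be rendered with a DOT-capable tool.

```python
# Hypothetical model: bounded contexts and the aggregates each one owns.
CONTEXTS = {
    "Identity": ["User"],
    "Sales": ["Customer", "Order"],
}
# Hypothetical cross-context relations: (source, target, label),
# where source and target are (context, aggregate) pairs.
RELATIONS = [
    (("Identity", "User"), ("Sales", "Customer"), "same person"),
]


def node_id(context: str, aggregate: str) -> str:
    """Build a DOT-safe node identifier that is unique across contexts."""
    return f"{context}_{aggregate}".replace(" ", "_")


def to_dot(contexts, relations) -> str:
    """Render contexts as clusters and aggregates as nodes in Graphviz DOT."""
    lines = ["digraph context_map {", "  node [shape=box];"]
    for context, aggregates in contexts.items():
        cluster = "cluster_" + context.replace(" ", "_")
        lines.append(f"  subgraph {cluster} {{")
        lines.append(f'    label="{context}";')
        for aggregate in aggregates:
            lines.append(f'    {node_id(context, aggregate)} [label="{aggregate}"];')
        lines.append("  }")
    for (src_ctx, src_agg), (dst_ctx, dst_agg), label in relations:
        lines.append(
            f"  {node_id(src_ctx, src_agg)} -> {node_id(dst_ctx, dst_agg)} "
            f'[label="{label}", style=dashed];'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(CONTEXTS, RELATIONS))
```

When a team adds an aggregate or starts keeping a copy of one from another context, they would edit the two data structures and regenerate the diagram.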

Separate Messaging system Inside Bounded Context

Is it good practice to run a separate messaging system for internal domain events inside a bounded context? Or is it better to reuse the common one, which all bounded contexts listen to?
Check out images to understand the question better:
Option one (common RabbitMQ for all contexts):
Option two (separate RabbitMQ for each BC):
I think the first approach is totally valid. Bounded contexts are abstractions that encapsulate the domain or business logic related to one context of the business. The messaging system, however, exists only to facilitate communication between these decoupled, hermetic bounded contexts, so I think having a single message broker shared by multiple bounded contexts is correct. In addition, this way you will have less overhead and latency.

Moving in an actor inside system boundary of UML usecase diagram

So my question is as follows: is it possible to move an actor inside the system boundary of a use case diagram? Can it be a part of the system?
I set a server as an actor, where a customer interacts with the server in an e-commerce environment. Is that possible, or should I move the server inside the system, since the server is a part of the system that the customer is interacting with?
This server is most likely then going to be used by an admin role.
No, you can't do that, unless you model only a part of the system.
By definition an actor is external to the system. It can be a user, another system or a sensor.
If you want to show a system decomposition into smaller parts, use component diagram.
Note, the role of a use case diagram is to show functions of the system as a whole.
On the other hand you may depict just one part of the system (i.e. a system tier). In such a case the other parts (tiers) are external to the part of the system under consideration.
I suppose you mean "move an actor inside the system boundary" since in any case the actor appears inside the UC diagram (or you just won't see it).
You can do that. However, it would be rather pointless, since actors are meant to interact with the system under consideration (SUC) from outside. The only case where you can do that is when you create sub-systems (that is, you have boundaries of sub-systems within the SUC boundary). I wouldn't do that from the very beginning either. Only in a later design phase could you introduce such a construct. In that case you'd have independent teams working on the different sub-systems and one on the integration for the SUC. For "normally" sized systems you should leave these sub-systems out and focus on actors and their UCs inside the SUC boundary.

When to use a certain Reinforcement Learning algorithm?

I'm studying Reinforcement Learning and reading Sutton's book for a university course. Besides the classic DP, MC, TD and Q-Learning algorithms, I'm reading about policy gradient methods and genetic algorithms for the resolution of decision problems.
I have never had experience before in this topic and I'm having problems understanding when a technique should be preferred over another. I have a few ideas, but I'm not sure about them. Can someone briefly explain or tell me a source where I can find something about typical situation where a certain methods should be used? As far as I understand:
Dynamic Programming and Linear Programming should be used only when the MDP has few actions and states and the model is known, since they are very expensive. But when is DP better than LP?
Monte Carlo methods are used when I don't have the model of the problem but I can generate samples. It does not have bias but has high variance.
Temporal Difference methods should be used when MC methods need too many samples to have low variance. But when should I use TD and when Q-Learning?
Policy Gradient and Genetic algorithms are good for continuous MDPs. But when is one better than the other?
More precisely, I think that to choose a learning method a programmer should ask themselves the following questions:
does the agent learn online or offline?
can we separate exploring and exploiting phases?
can we perform enough exploration?
is the horizon of the MDP finite or infinite?
are states and actions continuous?
But I don't know how these details of the problem affect the choice of a learning method.
I hope that some programmer has already had some experience about RL methods and can help me to better understand their applications.
does the agent learn online or offline? This helps you decide between on-line and off-line algorithms (e.g. on-line: SARSA, off-line: Q-learning; strictly speaking, that pair is usually distinguished as on-policy versus off-policy). On-line methods have more limitations and need more care.
can we separate exploring and exploiting phases? These two phases are normally kept in balance. For example, in epsilon-greedy action selection you explore with probability epsilon and exploit with probability (1 - epsilon). You can also separate the two and have the algorithm explore first (e.g. by choosing random actions) and exploit afterwards. But this is only possible when you are learning off-line, probably using a model of the system dynamics, and it normally means collecting a lot of sample data in advance.
can we perform enough exploration? The level of exploration depends on the definition of the problem. For example, if you have a simulation model of the problem in memory, you can explore as much as you want. But exploring in the real world is limited by the resources you have (e.g. energy, time, ...).
are states and actions continuous? Answering this helps you choose the right approach (algorithm). Both discrete and continuous algorithms have been developed for RL. Some of the "continuous" algorithms internally discretize the state or action spaces.
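To make the epsilon-greedy rule and the SARSA versus Q-learning difference concrete, here is a minimal tabular sketch. It assumes a Q-table stored as a dictionary keyed by (state, action); the function names and the default values for alpha and gamma are illustrative assumptions, not taken from any particular library. In Sutton and Barto's terminology the difference between the two updates is on-policy (SARSA) versus off-policy (Q-learning).

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in s_next, whatever is taken next."""
    target = reward + gamma * max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the behaviour policy actually chose."""
    target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


if __name__ == "__main__":
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    actions = ["left", "right"]
    action = epsilon_greedy(q, "s0", actions, epsilon=0.1)
    q_learning_update(q, "s0", action, reward=1.0, s_next="s1", actions=actions)
    print(dict(q))
```

Setting epsilon to 1.0 gives a pure exploration phase and 0.0 a pure exploitation phase, which is one way to separate the two as described above.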

How can I make my applications scale well?

In general, what kinds of design decisions help an application scale well?
(Note: Having just learned about Big O Notation, I'm looking to gather more principles of programming here. I've attempted to explain Big O Notation by answering my own question below, but I want the community to improve both this question and the answers.)
Responses so far
1) Define scaling. Do you need to scale for lots of users, traffic, objects in a virtual environment?
2) Look at your algorithms. Will the amount of work they do scale linearly with the actual amount of work - i.e. number of items to loop through, number of users, etc?
3) Look at your hardware. Is your application designed such that you can run it on multiple machines if one can't keep up?
Secondary thoughts
1) Don't optimize too much too soon - test first. Maybe bottlenecks will happen in unforeseen places.
2) Maybe the need to scale will not outpace Moore's Law, and maybe upgrading hardware will be cheaper than refactoring.
The only thing I would say is write your application so that it can be deployed on a cluster from the very start. Anything above that is a premature optimisation. Your first job should be getting enough users to have a scaling problem.
Build the code as simple as you can first, then profile the system second and optimise only when there is an obvious performance problem.
Often the figures from profiling your code are counter-intuitive; the bottle-necks tend to reside in modules you didn't think would be slow. Data is king when it comes to optimisation. If you optimise the parts you think will be slow, you will often optimise the wrong things.
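As an illustration of letting data drive the optimisation, the sketch below uses Python's standard `cProfile` and `pstats` modules to report where time is actually spent. The functions `handle_request` and `slow_sum_of_squares` are hypothetical stand-ins for real application code, and `profile` is just a small helper written for this example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical hot spot: deliberately naive work to show up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request(n=200_000):
    """Hypothetical entry point standing in for real application code."""
    return slow_sum_of_squares(n)


def profile(func, *args, top=5):
    """Run func under cProfile; return its result and a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile(handle_request)
    print(report)
```

The report lists the functions with the highest cumulative time first, which is usually a better guide than intuition about which module is slow.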
Ok, so you've hit on a key point in using the "big O notation". That's one dimension that can certainly bite you in the rear if you're not paying attention. There are also other dimensions at play that some folks don't see through the "big O" glasses (but if you look closer they really are).
A simple example of that dimension is a database join. There are "best practices" in constructing, say, a left join which will help to make the SQL execute more efficiently. If you break down the relational calculus or even look at an explain plan (Oracle) you can easily see which indexes are being used in which order and if any table scans or nested operations are occurring.
The concept of profiling is also key. You have to be instrumented thoroughly and at the right granularity across all the moving parts of the architecture in order to identify and fix any inefficiencies. Say for example you're building a 3-tier, multi-threaded, MVC2 web-based application with liberal use of AJAX and client side processing along with an OR Mapper between your app and the DB. A simplistic linear single request/response flow looks like:
browser -> web server -> app server -> DB -> app server -> XSLT -> web server -> browser JS engine execution & rendering
You should have some method for measuring performance (response times, throughput measured in "stuff per unit time", etc.) in each of those distinct areas, not only at the box and OS level (CPU, memory, disk I/O, etc.), but specific to each tier's service. So on the web server you'll need to know all the counters for the web server you're using. In the app tier, you'll need that plus visibility into whatever virtual machine you're using (JVM, CLR, whatever). Most OR mappers manifest inside the virtual machine, so make sure you're paying attention to all the specifics if they're visible to you at that layer. Inside the DB, you'll need to know everything that's being executed and all the specific tuning parameters for your flavor of DB. If you have big bucks, BMC Patrol is a pretty good bet for most of it (with appropriate knowledge modules (KMs)). At the cheap end, you can certainly roll your own but your mileage will vary based on your depth of expertise.
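If you do roll your own, the basic idea of per-tier measurement can be sketched in a few lines. This is only an illustration under stated assumptions: the tier names, the `timed` context manager, the `summarize` helper and the sleeps standing in for real work are all hypothetical, and a production setup would ship these numbers to a monitoring system rather than keep them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# tier name -> list of elapsed seconds, one entry per measured call
timings = defaultdict(list)


@contextmanager
def timed(tier):
    """Record wall-clock time spent inside the with-block under the given tier name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[tier].append(time.perf_counter() - start)


def summarize(samples):
    """Return call count, mean and worst-case latency for one tier."""
    return {
        "calls": len(samples),
        "mean": sum(samples) / len(samples),
        "max": max(samples),
    }


def handle_request():
    """Hypothetical request flow; the sleeps stand in for real tier work."""
    with timed("app"):
        with timed("db"):
            time.sleep(0.002)  # pretend database call
        time.sleep(0.001)  # pretend rendering work in the app tier
```

Calling `handle_request()` a number of times and then `summarize(timings["db"])` gives response-time figures for that tier alone, separate from the box-level counters.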
Presuming everything is synchronous (no queue-based things going on that you need to wait for), there are tons of opportunities for performance and/or scalability issues. But since your post is about scalability, let's ignore the browser except for any remote XHR calls that will invoke another request/response from the web server.
So given this problem domain, what decisions could you make to help with scalability?
Connection handling. This is also bound to session management and authentication. That has to be as clean and lightweight as possible without compromising security. The metric is maximum connections per unit time.
Session failover at each tier. Necessary or not? We assume that each tier will be a cluster of boxes horizontally under some load balancing mechanism. Load balancing is typically very lightweight, but some implementations of session failover can be heavier than desired. Also whether you're running with sticky sessions can impact your options deeper in the architecture. You also have to decide whether to tie a web server to a specific app server or not. In the .NET remoting world, it's probably easier to tether them together. If you use the Microsoft stack, it may be more scalable to do 2-tier (skip the remoting), but you have to make a substantial security tradeoff. On the java side, I've always seen it at least 3-tier. No reason to do it otherwise.
Object hierarchy. Inside the app, you need the cleanest possible, lightest weight object structure possible. Only bring the data you need when you need it. Viciously excise any unnecessary or superfluous getting of data.
OR mapper inefficiencies. There is an impedance mismatch between object design and relational design. The many-to-many construct in an RDBMS is in direct conflict with object hierarchies (person.address vs. location.resident). The more complex your data structures, the less efficient your OR mapper will be. At some point you may have to cut bait in a one-off situation and do a more...uh...primitive data access approach (Stored Procedure + Data Access Layer) in order to squeeze more performance or scalability out of a particularly ugly module. Understand the cost involved and make it a conscious decision.
XSL transforms. XML is a wonderful, normalized mechanism for data transport, but man can it be a huge performance dog! Depending on how much data you're carrying around with you and which parser you choose and how complex your structure is, you could easily paint yourself into a very dark corner with XSLT. Yes, academically it's a brilliantly clean way of doing a presentation layer, but in the real world there can be catastrophic performance issues if you don't pay particular attention to this. I've seen a system consume over 30% of transaction time just in XSLT. Not pretty if you're trying to ramp up 4x the user base without buying additional boxes.
Can you buy your way out of a scalability jam? Absolutely. I've watched it happen more times than I'd like to admit. Moore's Law (as you already mentioned) is still valid today. Have some extra cash handy just in case.
Caching is a great tool to reduce the strain on the engine (increasing speed and throughput is a handy side-effect). It comes at a cost though in terms of memory footprint and complexity in invalidating the cache when it's stale. My decision would be to start completely clean and slowly add caching only where you decide it's useful to you. Too many times the complexities are underestimated and what started out as a way to fix performance problems turns out to cause functional problems. Also, back to the data usage comment. If you're creating gigabytes worth of objects every minute, it doesn't matter if you cache or not. You'll quickly max out your memory footprint and garbage collection will ruin your day. So I guess the takeaway is to make sure you understand exactly what's going on inside your virtual machine (object creation, destruction, GCs, etc.) so that you can make the best possible decisions.
Sorry for the verbosity. Just got rolling and forgot to look up. Hope some of this touches on the spirit of your inquiry and isn't too rudimentary a conversation.
Well, there's this blog called High Scalability that contains a lot of information on this topic. Some useful stuff.
Often the most effective way to do this is by a well thought through design where scaling is a part of it.
Decide what scaling actually means for your project. Is it an infinite number of users, is it being able to handle a slashdotting on a website, or is it development cycles?
Use this to focus your development efforts.
Jeff and Joel discuss scaling in the Stack Overflow Podcast #19.
FWIW, most systems will scale most effectively by ignoring this until it's a problem. Moore's law is still holding, and unless your traffic is growing faster than Moore's law does, it's usually cheaper to just buy a bigger box (at $2 or $3K a pop) than to pay developers.
That said, the most important place to focus is your data tier; that is the hardest part of your application to scale out, as it usually needs to be authoritative, and clustered commercial databases are very expensive, while the open source variations are usually very tricky to get right.
If you think there is a high likelihood that your application will need to scale, it may be intelligent to look into systems like memcached or map reduce relatively early in your development.
One good idea is to determine how much work each additional task creates. This can depend on how the algorithm is structured.
For example, imagine you have some virtual cars in a city. At any moment, you want each car to have a map showing where all the cars are.
One way to approach this would be:
for each car {
    determine my position;
    for each car {
        add my position to this car's map;
    }
}
This seems straightforward: look at the first car's position, add it to the map of every other car. Then look at the second car's position, add it to the map of every other car. Etc.
But there is a scalability problem. When there are 2 cars, this strategy takes 4 "add my position" steps; when there are 3 cars, it takes 9 steps. For each "position update," you have to cycle through the whole list of cars - and every car needs its position updated.
Ignoring how many other things must be done to each car (for example, it may take a fixed number of steps to calculate the position of an individual car), for N cars, it takes N^2 "visits to cars" to run this algorithm. This is no problem when you've got 5 cars and 25 steps. But as you add cars, you will see the system bog down. 100 cars will take 10,000 steps, and 101 cars will take 10,201 steps!
A better approach would be to undo the nesting of the for loops.
for each car {
    add my position to a list;
}
for each car {
    give me an updated copy of the master list;
}
With this strategy, the number of steps is a multiple of N, not of N^2. So 100 cars will take 100 times the work of 1 car - NOT 10,000 times the work.
This concept is sometimes expressed in "big O notation" - the number of steps needed is "big O of N" or "big O of N^2."
Note that this concept is only concerned with scalability - not optimizing the number of steps for each car. Here we don't care if it takes 5 steps or 50 steps per car - the main thing is that N cars take (X * N) steps, not (X * N^2).
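The two strategies can be written as runnable Python that counts its own steps, which makes the N^2 versus N difference easy to check. This is a sketch under one stated assumption: in the second strategy every car receives a reference to a single shared master list rather than its own copy, since physically copying N entries for each of N cars would bring the quadratic cost back. The function names are made up for this example.

```python
def nested_update(positions):
    """Quadratic strategy: every car writes its position into every car's map."""
    maps = [dict() for _ in positions]
    steps = 0
    for car, position in enumerate(positions):
        for car_map in maps:
            car_map[car] = position
            steps += 1
    return maps, steps


def master_list_update(positions):
    """Linear strategy: build one master list, then hand it to each car."""
    master = {}
    steps = 0
    for car, position in enumerate(positions):
        master[car] = position
        steps += 1
    maps = []
    for _ in positions:
        maps.append(master)  # shared reference, not a per-car copy (assumption)
        steps += 1
    return maps, steps


if __name__ == "__main__":
    for n in (5, 100, 101):
        positions = [(i, i) for i in range(n)]
        _, quadratic = nested_update(positions)
        _, linear = master_list_update(positions)
        print(n, quadratic, linear)
```

For 100 cars the nested version performs 10,000 steps while the master-list version performs 200 (two passes of 100), and both leave every car with the same map contents.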
