Does stateless architecture only transfer the client's state somewhere else? - session

I have spent a whole day trying to understand what stateless architecture is. I read many posts and answers, such as:
Can My Web App Implement User Login and Remain Stateless?
Pros and Cons of Sticky Session / Session Affinity load balancing strategy?
http://www.quora.com/What-is-stateless-and-statefull-web-architecture
It seems that being stateless only means transferring some user state somewhere else (a database/memcache or client cookies). Is this right? If yes, the state is still stored somewhere, so something (client or server) is not stateless, even though the load balancer no longer needs to worry about which machine to route to.
If the above is right, and we choose to move the user information to a central place (moving it to the client does not always seem to be a solution, according to some answers) like a database or memcache, we still need to look up this session info for every request. That means the place that holds the user state faces the same pressure of handling tens of millions of requests at the same time. And the way we find the session information is probably much like a sticky session (routing the lookup to a single node in the memcache cluster). So why do we consider transferring the state more scalable? The pressure is only moved elsewhere (and the database usually has too much load already).
Am I missing something or misunderstanding?
Thank you!

You are correct in that moving your state to a different layer means your application is stateful (there are very few truly stateless applications, mostly only ones doing pure math).
That doesn't mean individual layers can't be stateless, and the layers that are will scale differently than the stateful layers. The idea is that by making a particular part of the application stateless, you can scale it horizontally instead of vertically, and thus respond to many more requests simply by buying more hardware.
You will still need to scale wherever you push that state to. So if you are pushing it out to a database, you will need to be able to scale that database accordingly. This works well if you can push the state out to a layer that can be scaled cheaply (like memcached).
It is often the goal to make your business and web layers stateless because they are generally much more expensive to scale than your data-store layers, but this isn't always true. If you have put a lot of load on your data-store layer and very little load on your application or web layers (like a data-driven vs. an interaction-driven app), then you will overload your data layer.
So, like everything else, whether to make your application stateless comes down to "it depends". Generally, stateful business and web layers tend to get overloaded long before data layers do. Especially if you are doing significant OOP.
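To make that concrete, here is a minimal sketch (TypeScript, with hypothetical names) of a stateless web tier: the handler keeps no per-user state in process memory and looks the session up by id on every request, so any instance behind the load balancer can serve any request. The in-memory store is only a stand-in for a shared memcached/Redis cluster.

    // Hypothetical names throughout; the Map stands in for memcached/Redis.
    interface UserSession {
      userId: string;
      displayName: string;
    }

    interface SessionStore {
      get(sessionId: string): Promise<UserSession | undefined>;
      set(sessionId: string, session: UserSession): Promise<void>;
    }

    // In-memory stand-in; a real deployment would back this with a shared store.
    class InMemorySessionStore implements SessionStore {
      private data = new Map<string, UserSession>();
      async get(id: string) { return this.data.get(id); }
      async set(id: string, session: UserSession) { this.data.set(id, session); }
    }

    // Any web-tier instance can serve this request: the only state it needs
    // arrives with the request (the session id) or comes from the shared store.
    async function handleRequest(sessionId: string, store: SessionStore): Promise<string> {
      const session = await store.get(sessionId);
      if (!session) return "401 Unauthorized";
      return `Hello, ${session.displayName}`;
    }

The web tier itself stays stateless and cheap to scale out; the pressure moves to the session store, which is exactly the trade-off the answer describes.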

Related

Eventually consistent DB: How to deal with relational data?

So let's say we have microservices that use an event broker to communicate with each other.
To preserve data sovereignty, each microservice keeps denormalized documents.
So whenever the data changes, the service that changed it fires a 'DataAHasChanged' event. Then, all the microservices that have subscribed to this event update the documents they hold in order to keep data A consistent. (A here is not a foreign key but the actual data, since it's denormalized.)
This seems really bad to me if services have multiple documents containing data A, and if data A changes often. I would rather just send an API call to the other services, using data A's ID as a foreign key.
A real-world use case would be:
A user creates 'contract requests', and each one contains information for multiple vendors.
Vendor information changes often.
So if there are 2000 contract requests, then whenever a vendor changes their information, we have to go through every contract request and update the denormalized document.
Is eventual consistency still the best practice in this case, or should I just use a synchronous call to read data from the vendor service?
Thank you.
I would revisit the microservice decoupling and ask a question: who is the source of truth for each type of data? You'll probably arrive at one service owning the documents, and that service will be responsible for updating those documents as well.
Even with a dedicated service owning the documents, you still have to decide what consistency guarantees you need. Usually you start with SLAs: how available should your service be? How is the data stored? Often the underlying data storage will dictate those.
Also, I would like to note that even with synchronous calls your system will be eventually consistent: since it takes time to execute all those calls, there will be a period when the system as a whole might see non-latest data.
If you really need true strong consistency, you will have to pick the right storage for that. I would go with a strongly consistent option, assuming my performance and availability goals are met. The reason for strong consistency: it is much easier to reason about, so the system gets simpler.
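As a rough illustration of the event-driven option being weighed in the question, here is a sketch (TypeScript; the event shape, field names, and in-memory store are all invented for illustration) of a contract service reacting to a vendor change by rewriting its denormalized copies. It makes the trade-off visible: one vendor update fans out to every contract request referencing that vendor, in exchange for not calling the vendor service on every read.

    // All names here are illustrative, not a real API.
    interface VendorUpdated {
      vendorId: string;
      name: string;
      phone: string;
    }

    interface ContractRequest {
      id: string;
      vendor: { vendorId: string; name: string; phone: string }; // denormalized copy
    }

    // Stand-in for the contract service's own document store.
    const contractRequests: ContractRequest[] = [];

    // Called from the broker subscription; the data is eventually consistent
    // with the vendor service, and one event may touch many documents.
    function onVendorUpdated(event: VendorUpdated): void {
      for (const request of contractRequests) {
        if (request.vendor.vendorId === event.vendorId) {
          request.vendor = { vendorId: event.vendorId, name: event.name, phone: event.phone };
        }
      }
    }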

Microservices: model sharing between bounded contexts

I am currently building a microservices-based application developed with the MEAN stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login (generating a JWT), logout, etc. I also have a File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have a Friends service that keeps track of the associations between members.
Currently, I am adding the GUID of the user from the user table used by the User service, as well as the first, middle, and last name fields, to the File table and the Friend table. This way I can query these fields whenever I need them in the other services (Friend and File) without needing to make any REST calls every time the information is queried.
Here is the caveat:
The downside seems to be that I have to notify the File and Friend tables (I chose Seneca with RabbitMQ) whenever a user updates their information in the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues if a lot of updates take place within an hour, say?
3) In trying to isolate boundaries, I just don't see another way of pulling this off. What is the recommended approach to solving this issue, and am I on the right track?
It's a trade-off. I would personally not store the user details alongside the user identifier in the dependent services, but neither would I query the User service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way that is optimized for your particular needs (reporting, displaying together on a web page, etc.).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
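As one way to picture that read-model suggestion (a sketch only; the event fields, map, and function names below are invented, not taken from the linked article), the Friends service could keep just the user's key while a projection built from published user change events answers the display queries:

    // Event fields and names invented for illustration.
    interface UserChanged {
      userId: string;
      firstName: string;
      lastName: string;
    }

    // The read-model: kept in whatever shape the queries need, here a simple map.
    const userDisplayNames = new Map<string, string>();

    // Fed by the user-change events that every service publishes.
    function onUserChanged(event: UserChanged): void {
      userDisplayNames.set(event.userId, `${event.firstName} ${event.lastName}`);
    }

    // A Friends record stores only the key; names come from the read-model at query time.
    interface FriendAssociation {
      ownerId: string;
      friendId: string;
    }

    function renderFriend(assoc: FriendAssociation): string {
      return userDisplayNames.get(assoc.friendId) ?? "(unknown user)";
    }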
If you are worried about a high volume of messages and high TPS (for example, 100,000 TPS for producing and consuming events), I suggest using Apache Kafka or NATS (the Go version, since NATS also has a Ruby version) instead of RabbitMQ in order to support a high volume of messages per second.
Also, regarding database design, you should design each microservice around business capabilities and bounded contexts, following domain-driven design (DDD). Because, unlike SOA, it is suggested that each microservice have its own database, you should not worry about normalization: you may have to repeat many structures, fields, tables, and features for each microservice in order to keep them decoupled from each other and let them work independently, which improves availability and scalability.
You can also use event sourcing + CQRS or transaction log tailing to avoid 2PC (two-phase commit), which is not recommended when implementing microservices, in order to exchange events between your microservices and update state with eventual consistency, in line with the CAP theorem.
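For the Kafka route, a minimal sketch using the kafkajs Node client is shown below (one client among several; NATS code would look quite different). The broker address, topic name, and event shape are assumptions for illustration, not part of the original answer.

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ clientId: "user-service", brokers: ["localhost:9092"] });

    // Producer side: the service owning the data publishes a state-change event.
    export async function publishUserUpdated(userId: string, payload: object): Promise<void> {
      const producer = kafka.producer();
      await producer.connect();
      await producer.send({
        topic: "user-events",
        messages: [{ key: userId, value: JSON.stringify({ type: "UserUpdated", userId, ...payload }) }],
      });
      await producer.disconnect();
    }

    // Consumer side: a dependent service keeps its denormalized copy in sync.
    export async function runUserEventConsumer(): Promise<void> {
      const consumer = kafka.consumer({ groupId: "file-service" });
      await consumer.connect();
      await consumer.subscribe({ topic: "user-events", fromBeginning: true });
      await consumer.run({
        eachMessage: async ({ message }) => {
          const event = JSON.parse(message.value?.toString() ?? "{}");
          if (event.type === "UserUpdated") {
            // apply the change to the local read-model / denormalized table here
          }
        },
      });
    }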

Eventual Consistency in microservice-based architecture temporarily limits functionality

I'll illustrate my question with Twitter. Twitter has a microservice-based architecture, which means that different processes run on different servers and have different databases.
A new tweet appears: service A stores some data in its own database, generates new events, and fires them. Services B and C haven't received these events yet, so they haven't stored anything in their databases or processed anything.
The user who created the tweet now wants to edit it. For that to work, all three services A, B, and C should have processed all the events and stored all the required data, but services B and C aren't consistent yet. That means we cannot provide the edit functionality at the moment.
As far as I can see, one possible workaround is switching to immediate consistency, but that would take away the benefits of a microservice-based architecture and probably cause tight-coupling problems.
Another workaround is to restrict the user's actions for some time until the data is consistent across all necessary services. That might be a solution, depending on the customer and their business requirements.
Yet another workaround is to add extra logic, or perhaps a service D, that stores edits as user actions and applies them to the data only once it is consistent. The drawback is greatly increased system complexity.
And there are two-phase commits, but they are 1) not really reliable and 2) slow.
I think slowness is a huge drawback under loads like Twitter's. It could probably be solved, but the lack of reliability cannot be, at least not without increasing the complexity of the solution.
So, the questions are:
Are there any nice solutions to the illustrated situation, or only the things I mentioned as workarounds? Maybe some programming platforms or databases?
Have I misunderstood something, and are some of the workarounds incorrect?
Is there any approach other than eventual consistency that guarantees all data will be stored and all necessary actions will be executed by the other services?
Why has eventual consistency been picked for this use case? As far as I can see, it is currently the only way to guarantee that some data will be stored or some action will be performed in an event-driven approach, where services start their work when some event is fired; following my example, that event would be "tweet is created". So, if services B and C go down, I need to be able to perform the action successfully once they are up again.
The things I would like to achieve are reliability, the ability to handle high loads, and reasonable solution complexity. Any links on related subjects would be very much appreciated.
If there are natural limitations to this approach and what I want cannot be achieved in this paradigm, that is okay too. I just need to know that this problem really isn't solved yet.
It is all about trade-offs. With eventual consistency, your example may mean that the user cannot edit for perhaps a few seconds, since most eventually consistent technologies do not take long to replicate data across nodes. So in this use case it is absolutely acceptable, since users are pretty slow in their actions.
For example:
"MongoDB is consistent by default: reads and writes are issued to the primary member of a replica set. Applications can optionally read from secondary replicas, where data is eventually consistent by default."
(from the official MongoDB FAQ)
Another alternative that is becoming more popular is to use a streaming platform such as Apache Kafka, where it is up to your architectural design how fast the stream consumer processes the data (for eventual consistency). Since the streaming platform itself is very fast, it is mostly the speed of your stream processor that determines when the data becomes available in the right place. So we are talking about milliseconds, not even seconds, in most cases.
The key thing in these sorts of architectures is to have each service be autonomous when it comes to writes: it can take the write even if none of the other application-level services are up.
So in the example of a Twitter-like service, you would model it as:
Service A manages the content of a post.
So when a user makes a post, a write happens in Service A's DB, and from that instant the post can be edited, because editing is just another request to A.
If there's some other service that consumes the "post content" change events from A and after a "new post" event exposes some functionality, that functionality isn't going to be exposed until that service sees the event (yay tautologies). But that's just physics: the sun could have gone supernova five minutes ago and we can't take any action (not that we could have) until we "see the light".
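A small sketch of that autonomy point (TypeScript, with invented names; this is an illustration, not Twitter's actual design): both creating and editing a post are handled entirely inside Service A against its own store, and other services catch up from the published events whenever they see them.

    // Service A owns post content; creating and editing both stay local to A.
    interface PostContentChanged {
      postId: string;
      body: string;
      at: number;
    }

    class PostService {
      private posts = new Map<string, string>(); // A's own database (stand-in)
      constructor(private publish: (event: PostContentChanged) => void) {}

      createPost(postId: string, body: string): void {
        this.posts.set(postId, body); // local write, no other service involved
        this.publish({ postId, body, at: Date.now() });
      }

      editPost(postId: string, body: string): void {
        if (!this.posts.has(postId)) throw new Error("unknown post");
        this.posts.set(postId, body); // editing is just another request to A
        this.publish({ postId, body, at: Date.now() });
      }
    }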

Multiple RemoteObjects - Best Practices

I have an application with about 20 models and controllers and am not using any particular framework. What is the best practice for using multiple remote objects in Flex performance-wise?
1) Method 1 - One per Component - Each component instantiates a RemoteObject for itself
2) Method 2 - Multiple in Application Root - Each controller is handled by a RemoteObject in the root
3) Method 3 - One in Application Root - Combine all controllers into one class and handle them with one RemoteObject
I'm guessing 3 will have the best performance but will be too messy to maintain and 1 would be the cleanest but would take a performance hit. What do you think?
Best practice would be "none of the above." Your Views should dispatch events that a controller or Command component would use to call your service(s) and then update your model on return of the data. Your Views would be bound to the data, and then the Views would automatically be updated with the new data.
My preference is to have one service Class per different piece or type of data I am retrieving--this makes it easier to build mock services that can be swapped for real services as needed depending on what you're doing (for instance if you have a complicated server setup, a developer who is working on skinning would use the mocks). But really, how you do that is a matter of personal preference.
So, where do your services live, so that a controller or command can reach them? If you use a Dependency Injection framework such as Robotlegs or Swiz, it will have a separate object that handles instantiating, storing, and returning instances of model and service objects (in the case of Robotlegs, it will also create your Command objects for you and can create view-management objects called Mediators). If you don't use one of these frameworks, you'll need to "roll your own," which can be a bit difficult if you're not architecturally minded.
One thing people who don't know how to roll their own (such as the people who wrote the older versions of Cairngorm) tend to fall back on is Singletons. These are not considered good practice in this day and age, especially if you are at all interested in unit testing your work. http://misko.hevery.com/code-reviewers-guide/flaw-brittle-global-state-singletons/
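Setting the Flex specifics aside, the swappable-service idea in this answer can be sketched generically; the TypeScript below uses invented names and plain constructor injection as a stand-in for a DI framework like Robotlegs or Swiz.

    interface UserRecord {
      id: string;
      name: string;
    }

    // One service interface per type of data being retrieved.
    interface UserService {
      loadUser(id: string): Promise<UserRecord>;
    }

    class RemoteUserService implements UserService {
      async loadUser(id: string): Promise<UserRecord> {
        const response = await fetch(`/api/users/${id}`); // the real back end
        return response.json();
      }
    }

    class MockUserService implements UserService {
      async loadUser(id: string): Promise<UserRecord> {
        return { id, name: "Test User" }; // canned data while the server isn't available
      }
    }

    // A command depends only on the interface, so swapping mock for real is a
    // wiring change in the container/factory, not a change to the command.
    class LoadUserCommand {
      constructor(private service: UserService) {}
      execute(id: string): Promise<UserRecord> {
        return this.service.loadUser(id);
      }
    }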
A lot depends on how much data you have, how many times it gets refreshed from the server, and of you have to support update as well as query.
Numbers 3 (and 2) are basically singletons, which tend to work best for large applications and large datasets. Yes, it would be complex to maintain yourself, but that's why people tend to use frameworks (PureMVC, Cairngorm, etc.): much of the complexity is handled for you. Caching data within the frameworks also improves performance and response time.
The problem with 1 is that if you have to coordinate data updates per component, you basically end up writing a stateless UI, always retrieving the data from the server each time a component becomes visible.
Edit: I'm using Cairngorm. I have ~30 domain models (200 or so remote calls) and also use view models. Some of my models (remote objects) have tens of thousands of object instances (records); I keep a cache with write-back. All of the complexity is encapsulated in the controllers/commands. Performance is acceptable.
In terms of pure performance, all three of those should perform roughly the same. You'll of course use slightly more memory by having more instances of RemoteObject and there are a couple of extra bytes that get sent along with the first request that you've made with a given RemoteObject instance to your server (part of the AMF protocol). However, the effect of these things is negligible. As such, Amy is right that you should make a choice based on ease of maintainability and not performance.

Coldfusion: is it better to keep just the user_id in the session, or the whole user object?

I've got a CFC to handle the user object. My question is: is it better to store just the user_id in the session and create the user object anew with each request? Or is it better to store the whole user object in the session?
Here are my thoughts either way:
If I store the whole object in the session:
There will be potentially less processor overhead
There will be potentially more memory overhead
All of the methods/functions are stored in the actual object, so new functions that I add to the CFC will not be available unless users log out and back in, or unless I devise some way to make the object refresh itself
There could potentially be mutex or lock problems if I'm messing with the object via concurrent ajax calls
If I store just the user_id in the session:
I'll have to create the user object with each page request (potentially more processor overhead)
There will be potentially less memory overhead
There won't be a chance for mutex/lock/race conditions since each request will have its own copy of the user object
Updates to the CFC model itself will be immediately recognized across the system and users wouldn't have to log out and back in
Is there a normal practice for this sort of thing? Am I over-thinking it?
All of the CF apps I've written were targeted at high traffic levels and high availability, so we never had the luxury of being able to think about single-server practices.
So, in my experience, I always had to a) allow for multiple load-balanced servers, and b) avoid sticky-sessions on the load balancer for a number of reasons. Therefore, we needed to, at the very least, have a server become part of a cluster on the fly and pick up mid-session traffic.
So, we always pulled "session" data from a shared datastore on every request.
My suggestion is to implement a session facade.
This affords you the option to change how you persist session data (like the user record) without changing the rest of your app.
You can choose, behind the scenes, to store everything in the session scope, load it up for every request, do a hybrid, use a key-value store, whatever.
You can choose whether to eager-load data, or lazy-load data, or any mix in between, and the rest of the app doesn't need to be aware of what you've done.
On Race Conditions
If you're concerned about race conditions then I would suggest using named locks around data commit and access. This is another bonus of using a facade - your application code doesn't need to know about this, and you can choose to put locks around certain objects, as opposed to locking the whole session.
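A rough sketch of the session facade idea (TypeScript rather than CFML, and every name here is invented): the rest of the app calls the facade, and only the facade knows whether the user is eager-loaded, lazy-loaded, or fetched from a key-value store, and where any locking would go.

    interface UserProfile {
      id: string;
      name: string;
      roles: string[];
    }

    // Whatever actually persists the session data: session scope, Redis, etc.
    interface SessionBackend {
      read(key: string): Promise<unknown>;
      write(key: string, value: unknown): Promise<void>;
    }

    class SessionFacade {
      constructor(
        private backend: SessionBackend,
        private loadUser: (id: string) => Promise<UserProfile>,
      ) {}

      // Lazy-load: fetch the full user only when first asked for, then cache it.
      // Any named locking would be applied inside this method, invisible to callers.
      async getUser(): Promise<UserProfile | null> {
        const cached = (await this.backend.read("user")) as UserProfile | undefined;
        if (cached) return cached;
        const userId = (await this.backend.read("userId")) as string | undefined;
        if (!userId) return null;
        const user = await this.loadUser(userId);
        await this.backend.write("user", user);
        return user;
      }

      async hasRole(role: string): Promise<boolean> {
        const user = await this.getUser();
        return user?.roles.includes(role) ?? false;
      }
    }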
You haven't indicated whether you're using an ORM, so this is a general answer.
For typical applications, I recommend instantiating the user object into the session scope. There's a big downside to creating the object anew with each request that you didn't include in your list: changes to the user object's properties and state will not persist across requests unless you intend to flush the user object's state to your persistence layer (e.g. database) on every hit. That is likely to be a much more expensive operation than object instantiation, and it doesn't necessarily insulate you from the kinds of problems you're thinking about with respect to ajax calls, race conditions, etc -- it just transfers the manifestation of those problems to the persistence layer, where your object's data could be in an unpredictable state.
Since every new request would be an "implicit save", you would also have to design your "ephemeral" object to be able to persist itself regardless of whether it's in a valid state (imagine the case of a multi-page form that modifies some aspect of the user object).
For session-stored objects, your concerns about memory can be mitigated by careful design practices. For instance, if your user has many tasks, and each task has many items, it might be a bad idea to instantiate and compose all those objects into your user object (i.e., lazy loading would be a better approach than eager loading).
If you really must be able to change your CFCs on the fly, you can achieve that goal even with session-stored objects. One way is to store a version flag in both the application and session scopes. With each request, your app would compare those flags. When they differ, the app would run a session-reload routine that snapshots current properties, rebuilds the session-stored objects, and finally updates the session flag to match the application flag.
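For illustration only, here is a rough sketch of that version-flag routine in TypeScript (stand-ins for the application and session scopes; names and shapes are hypothetical, not CFML):

    interface AppScope {
      cfcVersion: number;
    }

    interface SessionScope {
      cfcVersion: number;
      user: { id: string; name: string };
    }

    // Stands in for re-instantiating the session-stored CFC from the new code.
    function buildUserObject(id: string, name: string) {
      return { id, name };
    }

    function onRequestStart(app: AppScope, session: SessionScope): void {
      if (session.cfcVersion !== app.cfcVersion) {
        const snapshot = { ...session.user };                        // snapshot current properties
        session.user = buildUserObject(snapshot.id, snapshot.name);  // rebuild from the new code
        session.cfcVersion = app.cfcVersion;                         // session is now up to date
      }
    }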
This is piggy-backing partially off Ken Redler's answer but I don't have enough reputation to comment.
The way we do it, and the way I prefer, is to store the user data in Session as a struct. Then on request start, our Auth Model creates the user object in the Request scope and overrides any default values with the Session data. There are a few advantages to this:
Less hits to the database, less CPU
Always run newest code without a complex custom system ensuring that
Clustered environment friendly (complex objects in Session can't be clustered)
Can add or remove properties without corruption (assuming your User object only updates dirty columns)
Also, if you're using CF9, one of the features they were really proud of is how much they optimized object instantiation. If you haven't, test it yourself!
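A compact sketch of that pattern (TypeScript stand-ins, invented names): the session holds only a plain data struct, and a fresh user object is built in the request scope on every request, so it always runs the newest code while the struct stays cluster-friendly.

    // What lives in the session: plain data only, safe to replicate or cluster.
    interface UserData {
      id: string;
      name: string;
      roles: string[];
    }

    // The behavior lives in a class that is rebuilt per request, so code changes
    // take effect immediately without logging users out.
    class User {
      constructor(private data: UserData) {}
      hasRole(role: string): boolean {
        return this.data.roles.includes(role);
      }
    }

    // Request start: defaults first, then override with whatever the session holds.
    function buildRequestUser(sessionData: Partial<UserData>): User {
      const defaults: UserData = { id: "", name: "guest", roles: [] };
      return new User({ ...defaults, ...sessionData });
    }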
It depends.
If you have a lot of traffic - in the thousands of unique visitors per minute range - the memory overhead of storing your User.cfc in the session will eventually weigh you down. This can be easily overcome by throwing hardware at it (more memory for a while, eventually more servers and a hardware load balancer). Of course popularity is a good problem to have.
If you seem to have a CPU, network or other bottleneck in your database space, you may want to have the object cached in session memory so that you have fewer hits to the database.
Why do I mention these scenarios? You may be prematurely optimizing - don't fix a problem that you don't have. Don't optimize your memory, CPU and database access until those are, or soon will be, problems.
Now from an architectural best practice - not from an optimized "what's best for my processor" - well, I can only say: It depends.
Truthfully, neither way is wrong. If you are going to find yourself needing to check credentials against your database on every request, don't cache it. If you like the feel of an object in the session, then cache it. Because you know your own domain, you can probably go back and forth all day on why you should or should not cache the user object in the session. If it's going to make it easier, do it. If it's going to make it harder, don't.
I would just warn you against doing something incredibly convoluted or anything that is not immediately obvious to a developer looking at your application - the more you write, the more you have to maintain forever, the more your co-workers will associate your name with evil.
Finally, last note, if this is a vote - I say you cache it. It makes sense and always feels good to call session.user.hasRole("xyz") or the like.
