I'm just starting to get into web development in earnest, more than just static pages, and I keep wondering: how is data stored per user on the server? When one user submits a form post and it gets looked up on another page, how can the server tell user1's FooForm data from user2's FooForm data? I know you can keep track of users with cookies and HTTP requests, but how does the server keep all these data sets available? Is there some tiny virtual machine for each user session? Or some kind of paging system where the user's content is served and then the server swaps that process out for a while, like a system thread? I've always wondered how a site with a million concurrent visitors manages to serve them all and keep all the individual user sessions contained.
I'm familiar with OS multithreaded architecture and algorithms, but the idea of doing that with anywhere from one to a million separate "threads" on anywhere from one to n separate machines is mind-blowing.
Do the algorithms change as the scale does?
I'll stop before I ask too many questions and this gets too broad, but I'd love for someone with some expertise to elucidate this for me, or point me to a good resource.
Thanks for reading.
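For anyone with the same question, the usual answer is that there is no per-user process at all: the server hands each browser a random session ID in a cookie and keeps each session's data in a store (in-process memory, a database, memcached/Redis) keyed by that ID. Here is a minimal sketch of the idea in plain Python, no framework, with all names made up for illustration:

    import http.cookies
    import secrets
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # All per-user state lives in one dict keyed by a random session ID.
    # The browser sends that ID back on every request in a cookie, so the
    # server can tell user1's data from user2's without any per-user process.
    SESSIONS = {}  # session_id -> {"foo_form": ..., "hits": ...}

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            cookies = http.cookies.SimpleCookie(self.headers.get("Cookie", ""))
            sid = cookies["sid"].value if "sid" in cookies else None
            if sid not in SESSIONS:
                sid = secrets.token_hex(16)   # new visitor: mint a fresh session
                SESSIONS[sid] = {}
            session = SESSIONS[sid]
            session["hits"] = session.get("hits", 0) + 1
            self.send_response(200)
            self.send_header("Set-Cookie", f"sid={sid}; HttpOnly")
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(f"requests in this session: {session['hits']}".encode())

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), Handler).serve_forever()

At real scale the in-memory dict is replaced by a shared store (database, memcached, Redis) so that any machine in a server farm can look a session up by its ID, which is roughly what the later questions below are about.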
Related
I'm building out a transactional web app intended for mobile devices. It'll basically just allow players in a league to submit their match scores to our league admin. I've already built it out somewhat with AngularJS/JSON services/Ionic, but it's very slow going. Changing requirements and very little time to work on it have me considering starting over in CakePHP (despite being fairly new to it and to MVC in general).
What coding practices can I follow to keep the user experience fast? My CakePHP source folder is massive compared to my Angular source folder, but if I understand correctly that won't necessarily affect the user, because most of the heavy lifting will be done by the server and presented to the client as a fairly small website, correct?
Should I try to do a big data load right when they log in so that most of the data is already client-side? Are there ways I can make the requests to/from the server smaller? Any pointers would be great.
Thanks
Without knowing the specifics of your data model, it's hard to give specific ways to optimize.
I would take a look at sending data asynchronously (client-side) with Pusher (or something home-grown) or using pagination to break up large sets of results into smaller subsets.
You can use a Real User Monitoring (RUM) service like Pingometer to track performance for actual users. It'll show what, if anything, takes time to load: network (connectivity, encryption, etc.), application code (controllers), DOM (JavaScript manipulation), or page rendering (images, CSS, etc.).
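On the pagination point: the idea is just that the client never asks for the whole result set at once, only one small page. CakePHP has pagination helpers of its own, but the underlying shape is a LIMIT/OFFSET query; a rough framework-free sketch, with the table and column names made up for illustration:

    import sqlite3

    PAGE_SIZE = 25  # small pages keep mobile responses light

    def get_scores_page(conn, league_id, page=1):
        """Return one page of match scores instead of the whole table."""
        offset = (page - 1) * PAGE_SIZE
        rows = conn.execute(
            "SELECT player, score FROM scores "
            "WHERE league_id = ? ORDER BY id DESC LIMIT ? OFFSET ?",
            (league_id, PAGE_SIZE, offset),
        ).fetchall()
        return {"page": page, "results": [dict(r) for r in rows]}

    # Tiny in-memory demo database standing in for the real one.
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, league_id INT, player TEXT, score INT)")
    conn.executemany("INSERT INTO scores (league_id, player, score) VALUES (?, ?, ?)",
                     [(1, f"player{i}", i) for i in range(100)])
    print(get_scores_page(conn, league_id=1, page=2))

The same idea applies to the initial data load: fetch the first page at login and pull the rest on demand, rather than shipping everything up front.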
My application is joined at the hip with the Facebook application. I show the user's photo and their friends from Facebook. I am debating whether to store the user's (and their friends') photos on my system. What is better for system performance: storing the photos on my system, or retrieving them from Facebook at run-time?
Unless your system has some inherent advantages (like local storage), Facebook's server setup is likely to be more optimized than your own. For example:
They use CDNs, so unless you do too, their requests will take fewer network hops.
They likely have servers in more geographic locations than you do, so on average a user will reach one of their servers in fewer hops than one of yours.
The best way to find out, though, is to test.
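A rough way to script that test, assuming you can point it at one of your own photo URLs and the same photo on Facebook's CDN (both URLs below are placeholders). Note it only measures latency from wherever the script runs, not from your users' locations, which is exactly where a CDN helps most:

    import time
    import urllib.request

    def time_fetch(url, runs=5):
        """Average wall-clock time to download a URL (very rough benchmark)."""
        total = 0.0
        for _ in range(runs):
            start = time.perf_counter()
            with urllib.request.urlopen(url) as resp:
                resp.read()
            total += time.perf_counter() - start
        return total / runs

    # Placeholder URLs: substitute a photo served from your own host and
    # the same photo served from Facebook's CDN.
    print("self-hosted :", time_fetch("https://example.com/photos/friend.jpg"))
    print("facebook cdn:", time_fetch("https://scontent.example/friend.jpg"))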
It seems some web architects aim to have a stateless web application. Does that mean basically not storing user sessions? Or is there more to it?
If it is just the user session storing, what is the benefit of not doing that?
Reduces memory usage. Imagine if Google stored session information about every one of their users.
Easier to support server farms. If you need session data and you have more than one server, you need a way to sync that session data across servers. Normally this is done using a database.
Reduces session expiration problems. Sometimes expiring sessions cause issues that are hard to find and test for; sessionless applications don't suffer from these.
URL linkability. Some sites store the ID of whatever the user is looking at in the session. This makes it impossible for users to simply copy and paste the URL or send it to friends.
NOTE: session data is really cached data, and that is what it should be used for. If you have an expensive query whose result will be reused, save the result in the session. Just remember that you cannot assume it will still be there when you try to get it later; always check whether it exists before retrieving it (see the sketch below).
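A minimal sketch of that last point, treating the session as an optional cache. The session here is just a dict standing in for whatever your framework provides, and run_expensive_query is a made-up stand-in for a slow database call:

    def run_expensive_query():
        return {"rows": 12345}          # pretend this took seconds to compute

    def get_report_totals(session):
        key = "report_totals"
        if key in session:              # always check before retrieving
            return session[key]
        totals = run_expensive_query()  # only hit the DB on a cache miss
        session[key] = totals           # stash it for later requests
        return totals

    session = {}
    print(get_report_totals(session))   # miss: runs the query
    print(get_report_totals(session))   # hit: served from the session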
From a developer's perspective, statelessness can help make an application more maintainable and easier to work with. If I know a website I'm working on is stateless, I need not worry about things being correctly initialized in the session before loading a particular page.
From a user's perspective, statelessness allows resources to be linkable. If a page is stateless, then when I link a friend to that page, I know that they'll see what I'm seeing.
From the scaling and performance perspective, see tster's answer.
Let's say I have a simple architecture where sessions are shared through a database, with multiple frontends (say F1 and F2) talking to the same backend.
My issue is the case where both frontends receive a request for the same session: a naive implementation would cause the sessions to overwrite each other (I looked at Django, which seems to fall into that case). I could try to design the backend so that it guarantees that no more than one frontend deals with a given session, but this seems hard to do correctly, especially if I want to handle frontend failures.
I can't help thinking that this case is pathological in the first place (there should never be more than one request for a given session at any time) and not worth dealing with, but I don't have much experience in web development, so maybe I am missing something. How does one usually deal with this case?
Possible solutions that I would like to avoid:
Sticky sessions: that's the solution I currently use; it's difficult to support once you have several load balancers, and more significantly it goes against the spirit of load balancing in the first place.
Putting data in cookies: for technical reasons outside my control, I cannot use cookies.
One common solution is known as session persistence: whatever routes your requests to F1 or F2 ensures that, as long as a session is active, the client with that session only goes to one frontend.
It is a common feature in almost all load balancers. For example, nginx has the ip_hash directive: http://wiki.nginx.org/NginxHttpUpstreamModule
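In nginx it is only a few lines of configuration (the server addresses are placeholders):

    upstream app_frontends {
        ip_hash;             # requests from the same client IP go to the same frontend
        server 10.0.0.11;    # F1
        server 10.0.0.12;    # F2
    }

    server {
        listen 80;
        location / {
            proxy_pass http://app_frontends;
        }
    }

One caveat: ip_hash keys on the client IP, so many clients behind one NAT or proxy will all land on the same frontend.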
We run a Drupal site, and are expecting a sudden burst of users some time soon.
What are some of the best Drupal practices to handle a sudden burst of:
- User registrations
- User Authentication
These operations are heavily dependent on the database... so, how do we optimize that?
Are there any techniques that minimize DB interaction during User Authentication? (for example: storing objects in memory, and writing them to DB at a later point in time?)
Any tips are greatly appreciated.
User authentication and registration usually aren't processes that you can cache or delay (as in MySQL's INSERT DELAYED). However, there are things you can do to alleviate some of the load. For example:
Allow users to stay logged in via a cookie so that you can avoid the DB access of re-authenticating on every visit (a small sketch follows after this list)
In general, store commonly used/small bits of data in the user's session or a memcached block
In general, cache as much as possible with memcached
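On the stay-logged-in cookie, one common shape is a signed token, so that validating it doesn't require a DB query at all (you may still want an occasional revocation check). A rough sketch, with a made-up secret and field layout:

    import hashlib
    import hmac
    import time

    SECRET = b"change-me"   # placeholder; keep the real secret out of source control

    def make_remember_cookie(user_id, days=30):
        expires = int(time.time()) + days * 86400
        payload = f"{user_id}:{expires}"
        sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        return f"{payload}:{sig}"

    def check_remember_cookie(cookie):
        """Return the user id if the cookie is valid and unexpired, else None."""
        try:
            user_id, expires, sig = cookie.rsplit(":", 2)
        except ValueError:
            return None
        payload = f"{user_id}:{expires}"
        expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return None                  # tampered or forged cookie
        if int(expires) < time.time():
            return None                  # expired
        return user_id

    cookie = make_remember_cookie("42")
    print(check_remember_cookie(cookie))   # "42"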
Some of the commercial Drupal distros (like Acquia or Pressflow) have support for multiple DBs, which may help a little. I would say that if your hardware is halfway decent, you would need a major surge before you have to worry.
User registration and user auth are usually not a problem. Having a lot of logged-in users can be a problem, however. Drupal doesn't do much caching for logged-in users, since pages look slightly different to each user when displaying user-specific content. You can cache the parts of a page that are the same for everyone to decrease the load. I don't have experience with it myself, but I've heard of a setup that did this; it won't be that easy, though.