I'm going to create a public chat/shout-box that refreshes every few seconds (almost like IRC).
My plan:
1) Build an HTML form,
2) Use JavaScript to prevent the form's normal submission,
3) Save the submitted message (this is where the problems start),
4) Show new messages (with AJAX, I guess).
I'm not sure where to save those messages! I could save them in a database, but that may be very slow, because each user requests new messages every few seconds, right? I could try to optimize, but I'm not sure how. Maybe I could save the messages somewhere else?
I don't see why database access would be slow (assuming the database is local to the web server). Data that is read constantly will almost certainly be held in RAM anyway, and databases are heavily optimized to handle queries efficiently.
Obviously you could store the data in your own data structure and save it to the database at regular intervals, but then you're reinventing the wheel, and if your server crashes you may lose data.
What I personally would do is push data to your clients rather than have them pull it from the server (which seems to be what you're planning). That way you only have to send data when someone logs in or a new message appears, and neither should happen very often.
I am making a web app with Socket.io and I want to store data for each of the rooms. The data includes some data about users, as well as the room itself, etc., all in a JavaScript object.
Now my question is if I simply have an array let rooms = [] on my server.js which I manipulate and use to store data, would that be OK?
If I deploy to production and have users on the site, would this be fine and work as expected? I am not sure if I need to implement a DB here. Thoughts?
It really depends on what you want to get out of it. Using local state (i.e. what you are doing with let rooms = []) will work just fine (I've done this and had success with it).
The downside is that your state will be in one server's memory. So if that server goes down or you restart it, you will lose all that state (all your rooms). Also, if you need to scale beyond one server then this won't work because each server would have a different list of room data. Your clients would get a different view of things depending on which server they connect to.
The reason this approach has worked for me previously was because my data was very transient and I could accept losing it. I also did not have scaling needs.
In summary, if your situation is such that:
you won't have more users than you can handle on one server instance at any given time
it's okay if your data gets reset
Then go ahead with this - it worked great for me! Otherwise, if you want to make sure your room data doesn't get reset or if you need more than one server, you will want something like a database.
I am developing an app in React + Redux and I have a recurring doubt that I can't find documentation about. Is there any performance downside if, let's say in a saga, I read data from a cookie/localStorage instead of from the state? This read would happen only once on each load.
The key thing here is performance, regardless of whether it's good or bad practice.
Thanks in advance.
First of all, what do you mean by state? In Redux, state is just a plain object (plus some methods, but still). So when you read data from it, you are just reading properties of an object.
Cookies and localStorage, on the other hand, are browser APIs, which are slower to begin with, and you not only have to read the data but also parse it (because both cookies and storage hold serialized data). So reading from storage or a cookie is definitely slower than reading from state.
You can check http://jsben.ch/nvo5G
BUT you can't keep an in-memory state object across page reloads. For that you can use storage (a pattern called persistent state). There is probably no other way to implement this: if you need to restore some state on reload, you have just two options: save the state on the client (cookies, storage, or a client-side DB) or on the server (and fetch it with a request).
These are micro-optimisations; mostly you shouldn't care about them (when the read happens just once at app start).
I was asked this question in an interview:
For a high traffic website, there is a method (say getItems()) that gets called frequently. To prevent going to the DB each time, the result is cached. However, thousands of users may be trying to access the cache at the same time, and so locking the resource would not be a good idea, because if the cache has expired, the call is made to the DB, and all the users would have to wait for the DB to respond. What would be a good strategy to deal with this situation so that users don't have to wait?
I figure this is a pretty common scenario for most high-traffic sites these days, but I don't have the experience dealing with these problems--I have experience working with millions of records, but not millions of users.
How can I go about learning the basics used by high-traffic sites so that I can be more confident in future interviews? Normally I would start a side project to learn some new technology, but it's not possible to build out a high-traffic site on the side :)
The problem you were asked about in the interview is the so-called cache miss storm: a scenario in which many users trigger regeneration of the cache at once and all hit the DB as a result.
To prevent this, first set both a soft and a hard expiration date. Let's say the hard expiration is 1 day and the soft is 1 hour. The hard one is actually set in the cache server; the soft one is stored in the cache value itself (or in another key in the cache server). The application reads from the cache, sees that the soft time has expired, sets the soft time 1 hour ahead, and hits the database. This way the next request sees the already updated time and won't trigger a cache update. It may read stale data, but that data is already being regenerated.
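The soft-expiry idea can be sketched roughly as follows. This is only an illustration under assumptions: the class name SoftExpiringValue, the loadFromDb supplier, and passing the current time in as a parameter are all made up for the example, and the hard expiry is assumed to be handled by the cache server itself.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * One cached value with a soft expiry kept next to the value.
 * The hard expiry (e.g. 1 day) is assumed to be enforced by the cache server.
 */
final class SoftExpiringValue<V> {
    private final long softTtlMillis;
    private final AtomicLong softExpiresAt;
    private volatile V value;

    SoftExpiringValue(V initial, long softTtlMillis, long nowMillis) {
        this.value = initial;
        this.softTtlMillis = softTtlMillis;
        this.softExpiresAt = new AtomicLong(nowMillis + softTtlMillis);
    }

    /** nowMillis is passed in so the behaviour is easy to follow and to test. */
    V get(long nowMillis, Supplier<V> loadFromDb) {
        long expiry = softExpiresAt.get();
        // Only the caller that wins the compare-and-set moves the soft expiry
        // forward and goes to the database; everyone else keeps the stale value.
        if (nowMillis >= expiry
                && softExpiresAt.compareAndSet(expiry, nowMillis + softTtlMillis)) {
            value = loadFromDb.get();
        }
        return value;
    }
}
```

Note that in this sketch the one caller that wins the race still waits for the database; the refresh could be handed to a background thread if that wait matters.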
The next point: you should have a procedure for cache warm-up, i.e. instead of a user request triggering the cache update, a process in your application pre-populates the new data.
The worst-case scenario is, for example, a restart of the cache server, when you don't have any data at all. In this case you should fill the cache as fast as possible, and that's where a warm-up procedure may play a vital role. Even if you don't have a value in the cache, a good strategy is to "lock" the cache (mark it as being updated), allow only one query to the database, and have the application request the resource again after a given timeout.
You would probably be better off using a distributed cache such as memcached, or another one depending on your access pattern.
You could use the Cache implementation of Google's Guava library if you want to store the values inside the application.
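From the coding point of view, you would need something like:
public V get(K key) {
    // Fast path: no locking when the value is already cached.
    V value = map.get(key);
    if (value == null) {
        synchronized (mutex) {
            // Re-check under the lock: another thread may have loaded it meanwhile.
            value = map.get(key);
            if (value == null) {
                value = db.fetch(key);
                map.put(key, value);
            }
        }
    }
    return value;
}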
where the map is a ConcurrentMap and the mutex is just
private static final Object mutex = new Object();
In this way, you will have just one request to the db per missing key.
Hope it helps! (And don't store nulls; you could create a tombstone value instead!)
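As a possible variation, here is a sketch that locks per key instead of using one shared mutex, and uses Optional.empty() as the tombstone for "no such row". The class name PerKeyLoadingCache and the loader function (standing in for db.fetch) are assumptions made for the example; ConcurrentHashMap.computeIfAbsent is the standard JDK method.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class PerKeyLoadingCache<K, V> {
    private final ConcurrentHashMap<K, Optional<V>> map = new ConcurrentHashMap<>();
    // Hypothetical stand-in for db.fetch(key); may return null when nothing is found.
    private final Function<K, V> loader;

    PerKeyLoadingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    Optional<V> get(K key) {
        // computeIfAbsent is atomic: the loader runs at most once per absent key,
        // and callers asking for other keys are not held behind one global lock.
        // Optional.empty() is stored as the tombstone, so a missing row is not
        // looked up again on every request.
        return map.computeIfAbsent(key, k -> Optional.ofNullable(loader.apply(k)));
    }
}
```

The loader should stay short and must not touch the same cache, since it runs while the map entry is being computed.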
A cache miss storm, or cache stampede, is the burst of requests to the backend when a cache entry is invalidated.
All the high-concurrency websites I've dealt with used some kind of caching front end. Be it Varnish or Nginx, they all offer microcaching and stampede suppression.
Just google for Nginx micro-caching, or Varnish stampede effect, you'll find plenty of real world examples and solutions for this sort of problem.
It all boils down to whether or not you allow requests to pass through the cache and reach the backend while an entry is in the Updating or Expired state.
Usually it's possible to actively refresh cache, holding all requests to the updating entry, and then serve them from cache.
But there is ALWAYS the question of what kind of data you should or should not be caching. If it is just a plain-text article that gets an edit or update, delaying the cache update is less problematic than if your data must be shown exactly on thousands of displays (real-time gaming, financial services, and so on).
So, the correct answer is, microcache, suppression of stampede effect/cache miss storm, and of course, knowing which data to cache when, how and why.
It is worth considering a particular type of data for caching only if its consumers can tolerate stale data (within reasonable bounds).
In that case you can define an invalidation/eviction/update policy that keeps your data up to date (in the business sense).
On update you simply replace the data item in the cache, and all new requests are answered with the new data.
Example: a stock information system. If you do not need real-time price information, it is reasonable to keep stock prices in the cache and update them every X milliseconds or seconds with an expensive remote call.
Do you really need to expire the cache? Could you instead have an incremental update mechanism that refreshes the data periodically, so that you never have to expire it?
Secondly, if you want to prevent too many users from hitting the DB at once, you can add a locking mechanism in your stored procedure (if your DB supports it) that limits how many callers reach the data at the same time. You can also cache inside the DB, so that a repeated request for exactly the same data returns a cached value.
Some applications also use a third service layer between the application and the database to protect the database from this scenario. The service layer ensures that you do not have the cache miss storm in the db
The answer is to never expire the Cache and have a background process update cache periodically. This avoids the wait and the cache-miss storms, but then why use cache in this scenario?
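A minimal sketch of the "never expire, refresh in the background" approach, assuming a single cached value. The class name BackgroundRefreshedValue and the loadFromDb supplier are hypothetical; the scheduler calls are the standard java.util.concurrent API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BackgroundRefreshedValue<V> implements AutoCloseable {
    private final Supplier<V> loadFromDb;
    private final ScheduledExecutorService scheduler;
    private volatile V current;

    BackgroundRefreshedValue(Supplier<V> loadFromDb, long period, TimeUnit unit) {
        this.loadFromDb = loadFromDb;
        this.current = loadFromDb.get(); // warm-up: readers never see an empty cache
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshing
            return t;
        });
        this.scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
    }

    /** Runs on the scheduler thread; also callable directly to force a refresh. */
    void refresh() {
        try {
            current = loadFromDb.get();
        } catch (RuntimeException e) {
            // Keep serving the previous value. An exception escaping from a
            // scheduled task would cancel all further runs.
        }
    }

    /** Readers never block and never go to the database. */
    V get() {
        return current;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Readers only ever read a volatile field, so no request waits on the database after the initial warm-up.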
If your app will crash in a cache-miss scenario, then you need to rethink your app and decide what is cache versus what is required in-memory data. Personally, I would use an in-memory database that is updated when the data changes or on a schedule, rather than a cache, and so avoid the scenario altogether.
I just want to ask about your experience. I'm designing a public website that uses jQuery Ajax for most operations. I'm getting some timeouts, and I think the hosting provider is the cause. Does anyone have experience with this and can offer some hints (especially on handling timeouts)?
Thanks in advance to all.
Esteve
If you have a half-decent host, chances are these aren't network timeouts but are rather due to insufficient hardware which causes your server-side scripts to take too long to answer. For example if you have an autocomplete field and the script goes through a database of 100,000 entries, this is a breeze for newer servers but older "budget" servers or overcrowded shared hosting servers might croak on it.
Depending on what your Ajax operations are, you may be able to break them down in shorter chunks. If you're doing database queries for example, use LIMIT and OFFSET and only return say, 5 entries at a time. When those 5 entries arrive on the client, make another Ajax call for 5 more, so from the user's point of view the entries will keep coming in and it will look fluid (instead of waiting 30s and possibly timing out before they see all entries at once). If you do this make sure you display a spiffy web 2.0 turning wheel to let the user know if they should be waiting some more or if it's done.
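The chunking loop can be sketched like this, with an in-memory list standing in for the table. The names ChunkedFetch, page and fetchAll are made up for the example; in a real setup page would be a LIMIT/OFFSET query on the server and each loop iteration would be a separate Ajax call.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

final class ChunkedFetch {
    /** Stand-in for "SELECT ... ORDER BY id LIMIT ? OFFSET ?" over an in-memory list. */
    static <T> List<T> page(List<T> rows, int limit, int offset) {
        if (offset >= rows.size()) {
            return Collections.emptyList();
        }
        return rows.subList(offset, Math.min(rows.size(), offset + limit));
    }

    /**
     * Requests one small page at a time and hands each page to onPage as soon
     * as it arrives. A page shorter than limit signals the end.
     * Returns the total number of rows delivered.
     */
    static <T> int fetchAll(List<T> rows, int limit, Consumer<List<T>> onPage) {
        int offset = 0;
        while (true) {
            List<T> chunk = page(rows, limit, offset);
            if (!chunk.isEmpty()) {
                onPage.accept(chunk); // e.g. append these entries to the page
            }
            if (chunk.size() < limit) {
                return offset + chunk.size(); // last page: stop the spinner
            }
            offset += limit;
        }
    }
}
```

With 12 rows and a limit of 5, onPage receives chunks of 5, 5 and 2 rows.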
An example:
Say, I have an AJAX chat on a page where people can talk to each other.
How is it possible to display (send) the message sent by person A to persons B, C and D while they have the chat opened?
I understand that technically it works a bit different: the chat(ajax) is reading from DB (or other source), say every second, to find out if there are new messages to display.
But I wonder if there is a method to send the new message to the rest of the people just when it is sent, and not to load the DB with 1000s of reads every second.
Please note that the AJAX chat is just an example to explain what I want, not something I want to build. I just need to know if there is a way to let all the browsers that have a specific (AJAX) page open know that there is new content on the server that should be fetched.
{sorry for my English}
Since the server cannot respond to a client without a corresponding request, you need to keep state for each user's queued message. However, this is exactly what the database accomplishes. You cannot get around this by replacing the database with something that doesn't just accomplish the same thing in a different way. That said, there are surely optimizations you could do. Keep in mind, however, that you shouldn't prematurely optimize situations like this; databases are designed to handle extremely high traffic, and it's very possible (and in fact, likely), that the scenario described will be handled just fine by the database out of the box.
What you're describing is generally referred to as the 'Comet' concept. See the Wikipedia article for details, especially implementation options (long polling, etc.).
Another answer is to have the server push changes to connected clients, that way there is just one call to the database and then the server pushes the change to all the clients. This article indicates it is possible, however I have never tried this myself.
It's very basic, but if you want to stick with a standard AJAX solution, a simple means of reducing load on the server when polling would be to get the AJAX call to forward the last collected comment ID for that client - you then use that (with the appropriate escaping) in the lookup query on the server side to ensure you only return new comments.
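A rough sketch of the "last collected ID" idea, using an in-memory list in place of the comments table. The names MessageLog, post and since are assumptions for the example; against a real database the filter would be a query along the lines of WHERE id > ?, with the ID passed as a bound parameter rather than concatenated into the SQL.

```java
import java.util.ArrayList;
import java.util.List;

final class MessageLog {
    static final class Message {
        final long id;
        final String text;

        Message(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    private final List<Message> messages = new ArrayList<>();
    private long nextId = 1;

    /** Stores a message under the next increasing ID. */
    synchronized Message post(String text) {
        Message m = new Message(nextId++, text);
        messages.add(m);
        return m;
    }

    /**
     * Returns only the messages newer than the ID the client last saw,
     * so a poll with nothing new produces an empty reply.
     */
    synchronized List<Message> since(long lastSeenId) {
        List<Message> fresh = new ArrayList<>();
        for (Message m : messages) {
            if (m.id > lastSeenId) {
                fresh.add(m);
            }
        }
        return fresh;
    }
}
```

Each client remembers the highest ID it has received and sends it with the next poll; a client that has seen nothing yet sends 0.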