I'm planning to start using Amazon EC2 and, like everyone else, I want to use Spot Instances.
It will be for a minigames server, so Spot Instances are a good fit. Players join, play a match and leave, so when a Spot Instance is terminated because of price volatility only the current match is lost; barely any data loss, which is perfectly acceptable when you save a lot of money.
Now, although players will be disconnected and reconnected to an On-Demand server when the spot price exceeds my maximum bid, I would like to know whether a force-terminated Spot Instance runs the normal shutdown command, or is simply "unplugged" so that I get no chance to disconnect players safely and save their data to the database (which would take just a few milliseconds).
As of 2015, Amazon now provides a 2-minute termination notice in the instance metadata.
A custom script can poll for the termination notice and trigger the web server's graceful shutdown and associated cleanup scripts to ensure zero impact on end users.
Source: https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termination-notices/
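For example, a minimal polling sketch in Python (using requests; the graceful-shutdown hook is a placeholder, and it assumes plain IMDSv1 access to the instance metadata):

import time
import requests

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def shutdown_gracefully():
    # Placeholder: disconnect players, flush their data to the database, stop the server.
    pass

while True:
    resp = requests.get(TERMINATION_URL, timeout=2)
    # The endpoint returns 404 until a termination is scheduled, then the termination time.
    if resp.status_code == 200:
        shutdown_gracefully()
        break
    time.sleep(5)  # the notice gives roughly two minutes of warning, so poll every few seconds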
I am new to the topic. Having read a handful of articles on it and asked a couple of people, I still do not understand what you people do about one particular problem.
There are UI clients making requests to several backend instances (for now it's irrelevant whether sessions are sticky or not), and those instances are connected to some highly available DB cluster (be it Cassandra, Elasticsearch, or something else). Say a backend instance is not specifically tied to one of the cluster's machines; instead, each of its requests to the DB may be served by a different machine.
One client creates a record; it's synchronously or asynchronously stored to one of the cluster's machines and eventually gets replicated to the rest of the DB machines. Then another client requests the list of records, the request ends up served by a distant machine that has not yet received the replicated changes, and so the client does not see the record. Well, that's bad, but not yet ugly.
Consider however that the second client hits the machine which has the record, displays it in a list, then refreshes the list, this time hits the distant machine, and again does not see the record. That's very weird behavior to observe, isn't it? It can get even worse: the client successfully requests the record, starts editing it, then tries to store the updates to the DB and this time hits a distant machine which says "I know nothing about this record you are trying to update". That's an error the user will see while doing something completely legitimate.
So what's the common practice to guard against this?
So far, I only see three solutions.
1) Not actually a solution but rather a policy: ignore the problem and instead make the cluster fast enough to guarantee that 99.999% of changes are replicated across the whole cluster in, say, 0.5 seconds (it's hard to imagine a user trying to make several consecutive requests to one record in that time; they can of course issue several read requests, but in that case they'll probably not notice inconsistency between results). And even if sometimes something goes wrong and the user faces the problem, well, we just embrace that. If the unlucky user gets unhappy and writes a complaint to us (which will happen maybe once a week or once an hour), we just apologize and move on.
2) Introduce an affinity between a user's session and a specific DB machine. This helps, but it needs explicit support from the DB, hurts load balancing, and invites complications when the DB machine goes down and the session needs to be re-bound to another machine (though with proper support from the DB I think that's possible; say, Elasticsearch can accept a routing key, and I believe if the target shard goes down it will just switch the affinity link to another shard - though I am not entirely sure; and even if re-binding happens, the other machine may contain older data :) ).
3) Rely on monotonic consistency, i.e. some method to be sure that the next request from a client will get results no older than the previous one. But, as I understand it, this approach also requires explicit support from the DB, such as being able to pass some "global version timestamp" to the cluster's balancer, which it will compare against its knowledge of each machine's latest data to determine which machines can serve the request.
Are there other good options? Or are those three considered good enough to use?
P.S. My specific problem right now is with Elasticsearch; AFAIK there is no support for monotonic reads there, though it looks like option #2 may be available.
Apache Ignite has a primary partition for each key, plus backup partitions. Unless you have the readFromBackup option set, you will always be reading from the primary partition, whose contents are expected to be reliable.
If a node goes away, a transaction (or operation) should be either propagated by remaining nodes or rolled back.
Note that Apache Ignite doesn't do Eventual Consistency but Strong Consistency instead. It means that you can observe delays during node loss, but you will not observe inconsistent data.
In Cassandra, if you use at least QUORUM consistency for both reads and writes, you will get monotonic reads. This was not the case pre-1.0, but that's a long time ago. There are some gotchas if you use server-side timestamps, but that's not the default, so it likely won't be an issue if you are on C* 2.1+.
What can get funny, since C* uses timestamps, is things that occur at the "same time". Since Cassandra is Last Write Wins, timestamps and clock drift do matter. But concurrent updates to a record will always have race conditions, so if you require strong read-before-write guarantees you can use lightweight transactions (essentially CAS operations using Paxos) to ensure no one else updates the record between your read and your update. These are slow, though, so I would avoid them unless critical.
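A minimal sketch of quorum reads and writes with the Python cassandra-driver (the keyspace, table, and column names are hypothetical):

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# Write at QUORUM ...
insert = SimpleStatement(
    "INSERT INTO records (id, body) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (42, "hello"))

# ... and read at QUORUM: read and write quorums overlap, which gives monotonic reads.
select = SimpleStatement(
    "SELECT body FROM records WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(select, (42,)).one()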
In a true distributed system, it does not matter where your record is stored in the remote cluster as long as your clients are connected to that remote cluster. In Hazelcast, a record is always stored in a partition, and each partition is owned by one of the servers in the cluster. There can be X partitions in the cluster (271 by default), and all those partitions are distributed equally across the cluster, so a 3-member cluster will have a partition distribution like 91-90-90.
Now when a client sends a record to store in the Hazelcast cluster, it already knows which partition the record belongs to, using a consistent-hashing algorithm. With that, it also knows which server owns that partition, so the client sends its operation directly to that server. This applies to all client operations - put or get. So in your case, you may have several UI clients connected to the cluster, but your record for a particular user is stored on one server in the cluster, and all your UI clients will be talking to that server for operations related to that record.
As for consistency, Hazelcast by default is a strongly consistent distributed cache, which implies that all your updates to a particular record happen synchronously, in the same thread, and the application waits until it has received an acknowledgement from the owner server (and the backup server, if backups are enabled) in the cluster.
When you connect a DB layer (this could be one or many different types of DBs running in parallel) to the cluster, the Hazelcast cluster returns data even if it's not currently present in the cluster, by reading it from the DB. So you never get a null value. For updates, you configure the cluster to send the updates downstream synchronously or asynchronously.
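A minimal sketch with the Hazelcast Python client (the map name and value are illustrative; it connects to a cluster on localhost by default):

import hazelcast

client = hazelcast.HazelcastClient()
records = client.get_map("records").blocking()

# The client routes this put directly to the partition owner; the call blocks until the
# owner (and any configured backups) acknowledge the write.
records.put("user-42", {"name": "Alice"})

# A subsequent get goes to the same partition owner, so it sees the write.
print(records.get("user-42"))

client.shutdown()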
Ah-ha, after some even more thorough study of ES discussions I found this: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html
Note how they specifically highlight the "custom value" case, recommending to use it exactly to solve my problem.
So, given that's their official recommendation, we can summarise it like this.
To fight volatile reads, we are supposed to use "preference" with a "custom" value or some other approach. To also get "read your writes" consistency, we can have all clients use "preference=_primary", because the primary shard is the first to get all writes. This, however, will probably have worse performance than the "custom" mode due to no distribution. And that's quite similar to what other people here said about Ignite and Hazelcast.
Right?
Of course that's a solution specifically for ES. Reverting to my initial question, which is a bit more generic, it turns out that options #2 and #3 are indeed considered good enough for many distributed systems, with #3 being achievable via #2 (even without direct support for #3 from the DB).
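For concreteness, here is what the "custom" preference value looks like with the Python elasticsearch client, 8.x style (the index, field, and session names are hypothetical):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# A stable per-session preference value routes every search from the same user session to
# the same set of shard copies, which removes the "now you see it, now you don't" effect.
session_id = "user-42-session"
resp = es.search(
    index="records",
    preference=session_id,
    query={"match": {"owner": "user-42"}},
)
print(resp["hits"]["total"])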
I've been trying to implement a call centre type system using Taskrouter using this guide as a base:
https://www.twilio.com/docs/tutorials/walkthrough/dynamic-call-center/ruby/rails
Project location is Australia, if that affects call details.
This system dials multiple numbers (workers), and I have run into an issue where phones will continue to ring even after the call has been accepted or cancelled.
i.e. if Taskrouter calls Workers A and B and A picks up first, A is connected to the customer, but B will continue to ring. If B then picks up the phone they are greeted by a hangup tone. Ringing can continue for minutes until B picks up (I haven't checked whether it ever times out).
Similar occurs if no one picks up and the call simply times out and is redirected to voicemail. As you can imagine, an endlessly ringing phone is pretty annoying, especially when there's no one on the other end.
I was able to replicate this issue using the above guide without modification (other than the minimum changes to set it up locally). Note that it doesn't dial workers simultaneously, rather it dials the first in line for a few seconds before moving to the next.
My interpretation of what is occurring is that Taskrouter is dialling workers, but not updating them when dialling should end, and simply moving on to the next stage of the workflow. It does update Worker status, so it knows if they've timed out for instance, but that doesn't update the actual call.
I have looked for any solutions to this and haven't found much about it except the following:
How to make Twilio stop dialing numbers when hangup() is fired?
https://www.twilio.com/docs/api/rest/change-call-state
These don't specifically apply to Taskrouter, but suggest that a call that needs to be ended can be updated and completed.
I am not too sure I can implement this, however, as it seems to use the same CallSid for all calls being dialled within a Workflow, which makes it hard/impossible to separate each call, and it would end the active call as well.
It also just seems wrong that Taskrouter wouldn't be doing this automatically, so I wanted to ask about this before I tinker too much and break things.
Has anyone run into this issue before, or is able/unable to replicate it using the tutorial code?
When testing I've noticed the problem much more on landline numbers, which may only be because mobiles have their own timeout/redirects. VOIPs seem to immediately answer calls, so they behave a bit differently.
Any help/suggestions appreciated, thanks!
Current suggestion to work around this is to not issue the dequeue instruction immediately, but rather issue a Call instruction on the REST API when the Worker wishes to accept the Inbound Call.
This will create an Outbound Call to bridge the two calls together and thus won’t have many outbound calls for the same inbound caller at once.
Your implementation will depend on the behavior that you want to achieve:
1) Do you want to simul-dial both Workers?
2) Do you want to send the task to both Workers, and whoever clicks to Accept the Task first will have the call routed to them?
If it's #2, this is a scenario where you're saying that the Worker should accept the Reservation (reservation.accepted) before issuing the Call.
If it's #1, you can either issue a Call Instruction or a Dequeue Instruction. The key is that you provide a DequeueStatusCallbackUrl or CallStatusCallbackUrl to receive call progress events. Once one of the outbound calls is connected, you will need to complete the other associated call. So you will unfortunately have to track which outbound calls are tied to which Reservation, by using AssignmentCallbacks or EventCallbacks, to make that determination within your app.
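For example, once your status callback tells you one of the outbound calls has connected, the other calls you tracked for that Reservation can be completed via the REST API. A minimal sketch with the twilio Python helper library (the credentials and the SID tracking are placeholders for whatever your app stores):

from twilio.rest import Client

client = Client("ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", "your_auth_token")

def end_other_outbound_calls(outbound_call_sids, connected_call_sid):
    # Complete every outbound call for this Reservation except the one that was answered.
    for sid in outbound_call_sids:
        if sid != connected_call_sid:
            # Setting the status to 'completed' hangs up a ringing or in-progress call.
            client.calls(sid).update(status="completed")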
I use 1 spot instance and would like to be emailed when prices for my instance size and region are above a threshold. I can then take appropriate action and shut down and move instance to another region if needed. Any ideas on how to be alerted to the prices?
There are two ways to go about this that I can think of:
1) Since you only have one instance, you could set a CloudWatch alarm for your instance in a region that will notify you when the spot price rises above what you're willing to pay hourly.
If you create an alarm, tell it to use the EstimatedCharges metric for the AmazonEC2 service, and choose a period of an hour, then you are basically telling CloudWatch to send you an email whenever the hourly spot price for your instance, in the region it's running in, is above the threshold you're willing to pay.
Once you get the email, you can then shut the instance down and start one up in another region, and leave it running with its own alarm.
2) You could automate the whole process with a client program that polls for changes in the spot price for your instance size in your desired regions.
This has the advantage that you could go one step further and use the same program to trigger instance shutdowns when the price rises and start another instance in a different region.
Amazon recently released a sample program to detect changes in spot prices by region and instance type: How to Track Spot Instance Activity with the Spot-Notifications Sample Application.
Simply combine that with the ec2 command-line tools to stop and start instances, and you don't need to do it manually.
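As a rough sketch of option 2 in Python with boto3 (the instance type, region, threshold, and SNS topic ARN are all placeholders):

import boto3
from datetime import datetime, timezone

THRESHOLD = 0.05           # USD per hour you are willing to pay
INSTANCE_TYPE = "m5.large"
REGION = "us-east-1"

ec2 = boto3.client("ec2", region_name=REGION)

# StartTime=now returns only the spot price currently in effect, one entry per AZ.
resp = ec2.describe_spot_price_history(
    InstanceTypes=[INSTANCE_TYPE],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc),
)
cheapest = min(float(p["SpotPrice"]) for p in resp["SpotPriceHistory"])

if cheapest > THRESHOLD:
    # Notify yourself, e.g. via an SNS topic subscribed to your email address.
    sns = boto3.client("sns", region_name=REGION)
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:spot-price-alerts",
        Subject="Spot price alert",
        Message=f"{INSTANCE_TYPE} in {REGION} is now ${cheapest}/hr in its cheapest AZ",
    )

Run it on a schedule (cron, etc.), and it can also be extended to trigger the stop/start logic described above.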
For stand-alone games, the basic game loop is (source: Wikipedia):
while( user doesn't exit )
check for user input
run AI
move enemies
resolve collisions
draw graphics
play sounds
end while
But what if I develop a client-server game, like Quake, Ragnarock, Trackmania, etc.?
What is the loop/algorithm for the client and server parts of the game?
It would be something like
Client:
while( user does not exit )
check for user input
send commands to the server
receive updates about the game from the server
draw graphics
play sounds
end
Server:
while( true )
check for client commands
run AI
move all entities
resolve collisions
send updates about the game to the clients
end
Client:
connect to server
while( user does not exit && connection live)
check for user input
send commands to the server
estimate outcome and update world data with 'best guess'
draw graphics
play sounds
receive updates about the game from the server
correct any errors in world data
draw graphics
play sounds
end
Server:
while( true )
check for and handle new player connections
check for client commands
sanity check client commands
run AI
move all entities
resolve collisions
sanity check world data
send updates about the game to the clients
handle client disconnects
end
The sanity checks on the client commands and world data are to remove any 'impossible' situations caused either by deliberate cheating (moving too fast, through walls etc) or lag (going through a door that the client thinks is open, but the server knows is closed, etc).
In order to handle lag between the client and the server, the client has to make a best guess about what will happen next (using its current world data and the client commands) - the client then has to handle any discrepancies between what it predicted would happen and what the server later tells it actually happened. Normally this will be close enough that the player doesn't notice the difference - but if lag is significant, or the client and server are out of sync (for example due to cheating), then the client will need to make an abrupt correction when it receives data back from the server.
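A minimal sketch of that prediction-and-correction idea in Python (the movement speed, snap threshold, and one-dimensional position are all illustrative):

predicted_x = 0.0

def predict(command, dt):
    # Apply the player's own command immediately, without waiting for the server.
    global predicted_x
    if command == "left":
        predicted_x -= 5.0 * dt
    elif command == "right":
        predicted_x += 5.0 * dt

def on_server_update(server_x, snap_threshold=0.5):
    # Reconcile with the authoritative position the server later reports.
    global predicted_x
    if abs(server_x - predicted_x) > snap_threshold:
        predicted_x = server_x      # abrupt correction when badly out of sync

predict("right", dt=0.016)          # local best guess for this frame
on_server_update(server_x=0.08)     # later, the server confirms; tiny error, so no snap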
There are also lots of issues regarding splitting sections of these processes out into separate threads to optimise response times.
One of the best ways to start is to grab an SDK from one of the games that has an active modding community - delving into how that works will provide a good overview of how it should be done.
It really isn't a simple problem. At the most basic level, you could say that the network provides the same data that the MoveEnemies part of the original loop did. So you could simply replace your loop with:
while( user doesn't exit )
check for user input
run AI
send location to server
get locations from server
resolve collisions
draw graphics
play sounds
end while
However you need to take into account latency so you don't really want to pause your main loop with calls to the network. To overcome this it is not unusual to see the networking engine sitting on a second thread, polling for data from the server as quickly as it can and placing the new locations of objects into a shared memory space:
while(connectedToNetwork)
Read player location
Post player location to server
Read enemy locations from server
Post enemy locations into shared memory
Then your main loop would look like:
while( user doesn't exit )
check for user input
run AI
read/write shared memory
resolve collisions
draw graphics
play sounds
end while
The advantage of this method is that your game loop will run as fast as it can, but the information from the server will only be updated when a full post to and from the server has been completed. Of course, you now have issues with sharing objects across threads and the fun with locks etc that comes with it.
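Here is a minimal sketch of that two-thread arrangement in Python; the "server" is a stub so the example is self-contained, and all names are illustrative rather than a real engine API:

import threading
import time
import random

shared_state = {"enemies": {}}      # shared memory between the two loops
state_lock = threading.Lock()       # protects shared_state
running = True

def fake_server_exchange(player_pos):
    # Stand-in for the post-to/read-from-server round trip.
    time.sleep(0.05)                # simulated network latency
    return {"enemy-1": (random.random(), random.random())}

def network_loop():
    # Second thread: completes full round trips as fast as it can.
    player_pos = (0.0, 0.0)
    while running:
        enemy_locations = fake_server_exchange(player_pos)
        with state_lock:            # post the new locations into shared memory
            shared_state["enemies"] = enemy_locations

def game_loop():
    # Main thread: runs every frame using whatever the network thread last posted.
    for frame in range(10):         # a few frames instead of "while user does not exit"
        with state_lock:
            enemies = dict(shared_state["enemies"])   # copy so the lock is held briefly
        # ... run AI, resolve collisions, draw graphics using `enemies` ...
        print(f"frame {frame}: {enemies}")
        time.sleep(1 / 60)

threading.Thread(target=network_loop, daemon=True).start()
game_loop()
running = False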
On the server side the loop is much the same. There is one connection per player (quite often each player is also on a separate thread, so that the latency of one won't affect the others), and for each connection it will run a loop like:
while (PlayerConnected)
Wait for player to post location
Place new location in shared memory
When the client machine requests the locations of enemies, the server reads all the other players' locations from the shared block of memory and sends them back.
This is a hugely simplified overview, and there are many more tweaks that will improve performance (for instance, it may be worth having the server push the enemy positions to the client rather than the client requesting them). You also need to decide where certain logical decisions are made (does the client decide whether he has been shot, because he has the most up-to-date position for himself, or does the server, to stop cheating?).
The client part is basically the same, except replace
run AI
move enemies
resolve collisions
with
upload client data to server
download server updates
And the server just does:
while (game is running)
{
get all clients data
run AI
resolve collisions
update all clients
}
You can use almost the same thing, but most of your logic would be on the server; you can put timers, sounds, graphics, and other UI components in the client app.
Any business rule (AI, movement) goes on the server side.
A very useful and I would argue pertinent paper to read is this one: Client-Server Architectures
I gave it a read and learned a lot from it; it made a lot of sense. By separating out your game into strategically defined components or layers, you can create a more maintainable architecture. The program is easier to code, and more robust, than a conventional linear program model like the one you've described.
That thought process also came out in a previous post here about using "shared memory" to talk between different parts of the program, and so overcoming the limitations of having a single thread and step-by-step game logic.
You can spend months working on the perfect architecture and program flow, read a single paper and realise you've been barking up the wrong tree.
tldr; read it.
I'm currently trying to build an application that inherently needs good time synchronization across the server and every client. There are alternative designs for my application that can do away with this need for synchronization, but my application quickly begins to suck when it's not present.
In case I am missing something, my basic problem is this: firing an event in multiple locations at exactly the same moment. As best I can tell, the only way of doing this requires some kind of time synchronization, but I may be wrong. I've tried modeling the problem differently, but it all comes back to either a) a sucky app, or b) requiring time synchronization.
Let's assume I Really Really Do Need synchronized time.
My application is built on Google AppEngine. While AppEngine makes no guarantees about the state of time synchronization across its servers, usually it is quite good, on the order of a few seconds (i.e. better than NTP), however sometimes it sucks badly, say, on the order of 10 seconds out of sync. My application can handle 2-3 seconds out of sync, but 10 seconds is out of the question with regards to user experience. So basically, my chosen server platform does not provide a very reliable concept of time.
The client part of my application is written in JavaScript. Again we have a situation where the client has no reliable concept of time either. I have done no measurements, but I fully expect some of my eventual users to have computer clocks that are set to 1901, 1970, 2024, and so on. So basically, my client platform does not provide a reliable concept of time.
This issue is starting to drive me a little mad. So far the best thing I can think to do is implement something like NTP on top of HTTP (this is not as crazy as it may sound). This would work by commissioning 2 or 3 servers in different parts of the Internet, and using traditional means (PTP, NTP) to try to ensure their sync is at least on the order of hundreds of milliseconds.
I'd then create a JavaScript class that implemented the NTP intersection algorithm using these HTTP time sources (and the associated roundtrip information that is available from XMLHTTPRequest).
As you can tell, this solution also sucks big time. Not only is it horribly complex, it only solves one half of the problem, namely giving the clients a good notion of the current time. I then have to compromise on the server, either by allowing the clients to tell the server the current time according to them when they make a request (a big security no-no, but I can mitigate some of the more obvious abuses of this), or by having the server make a single request to one of my magic HTTP-over-NTP servers, and hoping that request completes speedily enough.
These solutions all suck, and I'm lost.
Reminder: I want a bunch of web browsers, hopefully as many as 100 or more, to be able to fire an event at exactly the same time.
Let me summarize, to make sure I understand the question.
You have an app that has a client and server component. There are multiple servers that can each be servicing many (hundreds) of clients. The servers are more or less synced with each other; the clients are not. You want a large number of clients to execute the same event at approximately the same time, regardless of which server happens to be the one they connected to initially.
Assuming that I described the situation more or less accurately:
Could you have the servers keep certain state for each client (such as the initial time of connection, in server time), and when the time of the event that needs to happen is known, notify each client with a message containing the number of milliseconds after that initial value that need to elapse before firing the event?
To illustrate:
client A connects to server S at time t0 = 0
client B connects to server S at time t1 = 120
server S decides an event needs to happen at time t3 = 500
server S sends a message to A:
S->A : {eventName, 500}
server S sends a message to B:
S->B : {eventName, 380}
This does not rely on the client time at all; just on the client's ability to keep track of time for some reasonably short period (a single session).
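A minimal sketch of the server side of this scheme in Python (the class and method names are illustrative, and the transport that actually delivers the message is left out):

import time

class EventScheduler:
    def __init__(self):
        self.connect_times = {}                 # client_id -> server time at connection

    def register_client(self, client_id):
        self.connect_times[client_id] = time.monotonic()

    def schedule(self, fire_at):
        # For each client, the offset (ms after its own connection) at which to fire.
        return {
            client_id: int((fire_at - connected_at) * 1000)
            for client_id, connected_at in self.connect_times.items()
        }

sched = EventScheduler()
sched.register_client("A")
offsets = sched.schedule(fire_at=time.monotonic() + 0.5)   # event in 500 ms
# send {"eventName": ..., "offsetMs": offsets[client_id]} to each client

Matching the illustration above, a client that connected 120 ms later than another simply receives an offset 120 ms smaller, so both fire at the same server moment.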
It seems to me that you need to listen for a broadcast event from a server in many different places. Since you can accept 2-3 seconds of variation, you could just put all your clients into long-lived comet-style requests and get the response from the server? It sounds to me like the clients wouldn't need to deal with time at all this way?
You could use ajax to do this, so you'd avoid any client-side lockups while waiting for new data.
I may be missing something totally here.
If you can assume that the clocks are reasonably stable - that is, they may be set wrong, but they tick at more-or-less the right rate - then:
Have the servers get their offset from a single defined source (e.g. one of your servers, or a database server or something).
Then have each client calculate its offset from its server (with possible round-trip complications if you want a lot of accuracy).
Store that, then use the combined offset on each client to trigger the event at the right time.
(client-time-to-trigger-event) = (scheduled-time) + (client-to-server-difference) + (server-to-reference-difference)
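A minimal sketch of the round-trip offset estimate in Python (the time endpoint URL and its JSON shape are assumptions, not part of the original answer):

import time
import requests

def estimate_offset(url="http://example.com/time"):
    # Returns (server_clock - local_clock) in seconds, NTP-style.
    t_send = time.time()
    server_time = requests.get(url).json()["now"]   # server's current epoch seconds
    t_recv = time.time()
    # Assume the request and response legs take roughly equal time.
    return server_time - (t_send + t_recv) / 2.0

The same calculation gives the server-to-reference difference, and the client simply adds both offsets to the scheduled time, as in the formula above.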
Time synchronization is very hard to get right and in my opinion the wrong way to go about it. You need an event system which can notify registered observers every time an event is dispatched (observer pattern). All observers will be notified simultaneously (or as close as possible to that), removing the need for time synchronization.
To accommodate latency, the browser should be sent the timestamp of the event dispatch, and it should wait a little longer than what you expect the maximum latency to be. This way all events will be fired at the same time on all browsers.
Google found a way to define time as being absolute. It sounds heretical to a physicist and with respect to General Relativity: time flows at a different pace depending on your position in space and time, on Earth, in the Universe ...
You may want to have a look at Google Spanner database: http://en.wikipedia.org/wiki/Spanner_(database)
I guess it is now used by Google and will be available through Google Cloud Platform.