Datacenter infrastructure - Planning a high-availability infrastructure

I'm new to datacenter infrastructure and I'm working through some exercises. I came across a question that I can't answer, so I'm here to ask for some help. The question is:
1) In planning a high-availability infrastructure with 20 machines, for which I'm going to need a SAN, someone proposed three distinct solutions:
a) A solution based on fibre-optic switches to connect the machines to the SAN, without redundancy.
b) A solution based completely on iSCSI;
c) A solution that allows me to connect only 8 machines to a SAN with redundancy;
Since none of these solutions is ideal, what solution or changes would you recommend in order to ensure access to storage from all the machines? And why?
Please, some help.
Thanks

OK, let me give you an answer that doesn't include a, b or c.
What I would do is go with a virtualized architecture (I will use VMware as an example here) where you move as many of the machines as possible into your virtualized solution.
On the hardware level you set up a SAN with dual SAN switches. The switches are wired to Fibre Channel HBAs on each machine. The actual machines would probably be blades in a blade center.
Each blade should be stuffed with as much power as you can afford; 2 x 10-core CPUs is a good choice. That way you can fit as many virtual machines into as few blades as possible.
Since you are talking H/A, you most likely want enough redundancy to lose one full blade and have its machines restarted on the remaining blades without affecting performance too badly.
Online servicing / maintenance can be handled by products like vMotion.
Based on this setup, you can continue building an H/A clustered environment.

You can use switches with more ports, or you can use a distribution topology where additional switches provide additional connectivity for additional servers.
server === \
server ==== \
server =========FC Switch 1 === two X connections === FC Switch 3
server =========FC Switch 2 === two X connections === FC Switch 4
server ==== /
server === /

Related

Fast delivery webpages on shared hosting

I have a website (.org) for a project of mine on LAMP hosted on a shared plan.
It started very small, but I've since extended this community to other states (in the US) and it's growing fast.
I had 30,000 (or so) visits per day (about 4 months ago) and my site was doing fine and today I reached 100,000 visits.
I want to make sure my site will load fast for everyone and since it's not making any money I can't really move it to a private server. (It's volunteer work).
Here's my setup:
- Apache 2
- PHP 5.1.6
- MySQL 5.5
I have 10 pages PER state, and on each page people can contribute, write articles, like, share, etc. On a few pages I can hit 10,000 hits per hour during lunch time; the rest of the day it's quiet.
All databases are set up properly (I personally paid a DBA expert to build them). I am pretty sure the code is also good. Now, I could make pages faster if I used memcached, but the problem is I can't use it since I am on shared hosting.
Will MySQL be able to support that many people, with lots of requests per minute? Or should I start a fund to move to a private server and install all the tools I need to make it fast?
Thanks
To be honest, there's not much you can do on shared hosting. There's a reason why it's cheap: it limits you from doing the kind of things you want to do.
Either you move to a VPS that allows memcached (which can be cheap) and put up some Google ads, OR you keep going on your shared hosting using a pre-generated content system.
A VPS can be very cheap (look for coupons) and you can install whatever you want since you are root.
For example, hostmysite.com with the coupon 50OffForLife is $20 per month for life ... vs. a $5 shared hosting plan ...
If you want to keep the current hosting, then what you can do is this:
Pages are generated by a process (a cron job, or on the fly) every time someone writes a comment or makes an update. This process fetches all the data for the page and saves it out as a static web page.
So let's say you have a page with comments: grab the content (meta, h1, p, etc.) and the comments, and save both into the file.
Example: (using .htaccess - based on your answer you are familiar with this)
// .htaccess rewrites /topic/1/ to this script; $db is your existing DB handle
if (file_exists('/my/public_html/topic/1/index.html')) {
    readfile('/my/public_html/topic/1/index.html');   // the file exists, simply echo it
} else {
    $page     = $db->query('SELECT * FROM pages WHERE page_id = 1');
    $comments = $db->query('SELECT * FROM comments WHERE page_id = 1');
    $content  = render_page($page, $comments);         // render_page() is your own templating code
    file_put_contents('/my/public_html/topic/1/index.html', $content);
    echo $content;
}
Or something along these lines.
Serving static HTML this way is very fast since you don't have to call the DB at all; the page just loads the file once it has been generated.
I know I'm stepping onto unstable ground by providing an answer to this question, but I think it is illustrative.
Pat R Ellery didn't provide enough detail to do any kind of assessment, but the good news is that there can never be enough detail. The explanation is quite simple: anybody can build as many mental models as they want, but the real system will always behave a bit differently.
So Pat, do test your system all the time, as much as you can. What you are trying to do is capacity planning for your solution.
You need the following:
Capacity test - To determine how many users and/or transactions a given system will support and still meet performance goals.
Stress test - To determine or validate an application’s behavior when it is pushed beyond normal or peak load conditions.
Load test - To verify application behavior under normal and peak load conditions.
Performance test - To determine or validate speed, scalability, and/or stability.
See details here:
Software performance testing
Types of Performance Testing
In other words (and a bit simplistically): if you want to know whether your system can handle N requests per time period, simulate N requests per time period and see the result.
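As an illustration only, here is a minimal sketch in Python 3 of simulating N requests and timing them; the URL, request count and concurrency are assumptions you would replace with your own:

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://example.org/topic/1/"  # hypothetical page under test
N = 200                              # total requests to simulate
CONCURRENCY = 20                     # simultaneous "users"

def hit(url):
    start = time.time()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()
    return time.time() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    timings = list(pool.map(hit, [URL] * N))

print("requests:", len(timings))
print("avg: %.3fs  max: %.3fs" % (sum(timings) / len(timings), max(timings)))

Dedicated tools will give you far better reporting than a script like this, but the principle is the same.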
There are a lot of tools available:
- Load Tester LITE
- Apache JMeter
- Apache HTTP server benchmarking tool (ab)
See the list here

memcached usage patterns

I'm planning to introduce a caching system into my website and will use it in different layers (data, presentation, and maybe somewhere else). Since my stack is LAMP and my infrastructure is 100% cloud on AWS, I thought the natural choice would be Amazon ElastiCache (a managed installation of memcached). But...
Surprisingly (for me), I discovered that memcached completely lacks dependency management. I don't need "advanced" stuff like ASP.NET cache's SqlDependency or FileDependency, but memcached doesn't even offer a simple key-on-key dependency, something pretty useful for building a dependency tree that greatly simplifies the invalidation process.
So, since I know memcached is used in many complex systems, am I missing something? Are there usage patterns that make this lack irrelevant?
thanks
UPDATE
As asked, I'm adding some pseudo-code to clarify what I mean:
dependency = 'ROOT_KEY';
cache:set(dependency, 0, NEVER_EXPIRE);
expire = 600;
cache:set('key1', obj1, expire, dependency);
cache:set('key2', obj2, expire, dependency);
...
cache:set('keyN', objN, expire, dependency);
//later, when I have to invalidate
cache:remove(dependency); //this will cause all keyX to be invalidated too
Based on the example in your question, memcached (and thus Elastic Cache) does not support any sort of key metadata like you are looking for by which you could relate such keys and operate on them as a group.
I suppose if you had only a handful of different "dependencies" you could simply use multiple ElastiCache instances, which would allow you to invalidate all items within each instance/dependency simultaneously. This of course might end up costing you more in AWS hardware than you would like, since you can only increase your cache sizes in discrete amounts. It would also mean you could not do a cache lookup without knowing the dependency/instance on which the lookup is to occur.
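A minimal sketch of that multi-instance idea (assuming Python with the pymemcache client; the endpoint hostnames are hypothetical): each "dependency" simply lives on its own memcached/ElastiCache instance and is invalidated by flushing that instance.

from pymemcache.client.base import Client

# one endpoint per "dependency" (hypothetical hostnames)
caches = {
    'users':    Client(('users-cache.example.internal', 11211)),
    'articles': Client(('articles-cache.example.internal', 11211)),
}

def cache_set(dependency, key, value, expire=600):
    caches[dependency].set(key, value, expire=expire)

def cache_get(dependency, key):
    # you must already know which dependency/instance holds the key
    return caches[dependency].get(key)

def invalidate(dependency):
    # flush the whole instance: every key in this "dependency" disappears at once
    caches[dependency].flush_all()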
For what you are trying to do, you might be able to use something like MEMORY tables in MySQL/RDS if you are looking for more of a works-out-of-the-box solution. Of course you would not want to use the RDS high-availability features or point-in-time restore, as these will break since they require writing to disk. You would basically need a standalone RDS instance doing nothing but these memory tables.
It seems none of these options however is really an exact fit for what you are looking to do, so you might need to look into either adjusting your approach (if you want to use basic AWS components), or deploying an alternate caching system on EC2.

Scaling Redis for Online Friends List

I'm having trouble working out how to implement an online friends list with Ruby and Redis (or with any NoSQL solution), just like the IM chat in e.g. Facebook Chat. My requirements are:
Approximately 1 million total users
DB stores only the user friends' ids (a Set of integer values)
I'm thinking of using a Redis cluster (which I actually don't know too much about) and implementing something along the lines of http://www.lukemelia.com/blog/archives/2010/01/17/redis-in-practice-whos-online/.
UPDATE: Our application really won't use Redis for anything other than, potentially, the online friends list. Additionally, it's really not write-heavy (most of our queries, I anticipate, will be reads for online friends).
After discussing this question in the Redis DB Google Group, my proposed solution (inspired by this article) is to SADD all my online users into a single set, and, for each of my users, to create a user:#{user_id}:friends_list set holding their friends list. Whenever a user logs in, I would SINTER that user's friends list with the current online-users set. Since we're read-heavy and not write-heavy, we would use a single master node for writes. To make it scale, we'd have a cluster of slave nodes replicating from the master, and our application would use a simple round-robin algorithm to do the SINTERs.
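A rough sketch of that basic approach (shown with Python and the redis-py client, using the key names above; a Ruby version would be analogous):

import redis

r = redis.Redis(host='localhost', port=6379)  # hypothetical master node

def user_logs_in(user_id):
    r.sadd('online_users', user_id)

def user_logs_out(user_id):
    r.srem('online_users', user_id)

def online_friends(user_id):
    # intersect the user's friends set with the global online set;
    # in production this SINTER would be round-robined across the slave nodes
    return r.sinter(f'user:{user_id}:friends_list', 'online_users')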
Josiah Carlson suggested a more refined approach:
When a user X logs on, you intersect their friends set with the
online users to get their initial online set, and you keep it with a
Y-minute TTL (any time they do anything on the site, you could update
the expire time to be Y more minutes into the future)
For every user in that 'online' set, you find their similar
'initial set', and you add X to the set
Any time a user Z logs out, you scan their friends set, and remove
Z from them all (whether they exist or not)
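A hedged sketch of that refined flow, again with redis-py; the key names and a "Y" of 15 minutes are assumptions:

import redis

r = redis.Redis()
ONLINE_TTL = 15 * 60  # "Y minutes", in seconds (assumed value)

def login(user_id):
    online_key = f'user:{user_id}:online_friends'
    # build the initial online set for X and give it a TTL
    r.sinterstore(online_key, f'user:{user_id}:friends_list', 'online_users')
    r.expire(online_key, ONLINE_TTL)
    r.sadd('online_users', user_id)
    # add X to each currently-online friend's own 'online' set
    for friend_id in r.smembers(online_key):
        r.sadd(f'user:{int(friend_id)}:online_friends', user_id)

def touch(user_id):
    # any activity on the site pushes the expiry Y more minutes into the future
    r.expire(f'user:{user_id}:online_friends', ONLINE_TTL)

def logout(user_id):
    r.srem('online_users', user_id)
    # remove Z from every friend's 'online' set, whether it exists or not
    for friend_id in r.smembers(f'user:{user_id}:friends_list'):
        r.srem(f'user:{int(friend_id)}:online_friends', user_id)
    r.delete(f'user:{user_id}:online_friends')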
What about XMPP/Jabber? It's built for massive concurrent use and is highly reliable; all you need to do is write an adapter for the user login side of things.
You should consider in-memory data grids; these platforms are targeted at providing exactly this kind of scalability,
and can usually be deployed easily as a cluster on any hardware or cloud.
See: GigaSpaces XAP, VMware GemFire or Oracle Coherence.
If you're looking for a free edition, XAP provides a Community Edition.

RavenDB - slow write/save performance?

I started porting a simple ASP.NET MVC web app from SQL to RavenDB. I noticed that the pages were faster on SQL than on RavenDB.
Drilling down with MiniProfiler, it seems the culprit is the time it takes to do session.SaveChanges() (150-220 ms). The code for saving in RavenDB looks like:
var btime = new TimeData() { Time1 = DateTime.Now, TheDay = new DateTime(2012, 4, 3), UserId = 76 };
session.Store(btime);
session.SaveChanges();
Authentication mode: When RavenDB is running as a service, I assume it is using "Windows Authentication". When deployed as an IIS application I just used the defaults, which was also "Windows Authentication".
Background: The database machine is separate from my development machine, which acts as the web server. Both databases are running on the same database machine. The test data is quite small, say 100 rows. The queries are simple, returning an object with 12 properties, 48 bytes in size. Using Fiddler to run a WCAT test against RavenDB generated higher utilization on the database machine (vs. SQL) and far fewer pages. I tried running Raven as a service and as an IIS application, but didn't see a noticeable difference.
Edit
I wanted to ensure it wasn't a problem with a) one of my machines or b) the solution I created. So I decided to try testing it on AppHarbor using another solution created by Michael Friis: the RavenDB sample app, simply adding MiniProfiler to that solution. Michael is one of the awesome guys at AppHarbor, and you can download the code here if you want to look at it.
Results from AppHarbor
You can try it here (for now):
Read: (7-12ms with a few outliers at 100+ms).
Write/Save: (197-312ms) * WOW that's a long time to save *. To test the save, just create a new "thingy" and save it. You might want to do it at least twice since the first one usually takes longer as the application warms up.
Unless we're both doing something wrong, RavenDB is very slow to save - around 10-20x slower to save than read. Given that it re-indexes asynchronously, this seems very slow.
Are there ways to speed it up or is this to be expected?
First, Ayende is "the man" behind RavenDB (he wrote it). I have no idea why he's not addressing the question; even in the Google Groups he seems to chime in once to ask some pointed questions, but rarely comes back to provide a complete answer. Maybe he's working hard to get RavenHQ off the ground?!?
Second, we experienced a similar problem. Below is a link to a discussion on Google Groups that may explain the cause:
RavenDB Authentication and 401 Response.
A reasonable question might be: "If these recommendations fix the problem, why doesn't RavenDB work that way out of the box, or at least provide documentation about how to get decent write performance?"
We played for a while with the suggestions made in the thread above and the response time did improve. In the end, though, we switched back to MySQL because it's well-tested, because we ran into this problem early (luckily), which caused concern that we might hit more problems, and finally because we did not have the time to:
fully test whether it fixed the performance problems we saw on the RavenDB Server
investigate and test the implications of using UnsafeAuthenticatedConnectionSharing & pre-authentication.
To summarize Ayende's response: you're actually measuring the sum of network latency and authentication chatter. As Joe pointed out, there are ways you can make the authentication less chatty. This does, however, arguably reduce security; clearly Microsoft built the security stack to be secure first and performant second. As the user of RavenDB, you can decide whether the default security model is more robust than you need, as it arguably is for protected server-to-server communication.
RavenDB is clearly designed to be read-oriented. Being 10-20x slower for writes than reads is entirely acceptable because writes are fully ACID and transactional.
If write speed is your limiting factor with RavenDB, you've likely not modeled your transaction boundaries properly: you may be saving documents that are too similar to RDBMS table rows rather than actual, well-modeled documents.
Edit: Reading your question again and looking at the background section, you explicitly define your test conditions to be an optimal scenario for SQL Server while being one of the least efficient methods for RavenDB. For data of that size, it would almost certainly be a single document in real-world usage.

How many users can an Amazon EC2 instance serve?

The use will be to serve dynamic content from data on S3. You can make up any definition of "normal" you think is normal.
What about small, medium, and large instances?
Ok. People want some data to work with, so here:
The web service is about 100 KB at the start, and uses AJAX, so it doesn't have to reload the whole page much, if at all. When it loads the page, it will send between 20 and 30 requests to the database (S3) to get small chunks of text (like comments). The average user will stay on the page for 10 minutes, translating to about 100 KB at the outset and about 400 KB more through requests. Assume that hit volume is the same at night and during the day.
It depends on what you're serving the content with and how, not to mention how often those users will be accessing it, the size and type of the content, and so on. There's essentially not one bit of information you've provided that allows us to answer your question in any sort of meaningful way.
As others have said, this might require testing under your exact conditions. Fortunately, if you're willing to go as far as setting up a test version of your server setup, you can spawn instances that simulate users. Create a bunch of these test instances, and run Apache's ab benchmarking tool on them, directing them at your test site. If the instances are within the same availability zone as your test site, you won't be charged for bandwidth, just by the hour for the running instances. Run a test for under an hour, shutting down the test instances afterward, and it will cost you very little to organize this stress test.
As one data point, running the Apache ab tool locally on my small instance, which is serving up a database-heavy Drupal site, it reported the ability of the server to handle 45-60 requests per second. I'm assuming that ab is a reasonable tool for benchmarking, and I might be wrong there, but this is what I'm seeing.
As a suggestion, not knowing too much about your particular case, I'd move your database to an Elastic Block Store (EBS) volume. S3 is not really intended to host databases, and the latency it has might kill your performance. EBS volumes can easily be snapshotted to S3 for backup, if that's what you're worried about.
One can argue that properly designed, it doesn't matter how many users an instance can support. Ideally, when your instance is saturated, you fire up a new instance to manage the traffic.
Obviously, this grossly complicates the deployment and design.
But beyond that, an EC2 instance is effectively a low-end Linux box (depending on which model you choose).
Let's rephrase the question, how many users do you want to support?
