I would like some advice on how to set up a web server in HA with a PHP application, using virtual machines running Red Hat Linux.
The idea is to have two virtual web servers sharing a common document root via NFS or iSCSI, plus two other MariaDB servers replicating the data between them.
I have some documentation to follow, but I'd like to know your opinion, in particular about how to handle the replication of the databases, which must be redundant.
Many Thanks
Riccardo
Do not try to share the data files. MySQL/MariaDB has no mechanism for coordinating two instances of mysqld touching the same data files.
Then you talk about replication. This involves separate data files. But, if you put them on the same physical drive, what is the advantage? If the drive crashes, you lose both.
Read more about HA solutions, and make a priority list of what things "keep you up at night". Drive failure, motherboard failure, network corruption, floods, earthquakes, bats in the belfry, etc.
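If you do go with replication across two separate VMs (ideally on separate hypervisors and drives), each server keeps its own copy of the data, and what you then watch is replica health and lag. A minimal monitoring sketch, assuming Python with the pymysql package and a `monitor` user holding the REPLICATION CLIENT privilege; hostnames and credentials are placeholders:

```python
# Hypothetical health check for a MariaDB primary/replica pair on two VMs.
# Requires: pip install pymysql, and a 'monitor' user with REPLICATION CLIENT.
import pymysql

REPLICAS = ["db1.example.local", "db2.example.local"]  # placeholder hostnames

def replica_status(host):
    conn = pymysql.connect(host=host, user="monitor", password="secret",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            if row is None:
                return f"{host}: not configured as a replica"
            running = (row["Slave_IO_Running"] == "Yes"
                       and row["Slave_SQL_Running"] == "Yes")
            lag = row["Seconds_Behind_Master"]
            return f"{host}: running={running}, lag={lag}s"
    finally:
        conn.close()

if __name__ == "__main__":
    for host in REPLICAS:
        print(replica_status(host))
```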
Related
I am running a local ElasticSearch server from my own home, but would like to access its content from outside. Since I am on a dynamic IP and, besides that, do not feel comfortable opening up ports to the outside, I would like to rent a VPS somewhere, set up ElasticSearch there, and let that server be a read-only copy of the one I have at home.
As I understand it, this should be possible - however I have been unsuccessful at creating any usable version that lets another server be a read-only version of my home ES-server.
Can anyone point me to a piece of information or create a guide that would help me set this up? I am fairly familiar with ES usage, but my setup skills are still limited.
As I understand it, this should be possible
It might be possible with some workarounds, but it's definitely not built for that:
One cluster needs to be in one physical region; mainly because of latency and the stability of the network connection.
There are no read-only versions. You could only allow read access to a node (via a reverse proxy or the security plugin), but that's only a workaround.
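To illustrate that workaround: you can put a small gateway in front of the node that only forwards read-style requests. A minimal sketch, assuming Python with Flask and requests (the port, upstream address, and the rule for what counts as "read" are placeholders you would adapt; a real setup would more likely use nginx or the security plugin):

```python
# Tiny read-only gateway in front of a local Elasticsearch node.
# Requires: pip install flask requests; ES assumed on localhost:9200.
from flask import Flask, request, Response
import requests

app = Flask(__name__)
ES = "http://localhost:9200"  # placeholder for the upstream ES node

@app.route("/", defaults={"path": ""}, methods=["GET", "POST", "PUT", "DELETE", "HEAD"])
@app.route("/<path:path>", methods=["GET", "POST", "PUT", "DELETE", "HEAD"])
def proxy(path):
    # Allow only read-style requests: plain GET/HEAD, plus POSTs to _search.
    if request.method not in ("GET", "HEAD") and not path.endswith("_search"):
        return Response("read-only gateway\n", status=403)
    upstream = requests.request(
        request.method, f"{ES}/{path}",
        params=request.args, data=request.get_data(),
        headers={"Content-Type": request.headers.get("Content-Type", "application/json")})
    return Response(upstream.content, status=upstream.status_code,
                    content_type=upstream.headers.get("Content-Type"))

if __name__ == "__main__":
    app.run(port=8080)
```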
We want to use an ActiveMQ master/slave configuration based on a shared file system on Amazon EC2 - that's the final goal. The operating system should be Ubuntu 12.04, but that shouldn't make too much of a difference.
Why not a master/slave configuration based on RDS? We've tried that and it's easy to set up (including multi-AZ). However, it is relatively slow and the failover takes approximately three minutes - so we want to find something else.
Which shared file system should we use? We did some research and came to the following conclusion (which might be wrong, so please correct me):
GlusterFS is often suggested and should support multiple AZs fine.
NFSv4 should be working (while NFSv3 is said to corrupt the file system), but I didn't see too many references to it on EC2 (rather: asked for NFS, got the suggestion to use GlusterFS). Is there any particular reason for that?
Ubuntu's Ceph isn't stable yet.
Hadoop Distributed File System (HDFS) sounds like overkill to me and the NameNode would again be a single point of failure.
So GlusterFS it is? We found hardly any success stories. Instead rather dissuasive entries in the bug tracker without any real explanation: "I would not recommend using GlusterFS with master/slave shared file system with more than probably 15 enqueued/dequeued messages per second." Does anyone know why or is anyone successfully using ActiveMQ on GlusterFS with a lot of messages?
EBS or ephemeral storage? Since GlusterFS should replicate all the data, we could use ephemeral storage - or are there any advantages to using EBS? (IMHO snapshots are not relevant for our scenario.)
We'll probably try out GlusterFS, but according to Murphy's Law we'll run into problems at the worst possible moment. So we'd rather try to avoid that by (hopefully) getting a few more opinions on this. Thanks in advance!
PS: Why didn't I post this on ServerFault? It would be a better fit, but on SO there are 10 times more posts about this topic, so I stuck with the flock.
Just an idea... but with ActiveMQ 5.7 (or maybe already 5.6) you can have pluggable lockers (http://activemq.apache.org/pluggable-storage-lockers.html). So it might be an option to use the file system as storage and RDS as just a locking mechanism. Note that I have never tried this before.
I recently brought up a cluster on EC2, and I felt like I had to invent a lot of things. I'm wondering what kinds of tools, patterns, ideas are out there for how to deal with this.
Some context:
I had 3 different kinds of servers, so first I created AMIs for each of them. The first AMI had zookeeper, so step one in deploying the system was to get the zookeeper server running.
My script then made a note of the mapping between EC2's completely arbitrary and unpredictable hostnames, and the zookeeper server.
Then as I brought up new instances of the other 2 kinds of servers, the first thing I would do is ssh to the new server, and add the zookeeper server to its /etc/hosts file. Then as the server software on each instance starts up, it can find zookeeper.
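That per-instance step boils down to something like the following (a simplified sketch in Python; the ZooKeeper IP is a placeholder that would come from the mapping my launch script recorded):

```python
# Run on a freshly launched instance (e.g. via user-data or over ssh) to pin a
# stable name for the ZooKeeper node by rewriting /etc/hosts.
import sys

def pin_hostname(ip, name, hosts_file="/etc/hosts"):
    with open(hosts_file, "r+") as f:
        lines = f.readlines()
        # Drop any previous entry for this alias, then append the current mapping.
        lines = [line for line in lines if name not in line.split()]
        lines.append(f"{ip}\t{name}\n")
        f.seek(0)
        f.truncate()
        f.writelines(lines)

if __name__ == "__main__":
    pin_hostname(sys.argv[1], "zookeeper")  # e.g. pin_hostname("10.0.1.23", "zookeeper")
```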
Obviously this is a problem that lots of people have to solve, and it probably works a little bit differently in different clouds.
Are there products that address this concept? I was pretty surprised that EC2 didn't provide some kind of way to tie your own name to its name.
Thanks for any ideas.
"How to do some service discovery on Amazon EC2" seems to have some good options.
I think you might want to look at http://puppetlabs.com/mcollective/introduction/ and the suite of tools from http://puppetlabs.com in general.
From the site:
The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.
Primarily we’ll use it as a means of programmatic execution of Systems Administration actions on clusters of servers. In this regard we operate in the same space as tools like Func, Fabric or Capistrano.
I am fairly certain mcollective was built to solve exactly the problem you are trying to address. But, be forewarned, it's not a DNS-based solution, it's a method of addressing arbitrarily large and arbitrarily tagged groups of hosts.
Assume there are several computers distributed on the same network.
I install a program on all of them, and so there is a cluster,
and I can log in to it and run my applications (web server, DB server, and so on).
I don't need to configure the IPs or balance the load.
Is there any software like this now?
edit:
OK, I want to build a cluster that can provide an enterprise web server (with a DB server storing the data as well). We have lots of PCs that are only running a small program now (for shop-floor workflow control). I want to use the additional CPU and disk resources to build a service.
What purpose are you planning to serve with your cluster? That will determine the tool you want to use.
That being said, you will have to do some configuration - IPs, authentication mechanism, et cetera. If you do not tell it what you want, how will it know?
In general, if the application is not designed to be clustered, you will have more pain than advantages.
Is the current load too high for the current single-box hardware?
I have a load-balanced environment with over 10 web servers running IIS. All websites access a single file storage that hosts all the pictures. We currently have 200 GB of pictures, stored in directories of 1000 images each. Right now all the images are on a single storage device (RAID 10) connected to a single server that acts as the file server. All web servers are connected to the file server on the same LAN.
I am looking to improve the architecture so that we would have no single point of failure.
I am considering two alternatives:
Replicate the file storage to all of the web servers so that they all access the data locally
Replicate the file storage to another storage device, so that if something happens to the current storage we can switch over to it.
Obviously the main operations on the file storage are reads, but there are also a lot of write operations. What do you think is the preferred method? Any other ideas?
I am currently ruling out use of CDN as it will require an architecture change on the application which we cannot make right now.
Certain things I would normally consider before going for an architecture change are:
what are the issues with the current architecture
what am I doing wrong with the current architecture (if it has been working for a while, minor tweaks will normally solve a lot of issues)
will it allow me to grow easily (there will always be an upper limit); based on the past growth of data, you can plan this effectively
reliability
easy to maintain / monitor / troubleshoot
cost
200 GB is not a lot of data; you can go for some home-grown solution or use something like a NAS, which will allow you to expand later on, and keep a hot-swappable replica of it.
Replicating the storage to all of the web servers is a very expensive setup, and since, as you said, there are a lot of write operations, replicating to all the servers will carry a large overhead (which will only increase with the number of servers and the growing data). There is also the issue of stale data being served by one of the other nodes. Apart from that, troubleshooting replication issues will be a mess with 10 and growing nodes.
Unless the lookup/read/write of files is very time-critical, replicating to all the web servers is not a good idea. Web users will hardly notice a difference of 100-200 ms in load time.
There are some enterprise solutions for this sort of thing, but I don't doubt that they are expensive. A NAS doesn't scale well, and you still have a single point of failure, which is not good.
There are some ways that you can write code to help with this. You could cache the images on the web servers the first time they are requested; this will reduce the load on the image server.
You could set up a master/slave arrangement, so that you have one main image server and other servers that copy from it. You could load-balance these, and put some logic in your code so that if a slave doesn't have a copy of an image, you check the master. You could also assign these in priority order, so that if the master is not available, the first slave becomes the new master.
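A rough sketch of that fallback lookup, with hypothetical internal hostnames (your stack is IIS/.NET, so treat this Python as a sketch of the idea rather than drop-in code):

```python
# "Try the slaves first, fall back to the master" image lookup.
# Requires: pip install requests; hostnames are placeholders.
import requests

IMAGE_HOSTS = ["http://img-slave1.internal", "http://img-slave2.internal",
               "http://img-master.internal"]  # priority order, master last

def fetch_image(path, timeout=2):
    for host in IMAGE_HOSTS:
        try:
            resp = requests.get(f"{host}/{path}", timeout=timeout)
            if resp.status_code == 200:
                return resp.content
        except requests.RequestException:
            continue  # host down or unreachable, try the next one
    raise FileNotFoundError(path)
```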
Since you have so little data in your storage, it makes sense to buy several big HDDs, or use the free space on your web servers, to keep copies. It will reduce the strain on your backend storage system, and when it fails, you can still deliver content to your users. Even better, if you need to scale (more downloads), you can simply add a new server and the stress on your backend won't change much.
If I had to do this, I'd use rsync or unison to copy the image files to the exact same path on the web servers as on the storage device (this way, you can swap the local copy out for a network file system mount at any time).
Run rsync every now and then (for example after any upload, or once a night; you'll know best which interval fits you).
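For example, something like this run from cron on the storage server would push the tree to each web server, assuming password-less ssh (hostnames and paths are placeholders):

```python
# Periodic sync job: push the image tree from the storage server to each web
# server over rsync. Run from cron, or trigger it after each upload.
import subprocess

WEB_SERVERS = ["web01", "web02"]   # hypothetical host names
SRC = "/data/images/"              # trailing slash: sync the directory contents
DEST = "/data/images/"             # same path on the web servers

def sync_all():
    for host in WEB_SERVERS:
        # -a preserves permissions/timestamps, --delete removes files gone upstream
        subprocess.run(["rsync", "-a", "--delete", SRC, f"{host}:{DEST}"], check=True)

if __name__ == "__main__":
    sync_all()
```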
A more versatile solution would be to use a P2P protocol like BitTorrent. This way, you could publish all the changes on the storage backend to the web servers and they'd optimize the updates automatically.