We have an application that can use only one MQ queue manager to put/get messages, so to get high availability we are using an active/passive cluster with IP and file system sharing between two Linux servers, with a Veritas service taking care of failing over the resources. While this is a working solution, we are running into multiple issues and are limited to using physical servers rather than VMs.
I'm looking for a solution that can replace Veritas, something that can run on a VM and takes care of moving the filesystem/IP between servers seamlessly.
There is a feature in MQ Advanced that can do this, but it is considerably more expensive than the base edition, so I'm not really interested.
If anyone has a similar MQ architecture, or knows of a product that can do this kind of failover, please share the details.
Thanks!
I have the same application running on two WAS clusters. Each cluster has 3 application servers based in different datacenters. In front of each cluster are 3 IHS servers.
Can I specify a primary cluster, and a failover cluster within the plugin-cfg.xml? Currently I have both clusters defined within the plugin, but I'm only hitting 1 cluster for every request. The second cluster is completely ignored.
Thanks!
As noted already, the WAS HTTP server plugin doesn't provide the function you're seeking, as documented in the WAS Knowledge Center: http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/rwsv_plugincfg.html?lang=en
(assuming that by "failover cluster" what is actually meant is "BackupServers" in the plugin-cfg.xml).
The ODR alternative mentioned previously likely isn't an option either, because the ODR isn't supported for use in the DMZ (it hasn't been security hardened for DMZ deployment): http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/twve_odoecreateodr.html?lang=en
From an effective HA/DR perspective, what you're seeking to accomplish should be handled at the network layer, using the global load balancer (global site selector, global traffic manager, etc.) that routes traffic into the data centers. This is usually accomplished by setting a "site cookie" via the load balancer.
This is by design. IHS, at least at the 8.5.5 level, does not allow what you are trying to do. You will have to implement that kind of high availability higher up in your topology.
There are a few options.
If the environment is relatively static, you could post-process the plugin-cfg.xml files and combine them into a single ServerCluster, with the "dc2" servers listed under <BackupServers> in the cluster. The "dc1" servers are probably already listed under <PrimaryServers>.
BackupServers are only used when no PrimaryServers are reachable.
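For illustration, a merged ServerCluster stanza might look roughly like this (server names, hostnames and ports are placeholders, and the extra attributes that the plugin generator normally emits are omitted here):

<ServerCluster Name="merged_cluster" LoadBalance="Round Robin">
  <Server Name="dc1_member1">
    <Transport Hostname="dc1-host1.example.com" Port="9080" Protocol="http"/>
  </Server>
  <Server Name="dc2_member1">
    <Transport Hostname="dc2-host1.example.com" Port="9080" Protocol="http"/>
  </Server>
  <PrimaryServers>
    <Server Name="dc1_member1"/>
  </PrimaryServers>
  <!-- dc2 members are only tried when no PrimaryServers respond -->
  <BackupServers>
    <Server Name="dc2_member1"/>
  </BackupServers>
</ServerCluster>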
Another option is to use the Java On Demand Router (ODR), which has first-class awareness of applications running in two cells. Rules can be written that dictate the behavior of applications residing in two clusters (load balancing, failover, etc.). I believe these are called "ODR routing rules".
I'm about to build a new system and I want maximum availability! I'll have to use Windows!
I will have clients talking to my system using web services. I'll also get data from surrounding systems; this data is delivered using messaging (MQ Series and MSMQ).
The system will produce some data that is sent back to the surrounding systems using queues.
After new data has arrived in the system, different processes will use this data to do different tasks, like printing, writing to databases, etc.
To achieve high availability I'm planning to have two instances of the system running in parallel on two different machines. The clients will try to use the first server that responds correctly.
I think an ideal solution would be for the incoming data from either of the two servers to be placed in a COMMON queue (on a third machine?). Data in the queue could then be picked up by processes on both servers (think producer-consumer pattern).
I think that maybe NServiceBus will suit my needs. I have a few questions regarding the above.
Can a queue be shared between two servers? I don't want data to be stuck on a server if it goes down; in that case I want the other server to keep processing.
Can two (or more) "consumers"/processes on different machines pick data from a common queue?
Any advice is welcome!
The purpose of the NSB distributor is not to address availability issues but to address scaling issues; distributors help scale out systems at a low cost.
Looking at the description, your system consists of web service endpoints, multiple databases and queuing infrastructure. If you want to achieve complete high availability you will have to make sure there are no single points of failure. In order to do that you will need:
A load-balanced web farm for the web service endpoints (2 or more servers)
An application cluster for the queues and the applications that rely on those queues.
A highly available database server, again clustered.
On top of everything, a good SAN.
But if you are only referring to being available to consumers, you just have to make sure the target queues and web service endpoints are available, and make sure the overall architecture promotes deferred execution.
Two or more applications can read an MSMQ queue remotely, but that's something you don't want to do since it's based on the DTC, and that's a real performance killer.
Some references
http://blogs.msdn.com/b/clustering/archive/2012/05/01/10299698.aspx
http://msdn.microsoft.com/en-us/library/ms190202.aspx
In short you will want to use the distributor... http://support.nservicebus.com/customer/portal/articles/859556-load-balancing-with-the-distributor
The key thing here is that the distributor node is a single point of failure, so you want to run it on a cluster.
So I have seen a number of references and links from a year +/- ago asking about support for NServiceBus on Amazon EC2. Wondering if anyone out there has attempted to do anything with this recently?
I have seen the following articles/posts but fear the information and related links are dated.
A Less Than Positive Experience w/NServiceBus on Amazon EC2
The right idea, any movement on this?
Azure Love, but no Amazon?
I see a lot of chatter on the NServiceBus forums about the "next version" having a big focus on support for the cloud (at the time the current version was 2.5). I have a scenario where I would like to run NServiceBus w/MSMQ or RabbitMQ on a cluster of Amazon EC2 instances but it concerns me that there is not more discussion around people actually using NServiceBus on Amazon.
Anyone doing it successfully or have reasons to avoid considering it?
[EDIT] - Does anyone know if using reserved instances gets around the issue with EC2 restarts described in the article above?
There are different ways to successfully run NServiceBus on EC2. Picking which option to go with requires weighing the balance of cost, scalability, & operational overhead.
MSMQ
NServiceBus runs fine on EC2 with MSMQ, but there are a few obstacles that need attention. The main issue is that the computer names / DNS names of EC2 instances change on each restart. This is an issue because the computer name is used when sending messages to an endpoint as well as when subscribing to messages. One simple option to overcome this is to attach an Elastic IP to the instance & use its DNS name. The benefit is that it's pretty easy to do. The downside is that you're only given 5 Elastic IPs by default. You can ask for more & Amazon is usually pretty liberal with handing out extra Elastic IPs. You will also be limited in how you scale. For instance, you won't be able to simply plug into the elastic scaling features of AWS. You also have to deal with backups. I would put the queues on a separate EBS volume & take snapshots on an interval.
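Not NServiceBus-specific, but as a rough sketch of the two mitigations above using boto3 (the instance ID and volume ID are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Attach an Elastic IP so the instance keeps a stable public address/DNS name
# across restarts (instance ID is a placeholder).
allocation = ec2.allocate_address(Domain="vpc")
ec2.associate_address(InstanceId="i-0123456789abcdef0",
                      AllocationId=allocation["AllocationId"])

# Snapshot the separate EBS volume that holds the queue storage on an interval
# (volume ID is a placeholder; schedule this from cron or similar).
ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                    Description="MSMQ queue storage backup")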
I'd pick this option if you want to use messaging, but you don't have really crazy SLA's, you don't need to scale up and down machines quickly, & you don't need to deal with high message volumes. This is the case with most projects.
Amazon SQS
You could write a custom transport for SQS. The benefit of using NSB with SQS remote queues is that you get highly available queues, you don't have to manage them on your EC2 instances & you don't have to worry about backups. It's also easier to leverage elastic scaling with this approach. The downside is that each read costs $$$, so it may not be economically feasible to read at the same speeds as MSMQ or RabbitMQ - although this problem is mitigated by support for long polling and the ability to download many messages in a single call. Another downside is that it doesn't support distributed transactions with the DTC. If you're using NServiceBus 5 or later, you could implement the Outbox pattern in your transport as described here, to ensure your messages are still processed only once. Otherwise it's up to you to ensure that your endpoints and handlers have idempotency solutions in place. You can play around with speed vs cost by adjusting the polling intervals of each of your endpoints & perhaps even have a back-off strategy where you reduce your polling frequency if you haven't received messages in a while. You will also have to worry about the size of your messages, as SQS has a small size limitation (256 KB). Most messages won't hit this limit.
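As a rough illustration of the long-polling/batching mitigation (plain boto3 rather than an actual NServiceBus transport; the queue name and handle() are placeholders):

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="my-endpoint-queue")["QueueUrl"]

def handle(body):
    # Placeholder for an idempotent message handler.
    print(body)

while True:
    # Long polling (WaitTimeSeconds) plus batching (MaxNumberOfMessages)
    # keeps the number of billable ReceiveMessage calls down.
    resp = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=10,  # SQS maximum per call
                               WaitTimeSeconds=20)      # SQS maximum long poll
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])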
I'd pick this option if read / write speeds aren't an issue, but you don't want to worry about operationally supporting your queuing infrastructure.
RabbitMQ
I haven't personally played with RabbitMQ on EC2, but a quick search came up with a few articles on how to get it up and running on an EC2 instance. There is a mature RabbitMQ transport available and it supports guaranteed once-only processing of messages as of NServiceBus version 5, as described in the link above. This would be cheaper to operate than SQS & I've heard that it's easier to cluster than MSMQ. Finally, like MSMQ, you would have to come up with a backup strategy (probably using snapshots).
Mixed
Nobody says that you have to pick one queuing system. You could use SQS for endpoints that need high availability & you don't mind paying the $$$, then use MSMQ / RabbitMQ for the rest of your system.
Does anyone have an experience running clustered Tigase XMPP servers on Amazon's EC2, primarily I wish to know about anything that might trip me up that is non-obvious. (For example apparently running Ejabberd on EC2 can cause issues due to Mnesia.)
Or if you have any general advice on installing and running Tigase on Ubuntu.
Extra information:
The system I’m developing uses XMPP just to communicate (in near real-time) between a mobile app and the server(s).
The number of users will initially be small, but hopefully will grow. This is why the system needs to be scalable. Presumably for just a few thousand users you wouldn’t need a cc1.4xlarge EC2 instance? (Otherwise this is going to be very expensive to run!)
I plan on using a MySQL database hosted in Amazon RDS for the XMPP server database.
I also plan on creating an external XMPP component written in Python, using SleekXMPP. It will be this external component that does all the ‘work’ of the server, as the application I’m making is quite different from instant messaging. For this part I have not worked out how to connect an external XMPP component written in Python to a Tigase server. The documentation seems to suggest that components are written specifically for Tigase - and not for a general XMPP server, using XEP-0114: Jabber Component Protocol, as I expected.
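For reference, this is roughly how I expected the connection to work, as a XEP-0114 component in SleekXMPP (the component JID, shared secret, host and port below are placeholders that would have to match whatever the Tigase side is configured with):

from sleekxmpp.componentxmpp import ComponentXMPP

class WorkerComponent(ComponentXMPP):
    def __init__(self, jid, secret, server, port):
        ComponentXMPP.__init__(self, jid, secret, server, port)
        self.add_event_handler("message", self.on_message)

    def on_message(self, msg):
        # All the real 'work' of the application would happen here;
        # this just echoes the body back for illustration.
        msg.reply("Received: %s" % msg["body"]).send()

if __name__ == "__main__":
    xmpp = WorkerComponent("worker.example.com", "secret", "xmpp.example.com", 5270)
    if xmpp.connect():
        xmpp.process(block=True)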
With this extra information, if you can think of anything else I should know about I’d be glad to know.
Thank you :)
I have lots of experience. I think there are a load of non-obvious problems. For example, the only reliable instance type to run an application like Tigase on is cc1.4xlarge. Others cause problems with CPU availability, and it is just a lottery whether you are lucky enough to run your service on a server that is not busy with other people's work.
Also, you need an instance with the highest possible I/O to make sure it can cope with the network traffic. The high I/O requirement applies especially to the database instance.
Not sure if this is obvious or not, but there is a problem with hostnames on EC2: every time you start an instance, the hostname and IP address change. A Tigase cluster is quite sensitive to hostnames. There is a way to force/change the hostname for the instance, so this might be a way around the problem.
Of course, I am talking about a cluster for millions of online users and really high traffic, 100k XMPP packets per second or more. Generally, for a large installation it is way cheaper and more efficient to have dedicated servers.
Generally Tigase runs very well on Amazon EC2, but you really need the latest SVN code, as it has lots of optimizations added especially after tests on the cloud. If you provide some more details about your service I may have some more suggestions.
More comments:
When it comes to costs, a dedicated server is always the cheaper option for a constantly running service. Unless you plan to switch servers on/off on an hourly basis, I would recommend going for a dedicated service. Costs are lower and performance is way more predictable.
However, if you really want/need to stick with Amazon EC2, let me give you some concrete numbers. Below is a list of instance configurations and how many online users the cluster was able to reliably handle:
5*cc1.4xlarge - 1.7 million online users
1*c1.xlarge - 118k online users
2*c1.xlarge - 127k online users
2*m2.4xlarge (with 5GB RAM for Tigase) - 236k online users
2*m2.4xlarge (with 20GB RAM for Tigase) - 315k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 400k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 312k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 327k online users
5*m2.4xlarge (with 60GB RAM for Tigase) - 280k online users
A few more comments:
Why does the amount of memory matter so much? Because CPU power is very unreliable and inconsistent on all but cc1.4xlarge instances. You have 8 virtual CPUs, but if you look at top you often see one CPU working and the rest idle. This insufficient CPU power leads to internal queues growing in Tigase. When the CPU power comes back, Tigase can process the waiting packets. The more memory Tigase has, the more packets can be queued and the better it handles CPU deficiencies.
Why is 5*m2.4xlarge listed 4 times? Because I repeated the tests many times on different days and at different times of day. As you can see, depending on the time and date the system could handle a different load. I guess this is because the Tigase instance shared CPU power with some other services; if they were busy, Tigase suffered from a lack of CPU power.
That said, I think with an installation of up to 10k online users you should be fine. However, other factors like roster size matter greatly, as they affect traffic and load. Also, if you have other elements which generate significant traffic, this will put load on your system.
In any case, without some tests it is impossible to tell how your system will really behave or whether it can handle the load.
And the last question, regarding the component:
Of course Tigase does support XEP-0114 and XEP-0225 for connecting external components, so this should not be a problem for components written in different languages. On the other hand, I recommend using Tigase's API for writing components. They can be deployed either as internal Tigase components or as external components, and this is transparent to the developer; you do not have to worry about it at development time. This is part of the API and framework.
Also, you can use all the goodies from the Tigase framework: scripting capabilities, monitoring, statistics, and much easier development, as you can easily deploy your code as an internal component for tests.
You really do not have to worry about any XMPP-specific stuff; you just fill in the body of the processPacket(...) method and that's it.
There should be enough online documentation for all of this on the Tigase website.
Also, I would suggest reading about Python's support for multi-threading and how it behaves under very high load. It used to be not so great.
Assume there are several computers, distributed in the same network.
I install a program on all of them, so there is a cluster, and I can log in to it and run my applications (like a web server, a DB server, and so on).
I don't need to configure the IPs, and I don't need to balance the load.
Is there any software like this now?
edit:
OK, I want to build a cluster that can provide an enterprise web server (and also a DB server to store data). We have lots of PCs; they are only running a small program now (for shop-floor workflow control). I want to use the spare CPU and disk resources to build a service.
What purpose are you planning to serve with your cluster? That will determine the tool you want to use.
That being said, you will have to do some configuration, like IPs, the authentication mechanism, et cetera. If you don't tell it what you want, how will it know?
In general, if the application is not designed to be clustered, you will have more pain than advantages.
Is the current load too high for the current single-box hardware?