Horizontal Scaling of Tomcat in Microsoft Azure - session

I have been working on this for quite a while, but still have no conclusion.
I want to scale Tomcat horizontally in Microsoft Azure (1, 2, 3, ... Tomcat instances for one service). I have read lots of articles about session replication, clustering, etc. with Tomcat. Since Azure does not support multicast, there is no easy way to cluster Tomcat. Sticky sessions are not an option either, because Azure does round-robin load balancing. Setting up two services, one with Terracotta or Apache mod_jk and the other with the Tomcat instances, seems like overkill to me (if it is even doable)...
Is this even possible?
Thanks in advance for reading and answering my question. Every comment/idea is highly appreciated.

There is the new AppFabric Caching service you could use, or there are examples of using memcached on Azure; would that help?
http://code.msdn.microsoft.com/winazurememcached
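The idea behind both suggestions is that session state lives in a shared cache instead of in each Tomcat's memory, so a round-robin load balancer can send any request to any instance. A minimal Python sketch, assuming a plain dict stands in for the shared AppFabric or memcached cache; `SharedSessionStore`, `AppInstance` and `round_robin` are hypothetical names, not part of either product's API.

```python
import itertools
import uuid


class SharedSessionStore:
    """Stand-in for a shared cache (AppFabric Caching, memcached, ...)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class AppInstance:
    """An app-server instance that keeps no session data in local memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item):
        # Load the session from the shared store, modify it, write it back.
        session = self.store.get(session_id)
        session["cart"] = session.get("cart", []) + [item]
        self.store.put(session_id, session)
        return self.name, session["cart"]


def round_robin(instances):
    """Cycle through instances the way a round-robin load balancer would."""
    return itertools.cycle(instances)


store = SharedSessionStore()
balancer = round_robin([AppInstance(f"tomcat-{i}", store) for i in range(3)])
sid = str(uuid.uuid4())
results = [next(balancer).handle(sid, item) for item in ["a", "b", "c"]]
for served_by, cart in results:
    print(served_by, cart)
```

Each request is served by a different instance, yet the cart keeps growing; the price is one round trip to the cache per request instead of sticky sessions or multicast clustering.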

Why do you feel that running 2 services is overkill, exactly? If you have no issue with scaling out to n Tomcat instances, adding another service for load distribution is a perfectly acceptable solution in my book. By running that service on a minimum of 2 instances, that service itself meets the Azure SLA requirements: your uptime will be as good as it is going to get on Azure, and you avoid a SPoF (single point of failure).
You could go with a product like Terracotta, but it is also pretty straightforward to write a simple socket server that routes HTTP sessions back to a particular instance running in Windows Azure. You would have to be aware of node recycles, but that is quite doable.
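A minimal sketch of the routing decision such a socket server would make, leaving out the socket handling itself. It assumes the list of live instances is kept up to date somehow (for example from role-environment change notifications); `SessionRouter`, `route` and `node_recycled` are hypothetical names. It pins each session to one instance and re-pins it when that instance recycles.

```python
class SessionRouter:
    """Pins each session ID to one backend; re-pins it if that backend goes away."""

    def __init__(self, backends):
        self.live = list(backends)
        self.pins = {}  # session_id -> backend
        self._counter = 0

    def route(self, session_id):
        backend = self.pins.get(session_id)
        if backend is None or backend not in self.live:
            # New session, or its instance was recycled: pick the next live
            # backend round-robin. Session data held only on the old instance
            # is lost at this point.
            backend = self.live[self._counter % len(self.live)]
            self._counter += 1
            self.pins[session_id] = backend
        return backend

    def node_recycled(self, backend):
        """Take a recycled instance out of rotation."""
        self.live.remove(backend)


router = SessionRouter(["instance-0", "instance-1"])
first = router.route("session-abc")
assert router.route("session-abc") == first  # same session, same instance
router.node_recycled(first)
print(router.route("session-abc"))  # instance-1
```

The comment in `route` is the "be aware of node recycles" part: pinning alone does not save the session data of a recycled node.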
Be aware that memcached requires an additional Azure service as well (web roles); the AppFabric Caching service does not (but it also has a cost associated with it). I do not know Tomcat, but for IIS you can easily move session state from memory to persistent storage (either SQL Azure or Azure Storage). Something to be aware of: for high-volume sites, the transaction cost of Azure Storage can actually become a cost driver for your deployment if you store session info there. SQL Azure could well be the more cost-effective solution, but on the other hand it might not be supported out of the box for your solution.

I do not think that you can run Tomcat on Azure. Even if you could (using the virtual machine role), it is probably cheaper to run it on a Linux VM on Amazon EC2.
Edit
I see that this is possible using the Tomcat Solution Accelerator. But look at the disclaimer:
"This solution accelerator is provided for informational purposes only and Microsoft or Infosys makes no warranties, either express or implied"
This is an unsupported solution. I know that it is often difficult to question management decisions. But using unsupported software for production systems, when a cheaper supported alternative is available, is generally not a good idea.

How to make a website that uses a distributed file system (Hadoop) for data management

I am new to big data technology and curious how it relates to conventional application development.
The conventional way to develop a web application is to have a hosting server (or application server) and a database to manage the data.
But let's say the website generates a huge data set (e.g. GBs per second); then the website falls into the category of managing big data.
Let's suppose I have a cluster of 20 computers, each with a 200 GB hard drive and a Core i3 processor. Then I will have enough processing and storage power for the website (of course, Hadoop is scalable too, if I need more resources).
How do I set up an application server to host the website on this cluster?
Will I need load balancers for my application server, since HTTP requests arrive at a higher velocity?
Can anyone please guide me?
Thanks in advance.
EDIT:
I just wanted to get an overview of how web application development works in relation to big data. Take Facebook: it is basically a web application. I am curious how the application servers and database management are done for Facebook.
A company as big as Facebook has to use distributed systems, e.g. Hadoop clusters, and my question relates to the same concept. But Facebook has huge clusters and understanding how they are implemented is tough, so in my question I mentioned a cluster of 20 computers. If someone has experience setting up Hadoop clusters for web application hosting, I would appreciate it if they shared that knowledge.
I don't know much about Hadoop, but if I were going to make a web site I would use Visual Studio.
https://msdn.microsoft.com/en-us/library/k4cbh4dh.aspx
https://www.youtube.com/watch?v=GIRmPB0xshw
Visual Studio Express is free and very easy to use.

AppHarbor basic questions on architecture and reliability

AppHarbor looks very appealing for our .NET solution, but I have some questions I could not find answers to on the internet.
Our major concern is reliability of dedicated SQL Server:
Is it clustered / mirrored / replicated?
What happens when they upgrade / patch / maintain the server or hosting server, and when hardware fails?
Are upgrades scheduled?
Can we set time interval when they do upgrades?
Which version and edition of SQL Server is used?
Can I use full text search?
Can I use Reporting service?
Is communication with the SQL database reliable? For example, in Azure SQL it is recommended to build in retry logic: if a command does not succeed, retry it.
Is AppHarbor reliable? Every cloud provider occasionally has some blackouts (Amazon, MS Azure ...). Is AppHarbor any less reliable compared to them? I know AppHarbor runs on top of Amazon.
Are there a lot of hidden issues you run into? What are the most common?
Did anybody decide to leave AppHarbor for a good reason?
As far as I can see, Azure is a real cloud system with all the downsides and upsides: more scalable, but with modified infrastructure like a customized SQL Server. AppHarbor more closely mimics an on-premises solution. Is my understanding correct?
How is documentation?
How is support?
Thank you for your help.
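The retry logic recommended for Azure SQL in the question can be sketched generically as below. This is a minimal sketch with exponential backoff; `run_with_retry` and `TransientError` are hypothetical names, and in a real client you would decide which driver errors count as transient (timeouts, dropped connections) and which should fail immediately.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying."""


def run_with_retry(command, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run command(); on TransientError, wait with exponential backoff and retry.

    Other exceptions propagate immediately; after the last attempt the
    transient error is re-raised.
    """
    for attempt in range(attempts):
        try:
            return command()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage: a query that fails twice before succeeding.
calls = {"n": 0}


def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection dropped")
    return "42 rows"


delays = []
result = run_with_retry(flaky_query, sleep=delays.append)
print(result, delays)  # 42 rows [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast and testable; in production the default `time.sleep` applies.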
Yes AppHarbor offers redundant/replicated dedicated SQL Server databases. These plans are available upon request.
This depends on the type of maintenance/update and your SQL Server database plan. If the database server is replicated, downtime can be minimized by failing over to the replica while performing maintenance. In the event of a server failure, the database will be attached to a new instance and the application's configuration will be automatically updated. Should a hard drive fail, leading to corrupted or lost data, AppHarbor makes daily backups that will be used to restore your database. It should be noted that hard drive failures are very rare.
We generally coordinate planned maintenance that requires downtime with customers whenever possible. Dedicated SQL Server customers can also select their own maintenance window.
Not really, but AppHarbor will reach out and coordinate with you when it is necessary.
Different SQL Server versions and editions are used depending on the plan. For single-instance dedicated SQL Servers we generally use SQL Server 2008 R2 Web Edition. Dedicated SQL Server 2012 instances are available upon request. Replicated setups require other, more expensive SQL Server editions. You may also want to consider our dedicated MySQL service if you'd like to reduce costs and don't rely on SQL Server-specific features; since AppHarbor doesn't have to pay license costs, these are less expensive, particularly for a replicated setup.
Yes.
Not by default, but we can work with you to support reporting services on your dedicated SQL Server instance.
Yes. In fact the primary reason customers upgrade from shared to dedicated SQL Server is for consistent, reliable performance.
I'd say so. The last major outage occurred on July 29th, 2012 due to an electrical storm that affected multiple availability zones in AWS's North Virginia region. As an example, our blog has been available 99.997% of the time since then. In the event of an application instance failure, applications are rapidly moved to healthy instances. We recommend running with at least two workers to ensure redundancy in those cases.
I'm admittedly not the best person to answer this question. The most common request/limitation we hear about is that you can't currently trigger a backup yourself. This will be available at a later time, but we do keep daily backups of your databases.
-
AppHarbor's cloud application platform is relatively similar to Azure in terms of scalability. We support rapid "elastic scaling" of application workers both vertically and horizontally. With regard to the dedicated SQL Server service, your understanding is correct: it is very similar to an on-premises solution. While the scaling story is different compared to SQL Azure, this allows for much greater flexibility. We can tailor a database plan and server that suits your requirements, whether you need high CPU, RAM and/or I/O performance. Similarly, we can offer database sizes that are 10x larger than SQL Azure's current 150GB database size limit.
Most documentation is available in the knowledge base. We try and keep this as up-to-date and comprehensive as possible, but if you find yourself missing some information you're of course more than welcome to let us know and we'll add it. Third party add-on providers typically maintain their own AppHarbor-specific documentation.
This is another question where I may be a little biased, but I can tell a little about our goals: our goal is to always answer non-critical support requests related to apps on both free and paid plans within the day. Critical support requests and support requests related to applications or databases on paid plans take priority. Support is included in the plans, but we're working on offering premium support options as well. We generally try to exceed your expectations and are always happy to help out and give advice on issues you experience, whether they're related to the AppHarbor platform or not.
Disclaimer: I'm a co-founder of AppHarbor.
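To put the 99.997% figure and the two-worker recommendation in perspective, a small calculation: the downtime a given availability allows per year, and the availability of two workers under the simplifying assumption that they fail independently (a regional event like the storm mentioned above violates that assumption). The 99% per-worker figure in the usage line is illustrative, not an AppHarbor number.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_minutes_per_year(availability_pct):
    """Minutes per year a service may be down at the given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR * 60


def redundant_availability(single, workers):
    """Availability when any one of `workers` independent instances suffices."""
    return 1 - (1 - single) ** workers


print(round(downtime_minutes_per_year(99.997), 1))  # 15.8
print(redundant_availability(0.99, 2))  # illustrative 99% worker, two workers
```

So 99.997% corresponds to roughly 16 minutes of downtime per year, and with independent failures a second worker turns a 1% outage probability into about 0.01%.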

MemoryCache object and load balancing

I'm writing a web application using ASP.NET MVC 3. I want to use the MemoryCache object, but I'm worried about it causing issues with load-balanced web servers. When I google it, it looks like that problem is solved on the server side, i.e. by using AppFabric. If a company has load-balanced servers, is it on them to make sure they have AppFabric or something similar running? Or is there anything I can or should do about this as a developer?
First of all, for ASP.NET you should look at the ASP.NET Cache instead of MemoryCache. MemoryCache is a generic caching API that was introduced in .NET 4.0 to provide an equivalent of the ASP.NET Cache in non-web applications.
You're correct to say that AppFabric resolves the issue of multiple servers having their own instances of cached data, in that it provides a single logical cache accessible from all your web servers. Before you leap on it as the solution to your problem, there are a couple of things to consider:
It does not ship as part of Windows Server - it is, as you say, on you to install it on your servers if you want to use it. When AppFabric was released, there was a suggestion that it would ship as part of the next release of Windows Server, but I haven't seen anything about Windows Server 2012 that confirms that to be the case.
You need extra servers for it, or at least you're advised to have them. Microsoft's recommendation for AppFabric is that you run it on dedicated servers. Which means that whilst AppFabric itself is a free download, you may be incurring additional Windows Server licence costs. Speaking of which...
You might need Enterprise Edition licences. If you want to use the High Availability features of AppFabric, you can only do this with servers running Enterprise Edition, which is a more expensive licence than Standard Edition.
You might not need it after all. Some of this will depend on your application and why you want to use a shared caching layer. If your concern is that caches on multiple servers could get out of sync with the database (or indeed each other), some judicious use of SqlCacheDependency objects might get you past the issue.
This CodeProject article, "Implementing Local MemoryCache Invalidation with Redis", suggests an approach for handling the scenario you describe.
You didn't mention the flavor of load balancing that you are using: "sticky" or "stateless". By far the easiest solution is to use sticky sessions.
If you want to use local memory caches and stateless load balancing, you can end up with race conditions when the cross-server invalidation messages arrive late. This can be particularly problematic if you use the Post-Redirect-Get pattern so common in ASP.NET MVC. This can be overcome by using cookies to supplement the cache invalidation broadcasts. I detail this in a blog post here.
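One way to implement the cookie idea, sketched under assumptions: every database write bumps a version number, the server handling the POST stores that version in the user's cookie, and on the redirected GET any server whose local copy is older than the cookie's version ignores its cache and reloads. `Database`, `LocalCache` and the dict-as-cookie are hypothetical stand-ins; this is not necessarily the approach from the blog post, and the broadcast transport (e.g. Redis pub/sub) is only represented by the `on_invalidate` hook.

```python
class Database:
    def __init__(self):
        self.rows = {}  # key -> (version, value)

    def write(self, key, value):
        version = self.rows.get(key, (0, None))[0] + 1
        self.rows[key] = (version, value)
        return version

    def read(self, key):
        return self.rows[key]


class LocalCache:
    """Per-server in-memory cache whose invalidation broadcasts may lag."""

    def __init__(self, db):
        self.db = db
        self.entries = {}  # key -> (version, value)

    def on_invalidate(self, key):
        # Called when the cross-server invalidation message finally arrives.
        self.entries.pop(key, None)

    def get(self, key, cookie):
        min_version = cookie.get(key, 0)
        cached = self.entries.get(key)
        if cached is None or cached[0] < min_version:
            # Missing, or older than what this user has already written: reload.
            cached = self.db.read(key)
            self.entries[key] = cached
        return cached[1]


db = Database()
server_a, server_b = LocalCache(db), LocalCache(db)
db.write("profile:42", "old name")
server_b.get("profile:42", cookie={})  # server B now caches version 1

# The POST is handled by server A, which records the new version in the cookie.
cookie = {"profile:42": db.write("profile:42", "new name")}
# The redirected GET lands on server B before its invalidation message arrives.
print(server_b.get("profile:42", cookie))  # new name
```

Users who did not make the write still see the stale value on server B until the broadcast arrives, which is usually acceptable; the cookie only guarantees that a user sees their own writes.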

Amazon EC2 + Windows Server 2008 + Memcached = how?

We are building a system that would benefit greatly from a distributed caching mechanism like memcached. But I can't get my head around how memcached daemons and clients find each other in an Amazon data center. Do we manually set up the IP addresses of each memcached instance (they won't be dedicated; they will run on web servers or worker boxes), or is there an automagic way of getting them to talk to each other? I was looking at Microsoft Windows Server AppFabric Caching, but it seems to need either a file share or a domain to work correctly, and I have neither at the moment. Given that internal IP addresses are transient on Amazon, I am wondering how you get around this...
I haven't set up a cluster of memcached servers before, but Membase is a solution that could take away all of the pain you are experiencing with memcached. Membase is basically memcached with a persistence layer underneath, and it comes with great cluster management software. Clustering servers together is easy, since all you need to do is tell the cluster the IP address of the new node. If you already have an application written for memcached, it will also work with Membase, since Membase uses the memcached protocol. It might be worth taking a look at.
I believe you could create an Elastic IP in EC2 for each of the boxes that hold your memcached servers. Elastic IPs can be dynamically mapped to any EC2 instance. Then your memcached clients just use the Elastic IPs as if they were static IP addresses.
http://alestic.com/2009/06/ec2-elastic-ip-internal
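Stable addresses matter because memcached daemons do not talk to each other: each client decides which server holds a key by hashing the key over its configured server list, so every client needs the same list. A minimal consistent-hashing sketch of that client-side distribution; `HashRing` is a hypothetical name, the addresses are placeholders from the documentation IP range standing in for your Elastic IPs, and real client libraries implement their own variants of the hashing.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to servers so that removing one server moves only its keys."""

    def __init__(self, servers, replicas=100):
        # Each server gets `replicas` points on the ring to even out the load.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest()[:8], 16)

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]


servers = ["203.0.113.10:11211", "203.0.113.11:11211", "203.0.113.12:11211"]
ring = HashRing(servers)
keys = [f"session:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}

after_ring = HashRing(servers[:2])  # the third server is removed
moved = sum(before[k] != after_ring.server_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

Only keys that lived on the removed server change owner; with a plain `hash(key) % len(servers)` scheme, most keys would move whenever the list changes.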
As you seem to have discovered, Route 53 is commonly used for these discovery purposes. For your specific use case, however, I would just use Amazon ElastiCache. Amazon has both Memcached- and Redis-compatible versions of ElastiCache, and they manage the infrastructure for you, including providing you with a DNS entry point. Also, for managing things like ASP.NET session state, you might consider this article on the DynamoDB session state provider.
General rule of thumb: if you are developing a new app, try to leverage what the cloud provides rather than building it yourself; it'll make your life much simpler.

Tomcat session-cluster: Is it production level? Does it scale?

I would like to hear about any experience with the Tomcat session cluster solution. Is it production level? Does it scale? Can I use it in a server farm? Do you recommend any other solution for a session cluster (e.g. database, Terracotta, JGroups, etc.)?
Another alternative would be the memcached-session-manager, a session failover solution for Tomcat: http://code.google.com/p/memcached-session-manager/
I created this project to get the best performance and reliability and to be able to scale out by just adding more Tomcat and memcached nodes.
Cheers,
Martin
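The general idea behind this kind of session failover can be sketched as follows; this is a simplified illustration, not memcached-session-manager's actual code. Each Tomcat keeps sessions in local memory and writes a copy to memcached after each request; if a request arrives at a node that does not hold the session (because the original node died), that node restores it from the backup. `BackupStore` and `TomcatNode` are hypothetical, and a dict stands in for memcached.

```python
class BackupStore:
    """Stand-in for memcached holding session backups."""

    def __init__(self):
        self.data = {}


class TomcatNode:
    def __init__(self, name, backup):
        self.name = name
        self.local = {}  # sessions held in this node's memory
        self.backup = backup

    def handle(self, session_id, key, value):
        session = self.local.get(session_id)
        if session is None:
            # Not in memory (e.g. the owning node failed): restore from backup.
            session = dict(self.backup.data.get(session_id, {}))
            self.local[session_id] = session
        session[key] = value
        # After the request, write a copy of the session to the backup store.
        self.backup.data[session_id] = dict(session)
        return session


backup = BackupStore()
node1, node2 = TomcatNode("tomcat1", backup), TomcatNode("tomcat2", backup)
node1.handle("s1", "user", "alice")
# tomcat1 goes down; the load balancer sends the next request to tomcat2:
session = node2.handle("s1", "cart", "book")
print(session)  # {'user': 'alice', 'cart': 'book'}
```

Because reads are served from local memory and memcached is only written to, the backup costs one write per request, and adding Tomcat or memcached nodes does not require the Tomcats to replicate to each other.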
From all the documentation I've read, it will work fine for a small number of instances but then becomes an issue.
We use Tomcat as our backend servers but design our applications to use as little session information as possible (basically just logins). Then we front the Tomcats with a load balancer like Apache or Nginx (the latter, which I've been favoring recently) and use sticky sessions. If a server goes offline (which is unlikely), then the user simply needs to log in again, which, depending on how you set it up, could be transparent to them.
When I was looking to do more session based clustering, Terracotta looked very impressive. But stateless design makes scaling much easier.
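When the only session state is the login, one common way to make it fully stateless is to put the username in a cookie signed with a secret shared by all servers, so any Tomcat can verify it without a session lookup. A minimal sketch using HMAC; `SECRET` is a placeholder, the function names are hypothetical, and a real cookie would also need an expiry time inside the signed payload.

```python
import base64
import hashlib
import hmac

SECRET = b"replace-with-a-shared-secret"  # placeholder; same on every server


def _sign(username):
    return hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()


def make_login_cookie(username):
    """Encode 'username|signature' as a URL-safe cookie value."""
    raw = f"{username}|{_sign(username)}".encode()
    return base64.urlsafe_b64encode(raw).decode()


def verify_login_cookie(cookie):
    """Return the username if the signature checks out, else None."""
    try:
        decoded = base64.urlsafe_b64decode(cookie.encode()).decode()
        username, sig = decoded.rsplit("|", 1)
    except ValueError:  # bad base64, bad UTF-8, or no separator
        return None
    return username if hmac.compare_digest(sig, _sign(username)) else None


cookie = make_login_cookie("alice")
print(verify_login_cookie(cookie))  # alice
print(verify_login_cookie("not-a-cookie"))  # None
```

With this design, a server going offline does not log anyone out, because no server holds the login state.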
