For my next project I have to choose between Elastic Beanstalk and Heroku. The decision will come down to two things.
Deployment, maintenance, setup, etc. (Heroku is the hands-down winner)
Cost of scalability. Is there any comparison available of the financial cost of the two? The server app will probably be answering around 100,000 to 500,000 calls a day. Which would be the better of the two?
Kind Regards
EBS is actually Elastic Block Store, not Elastic Beanstalk, which may be why few people are answering. As for cost of scalability, consider that on Heroku your unit of scale is a dyno, whereas Elastic Beanstalk requires you to step up by whole EC2 instances.
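To put the load figure from the question in perspective, here is a rough conversion of calls per day into an average request rate. It is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day, which real traffic rarely is, and `peak_factor` is a hypothetical multiplier that does not come from the question.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rps(calls_per_day: float) -> float:
    """Average requests per second if calls were spread evenly over a day."""
    return calls_per_day / SECONDS_PER_DAY


def peak_rps(calls_per_day: float, peak_factor: float = 3.0) -> float:
    """Rough peak rate; peak_factor is an assumed multiplier, tune it to your traffic."""
    return average_rps(calls_per_day) * peak_factor


for calls in (100_000, 500_000):
    print(f"{calls:>7} calls/day = {average_rps(calls):.2f} req/s on average")
```

That works out to roughly 1.16 to 5.79 requests per second on average, so the size of each scaling step (a dyno versus a whole EC2 instance) probably matters most around traffic peaks, depending on how heavy each call is.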
I have to use Hadoop for my research work and I am deciding on the best option to start with. So far I have ended up going with Cloudera. I've downloaded the QuickStart VM and started working through different tutorials.
The issue is that my system can't handle it and runs very slowly, and I think it might just stop working once I feed it all the data and run other services.
I was advised to go for a cloud service with a 4-node cluster. Can someone please help me by suggesting the best option and the estimated pricing to consider? A 1-year plan might be enough to complete my research.
Thanks.
If you are a Linux user, just download the individual components (HDFS, MR1, YARN, HBase, Hive, etc.) from the Cloudera Archives instead of loading the Cloudera QuickStart VM.
If you want to try a 4-node cluster, the easiest option is to use the cloud.
There are plenty of cloud providers. I have personally used AWS, Google Cloud, Microsoft Azure, and IBM SmartCloud. Of these, AWS is the best to start with.
It is pay as you go (pay for what you use). I recommend using decent EC2 machines (4 x m3.large).
Type: m3.large, CPU: 2, RAM: 7.5 GB, Storage: 1 x 32 GB SSD, Price: $0.133 per hour (AWS Pricing)
If you plan to do the research for one year, I recommend that you go for VPC.
Cons of AWS EC2:
If you launch a machine in EC2, its IP and hostname change the moment you restart the machine.
In AWS VPC, your IP and hostname will remain the same.
If you run 4 machines 24x7 for one month, it costs you $389.44.
You can calculate the AWS cost yourself.
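As a rough check of that figure, the arithmetic can be sketched as follows. The hourly price is the one quoted above; the 732 hours per month (30.5 days x 24 hours) is an assumption, chosen because it reproduces the $389.44 total when each instance's cost is rounded to the cent first. Prices change, so take the current rate from the AWS pricing page.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_PRICE = Decimal("0.133")  # m3.large on-demand price quoted above
HOURS_PER_MONTH = 732            # assumption: 30.5 days x 24 hours
INSTANCES = 4


def monthly_cost(hourly_price: Decimal, hours: int, instances: int) -> Decimal:
    """Monthly cost, rounding each instance's bill to the cent before summing."""
    per_instance = (hourly_price * hours).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return per_instance * instances


month = monthly_cost(HOURLY_PRICE, HOURS_PER_MONTH, INSTANCES)
print(month)       # 389.44
print(month * 12)  # 4673.28 for a full year at the same rate
```

Running the cluster only while you are actually working, rather than 24x7, scales the `hours` argument down accordingly.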
As best as I can see, you have two paths:
Set up Hadoop on a cloud service provider (e.g. Amazon's EC2 or Red Hat's OpenShift).
Use Hadoop-as-a-service (e.g. Amazon's EMR or Microsoft's HDInsight).
The first path, setting up Hadoop on a cloud service provider, will require you to become a semi-competent Hadoop administrator. If that's your goal, great! However, you'll spend a great deal of time learning the necessary skills and mindset to become one. I suspect that is not your goal.
The second path is the one I'd recommend out of these two. Using Hadoop-as-a-service, you will get up and running faster, but it will cost more up front and on an ongoing (per-hour) basis. You'll still probably save money because you'll be spending less time troubleshooting your Hadoop cluster and more time doing the work you wanted to do in the first place.
I have to wonder: if you can even fit your dataset on your laptop, why are you using big data tools in the first place? True, they'll work. However, big data is at least partially defined as data sets and computational problems that just don't fit on a single machine.
We are working on setting up our Elasticsearch backend for a production environment. Up until a few weeks ago we were using Solr, but we decided to switch to Elasticsearch for a few reasons, the biggest being the distributed nature of the backend.
With that said, we've been looking for documentation and best practices on deploying Elasticsearch using Amazon's services.
For the moment, we are considering using an extra-large box and then scaling out from there, but we aren't sure that is the best approach. For example, it may be better to have three mediums than one extra-large.
We intend to index around 100K to 150K documents per day up to around ten million docs.
The question is, can anyone provide a general environment / deployment diagram for elasticsearch or best practices in general?
There are some docs for Elasticsearch that talk about EC2 deployment. There's an autodiscovery plugin based on EC2 tags, security groups, or whatever you like. You can also choose S3 for persistence, although that may not really be necessary.
I'd advise launching it in a VPC so you can have permanent internal IPs. In regular EC2, your internal IPs will change with every reboot, even if you're using Elastic IPs.
I am building a MySQL database with a web front end for a client. The client and their staff will use this webapp on a daily basis, creating anywhere from a few thousand to possibly a few hundred thousand records annually. I just picked up a second client who wishes to have the same product and will probably be creating the same number of records annually, possibly more.
In the future I hope to pick up a few more clients. In the next few years I could have up to 5 databases & web front ends running for 5 distinct clients, all needing tight security while creating, likely, millions of records annually (cumulatively across all the databases).
I would like to run all of this with Amazon's EC2 service but am having difficulty deciding on what type of instance to run. I am not sure if I should have several distinct Linux instances, one per client, or run one "large" instance which would manage all the clients' databases and web front ends.
I know that hardware configuration is rather specific to the task at hand. The web front ends will be using jQuery to make the results of MySQL queries "pretty", and I will likely be doing some graphing of data (again with jQuery). The front ends will be using SSL for security, which I understand can add some overhead to the network speed.
I'm looking for some of your thoughts on this situation.
Thanks
Use the tools that are available. The Amazon RDS service lets you run a MySQL database in the cloud with no extra effort. You can scale it up and down as you need - start small, and then as you hit your limits, add extra capacity (at extra cost).
Next, use Elastic Load Balancing (ELB) with an SSL certificate, so you offload the overhead of SSL decryption to an Amazon service.
If you're using Java for your webapp, you could use Elastic Beanstalk to handle the whole hosting process for you.
Don't be afraid to experiment - you can always resize instances with no data loss (if they boot from an EBS volume) and you can always create and delete instances. Scaling horizontally is often better than scaling vertically, as you can spread your instances across multiple Availability Zones.
Good luck!
I've been tasked with determining if Amazon EC2 is something we should move our ecommerce site to. We currently use Amazon S3 for a lot of images and files. Our hosting costs would go up by about $20/mo, but we could sell our server for a few thousand dollars. This all came up because right now there are no procedures in place if something happens to our server.
How reliable is Amazon EC2? Is the redundancy good? I don't see anything about this in the FAQ, and it's a problem on our current system that I'm looking to solve.
Are elastic IPs beneficial? It sounds like you could point DNS to that IP and then on Amazon's end, reroute that IP address to any EC2 instance so you could easily get another instance up and running if the first one failed.
I'm aware of scalability, it's the redundancy and reliability that I'm asking about.
At work, I've had something like 20-40 instances running at all times for over a year. I think we've had 1-3 alert emails come from Amazon suggesting that we terminate and boot another instance (presumably because they were detecting possible failure in the underlying hardware). We've never had an instance go down suddenly, which seems rather good.
Elastic IPs are amazing and are part of the solution. The other part is being able to rapidly bring up new instances. I've learned that you shouldn't care about instances going down; it's more important to use proper load balancing and be able to bring up commodity instances quickly.
Yes, it's very good. If you aren't able to put together a concurrent redundancy (where you have multiple servers fulfilling requests simultaneously), using the elastic IP to quickly redirect to another EC2 instance would be a way to minimize downtime.
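The Elastic IP redirect described above can be sketched with boto3's EC2 client. This is a minimal sketch, not a complete failover procedure: it assumes a VPC Elastic IP (identified by an allocation ID), a standby instance that is already running, and that something else has detected the failure. The function name `fail_over` and the example IDs are hypothetical.

```python
def fail_over(ec2_client, allocation_id: str, standby_instance_id: str) -> str:
    """Point an Elastic IP at a standby instance and return the association ID."""
    response = ec2_client.associate_address(
        AllocationId=allocation_id,      # identifies the Elastic IP (VPC style)
        InstanceId=standby_instance_id,  # instance that should now receive traffic
        AllowReassociation=True,         # move the IP even if it is attached elsewhere
    )
    return response["AssociationId"]


# Usage, assuming boto3 is installed and AWS credentials are configured
# (the IDs below are placeholders, not real resources):
#   import boto3
#   fail_over(boto3.client("ec2"), "eipalloc-EXAMPLE", "i-EXAMPLE")
```

Because DNS keeps pointing at the same Elastic IP, clients do not need to wait for a DNS change once the address is reassociated.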
Yeah, I think moving from an in-house server to Amazon will definitely make a lot of sense economically. EBS-backed instances ensure that data on the root volume is not lost even if the machine gets rebooted. And if you have a clear separation between your application and data layers and can have them on different machines, then you can build even better redundancy for your data.
For example, if you use MySQL, you can consider using the Amazon RDS service, which gives you a highly available and reliable MySQL instance, fully managed (patches and all). The application layer can then be made more resilient by running several smaller instances behind a load balancer rather than one larger instance.
The cost you will save on is really hardware maintenance and the cost you would have to incur to build in disaster recovery.
I was curious if anyone has experimented with auto-scaling the web or DB tier in EC2 or other cloud computing infrastructure. It seems theoretically possible, but I am curious what the practical limitations are or may be.
Thanks!
We are also starting to look at auto-scaling.
The first candidate approach is to use Amazon's ELB (Elastic Load Balancer) and CloudFront. However, our traffic is a web service. Callers frequently send the 100-Continue HTTP message, and ELB cannot understand that message. There's no word yet from Amazon on when that might be fixed. Also, there are a number of complaints in the Amazon forums about ELB not handling heavy load.
lighttpd 1.5 looks like a promising partial solution, in that it can detect when an instance is not functioning and transparently take it out of the rotation, and it can be dynamically reconfigured without restarting the load balancer.
There are a number of commercial solutions as well. We will probably have a look at RightScale.
This is more of a question than an answer, but I'm about to start experimenting with autoscaling myself (most likely using the Amazon CloudFront facilities) and am thinking that instance startup time will be a factor. I've noticed that a new EC2 instance can take from 5 to 20 minutes to start up, so it's not as if you can instantly add more capacity when your load increases; it seems like you would need one or more idle instances to be running and ready to pick up increased load.
Late addition:
Consider SimpleDB as well... this would eliminate the DB scaling side.
For autoscaling, we rolled our own scripts to monitor, launch, and provision servers and yes, the whole process takes about 7 minutes. We do a little predictive analysis to guess when new servers will be needed and then just break them down if they aren't. Total cost: ~10 cents.
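To illustrate the idea of launching ahead of demand, here is a minimal sketch of one simple predictive rule. It is not the scripts described above: it swaps in a plain linear extrapolation of recent load over the roughly 7-minute provisioning time, and the function names, sample data, and capacity figure are all hypothetical.

```python
def predicted_load(samples, lead_minutes):
    """Linearly extrapolate load `lead_minutes` past the last sample.

    samples: list of (minute, load) pairs, oldest first, at least two entries.
    """
    (t0, load0), (t1, load1) = samples[0], samples[-1]
    slope = (load1 - load0) / (t1 - t0)  # load change per minute
    return load1 + slope * lead_minutes


def should_launch(samples, capacity, lead_minutes=7):
    """True if the load expected once a new server is ready exceeds capacity."""
    return predicted_load(samples, lead_minutes) > capacity


recent = [(0, 400), (5, 500), (10, 600)]    # hypothetical requests per minute
print(should_launch(recent, capacity=700))  # True: 600 + 20 * 7 = 740 > 700
```

If the predicted load never materializes, the extra server can simply be terminated again, which is the trade-off the answer describes.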
Also, Scalr looks promising as a commercial solution (haven't used it).
Chad