I am looking for general advice from anyone who has experience monitoring Oracle RDS databases in AWS. The system that I am working with will involve several enterprise Oracle RDS databases (on the order of a few dozen) in AWS. My organization is considering two options for monitoring:
Option 1: Setting up Cloud Control in AWS, housing the OMS and the repository database on an EC2 instance and enabling the OEM_AGENT on our RDS instances.
Option 2: Relying entirely on EM Express/CloudWatch and any other third-party software that we can use without the overhead of Cloud Control.
The concern with option 1 is that it undermines our reasons for moving to RDS, namely, to remove some of the administrative overhead of maintaining traditional on-premises Oracle databases. The OEM repository database cannot be housed in RDS as the OMS requires SYS-level access to the repository and RDS does not allow for this. As a result, having Cloud Control would require a lot of the kind of maintenance we were hoping to move away from.
The problem with option 2 is mainly the lack of metric alerting. CloudWatch/Enhanced Monitoring provide some basic metrics for alerts, but more specific metrics and alerts, such as those on alert log errors, tablespaces, long-running queries, archive area used, etc., are lacking. We do not mind the lack of centralization, as we will simply create an internal page with links to all of the different databases, and EM Express gives us what we need from a performance monitoring standpoint. The only concern really is the lack of metrics alerting. If there is not some other way of doing this, we may also simply write our own PL/SQL scripts to trigger the alerts.
However, I am curious to know how others solved this problem or even just generally, what kinds of AWS-based Oracle monitoring systems have been set up and how they work.
This is a problem almost all enterprises moving to the cloud face today. Companies move to the cloud to get rid of some of their admin tasks, and then they find out they can't do all the customization they had on-prem.
So, here is how you can make option 2 better, especially to address your concern:
"The only concern really is the lack of metrics alerting."
RDS events are a good way of monitoring. You can subscribe to the events and get notified in multiple ways, such as a group email, Slack channels, or a third-party monitoring tool like PagerDuty.
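If it helps, here is a minimal sketch (Python/boto3) of what subscribing your RDS instances to a notification topic could look like; the topic ARN, instance identifiers and event categories below are placeholders, not taken from your setup:

```python
# Minimal sketch: subscribe a set of RDS instances to an SNS topic so that
# failure / low-storage / recovery events reach email, Slack or PagerDuty.
# The topic ARN and instance identifiers are made-up placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_event_subscription(
    SubscriptionName="oracle-rds-alerts",                          # hypothetical name
    SnsTopicArn="arn:aws:sns:us-east-1:123456789012:dba-alerts",   # placeholder ARN
    SourceType="db-instance",
    SourceIds=["orcl-prod-01", "orcl-prod-02"],                    # placeholder instance ids
    EventCategories=["availability", "failure", "low storage", "recovery"],
    Enabled=True,
)
```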
Using RDS Events integration with Lambda: I strongly suggest having a look at Lambda. As I mentioned above, apart from subscribing to the events, you can also call/trigger a Lambda function to take action on certain events. We use Lambda to overcome the slave-skip error in MySQL.
Another use case for Lambda is as an alternative to cron jobs: things like checking disk space every day, or making sure incremental backups were taken overnight.
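As an illustration of the cron-style use, here is a minimal sketch of a Lambda handler (Python/boto3) that a CloudWatch Events/EventBridge schedule could invoke daily to check free storage on an RDS instance and notify an SNS topic; the instance id, topic ARN and threshold are placeholders:

```python
# Minimal sketch of a scheduled "cron" Lambda: check FreeStorageSpace for an
# RDS instance over the last 10 minutes and alert an SNS topic if it is low.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
sns = boto3.client("sns")

DB_INSTANCE = "orcl-prod-01"                                  # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:dba-alerts"   # placeholder
THRESHOLD_BYTES = 20 * 1024 ** 3                              # alert below 20 GiB

def handler(event, context):
    now = datetime.datetime.utcnow()
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="FreeStorageSpace",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": DB_INSTANCE}],
        StartTime=now - datetime.timedelta(minutes=10),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    datapoints = stats["Datapoints"]
    if datapoints and min(dp["Average"] for dp in datapoints) < THRESHOLD_BYTES:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject=f"Low storage on {DB_INSTANCE}",
            Message=f"FreeStorageSpace dropped below {THRESHOLD_BYTES} bytes.",
        )
```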
Let me know if you have specific questions on "how to implement" these options. I will be glad to add more information.
We are moving our infrastructure to CloudFormation since it's much easier to describe the infrastructure in a nice manner. This works fantastically well for things like security groups, routing, VPCs, and transit gateways.
However, we have two issues we are struggling with that I don't think fit the declarative, infrastructure-as-code paradigm of tools like Terraform and CloudFormation.
(1) We have a business requirement to run a scheduled batch at specific times of the day. These jobs are very computationally intensive. To save costs, we run them on an EC2 instance which is brought up at that time, then torn down when the batch is finished. However, this seems to require a temporary change to the Terraform/CF files, then a change back. Is there a more native way of doing this?
(2) We dynamically store our clients' firewall rules for their load balancer (ALB) and allow clients to edit them. This information cannot be stored in the Terraform/CF files since it can be changed by clients on demand.
Is there a way of properly doing these things in CF/Terraform?
(1) If you have to use EC2, you could create a Lambda that starts your EC2 instances, then create a CloudWatch Event rule that triggers the Lambda at your specified date/time. For more details, see https://aws.amazon.com/premiumsupport/knowledge-center/start-stop-lambda-cloudwatch/. Once the job is done, have the EC2 instance shut itself down using the AWS SDK or AWS CLI.
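To make the idea concrete, here is a minimal sketch (Python/boto3) of such a Lambda; the instance id and region are placeholders, and the CloudWatch Event rule is assumed to be created separately with a cron expression:

```python
# Minimal sketch: a Lambda, triggered by a scheduled CloudWatch Events rule,
# that starts the batch EC2 instance(s). Instance ids/region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")
BATCH_INSTANCE_IDS = ["i-0123456789abcdef0"]   # placeholder instance id

def handler(event, context):
    ec2.start_instances(InstanceIds=BATCH_INSTANCE_IDS)
```

The last step of the batch script can then stop the instance it runs on, for example by calling aws ec2 stop-instances with its own instance id.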
Alternatively, you could use AWS Lambda to run your batch job. You only get charged when the Lambda runs. Likewise, create a CloudWatch Event rule that schedules the Lambda.
(2) You could store the firewall rules in your own DB and modify the actual ALB security group rules using the AWS SDK. I don't think it's a good idea to store these things in Terraform/CF. IMHO Terraform/CF are great for declaring infrastructure, but they aren't a good fit for resources that change dynamically, especially when changed by third parties like your clients.
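As a rough sketch of that approach (Python/boto3), assuming you already know the ALB's security group id and the client-supplied CIDR from your own datastore (both placeholders here):

```python
# Minimal sketch: apply a client-edited firewall rule to the ALB's security
# group via the SDK rather than Terraform/CF. Group id and CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2")

def allow_client_cidr(alb_security_group_id: str, cidr: str) -> None:
    """Open HTTPS for one client-supplied CIDR on the ALB security group."""
    ec2.authorize_security_group_ingress(
        GroupId=alb_security_group_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": cidr, "Description": "client-managed rule"}],
        }],
    )

# Example: allow_client_cidr("sg-0123456789abcdef0", "203.0.113.0/24")
```

The corresponding revoke_security_group_ingress call can remove a rule when a client deletes it.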
We have a set of microservices collaborating with each other in the ecosystem. We used to have occasional problems where one or more of these microservices would go down accidentally. Thankfully, we have some monitoring built around them which would detect this and take corrective action.
Now, we would like to have redundancy built around each of those microservices. I'm thinking of a master/slave approach where a slave is always on standby and, when the master goes down, the slave takes over.
Should we consider using a framework as a service registry, where we register each of those microservices and allow them to be controlled? Any other suggestions on how to achieve this kind of master/slave architecture with the microservices that would give us failover redundancy?
I thought about this for a couple of minutes and this is what I currently think is the best method, based on experience.
There are a couple of problems you will face with availability. The first is always having at least one endpoint up. This is easy enough to do by installing on multiple servers. In the enterprise space, you would use a name for the endpoint and then have it resolve to multiple servers (virtual or hardware). You would also load balance it.
The second is the registry. This is a very easy problem with API management software. The really good software in this space is not cheap, so this is not weekend-hobbyist territory, but there are open source API management solutions out there. As I work in the enterprise space, I am very familiar with options like Apigee, CA, Mashery, etc., so I cannot recommend an open source option and feel good about myself.
You could build your own registry, if you desire. Just be careful how you design it, as a "registry of all interface points" leads to a service that becomes more tightly coupled.
I am trying to build a prototype of Elasticsearch as a Service. I have thought of two different approaches, and I'd like to get opinions on one or the other implementation.
One single installation of Elasticsearch, with a proxy layer on top to add user validation (HTTP basic authentication plus a user account to validate the usage).
This approach would be relatively straightforward, and the main challenge would be configuring the cluster properly to handle the load, as well as the permissions, so there are no data leaks and users don't have access to the cluster management APIs.
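To give an idea of what that proxy layer could look like, here is a very small sketch in Python (Flask and requests are assumed); the user store, index-naming convention and backend URL are all placeholders, not a full implementation:

```python
# Minimal sketch of approach 1: an HTTP proxy in front of a shared
# Elasticsearch node that enforces basic auth and confines each user to an
# index named after their account, while blocking cluster management APIs.
import requests
from flask import Flask, Response, request

app = Flask(__name__)
ES_URL = "http://localhost:9200"    # placeholder backend
USERS = {"alice": "secret"}         # placeholder credential store

@app.route("/<path:subpath>", methods=["GET", "POST", "PUT", "DELETE"])
def proxy(subpath):
    auth = request.authorization
    if not auth or USERS.get(auth.username) != auth.password:
        return Response("unauthorized", 401)
    # Only allow access to the caller's own index, never _cluster/_cat APIs.
    if not subpath.startswith(f"user-{auth.username}/"):
        return Response("forbidden", 403)
    upstream = requests.request(
        request.method,
        f"{ES_URL}/{subpath}",
        params=request.args,
        data=request.get_data(),
        headers={"Content-Type": request.headers.get("Content-Type", "application/json")},
    )
    return Response(upstream.content, upstream.status_code)
```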
Use Docker as a container and have one instance of Elasticsearch for each user. In this case I would be providing the isolation through Linux containers (Docker). I'd still need to manage authentication.
It probably would be good to implement both, play around and see how things behave. Any opinions about pros and cons of each approach?
Thanks!
Disclaimer: I am the founder of the Elasticsearch service provider Facetflow, which currently offers shared clusters.
I think that both approaches have merit, but maybe suited for different types of customers.
Looking at other SaaS providers, like MongoDB provider MongoLab, they essentially ended up offering both setups (although not using Docker).
So, pros and cons as I see them:
Shared Cluster
Most Elasticsearch as a Service providers operate this way.
Pros:
Far more affordable for the majority of users just looking for good search and analytics.
Simpler maintenance, fewer clusters for you to monitor.
Potentially fewer versions of Elasticsearch to integrate with. If you need to communicate with other systems (which you do) or write your own plugins (we did, for authentication, silos, entitlements, stats, etc.), fewer versions will be far easier to maintain.
Cons:
Noisy neighbours have to be monitored and you have to scale and relocate indices to handle this.
Users have to choose from a limited list of versions of Elasticsearch, usually a single version.
Users don't get full cluster admin control.
Private Clusters using Docker
One provider that works this way is Found.
Pros:
Users could potentially deploy a variety of versions of Elasticsearch.
Users can have complete cluster admin access
Noisy neighbours don't affect their cluster, so less manual intervention from you.
Cons:
Complex monitoring and support. If people can do whatever they want (e.g. shut down the cluster over the API), you have to be clear about where your responsibility as a provider ends, and about what wakes you up at night.
Complex integration with multiple versions, see shared cluster pros.
More expensive since you have to allocate resources that might not always be used.
AppHarbor looks very appealing for our .NET solution, but I have some questions I could not find answers to on the internet.
Our major concern is reliability of dedicated SQL Server:
Is it clustered / mirrored / replicated?
What happens when they upgrade / patch / maintain the server or the hosting hardware, and what happens when hardware fails?
Are upgrades scheduled?
Can we set time interval when they do upgrades?
Which version and edition of SQL Server is used?
Can I use full text search?
Can I use Reporting Services?
Is communication with the SQL database reliable? For example, in Azure SQL it is recommended to build in retry logic - if a command does not succeed, retry.
Is AppHarbor reliable? Every cloud provider has occasional blackouts (Amazon, MS Azure ...). Is AppHarbor any less reliable compared to them? I know AppHarbor runs on top of Amazon.
Are there a lot of hidden issues you run into? What are the most common?
Did anybody decide to leave appHarbor for a good reason?
As far as I can see, Azure is a real cloud system with all the downsides and upsides - more scalable, but with modified infrastructure like a customized SQL Server. AppHarbor more closely mimics an on-premises solution. Is my understanding correct?
How is the documentation?
How is the support?
Thank you for your help.
Yes, AppHarbor offers redundant/replicated dedicated SQL Server databases. These plans are available upon request.
This depends on the type of maintenance/update and your SQL Server database plan. If the database server is replicated, downtime can be minimized by failing over to the replica while performing maintenance. In the event of a server failure, the database will be attached to a new instance and the application's configuration will be automatically updated. Should a hard drive fail, leading to corrupted/lost data, AppHarbor makes daily backups that will be used to restore your database. It should be noted that hard drive failures are very rare.
We generally coordinate planned maintenance that requires downtime with customers whenever possible. Dedicated SQL Server customers can also select their own maintenance window.
Not really, but AppHarbor will reach out and coordinate with you when it is necessary.
Different SQL Server versions and editions are used depending on the plan. For single-instance dedicated SQL Servers we generally use SQL Server 2008 R2 Web Edition. Dedicated SQL Server 2012 instances are available upon request. Replicated setups require other and more expensive SQL Server editions. You may also want to consider our dedicated MySQL service if you'd like to reduce costs and don't rely on SQL Server specific features - since AppHarbor doesn't have to pay license costs these are less expensive, particularly for a replicated setup.
Yes.
Not by default, but we can work with you to support reporting services on your dedicated SQL Server instance.
Yes. In fact the primary reason customers upgrade from shared to dedicated SQL Server is for consistent, reliable performance.
I'd say so. The last major outage occurred on July 29th, 2012, due to an electrical storm that affected multiple availability zones in AWS's North Virginia region. As an example, our blog has been available 99.997% of the time since then. In the event of an application instance failure, applications are rapidly moved to healthy instances. We recommend running with at least two workers to ensure redundancy in those cases.
I'm admittedly not the best person to answer this question. The most common request/limitation we hear about is that you can't currently trigger a backup yourself. This will be available at a later time, but we do keep daily backups of your databases.
-
AppHarbor's cloud application platform is relatively similar to Azure in terms of scalability. We support rapid "elastic scaling" of application workers both vertically and horizontally. With regards to the dedicated SQL Server service, your understanding is correct: it is very similar to an on-premises solution. While the scaling story is different compared to SQL Azure, this allows for much greater flexibility. We can tailor a database plan and server that suits your requirements, whether you need high CPU, RAM and/or I/O performance. Similarly, we can offer database sizes that are 10x larger than SQL Azure's current 150GB database size limitation.
Most documentation is available in the knowledge base. We try and keep this as up-to-date and comprehensive as possible, but if you find yourself missing some information you're of course more than welcome to let us know and we'll add it. Third party add-on providers typically maintain their own AppHarbor-specific documentation.
This is another question where I may be a little biased, but I can tell you a little about our goals: our goal is to always answer non-critical support requests related to apps on both free and paid plans within the day. Critical support requests and support requests related to applications or databases on paid plans take priority. Support is included in the plans, but we're working on offering premium support options as well. We generally try to exceed your expectations and are always happy to help out and give advice on issues you experience - whether they're related to the AppHarbor platform or not.
Disclaimer: I'm a co-founder of AppHarbor.
For the first time, I am developing an app that requires quite a bit of scaling; I have never had an application that needed to run on multiple instances before.
How is this normally achieved? Do I cluster SQL servers then mirror the programming across all servers and use load balancing?
Or do I separate out the functionality to run some on one server some on another?
Also, how do I push out code to all my EC2 Windows instances?
This will depend on the requirements you have, but as a general guideline (I am assuming a website) I would separate the DB, web server, caching server, etc. onto different instance(s) and use S3 (+ CloudFront) for static assets. I would also make sure that some proper rate limiting is in place so that only legitimate load is on the infrastructure.
For the RDBMS server I might set up a master-slave DB setup (RDS makes this easier) and use DB sharding, etc. DB cluster solutions also exist; these are more complex to set up but simplify database access for the application programmer. I would also check all the DB queries and tune the DB/SQL queries accordingly. In some cases pure NoSQL-type databases might be better than an RDBMS, or a mix of both where the application switches between them depending on the data required.
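For example, here is a minimal sketch (Python/boto3) of the RDS side of a master-slave setup, creating a read replica of an existing instance so reads can be split from writes; the instance identifiers and class are placeholders:

```python
# Minimal sketch: create a read replica of an existing RDS instance so that
# read traffic can be directed away from the master. Names are placeholders.
import boto3

rds = boto3.client("rds")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-1",      # placeholder replica name
    SourceDBInstanceIdentifier="app-db-master",   # placeholder source instance
    DBInstanceClass="db.m5.large",                # placeholder instance class
)
```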
For the web server I would set up a load balancer and then use autoscaling on the web server instance(s) behind the load balancer. Something similar applies to the app server, if any. I would also tune the web server's settings.
The caching server will also be separated into its own cluster of instance(s). ElastiCache seems like a nice service. Redis has comparable performance to memcache but has more features (like lists, sets, etc.) which might come in handy when scaling.
Disclaimer - I'm not going to mention any Windows specifics because I have always worked on Unix machines. These guidelines are fairly generic.
This is a subjective question and everyone would tailor one's own system in a unique style. Here are a few guidelines I follow.
If it's a web application, separate the presentation (front-end), middleware (APIs) and database layers. A sliced architecture scales the best as compared to a monolithic application.
Database - Amazon provides excellent and highly available services (unless you are in the us-east region) for SQL and NoSQL data stores. You might want to check out RDS for relational databases and DynamoDB for NoSQL. Both scale well, and you need not worry about managing and sharding/clustering your data stores once you launch them.
Middleware APIs - This is a crucial part. It is important to have a set of APIs (preferably REST, but you could pretty much use anything here) which expose your back-end functionality as a service. A service-oriented architecture can be scaled very easily to cater to multiple front-facing clients such as web, mobile, desktop, third-party widgets, etc. Middleware APIs should typically NOT be where your business logic is processed; most of it (or all of it) should be translated to database lookups/queries for higher performance. These services could be load balanced for high availability. Amazon's Elastic Load Balancers (ELB) are good for starters. If you want to get into some more customization, like blocking traffic for certain sets of IP addresses or performing Blue/Green deployments, then maybe you should consider HAProxy load balancers deployed to separate instances.
Front-end - This is where your presentation layer should reside. It should avoid any direct database queries except for the ones which are limited to the scope of the front-end e.g.: a simple Redis call to get the latest cache keys for front-end fragments. Here is where you could pretty much perform a lot of caching, right from the service calls to the front-end fragments. You could use AWS CloudFront for static assets delivery and AWS ElastiCache for your cache store. ElastiCache is nothing but a managed memcached cluster. You should even consider load balancing the front-end nodes behind an ELB.
All this can be bundled and deployed with AutoScaling using AWS Elastic Beanstalk. It currently supports ASP.NET, PHP, Python, Java and Ruby containers. AWS Elastic Beanstalk still has its own limitations, but it is a very cool way to manage your infrastructure with the least hassle for monitoring, scaling and load balancing.
Tip: Identifying the read and write intensive areas of your application helps a lot. You could then go ahead and slice your infrastructure accordingly and perform required optimizations with a read or write focus at a time.
To sum it all, Amazon AWS has pretty much everything you could possibly use to craft your server topology. It's upon you to choose components.
Hope this helps!
The way I would do it is to have one server as the DB server with MySQL running on it, and all my data on memcached, which can span multiple servers, with my clients following a simple "if not in memcached, read from the DB, put it in memcached and return" pattern.
Memcached is very easy to scale compared to a DB. Scaling a DB takes a lot of administrative effort; it's a pain to get it right and working, so I chose memcached. In fact, I keep extra memcached servers up just to handle downtime of any of my memcached servers.
My data is mostly reads with few writes, and when writes happen I push the data to memcached too. All in all this works better for me in terms of code, administration, fallback, failover and load balancing. All wins. You just need to code a "little" bit better.
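For what it's worth, here is a minimal sketch of that read-through/write-through pattern in Python using pymemcache; the server address, TTL and the MySQL lookup helper are placeholders:

```python
# Minimal sketch of "if not in memcached, read from DB, put it in memcached
# and return", plus pushing writes to memcached as well.
import json
from pymemcache.client.base import Client

cache = Client(("memcached-host", 11211))   # placeholder server address
TTL_SECONDS = 300

def query_db(user_id):
    """Placeholder for the real MySQL lookup; returns a plain dict."""
    raise NotImplementedError

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:                        # cache hit
        return json.loads(cached)
    row = query_db(user_id)                       # cache miss: read from the DB
    cache.set(key, json.dumps(row), expire=TTL_SECONDS)
    return row

def save_user(user_id, row):
    # Write path: update MySQL first (not shown), then push to memcached too.
    cache.set(f"user:{user_id}", json.dumps(row), expire=TTL_SECONDS)
```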
Clustering MySQL is more tempting, as it seems easier to code, deploy, maintain and keep performing. But remember MySQL is disk-based and memcached is memory-based, so by nature memcached is much faster (10 times at least). And since it takes over all the read load from the DB, your DB config can be REALLY simple.
I really hope someone points to a contrary argument here, I would love to hear it.