Best practice for test and production environments [closed]

In the company where I work, we have 2 environments: test and production. We are not currently adding a new environment, because of cost.
Here is the procedure we follow: business makes a feature request, development makes it happen and deploys it to the test environment. Then business tests it (UAT), and if it's OK, the feature is included in the next production deployment.
The problem is agreeing on best practices for the test DB. Developers treat the test environment as their playground, and sometimes they reset the DB to its initial state for testing purposes. On the other hand, business people think the test DB must be stable and should not be reset. We would like to resolve this issue and decide whether the test environment should belong to the development team or the business team. (Developers don't want business putting its nose into the test environment, but the business team is paying for the servers.)
What is the best practice about environments? Can you recommend an article about this?

I've worked in many companies, each with a different set of environments. The one I liked the most had 5 environments:
1) Local: Basically your machine. Here is where you code and test your changes before even asking a peer for review.
2) Dev: If for some reason you cannot test your code locally (mainly dependency issues: "My code has never compiled on my machine but it compiles perfectly in Jenkins/Bamboo/Travis"), then you push your changes to your feature branch in Git and make Bamboo compile it and deploy it to a dev server where you can test (you are still not sure it will work, so no peer review so far).
3) Staging: You think your code works and you like how it looks. You create a Pull Request so your peers can take a look at it before it gets merged to the master branch. Here they leave comments and you fix possible issues. Since you are now more confident in your changes, you have Bamboo deploy them to the Staging environment, where more "stable" code lives and more realistic data is stored in whatever database is there. Once deployed, another developer/tester can check that your changes actually work and give you a "QA sign-off in the Staging env."
4) Stable: OK, now you are absolutely sure your changes work, since you deployed to Staging and nothing got broken. You merge your branch to master and Bamboo compiles master and deploys it to another set of servers in a Stable environment (no one else should merge to master until you finish your deployment to Production, to avoid mixing unrelated merges). This environment should be a replica of Production: data, code and server conditions. Here is where you show your changes to your manager, product owner or whoever is in charge of validating your work before sending it to production. You get the final approval, everything is good, you are sweaty, you have worked for 30 days in a row to get this change done, your wife divorced you, but you are very confident it works perfectly.
5) Production: Where the clients connect to consume the company's services, or where the final build of your software is produced to send to customers. With a few clicks in Bamboo you make it deploy to the Production servers or compile the final build. It is green, everything seems to be OK. You check Splunk for errors, everything is good, life is good, you drink another sip of coffee before leaving, you drive home and sleep all weekend with your dog by your side.
It's a happy ending because having so many "test" environments ensures quality: no change gets to production until EVERYBODY (not just you) is completely sure the change works.
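For illustration, the whole flow boils down to something like this toy Python sketch; the labels just restate the five steps above, and nothing here is a real Bamboo API:

# A toy summary of the five-environment promotion flow described above.
PROMOTION_FLOW = [
    ("code and test on your own machine",        "Local"),
    ("push a feature branch, CI compiles it",    "Dev"),
    ("open a PR, peers review, QA signs off",    "Staging"),
    ("merge to master",                          "Stable"),
    ("manual release click",                     "Production"),
]

for trigger, environment in PROMOTION_FLOW:
    print(f"{trigger:40} -> {environment}")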

At our company there are two databases too, a test and a production database. The test database is mainly used for testing by developers, but sometimes for business tests too. This database is refreshed daily from an actual copy of the production database, so it can be both a playground and a serious testing database. A third, development database is the best option, though. We had one, but it is broken at the moment. When you get one of those, you should make sure it is refreshed often enough: when developers use it as a playground, it will drift away from the production environment, and its data will become both old and corrupt. Because of this, developers won't be able to test well themselves. So make sure you refresh this database periodically (maybe daily too, or at least once a week).
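As a hedged sketch of such a periodic refresh, assuming PostgreSQL and made-up host names (swap in whatever your stack actually uses):

import subprocess

# Placeholder connection strings; assumes credentials are supplied out
# of band (e.g. via ~/.pgpass).
PROD = "host=prod-db.example.com dbname=app"
DEV = "host=dev-db.example.com dbname=app"

def refresh():
    # Dump production in custom format. For very large databases you
    # would stream through a pipe instead of buffering the dump in memory.
    dump = subprocess.run(
        ["pg_dump", "--format=custom", f"--dbname={PROD}"],
        check=True, capture_output=True,
    ).stdout
    # --clean/--if-exists drop existing objects first, so the copy ends
    # up matching production; pg_restore reads the dump from stdin.
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists", f"--dbname={DEV}"],
        input=dump, check=True,
    )

if __name__ == "__main__":
    refresh()  # schedule via cron, e.g. "0 2 * * *" for nightly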

If possible, give each developer their own database on their local machine. That way they can do whatever they want with it without affecting anyone else. This should significantly decrease the desire to play with the test database, providing a more stable environment for business UAT.

I believe that in order to establish an environment strategy that supports all ALM/SDLC activities, 4 environments are required:
1) Development environment: to allow Dev to play around with new code/concepts and typically unit test with some basic integration testing using stubs and drivers. This environment should have loose change control procedures and would typically not be anywhere near the same scale as production. This is where the Dev team can build and tear down setups as required.
2) Interop environment: where integration of systems can be further tested, with increased capability for non-functional testing, i.e. it might be a resilient environment with greater scalability than Dev. I'd see this environment having tighter change control and management. Test would perform integration and system testing in this environment.
3) Reference Architecture: This is what some might call pre-prod, but it is essentially the same as production in terms of scale and resilience. It would have change control and management procedures akin to prod. This environment would support further test activities, especially full-scale performance testing and failover, as well as operational fault triage and maintenance activities once a product is launched to customers.
4) Production: This environment will support live customers and so test activities will be limited once this is the case. This will be fully managed and have strict change management and config management processes.
Hope this helps

You could give each developer the latest database Docker image to run in their local environment. If the data gets corrupted, they can just recreate the container.
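A minimal sketch of that reset with the Docker SDK for Python (pip install docker); the image and container names are hypothetical:

import docker

client = docker.from_env()

def fresh_test_db():
    # Throw away the possibly-corrupted container and start over from
    # the clean image.
    try:
        client.containers.get("local-test-db").remove(force=True)
    except docker.errors.NotFound:
        pass
    return client.containers.run(
        "company/test-db:latest",   # hypothetical image name
        name="local-test-db",
        ports={"5432/tcp": 5432},   # assuming a PostgreSQL-based image
        detach=True,
    )

fresh_test_db()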

Related

Best practices for continuous deployment with Concourse and Cloud Foundry

I've been investigating CI/CD pipelines using Concourse and Cloud Foundry lately, and I've been confused about what the best way to do this is. I've been thinking about how the overall flow would go from development to release. There are a lot of talks and videos that discuss this at a very high level, but often they abstract away too much of the actual implementation details for it to be useful. How do people actually roll this out in real companies? I have a lot of questions, so I will try to list a few of them here in the hope that someone could enlighten me a little.
What does the overall process and pipeline look like conceptually from development to prod? So far I have something along the lines of :
During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.
Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not
Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.
code repo [master] -> build -> snapshot artifact repo -> deploy to test space -> run functional tests -> deploy to staging space -> run smoke tests and maybe other regression tests -> deploy artifact to prod -> monitoring/rollbacks (?)
What other things could/should be added to this pipeline or any part of this process?
Once you automate deployment, how do you also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?
I've been playing with the idea of creating spaces temporarily and then tearing them down for the functional testing phase, would there be any benefit to doing that? The idea is that the apps being deployed would have their own clean environments to use, but this could also potentially be slow, and it is difficult to know what services are required inside of each space. You would have to read the manifest, which only specifies service-names, which seems to necessitate some sort of canonical way of naming service instances within the same space? The alternative is managing a pool of spaces which also seems complicated...
Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but also it seems like things like instance count, memory etc should be something that the performances tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest....chicken and egg problem?
I have many more burning questions, but I will cut it off here for now. I know the subjects have kind of bounced back and forth between Concourse and Cloud Foundry, but it seems when discussing CI/CD concepts the nitty gritty implementation details are often the actual tricky bits which tangle the two rather tightly together. I am also aware that the specific implementation details are often very specific to each company, but it would be great if people could talk about how they have implemented these pipelines / automated pipelines using Concourse and Cloud Foundry at their companies (if you can spare the details of course). Thanks everyone!
During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.
Honestly, it doesn't matter whether you create multiple orgs in your Cloud Foundry. If your CI/CD system runs on the same director that is (ab)used by other developers, you are probably going to have a hard time (I was there).
Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not
We are doing almost exactly that. For PRs, check out jtarchie's PR resource here: https://github.com/jtarchie/github-pullrequest-resource.
The only difference is that we are not using GitHub Checks. The problem with them is that you have to select a fixed set of checks for a branch.
But if I just changed manifest xyz in the PR, I don't want to run all the tests. You can overcome that problem by using the GitHub Status API with only the pending and success states.
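For example, a hedged sketch of driving the GitHub Status API directly; OWNER/REPO, the context string, and the token are placeholders:

import requests

def set_status(sha, state, token, context="ci/concourse"):
    # state is one of: pending, success, error, failure
    resp = requests.post(
        f"https://api.github.com/repos/OWNER/REPO/statuses/{sha}",
        headers={"Authorization": f"token {token}"},
        json={"state": state, "context": context,
              "description": f"tests {state}"},
    )
    resp.raise_for_status()

# set_status(sha, "pending", token)   # before the relevant tests run
# ...run only the tests the PR actually touches...
# set_status(sha, "success", token)   # once they pass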
Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.
We make PRs into the develop branch, following the Git Flow system. Our releases are merged into master manually.
You want to check first which updates you want to carry out, rather than having every PR merged into master trigger an update of the production system. Your test cases might be good, but you can always miss something.
What other things could/should be added to this pipeline or any part of this process?
You can have a pipeline which updates releases/stemcells in your manifests automatically.
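As a hedged illustration of such a bump job, assuming a BOSH-style manifest with a top-level stemcells: list (the file name and version are made up):

import yaml  # pip install pyyaml

def bump_stemcell(path, new_version):
    with open(path) as f:
        manifest = yaml.safe_load(f)
    # BOSH-style manifests keep a top-level "stemcells:" list; adjust
    # the key paths to whatever your manifests actually look like.
    for stemcell in manifest.get("stemcells", []):
        stemcell["version"] = new_version
    with open(path, "w") as f:
        yaml.safe_dump(manifest, f, default_flow_style=False)

bump_stemcell("manifest.yml", "621.125")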
Once you automate deployment, how do you also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?
Test your stuff on a staging system before you go to production. Otherwise a) you won't know whether the update happens with zero downtime, and b) preventing a potential problem in production is always better than doing rollbacks.
Of course you can also create a rollback pipeline, but if you get to that point, something else might be wrong with your setup.
Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but also it seems like things like instance count, memory etc should be something that the performances tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest....chicken and egg problem?
We write our manifests ourselves and use the CI/CD system to update/deploy/test them.
But if you find a valid case and a concept that lets you apply a manifest-generating pipeline to many cases, I would just try it out.
At the very end you have to decide whether a certain piece of automation holds business value for your company.
cheers, Dennis

CouchDB view replication

I'm using CouchDB to create a hosted app for clients. I have a dev database I work from, as well as separate DBs for each client. It works well; the problem is that when I make a change on dev, I have to manually copy the view code into each separate DB. That's fine now that I have 2 clients, but my hope is to grow to 100. One small change could take a very long time!
Am I missing something simple in regards to replicating ONLY the views?
Thanks!
Here is how I usually work:
1. I have my local dev DB, where I create and update my design docs (containing the views).
2. I have a production deployment DB that is visible to all the clients (I usually use iriscouch). Keep no data in this DB.
3. When setting up a client, make sure you set up one-way replication from #2 to that client's DB.
So to deploy to all clients, I put my latest design docs on the master, and all the clients are then updated. There are some caveats to this: you have to make sure that when you deploy to the master DB you respect the revisions, so the client DBs will know to update.
Here is a quote from the master, Jason Smith:
The Good Way: Work with _rev
I think your application has a concept of "upgrading" from one revision to another. There is staging or development code, and there is production code. Periodically you promote development code to production. That sounds like two Git branches and it also sounds like two doc ids. (Or two sets of doc ids.)
You can test and refactor your code all day long, in the temporary doc (_design/dev). But in production (_design/pro), it's just like a long Git history. Every revision is built from the one previous, back to the beginning of time.
If you want to promote _design/dev, and the latest deploy is _rev=4-abcdef, then this will be the fifth revision deployed, right? Hey! Stop reading the "_rev" field! But yeah, probably.
COPY /db/_design/dev
Destination: _design/pro?rev=4-abcdef
{"id":"_design/pro","rev":"5-12345whatever"}
Notice that each deployed _design/pro builds from the previous one, so it will naturally float out to the slaves when they replicate.
In real life, you may have to add a middle step, pushing design documents to production servers before actually publishing them. Once you push, how long will it take couch to build new views? The answer is, "Christ, who knows?"
Therefore you have to copy _design/dev to _design/staging and then push that out into the wild. Then you have to query its views until you are satisfied that they are fresh and fast. (You can compare "update_seq" from /db vs. "update_seq" from /db/_design/ddoc/_info.) And only then do you HTTP copy from _design/staging to _design/pro and let that propagate out.
Source
It's not as confusing as it may sound. But to simplify the process, you can use Reupholster (I admit, I wrote this tool). It is mainly for couchapps, but even if you are just promoting design docs, it might be worth using Reupholster to deploy to your master DB. Reupholster adds some handy info to the design doc, like date/time and svn or git info. That way, when you look at a client's DB, you can tell which design doc they are on.
Good luck
You can replicate just the design docs:
http://wiki.apache.org/couchdb/Replication#Named_Document_Replication
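As a minimal sketch of that named-document replication against CouchDB's _replicate endpoint; the server URL, database names and design doc id are placeholders:

import requests

COUCH = "http://localhost:5984"  # placeholder server URL

def push_design_doc(client_db):
    # doc_ids restricts replication to the named documents only, so the
    # clients' own data is left untouched.
    resp = requests.post(
        f"{COUCH}/_replicate",
        json={"source": "master_db",
              "target": client_db,
              "doc_ids": ["_design/pro"]},
    )
    resp.raise_for_status()

for db in ["client_a", "client_b"]:
    push_design_doc(db)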

Windows Azure - Deploying to subset of instances within a role

I'm looking to implement continuous deployment, pretty much as a proof of concept using Windows Azure, deploying the packages and switching staging <-> production is all fine, however, I would like to add some smarts to the way it is deployed.
Essentially, if I have 10 instances, I want to deploy to all in the staging slot, and switch say 3 of them to production and monitor to make sure there is no statistical error difference between those 3 and the other 7 before switching all to production, or if there is, switch those 3 back to the original production which is now running in staging.
Essentially I want to mimic the sym link switching as described at http://timothyfitz.wordpress.com/2009/02/10/continuous-deployment-at-imvu-doing-the-impossible-fifty-times-a-day/
From what I can see, Azure only allows an all-or-nothing approach when switching between production and staging? I also thought about having two sets of roles defined, but the issue there is that the same endpoint can't be used in two roles (I don't think?).
Anyone know of a way to do this?
Do a manual in-place upgrade. Then the update will happen one update domain at a time (and you can define how many update domains you want... default is five). If you set it to manual, you're in charge of when you move on to the next update domain. If something goes wrong, you start a new in-place upgrade to the old bits again.

What is the main purpose and sense to have staging server the same as production?

In our company we have staging and production servers. I'm trying to keep them in a 1:1 state after the latest release. We've got a web application running on several hosts, with many instances of it.
The issue is that I am an advocate of having the same architecture (structure) of web applications on the staging and production servers, to easily test new features and avoid introducing new bugs with new releases.
But not everyone agrees with me; for them it is not such a big deal to have different connections between application instances on staging, or even to have more applications, and more connections between applications, on staging than on production.
I would like to ask about the pros and cons of such an approach: some good points to support my view, or reasons why I might be wrong, with some examples of the consequences and so forth.
If your staging server is substantially different from your production server, then successful deployment and testing on the staging server does not tell you much about whether the world will come down crashing on you when you finally deploy to the production server.
I do not see any real advantage to your colleagues' preferred chaotic situation, to compensate for this obvious disadvantage. What do they claim they gain by letting the staging server's configuration get totally out of sync with that of the production server...?!
Staging is like the dress rehearsal of deployment. If you're not wearing the same costume you will be wearing on the night, how do you know it's going to fit, or that you're not going to trip over the dangly bits?
More formally, you try to keep the staging environment as close as possible to the production environment in order to minimize differences which may cause or hide issues in the deployment. Note that I say "close as possible", since it's not always possible to have the same model of disk, or the same network interconnects, but you try to minimize those things that you can within the resources you have available.
Martin Fowler recently blogged about having identical environments that could be cut over from one to the other, so your staging environment becomes your production environment after testing. He says:
One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production. You usually need to do this quickly in order to minimize downtime. The blue-green deployment approach does this by ensuring you have two production environments, as identical as possible. At any time one of them, let's say blue for the example, is live. As you prepare a new release of your software you do your final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment - the blue one is now idle.
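The cut-over Fowler describes can be as simple as flipping a pointer; here is a toy sketch (the environment URLs are made up):

ENVIRONMENTS = {"blue": "https://blue.internal.example.com",
                "green": "https://green.internal.example.com"}
live = "blue"  # all traffic currently goes here

def cut_over():
    # After final-stage testing passes in the idle environment, point
    # the router at it; the old live environment becomes idle.
    global live
    live = "green" if live == "blue" else "blue"
    return ENVIRONMENTS[live]

print(cut_over())  # the router now sends requests to green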
I think an approach like this would be a great alternative to the seemingly chaotic environment you have today. Good luck convincing your team!
I would go with the "as close as possible" approach as well, as ptomli suggested, and that's mostly due to the cost factor. If the farm contains 5 servers, I would never recommend that staging be just 1 standalone server. This helps in scenarios where the network layer is involved as well: if there are patches that affect network connectivity for any reason (security, say), a single-box staging server might not reflect the real effect of patching.

Running test on Production Code/Server

I'm relatively inexperienced when it comes to Unit Testing/Automated Testing, so pardon the question if it doesn't make any sense.
The current code base I'm working on is so tightly coupled that I'll need to refactor most of the code before ever being able to run unit tests on it, so I read some posts and discovered Selenium, which I think is a really cool program.
My client would like specific automated tests to run every ten minutes on our production server to ensure that our site is operational, and that certain features/aspects are running normally.
I've never really thought to run tests against a production server because you're adding additional stress to the site. I always thought you would run all tests against a staging server, and if those work, you can assume the production site is operational as long as the hosting provider doesn't experience an issue.
Any thoughts on your end on testing production code on the actual production server?
Thanks a lot guys!
Maybe it would help if you thought of the selenium scripts as "monitoring" instead of "testing"? I would hope that every major web site out there has some kind of monitoring going on, even if it's just a periodic PING, or loading the homepage every so often. While it is possible to take this way too far, don't be afraid of the concept in general. So what might some of the benefits of this monitoring/testing to you and your client?
Not even the best testing in the world can predict the odd things users will do, either intentionally or by sheer force of numbers (if 1 million monkeys on typewriters can write Hamlet, imagine what a few hundred click-happy users can do). Pinging a site can tell you if it's up, but not if a table is corrupted and a report is now failing, all because a user typed in a value with an umlaut in it.
While your site might perform great on the staging servers, maybe it will start to degrade over time. If you are monitoring the performance of those Selenium tests, you can stay ahead of slowness complaints. Of course, as you mentioned, be sure your monitoring isn't causing problems either! You may have to convince your client that certain tests are appropriate to run every X minutes, and others should only be run once a day, at 3am.
If you end up making an emergency change to the live site, you'll be more confident knowing that tests are running to make sure everything is ok.
I have worked on similar production servers for a long time. From my experience, I can say that it is always better to test your changes/patches in the staging environment and then just deploy them to the production servers. This works because the staging and production environments are alike, except for the volume of data.
If really required, it would be OK to run a few tests on the production servers once the code/patch is installed. But routinely running the tests on the production server is not recommended.
My suggestion would be to shadow the production database down to a staging/test environment on a nightly basis and run your unit tests there nightly. The approach suggested by the client would be good for making sure that new data introduced into the system doesn't cause exceptions, but I do not agree with doing this in production.
Running it in a staging environment would give you the ability to evaluate features as new data flows into the system without using the production environment as a test bed.
[edit] To make sure the site is up, you could write a simple program which pings it every 10 minutes, rather than running your whole test suite against it.
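Such a pinger can be tiny; a sketch, with a placeholder URL and the alerting left as a print:

import time
import requests

URL = "https://www.example.com/health"  # placeholder health endpoint

while True:
    try:
        resp = requests.get(URL, timeout=10)
        if resp.status_code != 200:
            print(f"ALERT: {URL} returned {resp.status_code}")
    except requests.RequestException as exc:
        print(f"ALERT: {URL} unreachable: {exc}")
    time.sleep(600)  # 10 minutes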
What will change in the production environment that you would need automated tests to catch? I understand that you may need monitoring and alerts to make sure the servers are up and running.
Whatever the choice, whether it be a monitoring or testing type solution, the thing that you should be doing first and foremost for your client is warning them. As you have alluded to, testing in production is almost never a good idea. Once they are aware of the dangers and if there are no other logical choices, carefully construct very minimal tests. Apply them in layers and monitor them religiously to make sure that they aren't causing any problems to the app.
I agree with Peter that this sounds more like monitoring than testing. A small distinction but an important one I think. If the client's requirements relate to Service Level Agreements then their requests do not sound too outlandish.
Also, it may not be safe to assume that if the service provider is not experiencing any issues that the site is functioning properly. What if the site becomes swamped with requests? Or perhaps SQL that runs fine in test starts causing issues (timeouts, blocking etc.) with a larger production database?
