CouchDB view replication - view

Using CouchDB to create a hosted app for clients. I have a dev database I work from, as well as separate DBs for each client. Works well, problem is when I make a change on dev, I have to manually copy the view code into each separate DB. It's fine now that I have 2 clients. But my hope is to grow to 100 clients. One small change could take a very long time!
Am I missing something simple in regards to replicating ONLY the views?
Thanks!

Here is how I usually work.
I have my local dev db. create and update my design docs (containing the views).
Have a production deployment db that will be visible to all the clients. I usually use iriscouch. Keep no data in this db.
When setting up a client, make sure you setup one way replication from #2 to this client db.
So to deploy to all clients, I put my latest design docs on the master, then all the clients will then be updated. There are some caveats to this. You have to make sure when you deploy to the master db, that you respect the revisions, so the client dbs will know to update.
Here is a quote from the master, Jason Smith:
The Good Way: Work with _rev
I think your application has a concept of "upgrading" from one
revision to another. There is staging or development code, and there
is production code. Periodically you promote development code to
production. That sounds like two Git branches and it also sounds like
two doc ids. (Or two sets of doc ids.)
You can test and refactor your code all day long, in the temporary doc
(_design/dev). But in production (_design/pro), it's just like a long
Git history. Every revision built from the one previous, to the
beginning of time.
If you want to promote _design/dev, the latest deploy is
_rev=4-abcdef. So this will be the fifth revision deployed, right?
Hey! Stop reading the "_rev" field! But yeah, probably.
COPY /db/_design/dev
Destination: _design/pro?rev=4-abcdef
{"id":"_design/pro","rev":"5-12345whatever"}
Notice that each deployed _design/pro builds from the other, so it
will naturally float out to the slaves when they replicate.
In real-life, you may have add a middle step, pushing design documents
to production servers before actually publishing them. Once you push,
how long will it take couch to build new views? The answer is,
"Christ, who knows?"
Therefore you have to copy _design/dev to _design/staging and then
push that out into the wild. Then you have to query its views until
you are satisfied that they are fresh and fast. (You can compare
"update_seq" from /db vs. "update_seq" from /db/_design/ddoc/_info).
And only then do you HTTP copy from _design/staging to _design/pro and
let that propagate out.
Source
Its not as confusing as it may sound. But to simplify the process, you can use Reupholster
(I admit, I have written this tool). It is mainly for couchapps, but even if you are just promoting design docs, it might be worth you just using reupholster to deploy to your master db. Reupholster adds in some handy info to the design doc, like date/time svn or git info. That way when you look at a clients db you can tell which design doc they are on.
Good luck

You can replicate just the design docs;
http://wiki.apache.org/couchdb/Replication#Named_Document_Replication

Related

Advice on resetting Mongock Changesets

We are using Mongock in our spring-boot/kotlin microservices as our main Mongo DB migration tool and it is working perfectly. We started with a simple json file to create a few collections and have been adding changesets for a while now.
By now we have so many changesets that it is becoming hard to see what 'the truth' is about our database. We do not have one single json file or a bunch of files which tells us what the state of the database is, it is an accumulation of the start json and all changesets.
I would like to create a new baseline script based on the current situation and start over.
What are some best-practices to achieve this? Of course without losing data etc.
I believe you are asking two different things. First How to know the current state of the Mongock migrations, and second, How to perform a baseline
How to know the current state
Mongock offers a cli(currently in beta, although is pretty stable) which offers this.
Simple running ./mongock state db -aj yourApp.jar
Currently it requires passing your application jar, but this dependency will be removed.
More information in the documentation.
CLI page
CLI operations page
State operation section
Springboot example readme
Baseline
Mongock currently doesn't provide this feature in the CLI, but it's one of the next items in our roadmap

Best practices for continuous deployment with Concourse and Cloud Foundry

So I've been investigating ci/cd pipelines using concourse and cloud foundry lately, and I've been confused about what the best way to do this is. So I've been thinking about how the overall flow would go from development to release. There are a lot of talks and videos that discuss this at a very high level, but often they abstract away too much of the actual implementation details for it to be useful. Like how do people actually roll this out in actual companies? I have a lot of questions, so I will try to list a few of them here in the hope that someone could enlighten me a little.
What does the overall process and pipeline look like conceptually from development to prod? So far I have something along the lines of :
During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.
Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not
Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.
code repo [master] -> build -> snapshot artifact repo -> deploy to test space -> run functional tests -> deploy to staging space -> run smoke tests and maybe other regression tests -> deploy artifact to prod -> monitoring/rollbacks (?)
What other things could/should be added to this pipeline or any part of this process?
Once you automate deployment how do you do also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?
I've been playing with the idea of creating spaces temporarily and then tearing them down for the functional testing phase, would there be any benefit to doing that? The idea is that the apps being deployed would have their own clean environments to use, but this could also potentially be slow, and it is difficult to know what services are required inside of each space. You would have to read the manifest, which only specifies service-names, which seems to necessitate some sort of canonical way of naming service instances within the same space? The alternative is managing a pool of spaces which also seems complicated...
Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but also it seems like things like instance count, memory etc should be something that the performances tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest....chicken and egg problem?
I have many more burning questions, but I will cut it off here for now. I know the subjects have kind of bounced back and forth between Concourse and Cloud Foundry, but it seems when discussing CI/CD concepts the nitty gritty implementation details are often the actual tricky bits which tangle the two rather tightly together. I am also aware that the specific implementation details are often very specific to each company, but it would be great if people could talk about how they have implemented these pipelines / automated pipelines using Concourse and Cloud Foundry at their companies (if you can spare the details of course). Thanks everyone!
During development each product team is under their own org, with each developer possibly having their own development "space" that they could manually cf push to and just develop against. There will be development spaces that devs can just directly push to as well as spaces that can only be used by the automated pipeline to deploy artifacts for functional tests.
Honestly it doesn't matter if you create multiple orgs in your CloudFoundry. If your CI/CD system runs on the same director that is (ab)used by other developers you going to have a hard time probably (i was there).
Once devs finish a feature they would make a pull request, which would trigger a smaller pipeline with some tests using something like the git-multibranch-resource or the git-pullrequest-resource, which would hook into the github required status check hooks and report back if any particular PRs are able to be merged into master or not
We are doing almost the exact thing. For PR's checkout jtarchie's PR resource here https://github.com/jtarchie/github-pullrequest-resource.
The only difference is that we are not using Github checks. The problem with them is, that you have to select a set of checks fixed for a branch.
But in case i just changed manifest xyz in the PR, i don't want to run all tests. You can overcome that problem by using the Github Status API only with the pending and successful status.
Once all checks pass and the pull request is merged into master the below pipeline is kicked off, which validates the master branch before releasing the artifact to prod.
We make PR's into the develop branch and following the Git Flow system. Our releases are merged into master manually.
You want to check first which updates you want to carry out before you merge every PR into master and trigger an update of the production system. Your test cases might be good, but you can always miss something.
What other things could/should be added to this pipeline or any part of this process?
You can have a pipeline which updates releases/stemcells in your manifests automatically.
Once you automate deployment how do you do also automate things like canary releases or rollbacks once something happens? Should this be part of the pipeline or something completely separate?
Test your stuff on a staging system before you go to production. Otherwise you a) won't know if the update is happening at zero downtime and b) to prevent a potential problem in production is always better than doing rollbacks.
Ofc you can also create a rollback pipeline, but if you come to that point something else might be wrong with your setup.
Should the pipeline generate the manifest files? Or should that be completely up to the developers? Only the developers know which services the app needs, but also it seems like things like instance count, memory etc should be something that the performances tests/pipeline should be able to determine/automate. You could generate a manifest inside the pipeline, but then you would not know which services the app needs without reading a manifest....chicken and egg problem?
We write our manifests by ourselves and use the CI/CD system to update/deploy/test them.
But if you find a valid case and a concept which lets you apply your manifest generating pipeline for many cases, i would just try it out.
At the very end you have to decide if a certain atomisation holds a business value for your company.
cheers, Dennis

Ho to do blue/green deployment for hybris?

I want to deploy hybris builds with zero down time. Our technical architecture consist of two frontend servers, two backend servers, two master/slave solr clusters, but a single DB server (MS SQL 2012). A new build may require patch execution which changes the DB schema.
Would it be possible to achieve this in a single DB landscape?
If two DB's are required (blue and green), then what is the best practice for DB replication in case of hybris?
Hybris does provide a rolling update feature (when you're running it in a cluster environment).
This is targeted to allow for zero downtime.
You can find more information on the hybris help pages, e.g.
https://help.hybris.com/6.5.0/hcd/8c455268866910149b25f7b53d1af3e1.html
Looking at the first picture there it seems to be pretty much fitting for the architecture you describe.
(But to be honest I have no experience with it, so I can't tell you whether or or how well it works :) )
If you have risky changes or end up needing to rollback your rolled out update you will have to do quite a bit of db cleanup etc.
From that perspective a blue/green setup might sound better although with db replication you would end up with the same problem (as your updated schema would be replicated as well I assume).
Hybris only adding new columns to db, never change their type or remove them. So single DB can be OK. I didn't test this using store front while updating system. I think it will be OK.
On the other hand you need development for empty/null check for new attributes in development.

How to schedule backups for development, testing, and production environments?

I was educating myself about website launches. I was reading this document to prepare an implementation check-list for my future website launches.
On page 11, they mentioned
Schedule backups for development, testing, and production environments
I can imagine taking backup of a database and website code.
How can someone automate and schedule backup of an entire x, y or z environment?
If this is a very broad question then I do not need the step by step document for doing this. Maybe some starting point which help me imagine this will be good enough.
Update:
I have sent an inquiry to Pantheon looking forward for their response.
It all depends on how you set up your environments, how complex they are, and what you are trying to back up.
How often do your environments change?
You can use something like Docker to keep track of the changes to your environments. It basically gives you version control for environments.
If you're using a hosting provider like DigitalOcean, they provide snapshots and backups of entire virtual images.
Read about rsync - https://wiki.archlinux.org/index.php/Full_system_backup_with_rsync
You could use rsync if you manually keep track of all of your config files and handle the scheduling and syncing yourself.
As for the actual site:
What's changing on your sites?
For changes to the code of a site, you should be using version control so that every single change is backed up to a repository like Github.
Most of the time though, the only thing that is changing is data in a database. Your database can easily be managed by a third party like Amazon RDS who offers automated backups. If you want to manage your own database on your own server, then you can have a cron job to automatically perform a backup of your database at scheduled times. Then you can use rsync to sync that backup to a separate machine. (You want to spread your backups across a couple of machines in different locations)
You can run cron jobs at regular interval to backup your data. The most important data could be your transactional information or the entire database itself.
Code, you can manage at developers end using github. But if you have certain folders which are populated by website users, then plan to backup those as well.

Maintaining Multiple Databases Across Several Platforms

What's the best way to maintain a multiple databases across several platforms (Windows, Linux, Mac OS X and Solaris) and keep them in sync with one another? I've tried several different programs and nothing seems to work!
I think you should ask yourself why you have to go through the hassle of maintaining multiple databases across several platforms and have them in sync with one another. Sounds like there's a lot of redundancy there. Why not just have one instance of that database, since I'm sure it can be made accessible (e.g. via SOA approach) to multiple apps on multiple platforms anyway?
Why go through the hassle? Management claims it's more expensive?
Here's how to prove them wrong.
Pick one database, call it the "master" or "system of record".
Write scripts to export data from the master and load it into your copies. If you have a nice database (MySQL, SQL/Server, Oracle or DB2) there are nice tools to do this replication for you. If you have a mixture of databases, you'll have to resort to exporting changed data and reloading changed data. The idea is that this is a 1-way copy: master to replicants.
Fix each application, one at a time, to do updates in the master database only. Since each application has a JDBC (or ODBC or whatever) connection to a database, it can just as easily be a connection to the master database.
Once you've fixed the applications to update only the master, the replicas are worthless. Management can insist that it's cheaper to have them. And there they are -- clones of the master database -- just what management says you must have.
Your life is simpler because the apps are only updating the system of record. They're happy because you have all the cloned databases lying around.

Resources