Ruby + Git: Integrating Changes On A Significantly Divergent Branch

I have an open source Ruby project on GitHub, where my master branch represents what has been released, and my dev branch represents what will be released next.
The master branch is 80+ commits behind the dev branch, and the dev branch contains fairly significant architectural changes.
A contributor has sent me a pull request for changes that were based on the master branch. I want to pull those changes into my dev branch without having to rewrite them or do a ton of merge conflict resolution (which would essentially be rewriting the changes anyway).
What are the best practices for handling a situation like this?

One solution would be:
"Any patch that doesn't apply in a fast-forward manner is rejected."
You could ask your contributor to fetch your dev branch and replay (rebase) his/her relevant commits on top of the fetched dev branch.
Once those changes work in that dev environment, he/she can make a new pull request.
That way, you shift the extra work onto the contributor, and once this refactoring is done, you can enjoy the contribution by simply applying it on top of your current dev.
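The rebase the contributor would run can be sketched roughly as follows. This is only a throwaway demo in a temporary directory; the branch name contrib-fix and the file names are made up for illustration, and a real rebase across 80+ commits may well stop on conflicts that have to be resolved by hand.

```shell
# Sketch: replay a contribution that was based on master on top of dev.
# All names (contrib-fix, lib.rb, core.rb, fix.rb) are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > lib.rb
git add lib.rb
git commit -q -m "release 1.0"

# dev moves ahead of master
git checkout -q -b dev
echo "new architecture" > core.rb
git add core.rb
git commit -q -m "architectural change on dev"

# the contributor branched from master
git checkout -q -b contrib-fix master
echo "fix" > fix.rb
git add fix.rb
git commit -q -m "contributor fix"

# replay the contributor's commit on top of dev
git rebase -q dev

# dev can now take the contribution as a plain fast-forward
git checkout -q dev
git merge -q --ff-only contrib-fix
```

After the rebase, dev and contrib-fix point at the same commit and no merge commit is needed.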

Related

Git setup question for shared code folder, keeping workflow speed (Win10)

Pretty new to Git - been using TFS and simple commit/push/branching, so any help appreciated - have spent all day reading and running tests and beginning to think my requirement may not be possible.
There are two of us in the office; Dev 1 doing mostly compiled C# server code, and Dev 2 mostly exclusively web page related work. However, as there are only two of us we do need to cross over fairly regularly, particularly with client Javascript functionality.
We've been doing the "mate I'm working on foo.js" method of source control for client side code, and it's worked for a while, but we are doing bigger projects and it's becoming a liability.
Our set up is as below, all on an internal network:
Dev 1's machine
Dev 2's machine
Local Windows Server running IIS that serves the websites under development
Shared drive pointing to the IIS root
So, and this is the rapid development cycle I'd like to try and keep, Dev 2 browses to the site under development, edits the script / css / html files on the shared drive, hits F5, and the updates are immediately visible. This is a huge benefit for fast working with client side code.
The problems usually occur when Dev 1 needs to make a change to some scripted functionality that happens to require a style change, the same files are opened and saved by both devs, and one of the change sets is lost.
So I'd like to prevent this! However as far as I understand, Git requires the devs to have local repositories so changes can be done without affecting anyone else at all, and then conflicts are merged on commit?
I have set up a test repository on the local server and tried a few scenarios, but as I kind of expected, the scenario where both devs save the same open file is not tracked because neither set of changes has been committed, so as before, only the last set of changes is visible anywhere.
Is there any way of having these types of changes to the same physical file tracked? Or if not, a setup that does track them properly but at least maintains a rapid workflow as close as possible to the above?
Use branches.
Git has a very good branch system. Just create a branch for the work you do; you can even create a branch for every feature you want to implement. When you have "finished" the implementation, merge the branch back to master. That way both developers (or more) can start from a working master version and add their code to the common codebase once it works.
So, the workflow could look like this:
Dev1 and Dev2 clone the repo: git clone ....
Dev1 works on feature A: git checkout -b A (this creates a local branch A)
Dev2 works on feature B: git checkout -b B
Dev2 finishes his work: git push (on the first call you get an error message about the missing upstream; the message contains exactly the command you need to set the correct upstream, so just copy it)
git checkout master; git pull (switch back to the master branch and pull)
git merge B (this merges B into master)
Dev1 needs longer for the job and wants to update to the newest codebase:
git checkout master; git pull; git checkout A; git merge master (branch A of Dev1 is now on the new codebase)
If you have to work on different features at the same time, git has a good system for that too. Starting from the master branch, create a new branch in its own folder, so that both branches are checked out at the same time and there is no need to git checkout <branch> to switch between them:
git worktree add -b <branch> <path>, like git worktree add -b A ../A
Now you can switch to it through the filesystem (cd ../A) and work on both (or on others you created the same way).
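A minimal sketch of the worktree setup, run in a temporary directory; the folder names main and A and the file app.js are just placeholders for this demo.

```shell
# Sketch: two branches checked out side by side with git worktree.
# Folder and file names are hypothetical.
set -e
base=$(mktemp -d)
git init -q "$base/main"
cd "$base/main"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > app.js
git add app.js
git commit -q -m "initial commit"

# create branch A and check it out in a sibling folder
git worktree add -b A ../A >/dev/null 2>&1

# both checkouts now exist at the same time
git -C "$base/A" rev-parse --abbrev-ref HEAD      # prints A
git -C "$base/main" rev-parse --abbrev-ref HEAD   # prints master
```

Both folders share one repository, so a commit made in ../A is immediately visible to git log A from the main folder.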
If you use GitHub or GitLab, you can protect the master branch and create rules for merging into it (via pull requests). Services such as AppVeyor, Travis CI and others can run your unit tests and approve the pull request only if the tests pass. With such a workflow, every developer always works from a running codebase.
About conflicts: with the workflow above they do not happen as long as both versions didn't modify the same line. If they did, you get a message at the (local) merge, and the affected files show clearly what you need to resolve:
(we create a file with a b c in each line, in master we edit b to e, in our branch A we edit b to d, we commit both and merge master into A)
a
<<<<<<< HEAD
d
=======
e
>>>>>>> master
c
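The conflict above can be reproduced with a short script like this one. It is a sketch in a temporary directory; the file name letters.txt is made up.

```shell
# Sketch: reproduce the a/b/c conflict described above.
# The file name letters.txt is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'a\nb\nc\n' > letters.txt
git add letters.txt
git commit -q -m "a b c"

# branch A changes b to d
git checkout -q -b A
printf 'a\nd\nc\n' > letters.txt
git commit -q -am "A: b -> d"

# master changes b to e
git checkout -q master
printf 'a\ne\nc\n' > letters.txt
git commit -q -am "master: b -> e"

# merging master into A stops with a conflict, so a non-zero exit status is expected
git checkout -q A
git merge master >/dev/null 2>&1 || echo "merge stopped: conflict in letters.txt"
cat letters.txt
```

To finish the merge you would edit letters.txt so that only the wanted line remains, then run git add letters.txt and git commit.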
Ideally, you would:
isolate the common files in one separate Git repository
separate source control (the remote Git repository) from deployment (files copied on IIS root)
That way, each of you can:
push to a common remote bare repository: configure it to deny any non-fast-forward push. In case of concurrent pushes, you will be forced to pull first, resolve any conflict locally, then push back: there won't be any change overridden or lost that way.
set up a server-side hook in order to (on the server) pull from said bare repository, through a post-receive hook (example here).
reference that common repository in your own development repo through a Git submodule.
The goal is to keep separate:
project-specific development from common client Javascript functionality.
versioning from deployment.
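A rough sketch of the bare repository plus deployment hook, all on one machine in a temporary directory. The paths, the file common.js and the folder standing in for the IIS root are assumptions. Note that this hook uses a checkout into a separate work tree, a common variant, rather than a literal git pull; the submodule part is left out.

```shell
# Sketch: shared bare repo that rejects non-fast-forward pushes and deploys on push.
# Paths and file names are hypothetical; "iis-root" stands in for the real IIS root.
set -e
base=$(mktemp -d)
deploy="$base/iis-root"
mkdir -p "$deploy"

# shared bare repository that refuses non-fast-forward pushes
git init -q --bare "$base/common.git"
git -C "$base/common.git" symbolic-ref HEAD refs/heads/master
git -C "$base/common.git" config receive.denyNonFastForwards true

# post-receive hook: after every push, check the files out into the deploy folder
mkdir -p "$base/common.git/hooks"
cat > "$base/common.git/hooks/post-receive" <<EOF
#!/bin/sh
git --work-tree="$deploy" checkout -f master
EOF
chmod +x "$base/common.git/hooks/post-receive"

# a developer clones, commits and pushes
git clone -q "$base/common.git" "$base/dev1" 2>/dev/null
cd "$base/dev1"
git symbolic-ref HEAD refs/heads/master
git config user.name "Dev 1"
git config user.email "dev1@example.com"
echo "console.log('hi');" > common.js
git add common.js
git commit -q -m "add common.js"
git push -q origin master

ls "$deploy"
```

With this in place nobody edits the deployed files directly; they only change as a result of a push.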
The problems usually occur when Dev 1 needs to make a change to some scripted functionality that happens to require a style change, the same files are opened and saved by both devs, and one of the change sets is lost.
…
Is there any way of having these types of changes to the same physical file tracked?
To get this you need some sort of collaborative editor, that's out-of-scope for any existing vcs I know of.
Or if not, a setup that does track them properly but at least maintains a rapid workflow as close as possible to the above?
You need separate files, separate saves (i.e. a vcs) and a workflow that automates as much as possible of the pull-and-push publishing loop.
Since you're not working on the same physical files, before publishing you need to sync your changes with whatever the other guy(s) on your team have published since last you looked. Decide how you want your final history to look; for small-team work like this rebasing onto a shared linear history is often a great place to start, so git config pull.rebase true. Then when you're ready to publish the changes you've saved, commit, pull, push is your cycle; if you and your buddy are making changes even in the same file it'll still apply cleanly in one go so long as the changes aren't immediately-adjacent or overlapping.
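The commit, pull, push cycle with pull.rebase can be tried out with two local clones, as in the sketch below. The helper make_clone, the developer names and site.css are invented for the demo; the remote and branch are named explicitly so the script does not depend on upstream tracking being configured.

```shell
# Sketch: two developers edit different parts of the same file and publish
# via commit / pull (rebase) / push. All names are hypothetical.
set -e
base=$(mktemp -d)
git init -q --bare "$base/shared.git"
git -C "$base/shared.git" symbolic-ref HEAD refs/heads/master

# hypothetical helper: clone the shared repo for one developer
make_clone() {
  git clone -q "$base/shared.git" "$base/$1" 2>/dev/null
  git -C "$base/$1" symbolic-ref HEAD refs/heads/master
  git -C "$base/$1" config user.name "$1"
  git -C "$base/$1" config user.email "$1@example.com"
  git -C "$base/$1" config pull.rebase true
}

make_clone dev1
cd "$base/dev1"
printf 'line 1\nline 2\nline 3\nline 4\nline 5\nline 6\nline 7\n' > site.css
git add site.css
git commit -q -m "initial site.css"
git push -q origin master

make_clone dev2

# dev1 changes the top of the file and publishes
cd "$base/dev1"
printf 'line 1 (dev1)\nline 2\nline 3\nline 4\nline 5\nline 6\nline 7\n' > site.css
git commit -q -am "dev1: edit first line"
git pull -q origin master
git push -q origin master

# dev2 changes the bottom of the same file, then commit, pull, push
cd "$base/dev2"
printf 'line 1\nline 2\nline 3\nline 4\nline 5\nline 6\nline 7 (dev2)\n' > site.css
git commit -q -am "dev2: edit last line"
git pull -q origin master   # replays dev2's commit on top of dev1's
git push -q origin master

git log --oneline
```

Because the two edits are several lines apart, the rebase applies cleanly and the shared history stays linear.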

Maintaining multiple branches of the same base project in VS

I've looked around the site but I couldn't find an answer that covers mine entirely, so please excuse me in advance if I missed it.
I inherited a VB.NET project that didn't have source control (it started as a pet project of a long-gone dev and nobody ever bothered after that to put it in), and by a friend's suggestion I thought about using Git for source control.
The project is a niche product that is customized and sold according to the customer's specs, so that brings the problem that even if 95% of the code is the same for all the customers, sometimes up to 10% of the code is changed and tailored for each customer, by changing or adding lines to existing functions, sometimes adding whole blocks of code, but there's no commonality in the changes between different customers (a function changed in one might not be changed in another).
To complicate things further, due to maintenance contracts, updates made to the baseline app have to be replicated in the customer's branches should they want them, and sometimes changes we make for a specific customer are good enough that we want to put them in the baseline app and replicate them to the other customers, BUT keeping the customizations for each customer!
So with my little knowledge of Git, I thought it would be like:
         (customer 1)
         C1-----
(main)  /
A------B------D
        \
         \ (customer 2)
          C2-----
           \
            \ (customer 3)
             C3-----
...but I can't see how it's going to work after that:
Can I merge SOME changes from the customer's branches into the main trunk WITHOUT merging others that are only useful for that customer?
Can I merge SOME changes from the main trunk into each customer's branches WITHOUT losing the customizations in those branches?
Can I "mark" specific lines of code so they are not merged/committed?
Three or more devs will be working in this, each in his own machine but pushing changes to the company's repository for synchronization. What are the implications for this process?
Right now, every customer has a separate folder and separate project files with all their source code. What would the import process be to put those folders into Git?
All of this must be done with Visual Studio, with Gitextensions and the Git Source provider for VS. Is it supported, or does it have to be done with the console?
Thanks and sorry again if it overlaps with another answer.
I'm relatively new to git and normally use PoshGit for all my operations, so while I may not be able to help you with everything, I hope I can help with some things:
Can I merge SOME changes from the customer's branches into the main trunk WITHOUT merging others that are only useful for that customer?
Can I merge SOME changes from the main trunk into each customer's branches WITHOUT losing the customizations in those branches?
From what I understand, both of these operations can be achieved by using git cherry-pick, which allows you to pick a particular commit from one branch and add it to another without merging the branches together.
For example, assuming you want to add a change made to customer1's repository, to customer2:
First you get the hash ID of the commit from customer 1 that you want to insert into customer2
git checkout customer1Branch
git log
commit 2e8c40025939e8cf41dec70f213da75aa462184b
Author: xxxxxxx
Date: xxxxxx
This made a change that you want...
You then copy the first few characters of the hash you want to cherry pick, change to customer 2's branch and cherry pick it into the branch.
git checkout customer2Branch
git cherry-pick 2e8c40025939e8c
Now, if you do a git log, you'll see your cherry pick at the top. A similar tutorial can be found here (http://nathanhoad.net/how-to-cherry-pick-changes-with-git)
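Here is the same cherry-pick flow as a self-contained sketch you can run in a scratch directory. The .vb file names are made up, and the commit hash is captured with git rev-parse instead of being copied from git log by hand.

```shell
# Sketch: copy one commit from customer1Branch to customer2Branch.
# File names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
echo "baseline" > app.vb
git add app.vb
git commit -q -m "baseline"

git branch customer1Branch
git branch customer2Branch

# customer1 gets a customization and, separately, a generally useful fix
git checkout -q customer1Branch
echo "customer 1 only" > custom1.vb
git add custom1.vb
git commit -q -m "customer1-specific change"
echo "useful fix" > fix.vb
git add fix.vb
git commit -q -m "fix worth sharing"
fix_commit=$(git rev-parse HEAD)

# copy only the fix over to customer2
git checkout -q customer2Branch
git cherry-pick "$fix_commit" >/dev/null 2>&1
git log --oneline
```

customer2Branch ends up with the fix but not with the customer1-only customization. This works best when each commit contains either a shareable change or a customer-specific one, not a mix of both.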
Can I "mark" specific lines of code so they are not merged/committed?
You may find help in a similar question that was asked and answered here:
Commit only part of a file in Git
Three or more devs will be working in this, each in his own machine but pushing changes to the company's repository for synchronization. What are the implications for this process?
Since Git is a fully distributed VCS, each dev on your team will effectively have a full clone of the central repo on his own machine (complete with the full history of that repo). This means that log history queries and other requests (such as finding out who did what) don't need to go through your central server, but can be done privately and offline by each dev.
Similarly, the changes that each dev makes will become available to all of you (for example, all new branches will be available), but it can sometimes be frustrating to be working on the same features if you're not quite used to git.
As always, it's a good idea to commit early and often; this will decrease the tension you're likely to face when changes clash. You should also set some structure for when pushes are done, especially if you rely on each other's work to continue.
Another idea you may want to try is having one person in charge of the repo and having him merge changes and patches to help coordinate your efforts.
Right now, every customer has a separate folder and separate project files with all their source code. What would the import process be to put those folders into Git?
EDIT
Thanks for clarifying what you meant by this question. You could expand on a similar approach adapted from the answer given here: How do you create a remote Git branch?
Create a new mainline branch for your BASE project and push it to your remote repository.
cd baseProjectDirectory # navigate to your main project directory
git init # git initialize the dir
git add . # recursively add all files in directory to git repo
git remote add <remote-name> <remote-url> # Register your remote repository's URL under a name of your choice
git commit -m "Initial commit of base project"
git push <remote-name> <local-branch-name>
This will establish your baseline project on the remote called remote-name, under a branch called local-branch-name.
You can then navigate to your other projects and repeat these steps putting your repositories under different branches on the same remote, by using new local branch names, i.e. instead of using the local-branch-name when creating a branch, just use a new branch name, such as git checkout -b new-local-branch-name
so if, for example your base project push (the last line of code) was:
git push clientproject base
Where "clientproject" is the name of your remote, and "base" is the name of your local branch, you can just change the line to:
git checkout -b client1 # Creates new branch named client1
git branch -d base # Deletes base branch
git push clientproject client1
Note that while it's not strictly necessary to delete the "base" branch before continuing, it does keep your repository cleaner and is thus considered good practice. Don't worry about losing anything though, your entire git history from base will be copied to client1 on checkout.
Also note: Since your situation requires you to do this from different directories, you'll probably be deleting a branch named "master" and not "base".
Pushing like this will keep client1 on the "clientproject" remote, but will place the project on a new branch called client1, complete with its own history.
The same steps can be used for the rest of the projects. If I've lost you anywhere along the way, I suggest reading the above link (it's much more concise than I am).
All of this must be done with Visual Studio, with Gitextensions and the Git Source provider for VS. Is it supported, or does it have to be done with the console?
I haven't yet used VS with Git, but I assume most if not all these operations would be supported since they are native git commands.
Hope this helps.

how can I work on both default and branch at same time in Hg?

OK, I'm new to Mercurial and version control branching in general, so I may have a fundamental misunderstanding of what's going on here -- please be kind... ;)
We are a small development team (2 developers) working on a project, and we have a need to implement a fairly significant change that may take weeks or months. At the same time, the program is in daily use, so we have a need to make regular patches and fixes.
Because of the long-running nature of the significant change, I created a branch off the default branch (call it dev1). I will want to periodically merge the changes from the default branch into the dev1 branch, for reasons that don't need to be reiterated here. However, I do NOT want the changes from dev1 merged into the default branch until much later in the development.
I have tried several different ways to do this, but it always seems the merge affects both branches. After the merge, if I update to the default I now have changes from dev1 merged into the source.
Can I work on both branches using the same repository? If so, can someone please share the sequence of commands to use? If not, it seems to me I would not be able to push the dev1 branch up to the master repo until it was finished, and that just doesn't seem right.
We are running the latest TortoiseHg for Windows, and for the most part I love the graphical tool. However, I am perfectly willing to drop to the command line to do certain tasks when necessary.
Thanks for any help,
Dave
This depends on what sort of branch you've created.
If you have created a named branch, and are working in a single working directory, then you need to use one workflow, but if you have cloned your production repository, you need to use a different workflow.
Named branch workflow, single repo/working directory
In this case, you are using update to switch between the default branch and the dev1 feature branch.
When you want to work on the default branch, update to it, do your bug fixes, and commit those changes. Do not merge in changes from dev1 branch.
When you want to work on your dev1 branch, update to it, merge in your bug fixes from the default branch, work on your feature and commit when done.
If you are working on the dev1 branch and a colleague fixes a bug in default that you need, commit your work, fetch their changes, merge them in and then resume your work (there are shortcuts you can take here, but this way you can back out the merge if it gets messy).
Note: All of these assume that all of your changes are committed at the point you want to switch between dev1 and default branches.
The important thing to note is that you only get the changes from your dev1 branch in default when you merge them in. If you only merge default into dev1 then your feature branch will keep up to date with default so that when you are ready to deploy the feature into the default branch, you can do so with one simple merge operation.
Unnamed branch workflow using dev1 repo cloned from production repo
This workflow is similar, but allows you to work on the default and dev1 branches simultaneously, without having to update to switch between the two.
When you want to work on the default branch, use the repository where the tip is your production code. Do your bug fixes, and commit those changes just as you would normally.
When you want to work on your dev1 branch, use the repository where the tip is your dev1 feature branch. If there have been fixes in the default repository, pull in the changes and merge them into your clone, but do not push the merge changeset back. Only push your changesets back when you want to deploy your feature to production code. Once the changesets from default have been merged in, you can continue working on the feature.
If you are working on the dev1 branch and a colleague fixes a bug in default that you need, commit your work, fetch their changes from your shared repository into your default production clone, then pull those changes down into your dev1 feature clone, merge them in and then resume your work.
Again, the important thing to note is that you only get the changes from your dev1 branch in default when you push them up to your default production repository. If you only pull/merge default changesets into the dev1 clone then your feature branch will keep up to date with default so that when you are ready to deploy the feature into the default branch, you can do so with one simple push operation.
Yes, you can absolutely do this with Mercurial.
First, in case it isn't clear to you (it wasn't to me for some time), there are 3 types of 'branches' in Mercurial:
clone a repository
a 'named branch' (using the hg branch command)
an anonymous branch, which you can manage with bookmarks or just remembering the changeset
I'm guessing that you used the hg branch method. Be aware that this is often not what you want, because that branch name will live in the repo's history forever (well, there is the --close-branch option, but still...).
The essential workflow is:
update to dev branch with hg up devbranch
commit changes to dev branch
merge with main branch via hg merge default or just hg merge as desired
(repeat as desired)
And for working on the default branch:
update to default branch with hg up default
commit changes
(repeat as desired)
Do NOT do this:
update to default branch with hg up default
merge with dev branch with hg merge
I suspect that you are using the command hg merge without specifying a branch name. That will merge with any other head, which may or may not be what you want.
Edit: The above info is probably not your issue. Your issue is probably running a merge when your current branch is the default one.
You don't want to run hg merge from your default branch.
# bang on dev1
# more banging on dev1
# someone beats on default for a while
# update to dev1
hg up dev1
# bring in the changes from default
hg merge -r default
# validate successful merge
hg commit -m "merging"
The key is committing on dev1 when you bring changes over from default.
Note that I'm using named branches here.
This sentence:
After the merge, if I update to the default I now have changes from dev1 merged into the source.
tells me that you're doing something wrong. What you want to do is perfectly doable: work on two branches in parallel, and merge from one to the other, without affecting both.
It is important to know that the merge is a directional merge. You merge from one branch to the other, and when you initiate the merge, you should be on the to-branch.
It is directional in the sense that the direction plays a role in the outcome. For the actual contents of the files, it doesn't matter which direction you merge, but the new merge changeset you commit will be on the branch you were on when you initiated the merge (unless you override that).
So, update to the head of dev1 first, then merge with default, and after committing, you should have a new changeset on the dev1 branch, but default should be left undisturbed.
This is more of a tip than an answer, but...
I use this workflow a lot. I find the Transplant extension very useful for named branch workflows. TortoiseHg supports it, so you can enable it in the TortoiseHg options. It lets you cherry-pick from other branches, which is very useful - especially if you regularly commit to the wrong branch.

Multiple feature branches and continuous integration

I've been doing some reading about continuous integration recently and there is a scenario which could occur which I don't understand how to deal with appropriately.
We have a stable mainline/trunk branch and create branches for features. Each developer will keep their own feature branches up to date by merging from trunk into their branch on a regular basis. However, it is entirely possible that two or more feature branches could be created and worked on over a period of several weeks or months. In this time many releases of the software could be deployed. This is where my confusion arises.
It is very likely that changes for one feature branch will cause merge conflicts with other feature branches. CI suggests you should merge into trunk at least daily which would resolve the conflicts quickly. However, you may not want to merge the feature code into trunk because it may not be finished or you may not want that feature available in the next release. So, how do you deal with this scenario and still follow CI principles of daily code integration?
There are no feature branches in proper CI. Use feature toggles instead.
The idea explained more fully in this article is to merge from the trunk/release branch to feature branches daily, but only merge back in the other direction once a feature meets your definition of 'done'.
Code written by one feature team will be pushed into the trunk once it's complete, and will be 'distributed' to the other teams, where conflicts can be dealt with, as part of the daily merge process.
This doesn't go as far as satisfying Nick's desire for a version control system that can be used as a backup tool, unless the changes being made are small enough that they can be committed to the feature branch within a timeframe where the risk of losing your work is acceptable.
I personally don't try to reintegrate code into the release branch before it's done, and although I've never really tried, I'm sure building feature toggles in for unfinished work has its own issues.
I think they mean merging mainline into the feature branch, not the other way 'round. This way, the feature branch will not deviate from mainline too much, and be kept in an easily mergeable state.
The git folks do the same thing by rebasing feature branches on top of the master branch before submitting a feature.
In my experience with CI, you should keep your feature branches up to date with the mainline changes, as others have suggested. This has been working for me for several releases. If you are using Subversion, make sure you merge with merge history enabled. That way, when you merge your changes back to the mainline, it will look like you are merging only the feature changes, not resolving conflicts your feature might have with the mainline. If you are using a more advanced VCS like git, the first merge will be a rebase, whereas the second will be a merge.
There are tools that can help you get things done more smoothly, like this: Feature branches with Bamboo
Committing feature branches back into the mainline, and OFTEN, is an essential feature of Continuous Integration. For a thorough breakdown, see This Article
There are now some good resources showing how to combine both CI and feature branches. Bamboo or Feature Branch Notifier are some places to look.
And this is another quite long article showing pros of so called distributed CI. Hereunder, one excerpt explaining the benefits:
Distributed CI has the advantage for Continuous Deployment because it keeps a clean and stable Mainline branch that can always be deployed to Production. During a Centralized CI process, an unstable Mainline will exist if code does not integrate properly (broken build) or if there is unfinished work integrated. This works quite well with iteration release planning, but creates a bottleneck for Continuous Deployment. The direct line from developer branch to Production must be kept clean in CD, Distributed CI does this by only allowing Production ready code to be put into the Mainline.
One thing that still can be challenging is keeping the branch build isolated so that it doesn't pollute your repository of binaries by pushing its branch builds to it. Bamboo seems to address that, but not sure it's as easy with Jenkins.

Handling versioning in a continuous integration environment

How do you handle versioning in a continuous integration environment where there is a development branch and a release branch? I'm using git so there is no incrementing repository version to use. Seems like there will be overlapping versions such as 1.1.0 on the dev branch and 1.1.0 on the release branch. Do you just append the text "dev" or "release"?
Also, when you create a release branch do you immediately increment the development branch to the next "proposed" release number? You may not know the next release number yet but if you don't increment it then you have 1.1.0 dev containing new work not included in 1.1.0 release.
So my main question is what is the relationship in the versioning sequences between these two branches?
Keep in mind, I'm not asking anything about how to decide what version numbers to use. I tried asking this before and kept getting comments like "increment major for breaking changes" etc.
I don't version the dev branch. The devline is the trunk and I periodically branch from dev to a new release folder. So the release branch is full of folders which are basically snapshots of the devline.
IE, under root I have /dev, /releases/0.1, /releases/0.2, /releases/1.0, etc.
I'm not sure if this really answers your question.
I would recommend adding a final step to your CI environment that creates tags. I believe the git command looks like this: git tag -a name
We use Major.Minor.Release.BuildNumber
though some places use Major.Minor.Release.CheckinNumber
So, if you want to use that then I would version your dev branch, otherwise just version the release branch.
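As a sketch of how a CI step might build a Major.Minor.Release.BuildNumber version and tag it: the base version 1.1.0 and the fallback build number 42 are invented, and BUILD_NUMBER is assumed to be supplied by whatever CI server you use.

```shell
# Sketch: tag the built commit with Major.Minor.Release.BuildNumber.
# base_version and the fallback build number are made-up values.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "CI"
git config user.email "ci@example.com"
echo "build me" > app.txt
git add app.txt
git commit -q -m "release candidate"

# Major.Minor.Release comes from the project, BuildNumber from the CI server
base_version="1.1.0"
build_number="${BUILD_NUMBER:-42}"
version="$base_version.$build_number"

# annotated tag on the commit that was just built
git tag -a "v$version" -m "CI build $version"
git describe --tags
```

Since every build gets its own number, builds from the dev branch and the release branch never end up with the same full version, even while both are still at 1.1.0.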
If you have only one development branch, it is more effective to make it be the trunk and branch off a release branch every time you just want to stabilize for release. If you have multiple feature projects, you can have a branch for each of them with CI setup on those. Once they are done, you merge them one by one to the trunk and once all are merged, you get to the first scenario, where you branch a release branch off of the trunk again.
In any case, you don't keep the development branch going between releases. You want to end it and start a new one for development for the next version. This way some of the features can be branched off during several releases if they take longer. But also you don't have too much mess on your development branches.
You can branch the development branches for the next version as soon as you have branched the release branch, or even before if you choose to. It is generally a good idea, though, once you stabilize for release, to merge the changes from the release branch into trunk and from there into the development branches. If you wait to branch off until after the release, you avoid a few merges, but you slow down development.
