This answer explains that normally a git commit SHA is generated based on various parameters. However, I would like to know: how can one specify a custom/particular/specific git commit sha (in Bash)?
For example, suppose one wants to create and push a commit to Git with the following sha:
1e23456ffd118db9dc04caf40a442040e5ec99f9
(For simplicity, assume it is a unique SHA.)
The XY problem behind this is a manual mirror script between two different Git servers. It would be more convenient to have identical commit SHAs than to keep a mapping of the commits between the two servers. The manual mirror is more efficient (saving computation time and server bandwidth) if I can skip certain commits from the source server. Skipping commits, however, means the parent commits on the target server differ from those of the same commit on the source server. That in turn changes the SHA, which would require me to keep track of a mapping between the SHAs on the source and target servers. In short, it would be more convenient to override the SHAs of the commits pushed to the target server than to ensure the two servers have exactly the same commits (for the few commits that are actually mirrored).
A commit SHA isn't just "normally" generated based on those parameters, it is by definition a hash of those parameters. "SHA" is the name of the hashing algorithm used to generate it.
Rather than trying to change the commit hashes, you should look for an efficient way to track them. One approach would be similar to how plugins like git svn work:
When copying a commit to the mirror, record the original commit hash as part of the new commit's commit message.
Possibly, since you're "skipping" commits in the original repo, each new commit should have multiple source hashes, since it will act like a "squash" of those commits.
Have a script which processes the result of git log and extracts these recorded commit hashes. This can then be used instead of the real commit hashes when determining what new commits to copy from the source.
However, make sure this is all worth it: if the eventual changes are all included, the chances are that git's existing de-duplication and compression will mean the overhead of the "skipped" commits is fairly low.
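The message-based tracking described above can be sketched roughly as follows. This is only an illustration under assumptions: the trailer name `Mirrored-From` is hypothetical, the commit ids are dummy values, and the messages are assumed to have already been read from `git log` on the mirror by whatever means is convenient.

```python
import re
from typing import Dict, List

# Hypothetical trailer name; any unambiguous marker in the message would do.
TRAILER = re.compile(r"^Mirrored-From:[ \t]*([0-9a-f]{40})[ \t]*$", re.MULTILINE)


def extract_source_hashes(message: str) -> List[str]:
    """Return every source commit hash recorded in one mirror commit message."""
    return TRAILER.findall(message)


def build_mapping(mirror_commits: Dict[str, str]) -> Dict[str, str]:
    """Map source hash -> mirror hash, given mirror hash -> commit message."""
    mapping = {}
    for mirror_hash, message in mirror_commits.items():
        for source_hash in extract_source_hashes(message):
            mapping[source_hash] = mirror_hash
    return mapping


# Dummy data: one mirror commit that squashes two source commits.
mirror_commits = {
    "a" * 40: (
        "Squash of two upstream commits\n\n"
        "Mirrored-From: " + "1" * 40 + "\n"
        "Mirrored-From: " + "2" * 40 + "\n"
    ),
}
source_history = ["1" * 40, "2" * 40, "3" * 40]

mapping = build_mapping(mirror_commits)
# Source commits that have not been copied to the mirror yet.
pending = [h for h in source_history if h not in mapping]
print(pending)
```

Recording several trailers on one mirror commit is what lets a "squash" commit stand in for all the source commits it covers.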
Since you've already outlined in your question that you have ways of handling your differences, I will assume this question is really and only this:
I would like to know: how can one specify a custom/particular/specific git commit sha (in Bash)?
And not "or do you have any other ideas that I could use instead".
And with that question, the answer is actually quite simple:
You can't.
The way Git calculates the commit id is not merely a by-product of the implementation chosen. It is a core concept of how Git is designed.
The commit id is calculated based upon the content of the commit, and this includes, as you have observed, the link to the parent. Change the parent but keep everything else identical, the commit id still changes.
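As a minimal sketch of why this is so, the snippet below recomputes object ids the way SHA-1 based Git repositories do: the id is the SHA-1 of a short header followed by the object's content. The author identity, timestamp, message and parent ids are made-up values for illustration, and both commits point at Git's well-known empty tree.

```python
import hashlib
from typing import Sequence


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the object body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parents: Sequence[str], who: str, message: str) -> bytes:
    """Serialise a commit object (same identity used for author and committer)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of the empty tree
WHO = "A U Thor <author@example.com> 1700000000 +0000"    # made-up identity

# Two commits that differ only in their parent (dummy parent ids).
id_a = git_object_id("commit", commit_body(EMPTY_TREE, ["1" * 40], WHO, "Same change"))
id_b = git_object_id("commit", commit_body(EMPTY_TREE, ["2" * 40], WHO, "Same change"))

print(id_a)
print(id_b)
print(id_a != id_b)
```

Because the parent id is part of the hashed content, there is no separate place where a preferred id could be stored.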
This is core to how the distributed part of the version control system works, and cannot be changed.
You simply cannot change the id of a commit and keep the contents of it the same. This is by design.
There have been some attempts at producing collisions by carefully constructing distinct inputs that end up having the same SHA-1 hash.
Here's one such successful attempt (collision): https://www.theregister.com/2017/02/23/google_first_sha1_collision/
'First ever' SHA-1 hash collision calculated. All it took were five clever brains... and 6,610 years of processor time
I don't believe anyone has yet managed to take an arbitrary commit and target a specific commit id with it. The collision was carefully constructed by manipulating two inputs simultaneously according to very specific criteria so that they arrived at the same hash, but that hash was not chosen by the researchers.
TL;DR: It can't be done
The net effect of the collision(s) generated though is that Git will move away from SHA-1 at some point and go for a system that produces longer, and "more secure" (tm) hashes than what we have today. Since Git also wants to be backwards compatible with existing repositories, this work is not yet fully completed.
From the comment by CodeCaster, it seems I could use the freely choosable bits in the commit message in `git commit -m "some message"` to ensure the SHA of the commit ends up with a specific value.
However, based on the comment by Lasse V. Karlsen I would assume this approach requires non-linear computation resources. I did not go into detail on this, but I imagine/assume that as the commit history grows, the relative impact of the (limited, 5 MB) freely choosable bits of the commit message becomes smaller. I guess that could be an explanation of why leveraging these freely choosable bits in the commit message becomes costly.
So in practice, the answer seems to be: "You could (perhaps, if you spend a lot of computational resources), but you shouldn't."
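To get a feel for the cost, here is a small sketch that varies a counter in the commit message until the id starts with a chosen prefix. The commit content and the `nonce:` line are made up for illustration. Since each hex digit has 16 possible values, matching a prefix of n digits takes about 16**n attempts on average, so the effort is driven by how many digits must match rather than by the length of the history (the parent is only referenced by its id).

```python
import hashlib
from itertools import count
from typing import Tuple


def commit_id(body: bytes) -> str:
    """Id of a commit object: SHA-1 over 'commit <size>\\0' plus the body."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def find_nonce(base_body: bytes, prefix: str) -> Tuple[int, str]:
    """Append 'nonce: <n>' to the message until the id starts with prefix."""
    prefix = prefix.lower()
    for nonce in count():
        digest = commit_id(base_body + b"nonce: %d\n" % nonce)
        if digest.startswith(prefix):
            return nonce, digest


# Made-up commit content pointing at Git's empty tree.
base = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1700000000 +0000\n"
    b"committer A U Thor <author@example.com> 1700000000 +0000\n"
    b"\n"
    b"some message\n"
)

nonce, digest = find_nonce(base, "00")  # about 16**2 = 256 attempts on average
print(nonce, digest)
print(16 ** 40 == 2 ** 160)  # search space for matching all 40 hex digits
```

A two-digit prefix is found almost instantly, while a full 40-digit match is far beyond any practical search, which fits the "you shouldn't" conclusion above.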
how can one specify a custom/particular/specific git commit sha (in Bash)?
One cannot. The commit hash is a value constructed, as you say, by hashing various values together, and the whole point is to uniquely identify a particular commit. You could commit the same set of files at a different time on a different machine and you'd end up with a different commit hash.
The way to ensure that you have the same commits on two different machines is to git pull (or similar) those commits from one machine to the other.
You don't necessarily have to move all the commits -- you could e.g. squash them or cherry-pick only certain commits.
Related
I am not sure if I can do anything about this and it is not a huge hardship to leave it as it is.
I did try to fix things by following steps on other SO topics etc. and ended up losing all my help revision commits and files.
Things are a little messy and I will try to explain. The history is OK up to a certain point:
Removing unused resource ID values from resource.h
Can you see that towards the bottom of the screenshot of the log?
Since that time, the majority of the commits are help file revisions:
Deleting help topics and redundant images
Revising help topics and images
Adding new help topics and images
But it gets complicated because within that big chunk of help revision commits I have some code change commits. E.g.:
Added SetLoggingPath to CMSATools.
Revised CChristianLifeMinistryUtils::FillStudentsListBox method. Now it reads the students from the publishers database.
The plot thickens for a small handful of them, e.g.:
Add Help menu to CPublishersDatabaseDlg.
In those cases the commit is a combination of code changes and help revision changes:
Added OnHelpHelp menu handler.
Started writing Help/HelpPublisherDatabase.html help topic.
The primary issue:
Beginning at this commit in the master branch: "Removing unused resource ID values from resource.h", can I make a new feature branch called help-revisions and then move the commits from master to the feature branch?
If it is possible, I am assuming we would need to move just the commits that are purely help revisions. I am not sure how to handle the commits that are a mixture of help changes and code changes.
So, ideally I am hoping to split out all the help revisions into a feature branch so that it can be merged in to the master and look better in the log. Leaving the code tweak commits alone in the master in an appropriate position.
The related matter is the cause of some of this. But I am not going to discuss that here after all.
As mentioned, I am just curious to know if it is possible to improve the history I have as indicated.
I am a lone developer so I do not have to worry about other individuals' repositories.
Thanks for your help and time.
Update
I have given the rebase a go. I marked all the commits I wanted to split as edit. Then I started the rebase. I ticked edit/split and revised them as I needed until it completed.
Now my log looks like this:
Underneath, it looks like this:
So how do I get rid of that section? I have to fix that before I create the feature branch and do the cherry picking.
So, at the top I now have a new set of all the commits including the split ones.
Got it - did a force push of master branch.
This can be accomplished by a mixture of cherry-picking and rebasing.
Create a new feature branch at a commit before all the affected commits. Then select all the commits you want to have on that new feature branch and select "Cherry-pick commits". After that you have a branch that contains only the selected commits.
Switch back to the previous branch and do a rebase onto the parent of the newly created branch (you will need to enable "force"). Now mark all the cherry-picked commits again, select skip, and start the rebase. After that, this branch no longer contains the cherry-picked commits.
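Deciding which commits to select can be done by looking at the paths each commit touches. The sketch below is only an illustration: it assumes help files live under `Help/` (as in `Help/HelpPublisherDatabase.html` from the question), and the other file names in the sample data are hypothetical.

```python
from typing import Dict, List

HELP_PREFIX = "Help/"  # assumption based on the path mentioned in the question


def classify(paths: List[str]) -> str:
    """Label a commit 'help', 'code' or 'mixed' from the files it changes."""
    help_count = sum(1 for p in paths if p.startswith(HELP_PREFIX))
    if help_count == 0:
        return "code"
    if help_count == len(paths):
        return "help"
    return "mixed"


# Commit subjects from the question; file lists are made up except the Help/ path.
commits: Dict[str, List[str]] = {
    "Revising help topics and images": ["Help/Topic.html", "Help/Image.png"],
    "Added SetLoggingPath to CMSATools.": ["MSATools.cpp", "MSATools.h"],
    "Add Help menu to CPublishersDatabaseDlg.": [
        "PublishersDatabaseDlg.cpp",
        "Help/HelpPublisherDatabase.html",
    ],
}

for subject, paths in commits.items():
    print(f"{classify(paths):5}  {subject}")
```

Commits labelled "help" are the ones to cherry-pick onto the feature branch, "code" commits stay where they are, and "mixed" commits would need to be split first (for example with edit in an interactive rebase) before they can be sorted.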
I know I can do a "compare" between two changesets and get a list of the changes made in the period of time between the changesets in question.
However, from that list I would like to exclude all changes that are the result of merge operations only (change types merge; merge, edit; merge, branch; etc.).
My goal is to get a list of what changes (edits, adds, deletes, ...) have been made within the particular branch, including to any files which have also had changes merged into the branch from other branches, without cluttering up the list with changes made in other branches and simply merged into my branch of interest.
How do I do that?
Getting a list of changes to a particular branch is quite easy. In source control, just press Ctrl-G. You can then filter on the branch and get a list of changes, and you can specify the change sets; then select Find. This will include merges though.
This may not completely solve your issue, but it will help I guess.
If you know which source branches have been merged into your branch, you can make use of the tf merges command, which will give you the changeset numbers of when the merges happened.
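If the compare result can be exported as a list of changes with their change types, the remaining filtering is straightforward. The following is only a sketch under that assumption: the `Change` record, the paths and the changeset numbers are all hypothetical, and no TFS API is involved.

```python
from typing import FrozenSet, List, NamedTuple


class Change(NamedTuple):
    changeset: int
    path: str
    types: FrozenSet[str]  # e.g. {"edit"}, {"merge", "edit"}, {"merge", "branch"}


def direct_changes(changes: List[Change]) -> List[Change]:
    """Keep only changes whose change type does not include a merge."""
    return [c for c in changes if "merge" not in c.types]


# Hypothetical export of a compare between two changesets.
changes = [
    Change(101, "$/Project/Dev/a.cs", frozenset({"edit"})),
    Change(102, "$/Project/Dev/a.cs", frozenset({"merge", "edit"})),
    Change(103, "$/Project/Dev/b.cs", frozenset({"merge", "branch"})),
    Change(104, "$/Project/Dev/c.cs", frozenset({"add"})),
]

for change in direct_changes(changes):
    print(change.changeset, change.path, sorted(change.types))
```

Filtering per change rather than per file keeps a file in the list when it was edited directly in the branch and also received merged changes.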
I was about to commit about 1000 files at once after some refactoring. Is it advisable to commit such a huge number of files at once, or should I commit them in batches? I am trying to weigh the pros and cons.
One of the pros is that I will have same entry in the SVN for all my changes and will be easy to navigate.
With a number of files as small as 1000, I would worry less about performance and more about correct work flow. 1000 files is a lot of files and thus a lot of changes, but Subversion should handle it reasonably well.
However, if all of the changes are not actually 1 change, then it should not be one commit. For example, if you're renaming 3 functions, I would make each rename a separate commit. Depending on what specifically you're doing, you may be able to get away with one commit, but a year from now when you're browsing through the logs, you'll make life easier on yourself if you tend to stick to small commits. If it really is only one change, then one commit is definitely your best option (for example, renaming one function).
SVN can handle 1000 files at once. The only reason to check in batches is to give each batch a different commit message, like "fixed bug #22" and "added flair".
The number of files doesn't really matter.
When you commit changes to your code repo, you should be thinking of build stability and test compliance.
That answers your question: if you have made changes to n files and only commit some of them, then you're likely to break the build (not even talking about the tests). So you should commit all the necessary files to guarantee build integrity at least.
SVN and other tools are well capable of dealing with such a number of files, which will be represented as a single transaction on the server.
In this article, the author explains rebasing with this diagram:
Rebase: If you have not yet published your branch, or have clearly communicated that others should not base their work on it, you have an alternative. You can rebase your branch, where instead of merging, your commit is replaced by another commit with a different parent, and your branch is moved there.
while a normal merge would have looked like this:
So, if you rebase, you are just losing a history state (which would be garbage collected sometime in the future). So, why would someone want to do a rebase at all? What am I missing here?
There are a variety of situations in which you might want to rebase.
You develop a few parts of a feature on separate branches, then realize they're in reality a linear progression of ideas. Rebase them into that configuration.
You fork a topic from the wrong place. Maybe it's too early (you need something from later), maybe it's too late (it actually applies to previous versions as well). Move it to the right place. The "too late" case actually can't be fixed by a merge, so rebase is critical.
You want to test the interaction of a branch with another branch, but for some reason don't want to merge. For example, you might want to see what conflicts crop up commit-by-commit, instead of all at once.
The general theme here is that excessive merging clutters up the history, and rebasing is a way to avoid it if you didn't get your branch/merge plan right at first. Too many merges can make it hard for a human to follow the history, and also can make it harder to use tools like git-bisect.
There are also all the many cases which prompt an interactive rebase:
Multiple commits should've been one commit.
A commit (not the current one) should've been multiple commits.
A commit (not the current one) had a mistake in it or its message.
A commit (not the current one) should be removed.
Commits should be reordered (e.g. to flow more logically).
While it's true that you "lose history" doing these things, the reality is that you want to only publish clean work. If something is still unpublished, it's okay to rebase it in order to transform it to the way you should have committed it. This means that the final version in the public repository will be logical and easy to follow, not preserving any of the hiccups a developer had along the way.
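What "replaced by another commit with a different parent" means can be shown with a toy model. This is not Git's real object format: the ids are just truncated hashes of the parent id plus a change description, and all names are invented for the illustration.

```python
import hashlib
from typing import Dict, List, NamedTuple, Optional


class Commit(NamedTuple):
    parent: Optional[str]
    change: str


def add(store: Dict[str, Commit], parent: Optional[str], change: str) -> str:
    """Store a commit and return its toy id (derived from parent and change)."""
    cid = hashlib.sha1(f"{parent}\n{change}".encode()).hexdigest()[:8]
    store[cid] = Commit(parent, change)
    return cid


def rebase(store: Dict[str, Commit], branch: List[str], new_base: str) -> str:
    """Replay the branch's changes, oldest first, on top of new_base."""
    tip = new_base
    for cid in branch:
        tip = add(store, tip, store[cid].change)
    return tip


def history(store: Dict[str, Commit], tip: Optional[str]) -> List[str]:
    """List the changes reachable from tip, oldest first."""
    changes = []
    while tip is not None:
        changes.append(store[tip].change)
        tip = store[tip].parent
    return changes[::-1]


store: Dict[str, Commit] = {}
root = add(store, None, "root")
m1 = add(store, root, "m1")
m2 = add(store, m1, "m2")   # master: root, m1, m2
f1 = add(store, root, "f1")
f2 = add(store, f1, "f2")   # topic: root, f1, f2

new_tip = rebase(store, [f1, f2], m2)
print(history(store, new_tip))
print(new_tip != f2, f2 in store)
```

The rebased branch reads as one straight line on top of master, while the original commits still exist but are no longer reachable from the branch, which is the "lost" history state the question refers to.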
Rebasing allows you to pick up merges in the proper order. The theory behind merging means you shouldn't have to worry about that. The reality of resolving complicated conflicts gets easier if you rebase, then merge new changes in order.
You might want to read up on Bunny Hopping
My co-worker is trying to merge his development branch back into the baseline. Even though he only modified a couple files, all files in the baseline are being checked out for merging. As if it's a baseless merge. What gives?
I don't experience this and the only difference I can see is that I branched directly from the baseline and he made a branch and then did a "move" on the branch. Does moving a branch mess up the link back to the baseline? He is still able to select the baseline in the GUI so I don't think it's doing a baseless merge since that's only available via command line, but it's behaving like that.
Anyone got some insight or know what else we should check?
This is by design. TFS needs to mark the changeset where you moved the source branch as "already accounted for" so it's no longer a candidate next time you merge.
Merge history is recorded at checkin time by updating all of the pending changes that have their Merge bit set. Ordinarily, this is accompanied by other change types like Edit, Delete, etc. If not, it's just a recordkeeping transaction, like the case you've encountered (there are other such cases). No files will be modified by checking in the "no-op" merges.