Automate post-"git move", making history log stick - bash

I need to merge a few repositories into one, with various files moved around.
Based on some research on SO about how to merge repositories, I ended up with the following sketch:
user=some_user
new_superproj=new_proj # new repository, will include old repositories
hosting=bitbucket.org # github etc.
r1=repo1 # repo 1 to merge
r2=repo2
...
# clone to the new place. These are throw-away (!!!) directories
git clone git@${hosting}:${user}/${r1}.git
git clone git@${hosting}:${user}/${r2}.git
...
mkdir ${new_superproj} && cd ${new_superproj}
# dummy commit so we can merge
git init
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"
git rm ./deleteme.txt
git commit -m "Clean up initial file"
# repeat for all source repositories
repo=${r1}
pushd .
cd ../${repo}
# In the throw-away repository, move to the subfolder and rewrite log
git filter-branch --index-filter '
git ls-files -s |
sed "s,\t,&'"${repo}"'/," |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE
' HEAD
popd
# now bring data in to the new repository
git remote add -f ${repo} ../${repo}
git merge --allow-unrelated-histories ${repo}/master -m "Merging repo ${repo} in"
# remove remote to throw-away repo
git remote rm ${repo}
So far so good, unless we want to move files around while still preserving the log. Git handles moves/renames poorly, and the log-rewrite fragment above is not adapted to that: it rewrites paths uniformly, recursively for the whole directory.
The idea is that while files are being moved, we know there are no other changes in the repository except renames and moves. So how can I rewrite the following part so that it works in a canonical, per-file way? It is taken from the official git filter-branch documentation:
git filter-branch --index-filter \
'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
git update-index --index-info &&
mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
I have a hard time understanding the part after 'sed' and how it is applied by git filter-branch.
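To make the stages after 'sed' concrete: git ls-files -s prints one line per index entry as "<mode> <object> <stage><TAB><path>", the sed expression inserts the new directory right after the tab (and after an opening double quote, which Git emits for unusual file names), and git update-index --index-info loads the edited listing into a fresh index file that then replaces the old one. The sketch below reproduces only the sed stage on sample text. The helper name prefix_paths and the sample object IDs are made up for illustration, and the sketch assumes the directory name contains no "|", "&" or backslash.

```bash
#!/usr/bin/env bash
# prefix_paths SUBDIR (hypothetical helper)
# Reads `git ls-files -s` style lines on stdin and inserts SUBDIR/ right
# after the tab that separates the index metadata from the path.
prefix_paths() {
    local tab
    tab=$(printf '\t')   # literal tab, for sed versions that do not know \t
    sed "s|${tab}\"*|&$1/|"
}

# Sample input: the object IDs are placeholders, not real blobs.
printf '100644 1111111 0\tREADME.md\n100755 2222222 0\tbin/run.sh\n' |
    prefix_paths newsubdir
```

Each path in the sample comes out with newsubdir/ in front of it, and the mode, object ID and stage columns are left untouched.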
I want to run a script (bash, python etc.) that does the following:
for each file in repository get moved/renamed
...
# in the loop, moved/renamed file found
old_file="..." # e.g. a/b/c/old_name.txt
new_file="..." # e.g. a/b/f/g/new_name.txt, at this point it is known, old_file and new_file is the same file
update_log_paths(old_file, new_file) # <--- this part is needed
Any ideas?

As it turned out, following a hint from the question "Move file-by-file in git", it is as simple as (pseudocode):
move_files
cd repo_root
git add . # so changes detected as moves, vs added/deleted
repo_moves=collect_moves_data()
git reset HEAD && git checkout . && git clean -df . # undo all moves
The biggest misunderstanding I found, and the reason "git log --follow" and other, "stronger" options don't work for many people in related SO questions, is that:
git log --follow <file>
does not show the log until the moved (but otherwise unchanged) file is committed.
for each_move in repo_moves
old_file, new_file=deduct_old_new_name(each_move)
new_dir=${new_file%/*}
filter="$filter \n\
if [ -e \"${old_file}\" ]; then \n\
echo \n\
if [ ! -e \"${new_dir}\" ]; then \n\
mkdir --parents \"${new_dir}\" && echo \n\
fi \n\
mv \"${old_file}\" \"${new_file}\" \n\
fi \n\
"
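# mv needs checked-out files, so this has to be a --tree-filter
# (an --index-filter runs without checking out the tree)
git filter-branch -f --tree-filter "$(echo -e "$filter")"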
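As an alternative sketch, and not the approach used above, a single known rename can also be expressed on the index listing itself, in the spirit of the documentation's ls-files | sed | update-index pipeline. The helper name rename_index_entry is hypothetical. The sketch assumes plain path names: no tabs or backslashes, and nothing that git ls-files would print in quoted form.

```bash
#!/usr/bin/env bash
# rename_index_entry OLD NEW (hypothetical helper)
# Reads `git ls-files -s` lines on stdin ("<mode> <object> <stage>\t<path>")
# and prints them with the exact path OLD replaced by NEW. Entries for all
# other paths are printed unchanged.
rename_index_entry() {
    awk -F'\t' -v OFS='\t' -v old="$1" -v new="$2" '
        # Append "" to force a string comparison, never a numeric one.
        ($2 "") == (old "") { $2 = new }
        { print }
    '
}
```

Inside git filter-branch --index-filter, this filter would take the place of the sed stage in the documentation's pipeline. Because the filter string is evaluated by a separate shell, the awk command would have to be written inline there rather than called through the function.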
If you need to get back:
git pull # with merge
git reset --hard <hash> # use the hash of your origin/master (origin/HEAD), which will be HEAD~2, but I'd check it manually and copy/paste the hash

Related

I have a code that reads git log returning the error below

Error message: "fatal: your current branch 'master' does not have any commits yet"
This happens after making a file with this code executable and running it:
#!/usr/bin/env bash
cd "$(dirname "$(readlink -f "$BASH_SOURCE")")/.."
{
cat <<- 'EOH'
EOH
echo
git log --format='%aN <%aE>' | LC_ALL=C.UTF-8 sort -uf
} > AUTHORS
The problem is that you didn't add anything, and possibly didn't even have a change to commit, so no commit was done. If you really want that first commit without any changes, you can do this:
git commit --allow-empty -m "first commit"
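If the script should also cope with a repository that has no commits yet, it can check for a commit before calling git log. This is a minimal sketch under that assumption; the helper names has_commits and write_authors are hypothetical and not part of the original script.

```bash
#!/usr/bin/env bash
# has_commits (hypothetical helper): succeeds if HEAD of the repository in
# the current directory points at a commit, fails on a freshly initialised
# repository that has no commits yet.
has_commits() {
    git rev-parse --verify --quiet HEAD >/dev/null
}

# write_authors FILE (hypothetical helper): writes the sorted, de-duplicated
# author list to FILE, or leaves FILE empty when there is no history yet.
write_authors() {
    if has_commits; then
        git log --format='%aN <%aE>' | LC_ALL=C sort -uf
    fi > "$1"
}
```

Calling write_authors AUTHORS in a repository without commits then produces an empty file instead of the "does not have any commits yet" error.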

Can git filter repo create a monorepo from many repos interweaving commits by date?

Using git-filter-repo is it possible to combine N repositories into a mono-repository re-writing the commits so that the commits are interwoven, or "zippered" up by date?
Currently, I'm testing this with only 2 repos, each repo having its own subdirectory. After the operation, the commits for each repo are on "top" of each other rather than interwoven. What I really want is a completely linear history ordered by author date, without the added merge commits.
rm -rf ___x
mkdir ___x
cd ___x
echo "creating the monorepo"
git init
touch "README.md"
git add .
git commit -am "Hello World!"
declare -A data
data=(
["foo"]="https://github.com/bcanzanella/foo.git"
["bar"]="https://github.com/bcanzanella/bar.git"
)
for d in "${!data[@]}";
do {
REPO_NAME=$d
REPO_REMOTE=${data[$d]}
# since we can use a foo/bar as the repo identifier, replace the / with a -
REPO_DIR_TMP="$(mktemp -d -t "${REPO_NAME/\//-}.XXXX")"
echo "REPO REMOTE: $REPO_REMOTE"
echo "REPO NAME: $REPO_NAME"
echo "REPO TMP DIR: $REPO_DIR_TMP"
echo ""
echo "Cloning..."
git clone "$REPO_REMOTE" "$REPO_DIR_TMP"
echo "filtering into ..."
cd $REPO_DIR_TMP && git-filter-repo --to-subdirectory-filter "$REPO_NAME"
# cat .git/filter-repo/commit-map
## merge the rewritten repo
git remote add "$REPO_NAME" "$REPO_DIR_TMP"
echo "fetching..."
git fetch "$REPO_NAME"
echo "merging..."
git merge --allow-unrelated-histories "$REPO_NAME/master" --no-edit
## delete the rewritten repo
echo "Removing temp dir $REPO_DIR_TMP..."
rm -rf "$REPO_DIR_TMP"
echo "Removing remote $REPO_NAME..."
# git remote rm "$REPO_NAME"
echo "$REPO_NAME done!"
}
done
To emphasize eftshift0's comment: rebasing and rewriting history can lead to commits being ordered in a seemingly absurd chronological order.
If you know for a fact that all commits are well ordered (e.g. the commit date of a parent commit is always older than the commit date of its child commit), you may be able to generate the correct list of commits to feed into a git rebase -i script.
[edit] After thinking about it, this may be enough for your use case:
Look at the history of your repo using --date-order:
git log --graph --oneline --date-order
If the sequence of commits matches what you expect, you can use git log to generate a rebase -i sequence script :
# --reverse : 'rebase -i' asks for entries starting from the oldest
# --no-merges : do not mention the "merge" commits
# sed -e 's/^/pick /' : use any way you see fit to prefix each line with 'pick '
# (another valid way is to copy paste the list of commits in an editor,
# and add 'pick ' to each line ...)
git log --reverse --no-merges --oneline --date-order |\
sed -e 's/^/pick /' > /tmp/rebase-apply.txt
Then rebase the complete history of your repo :
git rebase -i --root
In the editor, copy/paste the script you created with your first command,
save & close.
Hopefully, you will get a non conflicting unified history.
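The copy/paste step can also be scripted. Git reads the GIT_SEQUENCE_EDITOR environment variable and runs that command with the path of its todo file as the last argument, so a cp of the generated list replaces the proposed list without opening an editor. This is a sketch under the same assumptions as above (well-ordered commit dates, no conflicts); the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# write_rebase_todo FILE (hypothetical helper)
# Writes one "pick <hash> <subject>" line per non-merge commit reachable
# from HEAD, oldest first, in --date-order.
write_rebase_todo() {
    git log --reverse --no-merges --date-order --format='pick %h %s' > "$1"
}

# replay_in_date_order (hypothetical helper)
# Runs the root rebase with the generated list instead of the list that
# Git proposes. Returns the exit status of the rebase.
replay_in_date_order() {
    local todo status
    todo=$(mktemp) || return 1
    write_rebase_todo "$todo"
    GIT_SEQUENCE_EDITOR="cp '$todo'" git rebase -i --root
    status=$?
    rm -f "$todo"
    return "$status"
}
```

It is worth running this on a throw-away clone first, since a conflict stops the rebase halfway and has to be resolved or aborted by hand.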

Git-Bash File Lookup Depending On File Type

I am trying to go through all existing branches and check whether files with a certain extension (such as .zip or .exe) exist.
I tried to write a bash script to achieve this task.
for branch in $(git branch);
do
echo "I am in: $branch"
git ls-files *.exe
done
I would like to see the file path when it is detected.
You are not changing to the branch so you are always checking the last branch you checked out. Try this:
# In the repo's working directory
for branch in $(git branch -a|grep -v remotes|sed 's/\*//g'); do
echo "I am in branch: ${branch}"
git checkout ${branch}
find . -type f -name '*.md'
done
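A different technique from the loop above, shown as a sketch: Git can list the files recorded in each branch without checking anything out, which leaves the working directory untouched. The helper name list_files_with_ext is hypothetical, and the sketch only looks at local branches.

```bash
#!/usr/bin/env bash
# list_files_with_ext EXT (hypothetical helper)
# For every local branch, prints "<branch>: <path>" for each tracked file
# whose name ends in EXT (for example ".zip"). It reads each branch's tree
# directly, so no checkout takes place.
list_files_with_ext() {
    local ext=$1 ref path
    git for-each-ref --format='%(refname)' refs/heads |
    while IFS= read -r ref; do
        git ls-tree -r --name-only "$ref" |
        while IFS= read -r path; do
            case $path in
                *"$ext") printf '%s: %s\n' "${ref#refs/heads/}" "$path" ;;
            esac
        done
    done
}
```

For example, list_files_with_ext .exe prints one line per matching file, prefixed with the branch that contains it.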
Following is how I solved my problem:
read -p "Extension to lookup [example: .zip]: " extensionType
for branch in $(git branch);
do
if [[ $branch == *"Release"* ]]; then
echo "----------------------------------"
echo ">>Navigating to: $branch"
echo ">>$branch..."
git checkout $branch
git ls-files "*$extensionType"
echo "----------------------------------"
fi
done
I hope this helps.

change directory in bash is not affecting the post-commit hook

I've made the following bash script to commit the parent repo after a change in a submodule.
The script needs to cd .. to check the parent repo's current branch, but the cd .. does not affect the commands that follow, I guess because of the subshell.
I've tried:
1- putting cd ../ && before each command
2- making an alias, but that didn't succeed
3- running exec, but then the script didn't continue
#!/bin/sh
#
# An example hook script to verify what is about to be committed.
# Called by "git commit" with no arguments. The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
#
# To enable this hook, rename this file to "post-commit".
commit_msg= git log -1 --pretty=%B
if [[ $(git branch | grep \* | cut -d ' ' -f2) == "int1177/next" ]]; then
cd ..
if [[ $(git branch | grep \* | cut -d ' ' -f2) == "B0/next" ]]; then
git add 6_Tests
git commit -m "bs esss"
echo "development branch B0/next has now new commit"
else
echo "development branch isn't B0/next"
fi
else
echo "current branch isn't int1177/next"
fi
Actually, this particular problem is not a bash issue, but rather a Git issue.
The question "Why doesn't "cd" work in a shell script?" is valid in general, and is a suitable answer to many other questions. But this particular post-commit hook is trying to chdir out of a submodule into its parent superproject, then make a commit within the parent superproject. That is possible. It may be a bad idea for other reasons (in general it's unwise to have Git commit hooks create commits, even in other repositories [1]), but in this particular case you're running into the fact that Git finds its directories through environment variables.
In particular, there's an environment variable GIT_DIR that tells Git: The .git directory containing the repository is at this path. When Git runs a Git hook, Git typically sets $GIT_DIR to . or .git. If $GIT_DIR is not set, Git will find the .git directory by means of a directory-tree search, but if $GIT_DIR is set, Git assumes that $GIT_DIR is set correctly.
The solution is to unset GIT_DIR:
unset GIT_DIR
cd ..
The rest of the sub-shell commands will run in the one-step-up directory, and now that $GIT_DIR is no longer set, Git will search the superproject's work-tree for the .git directory for the superproject.
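The effect can be wrapped in a small function, shown here as a sketch. The helper name branch_in_dir is hypothetical. The explicit subshell keeps both the unset and the cd from leaking into the rest of the hook.

```bash
#!/usr/bin/env bash
# branch_in_dir DIR (hypothetical helper)
# Prints the current branch of the repository that contains DIR, ignoring
# any GIT_DIR inherited from the Git process that started the hook.
branch_in_dir() {
    (
        unset GIT_DIR
        cd "$1" && git rev-parse --abbrev-ref HEAD
    )
}
```

In the hook above, the inner test could then compare "$(branch_in_dir ..)" with "B0/next".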
As an aside, this:
$(git branch | grep \* | cut -d ' ' -f2)
is a clumsy way to get the name of the current branch. Use:
git rev-parse --abbrev-ref HEAD
instead, here. (The other option is git symbolic-ref --short HEAD but that fails noisily with a detached HEAD, while you probably want the quiet result to be just the word HEAD, which the rev-parse method will produce.)
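The difference between the two commands on a detached HEAD can be sketched as follows. The helper names are hypothetical; the --quiet flag on symbolic-ref only suppresses the error message, so the non-zero exit status remains.

```bash
#!/usr/bin/env bash
# current_branch (hypothetical helper): prints the branch name, or the
# literal word "HEAD" when HEAD is detached.
current_branch() {
    git rev-parse --abbrev-ref HEAD
}

# strict_branch (hypothetical helper): prints the branch name, or prints
# nothing and returns non-zero when HEAD is detached.
strict_branch() {
    git symbolic-ref --quiet --short HEAD
}
```

Which one fits depends on whether the caller wants to treat a detached HEAD as an error or as just another value to compare against.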
[1] The main danger in this case is that the superproject repository is not necessarily in any shape to handle a commit right now. Edit: or, as discovered in this comment, is not even set up to be a superproject for that submodule, yet, much less to have a submodule-updating commit added.

How can I get git's `.git` path from git itself?

I am trying to write a shell script that needs to be able to find the .git folder for the current directory, correctly handling all of the following possibilities:
I might be in a bare repo, in which case the .git folder is either . or .. or ../.. or so on.
I might be in a submodule (in which I'll find a .git file that contains the path to the git folder)
$GIT_DIR might be set.
I might not be in a git repo at all
I have this:
seemsToBeGitdir() {
# Nothing special about "config --local -l" here, it's just a git
# command that errors out if the `--git-dir` argument is wrong.
git --git-dir "$1" config --local -l >/dev/null 2>/dev/null
return $?
}
gitdir() {
local cursor relpath
if [ "$GIT_DIR" ]; then
echo "$GIT_DIR"
return 0
fi
cursor="$(pwd)"
while [ -e "$cursor" ] && ! seemsToBeGitdir "$cursor"; do
# Git won't traverse mountpoints looking for .git
if mountpoint -q "$cursor"; then
return 1
fi
# We might be in a submodule
if [ -f "$cursor/.git" ]; then
# If .git is a file, its syntax is "gitdir: " followed by a
# relative path.
relpath="$(awk '/^gitdir:/{print$2}' "$cursor/.git")"
# convert the relative path to an absolute path.
cursor="$(readlink -f "$cursor/$relpath")"
continue
fi
if seemsToBeGitdir "$cursor/.git"; then
echo "$cursor/.git"
return 0
fi
cursor="$(dirname "$cursor")"
done
echo "$cursor"
}
And it works, but seems way too complicated -- clearly, git itself does this sort of calculation every time it's invoked. Is there a way to make git itself tell me where .git is?
Use git rev-parse, which has options specifically for this:
git rev-parse --git-dir
See also:
git rev-parse --absolute-git-dir
(new in Git version 2.13.0), and:
git rev-parse --show-toplevel
and:
git rev-parse --show-cdup
(note that its output is empty if you are already in the top level of the repository). Check the documentation for your own Git version to find out which options it supports; most of these have been around since Git 1.7, though.
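With that, the long gitdir function from the question can shrink to a few lines. This is a sketch; the helper name find_git_dir is hypothetical, and it converts the result to an absolute path with cd and pwd so that it does not depend on --absolute-git-dir being available.

```bash
#!/usr/bin/env bash
# find_git_dir (hypothetical helper)
# Prints the absolute path of the git directory for the current directory,
# or returns non-zero when the current directory is not inside a repository.
find_git_dir() {
    local dir
    dir=$(git rev-parse --git-dir 2>/dev/null) || return 1
    # --git-dir may print a path relative to the current directory, such as
    # ".git", so resolve it to an absolute path.
    (cd "$dir" && pwd -P)
}
```

Because Git does the lookup itself, this follows the same rules as any other Git command run in that directory, including the handling of $GIT_DIR and of .git files.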

Resources