How to automate git history squash by date?

I have a git repository that I use as a folder sync system: any time I change a file on the laptop, PC or mobile, the changes are automatically committed. No branches, single user.
This leads to plenty of commits, around 50 per day. I would like to write a bash cron script that automates squashing the history down to a single commit per day. The commit messages don't matter, but the dates must be preserved.
I tried git rebase -i SHA~count, but I can't figure out how to automate the process, i.e. pick the first commit and squash the other count commits.
Any suggestions?
I have no problem writing the bash that finds the first SHA of each date and counts the commits to merge; some loop over this would do the trick:
git log --reverse|grep -E -A3 ^commit| \
grep -E -v 'Merge|Author:|--|^$'|paste - -| \
perl -pe 's/commit (\w+)\s+Date:\s+\w+\s+(\w+)\s+(\d+).+/\2_\3 \1/'

I'm sharing the results based on Alderath's suggestions: I used git filter-branch to parse the history and keep just the last commit of each day. A first loop over git log writes the timestamps of the commits that need to be preserved (the last of each day) to a temporary file; then, with git filter-branch, I keep only the commits whose timestamp is present in that file.
#!/bin/bash
# extracts the timestamps of the commits to keep (the last of the day)
export TOKEEP=$(mktemp)
DATE=
for time in $(git log --date=raw --pretty=format:%cd | cut -d' ' -f1) ; do
    CDATE=$(date -d "@$time" +%Y%m%d)
    if [ "$DATE" != "$CDATE" ] ; then
        echo "@$time" >> "$TOKEEP"
        DATE=$CDATE
    fi
done
# scan the repository keeping only selected commits
git filter-branch -f --commit-filter '
    # exact, fixed-string match so that one timestamp cannot match inside another
    if grep -qxF "${GIT_COMMITTER_DATE% *}" "$TOKEEP" ; then
        git commit-tree "$@"
    else
        skip_commit "$@"
    fi' HEAD
rm -f "$TOKEEP"
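The selection step of the script above (walk the log newest-first and keep the first timestamp seen for each day) can be tried on sample data before rewriting any history. This is only a sketch: the function name last_per_day is made up, and it buckets by UTC day using integer division instead of calling date -d, so day boundaries may differ from the local-time version above.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): reads epoch timestamps on stdin,
# newest first, and prints "@<timestamp>" for the first one seen per day.
# Assumption: a "day" is a UTC day, i.e. timestamp / 86400.
last_per_day() {
    prev_day=
    # "|| [ -n "$ts" ]" also handles a last line without a trailing newline,
    # which is what --pretty=format: produces.
    while read -r ts || [ -n "$ts" ]; do
        day=$((ts / 86400))
        if [ "$day" != "$prev_day" ]; then
            printf '@%s\n' "$ts"
            prev_day=$day
        fi
    done
}
```

It could be fed from the same pipeline as above, e.g. git log --date=raw --pretty=format:%cd | cut -d' ' -f1 | last_per_day > "$TOKEEP".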

From my understanding, you intend to do something along the lines of this:
#!/bin/bash
FIRST_COMMIT_HASH_TODAY="$(git log --since="1 days ago" --pretty=format:%H | tail -n 1)"
git reset --soft ${FIRST_COMMIT_HASH_TODAY}^
git commit -m "Squashed changes for $(date +%F)"
I.e.:
List the hashes of all commits made during the last day, and extract the first (oldest) of those hashes.
(In its current form, this assumes that there is at least one commit each day.)
Move the repo's HEAD pointer to the commit before $FIRST_COMMIT_HASH_TODAY, but keep the work-tree and index unchanged.
Commit the squashed changes.
A word of caution though... Note that now you're effectively rewriting history. You can no longer just do git pull to
sync the changes because if a client repo still has the original commit history, while the server has the rewritten history,
you will get something like:
Your branch and 'origin/master' have diverged,
and have 50 and 1 different commit(s) each, respectively.
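The squash script above assumes that there is at least one commit in the last day and that the oldest of them has a parent. A guarded variant might look like the sketch below; the function name squash_since and its parameter are invented for this example, and it has not been tried on a real sync repository.

```shell
#!/bin/bash
# Hypothetical wrapper around the reset --soft approach.
# $1 is passed to git log --since, e.g. "1 days ago".
squash_since() {
    since=$1
    first=$(git log --since="$since" --pretty=format:%H | tail -n 1)
    # No commits in the period: nothing to do.
    [ -n "$first" ] || { echo "nothing to squash"; return 0; }
    # The root commit has no parent, so "$first^" would not resolve.
    git rev-parse --verify --quiet "$first^" >/dev/null ||
        { echo "oldest commit in range is the root commit; skipping"; return 0; }
    git reset --soft "$first^" &&
        git commit --quiet -m "Squashed changes for $(date +%F)"
}
```

From cron this would be called inside the repository as squash_since "1 days ago".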
<EDIT>
If you want to process the entire history, one approach would be to use some variant of git filter-branch. I put one example approach below, but this approach has many weaknesses, so you might want to improve it a bit.
Weaknesses/characteristics:
Simply ignores the time zones from git raw time stamps. (weird behaviour if commits made in different time zones)
Identifies the latest commit on the branch you want to process by its root tree hash. (weird behaviour if multiple commits have same root tree (e.g. a revert commit reverting its parent commit))
Assumes a linear branch history. (weird behaviour if there are merge commits in the branch)
Doesn't specifically create one commit per day. Instead, for each commit, it checks whether at least 24 hours have elapsed since the previous commit. If they haven't, it just skips that commit.
Always keeps the first and last commits, regardless of whether they are close in time to subsequent/previous commits.
Works based on GIT_COMMITTER_DATE rather than GIT_AUTHOR_DATE.
Not well tested. So make sure to back up the original repo if you are going to try to run this.
Example command:
LATEST_TREE=$(git rev-parse HEAD^{tree}) git filter-branch --commit-filter '
    # $1 = tree hash, $3 = parent commit hash (if commit has at least one parent)
    if [ -z "$3" ]
    then
        # First commit. Keep it.
        git commit-tree "$@"
    elif [ "$1" = "$LATEST_TREE" ]
    then
        # Latest commit. Keep it.
        git commit-tree "$@"
    else
        PREVIOUS_COMMIT_COMMITTER_DATE="$(git log -1 --date=raw --pretty=format:%cd $3)"
        PREVIOUS_COMMIT_COMMITTER_DATE_NO_TIMEZONE="$(echo $PREVIOUS_COMMIT_COMMITTER_DATE | grep -E -o "[0-9]{5,10}")"
        GIT_COMMITTER_DATE_NO_TIMEZONE="$(echo $GIT_COMMITTER_DATE | grep -E -o "[0-9]{5,10}")"
        SECONDS_PER_DAY="86400"
        if [ $(expr $GIT_COMMITTER_DATE_NO_TIMEZONE - $PREVIOUS_COMMIT_COMMITTER_DATE_NO_TIMEZONE) -gt $SECONDS_PER_DAY ]
        then
            # 24 hours elapsed since previous commit. Keep this commit.
            git commit-tree "$@"
        else
            skip_commit "$@"
        fi
    fi' HEAD
If you had a command to extract the commit hashes of the commits you'd want to keep, maybe you could get the root tree hash for all those commits, and store them to a separate file. Then you could change the commit-filter condition to check "is the current root tree hash present in the file of desired root tree hashes?" instead of "has 24 hours elapsed since the previous commit?". (This would amplify the "identify commits by root tree hash" issue that I mentioned above though, as it would apply for all commits, rather than just the latest commit)
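The elapsed-time test inside the commit filter above can also be pulled out into small helpers and checked in isolation. A minimal sketch, assuming dates in git's raw form (epoch seconds followed by an offset, optionally prefixed with @); the function names are invented for this example. Like the original, it simply ignores the time zone offset.

```shell
#!/bin/sh
# Hypothetical helpers; the names are made up for this sketch.

# Print the epoch seconds from a raw date such as "1500000000 +0200"
# or "@1500000000 +0200" (the time zone offset is dropped).
epoch_of() {
    d=${1#@}
    printf '%s\n' "${d%% *}"
}

# Succeed if the second date is more than 24 hours after the first.
more_than_a_day_apart() {
    [ $(( $(epoch_of "$2") - $(epoch_of "$1") )) -gt 86400 ]
}
```

Inside the filter, the condition would then read something like: if more_than_a_day_apart "$PREVIOUS_COMMIT_COMMITTER_DATE" "$GIT_COMMITTER_DATE". The functions would have to be defined within the filter string itself, since the filter is evaluated in its own shell.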
</EDIT>

If you have the number of commits you want to go back then you could just use git reset --soft and then make a new commit e.g.
COMMIT_COUNT=$(git log --pretty=oneline --since="1 days" | wc -l)
git reset --soft HEAD~$COMMIT_COUNT
git commit -m "Today's work"

Related

How to add a file change to the latest commits of all branches in a git repo? [duplicate]

I have a repo that has over 300 branches, and I want to store various files in Git LFS. To do this I need to add the .gitattributes file to the branches first. However, as I have over 300 branches, doing this manually would be very time-consuming.
Is there a way that I can add a prepopulated .gitattributes file to the root of every branch automatically and push them?

A one-liner which assumes you have a branch named feature/add-gitattributes which makes the necessary changes;
git for-each-ref refs/remotes/origin --format="%(refname:lstrip=3)" | xargs -n 1 sh -c 'git checkout "$1"; git merge feature/add-gitattributes;' --
To break it down...
This part just gets a list of those 300 branch names;
git for-each-ref refs/remotes/origin --format="%(refname:lstrip=3)"
This part takes those names and passes them to a sub-shell;
| xargs -n 1 sh -c
This part is the command to the sub-shell which checks out the target branch and merges your feature branch to add the .gitattributes file.
'git checkout "$1"; git merge feature/add-gitattributes;' --
The trailing -- takes the $0 slot of the sub-shell, so the branch name that xargs appends after it arrives as the first argument, $1.
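To see how the arguments arrive in the sub-shell without touching any branch, the checkout/merge command can be swapped for an echo. This is only an illustration: list_branches and its branch names are stand-ins for the git for-each-ref output.

```shell
#!/bin/sh
# Stand-in for the for-each-ref output; these branch names are invented.
list_branches() { printf '%s\n' main feature/x; }

# Print what the sub-shell sees as $0 and $1 for each branch name.
show_positional() {
    list_branches | xargs -n 1 sh -c 'echo "\$0=$0 \$1=$1"' --
}
```

Calling show_positional prints one line per branch, with the branch name in $1 and the -- placeholder in $0.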

What is the simplest way to display all branches that have not been committed to for more than 6 months?

It seems like 1836 branches exist in one of the company's repos and I've been given a task to first display and then delete all branches that have not been committed to for 6 months.
I found this SO question and tried running (with both --until and --before and "month"):
#!/bin/bash
branches_to_delete_count=0
for k in $(git branch -a | sed /\*/d); do
if [ -n "$(git log -1 --before='6 month ago' -s $k)" ]; then
echo "NOT REALLY DELETING, git branch -D $k"
fi
((branches_to_delete_count=branches_to_delete_count+1))
done
echo "Found $branches_to_delete_count branches to delete!"
But to no avail, I get the same number of branches to delete each time which is 1836.
What am I doing wrong? How can I list all branches that haven't been committed to for more than 6 months?

The reason why all your branches show up: git log branch does not look only at the branch's head, it looks at its whole history.
git log -1 --before='6 month ago' branch will:
unroll the history of branch
keep only commits older than 6 months
keep the first of these commits
Since (in your company's repo) all branches have a commit that is at least 6 months old in their history, git log -1 --before='6 month ago' branch will always show one line.
You can either restrict the range of commits to "a range which contains only the head commit":
git log -1 --before='6 month ago' branch^..branch
or use git for-each-ref as @phd suggested in his comment:
git for-each-ref --format="%(refname) %(creatordate)" --sort='-creatordate' refs/heads/
and keep the branches with old enough dates.
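The "keep the branches with old enough dates" step can be scripted by asking for-each-ref for the date as epoch seconds and comparing it with a cutoff. A sketch under a few assumptions: the helper name stale_branches is made up, it uses committerdate rather than creatordate, and it only looks at local branches under refs/heads/.

```shell
#!/bin/sh
# Hypothetical helper: print local branches whose tip commit is older than
# the given cutoff (seconds since the epoch).
stale_branches() {
    cutoff=$1
    git for-each-ref --format='%(committerdate:unix) %(refname:short)' refs/heads/ |
        awk -v cutoff="$cutoff" '$1 < cutoff { print $2 }'
}
```

With GNU date, the cutoff for six months could be computed as stale_branches "$(date -d '6 months ago' +%s)"; other date implementations need a different way to produce the epoch value.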

There is no single git command that directly lists the branches whose last commit is more than 6 months old, so we combine git commands in a shell script.
We use two git commands:
the first is git branch | sed s/^..// which lists the branches and strips the two leading characters of each line
the second is git log -1 --before='6 month ago' <branch-name>
Paste the following one-liner into a terminal; it prints the matching branch names:
for branch in `git branch | sed s/^..//` ; do log=`git log -1 --before='6 month ago' $branch`; if [ ${#log} -gt 0 ] ; then echo $branch; fi; done
The same logic as a shell script:
Save it as test.sh, make it executable with chmod +x test.sh, then run bash test.sh
month=6 # how many months back to look
for branch in `git branch | sed s/^..//` # iterate over the branches one by one
do
    log=`git log -1 --before="$month month ago" $branch` # log of the branch, limited to commits older than $month months
    if [ ${#log} -gt 0 ] # non-empty output: the branch has a commit older than the cutoff
    then
        echo $branch # print the branch name
    fi
done
Let me know whether it works.

change directory in bash is not affecting the post-commit hook

I've made the following bash script to commit to the parent repo after a change in a submodule.
The script needs to cd .. to check the parent repo's current branch, but the cd .. does not affect the commands that follow, I guess because of a subshell.
I've tried to:
1- run cd ../ && before each command
2- make an alias, but didn't succeed
3- run exec, but the script didn't continue
#!/bin/sh
#
# An example hook script to verify what is about to be committed.
# Called by "git commit" with no arguments. The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
#
# To enable this hook, rename this file to "post-commit".
commit_msg= git log -1 --pretty=%B
if [[ $(git branch | grep \* | cut -d ' ' -f2) == "int1177/next" ]]; then
cd ..
if [[ $(git branch | grep \* | cut -d ' ' -f2) == "B0/next" ]]; then
git add 6_Tests
git commit -m "bs esss"
echo "development branch B0/next has now new commit"
else
echo "development branch isn't B0/next"
fi
else
echo "current branch isn't int1177/next"
fi

Actually, this particular problem is not a bash issue, but rather a Git issue.
Why doesn't "cd" work in a shell script? is valid in general, and is a suitable answer to many other questions. But this particular post-commit hook is trying to chdir out of a submodule into its parent superproject, then make a commit within the parent superproject. That is possible. It may be a bad idea for other reasons (in general it's unwise to have Git commit hooks create commits, even in other repositories [1]), but in this particular case you're running into the fact that Git finds its directories through environment variables.
In particular, there's an environment variable GIT_DIR that tells Git: The .git directory containing the repository is at this path. When Git runs a Git hook, Git typically sets $GIT_DIR to . or .git. If $GIT_DIR is not set, Git will find the .git directory by means of a directory-tree search, but if $GIT_DIR is set, Git assumes that $GIT_DIR is set correctly.
The solution is to unset GIT_DIR:
unset GIT_DIR
cd ..
The rest of the sub-shell commands will run in the one-step-up directory, and now that $GIT_DIR is no longer set, Git will search the superproject's work-tree for the .git directory for the superproject.
As an aside, this:
$(git branch | grep \* | cut -d ' ' -f2)
is a clumsy way to get the name of the current branch. Use:
git rev-parse --abbrev-ref HEAD
instead, here. (The other option is git symbolic-ref --short HEAD but that fails noisily with a detached HEAD, while you probably want the quiet result to be just the word HEAD, which the rev-parse method will produce.)
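Putting the two points together (unset GIT_DIR before changing directory, and use rev-parse for the branch name), a helper for the hook might look like the sketch below. The function name current_branch_of is invented here, and only GIT_DIR is cleared, as described above.

```shell
#!/bin/sh
# Hypothetical helper: print the current branch of the repository that
# contains directory $1, even when called from inside a Git hook.
# The body runs in a subshell, so the caller's directory and environment
# are left untouched.
current_branch_of() (
    unset GIT_DIR            # otherwise Git keeps using the hook's repository
    cd "$1" || exit 1
    git rev-parse --abbrev-ref HEAD
)
```

In the hook, the inner test could then be written as [ "$(current_branch_of ..)" = "B0/next" ]; the later git add and git commit still need GIT_DIR unset and the cd .. performed in the hook itself.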
[1] The main danger in this case is that the superproject repository is not necessarily in any shape to handle a commit right now. Edit: or, as discovered in this comment, is not even set up to be a superproject for that submodule, yet, much less to have a submodule-updating commit added.

Why is the output of git diff-tree empty in my post-receive hook?

I have a bare remote repository with source files in which I want to build only the changed files after it has been pushed to. I thought the best way to detect which files have been changed would be by putting the command
changed_files=$(git diff-tree --no-commit-id --name-only -r HEAD) into a post-receive hook.
However, the variable ends up empty as I have verified by echoing it into a file. If I put HEAD^ instead of HEAD, it does show the changed files of the second to most recent commit. However, it doesn't show the most recent changes when I put HEAD but just shows nothing.
Can anyone help me? Or is there a smarter approach to my problem altogether?
I would definitely prefer a lean approach, like automatically triggering a build on push, over one that would have to, e.g., periodically check for changes.

At the point the post-receive hook is executed, all the references have already been updated. Therefore, HEAD means the new head, not the old one.
Diffing HEAD alone may not produce the results you want, since it assumes that you pushed exactly one non-merge commit and want to diff it against its parent, while you may have pushed a merge or multiple commits.
What you probably want to do is take advantage of the standard input which provides the old and new values. Something like the following will print the changed files as output from the remote side when you push:
#!/bin/sh
while read old new ref
do
# Handle created or deleted branches.
echo $old | grep -qsE '^0+$' && old=$(git hash-object -t tree /dev/null)
echo $new | grep -qsE '^0+$' && new=$(git hash-object -t tree /dev/null)
git diff-tree --no-commit-id --name-only -r "$old" "$new"
done
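The "created or deleted branches" case in the hook above relies on the all-zero object name that appears on one side of the input line. A small sketch of that check on its own, with invented function names, which a hook could use to label each update:

```shell
#!/bin/sh
# Hypothetical helpers for a post-receive hook.

# Succeed if $1 is non-empty and consists only of zeros.
is_zero() {
    case $1 in
        ''|*[!0]*) return 1 ;;   # empty, or contains a non-zero character
        *) return 0 ;;
    esac
}

# Classify one update from its old and new object names.
classify_update() {
    if is_zero "$1"; then echo created
    elif is_zero "$2"; then echo deleted
    else echo updated
    fi
}
```

Inside the while read old new ref loop this could be used as echo "$ref: $(classify_update "$old" "$new")".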

OK, I've figured it out: I was getting a
remote: fatal: ambiguous argument 'HEAD': both revision and filename
error in the push command which I had not noticed. After changing
changed_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
to
changed_files=$(git diff-tree --no-commit-id --name-only -r HEAD --)
everything is working fine. Apparently, this is caused by the hook being executed in the .git directory of the remote repository, and there is a file called HEAD in that directory, which makes referring to the HEAD revision as HEAD ambiguous.

Git fast undo all changes with one command

We have a big repo... and drive encryption. So git reset --(whatever) takes quite long. Let's imagine a situation:
you're on a feature branch
you have some configuration changed
you want to checkout master a-clean && pull
checking out master is not possible straight away because you have made some changes
There are several options I know to revert those changes:
git reset --hard --> slow
git checkout . in root dir --> seems it's identical to reset --hard, and slow as well the same way
git stash - takes even longer
git status and then git checkout -- (filename). Now, that's fast, but you have to repeat it for every file!
Myself and bash don't understand each other very well, so doing something fancy like git status | grep modified: | awk "git checkout -- {%2}" is something beyond my current knowledge.
However, maybe there's a command in mgit that does git checkout -- to all the "modified:" files?

git status -s | grep -Po '^ ?M ?\K.*' | xargs git checkout --
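-s: short format, convenient to parse
grep -Po: -P enables Perl regexes, where \K drops everything matched to its left from the reported match; -o prints only the matched part
xargs: appends the file names as arguments to git checkout --, splitting them over several invocations if the command line would become too long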
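File names containing spaces would be split by the pipeline above. A variant that passes NUL-separated names is sketched below; it swaps git status for git diff --name-only, the function name discard_modified is made up, and it assumes an xargs that supports -0 (GNU and BSD do) and that it is run from the top of the work tree.

```shell
#!/bin/sh
# Hypothetical helper: restore every tracked file that is modified in the
# work tree but not staged. NUL separators keep names with spaces intact.
# Assumption: called from the repository root, because git diff prints
# paths relative to the root while git checkout resolves them from the
# current directory.
discard_modified() {
    git diff --name-only --diff-filter=M -z | xargs -0 git checkout --
}
```

Whether this is faster than git checkout . on a large, encrypted drive would need to be measured; it only touches the files git already reports as modified.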
