git - only fetch the files, not the history - performance

When I run git pull or git fetch, I retrieve both the history and the files. For huge projects, that takes a long time. I wonder how this could be sped up, since for some projects I am only interested in the source code and not in the history. Is there a way to tell git that I only want to fetch the current snapshot of the files and not the whole history?

You probably want to look at the --depth option of git clone, which creates what is called a "shallow clone". In particular, you probably want:
git clone --depth=1 <url>
If the project is on GitHub, you can also use the download links there. Note that there are some catches to using a shallow clone; the git clone documentation says:
Create a shallow clone with a history truncated to the specified number of revisions. A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it), but is adequate if you are only interested in the recent history of a large project with a long history, and would want to send in fixes as patches.
But that sounds like something you can live with. Those limitations were written for older versions of Git; Git 1.9 and later relaxed most of them.
Also, as positron pointed out, you can do this with git archive as well.
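To see what --depth=1 does, here is a minimal sketch that runs entirely in a temporary directory. The upstream repository, its three commits, and the demo identity are made up for illustration; against a real server you would pass its URL instead of the file:// path.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a throwaway "upstream" repository with three commits.
git init -q "$work/upstream"
for n in 1 2 3; do
    echo "revision $n" > "$work/upstream/file.txt"
    git -C "$work/upstream" add file.txt
    git -C "$work/upstream" commit -q -m "commit $n"
done

# Git ignores --depth for plain local paths, so use a file:// URL here.
git clone -q --depth=1 "file://$work/upstream" "$work/shallow"

full_count=$(git -C "$work/upstream" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "upstream has $full_count commits, the shallow clone has $shallow_count"
```

The shallow clone holds the latest files but only the most recent commit.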

You can use a shallow clone:
git clone --depth=1 git://url/of/repo
However, with older versions of Git (before 1.9) you won't be able to push changes made in a shallow clone.

If the server has a web view such as gitweb or cgit, you can download a snapshot from there. I don't think fetching only the code is possible, though, because fetch operates on git objects, not on the files. If the server allows it, you can request an archive instead:
git archive --format=tar --remote=gitolite@server:repo.git HEAD | bzip2 > repo-snapshot.tar.bz2
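The same idea can be tried locally. This is only a sketch: the src repository and its README are placeholders, and a local path stands in for the remote server. It streams HEAD as a tar archive and unpacks it, so the result contains files but no history.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Placeholder repository standing in for the server-side repo.
git init -q "$work/src"
echo "hello" > "$work/src/README"
git -C "$work/src" add README
git -C "$work/src" commit -q -m "initial commit"

# Stream HEAD as a tar archive and unpack it into an empty directory.
mkdir "$work/snapshot"
git archive --format=tar --remote="$work/src" HEAD | tar -x -f - -C "$work/snapshot"
ls -A "$work/snapshot"
```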


Spring and GitHub: hide sensitive data

I have a repository on GitHub that I would like to make public so recruiters can view it.
This repository, though, contains my SMTP settings and a MongoDB URI that shouldn't be shared with others. This information is in my application.properties file.
What's the simplest way to hide this sensitive data and also make sure no one can go look at old commits and see how it was before hiding it?
I have seen some ways on the web but they all look quite complicated...
Thank you for your experience and time
Use environment variables to hide your sensitive data, for example:
spring.data.mongodb.host=${MONGO_DB_HOST}
spring.mail.host=${MAIL_HOST}
Set the values in your dev environment.
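Setting those values could look like the following in a shell. The host names are made-up examples, not anything from the question; the application then has to be started from the same shell (or the variables set in your IDE's run configuration) so that it inherits them.

```shell
# Hypothetical values; replace them with your real hosts.
export MONGO_DB_HOST=localhost
export MAIL_HOST=smtp.example.com

# Child processes (such as the JVM running the application) inherit exported variables.
sh -c 'echo "mongo=$MONGO_DB_HOST mail=$MAIL_HOST"'
```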
I don't have any idea about how to hide your old commits.
Create a .gitignore file at the root of your project and list in it whatever files you don't want git to track and push to GitHub, for example:
/public/packs
/node_modules/
.pnp.js
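A trailing / (forward slash) makes a pattern match directories only, and a leading / anchors it to the directory that contains the .gitignore file.
A leading . (dot) has no special meaning; .pnp.js is simply a file name.
The .gitignore file belongs in the project root, next to the .git directory. Note that it only affects untracked files: anything that is already committed stays in the history.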
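A quick way to check that a pattern works is git check-ignore. The sketch below uses a throwaway repository and assumes the properties file lives at src/main/resources/application.properties, which is a common Spring layout but not stated in the question.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"

# Assumed file location and a dummy secret, for illustration only.
mkdir -p src/main/resources
echo "spring.mail.password=secret" > src/main/resources/application.properties
echo "src/main/resources/application.properties" > .gitignore

# check-ignore exits with status 0 and prints the matching rule when the path is ignored.
git check-ignore -v src/main/resources/application.properties

# The ignored file no longer shows up as untracked.
untracked=$(git status --porcelain)
echo "$untracked"
```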
If the goal is just for recruitment, would it be acceptable to have a second copy for recruitment, while leaving the original copy alone?
While there are certainly more idiomatic ways of achieving this through git, a simple solution that needs minimal git knowledge and no advanced techniques would be:
Create a new empty git project on GitHub
Clone the new project locally
Copy the (non-.git) files from the existing project into the new project (using either the console or your OS's windowed UI)
Delete or redact the offending entries from the new project
Commit the changes as a single commit
Push the new project back to GitHub
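The steps above can be sketched as follows. Everything here is a local stand-in: old plays the existing project, github.git plays the new empty GitHub repository, and hunter2 is a dummy secret. The sketch uses git archive to copy the tracked files without the .git directory, which is one of several ways to do the copy.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins for the existing project and the new, empty remote.
git init -q "$work/old"
echo "spring.mail.password=hunter2" > "$work/old/application.properties"
git -C "$work/old" add application.properties
git -C "$work/old" commit -q -m "old history containing the secret"
git init -q --bare "$work/github.git"

# Clone the new project and copy the tracked files across (no .git directory).
git clone -q "$work/github.git" "$work/new"
git -C "$work/old" archive HEAD | tar -x -f - -C "$work/new"

# Redact the offending entry.
sed 's/=hunter2$/=${MAIL_PASSWORD}/' "$work/new/application.properties" > "$work/redacted"
mv "$work/redacted" "$work/new/application.properties"

# Commit everything as a single commit and push it.
git -C "$work/new" add .
git -C "$work/new" commit -q -m "Initial public version"
git -C "$work/new" push -q origin HEAD
```

The new repository has exactly one commit, so there is no older revision in which the secret could be found.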
I have not used it myself, but the open source BFG Repo-Cleaner looks like it might satisfy your requirements of simplicity while retaining the activity chart for reviewers to view. This can be done on a publicly-facing copy of the repo if you wish to keep your private working copy, while still keeping the activity history viewable.
Following the tool's usage instructions, you should be able to do the following (assuming you want these changes in a fresh copy of the repo):
The first step is to duplicate the repository on GitHub, following the instructions in the GitHub docs.
To do this, first create a new repository.
Next, mirror the repository, following the GitHub instructions:
Open Terminal.
Create a bare clone of the repository.
$ git clone --bare https://github.com/exampleuser/old-repository.git
Mirror-push to the new repository.
$ cd old-repository.git
$ git push --mirror https://github.com/exampleuser/new-repository.git
Remove the temporary local repository you created earlier.
$ cd ..
$ rm -rf old-repository.git
Now that you have the duplicate repository, you can run the BFG Repo-Cleaner on a local clone of it (my-repo.git below) to replace all instances of text you want hidden with ***REMOVED***.
$ java -jar bfg.jar --replace-text replacements.txt my-repo.git
The replacements.txt file would contain the SMTP settings, the MongoDB URI, and any other text you want hidden, one entry per line:
mongodb://my-username:my-password@host1.example.com:27017,host2.example.com:27017/my-database
marco-f@example.com
Note that the BFG does not modify the latest commit on the master/HEAD branch, so that commit has to be fixed manually. You can either make a new commit with the files changed by hand before running the BFG Repo-Cleaner, or fix the final commit afterwards using the --amend option:
$ git commit --amend
Now that the changes have been made, they can be pushed to GitHub.
$ git push
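Whichever approach you choose, it is worth checking whether the secret still appears anywhere in the history before making the repository public. The sketch below shows the problem on a throwaway repository (hunter2 is a dummy secret): the current file is clean, yet git log -S still finds commits that touched the string.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
echo "spring.mail.password=hunter2" > application.properties
git add application.properties
git commit -q -m "add config"
echo 'spring.mail.password=${MAIL_PASSWORD}' > application.properties
git commit -q -am "read password from the environment"

# -S lists commits that added or removed occurrences of the string, on any branch.
hits=$(git log --all --oneline -S'hunter2')
echo "$hits"
```

An empty result for each of your secrets is a reasonable sign that the rewrite worked.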

SVN: find files updated to nonexistence

I am writing a shell script which can store the actual state of an SVN working copy and restore it later, exactly as it was. Currently I have a problem with a specific, rare combination of revisions of files and directories which seems to be undetectable.
Let's say that there is a repository with two revisions.
There are two cases:
Assume that foo is a file (or a directory) that exists only in revision 2. At the beginning the whole working copy is at revision 2. Then foo (and only foo) is updated to revision 1.
Assume that bar is a file (or a directory) that exists only in revision 1. At the beginning the whole working copy is at revision 1. Then bar (and only bar) is updated to revision 2.
Both cases are very similar, but it seems that they have different solutions. In both cases the file (or directory) simply vanishes. However, the output of svn status contains no information about that.
How can a shell script create a list of such files and directories?
There is one simple but bad solution: use svn list to get a list of the files that should exist in the current revision and compare it to the list of files that really exist.
This solution is unacceptable because it takes a lot of time and generates a lot of traffic to the server.
I posted the best answer that I can come up with. Still, it works only for the first case and has false-positives.
I once attempted to do the same thing that you're doing, and I hit so many corner cases that I eventually went a completely different direction. Instead of using a script, I used a local git repository.
Check out a working copy from the Subversion repository, then create a local git repository in that folder using git init. Add the entire contents of your Subversion working copy to the git repository - including the .svn metadata directories - using git add followed by a git commit. Git is now keeping track of your working copy plus all of the Subversion metadata associated with it. My current git repository has 5 different branches, each based off of a different Subversion revision and containing different sets of changes that haven't been committed to the Subversion repository yet. The git repository makes it easy to switch back and forth between them, and Subversion works as if they were all separate working copies. Even for large working copies, git does a good job at storing contents efficiently and switching between branches quickly.
Note that this is different than the git svn command, which is git's method of directly interfacing with a Subversion repository. I found git svn to be more complicated to use and easier to break things. Wrapping a normal Subversion working copy in a git repository allowed me to still do all of my repository operations using Subversion, and only required me to learn a few basic git commands (add, commit, branch, checkout, etc). It's a bit easier for someone who is experienced with Subversion and new to git; git svn is more geared towards someone who is experienced with git and stuck with a Subversion repository.
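A rough sketch of that workflow is below. It does not use Subversion at all: the wc directory and its .svn/wc.db file are placeholders standing in for a real working copy created by svn checkout, and the branch name is made up.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Placeholder for a Subversion working copy (a real one comes from `svn checkout`).
mkdir -p "$work/wc/.svn"
echo "placeholder metadata" > "$work/wc/.svn/wc.db"
echo "int main(void) { return 0; }" > "$work/wc/main.c"

cd "$work/wc"
git init -q
git add -A    # includes .svn, so the Subversion state is versioned too
git commit -q -m "Snapshot of the working copy"

# One git branch per saved working-copy state.
git checkout -q -b experiment
echo "/* local change */" >> main.c
git commit -q -am "Work not yet committed to Subversion"
git checkout -q -    # switch back to the original state
```

Switching branches swaps both the files and the .svn metadata, which is what lets Subversion treat each branch as a separate working copy.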
I found a partial solution for the first case.
svn status -u | grep '^........\*........ ' | cut -c 22-
This code shows all files that exist in the head revision and do not exist in the current one. It finds the files and directories from the first case. However, it generates false positives when a file is removed because its parent directory (which still exists) is updated to a lower revision.

How to recover files in git repository

I had a project connected to a local git repository. I decided to reinitialize it after some mess with branches and commits. First, I deleted the old repository with "rm -r .git", and then created a new one with "git init". After that, I found my working directory looking as if the project had only just been created: the results of all my work are gone.
Trying many recipes from the internet didn't help. Please give me a clue: is there any chance to recover my project's files or not?
If your "local repository" was itself a clone (you ran git clone /path/to/your/local/repo), then yes, you can restore it by cloning again with git clone /path/to/your/local/repo, or with git remote add origin /path/to/your/local/repo && git fetch && git pull origin master.
Same thing if you cloned from a remote repository.
Otherwise, there is no way to recover your files with git, unless you deleted them through a graphical interface (which moves them to a trash folder instead of really deleting them) or you have a backup.

Working with multiple Git repositories

I have the following directory structure:
root
root/framework (Yii)
root/protected/messages
All of these folders must be separate git repos.
What I want to do is
root and root/framework must be separate repos. But
root/framework must be pull only because I have no push access to this repository. I mean I want to pull yii when I pull parent repo, but don't want to push when I push parent repo.
Another problem is, remote dir structure of Yii (root/framework) looks like http://screencast.com/t/mU1TgXuZDv
I need only framework folder's contents. How can I pull only this folder's contents into root/framework ?
To make root/protected/messages separate git repo so that, when I push & pull root git repo, to do it for this one too. In other words, to push & pull with parent one to 2 separate remotes.
To solve the second problem, I initialized a new repo inside root/protected/messages, but now they push & pull separately. I want them to push & pull changes to/from the 2 remotes at once. I can't figure out how to do it.
Also I have no idea about first problem.
Any suggestions?
In order to create separate and independent git repos within a parent git repo, you want to look into Git Submodules (http://git-scm.com/book/en/Git-Tools-Submodules). These basically allow you to keep a completely independent git repo inside a directory of another git repository.
To create the submodule the command is git submodule add git://path/to/gitname.git folder-containing-the-inner-git. Of course you will need to cd into the parent folder before firing this command, which in your case will be root. The git://path/to/gitname.git will be the git url for Yii and folder-containing-the-inner-git will be root/framework.
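As a rough local sketch of that command: the yii directory below is a stand-in for the real Yii repository, and the protocol.file.allow setting is only there because recent Git versions refuse to clone submodules from local paths by default. With a real git:// or https:// URL it is not needed.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream Yii repository.
git init -q "$work/yii"
echo "framework code" > "$work/yii/README"
git -C "$work/yii" add README
git -C "$work/yii" commit -q -m "upstream commit"

# The parent repository ("root" in the question).
git init -q "$work/root"
cd "$work/root"
git commit -q --allow-empty -m "initial commit"

# Add the inner repository as a submodule in the framework directory.
git -c protocol.file.allow=always submodule --quiet add "$work/yii" framework
git commit -q -m "Add framework as a submodule"
cat .gitmodules
```

The parent repository records only the submodule's URL and the commit it points to. Note that the submodule is the whole upstream repository, not a single folder of it.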
To pull only a specific folder of the Yii repo, you might try git checkout, as suggested by the Stack Overflow question "How to pull specific directory with git". I have never tried this myself.
Also, as of Git 1.7 you can do a sparse checkout (https://www.kernel.org/pub/software/scm/git/docs/v1.7.0/git-read-tree.html#_sparse_checkout), although you will still have to fetch the entire repo.
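A minimal sketch of the sparse checkout mechanism described on that page, using a throwaway upstream repository with a framework and a docs directory (both invented for the demo). The whole repository is still cloned; only the working tree is limited to framework/.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway upstream with two top-level directories.
git init -q "$work/upstream"
mkdir "$work/upstream/framework" "$work/upstream/docs"
echo "code" > "$work/upstream/framework/base.php"
echo "manual" > "$work/upstream/docs/guide.txt"
git -C "$work/upstream" add .
git -C "$work/upstream" commit -q -m "initial commit"

git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# Enable sparse checkout and list the paths to keep in the working tree.
git config core.sparseCheckout true
mkdir -p .git/info
echo "framework/" > .git/info/sparse-checkout

# Re-apply HEAD to the working tree using the sparse patterns.
git read-tree -mu HEAD
ls
```

This keeps the framework directory as a subdirectory of the clone; it does not move its contents to the top level.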
Once you create a separate git repo using git submodules inside root, you will have to push and pull the git repo inside root/protected/messages separately. You can however automate this process by creating a git hook (http://git-scm.com/book/en/Customizing-Git-Git-Hooks) for the repo inside root. A hook is a script that can be executed upon specific git events/operations like committing, merging, etc. For a full list of these events you can refer to this page ... http://www.manpagez.com/man/5/githooks/
There is no hook that runs on git pull itself, and the only client-side push hook is pre-push (added in Git 1.8.2), which runs before the push. However, there is a hook for git merge, post-merge:
This hook is invoked by git merge, which happens when a git pull is
done on a local repository. The hook takes a single parameter, a status
flag specifying whether or not the merge being done was a squash merge.
This hook cannot affect the outcome of git merge and is not executed,
if the merge failed due to conflicts.
This hook can be used in conjunction with a corresponding pre-commit
hook to save and restore any form of metadata associated with the
working tree (eg: permissions/ownership, ACLS, etc). See
contrib/hooks/setgitperms.perl for an example of how to do this.
So you can write a simple shell script like:
# Hooks run from the top of the working tree (root), so the path is relative to it.
cd protected/messages || exit 1
git pull origin master
So every time you pull from the outer repo in root, this script will run and pull the contents of your inner repo as well. However, this will happen on every merge, not just the merges that happen on a pull, so you might want to be careful.
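Installing such a hook could look like the sketch below. It creates a throwaway root repository just to have a .git/hooks directory to write into; the inner path protected/messages comes from the question, and clearing Git's environment variables before changing repositories is a precaution I am assuming is wanted, not something the hook documentation quoted above requires.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/root"
mkdir -p "$work/root/.git/hooks"
hook="$work/root/.git/hooks/post-merge"

# Write the hook; the quoted heredoc keeps $(...) from being expanded now.
cat > "$hook" <<'EOF'
#!/bin/sh
# Runs after every successful merge in the outer repository,
# including the merge performed by `git pull`.

# Clear repository-specific variables so Git targets the inner repository.
unset $(git rev-parse --local-env-vars)

# Hooks start in the top of the outer working tree.
cd protected/messages || exit 0
git pull origin master
EOF

# Git only runs hooks that are executable.
chmod +x "$hook"
```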
Hope this helps.
You may try a more straightforward way:
Init your git repo in root;
Add your root/framework to .gitignore in it;
Go to root/framework and init new git repository there;
You will have matryoshka-style nested repos. But, to be frank, they will be harder to maintain than the git submodules solution, since the root repo is not aware of the other repos at all, and all pushes and pulls need to be done separately in each repo.
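The three steps can be tried in a temporary directory like this. It is only a sketch: the file base.php is a made-up placeholder, and no remotes are configured.

```shell
set -eu
work=$(mktemp -d)
mkdir -p "$work/root/framework"
cd "$work/root"

# Outer repository ignores the framework directory.
git init -q
echo "/framework/" > .gitignore

# Inner repository lives inside the ignored directory.
git init -q framework
echo "code" > framework/base.php

# Each repository only sees its own files.
outer_status=$(git status --porcelain)
inner_status=$(git -C framework status --porcelain)
echo "outer: $outer_status"
echo "inner: $inner_status"
```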

Creating a local transparent cache of a mercurial repository

I have lots of different clones which I work on separately. When I want to update those clones, it can be quite slow to update them from the server. So instead I have a "clean" clone, which I update from the server regularly, and all the other clones clone from the clean clone (hope that makes sense).
But now what I have is a two-step approach. First go to the clean clone and pull, and then go to the real clone I'm working on and pull from the clean clone. Can this be made into 1 step?
Ideally, the "clean" clone would be transparent: when it's pulled from, it would do a pull itself. That way we'd have caching and 1-step. Is there a way to do this?
Keeping a clean clone locally is very common and a good idea in general. I've always stuck with the two step process you describe, but you could do this with hooks if you wanted.
In your cache repo you'd put something like this in the .hg/hgrc file:
[hooks]
preoutgoing = hg pull
which tells that repo to do a hg pull before it bundles up changes in response to a pull or clone request made on it.
Note that even if the downstream (real clone) repo requests a subset of the changesets using pull -r or clone -r, this cache repo will pull down everything. That's likely what you want since your goal is a mirror, but as a commenter noted, it's worth pointing out.
You can do this using hooks. In your <clean-clone>/.hg/hgrc, add these as a first draft:
[hooks]
# Before a pull from this repository, pull from upstream.
# (A non-zero exit from a pre* hook aborts the operation, hence || instead of &&.)
preoutgoing.autopull = [ "$HG_SOURCE" != 'pull' ] || hg pull
# After a push to this repository, push to upstream.
changegroup.autopush = [ "$HG_SOURCE" = 'push' ] && hg push
(Note: "autopush" and "autopull" are optional identifiers with no special meaning; you can leave them out if you have no other hooks defined.)
