SVN: find files updated to nonexistence - bash

I am writing a shell script that can store the actual state of an SVN working copy and restore it later, exactly as it was. Currently I have a problem with a specific, rare combination of file and directory revisions that seems to be undetectable.
Let's say that there is a repository with two revisions.
There are two cases:
Assume that foo is a file (or a directory) that exists only in revision 2. At the beginning the whole working copy is at revision 2. Then foo (and only foo) is updated to revision 1.
Assume that bar is a file (or a directory) that exists only in revision 1. At the beginning the whole working copy is at revision 1. Then bar (and only bar) is updated to revision 2.
The two cases are very similar, but they seem to need different solutions. In both cases the file (or directory) simply vanishes, yet the output of svn status contains no information about that.
How can a shell script create a list of such files and directories?
There is one simple but bad solution: use svn list to get the files that should exist in the current revision and compare that with the list of files that really exist.
This solution is unacceptable because it takes a lot of time and generates a lot of traffic to the server.
I posted the best answer I could come up with, but it works only for the first case and produces false positives.

I once attempted to do the same thing that you're doing, and I hit so many corner cases that I eventually went a completely different direction. Instead of using a script, I used a local git repository.
Check out a working copy from the Subversion repository, then create a local git repository in that folder using git init. Add the entire contents of your Subversion working copy to the git repository - including the .svn metadata directories - using git add followed by a git commit. Git is now keeping track of your working copy plus all of the Subversion metadata associated with it. My current git repository has 5 different branches, each based off of a different Subversion revision and containing different sets of changes that haven't been committed to the Subversion repository yet. The git repository makes it easy to switch back and forth between them, and Subversion works as if they were all separate working copies. Even for large working copies, git does a good job at storing contents efficiently and switching between branches quickly.
Note that this is different from the git svn command, which is git's method of directly interfacing with a Subversion repository. I found git svn more complicated to use and easier to break. Wrapping a normal Subversion working copy in a git repository allowed me to still do all of my repository operations using Subversion, and only required me to learn a few basic git commands (add, commit, branch, checkout, etc). It's a bit easier for someone who is experienced with Subversion and new to git; git svn is more geared towards someone who is experienced with git and stuck with a Subversion repository.
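The wrapping step described above could be sketched roughly like this. The function name, the inline identity and the commit message are made up for illustration; it only assumes a git recent enough to support -C.

```shell
# Record the current state of a working copy (including its .svn metadata)
# as one commit in a local git repository stored inside it.
# $1 = path to the working copy, $2 = commit message.
# The function name and the inline identity are hypothetical.
snapshot_working_copy() {
    wc=$1
    msg=$2
    # Create the repository on first use only.
    [ -d "$wc/.git" ] || git -C "$wc" init -q || return 1
    # -A stages new, modified and deleted files, so the index mirrors the disk.
    git -C "$wc" add -A || return 1
    # Identity is passed inline so the sketch runs without global git config;
    # --allow-empty lets a snapshot succeed even when nothing changed.
    git -C "$wc" -c user.name="snapshot" -c user.email="snapshot@example.invalid" \
        commit -q --allow-empty -m "$msg"
}
```

Going back to a stored state is then a normal git checkout of the relevant commit or branch. One caveat: git does not record empty directories, so this may not reproduce every working copy exactly.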

I found a partial solution for the first case.
svn status -u | grep '^........\*........ ' | cut -c 22-
This code shows all files that exist in the HEAD revision but do not exist in the current one. It finds the files and directories from the first case. However, it generates false positives when a file is removed because its parent directory (which still exists) is updated to a lower revision.
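For use inside a larger script, the same pipeline can be wrapped in a function that reads the status output on stdin. This is only a sketch: the function name is made up, and the column positions are taken from the pipeline above as-is, not checked against any particular svn version.

```shell
# Read `svn status -u` output on stdin and print the paths selected by the
# pipeline above: a '*' in column 9, path text starting at column 22.
# The function name is hypothetical; the column layout is assumed.
list_flagged_paths() {
    grep '^........\*........ ' | cut -c 22-
}

# Typical use inside a working copy (contacts the server once):
#   svn status -u | list_flagged_paths > flagged.txt
```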

Spring and GitHub: hide sensitive data

I have a repository on GitHub that I would like to make public so recruiters can view it.
This repository though holds my SMTP and a MongoDB URI that shouldn't be shared with others. This information is in my application.properties file.
What's the simplest way to hide this sensitive data and also make sure no one can go look at old commits and see how it was before hiding it?
I have seen some ways on the web but they all look quite complicated...
Thank you for your experience and time
Use environment variables to hide your sensitive data, for example:
spring.data.mongodb.host=${MONGO_DB_HOST}
spring.mail.host=${MAIL_HOST}
Set the values in your dev environment.
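Setting the values could look something like this in the shell that starts the application. The host names are placeholders and the require_env helper is made up for this sketch; it just fails early if a variable was forgotten.

```shell
# Fail early if any of the named environment variables is unset or empty.
# (Hypothetical helper; pass only trusted variable names, since it uses eval.)
require_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            return 1
        fi
    done
}

# Placeholder values; the real ones stay out of the repository.
export MONGO_DB_HOST="mongo.example.internal"
export MAIL_HOST="smtp.example.internal"

require_env MONGO_DB_HOST MAIL_HOST && echo "environment ready"
```

The application is then started from the same shell, so the placeholders in application.properties get resolved from the environment.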
I don't have any idea about how to hide your old commits.
Create a .gitignore file at the root of your project and list in it whatever files you don't want git to track and push to GitHub, for example:
/public/packs
/node_modules/
.pnp.js
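A trailing / (forward slash) restricts a pattern to folders, and a leading / anchors it to the folder containing the .gitignore file.
The leading . (dot) in .pnp.js is simply part of the file name. Note that .gitignore only affects untracked files; it does not remove a file that has already been committed.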
Here follows a picture of the location of the .gitignore file.
If the goal is just for recruitment, would it be acceptable to have a second copy for recruitment, while leaving the original copy alone?
While there are certainly more idiomatic ways of achieving this through git, a simple solution requiring minimal git knowledge and no advanced techniques would be:
Create a new empty git project on GitHub
Clone the new project locally
Copy the (non-.git) files from the existing project into the new project (using either the console or your OS's windowed UI)
Delete or redact the offending entries from the new project
Commit the changes as a single commit
Push the new project back to GitHub
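If you prefer the console for the copy step, a minimal sketch could look like this. The function name is made up; it copies regular files only and skips the top-level .git folder of the old project.

```shell
# Copy every regular file from $1 to $2, skipping the top-level .git directory.
# (Hypothetical helper; file names containing newlines are not handled.)
copy_without_git() {
    src=$1
    dst=$2
    # List files relative to the source root, pruning .git entirely.
    (cd "$src" && find . -path ./.git -prune -o -type f -print) |
    while IFS= read -r path; do
        mkdir -p "$dst/$(dirname "$path")"
        cp "$src/$path" "$dst/$path"
    done
}

# Example (paths are placeholders):
#   copy_without_git ~/code/old-project ~/code/new-project
```

After that, redact the sensitive entries in the new folder before making the single commit.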
I have not used it myself, but the open source BFG Repo-Cleaner looks like it might satisfy your requirements of simplicity while retaining the activity chart for reviewers to view. This can be done on a publicly-facing copy of the repo if you wish to keep your private working copy, while still keeping the activity history viewable.
Following the tool's usage instructions, you should be able to do the following (assuming you want these changes in a fresh copy of the repo):
The first step is to duplicate the repository on GitHub, following the instructions in the GitHub docs.
To do this, first create a new repository.
Next, mirror the repository, following the GitHub instructions:
Open Terminal.
Create a bare clone of the repository.
$ git clone --bare https://github.com/exampleuser/old-repository.git
Mirror-push to the new repository.
$ cd old-repository.git
$ git push --mirror https://github.com/exampleuser/new-repository.git
Remove the temporary local repository you created earlier.
$ cd ..
$ rm -rf old-repository.git
Now that you have the duplicate repository, you can run the BFG Repo-Cleaner to replace all instances of text you want hidden with ***REMOVED***.
$ java -jar bfg.jar --replace-text replacements.txt my-repo.git
The replacements.txt file would contain the SMTP, MongoDB URI, and any other text you want hidden.
mongodb://my-username:my-password@host1.example.com:27017,host2.example.com:27017/my-database
marco-f@example.com
Note that this does not update the latest commit on the master/HEAD branch, so that commit will need to be changed manually and then committed. This can be achieved either with a final commit using the --amend option, or by making a new commit with the files manually changed before running the BFG Repo-Cleaner.
$ git commit --amend
Now that the changes have been made, they can be pushed to GitHub.
$ git push

git on windows says untracked working tree files would be overwritten by checkout but they don't exist

This is a weird one, and I have tried everything both in git and in windows but can't get the message to go away.
I have a folder in my repo as follows:-
/startup
    /client
        /index.js
    /server
        /index.js
but for some reason, somehow, at some point, git has appeared to have decided I have 2 other, very similar files, and won't let me checkout my main branch, it forever complains:
error: The following untracked working tree files would be overwritten by checkout:
imports/startup./client/index.js
imports/startup./server/index.js
Please move or remove them before you switch branches.
Aborting
I don't have any such named files (and don't believe I ever did), and the situation is totally compounded by the fact that a trailing "." on a folder name is not something windows plays ball with! If I try to rename my startup folder from "startup" to "startup." it just ignores the change.
I've git cleaned with every flag / flags known to man, git forced everything that can be forced, I've even deleted the entire repo and recloned.
I've checked my central repo too, in all branches (master, the current develop, and my own current branch that I'm stuck in), and of course the "startup." files aren't listed - just the "startup" folder as expected.
So I'm at a loss, and putting this out to the community in the hope that someone else can help.

git automatic add and remove?

So to add a file I need to run git add, and to remove a file, git rm.
But this seems very time-consuming if the project has a lot of files that change in my local copy before the remote repo needs to be updated.
Is there some automatic way to sync the local repo with the remote one, like in the GUI version of git? The GUI version automatically adds new files and removes deleted files.
You can use git add -A. It works on your entire working copy and stages (adds to the "Changes to be committed" section) all new (not ignored), modified and deleted files.
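As a rough illustration of what git add -A does, here is a small wrapper that stages everything and prints the short status. The function name is made up; it assumes a git that supports -C.

```shell
# Stage every change in the repository at $1 (new, modified and deleted files)
# and print the resulting short status. The function name is hypothetical.
stage_all_changes() {
    git -C "$1" add -A && git -C "$1" status --short
}

# Example (path is a placeholder):
#   stage_all_changes ~/code/my-project
```

In the short status, a staged new file is listed with A in the first column and a staged deletion with D, so no separate git rm is needed for files deleted on disk.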
There are GUIs available that may help you add files in bulk. I have used Atlassian SourceTree with some success. However there are ways to add multiple files easily from the command line.
You can use wildcards to add multiple files (e.g. git add CurrentDir/*.c to add all the .c files in that directory). There are more examples of using wildcards in the git documentation.

Externals when migrating from SVN to GIT

Ok, I've read it all, and tried to find solutions to my problem to no avail, so was wondering if anyone would be able to give me the ultimate solution to the migration issue I'm having.
It's to do with using SVN externals in GIT, so hold on to your chairs.
I have looked at the following topics only to find that no one has the exact same setup as I have.
git submodule svn external
git: How do you add an external directory to the repository?
What happens when I clone a repository with symlinks on Windows?
Git Symlinks in Windows
Now my setup is really not that complicated, but I can't work out a way to get it working the way I need it. I have:
Project1
Core
Libs
I then have in my main project
MainProject
MainFolder
file.cs
file1.cs
file2.cs
Core (external of Project1)
Libs (external of Project1)
Obviously this is fine on SVN, as you can use externals, but with sub-modules, you can only create them pointing to the root of the repository, which in this case doesn't quite work for me, since I have both Core and Libs on the same repository. Moving them out isn't an option at this point, since we're still in the migration process, and I need to keep constantly syncing them.
I then thought I could just go and use symlinks (notice I'm on a windows environment), as this way I would be able to checkout my Project1 repository at the same level as my MainProject, and via symlinks make sure my project still thinks everything is where it should.
This magically worked, however, upon doing git status I now noticed Project1 was marked as Untracked files, and by committing this and pushing, and checking out again, my symlink was gone, and I now had a hard copy of my Project1 repository copied into MainProject.
This obviously turned out to be a bit of a nightmare now, so I was just wondering if anyone could help me with this, and maybe point me towards the right direction.
Thanks in advance,
Marcos
While I believe Michael Geddes is working on supporting symlinks in a future msysgit2, there is one way to get that support right now (that you have mentioned)
"Git Symlinks in Windows"
It allows symlinks to be restored on checkout in Windows.
If you add Project1 as a submodule of your main project:
it won't be displayed as untracked files in your git status.
you can add a symlink in MainFolder to (Project1/)Core in order to get the structure you want.

git - only fetch the files, not the history

when I am running git pull or git fetch, I obviously retrieve both history and files. For huge projects, that takes very much time. I wonder how this process could be sped up, as for some projects I am only interested in the source code and not in the history. Is there a way to tell git that I only want to fetch the current snapshot of the files and not the whole history as well?
You probably want to look at the --depth option of git clone, which creates what is called a "shallow clone". In particular, you probably want:
git clone --depth=1 <url>
If the project is on GitHub, you can always use the download links from there. Note, there are some catches to using a shallow clone:
Create a shallow clone with a history truncated to the specified number of revisions. A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it), but is adequate if you are only interested in the recent history of a large project with a long history, and would want to send in fixes as patches.
But that sounds like something you can live with.
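To see the effect of --depth=1 for yourself, something like the following sketch works. The function name is made up, and the URL and directory in the example are placeholders.

```shell
# Clone only the most recent commit of a repository.
# $1 = repository URL, $2 = destination directory.
# The function name is hypothetical.
shallow_snapshot() {
    git clone -q --depth=1 "$1" "$2"
}

# Example (URL and directory are placeholders):
#   shallow_snapshot https://example.com/big-project.git big-project
#   git -C big-project rev-list --count HEAD    # counts the commits you got
```

When trying this against a repository on your own disk, use a file:// URL, because git ignores --depth for plain local paths.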
Also, as positron pointed out, you can do this with git archive as well.
You can use a shallow clone:
git clone --depth=1 git://url/of/repo
However, older Git versions (before 1.9) won't let you push changes made in a shallow clone; committing locally still works.
If there is a web view like gitweb or cgit, you can simply download a snapshot from it. But I don't think fetching the code alone is possible, because fetch works on git objects, not on the code.
git archive --format=tar --remote=gitolite@server:repo.git HEAD | bzip2 > repo-snapshot.tar.bz2
