How to find all empty folders and untracked files when using git?


Based on this post and this post,
git ls-files --others --exclude-standard can list all untracked files.
But when I tested it, it did not list empty folders (neither tracked nor untracked).
For example, it does not list the empty folder "archiver folder" shown below:
.
├── admin.php
├── api
│   ├── index.htm
│   └── remote
│       └── mod
│           ├── index.htm
│           ├── mod_cron.php
│           └── mod_index.php
└── archiver folder
Then my question is: how to list all untracked files and empty folders?

TL;DR: just look for empty directories. You can safely remove them. Well, "safe" depends on your own software, but as far as Git is concerned, it's safe. (Watch out for missing files, as defined below: a directory that is empty only because its tracked files are missing is one that Git might want later. Removing it is still sort of OK, because Git will just create it again.)
On a Unix / Linux system (edited to correct lost word in transcription):
find . -name .git -prune -o -type d -empty -print
(at the top level of the work-tree) will find the empty directories.
Long(ish)
Git is not interested in folders / directories. There's no such thing as an untracked folder in the same way that there's no such thing as a tracked folder: Git only cares about files. Specifically, a file is either in the index, or not in the index, and if it's not in the index, it's untracked.
When you use the various options to list untracked files (which tend to skip over ones that are untracked-and-ignored, since you normally want that), Git will, sometimes, aggregate together all the files that are in some folder, notice that there are no tracked files in that folder, and report them using the aggregated notation. You can stop this with, e.g., git status --untracked-files=all; then you'll get the individual file names.
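A minimal sketch of that aggregation, assuming a throwaway repository created with mktemp and the default status settings; the directory and file names are made up for illustration. It compares plain git status --porcelain with the --untracked-files=all option:

```shell
#!/bin/sh
# Throwaway repository; newdir, a.txt and b.txt are hypothetical names.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p newdir
touch newdir/a.txt newdir/b.txt

# Default mode: the untracked files are collapsed into one "newdir/" entry.
aggregated=$(git status --porcelain)

# With --untracked-files=all, every untracked file is listed individually.
expanded=$(git status --porcelain --untracked-files=all)

printf 'aggregated:\n%s\n' "$aggregated"
printf 'expanded:\n%s\n' "$expanded"
```

Neither form lists a directory that contains no files at all, which is why a separate find is needed for empty directories.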
Note that it's possible to have some file that is tracked, yet missing. For instance, suppose sub/README.txt is a tracked file, and actually exists. Then we run rm sub/README.txt. The file sub/README.txt remains in Git's index, and will be in the next commit, but it's missing. If that was the only file in sub in your work-tree, sub is now empty, and you can remove it with rmdir sub. Even though sub/README.txt remains missing (and sub is missing too!), that does not affect the next commit: it will still contain sub/README.txt, because that file is in the index. (Using git rm --cached sub/README.txt, you can remove it from the index too, if that's what you wanted.)
If and when Git goes to copy sub/README.txt back out of the index into the work-tree, Git will, at this point, discover that there is no sub. Git will merely shrug its metaphorical shoulders and create the directory sub, and then put sub/README.txt into it. So this is why Git is not interested in folders / directories: they're just boring and dull, required only when needed to hold files, created on demand.
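The sub/README.txt scenario can be replayed in a scratch repository. This is only a sketch: the temporary directory, the demo identity passed to git commit, and the file content are assumptions for illustration.

```shell
#!/bin/sh
# Scratch repository; nothing here touches an existing project.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir sub
echo hello > sub/README.txt
git add sub/README.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m 'add sub/README.txt'

# Remove the file and the now-empty directory from the work-tree only.
rm sub/README.txt
rmdir sub

# The file is still in the index, so Git reports it as deleted ("missing").
missing=$(git ls-files --deleted)
echo "missing: $missing"

# Copying the file back out of the index recreates sub/ on demand.
git checkout -- sub/README.txt
restored=$(cat sub/README.txt)
echo "restored: $restored"
```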
If you want Git to create a directory, you need to store a file in it. Since programs managed by Git need to be able to ignore the file named .gitignore, this is a very good file name to stick into such a directory. You can write * into that file, and add it to your commits, so that Git will create the directory and write a .gitignore file there containing *, and will thus ignore all additional untracked files within that directory automatically.
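A sketch of that placeholder trick, assuming a scratch repository and a hypothetical directory named cache. Because the pattern * also matches the .gitignore file itself, the sketch uses git add -f to add it; once the file is tracked, the ignore rule no longer affects it.

```shell
#!/bin/sh
# Scratch repository; "cache" and "scratch.tmp" are hypothetical names.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir cache
echo '*' > cache/.gitignore

# "*" matches .gitignore too, so force the add.
git add -f cache/.gitignore
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m 'keep cache/ directory'

# Anything else dropped into cache/ is ignored automatically.
touch cache/scratch.tmp
status=$(git status --porcelain)
tracked=$(git ls-files)
echo "tracked: $tracked"
echo "status: ${status:-clean}"
```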
Side note: In general, when Git pulls the last file out of some directory, it will remove the directory too, but occasionally I've seen it leave some behind. (Of course, it has to leave the directory behind if it still contains some untracked files. Note that git clean -fd will remove the empty directories, though it also removes the untracked files.)

git ls-files --others --exclude-standard > not_tracked
find . -depth -empty -type d \( ! -regex '.*/\..*' \) >> not_tracked
Please check my answer; I spent 2 days on it.

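The command git clean does nearly what you want: run as git clean -n -d, it lists the untracked files and directories (empty ones included) that it would remove, without deleting anything.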
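As a hedged illustration, git clean can be run in dry-run mode so that it only reports what it would delete. The sketch below assumes a scratch repository with one hypothetical empty directory and one hypothetical untracked file; -n requests the dry run and -d includes untracked directories.

```shell
#!/bin/sh
# Scratch repository; emptydir and stray.txt are hypothetical names.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir emptydir
touch stray.txt

# -n: dry run (nothing is deleted); -d: also consider untracked directories.
preview=$(git clean -n -d)
printf '%s\n' "$preview"
```

Dropping -n and adding -f would actually delete those paths, so the dry run is the safer way to get a listing.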

Related

pandoc to make each directory a chapter

I have a lot of markdown files in various directories, each with the same format (# title, then ## sub-title).
Can I make --toc respect the folder layout, so that each folder name becomes a chapter title and each markdown file is the content of that chapter?
So far pandoc totally ignores my folder names; it works the same as putting all the markdown files in one folder.

My approach to this is to create an index file in each folder with a first-level heading, and to downgrade the headings in all other files by one level.
I use Git, and by default I keep the normal structure, with first-level headings in each file. When I want to generate an ebook with pandoc, I modify the files with an automated Linux shell script. After that, I revert the changed files via Git.
Here's the script:
find ./docs/*/ -name "*.md" ! -name "*index.md" -exec perl -pi -e "s/^(#)+\s/#$&/g" {} \;
./docs/*/ means I'm looking only for files inside subfolders of docs directory like docs/foo/file1.md, docs/bar/file2.md.
I'm also interested only in *.md files, excluding *index.md files.
In index.md files (that I name usually 00-index.md to make them appear as first), I put a first level heading # and because those files are excluded from find portion of the script, their headings aren't downgraded.
Next, there's a perl search-and-replace command with the regular expression s/^(#)+\s/#$&/g, which looks for all lines starting with one or more # followed by whitespace and adds another # to them.
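To see what that substitution does, here is a small sketch on made-up sample text. It swaps in an equivalent sed expression instead of the perl one-liner (my substitution, not the author's command), and it works on a string rather than modifying files in place.

```shell
#!/bin/sh
# Hypothetical sample input; real files would live under ./docs/*/.
sample=$(printf '# Title\n## Sub-title\nBody text with a # inside\n')

# Prepend one "#" to every run of "#" at line start that is followed by
# whitespace; "&" stands for the whole matched text.
downgraded=$(printf '%s\n' "$sample" | sed -E 's/^#+[[:space:]]/#&/')

printf '%s\n' "$downgraded"
```

Only heading lines change: "# Title" becomes "## Title", "## Sub-title" becomes "### Sub-title", and the body line is left alone because its # is not at the start of the line.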
In the end, I run pandoc with --toc-depth=2 so the table of contents contains only first and second level headings.
pandoc ./docs/**/*.md --verbose --fail-if-warnings --toc-depth=2 --table-of-contents -o ./ebook.epub
To revert all changes made to files, I restore changes in the Git repo.
git restore .

linux - batch move files into a directory and rename those files according to sequential syntax in that directory

I have two directories - A and B - that contain a bunch of photo files. Directory A is where I keep photos long-term, and the files inside are sequentially named "Photo-1.jpg, Photo-2.jpg, etc.".
Directory B is where I upload new photos to from my camera, and the naming convention is whatever the camera names the file. I figured out how to run some operations on Directory B to ensure everything is in .jpg format as needed (ImageMagick convert), remove duplicate files (fdupes), etc.
My goal now is to move the files from B to A, and end up with the newly-added files in A sequentially named according to A's naming convention described above.
I know how to move the files into A, and then to batch rename everything in A after the new files have been added (which would theoretically occur every night), but I'm guessing there must be a more efficient way of moving the files from B to A without re-naming all 20,000+ photos every night, just because a few new files were added.
I guess my question has two parts: 1) I found a solution that works (use mv to rename all photos every night); is there any downside to this? and 2) If there is a downside and a more elegant method exists, can anyone help with a script that would look at the highest number that exists in A, then rename the files in B, counting up from that number, as they are moved over to A?
Thank you!

This bash script moves and renames only the new files from DirectoryB into your DirectoryA path. It also handles file names in DirectoryB that contain spaces or other odd characters.
#!/bin/bash
aPath="./photos-A"
bPath="./photos-B"
aPattern="Photo-"
# Highest number currently used in A (empty if A has no photos yet).
lNum=$(find "$aPath" -type f -name "*.jpg" -printf "%f\n" | \
    awk -F'[-.]' '{if($2>m)m=$2}END{print m}')
# Move each new photo from B, numbering it after the current highest.
while IFS= read -r -d $'\0' photo; do
    mv "$photo" "$aPath/$aPattern$((++lNum)).jpg"
done < <(find "$bPath" -type f -name "*.jpg" -print0)
Note
The command to find the last numbered photo, aka $lNum will run over all 20K+ files, but it should be fairly quick. If it's not, you can always run this once and store the latest number into a file and read from that file.
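The "store the latest number in a file" idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the counter file name .last-number, the demo directories under a temporary path, and the use of a plain glob over a flat directory B instead of find.

```shell
#!/bin/sh
# Demo layout under a temporary directory; all names are hypothetical.
work=$(mktemp -d)
aPath="$work/photos-A"
bPath="$work/photos-B"
counter="$aPath/.last-number"
mkdir -p "$aPath" "$bPath"
touch "$aPath/Photo-1.jpg" "$aPath/Photo-2.jpg"
touch "$bPath/foo.jpg" "$bPath/baz with spaces.jpg"

# Use the cached number if it exists; otherwise scan A once.
if [ -f "$counter" ]; then
    lNum=$(cat "$counter")
else
    lNum=$(find "$aPath" -type f -name 'Photo-*.jpg' -exec basename {} \; |
        awk -F'[-.]' '$2+0 > m { m = $2+0 } END { print m+0 }')
fi

# Move each new photo, numbering it after the current highest.
for photo in "$bPath"/*.jpg; do
    [ -e "$photo" ] || continue   # glob matched nothing
    lNum=$((lNum + 1))
    mv -- "$photo" "$aPath/Photo-$lNum.jpg"
done

# Remember the highest number for the next run.
echo "$lNum" > "$counter"
```

The cached value is only trustworthy if nothing else adds or renames files in A; deleting the counter file forces a fresh scan on the next run.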
Proof of Concept
$ tree photos-A/
photos-A/
├── Photo-1.jpg
├── Photo-2.jpg
├── Photo-3.jpg
├── Photo-5.jpg
├── Photo-6.jpg
├── Photo-7.jpg
└── Photo-8.jpg
0 directories, 7 files
$ tree photos-B/
photos-B/
├── bar.jpg
├── baz\ with\ spaces.jpg
└── foo.jpg
0 directories, 3 files
$ ./mvphoto.sh
$ tree photos-A/
photos-A/
├── Photo-10.jpg
├── Photo-11.jpg
├── Photo-1.jpg
├── Photo-2.jpg
├── Photo-3.jpg
├── Photo-5.jpg
├── Photo-6.jpg
├── Photo-7.jpg
├── Photo-8.jpg
└── Photo-9.jpg
0 directories, 10 files

rsync subset of directories

I am trying to use include and exclude options in rsync to copy a directory structure, excluding most but not all of the subdirectories, based on a pattern in the directory names. But, it isn't working. It is trying to copy everything over instead of just the subfolders I want. Is my syntax wrong?
I have tried:
rsync -am --include='*/*/*MPRAGE*/' --exclude='*' /parent_directory/ /destination
Also:
rsync -am --include='*/' --include='*/*/*MPRAGE*/' --exclude='*' /parent/ /dest
MPRAGE is the pattern that is in the name of each folder I want copied. But these folders are three levels deep in the structure, and I want to keep the well-organized directory structure intact for these folders I want copied.
Thanks in advance for any tips.

Git is tracking directories and will not ignore them

I must have read at least 50 StackOverflow questions and answers that say that Git cannot track directories. And yet, that is exactly what seems to be happening.
I created a project (.NET, on Windows), and added and committed all the files prior to adding a .gitignore. Realizing my mistake later on, I ran git rm -r --cached :/ on everything, added this .gitignore, and then re-added and committed my files. The thing is, git still tracks my obj and bin folders even though they seem to be ignored in the .gitignore.
Here are the relevant lines from the .gitignore:
[Bb]in/
[Oo]bj/
bin/**
obj/**
One or two of those might not make sense, I'm not totally familiar with .gitignore rules and was just trying to see what would stick.
Here's what I get for git status:
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/main/dotnet/ETB/ETB.Droid/bin/
src/main/dotnet/ETB/ETB.Droid/obj/
src/main/dotnet/ETB/ETB.iOS/bin/
src/main/dotnet/ETB/ETB.iOS/obj/
src/main/dotnet/ETB/ETB/bin/
src/main/dotnet/ETB/ETB/obj/
src/main/dotnet/packages/
This is even after I do something like git rm -r --cached .\src\main\dotnet\ETB\ETB.Droid\bin from the root level. There are also ZERO tracked files from within these directories that appear in the "Changes not staged for commit" section when I do a git status.
I'm really, really stumped. Can anyone help me figure out why I can't ignore these directories completely?
Update
I made the changes that the commenters suggested, and it seemed to solve some, but not all, of my problems (sorry I had it marked answered for a bit there). Relevant lines in my .gitignore at the root level are:
**/[Dd]ebug/**
**/bin/**
**/obj/**
That first line is probably not necessary, but I figured it couldn't hurt. There is definitely no extra whitespace on any of these lines.
For some reason, only one of the obj directories is still showing up in Git. I even deleted and re-added everything just to try it out.
The offending directory is the ETB.Data directory:
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/main/dotnet/ETB.Data/
So I ran this command:
git rm -r --cached .\src\main\dotnet\
I then committed those deletes. Then I tried to re-add the directory
git add .\src\main\dotnet
When I look at my status, here is what I'm seeing:
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: src/main/dotnet/ETB.Data/obj/Debug/TemporaryGeneratedFile_036C0B5B-1481-4323-8D20-8F5ADCB23D92.cs
new file: src/main/dotnet/ETB.Data/obj/Debug/TemporaryGeneratedFile_5937a670-0e60-4077-877b-f7221da3dda1.cs
new file: src/main/dotnet/ETB.Data/obj/Debug/TemporaryGeneratedFile_E7A71F73-0F8D-4B9B-B56E-8E70B10BC5D3.cs
new file: src/main/dotnet/ETB.sln
...
...
Why do these files keep showing up?! The obj and bin directories in other project directories are being ignored. Does anyone know why this one isn't being ignored?

You need to tell git to ignore all the bin/obj files/folders, not just the ones at its root:
**/bin/**
**/obj/**
From man gitignore:
A leading "**" followed by a slash means match in all directories. For example, "**/foo" matches file or directory "foo"
anywhere, the same as pattern "foo". "**/foo/bar" matches file or directory "bar" anywhere that is directly under
directory "foo".
A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to
the location of the .gitignore file, with infinite depth.
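One way to check such patterns without committing anything is git check-ignore, which prints the paths that match an ignore rule. The sketch below assumes a scratch repository and a hypothetical src/app layout.

```shell
#!/bin/sh
# Scratch repository; the src/app layout and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
printf '%s\n' '**/bin/**' '**/obj/**' > .gitignore
mkdir -p src/app/bin src/app/obj/Debug src/app/code
touch src/app/bin/app.dll src/app/obj/Debug/tmp.cs src/app/code/main.cs

# check-ignore echoes only the paths that are ignored.
ignored=$(git check-ignore \
    src/app/bin/app.dll src/app/obj/Debug/tmp.cs src/app/code/main.cs)
printf '%s\n' "$ignored"
```

Adding -v to git check-ignore also shows which .gitignore line matched each path, which helps when one directory unexpectedly escapes the rules.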

That's very simple: the lines in your .gitignore file are not correct. I can't test it now, but try something like this, for example:
**/bin/**
**/obj/**
When you don't write the ** at the beginning, a line such as bin/** is interpreted as anchored at the start, that is, relative to the directory containing the .gitignore file.
There is a good explanation in the man page:
A leading "**" followed by a slash means match in all directories.
For example, "**/foo" matches file or directory "foo" anywhere, the
same as pattern "foo". "**/foo/bar" matches file or directory "bar"
anywhere that is directly under directory "foo".
A trailing "/**" matches everything inside. For example, "abc/**"
matches all files inside directory "abc", relative to the location of
the .gitignore file, with infinite depth.

Git: How to ignore files on Windows?

I created a .gitignore in the folder with my repository, next to .git:
project
--.git
--.gitignore
--Source
----tmp
----scr
But git doesn't see it and won't ignore the files listed in .gitignore.
My .gitignore file:
*.tmp
*~
*.pdb
*.user
*.suo
Sources/tmp
What's wrong?
Update:
I created a new repository and added .gitignore before the initial commit, and it works!
But if I add it to the old repository, it doesn't...

The problem is that you're specifying glob syntax when the default syntax for git is regex.
Try this instead:
.*\.tmp
.*~
.*\.pdb
.*\.user
.*\.suo
Sources\/tmp

What you have should work, though your directory listing has Source/ while your .gitignore has Sources/.
The one thing that springs to mind is that the line endings might not be what git is expecting.
Also, as tmp is a directory, usually a trailing '/' is used:
Source/tmp/
Finally, you can also create a .gitignore in Source/ with the line:
tmp/
instead of having it in the top directory.
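One further possibility, offered only as an assumption about the asker's setup: the update in the question (it works in a new repository but not in the old one) is consistent with files that were already tracked before .gitignore was added. Ignore rules apply only to untracked files, so a tracked file keeps showing up until it is removed from the index. A sketch in a scratch repository, with a hypothetical file build.tmp:

```shell
#!/bin/sh
# Scratch repository; build.tmp is a hypothetical, already-committed file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo data > build.tmp
git add build.tmp
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m 'commit before .gitignore existed'

# Adding the ignore rule afterwards does not hide the tracked file.
echo '*.tmp' > .gitignore
echo more >> build.tmp
before=$(git status --porcelain -- build.tmp)
echo "before: $before"

# Remove it from the index only; the work-tree copy stays on disk.
git rm -q --cached build.tmp
still_tracked=$(git ls-files -- build.tmp)
echo "still tracked: ${still_tracked:-no}"
```

After the git rm --cached step is committed, later changes to the file are ignored.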
