How can I debug .gitignore file handling? - windows

I’m having lots of trouble convincing git to ignore files in my project.
Basically, sometimes it works, sometimes it just seems to ignore the .gitignore file for no obvious reason. (By “seems” I mean that there are patterns in it that look as if they should exclude something, but that something is not excluded.)
There’s a 'git check-ignore' command, but it only says which pattern matched a file. But I can’t find any option to make it say which patterns it’s found and where, nor why those patterns do not match a file.
Is there a way to do this kind of debugging?
P.S. There is a single issue which I did find, and I’m mentioning it here in case it helps others:
I was adding patterns using “echo pattern >> .gitignore”, which at least on my system results in spaces at the end of the line (i.e., everything between “echo” and “>>” is echoed in the file, except for the first space character after “echo”).
Git does not trim those spaces when matching patterns, so for the command above it wouldn’t match a file named “pattern” but it would match “pattern{space}”.
I think most of my issues stem from this. Those extra spaces are hard to notice, so I’d still like a debug command that makes sure I notice them, if there is one.
Edit #1:
Yes, I did try -v. For example:
> mkdir test
> touch test/file.txt
> echo test >> .gitignore
> git check-ignore -v test/file.txt
(nothing is printed)
> echo test>> .gitignore
> git check-ignore -v test/cuc.txt
.gitignore:8:test test/cuc.txt
Note the extra space in the first echo line, which makes it enter “test[space]” as a pattern. As I mentioned, “check-ignore” tells you what matched, but it doesn’t tell you what didn’t nor why.
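For the specific trailing-space problem described above, a small check can make the invisible characters visible. This is only a sketch, not a built-in git feature: show_trailing_ws is a hypothetical helper name, and it assumes a POSIX-like shell (for example Git Bash on Windows). It prints every line that ends in whitespace (including a stray carriage return) with its line number and a $ marking the end of the line.

```shell
# Hypothetical helper: list the lines of an ignore file that end in whitespace.
# Each hit is printed as LINE_NUMBER:CONTENT followed by a visible "$" end marker.
show_trailing_ws() {
    grep -n '[[:space:]]$' "$1" | sed 's/$/$/'
}
```

Calling show_trailing_ws .gitignore on the file from the example should flag the "test " entry while leaving the clean "test" entry unreported.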


How to refer to the current path in this recursive find and replace?

Disclaimer: (off-topic warning) This is not about outputting the list of ignored files actually detected in the repo. This is about ignored paths, even when no file is in fact matching one of these paths.
Context: I'm attempting to write a git alias to "flatten" all .gitignore patterns recursively and output a list of paths as they're seen from the top level.
What I mean with an example:
├─ .git
├─ .gitignore
└─ dir1
├─ .gitignore
├─ file1.txt
└─ file2.txt
With these contents in .gitignore files:
# (currently pointing at top-level directory)
$ cat .gitignore
some_path
$ cat dir1/.gitignore
yet_another_path
*.txt
I try to have an alias to output something along the lines of
$ git flattened-ignore-list
some_path
dir1/yet_another_path
dir1/*.txt
What do I have so far?
I know I can search for all .gitignore files in the repo with
find . -name ".gitignore"
which in this case would output
.gitignore
dir1/.gitignore
So I've tried to combine this with cat to get their contents (either of these work)
find . -name ".gitignore" | xargs cat
# or
cat $(find . -name ".gitignore")
with this result:
some_path
yet_another_path
*.txt
which is technically expected but unfortunately unhelpful for what I am trying to achieve. So to (at last!) arrive at my actual question:
How can I, for each result of find, refer to the current path? (in order to eventually prepend it to the line)
Note for people suspecting an XY problem : It might be the case, my approach might just be naive here, but maybe not, I'm unsure. For example I didn't consider complex cases where nested .gitignore files could refer to upper-levels, or special syntax with **. I've stuck to very simple structures for now, so in case you see a flaw and/or can suggest a totally different way to achieve the same goal, I'll of course be happy to hear about it also.
I try to have an alias to output something along the lines of
$ git flattened-ignore-list
some_path
dir1/yet_another_path
dir1/*.txt
Unfortunately, this approach is naive (and perhaps doomed, but maybe not) because entries in .gitignore files are a bit complicated.
The simple answer to the simple question you asked is to use something that prepends the directory name, relative to the top level. Since find never outputs unnecessarily-complicated names, you can do this with direct string processing:
.gitignore
dir1/.gitignore
tells you that when reading the first file, prepend nothing, and when reading the second, prepend dir1 to each entry. Doing this in shell is a little tricky, but bash has the tools needed: you just get the line minus the /.gitignore at the end, either using regexp replacement or just removing 11 characters (if I counted right) from anything that has a slash in it or isn't the literal 10-character string .gitignore. Grab the directory off the part before the /.gitignore name and use sed or awk to insert it, and a slash, in front of non-comment entries (and remember to handle ! entries a little differently).
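The string processing described above could look roughly like this. It is a sketch only: ignore_prefix is a hypothetical helper name, and it assumes find may print paths either with or without a leading ./ (so both forms are handled).

```shell
# Hypothetical helper: turn a path printed by find into the prefix to prepend.
# The top-level .gitignore yields an empty prefix; dir1/.gitignore yields "dir1/".
ignore_prefix() {
    case $1 in
        .gitignore|./.gitignore)
            printf ''
            ;;
        *)
            p=${1%/.gitignore}          # drop the trailing "/.gitignore"
            printf '%s/\n' "${p#./}"    # drop a leading "./" if present
            ;;
    esac
}
```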
You are probably better off handling the top-level .gitignore separately (you can just copy it straight through, adding a final newline if necessary) and then dealing with subdirectory .gitignore files in a different code path.
Note that a subdirectory .gitignore cannot refer to something above it: nothing in dir1/.gitignore can change whether ./foo or dir2/foo is ignored or not. So that part is not a problem.
The part that is a problem is that, in dir1, the entry:
*.txt
implies that the top level should not only ignore untracked dir1/*.txt files, but also ignore dir1/sub/*.txt files, dir1/sub/sub2/*.txt, and so on. However, a dir1 entry reading:
sub/*.txt
means that the top level should ignore only untracked dir1/sub/*.txt files, without ignoring any dir1/sub/sub2/*.txt files!
You may be able to salvage this with yet more code: while reading a subdirectory .gitignore, check to see if there are embedded slashes in any given line. An embedded slash is one that is not the final slash, because final slashes are removed for this particular differentiation.
If the entry contains an embedded slash, it applies only to the full-path-relative-to-the-subdirectory. You can therefore add dir1/ in front and be done, e.g.:
dir1/foo/*.txt
If the entry does not contain an embedded slash, it applies to the subdirectory and all of its nested sub-subdirectories. You will need to allow for any arbitrary number of subdirectories. This might be correct, but it's quite untested:
dir1/*.txt
dir1/**/*.txt
(In theory **/ should also match the empty list of subdirectories, so only the second line should be needed, but in practice I have seen this not happen for some cases. I do not recall whether this was in other pathspecs, .gitignore files, or both.)
In general, most .gitignore entries seem not to contain embedded slashes, so any successful script you write will probably produce a nearly double-length "flattened" ignore file, compared to its input length.
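The embedded-slash rule described above can be sketched as a small shell function. Treat it as equally untested in real repositories: flatten_pattern is a hypothetical helper name, it handles one pattern at a time, it assumes comments and blank lines were filtered out beforehand, and it does not deal with escaped characters such as \! or \#.

```shell
# Hypothetical helper: rewrite one pattern from DIR/.gitignore as seen from the top level.
# Usage: flatten_pattern DIR PATTERN
flatten_pattern() {
    dir=$1
    pattern=$2
    neg=''
    # Keep a leading "!" in front of the rewritten pattern.
    case $pattern in
        '!'*) neg='!'; pattern=${pattern#\!} ;;
    esac
    # A trailing slash does not count as an embedded slash.
    case ${pattern%/} in
        */*)
            # Embedded slash: the pattern is anchored to DIR.
            printf '%s%s/%s\n' "$neg" "$dir" "${pattern#/}"
            ;;
        *)
            # No embedded slash: the pattern applies at any depth below DIR.
            printf '%s%s/%s\n' "$neg" "$dir" "$pattern"
            printf '%s%s/**/%s\n' "$neg" "$dir" "$pattern"
            ;;
    esac
}
```

With the example files, flatten_pattern dir1 '*.txt' prints the two lines dir1/*.txt and dir1/**/*.txt, while flatten_pattern dir1 'sub/*.txt' prints only dir1/sub/*.txt.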
You can produce a complete list of ignore patterns, with directory prefix like this:
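#!/usr/bin/env sh
# List every pattern from every .gitignore, prefixed with its directory.
# Requires GNU find for -printf.
find . -type f -name '.gitignore' -printf '%h\n' |
while IFS= read -r dir_name; do
    # Keep only lines that do not start with '#' or whitespace.
    # Reading line by line stops the shell from expanding patterns such as *.txt.
    sed -n '/^[^#[:space:]]/p' "$dir_name/.gitignore" |
    while IFS= read -r pattern; do
        printf '%s/%s\n' "$dir_name" "$pattern"
    done
done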
The above code will just list all patterns found in .gitignore files across directories, and add the directory as prefix of each pattern.
It does not reflect gitignore syntax and behavior that is described here in git documentation: https://git-scm.com/docs/gitignore

Replace/sync only certain lines using Bash, SSH and rsync

I am looking for a quick and dirty one-liner to sync only certain settings in remote config files. Need to preserve what's unique and sync generic settings. Example:
Config1.conf:
HOSTNAME=COMP1
IP=10.10.13.10
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
Remote-Config2.txt:
HOSTNAME=COMP2
IP=10.10.13.11
LOCATION=FOO
BUILDING=BAR
ROOM=BAZ
I need to sync or copy replace only the bottom 3 lines over ssh. The line numbers are predictable, by the way. Always lines 4,5 and 6 in this case.
Here's a working idea that is missing one piece (a standard replacement for the non-standard utility I used to replace the vars in the local conf):
for var in $(ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf');do <missing piece> ${var/=*}=${var/*=} local-conf.conf; done
So this uses variable expansion and a non-standard utility but needs like a sed or Perl routine to replace the info in the local conf.
Update
The last line of code actually works. Tested and works! However -- the missing piece is a custom non-standard utility. I'm asking if someone can think of something, using standard Linux tools, to replace that.
One solution would be to take the left side and match, then replace the right side. This is basically what that utility does. Looks for the variable in the conf then sets it. Using variable expansion is one way (shown).
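The "match the left side, replace the right side" idea can be sketched with standard tools only. This is an illustration under assumptions, not the custom utility itself: set_conf_var is a hypothetical name, the config is assumed to contain plain KEY=VALUE lines, and values containing backslashes would need extra care because awk -v interprets escape sequences.

```shell
# Hypothetical helper: replace the line for KEY in FILE with KEY=VALUE.
# Usage: set_conf_var KEY VALUE FILE
set_conf_var() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    # Rewrite only the line whose left-hand side equals KEY; copy everything else.
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "=" }
        $1 == k { print k "=" v; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

In the loop from the question it would take the place of the missing piece, called as set_conf_var "${var/=*}" "${var/*=}" local-conf.conf.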
Here's an alternative solution that does not require the command to have special knowledge of the file contents:
Take a copy of the files you want to sync. Then, in the copy, deliberately vandalise (arbitrarily modify) the lines you do not want synced. It doesn't matter what they say as long as there are the same number of lines and they'll never match the actual file contents. Have some fun. This becomes your base version. Your example might look like this:
HOSTNAME=foo
IP=bar
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
rsync the remote files into a temporary location. This is the remote version.
For each file, take a three-way diff.
diff3 -3 <localfile> <basefile> <remotefile>
The output of diff3 is an "ed script" that describes what edits to make to the local file so that it would look like the remote file.
The -3 option tells it to only output the non-conflicting differences. This is why we vandalised the base files in the first place: so those lines would have conflicts.
Once you have the ed script for a file, you can visually check it, if you choose, and then apply the update using patch:
cat <ed-script> | patch --ed <localfile>
So, to do this recursively, you might have:
cd $localdir
for file in `find . -type f`; do
diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
done
You probably need to add some checks that the base and remote files actually exist.

Bash: find references to filenames in other files

Problem:
I have a list of filenames, filenames.txt:
Eg.
/usr/share/important-library.c
/usr/share/youneedthis-header.h
/lib/delete/this-at-your-peril.c
I need to rename or delete these files and I need to find references to these files in a project directory tree: /home/noob/my-project/ so I can remove or correct them.
My thought is to use bash to extract the filename: basename filename, then grep for it in the project directory using a for loop.
FILELISTING=listing.txt
PROJECTDIR=/home/noob/my-project/
for f in $(cat "$FILELISTING"); do
extension=$(basename ${f##*.})
filename=$(basename ${f%.*})
pattern="$filename"\\."$extension"
grep -r "$pattern" "$PROJECTDIR"
done
I could royally screw up this project -- does anyone see a flaw in my logic; better: do you see a more reliable scalable way to do this over a huge directory tree? Let's assume that revision control is off the table ( it is, in fact ).
A few comments:
Instead of
for f in $(cat "$FILELISTING") ; do
...
done
it's somewhat safer to write
while IFS= read -r f ; do
...
done < "$FILELISTING"
That way, your code will have no problem with spaces, tabs, asterisks, and so on in the filenames (though it still won't support newlines).
Your goal in separating f into extension and filename, and then reassembling them with \., seems to be that you want the filename to be treated as a literal string; right? Like, you're worried that grep will treat the . as meaning "any character" rather than as "one dot". A more general solution is to use grep's -F option, which tells it to treat the pattern as a fixed string rather than a regex:
grep -r -F "$f" "$PROJECTDIR"
Your introduction mentions using basename, but then you don't actually use it. Is that intentional?
If your non-use of basename is intentional, then filenames.txt really just contains a list of patterns to search for; you don't even need to write a loop, in this case, since grep's -f option tells it to take a newline-separated list of patterns from a file:
grep -r -F -f "$FILELISTING" "$PROJECTDIR"
You should back up your project, using something like tar -czf backup.tar.gz "$PROJECTDIR". "Revision control is off the table" doesn't mean you can't have a rollback strategy!
Edited to add:
To pass all your base-names to grep at once, in the hopes that it can do something smarter with them than just looping over them just as though the calls were separate, you can write something like:
grep -r -F "$(sed 's#.*/##g' "$FILELISTING")" "$PROJECTDIR"
(I used sed rather than while+basename for brevity's sake, but you can put an entire loop inside the "$(...)" if you prefer.)
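If you would rather see which listed file each hit belongs to, the comments above can be combined into one loop. This is a sketch only: find_references is a hypothetical name, it searches for base names as fixed strings, and it trades speed for a per-file report by running grep once per entry.

```shell
# Hypothetical helper: for each path in LISTING, report where its base name
# appears under PROJECT. Usage: find_references LISTING PROJECT
find_references() {
    listing=$1
    project=$2
    while IFS= read -r f; do
        [ -n "$f" ] || continue          # skip empty lines
        name=${f##*/}                    # strip the directory part
        printf '== %s\n' "$name"
        # -F: fixed string, so "." is a literal dot; -n: show line numbers.
        grep -r -n -F -- "$name" "$project"
    done < "$listing"
    return 0
}
```

A header line with no hits underneath it means no reference to that file was found.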
This is a job for an IDE.
You're right that this is a perilous task, and unless you know the build process and the search directories and the order of the directories, you really can't say what header is with which file.
Let's take something as simple as this:
# include "sql.h"
You have a file in the project headers/sql.h. Is that file needed? Maybe it is. Maybe not. There's also a /usr/include/sql.h. Maybe that's the one that's actually used. You can't tell without looking at the Makefile and seeing the order of the include directories which is which.
Then, there are the libraries that get included and may need their own header files in order to be able to compile. And, once you get to the C preprocessor, you really will have a hard time.
This is a task for an IDE (Integrated Development Environment). An IDE builds the project and tracks file and other resource dependencies. In the Java world, most people use Eclipse, and there is a C/C++ plugin for those developers. However, there are over 2 dozen listed in Wikipedia and almost all of them are open source. The best one will depend upon your environment.

why doesn't *.abc match a file named .abc?

I thought I understood wildcards, till this happened to me. Essentially, I'm looking for a wild card pattern that would return all files that are not named .gitignore. I came up with this, which seems to work for all cases I could conjure:
ls *[!{gitignore}]
To really validate if this works, I thought I'd negate the expression and see if it returns the file named .gitignore (actually any file that ended with gitignore; so 1.gitignore should also be returned). To that effect, I thought the negated expression would be:
ls *[{gitignore}]
However, this expression doesn't return a file named .gitignore (although it returns a file named 1.gitignore).
Essentially, my question, after simplification, boils down to:
Why doesn't *.abc match a file that is named .abc
I think I can take it from there.
PS:
I am working on Mac OSX Lion (10.7.4)
I wanted to add a clause to .gitignore such that I would ignore every file, except .gitignore in a given folder. So I ended up adding * in the .gitignore file. Result was, git ended up ignoring .gitignore :)
From the numerous searches I've made on google - Use the asterisk character (*) to represent zero or more characters.
I assume you're using Bash. From the Bash manual:
When a pattern is used for filename expansion, the character ‘.’ at the start of a filename or immediately following a slash must be matched explicitly, unless the shell option dotglob is set.
.gitignore patterns, however, are treated differently:
Otherwise, git treats the pattern as a shell glob suitable for consumption by fnmatch(3) with the FNM_PATHNAME flag: wildcards in the pattern will not match a / in the pathname.
According to the fnmatch(3) docs, a leading dot has to be explicitly matched only if the FNM_PERIOD flag is set, so *gitignore as a gitignore pattern would match .gitignore.
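The difference can be seen in a small experiment. This is a sketch that assumes a POSIX-like shell with default options (dotglob not set); glob_demo is a hypothetical name. It relies on the leading-dot rule applying to filename expansion but not to pattern matching in a case statement.

```shell
# Hypothetical demo: compare filename expansion with plain pattern matching.
glob_demo() {
    dir=$(mktemp -d) || return 1
    touch "$dir/.abc" "$dir/1.abc"
    # Filename expansion: a leading dot must be matched explicitly,
    # so *.abc finds 1.abc but skips .abc.
    ( cd "$dir" && for f in *.abc; do printf 'glob: %s\n' "$f"; done )
    # Pattern matching in case does not treat a leading dot specially.
    case .abc in
        *.abc) printf 'case: .abc matches\n' ;;
    esac
    rm -rf "$dir"
}
```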
There is an easier way to accomplish this, though. To have .gitignore ignore everything except .gitignore:
*
!.gitignore
If you want to ignore everything except the gitignore file, use this as the file:
*
!.gitignore
Lines starting with an exclamation point are interpreted as exceptions.

Makefile problem with files beginning with "#"

I have a directory "FS2" that contains the following files:
#ARGH#
this
that
I have a makefile with the following contents.
Template:sh= ls ./FS2/*
#all: $(Template)
echo "Template is: $(Template)"
touch all
When I run "clearmake -C sun" and the file "all" does not exist, I get the following output:
"Template is: ./FS2/#ARGH# ./FS2/that ./FS2/this"
Modifying either "this" or "that" does not cause "all" to be regenerated. When run with "-d" for debug, the "all" target is only dependent on the directory "./FS2", not the three files in the directory. I determined that when it expands "Template", the "#" gets treated as the beginning of a comment and the rest of the line is ignored!
The problem is caused by an editor that when killed leaves around files that begin with "#". If one of those files exists, then no modifications to files in the directory causes "all" to be regenerated.
Although, I do not want to make compilation dependent on whether a temporary file has been modified or not and will remove the file from the "Template" variable, I am still curious as to how to get this to work if I did want to treat the "#ARGH#" as a filename that the rule "all" is dependent on. Is this even possible?
I have a directory "FS2" that contains the following files: #ARGH# ...
Therein lies your problem. In my opinion, it is unwise to use "funny" characters in filenames. Now I know that those characters are allowed but that doesn't make them a good idea (ASCII control characters like backspace are also allowed with similar annoying results).
I don't even like spaces in filenames, preferring instead SomethingLikeThis to show independent words in a file name, but at least the techniques for handling spaces in many UNIX tools are reasonably well known.
My advice would be to rename the file if it was one of yours and save yourself some angst. But, since they're temporary files left around by an editor crash, delete them before your rules start running in the makefile. You probably shouldn't be rebuilding based on an editor temporary file anyway.
Or use a more targeted template like: Template:sh= ls ./FS2/[A-Za-z0-9]* to bypass those files altogether (that's an example only, you should ensure it doesn't falsely exclude files that should be included).
'#' is a valid Makefile comment char, so the second line is ignored by the make program.
Can you filter out (with grep) the files that start with # and process them separately?
I'm not familiar with clearmake, but try replacing your template definition with
Template:sh= ls ./FS2/* | grep -v '#'
so that filenames containing # are not included in $(Template).
If clearmake follows the same rules as GNU make, then you can also re-write your target using something like Template := $(wildcard *.c) which will be a little more intelligent about files with oddball names.
If I really want the file #ARGH# to contribute to whether the target all should be rebuilt as well as be included in the artifacts produced by the rule, the Makefile should be modified so that the line
Template:sh= ls ./FS2/*
is changed to
Template=./FS2/*
Template_files:sh= ls $(Template)
This works because $(Template) is replaced by the literal string ./FS2/* both after all: and in the expansion of $(Template_files).
Clearmake (and GNU make) then use ./FS2/* as a pathname containing a wildcard when evaluating the dependencies, which expands into the filenames ./FS2/#ARGH# ./FS2/that ./FS2/this, and $(Template_files) can be used in the rules where a list of filenames is needed.
