Suppose we have two directories, foo and bar, where foo contains the files example.S, run.c and exec.S, and bar contains example.S.
When I pass foo and bar as dependencies,
I want to compile only the example.S in directory bar, ignoring the example.S in directory foo.
DEP := foo bar
DIR := $(foreach dirs, $(DEP), $(shell find $(HOME) -type d -name $(dirs)))
would give me the absolute paths of the foo and bar directories.
FILES := $(foreach file, $(DIR), $(wildcard $(file)/*))
would give me the list of files in foo and bar as
foo/example.S foo/run.c foo/exec.S bar/example.S
I would like to check for multiple occurrences of files with the same name and compile only the one from the last directory listed, i.e., instead of foo/example.S I'd like to compile bar/example.S.
I have no idea how to do this, as I am very new to writing Makefiles.
This is actually more of a shell question than a Makefile one. What you want to do is create a shell command that generates a two-column list for each file: the first column being the full filename, and the second being just the basename. Then sort it, remove the ones with duplicate basenames, and output only the first column.
Basically you would use:
DIRS = ./A ./B
files=`find $(DIRS) -type f -exec sh -c 'echo {} $$(basename {})' \; | sort -u --stable -k2,2 | awk '{print $$1}'`
You would have to replace ./A ./B with the list of directories you want to search, ordered so that the directory whose files should win comes first. Notice the $$'s: Make resolves these to $ before running the shell command. $(DIRS) has only a single $, so it is expanded before the command is executed.
Explanation:
find ./A ./B -type f -exec sh -c 'echo {} $(basename {})' \; : This searches the directories A and then B (in that order).
-type f specifies files only; it won't return directories.
-exec sh -c 'echo {} $(basename {})' \; : for each file found, it will run the echo command where {} resolves to the file it found. It will therefore print the full file path, followed by just the filename.
sort -u --stable -k2,2: Sort the list of files based on column 2 (the basename). The -u means unique: only the first line of each group with an equal key is printed. The --stable makes "first" mean first in input order, so the earlier directory wins.
awk '{print $1}': print the first column of the output (the full path name).
You then have a list of unique filenames, including their directories, in $(files).
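Run end to end on a throwaway tree matching the question's layout (the temp-dir setup is just for the demo), the pipeline behaves like this:

```shell
#!/bin/sh
# Recreate the question's layout: example.S exists in both dirs.
tmp=$(mktemp -d)
mkdir -p "$tmp/bar" "$tmp/foo"
touch "$tmp/bar/example.S" "$tmp/foo/example.S" "$tmp/foo/run.c"

# bar is listed first so its example.S wins: sort -u keeps only the
# first line of each group sharing the basename in column 2.
find "$tmp/bar" "$tmp/foo" -type f \
    -exec sh -c 'echo "$1" "$(basename "$1")"' _ {} \; \
  | sort -u --stable -k2,2 \
  | awk '{print $1}'
```

This prints $tmp/bar/example.S and $tmp/foo/run.c: the duplicate foo/example.S has been dropped.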
There's a (relatively) simple way to get what you want, starting with reversing a list (with thanks to @simona). After you have calculated DIR your way:
reverse = $(if $(1),$(call reverse,$(wordlist 2,$(words $(1)),$(1)))) $(firstword $(1))
DIR := $(call reverse,$(DIR))
Then use VPATH:
VPATH = $(DIR)
Now you can tell Make to search for a file (such as example.S), and it will find the last instance (in this case .../bar/example.S):
example: example.S
	@echo building $@ from $^
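Put together, a minimal Makefile along these lines (directory names taken from the question; this is a sketch, not a tested build):

```make
DEP := foo bar

reverse = $(if $(1),$(call reverse,$(wordlist 2,$(words $(1)),$(1)))) $(firstword $(1))

# VPATH is searched left to right, so reversing the list makes bar win over foo
VPATH = $(call reverse,$(DEP))

example: example.S
	@echo building $@ from $^
```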
I have the following problem.
There are two nested folders A and B. They are mostly identical, but B has a few files that A does not. (These are two mounted rootfs images).
I want to create a shell script that does the following:
Find out which files are contained in B but not in A.
copy the files found in 1. from B and create a tar.gz that contains these files, keeping the folder structure.
The goal is to import the additional data from image B afterwards on an embedded system that contains the contents of image A.
For the first step I put together the following code snippet. (Note on the grep for "Nur": "Nur in" is German for "Only in".)
diff -rq <A> <B>/ 2>/dev/null | grep Nur | awk '{print substr($3, 1, length($3)-1) "/" substr($4, 1, length($4)-1)}'
The result is the output of the paths relative to folder B.
I have no idea how to implement the second step. Can someone give me some help?
Using diff for finding files which don't exist is severe overkill; you are doing a lot of calculations to compare the contents of the files, where clearly all you care about is whether a file name exists or not.
Maybe try this instead.
tar zcf newfiles.tar.gz $(comm -13 <(cd A && find . -type f | sort) <(cd B && find . -type f | sort) | sed 's/^\./B/')
The find commands produce a listing of the file name hierarchies; comm -13 extracts the elements which are unique to the second input file (which here isn't really a file at all; we are using the shell's process substitution facility to provide the input) and the sed command adds the path into B back to the beginning.
Passing a command substitution $(...) as the argument to tar is problematic; if there are a lot of file names, you will run into "command line too long", and if your file names contain whitespace or other irregularities in them, the shell will mess them up. The standard solution is to use xargs but using xargs tar cf will overwrite the output file if xargs ends up calling tar more than once; though perhaps your tar has an option to read the file names from standard input.
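A sketch of the streaming variant with GNU tar's -T - (reading the name list from standard input); the toy trees below are invented for the demo:

```shell
#!/usr/bin/env bash
# Toy trees: B has one file (c) that A lacks.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p A B
touch A/a B/a B/c

# Pipe the name list into tar instead of using $(...), so it never
# hits the command-line length limit.
comm -13 <(cd A && find . -type f | sort) \
         <(cd B && find . -type f | sort) \
  | sed 's|^\./|B/|' \
  | tar zcf newfiles.tar.gz -T -

tar ztf newfiles.tar.gz    # lists B/c
```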
With find:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print
./c
./d
The idea is to use the exec action with a shell script that tests the existence of the current file in the other directory. There are a few subtleties:
The first argument of sh -c is the script to execute, the second (here _ but could be anything else) corresponds to the $0 positional parameter of the script and the third ({}) is the current file name as set by find and passed to the script as positional parameter $1.
The -print action at the end is needed, even if it is normally the default with find, because the use of -exec cancels this default.
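The positional-parameter wiring can be checked in isolation:

```shell
# $0 receives the placeholder "_", $1 receives "hello"
sh -c 'echo "zeroth: $0, first: $1"' _ hello
# prints: zeroth: _, first: hello
```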
Example of use to generate your tarball with GNU tar:
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print > ../list.txt
$ tar -c -v -f ../diff.tar --files-from=../list.txt
./c
./d
Note: if you have unusual file names the --verbatim-files-from GNU tar option can help. Or a combination of the -print0 action of find and the --null option of GNU tar.
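A sketch of that NUL-separated variant, assuming GNU find and GNU tar (the sample files are the same toy set as above):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p A B
touch A/a A/b B/a B/b B/c B/d
cd B
# -print0 / --null keep names containing spaces or newlines intact
find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print0 \
  | tar --null -c -f ../diff.tar -T -
tar -tf ../diff.tar | sort
```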
Note: if the shell is POSIX (e.g., bash) you can also run find from the parent directory and get the path of the files relative from there, if you prefer:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ find B -type f -exec sh -c '[ ! -f A"${1#B}" ]' _ {} \; -print
B/c
B/d
I have a list of Python files scattered over a directory structure that looks like this:
top\py\
    html\
        docs.html
    foo\
        v1\
            a.py
            b.py
    bar\
        v1\
            c.py
            d.py
    baz\
        aws\
            sess.py
What I'm trying to do here is extract the root directory names of any directories within top/py that contain Python files (so foo, bar and baz), using a shell script. Note that there may be files other than Python code here and I do not want those included in the output. I will admit that I'm somewhat of a novice at shell scripting and this is what I've come up with so far:
find top/py/. -name *py -exec echo {} \; -exec sh -c "echo $# | grep -o '[^/]*' | sed -n 3p" {} \;
However, the output isn't quite right. This is what I'm getting:
top/py/./foo/v1/a.py
foo
top/py/./foo/v1/b.py
foo
top/py/./bar/v1/c.py
foo
top/py/./bar/v1/d.py
foo
top/py/./baz/aws/sess.py
foo
It appears that the inner variable to the grep is not being updated but I'm not sure what to do about it. Any help would be appreciated.
If your files don't contain newlines in their names:
find top/py -name '*.py' | awk -F / '1; { print $3 }'
Otherwise:
find top/py -name '*.py' -exec sh -c '
for py_path; do
pk_path=${py_path%/"${py_path#*/*/*/}"}
pk_name=${pk_path##*/}
printf '\''%s\n'\'' "$py_path" "$pk_name"
done' sh {} +
For a depth n other than 3, replace */*/*/ with n repetitions of */.
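Stepping through those expansions with one of the question's paths:

```shell
#!/bin/sh
py_path=top/py/foo/v1/a.py
# ${py_path#*/*/*/} strips the shortest leading match of three
# slash-terminated components, leaving "v1/a.py"
rest=${py_path#*/*/*/}
# removing "/$rest" from the end leaves the three-component prefix
pk_path=${py_path%/"$rest"}     # top/py/foo
pk_name=${pk_path##*/}          # foo
printf '%s\n%s\n' "$py_path" "$pk_name"
```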
I need to make a shell script that "lists all identical sub-directories (recursively) under the current working directory."
I'm new to shell scripts. How do I approach this?
To me, this means:
for each directory starting in some starting directory, compare it to every other directory it shares by name.
if the other directory has the same name, check size.
if same size also, recursively compare contents of each directory item by item, maybe by md5sum(?) and continuing to do so for each subdirectory within the directories (recursively?)
then, continue by recursively calling this on every subdirectory encountered
then, repeat for every other directory in the directory structure
It would have been the most complicated program I'd have ever written, so I assume I'm just not aware of some shell command to do most of it for me?
I.e., how should I have approached this? All the other parts were a matter of googling until I discovered the shell command that did 90% of the work for me.
(For a previous assignment that I wasn't able to finish, took a zero on this part, need to know how to approach it in the future.)
I'd be surprised to hear that there is a special Unix tool, or a special usage of a standard Unix tool, to do exactly what you describe. Maybe your understanding of the task is more complex than what the task giver intended. Maybe "identical" was meant in the sense of linking; but hardlinking directories is normally not allowed, so that probably isn't it either.
Anyway, I'd approach this task by creating checksums for all nodes in your tree, i. e. recursively:
For a directory take the names of all entries and their checksums (recursion) and compute a checksum of them,
for a plain file compute a checksum of its contents,
for symlinks and special files (devices, etc.) consider what you want (I'll leave this out).
After creating checksums for all elements, search for duplicates (by sorting a list of all and searching for consecutive lines).
A quick solution could be like this:
#!/bin/bash
dirchecksum() {
if [ -f "$1" ]
then
checksum=$(md5sum < "$1")
elif [ -d "$1" ]
then
checksum=$(
find "$1" -maxdepth 1 -printf "%P " \( ! -path "$1" \) \
-exec bash -c "dirchecksum {}" \; |
md5sum
)
fi
echo "$checksum"
echo "$checksum $1" 1>&3
}
export -f dirchecksum
list=$(dirchecksum "$1" 3>&1 1>/dev/null)
lastChecksum=''
while read checksum _ path
do
if [ "$checksum" = "$lastChecksum" ]
then
echo "duplicate found: $path = $lastPath"
fi
lastChecksum=$checksum
lastPath=$path
done < <(sort <<< "$list")
This script uses two tricks which might not be clear, so I mention them:
To pass a shell function to find -exec one can export -f it (done right after its definition) and then call bash -c ... to execute it.
The shell function has two output streams, one for returning the result checksum (this is via stdout, i. e. fd 1), and one for giving out each checksum found on the way to this (this is via fd 3).
The sorting at the end uses the list given out via fd 3 as input.
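The export -f trick in isolation (bash-specific; the function name here is made up for the demo):

```shell
#!/usr/bin/env bash
stamp() { echo "visited: $1"; }
export -f stamp    # make the function visible to child bash processes

# find substitutes the match for {}, which bash -c receives as $1
find /tmp -maxdepth 0 -type d -exec bash -c 'stamp "$1"' _ {} \;
# prints: visited: /tmp
```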
Maybe something like this:
$ find -type d -exec sh -c "echo -n {}\ ; sh -c \"ls -s {}; basename {}\"|md5sum " \; | awk '$2 in a {print "Match:"; print a[$2], $1; next} a[$2]=$1{next}'
Match:
./bar/foo ./foo
find all directories: find -type d, output:
.
./bar
./bar/foo
./foo
ls -s {}; basename {} will print the simplified directory listing and the basename of the directory listed, for example for directory foo: ls -s foo; basename foo
total 0
0 test
foo
Those cover the files in each dir, their sizes, and the dir name. That output is piped to md5sum and printed alongside the dir name:
. 674e2573b49826d4e32dfe81d9680369 -
./bar 4c2d588c5fa9781ad63ad8e86e575e01 -
./bar/foo ff8d1569685be86366f18ea89851db35 -
./foo ff8d1569685be86366f18ea89851db35 -
will be sent to awk:
$2 in a {            # this hash was seen before
  print "Match:"     # separate hits in the output
  print a[$2], $1    # print the stored dir and the current one
  next               # move on to the next record
}
a[$2]=$1 { next }    # store the first dir seen for each hash
Test dir structure:
$ mkdir -p test/foo; mkdir -p test/bar/foo; touch test/foo/test; touch test/bar/foo/test
$ find test/
test/
test/bar
test/bar/foo
test/bar/foo/test # touch test
test/foo
test/foo/test # touch test
I'm working on a C kernel and I want to make it easier to compile all of the sources by using a bash script file. I need to know how to do a foreach loop and only get the files with a .c extension, and then get the filename of each file I find so I can make gcc compile each one.
Use find to walk through your tree
and then read the list it generates using while read:
find . -name \*.c | while read file
do
echo process $file
done
If the action that you want to do with each file is not so complex
and can be expressed in one or two commands, you can avoid while
and do everything with find itself. For that you will use -exec:
find . -name \*.c -exec command {} \;
Here you write your command instead of command.
You can also use -execdir:
find . -name \*.c -execdir command {} \;
In this case command will be executed in the directory of found file (for each file that was found).
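For the original goal, the object-file name for gcc can be derived from each match with a parameter expansion; the layout below is invented, and gcc is echoed rather than run so the sketch stays self-contained:

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
touch "$tmp/src/kernel.c" "$tmp/src/io.c"

find "$tmp" -name '*.c' | sort | while read -r file; do
    obj=${file%.c}.o                  # strip .c, append .o
    echo gcc -c "$file" -o "$obj"     # drop "echo" to actually compile
done
```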
If you're using GNU make, you can do this using only make's built-in functions, which has the advantage of making it independent of the shell (but the disadvantage of being slower than find for large trees):
# Usage: $(call find-recursive,DIRECTORY,PATTERN)
find-recursive = \
$(foreach f,$(wildcard $(1)/*),\
$(if $(wildcard $(f)/.),\
$(call find-recursive,$(f),$(2)),\
$(filter $(2),$(f))))
all:
	@echo $(call find-recursive,.,%.c)
I am in the process of learning makefile creation.
Current target is to scan $(SOURCEDIR) for .c files and create (from that list of .c file paths in the format /path/file.c) a nice $(VPATH) so I do not need to use recursive makefiles (aka pain in the a..).
Atm I am stuck with this, where $(SOURCETREE) ends up empty when echoed:
SOURCES := $(shell find $(SOURCEDIR) -name '*.c')
SOURCETREE := $(dir $(SOURCES))
SOURCETREE := $(shell $(ECHO) $(SOURCETREE) | $(SED) -e "s/[[:space:]]/\n/g" | uniq | $(SED) -e "s/\n/[[:space:]]/g");
Maybe I just do not get the point (got late again :/ )
Thanks for any help.
Note: In a bash shell it works perfectly on my Linux box (I replace the variables accordingly).
Note: I am no sed pro, so please explain if you do any voodoo with sed, thank you.
Comments:
Remove the backticks. They are unnecessary; the $(shell) function already captures the command's output.
The echo/sed/uniq/sed chain can be replaced by ls -d/uniq. ls -d prints each directory name (not its contents) on its own line, in sorted order, so there's no need for the replace-and-undo song and dance.
It looks like you're losing the result from $(dir) because you use $(SOURCES) rather than $(SOURCETREE) in the third assignment.
Result:
SOURCES := $(shell find $(SOURCEDIR) -name '*.c')
SOURCETREE := $(dir $(SOURCES))
SOURCETREE := $(shell ls -d $(SOURCETREE) | uniq)
Actually, even this shortened shell invocation is unnecessary. The $(sort) function will sort a list of names and remove duplicates. From make's documentation: "Incidentally, since sort removes duplicate words, you can use it for this purpose even if you don't care about the sort order."
SOURCES := $(shell find $(SOURCEDIR) -name '*.c')
SOURCETREE := $(sort $(dir $(SOURCES)))