I am flattening a directory of nested folders/picture files down to a single folder. I want to move all of the nested files up to the root level.
There are 3,381 files (no directories included in the count). I calculate this number using these two commands and subtracting the directory count (the second command):
find ./ | wc -l
find ./ -type d | wc -l
To flatten, I use this command:
find ./ -mindepth 2 -exec mv -i -v '{}' . \;
Problem is that when I get a count after running the flatten command, my count is off by 46. After going through the list of files before and after (I have a backup), I found that the mv command is overwriting files sometimes even though I'm using -i.
Here's details from the log for one of these files being overwritten...
.//Vacation/CIMG1075.JPG -> ./CIMG1075.JPG
..more log
..more log
..more log
.//dog pics/CIMG1075.JPG -> ./CIMG1075.JPG
So I can see that it is overwriting. I thought -i was supposed to stop this. I also tried a -n and got the same number. Note, I do have about 150 duplicate filenames. Was going to manually rename after I flattened everything I could.
Is it a timing issue?
Is there a way to resolve?
NOTE: it is prompting me that some of the files are overwrites. On those prompts I just press Enter so as not to overwrite. In the case above, there is no prompt. It just overwrites.
Apparently the manual entry clearly states:
The -n and -v options are non-standard and their use in scripts is not recommended.
In other words, you should mimic the -n option yourself. To do that, just check if the file exists and act accordingly. In a shell script where the file is supplied as the first argument, this could be done as follows:
[ -f "${1##*/}" ]
The file, as first argument, contains directories which can be stripped using ##*/. Now simply execute the mv using ||, since we want to execute when the file doesn't exist.
[ -f "${1##*/}" ] || mv "$1" .
Using this, you can edit your find command as follows:
find ./ -mindepth 2 -exec bash -c '[ -f "${0##*/}" ] || mv "$0" .' '{}' \;
Note that we now use $0 because of the bash -c usage. It's first argument, $0, can't be the script name because we have no script. This means the argument order is shifted with respect to a usual shell script.
Why not check if file exists, prior move? Then you can leave the file where it is or you can rename it or do something else...
Test -f or, [] should do the trick?
I am on tablet and can not easyly include the source.
Related
I have a directory which has 70000 xml files in it. Each file has a tag which looks something like this, for the sake of simplicity:
<ns2:apple>, <ns2:orange>, <ns2:grapes>, <ns2:melon>. Each file has only one fruit tag, i.e. there cannot be both apple and orange in the same file.
I would like rename every file (add "1_" before the beginning of each filename) which has one of: <ns2:apple>, <ns2:orange>, <ns2:melon> inside of it.
I can find such files with egrep:
egrep -r '<ns2:apple>|<ns2:orange>|<ns2:melon>'
So how would it look as a bash script, which I can then user as a cron job?
P.S. Sorry I don't have any bash script draft, I have very little experience with it and the time is of the essence right now.
This may be done with this script:
#!/bin/sh
find /path/to/directory/with/xml -type f | while read f; do
grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>' "$f" && mv "$f" "1_${f}"
done
But it will rescan the directory each time it runs and append 1_ to each file containing one of your tags. This means a lot of excess IO and files with certain tags will be getting 1_ prefix each run, resulting in names like 1_1_1_1_file.xml.
Probably you should think more on design, e.g. move processed files to two directories based on whether file has certain tags or not:
#!/bin/sh
# create output dirs
mkdir -p /path/to/directory/with/xml/with_tags/ /path/to/directory/with/xml/without_tags/
find /path/to/directory/with/xml -maxdepth 1 -mindepth 1 -type f | while read f; do
if grep -q -E '<ns2:apple>|<ns2:orange>|<ns2:melon>'; then
mv "$f" /path/to/directory/with/xml/with_tags/
else
mv "$f" /path/to/directory/with/xml/without_tags/
fi
done
Run this command as a dry run, then remove --dry_run to actually rename the files:
grep -Pl '(<ns2:apple>|<ns2:orange>|<ns2:melon>)' *.xml | xargs rename --dry-run 's/^/1_/'
The command-line utility rename comes in many flavors. Most of them should work for this task. I used the rename version 1.601 by Aristotle Pagaltzis. To install rename, simply download its Perl script and place into $PATH. Or install rename using conda, like so:
conda install rename
Here, grep uses the following options:
-P : Use Perl regexes.
-l : Suppress normal output; instead print the name of each input file from which output would normally have been printed.
SEE ALSO:
grep manual
I can use the cp or mv command to copy/mv files to a new folder manually but while in a for loop, it fails.
I've tried various ways of doing this and none seem to work. The most frustrating part is it works when run locally.
A simple version of what I'm trying to do is shown below:
#!bin/bash
#Define path variables
source_dir=/home/me/loop
destination_dir=/home/me/loop/new
#Change working dir
cd "$source_dir"
#Step through source_dir for each .txt. file
for f in *.txt
do
# If the txt file was modified within the last 300 minutes...
if [[ $(find "$f" -mmin -300) ]]
then
# Add breaks for any spaces in filenames
f="${f// /\\ }"
# Copy file to destination
cp "$source_dir/$f $destination_dir/"
fi
done
Error message is:
cp: missing destination file operand after '/home/me/loop/first\ second.txt /home/me/loop/new/'
Try 'cp --help' for more information.
However, I can manually run:
mv /home/me/loop/first\ second.txt /home/me/loop/new/
and it works fine.
I get the same error using cp and similar errors using rsync so I'm not sure what I'm doing wrong...
cp "$source_dir/$f $destination_dir/"
When you surround both arguments with double quotes you turn them into one argument with an embedded space. Quote them separately.
cp "$source_dir/$f" "$destination_dir/"
There's no do anything special for spaces beforehand. The quoting already ensures files with whitespace are handled correctly.
# Add breaks for any spaces in filenames
f="${f// /\\ }"
Let's take a step back, though. Looping over all *.txt files and then checking each one with find is overly complicated. find already loops over multiple files and does arbitrary things to those files. You can do everything in this script in a single find command.
#!bin/bash
source_dir=/home/me/loop
destination_dir=/home/me/loop/new
find "$source_dir" -name '*.txt' -mmin -300 -exec cp -t "$destination_dir" {} +
You need to divide it in to two strings, like this:
cp "$source_dir/$f" "$destination_dir/"
by having as one you are basically telling cp that the entire line is the first parameter, where it is actually two (source and destination).
Edit: As #kamil-cuk and #aaron states there are better ways of doing what you try to do. Please read their comments
I am writing a bash script and want it to tell me if the names of the files in a directory appear in a text file and if not, remove them.
Something like this:
counter = 1
numFiles = ls -1 TestDir/ | wc -l
while [$counter -lt $numFiles]
do
if [file in TestDir/ not in fileNames.txt]
then
rm file
fi
((counter++))
done
So what I need help with is the if statement, which is still pseudo-code.
You can simplify your script logic a lot :
#/bin/bash
# for loop to iterate over all files in the testdir
for file in TestDir/*
do
# if grep exit code is 1 (file not found in the text document), we delete the file
[[ ! $(grep -x "$file" fileNames.txt &> /dev/null) ]] && rm "$file"
done
It looks like you've got a solution that works, but I thought I'd offer this one as well, as it might still be of help to you or someone else.
find /Path/To/TestDir -type f ! -name '.*' -exec basename {} + | grep -xvF -f /Path/To/filenames.txt"
Breakdown
find: This gets file paths in the specified directory (which would be TestDir) that match the given criteria. In this case, I've specified it return only regular files (-type f) whose names don't start with a period (-name '.*'). It then uses its own builtin utility to execute the next command:
basename: Given a file path (which is what find spits out), it will return the base filename only, or, more specifically, everything after the last /.
|: This is a command pipe, that takes the output of the previous command to use as input in the next command.
grep: This is a regular-expression matching utility that, in this case, is given two lists of files: one fed in through the pipe from find—the files of your TestDir directory; and the files listed in filenames.txt. Ordinarily, the filenames in the text file would be used to match against filenames returned by find, and those that match would be given as the output. However, the -v flag inverts the matching process, so that grep returns those filenames that do not match.
What results is a list of files that exist in the directory TestDir, but do not appear in the filenames.txt file. These are the files you wish to delete, so you can simply use this line of code inside a parameter expansion $(...) to supply rm with the files it's able to delete.
The full command chain—after you cd into TestDir—looks like this:
rm $(find . -type f ! -name '.*' -exec basename {} + | grep -xvF -f filenames.txt")
The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, pdfs and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production user's need a record of which files used to exist, so when they browse a project folder, they can use the unaffected files but then refer to a text file in the same directory which records which files used to also be there and thus provide reason as to why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory
-empty
This tells find to only look for empty files
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$#" >>deletedFiles.txt' none {} +
When this command is run, bash will execute the string printf "%s\n" "$#" >>deletedFiles.txt and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use $#, it does not include $0. It, as is usual, expands to $1, $2, .... Thus, we add the placeholder none so that the placeholder is assigned is the $0, which we will ignore, and the complete list of file names are assigned to "$#".
-delete
This deletes each empty file.
Why not simply
find . -type f -size 0 -exec rm -v + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for side effect; if any (non-empty) lines are printed, it will return success, whereas otherwise, it will report failure; this way, we know whether or not to move the temporary file to the target directory.
With prompting from #JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$#"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can pursue going back to archive CD's to retrieve these deleted files, which are hopefully not 0 byte files.
Thanks to #John1024 for the suggestion of using the empty flag rather than size.
I was thinking if using a BASH script is possible without manually copying each file that is in this parent directory
"/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk
/System/Library/PrivateFrameworks"
So in this folder PrivateFrameworks, there are many subfolders and in each subfolder it consists of the file that I would like to copy it out to another location. So the structure of the path looks like this:
-PrivateFrameworks
-AccessibilityUI.framework
-AccessibilityUI <- copy this
-AccountSettings.framework
-AccountSettings <- copy this
I do not want the option of copying the entire content in the folder as there might be cases where the folders contain files which I do not want to copy. So the only way I thought of is to copy by the file extension. However as you can see, the files which I specified for copying does not have an extension(I think?). I am new to bash scripting so I am not familiar if this can be done with it.
To copy all files in or below the current directory that do not have extensions, use:
find . ! -name '*.*' -exec cp -t /your/destination/dir/ {} +
The find . command looks for all files in or below the current directory. The argument -name '*.*' would restrict that search to files that have extensions. By preceding it with a not (!), however, we get all files that do not have an extension. Then, -exec cp -t /your/destination/dir/ {} + tells find to copy those files to the destination.
To do the above starting in your directory with the long name, use:
find "/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk/System/Library/PrivateFrameworks" ! -name '*.*' -exec cp -t /your/destination/dir/ {} +
UPDATE: The unix tag on this question has been removed and replaced with a OSX tag. That means we can't use the -t option on cp. The workaround is:
find . ! -name '*.*' -exec cp {} /your/destination/dir/ \;
This is less efficient because a new cp process is created for every file moved instead of once for all the files that fit on a command line. But, it will accomplish the same thing.
MORE: There are two variations of the -exec clause of a find command. In the first use above, the clause ended with {} + which tells find to fill up the end of command line with as many file names as will fit on the line.
Since OSX lacks cp -t, however, we have to put the file name in the middle of the command. So, we put {} where we want the file name and then, to signal to find where the end of the exec command is, we add a semicolon. There is a trick, though. Because bash would normally consume the semicolon itself rather than pass it on to find, we have to escape the semicolon with a backslash. That way bash gives it to the find command.
sh SCRIPT.sh copy-from-directory .extension copy-to-directory
FROM_DIR=$1
EXTENSION=$2
TO_DIR=$3
USAGE="""Usage: sh SCRIPT.sh copy-from-directory .extension copy-to-directory
- EXAMPLE: sh SCRIPT.sh PrivateFrameworks .framework .
- NOTE: 'copy-to-directory' argument is optional
"""
## print usage if less than 2 args
if [[ $# < 2 ]]; then echo "${USAGE}" && exit 1 ; fi
## set copy-to-dir default args
if [[ -z "$TO_DIR" ]] ; then TO_DIR=$PWD ; fi
## DO SOMETHING...
## find directories; find target file;
## copy target file to copy-to-dir if file exist
find $FROM_DIR -type d | while read DIR ; do
FILE_TO_COPY=$(echo $DIR | xargs basename | sed "s/$EXTENSION//")
if [[ -f $DIR/$FILE_TO_COPY ]] ; then
cp $DIR/$FILE_TO_COPY $TO_DIR
fi
done