Move files with whitespace in their names to folders named after the files' modification dates - bash

I created a bunch of folders from modification dates, like these:
..
2012-11-29
2012-11-20
..
Now I want to move files into these folders if they have a modification date that equals the folder's name. The files contain whitespace in their names.
If I run this I get a list that looks like the folder names:
find . -iname "*.pdf" -print0 | while read -d $'\0' file; do stat -c "%.10y" "$file"; done
How do I take this output and use it in a script that moves these files like (pseudocode):
find . -iname "*.pdf" -print0 | while read -d $'\0' file; do mv $<FILEWITHWHITESPACEINNAME> <FOLDERNAMEDLIKE $file stat -c "%.10y" "$file" > ; done

find . -iname "*.pdf" -print0 |
while IFS= read -r -d '' file; do
folder=$(stat -c "%.10y" -- "$file") || continue # store date in variable
[[ $file = ./$folder/$file ]] && continue # skip if already there
mkdir -p -- "$folder" || continue # ensure directory exists
mv -- "$file" "$folder/" # actually do the move
done
To read a NUL-delimited name correctly, use IFS= to avoid losing leading or trailing spaces, and -r to avoid losing backslashes in the filename. See BashFAQ #1.
Inside your shell loop, you have shell variables -- so you can use one of those to store the output of your stat command. See How do I set a variable to the output of a command in bash?
Using -- signifies end-of-arguments, so even if your find were replaced with something that could emit a path starting with a - instead of a ./, we wouldn't try to treat values in file as anything but positional arguments (filenames, in the context of stat, mkdir and mv).
Using || continue inside the loop means that if any step fails, we don't do the subsequent ones -- so we don't try to do the mkdir if the stat fails, and we don't try to do the mv if the mkdir fails.
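To preview what the loop will do before moving anything, you can replace the mv with echo (a dry-run sketch of the same loop):
find . -iname "*.pdf" -print0 |
while IFS= read -r -d '' file; do
    folder=$(stat -c "%.10y" -- "$file") || continue
    echo mv -- "$file" "$folder/"    # print the command instead of running it
done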

Related

Copy files of specific type but prevent overwriting files

I need to copy all files of a specific type to a folder, which I'm doing like this:
find ./ -name '*.gql' -exec cp -prv '{}' '/path/to/dir/' ';'
But if there are two files with an identical name, although located in different subfolders, some files would be overwritten.
Is it possible to keep all files that are copied? Maybe by renaming the copied file, or is it possible to keep the folder structure in the target directory?
cf. https://ss64.com/osx/cp.html : Historic versions of the cp utility had a -r option...its use is strongly discouraged, as it does not correctly copy special files, symbolic links, or fifo's. You can use -n to prevent overwrites, but more complex logic will likely require custom code.
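If simply skipping a file whose name already exists at the destination is acceptable, cp's -n flag alone may be enough (a sketch; -n is a GNU/BSD cp option, not POSIX):
find ./ -name '*.gql' -exec cp -pvn '{}' '/path/to/dir/' ';'
To keep every copy under a distinct name instead, custom logic like the following is needed: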
dest=/path/to/dir
while IFS= read -r -d '' filename      # read null-delimited list from find
do
    basename="${filename##*/}"
    if [[ -e "$dest/$basename" ]]
    then
        ctr=0
        while [[ -e "$dest/$basename.$((++ctr))" ]]; do : ; done  # find available name
        mv "$dest/$basename" "$dest/$basename.$ctr"
    fi
    cp -pv "$filename" "$dest/$basename"
done < <( find ./ -name '*.gql' -type f -print0 )  # null-delimited list

Replace the complete filenames of files with the MD5 hash string of their content in bash

Problem:
I have a bunch of files in a folder; I want to rename all of them to the md5 of their content.
What I tried:
This is the command I tried.
for i in $(find /home/admin/test -type f);do mv $i $(md5sum $i|cut -d" " -f 1);done
But this is failing after some time with the error below, and only some files get renamed, leaving the rest untouched.
mv: missing destination file operand after /home/admin/test/help.txt
Try `mv --help' for more information.
Is the implementation correct? Am I doing something wrong in the script?
Make things simple by making use of the glob patterns the shell provides, instead of using external utilities like find. Also see Why you don't read lines with "for".
Navigate inside the folder /home/admin/test and do the following, which should be sufficient:
for file in *; do
    [ -f "$file" ] || continue
    md5sum -- "$file" | { read -r sum _; mv -- "$file" "$sum"; }
done
Try using echo in place of mv first to check whether the files get renamed as expected.
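A dry run along those lines might look like this (the same loop, with echo prefixed to mv):
for file in *; do
    [ -f "$file" ] || continue
    md5sum -- "$file" | { read -r sum _; echo mv -- "$file" "$sum"; }
done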
To also descend into the sub-directories below, which I assume is part of your requirement, enable globstar, one of the extended globbing options the shell provides:
shopt -s globstar
for file in **/*; do
If you want to recursively rename all files with their md5 hash, you could try this:
find /home/admin/test -type f -exec bash -c 'md5sum "$1" | while read -r s f; do mv -- "${f#*./}" "$(dirname "${f#*./}")/$s"; done' _ {} \;
The hash and the filename are read into the s and f variables. The ${f#*./} removes the prefix added by the md5sum and find commands.
Note that if some files have exactly the same content, you will end up with only one file.
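If you would rather keep such duplicates than overwrite them, one option (a sketch; -n is a GNU/BSD mv extension that refuses to overwrite an existing file) is:
for file in *; do
    [ -f "$file" ] || continue
    md5sum -- "$file" | { read -r sum _; mv -n -- "$file" "$sum"; }
done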

Bash: how to copy multiple files with same name to multiple folders

I am working on Linux machine.
I have a lot of files named the same, with a directory structure like this:
P45_input_foo/result.dat
P45_input_bar/result.dat
P45_input_tar/result.dat
P45_input_cool/result.dat ...
It is difficult to copy them one by one. I want to copy them into another folder named as data with similar folder names and file names:
/data/foo/result.dat
/data/bar/result.dat
/data/tar/result.dat
/data/cool/result.dat ...
Instead of copying them one by one, what should I do?
Using a for loop in bash:
# we list every file following the pattern: ./<somedirname>/<any file>
# if you want to specify a format for the folders, you could change the pattern here,
# i.e. for your case you could write 'for f in P45*/*' to only match folders starting with P45
for f in */*
do
    # we strip the filename from the path
    # i.e. 'P45_input_foo/result.dat' will become 'P45_input_foo'
    newpath="${f%/*}"
    # - ${newpath##*_} extracts the last chain of characters after a _, in our example 'foo'
    # - mkdir -p will recursively create our structure
    # - cp "$f" "$_" will copy the file to our new directory; it will not run if mkdir returns an error
    mkdir -p "/data/${newpath##*_}" && cp "$f" "$_"
done
The ${newpath##*_} and ${f%/*} usages are part of Bash's string manipulation methods. You can read more about them here.
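To illustrate what these expansions do on one of the paths from the question:
f="P45_input_foo/result.dat"
newpath="${f%/*}"        # remove the shortest trailing /* part -> "P45_input_foo"
echo "${newpath##*_}"    # remove the longest leading *_ part  -> "foo"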
You will need to extract the 3rd field delimited by "_":
P45_input_foo --> foo
create the directory (if needed) and copy the file to it. Something like this (not tested, might need editing):
STARTING_DIR="/"
cd "$STARTING_DIR"
VAR=$(ls -1)
while read DIR; do
TARGET_DIR=$(echo "$DIR" | cut -d'_' -f3)
NEW_DIR="/data/$DIR"
if [ ! -d "$NEW_DIR" ]; then
mkdir "$NEW_DIR"
fi
cp "$DIR/result.dat" "$NEW_DIR/result.dat"
if [ $? -ne 0 ];
echo "ERROR: encountered an error while copying"
fi
done <<<"$VAR"
Explanation: this assumes all the paths you've mentioned are under root / (if not, change STARTING_DIR accordingly). With ls you get the list of the directories and store the output in VAR, then pass the content of VAR to the while loop.
A bit of find and a few bash tricks, and the script below could do the trick for you. Remember to first run the script with the mv replaced by echo, to check that "/data/"$folder"/" is actually the path you want to move the file(s) to.
#!/bin/bash
while IFS= read -r -d '' file
do
    fileNew="${file%/*}"     # everything before the last '/'
    fileNew="${fileNew#*/}"  # everything after the first '/'
    IFS="_" read -r _ _ folder <<<"$fileNew"
    mv -v "$file" "/data/$folder/"
done < <(find . -type f -name "result.dat" -print0)

How to loop through file names returned by find?

x=$(find . -name "*.txt")
echo $x
if I run the above piece of code in a Bash shell, what I get is a string containing several file names separated by blanks, not a list.
Of course, I can further separate them by blanks to get a list, but I'm sure there is a better way to do it.
So what is the best way to loop through the results of a find command?
TL;DR: If you're just here for the most correct answer, you probably want my personal preference (see the bottom of this post):
# execute `process` once for each file
find . -name '*.txt' -exec process {} \;
If you have time, read through the rest to see several different ways and the problems with most of them.
The full answer:
The best way depends on what you want to do, but here are a few options. As long as no file or folder in the subtree has whitespace in its name, you can just loop over the files:
for i in $x; do    # Not recommended, will break on whitespace
    process "$i"
done
Marginally better, cut out the temporary variable x:
for i in $(find -name \*.txt); do    # Not recommended, will break on whitespace
    process "$i"
done
It is much better to glob when you can. White-space safe, for files in the current directory:
for i in *.txt; do    # Whitespace-safe but not recursive.
    process "$i"
done
By enabling the globstar option, you can glob all matching files in this directory and all subdirectories:
# Make sure globstar is enabled
shopt -s globstar
for i in **/*.txt; do    # Whitespace-safe and recursive
    process "$i"
done
In some cases, e.g. if the file names are already in a file, you may need to use read:
# IFS= makes sure it doesn't trim leading and trailing whitespace
# -r prevents interpretation of \ escapes.
while IFS= read -r line; do    # Whitespace-safe EXCEPT newlines
    process "$line"
done < filename
read can be used safely in combination with find by setting the delimiter appropriately:
find . -name '*.txt' -print0 |
while IFS= read -r -d '' line; do
    process "$line"
done
For more complex searches, you will probably want to use find, either with its -exec option or with -print0 | xargs -0:
# execute `process` once for each file
find . -name \*.txt -exec process {} \;
# execute `process` once with all the files as arguments*:
find . -name \*.txt -exec process {} +
# using xargs*
find . -name \*.txt -print0 | xargs -0 process
# using xargs with arguments after each filename (implies one run per filename)
find . -name \*.txt -print0 | xargs -0 -I{} process {} argument
find can also cd into each file's directory before running a command by using -execdir instead of -exec, and can be made interactive (prompt before running the command for each file) using -ok instead of -exec (or -okdir instead of -execdir).
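For example, with a hypothetical process command:
# run `process` from within each file's directory
find . -name '*.txt' -execdir process {} \;
# prompt before running the command for each file
find . -name '*.txt' -ok process {} \;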
*: Technically, both find and xargs (by default) will run the command with as many arguments as they can fit on the command line, as many times as it takes to get through all the files. In practice, unless you have a very large number of files it won't matter; and if you exceed the length but need them all on the same command line, you're out of luck; find a different way.
Whatever you do, don't use a for loop:
# Don't do this
for file in $(find . -name "*.txt")
do
    …code using "$file"
done
Three reasons:
For the for loop to even start, the find must run to completion.
If a file name has any whitespace (including space, tab or newline) in it, it will be treated as two separate names.
Although now unlikely, you can overrun your command line buffer. Imagine if your command line buffer holds 32KB, and your for loop returns 40KB of text. That last 8KB will be dropped right off your for loop and you'll never know it.
Always use a while read construct:
find . -name "*.txt" -print0 | while read -d $'\0' file
do
…code using "$file"
done
The loop will execute while the find command is executing. Plus, this command will work even if a file name is returned with whitespace in it. And, you won't overflow your command line buffer.
The -print0 will use NUL as the file separator instead of a newline, and the -d $'\0' will use NUL as the delimiter while reading.
find . -name "*.txt" | while read fname; do
    echo "$fname"
done
Note: this method and the (second) method shown by bmargulies are safe to use with white space in the file/folder names.
In order to also have the - somewhat exotic - case of newlines in the file/folder names covered, you will have to resort to the -exec predicate of find like this:
find . -name '*.txt' -exec echo "{}" \;
The {} is the placeholder for the found item and the \; is used to terminate the -exec predicate.
And for the sake of completeness let me add another variant - you gotta love the *nix ways for their versatility:
find . -name '*.txt' -print0|xargs -0 -n 1 echo
This separates the printed items with a \0 character which, to my knowledge, isn't allowed in file or folder names on any file system, and therefore should cover all bases. xargs then picks them up one by one ...
Filenames can include spaces and even control characters. Spaces are (default) delimiters for shell word splitting in bash, and as a result of that the x=$(find . -name "*.txt") from the question is not recommended at all. If find gets a filename with spaces, e.g. "the file.txt", you will get 2 separate strings for processing if you process x in a loop. You can improve this by changing the delimiter (the bash IFS variable), e.g. to \r\n, but filenames can also include control characters - so this is not a (completely) safe method.
From my point of view, there are 2 recommended (and safe) patterns for processing files:
1. Use for loop & filename expansion:
for file in ./*.txt; do
    [[ ! -e $file ]] && continue    # continue, if file does not exist
    # single filename is in $file
    echo "$file"
    # your code here
done
2. Use find-read-while & process substitution
while IFS= read -r -d '' file; do
    # single filename is in $file
    echo "$file"
    # your code here
done < <(find . -name "*.txt" -print0)
Remarks
on Pattern 1:
bash returns the search pattern ("*.txt") if no matching file is found - so the extra line "continue, if file does not exist" is needed. See Bash Manual, Filename Expansion
shell option nullglob can be used to avoid this extra line.
"If the failglob shell option is set, and no matches are found, an error message is printed and the command is not executed." (from Bash Manual above)
shell option globstar: "If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a ‘/’, only directories and subdirectories match." see Bash Manual, Shopt Builtin
other options for filename expansion: extglob, nocaseglob, dotglob & the shell variable GLOBIGNORE (see the sketch below)
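For example, a small sketch of Pattern 1 with some of these options enabled:
shopt -s nullglob dotglob    # skip the loop entirely if nothing matches; include dotfiles
for file in ./*.txt; do
    echo "$file"
done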
on Pattern 2:
filenames can contain spaces, tabs, newlines, ... To process filenames in a safe way, find with -print0 is used: each filename is printed with all control characters & terminated with NUL. See also Gnu Findutils Manpage, Unsafe File Name Handling, Safe File Name Handling, unusual characters in filenames. See David A. Wheeler below for a detailed discussion of this topic.
There are some possible patterns to process find results in a while loop. Others (kevin, David W.) have shown how to do this using pipes:
files_found=1
find . -name "*.txt" -print0 |
while IFS= read -r -d '' file; do
    # single filename in $file
    echo "$file"
    files_found=0    # not working example
    # your code here
done
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"
When you try this piece of code, you will see that it does not work: files_found is always "true" & the code will always echo "no files found". The reason: each command of a pipeline is executed in a separate subshell, so a variable changed inside the loop (separate subshell) does not change the variable in the main shell script. This is why I recommend using process substitution as the "better", more useful, more general pattern. See I set variables in a loop that's in a pipeline. Why do they disappear... (from Greg's Bash FAQ) for a detailed discussion on this topic.
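The same loop, reworked with process substitution so that the assignment survives:
files_found=1
while IFS= read -r -d '' file; do
    echo "$file"
    files_found=0    # runs in the current shell, so the change is visible below
done < <(find . -name "*.txt" -print0)
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"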
Additional References & Sources:
Gnu Bash Manual, Pattern Matching
Filenames and Pathnames in Shell: How to do it Correctly, David A. Wheeler
Why you don't read lines with "for", Greg's Wiki
Why you shouldn't parse the output of ls(1), Greg's Wiki
Gnu Bash Manual, Process Substitution
(Updated to include @Socowi's excellent speed improvement)
With any $SHELL that supports it (dash/zsh/bash...):
find . -name "*.txt" -exec $SHELL -c '
for i in "$#" ; do
echo "$i"
done
' {} +
Done.
Original answer (shorter, but slower):
find . -name "*.txt" -exec $SHELL -c '
echo "$0"
' {} \;
If you can assume the file names don't contain newlines, you can read the output of find into a Bash array using the following command:
readarray -t x < <(find . -name '*.txt')
Note:
-t causes readarray to strip newlines.
It won't work if readarray is in a pipe, hence the process substitution.
readarray is available since Bash 4.
Bash 4.4 and up also supports the -d parameter for specifying the delimiter. Using the null character, instead of newline, to delimit the file names works also in the rare case that the file names contain newlines:
readarray -d '' x < <(find . -name '*.txt' -print0)
readarray can also be invoked as mapfile with the same options.
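Once the names are in the array, iterating over them is straightforward:
readarray -d '' x < <(find . -name '*.txt' -print0)
for file in "${x[@]}"; do
    printf '%s\n' "$file"    # or any other per-file processing
done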
Reference: https://mywiki.wooledge.org/BashFAQ/005#Loading_lines_from_a_file_or_stream
# Doesn't handle whitespace
for x in `find . -name "*.txt" -print`; do
process_one $x
done
or
# Handles whitespace and newlines
find . -name "*.txt" -print0 | xargs -0 -n 1 process_one
I like to assign the output of find to a variable first and switch IFS to a newline, as follows:
FilesFound=$(find . -name "*.txt")
IFSbkp="$IFS"
IFS=$'\n'
counter=1
for file in $FilesFound; do
    echo "${counter}: ${file}"
    let counter++
done
IFS="$IFSbkp"
As commented by @Konrad Rudolph, this will not work with newlines in file names. I still think it is handy, as it covers most of the cases where you need to loop over command output.
As already posted in the top answer by Kevin, the best solution is to use a for loop with a bash glob, but as bash globbing is not recursive by default, this can be fixed by a recursive bash function:
#!/bin/bash
set -x
set -eu -o pipefail

all_files=()

function get_all_the_files()
{
    local directory="$1"
    for item in "$directory"/* "$directory"/.[^.]*
    do
        if [[ -d "$item" ]]
        then
            get_all_the_files "$item"
        else
            all_files+=("$item")
        fi
    done
}

get_all_the_files "/tmp"

for file_path in "${all_files[@]}"
do
    printf 'My file is "%s"\n' "$file_path"
done
Related questions:
Bash loop through directory including hidden file
Recursively list files from a given directory in Bash
ls command: how can I get a recursive full-path listing, one line per file?
List files recursively in Linux CLI with path relative to the current directory
Recursively List all directories and files
bash script, create array of all files in a directory
How can I creates array that contains the names of all the files in a folder?
How to get the list of files in a directory in a shell script?
Based on other answers and a comment by @phk, using fd #3 (which still allows you to use stdin inside the loop):
while IFS= read -r f <&3; do
    echo "$f"
done 3< <(find . -iname "*filename*")
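For example, stdin remains available for prompting inside the loop (a sketch with a hypothetical interactive confirmation):
while IFS= read -r f <&3; do
    read -r -p "Process \"$f\"? [y/N] " answer    # reads from the terminal, not from find
    [[ $answer == [Yy] ]] && echo "processing $f"
done 3< <(find . -iname "*filename*")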
You can put the filenames returned by find into an array like this:
array=()
while IFS= read -r -d ''; do
    array+=("$REPLY")
done < <(find . -name '*.txt' -print0)
Now you can just loop through the array to access individual items and do whatever you want with them.
Note: It's white space safe.
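For example:
for filename in "${array[@]}"; do
    echo "$filename"    # whitespace in $filename is preserved
done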
I think using this piece of code (feeding the while loop via a here-string after done):
while read fname; do
    echo "$fname"
done <<< "$(find . -name "*.txt")"
is better than that answer, because the while loop there is executed in a subshell (according to here); if you use that answer, variable changes made inside the loop cannot be seen after the loop, which matters if you want to modify variables inside it.
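A quick way to see the difference (a sketch): a counter incremented inside the here-string loop keeps its value afterwards, which it would not if the loop were fed by a pipe:
count=0
while read -r fname; do
    ((count++))
done <<< "$(find . -name "*.txt")"
echo "$count files found"    # works: the loop ran in the current shell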
You can store your find output in an array if you wish to use the output later:
array=($(find . -name "*.txt"))
Now to print each element on a new line, you can either use a for loop iterating over all the elements of the array, or you can use printf:
for i in "${array[@]}"; do echo "$i"; done
or
printf '%s\n' "${array[@]}"
You can also use:
for file in "`find . -name "*.txt"`"; do echo "$file"; done
This will print each filename on a new line.
To only print the find output in list form, you can use either of the following:
find . -name "*.txt" -print 2>/dev/null
or
find . -name "*.txt" -print | grep -v 'Permission denied'
This will remove the error messages and only give the filenames as output, one per line.
If you wish to do something with the filenames, storing them in an array is good; otherwise there is no need to consume that space and you can directly print the output from find.
function loop_through(){
    length_="$(find . -name '*.txt' | wc -l)"
    length_="${length_#"${length_%%[![:space:]]*}"}"   # trim leading whitespace
    length_="${length_%"${length_##*[![:space:]]}"}"   # trim trailing whitespace
    for (( i = 1; i <= length_; i++ ))                 # {1..$length_} would not expand the variable
    do
        x=$(find . -name '*.txt' | sort | head -"$i" | tail -1)
        echo "$x"
    done
}
To get the length of the list of files for the loop, I used the "wc -l" command and stored its output in a variable.
Then I needed to trim the surrounding whitespace from that variable so the for loop can read it.
find <path> -xdev -type f -name '*.txt' -exec ls -l {} \;
This will list the files and give details about attributes.
Another alternative is to not use bash, but call Python to do the heavy lifting. I resorted to this because bash solutions, like my other answer, were too slow.
With this solution, we build a bash array of files from an inline Python script:
#!/bin/bash
set -eu -o pipefail

dsep=":"  # directory_separator
base_directory=/tmp

all_files=()
all_files_string="$(python3 -c '#!/usr/bin/env python3
import os
import sys

dsep = "'"$dsep"'"
base_directory = "'"$base_directory"'"

def log(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

def check_invalid_character(file_path):
    for thing in ("\\", "\n"):
        if thing in file_path:
            raise RuntimeError(f"{thing!r} is not allowed in \"{file_path}\"!")

def absolute_path_to_relative(base_directory, file_path):
    relative_path = os.path.commonprefix([base_directory, file_path])
    relative_path = os.path.normpath(file_path.replace(relative_path, ""))
    # if you use Windows Python, it accepts / instead of \\
    # if you have \ in your file names, rename them or comment this out
    relative_path = relative_path.replace("\\", "/")
    if relative_path.startswith("/"):
        relative_path = relative_path[1:]
    return relative_path

for directory, directories, files in os.walk(base_directory):
    for file in files:
        local_file_path = os.path.join(directory, file)
        local_file_name = absolute_path_to_relative(base_directory, local_file_path)
        log(f"local_file_name {local_file_name}.")
        check_invalid_character(local_file_name)
        print(f"{base_directory}{dsep}{local_file_name}")
' | dos2unix)"

if [[ -n "$all_files_string" ]]
then
    readarray -t temp <<< "$all_files_string"
    all_files+=("${temp[@]}")
fi

for item in "${all_files[@]}"
do
    OLD_IFS="$IFS"; IFS="$dsep"
    read -r base_directory local_file_name <<< "$item"; IFS="$OLD_IFS"
    printf 'item "%s", base_directory "%s", local_file_name "%s".\n' \
        "$item" \
        "$base_directory" \
        "$local_file_name"
done
Related:
os.walk without hidden folders
How to do a recursive sub-folder search and return files in a list?
How to split a string into an array in Bash?
How about using grep instead of find?
ls | grep '\.txt$' > out.txt
Now you can read this file, and the filenames are in the form of a list.

How do I copy directory structure containing placeholders

I have a situation where a template directory - containing files and links (!) - needs to be copied recursively to a destination directory, preserving all attributes. The template directory contains any number of placeholders (__NOTATION__) that need to be renamed to certain values.
For example template looks like this:
./template/__PLACEHOLDER__/name/__PLACEHOLDER__/prog/prefix___FILENAME___blah.txt
Destination becomes like this:
./destination/project1/name/project1/prog/prefix_customer_blah.txt
What I tried so far is this:
# first create dest directory structure
while read line; do
dest="$(echo "$line" | sed -e 's#__PLACEHOLDER__#project1#g' -e 's#__FILENAME__#customer#g' -e 's#template#destination#')"
if ! [ -d "$dest" ]; then
mkdir -p "$dest"
fi
done < <(find ./template -type d)
# now copy files
while read line; do
dest="$(echo "$line" | sed -e 's#__PLACEHOLDER__#project1#g' -e 's#__FILENAME__#customer#g' -e 's#template#destination#')"
cp -a "$line" "$dest"
done < <(find ./template -type f)
However, I realized that if I want to take care about permissions and links, this is going to be endless and very complicated. Is there a better way to replace __PLACEHOLDER__ with "value", maybe using cp, find or rsync?
I suspect that your script will already do what you want, if only you replace
find ./template -type f
with
find ./template ! -type d
Otherwise, the obvious solution is to use cp -a to make an "archive" copy of the template, complete with all links, permissions, etc, and then rename the placeholders in the copy.
cp -a ./template ./destination
while read path; do
    dir=$(dirname "$path")
    file=$(basename "$path")
    mv -v "$path" "$dir/${file//__PLACEHOLDER__/project1}"
done < <(find ./destination -depth -name '*__PLACEHOLDER__*')
Note that you'll want to use -depth or else renaming files inside renamed directories will break.
If it's very important to you that the directory tree is created with the names already changed (i.e. you must never see placeholders in the destination), then I'd recommend simply using an intermediate location.
First copy with rsync, preserving all the properties and links etc.
Then change the placeholder strings in the destination filenames:
#!/bin/bash
TEMPL="$PWD/template"   # somewhere else
DEST="$PWD/dest"        # wherever it is
mkdir "$DEST"
(cd "$TEMPL"; rsync -Hra . "$DEST")
MyRen=$(mktemp)
trap "rm -f $MyRen" 0 1 2 3 13 15
cat >$MyRen <<'EOF'
#!/bin/bash
fn="$1"
newfn="$(echo "$fn" | sed -e 's#__PLACEHOLDER__#project1#g' -e 's#__FILENAME__#customer#g' -e 's#template#destination#')"
test "$fn" != "$newfn" && mv "$fn" "$newfn"
EOF
chmod +x $MyRen
find "$DEST" -depth -execdir $MyRen {} \;

Resources